PhD
Science and Technology
University of Birmingham
Field of excellence | Research and academic |
Published research |
|
Paper (1): | |
Paper title: | Integrating character-level and word-level representation for affect in Arabic tweets |
Link to paper: | https://www.sciencedirect.com/science/article/pii/S0169023X21000938 |
Publication date: | 27/12/2021 |
Abstract: | Affect tasks, which range from sentiment polarity classification to finer-grained sentiment strength and emotional intensity detection, have attracted increasing interest due to the vast amount of user-generated content and advanced learning models. Word representation models have been leveraged effectively within a variety of natural language processing tasks. However, these models are not always effective in the context of social media. When dealing with social media posts in Arabic, the use of Arabic dialects needs to be considered. Although using informal text to train word-level models can lead to the identification of words that convey the same meaning, these models are unable to capture the full extent of the words that are used in the real world due to out-of-vocabulary (OOV) words. The inability to identify such words is one of the main limitations of word-level models. One approach to overcoming the OOV problem is to use character-level embeddings, as they can effectively learn the vectors of word parts or character n-grams. This study uses a combination of character-level and word-level models to identify the most effective methods by which affective Arabic words in tweets can be represented semantically and morphologically. We evaluate our generated models and the proposed method by integrating them in a supervised learning framework that was used for a range of affect tasks and other related tasks. Our findings reveal that the developed models surpassed the performance of state-of-the-art Arabic pre-trained word embeddings over eight datasets. In addition, our models improve on previous state-of-the-art outcomes on tasks involving Arabic emotion intensity, outperforming the top systems that used advanced ensemble learning models and several additional features. |
Paper (2): | |
Paper title: | Enhancing contextualised language models with static character and word embeddings for emotional intensity and sentiment strength detection in Arabic tweets |
Link to paper: | https://www.sciencedirect.com/science/article/pii/S1877050921012084 |
Publication date: | 01/01/2021 |
Abstract: | Many studies have focused on Arabic sentiment or emotion classification tasks. However, research on alternative aspects of affect, such as emotional intensity and sentiment strength tasks, has been somewhat limited. In this paper, we propose a method for enriching a contextualised language model that incorporates static character and word embeddings for emotional intensity and sentiment strength in Arabic tweets. We examine the assumption that models using static embeddings that are trained specifically on a corpus containing extensive Arabic affect-related words can boost the performance of language models. Through the development of character-level embeddings, we have found that our method is able to overcome the limitations associated with out-of-vocabulary words, which is a common problem when dealing with Arabic informal text. Given this, the method that we have developed achieves state-of-the-art results for the detection of the intensity of emotion and sentiment strength in Arabic social media. |
Paper (3): | |
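Paper title: | Combining character and word embeddings for affect in Arabic informal social media microblogs (in: International Conference on Applications of Natural Language to Information Systems) |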
Link to paper: | https://link.springer.com/chapter/10.1007/978-3-030-51310-8_20 |
Publication date: | 17/06/2021 |
Abstract: | Word representation models have been successfully applied in many natural language processing tasks, including sentiment analysis. However, these models do not always work effectively in some social media contexts. When considering the use of Arabic in microblogs like Twitter, it is important to note that a variety of different linguistic domains are involved. This is mainly because social media users employ various dialects in their communications. While training word-level models with such informal text can lead to words being captured that have the same meanings, these models cannot capture all words that can be encountered in the real world due to out-of-vocabulary (OOV) words. The inability to identify such words is one of the main limitations of word-level models. In contrast, character-level embeddings can address this problem effectively through their ability to learn the vectors of character n-grams or parts of words. We take advantage of both character- and word-level models to discover more effective methods to represent Arabic affect words in tweets. We evaluate our embeddings by incorporating them into a supervised learning framework for a range of affect tasks. Our models outperform the state-of-the-art Arabic pre-trained word embeddings in these tasks. Moreover, they offer improved state-of-the-art results for the task of Arabic emotion intensity, outperforming the top-performing systems that employ a combination of deep neural networks and several other features. |
Scientific conferences |
|
Conference (1): | |
Conference title: | International Conference on Applications of Natural Language to Information Systems |
Date held: | 17/06/2021 |
Location: | Germany |
Type of participation: | Conference paper and presentation |
Title of contribution: | Combining character and word embeddings for affect in Arabic informal social media microblogs |
Summary of contribution: | Word representation models have been successfully applied in many natural language processing tasks, including sentiment analysis. However, these models do not always work effectively in some social media contexts. When considering the use of Arabic in microblogs like Twitter, it is important to note that a variety of different linguistic domains are involved. This is mainly because social media users employ various dialects in their communications. While training word-level models with such informal text can lead to words being captured that have the same meanings, these models cannot capture all words that can be encountered in the real world due to out-of-vocabulary (OOV) words. The inability to identify such words is one of the main limitations of word-level models. In contrast, character-level embeddings can address this problem effectively through their ability to learn the vectors of character n-grams or parts of words. We take advantage of both character- and word-level models to discover more effective methods to represent Arabic affect words in tweets. We evaluate our embeddings by incorporating them into a supervised learning framework for a range of affect tasks. Our models outperform the state-of-the-art Arabic pre-trained word embeddings in these tasks. Moreover, they offer improved state-of-the-art results for the task of Arabic emotion intensity, outperforming the top-performing systems that employ a combination of deep neural networks and several other features. |
Conference (2): | |
Conference title: | The 4th Workshop on Open-Source Arabic Corpora and Processing Tools with a Shared Task on Offensive Language Detection |
Date held: | 12/05/2020 |
Location: | France |
Type of participation: | Conference paper |
Title of contribution: | Combining Character and Word Embeddings for the Detection of Offensive Language in Arabic |
Summary of contribution: | Twitter and other social media platforms offer users the chance to share their ideas via short posts. While the easy exchange of ideas has value, these microblogs can be leveraged by people who want to share hatred, and such individuals can share negative views about an individual, race, or group with millions of people at the click of a button. There is thus an urgent need to establish a method that can automatically identify hate speech and offensive language. To contribute to this development, during the OSACT4 workshop, a shared task was undertaken to detect offensive language in Arabic. A key challenge was the uniqueness of the language used on social media, which gives rise to the out-of-vocabulary (OOV) problem. In addition, the use of different dialects in Arabic exacerbates this problem. To deal with the issues associated with OOV, we generated a character-level embeddings model, which was trained on a massive, carefully collected dataset. This level of embeddings can work effectively in resolving the problem of OOV words through its ability to learn the vectors of character n-grams or parts of words. The proposed systems were ranked 7th and 8th for Subtasks A and B, respectively. |
Conference (3): | |
Conference title: | The Fourteenth Workshop on Semantic Evaluation |
Date held: | 13/12/2020 |
Location: | Barcelona (online) |
Type of participation: | Conference paper and poster presentation |
Title of contribution: | BhamNLP at SemEval-2020 Task 12: An ensemble of different word embeddings and emotion transfer learning for Arabic offensive language identification in social media |
Summary of contribution: | Social media platforms such as Twitter offer people an opportunity to publish short posts in which they can share their opinions and perspectives. While these applications can be valuable, they can also be exploited to promote negative opinions, insults, and hatred against a person, race, or group. These opinions can be spread to millions of people at the click of a mouse. As such, there is a need to develop mechanisms by which offensive language can be automatically detected in social media channels and managed in a timely manner. To help achieve this goal, SemEval 2020 offered a shared task (OffensEval 2020) that involved the detection of offensive text in Arabic. We propose an ensemble approach that combines different levels of word embedding models with transfer learning from emotion-related tasks. The proposed system ranked 9th out of the 52 entries within the Arabic offensive language identification subtask. |
Conference (4): | |
Conference title: | The Sixth Arabic Natural Language Processing Workshop |
Date held: | 19/04/2021 |
Location: | Barcelona (online) |
Type of participation: | Conference paper and presentation |
Title of contribution: | Multi-task learning using a combination of contextualised and static word embeddings for Arabic sarcasm detection and sentiment analysis |
Summary of contribution: | Sarcasm detection and sentiment analysis are important tasks in Natural Language Understanding. Sarcasm is a type of expression where the sentiment polarity is flipped by an interfering factor. In this study, we exploited this relationship to enhance both tasks by proposing a multi-task learning approach using a combination of static and contextualised embeddings. Our proposed system achieved the best result in the sarcasm detection subtask. |
Awards and honours |
|
Award (1): | |
Award name: | Best conference paper |
Awarding body: | NLDB 2020 – Springer |
Award date: | 17/06/2020 |
Field of recognition: | The Best Conference Paper Award seeks to recognise high-quality research presented among all accepted papers. |
Award (2): | |
Award name: | Winning System |
Awarding body: | The Sixth Workshop on Arabic Natural Language Processing Co-located with EACL 2021 |
Award date: | 19/04/2021 |
Field of recognition: | Winning System in ArSarcasm Shared-Task |
Award (3): | |
Award name: | Second Place Winner |
Awarding body: | King Abdullah University of Science and Technology (KAUST) |
Award date: | 28/09/2021 |
Field of recognition: | Arabic Sentiment Analysis Challenge |