Field of excellence | Research and academic |
Published research |
|
Research paper (1): | |
Research title: | Social Media Monitoring of the COVID-19 Pandemic and Influenza Epidemic With Adaptation for Informal Language in Arabic Twitter Data: Qualitative Study |
Link to the research: | https://medinform.jmir.org/2021/9/e27670 |
Publication date: | 17/09/2021 |
Research summary: | Background: Twitter is a real-time messaging platform widely used by people and organizations to share information on many topics. Systematic monitoring of social media posts (infodemiology or infoveillance) could be useful to detect misinformation outbreaks as well as to reduce reporting lag time and to provide an independent complementary source of data compared with traditional surveillance approaches. However, such an analysis is currently not possible in the Arabic-speaking world owing to a lack of basic building blocks for research and dialectal variation. Objective: We collected around 4000 Arabic tweets related to COVID-19 and influenza. We cleaned and labeled the tweets relative to the Arabic Infectious Diseases Ontology, which includes nonstandard terminology, as well as 11 core concepts and 21 relations. The aim of this study was to analyze Arabic tweets to estimate their usefulness for health surveillance, understand the impact of the informal terms in the analysis, show the effect of deep learning methods in the classification process, and identify the locations where the infection is spreading. Methods: We applied the following multilabel classification techniques: binary relevance, classifier chains, label power set, adapted algorithm (multilabel adapted k-nearest neighbors [MLKNN]), support vector machine with naive Bayes features (NBSVM), bidirectional encoder representations from transformers (BERT), and AraBERT (transformer-based model for Arabic language understanding) to identify tweets appearing to be from infected individuals. We also used named entity recognition to predict the place names mentioned in the tweets. Results: We achieved an F1 score of up to 88% in the influenza case study and 94% in the COVID-19 one. Adapting for nonstandard terminology and informal language helped to improve accuracy by as much as 15%, with an average improvement of 8%. Deep learning methods achieved an F1 score of up to 94% during the classifying process. Our geolocation detection algorithm had an average accuracy of 54% for predicting the location of users according to tweet content. Conclusions: This study identified two Arabic social media data sets for monitoring tweets related to influenza and COVID-19. It demonstrated the importance of including informal terms, which are regularly used by social media users, in the analysis. It also proved that BERT achieves good results when used with new terms in COVID-19 tweets. Finally, the tweet content may contain useful information to determine the location of disease spread. |
Scientific conferences |
|
Conference (1): | |
Conference title: | HEALTHCARE TEXT ANALYTICS CONFERENCE 2021 |
Date held: | 17/06/2021 |
Venue: | Online |
Type of participation: | Poster |
Contribution title: | Detecting COVID-19 Misinformation in Arabic Tweets |
Contribution summary: | In this study, we collect around 2000 tweets related to COVID-19 from the Twitter streaming API. We manually tag each tweet as false information, correct information, or unrelated. Then, we apply three different machine learning algorithms (Logistic Regression, Support Vector Classification, and Naïve Bayes) with two sets of features: a word frequency approach and word embeddings. We find that machine learning classifiers can classify rumour-related tweets with 84% accuracy. |
Conference (2): | |
Conference title: | Proceedings of the 12th Language Resources and Evaluation Conference |
Date held: | 01/05/2020 |
Venue: | Marseille, France |
Type of participation: | Scientific paper |
Contribution title: | Developing an Arabic Infectious Disease Ontology to Include Non-Standard Terminology |
Contribution summary: | Building ontologies is a crucial part of the semantic web endeavour. In recent years, research interest in supporting languages such as Arabic in NLP has grown rapidly, but there has been very little research on medical ontologies for Arabic. We present a new Arabic ontology in the infectious disease domain to support various important applications, including the monitoring of infectious disease spread via social media. This ontology meaningfully integrates the scientific vocabularies of infectious diseases with their informal equivalents. We use ontology learning strategies with manual checking to build the ontology. We applied three statistical methods for term extraction from selected Arabic infectious disease articles: TF-IDF, C-value, and YAKE. We also conducted a study, consulting around 100 individuals, to discover the informal terms related to infectious diseases in Arabic. In future work, we will automatically extract the relations for infectious disease concepts, but for now these are manually created. We report two complementary experiments to evaluate the ontology: a quantitative evaluation of the term extraction results and an additional qualitative evaluation by a domain expert. |
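The TF-IDF term extraction mentioned in the summary above can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names `tf_idf_scores` and `top_terms`, the toy English tokens, and the plain `log(N / df)` weighting are all assumptions made for illustration, and C-value and YAKE are not covered.

```python
import math
from collections import Counter

def tf_idf_scores(documents):
    """Return one {term: score} dict per tokenized document.

    Hypothetical helper: uses relative term frequency and an
    unsmoothed idf of log(N / df), one common TF-IDF variant.
    """
    n_docs = len(documents)
    # Document frequency: number of documents that contain each term.
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in documents:
        counts = Counter(tokens)
        total = len(tokens)
        doc_scores = {}
        for term, count in counts.items():
            tf = count / total
            idf = math.log(n_docs / doc_freq[term])
            doc_scores[term] = tf * idf
        scores.append(doc_scores)
    return scores

def top_terms(documents, k=3):
    """Rank candidate terms by their highest TF-IDF score in any document."""
    best = {}
    for doc_scores in tf_idf_scores(documents):
        for term, score in doc_scores.items():
            best[term] = max(best.get(term, 0.0), score)
    # Sort by descending score, breaking ties alphabetically.
    return sorted(best, key=lambda t: (-best[t], t))[:k]

# Toy, already-tokenized "articles" (illustrative data only).
docs = [
    ["flu", "fever", "cough"],
    ["flu", "vaccine"],
    ["flu", "fever", "rash"],
]
scores = tf_idf_scores(docs)
candidates = top_terms(docs, k=1)
```

In this toy corpus a term that occurs in every document (here "flu") receives a score of zero, which is why TF-IDF favours terms concentrated in a few documents; in practice the extracted candidates would still need the manual checking the summary describes.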
Lama Saleh Ali Alsudias
PhD
Science and Technology
Lancaster University