|Field of excellence||Academic and research excellence|
Predicting the risk of cancer in adults using supervised machine learning: a scoping review
|Link to the research:|
|Research summary:||
Objectives The purpose of this scoping review is to: (1) identify existing supervised machine learning (ML) approaches to the prediction of cancer in asymptomatic adults; (2) compare the performance of ML models with each other and (3) identify potential gaps in research.
Design Scoping review using the population, concept and context approach.
Search strategy The PubMed search engine was used from inception to 10 November 2020 to identify literature meeting the following inclusion criteria: (1) a general adult (≥18 years) population, either sex, asymptomatic (population); (2) any study using ML techniques to derive predictive models for future cancer risk using clinical and/or demographic and/or basic laboratory data (concept) and (3) original research articles conducted in all settings in any region of the world (context).
Results The search returned 627 unique articles, of which 580 were excluded because they did not meet the inclusion criteria, were duplicates or were related to benign neoplasms. Full-text reviews were conducted for 47 articles, and a final set of 10 articles was included in this scoping review. These 10 highly heterogeneous studies used ML to predict future cancer risk in asymptomatic individuals. All studies reported area under the receiver operating characteristic curve (AUC) values as metrics of model performance, but no study reported measures of model calibration.
Conclusions Research gaps that must be addressed in order to deliver validated ML-based models to assist clinical decision-making include: (1) establishing model generalisability through validation in independent cohorts, including those from low-income and middle-income countries; (2) establishing models for all cancer types; (3) thorough comparisons of ML models with the best available clinical tools to ensure transparency of their potential clinical utility; (4) reporting of model calibration performance and (5) comparisons of different methods on the same cohort to reveal important information about model generalisability and performance.
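The review notes that every included study reported AUC while none reported calibration. As a minimal sketch of why the two are not interchangeable, the Python below computes AUC (discrimination) and the Brier score (one common summary that is sensitive to calibration) on a toy set of predictions. The data, the function names and the choice of the Brier score are illustrative assumptions and are not taken from the reviewed studies.

```python
def auc(y_true, y_score):
    """AUC as the probability that a random case outranks a random control.

    Ties between a case and a control count as half a win.
    """
    cases = [s for y, s in zip(y_true, y_score) if y == 1]
    controls = [s for y, s in zip(y_true, y_score) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted risk and observed outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


# Hypothetical toy data: 1 = later diagnosed with cancer, 0 = not.
y_true = [0, 0, 1, 1]
risk = [0.10, 0.40, 0.35, 0.80]

# Halving every predicted risk keeps the ranking, so AUC is unchanged,
# but the predicted probabilities no longer match the observed outcomes as well.
shrunk = [p / 2 for p in risk]

print("AUC original:", auc(y_true, risk))
print("AUC shrunk:  ", auc(y_true, shrunk))
print("Brier original:", brier_score(y_true, risk))
print("Brier shrunk:  ", brier_score(y_true, shrunk))
```

Because AUC depends only on how predictions are ranked, a model whose predicted risks are systematically too low or too high can report the same AUC as a well-calibrated one, which is why the review lists calibration reporting as a separate research gap.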
19th International Conference on Informatics, Management and Technology in Healthcare
Early Prediction of Neoplasms Using Machine Learning: A Study of Electronic Health Records from the Ministry of National Guard Health Affairs in Saudi Arabia
The early detection and treatment of neoplasms, particularly malignant ones, can save lives. However, identifying those most at risk of developing neoplasms remains challenging. Electronic Health Records (EHR) provide a rich source of “big” data on large numbers of patients. We hypothesised that, in the period preceding a definitive diagnosis, there exists a series of ordered healthcare events captured within EHR data that characterise the onset and progression of neoplasms and that can be exploited to predict future neoplasm occurrence. Using data from the EHR of the Ministry of National Guard Health Affairs (MNG-HA), a large healthcare provider in Saudi Arabia, we aimed to discover health event patterns in EHR data that predict the development of neoplasms in the year prior to diagnosis. After data cleaning, pre-processing, and application of the inclusion and exclusion criteria, 5,466 patients were available for model construction: 1,715 cases and 3,751 controls. Two predictive models were developed, a Decision Tree (DT) and a Random Forest (RF), with age, gender, ethnicity, and ICD-10 chapter (broad disease classification) codes as predictor variables and the presence or absence of neoplasms as the output variable. The common factors associated with a diagnosis of neoplasms within one or more years after their occurrence across all the models were: (1) age at neoplasms/event diagnosis; (2) gender; and patient medical history of (3) diseases of the blood and blood-forming organs and certain disorders involving immune mechanisms, and (4) diseases of the genitourinary system. Model performance assessment showed that the RF achieved the higher Area Under the Curve (AUC = 0.76), whereas the DT was less complex. This study demonstrates that EHR data can be used to predict future neoplasm occurrence.
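To illustrate how ICD-10 chapter codes can serve as predictor variables, the following Python sketch maps diagnosis codes to chapters and turns a patient's prior diagnoses into binary indicators alongside age and gender. It is a hedged illustration, not the study's pipeline: the function names, the gender encoding and the example patient are hypothetical, ethnicity is omitted for brevity, and the chapter table is deliberately partial, covering only the three WHO ICD-10 chapters named in the abstract.

```python
# Partial, illustrative chapter table using WHO ICD-10 category ranges.
# Only the chapters mentioned in the abstract are listed (assumption).
ICD10_CHAPTERS = {
    "II_neoplasms": ("C00", "D48"),
    "III_blood_immune": ("D50", "D89"),
    "XIV_genitourinary": ("N00", "N99"),
}

OUTCOME_CHAPTER = "II_neoplasms"


def chapter_of(code):
    """Return the chapter key for an ICD-10 code, or None if not in the table."""
    category = code.strip().upper()[:3]  # e.g. "D64.9" -> "D64"
    for chapter, (start, end) in ICD10_CHAPTERS.items():
        if start <= category <= end:
            return chapter
    return None


def encode_patient(age, gender, history_codes):
    """Build one predictor row from age, gender and prior diagnosis codes."""
    row = {"age": age, "gender_male": 1 if gender == "M" else 0}
    predictor_chapters = [c for c in ICD10_CHAPTERS if c != OUTCOME_CHAPTER]
    for chapter in predictor_chapters:
        row["hx_" + chapter] = 0
    for code in history_codes:
        chapter = chapter_of(code)
        if chapter in predictor_chapters:
            row["hx_" + chapter] = 1
    return row


def has_neoplasm(diagnosis_codes):
    """Outcome label: 1 if any code falls in the neoplasms chapter, else 0."""
    return int(any(chapter_of(c) == OUTCOME_CHAPTER for c in diagnosis_codes))


# Hypothetical patient: anaemia and kidney disease recorded before the index date.
features = encode_patient(62, "F", ["D64.9", "N18.3", "J45"])
label = has_neoplasm(["C50.9"])
print(features, label)
```

Rows built this way could then be passed to any tree-based classifier. Grouping codes by chapter keeps the feature space small, at the cost of losing detail within each disease group.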