Challenges and Considerations of Developing and Implementing Machine Learning Tools in Clinical Labs
Updated: Nov 6, 2022
He Sarina Yang, PhD, DABCC, Assistant Professor of Pathology and Laboratory Medicine, Weill Cornell Medicine, Cornell University.
Laboratory medicine is data-rich due to the enormous volume of laboratory test results produced by different sections of the clinical laboratory. It is estimated that up to 70% of the data in the electronic health record (EHR) is derived from the clinical laboratory. Most of these data are test results reported as individual numerical or categorical values in a structured format. Patient laboratory test profiles are high-dimensional datasets, as each patient usually has multiple individual laboratory test results generated from a single physician visit as well as longitudinal test results to monitor “wellness” status or to follow one or more disease processes. The sheer volume of the data, including the number of tests and the interdependent, multidimensional relationships among the different test results, is difficult for humans to interpret without computational assistance. Thus, machine learning (ML) has emerged as a powerful tool for analyzing and interpreting massive quantities of laboratory test results as well as integrating clinical findings with laboratory data. In recent years, there has been a surge of interest in employing ML in a variety of applications in clinical laboratories. However, although they are familiar with traditional data approaches, many laboratory professionals are not familiar with the workflow of machine learning analysis, resulting in a knowledge gap with respect to the development, understanding, and use of ML models. This gap carries the risk of generating biased or unrepresentative models, which can lead to misleading clinical conclusions or overestimation of model performance.
In a recent review published in Archives of Pathology and Laboratory Medicine, the clinical medical journal published by the College of American Pathologists, Dr. Yang, Dr. Wang, and colleagues discussed the four major components of creating ML models: data collection, data preprocessing, model development, and model evaluation. They also highlighted many challenges and pitfalls in developing accurate ML models, which could result in misleading clinical impressions or inaccurate estimates of model performance, and provided suggestions and guidance on how to overcome these challenges. In particular, the review addressed how to collect sufficiently large and high-quality data, properly report the dataset characteristics, and combine data from multiple institutions with proper normalization; how to properly handle missing data and decide whether to include or exclude outliers; and how to evaluate the completeness of a dataset. They also discussed the selection of a suitable ML model for a specific clinical question, as well as the evaluation of model performance based on objective criteria. They strongly recommended using multiple criteria, rather than a single criterion, to evaluate model performance. Evaluation using external datasets and/or prospectively collected data was preferred because it gives a better understanding of model generalizability. In addition, they demonstrated the causes of model overfitting and underspecification in clinical scenarios.
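The recommendation to evaluate a model with multiple criteria can be illustrated with a minimal sketch. The code below is not taken from the review; the function and class names (`evaluate_binary`, `BinaryMetrics`) and the example labels are hypothetical and chosen only for illustration. It assumes a binary classification task with labels coded as 0 (negative) and 1 (positive).

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class BinaryMetrics:
    sensitivity: float  # TP / (TP + FN)
    specificity: float  # TN / (TN + FP)
    ppv: float          # TP / (TP + FP), positive predictive value
    npv: float          # TN / (TN + FN), negative predictive value
    accuracy: float     # (TP + TN) / all samples


def _ratio(numerator: int, denominator: int) -> float:
    # Return NaN rather than raising when a metric is undefined
    # (for example, PPV when the model predicts no positives).
    return numerator / denominator if denominator else float("nan")


def evaluate_binary(y_true: Sequence[int], y_pred: Sequence[int]) -> BinaryMetrics:
    """Compute several complementary metrics from 0/1 true and predicted labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    return BinaryMetrics(
        sensitivity=_ratio(tp, tp + fn),
        specificity=_ratio(tn, tn + fp),
        ppv=_ratio(tp, tp + fp),
        npv=_ratio(tn, tn + fn),
        accuracy=_ratio(tp + tn, len(pairs)),
    )


if __name__ == "__main__":
    # Made-up labels: 5 of 100 patients are truly positive, and a trivial
    # "model" predicts negative for every patient.
    y_true = [1] * 5 + [0] * 95
    y_pred = [0] * 100
    print(evaluate_binary(y_true, y_pred))
```

In this made-up example the accuracy is 0.95 even though the sensitivity is 0, because the trivial model misses every positive patient. A single criterion such as accuracy would therefore overstate the model's usefulness on an imbalanced dataset, which is one reason to report several metrics together.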
The role of laboratorians is not just to provide data but also to use their clinical knowledge with the data to guide model development, to correctly interpret the model, and to evaluate its performance in the patient care setting. The future of personalized and generalized medicine requires interdisciplinary collaboration between laboratory medicine and data science experts to create innovative, accurate ML models, which will advance the medical field, provide needed support in periods of health care crisis, and better treat individual patients.
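In a recent review published in Archives of Pathology and Laboratory Medicine, the official journal of the College of American Pathologists, Dr. He Yang, Dr. Fei Wang, and colleagues discussed in depth the four main steps in creating a machine learning model: data collection, data preprocessing, model development, and model evaluation. They concisely analyzed the technical problems and common oversights that can arise at each step and offered concrete, practical suggestions and guidance. The review discusses how to collect large amounts of high-quality data and properly assess the distributional characteristics of the data; how to standardize and integrate data from multiple hospitals; how to properly handle missing information and identify rare outliers so that they do not distort the analysis; and how to use clinical knowledge to evaluate the completeness of a dataset. The article further discusses selecting a machine learning model matched to the amount of available data to address a specific clinical question, and fairly evaluating model accuracy based on objective criteria. The review strongly recommends evaluating model performance with multiple criteria instead of a single criterion, and notes that evaluation with external or prospectively collected data better supports a model's wider application. In addition, the article explores the causes of model overfitting and of validated models failing to generalize to clinical cases.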
He S. Yang, Daniel D. Rhoads, Jorge Sepulveda, Chengxi Zang, Amy Chadburn, Fei Wang. Building the Model: Challenges and Considerations of Developing and Implementing Machine Learning Tools for Clinical Laboratory Medicine Practice. Archives of Pathology and Laboratory Medicine, 2022.