
Developing a Machine Learning Model to Improve Test Utilization and Laboratory Stewardship

Author: He Sarina Yang, PhD, DABCC, Associate Professor of Clinical Pathology and Laboratory Medicine, Weill Cornell Medical College, Cornell University


He Sarina Yang, PhD, MBBS, DABCC (CC,TC)

Associate Professor of Clinical Pathology and Laboratory Medicine

Director, Clinical Chemistry service

Director, Toxicology and Therapeutic Drug Monitoring service

Weill Cornell Medicine


Approximately 90% of hypercalcemia cases are attributed to primary hyperparathyroidism or malignancy-related hypercalcemia. The latter is primarily mediated by parathyroid hormone-related peptide (PTHrP), which stimulates calcium resorption from bone and calcium reabsorption in the kidneys. PTHrP-mediated hypercalcemia is most frequently caused by malignant solid-organ tumors and indicates a poor prognosis. Clinically, measuring PTHrP can aid in diagnosing humoral hypercalcemia of malignancy when the source of the elevated calcium is not immediately evident. A logical, systematic workup of a patient with hypercalcemia begins with confirmation of the elevated calcium, ideally by measuring ionized calcium, followed by measurement of intact parathyroid hormone (PTH). An elevated PTH indicates primary hyperparathyroidism, so PTHrP measurement is necessary only when the cause of the hypercalcemia remains unclear.

In practice, however, an elevated total calcium often prompts simultaneous orders for both PTH and PTHrP, and PTHrP testing is frequently ordered for patients with a low pretest probability of humoral hypercalcemia of malignancy. Such inappropriate ordering increases healthcare costs, drains laboratory resources, and can trigger unnecessary patient anxiety. In response, many institutions employ a manual, rule-based approach in which laboratory medicine residents review PTH and calcium results and attempt to identify inappropriate orders where the likelihood of an abnormal PTHrP result is low (e.g., high calcium together with high PTH). This approach is labor-intensive and time-consuming. The goal of the study carried out by Dr. Yang and her colleagues at Weill Cornell Medicine was therefore to develop a machine learning (ML) model that identifies inappropriate PTHrP orders and thereby improves PTHrP test utilization.
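The manual, rule-based review described above can be pictured as a simple screen over PTH and calcium results. The sketch below is only an illustration of that idea: the cutoff values, the `PthrpOrder` class, and the `is_low_yield_order` function are assumptions made for this example and are not taken from the study or from any institution's protocol.

```python
from dataclasses import dataclass

# Illustrative cutoffs only: these upper reference limits are assumptions for
# this sketch and would need to be replaced by each laboratory's own intervals.
CALCIUM_UPPER_MG_DL = 10.2
PTH_UPPER_PG_ML = 65.0


@dataclass
class PthrpOrder:
    """Results available when a PTHrP order is reviewed (hypothetical record)."""
    total_calcium_mg_dl: float
    intact_pth_pg_ml: float


def is_low_yield_order(order: PthrpOrder) -> bool:
    """Return True when the order matches the 'high calcium and high PTH' pattern.

    An elevated PTH already points to primary hyperparathyroidism, so an
    abnormal PTHrP result is unlikely and the order is a candidate for review.
    """
    high_calcium = order.total_calcium_mg_dl > CALCIUM_UPPER_MG_DL
    high_pth = order.intact_pth_pg_ml > PTH_UPPER_PG_ML
    return high_calcium and high_pth


orders = [
    PthrpOrder(total_calcium_mg_dl=11.8, intact_pth_pg_ml=140.0),  # high Ca, high PTH
    PthrpOrder(total_calcium_mg_dl=12.5, intact_pth_pg_ml=8.0),    # high Ca, low PTH
]
flags = [is_low_yield_order(o) for o in orders]
print(flags)
```

A two-variable rule like this is easy to audit, but someone still has to apply it to every order, and it ignores every other laboratory result available at the time of ordering.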


In this recent study published in Clinical Chemistry, Dr. Yang and her colleagues developed an ML model that predicts whether a PTHrP result will be normal or abnormal, using routine laboratory test results available at the time the PTHrP test was ordered. The model outputs a score indicating the likelihood of a normal or abnormal PTHrP result. It achieved an area under the ROC curve of 0.936 in the original dataset provided by the Washington University School of Medicine in St. Louis (WUSM). The collaborator at WUSM found that the model significantly improved on their manual approach for identifying patients at risk for an abnormal PTHrP result. If implemented, the proposed ML model has the potential to complement the current workup algorithm by detecting inappropriate PTHrP orders, facilitating automation of the decision-making process and improving test utilization and laboratory stewardship. Furthermore, the data-driven ML approach picks up variables that are not part of the existing workup algorithm, which consists of intact PTH and total calcium. For instance, patients with hypercalcemia of malignancy may have lower albumin levels, partly because of liver dysfunction, nephrotic syndrome, or malnutrition. Hypercalcemia of malignancy may also be associated with a systemic inflammatory response, which leads to higher white blood cell (WBC) counts and lower albumin levels. The clinical interpretability of the ML model is crucial, as laboratorians and clinicians prefer models that they can comprehend and that align with their knowledge and experience.
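To make the area under the ROC curve more concrete, the following sketch computes it from first principles. The AUC is the probability that a randomly chosen abnormal case receives a higher score than a randomly chosen normal case. The scores and labels are invented for illustration, and the convention that a higher score means "more likely abnormal" is an assumption of this example, not a detail reported in the study.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via pairwise comparison.

    Equals the fraction of (abnormal, normal) pairs in which the abnormal
    case (label 1) scores higher than the normal case (label 0), with ties
    counted as one half.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one case of each class")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical model scores and observed PTHrP outcomes (1 = abnormal).
scores = [0.92, 0.15, 0.60, 0.30, 0.81, 0.05]
labels = [1, 0, 1, 0, 0, 0]
auc = roc_auc(scores, labels)
print(auc)  # 7 of the 8 abnormal/normal pairs are ranked correctly: 0.875
```

Read this way, an AUC of 0.936 means that in roughly 94 of every 100 such pairs the model ranks the patient with the abnormal PTHrP result above the patient with the normal one.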

Before an ML model can be deployed in clinical practice, its generalizability and transportability need to be assessed, i.e., its ability to perform well on independent external datasets collected from different geographic or demographic populations or different hospital settings. Differences among laboratories, including instrument platforms, testing methodologies, sample handling, and use of send-out laboratories, pose technical challenges for model generalization. In this study, the PTHrP predictive model was evaluated in two independent external datasets collected from Weill Cornell Medicine (WCM) and the University of Texas M.D. Anderson Cancer Center (MDACC). Dr. Yang and her colleagues proposed site-specific customization strategies, such as model re-training, re-building, and fine-tuning with local data, to improve the model’s predictive performance in external datasets.
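As a rough illustration of what fine-tuning with local data can mean, the sketch below continues gradient descent on a small logistic regression, starting from weights learned elsewhere. Everything in it is an assumption made for the example: this summary does not state the study's model type, so logistic regression is only a stand-in, and the features, weights, and data are invented.

```python
import math
from typing import List, Sequence

Row = Sequence[float]


def predict(weights: Sequence[float], row: Row) -> float:
    """Logistic model; weights[0] is the intercept."""
    z = weights[0] + sum(w * x for w, x in zip(weights[1:], row))
    return 1.0 / (1.0 + math.exp(-z))


def log_loss(weights: Sequence[float], rows: Sequence[Row], labels: Sequence[int]) -> float:
    """Mean negative log-likelihood of the labels under the model."""
    total = 0.0
    for row, y in zip(rows, labels):
        p = min(max(predict(weights, row), 1e-12), 1 - 1e-12)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(rows)


def fine_tune(weights: Sequence[float], rows: Sequence[Row], labels: Sequence[int],
              lr: float = 0.1, steps: int = 200) -> List[float]:
    """Continue batch gradient descent from existing weights using local data."""
    w = list(weights)
    n = len(rows)
    for _ in range(steps):
        grad = [0.0] * len(w)
        for row, y in zip(rows, labels):
            err = predict(w, row) - y
            grad[0] += err / n
            for j, x in enumerate(row):
                grad[j + 1] += err * x / n
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


# Hypothetical weights from the development site: intercept plus two
# standardized features (for example, calcium and albumin z-scores).
source_weights = [0.0, 1.5, -0.5]

# Hypothetical local data in which the second feature matters more.
local_rows = [(1.0, -1.0), (0.8, 0.5), (-0.5, -0.9), (-1.0, 0.7), (0.2, -0.6), (-0.3, 0.9)]
local_labels = [1, 0, 1, 0, 1, 0]

loss_before = log_loss(source_weights, local_rows, local_labels)
tuned_weights = fine_tune(source_weights, local_rows, local_labels)
loss_after = log_loss(tuned_weights, local_rows, local_labels)
print(loss_before, loss_after)
```

In this toy setup, re-training would correspond to calling `fine_tune` from all-zero weights on the local data alone. Either way, any gain should be confirmed on held-out local data, not on the data used for the update.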

In summary, this study presents a workflow for data collection, data preprocessing, model development, and evaluation, as well as strategies to improve the model’s performance at additional sites. ML offers promise to improve PTHrP test utilization while relieving the burden of manual review.


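Developing a Machine Learning Model to Improve Test Utilization and Laboratory Management


Approximately 90% of hypercalcemia cases are diagnosed as primary hyperparathyroidism or malignancy-related hypercalcemia. The latter is mainly mediated by parathyroid hormone-related peptide (PTHrP), which stimulates the release of calcium from bone and promotes renal reabsorption of calcium. PTHrP-mediated hypercalcemia is most often caused by malignant solid-organ tumors and indicates a poor prognosis. Clinically, measuring PTHrP levels helps diagnose hypercalcemia of malignancy. The workup of a patient with hypercalcemia should include reconfirming the blood calcium concentration and then measuring parathyroid hormone (PTH). If the PTH level is high, the patient is diagnosed with primary hyperparathyroidism. PTHrP needs to be measured only when the cause of the hypercalcemia is not apparent. In actual clinical practice, however, clinicians often order PTH and PTHrP at the same time for patients with high blood calcium, and many of these PTHrP tests are unnecessary. In other words, the pretest probability for PTHrP testing is usually low. Many hospitals therefore manually review blood calcium and PTH results to identify inappropriate PTHrP orders. This approach is time-consuming and labor-intensive. Excessive PTHrP testing increases healthcare costs, wastes laboratory resources, and may cause unnecessary patient anxiety. Researchers at Weill Cornell Medicine therefore set out to develop a machine learning (ML) model to help laboratory staff identify inappropriate PTHrP orders and thereby improve PTHrP test utilization.

In an article recently published in Clinical Chemistry, Dr. Yang and her colleagues developed a machine learning model. The model uses the results of patients' routine laboratory blood tests to predict the likelihood that a PTHrP result will be normal or abnormal. It achieved an area under the ROC curve of 0.936 in the original dataset provided by the Washington University School of Medicine in St. Louis (WUSM). The WUSM collaborator noted that the model was a significant improvement over their manual method for identifying patients at risk for an abnormal PTHrP result. If implemented, the ML model could therefore complement existing clinical diagnostic methods, facilitate automation of the decision-making process, and improve PTHrP test utilization. Beyond the two main factors, PTH and blood calcium, the machine learning model also draws on other clinical test results. For example, patients with hypercalcemia of malignancy may show lower albumin levels because of liver dysfunction, nephrotic syndrome, or malnutrition. In addition, hypercalcemia of malignancy may be associated with a systemic inflammatory response, leading to higher WBC levels and lower albumin levels. The clinical interpretability of the ML model is crucial, because laboratory technologists and clinicians prefer to use models that agree with their clinical knowledge and experience.

Before an ML model is applied in clinical practice, its generalizability and transportability need to be assessed, that is, how well the model performs on independent external datasets collected from different geographic or demographic populations or different hospital settings. Differences among laboratories, including instrument platforms, testing methods, sample handling, or the use of send-out laboratories, pose technical challenges for model generalization. In this study, the authors evaluated the performance of the PTHrP predictive model on datasets collected at two independent hospitals, Weill Cornell Medicine (WCM) and the University of Texas M.D. Anderson Cancer Center (MDACC). The authors proposed several novel approaches to improve the model's predictive performance on external datasets, such as re-training, re-building, and fine-tuning the model with local data.

In summary, this study presents a workflow for building a machine learning model from clinical laboratory data, covering data collection, data preprocessing, model development and evaluation, and strategies for improving the model's performance on other datasets. ML has the potential to improve the efficiency of PTHrP test utilization while reducing the burden of manual review.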


Reference:

1) Yang HS, Pan W, Wang Y, Zaydman MA, Spies NC, Guise TA, Meng QH, Wang F. Generalizability of a machine learning model for improving utilization of parathyroid hormone-related peptide testing across multiple clinical centers. Clinical Chemistry. 2023, 69(11):1260-1269.

