Can minimal clinically important differences in patient reported outcome measures be predicted by machine learning in patients with total knee or hip arthroplasty? A systematic review

Langenberger, Benedikt; Thoma, Andreas; Vogt, Verena

FG Management im Gesundheitswesen

Objectives: To systematically review studies using machine learning (ML) algorithms to predict whether patients undergoing total knee arthroplasty (TKA) or total hip arthroplasty (THA) achieve an improvement at least as high as the minimal clinically important difference (MCID) in patient-reported outcome measures (PROMs) (a classification problem). Methods: Studies were eligible for inclusion if they collected PROMs both pre- and postintervention, reported the method of MCID calculation, and applied ML. ML was defined as a family of models that automatically learn from data when selecting features or identifying nonlinear relations or interactions. Predictive performance had to be assessed using common metrics. Studies were searched in MEDLINE, PubMed Central, Web of Science Core Collection, Google Scholar and Cochrane Library. Study selection and risk of bias (ROB) assessment were conducted by two independent researchers. Results: A total of 517 studies were eligible for title and abstract screening, of which 18 qualified for full-text screening. Finally, six studies were included. The most commonly applied ML algorithms were random forest and gradient boosting. Overall, eleven different ML algorithms were applied across all papers. All studies reported at least fair predictive performance, with two reporting excellent performance. Sample size varied widely across studies, ranging from 587 to 34,110 individuals. PROMs also varied widely across studies, with sixteen applied to TKA and six applied to THA. No single PROM was used in all studies. All studies calculated MCIDs for PROMs using anchor-based or distribution-based methods or referred to literature that did so. Five studies reported variable importance for their models. Two studies were at high risk of bias. Discussion: No ML model was identified as performing best on the stated problem, nor can any PROM be said to be the most predictable. Reporting standards must be improved to reduce risk of bias and improve comparability with other studies.