Predictive modeling of patient no-show

Introduction

A no-show is a patient who fails to attend a scheduled appointment. Reducing no-shows is one of the targets for improving the quality of care. No-shows lengthen waiting times for other patients and can cause patients to miss urgent appointments. They also waste clinic resources, including the time of physicians and other healthcare practitioners. It is estimated that a doctor loses USD 500 for every missed visit.

Nearly 30% of medical appointments end in a cancellation or a no-show. This results in a considerable loss for hospitals and clinics that employ specialists paid by the hour.

We discuss how saal.ai tackles this issue in the Middle East, using machine learning to build a model that predicts whether a patient will show up. The model is expected to make this prediction better than a human would. It is not a rule-based technique: the show/no-show prediction is learned from data by AI algorithms.

Data Preprocessing, Analysis and Insights

Data analysis plays a foundational role in developing the larger narrative of how AI can offer businesses the value they seek. The data, obtained from a local hospital and covering a period of ten months, contains almost 57,000 observations and more than 20 features per patient, including age, patient ID, booking times, appointment dates, and a column that labels each appointment as show or no-show. The raw data is analyzed to identify patterns across patient groups and variables, to understand what causes patients to miss their appointments. The identified patterns, coupled with an understanding of how the clinic currently operates, help in categorizing patients as show or no-show. Insights from the analysis were fundamental in creating better predictive models.

Data preprocessing is a multi-step, iterative process in which cleaning, feature generation, and handling of imbalanced data are performed repeatedly to ensure we have a sample worth building a model on.

We retain only the outpatient data. One main concern when we addressed the missing values (NaN) was that almost 50% of the data needed to be discarded. The data was, however, bootstrapped and checked with various statistical tests to ensure that discarding such a massive portion would not affect the model's predictive power.

We used the existing features and also generated other variables that were expected to affect model accuracy. This was done either by regrouping the values of some columns into categories depending on their correlation with our labels, or by considering other factors such as traffic hours, school timing, and weekdays.

After cleaning the data and doing feature engineering, we ended up with 16,000 observations, of which 20% were no-show and 80% show. We used a combination of oversampling (bootstrapping the smaller class) and under-sampling (of the larger class) to balance the dataset and improve the final model accuracy.
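The balancing step can be illustrated with a minimal sketch. It is not the pipeline used in the project: the row layout, the `no_show` label name, the helper names, and the choice to bring both classes to the midpoint size are all assumptions made for illustration. It under-samples the larger class without replacement and bootstraps the smaller class with replacement.

```python
import random


def _resample(group, n, rng):
    """Return exactly n rows drawn from group (hypothetical helper)."""
    if n <= len(group):
        # Under-sampling: draw n rows without replacement.
        return rng.sample(group, n)
    # Oversampling: keep every original row, then bootstrap the shortfall
    # by drawing with replacement.
    return list(group) + rng.choices(group, k=n - len(group))


def balance_classes(rows, label_key="no_show", seed=0):
    """Balance a binary dataset so both classes have the same size.

    Assumption: each row is a dict and label_key holds 1 (no-show) or 0 (show).
    Both classes are resampled to the midpoint of their original sizes.
    """
    rng = random.Random(seed)
    shows = [r for r in rows if r[label_key] == 0]
    no_shows = [r for r in rows if r[label_key] == 1]
    if not shows or not no_shows:
        raise ValueError("both classes must be present")
    target = (len(shows) + len(no_shows)) // 2
    balanced = _resample(shows, target, rng) + _resample(no_shows, target, rng)
    rng.shuffle(balanced)
    return balanced


# Toy data with an 80/20 show/no-show split (values are made up).
rows = [{"patient_id": i, "no_show": 1 if i % 5 == 0 else 0} for i in range(1000)]
balanced = balance_classes(rows)
counts = {label: sum(r["no_show"] == label for r in balanced) for label in (0, 1)}
print(counts)  # {0: 500, 1: 500}
```

A common precaution with this kind of resampling is to apply it to the training split only, so that the test set keeps the natural class ratio; the sketch leaves that split to the caller.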

Model

We tried different models, including logistic regression, decision trees, and support vector machines, and compared their accuracy. Almost all of them gave an F1 score below 75%, though with different areas under the ROC curve (AUC). A gradient boosting algorithm tuned to fit our data improved the F1 score to 84% on a test dataset that the model had never encountered, and achieved an AUC of 0.9.
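For readers unfamiliar with the two metrics, the following sketch shows how an F1 score and an ROC AUC can be computed from labels and model scores. It is an illustration only: the function names, the toy labels and scores, the 0.5 cut-off, and the choice of no-show (1) as the positive class are assumptions, not details taken from the project.

```python
def f1_score(y_true, y_pred, positive=1):
    """F1 = harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true, scores, positive=1):
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win. This pairwise form is quadratic in the sample
    size, which is fine for a small illustration.
    """
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example: 1 = no-show, 0 = show; scores are predicted no-show probabilities.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.2, 0.7, 0.4, 0.5, 0.1, 0.8, 0.3]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # 0.5 is a placeholder cut-off

f1 = f1_score(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(round(f1, 4), auc)  # 0.75 0.9375
```

F1 depends on the chosen cut-off while AUC does not, which is why models with similar F1 scores can still differ in AUC.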

Model interpretation is the most crucial part from the client’s perspective, as it gives an idea of the reasons behind a patient no-show. Our model outputs probabilities for various no-show reasons and uses an informed threshold to assign the likely reason for any given patient. It also provides analytics on these reasons across the population, helping hospitals either reschedule ahead of the appointment time or plan their capacity when dealing with a group of patients.
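The thresholding and population-level summary described above might look like the sketch below. Everything specific in it is hypothetical: the reason names (borrowed from the engineered features mentioned earlier), the probabilities, the 0.5 threshold, and the function names. How the actual threshold was chosen is not covered here.

```python
from collections import Counter


def assign_reason(reason_probs, threshold=0.5):
    """Return the most probable reason if it clears the threshold, else None.

    reason_probs maps a (hypothetical) reason name to its predicted probability.
    """
    if not reason_probs:
        return None
    reason, prob = max(reason_probs.items(), key=lambda item: item[1])
    return reason if prob >= threshold else None


def summarize_reasons(population, threshold=0.5):
    """Count assigned reasons across patients; None is reported as 'unassigned'."""
    return Counter(assign_reason(p, threshold) or "unassigned" for p in population)


# Made-up per-patient outputs for illustration.
population = [
    {"traffic_hours": 0.72, "school_timing": 0.10, "weekday": 0.18},
    {"traffic_hours": 0.30, "school_timing": 0.35, "weekday": 0.35},
    {"traffic_hours": 0.05, "school_timing": 0.15, "weekday": 0.80},
    {"traffic_hours": 0.61, "school_timing": 0.29, "weekday": 0.10},
]

summary = summarize_reasons(population)
print(dict(summary))
```

In this toy run two patients are assigned `traffic_hours`, one `weekday`, and one stays unassigned because no reason reaches the threshold. A summary of this kind is what would feed capacity planning for a group of patients.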
