Bariatric surgery (BAR) has become a popular treatment for type 2 diabetes mellitus (T2DM) which is among the most critical obesity-related comorbidities. Patients who have bariatric surgery, are exposed to complications after surgery. Furthermore, the mid- to long-term complications after bariatric surgery can be deadly and increase the complexity of managing safety of these operations and healthcare costs. Current studies on BAR complications have mainly used risk scoring for identifying patients who are more likely to have complications after surgery. Though, these studies do not take into considera-tion the imbalanced nature of the data where the size of the class of interest (patients who have complications after surgery) is relatively small. We propose the use of imbalanced classification techniques to tackle the imbalanced bariatric surgery data: synthetic minority oversampling technique (SMOTE), random undersampling, and en-semble learning classification methods including Random Forest, Bagging, and AdaBoost. Moreover, we improve classification performance through using Chi-Squared, Information Gain, and Correlation-based feature selection (CFS) techniques. We study the Premier Healthcare Database with focus on the most-frequent complications includ-ing Diabetes, Angina, Heart Failure, and Stroke. Our results show that the ensemble learning-based classiﬁcation techniques using any feature selection method mentioned above are the best approach for handling the imbalanced nature of the bariatric surgical outcome data. In our evaluation, we ﬁnd a slight preference toward using SMOTE method compared to the random undersampling method. These results demonstrate the potential of machine-learning tools as clinical decision support in identifying risks/outcomes associated with bariatric surgery and their eﬀectiveness in reducing the surgery complications as well as improving patient care.
Razzaghi, Talayeh; Safro, Ilya; Ewing, Joseph; Sadrfaridpour, Ehsan; and Scott, John D., "Predictive Models for Bariatric Surgery Risks with Imbalanced Medical Datasets" (2017). Publications . 32.