BACKGROUND: Hepatitis C virus (HCV) infection is a major global health problem. WHO guidelines recommend screening all people living with HIV for hepatitis C. Given the limited health resources in low- and middle-income countries, targeted HCV screening is potentially a more feasible strategy for many HIV cohorts. Hence, there is interest in developing clinician-friendly tools for selecting the subgroups of HIV patients for whom HCV testing should be prioritized. Several statistical methods have been developed to predict a binary outcome, and multiple studies have compared the performance of such predictive models, but with inconsistent results.
METHODS: A cross-sectional HCV diagnostic study was conducted in the HIV cohort of Sihanouk Hospital Center of Hope in Phnom Penh, Cambodia. We compared the performance of logistic regression, Spiegelhalter-Knill-Jones and classification and regression trees (CART) in predicting hepatitis C co-infection in this cohort. For each method, we estimated the proportion of HCV co-infections that would be missed. To correct for over-optimism, we used the leave-one-out bootstrap estimator for this quantity.
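To illustrate why a leave-one-out bootstrap corrects over-optimism, here is a minimal Python sketch. It assumes the quantity of interest is averaged per HCV-positive patient, and that each such patient is scored only by models fitted on bootstrap samples that did not contain that patient. The toy model `fit_group_risk`, its 0.05 referral threshold and the data are hypothetical stand-ins for the logistic regression, SKJ and CART models, not the study's actual code.

```python
import random
from typing import Callable, Sequence

Predictor = Callable[[int], bool]  # maps a predictor value to "refer for HCV testing?"


def fit_group_risk(xs: Sequence[int], ys: Sequence[int], threshold: float = 0.05) -> Predictor:
    """Hypothetical toy model (one binary predictor): estimate P(HCV+ | x) per
    level with +0.5/+1 smoothing and refer whenever that risk reaches `threshold`."""
    risk = {}
    for level in (0, 1):
        n = sum(1 for x in xs if x == level)
        pos = sum(1 for x, y in zip(xs, ys) if x == level and y == 1)
        risk[level] = (pos + 0.5) / (n + 1.0)
    return lambda x: risk[x] >= threshold


def loo_bootstrap_missed(
    xs: Sequence[int],
    ys: Sequence[int],
    fit: Callable[[Sequence[int], Sequence[int]], Predictor],
    n_boot: int = 200,
    seed: int = 0,
) -> float:
    """Leave-one-out bootstrap estimate of the proportion of HCV+ patients missed."""
    rng = random.Random(seed)
    n = len(xs)
    missed_sum = [0.0] * n  # summed "missed" indicators from out-of-bag fits
    oob_count = [0] * n     # number of bootstrap samples that left patient i out
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample patients with replacement
        in_bag = set(idx)
        predict = fit([xs[j] for j in idx], [ys[j] for j in idx])
        for i in range(n):
            if ys[i] == 1 and i not in in_bag:
                missed_sum[i] += 0.0 if predict(xs[i]) else 1.0
                oob_count[i] += 1
    per_case = [missed_sum[i] / oob_count[i] for i in range(n) if ys[i] == 1 and oob_count[i] > 0]
    if not per_case:
        raise ValueError("no HCV+ patient was ever out of bag")
    return sum(per_case) / len(per_case)


# Hypothetical toy data: 20 patients with x = 1 (6 HCV+), 20 with x = 0 (1 HCV+).
xs = [1] * 20 + [0] * 20
ys = [1] * 6 + [0] * 14 + [1] * 1 + [0] * 19
full_model = fit_group_risk(xs, ys)
apparent = sum(1 for x, y in zip(xs, ys) if y == 1 and not full_model(x)) / sum(ys)
loo = loo_bootstrap_missed(xs, ys, fit_group_risk)
print(f"apparent: {apparent:.3f}, leave-one-out bootstrap: {loo:.3f}")
```

In this toy example the model fitted on all data misses no co-infection, because the single HCV-positive patient with x = 0 lifts the estimated risk of that group above the threshold. When that patient is left out of the training sample, the group risk falls below the threshold and the patient is missed, so the bootstrap estimate comes out at roughly 1/7.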
RESULTS: Logistic regression misses the fewest HCV co-infections (8%) but would still refer 98% of HIV patients for HCV testing. Spiegelhalter-Knill-Jones (SKJ) and CART miss 12% and 29% of HCV co-infections, respectively, but would refer only about 30% of patients for HCV testing.
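The two proportions reported above can be computed from a model's referral decisions as follows. This is a minimal Python sketch; the function name `missed_and_referred` and the example data are hypothetical.

```python
from typing import Sequence, Tuple


def missed_and_referred(refer: Sequence[bool], hcv: Sequence[int]) -> Tuple[float, float]:
    """Return (proportion of HCV co-infections missed, proportion of patients referred).

    missed   = HCV+ patients not referred / all HCV+ patients   (1 - sensitivity)
    referred = patients flagged for HCV testing / all patients
    """
    positives = [r for r, y in zip(refer, hcv) if y == 1]
    if not positives:
        raise ValueError("need at least one HCV co-infection")
    missed = sum(1 for r in positives if not r) / len(positives)
    referred = sum(1 for r in refer if r) / len(refer)
    return missed, referred


# Hypothetical example: 10 patients, 2 of them HCV co-infected, 3 referred.
refer = [True, False, True, False, False, False, False, False, False, True]
hcv = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
print(missed_and_referred(refer, hcv))  # (0.5, 0.3)
```

The two numbers have to be read together: a rule that refers every patient misses no co-infection but has a referral proportion of 1, which is why a low miss rate combined with a 98% referral rate offers little advantage over screening everyone.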
CONCLUSIONS: In our dataset, logistic regression has the highest log-likelihood and the smallest proportion of HCV co-infections missed, but Spiegelhalter-Knill-Jones has the highest area under the ROC curve. The likelihood ratios estimated by Spiegelhalter-Knill-Jones might be easier for clinicians to interpret than the odds ratios estimated by logistic regression or the decision tree produced by CART. CART is the most flexible method: no model has to be specified regarding the presence of interactions or the form of the relationship between outcome and predictor variables.
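The likelihood-ratio reasoning that makes SKJ attractive to clinicians can be sketched as follows. Each predictor level receives a weight of evidence, ln LR, and a patient's posterior log-odds equals the prior log-odds plus the sum of the weights, i.e. posterior odds = prior odds × LR1 × LR2 × ... This multiplication treats the predictors as conditionally independent. The function names, the 0.5 continuity correction and the toy data are assumptions made for illustration. The published SKJ method also applies further adjustments (for example, shrinking the weights to allow for correlated predictors), which this simplified sketch omits.

```python
import math
from typing import Dict, Sequence


def log_likelihood_ratios(x: Sequence[int], y: Sequence[int], correction: float = 0.5) -> Dict[int, float]:
    """Weight of evidence ln LR for each level (0/1) of one binary predictor.

    LR(level) = P(x = level | HCV+) / P(x = level | HCV-), with `correction`
    added to every cell of the 2x2 table so that empty cells stay finite.
    """
    pos_total = sum(y)
    neg_total = len(y) - pos_total
    weights = {}
    for level in (0, 1):
        a = sum(1 for xi, yi in zip(x, y) if xi == level and yi == 1) + correction
        b = sum(1 for xi, yi in zip(x, y) if xi == level and yi == 0) + correction
        lr = (a / (pos_total + 2 * correction)) / (b / (neg_total + 2 * correction))
        weights[level] = math.log(lr)
    return weights


def posterior_probability(prior: float, patient: Sequence[int], weights: Sequence[Dict[int, float]]) -> float:
    """Add each predictor's weight of evidence to the prior log-odds."""
    log_odds = math.log(prior / (1 - prior))
    for value, w in zip(patient, weights):
        log_odds += w[value]
    return 1 / (1 + math.exp(-log_odds))


# Hypothetical toy data: one binary predictor, 2 HCV+ among 8 patients.
x = [1, 1, 1, 0, 0, 0, 0, 0]
y = [1, 1, 0, 0, 0, 0, 0, 0]
w = log_likelihood_ratios(x, y)
prior = sum(y) / len(y)
print(w)                                     # ln LR > 0 for x = 1, < 0 for x = 0
print(posterior_probability(prior, [1], [w]))
```

Here LR(x = 1) = 35/9, so a prior probability of 0.25 (odds 1/3) becomes posterior odds of 35/27, a probability of about 0.56. A clinician can carry out the same calculation by hand.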