Recent Ebola virus disease outbreaks affirm the dire need for treatments with proven efficacy. Randomized controlled clinical trials remain the gold standard but may be difficult to conduct during disease outbreaks because of ethical concerns and challenging field conditions. In the absence of a randomized control group, one possibility is to construct a control group through statistical modeling. Such a model-based reference control would be credible only if its mortality risk matched that of the experimental group in the absence of treatment. One way to test this counterfactual assumption is to evaluate whether nonrandomized control groups from different clinical studies are reasonably similar, which might suggest that a future control group would also be comparable. We evaluated similarity across six clinical studies conducted during the 2013-2016 West Africa outbreak of Ebola virus disease. These studies evaluated favipiravir, the biologic ZMapp, the antimalarial drug amodiaquine, or administration of convalescent plasma or convalescent whole blood. We compared the nonrandomized control groups of these six studies, which comprised 1147 individuals infected with Ebola virus. We found considerable heterogeneity that persisted after statistical adjustment for prognostic variables. Mortality risk varied widely (31% to 66%) across the nonrandomized control arms of these six studies. Models adjusting for baseline covariates (age, sex, and cycle threshold, a proxy for viral load) failed to sufficiently recalibrate these studies. Our findings highlight concerns about drawing invalid conclusions when comparing nonrandomized control groups to cohorts receiving experimental treatments.