TY - JOUR
T1 - Pitfalls of bacterial pan-genome analysis approaches: a case study of Mycobacterium tuberculosis and two less clonal bacterial species
AU - Marin, MG
AU - Quinones-Olvera, N
AU - Wippel, C
AU - Behruznia, M
AU - Jeffrey, BM
AU - Harris, M
AU - Mann, BC
AU - Rosenthal, A
AU - Jacobson, KR
AU - Warren, RM
AU - Li, H
AU - Meehan, CJ
AU - Farhat, MR
N1 - FTX; DOAJ; (CC BY)
PY - 2025
Y1 - 2025
N2 - Pan-genome analysis is a fundamental tool for studying bacterial genome evolution; however, the variety in methods used to define and measure the pan-genome poses challenges to the interpretation and reliability of results. Using Mycobacterium tuberculosis, a clonally evolving bacterium with a small accessory genome, as a model system, we systematically evaluated sources of variability in pan-genome estimates. Our analysis revealed that differences in assembly type (short-read versus hybrid), annotation pipeline, and pan-genome software significantly impact predictions of core and accessory genome size. Extending our analysis to two additional bacterial species, Escherichia coli and Staphylococcus aureus, we observed consistent tool-dependent biases but species-specific patterns in pan-genome variability. Our findings highlight the importance of integrating nucleotide- and protein-level analyses to improve the reliability and reproducibility of pan-genome studies across diverse bacterial populations.
AB - Pan-genome analysis is a fundamental tool for studying bacterial genome evolution; however, the variety in methods used to define and measure the pan-genome poses challenges to the interpretation and reliability of results. Using Mycobacterium tuberculosis, a clonally evolving bacterium with a small accessory genome, as a model system, we systematically evaluated sources of variability in pan-genome estimates. Our analysis revealed that differences in assembly type (short-read versus hybrid), annotation pipeline, and pan-genome software significantly impact predictions of core and accessory genome size. Extending our analysis to two additional bacterial species, Escherichia coli and Staphylococcus aureus, we observed consistent tool-dependent biases but species-specific patterns in pan-genome variability. Our findings highlight the importance of integrating nucleotide- and protein-level analyses to improve the reliability and reproducibility of pan-genome studies across diverse bacterial populations.
UR - https://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=itm_wosliteitg&SrcAuth=WosAPI&KeyUT=WOS:001497766700001&DestLinkType=FullRecord&DestApp=WOS_CPL
U2 - 10.1093/bioinformatics/btaf219
DO - 10.1093/bioinformatics/btaf219
M3 - A1: Web of Science-article
C2 - 40341387
SN - 1367-4803
VL - 41
JO - Bioinformatics
JF - Bioinformatics
IS - 5
M1 - btaf219
ER -