Abstract
SUMMARY: Pan-genome analysis is a fundamental tool for studying bacterial genome evolution; however, the variety of methods used to define and measure the pan-genome makes results harder to interpret and less reliable. Using Mycobacterium tuberculosis, a clonally evolving bacterium with a small accessory genome, as a model system, we systematically evaluated sources of variability in pan-genome estimates. Our analysis revealed that differences in assembly type (short-read versus hybrid), annotation pipeline, and pan-genome software significantly impact predictions of core and accessory genome size. Extending our analysis to two additional bacterial species, Escherichia coli and Staphylococcus aureus, we observed consistent tool-dependent biases but species-specific patterns in pan-genome variability. Our findings highlight the importance of integrating nucleotide- and protein-level analyses to improve the reliability and reproducibility of pan-genome studies across diverse bacterial populations.
AVAILABILITY AND IMPLEMENTATION: Panqc is freely available under an MIT license at https://github.com/maxgmarin/panqc.
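As an illustration of why core and accessory genome sizes can shift with the annotation pipeline, the following minimal sketch classifies gene clusters from a presence/absence mapping. It is not the panqc implementation: the function name `classify_genes`, the 99% core threshold, the genome IDs, and the gene labels are all assumptions chosen for the example. The second input mimics a pipeline that annotates one gene as a separate fragment in one genome, which inflates the accessory count.

```python
from typing import Dict, Set, Tuple


def classify_genes(
    presence: Dict[str, Set[str]], core_threshold: float = 0.99
) -> Tuple[Set[str], Set[str]]:
    """Split gene clusters into core and accessory sets.

    presence maps a genome ID to the set of gene cluster IDs found in it.
    A cluster is "core" if the fraction of genomes carrying it is at least
    core_threshold (an assumed cutoff, not taken from the paper).
    """
    n_genomes = len(presence)
    counts: Dict[str, int] = {}
    for genes in presence.values():
        for gene in genes:
            counts[gene] = counts.get(gene, 0) + 1
    core = {g for g, c in counts.items() if c / n_genomes >= core_threshold}
    accessory = set(counts) - core
    return core, accessory


# Hypothetical results from two annotation pipelines on the same three genomes.
pipeline_a = {
    "genome1": {"dnaA", "rpoB", "pe1"},
    "genome2": {"dnaA", "rpoB"},
    "genome3": {"dnaA", "rpoB", "pe1"},
}
# Pipeline B labels the pe1 copy in genome3 as a distinct fragment.
pipeline_b = {
    "genome1": {"dnaA", "rpoB", "pe1"},
    "genome2": {"dnaA", "rpoB"},
    "genome3": {"dnaA", "rpoB", "pe1_frag"},
}

core_a, acc_a = classify_genes(pipeline_a)
core_b, acc_b = classify_genes(pipeline_b)
print(len(core_a), len(acc_a))
print(len(core_b), len(acc_b))
```

The underlying sequences are identical in both inputs; only the annotation differs, yet the accessory set grows. This is the kind of discrepancy that comparing nucleotide-level and protein-level clustering is meant to expose.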
| Original language | English |
|---|---|
| Article number | btaf219 |
| Journal | Bioinformatics |
| Volume | 41 |
| Issue number | 5 |
| Number of pages | 13 |
| ISSN | 1367-4803 |
| DOIs | |
| Publication status | Published - 6-May-2025 |
Keywords
- Escherichia coli/genetics
- Genome, Bacterial
- Genomics/methods
- Mycobacterium tuberculosis/genetics
- Software
- Staphylococcus aureus/genetics
Fingerprint
Dive into the research topics of 'Pitfalls of bacterial pan-genome analysis approaches: a case study of Mycobacterium tuberculosis and two less clonal bacterial species'. Together they form a unique fingerprint.