Abstract

Tuberculosis (TB) is a grave public health concern and is considered the foremost contributor to human mortality resulting from infectious disease. Due to the stringent clonality and extremely restricted genomic diversity of Mycobacterium tuberculosis (M.tb), conventional methods prove inefficient for in-depth exploration of minor genomic variations and the evolutionary dynamics operating in M.tb populations. Until now, the majority of reviews have primarily focused on delineating the application of whole-genome sequencing (WGS) in predicting antibiotic resistance genes, surveillance of drug-resistant strains, and M.tb lineage classifications. Despite the growing use of next generation sequencing (NGS) and WGS analysis in TB research, there are limited studies that provide a comprehensive summary of their role in studying macroevolution and minor genetic variations, assessing mixed TB infections, and tracking transmission networks at an individual level. This highlights the need for a systematic effort to fully explore the potential of WGS and its associated tools in advancing our understanding of TB epidemiology and disease transmission. We delve into the recent bioinformatics pipelines and NGS strategies that leverage various genetic features and simultaneous exploration of host-pathogen protein expression profiles to decipher the genetic heterogeneity and host-pathogen interaction dynamics of M.tb infections. This review highlights the potential benefits and limitations of NGS and bioinformatics tools and discusses their role in TB detection and epidemiology. Overall, this review could be a valuable resource for researchers and clinicians interested in NGS-based approaches in TB research.

Impact Statement

This study highlights how recent advancements in whole-genome sequencing (WGS) and Next Generation Sequencing (NGS) technologies have transformed tuberculosis (TB) research. NGS strategies have accelerated patient-level investigations and the discovery of drug resistance determinants. The integration of tailored bioinformatics pipelines has further enhanced the application of WGS and NGS approaches in TB research, offering valuable insights into genetic heterogeneity, microevolution, and disease transmission events.

Introduction

The advent of whole-genome sequencing (WGS) has facilitated the comprehensive examination of Mycobacterium tuberculosis (M.tb) strains, offered enhanced genetic resolution, and thereby enabled a deeper understanding of resistance mutations, co-infections, patterns of disease dissemination, and the identification of genetic variants. This method is a pivotal tool in tuberculosis (TB) research for understanding the evolution and pathogenicity of TB strains (Meehan et al. 2019). In TB research, the widespread adoption of the WGS approach by the scientific community has led to extensive collections of M.tb strains representing various lineages and sublineages of the M.tb complex. High TB burden countries are encountering a shortage of experienced bioinformaticians for routine diagnosis, typing, and resistance monitoring of M.tb strains in healthcare settings (Rivière et al. 2021). This shortage highlights the importance of developing training programmes and freely accessible knowledge hubs focused on WGS and bioinformatics skills for low-income countries with a high burden of TB infection (Karikari et al. 2015, Helmy et al. 2016). Recent developments in WGS allow tracing microevolutionary features and heteroresistance in mixed TB populations and provide fine-scale resolution of the heterogeneity of genomic regions (Liang et al. 2020). However, the availability of WGS data varies among nations, with developing countries producing less genomic data than developed nations (Helmy et al. 2016, Rivière et al. 2021). This bias leads to an inadequate representation of M.tb strains from various geographical regions, resulting in inconsistencies in datasets used for epidemiological investigations. The emergence of state-of-the-art next generation sequencing (NGS) technology has substantially reduced the expenses associated with genome sequencing, thus enabling widespread access to M.tb genomes on a global scale.
This breakthrough has opened avenues for exploring the distribution of M.tb lineages and monitoring drug-resistant strains across diverse geographical regions (Freschi et al. 2021). Cutting-edge NGS technologies, coupled with sophisticated bioinformatics tools and methodologies, have become essential for conducting comprehensive analyses of M.tb infections (Meehan et al. 2019). The robustness of the bioinformatics method utilized in clinically relevant prediction tools is crucial, as variations in bioinformatics analyses can potentially impact clinical decisions, such as the selection of a drug regimen for treatment.

These factors highlight the need to develop a robust genomic pipeline or algorithm that yields standardized WGS results when comparing M.tb isolate genomes submitted from diverse global sources. This review aims to explore recent developments in NGS and WGS analysis employed in the investigation of TB infection. It provides a summary of available software and bioinformatics pipelines specifically designed to analyse the Mycobacterium genome. Special attention is given to the evolving NGS approach and its optimal application in TB research, and to recently developed NGS tools and techniques specifically designed to analyse mixed TB infections, minor genomic variations, and transmission networks.

RNA sequencing (RNA-seq)

To date, various NGS strategies have emerged to study infectious diseases, offering comprehensive insights into the biology of host-pathogen systems (Fig. 1). RNA-seq, an NGS technique, is employed to quantify M.tb RNA in clinical samples and explore infection mechanisms, offering cost-effective high-throughput data and exhibiting superior specificity and sensitivity compared to alternative gene expression analysis techniques. However, there are technological restrictions when applying this approach to intracellular pathogens like M.tb. In recent studies, the RNA-seq approach has been utilized to investigate the adaptation of intracellular Mycobacteria (Rienksma et al. 2015), to identify crucial genes involved in the infection mechanism (Cornejo-Granados et al. 2021), and to comprehend the transcriptional regulatory network’s response to environmental factors (Yoo et al. 2022).

Figure 1. Schematic representation of various NGS methods employed to infer M.tb infections.

Cornejo-Granados et al. (2021) conducted an RNA-seq study to establish an experimental protocol for testing the gene expression of M.tb and its host during in vivo infection. This approach is unique as it does not require specialized equipment to examine host–pathogen interactions in TB and can also be applied to explore other similar intracellular infections (Cornejo-Granados et al. 2021). In another study, Estévez et al. (2020) conducted an RNA-seq based investigation in individuals with latent TB infection (LTBI) from Spain and Mozambique and identified a gene expression signature associated with TB progression. Machine learning revealed heterogeneity within LTBI, with some individuals showing TB-like disease features. They proposed a gene panel for distinguishing between LTBI subgroups, aiding in targeted preventive treatment for those at higher risk of TB progression. This method boosts infection-specific gene detection from 13 to 702, including encoded proteins such as PE-PGRS, lppN, and LpqH lipoproteins and three non-coding RNAs. The study of the M.tb transcriptome can be challenging due to the low amount of Mycobacterial RNA in comparison to the RNA of the host cell. This low quantity of RNA could be attributed to the paucibacillary nature of M.tb infections within host cells relative to other pathogenic bacteria (Repasy et al. 2013).

Dual RNA sequencing (dual RNA-seq)

Dual RNA-seq enables simultaneous tracking of gene expression alterations in both microbial and eukaryotic cells within defined conditions, such as pathogen-host interaction (Westermann et al. 2012). In recent times, the popular dual RNA-seq approach has been implemented to elucidate host–pathogen interactions, particularly in studying the response against drug-resistant M.tb infection (López-Agudelo et al. 2022). Pisu et al. (2020a) isolated M.tb-infected alveolar macrophage (AM) and interstitial macrophage (IM) host cell populations directly from mouse lungs and used them for dual RNA-seq, finding greater access to iron and fatty acids in AM than in IM, with IM limiting M.tb growth through iron sequestration and higher nitric oxide levels (Pisu et al. 2021). An improved RNA extraction procedure and data analysis pipeline have been designed for samples with shallow sequencing depth (Pisu and Russell 2023). However, it has limitations due to distinct RNA features in prokaryotes and eukaryotes (Pisu et al. 2020a). Pisu et al. (2020a) developed a methodology to run dual RNA-seq on populations of M.tb-infected in vivo-derived macrophages and provided a useful, step-by-step tutorial for carrying out the procedure on M.tb-infected cells from an infection model in mice (Pisu et al. 2020b). The reported methodology may also be readily applied to multiple types of M.tb-infected host cells as well as other in vitro-derived infected cells (Pisu et al. 2020b). In a subsequent study, infected human splenic macrophages were used to study the mRNA profiles of two closely related clinical strains of the Latin American and Mediterranean family of M.tb using dual RNA-seq (López-Agudelo et al. 2022). They further analysed the data using a genome-scale host–pathogen metabolic reconstruction and showed that the host and pathogen’s metabolic responses are also determined by the M.tb strain that is causing the infection.
This highlights the significant role of macrophage ontogeny and M.tb genetic programs in shaping host–pathogen interactions. The expression of M.tb proteins, which are targeted by vaccines and drugs, in the lung is crucial for their effectiveness; however, the pulmonary expression of most M.tb genes and their proteins remains inadequately understood. To bridge this gap, an extensive transcriptomic analysis has been performed on samples from M.tb-infected humans and from TB-prone C3H/FeJ and TB-resistant C57BL/6J mice, with comparison against in vitro M.tb gene expression (Lai et al. 2021). Results reveal distinct host responses despite genetic similarity between the pathogens, indicating that the infecting M.tb strain influences both host and pathogen metabolic responses. Modelling the critical transmission phase of TB, which depends on infectious sputum, presents significant challenges. Dual RNA-seq on sputum of TB-infected patients reveals heightened transcriptional activation of inflammatory responses, particularly an interferon-driven proinflammatory reaction, alongside a metabolic transition towards glycolysis in the host (Lai et al. 2021). Mycobacterium tuberculosis constituted ∼1.5% of sputum bacterial sequences, and its presence led to a reduction in commensal bacterial abundance.
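Because host reads usually far outnumber mycobacterial reads in a dual RNA-seq library, expression values are often normalised within each organism rather than across the combined library. The Python sketch below illustrates that idea only. The gene names, read counts, and the `organism_of` mapping are hypothetical, and counts-per-million (CPM) is used here as a simple stand-in for whichever normalisation a given pipeline applies; it is not the procedure of any study cited above.

```python
from collections import defaultdict


def per_organism_cpm(counts, organism_of):
    """Return counts-per-million computed separately for each organism."""
    totals = defaultdict(int)
    for gene, n in counts.items():
        totals[organism_of[gene]] += n
    cpm = {}
    for gene, n in counts.items():
        total = totals[organism_of[gene]]
        cpm[gene] = 1e6 * n / total if total else 0.0
    return cpm


def pathogen_fraction(counts, organism_of, pathogen="Mtb"):
    """Return the share of all assigned reads that belong to the pathogen."""
    total = sum(counts.values())
    pathogen_reads = sum(n for g, n in counts.items() if organism_of[g] == pathogen)
    return pathogen_reads / total if total else 0.0


# Hypothetical toy counts: host reads dominate the library.
counts = {"host_geneA": 9000, "host_geneB": 1000, "mtb_gene1": 30, "mtb_gene2": 10}
organism_of = {
    "host_geneA": "host",
    "host_geneB": "host",
    "mtb_gene1": "Mtb",
    "mtb_gene2": "Mtb",
}

cpm = per_organism_cpm(counts, organism_of)
frac = pathogen_fraction(counts, organism_of)
print(cpm)
print(f"pathogen fraction of assigned reads: {frac:.4f}")
```

Normalising within each organism keeps a change in bacterial load from being mistaken for a change in bacterial gene expression, at the cost of losing information about the host-to-pathogen read ratio, which is why the sketch reports that ratio separately.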

Single-cell RNA sequencing (scRNA-seq)

scRNA-seq technology enables the comparison of transcriptomes among distinct cell types in biological systems, overcoming cellular heterogeneity in clinical samples, and facilitating the study of individual M.tb strains from infection subpopulations during drug treatment. Recent studies have explained functional heterogeneity and the role of type I interferons in TB-infected lungs and intestines, highlighting the potential of this technique. Recently, an innovative approach was reported to concurrently obtain the host transcriptome, surface marker expression, and bacterial phenotype for each infected cell using scRNA-seq and fluorescent reporter strains of bacteria. This method facilitates easier analysis of the functional heterogeneity of IMs and AMs infected with M.tb in vivo (Pisu et al. 2021). Along with three distinct populations of IMs with diverse bacterial characteristics, they additionally identified clusters of pro-inflammatory AMs linked to stressed bacteria. Ultimately, they showed that the predominant lung macrophage populations exhibit epigenetically regulated responses to infection, whereas cross-species analysis indicates that the majority of AM subsets are shared by humans and mice. This theoretical framework can be easily applied to other infectious disease agents, perhaps leading to a deeper comprehension of the functions that various host cell types perform during the course of an infection (Pisu et al. 2021). In a follow-up investigation, Pisu and Russell (2023) developed a method for conducting multi-modal scRNA-seq on in vivo M.tb-infected lung macrophages. This approach aimed to elucidate the diverse roles of immune cells in either combating or exacerbating M.tb infection. The method captures cell transcriptome, surface markers, and bacterial phenotype simultaneously, combining methanol fixation, scRNA-seq library preparation, cell sorting, CITE-seq, and antibody labelling.
It is applicable to diverse tissues from humans, nonhuman primates, and mice. Akter and Khader (2023) outline a protocol for robustly analysing scRNA-seq data from lung lymphocyte populations, addressing challenges posed by cellular diversity and biological variability. Focused on lymphocyte populations from healthy and M.tb-infected mice, the protocol covers downloading processed data, integrating samples, and conducting cluster analysis. Furthermore, it elaborates on identifying lymphoid cell subtypes, performing differential analysis, and pathway enrichment analysis, offering a comprehensive approach to studying lung immune responses. A single-cell transcriptomic study of lung tissues from patients with pulmonary TB, comparing regions with high 18F-labeled fluorodeoxyglucose avidity to adjacent uninvolved areas and healthy donor tissues, identified immune cell types and their transcriptional states associated with TB and inflammation (Pan et al. 2023b). scRNA-seq enables a comprehensive understanding of TB-associated inflammation and potential therapeutic targets for disease management (Wang et al. 2023). In yet another study, scRNA-seq was employed to analyse CD4+ and CD8+ T cells from healthy individuals and TB patients, identifying distinct subsets and elucidating transcriptomic changes (Pan et al. 2023b). It revealed signature genes and pathways associated with T-cell exhaustion post-TB infection, highlighting potential exhaustion marker genes and an exhaustion-specific CD8+ T cell subcluster, offering insights into TB-related T-cell signatures. Analysis of the local T cell-mediated immunity landscape in humans based on TB pleural effusion reveals distinct T cell subsets and highlights the involvement of granzyme K-expressing CD8 T cells in disease pathogenesis (Cai et al. 2022).
An scRNA-seq analysis framework characterized the immune response to M.tb infection and revealed heterogeneous macrophage dynamics and suppressed host signalling (Gómez-González et al. 2021). Despite the limited tumor necrosis factor (TNF) production in resting macrophages during infection, the study revealed that the strength of inflammatory signals does not correspond with M.tb growth control, highlighting the importance of developing pathogen-specific signalling models.

Selection of NGS sequencing platforms for M.tb genome analysis

The M.tb genome exhibits high clonality and is marked by a combination of elevated guanine–cytosine (GC) content and repetitive structure, rendering WGS a challenging task (Phelan et al. 2016). The Illumina sequencing platform, renowned for its paired short reads and low error rates, has made it possible to successfully analyse almost the entire genome, including drug-resistance loci (Gómez-González et al. 2021). WGS demonstrates enhanced sensitivity and specificity in detecting resistance to first-line anti-TB drugs compared to second-line drugs (Wang et al. 2022). Comparative analysis of sequencing data reveals that long-read sequencing is less efficient than short-read sequencing in predicting resistance to first-line drugs (Peker et al. 2021). Conversely, long-read and short-read sequencing data exhibit equivalent predictive capabilities for resistance against second-line drugs (Peker et al. 2021). Additionally, TB genomes assembled from short-read data exhibit higher accuracy compared to those obtained through long-read assembly (Peker et al. 2021). Observations have indicated that Illumina short-read sequencing often encounters difficulties in amplifying repeat regions (Galagan 2014), resulting in coverage bias and, consequently, the exclusion of some or all of the pe and ppe multigene families in M.tb genomes (Mikheecheva et al. 2017, Meehan et al. 2019). GC bias in the M.tb genome is a frequently overlooked factor contributing to varying depth of coverage during WGS. Some researchers mistakenly interpret coverage bias in Illumina WGS data as genuine deletions (Advani et al. 2019), while others recognize and acknowledge this bias (Zakham et al. 2019).
In TB research, inference based on WGS primarily relies on the detection of single nucleotide polymorphisms (SNPs) and the identification of complete gene deletions or insertions, with gene loss acknowledged as a substantial contributor to variability within mycobacterial populations (Kato-Maeda et al. 2001, Brosch et al. 2002, Tsolaki et al. 2004). Identifying variations in repetitive regions, gene duplications, chromosomal rearrangements, and changes in the number of tandem repeats using short-read NGS techniques like Illumina poses significant challenges and can produce biased results, with significant consequences for biological interpretation.
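The distinction between coverage bias and a genuine deletion can be made more concrete with a small sketch. The Python example below computes GC content and mean depth in fixed windows and flags windows whose depth falls far below the median, so that they can be inspected as possible blind spots before being called deletions. The sequence, depth values, window size, and the 0.2 relative threshold are all hypothetical choices made for illustration; they are not taken from any of the studies cited above.

```python
from statistics import median


def gc_fraction(seq):
    """Return the fraction of G and C bases in a sequence."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def window_summary(sequence, depth, window=10):
    """Return (start, gc_fraction, mean_depth) for non-overlapping windows."""
    if len(sequence) != len(depth):
        raise ValueError("sequence and depth must have the same length")
    rows = []
    for start in range(0, len(sequence), window):
        seq_win = sequence[start:start + window]
        dep_win = depth[start:start + window]
        rows.append((start, gc_fraction(seq_win), sum(dep_win) / len(dep_win)))
    return rows


def flag_low_coverage(rows, rel_threshold=0.2):
    """Flag windows whose mean depth is below rel_threshold x median window depth."""
    med = median(r[2] for r in rows)
    return [r for r in rows if r[2] < rel_threshold * med]


# Hypothetical toy data: the GC-only window is poorly covered.
sequence = "ATATATATAT" + "GCGCGCGCGC" + "ATGCATGCAT"
depth = [50] * 10 + [5] * 10 + [48] * 10

rows = window_summary(sequence, depth)
flagged = flag_low_coverage(rows)
for start, gc, mean_depth in flagged:
    print(f"window at {start}: GC={gc:.2f}, mean depth={mean_depth:.1f}")
```

A flagged window with high GC content and non-zero depth is a candidate for coverage bias; deciding whether it is a true deletion would still need independent evidence, such as long-read data or a different library preparation.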

The division within the M.tb WGS community regarding the management of Illumina bias highlights the importance of establishing precise exclusion criteria based on empirical evidence. As a step in this direction, research by Tyler et al. (2016) on the accuracy of Illumina-sequenced M.tb genomes emphasized the differing coverage bias in genomes prepared using Nextera and TruSeq library preps. Their findings indicated that TruSeq-prepared samples exhibited a more consolidated genome structure than Nextera-prepared samples. Additionally, the study highlighted the difficulty in resolving specific regions, particularly those with high GC content, using either library preparation method. Modlin et al. (2021) developed tailored catalogs of genomic blind spots for various sequencing instruments and library preparations. These catalogs are designed to optimize coverage for specific regions of interest, enabling the establishment of exclusion criteria based on empirical data from previous studies, thereby moving away from heuristic approaches. Recent studies have showcased the promising capabilities of long-read sequencing, particularly Oxford Nanopore Technologies, which has exhibited strong performance in variant calling and has shown improved coverage in repetitive regions, despite a relatively higher error rate (Gómez-González et al. 2022). In a subsequent study, a comparison between Illumina (short read) and Nanopore (long read) sequencing of M.tb isolates suggested that both technologies can be used independently or together by health laboratories involved in genotypic M.tb drug susceptibility testing (DST) and outbreak analysis (Hall et al. 2023). The selection of the sequencing platform depends on considerations such as costs, which may vary by country, as well as batching and turnaround time (Hall et al. 2023).
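Applying an exclusion list in practice amounts to discarding variant calls that fall inside masked intervals. The following Python sketch shows one way this could be done, assuming half-open `(start, end)` intervals and variants stored as `(position, ref, alt)` tuples. The coordinates are invented for illustration and do not correspond to real M.tb loci or to any published blind-spot catalog.

```python
from bisect import bisect_right


def merge_regions(regions):
    """Sort half-open (start, end) intervals and merge any that overlap or touch."""
    merged = []
    for start, end in sorted(regions):
        if merged and start <= merged[-1][1]:
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


def filter_variants(variants, regions):
    """Split variants into those outside (kept) and inside (masked) the regions."""
    merged = merge_regions(regions)
    starts = [start for start, _ in merged]
    kept, masked = [], []
    for variant in variants:
        pos = variant[0]
        # Find the last interval starting at or before pos, if any.
        i = bisect_right(starts, pos) - 1
        inside = i >= 0 and merged[i][0] <= pos < merged[i][1]
        (masked if inside else kept).append(variant)
    return kept, masked


# Hypothetical exclusion intervals and variant calls.
regions = [(100, 200), (150, 250), (1000, 1100)]
variants = [
    (50, "A", "G"),
    (120, "C", "T"),
    (249, "G", "A"),
    (250, "T", "C"),
    (1050, "G", "C"),
]

kept, masked = filter_variants(variants, regions)
print("kept:", kept)
print("masked:", masked)
```

Returning the masked calls alongside the kept ones makes it possible to report how many variants were excluded and where, which helps when exclusion criteria differ between laboratories.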

WGS to investigate genetic determinants in M.tb strains

Current research is predominantly centered on the identification and characterization of chromosomal mutations that confer resistance, along with an exploration of their association with the fitness of M.tb species (Björkman et al. 2000, Gagneux et al. 2006, Caminero 2010). M.tb strains carrying mutations with a high fitness cost are observed at notable frequency when compensatory mutations restore the reduced fitness (Gygli et al. 2017). The development of drug resistance, increased transmissibility, and heightened virulence is significantly influenced by genomic heterogeneity or microevolution in M.tb strains (Gygli et al. 2017). The adaptive capability of drug-resistant M.tb strains is associated with factors such as strain genetic background, epistatic interaction of compensatory mutations, and availability of multiple resistance mutations (Fenner et al. 2012). The genetic background of a strain may influence the impact of a drug resistance mutation on fitness, as demonstrated by the variable fitness observed among different clinical isolates with the same amino acid substitution. This suggests that the genetic background plays a crucial role in overcoming the fitness cost imposed by resistance-conferring mutations. Furthermore, the extent of drug resistance resulting from a specific mutation may vary based on the genetic background of the strain (Fenner et al. 2012). Conventional molecular methods and targeted sequencing techniques, such as PCR, tend to be time-consuming and have limited access to the genomic regions of the M.tb strain. Routine molecular methods, including restriction fragment length polymorphism (RFLP) typing (Kamerbeek et al. 1997), spoligotyping (van Embden et al. 1993), and Mycobacterial Interspersed Repetitive Unit-Variable-Number Tandem Repeat (MIRU-VNTR) typing (Supply et al. 1997, 2000), are limited in their ability to resolve genetic heterogeneity across the entire genome, as they operate exclusively on specific regions of the M.tb genome. The application of Sanger sequencing (Sanger et al. 1977) for targeting drug resistance-associated genes revealed heteroresistance (concurrent presence of drug-resistant and drug-sensitive strains) (Streicher et al. 2012); however, the identification of heterogeneity was constrained to a small portion of the M.tb genome. Prior investigations have indicated that the identification of genetic diversity in M.tb through WGS may be influenced by the approach to sample collection (Goig et al. 2020). Specifically, the direct sequencing of strains from sputum samples has been shown to be more efficient in capturing within-sample genetic diversity compared to isolating M.tb strains from MGIT (mycobacterium growth indicator tube) culture (Walker et al. 2013). The advent of WGS has enabled the examination of complete genomes, facilitating a more comprehensive analysis of evolutionary relationships and providing a higher resolution of the genetic background of M.tb lineages (Castro et al. 2020). As the number of TB cases continues to rise worldwide, WGS analysis and comparison are being used increasingly. These NGS approaches are instrumental in delineating genetic variations, resistance mutations, and other significant genetic changes occurring in M.tb strains.
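Within-sample heterogeneity of the kind described above is typically inferred from positions where a minority of reads support a second allele. The Python sketch below shows a deliberately simplified version of that logic, assuming per-position base counts are already available. The positions, counts, and the three thresholds (`min_freq`, `min_alt_reads`, `min_depth`) are hypothetical, and the sketch ignores base quality, strand bias, and mapping artefacts that real minor-variant callers must handle.

```python
def minor_variants(pileup, min_freq=0.05, min_alt_reads=5, min_depth=50):
    """Report non-major alleles that pass simple depth and frequency filters.

    pileup maps position -> {base: read count}. Each call is returned as
    (position, major_base, minor_base, minor_frequency).
    """
    calls = []
    for pos, counts in sorted(pileup.items()):
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too few reads to estimate a frequency reliably
        major = max(counts, key=counts.get)
        for base, n in counts.items():
            if base == major:
                continue
            freq = n / depth
            if n >= min_alt_reads and freq >= min_freq:
                calls.append((pos, major, base, round(freq, 3)))
    return calls


# Hypothetical base counts at three positions.
pileup = {
    1001: {"C": 80, "T": 20},  # clear minority allele
    1002: {"A": 99, "G": 1},   # single read, likely sequencing error
    1003: {"G": 20, "A": 3},   # depth too low to assess
}

calls = minor_variants(pileup)
print(calls)
```

A minority allele at a resistance-associated position would be consistent with heteroresistance or a mixed infection, but a frequency filter alone cannot separate those explanations from contamination or systematic error.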

Advantage of WGS in predicting drug susceptibility of M.tb

The MGIT approach currently serves as the reference standard for DST of M.tb isolates. Nevertheless, the use of WGS is growing in many developed countries for resistance detection and susceptibility prediction. Previous studies examined the reliability of WGS and its disparities with the MGIT method in predicting drug susceptibility for first-line drugs (Quan et al. 2018, van Beek et al. 2019). WGS outperforms line probe assay (LPA)-based susceptibility testing in detecting low-level resistance to rifampicin and ethambutol (Genestet et al. 2020). The LPA proves to be less accurate than WGS in predicting susceptibility to anti-TB drugs (Genestet et al. 2020). Consequently, WGS has the potential to substitute for phenotypic DST, especially in countries with a low prevalence of drug-resistant M.tb strains. Still, refinement is possible as additional data accumulate, especially concerning the characterization of the impacts of novel mutations, which may confer resistance, on DST predictions (Quan et al. 2018). Phenotypic DST, which relies on critical concentration testing methods like the MGIT method, may produce inaccurate results (Ruesen et al. 2018). Comparing WGS-predicted drug resistance mutations with minimum inhibitory concentrations demonstrates that specific mutations correlate with different levels of resistance (Ruesen et al. 2018). This insight not only facilitates decision-making but also enhances drug monitoring protocols to improve treatment outcomes.
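Comparisons between WGS-based prediction and phenotypic DST are usually summarised as sensitivity and specificity, with the phenotypic result treated as the reference. The Python sketch below shows the arithmetic for a single drug. The isolate identifiers and calls are hypothetical, and because the phenotypic reference may itself be inaccurate, as noted above, the resulting figures describe agreement with that reference rather than absolute correctness.

```python
def diagnostic_accuracy(phenotype, prediction):
    """Compare predicted calls with reference calls for one drug.

    Both arguments map isolate id -> "R" (resistant) or "S" (susceptible).
    Isolates without a prediction are skipped.
    """
    tp = fp = tn = fn = 0
    for isolate, truth in phenotype.items():
        pred = prediction.get(isolate)
        if pred is None:
            continue
        if truth == "R" and pred == "R":
            tp += 1
        elif truth == "R" and pred == "S":
            fn += 1
        elif truth == "S" and pred == "S":
            tn += 1
        elif truth == "S" and pred == "R":
            fp += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else None
    specificity = tn / (tn + fp) if (tn + fp) else None
    return {
        "tp": tp, "fp": fp, "tn": tn, "fn": fn,
        "sensitivity": sensitivity,
        "specificity": specificity,
    }


# Hypothetical results for five isolates and one drug.
phenotype = {"iso1": "R", "iso2": "R", "iso3": "S", "iso4": "S", "iso5": "S"}
prediction = {"iso1": "R", "iso2": "S", "iso3": "S", "iso4": "S", "iso5": "R"}

result = diagnostic_accuracy(phenotype, prediction)
print(result)
```

In this toy data one resistant isolate is missed and one susceptible isolate is called resistant, giving a sensitivity of 1/2 and a specificity of 2/3.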

Recent advances in tools and databases in WGS analysis of M.tb

Creating novel software tools for the analysis of extensive biological datasets is a crucial element in the progression of contemporary biomedical research. The extensive amount of M.tb WGS data highlights the imperative for the development of customized tools and databases for bioinformatics analysis (Fig. 2). This development is essential for the effective and efficient handling of data analysis and visualization of results. Until now, the majority of databases and software tools have primarily focused on predicting resistant M.tb strains and identifying drug targets (Table 1). Here, we provide a compilation of freely available bioinformatics software tools and databases designed for the genomic analysis of M.tb strains (Table 1). Additionally, drug resistance mutations in M.tb, which can be detected using genomic analysis tools, are critical for identifying and managing drug-resistant TB strains. Public databases such as TB-Lineage and TB DEPOT provide comprehensive resources for researchers to access and analyse mutation data specific to TB, facilitating advancements in diagnosis. Table 2 summarizes the public databases designed to identify drug resistance mutations in M.tb. These resources facilitate the swift determination of antibiotic resistance profiles in clinical isolates, aiding in the development of effective treatment strategies.

Figure 2. NGS sequencing leading to development of bioinformatics tools and databases in TB research.

Table 1.

Tools and pipelines for identifying drug-resistant M.tb strains and analysing TB infections.

Name | Category | Application | Access link | References
kmer-based method (Bugwas) | Software/Database | Uses a linear mixed model method to identify genetic variants causing drug resistance at lineage-level, focusing on differences in genomic regions in bacterial pathogens causing TB infection. | https://github.com/sgearle/bugwas | Earle et al. (2016), Jaillard et al. (2018)
Mykrobe predictor | Software/Database | The Mykrobe predictor software package efficiently analyses raw read sequence data to produce user-friendly reports on drug-resistant M.tb strains | https://github.com/Mykrobe-tools/mykrobe | Hunt et al. (2019)
TnSeq pipeline | Software/Database | TnSeq data analysis maps reads from transposon-junction to the mutant strain’s genome, allowing for strain-specific traits investigation | https://gitlab.com/tbgenomicsunit/tnseq-pipeline | Carey et al. (2018)
TB-DROP | Software/Database | A tailored deep learning model to predict MTB drug resistance using genome mutations | https://github.com/nottwy/TB-DROP | Wang et al. (2024)
Protein druggability database (TuberQ) | Pipeline/Database | It uses 3982 Open Reading Frames from the H37Rv strain for HMMer analysis, microarray expression data, structural homology-modeling, and drug pockets prediction to predict druggable M.tb proteins | http://tuberq.proteinq.com.ar/ | Radusky et al. (2014)
SpolLineages | Pipeline/Database | This tool is a Java-based program that mainly relies on components from RuleTB, SITVIT2 database, decision tree, and evolutionary computations. Used to identify M.tb complex through various typing patterns | https://github.com/dcouvin/SpolLineages | Couvin et al. (2020)
CHOPIN | Database | Predicts the structural effect from mutations conferring drug resistance to the M.tb complex | http://structure.bioc.cam.ac.uk/chopin | Ochoa-Montaño et al. (2015)
Tbvar | Database | Annotates and identifies novel variants using the WGS technique | http://genome.igib.res.in/tbvar/ | Joshi et al. (2014)
SInCre | Database | Analyse the M. tb proteome, enabling functional domain, homology, binding pockets, and structural annotation | http://proline.biochem.iisc.ernet.in | Metri et al. (2015)
TIBLE | Pipeline/Database | TIBLE is a user-friendly online resource that offers convenient access to information on the minimal inhibitory concentrations of small molecules against various mycobacterial species. Additionally, it provides predictions on target binding and off-target effects for M.tb | http://www-cryst.bioc.cam.ac.uk/tible/ | Malhotra et al. (2017)
AntiTbPdb | Database | The AntiTbPdb serves as a repository for experimentally validated peptides with anti-tubercular or anti-mycobacterial properties. It furnishes comprehensive details for each peptide, including sequence, modifications, origin, strain-specific mycobacterium species, inhibition concentration, specific immune response, and more. Additionally, the database incorporates predicted structures for these anti-tubercular peptides | http://webs.iiitd.edu.in/raghava/antitbpdb/ | Usmani et al. (2018)
HGV&TB Database | Database | It contains information on 98 TB genes from 307 variants | genome.igig.res.in/hgvtb/index.html | Sahajpal et al. (2014)
SpolPred | Software | Identifies the spoligotype in M.tb from NGS raw read sequences | www.pathogenseq.org/spolpred | Coll et al. (2012)
MycPermCheck | Online prediction tool | A web tool for analysing small molecule permeability in M.tb cells, predicting based on logistic regression, and target molecule physico-chemical features | http://www.mycpermcheck.aksotriffer.pharmazie.uni-wuerzburg.de | Merget et al. (2013)
DeepAMR | Online prediction tool | This tool uses genome sequence data to classify drug-resistance labels with reduced dimensionality, achieving high sensitivity and specificity | http://www.robots.ox.ac.uk/~davidc/code.php | Yang et al. (2019)
MtbRegList | Database | This tool uses genome sequence data to classify drug-resistance labels with reduced dimensionality, achieving high sensitivity and specificity | http://www.USherbrooke.ca/vers/MtbRegList | Jacques et al. (2005)
TubercuList | Database | This database utilizes up-to-date curated genomes and protein 3D structures information to reannotate previously published TB genomes, enabling accurate prediction of genes and their respective functions | http://genolist.pasteur.fr/TubercuList/ | Camus et al. (2002)
SAM-TB | Pipeline/Database | SAM-TB integrates variant detection, genomic cluster inference, detection of mixed NTM and MTB samples, and NTM species identification. SAM-TB also offers confidence levels for resistance predictions and supports batch export of analysis results | http://samtb.szmbzx.com | Yang et al. (2022b)
TB-Profiler | Online profiling tool | Bioinformatics web server for trimming NGS reads, reference genome alignment, and variant calling | https://tbdr.lshtm.ac.uk/ | Phelan et al. (2019)
Table 1.

Tools and pipelines for identifying drug-resistant M.tb strains and analysing TB infections.

| Name | Category | Application | Access link | References |
|---|---|---|---|---|
| kmer-based method (Bugwas) | Software/Database | Uses a linear mixed model to identify genetic variants causing drug resistance at lineage level, focusing on differences in genomic regions in bacterial pathogens causing TB infection | https://github.com/sgearle/bugwas | Earle et al. (2016), Jaillard et al. (2018) |
| Mykrobe predictor | Software/Database | The Mykrobe predictor software package efficiently analyses raw read sequence data to produce user-friendly reports on drug-resistant M.tb strains | https://github.com/Mykrobe-tools/mykrobe | Hunt et al. (2019) |
| TnSeq pipeline | Software/Database | TnSeq data analysis maps reads from transposon junctions to the mutant strain’s genome, allowing investigation of strain-specific traits | https://gitlab.com/tbgenomicsunit/tnseq-pipeline | Carey et al. (2018) |
| TB-DROP | Software/Database | A tailored deep learning model to predict MTB drug resistance using genome mutations | https://github.com/nottwy/TB-DROP | Wang et al. (2024) |
| Protein druggability database (TuberQ) | Pipeline/Database | Uses 3982 open reading frames from the H37Rv strain for HMMer analysis, microarray expression data, structural homology modelling, and drug pocket prediction to predict druggable M.tb proteins | http://tuberq.proteinq.com.ar/ | Radusky et al. (2014) |
| SpolLineages | Pipeline/Database | A Java-based program that mainly relies on components from RuleTB, the SITVIT2 database, decision trees, and evolutionary computations. Used to identify M.tb complex lineages through various typing patterns | https://github.com/dcouvin/SpolLineages | Couvin et al. (2020) |
| CHOPIN | Database | Predicts the structural effect of mutations conferring drug resistance in the M.tb complex | http://structure.bioc.cam.ac.uk/chopin | Ochoa-Montaño et al. (2015) |
| Tbvar | Database | Annotates and identifies novel variants from WGS data | http://genome.igib.res.in/tbvar/ | Joshi et al. (2014) |
| SInCre | Database | Analyses the M.tb proteome, enabling functional domain, homology, binding pocket, and structural annotation | http://proline.biochem.iisc.ernet.in | Metri et al. (2015) |
| TIBLE | Pipeline/Database | TIBLE is a user-friendly online resource that offers convenient access to information on the minimal inhibitory concentrations of small molecules against various mycobacterial species. Additionally, it provides predictions on target binding and off-target effects for M.tb | http://www-cryst.bioc.cam.ac.uk/tible/ | Malhotra et al. (2017) |
| AntiTbPdb | Database | AntiTbPdb serves as a repository of experimentally validated peptides with anti-tubercular or anti-mycobacterial properties. It provides comprehensive details for each peptide, including sequence, modifications, origin, strain-specific mycobacterium species, inhibition concentration, specific immune response, and more. Additionally, the database incorporates predicted structures for these anti-tubercular peptides | http://webs.iiitd.edu.in/raghava/antitbpdb/ | Usmani et al. (2018) |
| HGV&TB Database | Database | It contains information on 98 TB genes from 307 variants | genome.igig.res.in/hgvtb/index.html | Sahajpal et al. (2014) |
| SpolPred | Software | Identifies the spoligotype of M.tb from raw NGS reads | www.pathogenseq.org/spolpred | Coll et al. (2012) |
| MycPermCheck | Online prediction tool | A web tool that predicts small-molecule permeability in M.tb cells using logistic regression on the physico-chemical features of the target molecule | http://www.mycpermcheck.aksotriffer.pharmazie.uni-wuerzburg.de | Merget et al. (2013) |
| DeepAMR | Online prediction tool | Uses genome sequence data to classify drug-resistance labels with reduced dimensionality, achieving high sensitivity and specificity | http://www.robots.ox.ac.uk/~davidc/code.php | Yang et al. (2019) |
| MtbRegList | Database | A database dedicated to the analysis of transcriptional regulation in M.tb | http://www.USherbrooke.ca/vers/MtbRegList | Jacques et al. (2005) |
| TubercuList | Database | Uses up-to-date curated genomes and protein 3D structure information to reannotate previously published TB genomes, enabling accurate prediction of genes and their respective functions | http://genolist.pasteur.fr/TubercuList/ | Camus et al. (2002) |
| SAM-TB | Pipeline/Database | SAM-TB integrates variant detection, genomic cluster inference, detection of mixed NTM and MTB samples, and NTM species identification. SAM-TB also offers confidence levels for resistance predictions and supports batch export of analysis results | http://samtb.szmbzx.com | Yang et al. (2022b) |
| TB-Profiler | Online profiling tool | Bioinformatics webserver for trimming NGS reads, reference genome alignment, and variant calling | https://tbdr.lshtm.ac.uk/ | Phelan et al. (2019) |
Table 2.

Drug resistance mutation and public databases designed for TB research.

| Name | Category | Application | Access link | References |
|---|---|---|---|---|
| TB-Lineage | Pipeline/Online prediction tool | An online tool for classification and analysis of strains of M.tb complex | https://tbinsight.cs.rpi.edu/run_tb_lineage.html | Shabbeer et al. (2012) |
| The TB Portals | Database | An open-access, web-based platform for global drug-resistant-tuberculosis data sharing and analysis | https://tbportals.niaid.nih.gov/ | Rosenthal et al. (2017) |
| COMBAT-TB-NeoDB | Pipeline/Database | Fostering TB research through integrative analysis using graph database technologies | https://github.com/COMBAT-TB/combat-tb-neodb | Lose et al. (2020) |
| TB DEPOT | Database | A novel public analytics platform integrating TB clinical, genomic, and radiological data for visual and statistical exploration | https://depot.tbportals.niaid.nih.gov/#/home | Gabrielian et al. (2019) |
| getTBinR | Software | An R package for accessing and summarizing the World Health Organization Tuberculosis data | https://github.com/seabbs/getTBinR | Abbott (2019) |
| TBDBT | Database | A TB DataBase template for collection of harmonized TB clinical research data in REDCap, facilitating data standardization for inter-study comparison and meta-analyses | https://github.com/CIDRI-Africa/TBDBT/ | Allie et al. (2021) |
| TBNet | Pipeline/Database | A context-aware graph network for TB diagnosis | https://www.tbnet.eu/ | Giehl et al. (2012) |

Evaluating minor genetic variation in M.tb population

Advances in molecular techniques have revealed the capacity of M.tb to cause polyclonal infections (Moreno-Molina et al. 2021). Mixed infections may give rise to multiple unrelated clones within a patient, or microevolution may lead to the emergence of closely related clones from a previously clonal M.tb population. Accurate identification of minor variants is crucial for improving our understanding of heteroresistance within dynamic M.tb populations. Identifying minor variants in WGS data has always been challenging because trimming, filtering, and standard variant-calling methods struggle to differentiate low-frequency variants from sequence ambiguities (Said Mohammed et al. 2018). The bioinformatics tool BinoSNP has simplified this process by assessing a customized collection of genomic positions with a binomial test (Dreyer et al. 2020). Nevertheless, its capability is confined to the identification of SNPs in resistance-conferring genes, so tools like BinoSNP are unsuitable for de novo detection of variation outside resistance loci in minor subpopulations. Subsequent studies have reported that the LoFreq variant calling tool facilitates the identification of minor variants, including both SNPs and indels, within predetermined resistance-associated loci and previously unexplored genomic regions (Wilm et al. 2012). Goossens et al. (2022) assessed LoFreq’s performance in detecting de novo and drug resistance-associated minor variants in both simulated and clinical M.tb NGS data. The results show LoFreq to be a precise variant caller with high sensitivity, especially for indels. It maintained high precision and accuracy across the entire range of coverage depths assessed, regardless of the minor variant type or frequency. It reliably detected variants down to a frequency limit of detection of 0.5% for indels and 3% for SNPs. In clinical data, LoFreq successfully identified minor M.tb variants, even at low allele frequencies, which suggests its potential to reduce false positives due to sequencing errors. These findings aid in determining detection limits and guiding future M.tb variant studies. A limitation of the study was the small clinical sample size, which precluded statistical validation of LoFreq’s performance metrics and underscored the need for validation on a larger set of clinical samples covering both SNP and indel mutations. These observations collectively emphasize the ongoing need for benchmarking whole-genome variant calling tools capable of detecting minor M.tb variants at various depths of coverage.
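The idea of separating a low-frequency variant from sequencing error with a binomial test can be illustrated with a minimal Python sketch. This is not the implementation used by BinoSNP or LoFreq; the function names, the per-base error rate of 0.001, and the significance cut-off of 1e-6 are assumptions chosen only for illustration.

```python
from math import exp, lgamma, log, log1p


def binomial_tail(alt_reads: int, depth: int, error_rate: float) -> float:
    """Return P(X >= alt_reads) for X ~ Binomial(depth, error_rate).

    The tail is summed in log space so that deep coverage does not overflow.
    """
    if not 0.0 < error_rate < 1.0:
        raise ValueError("error_rate must lie strictly between 0 and 1")
    if alt_reads <= 0:
        return 1.0
    if alt_reads > depth:
        return 0.0
    total = 0.0
    for k in range(alt_reads, depth + 1):
        log_pmf = (
            lgamma(depth + 1) - lgamma(k + 1) - lgamma(depth - k + 1)
            + k * log(error_rate)
            + (depth - k) * log1p(-error_rate)
        )
        total += exp(log_pmf)
    return min(total, 1.0)


def is_minor_variant(alt_reads: int, depth: int,
                     error_rate: float = 0.001,  # assumed per-base error rate
                     alpha: float = 1e-6) -> bool:  # assumed significance cut-off
    """Call a variant when the alternate-read count is unlikely under error alone."""
    return binomial_tail(alt_reads, depth, error_rate) < alpha
```

Under these assumed settings, 30 alternate reads at a depth of 1000 (a 3% allele frequency) would be called, whereas 2 alternate reads at the same depth would be attributed to sequencing error.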

Mycobacterium tuberculosis pangenome analysis

The pan-genome encompasses the entire genetic repertoire of a microbial population, comprising core orthologous genes, unique strain-specific genes, and accessory genes. Open pan-genomes continue to gain novel gene families as additional genomes are sequenced, whereas closed pan-genomes show no further expansion (Bosi et al. 2015). The pan-genome approach offers insights into the distribution of virulence genes within pathogenic microbial populations, particularly in monitoring the emergence of drug-resistance genes from novel genetic variants (Muzzi et al. 2007). Pan-genome analysis has recently gained popularity for investigating genetic signatures related to antibiotic resistance (Kavvas et al. 2018) and adaptive evolution (Yang et al. 2018), and for assessing genomic distance among M.tb lineages (Jandrasits et al. 2019). PANPASCO, a computational method for pan-genome mapping, uses pairwise distance calculations, demonstrates high sensitivity to variations between cases, and leverages WGS for effective transmission surveillance (Jandrasits et al. 2019). Additional research on the Mycobacterium pan-genome has revealed its significance in identifying potential drug targets and understanding the diversification of the Type VII secretion system, which in turn influences the pathogenicity of M.tb strains (Dumas et al. 2016, Dar et al. 2020). However, as indicated by Kim et al. (2020), M.tb is not considered ideally suited for pan-genome studies, possibly because of its high genomic homogeneity and strict clonality.
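The partition into core, accessory, and unique genes can be sketched in Python as follows. This is a minimal illustration that assumes gene families have already been clustered and that each strain is represented by a set of gene-family identifiers. The function name and the strict definition of "core" as present in every strain are assumptions, and pan-genome tools often apply softer thresholds.

```python
from typing import Dict, Set


def partition_pangenome(gene_content: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Split gene families into core, accessory, and unique sets.

    gene_content maps a strain name to the gene families found in that strain.
    """
    n_strains = len(gene_content)
    counts: Dict[str, int] = {}
    for genes in gene_content.values():
        for gene in genes:
            counts[gene] = counts.get(gene, 0) + 1

    # Core: present in every strain (strict definition, assumed here).
    core = {g for g, c in counts.items() if c == n_strains}
    # Unique: present in exactly one strain of a multi-strain collection.
    unique = {g for g, c in counts.items() if c == 1 and n_strains > 1}
    # Accessory: everything in between.
    accessory = set(counts) - core - unique
    return {"core": core, "accessory": accessory, "unique": unique}
```

With three hypothetical strains carrying the families {a, b, c}, {a, b}, and {a, d}, the sketch reports a as core, b as accessory, and c and d as unique.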

WGS for identification of M.tb mixed infections

Mixed infections arise either from concurrent infection of a patient by distinct strains or from strain evolution within the host, resulting in co-existing populations. Mixed M.tb infections and heteroresistance complicate the prognosis and treatment of TB disease. Their detection has predominantly been limited to conventional genotyping techniques, which often lack the required sensitivity and yield inaccurate estimates of population diversity in TB infections (Richardson et al. 2002, van Rie et al. 2005, Zetola et al. 2014, Zong et al. 2018, Liang et al. 2020). The GeneXpert assay has revealed that current diagnostic methods are not very effective at identifying mixed infections that involve both M.tb and nontuberculous mycobacteria (NTM). This highlights the urgent need for targeted molecular analyses that capture multiple loci of mycobacterial species directly from specimens. Inadequate exploration of within-host M.tb diversity also makes it difficult to distinguish between relapse and reinfection (Zong et al. 2018). Although WGS provides a comprehensive view of the genetic makeup of an individual strain, challenges remain in interpreting and analysing the data to identify the components of a mixed infection, and few established methods exist for identifying mixed TB infections from WGS data. New approaches, such as Bayesian framework analysis and heterozygous allele identification, have emerged to distinguish strains within an M.tb population (Yang et al. 2023).

Deep WGS has proven effective in discerning M.tb strains within mixed infections through exploration of phylogenomic databases derived from single nucleotide variant (SNV) analysis (Gan et al. 2016). Lozano et al. (2021) proposed a novel strategy for capturing minority variants and identifying mixed infections using WGS data. The researchers designed a platform named MycoCAP, comprising M.tb DNA capture probes, which enables the targeted enrichment of samples for M.tb DNA. They then conducted WGS on the captured M.tb DNA to enhance the detection of minority variants and mixed infections. To date, two bioinformatics tools specifically designed for the classification of strains in mixed infections using WGS data have been reported. The first, QuantTB, was designed to quantify individual M.tb strains by comparing TB genomes with reference SNPs of each lineage (Anyansi et al. 2020). The second, SplitStrains, employs a rigorous statistical method and the Expectation-Maximization algorithm to accurately separate the constituent strains in a mixed infection (Gabbassov et al. 2021).
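Heterozygous allele identification, in its simplest form, can be sketched in Python as below. This is a crude heuristic for illustration only and is not the method implemented in QuantTB or SplitStrains. The function name and all thresholds (minimum depth, minimum minor-allele frequency, minimum number of heterozygous sites) are assumptions.

```python
from statistics import median
from typing import Dict, Iterable, Optional, Tuple, Union


def summarise_heterozygous_sites(
    site_counts: Iterable[Tuple[int, int]],
    min_depth: int = 50,            # assumed minimum coverage per site
    min_minor_freq: float = 0.05,   # assumed floor to exclude sequencing noise
    min_het_sites: int = 10,        # assumed number of sites needed to call a mixture
) -> Dict[str, Union[int, bool, Optional[float]]]:
    """Flag a sample as potentially mixed from (ref_reads, alt_reads) per variable site."""
    minor_freqs = []
    for ref_reads, alt_reads in site_counts:
        depth = ref_reads + alt_reads
        if depth < min_depth:
            continue  # too little coverage to judge this site
        minor_freq = min(ref_reads, alt_reads) / depth
        if minor_freq >= min_minor_freq:
            minor_freqs.append(minor_freq)

    mixed = len(minor_freqs) >= min_het_sites
    # The median minor-allele frequency is a rough proxy for the minor strain's share.
    minor_fraction = median(minor_freqs) if mixed else None
    return {"het_sites": len(minor_freqs), "mixed": mixed, "minor_fraction": minor_fraction}
```

A sample in which a dozen sites each show 20 of 100 reads supporting a second allele would be flagged as mixed with an estimated minor fraction of 0.2. A statistical model such as the Expectation-Maximization approach mentioned above would be needed for rigorous strain separation.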

Transmission network analysis of TB infection

Genomic sequences of M.tb strains obtained at various time points are increasingly used to infer the onset of specific outbreaks, the emergence and spread of drug-resistant clones, or the introduction of a strain into a particular geographic region (Saavedra Cervera et al. 2022, Yang et al. 2022a). For epidemiological insight, a temporally calibrated phylogeny is particularly valuable for reconstructing infectious disease transmission patterns from genomic data (Didelot et al. 2021). More recently, deep NGS has shown considerable promise as an effective strategy for genome-based surveillance of pathogens and for establishing transmission links among sequenced bacterial pathogens (Sobkowiak et al. 2023). A systematic effort has been devoted to the comparative analysis of publicly available transmission reconstruction models, with the primary objective of evaluating their accuracy in predicting transmission events in both simulated and real-world outbreaks of M.tb (Sobkowiak et al. 2023). Using SNP thresholds, WGS improves precision in determining the direction and timing of individual transmission events in M.tb infection (Walker et al. 2013, Stimson et al. 2019). A more advanced approach to transmission reconstruction, known as phylodynamics, uses time-scaled phylogenetic trees (Didelot et al. 2014). However, challenges such as within-host evolution, latency periods, and low genomic heterogeneity complicate the application of phylodynamics in TB transmission analysis (Ypma et al. 2013, Romero-Severson et al. 2014). Various computational tools integrate genomic variation and epidemiological data to estimate the likelihood of individual-level transmission events from genomic data (Table 3). These tools predominantly employ a Bayesian Markov Chain Monte Carlo framework and robust statistical approaches with rigorous computational validation of epidemiological parameters. A recent study has described online tools used for the visualization of transmission networks and evaluated their feasibility for real-time analyses of pathogen sequence data (Neher and Bedford 2018).
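The SNP-threshold approach can be illustrated with a short Python sketch that links isolates whose pairwise SNP distance falls at or below a cut-off and groups them by single linkage. This is a simplified illustration, not the procedure of any tool in Table 3. The function names are hypothetical, the input is assumed to be an alignment of equal-length sequences, and the threshold is left to the user. Clustering of this kind only proposes putative links and does not by itself infer the direction or timing of transmission.

```python
from itertools import combinations
from typing import Dict, List


def snp_distance(seq_a: str, seq_b: str) -> int:
    """Count positions where two aligned sequences differ, ignoring non-ACGT characters."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    valid = set("ACGT")
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a != b and a in valid and b in valid
    )


def cluster_by_threshold(alignment: Dict[str, str], threshold: int) -> List[List[str]]:
    """Single-linkage clustering: isolates within `threshold` SNPs share a cluster."""
    parent = {name: name for name in alignment}

    def find(name: str) -> str:
        # Follow parent links to the cluster root, compressing the path on the way.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(alignment, 2):
        if snp_distance(alignment[a], alignment[b]) <= threshold:
            parent[find(a)] = find(b)

    clusters: Dict[str, List[str]] = {}
    for name in alignment:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())
```

Because linkage is transitive, two isolates that differ by more than the threshold can still share a cluster when an intermediate isolate connects them, which is one reason threshold-based clusters are usually followed up with the model-based tools listed in Table 3.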

Table 3.

List of available software and tools used for infection transmission analysis.

| Tools | Software applications | Input data | Access link | Source |
|---|---|---|---|---|
| TransPhylo | R and Matlab | Time-stamped phylogenetic tree | https://github.com/xavierdidelot/TransPhylo | Didelot et al. (2014) |
| SCOTTI | Python | Time-stamped phylogenetic tree | https://bitbucket.org/nicofmay/scotti/src/master/ | De Maio et al. (2016) |
| outbreaker2 | R and C++ | Time-stamped phylogenetic tree | http://www.repidemicsconsortium.org/outbreaker2/ | Campbell et al. (2018) |
| TransFlow | R and Python | Raw reads and sample metadata | https://github.com/cvn001/transflow | Pan et al. (2023a) |
| Phybreak | R | Time-stamped phylogenetic tree | https://github.com/donkeyshot/phybreak | Klinkenberg et al. (2017) |
| QUENTIN | MATLAB | Aligned FASTA sequences | https://github.com/skumsp/QUENTIN | Skums et al. (2018) |
| PHYLOSCANNER | Python and R | BAM files | https://github.com/BDI-pathogens/phyloscanner | Wymant et al. (2018) |
| nosoi | R | User-defined host parameters | https://slequime.github.io/nosoi/index.html | Lequime et al. (2020) |
| TNet | Python | Pathogen phylogeny | https://github.com/sauravdhr/tnet_python | Dhar et al. (2022) |
| LITT | R | SNP matrix and epidemiological data | https://github.com/CDCgov/TB_molecular_epidemiology/tree/1.0 | Winglee et al. (2021) |
| o2geosocial | R | Epidemiological data (does not include genetic sequences) | https://github.com/alxsrobert/o2geosocial | Robert et al. (2021) |
| GraphSNP | R and Java | SNP distance | https://github.com/nalarbp/graphsnp | Permana et al. (2023) |
| SOPHIE | Python and MATLAB | Phylogenetic tree and sample metadata | https://github.com/compbel/SOPHIE/ | Skums et al. (2022) |
| P-DOR | Python | Assembled genome and SNP phylogeny | https://github.com/SteMIDIfactory/P-DOR | Batisti Biffignandi et al. (2023) |
| StrainHub | R | Phylogenetic tree and associated metadata | https://github.com/abschneider/StrainHub | de Bernardi Schneider et al. (2020) |
| Time-scaled haplotypic density (THD) | R | Genetic distances and user-defined parameters | https://github.com/rasigadelab/thd | Wirth et al. (2020) |
| Visualization of transmission network | | | | |
| Nextstrain | Web application | Nextstrain employs TreeTime to infer time-scaled phylogenies and conduct ancestral sequence inference to determine the likely geographic origins of ancestral nodes | https://nextstrain.org/ | Hadfield et al. (2018) |
| Microreact | Web application | Microreact facilitates the exploration of phylogenetic trees as well as spatial and temporal data of samples. Custom datasets can be imported into the application using a Newick tree and sample metadata in tabular format | https://microreact.org/ | Argimon et al. (2016) |
| Graphia | Open-source platform | Graphia is a novel visual analytics platform specifically designed for the network-based analysis of large and complex datasets, such as those generated in vast quantities by modern biological analyses | https://graphia.app/ | Freeman et al. (2022) |

All software and tools are freely available for public use under the General Public License version 3.

Table 3.

List of available software and tools used for infection transmission analysis.

| Tools | Software applications | Input data | Access link | Source |
|---|---|---|---|---|
| TransPhylo | R and MATLAB | Time-stamped phylogenetic tree | https://github.com/xavierdidelot/TransPhylo | Didelot et al. (2014) |
| SCOTTI | Python | Time-stamped phylogenetic tree | https://bitbucket.org/nicofmay/scotti/src/master/ | De Maio et al. (2016) |
| outbreaker2 | R and C++ | Time-stamped phylogenetic tree | http://www.repidemicsconsortium.org/outbreaker2/ | Campbell et al. (2018) |
| TransFlow | R and Python | Raw reads and sample metadata | https://github.com/cvn001/transflow | Pan et al. (2023a) |
| Phybreak | R | Time-stamped phylogenetic tree | https://github.com/donkeyshot/phybreak | Klinkenberg et al. (2017) |
| QUENTIN | MATLAB | Aligned FASTA sequences | https://github.com/skumsp/QUENTIN | Skums et al. (2018) |
| PHYLOSCANNER | Python and R | BAM files | https://github.com/BDI-pathogens/phyloscanner | Wymant et al. (2018) |
| nosoi | R | User-defined host parameters | https://slequime.github.io/nosoi/index.html | Lequime et al. (2020) |
| TNet | Python | Pathogen phylogeny | https://github.com/sauravdhr/tnet_python | Dhar et al. (2022) |
| LITT | R | SNP matrix and epidemiological data | https://github.com/CDCgov/TB_molecular_epidemiology/tree/1.0 | Winglee et al. (2021) |
| o2geosocial | R | Epidemiological data (genetic sequences not included) | https://github.com/alxsrobert/o2geosocial | Robert et al. (2021) |
| GraphSNP | R and Java | SNP distance | https://github.com/nalarbp/graphsnp | Permana et al. (2023) |
| SOPHIE | Python and MATLAB | Phylogenetic tree and sample metadata | https://github.com/compbel/SOPHIE/ | Skums et al. (2022) |
| P-DOR | Python | Assembled genome and SNP phylogeny | https://github.com/SteMIDIfactory/P-DOR | Batisti Biffignandi et al. (2023) |
| StrainHub | R | Phylogenetic tree and associated metadata | https://github.com/abschneider/StrainHub | de Bernardi Schneider et al. (2020) |
| Time-scaled haplotypic density (THD) | R | Genetic distances and user-defined parameters | https://github.com/rasigadelab/thd | Wirth et al. (2020) |
| Visualization of transmission network | | | | |
| Nextstrain | Web application | Nextstrain employs TreeTime to infer time-scaled phylogenies and conduct ancestral sequence inference to determine the likely geographic origins of ancestral nodes | https://nextstrain.org/ | Hadfield et al. (2018) |
| Microreact | Web application | Microreact facilitates the exploration of phylogenetic trees as well as spatial and temporal data of samples. Custom datasets can be imported into the application using a Newick tree and sample metadata in tabular format | https://microreact.org/ | Argimón et al. (2016) |
| Graphia | Open-source platform | Graphia is a visual analytics platform designed for the network-based analysis of large and complex datasets, such as those generated in vast quantities by modern biological analyses | https://graphia.app/ | Freeman et al. (2022) |

All software and tools are freely available for public use under the General Public License version 3.

Tools such as TransPhylo, QUENTIN, and Phyloscanner can incorporate the within-host genomic diversity of strains when inferring transmission routes, but each has limitations. TransPhylo, for example, uses a time-calibrated phylogeny that can take into account multiple consensus genomes from a single host. In a new outbreak, however, the short timescale involved may make it difficult to obtain a clear temporal signal. QUENTIN and Phyloscanner establish transmission links between hosts in different ways: QUENTIN employs graph and network theory to reconstruct within-host phylogenies, whereas Phyloscanner analyses mapped reads supplied as BAM files, which allows it to identify sub-populations within the host. These within-host reconstruction steps may introduce biases that affect the estimated distribution of bacterial sub-populations.

A comparative evaluation of tools for genome-based analysis of TB transmission proposed Phybreak, outbreaker2, and TransPhylo as the most effective for identifying accurate links in transmission networks derived from TB infection data (Sobkowiak et al. 2023). In that comparison, these tools recovered the largest number of accurate links in the TB transmission network, and the results suggest that they could support the development of effective TB control strategies. The accuracy of transmission history inference depends on the rate and extent of genomic heterogeneity (Campbell et al. 2018). Inferring transmission trees from genetic data is challenging in pathogens with a high evolutionary rate, primarily because of substantial within-host diversity and genomic dissimilarity among sequenced strains (Morelli et al. 2012). Methods have therefore been developed that integrate genomic and epidemiological data to infer potential transmission trees (Jombart et al. 2014, Goldstein et al. 2022). A recent study emphasized that potential biases need to be investigated comprehensively when a statistical method combining phylogenetic and epidemiological data is used to analyse TB transmission events (Pan et al. 2023a).
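The temporal-signal issue raised above for time-calibrated phylogenies can be illustrated with a root-to-tip regression, a widely used exploratory check in which each isolate's genetic distance from the tree root is regressed on its sampling date. The sketch below is only an illustration under stated assumptions: the function name `root_to_tip_regression` and the toy dates and distances are hypothetical, and it is not part of TransPhylo or any other tool listed in Table 3.

```python
from statistics import mean


def root_to_tip_regression(dates, distances):
    """Least-squares fit of root-to-tip distance against sampling date.

    dates     -- sampling dates as decimal years (hypothetical input format)
    distances -- root-to-tip distances for the same isolates, e.g. in SNPs
    Returns (slope, intercept, r_squared); a slope near zero or a low
    r_squared suggests a weak temporal signal in the sample.
    """
    if len(dates) != len(distances) or len(dates) < 3:
        raise ValueError("need at least three paired observations")
    mean_date, mean_dist = mean(dates), mean(distances)
    sxx = sum((x - mean_date) ** 2 for x in dates)
    syy = sum((y - mean_dist) ** 2 for y in distances)
    sxy = sum((x - mean_date) * (y - mean_dist)
              for x, y in zip(dates, distances))
    if sxx == 0:
        raise ValueError("all sampling dates are identical")
    slope = sxy / sxx                      # apparent substitutions per year
    intercept = mean_dist - slope * mean_date
    # r_squared is defined as 0 when the distances do not vary at all
    r_squared = 0.0 if syy == 0 else (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


# Toy values chosen for illustration only (not real M.tb data).
toy_dates = [2015.0, 2016.0, 2017.0, 2018.0]
toy_distances = [10.0, 10.5, 11.0, 11.5]
slope, intercept, r_squared = root_to_tip_regression(toy_dates, toy_distances)
print(f"slope={slope:.3f} per year, r_squared={r_squared:.3f}")
```

When isolates are collected over a short outbreak window, the spread of sampling dates is small relative to the pathogen's substitution rate, so the fitted slope becomes unstable; this is one way to see why a clear temporal signal can be hard to obtain in a new outbreak.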

Conclusion

Sequencing of M.tb genomes has revolutionized TB research and has positively impacted both research and practice. WGS has played a pivotal role in enhancing epidemiological surveillance, facilitating the monitoring of transmission within communities, and tracing M.tb lineages across broader geographical and temporal landscapes. The insights derived from M.tb genome sequencing have the potential to accelerate further advances in the years ahead, as researchers pursue patient-level investigations and employ innovative whole-genome bacteriological methodologies for translational purposes. Emerging NGS strategies enable gene expression patterns to be traced simultaneously while providing comprehensive and efficient coverage for studying immune responses to M.tb infection. Furthermore, the expanding pool of WGS data obtained from phenotypically diverse M.tb strains, coupled with advances in genome-wide association study algorithms, is enabling the discovery of previously elusive determinants of drug resistance. In TB research, careful selection of genomic pipelines and algorithms has become critical for obtaining standardized WGS results when M.tb isolate genomes are examined to investigate genetic heterogeneity, microevolution, and disease transmission events. This review provides systematic information on available software and bioinformatics pipelines tailored for Mycobacterium genome analysis, with a focus on emerging NGS approaches and their optimal integration into TB research efforts.

Conflict of interest

The authors declare no conflicts of interest.

Funding

The authors received no specific grant from any funding agency.

Author contributions

Sushanta Deb (Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Writing – original draft), Jhinuk Basu (Investigation, Methodology, Writing – original draft), and Megha Choudhary (Formal analysis, Investigation, Writing – original draft).

References

Abbott S. getTBinR: an R package for accessing and summarising the World Health Organisation Tuberculosis data. J Open Source Softw. 2019;4:1260.

Advani J, Verma R, Chatterjee O et al. Whole genome sequencing of Mycobacterium tuberculosis clinical isolates from India reveals genetic heterogeneity and region-specific variations that might affect drug susceptibility. Front Microbiol. 2019;10:309.

Akter S, Khader SA. A protocol to analyze single-cell RNA-seq data from Mycobacterium tuberculosis-infected mice lung. STAR Protoc. 2023;4:102544.

Allie T, Jackson A, Ambler J et al. TBDBT: a TB DataBase template for collection of harmonized TB clinical research data in REDCap, facilitating data standardisation for inter-study comparison and meta-analyses. PLoS One. 2021;16:e0249165.

Anyansi C, Keo A, Walker BJ et al. QuantTB – a method to classify mixed Mycobacterium tuberculosis infections within whole genome sequencing data. BMC Genomics. 2020;21:80.

Argimón S, Abudahab K, Goater RJE et al. Microreact: visualizing and sharing data for genomic epidemiology and phylogeography. Microb Genom. 2016;2:e000093. https://doi.org/10.1099/mgen.0.000093

Batisti Biffignandi G, Bellinzona G, Petazzoni G et al. P-DOR, an easy-to-use pipeline to reconstruct bacterial outbreaks using genomics. Bioinformatics. 2023;39:btad571.

Björkman J, Nagaev I, Berg OG et al. Effects of environment on compensatory mutations to ameliorate costs of antibiotic resistance. Science. 2000;287:1479–82.

Bosi E, Fani R, Fondi M. Defining orthologs and pangenome size metrics. Methods Mol Biol. 2015;1231:191–202.

Brosch R, Gordon SV, Marmiesse M et al. A new evolutionary scenario for the Mycobacterium tuberculosis complex. Proc Natl Acad Sci USA. 2002;99:3684–9.

Cai Y, Wang Y, Shi C et al. Single-cell immune profiling reveals functional diversity of T cells in tuberculous pleural effusion. J Exp Med. 2022;219:e20211777.

Caminero JA. Multidrug-resistant tuberculosis: epidemiology, risk factors and case finding. Int J Tuberc Lung Dis. 2010;14:382–90.

Campbell F, Strang C, Ferguson N et al. When are pathogen genome sequences informative of transmission events? PLoS Pathog. 2018;14:e1006885.

Camus J-C, Pryor MJ, Médigue C et al. Re-annotation of the genome sequence of Mycobacterium tuberculosis H37Rv. Microbiology (Reading). 2002;148:2967–73.

Carey AF, Rock JM, Krieger IV et al. TnSeq of Mycobacterium tuberculosis clinical isolates reveals strain-specific antibiotic liabilities. PLoS Pathog. 2018;14:e1006939.

Castro RAD, Ross A, Kamwela L et al. The genetic background modulates the evolution of fluoroquinolone-resistance in Mycobacterium tuberculosis. Mol Biol Evol. 2020;37:195–207.

Coll F, Mallard K, Preston MD et al. SpolPred: rapid and accurate prediction of Mycobacterium tuberculosis spoligotypes from short genomic sequences. Bioinformatics. 2012;28:2991–3.

Cornejo-Granados F, López-Leal G, Mata-Espinosa DA et al. Targeted RNA-seq reveals the M. tuberculosis transcriptome from an in vivo infection model. Biology (Basel). 2021;10:848.

Couvin D, Segretier W, Stattner E et al. Novel methods included in SpolLineages tool for fast and precise prediction of Mycobacterium tuberculosis complex spoligotype families. Database (Oxford). 2020;2020:baaa108.

Dar HA, Zaheer T, Ullah N et al. Pangenome analysis of Mycobacterium tuberculosis reveals core-drug targets and screening of promising lead compounds for drug discovery. Antibiotics (Basel). 2020;9:819.

de Bernardi Schneider A, Ford CT, Hostager R et al. StrainHub: a phylogenetic tool to construct pathogen transmission networks. Bioinformatics. 2020;36:945–7.

De Maio N, Wu C-H, Wilson DJ. SCOTTI: efficient reconstruction of transmission within outbreaks with the structured coalescent. PLoS Comput Biol. 2016;12:e1005130.

Dhar S, Zhang C, Mandoiu II et al. TNet: transmission network inference using within-host strain diversity and its application to geographical tracking of COVID-19 spread. IEEE/ACM Trans Comput Biol Bioinform. 2022;19:230–42.

Didelot X, Gardy J, Colijn C. Bayesian inference of infectious disease transmission from whole-genome sequence data. Mol Biol Evol. 2014;31:1869–79.

Didelot X, Kendall M, Xu Y et al. Genomic epidemiology analysis of infectious disease outbreaks using TransPhylo. Curr Protoc. 2021;1:e60.

Dreyer V, Utpatel C, Kohl TA et al. Detection of low-frequency resistance-mediating SNPs in next-generation sequencing data of Mycobacterium tuberculosis complex strains with binoSNP. Sci Rep. 2020;10:7874.

Dumas E, Christina Boritsch E, Vandenbogaert M et al. Mycobacterial pan-genome analysis suggests important role of plasmids in the radiation of type VII secretion systems. Genome Biol Evol. 2016;8:387–402.

Earle SG, Wu C-H, Charlesworth J et al. Identifying lineage effects when controlling for population structure improves power in bacterial association studies. Nat Microbiol. 2016;1:16041.

Estévez O, Anibarro L, Garet E et al. An RNA-seq based machine learning approach identifies latent tuberculosis patients with an active tuberculosis profile. Front Immunol. 2020;11:1470.

Fenner L, Egger M, Bodmer T et al. Effect of mutation and genetic background on drug resistance in Mycobacterium tuberculosis. Antimicrob Agents Chemother. 2012;56:3047–53.

Freeman TC, Horsewell S, Patir A et al. Graphia: a platform for the graph-based visualisation and analysis of high dimensional data. PLoS Comput Biol. 2022;18:e1010310.

Freschi L, Vargas R, Husain A et al. Population structure, biogeography and transmissibility of Mycobacterium tuberculosis. Nat Commun. 2021;12:6099.

Gabbassov E, Moreno-Molina M, Comas I et al. SplitStrains, a tool to identify and separate mixed Mycobacterium tuberculosis infections from WGS data. Microb Genom. 2021;7:000607.

Gabrielian A, Engle E, Harris M et al. TB DEPOT (data exploration portal): a multi-domain tuberculosis data analysis resource. PLoS One. 2019;14:e0217410.

Gagneux S, Long CD, Small PM et al. The competitive cost of antibiotic resistance in Mycobacterium tuberculosis. Science. 2006;312:1944–6.

Galagan JE. Genomic insights into tuberculosis. Nat Rev Genet. 2014;15:307–20.

Gan M, Liu Q, Yang C et al. Deep whole-genome sequencing to detect mixed infection of Mycobacterium tuberculosis. PLoS One. 2016;11:e0159029.

Genestet C, Hodille E, Berland J-L et al. Whole-genome sequencing in drug susceptibility testing of Mycobacterium tuberculosis in routine practice in Lyon, France. Int J Antimicrob Agents. 2020;55:105912.

Giehl C, Lange C, Duarte R et al. TBNET – collaborative research on tuberculosis in Europe. Eur J Microbiol Immunol (Bp). 2012;2:264–74.

Goig GA, Cancino-Muñoz I, Torres-Puente M et al. Whole-genome sequencing of Mycobacterium tuberculosis directly from clinical samples for high-resolution genomic epidemiology and drug resistance surveillance: an observational study. Lancet Microbe. 2020;1:e175–83.

Goldstein IH, Bayer D, Barilar I et al. Using genetic data to identify transmission risk factors: statistical assessment and application to tuberculosis transmission. PLoS Comput Biol. 2022;18:e1010696.

Gómez-González PJ, Campino S, Phelan JE et al. Portable sequencing of Mycobacterium tuberculosis for clinical and epidemiological applications. Brief Bioinform. 2022;23:bbac256.

Gómez-González PJ, Perdigao J, Gomes P et al. Genetic diversity of candidate loci linked to Mycobacterium tuberculosis resistance to bedaquiline, delamanid and pretomanid. Sci Rep. 2021;11:19431.

Goossens SN, Heupink TH, De Vos E et al. Detection of minor variants in Mycobacterium tuberculosis whole genome sequencing data. Brief Bioinform. 2022;23:bbab541.

Gygli SM, Borrell S, Trauner A et al. Antimicrobial resistance in Mycobacterium tuberculosis: mechanistic and evolutionary perspectives. FEMS Microbiol Rev. 2017;41:354–73.

Hadfield J, Megill C, Bell SM et al. Nextstrain: real-time tracking of pathogen evolution. Bioinformatics. 2018;34:4121–3.

Hall MB, Rabodoarivelo MS, Koch A et al. Evaluation of nanopore sequencing for Mycobacterium tuberculosis drug susceptibility testing and outbreak investigation: a genomic analysis. Lancet Microbe. 2023;4:e84–92.

Helmy M, Awad M, Mosa KA. Limited resources of genome sequencing in developing countries: challenges and solutions. Appl Transl Genom. 2016;9:15–9.

Hunt M, Bradley P, Lapierre SG et al. Antibiotic resistance prediction for Mycobacterium tuberculosis from genome sequence data with Mykrobe. Wellcome Open Res. 2019;4:191.

Jacques P-E, Gervais AL, Cantin M et al. MtbRegList, a database dedicated to the analysis of transcriptional regulation in Mycobacterium tuberculosis. Bioinformatics. 2005;21:2563–5.

Jaillard M, Lima L, Tournoud M et al. A fast and agnostic method for bacterial genome-wide association studies: bridging the gap between k-mers and genetic events. PLoS Genet. 2018;14:e1007758.

Jandrasits C, Kröger S, Haas W et al. Computational pan-genome mapping and pairwise SNP-distance improve detection of Mycobacterium tuberculosis transmission clusters. PLoS Comput Biol. 2019;15:e1007527.

Jombart T, Cori A, Didelot X et al. Bayesian reconstruction of disease outbreaks by combining epidemiologic and genomic data. PLoS Comput Biol. 2014;10:e1003457.

Joshi KR, Dhiman H, Scaria V. tbvar: a comprehensive genome variation resource for Mycobacterium tuberculosis. Database (Oxford). 2014;2014:bat083.

Kamerbeek J, Schouls L, Kolk A et al. Simultaneous detection and strain differentiation of Mycobacterium tuberculosis for diagnosis and epidemiology. J Clin Microbiol. 1997;35:907–14.

Karikari TK, Quansah E, Mohamed WMY. Widening participation would be key in enhancing bioinformatics and genomics research in Africa. Appl Transl Genom. 2015;6:35–41.

Kato-Maeda M, Bifani PJ, Kreiswirth BN et al. The nature and consequence of genetic variability within Mycobacterium tuberculosis. J Clin Invest. 2001;107:533–7.

Kavvas ES, Catoiu E, Mih N et al. Machine learning and structural analysis of Mycobacterium tuberculosis pan-genome identifies genetic signatures of antibiotic resistance. Nat Commun. 2018;9:4306.

Kim Y, Gu C, Kim HU et al. Current status of pan-genome analysis for pathogenic bacteria. Curr Opin Biotechnol. 2020;63:54–62.

Klinkenberg D, Backer JA, Didelot X et al. Simultaneous inference of phylogenetic and transmission trees in infectious disease outbreaks. PLoS Comput Biol. 2017;13:e1005495.

Lai RPJ, Cortes T, Marais S et al. Transcriptomic characterization of tuberculous sputum reveals a host Warburg effect and microbial cholesterol catabolism. mBio. 2021;12:e0176621.

Lequime S, Bastide P, Dellicour S et al. nosoi: a stochastic agent-based transmission chain simulation framework in R. Methods Ecol Evol. 2020;11:1002–7.

Liang Q, Shang Y, Huo F et al. Assessment of current diagnostic algorithm for detection of mixed infection with Mycobacterium tuberculosis and nontuberculous mycobacteria. J Infect Public Health. 2020;13:1967–71.

López-Agudelo VA, Baena A, Barrera V et al. Dual RNA sequencing of Mycobacterium tuberculosis-infected human splenic macrophages reveals a strain-dependent host-pathogen response to infection. Int J Mol Sci. 2022;23:1803.

Lose T, van Heusden P, Christoffels A. COMBAT-TB-NeoDB: fostering tuberculosis research through integrative analysis using graph database technologies. Bioinformatics. 2020;36:982–3.

Lozano N, Lanza VF, Suárez-González J et al. Detection of minority variants and mixed infections in Mycobacterium tuberculosis by direct whole-genome sequencing on noncultured specimens using a specific-DNA capture strategy. mSphere. 2021;6:e0074421.

Malhotra S, Mugumbate G, Blundell TL et al. TIBLE: a web-based, freely accessible resource for small-molecule binding data for mycobacterial species. Database (Oxford). 2017;2017:bax041.

Meehan CJ, Goig GA, Kohl TA et al. Whole genome sequencing of Mycobacterium tuberculosis: current standards and open issues. Nat Rev Microbiol. 2019;17:533–45.

Merget B, Zilian D, Müller T et al. MycPermCheck: the Mycobacterium tuberculosis permeability prediction tool for small molecules. Bioinformatics. 2013;29:62–8.

Metri R, Hariharaputran S, Ramakrishnan G et al. SInCRe-structural interactome computational resource for Mycobacterium tuberculosis. Database (Oxford). 2015;2015:bav060.

Mikheecheva NE, Zaychikova MV, Melerzanov AV et al. A nonsynonymous SNP catalog of Mycobacterium tuberculosis virulence genes and its use for detecting new potentially virulent sublineages. Genome Biol Evol. 2017;9:887–99.

Modlin SJ, Robinhold C, Morrissey C et al. Exact mapping of Illumina blind spots in the Mycobacterium tuberculosis genome reveals platform-wide and workflow-specific biases. Microb Genom. 2021;7:mgen000465.

Morelli MJ, Thébaud G, Chadœuf J et al. A Bayesian inference framework to reconstruct transmission trees using epidemiological and genetic data. PLoS Comput Biol. 2012;8:e1002768.

Moreno-Molina M, Shubladze N, Khurtsilava I et al. Genomic analyses of Mycobacterium tuberculosis from human lung resections reveal a high frequency of polyclonal infections. Nat Commun. 2021;12:2716.

Muzzi A, Masignani V, Rappuoli R. The pan-genome: towards a knowledge-based discovery of novel targets for vaccines and antibacterials. Drug Discov Today. 2007;12:429–39.

Neher RA, Bedford T. Real-time analysis and visualization of pathogen sequence data. J Clin Microbiol. 2018;56:10. https://doi.org/10.1128/jcm.00480-18

Ochoa-Montaño B, Mohan N, Blundell TL. CHOPIN: a web resource for the structural and functional proteome of Mycobacterium tuberculosis. Database (Oxford). 2015;2015:bav026.

Pan J, Li X, Zhang M et al. TransFlow: a Snakemake workflow for transmission analysis of Mycobacterium tuberculosis whole-genome sequencing data. Bioinformatics. 2023a;39:btac785.

Pan J, Zhang X, Xu J et al. Landscape of exhausted T cells in tuberculosis revealed by single-cell sequencing. Microbiol Spectr. 2023b;11:e0283922.

Peker N, Schuele L, Kok N et al. Evaluation of whole-genome sequence data analysis approaches for short- and long-read sequencing of Mycobacterium tuberculosis. Microb Genom. 2021;7:000695.

Permana B, Beatson SA, Forde BM. GraphSNP: an interactive distance viewer for investigating outbreaks and transmission networks using a graph approach. BMC Bioinf. 2023;24:209.

Phelan J, O'Sullivan DM, Machado D et al. The variability and reproducibility of whole genome sequencing technology for detecting resistance to anti-tuberculous drugs. Genome Med. 2016;8:132.

Phelan JE, O'Sullivan DM, Machado D et al. Integrating informatics tools and portable sequencing technology for rapid detection of resistance to anti-tuberculous drugs. Genome Med. 2019;11:41.

Pisu D, Huang L, Grenier JK et al. Dual RNA-seq of Mtb-infected macrophages in vivo reveals ontologically distinct host–pathogen interactions. Cell Rep. 2020a;30:335–50.e4.

Pisu D, Huang L, Narang V et al. Single cell analysis of M. tuberculosis phenotype and macrophage lineages in the infected lung. J Exp Med. 2021;218:e20210615.

Pisu D, Huang L, Rin Lee BN et al. Dual RNA-sequencing of Mycobacterium tuberculosis-infected cells from a murine infection model. STAR Protoc. 2020b;1:100123.

Pisu D, Russell DG. Protocol for multi-modal single-cell RNA sequencing on M. tuberculosis-infected mouse lungs. STAR Protoc. 2023;4:102102.

Quan TP, Bawa Z, Foster D et al. Evaluation of whole-genome sequencing for mycobacterial species identification and drug susceptibility testing in a clinical setting: a large-scale prospective assessment of performance against line probe assays and phenotyping. J Clin Microbiol. 2018;56:e01480-17.

Radusky L, Defelipe LA, Lanzarotti E et al. TuberQ: a Mycobacterium tuberculosis protein druggability database. Database (Oxford). 2014;2014:bau035.

Repasy T, Lee J, Marino S et al. Intracellular bacillary burden reflects a burst size for Mycobacterium tuberculosis in vivo. PLoS Pathog. 2013;9:e1003190.

Richardson M, Carroll NM, Engelke E et al. Multiple Mycobacterium tuberculosis strains in early cultures from patients in a high-incidence community setting. J Clin Microbiol. 2002;40:2750–4.

Rienksma RA, Suarez-Diez M, Mollenkopf H-J et al. Comprehensive insights into transcriptional adaptation of intracellular mycobacteria by microbe-enriched dual RNA sequencing. BMC Genomics. 2015;16:34.

Rivière E, Heupink TH, Ismail N et al. Capacity building for whole genome sequencing of Mycobacterium tuberculosis and bioinformatics in high TB burden countries. Brief Bioinform. 2021;22:bbaa246.

Robert A, Funk S, Kucharski AJ. o2geosocial: reconstructing who-infected-whom from routinely collected surveillance data. F1000Res. 2021;10:31.

Romero-Severson E, Skar H, Bulla I et al. Timing and order of transmission events is not directly reflected in a pathogen phylogeny. Mol Biol Evol. 2014;31:2472–82.

Rosenthal A, Gabrielian A, Engle E et al. The TB portals: an open-access, web-based platform for global drug-resistant-tuberculosis data sharing and analysis. J Clin Microbiol. 2017;55:3267–82.

Ruesen C, Riza AL, Florescu A et al. Linking minimum inhibitory concentrations to whole genome sequence-predicted drug resistance in Mycobacterium tuberculosis strains from Romania. Sci Rep. 2018;8:9676.

Saavedra Cervera
 
B
,
López
 
MG
,
Chiner-Oms
 
Á
 et al.  
Fine-grain population structure and transmission patterns of Mycobacterium tuberculosis in southern Mozambique, a high TB/HIV burden area
.
Microb Genom
.
2022
;
8
:
mgen000844

Sahajpal
 
R
,
Kandoi
 
G
,
Dhiman
 
H
 et al.  
HGV&TB: a comprehensive online resource on human genes and genetic variants associated with tuberculosis
.
Database
.
2014
;
2014
:
bau112
.

Said Mohammed
 
K
,
Kibinge
 
N
,
Prins
 
P
 et al.  
Evaluating the performance of tools used to call minority variants from whole genome short-read data
.
Wellcome Open Res
.
2018
;
3
:
21
.

Sanger
 
F
,
Nicklen
 
S
,
Coulson
 
AR
.
DNA sequencing with chain-terminating inhibitors
.
Proc Natl Acad Sci USA
.
1977
;
74
:
5463
7
.

Shabbeer
 
A
,
Cowan
 
LS
,
Ozcaglar
 
C
 et al.  
TB-lineage: an online tool for classification and analysis of strains of Mycobacterium tuberculosis complex
.
Infect Genet Evol
.
2012
;
12
:
789
97
.

Skums P, Mohebbi F, Tsyvina V et al. SOPHIE: viral outbreak investigation and transmission history reconstruction in a joint phylogenetic and network theory framework. Cell Syst. 2022;13:844–56.e4.

Skums P, Zelikovsky A, Singh R et al. QUENTIN: reconstruction of disease transmissions from viral quasispecies genomic data. Bioinformatics. 2018;34:163–70.

Sobkowiak B, Romanowski K, Sekirov I et al. Comparing Mycobacterium tuberculosis transmission reconstruction models from whole genome sequence data. Epidemiol Infect. 2023;151:e105.

Stimson J, Gardy J, Mathema B et al. Beyond the SNP threshold: identifying outbreak clusters using inferred transmissions. Mol Biol Evol. 2019;36:587–603.

Streicher EM, Bergval I, Dheda K et al. Mycobacterium tuberculosis population structure determines the outcome of genetics-based second-line drug resistance testing. Antimicrob Agents Chemother. 2012;56:2420–7.

Supply P, Magdalena J, Himpens S et al. Identification of novel intergenic repetitive units in a mycobacterial two-component system operon. Mol Microbiol. 1997;26:991–1003.

Supply P, Mazars E, Lesjean S et al. Variable human minisatellite-like regions in the Mycobacterium tuberculosis genome. Mol Microbiol. 2000;36:762–71.

Tsolaki AG, Hirsh AE, DeRiemer K et al. Functional and evolutionary genomics of Mycobacterium tuberculosis: insights from genomic deletions in 100 strains. Proc Natl Acad Sci USA. 2004;101:4865–70.

Tyler AD, Christianson S, Knox NC et al. Comparison of sample preparation methods used for the next-generation sequencing of Mycobacterium tuberculosis. PLoS One. 2016;11:e0148676.

Usmani SS, Kumar R, Kumar V et al. AntiTbPdb: a knowledgebase of anti-tubercular peptides. Database (Oxford). 2018;2018:bay025.

van Beek J, Haanperä M, Smit PW et al. Evaluation of whole genome sequencing and software tools for drug susceptibility testing of Mycobacterium tuberculosis. Clin Microbiol Infect. 2019;25:82–86.

van Embden JD, Cave MD, Crawford JT et al. Strain identification of Mycobacterium tuberculosis by DNA fingerprinting: recommendations for a standardized methodology. J Clin Microbiol. 1993;31:406–9.

van Rie A, Victor TC, Richardson M et al. Reinfection and mixed infection cause changing Mycobacterium tuberculosis drug-resistance patterns. Am J Respir Crit Care Med. 2005;172:636–42.

Walker TM, Ip CLC, Harrell RH et al. Whole-genome sequencing to delineate Mycobacterium tuberculosis outbreaks: a retrospective observational study. Lancet Infect Dis. 2013;13:137–46.

Wang L, Ma H, Wen Z et al. Single-cell RNA-sequencing reveals heterogeneity and intercellular crosstalk in human tuberculosis lung. J Infect. 2023;87:373–84.

Wang L, Yang J, Chen L et al. Whole-genome sequencing of Mycobacterium tuberculosis for prediction of drug resistance. Epidemiol Infect. 2022;150:e22.

Wang Y, Jiang Z, Liang P et al. TB-DROP: deep learning-based drug resistance prediction of Mycobacterium tuberculosis utilizing whole genome mutations. BMC Genomics. 2024;25:167.

Westermann AJ, Gorski SA, Vogel J. Dual RNA-seq of pathogen and host. Nat Rev Microbiol. 2012;10:618–30.

Wilm A, Aw PPK, Bertrand D et al. LoFreq: a sequence-quality aware, ultra-sensitive variant caller for uncovering cell-population heterogeneity from high-throughput sequencing datasets. Nucleic Acids Res. 2012;40:11189–201.

Winglee K, McDaniel CJ, Linde L et al. Logically inferred tuberculosis transmission (LITT): a data integration algorithm to rank potential source cases. Front Public Health. 2021;9:667337.

Wirth T, Wong V, Vandenesch F et al. Applied phyloepidemiology: detecting drivers of pathogen transmission from genomic signatures using density measures. Evol Appl. 2020;13:1513–25.

Wymant C, Hall M, Ratmann O et al. PHYLOSCANNER: inferring transmission from within- and between-host pathogen genetic diversity. Mol Biol Evol. 2018;35:719–33.

Yang C, Sobkowiak B, Naidu V et al. Phylogeography and transmission of M. tuberculosis in Moldova: a prospective genomic analysis. PLoS Med. 2022a;19:e1003933.

Yang T, Gan M, Liu Q et al. SAM-TB: a whole genome sequencing data analysis website for detection of Mycobacterium tuberculosis drug resistance and transmission. Brief Bioinform. 2022b;23:bbac030.

Yang T, Zhong J, Zhang J et al. Pan-genomic study of Mycobacterium tuberculosis reflecting the primary/secondary genes, generality/individuality, and the interconversion through copy number variations. Front Microbiol. 2018;9:1886.

Yang Y, Walker TM, Walker AS et al. DeepAMR for predicting co-occurrent resistance of Mycobacterium tuberculosis. Bioinformatics. 2019;35:3240–9.

Yang Z, Wang C, Liu L et al. CARMA is a new Bayesian model for fine-mapping in genome-wide association meta-analyses. Nat Genet. 2023;55:1057–65.

Yoo R, Rychel K, Poudel S et al. Machine learning of all Mycobacterium tuberculosis H37Rv RNA-seq data reveals a structured interplay between metabolism, stress response, and infection. mSphere. 2022;7:e0003322.

Ypma RJF, van Ballegooijen WM, Wallinga J. Relating phylogenetic trees to transmission trees of infectious disease outbreaks. Genetics. 2013;195:1055–62.

Zakham F, Laurent S, Esteves Carreira AL et al. Whole-genome sequencing for rapid, reliable and routine investigation of Mycobacterium tuberculosis transmission in local communities. New Microbes New Infect. 2019;31:100582.

Zetola NM, Shin SS, Tumedi KA et al. Mixed Mycobacterium tuberculosis complex infections and false-negative results for rifampin resistance by GeneXpert MTB/RIF are associated with poor clinical outcomes. J Clin Microbiol. 2014;52:2422–9.

Zong Z, Huo F, Shi J et al. Relapse versus reinfection of recurrent tuberculosis patients in a National Tuberculosis Specialized Hospital in Beijing, China. Front Microbiol. 2018;9:1858.

This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://dbpia.nl.go.kr/journals/pages/open_access/funder_policies/chorus/standard_publication_model)