Abstract

Here, we report the first telomere-to-telomere genome assembly of matsutake (Tricholoma matsutake), which consists of 13 sequences (spanning 161.0 Mb) and a 76 kb circular mitochondrial genome. All 13 sequences were supported by telomeric repeats at both ends. GC-rich regions are located in the middle of the sequences and are enriched with long interspersed nuclear elements (LINEs). Repetitive sequences, including long terminal repeats (LTRs) and LINEs, occupy 71.6% of the genome. A total of 21,887 potential protein-coding genes were predicted. The genomic data reported in this study provide not only matsutake gene sequences but also genome structures and intergenic sequences. This information will be a valuable reference for future genetic, genomic, and evolutionary studies of matsutake, and may ultimately facilitate the conservation of this vulnerable genetic resource.

1. Introduction

Matsutake (Tricholoma matsutake [S. Ito et Imai] Singer), belonging to the phylum Basidiomycota, is an ectomycorrhizal fungus that coexists with Pinaceae and Fagaceae trees in a symbiotic association.1,2 In the field, two spores of matsutake fuse together and grow to form a ‘shiro’, which is a symbiotic entity formed between matsutake and its host tree. One shiro produces a number of sporocarps during the growing season. The sporocarp of matsutake has been considered one of the most valuable components of traditional Japanese cuisine since ancient times, as mentioned in Manyo-shu (a series of books of Japanese poetry compiled around 700 AD in Japan), owing to its pleasant aroma, which is largely attributed to 1-octen-3-ol (also known as matsutakeol)3,4; however, sporocarps are non-culturable. In 2019, the International Union for Conservation of Nature categorized matsutake as vulnerable. The production of sporocarps has drastically decreased in recent years5 because of the deterioration of its growing environment. Safeguarding the production and conservation of matsutake requires an understanding of its life cycle and life history, which in turn requires genomic analysis.

Four assemblies of the matsutake genome are currently available in a public DNA database.6,7 However, the sequences are highly fragmented because the contigs are enormous in number (2,545–88,884) and short (N50 length = 2.9–320.9 kb), thus providing insufficient genome coverage even though comprehensive protein-coding genes might be represented by the sequences.6 Moreover, because retrotransposons such as MarY1 span ~6 kb in length and are dispersed throughout the matsutake genome,8 a full-length genome assembly may not be achievable with short-read and error-prone long-read sequencing technologies, both of which were employed to construct the four genome assemblies. Recent advanced sequencing technologies, e.g. high-fidelity long-read (HiFi) technology (PacBio, Menlo Park, CA, USA) together with ultralong-read sequencing (Oxford Nanopore Technologies, Oxford, UK), PCR-free sequencing (Illumina, San Diego, CA, USA), high-throughput chromosome conformation capture (Arima Genomics, Carlsbad, CA, USA), optical maps (Bionano Genomics, San Diego, CA, USA), and single-cell DNA template strand sequencing (10X Genomics, Pleasanton, CA, USA), have made it possible to span repetitive sequences in genomes. These technologies contributed to establishing complete gapless assemblies of the human haploid genome at the telomere-to-telomere level,9 in which a single contig corresponds to a single chromosome.

In this study, we applied the HiFi technology to address the complexity of the matsutake genome. Using this technology, we established 13 telomere-to-telomere sequences, which would provide not only matsutake gene sequences but also the genome structures and the intergenic sequences. Overall, this study represents a milestone in the cytogenetics-, genetics-, and genomics-focussed research on matsutake mushroom.

2. Materials and methods

2.1. Fungus material and DNA extraction

Two sporocarps, which were probably ramets derived from a single shiro (radius > 2 m) that has been generating sporocarps for more than 20 years,10 were collected from Ina, Nagano, Japan. The sporocarps were flash-frozen in liquid nitrogen, dried under vacuum, and then stored at room temperature until needed for DNA extraction.

Genomic DNA was extracted from the dried stipes using the cetyltrimethylammonium bromide (CTAB) method.11 The concentration of the extracted DNA was measured using the Qubit dsDNA BR assay kit (Thermo Fisher Scientific, Waltham, MA, USA), and DNA fragment length was evaluated by agarose gel electrophoresis with Pippin Pulse (Sage Science, Beverly, MA, USA).

2.2. DNA sequencing

Genomic DNA was subjected to HiFi SMRTbell library construction using the SMRTbell Express Template Prep Kit 2.0 (PacBio), according to the manufacturer’s instructions, with a minor modification. Because the genomic DNA was degraded, the DNA shearing step recommended in the protocol was skipped. The resultant DNA was fractionated with BluePippin (Sage Science) to eliminate fragments less than 10 kb in size. The DNA libraries prepared from the two sporocarps were indexed with unique barcode adapters, and sequenced on a single SMRT cell 8M on the Sequel IIe system (PacBio).

2.3. Genome assembly and gene annotation

Using the HiFi reads obtained from the Sequel IIe system (PacBio), the genome size and heterozygosity of matsutake were estimated with GCE12 and GenomeScope,13 respectively, based on k-mer frequency (k = 21) calculated with Jellyfish14 (version 2.3.0). The reads were assembled using hifiasm15 (version 0.16.1), with default parameters or with the primary mode. Assembly completeness was evaluated with Benchmarking Universal Single-Copy Orthologs (BUSCO)16 (version 5.2.2; default parameters) using lineage dataset agaricales_odb10 (eukaryota, 2020-08-05). Telomere sequences containing repeats of a 6 bp motif (5ʹ-TTAGGG-3ʹ) were searched for using the search subcommand of tidk (https://github.com/tolkit/telomeric-identifier) with a window size of 10,000. Contig connections were evaluated with D-GENIES17 or PCR as described by Shirasawa et al.18 with a pair of primers (5ʹ-TTTGCTGGAAACACAGTAAACTACA-3ʹ and 5ʹ-CTGGGTAATCTTGTGAAACTCTGTC-3ʹ). Amplified DNA was electrophoresed with Genomic DNA ScreenTape on the 4200 TapeStation System (Agilent Technologies, Santa Clara, CA, USA). Potentially contaminating sequences were identified by a sequence similarity search against UniProtKB19 using DIAMOND20 with an E-value cutoff of <1E-10.
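As a rough illustration of the k-mer-based estimate described above, the following minimal sketch counts canonical k-mers in a set of reads and divides the total number of k-mers by the depth of the main histogram peak. It does not reproduce GCE or GenomeScope, both of which fit statistical models that account for sequencing errors, heterozygosity, and repeats. The function names, the exclusion of depth-1 k-mers as likely errors, and the toy inputs are assumptions made for illustration.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")
_BASES = set("ACGT")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def kmer_histogram(reads, k=21):
    """Map each k-mer depth to the number of distinct canonical k-mers seen at that depth."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if set(kmer) <= _BASES:  # skip k-mers containing N or other symbols
                counts[canonical(kmer)] += 1
    return Counter(counts.values())


def naive_genome_size(histogram, min_depth=2):
    """Estimate genome size as (total k-mers) / (depth of the tallest histogram peak).

    Depths below min_depth are ignored as probable sequencing errors (an assumption).
    """
    usable = {depth: n for depth, n in histogram.items() if depth >= min_depth}
    if not usable:
        raise ValueError("no k-mers at or above min_depth")
    peak_depth = max(usable, key=usable.get)
    total_kmers = sum(depth * n for depth, n in usable.items())
    return total_kmers / peak_depth
```

With real HiFi data the histogram would normally come from a dedicated k-mer counter such as Jellyfish rather than from pure Python, which is far too slow for gigabases of reads.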

Nuclear genes were predicted with GeMoMa (version 1.9)21 by mapping the genes predicted in a previous matsutake genome assembly, Trima3,6 with MMseqs.22 Mitochondrial genes were predicted with Artemis,23 in accordance with the gene sequences reported in previous mitochondrial genome assemblies (accession number: NC_028135). The predicted genes were functionally annotated with emapper24 (version 2.1.6; search option: mmseqs) implemented in EggNOG,25 and with a DIAMOND20 (version 2.0.13; more sensitive mode) search against the UniProtKB19 database. Repetitive sequences in the assembly were identified with RepeatMasker (https://www.repeatmasker.org) (version 4.1.2; parameters: -poly and -xsmall) using repeat sequences registered in Repbase26 and a de novo repeat library built with RepeatModeler (https://www.repeatmasker.org) (version 2.0.2a; default parameters). Sequences showing similarity to MarY1 (accession number: AB028236; 6047 bp) and its long terminal repeats (LTRs; 426 bp) were searched for with BLASTN.27

3. Results

3.1. DNA sequencing, data analysis, and genome assembly

Genomic DNA was extracted from two dried sporocarps (samples A and B) of matsutake. The amount of DNA extracted from each sample (9 µg) was sufficient for library construction; however, because of degradation (Supplementary Fig. S1), the extracted DNA was used for library preparation without shearing. The resultant libraries were sequenced on a SMRT Cell 8M to obtain 9.5 Gb (sample A) and 7.8 Gb (sample B) of data, with N50 lengths of 11 kb (sample A) and 10 kb (sample B). The k-mer analysis detected two peaks (Supplementary Fig. S2), indicating that the haploid genome size of matsutake was 149 Mb and the level of heterozygosity was 1.05%. The sequence reads of each sample were assembled separately to obtain two sets of contigs: 182 contigs (165.5 Mb) for sample A, and 146 contigs (162.9 Mb) for sample B (Supplementary Table S1). In parallel, haplotype-phased contigs were generated for samples A and B (Supplementary Table S1): 163.3 Mb in haplotype 1 of sample A (Ahap1); 158.8 Mb in Ahap2; 163.8 Mb in Bhap1; and 157.0 Mb in Bhap2.
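The N50 lengths quoted above follow the usual definition: lengths are sorted in decreasing order and accumulated until at least half of the total length is covered, and the length at which this happens is the N50. A minimal sketch is given below; the function name and the toy lengths are illustrative only and are not taken from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of read or contig lengths."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy example: total = 20, so the N50 is the length at which the running sum reaches 10.
toy_lengths = [2, 3, 4, 5, 6]
toy_n50 = n50(toy_lengths)
```

The same function applies unchanged to contig lengths, which is how the N50 values of assemblies are usually reported.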

Next, we searched for the telomeric motif, (TTAGGG)n, in the contigs (Fig. 1). In sample A, the telomeric motif was found at both ends of nine contigs (A2, A6, A8, A11, A12, A13, A14, A15, and A16) and at one end of six contigs (A1, A3, A4, A7, A10, and A22). In sample B, the motif was found at both ends of 10 contigs (B1, B2, B3, B4, B5, B7, B8, B9, B11, and B12) and at one end of two contigs (B6 and B13). The length of the telomeric sequence ranged from 96 bp (16 repeats) to 186 bp (31 repeats).
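To illustrate the kind of check described above, the following minimal sketch reports the longest tandem run of the 6 bp motif near each end of a contig. It is not the tidk implementation; the function names, the default 1 kb end window, and the toy contig are assumptions made for illustration. Both TTAGGG and its reverse complement CCCTAA are tested, because the strand on which the repeat appears depends on the orientation of the contig end.

```python
import re

MOTIF = "TTAGGG"
MOTIF_RC = "CCCTAA"  # reverse complement of TTAGGG


def longest_tandem_run(seq, motif):
    """Return the largest number of back-to-back copies of motif found in seq."""
    runs = re.findall(f"(?:{motif})+", seq)
    return max((len(run) // len(motif) for run in runs), default=0)


def telomere_repeats_at_ends(contig, window=1000):
    """Return (repeats near the start, repeats near the end) of a contig."""
    contig = contig.upper()
    head, tail = contig[:window], contig[-window:]

    def best(segment):
        return max(longest_tandem_run(segment, MOTIF),
                   longest_tandem_run(segment, MOTIF_RC))

    return best(head), best(tail)


# Toy contig: 16 repeats (96 bp) at one end and 31 repeats (186 bp) at the other.
toy_contig = "CCCTAA" * 16 + "ACGT" * 50 + "TTAGGG" * 31
toy_ends = telomere_repeats_at_ends(toy_contig, window=200)
```

A contig would be treated as telomere-to-telomere under this sketch only when both values are non-zero; any minimum repeat count used as a threshold would be a further assumption.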

Figure 1.

Comparative map of contigs of samples A and B. Dots indicate sequences with a sequence identity of ≥75% between the two samples. Red and blue arrows indicate telomeric motifs detected at the ends of contigs of samples A and B, respectively. Black numbers in the plot indicate sequence names in the final assembly (TMA_r1.1), and red and blue numbers by the arrows indicate repeat numbers of telomeric motifs. Contigs A5 and B10 are not shown because of their short sequence lengths (<1 Mb).

Comparison of the two sets of genome assemblies revealed 10 pairs of aligned contigs (A1-B1, A2-B9, A6-B11, A8-B3, A10-B4, A11-B8, A12-B2, A13-B7, A14-B5, and A15-B12) (Fig. 1). Three contigs of sample A (A3, A4, and A9) covered the entire sequence of one contig of sample B (B13). Furthermore, three contigs of sample A (A7, A16, and A22) corresponded to one contig of sample B (B6). Thus, we inferred that contigs A3, A4, and A9 had been left unjoined in the assembly of sample A, and that contig B6 was misassembled at the telomeres. Therefore, we joined contigs A3, A4, and A9 with 100 Ns to establish a single contig, and left contigs A7 and A22 separated from A16. The junctions between A7 and A22 and between A4 and A9 were supported by haplotype-phased contigs (Supplementary Fig. S3A), and the remaining junction between A3 and A9 was verified with PCR (Supplementary Fig. S3B).
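The joining step can be pictured with the short sketch below, which concatenates contig sequences in a given order and places a run of 100 Ns at each junction. The function names and the option to reverse-complement a piece are hypothetical, and the toy sequences merely stand in for contigs such as A3, A4, and A9, whose order and orientation are not restated here.

```python
GAP = "N" * 100  # gap of unknown sequence placed at each junction

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (N stays N)."""
    return seq.translate(_COMPLEMENT)[::-1]


def join_contigs(pieces, gap=GAP):
    """Join (sequence, is_reversed) pairs in the given order, separated by gap."""
    oriented = [reverse_complement(seq) if is_reversed else seq
                for seq, is_reversed in pieces]
    return gap.join(oriented)


# Toy example with three short "contigs"; the middle one is flipped before joining.
toy_scaffold = join_contigs([("AACC", False), ("GGTT", True), ("ACGT", False)])
```

Using a fixed-length gap marks the junction as unresolved sequence rather than implying a measured distance between the joined contigs.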

Finally, 13 contigs spanning 161.0 Mb were obtained, all of which were supported by telomeric motifs at both ends. The remaining contigs were regarded as potential contaminants and excluded from subsequent analyses because they showed no sequence similarity to the nuclear protein-coding genes of Trima3. The 13 contigs contained 94.2% complete BUSCOs. The final assembly was designated as TMA_r1.1, and the contigs were named TMA_r1.1ch01 to TMA_r1.1ch13 in order of decreasing sequence length (Fig. 1, Table 1). The GC content was ca. 45% over the entire genome, with one peak (~55%) in each chromosome, except chromosome 1, which showed two peaks (Fig. 2). In addition, we identified a 76,067 bp circular contig, which represented the mitochondrial genome of matsutake (Tma1.0mito).
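The GC profile described above, which Fig. 2 plots in 100 kb windows, can be reproduced in outline with the following sketch. It reports GC content for non-overlapping windows along a sequence. The function name and the toy sequence are hypothetical, and excluding ambiguous bases such as N from the denominator is an assumption rather than a detail given in the text.

```python
def gc_content_windows(seq, window=100_000):
    """Return a list of (window start, GC fraction) for non-overlapping windows.

    The GC fraction is None for windows that contain no A, C, G, or T.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    seq = seq.upper()
    profile = []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        gc = chunk.count("G") + chunk.count("C")
        at = chunk.count("A") + chunk.count("T")
        called = gc + at  # ambiguous bases are left out of the denominator
        profile.append((start, gc / called if called else None))
    return profile


# Toy example with a 10 bp window: an AT-only window, a GC-only window, and a gap.
toy_profile = gc_content_windows("AT" * 5 + "GC" * 5 + "NNNN", window=10)
```

Peaks in such a profile would then be located by comparing each window against the genome-wide average, which is ca. 45% in this assembly.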

Table 1.

Statistics of the matsutake genome assembly

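| Chromosome | Sequence length (bp) | No. of genes | Contigs of sample A | Contigs of sample B |
| --- | --- | --- | --- | --- |
| Tma1.0ch01 | 19,249,005 | 2,139 | A3, A4, A9 | B13 |
| Tma1.0ch02 | 13,874,649 | 1,625 | A6 | B11 |
| Tma1.0ch03 | 13,809,479 | 2,285 | A16 | B6 (bottom) |
| Tma1.0ch04 | 13,409,878 | 2,032 | A11 | B8 |
| Tma1.0ch05 | 12,860,286 | 1,968 | A10 | B4 |
| Tma1.0ch06 | 12,376,747 | 1,991 | A8 | B3 |
| Tma1.0ch07 | 12,297,795 | 1,421 | A7, A22 | B6 (top) |
| Tma1.0ch08 | 11,264,506 | 1,594 | A1 | B1 |
| Tma1.0ch09 | 10,996,241 | 1,368 | A15 | B12 |
| Tma1.0ch10 | 10,915,969 | 1,520 | A13 | B7 |
| Tma1.0ch11 | 10,203,996 | 1,247 | A12 | B2 |
| Tma1.0ch12 | 9,912,917 | 1,205 | A14 | B5 |
| Tma1.0ch13 | 9,869,253 | 1,492 | A2 | B9 |
| Total | 161,040,721 | 21,887 | | |
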
Figure 2.

Features of the matsutake genome. Bars indicate the GC content and the numbers of genes, LINEs, and LTRs within a 100 kb window.

3.2. Repetitive sequence analysis

Repetitive sequences occupied a total physical distance of 115.2 Mb (71.6%) in the genome assembly (TMA_r1.1; 161.0 Mb). Nine major types of repeats were identified in varying proportions (Table 2). The dominant repeat types in the chromosome sequences were LTRs (60.2 Mb) and long interspersed nuclear elements (LINEs; 8.9 Mb). LINEs were predominant in regions with high GC content in all chromosomes, whereas LTR retrotransposons were predominant in regions with low GC content (Fig. 2). Repeat sequences unavailable in public databases totalled 40.9 Mb. Among the LTR retrotransposons, MarY1, which has been extensively studied to date, and its terminal repeats were present as 683 and 3,240 copies, respectively, across all 13 chromosomes.

Table 2.

Repetitive sequences in the matsutake genome

| Type of repetitive sequence | Copy number | Length (bp) | Proportion of genome (%) |
| --- | --- | --- | --- |
| SINEs | 13 | 757 | >0.0 |
| LINEs | 7,513 | 8,876,043 | 5.5 |
| LTR elements | 41,308 | 60,201,805 | 37.4 |
| DNA transposons | 10,860 | 7,871,873 | 4.9 |
| Small RNA | 2,342 | 1,936,958 | 1.2 |
| Satellites | 363 | 74,650 | >0.0 |
| Simple repeats | 9,342 | 407,271 | 0.3 |
| Low complexity | 981 | 48,999 | >0.0 |
| Unclassified | 111,844 | 40,948,943 | 25.4 |

3.3. Gene prediction and annotation

TMA_r1.1 was predicted to contain a total of 21,887 protein-coding genes (Table 1). These predicted genes possessed 93.0% complete BUSCOs. Additionally, sequence alignment revealed that of the 22,885 genes predicted in the previous assembly (Trima3), 22,152 were represented in the current assembly (TMA_r1.1).

4. Discussion

This study presents 13 telomere-to-telomere genome sequences of matsutake (Fig. 1, Table 1). In addition to the telomeric repeat motifs at the ends of the sequences, GC-rich regions were found at a single position in all chromosomes, except chromosome 1, which had two GC-rich regions (Fig. 2). Interestingly, the GC-rich regions were enriched with LINEs but devoid of LTRs (Fig. 2). Together, these observations suggest that GC-rich regions might represent centromeres, and that chromosome 1 is likely a dicentric chromosome formed by the telomeric fusion of two chromosomes. We also compared the genome assemblies generated from two independent data sets (samples A and B) (Fig. 1). Consequently, it was possible to identify a misassembled region and an unassembled region (Table 1), which led to the establishment of a telomere-to-telomere genome assembly. To the best of our knowledge, the haploid chromosome number of matsutake (n = 7) has been reported in only one study to date.28 Constructing a telomere-to-telomere assembly could serve as an alternative to karyotyping for proposing the chromosome number of a species for which no chromosome information is available. Further chromosome observations would be required to validate this assumption and characterize the matsutake chromosomes.

The telomere-to-telomere genome assembly generated in this study spans a physical distance of 161.0 Mb. Although the assembly size was 8% larger than the estimated size of 149.0 Mb (Supplementary Fig. S2), such discrepancies have been observed in other organisms, depending on the species and estimation method.29 The genome size of matsutake is larger than that of other mushroom species6,7 because of the high proportion of repetitive sequences (Table 2).30 Owing to its high content of repetitive sequences (Table 2) and high heterozygosity (Supplementary Fig. S2), the matsutake genome could not be fully sequenced with short-read and error-prone long-read sequencing technologies. The HiFi sequencing technology (~10 kb read length) employed in this study likely helped overcome the problem posed by repetitive sequences, such as MarY1 (~6 kb), thus enabling the construction of the telomere-to-telomere genome assembly. Owing to the long contigs and high genome coverage, 21,887 genes were predicted in the matsutake genome.

The genome sequences and predicted genes could help us understand the ecophysiology of a shiro and thus reveal the mechanism of sporocarp formation. The highly contiguous sequences would provide not only the matsutake genes but also the genome structure and the intergenic sequences, which could contribute to biological and evolutionary studies of mushrooms. Whole-genome sequencing analysis of matsutake lines would provide sequence differences between alleles and their chromosomal locations. This information could be used to reveal the genetic diversity of matsutake in nature, conserve its genetic resources, and ensure its production. Furthermore, genetic analyses, e.g. genome-wide association studies, could reveal the genetic mechanisms underlying phenotypic variations in the physiological and metabolomic traits of matsutake. As mentioned above, the matsutake genome assembly constructed in this study could serve as a reference for further genomic and genetic studies.

Acknowledgements

We thank Prof. S. Kuraku (National Institute of Genetics, Japan) for helpful discussions; T. Kurokochi for providing the matsutake samples; and Y. Kishida, C. Minami, K. Ozawa, H. Tsuruoka, and A. Watanabe (Kazusa DNA Research Institute) for technical assistance.

Data availability

Raw sequence reads were deposited in the Sequence Read Archive (SRA) database of the DNA Data Bank of Japan (DDBJ) under the accession number DRA014434. Assembled sequences are available at DDBJ (accession numbers AP026538 - AP026551) and Plant GARDEN (https://plantgarden.jp).

Funding

This study was supported in part by JSPS KAKENHI (16K20964, 20H00429, 22H05172, and 22H05181) and the Kazusa DNA Research Institute Foundation.

Conflict of interest

None declared.

References

1. Yamada, A., Endo, N., Murata, H., Ohta, A., and Fukuda, M. 2014, Tricholoma matsutake Y1 strain associated with Pinus densiflora shows a gradient of in vitro ectomycorrhizal specificity with Pinaceae and oak hosts, Mycoscience, 55, 27–34.

2. Van Gevelt, T. 2014, The role of state institutions in non-timber forest product commercialisation: a case study of Tricholoma matsutake in South Korea, Int. For. Rev., 16, 1–13.

3. Iwade, I. 1936, Über die charakteristischen Bestandteile der höhren-pilze (II), J. Jpn. For. Soc., 18, 528–36.

4. Murahashi, S. 1938, Uber die riechstoffe des matsutake (Armillaria Matsutake Ito et Imai Agaricaceae), Sci. Pap. Inst. Phys. Chem. Res., 34, 155–72.

5. Yamanaka, T., Yamada, A., and Furukawa, H. 2020, Advances in the cultivation of the highly-prized ectomycorrhizal mushroom Tricholoma matsutake, Mycoscience, 61, 49–57.

6. Miyauchi, S., Kiss, E., Kuo, A., et al. 2020, Large-scale genome sequencing of mycorrhizal fungi provides insights into the early evolution of symbiotic traits, Nat. Commun., 11, 5125.

7. Li, H., Wu, S., Ma, X., et al. 2018, The genome sequences of 90 mushrooms, Sci. Rep., 8, 9982.

8. Murata, H. and Yamada, A. 2000, marY1, a member of the gypsy group of long terminal repeat retroelements from the ectomycorrhizal basidiomycete Tricholoma matsutake, Appl. Environ. Microbiol., 66, 3642–5.

9. Nurk, S., Koren, S., Rhie, A., et al. 2022, The complete sequence of a human genome, Science, 376, 44–53.

10. Kurokochi, H., Zhang, S., Takeuchi, Y., Tan, E., Asakawa, S., and Lian, C. 2017, Local-level genetic diversity and structure of Matsutake mushroom (Tricholoma matsutake) populations in Nagano Prefecture, Japan, revealed by 15 microsatellite markers, J. Fungi, 3, 23.

11. Doyle, J.J. and Doyle, J.L. 1990, Isolation of plant DNA from fresh tissue, Focus, 12, 13–5.

12. Liu, B., Shi, Y., Yuan, J., et al. 2013, Estimation of genomic characteristics by analyzing k-mer frequency in de novo genome projects, arXiv:1308.2012.

13. Vurture, G.W., Sedlazeck, F.J., Nattestad, M., et al. 2017, GenomeScope: fast reference-free genome profiling from short reads, Bioinformatics, 33, 2202–4.

14. Marçais, G. and Kingsford, C. 2011, A fast, lock-free approach for efficient parallel counting of occurrences of k-mers, Bioinformatics, 27, 764–70.

15. Cheng, H., Concepcion, G.T., Feng, X., Zhang, H., and Li, H. 2021, Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm, Nat. Methods, 18, 170–5.

16. Simão, F.A., Waterhouse, R.M., Ioannidis, P., Kriventseva, E.V., and Zdobnov, E.M. 2015, BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs, Bioinformatics, 31, 3210–2.

17. Cabanettes, F. and Klopp, C. 2018, D-GENIES: dot plot large genomes in an interactive, efficient and simple way, PeerJ, 6, e4958.

18. Shirasawa, K., Asamizu, E., Fukuoka, H., et al. 2010, An interspecific linkage map of SSR and intronic polymorphism markers in tomato, Theor. Appl. Genet., 121, 731–9.

19. UniProt Consortium. 2023, UniProt: the Universal Protein Knowledgebase in 2023, Nucleic Acids Res., 51, D523–31.

20. Buchfink, B., Reuter, K., and Drost, H.-G. 2021, Sensitive protein alignments at tree-of-life scale using DIAMOND, Nat. Methods, 18, 366–8.

21. Keilwagen, J., Hartung, F., Paulini, M., Twardziok, S.O., and Grau, J. 2018, Combining RNA-seq data and homology-based gene prediction for plants, animals and fungi, BMC Bioinf., 19, 189.

22. Hauser, M., Steinegger, M., and Söding, J. 2016, MMseqs software suite for fast and deep clustering and searching of large protein sequence sets, Bioinformatics, 32, 1323–30.

23. Carver, T., Harris, S.R., Berriman, M., Parkhill, J., and McQuillan, J.A. 2012, Artemis: an integrated platform for visualization and analysis of high-throughput sequence-based experimental data, Bioinformatics, 28, 464–9.

24. Cantalapiedra, C.P., Hernández-Plaza, A., Letunic, I., Bork, P., and Huerta-Cepas, J. 2021, eggNOG-mapper v2: functional annotation, orthology assignments, and domain prediction at the metagenomic scale, Mol. Biol. Evol., 38, 5825–9.

25. Huerta-Cepas, J., Szklarczyk, D., Heller, D., et al. 2019, eggNOG 5.0: a hierarchical, functionally and phylogenetically annotated orthology resource based on 5090 organisms and 2502 viruses, Nucleic Acids Res., 47, D309–14.

26. Jurka, J., Kapitonov, V.V., Pavlicek, A., Klonowski, P., Kohany, O., and Walichiewicz, J. 2005, Repbase Update, a database of eukaryotic repetitive elements, Cytogenet Genome Res., 110, 462–7.

27. Altschul, S.F., Madden, T.L., Schäffer, A.A., et al. 1997, Gapped BLAST and PSI-BLAST: a new generation of protein database search programs, Nucleic Acids Res., 25, 3389–402.

28. Tominaga, Y. 1963, Studies on the life history of Japanese pine mushroom, Armillaria matsutake Ito et Imai, Bull Hiroshima Agr Col, 2, 105–45.

29. Pflug, J.M., Holmes, V.R., Burrus, C., Johnston, J.S., and Maddison, D.R. 2020, Measuring genome sizes using read-depth, k-mers, and flow cytometry: methodological comparisons in beetles (Coleoptera), G3, 10, 3047–60.

30. Min, B., Yoon, H., Park, J., et al. 2020, Unusual genome expansion and transcription suppression in ectomycorrhizal Tricholoma matsutake by insertions of transposable elements, PLoS One, 15, e0227923.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.