Abstract

Gentiana straminea Maxim. is a perennial herb distributed mainly on the Qinghai-Tibetan Plateau. To adapt to the extreme environment, it has developed particular morphological, physiological, and genetic characteristics. Rich in iridoids, it is also one of the original plants of the traditional Chinese herb ‘Qinjiao’. Herein, we present its first chromosome-level genome sequence assembly and compare it with the genomes of other Gentiana species to facilitate the analysis of genomic characteristics. The assembled genome size of G. straminea was 1.25 Gb, with a contig N50 of 7.5 Mb. A total of 96.08% of the genome sequence was anchored on 13 pseudochromosomes, with a scaffold N50 of 92.70 Mb. A total of 54,310 protein-coding genes were predicted, 80.25% of which were functionally annotated. Comparative genomic analyses indicated that G. straminea experienced two whole-genome duplication events after the γ whole-genome triplication shared with other eudicots, and it diverged from other Gentiana species at ~3.2 Mya. A total of 142 enzyme-coding genes related to iridoid biosynthesis were identified in its genome. Additionally, by integrating whole-genome sequence and transcriptomic analyses, we identified differences in the number and expression patterns of iridoid biosynthetic pathway genes between G. straminea and two other Gentiana species.

1. Introduction

Elucidating the origin and diversification of plants is one of the basic questions in plant systematics and evolution.1 The first plant genome sequence, that of Arabidopsis thaliana, was completed in 2000.2 Since then, many plant genomes have been sequenced; 1,490 plant species have had their genomes sequenced to date (February 2024), and the number continues to grow exponentially.3 These published plant genomes have dramatically advanced studies in all disciplines of plant biology, including plant systematics and evolution.

As the largest genus in the family Gentianaceae, Gentiana comprises ca. 400 species and is found mostly in the temperate and alpine regions of the world. Unfortunately, few genome sequences have been published for the genus. Gentiana section Cruciata consists of 16 species in China, which are mainly distributed on the Qinghai-Tibetan Plateau (QTP).4 With abundant iridoids as the main active components, Gentiana straminea Maxim. and some other species in the section are often used as medicinal herbs. In the Chinese Pharmacopoeia (2020 Edition), the four original plants of Gentianae Macrophyllae Radix (Qin-Jiao) are Gentiana straminea Maxim., Gentiana macrophylla Pall., Gentiana dahurica Fisch., and Gentiana crassicaulis Duthie ex Burk. Their dry roots have the effects of dispelling wind dampness, clearing dampness heat, stopping arthralgia pain, and withdrawing deficiency heat; the total content of the index components gentiopicroside and loganic acid must be not less than 2.5%.5 In addition, G. straminea and species such as Gentiana waltonii Burk. and Gentiana tibetica King ex Hook. f. are used as the Tibetan herb ‘Jie-Ji’ to treat rheumatic arthritis.6 Because of the harsh living environment and overexploitation, wild resources of G. straminea have become rare, and the species has been included in the Key Protection List of Medicinal Species of China.7 Therefore, cultivation research, in situ conservation, and genetic breeding of this alpine species are urgently needed. We have previously carried out studies on the morphological classification, chloroplast genome, mitochondrial genome, ISSR markers, and embryonic development of G. straminea.8–16 However, the lack of genomic data limits further research.

Iridoids, widely distributed in Gentianaceae, have antiviral, anti-inflammatory, antioxidant, anti-tumour, antibacterial, and hepatoprotective effects.17 Compounds such as gentiopicroside, loganic acid, sweroside, and swertiamarin are often used as index components in traditional Chinese medicine.5 The synthesis of iridoids can be divided into two stages, namely the formation of intermediates, i.e. isopentenyl diphosphate (IPP) and its isomer dimethylallyl diphosphate (DMAPP), and the formation and structural modification of the iridoid skeleton.18 There are two recognized pathways for the generation of iridoid intermediates: the mevalonate pathway (MVA) in the cytosol and the methylerythritol phosphate pathway (MEP) in plastids.19 The iridoid biosynthetic pathway has been extensively studied, but the key steps of the downstream pathway and the structural genes encoding the corresponding enzymes have not been fully identified. Because G. straminea is a medicinal plant rich in iridoids, the assembly and analysis of its genome will be helpful for the study of the iridoid biosynthetic pathway.

Gentiana has a complex genetic background, and related studies have reported different divergence times for Gentiana species.20,21 The interspecific relationships between most of the species within section Cruciata and the process of diversification associated with the section remain largely unknown, although related studies on karyotypes, molecular phylogeny, and biogeography have been carried out.22–24 In addition, interspecific hybridization has occurred within section Cruciata, such as the hybridization of G. straminea and G. siphonantha, and differences in ploidy within species further increase the complexity.25,26 A high-quality genome of G. straminea could provide rich information for the analysis of the phylogeny and genome evolution of Gentiana. To date, the genomes of two species of section Cruciata (G. macrophylla and G. dahurica) have been assembled, and two whole-genome duplications (WGDs) after the ancient whole-genome triplication shared with other eudicots have been found in each species; such events are important driving forces for species diversification and the shaping of genomes.27,28 Further comparisons of Gentiana genomes are expected to yield more in-depth information.

In this study, the first chromosome-level reference genome of G. straminea was assembled, which may provide scientific data for the evaluation of germplasm resources, genetic breeding, and cultivation of this alpine gentian. Furthermore, the genomes of three species from sect. Cruciata (G. straminea, G. macrophylla,27 and G. dahurica28) were compared, and the genes involved in the iridoid biosynthetic pathway were analysed to provide a basis for further research on the functions of genes related to bioactive ingredients.

2. Materials and methods

2.1. Plant materials

Specimens of G. straminea were collected from Tibet, China (E 97°14.021ʹ, N 31°09.174ʹ, 4,037 m asl) (Fig. 1A), and the fresh samples were immediately transported to the laboratory. Fresh leaves from one individual were harvested, immediately frozen in liquid nitrogen, and stored at −80°C for genomic DNA extraction. Roots, stems, leaves, and flowers from the plant and two other individuals were harvested for total RNA extraction.

Morphological and genomic characteristics of G. straminea. (A) G. straminea. (B) Genome assembly of the 13 pseudochromosomes. a, assembled pseudochromosomes; b, gene density; c, tandem density; and d, GC density.
Figure 1.


2.2. DNA library construction and sequencing

High-quality genomic DNA was isolated from fresh leaves using the CTAB method,29 and DNA quality and concentration were evaluated by 0.75% agarose gel electrophoresis, NanoDrop One spectrophotometer (Thermo Fisher Scientific), and Qubit 3.0 Fluorometer (Life Technologies, Carlsbad, CA, USA).

After its quality and integrity were confirmed, the DNA was randomly sheared using a Covaris ultrasonic disruptor. Illumina paired-end sequencing libraries with an insert size of 300 bp were prepared using the Nextera DNA Flex Library Prep Kit (Illumina, San Diego, CA, USA). Sequencing was performed using the Illumina NovaSeq6000 platform (Illumina, San Diego, CA, USA). Raw reads were cleaned to discard low-quality reads (reads containing adaptors or unknown nucleotides (Ns), or with more than 20% low-quality bases) using the SOAPnuke (v2.1.4) tool (https://github.com/BGI-flexlab/SOAPnuke), and the resulting clean data were used for subsequent analyses.

For Oxford Nanopore sequencing, the libraries were prepared with the SQK-LSK109 ligation kit following the standard protocol. The purified library was loaded onto primed R9.4 Spot-On Flow Cells and sequenced using a PromethION sequencer (Oxford Nanopore Technologies, Oxford, UK) with 48-h runs at Wuhan Benagen Technology (Wuhan, China). Base-calling of raw data was performed using the Oxford Nanopore GUPPY software (v0.3.0).

2.3. RNA isolation and sequencing

Roots, stems, leaves and flowers of three biological replicates were used for RNA isolation. Total RNA was extracted using Total RNA Extractor (Trizol) (Sangon) and sequenced using an Illumina HiSeq X Ten system in paired-end mode (2 × 150 bp). Prepared libraries were sequenced on the Illumina HiSeq 2500 platform following the manufacturer’s protocol in Sangon Biotech, China.

2.4. Genome assembly

Based on the sequencing data, the K-mer analysis method30 was used to estimate the genome size and heterozygosity with the kmer_freq program in the GCE package (v1.0.0).

Genomic assembly was performed using NextDenovo (https://github.com/Nextomics/NextDenovo). The assembly was first corrected with two rounds of polishing using the Nanopore sequencing data with Racon (v1.4.11) (https://github.com/isovic/racon), and then with two further rounds using the Illumina NovaSeq sequencing data with Pilon (v1.23).31 Finally, heterozygous sequences were removed from the genome assembly using the Purge_haplotigs pipeline (v1.0.4)32 to obtain the final sequence. To assess the completeness of the assembled genome, we performed Benchmarking Universal Single-Copy Orthologs (BUSCO) analysis using BUSCO (v4.1.2)33 in genome mode with the Embryophyta_odb10 database, and telomere sequences (TTTAGGG) were searched using telomere_finder (https://github.com/MitsuhikoP/telomere_finder). LTR assembly index (LAI) scores were calculated in a sliding window of 3 Mb with a 300 kb step size across the whole genome by LTR_retriever (v2.8).34
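
The sliding-window scheme used for the LAI scan (3 Mb windows, 300 kb step) can be illustrated with a minimal Python sketch. The function name `sliding_windows` and its handling of short sequences and trailing bases are illustrative assumptions; LTR_retriever may treat these edge cases differently.

```python
from typing import Iterator, Tuple


def sliding_windows(seq_len: int, window: int = 3_000_000,
                    step: int = 300_000) -> Iterator[Tuple[int, int]]:
    """Yield 0-based, half-open (start, end) windows along one sequence.

    Assumptions (not taken from LTR_retriever): a sequence shorter than
    one window yields a single truncated window, and trailing bases that
    do not fill a whole step after the last full window are not covered.
    """
    last_start = max(seq_len - window, 0)
    for start in range(0, last_start + 1, step):
        yield start, min(start + window, seq_len)
```

For a hypothetical 3.6 Mb sequence this yields three overlapping windows starting at 0, 300 kb, and 600 kb.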

2.5. Hi-C sequencing and data processing

High-quality DNA extracted from young leaves was used for Hi-C sequencing. Formaldehyde was used for fixing chromatin. In situ Hi-C chromosome conformation capture was performed according to the DNase-based protocol described by Ramani.35 The libraries were sequenced using 150 bp paired-end mode on an Illumina NovaSeq (Illumina, San Diego, CA, USA). For pseudochromosome-level scaffolding, we used the assembly software ALLHIC (v0.9.12), and then imported the final files (.hic and .assembly) generated by the software into Juicebox (v1.11.08)36 for manual optimization.

2.6. Repeat annotation

Repetitive sequences were identified using a de novo method and a homology-based search. A de novo repeats library was modelled using RepeatModeler v1.0.11,37 and intact LTR retrotransposons were identified using LTR_Finder (v1.06) with default parameters.38 For homology-based search, RepeatMasker (v4.0.9) was used to find the repeats against the Repbase with default parameters.39 Finally, all the predicted repeats were combined to eliminate redundancy.

2.7. Gene prediction and annotation

Evidence from transcript mapping, ab initio gene prediction, and homologous gene alignment was combined to predict protein-coding genes. A genome-guided transcript assembly was performed, and transcript-based gene predictions were built with the PASA pipeline (v2.1.0) (‘build_comprehensive_transcriptome.dbi’ and ‘pasa_asmbls_to_training_set.dbi’) following default parameter settings.40 Augustus (v3.3.2),41 Genscan (v1.0),42 and GlimmerHMM (v3.0.4)43 were used for ab initio gene prediction. For homologous gene alignment, proteins from five related species (Eustoma grandiflorum,44 Gelsemium sempervirens,45 Coffea canephora,46 Catharanthus roseus,47 and Calotropis gigantea48) were aligned to the genome using Exonerate (v2.4.0) (https://github.com/nathanweeks/exonerate). Finally, MAKER (v2.31.10)49 was used to integrate the gene sets predicted by the three methods and to remove incomplete genes and genes with a very short CDS (< 150 bp), generating a non-redundant and more complete gene set. We employed the BUSCO software (v4.1.2)33 to evaluate the quality of the prediction based on the Embryophyta_odb10 database.

Functional annotation of the predicted protein-coding genes was carried out by performing Blastp (v2.6.0) (parameter: e-value cut-off 1 × e−5) searches against entries in both the NCBI nr and Uniprot databases (http://www.uniprot.org/). Searches for gene motifs and domains were performed using InterProScan (v5.33)50 and HMMER (v3.1). GO terms (http://geneontology.org/) for genes were obtained from the corresponding InterPro (https://github.com/ebi-pf-team/interproscan) or Uniprot (https://www.uniprot.org/) entries. Pathway annotation was performed using KOBAS (v3.0) (https://github.com/xmao/kobas) against the KEGG database.

2.8. Gene family construction and phylogeny

Amino acid sequences of the 15 selected angiosperm species (G. straminea, G. macrophylla,27 G. dahurica,28 E. grandiflorum,44 G. sempervirens,45 C. canephora,46 C. roseus,47 C. gigantea,48 Chiococca alba,51 Mimulus guttatus,52 Solanum lycopersicum,53 Helianthus annuus,54 Camptotheca acuminata,55 Ophiorrhiza pumila,56 and Vitis vinifera57) were aligned using Blastp (e-value 1 × e−5 -outfmt 6), and gene family clustering was performed using OrthoMCL software (v2.0.9) (parameters: percent match cut-off = 30, e-value exponent cut-off = 1e-5, expansion cut-off = 1.5).58

Gene family contraction and expansion analyses were performed using CAFé (v2.1) (parameter: -filter) software59 based on gene family clustering results.

Single-copy gene families shared by the selected species were screened to construct a phylogenetic tree. MUSCLE (v3.8.31)60 was used to perform multiple sequence alignment of the protein sequences of each single-copy gene family, and the alignments were then filtered by trimAL (v1.2rev59) (parameter: -gt 0.2).61 The filtered alignments were concatenated into supergenes. Finally, RAxML (v8.2.10)62 (parameters: model PROTGAMMAWAG, 1000 bootstrap replicates) was used to construct a maximum likelihood species tree based on the supergenes.
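
The step of joining the filtered single-copy alignments into supergenes can be sketched as follows. This is a minimal illustration, assuming each alignment is held as a dictionary from species name to aligned sequence; the function name `concatenate_alignments` and this data layout are hypothetical and are not the scripts used in the study.

```python
from typing import Dict, List


def concatenate_alignments(gene_alignments: List[Dict[str, str]]) -> Dict[str, str]:
    """Join per-family aligned sequences into one supergene per species."""
    if not gene_alignments:
        return {}
    species = sorted(gene_alignments[0])
    parts: Dict[str, List[str]] = {sp: [] for sp in species}
    for aln in gene_alignments:
        # A single-copy family must contain exactly one sequence per species.
        if sorted(aln) != species:
            raise ValueError("every family must contain the same species")
        # All rows of one alignment must have the same number of columns.
        if len({len(seq) for seq in aln.values()}) != 1:
            raise ValueError("sequences in one alignment must have equal length")
        for sp in species:
            parts[sp].append(aln[sp])
    return {sp: "".join(chunks) for sp, chunks in parts.items()}
```

Checking that every family contains the same species guards against silently misaligned columns in the concatenated matrix.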

Based on a phylogenetic tree with some nodes calibrated by fossils according to TIMETREE (Supplementary Table S1), the mcmctree tool of PAML (v4.9) (parameters: nsample = 1000000; burnin = 200000; seqtype = 0; clock = 3; model = 4)63 was used to estimate the divergence time of the different species.

2.9. Whole-genome duplication analyses

Amino acid sequences of G. straminea, G. macrophylla,27 G. dahurica,28 C. canephora,46 and V. vinifera57 were self-aligned using Blastp (e-value cut-off: 1 × e−5) and the best Blastp result was retained. To obtain paralogous gene families, we performed gene cluster analysis based on the CDS alignment using OrthoMCL (v2.0.9) (parameters: percent match cut-off = 30, e-value exponent cut-off = 1e-5, expansion cut-off = 1.5).58 Ks values were calculated from all paralogous families using yn00 in the PAML package.63 The Ks of a given family was represented by its median value, and the distribution of corrected Ks values was plotted from these medians.64 To distinguish whether a peak represented a WGD event or background small-scale duplications, we identified paralogous gene pairs using Blastp and determined syntenic blocks using MCScanX50 (https://github.com/wyp1125/MCScanx). The timings of WGD events were calculated with the following formula: divergence date T = Ks/2r.54,65,66
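
Representing each paralogous family by the median of its pairwise Ks values can be sketched in a few lines of Python. The input layout (pairs of family identifier and Ks value) and the function name `family_median_ks` are assumptions made for illustration; the actual yn00 output would need to be parsed first.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, Iterable, List, Tuple


def family_median_ks(pair_ks: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Return the median Ks of all gene pairs within each family."""
    by_family: Dict[str, List[float]] = defaultdict(list)
    for family_id, ks in pair_ks:
        by_family[family_id].append(ks)
    return {fam: median(values) for fam, values in by_family.items()}
```

Using the median rather than the mean keeps a single saturated pair from dominating the value assigned to a family.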

We used LAST (v1170)67 to detect syntenic blocks for G. straminea, G. dahurica, and C. canephora, and JCVI (v0.9.13)68 was then used to determine whether similar gene pairs were adjacent to each other on the chromosome according to the annotation file (gff3). Finally, all the genes in the syntenic blocks were obtained.

2.10. Cytochrome P450 (CYP) gene family identification

The HMM model file for cytochrome P450 (PF00067) was downloaded from the Pfam database (http://pfam.xfam.org/), and the hmmscan program of HMMER (v3.2.1) was used to identify the CYP genes with an e-value of 1e-5. Blastp (v2.6.0) was used to compare the protein sequences to the CYP genes of Arabidopsis thaliana downloaded from the Arabidopsis Information Resource (https://www.arabidopsis.org/) with an e-value ≤ 1 × e−40. Final identification was achieved by intersecting the results of domain prediction and homology comparison. The retained amino acid sequences were aligned using MAFFT,69 and the aligned amino acids were used for phylogenetic tree construction with PhyML (v3.3) using the maximum likelihood method with 1000 bootstrap replicates.70,71
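
The intersection of domain prediction and homology comparison can be sketched as a simple set operation. In this minimal illustration the inputs are dictionaries from gene identifier to best e-value; the function name, the data layout, and the use of "less than or equal to" for the HMM cut-off are assumptions, and the gene identifiers in the usage note are hypothetical.

```python
from typing import Dict, Set


def candidate_cyp_genes(hmm_evalues: Dict[str, float],
                        blast_evalues: Dict[str, float],
                        hmm_cutoff: float = 1e-5,
                        blast_cutoff: float = 1e-40) -> Set[str]:
    """Keep genes supported by both the PF00067 domain hit and the BLAST hit."""
    domain_hits = {g for g, e in hmm_evalues.items() if e <= hmm_cutoff}
    homology_hits = {g for g, e in blast_evalues.items() if e <= blast_cutoff}
    return domain_hits & homology_hits
```

For example, with hypothetical genes `g1` (strong hits in both searches) and `g2` (a domain hit but only a weak BLAST hit), only `g1` is retained.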

2.11. Identification of genes involved in iridoid biosynthesis and expression analyses

Iridoids are the main bioactive components of G. straminea derived from either the plastidial 2-C-methyl-D-erythritol-4-phosphate (MEP) pathway or the cytosolic mevalonic acid (MVA) pathway. Genes encoding key enzymes involved in iridoid biosynthesis were searched according to the integrated annotations using Blastp with an e-value of 1 × e−5 as the threshold.72

Transcriptome data from the root, stem, leaf, and flower tissues were employed, and gene expression levels in different tissues were obtained by calculating the fragments per kilobase of transcript per million fragments mapped (FPKM) using eXpress86.73 Significantly differentially expressed genes (DEGs) were identified using DESeq274 with the thresholds P < 0.05 and |log2(FoldChange)| > 1. Heatmaps displaying gene expression profiles were plotted using pheatmap (v1.0.12) in R v.2.15 (www.r-project.org).
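
As a rough guide to these quantities, the textbook FPKM definition and the two DEG thresholds can be written as follows. This is a simplified sketch: eXpress estimates fragment counts probabilistically and uses effective transcript lengths, and DESeq2 fits its own statistical model, so the function names and the plain formula here are illustrative assumptions rather than the tools' implementations.

```python
def fpkm(fragments: float, transcript_length_bp: float,
         total_mapped_fragments: float) -> float:
    """Fragments per kilobase of transcript per million mapped fragments."""
    return fragments * 1e9 / (transcript_length_bp * total_mapped_fragments)


def is_deg(p_value: float, log2_fold_change: float,
           p_cutoff: float = 0.05, lfc_cutoff: float = 1.0) -> bool:
    """Apply the thresholds P < 0.05 and |log2(FoldChange)| > 1."""
    return p_value < p_cutoff and abs(log2_fold_change) > lfc_cutoff
```

Both thresholds are strict inequalities, so a gene with a log2 fold change of exactly 1 is not called differentially expressed.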

3. Results

3.1. Sequencing information, genome size, and heterozygosity estimation

Illumina sequencing produced 205.29 Gb of paired-end short reads (approximately 164× coverage), whereas Oxford Nanopore sequencing produced 218.94 Gb of single-molecule long reads (approximately 175× coverage), with an average read length of 14.87 kb and a read N50 of 32.01 kb. Hi-C sequencing produced 326.38 Gb of data (Supplementary Table S2). More than 115 Gb of reads were obtained by transcriptome sequencing of roots, stems, leaves, and flowers. K-mer frequency distribution analysis showed that the K-mer peak depth was 58 and the estimated genome size was 1.18 Gb (Supplementary Fig. S1). Based on the total number of K-mers, the heterozygosity and repetitive content of G. straminea were estimated to be 1.86% and 71.3%, respectively.
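
The relationship behind a K-mer based size estimate is that genome size is approximately the total number of K-mers divided by the depth of the main peak. The sketch below shows only this basic relation; GCE applies additional corrections, and the total K-mer count used in the example is back-calculated from the reported peak depth (58) and genome size (1.18 Gb) purely for illustration, not a value reported in this study.

```python
def estimate_genome_size(total_kmers: int, peak_depth: float) -> float:
    """Basic estimate: genome size = total K-mer count / main peak depth."""
    if peak_depth <= 0:
        raise ValueError("peak_depth must be positive")
    return total_kmers / peak_depth


# Hypothetical total K-mer count, chosen to be consistent with the
# reported peak depth of 58 and estimated genome size of 1.18 Gb.
ILLUSTRATIVE_TOTAL_KMERS = 68_440_000_000
size_bp = estimate_genome_size(ILLUSTRATIVE_TOTAL_KMERS, 58)
```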

3.2. Genome assembly and quality evaluation

A total of 567 contigs were obtained, with a total length of approximately 1.289 Gb and a contig N50 of 7.5 Mb (Supplementary Table S3). A Hi-C scaffolding strategy was used to assemble the chromosome sequences with 109.13 million clean reads from the Illumina sequencing platform. The total length of the assembly was 1.25 Gb with a scaffold N50 of 92.70 Mb, smaller than the G. dahurica and G. macrophylla genomes (1.42 and 1.79 Gb, respectively). Finally, 100 Ns were added between the assembled contig sequences in the determined order and orientation to obtain the final chromosome-level genome sequence, with a chromosome anchoring rate of 96.08% (Supplementary Table S4). The Hi-C contact map showed that the clustering, ordering, and orientation were effective (Supplementary Fig. S2).
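
For readers unfamiliar with the N50 statistic quoted here, a minimal Python sketch of its standard definition follows: the length of the shortest sequence among the longest sequences that together cover at least half of the total assembly length. The function name `n50` and the toy lengths in the usage note are illustrative.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of contig or scaffold lengths."""
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Stop at the first sequence that brings the running sum to half.
        if 2 * running >= total:
            return length
    raise ValueError("lengths must be non-empty")
```

With toy lengths 2, 3, 4, 5, and 6 (total 20), the two longest sequences sum to 11, which exceeds half the total, so the N50 is 5.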

The assembly quality of the genome was evaluated by different methods. The mapping rate of Illumina reads to the assembled genome of G. straminea was 99.70% (Supplementary Table S5). RNA-seq data for genome annotation were also mapped to the reference genome to assess the quality of the assembly, with 94.97%, 95.74%, 95.44%, and 95.17% of the RNA-seq reads from roots, stems, leaves, and flowers mapped, respectively (Supplementary Table S6). BUSCO assessment results showed that a total of 1,540 complete gene models (95.42%) were identified among 1,614 conserved genes (Supplementary Table S3). In the assembly, telomeres were detected on 22 of the 26 ends of the 13 pseudochromosomes (Supplementary Table S7). In addition, an average LAI score of 18.53 was observed for the whole genome (Supplementary Fig. S3), which indicated that the genome assembly reached reference quality.75

3.3. Repeat annotation

Transposable elements (TEs) and tandem repeats in the G. straminea genome were predicted by homology-based annotation and a de novo method. The total length of TEs was 1,019,365,257 bp, accounting for 81.47% of genome length, greater than the 73.47% in G. macrophylla and 70.25% in G. dahurica. Long terminal repeat (LTR) elements were the dominant repeat type (60.12% of genome length) (Supplementary Table S8), similar to that of G. macrophylla and G. dahurica. DNA transposons with a total length of 25,840,648 bp and long interspersed nuclear elements with a total length of 20,987,776 bp accounted for 2.07% and 1.68% of the genome assembly, respectively (Supplementary Table S8). The tandem repeats in G. straminea genome represented 244,651 bp.

3.4. Gene annotation and functional prediction

In the assembled genome of G. straminea, 54,310 protein-coding, 772 tRNA, 1,128 rRNA, 117 miRNA, and 3,723 snRNA genes were predicted (Supplementary Table S9). The gene, tandem, and GC densities were mapped onto the pseudochromosomes (Fig. 1B). The average length of protein-coding genes in the G. straminea genome was 4,760.64 bp. The average length of coding DNA sequence was 890.44 bp, and the average exon number of each gene was 4.28. The evaluation of completeness of gene prediction based on 1,614 BUSCO genes suggested that 95.5% complete BUSCOs were present (Supplementary Table S9). The parameters of gene structure were compared with those of seven selected species in Gentianales: G. macrophylla, G. dahurica, E. grandiflorum, C. canephora, C. gigantea, C. roseus, and G. sempervirens (Supplementary Table S9). G. straminea and G. macrophylla had 54,310 and 55,337 genes, respectively, fewer than E. grandiflorum and C. roseus, whereas G. dahurica had only 37,988 genes, the fewest among all these species. The average gene length of the three Gentiana species was 3,600–4,700 bp, much smaller than that of the other species (9,839–12,150 bp). The average number of exons per gene and the average exon length of the three Gentiana species were higher than those of the other species.

In addition, a total of 43,584 (80.25%) genes were functionally annotated in the G. straminea genome in at least one of the public databases (Supplementary Table S10), and 23,982 (44.16%) genes were assigned to at least one Gene ontology (GO) term and classified into 60 GO functional subcategories (Supplementary Fig. S4). The BLASTX top-hit species distribution showed the highest homology to Coffea arabica (40.02%), Coffea eugenioides (20.67%), and C. canephora (14.56%) (Supplementary Fig. S5).

3.5. Gene family analysis

Gene family analysis among G. straminea and 14 other angiosperm species revealed that the 54,310 genes in the G. straminea genome were clustered into 27,322 gene families. Gene families of G. straminea, four other species of Gentianales (G. macrophylla, G. dahurica, C. roseus, and O. pumila), and the iridoid-producing medicinal plant C. acuminata were analysed further. The specific and shared gene families among these species were determined: 9,962 gene families were observed in all the selected species, and 12,709 genes and 8,614 gene families were specific to G. straminea. A total of 2,215 gene families were shared among the three Gentiana species (Fig. 2). The KEGG pathways enriched among the shared genes of these six species included terpenoid backbone biosynthesis (ko00900) and diterpenoid biosynthesis (ko00904), which may be correlated with the presence of iridoids or iridoid-derived alkaloids (C. roseus) in these plants. GO term enrichment analysis (P-value < 0.01) revealed that the genes unique to G. straminea were involved in GTP binding (GO:0005525), GTPase activity (GO:0003924), catalytic activity (GO:0003824), nucleolus (GO:0005730), and FMN binding (GO:0010181) (Supplementary Table S11). KEGG analysis indicated that G. straminea-specific gene families were mainly enriched in carbon metabolism (ko01200), biosynthesis of amino acids (ko01230), starch and sucrose metabolism (ko00500), ribosome biogenesis in eukaryotes (ko03008), and phagosome (ko04145) (Supplementary Table S12).

Gene family statistics in the comparative genomics analysis. (A) Number of genes in 15 selected angiosperm species. (B) Venn diagram showing overlap of gene families between G. straminea and five other angiosperm species.
Figure 2.


3.6. Phylogenetic relationships

A high-confidence phylogenetic tree of 15 plant species was constructed from 218 single-copy orthologous genes, with V. vinifera as the outgroup, and divergence times were estimated (Fig. 3). As expected, G. straminea, G. macrophylla, and G. dahurica were sister to E. grandiflorum, forming a Gentianaceae clade. Species in Gentianales can be grouped into a clade, including Gentianaceae, Rubiaceae, Gelsemiaceae, and Apocynaceae. Rubiaceae (C. alba and C. canephora) were located at the base of the Gentianales branches. The estimated Gentianaceae divergence time was 55.9 (48.9–64.5) Mya, and the divergence time of Gentiana was 32.1 (23.0–42.1) Mya, with a divergence time between G. straminea and other species in Gentiana of 3.8 (2.1–6.6) Mya (Fig. 3).

Maximum likelihood phylogeny and number of gene families that have expanded or contracted among 15 plant species. Confidence intervals of estimated divergence times are indicated at each node. All the branches were obtained with a bootstrap value of 100.
Figure 3.


The expansion and contraction of orthologous gene families were also determined. The results show (Fig. 3) that 2,716 gene families expanded in the lineage leading to the Gentianaceae, whereas 713 gene families contracted. Furthermore, 1,400 gene families expanded, and 1,198 gene families contracted, in the lineage leading to the Gentiana genus. In G. straminea, 499 gene families expanded, compared with 4,864 in G. macrophylla and 1,332 in G. dahurica, and 1,723 gene families contracted, compared with 253 in G. macrophylla and 2,171 in G. dahurica. GO enrichment analysis showed that the expanded gene families in G. straminea were mainly involved in extracellular region (GO:0005576), rejection of self pollen (GO:0060320), protein serine kinase activity (GO:0106310), protein serine/threonine/tyrosine kinase activity (GO:0004712), and defense response (GO:0006952) (Supplementary Table S13). KEGG analysis of these expanded gene families revealed significant enrichment in plant-pathogen interaction (ko04626), MAPK signalling pathway-plant (ko04016), oxidative phosphorylation (ko00190), and stilbenoid, diarylheptanoid, and gingerol biosynthesis (ko00945). The expanded gene families were also enriched in the iridoid biosynthetic pathway (ko00900, terpenoid backbone biosynthesis) (Supplementary Table S14).

3.7. Analyses of genome synteny and WGD

The genome synteny between G. straminea and two other species (G. dahurica and C. canephora) was screened to detect the occurrence of potential WGD events in G. straminea. Synteny analyses between G. straminea and C. canephora provided structural evidence for two WGDs in G. straminea, with a 1:4 syntenic depth ratio in the G. straminea and C. canephora comparison (Fig. 4A, Supplementary Fig. S6A). In addition, G. dahurica underwent two WGDs after whole-genome triplication (WGT) of the eudicots (WGT-γ),28 and high collinearity was observed between the G. straminea and G. dahurica genomes with 1:1 syntenic depth ratios (Fig. 4A, Supplementary Fig. S6B). The genome synteny between G. straminea and two other iridoid-producing plants (C. acuminata and O. pumila) was also analysed. Previous studies have indicated that C. acuminata underwent one recent WGD after WGT-γ and that O. pumila did not experience any recent WGD.55,56 In our analysis, we recovered 2:4 and 1:4 syntenic ratios in the G. straminea-C. acuminata and G. straminea-O. pumila comparisons (Supplementary Fig. S7), also suggesting two WGD events in G. straminea. Within the G. straminea genome, a total of 5,988 colinear gene pairs in 865 colinear blocks were detected (Supplementary Fig. S8), and no clear 1:3 syntenic depth ratio was found. The results were consistent with those of G. macrophylla.27 Meanwhile, G. straminea and G. macrophylla also showed high collinearity (Supplementary Fig. S9). The origins of duplicate genes in G. straminea were also analysed, and the results showed that dispersed duplication was the main type of gene duplication (48,280, 77.60%), followed by transposed duplication (8,208, 13.19%), proximal duplication (2,212, 3.56%), WGD (1,841, 2.96%), and tandem duplication (1,671, 2.68%).

Syntenic comparisons and distribution of synonymous substitutions (Ks) of paralogous genes. (A) Syntenic comparisons of G. straminea, G. dahurica, and Coffea canephora chromosomes. (B) Distribution of corrected Ks values in syntenic blocks.
Figure 4.


The value of synonymous nucleotide substitutions (Ks) was calculated and characterized between collinear homeologs within or between G. straminea and four other species (G. dahurica, G. macrophylla, C. canephora, and V. vinifera). The Ks distribution of V. vinifera and C. canephora confirmed the WGT-γ,76 and the Ks distribution of G. straminea showed two peaks at approximately 0.2 and 0.7 after the peak representing WGT-γ, suggesting that this species underwent two more WGD events (Fig. 4B), which were shared by the other two species of Gentiana (G. dahurica and G. macrophylla) (Fig. 4B).

The speciation time for C. canephora–G. straminea was obtained as 64.7–78.6 Mya,77 and the Ks value of C. canephora–G. straminea was 1.38, which allowed us to calculate the age of WGD2 (Ks = 0.098) at 4.59–5.58 Mya and the age of WGD1 (Ks = 0.78) at 36.6–44.4 Mya according to the formula T = Ks/2r. The Ks distribution also indicated that the divergence of Gentiana and C. canephora occurred after WGT-γ and that the WGD events in Gentiana occurred after the divergence from C. canephora.
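
The arithmetic behind these dates can be reproduced with a short Python sketch. It assumes, as the formula T = Ks/2r implies, that the substitution rate r is first calibrated from the C. canephora and G. straminea speciation Ks (1.38) and the speciation time range (64.7–78.6 Mya), and is then applied to each WGD peak; the function names are illustrative.

```python
def substitution_rate(ks_speciation: float, speciation_time_mya: float) -> float:
    """Rearranged T = Ks / (2r): r = Ks / (2T), per site per million years."""
    return ks_speciation / (2.0 * speciation_time_mya)


def wgd_age(ks_peak: float, rate: float) -> float:
    """Age of a duplication event in Mya from its Ks peak: T = Ks / (2r)."""
    return ks_peak / (2.0 * rate)


KS_SPECIATION = 1.38  # Ks between C. canephora and G. straminea
for t_mya in (64.7, 78.6):  # bounds of the speciation time range
    r = substitution_rate(KS_SPECIATION, t_mya)
    print(f"T = {t_mya} Mya: WGD2 = {wgd_age(0.098, r):.2f} Mya, "
          f"WGD1 = {wgd_age(0.78, r):.1f} Mya")
```

Because r is proportional to 1/T, each WGD age is simply the speciation time scaled by the ratio of the WGD Ks peak to the speciation Ks.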

3.8. Identification and characterization of the CYP gene family

Cytochrome P450 monooxygenases (CYPs) are among the most powerful biocatalysts in nature and play an important role in a variety of secondary metabolic pathways in plants, such as the biosynthesis of terpenoids, flavonoids, alkaloids, lignin, fatty acids, and other metabolites.78,79 In this study, all the CYP genes were identified in three Gentiana species (G. straminea, G. macrophylla,27 and G. dahurica28) and two other Gentianales species (C. canephora46 and C. roseus47). There were 430, 527, 389, 263, and 467 CYP genes in G. straminea, G. macrophylla, G. dahurica, C. roseus, and C. canephora, respectively. The number of CYP genes identified in G. straminea was similar to that of C. canephora, greater than that of G. dahurica and C. roseus, and smaller than that of G. macrophylla.

Analysis of the distribution of CYP genes in the G. straminea genome showed that the 430 CYP genes were spread across the 13 pseudochromosomes (Fig. 5C), and pseudochromosomes 1, 2, 5, and 7 each carried more than 40 CYP genes. The CYP gene family members of G. straminea and the other four species were divided into nine clans (the CYP51, CYP72, CYP710, CYP711, CYP74, CYP85, CYP86, CYP97, and CYP71 clans), with broadly similar distributions among the species (Fig. 5A; Supplementary Table S15), and these clans were the same as those in A. thaliana.80 The CYP71 clan was of the A type, and the other clans were of the non-A type.81 Except for the CYP51, CYP74, CYP97, CYP710, and CYP711 clans, the clans were further divided into different families. The CYP71 clan was the largest clan with the most families, including CYP71, CYP75, CYP76, CYP77, CYP78, CYP79, CYP81, CYP82, CYP83, CYP84, CYP89, CYP93, CYP98, CYP701, CYP703, CYP705, CYP706, and CYP712, with most genes participating in the oxidative stress response and the biosynthesis of terpenes, sterols, indole alkaloids, and phenylpropanoids.82 Among the families in the CYP71 clan, CYP79 and CYP712 were found in C. roseus and C. canephora, but not in any of the three Gentiana species. CYP79 catalyses the conversion of amino acids to the corresponding aldoximes, which is the initial step in the synthesis of many glycosides.83,84 Moreover, there were more CYP716, CYP71, CYP72, CYP76, and CYP83 genes in G. macrophylla than in G. straminea and G. dahurica, resulting in a larger total number of CYP genes in G. macrophylla.

CYP gene family in G. straminea. (A) Phylogenetic tree of the candidate CYP enzyme-encoding genes identified in the G. straminea genome. (B) Phylogenetic analysis of CYP72A genes in G. straminea, G. macrophylla, G. dahurica, Catharanthus roseus, Coffea canephora, and Arabidopsis thaliana. (C) Chromosomal locations of CYP genes in the G. straminea genome. G10H genes are marked with circles, SLS genes with asterisks, and 7-DLH genes with triangles. (D) Heatmap showing the gene expression levels (log10(FPKM + 1)) of CYP72A genes in different tissues of G. straminea.
Figure 5.

Secologanin synthase (SLS/CYP72A219) and 7-deoxyloganic acid 7-hydroxylase (7-DLH/CYP72A224) in the subfamily CYP72A and geraniol 10-hydroxylase (G10H/CYP76B10) in family CYP76 are key enzyme genes involved in iridoid biosynthesis with high amino acid residue sequence identity.18 We constructed phylogenetic trees of CYP72A and CYP76 from G. straminea, G. macrophylla, G. dahurica, C. roseus, C. canephora, and A. thaliana to explore potential genes encoding 7-DLH, SLS, and G10H in G. straminea. All candidate genes that may encode these enzymes were retrieved. The phylogenetic trees showed that 10 SLS genes and 5 7-DLH genes were retrieved (Fig. 5B), whereas 14 genes clustered with the reported G10H genes (Supplementary Fig. S10A). The location of these genes on the pseudochromosomes of G. straminea was also marked, with G10H and SLS having a tandem duplication on pseudochromosome 2 (4 genes) and pseudochromosome 10 (7 genes), respectively (Fig. 5C). The expression levels of the CYP72A and CYP76 genes in different tissues of G. straminea were obtained from transcriptome data. The expression of SLS and 7-DLH was high in flowers and roots, and the expression of other CYP72A genes was similar in different tissues (Fig. 5D). Most CYP76 genes were expressed similarly in different tissues, and this expression pattern was particularly obvious in G10H genes (Supplementary Fig. S10B).

3.9. Key genes involved in the biosynthesis of iridoids

Iridoids are important medicinal components of Gentianaceae. Based on previous studies,44–46 we proposed the biosynthetic pathway of iridoids in G. straminea as follows: (i) formation of the precursor: IPP is synthesized by either the MVA or MEP pathway; (ii) formation and structural modification of the iridoid skeleton: DMAPP reacts with IPP to generate geranyl diphosphate, which is then converted into geraniol; secologanin is produced from geraniol by a series of enzymatic reactions (Fig. 6). We searched for all G. straminea genes involved in the iridoid biosynthetic pathway and identified 142 candidate genes based on our genome annotation (Supplementary Table S16). Five genes in the MEP pathway, i.e. 1-Deoxy-D-xylulose-5-phosphate reductoisomerase (DXR), 4-(Cytidine 5’-diphospho)-2-C-methyl-D-erythritol kinase (CMK), 2-C-Methyl-D-erythritol 2,4-cyclodiphosphate synthase (MCS), 4-Hydroxy-3-methylbut-2-enyl-diphosphate synthase (HDS), and 4-Hydroxy-3-methylbut-2-enyl diphosphate reductase (HDR), and one gene in the MVA pathway, i.e. Diphosphomevalonate decarboxylase (MVD), were single-copy genes, whereas other genes in the MEP and MVA pathways contained 2 to 13 copies. In addition, key genes in the formation and structural modification pathway of the iridoid skeleton, such as Geranyl diphosphate synthase (GPPS), Geraniol synthase (GES), G10H, 8-hydroxygeraniol oxidoreductase (8-HGO), iridoid synthase (IS), 7-deoxyloganetic acid synthase (7-DLS), 7-deoxyloganetic acid glucosyltransferase (7-DLGT), 7-DLH, Loganic acid methyltransferase (LAMT), SLS, and strictosidine synthase (STR), were identified with varying copy numbers. By comparing the genomes of G. straminea, G. macrophylla,27 G. dahurica,28 and two iridoid-producing plants (C. acuminata and O. pumila), the copy number variations of genes involved in iridoid biosynthesis were investigated (Supplementary Table S17).
The results showed that the copy numbers of genes in the MVA and MEP pathways were similar in the three Gentiana species. Compared with C. acuminata and O. pumila, the copy numbers of MVA pathway-related genes, such as Hydroxymethylglutaryl-CoA synthase (HMGS), Hydroxymethylglutaryl-CoA reductase (HMGR), Mevalonate kinase (MK), and Phosphomevalonate kinase (PMK), were significantly higher in Gentiana, but the copy numbers of genes in the MEP pathway of Gentiana species were similar to those of C. acuminata and O. pumila. As for the copy numbers of genes in the downstream pathways of iridoid synthesis, such as G10H, 8-HGO, IS, 7-DLGT, 7-DLH, LAMT, and STR, there were significant differences among the three Gentiana species. For example, there were 30 and 42 STR genes in G. straminea and G. macrophylla, respectively, but only 3 STR genes were identified in G. dahurica. The numbers of 8-HGO and IS genes also varied significantly among them. Compared with C. acuminata and O. pumila, Gentiana species had significantly more copies of most genes in the downstream pathways. Interestingly, of all iridoid genes, only 7-DLH had significantly higher copy numbers in C. acuminata and O. pumila (18 and 10, respectively) than in Gentiana species.

Expression analysis of genes involved in iridoid biosynthesis in G. straminea. Different colour blocks represent normalized gene expression levels (log10(FPKM + 1)) of all genes in different tissues: flower, leaf, stem, and root.
Figure 6.

A total of 148 differentially expressed genes (DEGs) between tissues of G. straminea were identified using transcriptome data from roots, stems, leaves, and flowers. In addition, 41 genes related to iridoid biosynthesis, i.e. Acetyl-CoA acetyltransferase (AACT), HMGS, 2 HMGRs, 2 MKs, MVD, 3 1-Deoxy-D-xylulose-5-phosphate synthases (DXSs), 2-C-Methyl-D-erythritol 4-phosphate cytidylyltransferase (MCT), HDS, 2 GPPSs, 4 G10Hs, 5 8-HGOs, IS, 4 7-DLGTs, 2 7-DLHs, LAMT, 7 SLSs, and 3 STRs, were differentially expressed among tissues. The numbers of genes highly expressed in roots, stems, leaves, and flowers were 13, 6, 4, and 17, respectively (Supplementary Table S16). With the exception of one HMGS gene, all DEGs in the MVA pathway were highly expressed in roots. The number of highly expressed DEGs in each tissue in the MEP pathway was similar. In the pathway of formation and structural modification of the iridoid skeleton, most DEGs were highly expressed in flowers. As for G. dahurica, genes from the MEP pathway and the SLS and LAMT genes showed high expression levels in leaves, and genes related to the MVA pathway were highly expressed in roots.28 In G. macrophylla, there were 40 DEGs related to iridoid biosynthesis between different tissues, with 19 exhibiting higher expression levels in leaves and 13 highly expressed in flowers.27 These data provide a basis for functional analysis of genes involved in the iridoid biosynthetic pathway in Gentiana.
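The count of 41 iridoid-related DEGs can be checked by tallying the enzymes listed above. The sketch below is only a bookkeeping aid. The assignment of each enzyme to a pathway stage (AACT to the MVA pathway, DXS and MCT to the MEP pathway, GPPS onward to the iridoid skeleton stage) is an assumption based on the pathway description in this section, and the names used are hypothetical.

```python
# Bookkeeping sketch: tally the iridoid-related DEGs listed in the text.
# Assumption: the grouping of enzymes into pathway stages follows the pathway
# description in this section; dictionary and function names are hypothetical.

DEG_COUNTS = {
    "MVA": {"AACT": 1, "HMGS": 1, "HMGR": 2, "MK": 2, "MVD": 1},
    "MEP": {"DXS": 3, "MCT": 1, "HDS": 1},
    "iridoid skeleton": {
        "GPPS": 2, "G10H": 4, "8-HGO": 5, "IS": 1, "7-DLGT": 4,
        "7-DLH": 2, "LAMT": 1, "SLS": 7, "STR": 3,
    },
}


def stage_totals(counts):
    """Number of DEGs per pathway stage."""
    return {stage: sum(genes.values()) for stage, genes in counts.items()}


def grand_total(counts):
    """Total number of DEGs across all stages."""
    return sum(stage_totals(counts).values())


if __name__ == "__main__":
    for stage, n in stage_totals(DEG_COUNTS).items():
        print(f"{stage}: {n}")
    print("total:", grand_total(DEG_COUNTS))
```

With this grouping, most of the listed DEGs fall in the stage that forms and modifies the iridoid skeleton.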

4. Discussion

Gentiana straminea is a perennial herb mainly distributed in the QTP. Unfortunately, it is currently a threatened species, and the lack of genomic data limits further research on the evolution and molecular biology of this alpine species. The genome heterozygosity of species in Gentiana sect. Cruciata is generally high,14 meaning that these genomes are difficult to assemble.85 In the present study, the heterozygosity of the G. straminea genome reached 1.86%, and Nanopore and Hi-C sequencing techniques were used to assemble the chromosome-level genome sequence of G. straminea with a contig N50 of 7.5 Mb. The G. straminea genome length was 1.25 Gb, of which 97.29% (1378.19 Mb) was anchored on 13 pseudochromosomes, with 95.5% complete BUSCOs present and 54,310 predicted genes. These results indicate that the genome assembly is highly accurate. The genome of G. straminea is similar in size to that of G. dahurica, and smaller than that of G. macrophylla, which may be related to the expansion of a large number of gene families in the G. macrophylla genome.

A phylogenetic tree was constructed based on 218 single-copy orthologous genes of 15 different species, and the result strongly supported the monophyly of Gentianales (Fig. 3). It should be noted that the relationship of Gentianales, Boraginales, Vahliales, Solanales, and Lamiales remains poorly resolved, with the orders Boraginales (Boraginaceae) and Vahliales (Vahliaceae) newly recognized in APG IV.86 In our previous analysis based on chloroplast genomes,87 the result strongly supported the position of Boraginaceae as the sister of Solanales, which then clustered together with Gentianales in the absence of sequences of Vahliaceae ([Boraginaceae + Solanales] + Gentianales). In the analysis based on mitochondrial genomes,10 the phylogenetic tree supported the position of Gentianales as the sister of Solanales in the absence of sequences of Boraginaceae and Vahliaceae ([Gentianales + Solanales] + Lamiales). According to a phylotranscriptomic study of asterids,88 core lamiids fall into two sister clades, lamiids I (Lamiales + Vahliales) and lamiids II ([Boraginales + Gentianales] + Solanales). In this study, Solanales and Lamiales formed sister branches and then clustered with Gentianales in the absence of genome sequences of Boraginaceae and Vahliaceae ([Solanales + Lamiales] + Gentianales). The discrepancy among these analyses may be due to unbalanced sampling and merits further analysis. According to APG IV, Gentianales consists of five families, i.e. Apocynaceae, Gelsemiaceae, Gentianaceae, Loganiaceae, and Rubiaceae,86 four of which (all except Loganiaceae) were included in this study. The result supported the monophyly of these families, and Gentianaceae was sister to Gelsemiaceae, with Rubiaceae at the base of the Gentianales clade.
The phylogenetic relationships within Gentianaceae are complex, including genera that are not monophyletic groups, such as Swertia, Gentianella, Lomatogonium, and Comastoma.89 Gentiana, which contains about 362 species, is generally considered to be a monophyletic group,90 which is consistent with the results of this study, although the sample size is small.

It is generally believed that WGD events are important drivers of species and metabolite diversity in plants.91,92 The results of this study showed that G. straminea, like G. dahurica and G. macrophylla, underwent two WGD events after WGT-γ, and the WGD events in Gentiana species occurred 4.59–5.58 Mya and 36.6–44.4 Mya, respectively, which is consistent with the results of a previous study (~5.80 Mya and 34.60 Mya, respectively).27 According to plastid genome-based systematic analysis, Gentiana diverged at about 38.09 Mya (95% HPD: 37.88–38.21 Mya).20 However, in another study based on chloroplast genomes of Gentianaceae, the division of Gentianaceae into different genera occurred after 21.87 Mya, with Gentiana appearing 9.45 Mya.21 In our analysis, the divergence time of Gentiana species was ~32.1 Mya, similar to the result of 38.09 Mya. The results indicated that these two WGDs may have occurred before and after speciation in Gentiana, or both after the divergence of Gentiana. More in-depth analysis is needed as further Gentiana and Gentianaceae genomes become available. Other Gentianaceae species may have undergone additional WGD events. For example, Sinoswertia tetraptera had two independent WGD events earlier than G. macrophylla,93 and there may have been a further WGD after the divergence between E. grandiflorum and G. dahurica.94 Therefore, we speculate that these two WGDs may be Gentiana-specific, and further analysis is required based on subsequent Gentiana genomic resources.

In our assembly, about 11% of the G. straminea genome was syntenic, indicating a low proportion of intragenomic collinearity. In the collinearity analysis, a collinear block was defined as a homologous block composed of five or more consecutive collinear genes in the same direction. Most of the duplication in G. straminea was dispersed duplication (77.60%), which may be the reason for the low collinearity.95 In addition, WGD doubles whole chromosomes, and chromosomes may be broken, recombined, or mutated over the course of species evolution, leaving only fragmented collinear blocks as traces of WGD. The more frequently breaks, recombination, and mutations occurred, the fewer collinear blocks, collinear genes, and WGD traces could be detected. If more high-quality genomes of closely related species, especially those with more WGD traces, could be used for comparative analysis, a more accurate basis for analysing the evolutionary rate and chromosome variation of Gentiana may be provided.
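The block definition used here (five or more consecutive collinear genes in the same direction) can be illustrated with a deliberately simplified sketch. It is not the software used in this study and, unlike real collinearity tools, it allows no gaps between anchors. It assumes that homologous gene pairs are given as gene-order indices on one pair of chromosomal regions and that each gene appears in at most one pair; all names are hypothetical.

```python
# Simplified sketch of collinear block detection (not the authors' tool).
# Assumptions: anchors are (i, j) gene-order indices of homologous gene pairs
# on one pair of regions; each gene occurs in at most one anchor; no gaps are
# allowed between neighbouring anchors. All names are hypothetical.

def collinear_blocks(anchors, min_genes=5):
    """Return runs of >= min_genes anchors that are consecutive in both
    gene orders and keep a single direction (+1 forward, -1 reverse)."""
    blocks = []
    current = []
    direction = 0  # 0 means the direction of the current run is not yet set
    for pair in sorted(anchors):
        if current:
            di = pair[0] - current[-1][0]
            dj = pair[1] - current[-1][1]
            extends = di == 1 and abs(dj) == 1 and direction in (0, dj)
            if extends:
                direction = dj
                current.append(pair)
                continue
            # The run is broken: keep it only if it is long enough.
            if len(current) >= min_genes:
                blocks.append(current)
        current = [pair]
        direction = 0
    if len(current) >= min_genes:
        blocks.append(current)
    return blocks


if __name__ == "__main__":
    forward = [(k, 10 + k) for k in range(5)]   # 5 genes, same direction
    short = [(6 + k, 20 - k) for k in range(4)]  # only 4 genes: rejected
    for block in collinear_blocks(forward + short):
        print(len(block), block[0], block[-1])
```

In this toy input only the five-gene run is reported as a block; the four-gene reverse run falls below the threshold, which shows how fragmentation by rearrangements reduces the number of detectable blocks.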

The WGDs likely increased the copy numbers of iridoid-related genes in Gentiana compared with those of other iridoid-producing plants (O. pumila and C. acuminata), and the gene families to which GPPS and STR belong expanded significantly in G. straminea. In our analyses, there were significant differences in the number of gene families that experienced expansion among the three Gentiana species (499 in G. straminea, 1,332 in G. dahurica, 4,864 in G. macrophylla), although they experienced the same WGDs. Gene families arise from gene duplication, and extensive gene duplication often causes gene family expansion, which is a result of adaptive evolution.96 The functions of the expanded gene families in the three Gentiana species were clearly different, which was closely related to the environmental adaptability of the species and the synthesis of secondary metabolites. In addition, duplication is a factor that leads to the expansion of gene families,97 and tandem duplication was an important cause of the expansion of gene families in the three Gentiana species.27,28 Individual genes were duplicated through tandem duplication, which may enhance the function of related genes.98 As is commonly believed, WGDs increased the genome size of angiosperms, but mean genome size was not correlated with ploidy,99 with a large number of homologous gene pairs often lost after WGDs.100 WGDs played an important role in the shaping of plant genomes, and WGDs associated with gene loss were considered the main evolutionary force for new gene functionalization in plants.101 WGD and polyploidization caused rapid genomic reorganization, massive gene losses, and structural variations,102 which were an important driving force for plant genome evolution.103,104 The effects of WGD on gene losses and gene family expansion and contraction may be species-specific, and the underlying patterns need to be further revealed.

The CYP superfamily comprises a large number of monooxygenases that play an important role in many secondary metabolic pathways. In angiosperms, the number of CYP genes varies due to differences in species ploidy and genome size. For example, there are 246 CYP genes in Arabidopsis thaliana,105 279 in Vitis vinifera,106 329 in Oryza sativa,107 443 in Eucalyptus grandis, and 741 in Triticum aestivum.108 It should be noted that the number of CYP genes increases when pseudogenes and fragments are included.108 Therefore, the assembly of high-quality genomes is important for the accuracy of gene family analysis. In this study, a total of 430 CYP genes were identified in the genome of G. straminea, widely distributed across different chromosomes (Fig. 5C). The comparison also showed that there were significant differences in the number of CYP genes among Gentianales species, and the main difference was found in the CYP71 clan, the largest clan with the most families. The three Gentiana species contain the same kinds of CYP genes, but with different copy numbers, which may be related to differences in gene family expansions. In addition, no CYP79 genes were detected in Gentiana or in the other two species of Gentianales; the encoded enzyme catalyses the initial step in the synthesis of many glycosides.

The genome assembly of G. straminea, a medicinal herb containing active iridoid substances, is helpful for genetic background analysis and the study of the iridoid synthesis pathway. The contents of gentiopicrosides in the four original species of Gentianae Macrophyllae Radix are significantly different, with the highest content in G. macrophylla.109,110 Metabolomics analysis by TOF/MS and NMR also revealed chemical differences among them.111 In the present study, the genes involved in the iridoid synthesis pathway of three Gentian species were compared. The copy numbers of genes in the MVA and MEP pathways were similar in the three species, whereas those of genes in the downstream pathways of iridoid synthesis were significantly different (Supplementary Table S17). Based on transcriptome data from roots, stems, leaves, and flowers, DEGs related to the iridoid synthesis pathway were analysed. In the MVA pathway, most DEGs were highly expressed in the roots of G. straminea and G. dahurica, whereas most DEGs showed high expression levels in the flowers of G. macrophylla. In the MEP pathway, the number of highly expressed DEGs in each tissue was similar in G. straminea, and most DEGs were highly expressed in the leaves of G. dahurica and G. macrophylla. In the downstream pathway of iridoid synthesis, most DEGs were highly expressed in the flowers and leaves of G. straminea and G. dahurica, respectively, whereas most DEGs showed high expression levels in both flowers and leaves of G. macrophylla. There were clear differences in the expression pattern of genes related to the iridoid biosynthesis pathway in Gentiana, which may be the main factor leading to differences in iridoid content among Gentiana species.

Previous studies indicated that the contents of iridoids such as gentiopicroside and loganic acid in the roots of medicinal plants in Gentiana section Cruciata, including G. straminea, G. dahurica, and G. macrophylla, were significantly higher than those in the above-ground parts (stems, leaves and flowers).112–114 The results of UPLC-ESI-HRMSn analysis also showed that the chemical markers of iridoids in G. crassicaulis were mainly distributed in roots.115 As for the above-ground parts, in most studies, the content of gentiopicroside and loganic acid in flowers was higher than that in stems and leaves, but analysis of samples from different populations may show a variety of results.116,117 It is worth noting that, in addition to the place of origin, the time of harvesting also has a great impact on the content of iridoids. For example, in G. straminea and G. macrophylla, the content of gentiopicroside, loganic acid, swertiamarin, and sweroside fluctuates with the growth time of the plants.112,118 These phenomena may be caused by the ecological environment and the inherent genetic mechanism of the species. In this study, we compared the expression of genes involved in the iridoid biosynthetic pathway in different tissues of three Gentiana species and found that the expression patterns were quite different, which may be related to the different iridoid contents among tissues. It is common for genes to be expressed differently in different tissues of closely related species because of differences in species, habitat, and growth periods.119 In particular, the samples of the three Gentiana species were not collected at the same time, further leading to differences in the expression of related genes. However, due to the limited sample size in our comparison, no obvious pattern was found. The relation between gene copy number, gene family expansion, gene expression pattern, and iridoid content should be further investigated.

5. Conclusion

We report a chromosome-level genome assembly of G. straminea, and the results of this study lay a foundation for further research on gene functions and provide a reference for in situ conservation and the molecular breeding of this alpine herb. In addition, the genomic resources obtained provide insights into the molecular elucidation of the bioactive ingredients of Gentiana and increase our understanding of Gentianaceae evolution.

Funding

This study was supported by the National Natural Science Foundation of China (No. 82073959 and No. 81173654).

Conflict of interest

The authors declare no conflict of interest.

Data availability

The genomic sequencing data have been deposited at the NCBI under the BioProject accession number PRJNA1020452, and the genome assembly and annotation files have been deposited in the Figshare database (https://doi.org/10.6084/m9.figshare.25271983).

References

1.

Ge
,
S.
2022
,
A review of recent studies of plant systematics and evolution in China
,
Biodiv. Sci.
,
30
,
22385
.

2.

The Arabidopsis Genome Initiative
.
2000
,
Analysis of the genome sequence of the flowering plant Arabidopsis thaliana
,
Nature
,
408
,
796
815
.

3.

PlaBi database
. https://www.plabipd.de/plant_genomes_pa.ep
(Accessed February 1, 2024)
.

4.

He
,
T.N.
1988
,
Genus Gentiana
. In:
He
,
T.N.
(eds.)
Flora Reipublicae Popularis Sinicae
. Volume
62
.
Beijing
:
Science Press
, p.
14
75
.

5.

Chinese Pharmacopoeia Commission
.
2020
,
Pharmacopoeia of the People’s Republic of China
. Volume
1
.
Beijing
:
Chinese Medical Science and Technology Press
, p.
282
.

6.

Wang
,
W.Y.
1991
,
Jie-Ji,
In:
Yang
,
Y.C.
(eds.)
Tibetan Medicine
.
Xining
:
Qinghai People’s Publishing House
, p.
9
12
.

7.

Zhou
,
X.J.
,
Xu
,
H.F.
and
Shun
,
Q.S.
2007
,
Resource science of Chinese medicinal materials
.
Shanghai
:
Shanghai Scientific and Technological Literature Publishing House
, p.
370
.

8.

Zhao
,
Z.
,
Dorje
,
G.
, and
Wang
,
Z.
2010
,
Identification of medicinal plants used as Tibetan traditional medicine Jie-Ji
,
J. Ethnopharmacol.
,
132
,
122
6
.

9.

Ni
,
L.
,
Zhao
,
Z.
,
Xu
,
H.
,
Chen
,
S.
, and
Dorje
,
G.
2016
,
The complete chloroplast genome of Gentiana straminea (Gentianaceae), an endemic species to the Sino-Himalayan subregion
,
Gene
,
577
,
281
8
.

10.

Ala
,
K.G.
,
Zhao
,
Z.
,
Ni
,
L.
, and
Wang
,
Z.
2023
,
Comparative analysis of mitochondrial genomes of two alpine medicinal plants of Gentiana (Gentianaceae)
,
PLoS One
,
18
,
e0281134
.

11.

Ni
,
L.
,
Zhao
,
Z.
,
Meng
,
Q.
,
Dorje
,
G.
, and
Mi
,
M.
2013
,
Genetic diversity of germplasm resources of Gentiana straminea from Tibet of China, Chin. Tradit. Herb
,
Drugs
,
44
,
3212
5
.

12.

Wang
,
L.
,
Zhao
,
Z.
,
Ni
,
L.
,
Dorje
,
G.
, and
Mi
,
M.
2017
,
Assessment of genetic diversity on Gentiana straminea based on ISSR markers
,
Chin. Tradit. Herb. Drugs.
,
48
,
3168
74
.

13.

Ala
,
K.G.
,
Ni
,
L.
,
Zhao
,
Z.
,
Kang
,
H.
,
Wu
,
J.
, and
Dorje
,
G.
2022
,
Molecular pharmacognostical identification of mainstream varieties of Tibetan medicine Jieji Gabao from Sichuan and Gansu in China
,
Acad. J. Shanghai Univ. Tradit. Chin. Med.
,
36
,
22
9
.

14.

Ni
,
L.
,
Zhao
,
Z.
,
Xiong
,
B.
,
Dorje
,
G.
, and
Mi
,
M.
2016
,
A strategy for identifying six species of Sect. Cruciata (Gentiana) in Gansu using DNA barcode sequences
,
Acta Pharm. Sin.
,
51
,
821
7
.

15.

Lu
,
J.
,
Zhao
,
Z.
,
Ni
,
L.
,
Dorje
,
G.
, and
Mi
,
M.
2019
,
The identification of Sect. Cruciata (Gentiana) species using mtDNA nad1/b-c and nad5/d-e fragments
,
Acta Pharm. Sin.
,
54
,
166
72
.

16.

Ni
,
L.
,
Zhao
,
Z.
,
Wu
,
J.
,
Dorje
,
G.
, and
Mi
,
M.
2015
,
Study on embryonic development of four species of Gentiana (Gentianaceae)
,
J. Chin. Med. Mater.
,
38
,
1572
6
.

17.

Zheng
,
H.R.
,
Chu
,
Y.
,
Li
,
W.
, et al.
2018
,
Research progress on pharmacokinetics of natural iridoids
,
Drug. Eval. Res.
,
41
,
1147
53
.

18.

Yang
,
R.
,
Fang
,
L.
,
Li
,
J.
, and
Zhang
,
Y.
2018
,
Research progress on biosynthetic pathways and related enzymes of iridoid glycosides
,
Chin. Tradit. Herb. Drugs
,
49
,
2482
8
.

19.

Vranova
,
E.
,
Coman
,
D.
, and
Gruissem
,
W.
2013
,
Network analysis of the MVA and MEP pathways for isoprenoid synthesis
,
Annu. Rev. Plant Biol.
,
64
,
665
700
.

20.

Fu
,
P.C.
,
Sun
,
S.S.
,
Twyford
,
A.D.
, et al.
2021
,
Lineage-specific plastid degradation in subtribe Gentianinae (Gentianaceae)
,
Ecol. Evol.
,
11
,
3286
99
.

21.

Zhang
,
Y.
,
Yu
,
J.
,
Xia
,
M.
, et al.
2021
,
Plastome sequencing reveals phylogenetic relationships among Comastoma and related taxa (Gentianaceae) from the Qinghai-Tibetan Plateau
,
Ecol. Evol.
,
11
,
16034
46
.

22.

Yuan
,
Y.M.
1993
,
Karyological studies on Gentiana section Cruciata Gaudin (Gentianaceae) from China
,
Caryologia
,
46
,
99
114
.

23.

Zhang
,
X.L.
,
Wang
,
Y.J.
,
Ge
,
X.J.
,
Yuan
,
Y.-M.
,
Yang
,
H.-L.
, and
Liu
,
J.-Q.
2009
,
Molecular phylogeny and biogeography of Gentiana sect. Cruciata (Gentianaceae) based on four chloroplast DNA datasets
,
Taxon
,
58
,
862
70
.

24.

Zhou
,
T.
,
Wang
,
J.
,
Jia
,
Y.
,
Li
,
W.
,
Xu
,
F.
, and
Wang
,
X.
2018
,
Comparative chloroplast genome analyses of species in Gentiana section Cruciata (Gentianaceae) and the development of authentication markers
,
Int. J. Mol. Sci.
,
19
,
1962
.

25.

Li
,
X.
,
Wang
,
L.
,
Yang
,
H.
, and
Liu
,
J.
2008
,
Confirmation of natural hybrids between Gentiana straminea and G. siphonantha (Gentianaceae) based on molecular evidence
,
Front. Biol. China.
,
3
,
470
6
.

26.

Zhang
,
X.
,
Ge
,
X.
,
Liu
,
J.
, and
Yuan
,
Y.
2006
,
Morphological, karyological and molecular delimitation of two gentians: Gentiana crassicaulis versus G. tibetica (Gentianaceae)
,
Acta Phytotax. Sin.
,
44
,
627
40
.

27.

Zhou
,
T.
,
Bai
,
G.
,
Hu
,
Y.
,
Ruhsam
,
M.
,
Yang
,
Y.
, and
Zhao
,
Y.
2022
,
De novo genome assembly of the medicinal plant Gentiana macrophylla provides insights into the genomic evolution and biosynthesis of iridoids
,
DNA Res.
,
29
,
1
15
.

28.

Li
,
T.
,
Yu
,
X.
,
Ren
,
Y.
, et al.
2022
,
The chromosome-level genome assembly of Gentiana dahurica (Gentianaceae) provides insights into gentiopicroside biosynthesis
,
DNA Res.
,
29
,
1
10
.

29.

Porebski
,
S.
,
Bailey
,
L.G.
, and
Baum
,
B.R.
1997
,
Modification of a CTAB DNA extraction protocol for plants containing high polysaccharide and polyphenol components
,
Plant Mol. Biol. Rep.
,
15
,
8
15
.

30.

Marçais
,
G.
and
Kingsford
,
C.
2011
,
A fast, lock-free approach for efficient parallel counting of occurrences of k-mers
,
Bioinformatics
,
27
,
764
70
.

31.

Walker
,
B.J.
,
Abeel
,
T.
,
Shea
,
T.
, et al.
2014
,
Pilon: an integrated tool for comprehensive microbial variant detection and genome assembly improvement
,
PLoS One
,
9
,
e112963
.

32.

Roach
,
M.J.
,
Schmidt
,
S.A.
, and
Borneman
,
A.R.
2018
,
Purge Haplotigs: allelic contig reassignment for third-gen diploid genome assemblies
,
BMC Bioinf.
,
19
,
460
.

33.

Simao
,
F.A.
,
Waterhouse
,
R.M.
,
Ioannidis
,
P.
,
Kriventseva
,
E.V.
, and
Zdobnov
,
E.M.
2015
,
BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs
,
Bioinformatics.
,
31
,
3210
2
.

34.

Ou
,
S.
and
Jiang
,
N.
2018
,
LTR_retriever: a highly accurate and sensitive program for identification of long terminal repeat retrotransposons
,
Plant Physiol.
,
176
,
1410
22
.

35.

Ramani
,
V.
,
Deng
,
X.
,
Qiu
,
R.
, et al.
2020
,
Sci-Hi-C: a single-cell Hi-C method for mapping 3D genome organization in large number of single cells
,
Methods.
,
170
,
61
8
.

36.

Robinson
,
J.T.
,
Turner
,
D.
,
Durand
,
N.C.
,
Thorvaldsdóttir
,
H.
,
Mesirov
,
J.P.
, and
Aiden
,
E.L.
2018
,
Juicebox.js provides a cloud-based visualization system for Hi-C data
,
Cell Syst.
,
6
,
256
8.e1
.

37.

Price
,
A.L.
,
Jones
,
N.C.
, and
Pevzner
,
P.A.
2005
,
De novo identification of repeat families in large genomes
,
Bioinformatics.
,
21
,
i351
8
.

38.

Xu
,
Z.
and
Wang
,
H.
2007
,
LTR_FINDER: an efficient tool for the prediction of full-length LTR retrotransposons
,
Nucleic Acids Res.
,
35
,
W265
8
.

39.

Tarailo-Graovac
,
M.
and
Chen
,
N.
2009
,
Using RepeatMasker to identify repetitive elements in genomic sequences
,
Curr. Protoc. Bioinformatics.
,
25
,
1
14
.

40.

Haas
,
B.J.
,
Delcher
,
A.L.
,
Mount
,
S.M.
, et al.
2003
,
Improving the Arabidopsis genome annotation using maximal transcript alignment assemblies
,
Nucleic Acids Res.
,
31
,
5654
66
.

41.

Nachtweide
,
S.
and
Stanke
,
M.
2019
,
Multi-genome annotation with AUGUSTUS
,
Methods mol biol.
,
1962
,
139
60
.

42.

Burge
,
C.
and
Karlin
,
S.
1997
,
Prediction of complete gene structures in human genomic DNA
,
J. Mol. Biol.
,
268
,
78
94
.

43.

Majoros
,
W.H.
,
Pertea
,
M.
, and
Salzberg
,
S.L.
2004
,
TigrScan and GlimmerHMM: two open source ab initio eukaryotic gene-finders
,
Bioinformatics
,
20
,
2878
9
.

44.

Liang
,
Y.
,
Li
,
F.
,
Gao
,
Q.
, et al.
2022
,
The genome of Eustoma grandiflorum reveals the wholegenome triplication event contributing to ornamental traits in cultivated lisianthus
,
Plant Biotechnol. J.
,
20
,
1856
8
.

45.

Franke
,
J.
,
Kim
,
J.
,
Hamilton
,
J.P.
, et al.
2019
,
Gene Discovery in Gelsemium highlights conserved gene clusters in monoterpene indole alkaloid biosynthesis
,
ChemBioChem
,
20
,
83
7
.

46.

Denoeud
,
F.
,
Carretero-Paulet
,
L.
,
Dereeper
,
A.
, et al.
2014
,
The coffee genome provides insight into the convergent evolution of caffeine biosynthesis
,
Science.
,
345
,
1181
4
.

47.

Clément
,
C.
,
Emily
,
A.S.
,
Hans
,
J.J.
, et al.
2022
,
An updated version of the Madagascar periwinkle genome
,
F1000Research.
,
11
,
1541
.

48.

Hoopes
,
G.M.
,
Hamilton
,
J.P.
,
Kim
,
J.
, et al.
2018
,
Genome assembly and annotation of the medicinal plant Calotropis gigantea, a producer of anticancer and antimalarial cardenolides
,
G3 (Bethesda).
,
8
,
385
91
.

49. Cantarel, B.L., Korf, I., Robb, S.M., et al. 2008, MAKER: an easy-to-use annotation pipeline designed for emerging model organism genomes, Genome Res., 18, 188–96.

50. Jones, P., Binns, D., Chang, H.Y., et al. 2014, InterProScan 5: genome-scale protein function classification, Bioinformatics, 30, 1236–40.

51. Lau, K.H., Bhat, W.W., Hamilton, J.P., et al. 2020, Genome assembly of Chiococca alba uncovers key enzymes involved in the biosynthesis of unusual terpenoids, DNA Res., 27, dsaa013.

52. Hellsten, U., Wright, K.M., Jenkins, J., et al. 2013, Fine-scale variation in meiotic recombination in Mimulus inferred from population shotgun sequencing, Proc. Natl. Acad. Sci. USA, 110, 19478–82.

53. Su, X., Wang, B., Geng, X., et al. 2021, A high-continuity and annotated tomato reference genome, BMC Genomics, 22, 898.

54. Badouin, H., Gouzy, J., Grassa, C.J., et al. 2017, The sunflower genome provides insights into oil metabolism, flowering and Asterid evolution, Nature, 546, 148–52.

55. Kang, M., Fu, R., Zhang, P., et al. 2021, A chromosome-level Camptotheca acuminata genome assembly provides insights into the evolutionary origin of camptothecin biosynthesis, Nat. Commun., 12, 3531.

56. Rai, A., Hirakawa, H., Nakabayashi, R., et al. 2021, Chromosome-level genome assembly of Ophiorrhiza pumila reveals the evolution of camptothecin biosynthesis, Nat. Commun., 12, 405.

57. Massonnet, M., Cochetel, N., Minio, A., et al. 2020, The genetic basis of sex determination in grapes, Nat. Commun., 11, 2902.

58. Li, L., Stoeckert, C.J., and Roos, D.S. 2003, OrthoMCL: identification of ortholog groups for eukaryotic genomes, Genome Res., 13, 2178–89.

59. Han, M.V., Thomas, G.W., Lugo-Martinez, J., and Hahn, M.W. 2013, Estimating gene gain and loss rates in the presence of error in genome assembly and annotation using CAFE 3, Mol. Biol. Evol., 30, 1987–97.

60. Edgar, R.C. 2004, MUSCLE: multiple sequence alignment with high accuracy and high throughput, Nucleic Acids Res., 32, 1792–7.

61. Capella-Gutiérrez, S., Silla-Martínez, J.M., and Gabaldón, T. 2009, trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses, Bioinformatics, 25, 1972–3.

62. Stamatakis, A. 2014, RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies, Bioinformatics, 30, 1312–3.

63. Yang, Z. 2007, PAML 4: phylogenetic analysis by maximum likelihood, Mol. Biol. Evol., 24, 1586–91.

64. Guan, R., Zhao, Y., Zhang, H., et al. 2016, Draft genome of the living fossil Ginkgo biloba, GigaScience, 5, 49.

65. Zhang, X., Zhang, Y., Kou, Y., et al. 2023, Diploid chromosome-level reference genome and population genomic analyses provide insights into Gypenoside biosynthesis and demographic evolution of Gynostemma pentaphyllum (Cucurbitaceae), Hortic. Res., 10, uhac231.

66. Guo, L., Winzer, T., Yang, X., et al. 2018, The opium poppy genome and morphinan production, Science, 362, 343–7.

67. Frith, M.C., Hamada, M., and Horton, P. 2010, Parameters for accurate genome alignment, BMC Bioinf., 11, 1–14.

68. Tang, H., Bowers, J.E., Wang, X., Ming, R., Alam, M., and Paterson, A.H. 2008, Synteny and collinearity in plant genomes, Science, 320, 486–8.

69. Katoh, K. and Standley, D.M. 2013, MAFFT multiple sequence alignment software version 7: improvements in performance and usability, Mol. Biol. Evol., 30, 772–80.

70. Guindon, S., Lethiec, F., Duroux, P., and Gascuel, O. 2005, PHYML Online-a web server for fast maximum likelihood-based phylogenetic inference, Nucleic Acids Res., 33, W557–9.

71. Jones, D.T., Taylor, W.R., and Thornton, J.M. 1992, The rapid generation of mutation data matrices from protein sequences, Comput. Appl. Biosci., 8, 275–82.

72. Liu, W., Guo, W., Chen, S., et al. 2022, A high-quality reference genome sequence and genetic transformation system of Aralia elata, Front. Plant Sci., 13, 822942.

73. Roberts, A. and Pachter, L. 2013, Streaming fragment assignment for real-time analysis of sequencing experiments, Nat. Methods, 10, 71–3.

74. Love, M.I., Huber, W., and Anders, S. 2014, Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2, Genome Biol., 15, 1–21.

75. Ou, S., Chen, J., and Jiang, N. 2018, Assessing genome assembly quality using the LTR Assembly Index (LAI), Nucleic Acids Res., 46, e126.

76. Van de Peer, Y., Fawcett, J.A., Proost, S., Sterck, L., and Vandepoele, K. 2009, The flowering world: a tale of duplications, Trends Plant Sci., 14, 680–8.

77. Sudhir, K., Glen, S., Michael, S., et al. 2017, TimeTree: a resource for timelines, timetrees, and divergence times, Mol. Biol. Evol., 34, 1812–9.

78. Schuler, M.A. 2015, P450s in plants, insects, and their fungal pathogens, Cytochrome P450, 409–49.

79. Coon, M.J. 2005, Cytochrome P450: Nature’s most versatile biological catalyst, Annu. Rev. Pharmacol. Toxicol., 45, 1–25.

80. Paquette, S.M., Bak, S., and Feyereisen, R. 2000, Intron-exon organization and phylogeny in a large superfamily, the paralogous cytochrome P450 genes of Arabidopsis thaliana, DNA Cell Biol., 19, 307–17.

81. Yu, J., Tehrim, S., Wang, L., et al. 2017, Evolutionary history and functional divergence of the cytochrome P450 gene superfamily between Arabidopsis thaliana and Brassica species uncover effects of whole genome and tandem duplications, BMC Genomics, 18, 733.

82. Ma, Y., Cai, Y., Ma, X., et al. 2020, Research progress of P450 in the biosynthesis of bioactive compound of medicinal plants, Acta Pharm. Sin., 55, 1573–89.

83. Ganjewala, D., Kumar, S., Devi, A., and Ambika, K. 2010, Advances in cyanogenic glycosides biosynthesis and analyses in plants, Acta Biol. Szegediensis, 54, 1–14.

84. Sonderby, I.E., Geu-Flores, F., and Halkier, B.A. 2010, Biosynthesis of glucosinolates-gene discovery and beyond, Trends Plant Sci., 15, 283–90.

85. Xin, T., Zhang, Y., Pu, X., Gao, R., Xu, Z., and Song, J. 2019, Trends in Herbgenomics, Sci. China Life Sci., 62, 288–308.

86. The Angiosperm Phylogeny Group. 2016, An update of the Angiosperm Phylogeny Group classification for the orders and families of flowering plants: APG IV, Bot. J. Linn. Soc., 181, 1–20.

87. Ni, L., Zhao, Z., Xu, H., Chen, S., and Dorje, G. 2017, Chloroplast genome structures in Gentiana (Gentianaceae), based on three medicinal alpine plants used in Tibetan herbal medicine, Curr. Genet., 63, 241–52.

88. Zhang, C., Zhang, T., Luebert, F., et al. 2020, Asterid phylogenomics/phylotranscriptomics uncover morphological evolutionary histories and support phylogenetic placement for numerous whole-genome duplications, Mol. Biol. Evol., 37, 3188–210.

89. Cao, Q., Xu, L.H., Wang, J.L., Zhang, F.Q., and Chen, S.L. 2021, Molecular phylogeny of subtribe Swertiinae, Bull. Bot. Res., 41, 408–18.

90. Ho, T., Liu, S.W., and Lu, X.F. 1996, A phylogenetic analysis of Gentiana (Gentianaceae), Acta Phytotax. Sin., 34, 505–30.

91. Wu, S., Han, B., and Jiao, Y. 2020, Genetic contribution of paleopolyploidy to adaptive evolution in Angiosperms, Mol. Plant, 13, 59–71.

92. Van de Peer, Y., Maere, S., and Meyer, A. 2009, The evolutionary significance of ancient genome duplications, Nat. Rev. Genet., 10, 725–32.

93. Zhu, M., Wang, Z., Yang, Y., Wang, Z., Mu, W., and Liu, J. 2023, Multi-omics reveal differentiation and maintenance of dimorphic flowers in an alpine plant on the Qinghai-Tibet Plateau, Mol. Ecol., 32, 1411–24.

94. Shirasawa, K., Arimoto, R., Hirakawa, H., et al. 2023, Chromosome-scale genome assembly of Eustoma grandiflorum, the first complete genome sequence in the genus Eustoma, G3, 13, jkac329.

95. Cheng, J., Wang, X., Liu, X., et al. 2021, Chromosome-level genome of Himalayan yew provides insights into the origin and evolution of the paclitaxel biosynthetic pathway, Mol. Plant, 14, 1199–209.

96. Xiong, Y., Mei, W., Kim, E., et al. 2014, Adaptive expansion of the maize maternally expressed gene (Meg) family involves changes in expression patterns and protein secondary structures of its members, BMC Plant Biol., 14, 204.

97. Cannon, S.B., Mitra, A., Baumgarten, A., Young, N.D., and May, G. 2004, The roles of segmental and tandem gene duplication in the evolution of large gene families in Arabidopsis thaliana, BMC Plant Biol., 4, 10.

98. Wu, W., Zhu, S., Xu, L., et al. 2022, Genome-wide identification of the Liriodendron chinense WRKY gene family and its diverse roles in response to multiple abiotic stress, BMC Plant Biol., 22, 25.

99. Zenil-Ferguson, R., Ponciano, J.M., and Burleigh, J.G. 2016, Evaluating the role of genome downsizing and size thresholds from genome size distributions in angiosperms, Am. J. Bot., 103, 1175–86.

100. Sankoff, D., Zheng, C.F., and Zhu, Q.A. 2010, The collapse of gene complement following whole genome duplication, BMC Genomics, 11, 313.

101. Hollister, J.D. 2015, Polyploidy: adaptation to the genomic environment, New Phytol., 205, 1034–9.

102. Chaney, L., Sharp, A.R., Evans, C.R., and Udall, J.A. 2016, Genome mapping in plant comparative genomics, Trends Plant Sci., 21, 770–80.

103. Soltis, P.S. and Soltis, D.E. 2016, Ancient WGD events as drivers of key innovations in angiosperms, Curr. Opin. Plant Biol., 30, 159–65.

104. Panchy, N., Lehti-Shiu, M., and Shiu, S.H. 2016, Evolution of gene duplication in plants, Plant Physiol., 171, 2294–316.

105. Nelson, D.R., Schuler, M.A., Paquette, S.M., Werck-Reichhart, D., and Bak, S. 2004, Comparative genomics of rice and Arabidopsis. Analysis of 727 cytochrome P450 genes and pseudogenes from a monocot and a dicot, Plant Physiol., 135, 756–72.

106. Ilc, T., Arista, G., Tavares, R., et al. 2018, Annotation, classification, genomic organization and expression of the Vitis vinifera CYPome, PLoS One, 13, e0199902.

107. Nelson, D. and Werck-Reichhart, D. 2011, A P450-centric view of plant evolution, Plant J., 66, 194–211.

108. Hansen, C.C., Nelson, D.R., Møller, B.L., and Werck-Reichhart, D. 2021, Plant cytochrome P450 plasticity and evolution, Mol. Plant, 14, 1244–65.

109. Zheng, P., Zhang, K., and Wang, Z. 2011, Genetic diversity and gentiopicroside content of four Gentiana species in China revealed by ISSR and HPLC methods, Biochem. Syst. Ecol., 39, 704–10.

110. Wu, L.H., Ye, Y., Li, X.S., Duan, C.H., and Wang, Z.T. 2009, RP-HPLC determination of gentiopicroside in Radix Gentianae Macrophyllae in traditional area, Chin. J. Pharm. Anal., 29, 184–7.

111. Li, Z., Du, Y., Yuan, Y., Zhang, X., Wang, Z., and Tian, X. 2020, Integrated quality evaluation strategy for multi-species resourced herb medicine of Qinjiao by metabolomics analysis and genetic comparation, Chin. Med., 15, 16.

112. Wang, W., Liang, Z., Xu, L., et al. 2014, Dynamic changes of yield and active component mass fraction in different parts of Gentiana macrophylla Pall. at different ages, Acta Agric. Boreali-Occidentalis Sin., 23, 167–71.

113. Ma, X., Zhu, J., He, L., et al. 2009, Determination of gentiopicroside in different parts of Gentiana macrophylla from Gansu Province, Chin. J. Exp. Tradit. Med. Form., 15, 10–1.

114. Li, J., Li, F., Li, X., et al. 2004, Analysis on the amounts of gentiopicrin in different location of Gentiana crassicaulis Duthie ex Burk, Nat. Prod. Res. Dev., 16, 225.

115. Chen, J. and Zeng, R. 2018, Application of metabolomics approach to study on chemical constituents in different parts of Gentiana crassicaulis based on UPLC-ESI-HRMSn, Chin. Tradit. Herb. Drugs, 49, 2328–35.

116. Li, X.Y., Li, F.A., Li, J.M., and Wei, Q.J. 2005, Distribution of gentiopicroside in Gentiana straminea and Gentiana dahurica from Qinghai Province, J. Chin. Med. Mater., 28, 174–6.

117. Cao, X.Y., Wang, Z.J., and Wang, Z.Z. 2012, Comparative analysis of contents of four iridoid glucosides in different organs of four species of Gentiana L., J. Plant Resour. Environ., 21, 58–63.

118. Sun, J., Li, Y.L., Ji, L.J., et al. 2006, HPLC determination of contents of four active constituents in Tibetan medicine Gentiana straminea (Gentianaceae) during different growing period, Acta Bot. Yunnan., 28, 219–22.

119. Zhou, T., Luo, X., Yu, C., et al. 2019, Transcriptome analyses provide insights into the expression pattern and sequence similarity of several taxol biosynthesis-related genes in three Taxus species, BMC Plant Biol., 19, 33.

Author notes

Gyab Ala Kelsang and Lianghong Ni contributed equally to this work.

This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact [email protected] for reprints and translation rights for reprints. All other permissions can be obtained through our RightsLink service via the Permissions link on the article page on our site—for further information please contact [email protected].