Abstract

High-throughput sequencing technologies have produced a large amount of omics data in plants, a valuable asset for cross-species multi-omics comparative analysis. Nevertheless, the lack of comprehensive platforms that provide both evolutionary annotation and multi-species multi-omics data prevents users from systematically and efficiently performing evolutionary and functional analysis of specific genes. To establish a plant multi-omics platform that provides timely, accurate, and high-quality omics information, we collected 7 types of omics data from 6 monocots, 6 dicots, and 1 moss, and reanalyzed these data using standardized pipelines. Additionally, we provide homology information, duplication events, and phylostratigraphic stages for the 13 species to facilitate evolutionary analysis. Furthermore, the integrative plant omics platform (IPOP) is bundled with a variety of online analysis tools that help users conduct evolutionary and functional analysis. Specifically, the Multi-omics Integration Analysis tool consolidates information from diverse omics sources, while the Transcriptome-wide Association Analysis tool links functional analysis with phenotype. To illustrate the application of IPOP, we conducted a case study on the YTH domain gene family, in which we observed shared functions within orthologous groups and variations in evolutionary patterns across these groups. In summary, IPOP offers evolutionary insights and multi-omics data to the plant sciences community, addressing the need for cross-species comparison and evolutionary research platforms. All data and modules within IPOP are freely accessible for academic purposes (http://omicstudio.cloud:4012/ipod/).

Introduction

The evolutionary divergence of plant species is evident not only in the structures of their genomes but also in the genetic and epigenetic variations of homologous genes across various molecular processes, including nucleotide polymorphism, gene expression, and modification (Miao et al. 2020, 2022; Theissinger et al. 2023). With the advent of high-throughput sequencing technologies in the postgenomic era, an extensive collection of omics data has been generated in the plant kingdom. These data sets serve as valuable experimental resources for investigating the patterns of plant species evolution. However, the complete utilization of multi-omics data poses a significant challenge in the field of plant evolutionary study. Consequently, the development of an informative multi-omics platform that encompasses extensive evolutionary annotations and practical integrated analysis tools across multiple species is crucial for facilitating cross-species comparison and investigating the evolutionary implications of gene functions.

In recent years, several multi-omics integrated databases tailored to specific plant species have emerged, aiming to enhance functional genomics research through the use of omics data. For instance, AtMAD is an Arabidopsis multi-omics association database that facilitates the exploration of interactions across various levels, including the genome, transcriptome, methylome, environment, pathway, and phenotype (Lan et al. 2021). CottonFGD integrates diverse data sets encompassing cotton genomes, transcriptomes, and population genomes, catering to the needs of the cotton functional genomics and breeding research community (Zhu et al. 2017). RiceENCODE, an encyclopedia of DNA elements specific to rice, brings together data sets on 3D chromatin interactions, histone modification, chromatin accessibility, DNA methylation, and transcriptomes for rice epigenomics and functional genomics research (Xie et al. 2021). WheatOmics integrates various multi-omics data sets and provides practical toolkits to expedite functional genomics research in wheat (Ma et al. 2021). SoyOmics integrates multi-omics data sets, such as population genomes, transcriptomes, and epigenetic data, for studying soybean gene function and molecular breeding (Liu et al. 2023). Despite the ongoing expansion and frequent use of these databases, each is restricted to the omics features of an individual species, which hinders a comprehensive understanding of the evolutionary implications of homologous genes.

Recently, a number of integrative databases have emerged that aim to compare and analyze omics features from multiple plant species. For instance, PODC focuses on analyzing gene expression networks of 8 plant species (Ohyanagi et al. 2015), MaGenDB provides functional genomics information for 13 Malvaceae plants (Wang et al. 2020), and ChIP-Hub offers an exploration of plant regulome and epigenome information (Fu et al. 2022). These databases have taken on the challenge of integrating and analyzing diverse omics features across different plant species. However, new challenges have surfaced, including the absence of adequate platforms for integrating multi-omics knowledge with the evolutionary characteristics of homologous genes, as well as the infrequent updating of these platforms with newly published omics data. These limitations impede the integration of omics sciences with the investigation of molecular evolution and function.

The objective of this study was to construct an integrative plant omics platform (IPOP) in order to investigate the correlation between multi-omics features and evolutionary characteristics. The ultimate aim was to use the acquired knowledge for cross-species evolutionary analysis and gene functional research. A large collection of recently published multi-omics data sets, encompassing population genomics, 3D genomics, epigenomics, transcriptomics, and epitranscriptomics, was gathered from 13 representative plant species, consisting of 6 monocotyledons, 6 dicotyledons, and 1 moss. Each category of omics data sets underwent individual reanalysis using a standardized pipeline. Furthermore, we have developed a range of practical online tools to facilitate the molecular evolutionary and functional analysis of putative gene families. These tools encompass omics comparison analysis, multi-omics integration analysis, transcriptome-wide association analysis (TWAS), evolutionary feature annotation, phylogenetic tree construction, and basic local alignment search tool (BLAST).

Results and Discussion

Overview of IPOP

The primary structure of IPOP is depicted in Fig. 1. A multitude of recently published high-throughput sequencing data sets were gathered from a total of 13 plant species (Table 1), encompassing chromatin immunoprecipitation sequencing (ChIP-seq), assay for transposase-accessible chromatin using sequencing (ATAC-seq), high-throughput chromosome conformation capture (Hi-C), bisulfite sequencing (BS-seq), RNA-seq, N6-methyladenosine (m6A)-seq, and associations identified by genome-wide association study (GWAS) of population resequencing data. Each type of data set underwent individual quality control and processing through a standardized pipeline. All acquired omics knowledge was systematically integrated into IPOP to facilitate data access and visualization (http://omicstudio.cloud:4012/ipod/). Specifically, IPOP encompasses comprehensive information regarding the distribution patterns and abundance of 2,530,647 modification peaks identified from ChIP-seq data, 10,000,863 accessible chromatin regions identified from ATAC-seq data, 5,367 chromatin loops and 565 topologically associating domains (TADs) identified from Hi-C data, 48,193,750 DNA 5mC methylated genes identified from BS-seq, abundance of 52,651,064 gene transcripts identified from RNA-seq, 683,022 RNA m6A methylation peaks identified from m6A-seq data, and 106,876 trait-associated SNPs identified from population resequencing data. Moreover, drawing upon this knowledge, we have devised a range of functional modules, namely Omics-Hub, Evolutionary Genomics, Search, Browser, Tools, and Download, to help users explore plant biological mechanisms across multiple omics levels. These modules are elaborated upon in subsequent sections.

Overview of IPOP. A) Data resources included 7 types of omics data from 13 species. B) All data are stored in a MySQL relational database with additional indexes. JavaScript and jQuery are used for the front end, and back-end programs are implemented with SpringBoot and MyBatis. C) Overview of the functional modules and usage of IPOP.
Fig. 1.


Table 1

Summary of omics data sets collected in IPOP

Species                 ChIP-seq  ATAC-seq  Hi-C  GWAS    BS-seq  RNA-seq  m6A-seq
Arabidopsis thaliana    168       255       16    42,207  243     276      63
Gossypium arboreum      /         2         1     /       /       99       3
Gossypium hirsutum      8         2         3     /       4       85       3
Phaseolus vulgaris      6         6         /     /       92      85       3
Glycine max             22        4         /     444     52      109      3
Solanum lycopersicum    36        2         9     /       6       109      18
Sorghum bicolor         /         2         /     2,817   4       115      3
Zea mays                /         127       /     41,219  7       101      39
Aegilops tauschii       6         3         /     /       3       197      3
Triticum dicoccoides    /         /         /     /       /       135      3
Triticum aestivum       43        19        /     /       /       129      9
Oryza sativa            75        85        2     20,189  17      187      36
Physcomitrella patens   /         30        /     /       2       99       3

The Omics-Hub Module

The Omics-Hub module encompasses 7 distinct types of omics profiles, including histone modification, chromatin accessibility, chromatin interaction, DNA 5mC modification, transcriptome, RNA m6A modification, and population genomics. The web page for each individual omics profile displays statistical information, encompassing strain names, tissues, total sample count, detailed descriptions, and data quality of all experimental samples. Furthermore, users can extract a range of information from the sample page associated with each omics feature. For instance, within the sample page of histone modification, users can access distribution patterns and detailed descriptions of histone modification peaks (supplementary fig. S1, Supplementary Material online). Within the sample page of chromatin accessibility, users can acquire distribution patterns and detailed descriptions of accessible chromatin regions (supplementary fig. S2, Supplementary Material online). The sample page of chromatin interaction provides information on chromatin loops and TADs (supplementary fig. S3, Supplementary Material online). The sample page of DNA modification gives access to gene methylation levels in the CG, CHG, and CHH contexts (supplementary fig. S4, Supplementary Material online). Additionally, the sample page of transcriptome offers the transcriptional levels of gene transcripts (supplementary fig. S5, Supplementary Material online). The sample page of RNA modification provides information on m6A peak distribution, abundance, sequence motif enrichment, and the transcriptional profiles of m6A regulator candidates (supplementary fig. S6, Supplementary Material online). On the population genomics profile page, users have access to information on SNP-trait associations, including genotype, phenotype, significance of the association, and related protein-coding genes (supplementary fig. S7, Supplementary Material online). The sequencing read coverage of each omics feature can be visualized through a customized JBrowse instance. These omics features have been integrated into their respective gene information pages, which also include evolutionary genomic annotations (supplementary fig. S8, Supplementary Material online). Additionally, we have implemented an “Omics Search” function in each omics profile page, enabling users to search conveniently for the knowledge profile of a specific sample.

The Evolutionary Genomics Module

Through the utilization of genomic evolution analysis, a total of 512,106 orthologous genes, 30,454 orthologous groups (orthogroups), and 567,209 duplicate genes were identified from a pool of 612,396 protein-coding genes across 13 plant species (supplementary tables S1 and S2, Supplementary Material online). Building upon this foundation, we have developed an Evolutionary Genomics module that encompasses comprehensive knowledge pertaining to gene homology, evolutionary origins, and duplication events.

The Evolutionary Genomics module consists of 5 main components. The first part displays the basic information of the evolutionary genomics module, encompassing the count of species, phylostratigraphic stages, and orthogroups (Fig. 2A). Elaborate information can be acquired by either hovering over or clicking on the specific statistical values. The subsequent component of the module encompasses a phylostratigraphic tree, which visually represents the evolutionary stages of orthologous genes (Fig. 2B). By hovering over or clicking on the label associated with a phylostratigraphic stage, users can access the corresponding speciation time and information page (supplementary fig. S9, Supplementary Material online). The third section presents a stacked bar chart illustrating the distribution of orthogroups across different phylostratigraphic stages (Fig. 2C). Each orthogroup is associated with a profile page that provides diverse relevant information (supplementary fig. S10, Supplementary Material online). The fourth section provides a summary of the number of orthologous genes, orthogroups, and phylostratigraphic stages for each species (Fig. 2D). The fifth section presents the breakdown of duplication events in each species (Fig. 2E). For more detailed information on these evolutionary features, users can access the corresponding bars by clicking on them. Finally, users are able to conveniently retrieve the information pertaining to specific orthologous genes, orthogroups, phylostratigraphic stages, and duplication events through the utilization of the “Search” function integrated within the Evolutionary Genomics page.
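
As a rough illustration of how an orthogroup can be placed on a phylostratigraphic tree, the sketch below assigns each orthogroup to the deepest clade shared by all species that contribute genes to it (their most recent common ancestor). This is an assumption made for illustration only; the procedure used by IPOP is not described in this section, and the simplified lineages, the species subset, and the orthogroup names (OG_A, OG_B, OG_C) are hypothetical.

```python
from collections import Counter

# Hypothetical, simplified lineages (root -> tip) for a few of the 13 species.
LINEAGES = {
    "Physcomitrella patens": ["Embryophyta"],
    "Arabidopsis thaliana": ["Embryophyta", "Angiosperms", "Dicots"],
    "Glycine max": ["Embryophyta", "Angiosperms", "Dicots"],
    "Oryza sativa": ["Embryophyta", "Angiosperms", "Monocots"],
    "Zea mays": ["Embryophyta", "Angiosperms", "Monocots"],
}


def phylostratum(species):
    """Return the deepest clade shared by all listed species."""
    lineages = [LINEAGES[name] for name in species]
    stage = None
    # Walk down the lineages level by level until they disagree.
    for level in zip(*lineages):
        if len(set(level)) != 1:
            break
        stage = level[0]
    return stage


# Hypothetical orthogroups, given as the species that contribute members.
ORTHOGROUPS = {
    "OG_A": ["Physcomitrella patens", "Arabidopsis thaliana", "Oryza sativa"],
    "OG_B": ["Arabidopsis thaliana", "Zea mays"],
    "OG_C": ["Oryza sativa", "Zea mays"],
}

stages = {og: phylostratum(members) for og, members in ORTHOGROUPS.items()}
stage_counts = Counter(stages.values())  # orthogroups per stage, as in a bar chart

print(stages)
print(stage_counts)
```

Counting orthogroups per stage in this way yields the kind of summary shown in a stacked bar chart of orthogroups across phylostratigraphic stages.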

The main structure of evolutionary genomics modules. A) Basic information of the evolutionary genomics module. B) Phylostratigraphic tree. C) Number of orthogroups in each phylostratigraphic stage. D) Statistics of orthologous genes and groups in each species. E) Statistics of duplication events in each species.
Fig. 2.


Tools for Integrative Analysis of Multi-omics and Evolution

To facilitate the investigation of cross-species evolution and gene function, we have developed a range of online tools. These tools enable users to investigate and interpret the associations between multi-omics data and evolutionary characteristics. First, the BLAST tool (Version 2.12.0) (Camacho et al. 2009) is used to identify homologs of genes of interest, and the results are presented in a downloadable tabular format (Fig. 3A). The Phylogenetic Tree Construction tool, which embeds FastTree (Version 2.1.11) (Price et al. 2009), constructs evolutionary trees for investigating the evolutionary mechanisms of gene families (Fig. 3B). The Evolutionary Information Annotation tool offers additional evolutionary data, including orthogroup statistics (Fig. 3C), phylostratigraphic stage (Fig. 3D), and duplication events (Fig. 3E), which can be accessed by hovering over or clicking on the respective bars. Furthermore, the Omics Comparison Analysis tool and the Multi-omics Integration Analysis tool enable functional analysis of gene families. The Omics Comparison Analysis tool conducts statistical analysis and identifies differences between samples of omics data. The tool presents overlapped and unique peaks in a bar graph (Fig. 3F). The differences in expression and modification levels are calculated using DESeq2 (Version 1.30.1) (Love et al. 2014) and CGmapTools (Guo et al. 2018), and are visualized as volcano plots (Fig. 3G). The Multi-omics Integration Analysis tool combines different omics data using BEDTools (Version 2.30.0) (Quinlan and Hall 2010) to facilitate the identification of significant regions and provide contextual information (Fig. 3H). Lastly, the Transcriptome-wide Association Analysis tool, based on FUSION (Gusev et al. 2016), establishes correlations between gene expression levels and phenotypic variation, thereby enhancing the functional investigation of gene families (Fig. 3I).
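
The overlap logic behind the peak comparison and integration steps can be sketched in a few lines. The example below is a simplified, pure-Python stand-in for the kind of interval intersection that BEDTools performs; it is not the IPOP implementation, and the peak coordinates and helper names (`overlaps`, `split_peaks`) are hypothetical. Intervals are treated as half-open (start inclusive, end exclusive), as in BED files.

```python
def overlaps(a, b):
    """True if two (chrom, start, end) half-open intervals share at least 1 bp."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def split_peaks(sample_a, sample_b):
    """Partition the peaks of sample_a into overlapped and unique sets.

    A peak is 'overlapped' if it intersects any peak of sample_b,
    and 'unique' otherwise.
    """
    shared, unique = [], []
    for peak in sample_a:
        if any(overlaps(peak, other) for other in sample_b):
            shared.append(peak)
        else:
            unique.append(peak)
    return shared, unique


# Hypothetical peaks from two samples.
peaks_a = [("Chr1", 100, 200), ("Chr1", 500, 650), ("Chr2", 50, 120)]
peaks_b = [("Chr1", 150, 260), ("Chr2", 120, 180)]

shared, unique = split_peaks(peaks_a, peaks_b)
print(len(shared), "overlapped;", len(unique), "unique to sample A")
```

The all-against-all comparison used here is quadratic in the number of peaks; for genome-scale peak sets, a sorted sweep or an interval tree would be the more practical choice.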

The results of running the tools of IPOP. A) Result of the Basic Local Alignment tool. Hover over the table header to see the definition of each column. Q-Seq-ID, query sequence ID; S-Seq-ID, subject sequence ID; P-Ident, percentage of identical matches; length, alignment length; Q-Start, start of alignment in query; Q-End, end of alignment in query; S-Start, start of alignment in subject; S-End, end of alignment in subject; E-value, expected value. B) Result of the Phylogenetic Tree Construction tool. Physcomitrella patens gene Pp3c2_11860 was used as an outgroup. C–E) Results of the Evolutionary Information Annotation tool. C) Statistics of the gene number for each orthogroup in the input data set. D) Statistics of gene evolutionary origin. E) Statistics of related duplication events. F) and G) Results of the Omics Comparison Analysis tool. F) Statistics on the overlapped and unique peaks. G) Volcano plots for differentially expressed genes. H) Result of the Multi-omics Integration Analysis tool. I) Result of the Transcriptome-wide Association Analysis tool. Hover over the table header to see the definition of each column. ID, gene identifier; CHR, chromosome; Best Gwas ID, rsID of the most significant GWAS SNP in locus; P0, gene start; P1, gene end; HSQ, heritability of the gene; MODEL, best performing model; N-SNP, number of SNPs in the locus; TWAS_P, TWAS P-value; TWAS_Z, TWAS Z-score.
Fig. 3.


Search, Browser, Download, User Manual, and Feedback Modules

IPOP offers various modules aimed at enhancing user efficiency, which can be accessed through buttons located in the top sidebar. The Search module facilitates comprehensive data exploration within the database by allowing users to search using gene ID, sample ID, and keyword inputs. The result page showcases sample statistics associated with the search keywords, and users can access detailed sample information by simply clicking on the provided links (supplementary fig. S11A, Supplementary Material online). The Browser module serves as a repository for sample coverage and genome information pertaining to 13 distinct species, offering visualization capabilities (supplementary fig. S11B, Supplementary Material online). Additionally, the Download module grants users access to all omics result files (supplementary fig. S11C, Supplementary Material online). The user manual, accessible on the home page, provides instructions on the efficient utilization of IPOP. Additionally, the Contact page incorporates a feedback module, which facilitates the submission of feedback and enables users to establish communication with our team at their convenience (supplementary fig. S11D, Supplementary Material online).

Application of IPOP in Evolutionary Genomics Study

The m6A modification is the most prevalent RNA modification observed in cellular systems (Jiang et al. 2021). Proteins containing the highly conserved YTH domain have been identified as readers of m6A modifications in various species, and this gene family plays a critical role in deciphering the m6A modifications present in RNA molecules (Zhang et al. 2010). To illustrate the functionality and utility of IPOP, we conducted an evolutionary analysis of the YTH domain gene family. Initially, we used the BLAST tool to align 13 known YTH domain genes from Arabidopsis in order to identify homologous genes in the 12 other species. This search yielded a total of 166 YTH domain genes (supplementary table S3, Supplementary Material online), which were classified into 2 subfamilies (YTH-DC and YTH-DF), consistent with previous reports (Scutenaire et al. 2018). We constructed a phylogenetic tree for the 166 candidates using the Phylogenetic Tree Construction tool and annotated the evolutionary information using the Evolutionary Information Annotation tool. Figure 4A shows that all members were assigned to 7 orthogroups and further divided into 3 phylogenetic clades: Clade I (OG0014009 and OG0007584), Clade II (OG0011256 and OG0001792), and Clade III (OG0006668, OG0006753, and OG0001785). We further analyzed the evolutionary origin of these orthogroups (Fig. 4B) and identified 3 phylostratigraphic stages (Embryophyta, Angiosperms, and Monocots). Notably, the orthogroups derived from Embryophyta and Angiosperms exhibited evolutionary variability, with certain members losing the YTH domain over the course of evolution. Conversely, the OG0011256 group originating from Monocots displayed a greater degree of evolutionary conservation. Additionally, duplication events covered almost all members of the YTH domain gene family (Fig. 4A), primarily driven by whole-genome duplication (WGD), followed by transposed duplication (Fig. 4C), thereby facilitating the expansion and evolution of the YTH domain gene family. For all WGD pairs, the 2 members of each pair were distributed in the same clade and belonged to the same orthogroup, demonstrating evolutionary conservation in phylogeny (Fig. 4D).
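
The homolog search step can be mimicked offline by filtering a tabular BLAST result. The following is a minimal sketch that assumes a tab-separated file with the nine columns listed for the BLAST result table (Q-Seq-ID through E-value, in that order); the subject gene names, the alignment values, and the E-value cutoff of 1e-5 are hypothetical and are not taken from the IPOP analysis.

```python
import csv
import io

COLUMNS = ["q_seq_id", "s_seq_id", "p_ident", "length",
           "q_start", "q_end", "s_start", "s_end", "e_value"]

# Hypothetical tab-separated hits; subject IDs and values are placeholders.
EXAMPLE = (
    "AT3G13460\tsubject_gene_1\t78.5\t160\t1\t160\t5\t164\t1e-60\n"
    "AT3G13460\tsubject_gene_2\t35.0\t40\t10\t49\t3\t42\t0.5\n"
    "AT1G55500\tsubject_gene_1\t70.2\t150\t4\t153\t8\t157\t3e-48\n"
)


def candidate_homologs(handle, max_evalue=1e-5):
    """Map each subject gene to the query genes whose hits pass the cutoff."""
    homologs = {}
    reader = csv.DictReader(handle, fieldnames=COLUMNS, delimiter="\t")
    for row in reader:
        if float(row["e_value"]) <= max_evalue:
            homologs.setdefault(row["s_seq_id"], []).append(row["q_seq_id"])
    return homologs


hits = candidate_homologs(io.StringIO(EXAMPLE))
print(hits)
```

Grouping by subject gene makes it easy to see which candidates are supported by more than one query, which can help when curating a gene family list before tree construction.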

Evolutionary analysis of YTH domain gene family. A) Phylogenetic tree of YTH domain gene family, using Chondrus crispus gene XM_005713289 as outgroup. B) Statistics of gene duplication events in YTH domain gene family. C) Statistics of YTH domain gene number and phylostratigraphic stage for each orthogroup. D) Gene duplication relationships among orthogroups of YTH domain genes.
Fig. 4.


Following the construction of phylogenetic trees for each orthogroup of YTH domain genes, we compared the evolutionary patterns among them (supplementary fig. S12, Supplementary Material online). Within Clade I, the OG0007584 group covered both monocotyledonous and dicotyledonous genes; however, the monocotyledonous genes were absent from the OG0014009 group. In Clade II, the groups OG0011256 and OG0001792 exhibited distinct gene retention patterns in dicotyledonous species: OG0001792 covered all dicotyledonous members, whereas dicotyledonous members were absent from the OG0011256 group. Conversely, in monocotyledonous species, the OG0011256 and OG0001792 groups displayed similar patterns, as evidenced by the consistent retention of YTH domain genes in Aegilops tauschii, Sorghum bicolor, Triticum dicoccoides, and Triticum aestivum. In Clade III, the evolutionary patterns of the OG0006668 and OG0006753 groups were similar across monocotyledonous and dicotyledonous species. This similarity was observed in Arabidopsis thaliana, Solanum lycopersicum, Glycine max, Ae. tauschii, Oryza sativa, Triticum dicoccoides, and Triticum aestivum, where the number of members in these groups remained consistent. In contrast, the OG0001785 group displayed a distinct gene retention pattern compared with the OG0006668 and OG0006753 groups in most species. However, it is noteworthy that these 3 orthogroups shared the absence of Malvaceae genes, which set them apart from the other orthogroups. In conclusion, despite belonging to the same phylogenetic clade, orthogroups exhibit significant divergence in evolutionary patterns across species, potentially leading to functional differentiation of YTH domain genes.
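
Gene retention patterns of this kind can be summarized as a species-by-orthogroup count table. The sketch below shows one way to build such a table; the gene identifiers, the orthogroup names (OG_X, OG_Y), and the memberships are hypothetical placeholders rather than IPOP data.

```python
from collections import Counter

# Hypothetical (gene, species, orthogroup) assignments.
ASSIGNMENTS = [
    ("gene1", "Arabidopsis thaliana", "OG_X"),
    ("gene2", "Arabidopsis thaliana", "OG_X"),
    ("gene3", "Oryza sativa", "OG_X"),
    ("gene4", "Oryza sativa", "OG_Y"),
    ("gene5", "Zea mays", "OG_Y"),
]


def retention_table(assignments):
    """Count genes per species and orthogroup, filling absent cells with 0."""
    counts = Counter((species, og) for _, species, og in assignments)
    species_list = sorted({species for species, _ in counts})
    og_list = sorted({og for _, og in counts})
    return {
        species: {og: counts[(species, og)] for og in og_list}
        for species in species_list
    }


table = retention_table(ASSIGNMENTS)
for species, row in table.items():
    print(species, row)

# Species in which an orthogroup has no retained member.
absent_from_og_y = [species for species, row in table.items() if row["OG_Y"] == 0]
print(absent_from_og_y)
```

Rows with a zero for one orthogroup but not another correspond to the lineage-specific absences described above.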

Application of IPOP in Functional Genomics Study

To explore the functional disparities among these orthologous groups, we employed the Search module to retrieve multi-omics features for the YTH domain gene family and conducted statistical analysis on the shared species within each clade.

In Clade I, the orthogroups OG0014009 and OG0007584 exhibited comparable expression and modification patterns in dicots, but displayed divergent expression levels in A. thaliana and Phaseolus vulgaris, divergent m6A abundance in G. max, and divergent H3K4me3 modification levels in S. lycopersicum (supplementary table S4, Supplementary Material online). Both orthogroups were annotated with RNA splicing-related gene ontology (GO) terms (supplementary fig. S13A, Supplementary Material online). In Clade II, the expression and modification patterns of OG0011256 and OG0001792 differed significantly (supplementary table S4, Supplementary Material online). Specifically, OG0001792 exhibited higher expression and m6A modification levels compared with OG0011256 in monocot species. The genes implicated in the 3D chromatin architecture were detected within OG0001792, while they were absent in OG0011256. Within OG0001792, the AT3G03950 gene was observed to reside within a topologically associating domain, exhibiting significant interactions with other regions of the TAD (Fig. 5A). Furthermore, the AT3G13460 and AT1G55500 genes were identified within chromatin loops (supplementary fig. S14A and B, Supplementary Material online). Notably, the AT3G13460 gene was found to harbor a variant associated with protist disease resistance (supplementary fig. S15, Supplementary Material online). Additionally, our findings indicate that the functional annotations of OG0001792 and OG0011256 primarily pertain to cell metabolism processes. Moreover, these 2 orthogroups also exhibit stress response functions (Fig. 5B), suggesting their potential significance in plant resistance to external stress and the maintenance of normal plant life activities. In Clade III, the expression and modification patterns of OG0001785, OG0006668, and OG0006753 were significantly varied in over half of the species (supplementary table S4, Supplementary Material online). Notably, among these 3 orthogroups, only the AT3G13060 gene of OG0001785 was found to be involved in a chromatin-loop architecture (supplementary fig. S14C, Supplementary Material online). The Arabidopsis gene variations observed in OG0006753 and OG0001785 were found to be associated with the plant's adaptation to specific growth conditions. Additionally, the maize gene Zm00001d002372 in OG0001785 was found to have an impact on various traits such as kernel row number per ear, leaf length, and days to flowering (Fig. 5C). The maize genes belonging to Clade III exhibited high expression levels specifically in tissues undergoing active growth and development, such as spikelet, root, and tassel (Fig. 5D). This observation suggested a potential relationship between the YTH domain genes in Clade III and plant growth and development. Furthermore, all 3 orthogroups were annotated with functions related to mRNA destabilization (supplementary fig. S13B, Supplementary Material online).

Functional analysis of the YTH domain gene family. A) Topologically associating domain related to the AT3G03950 gene of OG0001792. B) GO term statistics of OG0011256 and OG0001792. C) Significant variant–trait associations. Points marked with red circles indicate variant–trait associations related to the Zm00001d002372 gene in OG0001785. Gray lines indicate the location of the Zm00001d002372 gene. D) Tissue-specific expression profiles of maize YTH domain genes.
Fig. 5.


Overall, orthogroups within the same clade have diversified in both evolutionary patterns and functions. Even among species with similar evolutionary patterns, there are discernible variations in expression and modification patterns. Although orthogroups within the same clade of the YTH domain gene family share functions, notable distinctions have arisen, thereby enriching the roles of the YTH domain gene family in plants.

Conclusions and Prospects

In this study, we present IPOP, an integrated plant multi-omics platform that facilitates evolutionary and functional investigations. IPOP encompasses 7 distinct types of omics data and offers comprehensive evolutionary characteristics of 13 species, which users can access through diverse visual representations. It also incorporates numerous online tools for functional and comparative genomic analysis. Although the omics data do not yet cover every species completely, IPOP encompasses a broader spectrum of omics data than existing multi-omics databases, facilitating a more extensive comprehension of evolutionary processes, bridging gaps within the angiosperms, and offering substantial support for scientific research (Table 2). In future work, we aim to overcome these limitations by increasing the number of species in the database, thereby enabling more comprehensive comparative genomics research and a deeper understanding of evolutionary phenomena. To make IPOP more comprehensive, we also intend to broaden the range of omics data by incorporating translation, protein, metabolic, and other relevant omics data. Furthermore, we will update the platform's contents regularly and promptly add the most recent multi-omics data. Consequently, IPOP will serve as a useful platform for scientists engaged in evolutionary and functional genomics research.

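Table 2

Data comparison of IPOP and other databases

Category | Data type | IPOP (this study) | GERDH (Cheng et al. 2023) | ChIP-Hub (Fu et al. 2022) | MaGenDB (Wang et al. 2020) | PODC (Ohyanagi et al. 2015)
Species | | 1 moss, 6 Dicots, 6 Monocots | 40 Horticultural plants | 43 plants | 13 Malvaceae | 1 Magnoliatae, 5 Dicots, 2 Monocots
Raw data | Transcriptome | 1,726 | 12,856 | / | 280 | 745
Raw data | Histone modification | 364 | 21 | 4,403 | 40 | /
Raw data | Chromatin accessibility | 537 | / | 1,263 | 10 | /
Raw data | Chromatin interaction | 31 | / | / | / | /
Raw data | DNA 5mC modification | 430 | 63 | / | 4 | /
Raw data | RNA m6A modification | 189 | 18 | / | / | /
Raw data | Population genomics | 931 | / | / | / | /
Raw data | Transcription-factor families | / | / | 2,994 | / | /
Raw data | Cap analysis of gene expression | / | / | / | 16 | /
Evolutionary information | Duplicate event | Yes | / | / | / | /
Evolutionary information | Phylostratigraphic stage | Yes | / | / | / | /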
Orthologous groups//

Materials and Methods

Data Collection and Preprocessing

The data sources and download links for the reference genomes and gene annotations of the 13 plant species (A. thaliana, Gossypium arboreum, Gossypium hirsutum, P. vulgaris, G. max, S. lycopersicum, Sor. bicolor, Zea mays, Ae. tauschii, Triticum dicoccoides, Triticum aestivum, O. sativa, and Physcomitrella patens) are listed in supplementary table S5, Supplementary Material online. Raw omics data sets were obtained from the NCBI Sequence Read Archive (http://www.ncbi.nlm.nih.gov/sra) and the National Genomics Data Center (NGDC) (CNCB-NGDC Members and Partners 2023). Sequencing adapters and low-quality reads were removed with fastp (Version 0.20.1) (Chen et al. 2018) (fastp -5 -3 -r -n 0 -e 20 --trim_poly_x --poly_x_min_len 15). Quality control reports were consolidated using MultiQC (Version 1.10.1) (Ewels et al. 2016).
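The trimming step lends itself to scripting. The following minimal sketch only assembles the fastp command line quoted above for one paired-end sample; the helper name build_fastp_command and the file names are hypothetical, and -i/-I/-o/-O are fastp's standard input and output options rather than settings reported in this study.

```python
from typing import List

# Trimming options exactly as quoted in the text.
FASTP_OPTIONS = ["-5", "-3", "-r", "-n", "0", "-e", "20",
                 "--trim_poly_x", "--poly_x_min_len", "15"]


def build_fastp_command(r1_in: str, r2_in: str, r1_out: str, r2_out: str) -> List[str]:
    """Return the argument list for one paired-end fastp run (hypothetical helper)."""
    io_args = ["-i", r1_in, "-I", r2_in, "-o", r1_out, "-O", r2_out]
    return ["fastp"] + io_args + FASTP_OPTIONS


if __name__ == "__main__":
    # Placeholder file names; nothing is executed here, the command is only printed.
    cmd = build_fastp_command("s_R1.fq.gz", "s_R2.fq.gz",
                              "s_R1.clean.fq.gz", "s_R2.clean.fq.gz")
    print(" ".join(cmd))
```

Passing the returned list to subprocess.run(cmd, check=True) would execute the command on a system where fastp is installed.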

Histone Modification Data Analysis

Sequencing reads were aligned to the reference genomes using BWA-MEM (Version 0.7.17) (Li 2013) (bwa mem -M -I 200,200,5000). SAMtools (Version 1.7) (Danecek et al. 2021) was used to obtain high-quality reads (samtools view -b -h -F 4 -F 8 -F 256 -F 1024 -F 2048 -q 30) and to sort the resulting BAM files (samtools sort). Duplicated reads were removed using the Picard Toolkit (Version 2.26.9) ("Picard Toolkit" 2019, Broad Institute; https://broadinstitute.github.io/picard/) (picard MarkDuplicates). Peak calling was conducted using MACS2 (Version 2.1.2) (Zhang et al. 2008) (--nomodel --shift 73 -B --qvalue 0.01). Read coverage was calculated using DeepTools (Version 3.5.1) (Ramírez et al. 2016) (bamCoverage --binSize 25 --normalizeUsing RPGC). The identified peaks were annotated using the ChIPseeker R package (Version 1.30.3) (Yu et al. 2015). Lastly, peak abundance was calculated using the DiffBind R package (Version 3.4.11) (Ross-Innes et al. 2012), employing the formula (mapped fragments_IP × total mapped fragments_Input)/(mapped fragments_Input × total mapped fragments_IP), where the subscripts IP and Input denote the immunoprecipitated and input samples, respectively.
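As a worked illustration of the peak abundance ratio given above, the sketch below evaluates it for a single peak. It is not DiffBind's implementation; the function name, argument names, and the read counts in the example are hypothetical.

```python
def peak_abundance(ip_in_peak: float, input_in_peak: float,
                   ip_total: float, input_total: float) -> float:
    """Library-size-normalized IP/Input ratio for one peak.

    Computes (IP fragments in peak x total Input fragments) /
             (Input fragments in peak x total IP fragments).
    """
    if input_in_peak <= 0 or ip_total <= 0:
        raise ValueError("input_in_peak and ip_total must be positive")
    return (ip_in_peak * input_total) / (input_in_peak * ip_total)


if __name__ == "__main__":
    # Hypothetical counts: 200 IP and 50 Input fragments in the peak,
    # 10 million mapped IP fragments and 20 million mapped Input fragments.
    print(peak_abundance(200, 50, 10_000_000, 20_000_000))  # 8.0
```

Dividing by the library totals keeps the ratio comparable between samples sequenced to different depths.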

Chromatin Accessibility Data Analysis

The read alignment processing strategy for chromatin accessibility data sets aligns with that of histone modification data sets, with slight variations in the parameter settings of MACS2 (Version 2.1.2) software for peak calling (macs2 callpeak --nomodel --shift -100 --extsize 200 --keep-dup all -B --qvalue 0.01). The Picard Toolkit was employed to calculate insert sizes (Version 2.26.9) (picard CollectInsertSizeMetrics), while DeepTools (Version 3.5.1) was utilized to compute scores for genome regions (computeMatrix -b 2000 -a 2000 --skipZeros) and visualize the distribution characteristics of read segments (plotProfile --perGroup).

Chromatin Interaction Data Analysis

The genome size was determined using SAMtools (Version 1.7). HiC-Pro (Version 3.1.0) (Servant et al. 2015) was employed to generate enzyme fragment files (digest_genome.py) and interaction matrices with default parameters. Juicer (Version 1.6) (Durand et al. 2016) was utilized to identify TAD regions (juicer_tools.jar arrowhead -m 500 -r 5000,10000 -k KR) and loop regions (juicer_tools.jar hiccups -m 500 -r 5000,10000 -f 0.1,0.1 -p 4,2 -i 7,5 -d 20000,20000).

DNA 5mC Modification Data Analysis

Bismark (Version 0.23.1) (Krueger and Andrews 2011) was employed to align clean reads to the reference genomes, utilizing default parameters. CGmapTools software (Version 0.1.2) (Guo et al. 2018) was utilized to convert BAM files into CGmap files. The calculation of read coverage was performed using DeepTools (Version 3.5.1), utilizing default parameters. MethGo software (Liao et al. 2015) was utilized to determine the global and gene-centric cytosine methylation levels, as well as the coverage distribution of each cytosine.

Transcriptome Data Analysis

The clean reads were aligned to the reference genomes using the HISAT2 software (Version 0.7.17) (hisat2 -k 5 --max-intronlen) (Kim et al. 2015). SAMtools software (Version 1.7) was employed to extract high-quality reads (samtools view -F 1796 -q 30) and organize the resulting BAM files. For quantitative analysis of transcriptome expression, the StringTie software (Version 2.1.7) (Pertea et al. 2016) and featureCounts (Version 2.0.1) (Liao et al. 2014) (featureCounts -p -t exon -g gene_id) were utilized.
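The value passed to samtools view -F is a bitmask of SAM flags, so -F 1796 excludes several read classes at once. The sketch below decodes such a mask and applies the same two criteria (flag mask and MAPQ threshold) to a single alignment; the helper names are hypothetical, and the bit meanings follow the SAM format specification.

```python
# SAM flag bits as defined in the SAM format specification.
SAM_FLAG_BITS = {
    0x1: "read paired",
    0x2: "read mapped in proper pair",
    0x4: "read unmapped",
    0x8: "mate unmapped",
    0x10: "read reverse strand",
    0x20: "mate reverse strand",
    0x40: "first in pair",
    0x80: "second in pair",
    0x100: "secondary alignment",
    0x200: "fails quality checks",
    0x400: "PCR or optical duplicate",
    0x800: "supplementary alignment",
}


def decode_flag_mask(mask: int) -> list:
    """List the flag names whose bits are set in the mask, lowest bit first."""
    return [name for bit, name in sorted(SAM_FLAG_BITS.items()) if mask & bit]


def passes_filter(flag: int, mapq: int, exclude_mask: int = 1796, min_mapq: int = 30) -> bool:
    """Mimic 'samtools view -F exclude_mask -q min_mapq' for one alignment."""
    return (flag & exclude_mask) == 0 and mapq >= min_mapq


if __name__ == "__main__":
    # 1796 = 4 + 256 + 512 + 1024
    print(decode_flag_mask(1796))
```

Read this way, -F 1796 drops unmapped reads, secondary alignments, reads failing quality checks, and duplicates, while -q 30 keeps alignments with a mapping quality of at least 30.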

RNA m6A Modification Data Analysis

The clean reads were aligned using the HISAT2 software (Version 0.7.17) (hisat2 -k 5 --max-intronlen). The resulting BAM files were sorted using SAMtools (Version 1.7), and the high-quality reads were extracted with samtools view (samtools view -F 1796 -q 30). To identify m6A peaks, the sliding-window model in the R package PEA (Zhai et al. 2018) was applied with default options. Additionally, StringTie (Version 2.1.7) was employed with default parameters to calculate gene expression levels in input samples as TPM (transcripts per million). The IRanges R package (Lawrence et al. 2013) was used to merge peaks (only peaks supported by more than 2 replicates were kept as high-confidence m6A peaks). Finally, the DiffBind package (Version 3.4.11) was employed to calculate the peak abundance using the formula (mapped fragments_IP × total mapped fragments_Input)/(mapped fragments_Input × total mapped fragments_IP), where the subscripts IP and Input denote the immunoprecipitated and input samples, respectively.
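The replicate-support criterion for peaks can be illustrated with a small sweep over interval boundaries. The sketch below is one possible reading of that step, not the IRanges-based procedure used for IPOP: it reports the regions covered by peaks from at least min_replicates replicates. The function name, the half-open coordinate convention, and the assumption that peaks within one replicate do not overlap are all choices made for this illustration.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end)


def supported_regions(replicate_peaks: List[List[Interval]],
                      min_replicates: int) -> List[Interval]:
    """Regions covered by peaks from at least min_replicates replicates.

    Assumes the peaks within each replicate do not overlap one another.
    """
    if min_replicates < 1:
        raise ValueError("min_replicates must be at least 1")
    events = []
    for peaks in replicate_peaks:
        for start, end in peaks:
            events.append((start, 1))   # a peak opens
            events.append((end, -1))    # a peak closes
    # At equal coordinates, closings (-1) sort before openings (+1).
    events.sort()
    regions = []
    depth = 0
    region_start = None
    for pos, delta in events:
        previous = depth
        depth += delta
        if previous < min_replicates <= depth:
            region_start = pos
        elif depth < min_replicates <= previous:
            if pos > region_start:
                regions.append((region_start, pos))
            region_start = None
    return regions


if __name__ == "__main__":
    # Hypothetical peaks from three replicates on one chromosome.
    reps = [[(100, 200)], [(150, 250)], [(180, 300)]]
    print(supported_regions(reps, 3))  # [(180, 200)]
```

Setting min_replicates to 3 corresponds to "more than 2 replicates"; a looser threshold simply widens the reported regions.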

GWAS Analysis

The population genomic data sets were obtained from the AraGWAS database (https://aragwas.1001genomes.org/) (Togninalli et al. 2018) and GWAS Atlas database (https://ngdc.cncb.ac.cn/gwas/) (Tian et al. 2020). GWAS analysis was conducted using GEMMA (Version 0.98.3) (Zhou and Stephens 2014).

TWAS Data Analysis

The data sets comprised 3 maize populations (Hirsch et al. 2014; Leiboff et al. 2015; Gui et al. 2020) and 1 Arabidopsis population (1001 Genomes Consortium 2016). For the genotype data, the [aa, Aa, AA] genotypes were converted to [−1,0,1], and VCFtools (Version 0.1.16) (Danecek et al. 2011) was used to filter out SNPs with a minor allele frequency below 5% (vcftools --max-missing 0.95 --maf 0.05 --recode --recode-INFO-all). Quality control for gene expression data was conducted using FastQC (Version 0.11.5) and Trimmomatic (Version 0.39) (Bolger et al. 2014) (trimmomatic-0.36.jar LEADING:20 TRAILING:20 SLIDINGWINDOW:4:20 MINLEN:20). The clean reads were aligned to the reference genomes using STAR (Version 2.7.3) (Dobin et al. 2013) (STAR --outSAMtype BAM SortedByCoordinate --outFilterMultimapNmax 1 --outSAMstrandField intronMotif --twopassMode Basic). Uniquely mapped reads were extracted using SAMtools (Version 1.7). Transcript expression was quantified using StringTie (Version 2.1.7) with default parameters, and transcripts with a TPM > 1 in at least 5 samples were filtered out. Predictive expression models were constructed and TWAS analysis was conducted using Fusion software (Gusev et al. 2016).
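The genotype recoding and the minor allele frequency threshold described above can be sketched in a few lines. This is an illustration only, not the VCFtools implementation: the function names are hypothetical, samples are assumed to be diploid, and missing calls are not handled.

```python
# Numeric coding described in the text: aa -> -1, Aa -> 0, AA -> 1.
GENOTYPE_CODES = {"aa": -1, "Aa": 0, "AA": 1}


def encode_genotypes(calls):
    """Convert genotype strings to the [-1, 0, 1] coding."""
    return [GENOTYPE_CODES[call] for call in calls]


def minor_allele_frequency(codes):
    """Minor allele frequency of one biallelic SNP from coded diploid genotypes."""
    if not codes:
        raise ValueError("at least one genotype is required")
    # A coded genotype g carries g + 1 copies of allele A (0, 1, or 2).
    a_copies = sum(code + 1 for code in codes)
    freq_a = a_copies / (2 * len(codes))
    return min(freq_a, 1.0 - freq_a)


def passes_maf(codes, threshold=0.05):
    """Keep the SNP when its minor allele frequency is at least the threshold."""
    return minor_allele_frequency(codes) >= threshold


if __name__ == "__main__":
    codes = encode_genotypes(["AA", "AA", "Aa", "aa"])
    print(codes)                          # [1, 1, 0, -1]
    print(minor_allele_frequency(codes))  # 0.375
```

In the example, allele A occurs 5 times among 8 allele copies, so the minor allele frequency is 3/8 and the SNP would be kept at the 5% threshold.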

Evolutionary Genomic Annotations

Orthologous gene groups were constructed using OrthoFinder (v.2.4.0) (Emms and Kelly 2019) with default parameters, based on the protein-coding genes of the 13 species. When a gene had multiple isoforms, the longest protein sequence was chosen to represent it. The similarity relationships between protein sequences were determined using DIAMOND (v.0.9.24) (Buchfink et al. 2015), and genes were clustered into orthogroups using the MCL graph clustering algorithm (Van Dongen 2008). The summary of orthologous gene annotations for each species is listed in supplementary table S2, Supplementary Material online.
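Selecting one representative protein per gene before orthology inference can be sketched as follows. This is a minimal illustration of the longest-isoform rule, with hypothetical gene and isoform identifiers; the tie-breaking rule (keep the first isoform seen) is an assumption, as the text does not specify one.

```python
from typing import Dict, Iterable, Tuple


def longest_isoform_per_gene(
    proteins: Iterable[Tuple[str, str, str]]
) -> Dict[str, Tuple[str, str]]:
    """Map each gene ID to the (isoform ID, sequence) of its longest protein.

    proteins yields (gene_id, isoform_id, sequence) tuples. On ties, the
    isoform encountered first is kept.
    """
    best: Dict[str, Tuple[str, str]] = {}
    for gene_id, isoform_id, sequence in proteins:
        current = best.get(gene_id)
        if current is None or len(sequence) > len(current[1]):
            best[gene_id] = (isoform_id, sequence)
    return best


if __name__ == "__main__":
    # Hypothetical records for two genes.
    records = [
        ("geneA", "geneA.1", "MKT"),
        ("geneA", "geneA.2", "MKTLLV"),
        ("geneB", "geneB.1", "MA"),
    ]
    for gene, (isoform, seq) in longest_isoform_per_gene(records).items():
        print(gene, isoform, len(seq))
```

The resulting one-protein-per-gene set is the kind of input that would then be written to FASTA files for the orthology step.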

The construction of the phylostratigraphic tree of orthologous genes was carried out using TimeTree (Kumar et al. 2017), following the methodologies outlined in previous research (Domazet-Loso et al. 2007; Guo 2013; Lei et al. 2017). As shown in supplementary table S6, Supplementary Material online, the orthologous genes were assigned to different phylostratigraphic stages representing different evolutionary ages.

Furthermore, the classification of orthologous genes into 4 duplication types (WGD, tandem duplication, proximal duplication, and transposed duplication) was performed using DupGen_finder (Qiao et al. 2019) utilizing the default parameters.

Platform Implementation

The data in IPOP are stored and managed using the MySQL (https://www.mysql.com/) database management system. Data and results are delivered through a front-end web interface written in Java (https://www.java.com/) and JavaScript (https://www.javascript.com/) with the Vue.js (https://vuejs.org/) and jQuery (https://jquery.com/) frameworks, and through back-end programs implemented with SpringBoot (https://spring.io/projects/spring-boot/), SpringMVC (https://spring.io/projects/spring-webflow), and MyBatis (https://blog.mybatis.org/). IPOP is deployed on a Linux-based (https://www.linux.org/) Nginx web server hosted on the Alibaba Cloud (https://hk.alibabacloud.com/) computing platform.

Supplementary Material

Supplementary material is available at Molecular Biology and Evolution online.

Acknowledgments

We thank High-Performance Computing (HPC) of Northwest A&F University for providing computing resources.

Author Contributions

Z.M. and C.M. conceived the project; W.H. and X.H. collected and processed the omics data; W.H., X.H., Y.R., and M.S. designed and built the IPOP website; Z.M. and W.H. wrote the manuscript. All authors read, critically revised, and approved the final manuscript.

Funding

This work was supported by the National Natural Science Foundation of China (32000410, 32370723, and 32170681).

Data Availability

All data analyzed during this study are included in the IPOP database (http://omicstudio.cloud:4012/ipod/).

References

1001 Genomes Consortium. 1,135 genomes reveal the global pattern of polymorphism in Arabidopsis thaliana. Cell. 2016:166(2):481–491. https://doi.org/10.1016/j.cell.2016.05.063.

Bolger AM, Lohse M, Usadel B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics. 2014:30(15):2114–2120. https://doi.org/10.1093/bioinformatics/btu170.

Buchfink B, Xie C, Huson DH. Fast and sensitive protein alignment using DIAMOND. Nat Methods. 2015:12(1):59–60. https://doi.org/10.1038/nmeth.3176.

Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, Madden TL. BLAST+: architecture and applications. BMC Bioinformatics. 2009:10(1):421. https://doi.org/10.1186/1471-2105-10-421.

Chen S, Zhou Y, Chen Y, Gu J. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics. 2018:34(17):i884–i890. https://doi.org/10.1093/bioinformatics/bty560.

Cheng H, Zhang H, Song J, Jiang J, Chen S, Chen F, Wang L. GERDH: an interactive multi-omics database for cross-species data mining in horticultural crops. Plant J. 2023:116(4):1018–1029. https://doi.org/10.1111/tpj.16350.

CNCB-NGDC Members and Partners. Database resources of the National Genomics Data Center, China National Center for Bioinformation in 2023. Nucleic Acids Res. 2023:51(D1):D18–D28. https://doi.org/10.1093/nar/gkac1073.

Danecek P, Auton A, Abecasis G, Albers CA, Banks E, DePristo MA, Handsaker RE, Lunter G, Marth GT, Sherry ST, et al. The variant call format and VCFtools. Bioinformatics. 2011:27(15):2156–2158. https://doi.org/10.1093/bioinformatics/btr330.

Danecek P, Bonfield JK, Liddle J, Marshall J, Ohan V, Pollard MO, Whitwham A, Keane T, McCarthy SA, Davies RM, et al. Twelve years of SAMtools and BCFtools. GigaScience. 2021:10(2):giab008. https://doi.org/10.1093/gigascience/giab008.

Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, Jha S, Batut P, Chaisson M, Gingeras TR. STAR: ultrafast universal RNA-seq aligner. Bioinformatics. 2013:29(1):15–21. https://doi.org/10.1093/bioinformatics/bts635.

Domazet-Loso T, Brajković J, Tautz D. A phylostratigraphy approach to uncover the genomic history of major adaptations in metazoan lineages. Trends Genet. 2007:23(11):533–539. https://doi.org/10.1016/j.tig.2007.08.014.

Durand NC, Shamim MS, Machol I, Rao SS, Huntley MH, Lander ES, Aiden EL. Juicer provides a one-click system for analyzing loop-resolution Hi-C experiments. Cell Syst. 2016:3(1):95–98. https://doi.org/10.1016/j.cels.2016.07.002.

Emms DM, Kelly S. OrthoFinder: phylogenetic orthology inference for comparative genomics. Genome Biol. 2019:20(1):238. https://doi.org/10.1186/s13059-019-1832-y.

Ewels P, Magnusson M, Lundin S, Käller M. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics. 2016:32(19):3047–3048. https://doi.org/10.1093/bioinformatics/btw354.

Fu L-Y, Zhu T, Zhou X, Yu R, He Z, Zhang P, Wu Z, Chen M, Kaufmann K, Chen D. ChIP-Hub provides an integrative platform for exploring plant regulome. Nat Commun. 2022:13(1):3413. https://doi.org/10.1038/s41467-022-30770-1.

Gui S, Yang L, Li J, Luo J, Xu X, Yuan J, Chen L, Li W, Yang X, Wu S, et al. ZEAMAP, a comprehensive database adapted to the maize multi-omics era. iScience. 2020:23(6):101241. https://doi.org/10.1016/j.isci.2020.101241.

Guo W, Zhu P, Pellegrini M, Zhang MQ, Wang X, Ni Z. CGmapTools improves the precision of heterozygous SNV calls and supports allele-specific methylation detection and visualization in bisulfite-sequencing data. Bioinformatics. 2018:34(3):381–387. https://doi.org/10.1093/bioinformatics/btx595.

Guo Y-L. Gene family evolution in green plants with emphasis on the origination and evolution of Arabidopsis thaliana genes. Plant J. 2013:73(6):941–951. https://doi.org/10.1111/tpj.12089.

Gusev A, Ko A, Shi H, Bhatia G, Chung W, Penninx BW, Jansen R, de Geus EJ, Boomsma DI, Wright FA, et al. Integrative approaches for large-scale transcriptome-wide association studies. Nat Genet. 2016:48(3):245–252. https://doi.org/10.1038/ng.3506.

Hirsch CN, Foerster JM, Johnson JM, Sekhon RS, Muttoni G, Vaillancourt B, Peñagaricano F, Lindquist E, Pedraza MA, Barry K, et al. Insights into the maize pan-genome and pan-transcriptome. Plant Cell. 2014:26(1):121–135. https://doi.org/10.1105/tpc.113.119982.

Jiang X, Liu B, Nie Z, Duan L, Xiong Q, Jin Z, Yang C, Chen Y. The role of m6A modification in the biological functions and diseases. Signal Transduct Target Ther. 2021:6(1):74. https://doi.org/10.1038/s41392-020-00450-x.

Kim D, Langmead B, Salzberg SL. HISAT: a fast spliced aligner with low memory requirements. Nat Methods. 2015:12(4):357–360. https://doi.org/10.1038/nmeth.3317.

Krueger F, Andrews SR. Bismark: a flexible aligner and methylation caller for Bisulfite-Seq applications. Bioinformatics. 2011:27(11):1571–1572. https://doi.org/10.1093/bioinformatics/btr167.

Kumar S, Stecher G, Suleski M, Hedges SB. TimeTree: a resource for timelines, timetrees, and divergence times. Mol Biol Evol. 2017:34(7):1812–1819. https://doi.org/10.1093/molbev/msx116.

Lan Y, Sun R, Ouyang J, Ding W, Kim M-J, Wu J, Li Y, Shi T. AtMAD: Arabidopsis thaliana multi-omics association database. Nucleic Acids Res. 2021:49(D1):D1445–D1451. https://doi.org/10.1093/nar/gkaa1042.

Lawrence M, Huber W, Pagès H, Aboyoun P, Carlson M, Gentleman R, Morgan MT, Carey VJ. Software for computing and annotating genomic ranges. PLoS Comput Biol. 2013:9(8):e1003118. https://doi.org/10.1371/journal.pcbi.1003118.

Lei L, Steffen JG, Osborne EJ, Toomajian C. Plant organ evolution revealed by phylotranscriptomics in Arabidopsis thaliana. Sci Rep. 2017:7(1):7567. https://doi.org/10.1038/s41598-017-07866-6.

Leiboff S, Li X, Hu HC, Todt N, Yang J, Li X, Yu X, Muehlbauer GJ, Timmermans MC, Yu J, et al. Genetic control of morphometric diversity in the maize shoot apical meristem. Nat Commun. 2015:6(1):8974. https://doi.org/10.1038/ncomms9974.

Li H. 2013. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv:1303.3997v2, preprint: not peer reviewed.

Liao W-W, Yen M-R, Ju E, Hsu F-M, Lam L, Chen P-Y. MethGo: a comprehensive tool for analyzing whole-genome bisulfite sequencing data. BMC Genomics. 2015:16(Suppl 12):S11. https://doi.org/10.1186/1471-2164-16-S12-S11.

Liao Y, Smyth GK, Shi W. featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics. 2014:30(7):923–930. https://doi.org/10.1093/bioinformatics/btt656.

Liu Y, Zhang Y, Liu X, Shen Y, Tian D, Yang X, Liu S, Ni L, Zhang Z, Song S, et al. SoyOmics: a deeply integrated database on soybean multi-omics. Mol Plant. 2023:16(5):794–797. https://doi.org/10.1016/j.molp.2023.03.011.

Love MI, Huber W, Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 2014:15(12):550. https://doi.org/10.1186/s13059-014-0550-8.

Ma S, Wang M, Wu J, Guo W, Chen Y, Li G, Wang Y, Shi W, Xia G, Fu D, et al. WheatOmics: a platform combining multiple omics data to accelerate functional genomics studies in wheat. Mol Plant. 2021:14(12):1965–1968. https://doi.org/10.1016/j.molp.2021.10.006.

Miao Z, Zhang T, Qi Y, Song J, Han Z, Ma C. Evolution of the RNA N6-methyladenosine methylome mediated by genomic duplication. Plant Physiol. 2020:182(1):345–360. https://doi.org/10.1104/pp.19.00323.

Miao Z, Zhang T, Xie B, Qi Y, Ma C. Evolutionary implications of the RNA N6-methyladenosine methylome in plants. Mol Biol Evol. 2022:39(1):msab299. https://doi.org/10.1093/molbev/msab299.

Ohyanagi H, Takano T, Terashima S, Kobayashi M, Kanno M, Morimoto K, Kanegae H, Sasaki Y, Saito M, Asano S, et al. Plant omics data center: an integrated web repository for interspecies gene expression networks with NLP-based curation. Plant Cell Physiol. 2015:56(1):e9. https://doi.org/10.1093/pcp/pcu188.

Pertea M, Kim D, Pertea GM, Leek JT, Salzberg SL. Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown. Nat Protoc. 2016:11(9):1650–1667. https://doi.org/10.1038/nprot.2016.095.

Price MN, Dehal PS, Arkin AP. FastTree: computing large minimum evolution trees with profiles instead of a distance matrix. Mol Biol Evol. 2009:26(7):1641–1650. https://doi.org/10.1093/molbev/msp077.

Qiao X, Li Q, Yin H, Qi K, Li L, Wang R, Zhang S, Paterson AH. Gene duplication and evolution in recurring polyploidization-diploidization cycles in plants. Genome Biol. 2019:20(1):38. https://doi.org/10.1186/s13059-019-1650-2.

Quinlan AR, Hall IM. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 2010:26(6):841–842. https://doi.org/10.1093/bioinformatics/btq033.

Ramírez F, Ryan DP, Grüning B, Bhardwaj V, Kilpert F, Richter AS, Heyne S, Dündar F, Manke T. deepTools2: a next generation web server for deep-sequencing data analysis. Nucleic Acids Res. 2016:44(W1):W160–W165. https://doi.org/10.1093/nar/gkw257.

Ross-Innes CS, Stark R, Teschendorff AE, Holmes KA, Ali HR, Dunning MJ, Brown GD, Gojis O, Ellis IO, Green AR, et al. Differential oestrogen receptor binding is associated with clinical outcome in breast cancer. Nature. 2012:481(7381):389–393. https://doi.org/10.1038/nature10730.

Scutenaire J, Deragon JM, Jean V, Benhamed M, Raynaud C, Favory JJ, Merret R, Bousquet-Antonelli C. The YTH domain protein ECT2 is an m(6)A reader required for normal trichome branching in Arabidopsis. Plant Cell. 2018:30(5):986–1005. https://doi.org/10.1105/tpc.17.00854.

Servant N, Varoquaux N, Lajoie BR, Viara E, Chen CJ, Vert JP, Heard E, Dekker J, Barillot E. HiC-Pro: an optimized and flexible pipeline for Hi-C data processing. Genome Biol. 2015:16(1):259. https://doi.org/10.1186/s13059-015-0831-x.

Theissinger K, Fernandes C, Formenti G, Bista I, Berg PR, Bleidorn C, Bombarely A, Crottini A, Gallo GR, Godoy JA, et al. How genomics can help biodiversity conservation. Trends Genet. 2023:39(7):545–559. https://doi.org/10.1016/j.tig.2023.01.005.

Tian D, Wang P, Tang B, Teng X, Li C, Liu X, Zou D, Song S, Zhang Z. GWAS atlas: a curated resource of genome-wide variant-trait associations in plants and animals. Nucleic Acids Res. 2020:48(D1):D927–D932. https://doi.org/10.1093/nar/gkz828.

Togninalli M, Seren Ü, Meng D, Fitz J, Nordborg M, Weigel D, Borgwardt K, Korte A, Grimm DG. The AraGWAS catalog: a curated and standardized Arabidopsis thaliana GWAS catalog. Nucleic Acids Res. 2018:46(D1):D1150–D1156. https://doi.org/10.1093/nar/gkx954.

Van Dongen S. Graph clustering via a discrete uncoupling process. SIAM J Matrix Anal A. 2008:30(1):121–141. https://doi.org/10.1137/040608635.

Wang D, Fan W, Guo X, Wu K, Zhou S, Chen Z, Li D, Wang K, Zhu Y, Zhou Y. MaGenDB: a functional genomics hub for Malvaceae plants. Nucleic Acids Res. 2020:48(D1):D1076–D1084. https://doi.org/10.1093/nar/gkz953.

Xie L, Liu M, Zhao L, Cao K, Wang P, Xu W, Sung WK, Li X, Li G. RiceENCODE: a comprehensive epigenomic database as a rice encyclopedia of DNA elements. Mol Plant. 2021:14(10):1604–1606. https://doi.org/10.1016/j.molp.2021.08.018.

Yu G, Wang L-G, He Q-Y. ChIPseeker: an R/Bioconductor package for ChIP peak annotation, comparison and visualization. Bioinformatics. 2015:31(14):2382–2383. https://doi.org/10.1093/bioinformatics/btv145.

Zhai J, Song J, Cheng Q, Tang Y, Ma C. PEA: an integrated R toolkit for plant epitranscriptome analysis. Bioinformatics. 2018:34(21):3747–3749. https://doi.org/10.1093/bioinformatics/bty421.

Zhang Y, Liu T, Meyer CA, Eeckhoute J, Johnson DS, Bernstein BE, Nusbaum C, Myers RM, Brown M, Li W, et al. Model-based analysis of ChIP-Seq (MACS). Genome Biol. 2008:9(9):R137. https://doi.org/10.1186/gb-2008-9-9-r137.

Zhang Z, Theler D, Kaminska KH, Hiller M, de la Grange P, Pudimat R, Rafalska I, Heinrich B, Bujnicki JM, Allain FH-T, et al. The YTH domain is a novel RNA binding domain. J Biol Chem. 2010:285(19):14701–14710. https://doi.org/10.1074/jbc.M110.104711.

Zhou X, Stephens M. Efficient multivariate linear mixed model algorithms for genome-wide association studies. Nat Methods. 2014:11(4):407–409. https://doi.org/10.1038/nmeth.2848.

Zhu T, Liang C, Meng Z, Sun G, Meng Z, Guo S, Zhang R. CottonFGD: an integrated functional genomics database for cotton. BMC Plant Biol. 2017:17(1):101. https://doi.org/10.1186/s12870-017-1039-x.

Author notes

Wenyue Huang and Xiaona Hu contributed equally to this work.

Conflict of interest statement. The authors declare no competing financial interest.

This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact [email protected]
Associate Editor: Aida Ouangraoua

Supplementary data