Abstract

Malaria is a tropical parasitic disease caused by parasites of the genus Plasmodium, which resulted in an estimated 219 million cases and 435 000 malaria-related deaths in 2017. Although the Plasmodium falciparum genome has been available since 2002, 74% of its genes remain uncharacterized. To remedy this paucity of functional information, we used transcriptomic data to build gene co-expression networks for two Plasmodium species (P. falciparum and P. berghei), and included genomic data of four other Plasmodium species, P. yoelii, P. knowlesi, P. vivax and P. cynomolgi, as well as two non-Plasmodium species from the Apicomplexa, Toxoplasma gondii and Theileria parva. The genomic and transcriptomic data were incorporated into the resulting database, malaria.tools, which is preloaded with tools that allow the identification and cross-species comparison of co-expressed gene neighbourhoods, clusters and life stage-specific expression, thus providing a sophisticated means to predict gene function. Moreover, we exemplify how the tools can be used to easily identify genes relevant for pathogenicity and for various life stages of the malaria parasite. The database is freely available at www.malaria.tools.

INTRODUCTION

Malaria is a widespread infectious disease, transmitted by Anopheles mosquitoes, that caused an estimated 219 million cases and 435 000 malaria-related deaths in 2017 (1). It is caused by several Plasmodium species, of which P. falciparum and P. vivax are the two most widespread and deadly for humans (2). Over the years, Plasmodium has developed varying degrees of resistance to all drugs used for treating malaria. Finding new treatments is therefore a race against time, and it requires an understanding of malaria biology, more specifically the characterization of genes and their functions, so that drugs can be developed against genes responsible for pathogenicity.

Although the genomes of various Plasmodium parasites have been available for more than a decade, many of their genes have unknown functions. These genes are often parasite-specific (3), which makes conventional gene function prediction methods based on homology to genes from other organisms less effective. Alternative methods for gene function prediction, such as gene co-expression networks, have therefore been developed. The rationale behind gene co-expression networks is the observation that functionally related genes tend to have similar expression profiles (4). Thus, by identifying clusters of genes with highly similar expression profiles, we obtain groups of functionally related genes in which the function of an uncharacterized gene can be inferred from its neighbours.

We have built the database malaria.tools (www.malaria.tools) on the CoNekT database framework (5), using publicly available RNA sequencing data of two model Plasmodium species. The database contains gene co-expression networks calculated from more than 800 RNA-seq experiments, together with multiple tools to predict gene function from co-expression network neighbours, co-expressed clusters and life stage-specific expression profiles. Additionally, a built-in phylogenetic tree function combined with expression comparison allows Plasmodium researchers to identify orthologs of P. falciparum genes. Malaria.tools provides the malaria research community with a comprehensive resource for efficiently characterizing genes based on their expression profiles, and will aid the in silico identification of potential drug targets in the race for new antimalarial drugs.

MATERIALS AND METHODS

Publicly available RNA sequencing experiments were identified through NCBI SRA (6) and cover, among others, developmental stages, gene knockouts and chemical treatments. The corresponding publications of the experiments, where available, are listed in Supplementary Tables S3 and S4. Using the SRA run accessions, RNA sequencing experiments for P. falciparum and P. berghei were downloaded as fastq files from the European Nucleotide Archive (ENA) (7) with Aspera v3.8.1.160447. For paired-end experiments, only the file containing the first read (designated ‘_1’) was downloaded. To remove possible RNA contaminants from the parasites’ hosts, the reads were first mapped against human (for P. falciparum) or mouse (for P. berghei) sequences. The unmapped reads were then mapped against the mosquito vector. The mosquito vector of human malaria (Anopheles gambiae) was used for both Plasmodium species, because the CDS of the mosquito vector of rodent malaria were unavailable. The remaining unmapped reads were mapped against the respective Plasmodium species, and experiments with at least 1 million reads and at least 50% of reads mapped to the Plasmodium species were used to construct the database.
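The inclusion criterion at the end of this workflow can be written as a small filter. This is a minimal Python sketch, assuming that "reads" refers to the total number of reads in the downloaded fastq file and that the 50% threshold is computed against that total; the function name `passes_qc` and the run IDs are hypothetical.

```python
def passes_qc(total_reads: int, plasmodium_reads: int,
              min_reads: int = 1_000_000, min_fraction: float = 0.5) -> bool:
    """Return True if an RNA-seq experiment would be kept for the database.

    Assumption: total_reads is the number of reads in the downloaded fastq file,
    and the mapped fraction is plasmodium_reads / total_reads.
    """
    if total_reads == 0 or total_reads < min_reads:
        return False
    return plasmodium_reads / total_reads >= min_fraction


# Hypothetical per-run counts: (total reads, reads mapped to Plasmodium)
runs = {
    "run_A": (5_000_000, 4_000_000),  # kept: 80% mapped
    "run_B": (800_000, 700_000),      # dropped: fewer than 1 million reads
    "run_C": (3_000_000, 900_000),    # dropped: only 30% mapped
}
kept = [run for run, (total, mapped) in runs.items() if passes_qc(total, mapped)]
print(kept)  # ['run_A']
```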

The mapping was done with kallisto v0.44.0 (8). Kallisto index files were generated with default parameters from the CDS sequences of Homo sapiens (GCF_000001405.38), Anopheles gambiae (GCF_000005575.2), P. falciparum 3D7 and P. berghei ANKA. Both paired-end and single-end libraries were quantified as single-end libraries with kallisto quant, using an estimated fragment length of 200 bp, an estimated standard deviation of 20 bp and the pseudobam option. Unmapped reads from the output BAM files were written to a new fastq file using samtools v1.9-52-g651bf14 (9).
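The host, then vector, then parasite mapping cascade can be sketched as a sequence of command lines. The sketch below only assembles the kallisto v0.44 and samtools invocations described above and runs them with Python's `subprocess`; the helper names, file names and the location of the BAM file (`pseudoalignments.bam` inside the kallisto output directory) are assumptions, not the authors' scripts.

```python
import os
import subprocess


def kallisto_index_cmd(cds_fasta, index_path):
    # Build a kallisto index from CDS sequences with default parameters
    return ["kallisto", "index", "-i", index_path, cds_fasta]


def kallisto_quant_cmd(index_path, fastq, out_dir):
    # Single-end quantification: fragment length 200 bp, SD 20 bp,
    # pseudoalignments additionally written as BAM (--pseudobam)
    return ["kallisto", "quant", "-i", index_path, "-o", out_dir,
            "--single", "-l", "200", "-s", "20", "--pseudobam", fastq]


def unmapped_fastq_cmd(bam_path):
    # samtools fastq -f 4 keeps only records whose SAM flag has the 0x4 (unmapped) bit set
    return ["samtools", "fastq", "-f", "4", bam_path]


def filter_step(index_path, fastq, out_dir, unmapped_fastq):
    """Map reads against one reference and write the reads that did not map (hypothetical helper)."""
    subprocess.run(kallisto_quant_cmd(index_path, fastq, out_dir), check=True)
    # Assumption: kallisto v0.44 writes pseudoalignments.bam into the output directory
    bam = os.path.join(out_dir, "pseudoalignments.bam")
    with open(unmapped_fastq, "w") as handle:
        subprocess.run(unmapped_fastq_cmd(bam), stdout=handle, check=True)
```

Chaining `filter_step` three times (host index, then the Anopheles index on the host-unmapped reads, then the Plasmodium index) mirrors the order described in the text.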

In total, 206 experiments from P. berghei (Supplementary Figure S1A) and 620 experiments from P. falciparum (Supplementary Figure S1B) were included in the database and annotated based on the information available from the NCBI Sequence Read Archive, ENA and the literature (Supplementary Table S3 (P. berghei) and Supplementary Table S4 (P. falciparum)). Transcripts Per Million (TPM) values were represented as an expression matrix with genes in rows and experiments in columns. The expression matrices are available in Supplementary Table S1 for P. berghei and Supplementary Table S2 for P. falciparum. Highest Reciprocal Rank (HRR) co-expression networks were then constructed from these matrices (10).
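HRR (10) ranks, for every gene, all other genes by correlation and scores a gene pair by the larger of the two mutual ranks, so a pair scores well only if each gene is among the other's top partners. The pure-Python sketch below assumes Pearson correlation on log2(TPM + 1) values and an HRR cut-off of 30; both choices, and the function names, are illustrative assumptions rather than the exact CoNekT settings.

```python
import math


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # constant profile: treat as uncorrelated
    return sxy / math.sqrt(sxx * syy)


def hrr_network(expression, max_hrr=30):
    """expression: {gene: [TPM per experiment]}; returns {(gene_a, gene_b): HRR}."""
    genes = list(expression)
    # Assumption: correlations are computed on log2(TPM + 1)
    logged = {g: [math.log2(v + 1) for v in expression[g]] for g in genes}
    # rank[a][b] = position of b among a's partners sorted by decreasing correlation (1 = best)
    rank = {}
    for a in genes:
        partners = sorted((g for g in genes if g != a),
                          key=lambda g: pearson(logged[a], logged[g]), reverse=True)
        rank[a] = {g: i + 1 for i, g in enumerate(partners)}
    edges = {}
    for i, a in enumerate(genes):
        for b in genes[i + 1:]:
            hrr = max(rank[a][b], rank[b][a])  # the worse of the two reciprocal ranks
            if hrr <= max_hrr:
                edges[(a, b)] = hrr
    return edges
```

Lower HRR values mean stronger mutual co-expression; the cut-off controls how dense the resulting network is.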

CDS sequences, gene annotations and associated GO terms for the eight species in the database were obtained from the sources listed in Table 1. Superfamily and Pfam domain annotations from InterProScan v5.32-71.0 (11) were used as sequence descriptions when no description was available. Aliases were imported from GeneDB and PlasmoDB, where available (12,13). Groups of orthologous genes (Supplementary Table S5) and phylogenetic trees were obtained from OrthoFinder v1.1.8 (14), using BLAST for sequence similarity inference with default settings. The database was built on the CoNekT database framework (5) with default settings; the cluster size of the Heuristic Cluster Chiseling Algorithm (HCCA) (10) was limited to 100 genes. Experiments involving wild-type Plasmodium were further binned according to their life stage (ring, trophozoite, schizont, male gametocyte and female gametocyte) and used to calculate tissue (here, life stage) specificity. Nucleotide and protein BLAST databases for blastn and blastp were created using makeblastdb v2.9.0+ (15).
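Binning wild-type experiments by life stage reduces a gene's full profile to one average value per stage, as in the simplified profiles shown for the major life stages. A minimal sketch, assuming a plain arithmetic mean of TPM values over the samples of each stage; the sample IDs and the function name are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def stage_profile(tpm_by_sample, stage_by_sample):
    """Average a gene's TPM values over the wild-type samples of each life stage.

    tpm_by_sample: {sample_id: TPM}
    stage_by_sample: {sample_id: life stage}; samples missing from this mapping
    (e.g. knockouts or chemical treatments) are ignored.
    """
    values = defaultdict(list)
    for sample, tpm in tpm_by_sample.items():
        stage = stage_by_sample.get(sample)
        if stage is not None:
            values[stage].append(tpm)
    return {stage: mean(v) for stage, v in values.items()}


profile = stage_profile({"s1": 10.0, "s2": 20.0, "s3": 300.0, "s4": 5.0},
                        {"s1": "ring", "s2": "ring", "s3": "schizont"})
print(profile)  # {'ring': 15.0, 'schizont': 300.0}
```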

Table 1. Source of data for species in database

Species                             RNA-seq samples   CDS           CDS description   GO annotation
Theileria parva Muguga              N/A               NCBI RefSeq   NCBI RefSeq       InterProScan
Toxoplasma gondii ME49              N/A               NCBI RefSeq   NCBI RefSeq       InterProScan
Plasmodium berghei ANKA             206               PlasmoDB      GeneDB            PlasmoDB
Plasmodium cynomolgi B              N/A               PlasmoDB      InterProScan      InterProScan
Plasmodium falciparum 3D7           620               PlasmoDB      GeneDB            PlasmoDB
Plasmodium knowlesi Malayan Pk1A    N/A               PlasmoDB      InterProScan      InterProScan
Plasmodium vivax P01                N/A               PlasmoDB      GeneDB            InterProScan
Plasmodium yoelii YM                N/A               PlasmoDB      InterProScan      InterProScan

RESULTS

Malaria.tools offers a wide selection of tools to query the database. For example, the user can find the genes of interest by using BLAST, gene IDs (e.g. PF3D7_1223100) and keywords (e.g. rhoptry). Genes that work together in a specific biological process or contain a particular domain can be identified by querying the database with Gene Ontology terms (e.g. GO:0009405) or a Pfam domain (e.g. VSA_Rifin), respectively. The database offers multiple comparative genomic and transcriptomic tools that allow the user to view and compare expression profiles within and across species, and to investigate the phylogenetic and expression relationships of gene families. A full description of the features is found at https://malaria.sbs.ntu.edu.sg/features. To exemplify some of the features of malaria.tools, we provide three analyses showing typical case studies.

Identification of a co-expression neighbourhood important for erythrocyte invasion

A co-expression neighbourhood consists of a gene of interest and its co-expressed genes (neighbours), identified by Highest Reciprocal Rank (HRR). To find co-expression neighbourhoods of interest to malaria researchers, we determined which genes have co-expression neighbours that are already functionally characterized. We identified gene PF3D7_1223100, which is co-expressed with 91 other genes, of which 48 (53%) are annotated with GO terms supported by experimental evidence codes (e.g. EXP, IDA, IPI), indicating that their function has already been studied experimentally. This high percentage of functionally characterized genes suggests that the corresponding biological process has received particular attention from malaria researchers, likely because of its involvement in pathogenicity. Indeed, the GO terms enriched in the neighbourhood are ‘entry into host’ and ‘movement in host environment’ (available from Functional Annotation/Predicted GO on https://malaria.sbs.ntu.edu.sg/sequence/view/16011), suggesting that the neighbourhood is important for the invasion of the host cell.
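The neighbourhood statistic used here (48 of 91 neighbours, about 53%, carry experimentally supported GO terms) can be computed from an edge list and GO annotations. In this sketch the set of evidence codes counted as experimental is an assumption: the text names EXP, IDA and IPI, and we add the other standard experimental GO codes IMP, IGI and IEP. The function names are hypothetical.

```python
# Assumed set of experimental GO evidence codes
EXPERIMENTAL_CODES = {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"}


def neighbours(edges, gene):
    """Genes sharing a co-expression edge with `gene`; edges is an iterable of (a, b) pairs."""
    return {b if a == gene else a for a, b in edges if gene in (a, b)}


def characterized_fraction(genes, go_annotations):
    """Count and fraction of `genes` with at least one experimentally supported GO term.

    go_annotations: {gene: [(go_id, evidence_code), ...]}
    """
    characterized = {
        g for g in genes
        if any(code in EXPERIMENTAL_CODES for _, code in go_annotations.get(g, []))
    }
    return len(characterized), len(characterized) / len(genes)
```

For the PF3D7_1223100 neighbourhood, 48 characterized genes out of 91 neighbours give 48 / 91 ≈ 0.53.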

To gain further insight into the function of this neighbourhood, we first studied the expression profile of PF3D7_1223100. Expression profiles in malaria.tools are available in a detailed format that shows expression in all annotated samples (Figure 1A), and as average expression in the major life stages of malaria (Figure 1B, https://malaria.sbs.ntu.edu.sg/profile/view/8699). Both profiles show that PF3D7_1223100 and its co-expressed genes are expressed in all major life stages, with particularly high expression in the schizont stage (Figure 1A and B). Next, we retrieved publications on the functional characterization of the genes and found that the well-studied genes are clearly associated with functions important for erythrocyte invasion. The genes can be classified into three major groups relating to motility, cytoadherence and erythrocyte invasion. Genes relating to the glideosome complex [PF3D7_0918000 (GAP50) (16), PF3D7_1323700 (GAPM1) (17)], the inner membrane complex (18) [PF3D7_0109000 (PHIL1) and PF3D7_1003600 (IMC1c)] and the cytoskeleton (19,20) [PF3D7_1251200 (coronin), PF3D7_0932200 (Profilin) and PF3D7_1246200 (Actin 1)] are essential for the parasite to move towards a new red blood cell through gliding motility. Upon reaching the red blood cell, merozoite surface proteins such as PF3D7_1035400 (MSP3), PF3D7_1335100 (MSP7), PF3D7_1035500 (MSP6) (21) and PF3D7_1035700 (DBLMSP) facilitate the binding of the parasite to the red blood cell.
Finally, erythrocyte invasion is enabled by various genes such as enzymes [PF3D7_0507500 (SUB1) (22), PF3D7_1136500 (casein kinase 1) (23) and PF3D7_0404700 (DPAP3) (24)], signalling mediators (25) [PF3D7_0934800 (PKAc) and PF3D7_1223100 (PKAr)], rhoptry proteins (26,27) [PF3D7_0929400 (RhopH2), PF3D7_0905400 (RhopH3), PF3D7_1410400 (RAP1), PF3D7_0414900 (ARO), PF3D7_1017100 (RON12), PF3D7_0501600 (RAP2), PF3D7_0817700 (RON5)] and others [PF3D7_0423800 (CyRPA) (28), PF3D7_0935800 (CLAG9) (29), PF3D7_0612700 (P12), PF3D7_0404900 (P41) (30)]. The schizont is a non-infective life stage of the erythrocytic cycle. However, a mature schizont contains multiple merozoites, which, upon rupture of the schizont, move to and invade fresh erythrocytes. Accordingly, the co-expression neighbourhood of PF3D7_1223100 shows an upregulation of merozoite and erythrocyte invasion-related genes during the schizont stage. In conclusion, the remaining 47% of genes in this neighbourhood that are not yet functionally characterized are prime candidates for further studies on parasite motility, cytoadherence and erythrocyte invasion.

Figure 1. Expression profiles and co-expression neighbourhood of gene PF3D7_1223100. (A) Full expression profile of the gene. The x-axis represents the different RNA-seq experiments capturing the life stages and genetic perturbations of P. falciparum, while the y-axis indicates the expression level (Transcripts Per Million, TPM). The different life stages are colour-coded by blue (ring), purple (trophozoite), green (schizont), light blue (male gametocyte), pink (female gametocyte) and brown (sporozoite). The bars indicate the mean expression value, while the dots show the expression values of the individual samples. For brevity, only the general descriptions of the samples are shown. (B) Simplified expression profile of PF3D7_1223100, showing the average expression in the five major life stages. (C) Co-expression network neighbourhood containing functionally characterized genes. Nodes represent genes, edges (lines) connect co-expressed genes, while coloured shapes indicate orthogroups. For brevity, only genes with experimentally verified function supported by at least two publications are shown in the figure.

Comparative transcriptomic analysis of gene modules involved in male gametocyte-specific motility

Comparative transcriptomic analyses can reveal that gene modules are conserved across species (31), thus enabling the identification of the core genetic components of specific biological processes (32,33). Malaria.tools provides two methods to extract these conserved transcriptional programs by (i) identifying common gene families that are specifically expressed in a particular life stage of two malaria species or by (ii) identifying conserved clusters of co-expressed genes.

Using the first method to identify conserved transcriptional programs, we navigated to ‘Tools\Compare specificities’, selected the species P. falciparum and P. berghei, set the condition ‘Gametocyte (male)’ for both species and clicked ‘Compare specificity’. The database first identified genes that are preferentially expressed in male gametocytes in each species (specificity measure (SPM) > 0.85) (34), which revealed 187 gene families that are specifically expressed at this life stage in both parasites (Figure 2A, Supplementary Table S6). The table below the Venn diagram lists and links to the 187 gene families; clicking on a gene family and then on its tree link displays the phylogenetic and expression relationships of the genes in the family. Not surprisingly, many of the gene families show conserved male gametocyte-specific expression in the two Plasmodium species, as exemplified by the phylogenetic tree of gene family OG0000502 (Figure 2B, https://malaria.sbs.ntu.edu.sg/tree/view/503), indicating that this family is male gametocyte-specific. Interestingly, we also observed cases where only one clade of the phylogenetic tree showed male gametocyte-specific expression (Figure 2C, OG0000055, https://malaria.sbs.ntu.edu.sg/tree/view/56), suggesting that for this gene family an ancient gene duplication took place in the ancestor of the Plasmodium species, followed by sub-functionalization or neo-functionalization of the genes. The list of conserved male gametocyte-specific genes and gene families provides a good starting point to dissect the genetic basis of male gametocyte-specific biological processes.
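The ‘Compare specificities’ step can be sketched as computing SPM per gene and intersecting the resulting sets of gene families. We assume the SPM definition of ref. 34, in which the SPM of a stage is the cosine between the expression profile and a vector retaining only that stage's value; this reduces to the stage value divided by the Euclidean norm of the profile. The function names, and the rule that a single specific gene marks its whole family, are assumptions.

```python
import math


def spm(profile, stage):
    """Assumed SPM of one life stage: x_stage / ||x||, a value in [0, 1].

    profile: {life stage: average expression}
    """
    norm = math.sqrt(sum(v * v for v in profile.values()))
    return profile[stage] / norm if norm > 0 else 0.0


def specific_families(profiles, family_of, stage, threshold=0.85):
    """Gene families containing at least one gene with SPM above `threshold` for `stage`.

    profiles: {gene: {life stage: average expression}}; family_of: {gene: family ID}
    """
    return {family_of[g] for g, p in profiles.items()
            if g in family_of and spm(p, stage) > threshold}
```

The families shared by both species would then be `specific_families(pf_profiles, family_of, "male") & specific_families(pb_profiles, family_of, "male")`, which corresponds to the intersection drawn in the Venn diagram.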

Figure 2. Comparative analysis of male gametocyte-specific gene expression in P. berghei and P. falciparum. (A) Venn diagram showing the overlap of the male gametocyte-specific gene families in the two malaria species. (B) Phylogenetic gene tree of gene family OG0109400. The species are colour-coded: Toxoplasma gondii in red (gene IDs XP_NNNNNNNNN, NP_XXXXXX), Theileria parva in orange (gene IDs XP_NNNNNN), P. falciparum in olive (gene IDs PF3D7_XXXXXXX), P. berghei in dark green (gene IDs PBANKA_XXXXXXX), P. yoelii in light green (gene IDs PYYM_XXXXXXX), P. knowlesi in blue (gene IDs PKNOH_SXXXXXXXXX-t35_1), P. vivax in dark blue (gene IDs PVP01_XXXXXXX) and P. cynomolgi in purple (gene IDs PCYB_XXXXXX-t26_1). The coloured boxes to the right of the gene IDs show the average gene expression in five major life stages of the malaria parasite, where yellow and blue indicate low and high gene expression, respectively. (C) Phylogenetic gene tree of gene family OG0000055. (D) Comparison of cluster 13 (left, blue box) and cluster 15 (right, green box) from P. berghei and P. falciparum, respectively. Nodes represent genes, solid edges connect co-expressed genes, dashed edges connect orthologs, and coloured shapes indicate genes belonging to the same gene families. (E) Average expression profiles of the genes found in cluster 13 (left) and cluster 15 (right).

The second method to identify conserved gene modules in malaria.tools is based on co-expression network clusters, which group functionally related genes based on the topology of the networks (10). Similar clusters are identified as cluster pairs that share significantly more gene families than expected by chance (P < 0.05, hypergeometric test), with the overlap expressed as a Jaccard index (5). To exemplify this feature, we first clicked on one of the P. berghei genes found in the gene family table (PBANKA_0102700, https://malaria.sbs.ntu.edu.sg/sequence/view/17914) and then on the P. berghei co-expression cluster 13 that contains this gene (https://malaria.sbs.ntu.edu.sg/cluster/view/40). The ‘Similar clusters’ table on this page identified P. falciparum cluster 15 as significantly similar to P. berghei cluster 13 (Jaccard index = 0.246). Clicking on the ‘Compare’ button revealed the co-expression networks of the two modules (Figure 2D). As expected, both conserved clusters show male gametocyte-specific expression profiles (https://malaria.sbs.ntu.edu.sg/cluster/view/40, https://malaria.sbs.ntu.edu.sg/cluster/view/124), further supporting that the two clusters represent a bona fide conserved transcriptional program for male gametocyte-specific motility.
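Both quantities used to compare clusters can be computed from the sets of gene families in each cluster. This sketch assumes that the hypergeometric background is the total number of gene families, with one cluster's families treated as "successes" and the other cluster's families as draws; the exact background used by CoNekT is not stated here, so this setup and the function names are illustrative.

```python
from math import comb


def jaccard(families_a, families_b):
    """Jaccard index of the gene-family sets of two clusters."""
    union = families_a | families_b
    return len(families_a & families_b) / len(union) if union else 0.0


def overlap_p_value(shared, total, size_a, size_b):
    """P(X >= shared) under a hypergeometric model.

    Assumption: `total` gene families form the background, cluster A contains `size_a`
    of them and cluster B draws `size_b`; X is the number of families in common.
    """
    upper = min(size_a, size_b)
    hits = sum(comb(size_a, i) * comb(total - size_a, size_b - i)
               for i in range(shared, upper + 1))
    return hits / comb(total, size_b)
```

For example, two clusters with families {OG1, OG2, OG3} and {OG2, OG3, OG4} have a Jaccard index of 2 / 4 = 0.5.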

The two conserved clusters contain 132 genes, and an inspection of their expression profiles indicated that 123 of them (93%) show male gametocyte-specific expression (Supplementary Table S7). However, only 60 of these genes are annotated; the remaining ones are conserved proteins of unknown function (4 genes) or conserved Plasmodium proteins of unknown function. A closer inspection of the homologs in the clusters revealed motor proteins such as kinesins (PBANKA_0202700, PF3D7_0111000, PBANKA_0902400, PF3D7_1146700, PBANKA_1458800 and PF3D7_1245600) and dyneins (PBANKA_1022400 and PF3D7_1420800) (35), as well as flagella-related proteins such as radial spoke protein 3 (PBANKA_1039000), growth arrest protein (36) (PBANKA_1455800 and PF3D7_1242400) and a MORN repeat-containing protein that localizes near the flagellar basal body in male gametocytes (37) (PBANKA_1018200 and PF3D7_1426400). Taken together, the functions of these genes suggest that the clusters are associated with motility and flagella. Hence, the unannotated genes in these homologous networks will be of prime interest for researchers studying male gametocyte motility.

Identification of microneme-specific co-expressed gene clusters

Micronemes are protein-rich secretory organelles important for host cell invasion and gliding motility in apicomplexan parasites (38). Micronemal proteins are discharged to facilitate entry of the parasites into red blood cells.

To gain insight into microneme biogenesis and function using malaria.tools, we entered GO:0020009 (the GO term for microneme) to reach the page dedicated to the microneme cellular component (https://malaria.sbs.ntu.edu.sg/go/view/12866). The page lists 270 and 299 annotated microneme-associated genes in P. berghei and P. falciparum, respectively. It also contains information about Pfam domains (Prot_kinase_dom, VWF_A, MORN and others) and gene families (OG_01_0000012, OG_01_0000038 and others) that may also play a role in microneme function. In addition, the database identified cluster 2 (https://malaria.sbs.ntu.edu.sg/cluster/view/31) and cluster 7 (https://malaria.sbs.ntu.edu.sg/cluster/view/33) from P. berghei and cluster 2 (https://malaria.sbs.ntu.edu.sg/cluster/view/74) and cluster 12 (https://malaria.sbs.ntu.edu.sg/cluster/view/146) from P. falciparum as significantly similar (P < 0.05, Benjamini–Hochberg corrected P-value) (39), and all four clusters were enriched for GO:0020009 (microneme), implicating them in a microneme-specific process.
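The Benjamini–Hochberg procedure (39) adjusts a list of P-values for multiple testing by scaling each P-value by m/rank and then enforcing monotonicity from the largest value downwards. The following is a generic textbook implementation, offered as an illustration rather than code taken from malaria.tools; the function name is hypothetical.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P-values (step-up false discovery rate control)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending P-value
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so that adjusted values never decrease with rank
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

A cluster pair would then be reported when its adjusted value is below 0.05.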

To learn more about the function of these four clusters, we investigated their expression profiles (Figure 3A). While cluster 2 from P. berghei and cluster 12 from P. falciparum are expressed in all life stages of malaria, cluster 7 from P. berghei and cluster 2 from P. falciparum show ookinete- and sporozoite-specific expression, respectively. Since ookinetes and sporozoites are mosquito stages, we speculate that the two Plasmodium species have at least two types of micronemes: one expressed ubiquitously (P. berghei cluster 2 and P. falciparum cluster 12) and one specific to the mosquito stages (P. berghei cluster 7 and P. falciparum cluster 2).

Figure 3. Expression profiles and co-expression network of the microneme-enriched clusters in P. falciparum and P. berghei. (A) Expression profiles of clusters 2 (first) and 7 (second) from P. berghei and clusters 2 (third) and 12 (fourth) from P. falciparum. The different life stages are colour-coded. For brevity, the sample annotations are abbreviated to the major life stages. (B) Co-expression cluster 7 from P. berghei. Nodes represent genes, co-expressed genes are connected by grey edges, while coloured shapes indicate orthogroups. For brevity, only the discussed genes are highlighted.

We further investigated the putative function of the ookinete-specific cluster 7 from P. berghei (https://malaria.sbs.ntu.edu.sg/cluster/view/33) and found that the cluster is significantly enriched for GO terms such as entry into host cell (GO:0030260, Supplementary Table S8), in line with the role of micronemes in host cell invasion. A closer look at the genes in this cluster revealed genes that are essential for the infectivity and maturation of ookinetes (Figure 3B). Specifically, a group of genes related to the inner membrane complex, surface ookinete proteins and secreted ookinete proteins is required for efficient gliding motility and midgut traversal [PBANKA_1354600 (ISC1), PBANKA_1025700 (IMC1l), PBANKA_0513000 (IMC1m) (40), PBANKA_1106900 (PIMMS2) (41), PBANKA_0714300 (HSP20) (42) and PBANKA_1432300 (CelTOS) (43)]. Another group of co-expressed genes contains perforins and a secreted protein, which are known to be important for midgut invasion, during which the parasite disrupts the membrane of the midgut epithelial cells [PBANKA_0824200 (PLP3), PBANKA_0711400 (PLP4), PBANKA_0711600 (PLP5) (44) and PBANKA_1037800 (SOAP) (45)]. Last but not least, genes important for the transition from the ookinete to the oocyst stage are also found in this cluster [PBANKA_0701900 (GAMA/PSOP9) (46), PBANKA_1119200 (PSOP25) (47) and PBANKA_0412900 (CTRP) (48)]. Taken together, cluster 7 from P. berghei contains genes that are most likely important for microneme function and host cell invasion.

CONCLUSIONS

The lack of comparative genomic and transcriptomic resources for malaria prompted us to construct malaria.tools, a state-of-the-art database with a wide range of user-friendly features. The database can be queried with BLAST, gene IDs, keywords, Pfam domains and Gene Ontology terms. To identify novel genes relevant for a biological process of interest, co-expression neighbourhoods and clusters can be mined for uncharacterized candidates that are connected to well-studied genes. Alternatively, the database allows easy identification of genes that are expressed during a specific life stage of the malaria parasite, allowing researchers to dissect the transcriptome critical for pathogenicity and other life stages. Finally, the database can compare clusters and stage-specific expression profiles across species to identify the conserved core components of various biological processes. We envision that malaria.tools will aid malaria researchers in selecting relevant genes for experimental functional characterization and in developing drugs to combat emerging drug resistance.

DATA AVAILABILITY

The expression matrices, RNA-seq sample annotation and gene families are available from the Supplementary Data. The co-expression networks, coding and protein sequences can be downloaded from malaria.tools.

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online.

ACKNOWLEDGEMENTS

Malaria.tools is hosted at Nanyang Technological University Singapore and we would like to thank Ryan Chee Kiang Ng for excellent tech support. We would like to thank Dr Daniela Mutwil-Anderwald for proofreading the manuscript. Furthermore, we would like to thank Dr Lei Zhu from Prof. Zbynek Bozdech lab, SBS, NTU for useful discussions.

Author Contributions: Q.W.T. prepared the data and implemented malaria.tools with input from M.M. Both Q.W.T. and M.M. wrote the manuscript.

FUNDING

Nanyang Technological University Start-Up Grant. Funding for open access charge: Start-Up Grant.

Conflict of interest statement. None declared.

REFERENCES

2. Thu A.M., Phyo A.P., Landier J., Parker D.M., Nosten F.H. Combating multidrug-resistant Plasmodium falciparum malaria. FEBS J. 2017; 284:2569–2578.

3. Florent I., Maréchal E., Gascuel O., Bréhélin L. Bioinformatic strategies to provide functional clues to the unknown genes in Plasmodium falciparum genome. Parasite. 2010; 17:273–283.

4. Zhou X., Kao M.-C.J., Wong W.H. Transitive functional annotation by shortest-path analysis of gene expression data. Proc. Natl. Acad. Sci. U.S.A. 2002; 99:12783–12788.

5. Proost S., Mutwil M. CoNekT: an open-source framework for comparative genomic and transcriptomic network analyses. Nucleic Acids Res. 2018; 46:255075.

6. Leinonen R., Sugawara H., Shumway M. The sequence read archive. Nucleic Acids Res. 2011; 39:D19–D21.

7. Silvester N., Alako B., Amid C., Cerdeño-Tarrága A., Clarke L., Cleland I., Harrison P.W., Jayathilaka S., Kay S., Keane T. et al. The European nucleotide archive in 2017. Nucleic Acids Res. 2017; 46:D36–D40.

8. Bray N.L., Pimentel H., Melsted P., Pachter L. Near-optimal probabilistic RNA-seq quantification. Nat. Biotechnol. 2016; 34:525–527.

9. Li H., Handsaker B., Wysoker A., Fennell T., Ruan J., Homer N., Marth G., Abecasis G., Durbin R. The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009; 25:2078–2079.

10. Mutwil M., Usadel B., Schütte M., Loraine A., Ebenhöh O., Persson S. Assembly of an interactive correlation network for the Arabidopsis genome using a novel Heuristic Clustering Algorithm. Plant Physiol. 2010; 152:29–43.

11. Jones P., Binns D., Chang H.-Y., Fraser M., Li W., McAnulla C., McWilliam H., Maslen J., Mitchell A., Nuka G. et al. InterProScan 5: genome-scale protein function classification. Bioinformatics. 2014; 30:1236–1240.

12. Logan-Klumpler F.J., De Silva N., Boehme U., Rogers M.B., Velarde G., McQuillan J.A., Carver T., Aslett M., Olsen C., Subramanian S. et al. GeneDB–an annotation database for pathogens. Nucleic Acids Res. 2012; 40:D98–D108.

13. Aurrecoechea C., Brestelli J., Brunk B.P., Dommer J., Fischer S., Gajria B., Gao X., Gingle A., Grant G., Harb O.S. et al. PlasmoDB: a functional genomic database for malaria parasites. Nucleic Acids Res. 2009; 37:D539–D543.

14. Emms D.M., Kelly S. OrthoFinder: solving fundamental biases in whole genome comparisons dramatically improves orthogroup inference accuracy. Genome Biol. 2015; 16:157.

15. Camacho C., Coulouris G., Avagyan V., Ma N., Papadopoulos J., Bealer K., Madden T.L. BLAST+: architecture and applications. BMC Bioinformatics. 2009; 10:421.

16. Yeoman J.A., Hanssen E., Maier A.G., Klonis N., Maco B., Baum J., Turnbull L., Whitchurch C.B., Dixon M.W.A., Tilley L. Tracking glideosome-associated protein 50 reveals the development and organization of the inner membrane complex of Plasmodium falciparum. Eukaryot. Cell. 2011; 10:556–564.

17. Bullen H.E., Tonkin C.J., O’Donnell R.A., Tham W.-H., Papenfuss A.T., Gould S., Cowman A.F., Crabb B.S., Gilson P.R. A novel family of apicomplexan Glideosome-associated proteins with an inner Membrane-anchoring role. J. Biol. Chem. 2009; 284:25353–25363.

18. Saini E., Zeeshan M., Brady D., Pandey R., Kaiser G., Koreny L., Kumar P., Thakur V., Tatiya S., Katris N.J. et al. Photosensitized INA-Labelled protein 1 (PhIL1) is novel component of the inner membrane complex and is required for Plasmodium parasite development. Sci. Rep. 2017; 7:15577.

19. Olshina M.A., Angrisano F., Marapana D.S., Riglar D.T., Bane K., Wong W., Catimel B., Yin M.-X., Holmes A.B., Frischknecht F. et al. Plasmodium falciparum coronin organizes arrays of parallel actin filaments potentially guiding directional motility in invasive malaria parasites. Malar. J. 2015; 14:280.

20. Moreau C.A., Bhargav S.P., Kumar H., Quadt K.A., Piirainen H., Strauss L., Kehrer J., Streichfuss M., Spatz J.P., Wade R.C. et al. A unique profilin-actin interface is important for malaria parasite motility. PLoS Pathog. 2017; 13:e1006412.

21. Beeson J.G., Drew D.R., Boyle M.J., Feng G., Fowkes F.J.I., Richards J.S. Merozoite surface proteins in red blood cell invasion, immunity and vaccines against malaria. FEMS Microbiol. Rev. 2016; 40:343–372.

22. Withers-Martinez C., Suarez C., Fulle S., Kher S., Penzo M., Ebejer J.-P., Koussis K., Hackett F., Jirgensons A., Finn P. et al. Plasmodium subtilisin-like protease 1 (SUB1): insights into the active-site structure, specificity and function of a pan-malaria drug target. Int. J. Parasitol. 2012; 42:597–612.

23. Dorin-Semblat D., Demarta-Gatsi C., Hamelin R., Armand F., Carvalho T.G., Moniatte M., Doerig C. Malaria Parasite-Infected erythrocytes secrete PfCK1, the Plasmodium homologue of the pleiotropic protein kinase casein kinase 1. PLoS One. 2015; 10:e0139591.

24. Lehmann C., Tan M.S.Y., de Vries L.E., Russo I., Sanchez M.I., Goldberg D.E., Deu E. Plasmodium falciparum dipeptidyl aminopeptidase 3 activity is important for efficient erythrocyte invasion by the malaria parasite. PLoS Pathog. 2018; 14:e1007031.

25. Baker D.A., Drought L.G., Flueck C., Nofal S.D., Patel A., Penzo M., Walker E.M. Cyclic nucleotide signalling in malaria parasites. Open Biol. 2017; 7:170213.

26. Counihan N.A., Kalanon M., Coppel R.L., De Koning-Ward T.F. Plasmodium rhoptry proteins: why order is important. Trends Parasitol. 2013; 29:228–236.

27. Counihan N.A., Chisholm S.A., Bullen H.E., Srivastava A., Sanders P.R., Jonsdottir T.K., Weiss G.E., Ghosh S., Crabb B.S., Creek D.J. et al. Plasmodium falciparum parasites deploy RhopH2 into the host erythrocyte to obtain nutrients, grow and replicate. Elife. 2017; 6:e23217.

28. Favuzza P., Guffart E., Tamborrini M., Scherer B., Dreyer A.M., Rufer A.C., Erny J., Hoernschemeyer J., Thoma R., Schmid G. et al. Structure of the malaria vaccine candidate antigen CyRPA and its complex with a parasite invasion inhibitory antibody. Elife. 2017; 6:e20383.

29. Ling I.T., Florens L., Dluzewski A.R., Kaneko O., Grainger M., Yim Lim B.Y.S., Tsuboi T., Hopkins J.M., Johnson J.R., Torii M. et al. The Plasmodium falciparum clag9 gene encodes a rhoptry protein that is transferred to the host erythrocyte upon invasion. Mol. Microbiol. 2004; 52:107–118.

30. Tonkin M.L., Arredondo S.A., Loveless B.C., Serpa J.J., Makepeace K.A.T., Sundar N., Petrotchenko E.V., Miller L.H., Grigg M.E., Boulanger M.J. Structural and biochemical characterization of Plasmodium falciparum 12 (Pf12) reveals a unique interdomain organization and the potential for an antiparallel arrangement with Pf41. J. Biol. Chem. 2013; 288:12805–12817.

31. Mutwil M., Klie S., Tohge T., Giorgi F.M., Wilkins O., Campbell M.M., Fernie A.R., Usadel B., Nikoloski Z., Persson S. PlaNet: Combined sequence and expression comparisons across plant networks derived from seven species. Plant Cell. 2011; 23:895–910.

32. Movahedi S., Van Bel M., Heyndrickx K.S., Vandepoele K. Comparative co-expression analysis in plant biology. Plant Cell Environ. 2012; 35:1787–1798.

33. Hansen B.O., Vaid N., Musialak-Lange M., Janowski M., Mutwil M. Elucidating gene function and function evolution through comparison of co-expression networks of plants. Front. Plant Sci. 2014; 5:1–9.

34. Xiao S.J., Zhang C., Zou Q., Ji Z.L. TiSGeD: a database for tissue-specific genes. Bioinformatics. 2010; 26:1273–1275.

35. Talman A.M., Prieto J.H., Marques S., Ubaida-Mohien C., Lawniczak M., Wass M.N., Xu T., Frank R., Ecker A., Stanway R.S. et al. Proteomic analysis of the Plasmodium male gamete reveals the key role for glycolysis in flagellar motility. Malar. J. 2014; 13:315.

36. Yeh S.-D., Chen Y.-J., Chang A.C.Y., Ray R., She B.-R., Lee W.-S., Chiang H.-S., Cohen S.N., Lin-Chao S. Isolation and properties of Gas8, a growth arrest-specific gene regulated during male gametogenesis to produce a protein associated with the sperm motility apparatus. J. Biol. Chem. 2002; 277:6311–6317.

37. Ferguson D.J.P., Sahoo N., Pinches R.A., Bumstead J.M., Tomley F.M., Gubbels M.-J. MORN1 has a conserved role in asexual and sexual development across the apicomplexa. Eukaryot. Cell. 2008; 7:698–711.

38. Black M.W., Boothroyd J.C. Lytic cycle of Toxoplasma gondii. Microbiol. Mol. Biol. Rev. 2000; 64:607–623.

39. Benjamini Y., Hochberg Y. Controlling the false discovery Rate: A practical and powerful approach to multiple testing. J. R. Stat. Soc. Ser. B. 1995; 57:289–300.

40. Harding C.R., Meissner M. The inner membrane complex through development of Toxoplasma gondii and Plasmodium. Cell Microbiol. 2014; 16:632–641.

41. Ukegbu C.V., Akinosoglou K.A., Christophides G.K., Vlachou D. Plasmodium berghei PIMMS2 Promotes Ookinete Invasion of the Anopheles gambiae Mosquito Midgut. Infect. Immun. 2017; 85:e00139-17.

42. Montagna G.N., Buscaglia C.A., Münter S., Goosmann C., Frischknecht F., Brinkmann V., Matuschewski K. Critical role for heat shock protein 20 (HSP20) in migration of malarial sporozoites. J. Biol. Chem. 2012; 287:2410–2422.

43. Kariu T., Ishino T., Yano K., Chinzei Y., Yuda M. CelTOS, a novel malarial protein that mediates transmission to mosquito and vertebrate hosts. Mol. Microbiol. 2006; 59:1369–1379.

44. Deligianni E., Silmon de Monerri N.C., McMillan P.J., Bertuccini L., Superti F., Manola M., Spanos L., Louis C., Blackman M.J., Tilley L. et al. Essential role of Plasmodium perforin-like protein 4 in ookinete midgut passage. PLoS One. 2018; 13:e0201651.

45. Dessens J.T., Siden-Kiamos I., Mendoza J., Mahairaki V., Khater E., Vlachou D., Xu X.-J., Kafatos F.C., Louis C., Dimopoulos G. et al. SOAP, a novel malaria ookinete protein involved in mosquito midgut invasion and oocyst development. Mol. Microbiol. 2003; 49:319–329.

46. Ecker A., Bushell E.S.C., Tewari R., Sinden R.E. Reverse genetics screen identifies six proteins important for malaria development in the mosquito. Mol. Microbiol. 2008; 70:209–220.

47. Zheng W., Liu F., He Y., Liu Q., Humphreys G.B., Tsuboi T., Fan Q., Luo E., Cao Y., Cui L. Functional characterization of Plasmodium berghei PSOP25 during ookinete development and as a malaria transmission-blocking vaccine candidate. Parasit. Vectors. 2017; 10:8.

48. Ramakrishnan C., Dessens J.T., Armson R., Pinto S.B., Talman A.M., Blagborough A.M., Sinden R.E. Vital functions of the malarial ookinete protein, CTRP, reside in the A domains. Int. J. Parasitol. 2011; 41:1029–1039.

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com

