BEERE: a web server for biomedical entity expansion, ranking and explorations

Table 1.

Feature comparison between BEERE and three other web servers: ToppGene, Phenolyzer and Semantic MEDLINE

Feature		BEERE	ToppGene	Phenolyzer	Semantic MEDLINE
Algorithm	Ranking algorithm	PageRank, ant-colony	k-step Markov, PageRank, HITS	Gene-disease score	Not mentioned
	Iterative ranking	yes	yes	no	no
Evaluation	Statistical model	yes	yes	no	no
Data and quality	Relationship quality control	yes	no	yes (factor control)	no
	Network extension with IOA evaluation	yes	yes (neighbor distance)	no	no
	Biomedical term	yes	no	yes (disease)	yes
Visualization	Network with provenance data	yes	yes (provide networks)	yes	yes
	Grouping annotation	yes	no	no	no

Grouping annotation: node color visualization based on the annotation, e.g. genes annotated in different pathways can be shown in the network using different colors.

Given a list of biomedical entities, BEERE provides a five-step procedure to generate the biomedical rank and visualization panels shown in the ‘Graphical abstract’.

Searching the databases with a list of terms or genes

The input page allows users to enter a gene list or a biomedical term list, and BEERE retrieves the matched entities and relationships from the HAPPI 2.0 and SemMedDB databases, respectively. We use two inputs as examples. One is a gene list consisting of 200 glioblastoma (GBM) genetic candidates from the OMIM database. The other is a term list consisting of three different vitamins and Alzheimer’s disease. The advanced parameter settings allow users to control relationship quality. In PPI retrieval, the parameter ‘PPI confidence’ provides three PPI cutoffs, ‘0.45’, ‘0.75’ and ‘0.9’, and a ‘customized cutoff’ option. The three cutoffs correspond to 3-star, 4-star and 5-star PPI quality in the HAPPI 2.0 database. In term-to-term relationship retrieval, users can enter a number ranging from 0 to the maximum value of the RDS. The parameter ‘expanded’ provides the option of a one-layer network expansion, which can increase the index of aggregation by introducing ‘bridge’ nodes into the network. In biomedical term retrieval, the parameter ‘matching’ offers ‘fuzzy matching’, ‘substring matching’ and ‘exact matching’ options to maximize retrieval power. The parameter ‘predicate’ provides a list of predicates from which one or more can be selected.
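The cutoff-to-star correspondence above can be illustrated with a minimal Python sketch. It is not BEERE's code: the `Interaction` record, the helper names and the confidence scores are hypothetical, and it assumes that a cutoff keeps interactions scoring at or above it.

```python
from typing import NamedTuple

# Star tiers and cutoffs as stated in the text (HAPPI 2.0 quality levels).
STAR_CUTOFFS = {3: 0.45, 4: 0.75, 5: 0.9}


class Interaction(NamedTuple):
    gene_a: str
    gene_b: str
    confidence: float  # hypothetical confidence score in [0, 1]


def filter_by_confidence(interactions, cutoff):
    """Keep interactions whose confidence is at or above the cutoff (assumed rule)."""
    return [ppi for ppi in interactions if ppi.confidence >= cutoff]


def star_rating(confidence):
    """Return the highest star tier whose cutoff the score reaches, or None."""
    reached = [stars for stars, cut in STAR_CUTOFFS.items() if confidence >= cut]
    return max(reached) if reached else None


# Toy records; the scores are made up for illustration only.
ppis = [
    Interaction("TP53", "EGFR", 0.95),
    Interaction("TP53", "AKT1", 0.80),
    Interaction("EGFR", "RAC1", 0.50),
    Interaction("RB1", "CCND1", 0.30),
]

four_star_and_above = filter_by_confidence(ppis, STAR_CUTOFFS[4])
```

A ‘customized cutoff’ would simply pass a different number as `cutoff`.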

Verifying retrieved terms against search terms

The retrieved biomedical entity page helps the user verify the query. In the matched entity table, matches and mismatches are displayed so that users can review them and search again. In the gene list example, the gene matching table shows the queried gene symbols, the matched genes and a Seed/Expanded/none tag (‘S/E/-’). If an input gene is an alias or a gene synonym, BEERE automatically maps the queried gene to the HAPPI 2.0 database gene symbol. In the term list example using ‘substring’ matching, the table shows matches and modified matches, with those having the lowest Levenshtein distance (L-distance) as the best candidates. In the advanced mode, all potential mismatches are displayed for adjustment.
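To show how a lowest-L-distance candidate can be chosen, here is a small sketch of the standard Levenshtein dynamic program. The vocabulary, the misspelled query and the `best_candidates` helper are illustrative assumptions, not BEERE's matching code.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        curr = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def best_candidates(query, vocabulary):
    """Return (distance, terms) for the vocabulary terms closest to the query."""
    scored = [(levenshtein(query.lower(), term.lower()), term) for term in vocabulary]
    best = min(distance for distance, _ in scored)
    return best, [term for distance, term in scored if distance == best]


# Hypothetical vocabulary and a misspelled query term.
vocabulary = ["glioblastoma", "glioma", "neuroblastoma"]
distance, matches = best_candidates("glioblastma", vocabulary)
```

Ties are returned together, which mirrors a table that lists several best candidates for the user to review.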

Retrieving known relationships

In the retrieved related relationships page, the relationships table shows the quality of the relationships. After reviewing the two tables, users can choose to ‘refine the table and try to match again’ or ‘process to entity prioritization’. In advanced search, users can adjust the parameters ‘iteration’, ‘sigma’ and ‘method’. The parameter ‘iteration’ controls the recursive computation of the ranking score. The parameter ‘method’ offers the ‘page rank’ and ‘ant colony’ algorithms. The parameter ‘sigma’ provides a damping factor in the range [0,1] (default value 0.8), which determines the probability that a process of randomly choosing relationships will eventually stop.
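For readers unfamiliar with how a damping factor and an iteration count interact, the sketch below runs a textbook power-iteration PageRank on a small undirected graph. It is a generic illustration under stated assumptions, not BEERE's ranking code: edges are unweighted, the function name and toy edges are hypothetical, and `sigma` and `iterations` only borrow the parameter names from the text.

```python
def pagerank(edges, sigma=0.8, iterations=50):
    """Power-iteration PageRank on an undirected, unweighted graph.

    edges: iterable of (node_a, node_b) pairs.
    sigma: damping factor in [0, 1].
    iterations: number of update rounds.
    """
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    n = len(neighbors)
    score = {node: 1.0 / n for node in neighbors}
    for _ in range(iterations):
        updated = {}
        for node in neighbors:
            # Each neighbor shares its current score equally among its own neighbors.
            incoming = sum(score[nb] / len(neighbors[nb]) for nb in neighbors[node])
            updated[node] = (1.0 - sigma) / n + sigma * incoming
        score = updated
    return score


# Toy network; gene names are used only as labels.
edges = [("TP53", "EGFR"), ("TP53", "AKT1"), ("TP53", "PIK3CA"), ("EGFR", "PIK3CA")]
scores = pagerank(edges)
ranking = sorted(scores, key=scores.get, reverse=True)
```

In this toy graph the best-connected node ends up first in `ranking`; a production version would also weight edges by relationship quality.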

Rank entities from the network

The biomedical entity prioritization page provides a table and two visualization panels. The prioritization table has six columns: ‘entity name’, ‘in-expanded network’, ‘ranking score’, ‘rank’, ‘adjust P-value’ and ‘significance’. Each row is clickable and links to the entity information page, which shows the attributes of the biomedical entity and the relationships specific to the selected entity. In the visualization panels, two figures help users intuitively view the significant entities and the ranking score distribution. The word-cloud graph places the highly significant biomedical entities in the center in relatively larger fonts.

Visual exploration of network relationship data

The network visualization page provides an interactive graphical panel that lets users intuitively discover the critical entities and interactions with provenance. Three layout algorithms are provided: directed-force (default), DEMA and circular. The current view of the network can be exported as a PNG image, and the network’s edge and node information can be exported as an SVG file. The advanced function panel on the left offers a customized entity association input for adding grouping information; after the user clicks the ‘next’ button, BEERE visualizes the color-grouped nodes in the network and shows the grouping information table below the network graph. Edges are clickable: clicking an edge opens the right-side panel, which shows a table with the details of the entity relationship and its provenance.

CASE STUDY

While BEERE supports user analysis with either ‘a gene list as the input’ or ‘a term list as the input’ independently, we will demonstrate a more sophisticated case study in which genes and terms are analyzed in conjunction with each other. In this case study, a user is presumably interested in exploring all candidate genes for glioblastoma (GBM), an aggressive form of brain tumor with low patient survival, to understand which genes may be worth validating experimentally and whether there are additional candidate genes not yet curated in public databases.

To prepare the BEERE web-based data analysis, a BEERE user first performs a search for disease-specific candidate genes against the NCBI-hosted OMIM database (34), which contains disease–gene curations for more than 6300 disease phenotypes and more than 4000 genes. Upon search with the term ‘glioblastoma’, the OMIM database returned 241 entries, among which 200 are gene candidates. The user then saves the gene list (‘seed genes’ or the ‘seed’) to perform a ‘gene list as the input’ query against the BEERE web server (refer to Example B on the web server to obtain this gene list). Since the user is also interested in finding new gene candidates, the user can set the network expansion flag to ‘YES’, using HAPPI 2.0 PPI data with a quality filter of 0.75 (four-star rating) and above.

After the user confirms the mapping of matched gene symbols in BEERE’s database, BEERE shows (in the third step) a list of all the seed genes and network-expanded genes using the nearest-neighbor network consisting of PPIs from the HAPPI 2.0 database, default PPI quality control parameters and the default gene ranking algorithm. The expanded gene network has a high IOA of 99%, and the seed gene coverage in the network (SCN) is 87% (Table 2). The results show that seed gene ranking remains consistent regardless of whether the network is expanded, because the top-10 BEERE-ranked seed genes in the expanded network are also found in the top 10% of the non-expanded networks, except for PLK1 (Table 3). The complete gene ranks for the expanded and non-expanded networks are shown in Supplementary Tables S1–4. PLK1 ranks near the top because of its high connectivity to the expanded genes: about 250 of the 258 PLK1 interactors are expanded genes. Three of those expanded genes, UBC, APP and MYC, are statistically significant with P-values <0.05. These genes are also reported to be ubiquitously expressed in the brain (35). PLK1 is a Ser/Thr protein kinase gene belonging to the CDC5/Polo subfamily. It is highly expressed during mitosis, and elevated levels of PLK1 are found in many cancers, including glioblastoma (36,37). Fourteen expanded genes are statistically significant at P-values ≤0.05 (Figure 1). All 14 genes except EGF, a gene with more than 384 PubMed abstract co-citations with the term ‘glioblastoma’, are also ranked by ToppGene in the top 5.5% (Table 4). Our BEERE results also highlight the significance of performing biological entity expansions (expanding gene symbols to aliases and gene full names) to reduce false negatives.

Table 2.

The network quality control using different PPI cutoffs

Expanded	Yes (0.9)	No
PPI Cutoff	0.9	0.45	0.75	0.9
IOA	0.99 (1962/1984)	0.76 (130/172)	0.64 (110/172)	0.52 (90/172)
SCN	0.87 (150/172)	0.78 (134/172)	0.68 (117/172)	0.59 (102/172)
Interaction	6833	543	326	200

IOA: Index of Aggregation, SCN: Seed's Candidate Coverage in Network
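The fractions in Table 2 can be reproduced on a toy network with the sketch below. The formulas are assumptions inferred from the table's numerators and denominators rather than definitions given in this section: IOA is taken as the share of nodes in the largest connected component, and SCN as the share of seed genes that have at least one interaction. All function names and the toy data are hypothetical.

```python
from collections import deque


def largest_component_size(nodes, edges):
    """Size of the largest connected component, found by breadth-first search."""
    adjacency = {node: set() for node in nodes}
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)

    seen, best = set(), 0
    for start in adjacency:
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:
            node = queue.popleft()
            size += 1
            for nb in adjacency[node] - seen:
                seen.add(nb)
                queue.append(nb)
        best = max(best, size)
    return best


def index_of_aggregation(nodes, edges):
    """Assumed IOA: nodes in the largest component divided by all nodes."""
    return largest_component_size(nodes, edges) / len(nodes)


def seed_coverage(seeds, edges):
    """Assumed SCN: seeds with at least one interaction divided by all seeds."""
    connected = {node for edge in edges for node in edge}
    return len(connected & set(seeds)) / len(seeds)


# Toy data: four seeds, one expanded "bridge" node X, one isolated seed D.
seeds = ["A", "B", "C", "D"]
edges = [("A", "B"), ("B", "X"), ("X", "C")]
nodes = set(seeds) | {"X"}

ioa = index_of_aggregation(nodes, edges)  # 4 of 5 nodes are connected
scn = seed_coverage(seeds, edges)         # 3 of 4 seeds have an interaction
```

The same arithmetic matches the table entries, for example 1962/1984 rounds to 0.99 and 150/172 rounds to 0.87.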

Table 3.

The top-10 ranked seed genes in expanded network compared to the ranks in non-expanded networks

	5-star+Exp.		3-star		4-star		5-star
Seed Gene	Rank	P-value	Rank	P-value	Rank	P-value	Rank	P-value
TP53	1	0.00053	2	0.015	2	0.017	4	0.0028
EGFR	2	0.0011	4	0.03	4	0.034	5	0.0032
PIK3R1	3	0.0016	5	0.037	3	0.026	2	0.0012
AKT1	4	0.0021	3	0.022	6	0.051	8	0.0085
CTNNB1	5	0.0027	1	0.0075	1	0.0085	1	0.00096
PIK3CA	6	0.0032	9	0.067	5	0.043	3	0.0027
PLK1	7	0.0037	31	0.23	21	0.18	12	0.036
RAC1	8	0.0043	11	0.082	10	0.085	6	0.0053
RB1	9	0.0048	10	0.075	9	0.077	7	0.0065
CCND1	10	0.0053	7	0.052	7	0.06	11	0.019
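The consistency claim made in the case study (the top-10 seed genes of the expanded network stay within the top 10% of the non-expanded rankings, except PLK1) can be checked directly from the ranks in Table 3. The sketch assumes that the 10% threshold is taken over the 172 matched seed genes reported in Table 2, which gives a cutoff rank of 17; that denominator is an assumption, not something the table states.

```python
import math

# Ranks copied from Table 3: (3-star, 4-star, 5-star) non-expanded networks.
non_expanded_ranks = {
    "TP53": (2, 2, 4), "EGFR": (4, 4, 5), "PIK3R1": (5, 3, 2),
    "AKT1": (3, 6, 8), "CTNNB1": (1, 1, 1), "PIK3CA": (9, 5, 3),
    "PLK1": (31, 21, 12), "RAC1": (11, 10, 6), "RB1": (10, 9, 7),
    "CCND1": (7, 7, 11),
}

MATCHED_SEEDS = 172  # assumption: denominator taken from Table 2
cutoff_rank = math.floor(0.10 * MATCHED_SEEDS)

# Genes that fall outside the top 10% in at least one non-expanded network.
outside_top_10_percent = sorted(
    gene for gene, ranks in non_expanded_ranks.items()
    if any(rank > cutoff_rank for rank in ranks)
)
```

Under this assumption only PLK1 is flagged, and only in the 3-star and 4-star networks; its 5-star rank of 12 is inside the cutoff.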

Figure 1.

The pipeline overview of the conjunction analysis for glioblastoma (GBM) genetic candidate discovery. In the first part, BEERE offers a ranked list of critical seed and expanded genes with statistical significance using the expanded network analysis. (A) The input is the 200 candidate genes downloaded from the OMIM database. (B) In BEERE quality control, BEERE automatically maps the queried genes to HAPPI database gene symbols. After verifying the 172 genes matched to the HAPPI database, BEERE returns 6833 PPIs passing the PPI cutoff. The network quality is good, with an Index of Aggregation of 0.99 and a Seed’s Coverage in Network of 0.87. (C) BEERE generates the gene ranks. About 87 seed genes and 14 expanded genes are statistically significant. In the second part, BEERE reveals the critical mechanisms using comprehensive term mapping, heterogeneous network analysis and term ranking. (D) About 28 biomedical entities, comprising genes, aliases and disease terms, are the input of the network meta-analysis. (E) The term ranking score distribution and term rank word-cloud intuitively show that important entities such as epidermal growth factors, amyloid genes, ubiquitin genes and tyrosine genes are tightly related to glioblastoma. The provenance of a gene-to-glioblastoma relationship is displayed on the selected edge; for example, APP affects glioblastoma with one supporting article. Clicking the entry displays the PMID and a link to the external source, along with the details of the relationship.

Table 4.

The expanded genes validation using the PubMed article term-to-term co-citations and network semantic relationship validation

Gene	Search term	BEERE Top rank	P-value	ToppGene rank	ToppGene Normalized rank	PubMed Initial count	PubMed Extended count	Network validation	Literatures	PMID
UBC	UBC or ubiquitin or ubiquitin C	1 (33)	0.017	116	32	0	137	Augments	1	27766591
APP	APP or amyloid or amyloid beta	2 (54)	0.028	1	1	23	61	Affects	1	15302999
MYC	MYC or c-myc or myc proto-oncogene	3 (69)	0.035	17	5	300	300	Augments	1	26993778
HDAC1	HDAC1 or Histone decacetylase	4 (74)	0.038	46	13	12	12	Augments	1	27766591
SUMO1	SUMO1 or ubiquitin	5 (75)	0.038	373	102	3	138	Augments	1	27766591
SRC	SRC or Tyrosine-Protein Kinase	6 (76)	0.039	31	9	165	167	Affects	6	3146045|15994925|15618223|20947248|19098899|25048528
ABL1	ABL1 or Tyrosine-Protein Kinase	7 (83)	0.042	142	39	4	10	Augments	1	23383209
FYN	FYN or Tyrosine-protein kinase	8 (85)	0.043	93	26	17	23	Affects	1	15994925
PCNA	PCNA or Proliferating Cell Nuclear Antigen	9 (87)	0.044	166	45	93	108	Indirectly affect	-	-
EP300	EP300 or E1A Binding Protein P300	10 (88)	0.045	52	15	3	5	Affects	2	21489305|26722247
GRB2	GRB2 or Growth Factor Receptor Bound Protein	11 (90)	0.046	22	6	14	14	Indirectly affect	-	-
CREBBP	CREBBP or CREB Binding Protein	12 (92)	0.047	83	23	1	4	Indirectly affect	-	-
EGF	EGF or Epidermal Growth Factor	13 (93)	0.047	5146	1395	384	1419	Produces	1	3011820
ESR1	ESR1 or Estrogen Receptor 1	14 (96)	0.049	34	10	5	6	Produces	1	20841389

To investigate whether the ‘new’ candidate genes discovered from BEERE gene-based analysis are valid, a user may continue the web-based analysis by switching to the BEERE term-based analysis section, using terms including ‘glioblastoma’ and each of the 14 new candidate genes. After term expansion and ranking, BEERE helps users construct a network in the last analysis step (the fifth step) for visual exploration of the heterogeneous biomedical entity-to-entity interaction network, which consists of both disease terms and gene symbols. BEERE-generated semantic predications help users validate co-cited gene-to-disease pairs (Table 4) given the options of SemMed V30 and ‘fuzzy matching’. BEERE-generated ranks of the biomedical terms reveal that the epidermal growth factors, amyloid genes, ubiquitin genes and tyrosine genes are tightly related to glioblastoma. Further, exploring each candidate gene’s mechanisms of action in glioblastoma shows that 11 out of 14 genes have a direct effect on glioblastoma. Among them, five genes (UBC, MYC, HDAC1, SUMO1 and ABL1) augment glioblastoma; four genes (APP, SRC, FYN and EP300) affect glioblastoma; and two genes, EGF and the estrogen receptor gene ESR1, produce glioblastoma (38,39). All the above relationships can be explored visually within the network and are shown in the detailed HTML tables next to the network graph on the web server, which reveal the underlying PubMed articles referred to by the predications. Interestingly, among the three genes (CREBBP, PCNA and GRB2) without direct relationships found within SemMedDB-extracted predications, each gene connects to six (for CREBBP), seven (for PCNA) and six (for GRB2) existing genes, respectively. This suggests that these genes are strong candidates for further investigation of their molecular mechanistic links to glioblastoma.
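The gene counts in the paragraph above can be cross-checked against the ‘Network validation’ column of Table 4 with a short tally. The pairs below are copied from that column; the variable names are illustrative.

```python
from collections import defaultdict

# (gene, predicate) pairs copied from the "Network validation" column of Table 4.
network_validation = [
    ("UBC", "Augments"), ("APP", "Affects"), ("MYC", "Augments"),
    ("HDAC1", "Augments"), ("SUMO1", "Augments"), ("SRC", "Affects"),
    ("ABL1", "Augments"), ("FYN", "Affects"), ("PCNA", "Indirectly affect"),
    ("EP300", "Affects"), ("GRB2", "Indirectly affect"),
    ("CREBBP", "Indirectly affect"), ("EGF", "Produces"), ("ESR1", "Produces"),
]

# Group genes by predicate, preserving table order.
genes_by_predicate = defaultdict(list)
for gene, predicate in network_validation:
    genes_by_predicate[predicate].append(gene)

# Genes with a direct predicate linking them to glioblastoma.
direct = [gene for gene, predicate in network_validation
          if predicate != "Indirectly affect"]
```

The tally gives 11 directly related genes (five ‘Augments’, four ‘Affects’, two ‘Produces’) and three indirectly related ones.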

Furthermore, we believe the case study above demonstrates that BEERE serves not only as a tool for gene-based or term-based network expansion, ranking and exploration separately, but also as an iterative analysis platform for users who switch between biomedical entity relationship explorations and phenotypically significant gene network explorations. For example, having obtained the above results, a user may enter ‘gene-based’ analysis again, using the 200 previously OMIM-curated genes and the 14 newly discovered candidate genes, to explore a gene-to-gene association network without network expansion. Such ranking may shed additional light on the relative significance of all candidate genes. Networks of diseases, genes and drugs may also be expanded, ranked and explored iteratively within the BEERE tool.

DISCUSSION

BEERE is a new web-based data analysis tool that helps biomedical researchers characterize any input list of genes/proteins, biomedical terms or their combinations against databases containing gene-to-gene relationships and semantic term-to-term relationships. We developed BEERE first to help users examine whether there is credible biological evidence of gene-to-gene associative relationships or term-to-term semantic relationships within a user-supplied gene or term list. This is an important first step towards the interpretation of high-throughput Omics sequencing data or manually curated biological entities for hypothesis-driven research. Moreover, using the entire collection of biomedical entity-to-entity relationship pairs, we demonstrated that BEERE can help users uncover the relative importance of each entity within the list, visually explore the constructed global entity relationship network, and examine different types of relationship pairs to trace biomedical entity relationships of interest back to the original cited PubMed articles. We demonstrated that, with its current set of features, BEERE could accelerate biomedical mechanistic studies downstream of Omics analysis or initial curation when parameters are set properly. We envision that biological users of BEERE could use the entity expansion, ranking and exploration features iteratively to examine gene-to-gene, gene-to-disease, gene-to-drug, gene-to-risk-factor and many other types of biomedical entity relationships for their research. With ongoing database updates and expansion of database coverage to include additional content from sources such as gene sets and gene signatures, e.g. PAGER (40) and GeneSigDB (41), we expect BEERE to become a useful web service for the biomedical research community.

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online.

FUNDING

University of Alabama at Birmingham (UAB) Informatics Institute; UAB Academic Enrichment Fund (to J.C.); Center for Clinical and Translational Science of the University of Alabama at Birmingham [1TL1TR001418–01 to J.C.]; National Center for the Advancement of Translational Science (NCATS); National Cancer Institute [U01CA223976 to J.C., C.D.W., A.B.H.]. Source of open access charge: University of Alabama at Birmingham (UAB) Informatics Institute; UAB Academic Enrichment Fund (to J.C.); Center for Clinical and Translational Science of the University of Alabama at Birmingham [1TL1TR001418–01 to J.C.]; National Center for the Advancement of Translational Science (NCATS); National Cancer Institute [U01CA223976 to J.C., C.D.W., A.B.H.].

Conflict of interest statement. None declared.

REFERENCES

1. Chen J.Y., Shen, Sivachenko A.Y. Mining Alzheimer disease relevant proteins from integrated protein interactome data. Pac. Symp. Biocomput. 2006; 367–378.

2. Guala, Sonnhammer E.L.L. A large-scale benchmark of gene prioritization methods. Sci. Rep. 2017; 46598.

3. Nitsch, Tranchevent L.C., Goncalves J.P., Vogt J.K., Madeira S.C., Moreau. PINTA: a web server for network-based gene prioritization from expression data. Nucleic Acids Res. 2011; W334–W338.

4. Chen, Bardes E.E., Aronow B.J., Jegga A.G. ToppGene Suite for gene list enrichment analysis and candidate gene prioritization. Nucleic Acids Res. 2009; W305–W311.

5. Adie E.A., Adams R.R., Evans K.L., Porteous D.J., Pickard B.S. SUSPECTS: enabling fast and effective prioritization of positional candidates. Bioinformatics 2006; 773–774.

6. Wulf, Liu, Khoury M.J., Gwinn. Gene Prospector: an evidence gateway for evaluating potential susceptibility genes and interacting risk factors for human diseases. BMC Bioinformatics 2008; 528.

7. Tranchevent L.C., Barriot, Van Vooren, Van Loo, Coessens, De Moor, Aerts, Moreau. ENDEAVOUR update: a web resource for gene prioritization in multiple species. Nucleic Acids Res. 2008; W377–W384.

8. Lupski J.R., Reid J.G., Gonzaga-Jauregui, Rio Deiros, Chen D.C., Nazareth, Bainbridge, Dinh, Jing, Wheeler D.A. et al. Whole-genome sequencing in a patient with Charcot-Marie-Tooth neuropathy. N. Engl. J. Med. 2010; 362: 1181–1191.

9. Doncheva N.T., Kacprowski, Albrecht. Recent approaches to the prioritization of candidate disease genes. Wiley Interdiscip. Rev. Syst. Biol. Med. 2012; 429–442.

10. Bornigen, Tranchevent L.C., Bonachela-Capdevila, Devriendt, De Moor, De Causmaecker, Moreau. An unbiased evaluation of gene prioritization tools. Bioinformatics 2012; 3081–3088.

11. Moreau, Tranchevent L.C. Computational tools for prioritizing candidate genes: boosting disease gene discovery. Nat. Rev. Genet. 2012; 523–536.

12. Oti, Ballouz, Wouters M.A. Web tools for the prioritization of candidate disease genes. Methods Mol. Biol. 2011; 760: 189–206.

13. Piro R.M., Di Cunto. Computational approaches to disease-gene prediction: rationale, classification and successes. FEBS J. 2012; 279: 678–696.

14. Tranchevent L.C., Capdevila F.B., Nitsch, De Moor, De Causmaecker, Moreau. A guide to web tools to prioritize candidate genes. Brief Bioinform. 2011.

15. Szklarczyk, Morris J.H., Cook, Kuhn, Wyder, Simonovic, Santos, Doncheva N.T., Roth, Bork et al. The STRING database in 2017: quality-controlled protein-protein association networks, made broadly accessible. Nucleic Acids Res. 2017; D362–D368.

16. Chen J.Y., Pandey, Nguyen T.M. HAPPI-2: a comprehensive and high-quality map of human annotated and predicted protein interactions. BMC Genomics 2017; 182.

17. Isik, Baldow, Cannistraci C.V., Schroeder. Drug target prioritization by perturbed gene expression and network information. Sci. Rep. 2015; 17417.

18. Sivachenko A.Y., Yuryev. Pathway analysis software as a tool for drug target selection, prioritization and validation of drug mechanism. Expert Opin. Ther. Targets 2007; 411–421.

19. Yue, Arora, Zhang E.Y., Laufer, Bridges S.L., Chen J.Y. Repositioning drugs by targeting network modules: a Parkinson's disease case study. BMC Bioinformatics 2017; 532.

20. Denecke. Semantic structuring of and information extraction from medical documents using the UMLS. Methods Inf. Med. 2008; 425–434.

21. Burger, Abu-Hanna, de Keizer, Cornet. Natural language processing in pathology: a scoping review. J. Clin. Pathol. 2016; 949–955.

22. Matthies, Hahn. Scholarly information extraction is going to make a quantum leap with pubmed central (PMC). Stud. Health Technol. Inform. 2017; 245: 521–525.

23. Yang, Robinson P.N., Wang. Phenolyzer: phenotype-based prioritization of candidate genes for human diseases. Nat. Methods 2015; 841–843.

24. Song, Kim, Lee G.G., B.K. POSBIOTM-NER: a trainable biomedical named-entity recognition system. Bioinformatics 2005; 2794–2796.

25. Wang, Zhang, Ren, Zhang, Zitnik, Shang, Langlotz, Han. Cross-type biomedical named entity recognition with deep multi-task learning. Bioinformatics 2018; 1745–1752.

26. Zhao, Yang, Luo, Wang, Zhang, Lin, Wang. Disease named entity recognition from biomedical literature using a novel convolutional neural network. BMC Med. Genomics 2017.

27. Lee, Kim, Lee, Choi, Kim, Jeon, Lim, Choi, Kim, Tan A.C. et al. BEST: next-generation biomedical entity search tool for knowledge discovery from biomedical literature. PLoS One 2016; e0164680.

28. Brown G.R., Hem, Katz K.S., Ovetsky, Wallin, Ermolaeva, Tolstoy, Tatusova, Pruitt K.D., Maglott D.R. et al. Gene: a gene-centered information resource at NCBI. Nucleic Acids Res. 2015; D36–D42.

29. McInnes B.T., Pedersen, Carlis. Using UMLS Concept Unique Identifiers (CUIs) for word sense disambiguation in the biomedical domain. AMIA Annu. Symp. Proc. 2007; 2007: 533–537.

30. Kilicoglu, Shin, Fiszman, Rosemblat, Rindflesch T.C. SemMedDB: a PubMed-scale repository of biomedical semantic predications. Bioinformatics 2012; 3158–3160.

31. Liu, Bill, Fiszman, Rindflesch, Pedersen, Melton G.B., Pakhomov S.V. Using SemRep to label semantic relations extracted from clinical text. AMIA Annu. Symp. Proc. 2012; 2012: 587–595.

32. Wei C.H., Kao H.Y. PubTator: a web-based text mining tool for assisting biocuration. Nucleic Acids Res. 2013; W518–W522.

33. Cairelli M.J., Miller C.M., Fiszman, Workman T.E., Rindflesch T.C. Semantic MEDLINE for discovery browsing: using semantic predications and the literature-based discovery paradigm to elucidate a mechanism for the obesity paradox. AMIA Annu. Symp. Proc. 2013; 2013: 164–173.

34. Amberger J.S., Bocchini C.A., Schiettecatte, Scott A.F., Hamosh. OMIM.org: online mendelian inheritance in man (OMIM(R)), an online catalog of human genes and genetic disorders. Nucleic Acids Res. 2015; D789–D798.

35. Fishilevich, Zimmerman, Kohn, Iny Stein, Olender, Kolker, Safran, Lancet. Genic insights from integrated human proteomics in GeneCards. Database 2016; 2016: baw030.

36. Lerner R.G., Grossauer, Kadkhodaei, Meyers, Sidorov, Koeck, Hashizume, Ozawa, Phillips J.J., Berger M.S. et al. Targeting a Plk1-controlled polarity checkpoint in therapy-resistant glioblastoma-propagating cells. Cancer Res. 2015; 5355–5366.

37. O’Leary N.A., Wright M.W., Brister J.R., Ciufo, Haddad, McVeigh, Rajput, Robbertse, Smith-White, Ako-Adjei et al. Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation. Nucleic Acids Res. 2016; D733–D745.

38. Liu, Sareddy G.R., Zhou, Viswanadhapalli, Lai, Tekmal R.R., Brenner, Vadlamudi R.K. Differential Effects of Estrogen Receptor β Isoforms on Glioblastoma Progression. Cancer Res. 2018; 3176–3189.