WormBase: new content and better access

Abstract

WormBase (Author Webpage), a model organism database for Caenorhabditis elegans and other related nematodes, continues to evolve and expand. Over the past year WormBase has added new data on C.elegans, including data on classical genetics, cell biology and functional genomics; expanded the annotation of closely related nematodes with a new genome browser for Caenorhabditis remanei; and deployed new hardware for stronger performance. Several existing datasets including phenotype descriptions and RNAi experiments have seen a large increase in new content. New datasets such as the C.remanei draft assembly and annotations, the Vancouver Fosmid library and TEC-RED 5′ end sites are now available as well. Accessing and searching WormBase have become more dependable and flexible via multiple mirror sites and indexing through Google.

DESCRIPTION

Much of our understanding of biological processes and human biology comes from a vast amount of data generated in the study of a few model organisms. The National Human Genome Research Institute (NHGRI; Author Webpage) in the United States and the British Medical Research Council have funded model organism databases (MODs) to store and integrate information for individual model organisms. WormBase is one of these MODs for the small, soil nematode, Caenorhabditis elegans. WormBase originated in 2001 and initially was simply a web-based interface for its predecessor, ACeDB (1,2) (Author Webpage), which contained primarily the genetic and physical maps, and the genome sequence of C.elegans. Over the years, WormBase has evolved and expanded through developing and utilizing better software, and providing richer content (3) by bringing in data from classical genetics, cell biology and functional genomics (4,5), and the genomic sequences of other closely related nematodes (6,7).

WormBase is maintained by the International WormBase Consortium, a group of ∼30 scientists located at four sites (Author Webpage). The data are organized into a single database that is available for downloading (Author Webpage) and web browsing. Access to, and use of, the database and resources are freely available, subject to an Acceptable Use Policy (Author Webpage) and copyrights (Author Webpage). There is also a User's Guide (Author Webpage), an evolving Wiki site (Author Webpage) and an email-based Help Desk ([email protected]).

WormBase is not a static database. It is constantly growing with new data continually being added and made available to users through new builds and releases of the database every three weeks. Every 10th release is maintained as a permanently available, stable data source for reference. This review is an overview of major new content, software and performance improvements, and how the user community helps to focus efforts of WormBase.
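The release cadence described above can be illustrated with a short sketch. It assumes that releases are numbered consecutively (the text mentions WS160) and that the permanently archived releases are those whose number is a multiple of 10; the text only says "every 10th release", so that numbering rule and the function names are assumptions made for illustration.

```python
WEEKS_PER_RELEASE = 3   # from the text: a new release every three weeks
FROZEN_INTERVAL = 10    # from the text: every 10th release is kept permanently


def is_frozen_release(number: int) -> bool:
    """Return True if the release would be archived (assumes multiples of 10)."""
    return number % FROZEN_INTERVAL == 0


def next_frozen_release(number: int) -> int:
    """Return the first archived release at or after the given release number."""
    remainder = number % FROZEN_INTERVAL
    return number if remainder == 0 else number + (FROZEN_INTERVAL - remainder)


def weeks_until_next_frozen(number: int) -> int:
    """Estimate the weeks from a release to the next archived release."""
    return (next_frozen_release(number) - number) * WEEKS_PER_RELEASE


if __name__ == "__main__":
    for release in (160, 161):
        print(release, is_frozen_release(release), weeks_until_next_frozen(release))
```

Under these assumptions a user citing data from an intermediate release can work out which archived release to reference for long-term reproducibility.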

NEW CONTENT

WormBase's content has doubled in the last three years as measured by the size of its underlying database (Figure 1). This growth is expected to increase as research using C.elegans expands and more nematode genome sequencing projects are completed.

Figure 1

The increase in content of WormBase as measured by the size of the underlying database. Plotted are the sizes in gigabytes of the archived releases of the WormBase database.

Additions to existing data types

Phenotypes

Over the past year one of the largest percentage increases in an existing data type has been for phenotype objects. There have been major changes in phenotype curation, including a new and expanded Phenotype Ontology, the evaluation and consolidation of preexisting phenotype classes, and improvements in phenotype curation methods. All of these allow more accurate and detailed phenotype information to be stored in WormBase, and make querying for phenotype-related information more efficient. Consequently, the number of phenotype objects in WormBase has increased dramatically over the past year from 119 to 1282.

Gene Ontology

Gene products annotated with Gene Ontology (GO) terms are another existing data type to see a large percentage increase. WormBase is a member of the Gene Ontology Consortium (8), and is developing GO content and annotating C.elegans genes with GO terms. Currently there are over 50 000 GO annotations for over 10 000 genes in WormBase. Most of these are automated mappings, although there are curator-performed, manual annotations for ∼700 genes with over 800 unique papers cited as references. The automatic GO annotations are of two types. The first is the automated assignment of GO terms to genes based on the RNAi-knockdown phenotype. For example, an RNAi phenotype of Egl (egg-laying defective) is automatically assigned the GO biological process term ‘oviposition’. These assignments are based on manual mapping of phenotypes to GO terms from large-scale RNAi screens. The second type of automated GO annotation uses the InterPro2GO mappings (Author Webpage) provided by the European Bioinformatics Institute (Author Webpage). These are GO term assignments based on protein family, domain, repeat, etc. WormBase has recently begun determining InterPro domains with each build to ensure that they are completely up to date. In WormBase the GO annotations can now be found in a Gene Ontology section on the Gene page, which is the most common entry point for users (Supplementary Figure 1a). A summary of all the GO annotations (Supplementary Figure 1b) is reached through the GO summary link in the ‘Gene Ontology’ section. Details for individual annotations (Supplementary Figure 1c) can also be obtained through links in the section.
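The first type of automated annotation, from RNAi phenotype to GO term, amounts to a lookup in a manually curated table. The following is a minimal sketch of that idea, not the WormBase pipeline itself: only the Egl to ‘oviposition’ pair comes from the text, while the class, function and gene names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List

# Manually curated phenotype -> GO biological process term table.
# Only the Egl entry is taken from the text; a real table would be far larger.
PHENOTYPE_TO_GO: Dict[str, str] = {
    "Egl": "oviposition",
}


@dataclass(frozen=True)
class GoAnnotation:
    gene: str      # gene targeted by the RNAi experiment
    go_term: str   # GO term assigned automatically
    evidence: str  # where the assignment came from


def annotate_from_rnai(
    gene: str,
    phenotypes: Iterable[str],
    mapping: Dict[str, str] = PHENOTYPE_TO_GO,
) -> List[GoAnnotation]:
    """Assign GO terms to a gene from its observed RNAi phenotypes.

    Phenotypes without an entry in the curated table are skipped.
    """
    annotations = []
    for phenotype in phenotypes:
        term = mapping.get(phenotype)
        if term is not None:
            annotations.append(
                GoAnnotation(gene, term, f"RNAi phenotype {phenotype}")
            )
    return annotations


if __name__ == "__main__":
    # "example-gene" and the unmapped phenotype "Xyz" are placeholders.
    for annotation in annotate_from_rnai("example-gene", ["Egl", "Xyz"]):
        print(annotation)
```

Keeping the curated table separate from the assignment step means that curators can refine the phenotype-to-term mapping without touching the code that applies it at each build.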

RNAi

RNAi data have also increased significantly over the past year, specifically from papers that describe individual RNAi experiments. Curation of these experiments is slow and labor intensive compared with papers containing large-scale datasets. Nonetheless, there has been a 150% increase in the individual RNAi experiments in WormBase. There are currently 2754 such experiments. Including curation of large-scale datasets, there are 63 740 total, curated RNAi experiments in WormBase, compared with 59 882 a year ago. There is still more to do, however. Approximately 45% of 871 C.elegans papers that contain RNAi data have been curated, compared with just over 20% a year ago. RNAi results are summarized in the Function section of the Gene page (Supplementary Figure 2a). A link to view all RNAi experiments that target the gene takes the user to a table (Supplementary Figure 2b) containing the details and results of each RNAi experiment.
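The figures quoted above can be cross-checked with simple arithmetic. The sketch below uses only numbers from the text; the helper names are illustrative, and because the text gives rounded percentages ("150%", "approximately 45%"), the derived values are approximate.

```python
def percent_increase(old: float, new: float) -> float:
    """Percentage growth from an old value to a new value."""
    return (new - old) / old * 100.0


def value_before_increase(new: float, percent: float) -> float:
    """Value a year ago implied by the current value and a percentage increase."""
    return new / (1.0 + percent / 100.0)


if __name__ == "__main__":
    # Total curated RNAi experiments: 59 882 a year ago, 63 740 now.
    print(f"total growth: {percent_increase(59882, 63740):.1f}%")
    # Individual experiments: 2754 now after a 150% increase.
    print(f"individual experiments a year ago: ~{value_before_increase(2754, 150):.0f}")
    # Approximately 45% of 871 RNAi papers have been curated.
    print(f"curated papers: ~{0.45 * 871:.0f}")
```

The comparison shows that the total grew by only a few percent while the individually curated experiments more than doubled, which is consistent with the emphasis on paper-by-paper curation.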

Microarray data

The June 2006 database release (WS160) has microarray data from 33 papers describing 417 experiments, which represents almost a doubling in the number of curated, microarray papers and a 78% increase in the number of curated experiments in WormBase over the past year.

To support microarray research in the worm community, the probe sets for two new microarray platforms available from Agilent (Author Webpage) and Washington University's Genome Sequencing Center (Author Webpage) are now in WormBase. These are in addition to the Affymetrix chip data previously available in WormBase. All of these data can be downloaded using the WormBase data-mining tool, WormMart (Author Webpage). The probes are mapped to the genome sequence and displayed as a track on the Genome Browser (9) (Supplementary Figure 3a). They are also mapped to corresponding gene models (Supplementary Figure 3b) with detailed reports (Supplementary Figure 3c) linked to the Gene page.

Intellectual lineage

The Intellectual Lineage was conceived as a way to indicate the heritage of the C.elegans research community. The types of connections a person can have are as follows: Supervised, Supervised by and Collaborated with. There are 1461 people who have connections to other people, with 3729 person-to-person connections. This is up from 1030 people and 2626 person-to-person connections a year ago. The lineage for an individual can be found on their Person page (Supplementary Figure 4), while the global view of the entire worm community lineage is available at WormBase (Author Webpage).
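The three connection types suggest a simple labelled graph. The sketch below is one possible representation, assuming that each connection is stored together with its inverse so that both Person pages show it; that storage choice, the class name and the placeholder people are assumptions rather than a description of the WormBase data model.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Each relation and the relation seen from the other person's side.
INVERSE = {
    "Supervised": "Supervised by",
    "Supervised by": "Supervised",
    "Collaborated with": "Collaborated with",
}


class Lineage:
    """Person-to-person connections labelled with one of three relations."""

    def __init__(self) -> None:
        self._links: Dict[str, Set[Tuple[str, str]]] = defaultdict(set)

    def connect(self, person: str, relation: str, other: str) -> None:
        """Record a connection and its inverse on the other person."""
        if relation not in INVERSE:
            raise ValueError(f"unknown relation: {relation!r}")
        self._links[person].add((relation, other))
        self._links[other].add((INVERSE[relation], person))

    def connections(self, person: str) -> List[Tuple[str, str]]:
        """Return the sorted (relation, other person) pairs for one person."""
        return sorted(self._links.get(person, ()))

    def people(self) -> List[str]:
        """Return everyone who has at least one connection."""
        return sorted(p for p, links in self._links.items() if links)


if __name__ == "__main__":
    lineage = Lineage()
    lineage.connect("Person A", "Supervised", "Person B")
    lineage.connect("Person B", "Collaborated with", "Person C")
    print(lineage.connections("Person B"))
```

Storing the inverse at insertion time keeps the lookup for a single Person page trivial, at the cost of recording each connection twice.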

New datasets

Caenorhabditis remanei annotations

A preliminary, draft assembly and automated annotations of the C.remanei genome are now available on a Genome Browser (Author Webpage) and can be downloaded from the WormBase ftp site (Author Webpage). Annotations include gene predictions from several ab initio and alignment-based algorithms. Sequence improvement is ongoing at the Washington University Genome Sequencing Center with a final draft assembly and new annotations anticipated early in 2007. A summary of the best BLASTP match to C.remanei is now displayed on C.elegans and Caenorhabditis briggsae Gene pages (Supplementary Figure 5).

Vancouver fosmids

The fosmid library constructed and mapped to the genome by Don Moerman and colleagues at the C.elegans Reverse Genetics Core Facility located at the University of British Columbia is now available on the Genome Browser in the YAC, Fosmids and Cosmids track (Supplementary Figure 6) and searchable in WormBase. This library was made to supplement the aging cosmid and YAC libraries used in the original mapping and sequencing of the genome, some of which are now over 20 years old, and are more difficult to work with than fosmids.

TEC-RED

This relatively new technique involves trans-spliced exon coupled 5′ RNA end determination, which identifies 5′ ends of expressed genes in nematodes [details of the procedure can be found in reference (10)]. The TEC-RED data of Hwang et al. (10) are now in WormBase and can be viewed in the Genome Browser (Supplementary Figure 7). These data have been used extensively to update numerous gene structures.

IMPROVED ACCESS AND USABILITY

Fast and robust access

Community use of WormBase continues to grow. There are now over 2 million page hits per month, a 40% increase over the past year, with the Gene page being the most frequently accessed page. This increase prompted the deployment of a multiply-redundant, load-balancing server system, which now distributes requests across three back-end servers, each capable of serving the complete WormBase site. Two additional servers handle specialized tasks such as BLAST requests and query pages. There is also robust caching that delivers the most frequently requested pages from a high-speed, in-memory and on-disk cache. These improvements have resulted in dramatic increases in reliability and performance. Increased demand in the future can easily be accommodated by inserting low-cost machines into the existing infrastructure.
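The combination of load balancing and page caching can be sketched in a few lines. This is an illustration only: the text does not state the balancing policy, so round-robin is an assumption, the cache here is a plain in-memory dictionary with no eviction or on-disk tier, and all names are hypothetical.

```python
from itertools import cycle
from typing import Callable, Dict, List

Backend = Callable[[str], str]  # takes a request path, returns the page body


class CachingLoadBalancer:
    """Serve pages from a cache, falling back to back-ends in round-robin order."""

    def __init__(self, backends: List[Backend]) -> None:
        if not backends:
            raise ValueError("at least one back-end is required")
        self._backends = cycle(backends)
        self._cache: Dict[str, str] = {}

    def get(self, path: str) -> str:
        # Cached pages are returned without touching any back-end.
        if path in self._cache:
            return self._cache[path]
        backend = next(self._backends)
        page = backend(path)
        self._cache[path] = page
        return page


def make_backend(name: str) -> Backend:
    """Create a stand-in back-end that labels the pages it serves."""
    def serve(path: str) -> str:
        return f"{name} served {path}"
    return serve


if __name__ == "__main__":
    balancer = CachingLoadBalancer([make_backend(f"backend-{i}") for i in (1, 2, 3)])
    for path in ("/gene/a", "/gene/b", "/gene/a"):
        print(balancer.get(path))
```

Because every back-end can serve the complete site, adding capacity in this model only means appending another back-end to the list, which mirrors the point about inserting low-cost machines.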

Access to the data at WormBase is becoming more diverse and flexible. In addition to the main WormBase site located at the Cold Spring Harbor Laboratory (Author Webpage), there are now two new mirror sites: one at the Wellcome Trust Sanger Institute in England (Author Webpage) and one at the University of Marseille in France (Author Webpage) that provide flexibility and backup if the main server experiences problems. This increases the total number of mirror sites to four with the existing sites at the California Institute of Technology (Author Webpage) and the Institute of Molecular Biology and Biochemistry in Greece (Author Webpage). There are links to all the mirror sites and the main and development site on the WormBase homepage (Supplementary Figure 8).

Another portal to WormBase is through WormBook (Author Webpage). WormBook content is extensively linked to WormBase with Genes, Proteins and Cells linked to the relevant pages in WormBase. Links from WormBase back to WormBook have recently been implemented to provide users easy access to background and in-depth information.

Improved searching

More recently, access to WormBase can be gained through search engines such as Google, which are now able to index the site. Previously their access had been restricted due to load issues on the old, single server infrastructure. Researchers worldwide can now search the vast content of highly curated data without having to actually visit the WormBase site. Google searches can be limited to WormBase results by using the format ‘site: Author Webpage [search terms]’. Google's page caching also provides an additional layer of redundancy during unexpected server interruptions at the WormBase site.
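A site-restricted query of the kind described above can be assembled programmatically. In this sketch the domain is a parameter because the site address is not reproduced in the text; `example.org` is a placeholder, not the WormBase address, and the function names are illustrative.

```python
from urllib.parse import urlencode


def site_query(domain: str, terms: str) -> str:
    """Build a search string restricted to one site with the site: operator."""
    return f"site:{domain} {terms}"


def google_search_url(domain: str, terms: str) -> str:
    """Build a Google search URL for a site-restricted query."""
    return "https://www.google.com/search?" + urlencode({"q": site_query(domain, terms)})


if __name__ == "__main__":
    # example.org stands in for the real site address.
    print(site_query("example.org", "mec-14"))
    print(google_search_url("example.org", "mec-14"))
```

Note that the operator is written without a space between `site:` and the domain, which is how search engines expect it.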

Searching for strains at the Caenorhabditis Genetics Center (CGC) has been improved and is now hosted at WormBase. In addition to searching by strain name, users may now search for strains carrying a specific gene or allele, or other information found in the strain class (i.e. species, mutagen, data received and remarks).
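The broader strain search can be pictured as filtering strain records on any combination of fields. The sketch below is a simplified model under stated assumptions: the `Strain` fields follow the attributes named in the text, but the record layout, the function signature and the example strains and allele names are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Optional


@dataclass
class Strain:
    name: str
    species: str = ""
    genes: List[str] = field(default_factory=list)
    alleles: List[str] = field(default_factory=list)
    mutagen: str = ""
    remarks: str = ""


def search_strains(
    strains: Iterable[Strain],
    *,
    name: Optional[str] = None,
    gene: Optional[str] = None,
    allele: Optional[str] = None,
    species: Optional[str] = None,
    mutagen: Optional[str] = None,
    remark: Optional[str] = None,
) -> List[Strain]:
    """Return strains matching every criterion that was supplied."""
    def matches(s: Strain) -> bool:
        return (
            (name is None or s.name == name)
            and (gene is None or gene in s.genes)
            and (allele is None or allele in s.alleles)
            and (species is None or s.species == species)
            and (mutagen is None or s.mutagen == mutagen)
            # Remarks are free text, so match case-insensitive substrings.
            and (remark is None or remark.lower() in s.remarks.lower())
        )
    return [s for s in strains if matches(s)]


if __name__ == "__main__":
    # Hypothetical records; strain and allele names are placeholders.
    collection = [
        Strain("XX1", "C. elegans", ["mec-14"], ["xx100"], "EMS", "Touch insensitive."),
        Strain("XX2", "C. elegans", ["mec-14"], ["xx200"], "UV", "Weak phenotype."),
    ]
    print([s.name for s in search_strains(collection, gene="mec-14", mutagen="EMS")])
```

Supplying several keyword arguments narrows the result, since a strain must satisfy all of them.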

Improved data presentation

Data layout and access have become more intuitive with new or updated web pages. Information from the Protein Data Bank's TargetDB (11) (Author Webpage), which provides status and tracking information on the production and solution of structures, is now displayed on Gene pages. Also on the Gene page, a GO summary (Supplementary Figure 1) has been added as well as three types of evolutionary data: InParanoid clusters of orthologous genes (12) (Author Webpage), TreeFam's (13) (Author Webpage) curated ortholog and paralog assignments (Figure 2), and C.elegans/C.briggsae mutual best BLASTP matches (Supplementary Figure 5). The Gene pages also include the phenotype of RNAi experiments, and physical and genetic interactions, which are now displayed in a convenient table (Supplementary Figure 2a).

Figure 2

Screen shot of the Homology section of the mec-14 Gene page showing InParanoid groups, C.briggsae orthologs and a TreeFam tree.

WORMBASE IS COMMUNITY DRIVEN

The focus and direction of WormBase are driven by the user community. Two ways this occurs are through user surveys and the WormBase Advisory Board. User surveys are commissioned periodically to gauge user concerns and direct future efforts for data curation. The most recent survey was conducted in the fall of 2005 (Author Webpage). The results were discussed at the WormBase Advisory Board meeting in November 2005 and summarized in the January 2006 WormBase Newsletter (Author Webpage). One of the concerns of many of the respondents was the speed of WormBase. The Advisory Board felt improving response time should be a major priority, and WormBase responded with the new hardware structure, additional servers and page caching described above.

The WormBase Advisory Board meets once a year with the entire WormBase staff to review progress and prioritize projects for the upcoming year. The board consists of several members of the C.elegans research community, as well as people with expertise in bioinformatics and databases.

WormBase also reaches out to the user community by sending staff members to all the international, regional and topical C.elegans meetings, as well as other meetings. Staff often present talks and posters, but their main purpose is to receive input from users regarding problems as well as new features or datasets they would like to see WormBase host. Some requests are for information to which WormBase simply does not have access, such as large-scale studies before publication. WormBase strives to have data available at the time of publication and encourages authors to contact WormBase about upcoming publications.

The WormBase Wiki site (Author Webpage) provides a mechanism for users to easily add content to WormBase. Individuals can post experimental protocols, announce meetings and job openings, or add useful information about their favorite gene(s). A long-term goal is for the Wiki site to assume the role of the print version of the Worm Breeder's Gazette.

P.W.S. is an investigator with the Howard Hughes Medical Institute. WormBase is supported by grant P41-HG02223 from the US National Human Genome Research Institute and by the British Medical Research Council. Funding to pay the Open Access publication charges for this article was provided by grant P41-HG02223 from the US National Human Genome Research Institute.

Conflict of interest statement. None declared.

REFERENCES

1. Eeckman F.H., Durbin. ACeDB and macace. Methods Cell Biol., 1995, pp. 583–605.

2. Stein L.D., Thierry-Mieg. Scriptable access to the Caenorhabditis elegans genome sequence and other ACEDB databases. Genome Res., 1998, pp. 1308–1315.

3. Schwarz E.M., Antoshechkin, Bastiani, Bieri, Blasiar, Canaran, Chan, Chen W.J., Davis, et al. WormBase: better software, richer content. Nucleic Acids Res., 2006, pp. D475–D478.

4. Chen, Lawson, Bradnam, Harris T.W., Stein L.D. WormBase as an integrated platform for the C.elegans ORFeome. Genome Res., 2004, pp. 2155–2161.

5. Chen, Harris T.W., Antoshechkin, Bastiani, Bieri, Blasiar, Bradnam, Canaran, Chan, Chen C.K., et al. WormBase: a comprehensive data resource for Caenorhabditis biology and genomics. Nucleic Acids Res., 2005, pp. D383–D389.

6. Stein L.D., Bao, Blasiar, Blumenthal, Brent M.R., Chen, Chinwalla, Clarke, Clee, Coghlan, et al. The genome sequence of Caenorhabditis briggsae: a platform for comparative genomics. PLoS Biol., 2003, p. E45.

7. Harris T.W., Chen, Cunningham, Tello-Ruiz, Antoshechkin, Bastiani, Bieri, Blasiar, Bradnam, Chan, et al. WormBase: a multi-species resource for nematode biology and genomics. Nucleic Acids Res., 2004, pp. D411–D417.

8. Gene Ontology Consortium. Creating the gene ontology resource: design and implementation. Genome Res., 2001, pp. 1425–1433.

9. Stein L.D., Mungall, Shu, Caudy, Mangone, Day, Nickerson, Stajich J.E., Harris T.W., Arva, et al. The Generic Genome Browser: a building block for a model organism system database. Genome Res., 2002, pp. 1599–1610.

10. Hwang B.J., Müller H.M., Sternberg P.W. Genome annotation by high-throughput 5′ RNA end determination. Proc. Natl Acad. Sci. USA, 2004, vol. 101, pp. 1650–1655.

11. Chen, Oughtred, Berman H.M., Westbrook. TargetDB: a target registration database for structural genomics projects. Bioinformatics, 2004, pp. 2860–2862.

12. Remm, Storm C.E.V., Sonnhammer E.L.L. Automatic clustering of orthologs and in-paralogs from pairwise species comparisons. J. Mol. Biol., 2001, vol. 314, pp. 1041–1052.

13. Coghlan, Ruan, Coin L.J., Hériché J.-K., Osmotherly, Lui, Zhang, Bolund, et al. TreeFam: a curated database of phylogenetic trees of animal gene families. Nucleic Acids Res., 2006, pp. D572–D580.

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
