Table 3

Dataset preparation for systems in BioCreative Workshop 2012

| System | Dataset selection for pre-workshop evaluation | Information captured | Biocurators involved in gold standard annotation | Biocurators involved in annotation in evaluation |
| --- | --- | --- | --- | --- |
| Textpresso | 30 full-length articles about Dictyostelium discoideum from 2011 to 2012 not yet annotated in dictyBase. This set contains 61 GO cellular component annotations in 124 sentences as annotated by senior dictyBase biocurator | Paper identifier, annotation entity, paper section, curatable sentence, component term in sentence, GO term, GO ID and evidence code | dictyBase senior curator | dictyBase and Plant Ontology (a) |
| PCS | 50 textual descriptions of phenotypic characters in NeXML format randomly selected from 50 articles about fish or other vertebrates. Gold standard 50 character descriptions annotated by a senior Phenoscape biocurator | Entity term, entity ID, quality term, quality ID, quality negated, quality modifier, entity locator, count and more | Phenoscape senior curator | ZFIN and Phenoscape |
| PubTator | TAIR set: 50 abstracts (24 relevant) sampled from November 2011 for Arabidopsis already curated by TAIR; NLM set: 50 abstracts sampled from Gene Indexing Assistant Test Collection (human) | Gene indexing: gene names and Entrez gene ID; document triage information: list of relevant PMIDs | Existing annotated corpus | TAIR and National Library of Medicine (NLM) |
| PPInterFinder | 50 abstracts describing human kinases obtained by using a combination of tool/resources (such as UniProt, PubMeMiner, FABLE, and PIE) | PMID, protein interactant name 1, protein interactant name 2 | NR | BioGrid and MINT |
| eFIP | PMID-centric: 50 abstracts randomly selected based on proteins involved in two pathways of interest to Reactome, autophagy and HIV infection; gene-centric: 10 first-ranked abstracts for 4 proteins involved in the adaptive immune system (Reactome: REACT_75774) | PMID, phosphorylated protein, phosphorylated site, interactant name, effect, evidence sentence | NR | Merck Serono, Reactome, and SGD (b) |
| T-HOD | PMID-centric: 50 abstracts from 2011 journals about obesity, diabetes or hypertension; gene-centric: review relevancy of documents for four genes | PMID, EntrezGene ID, gene name, disease, gene–disease relation, evidence sentence | Protein Ontology senior curator | Pfizer, Reactome, GAD, and MGI |

NR: non-recorded. (a) Curator novice to GO annotation. (b) SGD curator participated in the first evaluation, which is not reported in the performance results here.