Abstract

Accurate retrieval of scientific articles is complicated by the intricacies of natural language. Ontologies are useful for formalizing scientific concepts and for mediating between formal, computationally accessible representations and informal natural language expressions. The terminology associated with the major histocompatibility complex (MHC) exemplifies the need for a sophisticated translation process between a natural language query and the relevant documents. For example, a researcher searching for “peptides that bind MHC” should be presented with a text containing “peptides were eluted from HLA-A2.” To achieve this goal, we have developed a multi-species MHC ontology that encompasses all human MHC alleles and serologic groups. Because new alleles are discovered at a rapid rate, allele incorporation is automated using the IMGT/HLA and IPD-MHC databases. In the context of the StemNet knowledge management project, this MHC ontology contributes to machine-learning-driven text annotation, as well as to search-string interpretation and mapping. In this way, StemNet aims to offer researchers a platform for efficient access to the goldmine of clinically relevant immunogenetic information, which is otherwise inaccessible through standard search engines and databases.
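The query-to-document mapping described above can be illustrated with a minimal sketch. It is not the StemNet implementation: the tiny hierarchy, the names `PARENT`, `descendants`, and `matches`, and the plain substring matching are all assumptions made for illustration. The idea is only that a query for a general term such as "MHC" is expanded to the more specific terms below it in the ontology before the text is searched.

```python
# Toy is-a hierarchy: each term maps to its direct parent.
# Illustrative entries only; this is NOT the actual StemNet MHC ontology.
PARENT = {
    "HLA-A*02:01": "HLA-A2",  # allele -> serologic group
    "HLA-A2": "HLA-A",
    "HLA-A": "HLA",
    "HLA": "MHC",             # human MHC
    "H-2Kb": "H-2",
    "H-2": "MHC",             # mouse MHC
}


def descendants(term):
    """Return the term together with every term below it in the hierarchy."""
    found = {term}
    changed = True
    while changed:
        changed = False
        for child, parent in PARENT.items():
            if parent in found and child not in found:
                found.add(child)
                changed = True
    return found


def matches(query_term, text):
    """True if the text mentions the query term or any of its descendants.

    Matching is a naive case-insensitive substring test (an assumption);
    a real system would need proper tokenization and synonym handling.
    """
    lowered = text.lower()
    return any(t.lower() in lowered for t in descendants(query_term))


sentence = "peptides were eluted from HLA-A2"
hit_for_mhc = matches("MHC", sentence)   # general query, specific text
hit_for_h2 = matches("H-2", sentence)    # mouse MHC query, human text
```

Under these assumptions, the sentence from the abstract is found by a query for "MHC" even though the string "MHC" never occurs in it, while a query for the mouse term "H-2" does not match.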

This work was financed by the Federal Ministry of Education and Research, Germany, funding number 01DS001C; the authors are solely responsible for content.

This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://dbpia.nl.go.kr/pages/standard-publication-reuse-rights)