This web site is dedicated to the Ontology Server (OS) component of the Query
Integration System (QIS),
a prototype application framework for biomedical database federation developed at
the Center for Medical Informatics to test distributed database integration involving
heterogeneous data sources in bioscience.
OS provides the following functionality:
- Mapping of either data or metadata elements within a data source to concepts in
a controlled vocabulary.
- By tagging these elements with controlled vocabulary concept IDs, the task of
integration between different data sources in distributed systems is facilitated.
This use of controlled vocabularies is well known to system integrators in medical
informatics who have to interchange data between systems; the benefits are fairly
obvious and do not need elaboration.
- QIS will let you query our databases by searching the UMLS for a particular concept,
and then letting you know if this concept exists within our own database. The neuroscience-related
details of the object/s that map to this concept (such as the higher-order anatomical
structures that they are contained in, or
While the terms controlled vocabulary, thesaurus and ontology
originally meant different things (each succeeding concept is a more robust and
elaborate version of the previous one), today one tends to use these phrases synonymously.
Specifically, an ontology consists of three classes of entity: Concepts, terms (or
alternative names for these concepts), and relationships between these concepts.
The most common kind of relationship is the "is-a" relationship, also called hyponymy (e.g.,
acetylcholine is-a neurotransmitter). Other relationships
also exist, such as "part-of" (meronymy): e.g., the finger is part-of
the hand.
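The three classes of entity described above can be sketched as a small data structure. This is only an illustration, not the Ontology Server's actual schema: the class names, method names, and the identifiers X1 to X4 are hypothetical placeholders (they are not real UMLS concept IDs).

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    concept_id: str          # placeholder ID, not a real UMLS identifier
    preferred_name: str
    terms: set = field(default_factory=set)  # alternative names for the concept


class Ontology:
    """Concepts, their terms, and typed relationships between concepts."""

    def __init__(self):
        self.concepts = {}          # concept_id -> Concept
        self.relationships = set()  # (source_id, relation, target_id) triples

    def add_concept(self, concept_id, preferred_name, *terms):
        self.concepts[concept_id] = Concept(concept_id, preferred_name, set(terms))

    def relate(self, source_id, relation, target_id):
        self.relationships.add((source_id, relation, target_id))

    def related(self, source_id, relation):
        """Return the IDs reachable from source_id by one step of the given relation."""
        return sorted(t for s, r, t in self.relationships
                      if s == source_id and r == relation)


# The two examples from the text, with made-up IDs.
onto = Ontology()
onto.add_concept("X1", "acetylcholine", "ACh")
onto.add_concept("X2", "neurotransmitter")
onto.add_concept("X3", "finger")
onto.add_concept("X4", "hand")
onto.relate("X1", "is-a", "X2")      # hyponymy
onto.relate("X3", "part-of", "X4")   # meronymy
```

Calling `onto.related("X1", "is-a")` returns the ID of "neurotransmitter", and `onto.related("X3", "part-of")` returns the ID of "hand". Keeping relationships as typed triples lets new relationship types be added without changing the structure.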
The reference ontology/controlled vocabulary that we use is the National Library
of Medicine's Unified Medical Language System (UMLS). The UMLS is a meta-thesaurus
- that is, a compendium of other biomedical vocabularies. Some vocabularies
within the UMLS are much more useful than others (in fact, the whole is not
greater than the sum of its parts - due to indifferent curation of some vocabularies,
you can actually get conflicting results if you try to make use of all the inter-concept
relationships), and therefore we use a subset of it: the Medical Subject Headings
(MeSH), Gene Ontology, SNOMED and BrainInfo.
We should emphasize that only about a third of the total number of entries in SenseLab
have been mapped to UMLS concepts. The UMLS is under-represented with respect to
neuroscience, and only gross neuroanatomy is well represented. You will search in
vain, for example, for "olfactory glomerular cells", which are involved
in detection of odors (they lie within the cerebral olfactory bulb) and which happen
to be an important focus of research in the Shepherd Lab at Yale.
We emphasize that development of a "local" ontology is currently minimal.
We do record that certain classes of data SHOULD be concepts in the UMLS, even if
all instances of that class are not currently concepts. Such instances, which are
not yet mapped, are called CANDIDATE concepts. It is not enough to merely record
candidate concepts in a local list: one must also specify relationships between
unmapped concepts to each other, and to existing UMLS concepts, in order both to
make the local set useful, as well as to eventually volunteer these to NLM for future
inclusion in UMLS. Specifying such relationships enables a graphical browser to
navigate relationships so that the user can visualize local concepts in their correct
context.
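As a rough sketch of why a bare list of candidate concepts is not enough, the fragment below records candidates together with their relationships and reports those that are still unplaced. Everything here is an assumption made for illustration: the class, the local IDs, the "REF:" target label, and the second candidate's name are invented and do not describe how SenseLab stores candidates.

```python
from dataclasses import dataclass, field


@dataclass
class CandidateConcept:
    local_id: str                 # hypothetical local identifier
    name: str
    links: list = field(default_factory=list)  # (relation, target_id) pairs

    def link(self, relation, target_id):
        """Relate this candidate to another candidate or to an existing concept."""
        self.links.append((relation, target_id))


def unplaced(candidates):
    """Names of candidates that have no relationship to anything yet."""
    return [c.name for c in candidates if not c.links]


glomerular = CandidateConcept("local-1", "olfactory glomerular cells")
# "REF:olfactory-bulb" stands in for the ID of an existing reference concept.
glomerular.link("part-of", "REF:olfactory-bulb")

orphan = CandidateConcept("local-2", "example unlinked candidate")

needs_curation = unplaced([glomerular, orphan])
```

Here `needs_curation` contains only the second candidate: a browser can place the first in its context through its part-of link, while the second cannot be shown in context until a curator relates it to something.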
Assigning candidate concepts to their "correct" position in a relationships
tree/network is a human-intensive process, and in general, local ontology development
takes financial resources to support the curators; we do not have such resources
earmarked. The efforts need to be considerable, because certain areas of neuroscience
(for example, neuronal compartments or the large vocabulary of neuronal modelers)
are almost non-existent in the UMLS and here, one has to start from scratch.
In SenseLab, access to the UMLS serves three purposes:
-
Synonymy: The vocabulary of biomedicine
is rife with synonyms: "liver" and "hepatic", "kidney"
and "renal", "vomiting" and "emesis" are Anglo-Saxon
and Graeco-Latin equivalents for the same thing. Keyword-based search is likely
to miss the records of interest unless you take the trouble to manually specify
every alternative name for what you are looking for, because you don't know a
priori how it might be recorded in the database. Neuroscience isn't so bad, but
even here, alternative names have arisen: Serotonin has the alternative name 5-hydroxy-tryptamine
(and the abbreviation, 5-HT), while norepinephrine is also called noradrenaline,
and the midbrain is also the mesencephalon.
The nice thing about UMLS is that the curators of the vocabularies that contribute
to it (as well as NLM personnel) have taken pains to specify alternative synonyms
for a concept, so if an object in SenseLab has already been mapped to a UMLS concept,
then you can locate it as follows. You type in a phrase, or part of a phrase, to locate
all UMLS terms (and thence concepts) that contain that phrase. After you pick the
one you want, you can jump directly to that concept.
-
Location Transparency: If you've browsed
SenseLab already, then you know that it is organized into
virtual databases or portals.
That is, everything is stored in one big physical database but, in order to cater
to various types of user who are more interested in some classes of data than in
others (e.g., neuronal modelers versus micro-anatomists versus olfactory-receptor sequencers),
the user interface segregates related classes so that a particular family of data
is more directly accessible to a particular type of user. (Some classes of data
- such as neurotransmitter molecules and receptors - show up in multiple portals
because they are so fundamental to all of neuroscience.)
Segregating data like this is all very well for regular users who know exactly where
something of interest to them is likely to be, but casual users who are entering
SenseLab for the first time often simply want to know about something without having
to figure out which portal it lies in. Concept-based search works across
all objects in SenseLab (as well as the remote databases that it links
to, such as the University of Washington, Seattle's BrainInfo): if any concept of
interest happens to have been mapped to an object, you are taken directly to it,
where it is displayed, typically in the context of other related information in
its portal.
(Caveat: this short-cut navigation feature does not work for the University of Southern
California's Brain Architecture Management System (BAMS). Unlike databases such
as NCBI's Entrez (or, for that matter, SenseLab), BAMS currently does not provide
a means of displaying all information about an object through a Web request that
specifies that object's unique identifier.)
-
Inter-Operation: Consider the scenario where
multiple databases maintained by separate curator teams are required to inter-operate.
(At a crude level, inter-operation simply means locating an object of interest in
one database, getting some information on it, and then jumping to the same object
in another database to get some different information on it.) If no reference ontology
existed, teams of curators would have to meet in pairwise fashion with lists of
their objects and their accompanying descriptions, and spend days to weeks specifying
correspondences between the internal object identifiers
of the two databases. If there were N databases, the number of pairings requiring such meetings
would be N * (N-1) / 2.
If, however, a reference ontology were used, such meetings would not even be necessary.
Each group could autonomously map objects in their own database to concepts in the
reference ontology. The latter serves as a lingua franca, and one can use the reference-ontology
concept IDs as a bridge between these two databases for the mapped concepts, irrespective
of what they are called internally (the two databases may even be in different languages,
e.g., English vs. German). Doug Bowden's BrainInfo group at Seattle and our own
group use the UMLS this way to map concepts in gross neuroanatomy. (BrainInfo also
happens to be part of the UMLS.)
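The inter-operation argument can be made concrete with a minimal sketch. It assumes each database keeps its own table from local object IDs to reference-ontology concept IDs; the function names, the two tables, and all identifiers below are hypothetical and are not taken from SenseLab, BrainInfo, or the UMLS.

```python
def pairwise_meetings(n):
    """Number of database pairs that would need a direct mapping: N * (N-1) / 2."""
    return n * (n - 1) // 2


def bridge(local_id, source_map, target_map):
    """Find objects in the target database mapped to the same concept as local_id.

    Each map goes from a database's own object ID to a reference concept ID.
    """
    concept_id = source_map.get(local_id)
    if concept_id is None:
        return []  # the object has not been mapped to the reference ontology
    return sorted(obj for obj, cid in target_map.items() if cid == concept_id)


# Two independently curated databases; one names objects in English, one in German.
DB_A = {"a-midbrain": "REF-001", "a-hand": "REF-002"}
DB_B = {"b-mittelhirn": "REF-001", "b-niere": "REF-003"}

matches = bridge("a-midbrain", DB_A, DB_B)
```

With five databases, `pairwise_meetings(5)` gives 10 pairings, whereas the reference-ontology approach needs only five independent mapping efforts, one per database. In the example, `matches` holds the German-named object, found through the shared concept ID even though the internal names differ.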
Caveats when searching the UMLS: Ambiguous Terms
Certain words can refer to multiple concepts: for example, the term "serotonin"
can refer to the molecule, a neurotransmitter, but also to the receptor for the
same molecule (where the word "receptor" is likely to be omitted or elided
- e.g., in a table of receptors, the concept of receptor is implicit). Fortunately,
in most cases, the "preferred name" of the concept in the UMLS is informative.
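Both synonym lookup and term ambiguity follow from the same many-to-many link between terms and concepts, which the sketch below illustrates. The term table, concept IDs (C1 to C3), preferred names, and the `search_terms` function are all invented for illustration; they do not reproduce actual UMLS content or the Ontology Server's search interface.

```python
# Hypothetical (term, concept_id) rows: several terms may share one concept
# (synonymy), and one term may point to several concepts (ambiguity).
TERMS = [
    ("serotonin", "C1"),
    ("5-hydroxy-tryptamine", "C1"),
    ("5-HT", "C1"),
    ("serotonin", "C2"),            # "receptor" elided, as in a table of receptors
    ("serotonin receptor", "C2"),
    ("norepinephrine", "C3"),
    ("noradrenaline", "C3"),
]

PREFERRED_NAMES = {
    "C1": "serotonin",
    "C2": "serotonin receptor",
    "C3": "norepinephrine",
}


def search_terms(phrase, mode="contains"):
    """Return (concept_id, preferred_name) for every concept with a matching term.

    mode is "contains" or "starting with"; matching ignores case.
    """
    needle = phrase.lower()
    hits = set()
    for term, concept_id in TERMS:
        haystack = term.lower()
        if mode == "starting with":
            matched = haystack.startswith(needle)
        else:
            matched = needle in haystack
        if matched:
            hits.add(concept_id)
    return [(cid, PREFERRED_NAMES[cid]) for cid in sorted(hits)]


ambiguous = search_terms("serotonin")
synonym = search_terms("norad", mode="starting with")
```

In this toy table, `ambiguous` contains two concepts, and the preferred names are what let the user tell the molecule from the receptor. `synonym` finds the norepinephrine concept from a fragment of "noradrenaline", without the user having to know which name the database uses.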
This mini-tutorial lets you search the UMLS to locate a concept of interest, and
then jump to the object mapped to this concept within SenseLab.
-
Go to the Ontology Server, http://os-qis.med.yale.edu
(Open a new browser window by shift-clicking on the link, so that this window is
still available to you.)
-
Choose Tools -> UMLS Search
-
In the pull-down list for "Search For", specify the choice "Term
Name"
-
In the pull-down below this list, specify "Starting with", and type "retinal
gang"
-
You will be shown two rows: "retinal ganglion" and
"retinal ganglion cells". Click the
link (>>) against the second row.
-
A new screen opens, containing the definition of the matching concept in UMLS. (The
prose description that you see is taken from the NLM's Medical Subject Headings
definitions.) Below this definition, you will see two rows, indicating that this
concept is mapped to objects in the BAMS and SenseLab databases. The SenseLab entry
has the hyperlink indicated by "o270". Clicking this link will take you
to the page within SenseLab that describes this object. (For example, you will see a diagram
of some retinal neurons, as well as links that take you to other details, such as
receptors associated with this type of neuron.)
To start, select the "Domains" menu item in the top toolbar to get
the list of the Ontological communities in this OS.