Unified Medical Language System
Introduction
The Unified Medical Language System (UMLS) is a set of knowledge sources developed
as experimental products by the US National Library of Medicine (NLM). It
contains information about medical terms and their interrelationships, and
consists of four components: a Metathesaurus, a semantic network, a specialist
lexicon and an information sources map.
Description
The Metathesaurus contains syntactic and
semantic information about medical terms that appear in 38 controlled
vocabularies and classifications, such as SNOMED, MeSH and ICD. It is
organised by concept, and contains over 330,000 concepts and 739,439
terms. It also contains syntactic variations of terms, represented as
strings. The representation takes the form of three levels: a
set of general concepts (represented by a code), a set of concept
names (represented by another, related, code) and a set of strings
(represented by another code and the lexical string itself). An
illustrative example is given in
Figure 3.3. Meanings and relationships are
preserved from the source vocabularies, but some additional
information is provided and new relationships between concepts and
terms from different sources are established.
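The three-level representation (concept, concept name, string) can be pictured as nested records. The sketch below is a minimal illustration under stated assumptions: the class names are hypothetical, and every code is a placeholder. In UMLS releases the three levels are identified by Concept, Lexical and String Unique Identifiers (CUI, LUI, SUI), but the values used here are not real identifiers.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of the Metathesaurus's three levels:
# concept -> concept names -> strings. All codes are placeholders.

@dataclass
class StringEntry:
    code: str   # string-level code
    text: str   # the lexical string itself

@dataclass
class ConceptName:
    code: str   # name-level code, related to the concept code
    strings: list[StringEntry] = field(default_factory=list)

@dataclass
class Concept:
    code: str   # concept-level code
    names: list[ConceptName] = field(default_factory=list)

    def all_strings(self) -> list[str]:
        """Collect every lexical string reachable from this concept."""
        return [s.text for n in self.names for s in n.strings]

# One concept with two names; the first name has two syntactic variants.
concept = Concept("C0000001", [
    ConceptName("L0000001", [StringEntry("S0000001", "Heart attack"),
                             StringEntry("S0000002", "heart attacks")]),
    ConceptName("L0000002", [StringEntry("S0000003", "Myocardial infarction")]),
])
print(concept.all_strings())
# ['Heart attack', 'heart attacks', 'Myocardial infarction']
```

Keeping strings under names, and names under concepts, is what lets synonyms from different source vocabularies share one concept while each keeps its own spelling variants.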
Figure 3.3: Fragment of Concept Hierarchy.
The relationships described between concepts in the Metathesaurus are
the following:
- X is broader than Y
- X is narrower than Y
- X and Y are "alike"
- X is a parent of Y
- X is a child of Y
- X is a sibling of Y
- X and Y have some other relation
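Each of these relations has an inverse, so a relationship can be read from either concept. The sketch below assumes the two- and three-letter labels used in UMLS relationship files (RB, RN, RL, PAR, CHD, SIB, RO; CHD also appears in the example record in this section). The direction convention follows the list above ("X is a child of Y"), not necessarily the column order of any particular release file, and the function name `invert` is hypothetical.

```python
# Relation label -> (reading, inverse label), following the list above.
RELATIONS = {
    "RB":  ("broader than", "RN"),
    "RN":  ("narrower than", "RB"),
    "RL":  ("alike", "RL"),          # symmetric
    "PAR": ("parent of", "CHD"),
    "CHD": ("child of", "PAR"),
    "SIB": ("sibling of", "SIB"),    # symmetric
    "RO":  ("some other relation", "RO"),
}

def invert(x: str, rel: str, y: str) -> tuple[str, str, str]:
    """Restate 'x rel y' as the equivalent 'y inverse(rel) x'."""
    _, inverse = RELATIONS[rel]
    return (y, inverse, x)

print(invert("X", "CHD", "Y"))  # ('Y', 'PAR', 'X')
```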
An example record from the relational file is given below.
C0001430 | CHD | C0022134 | isa | MSH97 | MTH
This indicates that there is an is-a relationship between the concept
nesidioblastoma (C0001430) and the concept adenoma (C0022134), that the
former is a child of the latter, that the relationship is taken from the
MeSH subject headings (MSH97), and that this relationship was created
specifically for the Metathesaurus (MTH).
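A record of this kind can be split into named fields mechanically. The sketch below handles only the six-field layout shown above; the field names are hypothetical labels for the columns as the text explains them, and the layout of actual distribution files may differ between releases.

```python
from typing import NamedTuple

class RelRecord(NamedTuple):
    # Hypothetical names for the six columns of the example record.
    concept1: str      # first concept code
    rel: str           # relationship label, e.g. CHD
    concept2: str      # second concept code
    rel_attr: str      # finer relationship attribute, e.g. isa
    source: str        # vocabulary the relationship comes from, e.g. MSH97
    label_source: str  # origin of the relationship, e.g. MTH (Metathesaurus)

def parse_record(line: str) -> RelRecord:
    """Split a '|'-delimited six-field relationship record into named fields."""
    fields = [f.strip() for f in line.split("|")]
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, got {len(fields)}: {line!r}")
    return RelRecord(*fields)

rec = parse_record("C0001430 | CHD | C0022134 | isa | MSH97 | MTH")
print(rec.rel, rec.rel_attr, rec.source)  # CHD isa MSH97
```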
The semantic network contains information about the semantic types that are
assigned to the concepts in the Metathesaurus. The types are defined explicitly
by textual information and implicitly by means of the hierarchies represented.
The semantic types are represented as nodes and the relationships between them
as links. Relationships are stated at the highest possible level of the type
hierarchy, so the network records very general classifications rather than
explicit relations between individual concepts.
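Because links are stated between general types, a link recorded once near the top of the hierarchy covers every subtype beneath it. The sketch below illustrates that idea with a toy network. The type names, the `affects` link and the helper functions are illustrative assumptions, not copied from the UMLS semantic network, and the sketch ignores any exceptions a real network might record.

```python
# Toy semantic network: 'isa' links form the type hierarchy (child -> parent).
ISA = {
    "Disease": "Pathologic Function",
    "Pathologic Function": "Biologic Function",
    "Pharmacologic Substance": "Chemical",
}
# Non-hierarchical links, stated only between general types.
LINKS = {("Chemical", "affects", "Biologic Function")}

def ancestors(t: str) -> list[str]:
    """Return t followed by all of its supertypes, nearest first."""
    chain = [t]
    while chain[-1] in ISA:
        chain.append(ISA[chain[-1]])
    return chain

def holds(a: str, rel: str, b: str) -> bool:
    """True if rel is stated between some supertype of a and some supertype of b."""
    return any((x, rel, y) in LINKS for x in ancestors(a) for y in ancestors(b))

print(holds("Pharmacologic Substance", "affects", "Disease"))  # True
print(holds("Disease", "affects", "Pharmacologic Substance"))  # False
```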
In the semantic network, relations are stated between semantic types. The
primary relation is that of hyponymy, but there are also five major categories
of non-hierarchical relations:
- physical, e.g. contains
- spatial, e.g. location of
- functional, e.g. prevents
- temporal, e.g. co-occurs with
- conceptual, e.g. diagnoses
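The distinction between the hierarchical relation and the five non-hierarchical categories can be captured in a simple lookup. The sketch below knows only the example relation given for each category above; the triples and the function names are hypothetical.

```python
from collections import defaultdict

# Example relation for each non-hierarchical category, as listed above.
CATEGORY = {
    "contains": "physical",
    "location of": "spatial",
    "prevents": "functional",
    "co-occurs with": "temporal",
    "diagnoses": "conceptual",
}
HIERARCHICAL = {"isa"}

def classify(rel: str) -> str:
    """Return 'hierarchical', the category of a known relation, or 'unknown'."""
    if rel in HIERARCHICAL:
        return "hierarchical"
    return CATEGORY.get(rel, "unknown")

def group_by_category(triples):
    """Group (type1, relation, type2) triples by relation category."""
    groups = defaultdict(list)
    for triple in triples:
        groups[classify(triple[1])].append(triple)
    return dict(groups)

# Hypothetical triples between semantic types.
triples = [
    ("Drug", "prevents", "Disease"),
    ("Body Part", "location of", "Disease"),
    ("Disease", "isa", "Pathologic Function"),
]
groups = group_by_category(triples)
print(sorted(groups))  # ['functional', 'hierarchical', 'spatial']
```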
The following example, taken from SNOMED, shows how terms can be decomposed
into their various concepts and positioned in the hierarchy.
D-33-- Open Wounds of the Limbs
DD-33620 Open wound of knee without complication 891.0
(T-D9200)(M-14010)(G-C009)(F-01450)
DD-33621 Open wound of knee with complication 891.1
(T-D9200)(M-14010)(G-C008)(F-01450)
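The decomposition can be read off mechanically: each entry has a code, a term and a trailing numeric code, followed by a line of bracketed component codes. The sketch below parses the two entries above and reports which component distinguishes them. It treats the letter before each hyphen only as an opaque axis key and the trailing number only as an opaque cross-reference (`xref`), because the text does not spell out what they denote. The regular expressions assume exactly this two-line layout.

```python
import re

ENTRIES = """\
DD-33620 Open wound of knee without complication 891.0
(T-D9200)(M-14010)(G-C009)(F-01450)
DD-33621 Open wound of knee with complication 891.1
(T-D9200)(M-14010)(G-C008)(F-01450)"""

HEAD = re.compile(r"^(\S+) (.+) (\d+\.\d+)$")   # code, term, trailing numeric code
PART = re.compile(r"\(([A-Z])-([0-9A-Z]+)\)")    # (axis letter - component code)

def parse(text: str) -> dict:
    """Map each entry code to its term, cross-reference and components."""
    lines = text.splitlines()
    entries = {}
    for head, parts in zip(lines[0::2], lines[1::2]):
        code, term, xref = HEAD.match(head).groups()
        components = dict(PART.findall(parts))  # axis letter -> component code
        entries[code] = {"term": term, "xref": xref, "components": components}
    return entries

e = parse(ENTRIES)
a, b = e["DD-33620"]["components"], e["DD-33621"]["components"]
diff = {axis: (a[axis], b[axis]) for axis in a if a[axis] != b[axis]}
print(diff)  # {'G': ('C009', 'C008')}
```

The two knee-wound entries share every component except one, which matches the only difference in their terms ("without" versus "with complication").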
The specialist lexicon provides detailed syntactic information about biomedical
terms and common English words. An individual lexical entry is created for each
spelling variant and syntactic category for a word. These are then grouped
together to form a unit record for each word, defined in a frame structure
consisting of slots and fillers. Full morphological and syntactic information is
provided, e.g. syntactic category, number, gender, tense, adjectival type, noun
type, etc.
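Grouping individual entries into one unit record per word can be sketched as follows. This is a minimal illustration under stated assumptions: the entries, slot names and fillers are hypothetical and do not reproduce the actual specialist lexicon record format. Each record is a frame holding the word's spelling variants plus, per syntactic category, a set of slot-filler pairs.

```python
from collections import defaultdict

# Hypothetical entries: one per (spelling variant, syntactic category).
ENTRIES = [
    {"base": "anaesthetic", "spelling": "anaesthetic", "cat": "noun", "noun_type": "count"},
    {"base": "anaesthetic", "spelling": "anesthetic", "cat": "noun", "noun_type": "count"},
    {"base": "anaesthetic", "spelling": "anaesthetic", "cat": "adj", "adj_type": "qualitative"},
    {"base": "anaesthetic", "spelling": "anesthetic", "cat": "adj", "adj_type": "qualitative"},
]

def unit_records(entries):
    """Group entries into one unit record per word, keyed by base form."""
    records = defaultdict(lambda: {"spellings": set(), "frames": {}})
    for entry in entries:
        record = records[entry["base"]]
        record["spellings"].add(entry["spelling"])
        # Remaining slots become the frame for this syntactic category.
        slots = {k: v for k, v in entry.items() if k not in ("base", "spelling", "cat")}
        record["frames"].setdefault(entry["cat"], {}).update(slots)
    return dict(records)

record = unit_records(ENTRIES)["anaesthetic"]
print(sorted(record["spellings"]))  # ['anaesthetic', 'anesthetic']
print(record["frames"]["adj"])      # {'adj_type': 'qualitative'}
```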
The information sources map contains details of the original sources of the
terms. It consists of a database of records describing the information
resources, with details such as scope, probable utility and access
conditions. The sources themselves are varied and include bibliographic
databases, factual databases and expert systems.
Comparison with Other Lexical Databases
SNOMED thus represents an implicit hierarchy of medical terms and
their relationships by means of a coding system, enabling the
identification of synonyms, hyponyms and hyperonyms. This makes it
comparable to WordNet, although its coverage is very different. All the
information in SNOMED is contained in UMLS, but represented in a more
explicit tree-like structure. However, UMLS also includes information
from a wide variety of other sources, and establishes relationships
between these. UMLS further provides explicit morphological, syntactic
and semantic information.
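How a coding system can encode a hierarchy implicitly is easiest to see with a simplified scheme. The sketch below assumes, as a simplification that is not SNOMED's documented rule, that a code's broader terms are the codes obtained by truncating it from the right. The keys are loosely modelled on the numeric parts of the SNOMED example in this section and are hypothetical.

```python
# Hypothetical codes; broader terms are found by right truncation.
TERMS = {
    "33": "Open wounds of the limbs",
    "33620": "Open wound of knee without complication",
    "33621": "Open wound of knee with complication",
}

def hyperonyms(code):
    """Broader codes present in TERMS, nearest first."""
    return [code[:i] for i in range(len(code) - 1, 0, -1) if code[:i] in TERMS]

def hyponyms(code):
    """Narrower codes present in TERMS (codes that extend this one)."""
    return sorted(c for c in TERMS if c != code and c.startswith(code))

def siblings(code):
    """Codes that share this code's nearest hyperonym."""
    parents = hyperonyms(code)
    if not parents:
        return []
    return [c for c in hyponyms(parents[0])
            if c != code and hyperonyms(c)[:1] == parents[:1]]

print(hyperonyms("33620"))  # ['33']
print(hyponyms("33"))       # ['33620', '33621']
print(siblings("33620"))    # ['33621']
```

In UMLS the same relations are stored as explicit records instead of being inferred from code shape. This is the contrast drawn above between SNOMED's implicit hierarchy and UMLS's more explicit structure.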
Relations to Notions of Lexical Semantics
As with the Higher Level Ontologies discussed in the previous section,
the structuring of terms as synonyms, hyponyms and hyperonyms relates
UMLS to the cognitive taxonomic models referred to in §2.7.
LE Uses
Both NLM and many other research groups and institutions are using UMLS in a
variety of applications, including natural language processing, information
extraction and retrieval, document classification, creation of medical data
interfaces, etc. NLM itself uses it in several applications [UMLS97],
including Internet Grateful Med, an assisted interactive retrieval system,
SPECIALIST, an NLP system for processing biomedical information, and the
NLM/AHCPR Large-Scale Vocabulary Test.
EAGLES Central Secretariat eagles@ilc.cnr.it