| Annotators' home |
| Oncology annotators' page |
|
The definition of the "Gene" category is on its own page. (As of early July 2003, that definition does not mention the distinction between genomic material and gene products described below.)
There are three buttons in WordFreak under the "Gene" entity:
"Gene-Gene/RNA" is for genes and RNA elements (see the Definition).
Usually there is no problem deciding which tag to use. But the same name or symbol can be used for a gene and for the protein it encodes. "Generic" is just for those times when either
Some gene names include another gene in their name, such as "p53 tumor suppressor gene". With these, only tag the longest name, the one that includes the other(s). See the general rule on tags within tags.
When you see a phrase like "the p53 gene" or "an N-ras protein", don't include the word "gene" or "protein" in the tag. It tells you whether to use the Gene/RNA button or the Protein button, and once you've done that the word itself is superfluous. The same goes for "oncogene".
But a pseudogene is not a gene, so if you're tagging "XXX" include "pseudogene" as well: [2004-10-14]
"X Y-ase" is often an enzyme (a Y-ase) that acts on X, so it is safest to include the last word in the tag if it is an enzyme name. But if the "Y-ase" precedes the "X", as in "kinase STK15", you can pretty well tell that it is redundant and explanatory-- "STK15 (which, by the way, is a kinase)"-- so do not include it in the tagged string: [2004-10-14]
(Subcategories replacing information-gathering mode [2004-03-25])
Most of the tags described below are for specific attributes of Malignancy, analogous to "Variation-state-original". But the Malignancy-type tag has a special purpose: to capture the diagnostic name of a malignancy. Unlike the more specific attributes like clinical stage, a Malignancy-type tag can have other tags inside it. In other words, tag-within-tag is allowed (and fairly common) when the outer tag is Malignancy-type.
Many of these attributes can be expressed as adjectives. Tag such adjectives as well. [2004-08-17]
This is how clinicians name the different types of cancer. As you can imagine, just like genes, there are different ways to name a single malignancy type: morphologic features, histological observation, anatomical location, the name(s) of the discoverer or patients, and many more. These criteria are not mutually exclusive. "Leukemia" could be considered as either an anatomical or a histological type-name, but either way it's a Malignancy-type. "Squamous cell carcinoma" and "Ewing's sarcoma" are made up of a cancer name and a modifier; the unmodified name by itself ("carcinoma", "sarcoma") would be a Malignancy-type, but we don't tag it within the more detailed name; we tag the full phrase.
We don't have a list of names of Malignancy-type, and we probably never will have a complete one. This is something like information-gathering, but with some restrictions. With your bio or medical backgrounds, you will probably be able to recognize what is meant as the name of the cancer -- the Malignancy-type -- most of the time. Tag it. But we are restricting it: no prepositions. If you see "cancer of the lung", the Malignancy-type ends at "of": it's just "cancer". Tag "lung" separately as Malignancy-site.
cancer of the lung
------              Malignancy-type
              ----  Malignancy-site
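The "no prepositions" restriction can be sketched in a few lines of Python. This is only an illustration: the function name and the one-word preposition list are hypothetical, not part of WordFreak or of the project's tooling, and real text will still need an annotator's judgment.

```python
# Hypothetical helper illustrating the "no prepositions" restriction.
# Only "of" appears in the guideline's example; other prepositions
# could be added to the tuple if they turn out to be needed.
PREPOSITIONS = ("of",)


def malignancy_type_span(phrase, prepositions=PREPOSITIONS):
    """Return the words of a candidate Malignancy-type that precede
    the first preposition (the whole phrase if there is none)."""
    kept = []
    for word in phrase.split():
        if word.lower() in prepositions:
            break  # the Malignancy-type ends before the preposition
        kept.append(word)
    return " ".join(kept)


print(malignancy_type_span("cancer of the lung"))
print(malignancy_type_span("squamous cell carcinoma"))
```

For "cancer of the lung" the sketch keeps only "cancer"; "lung" would then be tagged separately as Malignancy-site.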
In the list below, the text you would tag as Malignancy-type is italicized. When you see a name that you think should qualify as a type but doesn't fit any of the criteria in this list -- morphology, histology, anatomy, or eponymy -- tag it and mention it on the onco-list mailing list.
Morphology: the cell types affected by the cancer. Some examples:
Histology: the type of tissue affected by the cancer:
Anatomy: at which body parts the cancer is active:
Eponymy: the name of the person who first described the cancer, or in whom it was first described:
Tag adjectival forms of Malignancy-type as well:
The tumor was composed of carcinomatous, sarcomatous, and transitional elements in the frontal wall of the uterine body and therefore was diagnosed as a carcinosarcoma. (PMID 11520156)
premalignant conditions
Tag these as Clinical-stage.
[2004-09-23]
types of normal tissue
Normal tissue, e.g., "fibroblast", may be tagged as Malignancy-histology, but not as Malignancy-type.
metastasis, XYZ metastasis
Not a Malignancy-type. But metastasis can be a Malignancy-clinical-stage.
tumor masses that do not actually specify the type of tumor
"Metastasis" is a particular case of this general rule.
The scope of Malignancy-developmental-state has been significantly modified with the addition of quantitative tagging, as we begin to annotate Survival-status. See Developmental-state. This older definition is being retained for reference and comparison until all files annotated under this definition have been reannotated under the new one, with the interim label Developmental-state. When that is done we will rename all the Developmental-state references to Malignancy-developmental-state, reflecting the entity's conceptual association with Malignancy. [2005-02-11]
This represents different developmental timelines of the malignancy's host (in the sense that a parasite lives in a host): an individual (usually a patient), a cell line, or a tissue. The values of this attribute can be at different levels of specificity.
Development of person or tissue:
Development of cell line:
Some comparative words can also be values of this attribute since they provide the timeline information in relative terms; e.g.:
We use this attribute for three distinct types of mention:
Tumors are usually staged clinically by researchers. This attribute is used to evaluate the extent of a cancer within the body, especially whether the disease has spread from the original site to other parts of the body. There are different staging systems for different kinds of tumors.
There are three staging systems used for neuroblastoma: the Evans System, the St. Jude System, and the International Staging System. We may see any of these, and any of them would be tagged as Clinical-stage. Other systems are used for other kinds of cancer. Tag them all, not just those for neuroblastoma.
The International Neuroblastoma Staging System (INSS) is now universally used to stage neuroblastoma:
Stage 1: Localized tumor confined to the area of origin, with complete gross excision, lymph nodes microscopically negative.
Stage 2: The tumor extends beyond the structure of origin, but does not cross the midline,
Stage 3: Tumor extends beyond the midline, with or without bilateral lymph node involvement.
Stage 4/4S: Tumor disseminated to distant sites, such as bone, bone marrow, liver, skin or lymph nodes.
The older Evans system for neuroblastoma:
Besides the specific terms used in specified staging systems, some general terms can also be used to state the clinical stage of the tumor, and so should be treated as the values of this attribute as well, such as
An annotator asked:
In source_file_3806_22708 (PMID 9718653) should MEN type 2B syndrome be tagged as a malignancy? I ask because the definition for it* stated that it is characterized by the 100% incidence of medullary thyroid carcinoma.
* in the NCI Metathesaurus
I referred the question to the domain experts. They decided that all premalignant conditions should be tagged as Clinical-stage, restricting Malignancy-type to "established cancer names". At some point in the future we may develop a separate way of annotating references to premalignant conditions, but this will do for now.
"Benign", "malignant"
When the words "benign" and "malignant" describe a cancer or a tumor, tag them as Clinical-stage:
This includes their use in terms such as "malignant neuroblastoma", which is a Malignancy-type, so we will have tag-within-tag:
* malignant neuroblastoma
  Clin-stg-
  --------type-----------
But do not tag them when they are not describing a cancer or a tumor, e.g.:
K-ras mutation analysis seems to be a powerful tool to determine the malignant potential of cystic pancreatic tumors before and after surgery. (PMID 9671070)
Here the word "malignant" refers to a process: "malignant potential" means the ability or probability of a tumor to become malignant.
This attribute specifies cell and/or tissue type(s) affected by benign or malignant tumors. It includes nothing below the cell level (subcellular components such as "nucleus" or "prokaryon", which we do not tag) and nothing above the tissue level (body structures such as "eye" or body regions such as "arm", both of which are Malignancy-site). [2004-10-19]
The terms are the same terms that are used for healthy cells. For example:
This tumor is composed of glial cells with low level differentiation.
Here "glial cells" is the phrase specifying the cell type making up the tumor, and so will be tagged as Malignancy-histology. The tag should include the word "cells", as indicated by the boldface type.
This attribute is also commonly used in naming the tumor, so Histology strings often appear as part or all of a Malignancy-type. Since Malignancy-type strings can include tagged strings of other types, such Malignancy-types will have (at least) two tags: the whole string tagged as Malignancy-type, the histological description tagged as Histology, and possibly other descriptors such as Site or Developmental-state.
In the following examples, we would tag the complete string to the left of the dash as Malignancy-type and the underlined part as Malignancy-histology, whether it is just part of the Malignancy-type (e.g., #11) or all of it. (The italicized text after the dash is the definition of the term, not part of the text.) Where the histological description consists of more than one word, tag them as a single string inside the longer Malignancy-type string, not two separate strings (see last two examples).
* chronic myelogenous leukemia
-------------------- Malignancy-histology
---------------------------- Malignancy-type
We will tag all references to cell type as Malignancy-histology, whether or not they actually are in a description of a malignancy. Even if, for example, "epithelial cells" appear in a sentence also mentioning "adenoma", both terms should be tagged as Malignancy-histology. (See discussion under Malignancy-site.)
(A list to be added to.) Tag as Malignancy-histology:
This attribute specifies the body part(s) affected by a malignancy, including organs, parts of organs, and body systems as well as terms like "leg" and "elbow" that refer to sections of the body. Terms referring to type of tissue should be tagged as Malignancy-histology.
Like Malignancy-histology, Malignancy-site is frequently used for naming Malignancy-type; in fact, all the body parts mentioned in the tumor names are the sites of the (not necessarily primary) tumors, and so are tagged with this attribute. Examples of this kind include the following (attribute values are in boldface):
Tag body part names in references to metastases. Although metastasis references are not Malignancy-type, we are tagging body part names wherever they occur:
Sometimes a part of the body may be referred to with the word "area" or "region". It may be redundant, or the authors may be referring to a larger region than just the body part name that modifies it. Don't try to guess or figure it out or look it up, but just include it in the tagged string. But if "area" (or similar word) is accompanied by an identifier, the phrase probably refers to a very specific section of the body part mentioned or being discussed, so include the identifier as well.
(See below for terms that refer simultaneously to a cell or tissue type and to an organ or system of the body.)
Multiple body parts may be mentioned in conjunction, possibly referring either to a single value or to different values depending upon the context. For example, one abstract may always speak of "tumors of the head and neck", while another abstract may start off discussing "tumors of the head and neck" and later go on to separate discussions of "tumors of the head" and "tumors of the neck". Rather than read the whole abstract to decide whether such a conjoined mention at the beginning should be treated as one Site or as two, you should tag "head and neck" as a single Malignancy-site. (Actually, there aren't many Sites that are conjoined in this way; maybe the only other set is "small and large intestine".)
In a coordination like "tumors of the head and of the neck", where the second conjunct has its own preposition, tag the Sites separately.
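The two coordination rules above can be sketched as follows. This is a rough illustration, not project tooling: the function name is hypothetical, it only recognizes the preposition "of", and it assumes leading articles are left out of the Site tag (as in the "cancer of the lung" example, where only "lung" is tagged).

```python
import re


def split_sites(site_phrase):
    """Split the text after "tumors of" into Malignancy-site strings.

    A plain "X and Y" stays one Site; "X and of Y" (the second conjunct
    has its own preposition) becomes two Sites.
    """
    # Split only where "and" is followed by its own preposition "of".
    parts = re.split(r"\s+and\s+of\s+", site_phrase.strip())
    # Drop a leading article from each Site string.
    return [re.sub(r"^(the|a|an)\s+", "", part, flags=re.IGNORECASE) for part in parts]


print(split_sites("the head and neck"))
print(split_sites("the head and of the neck"))
```

The first call yields a single Site, "head and neck"; the second yields "head" and "neck" as separate Sites.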
Note that "cancer of the neck" is not a Malignancy-type because of the "no prepositions" rule for that attribute. But in such expressions, do tag "cancer" by itself as Malignancy-type:
cancer of the neck
              ----  Malignancy-site
------              Malignancy-type
cancer of the head and neck
              -------------  Malignancy-site
------                       Malignancy-type
head and neck cancer **
-------------         Malignancy-site
--------------------  Malignancy-type
** The Site ("head and neck") precedes "cancer" and there is no preposition.
We will tag all references to location in the body as Malignancy-site, whether or not they actually are in a description of a malignancy, and even if they appear to be redundant with another mention in the sentence. For example, if "epithelial cells" appear in a sentence also mentioning "adenoma", both terms should be tagged as Malignancy-histology. In
The patient presented at the ER with a sprained left ankle, but examination and tests revealed osteosarcoma in the left tibia.
tag as follows:
We are doing this for several reasons.
(A list to be added to.) Tag as Malignancy-site:
It is not always immediately clear whether to tag a reference as Histology or as Site.
Systems: Our domain experts have decided that references to a system of tissues in the body, such as "musculature" or "autonomic [nervous system]", should be tagged as Site rather than Histology.
Similarly, adjectives referring to a system (e.g., "neural") should generally be tagged as Site (compare "neuronal", which is Histology):
Both at once: A text string may refer both to cell or tissue type and to an organ or system of the body. For example, "lymphoma" refers to both lymphocytes (cell type, so Histology) and the lymphatic system (system, therefore Site). In order to avoid double tagging, we will tag such strings only as Malignancy-histology, which carries more specific information than Malignancy-site. The histology implies the site, but not necessarily vice versa.
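The precedence rule for strings that qualify as both Histology and Site could be written down like this. It is a minimal sketch; the function name and the idea of passing candidate tags as a list are assumptions made for illustration only.

```python
def resolve_histology_vs_site(candidate_tags):
    """Given the tags a string could take, drop Malignancy-site when
    Malignancy-histology also applies, to avoid double tagging."""
    tags = set(candidate_tags)
    if {"Malignancy-histology", "Malignancy-site"} <= tags:
        # Histology implies the site, so it is the tag we keep.
        tags.discard("Malignancy-site")
    return sorted(tags)


# "lymphoma" refers to lymphocytes (Histology) and the lymphatic system (Site).
print(resolve_histology_vs_site(["Malignancy-site", "Malignancy-histology"]))
```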
This attribute shows the degree of tumor cell differentiation. At an early stage of normal development, cells within a particular tissue are often similar in appearance and function, a condition that is described as "undifferentiated". As development proceeds, they often change in appearance, behavior, and/or molecular characteristics, including the ability to evolve into two or more distinct cell subtypes. Many tumor cells, however, don't follow the normal developmental process, but stop differentiating at some point. This attribute indicates where that point is by specifying the degree of tumor cell differentiation.
Differentiation status of a tumor is often described roughly with phrases like
Pathologists also have a number of numerical grading systems to describe the degree of differentiation of tumor cells more precisely, with different systems used for different kinds of tumors. Lower grades usually describe well-differentiated tumors, and higher grades poorly differentiated ones. Both the descriptive phrases and the systematic grade levels should be tagged as Malignancy-differentiation.
A malignancy can be partially or fully inheritable or can appear spontaneously without any similar family history. This attribute describes whether the malignancy in discussion has hereditary properties, that is, whether it can be transmitted from parent to child by information contained in the genes. The most common descriptions of this attribute are
NOTE: "congenital" does not refer to Malignancy-heredity-status. (A newborn child is a nine-month-old organism, and in that period can have developed a sporadic malignancy unrelated to the parents' germ plasm.)
(For brevity, "status" is omitted on the button in WordFreak, and probably in most of our discussions both oral and written.)
This category includes six tags:
Variations are extremely complex entities, actually involving a relationship between these components. Although there is a proposed standard notation to describe them, it is hardly ever used, and the literature contains a great many different ways of describing them.
Here are some examples of the categories that we are now using to describe Variation. These lists are not exhaustive; they keep growing as we look at files and you ask questions.
Specifies the kind of change in the genomic material in a particular instance of variation, or a particular group of instances. [2004-08-18]
Besides synonyms, there are many ways of referring to mutations. People may refer to any of these with or without the word "mutation". Someone may say something like "the transition". And so on.
Sometimes the name is used in an adjectival form, as in "point mutational activities". In this case we would tag "point mutational" even though "mutational" is grammatically an adjective. [2003-07-31]
activation
Phenomic, not genomic. [2005-06-28]
alterations, genetic alterations
These expressions are entirely too general.
[2004-08-18]
gain of function
Phenomic, not genomic. [2005-06-28]
inactivation
Phenomic, not genomic. [2005-06-28]
loss of function
Phenomic, not genomic. [2005-06-28]
methylation
Something that happens to the gene, not a change in the gene itself.
microsatellite instability
This is not a type of variation; it is a characteristic of the DNA that makes it prone to variation. The same applies to the following and other similar terms:
overexpression
Not a Variation-type. Defined as "excessive expression of a gene by producing too much of its effect or product" (Merriam-Webster Medical Dictionary via MedlinePlus). [2004-09-15]
The place within the genomic material where the change occurs. Most often the location is within a gene: [2004-08-18]
Described as codon position, such as:
Described as nucleotide or protein sequence position, such as:
Described as cytogenetic band, such as:
del(13q)
--- type
    --- loc
(tag "del" as type and "13q" as location).
The location may also be included in a single string of notation together with the original and altered states, as above or in the next section.
[2003-07-23] [2005-05-12] Sometimes the variation location can be a gene, when the entire gene rather than a part of it is the object of the variation. Only in such special cases do we double-tag the gene both as gene/RNA and as location. (See WordFreak instructions on double-tagging and clicking vs. dragging.)
NOTE: This does not apply to expressions like "a deletion mutation in the K-ras gene at codon 5", where the variation is specified as affecting a specific section of the gene rather than the whole gene. (See the discussion in the notes from 2003-08-19.) Even if no specific location within the gene is mentioned, as in "point mutations of the p53 gene", do not tag the gene as Location unless the entire gene is affected, which rules out double-tagging with most Variation-types. ("Deletion", unlike most other types, can refer to any scale in the genome, from a single nucleotide to a chromosome.) [2005-04-29]
* deletion of the K-ras gene
  type----        G/RNA
                  loc--
* translocation of the H-ras gene to location such-and-such
  type---------        G/RNA         loc-------------------
                       loc--
Such double-tagging can be called for with at least the following variation types:
[2003-07-23] A location can be specified as a range, like "codons 18 through 20".
in codons 18 through 20 of the Ki-ras gene
   Loc-----------------        G/RNA-
* from gene A to gene B
          G/R       G/R
Loc--------------------
* between genes A and B
              G/R   G/R
Loc--------------------
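A range given with numbers, like "codons 18 through 20" or "bp 23-25", is one Location string. The sketch below shows one way such a string might be recognized and broken into its unit and endpoints. The function name, the list of units, and the accepted range words are assumptions for illustration; the literature uses many more forms than this pattern covers.

```python
import re

# Hypothetical pattern: a unit word, a start number, a range marker, an end number.
RANGE_PATTERN = re.compile(
    r"^(?P<unit>codons?|exons?|introns?|bp)\s*"
    r"(?P<start>\d+)\s*(?:through|to|-)\s*(?P<end>\d+)$",
    re.IGNORECASE,
)


def parse_location_range(text):
    """Return (unit, start, end) for a numeric range, or None if the
    text is not a range this sketch recognizes."""
    match = RANGE_PATTERN.match(text.strip())
    if match is None:
        return None
    start, end = int(match.group("start")), int(match.group("end"))
    if start > end:
        return None  # not a sensible range
    return (match.group("unit"), start, end)


print(parse_location_range("codons 18 through 20"))
print(parse_location_range("bp 23-25"))
```

A single position such as "codon 5" is still a Location, but it is not a range, so this sketch returns None for it.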
[2003-12-18] A variation is a change from one state of the genome to another. We have separate tags for the original and the altered states, as well as a "state-generic" tag for use when it isn't clear from the notation and the immediate text whether a state is original or altered. (See an example here on the mailing list archives.)
The states may be expressed in prose, as in "change of glycine to alanine", or as a formula that shows the two states linked by an arrow or similar marker. Such a formula may also include the location, as several of these examples do. The original state is shown here in red italics, the altered state in green italics, and the location in blue (not italic).
Described as amino acid change
Described as nucleic acid change
[2004-04-05] This category refers to the variation as a whole. It is similar in concept to the un-subdivided "Variation" category we began with, but its scope is limited to names or terms that refer to a whole variation, not long strings of text that describe it. (See Variation-Event Introduction for a fuller explanation.)
We use the Variation-event tag in two circumstances.
Frequently a variation or group of variations is described in specific detail in one or two sentences and is subsequently referred to with a phrase like "the mutation" or "this deletion" or "these point mutations"; or the reference may precede the description, either in the title or in the text. As long as the reference is to a variation event that is specified in terms of location, type, and/or state, tag it as a Variation-event, excluding determiners (the, this, these, that, those, ...). If it refers to a group of variation events tag it only if they are described as a group sharing at least one kind of specification.
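Excluding determiners from a Variation-event tag can be sketched as below. The function name is hypothetical, and the determiner set contains only the words listed above; the list in the text ends with "...", so other determiners may need to be added.

```python
# Only the determiners named in the guideline; the list there is open-ended.
DETERMINERS = {"the", "this", "these", "that", "those"}


def variation_event_span(phrase):
    """Return the phrase with any leading determiners removed, i.e. the
    text that would actually be tagged as Variation-event."""
    words = phrase.split()
    while words and words[0].lower() in DETERMINERS:
        words.pop(0)
    return " ".join(words)


for reference in ("the mutation", "this deletion", "these point mutations"):
    print(variation_event_span(reference))
```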
Some genomic variations are common enough or important enough in research to have names of their own. Down's syndrome (trisomy 21) is so widespread that the name is familiar to many laypeople. Others that we have encountered in this project are
[2003-07-23] Important note: Use this tag only when there is at least some specific information about the variation: at least a location, type, or (any kind of) state. Do not include the specific information in the tagged text. It doesn't even have to be in the immediate vicinity, as long as it clearly applies to the text you're tagging as a variation event. Some examples:
in 26 cases of Philadelphia-chromosome-positive, bcr/abl-positive acute leukemia... (pm0158896, edited)
"Philadelphia-chromosome" and "bcr/abl" are names for specific variations. Their definitions include the type, location, and original and altered states. (You would have to know this, whether from a question or a reference tool; it's not evident in the context.) Tag the bolded expressions.
We investigated 12 such cancers for the genetic anomalies involved in the pathogenesis of gastrointestinal malignancies, including (a) those occurring in common-type cancers -- allelic losses at chromosomes 3p, 5q, ... (pm09154055)
"Genetic anomalies" is highly restricted by the context, which includes mentions of specific locations and types. We use the context to evaluate the reference, but we don't include it in the tag.
[2003-12-18] Some types of variation are more complex than others, or raise questions about how to tag them. Here are some specifics.
These are a complex type of variation, in which pieces of chromosomes get swapped around. Most of them involve a single exchange between two chromosomes:
| wild type: | chromosome A: | aaaaaaaaaaaaaaaaaaaaaaAAAAA |
|            | chromosome B: | bbbbbbbbbbbbbbbbbbBBBBBBBBB |
| variation: | chromosome A: | aaaaaaaaaaaaaaaaaaaaaaBBBBBBBBB |
|            | chromosome B: | bbbbbbbbbbbbbbbbbbAAAAA |
There's a fairly standard notation for these; e.g.,
t(1;15)(p36.3;q24.2)
That is:
Now, the original and altered state are implicit in this information, but they are not explicit there. There are two locations (here, 1p36.3 and 15q24.2), but they're not "before" and "after". But in annotating translocations we will tag one of the locations as "state-original" and the other as "state-altered". It doesn't theoretically matter which is tagged as which, but for consistency's sake let's tag the one mentioned first as original.
So we would tag this piece of notation,
t(1;15)(p36.3;q24.2)
like this:
| t | var-type |
| 1 (+) p36.3 | var-state-orig |
| 15 (+) q24.2 | var-state-alt |
with each of the two states being a two-part chain. When this annotation is transferred into the database it will be transformed into a more accurate description of the translocation, but there's no need for us to complicate your work by adding new sets of buttons for different types of variation.
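The tagging convention for this notation can be sketched in Python as follows. This is an illustration only: the function name and the regular expression are assumptions, it handles just the simple two-chromosome form shown above, and it is not the transformation that is applied when annotations are loaded into the database.

```python
import re

# Hypothetical pattern for the simple form t(A;B)(bandA;bandB).
TRANSLOCATION = re.compile(
    r"^(t)\((\w+);(\w+)\)\(([pq][\d.]+);([pq][\d.]+)\)$"
)


def tag_translocation(notation):
    """Split a translocation formula into the three tags we use.

    Each state is a two-part chain (chromosome, band). By convention
    the location mentioned first is tagged as the original state.
    """
    match = TRANSLOCATION.match(notation)
    if match is None:
        raise ValueError("not a simple two-chromosome translocation: " + notation)
    var_type, chrom_1, chrom_2, band_1, band_2 = match.groups()
    return {
        "var-type": var_type,
        "var-state-orig": (chrom_1, band_1),
        "var-state-alt": (chrom_2, band_2),
    }


print(tag_translocation("t(1;15)(p36.3;q24.2)"))
```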
deletion {Type} of bp 23-25 {Loc}
We will tag the base pair range as location, not state. (Assume that most of these examples begin with "deletion of", and tag "deletion" as type.)
... exon 6 {Loc}
Similarly, an exon, or an intron, or a codon, or a range of them, will be a location.
deletion {Type} resulting in GGCTT {State-orig} -> GT {State-alt}
Here we have explicit original and altered states, but no location.
... 3 base pairs in exon 6 {Loc}
We have a location, but the text doesn't say which base pairs are deleted, so we don't have any states or more precise location.
... D1S434-D1S228 {Loc}
This range specifies a location, in terms of markers that are used to identify specific regions. This is similar to saying "between genes X and Y" where the range between X and Y is the location.
... GCT {State-orig} at bp23-25 {Loc}
The nucleotide sequence GCT is the original state, located at base pairs 23-25.
One type of question that keeps coming up is "Should we tag XYZ as...?" Here are some types of term for which the answer is always No.
alleles
"In one patient, we observed 2 mutations that were shown to be located in different alleles." (PMID 1544704)
Alleles are not locations, but variant forms of a gene. The authors are not using "located" in the sense of our Variation-location. (archive)
cell lines
"...heterologous expression systems of the DAT gene (human embryonic kidney HEK-293 and mouse neuroblastoma Neuro-2A cells, respectively)." (PMID 11911843)
"HEK-293" and "Neuro-2A" are the names of cell lines. We do not tag cell lines as any kind of entity at all. But we do tag "neuroblastoma" here as Malignancy-type and Malignancy-histology, even though it is not referring to a tumor in a living organism. [2005-04-08] (archive)
2005-11-30