Based on Trends in Genetics Genetic Nomenclature Guide (1998)
The current nomenclature guidelines are updates to rules established during a discussion session at a meeting in Ringberg, Germany, in March 1992, and are widely accepted by most zebrafish labs.
Zebrafish Nomenclature Committee (ZNC)
- Our Activities, [Contributors]
- For questions and advice about appropriate nomenclature, contact us at email@example.com.
Please use one of our submission forms to propose a new name for a gene or mutant and to provide supporting information. Your submission will be sent to the ZFIN nomenclature coordinator for review and will be treated in confidence.
- [1. Gene names and Symbols]
- [1.1. Genes identified by cloning]
- [1.2. Duplicated genes]
- [1.3. Genes (loci) identified by mutation]
- [1.4. Genes identified by genomic sequencing projects]
- [1.5. Genes identified only by other large scale projects]
- [1.6. Transcript variants]
- [2. Proteins]
- [3. Alleles and Genotypes]
- [3.1 Line designations]
- [3.2 Genotype nomenclature for publications]
- [3.3 Genotype display at ZFIN]
- [4. Chromosomes and aberrations]
- [4.1. Deficiencies]
- [4.2. Translocations]
- [4.3. Transgenic lines and constructs]
- [5. Priority in Names]
- [6. Mapping and Sequencing information]
- [7. Contributors]
- [8. References]
1. Gene names and Symbols
Full gene names are lowercase italic, and gene symbols are three or more lowercase letters and are also italicized. Symbols should be unique with respect to other named zebrafish mutants and genes, except in cases of established orthology, where the gene symbol should match that of the orthologue. Zebrafish gene designations should not be preceded by 'Z' or 'Zf'. The use of punctuation such as periods and hyphens in gene names or symbols is discouraged, except under the specific circumstances described below.
Gene names should be registered at ZFIN.
1.1. Genes identified by cloning
Genes should be named after the mammalian orthologue whenever possible. When mammalian orthologues are known, the same name and abbreviation should be used, except all letters are italicized and lower case. Members of a gene family are sequentially numbered.
Names - engrailed 1a, engrailed 2b
Symbols - eng1a, eng2b
In some cases, when a zebrafish gene has been renamed from an older zebrafish name to match the mammalian orthologue, it is still useful within a publication to refer to the previous name. Do so by appending the previous name in parentheses after the current one. Previous names are searchable at ZFIN.
Examples: shha (syu), bmp2b (swr)
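As a rough illustration of the two conventions above, the following sketch lowercases a mammalian symbol and appends a previous zebrafish name in parentheses. The function names are hypothetical, and the sketch deliberately ignores family numbering and duplicate suffixes.

```python
def from_mammalian(symbol: str) -> str:
    """Reuse the mammalian symbol, written entirely in lower case."""
    return symbol.lower()


def with_previous_name(current: str, previous: str) -> str:
    """Append the previous zebrafish name in parentheses for use in a publication."""
    return f"{current} ({previous})"


print(from_mammalian("Eng1"))              # eng1
print(with_previous_name("shha", "syu"))   # shha (syu)
```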
1.2. Duplicated genes
The zebrafish genome contains duplicated segments that resulted from a genome-wide duplication in the ray fin fish lineage after it diverged from the lobe fin lineage (that included avian and mammalian species). For this reason, zebrafish often have two copies of a gene that is present as a single copy in mammals.
In these cases, symbols for the two zebrafish genes should be the same as the approved symbol of the human or mouse orthologue followed by "a" or "b" to indicate that they are duplicate copies. Before these symbols are assigned, it is important to provide evidence by mapping that the two copies reside on duplicated chromosome segments. It is preferable that all copies in one of the duplicate chromosome segments use the same "a" or "b" suffix, although this will not always be possible for historical reasons. The a or b suffix does not indicate primacy of publication and will be assigned purely based on the suffix of the surrounding genes. This terminology should not be used for duplicates that resulted prior to the divergence of ray fin and lobe fin fish. In these cases it is preferable to use terminology that is most consistent with the mammalian nomenclature.
Examples: hoxa13a, hoxa13b
In some cases when there is a unique mammalian orthologue, but addition of the a, b suffixes would conflict with a different mammalian gene symbol, then numerical suffixes .1, .2 should be appended to the orthologous mammalian gene symbol instead of a, b. Tandem duplicates with a single mammalian orthologue may also be appended with a .1, .2, using the same symbol as the mammalian orthologue.
Examples: stat5.1, stat5.2
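The choice between the "a"/"b" suffixes and the ".1"/".2" suffixes can be sketched as follows. This is a minimal illustration only: the function name and the `other_mammalian_symbols` argument are assumptions, and the sketch does not check the mapping evidence that the guidelines require before symbols are assigned.

```python
def duplicate_symbols(mammalian_symbol, other_mammalian_symbols=frozenset()):
    """Return the pair of zebrafish symbols for two duplicate copies.

    other_mammalian_symbols is a hypothetical set of approved mammalian
    symbols that the new zebrafish symbols must not collide with.
    """
    base = mammalian_symbol.lower()
    taken = {s.lower() for s in other_mammalian_symbols}
    lettered = [base + "a", base + "b"]
    # Fall back to numerical suffixes when a lettered symbol would
    # conflict with a different mammalian gene symbol.
    if any(symbol in taken for symbol in lettered):
        return [base + ".1", base + ".2"]
    return lettered


print(duplicate_symbols("HOXA13"))                      # ['hoxa13a', 'hoxa13b']
print(duplicate_symbols("STAT5", {"STAT5A", "STAT5B"}))  # ['stat5.1', 'stat5.2']
```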
When mammalian gene duplications prevent identification of a unique mammalian orthologue, then an alternate gene symbol should be chosen. A possible choice would be an approved gene symbol from a unique non-mammalian orthologue. When a gene is homologous to a human gene, but orthology is ambiguous, the gene should be named after the closest mammalian homologue with the word 'like' appended to the name of the homologue. In some cases, a gene family described in zebrafish is homologous to a mammalian gene family but the evolution of the gene family is ambiguous. Under these circumstances the zebrafish gene family should be named with the same stem as the mammalian gene family with the gene number beginning after the end of the mammalian numbering and continuing sequentially throughout the gene family. If the members of the gene family are on the same chromosome, the adjacent genes should be given sequential numbers.
1.3. Mutant loci with unidentified genes
Mutant loci for which the gene has not yet been identified are given placeholder gene names. When the gene is identified, it is renamed following standard nomenclature guidelines as described above. Genes identified by mutation are typically named to reflect the mutant phenotype. The symbol should be derived from the full name. Numbers should generally not be used in naming a mutant.
Example: touchy feely, tuf
Mutant names should be registered at ZFIN.
1.4. Genes identified only by genomic sequencing projects
Large-scale genome sequencing projects use a variety of prediction methods to identify both open reading frames and genes. Some of these genes are already known, while others are new. Novel genes found by these means often cannot be matched to a known gene and are assigned a name composed of a prefix, a clone name, and an integer. The prefix specifies the research institution that identified the gene (e.g., "si" for the Sanger Institute). A colon separates the prefix from the clone identifier. In many cases, there are multiple predicted reading frames in a single clone. These genes are distinguished with a full stop (period) between the clone name and an integer. Integers are assigned to genes in the clone as they are identified and do not indicate the order of genes. If part of a gene is found in more than one clone, the name of the first clone in which the 5' portion of the gene is found takes precedence.
Examples: si:bz3c13.1, si:bz3c13.2, si:bz3c13.3
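The prefix, clone name, and integer can be pulled apart with a small parser. The sketch below is an illustration, not ZFIN's own validation: the regular expression is an assumption about which characters may appear in a prefix or clone name.

```python
import re
from typing import NamedTuple


class ClonePlaceholder(NamedTuple):
    prefix: str   # institution prefix, e.g. "si"
    clone: str    # clone identifier
    index: int    # integer assigned as genes in the clone are identified


# Assumed character sets: letters for the prefix; letters, digits and
# hyphens for the clone name.
_PLACEHOLDER = re.compile(r"^([A-Za-z]+):([A-Za-z0-9-]+)\.(\d+)$")


def parse_clone_placeholder(name: str) -> ClonePlaceholder:
    match = _PLACEHOLDER.match(name)
    if match is None:
        raise ValueError(f"not a prefix:clone.integer name: {name!r}")
    prefix, clone, index = match.groups()
    return ClonePlaceholder(prefix, clone, int(index))


print(parse_clone_placeholder("si:bz3c13.2"))
```

Because the integer does not indicate gene order, it should be treated as an identifier rather than a position.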
Genes initially identified by genomic sequencing projects are renamed using standard nomenclature guidelines (described above) as more information about them becomes available.
1.5. Genes identified only by other large scale projects
Large-scale sequencing of ESTs or full-length cDNA clone sets often results in large numbers of unidentified genes. These are given placeholder names with the project prefix, a colon, and a clone number, similar to genes identified by genomic sequencing projects. In these cases, the clones usually contain only one gene or a fragment of a single gene.
Examples: im:7044540, zgc:165514
1.6. Transcript variants
Transcript variants that originate from the same gene are not normally given different gene symbols and names. However, variants from a single gene can be distinguished in publications by adding to the end of the full name a comma, "transcript variant", and a serial number; and by adding to the end of the symbol an underscore, "tv", and a serial number.
Names - myosin VIa, transcript variant 1; myosin VIa, transcript variant 2
Symbols - myo6a_tv1, myo6a_tv2
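The two suffix rules can be expressed as one small helper. This is only a sketch, and the function name is hypothetical.

```python
def transcript_variant(gene_name: str, gene_symbol: str, serial: int):
    """Return (full name, symbol) for one transcript variant of a gene."""
    name = f"{gene_name}, transcript variant {serial}"
    symbol = f"{gene_symbol}_tv{serial}"
    return name, symbol


print(transcript_variant("myosin VIa", "myo6a", 1))
```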
2. Proteins
The protein symbol is the same as the gene symbol, but non-italic and the first letter is uppercase.
Examples: Ndrw, Brs, Eng1a, Eng2b, Ntl
Note the differences between zebrafish and mammalian naming conventions:
species / gene / protein
zebrafish / <i>shha</i> / Shha
human / SHH / SHH
mouse / Shh / SHH
In publications, it is sometimes convenient to refer to a protein which has been renamed based on orthology using the more commonly known name in parentheses following the current name.
Examples: Shha (Syu), Bmp2b (Swr)
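The capitalization differences between species can be summarized in code. The sketch below covers only letter case (italics are a typesetting matter) and uses a hypothetical function name; the three species rules are those shown in the comparison above.

```python
def gene_and_protein(symbol: str, species: str):
    """Return (gene symbol, protein symbol) in the case style of a species."""
    lower = symbol.lower()
    capitalized = lower[0].upper() + lower[1:]
    if species == "zebrafish":
        return lower, capitalized
    if species == "human":
        return lower.upper(), lower.upper()
    if species == "mouse":
        return capitalized, lower.upper()
    raise ValueError(f"no case rule recorded for species: {species!r}")


print(gene_and_protein("shha", "zebrafish"))  # ('shha', 'Shha')
print(gene_and_protein("shh", "human"))       # ('SHH', 'SHH')
print(gene_and_protein("shh", "mouse"))       # ('Shh', 'SHH')
```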
3. Alleles and Genotypes
3.1 Line designations
When describing genes, wild-type alleles are indicated using a superscript "+", while mutant alleles are indicated using a superscript line designation. Line designations are composed of an institution-specific designation followed by a number. The full list of institution designations can be found at ZFIN.
Institution-specific designations should be two or three letters in length, preferably two letters. These designations should not be the same as a gene name in mouse or human. The institution designation should be followed by a unique number specific to a particular line. Other letters should not immediately follow the institution designation but may be appended to the end of the line designation to make it unique. Line designations should only contain alphanumeric characters. Dominant alleles have a d in the first position of the line designation to distinguish them from recessive alleles. This means that the letter 'd' cannot begin an institution designation. Line designations for transgenic lines follow these same rules, so the same number cannot be given to both a transgenic line and a mutant allele.
Examples: b is the Eugene designation; m is for MGH, Boston; t is Tuebingen, Germany
wild type: <i>lof<sup>+</sup></i>, <i>ndr2<sup>+</sup></i>, <i>brs<sup>+</sup></i>
mutant: <i>lof<sup>dt2</sup></i>, <i>ndr2<sup>b16</sup></i>, <i>ndr2<sup>m101</sup></i>, <i>ndr2<sup>t219</sup></i>
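A line designation can be split into its parts with a regular expression. The sketch below is a hedged illustration: it assumes lowercase institution designations of one to three letters (the examples above include single-letter designations) and treats a leading "d" as the dominant marker, since no institution designation may begin with "d".

```python
import re
from typing import NamedTuple


class LineDesignation(NamedTuple):
    dominant: bool     # True when the designation starts with "d"
    institution: str   # institution designation, e.g. "b", "m", "t"
    number: int        # line number unique within the institution
    suffix: str        # optional trailing letters that make the line unique


# Assumption: institution designations are lowercase and 1-3 letters long.
_LINE = re.compile(r"^(d?)([a-ce-z][a-z]{0,2})(\d+)([A-Za-z]*)$")


def parse_line_designation(text: str) -> LineDesignation:
    match = _LINE.match(text)
    if match is None:
        raise ValueError(f"not a valid line designation: {text!r}")
    dominant, institution, number, suffix = match.groups()
    return LineDesignation(bool(dominant), institution, int(number), suffix)


print(parse_line_designation("dt2"))   # dominant allele from Tuebingen
print(parse_line_designation("b16"))   # recessive allele from Eugene
```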
3.2 Genotype nomenclature for publications
For unlinked loci, heterozygotes and homozygotes are distinguished by having each allele separated by a slash "/".
<i>ednrb1<sup>b140</sup>/ ednrb1<sup>+</sup></i> (heterozygote, can be abbreviated <i>ednrb1<sup>b140/+</sup></i>)
<i>ednrb1<sup>b140</sup>/ ednrb1<sup>b140</sup></i> (homozygote, can be abbreviated <i>ednrb1<sup>b140/b140</sup></i> or <i>ednrb1<sup>b140</sup></i>)
For homozygous genotypes, the genotype at each locus is listed in order according to Chromosome number, from 1 to 25, with a semicolon to separate loci on different chromosomes.
For heterozygous genotypes, loci on homologous chromosomes are separated by a slash.
For linked loci, the haplotype on each chromosome is written as a sequence, with a space separating syntenic loci, and loci are placed in the order they appear on the Linkage Group, top to bottom. Homologous chromosomes are separated by a slash, and non-homologous chromosomes are separated by semicolons.
<i>ednrb1<sup>b140</sup> cx41.8<sup>t1</sup>; slc24a5<sup>b16</sup></i>
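The ordering rules for a homozygous genotype can be sketched as a sort followed by grouping. In this illustration the function name is hypothetical, and the chromosome numbers and positions passed in are placeholder inputs chosen only to reproduce the grouping in the example above; they are not taken from mapping data.

```python
from itertools import groupby


def homozygous_genotype(loci):
    """Format loci given as (chromosome, position, gene, allele) tuples.

    Linked loci are ordered by position and separated by spaces;
    different chromosomes are ordered by number and separated by semicolons.
    """
    ordered = sorted(loci, key=lambda locus: (locus[0], locus[1]))
    groups = []
    for _, linked in groupby(ordered, key=lambda locus: locus[0]):
        groups.append(
            " ".join(f"{gene}<sup>{allele}</sup>" for _, _, gene, allele in linked)
        )
    return "<i>" + "; ".join(groups) + "</i>"


# Placeholder chromosome numbers and positions, for illustration only.
example = [
    (18, 0.0, "slc24a5", "b16"),
    (1, 2.0, "cx41.8", "t1"),
    (1, 1.0, "ednrb1", "b140"),
]
print(homozygous_genotype(example))
```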
Genotypes of unmapped loci are listed alphabetically within braces, following the genotypes of mapped loci on different chromosomes.
(edi is unmapped, all three loci are on different chromosomes)
Poorly resolved loci on the same chromosome are listed alphabetically within braces.
(poorly resolved loci on same chromosome)
<i>… cx41.8<sup>t1</sup></i> (poorly resolved loci in a known interval between mapped loci, all on same chromosome)
3.3 Genotype displays in ZFIN
Due to technical constraints, genotypes at ZFIN are shown in alphabetical order by gene, and then by allele designation. See below for display of complex genotypes involving transgenic or chromosomal rearrangements.
4. Chromosomes and aberrations
The chromosome numbering system corresponds to the old Linkage Group designations with what was LG 1 now named Chr 1. Chromosomes are designated by non-italic numerals, 1 to 25. Chromosome differences have not been observed between males and females.
Chr1 to Chr25
Chromosome rearrangements are indicated with the following prefixes, followed by the details within parentheses. See below for specific examples. Common prefixes include:
4.1. Deficiencies
The general format for naming a deficiency is:
Df indicates deficiency. The term xxx should describe the salient features of the deficiency, as determined by the investigator. In cases where the deficiency removes sequences from a named gene, the name should contain the standard symbol for that [gene]. If the deletion removes multiple genes, they should be listed in order, when known, separated by commas. The [line] designation should follow standard nomenclature conventions (institution designation followed by line number).
The chromosome where the deficiency maps should be specified by its number (##) using two digits (e.g., 03 for Chr 3) so that computers will order them properly.
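The reason for two-digit chromosome numbers is that plain string sorting compares character by character. The short sketch below, with a hypothetical helper name, shows the difference.

```python
def chromosome_label(number: int) -> str:
    """Return a zero-padded label such as 'Chr03' for zebrafish Chr 1-25."""
    if not 1 <= number <= 25:
        raise ValueError("zebrafish chromosomes are numbered 1 to 25")
    return f"Chr{number:02d}"


# Without padding, string sorting places Chr3 after Chr12 and Chr25.
print(sorted(["Chr12", "Chr3", "Chr25"]))
# With padding, string order matches numerical order.
print(sorted(chromosome_label(n) for n in (12, 3, 25)))
```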
When a gene is disrupted at one of the two breakpoints of the deficiency, please contact the nomenclature coordinator at ZFIN for assistance (firstname.lastname@example.org).
4.2. Translocations
The general format for naming translocations depends upon the type of translocation:
Reciprocal translocations have two separate chromosomal elements, and each element has a distinct name: T(Chr##;Chr##)xxxline#,##U.##L and T(Chr##;Chr##)xxxline#,##U.##L
T indicates translocation. The elements in the parentheses are the chromosomes involved; the lower-numbered chromosome is listed first, and the chromosomes are separated by a semicolon. The chromosomes should be specified by their numbers (##) using two digits (e.g., 03 for Chr 3) so that computers will order them properly.
The term xxx should describe some salient feature of the translocation, as determined by the investigator. In cases where the translocation moves a named gene primarily studied by the investigator, xxx would usually be the standard [symbol] for that gene. Alternatively, xxx could just be an experimental series number.
The [line] designation should follow standard nomenclature conventions (institution designation followed by line number). After the line designation comes a comma, and then a phrase that indicates the new order of the chromosomes, starting from the top of the chromosome as displayed by convention. The first number (##) is the Chr number, followed by upper case U to indicate the upper arm of a chromosome or by upper case L to indicate the lower arm of a chromosome. The location of the centromere is indicated by a period. No spaces. Translocations are written as an allele of a gene when the gene is disrupted at one of the breakpoints of the translocation. There can be as many as four alleles of a translocation.
T(Chr02;Chr12)ndr2b2131,02U.12L02L and T(Chr02;Chr12)ndr2b2131,12U.12L02L
This example illustrates a reciprocal translocation where a portion of the lower arm of Chr12 was translocated interstitially into the proximal lower arm of Chr2 and a portion of the lower arm of Chr2 was translocated to the distal lower arm of Chr12.
Resolved translocations arise when the two elements of a translocation separate and a mutant line carries just one of the elements. This results in the animal being monosomic for some chromosome regions and trisomic for others. In these cases, the mutant line is designated with just one of the elements rather than two as in the reciprocal designation above. The allele name remains the same to indicate the common origin and common breakpoint.
4.3. Transgenic lines and constructs
Transgenic constructs now have their own pages in ZFIN. Transgenic construct names are important because the construct name is used in the transgenic line nomenclature when the insertion is NOT an allele of a gene (see below).
4.3.1 Transgenic constructs
Tg indicates transgene. Within the parentheses, the most salient features of the transgene should be described. Brevity and clarity in the transgene name are favored, in general, over exhaustive detail. Regulatory sequences should be listed to the left of the colon, and coding sequences to the right of the colon. Not all transgenic constructs will have both promoter and coding elements, and in this case, the colon will not be used. In cases where a construct utilizes sequences from a named gene, it should contain the standard zebrafish lowercase symbol for that gene. For those cases where a specific transcript or transcript promoter of a gene is used, the transcript number or name should be used. Transcript names and numbers can be identified by BLASTing the sequence in question against the ZFIN Vega Transcripts database at ZFIN. On the BLAST results page the transcript name will be displayed in the transcript or clone column. (http://zfin.org/action/blast/blast)
It should be noted that the use of hyphens here is distinct from the use of hyphens in regulatory or coding sequence fusions as discussed below. The hyphen in transcript names is an integral part of the transcript name and demarcates the transcript number for a gene.
Example: Tg(pitx2-002:GFP) In this case an internal pitx2 gene promoter that generates the pitx2-002 transcript is driving expression of GFP.
Regulatory sequence could be derived from either an enhancer or promoter, and is denoted by the symbol of the regulated gene or gene transcript. Regulatory or coding sequence fusions should be separated by hyphens.
Example: Tg(TetRE:Mmu.Axin1-YFP) In this case the construct has a fusion protein of mouse Axin1 and YFP under the control of a tetracycline response element.
Example: Tg(EPV.Tp1-Ocu.Hbb2:hmgb1-mCherry) In this case the construct utilizes six copies of the promoter from the Terminal protein 1 gene (Tp1) from the Epstein-Barr Virus (EPV), upstream of the rabbit (Ocu) beta-globin (Hbb2) minimal promoter driving hmgb1 fused to mCherry.
Example: Tg(actb2:stk11-mCherry) In this case the construct has a fusion protein of stk11 and mCherry under the control of the actb2 promoter.
In cases where a number of constructs are generated with differing sizes of promoter elements, these may be specified within the parentheses as follows:
These examples represent two constructs containing a fusion protein of spectrin beta (sptb) and GFP driven by an upstream enhancer containing either 3.5kb or 6.0kb 5' to the hhex gene.
However, in a number of cases, the changes within the construct may be too small to change the number of kbp. In this case, the constructs will be appended with a period and a number within the parentheses, referring to the element that has changed, instead of including further details in the name. Alternatively, if the .1, .2 nomenclature conflicts with a gene name, then a number may be placed at the beginning of the construct name. The numbering should start with a 1 and increment by one for each different construct. The details of construct differences will be available on the construct pages.
Sometimes within a single construct, there are multiple cassettes, each containing regulatory and coding sequences. In this case, it is necessary to distinguish what is coding in the first cassette from what is regulatory in the second. Multiple cassettes may be distinguished using a comma. In the following example, isl3 promoter drives GAL4, and UAS drives GFP.
For those situations where a construct utilizes enhancers or promoters that regulate two or more genes, only one of the genes should be represented in the name such that the gene with the lowest number or gene closest to the promoter is listed.
Example: Tg(dlx1aIG:GFP) This construct utilizes intergenic (IG) regulatory elements of dlx1a and dlx2a to drive expression of GFP. In this case the lower numbered gene was listed in the name.
Example: Tg(zic4:Gal4TA4, UAS:mCherry) This construct utilizes an enhancer of the zic1 and zic4 genes to drive expression of Gal4TA4, with an additional cassette that has UAS driving mCherry expression. In this case the gene closest to the enhancer was listed in the name.
For those cases where a gene from a different species is used, the three-letter species abbreviation should be used (Homo sapiens (Hsa), Mus musculus (Mmu), Salmo salar (Ssa)), followed by a period and then the gene symbol. For human genes, use the standard gene symbol convention of all capital letters. For mouse and other species, the first letter of the gene symbol is capitalized.
Example: Tg(Hsa.FGF8:GFP) Here the promoter of the human FGF8 gene is driving expression of GFP.
Example: Tg(Ssa.Ndr2:GFP) Here the promoter of the salmon Ndr2 gene is driving expression of GFP.
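The species-prefix rule can be sketched as below. The function name is hypothetical, and the sketch assumes that every non-human symbol simply takes an initial capital followed by lower case.

```python
def foreign_gene(species_abbreviation: str, symbol: str) -> str:
    """Prefix a gene symbol from another species for use in a construct name."""
    if species_abbreviation == "Hsa":
        cased = symbol.upper()                         # human: all capitals
    else:
        cased = symbol[0].upper() + symbol[1:].lower()  # others: first letter capitalized
    return f"{species_abbreviation}.{cased}"


print(foreign_gene("Hsa", "fgf8"))   # Hsa.FGF8
print(foreign_gene("Ssa", "ndr2"))   # Ssa.Ndr2
```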
Enhancer, promoter, and gene-trap constructs may use Et, Pt, or Gt, all of which are considered types of transgenic constructs.
4.3.2 Enhancer trap, promoter trap, gene trap constructs
These all use the same nomenclature convention as described for transgenic constructs above, substituting Pt, Gt, Et as necessary.
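Putting the punctuation rules together (colon between regulatory and coding sequences, hyphens for fusions, commas between cassettes), a construct name could be assembled as in the sketch below. The function name and its input structure are assumptions made for illustration; element names are passed in already formatted.

```python
def construct_name(cassettes, kind="Tg"):
    """Build a construct name from (regulatory, coding) pairs of element lists.

    Either list in a pair may be empty, in which case no colon is written.
    """
    if kind not in {"Tg", "Et", "Gt", "Pt"}:
        raise ValueError(f"unknown construct type: {kind!r}")
    parts = []
    for regulatory, coding in cassettes:
        # Fused elements are joined by hyphens; empty sides are skipped.
        sides = ["-".join(side) for side in (regulatory, coding) if side]
        parts.append(":".join(sides))
    return f"{kind}({', '.join(parts)})"


print(construct_name([(["TetRE"], ["Mmu.Axin1", "YFP"])]))
print(construct_name([(["zic4"], ["Gal4TA4"]), (["UAS"], ["mCherry"])]))
```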
4.3.3 Transgenic lines
Transgenic lines are of two types, those that are known to create alleles of genes and those that are not known to create alleles of genes. For a line that does not create an allele of a gene, the feature name consists of the construct name appended with a unique line number with no superscript. The line number should begin with the laboratory designation followed by a unique number.
For lines that do create alleles of a gene, a standard genetic representation is used, where the allele designation is superscripted above the gene, but is appended with a Tg to indicate that it is a transgenic insertion allele. Details regarding the construct used will be available on the genotype page. Gene traps and enhancer traps known to create alleles of a gene are handled in a similar fashion, appending Gt or Et to the allele designation.
4.3.4 Display of complex genotypes at ZFIN
Genotypes at ZFIN are shown in alphabetical order with transgenic lines that are not alleles of genes first, then other alleles.
5. Priority in Names
As described above, zebrafish genes are named based on orthology to a human or mouse gene. If an ortholog cannot be identified, then the name that appears first in the literature will be given priority, assuming it follows other nomenclature guidelines. ZFIN recommends submission of proposed gene names via the ZFIN form or consultation with the zebrafish nomenclature committee (email@example.com) for nomenclature assignment.
When a mutation is found in a previously cloned zebrafish gene, the mutant will be referred to as an allele of the gene. If the cloned gene and the mutation are known by different names and are later found to be the same gene, then the name of the gene usually takes priority. The exception to this rule is when the mammalian gene has a gene symbol of fewer than two characters, such as the mouse gene brachyury, which has the symbol T. In this case the zebrafish gene retained the original name no tail, ntl.
6. Mapping and Sequencing information
The genome project began in 1994, and by 1996 the genetic map was closed. NIH funded major programs to develop a doubled haploid meiotic mapping panel, deficiency strains and expressed sequence tags (ESTs). The ESTs and anonymous markers have been mapped on two radiation-hybrid panels. The Sanger Institute began full genome sequencing in 2001. A physical map is being constructed from the BAC libraries used for sequencing. Genomic information is updated regularly on ZFIN.
7. Contributors
Marc Ekker (firstname.lastname@example.org), Center for Advanced Research in Environmental Genomics, University of Ottawa, Ontario, Canada
Mary Mullins (email@example.com), Department of Cell and Developmental Biology, University of Pennsylvania, USA
John Postlethwait (firstname.lastname@example.org), Institute of Neuroscience, University of Oregon, USA
Monte Westerfield (email@example.com), Institute of Neuroscience, University of Oregon, USA
Erik Segerdell, XenBase, University of Calgary, Canada
Melissa Haendel (firstname.lastname@example.org), Oregon Health and Sciences University, USA
Ceri Van Slyke (email@example.com), Zebrafish Information Network, University of Oregon, USA
Yvonne Bradford (firstname.lastname@example.org), Zebrafish Information Network, University of Oregon, USA
Amy Singer (email@example.com), Zebrafish Information Network, University of Oregon, USA
8. References
- The Zebrafish Science Monitor (1992) Sept. 21.
- Mullins, M. (1995) Genetic methods: conventions for naming zebrafish genes in The Zebrafish Book (3rd edition, Westerfield, M., ed.), pp 7.1-7.4, University of Oregon Press.
- Genetic Nomenclature Guide, Trends in Genetics (1998).