ZFIN Zebrafish Nomenclature Conventions

Based on Trends in Genetics Genetic Nomenclature Guide (1998)


NOMENCLATURE INFORMATION

The current nomenclature guidelines are updates to rules established during a discussion session at a meeting in Ringberg, Germany, in March 1992, and are widely accepted by most zebrafish labs.

Zebrafish Nomenclature Committee (ZNC)

APPROVAL FOR GENE AND MUTANT NAMES

Please use one of our submission forms to propose a new name for a gene or mutant and to provide supporting information. Your submission will be sent to the ZFIN nomenclature coordinator for review and will be treated in confidence.

Submit a Proposed GENE Name
Submit a Proposed LOCUS/LINE Name

ADDITIONAL RESOURCES

A Tutorial for Proposing Zebrafish Gene Nomenclature
Laboratory Line Designations

Other nomenclature guidelines: Human, Mouse, Fly (Drosophila), Yeast (Saccharomyces), Gene families



CONTENTS:


1. GENE NAMES AND SYMBOLS

Full gene names are written in lowercase italics; gene symbols consist of three or more lowercase letters and are also italicized. Each symbol should be unique with respect to other named zebrafish mutants and genes. Gene symbols should not be the same as gene abbreviations in mouse or human, except in cases of established orthology, where the gene symbol should match that of the orthologue. Zebrafish gene designations should not include any reference to species, for example d, dr, z or zf. The use of punctuation such as periods and hyphens in gene names or symbols is discouraged, except under specific circumstances described below.
Gene names should be registered at ZFIN.
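
As a rough illustration of the symbol rules above, the following Python sketch flags symbols that break the mechanical parts of the convention. The function name check_gene_symbol is hypothetical and is not part of any ZFIN tool. It counts characters rather than letters (so symbols containing digits pass), and it cannot check uniqueness, orthology, or species references, which need curation. Symbols that legitimately use punctuation under the exceptions described below are still flagged, so a warning is a prompt for review, not a rejection.

```python
import re


def check_gene_symbol(symbol: str) -> list:
    """Return warnings for a proposed zebrafish gene symbol (heuristic only)."""
    warnings = []
    # Rule from the text: symbols are three or more characters long.
    if len(symbol) < 3:
        warnings.append("symbol should have three or more characters")
    # Rule from the text: symbols are lowercase.
    if symbol != symbol.lower():
        warnings.append("symbol should be lowercase")
    # Rule from the text: periods and hyphens are discouraged.
    if re.search(r"[.\-]", symbol):
        warnings.append("periods and hyphens are discouraged")
    return warnings


if __name__ == "__main__":
    for candidate in ("shha", "T", "prf1.9p"):
        print(candidate, check_gene_symbol(candidate))
```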


1.1. Gene Nomenclature
Genes should be named after the mammalian orthologue whenever possible. When mammalian orthologues are known, the same name and abbreviation should be used, except all letters are italicized and lower case. Members of a gene family are sequentially numbered.

...

Mutant names should be registered at ZFIN.


1.4. Genes identified only by genomic sequencing projects
Large-scale genome sequencing projects use a variety of prediction methods to identify both open reading frames and genes. Some of these genes are already known, while others are new. Novel genes found by these means often cannot be matched to a known gene and are assigned a name composed of a prefix, a clone name, and an integer. The prefix specifies the research institution that identified the gene (e.g., "si" for the Sanger Institute). A colon separates the prefix from the clone identifier. In many cases, there are multiple predicted reading frames in a single clone. These genes are distinguished by a full stop (period) between the clone name and an integer. Integers are assigned to genes in the clone as they are identified and do not indicate the order of genes. If part of a gene is found in more than one clone, the name of the first clone in which the 5' portion of the gene is found takes precedence.
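
The prefix:clone.integer layout described above can be taken apart mechanically. The Python sketch below is one possible way to do so; the function and class names are hypothetical, and the sample name "si:ch211-51e12.7" is used only to illustrate the pattern. It assumes the prefix is made of lowercase letters and that the final period in the name separates the clone from the integer.

```python
import re
from typing import NamedTuple


class SequencingGeneName(NamedTuple):
    prefix: str  # institution code, e.g. "si" for the Sanger Institute
    clone: str   # clone identifier
    number: int  # integer assigned to the gene within the clone


# Assumed layout: <prefix>:<clone>.<integer>
_NAME_RE = re.compile(r"^(?P<prefix>[a-z]+):(?P<clone>.+)\.(?P<number>\d+)$")


def parse_sequencing_gene_name(name: str) -> SequencingGeneName:
    """Split a sequencing-project gene name into its three parts."""
    match = _NAME_RE.match(name)
    if match is None:
        raise ValueError(f"not a prefix:clone.integer name: {name!r}")
    return SequencingGeneName(
        match["prefix"], match["clone"], int(match["number"])
    )


if __name__ == "__main__":
    print(parse_sequencing_gene_name("si:ch211-51e12.7"))
```

Because the integer only records the order of identification, the parsed number should not be used to infer gene order within the clone.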

...

Pseudogenes are sequences that are generally untranscribed and untranslated and that have high homology to identified genes. However, it has recently been shown that functional activation may occur in different organisms or tissues. Pseudogenes will be assigned the next number in the relevant symbol series, suffixed by a "p" for pseudogene; e.g., prf1.9p is the symbol for "perforin 1.9, pseudogene".


2. PROTEINS

The protein symbol is the same as the gene symbol, but non-italic and the first letter is uppercase.

...

     Examples: Shha (Syu), Bmp2b (Swr)
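
The gene-to-protein rule stated at the start of this section is simple enough to sketch in Python. The helper name protein_symbol is hypothetical; italics cannot be represented in a plain string, so only the capitalization part of the rule is shown.

```python
def protein_symbol(gene_symbol: str) -> str:
    """Derive a protein symbol: same as the gene symbol, first letter uppercase."""
    return gene_symbol[:1].upper() + gene_symbol[1:]


if __name__ == "__main__":
    for gene in ("shha", "bmp2b"):
        print(gene, "->", protein_symbol(gene))
```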


3. ALLELES AND GENOTYPES


3.1 Line designations

...

Due to technical constraints, genotypes at ZFIN are shown in alphabetical order by gene, and then by allele designation. See below for display of complex genotypes involving transgenic or chromosomal rearrangements.


4. CHROMOSOMES AND ABERRATIONS

The chromosome numbering system corresponds to the old Linkage Group designations with what was LG1 now named Chr1. Chromosomes are designated by non-italic numerals, 1 to 25. Reminder: cytogenetically identified chromosome numbers differ from the ‘Chr’ designations used for linkage groups and the reference genome sequence. Chromosome differences have not been observed between males and females in laboratory strains.

...

Df indicates deficiency. The term xxx should describe the salient features of the deficiency, as determined by the investigator. In cases where the deficiency removes sequences from named genes, the name should contain the standard symbols for those genes. The deleted genes should be listed in order, when known, separated by commas. The line designation should follow standard nomenclature conventions (institution designation followed by line number).

...

The term xxx should describe some salient feature of the translocation, as determined by the investigator. In cases where the translocation moves a named gene primarily studied by the investigator, xxx would usually be the standard symbol for that gene. Alternatively, xxx could just be an experimental series number.

The line designation should follow standard nomenclature conventions (institution designation followed by line number). After the line designation comes a comma, and then a phrase that indicates the new order of the chromosomes, starting from the top of the chromosome as displayed by convention. The first number (##) is the Chr number, followed by upper case U to indicate the upper arm of a chromosome or by upper case L to indicate the lower arm of a chromosome. The location of the centromere is indicated by a period. No spaces. Translocations are written as an allele of a gene when the gene is disrupted at one of the breakpoints of the translocation. There can be as many as four alleles of a translocation.

...

Tg indicates transgene. Within the parentheses, the most salient features of the transgene should be described. Brevity and clarity in the transgene name are favored, in general, over exhaustive detail. Regulatory sequences, which can be derived from either an enhancer or promoter, should be listed to the left of the colon. In general, the regulatory sequence is named for the gene from which it was derived or the gene/transcript that it regulates. Coding sequences are placed to the right of the colon. Not all transgenic constructs have both regulatory and coding elements; when one is absent, the colon is not used. In cases where a construct utilizes sequences from a named gene, it should contain the standard zebrafish lowercase symbol for that gene. The entire transgene name should be italicized.

  • Enhancer trap, promoter trap, gene trap constructs: These all use the same nomenclature conventions as described for transgenic constructs, substituting Et, Pt, Gt as necessary.

...

  • Transgenes with transcripts in constructs: For those cases where a specific transcript or transcript promoter of a gene is used, the transcript number or name should be used. It should be noted that the use of hyphens here is distinct from the use of hyphens in regulatory or coding sequence fusions as discussed below. The hyphen in transcript names is an integral part of the transcript name and demarcates the transcript number for a gene.

     Example: Tg(pitx2-002:GFP) In this case an internal pitx2 gene promoter that generates the pitx2-002 transcript is driving expression of GFP.

...

  • Fusions in constructs: Regulatory or coding sequence fusions should be separated by hyphens.

     Example: Tg(actb2:stk11-mCherry)  This construct codes for a fused protein of stk11 and mCherry under the control of the actb2 promoter.

     Example: Tg(EPV.Tp1-Ocu.Hbb2:hmgb1-mCherry)  In this case the construct utilizes six copies of the promoter from the Terminal protein 1 gene (Tp1) from the Epstein-Barr Virus (EPV), upstream of the rabbit (Ocu) beta-globin (Hbb2) minimal promoter driving hmgb1 fused to mCherry.

  • Promoter elements of differing sizes in constructs: In cases where a number of constructs are generated with different sizes of promoter elements, these may be specified within the parentheses using the length of the upstream DNA:

     Examples:
     Tg(-3.5hhex:sptb-GFP)
     Tg(-6.0hhex:sptb-GFP)
     These examples represent two constructs that code for a fusion protein of sptb and GFP driven by an upstream enhancer either 3.5 kb or 6.0 kb 5' to the hhex gene.

     However, in many cases, the differences between constructs may be too small or too complex to be reflected in the number of kbp, or the length cannot be determined. To differentiate these constructs, a sequential number is inserted between the Tg (also Et, Pt, Gt) and the parentheses, instead of including further details in the name.  Details will be provided in the notes field on the construct page.

      Examples: original construct: Tg1(uxs1:GFP); subsequent construct: Tg2(uxs1:GFP); additional constructs: Tg#(uxs1:GFP)

  • Foreign genes used in constructs: For those cases where a gene from a different species is used, the three-letter species abbreviation should be used (Homo sapiens [Hsa], Mus musculus [Mmu], Salmo salar [Ssa]) followed by a period and the gene symbol. For human genes, use the standard gene symbol convention of all capital letters. For mouse and other species, the first letter of the gene symbol is capitalized. An exception to the three-letter rule is Chlamydomonas reinhardtii. Please use Cr for this organism, as the three-letter abbreviation (Cre) conflicts with the abbreviation for the Cre-Lox system.

     Example: Tg(Hsa.FGF8:GFP)  Here the promoter of the human FGF8 gene is driving expression of GFP.

     Example: Tg(Ssa.Ndr2:GFP)  Here the promoter of the salmon Ndr2 gene is driving expression of GFP.

  • Mutations used in constructs: When a mutated form of a gene is used in a construct, the mutation/s in the gene can be included in the construct name. The variations should be represented at the most basic level, describing either DNA or amino acid changes.  Manuscript descriptions of the mutated sequence should always be related to a reference sequence (accession number) in order to be relevant and informative. The accession number will be added to the construct page.

     Example: Tg(cav3:cav3_R26Q-GFP) The mutation results in an amino acid substitution in which arginine at position 26 is replaced by glutamine.

     Example: Tg(Hsa.MPZ_1026T>A:EGFP) The nucleotide mutation is in human gene MPZ at position 1026 where T has been replaced by A.

  • Clones in constructs: Transgenic constructs using modified clones, such as BACs and PACs, should be named with the clone type inserted between the "Tg" and the "(". The accession number of the clone must be included in the publication, so it can be associated with the construct. A link to the appropriate clone will be added to the construct page.

    Example: TgPAC(tal1:GFP) GFP is inserted within or near the coding sequence of tal1 in the PAC with the GenBank# AL592495.

  • Two or more cassettes in one construct: If there are two or more cassettes in a construct, it is necessary to distinguish between cassettes by using a comma.

     Example: Tg(isl2b:GAL4,UAS:GFP) Here, the isl2b promoter drives GAL4, and UAS drives GFP.

  • Two or more distinct constructs inserted at the same locus: If 2 or more independently injected constructs are experimentally demonstrated to be integrated at the same locus, each construct should be separated by a comma. In this case, the line will be assigned one line designation (allele) number.  Note: if it is later determined that the constructs integrated in different loci, an additional line number will be needed.

     Example: Tg(sox9a:mCherry),Tg(usx1:YFP)line#

  • One promoter drives two or more coding sequences in construct: When one promoter is used to drive more than one coding sequence, a comma is used to separate the gene names.  This includes uni- and bidirectional promoters.

     Example: Tg(abhd2a:YFP,mCherry)

  • Construct using a regulatory element that regulates more than one gene in vivo: For those situations where a construct utilizes enhancers or promoters that regulate two or more genes in vivo, only one of the genes should be represented in the name: either the gene with the lowest number or the gene closest to the promoter.

     Example: Tg(dlx1a:GFP) This construct utilizes regulatory elements of dlx1a and dlx2a to drive expression of GFP.  In this case the lower-numbered gene is listed in the name.

     Example: Tg(zic4:Gal4TA4, UAS:mCherry) This construct utilizes an enhancer of both the zic1 and zic4 genes to drive expression of Gal4TA4, with an additional cassette that has UAS driving mCherry expression.  In this case, the gene closest to the enhancer was listed in the name.
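
The construct conventions listed above (type prefix, optional sequential number, optional clone type, comma-separated cassettes, and a colon between regulatory and coding sequences) can be sketched as a small Python parser. All class and function names here are hypothetical and not part of any ZFIN software. The sketch assumes that a sequential number, if present, precedes the clone type, which the guidelines do not state. It also treats every comma as a cassette boundary, so a name such as Tg(abhd2a:YFP,mCherry), where one promoter drives two coding sequences, yields a second cassette with no regulatory element; resolving that ambiguity needs knowledge of the construct itself.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Cassette:
    regulatory: Optional[str]  # text left of the colon, if a colon is present
    coding: str                # text right of the colon, or the whole cassette


@dataclass
class Construct:
    kind: str                  # Tg, Et, Pt, or Gt
    series: Optional[int]      # sequential number, as in Tg2(...)
    clone_type: Optional[str]  # BAC or PAC, as in TgPAC(...)
    cassettes: List[Cassette]


_CONSTRUCT_RE = re.compile(
    r"^(?P<kind>Tg|Et|Pt|Gt)(?P<series>\d+)?(?P<clone>BAC|PAC)?\((?P<body>.*)\)$"
)


def parse_construct(name: str) -> Construct:
    """Split a construct name into its type, modifiers, and cassettes."""
    match = _CONSTRUCT_RE.match(name.strip())
    if match is None:
        raise ValueError(f"unrecognized construct name: {name!r}")
    cassettes = []
    for part in match["body"].split(","):
        part = part.strip()
        regulatory, colon, coding = part.partition(":")
        if colon:
            cassettes.append(Cassette(regulatory, coding))
        else:
            # No colon: the cassette has no separate regulatory element.
            cassettes.append(Cassette(None, part))
    series = int(match["series"]) if match["series"] else None
    return Construct(match["kind"], series, match["clone"], cassettes)


if __name__ == "__main__":
    for example in ("Tg(isl2b:GAL4,UAS:GFP)", "TgPAC(tal1:GFP)", "Tg2(uxs1:GFP)"):
        print(parse_construct(example))
```

Hyphens, periods, and underscores are deliberately left inside each field, since they carry meaning (fusions, transcript numbers, species prefixes, and mutations) that depends on context.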



4.3.2 Enhancer trap, promoter trap, gene trap constructs

These all use the same nomenclature convention as described for transgenic constructs above, substituting Et, Pt, or Gt as necessary.


4.3.3 Transgenic lines

...

      Tg(-0.7her5:EGFP)ne2067;hmgcrb s617/s617


5. PRIORITY IN NAMES

As described above, zebrafish genes are named based on orthology to a human or mouse gene. If an ortholog cannot be identified, then the name that appears first in the literature will be given priority assuming it follows other nomenclature guidelines. ZFIN recommends submission of proposed gene names via the ZFIN form or consultation with the zebrafish nomenclature committee (nomenclature@zfin.org) for nomenclature assignment.

When a mutation is found in a previously cloned zebrafish gene, the mutant will be referred to as an allele of the gene. If the cloned gene and the mutation are known by different names and are later found to be the same gene, the name of the gene usually takes priority. The exception to this rule is when the mammalian gene has a symbol of fewer than two characters, such as the mouse gene brachyury, which has the symbol T. In this case the zebrafish gene retained the original name no tail, ntl.


6. MAPPING AND SEQUENCING INFORMATION

The genome project began in 1994, and by 1996 the genetic map was closed. NIH funded major programs to develop a doubled haploid meiotic mapping panel, deficiency strains and expressed sequence tags (ESTs). The ESTs and anonymous markers have been mapped on two radiation-hybrid panels. The Sanger Institute began full genome sequencing in 2001. A physical map is being constructed from the BAC libraries used for sequencing. Genomic information is updated regularly on ZFIN.


7. CONTRIBUTORS

Current Nomenclature Coordinator:
Amy Singer (asinger@zfin.org), ZFIN Database Team, Zebrafish Information Network, University of Oregon, USA

...

Past Contributors:
Erik Segerdell, XenBase, University of Calgary, Canada
Melissa Haendel (haendel@ohsu.edu), Oregon Health & Science University, USA
Ceri Van Slyke (van_slyke@uoneuro.uoregon.edu), Zebrafish Information Network, University of Oregon, USA
Yvonne Bradford (ybradford@zfin.org), Zebrafish Information Network, University of Oregon, USA


8. REFERENCES

  1. The Zebrafish Science Monitor (1992) Sept. 21.
  2. Mullins, M. (1995) Genetic methods: conventions for naming zebrafish genes in The Zebrafish Book (3rd edition, Westerfield, M., ed.), pp 7.1-7.4, University of Oregon Press.
  3. Genetic Nomenclature Guide, Trends in Genetics (1998).

...

For questions and advice about appropriate nomenclature, contact us at nomenclature@zfin.org.