

One of GFF2's problems is that it is only able to represent one level of nesting of features. This is mainly a problem when dealing with genes that have multiple alternatively spliced transcripts: GFF2 is unable to deal with the three-level hierarchy of gene → transcript → exon. Most people get around this by declaring a series of transcripts and giving them similar names to indicate that they come from the same gene. The second limitation is that while GFF2 allows you to create two-level hierarchies, such as transcript → exon, it doesn't have any concept of the direction of the hierarchy, so it doesn't know whether the exon is a subfeature of the transcript or vice versa. This means you have to use "aggregators" to sort out the relationships. For this reason, GFF2 format has been deprecated in favor of GFF3 format databases. See GFF3 for more on the current version of GFF.

The GFF format is a flat tab-delimited file, each line of which corresponds to an annotation, or feature. Each line has nine columns and looks like this:

Chr1 curated CDS 365647 365963.

The reference sequence is the ID of the sequence that is used to establish the coordinate system of the annotation. In the example above, the reference sequence is "Chr1".

The source field describes how the annotation was derived. In the example above, the source is "curated" to indicate that the feature is the result of human curation. The names and versions of software programs are often used for the source field, as in "tRNAScan-SE/1.2".

The method field, also known as type, describes the type of the annotation, such as "CDS". Together the method and source describe the annotation type.

The start field is the start of the annotation relative to the reference sequence.

The stop field is the stop of the annotation relative to the reference sequence. Start is always less than or equal to stop.

For annotations that are associated with a numeric score (for example, a sequence similarity), the score field describes the score. The score units are completely unspecified, but for sequence similarities, it is typically percent identity.
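As an illustration of the column layout described in this section, the following Python sketch reads the six fields covered here (reference sequence, source, method, start, stop, score) from one tab-delimited line and checks that start is less than or equal to stop. It is a minimal sketch and not a reference parser. The names Gff2Feature and parse_gff2_line are hypothetical, and the remaining columns are kept unparsed. It assumes that a "." in the score column means that no score is given, and the sample line is a shortened six-column fragment modeled on the example in this section.

```python
from typing import NamedTuple, Optional, Tuple


class Gff2Feature(NamedTuple):
    """Hypothetical container for the fields described in this section."""
    reference: str            # ID of the reference sequence, e.g. "Chr1"
    source: str               # how the annotation was derived, e.g. "curated"
    method: str               # annotation type, e.g. "CDS"
    start: int                # start relative to the reference sequence
    stop: int                 # stop relative to the reference sequence
    score: Optional[float]    # None when no score is given
    rest: Tuple[str, ...]     # any remaining columns, left unparsed


def parse_gff2_line(line: str) -> Gff2Feature:
    """Split one tab-delimited GFF2 line into its leading fields."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 6:
        raise ValueError("expected at least 6 tab-separated columns")

    reference, source, method = fields[0], fields[1], fields[2]
    start, stop = int(fields[3]), int(fields[4])
    if start > stop:
        raise ValueError("start must be less than or equal to stop")

    # Assumption: "." marks an absent score.
    score = None if fields[5] == "." else float(fields[5])
    return Gff2Feature(reference, source, method, start, stop, score,
                       tuple(fields[6:]))


# Shortened sample line (six columns only), modeled on the example above.
feature = parse_gff2_line("Chr1\tcurated\tCDS\t365647\t365963\t.")
print(feature.reference, feature.source, feature.method,
      feature.start, feature.stop, feature.score)
```

Because the score units are unspecified, the sketch stores the score as a plain float and leaves its interpretation (for example, percent identity) to the caller.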
