###################################################################
README for ftp://ncbi.nlm.nih.gov/refseq/release/release-catalog

Last updated: February 13, 2004

###################################################################

_________________________________________________________________________

       
       National Center for Biotechnology Information (NCBI)
             National Library of Medicine
             National Institutes of Health
             8600 Rockville Pike
             Bethesda, MD 20894, USA
             tel: (301) 496-2475
             fax: (301) 480-9241
             e-mail: info@ncbi.nlm.nih.gov
             
_________________________________________________________________________


This directory contains files that document the contents of the RefSeq release,
both as an accession list and as a file list.  It also contains a report of
records that were included in the previous release but are not included in the
current release.


Files included are:

  RefSeq-release#.catalog 
  release#.files.installed
  release#.removed-records

  where '#' is the release number

Subdirectories:
  archive - previous release catalogs are available here

==========================================
RefSeq-release#.catalog 
==========================================

Content: Tab-delimited listing of all accessions included in the current 
RefSeq release.

Columns:
 1. taxonomy ID
 2. species name
 3. accession.version
 4. gi
 5. refseq release directories in which the accession is included
      complete + other directories
      '|' delimited
 6. refseq status
      na - not available; status codes are not applied to most genomic records
      INFERRED
      PREDICTED
      PROVISIONAL
      VALIDATED
      REVIEWED
      MODEL
      UNKNOWN - status code not provided, although one is usually provided
                for this type of record
 7. length     
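
The seven columns above can be read with a few lines of Python.  This is a
minimal sketch, not an NCBI-provided tool.  The names CatalogRecord and
read_catalog are hypothetical, and the sample line is made up for illustration
rather than taken from a real release.  It assumes every data line has at
least seven tab-separated fields and that columns 1 and 7 are integers.

```python
import io
from typing import Iterator, List, NamedTuple


class CatalogRecord(NamedTuple):
    """One line of RefSeq-release#.catalog (hypothetical helper type)."""
    tax_id: int
    species: str
    accession_version: str
    gi: str
    directories: List[str]  # column 5, split on '|'
    status: str
    length: int


def read_catalog(handle) -> Iterator[CatalogRecord]:
    """Yield one CatalogRecord per tab-delimited line of a catalog file."""
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 7:
            continue  # skip blank or short lines
        yield CatalogRecord(
            tax_id=int(fields[0]),
            species=fields[1],
            accession_version=fields[2],
            gi=fields[3],
            directories=fields[4].split("|"),
            status=fields[5],
            length=int(fields[6]),
        )


# Made-up sample line; the accession, gi, and directory names are placeholders.
sample = "9606\tHomo sapiens\tNM_000000.1\t12345\tcomplete|other_dir\tPROVISIONAL\t1500\n"
records = list(read_catalog(io.StringIO(sample)))
print(records[0].accession_version, records[0].directories)
```

For a real catalog, pass an open file handle instead of the io.StringIO
object.  If the catalog is distributed compressed, open it in text mode with
gzip.open(path, "rt").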


==========================================
release#.files.installed
==========================================

Complete listing of sequence files installed for the current 
release.

The file name indicates the directory node, molecule type, and format type. 
Multiple files may be provided for any given molecule and format type, so file 
names include a numerical increment.  Files with the same numerical increment
are related by content: they are all derived from the same ASN.1 file.

Name format:

 complete10.bna.gz
|-------|--|---|--|
   1     2   3  4

   1. directory location 
   2. numerical increment
   3. format type 
   4. compression 
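
The name format above can be split into its parts with a regular expression.
This is a sketch under stated assumptions: the pattern and the names FILE_NAME,
ReleaseFile, and parse_file_name are hypothetical, the directory part is
assumed to consist of letters and underscores only, and the optional molecule
type (genomic, rna, protein) is taken from the example file names listed in
this README.

```python
import re
from typing import NamedTuple, Optional

# Assumed pattern: <directory><increment>.[<molecule>.]<format>.gz
FILE_NAME = re.compile(
    r"^(?P<directory>[A-Za-z_]+)"
    r"(?P<increment>\d+)"
    r"\.(?:(?P<molecule>genomic|rna|protein)\.)?"
    r"(?P<format>[a-z]+)"
    r"\.(?P<compression>gz)$"
)


class ReleaseFile(NamedTuple):
    directory: str
    increment: int
    molecule: Optional[str]  # None for files such as complete10.bna.gz
    format: str
    compression: str


def parse_file_name(name: str) -> Optional[ReleaseFile]:
    """Return the parts of a release file name, or None if it does not match."""
    match = FILE_NAME.match(name)
    if match is None:
        return None
    return ReleaseFile(
        directory=match["directory"],
        increment=int(match["increment"]),
        molecule=match["molecule"],
        format=match["format"],
        compression=match["compression"],
    )


print(parse_file_name("complete10.bna.gz"))
print(parse_file_name("complete10.genomic.fna.gz"))
```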

Note that for some molecule and format types, a numerical increment is skipped.
This is not an error.  RefSeq release processing first produces a set of 
split ASN.1 files, which are then used to export the records by molecule and 
format type.  If an ASN.1 file does not include any records for a given molecule 
type, such as genomic sequence data, then the corresponding 'genomic' FASTA and 
flatfile files are not produced for that increment.

For example:

 complete10.bna.gz
 complete10.genomic.fna.gz
 complete10.genomic.gbff.gz
 complete10.protein.faa.gz
 complete10.protein.gpff.gz
 complete10.rna.fna.gz
 complete10.rna.gbff.gz

If complete10.bna.gz includes genomic, RNA, and protein data, then the full set 
of files listed above is provided.


In contrast, if complete24.bna.gz includes only genomic and protein data, then 
the corresponding rna files are not provided:

 complete24.bna.gz
 complete24.genomic.fna.gz
 complete24.genomic.gbff.gz
 complete24.protein.faa.gz
 complete24.protein.gpff.gz
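
To check which increments legitimately lack a molecule type, the file list can
be grouped by its leading directory-plus-increment prefix.  The sketch below is
illustrative only.  The function names are hypothetical, and it assumes that
molecule-specific names have exactly four dot-separated parts, as in the
examples above.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set

MOLECULE_TYPES = ("genomic", "rna", "protein")


def molecule_types_by_prefix(file_names: Iterable[str]) -> Dict[str, Set[str]]:
    """Map each prefix (e.g. 'complete24') to the molecule types present."""
    found: Dict[str, Set[str]] = defaultdict(set)
    for name in file_names:
        parts = name.split(".")
        prefix = parts[0]
        found[prefix]  # register the prefix even if only the .bna file exists
        if len(parts) == 4 and parts[1] in MOLECULE_TYPES:
            found[prefix].add(parts[1])
    return dict(found)


def missing_molecule_types(file_names: Iterable[str]) -> Dict[str, List[str]]:
    """Return, per prefix, the molecule types with no exported files."""
    report = {}
    for prefix, present in molecule_types_by_prefix(file_names).items():
        missing = sorted(set(MOLECULE_TYPES) - present)
        if missing:
            report[prefix] = missing
    return report


names = [
    "complete10.bna.gz",
    "complete10.genomic.fna.gz",
    "complete10.protein.faa.gz",
    "complete10.rna.fna.gz",
    "complete24.bna.gz",
    "complete24.genomic.fna.gz",
    "complete24.protein.faa.gz",
]
print(missing_molecule_types(names))
```

A prefix that appears in the report is not necessarily a problem: as described
above, a molecule type is absent whenever the source ASN.1 file holds no
records of that type.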


==========================================
release#.removed-records
==========================================

Content: Tab-delimited report of records that were included in the previous 
release but are not included in the current release.

Columns:
 1. taxonomy ID
 2. species name
 3. accession.version
 4. gi
 5. refseq release directories in which the accession is included
      complete + other directories
      '|' delimited
 6. refseq status
      na - not available; status codes are not applied to most genomic records
      INFERRED
      PREDICTED
      PROVISIONAL
      VALIDATED
      REVIEWED
      MODEL
      UNKNOWN - status code not provided, although one is usually provided
                for this type of record
 7. length     
 8. removed status
      dead protein: protein was removed when the genomic record was reloaded and
                    the protein was not found in the nucleotide update.  This is
                    an implied permanent suppression.

      temporarily suppressed: record was temporarily removed and may be restored
                              at a later date.

      permanently suppressed: record was permanently removed. It is possible to
                              restore this type of record; however, at the time
                              of removal that action is not anticipated.

      replaced by accession:  the accession in column 3 has become a secondary 
                              accession to the accession cited in column 8.
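
A quick way to summarize a removed-records report is to tally column 8.  The
sketch below is an illustration, not a description of the exact file contents.
It assumes that column 8 begins with one of the four status phrases listed
above, with the replacing accession following "replaced by".  The function
names and sample lines are made up.

```python
import io
from collections import Counter

# Assumed leading phrases for column 8, taken from the list above.
STATUS_PREFIXES = (
    "dead protein",
    "temporarily suppressed",
    "permanently suppressed",
    "replaced by",
)


def classify_removed_status(value: str) -> str:
    """Return the matching status phrase, or 'other' if none matches."""
    text = value.strip().lower()
    for prefix in STATUS_PREFIXES:
        if text.startswith(prefix):
            return prefix
    return "other"


def tally_removed(handle) -> Counter:
    """Count removed records by the status found in column 8."""
    counts: Counter = Counter()
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            continue  # skip blank or short lines
        counts[classify_removed_status(fields[7])] += 1
    return counts


# Made-up sample lines; accessions and gi numbers are placeholders.
sample = (
    "9606\tHomo sapiens\tNP_000000.1\t111\tcomplete\tPROVISIONAL\t300\tdead protein\n"
    "9606\tHomo sapiens\tNM_000001.1\t222\tcomplete\tREVIEWED\t1500\treplaced by NM_000002\n"
    "10090\tMus musculus\tNM_000003.1\t333\tcomplete\tPREDICTED\t900\tpermanently suppressed\n"
)
print(tally_removed(io.StringIO(sample)))
```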