The expression data presented in NCBI's Gene resource is based on selected RNA-Seq projects, covering many tissues, that are used for NCBI's genome annotation.

The format of the XML data is optimized for loading into a Solr database. Solr is an open-source database; detailed documentation and download options are available at http://lucene.apache.org/solr/

Where other databases have a record as the basic unit of data, Solr has a document. Both a configuration summary (in header documents) and expression documents are present. There is a data document for each gene-sample combination and for each gene-source combination. The expression summary for a gene within a BioProject is generated by retrieving all the gene-source or gene-sample documents for that gene.

The schema to configure a Solr instance for this data is
ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/expression/NCBI_gene_exp.schema.xml

Header documents (documents are Solr's equivalent of records) characterize the tissue samples comprising a project. They have fields:

    is_metadata    (boolean flag)
    is_sample      (boolean flag)
    project_desc   (NCBI BioProject ID)
    taxid          (NCBI taxonomic ID)
    source_name    (name of tissue or developmental stage for the sample)
    sample_id      (NCBI BioSample ID)
    exp_Mcount     (megabases of reads from this sample aligned against the genome)

Expression documents contain gene-specific expression levels. Document types are gene-sample, gene-source, and gene summary. Fields are:

    gene:          NCBI Gene ID, taken from the genome annotation
    project_desc:  NCBI BioProject ID
    sample_id:     NCBI BioSample ID
    exp_total:     read depth times bases covered from this sample for this gene; unnormalized
    full_rpkm:     read depth normalized by gene length and by sequencing depth of the sample
    exp_rpkm:      full_rpkm rounded to three significant figures
    source_name:   label describing the tissue and, where relevant, the developmental stage
    supp_source:   tissue-only or developmental-stage-only portion of the source name, for indexing
    var:           variance of exp_rpkm, where several samples from a source are available
    entropy:       quantification of tissue-specific expression within a single BioProject:
                   Sum (p_i log (p_i)), with p_i being the expression from a single sample
                   divided by the total expression of the gene from all samples
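
As a rough illustration of how the documents for one gene within one BioProject might be retrieved, the sketch below builds a query URL for Solr's standard select handler. The host, port, and core name (gene_exp) are assumptions for a local installation and are not part of the NCBI data. The gene and BioProject identifiers are example values only, and the helper name expression_query_url is hypothetical.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Assumption: a local Solr instance with a core named "gene_exp".
SOLR_SELECT = "http://localhost:8983/solr/gene_exp/select"


def expression_query_url(gene_id, bioproject, rows=1000, base=SOLR_SELECT):
    """Build a select URL matching every document for one gene in one BioProject."""
    params = {
        # Field names come from the field lists above; the project ID is
        # quoted so it is matched as a single term.
        "q": f'gene:{int(gene_id)} AND project_desc:"{bioproject}"',
        "rows": rows,   # upper bound on the number of documents returned
        "wt": "json",   # ask Solr for a JSON response
    }
    return f"{base}?{urlencode(params)}"


# Example identifiers, for illustration only.
url = expression_query_url(7157, "PRJEB4337")
query = parse_qs(urlsplit(url).query)
print(query["q"][0])
```

The URL can then be fetched with any HTTP client. The query matches on gene and project_desc only, so it returns gene-sample and gene-source documents alike. How to tell the two apart depends on the schema file and is not assumed here.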
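
The derived fields exp_rpkm, var, and entropy can be sketched in a few lines of Python. This is not NCBI's implementation. The description does not state the logarithm base or whether the variance is a sample or population variance, so the natural logarithm and the sample variance are assumed here. The entropy sum is computed exactly as written above; the usual Shannon entropy is the negative of this value. All function names are hypothetical.

```python
import math
import statistics


def round_sig(value, sig=3):
    """Round value to `sig` significant figures (exp_rpkm uses three)."""
    if value == 0:
        return 0.0
    exponent = math.floor(math.log10(abs(value)))
    return round(value, sig - 1 - exponent)


def rpkm_variance(rpkm_values):
    """Sample variance of the RPKM values from several samples of one source.

    Assumption: sample variance (n - 1 denominator); needs two or more values.
    """
    return statistics.variance(rpkm_values)


def entropy_sum(expression_by_sample):
    """Sum of p_i * log(p_i), with p_i = sample expression / total expression.

    Assumptions: natural logarithm; terms with p_i == 0 contribute 0.
    """
    total = sum(expression_by_sample)
    if total == 0:
        return 0.0
    result = 0.0
    for value in expression_by_sample:
        p = value / total
        if p > 0:
            result += p * math.log(p)
    return result


print(round_sig(12.3456))
print(rpkm_variance([1.0, 2.0, 3.0]))
print(entropy_sum([1.0, 1.0, 1.0, 1.0]))
```

Under these assumptions, a gene expressed in only one sample gives a sum of 0. A gene expressed evenly across n samples gives log(1/n), which is the most negative value possible for n samples.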