-------------Data files for RGASP 2009---------------

RNA-Seq input files for RGASP competition.
Data not to be used for publications without written permission,
see http://www.genome.gov/ENCODE/#3

Contact: Felix Kokocinski, fsk@sanger.ac.uk
or:      rgasp-comm@sanger.ac.uk
-----------------------------------------------------

1. experiment: Human polyA+ total RNA, single reads, K562
   lab: Wold lab, Caltech
   format: fastq, tar archive with gzipped files
   file: human_fastq/K562_single.tar

2. experiment: Human polyA+ total RNA, single reads, GM12878
   lab: Wold lab, Caltech
   format: fastq, tar archive with gzipped files
   file: human_fastq/GM12878_single.tar

3. experiment: Human polyA+ total RNA, paired reads, K562
   lab: Wold lab, Caltech
   format: fastq, tar archive with gzipped files
   file: human_fastq/K562_2x75.tar

4. experiment: Human polyA+ total RNA, paired reads, GM12878
   lab: Wold lab, Caltech
   format: fastq, tar archive with gzipped files
   file: human_fastq/GM12878_2x75.tar

5. experiment: Human cytosolic polyA+, single reads, stranded, K562
   lab: Gingeras lab, CSHL
   format: fastq, compressed tar archive
   file: human_fastq/K562_cyto.fastq.gz

6. experiment: Human cytosolic long polyA+, K562 (SOLiD)
   lab: GIS
   format: cfasta, gzipped file
   file: human_csfasta/K562_cyto.cfasta.gz + qual file

7. experiment: Human cytosolic long polyA+, GM12878 (SOLiD)
   lab: GIS
   format: cfasta, gzipped file
   file: human_csfasta/GM12878_cyto.cfasta.gz + qual file

8. experiment: Human cytosolic long polyA+, K562 (Helicos)
   lab: Kapranov lab, Helicos
   format: fasta, gzipped file
   file: human_fasta/helicos_K262_fixed.fa.gz (updated 4.10.09)
         human_fasta/helicos_K262_gff.gz (updated 4.10.09)

9.
   experiment: Drosophila cell line S2-DRSC
   lab: Celniker modENCODE supergroup
   format: fastq, tar archive with gzipped files
   files: drosophila/S2-DRSC.tar.gz
     Biological Sample S2-DRSC-14:   2 lanes of 2X37 paired-end RNA-seq
     Biological Sample S2-DRSC-16:   3 lanes of 2X37 paired-end RNA-seq
     Biological Sample S2-DRSC-UT-3: 2 lanes of 2X37 paired-end RNA-seq
     Biological Sample S2-DRSC-UT-4: 2 lanes of 2X37 paired-end RNA-seq
     Biological Sample S2-DRSC-UT-1: 6 lanes of 1X45 single read RNA-seq
     Biological Sample S2-DRSC-UT-6: 2 lanes of 1X75 single read RNA-seq

10. experiment: Drosophila cell line CME_W1_CI
    lab: Celniker modENCODE supergroup
    format: fastq, tar archive with gzipped files
    file: drosophila/CME_W1_CI.tar.gz
      CME_W1_CI.8 cells: 5 lanes of paired end 2X37 nt reads

11. experiment: Drosophila cell line Kc167
    lab: Celniker modENCODE supergroup
    format: fastq, tar archive with gzipped files
    file: drosophila/Kc167.tar (short paired reads)
      Biological Sample Kc167-2:  3 lanes of 2X37 paired-end RNA-seq
      Biological Sample Kc167-4:  2 lanes of 2X37 paired-end RNA-seq
      Biological Sample Kc167-SR: 7 lanes of 1X36 paired-end RNA-seq

12. experiment: Drosophila cell line ML-DmBG3-c2
    lab: Celniker modENCODE supergroup
    format: fastq, tar archive with gzipped files
    file: drosophila/CME_W1_CI.8.tar.gz
      ML-DmBG3-c2 cells: 5 lanes of paired end 2X37 nt reads

13. experiment: Caenorhabditis elegans, polyA+ RNAseq random fragment library (Illumina).
    Full description: http://www.ncbi.nlm.nih.gov/sites/entrez?db=sra&report=full&term=SRX004867, etc.
    lab: UWGS-RW
    format: fastq, tar archive with gzipped files
    files: c_elegans.tar
      1. SRX004863 & SRX004864: early embryo
      2. SRX004865 & SRX004866: late embryo
      3. SRX004867: mid-L1 (updated corrupted file, paired-end data, 2x36bp concatenated)
      4. SRX001872: mid-L2
      5. SRX001875: mid L3
      6. SRX001874: mid L4
      7.
         SRX001873: young adult (pre-gravid)

---------------------------------------------------------------------------------

Further details:
================

Drosophila data:
  Insert sizes: 200 bp
  Protocol: Standard Illumina mRNA-Seq prep kit

C. elegans data:
  Insert sizes: 100 - 300 bp
  Protocol:
  For each library, a 10 µg aliquot of DNase I-treated total RNA was separated
  into polyA+ and polyA- fractions using the MACS™ mRNA Isolation Kit
  (Miltenyi Biotec, Bergisch Gladbach, Germany). Double-stranded cDNAs were
  made from the polyA+ fractions using the Superscript Double-Stranded cDNA
  Synthesis Kit (Invitrogen, Carlsbad, California, USA) and random hexamer
  primers (Invitrogen, 25 µM). The quality and quantity of the resulting
  double-stranded cDNAs were assessed using an Agilent DNA 1000 series II
  assay (Agilent Technologies, Santa Clara, CA, USA) and Nanodrop 7500
  spectrophotometer (Nanodrop, Wilmington, DE, USA). Aliquots containing
  ~200 ng amplified cDNA were each diluted with water to 100 µL and sonicated
  for 5 minutes. Sonication was performed in an ice water bath using a Sonic
  Dismembrator™ 550 (Fisher, Ottawa, Ontario, Canada) with a power setting of
  "7" in pulses of 30 seconds interspersed with 30 seconds of cooling. Total
  sonication times refer to active sonication time only and do not include
  the rest periods between pulses. The resultant cDNAs from all four
  libraries were size-fractionated on 8% polyacrylamide gels, and the 100 to
  300 bp fractions were excised. Gel-purified cDNA products were modified for
  Illumina sequencing using the Illumina genomic DNA sequencing kit as
  follows: size-selected cDNAs were end-repaired by T4 DNA polymerase and
  Klenow DNA Polymerase, and phosphorylated by T4 polynucleotide kinase. The
  cDNA products were incubated with Klenow DNA Polymerase to generate 3'
  Adenine overhangs, followed by ligation to Illumina adapters, which contain
  5' Thymine overhangs.
  The adapter-ligated products were purified on Qiaquick spin columns
  (Qiagen), then PCR-amplified with Phusion™ DNA Polymerase using Illumina's
  genomic DNA primer set. PCR products were purified on Qiaquick MinElute™
  columns; the DNA was assessed for quality and quantified using an Agilent
  DNA 1000 series II assay and Nanodrop 7500 spectrophotometer, then diluted
  to 10 nM. Cluster generation and sequencing were performed on the Illumina
  cluster station and 1G analyzer following the manufacturer's instructions.
  Sequences were extracted from the resulting image files using the Firecrest
  and Bustard applications run with default parameters. Read lengths were 36
  bases.
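
Reading the archives (illustrative sketch):
  Most entries above are listed as "fastq, tar archive with gzipped files",
  and the SRX004867 entry is noted as paired-end data with the 2x36bp reads
  concatenated. The Python sketch below shows one possible way to stream
  records from such an archive without unpacking it, and to split a
  concatenated record into its two mates. It is not part of the RGASP
  distribution. The function names are hypothetical, and several points are
  assumptions: that archive members of interest end in ".gz", that each
  record spans exactly four lines, and that mate 1 occupies the first 36
  bases with mate 2 in the remaining 36. This README does not state the
  mate order or orientation, so check a few reads before relying on it.

```python
import gzip
import tarfile
from typing import Iterator, Tuple

# (read name, sequence, quality string)
FastqRecord = Tuple[str, str, str]


def iter_fastq(handle) -> Iterator[FastqRecord]:
    """Yield records from a text handle, assuming 4-line FASTQ records."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        if not header.strip():
            continue  # skip stray blank lines
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().rstrip("\n")
        yield header.rstrip("\n")[1:], seq, qual


def iter_fastq_from_tar(tar_path: str) -> Iterator[Tuple[str, FastqRecord]]:
    """Yield (member name, record) for each gzipped member of a tar archive."""
    with tarfile.open(tar_path, "r:*") as archive:
        for member in archive:
            # Assumption: the FASTQ members are regular files ending in .gz.
            if not (member.isfile() and member.name.endswith(".gz")):
                continue
            raw = archive.extractfile(member)
            with gzip.open(raw, "rt") as text:
                for record in iter_fastq(text):
                    yield member.name, record


def split_concatenated_pair(seq: str, qual: str, mate_len: int = 36):
    """Split one concatenated record into two (sequence, quality) mates.

    Assumption: mate 1 is the first mate_len bases and mate 2 the rest.
    """
    if len(seq) != 2 * mate_len or len(qual) != 2 * mate_len:
        raise ValueError("record length does not match 2 x mate_len")
    mate1 = (seq[:mate_len], qual[:mate_len])
    mate2 = (seq[mate_len:], qual[mate_len:])
    return mate1, mate2
```

  A typical call would loop over iter_fastq_from_tar("c_elegans.tar") and
  pass the sequence and quality of the SRX004867 records to
  split_concatenated_pair. Streaming avoids writing the uncompressed files
  to disk, at the cost of re-reading the archive on every pass.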