Post-RPSBLAST Processing Utility v0.1 README file, revised 11 September 2016
================================================================================
https://ftp.ncbi.nlm.nih.gov/pub/mmdb/cdd/rpsbproc/README

CONTENTS:

1. SUMMARY
   1.1 Standalone RPS-BLAST
   1.2 rpsbproc command line utility
2. DOWNLOAD THE SOFTWARE PACKAGE
3. DOWNLOAD DATA FILES
4. RUN THE PROGRAMS
5. PARSE RPSBPROC OUTPUT
6. SAMPLE OUTPUT FROM RPSBPROC

-----------------------------
1. SUMMARY
-----------------------------

The rpsbproc command line utility is an addition to the standalone version of Reverse Position-Specific BLAST (RPS-BLAST), also known as CD-Search (Conserved Domain Search). RPS-BLAST is used to identify conserved domains, or functional units, within a query sequence. RPS-BLAST searches a protein sequence (or a protein translation of a nucleotide sequence) against a database of profiles that represent conserved domains. This is the opposite of PSI-BLAST, which searches a profile against a database of protein sequences, hence the term 'Reverse'.

The next two sections provide a brief description of the output produced by Standalone RPS-BLAST, and the output produced by the rpsbproc command line utility, which post-processes the RPS-BLAST output.

-----------------------------
1.1 Standalone RPS-BLAST
-----------------------------

The standalone RPS-BLAST is packaged with the BLAST executables (available on the NCBI FTP site at https://ftp.ncbi.nih.gov/blast/executables/LATEST/), and is also available as part of the NCBI toolkit distribution (see https://ftp.ncbi.nih.gov/toolbox). Additional details are provided in https://ftp.ncbi.nih.gov/blast/documents/rpsblast.html.

For each query sequence, standalone RPS-BLAST lists the conserved domain models that scored above a certain threshold (default set to an evalue of 10), sorted by scores.
The information provided for each hit includes the conserved domain's PSSMID, a set of scores (e-value, bitscore, etc.) and the actual sequence alignment between the conserved domain and the query sequence.

The actual program that runs RPS-BLAST locally, "rpsblast," has existed since early Fall of 2000. In October 2008, with the release of BLAST 2.2.18, the RPS-BLAST executable was split into two separate programs: "rpsblast" for protein queries and "rpstblastn" for nucleotide queries. Both programs search a query sequence against a database of profiles and, using a BLAST-like algorithm, look for alignments between the query sequences and the profiles of conserved domains. They produce results such as those described in the preceding paragraph.

-----------------------------
1.2 rpsbproc command line utility
-----------------------------

The rpsbproc command line utility is available, as of December 2014, from the Conserved Domain Database (CDD) FTP site:

https://ftp.ncbi.nih.gov/pub/mmdb/cdd/rpsbproc/

It post-processes the results of local RPS-BLAST searches in order to provide a non-redundant view of the search results, and to provide additional annotation on query sequences, such as domain superfamilies and functional sites, similar to the annotation provided by the corresponding web services (e.g., the NCBI Batch CD-Search web service at http://www.ncbi.nlm.nih.gov/Structure/bwrpsb/bwrpsb.cgi).

Specifically, the rpsbproc utility reads the output of rpsblast/rpstblastn, fills in domain superfamily and functional site information for each region of the sequence, re-sorts the hits by a different standard, and calculates a set of non-redundant representative hits. In this way, it turns the raw alignments into domain/site annotations on the query sequence at different redundancy levels, producing essentially the same data as the web-based Batch CD-Search service.
The annotation data is presented in tab-delimited tables that can be processed either programmatically or manually with a spreadsheet (see details below).

See the CDD and CD-Search help document for additional details about superfamilies, conserved sites, and more:

http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd_help.shtml

-------------------------------------------------
2. DOWNLOAD THE SOFTWARE PACKAGES
-------------------------------------------------

The rpsbproc package is available at:

https://ftp.ncbi.nih.gov/pub/mmdb/cdd/rpsbproc/

Rpsbproc binaries are built for Win-x64 and linux-x64. They are directly executable; no installation is required.

Download and untar the tarball:

$ wget https://ftp.ncbi.nih.gov/pub/mmdb/cdd/rpsbproc/RpsbProc-x64-linux.tar.gz
$ tar -xzf RpsbProc-x64-linux.tar.gz
$ ls -l
drwxr-xr-x 3 shlu structure 4096 Sep 26 21:12 RpsbProc-x64-linux
-rw-r--r-- 1 shlu structure 5368503 Sep 26 21:12 RpsbProc-x64-linux.tar.gz

The RpsbProc-x64-linux directory should have the following layout:

.
|-- README
|-- rpsbproc
|-- rpsbproc.ini
`-- utils
    |-- getblbins.sh
    `-- getcdddata.sh

Users who need (or prefer) to build these utilities locally can download the source code tarballs. The rpsbproc program, as well as the rpsblast and rpstblastn programs, are NCBI C++ toolkit applications and therefore require the NCBI C++ toolkit to build. The blast source archive is actually a subset of the NCBI C++ toolkit, with all blast programs added in. It is recommended to download it, insert the rpsbproc source code as an additional application, and build all programs together. Please download the source archive

https://ftp.ncbi.nih.gov/pub/mmdb/cdd/rpsbproc/RpsbProc-src.tar.gz

and follow the instructions in the README file there.

Rpsblast and rpstblastn also have pre-built executables for multiple systems that are available at:

https://ftp.ncbi.nih.gov/blast/executables/LATEST/

Download the one for your platform and untar it. At the time of writing, the latest package for linux is ncbi-blast-2.9.0+-x64-linux.tar.gz.

$ cd RpsbProc-x64-linux
$ wget https://ftp.ncbi.nih.gov/blast/executables/LATEST/ncbi-blast-2.9.0+-x64-linux.tar.gz
$ tar -xzf ncbi-blast-2.9.0+-x64-linux.tar.gz

Copy the rpsblast and rpstblastn programs to the current directory:

$ cp ncbi-blast-2.9.0+/bin/rpsblast ncbi-blast-2.9.0+/bin/rpstblastn .
$ ls -l
-rwxr-xr-x 1 shlu structure 32331 Sep 26 13:21 README
drwxr-xr-x 4 shlu structure 4096 Mar 11 2019 ncbi-blast-2.9.0+
-rw-r--r-- 1 shlu structure 243817734 Mar 11 2019 ncbi-blast-2.9.0+-x64-linux.tar.gz
-rwxr-xr-x 1 shlu structure 43489856 Sep 26 22:26 rpsblast
-rwxr-xr-x 1 shlu structure 15739416 Sep 26 17:04 rpsbproc
-rwxr-xr-x 1 shlu structure 240 Sep 13 2018 rpsbproc.ini
-rwxr-xr-x 1 shlu structure 43485504 Sep 26 22:26 rpstblastn
drwxr-xr-x 2 shlu structure 4096 Sep 16 16:50 utils

On the linux platform, these steps can be done automatically by running utils/getblbins.sh (Get Blast Binaries). The script automatically retrieves the latest linux binary and creates symbolic links from the current directory to rpsblast and rpstblastn:

$ utils/getblbins.sh
$ ls -l
-rwxr-xr-x 1 shlu structure 32331 Sep 26 13:21 README
drwxr-xr-x 3 shlu structure 4096 Sep 26 23:29 ncbi-blast-2.9.0+-x64-linux
lrwxrwxrwx 1 shlu structure 60 Sep 26 23:29 rpsblast -> ./ncbi-blast-2.9.0+-x64-linux/ncbi-blast-2.9.0+/bin/rpsblast
-rwxr-xr-x 1 shlu structure 15739416 Sep 26 17:04 rpsbproc
-rwxr-xr-x 1 shlu structure 240 Sep 13 2018 rpsbproc.ini
lrwxrwxrwx 1 shlu structure 62 Sep 26 23:29 rpstblastn -> ./ncbi-blast-2.9.0+-x64-linux/ncbi-blast-2.9.0+/bin/rpstblastn
drwxr-xr-x 2 shlu structure 4096 Sep 26 23:10 utils

--------------------------------------
3. DOWNLOAD DATA FILES
--------------------------------------

The rpsblast and rpstblastn executables need domain databases in order to work. Download pre-formatted search databases from:

https://ftp.ncbi.nih.gov/pub/mmdb/cdd/little_endian

Create a db directory and download the database files:

$ mkdir ./db
$ wget https://ftp.ncbi.nih.gov/pub/mmdb/cdd/little_endian/Cdd_LE.tar.gz && tar -xzf Cdd_LE.tar.gz -C ./db && rm -f Cdd_LE.tar.gz
$ wget https://ftp.ncbi.nih.gov/pub/mmdb/cdd/little_endian/Cdd_NCBI_LE.tar.gz && tar -xzf Cdd_NCBI_LE.tar.gz -C ./db && rm -f Cdd_NCBI_LE.tar.gz
$ wget https://ftp.ncbi.nih.gov/pub/mmdb/cdd/little_endian/Cog_LE.tar.gz && tar -xzf Cog_LE.tar.gz -C ./db && rm -f Cog_LE.tar.gz
$ wget https://ftp.ncbi.nih.gov/pub/mmdb/cdd/little_endian/Kog_LE.tar.gz && tar -xzf Kog_LE.tar.gz -C ./db && rm -f Kog_LE.tar.gz
$ wget https://ftp.ncbi.nih.gov/pub/mmdb/cdd/little_endian/Prk_LE.tar.gz && tar -xzf Prk_LE.tar.gz -C ./db && rm -f Prk_LE.tar.gz
$ wget https://ftp.ncbi.nih.gov/pub/mmdb/cdd/little_endian/Smart_LE.tar.gz && tar -xzf Smart_LE.tar.gz -C ./db && rm -f Smart_LE.tar.gz
$ wget https://ftp.ncbi.nih.gov/pub/mmdb/cdd/little_endian/Tigr_LE.tar.gz && tar -xzf Tigr_LE.tar.gz -C ./db && rm -f Tigr_LE.tar.gz

The rpsbproc program will need a set of domain-annotation data files to function.
Download these files from:

https://ftp.ncbi.nih.gov/pub/mmdb/cdd/

Create a data directory and download the domain-annotation files:

$ mkdir ./data
$ wget https://ftp.ncbi.nih.gov/pub/mmdb/cdd/cddid.tbl.gz -O ./data/cddid.tbl.gz && gzip -d ./data/cddid.tbl.gz
$ wget https://ftp.ncbi.nih.gov/pub/mmdb/cdd/cdtrack.txt -O ./data/cdtrack.txt
$ wget https://ftp.ncbi.nih.gov/pub/mmdb/cdd/family_superfamily_links -O ./data/family_superfamily_links
$ wget https://ftp.ncbi.nih.gov/pub/mmdb/cdd/cddannot.dat.gz -O ./data/cddannot.dat.gz && gzip -d ./data/cddannot.dat.gz
$ wget https://ftp.ncbi.nih.gov/pub/mmdb/cdd/cddannot_generic.dat.gz -O ./data/cddannot_generic.dat.gz && gzip -d ./data/cddannot_generic.dat.gz
$ wget https://ftp.ncbi.nih.gov/pub/mmdb/cdd/bitscore_specific.txt -O ./data/bitscore_specific.txt

On the linux platform, these steps can be done automatically by running utils/getcdddata.sh (Get CDD Data). The script obtains all databases and data files, decompresses them if necessary, and puts them in ./data and ./db by default.

$ ls -l
-rwxr-xr-x 1 shlu structure 32291 Sep 18 16:41 README
drwxr-xr-x 2 shlu structure 4096 Sep 18 17:09 data
drwxr-xr-x 2 shlu structure 4096 Sep 18 17:11 db
drwxr-xr-x 2 shlu structure 4096 Sep 18 16:53 example
drwxr-xr-x 3 shlu structure 4096 Sep 18 17:06 linux
drwxr-xr-x 2 shlu structure 4096 Sep 18 17:10 logs
-rwxr-xr-x 1 shlu structure 27 Sep 13 09:18 precedences_sample.txt
lrwxrwxrwx 1 shlu structure 38 Sep 18 17:07 rpsblast -> ./linux/ncbi-blast-2.9.0+/bin/rpsblast
lrwxrwxrwx 1 shlu structure 40 Sep 18 17:07 rpstblastn -> ./linux/ncbi-blast-2.9.0+/bin/rpstblastn
-rwxr-xr-x 1 shlu structure 15670008 Sep 18 13:40 sparclbl
-rwxr-xr-x 1 shlu structure 446 Sep 18 16:25 sparclbl.ini
drwxr-xr-x 2 shlu structure 4096 Sep 16 16:50 utils

$ ls -l data/
-rw-r--r-- 1 shlu structure 1014866 Sep 18 17:09 bitscore_specific.txt
-rw-r--r-- 1 shlu structure 3875945 Sep 18 17:09 cddannot.dat
-rw-r--r-- 1 shlu structure 2455749 Sep 18 17:09 cddannot_generic.dat
-rw-r--r-- 1 shlu structure 26712116 Sep 18 17:09 cddid.tbl
-rw-r--r-- 1 shlu structure 1324235 Sep 18 17:09 cdtrack.txt
-rw-r--r-- 1 shlu structure 1674528 Sep 18 17:09 family_superfamily_links
-rw-r--r-- 1 shlu structure 29836008 Sep 18 17:09 specific_arch.txt
-rw-r--r-- 1 shlu structure 17694472 Sep 18 17:09 superfamily_arch.txt

The README file on the CDD FTP site provides information about the little-endian databases and the scope of data contained in each one. It also provides information about customizing search databases:

https://ftp.ncbi.nih.gov/pub/mmdb/cdd/README

Please note that we can no longer maintain pre-computed search databases for big_endian architectures. Search databases distributed via the big_endian FTP directory are outdated.
(As mentioned in the descriptive file (https://ftp.ncbi.nih.gov/blast/documents/rpsblast.html) for the RPS-BLAST executable, RPS-BLAST uses a BLAST database, but also has some other files that contain a precomputed lookup table for the profiles to allow the search to proceed faster. Unfortunately it was not possible to make this lookup table architecture independent (like the BLAST databases themselves), so one cannot take an RPS-BLAST database prepared on a big-endian system (e.g., Solaris Sparc) and run it on a little-endian system (e.g., NT). The RPS-BLAST database must be prepared again for the little-endian system.)

Now the local RPSBLAST system is ready to run.

--------------------------------------
4. RUN THE PROGRAMS
--------------------------------------

If the above steps are followed to prepare the package, the current directory should look like:

.
|-- README
|-- data
|   |-- bitscore_specific.txt
|   |-- cddannot.dat
|   |-- cddannot_generic.dat
|   |-- cddid.tbl
|   |-- cdtrack.txt
|   `-- family_superfamily_links
|-- db
|   |-- Cdd.aux
|   |-- Cdd.freq
|   ......
|-- rpsblast
|-- rpsbproc
|-- rpstblastn
`-- utils
    |-- getblbins.sh
    `-- getcdddata.sh

**IMPORTANT**: The filenames under ./data are important! Make sure they are correct after downloading from our ftp site, and rename any that do not match.

Both rpsblast and rpstblastn support query files with multiple FASTA sequences and can deliver the results in many formats (run them with "-help" for detailed usage information), but rpsbproc only processes the asn1 formatted Blast Archive object (produced by specifying "-outfmt 11" when running rpsblast and rpstblastn) because it contains all the information necessary for rpsbproc to function. For backward compatibility, this version still supports the xml formatted BlastOutput object (produced by specifying "-outfmt 5"), but it is deprecated and may be removed in future versions.
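
The filename requirement in the IMPORTANT note above can be checked before a run. The following is only a minimal sketch and is not part of the rpsbproc package: the function name check_cdd_data is hypothetical, and the list of expected files is simply the one shown in the directory layout above.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with rpsbproc): verify that the data
# directory holds the annotation files named in the layout above.
# Usage: check_cdd_data [data-dir]    (default: ./data)
check_cdd_data() {
    dir="${1:-./data}"
    missing=0
    for f in bitscore_specific.txt cddannot.dat cddannot_generic.dat \
             cddid.tbl cdtrack.txt family_superfamily_links; do
        if [ ! -f "$dir/$f" ]; then
            # Report each absent file on stderr and keep counting.
            echo "missing: $dir/$f" >&2
            missing=$((missing + 1))
        fi
    done
    # Succeed only when every expected file is present.
    [ "$missing" -eq 0 ]
}
```

For example, "check_cdd_data ./data && echo ok" prints "ok" only when all six files are in place; a file that was saved under a different name is reported as missing and can then be renamed.
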

Data generated by rpsblast/rpstblastn can be saved in a file and then processed by rpsbproc:

[~/localrpsb]$ rpsblast -query multi_protein.FASTA -db ./db/Cdd -evalue 0.01 -outfmt 11 -out multi_protein.asn
[~/localrpsb]$ rpsbproc -i multi_protein.asn -o multi_protein.out -e 0.01 -m rep

Or be piped directly to rpsbproc, which reads standard input when no -i option is given:

[~/localrpsb]$ rpsblast -query multi_protein.FASTA -db ./db/Cdd -evalue 0.01 -outfmt 11 | rpsbproc -o multi_protein.out -e 0.01 -m rep

Command line options supported by rpsbproc:

-i  Input file that contains the data generated by rpsblast/rpstblastn (asn1 Blast Archive, or the deprecated XML format). By default the program reads data from standard input.

-o  File to which the processed results should be written. By default the output is directed to standard output.

-e  Only process hits with evalues equal to or better (smaller in value) than the designated value. Default value is 0.01. This option is convenient in that rpsblast needs to be run only once with a higher evalue (such as 1.0), and the results can then be processed by rpsbproc with different evalue cutoffs.

-m  Select redundancy level of domain hit data. Valid options are "rep" (concise), "std" (standard), and "full" (all hits). Please refer to http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd_help.shtml#RPSBResultsLevelsOfDetail for detailed information regarding the three redundancy levels. The default value is "rep".

-t  Select desired (target) data. Valid options are "doms" for domain/superfamily hits, "feats" for functional sites and structural motif annotations, or "both". Default is "both".

-d  Location of data files. By default, the program looks for a directory named 'data' in the following order:
      current directory
      program directory (the directory where the rpsbproc program resides)

-q  Quiet mode -- do not display program information and processing progress to stderr.

-h  Display a brief usage screen, ignore other switches.

--------------------------------------
5. PARSE RPSBPROC OUTPUT
--------------------------------------

The rpsbproc executable presents the data in a tab-delimited, table-like format. It is designed to accommodate both human reading and program processing. A sample output file is included at the bottom of this document.

There are two sections in an output file. The first section displays the program information, parameters used for data processing, and a "template" to explain the format and content of each column of the data tables. This section is intended for human readers. All lines in this section start with a '#' so programs can treat them as "comment" lines that can be safely ignored.

The second section contains the real data intended to be programmatically processed. All columns are delimited with a 'tab' character ('\t'). The data section always starts with a "DATA" token and ends with an "ENDDATA" token. In between, there can be several sessions, each of which starts with a SESSION token and ends with an ENDSESSION token. A session corresponds to one output object (Blast Archive or XML) produced by rpsblast or rpstblastn. Regardless of the number of FASTA sequences in a query file, each run of rpsblast/rpstblastn produces only one such object. Besides the SESSION token, the line also contains several columns to display additional information about the session:

SESSION <session-ordinal> <program> <database> <score-matrix> <evalue-threshold>

Each session is given an ordinal number to distinguish it from other sessions, followed by the program/version, database, score matrix, and e-value threshold used for the rpsblast/rpstblastn session. The session end line contains the ENDSESSION token and the session ordinal number to indicate the end of the particular session:

ENDSESSION <session-ordinal>

In each session, there can be multiple queries, starting with a QUERY token line and ending with an ENDQUERY token line:

QUERY <query-id> <seq-type> <seq-length> <definition-line>
ENDQUERY <query-id>

The <query-id> is an ID assigned automatically by rpsblast/rpstblastn during a session, so it is only unique within the session. The <seq-type> is either "Peptide" or "Nucleotide".
The <seq-length> is the length of the query sequence (number of amino acid or nucleotide residues). The <definition-line> is the exact copy of the definition line of the FASTA sequence.

Within each QUERY section, there are three optional sub-sections:

1. Domain/superfamily hits. If requested (with command line option "-t doms" or "-t both"), this section will appear, starting with a DOMAINS token and ending with an ENDDOMAINS token:

DOMAINS
ENDDOMAINS

Within the domain hit section, each hit is represented by one line with the following columns:

<session-ordinal> <query-id[readingframe]> <hit-type> <PSSM-ID> <from> <to> <E-Value> <bitscore> <accession> <short-name> <incomplete> <superfamily PSSM-ID>

The session ordinal number and the query id are included in each line to make it unique across the whole rpsbproc session. The "[readingframe]" part following the query id only appears for nucleotide queries. Other columns are:

<hit-type>: "Specific", "Non-Specific", "Superfamily" or "Multidom". Refer to http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd_help.shtml#RPSB_hit_types for more information.
<PSSM-ID>: The PSSMID of the domain or superfamily
<from>: Domain start position in query sequence coordinates
<to>: Domain end (inclusive) position in query sequence coordinates
<E-Value>: The E-value of this hit
<bitscore>: The bitscore of this hit
<accession>: The accession of the domain or superfamily
<short-name>: The short-name of the domain or superfamily
<incomplete>: Indicates whether more than 20% is missing from the N- or C-terminus compared to the original domain. Values are: "-" (no more than 20% shorter on either terminus), "N" (N-terminus has 20% or more missing), "C" (C-terminus has 20% or more missing) or "NC" (both termini have 20% or more missing)
<superfamily PSSM-ID>: If applicable, this field contains the PSSMID of the superfamily to which the hit domain belongs. Otherwise, it will contain a dash '-'.

2. Site annotations.
If requested (with command line option "-t feats" or "-t both"), this section will appear, starting with a SITES token and ending with an ENDSITES token:

SITES
ENDSITES

Each functional site is represented by one line, which contains the following columns:

<session-ordinal> <query-id[readingframe]> <annot-type> <title> <residue(coordinates)> <complete-size> <mapped-size> <source-domain>

The first two columns are the same as in domain hit lines. Other columns are:

<annot-type>: "Specific" (site annotations found from specific domain hits) or "Generic" (site annotations found from non-specific/superfamily hits).
<title>: Description title of the functional site
<residue(coordinates)>: One-letter residue symbols and their coordinates on the query sequence, with residues separated by commas (,). For protein queries, the coordinate is just one number. For nucleotide queries, the residue symbol is still the translated amino acid, but the coordinate is a three-position range for the codon on the query sequence, such as I44-46,F47-49. For the minus strand, the range looks like I6926-6924, with the bigger number first to indicate the reversed direction.
<complete-size>: The number of residues in the originally defined functional site.
<mapped-size>: The number of residues of the functional site mapped to the query sequence.
<source-domain>: The PSSMID of the domain on which the site is defined. For specific sites, it is the specific hit domain; for generic sites, it is the root domain of the domain hierarchy to which the non-specific hit domain belongs.

3. Structural motifs. This section appears with site annotations, starting with a MOTIFS token and ending with an ENDMOTIFS token:

MOTIFS
ENDMOTIFS

Each structural motif is represented by one line, which contains the following columns:

<session-ordinal> <query-id[readingframe]> <title> <from> <to> <source-domain>

The first two columns are the same as in domain hit lines. Other columns are:

<title>: The description title of the motif
<from>,<to>: The range of the motif in query sequence coordinates.
Structural motifs are defined as ranges instead of multiple residues.
<source-domain>: Some structural motifs are defined on conserved domains, whose PSSMID will appear here; other structural motifs are not defined on CDs and thus do not have a source domain.

--------------------------------------
6. SAMPLE OUTPUT FROM RPSBPROC
--------------------------------------

#Post-RPSBLAST Processing Utility v0.1
#Config file: rpsbproc.ini
#Input data file: test/sample.xml
#Output data file: test/sample.out
#E-Value cutoff: 0.01
#Redundancy: Concise
#Data requested: Domain hits and site annotations
#Output format -- tab-delimited table
#DATA
#SESSION <session-ordinal> <program> <database> <score-matrix> <evalue-threshold>
#QUERY <query-id> <seq-type> <seq-length> <definition-line>
#DOMAINS
#<session-ordinal> <query-id[readingframe]> <hit-type> <PSSM-ID> <from> <to> <E-Value> <bitscore> <accession> <short-name> <incomplete> <superfamily PSSM-ID>
#more such lines......
#ENDDOMAINS
#SITES
#<session-ordinal> <query-id[readingframe]> <annot-type> <title> <residue(coordinates)> <complete-size> <mapped-size> <source-domain>
#more such lines......
#ENDSITES
#MOTIFS
#<session-ordinal> <query-id[readingframe]> <title> <from> <to> <source-domain>
#more such lines......
#ENDMOTIFS
#ENDQUERY <query-id>
#more query sections..
#ENDSESSION <session-ordinal>
#more session sections..
#ENDDATA #===================================================================== DATA SESSION 1 RPSBLAST 2.7.1+ /blast/db/blast/cdd BLOSUM62 0.01 QUERY Query_1 Peptide 430 gi|27085270|gb|AAN85445.1| SMAD8 protein [Mus musculus] DOMAINS 1 Query_1 Specific 199814 13 136 4.19971e-86 260.898 cd10490 MH1_SMAD_1_5_9 - 260161 1 Query_1 Superfamily 260162 230 430 1.84736e-150 428.529 cl00056 MH2 - - ENDDOMAINS SITES 1 Query_1 Specific DNA binding site K36,S40,S74,L75,R78,L79,Q80,V81,S82,K85,H104 11 11 199814 1 Query_1 Specific Zn binding site C68,C113,C125,H130 4 4 199814 1 Query_1 Generic trimer interface Y242,E250,T251,Q253,F264,T265,D266,P267,S268,R273,G277,L278,L279,S280,V282,E289,R293,H294,G296,Y369,T372,T395,E402,H404,H406 25 25 199819 ENDSITES ENDQUERY ENDSESSION 1 SESSION 2 RPSTBLASTN 2.7.1+ /blast/db/blast/cdd BLOSUM62 0.01 QUERY Query_1 Nucleotide 30180 gi|342851972|Borrelia afzelii PKo plasmid lp32-10, complete sequence DOMAINS 2 Query_1[1] Specific 251280 16624 17493 6.0196e-138 438.343 pfam02414 Borrelia_orfA - 45 2 Query_1[1] Specific 251598 17812 18234 8.70594e-72 241.506 pfam02890 DUF226 - 45 2 Query_1[1] Specific 251280 11503 12075 3.14064e-70 243.047 pfam02414 Borrelia_orfA N 45 2 Query_1[1] Specific 250787 19276 19530 2.61136e-43 157.532 pfam01672 Plasmid_parti - 45 2 Query_1[1] Specific 250787 1846 2118 1.14753e-30 120.939 pfam01672 Plasmid_parti - 45 2 Query_1[1] Non-Specific 114930 13324 13443 6.37628e-11 64.3142 pfam06238 Borrelia_lipo_2 N 45 2 Query_1[1] Non-Specific 251598 28285 28458 5.49995e-10 62.003 pfam02890 DUF226 N 45 2 Query_1[1] Superfamily 265491 5188 5487 7.8357e-10 60.8474 cl14958 HSDR_N - - 2 Query_1[1] Non-Specific 251280 2527 3081 7.9483e-10 63.1586 pfam02414 Borrelia_orfA N 45 2 Query_1[1] Non-Specific 251598 664 753 1.69066e-09 60.4622 pfam02890 DUF226 N 45 2 Query_1[1] Non-Specific 115671 12772 12897 1.47407e-07 53.5286 pfam07032 DUF1322 C 45 2 Query_1[1] Superfamily 272166 23854 24534 2.07183e-06 51.6026 cl21746 EpsG - - 2 
Query_1[1] Non-Specific 251280 19072 19767 8.60803e-05 47.7506 pfam02414 Borrelia_orfA - 45 2 Query_1[1] Non-Specific 226114 6079 6273 0.000112918 45.0542 COG3586 COG3586 N 45 2 Query_1[1] Superfamily 272166 10507 10767 0.00105216 43.5134 cl21746 EpsG N - 2 Query_1[1] Non-Specific 250680 4720 4908 0.0065359 39.2762 pfam01519 DUF16 N 45 2 Query_1[1] Non-Specific 251280 14236 14874 0.00734381 41.5874 pfam02414 Borrelia_orfA - 45 2 Query_1[1] Multidom 224396 19069 20253 5.10176e-10 64.6994 COG1479 COG1479 - 45 2 Query_1[1] Multidom 224396 2551 3045 0.00197166 43.8986 COG1479 COG1479 N 45 2 Query_1[2] Specific 238997 1088 1300 5.6938e-09 62.7734 cd02042 ParA N 271875 2 Query_1[2] Specific 115964 12329 12748 7.71117e-61 234.958 pfam07341 DUF1473 - 45 2 Query_1[2] Specific 251280 11207 11920 7.7439e-50 198.364 pfam02414 Borrelia_orfA C 45 2 Query_1[2] Specific 251666 13499 13789 5.87401e-44 179.104 pfam02999 Borrelia_orfD - 45 2 Query_1[2] Superfamily 271929 23390 23665 1.29102e-36 154.451 cl21509 ApoLp-III_like N - 2 Query_1[2] Non-Specific 115940 12071 12298 5.44539e-28 125.946 pfam07316 DUF1463 N 45 2 Query_1[2] Superfamily 271875 2 193 8.34729e-17 88.9669 cl21455 ABC_ATPase NC - 2 Query_1[2] Specific 259034 26684 27478 3.42711e-10 67.0106 pfam14897 EpsG - 272166 2 Query_1[2] Non-Specific 251280 9587 10507 9.60884e-07 55.4546 pfam02414 Borrelia_orfA - 45 2 Query_1[2] Superfamily 271898 28811 29050 1.72577e-06 54.6842 cl21478 Mt_ATP-synt_B N - 2 Query_1[2] Non-Specific 255394 8819 9874 0.000107423 48.521 pfam09491 RE_AlwI - 45 2 Query_1[2] Non-Specific 165315 7280 8293 0.000263094 47.3654 PHA03016 PHA03016 C 45 2 Query_1[2] Specific 259034 6230 6739 0.000434934 46.595 pfam14897 EpsG N 272166 2 Query_1[2] Non-Specific 254343 22463 22690 0.00318818 43.8986 pfam07667 DUF1600 - 45 2 Query_1[2] Multidom 224396 1268 2461 0.00256734 44.2838 COG1479 COG1479 - 45 2 Query_1[3] Specific 238997 18237 18350 3.05374e-09 63.929 cd02042 ParA C 271875 2 Query_1[3] Specific 238997 18585 
18761 3.53342e-06 53.5286 cd02042 ParA N 271875 2 Query_1[3] Specific 238997 756 872 3.56046e-05 50.447 cd02042 ParA C 271875 2 Query_1[3] Superfamily 271929 22950 23405 7.61085e-74 278.1 cl21509 ApoLp-III_like C - 2 Query_1[3] Specific 251859 14796 15101 2.51807e-34 147.132 pfam03304 Mlp - 45 2 Query_1[3] Specific 114930 13113 13331 3.6503e-32 139.813 pfam06238 Borrelia_lipo_2 C 45 2 Query_1[3] Specific 251598 28044 28352 4.54523e-32 139.428 pfam02890 DUF226 C 45 2 Query_1[3] Non-Specific 253346 25707 26294 4.30364e-19 96.2857 pfam05714 Borrelia_lipo_1 - 45 2 Query_1[3] Superfamily 271875 186 398 3.20379e-16 87.0409 cl21455 ABC_ATPase N - 2 Query_1[3] Specific 259034 15258 16181 1.16935e-09 65.0846 pfam14897 EpsG - 272166 2 Query_1[3] Non-Specific 115671 12891 13001 3.26518e-09 63.5438 pfam07032 DUF1322 N 45 2 Query_1[3] Non-Specific 251280 2688 3440 1.27865e-05 51.6026 pfam02414 Borrelia_orfA - 45 2 Query_1[3] Non-Specific 165315 3090 4169 0.000204623 47.7506 PHA03016 PHA03016 - 45 2 Query_1[3] Specific 214445 7947 8393 0.000774987 45.8246 MTH00166 ND6 - 271904 2 Query_1[3] Superfamily 272166 22341 22871 0.0021348 44.2838 cl21746 EpsG N - 2 Query_1[3] Superfamily 272166 21237 21503 0.0033904 43.8986 cl21746 EpsG N - 2 Query_1[3] Superfamily 271981 27096 27542 0.00351032 43.5134 cl21561 7tm_4 C - 2 Query_1[3] Superfamily 272166 2118 2525 0.00431283 43.5134 cl21746 EpsG NC - 2 Query_1[3] Multidom 224113 18228 18956 1.9077e-52 207.223 COG1192 Soj - 45 2 Query_1[3] Multidom 224396 3153 4286 1.05473e-05 51.9878 COG1479 COG1479 - 45 2 Query_1[3] Multidom 233080 186 425 9.75408e-19 95.1301 TIGR00665 DnaB N 45 2 Query_1[3] Multidom 224113 747 1037 6.54658e-18 92.4337 COG1192 Soj C 45 2 Query_1[3] Multidom 234229 2682 3614 9.29802e-05 48.9062 TIGR03490 Mycoplas_LppA C 45 2 Query_1[-1] Specific 259034 16834 17766 5.48379e-16 86.2705 pfam14897 EpsG - 272166 2 Query_1[-1] Specific 259034 1708 2721 6.3267e-12 72.7886 pfam14897 EpsG - 272166 2 Query_1[-1] Specific 259034 11143 
12060 1.43775e-11 71.633 pfam14897 EpsG - 272166 2 Query_1[-1] Specific 259034 1237 2169 4.78903e-11 69.707 pfam14897 EpsG - 272166 2 Query_1[-1] Specific 259034 4948 5904 6.88935e-11 69.3218 pfam14897 EpsG - 272166 2 Query_1[-1] Specific 259034 19141 20061 2.92585e-09 63.929 pfam14897 EpsG - 272166 2 Query_1[-1] Superfamily 268042 16477 17055 1.54365e-08 61.6178 cl19689 Oxidored_q1 C - 2 Query_1[-1] Specific 259034 14056 15099 6.28352e-08 59.3066 pfam14897 EpsG - 272166 2 Query_1[-1] Specific 259034 7075 7911 1.24159e-05 51.6026 pfam14897 EpsG - 272166 2 Query_1[-1] Specific 259034 4357 5307 4.14665e-05 50.0618 pfam14897 EpsG - 272166 2 Query_1[-1] Specific 259034 418 753 0.000131278 48.521 pfam14897 EpsG NC 272166 2 Query_1[-1] Specific 259034 11827 12375 0.000214713 47.7506 pfam14897 EpsG N 272166 2 Query_1[-1] Superfamily 268042 2515 3084 0.00139548 45.0542 cl19689 Oxidored_q1 N - 2 Query_1[-1] Specific 224196 9919 10461 0.0016561 44.669 COG1277 NosY N 271894 2 Query_1[-1] Superfamily 272166 13687 14139 0.00528465 43.1282 cl21746 EpsG N - 2 Query_1[-1] Specific 224196 28582 29196 0.00701699 42.743 COG1277 NosY C 271894 2 Query_1[-1] Multidom 177158 16618 17994 1.32692e-11 71.633 MTH00095 ND5 - 45 2 Query_1[-2] Specific 252574 15600 16097 6.58907e-31 135.576 pfam04404 ERF - 45 2 Query_1[-2] Specific 259034 2433 3449 3.40237e-20 100.138 pfam14897 EpsG - 272166 2 Query_1[-2] Superfamily 271437 6582 7034 5.74264e-18 92.8189 cl00213 DNA_BRE_C - - 2 Query_1[-2] Specific 259034 3150 4181 1.70967e-17 91.2781 pfam14897 EpsG - 272166 2 Query_1[-2] Specific 259034 14103 15083 1.19188e-10 68.5514 pfam14897 EpsG - 272166 2 Query_1[-2] Non-Specific 177131 16935 17807 3.49055e-09 63.5438 MTH00059 ND2 - 45 2 Query_1[-2] Superfamily 272166 23262 23762 4.64315e-07 56.6102 cl21746 EpsG - - 2 Query_1[-2] Non-Specific 258475 19716 20594 3.00969e-06 53.9138 pfam14296 O-ag_pol_Wzy C 45 2 Query_1[-2] Specific 259034 11562 12074 4.96021e-05 49.6766 pfam14897 EpsG N 272166 2 Query_1[-2] 
Non-Specific 258475 1365 2786 0.000230787 47.7506 pfam14296 O-ag_pol_Wzy - 45 2 Query_1[-2] Specific 224196 13686 14462 0.000365584 46.9802 COG1277 NosY - 271894 2 Query_1[-2] Non-Specific 251280 15249 15647 0.000802318 45.8246 pfam02414 Borrelia_orfA NC 45 2 Query_1[-2] Superfamily 268329 6942 7256 0.00128791 45.0542 cl19976 7tm_7 N - 2 Query_1[-2] Non-Specific 255908 5496 6002 0.00217536 44.2838 pfam10325 7TM_GPCR_Srz C 45 2 Query_1[-2] Superfamily 272166 24501 25472 0.00323109 43.8986 cl21746 EpsG - - 2 Query_1[-2] Specific 257675 18849 19202 0.00348228 43.5134 pfam13346 ABC2_membrane_5 N 271894 2 Query_1[-2] Non-Specific 254938 4668 4838 0.00467304 43.1282 pfam08634 Pet127 C 45 2 Query_1[-2] Superfamily 271981 3834 4589 0.00705389 42.743 cl21561 7tm_4 - - 2 Query_1[-2] Superfamily 272166 20268 20975 0.0085514 42.3578 cl21746 EpsG N - 2 Query_1[-2] Multidom 177157 2418 3455 1.25459e-11 71.633 MTH00094 ND4 - 45 2 Query_1[-2] Multidom 233789 6630 7079 3.05713e-12 73.559 TIGR02225 recomb_XerD N 45 2 Query_1[-2] Multidom 234043 19629 20063 0.000122134 48.521 TIGR02871 spore_ylbJ N 45 2 Query_1[-3] Specific 259034 8756 9649 2.00834e-11 70.8626 pfam14897 EpsG - 272166 2 Query_1[-3] Specific 259034 9863 10546 1.90816e-10 67.781 pfam14897 EpsG - 272166 2 Query_1[-3] Specific 259034 2414 3292 1.94348e-09 64.3142 pfam14897 EpsG C 272166 2 Query_1[-3] Specific 259034 16976 17938 2.43849e-08 60.8474 pfam14897 EpsG - 272166 2 Query_1[-3] Specific 259034 572 1597 4.1738e-08 60.077 pfam14897 EpsG - 272166 2 Query_1[-3] Specific 259034 22880 23857 8.63418e-07 55.4546 pfam14897 EpsG - 272166 2 Query_1[-3] Specific 259034 1937 2830 1.38229e-06 55.0694 pfam14897 EpsG - 272166 2 Query_1[-3] Superfamily 272166 7235 7831 4.64133e-06 53.1434 cl21746 EpsG - - 2 Query_1[-3] Specific 259034 18086 18964 1.02965e-05 51.9878 pfam14897 EpsG - 272166 2 Query_1[-3] Specific 259034 269 772 3.32135e-05 50.447 pfam14897 EpsG N 272166 2 Query_1[-3] Non-Specific 255908 11594 11911 5.9978e-05 
49.6766 pfam10325 7TM_GPCR_Srz C 45 2 Query_1[-3] Non-Specific 258475 2822 4171 0.000109453 48.521 pfam14296 O-ag_pol_Wzy - 45 2 Query_1[-3] Superfamily 271677 7460 8404 0.00019345 47.7506 cl09928 Molybdopterin-Binding N - 2 Query_1[-3] Specific 259034 28613 29173 0.000235147 47.3654 pfam14897 EpsG C 272166 2 Query_1[-3] Specific 259034 11357 12313 0.000404641 46.595 pfam14897 EpsG - 272166 2 Query_1[-3] Superfamily 272166 19073 19435 0.00068706 45.8246 cl21746 EpsG N - 2 Query_1[-3] Non-Specific 258475 20189 20956 0.000787441 45.8246 pfam14296 O-ag_pol_Wzy C 45 2 Query_1[-3] Superfamily 272166 4046 5119 0.00207293 44.2838 cl21746 EpsG - - 2 Query_1[-3] Non-Specific 258475 5624 6067 0.00354807 43.5134 pfam14296 O-ag_pol_Wzy C 45 ENDDOMAINS SITES 2 Query_1[2] Specific Magnesium ion binding site D1121-1123 2 1 238997 2 Query_1[2] Generic ATP binding site D53-55,Y56-58 9 2 238540 2 Query_1[2] Generic Walker B motif I41-43,I44-46,F47-49,I50-52,D53-55 5 5 238540 2 Query_1[3] Specific P-loop K18258-18260,G18261-18263,G18264-18266,V18267-18269,G18270-18272,K18273-18275,S18276-18278 7 7 238997 2 Query_1[3] Specific Magnesium ion binding site S18276-18278 2 1 238997 2 Query_1[3] Specific Magnesium ion binding site D18603-18605 2 1 238997 2 Query_1[3] Specific P-loop K777-779,G780-782,G783-785,V786-788,S789-791,K792-794,S795-797 7 7 238997 2 Query_1[3] Specific Magnesium ion binding site S795-797 2 1 238997 2 Query_1[-2] Generic active site K6929-6927,H6656-6654,R6647-6645,H6596-6594 5 4 271175 2 Query_1[-2] Generic DNA binding site R6971-6969,K6929-6927,I6926-6924,T6719-6717,Q6713-6711,H6656-6654 7 6 271175 2 Query_1[-3] Generic molybdopterin cofactor binding site K8215-8213,S8209-8207,I8206-8204,N8203-8201,I8143-8141,N8140-8138,N8134-8132,N8125-8123,E8122-8120,I8038-8036,I8035-8033,L7999-7997,I7627-7625,*7624-7622,K7621-7619,F7582-7580,I7579-7577,*7576-7574,F7567-7565,A7501-7499,K7498-7496,S7483-7481 25 22 238218 ENDSITES ENDQUERY ENDSESSION 2 ENDDATA #Total Blastout 
object processed 2 ================================================================================ Aron Marchler-Bauer, Shennan Lu, Renata Geer, revised 11 September 2016
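
As an illustration of the format described in section 5, the domain hit table can be pulled out of an rpsbproc output file with a few lines of awk. This is a minimal sketch rather than a tool shipped with rpsbproc: the function name rpsbproc_domains is hypothetical, and it assumes that the DOMAINS and ENDDOMAINS tokens stand alone on their lines and that every hit line carries the twelve tab-separated columns listed in the template.

```shell
#!/bin/sh
# Hypothetical helper: list the domain hits in an rpsbproc output file.
# Prints: query-id[frame], hit-type, accession, short-name, from, to, E-value
# Usage: rpsbproc_domains file.out    (reads standard input if no file given)
rpsbproc_domains() {
    awk -F '\t' '
        /^#/               { next }              # skip header and template lines
        $1 == "DOMAINS"    { in_dom = 1; next }  # start of a domain hit block
        $1 == "ENDDOMAINS" { in_dom = 0; next }  # end of the block
        in_dom && NF >= 12 {
            # columns: 1 session, 2 query, 3 hit-type, 4 PSSM-ID, 5 from, 6 to,
            #          7 E-value, 8 bitscore, 9 accession, 10 short-name,
            #          11 incomplete, 12 superfamily PSSM-ID
            printf "%s\t%s\t%s\t%s\t%s\t%s\t%s\n", $2, $3, $9, $10, $5, $6, $7
        }
    ' "$@"
}
```

Running "rpsbproc_domains multi_protein.out" would then print one line per domain hit. The SITES and MOTIFS blocks can be handled the same way by switching the start and end tokens and the column numbers.
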