RNAMotifScan Version 2.0 ReadMe:
For comments or bug reports, please email Cuncong Zhong (cczhong@cs.ucf.edu).
School of EECS, University of Central Florida
RNAMotifScan.pl provides an interface for scanning annotated PDB files. The annotated PDB files must be placed in './RNAinPDB'. How to generate these annotated PDB files is described below.
To see the arguments and how to run RNAMotifScan.pl, please run 'perl RNAMotifScan.pl --help'.
To align two RNA structural segments, run './RNAMotifAlign' for reduced output (score only) or './RNAMotifAlign_align' for full output (alignment included). The arguments of 'RNAMotifAlign' and 'RNAMotifAlign_align' are described in the programs themselves.
If you are not working with a new PDB entry (one newer than the PDB release of August 2008), you can probably skip the following steps: the PDB entries you need have already been parsed and deposited in './RNAinPDB'.
Getting annotated PDB files:
1. Download the PDB file(s).
2. Annotate each PDB file with MC-Annotate (http://www.major.iric.ca/MC-Tools_files/MC-Annotate.zip). RNAVIEW (http://ndbserver.rutgers.edu/services/download/) can serve as an alternative annotation program, but you may need to write your own parsing scripts to generate formatted inputs.
3. Download the PDB sequence file from 'ftp://ftp.rcsb.org/pub/pdb/derived_data/pdb_seqres.txt' and place it in the same directory as 'Make_RNA_Input.pl'. (This file is not included because it is too large, and a bundled copy would be outdated if you want to process a new PDB entry.) Then use 'Make_RNA_Input.pl' to parse the results generated by MC-Annotate and save the output to a file. You may need to change the directory from which the script reads the MC-Annotate output files.
4. If you have processed more than one MC-Annotate result, split the output into separate files with 'CutMultiMCA.pl'. The input of this program should be the output file generated in step 3.
5. Copy the split files into './RNAinPDB/'.
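Before running 'Make_RNA_Input.pl', it can be worth checking that your copy of pdb_seqres.txt actually lists the RNA chains of the entry you are processing, since an outdated copy will not contain entries released after it was downloaded. The Python sketch below is a hypothetical helper (rna_chains is not part of RNAMotifScan). It assumes the FASTA-like layout of pdb_seqres.txt, where a header such as '>1ffk_0 mol:na length:2922  23S RRNA' is followed by the chain sequence and nucleic-acid chains are tagged 'mol:na'.

```python
"""Hypothetical helper: list the nucleic-acid chains of one entry in pdb_seqres.txt."""
from typing import Dict, Iterable, Optional


def rna_chains(lines: Iterable[str], pdb_id: str) -> Dict[str, str]:
    """Return {chain_id: sequence} for the 'mol:na' chains of pdb_id.

    Assumption: headers look like '>1ffk_0 mol:na length:2922  23S RRNA'
    and the sequence follows on the next line(s).
    """
    wanted = pdb_id.lower()
    chains: Dict[str, str] = {}
    current: Optional[str] = None  # chain whose sequence lines we are collecting
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = line[1:].split()
            entry, _, chain = fields[0].partition("_")
            is_na = len(fields) > 1 and fields[1] == "mol:na"
            # Only collect sequences for nucleic-acid chains of the requested entry.
            current = chain if entry.lower() == wanted and is_na else None
        elif current is not None:
            # Concatenate in case a sequence spans several lines.
            chains[current] = chains.get(current, "") + line
    return chains
```

For example, calling rna_chains on an open handle to pdb_seqres.txt with the ID '1FFK' returns a dictionary mapping chain IDs to sequences; an empty dictionary suggests the entry is missing and the file should be downloaded again.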
Annotation of output alignment:
- The line labeled 'iso' indicates base-pair isostericity. A star '*' means the matched position in the sequence is part of a matched isosteric base pair; a plus sign '+' means the matched position is part of a matched non-isosteric base pair.
- The line labeled 'edge' shows the interacting edges: 'W' for Watson-Crick, 'H' for Hoogsteen, and 'S' for the sugar edge. Upper-case letters denote cis base pairs, while lower-case letters denote trans base pairs.
- The line labeled 'struc' shows the base-pair locations. A matching '(' and ')' marks a base pair identified in the first stage, while a matching '<' and '>' marks a base pair identified in the second stage.
- A dot '.' in the sequence marks a break point between strands, which occurs when the query motif consists of multiple strands. A '-' symbol represents a gap in the sequence.
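Reading these annotation lines by hand can get tedious for long alignments. The Python sketch below decodes them following the conventions listed above: it matches the brackets of a 'struc' line separately for each stage, and maps 'edge' and 'iso' characters to their meanings. It assumes that the 'iso', 'edge' and 'struc' lines are column-aligned with the aligned sequence and that any non-bracket character in 'struc' means "unpaired"; the function names are hypothetical and not part of RNAMotifScan.

```python
"""Hypothetical decoder for the 'struc', 'edge' and 'iso' alignment lines."""
from typing import Dict, List, Tuple

# Conventions as described in this ReadMe; letter case encodes orientation.
EDGES = {"W": "Watson-Crick", "H": "Hoogsteen", "S": "Sugar"}
ISO = {"*": "isosteric", "+": "non-isosteric"}


def pairs_from_struc(struc: str) -> Dict[str, List[Tuple[int, int]]]:
    """Return 0-based column pairs per stage: '()' first stage, '<>' second.

    Separate stacks per bracket type let the two stages cross each other.
    Assumption: every other character in the line is unpaired.
    """
    stacks: Dict[str, List[int]] = {"(": [], "<": []}
    closers = {")": "(", ">": "<"}
    stage = {"(": "first_stage", "<": "second_stage"}
    result: Dict[str, List[Tuple[int, int]]] = {"first_stage": [], "second_stage": []}
    for col, ch in enumerate(struc):
        if ch in stacks:
            stacks[ch].append(col)
        elif ch in closers:
            opener = closers[ch]
            if not stacks[opener]:
                raise ValueError(f"unmatched {ch!r} at column {col}")
            result[stage[opener]].append((stacks[opener].pop(), col))
    for opener, left in stacks.items():
        if left:
            raise ValueError(f"unmatched {opener!r} at column {left[-1]}")
    for key in result:
        result[key].sort()
    return result


def decode_edge(ch: str) -> Tuple[str, str]:
    """Map an 'edge' character such as 'W' or 's' to (edge, orientation)."""
    name = EDGES.get(ch.upper())
    if name is None:
        raise ValueError(f"unknown edge letter {ch!r}")
    return name, "cis" if ch.isupper() else "trans"


def decode_iso(iso: str) -> Dict[int, str]:
    """Map each '*' or '+' column of an 'iso' line to its meaning."""
    return {col: ISO[ch] for col, ch in enumerate(iso) if ch in ISO}
```

For instance, pairs_from_struc("((..<..))..>") reports the first-stage pairs (0, 8) and (1, 7) and the second-stage pair (4, 11), and decode_edge("s") yields ("Sugar", "trans").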