README
RNAMotifScanX: a graph alignment approach for RNA structural motif identification
RNAMotifScanX is written using GNU C++ 4.4.7 and BOOST C++ library 1.55. The
software package contains three components, namely 'scan', 'cut', and 'align'.
'scan' accepts a query model and a formatted PDB annotation file, and identifies
all motif instances that are similar to the query. 'cut' and 'align' are not
part of the 'scan' component or of the analysis pipeline, but are
provided to make the results easier to interpret. 'cut' extracts individual motif
candidates from a formatted PDB annotation file, which lets users view
the local interactions. 'align' performs pairwise alignment between motif instances,
using the RNAMotifScanX alignment algorithm.
INSTALLATION
RNAMotifScanX is distributed as executables for 64-bit GNU/Linux systems.
- Unzip the package
Type: "tar -zxf RNAMotifScanX-release.tgz"
- Compile RNAVIEW
Go into RNAVIEW home directory: "cd RNAMotifScanX-release/thirdparty/RNAVIEW"
Type "make"
- Set the environment
Type 'source ./set_env.sh `pwd`'
To do this automatically every time you log in, add these commands to your
"~/.profile" or "~/.bash_profile":
$ RNAMOTIFSCANX_PATH=[YOUR RNAMOTIFSCANX HOME]
$ export RNAMOTIFSCANX_PATH
$ RNAVIEW=$RNAMOTIFSCANX_PATH/thirdparty/RNAVIEW
$ export RNAVIEW
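If a later step fails, it can help to confirm that both variables are visible
to child processes. The following is a minimal sketch, not part of the
RNAMotifScanX package; the function name 'check_environment' is hypothetical,
and it only checks that each variable is set and points to an existing directory.

```python
import os


def check_environment(env=None):
    """Return a list of problems with the two variables this README asks for.

    Hypothetical helper, not shipped with RNAMotifScanX.
    """
    env = os.environ if env is None else env
    problems = []
    for name in ("RNAMOTIFSCANX_PATH", "RNAVIEW"):
        value = env.get(name)
        if not value:
            problems.append(f"{name} is not set")
        elif not os.path.isdir(value):
            problems.append(f"{name} points to a missing directory: {value}")
    return problems


# Report on the current shell environment.
for line in check_environment() or ["both variables are set"]:
    print(line)
```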
FORMAT PDB STRUCTURES
RNAMotifScanX accepts a formatted PDB annotation file. To generate such a file,
first download the structure of interest in PDB format. Make sure you have Biopython
installed, and then run the annotation script:
(assume that you are under RNAMotifScanX home directory)
Type: "./scripts/PrepareInput.py ./pdb/1S72.pdb ./data/pdb_seqres.na.fa"
The second argument, './data/pdb_seqres.na.fa', is a FASTA-format file that contains
the primary sequence of your input structure. If the primary sequence of your structure
is not present in the file, please add the sequence before running the script.
The script generates two types of output files, e.g. './pdb/1S72_0.rmsx.in' and
'./pdb/1S72_0.rmsx.nch'. (Two files will be generated for each RNA chain.) The
first file is the formatted PDB annotation file that is required by RNAMotifScanX,
and the second file is a mapping between the residue IDs and their sequence
indexes. The second file is optional.
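Before running the script, you may want to confirm that the FASTA file mentions
your structure. The sketch below is not part of the package. It assumes that
each header line (the line starting with '>') contains the PDB ID somewhere;
the exact header layout of 'pdb_seqres.na.fa' is not described here, so the
match is deliberately loose. The function name and the two example records are
made up for illustration.

```python
def fasta_has_entry(fasta_lines, pdb_id):
    """Return True if any FASTA header line mentions pdb_id (case-insensitive).

    Assumption: the PDB ID appears in the header text. This is a loose
    substring match and may report false positives.
    """
    wanted = pdb_id.lower()
    for line in fasta_lines:
        if line.startswith(">") and wanted in line.lower():
            return True
    return False


# Made-up records, only to show the call; real headers may look different.
example = [
    ">1abc_0 example chain",
    "GGCUAGCC",
    ">2xyz_0 another example chain",
    "AUGC",
]
print(fasta_has_entry(example, "1ABC"))
print(fasta_has_entry(example, "1S72"))
```

With a real file, pass an open file handle instead of a list, for example
'with open("./data/pdb_seqres.na.fa") as handle: fasta_has_entry(handle, "1S72")'.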
SCANNING THE STRUCTURE
You can now use 'scan' to search query motifs against your structure of interest:
(assume that you are under RNAMotifScanX home directory)
Type "./bin/scan ./models/k-turn_consensus.struct ./pdb/1S72_0.rmsx.in"
To output PDB residue IDs instead of absolute indices with respect to the primary sequence:
Type "./bin/scan ./models/k-turn_consensus.struct ./pdb/1S72_0.rmsx.in --map_pdb=./pdb/1S72_0.rmsx.nch"
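Because the annotation script writes one pair of files per RNA chain, it can be
convenient to build the 'scan' command line for each chain programmatically.
The sketch below is not part of the package. It uses only the arguments shown
above and assumes the mapping file sits next to the annotation file with the
'.rmsx.nch' extension; the function names are hypothetical.

```python
import shlex


def map_file_for(annotation):
    """Derive the '.rmsx.nch' mapping file name from a '.rmsx.in' file name."""
    suffix = ".rmsx.in"
    if not annotation.endswith(suffix):
        raise ValueError(f"not a formatted annotation file: {annotation}")
    return annotation[: -len(suffix)] + ".rmsx.nch"


def build_scan_command(model, annotation, use_map=True, scan_bin="./bin/scan"):
    """Return the argument list for one 'scan' run (the command is not executed)."""
    cmd = [scan_bin, model, annotation]
    if use_map:
        # Ask 'scan' to report PDB IDs by passing the mapping file.
        cmd.append("--map_pdb=" + map_file_for(annotation))
    return cmd


cmd = build_scan_command("./models/k-turn_consensus.struct", "./pdb/1S72_0.rmsx.in")
print(shlex.join(cmd))
```

The returned list can be passed to 'subprocess.run(cmd, check=True)' from the
RNAMotifScanX home directory to actually run the scan.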
EXTRACTING INDIVIDUAL MOTIFS
If you want to extract individual motifs (for example, to view their local interactions):
Type "./bin/cut ./pdb/1S72_0.rmsx.in --out_dir=./workspace"
If you want the extracted motifs to be candidates for a specific
motif family:
Type "./bin/cut ./pdb/1S72_0.rmsx.in --query_motif=./models/k-turn_consensus.struct --out_dir=./workspace"
PAIRWISE ALIGNMENT OF MOTIFS
If you want to perform pairwise alignment between individual motifs:
Type "./bin/align [Motif_A] [Motif_B]"
Here, 'Motif_A' and 'Motif_B' are individual motifs that are represented using
the same format as those for the models (./models/) or those generated by the
'cut' command.
FINDING THE EXACT MOTIF BOUNDARY
The program 'scan' predicts a set of rough regions that are the high-scoring hits
with respect to the query motif. At times you may want to find the exact boundary
of a motif that has been identified. You can find this information as
follows.
Take the first kink-turn result for example, which is '1S72_0:72-81_89-102':
* Identify the candidate motif file '1S72_0:72-81_89-102.rmf', which was generated
using the 'cut' program and should be in your workspace directory.
* Align the kink-turn query motif with this candidate motif using the 'align' program.
* In the result produced by 'align', you can find a field tagged 'Target aligned region'.
In the example it would show 'Target aligned region: 3-7,12-19'.
* Here, '3-7' means that the 3rd to the 7th residues (0-based indexing) of the candidate
motif are aligned. Given the motif region '1S72_0:72-81_89-102', '72' is the 0th residue, '73'
is the 1st, '74' is the 2nd, ..., '81' is the 9th, '89' is the 10th, etc.
In this case, the indices for the aligned residues in the candidate motif
are '75-79,91-98'.
* The indices in the candidate motif, i.e. '75-79,91-98', correspond to the positions
of the residues in the raw sequence (i.e. in this case the sequence in file '1S72_0.rmsx.in').
To recover the PDB indexing for each residue, one can consult the mapping file
(i.e. in this case the file '1S72_0.rmsx.nch').
* After mapping file lookup, the region is mapped as '0'77-'0'81/'0'93-'0'100.
The regions shown in the main manuscript are rough regions that were manually mapped
and may not accurately reflect the regions that were actually aligned. We are currently
working on automating the residue mapping stage.
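Until then, the index arithmetic in the steps above can be done with a short
script. The following is a minimal sketch, not part of the package; the
function names are hypothetical. It converts the 0-based offsets from 'Target
aligned region' into sequence indices and assumes each aligned range stays
within one strand of the candidate motif. The final lookup in the '.rmsx.nch'
mapping file is left out because its layout is not described here.

```python
def parse_ranges(text, sep):
    """Parse inclusive ranges, e.g. '72-81_89-102' with sep='_' -> [(72, 81), (89, 102)]."""
    ranges = []
    for part in text.split(sep):
        start, end = part.split("-")
        ranges.append((int(start), int(end)))
    return ranges


def aligned_to_sequence_indices(motif_region, aligned_region):
    """Map 0-based offsets within a candidate motif to sequence indices.

    motif_region   -- e.g. '72-81_89-102' (the part after '1S72_0:')
    aligned_region -- e.g. '3-7,12-19' (the 'Target aligned region' field)
    """
    # Flatten the motif region into the ordered list of sequence indices it covers,
    # so that list position k is the k-th residue of the candidate motif.
    positions = []
    for start, end in parse_ranges(motif_region, "_"):
        positions.extend(range(start, end + 1))

    # Translate the first and last offset of every aligned range.
    mapped = []
    for first, last in parse_ranges(aligned_region, ","):
        mapped.append((positions[first], positions[last]))
    return mapped


result = aligned_to_sequence_indices("72-81_89-102", "3-7,12-19")
print(",".join(f"{a}-{b}" for a, b in result))  # prints 75-79,91-98
```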
ACKNOWLEDGEMENTS:
RNAMotifScanX was developed for an NIH-funded project (R01GM102515).
RNAMotifScanX uses packages from the C++ BOOST Library (www.boost.org).
Third party software MC-Annotate was downloaded from
http://major.iric.ca/MajorLabEn/MC-Tools.html.
Third party software RNAVIEW was downloaded from
http://ndbserver.rutgers.edu/ndbmodule/services/download/rnamlview.html.
CONTACTS:
For bug reports or comments please contact:
cczhong@eecs.ucf.edu or shzhang@eecs.ucf.edu.