RNAMotifScanX

ABSTRACT

RNA structural motifs are recurrent three-dimensional (3D) components found in RNA architecture. These RNA structural motifs play important structural or functional roles and usually exhibit highly conserved 3D geometries and base-interaction patterns. Analysis of RNA 3D structures and elucidation of their molecular functions rely heavily on efficient and accurate identification of these motifs. However, efficient RNA structural motif search tools are lacking because of the high complexity of these motifs. In this work we present RNAMotifScanX, a motif search tool based on a base-interaction graph alignment algorithm. This novel algorithm enables automatic identification of both partially and fully matched motif instances. RNAMotifScanX considers non-canonical base-pairing interactions, base-stacking interactions, and sequence conservation of the motifs, which leads to significantly improved sensitivity and specificity compared to other state-of-the-art search tools. RNAMotifScanX also adopts a carefully designed branch-and-bound technique, which enables ultra-fast search of large kink-turn motifs against a 23S rRNA. The software package RNAMotifScanX is implemented in GNU C++.


README

RNAMotifScanX: a graph alignment approach for RNA structural motif identification

RNAMotifScanX is written in GNU C++ 4.4.7 with the BOOST C++ library 1.55. The
software package contains three components: 'scan', 'cut', and 'align'.
'scan' accepts a query model and a formatted PDB annotation file, and identifies
all motif instances that are similar to the query. 'cut' and 'align' are not
part of 'scan' or of the analysis pipeline; they are provided to make the
results easier to interpret. 'cut' extracts individual motif candidates from a
formatted PDB annotation file, so that users can view their local interactions.
'align' performs pairwise alignment between motif instances using the
RNAMotifScanX alignment algorithm.

INSTALLATION

RNAMotifScanX is distributed as executables for 64-bit GNU/Linux systems.

Unzip the package

Type: "tar -zxf RNAMotifScanX-release.tgz"

Compile RNAVIEW

Go into RNAVIEW home directory: "cd RNAMotifScanX-release/thirdparty/RNAVIEW"

Type "make"

Set the environment

From the RNAMotifScanX home directory, type 'source ./set_env.sh `pwd`'

To do this every time you log in, add the following lines to your
"~/.profile" or "~/.bash_profile":

RNAMOTIFSCANX_PATH=[YOUR RNAMOTIFSCANX HOME]
export RNAMOTIFSCANX_PATH
RNAVIEW=$RNAMOTIFSCANX_PATH/thirdparty/RNAVIEW
export RNAVIEW
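As a quick sanity check after editing your profile, the following Python sketch (not part of the RNAMotifScanX package; the function name is our own) verifies that RNAMOTIFSCANX_PATH and RNAVIEW are both set and that RNAVIEW points to thirdparty/RNAVIEW under the home directory, as in the lines above. It does not check that RNAVIEW was actually compiled.

```python
import os
from pathlib import Path


def check_rnamotifscanx_env(environ=os.environ):
    """Return a list of problems with the RNAMotifScanX variables (empty if none).

    Hypothetical helper: it only checks the two variables set in the profile
    lines, not whether RNAVIEW was actually compiled.
    """
    problems = []
    home = environ.get("RNAMOTIFSCANX_PATH")
    rnaview = environ.get("RNAVIEW")
    if not home:
        problems.append("RNAMOTIFSCANX_PATH is not set")
    if not rnaview:
        problems.append("RNAVIEW is not set")
    if home and rnaview:
        # The profile lines define RNAVIEW as $RNAMOTIFSCANX_PATH/thirdparty/RNAVIEW.
        expected = Path(home) / "thirdparty" / "RNAVIEW"
        if Path(rnaview) != expected:
            problems.append(f"RNAVIEW is {rnaview}, expected {expected}")
    return problems


if __name__ == "__main__":
    issues = check_rnamotifscanx_env()
    for issue in issues:
        print("warning:", issue)
    if not issues:
        print("RNAMotifScanX environment variables look consistent")
```

Because the function takes the environment as a parameter, it can also be called with a plain dictionary to test a configuration before putting it in your profile.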

FORMAT PDB STRUCTURES

RNAMotifScanX takes a formatted PDB annotation file as input. To generate such a
file, first download the structure of interest in PDB format and make sure
Biopython is installed. Then run the annotation script
(assuming you are in the RNAMotifScanX home directory):

Type: "./scripts/PrepareInput.py ./pdb/1S72.pdb ./data/pdb_seqres.na.fa"

The second argument, './data/pdb_seqres.na.fa', is a FASTA file that contains
the primary sequence of your input structure. If the primary sequence of your
structure is not present in the file, add it before running the script.
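Before running the script, you can check whether the FASTA file already contains your structure. The Python sketch below is not part of the package, and its header handling is an assumption: it expects RCSB pdb_seqres-style headers in which the first token after '>' is '<PDB ID>_<chain>' (e.g. '>1s72_0 mol:na ...'). Inspect your copy of the FASTA file and adjust the parsing if its headers differ.

```python
def find_pdb_entries(fasta_lines, pdb_id):
    """Return the header lines in fasta_lines whose entry ID matches pdb_id.

    Assumes RCSB pdb_seqres-style headers ('><pdb id>_<chain> ...'); the
    comparison ignores case and the chain suffix.
    """
    wanted = pdb_id.lower()
    hits = []
    for line in fasta_lines:
        if not line.startswith(">"):
            continue  # sequence line, not a header
        tokens = line[1:].split()
        if not tokens:
            continue  # empty header line
        entry_id = tokens[0].split("_", 1)[0]  # '1s72_0' -> '1s72'
        if entry_id.lower() == wanted:
            hits.append(line.rstrip("\n"))
    return hits
```

For example, `with open("./data/pdb_seqres.na.fa") as fasta: print(find_pdb_entries(fasta, "1S72"))` lists the matching headers. An empty list suggests the sequence has to be added before running PrepareInput.py.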

The script generates two output files for each RNA chain, e.g. './pdb/1S72_0.rmsx.in'
and './pdb/1S72_0.rmsx.nch'. The first is the formatted PDB annotation file
required by RNAMotifScanX; the second maps the PDB residue IDs to their sequence
indices. The second file is optional.

SCANNING THE STRUCTURE

You can now use 'scan' to search query motifs against your structure of interest:

(assuming you are in the RNAMotifScanX home directory)

Type "./bin/scan ./models/k-turn_consensus.struct ./pdb/1S72_0.rmsx.in"

To report PDB residue IDs instead of absolute indices in the primary sequence:

Type "./bin/scan ./models/k-turn_consensus.struct ./pdb/1S72_0.rmsx.in --map_pdb=./pdb/1S72_0.rmsx.nch"
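When a structure has several chains, the 'scan' commands can be generated in a loop. The Python sketch below is not part of the package, and the function name is our own. It only uses the arguments documented above: the query model, each '*.rmsx.in' file, and '--map_pdb=' with the matching '.rmsx.nch' file when that file exists. It assumes you run it from the RNAMotifScanX home directory so that './bin/scan' resolves.

```python
from pathlib import Path


def build_scan_commands(model, pdb_dir, scan_bin="./bin/scan"):
    """Build one 'scan' command (as an argument list) per chain file in pdb_dir.

    Hypothetical helper: it pairs each *.rmsx.in file with its *.rmsx.nch
    mapping file and adds --map_pdb only when the mapping file exists.
    """
    commands = []
    for annotation in sorted(Path(pdb_dir).glob("*.rmsx.in")):
        cmd = [scan_bin, str(model), str(annotation)]
        # '1S72_0.rmsx.in' -> '1S72_0.rmsx.nch' (only the last suffix changes)
        mapping = annotation.with_suffix(".nch")
        if mapping.exists():
            cmd.append(f"--map_pdb={mapping}")
        commands.append(cmd)
    return commands
```

For a dry run, print each command with `shlex.join(cmd)`; to execute it, pass the list to `subprocess.run(cmd)`. Returning argument lists rather than shell strings avoids quoting problems with unusual file names.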

EXTRACTING INDIVIDUAL MOTIFS

If you want to extract individual motifs (for example, to view their local interactions):

Type "./bin/cut ./pdb/1S72_0.rmsx.in --out_dir=./workspace"

To extract only the candidates for a specific motif family:

Type "./bin/cut ./pdb/1S72_0.rmsx.in --query_motif=./models/k-turn_consensus.struct --out_dir=./workspace"

PAIRWISE ALIGNMENT OF MOTIFS

If you want to perform pairwise alignment between individual motifs:

Type "./bin/align [Motif_A] [Motif_B]"

Here, 'Motif_A' and 'Motif_B' are individual motifs represented in the same
format as the models (./models/) or the files generated by the 'cut' command.

FINDING THE EXACT MOTIF BOUNDARY

The program 'scan' predicts a set of rough regions, i.e. the high-scoring hits
with respect to the query motif. Sometimes you may want the exact boundary of an
identified motif. You can find it as follows.

Take the first kink-turn result, '1S72_0:72-81_89-102', as an example:

* Identify the candidate motif file '1S72_0:72-81_89-102.rmf', which was generated
by the 'cut' program and should be in your workspace directory.

* Align the kink-turn query motif against this candidate using the 'align' program.

* In the output of 'align', find the field tagged 'Target aligned region'; in this
example it reads 'Target aligned region: 3-7,12-19'.

* Here, '3-7' means that the 3rd through 7th residues (0-based indexing) of the
candidate are aligned. Given the motif region '1S72_0:72-81_89-102', '72' is the
0th residue, '73' is the 1st, '74' is the 2nd, ..., '81' is the 9th, '89' is the
10th, etc. In this case, the indices of the aligned residues in the candidate
motif are '75-79,91-98'.

* These indices ('75-79,91-98') are positions in the raw sequence (in this case,
the sequence in file '1S72_0.rmsx.in'). To recover the PDB numbering of each
residue, consult the mapping file (in this case, '1S72_0.rmsx.nch').

* After mapping file lookup, the region is mapped as '0'77-'0'81/'0'93-'0'100.
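The index arithmetic in the steps above can be scripted. The Python sketch below is not part of RNAMotifScanX, and all function names are our own. It assumes that 'align' prints the field on a single line in the form quoted above ('Target aligned region: 3-7,12-19') and that motif names follow the '<chain>:<start>-<end>_<start>-<end>' pattern of '1S72_0:72-81_89-102'. It stops at raw-sequence indices; the final PDB lookup still goes through the '.rmsx.nch' file, whose layout is not described here.

```python
import re


def parse_target_region(align_output):
    """Extract the 0-based 'Target aligned region' ranges from 'align' output text."""
    # [ \t] rather than \s so the match cannot run onto the next line.
    match = re.search(r"Target aligned region:[ \t]*([0-9,\- \t]+)", align_output)
    if match is None:
        raise ValueError("no 'Target aligned region' field found")
    ranges = []
    for part in match.group(1).split(","):
        part = part.strip()
        if part:
            start, _, end = part.partition("-")
            ranges.append((int(start), int(end or start)))
    return ranges


def motif_residues(motif_name):
    """Expand '1S72_0:72-81_89-102' into [72, ..., 81, 89, ..., 102]."""
    _, _, segments = motif_name.rpartition(":")
    residues = []
    for segment in segments.split("_"):
        start, _, end = segment.partition("-")
        residues.extend(range(int(start), int(end or start) + 1))
    return residues


def collapse(indices):
    """Group sorted integers into (start, end) runs of consecutive values."""
    runs = []
    for i in indices:
        if runs and i == runs[-1][1] + 1:
            runs[-1] = (runs[-1][0], i)
        else:
            runs.append((i, i))
    return runs


def aligned_sequence_ranges(motif_name, target_ranges):
    """Map 0-based candidate positions to index ranges in the raw sequence."""
    residues = motif_residues(motif_name)
    positions = sorted({p for start, end in target_ranges for p in range(start, end + 1)})
    return collapse([residues[p] for p in positions])


def format_ranges(ranges):
    """Render [(75, 79), (91, 98)] as '75-79,91-98'."""
    return ",".join(f"{start}-{end}" for start, end in ranges)


if __name__ == "__main__":
    output = "Target aligned region: 3-7,12-19"
    ranges = aligned_sequence_ranges("1S72_0:72-81_89-102", parse_target_region(output))
    print(format_ranges(ranges))  # 75-79,91-98
```

Mapping each aligned position individually and then collapsing consecutive indices means an aligned range that crosses a segment break (e.g. positions 8-11 here) is correctly split into two sequence ranges.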

The regions shown in the main manuscript are rough regions that were mapped
manually and may not exactly match the aligned regions. We are currently working
on automating the residue mapping stage.

ACKNOWLEDGEMENTS:

RNAMotifScanX was developed for an NIH-funded project (R01GM102515).

RNAMotifScanX uses packages from the C++ BOOST Library (www.boost.org).

Third-party software MC-Annotate was downloaded from
http://major.iric.ca/MajorLabEn/MC-Tools.html.

Third-party software RNAVIEW was downloaded from
http://ndbserver.rutgers.edu/ndbmodule/services/download/rnamlview.html.

CONTACTS:

For bug reports or comments please contact:

cczhong@eecs.ucf.edu or shzhang@eecs.ucf.edu.

DOWNLOADS

RNAMotifScanX-release_v0.0.5_x86-64_rhel.tar.gz

Prebuilt PDB annotation files
PDB_prebuild.tgz