RNAMotifScanX

ABSTRACT

  RNA structural motifs are recurrent three-dimensional (3D) components of RNA architecture. They play important structural or functional roles and usually exhibit highly conserved 3D geometries and base-interaction patterns. Analyzing RNA 3D structures and elucidating their molecular functions depend heavily on efficient and accurate identification of these motifs. However, because these motifs are highly complex, efficient tools for searching RNA structural motifs are lacking. In this work we present RNAMotifScanX, a motif search tool based on a base-interaction graph alignment algorithm. This novel algorithm enables automatic identification of both partially and fully matched motif instances. RNAMotifScanX considers non-canonical base-pairing interactions, base-stacking interactions, and sequence conservation of the motifs, which leads to significantly improved sensitivity and specificity compared with other state-of-the-art search tools. RNAMotifScanX also adopts a carefully designed branch-and-bound technique, which enables ultra-fast search of large kink-turn motifs against a 23S rRNA. The software package RNAMotifScanX is implemented in GNU C++.


    README

    RNAMotifScanX: a graph alignment approach for RNA structural motif identification

     

    RNAMotifScanX is written in GNU C++ 4.4.7 and uses the BOOST C++ library 1.55.
    The software package contains three components: 'scan', 'cut', and 'align'.
    'scan' accepts a query model and a formatted PDB annotation file, and identifies
    all motif instances that are similar to the query. 'cut' and 'align' are not
    part of 'scan' or of the analysis pipeline; they are provided to make the results
    easier to interpret. 'cut' extracts individual motif candidates from a formatted
    PDB annotation file, which lets users view the local interactions. 'align'
    performs pairwise alignment between motif instances using the RNAMotifScanX
    alignment algorithm.

     


    INSTALLATION

     

    RNAMotifScanX is distributed as executables for 64-bit GNU/Linux systems.

     

    1. Unzip the package

                    Type: "tar -zxf RNAMotifScanX-release.tgz"

    2. Compile RNAVIEW

                    Go to the RNAVIEW home directory: "cd RNAMotifScanX-release/thirdparty/RNAVIEW"

                    Type: "make"

    3. Set the environment

                    Type: 'source ./set_env.sh `pwd`'

     

    To do this automatically every time you log in, add the following lines to your
    "~/.profile" or "~/.bash_profile":

                    RNAMOTIFSCANX_PATH=[YOUR RNAMOTIFSCANX HOME]
                    export RNAMOTIFSCANX_PATH
                    RNAVIEW=$RNAMOTIFSCANX_PATH/thirdparty/RNAVIEW
                    export RNAVIEW
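After editing the profile, you may want to confirm that both variables are visible to programs started from your shell. The Python sketch below is a hypothetical helper, not part of the package: `check_env` only checks that `RNAMOTIFSCANX_PATH` and `RNAVIEW` are set and that `RNAVIEW` points to `thirdparty/RNAVIEW` under the home directory, as in the profile lines above. It does not check that RNAVIEW was actually compiled.

```python
import os
from pathlib import Path
from typing import List, Mapping


def check_env(environ: Mapping[str, str] = os.environ) -> List[str]:
    """Hypothetical helper: list problems with the RNAMotifScanX environment."""
    problems = []
    home = environ.get("RNAMOTIFSCANX_PATH")
    rnaview = environ.get("RNAVIEW")
    if not home:
        problems.append("RNAMOTIFSCANX_PATH is not set")
    if not rnaview:
        problems.append("RNAVIEW is not set")
    if home and rnaview:
        # The profile lines set RNAVIEW to $RNAMOTIFSCANX_PATH/thirdparty/RNAVIEW.
        expected = Path(home) / "thirdparty" / "RNAVIEW"
        if Path(rnaview) != expected:
            problems.append(f"RNAVIEW is {rnaview}, expected {expected}")
    return problems


if __name__ == "__main__":
    issues = check_env()
    print("\n".join(issues) if issues else "environment looks OK")
```

Passing a plain dictionary instead of `os.environ` makes the check easy to try without touching your real environment.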

     


    FORMAT PDB STRUCTURES

     

    RNAMotifScanX takes a formatted PDB annotation file as input. To generate this
    file, first download the structure of interest in PDB format and make sure that
    Biopython is installed. Then run the annotation script (assuming you are in the
    RNAMotifScanX home directory):

     

                    Type: "./scripts/PrepareInput.py ./pdb/1S72.pdb ./data/pdb_seqres.na.fa"

     

    The second argument, './data/pdb_seqres.na.fa', is a FASTA file that contains
    the primary sequence of your input structure. If the primary sequence of your
    structure is not present in this file, add it before running the script.
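A quick way to check whether your structure already has a record in the FASTA file is sketched below. This is a hypothetical helper (run, for example, as a script named `check_fasta.py`, also hypothetical), and it rests on an assumption we have not verified for the bundled file: that each FASTA header begins with the PDB ID, as in the RCSB `pdb_seqres` layout. Adjust `has_sequence_for` if your headers look different.

```python
import sys
from typing import Iterable, List


def fasta_headers(lines: Iterable[str]) -> List[str]:
    """Return the header text (without the leading '>') of every FASTA record."""
    return [line[1:].strip() for line in lines if line.startswith(">")]


def has_sequence_for(lines: Iterable[str], pdb_id: str) -> bool:
    """True if some header starts with `pdb_id` (case-insensitive).

    Assumption: headers begin with the PDB ID, e.g. '>1abc_A ...'.
    """
    prefix = pdb_id.lower()
    return any(h.lower().startswith(prefix) for h in fasta_headers(lines))


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python check_fasta.py ./data/pdb_seqres.na.fa 1S72
    with open(sys.argv[1]) as handle:
        found = has_sequence_for(handle, sys.argv[2])
    print("found" if found else "missing: add the sequence before running PrepareInput.py")
```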

     

    The script generates two output files for each RNA chain, e.g.
    './pdb/1S72_0.rmsx.in' and './pdb/1S72_0.rmsx.nch'. The first is the formatted
    PDB annotation file required by RNAMotifScanX; the second maps residue IDs to
    their sequence indices. The second file is optional; 'scan' uses it with the
    '--map_pdb' option.

     


    SCANNING THE STRUCTURE

     

    You can now use 'scan' to search a query motif against your structure of interest
    (assuming you are in the RNAMotifScanX home directory):

     

                    Type "./bin/scan ./models/k-turn_consensus.struct ./pdb/1S72_0.rmsx.in"

     

    To report PDB residue IDs instead of absolute indices in the primary sequence,
    pass the mapping file:

     

                    Type "./bin/scan ./models/k-turn_consensus.struct ./pdb/1S72_0.rmsx.in --map_pdb=./pdb/1S72_0.rmsx.nch"
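Because PrepareInput.py writes one annotation file (and one mapping file) per RNA chain, scanning every chain of a structure means running 'scan' once per chain. The sketch below builds those command lines; the helper names and the script name `scan_all.py` are hypothetical, while the argument order and the '--map_pdb=' option are copied from the commands above. It attaches '--map_pdb' only when the matching '.rmsx.nch' file exists, and it assumes it is run from the RNAMotifScanX home directory.

```python
import subprocess
import sys
from pathlib import Path
from typing import List, Optional


def nch_for(annot: str) -> str:
    """'pdb/1S72_0.rmsx.in' -> 'pdb/1S72_0.rmsx.nch' (replaces the last suffix)."""
    return str(Path(annot).with_suffix(".nch"))


def scan_command(model: str, annot: str, mapping: Optional[str] = None) -> List[str]:
    """Argument list for one './bin/scan' run, mirroring the README commands."""
    cmd = ["./bin/scan", model, annot]
    if mapping is not None:
        cmd.append(f"--map_pdb={mapping}")
    return cmd


def scan_commands(model: str, pdb_dir: str, pdb_id: str) -> List[List[str]]:
    """One command per chain file '<pdb_id>_<chain>.rmsx.in' found in `pdb_dir`."""
    commands = []
    for annot in sorted(Path(pdb_dir).glob(f"{pdb_id}_*.rmsx.in")):
        mapping = nch_for(str(annot))
        use_mapping = mapping if Path(mapping).exists() else None
        commands.append(scan_command(model, str(annot), use_mapping))
    return commands


if __name__ == "__main__" and len(sys.argv) == 4:
    # Usage: python scan_all.py ./models/k-turn_consensus.struct ./pdb 1S72
    for cmd in scan_commands(*sys.argv[1:]):
        # check=True stops at the first chain for which 'scan' exits with an error
        subprocess.run(cmd, check=True)
```

Building the argument lists separately from running them makes it easy to print and inspect the commands before launching a long batch.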

     


    EXTRACTING INDIVIDUAL MOTIFS

     

    To extract individual motifs (for example, to view their local interactions):

                   

                    Type "./bin/cut ./pdb/1S72_0.rmsx.in --out_dir=./workspace"

     

    To restrict the extracted motifs to candidates of a specific motif family,
    supply the query model:

     

                    Type "./bin/cut ./pdb/1S72_0.rmsx.in --query_motif=./models/k-turn_consensus.struct --out_dir=./workspace"
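When extracting candidates from several chains or structures, the two 'cut' invocations above can be wrapped as follows. This is a hypothetical sketch: `cut_command` reproduces the argument order shown above, and `run_cut` creates the output directory beforehand as a precaution (we have not checked whether 'cut' creates it itself), then lists the '.rmf' candidate files it finds there.

```python
import subprocess
from pathlib import Path
from typing import List, Optional


def cut_command(annot: str, out_dir: str, query_motif: Optional[str] = None) -> List[str]:
    """Argument list for './bin/cut', optionally restricted to one motif family."""
    cmd = ["./bin/cut", annot]
    if query_motif is not None:
        cmd.append(f"--query_motif={query_motif}")
    cmd.append(f"--out_dir={out_dir}")
    return cmd


def run_cut(annot: str, out_dir: str, query_motif: Optional[str] = None) -> List[Path]:
    """Run 'cut' and return the '*.rmf' files in `out_dir`.

    The listing also includes any '.rmf' files left over from earlier runs.
    """
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(cut_command(annot, out_dir, query_motif), check=True)
    return sorted(Path(out_dir).glob("*.rmf"))
```

For example, `run_cut("./pdb/1S72_0.rmsx.in", "./workspace", "./models/k-turn_consensus.struct")` corresponds to the second command above. Using a fresh output directory per run keeps the returned list limited to the new candidates.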

     


    PAIRWISE ALIGNMENT OF MOTIFS

     

    To perform a pairwise alignment between two individual motifs:

     

                    Type "./bin/align [Motif_A] [Motif_B]"

     

    Here, 'Motif_A' and 'Motif_B' are individual motif files in the same format as
    the models in './models/' or the files generated by the 'cut' command.
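To compare every pair among a set of candidates (for example, the '.rmf' files written by 'cut'), 'align' has to be run once per unordered pair, i.e. n(n-1)/2 times for n files. The sketch below builds and runs those commands; the helper names are hypothetical, only the './bin/align [Motif_A] [Motif_B]' call comes from this README, and it assumes 'align' prints its report to standard output.

```python
import subprocess
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def align_commands(motif_files: Sequence[str]) -> List[List[str]]:
    """One './bin/align A B' command per unordered pair of motif files."""
    return [["./bin/align", a, b] for a, b in combinations(motif_files, 2)]


def align_all(motif_files: Sequence[str]) -> Dict[Tuple[str, str], str]:
    """Run every pairwise alignment and keep each report, keyed by file pair.

    Assumption: 'align' writes its result (including the
    'Target aligned region' field) to standard output.
    """
    reports = {}
    for cmd in align_commands(motif_files):
        result = subprocess.run(cmd, check=True, capture_output=True, text=True)
        reports[(cmd[1], cmd[2])] = result.stdout
    return reports
```

Because the number of runs grows quadratically, it is usually worth restricting the input to candidates of one motif family first.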

     


    FINDING THE EXACT MOTIF BOUNDARY

     

    'scan' reports a set of rough regions: the high-scoring hits with respect to the
    query motif. Sometimes you may want the exact boundary of an identified motif.
    You can obtain it as follows.

     

    Take the first kink-turn hit, '1S72_0:72-81_89-102', as an example:

     

    * Identify the candidate motif file '1S72_0:72-81_89-102.rmf', which was generated
      by the 'cut' program and should be in your workspace directory.

    * Align this candidate against the kink-turn query motif using the 'align' program.

    * In the output of 'align', find the field tagged 'Target aligned region'; in this
      example it reads 'Target aligned region: 3-7,12-19'.

    * Here, '3-7' means that the 3rd through the 7th residues (0-based indexing) of the
      candidate are aligned. Given the motif region '1S72_0:72-81_89-102', '72' is the
      0th residue, '73' is the 1st, '74' is the 2nd, ..., '81' is the 9th, '89' is the
      10th, and so on. In this case, the aligned residues in the candidate motif are
      '75-79,91-98'.

    * The indices in the candidate motif, i.e. '75-79,91-98', are the positions of the
      residues in the raw sequence (here, the sequence in file '1S72_0.rmsx.in').
      To recover the PDB numbering of each residue, consult the mapping file
      (here, '1S72_0.rmsx.nch').

    * After the mapping file lookup, the region becomes '0'77-'0'81/'0'93-'0'100.
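The index arithmetic in the steps above is mechanical and can be scripted. The sketch below is a hypothetical helper, not part of RNAMotifScanX: it expands the segments of a hit name such as '1S72_0:72-81_89-102' into a flat list, so that 0-based position i of the candidate maps to the i-th residue in that list, then maps the 'Target aligned region' through it and regroups consecutive indices into ranges. The final lookup takes a plain dictionary from sequence index to PDB residue ID; the layout of the '.rmsx.nch' file is not described in this README, so building that dictionary is left to you.

```python
from typing import Dict, List, Tuple

Range = Tuple[int, int]


def parse_ranges(text: str, sep: str) -> List[Range]:
    """'72-81_89-102' with sep='_' -> [(72, 81), (89, 102)]."""
    ranges = []
    for part in text.split(sep):
        start, end = part.split("-")
        ranges.append((int(start), int(end)))
    return ranges


def aligned_sequence_ranges(hit: str, aligned: str) -> List[Range]:
    """Map 'Target aligned region' (0-based, local) to sequence indices.

    hit:     a 'scan' hit such as '1S72_0:72-81_89-102'
    aligned: the 'Target aligned region' value, such as '3-7,12-19'
    """
    region = hit.rsplit(":", 1)[1]
    # Local index i of the candidate motif -> residues[i] in the raw sequence.
    residues = [r for lo, hi in parse_ranges(region, "_") for r in range(lo, hi + 1)]
    mapped = [residues[i] for lo, hi in parse_ranges(aligned, ",") for i in range(lo, hi + 1)]
    # Regroup consecutive sequence indices into (start, end) ranges.
    runs: List[Range] = []
    for r in mapped:
        if runs and r == runs[-1][1] + 1:
            runs[-1] = (runs[-1][0], r)
        else:
            runs.append((r, r))
    return runs


def format_ranges(runs: List[Range]) -> str:
    """[(75, 79), (91, 98)] -> '75-79,91-98'."""
    return ",".join(f"{lo}-{hi}" for lo, hi in runs)


def to_pdb_region(runs: List[Range], index_to_pdb: Dict[int, str]) -> str:
    """Translate range endpoints with a user-built index -> PDB ID dictionary."""
    return "/".join(f"{index_to_pdb[lo]}-{index_to_pdb[hi]}" for lo, hi in runs)


if __name__ == "__main__":
    runs = aligned_sequence_ranges("1S72_0:72-81_89-102", "3-7,12-19")
    print(format_ranges(runs))  # 75-79,91-98
```

For the worked example, `to_pdb_region(runs, {75: "'0'77", 79: "'0'81", 91: "'0'93", 98: "'0'100"})` returns "'0'77-'0'81/'0'93-'0'100", with the dictionary entries taken from the last step above. Regrouping after mapping (rather than mapping each aligned range directly) also handles an aligned range that spans the gap between two motif segments.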

     

    The regions shown in the main manuscript are rough regions that were mapped
    manually and may not exactly match the aligned regions. We are currently working
    on automating the residue-mapping step.


     

    ACKNOWLEDGEMENTS:

     

                    RNAMotifScanX was developed for an NIH-funded project (R01GM102515).

     

                    RNAMotifScanX uses packages from the C++ BOOST Library (www.boost.org).

     

                    Third party software MC-Annotate was downloaded from

                    http://major.iric.ca/MajorLabEn/MC-Tools.html.

     

                    Third party software RNAVIEW was downloaded from

                    http://ndbserver.rutgers.edu/ndbmodule/services/download/rnamlview.html.

     

    CONTACTS:

     

                    For bug reports or comments please contact:

                    cczhong@eecs.ucf.edu or shzhang@eecs.ucf.edu.
