README
GRASP is written in GNU C++ (g++ 4.8.1) and has been tested on a
64-bit GNU/Linux system. GRASP contains two components, 'Build' and
'Search'. 'Build' creates a one-time index of the reads, against which
the reference (query) sequences are searched. 'Search' performs the
actual assembly and search against the pre-built index files.
NOTE: GRASP currently supports only reads of up to 255 amino acids
in length.
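Because of this limit, it can help to check a read file before running 'Build'. The sketch below is not part of GRASP. The function names parse_fasta and too_long_reads are hypothetical. It assumes a standard FASTA file in which each record starts with a '>' header line. Offending reads are reported by their 0-based position in the file, which is how GRASP numbers reads in its output.

```python
MAX_READ_LEN = 255  # documented GRASP limit on individual read length (amino acids)


def parse_fasta(lines):
    """Yield (header, sequence) pairs from FASTA-formatted lines, in file order."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)  # sequences may be wrapped over several lines
    if header is not None:
        yield header, "".join(chunks)


def too_long_reads(lines, limit=MAX_READ_LEN):
    """Return (read_index, header, length) for each read longer than `limit`.

    read_index counts from 0 in file order, the same way GRASP numbers reads.
    """
    return [
        (i, header, len(seq))
        for i, (header, seq) in enumerate(parse_fasta(lines))
        if len(seq) > limit
    ]
```

Example usage: `with open("./example/reads.faa") as fh: print(too_long_reads(fh))` prints an empty list when every read is within the limit.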
The examples below assume that you are in the GRASP directory:
--RUNNING "Build":
"./bin/Build ./example/reads.faa --work_space=./WorkSpace"
The index files are written to the specified work space.
--RUNNING "Search":
"./bin/Search ./example/query.faa ./example/reads.faa --work_space=./WorkSpace --result_space=./Results --write_certificate=1"
The index files created in the work space are read, and the
results are written to the result space. Adding
"--write_certificate=1" makes GRASP also output the alignments
between the query sequences and the assembled sequences.
NOTE: if you need to use an alternative alphabet or seed length,
rerun the 'Build' program with the new parameters.
--RESULTS INTERPRETATION:
The results are written to the specified result space
("./Results" by default). A default search creates a read
recruitment file ending in '.reads'. The "--write_certificate=1"
option additionally creates a certificate file ending in '.aln'.
The recruitment file (ending in ".reads") lists the reads identified
by the search, which are deemed to come from sequences homologous
to the query. The first column is the query sequence ID (its order
in the query file, starting from 0). The second column is the
recruited read ID (its order in the read file, starting from 0).
The third column is the best alignment E-value achieved by the
assembly that includes the corresponding read.
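A minimal sketch for loading a recruitment file into Python follows. It assumes that each line holds the three columns described above, separated by whitespace. The README does not state the delimiter or whether a header line is present, so the parser skips any line whose fields do not parse as numbers. The names parse_reads_file and filter_by_evalue are hypothetical helpers and are not part of GRASP.

```python
from collections import defaultdict


def parse_reads_file(lines):
    """Group recruited reads by query from the lines of a GRASP '.reads' file.

    Assumes three whitespace-separated columns per line: query ID and read ID
    (both 0-based file order) and the best E-value.
    Returns {query_id: [(read_id, evalue), ...]} sorted by ascending E-value.
    """
    hits = defaultdict(list)
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or short lines
        try:
            query_id, read_id = int(fields[0]), int(fields[1])
            evalue = float(fields[2])
        except ValueError:
            continue  # skip a header or comment line, if the file has one
        hits[query_id].append((read_id, evalue))
    for recruited in hits.values():
        recruited.sort(key=lambda pair: pair[1])  # most significant first
    return dict(hits)


def filter_by_evalue(hits, max_evalue):
    """Keep only reads whose best E-value is at or below `max_evalue`."""
    return {
        query_id: [(r, e) for r, e in reads if e <= max_evalue]
        for query_id, reads in hits.items()
    }
```

Because read IDs are 0-based file positions, you can map them back to read names by enumerating the records of the read file in order.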
The certificate file (ending in ".aln") contains the detailed
alignments between the query and the assembled sequences. Each
alignment segment corresponds to a unique assembled sequence.
The end of each alignment segment lists the included reads, showing
where each read contributes to the assembly. This list contains one
tuple per read, giving the read ID and its starting position in the
assembled sequence.
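The (read ID, starting position) tuples are enough to rebuild a text view of how the reads tile an assembled sequence. The sketch below assumes the tuples have already been extracted from the '.aln' file, because the README does not specify their exact text format. It also assumes that the starting positions are 0-based offsets. The names layout_reads and covered_length are hypothetical.

```python
def layout_reads(placements, read_seqs):
    """Render a text pileup of reads against one assembled sequence.

    placements: iterable of (read_id, start) tuples from the "included reads"
        list of an alignment segment (assumed already parsed).
    read_seqs: list or dict mapping read ID to the read's amino-acid sequence.
    Assumes `start` is a 0-based offset into the assembled sequence.
    """
    rows = []
    for read_id, start in sorted(placements, key=lambda p: p[1]):
        rows.append(" " * start + read_seqs[read_id])  # indent read to its start
    return "\n".join(rows)


def covered_length(placements, read_seqs):
    """Count assembled-sequence positions covered by at least one read."""
    covered = set()
    for read_id, start in placements:
        covered.update(range(start, start + len(read_seqs[read_id])))
    return len(covered)
```

For example, reads "MKLV", "LVAG" and "GQ" placed at 0, 2 and 5 line up beneath the assembled sequence "MKLVAGQ" and cover all seven of its positions.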
--PYTHON WRAPPER SCRIPT
"./scripts/GRASP.py --query=./example/query.faa --target=./example/reads.faa"
A wrapper script written in Python can be found in "./scripts/GRASP.py".
The script runs 'Build' and 'Search' together based on the input parameters.
It requires Python 2.7 or higher.
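If you want to drive the two programs from your own Python code without the wrapper, the sketch below chains the same 'Build' and 'Search' command lines shown earlier. It is not the shipped GRASP.py. The functions grasp_commands and run_grasp are hypothetical, and the sketch uses only the options documented in this README, so alphabet and seed-length settings are not covered. It requires Python 3.5 or later for subprocess.run.

```python
import os
import subprocess


def grasp_commands(query, reads, work_space="./WorkSpace",
                   result_space="./Results", write_certificate=False,
                   bin_dir="./bin"):
    """Return the Build and Search argument lists for one GRASP run."""
    build = [os.path.join(bin_dir, "Build"), reads,
             "--work_space=" + work_space]
    search = [os.path.join(bin_dir, "Search"), query, reads,
              "--work_space=" + work_space,
              "--result_space=" + result_space]
    if write_certificate:
        search.append("--write_certificate=1")  # also write the '.aln' file
    return build, search


def run_grasp(query, reads, **options):
    """Run Build, then Search; raise CalledProcessError if either step fails."""
    for cmd in grasp_commands(query, reads, **options):
        subprocess.run(cmd, check=True)
```

Usage: `run_grasp("./example/query.faa", "./example/reads.faa", write_certificate=True)` reproduces the two example commands above. Because the index only needs to be built once per read set, a real wrapper might skip 'Build' when an up-to-date index already exists.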
If you find GRASP useful, please cite the following paper:
Cuncong Zhong, Youngik Yang, and Shibu Yooseph, "GRASP: Guided
Reference-based Assembly of Short Peptides". Manuscript in preparation.
Contact: Cuncong Zhong (czhong@jcvi.org) or Shibu Yooseph (syooseph@jcvi.org)
The authors acknowledge that GRASP includes parts of the
Boost library (http://www.boost.org) and the libdivsufsort library
(http://code.google.com/p/libdivsufsort/).