The non-coding RNA (ncRNA) elements in the 3' untranslated regions (3'-UTRs) are known to participate in
the post-transcriptional regulation of genes, influencing their stability, translation efficiency, and subcellular localization.
Inferring co-expression patterns of the genes by clustering their 3'-UTR ncRNA elements will provide invaluable
knowledge for further studies of their functionalities and interactions under specific physiological processes. In this
work, we propose an improved RNA structural clustering pipeline that takes into account the length-dependent
distribution of the structural similarity measure. Benchmarking the proposed pipeline on Rfam data
demonstrates a performance gain of over 10% compared with a traditional hierarchical clustering pipeline. By
applying the proposed clustering pipeline to the 3'-UTRs of Drosophila melanogaster, we have successfully identified
184 ncRNA clusters, of which 91.3% appear to be true RNA structural elements, based on RNAz's prediction.
Among the clusters, we have rediscovered the well-known histone ncRNA family as well as a number of other
families whose potential functionalities may be inferred from existing studies. One such family contains
genes that are preferentially expressed in male Drosophila. In situ hybridization further reveals their characteristic
'cup' or 'comet' localization patterns in Drosophila testis. The complete clustering results are available from
Finding cliques in an undirected, unit-weighted graph:
CLCL -i graph -n num_of_nodes [-c clique_size_cutoff] [-e connectivity_cutoff]
'-i' specifies the graph to be processed by the algorithm. The graph is represented
by an n-by-n binary matrix, where a '1' in the matrix indicates that the corresponding
vertices are connected, and a '0' indicates that they are not. DO NOT INCLUDE A HEADER IN THE MATRIX!
'-n' specifies the number of nodes in the graph, which is expected to be a positive integer.
'-c' specifies the minimum number of nodes in an output clique. If a clique has fewer
nodes than this cutoff, it will not be output. The default value is 3.
'-e' specifies the connectivity cutoff for merging (please see the article for details).
The default value is 0.4.
Example: CLCL -i Test_Graph -n 100
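
As a minimal sketch of preparing the input, the Python below builds the n-by-n binary
matrix from an edge list and writes it without a header, together with a correspondence
list of node names. The function names, the file names, and the single-space delimiter
between matrix entries are assumptions made for illustration; the delimiter is not stated
here, so adjust `sep` to whatever CLCL actually expects. Leaving the diagonal at 0 is
likewise an assumption.

```python
def build_adjacency(names, edges):
    """Return (index_of, matrix) for an undirected, unit-weighted graph.

    names: list of node labels; position in the list is the row/column index.
    edges: iterable of (label_a, label_b) pairs.
    """
    index_of = {name: i for i, name in enumerate(names)}
    n = len(names)
    matrix = [[0] * n for _ in range(n)]
    for a, b in edges:
        i, j = index_of[a], index_of[b]
        if i == j:
            continue  # assumption: self-loops are not recorded
        matrix[i][j] = 1
        matrix[j][i] = 1  # undirected graph, so the matrix is symmetric
    return index_of, matrix


def format_matrix(matrix, sep=" "):
    """Serialize the matrix one row per line, with no header line."""
    return "\n".join(sep.join(str(v) for v in row) for row in matrix) + "\n"


def write_inputs(names, edges, graph_path, names_path, sep=" "):
    """Write the matrix and the correspondence list; return the node count."""
    _, matrix = build_adjacency(names, edges)
    with open(graph_path, "w") as fh:
        fh.write(format_matrix(matrix, sep))
    with open(names_path, "w") as fh:
        for name in names:
            fh.write(name + "\n")  # line order matches row/column order
    return len(names)
```

The value returned by `write_inputs` is the node count that would be passed to '-n'.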
The output consists of rows of tab-delimited numbers. Each row represents a clique,
and the tab-delimited numbers represent the nodes in the clique. Each number is the row/column
index of the node in the input graph. MAKE SURE A CORRESPONDENCE LIST IS MAINTAINED
CORRECTLY TO MAP THE CLIQUES BACK!
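
Mapping the cliques back could look like the following Python sketch. It assumes the
output has been captured as text and that `names` is the correspondence list in
row/column order. The function name and the `index_base` parameter are hypothetical:
whether CLCL numbers nodes from 0 or from 1 is not stated here, so check it against a
small test graph before relying on the result.

```python
def parse_cliques(text, names, index_base=0):
    """Translate tab-delimited rows of node indices into rows of node names.

    text: captured CLCL output, one clique per line.
    names: correspondence list; names[k] labels row/column k of the input matrix.
    index_base: 0 or 1, depending on how the indices are numbered (assumption).
    """
    cliques = []
    for line in text.splitlines():
        fields = [f for f in line.strip().split("\t") if f]
        if not fields:
            continue  # skip blank lines
        clique = []
        for field in fields:
            k = int(field) - index_base
            if not 0 <= k < len(names):
                # Fail loudly rather than silently mislabel a node.
                raise ValueError("node index out of range: " + field)
            clique.append(names[k])
        cliques.append(clique)
    return cliques
```

Raising an error on an out-of-range index guards against choosing the wrong
`index_base`, which would otherwise shift every label by one.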