Abstract

Computational RNA secondary structure prediction approaches differ by the way RNA pseudoknot interactions are handled. For reasons of computational efficiency, most approaches only allow a limited class of pseudoknot interactions or are not considering them at all. Here we present a computational method for RNA secondary structure prediction that is not restricted in terms of pseudoknot complexity. The approach is based on simulating a folding process in a coarse-grained manner by choosing helices based on established energy rules. The steric feasibility of the chosen set of helices is checked during the folding process using a highly coarse-grained 3D model of the RNA structures. Using two data sets of 26 and 241 RNA sequences we find that this approach is competitive compared to the existing RNA secondary structure prediction programs pknotsRG, HotKnots and UnaFold. The key advantages of the new method are that there is no algorithmic restriction in terms of pseudoknot complexity and a test is made for steric feasibility. Availability: The program is available as web server at the site: http://cylofold.abcc.ncifcrf.gov.

INTRODUCTION

The variety of biochemical functions that are being carried out by RNA molecules is mesmerizing. Many RNAs such as ribosomal RNA, RNAase P or tRNA attain a defined secondary and tertiary structure that is vital to their function. Experimentally determined structures are only available for a small fraction of RNAs that are of interest. This makes the computational prediction of the base-pairing pattern (the secondary structure) of RNA an important problem. One major breakthrough was the development of dynamic programming algorithms that could predict the minimum free energy secondary structure of RNA sequence assuming that the structures are non-nested (1–5). Subsequently, dynamic programming algorithms have been extended to allow certain classes of pseudoknots (6,7).

Many RNA secondary structure prediction algorithms (including the one presented here) are based on the idea of iteratively adding substructures to an initially unfolded sequence (8,9). Genetic algorithms are an example of such algorithms and have proven very useful for exploring pseudoknotted structures and sub-optimal RNA structures (10–14).

Allowing pseudoknots is desirable simply because RNA structures determined by X-ray crystallography or NMR revealed that many RNAs contain non-nested base pairing interactions. Allowing all possible base pairing interactions leads to the potential problem for structure prediction approaches that not only are there many more conformations to consider, but also many conformations are not sterically feasible. Here, we describe a computational approach for RNA secondary structure prediction that has no restriction in terms of pseudoknot complexity, but additionally checks the steric feasibility of the considered conformations.

ALGORITHM

The described approach of RNA secondary structure prediction is based on the idea of maximizing matching helices in a secondary structure (10). A flow chart of the algorithm is shown in Figure 1. Briefly, the method works as follows: Initially, a list (called a stem-list) of all possible helices with more than 3 bp is generated. Helices can contain Watson–Crick and GU–wobble base pairs. The secondary structure prediction is performed by picking the best-scoring structure obtained after 50 folding simulation runs. The score is set to be the sum of the free energy contribution of the already placed helices. Each folding simulation run is performed by picking helices from the stem list with a Boltzmann-weighted probability. Estimating the free energy contribution of an RNA double-helix is accomplished using the RNA Vienna package (2). Each chosen helix is represented by a very coarse-grained 3D representation in a virtual 3D workspace. An RNA double helix is represented by a cylinder (using a radius of 6.5 � and a length of 2.7 � times the number of base pairs) that is capped with a half-sphere on both ends. This shape is called a capsule. A schematic diagram of the mapping of an RNA secondary structure into a highly coarse-grained 3D representation is shown in Figure 2. The main reason for choosing capped cylinders over regular cylinders is the computational efficiency of collision detection. Single stranded regions between helices are represented as constraints for the maximum distance between the ends of the capped cylinders. A newly chosen capped cylinder is placed into the 3D simulation space at a random position such that the distance-constraints are fulfilled. The distance constraints are a function of the single-stranded sequence lengths between connected helices. The maximum distance between helix ends is 2.0 � + n*8.0 � with n being the sequence separation. The minimum distance is 2.0.

A flow-chart depicting the algorithm for predicting RNA secondary structures.
Figure 1.

A flow-chart depicting the algorithm for predicting RNA secondary structures.

Scheme for mapping between an RNA secondary structure (a) and the used 3D coarse-grained representation (b). Each helix is represented as a capped cylinder (‘capsule’). Single-stranded regions between helices are represented as distance constraints. Only those RNA secondary structures can be a (partial) solution of a secondary structure prediction, for which the algorithm succeeds in placing the corresponding capped cylinders such that they do not collide and do not violate distance constraints.
Figure 2.

Scheme for mapping between an RNA secondary structure (a) and the used 3D coarse-grained representation (b). Each helix is represented as a capped cylinder (‘capsule’). Single-stranded regions between helices are represented as distance constraints. Only those RNA secondary structures can be a (partial) solution of a secondary structure prediction, for which the algorithm succeeds in placing the corresponding capped cylinders such that they do not collide and do not violate distance constraints.

If cylinders collide, the newly placed capped cylinder is placed at a different random position. If after 20 attempts the newly placed capped cylinder is still colliding with previously placed capped cylinders, the positions of all capped cylinders are optimized in order to minimize collisions and constraint violations. If no collision-free position can be found, the newly chosen helix and its capped cylinder representation is discarded. Otherwise, the found collision-free position is stored. Helices that are part of the stem-list and that share bases with the newly placed helix are removed from the stem-list. In the next iteration the next helix is chosen until no more helices can be placed. Once no more helices can be placed, one simulation run is completed. Fifty simulation runs are performed and the overall best-scoring structure is returned to the user.

IMPLEMENTATION

The folding algorithm is implemented as a C++ program. The web server has been implemented using the Grails framework (18), which is based on the Groovy programming language. For a secondary structure prediction request, the web server launches the cylofold binary on a Linux compute cluster. After the prediction result has been generated, the program VARNA (19) is launched to generate an image of the secondary structure prediction. The prediction results are temporarily stored in a relational database.

Usage

A user of the CyloFold prediction web server can start a secondary structure prediction request by entering (‘pasting’) a nucleotide sequence (as raw characters or in FASTA form, both ACGU and ACGT alphabets are accepted) into the web form and pressing ‘submit’. The maximum sequence length that is currently accepted by the web server is 300 nt. The initial return of the web server is a unique id, which is needed if one wants to access results at a later time. Due to the compute-intensive approach for the prediction, it can take several minutes for the server to finalize a secondary structure prediction. The user can access the results by one of three methods: a simple ‘reload’ of the initial result page will update the status of the prediction and will eventually contain the prediction results. Alternatively, the user can bookmark the initial result page in the web browser and return to it at a later time. Lastly, the unique id provided after submitting the secondary structure prediction compute request can be used to access the results using another web form available on the server home page.

A typical output from a completed RNA structure prediction is shown in Figure 3. The prediction result is presented to the user in three different formats: (i) An image of the predicted RNA secondary structure created by VARNA (19); (ii) An extended bracket notation in which nested base pairs are denoted as pairs of nested parentheses and helices corresponding to pseudoknot interactions are denoted as letters; (iii) The ‘CT’ file format that is also generated by other programs such as mfold (5). This format contains a list of the indices of the bases and their predicted base-pairing partners.

Screenshot of a typical prediction result returned by the CyloFold web server. The shown sequence corresponds to the bacteriophage T2 gene 32 mRNA pseudoknot (PDB 2TPK).
Figure 3.

Screenshot of a typical prediction result returned by the CyloFold web server. The shown sequence corresponds to the bacteriophage T2 gene 32 mRNA pseudoknot (PDB 2TPK).

RESULTS AND DISCUSSION

The performance of the new RNA secondary structure prediction method was evaluated using two different data sets. Data set 1 (corresponding to the results shown in Table 1) consists of 26 RNA sequences, whose tertiary structure is available in the Protein Data Bank (PDB). The reference secondary structure was obtained by extracting the base pair information from the PDB coordinate file using the program RNAview (20). Data set 2 consists of 241 RNA sequences and secondary structures originating from PseudoBase (21,22).

Table 1.

Prediction results corresponding to 26 RNA structures that are available in the Protein Dank Bank

PDBDescriptionLPKFCF
PK
HK
UF
MCCSNSPPVMCCSNSPPVMCCSNSPPVMCCSNSPPV
1A60TYMV tRNA-like structure4413.60.740.770.710.961.000.930.830.770.910.830.770.91
1CX0HDV ribozyme7222.2–0.010.000.00–0.010.000.00–0.010.000.00–0.010.000.00
1E95SRV-1 pseudoknot3633.31.001.001.001.001.001.001.001.001.000.700.501.00
1HVUHIV RT bind. pseudoknot3026.60.951.000.910.951.000.910.560.400.800.560.400.80
1KAJMMTV RNA pseudoknot3225.00.851.000.730.851.000.730.851.000.730.530.500.57
1KH6HCV IRES domain420.00.740.770.710.550.540.580.530.540.540.920.930.93
1KPYPEMV-1 P1P2 pseudoknot2722.20.891.000.800.941.000.890.790.621.000.790.631.00
1KXKGroupII self-splic. intron700.00.910.870.950.810.830.790.810.830.790.960.960.96
1L2XViral RNA pseudoknot2722.20.941.000.890.941.000.890.790.631.000.790.631.00
1Q9A23S rRNA sarcin/ricin270.00.910.831.000.770.830.710.861.000.750.770.830.71
1U8DGuanine riboswitch6711.90.870.870.870.880.781.000.880.781.000.880.781.00
2A43Luteoviral pseudoknot2623.00.931.000.880.931.000.880.750.571.000.750.571.00
2G1WtmRNA pseudoknot2218.10.811.000.670.861.000.750.810.671.000.810.671.00
2GISSAM- riboswitch948.50.800.760.850.800.760.850.860.860.860.550.550.55
2HOOthi-box riboswitch830.00.700.670.740.580.620.540.580.620.540.580.620.54
2K95P2B-P3 telo-merase RNA4837.50.890.801.000.890.801.000.750.80.710.540.400.75
2OIUL1 Ribozyme Ligase adduct710.00.860.780.950.980.961.000.980.961.000.981.001.00
2QUSHammerhead Ribozyme682.90.950.911.000.950.911.000.950.911.000.951.001.00
2QWYSAM-II riboswitch5226.90.480.460.500.480.460.50.340.310.40.350.310.40
2RP0PEMV1 mRNA pseudoknot2615.30.881.000.780.881.000.780.840.711.000.840.711.00
2TPKT2 gene 32 mRNA p.k.3627.71.001.001.001.001.001.000.710.580.880.630.580.70
361DDomain E of 5S rRNA190.00.861.000.750.861.000.750.830.830.830.830.830.83
3DIGLysine Riboswitch17330.10.890.850.930.740.720.760.740.720.760.740.720.76
3FU2class-I preQ1 riboswitch3218.80.790.631.000.790.631.000.790.631.000.790.631.00
3PHPTYMV p.k. hairpin230.01.001.001.001.001.001.001.001.001.001.001.001.00
437Drib. frame-shifting p.k.2722.20.941.000.890.941.000.890.790.631.000.790.631.00
MeanAll0.830.850.830.820.840.810.750.710.830.730.650.83
MeanNo pseudoknots<5.00.870.850.890.810.840.800.820.840.810.870.900.87
MeanPseudoknots>5.00.810.840.800.820.840.820.730.650.840.660.550.80
PDBDescriptionLPKFCF
PK
HK
UF
MCCSNSPPVMCCSNSPPVMCCSNSPPVMCCSNSPPV
1A60TYMV tRNA-like structure4413.60.740.770.710.961.000.930.830.770.910.830.770.91
1CX0HDV ribozyme7222.2–0.010.000.00–0.010.000.00–0.010.000.00–0.010.000.00
1E95SRV-1 pseudoknot3633.31.001.001.001.001.001.001.001.001.000.700.501.00
1HVUHIV RT bind. pseudoknot3026.60.951.000.910.951.000.910.560.400.800.560.400.80
1KAJMMTV RNA pseudoknot3225.00.851.000.730.851.000.730.851.000.730.530.500.57
1KH6HCV IRES domain420.00.740.770.710.550.540.580.530.540.540.920.930.93
1KPYPEMV-1 P1P2 pseudoknot2722.20.891.000.800.941.000.890.790.621.000.790.631.00
1KXKGroupII self-splic. intron700.00.910.870.950.810.830.790.810.830.790.960.960.96
1L2XViral RNA pseudoknot2722.20.941.000.890.941.000.890.790.631.000.790.631.00
1Q9A23S rRNA sarcin/ricin270.00.910.831.000.770.830.710.861.000.750.770.830.71
1U8DGuanine riboswitch6711.90.870.870.870.880.781.000.880.781.000.880.781.00
2A43Luteoviral pseudoknot2623.00.931.000.880.931.000.880.750.571.000.750.571.00
2G1WtmRNA pseudoknot2218.10.811.000.670.861.000.750.810.671.000.810.671.00
2GISSAM- riboswitch948.50.800.760.850.800.760.850.860.860.860.550.550.55
2HOOthi-box riboswitch830.00.700.670.740.580.620.540.580.620.540.580.620.54
2K95P2B-P3 telo-merase RNA4837.50.890.801.000.890.801.000.750.80.710.540.400.75
2OIUL1 Ribozyme Ligase adduct710.00.860.780.950.980.961.000.980.961.000.981.001.00
2QUSHammerhead Ribozyme682.90.950.911.000.950.911.000.950.911.000.951.001.00
2QWYSAM-II riboswitch5226.90.480.460.500.480.460.50.340.310.40.350.310.40
2RP0PEMV1 mRNA pseudoknot2615.30.881.000.780.881.000.780.840.711.000.840.711.00
2TPKT2 gene 32 mRNA p.k.3627.71.001.001.001.001.001.000.710.580.880.630.580.70
361DDomain E of 5S rRNA190.00.861.000.750.861.000.750.830.830.830.830.830.83
3DIGLysine Riboswitch17330.10.890.850.930.740.720.760.740.720.760.740.720.76
3FU2class-I preQ1 riboswitch3218.80.790.631.000.790.631.000.790.631.000.790.631.00
3PHPTYMV p.k. hairpin230.01.001.001.001.001.001.001.001.001.001.001.001.00
437Drib. frame-shifting p.k.2722.20.941.000.890.941.000.890.790.631.000.790.631.00
MeanAll0.830.850.830.820.840.810.750.710.830.730.650.83
MeanNo pseudoknots<5.00.870.850.890.810.840.800.820.840.810.870.900.87
MeanPseudoknots>5.00.810.840.800.820.840.820.730.650.840.660.550.80

L, Sequence length; PKF, fraction of pseudoknot interactions; For each of the four different prediction methods (CF, Cylofold; PK, pknotsRG; HK, HotKnots 2.0; UF, UNAFold) we report three different measures of prediction quality (SNS, sensitivity; PPV, positive predictive value).

Table 1.

Prediction results corresponding to 26 RNA structures that are available in the Protein Dank Bank

PDBDescriptionLPKFCF
PK
HK
UF
MCCSNSPPVMCCSNSPPVMCCSNSPPVMCCSNSPPV
1A60TYMV tRNA-like structure4413.60.740.770.710.961.000.930.830.770.910.830.770.91
1CX0HDV ribozyme7222.2–0.010.000.00–0.010.000.00–0.010.000.00–0.010.000.00
1E95SRV-1 pseudoknot3633.31.001.001.001.001.001.001.001.001.000.700.501.00
1HVUHIV RT bind. pseudoknot3026.60.951.000.910.951.000.910.560.400.800.560.400.80
1KAJMMTV RNA pseudoknot3225.00.851.000.730.851.000.730.851.000.730.530.500.57
1KH6HCV IRES domain420.00.740.770.710.550.540.580.530.540.540.920.930.93
1KPYPEMV-1 P1P2 pseudoknot2722.20.891.000.800.941.000.890.790.621.000.790.631.00
1KXKGroupII self-splic. intron700.00.910.870.950.810.830.790.810.830.790.960.960.96
1L2XViral RNA pseudoknot2722.20.941.000.890.941.000.890.790.631.000.790.631.00
1Q9A23S rRNA sarcin/ricin270.00.910.831.000.770.830.710.861.000.750.770.830.71
1U8DGuanine riboswitch6711.90.870.870.870.880.781.000.880.781.000.880.781.00
2A43Luteoviral pseudoknot2623.00.931.000.880.931.000.880.750.571.000.750.571.00
2G1WtmRNA pseudoknot2218.10.811.000.670.861.000.750.810.671.000.810.671.00
2GISSAM- riboswitch948.50.800.760.850.800.760.850.860.860.860.550.550.55
2HOOthi-box riboswitch830.00.700.670.740.580.620.540.580.620.540.580.620.54
2K95P2B-P3 telo-merase RNA4837.50.890.801.000.890.801.000.750.80.710.540.400.75
2OIUL1 Ribozyme Ligase adduct710.00.860.780.950.980.961.000.980.961.000.981.001.00
2QUSHammerhead Ribozyme682.90.950.911.000.950.911.000.950.911.000.951.001.00
2QWYSAM-II riboswitch5226.90.480.460.500.480.460.50.340.310.40.350.310.40
2RP0PEMV1 mRNA pseudoknot2615.30.881.000.780.881.000.780.840.711.000.840.711.00
2TPKT2 gene 32 mRNA p.k.3627.71.001.001.001.001.001.000.710.580.880.630.580.70
361DDomain E of 5S rRNA190.00.861.000.750.861.000.750.830.830.830.830.830.83
3DIGLysine Riboswitch17330.10.890.850.930.740.720.760.740.720.760.740.720.76
3FU2class-I preQ1 riboswitch3218.80.790.631.000.790.631.000.790.631.000.790.631.00
3PHPTYMV p.k. hairpin230.01.001.001.001.001.001.001.001.001.001.001.001.00
437Drib. frame-shifting p.k.2722.20.941.000.890.941.000.890.790.631.000.790.631.00
MeanAll0.830.850.830.820.840.810.750.710.830.730.650.83
MeanNo pseudoknots<5.00.870.850.890.810.840.800.820.840.810.870.900.87
MeanPseudoknots>5.00.810.840.800.820.840.820.730.650.840.660.550.80
PDBDescriptionLPKFCF
PK
HK
UF
MCCSNSPPVMCCSNSPPVMCCSNSPPVMCCSNSPPV
1A60TYMV tRNA-like structure4413.60.740.770.710.961.000.930.830.770.910.830.770.91
1CX0HDV ribozyme7222.2–0.010.000.00–0.010.000.00–0.010.000.00–0.010.000.00
1E95SRV-1 pseudoknot3633.31.001.001.001.001.001.001.001.001.000.700.501.00
1HVUHIV RT bind. pseudoknot3026.60.951.000.910.951.000.910.560.400.800.560.400.80
1KAJMMTV RNA pseudoknot3225.00.851.000.730.851.000.730.851.000.730.530.500.57
1KH6HCV IRES domain420.00.740.770.710.550.540.580.530.540.540.920.930.93
1KPYPEMV-1 P1P2 pseudoknot2722.20.891.000.800.941.000.890.790.621.000.790.631.00
1KXKGroupII self-splic. intron700.00.910.870.950.810.830.790.810.830.790.960.960.96
1L2XViral RNA pseudoknot2722.20.941.000.890.941.000.890.790.631.000.790.631.00
1Q9A23S rRNA sarcin/ricin270.00.910.831.000.770.830.710.861.000.750.770.830.71
1U8DGuanine riboswitch6711.90.870.870.870.880.781.000.880.781.000.880.781.00
2A43Luteoviral pseudoknot2623.00.931.000.880.931.000.880.750.571.000.750.571.00
2G1WtmRNA pseudoknot2218.10.811.000.670.861.000.750.810.671.000.810.671.00
2GISSAM- riboswitch948.50.800.760.850.800.760.850.860.860.860.550.550.55
2HOOthi-box riboswitch830.00.700.670.740.580.620.540.580.620.540.580.620.54
2K95P2B-P3 telo-merase RNA4837.50.890.801.000.890.801.000.750.80.710.540.400.75
2OIUL1 Ribozyme Ligase adduct710.00.860.780.950.980.961.000.980.961.000.981.001.00
2QUSHammerhead Ribozyme682.90.950.911.000.950.911.000.950.911.000.951.001.00
2QWYSAM-II riboswitch5226.90.480.460.500.480.460.50.340.310.40.350.310.40
2RP0PEMV1 mRNA pseudoknot2615.30.881.000.780.881.000.780.840.711.000.840.711.00
2TPKT2 gene 32 mRNA p.k.3627.71.001.001.001.001.001.000.710.580.880.630.580.70
361DDomain E of 5S rRNA190.00.861.000.750.861.000.750.830.830.830.830.830.83
3DIGLysine Riboswitch17330.10.890.850.930.740.720.760.740.720.760.740.720.76
3FU2class-I preQ1 riboswitch3218.80.790.631.000.790.631.000.790.631.000.790.631.00
3PHPTYMV p.k. hairpin230.01.001.001.001.001.001.001.001.001.001.001.001.00
437Drib. frame-shifting p.k.2722.20.941.000.890.941.000.890.790.631.000.790.631.00
MeanAll0.830.850.830.820.840.810.750.710.830.730.650.83
MeanNo pseudoknots<5.00.870.850.890.810.840.800.820.840.810.870.900.87
MeanPseudoknots>5.00.810.840.800.820.840.820.730.650.840.660.550.80

L, Sequence length; PKF, fraction of pseudoknot interactions; For each of the four different prediction methods (CF, Cylofold; PK, pknotsRG; HK, HotKnots 2.0; UF, UNAFold) we report three different measures of prediction quality (SNS, sensitivity; PPV, positive predictive value).

In order to quantify the time-complexity of the folding method, we fitted a function of the form a*Nb (with N being the number of residues in the input sequence) to the execution time needed for the cases of the 241 sequence set. We found that the execution time (measured in seconds) of the structure prediction is well described by the function 2.74*10−8*N4.47. The timing evaluation was performed on a computer with 4 GB of RAM and an Intel 64-bit Xeon processor (3.0MHz).

We report in Tables 1 and 2 prediction results for these two data sets together with the corresponding results obtained by running the RNA secondary structure prediction programs HotKnots 2.0 (8), pknotsRG (7) and UNAFold (23).

Table 2.

Prediction results for a set of 241 RNA sequences that are part of PseudoBase for the programs CyloFold, pknotsRG (7), HotKnots 2.0 (8) and UNAFold (23)

MCCSNSPPV
CyloFold0.7520.7630.747
pknotsRG0.7480.7530.756
HotKnots 2.00.6110.5650.684
UNAFold0.5970.5320.692
MCCSNSPPV
CyloFold0.7520.7630.747
pknotsRG0.7480.7530.756
HotKnots 2.00.6110.5650.684
UNAFold0.5970.5320.692

SNS, sensitivity of predicted base pairs; PPV, positive predictive value.

Table 2.

Prediction results for a set of 241 RNA sequences that are part of PseudoBase for the programs CyloFold, pknotsRG (7), HotKnots 2.0 (8) and UNAFold (23)

MCCSNSPPV
CyloFold0.7520.7630.747
pknotsRG0.7480.7530.756
HotKnots 2.00.6110.5650.684
UNAFold0.5970.5320.692
MCCSNSPPV
CyloFold0.7520.7630.747
pknotsRG0.7480.7530.756
HotKnots 2.00.6110.5650.684
UNAFold0.5970.5320.692

SNS, sensitivity of predicted base pairs; PPV, positive predictive value.

The average Matthews correlation coefficient (MCC) obtained by comparing the base pairing pattern of the predicted secondary structures with their respective reference secondary structure is for data set 1 and CyloFold 0.83; this can be compared to pknotsRG (0.82), HotKnots 2.0 (0.75) and UNAFold (0.73) (see row of Table 1 named ‘All’).

We divided this data set into two subsets according to the fraction of pseudoknot base pairs in the respective structures. The results can be seen in the last two rows of Table 1. The eight PDB structures with <5% pseudoknotted base pairs correspond to an average MCC of 0.87 for CyloFold compared to 0.81 for pknotsRG, 0.82 for HotKnots 2.0 and 0.87 for UNAFold. The 18 structures listed in Table 1 that have a pseudoknot amount >5% correspond to an average MCC of 0.81 for CyloFold, 0.82 for pknotsRG, 0.73 for HotKnots 2.0 and 0.66 for UnaFold.

Using the larger data set 2, one obtains an average MCC of 0.752 for CyloFold and 0.748 for pknotsRG (Table 2). In Table 2 one can see the RNA secondary structure predictions obtained by CyloFold correspond to the highest MCC (compared to the programs pknotsRG, HotKnots 2.0 and UNAFold). It also has the highest average base pair prediction sensitivity (0.763). For another measure, the positive predictive value (how often are predicted base pairs part of the reference secondary structure), all programs obtain averages between 0.68 and 0.76 for data set 2 with pknotsRG leading with a value of 0.756. It should be noted that the MCC is often used as an overall measure of prediction quality, while sensitivity, specificity and positive predictive value capture certain other aspects of the prediction quality.

These results indicate that the prediction accuracy of CyloFold compared to pknotsRG is similar. The key advantage of CyloFold is that there is no restriction in terms of the classes of pseudoknots that are being considered. Also, it should be noted that the employed model of simulated RNA folding by placing helices with a probability according to their free energy contribution is in essence very simple (24). In that sense it is surprising how well the method performs, and it should be an encouragement to continue to develop RNA folding algorithms that are substantially different from established approaches.

CONCLUSION

CyloFold is a new method for RNA secondary structure prediction. We show using two different data sets that the prediction accuracy (MCC) is comparable to the RNA secondary structure prediction program pknotsRG. The search algorithm has no restriction in terms of pseudoknot complexity. Another novel aspect is that at each step during the simulated folding process, the steric feasibility of the predicted structures is checked for steric feasibility using a highly coarse-grained 3D representation. The method is made available in the form of a user-friendly web server.

FUNDING

This project has been funded in whole or in part with federal funds from the National Cancer Institute, National Institutes of Health, under contract HHSN26120080001E. This Research was supported by the Intramural Research Program of the NIH, National Cancer Institute, Center for Cancer Research. Funding for open access charge: National Cancer Institute.

Conflict of interest statement. None declared.

ACKNOWLEDGEMENTS

We wish to thank the Advanced Biomedical Computing Center (ABCC) at the NCI for their computing support. The content of this publication does not necessarily reflect the views or policies of the Department of Health and Human Services, nor does mention of trade names, commercial products, or organizations imply endorsement by the U.S. Government. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

REFERENCES

1
Nussinov
R
Pieczenik
G
Griggs
JR
Kleitman
DJ
,
Algorithms for loop matchings
SIAM J. Appl. Math.
,
1978
, vol.
35
(pg.
68
-
82
)
2
Hofacker
IL
Fontana
W
Stadler
PF
Bonhoeffer
S
Tacker
M
Schuster
P
,
Fast folding and comparison of RNA secondary structures
Monatshefte f. Chemie
,
1994
, vol.
125
(pg.
167
-
188
)
3
Mathews
DH
Sabina
J
Zuker
M
Turner
DH
,
Expanded sequence dependence of thermodynamic parameters improves prediction of RNA secondary structure
J. Mol. Biol.
,
1999
, vol.
288
(pg.
911
-
940
)
4
Mathews
DH
Disney
MD
Childs
JL
Schroeder
SJ
Zuker
M
Turner
DH
,
Incorporating chemical modification constraints into a dynamic programming algorithm for prediction of RNA secondary structure
Proc. Natl Acad. Sci. USA
,
2004
, vol.
101
(pg.
7287
-
7292
)
5
Zuker
M
Stiegler
P
,
Optimal computer folding of large RNA sequences using thermodynamics and auxiliary information
Nucleic Acids Res.
,
1981
, vol.
9
(pg.
133
-
148
)
6
Rivas
E
Eddy
SR
,
A dynamic programming algorithm for RNA structure prediction including pseudoknots
J. Mol. Biol.
,
1999
, vol.
285
(pg.
2053
-
2068
)
7
Reeder
J
Steffen
P
Giegerich
R
,
pknotsRG: RNA pseudoknot folding including near-optimal structures and sliding windows
Nucleic Acids Res.
,
2007
, vol.
35
(pg.
W320
-
W324
)
8
Andronescu
MS
Pop
C
Condon
AE
,
Improved free energy parameters for RNA pseudoknotted secondary structure prediction
RNA
,
2010
, vol.
16
(pg.
26
-
42
)
9
Ruan
J
Stormo
GD
Zhang
W
,
An iterated loop matching approach to the prediction of RNA secondary structures with pseudoknots
Bioinformatics
,
2004
, vol.
20
(pg.
58
-
66
)
10
Shapiro
BA
Wu
JC
,
Predicting RNA H-type pseudoknots with the massively parallel genetic algorithm
Comput. Appl. Biosci.
,
1997
, vol.
13
(pg.
459
-
471
)
11
Shapiro
BA
Bengali
D
Kasprzak
W
Wu
JC
,
RNA folding pathway functional intermediates: their prediction and analysis
J Mol Biol
,
2001
, vol.
312
(pg.
27
-
44
)
12
Benedetti
G
Morosetti
S
,
A genetic algorithm to search for optimal and suboptimal RNA secondary structures
Biophys. Chem.
,
1995
, vol.
55
(pg.
253
-
259
)
13
Gultyaev
AP
van Batenburg
FH
Pleij
CW
,
The computer simulation of RNA folding pathways using a genetic algorithm
J. Mol. Biol.
,
1995
, vol.
250
(pg.
37
-
51
)
14
Shapiro
BA
Navetta
J
,
A massively parallel genetic algorithm for RNA secondary structure prediction
J. Supercomput.
,
1994
, vol.
8
(pg.
195
-
207
)
15
Shapiro
BA
Kasprzak
W
Grunewald
C
Aman
J
,
Graphical exploratory data analysis of RNA secondary structure dynamics predicted by the massively parallel genetic algorithm
J. Mol. Graph Model
,
2006
, vol.
25
(pg.
514
-
531
)
16
Shapiro
BA
Wu
JC
Bengali
D
Potts
MJ
,
The massively parallel genetic algorithm for RNA folding: MIMD implementation and population variation
Bioinformatics
,
2001
, vol.
17
(pg.
137
-
148
)
17
Ren
J
Rastegari
B
Condon
A
Hoos
HH
,
HotKnots: heuristic prediction of RNA secondary structures including pseudoknots
RNA
,
2005
, vol.
11
(pg.
1494
-
1504
)
18
Rocher
G
Brown
J
The Definitive Guide to Grails
,
2009
 
Apress, New York
19
Darty
K
Denise
A
Ponty
Y
,
VARNA: interactive drawing and editing of the RNA secondary structure
Bioinformatics
,
2009
, vol.
25
(pg.
1974
-
1975
)
20
Yang
H
Jossinet
F
Leontis
N
Chen
L
Westbrook
J
Berman
H
Westhof
E
,
Tools for the automatic identification and classification of RNA base pairs
Nucleic Acids Res.
,
2003
, vol.
31
(pg.
3450
-
3460
)
21
van Batenburg
FH
Gultyaev
AP
Pleij
CW
,
PseudoBase: structural information on RNA pseudoknots
Nucleic Acids Res
,
2001
, vol.
29
(pg.
194
-
195
)
22
van Batenburg
FH
Gultyaev
AP
Pleij
CW
Ng
J
Oliehoek
J
,
PseudoBase: a database with RNA pseudoknots
Nucleic Acids Res.
,
2000
, vol.
28
(pg.
201
-
204
)
23
Markham
NR
Zuker
M
,
UNAFold: software for nucleic acid folding and hybridization
Methods Mol. Biol.
,
2008
, vol.
453
(pg.
3
-
31
)
24
Martinez
HM
,
An RNA folding rule
Nucleic Acids Res.
,
1984
, vol.
12
(pg.
323
-
334
)
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.5), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

Comments

0 Comments
Submit a comment
You have entered an invalid code
Thank you for submitting a comment on this article. Your comment will be reviewed and published at the journal's discretion. Please check for further notifications by email.