RNA tertiary structure prediction with ModeRNA

Table 1:

Modified nucleotides present in the target and the template structure (symbols in parentheses indicate common abbreviations), and the operations used by ModeRNA for modeling of modifications in the target

Residue number	Template (1EHZ) residue	Target residue	ModeRNA operation
10	N2-methylguanosine (m2G, L, 2G)	Guanosine (G)	Remove modification
16	Dihydrouridine (D, 6U)	Dihydrouridine (D, 6U)	Copy modified nucleotide
17	Dihydrouridine (D, 6U)	Dihydrouridine (D, 6U)	Copy modified nucleotide
20	Guanosine (G)	Dihydrouridine (D, 6U)	Replace G with U, add modification
26	N2,N2-dimethylguanosine (m22G, R, 3G)	Adenosine (A)	Remove modification, replace G with A
32	2′-O-methylcytidine (Cm, B, 0C)	Uridine (U)	Remove modification, replace C with U
34	2′-O-methylguanosine (Gm, #, 0G)	Cytidine (C)	Remove modification, replace G with C
37	Wybutosine (yW, Y, 16G)	N6-methyl-N6-threonylcarbamoyladenosine (m6t6A, E, 15A)	Remove modification, replace G with A, add modification
39	Pseudouridine (Y, P, 1U)	Uridine (U)	Remove modification
40	5-methylcytidine (m5C, ?, 5C)	Guanosine (G)	Remove modification, replace C with G
46	7-methylguanosine (m7G, 7, 7G)	7-methylguanosine (m7G, 7, 7G)	Copy modified nucleotide
49	5-methylcytidine (m5C, ?, 5C)	Guanosine (G)	Remove modification, replace C with G
54	5-methyluridine (m5U, T, 5U)	5-methyluridine (m5U, T, 5U)	Copy modified nucleotide
55	Pseudouridine (Y, P, 1U)	Pseudouridine (Y, P, 1U)	Copy modified nucleotide
58	1-methyladenosine (m1A, 1A)	Adenosine (A)	Remove modification

Note: Abbreviations of the modified nucleotides are listed in parentheses (abbreviation, one-letter abbreviation, numerical code).

In cases where modifications in the template matched those in the target, they were simply copied in the same manner as unmodified residues (e.g. the dihydrouridines in positions 16 and 17). Some modifications needed to be changed into unmodified nucleosides (e.g. position 10). In such cases, the unmodified base was introduced by superimposing its three atoms nearest to the glycosidic bond onto the corresponding atoms of the modified base to be replaced.
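The base exchange described above rests on superimposing three atoms of the incoming base onto three atoms of the old one. The sketch below is not ModeRNA's code; it shows one simple way to do this, assuming atoms are plain (x, y, z) tuples: an orthonormal frame is built from each atom triplet, and every atom of the incoming base is re-expressed in the frame of the fixed triplet. With exactly three atoms this places the first atom exactly, the second along the same bond direction and the third in the same plane; for more atoms, a least-squares fit such as the Kabsch algorithm would be the usual choice.

```python
import math
from typing import List, Sequence, Tuple

Vec = Tuple[float, float, float]


def sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def add(a: Vec, b: Vec) -> Vec:
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


def scale(a: Vec, s: float) -> Vec:
    return (a[0] * s, a[1] * s, a[2] * s)


def dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def unit(a: Vec) -> Vec:
    return scale(a, 1.0 / math.sqrt(dot(a, a)))


def frame(p1: Vec, p2: Vec, p3: Vec) -> Tuple[Vec, Vec, Vec, Vec]:
    """Origin plus right-handed orthonormal axes defined by three non-collinear atoms."""
    x = unit(sub(p2, p1))
    z = unit(cross(x, sub(p3, p1)))
    y = cross(z, x)
    return p1, x, y, z


def superpose_by_triplet(mobile_triplet: Sequence[Vec], fixed_triplet: Sequence[Vec],
                         mobile_coords: Sequence[Vec]) -> List[Vec]:
    """Rigidly move mobile_coords so that the mobile triplet's frame matches the fixed one."""
    o_m, xm, ym, zm = frame(*mobile_triplet)
    o_f, xf, yf, zf = frame(*fixed_triplet)
    moved = []
    for p in mobile_coords:
        d = sub(p, o_m)
        # Coordinates of the atom in the local frame of the mobile triplet ...
        u, v, w = dot(d, xm), dot(d, ym), dot(d, zm)
        # ... re-expressed in the local frame of the fixed triplet.
        moved.append(add(o_f, add(add(scale(xf, u), scale(yf, v)), scale(zf, w))))
    return moved
```

In practice, the first three entries of `mobile_coords` would be the triplet atoms of the new base, and the remaining entries its other atoms.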

In the opposite situation, i.e. when an unmodified residue needed to be changed into a modified one (e.g. in position 20), one or more structural fragments containing the additional chemical groups were added to the base or ribose. When one modified residue needed to be replaced by another (e.g. in position 37), the first operation introduced a new unmodified residue, followed by the addition of the new modification. ModeRNA contains a set of 70 structural fragments that enable building 115 known modifications. The addition of fragments is guided by a set of rules describing the atom triplets used for superposition and the atoms to be added and removed.
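The operation column of Table 1 follows from the cases described above: copy identical residues, remove a template modification, exchange the parent base if it differs, and finally add the target modification. The sketch below reproduces that decision logic; it is an illustration, not ModeRNA's implementation. The `PARENT` table is keyed by the first abbreviation listed in Table 1, and the label "Copy nucleotide" for unmodified residues is hypothetical.

```python
# Parent (unmodified) base of each residue type appearing in Table 1,
# keyed by the first abbreviation given there.
PARENT = {
    "A": "A", "C": "C", "G": "G", "U": "U",
    "m2G": "G", "m22G": "G", "Gm": "G", "yW": "G", "m7G": "G",
    "D": "U", "Y": "U", "m5U": "U",
    "Cm": "C", "m5C": "C",
    "m6t6A": "A", "m1A": "A",
}
STANDARD = {"A", "C", "G", "U"}


def plan_operations(template: str, target: str) -> list:
    """Return the ordered edits that turn a template residue into the target residue."""
    if template == target:
        # Identical residues are copied as they are ("Copy nucleotide" is a hypothetical label).
        return ["Copy modified nucleotide" if template not in STANDARD else "Copy nucleotide"]
    ops = []
    if template not in STANDARD:
        ops.append("Remove modification")  # demodify the template residue first
    if PARENT[template] != PARENT[target]:
        ops.append(f"Replace {PARENT[template]} with {PARENT[target]}")
    if target not in STANDARD:
        ops.append("Add modification")  # then attach the new chemical groups
    return ops
```

For position 37, `plan_operations("yW", "m6t6A")` yields the three steps listed in Table 1: remove the modification, replace G with A, and add the new modification.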

In order to add dihydrouridine, the entire base needed to be exchanged because of the nonplanarity of its partially saturated ring. The same applied to pseudouridine, because the base ring needed to be rotated and connected to the ribose via the C5 atom instead of N1. ModeRNA automatically identified and executed the proper rules for adding these modifications.

In addition to modeling modified residues, ModeRNA offers several other operations for working with modified nucleosides. First, modified nucleosides in an RNA structure can be detected (function find_modifications). The recognition is based on atomic coordinates and is therefore independent of the residue names used in the PDB file. Modifications can also be added to or removed from a specified residue directly (functions add_modification and remove_modification), and all modifications can be removed in one step (function remove_all_modifications). ModeRNA uses the modification nomenclature implemented in the MODOMICS database [26]. For each modification, it stores a full name, a one-letter and a short multi-letter abbreviation, a common PDB residue name and a numerical code. In the alignment, the one-letter abbreviations are used where available. Because there are more modifications than usable ASCII characters, the numerical code can be used instead, e.g. 001U for pseudouridine.

ADVANCED MODELING OF THE ANTICODON LOOP

Constructing a complete model from a template and an alignment is the simplest way to obtain a model. However, in some cases this is not sufficient and additional editing is required. ModeRNA provides many commands that enable changes at different levels: the entire molecule, a particular region or a single residue (see http://www.genesilico.pl/moderna/commands/).

In the case of E. coli tRNA^Thr, we took the model built on the 1EHZ template and grafted in the anticodon loop modeled on another template (1C0A). We built three models in addition to the one obtained previously. First, we inserted a fragment containing just the anticodon loop (9 residues), then one with almost the entire anticodon arm plus one terminal base pair (15 residues). Finally, we tried a fragment with the anticodon arm and a few additional nucleotides from an adjacent helical stem (19 residues).

In order to insert the three anticodon arm fragments, we cut out parts of 1C0A and saved them as separate PDB files. To extract residues from 1C0A, we used ModeRNA instead of manually copying residues between PDB files, because ModeRNA uses a unified nomenclature and numbering of atoms and residues and thereby avoids accidental formatting mistakes. The structure 1C0A was loaded, unnecessary residues were deleted and the remaining structure was saved (commands load_model, delete_residue and write_model). In this way, three fragments were generated: U-625 to G-645 (UACCUGCCUQUCACGCAGGGG), C-627 to G-643 (CCUGCCUQUCACGCAGG) and G-630 to C-640 (GCCUQUCACGC). The two terminal residues of each fragment (one at each end) were used as anchor residues for superposition onto the corresponding residues of the model.
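A quick consistency check ties the three fragments listed above to the fragment sizes given earlier: each residue range matches its sequence length, and subtracting the two anchor residues leaves 19, 15 and 9 inserted residues. The helper `inserted_length` is hypothetical and only does this arithmetic.

```python
# Fragments cut out of 1C0A, as listed in the text: (first residue, last residue, sequence).
FRAGMENTS = {
    "19-nt insertion": (625, 645, "UACCUGCCUQUCACGCAGGGG"),
    "15-nt insertion": (627, 643, "CCUGCCUQUCACGCAGG"),
    "9-nt insertion": (630, 640, "GCCUQUCACGC"),
}


def inserted_length(first: int, last: int, seq: str) -> int:
    """Number of residues grafted into the model, excluding the two anchors."""
    span = last - first + 1
    if span != len(seq):
        raise ValueError(f"range {first}-{last} does not match a sequence of length {len(seq)}")
    # One anchor at each end is superimposed onto the model rather than inserted.
    return span - 2


for name, (first, last, seq) in FRAGMENTS.items():
    print(name, len(seq), inserted_length(first, last, seq))
```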

The prepared PDB files were used to create fragment objects with ModeRNA (function create_fragment). For each fragment, we specified two anchor residues in the model to guide the insertion and a new sequence for the anticodon fragment. During insertion, the fragment was superimposed onto the model using the anchor residues of both (the fragment was mobile and the model stayed fixed). Specifically, atoms O3′, C3′, C4′, C1′ and N1 or N9 (N1 in pyrimidines, N9 in purines) of the 5′ anchor, and atoms C5′, C4′, C3′, C1′, N1 or N9, and O5′ of the 3′ anchor were used for superposition. All residues present in the model between the anchors specified during fragment creation were removed and replaced by the new residues from the fragment.
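The residue bookkeeping of the insertion (keep both anchors, drop everything between them, put the fragment's inner residues in their place) can be sketched on a simplified model represented as (residue number, nucleotide) pairs. This is a hypothetical illustration, not ModeRNA's create_fragment logic; the superposition step is omitted, and the sketch assumes the new residues fill the numbering gap exactly, whereas real insertions of a different length would need renumbering or insertion codes.

```python
from typing import List, Tuple

Residue = Tuple[int, str]  # (residue number, nucleotide)


def splice_fragment(model: List[Residue], anchor5: int, anchor3: int,
                    fragment_inner: str) -> List[Residue]:
    """Replace all residues strictly between two anchors with the fragment's inner residues."""
    numbers = [num for num, _ in model]
    i5, i3 = numbers.index(anchor5), numbers.index(anchor3)
    if i5 >= i3:
        raise ValueError("the 5' anchor must precede the 3' anchor")
    # Simplifying assumption: the new residues must fit the numbering gap exactly.
    if len(fragment_inner) != anchor3 - anchor5 - 1:
        raise ValueError("fragment does not fit between the anchors")
    new_inner = [(anchor5 + k + 1, nt) for k, nt in enumerate(fragment_inner)]
    # Keep everything up to and including the 5' anchor, and from the 3' anchor on.
    return model[: i5 + 1] + new_inner + model[i3:]


model = list(zip(range(1, 8), "GCAUGCA"))
print(splice_fragment(model, 2, 6, "UUU"))
```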

In some cases, the insertion of a fragment whose anchor residues did not match the model perfectly resulted in minor gaps in the backbone of the model. ModeRNA was able to rebuild the backbone between such residues (function fix_backbone).
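Before a gap can be repaired, it has to be found. A minimal sketch of the detection step follows, assuming residues are dictionaries mapping atom names to coordinates and using an assumed 2.0 Å cutoff (a covalent O3′–P bond is about 1.6 Å long). It only reports breaks; it does not rebuild the backbone as fix_backbone does, and ModeRNA's own criterion may differ.

```python
import math
from typing import Dict, List, Tuple

Vec = Tuple[float, float, float]

# Assumed cutoff in Angstrom: an O3'-P distance well above the ~1.6 A bond length
# is treated as a chain break.
MAX_O3P_DISTANCE = 2.0


def find_backbone_breaks(residues: List[Dict[str, Vec]],
                         cutoff: float = MAX_O3P_DISTANCE) -> List[int]:
    """Return indices i for which O3'(i) and P(i+1) are too far apart to be bonded."""
    breaks = []
    for i in range(len(residues) - 1):
        o3 = residues[i]["O3'"]
        p_next = residues[i + 1]["P"]
        if math.dist(o3, p_next) > cutoff:
            breaks.append(i)
    return breaks
```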

USAGE OF THE MODERNA SERVER

To make ModeRNA easier to use, we implemented a web interface, available at http://iimcb.genesilico.pl/modernaserver/ [27], which provides many of the functions described above. First, a template and an alignment can be obtained with the procedure described above using the ‘Find template’ tab. Thus, a sequence of E. coli tRNA^Thr alone suffices to find possible templates (in this case, 1EVV and 1EHZ ranked first and 1C0A fifth) and subsequently build a model. The server also provides alignments prepared using Infernal and a covariance model from Rfam. Moreover, the geometry and secondary structure of the chosen template and model can be analyzed, and PDB structures can be cleaned of ions and water molecules using the ‘Analyze structure’ tab. Modeling based on a template and an alignment can be carried out using the ‘Build model’ tab. In addition, a series of nucleotide exchanges, including modified nucleotides, can be performed in a straightforward way by providing a template structure and a target sequence.

More advanced manipulations of the RNA structure, e.g. extending a helix, adding a particular structural fragment, and searching for fragments with a given secondary structure require the standalone version of the software and are not yet possible with the server.

In summary, all steps described in this manuscript except the replacement of the anticodon loop can be performed online with the ModeRNA server, without installing ModeRNA locally. The server is therefore a convenient way to get started in many straightforward cases of homology modeling.

EVALUATION OF TRNA MODELS

As a result of the modeling procedure described above, we obtained five complete tRNA models (Figure 3). One was built on the 1EHZ template without further editing, one was built with the same procedure on the 1C0A template, and three were built on the 1EHZ template, followed by replacement of the anticodon hairpin loop with a model based on 1C0A. Although the 1EHZ template had the highest sequence identity, its anticodon loop was in a stacked conformation. Conversely, 1C0A had a lower sequence identity, but its conformation had been altered by its interaction with a class II aminoacyl-tRNA synthetase, so it was in a state similar to the one we wanted to obtain for our model.

Figure 3:

Structural superposition of E. coli tRNA(Thr) models. (A) Models built automatically (based on 1EHZ: square; based on 1C0A: circle) superimposed on the crystal structure 1QF6 (star); (B) anticodon arm fragments of the different models superimposed on the crystal structure; models combining both templates are indicated by an open circle (9-nt fragment insertion), triangle (15-nt fragment insertion) and diamond (19-nt fragment insertion); (C) D loop (residues 14–21); (D) terminal fragment of the acceptor arm (residues 72–76).

To evaluate the models, we checked simple structural features such as unusual geometry and interatomic clashes (functions find_clashes and analyze_geometry; Table 2). All models exhibited minor geometrical problems. However, manual inspection in PyMOL [28] showed that none of them was serious: all but one originated from glycosidic bonds that were 0.01 Å shorter than the allowed lower limit. Moreover, none of the models appeared to have accumulated a particularly high number of errors during modeling. The model built with a single command on the 1EHZ template contained the fewest geometric distortions, and the one built solely on 1C0A had the most. These measures indicate how far the covalent geometry and stereochemistry of a model deviate from ideal values, but not whether the macromolecular structure was modeled correctly.
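A bond-length check of the kind that flagged the glycosidic bonds can be sketched as follows. This is not the code of analyze_geometry: the allowed range of 1.44–1.50 Å is a hypothetical value chosen for illustration, residues are assumed to be (number, base, atoms) tuples, and only the four standard bases are handled (pseudouridine, whose base is attached through C5, is not).

```python
import math
from typing import Dict, List, Tuple

Vec = Tuple[float, float, float]
Residue = Tuple[int, str, Dict[str, Vec]]  # (number, base, atom name -> coordinates)

# Hypothetical allowed range (Angstrom) for the glycosidic bond; the limits
# actually applied by analyze_geometry are not given in the text.
GLYCOSIDIC_RANGE = (1.44, 1.50)


def glycosidic_partner(base: str) -> str:
    """Base atom bonded to C1' in the standard bases: N9 in purines, N1 in pyrimidines."""
    return "N9" if base in ("A", "G") else "N1"


def unusual_glycosidic_bonds(residues: List[Residue],
                             allowed: Tuple[float, float] = GLYCOSIDIC_RANGE
                             ) -> List[Tuple[int, float, float]]:
    """Return (residue number, bond length, distance outside the range) for outliers."""
    low, high = allowed
    flagged = []
    for num, base, atoms in residues:
        length = math.dist(atoms["C1'"], atoms[glycosidic_partner(base)])
        if length < low:
            flagged.append((num, round(length, 3), round(low - length, 3)))
        elif length > high:
            flagged.append((num, round(length, 3), round(length - high, 3)))
    return flagged
```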

Table 2:

Evaluation of E. coli tRNA(Thr) models

Benchmark	Model based on 1EHZ	Model based on 1C0A	Model based on 1EHZ + 9 nt loop	Model based on 1EHZ + 15 nt loop	Model based on 1EHZ + 19 nt loop
Interatomic clashes	1	2	1	1	1
Unusual bond lengths	2	7	3	4	3
Unusual bond angles	0	3	1	1	2
Unusual dihedral angles	2	3	4	2	2
All-atom RMSD (Å)	5.07	3.38	4.33	4.01	3.78
P, C4′ RMSD (Å)	4.37	2.60	3.84	3.47	3.03
TM score	0.56	0.67	0.54	0.57	0.63
GDT-TS score	0.55	0.66	0.53	0.55	0.61
DI	0.82	0.79	0.82	0.74	0.69
Average DP	12.70	9.85	12.24	11.51	11.04

Note: Bold indicates relatively best values.

As the structure of E. coli tRNA^Thr bound to its cognate aaRS has been solved experimentally [29] and is available as PDB entry 1QF6 (chain B), we were able to assess the actual accuracy of our models. To do so, we applied six different benchmarks of structural similarity (Table 2).

The root mean square deviation (RMSD) is the standard measure of similarity between two 3D structures and is still the one most frequently reported in the literature. Generally, the smaller the RMSD, the more similar the compared structures (0 corresponds to identity); however, this relationship may not hold for structures of very different sizes or for those undergoing conformational changes. We calculated RMSDs between the experimentally solved structure and our five models in two ways: using all heavy atoms and using only two backbone atoms per residue (P and C4′). The template modeling (TM) score is a normalized measure of overall structural similarity that does not depend on structure size. It ranges from 1 (identity) to 0 (no similarity) and follows an extreme value distribution; for protein structures, values above 0.5 typically indicate similar structures, while values below 0.5 are characteristic of dissimilar structures [30]. The Global Distance Test Total Score (GDT-TS) also ranges from 0 (no similarity) to 1 (identity); it is the sum of the numbers of corresponding atom pairs found within four distance thresholds (1, 2, 4 and 8 Å), divided by four times the total number of atoms [14]. We also applied the deformation index (DI) and deformation profile (DP) metrics [31]. The DI indicates how well base-pairing and stacking interactions were modeled, with 1 being the ideal value. By contrast, the DP highlights the dissimilarity between two structures: it is calculated by superimposing each pair of corresponding residues and generating a matrix of per-residue RMSD values (hence, the lower the DP, the more similar the structures). Here, we present the average of all values in the DP matrix, which depends on the molecule size.
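The three distance-based measures defined above can be sketched for two structures that are already superimposed and whose atoms are paired one-to-one. Two simplifications are assumed: GDT-TS and TM-score are evaluated for a single fixed superposition (the published tools search over superpositions to maximize the score), and the TM-score uses the d0 normalization commonly applied to proteins, with the usual lower bound of 0.5 Å; RNA-specific variants may use a different d0.

```python
import math
from typing import Sequence, Tuple

Vec = Tuple[float, float, float]


def rmsd(a: Sequence[Vec], b: Sequence[Vec]) -> float:
    """RMSD of corresponding atoms; assumes the structures are already superimposed."""
    squared = [math.dist(p, q) ** 2 for p, q in zip(a, b)]
    return math.sqrt(sum(squared) / len(squared))


def gdt_ts(distances: Sequence[float]) -> float:
    """Atoms within 1, 2, 4 and 8 A, summed over the four cutoffs, divided by 4 * N."""
    n = len(distances)
    hits = sum(sum(1 for d in distances if d <= cut) for cut in (1.0, 2.0, 4.0, 8.0))
    return hits / (4 * n)


def tm_score(distances: Sequence[float], length: int) -> float:
    """TM-score for one fixed superposition, using the protein d0 formula (an assumption for RNA)."""
    if length <= 15:
        raise ValueError("d0 formula requires more than 15 residues")
    d0 = max(0.5, 1.24 * (length - 15) ** (1.0 / 3.0) - 1.8)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / length
```

For example, per-atom distances of 0.5, 1.5, 3.0 and 10.0 Å give a GDT-TS of (1 + 2 + 3 + 3) / 16 = 0.5625.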

As indicated earlier, we intended to generate a model of E. coli tRNA^Thr in a bound conformation; therefore, we used the E. coli tRNA^Thr–aaRS complex as the reference, and here ‘model accuracy’ denotes the ability of the predicted structure to approximate this particular conformation, solved under specific experimental conditions. Five of the six measures (RMSD, backbone RMSD, TM score, GDT-TS and average DP) showed that inserting anticodon fragments in the bound conformation improved the accuracy of the E. coli tRNA^Thr models over the model built on 1EHZ alone (anticodon loop in the unbound conformation): all three insertions lowered both RMSD values and the average DP, whereas the TM and GDT-TS scores clearly improved only with the longest insertion, which gave the best result on all five measures among the 1EHZ-based models. The model based on the template 1C0A (lower sequence similarity, but in the most relevant functional state) turned out to be the most accurate overall. The DI reached its best value (0.82) both for the model built on the 1EHZ template alone and for the model with the smallest insertion, and was slightly lower (0.79) for the model built on the 1C0A template. In summary, the model based on 1EHZ alone (the template with the highest sequence identity to the target) had the best local quality (fewest geometric distortions and the best DI) but the poorest global accuracy by most measures, whereas the model built on 1C0A (the template in the desired functional state) had the most geometric distortions but was globally the most accurate.

Inspection of the models superimposed on the crystal structure 1QF6 (Figure 3) showed that the models with a replaced anticodon loop were more similar to the crystal structure than the model built on the 1EHZ template alone. Even in these models, however, the anticodon bases were displaced from the corresponding residues in 1QF6. In the model built on the 1C0A template, the anticodon bases were closer to those in 1QF6, because the anticodon stem in 1C0A is oriented differently from that in 1EHZ. Another region with a large impact on model quality was the 3′-CCA tail, which is known to crystallize in different conformations [32]. The D loop was also problematic; it is known to be a flexible region of tRNA because of its two dihydrouridine residues, as described above.

We also compared the backbone (P and C4′) RMSD between the native structure 1QF6 and each template. In both cases, the values were identical to those of the corresponding models: 4.37 Å for 1EHZ and 2.60 Å for 1C0A. Thus, the modeling process did not affect the overall backbone similarity between the target and template molecules. These values reflect the structural diversity of tRNAs: in an analysis of 99 tRNA structures, we found that tRNAs with identical sequences differ by up to 4.2 Å [4].

DISCUSSION

This article presents a case study of RNA 3D structure prediction using the comparative modeling approach and the ModeRNA software. Starting from a target sequence, we built five models of E. coli tRNA^Thr in a conformation bound to an aaRS. The choice of template was not obvious, and a few alternatives were compared.

Two alternative templates were selected based on the known fact that the structural architecture of tRNA is strongly conserved. The target and the two templates originate from different phylogenetic domains and are specific for three different amino acids. Their sequences thus diverged already around the time of fixation of the genetic code. The conservation of the L-shaped architecture in all experimentally determined tRNA structures suggests that the same structure is likely to be found in their homologs. Although it is known that there is considerable structural variability in tRNA [33], it occurs mainly in the anticodon loop and CCA tail, two regions for which modeling was problematic.

Further support for the suitability of the chosen templates comes from the alignments. The target sequence had the same length as the 1EHZ template and a single-residue deletion relative to 1C0A. Capriotti et al. [34] recently described a ‘twilight zone’ of sequence identity, reported at around 0.42–0.71, in which homology modeling of RNA becomes difficult. The sequence identities of the templates 1EHZ and 1C0A to our target were 0.59 and 0.46, respectively. These numbers were calculated with the modified nucleotides present (which Capriotti et al. did not consider); by default, D and U were treated as nonidentical, even though dihydrouridine is a derivative of uridine. ‘Demodification’ of the sequences (replacing all Ds by Us, etc.) raises their sequence identity above 0.72, the level considered the ‘safe zone’ for RNA homology modeling [34].
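The effect of demodification on sequence identity can be illustrated with a short sketch. The parent-base table uses the one-letter codes listed in Table 1; Table 1 gives no separate one-letter symbol for dihydrouridine, so "D" is assumed here, and m1A is omitted. Identity is computed as the fraction of gap-free aligned columns with identical symbols; this normalization is an assumption, since the text does not state how gaps were counted.

```python
# One-letter codes from Table 1 mapped to their parent nucleotide.
ONE_LETTER_PARENT = {
    "L": "G",  # m2G
    "D": "U",  # dihydrouridine (one-letter code assumed)
    "R": "G",  # m22G
    "B": "C",  # Cm
    "#": "G",  # Gm
    "Y": "G",  # wybutosine
    "E": "A",  # m6t6A
    "P": "U",  # pseudouridine
    "?": "C",  # m5C
    "7": "G",  # m7G
    "T": "U",  # m5U
}


def demodify(seq: str) -> str:
    """Replace every modified one-letter code by its parent base; gaps are kept."""
    return "".join(ONE_LETTER_PARENT.get(ch, ch) for ch in seq)


def identity(aligned_a: str, aligned_b: str) -> float:
    """Fraction of gap-free aligned columns with identical symbols."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != "-" and y != "-"]
    return sum(x == y for x, y in pairs) / len(pairs)


print(identity("GD-PA", "GUUUA"), identity(demodify("GD-PA"), demodify("GUUUA")))
```

In this toy alignment, D versus U and P versus U count as mismatches, so the raw identity is 0.5; after demodification every gap-free column matches.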

As exemplified by the modeling of E. coli tRNA^Thr, sequence identity to the target molecule should not be the only criterion for selecting a template. In comparative modeling, the user is responsible for defining the functional state desired for the target molecule. Macromolecules such as proteins or RNAs exhibit local and/or global conformational changes that depend on their environment, in particular on interactions with other molecules. The binding of a particular type of molecule often induces a conformational transition common to many members of the same family of macromolecules. Therefore, conformational variability should be kept in mind, and only templates whose conformation is consistent with the functional state to be modeled should be selected.

The five tRNA models obtained as described above had an all-atom RMSD to the experimentally solved structure in the range of 3.4–5.1 Å. Considering RMSD alone, the 1C0A template, which was in the protein-bound state, gave the best result despite its lower sequence identity to the target compared with the 1EHZ template. Five of the six benchmarks showed that the model based on 1C0A is the most similar to the crystal structure of E. coli tRNA^Thr complexed with aaRS. However, according to the DI, the model built on 1EHZ is more accurate than the one built on 1C0A, i.e. its base interactions are more similar to those in the experimentally determined structure. ‘Grafting’ an anticodon loop from the 1C0A-based model onto the 1EHZ-based model improved the global accuracy but decreased the local quality. Clearly, building a model that is accurate on both the global and the local scale with a comparative approach is difficult. Furthermore, the combined use of different templates for different regions of the target molecule considerably increases the time and complexity of the modeling exercise compared with ‘one-step’ modeling starting from the target sequence alone.

To check the performance of ModeRNA in its simplest, maximally automated mode, it was tested on a set of 99 known tRNA structures, using each of them as a template for the others [4]. The conformational heterogeneity of these structures was considerable, with an average P and C4′ RMSD of 4.9 Å. The all-versus-all dataset of target–template pairs contained many very similar structures, some of them in a conformation completely different from that of the crystal structure 1QF6. In addition, many tRNAs contain a large insertion (the variable loop), which could not be modeled accurately on templates lacking a corresponding loop. The resulting 9675 tRNA models had, on average, an all-atom RMSD of 5.6 Å, a P and C4′ RMSD of 5.2 Å, a GDT-TS of 0.5, a DI of 0.62 and a DP of 13.82. Comparing these results with the values obtained for E. coli tRNA^Thr shows that, by carefully choosing the template and modeling strategy, it is possible to build more reliable models than in the fully automated mode. Searching for templates in the Rfam database using covariance models is applicable to other ncRNA families as well.

Models of tRNAs have been built with other programs as well. Several models of yeast tRNA^Phe have been generated, clustered and ranked with the Nucleic Acid Simulation Tool (NAST), which is based on a coarse-grained knowledge-based potential. The three resulting clusters had an average all-atom RMSD to the native structure of 8.0, 13.6 and 15.6 Å and a GDT-TS of 0.2, 0.08 and 0.07, respectively [35]. Flores and Altman have also built a model of tRNA^Phe using constraints from a limited number of tertiary contacts, stacking interactions and NMR data, which resulted in a P-only RMSD of 9.6 Å to the native structure [36]. Lavender et al. [37] have built a tRNA^Asp model with a P-only RMSD of 6.2 Å. Ding and Dokholyan have modeled tRNA^Phe using the DMD software and secondary structure restraints, obtaining an RMSD of 7.2 Å to the native structure [38], and the same authors have built a tRNA model with an RMSD of 4.0 Å using Molecular Dynamics alone [39]. More recently, Cao and Chen reported modeling of a tRNA with an all-atom RMSD of 4.2 Å to the native structure using a combination of a knowledge-based potential and Molecular Dynamics simulation [40]. These values require two comments. First, one needs to be very careful when comparing RMSD values: this paragraph alone contains four different modes of calculation that cannot be compared directly. Second, most of the methods mentioned above start from an unfolded RNA sequence and simulate the folding process. ModeRNA uses a template, which of course facilitates model building, but given the need to identify a template and prepare a target–template alignment, comparative modeling presents its own challenges. While methods that simulate folding with the use of experimental restraints have been shown to build accurate tRNA models, the models built by ModeRNA are at least competitive. One decisive advantage of the homology modeling approach is calculation time: whereas knowledge-based potentials often require hours to fold a tRNA-sized structure on a single processor, and full-atom Molecular Dynamics much longer, ModeRNA builds such a model within seconds.

ModeRNA allows models to be edited explicitly, e.g. by fragment insertion and by adding or removing helical regions and single base pairs. Recombining different structural parts may result in backbone breaks, and ModeRNA can reconstruct the backbone in such regions (function fix_backbone). This functionality was successfully used to build a model of the Azoarcus group I intron that compares well (RMSD of 4.3 Å versus 4.4 Å) with the one generated by Flores et al. [41] using the RNABuilder software. An initial model of the Azoarcus intron had been created by Rangan et al. [42]. Homology modeling was also used to construct a model of the 30S ribosomal subunit from E. coli [43]; compared with the crystal structure (PDB ID 3R8N), the model reaches an RMSD of 3.3 Å.

Comparative models (and in fact any models built with any method) may contain geometrical distortions and other steric problems. The identification of such flaws (e.g. by the use of ModeRNA) indicates that the model should undergo further optimization, for instance local optimization by programs for energy minimization like MMTK [44] or OpenMM Zephyr [45]. In the absence of the experimentally determined structure to be used as a reference for model quality assessment (i.e. in ‘real life’ cases of comparative modeling), the accuracy of the theoretical model can be predicted using computational tools, such as the recently developed knowledge-based potentials RASP [20] or KB [46]. It is also advisable to check the validity of the model against the available experimental data, e.g. with FILTREST3D [47] or with NAST [35].

Further use of RNA 3D structure models, e.g. to model interactions with other molecules, requires other bioinformatics tools. In the example described in this article, the next step would be to obtain a model of the protein partner in the appropriate functional state, which can be achieved by an analogous modeling protocol with protein-specific tools such as SwissModel [5] or Modeller [48]. The assembly of a complex can be guided by homology (e.g. by superposition onto a related complex) or performed de novo by protein–RNA docking, e.g. with HADDOCK [49]. Protein–RNA docking is an emerging field, and no standard protocols exist yet, especially for RNA molecules with modified residues. For instance, HADDOCK cannot automatically process modified residues, so such an analysis would require ‘demodification’ of the RNA with ModeRNA prior to docking. A detailed description of docking is, however, beyond the scope of this article.

The ModeRNA script and input files to reproduce all models presented in this study are available on the ModeRNA website (http://genesilico.pl/moderna/examples/). We believe they are useful as a starting point for developing further RNA comparative modeling experiments, on tRNA or other targets.

Key Points

Software for RNA 3D structure prediction can provide good-quality models in reasonable time.
ModeRNA is a program for RNA 3D structure prediction by comparative modeling; it requires a structural template and a target–template sequence alignment.
Escherichia coli tRNA^Thr was modeled in the conformation it adopts in complex with its cognate aminoacyl-tRNA synthetase (aaRS).
Comparison of the best model with the experimentally determined structure (PDB: 1QF6) yielded an all-atom RMSD of 3.4 Å.
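The RMSD quoted in the last point is the root-mean-square deviation between corresponding atoms of the model and the crystal structure, RMSD = sqrt((1/N) Σ ||R p_i + t − q_i||²), where R and t are the rotation and translation that optimally superimpose the model onto the reference. The sketch below computes this with the Kabsch algorithm in NumPy. It assumes the two coordinate arrays are already matched atom-for-atom (same atoms, same order); the function name `kabsch_rmsd` is hypothetical, and this is not necessarily the exact procedure used to obtain the reported value.

```python
import numpy as np


def kabsch_rmsd(P, Q):
    """RMSD between matched (N, 3) coordinate sets after optimal rigid superposition."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("P and Q must be matched (N, 3) arrays")
    # Removing the centroids takes care of the optimal translation.
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)
    # SVD of the 3x3 covariance matrix gives the optimal rotation.
    H = Pc.T @ Qc
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation (det = +1), not a reflection.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    diff = Pc @ R.T - Qc
    return float(np.sqrt((diff ** 2).sum() / len(P)))
```

Because the superposition minimizes the deviation, the result can only be lower than or equal to the RMSD computed without fitting. A rigidly rotated and translated copy of a structure therefore gives an RMSD of essentially zero.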

FUNDING

The German Academic Exchange Service (D/09/42768 to K.R.); the Polish Ministry of Science and Higher Education (N N301 035 539 to T.P.); the European Research Council (RNA+P=123D to J.M.B.); the Foundation for Polish Science (‘Ideas for Poland’ fellowship to J.M.B.).

References

1. Hoogstraten CG, Sumita M. Structure-function relationships in RNA and RNP enzymes: recent advances. Biopolymers 2007;87:317-28.

2. Laing C, Schlick T. Computational approaches to 3D modeling of RNA. J Phys Condens Matter 2010;22:283101.

3. Rother K, Rother M, Boniecki M, et al. RNA and protein 3D structure modeling: similarities and differences. J Mol Model 2011. doi:10.1007/s00894-010-0951-x.

4. Rother M, Rother K, Puton T, et al. ModeRNA: a tool for comparative modeling of RNA 3D structure. Nucleic Acids Res 2011;39:4007-22.

5. Arnold K, Bordoli L, Kopp J, et al. The SWISS-MODEL workspace: a web-based environment for protein structure homology modelling. Bioinformatics 2006;22:195-201.

6. Smith KC, Cordes E, Schweet RS. Fractionation of transfer ribonucleic acid. Biochim Biophys Acta 1959;33:286-7.

7. Robertus JD, Ladner JE, Finch JT, et al. Structure of yeast phenylalanine tRNA at 3 Å resolution. Nature 1974;250:546-51.

8. Kim SH, Suddath FL, Quigley GJ, et al. Three-dimensional tertiary structure of yeast phenylalanine transfer RNA. Science 1974;185:435-40.

9. Agris PF, Vendeix FA, Graham WD. tRNA's wobble decoding of the genome: 40 years of modification. J Mol Biol 2007;366:1-13.

10. Motorin Y, Helm M. tRNA stabilization by modified nucleotides. Biochemistry 2010;49:4934-44.

11. Grosjean H. Nucleic acids are not boring long polymers of only four types of nucleotides: a guided tour. In: Grosjean H (ed). DNA and RNA Modification Enzymes: Structure, Mechanism, Function and Evolution. Landes Bioscience, 2009.

12. Dalluge JJ, Hashizume T, Sopchik AE, et al. Conformational flexibility in RNA: the role of dihydrouridine. Nucleic Acids Res 1996;24:1073-9.

13. Agris PF. Bringing order to translation: the contributions of transfer RNA anticodon-domain modifications. EMBO Rep 2008;9:629-35.

14. Cozzetto D, Giorgetti A, Raimondo D, et al. The evaluation of protein structure prediction results. Mol Biotechnol 2007;39:1-8.

15. Bujnicki JM. Protein-structure prediction by recombination of fragments. Chembiochem 2006;7:19-27.

16. Fiser A, Feig M, Brooks CL 3rd, et al. Evolution and physics in comparative protein structure modeling. Acc Chem Res 2002;35:413-21.

17. Hardin C, Pogorelov TV, Luthey-Schulten Z. Ab initio protein structure prediction. Curr Opin Struct Biol 2002;12:176-81.

18. Krieger E, Nabuurs SB, Vriend G. Homology modeling. Methods Biochem Anal 2003;44:509-23.

19. Moult J. A decade of CASP: progress, bottlenecks and prognosis in protein structure prediction. Curr Opin Struct Biol 2005;15:285-9.

20. Capriotti E, Norambuena T, Marti-Renom MA, et al. All-atom knowledge-based potential for RNA structure prediction and assessment. Bioinformatics 2011;27:1086-93.

21. Gardner PP, Daub J, Tate JG, et al. Rfam: updates to the RNA families database. Nucleic Acids Res 2009;37:D136-40.

22. Berman HM, Westbrook J, Feng Z, et al. The Protein Data Bank. Nucleic Acids Res 2000;28:235-42.

23. Nawrocki EP, Kolbe DL, Eddy SR. Infernal 1.0: inference of RNA alignments. Bioinformatics 2009;25:1335-7.

24. Eiler S, Dock-Bregeon A, Moulinier L, et al. Synthesis of aspartyl-tRNA(Asp) in Escherichia coli—a snapshot of the second step. EMBO J 1999;18:6532-41.

25. Boomsma W, Hamelryck T. Full cyclic coordinate descent: solving the protein loop closure problem in Calpha space. BMC Bioinformatics 2005;6:159.

26. Czerwoniec A, Dunin-Horkawicz S, Purta E, et al. MODOMICS: a database of RNA modification pathways. 2008 update. Nucleic Acids Res 2009;37:D118-21.

27. Rother M, Milanowska K, Puton T, et al. ModeRNA server: an online tool for modeling RNA 3D structures. Bioinformatics 2011. doi:10.1093/bioinformatics/btr400.

28. DeLano WL. The PyMOL Molecular Graphics System. 2002.

29. Sankaranarayanan R, Dock-Bregeon AC, Romby P, et al. The structure of threonyl-tRNA synthetase-tRNA(Thr) complex enlightens its repressor activity and reveals an essential zinc ion in the active site. Cell 1999;97:371-81.

30. Xu J, Zhang Y. How significant is a protein structure similarity with TM-score = 0.5? Bioinformatics 2010;26:889-95.

31. Parisien M, Cruz JA, Westhof E, et al. New metrics for comparing and assessing discrepancies between RNA 3D structures and models. RNA 2009;15:1875-85.

32. Giege R. Toward a more complete view of tRNA biology. Nat Struct Mol Biol 2008;15:1007-14.

33. Giege R, Puglisi JD, Florentz C. tRNA structure and aminoacylation efficiency. Prog Nucleic Acid Res Mol Biol 1993;45:129-206.

34. Capriotti E, Marti-Renom MA. Quantifying the relationship between sequence and three-dimensional structure conservation in RNA. BMC Bioinformatics 2010;11:322.

35. Jonikas MA, Radmer RJ, Laederach A, et al. Coarse-grained modeling of large RNA molecules with knowledge-based potentials and structural filters. RNA 2009;15:189-99.

36. Flores SC, Altman RB. Turning limited experimental information into 3D models of RNA. RNA 2010;16:1769-78.

37. Lavender CA, Ding F, Dokholyan NV, et al. Robust and generic RNA modeling using inferred constraints: a structure for the hepatitis C virus IRES pseudoknot domain. Biochemistry 2010;49:4931-3.

38. Ding F, Sharma S, Chalasani P, et al. Ab initio RNA folding by discrete molecular dynamics: from structure prediction to folding mechanisms. RNA 2008;14:1164-73.

39. Gherghe CM, Leonard CW, Ding F, et al. Native-like RNA tertiary structures using a sequence-encoded cleavage agent and refinement by discrete molecular dynamics. J Am Chem Soc 2009;131:2541-6.

40. Cao S, Chen SJ. Physics-based de novo prediction of RNA 3D structures. J Phys Chem B 2011;115:4216-26.

41. Flores SC, Wan Y, Russell R, et al. Predicting RNA structure by multiple template homology modeling. Pac Symp Biocomput 2010:216-27.

42. Rangan P, Masquida B, Westhof E, et al. Assembly of core helices and rapid tertiary folding of a small bacterial group I ribozyme. Proc Natl Acad Sci USA 2003;100:1574-9.

43. Tung CS, Joseph S, Sanbonmatsu KY. All-atom homology model of the Escherichia coli 30S ribosomal subunit. Nat Struct Biol 2002;9:750-5.

44. Hinsen K. The molecular modeling toolkit: a new approach to molecular simulations. J Comp Chem 2000;21:79-85.