Wednesday, September 12, 2007

ARIA verson 2.2 released

I don't usually post about NMR (Nuclear Magnetic Resonance) and structural biology related stuff, but I've always intended to. In this post I'm pulling out all the stops on specialist lingo and assumed background knowledge, so hopefully it isn't too incomprehensible to the non-structural biology crowd :).

ARIA version 2.2 has been released in the last few weeks. ARIA is an automated NOE assignment and structure calculation package, which (in theory) takes some of the pain and slowness out of producing protein (and DNA and/or RNA) structures from Nuclear Magnetic Resonance data. I'll say up front; I haven't tried this version yet, but some of the improvements look exciting.

Here are two new features worth noting ... followed by what I think it all means:


  • The assignment method has been improved with the introduction of a network-anchoring analysis (Herrmann et al., 2002) for filtering of the initial assignments.
  • The integration of the CCPN has been completed. The imported CCPN distance constraints lists can enter the ARIA process for calibration, violation analysis and network-anchoring analysis. The final constraint lists can be exported as well.

In the past I have done some quick and dirty tests comparing the quality of protein structures produced using Aria 2.1 vs. Peter Gunterts CYANA 1.07 and 2.1, using the exact same NMR peak input lists (with slightly noisy data containing a number of incorrectly picked peaks). CYANA always won hands down, assigning more NOE crosspeaks correctly and producing an ensemble of model structures with much lower RMSD and generally better protein structure quality scores (ie using pretty much any decent pairwise pseudo-energy potential, and Procheck). Also, ARIA produced 'knotted' structures which were almost certainly incorrect, while CYANA did not. Other postdocs and students in my former lab had done similar independent tests with ARIA 1.2 vs. CYANA 1.0.7, and had come to similar conclusions.

The disclaimer: It should be noted here that assessment of the quality of an ensemble of NMR structure coordinates can be problematic, and is really the topic of another long post (and probably tens if not hundreds of peer-reviewed journal articles). So saying "CYANA version X is better then ARIA version X" based on the RMSD of the final calculated ensemble is a bit unfair ... in fact using RMSD of the ensemble to gauge structure quality is just plain wrong in this context. In my (unpublished, non-peer reviewed) tests, it is possible that ARIA could be producing high RMSD but essentially 'correct' structures, while CYANA could be producing tightly defined but 'incorrect' structures, but I doubt it. The gap between the output of each program was wide enough to suggest that under real-world conditions where the input peak list contained a number of 'noise' peaks, ARIA was failing to give a set of consistent solutions (probably due to lack of NOE assignments), while CYANA was giving a set of tightly defined structures (which may or may not have represented the 'correct' solution). Other evaluations (protein structure quality measures, Procheck, comparison to known structures of similar proteins) indicated that the CYANA structures were not grossly 'incorrect', so I'd say CYANA was just giving a better defined (ie lower ensemble RMSD) set of plausible solutions.

My gut feeling is that ARIA 2.2 will perform much better than past versions, due to one key feature that has been 'borrowed' from CYANA; the introduction of a network-anchoring analysis. In a nutshell, network-anchoring scores essentially weight distance constraints (or NOE assignments) based on how 'connected' that constraint is within the graph formed by other constraints. This means that in effect a single, isolated constraint pulling two residues on opposite sides of a protein together is down-weighted, while if multiple constraints link those residues (or their neighboring residues) then those constraints are considered more trusted and hence weighted heavier. For better or worse (usually better), this score simulates what the human NMR spectroscopist would do when assigning NOE crosspeaks manually ... usually two residues in contact will show multiple NOE crosspeaks connecting them and involve multiple different nuclei, however a single lonely NOE between two nuclei which are distant from eachother in the primary protein sequence is heavily scrutinized and regarded with suspicion since it is likely to be mis-assigned. I'm very keen to test ARIA 2.2 on my old data set and see if I'm actually right (I may be able to try it with network anchoring turned on, and off, and see just what sort of contribution that score is making).

Another completed feature, the integration between ARIA and the CCPN libraries/analysis package should also be a big plus. I haven't used the CCPN analysis software yet, but a few years ago I wrote some code to help make CYANA and the Sparky NMR assignment program work together better. The result was functional, but very hackish (and I'm probably the only person in the world who understands how it was intended to be used, since I still haven't got around to writing any documentation. Naughty, naughty). CCPN + ARIA may turn out to be the better option for spectral analysis and structure calculation in the future, as opposed to my currently preferred Sparky + CYANA combination.

I'm really itching to find a good reason to do an NMR structure project now ... back to work !!

No comments: