Progress in Multi-Spectra Automatic Structure Verification (MS-ASV)

POSTER by ¹Stanislav Sykora, ²Carlos Cobas
¹Extra Byte, Castano Primo, Italy
²Mestrelab Research, Santiago de Compostela, Spain

Presented at SMASH 2017, Baveno (Italy), September 17-20, 2017.

DOI permalink: 10.3247/SL6Nmr17.003

Please cite this online document as:
Sykora S., Cobas C.,
Progress in Multi-Spectra Automatic Structure Verification (MS-ASV),
Poster at SMASH 2017, Baveno (Italy), September 17-20, 2017. DOI: 10.3247/SL6Nmr17.003.

Abstract

Automatic Structure Verification (ASV) is fast becoming an important part of NMR data evaluation software packages such as Mnova and others. Its basic goal is to answer, both qualitatively and quantitatively, the question:

Is this molecular structure compatible with these NMR data?

Naturally, the answer one gets depends on the question one asks.
Such answers are inherently fuzzy because of imperfections and impurities in the spectra, errors in predicted NMR parameters, solvent effects, etc. The complexity approaches that of an artificial-intelligence problem, and the scoring depends critically on how the query itself is formulated. But the problem extends beyond that, to what one means by 'NMR data'. As we know well, having a single 1H spectrum is one thing; having a pair of spectra of different kinds, such as 1H and 13C, or 1H and HSQC, is another; and it is something different again when the spectra (presumably of the same compound) include an arbitrary subset of the “200 and more” NMR experiments, for example 1H, 13C, HSQC, COSY, TOCSY, HMBC, ROESY, and others.

How does one proceed with the automatic analysis in such cases? We all know that the likelihood of multiple “solutions” decreases sharply when one combines several spectra of different kinds. We also know that even 'human' analysis of such data is anything but linear and standard: it requires multiple “passes” through the spectra and a non-trivial search for correlations (or lack of correlations) of various orders. We also expect critical points (penalties) to accumulate as new data are added, until they exceed a threshold and a correct structure gets rejected (a false negative). Since no spectrum is perfect, how many spectra does it take before any structure gets ruled out? This implies that, to reach a correct conclusion, there may be an optimal number of spectra: adding still more increases knowledge only marginally, while uncertainties continue to grow at least linearly.
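The trade-off described above can be made concrete with a deliberately crude toy model. Purely for illustration, it assumes that each additional spectrum independently fails to rule out a wrong structure with probability q, so the chance of an ambiguous result falls as q^n, while each imperfect spectrum adds a roughly constant amount alpha of uncertainty, giving a risk R(n) = q^n + alpha·n. The functions `total_risk` and `optimal_spectrum_count` and the parameter values are hypothetical; this is a sketch of the argument, not the authors' scoring method.

```python
# Hypothetical toy model (illustration only, not the authors' method):
#   q     = assumed probability that one spectrum fails to rule out a wrong structure
#   alpha = assumed uncertainty/penalty contributed by each imperfect spectrum
def total_risk(n: int, q: float, alpha: float) -> float:
    """Toy risk with n spectra: a wrong structure survives all n spectra with
    probability q**n (spectra assumed independent), while the accumulated
    uncertainty grows linearly as alpha * n."""
    return q ** n + alpha * n


def optimal_spectrum_count(q: float, alpha: float, n_max: int = 20) -> int:
    """Return the number of spectra in 1..n_max that minimizes the toy risk."""
    return min(range(1, n_max + 1), key=lambda n: total_risk(n, q, alpha))


for n in range(1, 6):
    print(n, round(total_risk(n, q=0.3, alpha=0.02), 4))
print("optimum:", optimal_spectrum_count(q=0.3, alpha=0.02))  # optimum: 3
```

With alpha = 0 the toy risk decreases monotonically and the "optimum" is simply n_max, i.e. more spectra are always better; a positive per-spectrum uncertainty can instead produce an interior optimum, which is the effect the abstract points to. Real uncertainties need not grow exactly linearly, nor are spectra truly independent.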

Over the last few years we have built up a considerable body of experience [1] with these problems, though we do not (and cannot) claim to have solved them yet. How should one design software machinery that iteratively analyzes a set of any number of NMR spectra of various kinds (presumably of the same compound, or better still of the same sample)? It should extract and refine all the information the spectra might contain, in a synergistic way, intelligently handle all missing and/or excess features, and then knowledgeably score that information database against an assumed, hypothetical molecular structure. That, indeed, is a huge task which (in the opinion of one of the authors :-) ) will continue to be tackled for the rest of this century.
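The final scoring step can be illustrated with a very small sketch. It assumes, for illustration only, that each spectrum has already been reduced to a peak list, that the hypothetical structure comes with predicted peak positions per experiment, and that missing and excess features are penalized with different weights (excess peaks weighted less, since impurity and solvent signals are common). The names `Spectrum`, `match_peaks` and `structure_score`, the weights, tolerances and peak values are all invented; the matching is a simple greedy nearest-neighbour scheme, not the iterative, correlation-based machinery described above and not Mnova's algorithm.

```python
from __future__ import annotations

from dataclasses import dataclass


# Hypothetical data structure: peak lists in ppm, one tuple per peak
# (1 coordinate for 1D spectra such as 1H or 13C, 2 for 2D spectra such as HSQC).
@dataclass
class Spectrum:
    kind: str                          # experiment label, e.g. "1H" or "HSQC"
    observed: list[tuple[float, ...]]  # observed peak coordinates (ppm)
    tolerance: tuple[float, ...]       # max allowed |observed - predicted| per axis (ppm)


def match_peaks(predicted: list[tuple[float, ...]], spec: Spectrum) -> tuple[int, int, int]:
    """Greedy nearest-neighbour matching of predicted to observed peaks.
    Returns (matched, missing, excess) counts; greedy, not an optimal assignment."""
    unused = list(range(len(spec.observed)))
    matched = 0
    for p in predicted:
        best, best_dist = None, float("inf")
        for i in unused:
            o = spec.observed[i]
            # An observed peak is a candidate only if it is within tolerance on every axis.
            if all(abs(oc - pc) <= t for oc, pc, t in zip(o, p, spec.tolerance)):
                # Tolerance-normalized squared distance selects the closest candidate.
                d = sum(((oc - pc) / t) ** 2 for oc, pc, t in zip(o, p, spec.tolerance))
                if d < best_dist:
                    best, best_dist = i, d
        if best is not None:
            unused.remove(best)
            matched += 1
    return matched, len(predicted) - matched, len(unused)


def structure_score(predictions: dict[str, list[tuple[float, ...]]],
                    spectra: list[Spectrum],
                    w_missing: float = 1.0, w_excess: float = 0.3) -> float:
    """Pool penalties over all spectra into a score in [0, 1] (1 = perfect fit).
    Excess peaks are assumed to weigh less than missing ones."""
    penalty = weight = 0.0
    for spec in spectra:
        pred = predictions.get(spec.kind, [])
        _, missing, excess = match_peaks(pred, spec)
        penalty += w_missing * missing + w_excess * excess
        weight += w_missing * len(pred) + w_excess * len(spec.observed)
    # No peaks at all means no evidence either way; report 0.0 in that case.
    return 1.0 - penalty / weight if weight > 0 else 0.0


# Invented example: 7.26 ppm mimics a residual CHCl3 solvent peak (an excess feature).
spectra = [
    Spectrum("1H", [(7.26,), (3.70,), (1.20,)], (0.05,)),
    Spectrum("HSQC", [(3.70, 60.1), (1.20, 18.3)], (0.05, 1.0)),
]
right = {"1H": [(3.72,), (1.18,)], "HSQC": [(3.72, 60.5), (1.18, 18.0)]}
wrong = {"1H": [(2.10,), (1.20,)], "HSQC": [(2.10, 30.0), (1.20, 18.3)]}
print(f"{structure_score(right, spectra):.3f}")  # 0.945
print(f"{structure_score(wrong, spectra):.3f}")  # 0.473
```

Pooling the penalties over all spectra, rather than averaging per-spectrum scores, is one simple way to let several small discrepancies add up; a real system would also have to exploit correlations between spectra (for example, HSQC cross-peaks linking 1H and 13C assignments), which this sketch ignores.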
