How to use Torsion Explorer
First draw a chemical structure in the editor. It should be valid in terms of valences, atom charges and stereochemical assignments. Then switch to the lasso tool and move the mouse pointer over any rotatable single bond. Torsion Explorer will immediately classify the bond and construct an identifying substructure, which is then used to pull the corresponding torsion distribution statistics from a preprocessed table. A four-atom strand of the drawn molecule is highlighted to indicate the atoms to which the torsion angle refers. The identifying substructure, an interactive histogram, and a Newman projection of the substructure are displayed. When the mouse pointer is moved over the histogram, the Newman projection displays the identifying substructure in the corresponding torsion state.
Atom decorations show atom properties of the substructure: a:aromatic, !a:non-aromatic, r:ring member, sp2:flat hydrogen. A yellow marker on the central bond indicates a ring bond.
How does Torsion Explorer work internally?
Displaying meaningful torsion angle statistics for any given rotatable bond environment required finding a similarity criterion that is strict enough to distinguish substantially different cases, yet loose enough to collect enough cases for a statistically relevant answer. Since the tool should run on derived data without direct access to any 3D-structure database, the similarity criterion had to be based on predefined and reproducible rules rather than on user interaction. The selected solution was to construct a substructure fragment consisting of the rotatable bond (central bond and central atoms) and one additional shell of connected (terminal) bonds and atoms. Atomic numbers, bond orders (including delocalization), stereo configuration of the central atoms, ring state and aromaticity of the terminal atoms, as well as the ring state of the central bond were made part of this identifying substructure. Where one end of the substructure has more than one terminal atom, one of those is uniquely selected to be the reference point for the torsion value. If multiple terminal atoms are connected to a tetrahedral central atom and two of these terminal atoms are indistinguishable in their properties, then the third neighbor serves as the reference, even if it is a hydrogen or just a lone pair. In that case both symmetrical terminal atoms are highlighted to indicate that the reference point is neither of them.
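The fragment construction described above can be illustrated with a minimal Python sketch. It is not the tool's actual implementation: the Atom record, the function names and the dictionary layout are hypothetical, and only element, aromaticity and ring membership are considered (bond orders and stereo configuration are left out).

```python
from collections import namedtuple

# Hypothetical minimal atom record; a real toolkit carries far more detail.
Atom = namedtuple("Atom", "element aromatic in_ring")

def torsion_fragment(atoms, bonds, central):
    """Collect the central bond plus one shell of terminal atoms.

    atoms   : dict mapping atom index -> Atom
    bonds   : iterable of (i, j) index pairs
    central : (i, j) pair naming the rotatable bond
    """
    neighbors = {i: set() for i in atoms}
    for i, j in bonds:
        neighbors[i].add(j)
        neighbors[j].add(i)
    a, b = central
    if b not in neighbors[a]:
        raise ValueError("central atoms are not bonded")
    return {
        "central": (a, b),
        "terminal_left": sorted(neighbors[a] - {b}),
        "terminal_right": sorted(neighbors[b] - {a}),
    }

def terminal_signature(atoms, indices):
    """Sorted property tuples of terminal atoms; equal tuples mark atoms
    that are indistinguishable under this simplified property set."""
    return sorted(tuple(atoms[i]) for i in indices)

# Heavy atoms of 2-methylbutane: C0-C1(-C4)-C2-C3, central bond C1-C2.
carbon = Atom("C", False, False)
atoms = {i: carbon for i in range(5)}
bonds = [(0, 1), (1, 2), (2, 3), (1, 4)]
fragment = torsion_fragment(atoms, bonds, (1, 2))
left_signature = terminal_signature(atoms, fragment["terminal_left"])
```

In this example the two methyl carbons on C1 yield identical property tuples, which corresponds to the case where the third neighbor (here an implicit hydrogen) would serve as the torsion reference.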
Identifying substructures are canonicalized and normalized to one enantiomer, where applicable. Then every substructure is assigned to one of six possible symmetry classes, which differ in the unique base torsion angle range and in the symmetry operations needed to map equivalent ranges onto the base range. While the 360-degree torsion range in diphenyl can be split into four symmetrical 90-degree ranges, in 2-butanol the full 360-degree range is unique. Interestingly, tetrahedral nitrogen atoms need to be considered stereo centers, because they usually don't invert within a crystal.
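Mapping a torsion value onto its base range can be sketched for the two examples named above. The six symmetry classes are not enumerated here, so the function names and the chosen equivalences are assumptions for illustration: the diphenyl-like case treats t, t + 180 and -t as equivalent, and the 2-butanol-like case only normalizes the angle.

```python
def fold_full(angle):
    """No symmetry (2-butanol-like case): normalize into [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0

def fold_fourfold(angle):
    """Diphenyl-like case: assume t, t + 180 and -t are equivalent,
    so every angle maps into the base range [0, 90]."""
    x = angle % 180.0            # removes the 180-degree periodicity
    return 180.0 - x if x > 90.0 else x   # mirrors (90, 180) onto (0, 90)

folded = [fold_fourfold(a) for a in (135.0, -45.0, 200.0, 90.0)]
normalized = fold_full(190.0)
```

With these assumptions the angles 135, -45 and 225 degrees all land on 45 degrees, so their counts accumulate in the same histogram bin.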
Data Preparation
All organic structures with 3D coordinates and an R-factor <= 0.05 were exported from the CSD (Nov 2011), excluding structures with disordered atoms, structures with errors, polymers and powder structures. This yielded 104511 compounds. From these, all structures with unbalanced atom charges, structures that contained bonds of type 'any', structures that were not strictly organic and a few that were considered invalid for other reasons were removed. Every rotatable bond of the remaining 88011 structures was processed in the following manner: The identifying substructure was determined, canonicalized and normalized. The terminal reference atoms/positions were located and the signed torsion value calculated. Depending on the symmetry class of the substructure, the value was mapped into the unique angle range and then added to the histogram belonging to this substructure. Altogether 736557 torsion values were calculated and distributed among 3422 distinct identifying substructures.
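The signed torsion calculation and the histogram update can be sketched as follows. The torsion uses the standard atan2 formulation for four 3D points. The function names, the dictionary-based histogram and the 5-degree bin width are assumptions for illustration and not taken from the tool.

```python
import math

def signed_torsion(p1, p2, p3, p4):
    """Signed dihedral angle in degrees for four 3D points (x, y, z)."""
    def sub(u, v):
        return (u[0] - v[0], u[1] - v[1], u[2] - v[2])
    def cross(u, v):
        return (u[1] * v[2] - u[2] * v[1],
                u[2] * v[0] - u[0] * v[2],
                u[0] * v[1] - u[1] * v[0])
    def dot(u, v):
        return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]

    b1, b2, b3 = sub(p2, p1), sub(p3, p2), sub(p4, p3)
    n1, n2 = cross(b1, b2), cross(b2, b3)   # normals of the two planes
    y = math.sqrt(dot(b2, b2)) * dot(b1, n2)
    x = dot(n1, n2)
    return math.degrees(math.atan2(y, x))

def add_to_histogram(histogram, angle, bin_width=5.0):
    """Increment the bin that holds angle; bin_width is an assumed value."""
    bin_count = int(360.0 / bin_width)
    index = int(math.floor((angle + 180.0) / bin_width)) % bin_count
    histogram[index] = histogram.get(index, 0) + 1
    return index

# Four points forming a 90-degree torsion around the z axis.
angle = signed_torsion((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
histogram = {}
bin_index = add_to_histogram(histogram, angle)
```

In the described pipeline the value would first be mapped into the unique angle range of the substructure's symmetry class before it is added to the histogram; that step is omitted here.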
Acknowledgements
Our thanks go to the Cambridge Crystallographic Data Centre (CCDC) and Dr. Colin Groom for their support and for the permission to publish a tool containing data derived from the CSD.