DataWarrior User Manual

Chemistry in 3D

Biological properties of a chemical substance depend largely on its 3-dimensional structure, i.e. on the interaction potential of its atoms, their geometrical orientation and the flexibility of the molecule. Typically, a molecule has not one but many low-energy conformers, and to understand the biological potential of a compound one needs to investigate its conformer structures in detail. The Flexophore descriptor was designed to cover all representative conformers of a molecule and even to consider its flexibility. Calculating similarities between molecules using the Flexophore is easy and allows detecting molecules whose conformers have a high potential to interact with a target protein in a similar way. Nevertheless, it doesn't reveal any insights into the 3-dimensional nature of a compound.

DataWarrior has a built-in conformer generator and forcefield-based energy minimization, which together allow generating diverse, low-energy conformers. These can be explored within DataWarrior, exported for use in other software packages, or even rendered to yield photo-realistic images. Within DataWarrior there are three views that may show conformers: First, the detail area automatically includes a 3D-molecule viewer if a structure column has associated conformer information. Second, the form view may contain a form item that shows conformers. Third, the conformer explorer is dedicated to inspecting conformers.

Generating Conformers

This functionality creates one or multiple conformers for every structure within a DataWarrior document. Various algorithms for the conformer generation and subsequent energy minimization are available. To create conformers for the current data window's molecules, select Generate Conformers... from the Chemistry menu. A dialog lets you define options for the conformer generator.

If you work with LigandScout from Inte:ligand, note that from version 4.4.2 on LigandScout can directly read conformers from native DataWarrior files, no matter whether a row contains just one or multiple conformers.

Conformer Generator Options

Structure column: A column containing chemical structures for which to generate the conformers.

Algorithm: Most of the algorithms that can be selected here share the same general procedure to generate conformers, a rule-based assembly of self-organized rigid fragments:
First, DataWarrior locates all freely rotatable bonds of the molecule that are not part of a ring. Cutting all of these bonds yields a set of more or less rigid fragments. For each of these fragments a self-organization-based algorithm creates one or multiple fragment conformers.
In a second step the local neighborhood of every rotatable bond is inspected and used to assign the bond to a specific bond class. A bond class is basically defined as a sub-structure with query features that describe all neighbor atoms and bonds of the rotatable bond. DataWarrior uses a dictionary of about 5000 distinct rotatable bond environments, which have been extracted from experimental, i.e. crystallographic, data. Every one of these rotatable bond classes comes with a list of preferred torsion angles and frequency data about how often these torsion angles have been found in the crystallographic database. From the frequency data DataWarrior derives a likelihood for each of the torsion angles. Using these bond classes, DataWarrior then assigns a set of preferred torsion angles along with their likelihoods to every rotatable bond of the molecule.
In a third step, the fragments are assembled after choosing one of the preferred torsion angles for every rotatable bond. A collision check determines whether the combination of torsions causes any atom collisions. If no collision occurs, the conformer is accepted and a new combination is chosen. Otherwise, the algorithm creates a rule describing the torsion combination that led to the collision. These rules are considered when constructing new conformers.
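
The assembly loop of the third step can be pictured roughly as follows. This is a simplified Python sketch under stated assumptions, not DataWarrior's actual implementation: the collides callback, the representation of a rule as a mapping from bond index to torsion angle, and all function names are hypothetical and chosen only for illustration.

```python
import random

def assemble_conformers(torsion_options, collides, max_conformers,
                        max_tries=1000, seed=0):
    """Pick one torsion per rotatable bond and keep collision-free combinations.

    torsion_options: one list of candidate angles per rotatable bond.
    collides: hypothetical callback returning the set of bond indices whose
              torsions cause a clash (empty set if there is no clash).
    """
    rng = random.Random(seed)
    accepted = []   # accepted torsion combinations
    rules = []      # each rule: {bond_index: angle} known to cause a collision
    seen = set()
    for _ in range(max_tries):
        if len(accepted) >= max_conformers:
            break
        combo = tuple(rng.choice(options) for options in torsion_options)
        if combo in seen:
            continue
        seen.add(combo)
        # Skip combinations that violate a previously learned rule.
        if any(all(combo[b] == a for b, a in rule.items()) for rule in rules):
            continue
        clashing = collides(combo)
        if clashing:
            # Remember the partial torsion combination that led to the clash.
            rules.append({b: combo[b] for b in clashing})
        else:
            accepted.append(combo)
    return accepted, rules

def toy_collides(combo):
    """Toy collision check (assumption): bonds 0 and 1 clash if both are at 0 degrees."""
    return {0, 1} if combo[0] == 0 and combo[1] == 0 else set()

accepted, rules = assemble_conformers([[0, 120, 240]] * 3, toy_collides, 50)
```

In this toy setup the first clash produces one rule, after which every combination with both bonds at 0 degrees is rejected without running the collision check again.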
Potentially, the number of constructible conformers may be very high, depending on the number of rotatable bonds, the number of torsions per bond and the number of self-organized fragment conformers. Therefore, one of multiple strategies must be chosen, which determines how torsion angles are permuted, how atom collisions are handled and to what extent likely torsions are preferred:

  • Random, low energy bias: This strategy randomly selects for every new conformer a new set of torsions and fragments. However, a weighted random method is used giving more likely torsion angles and better scoring fragments a higher chance of being selected than the less likely ones. This is a well balanced strategy leading to diverse low energy conformers.
  • Pure random: The degrees of freedom are selected randomly, neglecting any likelihoods. This produces the most diverse conformers, but not necessarily low-energy ones.
  • Adaptive collision avoidance, low energy bias: This strategy starts and works like the low energy biased random strategy until a set of torsion angles causes atom collisions. Then, for every rotatable bond it is determined to what extent its current rotation state contributes to the atom collisions. With a weighted random approach one of the rotatable bonds is chosen to be modified next, such that the next conformer has a high likelihood of escaping the collision.
  • Systematic, low energy bias: The starting point for this algorithm is the conformer that uses the lowest energy option for every degree of freedom, i.e. the lowest energy torsion angle for every rotatable bond and the best scoring choice for every fragment conformation. For the next conformer only that degree of freedom is changed which causes the smallest overall increase of energy. This way, the most likely conformers are produced first, but the initial diversity may be low if only a few conformers are generated.
  • Self organized: This algorithm does not use the general procedure described above. It applies a self-organization approach to the entire molecule. For that, all atoms are initialized with random coordinates. Then a list of constraints is determined as follows: Distance constraints define preferred distances between any two atoms. Plane constraints group atoms that should share the same plane. Other constraints handle preferred torsions, stereochemistry and atoms on a straight line. In a kind of minimization procedure, constraints are randomly picked and their atoms relocated in space to better meet the constraint. This algorithm works best with highly constrained, i.e. rigid, structures like bridged ring systems.
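
The difference between the low energy biased and the pure random strategy can be illustrated with a short Python sketch. It assumes that a bond class provides observation counts per preferred torsion angle; the counts, the function names and the use of the counts directly as sampling weights are illustrative assumptions and do not reproduce DataWarrior's internal scoring.

```python
import random

def torsion_likelihoods(frequencies):
    """Turn observation counts per torsion angle into likelihoods that sum to 1."""
    total = sum(frequencies.values())
    return {angle: count / total for angle, count in frequencies.items()}

def pick_torsion(frequencies, rng, low_energy_bias=True):
    """Select one torsion angle for a rotatable bond.

    With low_energy_bias the choice is weighted by how often an angle was
    observed; without it every angle is equally likely (pure random).
    """
    angles = list(frequencies)
    if low_energy_bias:
        weights = [frequencies[a] for a in angles]
        return rng.choices(angles, weights=weights, k=1)[0]
    return rng.choice(angles)

# Hypothetical frequency data for one bond class (angle in degrees -> count).
example = {60: 700, 180: 250, 300: 50}
likelihoods = torsion_likelihoods(example)
```

With the biased variant, the 60 degree torsion of this made-up bond class is drawn far more often than the 300 degree one, whereas the pure random variant treats all three alike.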

Initial torsion: Since using torsion angles derived from crystallographic data introduces a certain bias, and in order to mimic the construction principles of other conformer generators, DataWarrior's conformer generator may also forgo using experimental torsion data. Instead it may use six torsion angles per rotatable bond in 60 degree steps. In this case all six torsion angles are considered equally likely. Of course, conformers constructed this way are often far off the local energy minimum and probably should be energy minimized afterwards.
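
To get a feeling for how quickly the number of torsion combinations grows with this option, consider the following small Python sketch. The grid start at 0 degrees and the function names are assumptions; the manual only states that six angles in 60 degree steps are used.

```python
from itertools import product

def uniform_torsions(step_degrees=60):
    """Equally likely torsion angles on a regular grid (start at 0 is an assumption)."""
    return list(range(0, 360, step_degrees))

def count_combinations(n_rotatable_bonds, step_degrees=60):
    """Number of torsion combinations if every bond can take every grid angle."""
    return len(uniform_torsions(step_degrees)) ** n_rotatable_bonds

# All combinations for a molecule with two rotatable bonds.
two_bond_combinations = list(product(uniform_torsions(), repeat=2))
```

A molecule with five rotatable bonds already has 6^5 = 7776 torsion combinations, before different fragment conformers are even considered.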

Minimize energy: Rule based assembled or self-organized conformers like those created by the above algorithms may still suffer from angle strains, slight atom collisions or suboptimal torsions, because the local environment of a particular molecule may not be well represented by more general rules that were used for the construction. To reduce strain and minimize energies these conformers can be further optimized by applying one of these forcefields:

  • MMFF94s+ forcefield: This is an optimized version of the MMFF94s force field, which addresses known, unrealistic torsion parameterizations of the original MMFF94s implementation. The torsion angle analysis and corrections were done by Joel Wahl. A peer-reviewed publication is in progress.
  • MMFF94s forcefield: The Merck Molecular Force Field 94 is a widely used and well-known forcefield based on the MM3 forcefield. It is parameterized to be applicable to a wide range of organic compounds. The implementation that DataWarrior uses was ported from the RDKit to Java and validated by Daniel Bergmann and Paolo Tosco, who earlier had developed the MMFF94 implementation in C++ for the RDKit as well.
  • Idorsia forcefield: This forcefield is based on the MM2 forcefield. Its implementation in Java was developed by Joel Freyss. It is also universally applicable and mainly used for in-house purposes at Idorsia.
  • Don't minimize: This option passes through all conformers as they are generated by the construction algorithm.

Max. conformer count: The number of generated conformers per compound will be limited to this number. If more than one conformer is generated and if these are not written into an external file, then these are automatically pooled into one row and aligned on the most central rigid fragment of the molecule.

    Write into file: When this option is selected, generated conformers are exported into a compound file rather than added to the current dataset.

    File type: The most widely supported format is probably the SD-file version 2, while the most compact file format is certainly a native DataWarrior file. Note that in addition to DataWarrior itself also LigandScout from Inte:ligand is able to read single and multiple conformers from native DataWarrior files.

    Pool conformers of same compound: If conformers are saved as a native DataWarrior file, then this option allows storing all conformers of one compound into the same target row. Within DataWarrior such a conformer set will then be displayed in the detail view as a conformer ensemble that is automatically superposed using the most central rigid fragment.

    Conformer ensemble shown in DataWarrior's detail area

    When conformer ensembles are exported from DataWarrior into SD-files, then every conformer is saved as an individual molecule record. However, within a native DataWarrior file the connection table is only stored once, while atom coordinates of every conformer are included as a compact text string. Therefore, storing conformer ensembles as native DataWarrior files is very space-efficient and flexible at the same time.
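
The difference between the two storage schemes can be pictured with the following Python sketch. It is only a conceptual model: the class, its field names and the record layout are hypothetical, and the actual encoding used in native DataWarrior files is not reproduced here.

```python
from dataclasses import dataclass, field

@dataclass
class ConformerEnsemble:
    """Pooled storage: connection table once, one coordinate set per conformer."""
    atoms: list                                  # element symbols, stored once
    bonds: list                                  # (atom1, atom2, order), stored once
    coords: list = field(default_factory=list)   # one list of (x, y, z) per conformer

    def add_conformer(self, xyz):
        if len(xyz) != len(self.atoms):
            raise ValueError("coordinate count must match atom count")
        self.coords.append(list(xyz))

    def to_records(self):
        """Expand into one self-contained record per conformer, SD-file style."""
        return [{"atoms": list(self.atoms), "bonds": list(self.bonds), "coords": c}
                for c in self.coords]

# Toy example: a two-atom molecule with three conformers.
ensemble = ConformerEnsemble(atoms=["C", "O"], bonds=[(0, 1, 1)])
for shift in (0.0, 0.1, 0.2):
    ensemble.add_conformer([(0.0, 0.0, shift), (1.4, 0.0, shift)])
records = ensemble.to_records()
```

The pooled form holds the atom and bond lists once, while the expanded form repeats them in each of the three records.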

    Remove small fragments: If this option is selected, then all unconnected fragments except for the largest one are removed from the molecule before conformers are generated. This is particularly advisable, if a forcefield minimization is used, which may potentially take a very long time to optimize relative positions of non connected fragments.
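
Conceptually, keeping only the largest fragment amounts to finding the largest connected component of the molecular graph. The following Python sketch shows one way to do this; measuring fragment size by atom count and the function name are assumptions made for the example, not a description of DataWarrior's code.

```python
from collections import deque

def largest_fragment(n_atoms, bonds):
    """Return the sorted atom indices of the largest connected fragment.

    bonds is a list of (atom_index_1, atom_index_2) pairs.
    """
    neighbors = {i: [] for i in range(n_atoms)}
    for a, b in bonds:
        neighbors[a].append(b)
        neighbors[b].append(a)
    seen, best = set(), []
    for start in range(n_atoms):
        if start in seen:
            continue
        # Breadth-first search collects all atoms connected to 'start'.
        queue, fragment = deque([start]), [start]
        seen.add(start)
        while queue:
            atom = queue.popleft()
            for nb in neighbors[atom]:
                if nb not in seen:
                    seen.add(nb)
                    fragment.append(nb)
                    queue.append(nb)
        if len(fragment) > len(best):
            best = fragment
    return sorted(best)
```

For a salt such as a three-atom chain plus an isolated counter-ion atom, largest_fragment(4, [(0, 1), (1, 2)]) returns [0, 1, 2].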

    Neutralize remaining fragment: If this option and the Remove small fragments option are both selected, then DataWarrior tries, after the removal of all small fragments, to neutralize all charged atoms of the remaining fragment through protonation or deprotonation. If quaternary nitrogens cannot be deprotonated, then DataWarrior tries to deprotonate acidic atoms to achieve a neutral overall charge.

    Skip compounds with more than NN stereo isomers: If a molecule contains undefined stereo centers, then the conformer generator randomly constructs stereo isomers. Depending on the purpose, molecules with a high potential number of isomers, and therefore an even higher number of representing conformers, may pose a problem, e.g. for virtual screening, where they cause high computation time and a low probability that found hits really represent the correct stereo isomer. This option allows simply skipping such molecules.
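
As a rough guide for choosing the limit, a molecule with n undefined stereo centers has at most 2^n stereo isomers. The Python sketch below uses this upper bound; it ignores symmetry (e.g. meso forms), which can reduce the real number, and the function names are hypothetical.

```python
def max_stereo_isomers(undefined_centers):
    """Upper bound for the number of stereo isomers (symmetry is ignored)."""
    return 2 ** undefined_centers

def should_skip(undefined_centers, max_isomers):
    """True if the potential isomer count exceeds the configured limit."""
    return max_stereo_isomers(undefined_centers) > max_isomers
```

With a limit of 8, a molecule with three undefined stereo centers (at most 8 isomers) is kept, while one with four (at most 16 isomers) is skipped.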

    Create proper protonation state: This option is available for selection only if the ChemAxon pKa-Plugin is installed and DataWarrior was configured to find it. If this option is selected, then the pKa-values of basic and acidic atoms are determined using the ChemAxon method and these atoms are properly protonated or deprotonated to reflect their natural state at the given pH value. If the value in the +- text field is different from '0', then this defines a pH-range. In this case DataWarrior may produce more than one protonation state if the pKa of one or more of the basic or acidic atoms falls into the pH range.
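
The effect of the pH range can be illustrated with a deliberately simplified Python sketch. It assumes a single ionizable site whose pKa is already known (for a basic site, the pKa of its protonated form) and applies the rule of thumb that a site is protonated when the pH is below its pKa. It does not call the ChemAxon plugin, and the function name and return values are hypothetical.

```python
def protonation_states(pka, ph, ph_range=0.0):
    """Return the plausible states of one ionizable site within ph +- ph_range."""
    low, high = ph - ph_range, ph + ph_range
    if pka > high:
        return ["protonated"]      # pH stays below the pKa over the whole range
    if pka < low:
        return ["deprotonated"]    # pH stays above the pKa over the whole range
    return ["protonated", "deprotonated"]  # pKa lies inside the pH range
```

For example, a site with a pKa of 7.0 yields a single state at pH 7.4 with a range of 0, but both states at pH 7.4 with a range of 1.0.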

    3D-Structures in Detail Area

    When a DataWarrior file contains 3D-coordinates for chemical structures, then the Detail Area contains a dedicated 3D-structure viewer. Once the mouse moves over a row in any main view, this detail view is updated to display the row's 3D-structure. 3D-structures can be rotated with the right mouse button, moved in the x and y dimensions with the middle mouse button, and moved in the z dimension with the scroll wheel. A right mouse click opens a popup menu with options to show molecular surfaces, change display or color modes, or measure angles, torsions and distances among atoms and bonds.

    One useful application for the 3D-viewer would be the visual inspection of docking results, i.e. comparing the position of the natural ligand with the locations of docked small molecules. For this purpose the natural ligand needs to be superposed with any docked molecule in the same coordinate system. This can be achieved in DataWarrior if a file contains the natural ligand and the docked molecules, by selecting the menu item Superpose Reference Row. In addition one needs to make the natural ligand the reference row by clicking it in any view such that it gets a red frame. Now, when moving the mouse from row to row, the 3D-detail-view is updated to show the row's docked molecule structure together with the natural ligand.

    Detail area with aligned conformers and popup menu.

    Another application may be to compare conformers in regard to their volume overlap when aligned properly. For that purpose you need to select both options, Superpose Reference Row and Align Shapes. The latter option causes DataWarrior to optimally align both conformers using the PheSA algorithm, which is a rigid alignment method that optimizes shape overlap as well as pharmacophoric feature overlap.

    Exploring Conformers of a Molecule

    DataWarrior has a built-in conformer explorer that allows inspecting multiple conformers of the same molecule. To open the conformer explorer, select Explore conformers of 'Structure'... from the popup menu, which appears when clicking the right mouse button on top of any structure or marker within any main view. Conformers may be shown with a small delay, because one set of conformers is generated immediately. You may use the mouse wheel for zooming and moving conformers within the screen plane. Conformers may be rotated using the right mouse button. A right mouse click on a conformer opens a context-sensitive popup menu with options for rendering, molecular surface display and distance and angle measurement.

    Conformer Explorer with superposed conformers

    The bottom panel of the conformer explorer shows some controls to regenerate conformers with different settings, to separate individual conformers, to display molecular surfaces, to save conformers into a file and to define which atoms are superposed. To change these atoms, click the Superpose... button, select the atoms that shall be superposed, and close the dialog.

    Conformer Explorer with separated conformers and molecular surfaces

    To generate new conformers within the conformer explorer, you may select the maximum number of conformers, the algorithm to be used, and a forcefield for the energy minimization. Then press the Generate button. The available algorithms are explained in the section Conformer Generator Options above.

    Photorealistic Rendering

    If you select the Photo-Realistic Image... item from the popup menu of any of DataWarrior's 3D molecule viewers, then a dialog opens that lets you calculate a photo-realistic image using the professional quality ray-tracer Sunflow, which is part of the DataWarrior installation. The dialog lets you choose various options.

    • Image size: This is the size of the created image in pixels.
    • Environment: This option contains some predefined lighting, color and material conditions, such as bright sun and black background.
    • Move and zoom to fill image: If this option is selected, the molecule is rotated automatically to expose its largest possible silhouette to the camera. Furthermore, it is zoomed and moved to just about fill the image. If this option is not selected, DataWarrior tries to mimic the perspective and zoom state of the conformer panel. Since the rendering concepts of the ray-tracer and the conformer viewer are different, the original perspective will be similar, but not necessarily exactly reproduced.

    As soon as the render dialog is closed, a new window opens, in which all available processor cores are used to render the molecule. Once the image is completed, one may save it to a file or copy it to the clipboard by selecting the appropriate option from a popup menu. The following picture shows an example taken from the Crystallography Open Database.

    Photorealistic image of COD entry 2230709,

    Superposing Conformers

    If the binding conformer of a protein's ligand is known or at least suspected, then this is often used to search compound collections for similar conformers regarding shape and pharmacophore features. With DataWarrior you can perform such a virtual screening in two steps: First you generate conformers for each of the molecules to be screened. Then you try superposing these conformers onto the query conformer and assign a score to each molecule representing the best conformer's degree of shape and pharmacophore feature overlap. For this purpose DataWarrior uses an algorithm called PheSA (Pharmacophore enhanced shape alignment), which uses Gaussian functions to calculate and optimize the shared volume of two molecules' conformers while considering pharmacophore feature overlay at the same time.
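
The idea of measuring shared volume with Gaussian functions can be sketched in a few lines of Python. This is not the PheSA algorithm: it only evaluates the pairwise overlap of spherical Gaussians centered on the atoms for two already positioned conformers, without optimizing the alignment and without any pharmacophore term. The unit amplitudes, the width parameter alpha = 1.0 and all function names are assumptions chosen for illustration.

```python
import math

def gaussian_overlap(center_a, center_b, alpha_a=1.0, alpha_b=1.0):
    """Overlap integral of two spherical Gaussians exp(-alpha * |r - R|^2)."""
    d2 = sum((p - q) ** 2 for p, q in zip(center_a, center_b))
    s = alpha_a + alpha_b
    return math.exp(-alpha_a * alpha_b * d2 / s) * (math.pi / s) ** 1.5

def shape_overlap(mol_a, mol_b):
    """Sum of all pairwise atom overlaps (first-order approximation)."""
    return sum(gaussian_overlap(a, b) for a in mol_a for b in mol_b)

def shape_tanimoto(mol_a, mol_b):
    """Overlap score between 0 and 1; 1 means identical Gaussian shapes."""
    v_ab = shape_overlap(mol_a, mol_b)
    return v_ab / (shape_overlap(mol_a, mol_a) + shape_overlap(mol_b, mol_b) - v_ab)

# Toy conformers given as lists of atom coordinates.
query = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
shifted = [(x + 1.0, y, z) for x, y, z in query]
```

A conformer compared with itself scores 1, and the score drops as the second conformer is moved away from the query. A real alignment method would additionally search for the rotation and translation that maximize such a score.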

    If your dataset contains at least one column with one or multiple conformers, then you may configure the superposition task with Superpose Conformers... from the Chemistry menu. A simple dialog opens, where you select the column containing the conformers. You also need to specify the query conformer, which shall be PheSA-matched against all conformers. This is done with a right mouse click in the blue, initially empty area. A popup menu lets you import a molecule from a molfile, a mol2-file, the ligand of a PDB-file, or the PDB-database. Alternatively, you can also paste in a structure. If the clipboard contains a molecule without 3D-atom coordinates, then a low-energy conformer is created on the fly. Of course, in this case the conformer is not a perfect query, because it is not necessarily close to any binding conformation.

    Dialog configured to superpose all energy-minimized 3D-Structures of a data set with a given conformer of the CNS-drug Alpidem.

    After clicking OK, DataWarrior tries matching the shape of all conformers of every row with the given query molecule(s). As a result you will receive a new column named 'PheSA Score'. Scores close to 1.0 indicate a very high shape and pharmacophore feature match. Lower values represent poorer matches. Another new column contains the best matching conformer superposed to the query structure, which can only be seen in the Detail View, because the Table View doesn't show conformer columns.

    In fact, there is a second hidden column named 'Best Match', which contains the best matching stereo isomer of the input structure. When the input structure doesn't contain multiple stereo isomers, then the best matching structure and input structure are always identical. Hence, this column may be of limited use and is therefore hidden by default. You may make it visible with a right mouse click on the table header and choosing Show 'Best Match'.

    New columns after superposing; detail area shows two optimally superposed structures.

    If you had defined multiple query structures in the dialog, then you would have received one new 'PheSA Score' column for each query structure. You would also have got distinct new columns for the best matching isomers and 3D-superpositions.
