DataWarrior User Manual

Chemistry in 3D


Biological properties of a chemical substance depend largely on its 3-dimensional structure, i.e. on the interaction potential of its atoms, their geometrical orientation and on the flexibility of the molecule. Typically, a molecule has not one but many low energy conformers, and to understand the biological potential of a compound one needs to investigate its conformer structures in detail. The Flexophore descriptor was designed to cover all representative conformers of a molecule and even to consider its flexibility. Calculating similarities between molecules using the Flexophore is easy and allows detecting molecules whose conformers have a high potential to interact with a target protein in a similar way. Nevertheless, it doesn't reveal any insights into the 3-dimensional nature of a compound.

DataWarrior has a conformer generator and forcefield based energy minimization built in, which together allow generating diverse and low energy conformers. These can be explored within DataWarrior, exported for use in other software packages, or even rendered to yield photo-realistic images. Within DataWarrior there are three views that may show conformers: First, the detail area automatically includes a 3D-molecule viewer if a structure column has associated conformer information. Second, the form view may contain a form item that shows conformers. Third, the conformer explorer shows multiple conformers of the same molecule.


Generating Conformers

This functionality creates one or multiple conformers for every structure within a DataWarrior document. Various algorithms for the conformer generation and subsequent energy minimization are available. To create conformers for the current data window's molecules, select Generate Conformers... from the Chemistry menu. A dialog lets you define options for the conformer generator.

If you work with LigandScout from Inte:Ligand, note that from version 4.4.2 on LigandScout can directly read conformers from native DataWarrior files, no matter whether a row contains just one or multiple conformers.

Conformer Generator Options

Structure column: A column containing chemical structures for which to generate the conformers.

Algorithm: Most of the algorithms, which can be selected here, share the same general procedure to generate conformers, a rule based assembly of self organized rigid fragments:
First DataWarrior locates all freely rotatable bonds of the molecule, which are not part of a ring. By cutting all of these bonds a set of more or less rigid fragments is obtained. For any of these fragments a self organization based algorithm creates one or multiple fragment conformers.
In a second step the local neighborhood of every rotatable bond is inspected and used to assign the bond to a specific bond class. A bond class is basically defined as a sub-structure with query features that describe all neighbor atoms and bonds of the rotatable bond. DataWarrior uses a dictionary of about 5000 distinct rotatable bond environments, which have been extracted from experimental, i.e. crystallographic data. Every one of these rotatable bond classes comes with a list of preferred torsion angles and frequency data about how often these torsion angles have been found in the crystallographic database. From the frequency data DataWarrior derives a likelihood for any of the torsion angles. Now using bond classes DataWarrior assigns a set of preferred torsion angles along with their likelihoods to any rotatable bond of the molecule.
In a third step, the fragments are then assembled after choosing one of the preferred torsion angles for every rotatable bond. A collision check determines, whether the combination of torsions causes any atom collisions. If no collision occurs, the conformer is accepted and a new combination is chosen. Otherwise, the algorithm creates a rule about a torsion combination, which leads to a collision. These rules are considered when constructing new conformers.
Potentially, the number of constructable conformers may be very high, depending on the number of rotatable bonds, the number of torsions per bond and the number of self-organized fragment conformers. Therefore, one of multiple strategies must be chosen, which prioritize how torsion angles are permuted, how atom collisions are handled and to which extent likely torsions are preferred:

  • Random, low energy bias: This strategy randomly selects for every new conformer a new set of torsions and fragments. However, a weighted random method is used giving more likely torsion angles and better scoring fragments a higher chance of being selected than the less likely ones. This is a well-balanced strategy leading to diverse low energy conformers.
  • Pure random: The degrees of freedom are selected randomly neglecting any likelihoods. This produces the most diverse conformers, but not necessarily low energy ones.
  • Adaptive collision avoidance, low energy bias: This strategy starts and works like the low energy biased random strategy until a set of torsion angles causes atom collisions. Then, for every rotatable bond, it is determined to what extent its current rotation state contributes to atom collisions. With a weighted random approach one of the rotatable bonds is chosen to be modified next, such that the next conformer has a high likelihood of escaping the collision.
  • Systematic, low energy bias: The starting point for this algorithm is the conformer that uses the lowest energy option for every degree of freedom, i.e. for every rotatable bond the lowest energy torsion angle and for every fragment conformation choice the best scoring one. For the next conformer only that degree of freedom is changed which causes the smallest overall increase of energy. This way, the most likely conformers are produced first, but the initial diversity may be low if only a few conformers are generated.
  • Self organized: This algorithm does not use the general procedure described above. It applies a self organization approach to the entire molecule. For that all atoms are initialized with random coordinates. Then a list of constraints is determined as follows: Distance constraints define preferred distances between any two atoms. Plane constraints group atoms that should share the same plane. Other constraints handle preferred torsions, stereochemistry and atoms on a straight line. In a kind of minimization procedure, constraints are randomly picked and their atoms relocated in space to better meet the constraint. This algorithm works best with highly constrained, i.e. rigid structures like bridged ring systems.
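The likelihood-weighted torsion choice behind the "Random, low energy bias" strategy, and the size of the combinatorial space it samples, can be illustrated with a minimal Python sketch. This is not DataWarrior's implementation: the function names and all frequency numbers are made up for illustration, and fragment scoring and collision handling are left out.

```python
import math
import random

def torsion_likelihoods(frequencies):
    """Convert observed torsion frequencies into likelihoods that sum to 1."""
    total = sum(frequencies)
    return [f / total for f in frequencies]

def conformer_space_size(torsion_counts, fragment_conformer_counts):
    """Upper bound on the number of torsion/fragment combinations."""
    return math.prod(torsion_counts) * math.prod(fragment_conformer_counts)

def pick_torsions(bonds, rng, biased=True):
    """Pick one torsion angle per rotatable bond.

    bonds: list of (angles, frequencies) pairs, one per rotatable bond.
    biased=True weights the choice by likelihood (low energy bias);
    biased=False picks uniformly (pure random).
    """
    chosen = []
    for angles, frequencies in bonds:
        weights = torsion_likelihoods(frequencies) if biased else None
        chosen.append(rng.choices(angles, weights=weights, k=1)[0])
    return chosen

# Hypothetical bond classes: preferred angles in degrees with made-up frequencies.
bonds = [([60, 180, 300], [10, 80, 10]), ([0, 90, 180, 270], [25, 25, 25, 25])]
print(pick_torsions(bonds, random.Random(42)))
print(conformer_space_size([3, 4], [1, 2, 1]))
```

With the biased variant, the 180 degree torsion of the first bond is chosen in about 80% of all cases, which is the intended preference for frequently observed angles.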

Initial torsion: Since using torsion angles derived from crystallographic data introduces a certain bias, and in order to mimic the construction principles of other conformer generators, DataWarrior's conformer generator may also forgo using experimental torsion data. Instead, it may use six torsion angles per rotatable bond with 60 degree steps. In this case all six torsion angles are considered equally likely. Of course, conformers constructed this way are often far off the local energy minimum and probably should be energy minimized afterwards.
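As a small illustration of the uniform alternative, the following Python sketch builds six equally likely torsion angles in 60 degree steps. The starting angle of 0 degrees and the function name are assumptions; the manual does not state at which angle DataWarrior starts.

```python
def uniform_torsion_set(step_degrees=60, start_degrees=0):
    """Return (angle, likelihood) pairs covering a full turn in equal steps."""
    count = 360 // step_degrees
    likelihood = 1.0 / count
    return [((start_degrees + i * step_degrees) % 360, likelihood)
            for i in range(count)]

torsions = uniform_torsion_set()
print(torsions)

# With n rotatable bonds this yields 6**n torsion combinations, e.g. for n = 5:
print(len(torsions) ** 5)
```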

Minimize energy: Rule based assembled or self-organized conformers like those created by the above algorithms may still suffer from angle strains, slight atom collisions or suboptimal torsions, because the local environment of a particular molecule may not be well represented by more general rules that were used for the construction. To reduce strain and minimize energies these conformers can be further optimized by applying one of these forcefields:

  • MMFF94s+ forcefield: This is an optimized version of the MMFF94s force field, which addresses known unrealistic torsion parameterizations of the original MMFF94s implementation. The torsion angle analysis and corrections were done by Joel Wahl. A peer-reviewed publication is in process.
  • MMFF94s forcefield: The Merck Molecular Force Field 94 is a widely used and well known forcefield based on the MM3 forcefield. It is parameterized to be applicable to a wide range of organic compounds. The implementation that DataWarrior uses was ported from RDKit to Java and validated by Daniel Bergmann and Paolo Tosco, who earlier had developed the MMFF94 implementation in C++ for RDKit as well.
  • Idorsia forcefield: This forcefield is based on the MM2 forcefield. Its implementation in Java was developed by Joel Freyss. It is also universally applicable and mainly used for in-house purposes at Idorsia.
  • Don't minimize: This option passes through all conformers as they are generated by the construction algorithm.

Max. conformer count: The number of generated conformers per compound will be limited to this number. If more than one conformer is generated and if these are not written into an external file, then these are automatically pooled into one row and aligned on the most central rigid fragment of the molecule.

    Write into file: When this option is selected, generated conformers are exported into a compound file rather than added to the current dataset.

    File type: The most widely supported format is probably the SD-file version 2, while the most compact file format is certainly a native DataWarrior file. Note that in addition to DataWarrior itself, LigandScout from Inte:Ligand is also able to read single and multiple conformers from native DataWarrior files.

    Pool conformers of same compound: If conformers are saved as a native DataWarrior file, then this option allows storing all conformers of one compound into the same target row. Within DataWarrior such a conformer set will then be displayed in the detail view as a conformer ensemble that is automatically superposed using the most central rigid fragment.

    Conformer ensemble shown in DataWarrior's detail area

    When conformer ensembles are exported from DataWarrior into SD-files, then every conformer is saved as an individual molecule record. However, within a native DataWarrior file the connection table is only stored once, while atom coordinates of every conformer are included as a compact text string. Therefore, storing conformer ensembles as native DataWarrior files is very space-efficient and flexible at the same time.

    Remove small fragments: If this option is selected, then all unconnected fragments except for the largest one are removed from the molecule before conformers are generated. This is particularly advisable, if a forcefield minimization is used, which may potentially take a very long time to optimize relative positions of non connected fragments.

    Neutralize remaining fragment: If this option and the Remove small fragments option are both selected, then DataWarrior tries, after the removal of all small fragments, to neutralize all charged atoms of the remaining fragment through protonation or deprotonation. If quaternary nitrogens cannot be deprotonated, then DataWarrior tries to deprotonate acidic atoms to achieve a neutral overall charge.

    Skip compounds with more than NN stereo isomers: If a molecule contains undefined stereo centers, then the conformer generator randomly constructs stereo isomers. Depending on the purpose, molecules with a high potential number of isomers, and therefore an even higher number of representing conformers, may pose a problem, e.g. for virtual screening, where they cause high computation time and a low probability that found hits really represent the correct stereo isomer. This option allows skipping such molecules.
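The reason why undefined stereo centers quickly become a problem is simple arithmetic: every undefined center can adopt two configurations. A minimal Python sketch of such a skip rule follows. The function names are hypothetical, and the count is only an upper bound, since molecular symmetry can reduce the real number of distinct isomers. How DataWarrior counts isomers internally is not described here.

```python
def max_stereo_isomer_count(undefined_stereo_centers):
    """Upper bound: every undefined center can adopt two configurations."""
    return 2 ** undefined_stereo_centers

def should_skip(undefined_stereo_centers, max_isomers):
    """True if the potential isomer count exceeds the user-defined limit."""
    return max_stereo_isomer_count(undefined_stereo_centers) > max_isomers

# A molecule with 5 undefined stereo centers and a limit of 16 isomers:
print(max_stereo_isomer_count(5), should_skip(5, 16))
```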

    Create proper protonation state: This option is available for selection only if the ChemAxon pKa-Plugin is installed and DataWarrior was configured to find it. If this option is selected, then the pKa-values of basic and acidic atoms are determined using the ChemAxon method and these atoms are properly protonated or deprotonated to reflect their natural state at the given pH value. If the value in the +- text field is different from '0', then this defines a pH-range. In this case DataWarrior may produce more than one protonation state if the pKa of one or more of the basic or acidic atoms falls into the pH range.
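How a pH range can lead to more than one protonation state may be easier to see in a small Python sketch. It is an illustration under stated assumptions, not the ChemAxon or DataWarrior logic: every site carries one pKa that refers to its protonated form, a site is protonated if its pKa is above the pH, and a site whose pKa lies within the range around the pH is enumerated in both states. All names and pKa values are hypothetical.

```python
from itertools import product

def protonation_states(sites, ph, ph_range=0.0):
    """Enumerate protonation states for ionizable sites.

    sites: list of (name, pKa) pairs; the pKa refers to the protonated form.
    Returns a list of dicts mapping site name -> True if protonated.
    """
    options = []
    for name, pka in sites:
        if ph_range > 0 and abs(pka - ph) <= ph_range:
            # pKa within the pH range: both states are considered
            options.append([(name, True), (name, False)])
        else:
            # clearly protonated (pKa above pH) or deprotonated (pKa below pH)
            options.append([(name, pka > ph)])
    return [dict(combo) for combo in product(*options)]

sites = [("imidazole N", 7.0), ("carboxylic acid", 4.2)]
print(protonation_states(sites, ph=7.4))                # one state
print(protonation_states(sites, ph=7.4, ph_range=1.0))  # two states
```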


    3D-Structures in Detail Area

    When a DataWarrior file contains 3D-coordinates for chemical structures, then the Detail Area contains a dedicated 3D-structure viewer. Once the mouse moves over a row in any main view, this detail view is updated to display the row's 3D-structure. 3D-structures can be rotated with the right mouse button and moved in the x and y dimensions with the middle mouse button and in the z dimension with the scroll wheel. A right mouse click opens a popup menu with options to show molecular surfaces, change display or color modes, or measure angles, torsions and distances among atoms and bonds.

    One useful application for the 3D-viewer would be the visual inspection of docking results, i.e. comparing the position of the natural ligand with the locations of docked small molecules. For this purpose the natural ligand needs to be superposed with any docked molecule in the same coordinate system. This can be achieved in DataWarrior if a file contains the natural ligand and the docked molecules, by selecting the menu item Superpose Reference Row. In addition, one needs to make the natural ligand the reference row by clicking it in any view such that it gets a red frame. Now, when moving the mouse from row to row, the 3D-detail-view is updated to show the row's docked molecule structure together with the natural ligand.

    Detail area with aligned conformers and popup menu.

    Another application may be to compare conformers in regard to their volume overlap when aligned properly. For that purpose you need to select both options, Superpose Reference Row and Align Shapes. The latter option causes DataWarrior to optimally align both conformers using the PheSA algorithm, which is a rigid alignment method that optimizes shape overlap as well as pharmacophoric feature overlap.


    Exploring Conformers of a Molecule

    DataWarrior has a built-in conformer explorer that allows inspecting multiple conformers of the same molecule. To open the conformer explorer select Explore conformers of 'Structure'... from the popup menu, which appears when clicking the right mouse button on top of any structure or marker within any main view. Conformers may be shown with a small delay, because one set of conformers is generated immediately. You may use the mouse wheel for zooming and moving conformers within the screen plane. Conformers may be rotated using the right mouse button. A right mouse click on a conformer opens a context-sensitive popup menu with options for rendering, molecular surface display and distance and angle measurement.

    Conformer Explorer with superposed conformers

    The bottom panel of the conformer explorer shows some controls to regenerate conformers with different settings, to separate individual conformers, to display molecular surfaces, to save conformers into a file and to define which atoms are superposed. To change these atoms, click the Superpose... button, select the atoms that shall be superposed, and close the dialog.

    Conformer Explorer with separated conformers and molecular surfaces

    To freshly generate new conformers within the conformer explorer you may select the maximum number of conformers, the algorithm to be used, and a forcefield for the energy minimization. Then press the Generate button. The available algorithms are explained in the section Conformer Generator Options.


    Photorealistic Rendering

    If you select the Photo-Realistic Image... item from the popup menu of any of DataWarrior's 3D molecule viewers, then a dialog opens that lets you calculate a photo-realistic image using the professional quality ray-tracer Sunflow, which is part of the DataWarrior installation. The dialog lets you choose various options.

    • Image size: This is the size of the created image in pixels.
    • Environment: This option contains some predefined lighting, color and material conditions such as bright sun and black background.
    • Move and zoom to fill image: If this option is selected, the molecule is rotated automatically to expose its largest possible silhouette to the camera. Furthermore, it is zoomed and moved to just about fill the image. If this option is not selected, DataWarrior tries to mimic the perspective and zoom state of the conformer panel. Since the rendering concepts of the ray-tracer and the conformer viewer are different, the original perspective will be similar, but not necessarily exactly reproduced.
    As soon as the render dialog is closed, a new window opens, in which all available processor cores are busy rendering the molecule. Once the image is completed, you may save it to a file or copy it to the clipboard by selecting the appropriate option from a popup menu. The following picture shows an example taken from the Crystallography Open Database.

    Photorealistic image of COD entry 2230709,
    catena-Poly[[(2,2'-dimethyl-4,4'-bi-1,3-thiazole-N,N')cadmium]-di-bromido]


    Superposing Conformers

    If the binding conformer of a protein's ligand is known or at least suspected, then its 3D-structure is often used to search compound collections for similar conformers regarding shape and pharmacophore features. With DataWarrior you can perform such a virtual screening in two different ways, either by aligning the query conformer to a dataset with rigid, pre-computed conformers, or by generating conformers on the fly and optimizing their torsion angles to maximize query overlap while retaining low energy torsion angles at the same time.

    With either of the two options you end up with conformers aligned as well as possible to your query conformer and with a score that describes how much your aligned conformers overlap and how well any pharmacophoric features match. In fact, you get three scores, one for the overlap, one for the pharmacophore match and one that summarizes both. For aligning and scoring DataWarrior uses an algorithm called PheSA (Pharmacophore enhanced shape alignment), which uses Gaussian functions to calculate and optimize the shared volume of two molecules' conformers while considering pharmacophore feature overlay at the same time. The PheSA algorithm was engineered by Joel Wahl and is described in https://pubs.acs.org/doi/10.1021/acs.jcim.4c00516
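To give an idea of how Gaussian functions turn shape overlap into a number, the following Python sketch computes a generic first-order Gaussian overlap and a Tanimoto-style similarity from it. It uses the well-known closed-form overlap integral of two spherical Gaussians. It is not the PheSA implementation: the pharmacophore term, the alignment optimization and PheSA's actual parameters are omitted, and the atom width value of 0.8 is a made-up placeholder.

```python
import math

def gaussian_overlap(center_a, center_b, alpha_a, alpha_b, p_a=1.0, p_b=1.0):
    """Closed-form overlap integral of two spherical Gaussians."""
    d2 = sum((a - b) ** 2 for a, b in zip(center_a, center_b))
    alpha_sum = alpha_a + alpha_b
    return (p_a * p_b * math.exp(-alpha_a * alpha_b * d2 / alpha_sum)
            * (math.pi / alpha_sum) ** 1.5)

def shape_overlap(mol_a, mol_b):
    """Sum of pairwise atom overlaps; a molecule is a list of (xyz, alpha)."""
    return sum(gaussian_overlap(ca, cb, aa, ab)
               for ca, aa in mol_a for cb, ab in mol_b)

def shape_tanimoto(mol_a, mol_b):
    """Tanimoto-style similarity from first-order overlaps (1.0 = identical)."""
    o_ab = shape_overlap(mol_a, mol_b)
    return o_ab / (shape_overlap(mol_a, mol_a) + shape_overlap(mol_b, mol_b) - o_ab)

# Two-atom toy "molecule" and a copy shifted by 3 Angstrom along x.
mol = [((0.0, 0.0, 0.0), 0.8), ((1.5, 0.0, 0.0), 0.8)]
shifted = [((x + 3.0, y, z), alpha) for (x, y, z), alpha in mol]
print(shape_tanimoto(mol, mol), shape_tanimoto(mol, shifted))
```

An alignment procedure would move one conformer relative to the other until such a score reaches its maximum.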

    Rigid Alignment: If your dataset doesn't contain at least one column with one or multiple conformers, then you need to generate conformers for all molecules to be screened as described in Generating Conformers. Otherwise, or afterwards, you configure the superpositioning task with Superpose Conformers->Rigid... from the Chemistry menu. A simple dialog opens, where you first select the column containing your conformers. You also need to specify the query conformer, which shall be PheSA-matched against all conformers. This is done with a right mouse click within the initially empty area. A popup menu lets you import a molecule from a molfile, a mol2-file, the ligand of a PDB-file, or the PDB-database. Alternatively, you can also paste in a structure. If the clipboard contains a molecule without 3D-atom coordinates, then a low-energy conformer is created on the fly. Of course, in this case, the conformer is not a perfect query, because it is not necessarily close to any binding conformation.

    Dialog configured to superpose all energy-minimized 3D-Structures of a data set with a given conformer of the CNS-drug Alpidem.

    After clicking OK, DataWarrior tries matching the shape of all conformers of any row with the given target molecule(s). As a result you will receive a new column named 'PheSA Score'. Scores close to 1.0 indicate a very high shape and pharmacophore feature match. Lower values represent poorer matches. Another new column contains the best matching conformer superposed to the query structure, which can only be seen in the Detail View, because the Table View doesn't show conformer columns.

    In fact, there is a second hidden column named 'Best Match', which contains the best matching stereo isomer of the input structure. When the input structure doesn't contain multiple stereo isomers, then the best matching structure and input structure are always identical. Hence, this column may be of limited use and, therefore, is hidden by default. You may make it visible with a right mouse click on the table header and choosing Show 'Best Match'.

    New columns after superposing; detail area shows two optimally superposed structures.

    If you had defined multiple query structures in the dialog, then you would have received one new 'PheSA Score' column for each query structure. You would also have got distinct new columns for the best matching isomers and 3D-superpositions.

    Flexible Alignment: To flexibly align all molecules of an open DataWarrior file with a given 3D-query molecule (conformer), your DataWarrior file doesn't need to contain molecule coordinates in 3D. Instead, an initial conformer is calculated on the fly from every molecule, which is then optimized in a step-wise fashion to optimally overlap with the query structure. Typically, flexible PheSA alignment produces better matches than the rigid variant of the algorithm, because in most cases even large conformer sets don't contain perfectly matching conformers, especially, if a molecule has a high flexibility due to many rotatable bonds. In order to flexibly align your molecules with a given conformer choose Superpose Conformers->Flexible... from the menu and continue as described above.


    Protein-Ligand Docking

    DataWarrior uses a state-of-the-art docking algorithm that is about to be published. More details will be described here later...

    In order to dock the molecules of an open DataWarrior window into a specific protein cavity choose Dock Structures Into Protein Cavity... from the Chemistry menu. A dialog opens. Now define a protein cavity from the popup menu that appears after a right mouse click into the protein area of the open dialog. Typically, you will open a local PDB file or specify a PDB-Code to retrieve the protein structure from the PDB-database. The PDB-entry must contain at least one ligand molecule. In case of multiple ligand molecules you need to select one by selecting its molecular formula. Then DataWarrior crops all parts of the protein that are more than 10 Angstrom away from the ligand and are not needed for the docking procedure. It also calculates the Connolly surface inside the cavity and starts docking your molecules as potential ligands into the cavity. This may take a few hours for thousands of molecules.
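The cropping step can be pictured as a simple distance filter. The Python sketch below keeps every protein atom that lies within 10 Angstrom of at least one ligand atom. It is only an illustration with made-up coordinates: whether DataWarrior crops by atom, by residue, or by some other unit is not stated in this manual, and the function name is hypothetical.

```python
import math

def crop_protein(protein_atoms, ligand_atoms, cutoff=10.0):
    """Keep protein atoms within `cutoff` Angstrom of any ligand atom."""
    return [p for p in protein_atoms
            if any(math.dist(p, l) <= cutoff for l in ligand_atoms)]

# Made-up coordinates in Angstrom.
protein = [(0.0, 0.0, 0.0), (9.0, 0.0, 0.0), (25.0, 0.0, 0.0)]
ligand = [(1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
print(crop_protein(protein, ligand))
```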

    Molecule docked into the complement factor B protease domain (PDB-ID: 6RAV).

    Once the docking calculation has finished, the DataWarrior table contains two new columns, a docking score and a docked structure. The latter column's structures may differ slightly from the original structure column, because basic or acidic atoms may be (de-)protonated and small unconnected fragments are removed. Another invisible column contains the 3D-atom coordinates of the docked pose. This column also knows about the natural ligand and the protein cavity. A detail view displays the cavity with the docked structure. Optionally, the cavity surface, major interactions, and the natural ligand can be shown. If you have a device that is capable of displaying stereoscopic content, you may open a daughter view that mirrors the detail view for real 3D-perception. Interactions are calculated and shown according to S. Salentin, S. Schreiber, V. J. Haupt, M. F. Adasme, M. Schroeder; PLIP: fully automated protein–ligand interaction profiler; Nucleic Acids Research, 2015; doi: 10.1093/nar/gkv315


    Fragment Libraries

    Sometimes in a drug discovery project the bio-active conformation of an active compound is known, typically from an x-ray structure after co-crystallizing the compound with the target protein. The compound may be a natural ligand, a competitor's drug, or an own lead compound. If the compound structure is patented, if its activity is not sufficient, or if it has other unwanted properties related to parts of its structure, then modification of the structure may solve the issue. Usually, tiny modifications followed by property measurements are done to systematically explore the property space of the chemical neighborhood of a lead compound. If an issue is related to an entire class of compounds, then a larger structural change may be necessary. However, in order to keep the activity on the target, the molecule's 3D-geometry and pharmacophoric point positions should essentially be kept unchanged. The hope is that this can be achieved by replacing a central part of the molecule by another fragment that resembles the replaced one in terms of size and exit vector angles. A potential pitfall may occur, if the newly constructed molecule is far from its energy minimum, e.g. because of substantially different torsion potentials of the constructed exit vector bonds. Luckily, these cases are easily detected when letting the built conformer equilibrate into the next energy minimum by applying a force field minimization. If a constructed and energy-minimized molecule still superposes well with the original molecule, then the scaffold replacement looks promising.

    Similar to replacing a scaffold or core structure is replacing two independent atoms or substituents of a molecule by one new fragment. This introduces a ring into the molecule and, thus, reduces the molecule's flexibility, i.e. degrees of freedom. Connecting two open ends and closing a ring often leads to improved target activity, because less flexible molecules lose fewer degrees of freedom when binding to a protein. This means a lower loss of entropy and, thus, a better binding constant.

    In DataWarrior seeking qualifying fragments or linkers requires a fragment library to be searched for candidates. Fragment libraries can be built easily from any collection of 3-dimensional structures. Typically, such 3D-molecules are taken from crystallographic databases. A good versatile set of molecules as starting point would be the organic part of the Crystallography Open Database, which you can download from here. Alternatively, you could use low energy conformers from any source, e.g. conformers generated by DataWarrior with MMFF94s+ energy minimization.

    When DataWarrior generates 3D-fragments from molecules with 3D-coordinates, then it first determines all rotatable bonds within the molecule. Then, it calculates bond flexibility or rotatability values for these bonds, which basically describe how many substantially diverse torsion angles are populated for similar bonds in a crystallographic database. After that it compiles all substructures of the original molecule that can be constructed by cutting any possible combination of rotatable bonds. For those substructures that meet certain conditions regarding size and overall flexibility, open valences are connected to a pseudo atom with the same atom type and coordinates as the original neighbor atom at that position. That pseudo atom is marked to be an exit vector. Duplicate fragments with the same structure and similar 3D-geometry are not collected again. Finally, generated 3D-fragments look like this:

    Visualization of generated 3D-fragments with exit vectors.
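The enumeration of substructures by cutting combinations of rotatable bonds, as described above, can be sketched on a plain molecular graph. The following Python example is a simplified illustration and not DataWarrior's code: atoms are plain integers, the function names are hypothetical, and exit vector pseudo atoms, the size and flexibility conditions, and the 3D-geometry based duplicate check are left out.

```python
from itertools import combinations

def connected_components(atoms, bonds):
    """Return the connected atom sets of a graph of atom ids and bond pairs."""
    neighbors = {a: set() for a in atoms}
    for a, b in bonds:
        neighbors[a].add(b)
        neighbors[b].add(a)
    seen, components = set(), []
    for start in atoms:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            atom = stack.pop()
            if atom not in component:
                component.add(atom)
                stack.extend(neighbors[atom] - component)
        seen |= component
        components.append(frozenset(component))
    return components

def enumerate_fragments(atoms, bonds, rotatable_bonds):
    """Collect all substructures obtained by cutting any non-empty
    combination of rotatable bonds; identical atom sets are kept once."""
    fragments = set()
    for k in range(1, len(rotatable_bonds) + 1):
        for cut in combinations(rotatable_bonds, k):
            remaining = [b for b in bonds if b not in cut]
            fragments.update(connected_components(atoms, remaining))
    return fragments

# Chain 0-1-2-3 in which the bonds 1-2 and 2-3 are rotatable.
atoms = [0, 1, 2, 3]
bonds = [(0, 1), (1, 2), (2, 3)]
for fragment in sorted(enumerate_fragments(atoms, bonds, [(1, 2), (2, 3)]), key=sorted):
    print(sorted(fragment))
```

Since the number of bond combinations grows exponentially with the number of rotatable bonds, size and flexibility limits like the ones in the dialog below keep the result manageable.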

    To generate a new fragment library from an open window that contains structures with 3D-coordinates choose 3D-Fragments -> Build Library... from the Chemistry menu.

    Dialog to generate a fragment library with default parameters.

    3D-structure column: Here you select the structure column with associated 3D-atom coordinates, i.e. conformer(s), that will be the molecule source for generating 3-dimensional fragments.

    Minimum / Maximum non-hydrogen atoms: Use these two fields to define the size of the generated fragments. Neither hydrogen atoms nor exit vectors are included in these atom counts.

    Maximum bond flexibility sum: Typically, 3D-fragments are used as replacements for existing scaffolds or as linkers, e.g. to form a macrocycle. In both cases highly flexible fragments should be avoided, because more rigid molecules tend to bind better to proteins. DataWarrior calculates for every source molecule the flexibility of every rotatable bond as a value from 0.0 to 1.0. The cut-off value refers to the flexibility sum of all rotatable bonds in a fragment.

    Minimum / Maximum exit vectors: This limits generated fragments to those that fit within a defined span of allowed exit vector counts.
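Taken together, the dialog options act as a filter on candidate fragments. A minimal Python sketch of such a filter is shown below. The class, the function and all limit values are hypothetical and chosen for illustration only; they are not DataWarrior's default parameters.

```python
from dataclasses import dataclass

@dataclass
class Fragment:
    heavy_atoms: int           # non-hydrogen atoms, exit vectors not counted
    bond_flexibilities: list   # one value from 0.0 to 1.0 per rotatable bond
    exit_vectors: int

def qualifies(frag, min_atoms, max_atoms, max_flex_sum, min_exits, max_exits):
    """Apply the size, flexibility sum and exit vector count limits."""
    return (min_atoms <= frag.heavy_atoms <= max_atoms
            and sum(frag.bond_flexibilities) <= max_flex_sum
            and min_exits <= frag.exit_vectors <= max_exits)

# Made-up limits: 5 to 12 heavy atoms, flexibility sum up to 1.0, 2 to 3 exit vectors.
limits = dict(min_atoms=5, max_atoms=12, max_flex_sum=1.0, min_exits=2, max_exits=3)
rigid = Fragment(heavy_atoms=8, bond_flexibilities=[0.2, 0.3], exit_vectors=2)
floppy = Fragment(heavy_atoms=8, bond_flexibilities=[0.7, 0.6], exit_vectors=2)
print(qualifies(rigid, **limits), qualifies(floppy, **limits))
```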


    Replacing Scaffolds and 3D-Linking

    As described in the previous chapter, a central activity in drug discovery is to optimize a compound structure towards improved biological activity, suitable pharmacokinetic properties, tolerable toxicological profile, and patentability. Such changes include replacing a molecule's core structure by a new one or replacing two (or even three) independent substituents by one new fragment, which in this case closes a ring. Usually, the starting compound possesses some target activity, which it would be unfortunate to lose. Therefore, one aims for replacements that keep the molecule's 3D-geometry largely unchanged.

    DataWarrior offers an easy way to search a pre-built library of fragments for those that satisfy customizable geometry criteria and therefore qualify as replacement for an unwanted part of a given 3-dimensional query molecule. First, you paste in the query molecule or load it from a file, then you select atoms to be replaced, select a fragment library to be searched, optionally change geometry constraints, and start the search. For every fragment that matches in terms of connecting atom types and geometry DataWarrior constructs a new molecule where the selected part is replaced by the fragment found. Then, the new molecule is energy-minimized, which basically equilibrates bond torsion angles into the next local energy minimum. This relaxed conformer is now aligned to the original query molecule in two ways, rigidly and flexibly. All constructed molecules are shown in a new window together with multiple alignment visualizations and multiple scores that allow solution ranking in various regards. To open the configuration dialog choose 3D-Fragments -> Replace Scaffolds... from the Chemistry menu.

    Dialog to replace the central ribose fragment of Molnupiravir.

    3D-structure: At the top of the dialog you define the query molecule as a 3D-structure. When pressing the right mouse button in the empty molecule area, a popup menu offers various options for populating the area. These include loading a molecule from a mol-, mol2-, SD-, or DataWarrior-file. Ideally, the file should just contain one molecule. Alternatively, you can load a ligand structure from a local pdb-file or from the remote PDB-database by specifying a PDB-code. Of course, you may also paste a structure into the area.
    Once a structure is loaded, you need to define the substructure that you intend to replace. To do so, simply select some of the structure's atoms by surrounding them with the mouse. If you press Shift when starting the selection, the surrounded atoms are added to previously selected atoms. Use Ctrl to unselect atoms. Typical selections would be a set of connected atoms, e.g. a ring system in the center of the molecule. But you could also select two or three small substituents or even just hydrogen atoms. In any case the new fragment comes in one piece and connects all open valences created by cutting away all selected atoms.

    3D-fragment file: Click the Choose... button to select a previously generated fragment-file, which must be a DataWarrior file created earlier by the method described in the previous chapter. In most cases a fragment file generated from diverse organic crystal structures is a good choice. However, if you want to close an oligo-peptide chain to create a macrocycle, you may prefer fragments generated from a peptide or protein structure source. Note: When the fragment file is searched for qualifying fragments, then only those fragments are considered where exit vector atom types (represented by cone colors) match the type of the new neighbor atoms to which they get connected.

    Allow m less and n more non-H atom: This defines the accepted size of introduced fragments relative to the replaced part of the query molecule. A fragment is allowed if its number of non-hydrogen atoms lies between x-m and x+n, where x is the number of replaced non-hydrogen atoms, i.e. those selected in the query molecule.
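
    The size window can be illustrated with a few lines of Python. This is only a sketch of the rule stated above, not DataWarrior's own code; the function and parameter names are hypothetical, and it assumes that both limits x-m and x+n are inclusive.

```python
def fragment_size_allowed(fragment_heavy_atoms, replaced_heavy_atoms,
                          max_less, max_more):
    """Return True if the fragment's non-H atom count lies in [x-m, x+n].

    Assumption: both bounds are inclusive.
    """
    lower = replaced_heavy_atoms - max_less   # x - m
    upper = replaced_heavy_atoms + max_more   # x + n
    return lower <= fragment_heavy_atoms <= upper


# Example: 5 non-H atoms are selected (x = 5), m = 1 and n = 2.
accepted_sizes = [size for size in range(1, 10)
                  if fragment_size_allowed(size, 5, 1, 2)]
print(accepted_sizes)
```

    With these example values, fragments with 4 to 7 non-hydrogen atoms would pass the size filter.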

    Maximum exit vector RMSD: After atom type filtering, all qualifying fragments are passed through a geometry match. For that, the exit vector bonds are aligned to the corresponding bonds in the query molecule. Then, the RMSD between the fragment's exit vector bonds (both atoms) and the respective query molecule's bonds is calculated. Only if the RMSD is below the value defined here is the fragment considered further. Of course, if a fragment has multiple exit vectors of the same atom type, then all permutations are checked.
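
    The following Python sketch shows how such an RMSD check with permutations could look. It is not DataWarrior's implementation: all names and the threshold value are hypothetical, and the alignment step is left out, i.e. the fragment coordinates are assumed to be already superposed on the query molecule.

```python
import itertools
import math

# Each exit vector is (atom_type, root_xyz, exit_xyz): the type of the atom to
# be connected plus the coordinates of both atoms of the exit vector bond.


def bond_rmsd(frag_vectors, query_vectors):
    """RMSD over both atoms of every paired exit vector bond."""
    squared_sum = 0.0
    for (_, f_root, f_exit), (_, q_root, q_exit) in zip(frag_vectors, query_vectors):
        squared_sum += math.dist(f_root, q_root) ** 2
        squared_sum += math.dist(f_exit, q_exit) ** 2
    return math.sqrt(squared_sum / (2 * len(query_vectors)))


def best_exit_vector_rmsd(frag_vectors, query_vectors):
    """Lowest RMSD over all type-compatible assignments, or None if there is none."""
    if not query_vectors or len(frag_vectors) != len(query_vectors):
        return None
    best = None
    for candidate in itertools.permutations(frag_vectors):
        # Skip assignments where an exit vector would meet the wrong atom type.
        if any(f[0] != q[0] for f, q in zip(candidate, query_vectors)):
            continue
        rmsd = bond_rmsd(candidate, query_vectors)
        if best is None or rmsd < best:
            best = rmsd
    return best


# Invented example coordinates; both exit vectors have the same atom type.
query = [("C", (0, 0, 0), (1, 0, 0)), ("C", (3, 0, 0), (4, 0, 0))]
fragment = [("C", (3, 0, 0.2), (4, 0, 0.2)), ("C", (0, 0, 0), (1, 0, 0))]

MAX_RMSD = 0.5  # hypothetical threshold, not a DataWarrior default
best = best_exit_vector_rmsd(fragment, query)
print(best is not None and best < MAX_RMSD)
```

    In this example the fragment only matches after its two exit vectors are swapped, which is why every permutation needs to be tried.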

    Maximum exit vector angle divergence: Fragments that pass the RMSD check above are then, while still aligned, compared regarding their exit vector directions. If an exit vector diverges from the direction of the corresponding query bond by more than the allowed value, the fragment is discarded.
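
    As an illustration, the divergence of two bond directions can be computed from their dot product. The Python sketch below uses hypothetical function names and assumes that the angle is measured in degrees and that both bonds are given by the coordinates of their two atoms; it is not taken from DataWarrior.

```python
import math


def bond_direction(root_xyz, exit_xyz):
    """Vector pointing from the root atom to the exit atom of a bond."""
    return tuple(e - r for r, e in zip(root_xyz, exit_xyz))


def angle_between_deg(v1, v2):
    """Angle between two 3D vectors in degrees."""
    norm1 = math.sqrt(sum(a * a for a in v1))
    norm2 = math.sqrt(sum(b * b for b in v2))
    if norm1 == 0.0 or norm2 == 0.0:
        raise ValueError("zero-length bond vector")
    cosine = sum(a * b for a, b in zip(v1, v2)) / (norm1 * norm2)
    cosine = max(-1.0, min(1.0, cosine))  # guard against rounding errors
    return math.degrees(math.acos(cosine))


def passes_angle_check(frag_bond, query_bond, max_angle_deg):
    """True if the fragment bond diverges by no more than max_angle_deg."""
    frag_dir = bond_direction(*frag_bond)
    query_dir = bond_direction(*query_bond)
    return angle_between_deg(frag_dir, query_dir) <= max_angle_deg


# Invented example: the fragment bond is tilted against the query bond.
query_bond = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0))
frag_bond = ((0.0, 0.0, 0.0), (1.0, 1.0, 0.0))
print(passes_angle_check(frag_bond, query_bond, max_angle_deg=30.0))
```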

    Minimum PheSA-flex score: After replacing a scaffold by a new fragment, a flexible alignment of the constructed molecule to the query molecule is performed to check whether it can adopt the query molecule's shape in a low energy conformation. The quality of the alignment is represented by the PheSA score, which reflects both the shape overlap and the overlap of pharmacophore features. Use this setting to eliminate solutions that cannot assume the query molecule's shape. Note: This setting is not used if multiple disconnected parts of the query molecule are selected and, thus, a ring closure is intended.

    If your settings are valid, you can start the fragment replacement process by clicking OK. Depending on the size of your fragment file, you will get a result message like the following after some seconds or minutes.

    Message after running the task "Replace Scaffold And Link".

    After closing the message you can access all constructed molecules in a new DataWarrior window. Its table view contains a structure column with the used fragments themselves and another one with these fragments built into the query molecule. Associated to these constructed molecules are three sets of 3D-atom coordinates. Therefore, the detail area on the right contains three 3D-molecule views, which need some explanation:

    • MMFF94+ minimized & retained atoms aligned: This conformer shows the atom coordinates after replacing the fragment and after MMFF94+ energy minimization, which should relax the strain introduced by new bonds with torsion potentials different from the original ones. Thus, the conformer should more or less resemble the query molecule. In addition, this conformer was aligned to the query molecule using only those atoms that exist in both molecules. In all 3D-views the query molecule is shown in white. This view is particularly useful if you have introduced a larger fragment or a linker, because the alignment does not consider the new fragment and just shows the influence of the newly introduced fragment on the conformation of the retained part of the molecule. The RMSD of the retained atoms after alignment can be found in the table column 'Retained Atom RMSD'.
    • MMFF94+ minimized & PheSA aligned: This is the same conformer as above: the energy-relaxed molecule after construction. However, this time it is PheSA-aligned to the query molecule. This (rigid) PheSA alignment results in the best possible overlap of the two fixed conformers concerning their volume and pharmacophore features. Thus, the quality of this alignment is relevant when replacing a central scaffold in order to get new molecules with similar target binding characteristics. The numerical value for the alignment quality can be found in the table column 'MMFF94+ PheSA Rigid Score'.
    • PheSA-flex aligned: This view shows a conformer of the built molecule that was flexibly aligned to the query conformer. Again, PheSA was used to optimize volume and pharmacophore overlap, but the conformer was allowed to change bond torsion angles in favor of a better alignment. Again, a good overlap is a good indication of a potentially similar binding behaviour. However, this conformer is not necessarily a low energy conformer. The flexible alignment score can be found in the table column 'PheSA-flex aligned', while the deviation of this conformer from the energy minimum in kcal/mol is found in 'Energy Dif Flex Rigid'.

    Result window after task "Replace Scaffold And Link".

    Finally, you need to rank your results depending on what you intend to achieve and probably also by judging the synthetic accessibility, which is beyond the scope of this method. The two graphical views may guide you to select the most promising replacements based on alignment scores and possibly considering the dissimilarity to the replaced fragment.
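
    If you export the result table, a simple ranking can also be scripted outside DataWarrior. The Python sketch below is only one possible approach under stated assumptions: the rows are invented illustration data, the 'id' field and the energy cutoff are hypothetical, and a higher flexible alignment score is assumed to mean a better overlap. Only the two column names are taken from the description above.

```python
# Invented example rows; the values do not come from a real DataWarrior run.
rows = [
    {"id": "A", "PheSA-flex aligned": 0.82, "Energy Dif Flex Rigid": 1.5},
    {"id": "B", "PheSA-flex aligned": 0.91, "Energy Dif Flex Rigid": 7.0},
    {"id": "C", "PheSA-flex aligned": 0.77, "Energy Dif Flex Rigid": 0.4},
]


def rank_by_flex_score(rows, max_energy_penalty):
    """Drop strained solutions, then sort by flexible alignment score (best first)."""
    kept = [row for row in rows
            if row["Energy Dif Flex Rigid"] <= max_energy_penalty]
    return sorted(kept, key=lambda row: row["PheSA-flex aligned"], reverse=True)


# Hypothetical cutoff of 3.0 kcal/mol above the energy minimum.
ranked = rank_by_flex_score(rows, max_energy_penalty=3.0)
print([row["id"] for row in ranked])
```

    Here solution B is removed despite its high score, because its flexibly aligned conformer lies too far above the energy minimum.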


    Stereoscopic Screens and Glasses

    Any 3D-molecule view in DataWarrior, e.g. the ones showing conformers, superpositions, or docking poses, can open a second, stereoscopic full screen view that dynamically mirrors the content of the original view in stereoscopic mode. Both views are synchronized regarding zoom and rotation state. Typically, one uses the mouse to navigate in the flat 3D-view while perceiving the 3D-content in the stereoscopic view. Thus, if you have a screen or other device connected to your computer that supports stereoscopic output, you can view and analyze 3D-structures in real 3D. The only requirement is that the device supports one of these input modes: side-by-side (SBS), half-side-by-side (HSBS), over-under (OU), or half-over-under (HOU). To open a new stereoscopic full screen view from any flat 3D-molecule view, right-click and select View->Show Stereo View->, followed by the proper mode for the target device.

    3D-View with popup menu to open a new SBS full screen view.

    One category of devices for this purpose are 3D-TVs, which usually support at least one of these modes. Basically, there are two types of 3D-TVs: those with active, battery-powered shutter glasses (Samsung, etc.) and those with light, passive glasses based on polarisation filters (LG, etc.). The last generations of 4k-TVs by LG, especially the OLED ones, produce astoundingly crisp and perfect 3D-images (best in HOU mode). Unfortunately, 3D-TVs are not produced anymore, but used ones are still available at moderate prices.

    Illustration: Display of docking pose on a 3D-TV or 3D-Monitor.

    An alternative to large 3D-TVs or monitors are the new XR-glasses from Viture, Xreal, ..., which look like Ray-Ban sunglasses. In stereo mode they can be connected to a computer, where they are perceived as a second, double-wide (3840x1080) screen. Inside the glasses the computer output is split into left and right parts, which are then projected to your left and right eyes individually. To view 3D-molecules with XR-glasses, make sure the glasses are connected to your computer in stereo mode. Then, within a DataWarrior 3D-view, right-click and select View->Show Stereo View->SBS for the device representing your XR-glasses.

    Scientist watching docking pose with XR-glasses.
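
    To make the four input modes more concrete, the following Python sketch computes where the left-eye and right-eye images are placed within one output frame. It follows the common conventions for these formats and is not DataWarrior's rendering code. The function name is hypothetical, and it is an assumption that the left-eye image occupies the left half (SBS, HSBS) or the top half (OU, HOU) of the frame.

```python
def eye_viewports(mode, frame_width, frame_height):
    """Return per-eye rectangles (x, y, width, height) and the image squeeze.

    'squeeze' is the (horizontal, vertical) factor by which each eye's image
    is compressed: the half modes pack two images into a normal-size frame,
    the full modes use a frame of double width or double height.
    """
    mode = mode.upper()
    if mode in ("SBS", "HSBS"):
        half = frame_width // 2
        left = (0, 0, half, frame_height)
        right = (half, 0, half, frame_height)
        squeeze = (0.5, 1.0) if mode == "HSBS" else (1.0, 1.0)
    elif mode in ("OU", "HOU"):
        half = frame_height // 2
        left = (0, 0, frame_width, half)
        right = (0, half, frame_width, half)
        squeeze = (1.0, 0.5) if mode == "HOU" else (1.0, 1.0)
    else:
        raise ValueError("unknown stereo mode: " + mode)
    return {"left": left, "right": right, "squeeze": squeeze}


# Double-wide screen as presented by XR-glasses in stereo mode.
print(eye_viewports("SBS", 3840, 1080))
```

    For the double-wide 3840x1080 screen mentioned above, SBS mode yields two unsqueezed 1920x1080 images, one per eye.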


    Continue with Accessing Databases...