openmolecules.org

 
Stereoisomers conformer generation and Inchikeys [message #1388] Tue, 31 August 2021 17:08
mattiafelice.palermo
Messages: 3
Registered: June 2021
Junior Member
Dear Thomas,

I'm not 100% sure this is a bug, as I am not particularly familiar with InChIKey generation and I am a bit rusty on stereochemistry.

I have calculated the lowest-energy conformers for a molecule with two cis/trans (E/Z) stereo elements, limiting the search to one conformer per stereoisomer:

[Attached figure: /forum/index.php?t=getfile&id=445&private=0]

DataWarrior automatically generates the 3D geometries for four stereoisomers. I then generated the InChIKeys for the four molecules, and DataWarrior returns the same key, RUSZGEIGALJIOH-UHFFFAOYSA-N, for every one of them (see figure).

I tried generating the InChIKeys with RDKit in Python instead and obtained the following:

ID | InChIKey                    | Energy
---|-----------------------------|-------
1  | RUSZGEIGALJIOH-MRSMTKAOSA-N | 156.35
2  | RUSZGEIGALJIOH-JYYJLMHASA-N | 155.66
3  | RUSZGEIGALJIOH-JYYJLMHASA-N | 155.64
4  | RUSZGEIGALJIOH-LMKHWIEJSA-N | 148.88
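The DataWarrior key and the RDKit keys share the first block and differ only in the middle one. A minimal, standard-library-only Python sketch for inspecting that: it splits a standard InChIKey into the 14-letter skeleton block, the 8-letter hash of the remaining layers followed by the standard flag and version letter, and the protonation letter. It assumes that the middle-block hash UHFFFAOY is what appears when the InChI carries no layers beyond the main one (in particular no stereo layer); `describe_inchikey` is a hypothetical helper, not a library API.

```python
import re

# Standard InChIKey layout (assumed here): 14-letter skeleton block, hyphen,
# 8-letter hash of the remaining layers + S/N (standard or not) + version letter,
# hyphen, one protonation letter.
KEY_PATTERN = re.compile(r"^([A-Z]{14})-([A-Z]{8})([SN])([A-Z])-([A-Z])$")
EMPTY_LAYER_HASH = "UHFFFAOY"  # middle-block hash when no further (e.g. stereo) layers exist


def describe_inchikey(key: str) -> dict:
    """Split an InChIKey into its blocks (hypothetical helper)."""
    match = KEY_PATTERN.match(key)
    if match is None:
        raise ValueError(f"not a well-formed InChIKey: {key!r}")
    skeleton, layer_hash, flag, version, protonation = match.groups()
    return {
        "skeleton": skeleton,
        "layer_hash": layer_hash,
        "standard": flag == "S",
        "version": version,
        "protonation": protonation,
        "no_extra_layers": layer_hash == EMPTY_LAYER_HASH,
    }


datawarrior_key = "RUSZGEIGALJIOH-UHFFFAOYSA-N"
rdkit_keys = [
    "RUSZGEIGALJIOH-MRSMTKAOSA-N",
    "RUSZGEIGALJIOH-JYYJLMHASA-N",
    "RUSZGEIGALJIOH-JYYJLMHASA-N",
    "RUSZGEIGALJIOH-LMKHWIEJSA-N",
]

print(describe_inchikey(datawarrior_key)["no_extra_layers"])  # True
print({describe_inchikey(k)["skeleton"] for k in rdkit_keys})  # {'RUSZGEIGALJIOH'}
print([describe_inchikey(k)["no_extra_layers"] for k in rdkit_keys])  # [False, False, False, False]
```

All five keys agree on the skeleton (connectivity) block; only the DataWarrior key has the "empty" middle block, which is consistent with its InChIs lacking a stereo layer.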

I have two questions:

  • Which InChIKeys are the correct ones, DataWarrior's or RDKit's?
  • Aren't molecules 2 and 3 (E,Z and Z,E) the same stereoisomer in this particular case? The RDKit InChIKeys seem to confirm that. If so, perhaps the DW conformer routine should keep only one of the two, or is this intended behaviour?
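If the keys are trusted, keeping one conformer per stereoisomer amounts to grouping by InChIKey. A hypothetical post-processing sketch in Python (not what DataWarrior does internally) that keeps the lowest-energy entry per key, using the RDKit rows from the table; the energies are taken as reported, without assuming units.

```python
from typing import Dict, List, Tuple

# (ID, InChIKey, energy) rows from the RDKit table; energy units as reported.
Row = Tuple[int, str, float]
rows: List[Row] = [
    (1, "RUSZGEIGALJIOH-MRSMTKAOSA-N", 156.35),
    (2, "RUSZGEIGALJIOH-JYYJLMHASA-N", 155.66),
    (3, "RUSZGEIGALJIOH-JYYJLMHASA-N", 155.64),
    (4, "RUSZGEIGALJIOH-LMKHWIEJSA-N", 148.88),
]


def lowest_energy_per_key(entries: List[Row]) -> List[Row]:
    """Keep one entry per InChIKey (the lowest-energy one), sorted by energy."""
    best: Dict[str, Row] = {}
    for entry in entries:
        _, key, energy = entry
        if key not in best or energy < best[key][2]:
            best[key] = entry
    return sorted(best.values(), key=lambda entry: entry[2])


for row_id, key, energy in lowest_energy_per_key(rows):
    print(row_id, key, energy)
# 4 RUSZGEIGALJIOH-LMKHWIEJSA-N 148.88
# 3 RUSZGEIGALJIOH-JYYJLMHASA-N 155.64
# 1 RUSZGEIGALJIOH-MRSMTKAOSA-N 156.35
```

Entries 2 and 3 collapse into one, and entry 3 survives because its energy is marginally lower.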
Thank you for your help!

Mattia
Re: Stereoisomers conformer generation and Inchikeys [message #1391 is a reply to message #1388] Thu, 02 September 2021 22:18
nbehrnd
Messages: 88
Registered: June 2019
Member
Dear Mattia,

Based on visual inspection of the structural formulae, entries #2 and #3 depict the same isomer. Thus the table contains three different compounds, for which one expects three different SMILES strings, InChIs, and ultimately InChIKeys.

Based on the visual representation, I redrew the structures on PerkinElmer's ChemDraw test page.[1] This page also permits export in SMILES, InChI, and InChIKey format. The InChIKeys obtained there do not match those you report from DW, but they do match the ones you report from RDKit. Converting ChemDraw's SMILES strings into InChIKeys with OpenBabel gives the same result: a match with RDKit/ChemDraw, no match with DW.

Importing the three different ChemDraw SMILES strings into DW yields three different structure representations. However, the InChIs DW generates for them already match each other. Since an InChIKey is only a hash of the InChI, it is not surprising that the derived InChIKeys match each other as well. If interested, compare with the attached .dwar and Emacs .org files.
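Because the InChIKey is derived from the InChI, the quickest check is whether the InChI strings carry a stereo layer at all. A simplified Python sketch: it splits an InChI at `/` and reports the layers whose prefix is one of the main stereo sublayers (`b` double bond, `t` tetrahedral, `m` inversion flag, `s` stereo type). It deliberately ignores the distinction between main, isotopic (`/i`) and fixed-H (`/f`) sublayers, and it uses 2-butene only as an illustration, not the molecule from this thread; the function names are hypothetical.

```python
# Main stereo sublayer prefixes of an InChI (simplified: isotopic and fixed-H
# sublayers that reuse the same letters are not told apart here).
STEREO_PREFIXES = {
    "b": "double bond (E/Z)",
    "t": "tetrahedral",
    "m": "tetrahedral inversion flag",
    "s": "stereo type",
}


def split_layers(inchi: str) -> list:
    """Return the layers after the formula, e.g. ['c1-3-4-2', 'h3-4H,1-2H3', 'b4-3+']."""
    if not inchi.startswith("InChI="):
        raise ValueError(f"not an InChI: {inchi!r}")
    parts = inchi[len("InChI="):].split("/")
    return parts[2:]  # parts[0] is the version (e.g. '1S'), parts[1] the formula


def stereo_layers(inchi: str) -> list:
    """Return only the layers whose one-letter prefix marks stereo information."""
    return [layer for layer in split_layers(inchi) if layer[:1] in STEREO_PREFIXES]


# 2-butene without stereo, trans and cis (illustration only)
flat = "InChI=1S/C4H8/c1-3-4-2/h3-4H,1-2H3"
trans = "InChI=1S/C4H8/c1-3-4-2/h3-4H,1-2H3/b4-3+"
cis = "InChI=1S/C4H8/c1-3-4-2/h3-4H,1-2H3/b4-3-"

print(stereo_layers(flat))   # []
print(stereo_layers(trans))  # ['b4-3+']
print(stereo_layers(cis))    # ['b4-3-']
```

If all DW-generated InChIs for the different isomers come back with an empty stereo list, identical InChIKeys follow automatically.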

Based on these observations, and pending a fix of this problem in DW, I would suggest using the InChIKeys from RDKit. Occasionally, the implementation of the underlying rules causes problems that remain unidentified for quite some time, until the results are compared with those of other programs.[2]

Norwid

[1] https://chemdrawdirect.perkinelmer.cloud/js/sample/index.html#
[2] https://mattermodeling.stackexchange.com/questions/6460/rdkit-and-pysmiles-results-differ-on-some-smiles-strings


Addition:

Since DW can generate a random library of molecules and assign SMILES and InChIKeys to them, I wrote a small Python script to compare the InChIKeys from DW with those generated by OpenBabel. (This is only a low-level proof of concept, a doodle.) For two runs (10 and 250 molecules), the fraction of SMILES for which the two programs assign different InChIKeys, relative to the total number of structures submitted, is somewhat higher than anticipated. Perhaps a complementary check with RDKit would be useful before Thomas steps in.
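The script itself is not reproduced here. A comparison of this kind could look like the following Python sketch, which assumes both programs' results were exported as `SMILES<TAB>InChIKey` lines (a hypothetical format; adapt `read_keys` to whatever the exports actually contain). Besides the mismatch ratio, it separates mismatches whose skeleton blocks (first 14 letters) agree, which points at stereo or other non-connectivity layers rather than at connectivity.

```python
import csv
from typing import Dict, Iterable, List, Tuple


def read_keys(lines: Iterable[str]) -> Dict[str, str]:
    """Map SMILES -> InChIKey from 'SMILES<TAB>InChIKey' lines (assumed export format)."""
    return {row[0]: row[1] for row in csv.reader(lines, delimiter="\t") if len(row) >= 2}


def compare_keys(first: Dict[str, str], second: Dict[str, str]) -> Tuple[float, List[str], List[str]]:
    """Return (mismatch ratio, mismatched SMILES, mismatches with identical skeleton block)."""
    shared = sorted(first.keys() & second.keys())
    mismatched = [s for s in shared if first[s] != second[s]]
    # Same first block means same connectivity; the difference lies in later layers.
    layer_only = [s for s in mismatched if first[s][:14] == second[s][:14]]
    ratio = len(mismatched) / len(shared) if shared else 0.0
    return ratio, mismatched, layer_only


# Usage, with hypothetical file names:
# with open("dw_keys.tsv") as a, open("obabel_keys.tsv") as b:
#     ratio, mismatched, layer_only = compare_keys(read_keys(a), read_keys(b))
#     print(f"{ratio:.1%} differ, {len(layer_only)} of them only after the skeleton block")
```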

[Updated on: Fri, 03 September 2021 12:27]


Re: Stereoisomers conformer generation and Inchikeys [message #1398 is a reply to message #1391] Tue, 14 September 2021 09:01
nbehrnd
Messages: 88
Registered: June 2019
Member
An improvement is on the way. In a separate exchange, Thomas mentioned that the InChI/InChIKey assignments provided by DataWarrior are being revised and, where necessary, will be corrected.

For the moment, alternatives are the online conversion of individual structures[1] or one of the reference implementations distributed by the InChI Trust,[2] which, with a command like

./inchi-1 test.sdf -AuxNone -Tabbed -Key

writes the InChIs and InChIKeys of the structures in test.sdf to a tab-separated .txt file.
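To compare that .txt file with other programs' output, a small Python reader may help. This is a sketch under an assumption about the file layout: each structure's line contains a tab-separated field starting with `InChI=` and, when `-Key` was given, one starting with `InChIKey=`; the column order is not relied on. The output file name in the usage comment is also an assumption.

```python
from typing import Iterable, List, Optional, Tuple


def parse_tabbed_output(lines: Iterable[str]) -> List[Tuple[str, Optional[str]]]:
    """Collect (InChI, InChIKey) pairs from tab-separated lines (assumed layout)."""
    records = []
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        # Pick fields by prefix instead of position; 'InChIKey=' does not match 'InChI='.
        inchi = next((f for f in fields if f.startswith("InChI=")), None)
        key = next((f[len("InChIKey="):] for f in fields if f.startswith("InChIKey=")), None)
        if inchi is not None:
            records.append((inchi, key))
    return records


# Usage, assuming the output file is named test.sdf.txt (check what your build writes):
# with open("test.sdf.txt") as handle:
#     for inchi, key in parse_tabbed_output(handle):
#         print(key, inchi)
```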

Norwid

[1] For example:
http://www.cheminfo.org/Chemistry/Cheminformatics/Generate_InChI/index.html
https://chemdrawdirect.perkinelmer.cloud/js/sample/index.html# (Structure -> Get InChI)

[2] https://www.inchi-trust.org/downloads/

[Updated on: Sun, 19 September 2021 11:32]


Re: Stereoisomers conformer generation and Inchikeys [message #1403 is a reply to message #1398] Tue, 21 September 2021 00:39
thomas
Messages: 463
Registered: June 2014
Senior Member
Many thanks to Norwid and Mattia for pointing out the discrepancies between the InChIs and InChIKeys created by DataWarrior and those produced by OpenBabel, and also for providing a test data set. The issues were mainly caused by a bug in the conversion of the tetrahedral stereo configuration for the jni-inchi (1.0.3) library from 2010, which DataWarrior used to create InChIs. In recent days, DataWarrior was updated to use a new open-source project, jna-inchi by Daniel Lowe, which is based on the InChI Trust code 1.0.6 from 2020. The stereo issues have also been fixed. The current development update contains all fixes and a new jniinchi.jar (which actually is the jnainchi.jar). The dev update link appears on the download page after checking the 'Read and Understood' checkbox.