openmolecules.org Forum: Functionality » suggest: adjustment .sdf export

suggest: adjustment .sdf export [message #908]

Fri, 15 May 2020 17:59

nbehrnd
Messages: 240
Registered: June 2019

Senior Member

Prior to further analysis of a library,[1] its entries were deduplicated by Data ->
merge equivalent rows, using the content of the structure column as the sole criterion.
The .sdf subsequently generated by DataWarrior could be processed without problems if
the compound name column was set to the row number.

Yet, retaining the molecules' names -- here, PubChem identifiers -- may be useful,
because one structure may be attributed more than one name.[2] Setting the compound
name column to automatic may then yield a .sdf which is not understood, e.g. by
openbabel (version 3.0.0, April 2020).

The suggestion for this type of .sdf export by DW is to report the molecules' names
in the data's header / footer on one line, separated only by a blank space.

The archived .dwar also contains cells with more than one occurrence of the same
PubChem number (e.g. cell #46 about PBCHM2982, PBCHM47354, and PBCHM40585).
The desideratum for cases like this one is to retain only one occurrence of each
PubChem number per cell.
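
A minimal Python sketch of this per-cell deduplication, assuming the identifiers within one cell are separated by line breaks. The function name `unique_ids` is hypothetical and not part of DataWarrior; the example cell content is made up for illustration.

```python
def unique_ids(cell: str) -> str:
    """Keep only the first occurrence of each identifier in a multi-line cell.

    Assumption: one identifier per line; blank lines are dropped.
    """
    entries = (line.strip() for line in cell.splitlines())
    # dict.fromkeys preserves insertion order, so the original order survives.
    ordered_unique = dict.fromkeys(entry for entry in entries if entry)
    return "\n".join(ordered_unique)


if __name__ == "__main__":
    # Illustrative cell with one repeated identifier.
    cell = "PBCHM2982\nPBCHM47354\nPBCHM2982\nPBCHM40585"
    print(unique_ids(cell))
```

Order is preserved on purpose, so the first name listed for a structure stays first after the cleanup.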

[1] https://github.com/IanAWatson/Lilly-Medchem-Rules/blob/master/test/example_molecules.smi, revision Apr 26, 2020
[2] E.g., https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4702940/

Attachment: format_suggest.png
(Size: 150.00KB, Downloaded 1090 times)
Attachment: testinput.zip
(Size: 276.48KB, Downloaded 658 times)
Attachment: sorted_DW_deduplicate_structure.dwar.zip
(Size: 1.34MB, Downloaded 690 times)

Re: suggest: adjustment .sdf export [message #915 is a reply to message #908]

Sat, 23 May 2020 12:48

thomas
Messages: 742
Registered: June 2014

Senior Member

Thank you very much for the detailed description of the problem. I have fixed the issue; the fix will be included in the next update. When writing the content of an associated compound name column into the first line of the molfile, DataWarrior now replaces any NEWLINE characters with a '; ' string. When DataWarrior reads an SD-File with such entries again, it recognizes the names as separate ones, because for DataWarrior '; ' is a natural separator.
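
The round trip described here can be sketched in Python as follows. This is an illustration only, not DataWarrior's actual code; the names `SEPARATOR`, `to_molfile_name` and `from_molfile_name` are hypothetical. The point is that the name must fit on one line, because the first line of a molfile is its name line and a line break there would shift every following header line.

```python
SEPARATOR = "; "  # separator named in the post above


def to_molfile_name(cell: str) -> str:
    """Collapse a multi-line name cell into one line for the molfile name line."""
    return SEPARATOR.join(cell.splitlines())


def from_molfile_name(name_line: str) -> list:
    """Split a single-line name back into the individual names."""
    return name_line.split(SEPARATOR) if name_line else []


if __name__ == "__main__":
    cell = "PBCHM2982\nPBCHM47354\nPBCHM40585"
    line = to_molfile_name(cell)
    print(line)
    print(from_molfile_name(line))
```

A name that itself contains '; ' would be split on reading, so this sketch assumes identifiers without that character sequence.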
