openmolecules.org

 
Exporting a descriptor as a Textfile [message #1605] Mon, 09 May 2022 17:45
Christophe
Hello everyone,

Is it possible to export descriptors such as SkelSpheres or others as a text file?

Thanks
Re: Exporting a descriptor as a Textfile [message #1610 is a reply to message #1605] Wed, 18 May 2022 14:22
Christophe
Just to mention that I'd like to use these text files as part of a matrix to run ordination techniques that are not available in DW, such as UMAP.
The descriptor data must be stored somewhere, since the "SkelSpheres" descriptor, for example, can be selected when a t-SNE ordination is set up.

Thanks
Re: Exporting a descriptor as a Textfile [message #1611 is a reply to message #1610] Thu, 19 May 2022 12:42
amorrison
Hi,
If you use 'add calculated values' with the expression str(Descriptor Variable), a new column should appear. I think this is what you are looking for.
Angus
Re: Exporting a descriptor as a Textfile [message #1613 is a reply to message #1611] Fri, 20 May 2022 16:23
Christophe
Hi Angus,

Thank you.
The columns do appear, but the data are unusable (cabalistic signs).
Some descriptors are encoded as a matrix (1024 bits, for example), so I suppose this is not the proper way to extract them.

Christophe
Re: Exporting a descriptor as a Textfile [message #1615 is a reply to message #1613] Sat, 21 May 2022 18:48
nbehrnd
Christophe,

for a small set of molecules, I think I'm able to replicate your findings, e.g., for the assignment and subsequent display of SkelSpheres as a string:

[Attached screenshot: SkelSpheres descriptor displayed as a string]

(observation with DW 5.5.0 for Linux, including the update by May 13th).

On the other hand (now specific to SkelSpheres): if these are like the fingerprints that e.g. openbabel offers,[1] which programs do you intend to use that accept them as input for further computation?

E.g., for DMF, openbabel reports:

$ obabel -:"CN(C)C=O" -ofpt -xf FP2
>   5 bits set 
00000000 00000000 00000000 00000000 00000000 00000000 
00000000 00000000 00000000 00000000 00200000 00000000 
00008000 00000000 00000000 00000000 00000000 00000000 
00000000 00000000 00000000 00000400 00000000 00000000 
00000000 00000000 00000000 00000000 00000000 00040001 
00000000 00000000 
1 molecule converted
or

$ obabel -:"CN(C)C=O" -ofpt -xs -xf FP2
>
0 6 1 7 1 6 <693>
0 7 1 6 <82>
0 8 2 6 <623>
0 8 2 6 1 7 <330>
0 8 2 6 1 7 1 6 <64>
1 molecule converted
Norwid

[1] https://open-babel.readthedocs.io/en/latest/FileFormats/Fingerprint_format.html
Re: Exporting a descriptor as a Textfile [message #1616 is a reply to message #1615] Tue, 24 May 2022 15:24
Christophe
Hi Norwid,

Thank you

It looks like you managed to convert the DW structure column into one of the fingerprints handled by Open Babel.

I am not sure this procedure captures the native information of the SkelSpheres descriptor. It generated an FP2 fingerprint (the default).

I know how to generate a lot of different FPs from structures. For example, PaDEL (free Java program, http://www.yapcwsoft.com/dd/padeldescriptor) does a very good job. It provides a matrix (.csv file) with structures in rows and bits (binary or count, depending on the FP you select) in columns.

If you know how the bits (or series of bits) are organized, you can try to trace what causes the differences in distribution found by multidimensional reduction methods, in R for example.

With DW, when I run a similarity (or activity cliff) analysis with SkelSpheres and/or OrgFunctions, I get data sets that cluster very well, but it is then quite challenging to relate the resulting clusters to the distributional differences in terms of structure (SkelSpheres) or functionalization (OrgFunctions). If these two descriptors differ from the ones I have used so far, such as ExtFP, MACCSFP, ..., maybe I could gain more information. But I need the "matrix formalism" in a text file.
Christophe
Re: Exporting a descriptor as a Textfile [message #1618 is a reply to message #1616] Thu, 02 June 2022 09:46
thomas
Hi Christophe,

if you use Java, you could use this line to decode the SkeletonSpheres descriptor into a byte array, which contains 1024 count values:

byte[] counts = new DescriptorHandlerSkeletonSpheres().decode(encodedSkeletonSpheres);

Then you could loop over the counts array and write the numbers wherever you want. The only dependency would be OpenChemLib, which you can find on GitHub.
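
A minimal sketch of such a loop, writing one CSV row per molecule. It deliberately runs without OpenChemLib: the byte arrays in main are stand-in data, and the class name, method names, and output file name are hypothetical. With OpenChemLib on the classpath, each array would instead come from the decode() call shown above.

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper; not part of OpenChemLib or DataWarrior.
class SkelSpheresCsvSketch {

    // Formats one molecule as "id,c0,c1,...". Java bytes are signed,
    // so this assumes the count values stay within 0..127.
    static String toCsvRow(String id, byte[] counts) {
        StringBuilder sb = new StringBuilder(id);
        for (byte c : counts) {
            sb.append(',').append(c); // byte widens to int, printed as a decimal number
        }
        return sb.toString();
    }

    // Writes one row per molecule; the map is assumed to hold already decoded count arrays.
    static void writeCsv(Path target, Map<String, byte[]> countsById) throws IOException {
        try (PrintWriter out = new PrintWriter(
                Files.newBufferedWriter(target, StandardCharsets.UTF_8))) {
            for (Map.Entry<String, byte[]> e : countsById.entrySet()) {
                out.println(toCsvRow(e.getKey(), e.getValue()));
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Map<String, byte[]> decoded = new LinkedHashMap<>();
        // Stand-in data (4 values instead of 1024) in place of real decoded descriptors.
        decoded.put("mol1", new byte[] {0, 2, 0, 1});
        decoded.put("mol2", new byte[] {1, 0, 0, 3});
        writeCsv(Paths.get("skelspheres.csv"), decoded);
    }
}
```

The resulting file has structures in rows and count values in columns, which is the layout R or a UMAP implementation would typically read.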

Likewise you can decode the OrgFunctions descriptor with

int[][] pairs = new DescriptorHandlerFunctionalGroups().decode(encodedOrgFunctions);

Here you get an array of arrays of length 2. Each of these small arrays contains a functional group ID and an associated count value. Thus, this is not a simple matrix, and making use of it will probably need some knowledge of the groups, i.e. the similarity tree. You may study the FunctionalGroupClassifier to understand which groups have which ID and how the tree is organized.
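
If a flat matrix is still wanted as a first approximation, the pairs can be spread over one column per group ID seen in the data set. The sketch below is only an illustration under two assumptions: that pair[0] is the group ID and pair[1] the count, and that ignoring the similarity tree is acceptable, which discards the relationship between related groups. The class and method names are hypothetical, and the input is stand-in data rather than real decoded descriptors.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical helper; not part of OpenChemLib or DataWarrior.
class OrgFunctionsMatrixSketch {

    // Collects every group ID that occurs in any molecule, sorted ascending.
    // These IDs become the matrix columns.
    static int[] collectGroupIds(List<int[][]> pairsPerMolecule) {
        SortedSet<Integer> ids = new TreeSet<>();
        for (int[][] pairs : pairsPerMolecule) {
            for (int[] pair : pairs) {
                ids.add(pair[0]); // assumed layout: {groupId, count}
            }
        }
        return ids.stream().mapToInt(Integer::intValue).toArray();
    }

    // Expands one molecule's sparse pairs into a dense row aligned with columnIds.
    static int[] toDenseRow(int[][] pairs, int[] columnIds) {
        int[] row = new int[columnIds.length];
        for (int[] pair : pairs) {
            int col = Arrays.binarySearch(columnIds, pair[0]);
            if (col >= 0) {
                row[col] += pair[1];
            }
        }
        return row;
    }

    public static void main(String[] args) {
        List<int[][]> molecules = new ArrayList<>();
        molecules.add(new int[][] {{7, 2}, {3, 1}}); // stand-in data
        molecules.add(new int[][] {{9, 4}});
        int[] columns = collectGroupIds(molecules);
        System.out.println("columns: " + Arrays.toString(columns));
        for (int[][] pairs : molecules) {
            System.out.println(Arrays.toString(toDenseRow(pairs, columns)));
        }
    }
}
```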

By the way, UMAP support in DataWarrior is planned.

Thomas


Re: Exporting a descriptor as a Textfile [message #1620 is a reply to message #1618] Thu, 02 June 2022 17:24
Christophe
Hello Thomas,

Thank you for this valuable information as usual.

UMAP support in future versions of DW is very good news. Great!!!

Thanks

Christophe