openmolecules.org Forum: Cheminformatics » Metrics to use for UMAP analysis with SkelSpheres descriptor



Re: Metrics to use for UMAP analysis with SkelSpheres descriptor [message #1729 is a reply to message #1725]

Thu, 08 September 2022 22:53

nbehrnd

Dear Christophe,

Curious about the description «Technically, it [the SkelSpheres Descriptor] is a byte vector with a resolution of 1024 bins.» (help menu in DW, chapter Similarity & Descriptors, section Molecule or Reaction Similarity and Descriptors), I found an open-access publication [1] that goes into a bit more detail. In section 4.2, the authors describe the descriptor as follows:

«This descriptor was developed by Actelion. It is a vector of integers which represents the occurrence of different substructures in a molecule. Five circular layers with increasing bond distance are located for each atom in the molecule. Hydrogen atoms are not considered. This results in five fragments starting with the naked central atom, adding one layer at a time. Every fragment is encoded as a canonical string (id-code), similar to the generation of canonical SMILES. The canonical id-code includes the stereochemistry of the encoded fragment, which is a feature missing in other molecular descriptors. The string is then assigned to one of 1024 fields n in a vector. Therefore, the hash value of the id-code is calculated and the corresponding value in the vector is increased by one. The Hashlittle algorithm from Jenkins is used as a binning function which takes a text string as input and returns an integer value between 0 (inclusive) and 1024 (exclusive). [...] To consider the molecular scaffold without the influence of the hetero atoms, the whole calculation is repeated while replacing the hetero atoms with carbon. The resulting hash values are used to increment the corresponding fields in the vector. By adding this skeleton information to the descriptor vector the similarity calculation between two descriptor vectors becomes a bit insensitive to the exact position of the hetero atoms in two molecules. This directs the similarity value toward the perception of similarity by medicinal chemists. For them the exact position of a hetero atom is not as discriminating as it would be for the spheres descriptor without the skeleton coding part. The additional consideration of the scaffold information and the use of a histogram instead of a binary vector distinguishes the SkeletonSpheres descriptor from circular fingerprints.»
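
The binning step in the quoted description can be sketched in Python as follows. This is only an illustration under stated assumptions, not DataWarrior's implementation: `zlib.crc32` stands in for Jenkins' hashlittle (which is not part of the Python standard library, so the actual bin numbers will differ), the fragment strings are hypothetical placeholders rather than real id-codes, and the names `bin_index` and `histogram_descriptor` are made up for this example.

```python
import zlib

N_BINS = 1024  # number of fields in the vector, as stated in the quoted text


def bin_index(id_code: str) -> int:
    """Map a fragment string to a bin in the range [0, N_BINS).

    Assumption: zlib.crc32 is a stand-in for Jenkins' hashlittle,
    so the bin numbers will not match those of the real descriptor.
    """
    return zlib.crc32(id_code.encode("utf-8")) % N_BINS


def histogram_descriptor(fragments, skeleton_fragments):
    """Count fragment occurrences in one fixed-length vector.

    Both the regular fragments and the skeleton fragments (hetero atoms
    replaced by carbon) increment fields of the same vector.
    """
    vector = [0] * N_BINS
    for id_code in list(fragments) + list(skeleton_fragments):
        vector[bin_index(id_code)] += 1
    return vector


# Hypothetical fragment strings; real id-codes look different.
fragments = ["C", "CO", "CCO", "C", "O"]
skeleton_fragments = ["C", "CC", "CCC", "C", "C"]

descriptor = histogram_descriptor(fragments, skeleton_fragments)
print(len(descriptor), sum(descriptor), max(descriptor))
```

Because each hit increments a field instead of setting a bit, an entry can exceed 1 when a fragment occurs several times or when two different fragments hash to the same field. This is the histogram property the authors contrast with binary circular fingerprints.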

So far, however, I don't understand what "byte" means in "byte vector" either, given that the elements of the vector are integers; this could be crucial.

Norwid

[1] The Screening Compound Collection: A Key Asset for Drug Discovery. C. Boss, J. Hazemann, T. Kimmerlin, M. von Korff, U. Lüthi, O. Peter, T. Sander, R. Siegrist, Chimia 2017, 71, 667-677, DOI: 10.2533/chimia.2017.667 (https://chimia.ch/chimia/article/view/2017_667 open access).


