openmolecules.org

 
AllFragFp [message #2156] Fri, 12 April 2024 10:48
Christophe
Messages: 31
Registered: January 2022
Member
Hello everyone,

While updating my old version of DW, I just noticed that a new descriptor, AllFragFp, is available. Could anyone tell me how it differs from FragFp?
I can't find a description of it in the online user manual.
Thanks a lot
Christophe
Re: AllFragFp [message #2162 is a reply to message #2156] Tue, 16 April 2024 22:48
nbehrnd
Messages: 224
Registered: June 2019
Senior Member
Hello Christophe,

AllFragFp may still be at a very early stage of implementation in DW and therefore not yet documented. Searching the source code of DataWarrior [1] and of the underlying OpenChemLib [2] on GitHub, only the latter contains this string, and only once: in file DescriptorConstants.java, lines 88 to 96, as a `DESCRIPTOR_TYPE_MOLECULE`.

Norwid

[1] https://github.com/thsa/datawarrior
[2] https://github.com/Actelion/openchemlib
Re: AllFragFp [message #2165 is a reply to message #2162] Fri, 19 April 2024 16:53
Christophe
Messages: 31
Registered: January 2022
Member
Thank you, Norwid.
Re: AllFragFp [message #2182 is a reply to message #2165] Wed, 24 April 2024 14:45
thomas
Messages: 715
Registered: June 2014
Senior Member
AllFragFp is substantially different from FragFp: it is hashed and uses 2048 bits. Internally, AllFragFp generates all substructures of a given molecule with up to 6 connected bonds, including stereochemistry. Each substructure is converted into a canonical representation, from which a hash code between 0 and 2047 is generated, and the corresponding bit is set. The original idea was to accelerate substructure search with a more discriminating descriptor than FragFp. If the AllFragFp descriptor is available in a DataWarrior file, DataWarrior uses it for substructure pre-screening. Since substructure search is usually fast for up to a few hundred thousand molecules, there is no need to bother with AllFragFp at that scale. For many millions of molecules, however, it makes a significant difference.
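A minimal Python sketch of the scheme described above, under stated assumptions: a molecule is a hydrogen-suppressed graph given as a list of element symbols plus (atom, atom, bond order) tuples; stereochemistry is ignored; and instead of a true canonical representation, each fragment gets an isomorphism-invariant key from a few Weisfeiler-Lehman-style relabeling rounds (two different fragments may occasionally share a key, which only costs some discrimination). All function names are hypothetical and are not OpenChemLib's API. The last function shows why such a fingerprint suits pre-screening: every fragment of a query is also a fragment of any molecule that contains the query, so the query's bits must be a subset of the molecule's bits.

```python
import hashlib


def _digest(text):
    # deterministic hash; Python's built-in hash() of str is salted per process
    return hashlib.sha1(text.encode()).hexdigest()


def connected_bond_subsets(bonds, max_bonds=6):
    """All connected sets of bond indices with 1..max_bonds bonds."""
    found, layer = set(), {frozenset([i]) for i in range(len(bonds))}
    while layer:
        found |= layer
        grown = set()
        for subset in layer:
            if len(subset) == max_bonds:
                continue
            atoms = {x for i in subset for x in bonds[i][:2]}
            for j, (a, b, _) in enumerate(bonds):
                # grow only by bonds touching the fragment, so it stays connected
                if j not in subset and (a in atoms or b in atoms):
                    grown.add(subset | {j})
        layer = grown - found
    return found


def fragment_key(elements, bonds, subset, rounds=3):
    """Isomorphism-invariant fragment key (simplified stand-in for a canonical code)."""
    nbrs = {}
    for i in subset:
        a, b, order = bonds[i]
        nbrs.setdefault(a, []).append((b, order))
        nbrs.setdefault(b, []).append((a, order))
    # initial label: element plus degree counted inside the fragment only
    labels = {a: f"{elements[a]}{len(nb)}" for a, nb in nbrs.items()}
    for _ in range(rounds):  # Weisfeiler-Lehman-style refinement
        labels = {
            a: _digest(labels[a] + "(" + ",".join(sorted(f"{o}:{labels[n]}" for n, o in nb)) + ")")
            for a, nb in nbrs.items()
        }
    # the sorted multiset of labelled bonds describes the whole fragment
    bond_terms = ("|".join(sorted((labels[a], labels[b]))) + f"#{o}"
                  for a, b, o in (bonds[i] for i in subset))
    return ";".join(sorted(bond_terms))


def all_frag_fp(elements, bonds, n_bits=2048, max_bonds=6):
    """Hash each fragment key to a bit in 0..n_bits-1; a Python int serves as the bit set."""
    fp = 0
    for subset in connected_bond_subsets(bonds, max_bonds):
        bit = int(_digest(fragment_key(elements, bonds, subset)), 16) % n_bits
        fp |= 1 << bit
    return fp


def may_contain(mol_fp, query_fp):
    """Pre-screen: a molecule can only contain the query if it has every query bit."""
    return (query_fp & ~mol_fp) == 0


# ethanol C-C-O versus the query C-O
ethanol = all_frag_fp(["C", "C", "O"], [(0, 1, 1), (1, 2, 1)])
query = all_frag_fp(["C", "O"], [(0, 1, 1)])
print(may_contain(ethanol, query))  # True
```

Passing the pre-screen does not prove a match; only the molecules that pass would go on to the full atom-by-atom substructure search. Because the fragment key does not depend on atom numbering, renumbering ethanol as O-C-C yields the identical fingerprint.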

Regarding the similarity values calculated with this descriptor, I haven't really investigated its applicability domain. It will certainly produce very fine-grained similarity values, but the SkeletonSpheres descriptor will probably produce more intuitive ones, because by design single-atom replacements cause smaller losses of similarity than with other substructure-based descriptors.
Re: AllFragFp [message #2183 is a reply to message #2182] Fri, 26 April 2024 09:45
Christophe
Messages: 31
Registered: January 2022
Member
Hello Thomas,

Thank you for this more detailed reply.
If I understand correctly, AllFragFp is closer to what is commonly known in chemometrics as a path fingerprint, whereas the SkelSpheres descriptor is of the Extended Connectivity Fingerprint (ECFP) type. Is that correct?

Christophe
Re: AllFragFp [message #2211 is a reply to message #2183] Thu, 23 May 2024 22:48
thomas
Messages: 715
Registered: June 2014
Senior Member
Hi Christophe,

Somewhat. SkelSpheres is like an Extended Connectivity FP, but it includes full stereo features, and it also includes a set of fragments that contain only the skeleton (no atom types). This makes it more tolerant of single-atom replacements, which no longer cause a drastic drop in similarity.
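The effect of the skeleton-only fragments can be illustrated with a toy Python sketch. It is not the SkelSpheres algorithm: it assumes ECFP-like circular atom environments of radius 0 to 2, kept as unhashed string sets rather than folded into bits, ignores stereo features, and models the skeleton-only fragments by relabeling every atom with a generic `*`. All names are hypothetical.

```python
import hashlib


def _digest(text):
    return hashlib.sha1(text.encode()).hexdigest()[:16]


def circular_features(elements, bonds, radius=2, skeleton_only=False):
    """ECFP-like atom environments up to `radius` bonds, as a set of strings."""
    nbrs = {i: [] for i in range(len(elements))}
    for a, b, order in bonds:
        nbrs[a].append((b, order))
        nbrs[b].append((a, order))
    # skeleton-only variant: every atom gets the same generic label,
    # so only topology and bond orders remain
    labels = {i: ("*" if skeleton_only else elements[i]) for i in nbrs}
    tag = "skel" if skeleton_only else "atom"
    features = set()
    for r in range(radius + 1):
        features |= {f"{tag}:{r}:{lab}" for lab in labels.values()}
        # extend every environment by one bond
        labels = {
            i: _digest(labels[i] + "(" + ",".join(sorted(f"{o}{labels[n]}" for n, o in nb)) + ")")
            for i, nb in nbrs.items()
        }
    return features


def fingerprint(elements, bonds, with_skeleton=True):
    feats = circular_features(elements, bonds)
    if with_skeleton:
        feats |= circular_features(elements, bonds, skeleton_only=True)
    return feats


def tanimoto(a, b):
    return len(a & b) / len(a | b)


# propan-1-ol (C-C-C-O) vs propylamine (C-C-C-N): a single atom replacement
bonds = [(0, 1, 1), (1, 2, 1), (2, 3, 1)]
alcohol, amine = ["C", "C", "C", "O"], ["C", "C", "C", "N"]
plain = tanimoto(fingerprint(alcohol, bonds, False), fingerprint(amine, bonds, False))
skel = tanimoto(fingerprint(alcohol, bonds), fingerprint(amine, bonds))
print(f"{plain:.2f} -> {skel:.2f}")  # prints 0.25 -> 0.43
```

Since a single atom replacement leaves the skeleton-only features identical, they enter both the intersection and the union of the Tanimoto coefficient, which pulls the similarity up.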

AllFragFp is not limited to linear paths: it covers all fragments with up to 6 bonds (cyclic fragments, chains, and combinations of both) that are substructures of the given molecule.
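To make the difference from a path fingerprint concrete, here is a small Python sketch (hypothetical helper names; atom types and bond orders are ignored) that enumerates every connected bond subset with up to 6 bonds and sorts each one into linear chains, branched trees, or ring-containing fragments. Roughly speaking, a path-based fingerprint would only see the linear ones. For methylcyclopropane this gives 11 linear, 1 branched and 2 ring-containing fragments.

```python
from collections import Counter


def connected_bond_subsets(bonds, max_bonds=6):
    """All connected sets of bond indices with 1..max_bonds bonds."""
    found, layer = set(), {frozenset([i]) for i in range(len(bonds))}
    while layer:
        found |= layer
        grown = set()
        for subset in layer:
            if len(subset) == max_bonds:
                continue
            atoms = {x for i in subset for x in bonds[i]}
            for j, (a, b) in enumerate(bonds):
                if j not in subset and (a in atoms or b in atoms):
                    grown.add(subset | {j})
        layer = grown - found
    return found


def classify(bonds, subset):
    """Label a connected fragment as 'linear', 'branched' or 'ring'."""
    degree = Counter()
    for i in subset:
        a, b = bonds[i]
        degree[a] += 1
        degree[b] += 1
    if len(subset) >= len(degree):  # a connected graph with bonds >= atoms has a cycle
        return "ring"
    if max(degree.values()) > 2:    # a tree with an atom of degree 3 or more is branched
        return "branched"
    return "linear"                 # a tree with maximum degree 2 is a simple path


# methylcyclopropane: ring atoms 0-1-2, methyl carbon 3 attached to atom 0
bonds = [(0, 1), (1, 2), (2, 0), (0, 3)]
counts = Counter(classify(bonds, s) for s in connected_bond_subsets(bonds))
print(counts)  # Counter({'linear': 11, 'ring': 2, 'branched': 1})
```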

Thomas