molecular complexity descriptor (descriptor missing in the version 4.4.3)
molecular complexity descriptor [message #204] Wed, 02 November 2016 22:37
dataviz (Junior Member)
I just read a recent review in DDT (Drug Discovery Today, 2016; O. Mendez-Lucio et al.) about molecular complexity as computed by different tools, including DataWarrior. I asked the authors, and they used version 4.2.2 to compute this descriptor, but it seems to be missing from version 4.4.3. Since this parameter seems valuable, might it be worth adding it back?
Many thanks
Re: molecular complexity descriptor [message #206 is a reply to message #204] Sat, 05 November 2016 23:49
thomas (Senior Member)
This was due to an unfortunate bug in version 4.4.3: the checkbox disappeared, but it will be back in an update in a few days.

Re: molecular complexity descriptor [message #417 is a reply to message #206] Sat, 03 November 2018 11:03
timritchie (Junior Member, Ranco, Italy)
How is the complexity descriptor calculated? Which structural features are included?
Thanks and regards,
Tim Ritchie.
Re: molecular complexity descriptor [message #419 is a reply to message #417] Sun, 04 November 2018 01:13
thomas (Senior Member)
Hi Tim,
The complexity calculation is conceptually very simple but computationally demanding. The original version counts the number of distinct structural fragments that can be constructed from a molecule just by cutting parts off. In doing so, all delocalized bonds are retained, i.e. they stay marked as delocalized. Each fragment is then converted into a canonical code and added to the list if it is new. In principle, the fragment count grows exponentially with the size of the molecule. Therefore, we normalize the absolute fragment count by taking its logarithm and dividing it by the molecule size. The more distinct fragments, the more complex the molecule. By this logic, molecules with many symmetrically equivalent atoms or substituents, or with many recurring sub-structures, have low complexity.
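A minimal Python sketch of the exhaustive method described above may help. It rests on several assumptions that the post does not state: hydrogens are ignored, a fragment is any connected set of bonds (single atoms are not counted), "molecule size" is taken to be the bond count, the logarithm is natural, and the canonical code is a brute-force minimum over all atom renumberings rather than DataWarrior's own canonicalization. All function names are hypothetical.

```python
import itertools
import math

# Hypothetical minimal molecule model (hydrogen-suppressed graph):
#   atoms: atom labels, e.g. ["C", "C", "O"]
#   bonds: (atom_a, atom_b, bond_type); bond_type "ar" marks a delocalized
#          bond, so it stays marked as delocalized inside every fragment.


def connected_bond_subsets(n_atoms, bonds):
    """Return every connected set of bonds; each set is one fragment."""
    incident = [[] for _ in range(n_atoms)]  # bond indices touching each atom
    for idx, (a, b, _) in enumerate(bonds):
        incident[a].append(idx)
        incident[b].append(idx)

    seen = {frozenset([idx]) for idx in range(len(bonds))}
    frontier = list(seen)
    while frontier:  # grow each fragment by one adjacent bond per round
        next_frontier = []
        for frag in frontier:
            frag_atoms = {atom for idx in frag for atom in bonds[idx][:2]}
            for atom in frag_atoms:
                for idx in incident[atom]:
                    if idx in frag:
                        continue
                    grown = frag | {idx}
                    if grown not in seen:
                        seen.add(grown)
                        next_frontier.append(grown)
        frontier = next_frontier
    return seen


def canonical_code(atoms, bonds, frag):
    """Brute-force canonical code: the smallest (labels, edges) tuple over
    all renumberings of the fragment's atoms. Exponential in the number of
    atoms, so only usable for small fragments."""
    nodes = sorted({atom for idx in frag for atom in bonds[idx][:2]})
    best = None
    for perm in itertools.permutations(range(len(nodes))):
        rank = dict(zip(nodes, perm))
        labels = [None] * len(nodes)
        for node in nodes:
            labels[rank[node]] = atoms[node]
        edges = sorted(
            (min(rank[bonds[i][0]], rank[bonds[i][1]]),
             max(rank[bonds[i][0]], rank[bonds[i][1]]),
             bonds[i][2])
            for i in frag
        )
        code = (tuple(labels), tuple(edges))
        if best is None or code < best:
            best = code
    return best


def distinct_fragment_count(atoms, bonds):
    """Number of distinct canonical codes over all connected fragments."""
    fragments = connected_bond_subsets(len(atoms), bonds)
    return len({canonical_code(atoms, bonds, f) for f in fragments})


def exhaustive_complexity(atoms, bonds):
    """log(distinct fragment count) / molecule size (size = bond count)."""
    if not bonds:
        return 0.0
    return math.log(distinct_fragment_count(atoms, bonds)) / len(bonds)


if __name__ == "__main__":
    chain = [(0, 1, "1"), (1, 2, "1")]
    print(distinct_fragment_count(["C", "C", "O"], chain))  # ethanol → 3
    print(distinct_fragment_count(["C", "C", "C"], chain))  # propane → 2
    ring = [(i, (i + 1) % 6, "ar") for i in range(6)]
    print(distinct_fragment_count(["C"] * 6, ring))  # benzene → 6
```

Propane scores lower than ethanol because both of its C-C bonds yield the same fragment, which is the symmetry effect described above; benzene's 31 connected bond sets collapse to only 6 distinct codes for the same reason.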
For larger molecules, exhaustively generating all existing sub-structures is demanding in terms of memory and time, so DataWarrior uses a fast, simplified version. We have found that limiting the number of bonds a fragment may have still gives a good estimate of the brute-force result. DataWarrior limits fragment generation to a maximum of 7 bonds and calculates the complexity as log(fragmentCount) / bondLimit, where bondLimit = 7 unless the molecule has fewer than 14 bonds, in which case bondLimit = bondCount / 2.
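The bond-limited rule can be written down as a small sketch. The helper names are hypothetical, the logarithm is assumed to be natural, bondCount / 2 is assumed to be rounded down, and fragment_count is assumed to be the number of distinct fragments having at most bond_limit(bond_count) bonds.

```python
import math

# Cap on fragment size stated in the post (bondLimit = 7).
MAX_FRAGMENT_BONDS = 7


def bond_limit(bond_count: int) -> int:
    """bondLimit = 7, or bondCount / 2 for molecules with fewer than 14 bonds.
    Integer division here is an assumption."""
    if bond_count < 2 * MAX_FRAGMENT_BONDS:
        return bond_count // 2
    return MAX_FRAGMENT_BONDS


def fast_complexity(fragment_count: int, bond_count: int) -> float:
    """log(fragmentCount) / bondLimit, where fragment_count is assumed to be
    the number of distinct fragments with at most bond_limit(bond_count) bonds."""
    limit = bond_limit(bond_count)
    if limit == 0 or fragment_count < 1:
        return 0.0  # assumption: molecules with fewer than 2 bonds score 0
    return math.log(fragment_count) / limit


if __name__ == "__main__":
    for n in (6, 13, 14, 40):
        print(n, bond_limit(n))  # 6 → 3, 13 → 6, 14 → 7, 40 → 7
```

Because 14 / 2 = 7, both branches agree at exactly 14 bonds, so the rule has no jump at the threshold.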

The implementation is part of the DataWarrior source code.

More detailed information is here:

von Korff M., Sander T. (2013) About Complexity and Self-Similarity of Chemical Structures in Drug Discovery. In: Stavrinides S., Banerjee S., Caglar S., Ozer M. (eds) Chaos and Complex Systems. Springer, Berlin, Heidelberg