molecular complexity descriptor [message #204] |
Wed, 02 November 2016 22:37 |
dataviz
Messages: 5 Registered: November 2016
|
Junior Member |
Hello,
I just read a recent review in DDT 2016 (O. Mendez-Lucio et al.) about molecular complexity computed by different tools, including DataWarrior. I asked the authors, and they used version 4.2.2 to compute this descriptor, but it seems that it is no longer present in version 4.4.3. Since this parameter seems valuable, would it be of interest to add it back?
Many thanks,
Bruno
|
Re: molecular complexity descriptor [message #419 is a reply to message #417] |
Sun, 04 November 2018 01:13 |
thomas
Messages: 715 Registered: June 2014
|
Senior Member |
Hi Tim,
The complexity calculation is conceptually very easy but computationally demanding. Its original version counts the distinct structural fragments that one can construct from a molecule by just cutting parts off. When doing this, all delocalized bonds are retained, i.e. marked as such. Each fragment is then converted into a canonical code and added to the list if it is new. The fragment count grows, in principle, exponentially with the size of the molecule. Therefore we normalize the absolute fragment count by taking its logarithm and dividing it by the molecule size. The more distinct fragments, the more complex the molecule. Molecules with many symmetrical (i.e. equivalent) atoms or substituents, or with many recurring substructures, are by this logic of low complexity.
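To make the brute-force idea concrete, here is a minimal Python sketch. It is not the DataWarrior code and it rests on several assumptions of mine: a fragment is taken to be any connected subset of bonds, bond orders and delocalization flags are left out, the "molecule size" is taken to be the bond count, the logarithm is the natural one, and the canonical code is a toy stand-in that tries every atom ordering (only workable for very small molecules). All function names are hypothetical.

```python
import math
from itertools import combinations, permutations

def connected(bonds):
    """Return True if the given bonds form one connected fragment."""
    atoms = {a for bond in bonds for a in bond}
    seen, stack = set(), [next(iter(atoms))]
    while stack:
        atom = stack.pop()
        if atom in seen:
            continue
        seen.add(atom)
        # Push every neighbour reachable through one of the fragment's bonds.
        stack.extend(b if a == atom else a for a, b in bonds if atom in (a, b))
    return seen == atoms

def canonical_code(bonds, elements):
    """Toy canonical code: the smallest description over all atom orderings."""
    atoms = sorted({a for bond in bonds for a in bond})
    best = None
    for perm in permutations(range(len(atoms))):
        relabel = dict(zip(atoms, perm))
        labels = tuple(e for _, e in sorted((relabel[a], elements[a]) for a in atoms))
        edges = tuple(sorted(tuple(sorted((relabel[a], relabel[b]))) for a, b in bonds))
        code = (labels, edges)
        if best is None or code < best:
            best = code
    return best

def distinct_fragments(bonds, elements):
    """Collect the canonical codes of all connected bond subsets."""
    codes = set()
    for size in range(1, len(bonds) + 1):
        for subset in combinations(bonds, size):
            if connected(subset):
                codes.add(canonical_code(subset, elements))
    return codes

def complexity(bonds, elements):
    """log(distinct fragment count) divided by the bond count (assumed size measure)."""
    return math.log(len(distinct_fragments(bonds, elements))) / len(bonds)

# Heavy-atom skeletons only: propane C-C-C and ethanol C-C-O.
propane = ([(0, 1), (1, 2)], {0: "C", 1: "C", 2: "C"})
ethanol = ([(0, 1), (1, 2)], {0: "C", 1: "C", 2: "O"})
print(complexity(*propane), complexity(*ethanol))
```

In this toy setting propane yields two distinct fragments (C-C and C-C-C) because its two bonds are equivalent, while ethanol yields three (C-C, C-O and C-C-O), so the symmetrical molecule scores lower, in line with the reasoning above.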
For larger molecules, the complete enumeration of all substructures is rather demanding in terms of memory and time. Therefore DataWarrior uses a fast, simplified version. We have found that limiting the number of bonds a fragment may have still gives a good estimator of the brute-force method's result. DataWarrior limits fragment generation to a maximum of 7 bonds and calculates the complexity as log(fragmentCount)/bondLimit, with bondLimit=7 unless the molecule has fewer than 14 bonds, in which case bondLimit is bondCount/2.
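The normalization rule just described can be written down in a few lines. This is only a sketch of the formula as stated in this post, not a port of FastMolecularComplexityCalculator.java: the function name is hypothetical, the natural logarithm is an assumption (the base is not stated here), and bondCount/2 is computed as a plain floating-point division, which may differ from how the Java code rounds. The fragment count itself has to come from a separate fragment enumeration.

```python
import math

def fast_complexity(fragment_count, bond_count, max_bonds=7):
    """log(fragmentCount) / bondLimit, following the rule given in the post.

    fragment_count: number of distinct fragments found by the enumeration
    bond_count:     number of bonds in the whole molecule
    max_bonds:      upper limit on fragment size (7 according to the post)
    """
    if bond_count >= 2 * max_bonds:
        bond_limit = max_bonds          # molecules with 14 or more bonds
    else:
        bond_limit = bond_count / 2     # assumption: no integer rounding
    return math.log(fragment_count) / bond_limit

# Purely illustrative inputs, not measured values.
print(fast_complexity(fragment_count=1000, bond_count=20))
print(fast_complexity(fragment_count=100, bond_count=10))
```

The first call divides by 7 because the molecule has at least 14 bonds; the second divides by 5, i.e. half of its 10 bonds.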
You can find the source code in FastMolecularComplexityCalculator.java as part of the DataWarrior source code.
More detailed info is here:
von Korff M., Sander T. (2013) About Complexity and Self-Similarity of Chemical Structures in Drug Discovery. In: Stavrinides S., Banerjee S., Caglar S., Ozer M. (eds) Chaos and Complex Systems. Springer, Berlin, Heidelberg
|