Assign cluster name based on cluster size [message #1587]
Thu, 07 April 2022 13:46
mcmc
Messages: 23 Registered: April 2018
Junior Member
It looks as if the cluster numbers generated by the "Cluster Compounds" algorithm are assigned rather arbitrarily (I guess in chronological order).
I feel it would be useful to sort the clusters by size, so that cluster 1 is the largest, cluster 2 the next largest, and so on.
Probably an easy fix, yet very helpful?
Re: Assign cluster name based on cluster size [message #1590 is a reply to message #1589]
Mon, 11 April 2022 17:27
mcmc
Thanks, Norwid. Meanwhile I realized that with the current cluster numbering, similar clusters tend to have adjacent numbers. For example, cluster 404 resembles cluster 405. I guess that has some advantages too.
I also observed that a SALI analysis provides a "neighbor count", which seems to be the same as the cluster size minus 1. That, in turn, gives a filter that can be used to zoom in on the most populated clusters.
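If that impression holds, the same number could also be derived from the cluster labels alone. The following is a minimal Python sketch of that idea; the labels are made up, the helper name `members_besides_self` is hypothetical, and the result has not been checked against DataWarrior's actual neighbor count.

```python
from collections import Counter


def members_besides_self(cluster_labels):
    """For each molecule, count the other molecules that share its cluster label."""
    # Cluster size per label, e.g. {404: 3, 405: 2, 12: 1}.
    sizes = Counter(cluster_labels)
    # Cluster size minus 1, reported once per molecule (row).
    return [sizes[label] - 1 for label in cluster_labels]


# Made-up cluster labels, one per molecule.
labels = [404, 404, 405, 404, 405, 12]
counts = members_besides_self(labels)
print(counts)
```

A threshold on such a column (for example, keeping only rows with a count of 2 or more) would then play the role of the filter described above.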
[Updated on: Mon, 11 April 2022 17:28]
Re: Assign cluster name based on cluster size [message #1598 is a reply to message #1590]
Fri, 22 April 2022 22:46
nbehrnd
Messages: 224 Registered: June 2019
Senior Member
Hello mcmc,
I just completed a small Python script to process DataWarrior's structure-similarity results (Chemistry -> Cluster Compounds) exported as a text file (File -> Save Special -> Textfile). It identifies the clusters, sorts them by the number of molecules in each cluster, updates the molecules' cluster labels (1, 2, 3, ...) accordingly, and writes a new .txt file that DW can open (Ctrl + O). Two sort orders are possible: a) «the more molecules in the cluster, the smaller the integer used as label of the cluster», which probably matches your intent best; b) with the optional flag -r, the reverse, «the more molecules in the cluster, the greater the label».
The .zip archive attached below includes the .py script and describes early results from processing a small set of test data. The script assumes that the first column, labeled «Cluster No» (the program's default header), contains the cluster labels assigned by DataWarrior.
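In outline, the relabelling can be sketched as below. This is only a minimal illustration and not the attached script itself: it assumes a tab-separated export with a «Cluster No» column and a label in every row, and the function name `relabel_clusters_by_size` and its `reverse` parameter are made up for this example.

```python
import csv
import io
from collections import Counter


def relabel_clusters_by_size(text, column="Cluster No", reverse=False):
    """Return tab-separated text with cluster labels renumbered by cluster size."""
    rows = list(csv.DictReader(io.StringIO(text), delimiter="\t"))
    if not rows:
        return text
    # Count the molecules carrying each original cluster label.
    sizes = Counter(row[column] for row in rows)
    # Largest cluster first; equal-sized clusters stay in order of first appearance.
    ranked = [label for label, _ in sizes.most_common()]
    if reverse:
        # Smallest cluster first instead (this also flips the tie order).
        ranked.reverse()
    new_label = {old: str(rank) for rank, old in enumerate(ranked, start=1)}

    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=list(rows[0].keys()), delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    for row in rows:
        row[column] = new_label[row[column]]
        writer.writerow(row)
    return out.getvalue()


# Made-up data: cluster 3 has three members, cluster 7 two, cluster 9 one.
sample = "Cluster No\tName\n7\ta\n3\tb\n3\tc\n3\td\n7\te\n9\tf\n"
by_size = relabel_clusters_by_size(sample)
by_size_reversed = relabel_clusters_by_size(sample, reverse=True)
print(by_size)
```

With this sample, the former cluster 3 becomes cluster 1, 7 becomes 2, and 9 becomes 3; with `reverse=True` the numbering runs the other way.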
Norwid
[Updated on: Tue, 26 April 2022 11:31]