Sorting, counting and deleting different elements (e.g., Iodine) in a dataset [message #1733]
Mon, 12 September 2022 00:54
Jo W
How do you find out which elements occur in a dataset of drug-like molecules, and how many compounds contain each of them, and then delete the compounds that contain specific elements?
For example, if you download 5000 HIV-active organic compounds from, say, PubChem, covering a diverse set of structures, some of the compounds will contain selenium or iodine atoms.
These types of elements are uncommon in many datasets for biological screening and can distort models and/or lead to poor predictions.
So, how can you collate these compounds in DataWarrior, quickly analyse their frequency, and selectively remove them?
I know you can set up a filter on, for example, "molecular formula" or "smiles", type in "Se", and the filter then "hides" all the selenium-containing compounds (if you reverse the filter). You can also see how many compounds in the dataset were "hidden", which gives you the number of selenium-containing compounds.
However, it is very laborious to repeat this for every other element (accepting that you want, for example, the C, H, O, N elements to remain), and this piecemeal approach does not let you visualise how many compounds in the dataset contain, for example, selenium, iodine, chlorine, phosphorus, etc.
For example, it would be good to see the following as a table in DW:
C,H,N,F - 90 compounds
C,H,O,P - 200 compounds
C,H,O,Se - 10 compounds etc
and maybe to visualise them as a histogram. Then, for example, I could remove the selenium-containing compounds from the dataset to see what effect they have on the model.
So how can this be achieved in DW?
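For illustration only, here is how the table described above could be produced outside DataWarrior. This is a rough sketch, not a DataWarrior feature: it assumes the molecular formula column has been exported as plain text, and the function names (`elements_in`, `element_set_counts`, `without_elements`) and the sample formulas are made up for the example. It reads element symbols properly, so "C" does not also match "Cl", which a plain substring search would do.

```python
import re
from collections import Counter

# One element symbol: a capital letter plus an optional lowercase letter,
# followed by an optional atom count (e.g. "C9", "Se", "Cl2").
ELEMENT = re.compile(r"([A-Z][a-z]?)(\d*)")


def elements_in(formula):
    """Return the set of element symbols found in a formula such as 'C3H7NO2Se'."""
    return frozenset(symbol for symbol, _count in ELEMENT.findall(formula))


def element_set_counts(formulas):
    """Count how many formulas share each distinct combination of elements."""
    return Counter(",".join(sorted(elements_in(f))) for f in formulas)


def without_elements(formulas, unwanted):
    """Keep only the formulas that contain none of the unwanted elements."""
    unwanted = set(unwanted)
    return [f for f in formulas if not (elements_in(f) & unwanted)]


# Hypothetical sample data standing in for an exported formula column.
formulas = ["C6H6", "C2H6O", "C3H7NO2Se", "C6H5I", "C2H4O2", "C5H11NO2Se"]

counts = element_set_counts(formulas)
for combination, n in counts.most_common():
    print(f"{combination} - {n} compounds")

# Drop every compound that contains selenium or iodine.
kept = without_elements(formulas, {"Se", "I"})
print(kept)
```

The counts per element combination could then be plotted as a bar chart or histogram in any tool, and the reduced list used to rebuild the model for comparison.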
Re: Sorting, counting and deleting different elements (e.g., Iodine) in a dataset
By: nbehrnd on Tue, 13 September 2022 08:55
Re: Sorting, counting and deleting different elements (e.g., Iodine) in a dataset
By: Jo W on Tue, 13 September 2022 22:13
Re: Sorting, counting and deleting different elements (e.g., Iodine) in a dataset
By: thomas on Tue, 13 September 2022 22:44
Re: Sorting, counting and deleting different elements (e.g., Iodine) in a dataset
By: Paul on Wed, 28 September 2022 21:02