openmolecules.org

 
Re: Sorting, counting and deleting different elements (e.g., Iodine) in a dataset [message #1741 is a reply to message #1735] Tue, 13 September 2022 22:44
thomas
Dear Jon,

the following macro is another example of how you could achieve this:
The macro adds the molecular formula as a new column and removes all digits from it, which leaves only the combination of element symbols. It then counts how often each atom combination occurs, removes rows with duplicate atom combinations, defines a custom category order on the atom combination column based on its frequency, and finally creates a view with a bar chart that shows all combinations sorted by frequency.

To try out the macro on any data set with chemical structures, copy the following macro text. Then in DataWarrior select "Macro->Paste Macro". Then run the macro with "Macro->Run Macro->CountAtomCombinations".

<macro name="CountAtomCombinations">
<task name="addMolecularFormula">
structureColumn=Structure
</task>
<task name="findAndReplace">
isStructure=false
column=Molecular Formula
isRegex=true
what=[0-9]
with=
</task>
<task name="addCalculatedValues">
columnName=Atom Combination Frequency
isOverwrite=false
formula=frequency(MolecularFormula,"Molecular Formula")
</task>
<task name="deleteDuplicateRows">
caseSensitive=true
columnList=Molecular Formula
addCount=false
</task>
<task name="selectView">
viewName=Table
</task>
<task name="sortRows">
column=Atom Combination Frequency
selectedFirst=false
descending=true
</task>
<task name="setCategoryCustomOrder">
sortMode=mean
isStructure=false
column=Molecular Formula
isAscending=true
sortColumn=Atom Combination Frequency
</task>
<task name="new2DView">
where=center
whereView=Table
newView=2D View
</task>
<task name="setPreferredChartType">
type=bars
column=Atom Combination Frequency
viewName=2D View
mode=mean
</task>
<task name="assignOrZoomAxes">
high1=0.0
low1=1.0
column1=Molecular Formula
millis=1000
viewName=2D View
</task>
</macro>
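
For readers who want to check the counting logic outside DataWarrior, the core of the macro (strip the digits, count each remaining atom combination, sort by frequency) can be sketched in plain Python. This is only an illustration of the idea, not DataWarrior code: the function names and the example formulas are hypothetical, and it assumes the formulas are already available as plain strings.

```python
import re
from collections import Counter


def atom_combination(formula):
    """Remove every digit, mirroring the macro's findAndReplace step with [0-9]."""
    return re.sub(r"[0-9]", "", formula)


def count_atom_combinations(formulas):
    """Return (combination, frequency) pairs, most frequent first."""
    counts = Counter(atom_combination(f) for f in formulas)
    return counts.most_common()


# Hypothetical example formulas, not taken from any real data set
formulas = ["C6H6", "C7H8", "C2H6O", "C3H8O", "C6H5I", "C10H8"]
result = count_atom_combinations(formulas)

for combination, frequency in result:
    print(combination, frequency)
```

Uncommon combinations, such as those containing iodine, end up at the bottom of the list, which corresponds to the low-frequency end of the bar chart that the macro creates.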
 