openmolecules.org

 
Filter out nasty functions (to design a macro to filter all nasty functions their names should be included)
Filter out nasty functions [message #1897] Sat, 13 May 2023 10:51
juliocoll
Dear Thomas,
I am trying to design a filter for large tables of docked chemicals that, among other things, would automatically select all toxic compounds so that they can be eliminated from the final table.

The partial macro looks like this:

<task name="calculateCompoundProperties">
propertyList=mutagenic tumorigenic reproEffective irritant nasty
structureColumn=Structure
</task>
<task name="changeCategoryFilter">
column=Mutagenic
settings=high low
duplicate=1
</task>
<task name="changeCategoryFilter">
column=Tumorigenic
settings=high low
duplicate=1
</task>
<task name="changeCategoryFilter">
column=Reproductive Effective
settings=high low
duplicate=1
</task>
<task name="changeCategoryFilter">
column=Irritant
settings=high low
duplicate=1
</task>
<task name="changeCategoryFilter">
column=Nasty Functions
settings=<multiple categories>
duplicate=1
</task>

The macro works well except for the nasty functions.
Despite using <multiple categories>, the chemicals with several simultaneous "nasties" per molecule were not removed.
For instance, the molecule Nc1c([C@@H](CC(C2=CNC(Nc3ccccc3)=CC2=O)=O)C(C(O[C@@H]2[C@@H]3CCCC2)=O)=C3O)cccc1 was predicted to carry a double "nasty", namely "polar activated DB; twice activated DB", but it was not removed among 5434 other molecules.

There may not be too many of them, but it would be better to remove all such cases.
Should those names be included in the settings?
That could be an enormous number of dual, or perhaps n-fold, combinations!!

Is there any other alternative code solution?

thanks for your attention
julio




Julio Coll
Profesor de Investigación. Emérito
Dpt.Biotecnología
CSIC-centro Nacional INIA, Madrid, SPAIN
Dr. Biologia Univ.Comp(UCM). Madrid, Spain
PHD. Biology Mass.Inst,Technol (MIT). Massachusetts, USA
orcid: 0000-0001-8496-3493
Re: Filter out nasty functions [message #1898 is a reply to message #1897] Sun, 14 May 2023 22:24
nbehrnd
Dear Julio,

I would like to add two suggestions: one on how the task is presented/shared, and one on a revision of the SMILES string.

a) After recording a DW macro, you can export it via Macro -> Export Macro as a file with the extension .dwam. Any subsequent reader of your post can then import it quickly to replicate your observations (first Macro -> Import Macro, then Macro -> Run Macro). Such a file typically is small enough to be attached to a message here (up to five files in total, e.g. including a small test data set, with a maximum of 2 MB over all files). It equally prevents the omission of lines; in your most recent example, the opening line, which states how the macro is named (for the display within the DW session), and the closing line </macro> were missing.

For the purpose of illustration, I enclose a small test set .dwar, and a macro to assign SMILES and compute Mw as .dwam.

b) Curious about the structure the SMILES string describes, I passed it to Open Babel to write a .sdf file, however without success.

$ obabel -:"Nc1c([C@@H](CC(C2=CNC(Nc3ccccc3)=CC2=O)=O)C(C(O[C@@H]2[C@@H] 3CCCC2)=O)=C3O)cccc1" -h --gen3d -O test.sdf -xv3000
==============================
*** Open Babel Warning  in ParseSmiles
  Invalid SMILES string: 2 unmatched ring bonds.

0 molecules converted
Because the ChemDraw test site[1] (Structure -> Load SMILES) and CDK Depict[2] equally face difficulties processing this string, can you please check if the SMILES string you shared contains the complete information?
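
One possible reading of the Open Babel warning above, offered as an assumption rather than a confirmed diagnosis: the string passed to obabel contains a blank after the second [C@@H], and SMILES readers commonly treat everything after the first blank as the title of the molecule. The part in front of the blank then leaves two ring-closure labels open, which would match the warning. The Python sketch below uses only the standard library; the function name unmatched_ring_bonds is hypothetical, and the check is a simplification that does not otherwise validate the SMILES.

```python
import re

# The string as it appears in the obabel command (with the blank), and the
# same string with the blank removed.
AS_POSTED = ("Nc1c([C@@H](CC(C2=CNC(Nc3ccccc3)=CC2=O)=O)"
             "C(C(O[C@@H]2[C@@H] 3CCCC2)=O)=C3O)cccc1")
JOINED = AS_POSTED.replace(" ", "")


def unmatched_ring_bonds(smiles: str) -> int:
    """Count ring-closure labels that are opened but never closed."""
    parts = smiles.split()
    # Assumption: only the text before the first blank is read as SMILES.
    body = parts[0] if parts else ""
    # Digits inside [...] (isotopes, charges) are not ring closures.
    body = re.sub(r"\[[^\]]*\]", "*", body)
    open_labels = set()
    for label in re.findall(r"%\d\d|\d", body):
        open_labels ^= {label}  # first occurrence opens, second closes
    return len(open_labels)


if __name__ == "__main__":
    print("as posted:", unmatched_ring_bonds(AS_POSTED))
    print("joined:   ", unmatched_ring_bonds(JOINED))
```

If the joined string reports no open labels while the posted one does, re-running the conversion with the blank removed would be a cheap test before suspecting the structure itself.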

With regards,

Norwid

[1] https://chemdrawdirect.perkinelmer.cloud/js/sample/index.html#
[2] https://www.simolecule.com/cdkdepict/depict.html
Re: Filter out nasty functions [message #1899 is a reply to message #1898] Wed, 17 May 2023 17:54
juliocoll
Dear nbehrnd,
Thank you for your attention, and my apologies for the irregularities you mention in my previous communication. I am trying to improve on it here.

For the last 2-3 months I have been running DataWarrior's Build Evolutionary Library intensively with several different systems and cavities. I manage to build libraries of a few thousand child molecules per experiment, with 3 runs each. However, I also found that, depending on the run, parent, etc., 8.8 to 66 % of the children generated were classified as toxic by DW's chemical properties.
Therefore I designed macro 1 to automatically save the results in dwar and sdf formats before and after elimination of the toxic child molecules (I enclose the file 1.dwam).

I first found the child molecule with the SMILES I sent you, which carries 2 nasties and was not eliminated by macro 1. However, after sending the topic to the forum, I realized that other nasties were also not being eliminated !!!!!

I reproduced those failures and selected for you some of the toxic rows and some healthy ones from experiments B10 (~3500 rows) and B13 (~5400 rows) in the files selected-15B10... and selected-24B13... Evolutionary_Library.dwar. Both of them came from DataWarrior Build Evolutionary Library runs after being "detoxicated" with macro 1.

The particular SMILES that I sent you in the previous communication corresponds to ID879 of experiment B10. I visually confirmed that it is the same SMILES that I sent you before (I hope).


Thank you for your help !
sincerely Julio


Re: Filter out nasty functions [message #1900 is a reply to message #1899] Sun, 21 May 2023 12:24
juliocoll
I do not know how the following sentence appeared on the initial top title!

"to design a macro to filter all nasty functions their names should be included"

If that is the answer to my previous question on removing nasties, the next question arises:

Where in DataWarrior can I find the names of all the nasty functions that DataWarrior checks for when it filters them?

Does anyone know?

thank you in advance
julio


Re: Filter out nasty functions [message #1903 is a reply to message #1900] Tue, 23 May 2023 07:09
nbehrnd
Dear Julio,

so far, I understand your approach as follows: starting from e.g. selected-15B10Evolutionary_Library.879sentbefore.dwar, you load the macro 1.dwam to filter out compounds which are not good enough for further consideration. When I attempted to replicate this, multiple error messages appeared at the stage of exporting the results as .sdf files (with the macro still running). This could be a) because of your macro, b) because of the version of DW I use (DW for Linux including the updates packaged by 2023-05-18), or c) a combination of the two.

I briefly tinkered together the macro attached below, which does the filtering but requires manual intervention to save the results as .dwar and .sdf. Conceptually, like the one you built, it builds on DW's assignment of toxicity properties. This, however, is followed by calculating a column with an if clause; to apply this manually, use Data -> Add Calculated Values. As for the formula used, DW's "if syntax" basically is

if(test condition, positive case, negative case)
as the general form, and

if (Mutagenic == "none" && Tumorigenic == "none" && ReproductiveEffective == "none" && Irritant == "none" && NastyFunctions == "", "retain entry", "skip entry")
as the one used here. You recognize e.g. Mutagenic as one of the properties DW assigned; here, DW is requested to check that this assignment was negative (expressed as the string "none") and that simultaneously (the &&) the same holds for Tumorigenic, ReproductiveEffective, and Irritant. For NastyFunctions, I opted for an empty string as the condition for a compound worth retaining. Hence, if an entry passes all five tests, its entry in the newly built column is the string "retain entry", otherwise "skip entry".
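
To illustrate why the empty-string test also catches combined entries such as "polar activated DB; twice activated DB" without listing every combination, the same logic can be sketched in Python. This is only a restatement of the formula above and not DataWarrior code; representing a table row as a dict and the function name classify are assumptions made for the illustration.

```python
# Column names as they are written in the formula above.
RISK_COLUMNS = ("Mutagenic", "Tumorigenic", "ReproductiveEffective", "Irritant")


def classify(row: dict) -> str:
    """Return 'retain entry' only if all five tests of the formula pass."""
    no_risk = all(row.get(column) == "none" for column in RISK_COLUMNS)
    # Any non-empty text fails this test, however many functions it lists.
    no_nasty = row.get("NastyFunctions") == ""
    return "retain entry" if no_risk and no_nasty else "skip entry"


if __name__ == "__main__":
    clean = {column: "none" for column in RISK_COLUMNS}
    clean["NastyFunctions"] = ""
    double_nasty = dict(clean)
    double_nasty["NastyFunctions"] = "polar activated DB; twice activated DB"
    print(classify(clean))
    print(classify(double_nasty))
```

Because the test asks only whether the cell is empty, the number of possible combinations of nasty functions does not matter.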

The script then removes the filters of the individual properties (Tumorigenic, Irritant, etc.) to leave only a single one which states whether or not a compound is good for further work; for ease of use, the cell additionally gets a green or red background. For a much smaller set of molecules, this macro works well enough and does not yield the errors I observed with your macro. It does not, however, (yet) automatically save the results of the «filtering» in a separate .sdf/.dwar file. Is this approach in line with what you would like to accomplish?

As for «what defines a function as nasty», one would have to check the source code,[1] where e.g. the file /src/com/actelion/research/datawarrior/task/chem/DETaskCalculateChemicalProperties.java contains the string «nasty» 16 times, and see how this property is assigned either in this file's functions or elsewhere in the source code. Maybe some of the criteria to mark compounds as not well suited are similar to Lilly's criteria.[2]
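
A search of this kind can be sketched in a few lines of Python, assuming a local copy of the repository [1] has been downloaded. The function name count_term and the folder name in the usage line are hypothetical. The count here ignores upper and lower case, so it may differ from the number quoted above.

```python
from pathlib import Path


def count_term(root: str, term: str = "nasty", suffix: str = ".java") -> dict:
    """Map each source file below root to its number of occurrences of term."""
    hits = {}
    for path in sorted(Path(root).rglob(f"*{suffix}")):
        text = path.read_text(encoding="utf-8", errors="ignore")
        count = text.lower().count(term.lower())
        if count:
            hits[str(path)] = count
    return hits


if __name__ == "__main__":
    # "datawarrior" is a placeholder for wherever the local copy was saved.
    for name, count in count_term("datawarrior").items():
        print(count, name)
```

The files with the most hits would be the natural starting points for reading how the property is assigned.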


Norwid

[1] https://github.com/thsa/datawarrior
[2] https://github.com/IanAWatson/Lilly-Medchem-Rules, https://github.com/IanAWatson/LillyMol_6_cmake

[Updated on: Tue, 23 May 2023 07:23]

Re: Filter out nasty functions [message #1904 is a reply to message #1903] Thu, 25 May 2023 22:07
juliocoll
Thank you for your work, Norwid !!! REALLY IMPRESSIVE !!

It will take me some time to digest all you sent me, test it with large EL dwar files, and look into the URLs you mention to understand what DW is doing. I hope I can get to the code. My last attempts were not very fruitful....

I will let you know of my progress in due time.

Thanks again, sincerely
julio



Re: Filter out nasty functions [message #1906 is a reply to message #1904] Sat, 27 May 2023 21:11
juliocoll
Dear Norwid,
I tested a hybrid macro combining my macro 1 and your suitable_compounds_color.
It was successful on a .dwar file of ~12000 compounds from an Evolutionary Library (EL) !!!!!

With your "skip entry" filter, it would be impossible to retain any of those nasty rows for further analysis. In the EL example tested, only ~3800 compounds were retained !!!

I am working under Windows 10 (64-bit), which may be one of the reasons the macros did not exchange well. I relied mainly on recording macros to design mine. Nevertheless, I attach the final 11.dwam in case you want to take a look, or in case it is of interest to others in the forum.

Thank you for the information on the nasties. My DW Windows code says that there are 20 nasty compounds but I could not find their names........


  • Attachment: 11.dwam
    (Size: 1.71KB, Downloaded 5 times)


Re: Filter out nasty functions [message #1907 is a reply to message #1906] Mon, 29 May 2023 18:56
nbehrnd
Dear Julio,

I do not know the parameters you used to set up the evolutionary library -- the motifs may be similar to approved drugs, or natural products, or molecules derived from a different "seed library". Nor do I know how the internally generated molecules eventually were filtered to be retained in the library; the criteria may be sensitive to molecular patterns, scalar properties like molecular weight, or a combination of both. Second, your script seems to get stuck once the first set of molecules is saved. Because the one remaining filter knows only the two levels "safe molecules" and "harmful molecules", the manual intervention after DW's computation to save either subset may be considered "still acceptable" (cf. the attached silent video). (In the small test set of natural product-like compounds, more than half were not considered "safe" either.)

The DataWarrior macro/script should be portable and work equally well regardless of the operating system DataWarrior runs on. It was not tested on Windows beforehand because (for a couple of years now) DW is no longer satisfied with a 32-bit system, but requires a 64-bit one.

Norwid
  • Attachment: record.mp4
    (Size: 1.08MB, Downloaded 4 times)
Re: Filter out nasty functions [message #1908 is a reply to message #1907] Mon, 29 May 2023 19:07
nbehrnd
Dear Thomas,

the .dwam DataWarrior macros may grow over multiple "blocks" of tasks like in

<task name="saveFileAs">
fileName=#ask#
</task>
I would like to know if there are special characters permitted to add annotating comments. The percent sign, number sign, and exclamation mark, as used in .tex, Python, and Fortran, respectively, do not seem suitable here. Even the addition of an empty line to a .dwam file breaks the macro.

Or would the optional addition of comments add too much complexity to DW's working? This would be plausible, because the macro editor allows changing the sequence of the individual tasks by click-and-move.
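
Whether .dwam files accept comments is not settled in this thread. Until it is, one workaround outside of DataWarrior would be to keep an annotated working copy of the macro and strip the annotations and empty lines before importing it. The Python sketch below assumes a self-chosen marker "##" at the start of a comment line; this marker is hypothetical and not part of the .dwam format, and it was picked so that values like #ask# stay untouched.

```python
def strip_annotations(annotated: str, marker: str = "##") -> str:
    """Drop empty lines and lines starting with marker; keep all others."""
    kept = [
        line
        for line in annotated.splitlines()
        if line.strip() and not line.lstrip().startswith(marker)
    ]
    return "\n".join(kept) + "\n"


if __name__ == "__main__":
    working_copy = (
        '<task name="saveFileAs">\n'
        "## let the user pick the file name at run time\n"
        "fileName=#ask#\n"
        "\n"
        "</task>\n"
    )
    print(strip_annotations(working_copy), end="")
```

The stripped text would then be saved under a separate name with the extension .dwam, so that the annotated copy remains the one that is edited by hand.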

Norwid
Re: Filter out nasty functions [message #1909 is a reply to message #1908] Thu, 01 June 2023 09:40
juliocoll
Dear Norwid,
I am only using either drugs or natural products for the evolutionary libraries (EL). The only criteria were: docking scores (x4 relative weight), molecular weights between 400-500 g/mol (x2) and logP<3 (x1). I am evolving different parents and pdb complexes from several biological systems: coronavirus, monkey-vaccinia, rodenticides, new antibiotics, collagen hsp, and others. That's all.

Thanks for the movie. GOOD IDEA!!!
I will try to incorporate that method into macro 11 to avoid the current manual elimination in the *.sdf files!

Just one more question:
Is it possible to save in a variable the numbers of rows shown at the bottom of the tables (Visible:... and Total:...)?
I am manually using those row numbers to easily differentiate *.dwar and *.sdf files from different experiments. It would be great to automatically save those numbers in the file name.
I would also like to automatically incorporate into the variable a short label for each experiment.
I need the EL *.sdf files to convert them to *.pdbqt for AutoDock Vina, for consensus with docking scores in nM affinities.....
With a large number of different files, it is confusing for me to keep track of all of them and avoid mistakes, even when using different directories. I usually have a lot of those!!!
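
Whether a DataWarrior macro can store these row counts in a variable is left open here. As a sketch of a workaround outside of DataWarrior, the records of an exported *.sdf file can be counted afterwards, since each record of an SD file ends with a line "$$$$", and the count can be put into the file name together with a label. The function names and the naming pattern below are hypothetical, chosen only for illustration.

```python
def count_sdf_records(path: str) -> int:
    """Count the records of an SD file by their '$$$$' terminator lines."""
    with open(path, encoding="utf-8", errors="ignore") as handle:
        return sum(1 for line in handle if line.strip() == "$$$$")


def labelled_name(label: str, visible: int, total: int, extension: str) -> str:
    """Build a file name from an experiment label and the two row counts."""
    return f"{label}_visible{visible}_total{total}.{extension.lstrip('.')}"


if __name__ == "__main__":
    # Label and numbers are placeholders for one experiment.
    print(labelled_name("B10", 3800, 12000, "sdf"))
```

For an exported file of retained compounds, count_sdf_records would give the "visible" number, while the total has to come from the file saved before filtering.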

Thank you for your attention!
sincerely
julio
Any ideas?




