openmolecules.org

 
Re: Similarity analysis using "find similar compounds..." - slow analysis of libraries [message #1141 is a reply to message #1138] Sun, 29 November 2020 00:05
thomas
I have changed the algorithm again. It now writes only the highest similarity and the number of compounds with a similarity above the threshold into the open file. This speeds things up further: a 16k by 16k comparison now takes about 10 seconds on my computer. A million by a million would probably take around 12 hours.
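As a rough sanity check on the 12-hour figure, the estimate can be reproduced by assuming the runtime grows with the number of compound pairs. This is only a back-of-the-envelope sketch: the function name `estimate_seconds` is made up for illustration, "16k" is taken to mean 16,000, and real runs will also depend on descriptor type, core count and file I/O.

```python
def estimate_seconds(n_rows, n_cols, ref_n=16_000, ref_seconds=10.0):
    """Scale a measured ref_n x ref_n runtime to an n_rows x n_cols comparison.

    Assumption: cost is proportional to the number of pairwise comparisons.
    """
    pairs = n_rows * n_cols
    ref_pairs = ref_n * ref_n
    return ref_seconds * pairs / ref_pairs


# One million by one million, scaled from 16k x 16k in about 10 seconds.
hours = estimate_seconds(1_000_000, 1_000_000) / 3600
print(f"{hours:.1f} hours")  # prints "10.9 hours"
```

Pure pair counting gives about 11 hours, which is in the same range as the roughly 12 hours quoted above.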

Putting the two sets into one file and using the procedure I suggested earlier would not work for your purpose, because it just uses the complete similarity matrix of all compounds without considering sets. But I hope the current update works for you. It can be downloaded as a development patch from the download page after clicking the 'read and understood' box. The links are in the small print.

This task actually does not need much memory. It basically needs to fit the first file into memory, which should be possible even with a few million compounds if the -Xmx setting is adapted. The size of the second file doesn't matter much, because it is processed row by row.
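To illustrate why memory use depends only on the first file, here is a minimal sketch of the idea in Python. It is not DataWarrior's actual code: fingerprints are modelled as plain sets of bit indices, Tanimoto similarity stands in for whatever descriptor similarity is configured, and the names `tanimoto` and `annotate_open_file` are hypothetical. The sketch also assumes the first (open) file is the one that receives the two result columns.

```python
from typing import Iterable, List, Set, Tuple


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto similarity of two fingerprints given as sets of bit indices."""
    shared = len(a & b)
    union = len(a) + len(b) - shared
    return shared / union if union else 0.0


def annotate_open_file(
    open_fps: List[Set[int]],
    external_rows: Iterable[Set[int]],
    threshold: float,
) -> Tuple[List[float], List[int]]:
    """For every in-memory compound, track the highest similarity and the
    number of external compounds with a similarity above the threshold.

    Only open_fps and two result lists are held in memory; external_rows
    may be any iterator (for example a file reader) and is consumed once.
    """
    best = [0.0] * len(open_fps)
    count = [0] * len(open_fps)
    for ext in external_rows:          # second file: one row at a time
        for i, fp in enumerate(open_fps):
            s = tanimoto(fp, ext)
            if s > best[i]:
                best[i] = s
            if s > threshold:
                count[i] += 1
    return best, count


# Tiny example: two compounds in memory, three streamed from the second file.
best, count = annotate_open_file(
    [{1, 2, 3}, {7, 8}],
    iter([{1, 2, 3, 4}, {1, 2}, {9}]),
    threshold=0.7,
)
```

Because no similarity matrix is stored, the memory footprint stays at two numbers per compound of the first file, no matter how long the second file is.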

Please let me know if there are problems of any kind.
 