openmolecules.org

 
Re: Similarity analysis using "find similar compounds..." - slow analysis of libraries [message #1222 is a reply to message #1211] Thu, 11 February 2021 10:55
thomas
Messages: 655
Registered: June 2014
Senior Member
Sorry for the delay. Since the macOS and Linux versions use exactly the same jar file, they both use the same procedure. However, it escaped my attention that your version didn't use multiple threads for the similarity calculation unless you used the Flexophore descriptor. With that version I ran a comparison on my Linux desktop and my MacBook Pro (87,000 against 105,000 compounds), which took about 20 minutes (FragFp) and more than 2 hours (SkelSpheres) on both computers. I had expected the Linux machine to be faster, because it has a desktop i7, whereas the MacBook Pro has a laptop i7 that is some years older. I don't have an explanation for the Mac being equally fast.
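As a rough back-of-the-envelope check, the timings above can be turned into a throughput figure and used to extrapolate to other file sizes. This is only a sketch: it assumes that every compound of one file is compared against every compound of the other (the post does not spell that out), and the helper names `pairs_per_second` and `estimate_hours` are made up for illustration.

```python
# Figures taken from the post; the all-pairs assumption is hypothetical.
N_FILE1 = 87_000
N_FILE2 = 105_000
PAIRS = N_FILE1 * N_FILE2  # number of descriptor comparisons, if all pairs are compared


def pairs_per_second(total_pairs, minutes):
    """Average number of comparisons per second for a run of the given length."""
    return total_pairs / (minutes * 60)


def estimate_hours(n1, n2, rate):
    """Extrapolated run time in hours, assuming time scales with n1 * n2."""
    return (n1 * n2) / rate / 3600


fragfp_rate = pairs_per_second(PAIRS, 20)        # "about 20 minutes"
skelspheres_rate = pairs_per_second(PAIRS, 120)  # ">2 hours", so this is an upper bound

if __name__ == "__main__":
    print(f"comparisons: {PAIRS:,}")
    print(f"FragFp: about {fragfp_rate:,.0f} comparisons/s")
    print(f"SkelSpheres: at most {skelspheres_rate:,.0f} comparisons/s")
    # Doubling both files should roughly quadruple the run time.
    print(f"FragFp, both files doubled: {estimate_hours(2 * N_FILE1, 2 * N_FILE2, fragfp_rate):.2f} h")
```

Under these assumptions, a run of 30 hours or more would correspond to file sizes far larger than the ones tested here, or to a much lower comparison rate.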

I have now updated the code (and the dev version) to use all threads when calculating similarities, independent of the descriptor type, which increased the speed by a factor of 4 on my six-core Linux machine. The speed gain is not higher because not everything is multithreaded and because of the overhead of launching threads for every molecule from the second file set. In general, the processing time should grow more or less linearly with the number of compounds in file 1 and in file 2. 30 or even 170 hours seems very high to me. Which descriptor did you use? Molecule size does not play a role, because the calculation runs on descriptors only. If the second file doesn't contain the needed descriptors, they are calculated on the fly, which adds to the time needed. I will run some more tests with larger files...
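To illustrate the general idea of spreading a descriptor-based similarity search over several worker threads, here is a minimal sketch. It is not DataWarrior's code: it uses a plain Tanimoto similarity on bit fingerprints stored as Python integers as a stand-in for the real descriptors, and all function names are hypothetical. It hands each worker one chunk of molecules instead of starting a task per molecule, which is one common way to reduce the per-molecule launch overhead mentioned above. Note that in CPython the global interpreter lock prevents threads from speeding up pure-Python arithmetic like this, so the sketch shows only how the work is partitioned.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def tanimoto(a, b):
    """Tanimoto similarity of two bit fingerprints stored as non-negative ints."""
    union = bin(a | b).count("1")
    if union == 0:
        return 0.0  # two empty fingerprints: define similarity as 0
    return bin(a & b).count("1") / union


def best_match(query, reference):
    """Highest similarity of one query fingerprint to any reference fingerprint."""
    return max((tanimoto(query, r) for r in reference), default=0.0)


def chunks(items, n_chunks):
    """Split items into at most n_chunks contiguous slices, preserving order."""
    size = max(1, -(-len(items) // n_chunks))  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]


def max_similarities(queries, reference, workers=None):
    """For each query, the best similarity to the reference set, computed chunk-wise."""
    workers = workers or os.cpu_count() or 1

    def work(chunk):
        return [best_match(q, reference) for q in chunk]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # One task per chunk, not per molecule; map() keeps the input order.
        parts = list(pool.map(work, chunks(queries, workers)))
    return [score for part in parts for score in part]


if __name__ == "__main__":
    queries = [0b1100, 0b0011]
    reference = [0b1100, 0b0110]
    print(max_similarities(queries, reference, workers=2))
```

The work per query is proportional to the size of the reference set, so the total cost grows with the product of the two file sizes, consistent with the roughly linear growth in each file described above.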
 