openmolecules.org Forum: Functionality » Similarity analysis using "find similar compounds..."

Home » DataWarrior » Functionality » Similarity analysis using "find similar compounds..." - slow analysis of libraries (Similarity analysis)


Sun, 29 November 2020 00:05

thomas
Messages: 747
Registered: June 2014

Senior Member

I have changed the algorithm again. It now writes only the highest similarity and the number of compounds with a similarity above the threshold into the open file, which speeds things up further. A 16k by 16k comparison now takes about 10 seconds on my computer; a million by a million would probably take around 12 hours.
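A minimal sketch of the idea, not DataWarrior's actual code: instead of keeping a full row of the similarity matrix per compound, only two numbers are kept, the highest similarity and the count of compounds above the threshold. It assumes fingerprints are plain bit sets stored as Python integers and uses the Tanimoto coefficient as a stand-in similarity; DataWarrior uses its own descriptors, and the function names here are hypothetical. The second helper redoes the runtime estimate, assuming time grows with the number of compared pairs and taking 16k as 16,000: 2.56 × 10^8 pairs in 10 s is about 2.6 × 10^7 pairs per second, so 10^12 pairs need roughly 3.9 × 10^4 s, i.e. about 11 hours.

```python
from typing import List, Tuple


def tanimoto(a: int, b: int) -> float:
    """Tanimoto similarity of two bit fingerprints stored as ints (assumed representation)."""
    union = bin(a | b).count("1")
    if union == 0:
        return 0.0  # two empty fingerprints: treat as dissimilar
    return bin(a & b).count("1") / union


def summarize_row(query: int, others: List[int], threshold: float) -> Tuple[float, int]:
    """Reduce one row of the similarity matrix to (highest similarity, count strictly above threshold)."""
    best = 0.0
    count = 0
    for other in others:
        s = tanimoto(query, other)
        if s > best:
            best = s
        if s > threshold:
            count += 1
    return best, count


def estimate_hours(n1: int, n2: int, ref_n1: int, ref_n2: int, ref_seconds: float) -> float:
    """Extrapolate runtime from a reference run, assuming cost is proportional to n1 * n2 pairs."""
    pairs_per_second = ref_n1 * ref_n2 / ref_seconds
    return n1 * n2 / pairs_per_second / 3600.0
```

For example, `summarize_row(0b1110, [0b1111, 0b1100, 0b0001], 0.5)` returns `(0.75, 2)`, and `estimate_hours(1_000_000, 1_000_000, 16_000, 16_000, 10.0)` gives about 10.9 hours, consistent with the rough figure above.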

Putting the two sets into one file and using the procedure I suggested earlier would not work for your purpose, because that procedure uses the complete similarity matrix of all compounds without considering which set they belong to. But I hope the current update works for you. It can be downloaded as a development patch from the download page after ticking the 'read and understood' box. The links are in the small print.
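A toy illustration of why a single merged file gives a different answer: in the full matrix, a compound's nearest neighbour may well be another member of its own set, whereas the question here is how close it is to the other set. The fingerprints below are made up, and Tanimoto on integer bit sets is only an assumed stand-in for DataWarrior's similarity.

```python
from typing import List


def tanimoto(a: int, b: int) -> float:
    """Stand-in similarity on integer bit fingerprints (assumption)."""
    union = bin(a | b).count("1")
    return bin(a & b).count("1") / union if union else 0.0


def max_sim_full_matrix(i: int, combined: List[int]) -> float:
    """Nearest neighbour of compound i in a merged file, ignoring set membership."""
    return max(tanimoto(combined[i], c) for j, c in enumerate(combined) if j != i)


def max_sim_cross_set(query: int, other_set: List[int]) -> float:
    """Nearest neighbour of a query compound, restricted to the other set."""
    return max(tanimoto(query, c) for c in other_set)


set_a = [0b1110, 0b1111]  # two close analogues in set A (hypothetical)
set_b = [0b0001, 0b0011]  # set B is structurally different (hypothetical)
combined = set_a + set_b

print(max_sim_full_matrix(0, combined))    # 0.75, coming from the other set-A compound
print(max_sim_cross_set(set_a[0], set_b))  # 0.25, the cross-set value actually wanted
```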

This task actually does not need much memory. It basically only needs to fit the first file into memory, which should be possible even with a few million compounds if the -Xmx setting is adjusted. The size of the second file hardly matters, because it is processed row by row.
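The memory behaviour follows from the access pattern: the first file is held completely, together with two accumulators per row (best similarity and hit count), while each row of the second file is read, compared against all first-file rows, and discarded. A minimal sketch under stated assumptions: the tab-separated `id`/hex-fingerprint line format, the function names, and the Tanimoto stand-in are all hypothetical and not DataWarrior's file format or code; results are returned as a list rather than written into an open document. The -Xmx value itself is the standard JVM maximum-heap option.

```python
import io
from typing import List, TextIO, Tuple


def tanimoto(a: int, b: int) -> float:
    """Stand-in similarity on integer bit fingerprints (assumption, not DataWarrior's)."""
    union = bin(a | b).count("1")
    return bin(a & b).count("1") / union if union else 0.0


def parse(line: str) -> Tuple[str, int]:
    # Hypothetical line format: an id, a tab, then the fingerprint as hex digits.
    cid, fp_hex = line.rstrip("\n").split("\t")
    return cid, int(fp_hex, 16)


def compare_files(first: TextIO, second: TextIO, threshold: float) -> List[Tuple[str, float, int]]:
    # The whole first file is kept in memory: ids, fingerprints and two accumulators per row.
    rows = [parse(line) for line in first if line.strip()]
    best = [0.0] * len(rows)
    count = [0] * len(rows)
    # The second file is streamed: each row is compared against all first-file rows
    # and then dropped, so its size barely affects memory use.
    for line in second:
        if not line.strip():
            continue
        _, fp = parse(line)
        for i, (_, ref) in enumerate(rows):
            s = tanimoto(ref, fp)
            if s > best[i]:
                best[i] = s
            if s > threshold:
                count[i] += 1
    # One result per first-file row: highest similarity and number of hits above threshold.
    return [(cid, best[i], count[i]) for i, (cid, _) in enumerate(rows)]


if __name__ == "__main__":
    first = io.StringIO("r1\te\nr2\tf\n")
    second = io.StringIO("q1\t3\nq2\t1\n")
    for cid, b, n in compare_files(first, second, 0.4):
        print(f"{cid}\t{b:.2f}\t{n}")
```

With the sample input above, the result is `[("r1", 0.25, 0), ("r2", 0.5, 1)]`. Because the accumulators live with the first file, the second file is read only once, however large it is.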

Please let me know if there are problems of any kind.


[Message index]

		Similarity analysis using "find similar compounds..." - slow analysis of libraries By: SM2020 on Tue, 24 November 2020 12:35
		Re: Similarity analysis using "find similar compounds..." - slow analysis of libraries By: SM2020 on Tue, 24 November 2020 21:33
		Re: Similarity analysis using "find similar compounds..." - slow analysis of libraries By: thomas on Thu, 26 November 2020 18:16
		Re: Similarity analysis using "find similar compounds..." - slow analysis of libraries By: SM2020 on Fri, 27 November 2020 01:35
		Re: Similarity analysis using "find similar compounds..." - slow analysis of libraries By: thomas on Sun, 29 November 2020 00:05
		Re: Similarity analysis using "find similar compounds..." - slow analysis of libraries By: SM2020 on Sun, 29 November 2020 15:07
		Re: Similarity analysis using "find similar compounds..." - slow analysis of libraries By: SM2020 on Thu, 04 February 2021 11:11
		Re: Similarity analysis using "find similar compounds..." - slow analysis of libraries By: thomas on Thu, 11 February 2021 10:55
		Re: Similarity analysis using "find similar compounds..." - slow analysis of libraries By: SM2020 on Mon, 01 March 2021 01:00
		Re: Similarity analysis using "find similar compounds..." - slow analysis of libraries By: SM2020 on Tue, 02 March 2021 23:40
