Comparing two databases by merging [message #2226] |
Mon, 17 June 2024 05:06 |
tingjenc
Messages: 2 Registered: June 2024
Junior Member |
Hi,
I tried to compare two SDF databases by merging them with the structure as the merge key. I also followed a previous discussion and used the canonical code for merging. All I got was "The defined key column(s) contain duplicate data in some rows and cannot uniquely identify each row." I used ChemFinder to confirm that there are no duplicate structures in the two databases, so I am stuck here. Is there anything I can do to use "merge" successfully? Or is there another way to compare the two databases?
Re: Comparing two databases by merging [message #2228 is a reply to message #2226] |
Fri, 21 June 2024 10:05 |
thomas
Messages: 711 Registered: June 2014
Senior Member |
Hi tingjenc,
Structures are stored as canonical text strings (idcodes). Thus, unless you intend to merge different stereoisomers or tautomers, you don't need to use canonical codes for merging. When merging, only the second file's key column(s) need to be unique. You could try a couple of things:
- Change the order of your files. If only one of the two files has unique structures, use that one as the second file.
- Before merging the second file, you could run "Data -> Merge Equivalent Rows" and select 'Structure' as the criterion. After that, merging both files using 'Structure' should work.
- To find and display the duplicate structures within the second file, you could use "List -> Create Row List From -> Unique Rows" with the 'Structure' column. Then select 'unique rows' in the new list filter, invert the filter, and click on the 'Structure' table header to sort by structure. Redundant structures are now shown together.
- Instead of merging, you could use "Chemistry -> Find Similar Compounds In File...". There you can define a similarity limit using any descriptor instead of requiring an exact structure match, which is what merging does.
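To illustrate why only the second file's key column needs to be unique, here is a minimal Python sketch of a key-based merge. It is not how the program implements merging; the function names, the column names, and the short strings standing in for canonical structure codes are all hypothetical, chosen only to show the idea.

```python
from collections import Counter


def duplicate_keys(rows, key):
    """Return the key values that occur more than once in rows."""
    counts = Counter(row[key] for row in rows)
    return sorted(k for k, n in counts.items() if n > 1)


def merge(first, second, key):
    """Merge second into first by key (illustrative only).

    Each first-file row looks up at most one second-file row, so the
    second file's keys must be unique; the first file's need not be.
    """
    dups = duplicate_keys(second, key)
    if dups:
        raise ValueError(f"duplicate keys in second file: {dups}")
    lookup = {row[key]: row for row in second}
    # Add second-file columns to every matching first-file row.
    merged = [{**row, **lookup.get(row[key], {})} for row in first]
    # Append second-file rows that have no match in the first file.
    seen = {row[key] for row in first}
    merged += [row for row in second if row[key] not in seen]
    return merged


# Hypothetical data: the strings stand in for canonical structure codes.
file_a = [
    {"structure": "CCO", "name": "a1"},
    {"structure": "CCO", "name": "a2"},  # duplicate structure
    {"structure": "CCN", "name": "a3"},
]
file_b = [
    {"structure": "CCO", "mw": 46.07},
    {"structure": "CCC", "mw": 44.10},
]
```

In this sketch, `merge(file_a, file_b, "structure")` succeeds because `file_b` has unique keys, while `merge(file_b, file_a, "structure")` raises an error because `file_a` contains a duplicate. That mirrors the first suggestion above: swapping the file order can be enough. `duplicate_keys(file_a, "structure")` lists the offending values, comparable to inspecting the inverted 'unique rows' list.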
Hope this helps...