|
Re: SMILES Code [message #2011 is a reply to message #2007] |
Wed, 13 September 2023 17:53 |
thomas
Messages: 716 Registered: June 2014
|
Senior Member |
|
|
I tried to reproduce the problem, but failed. When I paste the SMILES from your message, I always get the proper molecule or substructure, depending on the context. I tried directly in the table view and in a structure filter (directly and after opening the editor). I also tried in the ChEMBL retrieval dialog in substructure and similarity modes. I tried with the old V5.5.0 as well as with the current development version.
There must be a different reason. What is the source of the SMILES? Might there be invisible characters in between? Can you help me to reproduce the error? Which OS do you use? I suggest updating to the dev version (go to the DataWarrior download page, click on 'read and understood...', and find the update link in the small print).
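One way to check a pasted SMILES for invisible characters is to list every code point outside printable ASCII. The following is only a minimal sketch and not part of DataWarrior; the class name `SmilesCharCheck`, the method `findSuspiciousChars`, and the example string with an embedded zero-width space are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not DataWarrior code: reports characters that a
// SMILES string should normally not contain.
final class SmilesCharCheck {

    /** Lists every code point outside printable ASCII (0x21..0x7E) with its index. */
    static List<String> findSuspiciousChars(String smiles) {
        List<String> hits = new ArrayList<>();
        for (int i = 0; i < smiles.length(); ) {
            int cp = smiles.codePointAt(i);
            if (cp < 0x21 || cp > 0x7E) {
                hits.add(String.format(Locale.ROOT, "index %d: U+%04X", i, cp));
            }
            i += Character.charCount(cp);
        }
        return hits;
    }

    public static void main(String[] args) {
        // Illustrative input: a zero-width space (U+200B) hidden in the middle.
        String pasted = "C1CCCCC1\u200BC2CCCCC2";
        System.out.println(findSuspiciousChars(pasted));
    }
}
```

An empty list means the string contains only visible ASCII characters, so hidden characters can be ruled out as the cause.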
[Updated on: Wed, 13 September 2023 17:54]
|
|
|
Re: SMILES Code [message #2016 is a reply to message #2011] |
Mon, 18 September 2023 10:55 |
Anne
Messages: 4 Registered: September 2023
|
Junior Member |
|
|
Thank you for your kind and prompt reply. The SMILES code was generated by ChemDraw. I can also obtain the structure in DataWarrior from that SMILES code, but the error message seems unavoidable: I need to press "OK" to make the system accept the structure. However, there is no error message when I paste the SMILES code "C1CCCCC1C2CCCCC2", which corresponds to a structure without any stereochemistry information.
We installed DataWarrior on Windows 10, Traditional Chinese version. Could this error be caused by the conversion between double-byte and single-byte character systems?
|
|
|
Re: SMILES Code [message #2017 is a reply to message #2016] |
Tue, 19 September 2023 23:01 |
thomas
Messages: 716 Registered: June 2014
|
Senior Member |
|
|
I assume that you are right and the problem is connected to the Chinese Windows version, on which the default character encoding is probably GBK, while on Western OS versions it is UTF-8.
I am currently checking a few things by simulating GBK as the default encoding, but I don't understand the cause of the problem yet...
[Updated on: Wed, 20 September 2023 10:19]
|
|
|
Re: SMILES Code [message #2018 is a reply to message #2017] |
Wed, 20 September 2023 13:03 |
thomas
Messages: 716 Registered: June 2014
|
Senior Member |
|
|
I assume that the problem happens when the String from the clipboard is converted to bytes before the SMILES is parsed. In version 5.5.0 the conversion of the String to bytes is done using the operating system's default character set. I simulated that using various Chinese default character sets such as GBK, Big5, and GB18030. It seems that all of these convert the SMILES nicely to one byte per character, because all the characters in the SMILES (including '@') seem to be represented in these character sets by the same one-byte value as in UTF-8. However, when using UTF-16 every character is converted into two bytes, which causes a lot of trouble. Thus, I assume that your machine's default character encoding is an unusual one that does not convert '@' into one byte.
I have updated DataWarrior to explicitly use UTF-8 for most of the String<->byte[] conversions, which should also cover the clipboard stuff. Can you please check with the current dev version (dw_wl_win.zip) whether the problem is solved? If not, I suggest downloading and using the dw_wl_win_d.zip update file (same URL plus '_d'). If you start the contained DataWarrior_d.exe from the command line, you get debug output that, if sent to me, hopefully explains where to look for the problem.
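The comparison described above can be sketched roughly as follows. This is not the actual test code; the class name `SmilesEncodingDemo` and the example SMILES are chosen only for illustration. It uses UTF-16LE rather than plain UTF-16 because Java's UTF-16 encoder also prepends a two-byte byte-order mark, and it skips any charset that the local Java runtime does not provide.

```java
import java.nio.charset.Charset;

// Illustrative sketch: how many bytes a SMILES needs in different charsets.
final class SmilesEncodingDemo {

    /** Number of bytes the string occupies when encoded with the given charset. */
    static int encodedLength(String s, Charset cs) {
        return s.getBytes(cs).length;
    }

    public static void main(String[] args) {
        // Example SMILES with stereo information, i.e. containing '@'.
        String smiles = "N[C@@H](C)C(=O)O";
        String[] names = {"UTF-8", "GBK", "Big5", "GB18030", "UTF-16LE"};
        for (String name : names) {
            if (!Charset.isSupported(name)) {
                System.out.println(name + ": not available in this runtime");
                continue;
            }
            int bytes = encodedLength(smiles, Charset.forName(name));
            System.out.println(name + ": " + bytes + " bytes for "
                    + smiles.length() + " characters");
        }
    }
}
```

For a pure ASCII SMILES, the single-byte-compatible charsets should report one byte per character, while UTF-16LE reports two.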
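To illustrate the idea of naming the charset explicitly, here is a minimal sketch. It is not the actual DataWarrior code, and the class and method names are hypothetical. Passing `StandardCharsets.UTF_8` in both directions makes the result independent of the machine's default encoding, whereas a bare `getBytes()` may depend on it.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of charset-explicit String<->byte[] conversion.
final class ExplicitCharsetRoundTrip {

    /** Encodes with UTF-8 regardless of the platform default. */
    static byte[] toBytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    /** Decodes with UTF-8 regardless of the platform default. */
    static String fromBytes(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String smiles = "N[C@@H](C)C(=O)O"; // illustrative example only
        System.out.println("Platform default charset: " + Charset.defaultCharset());
        System.out.println("Round trip intact: "
                + smiles.equals(fromBytes(toBytes(smiles))));
    }
}
```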
Sorry for the trouble,
Thomas