SMILES Code (Uncaught Exception:2)
Re: SMILES Code [message #2011 is a reply to message #2007]
Wed, 13 September 2023 17:53
thomas
Messages: 728 Registered: June 2014
Senior Member
I tried to reproduce the problem, but failed. When I paste the SMILES from your message, I always get the proper molecule or substructure, depending on the context. I tried pasting directly into the table view and into a structure filter (both directly and after opening the editor). I also tried the ChEMBL retrieval dialog in substructure and similarity modes. I tested with the old V5.5.0 as well as with the current development version.
There must be a different reason. What is the source of the SMILES? Could there be invisible characters in between? Can you help me to reproduce the error? Which OS do you use? I suggest updating to the dev version (go to the DataWarrior download page, click on 'read and understood...', and find the update link in the small print).
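To check a pasted SMILES for invisible characters, something along the following lines could be used. This is only a rough sketch: the class `SmilesCharCheck` and its method are made up for this example and are not part of DataWarrior, and the SMILES string is an arbitrary placeholder, not the one from the report. It assumes that a clean SMILES contains only printable, non-space ASCII characters and reports everything else.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not DataWarrior code: reports every code point in a
// pasted string that is not a printable, non-space ASCII character.
final class SmilesCharCheck {

    static List<String> findSuspiciousChars(String text) {
        List<String> findings = new ArrayList<>();
        int index = 0;
        while (index < text.length()) {
            int codePoint = text.codePointAt(index);
            // 0x21..0x7E is the printable ASCII range without the space character.
            if (codePoint < 0x21 || codePoint > 0x7E) {
                findings.add(String.format("index %d: U+%04X", index, codePoint));
            }
            // Advance by one or two chars, depending on the code point.
            index += Character.charCount(codePoint);
        }
        return findings;
    }

    public static void main(String[] args) {
        // Placeholder SMILES with a zero-width space hidden after the first atom.
        String pasted = "C\u200B[C@H](N)C(=O)O";
        System.out.println(findSuspiciousChars(pasted));
    }
}
```

An empty list means that nothing unusual was found; any entry gives the position and Unicode code point of a character worth a closer look.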
[Updated on: Wed, 13 September 2023 17:54]

Re: SMILES Code [message #2017 is a reply to message #2016]
Tue, 19 September 2023 23:01
thomas
I assume that you are right and the problem is connected to the Chinese Windows version, on which the default character encoding is probably GBK, while on Western OS versions it is UTF-8.
I am currently checking a few things by simulating GBK as the default encoding, but I don't understand the cause of the problem yet...
[Updated on: Wed, 20 September 2023 10:19]

Re: SMILES Code [message #2018 is a reply to message #2017]
Wed, 20 September 2023 13:03
thomas
I assume that the problem happens when the String from the clipboard is converted to bytes before the SMILES is parsed. In version 5.5.0 the String is converted to bytes using the operating system's default character set. I simulated that using various Chinese default character sets such as GBK, Big5, and GB18030. It seems that all of these convert the SMILES nicely to one byte per character, because all the characters in the SMILES (including '@') seem to be represented in these character sets by the same one-byte value as in UTF-8. However, when using UTF-16, every character is converted into two bytes, which causes a lot of trouble. Thus, I assume that your machine's default character encoding is an unusual one that does not convert '@' into one byte.
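A comparison of this kind can be sketched roughly as follows. This is an illustration only, not the actual DataWarrior code: the class `SmilesEncodingDemo` and its methods are invented for the example, and the SMILES is an arbitrary placeholder. GBK and Big5 are not guaranteed to be present in every Java runtime, so the sketch skips any character set that is unavailable.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class, not DataWarrior code.
final class SmilesEncodingDemo {

    // Number of bytes the text occupies when encoded with the given charset.
    static int byteCount(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    // True if the charset produces exactly the same bytes as UTF-8.
    static boolean matchesUtf8(String text, Charset charset) {
        return Arrays.equals(text.getBytes(charset),
                             text.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        String smiles = "C[C@H](N)C(=O)O"; // placeholder, 15 characters
        String[] names = {"UTF-8", "GBK", "Big5", "GB18030", "UTF-16"};
        for (String name : names) {
            if (!Charset.isSupported(name)) {
                System.out.println(name + ": not available in this Java runtime");
                continue;
            }
            Charset charset = Charset.forName(name);
            System.out.println(name + ": " + byteCount(smiles, charset)
                    + " bytes, same bytes as UTF-8: " + matchesUtf8(smiles, charset));
        }
    }
}
```

Note that Java's "UTF-16" encoder also prepends a two-byte byte-order mark, so the reported byte count is two more than twice the number of characters.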
I have updated DataWarrior to explicitly use UTF-8 for most of the String<->byte[] conversions, which should also cover the clipboard handling. Can you please check with the current dev version (dw_wl_win.zip) whether the problem is solved? If not, I suggest downloading and using the dw_wl_win_d.zip update file (same URL plus '_d'). If you start the contained DataWarrior_d.exe from the command line, you get debug output that, if sent to me, hopefully shows where to look for the problem.
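The difference between the two kinds of conversion can be illustrated with plain Java as below. This is a generic sketch with invented names (`ExplicitCharsetDemo` and its methods), not the code that was changed in DataWarrior. `String.getBytes()` without an argument depends on the JVM's default character set, while passing `StandardCharsets.UTF_8` gives the same bytes on every machine.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not DataWarrior code.
final class ExplicitCharsetDemo {

    // Platform-dependent: the result varies with the JVM's default charset.
    static byte[] toBytesWithDefault(String text) {
        return text.getBytes();
    }

    // Platform-independent: always UTF-8, regardless of OS settings.
    static byte[] toBytesUtf8(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Decodes with the same explicit charset that was used for encoding.
    static String fromBytesUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String smiles = "C[C@H](N)C(=O)O"; // placeholder SMILES
        System.out.println("Default charset: " + Charset.defaultCharset());
        System.out.println("Bytes with default charset: " + toBytesWithDefault(smiles).length);
        System.out.println("Bytes with UTF-8: " + toBytesUtf8(smiles).length);
        System.out.println("Round trip OK: " + smiles.equals(fromBytesUtf8(toBytesUtf8(smiles))));
    }
}
```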
Sorry for the trouble,
Thomas