openmolecules.org

 
SMILES Code (Uncaught Exception:2)
SMILES Code [message #2007] Tue, 12 September 2023 09:27
Anne
Messages: 4
Registered: September 2023
Junior Member
When I paste the SMILES codes of some structures, especially those with stereochemistry indicators, the error message "Uncaught Exception:2" pops up more than a hundred times. The only way past it is to press "OK" more than a hundred times.

D-Glucal is an example. I pasted the SMILES code "O[C@H]1[C@H](O)[C@@H](CO)OC=C1" into the structure drawing panel, and the error message was shown again and again.

My DataWarrior version is V05.05.00. Could you let me know how to fix this error? Am I missing a step?
Re: SMILES Code [message #2011 is a reply to message #2007] Wed, 13 September 2023 17:53
thomas
Messages: 702
Registered: June 2014
Senior Member
I tried to reproduce the problem but failed. When I paste the SMILES from your message, I always get the proper molecule or substructure, depending on the context. I tried it directly in the table view and in a structure filter (both directly and after opening the editor). I also tried it in the ChEMBL retrieval dialog in substructure and similarity modes, with the old V5.5.0 as well as with the current development version.

There must be a different reason. What is the source of the SMILES? Could there be invisible characters in between? Can you help me reproduce the error? Which OS do you use? I suggest updating to the dev version (go to the DataWarrior download page, click on 'read and understood...', and find the update link in the small print).
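One way to check a pasted SMILES for invisible characters is to list every code point outside the printable ASCII range. The following is only a minimal Java sketch, not part of DataWarrior; the class name SmilesCharCheck and the method suspiciousCharacters are made up for illustration, and the zero-width space in the second string is inserted on purpose as a test case.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of DataWarrior.
final class SmilesCharCheck {

    /**
     * Lists every code point outside printable ASCII (0x21..0x7E).
     * Spaces and control characters are reported too, on the assumption
     * that a single SMILES string should not contain them.
     */
    static List<String> suspiciousCharacters(String smiles) {
        List<String> found = new ArrayList<>();
        for (int i = 0; i < smiles.length(); ) {
            int cp = smiles.codePointAt(i);
            if (cp < 0x21 || cp > 0x7E) {
                found.add(String.format(Locale.ROOT, "index %d: U+%04X", i, cp));
            }
            i += Character.charCount(cp); // step over surrogate pairs correctly
        }
        return found;
    }

    public static void main(String[] args) {
        String clean = "O[C@H]1[C@H](O)[C@@H](CO)OC=C1";
        // Zero-width space (U+200B) inserted deliberately for illustration.
        String dirty = "O[C@H]1\u200B[C@H](O)";
        System.out.println("clean: " + suspiciousCharacters(clean));
        System.out.println("dirty: " + suspiciousCharacters(dirty));
    }
}
```

An empty list means the string contains only printable ASCII, which is all a plain SMILES like the one above should need.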

[Updated on: Wed, 13 September 2023 17:54]


Re: SMILES Code [message #2016 is a reply to message #2011] Mon, 18 September 2023 10:55
Anne
Thank you for your kind and prompt reply. The SMILES code was generated with ChemDraw. I can still obtain the structure in DataWarrior from that SMILES code, but the error message seems unavoidable: I need to press "OK" to make the system accept the structure. However, there is no error message when I paste the SMILES code "C1CCCCC1C2CCCCC2", which corresponds to a structure without any stereochemistry information.

We installed DataWarrior on Win10, traditional Chinese version. Could this error be caused by the conversion between double-byte and single-byte systems?
Re: SMILES Code [message #2017 is a reply to message #2016] Tue, 19 September 2023 23:01
thomas
I assume that you are right and the problem is connected to the Chinese Windows version, on which the default character encoding is probably GBK, while on Western OS versions it is UTF-8.

I am currently checking a few things by simulating GBK as the default encoding, but I don't understand the cause of the problem yet...

[Updated on: Wed, 20 September 2023 10:19]

Report message to a moderator

Re: SMILES Code [message #2018 is a reply to message #2017] Wed, 20 September 2023 13:03
thomas
I assume that the problem happens when the String from the clipboard is converted to bytes before the SMILES is parsed. In version 5.5.0 the String is converted to bytes using the operating system's default character set. I simulated that with various Chinese default character sets such as GBK, Big5, and GB18030. All of these seem to convert the SMILES nicely to one byte per character, because all the characters in the SMILES (including '@') seem to be represented in these char-sets by the same one-byte value as in UTF-8. With UTF-16, however, every character is converted into two bytes, which causes a lot of trouble. Thus, I assume that your machine's default character encoding is an unusual one that does not convert '@' into one byte.
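The byte counts described above can be checked with a few lines of Java. This is a minimal sketch, not DataWarrior code: the class name SmilesEncodingDemo and the method encodedLength are invented for illustration. It assumes the Java runtime ships the extended charsets (GBK, Big5, GB18030); any that are missing are reported and skipped.

```java
import java.nio.charset.Charset;

// Hypothetical demo class, not part of DataWarrior.
final class SmilesEncodingDemo {

    /** Number of bytes the text occupies when encoded with the given charset. */
    static int encodedLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        String smiles = "O[C@H]1[C@H](O)[C@@H](CO)OC=C1";
        // UTF-16LE is used instead of "UTF-16" so that no byte-order mark is added.
        String[] names = {"UTF-8", "GBK", "Big5", "GB18030", "UTF-16LE"};

        System.out.println("characters: " + smiles.length());
        for (String name : names) {
            if (!Charset.isSupported(name)) {
                System.out.println(name + ": not available in this Java runtime");
                continue;
            }
            int bytes = encodedLength(smiles, Charset.forName(name));
            System.out.println(name + ": " + bytes + " bytes");
        }
    }
}
```

For an ASCII-only SMILES, an ASCII-compatible charset should report as many bytes as characters, while UTF-16LE should report twice as many.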

I have updated DataWarrior to explicitly use UTF-8 for most of the String<->byte[] conversions, which should also cover the clipboard stuff. Can you please check with the current dev version (dw_wl_win.zip) whether the problem is solved? If not, I suggest downloading and using the dw_wl_win_d.zip update file (same URL plus '_d'). If you start the contained DataWarrior_d.exe from the command line, you get debug output which, if you send it to me, will hopefully show where to look for the problem.
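For readers unfamiliar with the difference, here is a minimal Java sketch of a String<->byte[] conversion with an explicit charset. It is not the actual DataWarrior change; the class name ExplicitCharsetDemo and its two helper methods are assumptions made for illustration. The point is that getBytes() without an argument depends on the platform default, while passing StandardCharsets.UTF_8 gives the same bytes on every machine.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of DataWarrior.
final class ExplicitCharsetDemo {

    /** Encodes text as UTF-8, independent of the platform default charset. */
    static byte[] toBytes(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    /** Decodes UTF-8 bytes back into a String. */
    static String fromBytes(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String smiles = "O[C@H]1[C@H](O)[C@@H](CO)OC=C1";
        byte[] encoded = toBytes(smiles);
        System.out.println("bytes: " + encoded.length);
        System.out.println("round trip intact: " + fromBytes(encoded).equals(smiles));
    }
}
```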

Sorry for the trouble,

Thomas
Re: SMILES Code [message #2019 is a reply to message #2018] Thu, 21 September 2023 04:06
Anne
Thank you so much for your reply. I have sent the examples to you. Sorry that it took me some time to make this clear.

When I use “paste special” to bring in the data from Excel, each SMILES code is converted to its corresponding structure without any message.

When I then try to add a completely new structure to DataWarrior after the paste special step, countless messages pop up. I need to correct my previous report: the unexpected message appears regardless of stereochemistry. “CC(=O)C” gives the same result.

When I paste “O[C@H]1[C@H](O)[C@@H](CO)OC=C1” in different contexts, the behavior differs.

Please feel free to send me any ideas. I can try them.

[Updated on: Thu, 21 September 2023 09:56]

Report message to a moderator

Re: SMILES Code [message #2020 is a reply to message #2019] Fri, 22 September 2023 11:43
thomas
Many thanks for the examples. I could finally reproduce the problem. It wasn't related to character sets. The culprit was the 3D-view, which, when showing molecular structures on an axis, did not increase the molecule buffer space when new molecules were added by pasting them directly into the table.

I have fixed this and deployed the update as dw_wl...zip archives with replacement files, which can be downloaded as described above. Please let me know if something is not working as expected...
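The kind of bug described above, a per-row buffer that is not enlarged when rows are added, can be illustrated with a small Java sketch. This is not the DataWarrior 3D-view code, which the thread does not show; the class MoleculeSlotBuffer and all of its members are hypothetical. The sketch grows the array before every write, so a newly pasted row never indexes past the end.

```java
import java.util.Arrays;

// Hypothetical illustration, not the actual DataWarrior implementation.
final class MoleculeSlotBuffer {
    private String[] slots; // assumed: one cached entry per table row

    MoleculeSlotBuffer(int initialCapacity) {
        slots = new String[Math.max(1, initialCapacity)];
    }

    /** Grows the backing array if the table now has more rows than slots. */
    void ensureCapacity(int rowCount) {
        if (rowCount > slots.length) {
            slots = Arrays.copyOf(slots, Math.max(rowCount, slots.length * 2));
        }
    }

    /** Stores a value for a row, enlarging the buffer first if necessary. */
    void set(int row, String value) {
        ensureCapacity(row + 1); // without this call, a new row would overflow the array
        slots[row] = value;
    }

    String get(int row) {
        return row < slots.length ? slots[row] : null;
    }

    int capacity() {
        return slots.length;
    }

    public static void main(String[] args) {
        MoleculeSlotBuffer buffer = new MoleculeSlotBuffer(2);
        buffer.set(0, "C1CCCCC1C2CCCCC2");
        buffer.set(1, "O[C@H]1[C@H](O)[C@@H](CO)OC=C1");
        buffer.set(2, "CC(=O)C"); // a third row pasted into a two-slot buffer
        System.out.println("row 2: " + buffer.get(2) + ", capacity: " + buffer.capacity());
    }
}
```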
Re: SMILES Code [message #2021 is a reply to message #2020] Mon, 25 September 2023 07:22
Anne
It works!
I sincerely appreciate all your time and effort for this. Please feel free to let me know if there is any further step I should take.