openmolecules.org

 
Visualising SMILES string [message #1256] Thu, 18 March 2021 10:55
amorrison
Messages: 38
Registered: March 2016
Member
Hi Thomas,

I have a SMILES string with an attachment point that was created in another application, e.g. C1CCC([*:1])CC1. DataWarrior fails to identify it as a valid SMILES, either on paste or with convert structure to name. I'm assuming the '*:1' is the issue. Can I modify the SMILES so that DataWarrior can interpret it?

Many thanks in advance,

Angus
Re: Visualising SMILES string [message #1257 is a reply to message #1256] Sat, 20 March 2021 15:07
thomas
Messages: 715
Registered: June 2014
Senior Member
Hi Angus,

If you replace [*:1] by *, then the atom is interpreted as a wildcard query feature, which is drawn as a '?'. In this case the new molecule is considered a substructure rather than a normal structure. The difference is that a substructure may have query features, and its unspecified atom valences are not implicitly assumed to be filled with hydrogen atoms. Therefore, a singly bonded oxygen is shown as -O, in contrast to -OH in a normal molecule.
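For a whole column of SMILES, the replacement described above can be done as plain text processing before pasting. The following is a minimal sketch in Python using only the standard library. The helper name `strip_mapped_wildcards` is hypothetical and is not part of DataWarrior or OpenChemLib.

```python
import re

# Matches a bracketed wildcard atom carrying a map number, e.g. [*:1] or [*:12].
MAPPED_WILDCARD = re.compile(r"\[\*:\d+\]")


def strip_mapped_wildcards(smiles: str) -> str:
    """Replace every [*:n] token with a bare '*' (hypothetical helper)."""
    return MAPPED_WILDCARD.sub("*", smiles)


if __name__ == "__main__":
    print(strip_mapped_wildcards("C1CCC([*:1])CC1"))
```

Note that this discards the map number, so several distinct attachment points in one SMILES would no longer be distinguishable.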

DataWarrior makes this distinction, which most other software does not, possibly because the SMILES definition itself does not distinguish between these two structure flavors.

Thomas
Re: Visualising SMILES string [message #1258 is a reply to message #1257] Sun, 21 March 2021 05:00
amorrison
Thanks Thomas,

I've tried this. After a find and replace, my SMILES string becomes C1CCC([*])CC1. If I try structure to name, this is still not recognised as a valid SMILES. If I remove the square brackets, giving C1CCC(*)CC1, I get the structure but lose the attachment point.

Sorry if this is trivial; I'm a med chemist and not so familiar with SMILES syntax.

Thanks,

Angus
Re: Visualising SMILES string [message #1261 is a reply to message #1258] Tue, 23 March 2021 16:38
thomas
Dear Angus,

I have extended the SmilesParser within OpenChemLib, which is used by DataWarrior, to support '*' and '?' as pseudo atom symbols. Both symbols can be used inside or outside square brackets. While '*' is in line with the opensmiles.org standard, '?' is not. Nevertheless, because the SMILES syntax makes no distinction between query fragments and full molecules, I believe that '?' makes sense.

'*' creates a wildcard atom, which is an atom query feature and only allowed in query fragments. Therefore, a SMILES containing a '*' is automatically perceived as a query structure rather than a full molecule. Its free valences are not considered to be filled with hydrogen.

'?' is converted into an atom with atomicNo=0, which DataWarrior uses as an attachment point, e.g. in R-groups after a SAR deconvolution. These atoms are not meant to exist as real atoms. Their sole purpose is to carry the bond sticking out of the R-group.

If you paste these 6 rows of SMILES:

C1CCC([*:1])CC1O
C1CCC([*])CC1O
C1CCC(*)CC1O
C1CCC([?:1])CC1O
C1CCC([?])CC1O
C1CCC(?)CC1O

into a new DataWarrior window (newest dev release), then you get the table shown in the attachment.
Note that I have added an oxygen atom to each SMILES. If you see an 'H' at the oxygen, the structure is considered a molecule. If not, the structure is a query fragment with open valences.

Attachment: t.png
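To keep the attachment point instead of getting a query wildcard, SMILES exported from another application could be rewritten from the '*' forms to the corresponding '?' forms before pasting. The following is a minimal sketch in Python using only the standard library. It assumes a DataWarrior build whose SmilesParser includes the '?' extension described above, and the helper name `wildcard_to_attachment` is hypothetical.

```python
import re

# A wildcard atom in any of the three forms listed above:
# bracketed with an optional map number ([*] or [*:1]), or a bare '*'.
WILDCARD = re.compile(r"\[\*(:\d+)?\]|\*")


def wildcard_to_attachment(smiles: str) -> str:
    """Rewrite '*' wildcard atoms as '?' atoms, keeping brackets and map numbers."""

    def replace(match: re.Match) -> str:
        if match.group(0) == "*":
            return "?"  # bare wildcard outside brackets
        map_part = match.group(1) or ""  # ':1' if a map number is present
        return "[?" + map_part + "]"

    return WILDCARD.sub(replace, smiles)


if __name__ == "__main__":
    for row in ("C1CCC([*:1])CC1O", "C1CCC([*])CC1O", "C1CCC(*)CC1O"):
        print(wildcard_to_attachment(row))
```

This is text manipulation only; how the resulting '?' atoms are perceived still depends on the parser version in use.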
Re: Visualising SMILES string [message #1265 is a reply to message #1261] Fri, 26 March 2021 09:31
amorrison
Thanks Thomas, works great!
Angus