openmolecules.org

 
Home » DataWarrior » Functionality » Question about R-group occurences estimation
Re: Question about R-group occurences estimation [message #773 is a reply to message #759] Tue, 11 February 2020 14:53 Go to previous messageGo to previous message
nbehrnd is currently offline  nbehrnd
Messages: 211
Registered: June 2019
Senior Member
There are two possible difficulties to retrieve methyl groups as-such in a list of SMILES. For one, the characteristic of them is the (single) carbon atom and searching just for this is less identifying than the string of c1ccccc1 about benzene, for example. Second, probably there would be a need to add explicit hydrogens on all SMILES before these would be easier to identify (which may be done, e.g., with babel).

If you have access to Python, then the additional module by rdkit (http://rdkit.org/) may be quite helpful to querry your SMILES with SMARTS. With the test file of smiles_list.smi attached below, the identification of methyl groups (in SMART's convention, expressed as [CH3]) works fine both locally -- per SMILES entry -- as well as in counting the globally:

from rdkit import Chem
smiles_source = "smiles_list.smi"
grand_total = 0

# example pattern to identify and count:
functional_group = Chem.MolFromSmarts('[CH3]')  #  methyl group

# alternative examples:
#functional_group = Chem.MolFromSmarts('c1ccccn1')  # for a pyridine
#functional_group = Chem.MolFromSmarts('C1CCCCC1')  # for cyclohexane

with open(smiles_source, mode="r") as source_file:
    for index, line in enumerate(source_file, start=1):
        molecule = Chem.MolFromSmiles(line.strip())
        match = molecule.GetSubstructMatches(functional_group)

        print("{:3} matches in entry {:2}: {}.".format(
            len(match), index, line.strip()))

        grand_total += len(match)

print("\nIn total {} instances were identified.".format(grand_total))


index.php?t=getfile&id=145&private=0
index.php?t=getfile&id=146&private=0

As DataWarrior relies on java, the implementation of the «ErtlFunctionalGroupsFinder» described by Fritsch et al. ( https://jcheminf.biomedcentral.com/articles/10.1186/s13321-0 19-0361-8, open access) equally may be of interest for Thomas.
  • Attachment: example.png
    (Size: 30.77KB, Downloaded 841 times)
  • Attachment: listing.png
    (Size: 51.00KB, Downloaded 719 times)
  • Attachment: smiles_list.smi
    (Size: 0.12KB, Downloaded 338 times)
  • Attachment: example.py
    (Size: 0.76KB, Downloaded 335 times)
 
Read Message
Read Message
Read Message
Read Message
Previous Topic: table view missing
Next Topic: Postgresql connection
Goto Forum:
  


Current Time: Sun May 19 15:37:57 CEST 2024

Total time taken to generate the page: 0.05359 seconds