openmolecules.org

 
Splitting column data (Extracting year of publication)
Splitting column data [message #1013] Mon, 27 July 2020 14:20
sansun
Messages: 23
Registered: April 2019
Junior Member
I have a column with references in the following format.

Proc. Natl. Acad. Sci. U.S.A. 105 (26), 9059-9064 (2008)

Is it possible to extract 'journal name', 'year', etc. in separate columns?
Re: Splitting column data [message #1017 is a reply to message #1013] Wed, 29 July 2020 10:50
thomas
Messages: 341
Registered: June 2014
Senior Member
If your references are in a column called 'Lit', you could use something like the attached macro. The problem is that there is no easily recognizable separator between the journal name and the first number. This could be solved with a new function such as lastIndexOf() or reverseIndexOf(). Please let me know whether your references are similar enough for the macro to work. If so, I will add the function needed for the remaining cut.
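
For readers who want to try the split outside DataWarrior, here is a minimal Python sketch of the same idea. It is not the attached macro, and it assumes every reference follows the pattern "Journal Volume (Issue), Pages (Year)" from the first post. The function name `split_reference` and the dictionary keys are made up for illustration. Python's `str.rfind()` plays the role of the proposed lastIndexOf(): it finds the last space, which separates the journal name from the volume number.

```python
import re


def split_reference(ref):
    """Split 'Journal Volume (Issue), Pages (Year)' into its parts.

    Assumes the single-reference format from the first post; parts that
    cannot be found are returned as None.
    """
    head = ref.strip()

    # Year: four digits inside the final pair of brackets.
    year = None
    match = re.search(r"\((\d{4})\)\s*$", head)
    if match:
        year = match.group(1)
        head = head[:match.start()].rstrip()

    # Pages: everything after the last comma.
    pages = None
    comma = head.rfind(",")
    if comma != -1:
        pages = head[comma + 1:].strip()
        head = head[:comma].rstrip()

    # Issue: an optional bracketed value at the end of what is left.
    issue = None
    match = re.search(r"\(([^()]*)\)\s*$", head)
    if match:
        issue = match.group(1)
        head = head[:match.start()].rstrip()

    # Volume: the token after the LAST space (the lastIndexOf step).
    # Everything before that space is taken as the journal name.
    journal, volume = head, None
    space = head.rfind(" ")
    if space != -1:
        journal, volume = head[:space], head[space + 1:]

    return {"journal": journal, "volume": volume, "issue": issue,
            "pages": pages, "year": year}


parts = split_reference(
    "Proc. Natl. Acad. Sci. U.S.A. 105 (26), 9059-9064 (2008)")
print(parts)
```

Working from the end of the string backwards avoids the problem that the journal name itself contains spaces and periods. A reference that deviates from the assumed pattern (for example, one without a volume number) would be split incorrectly.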

[Updated on: Wed, 29 July 2020 10:51]


Re: Splitting column data [message #1018 is a reply to message #1017] Wed, 29 July 2020 12:19
sansun
Hi Thomas,

Yes, the references are in a similar format. However, in some cases a single cell contains several references because rows were merged.

Mainly I need the 'year' information, which is inside the brackets.
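
If only the year is needed, a short Python sketch (again outside DataWarrior, not a macro) can collect every four-digit number in brackets, so a cell holding several merged references yields one year per reference. The name `extract_years` is made up for illustration, the separator between merged references is not known from this thread, and the second reference in the sample is an invented placeholder, not real data.

```python
import re

# A year is assumed to be exactly four digits enclosed in brackets, e.g. (2008).
YEAR_IN_BRACKETS = re.compile(r"\((\d{4})\)")


def extract_years(cell):
    """Return all bracketed four-digit years found in one cell, in order."""
    return [int(year) for year in YEAR_IN_BRACKETS.findall(cell)]


# First reference is from the original post; the second is a made-up
# placeholder, and the "; " separator is an assumption.
cell = ("Proc. Natl. Acad. Sci. U.S.A. 105 (26), 9059-9064 (2008); "
        "J. Example Chem. 12 (3), 100-110 (2015)")
print(extract_years(cell))  # [2008, 2015]
```

The pattern does not depend on how the references are separated, because `findall()` scans the whole cell. One caveat: an issue number that happens to have four digits would also be picked up as a year.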
Re: Splitting column data [message #1019 is a reply to message #1017] Wed, 29 July 2020 12:26
sansun
I tried your macro; the result is shown in the attachment below.
Some cells contain several references, and I guess the macro recognizes only the first one.

  • Attachment: Macro-Lit.png
Re: Splitting column data [message #1020 is a reply to message #1019] Thu, 30 July 2020 07:28
sansun
Another observation related to this query.

When I download ChEMBL data from within DataWarrior (from the Database tab), the references come in the above-mentioned format in a single column.

However, if I download the data directly from the ChEMBL website as an .sdf file, 'journal name', 'year', etc. come in separate columns.

The latter format is probably better if splitting is going to be a problem.