openmolecules.org

 
> and < signs not interpreted in 6.0.0 [message #2097] Thu, 11 January 2024 12:19
mcmc
Messages: 23
Registered: April 2018
Junior Member
I noticed that 6.0 treats values like <2 and >10 differently from 5.5.
In the old version these were treated as numbers, and a column containing entries like these was correctly assigned as numeric, which allowed rounding, colour scales, etc.
Now columns with >2 are interpreted as text, which generates errors in all my old macros.
Manually setting the columns to floating point generates errors too (image attached).

EDIT: there seems to be an overarching phenomenon, which is the treatment of spaces (notably the lack thereof) in cells.
'>2' seems to be recognized as text, while
'> 2' is treated as a number.
This was different in 5.5

Likewise, multiple numbers without spaces are treated as text:
'4.5;4.8' is text, while
'4.5; 4.8' is two consecutive numbers.

Clearly it would be great if this could be made backwards compatible with 5.5 again.

[Updated on: Thu, 11 January 2024 16:36]


Re: > and < signs not interpreted in 6.0.0 [message #2098 is a reply to message #2097] Thu, 11 January 2024 23:42
nbehrnd
Messages: 224
Registered: June 2019
Senior Member
Hello mcmc,

In a small library of random molecules generated by DW 6.0, I observe that an entry written without a space, such as `<2` (for example, metal atom count) or `>30` (for example, non-H atom count), is recognized as a comparison against a number, just as anticipated and just as I experienced before the transition to the new version.

Regards,

Norwid
Re: > and < signs not interpreted in 6.0.0 [message #2100 is a reply to message #2098] Fri, 12 January 2024 12:03
mcmc
Messages: 23
Registered: April 2018
Junior Member
Unfortunately, I cannot test this anymore, as I have rolled back to 5.5 now.
I had a DW file, with only a few rows, where one column consisted of these values
>2000
55.6
20
Right-clicking on the column heading indicated that these were not numbers. No option to round, for example.
As soon as I introduced the space by editing the cell to '> 2000', the column reverted to numbers and I could round.
This dwar file was not created from scratch with 6.0. Not sure if that could have been part of the problem, but one would still expect backwards compatibility to open older files.

The issue with the multiple numbers was reported by a colleague of mine, so we are at least two people observing space issues.
Re: > and < signs not interpreted in 6.0.0 [message #2101 is a reply to message #2100] Sat, 13 January 2024 20:18
nbehrnd
Messages: 224
Registered: June 2019
Senior Member
Continuing with DW 06.00.00, I created a new table: the first column holds three random molecules (DW generated); the second column was added with Data -> Add empty columns, where I selected "text" as category. Manually, I added the strings `>2000`, `55.6`, and `20` to the cells as a test property (cf. attached .dwar and `dw_screenphoto_01.png`). The subsequent addition of a third column by Data -> Add Calculated Values, using the function

if(test_property>20, "big", "small")

proceeded smoothly. In your case, do you also have the automatic assignment of the column type enabled? I ask because I observe that the intended computation fails if the second column is (still or accidentally) set to `text`, regardless of whether `force categories` is activated. After I adjusted the second column, a right-click on the third column followed by "update formula and recalculate" corrected the results.

However, I assumed that a right-click on the third column followed by "Re-Calculate All Columns" would be more helpful for a global update of a table that may hold multiple columns of calculated values. Alas, no: it replaced the original entries in the second column with the results of my formula and yielded the "NaN" indicator in the third column (dw_screenphoto_03.png). This behaviour is not intuitive to me.

Regards,

Norwid

[Updated on: Sat, 13 January 2024 20:19]


Re: > and < signs not interpreted in 6.0.0 [message #2102 is a reply to message #2101] Wed, 17 January 2024 10:18
mcmc
Messages: 23
Registered: April 2018
Junior Member
Hi Norwid, all columns are set to automatic type. The same .dwar file behaves differently in 5.5 and 6.0 in terms of interpretation of these spaces. Numbers become text - presumably due to the automatic type.
Re: > and < signs not interpreted in 6.0.0 [message #2105 is a reply to message #2102] Sat, 20 January 2024 15:02
thomas
Messages: 716
Registered: June 2014
Senior Member
To mcmc: DataWarrior 6.0.0 should be compatible with earlier versions regarding the interpretation of numbers with modifiers like '>2000'. I tested your example (>2000, 55.6, 20), and the column was correctly recognized as numerical. Thus, I assume there is something else fishy. Which OS do you use? Possibly the OS localization setting has some influence. Do you use '.' or ',' as the decimal separator? If I can reproduce it, I may be able to update the numerical recognition to circumvent the problem.

To Norwid: Many thanks for your comments on this. I could reproduce your phenomenon of the results being written into the 'test_property' column instead of the 'Calculated' column. This happened when clicking 'overwrite' twice. When the overwrite setting was changed, DataWarrior adapted the target column setting in a strange way: when overwrite was checked, it changed the setting to an existing column ('test_property'), and when it was unchecked again, it did not update the target column. It also did not recognize that 'not to overwrite' is not really compatible with an existing target column. I updated the behaviour to be more meaningful.
> sign properly interpreted, but '2;3' is interpreted as text, not numerical [message #2254 is a reply to message #2097] Tue, 30 July 2024 12:11
mcmc
Messages: 23
Registered: April 2018
Junior Member
Hello. Finally getting back to this issue now.
I am running 6.2.1 on my private PC now, and can reproduce half of the issue encountered earlier.
That is, the < and > signs are correctly interpreted, also without a whitespace in between.

However, two numbers in one cell, separated by nothing but a ';', are not interpreted as numerical.

Started a fresh DW session and loaded the Wikipedia compounds.
Hovering over the Molweight column header, it nicely shows the values in there are numerical. This is still correct if I add '>' to the first value (image1).
However, if I add a second number to a molecular weight, without whitespace, the perceived data type changes to text (image2). Adding a space reverts it back to numerical (image 3).

I also attach the .dwar (only first 3 wiki compounds).

(The reason this is important to us is that we use an in-house application that generates dwar files from our internal project SQL databases. If, say, an IC50 was measured more than once, each value is written into the dwar file separated by ';' without whitespace. We can of course change those scripts to include the whitespace, but we have hundreds of dwar files in circulation without the whitespace that cannot be properly interpreted by DW6 anymore.)
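
For the files already in circulation, one possible stopgap is to rewrite the affected cells before loading them. The following is a minimal Python sketch, not anything shipped with DataWarrior: `normalize_separators` and `normalize_row` are hypothetical helper names, and the sketch assumes that the affected values sit in known columns of a tab-separated data row. It only inserts the missing space after a semicolon, and it should be applied to the numeric columns alone, because other columns may legitimately contain semicolons.

```python
import re

# Matches a semicolon that is directly followed by a non-whitespace character.
_TIGHT_SEMICOLON = re.compile(r";(?=\S)")


def normalize_separators(cell: str) -> str:
    """Turn '4.5;4.8' into '4.5; 4.8'; cells that already use '; ' are unchanged."""
    return _TIGHT_SEMICOLON.sub("; ", cell)


def normalize_row(line: str, columns: set[int]) -> str:
    """Apply normalize_separators to the given zero-based columns of one
    tab-separated row. Which columns to touch is an assumption the caller
    has to supply; all other fields are passed through untouched."""
    fields = line.rstrip("\n").split("\t")
    for i in columns:
        if i < len(fields):
            fields[i] = normalize_separators(fields[i])
    return "\t".join(fields)


if __name__ == "__main__":
    print(normalize_separators("4.5;4.8"))
```

Working on a copy of each file and comparing the column types in DataWarrior before and after would be a sensible check, since this sketch knows nothing about the rest of the dwar file structure.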

[EDIT: 6.2.1 tested on Win11 Home 23H2. Initial report was with 6.0.0 on Win10 Pro, likely 22H2]

[Updated on: Tue, 30 July 2024 13:44]


Re: > sign properly interpreted, but '2;3' is interpreted as text, not numerical [message #2255 is a reply to message #2254] Tue, 30 July 2024 19:31
nbehrnd
Messages: 224
Registered: June 2019
Senior Member
Dear mcmc,

With multiple measurements of one property for one structure record, could the following workflow be suitable for you: 1) create individual .dwar files (each about one structure record with exactly one measurement of the property of interest), and 2) subsequently merge the equivalent rows (where multiple measurements in one cell are separated by a linefeed)? Though I have not yet used DW in the context of SQL, your question seems to relate to Yuan's approach documented on May 28th, 2024.[1]

Best regards,

Norwid

[1] https://openmolecules.org/forum/index.php?t=msg&th=743&start=0&
Re: > sign properly interpreted, but '2;3' is interpreted as text, not numerical [message #2257 is a reply to message #2255] Wed, 31 July 2024 09:52
mcmc
Messages: 23
Registered: April 2018
Junior Member
Thanks for the reply, Norwid. I take it you could reproduce the issue? Also on Windows?
DW has clearly been designed to handle multiple values in one cell (see image).
In plotting, it will automatically plot the average, which is typically what you want.
In DW5, there was no difference in handling '2;3' vs '2; 3'.
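
To make the expectation concrete, here is a minimal Python sketch of the DW5-style reading described above, in which '2;3' and '2; 3' give the same values and therefore the same average. This is an illustration under assumptions, not DataWarrior's code: `cell_mean` is a hypothetical name, the arithmetic mean stands in for "the average", and cells with '<' or '>' modifiers are deliberately left out.

```python
import re
from statistics import mean


def cell_mean(cell: str) -> float:
    """Average the numbers in one cell, accepting ';' with or without a
    following space, or a line break, as the separator between entries."""
    parts = [p.strip() for p in re.split(r";|\n", cell)]
    values = [float(p) for p in parts if p]
    # An empty cell has no values; mean() then raises StatisticsError.
    return mean(values)


if __name__ == "__main__":
    print(cell_mean("2;3"), cell_mean("2; 3"))
```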

There is some resemblance to Yuan's request (although I would never put results from different assays into the same cell), but that doesn't solve my backwards compatibility issue.
  • Attachment: image4.jpg
    (Size: 35.08KB, Downloaded 66 times)

[Updated on: Wed, 31 July 2024 09:53]


Re: > sign properly interpreted, but '2;3' is interpreted as text, not numerical [message #2264 is a reply to message #2257] Wed, 31 July 2024 14:51
thomas
Messages: 716
Registered: June 2014
Senior Member
Dear mcmc,

Sorry, it took me a while to understand the issue. A single semicolon was never meant to be a valid separator. DataWarrior always creates and expects '; ' (and '\n'). Nevertheless, you are right that version 5 also separated entries that were delimited with a single semicolon. This was more an accident than intended behaviour.
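
For readers following along, the lenient reading can be sketched in a few lines of Python. This is only an illustration under assumptions, not the DataWarrior implementation: `parse_cell` and `is_numeric_column` are hypothetical names, only the modifiers '<' and '>' from this thread are handled, and the decimal separator is assumed to be '.'.

```python
import re

# Optional '<' or '>' modifier, optional whitespace, then a plain decimal number.
_ENTRY = re.compile(r"^([<>]?)\s*([-+]?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)$")


def parse_cell(cell: str):
    """Return a list of (modifier, value) pairs, or None if any entry is
    not numeric. Entries may be separated by ';', '; ' or a line break."""
    entries = []
    for part in re.split(r";|\n", cell):
        part = part.strip()
        if not part:
            continue  # ignore empty fragments, e.g. after a trailing ';'
        match = _ENTRY.match(part)
        if match is None:
            return None
        entries.append((match.group(1), float(match.group(2))))
    return entries


def is_numeric_column(cells) -> bool:
    """A column counts as numeric here if every cell parses; empty cells pass."""
    return all(parse_cell(cell) is not None for cell in cells)


if __name__ == "__main__":
    print(parse_cell(">2000"), parse_cell("4.5;4.8"))
    print(is_numeric_column([">2000", "55.6", "20"]))
```

Under this reading, '>2' and '> 2' parse identically, as do '4.5;4.8' and '4.5; 4.8', which is the behaviour reported for version 5.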

For compatibility I will go back to the old handling in version 6.2.2 that will be out soon...

Sorry for the inconvenience,

Thomas

[Updated on: Wed, 31 July 2024 14:51]


Re: > sign properly interpreted, but '2;3' is interpreted as text, not numerical [message #2267 is a reply to message #2264] Wed, 31 July 2024 17:50
mcmc
Messages: 23
Registered: April 2018
Junior Member
Excellent - thanks a lot!