openmolecules.org
Calculation of mean/median values in box & whisker plots. [message #601] Wed, 24 July 2019 11:05
timritchie
Hello,
I have a DW file of compounds, separated into different classes, and activity data. Some compounds do not have an activity (cells are empty).
When mean values are displayed in a box or whisker plot, the values change when the compounds with no activity are included.
I'm not sure why this should happen since they shouldn't affect the calculation of the mean.
Any thoughts on why this occurs?
Thanks,
Tim Ritchie.
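
To make the expectation in this question concrete, here is a minimal sketch in Python (not DataWarrior's own code). The activity values and the helper names `summarize` and `mean_with_empty_as_zero` are hypothetical. It shows that a mean or median computed only over the filled cells is the same with or without the empty rows, and that the mean would shift only if empty cells were counted as a number such as zero. Whether DataWarrior does anything like that is not established in this thread.

```python
from statistics import mean, median

def summarize(values):
    """Return (mean, median) of the numeric entries, skipping empty cells (None)."""
    numeric = [v for v in values if v is not None]
    return mean(numeric), median(numeric)

def mean_with_empty_as_zero(values):
    """Contrast case (an assumption, not DataWarrior behavior): empty cells count as 0."""
    return mean([0.0 if v is None else v for v in values])

# Hypothetical activity values for one compound class; None marks an empty cell.
with_empty = [5.2, 6.1, None, 7.4, None, 6.3]
without_empty = [5.2, 6.1, 7.4, 6.3]

# Skipping empty cells: including the empty rows changes nothing.
print(summarize(with_empty) == summarize(without_empty))

# Counting empty cells as zero pulls the mean down.
print(mean_with_empty_as_zero(with_empty) < summarize(with_empty)[0])
```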
Re: Calculation of mean/median values in box & whisker plots. [message #602 is a reply to message #601] Thu, 25 July 2019 10:30
nbehrnd
Hello Tim,

lacking a minimal working example, I can only guess what you refer to. My recent
work with DW, however, also involved columns with some entries missing. The
solution that worked «good enough» for me, aiming for whisker plots and their
statistics, was to start with a table in which each cell of the corresponding
column is already filled with the placeholder «N/A», to be replaced by real data
only once these are at hand. (The placeholder may be entered before loading into
DW, e.g. with conditional formatting in a spreadsheet, or by manual edit of each
cell in DW.) Thankfully, DW seems to recognize this kind of «other entry type»,
which is also known from other statistical programs (e.g., R).

As an example, I populated a table with the first ten alkanes and let DW
determine their molecular weights, eventually displayed as a whisker plot (cf.
alkanes_complete.dwar). In a copy of this file (alkanes_except_octaneMW.dwar),
the molecular weight entry for octane was replaced by the «N/A» placeholder.
Now the dot for the missing entry (or, in the parlance of R, for «[data] not
available») is set below the others and is no longer considered in the
statistics -- both on screen and in the plot's statistics.
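
For readers without the .dwar files, the effect can be sketched outside DW in Python. This is an illustration under assumptions, not DW's implementation: the atomic weights are assumed standard values (they reproduce the 16.0428 reported for methane later in this thread), and `alkane_mw` and `parse_cell` are hypothetical helpers.

```python
from statistics import mean, median

# Assumed atomic weights of carbon and hydrogen (not taken from DW itself).
C, H = 12.011, 1.00794

NAMES = ["methane", "ethane", "propane", "butane", "pentane",
         "hexane", "heptane", "octane", "nonane", "decane"]

def alkane_mw(n):
    """Molecular weight of the linear alkane C_n H_(2n+2)."""
    return n * C + (2 * n + 2) * H

def parse_cell(text):
    """Convert a table cell to float; 'N/A' or an empty cell becomes None."""
    text = text.strip()
    if text in ("", "N/A"):
        return None
    return float(text)

# Table column as text cells, rounded to four decimals.
cells = {name: f"{alkane_mw(n):.4f}" for n, name in enumerate(NAMES, start=1)}
complete = [parse_cell(c) for c in cells.values()]

# Replace the octane entry by the placeholder and drop it from the statistics.
cells["octane"] = "N/A"
reduced = [v for v in map(parse_cell, cells.values()) if v is not None]

print(len(complete), mean(complete), median(complete))
print(len(reduced), mean(reduced), median(reduced))
```

With ten values the median is the average of the pentane and hexane weights; with octane excluded, nine values remain and the median is the pentane weight alone.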

This approach was used with DataWarrior (stable release 5.0.0) running in Linux
Xubuntu (18.04.2 LTS, 64 bit).

Norwid

[Updated on: Thu, 25 July 2019 10:32]

Re: Calculation of mean/median values in box & whisker plots. [message #613 is a reply to message #602] Sat, 24 August 2019 00:28
thomas
Hi Tim, I tried to reproduce this but wasn't able to, with both normal and logarithmic view mode on the value column. Also, Norwid's alkanes_without_octaneMW.dwar changes neither mean nor median when switching off the octane. If the problem still exists, do you have a sample file? Thanks in advance and sorry for the very late reply. Thomas
Re: Calculation of mean/median values in box & whisker plots. [message #617 is a reply to message #613] Sun, 25 August 2019 20:13
nbehrnd
Hi Thomas, Hi Tim,

in my observation, all four statistical values provided in the whisker
plot do change when the molecular mass cell is manually set to the
string "N/A" (without the quotation marks). Processing the data
twice allows me to reproduce the changes in the whisker plot and its
statistical data, too. Here I share my approach to the task with DW
(Linux version 5.0.0) using the test file alkanes_complete.dwar above:

Reading the file as is, which contains 10 complete entries:

[Screenshot, attached below]

Accessing the cell value for methane, "16.0428" is replaced by "=N/A".
As expected, DW will indicate this as a non-valid entry. Of course, no
whisker plot is provided now. But the dot is still present in the plot.

[Screenshot, attached below]

Next step, replacing "=N/A" by "N/A". DW accepts this and now provides a
box whisker plot. The dot without an associated value is sorted out, and
the statistical data are updated.

[Screenshot, attached below]

Two additional observations:

If "N/A" is entered for the first time, the line vanishes completely,
hence shortening the list of 10 alkanes to 9 alkanes. This removal
may affect other columns than the column currently worked on, too.
This contradicts the aim to preserve the complete line which only has
no entry for this very cell.

Say, there is a second entry with no value available. Then, a direct
input of "N/A" via DW's edit cell function is possible without danger
of losing the complete line. E.g., direct access to

[Screenshot, attached below]

Again for documentation, the file used here is attached below. Maybe
there is a better way -- if so, I'm curious to learn it.

Norwid
  • Attachment: step_0.png
  • Attachment: step_1.png
  • Attachment: step_2.png
  • Attachment: step_3.png
  • Attachment: test_file.dwar
Re: Calculation of mean/median values in box & whisker plots. [message #618 is a reply to message #617] Sun, 25 August 2019 21:52
thomas
Hi Norwid,

if a complete row disappears from the table or structure view after setting the numerical value in one column to 'N/A', there may be these reasons:

Potential reason 1: there is a range filter on that column. If a row is changed so that it no longer contains a numerical value, then it is no longer within the range of the filter and is immediately filtered out.

Potential reason 2: the column is assigned to an axis of a graphical view and the view does not show empty values. In this case the row cannot be placed on the view anymore, because there is no value anymore. The default behavior of DataWarrior is to show in all views the same rows. Thus, if the row is removed from the graphical view, it must be removed from all other views as well, unless the graphical view is configured 'not to influence global row visibility'.
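
The two potential reasons can be modeled in a few lines of Python. This is only a conceptual sketch of the visibility rules as described above, not DataWarrior's source code; the class `Row` and the functions `passes_range_filter`, `shown_in_view` and `globally_visible` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class Row:
    name: str
    mw: Optional[float]  # None models a cell set to 'N/A'

def passes_range_filter(row: Row, low: float, high: float) -> bool:
    """Reason 1: a row without a numerical value cannot lie inside a range."""
    return row.mw is not None and low <= row.mw <= high

def shown_in_view(row: Row, show_empty_values: bool) -> bool:
    """Reason 2: a view with the column on an axis needs a value to place the row."""
    return row.mw is not None or show_empty_values

def globally_visible(row: Row, *,
                     range_filter: Optional[Tuple[float, float]] = None,
                     show_empty_values: bool = False,
                     view_influences_global: bool = True) -> bool:
    """A row stays visible only if no filter and no influencing view excludes it."""
    if range_filter is not None and not passes_range_filter(row, *range_filter):
        return False
    if view_influences_global and not shown_in_view(row, show_empty_values):
        return False
    return True

octane = Row("octane", None)  # molecular weight replaced by 'N/A'

print(globally_visible(octane))                                # hidden by the view
print(globally_visible(octane, show_empty_values=True))        # view shows empty values
print(globally_visible(octane, view_influences_global=False))  # view does not influence others
print(globally_visible(octane, range_filter=(0.0, 200.0),
                       show_empty_values=True))                # hidden by the range filter
```

In this model, the row survives the edit if the view either shows empty values or is configured not to influence global row visibility, and if no range filter is active on the column.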

Hope this is a useful explanation,

Thomas