Calculation of mean/median values in box & whisker plots. [message #601] Wed, 24 July 2019 11:05
timritchie
Messages: 6
Registered: February 2015
Location: Ranco, Italy
Junior Member
I have a DW file of compounds, separated into different classes, together with activity data. Some compounds have no activity value (the cells are empty).
When mean values are displayed in a box or whisker plot, the values change when the compounds with no activity are included.
I'm not sure why this happens, since empty cells shouldn't affect the calculation of the mean.
Any thoughts on why this occurs?
Tim Ritchie.
Re: Calculation of mean/median values in box & whisker plots. [message #602 is a reply to message #601] Thu, 25 July 2019 10:30
nbehrnd
Messages: 7
Registered: June 2019
Junior Member
Hello Tim,

lacking a minimal working example, I can only guess what you refer to. My own
recent work with DW, however, also involved columns lacking some entries. The
solution that worked «good enough» for me, aiming for whisker plots and their
statistics, was to start with a table in which each cell of the corresponding
column is already filled with the placeholder «N/A», to be replaced by real
data only once these are at hand. (The placeholder may be entered prior to DW,
e.g. with conditional formatting in a spreadsheet, or as a manual edit per cell
in DW.) Thankfully, this kind of «other entry type», seen in other statistical
programs (e.g., R), seems to be recognized by DW, too.

As an example, I populated a table with the first ten alkanes and let DW
determine their molecular weights, eventually displayed as a whisker plot (cf.
alkanes_complete.dwar). In a copy of this file (alkanes_except_octaneMW.dwar),
the molecular weight entry for octane was replaced by the «N/A» placeholder.
Now, the dot for the missing entry (or, in the parlance of R, «[data] not
available») is set below the others and is no longer considered for further
statistics, both on screen and in the plot's statistics.
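
To illustrate the arithmetic outside of DW, here is a minimal Python sketch. It
is not DataWarrior's implementation and makes no claim about how DW treats
empty cells internally; the molecular weights are approximate values I typed
in by hand, and the helper names (parse_cell, summarize, zero_filled_mean) are
made up for this example. It only shows that a mean stays meaningful when
«N/A» or empty cells are skipped, and that it shifts as soon as a missing
entry is silently counted as 0.

```python
from statistics import mean, median

# Approximate molecular weights (g/mol) of the first ten n-alkanes.
# Values are illustrative and entered by hand, not exported from DataWarrior.
ALKANE_MW = {
    "methane": 16.04, "ethane": 30.07, "propane": 44.10, "butane": 58.12,
    "pentane": 72.15, "hexane": 86.18, "heptane": 100.20, "octane": 114.23,
    "nonane": 128.26, "decane": 142.28,
}


def parse_cell(cell):
    """Return the cell as a float, or None for an empty or 'N/A' cell."""
    text = str(cell).strip()
    if text == "" or text.upper() == "N/A":
        return None
    return float(text)


def summarize(cells):
    """Mean and median over the cells that actually hold a number."""
    values = [v for v in map(parse_cell, cells) if v is not None]
    return {"n": len(values), "mean": mean(values), "median": median(values)}


def zero_filled_mean(cells):
    """Mean if every missing cell were (wrongly) counted as 0."""
    return mean(v if v is not None else 0.0 for v in map(parse_cell, cells))


complete = list(ALKANE_MW.values())
without_octane = [
    "N/A" if name == "octane" else mw for name, mw in ALKANE_MW.items()
]

if __name__ == "__main__":
    print("complete:         ", summarize(complete))
    print("octane skipped:   ", summarize(without_octane))
    print("octane counted 0: ", zero_filled_mean(without_octane))
```

With octane skipped, the statistics are computed over nine values instead of
ten. If a program instead counted the missing cell as 0, the mean would drop
noticeably, which is one possible (but unconfirmed) explanation for the kind of
shift described in the question.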

This approach was used with DataWarrior (stable release 5.0.0) running in Linux
Xubuntu (18.04.2 LTS, 64 bit).


[Updated on: Thu, 25 July 2019 10:32]
