Calculation of mean/median values in box & whisker plots. [message #601]
Wed, 24 July 2019 11:05
timritchie
Messages: 15 Registered: February 2015 Location: St Albans, UK
Junior Member
Hello,
I have a DW file of compounds, separated into different classes, and activity data. Some compounds do not have an activity (cells are empty).
When mean values are displayed in a box or whisker plot, the values change when the compounds with no activity are included.
I'm not sure why this should happen since they shouldn't affect the calculation of the mean.
Any thoughts on why this occurs?
Thanks,
Tim Ritchie.
Re: Calculation of mean/median values in box & whisker plots. [message #602 is a reply to message #601]
Thu, 25 July 2019 10:30
nbehrnd
Messages: 224 Registered: June 2019
Senior Member
Hello Tim,
lacking a minimal working example, I can only guess what you refer to. My recent
work with DW, however, also involved columns lacking some entries. The solution
that worked «good enough» for me, aiming for whisker plots and their statistics,
was to start with a table in which each cell of the corresponding column is
already filled with the placeholder «N/A», to be replaced by real data only once
these are at hand. (The placeholder may be entered prior to DW, e.g. with
conditional formatting in a spreadsheet, or as a manual edit per cell in DW.)
Thankfully, this kind of «other entry type», seen in other statistical programs
(e.g., R), seems to be recognized by DW, too.
As an example, I populated a table with the first ten alkanes and let DW
determine their molecular weight, eventually displayed as a whisker plot (cf.
alkanes_complete.dwar). In a copy of this file (alkanes_except_octaneMW.dwar)
the molecular weight entry for octane was substituted by the «N/A» placeholder.
Now the dot for the missing entry (or, in the parlance of R, «[data] not
available») is set below the others and is no longer considered for further
statistics, both on screen and in the plot's statistics.
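To illustrate the behaviour described above, here is a minimal Python sketch of statistics that skip «N/A» cells. It is not DataWarrior's code; the helper names `to_number` and `plot_stats` and the sample values are hypothetical and only model the idea that a placeholder cell should not enter the mean or median.

```python
import math
import statistics

def to_number(cell):
    """Return the cell as float, or None for empty / 'N/A' / non-numeric cells."""
    try:
        value = float(cell)
    except (TypeError, ValueError):
        return None
    return None if math.isnan(value) else value

def plot_stats(cells):
    """Count, mean and median over the numeric cells only (hypothetical helper)."""
    numbers = [n for n in map(to_number, cells) if n is not None]
    return {
        "n": len(numbers),
        "mean": statistics.fmean(numbers),
        "median": statistics.median(numbers),
    }

# Illustrative values, not taken from the attached files.
complete = ["1.0", "2.0", "3.0", "4.0", "6.0"]
with_gap = ["1.0", "2.0", "3.0", "N/A", "6.0"]

print(plot_stats(complete))
print(plot_stats(with_gap))
```

In this sketch the row with «N/A» is simply left out of the calculation, so the statistics differ from those of the complete column only because one real value (4.0) is gone, not because an empty cell was counted as a number.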
This approach was used with DataWarrior (stable release 5.0.0) running in Linux
Xubuntu (18.04.2 LTS, 64 bit).
Norwid
[Updated on: Thu, 25 July 2019 10:32]
Re: Calculation of mean/median values in box & whisker plots. [message #617 is a reply to message #613]
Sun, 25 August 2019 20:13
nbehrnd
Messages: 224 Registered: June 2019
Senior Member
Hi Thomas, Hi Tim,
in my observation, all four statistical values provided in the whisker
plot do change when the molecular mass cell is manually set to the
string "N/A" (without the quotation marks). Processing the data twice
allows me to retrieve the changes in the whisker plot and its
statistical data, too. Here I share my approach to the task with DW
(Linux version 5.0.0), using the test file alkanes_complete.dwar above:
Reading the file as such, which contains 10 complete entries:
Accessing the cell value for methane, "16.0428" is replaced by "=N/A".
As expected, DW indicates this as a non-valid entry. Of course, no
whisker plot is provided now, but the dot is still present in the plot.
Next step: replacing "=N/A" by "N/A". DW accepts this and now provides a
box whisker plot. The dot without an associated value is sorted out, and
the statistical data are updated.
Two additional observations:
If "N/A" is entered for the first time, the line vanishes completely,
hence shortening the list of 10 alkanes to 9 alkanes. This removal
may affect columns other than the one currently worked on, too.
This contradicts the aim to preserve the complete line, which merely
has no entry for this very cell.
Say there is a second entry with no value available. Then a direct
input of "N/A" via DW's edit cell function is possible without danger
of losing the complete line. E.g., direct access to
Again for documentation, the file used here is attached below. Maybe
there is a better way; if so, I'm curious to learn it.
Norwid
Attachment: step_0.png (Size: 7.11KB, Downloaded 884 times)
Attachment: step_1.png (Size: 5.26KB, Downloaded 871 times)
Attachment: step_2.png (Size: 8.80KB, Downloaded 814 times)
Attachment: step_3.png (Size: 9.24KB, Downloaded 870 times)
Attachment: test_file.dwar (Size: 2.85KB, Downloaded 454 times)
Re: Calculation of mean/median values in box & whisker plots. [message #618 is a reply to message #617]
Sun, 25 August 2019 21:52
thomas
Messages: 715 Registered: June 2014
Senior Member
Hi Norwid,
if a complete row disappears from the table or structure view after setting the numerical value in one column to 'N/A', there may be these reasons:
Potential reason 1: there is a range filter on that column. If a row is changed so that it no longer contains a numerical value, it is no longer within the range of the filter and is immediately filtered out.
Potential reason 2: the column is assigned to an axis of a graphical view and the view does not show empty values. In this case the row cannot be placed on the view anymore, because there is no value anymore. The default behavior of DataWarrior is to show the same rows in all views. Thus, if the row is removed from the graphical view, it must be removed from all other views as well, unless the graphical view is configured 'not to influence global row visibility'.
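The two potential reasons above can be modelled with a short Python sketch. This is only an illustration of the described logic, not DataWarrior's implementation; the function names, the parameters `shows_empty` and `influences_global`, and the sample rows are all hypothetical.

```python
def passes_range_filter(value, low, high):
    """Reason 1: a cell without a numerical value is never inside a range."""
    return value is not None and low <= value <= high

def passes_view(value, shows_empty=False, influences_global=True):
    """Reason 2: a view that hides empty values removes the row everywhere,
    unless it is set 'not to influence global row visibility'."""
    if shows_empty or not influences_global:
        return True
    return value is not None

def visible_rows(rows, column, low, high, **view_options):
    """Rows that survive both the range filter and the graphical view."""
    return [
        row["name"]
        for row in rows
        if passes_range_filter(row[column], low, high)
        and passes_view(row[column], **view_options)
    ]

# Illustrative rows; None stands for a cell set to 'N/A'.
rows = [
    {"name": "A", "value": None},
    {"name": "B", "value": 2.0},
    {"name": "C", "value": 3.0},
]

print(visible_rows(rows, "value", 0.0, 10.0))  # ['B', 'C']
```

In this model, row A stays hidden as long as the range filter exists, whatever the view settings are. Without a range filter, the view options alone decide whether the row remains visible in the other views.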
Hope this is a useful explanation,
Thomas