Dear gnuplot developers,
I have found two problems with some example code in the "plot data
smooth frequency" section of the manual (page 95 in v5.2, page 90
in v5.0), designed to give equal-width intervals and rectangles
whose heights equal each interval frequency:
    binwidth = <something> # set width of x values in each bin
    bin(val) = binwidth * floor(val/binwidth)
    plot "datafile" using (bin(column(1))):(1.0) smooth frequency
One problem is minor, one is major.
The minor problem is: the histogram bins are centred 0.5*binwidth
to the left of where they should be. The definition of "bin()"
above replaces each data value with the left endpoint of the
interval it lands in, but each rectangle is then drawn centred on
that point. To fix the problem, simply replace the definition with
    bin(val) = binwidth * (floor(val/binwidth) + 0.5)
and then each data point is replaced by the midpoint of the
interval it lands in.
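
To make the half-bin shift concrete, here is a small sketch that prints the x value each sample is replaced by under the two definitions. The names bin_left and bin_mid, the binwidth of 1.0 and the sample values are all chosen just for illustration; they are not from the manual.

```gnuplot
# Illustrative sketch: binwidth and the sample values are arbitrary assumptions.
binwidth = 1.0
bin_left(val) = binwidth * floor(val/binwidth)          # manual's version: left endpoint
bin_mid(val)  = binwidth * (floor(val/binwidth) + 0.5)  # proposed version: midpoint

# Show which x value each sample is replaced by under the two definitions.
do for [s in "0.2 0.7 1.1 1.8"] {
    v = s + 0.0   # convert the word to a number
    print sprintf("%.1f -> left %.1f, mid %.1f", v, bin_left(v), bin_mid(v))
}
```

With these values, 0.2 and 0.7 are labelled 0.0 by the first definition and 0.5 by the second, so a rectangle centred on the label covers [-0.5, 0.5) in the first case and [0, 1) in the second.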
The major problem is: if there are any "empty bins", this code
(potentially) generates an incorrect histogram. If any midpoint in
the range of the data is not represented in bin(column(1)), this
code causes the "smooth frequency" procedure to do the wrong
thing. Rather than produce an empty interval at that position, it
makes the intervals on either side wider (with endpoints halfway
between those midpoints that are represented). Worse, the heights
of those wider intervals are not adjusted appropriately, so that
area no longer reflects frequency in those intervals.
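
One possible workaround, sketched here rather than offered as a definitive fix, assumes the rectangles are drawn "with boxes". By default that style sizes each box so that it touches its neighbours, which would explain the widening. Setting an absolute box width equal to binwidth should keep every rectangle one bin wide and leave the empty bin empty. The inline data, the binwidth and the styling are made up for illustration.

```gnuplot
# Illustrative sketch: the inline data, binwidth and styling are assumptions.
$Data << EOD
0.2
0.7
1.1
3.4
3.9
EOD

binwidth = 1.0
bin(val) = binwidth * (floor(val/binwidth) + 0.5)   # midpoint of the interval

set boxwidth binwidth      # fixed absolute width: no box stretches into the empty bin [2,3)
set style fill solid 0.5
set xrange [0:5]
set yrange [0:*]
plot $Data using (bin(column(1))):(1.0) smooth frequency with boxes notitle
```

In this sample no value falls in [2,3), so the midpoint 2.5 never appears in the binned data. Commenting out the "set boxwidth" line should show the difference between the fixed and the default box width.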
I have attached a script and a PDF that it generates to illustrate
the problem with a simple example (I have v5.0 on Fedora 28) which
does show clearly that using "floor()" with "smooth frequency" in
this way is problematic. Interestingly, the (experimental) "bins"
procedure, documented in the v5.2 manual but not available on my
v5.0, ostensibly does the right thing.
It may be that you are waiting for "bins" to be no longer
"experimental" before making this change; I guess that would make
sense. Apologies if this has already been flagged, but I couldn't
see it mentioned on the bug tracker page.
Cheers,
Michael