Boxplots
DESCRIPTION:
Produces side by side boxplots from a number of vectors. The boxplots
can be made to display the variability of the median, and can have
variable widths to represent differences in sample size.
USAGE:
boxplot(..., range=1.0, width=<<see below>>, varwidth=F,
names=<<see below>>, plot=T, notch=F, style.bxp=list(),
boxwex=.5, boxcol=3, medchar=F, medpch=NA, medline=T, medlwd=5,
medcol=0, confint=F, confcol=2, confangle=45, confdensity=25,
confnotch=F, whisklty=2, staplelty=1, staplewex=1, staplehex=1,
outchar=F, outpch=NA, outline=T, outwex=1)
REQUIRED ARGUMENTS:
- ...
-
vectors or lists containing
numeric components (e.g., the output of
split).
Note that all other arguments must be specified in the
name=value
form, and the names cannot be abbreviated.
Missing values (NA) are allowed.
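As a minimal sketch of the two input forms described above, the same groups can be passed either as one list (for example from split) or as separate vectors. The data and the names score and grp are made up for illustration; plot=FALSE is used so that only the summaries are computed.

```r
# Hypothetical data: 'score' and 'grp' are invented for this sketch.
score <- c(12, 15, NA, 20, 22, 19, 30, 28, NA, 25)
grp <- c("a", "a", "a", "b", "b", "b", "c", "c", "c", "c")

# split() returns a list with one numeric component per group.
by.group <- split(score, grp)

# Form 1: a single list of numeric components.
b1 <- boxplot(by.group, plot=FALSE)

# Form 2: separate vectors; every other argument is given as name=value.
b2 <- boxplot(by.group$a, by.group$b, by.group$c,
    names=c("a", "b", "c"), plot=FALSE)

# Group sizes as reported by each call.
b1$n
b2$n
```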
OPTIONAL ARGUMENTS:
- range=
-
controls the strategy for the whiskers and the detached
points beyond the whiskers. See the Details section below.
- width=
-
vector of relative box widths. See also the
varwidth argument. The default is that all widths are the same.
- varwidth=
-
if
TRUE, box widths will be proportional to
the square root of the number of observations for the box.
This is ignored if
width is specified.
- names=
-
vector of names for the groups. If
omitted, names used in labeling the plot will be taken from
the
names attribute of the first list of data.
- plot=
-
if
TRUE, the boxplot will be produced;
otherwise, the calculated summaries of the arguments are
invisibly returned.
- notch=
-
if
TRUE, notched boxes are drawn.
If the notches on two boxes do not overlap, this indicates a difference in
location at a rough 5% significance level.
(NOTE: The
notch parameter is provided primarily for backward compatibility.
See the
confint,
confnotch,
confcol,
confangle and
confdensity
parameters below for more versatile control of the displaying of confidence
intervals.)
- style.bxp=
-
character string or list indicating the style of the boxplot. If
specified as a character string, the string is appended to
"
bxp." to get the name of a dataset which is a list. Component
names of this list should match the names of the parameters below; the
component values serve as the defaults for the corresponding
parameters (i.e., other arguments supplied to the function
override the
style.bxp component values). Standard
style.bxp option values
include
"splus" (new S-PLUS style),
"att" (new AT&T style) and
"old".
- boxwex=
-
Box width expansion. The width of the boxes, along with the width of the
staples (whisker end caps) and outliers (if drawn as lines), are proportional
to this parameter. The default is
0.5, but the
"att" and
"old" styles
set this to
1.
- boxcol=
-
filled box color(s). If one number is supplied, the box will be filled with
the indicated color. If a vector of two non-negative numbers is supplied, the
area below the median will be filled with the first color and the area above
the median will be filled with the second color. A color of
0 can be used
to designate filling with the background color. A specification of
boxcol=-1
is used to designate "no fill" at all. The default is to fill with color
3,
but the
"att" and
"old" styles set this for no filling.
- medchar=
-
logical flag indicating whether to show the median as a plotted character.
This parameter is implicitly set to
TRUE if a
medpch parameter is
supplied. The default is
FALSE, but the
"att" style implicitly
sets the default to
TRUE (by specifying
medpch).
- medpch=
-
median plotting character. Setting this parameter implicitly sets the
medchar parameter to be
TRUE. The special value,
NA, can be
used to indicate the current plotting character (
par("pch")). The
default is
NA, but the
"att" style sets the default to
16 (filled
octagon).
- medline=
-
logical flag indicating whether to show the median as a line across
the box. This parameter is implicitly set to
TRUE if the
medlwd
parameter is supplied. The default is
TRUE, but the
"att" style
sets it to
FALSE.
- medlwd=
-
median line width. Setting this parameter implicitly sets the
medline parameter to
TRUE. The special value,
NA, is used to
indicate the current line width (
par("lwd")). The default is
5,
but the
"old" and
"att" styles set it to
5.
- medcol=
-
the color of the median line or character. The special value,
NA,
indicates the current plotting color (
par("col")). The default is
0
(the background color), but the
"old" and
"att" styles set the
default to
NA.
- confint=
-
if
TRUE, confidence intervals are shown.
If the intervals on two boxes do not overlap, this indicates a difference in
location at a rough 5% significance level. How the confidence intervals are
displayed is determined by the
confnotch,
confcol,
confangle and
confdensity parameters.
- confnotch=
-
confidence interval notch logical flag. If
TRUE, confidence
intervals will be notched. The default is
FALSE, but the
"old"
and
"att" styles set this parameter to
TRUE.
- confcol=
-
confidence interval color. If supplied, confidence intervals will be
filled with the indicated color. The default is
2, but the
"old"
and
"att" styles set it to -1 (no filling).
- confangle=
-
confidence interval hatching angle. If supplied, confidence intervals will
be hatched at the indicated angle, in degrees. If
confdensity
is supplied and
confangle is not,
confangle defaults to
45.
- confdensity=
-
confidence interval hatching density. If supplied, confidence intervals will
be hatched at the indicated density, in lines per inch. If
confangle is
supplied and
confdensity is not,
confdensity defaults to
25.
- whisklty=
-
whisker line type. The special value,
NA, indicates the current
line type (
par("lty")). The default is
2 (dotted line), but the
"old" and
"att" styles set it
to
4 (dashed line).
- staplelty=
-
staple (whisker end cap) line type. The special value,
NA,
indicates the current line type (
par("lty")). The default is
1
(solid line), but the
"att" style sets the default to
4 (dashed
line).
- staplewex=
-
staple width expansion. Proportional to the box width. The default is
1, but the
"old" style sets the default to
0.125.
- staplehex=
-
staple height expansion. Proportional to a standard height of about 1/100th
the height of the plotting area. The default is
1, but the
"old"
style sets the default to
0.
- outchar=
-
logical flag indicating whether to show the outliers as plotted characters.
This parameter is implicitly set to
TRUE if an
outpch parameter is
supplied. The default is
FALSE, but the
"old" style sets it
to
TRUE, and the
"att" style implicitly sets it to
TRUE (by
setting
outpch).
- outpch=
-
outlier plotting character. Setting this parameter implicitly sets
the
outchar parameter to be
TRUE. The special value,
NA,
indicates the current plotting character (
par("pch")). The default
is
NA, but the
"att" style sets the default to
1 (an octagon).
- outline=
-
logical flag indicating whether to show the outliers as horizontal lines.
This parameter is implicitly set to
TRUE if the
outwex parameter is
supplied. The default is
TRUE, but the
"old" and
"att" styles
set it to
FALSE.
- outwex=
-
outlier line width expansion, proportional to the box width. The default
is
1.
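The varwidth rule described above (box widths proportional to the square root of the number of observations) can be checked by hand. The sketch below uses made-up group sizes; dividing by the largest value is only a normalisation chosen for this illustration, not something the function is documented to do.

```r
# Hypothetical group sizes, chosen so the ratios come out evenly.
n <- c(10, 40, 160)

# Widths proportional to sqrt(n), scaled so the widest box has width 1.
rel.width <- sqrt(n) / max(sqrt(n))
rel.width   # 0.25 0.50 1.00

# Hypothetical data with those group sizes.
g1 <- rnorm(10)
g2 <- rnorm(40)
g3 <- rnorm(160)

# Let boxplot derive the widths from the sample sizes.
boxplot(g1, g2, g3, varwidth=TRUE)

# Supply relative widths explicitly; varwidth is ignored in this case.
boxplot(g1, g2, g3, width=rel.width)
```

Quadrupling the sample size doubles the box width, so differences in width understate differences in sample size.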
Graphical parameters may also be supplied as arguments to
this function (see
par).
In addition, the high-level graphics arguments described under
plot.default
and the arguments to
title
may be supplied to this function.
However,
boxplot will always use linear axes:
the
log and
[xy]axt arguments are ignored.
You can apply any transformation to your data before calling
boxplot
with
axes=F and use the
axis function to add an axis labeled to
reflect the transformation.
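A minimal sketch of that workaround, assuming a base-10 log transformation and made-up positive data: the boxes are drawn from the transformed values, and the axis is then labeled in the original units. The tick positions are an arbitrary choice for this example.

```r
# Hypothetical positive, right-skewed data.
x <- c(1.2, 3.5, 8, 15, 40, 120, 300, 950)
y <- c(2, 6, 9, 30, 75, 200, 410, 1500)

# Tick values in the original units, placed at their log10 positions.
ticks <- c(1, 10, 100, 1000)

# Draw the boxes on the transformed scale without any axes.
boxplot(log10(x), log10(y), axes=FALSE)

# Add a vertical axis labeled in the original units, then the frame.
axis(2, at=log10(ticks), labels=as.character(ticks))
box()
```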
VALUE:
If
plot is
TRUE, the function
bxp is invoked with the components listed below, plus
optional
width,
varwidth,
notch, and
style (and associated parameters),
to produce the plot. Note that
bxp returns a vector of box centers.
If
plot is
FALSE, an invisible list with the
components listed below is returned:
- stats
-
matrix (of size
5 by the number of boxes) giving the upper extreme
(excluding outliers),
upper quartile, median, lower quartile, and lower extreme (excluding outliers)
for each box. By default, anything farther than 1.5 times the
Inter-Quartile Range from the nearer quartile is considered an outlier. See the Details
section below and the
range argument above.
- n
-
the number of observations in each group.
- conf
-
matrix (of size
2 by the number of boxes) giving
approximate 95% confidence limits for the
median. The limits are functions of the quartiles, so a few outliers have
little effect on them.
- out
-
optional vector of outlying points (outliers). See the Details
section below.
- group
-
vector giving the box to which each point in
out belongs.
- names
-
names for each box (see argument
names above).
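The components above can be inspected by calling boxplot with plot=FALSE and assigning the result. The data are made up. The hand computation at the end follows the notch rule of McGill et al. (1978), listed in the references; the constant 1.57 and the use of quantile for the quartiles are assumptions of this sketch, so small differences from the conf component are to be expected.

```r
# Hypothetical sample with one clearly extreme value.
x <- c(2, 4, 4, 5, 7, 9, 10, 12, 15, 40)

# Nothing is drawn; the summaries are returned and kept in 's'.
s <- boxplot(x, plot=FALSE)
s$n        # number of observations in the group
s$stats    # five summary values for the box
s$conf     # approximate 95% limits for the median
s$out      # points beyond the whiskers

# Hand computation: median plus or minus 1.57 * IQR / sqrt(n).
# The constant and the quartile rule are assumptions, not taken
# from the function itself.
q <- quantile(x, c(0.25, 0.75))
half <- 1.57 * (q[2] - q[1]) / sqrt(length(x))
c(median(x) - half, median(x) + half)
```

Because the limits depend only on the median, the quartiles, and n, replacing 40 by a much larger value leaves them unchanged.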
SIDE EFFECTS:
If
plot is
TRUE, a plot is created on the current graphics device.
DETAILS:
By default, whiskers are drawn
to the nearest value not beyond a standard span from the quartiles; points
beyond (outliers) are drawn individually. Giving
range=0 forces
whiskers to the full data range. Any positive value of
range
multiplies the standard span by this amount.
The standard span is 1.5*(Inter-Quartile Range).
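The whisker rule above can be reproduced by hand. This is a sketch under stated assumptions: the data are made up, range.arg stands in for the range argument, and quantile is used for the quartiles, which may differ slightly from the quartile rule the function uses internally.

```r
# Hypothetical sample with one clearly extreme value.
x <- c(2, 4, 4, 5, 7, 9, 10, 12, 15, 40)

# Stand-in for the range= argument; 1 is the default in the usage above.
range.arg <- 1

# Quartiles (the exact quartile rule is an assumption of this sketch).
q <- quantile(x, c(0.25, 0.75))

# Standard span is 1.5 * IQR; range multiplies it.
span <- range.arg * 1.5 * (q[2] - q[1])

# Whiskers go to the nearest data values not beyond the span.
inside <- x[x >= q[1] - span & x <= q[2] + span]
whiskers <- c(min(inside), max(inside))

# Everything beyond the span is drawn individually.
outliers <- x[x < q[1] - span | x > q[2] + span]

whiskers   # 2 15
outliers   # 40
```

With range=0 this computation is skipped and the whiskers are simply min(x) and max(x).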
BACKGROUND:
Boxplots have proven to be quite a good exploratory tool, especially
when several boxplots are placed side by side for comparison.
The most striking visual feature is the box which shows the limits of
the middle half of the data (the line inside the box represents the median).
Extreme points are also highlighted.
Boxplots not only show the location and spread of data but indicate
skewness, as well.
REFERENCES:
Hoaglin, D. C., Mosteller, F., and Tukey, J. W., editors (1983).
Understanding Robust and Exploratory Data Analysis.
New York: Wiley.
McGill, R., Tukey, J. W., and Larsen, W. A. (1978).
Variations of box plots.
The American Statistician,
32, 12-16.
Tukey, J. W. (1990).
Data-based graphics: visual display in the decades to come.
Statistical Science
5, 327-339.
Velleman, P. F. and Hoaglin, D. C. (1981).
Applications, Basics, and Computing of Exploratory Data Analysis.
Boston: Duxbury.
SEE ALSO:
boxes, symbols, bxp, hist, stem, dotchart, par, title.
EXAMPLES:
boxplot(lottery.payoff, lottery2.payoff, lottery3.payoff)
attach(market.frame)
boxplot(split(income, age), varwidth=TRUE, notch=TRUE)
boxplot(
split(lottery.payoff, lottery.number%/%100),
main="NJ Pick-it Lottery (5/22/75-3/16/76)",
sub="Leading Digit of Winning Numbers",
ylab="Payoff")