Boxplots

DESCRIPTION:

Produces side-by-side boxplots from a number of vectors. The boxplots can be made to display the variability of the median, and can have variable widths to represent differences in sample size.

USAGE:

boxplot(..., range=1.0, width=<<see below>>, varwidth=F,  
      names=<<see below>>, plot=T, notch=F, style.bxp=list(),  
      boxwex=.5, boxcol=3, medchar=F, medpch=NA, medline=T, medlwd=5,  
      medcol=0, confint=F, confcol=2, confangle=45, confdensity=25,  
      confnotch=F, whisklty=2, staplelty=1, staplewex=1, staplehex=1, 
      outchar=F, outpch=NA, outline=T, outwex=1) 

REQUIRED ARGUMENTS:

...
vectors or lists containing numeric components (e.g., the output of split). All other arguments must be given in name=value form, and their names cannot be abbreviated. Missing values (NA) are allowed.

OPTIONAL ARGUMENTS:

range=
controls the strategy for the whiskers and the detached points beyond the whiskers. See the Details section below.
width=
vector of relative box widths. See also the varwidth argument. By default, all boxes have the same width.
varwidth=
if TRUE, box widths will be proportional to the square root of the number of observations for the box. This is ignored if width is specified.
names=
vector of names for the groups. If omitted, the names used to label the plot are taken from the names attribute of the first list of data.
plot=
if TRUE, the boxplot will be produced; otherwise, the calculated summaries of the arguments are invisibly returned.
notch=
if TRUE, notched boxes are drawn. If the notches on two boxes do not overlap, this indicates a difference in location at roughly the 5% significance level. (NOTE: The notch parameter is provided primarily for backward compatibility. See the confint, confnotch, confcol, confangle and confdensity parameters below for more versatile control over how confidence intervals are displayed.)
style.bxp=
character string or list indicating the style of the boxplot. If specified as a character string, the string is appended to "bxp." to get the name of a dataset which is a list (for example, style.bxp="att" selects the list bxp.att). Component names of this list should match the names of the parameters below. The component values serve as the defaults for the corresponding parameters (i.e., other arguments supplied to the function override the style.bxp component values). Standard style.bxp values include "splus" (new S-PLUS style), "att" (new AT&T style) and "old".
boxwex=
Box width expansion. The width of the boxes, along with the width of the staples (whisker end caps) and outliers (if drawn as lines), are proportional to this parameter. The default is 0.5, but the "att" and "old" styles set this to 1.
boxcol=
filled box color(s). If one number is supplied, the box is filled with the indicated color. If a vector of two non-negative numbers is supplied, the area below the median is filled with the first color and the area above the median is filled with the second color. A color of 0 fills with the background color, and boxcol=-1 means no fill at all. The default is to fill with color 3, but the "att" and "old" styles set it to no filling.
medchar=
logical flag indicating whether to show the median as a plotted character. This parameter is implicitly set to TRUE if a medpch parameter is supplied. The default is FALSE, but the "att" style implicitly sets the default to TRUE (by specifying medpch).
medpch=
median plotting character. Setting this parameter implicitly sets the medchar parameter to TRUE. The special value NA indicates the current plotting character (par("pch")). The default is NA, but the "att" style sets the default to 16 (filled octagon).
medline=
logical flag indicating whether to show the median as a line across the box. This parameter is implicitly set to TRUE if the medlwd parameter is supplied. The default is TRUE, but the "att" style sets it to FALSE.
medlwd=
median line width. Setting this parameter implicitly sets the medline parameter to TRUE. The special value NA indicates the current line width (par("lwd")). The default is 5, but the "old" and "att" styles set it to 5.
medcol=
the color of the median line or character. The special value NA indicates the current plotting color (par("col")). The default is 0 (the background color), but the "old" and "att" styles set the default to NA.
confint=
if TRUE, confidence intervals are shown. If the intervals on two boxes do not overlap, this indicates a difference in location at roughly the 5% significance level. How the confidence intervals are displayed is determined by the confnotch, confcol, confangle and confdensity parameters.
confnotch=
confidence interval notch logical flag. If TRUE, confidence intervals will be notched. The default is FALSE, but the "old" and "att" styles set this parameter to TRUE.
confcol=
confidence interval color. If supplied, confidence intervals will be filled with the indicated color. The default is 2, but the "old" and "att" styles set it to -1 (no filling).
confangle=
confidence interval hatching angle. If supplied, confidence intervals will be hatched at the indicated angle, in degrees. If confdensity is supplied and confangle is not, confangle defaults to 45.
confdensity=
confidence interval hatching density. If supplied, confidence intervals will be hatched at the indicated density, in lines per inch. If confangle is supplied and confdensity is not, confdensity defaults to 25.
whisklty=
whisker line type. The special value NA indicates the current line type (par("lty")). The default is 2 (dotted line), but the "old" and "att" styles set it to 4 (dashed line).
staplelty=
staple (whisker end cap) line type. The special value NA indicates the current line type (par("lty")). The default is 1 (solid line), but the "att" style sets the default to 4 (dashed line).
staplewex=
staple width expansion. Proportional to the box width. The default is 1, but the "old" style sets the default to 0.125.
staplehex=
staple height expansion. Proportional to a standard height of about 1/100th the height of the plotting area. The default is 1 but the "old" style sets the default to 0.
outchar=
logical flag indicating whether to show the outliers as plotted characters. This parameter is implicitly set to TRUE if an outpch parameter is supplied. The default is FALSE, but the "old" style sets it to TRUE, and the "att" style implicitly sets it to TRUE (by setting outpch).
outpch=
outlier plotting character. Setting this parameter implicitly sets the outchar parameter to TRUE. The special value NA indicates the current plotting character (par("pch")). The default is NA, but the "att" style sets the default to 1 (an octagon).
outline=
logical flag indicating whether to show the outliers as horizontal lines. This parameter is implicitly set to TRUE if the outwex parameter is supplied. The default is TRUE, but the "old" and "att" styles set it to FALSE.
outwex=
outlier line width expansion, proportional to the box width. The default is 1.
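With varwidth=TRUE, box widths are proportional to the square root of each group's size. A group with four times as many observations therefore gets a box twice as wide. The Python sketch below (not S-PLUS code) illustrates only that proportionality. relative_widths is a hypothetical helper. Scaling the widest box to boxwex is an assumption made for illustration, not documented S-PLUS behavior.

```python
import math


def relative_widths(group_sizes, boxwex=0.5):
    """Return one width per group, proportional to sqrt(group size).

    Assumption: widths are scaled so the largest group's box is boxwex wide;
    only the sqrt(n) proportionality comes from the varwidth description.
    """
    roots = [math.sqrt(n) for n in group_sizes]
    largest = max(roots)
    return [boxwex * r / largest for r in roots]


# A group with 4x the observations gets a box twice as wide.
print(relative_widths([25, 100]))  # [0.25, 0.5]
```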

Graphical parameters may also be supplied as arguments to this function (see par). In addition, the high-level graphics arguments described under plot.default and the arguments to title may be supplied to this function. However, boxplot always uses linear axes: the log and [xy]axt arguments are ignored. To display transformed data, apply the transformation before calling boxplot with axes=F, then use the axis function to add an axis labeled to reflect the transformation.
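The style.bxp mechanism in the argument list works as a layered lookup. Built-in defaults are overridden by the style list's components, and those are overridden by arguments given explicitly. Parameters such as medpch and outpch also switch on their paired flags. The Python sketch below (not S-PLUS code) models this with plain dictionaries. DEFAULTS, STYLES, IMPLIES and resolve_style are hypothetical names. Only a few components are shown, with values copied from the argument descriptions above. Applying the implied flags unconditionally is an assumption.

```python
# Hypothetical Python model of style.bxp resolution; not S-PLUS code.
# None stands in for NA ("use the current par() setting").
DEFAULTS = {"boxwex": 0.5, "boxcol": 3, "medchar": False, "medpch": None,
            "medline": True, "medlwd": 5, "outchar": False, "outpch": None,
            "outline": True, "outwex": 1}

# Stand-in for the bxp.att dataset: a few components whose "att" values are
# given in the argument descriptions (boxcol=-1 means no fill).
STYLES = {"att": {"boxwex": 1, "boxcol": -1, "medpch": 16,
                  "medline": False, "outpch": 1, "outline": False}}

# Supplying the key parameter implicitly sets the paired flag to TRUE.
IMPLIES = {"medpch": "medchar", "medlwd": "medline",
           "outpch": "outchar", "outwex": "outline"}


def resolve_style(style=None, **explicit):
    """Merge defaults < style components < explicit arguments."""
    if isinstance(style, str):
        style = STYLES[style]  # S-PLUS would look up the dataset "bxp." + style
    supplied = {**(style or {}), **explicit}  # explicit arguments win
    params = {**DEFAULTS, **supplied}
    for key, flag in IMPLIES.items():
        if key in supplied:  # only supplied values trigger the implied flag
            params[flag] = True
    return params


att = resolve_style("att")
print(att["medchar"], att["medline"], att["outchar"])  # True False True
print(resolve_style("att", boxwex=0.75)["boxwex"])      # 0.75
```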

VALUE:

If plot is TRUE, the function bxp is invoked with the components listed below, plus the optional width, varwidth, notch, and style arguments (and associated parameters), to produce the plot. Note that bxp returns a vector of box centers.

If plot is FALSE, an invisible list is returned with the following components:

stats
matrix (of size 5 by the number of boxes) giving the upper extreme (excluding outliers), upper quartile, median, lower quartile, and lower extreme (excluding outliers) for each box. By default, any point lying more than 1.5 times the interquartile range below the lower quartile or above the upper quartile is considered an outlier. See the Details section below and the range argument above.
n
the number of observations in each group.
conf
matrix (of size 2 by the number of boxes) giving approximate 95% confidence limits for the median. The limits are functions of the quartiles, so a few outliers have little effect on them.
out
optional vector of outlying points (outliers). See the Details section below.
group
vector giving the box to which each point in out belongs.
names
names for each box (see argument names above).
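The conf component is described only as a function of the quartiles. McGill, Tukey and Larsen (1978) give a common rule that places the limits at median ± 1.57 × IQR / √n. Two non-overlapping intervals then suggest a difference in location, as the notch and confint descriptions say. The Python sketch below (not S-PLUS code) assumes that rule and Tukey's hinges as the quartiles. Whether S-PLUS uses exactly this constant and quartile definition is an assumption. hinges, median_conf and notches_overlap are hypothetical helpers.

```python
import math
import statistics


def hinges(xs):
    """Lower and upper quartiles as Tukey's hinges (assumed definition)."""
    xs = sorted(xs)
    half = (len(xs) + 1) // 2  # each half includes the median when n is odd
    return statistics.median(xs[:half]), statistics.median(xs[-half:])


def median_conf(values, k=1.57):
    """Approximate 95% limits for the median: median +/- k * IQR / sqrt(n)."""
    xs = sorted(values)
    lower_q, upper_q = hinges(xs)
    half_width = k * (upper_q - lower_q) / math.sqrt(len(xs))
    med = statistics.median(xs)
    return med - half_width, med + half_width


def notches_overlap(a, b):
    """True if the two median intervals overlap (no clear difference)."""
    lo_a, hi_a = median_conf(a)
    lo_b, hi_b = median_conf(b)
    return lo_a <= hi_b and lo_b <= hi_a


group1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
group2 = [x + 20 for x in group1]
print(median_conf(group1))              # about (3.02, 7.98)
print(notches_overlap(group1, group2))  # False
```

The outlier 100 in group1 barely moves the interval, because only the median, the quartiles and n enter the formula.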

SIDE EFFECTS:

If plot is TRUE, a plot is created on the current graphics device.

DETAILS:

By default, whiskers are drawn to the nearest value not beyond a standard span from the quartiles; points beyond (outliers) are drawn individually. Giving range=0 forces whiskers to the full data range. Any positive value of range multiplies the standard span by this amount. The standard span is 1.5*(Inter-Quartile Range).
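The whisker rule above can be sketched in Python (not S-PLUS code). Points beyond range × 1.5 × IQR from the quartiles become outliers, each whisker stops at the most extreme remaining value, and range=0 stretches the whiskers to the data extremes. box_stats and hinges are hypothetical helpers. The quartiles are computed as Tukey's hinges, which is an assumption because this page does not say which quartile definition S-PLUS uses. The parameter is named range_ only because range is a Python built-in.

```python
import statistics


def hinges(xs):
    """Lower and upper quartiles as Tukey's hinges (assumed definition)."""
    xs = sorted(xs)
    half = (len(xs) + 1) // 2  # each half includes the median when n is odd
    return statistics.median(xs[:half]), statistics.median(xs[-half:])


def box_stats(values, range_=1.0):
    """Box summary plus outliers under the range= whisker rule.

    Returns (stats, outliers); stats follows the order of one column of the
    stats component: upper extreme, upper quartile, median, lower quartile,
    lower extreme. None stands in for NA and is dropped.
    """
    xs = sorted(v for v in values if v is not None)
    lower_q, upper_q = hinges(xs)
    if range_ == 0:
        low_fence, high_fence = xs[0], xs[-1]  # whiskers span all the data
    else:
        span = range_ * 1.5 * (upper_q - lower_q)  # standard span x range
        low_fence, high_fence = lower_q - span, upper_q + span
    inside = [v for v in xs if low_fence <= v <= high_fence]
    outliers = [v for v in xs if v < low_fence or v > high_fence]
    stats = (max(inside), upper_q, statistics.median(xs), lower_q, min(inside))
    return stats, outliers


data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
print(box_stats(data))            # ((9, 8, 5.5, 3, 1), [100])
print(box_stats(data, range_=0))  # ((100, 8, 5.5, 3, 1), [])
```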

BACKGROUND:

Boxplots have proven to be a good exploratory tool, especially when several boxplots are placed side by side for comparison. The most striking visual feature is the box, which shows the limits of the middle half of the data (the line inside the box represents the median). Extreme points are also highlighted. Boxplots show not only the location and spread of the data but also its skewness.

REFERENCES:

Hoaglin, D. C., Mosteller, F., and Tukey, J. W., editors (1983). Understanding Robust and Exploratory Data Analysis. New York: Wiley.

McGill, R., Tukey, J. W., and Larsen, W. A. (1978). Variations of box plots. The American Statistician, 32, 12-16.

Tukey, J. W. (1990). Data-based graphics: visual display in the decades to come. Statistical Science, 5, 327-339.

Velleman, P. F. and Hoaglin, D. C. (1981). Applications, Basics, and Computing of Exploratory Data Analysis. Boston: Duxbury.

SEE ALSO:

boxes, symbols, bxp, hist, stem, dotchart, par, title.

EXAMPLES:

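# side-by-side boxplots of three payoff vectors
boxplot(lottery.payoff, lottery2.payoff, lottery3.payoff) 
attach(market.frame) 
# income grouped by age; widths proportional to sqrt(n), notched boxes
boxplot(split(income, age), varwidth=TRUE, notch=TRUE) 
# payoff grouped by the leading digit of the three-digit winning number
boxplot( 
      split(lottery.payoff, lottery.number%/%100), 
      main="NJ Pick-it Lottery (5/22/75-3/16/76)", 
      sub="Leading Digit of Winning Numbers", 
      ylab="Payoff")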