The function lm() returns an Splus object called a fitted linear
least-squares model
object, or lm object for short:lm(formula, data)
where formula is the structural formula that specifies the model and data is the data frame containing the data (omitted if the data is not stored in a data frame).
> fuel.fit_lm(100/Mileage ~ Weight + Disp., car.test.frame) > fuel.fit Call: lm(formula=100/Mileage~Weight + Disp.,data=car.test.frame) Coefficients: (Intercept) Weight Disp. 0.4789733 0.001241421 0.0008543589 Degrees of freedom: 60 total; 57 residual Residual standard error: 0.3900812A list of the components of an lm object and of the arguments which can be passed through the lm() function can be found at the end of this section.
The lm object fuel.fit can be used as an argument to other functions to obtain summaries of the data or to extract specific information about the model. The function summary() gives a more detailed description of the model than the one obtained by printing out the lm object. The components of an lm.summary object are listed at the end of this section.
> fuel.summary_summary(fuel.fit)
> fuel.summary
Call:
lm(formula=100/Mileage~Weight + Disp.,data = car.test.frame)
Residuals:
Min 1Q Median 3Q Max
-0.8109 -0.2559 0.01971 0.2673 0.9812
Coefficients:
Value Std. Error t value Pr(>|t|)
(Intercept) 0.4790 0.3418 1.4014 0.1665
Weight 0.0012 0.0002 7.2195 0.0000
Disp. 0.0009 0.0016 0.5427 0.5895
Residual standard error: 0.3901 on 57 degrees of freedom
Multiple R-Squared: 0.7438
F-statistic: 82.75 on 2 and 57 degrees of freedom,
the p-value is 0
Correlation of Coefficients:
(Intercept) Weight
Weight -0.8968
Disp. 0.4720 -0.8033
Specific information about the model can be printed using the functions coef(),
resid() and fitted(). The same information can be obtained using fuel.fit$coef,
fuel.fit$resid, and fuel.fit$fitted.
> coef(fuel.fit)
(Intercept) Weight Disp.
0.4789733 0.001241421 0.0008543589
> formula(fuel.fit2)
100/Mileage ~ Weight + Disp. + Type
* the formula() function extracts the model
formula from the lm object fuel.fit2
The update() function is used to modify the model formula used in fuel.fit:
> fuel.fit2_lm(update(fuel.fit, . ~ . + Type))
> fuel.fit2
Call:
lm(formula = update(fuel.fit, . ~ . + Type))
Coefficients:
(Intercept) Weight Disp. Type1 Type2
2.960617 4.16396e-05 0.007594073 -0.1452804 0.09808411
Type3 Type4 Type5
-0.1240942 -0.04358047 0.1915062
Degrees of freedom: 60 total; 52 residual
Residual standard error: 0.3139246
By default, contrasts used for factors are the Helmert parametrization.
Here, the coefficient for Type 1 corresponds to the contrast between
the first two factors, the coefficient for Type 2 corresponds to the
contrast between the third factor and the mean of the first two, etc. To
view the contrast matrix, type fuel.fits2$contrasts. The type of
contrast used can be changed by specifying a different method as an argument
to the options() function, or bychoosing a contrast in the model
formula using the C() function. The four choices are: helmert, poly, sum,
and treatment.> fuel.fit2_lm(update(fuel.fit, . ~ . + C(Type, sum)))
* the first argument to the C() function
specifies the factor and the second
argument specifies the contrasts.
> X11();par(mfrow=c(2,1));plot(fuel.fit)
* the plot() function has a default method
for lm objects
* two plots are produced: one of the response
against the fitted values, with the y=x line
superimposed, the other of the absolute value
of the residuals against their fitted values
Suppose we have another data frame called new.cars for which we
want to predict fuel consumption using the model from fuel.fit.
This is provided by the function predict().
> predict(fuel.fit, new.cars)
* all the variables used in the model fuel.fit
must also be in the data frame new.cars
Quantity Expression Meaning
Standardized e/(s*(1-h)^.5) Residuals with equal variance
Residuals
Studentized e/si*(1-h)^.5) Use si as standard error
Residuals
DFBETAS bi/(si %o% xxi^.5) The change in the coefficients,
scaled by the standard error
for the coefficients
DFFIT h*e/(1-h) The change in the fitted value
when that observation is dropped
DFFITS h^.5*e/(si*(1-h)) Change in fitted values,
standardized to variance 1
where the objects are extracted from an lm object (lmf) as described below:
> lms_summary(lmf) > lmi_lm.influence(lmf) > e_residuals(lmf) > s_lms$sigma > si_lmi$sigma > xxi_diag(lms$cov.unscaled) > h_lmi$hat > bi_coef(lmf) - coef(lmi)Some of these quantities can be obtained from the function ls.diag(). The function ls.diag() takes as its argument a list like the output of lsfit(). The function lsfit() is similar to the lm() function except that the formula is not specified using a formula object, but rather the model and response matrices are passed on as arguments to the function. These matrices can be created from the variables or extracted from an lm object.
> fuel.model_lm(fuel.fit,x=T,y=T)
* an lm object is constructed which
saves the model matrix and the
response
> fuel.ls_lsfit(fuel.model$x, fuel.model$y, intercept=F)
* the model matrix and the response
are then passed on to the lsfit()
function
* the argument intercept=F is
specified because the intercept is
already included in the model matrix
* weights can also be specified using
the argument wt=
> ls.print(fuel.ls)
* ls.print() is similar to summary()
Residual Standard Error = 0.3901, Multiple R-Square = 0.7438
N = 60, F-statistic = 82.7518 on 2 and 57 df, p-value = 0
coef std.err t.stat p.value
Intercept 0.4790 0.3418 1.4014 0.1665
Weight 0.0012 0.0002 7.2195 0.0000
Disp. 0.0009 0.0016 0.5427 0.5895
> fuel.diag_ls.diag(fuel.ls)
* ls.diag() computes regression diagnostics
* the function returns a list with components:
std.dev residual standard deviation
hat vector containing the diagonal of the hat matrix
std.res standardized residuals
stud.res studentized residuals
cooks Cook's distance for each observation
dfits change in fitted value when each observation is deleted
correlation correlation matrix for the parameter estimates
std.err standard errors of the parameter estimates
cov.unscaled unscaled covariance matrix for the parameter estimates
An lm.object is a list with components (eg.: fuel.fit$coefficients):
coefficients
residuals
fitted.values
effects orthogonal, single-degree-of-freedom effects
R triangular facor of the decomposition
rank the computed rank
assign list of assignments of coefficients (and effects) to the
terms of the model
terms an object summarizing the formula
call call that produced the object
contrasts a list with matrices or vectors of contrasts used to code
factors
df.residual number of degrees of freedom for residuals
R.assign same as assign, for an over-specified model (in terms of
full-rank part of model)
assign.residuals vector identifying effects assigned to residuals
qr qr decomposition object (optional)
model model frame (optional)
x model matrix (optional)
y response (optional)
The following is a list of other possible arguments to the function lm(), with the
default action when not specified, if applicable:
weights= * vector of observation weights
subset= * expression specifying rows to be used
* may be logical, numeric (row numbers),
or character (row names)
na.action=na.fail * a function to filter missing values
* default creates an error message,
a possible alternative is na.omit
which deletes observations with
missing values
method="qr" * least squares fitting method to be used
* method="model.frame" returns the model.frame
model=F * if T returns the model frame component model
x=F * if T the model matrix is returned in component x
y=F * if T the response is returned in component y
contrasts=NULL * a list giving contrasts for all or some of the
factors appearing in the model formula
* the elements of the list should be a contrast
matrix, or a function to compute such a matrix
The function summary computes a list with the following components:
correlation correlation coefficient matrix for the coefficients
cov.unscaled unscaled covariance matrix
df degrees of freedom for the model and for the residuals
coefficients a matrix with three columns: coefficients, standard errors,
t-statistics
r.squared the multiple R-squared statistic
fstatistic numeric vector of length three giving the F test
- the first element is the statistic, the last two are the df
residuals the model residuals (weighted if applicable)
sigma residual standard error estimate
terms the terms object used in fitting this model