10 Shrinkage Estimation
Shrinkage estimation is a key technique for high-dimensional regression analysis: it makes it possible to estimate regression models with more regressors than observations.
10.1 Mean squared error
The key measure of estimation accuracy is the mean squared error (MSE). The MSE of an estimator \widehat \theta for a parameter \theta is MSE(\widehat \theta) = E[(\widehat \theta - \theta)^2]. The MSE can be decomposed into the variance plus squared bias: MSE(\widehat \theta) = \underbrace{E[(\widehat \theta - E[\widehat \theta])^2]}_{= Var[\widehat \theta]} + \underbrace{(E[\widehat \theta] - \theta)^2}_{= Bias(\widehat \theta)^2}
Proof
Proof. Subtracting and adding E[\widehat \theta] gives \begin{align*} &(\widehat \theta - \theta)^2 = (\widehat \theta - E[\widehat \theta] + E[\widehat \theta] - \theta)^2 \\ &= (\widehat \theta - E[\widehat \theta])^2 + 2(\widehat \theta - E[\widehat \theta])(\underbrace{E[\widehat \theta] - \theta}_{Bias(\widehat \theta)}) + \underbrace{(E[\widehat \theta] - \theta)^2}_{=Bias(\widehat \theta)^2}. \end{align*} The middle term is zero after taking the expectation: \begin{align*} E[(\widehat \theta - \theta)^2] = \underbrace{E[(\widehat \theta - E[\widehat \theta])^2]}_{=Var[\widehat \theta]} + 2 \underbrace{E[\widehat \theta - E[\widehat \theta]]}_{=0} Bias(\widehat \theta) + Bias(\widehat \theta)^2. \end{align*} \square
For instance, consider an i.i.d. sample X_1, \ldots, X_n with population mean E[X_i] = \mu and variance Var[X_i] = \sigma^2. Let’s study the sample mean \widehat \mu = \frac{1}{n} \sum_{i=1}^n X_i as an estimator of \mu. You will find that E[\widehat \mu] = \mu, \quad Var[\widehat \mu] = \frac{\sigma^2}{n}.
Proof
Proof. By the linearity of the expectation, we have E[\widehat \mu] = \frac{1}{n}\sum_{i=1}^n \underbrace{E[X_i]}_{\mu} = \mu. The independence of X_1, \ldots, X_n implies Var[\widehat \mu] = \frac{1}{n^2} Var\bigg[\sum_{i=1}^n X_i\bigg] = \frac{1}{n^2} \sum_{i=1}^n Var[X_i] = \frac{\sigma^2}{n} \square
The sample mean is unbiased for \mu, i.e., Bias(\widehat \mu) = E[\widehat \mu] - \mu = 0. The MSE equals its variance: MSE(\widehat \mu) = \frac{\sigma^2}{n}.
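These two properties are easy to check by simulation. A minimal sketch in R (the normal distribution, \mu = 1, \sigma = 2, n = 100, and the number of replications are chosen purely for illustration):

## Monte Carlo check of E[muhat] = mu and Var[muhat] = sigma^2/n
set.seed(123)
n = 100; mu = 1; sigma = 2
muhat = replicate(10000, mean(rnorm(n, mean = mu, sd = sigma)))
mean(muhat)   # should be close to mu = 1
var(muhat)    # should be close to sigma^2/n = 0.04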
The sample mean is the best linear unbiased estimator of the population mean, but there exist estimators with a lower MSE if we allow for a small bias.
10.2 A simple shrinkage estimator
Let us shrink our sample mean a bit towards 0 and define the alternative estimator \widetilde \mu = (1-w)\widehat \mu, \quad w \in [0,1]. Setting the shrinkage weight to w = 0 gives \widetilde \mu = \widehat \mu (no shrinkage) and w = 1 gives \widetilde \mu = 0 (full shrinkage). Our shrinkage estimator has the bias Bias(\widetilde \mu) = E[(1-w)\widehat \mu] - \mu = (1-w)\underbrace{E[\widehat \mu]}_{= \mu} - \mu = -w\mu. The variance is Var[\widetilde \mu] = Var[(1-w)\widehat \mu] = (1-w)^2Var[\widehat \mu] = (1-w)^2 \frac{\sigma^2}{n}, and the MSE is MSE(\widetilde \mu) = Var[\widetilde \mu] + Bias(\widetilde \mu)^2 = (1-w)^2 \frac{\sigma^2}{n} + w^2 \mu^2.
The optimal weight in terms of the MSE is w^* = \frac{1}{1+n\mu^2/\sigma^2}
Proof
Proof. We take the derivative of MSE(\widetilde \mu) with respect to w to obtain the first order condition: -2(1-w)\sigma^2/n + 2w\mu^2 = 0. Solving for w gives w(1+n\mu^2/\sigma^2) = 1. Then, w^* is the global minimum because the second derivative is 2\sigma^2/n+2\mu^2 > 0. \square
For instance, if \mu = 1, \sigma^2 = 1, and n=99, we have w^* = 0.01.
The shrunk sample mean \widetilde \mu^* = (1-w^*) \widehat \mu = \frac{n \mu^2/\sigma^2}{1 + n \mu^2/\sigma^2} \frac{1}{n} \sum_{i=1}^n X_i has a lower MSE than the usual sample mean: MSE(\widetilde \mu^*) = (1-w^*)^2\frac{\sigma^2}{n} + (w^*)^2 \mu^2 < \frac{\sigma^2}{n} = MSE(\widehat \mu). This is a remarkable result because it tells us that the sample mean is not the best we can do in the MSE sense when estimating a population mean. The shrunken estimator is more efficient. It is biased, but the bias vanishes asymptotically since \lim_{n\to \infty} w^* = 0.
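The gain can be illustrated with a small simulation. A minimal sketch (it plugs the true \mu and \sigma^2 into w^*, which, as discussed next, is not feasible in practice; all numbers are illustrative):

## Monte Carlo comparison: sample mean vs. optimally shrunk sample mean
set.seed(123)
n = 99; mu = 1; sigma2 = 1
wstar = 1 / (1 + n * mu^2 / sigma2)    # optimal (infeasible) weight, here 0.01
muhat = replicate(10000, mean(rnorm(n, mean = mu, sd = sqrt(sigma2))))
mutilde = (1 - wstar) * muhat
mean((muhat - mu)^2)     # approximately sigma2/n = 0.0101
mean((mutilde - mu)^2)   # slightly smaller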
The optimal shrinkage parameter w^* is infeasible because \mu^2/\sigma^2 is unknown. While insightful theoretically, this result is not directly applicable in empirical work, and taking sample means is still recommended.
However, the shrinkage principle can be very useful in the context of high-dimensional regression.
10.3 High-dimensional regression
Least squares regression works well when the number of regressors k is small relative to the number of observations n. In a previous section on “too many regressors”, we discussed how ordinary least squares (OLS) can overfit when k is too large compared to n. Specifically, if k=n, the OLS regression line perfectly fits the data.
Many economic applications involve categorical variables that are transformed into a large number of dummy variables. If we include pairwise interaction terms among J variables, we get another \sum_{i=1}^{J-1} i = J (J-1)/2 regressors (for example, 190 for J=20 and 4950 for J=100).
Accounting for further nonlinearities by adding squared and cubic terms or higher-order interactions can result in thousands or even millions of regressors. Many of these regressors may provide low informational value, but it is difficult to determine a priori which are relevant and which are irrelevant.
If k>n, the OLS estimator is not uniquely defined because \boldsymbol X' \boldsymbol X does not have full rank. If k \approx n the matrix \boldsymbol X' \boldsymbol X can be near singular, resulting in numerically unstable OLS coefficients or high variance.
For the vector-valued (k-variate) estimator \widehat{\boldsymbol \beta}_{ols}, the (conditional) MSE matrix is \begin{align*} MSE(\widehat{\boldsymbol \beta}_{ols}|\boldsymbol X) &= E[(\widehat{\boldsymbol \beta}_{ols} - \boldsymbol \beta)(\widehat{\boldsymbol \beta}_{ols} - \boldsymbol \beta)'|\boldsymbol X] \\ &= Var[\widehat{\boldsymbol \beta}_{ols}|\boldsymbol X] + Bias(\widehat{\boldsymbol \beta}_{ols}|\boldsymbol X) \big(Bias(\widehat{\boldsymbol \beta}_{ols}|\boldsymbol X)\big)', \end{align*} where, under random sampling, OLS is unbiased: Bias(\widehat{\boldsymbol \beta}_{ols}|\boldsymbol X) = E[\widehat{\boldsymbol \beta}_{ols}|\boldsymbol X] - \boldsymbol \beta = \boldsymbol 0. Consequently, the MSE of OLS equals its variance: MSE(\widehat{\boldsymbol \beta}_{ols}|\boldsymbol X) = Var[\widehat{\boldsymbol \beta}_{ols}|\boldsymbol X] = (\boldsymbol X' \boldsymbol X)^{-1} \boldsymbol X' \boldsymbol D \boldsymbol X (\boldsymbol X' \boldsymbol X)^{-1}.
10.4 Ridge Regression
To prevent (\boldsymbol X'\boldsymbol X)^{-1} from becoming unstable or undefined when k is large, we can introduce a shrinkage parameter \lambda and define the ridge regression estimator \widehat{\boldsymbol \beta}_{ridge} = (\boldsymbol X' \boldsymbol X + \lambda \boldsymbol I_k)^{-1} \boldsymbol X' \boldsymbol Y. \qquad \quad \tag{10.1}
This estimator is well defined and does not suffer from multicollinearity problems, even if k > n. The inverse (\boldsymbol X' \boldsymbol X + \lambda \boldsymbol I_k)^{-1} exists as long as \lambda > 0. For \lambda = 0, the ridge estimator coincides with the OLS estimator.
While the OLS estimator is motivated from the minimization problem \min_{\boldsymbol \beta} (\boldsymbol Y-\boldsymbol X \boldsymbol \beta)'(\boldsymbol Y-\boldsymbol X \boldsymbol \beta), the ridge estimator is the minimizer of \min_{\boldsymbol \beta} (\boldsymbol Y-\boldsymbol X \boldsymbol \beta)'(\boldsymbol Y-\boldsymbol X \boldsymbol \beta) + \lambda \boldsymbol \beta'\boldsymbol \beta. \qquad \quad \tag{10.2}
The second term penalizes large values of \boldsymbol \beta, so that for \lambda > 0 the solution is shrunk towards zero.
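A minimal sketch of Equation 10.1 computed by hand on simulated data (the design, the true coefficients, and \lambda = 2 are illustrative):

## Ridge coefficients via the closed-form expression (10.1)
set.seed(1)
n = 50; k = 5
X = matrix(rnorm(n * k), n, k)
Y = X %*% c(1, 2, 0, 0, -1) + rnorm(n)
lambda = 2
beta.ridge = solve(t(X) %*% X + lambda * diag(k), t(X) %*% Y)
beta.ols   = solve(t(X) %*% X, t(X) %*% Y)   # lambda = 0 corresponds to OLS
cbind(beta.ridge, beta.ols)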
10.5 Standardization
It is common practice to standardize the regressors in ridge regression: \widetilde X_{ij} = \frac{X_{ij} - \overline{\boldsymbol X}_j}{\sqrt{\frac{1}{n-1} \sum_{i=1}^n (X_{ij} - \overline{\boldsymbol X}_j)^2}}, \quad \overline{\boldsymbol X}_j = \frac{1}{n} \sum_{i=1}^n X_{ij}
Without standardization, the penalty term \lambda \boldsymbol \beta'\boldsymbol \beta = \lambda \sum_{j=1}^k \beta_j^2 depends on the units in which the regressors are measured. A variable on a large scale (large variance) needs only a small coefficient and therefore contributes little to the penalty, whereas a variable on a small scale needs a large coefficient and is penalized heavily. As a result, some variables are effectively under-penalized and others over-penalized purely because of their scale.
Standardization ensures that each variable contributes equally to the penalty term, making the penalty independent of the scale of the variables.
Standardizing makes the coefficient estimates more interpretable, as they will all be on the same scale, which helps in understanding the relative importance of each variable.
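In R, this standardization can be carried out with the scale() function, which centers each column by its sample mean and divides by its standard deviation (with the n-1 denominator). A minimal sketch on simulated data (the two columns and their scales are illustrative); note that glmnet(), used later in this chapter, standardizes internally by default:

## Manual standardization of a regressor matrix
set.seed(1)
X = cbind(rnorm(100, mean = 5, sd = 10), runif(100))  # two very different scales
X.std = scale(X)            # center by column means, scale by column sds
colMeans(X.std)             # (numerically) zero
apply(X.std, 2, sd)         # exactly one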
10.6 Ridge Properties
The bias of the ridge estimator is Bias(\widehat{\boldsymbol \beta}_{ridge}|\boldsymbol X) = - \lambda (\boldsymbol X' \boldsymbol X + \lambda \boldsymbol I_k)^{-1} \boldsymbol \beta, and the covariance matrix is Var[\widehat{\boldsymbol \beta}_{ridge}|\boldsymbol X] = (\boldsymbol X' \boldsymbol X + \lambda \boldsymbol I_k)^{-1} \boldsymbol X' \boldsymbol D \boldsymbol X (\boldsymbol X' \boldsymbol X + \lambda \boldsymbol I_k)^{-1}. In the homoskedastic linear regression model, we have MSE(\widehat{\boldsymbol \beta}_{ridge}|\boldsymbol X) < MSE(\widehat{\boldsymbol \beta}_{ols}|\boldsymbol X) if 0 < \lambda < 2 \sigma^2/\boldsymbol \beta'\boldsymbol \beta.
Similarly to the sample mean case, the upper bound 2 \sigma^2/\boldsymbol \beta'\boldsymbol \beta does not give practical guidance for selecting \lambda because \boldsymbol \beta and \sigma^2 are unknown.
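The bias formula can be verified numerically for a fixed design, since E[\widehat{\boldsymbol \beta}_{ridge}|\boldsymbol X] = (\boldsymbol X' \boldsymbol X + \lambda \boldsymbol I_k)^{-1} \boldsymbol X' \boldsymbol X \boldsymbol \beta under random sampling. A minimal sketch (the design, coefficients, and \lambda are illustrative):

## Numerical check of the ridge bias formula
set.seed(1)
n = 50; k = 3
X = matrix(rnorm(n * k), n, k)
beta = c(1, 0.5, -2)
lambda = 10
A = solve(t(X) %*% X + lambda * diag(k))
A %*% t(X) %*% X %*% beta - beta   # E[beta_ridge | X] - beta
-lambda * A %*% beta               # bias formula: gives the same vector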
10.7 Mean squared prediction error
The optimal value for \lambda minimizes the MSE, but estimating the MSE of the ridge estimator is not straightforward because it depends on the parameter \boldsymbol \beta being estimated. Instead, it is better to focus on the out-of-sample mean squared prediction error (MSPE).
Let (Y_1, \boldsymbol X_1), \ldots, (Y_n, \boldsymbol X_n) be our data set (in-sample observations) with ridge estimator Equation 10.1, and let (Y^{oos}, \boldsymbol X^{oos}) be another observation pair (out-of-sample observation) that is independently drawn from the same population as (Y_1, \boldsymbol X_1), \ldots, (Y_n, \boldsymbol X_n).
The mean squared prediction error (MSPE) is MSPE(\widehat{\boldsymbol \beta}_{ridge}) = E\big[(Y^{oos} - (\boldsymbol X^{oos})'\widehat{\boldsymbol \beta}_{ridge})^2\big]. Note that (Y^{oos}, \boldsymbol X^{oos}) is independent of \widehat{\boldsymbol \beta}_{ridge} because it has not been used for estimation. \widehat Y(\boldsymbol X^{oos}) = (\boldsymbol X^{oos})'\widehat{\boldsymbol \beta}_{ridge} is the predicted value of Y^{oos}.
To estimate the MSPE, we can use a split sample:

1. We divide our observations randomly into a training sample (in-sample) of size n_{train} and a testing sample (out-of-sample) of size n_{test} with n = n_{train} + n_{test}: (Y_1^{ins}, \boldsymbol X_1^{ins}), \ldots, (Y_{n_{train}}^{ins}, \boldsymbol X_{n_{train}}^{ins}), \quad (Y_1^{oos}, \boldsymbol X_1^{oos}), \ldots, (Y_{n_{test}}^{oos}, \boldsymbol X_{n_{test}}^{oos})

2. We estimate \boldsymbol \beta using the training sample: \widehat{\boldsymbol \beta}_{ridge}^{ins} = \bigg(\sum_{i=1}^{n_{train}} \boldsymbol X_i^{ins}(\boldsymbol X_i^{ins})' + \lambda \boldsymbol I_k\bigg)^{-1} \sum_{i=1}^{n_{train}} \boldsymbol X_i^{ins} Y_i^{ins}.

3. We evaluate the empirical MSPE using the testing sample, \widehat{MSPE}_{split} = \frac{1}{n_{test}} \sum_{i=1}^{n_{test}} \Big(Y_i^{oos} - \big(\boldsymbol X_i^{oos}\big)'\widehat{\boldsymbol \beta}_{ridge}^{ins}\Big)^2 \qquad \quad \tag{10.3}

4. Steps 2 and 3 are repeated for different values of \lambda. We select the value of \lambda that gives the smallest estimated MSPE.
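A minimal sketch of this split-sample procedure with glmnet() on simulated data (the 50/50 split, the \lambda grid, and all object names are illustrative):

## Split-sample estimation of the MSPE over a grid of lambda values
library(glmnet)
set.seed(1)
n = 200; k = 50
X = matrix(rnorm(n * k), n, k)
Y = X[, 1] - X[, 2] + rnorm(n)
train = sample(n, n / 2)                  # indices of the training sample
lambdas = c(0.01, 0.1, 0.5, 1, 2, 5)
mspe = sapply(lambdas, function(l) {
  fit = glmnet(x = X[train, ], y = Y[train], alpha = 0, lambda = l)
  mean((Y[-train] - predict(fit, newx = X[-train, ]))^2)
})
lambdas[which.min(mspe)]                  # lambda with the smallest estimated MSPE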
10.8 Cross validation
A problem with the split sample estimator is that it depends heavily on how the observations are divided into the two subsamples. An alternative is to select m subsamples (folds) and evaluate the MSPE on each fold separately:
m-fold cross validation
1. Divide the sample into j=1, \ldots, m randomly chosen folds/subsamples of approximately equal size: \begin{align*} &(Y_1^{(1)}, \boldsymbol X_1^{(1)}), \ldots, (Y^{(1)}_{n_1}, \boldsymbol X^{(1)}_{n_1}) \\ &(Y_1^{(2)}, \boldsymbol X_1^{(2)}), \ldots, (Y^{(2)}_{n_2}, \boldsymbol X^{(2)}_{n_2}) \\ & \qquad \vdots \\ &(Y_1^{(m)}, \boldsymbol X_1^{(m)}), \ldots, (Y^{(m)}_{n_m}, \boldsymbol X^{(m)}_{n_m}) \end{align*}

2. Select j \in \{1, \ldots, m\} as the left-out test sample and use the remaining subsamples to compute the ridge estimator \widehat{\boldsymbol \beta}_{ridge}^{(-j)}, for which the j-th fold is not used.

3. Compute Equation 10.3 using the j-th fold as the test sample, i.e., \widehat{MSPE}_{j} = \frac{1}{n_{j}} \sum_{i=1}^{n_{j}} \Big(Y_i^{(j)} - \big(\boldsymbol X_i^{(j)}\big)'\widehat{\boldsymbol \beta}_{ridge}^{(-j)}\Big)^2

4. The m-fold cross validation estimator is the weighted average over the m subsample estimates of the MSPE: \widehat{MSPE}_{mfold} = \sum_{j=1}^m \frac{n_j}{n} \widehat{MSPE}_{j}, where n = \sum_{j=1}^m n_j is the total number of observations.

5. Repeat these steps over a grid of tuning parameters for \lambda, and select the value of \lambda that minimizes \widehat{MSPE}_{mfold}.
Common values for m are m=5 and m=10. The larger m, the less biased the estimation of the MSPE is, but also the more computationally expensive the cross validation becomes.
The largest possible value for m is m=n, where each observation forms its own fold. This is also known as leave-one-out cross validation (LOOCV). LOOCV might be useful for small datasets but is often infeasible for large datasets because of the large computation time.
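The procedure above can also be implemented by hand; Section 10.11 uses the cv.glmnet() convenience function instead. A minimal sketch with m = 5 on simulated data (the fold assignment, the \lambda grid, and all object names are illustrative):

## m-fold cross validation for ridge, implemented by hand
library(glmnet)
set.seed(1)
n = 200; k = 50; m = 5
X = matrix(rnorm(n * k), n, k)
Y = X[, 1] - X[, 2] + rnorm(n)
fold = sample(rep(1:m, length.out = n))   # random fold assignment
cv.mspe = function(l) {
  mspe.j = sapply(1:m, function(j) {
    fit = glmnet(x = X[fold != j, ], y = Y[fold != j], alpha = 0, lambda = l)
    mean((Y[fold == j] - predict(fit, newx = X[fold == j, ]))^2)
  })
  sum(table(fold) / n * mspe.j)           # weighted average with weights n_j/n
}
lambdas = c(0.01, 0.1, 0.5, 1, 2, 5)
sapply(lambdas, cv.mspe)                  # estimated MSPE for each lambda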
10.9 L2 Regularization: Ridge
The \ell_p-norm of a vector \boldsymbol a = (a_1, \ldots, a_k)' is defined as \|\boldsymbol a\|_p = \bigg( \sum_{j=1}^k |a_j|^p \bigg)^{1/p}.
Important special cases are the \ell_1-norm and \ell_2-norm: \|\boldsymbol a\|_1 = \sum_{j=1}^k |a_j|, \quad \|\boldsymbol a\|_2 = \bigg(\sum_{j=1}^k a_j^2\bigg)^{1/2} = \sqrt{\boldsymbol a' \boldsymbol a}.
The \ell_1-norm is the sum of absolute values, and the \ell_2-norm, also known as the Euclidean norm, represents the length of the vector in the Euclidean space.
Ridge regression is also called L2 regularization because it penalizes the sum of squared errors by the square of the \ell_2-norm of the coefficient vector, \|\boldsymbol \beta\|_2^2 = \boldsymbol \beta' \boldsymbol\beta. Ridge is the solution to the minimization problem Equation 10.2, which can be written as \widehat{\boldsymbol \beta}_{ridge} = \argmin_{\boldsymbol \beta} (\boldsymbol Y-\boldsymbol X \boldsymbol \beta)'(\boldsymbol Y-\boldsymbol X \boldsymbol \beta) + \lambda \|\boldsymbol \beta\|_2^2.
10.10 L1 Regularization: Lasso
An alternative approach is L1 regularization, also known as lasso. The lasso estimator is defined as \widehat{\boldsymbol \beta}_{lasso} = \argmin_{\boldsymbol \beta} (\boldsymbol Y-\boldsymbol X \boldsymbol \beta)'(\boldsymbol Y-\boldsymbol X \boldsymbol \beta) + \lambda \|\boldsymbol \beta\|_1, where \|\boldsymbol \beta\|_1 = \sum_{j=1}^k |\beta_j|.
The elastic net estimator is a hybrid method. It combines L1 and L2 regularization using a weight 0 \leq \alpha \leq 1: \widehat{\boldsymbol \beta}_{net,\alpha} = \argmin_{\boldsymbol \beta} (\boldsymbol Y-\boldsymbol X \boldsymbol \beta)'(\boldsymbol Y-\boldsymbol X \boldsymbol \beta) + \lambda \big( \alpha \|\boldsymbol \beta\|_1 + (1-\alpha) \|\boldsymbol \beta\|_2^2 \big). This includes ridge (\alpha = 0) and lasso (\alpha = 1) as special cases.
Ridge has a closed-form solution given by Equation 10.1. Lasso and elastic net with \alpha > 0 require numerical solution methods, such as quadratic programming. Their solutions typically set some coefficients exactly to zero.
10.11 Implementation in R
Let’s consider the mtcars dataset, which is available in base R. Have a look at ?mtcars to see the data description.
We estimate a ridge regression model to predict the variable mpg (miles per gallon) using the other variables. We consider the values \lambda = 0.5 and \lambda = 2.5.
Ridge, lasso, and elastic net are implemented in the glmnet package. The glmnet() function requires matrix-valued data as input. The model.matrix() command is useful because it produces the regressor matrix \boldsymbol X and converts categorical variables into dummy variables.
Y = mtcars$mpg
X = model.matrix(mpg ~., data = mtcars)[,-1]
## Number of observations n and regressors k:
dim(X)
[1] 32 10
library(glmnet)
fit.ridge1 = glmnet(x=X, y=Y, alpha=0, lambda = 0.5)
fit.ridge2 = glmnet(x=X, y=Y, alpha=0, lambda = 2.5)
fits = cbind(coef(fit.ridge1), coef(fit.ridge2))
colnames(fits) = c("lambda=0.5", "lambda=2.5")
fits
11 x 2 sparse Matrix of class "dgCMatrix"
lambda=0.5 lambda=2.5
(Intercept) 19.420400249 21.179818696
cyl -0.250698757 -0.368541841
disp -0.001893223 -0.005184086
hp -0.013079878 -0.011710951
drat 0.978514241 1.052837310
wt -1.902328296 -1.264016952
qsec 0.316107066 0.164790158
vs 0.472551434 0.755205256
am 2.113922488 1.655241565
gear 0.631836101 0.546732963
carb -0.661215998 -0.560023425
By default, glmnet() standardizes the regressors internally before fitting, but the reported coefficients are transformed back to the original scale of the variables. For instance, with \lambda=0.5, the coefficient of wt (weight, measured in 1000 lbs) is about -1.9, which means that an increase in weight of 1000 lbs is associated with a predicted decrease of about 1.9 miles per gallon, holding the other regressors fixed.
When we exclude the intercept, the overall size of the coefficient vector (measured by the \ell_2 norm) becomes smaller for larger values of \lambda:
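One way to check this is to compare the \ell_2 norms of the two estimated coefficient vectors, intercept excluded (a minimal sketch):

## l2 norm of the estimated coefficients, intercept excluded
b1 = as.matrix(coef(fit.ridge1))[-1, 1]   # lambda = 0.5
b2 = as.matrix(coef(fit.ridge2))[-1, 1]   # lambda = 2.5
c(sqrt(sum(b1^2)), sqrt(sum(b2^2)))       # smaller norm for the larger lambda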
The lasso estimator (\alpha = 1) sets many coefficients equal to zero:
fit.lasso1 = glmnet(x=X, y=Y, alpha=1, lambda = 0.5)
fit.lasso2 = glmnet(x=X, y=Y, alpha=1, lambda = 2.5)
fits = cbind(coef(fit.lasso1), coef(fit.lasso2))
colnames(fits) = c("lambda=0.5", "lambda=2.5")
fits
11 x 2 sparse Matrix of class "dgCMatrix"
lambda=0.5 lambda=2.5
(Intercept) 35.88689755 30.0625817
cyl -0.85565434 -0.7090799
disp . .
hp -0.01411517 .
drat 0.07603453 .
wt -2.67338139 -1.7358069
qsec . .
vs . .
am 0.48651385 .
gear . .
carb -0.10722338 .
The cv.glmnet() command estimates the optimal shrinkage parameter using 10-fold cross validation:
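For instance, for the ridge specification above (a minimal sketch; the seed only fixes the random fold assignment, and the object name is illustrative):

## 10-fold cross validation for ridge (alpha = 0) over an automatic lambda grid
set.seed(1)
cv.ridge = cv.glmnet(x = X, y = Y, alpha = 0, nfolds = 10)
cv.ridge$lambda.min   # value of lambda with the smallest estimated MSPE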
We can use ridge and lasso to estimate linear models with more variables than observations. The formula operator ^2 includes all pairwise interaction terms, which produces 55 regressors in total, while the dataset has only n=32 observations.
X.large = model.matrix(mpg ~. ^2, data = mtcars)[,-1]
dim(X.large) # more regressors than observations
[1] 32 55
fit.ridgelarge = glmnet(x=X.large, y=Y, alpha=0, lambda = 0.5)
fit.lassolarge = glmnet(x=X.large, y=Y, alpha=1, lambda = 0.5)
fits = cbind(
coef(fit.ridgelarge), coef(fit.lassolarge)
)
colnames(fits) = c("ridge", "lasso")
fits
56 x 2 sparse Matrix of class "dgCMatrix"
ridge lasso
(Intercept) 1.315259e+01 23.655330629
cyl -4.061218e-02 -0.036308043
disp -8.137358e-04 .
hp -5.588290e-03 .
drat 4.386174e-01 .
wt -5.547986e-01 -1.301739306
qsec 2.308772e-01 .
vs 6.705889e-01 .
am 4.379822e-01 .
gear 8.788479e-01 .
carb -1.537294e-01 .
cyl:disp 6.830897e-05 .
cyl:hp 1.351742e-04 .
cyl:drat 2.455464e-02 .
cyl:wt -2.621868e-03 .
cyl:qsec 3.358094e-03 .
cyl:vs 1.591177e-01 .
cyl:am 6.102385e-02 .
cyl:gear 3.481957e-02 .
cyl:carb 7.499023e-04 .
disp:hp 8.592521e-06 .
disp:drat -9.421536e-05 .
disp:wt 2.191122e-04 .
disp:qsec -1.789464e-05 .
disp:vs -1.280463e-03 .
disp:am -9.043597e-03 .
disp:gear -3.601317e-04 .
disp:carb -1.255358e-04 .
hp:drat -2.086003e-03 .
hp:wt 4.404097e-04 .
hp:qsec -4.347470e-04 -0.001328046
hp:vs -1.858343e-02 .
hp:am -2.604620e-03 .
hp:gear -3.464491e-04 .
hp:carb 9.107116e-04 .
drat:wt -1.766081e-01 -0.337667877
drat:qsec 3.828881e-02 0.073725291
drat:vs 1.123963e-01 .
drat:am 5.047132e-02 .
drat:gear 8.294201e-02 .
drat:carb -4.770358e-02 .
wt:qsec -3.289204e-02 .
wt:vs -3.239643e-01 .
wt:am -4.197733e-01 .
wt:gear -1.890703e-01 .
wt:carb -1.497574e-02 .
qsec:vs 3.114409e-02 .
qsec:am 5.199239e-02 .
qsec:gear 7.035311e-02 0.041623415
qsec:carb -1.859676e-02 .
vs:am 8.688134e-01 2.429571498
vs:gear 3.311330e-01 .
vs:carb -2.768199e-01 .
am:gear 1.462749e-01 .
am:carb 1.588431e-01 .
gear:carb 8.165764e-03 .
To get the fitted values, you may use the predict() command:
Yhatridge = predict(fit.ridgelarge, newx = X.large)
Yhatlasso = predict(fit.lassolarge, newx = X.large)
Yhats = cbind(Y, Yhatridge, Yhatlasso)
colnames(Yhats) = c("Y", "Yhat-ridge", "Yhat-lasso")
Yhats
Y Yhat-ridge Yhat-lasso
Mazda RX4 21.0 20.94312 21.64528
Mazda RX4 Wag 21.0 20.47797 21.14997
Datsun 710 22.8 26.12112 25.98585
Hornet 4 Drive 21.4 19.57785 19.91064
Hornet Sportabout 18.7 17.25059 17.35026
Valiant 18.1 19.25815 19.52858
Duster 360 14.3 14.80168 15.42082
Merc 240D 24.4 23.06386 22.50685
Merc 230 22.8 23.69586 22.78181
Merc 280 19.2 18.47341 19.75241
Merc 280C 17.8 18.75521 19.92770
Merc 450SE 16.4 15.39830 15.79922
Merc 450SL 17.3 16.19856 16.61670
Merc 450SLC 15.2 16.21931 16.54465
Cadillac Fleetwood 10.4 12.25717 12.57063
Lincoln Continental 10.4 11.74625 11.88810
Chrysler Imperial 14.7 11.64161 11.58002
Fiat 128 32.4 28.79845 27.43656
Honda Civic 30.4 31.07410 29.68475
Toyota Corolla 33.9 30.63399 28.72288
Toyota Corona 21.5 22.35048 22.60097
Dodge Challenger 15.5 17.17402 17.68091
AMC Javelin 15.2 17.70056 17.97138
Camaro Z28 13.3 14.14050 14.67766
Pontiac Firebird 19.2 16.37763 16.39890
Fiat X1-9 27.3 29.32240 27.93021
Porsche 914-2 26.0 26.15812 24.43481
Lotus Europa 30.4 28.93150 27.72235
Ford Pantera L 15.8 16.69717 17.16642
Ferrari Dino 19.7 20.27929 20.20595
Maserati Bora 15.0 14.07394 14.80373
Volvo 142E 21.4 23.30782 24.50302