Goodness of Fit to the Regression Line

We are now prepared to address how well the regression line fits the data. We have already said that a good line should pass through the coordinates of Xav and Yav, but the line should also be minimally distant from all the other data. "Minimally distant" is a general objective of all statistical analysis. After all, we do not know these random variables exactly; if we did, they would be deterministic and not random. Therefore, statistical methods in general strive to minimize the distance or error between the probabilistic values found in the probability density function for the random variable and the real value of the variable.

As discussed in Chapter 2, we minimize distance by calculating the square of the distance between a data observation and its real, mean, or estimated value and then minimizing that squared error. Figure 8-3 provides an illustration. From each data observation value along the horizontal axis, we measure the Y-distance to the regression line from that data observation point. Each such measure is of the form:

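$$Y^2_{\text{distance}} = (Y_i - Y_X)^2$$

where $Y_i$ is the specific observation and $Y_X$ is the value of Y on the linear regression line closest to $Y_i$, that is, the point on the line directly above or below the observation.

Figure 8-3: Distance Measures in Linear Regression. (Horizontal axis: time, the independent variable t.)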
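The vertical squared distances described above can be sketched in a few lines of Python. This is a minimal illustration, not the book's own procedure: the data values and the function names `fit_line` and `squared_distances_to_line` are assumptions introduced here, and the line is fitted with the ordinary least-squares formulas, which produce a line passing through (Xav, Yav).

```python
# Hypothetical observations, invented for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]


def fit_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line.

    The intercept is chosen so that the line passes through (Xav, Yav).
    """
    n = len(xs)
    x_av = sum(xs) / n
    y_av = sum(ys) / n
    sxy = sum((x - x_av) * (y - y_av) for x, y in zip(xs, ys))
    sxx = sum((x - x_av) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_av - slope * x_av
    return slope, intercept


def squared_distances_to_line(xs, ys, slope, intercept):
    """Squared vertical distance (Yi - Yx)^2 for each observation."""
    return [(y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys)]


slope, intercept = fit_line(xs, ys)
distances = squared_distances_to_line(xs, ys, slope, intercept)
print(slope, intercept)
print(distances, sum(distances))
```

Each entry of `distances` is one measure of the form shown above. Their sum is the quantity the regression line minimizes.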

Consider also that there is another distance measure that could be made. This second distance measure uses $Y_{av}$ rather than $Y_X$, so each measure is of the form:

$$(Y_i - Y_{av})^2$$

Figure 8-4 illustrates the measures that sum to $Y^2_{dAv}$. At first, this second distance measure seems counterintuitive: you would expect always to measure distance to the nearest point on the regression line, not to an average that might be farther away than the nearest point on the line. However, the issue is whether the variations in Y really depend on the variations in X. Perhaps they are strongly or exactly dependent. Then a change in Y can be forecast with almost no error from a forecast or observation of X, and $Y^2_{\text{distance}}$ is the measure to use. However, if Y is somewhat, but not strongly, dependent on X, then a movement in X will still cause a movement in Y, but not to the extent that would occur if Y were strongly dependent on X. For this loosely coupled dependency, $Y^2_{dAv}$ is the measure to use.
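To make the contrast between the two measures concrete, the sketch below computes both sums for a tightly coupled and a loosely coupled data set. The data and the helper name `distance_sums` are assumptions made for illustration, and the line is again an ordinary least-squares fit. The ratio of the two sums is one convenient way to compare them; the conventional coefficient of determination is one minus this ratio, which is standard statistics rather than something this section states.

```python
def distance_sums(xs, ys):
    """Return (sum of squared distances to the fitted line,
               sum of squared distances to the average Yav)."""
    n = len(xs)
    x_av = sum(xs) / n
    y_av = sum(ys) / n
    sxx = sum((x - x_av) ** 2 for x in xs)
    sxy = sum((x - x_av) * (y - y_av) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_av - slope * x_av
    to_line = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    to_average = sum((y - y_av) ** 2 for y in ys)
    return to_line, to_average


# Hypothetical data: Y depends exactly on X (Y = 2X + 1).
strong_xs = [1.0, 2.0, 3.0, 4.0, 5.0]
strong_ys = [3.0, 5.0, 7.0, 9.0, 11.0]

# Hypothetical data: Y depends only loosely on X.
loose_xs = [1.0, 2.0, 3.0, 4.0, 5.0]
loose_ys = [2.0, 4.0, 5.0, 4.0, 5.0]

strong_line, strong_avg = distance_sums(strong_xs, strong_ys)
loose_line, loose_avg = distance_sums(loose_xs, loose_ys)

# A ratio near 0 means the line explains nearly all the movement in Y;
# a ratio near 1 means the line does little better than the average.
strong_ratio = strong_line / strong_avg
loose_ratio = loose_line / loose_avg
print(strong_ratio, loose_ratio)
```

With exact dependence, the distances to the line vanish while the distances to the average stay large. With loose dependence, the two sums are much closer in size.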

Figure 8-4: Distance Measures to the Average. (Horizontal axis: time, the independent variable T. Figure annotation: the distance of any particular data point from the average cost is the square root of the squared vertical distance, Sqrt((Cave - Ci)^2), where Cave is the average value of the cost.)
