Multiple variable regression analysis

In multiple variable regression analysis, the dependent variable y depends on various independent variables Xi, x2... x^:

where:

Xk = k-th independent variable y = dependent variable e = error term or residual

0k = regression slope for variable x

0O = y-axis intercept

Least squares multiple variable regression analysis is similar to the single variable case outlined in Section 4.1.

The estimated regression equation is:

This equation can also be calculated from the data set by a computer.

The firm of architects from the single variable regression analysis example wants a regression model that better fits the data. To this end more properties of the houses are added to the database. The database now has data on the following properties of each house:

• Gross floorspace;

• Usable floorspace;

• Number of floors the roof spans.

Table 4.2 shows the content of the database.

As shown in the single regression example the usable floorspace has a larger correlation coefficient than the gross floorspace. Still, the question arises whether both should be taken into account when performing the multiple regression analysis. This should be done if both variables do not correlate to each other. Both describe more or less the same property of a house, so caution is warranted to prevent multi-collinearity. We therefore carry out a test

Table 4.2 Cost price vs. house properties

Type

[m2]

Usable floorspace [m2]

Surface area fagade [m2]

Number of floors

Number of floors roof spans

3A

98,352

163

122.20

60.00

2.50

1.00

3AK

117,887

163

122.20

149.00

2.50

1.00

3B

121,408

173

122.20

153.00

2.50

1.00

3C

106,772

163

128.10

68.00

2.50

1.00

8A

95,422

142

111.40

37.00

2.50

2.00

8B

108,140

168

132.70

58.00

2.50

2.00

8C

107,689

161

125.30

42.00

2.50

2.00

8D

111,181

137

93.50

77.00

2.50

2.00

5B

104,175

178

133.70

66.00

2.50

1.00

5C

121,962

188

133.70

157.00

2.50

1.00

5E

112,513

178

139.20

75.00

2.50

1.00

6A

100,905

158

122.70

39.00

2.50

2.00

6B

114,288

194

151.90

51.00

2.50

2.00

6C

122,941

167

122.90

120.00

2.50

2.00

6D

131,675

172

126.40

126.00

2.50

2.00

7A

95,360

147

121.10

41.00

2.50

2.00

7AK

112,445

147

121.10

121.00

2.50

2.00

7B

117,433

195

148.20

51.00

2.50

2.00

7BK

134,518

195

148.20

131.00

2.50

2.00

7E

114,288

194

151.90

93.00

2.50

2.00

7F

113,625

147

121.10

85.00

2.50

2.00

1A

139,567

169

153.00

105.00

4.00

0.50

1AK

177,639

169

153.00

244.00

5.00

0.50

1AK2

184,950

204

184.60

125.00

5.00

0.50

4A

136,986

210

171.70

80.00

3.50

2.00

4B

191,438

215

192.80

228.00

3.50

2.00

4C

192,753

235

203.00

183.00

3.50

2.50

4D

167,405

215

182.40

121.00

3.50

2.00

5A

118,838

194

161.70

74.00

2.50

1.00

5AK

140,294

194

161.70

179.00

2.50

1.00

5D

147,366

203

166.80

200.00

2.50

1.00

□ Microsoft Excel - e_ra-2ods ill] Fie Edit View Insert Format lools Data Window WB! Help i 1 rf J i.j .J 4 ?» * -J A - J -)-<»- ft > • i.

ill] Fie Edit View Insert Format lools Data Window WB! Help i 1 rf J i.j .J 4 ?» * -J A - J -)-<»- ft > • i.

A

B

C

D

E

F

G

H

1

J

K

L

1

Cost price

Gross fs.

Usable fs.

Facade

Floors

:loors-roof

2

Type 3A

98352

163

122.2

60

2.5

3

Type 3A

117887

163

122.2

148.92

2.5

4

Type 3B

121408

173

122.2

153

2.5

5

Type 3C

106772

163

128.1

68

2.5

6

Type 8A

95422

142

111.4

37

2.5

2

7

Type 8B

108140

168

132.7

58

2.5

2

8

Type 8C

107689

161

125.3

42

2.5

2

9

Type 8D

111181

137

93.5

77

2.5

2

10

Type 5B

104175

178

133.7

66

2.5

1

11

Type 5C

121962

188

133.7

157

2.5

1

12

Type 5E

112513

178

139.2

75

2.5

1

13

Type 6A

100905

158

122.7

39

2.5

2

14

Type 6B

114288

194

151.9

51

2.5

2

15

Type 6C

122941

167

122.9

120

2.5

2

16

Type 6D

131675

172

126.4

126

2.5

2

17

Type 7a

95360

147

121.1

41

2.5

2

18

Type ta

112445

147

121.1

120.75

2.5

2

HI

19

Type 7B

117433

195

148.2

51

2.5

2

20

Type 7B

134518

195

148.2

130.75

2.5

2

Ir

21

Type 7E

114288

194

151.9

92.8

2.5

2

s

22

Type 7F

113625

147

121.1

85

2.5

2

23

Type 1A

139567

169

153

105

4

0.5

?4

Tvoe 1A

177639

169

153

243.51

5

0.5

n < ► nlVsheetl/ Sheet2 /Sheet3 /

<

1

Jjj

Figure 4.11 Data from database in spreadsheet to see whether these properties correlate to each other. If they do, only one should be taken into account.

To carry out this test, in cell D33 we type:

This shows the correlation coefficient (r = 0.93) from which you can conclude that they indeed correlate. This means we should not take both into account but only the one that best correlates to the cost price. From the previous example we know that the usable floorspace correlates best to the cost price.

To carry out the regression analysis we first add the data to an empty Excelsheet as shown in Figure 4.11.

Select the menu item 'Tools/Data analysis' and then select 'Regression' from the list. This opens up a new window where you have to select the 'Input Y range' and the 'Input X range'. The 'Input Y range' is the range of cells containing the cost prices, in this case cells B2 through B32. The 'Input X range' is the range of cells containing the data on the different properties, in this case cells D2 through G32. After adding these ranges press the 'OK' button to carry out the analysis. This will output the results on a new worksheet as shown in Figure 4.12.

This output shows that the model now fits the data much better because the value of 'R Adjusted' (cell B6) equals 0.95.

FJe Edit View Insert Format Tools Data Window WB! Help Type a question for help - _ iS X

J A Là ¿J I d! iA I y £11 * ^ a - <f I "J - ' I & z - Si il I S - © 1

Anal - io ' I B I u I m m m g I $ % t las ¿ig | ïw iw I EE - & - A -1

A1 -r j& SUMMARY OUTPUT_

FJe Edit View Insert Format Tools Data Window WB! Help Type a question for help - _ iS X

J A Là ¿J I d! iA I y £11 * ^ a - <f I "J - ' I & z - Si il I S - © 1

Anal - io ' I B I u I m m m g I $ % t las ¿ig | ïw iw I EE - & - A -1

A1 -r j& SUMMARY OUTPUT_

A I B

C

D

E

F

G

H

I

j

K

L

-

1

SUMMAR

OUTPUT

2

3

Regress/on Statistics

4

Multiple R

0.979386

5

R Square

0.959197

6

Adjusted

0.952919

7

Standard I

6084.239

8

Obseivatio

31

9

10

ANOVA

11

df

SS

MS

F

qnificance

F

12

Regressio

4

2.26E+1Û

5.66E+Û9

152.8009

1.17E-17

13

Residual

26

9.62E+08

37017964

14

Total

30

2.36E+10

15

16

Coefficientifandard Err

tStat

P-value

Lower 95% Upper 95% ower 95. Q9'pper 95.0°/

17

Intercept

-13450.6

7483.928

-1.79727

0.083919

-28834.1

1932.793

-28834.1

1932.793

18

X Variable

412.128

58.24576

7.075673

1.63E-07

292.4021

531.8539

292.4021

531.8539

19

X Variable

220.6636

23 81237

9.266761

1.01E-09

171.7165

269.6106

171.7165

269.6106

20

X Variable

16062.93

2119.084

7.580129

4.79E-08

11707.09

20418.77

11707.09

20418.77

21

X Variable

8360.701

2088.98

4.002289

0.000465

4066.742

12654.66

4066.742

12654.66

22

23

►IF

H < ► n|\Sheet4 / Sheetl /Sheet2

5heet3 /

<l

Ready NUM

Figure 4.12 Multiple regression analysis output in spreadsheet

The equation to predict the deviation can be derived from the 'Intercept' (cell B17) and the 'X Variable' (cell B18 and B19). The equation is:

y = -13450.6 + 412.128xx + 220.6636x2 + 16062.93x3 + 8360.701x4 (4.9)

where:

y =

cost price

Xi =

usable floorspace

X2 =

surface area façade

X3 =

number of floors

X4 =

number of floors the roof covers

If you examine the different factors of the equation, you will notice that the first two factors regarding the usable floorspace and surface area of the façade have a substantially lower magnitude in comparison to the last two factors regarding the number of floors and the number of floors the roof covers. This is quite plausible because it is likely that an increase of the number of floors or the number of floors the roof spans with one floor has a substantially bigger effect on the cost price than an increase of the amount of usable floorspace or an increase of the surface area of the façade with one metre.

The better fit of the model (higher 'R Adjusted') is due to the fact that more properties affecting the cost price have been taken into account. The fact that

Fie Edit View Insert Format lools Data Window WB! Help

! Arial T 10 T B I U m M = Ü I $ % * trio ¿"0 ¡Î ÏP

A1 T fx SUMMARY OUTPUT

Fie Edit View Insert Format lools Data Window WB! Help

! Arial T 10 T B I U m M = Ü I $ % * trio ¿"0 ¡Î ÏP

A1 T fx SUMMARY OUTPUT

A | B

C

D

E

F

G

H

I

j

K

L

1

SUMMAR

OUTPUT

2

3

Regression Statistics

4

Multiple R

0.980067

5

R Square

0.960532

B

Adjusted

0.952638

7

Standard

6102.358

8

Obseivatio

31

9

10

ANOVA

F

11

df

SS

MS F :qnificance

12

Regressio

5

2.27E+10

4.53E+Û9 121.685 1 Û3E-16

13

Residual

25

9.31E+08

37238776

14

Total

30

2.36E+10

15

16

Coefficientifandard Err

t Stat P-value Lower 95% Upper 95%ower 95.09'pper 95.0°/

18 19

Intercept X Variable X Variable

-22160.7 141.3304 268.5296

12084.53 153.6718 166.7088

-1.83381 0.91969 1.610771

0.078615 0.366524

-47049.2 -175.163

2727.882 457.8234

-47049.2

2727.882

-175.163

457.8234

0.119785

-74.8135

611.8728

-74.8135

611.8728

22

X Variable X Variable X Variable

218.213 17508.68 8453.903

24.03147 2643.575 2097.65

9.080301

2.16E-09

168.7192

267.7067

168.7192

267.7067

6.623108

6.12E-07

12064.14 22953.23 12064.14

22953.23

4.030177 0.000458

4133.711 12774.09 4133.711

12774.09

23

H -<

► H |\ She

et5 / Sheet

/ Sheet2

Sheet3 / hi

i h r

Figure 4.13 Multiple regression analysis output in spreadsheet types 3A and 3AK have the same gross and usable floorspace, but differ in cost price is explained by the difference in surface area of the façade. The difference is due to the fact that these houses are placed at the end of a block of houses and therefore its façade surface area is larger. The number of floors, and the number of floors the roof covers further adds to the precision of the model.

Finally, to illustrate the effect of multi-collinearity, we carry out the regression analysis including the gross floorspace. Remember that we established a correlation between the usable and gross floorspace. The results of this analysis are shown in Figure 4.13.

This output shows that adding the extra variable 'gross floorspace' does not yield a better fit of the model because the value of 'R Adjusted' (cell B6) being 0.95 equals the value found in the previous analysis. Even worse, the output shows that statistical significance has decreased because the p-value, which was nearing zero in the previous analysis, now has risen to 0.37 and 0.12 respectively. The drop in statistical significance is due to the fact that the variables 'gross floorspace' and 'usable floorspace' are measures for more or less the same property. This results in the model 'having difficulty' distinguishing between the effects of both variables on the cost price of the house. The model therefore yields a lower significance. This phenomenon is called multi-collinearity and is the reason why we used only one of both variables in the regression analysis.

Figure 4.14 Optimistic, pessimistic estimates for Monte Carlo simulation related to standard deviation from regression analysis
Real Estate Essentials

Real Estate Essentials

Tap into the secrets of the top investors… Discover The Untold Real Estate Investing Secrets Used By The World’s Top Millionaires To Generate Massive Amounts Of Passive Incomes To Feed Their Families For Decades! Finally You Can Fully Equip Yourself With These “Must Have” Investing Tools For Creating Financial Freedom And Living A Life Of Luxury!

Get My Free Ebook


Post a comment