## Multiple variable regression analysis

In multiple variable regression analysis, the dependent variable y depends on various independent variables Xi, x2... x^:

where:

Xk = k-th independent variable y = dependent variable e = error term or residual

0k = regression slope for variable x

0O = y-axis intercept

Least squares multiple variable regression analysis is similar to the single variable case outlined in Section 4.1.

The estimated regression equation is:

This equation can also be calculated from the data set by a computer.

The firm of architects from the single variable regression analysis example wants a regression model that better fits the data. To this end more properties of the houses are added to the database. The database now has data on the following properties of each house:

• Gross floorspace;

• Usable floorspace;

• Number of floors the roof spans.

### Table 4.2 shows the content of the database.

As shown in the single regression example the usable floorspace has a larger correlation coefficient than the gross floorspace. Still, the question arises whether both should be taken into account when performing the multiple regression analysis. This should be done if both variables do not correlate to each other. Both describe more or less the same property of a house, so caution is warranted to prevent multi-collinearity. We therefore carry out a test

 Type [m2] Usable floorspace [m2] Surface area fagade [m2] Number of floors Number of floors roof spans 3A 98,352 163 122.20 60.00 2.50 1.00 3AK 117,887 163 122.20 149.00 2.50 1.00 3B 121,408 173 122.20 153.00 2.50 1.00 3C 106,772 163 128.10 68.00 2.50 1.00 8A 95,422 142 111.40 37.00 2.50 2.00 8B 108,140 168 132.70 58.00 2.50 2.00 8C 107,689 161 125.30 42.00 2.50 2.00 8D 111,181 137 93.50 77.00 2.50 2.00 5B 104,175 178 133.70 66.00 2.50 1.00 5C 121,962 188 133.70 157.00 2.50 1.00 5E 112,513 178 139.20 75.00 2.50 1.00 6A 100,905 158 122.70 39.00 2.50 2.00 6B 114,288 194 151.90 51.00 2.50 2.00 6C 122,941 167 122.90 120.00 2.50 2.00 6D 131,675 172 126.40 126.00 2.50 2.00 7A 95,360 147 121.10 41.00 2.50 2.00 7AK 112,445 147 121.10 121.00 2.50 2.00 7B 117,433 195 148.20 51.00 2.50 2.00 7BK 134,518 195 148.20 131.00 2.50 2.00 7E 114,288 194 151.90 93.00 2.50 2.00 7F 113,625 147 121.10 85.00 2.50 2.00 1A 139,567 169 153.00 105.00 4.00 0.50 1AK 177,639 169 153.00 244.00 5.00 0.50 1AK2 184,950 204 184.60 125.00 5.00 0.50 4A 136,986 210 171.70 80.00 3.50 2.00 4B 191,438 215 192.80 228.00 3.50 2.00 4C 192,753 235 203.00 183.00 3.50 2.50 4D 167,405 215 182.40 121.00 3.50 2.00 5A 118,838 194 161.70 74.00 2.50 1.00 5AK 140,294 194 161.70 179.00 2.50 1.00 5D 147,366 203 166.80 200.00 2.50 1.00

□ Microsoft Excel - e_ra-2ods ill] Fie Edit View Insert Format lools Data Window WB! Help i 1 rf J i.j .J 4 ?» * -J A - J -)-<»- ft > • i.

ill] Fie Edit View Insert Format lools Data Window WB! Help i 1 rf J i.j .J 4 ?» * -J A - J -)-<»- ft > • i.

 A B C D E F G H 1 J K L — 1 Cost price Gross fs. Usable fs. Facade Floors :loors-roof 2 Type 3A 98352 163 122.2 60 2.5 3 Type 3A 117887 163 122.2 148.92 2.5 4 Type 3B 121408 173 122.2 153 2.5 5 Type 3C 106772 163 128.1 68 2.5 6 Type 8A 95422 142 111.4 37 2.5 2 7 Type 8B 108140 168 132.7 58 2.5 2 8 Type 8C 107689 161 125.3 42 2.5 2 9 Type 8D 111181 137 93.5 77 2.5 2 10 Type 5B 104175 178 133.7 66 2.5 1 11 Type 5C 121962 188 133.7 157 2.5 1 12 Type 5E 112513 178 139.2 75 2.5 1 13 Type 6A 100905 158 122.7 39 2.5 2 14 Type 6B 114288 194 151.9 51 2.5 2 15 Type 6C 122941 167 122.9 120 2.5 2 16 Type 6D 131675 172 126.4 126 2.5 2 17 Type 7a 95360 147 121.1 41 2.5 2 — 18 Type ta 112445 147 121.1 120.75 2.5 2 HI 19 Type 7B 117433 195 148.2 51 2.5 2 20 Type 7B 134518 195 148.2 130.75 2.5 2 Ir 21 Type 7E 114288 194 151.9 92.8 2.5 2 s 22 Type 7F 113625 147 121.1 85 2.5 2 23 Type 1A 139567 169 153 105 4 0.5 ?4 Tvoe 1A 177639 169 153 243.51 5 0.5 n < ► nlVsheetl/ Sheet2 /Sheet3 / < 1 Jjj

Figure 4.11 Data from database in spreadsheet to see whether these properties correlate to each other. If they do, only one should be taken into account.

To carry out this test, in cell D33 we type:

This shows the correlation coefficient (r = 0.93) from which you can conclude that they indeed correlate. This means we should not take both into account but only the one that best correlates to the cost price. From the previous example we know that the usable floorspace correlates best to the cost price.

To carry out the regression analysis we first add the data to an empty Excelsheet as shown in Figure 4.11.

Select the menu item 'Tools/Data analysis' and then select 'Regression' from the list. This opens up a new window where you have to select the 'Input Y range' and the 'Input X range'. The 'Input Y range' is the range of cells containing the cost prices, in this case cells B2 through B32. The 'Input X range' is the range of cells containing the data on the different properties, in this case cells D2 through G32. After adding these ranges press the 'OK' button to carry out the analysis. This will output the results on a new worksheet as shown in Figure 4.12.

This output shows that the model now fits the data much better because the value of 'R Adjusted' (cell B6) equals 0.95.

FJe Edit View Insert Format Tools Data Window WB! Help Type a question for help - _ iS X

J A Là ¿J I d! iA I y £11 * ^ a - <f I "J - ' I & z - Si il I S - © 1

Anal - io ' I B I u I m m m g I \$ % t las ¿ig | ïw iw I EE - & - A -1

A1 -r j& SUMMARY OUTPUT_

FJe Edit View Insert Format Tools Data Window WB! Help Type a question for help - _ iS X

J A Là ¿J I d! iA I y £11 * ^ a - <f I "J - ' I & z - Si il I S - © 1

Anal - io ' I B I u I m m m g I \$ % t las ¿ig | ïw iw I EE - & - A -1

A1 -r j& SUMMARY OUTPUT_

A I B

C

D

E

F

G

H

I

j

K

L

-

1

SUMMAR

OUTPUT

2

3

Regress/on Statistics

4

Multiple R

0.979386

5

R Square

0.959197

6

0.952919

7

Standard I

6084.239

8

Obseivatio

31

9

10

ANOVA

11

df

SS

MS

F

qnificance

F

12

Regressio

4

2.26E+1Û

5.66E+Û9

152.8009

1.17E-17

13

Residual

26

9.62E+08

37017964

14

Total

30

2.36E+10

15

16

Coefficientifandard Err

tStat

P-value

Lower 95% Upper 95% ower 95. Q9'pper 95.0°/

17

Intercept

-13450.6

7483.928

-1.79727

0.083919

-28834.1

1932.793

-28834.1

1932.793

18

X Variable

412.128

58.24576

7.075673

1.63E-07

292.4021

531.8539

292.4021

531.8539

19

X Variable

220.6636

23 81237

9.266761

1.01E-09

171.7165

269.6106

171.7165

269.6106

20

X Variable

16062.93

2119.084

7.580129

4.79E-08

11707.09

20418.77

11707.09

20418.77

21

X Variable

8360.701

2088.98

4.002289

0.000465

4066.742

12654.66

4066.742

12654.66

22

23

►IF

H < ► n|\Sheet4 / Sheetl /Sheet2

5heet3 /

<l

Figure 4.12 Multiple regression analysis output in spreadsheet

The equation to predict the deviation can be derived from the 'Intercept' (cell B17) and the 'X Variable' (cell B18 and B19). The equation is:

y = -13450.6 + 412.128xx + 220.6636x2 + 16062.93x3 + 8360.701x4 (4.9)

where:

 y = cost price Xi = usable floorspace X2 = surface area façade X3 = number of floors X4 = number of floors the roof covers

If you examine the different factors of the equation, you will notice that the first two factors regarding the usable floorspace and surface area of the façade have a substantially lower magnitude in comparison to the last two factors regarding the number of floors and the number of floors the roof covers. This is quite plausible because it is likely that an increase of the number of floors or the number of floors the roof spans with one floor has a substantially bigger effect on the cost price than an increase of the amount of usable floorspace or an increase of the surface area of the façade with one metre.

The better fit of the model (higher 'R Adjusted') is due to the fact that more properties affecting the cost price have been taken into account. The fact that

Fie Edit View Insert Format lools Data Window WB! Help

! Arial T 10 T B I U m M = Ü I \$ % * trio ¿"0 ¡Î ÏP

A1 T fx SUMMARY OUTPUT

Fie Edit View Insert Format lools Data Window WB! Help

! Arial T 10 T B I U m M = Ü I \$ % * trio ¿"0 ¡Î ÏP

A1 T fx SUMMARY OUTPUT

 A | B C D E F G H I j K L 1 SUMMAR OUTPUT 2 3 Regression Statistics 4 Multiple R 0.980067 5 R Square 0.960532 B Adjusted 0.952638 7 Standard 6102.358 8 Obseivatio 31 9 10 ANOVA F 11 df SS MS F :qnificance 12 Regressio 5 2.27E+10 4.53E+Û9 121.685 1 Û3E-16 13 Residual 25 9.31E+08 37238776 14 Total 30 2.36E+10 15 16 Coefficientifandard Err t Stat P-value Lower 95% Upper 95%ower 95.09'pper 95.0°/ 18 19 Intercept X Variable X Variable -22160.7 141.3304 268.5296 12084.53 153.6718 166.7088 -1.83381 0.91969 1.610771 0.078615 0.366524 -47049.2 -175.163 2727.882 457.8234 -47049.2 2727.882 -175.163 457.8234 0.119785 -74.8135 611.8728 -74.8135 611.8728 22 X Variable X Variable X Variable 218.213 17508.68 8453.903 24.03147 2643.575 2097.65 9.080301 2.16E-09 168.7192 267.7067 168.7192 267.7067 6.623108 6.12E-07 12064.14 22953.23 12064.14 22953.23 4.030177 0.000458 4133.711 12774.09 4133.711 12774.09 23 H -< ► H |\ She et5 / Sheet / Sheet2 Sheet3 / hi i h r

Figure 4.13 Multiple regression analysis output in spreadsheet types 3A and 3AK have the same gross and usable floorspace, but differ in cost price is explained by the difference in surface area of the façade. The difference is due to the fact that these houses are placed at the end of a block of houses and therefore its façade surface area is larger. The number of floors, and the number of floors the roof covers further adds to the precision of the model.

Finally, to illustrate the effect of multi-collinearity, we carry out the regression analysis including the gross floorspace. Remember that we established a correlation between the usable and gross floorspace. The results of this analysis are shown in Figure 4.13.

This output shows that adding the extra variable 'gross floorspace' does not yield a better fit of the model because the value of 'R Adjusted' (cell B6) being 0.95 equals the value found in the previous analysis. Even worse, the output shows that statistical significance has decreased because the p-value, which was nearing zero in the previous analysis, now has risen to 0.37 and 0.12 respectively. The drop in statistical significance is due to the fact that the variables 'gross floorspace' and 'usable floorspace' are measures for more or less the same property. This results in the model 'having difficulty' distinguishing between the effects of both variables on the cost price of the house. The model therefore yields a lower significance. This phenomenon is called multi-collinearity and is the reason why we used only one of both variables in the regression analysis. Figure 4.14 Optimistic, pessimistic estimates for Monte Carlo simulation related to standard deviation from regression analysis ## Best Investment Tips and Ideas

The dynamics of investing can be very emotional and stressful if not properly managed. When you are aware of what is all involved you give yourself the power to avoid those situations or at least manage them effectively. That will make your investments more exciting, rewarding, and enjoyable. Those positive factors will only lead to greater success in all that achieve with investments and life.

Get My Free Ebook