What Is the Most Important Part of an External Audit

Author: Maryam • December 30, 2017 • 2,285 Words (10 Pages) • 1,400 Views

...

This leads us to look more closely at these two models to understand them in more detail.

We use the Box-Cox procedure to check whether the response variable needs to be transformed in either of the two models. Since the 95% confidence interval for the Box-Cox parameter covers 1 for both, we proceed without a transformation.
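To make the reasoning concrete, here is a minimal sketch of the Box-Cox transform, assuming the standard one-parameter form. The function name `box_cox` and the sample heights are hypothetical and are not taken from the analysis. It illustrates why a parameter value of 1 means "no transformation": the response is only shifted by a constant.

```python
import math


def box_cox(y, lam):
    """Standard one-parameter Box-Cox transform of a positive value y."""
    if y <= 0:
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        # The limiting case as lam approaches 0 is the natural logarithm.
        return math.log(y)
    return (y ** lam - 1) / lam


# Hypothetical heights, used only for illustration.
heights = [150.0, 165.0, 180.0]

# With lam = 1 every value is shifted down by exactly 1, so the shape of
# the response is unchanged and a linear model fits equally well.
shifted = [box_cox(h, 1) for h in heights]
```

Because a constant shift only changes the intercept, an interval that covers 1 gives no evidence that a transformation would help.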

[pic 5][pic 6]

Analysis of different fits

Before removing points with high Cook’s distance or high leverage, our model was

Height = 200.21005 + 0.58890 biacromial - 0.57510 waist.girth - 0.57398 thigh.girth - 0.61655 bicep.girth - 0.62328 calf.girth + 1.25772 weight + 4.97965 gender - 0.24329 chest.girth.

This model had an adjusted R-squared of 0.7877.

We look at our two possible models with different combinations of data point removals, looking for large changes in the coefficients as well as for any improvement in the adjusted R-squared of the new models.

Excluding points with high Cook’s distance (fit3)

The coefficients do change in this new model. In particular, the estimates for biacromial and gender change substantially (from 0.58890 to 0.44517 and from 4.97965 to 3.7916 respectively). As Figure 3 shows, the original model’s QQ plot has a slightly heavy tail; this improves after the points with high Cook’s distance are removed, and fewer points deviate markedly from the straight line.
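As a rough illustration of how a point gets flagged, the sketch below computes Cook’s distance from a residual and a leverage value using the standard closed form. The sample points, the residual mean square, and the 4/n cutoff are all assumptions made for this example; the cutoff actually used in the analysis is not stated.

```python
def cooks_distance(residual, leverage, n_params, mse):
    """Cook's distance for one observation of a least-squares fit.

    residual: raw residual of the observation
    leverage: its diagonal entry of the hat matrix, in [0, 1)
    n_params: number of fitted coefficients, including the intercept
    mse:      residual mean square of the fitted model
    """
    if not 0 <= leverage < 1:
        raise ValueError("leverage must lie in [0, 1)")
    return (residual ** 2 / (n_params * mse)) * leverage / (1 - leverage) ** 2


# Hypothetical (residual, leverage) pairs, not taken from the data set.
points = [(0.5, 0.02), (2.0, 0.5), (-1.0, 0.1)]
n_params = 2  # hypothetical small model: intercept plus one predictor
mse = 1.0     # hypothetical residual mean square

distances = [cooks_distance(e, h, n_params, mse) for e, h in points]

# A common rule of thumb flags points whose distance exceeds 4 / n.
# This cutoff is an assumption, not the one used in the analysis.
cutoff = 4 / len(points)
flagged = [i for i, d in enumerate(distances) if d > cutoff]
```

The formula shows why a point needs both a sizeable residual and high leverage to be influential: either factor close to zero keeps the distance small.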

Without high leverage points or high Cook’s distance points (fit4)

As with the model above, there are large changes in the coefficients, which suggests that removing these points had a substantial effect on the model and should therefore be considered as a permanent change. Figure 3 shows that the QQ plot for the new model is also improved, and looks very similar to the QQ plot obtained after removing only the points with high Cook’s distance.

The analyses of these two data sets give very similar results: both new data sets show significant changes in coefficients as well as a marked improvement in the QQ plot. To separate the two models, we looked at their respective ANOVA tables and model summaries.

The ANOVA tables for the two data sets show that fit3 (without the data points with high Cook’s distance) has F values at least as good as those of fit4 (without both the high leverage points and the high Cook’s distance points) for 7 of the variables, and strictly better F values for 5 of them.

This suggests that fit3 provides stronger evidence that the coefficients are not zero.

The regression without the high Cook’s distance points also has a slightly better adjusted R-squared than fit4. Its adjusted R-squared of 0.83 is also an improvement on the initial value of 0.7877. The higher adjusted R-squared means that the model explains more of the variation in height.
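For reference, the adjusted R-squared compared here penalises the ordinary R-squared for the number of predictors. A minimal sketch of the standard formula follows; the sample size and the raw R-squared passed in are hypothetical, since neither is given in the text, while the count of 8 predictors matches the model above.

```python
def adjusted_r_squared(r_squared, n_obs, n_predictors):
    """Adjusted R-squared for a linear model with an intercept."""
    if n_obs - n_predictors - 1 <= 0:
        raise ValueError("need more observations than fitted parameters")
    return 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)


# Hypothetical inputs: the sample size and raw R-squared are assumptions.
example = adjusted_r_squared(r_squared=0.79, n_obs=500, n_predictors=8)
```

With many observations and few predictors the penalty is small, so the adjusted value sits only slightly below the raw R-squared.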

Based on these figures, we judged the fit3 model to be marginally better than the others, so we chose it as our final model.

Final Model Analysis [pic 7][pic 8]

[pic 9]

Justification of model choice

Above on the left is the ANOVA table for our final model. The most noticeable point is the p value for the F test of thigh girth, which is only fairly small. However, when thigh girth is fitted last, its p value for the F test (equivalent to its p value for the t test shown in the summary table below) is very small, suggesting that thigh girth is significant after the effects of the other variables have been taken into account.

If thigh.girth is fitted first, it has a p value for the F test of 5.953e-09. The fact that it is less significant when fitted after biacromial and waist could suggest that they are positively correlated. Using the cor function in R, we find a slight correlation between thigh.girth and waist, which could be the cause of the drop in significance.
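The correlation check can be sketched as follows. This is a plain Pearson correlation written out by hand, which is also the default method of R’s cor function; the measurement values are hypothetical and are not taken from the data set.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical thigh and waist girths, for illustration only.
thigh = [52.0, 55.5, 58.0, 60.5, 57.0]
waist = [70.0, 74.0, 73.0, 85.0, 80.0]
r = pearson(thigh, waist)
```

When two predictors are correlated, whichever is fitted first in a sequential ANOVA absorbs the variation they share, which is why the order of fitting changes the F test for thigh.girth.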

All the other variables in the model have very small p values, which gives strong evidence that their coefficients are not zero and that they play a significant role in explaining the response variable.

Change in extreme points [pic 10][pic 11]

The graphs above show that, prior to data removal, several data points had both high leverage and large residuals. After removing them, the fit is much improved and no patterns can be seen.

Explanation of model and the significance of each individual variable

Height = 203.34501 - 0.71210 thigh.girth + 0.44517 biacromial - 0.72215 bicep.girth - 0.47886 calf.girth + 1.35938 weight + 3.79164 gender - 0.17182 chest.girth - 0.63473 waist.girth
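The final model can be evaluated directly from its coefficients. The sketch below copies the intercept and coefficients from the equation above; the function name `predict_height`, the example measurements, and the choice of gender = 1 are hypothetical, and units are not stated in the text.

```python
# Intercept and coefficients copied from the final model equation.
INTERCEPT = 203.34501
COEFFICIENTS = {
    "thigh.girth": -0.71210,
    "biacromial": 0.44517,
    "bicep.girth": -0.72215,
    "calf.girth": -0.47886,
    "weight": 1.35938,
    "gender": 3.79164,
    "chest.girth": -0.17182,
    "waist.girth": -0.63473,
}


def predict_height(measurements):
    """Predicted height for a dict mapping variable name to value."""
    missing = set(COEFFICIENTS) - set(measurements)
    if missing:
        raise KeyError(f"missing measurements: {sorted(missing)}")
    return INTERCEPT + sum(
        coef * measurements[name] for name, coef in COEFFICIENTS.items()
    )


# Hypothetical individual; these values are not from the data set.
person = {
    "thigh.girth": 56.0,
    "biacromial": 39.0,
    "bicep.girth": 31.0,
    "calf.girth": 36.0,
    "weight": 70.0,
    "gender": 1,
    "chest.girth": 94.0,
    "waist.girth": 78.0,
}
predicted = predict_height(person)
```

Holding every other measurement fixed, switching gender from 0 to 1 changes the prediction by exactly the gender coefficient, 3.79164.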

The range of the data is very important in understanding how our model predicts height. For example, while gender has the highest coefficient, its values are only 0 or 1, so its effect on the response variable is fairly small. In contrast, weight has a high coefficient compared with the other variables and a similar range of values. Only chest girth has values that are consistently larger, but its coefficient is far smaller in magnitude (-0.17182).

This leads us to believe that weight is the variable with the clearest and most significant effect in explaining height. Its very high correlation with many of the other variables supports this conclusion, since it makes weight representative of a set of variables for each data point. Weight also has a very high VIF of 16.5, which implies a very large amount of correlation with the other variables. This does not necessarily mean that our model is very poor: its predictive power as a model is still intact, but our ability to predict based on individual variables is limited. This is a fairly large flaw in our model, but one that we cannot easily solve without removing variables that have a high VIF or using methods such as LASSO.
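To put the VIF of 16.5 in context, the sketch below uses the standard definition VIF = 1 / (1 - R²), where R² comes from regressing one predictor on all the others. The function names are hypothetical; the only figure taken from the text is 16.5.

```python
def vif_from_r_squared(r_squared):
    """Variance inflation factor for a predictor whose regression on the
    remaining predictors has the given R-squared."""
    if not 0 <= r_squared < 1:
        raise ValueError("R-squared must lie in [0, 1)")
    return 1 / (1 - r_squared)


def r_squared_from_vif(vif):
    """Invert the VIF definition to recover the implied R-squared."""
    if vif < 1:
        raise ValueError("a VIF cannot be smaller than 1")
    return 1 - 1 / vif


# The VIF reported for weight in the text.
implied_r_squared = r_squared_from_vif(16.5)
```

Under this definition, a VIF of 16.5 corresponds to about 94% of the variation in weight being explained by the other predictors, which is why its individual coefficient is hard to interpret on its own.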

Biacromial has good p values for both the t test and the F test, and a high correlation with height[3]. This leads us to believe that it plays a big role in predicting the height of an individual, unlike many of the other variables, which are more prone to change with individual training regimes and could therefore mislead us. For example, bicep girth will increase considerably with individual training; this is not something that applies to biacromial

...
