Regression for Sale Unit of Men’s Wallet
Author: goude2017 • November 21, 2018
...
The relationships between “Customers” and “Comments”, and between “Comments” and “Collection Add”, are questionable, so we need to watch these variables closely during the linear regression that follows.
Regression Analysis
1st Linear Regression
[pic 8]
A variable is kept only if its p-value is no more than 5%, so we first removed “Operation Time”, which had the largest p-value.
2nd Linear Regression
[pic 9]
This time, we removed “Price”.
3rd Linear Regression
[pic 10]
Based on the same principle, we removed “Comments”.
4th Linear Regression
[pic 11]
From the result, we see that both “R Square” and “Adjusted R Square” are 66%, so the model explains about two thirds of the variation in sales volume, and every variable’s p-value is below 5%. The model therefore meets our criteria.
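The elimination steps above (fit, drop the variable with the largest p-value, refit) can be sketched in a few lines. This is only an illustration under stated assumptions: the function names are hypothetical, the toy data are made up and are not the wallet data, and the p-values use a normal approximation to the t distribution, which is reasonable for large samples but rough for the eight toy rows used here.

```python
import math

def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [[float(v) for v in row] + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def p_values(columns, y):
    """Fit OLS with an intercept and return one two-sided p-value per predictor column."""
    n = len(y)
    design = [[1.0] * n] + [list(c) for c in columns]  # intercept column first
    k = len(design)
    xtx = [[sum(a * b for a, b in zip(ci, cj)) for cj in design] for ci in design]
    xty = [sum(a * b for a, b in zip(ci, y)) for ci in design]
    inv = invert(xtx)
    beta = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    resid = [yi - sum(beta[j] * design[j][i] for j in range(k)) for i, yi in enumerate(y)]
    sigma2 = sum(r * r for r in resid) / (n - k)  # residual variance
    t_stats = [beta[i] / math.sqrt(sigma2 * inv[i][i]) for i in range(k)]
    # Normal approximation to the t distribution (an assumption, see the text above).
    return [math.erfc(abs(t) / math.sqrt(2)) for t in t_stats[1:]]

def backward_eliminate(predictors, y, alpha=0.05):
    """Repeatedly drop the predictor with the largest p-value until all are <= alpha."""
    kept = dict(predictors)
    while kept:
        names = list(kept)
        p = p_values([kept[name] for name in names], y)
        worst = max(range(len(names)), key=lambda i: p[i])
        if p[worst] <= alpha:
            break
        del kept[names[worst]]
    return list(kept)

# Made-up toy data: y depends on "signal" only; "noise" is unrelated to y.
signal = [1, 1, 1, 1, -1, -1, -1, -1]
noise = [1, 1, -1, -1, 1, 1, -1, -1]
y = [14, 12, 14, 12, 8, 6, 8, 6]
print(backward_eliminate({"signal": signal, "noise": noise}, y))
```

Removing one variable per pass, rather than all insignificant variables at once, matters because the remaining p-values change after every refit.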
However, we remember the doubtful correlations among “Customers”, “Comments”, and “Collection Add”. So we calculated the correlation again.
                     Sales Volume (unit)  Region        Discount      Customers    Collection Add
Sales Volume (unit)  1
Region               0.113860313          1
Discount             -0.044780318         -0.040390387  1
Customers            0.810195757          0.091465071   -0.018051062  1
Collection Add       0.459024798          0.078729011   0.048421446   0.510088819  1
We noticed the correlation of 0.51 between “Customers” and “Collection Add”. Although 51% is below our 70% threshold, it is still fairly large, so we removed “Collection Add” and ran the final linear regression.
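The screening described here can be written as a small check. The sketch below uses the predictor correlations reported above and the 70% threshold from the text; the dictionary layout and the function name are illustrative assumptions, not part of the original analysis.

```python
# Pairwise correlations among the candidate predictors, as reported above.
# "Sales Volume (unit)" is left out because it is the response, not a predictor.
PREDICTOR_CORR = {
    ("Region", "Discount"): -0.040390387,
    ("Region", "Customers"): 0.091465071,
    ("Discount", "Customers"): -0.018051062,
    ("Region", "Collection Add"): 0.078729011,
    ("Discount", "Collection Add"): 0.048421446,
    ("Customers", "Collection Add"): 0.510088819,
}
THRESHOLD = 0.70  # the cut-off mentioned in the text

def screen(corr, threshold):
    """Return the pairs at or above the threshold and the single strongest pair."""
    flagged = [pair for pair, r in corr.items() if abs(r) >= threshold]
    strongest = max(corr, key=lambda pair: abs(corr[pair]))
    return flagged, strongest

flagged, strongest = screen(PREDICTOR_CORR, THRESHOLD)
print("pairs at or above threshold:", flagged)
print("strongest pair:", strongest, PREDICTOR_CORR[strongest])
```

No pair reaches the threshold, so dropping “Collection Add” is a judgment call about the strongest remaining pair rather than a rule-based removal.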
5th Linear Regression
[pic 12]
Estimation and Checking the Model Assumptions
Regression statistics
R² is 0.66 and adjusted R² is also 0.66. These values are not very high, but they are sufficient for us to accept that the regression model fits the data reasonably well. The standard error, which estimates the standard deviation of the random noise, is 281.7.
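For readers wondering why R² and adjusted R² agree to two decimals, the standard adjustment formula makes it visible. The sketch below assumes three predictors (Region, Discount, Customers); the sample size is not stated in this excerpt, so the value of 1000 is purely hypothetical.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared for n observations and k predictors (intercept not counted)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

R2 = 0.66          # reported in the text
K = 3              # Region, Discount, Customers
N_ASSUMED = 1000   # hypothetical sample size, not given in this excerpt

print(round(adjusted_r2(R2, N_ASSUMED, K), 4))
# With a much smaller (also hypothetical) sample the penalty becomes visible.
print(round(adjusted_r2(R2, 20, K), 4))
```

With few predictors and many observations the penalty term is close to 1, so the two statistics nearly coincide.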
Test significance for the regression model
First we set the null hypothesis and alternative hypothesis.
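H₀: β₂ = β₄ = β₅ = 0; Hₐ: at least one βⱼ ≠ 0
The p-value of the overall F-test is 1.06 × 10⁻²⁵⁰, far below the 0.05 significance level, so we reject H₀ and conclude that the regression model is significant.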
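The statistic behind this test can be computed directly from R². The sketch below shows the usual formula for the overall F statistic; the sample size is not given in this excerpt, so the value used is a hypothetical placeholder, and the resulting number should not be read as the one behind the reported p-value.

```python
def f_statistic(r2, n, k):
    """Overall F statistic for a regression with k predictors and n observations."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))

R2 = 0.66          # reported in the text
K = 3              # Region, Discount, Customers
N_ASSUMED = 1000   # hypothetical sample size, not given in this excerpt

print(round(f_statistic(R2, N_ASSUMED, K), 2))
```

The p-value is then the upper tail of an F distribution with k and n - k - 1 degrees of freedom, which requires a statistics library or a table.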
Test significance of the individual variables
For each coefficient βᵢ we test H₀: βᵢ = 0 against Hₐ: βᵢ ≠ 0, and reject H₀ at the 0.05 level when the p-value is below 0.05.
Intercept:
t statistic of the intercept = 3.08
p-value of the intercept = 0.00209, which is less than the 0.05 significance level
Region:
t statistic of Region = 2.81
p-value of Region = 0.00507, which is less than the 0.05 significance level
Discount:
t statistic of Discount = -2.07
p-value of Discount = 0.03847, which is less than the 0.05 significance level
Customers:
t statistic of Customers = 58.07
p-value of Customers = 2.21 × 10⁻⁷, which is less than the 0.05 significance level
So the coefficients of all the variables are significant.
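The link between a t statistic and its p-value can be illustrated with a short calculation. The sketch below assumes the sample is large enough for the t distribution to be approximated by the standard normal (the sample size is not stated in this excerpt), and it covers the three moderate t statistics reported above; the function name is illustrative.

```python
import math

def two_sided_p(t_stat):
    """Two-sided p-value under a standard normal approximation to the t distribution."""
    return math.erfc(abs(t_stat) / math.sqrt(2))

# (t statistic, reported p-value) pairs taken from the text.
REPORTED = {
    "Intercept": (3.08, 0.00209),
    "Region": (2.81, 0.00507),
    "Discount": (-2.07, 0.03847),
}

for name, (t_stat, reported_p) in REPORTED.items():
    approx = two_sided_p(t_stat)
    print(f"{name}: approximate p = {approx:.5f}, reported p = {reported_p}")
```

The approximate values land close to the reported ones, which is what one would expect when the residual degrees of freedom are large.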
Check for regression assumption
- Multi-collinearity
                     Sales Volume (unit)  Region        Discount      Customers
Sales Volume (unit)  1
Region               0.113860313          1
Discount             -0.044780318         -0.040390387  1
Customers            0.810195757          0.091465071   -0.018051062  1
From the table above, we can see that the correlations among the independent variables are low (all below 0.1 in absolute value), so there is no evidence of multi-collinearity.
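As a complementary check that the original analysis does not report, the same three correlations can be turned into variance inflation factors (VIFs), which equal the diagonal of the inverse of the predictors' correlation matrix. The sketch below uses the closed form for three predictors; the function name is an assumption made for illustration.

```python
def vif_three(r12, r13, r23):
    """VIFs for three predictors, from their pairwise correlations.

    Each VIF is a diagonal entry of the inverse 3x3 correlation matrix,
    written out here as cofactor divided by determinant.
    """
    det = 1 + 2 * r12 * r13 * r23 - r12 ** 2 - r13 ** 2 - r23 ** 2
    return (
        (1 - r23 ** 2) / det,  # predictor 1
        (1 - r13 ** 2) / det,  # predictor 2
        (1 - r12 ** 2) / det,  # predictor 3
    )

# Order: 1 = Region, 2 = Discount, 3 = Customers (values from the table above).
vifs = vif_three(r12=-0.040390387, r13=0.091465071, r23=-0.018051062)
for name, value in zip(("Region", "Discount", "Customers"), vifs):
    print(f"{name}: VIF = {value:.4f}")
```

A VIF near 1 means a predictor is almost uncorrelated with the others; common rules of thumb only raise concern at values around 5 to 10.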
- Heteroscedasticity
[pic 13]
[pic 14]
[pic 15]
From the residual plots above, we can see there is no evidence of heteroscedasticity.
- Normality
[pic 16]
We can
...