Decision Trees - Machine Learning
Author: Maryam • April 1, 2018 • 1,954 Words (8 Pages)
...
Gains for Nodes
Node   Node N   Node Percent   Gain N   Gain Percent   Response   Index
1      394      61.0%          179      66.1%          45.4%      108.3%
2      252      39.0%          92       33.9%          36.5%      87.0%
Growing Method: CHAID
Dependent Variable: BINARY LOYALTY
---------------------------------------------------------------
Risk
Estimate   Std. Error
.420       .019
Growing Method: CHAID
Dependent Variable: BINARY LOYALTY
---------------------------------------------------------------
Classification
Observed             Predicted LOW LOYALTY   Predicted HIGH LOYALTY   Percent Correct
LOW LOYALTY          375                     0                        100.0%
HIGH LOYALTY         271                     0                        0.0%
Overall Percentage   100.0%                  0.0%                     58.0%
Growing Method: CHAID
Dependent Variable: BINARY LOYALTY
---------------------------------------------------------------
TAKEAWAYS
The Gain-Percentile chart shows the gain for High Loyalty in each node: node 1 captures 179/271 = 66.1% of High Loyalty customers, whereas node 2 captures 92/271 = 33.9%. Because there are only two terminal nodes, the cumulative gain rises to 66.1% at about the 60th percentile (node 1 holds 61.0% of the sample) and then to 100% (66.1% + 33.9%) at the 100th percentile. As a result, the area under the gains curve is not much greater than 0.5, the area under the diagonal baseline of a naive model with no predictive power (one that selects cases at random).
Gain is the percentage of total cases in the target category in each node, computed as: (node target n / total target n) x 100. The gains chart is a line chart of cumulative percentile gains, computed as: (cumulative percentile target n / total target n) x 100.
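The gain formula and the shape of this two-node gains curve can be checked with a short Python sketch. It uses only the counts from the Gains for Nodes table; the variable names are illustrative, and it assumes that cases within a node are interchangeable, so the cumulative gains curve is a straight line between node boundaries and its area can be taken with the trapezoidal rule.

```python
# Counts from the "Gains for Nodes" table (CHAID, BINARY LOYALTY).
node_n = {1: 394, 2: 252}        # all customers in each terminal node
node_target_n = {1: 179, 2: 92}  # HIGH LOYALTY customers in each node

total_n = sum(node_n.values())                # 646
total_target_n = sum(node_target_n.values())  # 271

# Gain per node: (node target n / total target n) x 100
gain = {k: 100 * node_target_n[k] / total_target_n for k in node_n}

# Cumulative gains curve: visit nodes in order of decreasing response and
# accumulate (fraction of sample, fraction of HIGH LOYALTY captured).
order = sorted(node_n, key=lambda k: node_target_n[k] / node_n[k], reverse=True)
curve = [(0.0, 0.0)]
cum_n = cum_target = 0
for k in order:
    cum_n += node_n[k]
    cum_target += node_target_n[k]
    curve.append((cum_n / total_n, cum_target / total_target_n))

# Area under the piecewise-linear curve (trapezoidal rule);
# the random-selection diagonal has an area of 0.5.
auc = sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(curve, curve[1:]))

for k in order:
    print(f"node {k}: gain = {gain[k]:.1f}%")
print("gains curve points:", [(round(x, 3), round(y, 3)) for x, y in curve])
print(f"area under gains curve = {auc:.3f} (diagonal baseline = 0.500)")
```

With these counts the curve passes through (0.61, 0.661) and (1.0, 1.0), and the area under it comes to about 0.525, only slightly above the 0.5 of the diagonal.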
The Response-Percentile chart shows that the response for the target category, High Loyalty, does not deviate much from the response of the entire sample. For a model with good predictive power, the more the response of a terminal node exceeds the response of the root node (node 0), the better the tree. In this case the root node gives a High Loyalty response of 42% (271/646), whereas nodes 1 and 2 give 45.4% and 36.5% respectively, which is not a useful improvement in response prediction.
Response is the percentage of cases in the node that belong to the specified target category. The response chart is a line chart of cumulative percentile response, computed as: (cumulative percentile target n / cumulative percentile total n) x 100.
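A minimal sketch of the node response and cumulative response formulas, again using only the counts from the Gains for Nodes table (the list layout and names are illustrative). The last cumulative value equals the root-node response, since by that point the whole sample is included.

```python
# (node id, customers in node, HIGH LOYALTY customers in node),
# listed in order of decreasing response as in the Gains for Nodes table.
nodes = [(1, 394, 179), (2, 252, 92)]

total_n = sum(n for _, n, _ in nodes)
total_target_n = sum(t for _, _, t in nodes)

# Root-node (node 0) response: share of the whole sample that is HIGH LOYALTY.
root_response = 100 * total_target_n / total_n

cum_n = cum_target = 0
node_response = {}
cumulative_response = []
for node_id, n, target_n in nodes:
    node_response[node_id] = 100 * target_n / n  # response within the node
    cum_n += n
    cum_target += target_n
    cumulative_response.append(100 * cum_target / cum_n)

for (node_id, _, _), cum in zip(nodes, cumulative_response):
    print(f"node {node_id}: response = {node_response[node_id]:.1f}%, "
          f"cumulative response = {cum:.1f}%")
print(f"root node response = {root_response:.1f}%")
```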
The Index-Percentile chart shows that the ratio of each terminal node's response to that of the root node (node 0) is close to 1: 108.3% for node 1 and 87.0% for node 2. For a model with good predictive power, this index should be as large as possible for the best nodes, showing that the tree can identify combinations of predictor values in which the class of interest is concentrated.
Index is the ratio of the node's response percentage for the target category to the target category's response percentage for the entire sample. The index chart is a line chart of cumulative percentile index values. Cumulative percentile index is computed as: (cumulative percentile response percent / total response percent) x 100.
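The index values in the table follow from the same counts. This sketch (illustrative names, counts taken from the Gains for Nodes table) divides each node's response by the whole-sample response; the cumulative index ends at 100% once every node is included.

```python
# (node id, customers in node, HIGH LOYALTY customers in node)
nodes = [(1, 394, 179), (2, 252, 92)]

total_n = sum(n for _, n, _ in nodes)
total_target_n = sum(t for _, _, t in nodes)
overall_response = 100 * total_target_n / total_n  # about 41.95%

index = {}
cumulative_index = []
cum_n = cum_target = 0
for node_id, n, target_n in nodes:
    # Node index: node response / overall response, as a percentage.
    index[node_id] = 100 * (100 * target_n / n) / overall_response
    # Cumulative index: cumulative response / overall response, as a percentage.
    cum_n += n
    cum_target += target_n
    cumulative_index.append(100 * (100 * cum_target / cum_n) / overall_response)

for node_id in index:
    print(f"node {node_id}: index = {index[node_id]:.1f}%")
print("cumulative index:", [round(v, 1) for v in cumulative_index])
```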
Both terminal nodes (1 and 2) are assigned the prediction Low Loyalty, because misclassification costs are equal for the two response categories and more than 50% of the customers in each node have Low Loyalty (54.6% in node 1 and 63.5% in node 2). As a result, the tree correctly classifies 0% of High Loyalty customers, the overall percentage correct is a low 58.0%, and the risk estimate (the misclassification rate) is a high 42.0%.
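The classification table and risk estimate can be reproduced from the node counts. This sketch assumes equal misclassification costs and no custom priors, so each node predicts its majority class (ties go to LOW here, an arbitrary choice). It also assumes the risk standard error is the binomial one, sqrt(R(1 - R)/N), which with N = 646 matches the .019 in the Risk table. Names are illustrative.

```python
import math

# (node id, LOW LOYALTY n, HIGH LOYALTY n); LOW n = node N minus HIGH n.
nodes = [(1, 394 - 179, 179), (2, 252 - 92, 92)]

# Equal costs: each node predicts its majority class (assumed tie rule: LOW).
prediction = {node_id: "LOW" if low >= high else "HIGH" for node_id, low, high in nodes}

# Confusion matrix indexed as confusion[observed][predicted].
confusion = {"LOW": {"LOW": 0, "HIGH": 0}, "HIGH": {"LOW": 0, "HIGH": 0}}
for node_id, low, high in nodes:
    confusion["LOW"][prediction[node_id]] += low
    confusion["HIGH"][prediction[node_id]] += high

n = sum(low + high for _, low, high in nodes)
correct = confusion["LOW"]["LOW"] + confusion["HIGH"]["HIGH"]
risk = 1 - correct / n                      # misclassification rate
risk_se = math.sqrt(risk * (1 - risk) / n)  # assumed binomial standard error

pct_correct_high = 100 * confusion["HIGH"]["HIGH"] / sum(confusion["HIGH"].values())

print(prediction)
print(confusion)
print(f"overall correct = {100 * correct / n:.1f}%, "
      f"HIGH LOYALTY correct = {pct_correct_high:.1f}%")
print(f"risk estimate = {risk:.3f}, std. error = {risk_se:.3f}")
```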
Hence, this is not a good model.
---------------------------------------------------------------
Marketing Model for Brand Advocacy
Customer Loyalty Assessment based on Demographics & Behavioral Responses
Model Summary
Specifications
...