Airfare Prediction Model
Author: Jannisthomas • February 7, 2018
...
Only 34% of the observations remained in the same cluster when 5% of the data was removed. This shows that the cluster formation is not very stable.
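A stability figure like the one above can be measured roughly as follows. This is a minimal sketch, not the method used in the report: the function name `same_cluster_share` and the label lists are hypothetical, and it assumes both clustering runs have already been restricted to the observations that survive the 5% removal. Because cluster numbers are arbitrary, the second run is relabeled in every possible way and the best agreement is kept.

```python
from itertools import permutations

def same_cluster_share(labels_before, labels_after, k):
    """Share of observations whose cluster is unchanged between two runs.

    Cluster numbers carry no meaning, so every relabeling of the second
    run is tried and the highest agreement is returned.
    """
    if len(labels_before) != len(labels_after):
        raise ValueError("label lists must cover the same observations")
    best = 0
    for perm in permutations(range(k)):
        # perm[b] is the first-run label assigned to second-run label b.
        hits = sum(1 for a, b in zip(labels_before, labels_after) if a == perm[b])
        best = max(best, hits)
    return best / len(labels_before)

# Hypothetical labels: same groups, renumbered, with one observation moved.
before = [0, 0, 0, 1, 1, 2, 2, 2]
after = [2, 2, 2, 0, 0, 1, 1, 0]
share = same_cluster_share(before, after, k=3)
print(share)  # 0.875
```

Trying every relabeling is only practical for a small number of clusters such as the three used here.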
- Use k-means clustering with the number of clusters that you found above in Part (a). Does the same picture emerge? If not, how does it contrast with or validate the finding in Part (c) above?
Solution.
The same picture emerges in both k-means clustering and hierarchical clustering.
Original coordinates
Cluster     #Obs    Avg. Dist    Cluster Names
Cluster-1   2519    33974.75     Low: Infrequent Fliers
Cluster-2   198     151189.8     High: Frequent Fliers
Cluster-3   1282    79520.32     Medium: Intermittent Fliers
Overall     3999    54379.35
Observed to be the same as found in hierarchical clustering.
Normalized coordinates
Cluster     #Obs    Avg. Dist
Cluster-1   2519    1.787111
Cluster-2   198     5.832602
Cluster-3   1282    2.274985
Overall     3999    2.143815
- Average distance travelled is low in Cluster 1, which indicates less frequent fliers.
- Average distance travelled is high in Cluster 2, which indicates frequent fliers.
- Average distance travelled is medium in Cluster 3: it lies between Cluster 1 and Cluster 2, which indicates intermittent fliers.
The same can be seen in the normalized coordinates:
[pic 5]
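The two summary tables can be reproduced in outline as shown here. This is a sketch under stated assumptions: the data are synthetic stand-ins (two hypothetical columns, a miles balance and flights per year), the small `kmeans` helper is a plain Lloyd's algorithm written for illustration rather than the tool used in the report, and "Avg. Dist" is taken to mean the mean Euclidean distance from each observation to its cluster centroid, which is one common reading of that column in k-means output. Normalizing means z-scoring each column so that large-valued columns do not dominate the distances.

```python
import numpy as np

def nearest(X, centroids):
    """Index of the closest centroid for every row of X."""
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)

def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm (illustrative); returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    # Start from k distinct rows chosen at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        labels = nearest(X, centroids)
        updated = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(updated, centroids):
            break
        centroids = updated
    return nearest(X, centroids), centroids

def cluster_summary(X, labels, centroids):
    """(cluster number, #Obs, mean distance to centroid) per cluster."""
    rows = []
    for j in range(len(centroids)):
        members = X[labels == j]
        avg_dist = np.linalg.norm(members - centroids[j], axis=1).mean()
        rows.append((j + 1, len(members), avg_dist))
    return rows

rng = np.random.default_rng(42)
# Hypothetical stand-in data: column 0 ~ miles balance, column 1 ~ flights/year.
X = np.vstack([
    rng.normal([20_000, 1], [5_000, 0.5], size=(120, 2)),
    rng.normal([150_000, 12], [20_000, 2.0], size=(20, 2)),
    rng.normal([70_000, 5], [10_000, 1.0], size=(60, 2)),
])
Z = (X - X.mean(axis=0)) / X.std(axis=0)  # normalized (z-scored) coordinates

summaries = {}
for name, data in [("Original", X), ("Normalized", Z)]:
    labels, centroids = kmeans(data, k=3)
    summaries[name] = cluster_summary(data, labels, centroids)
    print(name, "coordinates")
    for cluster, n_obs, avg_dist in summaries[name]:
        print(f"  Cluster-{cluster}  #Obs={n_obs}  Avg. Dist={avg_dist:.4f}")
```

Cluster numbering depends on the random starting centroids, so clusters should be matched across runs by their contents rather than by their numbers.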
- Which cluster(s) would you target for offers, and what type of offers would you target to customers in that cluster? Include proper reasoning in support of your choice of cluster(s) and the corresponding offer(s).
Solution.
- Cluster 1: As these are less frequent fliers, promotional schemes aimed at them are unlikely to achieve much.
- Cluster 2: As these are the airline's frequent fliers and are also less price sensitive, promotional offers should be limited to those needed for retention. Offers related to service guarantees and quality would help more.
- Cluster 3: As these are price-sensitive, intermittent fliers, they will respond most to promotional schemes and offers. Early-bird discounts, bulk discounts and round-trip ("to and fro") offers will help here.
Question 2
Q1. PCA on wine data
- Enumerate the insights you gathered during your PCA exercise.
The weights for PC 1 to PC 13 are available in the table Principal Components.
- The variance is highest in principal component column 1 and decreases moving right from PC 1 to PC 13.
- The covariance between the principal component columns is 0.
- Out of a total variance of 13, 4.7 is captured by the first column, which is around 36%.
- The next column adds a further 19% of the variance, for a cumulative 55%.
- Most of the variance, about 90%, is covered by the first 8 columns.
- Hence we can keep only 8 columns and still retain most of the information. Dimension reduction to 8 is feasible here.
[pic 6]
[pic 7]
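The quantities listed in the insights above (variance per component, cumulative share, zero covariance between components, and the number of components needed to reach 90%) can be computed along these lines. This is a minimal sketch that assumes the PCA was run on standardized variables, which is consistent with a total variance of 13 for 13 measurements. The data here are a synthetic stand-in, not the wine table, and the function name `pca_on_correlation` is made up for illustration.

```python
import numpy as np

def pca_on_correlation(X):
    """PCA of standardized columns; returns eigenvalues, weights, scores."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    corr = np.cov(Z, rowvar=False)            # correlation matrix of X
    eigvals, eigvecs = np.linalg.eigh(corr)   # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]         # reorder to largest first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    return eigvals, eigvecs, Z @ eigvecs

rng = np.random.default_rng(0)
# Hypothetical stand-in for the wine table: 178 rows, 13 correlated columns.
latent = rng.normal(size=(178, 4))
X = latent @ rng.normal(size=(4, 13)) + 0.5 * rng.normal(size=(178, 13))

eigvals, weights, scores = pca_on_correlation(X)
share = eigvals / eigvals.sum()       # variance share per component
cumulative = np.cumsum(share)         # running total, PC 1 to PC 13
# Smallest number of components whose cumulative share reaches 90%.
n_keep = int(np.searchsorted(cumulative, 0.90)) + 1

print("total variance:", round(float(eigvals.sum()), 6))
print("components needed for 90%:", n_keep)
```

With standardized inputs the eigenvalues sum to the number of variables, so a first eigenvalue of 4.7 out of 13 corresponds to 4.7 / 13 ≈ 0.36, the 36% quoted above.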
- What are the social and business values of those insights, and how can the value of those insights be harnessed?
With PCA we reduced the data to only 8 dimensions. This reduces cost and removes unnecessary analysis without losing useful information.
PART 2: Cluster Analysis
- Step 1: Cluster Analysis with all chemical measurements:
[pic 8]
- Step 2: Cluster Analysis using 2 most significant PC scores:
[pic 9]
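The two steps can be compared with a sketch like the following. Everything in it is an assumption made for illustration: the data are synthetic (three groups of 50 rows with 13 columns), the `kmeans` helper is a plain Lloyd's algorithm rather than the clustering tool used in the report, and the PC scores are computed from standardized columns. The cross-tabulation counts how many observations fall into each pair of clusters from the two runs.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm (illustrative); returns cluster labels."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        updated = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(updated, centroids):
            break
        centroids = updated
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)

def first_pc_scores(Z, n_components=2):
    """Scores on the leading principal components of standardized data Z."""
    eigvals, eigvecs = np.linalg.eigh(np.cov(Z, rowvar=False))
    top = np.argsort(eigvals)[::-1][:n_components]
    return Z @ eigvecs[:, top]

rng = np.random.default_rng(1)
# Hypothetical stand-in data: 3 groups of 50 rows, 13 measurements each.
centers = rng.normal(scale=4.0, size=(3, 13))
X = np.vstack([c + rng.normal(size=(50, 13)) for c in centers])
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

pc_scores = first_pc_scores(Z)            # Step 2 input: 2 PC scores per row
labels_all = kmeans(Z, k=3)               # Step 1: all measurements
labels_pc = kmeans(pc_scores, k=3)        # Step 2: two leading PC scores

# Rows: clusters from Step 1; columns: clusters from Step 2.
crosstab = np.zeros((3, 3), dtype=int)
np.add.at(crosstab, (labels_all, labels_pc), 1)
print(crosstab)
```

If each row of the cross-tabulation has a single dominant entry, the two steps produce essentially the same grouping, up to a renumbering of the clusters.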
- Any more insights you come across during the clustering exercise?
Solution.
The clusters are separated mostly on the basis of the average diluted wine value and the Proline content.
- Cluster 1: concentrated in Proline content.
- Cluster 2: concentrated in diluted wine.
- Cluster 3: low wine content and high Proline content.
- Are there clearly separable clusters of wines? How many clusters did you go with? How are the clusters obtained in part (i) different from or similar to the clusters obtained in part (ii), qualitatively?
Solution.
- Yes, there are 3 clearly visible clusters of wine.
-
...