Data Mining
Author: goude2017 • December 26, 2017
...
Note: See "Missing value transformation" for further details on handling missing values.
- PCA for variable reduction:
We removed uninformative variables and replaced the missing values, but many attributes remained that still needed to be reduced. For this purpose, we applied principal component analysis (PCA) to these sets of attributes, so that the data could be represented by a smaller number of components in place of the original variables.
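As an illustration of the reduction described above, the following is a minimal sketch of PCA in plain Python. It is not the RapidMiner process used in this report. The toy data, the function names and the use of power iteration with deflation are assumptions made only for this example. Attributes measured on very different scales would normally be standardized first, which the sketch omits.

```python
import math
import random


def center(rows):
    """Subtract each column's mean so every attribute has mean zero."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def covariance(centered):
    """Sample covariance matrix of rows that are already centered."""
    n = len(centered)
    cols = list(zip(*centered))
    return [[sum(a * b for a, b in zip(ci, cj)) / (n - 1) for cj in cols]
            for ci in cols]


def top_component(cov, iters=500, seed=0):
    """Leading eigenvector and eigenvalue of cov, found by power iteration."""
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in cov]
    for _ in range(iters):
        w = [sum(c * x for c, x in zip(row, v)) for row in cov]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    cov_v = [sum(c * x for c, x in zip(row, v)) for row in cov]
    eigenvalue = sum(a * b for a, b in zip(v, cov_v))
    return v, eigenvalue


def pca(rows, k):
    """Return the scores on the first k components and their variances."""
    centered = center(rows)
    cov = covariance(centered)
    components, variances = [], []
    for _ in range(k):
        v, lam = top_component(cov)
        components.append(v)
        variances.append(lam)
        # Deflate: remove the variance already explained by this component.
        cov = [[c - lam * vi * vj for c, vj in zip(row, v)]
               for row, vi in zip(cov, v)]
    scores = [[sum(x * w for x, w in zip(row, comp)) for comp in components]
              for row in centered]
    return scores, variances


# Hypothetical toy data: three attributes, the second is twice the first.
rows = [[1, 2, 0], [2, 4, 1], [3, 6, 0], [4, 8, 1]]
scores, variances = pca(rows, 2)
print(variances)
```

Here the three original columns are replaced by two score columns. Because the second attribute is an exact multiple of the first, two components carry all of the variance in this toy data.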
PCA categories: We divided the attributes into 11 categories for PCA:
- Giving history: We applied PCA to these variables because a donor's giving history can be a contributing factor in a model that predicts potential donors. The RapidMiner model is shown in the appendix, and the parameters used for the PCA are given in the appendix table.
Attributes input to analyze Giving History PCA
AVGGIFT
CARDGIFT
LASTGIFT
MAXRDATE
MINRAMNT
MINRDATE
NEXTDATE
NGIFTALL
RAMNTALL
TIMELAG
- Promotion history: We included the promotion history attributes so that past data on the various promotions and the donors' responses to them could be used in our predictive modelling.
The parameters used for the PCA are given in the appendix table.
Attributes input to analyze Promo_History
CARDPM12
CARDPROM
MAXADATE
MINRDATE
NUMPRM12
NUMPROM
- Neighborhood attributes: Of the 480 attributes in total, 286 describe characteristics of the donor's neighborhood. To derive meaningful insight from this segment while reducing the number of variables, we performed PCA on this category to capture the cumulative effect of many attributes in a few reduced variables. The parameters used for the PCA are given in the appendix table.
AGE907
POP901
CHIL1
POP902
CHIL2
POP903
CHIL3
POP90C1
AGEC1
POP90C2
AGEC2
POP90C3
AGEC3
POP90C4
AGEC4
POP90C5
AGEC5
AGE901
AGEC6
AGE902
AGEC7
AGE903
CHILC1
AGE904
CHILC2
AGE905
CHILC3
AGE906
CHILC4
CHILC5
MARR4
HHAGE1
HHP1
HHAGE2
HHP2
HHAGE3
HHN4
HHN1
HHN5
HHN2
HHN6
HHN3
MARR1
MARR2
MARR3
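When many attributes are collapsed into a few components, as with the neighborhood attributes above, one common way to decide how many components to keep is a cumulative variance threshold. The sketch below illustrates that idea only. The 0.95 threshold and the component variances are hypothetical and are not taken from this report, whose actual PCA parameters are listed in its appendix table.

```python
def components_to_keep(variances, threshold=0.95):
    """Smallest number of leading components whose cumulative share
    of the total variance reaches the threshold."""
    total = sum(variances)
    if total <= 0:
        raise ValueError("total variance must be positive")
    ordered = sorted(variances, reverse=True)
    running = 0.0
    for k, var in enumerate(ordered, start=1):
        running += var
        if running / total >= threshold:
            return k
    # Fallback for rounding effects: keep every component.
    return len(ordered)


# Hypothetical component variances, not values from the report.
example_variances = [5.0, 2.5, 1.5, 0.6, 0.3, 0.1]
kept = components_to_keep(example_variances, threshold=0.95)
print(kept)
```

A lower threshold keeps fewer components at the cost of discarding more of the original variance.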
- Donor interest in other types of mail orders: These attributes describe the donor's interests and responses to the different types of mail orders received. Because taking these interests and responses into account can help us build a better model, we applied PCA to these attributes. The parameters used for the PCA are given in the appendix table.
Attributes input to analyze PCA Response
MAGFAML
MAGFEM
MAGMALE
MBBOOKS
MBCOLECT
MBCRAFT
MBGARDEN
PUBCULIN
PUBDOITY
PUBGARDN
PUBHLTH
PUBNEWFN
PUBOPP
PUBPHOTO
- Military history: We applied PCA to the attributes describing the donor's military history.
MALEMILI
MALEVET
VIETVETS
WWIIVETS
LOCALGOV
STATEGOV
FEDGOV
- Ethnicity: Based on the neighborhood population data, we applied PCA to the ethnicity variables ETH1-ETH16.
- Housing PCA: Based on the housing information for the neighborhood population.
- Income
...