Some datasets may have (heavily) skewed predictors. This may increase bias as well as variance of models assuming linear associations between predictor and response. Predictors can then be scaled using non-linear transformations in order to have more linear associations between predictors and response. Here Andrew Gelman discusses some potential advantages of log-transforming predictors when associations between predictor and response are non-linear.
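For illustration, a minimal R sketch (simulated data; all names are illustrative) of log-transforming a skewed predictor before fitting a linear model:

```r
set.seed(42)
x <- rlnorm(200)              # heavily right-skewed predictor
y <- 2 * log(x) + rnorm(200)  # response is linear in log(x), not in x

fit_raw <- lm(y ~ x)          # misspecified: assumes a linear association with x
fit_log <- lm(y ~ log(x))     # association is (approximately) linear after the transform
```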
Many decision-tree methods, however, are insensitive to the scaling of the predictors, so they do not require this kind of pre-processing. Some suggestions for methods that are invariant with respect to monotonic transformations (e.g., log transform, or squaring of a non-negative predictor) of the predictors:
- A single decision tree grown with the CART or MOB algorithm, as implemented in the rpart and partykit R packages, respectively. An advantage of the latter (MOB) is that it has no preference for splitting on variables with a larger number of possible split points. Package partykit also implements the conditional inference tree algorithm, which is a good method, but it employs linear association tests for variable selection, and these are sensitive to the scaling of the predictors. See the first code sketch below.
- Decision-tree ensembles based on the previously mentioned tree algorithms, as implemented in the randomForest and mobForest packages; or a CART-based gradient boosting ensemble, as implemented in package gbm. See the second sketch below.
- A prediction rule ensemble based on MOB or CART. This strikes a balance between the interpretability of a single tree and the higher predictive accuracy of a decision-tree ensemble. The method is implemented in package pre: specify use.grad = FALSE to derive the rules using MOB, or tree.unbiased = FALSE to derive them using CART (by default, conditional inference trees are used). See the third sketch below.
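A minimal sketch of the single-tree options, using the airquality data that ships with base R (formulas and partitioning variables are only illustrative):

```r
library("rpart")
library("partykit")

aq <- na.omit(airquality)

## CART tree (rpart): splits depend only on the ordering of predictor values,
## so monotonic transformations of the predictors leave the tree unchanged
cart_tree <- rpart(Ozone ~ ., data = aq)

## MOB tree (lmtree): fits linear models in the nodes, partitions on Wind and Temp
mob_tree <- lmtree(Ozone ~ Solar.R | Wind + Temp, data = aq)

## Conditional inference tree (ctree): unbiased variable selection, but the
## association tests it uses are not invariant to predictor scaling
ci_tree <- ctree(Ozone ~ ., data = aq)
```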
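A similar sketch for two of the ensemble options (randomForest and gbm; the tuning values below are illustrative, not recommendations):

```r
library("randomForest")
library("gbm")

aq <- na.omit(airquality)

## Bagged ensemble of CART-like trees
rf <- randomForest(Ozone ~ ., data = aq)

## Gradient-boosted ensemble of CART trees
gb <- gbm(Ozone ~ ., data = aq, distribution = "gaussian",
          n.trees = 500, interaction.depth = 3, shrinkage = 0.01)
```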
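And a sketch of the prediction rule ensemble options in package pre, again on the airquality data (fitting may take a moment):

```r
library("pre")

aq <- na.omit(airquality)

## Default: rules derived from conditional inference trees
pr_ctree <- pre(Ozone ~ ., data = aq)

## Rules derived with CART (tree.unbiased = FALSE)
pr_cart <- pre(Ozone ~ ., data = aq, tree.unbiased = FALSE)

## Rules derived with MOB (use.grad = FALSE)
pr_mob <- pre(Ozone ~ ., data = aq, use.grad = FALSE)
```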