Thursday, August 11, 2016

R functions model.frame and model.matrix

This post is a reminder of the differences between model.frame() and model.matrix() in R.

Firstly, model.frame() creates a data.frame and therefore keeps the column classes as they are (e.g., numeric, logical, ordered or factor), whereas model.matrix() creates a numeric matrix, so every column ends up numeric. With R's default contrast settings, a variable of class logical becomes a single 0-1 column (with "TRUE" appended to the variable's name); a variable of class factor becomes (number of factor levels - 1) 0-1 columns using treatment contrasts (each level is compared with the first level of the factor); and a variable of class ordered becomes (number of factor levels - 1) numeric columns using polynomial contrasts (linear, quadratic, cubic and so on), which are not 0-1 coded.
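As a quick check of the conversions described above, here is a minimal sketch on a small made-up data set (the data frame d and its columns flag, grp and lvl are hypothetical and not part of the example further down). It assumes R's default contrast options (contr.treatment for factors, contr.poly for ordered factors).

```r
## Hypothetical toy data: one logical, one factor and one ordered column
d <- data.frame(flag = c(TRUE, FALSE, TRUE, FALSE, TRUE, FALSE),
                grp  = factor(c("a", "b", "c", "a", "b", "c")),
                lvl  = factor(c(1, 2, 3, 1, 2, 3), ordered = TRUE))

mf <- model.frame(~ flag + grp + lvl, data = d)
mm <- model.matrix(~ flag + grp + lvl, data = d)

## model.frame keeps the original classes (first class of each column shown)
mf_classes <- sapply(mf, function(col) class(col)[1])
mf_classes

## model.matrix is a single numeric matrix; the column names show the coding
mm_cols <- colnames(mm)
mm_cols

## The polynomial contrast columns are numeric but not 0-1 coded
mm[, c("lvl.L", "lvl.Q")]
```

The column names should show flagTRUE for the logical, grpb and grpc for the factor (level a being the reference), and lvl.L and lvl.Q for the linear and quadratic contrasts of the ordered factor.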

Secondly, model.frame() keeps the y variable (the left-hand side of the formula) in its result by default, whereas model.matrix() does not. If you want to leave the y variable out of the model.frame, simply leave it out of the formula (e.g., "~ x1 + x2" instead of "y ~ x1 + x2").
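The handling of the left-hand side can be seen in a short sketch. The data frame d below is made up for illustration; model.response() is the standard stats helper for pulling the response back out of a model frame.

```r
## Hypothetical toy data
d <- data.frame(y = c(2.1, 3.9, 6.2, 7.8), x1 = 1:4)

## With a left-hand side, the model frame contains y ...
mf_with_y <- model.frame(y ~ x1, data = d)
names(mf_with_y)

## ... and with a one-sided formula it does not
mf_no_y <- model.frame(~ x1, data = d)
names(mf_no_y)

## The model matrix holds only the predictors (plus intercept), never y
mm <- model.matrix(y ~ x1, data = d)
colnames(mm)

## The response can still be recovered from the model frame
resp <- model.response(mf_with_y)
resp
```

This is one reason to build the model.frame first when both the response and the design matrix are needed: the response comes from the frame, the predictors from the matrix.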

Thirdly, model.matrix() includes a column of ones for the intercept by default. If you do not want this column in your model.matrix, include "-1" (or, equivalently, "+ 0") in the right-hand side of the formula (e.g., "y ~ -1 + x1 + x2").
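A small sketch of the intercept behaviour, again on made-up data (d, x1 and x3 here are hypothetical stand-ins). One side effect worth knowing about, under the default treatment contrasts: once the intercept is dropped, the first factor in the formula gets one indicator column per level instead of (number of levels - 1).

```r
## Hypothetical toy data: one numeric and one two-level factor
d <- data.frame(x1 = 1:4, x3 = factor(c("a", "b", "a", "b")))

with_int    <- model.matrix(~ x1 + x3, data = d)
without_int <- model.matrix(~ -1 + x1 + x3, data = d)
zero_form   <- model.matrix(~ 0 + x1 + x3, data = d)  # same as "-1"

## Default: intercept column plus one indicator for level b
colnames(with_int)

## No intercept: x3 now gets an indicator for every level
colnames(without_int)
colnames(zero_form)
```

So removing the intercept does not simply delete a column; it also changes how the first factor is coded, which matters when interpreting the coefficients of a model fitted on that matrix.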

The following R code also illustrates some of the differences:

## Create example dataset and model formula:
exdata <- data.frame(y = 1:8, x1 = c(1:4, 1:4), 
                      x2 = rep(c(TRUE, FALSE), times = 2), 
                      x3 = factor(rep(letters[1:2], times = 4)), 
                      x4 = factor(rep(1:4, times = 2), ordered = TRUE))
exformula <- y ~ x1 + x2 + x3 + x4                      
## Create an example model.frame:
exmodframe <- model.frame(exformula, exdata)
## y is retained in the model.frame:
exmodframe
## model.frame keeps the same column classes:
sapply(exmodframe, class)
## Create an example model.matrix:
exmodmat1 <- model.matrix(exformula, exdata)
## y is dropped from model.matrix:
exmodmat1
## all columns are converted to numeric class:
apply(exmodmat1, 2, class)
## It does not make a difference if we create a model.matrix directly from the 
## data, instead of from a model.frame of the data:
exmodmat2 <- model.matrix(exformula, exmodframe)
exmodmat2
apply(exmodmat2, 2, class)
## Check that all entries of the two matrices are equal:
all(exmodmat1 == exmodmat2)
