Is there a difference between the R functions fitted() and predict()?


R Problem Overview


Is there a difference between the functions fitted() and predict()? I've noticed that mixed models from lme4 work with fitted() but not predict().

R Solutions


Solution 1 - R

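Yes, there is. If there is a link function relating the linear predictor to the expected value of the response (such as log for Poisson regression or logit for logistic regression), predict (for glm objects, with its default type = "link") returns values on the scale of the linear predictor, i.e. before the inverse of the link function is applied to bring them back to the scale of the response variable. fitted returns the values after the inverse link has been applied. Calling predict with type = "response" also applies the inverse link and gives the same values as fitted.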

For example:

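# Simulated Poisson data (no seed is set, so your numbers will differ from those below)
x = rnorm(10)
y = rpois(10, exp(x))
m = glm(y ~ x, family="poisson")

# fitted(): expected counts, on the scale of the response
print(fitted(m))
#         1         2         3         4         5         6         7         8 
# 0.3668989 0.6083009 0.4677463 0.8685777 0.8047078 0.6116263 0.5688551 0.4909217 
#         9        10 
# 0.5583372 0.6540281 
# predict(): linear predictor, i.e. log of the expected counts (default type = "link")
print(predict(m))
#          1          2          3          4          5          6          7 
# -1.0026690 -0.4970857 -0.7598292 -0.1408982 -0.2172761 -0.4916338 -0.5641295 
#          8          9         10 
# -0.7114706 -0.5827923 -0.4246050 
# The two differ exactly by the log link
print(all.equal(log(fitted(m)), predict(m)))
# [1] TRUE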
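The same relationship can be checked without taking logs by hand. The sketch below assumes the Poisson model above but fixes a seed (the value 1 is arbitrary) so the run is reproducible. It shows that predict with type = "response" applies the inverse link, and that family(m)$linkinv, the inverse link stored in the model's family object, maps the default predict output onto fitted.

```r
# Reproducible version of the simulated Poisson data (seed chosen arbitrarily)
set.seed(1)
x <- rnorm(10)
y <- rpois(10, exp(x))
m <- glm(y ~ x, family = poisson)

eta <- predict(m)                     # default type = "link": linear predictor (log scale)
mu  <- predict(m, type = "response")  # inverse link applied: expected counts

# Inverse of the log link (essentially exp()), taken from the family object
linkinv <- family(m)$linkinv

print(all.equal(mu, fitted(m)))       # type = "response" matches fitted()
print(all.equal(linkinv(eta), mu))    # applying the inverse link by hand
```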

This also means that for models fitted by ordinary linear regression (lm), which have no link function to invert, there is no difference between fitted and predict when predict is called without new data.
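A quick check of the lm case, as a sketch using the built-in mtcars data (the dataset and formula are an arbitrary choice for illustration): with no newdata, predict evaluates the model at the original observations and agrees with fitted.

```r
# Ordinary linear model on a built-in dataset (chosen only for illustration)
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Without newdata, predict() uses the data the model was fitted to
same <- all.equal(fitted(fit), predict(fit))
print(same)
# [1] TRUE
```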

In practical terms, this means that if you want to compare the fit to the original data, you should use fitted.
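Because fitted values are on the same scale as the observed response, they can be set directly against the data. A minimal sketch, assuming a reproducible version of the Poisson example (the seed is arbitrary): the raw differences y - fitted(m) are what residuals(m, type = "response") returns, whereas subtracting predict(m) would mix the count scale with the log scale.

```r
set.seed(1)
x <- rnorm(10)
y <- rpois(10, exp(x))
m <- glm(y ~ x, family = poisson)

# Observed counts next to fitted means: both are on the count scale
comparison <- data.frame(observed = y, fitted = fitted(m))
print(comparison)

# Response residuals are simply observed minus fitted
raw_resid <- y - fitted(m)
print(all.equal(raw_resid, residuals(m, type = "response")))
```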

Solution 2 - R

The fitted function returns the y-hat values associated with the data used to fit the model. The predict function returns predictions for a new set of predictor variables. If you don't specify a new set of predictor variables, it uses the original data by default, giving the same results as fitted for some models; if you want predictions for a new set of values, you need predict. The predict function often also has options for which type of prediction to return: the linear predictor, the prediction transformed to the response scale, the most likely category, the contribution of each term in the model, etc.
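A sketch of the difference, assuming a small Poisson glm on simulated data (the data frame dat, its coefficients and the new_x grid are hypothetical): fitted only ever gives one value per original observation, while predict accepts newdata and a type argument. For glm objects, predict supports type = "link", "response" and "terms".

```r
# Hypothetical simulated data
set.seed(42)
dat <- data.frame(x = rnorm(50))
dat$y <- rpois(50, exp(0.5 + 0.8 * dat$x))
m <- glm(y ~ x, family = poisson, data = dat)

# fitted() is tied to the data used in the fit: one value per row of dat
print(length(fitted(m)))

# Passing the original data as newdata reproduces fitted() on the response scale
print(all.equal(predict(m, newdata = dat, type = "response"), fitted(m)))

# predict() can also evaluate the model at new predictor values
new_x <- data.frame(x = c(-1, 0, 1))
print(predict(m, newdata = new_x, type = "link"))      # log of the expected count
print(predict(m, newdata = new_x, type = "response"))  # expected count
print(predict(m, newdata = new_x, type = "terms"))     # per-term contributions (centred)
```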

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | N Brouwer | View Question on Stackoverflow
Solution 1 - R | David Robinson | View Answer on Stackoverflow
Solution 2 - R | Greg Snow | View Answer on Stackoverflow