How to save & load xgboost model?
Problem Overview
From the XGBoost guide:
> After training, the model can be saved.
>
>     bst.save_model('0001.model')
>
> The model and its feature map can also be dumped to a text file.
>
>     # dump model
>     bst.dump_model('dump.raw.txt')
>     # dump model with feature map
>     bst.dump_model('dump.raw.txt', 'featmap.txt')
>
> A saved model can be loaded as follows:
>
>     bst = xgb.Booster({'nthread': 4})  # init model
>     bst.load_model('model.bin')  # load data
My questions are as follows.
- What's the difference between save_model and dump_model?
- What's the difference between saving '0001.model' and saving 'dump.raw.txt', 'featmap.txt'?
- Why is the model name used for loading, model.bin, different from the name it was saved under, 0001.model?
- Suppose that I trained two models, model_A and model_B, and want to save both for future use. Which save and load functions should I use? Could you show the process clearly?
Python Solutions
Solution 1 - Python
Here is how I solved the problem:
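import pickle

file_name = "xgb_reg.pkl"

# save (the with block closes the file even if pickling fails)
with open(file_name, "wb") as f:
    pickle.dump(xgb_model, f)

# load
with open(file_name, "rb") as f:
    xgb_model_loaded = pickle.load(f)

# test: predict on one validation row, kept 2-D, with both models
ind = 1
test = X_val[ind:ind + 1]
xgb_model_loaded.predict(test)[0] == xgb_model.predict(test)[0]
Out[1]: True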
Solution 2 - Python
Both save_model and dump_model write the model to a file. The difference is that dump_model writes the trees in a text format and can include feature names.
load_model works with the file written by save_model. The output of dump_model can be used, for example, with xgbfi.
When loading the model, you need to specify the path where your model is saved. In the example bst.load_model("model.bin"), the model is loaded from the file model.bin; that is just the name of the file containing the model. Good luck!
EDIT: According to the XGBoost documentation (for version 1.3.3), dump_model() should be used to save the model for further interpretation. For saving and loading the model, save_model() and load_model() should be used. Please check the docs for more details.
There is also a difference between the Learning API and the Scikit-Learn API of XGBoost. The latter saves the best_ntree_limit variable, which is set during training with early stopping. You can read the details in my article How to save and load Xgboost in Python?
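The save_model() method recognizes the format from the file name: if a *.json name is given, the model is saved as JSON; otherwise it is saved in XGBoost's internal binary format, not as a text file (this describes the 1.x releases discussed here).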
Solution 3 - Python
An easy way to save and load an XGBoost model is with the joblib library.
import joblib
# save model (xgb is the trained model, filename is the target path)
joblib.dump(xgb, filename)
# load saved model
xgb = joblib.load(filename)
Solution 4 - Python
Don't use pickle or joblib, as that may introduce a dependency on the xgboost version. The canonical way to save and restore models is with save_model and load_model.
> If you’d like to store or archive your model for long-term storage, use save_model (Python) and xgb.save (R).
This is the relevant documentation for the latest versions of XGBoost. It also explains the difference between dump_model and save_model.
Note that you can serialize and deserialize your models as JSON by giving the file name a .json extension when calling bst.save_model. If the speed of saving and restoring the model is not important to you, this is very convenient: the model is a plain text file, so you can put it under proper version control.
Solution 5 - Python
If you are using the sklearn API, you can use the following:
import xgboost
xgb_model_latest = xgboost.XGBClassifier()  # or whichever sklearn estimator you are using
xgb_model_latest.load_model("model.json")  # or model.bin if you saved in the binary format rather than JSON
If you load with the Booster approach shown in the question, you get an xgboost Booster from the native Python API, not the estimator from the sklearn API.
So this seems to be the most Pythonic way to load saved XGBoost model data if you are using the sklearn API.