What does calling fit() multiple times on the same model do?

PythonMachine LearningScikit Learn

Python Problem Overview


After I instantiate a scikit model (e.g. LinearRegression), if I call its fit() method multiple times (with different X and y data), what happens? Does it fit the model on the data like if I just re-instantiated the model (i.e. from scratch), or does it keep into accounts data already fitted from the previous call to fit()?

Trying with LinearRegression (also looking at its source code) it seems to me that every time I call fit(), it fits from scratch, ignoring the result of any previous call to the same method. I wonder if this true in general, and I can rely on this behavior for all models/pipelines of scikit learn.

Python Solutions


Solution 1 - Python

If you will execute model.fit(X_train, y_train) for a second time - it'll overwrite all previously fitted coefficients, weights, intercept (bias), etc.

If you want to fit just a portion of your data set and then to improve your model by fitting a new data, then you can use estimators, supporting "Incremental learning" (those, that implement partial_fit() method)

Solution 2 - Python

You can use term fit() and train() word interchangeably in machine learning. Based on classification model you have instantiated, may be a clf = GBNaiveBayes() or clf = SVC(), your model uses specified machine learning technique.
And as soon as you call clf.fit(features_train, label_train) your model starts training using the features and labels that you have passed.

you can use clf.predict(features_test) to predict.
If you will again call clf.fit(features_train2, label_train2) it will start training again using passed data and will remove the previous results. Your model will reset the following inside model:

  • Weights
  • Fitted Coefficients
  • Bias
  • And other training related stuff...

You can use partial_fit() method as well if you want your previous calculated stuff to stay and additionally train using next data

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionFantaView Question on Stackoverflow
Solution 1 - PythonMaxU - stop genocide of UAView Answer on Stackoverflow
Solution 2 - PythonsgrpwrView Answer on Stackoverflow