How to get Best Estimator on GridSearchCV (Random Forest Classifier Scikit)

PythonScikit LearnRandom ForestCross Validation

Python Problem Overview


I'm running GridSearch CV to optimize the parameters of a classifier in scikit. Once I'm done, I'd like to know which parameters were chosen as the best.

Whenever I do so I get a AttributeError: 'RandomForestClassifier' object has no attribute 'best_estimator_', and can't tell why, as it seems to be a legitimate attribute on the documentation.

from sklearn.grid_search import GridSearchCV

X = data[usable_columns]
y = data[target]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

rfc = RandomForestClassifier(n_jobs=-1,max_features= 'sqrt' ,n_estimators=50, oob_score = True) 

param_grid = {
    'n_estimators': [200, 700],
    'max_features': ['auto', 'sqrt', 'log2']
}

CV_rfc = GridSearchCV(estimator=rfc, param_grid=param_grid, cv= 5)

print '\n',CV_rfc.best_estimator_
    

Yields:

`AttributeError: 'GridSearchCV' object has no attribute 'best_estimator_'

Python Solutions


Solution 1 - Python

You have to fit your data before you can get the best parameter combination.

from sklearn.grid_search import GridSearchCV
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
# Build a classification task using 3 informative features
X, y = make_classification(n_samples=1000,
                           n_features=10,
                           n_informative=3,
                           n_redundant=0,
                           n_repeated=0,
                           n_classes=2,
                           random_state=0,
                           shuffle=False)


rfc = RandomForestClassifier(n_jobs=-1,max_features= 'sqrt' ,n_estimators=50, oob_score = True) 

param_grid = { 
    'n_estimators': [200, 700],
    'max_features': ['auto', 'sqrt', 'log2']
}

CV_rfc = GridSearchCV(estimator=rfc, param_grid=param_grid, cv= 5)
CV_rfc.fit(X, y)
print CV_rfc.best_params_

Solution 2 - Python

Just to add one more point to keep it clear.

The document says the following:

> best_estimator_ : estimator or dict: > >Estimator that was chosen by the search, i.e. estimator which gave highest score (or smallest loss if specified) on the left out data.

When the grid search is called with various params, it chooses the one with the highest score based on the given scorer func. Best estimator gives the info of the params that resulted in the highest score.

Therefore, this can only be called after fitting the data.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
Questionsapo_cosmicoView Question on Stackoverflow
Solution 1 - PythonRyanView Answer on Stackoverflow
Solution 2 - PythonrohithnamaView Answer on Stackoverflow