Grid Search


The majority of machine learning models contain parameters that can be adjusted to vary how the model learns. For example, the logistic regression model from sklearn has a parameter C that controls regularization, which affects the complexity of the model.
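In sklearn, such parameters are passed when the model is constructed and can be read back or changed later. The minimal sketch below uses the standard `get_params` and `set_params` estimator methods; the values 0.5 and 2.0 are arbitrary illustrations.

```python
from sklearn.linear_model import LogisticRegression

# Parameters such as C are set when the model is constructed
model = LogisticRegression(C=0.5)
print(model.get_params()["C"])  # 0.5

# They can also be changed afterwards, before fitting again
model.set_params(C=2.0)
print(model.C)  # 2.0
```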

How do we pick the best value for C? The best value depends on the data used to train the model.

How does it work?

One method is to try out different values and then pick the value that gives the best score. This technique is known as a grid search. If we had to select values for two or more parameters, we would evaluate all combinations of the sets of values, thus forming a grid of values.
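To make the "grid" concrete, here is a small sketch that enumerates every combination of candidate values for two parameters with `itertools.product`. The choice of `fit_intercept` as the second parameter and the specific candidate values are assumptions for illustration only; each printed pair is one point on the grid that a grid search would train and score.

```python
from itertools import product

# Illustrative candidate values for two parameters of LogisticRegression
C_values = [0.1, 1, 10]
fit_intercept_values = [True, False]

# Every combination of the two sets is one point on the grid
grid = list(product(C_values, fit_intercept_values))

for C, fit_intercept in grid:
    print(f"C={C}, fit_intercept={fit_intercept}")

print(len(grid))  # 3 values x 2 values = 6 combinations
```

The number of models to train grows multiplicatively with each parameter added, which is why the candidate sets are usually kept small.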

Before we get into the example, it is good to know what the parameter we are changing does. Higher values of C tell the model that the training data resembles real-world information, so it places a greater weight on the training data (weaker regularization). Lower values of C do the opposite and apply stronger regularization.
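One way to see this effect is to compare the size of the learned coefficients, since the regularization penalty shrinks them. The sketch below fits the iris data with a very small and a very large C (0.01 and 100 are arbitrary illustrative values) and compares the norms of `coef_`; measuring the effect this way is our own choice, not something the tutorial prescribes.

```python
import numpy as np
from sklearn import datasets
from sklearn.linear_model import LogisticRegression

iris = datasets.load_iris()
X, y = iris['data'], iris['target']

# Small C means strong regularization; large C means weak regularization
strong = LogisticRegression(C=0.01, max_iter=10000).fit(X, y)
weak = LogisticRegression(C=100, max_iter=10000).fit(X, y)

# The penalty shrinks coefficients, so the strongly regularized model
# should end up with a smaller coefficient norm
norm_strong = np.linalg.norm(strong.coef_)
norm_weak = np.linalg.norm(weak.coef_)
print(norm_strong, norm_weak)
```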

Using Default Parameters

First, let's see what kind of results we can generate without a grid search, using only the default parameters.

To get started, we must first load the dataset we will be working with.

from sklearn import datasets
iris = datasets.load_iris()
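Before modelling, it can help to peek at what `load_iris` returns. This short sketch inspects the data array and the class names; `tolist()` is used so the names print as plain Python strings.

```python
from sklearn import datasets

iris = datasets.load_iris()

# The measurements: 150 flowers, 4 features each
print(iris['data'].shape)             # (150, 4)
# The three species the model will learn to tell apart
print(iris['target_names'].tolist())  # ['setosa', 'versicolor', 'virginica']
# Names of the four measured features
print(iris['feature_names'])
```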

Next, in order to create the model, we must have a set of independent variables X and a dependent variable y.

X = iris['data']
y = iris['target']
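A quick check of X and y confirms that there is one row of features and one label per flower, and that the classes are balanced:

```python
import numpy as np
from sklearn import datasets

iris = datasets.load_iris()
X = iris['data']
y = iris['target']

# One row of four measurements per flower
print(X.shape)         # (150, 4)
# One integer class label (0, 1 or 2) per flower
print(y.shape)         # (150,)
# Count of flowers in each class
print(np.bincount(y))  # [50 50 50]
```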

Now we will load the logistic regression model for classifying the iris flowers.

from sklearn.linear_model import LogisticRegression

Next, we create the model, setting max_iter to a higher value to ensure that the model finds a result (that is, the solver converges).

Keep in mind that the default value for C in a logistic regression model is 1; we will compare against this later, when we train models on the iris data set with varying values for C.
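The default value of C can be confirmed directly on an unfitted model:

```python
from sklearn.linear_model import LogisticRegression

logit = LogisticRegression(max_iter=10000)

# C was not passed, so the model uses the library default
print(logit.C)  # 1.0
```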

logit = LogisticRegression(max_iter = 10000)
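Why raise max_iter? The sketch below contrasts a deliberately tiny iteration budget (5, an arbitrary choice) with 10000. With the tiny budget, sklearn is expected to emit a `ConvergenceWarning`; with the larger one, the fitted model's `n_iter_` attribute shows how many iterations the solver actually needed, which should be well under the cap.

```python
import warnings
from sklearn import datasets
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegression

iris = datasets.load_iris()
X, y = iris['data'], iris['target']

# Record warnings raised while fitting with a very small iteration budget
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    LogisticRegression(max_iter=5).fit(X, y)
hit_limit = any(issubclass(w.category, ConvergenceWarning) for w in caught)
print(hit_limit)

# With max_iter=10000 the solver should stop well before the cap
logit = LogisticRegression(max_iter=10000).fit(X, y)
print(logit.n_iter_)
```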

After we create the model, we must fit the model to the data.

print(logit.fit(X,y))
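Printing the result of `fit` shows the model itself rather than a number, because sklearn's `fit` returns the estimator it was called on. This sketch checks that and shows one attribute learned during fitting:

```python
from sklearn import datasets
from sklearn.linear_model import LogisticRegression

iris = datasets.load_iris()
X, y = iris['data'], iris['target']

logit = LogisticRegression(max_iter=10000)
fitted = logit.fit(X, y)

# fit() returns the same estimator object, now trained
print(fitted is logit)  # True
print(fitted)           # LogisticRegression(max_iter=10000)
# Fitting also records the class labels found in y
print(logit.classes_)   # [0 1 2]
```

Returning the estimator is what allows chaining such as `LogisticRegression(max_iter=10000).fit(X, y).score(X, y)`.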

To evaluate the model, we run the score method, which for a classifier returns the mean accuracy on the data passed to it.

print(logit.score(X,y))
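The score can be reproduced by hand as the fraction of correct predictions. Note that, as in the tutorial, the model is scored on the same data it was trained on, so this measures how well it fits the training data.

```python
import numpy as np
from sklearn import datasets
from sklearn.linear_model import LogisticRegression

iris = datasets.load_iris()
X, y = iris['data'], iris['target']

logit = LogisticRegression(max_iter=10000).fit(X, y)

# For classifiers, score() is the mean accuracy of predict() on the given data
score = logit.score(X, y)
manual = np.mean(logit.predict(X) == y)
print(score, manual)
```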

Example

from sklearn import datasets
from sklearn.linear_model import LogisticRegression

iris = datasets.load_iris()

X = iris['data']
y = iris['target']

logit = LogisticRegression(max_iter = 10000)

print(logit.fit(X,y))

print(logit.score(X,y))

With the default setting of C = 1, we achieved a score of 0.973.

Let's see if we can do any better by implementing a grid search with different values of C.
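A minimal sketch of such a search: train one model per candidate value of C, score each one the same way as above, and keep the best. The candidate list is an illustrative assumption; it includes the default of 1 so the result can be compared with the baseline.

```python
from sklearn import datasets
from sklearn.linear_model import LogisticRegression

iris = datasets.load_iris()
X, y = iris['data'], iris['target']

# Candidate values for C (illustrative; 1 is the default)
C_values = [0.01, 0.1, 1, 10, 100]

scores = []
for C in C_values:
    # Train one model per candidate value, keeping everything else fixed
    logit = LogisticRegression(C=C, max_iter=10000)
    logit.fit(X, y)
    scores.append(logit.score(X, y))

# Pick the value of C with the highest score (first one wins on ties)
best_index = max(range(len(C_values)), key=lambda i: scores[i])
best_C = C_values[best_index]
print(best_C, scores[best_index])
```

Because every model is scored on its own training data, larger values of C tend to look better here; sklearn's `GridSearchCV` runs the same kind of search but scores each candidate with cross-validation instead.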

