What is inductive bias in machine learning?

Machine LearningTerminology

Machine Learning Problem Overview


What is inductive bias in machine learning? Why is it necessary?

Machine Learning Solutions


Solution 1 - Machine Learning

Every machine learning algorithm with any ability to generalize beyond the training data that it sees has some type of inductive bias, which are the assumptions made by the model to learn the target function and to generalize beyond training data.

For example, in linear regression, the model assumes that the output or dependent variable is related to independent variable linearly (in the weights). This is an inductive bias of the model.

Solution 2 - Machine Learning

What is inductive bias?

Pretty much every design choice in machine learning signifies some sort of inductive bias. "Relational inductive biases, deep learning, and graph networks" (Battaglia et. al, 2018) is an amazing  read, which I will be referring to throughout this answer.

> An inductive bias allows a learning algorithm to prioritize one solution (or interpretation) over another, independent of the observed data. [...] Inductive biases can express assumptions about either the data-generating process or the space of solutions.

Examples in deep learning

Concretely speaking, the very composition of layers  in deep learning provides a type of relational inductive bias: hierarchical processing. The type of layer imposes further relational inductive biases:

Various relational inductive biases in standard deep learning components (Battaglia et. al, 2018)

More generally, non-relational inductive biases used in deep learning include:

  • activation non-linearities,
  • weight decay,
  • dropout,
  • batch and layer normalization,
  • data augmentation,
  • training curricula,
  • optimization algorithms,
  • anything that imposes constraints on the learning trajectory.

Examples outside of deep learning

In a Bayesian model, inductive biases are typically expressed through the choice and parameterization of the prior distribution. Adding a Tikhonov regularization penalty to your loss function implies assuming that simpler hypotheses are more likely.

Conclusion

The stronger the inductive bias, the better the sample efficiency--this can be understood in terms of the bias-variance tradeoff. Many modern deep learning methods follow an “end-to-end” design philosophy which emphasizes minimal a priori representational and computational assumptions, which explains why they tend to be so data-intensive. On the other hand, there is a lot of research into baking stronger relational inductive biases into deep learning architectures, e.g. with graph networks.

An aside about the word "inductive"

In philosophy, inductive reasoning refers to generalization from specific observations to a conclusion. This is a counterpoint to deductive reasoning, which refers to specialization from general ideas to a conclusion.

Solution 3 - Machine Learning

Inductive bias is the set of assumptions a learner uses to predict results given inputs it has not yet encountered.

Solution 4 - Machine Learning

According to Tom Mitchell's definition,

> an inductive bias of a learner is the set of additional assumptions > sufficient to justify its inductive inferences as deductive > inferences.

I couldn't quite understand above definition so I searched through Wikipedia and was able to summarize the definition in layman's terms.

> Given a data set, which learning model (=Inductive Bias) should be > chosen?

Inductive Bias has some prior assumptions about the tasks. Not one bias that is best on all problems and there have been a lot of research efforts to automatically discover the Inductive Bias.

The following is a list of common inductive biases in machine learning algorithms.

Maximum conditional independence: if the hypothesis can be cast in a Bayesian framework, try to maximize conditional independence. This is the bias used in the Naive Bayes classifier.

Minimum cross-validation error: when trying to choose among hypotheses, select the hypothesis with the lowest cross-validation error. Although cross-validation may seem to be free of bias, the "no free lunch" theorems show that cross-validation must be biased.

Maximum margin: when drawing a boundary between two classes, attempt to maximize the width of the boundary. This is the bias used in support vector machines. The assumption is that distinct classes tend to be separated by wide boundaries.

Minimum description length: when forming a hypothesis, attempt to minimize the length of the description of the hypothesis. The assumption is that simpler hypotheses are more likely to be true. See Occam's razor.

Minimum features: unless there is good evidence that a feature is useful, it should be deleted. This is the assumption behind feature selection algorithms.

Nearest neighbors: assume that most of the cases in a small neighborhood in feature space belong to the same class. Given a case for which the class is unknown, guess that it belongs to the same class as the majority in its immediate neighborhood. This is the bias used in the k-nearest neighbors algorithm. The assumption is that cases that are near each other tend to belong to the same class.

More information is here: Inductive Bias - How to generalize on novel data

Solution 5 - Machine Learning

Inductive bias can be thought of as the set of assumptions we make about a domain which we are trying to learn about.

Technically, when we are trying to learn Y from X and, initially, the hypothesis space (different functions for learning X->Y) for Y is infinite. To learn anything at all, we need to reduce the scope. This is done in the form of our beliefs/assumptions about the hypothesis space, also called inductive bias.

Through the introduction of these assumptions, we constrain our hypothesis space and also get the capability to incrementally test and improve on the data in the form of hyper-parameters.

Examples of inductive bias -

  • Linear Regression: Y varies linearly in X (in parameters of X).
  • Logistic Regression: There exists a hyperplane which separates negative / positive examples
  • Neural Networks: crudely speaking, Y is some non-linear function of X (the non linearity depends on the activation functions, topology etc.)

Solution 6 - Machine Learning

Inductive bias is nothing but a set of assumptions which a model learns by itself through observing the relationship among data points in order to make a generalized model. The accuracy of prediction will then be increased when exposed to a new test data in real time.

For example:

Let’s consider a regression model to predict the marks of a student considering attendance percentage as an independent variable-

enter image description here

Here, the model will assume that there is a linear relationship between attendance percentage and marks of the student. This assumption is nothing but an Inductive bias.

In future, if any new test data is applied to the model then this model will try to predict the marks with respect to the learning it had through this training data. Linearity is important information (assumption) this model has even before it is seeing the test data for first time. So, the inductive bias of this model is an assumption of linearity between independent and dependent variable.

Consider another example in which we have support vector machine model to classify whether a vehicle is car or bike based on the number of wheels it has as an independent variable-

enter image description here

Here, the model will try to increase the distance between 2 classes by trying to maximizing the width between decision boundaries. This learning will be used as assumptions in test data which is an inductive bias of this model.

Similarly, we can consider many examples in machine learning with respect to the character of many algorithms like-

K Nearest neighbors - Assuming that the class of a new data point belongs to the category of majority of K nearest neighbors’ classes.

Association rules - Assuming that 2 items with a high confidence value will be probably bought together by customers most of the times.

Time series analysis - Assuming that when an intrinsic value of a time dependent variable crosses the moving average from below to above then there is a chance that the value of time dependent value will increase over time.

Solution 7 - Machine Learning

I think it is a set of assumption with which people can predict from inputs which are not in the data set we have more properly. It is necessary for a model to have some inductive bias, because only with it the model can be more useful for more data. The goal of model is to fit in the most of data, but not only in the sample data. So inductive bias is important.

Solution 8 - Machine Learning

It is seen often that a machine learning algorithms work well when tested on the training set and does not work so good when working with new data, it has not seen. In machine learning, the term inductive bias refers to a set of assumptions made by a learning algorithm to generalize a finite set of observation (training data) into a general model of the domain.

For example In linear regression, the model implies that the output or dependent variable is related to the independent variable linearly (in the weights). This is an inductive bias of the model. Hope this answer helps.

Solution 9 - Machine Learning

Inductive bias (of a learning algorithm) refers to a set of assumptions that the learner uses to predict outputs given unseen inputs. The most commonly used ML models rely on inductive bias such as

Maximum conditional independence in Bayesian framework

Minimum cross-validation error

Maximum margin in Support Vector Machine (SVM)

Minimum description length (Based on Occam’s razor principle that argues simpler hypotheses are more likely to be true)

Minimum features (Unless there is good evidence that a feature is useful, it should be deleted)

Nearest neighbors in clustering

Model ensemble in boosting algorithms (using multiple additive classifiers to obtain better predictive performance).

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
Questionhaguki-taroView Question on Stackoverflow
Solution 1 - Machine LearningViswaView Answer on Stackoverflow
Solution 2 - Machine LearningChristabella IrwantoView Answer on Stackoverflow
Solution 3 - Machine LearningNadeeshani WilliamView Answer on Stackoverflow
Solution 4 - Machine LearningaerinView Answer on Stackoverflow
Solution 5 - Machine LearningVaibhavView Answer on Stackoverflow
Solution 6 - Machine LearningSanjay NandakumarView Answer on Stackoverflow
Solution 7 - Machine LearningxiaotongView Answer on Stackoverflow
Solution 8 - Machine Learningmaria_gView Answer on Stackoverflow
Solution 9 - Machine LearningLelouch viView Answer on Stackoverflow