What is the difference between regression and classification, when we try to generate an output for a training data set $x$?
What is the difference between regression and classification?
11 Answers
Regression: the output variable takes continuous values.
Classification: the output variable takes class labels.
-
For classification, don't we use numerical values? Something like: if the output variable is in [0, 10) it indicates class A, if in [10, 20) then class B, and so on. And if so, are the underlying concepts (algorithm/formula) the same for both classification and regression? – 2017-06-07
Regression involves estimating or predicting a response.
Classification is identifying group membership.
-
@endolith In a narrow sense, we say "estimate parameter" but "predict/forecast response". – 2018-01-24
Regression and classification are both related to prediction: regression predicts a value from a continuous set, whereas classification predicts 'belonging' to a class.
For example, the price of a house, depending on its 'size' (in some unit) and 'location', can be some 'numerical value' (which can be continuous): this relates to regression.
Similarly, the prediction of price can be in words, viz., 'very costly', 'costly', 'affordable', 'cheap', and 'very cheap': this relates to classification.
Each class may correspond to some range of values.
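As a rough sketch (the prices, bin edges, and class names here are made up for illustration; numpy assumed available), a continuous price target can be turned into ordered class labels like this:

```python
import numpy as np

# Hypothetical house prices (continuous target -> regression)
prices = np.array([95_000, 180_000, 260_000, 410_000, 750_000])

# Bin edges and class names are assumptions for illustration only
edges = [150_000, 300_000, 500_000]
labels = np.array(["cheap", "affordable", "costly", "very costly"])

# np.digitize maps each price to the index of its bin (categorical target -> classification)
classes = labels[np.digitize(prices, edges)]
print(classes)  # ['cheap' 'affordable' 'affordable' 'costly' 'very costly']
```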
Given the following
$f: x \rightarrow y$
If $y$ is a discrete/categorical variable, then this is a classification problem.
If $y$ is a real number (continuous), then this is a regression problem.
-
This seems to simply reiterate answers from long ago. If you have something new to add, please clarify your answer. – 2014-07-27
Regression: given a set of data, find the best relationship that represents the set of data.
Classification: given a known relationship, identify the class that the data belongs to.
We can see that regression and classification work from opposite ends: one finds the relationship in the data, the other uses a known relationship to find where the data belongs.
Regression and classification can address the same problem when the response variable is, respectively, continuous or ordinal.
But what we need from the result is what makes us choose between the two.
For example, simple/hard classifiers (e.g. SVM) simply try to put the example in a specific class (e.g. whether a project is "profitable" or "not profitable", without accounting for how much), whereas regression can give an exact profit as some continuous value.
However, in the case of classification, we can consider probabilistic models (e.g. logistic regression), where each class (or label) has some probability, which can be weighted by the payoff associated with each label (or class), and thus gives us a final value on the basis of which we decide which label to assign. For instance, label $A$ has a probability of $0.3$, but its payoff is huge (e.g. $1000$), while label $B$ has a probability of $0.7$, but its payoff is very low (e.g. $10$). The expected payoffs are $0.3 \times 1000 = 300$ and $0.7 \times 10 = 7$, so to maximize profit we might assign label $A$ instead of $B$.
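The payoff-weighted decision above could be sketched as follows (the probabilities and payoffs are the made-up numbers from the example; in practice the probabilities would come from a fitted probabilistic classifier, e.g. logistic regression via `predict_proba` in scikit-learn):

```python
# Probabilities and payoffs taken from the example above
probs   = {"A": 0.3, "B": 0.7}
payoffs = {"A": 1000, "B": 10}

# Expected payoff of assigning each label
expected = {label: probs[label] * payoffs[label] for label in probs}
print(expected)  # {'A': 300.0, 'B': 7.0}

# Choose the label with the highest expected payoff
print(max(expected, key=expected.get))  # 'A'
```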
Note: I am still not an expert; perhaps someone can correct me if I am wrong in some part.
http://www.differencebetween.com/difference-between-classification-and-vs-regression/
Classification trees have dependent variables that are categorical and unordered.
Regression trees have dependent variables that are continuous values or ordered discrete values.
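As a toy illustration (data made up; scikit-learn assumed available), the same tree idea splits into two estimators depending on the type of the dependent variable:

```python
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

# One feature (say, house size); targets are invented for illustration
X = [[50], [80], [120], [200]]

# Categorical, unordered dependent variable -> classification tree
clf = DecisionTreeClassifier().fit(X, ["cheap", "cheap", "costly", "costly"])
print(clf.predict([[100]]))  # a class label, e.g. ['cheap']

# Continuous dependent variable -> regression tree
reg = DecisionTreeRegressor().fit(X, [95.0, 150.0, 260.0, 410.0])
print(reg.predict([[100]]))  # a numeric value, e.g. [150.]
```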
Regression means to predict the output value using training data.
Classification means to group the output into a class.
For example, we use regression to predict the house price (a real value) from training data, and we can use classification to predict the type of tumor (e.g. "benign" or "malignant") using training data.
-
It's funny when you read this answer while referring to a machine learning course at Coursera ;). – 2016-02-21
Regression is defined as $E[Y \mid X]$ (i.e. the expectation of $Y$ given $X$).
A subset of these models, those in which $Y$ is binary or categorical, is useful for classification; this includes logistic regression and multinomial regression, along with many machine learning algorithms that essentially have the same target (such as classification trees).
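In symbols (writing $x^\top\beta$ for a linear predictor, and assuming the usual logistic link for the binary case):

$$E[Y \mid X = x] = x^\top\beta \quad \text{(linear regression)}, \qquad E[Y \mid X = x] = P(Y = 1 \mid X = x) = \frac{1}{1 + e^{-x^\top\beta}} \quad \text{(logistic regression)}.$$

A classification is then obtained by thresholding, e.g. predicting class $1$ whenever $P(Y = 1 \mid X = x) > 0.5$.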
It is important to be clear when using terms like regression, classification and prediction to discriminate between the task you are performing and the method used to perform it. A classification task involves taking an input and labelling it as belonging to a given class, so the output is categorical. On the other hand, a prediction task involves predicting a continuous valued output.
Methods for achieving these tasks include regression, in which a continuous valued output is estimated (or, rather, the expected value of a distribution on a continuous variable is estimated, conditional on a given set of input values). This can be used to carry out a prediction task, as you would expect. It can also be used to carry out a classification task, for example using logistic regression to estimate the log odds of the input pattern belonging to a given class. In this case, the task is classification, the method is regression.
Classification methods simply generate a class label rather than estimating a distribution parameter. K nearest neighbour is a good example where the task and the method are both called classification.
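A small sketch of that distinction (toy data; scikit-learn assumed available), contrasting a method that outputs a class label directly with a regression method used for classification:

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression

# Toy one-dimensional data with two classes, invented for illustration
X = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
y = [0, 0, 0, 1, 1, 1]

# Task and method are both classification: k nearest neighbours outputs a label directly
knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)
print(knn.predict([[2.4]]))           # a label, e.g. [0]

# Method is regression (estimates P(Y=1 | x)), task is classification
logreg = LogisticRegression().fit(X, y)
print(logreg.predict_proba([[2.4]]))  # class probabilities
print(logreg.predict([[2.4]]))        # label obtained by thresholding the probability
```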