0

So I understand the difference between positive and negative correlation and so on. However, what I don't understand are the similarities and differences between a positive correlation and a linear equation/relationship.

How are they the same and/or different?

  • 1
    Have you read this Wikipedia article? http://en.wikipedia.org/wiki/Correlation#Correlation_and_linearity 2012-09-07
  • 0
    Never found it :( 2012-09-08

1 Answer

1

Correlation is a metric which measures the strength of the linear relationship between two random variables. For random variables $X$ and $Y$ it is defined as $\rho_{XY}=\frac{E[(X-\mu_X)(Y-\mu_Y)]}{\sigma_X\sigma_Y}$

Whereas a linear equation is given in the form $f(x)=ax+b$, which is in principle deterministic.
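One way to connect the two definitions is a short worked step, assuming $Y$ is an exact linear function of $X$, i.e. $Y=aX+b$ with $a\neq 0$ and $\sigma_X>0$. Then $\mu_Y=a\mu_X+b$ and $\sigma_Y=|a|\sigma_X$, so

$$\rho_{XY}=\frac{E[(X-\mu_X)(aX+b-a\mu_X-b)]}{\sigma_X\,|a|\,\sigma_X}=\frac{a\,\sigma_X^2}{|a|\,\sigma_X^2}=\frac{a}{|a|}=\pm 1.$$

So an exact line with positive slope gives $\rho_{XY}=+1$ and one with negative slope gives $\rho_{XY}=-1$; the actual values of $a$ and $b$ do not change the correlation.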

The idea is that if two random variables $X$ and $Y$ are correlated, the samples of $Y$ follow the samples of $X$ along a straight line, and they do so more closely the stronger the correlation is. For example, if $\rho_{X,Y}=1$ then $X$ and $Y$ are perfectly correlated. This means every sample pair $(x,y)$ lies exactly on a line with positive slope. In the simplest case $X$ and $Y$ output the same samples $x$ and $y$, and the linear equation for this specific case is $f(x)=x$. If instead $Y=X/2$, the correlation is still $1$ and the line is $f(x)=x/2$. In general, perfectly correlated data has the form $y=f(x)=ax+b$ with $a>0$ (with $a<0$ the correlation is $-1$). When the data loses correlation, it spreads around the line $f(x)=ax+b$ for some suitable parameters $a$ and $b$.
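The spreading described above can be checked numerically. The following is only a minimal sketch: the slope, intercept, noise levels, sample size and the helper name `pearson` are illustrative choices, not part of the original answer. It generates points on a line, adds increasing amounts of Gaussian noise, and prints the sample correlation each time.

```python
import math
import random


def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


random.seed(0)    # fixed seed so the run is repeatable
a, b = 2.0, -4.0  # illustrative slope and intercept (assumed values)
xs = [random.uniform(-5, 5) for _ in range(1000)]

correlations = {}
for noise in (0.0, 1.0, 5.0, 20.0):
    # y = a*x + b plus Gaussian noise with standard deviation `noise`
    ys = [a * x + b + random.gauss(0, noise) for x in xs]
    correlations[noise] = pearson(xs, ys)
    print(f"noise std = {noise:5.1f}  ->  rho = {correlations[noise]:.3f}")
```

With no noise the points lie exactly on the line and the correlation is $1$ up to rounding; as the noise grows, the points scatter further from the line and the correlation drops toward $0$.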

  • 0
    Wait, what? As much as I appreciate this, can you make it simpler? (I'm only in 12th grade) 2012-09-08
  • 0
    @Link Assume you have data in a vector $x=[x_1,x_2,...,x_N]$ and another $y=[y_1,y_2,...,y_N]$. If your data samples $x_i$ and $y_i$ are correlated, then they will look similar, right? When this correlation is $100\%$, the simplest case is $y_i=x_i$. E.g. $x=[1, 2, -1, 4]$ and $y=[1, 2, -1, 4]$. If you draw these values in $x$-$y$ coordinates you will see that they follow the line $y=x$. In particular, $x=[1/2, 2/2, -1/2, 4/2]$ and $y=[1, 2, -1, 4]$ will also follow a line, but it is not $y=x$ but $y=2x$. Another line is $x=[1/2+2, 2/2+2, -1/2+2, 4/2+2]$ and $y=[1, 2, -1, 4]$; here we have $y=2x-4$. 2012-09-08
  • 0
    @Link All the examples above have $100\%$ correlation. As you can see, such correlations map to a straight line of the form $y=f(x)=ax+b$ for some suitable $a$ and $b$ (with $a>0$). As you lose correlation, your data $x$ and the corresponding data $y$ will also start to deviate from the line... I hope this clarifies things now. 2012-09-08
  • 0
    okay, it does thanks! 2012-09-08
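The three small vectors from the comments can be checked directly. This is a rough sketch using only the standard library; the helper names `pearson` and `fit_line` are made up for illustration. For each pair it computes the correlation and the least-squares line $y=ax+b$.

```python
import math


def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def fit_line(xs, ys):
    """Least-squares slope a and intercept b of the line y = a*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    a = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    return a, mean_y - a * mean_x


y = [1, 2, -1, 4]
examples = {
    "y = x": [1, 2, -1, 4],
    "y = 2x": [v / 2 for v in y],
    "y = 2x - 4": [v / 2 + 2 for v in y],
}

results = {}
for label, x in examples.items():
    rho = pearson(x, y)
    a, b = fit_line(x, y)
    results[label] = (rho, a, b)
    print(f"{label:10s}  rho = {rho:.3f}  fitted line: y = {a:.3f}*x + {b:.3f}")
```

In all three cases the correlation comes out as $1$, while the fitted slope and intercept differ: the correlation tells you how well the points follow some line, and the linear equation tells you which line it is.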