Missconceptions in Co-relation (group no 4 roll no 25): October 2013

Common Misconceptions

Correlation and causality

The conventional dictum that "correlation does not imply causation" means that correlation cannot be used to infer a causal relationship between the variables.This dictum should not be taken to mean that correlations cannot indicate the potential existence of causal relations. However, the causes underlying the correlation, if any, may be indirect and unknown, and high correlations also overlap with identity relations, where no causal process exists. Consequently, establishing a correlation between two variables is not a sufficient condition to establish a causal relationship (in either direction).

A correlation between age and height in children is fairly causally transparent, but a correlation between mood and health in people is less so. Does improved mood lead to improved health, or does good health lead to good mood, or both? Or does some other factor underlie both? In other words, a correlation can be taken as evidence for a possible causal relationship, but cannot indicate what the causal relationship, if any, might be.

Correlation and linearity

Four sets of data with the same correlation of 0.816

The Pearson correlation coefficient indicates the strength of a linear relationship between two variables, but its value generally does not completely characterize their relationship. In particular, if the conditional mean of Y given X, denoted E(Y|X), is not linear in X, the correlation coefficient will not fully determine the form of E(Y|X).
The image on the right shows scatterplots of Anscombe's quartet, a set of four different pairs of variables created by Francis Anscombe.The four y variables have the same mean (7.5), variance (4.12), correlation (0.816) and regression line (y = 3 + 0.5x). However, as can be seen on the plots, the distribution of the variables is very different. The first one (top left) seems to be distributed normally, and corresponds to what one would expect when considering two variables correlated and following the assumption of normality. The second one (top right) is not distributed normally; while an obvious relationship between the two variables can be observed, it is not linear. In this case the Pearson correlation coefficient does not indicate that there is an exact functional relationship: only the extent to which that relationship can be approximated by a linear relationship. In the third case (bottom left), the linear relationship is perfect, except for one outlier which exerts enough influence to lower the correlation coefficient from 1 to 0.816. Finally, the fourth example (bottom right) shows another example when one outlier is enough to produce a high correlation coefficient, even though the relationship between the two variables is not linear.
These examples indicate that the correlation coefficient, as a summary statistic, cannot replace visual examination of the data. Note that the examples are sometimes said to demonstrate that the Pearson correlation assumes that the data follow a normal distribution, but this is not correct.

Life-time of correlation

Most analyses do not take into account variation of the correlation coefficient with time. If the variables are non-stationary, then some concepts of choosing optimal time intervals are needed. The durability of correlation should also be calculated in such a case.

Bivariate normal distribution

If a pair (X, Y) of random variables follows a bivariate normal distribution, the conditional mean E(X|Y) is a linear function of Y, and the conditional mean E(Y|X) is a linear function of X. The correlation coefficient r between X and Y, along with the marginal means and variances of X and Y, determines this linear relationship:

$E(Y|X) = E(Y) + r\sigma_y\frac{X-E(X)}{\sigma_x},$

where E(X) and E(Y) are the expected values of X and Y, respectively, and σ_x and σ_y are the standard deviations of X and Y, respectively.

Partial correlation

If a population or data-set is characterized by more than two variables, a partial correlation coefficient measures the strength of dependence between a pair of variables that is not accounted for by the way in which they both change in response to variations in a selected subset of the other variables.

Thanks,
Nishant Asrani
Roll No 25
Group No 4

Missconceptions in Co-relation (group no 4 roll no 25)

Tuesday, 8 October 2013

Group 7 roll no 45 T test

Misconception of co-relation