Simple Linear Regression

Previously we tested for a relationship between two categorical variables, but what if our interest involves two quantitative variables? You may recall from algebra the equation of a line: Y = mX + b, where Y is the dependent variable, m is the slope of the line, X is the independent variable, and b is the y-intercept.
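
As a quick refresher, the line equation can be sketched in a few lines of Python. The slope and intercept values below are hypothetical, chosen only for illustration; the function name `line` is likewise an assumption and not part of the course material.

```python
def line(x, m, b):
    """Return Y for a given X on the line Y = mX + b."""
    return m * x + b

# Hypothetical slope and y-intercept, for illustration only.
m, b = 2.0, 1.0

# Every (X, Y) pair produced this way falls exactly on the line.
xs = [0, 1, 2, 3]
ys = [line(x, m, b) for x in xs]

for x, y in zip(xs, ys):
    print(f"X = {x}, Y = {y}")
```

Note that when X is 0 the result is simply b, the y-intercept, and each one-unit increase in X changes Y by m, the slope.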

In algebra you usually had a perfect line, i.e. each X and Y pair fell exactly on the line; in statistics, however, this is rarely the case. The concept begins by considering some dependent variable, e.g. Final Exam score. In looking at such scores you realize that they are not all the same (i.e. not everyone scores the same on the final), so one may wonder what would explain this variation. One possibility is performance on the midterms. To see whether there might be a linear relationship between these two variables, one would start with a scatterplot.


From this plot you can see that a potential linear relationship exists: for the most part, higher midterm averages appear to coincide with higher final exam scores. To put a number on the linear relationship, one would consider the correlation, denoted by the symbol r. From Minitab this correlation is 0.67. Correlation is a measure of the strength of a linear relationship between two quantitative variables. If a perfect positive linear relationship existed, the correlation would be one. However, not all relationships are positive. For instance, consider the variables Weight and Exercise. One would expect that the more one exercises, the lower one's weight would be. That is, increased exercise goes with decreasing weight, which is a negative relationship. Thus correlations can also be negative. This gives the possible range for correlation to be from –…...
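
To make the idea of correlation concrete, here is a minimal sketch that computes r by hand in Python rather than in Minitab. It assumes the usual Pearson formula for r. All of the numbers are made up for illustration; they are not the course data and will not reproduce the 0.67 reported above. The function name `pearson_r` is likewise an assumption.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Points lying exactly on the line Y = 2X + 1 (a perfect positive relationship).
r_perfect = pearson_r([1, 2, 3, 4], [3, 5, 7, 9])

# Made-up midterm averages and final exam scores (positive but not perfect).
midterm = [60, 70, 75, 80, 90]
final = [65, 62, 80, 78, 85]
r_scores = pearson_r(midterm, final)

# Made-up weekly exercise hours and body weights (a negative relationship).
exercise = [0, 1, 2, 3, 4]
weight = [200, 195, 193, 186, 180]
r_weight = pearson_r(exercise, weight)

print(round(r_perfect, 3), round(r_scores, 3), round(r_weight, 3))
```

The first value comes out as one (up to rounding) because the points fall exactly on a line, the second is positive but smaller than one, and the third is negative because weight falls as exercise rises.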

