Statistical Analysis
Regression model:
Investigation of possible relationship between 2 variables:
Examples:
Is there any relationship between cigarette smoking and cancer?
Is there any relationship between environmental factors and cancer?
Is there any relationship between genetic factors and cancer?
Is there any relationship between X and Y?
Is there any relationship between advertising expenditure and sales?
Is there any relationship between price and sales?
Is there any relationship between quality and sales?
Is there any relationship between location and sales?
Is there any relationship between X and Y?
Investigation of possible relationship between 2 variables:
 Tabular Methods
 Graphical Methods
 Numerical Methods
 Tabular Methods
The use of the Cross Tabs allows us to make a comment on the possible relationship between X and Y.
 Graphical Methods
The use of the scatter plot allows us to make a comment on the possible relationship between X and Y:
X and Y appear to be:
 Not related
 Related in a nonlinear fashion (Polynomial, Exponential, Log etc.)
 Related in a linear fashion :
 Is it a direct linear relationship?
 Is it an indirect linear relationship?
 Numerical Methods
Regression and correlation analysis will be used to make a comment on the possible relationship between X and Y. The main idea is to:
 Calculate and interpret the intercept (b_{0}) and the slope (b_{1}).
 Calculate and interpret the Coefficient of Determination (r^{2}).
 Calculate and interpret the Correlation Coefficient (r).
 Test the significance of relationship (from sample to Population).
 Conduct forecasting.
Steps in Regression Analysis:
 Step 1 Model Building – Purpose is to hypothesize a cause-and-effect relationship between the dependent variable (Y) and the independent variables (X_{1}, X_{2}, …).
Y = f(X_{1}, X_{2}, X_{3}, X_{4}, …) (Multivariate Model)
 Step 2 Specification Step – Purpose is to simplify the model to a “simple” “linear” model.
Number of independent variables:
 Simple model (only ONE independent variable)
Y = f(X) (Simple Model)
 Multivariate model (more than ONE independent variable)
Y = f(X_{1}, X_{2}, X_{3}, X_{4}, …)
Type of Functional relationship:
 Linear relationship
Y = b_{0} + b_{1}X
Intercept (b_{0}): At X = 0, the Y is EXPECTED (or estimated) to be b_{0}
Slope (b_{1}): For every additional unit increase in X, Y is estimated to increase (decrease if negative) by b_{1}
 Nonlinear relationship, e.g. exponential, cubic, log, etc.
Y = b_{0} + b_{1}X^{2}
Y = b_{0} + b_{1}X^{1/3} * X^{1/2}
 Step 3 Data collection step
 Time Series Data (Historical Data) – Collecting data across different time periods.
Elements  Variable 1  Variable 2 
Time  X  Y 
2008  
2009  
2010  
2011  
2012 
Elements  Variable 1  Variable 2 
Time  X  Y 
January  
February  
March  
April  
May 
 Cross-Sectional Data – Collecting data across different observations (elements)
Elements  Variable 1  Variable 2 
Countries  X  Y 
Afghanistan  
…  
Zimbabwe  
Elements  Variable 1  Variable 2 
Companies  X  Y 
IBM  
Microsoft  
 Step 4 Visualization Step – Purpose is to visualize (see) the pattern. Does there appear to be any type of relationship? Is the relationship nonlinear? If linear, does it appear to be a direct (positive) or an indirect (negative) relationship?
 Step 5 Estimation Step – Purpose is to estimate (and interpret) the intercept and the slope.
 Step 6 Forecasting – Purpose is to use the pattern to conduct forecasting.
Chapter 12 Regression and Correlation Analysis
 Develop the scatter diagram for the data set. Comment on the possible relationship between X and Y.
 Calculate the intercept (b_{0}) and the slope (b_{1}). Interpret your findings.
 For a given level of X, forecast the Y.
 Calculate the Coefficient of Determination (r^{2}) and interpret it.
 Calculate the Correlation Coefficient (r) and interpret it.
Measure  Population Parameter (true value, all elements)  Sample Statistic (estimated value, some elements) 
Average  μ  X Bar 
Standard Deviation  σ  S 
Proportion (%)  π  P Bar 
Intercept  β_{0}  b_{0} 
Slope  β_{1}  b_{1} 
Correlation Coefficient  ρ  r 
Part 1. Comment on the possible relationship between X and Y
Make a comment about the following:
The use of the scatter Plot allows us to make a comment on the possible relationship between X and Y:
X and Y appear to be:
 Not related
 Related in a nonlinear fashion (Polynomial, Exponential, Log etc.)
 Related in a linear fashion :
 Is it a direct linear relationship?
 Is it an indirect linear relationship?
Do you see any pattern? What does the pattern look like? Does it appear to be a linear or a nonlinear relationship? If linear, does it appear to be “direct” or “indirect” linear relationship?
It appears that there is a direct linear relationship between “# of TV ads” and “# of cars sold”.
Part 2. Interpretation of the intercept (b_{0}) and the slope (b_{1}):
\hat{Y} = b_{0} + b_{1}X
Intercept (b_{0}): At X = 0, the Y is EXPECTED (or estimated) to be b_{0}
Slope (b_{1}): For every additional unit increase in X, Y is estimated to increase (decrease if negative) by b_{1}
\hat{Y} = 10 + 5X
If we run NO TV ads, “# of cars sold” is expected to be 10 cars.
For every additional TV ad, “# of cars sold” is expected to increase by 5 cars.
Part 3. For a given level of X, forecast the Y.
\hat{Y} = b_{0} + b_{1}X
In a typical case (homework), the value of X will be provided. Simply insert the given value of X into the above equation and complete the calculation.
\hat{Y} = 10 + 5(5) = 35
If we run 5 TV ads, “# of cars sold” is expected to be 35.
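The interpretation and forecasting steps can be sketched in a few lines of Python. This is only an illustration: the function name `forecast` and the constants `B0` and `B1` are names chosen here, while the coefficients 10 and 5 come from the estimated equation above.

```python
# Coefficients taken from the estimated equation in the text: Y-hat = 10 + 5X.
B0 = 10  # intercept: expected cars sold when no TV ads are run
B1 = 5   # slope: expected change in cars sold per additional TV ad


def forecast(x, b0=B0, b1=B1):
    """Return the estimated Y for a given X using Y-hat = b0 + b1 * X."""
    return b0 + b1 * x


print(forecast(0))                # intercept: 10
print(forecast(3) - forecast(2))  # slope, one extra ad: 5
print(forecast(5))                # forecast for 5 TV ads: 35
```

Evaluating the equation at X = 0 recovers the intercept, and the difference between two forecasts one unit apart recovers the slope, which is exactly how the two coefficients are interpreted in words.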
Part 4. Coefficient of Determination (r^{2})
Calculate the Coefficient of Determination and interpret it.
The Coefficient of Determination (r^{2}) is a measure of the goodness of fit of a linear model to observed data. Its value is usually expressed as a percentage, and it varies between 0 and 1.00: 0 ≤ r^{2} ≤ 1.00
Overall, the larger the r^{2}, the better the fit.
For a complete interpretation, make the following three comments about r^{2}:
 Comment on the goodness of fit. (Tip: If r^{2} > 0.65 it is a good fit; if r^{2} > 0.80 it is an excellent fit.)
Example: For a sample of …. randomly selected ……, the linear model provides a … (good, excellent, etc.) … fit.
 What % of the variation in Y is EXPLAINED by the variation in X (the main factor): r^{2}
 What % of the variation in Y is EXPLAINED by the variation in other influencing factors: 1 − r^{2}
 For a sample of 5 randomly selected time periods, the linear model provides an excellent fit to the observed data.
 About 88% of the variation in “# of cars sold” is explained by “# of TV ads” (r^{2} ≈ 0.877 for the Reed Auto sample).
 About 12% of the variation in “# of cars sold” is explained by other influencing factors, e.g. management, price, …
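As a rough check of the r^{2} calculation, the sketch below computes it as 1 − SSE/SST for the Reed Auto sample (the five observations of TV ads and cars sold listed later in this document) using the fitted line Y-hat = 10 + 5X. The variable names are chosen here for illustration.

```python
# Reed Auto sample from this document: X = # of TV ads, Y = # of cars sold.
ads = [1, 3, 2, 1, 3]
cars = [14, 24, 18, 17, 27]
B0, B1 = 10, 5  # fitted line from the text: Y-hat = 10 + 5X

y_bar = sum(cars) / len(cars)
predicted = [B0 + B1 * x for x in ads]

# Total variation in Y around its mean (SST) and unexplained variation (SSE).
sst = sum((y - y_bar) ** 2 for y in cars)
sse = sum((y - p) ** 2 for y, p in zip(cars, predicted))

r_squared = 1 - sse / sst
print(round(r_squared, 3))      # share explained by X: 0.877
print(round(1 - r_squared, 3))  # share left to other factors: 0.123
```

With SST = 114 and SSE = 14, roughly 87.7% of the variation in cars sold is explained by the number of TV ads, which falls in the "excellent fit" range of the tip above.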
Part 5. Correlation Coefficient (r)
The Correlation Coefficient (r) is a measure of the strength as well as the direction of the linear relationship between X and Y. Its value varies between −1.00 and +1.00: −1.00 ≤ r ≤ 1.00
Overall, the larger the absolute value of r, the stronger the relationship.
Tips regarding the DIRECTION of the linear relationship:
If the sign of “r” is positive, use the adjective “direct”. If the sign of “r” is negative, use the adjective “indirect”.
Tips regarding the STRENGTH of the linear relationship:
Disregard the sign. If the absolute value of “r” is within one of the following ranges, use the suggested adjective to interpret your findings:
Absolute value of r  Adjective 
|r| > 0.85  Extremely Strong 
|r| > 0.70  Strong 
0.60 < |r| < 0.70  Semi-strong 
0.40 < |r| < 0.60  Medium 
0.30 < |r| < 0.40  Semi-weak 
|r| < 0.30  Weak 
|r| < 0.15  Extremely Weak 
For a sample of 5 randomly selected time periods, there is a direct, extremely strong linear relationship between “# of cars sold” and “# of TV ads”.
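The direction and strength tips can be turned into a small helper. The sketch below computes r for the Reed Auto sample listed later in this document and maps it to the adjectives in the table above. The function names are chosen here, and the handling of values that fall exactly on a boundary (for example |r| = 0.70) is an assumption, since the table does not say which side they belong to.

```python
import math

# Reed Auto sample from this document: X = # of TV ads, Y = # of cars sold.
ads = [1, 3, 2, 1, 3]
cars = [14, 24, 18, 17, 27]


def correlation(x, y):
    """Pearson correlation coefficient r for two equal-length lists."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


def describe(r):
    """Return (direction, strength) adjectives following the tips in the text."""
    direction = "direct" if r > 0 else "indirect"
    a = abs(r)
    # Boundary values are assigned to the weaker category (an assumption).
    if a > 0.85:
        strength = "extremely strong"
    elif a > 0.70:
        strength = "strong"
    elif a > 0.60:
        strength = "semi-strong"
    elif a > 0.40:
        strength = "medium"
    elif a > 0.30:
        strength = "semi-weak"
    elif a > 0.15:
        strength = "weak"
    else:
        strength = "extremely weak"
    return direction, strength


r = correlation(ads, cars)
print(round(r, 3))  # 0.937
print(describe(r))  # ('direct', 'extremely strong')
```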
Regression model:
Purpose: In regression analysis, a model is used to describe the relationship between a single dependent variable y and one or more independent variables x. Simple linear regression uses a single independent variable; multiple regression uses more than one.
Model Building: Either a simple or a multiple regression model is initially posed as a hypothesis concerning the relationship among the dependent and independent variables.
Example: As an illustration of regression analysis and the least squares method, suppose a university medical centre is investigating the relationship between stress and blood pressure. Assume that both a stress test score and a blood pressure reading have been recorded for a sample of 20 patients. The data are shown graphically in the figure below, called a scatter diagram. Values of the independent variable, stress test score, are given on the horizontal axis, and values of the dependent variable, blood pressure, are shown on the vertical axis. The line passing through the data points is the graph of the estimated regression equation: y = 42.3 + 0.49x. The parameter estimates, b0 = 42.3 and b1 = 0.49, were obtained using the least squares method.
Correlation:
Correlation and regression analysis are related in the sense that both deal with relationships among variables. The correlation coefficient is a measure of linear association between two variables. Values of the correlation coefficient are always between −1 and +1.
A correlation coefficient of +1 indicates that two variables are perfectly related in a positive linear sense, a correlation coefficient of −1 indicates that two variables are perfectly related in a negative linear sense, and a correlation coefficient of 0 indicates that there is no linear relationship between the two variables.
Neither regression nor correlation analyses can be interpreted as establishing cause-and-effect relationships. They can indicate only how or to what extent variables are associated with each other. The correlation coefficient measures only the degree of linear association between two variables. Any conclusions about a cause-and-effect relationship must be based on the judgment of the analyst.
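To make the three reference values of r concrete, here is a small sketch using made-up numbers (they are illustrative only and do not come from the text). The last case shows why r speaks only to linear association: Y depends exactly on X through a curve, yet r is 0.

```python
import math


def correlation(x, y):
    """Pearson correlation coefficient r for two equal-length lists."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


# Made-up illustrative values, not data from the text.
x = [1, 2, 3, 4]
r_up = correlation(x, [3, 5, 7, 9])    # points on the line y = 1 + 2x
r_down = correlation(x, [9, 7, 5, 3])  # points on the line y = 11 - 2x
r_curve = correlation([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])  # y = x squared

print(r_up)     # 1.0
print(r_down)   # -1.0
print(r_curve)  # 0.0
```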
Application of the Simple Linear Regression
Example 1: Simple regression line
Reed Auto periodically has a special weeklong sale. As part of the advertising campaign Reed runs one or more television commercials during the weekend preceding the sale.
Data from a sample of 5 previous sales are shown below.
X  Y 
# of TV Ads  # of cars sold 
1  14 
3  24 
2  18 
1  17 
3  27 
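Estimated Regression Equation: \hat{Y} = b_{0} + b_{1}X
Slope for the Estimated Regression Equation: b_{1} = \sum (x_{i} - \bar{x})(y_{i} - \bar{y}) / \sum (x_{i} - \bar{x})^{2}
y-Intercept for the Estimated Regression Equation: b_{0} = \bar{y} - b_{1}\bar{x}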
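A minimal sketch of the least squares calculation for the Reed Auto data follows. It uses the usual formulas (slope = sum of the cross-deviations divided by the sum of squared X deviations; intercept = mean of Y minus slope times mean of X). The function name `least_squares` is chosen here for illustration.

```python
# Reed Auto sample from the text: X = # of TV ads, Y = # of cars sold.
ads = [1, 3, 2, 1, 3]
cars = [14, 24, 18, 17, 27]


def least_squares(x, y):
    """Return (b0, b1) for the simple linear regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of cross-deviations and sum of squared deviations of x.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


b0, b1 = least_squares(ads, cars)
print(b0, b1)  # 10.0 5.0
```

The result reproduces the estimated equation used throughout the notes: 10 cars expected with no ads, and 5 more cars for each additional ad.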
Let’s forecast the # of auto sales if we run “5” TV ads.
Estimated Regression Equation
\hat{Y} = b_{0} + b_{1}X = 10 + 5(5) = 35
Example: Relationship between Total Cost and Production Volume
Below is the monthly data which depicts the relationship between Total Cost and Production Volume for the last 6 months.
#21  X  Y 
Observations  Production Volume  Total Cost 
Jan.  400  $4,000 
Feb.  450  $5,000 
March  550  $5,400 
April  600  $5,900 
May  700  $6,400 
June  750  $7,000 
Let’s answer the following questions:
 Develop the scatter diagram for the data set. Comment on the possible relationship between X and Y.
 Calculate the intercept (b_{0}) and the slope (b_{1}). Interpret your findings.
 Test of (b_{1}). Conduct a test of the significance of the linear relationship between X and Y. Interpret your findings.
 Calculate the Coefficient of Determination and interpret it.
 Calculate the Correlation Coefficient (r) and interpret it.
 Test of (r). Conduct a test of the significance of the strength of the linear relationship. Interpret your findings.
 For a given level of X, forecast the Y.
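The questions above can be worked through numerically for the Total Cost and Production Volume data. The sketch below is one possible approach, not a prescribed solution: it estimates b_{0} and b_{1} by least squares, computes r^{2} and r, and runs a two-tailed t test on the slope. The 5% significance level is an assumption (the text does not state one); 2.776 is the tabulated t value for a two-tailed 5% test with n − 2 = 4 degrees of freedom.

```python
import math

# Monthly data from the text: X = production volume, Y = total cost ($).
volume = [400, 450, 550, 600, 700, 750]
cost = [4000, 5000, 5400, 5900, 6400, 7000]

n = len(volume)
x_bar = sum(volume) / n
y_bar = sum(cost) / n
sxx = sum((x - x_bar) ** 2 for x in volume)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(volume, cost))
sst = sum((y - y_bar) ** 2 for y in cost)

# Least squares estimates of the slope and the intercept.
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

# Goodness of fit and correlation (r takes the sign of the slope).
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(volume, cost))
r_squared = 1 - sse / sst
r = math.copysign(math.sqrt(r_squared), b1)

# t test of the slope: H0 says the population slope is 0.
s = math.sqrt(sse / (n - 2))     # standard error of the estimate
t_stat = b1 / (s / math.sqrt(sxx))
T_CRITICAL = 2.776               # assumed alpha = 0.05, two-tailed, 4 df

print(round(b0, 2), round(b1, 2))  # about 1246.67 and 7.6
print(round(r_squared, 4))         # about 0.9587
print(round(r, 3))                 # about 0.979
print(round(t_stat, 2))            # about 9.63
print(abs(t_stat) > T_CRITICAL)    # True: reject H0 at the assumed 5% level
```

Read in the terms used earlier: each additional unit of production is estimated to add about $7.60 to total cost, about 96% of the variation in total cost is explained by production volume, and the relationship is direct and extremely strong.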