13 June 2008
Business Mathematics_1
Correlation and Regression
OUTLINE
Introduction
Scatter Plots
Correlation
Regression
OBJECTIVES
Draw a scatter plot for a set of ordered pairs.
Find the correlation coefficient.
Find the equation of the regression line.
Find the coefficient of determination.
Find the standard error of estimate.
INTRODUCTION
Every day we take personal and professional decisions that are based on predictions of future events.
To make these forecasts, we rely on the relationship between what is already known and what is to be estimated.
Regression and correlation analysis show us how to determine both the nature and the strength of a relationship between two variables.
SIGNIFICANCE OF THE STUDY OF CORRELATION
Most variables show some kind of relationship, for example between price and supply or between income and expenditure. Correlation analysis expresses the degree of relationship in a single figure.
Once we know the relationship we can estimate the value of one variable given the value of another.
Correlation analysis contributes to the understanding of economic behaviour. In business, correlation analysis enables the executive to estimate costs, prices, etc.
TYPES OF CORRELATION
Positive and negative
Simple, partial and multiple
Linear and non-linear
POSITIVE AND NEGATIVE CORRELATION
If two variables vary together in the same direction or in opposite directions, they are said to be correlated.
If Y increases consistently as X increases, X and Y are positively correlated.
If Y decreases as X increases, and Y increases as X decreases, X and Y are negatively correlated.
SIMPLE, PARTIAL AND MULTIPLE CORRELATION
When only two variables are studied, the correlation is simple.
When three or more variables are studied, the correlation is partial or multiple.
In multiple correlation, three or more variables are studied simultaneously.
In partial correlation, more than two variables are present, but we consider the relationship between only two of them while keeping the others constant.
DEPENDENT & INDEPENDENT VARIABLES
The known variable is called the independent variable and the variable we are trying to predict is the dependent variable.
If the correlation is perfectly positive, all the points lie on a straight line rising from left to right; if it is perfectly negative, they lie on a straight line falling from left to right.
EXAMPLE
Sales of major appliances vary with the new housing market. When new home sales are good, so are the sales of dishwashers, washing machines, dryers and refrigerators. A trade association compiled the following historical data (in thousands of units) on major appliance sales and housing starts.
The direction of the line can indicate whether the relationship is direct or inverse.
How can we fit a line mathematically?
To a statistician, the line has a good fit if it minimizes the error between the estimated points on the line and the actual observed points that were used to draw it. This is the method of least squares.
THE METHOD OF LEAST SQUARES
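The method of least squares gives the equation of the line drawn through the middle of a set of points in a scatter diagram such that the sum of the squares of the errors is a minimum. The estimating line is written $\hat{Y} = a + bX$, where $\hat{Y}$ denotes the points that lie on the estimating line.
Slope of the best-fitting regression line: $b = \dfrac{\sum XY - n\bar{X}\bar{Y}}{\sum X^2 - n\bar{X}^2}$
Y-intercept of the best-fitting regression line: $a = \bar{Y} - b\bar{X}$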
The given equation is regression equation of Y on X. It gives most probable values of Y for given values of X.
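The regression line of X on Y gives the most probable values of X for given values of Y, say $X = a + bY$ (with constants a and b different from those of the line of Y on X).
The regression equation of Y on X can also be represented by $Y - \bar{Y} = r\,\dfrac{\sigma_Y}{\sigma_X}\,(X - \bar{X})$.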
The general sales manager of Kiran Enterprises, an enterprise dealing in the sale of ready-made men's wear, is toying with the idea of increasing his sales to Rs. 80,000. On checking the records of sales during the last 10 years, it was found that the annual sale proceeds and advertisement expenditure were highly correlated, to the extent of 0.8. It was further noted that the annual average sale has been Rs. 45,000 and the annual average advertisement expenditure Rs. 30,000, with variances of Rs. 1,600 and Rs. 625 in sales and advertisement expenditure respectively.
In view of the above, how much expenditure on advertisement would you suggest the general sales manager of the enterprise incur to meet his sales target?
X: advertisement expenditure
Y: sales
Regression line of X on Y: $X - \bar{X} = r\,\dfrac{\sigma_X}{\sigma_Y}\,(Y - \bar{Y})$, that is, $X - 30{,}000 = 0.8 \times \dfrac{25}{40}\,(Y - 45{,}000)$
When Y = 80,000
X = 30,000 + 0.5 × 35,000 = 47,500
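The arithmetic behind this answer can be sketched in Python. The sketch assumes the two variances are 1,600 for sales and 625 for advertisement expenditure, which is the reading that reproduces the stated result; the function and parameter names are illustrative only.

```python
import math

def advertising_for_target(target_sales, mean_sales, mean_adv, var_sales, var_adv, r):
    """Regression of X (advertising) on Y (sales):
    X - mean_adv = r * (sd_adv / sd_sales) * (Y - mean_sales)."""
    slope_x_on_y = r * math.sqrt(var_adv) / math.sqrt(var_sales)
    return mean_adv + slope_x_on_y * (target_sales - mean_sales)

required = advertising_for_target(
    target_sales=80_000,
    mean_sales=45_000,
    mean_adv=30_000,
    var_sales=1_600,  # assumed: variance of sales
    var_adv=625,      # assumed: variance of advertisement expenditure
    r=0.8,
)
print(round(required))  # 47500
```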
EXAMPLE
Suppose BMC is interested in the relationship between the age of garbage truck and the annual repair expense they should expect to incur. In order to determine this relationship, BMC has accumulated information concerning four of the trucks the city currently owns.
Organize the data as outlined in table
Use the equations for a and b to find the numerical constants for our regression line.
a = 3.75, b = 0.75
Y = 3.75 + 0.75X
BMC can estimate the annual repair expense given the age of truck.
If the truck is 4 years old, use the equation Y = 3.75 + 0.75X to get the annual expense as follows:
Y = 3.75 + 0.75 × 4
= 6.75
Expected annual repair expense = 6,750
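The constants of this line can be checked with a short least-squares sketch. The data table is not reproduced in this post, so the four (age, expense) pairs below are an assumption, chosen because they give a = 3.75 and b = 0.75; `fit_line` is an illustrative name.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least-squares line Y = a + b*X."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    # Slope: (sum XY - n*Xbar*Ybar) / (sum X^2 - n*Xbar^2)
    b = (sum_xy - n * mean_x * mean_y) / (sum_xx - n * mean_x ** 2)
    # Intercept: Ybar - b*Xbar
    a = mean_y - b * mean_x
    return a, b

# Assumed data: the original table is not available in this post.
ages = [5, 3, 3, 1]       # age of truck in years
expenses = [7, 7, 6, 4]   # annual repair expense, in thousands

a, b = fit_line(ages, expenses)
estimate = a + b * 4      # expense for a 4-year-old truck
print(a, b, estimate)     # 3.75 0.75 6.75
```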
The reliability of the estimating equation is measured by the standard error of estimate.
It measures the variability, or scatter of the observed values around the regression line.
STANDARD ERROR
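$S_e = \sqrt{\dfrac{\sum (Y - \hat{Y})^2}{n - 2}}$
Standard error = 0.866, i.e. Rs. 866/-
If the standard error is zero, we expect the estimating equation to be a perfect estimator of the dependent variable.
Assuming the observed points are normally distributed around the regression line, about 68% of the points lie within ±1 $S_e$, 95.5% within ±2 $S_e$, and 99.7% within ±3 $S_e$.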
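As a rough check of the 0.866 figure, the standard error of estimate is the square root of the sum of squared errors divided by n − 2. The sketch below applies that to the line Y = 3.75 + 0.75X; the four data points are an assumption (the table is not reproduced in this post) and the function name is illustrative.

```python
import math

def standard_error(xs, ys, a, b):
    """Se = sqrt( sum((Y - Yhat)^2) / (n - 2) ) for the line Yhat = a + b*X."""
    squared_errors = [(y - (a + b * x)) ** 2 for x, y in zip(xs, ys)]
    return math.sqrt(sum(squared_errors) / (len(xs) - 2))

# Assumed data, consistent with a = 3.75 and b = 0.75.
ages = [5, 3, 3, 1]       # age of truck in years
expenses = [7, 7, 6, 4]   # annual repair expense, in thousands

se = standard_error(ages, expenses, a=3.75, b=0.75)
print(round(se, 3))  # 0.866
```

The divisor is n − 2 rather than n because two constants, a and b, were estimated from the same data.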
CORRELATION
The statistical tool with the help of which the relationship between two or more variables is studied is called correlation.
CORRELATION ANALYSIS
Correlation analysis is the statistical tool to describe the degree to which one variable is linearly related to another.
The coefficient of determination measures the extent, or strength, of the association that exists between two variables X and Y.
The sample coefficient of determination is denoted $r^2$.
SAMPLE COEFFICIENT OF DETERMINATION
$r^2 = 1 - \dfrac{\sum (Y - \hat{Y})^2}{\sum (Y - \bar{Y})^2}$
$r^2 = 1$ when there is perfect correlation
$r^2 = 0$ when there is no correlation
Note
$r^2$ measures only the strength of a linear relationship between two variables.
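One common way to compute the sample coefficient of determination is as 1 minus the ratio of unexplained variation to total variation. A minimal sketch follows, reusing the truck line Y = 3.75 + 0.75X with assumed data points (the original table is not available in this post); the function name is illustrative.

```python
def coefficient_of_determination(xs, ys, a, b):
    """r^2 = 1 - sum((Y - Yhat)^2) / sum((Y - Ybar)^2) for the line Yhat = a + b*X."""
    mean_y = sum(ys) / len(ys)
    unexplained = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    total = sum((y - mean_y) ** 2 for y in ys)
    return 1 - unexplained / total

# Assumed data, consistent with a = 3.75 and b = 0.75.
ages = [5, 3, 3, 1]
expenses = [7, 7, 6, 4]

r_squared = coefficient_of_determination(ages, expenses, a=3.75, b=0.75)
print(r_squared)  # 0.75
```

With these assumed points, the regression line accounts for 75% of the variation in repair expense.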
CORRELATION COEFFICIENT
The correlation coefficient computed from the sample data measures the strength and direction of a relationship between two variables.
Sample correlation coefficient, r.
Population correlation coefficient, $\rho$.
Coefficient of correlation
$r = \pm\sqrt{r^2}$
When the slope b of the estimating equation is positive, r is the positive square root; if b is negative, r is the negative square root.
The sign of r indicates the direction of the relationship between two variables X & Y
KARL PEARSON’S CORRELATION COEFFICIENT
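This is also called the product-moment coefficient of correlation.
$r = \dfrac{\operatorname{cov}(x, y)}{\sigma_x \sigma_y}$
Covariance of x and y is defined as $\operatorname{cov}(x, y) = \dfrac{1}{n}\sum (x - \bar{x})(y - \bar{y})$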
What does r=0.6 mean?
r = 0.6 → $r^2$ = 0.36
→ 36% of the variation in the amount spent on movies is explained by the regression line.
r = 0.6 → the amount spent on movies correlates 0.6 with family income, which seems like a fairly strong correlation. But $r^2$ = 0.36: family income explains only 36% of the variation in the amount of money families spend on movies.
If you designed your marketing strategy to appeal only to families with high incomes, you’d miss a lot of potential customers.
Instead try to find what else is influencing family movie decisions.
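To see the gap between r and r² on actual numbers, Karl Pearson's coefficient can be computed as the covariance divided by the product of the two standard deviations. The sketch below uses five made-up (x, y) pairs, not data from this post, and an illustrative function name.

```python
import math

def pearson_r(xs, ys):
    """r = cov(x, y) / (sd_x * sd_y), using divisor n throughout."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return cov / (sd_x * sd_y)

# Hypothetical data for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = pearson_r(x, y)
print(round(r, 3), round(r * r, 2))  # 0.775 0.6
```

Here r is about 0.775, yet only 60% of the variation in y is explained, which mirrors the point made above for r = 0.6.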
RANK CORRELATION COEFFICIENT
Rank correlation is used when a quantitative measure of a certain factor cannot be fixed, but the individuals in the group can be arranged in order, thereby obtaining for each individual a number indicating his rank in the group.
The rank correlation coefficient is applied to a set of ordinal rank numbers, with 1 for the individual ranked first in quantity or quality, and so on, up to N for the one ranked last. R can then be defined as
$R = 1 - \dfrac{6\sum D^2}{N(N^2 - 1)}$
where D is the difference between the two ranks given to the same individual.
EXAMPLE
Two managers are asked to rank a group of employees in order of potential for eventually becoming top managers. The rankings are as follows.
Compute the coefficient of rank correlation and comment on the value.
R = 0.915, which indicates close agreement between the two managers' rankings.
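The managers' ranking table is not reproduced in this post, so the value 0.915 cannot be recomputed here. As a sketch of the calculation, the rank correlation formula R = 1 − 6ΣD²/(N(N² − 1)) is applied below to two hypothetical rankings of five employees; the ranks and the function name are made up for illustration.

```python
def rank_correlation(ranks_a, ranks_b):
    """Spearman's R = 1 - 6*sum(D^2) / (N*(N^2 - 1)); valid when there are no tied ranks."""
    n = len(ranks_a)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

# Hypothetical rankings of five employees by two managers.
manager_1 = [1, 2, 3, 4, 5]
manager_2 = [2, 1, 3, 5, 4]

R = rank_correlation(manager_1, manager_2)
print(round(R, 3))  # 0.8
```

Identical rankings give R = 1 and exactly reversed rankings give R = −1.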
WHERE RANKS ARE NOT GIVEN
Assign ranks. Then apply the same formula
EQUAL RANKS OR TIE IN RANKS
Assign each individual or entry an average rank.
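Thus if two individuals are ranked equal at 5th place, give the rank (5+6)/2 = 5.5 to both.
If m is the number of items whose ranks are common, then R is
$R = 1 - \dfrac{6\left[\sum D^2 + \frac{1}{12}(m^3 - m) + \cdots\right]}{N(N^2 - 1)}$
with one correction term $\frac{1}{12}(m^3 - m)$ added for each group of tied ranks.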
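A minimal sketch of the tie handling, assuming the usual correction in which (m³ − m)/12 is added to ΣD² for every group of m tied items before applying the rank correlation formula. The scores and function names are hypothetical.

```python
def average_ranks(values):
    """Rank from largest (rank 1) to smallest; tied values share the average of their positions."""
    ordered = sorted(values, reverse=True)
    rank_of = {}
    for v in set(values):
        first = ordered.index(v) + 1        # first position occupied by v
        m = ordered.count(v)                # number of items sharing this value
        rank_of[v] = first + (m - 1) / 2    # average of positions first .. first+m-1
    return [rank_of[v] for v in values]

def tie_corrected_rank_correlation(xs, ys):
    """R = 1 - 6*(sum(D^2) + sum((m^3 - m)/12 over tie groups)) / (N*(N^2 - 1))."""
    n = len(xs)
    rx, ry = average_ranks(xs), average_ranks(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    correction = 0.0
    for values in (xs, ys):
        for v in set(values):
            m = values.count(v)
            correction += (m ** 3 - m) / 12  # contributes 0 when m == 1
    return 1 - 6 * (sum_d2 + correction) / (n * (n ** 2 - 1))

# Hypothetical scores; two entries in xs tie at 60.
xs = [50, 60, 60, 70, 80]
ys = [55, 65, 60, 75, 85]

print(average_ranks(xs))  # [5.0, 3.5, 3.5, 2.0, 1.0]
R_tied = tie_corrected_rank_correlation(xs, ys)
print(round(R_tied, 3))   # 0.95
```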
MULTIPLE REGRESSION AND CORRELATION ANALYSIS
We can use more than one independent variable to estimate the dependent variable and thus attempt to increase the accuracy of the estimate.
This process is called multiple regression analysis
EXAMPLE
Consider the real estate agent who wishes to relate the number of houses the firm sells in a month to the amount of her monthly advertising.
Certainly we can find a simple estimating equation that relates these two variables.
Could we also improve the accuracy of our equation by including the number of salespeople she employs each month?
Then we can use number of sales agents and the advertising expenditures to predict monthly house sales.
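Multiple regression equation: $\hat{Y} = a + bX_1 + cX_2$
To obtain a, b and c, solve the normal equations
$\sum Y = na + b\sum X_1 + c\sum X_2$
$\sum X_1 Y = a\sum X_1 + b\sum X_1^2 + c\sum X_1 X_2$
$\sum X_2 Y = a\sum X_2 + b\sum X_1 X_2 + c\sum X_2^2$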
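One way to solve the three normal equations for a plane Y = a + b·X1 + c·X2 is Cramer's rule, sketched below in plain Python. The data are made up so that they lie exactly on Y = 1 + 2·X1 + 3·X2, which makes the result easy to check; the variable and function names are illustrative.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def fit_plane(x1, x2, y):
    """Solve the normal equations for Y = a + b*X1 + c*X2 by Cramer's rule."""
    n = len(y)
    s1, s2, sy = sum(x1), sum(x2), sum(y)
    s11 = sum(u * u for u in x1)
    s22 = sum(v * v for v in x2)
    s12 = sum(u * v for u, v in zip(x1, x2))
    s1y = sum(u * w for u, w in zip(x1, y))
    s2y = sum(v * w for v, w in zip(x2, y))
    lhs = [[n, s1, s2], [s1, s11, s12], [s2, s12, s22]]
    rhs = [sy, s1y, s2y]
    d = det3(lhs)
    coeffs = []
    for col in range(3):
        replaced = [row[:] for row in lhs]
        for i in range(3):
            replaced[i][col] = rhs[i]   # swap one column for the right-hand side
        coeffs.append(det3(replaced) / d)
    return tuple(coeffs)

# Hypothetical data lying exactly on Y = 1 + 2*X1 + 3*X2.
advertising = [1, 2, 3, 4, 5]
salespeople = [2, 1, 4, 3, 5]
houses_sold = [9, 8, 19, 18, 26]

a, b, c = fit_plane(advertising, salespeople, houses_sold)
print(a, b, c)  # 1.0 2.0 3.0
```

Cramer's rule is convenient for two independent variables; with more variables, a linear-algebra routine would be the more practical choice.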
In trying to evaluate the effectiveness of its advertising campaign, a firm compiled the following information:
Year                          1996  1997  1998  1999  2000  2001  2002  2003
Adv. expenditure ('000 Rs.)     12    15    15    23    24    38    42    48
Sales (lakh Rs.)               5.0   5.6   5.8   7.0   7.2   8.8   9.2   9.5
Estimate the probable sales when advertisement expenditure is Rs. 60 thousand.
Y = 3.8719 + 0.1250X
When X = 60
Y = 11.37 (lakh Rs.)
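The fitted line can be reproduced from the table with the least-squares formulas b = (ΣXY − n·X̄·Ȳ)/(ΣX² − n·X̄²) and a = Ȳ − b·X̄. This is a sketch with illustrative variable names; carrying b at full precision gives an intercept of about 3.8713 rather than 3.8719, and the forecast still rounds to 11.37.

```python
# Data from the table: advertisement expenditure ('000 Rs.) and sales (lakh Rs.).
adv = [12, 15, 15, 23, 24, 38, 42, 48]
sales = [5.0, 5.6, 5.8, 7.0, 7.2, 8.8, 9.2, 9.5]

n = len(adv)
mean_x = sum(adv) / n
mean_y = sum(sales) / n
sum_xy = sum(x * y for x, y in zip(adv, sales))
sum_xx = sum(x * x for x in adv)

b = (sum_xy - n * mean_x * mean_y) / (sum_xx - n * mean_x ** 2)  # slope
a = mean_y - b * mean_x                                          # intercept
forecast = a + b * 60   # probable sales when expenditure is Rs. 60 thousand

print(round(b, 4), round(a, 4), round(forecast, 2))  # about 0.125, 3.8713, 11.37
```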