13 June 2008

Business Mathematics_1

Correlation and Regression

OUTLINE

Introduction

Scatter Plots

Correlation

Regression

OBJECTIVES

Draw a scatter plot for a set of ordered pairs.

Find the correlation coefficient.

Find the equation of the regression line.

Find the coefficient of determination.

Find the standard error of estimate.


INTRODUCTION

Every day we make personal and professional decisions that are based on predictions of future events.

To make these forecasts, we rely on the relationship between what is already known and what is to be estimated.

Regression and correlation analysis show us how to determine both the nature and the strength of a relationship between two variables.


SIGNIFICANCE OF THE STUDY OF CORRELATION

Many variables show some kind of relationship, for example price and supply, or income and expenditure. Correlation analysis expresses the degree of such a relationship in a single figure.

Once we know the relationship, we can estimate the value of one variable given the value of another.

Correlation analysis contributes to the understanding of economic behaviour. In business, it enables executives to estimate costs, prices, etc.


TYPES OF CORRELATION

Positive and negative

Simple, partial and multiple

Linear and non-linear


POSITIVE AND NEGATIVE CORRELATION

If two variables vary together in the same direction or in opposite directions, they are said to be correlated.

If Y consistently increases as X increases, X and Y are positively correlated.

If Y decreases as X increases, and Y increases as X decreases, X and Y are negatively correlated.

SIMPLE, PARTIAL AND MULTIPLE CORRELATION

When only two variables are studied, the correlation is simple correlation.

When more than two variables are studied, it is partial or multiple correlation.

In multiple correlation, the relationship between one variable and two or more other variables is studied simultaneously.

In partial correlation, more than two variables are involved, but we study the relationship between only two of them while keeping the others constant.

DEPENDENT & INDEPENDENT VARIABLES

The known variable is called the independent variable and the variable we are trying to predict is the dependent variable.

If the correlation is perfectly positive, all the points lie on a straight line rising from left to right, as shown in the figure; if the correlation is perfectly negative, they lie on a straight line falling from left to right, as shown in the figure.

EXAMPLE

Sales of major appliances vary with the new housing market. When new home sales are good, so are the sales of dishwashers, washing machines, dryers and refrigerators. A trade association compiled the following historical data (in thousands of units) on major appliance sales and housing starts.

In this case, the data points represent the relationship between the housing market and sales of household appliances. The relationship between X and Y is well described by a straight line.

The direction of the line can indicate whether the relationship is direct or inverse.

William C Andrews, an organizational behavior consultant for Victory Motorcycles, has designed a test to show the company’s supervisors the dangers of over-supervising their workers. A worker from the assembly line is given a series of complicated tasks to perform. During the worker’s performance, a supervisor constantly interrupts the worker to assist him or her in completing the tasks. Upon completing the tasks, the worker is given a psychological test designed to measure the worker’s hostility toward authority (a high score equals low hostility).

Eight different workers were assigned the tasks and then interrupted, for the purpose of instructional assistance, various numbers of times. Their corresponding scores on the hostility test are as follows. Predict the expected test score if a worker is interrupted 18 times.


How can we fit a line mathematically?

To a statistician, a line has a good fit if it minimizes the error between the estimated points on the line and the actual observed points that were used to draw it. This criterion is the method of least squares.


THE METHOD OF LEAST SQUARES

The least-squares line is the line drawn through the middle of a set of points in a scatter diagram such that the sum of the squares of the errors is a minimum. The estimating line is

$$\hat{Y} = a + bX$$

where $\hat{Y}$ denotes the estimated values of Y, i.e. the points that lie on the estimating line.

Slope of the best-fitting regression line:

$$b = \frac{\sum XY - n\bar{X}\bar{Y}}{\sum X^2 - n\bar{X}^2}$$

Y-intercept of the best-fitting regression line:

$$a = \bar{Y} - b\bar{X}$$

This equation is the regression equation of Y on X. It gives the most probable values of Y for given values of X.
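A minimal sketch of the slope and intercept formulas, b = (ΣXY − nX̄Ȳ)/(ΣX² − nX̄²) and a = Ȳ − bX̄, applied to the advertising data from the example at the end of this post. The function name `least_squares` is our own. The intercept comes out as 3.8713 rather than the post's 3.8719, apparently because the post rounds b to 0.1250 before computing a; the prediction at X = 60 is 11.37 either way.

```python
def least_squares(x, y):
    """Return (a, b) for the estimating line Y-hat = a + b*X, using
    b = (sum XY - n*Xbar*Ybar) / (sum X^2 - n*Xbar^2) and a = Ybar - b*Xbar."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    b = (sum_xy - n * x_bar * y_bar) / (sum_x2 - n * x_bar ** 2)
    a = y_bar - b * x_bar
    return a, b

# Advertising expenditure ('000 Rs.) and sales (lakh Rs.), 1996-2003,
# from the advertising example at the end of this post.
adv = [12, 15, 15, 23, 24, 38, 42, 48]
sales = [5.0, 5.6, 5.8, 7.0, 7.2, 8.8, 9.2, 9.5]

a, b = least_squares(adv, sales)
print(f"Y-hat = {a:.4f} + {b:.4f} X")        # Y-hat = 3.8713 + 0.1250 X
print(f"Y-hat at X = 60: {a + b * 60:.2f}")  # Y-hat at X = 60: 11.37
```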

The regression line of X on Y gives the probable values of X for given values of Y, say $X = a + bY$.

The regression equation of Y on X can also be represented by

$$Y - \bar{Y} = r\frac{\sigma_Y}{\sigma_X}(X - \bar{X})$$
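The form Y − Ȳ = r(σY/σX)(X − X̄) has slope b = r·σY/σX. The sketch below, again using the advertising data from the end of this post, checks numerically that this gives the same slope as the least-squares formula (written here in deviation form). The divisor used for the standard deviations cancels in the ratio, so n is assumed.

```python
import math

# Advertising expenditure ('000 Rs.) and sales (lakh Rs.) from the
# advertising example at the end of this post.
adv = [12, 15, 15, 23, 24, 38, 42, 48]
sales = [5.0, 5.6, 5.8, 7.0, 7.2, 8.8, 9.2, 9.5]

n = len(adv)
x_bar = sum(adv) / n
y_bar = sum(sales) / n
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(adv, sales))
s_xx = sum((x - x_bar) ** 2 for x in adv)
s_yy = sum((y - y_bar) ** 2 for y in sales)

b_least_squares = s_xy / s_xx        # least-squares slope, deviation form
r = s_xy / math.sqrt(s_xx * s_yy)    # correlation coefficient
sigma_x = math.sqrt(s_xx / n)        # standard deviations (divisor n)
sigma_y = math.sqrt(s_yy / n)
b_from_r = r * sigma_y / sigma_x     # slope written as r * sigma_Y / sigma_X

print(f"{b_least_squares:.6f} {b_from_r:.6f}")  # 0.125021 0.125021
```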

EXAMPLE

The general sales manager of Kiran Enterprises, an enterprise dealing in the sale of readymade men’s wear, is toying with the idea of increasing his sales to Rs. 80,000. On checking the records of sales during the last 10 years, it was found that the annual sale proceeds and advertisement expenditure were highly correlated, to the extent of 0.8. It was further noted that the annual average sale has been Rs. 45,000 and the annual average advertisement expenditure Rs. 30,000, with variances of 1600 in sales and 625 in advertisement expenditure, respectively.

In view of the above, how much expenditure on advertisement would you suggest the general sales manager of the enterprise incur to meet his sales target?

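X = advertisement expenditure

Y = sales

Since we want to estimate X for a given Y, we use the regression equation of X on Y:

$$X - \bar{X} = r\frac{\sigma_X}{\sigma_Y}(Y - \bar{Y})$$

When Y = 80,000,

X = 47,500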
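A short sketch of the X-on-Y calculation from summary statistics. It assumes a variance of 625 for advertisement expenditure and 1600 for sales (standard deviations 25 and 40), since those values reproduce the answer of 47,500; the helper name `x_on_y` is hypothetical.

```python
import math

def x_on_y(y, x_mean, y_mean, r, var_x, var_y):
    """Estimate X for a given Y from the regression equation of X on Y:
    X - Xbar = r * (sigma_X / sigma_Y) * (Y - Ybar)."""
    sigma_x = math.sqrt(var_x)
    sigma_y = math.sqrt(var_y)
    return x_mean + r * (sigma_x / sigma_y) * (y - y_mean)

# Kiran Enterprises: X = advertisement expenditure, Y = sales (Rs.).
# Assumption: var_x = 625 and var_y = 1600.
x = x_on_y(80_000, x_mean=30_000, y_mean=45_000, r=0.8, var_x=625, var_y=1600)
print(round(x))  # 47500
```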

EXAMPLE

Suppose BMC is interested in the relationship between the age of a garbage truck and the annual repair expense it should expect to incur. To determine this relationship, BMC has accumulated information on four of the trucks the city currently owns.

Organize the data as outlined in the table.

Use the equations for a and b to find the numerical constants of our regression line.

b= 0.75

a= 3.75

Y=3.75+0.75X

BMC can now estimate the annual repair expense given the age of a truck.

If a truck is 4 years old, use the equation Y = 3.75 + 0.75X to get the annual expense as follows:

Y = 3.75 + 0.75 × 4 = 6.75

Expected annual repair expense = 6750

HOW TO MEASURE THE RELIABILITY OF THE ESTIMATING EQUATION?

It is measured by the standard error of estimate.

The standard error of estimate measures the variability, or scatter, of the observed values around the regression line.

STANDARD ERROR

$$S_e = \sqrt{\frac{\sum (Y - \hat{Y})^2}{n - 2}}$$

For the truck repair example above:

Standard error Se = 0.866, i.e. 866/-

If the standard error is zero, we expect the estimating equation to be a perfect estimator of the dependent variable.

Assuming that the observed points are normally distributed around the regression line, we can expect

68% of the points to lie within ±1 Se of the regression line,

95.5% of the points within ±2 Se, and

99.7% of the points within ±3 Se.
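The truck data are not reproduced in this post, so this minimal sketch applies the definition Se = √(Σ(Y − Ŷ)²/(n − 2)) to the advertising data from the example at the end of the post instead. It fits the least-squares line, sums the squared residuals, and divides by n − 2. The function name is our own.

```python
import math

def standard_error(x, y):
    """Standard error of estimate Se = sqrt(sum (Y - Y-hat)^2 / (n - 2))
    for the least-squares line of Y on X."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))  # squared residuals
    return math.sqrt(sse / (n - 2))

adv = [12, 15, 15, 23, 24, 38, 42, 48]              # '000 Rs.
sales = [5.0, 5.6, 5.8, 7.0, 7.2, 8.8, 9.2, 9.5]    # lakh Rs.
se = standard_error(adv, sales)
print(f"Se = {se:.4f}")  # Se = 0.2917
```

Under the normality assumption above, roughly 68% of the sales figures would be expected within about ±0.29 lakh Rs. of the line.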

CORRELATION

Correlation is the statistical tool with the help of which the relationship between two or more variables is studied.

CORRELATION ANALYSIS

Correlation analysis is the statistical tool to describe the degree to which one variable is linearly related to another.

The coefficient of determination measures the extent, or strength, of the association that exists between two variables X and Y.

Sample coefficient of determination:

$$r^2 = \frac{a\sum Y + b\sum XY - n\bar{Y}^2}{\sum Y^2 - n\bar{Y}^2}$$

SAMPLE COEFFICIENT OF DETERMINATION

r² = 1 when there is perfect correlation

r² = 0 when there is no correlation

Note

r² measures only the strength of a linear relationship between two variables.
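A minimal sketch of the sample coefficient of determination, r² = (aΣY + bΣXY − nȲ²)/(ΣY² − nȲ²), i.e. explained variation divided by total variation. It is applied to the advertising data from the end of this post; the function name is our own.

```python
def coefficient_of_determination(x, y):
    """Sample r^2 = (a*sum(Y) + b*sum(XY) - n*Ybar^2) / (sum(Y^2) - n*Ybar^2)
    for the least-squares line of Y on X."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    sum_y2 = sum(yi * yi for yi in y)
    # least-squares constants of the estimating line
    b = (sum_xy - n * x_bar * y_bar) / (sum_x2 - n * x_bar ** 2)
    a = y_bar - b * x_bar
    explained = a * sum(y) + b * sum_xy - n * y_bar ** 2  # explained variation
    total = sum_y2 - n * y_bar ** 2                        # total variation
    return explained / total

adv = [12, 15, 15, 23, 24, 38, 42, 48]
sales = [5.0, 5.6, 5.8, 7.0, 7.2, 8.8, 9.2, 9.5]
r2 = coefficient_of_determination(adv, sales)
print(f"r^2 = {r2:.4f}")  # r^2 = 0.9759
```

Here about 97.6% of the variation in sales is explained by the regression line on advertisement expenditure.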

CORRELATION COEFFICIENT

The correlation coefficient computed from the sample data measures the strength and direction of a relationship between two variables.

Sample correlation coefficient, r.

Population correlation coefficient, ρ.

Range of values for the correlation coefficient: $-1 \le r \le 1$

Coefficient of correlation

$$r = \pm\sqrt{r^2}$$

When the slope b of the estimating equation is positive, r is the positive square root; if b is negative, r is the negative square root.

The sign of r indicates the direction of the relationship between two variables X & Y

KARL PEARSON’S CORRELATION COEFFICIENT

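This is also called the product moment coefficient of correlation.

Covariance of x and y is defined as

$$\operatorname{cov}(x, y) = \frac{1}{n}\sum (x - \bar{x})(y - \bar{y})$$

and Karl Pearson's correlation coefficient is

$$r = \frac{\operatorname{cov}(x, y)}{\sigma_x \sigma_y}$$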
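A minimal sketch of Karl Pearson's coefficient, r = cov(x, y)/(σx·σy), with covariance and standard deviations all taken with divisor n. It uses the advertising data from the end of this post, and squaring the result recovers the coefficient of determination of about 0.976. The function name is our own.

```python
import math

def pearson_r(x, y):
    """Karl Pearson's r = cov(x, y) / (sigma_x * sigma_y), divisor n throughout."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    cov = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / n
    sigma_x = math.sqrt(sum((xi - x_bar) ** 2 for xi in x) / n)
    sigma_y = math.sqrt(sum((yi - y_bar) ** 2 for yi in y) / n)
    return cov / (sigma_x * sigma_y)

adv = [12, 15, 15, 23, 24, 38, 42, 48]
sales = [5.0, 5.6, 5.8, 7.0, 7.2, 8.8, 9.2, 9.5]
r = pearson_r(adv, sales)
print(f"r = {r:.4f}")  # r = 0.9879
```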


What does r = 0.6 mean?

Suppose the amount a family spends on movies is related to family income, and r = 0.6. Then r² = 0.36, so 36% of the variation in the amount spent on movies is explained by the regression line.

At first sight, a correlation of 0.6 between the amount spent on movies and family income seems fairly strong. But r² = 0.36 means that family income explains only 36% of the variation in the amount of money families spend on movies.

If you designed your marketing strategy to appeal only to families with high incomes, you’d miss a lot of potential customers.

Instead, try to find out what else is influencing family movie decisions.

RANK CORRELATION COEFFICIENT

Rank correlation is used when a quantitative measure of a certain factor cannot be fixed, but the individuals in the group can be arranged in order, giving each individual a number that indicates his rank in the group.

The rank correlation coefficient is applied to a set of ordinal rank numbers, with 1 for the individual ranked first in quantity or quality, and so on, up to N for the individual ranked last. R is then defined as

$$R = 1 - \frac{6\sum D^2}{N(N^2 - 1)}$$

where D is the difference between the two ranks assigned to each individual.
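A minimal sketch of the rank correlation formula R = 1 − 6ΣD²/(N(N² − 1)) for two sets of untied ranks. The ranks below are hypothetical, since the managers' rankings are not reproduced in this post.

```python
def spearman_rank_correlation(rank_1, rank_2):
    """R = 1 - 6 * sum(D^2) / (N * (N^2 - 1)), where D is the difference
    between the two ranks given to each individual (no tied ranks)."""
    n = len(rank_1)
    sum_d2 = sum((r1 - r2) ** 2 for r1, r2 in zip(rank_1, rank_2))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

# Hypothetical ranks of five individuals given by two judges.
judge_a = [1, 2, 3, 4, 5]
judge_b = [2, 1, 4, 3, 5]
print(f"{spearman_rank_correlation(judge_a, judge_b):.3f}")  # 0.800
```

Identical rankings give R = 1 and completely reversed rankings give R = −1.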

EXAMPLE

Two managers are asked to rank a group of employees in order of potential for eventually becoming top managers. The rankings are as follows.

Compute the coefficient of rank correlation and comment on the value.

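R = 1 - 0.085 = 0.915

This high positive value indicates a high degree of agreement between the two managers' rankings.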

WHERE RANKS ARE NOT GIVEN

Assign ranks to the observations first, then apply the same formula.

EQUAL RANKS OR TIE IN RANKS

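Assign each tied individual or entry the average of the ranks they would otherwise occupy.

Thus, if two individuals are tied for 5th place, give both the rank (5 + 6)/2 = 5.5.

If m is the number of items whose ranks are common (tied), then R is

$$R = 1 - \frac{6\left[\sum D^2 + \sum \frac{m^3 - m}{12}\right]}{N(N^2 - 1)}$$

where the correction term (m³ - m)/12 is added once for every group of tied ranks.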
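The sketch below covers both cases above: it assigns ranks to raw values (largest value = rank 1), gives tied values the average of the ranks they occupy, and adds the correction (m³ − m)/12 for every group of m tied values in either series. It is a minimal illustration under those conventions; the function names and the data are hypothetical.

```python
from collections import Counter

def average_ranks(values):
    """Rank values in descending order (largest = rank 1); tied values share
    the average of the ranks they occupy, e.g. a tie at 5th place -> 5.5."""
    ordered = sorted(values, reverse=True)
    ranks = {}
    for v in set(values):
        first = ordered.index(v) + 1           # first position of v (1-based)
        count = ordered.count(v)               # size of the tie group
        ranks[v] = first + (count - 1) / 2     # average of first..first+count-1
    return [ranks[v] for v in values]

def spearman_with_ties(x, y):
    """Rank correlation with the (m^3 - m)/12 correction for tied ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    correction = sum((m ** 3 - m) / 12
                     for series in (x, y)
                     for m in Counter(series).values() if m > 1)
    return 1 - 6 * (sum_d2 + correction) / (n * (n ** 2 - 1))

# Hypothetical scores of four candidates in two tests; two tie in the first test.
test_1 = [10, 20, 20, 30]
test_2 = [1, 2, 3, 4]
print(average_ranks(test_1))                         # [4.0, 2.5, 2.5, 1.0]
print(f"{spearman_with_ties(test_1, test_2):.3f}")   # 0.900
```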

MULTIPLE REGRESSION AND CORRELATION ANALYSIS

We can use more than one independent variable to estimate the dependent variable and thus attempt to increase the accuracy of the estimate.

This process is called multiple regression analysis.

EXAMPLE

Consider a real estate agent who wishes to relate the number of houses the firm sells in a month to the amount of her monthly advertising.

Certainly we can find a simple estimating equation that relates these two variables.

Could we also improve the accuracy of our equation by including the number of salespeople she employs each month?

Then we can use both the number of sales agents and the advertising expenditure to predict monthly house sales.


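Multiple regression equation

$$\hat{Y} = a + bX_1 + cX_2$$

To obtain a, b and c, solve the normal equations

$$\sum Y = na + b\sum X_1 + c\sum X_2$$

$$\sum X_1Y = a\sum X_1 + b\sum X_1^2 + c\sum X_1X_2$$

$$\sum X_2Y = a\sum X_2 + b\sum X_1X_2 + c\sum X_2^2$$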
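A minimal sketch that forms the three normal equations for Ŷ = a + bX₁ + cX₂ and solves them by Cramer's rule (one of several ways to solve a 3 × 3 system). In the real estate example X₁ could be advertising and X₂ the number of sales agents. The data below are hypothetical, generated exactly from Y = 1 + 2X₁ + 3X₂ so that the expected solution is known.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def fit_two_variable_regression(x1, x2, y):
    """Solve the normal equations for Y = a + b*X1 + c*X2 by Cramer's rule."""
    n = len(y)
    s1, s2, sy = sum(x1), sum(x2), sum(y)
    s11 = sum(u * u for u in x1)
    s22 = sum(v * v for v in x2)
    s12 = sum(u * v for u, v in zip(x1, x2))
    s1y = sum(u * w for u, w in zip(x1, y))
    s2y = sum(v * w for v, w in zip(x2, y))
    coeffs = [[n, s1, s2],        # sum Y    = n a    + b sum X1    + c sum X2
              [s1, s11, s12],     # sum X1 Y = a sum X1 + b sum X1^2 + c sum X1 X2
              [s2, s12, s22]]     # sum X2 Y = a sum X2 + b sum X1 X2 + c sum X2^2
    rhs = [sy, s1y, s2y]
    d = det3(coeffs)
    solution = []
    for col in range(3):
        # replace column `col` with the right-hand side
        m = [row[:col] + [rhs[i]] + row[col + 1:] for i, row in enumerate(coeffs)]
        solution.append(det3(m) / d)
    return tuple(solution)  # (a, b, c)

# Hypothetical data generated exactly from Y = 1 + 2*X1 + 3*X2.
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 6]
y = [1 + 2 * u + 3 * v for u, v in zip(x1, x2)]
a, b, c = fit_two_variable_regression(x1, x2, y)
print(a, b, c)  # 1.0 2.0 3.0
```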

EXAMPLE

In trying to evaluate the effectiveness of its advertising campaign, a firm compiled the following information:

Year: 1996 1997 1998 1999 2000 2001 2002 2003

Adv. expenditure ('000 Rs.): 12 15 15 23 24 38 42 48

Sales (lakh Rs.): 5.0 5.6 5.8 7.0 7.2 8.8 9.2 9.5

Estimate the probable sales when advertisement expenditure is Rs. 60 thousand.

Y = 3.8719 + 0.1250X

When X = 60,

Y = 3.8719 + 0.1250 × 60 = 11.37

That is, the probable sales are about Rs. 11.37 lakh.
