
Reader author: Peter Stikker For: INHOLLAND IBMS students Edition 2.

Preface
This reader was written for students of the department International Business and Management Studies (IBMS) of the professional university INHOLLAND. The reader is adapted from the one used by the Dutch department of Marketing of the same university, but has been extensively updated and modified. For most chapters the general structure is to first explain what will be done, then how this can be done manually and finally how SPSS can be used to do the work. I found it important to still show the manual calculations for a better understanding (and perhaps appreciation) of what the software tool SPSS does for you. Hopefully you will find this reader clear, informative and, most important, enjoyable.

Kind regards,
Peter Stikker

What's new in the 2nd edition?


Content-wise nothing has changed. The material is the same as that used in the first edition. What did change is the following:

1. Structure
Rather than showing manual calculations directly followed by SPSS, this 2nd edition first shows all manual calculations and then all SPSS content.

2. SPSS introduction
The SPSS introduction has now become a separate document.

3. Minor errors
Some small mistakes and errors have been adjusted.

And 2.1
- Some adjustments for SPSS 15 rather than 13
- Some other small adjustments

And 2.2
SPSS has recently announced version 18. I have, however, updated the reader with version 17. Note that the name has also changed to PASW Statistics instead of SPSS Statistics. PASW is the portfolio name for the products of SPSS Inc. It is short for Predictive Analytics SoftWare and is pronounced letter by letter.
- Screenshots replaced with SPSS 17 versions
- Videos now on YouTube (go to http://webpages.inholland.nl/Employees/Peter.Stikker/ and click on Material)
- Some other small adjustments

Index
Preface
What's new in the 2nd edition?
Index
How to work with this reader
Introduction
Trend
Part 1 Basics: Manual calculations
1 Regression
   1.1 Simple linear regression equation
   1.2 Forecasting
2 How good is the best?
   2.1 The covariance
   2.2 Standard deviation
   2.3 Correlation
   2.4 Determination coefficient
3 Seasonal trend
   3.1 Moving averages
   3.2 Seasonal correction
4 Summary of Part 1
Part 2: Using SPSS
5 Linear-Regression Equation
6 Simple non-linear regression
   6.1 Determining the best model
   6.2 Forecasting non-seasonal data with SPSS
7 Confidence intervals
8 Seasonal data
   8.1 Moving Averages
   8.2 Trendline for seasonal data
   8.3 SPSS Seasonal correction
9 Multiple regression (optional)
10 Some final comments
Exercises
Appendix
1 Additional information on R-square and F-ratio
2 Using a calculator
   2.1 Casio fx-82 ES / 83ES / 85 ES / 300 ES / 350 ES
   2.2 TI-83 / PLUS / PLUS Silver edition
3 Bibliography
4 Essential Formulas


How to work with this reader


SPSS
This reader assumes you have SPSS 12 available. You can also work with SPSS 13 and perhaps higher versions, although these have not been tested. If you are entirely unfamiliar with SPSS you can use the quick introduction to familiarize yourself.

Calculations
In the manual calculations, the outcome is often rounded to two decimals. However, if the outcome is used in another calculation the non-rounded value is used. So if you are going to check the calculations, please keep this in mind.

Calculator
In appendix 2 a brief description is given of how to use a Casio natural display calculator and a Texas Instruments graphical calculator to do some calculations for you (determination coefficient and regression equation).

Symbols
In forecasting various symbols are used, and unfortunately there is not always a standard symbol. A predicted value, for example, is written differently in different texts. This reader will use y for real data and Y for predicted values.

Text
In this reader you will find the following different types of text:

Questions, exercises and assignment
- Questions: Within the text of a chapter sometimes an exercise is given. These are called Question, followed by a number, and are shown in a table that is slightly less wide than the page.
- End of chapter exercise: Some chapters have additional exercises at the end.
- Assignment: Throughout the text special exercises are given to complete the assignment. The assignment is basically to use the theory learned and apply it to the data of a company of interest. These are called Assignment exercise, followed by a number.

Others
In some cases a box with blue text is shown. This indicates additional or historical information.

Type of information
You will also come across bullet points. These indicate a step in SPSS. It is recommended that you follow the steps. Almost all steps are also introduced by an icon with the text Start. This indicates that there is a movie clip available that will show you the same steps. After the last step that is shown in the video the same icon will appear, but now with the text End.

Introduction
This reader is an introduction to forecasting based on historical data. Key terms will be trend, seasonal patterns, and confidence intervals. Forecasting is an area of statistics that can fill a complete book, and this reader is therefore only an introduction to this area.

There are two main types of forecasting techniques: qualitative and quantitative.

Qualitative techniques often rely on educated opinions of experts. Five methods that can be used are:

1. Executive committee technique / Panel consensus
In this method a group of people (often from various departments within the company) is asked to come up with a forecast. The main downside of this method is that people in higher positions might dominate, and therefore the actual degree of consensus might be biased.

2. Delphi method
The Delphi method lets people answer a series of questions anonymously; then the answers of the other people are given. The respondents then answer the questions again. This procedure can be repeated several times until some kind of consensus is reached. This way the downside of the executive committee technique is somewhat reduced.

3. Market research
By conducting questionnaires and interviews, using test markets etc. and asking customers about their expectations, it can be possible to get an overall trend.

4. Product-life cycle analogy
With this method you forecast the situation for, let's say, product A based on the data of product B.

5. Expert judgment / Visionary forecasting
Here you rely on the opinions of experts.

Quantitative techniques rely on some kind of historical data. There are four different methods:

1. Moving averages
By using some kind of moving average, a forecast is based on the average of previous periods.

2. Exponential smoothing
This is similar to using a weighted moving average, and allows the inclusion of trends etc.

3. Models
This will be the method we are going to focus on in this reader.

4. Box-Jenkins
An autocorrelation method.

Accurate and timely information about what is likely to happen to the economy and society in the future has always been of value to business decision makers. One of the best-known stories of such forecasting is recorded in the first book of the Bible. In that case, Joseph was given the ability by God to interpret the Pharaoh's dream and forecast that there would be seven years of very good harvests and then seven years of famine. Acting on that forecast, Egypt stored grain during the good years and survived the famine - and even prospered as people from surrounding lands had to come to buy food. More than a millennium later, the Oracles of Delphi also appealed to the gods to predict the future for the Greek kings and Roman emperors. In fact, there are reports throughout history of unusual forecasting techniques - often shrouded in mysticism.
Source: http://gbr.pepperdine.edu/001/forecast.html (03-06-2006)

Trend
To start, we need to define what a trend is. Most often a trend is defined as:

Definition 1: Trend
The general direction in which something tends to move.

For the purpose of this reader we will limit ourselves to time-series.

Definition 2: Time series
A series of values of a variable at successive times.

So we will be looking at time-series and try to determine a trend. After a trend has been found, we can use it to forecast into the future. Trends can also be described via non-numerical values, e.g. fashion trends, which are often based on opinions.

History
According to the OED, trend was originally a geological term for "the way something [e.g. a coast-line] trends or bends away" and then it acquired a figurative sense meaning "the general course, tendency, or drift of action, thought etc." Statistical writers used the word in this latter sense, but R. H. Hooker seemed to be aware he was extending the meaning when he wrote, "The curve or line representing the successive instantaneous averages [moving-averages] I propose to call the trend." W. I. King's Elements of Statistical Method (1912) took Hooker's construction and the term "trend" to a broader audience. The term was soon applied to curves or lines constructed on other principles.
Source: http://members.aol.com/jeff570/t.html

A time-series is often shown by means of a scatter graph. This is a diagram showing only points at the data values. Since we are interested in time-series, we will place the time on the horizontal axis (x-axis) and the value on the vertical axis (y-axis). Let's look at an example. Below you'll find the turnover of company X over the last couple of years:

Example set 1
Year    Turnover
1995    29
1996    28
1997    31
1998    34
1999    37
2000    38
2001    38
2002    37
2003    40
2004    44
2005    46

The scatter diagram will then look like:
[Figure: scatter diagram of Example set 1. Vertical axis: Turnover in million $ (28 to 46). Horizontal axis: YEAR (1994 to 2006).]

Looking at the diagram we might be able to see that over the years the turnover has been increasing. But what will be the best describing trend line? To answer this question we will need to go into something called regression, which will be dealt with in the next chapter.

A time-series often consists of several components¹:

- Trend
Trend is a long term movement in a time series. It is the underlying direction (an upward or downward tendency) and rate of change in a time series, when allowance has been made for the other components.

- Seasonal
In weekly or monthly data, the seasonal component, often referred to as seasonality, is the component of variation in a time series which is dependent on the time of year. It describes any regular fluctuations with a period of less than one year. For example, the costs of various types of fruits and vegetables, unemployment figures and average daily rainfall all show marked seasonal variation.

- Cyclic
In weekly or monthly data, the cyclical component describes any regular fluctuations. It is a non-seasonal component which varies in a recognizable cycle.

- Irregular
The irregular component is that left over when the other components of the series (trend, seasonal and cyclical) have been accounted for.

For this course we will focus on the trend and seasonal components.

Assignment exercise 1
Select an international company of your interest and find the historical data from this company on its revenues or sales per quarter. Enter this data into SPSS and save the data file.
Criteria: You must have data covering a period of at least 5 years, for each of its quarters.
Tip: Try using the company website for the information. Many companies have an investor relations section, or something similar, on their website. You might be able to find the revenues separately in different annual reports.

¹ The definitions given are taken from: http://www.stats.gla.ac.uk/steps/glossary/time_series.html

Part 1 Basics: Manual calculations

1 Regression


Regression is a mathematical tool to find the best equation to describe a set of data. Although SPSS will be able to determine this equation for us, it is good to see how this works manually. We will start with the simplest version of them all, simple linear regression.

History
The statistical concept of regression has its origins in an attempt by Francis Galton (1822-1911) to find a mathematical law for one of the phenomena of heredity. His model (as it would be called today) was extended by Karl Pearson and G. Udny Yule and the biological reference eventually disappeared. The Pearson-Yule notion of regression was based on the multivariate normal distribution, but R. A. Fisher refounded regression using the model Gauss had proposed for the theory of errors and method of least squares. The Pearson-Yule and Gauss-Fisher notions are still in use.
Source text: http://members.aol.com/jeff570/r.html
Source picture: http://www-history.mcs.st-andrews.ac.uk/history/PictDisplay/Galton.html

1.1 Simple linear regression equation.


Linear regression is, as the name implies, about linear equations. A linear equation is the mathematical name for a function that has the following structure (or can be rewritten into this form):

$$y = ax + b$$

For regression, however, this is often rewritten into:

$$Y = b_0 + b_1 t$$

b0 and b1 will be known numbers, t will be the time period and Y, in the example, the predicted turnover. This type of equation will always produce a straight line. The main problem is now to find the best values for b0 and b1 to describe the data. As a first thought you might think to minimize the differences between the data points and the straight line. So for example:

Example set 2
Year    Turnover
1995    9
1996    12
1997    19
1998    22
1999    29
2000    32

We might guess a b0 of 3 and a b1 of 5. This will then give us the following diagram:
[Figure: Example set 2 plotted against t = 1 to 6 together with the guessed line. Legend: Original, Guess.]

Our first guess would then result in:

Year     Turnover   t    Guess              Difference
1995     9          1    5 x 1 + 3 = 8      |9 - 8| = 1
1996     12         2    5 x 2 + 3 = 13     |12 - 13| = 1
1997     19         3    5 x 3 + 3 = 18     |19 - 18| = 1
1998     22         4    5 x 4 + 3 = 23     |22 - 23| = 1
1999     29         5    5 x 5 + 3 = 28     |29 - 28| = 1
2000     32         6    5 x 6 + 3 = 33     |32 - 33| = 1
Total                                       6
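The table above can be checked with a few lines of Python. This is only a minimal sketch: the function name `total_absolute_difference` is made up for this illustration, and the second pair of values (b0 = 4, b1 = 5) is not from the reader but chosen here to show that a different line can give the same total.

```python
# Example set 2: turnover for t = 1..6 (years 1995-2000).
t_values = [1, 2, 3, 4, 5, 6]
turnover = [9, 12, 19, 22, 29, 32]

def total_absolute_difference(b0, b1, t_values, y_values):
    """Sum of |y - (b0 + b1 * t)| over all data points."""
    return sum(abs(y - (b0 + b1 * t)) for t, y in zip(t_values, y_values))

# The guess used in the text: b0 = 3, b1 = 5.
guess_total = total_absolute_difference(3, 5, t_values, turnover)

# A second line (illustrative assumption, not from the reader).
other_total = total_absolute_difference(4, 5, t_values, turnover)

print(guess_total)  # 6
print(other_total)  # 6
```

Both lines end up with a difference total of 6, even though they are not the same line.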

So what you might want to do is to minimize the difference total (6) by looking for different values of b0 and b1. There is, however, one major problem with this: there is no single line that will give the minimum of the difference total. In other words, you will not be able to find just one, but several pairs of b0 and b1 that will minimize the absolute difference.

Question 1
Below is the turnover of company E:

Year    t     Turnover
1995    1     12
1996    2     22
1997    3     28
1998    4     38
1999    5     44
2000    6     54
2001    7     60
2002    8     70
2003    9     76
2004    10    86

a) With b0 = 4 and b1 = 8 compute the absolute difference total between the real turnover and the linear equation results.

b) With b0 = 5 and b1 = 8 compute the absolute difference total between the real turnover and the linear equation results.

In order to solve this problem we can, instead of taking the absolute values, square the differences. This method is therefore known as the least-square method. It is the most widely used method to determine the best equation to fit the data. So returning to our example set 1, and using b0 = 27 and b1 = 1.5, we can have a look at the total squared differences:

Year    Turnover   t     Equation est.           Squared diff.
1995    29         1     27 + 1.5 x 1 = 28.5     (28.5 - 29)^2 = 0.25
1996    28         2     27 + 1.5 x 2 = 30       (30 - 28)^2 = 4
1997    31         3     27 + 1.5 x 3 = 31.5     (31.5 - 31)^2 = 0.25
1998    34         4     27 + 1.5 x 4 = 33       (33 - 34)^2 = 1
1999    37         5     27 + 1.5 x 5 = 34.5     (34.5 - 37)^2 = 6.25
2000    38         6     27 + 1.5 x 6 = 36       (36 - 38)^2 = 4
2001    38         7     27 + 1.5 x 7 = 37.5     (37.5 - 38)^2 = 0.25
2002    37         8     27 + 1.5 x 8 = 39       (39 - 37)^2 = 4
2003    40         9     27 + 1.5 x 9 = 40.5     (40.5 - 40)^2 = 0.25
2004    44         10    27 + 1.5 x 10 = 42      (42 - 44)^2 = 4
2005    46         11    27 + 1.5 x 11 = 43.5    (43.5 - 46)^2 = 6.25
Total:                                           30.5
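The total of the squared differences for example set 1 can be reproduced as follows. This is a small sketch, and the function name `total_squared_difference` is an assumption made for this illustration only.

```python
# Example set 1: turnover for t = 1..11 (years 1995-2005).
t_values = list(range(1, 12))
turnover = [29, 28, 31, 34, 37, 38, 38, 37, 40, 44, 46]

def total_squared_difference(b0, b1, t_values, y_values):
    """Sum of (estimate - y)^2, where estimate = b0 + b1 * t."""
    return sum((b0 + b1 * t - y) ** 2 for t, y in zip(t_values, y_values))

# The values used in the text: b0 = 27, b1 = 1.5.
total = total_squared_difference(27, 1.5, t_values, turnover)
print(total)  # 30.5
```

Trying other values for b0 and b1 in the last call shows how the total goes up or down.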

So in a way we can now play around with different settings of b0 and b1 to minimize the squared differences. Luckily, however, mathematicians have developed an algorithm to find the values for b0 and b1.

1) Calculate the average of the t values => $\bar{t}$

2) Calculate the average of the turnover (or whatever you are using) => $\bar{y}$

3) Calculate the difference for each time point with the average => $(t_i - \bar{t})$

4) Calculate the difference for each turnover with the average => $(y_i - \bar{y})$

5) Multiply each result of step 3 with the corresponding result of step 4 => $(t_i - \bar{t})(y_i - \bar{y})$

6) Add up all values of step 5 and divide it by the number of items minus 1 =>

$$S_{ty} = \frac{\sum_{i=1}^{n} (t_i - \bar{t})(y_i - \bar{y})}{n - 1}$$

7) Square each result of step 3 => $(t_i - \bar{t})^2$

8) Add up all results of step 7 and divide it by the number of items minus 1 =>

$$S_t^2 = \frac{\sum_{i=1}^{n} (t_i - \bar{t})^2}{n - 1}$$

9) b1 is now equal to the result of step 6 divided by the result of step 8 =>

$$b_1 = \frac{S_{ty}}{S_t^2}$$

10) b0 is now equal to the result of step 2 minus the result of step 9 times step 1 =>

$$b_0 = \bar{y} - b_1 \bar{t}$$
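The ten steps above can be sketched as a small Python function. The function name `least_squares_line` is an assumption for this illustration. It is applied here to example set 2, for which we earlier guessed b0 = 3 and b1 = 5.

```python
def least_squares_line(t_values, y_values):
    """Return (b0, b1) for Y = b0 + b1 * t, following steps 1 to 10."""
    n = len(t_values)
    t_mean = sum(t_values) / n                              # step 1
    y_mean = sum(y_values) / n                              # step 2
    t_dev = [t - t_mean for t in t_values]                  # step 3
    y_dev = [y - y_mean for y in y_values]                  # step 4
    products = [dt * dy for dt, dy in zip(t_dev, y_dev)]    # step 5
    s_ty = sum(products) / (n - 1)                          # step 6
    s_tt = sum(dt ** 2 for dt in t_dev) / (n - 1)           # steps 7 and 8
    b1 = s_ty / s_tt                                        # step 9
    b0 = y_mean - b1 * t_mean                               # step 10
    return b0, b1

# Example set 2: t = 1..6 for the years 1995-2000.
b0, b1 = least_squares_line([1, 2, 3, 4, 5, 6], [9, 12, 19, 22, 29, 32])
print(round(b0, 2), round(b1, 2))  # 3.6 4.83
```

So for example set 2 the least-square method ends up close to, but not exactly at, the first guess.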
Here's a complete worked-out example:

(1)      (2)        (3)   (4)               (5)               (6)                                 (7)
Year     Turnover   t     $t_i - \bar{t}$   $y_i - \bar{y}$   $(t_i - \bar{t})(y_i - \bar{y})$   $(t_i - \bar{t})^2$
1995     29         1     -5                -7.55             37.73                               25
1996     28         2     -4                -8.55             34.18                               16
1997     31         3     -3                -5.55             16.64                               9
1998     34         4     -2                -2.55             5.09                                4
1999     37         5     -1                0.45              -0.45                               1
2000     38         6     0                 1.45              0.00                                0
2001     38         7     1                 1.45              1.45                                1
2002     37         8     2                 0.45              0.91                                4
2003     40         9     3                 3.45              10.36                               9
2004     44         10    4                 7.45              29.82                               16
2005     46         11    5                 9.45              47.27                               25
Total:   402        66                                        183                                 110

Step 1: Calculate the average of the t values => $\bar{t}$

$$\bar{t} = \frac{\sum_{i=1}^{n} t_i}{n} = \frac{1 + 2 + \dots + 10 + 11}{11} = \frac{66}{11} = 6$$

Step 2: Calculate the average of the turnover (or whatever you are using) => $\bar{y}$

$$\bar{y} = \frac{\sum_{i=1}^{n} y_i}{n} = \frac{29 + 28 + \dots + 44 + 46}{11} = \frac{402}{11} \approx 36.55$$

Step 3: Calculate the difference for each time point with the average => $(t_i - \bar{t})$

So $(1 - 6) = -5$, $(2 - 6) = -4$ etc. See column 4.

Step 4: Calculate the difference for each turnover with the average => $(y_i - \bar{y})$

So $(29 - 36.55) = -7.55$, $(28 - 36.55) = -8.55$ etc. See column 5.

Step 5: Multiply each result of step 3 with the corresponding result of step 4 => $(t_i - \bar{t})(y_i - \bar{y})$

So $-5 \times (-7.55) \approx 37.73$, $-4 \times (-8.55) \approx 34.18$ etc. See column 6.

Step 6: Add up all values of step 5 and divide it by the number of items minus 1 =>

$$S_{ty} = \frac{\sum_{i=1}^{n} (t_i - \bar{t})(y_i - \bar{y})}{n - 1} = \frac{37.73 + 34.18 + \dots + 29.82 + 47.27}{11 - 1} = \frac{183}{10} = 18.3$$

Step 7: Square each result of step 3 => $(t_i - \bar{t})^2$

So $(1 - 6)^2 = 25$, $(2 - 6)^2 = 16$ etc. See column 7.

Step 8: Add up all results of step 7 and divide it by the number of items minus 1 =>

$$S_t^2 = \frac{\sum_{i=1}^{n} (t_i - \bar{t})^2}{n - 1} = \frac{25 + 16 + \dots + 16 + 25}{11 - 1} = \frac{110}{10} = 11$$

Step 9: b1 is now equal to the result of step 6 divided by the result of step 8 =>

$$b_1 = \frac{S_{ty}}{S_t^2} = \frac{18.3}{11} \approx 1.66$$

Step 10: b0 is now equal to the result of step 2 minus the result of step 9 times step 1 =>

$$b_0 = \bar{y} - b_1 \bar{t} \approx 36.55 - 1.66 \times 6 \approx 26.56$$

(The outcome 26.56 is obtained with the non-rounded values of $\bar{y}$ and $b_1$.)


So according to the least-square method, the best linear equation to describe the data of example set 1 is:

$$Y = 26.56 + 1.66\,t$$
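The worked example can be verified with the short sketch below. It recomputes the totals of columns 6 and 7 and the two coefficients for example set 1. The variable names are chosen for this illustration only.

```python
# Example set 1: turnover for t = 1..11 (years 1995-2005).
t_values = list(range(1, 12))
turnover = [29, 28, 31, 34, 37, 38, 38, 37, 40, 44, 46]

n = len(turnover)
t_mean = sum(t_values) / n   # step 1: 6
y_mean = sum(turnover) / n   # step 2: about 36.55

# Totals of column 6 and column 7 of the table.
sum_products = sum((t - t_mean) * (y - y_mean) for t, y in zip(t_values, turnover))
sum_squares = sum((t - t_mean) ** 2 for t in t_values)

s_ty = sum_products / (n - 1)   # step 6: 18.3
s_tt = sum_squares / (n - 1)    # step 8: 11
b1 = s_ty / s_tt                # step 9
b0 = y_mean - b1 * t_mean       # step 10

print(round(sum_products, 2), round(sum_squares, 2))  # 183.0 110.0
print(round(b0, 2), round(b1, 2))                      # 26.56 1.66
```

Note that b0 only comes out at 26.56 because the non-rounded values of the mean and of b1 are used.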

Question 2
Below is again the turnover of company E:

Year    Turnover
1995    12
1996    22
1997    28
1998    38
1999    44
2000    54
2001    60
2002    70
2003    76
2004    86

a) Determine via the least square method the best linear equation.

1.2 Forecasting
So far we have worked on finding the best mathematical model to describe the data. One major advantage of having a mathematical equation to describe the data is that the equation can, to some extent, predict future values. In the example used we found that the data could be described by the linear equation

$$Y = 26.56 + 1.66\,t$$

So if, for example, we would like to know the result for 2006, we simply enter t = 12 in the equation and obtain a prediction of:

$$Y = 26.56 + 1.66 \times 12 = 46.48$$

This way we can predict values for whatever year we like.
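A forecast like this is easy to automate. The sketch below uses the rounded coefficients from the text. The helper names `year_to_t` and `forecast` are assumptions for this illustration, and the conversion from year to t relies on 1995 being t = 1 in example set 1.

```python
B0 = 26.56  # rounded intercept from the text
B1 = 1.66   # rounded slope from the text

def year_to_t(year):
    """Convert a calendar year to the time period t (1995 is t = 1)."""
    return year - 1994

def forecast(year):
    """Predicted turnover Y = b0 + b1 * t for the given year."""
    return B0 + B1 * year_to_t(year)

prediction_2006 = forecast(2006)
print(year_to_t(2006))             # 12
print(round(prediction_2006, 2))   # 46.48
```

Keep in mind that the further a year lies beyond the data, the less reliable such a prediction is likely to be.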

2 How good is the best?


In chapter 1 we discussed how to find the best equation to describe the data, but is the best good enough? In the same way, if the best grade is a 42, it is still not good enough. One of the most used measurements for this is the so-called determination coefficient. We will again first show how it can be calculated manually. The determination coefficient uses 3 other statistical measurements: the covariance, the standard deviation and the correlation coefficient. We will first discuss each one separately.

2.1 The covariance


The covariance measures how two variables vary together. The general formula is:

$$s_{yY} = \frac{\sum_{i=1}^{n} (y_i - \bar{y})(Y_i - \bar{Y})}{n - 1}$$

This looks perhaps rather complicated, but can easily be broken down into several steps:

1) Calculate the arithmetic mean for variable 1 => $\bar{y}$
2) Calculate the arithmetic mean for variable 2 => $\bar{Y}$
3) Calculate $y_i - \bar{y}$ for each i
4) Calculate $Y_i - \bar{Y}$ for each i
5) Multiply each result of step 3 with that of step 4 => $(y_i - \bar{y})(Y_i - \bar{Y})$
6) Add up all results of step 5 => $\sum_{i=1}^{n} (y_i - \bar{y})(Y_i - \bar{Y})$
7) Divide the result of step 6 by $n - 1$


If we call our real turnover y and the value that comes out of the linear model Y, then we can follow the steps with the following table:

Year               t     Turnover y   Model Y    $y_i - \bar{y}$   $Y_i - \bar{Y}$   $(y_i - \bar{y})(Y_i - \bar{Y})$
1995               1     29           28.23      -7.55             -8.32             62.76
1996               2     28           29.89      -8.55             -6.65             56.86
1997               3     31           31.55      -5.55             -4.99             27.68
1998               4     34           33.22      -2.55             -3.33             8.47
1999               5     37           34.88      0.45              -1.66             -0.76
2000               6     38           36.55      1.45              0.00              0.00
2001               7     38           38.21      1.45              1.66              2.42
2002               8     37           39.87      0.45              3.33              1.51
2003               9     40           41.54      3.45              4.99              17.24
2004               10    44           43.20      7.45              6.65              49.61
2005               11    46           44.86      9.45              8.32              78.64
Total:                   402          402.00                                         304.44
Average:                 36.55        36.55
Divide by n - 1:                                                                     30.44

1) $\bar{y} = 36.55$ (= 402/11)
2) $\bar{Y} = 36.55$ (= 402/11)
3) See the column $y_i - \bar{y}$ (e.g. 29 - 36.55 = -7.55)
4) See the column $Y_i - \bar{Y}$ (e.g. 28.23 - 36.55 = -8.32)
5) See the column $(y_i - \bar{y})(Y_i - \bar{Y})$ (e.g. $-7.55 \times -8.32 \approx 62.76$)
6) The sum of column $(y_i - \bar{y})(Y_i - \bar{Y})$ = 304.44
7) Divided by $n - 1$ gives 30.44
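The seven steps can be sketched in Python as shown below. The function name `covariance` is chosen for this illustration. The model values are recomputed from the non-rounded coefficients of chapter 1 (b1 = 18.3/11 and b0 = 402/11 - 6 x b1), which is an assumption about how the Model Y column was produced.

```python
def covariance(a, b):
    """Sample covariance: sum of products of deviations, divided by n - 1."""
    n = len(a)
    a_mean = sum(a) / n                                   # step 1
    b_mean = sum(b) / n                                   # step 2
    products = [(x - a_mean) * (z - b_mean)               # steps 3, 4 and 5
                for x, z in zip(a, b)]
    return sum(products) / (n - 1)                        # steps 6 and 7

turnover = [29, 28, 31, 34, 37, 38, 38, 37, 40, 44, 46]  # real values y

# Model values Y = b0 + b1 * t with the non-rounded coefficients.
b1 = 18.3 / 11
b0 = 402 / 11 - b1 * 6
model = [b0 + b1 * t for t in range(1, 12)]

s_yY = covariance(turnover, model)
print(round(model[0], 2))  # 28.23
print(round(s_yY, 2))      # 30.44
```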

Notice that if both values are lower than their mean, $(y_i - \bar{y})(Y_i - \bar{Y})$ will be positive. If both are higher it will also be positive, but if one is higher and the other one lower it will be negative.

History
The term covariance, analogous to variance, began appearing in 1930 in the writings of R. A. Fisher (photo on the left) and his circle. It is used in Fisher's The Genetical Theory of Natural Selection (p. 195), Harold Hotelling's "The Consistency and Ultimate Distribution of Optimum Statistics," Transactions of the American Mathematical Society, 32, p. 850 and H. G. Sanders's "A Note on the Value of Uniformity Trials for Subsequent Experiments," Journal of Agricultural Science, 21, p. 64. In 1918, when Fisher introduced the term variance (q.v.), he announced the fact. In 1930 nobody said he was introducing a new term.
Source text: http://members.aol.com/jeff570/c.html
Source picture: http://www-history.mcs.st-andrews.ac.uk/history/PictDisplay/Fisher.html

Question 3
Calculate the covariance of company E using the linear model:

Year    t     Turnover
1995    1     12
1996    2     22
1997    3     28
1998    4     38
1999    5     44
2000    6     54
2001    7     60
2002    8     70
2003    9     76
2004    10    86

2.2 Standard deviation


The standard deviation is another measurement we need in order to calculate the determination coefficient. The standard deviation is the square root of the average squared difference to the mean. This is slightly easier to understand if we first look at yet another measurement, the variance. The variance is the average squared difference to the mean. Notice that we square the differences again instead of using the absolute difference. The reasons for this are not part of this course. Since everything is squared, the variance is often quite large compared to the original data. The opposite of squaring is taking the square root, and that's exactly what the standard deviation does: it takes the square root of the variance. Below are the two formulas:

Variance:

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$

Standard deviation:

$$s = \sqrt{s^2} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$

Let's see how this works with our example:

Year               t     Turnover y   Model Y    $y_i - \bar{y}$   $(y_i - \bar{y})^2$   $Y_i - \bar{Y}$   $(Y_i - \bar{Y})^2$
1995               1     29           28.23      -7.55             56.93                 -8.32             69.19
1996               2     28           29.89      -8.55             73.02                 -6.65             44.28
1997               3     31           31.55      -5.55             30.75                 -4.99             24.91
1998               4     34           33.22      -2.55             6.48                  -3.33             11.07
1999               5     37           34.88      0.45              0.21                  -1.66             2.77
2000               6     38           36.55      1.45              2.12                  0.00              0.00
2001               7     38           38.21      1.45              2.12                  1.66              2.77
2002               8     37           39.87      0.45              0.21                  3.33              11.07
2003               9     40           41.54      3.45              11.93                 4.99              24.91
2004               10    44           43.20      7.45              55.57                 6.65              44.28
2005               11    46           44.86      9.45              89.39                 8.32              69.19
Total:                   402          402.00                       328.73                                  304.43
Average:                 36.55        36.55
Divide by n - 1:                                                   32.87                                   30.44

The means were already calculated to be 36.55 for both. Also $y_i - \bar{y}$ and $Y_i - \bar{Y}$ were already calculated once. The new ones are $(y_i - \bar{y})^2$ and $(Y_i - \bar{Y})^2$. Dividing the totals by $n - 1$ gives us the variances. So:

$$s_y^2 = 32.87 \quad \text{and} \quad s_Y^2 = 30.44$$

And therefore the standard deviations are:

$$s_y = \sqrt{32.87} \approx 5.73 \quad \text{and} \quad s_Y = \sqrt{30.44} \approx 5.52$$

History
Introduced by Karl Pearson (1857-1936) in 1893, "although the idea was by then nearly a century old" (Abbott; Stigler, page 328). According to the DSB: The term "standard deviation" was introduced in a lecture of 31 January 1893, as a convenient substitute for the cumbersome "root mean square error" and the older expressions "error of mean square" and "mean error." The OED2 shows a use of standard deviation in 1894 by Pearson in "Contributions to the Mathematical Theory of Evolution," (Philosophical Transactions of the Royal Society A, 185, (1894), 71-110.): "Then σ will be termed its standard-deviation (error of mean square)." (p. 80) He had "always found it more convenient to work with the standard-deviation than with the probable error or the modulus, in terms of which the error-function is usually tabulated." (p. 88n) On p. 70 he identified the standard deviation with Gauss's mean error.
Source text: http://members.aol.com/jeff570/s.html
Source picture: http://www-history.mcs.st-andrews.ac.uk/history/PictDisplay/Pearson.html

Question 4
Calculate the standard deviation of company E (see question 3 for the data)
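The variance and standard deviation of the example can be recomputed with the sketch below. The function name `sample_variance` is chosen for this illustration, and the model values are again assumed to come from the non-rounded coefficients of chapter 1. As a cross-check, Python's standard `statistics` module is used, whose `variance` function also divides by n - 1.

```python
import math
import statistics

turnover = [29, 28, 31, 34, 37, 38, 38, 37, 40, 44, 46]  # real values y

# Model values Y = b0 + b1 * t with the non-rounded coefficients.
b1 = 18.3 / 11
b0 = 402 / 11 - b1 * 6
model = [b0 + b1 * t for t in range(1, 12)]

def sample_variance(values):
    """Average squared difference to the mean, dividing by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)

var_y = sample_variance(turnover)
var_model = sample_variance(model)
sd_y = math.sqrt(var_y)          # standard deviation = square root of variance
sd_model = math.sqrt(var_model)

print(round(var_y, 2), round(sd_y, 2))          # 32.87 5.73
print(round(var_model, 2), round(sd_model, 2))  # 30.44 5.52

# Cross-check against the standard library.
print(math.isclose(var_y, statistics.variance(turnover)))  # True
```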


2.3 Correlation
The last measurement we need to discuss before going to the determination coefficient is the correlation. In a way the correlation coefficient makes it possible to compare different values of covariance. The covariance can be any value, but the correlation coefficient is always between -1 and 1. This is done by dividing the covariance by the two standard deviations. So in formula²:

$$r = \frac{s_{yY}}{s_y \, s_Y}$$

In the example we already determined that:
- the covariance $s_{yY}$ was 30.44
- $s_y \approx 5.73$
- $s_Y \approx 5.52$

So the correlation coefficient is:

$$r = \frac{30.44}{5.73 \times 5.52} \approx 0.962$$

A correlation coefficient of 1 indicates a perfect positive linear relation, a correlation of -1 a perfect negative linear relation and a correlation of 0 no linear relation.

Correlation and cause
Correlation and causation are two quite different words, and the innumerate are more prone to mistake them than most. Quite often, two quantities are correlated without either one being the cause of the other. One common way in which this can occur is for changes in both quantities to be the result of a third factor. In the New Hebrides Islands, body lice were considered a cause of good health. As in many folk observations, there was some evidence for this. When people took sick, their temperatures rose and caused the body lice to seek more hospitable abodes. The lice and good health both departed because of the fever. Similarly, the correlation between the quality of a state's day-care programs and the reported rate of child sex abuse in them is certainly not causal, but merely indicates that better supervision results in more diligent reporting of the incidents which occur.
Taken from Innumeracy by John Allen Paulos, ISBN 0140291202, pg 118, 119
Picture taken from: http://www.stat.psu.edu/~resources/Cartoons/

[Figure: three example scatter diagrams, with r = 0.04, r = -0.99 and r = 0.99]
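The correlation coefficient of the example can be computed as sketched below: the covariance divided by the two standard deviations. The function names are chosen for this illustration, and the model values are assumed to come from the non-rounded coefficients of chapter 1.

```python
import math

def covariance(a, b):
    """Sample covariance, dividing by n - 1."""
    n = len(a)
    a_mean = sum(a) / n
    b_mean = sum(b) / n
    return sum((x - a_mean) * (z - b_mean) for x, z in zip(a, b)) / (n - 1)

def standard_deviation(values):
    """Square root of the sample variance (the covariance of a list with itself)."""
    return math.sqrt(covariance(values, values))

def correlation(a, b):
    """Covariance divided by the two standard deviations."""
    return covariance(a, b) / (standard_deviation(a) * standard_deviation(b))

turnover = [29, 28, 31, 34, 37, 38, 38, 37, 40, 44, 46]  # real values y
b1 = 18.3 / 11
b0 = 402 / 11 - b1 * 6
model = [b0 + b1 * t for t in range(1, 12)]              # model values Y

r = correlation(turnover, model)
print(round(r, 3))  # 0.962
```

A list of values that lies exactly on a rising straight line, compared with its own t values, gives a correlation of 1.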

Question 5 Calculate the correlation coefficient from company E (see question 3 for the data)

Footnote 2: The full official name is: Pearson product-moment correlation coefficient.

2.4 Determination coefficient


Finally we are set to calculate the determination coefficient. Don't worry, this is relatively easy after the previous paragraphs. The determination coefficient is nothing else than the correlation coefficient squared:

$r^2$

That's it. So just square the correlation coefficient and you get the determination coefficient. In the example this will become $r^2 = 0.962^2 \approx 0.926$ (the 0.926 is obtained with the unrounded value of r; squaring the rounded 0.962 gives 0.925). The R-square shows which proportion of the total variance can be explained. In the example the 0.926 means that we have explained 92.6% of the original variability. The full interpretation of this value is beyond the scope of this course. What is important however is: the higher the determination coefficient, the better the model describes the data.

Question 6
Calculate the determination coefficient for company E (see question 5)
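The calculation of sections 2.3 and 2.4 can be retraced with a few lines of Python. This is only a sketch: it assumes that 32.87 and 30.44 are the variances under the square roots in section 2.2, and the variable names are chosen here for illustration.

```python
import math

# Values taken from the worked example (sections 2.1 and 2.2)
covariance = 30.44
variance_y = 32.87   # value under the square root of s_y
variance_Y = 30.44   # value under the square root of s_Y

s_y = math.sqrt(variance_y)
s_Y = math.sqrt(variance_Y)

# Correlation: covariance divided by the product of the standard deviations
r = covariance / (s_y * s_Y)
# Determination coefficient: the correlation squared
r_squared = r ** 2

print(f"s_y = {s_y:.2f}, s_Y = {s_Y:.2f}")
print(f"r   = {r:.3f}")
print(f"r^2 = {r_squared:.3f}")
```

Working with unrounded intermediate values explains why the determination coefficient comes out as 0.926 rather than 0.925.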

3 Seasonal trend
In some situations there might be seasonal influences on the data. An obvious example is perhaps the turnover of ice-cream: in winter this is most often lower than in summer. In this chapter we will deal with this kind of data set. Two types of seasonal patterns can be distinguished: additive and multiplicative. In the additive pattern the difference between the seasons remains the same. In a multiplicative pattern the seasonal differences themselves grow by a certain factor. Below is the diagram of an additive seasonal trend:
[Figure: additive seasonal trend, quarterly values from 2000 Q1 to 2005 Q4]

Notice that although the diagram as a whole is moving upwards, the difference between the high and low values (the amplitude) remains the same. In the multiplicative model, the diagram might look like:

[Figure: multiplicative seasonal trend, quarterly values from 2000 Q1 to 2005 Q4]

Notice that the diagram as a whole is still moving upwards, but now the difference between the high and low values changes as well. For this course we will limit ourselves to the additive model.
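The difference between the two patterns can be illustrated with a small Python sketch. All numbers below (the trend, the seasonal differences and the seasonal factors) are made up for illustration and do not come from the reader.

```python
# Hypothetical upward trend over 6 years of quarters
trend = [100 + 5 * t for t in range(24)]
additive_season = [-30, 10, 40, -20]           # assumed fixed seasonal differences
multiplicative_season = [0.7, 1.1, 1.4, 0.8]   # assumed seasonal factors

# Additive: trend plus a fixed amount per quarter
additive = [trend[t] + additive_season[t % 4] for t in range(24)]
# Multiplicative: trend times a factor per quarter
multiplicative = [trend[t] * multiplicative_season[t % 4] for t in range(24)]

def yearly_amplitudes(series):
    """Difference between the highest and lowest quarter within each year."""
    return [max(series[i:i + 4]) - min(series[i:i + 4])
            for i in range(0, len(series), 4)]

print(yearly_amplitudes(additive))        # the same amplitude every year
print(yearly_amplitudes(multiplicative))  # the amplitude grows with the trend
```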

3.1 Moving averages


In order to still see an overall pattern we can use the principle of moving averages. Let's follow the following example of company B.

Example set 3
Turnover  Year  Quarter
1300      2001  1
2600      2001  2
4900      2001  3
1800      2001  4
2400      2002  1
4400      2002  2
6400      2002  3
2800      2002  4
3600      2003  1
5300      2003  2
7300      2003  3
3900      2003  4
5300      2004  1
6400      2004  2
8400      2004  3
5600      2004  4

We will start by using a simple example first. Company C has the following information.

Year  Period     Sales
2003  Jan - Mar  22
      Apr - Jun  50
      Jul - Sep  101
      Oct - Dec  34
2004  Jan - Mar  73
      Apr - Jun  123
      Jul - Sep  48
      Oct - Dec  92
2005  Jan - Mar  134
      Apr - Jun  68
      Jul - Sep  104
      Oct - Dec  158

Below you see this in a diagram:

[Figure: quarterly sales of company C, 2003 to 2005]

Notice that there is a pattern emerging.


In order to smooth the data we could calculate the average of each three consecutive periods. For the first three periods this gives:

$\frac{22 + 50 + 101}{3} \approx 57.67$

Then we move one row further and compute this again:

$\frac{50 + 101 + 34}{3} \approx 61.67$ etc.


Year  Period     Sales  MA
2003  Jan - Mar  22
      Apr - Jun  50     57.67
      Jul - Sep  101    61.67
      Oct - Dec  34     69.33
2004  Jan - Mar  73     76.67
      Apr - Jun  123    81.33
      Jul - Sep  48     87.67
      Oct - Dec  92     91.33
2005  Jan - Mar  134    98.00
      Apr - Jun  68     102.00
      Jul - Sep  104    110.00
      Oct - Dec  158
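The moving averages of company C can be reproduced with a short Python sketch. The function name `moving_average` is chosen here for illustration; it is not an SPSS function.

```python
# Sales of company C, 2003 Jan-Mar up to 2005 Oct-Dec
sales = [22, 50, 101, 34, 73, 123, 48, 92, 134, 68, 104, 158]

def moving_average(values, span):
    """Average of every run of `span` consecutive values."""
    return [sum(values[i:i + span]) / span
            for i in range(len(values) - span + 1)]

ma = moving_average(sales, 3)
print([round(m, 2) for m in ma])
```

With a span of 3 each average belongs to the middle one of its three periods, so 12 sales figures yield 10 moving averages.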

Below is a diagram showing both the sales and the moving average:

[Figure: quarterly sales of company C together with the moving average, 2003 to 2005]

The moving averages now make it easier to see what kind of pattern (or trend) the sales have. In example set 3 there is however a small complication: we can't tell to which period the MA belongs.

We could calculate e.g. $\frac{1300 + 2600 + 4900 + 1800}{4} = 2650$, but this belongs to 2001 quarter 2.5. So in order to solve this we also compute the next MA, $\frac{2600 + 4900 + 1800 + 2400}{4} = 2925$, which therefore belongs to 2001 quarter 3.5. Now taking the average of quarter 2.5 and 3.5 results in the average for quarter 3, so: $\frac{2650 + 2925}{2} = 2787.5$, which we will use as the MA for 2001 quarter 3.


We can use the same technique for the other quarters:

Turnover  Year  Quarter  MA (half)  MA
1300      2001  1
2600      2001  2
                         2650
4900      2001  3                   2787.5
                         2925
1800      2001  4                   3150
                         3375
2400      2002  1                   3562.5
                         3750
4400      2002  2                   3875
                         4000
6400      2002  3                   4150
                         4300
2800      2002  4                   4412.5
                         4525
3600      2003  1                   4637.5
                         4750
5300      2003  2                   4887.5
                         5025
7300      2003  3                   5237.5
                         5450
3900      2003  4                   5587.5
                         5725
5300      2004  1                   5862.5
                         6000
6400      2004  2                   6212.5
                         6425
8400      2004  3
5600      2004  4
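The two-step calculation (first the "half" moving averages, then the average of each neighbouring pair) can be sketched in Python as follows. The function name `centered_moving_average` is an illustrative choice, not something taken from SPSS.

```python
# Turnover of company B (example set 3), 2001 Q1 up to 2004 Q4
turnover = [1300, 2600, 4900, 1800, 2400, 4400, 6400, 2800,
            3600, 5300, 7300, 3900, 5300, 6400, 8400, 5600]

def centered_moving_average(values, span=4):
    """Return the 'half' moving averages and the centered moving averages."""
    # Step 1: plain moving averages; these fall between two periods
    half = [sum(values[i:i + span]) / span
            for i in range(len(values) - span + 1)]
    # Step 2: average each pair of neighbours to land on a real period
    centered = [(a + b) / 2 for a, b in zip(half, half[1:])]
    return half, centered

half_ma, ma = centered_moving_average(turnover)
print(half_ma[:2])   # the values for 2001 quarter 2.5 and 3.5
print(ma)            # the first value belongs to 2001 quarter 3
```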


3.2 Seasonal correction


For the seasonal data we only have a prediction for the moving average of the specific periods in time. For example, we have let SPSS calculate that the moving average for 2006 quarter 4 will be 14508.39. But as you can see from the diagram, quarter 4 is most often below the MA. What we need is a so-called seasonal correction. In order to see how much we should add to or subtract from the moving average, we calculate the average of the differences we already know. So we arrange each difference between a known value and the moving average in the appropriate quarter column. Then for each quarter we calculate the average:

                                      Quarter
Turnover  Year  Quarter  MA      1         2       3        4
1300      2001  1
2600      2001  2
4900      2001  3        2787.5                    2112.5
1800      2001  4        3150                               -1350
2400      2002  1        3562.5  -1162.5
4400      2002  2        3875              525
6400      2002  3        4150                      2250
2800      2002  4        4412.5                             -1612.5
3600      2003  1        4637.5  -1037.5
5300      2003  2        4887.5            412.5
7300      2003  3        5237.5                    2062.5
3900      2003  4        5587.5                             -1687.5
5300      2004  1        5862.5  -562.5
6400      2004  2        6212.5            187.5
8400      2004  3
5600      2004  4
Total                            -2762.5   1125    6425     -4650
Count                            3         3       3        3
Average                          -920.83   375     2141.67  -1550

In the Quarter columns each time the difference between turnover and MA is calculated. So on average the second quarter, for example, will be 375 higher than the MA shows. As a final note, these four quarter averages should add up to 0. However, as you might notice, they do not: $-920.83 + 375 + 2141.67 - 1550 \approx 45.83$. In order to solve this a very simple adjustment is made: $\frac{45.83}{4} \approx 11.46$. So we subtract 11.46 from each average:

Quarter      1          2         3          4
Total        -2762.5    1125      6425       -4650
Count        3          3         3          3
Average      -920.8333  375       2141.6667  -1550
Average cor  -932.2917  363.5417  2130.2083  -1561.4583
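The seasonal correction can be retraced in Python. This is a sketch only; the variable names are illustrative, and the last step assumes that the predicted MA of 14508.39 for 2006 quarter 4 quoted in the text is corrected by simply adding the adjusted quarter 4 average.

```python
# Turnover of company B (example set 3), 2001 Q1 up to 2004 Q4
turnover = [1300, 2600, 4900, 1800, 2400, 4400, 6400, 2800,
            3600, 5300, 7300, 3900, 5300, 6400, 8400, 5600]
# Centered moving averages for 2001 Q3 up to 2004 Q2
ma = [2787.5, 3150, 3562.5, 3875, 4150, 4412.5,
      4637.5, 4887.5, 5237.5, 5587.5, 5862.5, 6212.5]

# Collect the differences (turnover - MA) per quarter
differences = {1: [], 2: [], 3: [], 4: []}
for i, m in enumerate(ma):
    index = i + 2              # the first MA belongs to the third observation
    quarter = index % 4 + 1    # observation 0 is quarter 1
    differences[quarter].append(turnover[index] - m)

averages = {q: sum(d) / len(d) for q, d in differences.items()}

# The four averages should add up to 0; spread the surplus evenly
adjustment = sum(averages.values()) / 4
corrected = {q: a - adjustment for q, a in averages.items()}

# Forecast for 2006 Q4: predicted MA (from the text) plus the Q4 correction
predicted_ma_2006_q4 = 14508.39
forecast_2006_q4 = predicted_ma_2006_q4 + corrected[4]

print({q: round(c, 4) for q, c in corrected.items()})
print(round(forecast_2006_q4, 2))
```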


4 Summary of Part 1
In this first part we've seen various manual calculations, with as main goal to produce an accurate forecast. The basic steps in order to create this forecast are:

Non-seasonal data
Step 1: Determine the best linear regression equation. This is done by using the least-squares method.
Step 2: Determine how good the best model/equation is. This is done by calculating the determination coefficient.
Step 3: Create the forecast. This is done by filling in the appropriate t value in the equation.

Seasonal data
Step 1: Create the overall trend. This is done by calculating the moving averages.
Step 2: Determine the best linear regression equation for the moving averages. This is done by using the least-squares method.
Step 3: Determine how good the best model/equation is. This is done by calculating the determination coefficient.
Step 4: Determine the adjustments. This is done by calculating the adjusted seasonal correction.
Step 5: Create the forecast. This is done by filling in the appropriate t value in the equation (which gives a prediction for the MA) and adding the appropriate seasonal correction.


Part 2: Using SPSS

5 Linear-Regression Equation


SPSS can also determine the values for b0 and b1.
Start: Video 2 Simple Linear Regression.avi

First we enter the turnovers from our example:

Next we select Analyze -> Regression -> Curve Estimation from the menu bar.

The Turnover is the dependent variable, and for the independent variable we use Time.

The rest we leave as it is, so we click on OK.

In the Output window we see (among other tables) the following information:

The table has the following columns:
- Dependent: This is the dependent variable we set.
- Mth: This is the method / model used. It shows LIN, which stands for Linear. Later we will see other models as well.
- Rsq: This stands for R square, and is the so-called determination coefficient. This will also be dealt with later in this reader.
- D.f.: Stands for degrees of freedom. An explanation of what this is can be found in appendix 1.
- F: The F-ratio is not part of this course; a quick introduction can be found in appendix 1.
- Sigf: The significance level is not part of this course; a quick introduction can be found in appendix 1.
- b0 (note: SPSS 15 and higher shows Constant rather than b0): This shows the b0 value for the regression equation. Notice that the 26.5636 is very close to the 26.57 we found manually. The difference is due to rounding.
- b1: This shows the b1 value for the regression equation. Notice that the 1.6636 is very close to the 1.66 we found manually. The difference is due to rounding.

Underneath the table, you will see a scatter diagram and the regression line:

Save the output as OutputExample1
Save the data file as Example1


End: Video 2 Simple Linear Regression.avi

Assignment exercise 2 Determine the simple-linear regression equation from your selected company from assignment exercise 1. In the output file add a new text line and type in the equation. Save the output file as AssignmentEx2


6 Simple non-linear regression


So far we discussed only linear regression equations. But a straight (linear) line is not always the best way to describe the situation. Therefore non-linear regression equations also exist (see footnote 3). Below are the different types of models that SPSS knows (the example graphs of each model are not reproduced here):

Model        Equation
Quadratic    $Y = b_0 + b_1 t + b_2 t^2$
Cubic        $Y = b_0 + b_1 t + b_2 t^2 + b_3 t^3$
Compound     $Y = b_0 \cdot b_1^{t}$
Growth       $Y = e^{b_0 + b_1 t}$
Logarithmic  $Y = b_0 + b_1 \ln(t)$
S            $Y = e^{b_0 + b_1 / t}$
Exponential  $Y = b_0 \cdot e^{b_1 t}$
Inverse      $Y = b_0 + \frac{b_1}{t}$
Power        $Y = b_0 \cdot t^{b_1}$

Footnote 3: Officially the models discussed in this chapter are still linear regression models. The "linear" formally refers to the fact that some kind of linear relation still exists. All models discussed in this chapter can still be transformed into a linear equivalent. However, to avoid confusion the author has chosen to call these non-linear since the equations are non-linear equations, but keep in mind that the regression equation is still considered a linear regression equation.

For all these models it is possible to use the least-squares method again to determine b0 and b1 (and b2 and b3 in some cases). You will then need to work with the so-called normal equations. This is outside the scope of this reader. Instead we will use SPSS to determine the values for us.
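To get a feeling for what the different model equations do, they can be written out as small Python functions. This is a sketch, not SPSS code: the dictionary `MODELS` is an illustrative name, the coefficients in the usage lines are made up, and the Compound and S models are written as $b_0 \cdot b_1^t$ and $e^{b_0 + b_1/t}$ on the assumption that this is how the SPSS Curve Estimation help defines them.

```python
import math

# One function per model; t is the time value, b0..b3 the coefficients
MODELS = {
    "Linear":      lambda t, b0, b1: b0 + b1 * t,
    "Quadratic":   lambda t, b0, b1, b2: b0 + b1 * t + b2 * t**2,
    "Cubic":       lambda t, b0, b1, b2, b3: b0 + b1 * t + b2 * t**2 + b3 * t**3,
    "Compound":    lambda t, b0, b1: b0 * b1**t,
    "Growth":      lambda t, b0, b1: math.exp(b0 + b1 * t),
    "Logarithmic": lambda t, b0, b1: b0 + b1 * math.log(t),
    "S":           lambda t, b0, b1: math.exp(b0 + b1 / t),
    "Exponential": lambda t, b0, b1: b0 * math.exp(b1 * t),
    "Inverse":     lambda t, b0, b1: b0 + b1 / t,
    "Power":       lambda t, b0, b1: b0 * t**b1,
}

# Hypothetical coefficients b0 = 2 and b1 = 3, evaluated at t = 4
for name in ("Linear", "Compound", "Inverse", "Power"):
    print(name, MODELS[name](4, 2.0, 3.0))
```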
Start: Video 3 Simple Non-linear regression.avi

- Open the data file Example1
- From the menu bar select Analyze -> Regression -> Curve Estimation
- Make Turnover the dependent variable;
- Select Time as the independent variable and;
- Select all models except the Logistic one
- Click on OK
- Save the output as OutputExample2

End: Video 3 Simple Non-linear regression.avi

In the output window you now see all models and their corresponding columns. The scatter diagram is also shown again. Notice however that due to the number of models used, this diagram becomes rather messy. In the next chapter we will see how to select the best model. For now let's see how we could determine the regression equation for, for example, the cubic model using SPSS.
Start: Video 4 Simple Non-linear cubic regression.avi

- Open the data file Example1
- From the menu bar select Analyze -> Regression -> Curve Estimation
- Make Turnover the dependent variable;
- Select Time as the independent variable and;
- Make sure only the cubic model is selected
- Click on Help
- Scroll down and click under Related topics on the option Curve Estimation Models
- Click on Show Details

Notice that SPSS shows the general regression equation for the cubic model. The * means "times" and ** means "to the power of", so t**3 means $t^3$.
- Click on OK
End: Video 4 Simple Non-linear cubic regression.avi

In the output we see:

And since the equation for the cubic model in general is $Y = b_0 + b_1 t + b_2 t^2 + b_3 t^3$, we can now use the found b0 (called Constant), b1, b2 and b3 from SPSS to create our cubic regression equation:

$Y = 23.894 + 3.924t - 0.459t^2 + 0.026t^3$

Assignment exercise 3
From your selected company create an output file showing all models (except the logistic). Add a new text where you write down the quadratic model equation. Save the output file as AssignmentEx3


6.1 Determining the best model


We discussed how the determination coefficient is a measure of how good the best equation is. We can therefore also use the determination coefficient to determine which models are suitable. Calculating the determination coefficient for all 9 models would take up a lot of time. Luckily we have SPSS to do the work for us. Actually we have already done this. Notice that in the output of the Curve Estimation one of the columns is Rsq, which is just short for R squared. In other words, the column Rsq shows you the determination coefficient. However, judging the best model only on this single value can sometimes lead to a wrong result or, to be more precise, to a less feasible prediction. SPSS also gives some other measurements in the output; it does this because those other measurements also influence how well the model describes the data.

Another point to take into consideration is the complexity of the model. If for example both the linear and the cubic model have the same Rsq, most often the linear model will then be preferred (in a way this uses Occam's razor: the simplest answer is usually the correct answer).

Also we should consider the graphical representation. Especially for long-term forecasting the nature of the equation might head towards an unusual effect. To illustrate this take the following example. Below you see the turnover of some company and the quadratic model.

Occam's Razor
Actually the quote stated here is an often used modification of the original "Entities should not be multiplied unnecessarily". The name comes from William of Occam, who was a logician and Franciscan friar. However, he is only associated with the principle because he often used it; he was probably not the first to use it. Sometimes the principle is given in Latin to perhaps give it more authenticity, as for example: "Pluralitas non est ponenda sine necessitate". For more information see for example: http://math.ucr.edu/home/baez/physics/General/occam.html
Source picture: http://en.wikipedia.org/wiki/Occam's_Razor

The Rsq was 0.989 and the line also seems to fit the data quite well. Notice that the observed values seem to go up, but less and less steeply. If we use the quadratic model to forecast the next 10 periods the result will look like:


Notice that the turnover almost reaches 0. Even though the forecast shows this, most likely the management will prevent this from happening a long time before it hits this line. This scenario could however be used to warn the management. Another useful element of looking at the graph is the detection of outliers. In chapter 3 at the end of the chapter exercise 1, we already saw that an outlier can influence the outcome of the model significantly. So in short, in choosing a model follow these three steps:
1. Select the models with a high Rsq
2. Look at the diagram to see which one will describe the data best
3. Apply Occam's razor

Assignment Exercise 4 Create an output file showing all models and add a new text showing the equation for the best model. Save the output file as AssignmentEx4


6.2 Forecasting non-seasonal data with SPSS


Also SPSS can do this for us.
Start: Video 5 Forecasting.avi

- Open the file Example1
- From the menu bar select Analyze -> Regression -> Curve Estimation
- Make Turnover the dependent variable;
- Select Time as the independent variable and;
- Make sure only the linear model is selected
- Click on Save
- Check the option Predicted values and;
- At Predict cases select Predict through
- Let's say we would like to know the prediction for 2008. Since the first turnover was in 1995 (=> t = 1 => observation 1) and the last one in 2005 (=> t = 11 => observation 11), this means 2008 will be observation 14 (2008 - 1995 + 1). So at Observation type in 14
- Click on Continue
- Click on OK
- Save the data file as Example2


End: Video 5 Forecasting.avi

Notice that we now have a new column entitled FIT_1. These are the values according to the regression equation. For 2008 (observation 14) the expected turnover will therefore be 49.85 (appr.). Also we can see that the prediction of 46.48 that we found manually is quite close to the 46.53 that SPSS calculated for us. The difference is due to the fact that we used rounding in our manual calculation, while SPSS does not.
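The FIT_1 values can be checked by hand with the linear equation and the coefficients that SPSS reported (b0 = 26.5636 and b1 = 1.6636). The sketch below uses illustrative function names; the observation that the 46.53 quoted above matches t = 12 is an inference from the arithmetic, not something the reader states.

```python
B0 = 26.5636   # constant reported by SPSS
B1 = 1.6636    # slope reported by SPSS

def observation_number(year, first_year=1995):
    """Convert a year into the t value (1995 -> 1, 2005 -> 11)."""
    return year - first_year + 1

def forecast(t, b0=B0, b1=B1):
    """Fill in t in the linear regression equation Y = b0 + b1 * t."""
    return b0 + b1 * t

t_2008 = observation_number(2008)
print(t_2008, round(forecast(t_2008), 2))   # observation 14
print(round(forecast(12), 2))               # observation 12
```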


7 Confidence intervals
The prediction will of course not be 100% accurate. However, statisticians have come up with ways to make the prediction more reliable. Instead of giving one precise value, they will often give a range within which the real future value is expected to fall. Imagine someone saying the temperature tomorrow will be 18 degrees. This statement has a rather large chance of being incorrect. However, the statement that the temperature will be between 17 and 19 degrees already has a larger chance of success. In order to be 100% accurate the range will most often be useless. In the weather example, if someone says that the temperature tomorrow will be between -100 and 100 degrees he is most likely 100% accurate but also 100% useless. So by making the interval smaller we lose some confidence. Luckily, by giving up only a small amount of confidence the interval often already shrinks dramatically. One of the most often used so-called confidence levels is 95%. We will skip the manual calculation and go straight to letting SPSS do this for us.
Start: Video 6 Confidence interval.avi

- Open the file Example1
- From the menu bar select Analyze -> Regression -> Curve Estimation
- Make Turnover the dependent variable;
- Select Time as the independent variable and;
- Make sure only the linear model is selected
- Click on Save
- Check the option Predicted values and;
- At Predict cases select Predict through
- We will use 2008 again. So at Observation type in 14
- Check the option Prediction interval
- Notice that SPSS allows you to change the confidence level to either 90%, 95% or 99%. By default 95% is selected, and let's leave it at that.
- Click on Continue
- Click on OK
- Save the data file as Example3
End: Video 6 Confidence interval.avi

In the data file we now again have the FIT_1, but also two new columns: LCL_1 and UCL_1. LCL stands for Lower Confidence Limit and is the lower value of the confidence interval, and as you might expect UCL stands for Upper Confidence Limit and is the upper value of the confidence interval. It would be nice to see in a diagram how the four columns relate to each other.
Start: Video 7 Confidence diagram.avi

- Open the file Example3
- From the menu bar select
  o SPSS 13 or earlier: Graphs -> Sequence
  o SPSS 15 and 16: Analyze -> Time series -> Sequence charts
  o SPSS 17: Analyze -> Forecasting -> Sequence charts
- Move all the available columns to Variables
- Click on OK
- Save the output file as OutputExample3


End: Video 7 Confidence diagram.avi

The diagram below shows the result:

[Figure: sequence chart (sequence number 1 to 14) of Turnover, the Fit for Turnover, and the 95% LCL and 95% UCL for Turnover, all from CURVEFIT, MOD_8 LINEAR]

Notice that the real values (the most fluctuating line) fall within the LCL and UCL. Also notice that the difference between the LCL and UCL becomes larger when going further away from the average time period. This is in a way logical: the further you predict into the future, the greater the uncertainty, so the interval will be larger.

Assignment Exercise 5
Create a forecast including a 90% confidence interval for the next four quarters of your company. Save the data file as Assignment5 and the output file as Assignment5
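The widening of the interval away from the average time period can be made explicit with the standard textbook formula for a prediction interval in simple linear regression. This is a sketch under the assumption that the interval has this form; the reader does not state which formula SPSS uses, and the symbols $q$, $s$, $n$, $t_0$ and $\bar{t}$ are introduced here for illustration only.

```latex
\hat{Y}_0 \;\pm\; q \cdot s \cdot
\sqrt{\,1 + \frac{1}{n} + \frac{(t_0 - \bar{t})^2}{\sum_{i=1}^{n} (t_i - \bar{t})^2}\,}
```

Here $\hat{Y}_0$ is the predicted value at time $t_0$, $n$ the number of observations, $\bar{t}$ the average time period, $s$ the standard error of the estimate, and $q$ the critical value of Student's t distribution with $n - 2$ degrees of freedom for the chosen confidence level. The term $(t_0 - \bar{t})^2$ grows as $t_0$ moves away from $\bar{t}$, which is why the distance between LCL and UCL increases towards the future.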

8 Seasonal data
We will have to type the turnover into SPSS ourselves, but for the other columns we can (and have to) let SPSS do the work:
Start: Video 8 Defining dates.avi

- Create a new data file and enter the turnovers from Example set 3 (see section 3.1)
- From the menu bar select Data -> Define Dates
- In the pop-up window select Years, quarters
- At Year replace the 1900 with 2001 and leave the Quarter at 1.
- Click on OK
- Save the data file as Example4
End: Video 8 Defining dates.avi

Notice that SPSS created several new columns:


Now we can let SPSS calculate the moving averages for us.

8.1 Moving Averages


Once again SPSS can do all of the work for us.
Start: Video 9 Moving Averages.avi

- Open the data file Example4
- From the menu bar select Transform -> Create time series
- First select at Function the option Centered moving average
- Then at Span type in 4
- Click on Turnover and click the move button.
- SPSS automatically gives it the name Turnover_1, which is perhaps not so useful. So change it into MA
- Then click on the Change button
- Click on OK
- Save the data file as Example5


End: Video 9 Moving Averages.avi

If you've followed the instructions your screen should now look like:

Assignment Exercise 6
Determine the moving averages for your selected company. Save the data file as Assignment6

After we've determined the moving averages we can use those to determine the trendline. This goes exactly the same as we've done before.

8.2 Trendline for seasonal data


Start: Video 10 MA Trend and forecast.avi

- Open the data file Example5
- Select from the menu bar Analyze -> Regression -> Curve Estimation
- Select all the models and now use the MA instead of the turnover

From the output you can see that the Cubic model has the highest Rsq.

Now for the prediction:
- Select from the menu bar Analyze -> Regression -> Curve Estimation
- Make sure only the cubic model is selected
- Click on Save
- Mark Predicted values
- Mark Prediction intervals
- At Predict through, set it to 2007 quarter 1
- Click on Continue, then on OK
- Save the data file as Example6


End: Video 10 MA Trend and forecast.avi

8.3 SPSS Seasonal correction


Once again also SPSS can calculate the seasonal correction for you.
Start: Video 11 Seasonal Decomposition.avi

- Open the data file Example6
- The following procedure will only work if all values are available. Since we've predicted some values we will need to delete these first.
- Remove rows 17 till 24 (the ones that were predicted)
- Remove the columns FIT_1, LCL_1 and UCL_1
- From the menu bar select
  o SPSS 16 or earlier: Analyze -> Time series -> Seasonal decomposition
  o SPSS 17: Analyze -> Forecasting -> Seasonal decomposition
- Make Turnover the variable
- Select Additive as the model
- And since we have an even number of periods, use Endpoints weighted by .5
- Click on OK

The following screen is now available:

- Remove the columns ERR_1, SAS_1 and STC_1 (they are of no use for this course)
- Save the data file as Example7
End: Video 11 Seasonal Decomposition.avi

Notice that the new column SAF_1 contains the seasonal corrections we were after.

Question 7
Add the predicted values again from the cubic model, predicted through 2007 quarter 1, and use a 95% confidence interval. Save the data file as Example8

For our last steps SPSS unfortunately cannot calculate the result for us directly. But luckily we can still use SPSS somewhat as a calculator. We now have the seasonal corrections, the predicted moving averages, and their lower and upper limits with a 95% confidence interval. What we would like to do is show three other values:
1. The predicted value per quarter => predicted MA + seasonal correction
2. The predicted lower limit per quarter => predicted LCL + seasonal correction
3. The predicted upper limit per quarter => predicted UCL + seasonal correction

Notice that for each of these we will need the seasonal corrections, which are listed in the SAF_1 column. However, they are only shown till row 16 and not for the predicted values. This can easily be adjusted.
Start: Video 12 Seasonal Decomposition UCL LCL.avi

- Open the data file Example8
- Select in the SAF_1 column row 9 till row 16
- From the menu bar select Edit -> Copy
- Click on row 17 of the SAF_1 column and
- From the menu bar select Edit -> Paste

Now that we have the seasonal correction for all periods we can calculate the 3 values we want.
- From the menu bar select Transform -> Compute
- Click on FIT_2 and move it to the Numeric expression
- Click on the button +
- Click on SAF_1 and move it to the Numeric expression
- As target variable, type in Prediction
- Click on OK

That took care of the first value. Let's go for the second:
- From the menu bar select Transform -> Compute
- Remove everything at Numeric expression
- Click on LCL_2 and move it to the Numeric expression
- Click on the button +
- Click on SAF_1 and move it to the Numeric expression
- As target variable, type in Pred_Low
- Click on OK
- Save the data file as Example9
End: Video 12 Seasonal Decomposition UCL LCL.avi

That took care of the second one we wanted.

Question 8
Add the third one we wanted. Save the data file as Example9 (overwrite the old one).

After the exercise your final workbook should look like:


We could create an extra graphical output from this.


Start: Video 13 Seasonal Sequence Chart.avi

- Open the data file Example9
- From the menu bar select
  o SPSS 13 or earlier: Graphs -> Sequence
  o SPSS 15: Analyze -> Time Series -> Sequence charts
  o SPSS 17: Analyze -> Forecasting -> Sequence charts
- Move Turnover, Prediction, Pred_Low and Pred_High to Variables
- Move Date format to Time axis label
- Click on OK
- Save the output file as ExampleOutput4


End: Video 13 Seasonal Sequence Chart.avi

Our final image looks like:


Assignment Exercise 7 Create the final prediction and a 90% confidence interval for that prediction for the next four quarters (including now the seasonal corrections). Save the data file as AssignmentEx7 and the output file as AssignmentEx7


9 Multiple regression (optional)


As you might have noticed, we discussed simple linear and simple non-linear models. However, multiple regression is also possible. With multiple regression we have more than one independent variable. So for example the price per page of a magazine might depend on the number of readers and the income of the reading audience. Since we now have two independent variables and one dependent variable we could still show the points in a 3D diagram.
Start: Video 14 3D scatter.avi

- Open the data file AdvertisingCosts
- From the menu bar click on Graphs -> Scatter/Dot
- Click on the 3-D scatter icon
- Click on Define
- Use the Page_cost for the Y-axis;
- Use the Audience for the X-axis and;
- Use the Aud_Inc for the Z-axis
- Click on OK
End: Video 14 3D scatter.avi

Notice that it is now a lot more difficult to see a pattern emerging.

Let's call the price per page Y; we would like to predict Y by using the number of readers X1 and the income of the reading audience X2. The multiple regression equation then might look like:

$Y = b_0 + b_1 X_1 + b_2 X_2$

Again, by using the least-squares method it is possible to determine the best values for b0, b1 and b2, but we will leave it up to SPSS to do those calculations for us:
Start: Video 15 Multiple Regression.avi

- Open the data file AdvertisingCosts
- Click on the tab Variable view and notice that the projected audience is in thousands
- From the menu bar click on Analyze -> Regression -> Linear
- Move the Page_Cost to the Dependent variable;
- Move the Audience to the independent variable and;
- Move the Aud_Income also to the independent variable
- Click on Save
- Mark at Predicted values the option Unstandardized;
- Mark at Residuals the option Unstandardized and;
- Remove any other mark that might appear.
- Click on Continue
- Click on OK
- Save the data file as Example10


End: Video 15 Multiple Regression.avi


In the output we encounter a lot of different tables and some might have some complicated names in them, however we will only focus on the table entitled Coefficients:

In the column B the first row shows the value for the Constant. That is in our case the b0. Then follows the coefficient for the projected audience. Since we said that X1 will represent the audience, the value shown is therefore b1, and hence the last one must be the b2. So the multiple linear regression equation becomes:

$Y = 8173.423 + 3.767 X_1 + 0.718 X_2$

The coefficient of X1 means that if all else remains the same, one thousand people more in the audience will increase the price of the page by 3.767. So one person more will increase the price by 0.003767 (remember that the audience was in thousands). If you look in the data file you will probably notice that two new columns have been added:

- PRE_1: This column shows the prediction, in other words the predicted page cost according to the equation.
- RES_1: This shows the difference between the predicted cost and the real cost.
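What the PRE_1 and RES_1 columns contain can be sketched with the equation found above. The function names and the example magazine (an audience of 5000 thousand readers, a median income of 40000 and a real page cost of 60000) are made up for illustration, and the residual is assumed to be computed as real cost minus predicted cost.

```python
# Coefficients from the Coefficients table
B0, B1, B2 = 8173.423, 3.767, 0.718

def predicted_page_cost(audience_thousands, median_income):
    """Y = b0 + b1 * X1 + b2 * X2 (what the PRE_1 column holds)."""
    return B0 + B1 * audience_thousands + B2 * median_income

def residual(real_cost, audience_thousands, median_income):
    """Real cost minus predicted cost (assumed sign convention for RES_1)."""
    return real_cost - predicted_page_cost(audience_thousands, median_income)

# Hypothetical magazine, not taken from the AdvertisingCosts file
print(round(predicted_page_cost(5000, 40000), 3))
print(round(residual(60000, 5000, 40000), 3))

# One thousand extra readers raises the predicted price by b1
extra = predicted_page_cost(5001, 40000) - predicted_page_cost(5000, 40000)
print(round(extra, 3))
```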

As you can see from the residual list, the difference between the predicted and real value is often quite large. One of the reasons might be that there is a third element of influence on the price. Perhaps the gender of the audience. So let's add this to the equation:

$Y = b_0 + b_1 X_1 + b_2 X_2 + b_3 X_3$

With
Y = price per page (in color)
X1 = projected audience (in thousands)
X2 = household income (median)
X3 = percentage of male (from audience)

Three independent variables and one dependent variable means that we would have to draw a 4D diagram, which up till today is not possible. However, the mathematics doesn't really care about this and just uses the same principles. So without being able to visualize the situation we can still use the least-squares method to come up with the equation. With the help of SPSS we can find:
b0 = 4042.799
b1 = 3.788
b2 = 0.903
b3 = -123.634

Question 9
Open the data file AdvertisingCosts2 and create an SPSS output showing the values described above.

In the file AdvertisingCosts2 we now have two residual columns. One shows the difference between the real values and the predicted values according to the equation without the gender, and one does the same for the equation with the gender. In order to see if the latter does a better job we could compare the total differences. However, both of them will have a total of 0. So what we will do is add a new column that calculates the difference between the absolute values of RES_1 and RES_2.
Start: Video 16 Comparing results.avi

- Open the data file AdvertisingCosts2
- From the menu bar click on Transform -> Compute
- At Functions double click on Abs(numexpr)
- Select RES_1 and move it to the Numeric Expression
- Click just after the closing parenthesis
- Click on -
- Double click again on Abs(numexpr)
- Select RES_2 and move it to the Numeric Expression
- Type in Diff as Target Variable
- Click on OK
- Save the data file as Example11


End: Video 16 Comparing results.avi

What we have done is calculate the difference between the absolute values of the two residuals. If the absolute value of residual 1 is higher than that of residual 2, the result of Diff will be positive; if it's lower, then Diff will be negative. In mathematical notation we have:

$\text{Diff} = |\text{RES\_1}| - |\text{RES\_2}|$

If the second equation (the one with the gender in it) is better, then the sum of these differences must be positive.
Start: Video 17 Descriptive Sum.avi

- From the menu bar click on Analyze -> Descriptive statistics -> Descriptives
- Move Diff to the Variable(s)
- Make sure only Sum is checked
- Click on Continue
- Click on OK


End: Video 17 Descriptive Sum.avi


In the output you can see that the total sum is 6854.74. Since this sum is positive, adding the gender has increased the accuracy of the equation. Most likely there are many more variables that determine the price per page.
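The comparison can also be sketched outside SPSS. The snippet below computes Diff = |RES_1| - |RES_2| per case and sums it; the residual values are made up for illustration (each list sums to 0, as residuals do) and are not the values from AdvertisingCosts2.

```python
def abs_residual_diff(res_1, res_2):
    """Per-case Diff = |RES_1| - |RES_2|."""
    return [abs(r1) - abs(r2) for r1, r2 in zip(res_1, res_2)]


# Hypothetical residuals of the model without (res_1) and with (res_2) gender.
res_1 = [1200.0, -800.0, 300.0, -700.0]
res_2 = [900.0, -850.0, 100.0, -150.0]

diff = abs_residual_diff(res_1, res_2)
total = sum(diff)

# A positive total means the second model has the smaller absolute residuals.
print(diff, total)
```

Summing the raw residuals would give 0 for both models, which is why the absolute values are compared instead.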

10 Some final comments


In chapter 3 we saw the use of moving averages to smooth out the seasonal factor in a time series. Moving averages are also often used in stock forecasting. A basic technique is to use a short-period moving average and a long-term moving average⁴. If the two cross, it's often a good moment to either buy or sell. In general, long MAs are MAs of longer than 40 weeks, medium ones are between 4 and 13 weeks, and short ones are less than 20 days. Some forecasters also use three moving averages. For example, Dr. Melvin Pasternak suggests using a 4, 9 and 18 day moving average. If the 4 crosses above the 9 and the 9 above the 18, consider buying. If the opposite happens, consider selling⁵. There are also many different types of moving averages to smooth the data, such as the exponential moving average, the sine moving average etc. For the interested reader, StockCharts.com has a nice short introduction on how to use several moving averages in stock trading (http://www.stockcharts.com/education/IndicatorAnalysis/indic_movingAvg.html).
In the previous chapter we actually deviated from the time series. Linear regression can also be used to test if two variables relate to each other. For this purpose there is also a range of other techniques that can be used and tests to consider (for example the Chi-square).
Forecasting is and always remains to some extent a guess. What we try to do is to make that guess as plausible as possible. It is wise not to rely only on quantitative forecasting but to combine it with qualitative forecasting as well. Also, as mentioned in the introduction, we have only begun to scratch the surface of forecasting. There are a lot of additional factors to take into consideration (and test) to make the forecast more accurate.
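The three-average rule mentioned above can be sketched as follows. This is a simplified illustration, not a trading system: it checks on each day whether the 4, 9 and 18 day averages are stacked in order, rather than detecting the exact moment of crossing. The function names and the price series are hypothetical.

```python
def moving_average(values, window):
    """Simple moving average; None until `window` observations are available."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(values[i + 1 - window:i + 1]) / window)
    return out


def triple_signal(prices, short=4, medium=9, long=18):
    """Label each day by the ordering of the three moving averages."""
    ma_s = moving_average(prices, short)
    ma_m = moving_average(prices, medium)
    ma_l = moving_average(prices, long)
    signals = []
    for s, m, l_avg in zip(ma_s, ma_m, ma_l):
        if s is None or m is None or l_avg is None:
            signals.append(None)                 # not enough history yet
        elif s > m > l_avg:
            signals.append("consider buying")    # short above medium above long
        elif s < m < l_avg:
            signals.append("consider selling")   # the opposite ordering
        else:
            signals.append("no signal")
    return signals


# Hypothetical steadily rising prices over 30 days.
rising = list(range(1, 31))
print(triple_signal(rising)[-1])
```

With fewer than 18 observations the longest average does not exist yet, so no signal is given for those days.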

4 Source: Moving averages, http://www.stockcharts.com/education/TradingStrategies/AHillMAcrossover.html (04-06-2006)
5 Source: http://www.streetauthority.com/terms/simpleandexponentialmovingaverages.asp (04-06-2006)

Exercises
Exercise 1
a. The data file Chapter3Exercise1 contains the revenues of a company. Use SPSS to determine the simple linear regression equation.
b. What do you notice in the scatter diagram? What might cause such an error?
c. Change the value of 1400 into 140 and determine the regression equation again.

Exercise 2
Below are three scatter diagrams:
[Figure: three scatter diagrams, titled Set 1, Set 2 and Set 3]
a. Which of the three sets will have a linear relation?
b. Which of the three sets will have a non-linear relation?

Exercise 3
The data file Chapter3Exercise3 contains the average spending on social activities of a group of people.
a. Create a scatter diagram to see if there might be a linear relation between time and spending.
b. If there is a linear relation, find the best simple linear regression equation.

Exercise 4
In exercise 3 we saw a scatter diagram that looks like:
[Figure: scatter diagram, titled Set 2]
a. What kind of model do you think will best describe this data?

Exercise 5
In the data file Chapter4Ex2 you will find five sets of data.
a. Determine for each set the best model by looking at the diagram.
b. Write down the regression equation of the model selected at a.

Exercise 6
At exercise 5 you have chosen a model based on the scatter diagram. Open the data file Chapter4Ex2 again, and now check if your visual choice also has the highest Rsq for each set.

Exercise 7
In the output of the curve estimation we find the determination coefficient, but none of the other measurements discussed. Can you figure out how to show the covariance, correlation and the standard deviations with SPSS?


Exercise 8
Once again we return to the data file Chapter4Ex2.
a. Use a 95% confidence interval to predict the next 5 values for each set, including the UCL and LCL.
b. Why is the difference between UCL and LCL larger at the last period than at the previous ones?
c. For the first set also use a 99% confidence interval.
d. What do you notice about the UCL and LCL at 95% and those at 99%?

Exercise 9
The firm Jamin has the following turnover data of smartys available for The Netherlands (in million euros):
Year      1989  1990  1991  1992  1993  1994  1995  1996  1997  1998  1999
Turnover    29    28    31    34    37    38    38    37    40    44    46

a) Create an SPSS data file from this data. Use the variable Turnover, and use define dates to add the years.
b) Create a sequence chart of the data.
c) Determine the linear trend and give its regression equation.
d) Create a sequence chart of the data including the trend.
e) Determine which model will best represent the data.

Exercise 10
Company Candy has the following data for the Belgian market from 1996 till 2005 on the quantity sold of the product Choco (x 1000):
Year      1996  1997  1998  1999  2000  2001  2002  2003  2004  2005
Quantity   570   580   610   600   615   710   740   800   950  1050

a) Create an SPSS data file from this data. Use the variable Turnover, and use define dates to add the years.
b) Create a sequence chart of the data.
c) Use Curve Estimation to determine which trend will be best.
d) Create a forecast for the quantity sold in 2006, and also include a graphical representation.

Exercise 11
Another candy company has kept a record of its turnover in millions of euros per quarter since 2002, see below:
Year  Quarter 1  Quarter 2  Quarter 3  Quarter 4
2002         24         40         54         28
2003         32         52         70         36
2004         42         66         86         50
2005         54         82        104         62

a) Create an SPSS data file from this data. Use the variable Turnover, and use define dates to add the years and quarters.
b) Determine the moving averages.
c) Create a sequence chart showing the trend based on the moving averages.
d) The management suspects that there are seasonal changes. Investigate this statement and determine if this is true.
e) Add the seasonal corrections.
f) Create a forecast till the first two periods of 2007 including the seasonal corrections.
g) Add a 95% confidence interval.
h) Create a graph showing the prediction.


Exercise 12
Anox is a supplier of various types of candy. From 1996 it recorded the following quarterly revenues:
Year      I      II     III     IV
1996  232.4  154.0  143.6  210.0
1997  249.2  170.4  151.4  225.4
1998  260.4  186.8  161.0  233.4
1999  268.2  192.8  180.8  246.4
2000  283.8  206.0  191.0  261.0

a) Create an SPSS data file from this data. Use the variable Turnover, and use define dates to add the years and quarters.
b) Create a sequence chart of the original data.
c) Determine if there is a seasonal pattern based on the chart.
d) Use SPSS to calculate the moving averages.
e) Use Curve Estimation to create a regression equation from the linear and quadratic model.
f) Give the regression equation for both models. Which of the two will be best?
g) Determine the seasonal corrections.
h) Give a forecast for the expected trend (moving average) for the periods 2001-I till 2004-IV.
i) Add a 95% confidence interval.
j) Correct the results of h and i based on the seasonal correction.
k) Create a chart to illustrate the forecast.

Exercise 13
The data file Employee data contains information about the employees of a company.
a) Use SPSS to create a multiple linear regression equation that can predict someone's current salary based on his/her beginning salary, educational level, job category, number of months he/she has been working, and number of months of previous experience.
b) Use the formula found in a to predict the salary of someone who is a manager, has been working for the company for 5 and a half years, has 3 years of previous experience, had a beginning salary of $30,000 and has an educational level of 15.

Exercise 14
The data file World95 contains many different types of information from various countries. Can you create any trustworthy regression model from this data?


Appendix 1 Additional information on R-square and F-ratio


As shown in paragraph 2.4 the determination coefficient is the ratio between the explained variance and the total variance. In this case variance does not really refer to $s^2$ but more to the general English word for variability. To measure variability, statistical formulas often use the squared deviations instead of the absolute deviations. The main reasons for this are:
- Mean deviations can produce similar results even when variability is different.
- Working with squares is easier in mathematics than with absolute values.
The R-square, F-ratio and significance can all be expressed as a proportion of different types of variability. There are three types of variability of interest:

Type        SS formula                           d.f.                             Mean Square
Original    $SS_{ori} = \sum (y - \bar{y})^2$    $df_{ori} = n - 1$               $MS_{ori} = SS_{ori} / df_{ori}$
Regression  $SS_{reg} = \sum (Y - \bar{y})^2$    $df_{reg} = 1$                   $MS_{reg} = SS_{reg} / df_{reg}$
Residual    $SS_{res} = \sum (y - Y)^2$          $df_{res} = df_{ori} - df_{reg}$ $MS_{res} = SS_{res} / df_{res}$

In the table above:
- SS stands for Sum of Squares;
- d.f. for degrees of freedom;
- MS for Mean Square;
- y = original value;
- Y = value according to the regression equation.
Often the types are also labelled Original => Total, Regression => Explained and Residual => Error (or unexplained).

[Figure: visual representation of the difference between Original and Regression.] The vertical lines represent the difference. The values will be squared and the total will give the SSres:
$SS_{res} = \sum (y - Y)^2$

[Figure: visual representation of the difference between Original and mean.] The vertical lines represent the difference. The values will be squared and the total will give the SSori.

[Figure: visual representation of the difference between Regression and mean.] The vertical lines represent the difference. The values will be squared and the total will give the SSreg.

Relation: $SS_{ori} = SS_{reg} + SS_{res}$

The determination coefficient can therefore also be calculated as:
$R^2 = \frac{SS_{reg}}{SS_{ori}} = \frac{\sum (Y - \bar{y})^2}{\sum (y - \bar{y})^2}$
Or even as:
$R^2 = \frac{SS_{ori} - SS_{res}}{SS_{ori}}$

The F-ratio is calculated as:
$F = \frac{MS_{reg}}{MS_{res}}$
An alternative formula using the R2 can also be used:
$F = (n - 2) \frac{R^2}{1 - R^2}$
The F-ratio tells us how much more of the variation in Y is explained by X than is due to random, unexplained variation.
In order to see if this ratio is significant an F-test can be done. This is however outside the scope of this reader.
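As an illustration, the sums of squares, R2 and F-ratio can be computed step by step. The sketch below uses the turnover series from Exercise 9 with t = 1 to 11; the variable names are chosen here for illustration, and the least-squares fit is done with plain sums rather than with SPSS.

```python
# Turnover series from Exercise 9, with t = 1..11.
t = list(range(1, 12))
y = [29, 28, 31, 34, 37, 38, 38, 37, 40, 44, 46]
n = len(y)

t_mean = sum(t) / n
y_mean = sum(y) / n

# Least-squares coefficients of Y = b0 + b1 * t.
b1 = (sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
      / sum((ti - t_mean) ** 2 for ti in t))
b0 = y_mean - b1 * t_mean
Y = [b0 + b1 * ti for ti in t]  # values according to the regression equation

# The three types of variability.
ss_ori = sum((yi - y_mean) ** 2 for yi in y)
ss_reg = sum((Yi - y_mean) ** 2 for Yi in Y)
ss_res = sum((yi - Yi) ** 2 for yi, Yi in zip(y, Y))

df_ori = n - 1
df_reg = 1
df_res = df_ori - df_reg

r2 = ss_reg / ss_ori
f_ratio = (ss_reg / df_reg) / (ss_res / df_res)
f_alt = (n - 2) * r2 / (1 - r2)  # alternative formula using R2

print(round(r2, 3), round(f_ratio, 2), round(f_alt, 2))
```

Both ways of calculating F give the same value, and SSori equals SSreg + SSres up to rounding.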

2 Using a calculator
In this appendix some possible calculations that can be done on a calculator are explained. The data example that will be used is:
t          1   2   3   4   5   6   7   8   9  10  11
Turnover  29  28  31  34  37  38  38  37  40  44  46

2.1 Casio fx-82 ES / 83ES / 85 ES / 300 ES / 350 ES


Regression equations: The Casio fx-82 ES has the following regression models available. Please note that the column Show as gives the notation that Casio uses for the model:
Model        Equation                          Show as   R2
Linear       $y = b_0 + b_1 t$                 A+BX      Yes
Quadratic    $y = b_0 + b_1 t + b_2 t^2$       _+CX2     No
Logarithmic  $y = b_0 + b_1 \ln(t)$            ln X      Yes
Exponential  $y = b_0 e^{b_1 t}$               e^X       Yes
Compound     $y = b_0 \cdot b_1^{t}$           AB^X      Yes
Power        $y = b_0 t^{b_1}$                 AX^B      Yes
Inverse      $y = b_0 + \frac{b_1}{t}$         1/X       Yes

Picture taken from http://www.casioeurope.com/euro/sc/technical/fx82es/

Setting up the calculator
- Activate the STAT menu: press MODE 2
- Select the type of model you would like to use (see table above). For this example we will use the linear model, so press 2

Entering the data
- You now have a screen with two columns, X and Y. Press 1 = 2 = 3 = etc. to fill in the X values.
- Then move to the first row of the Y column.
- Press 2 9 = 2 8 = 3 1 = etc. until the last one (46).

Working with the data
Finding the R2
Notice that Casio uses A, B, etc. to indicate b0, b1 etc.
- Press SHIFT 1 (this will activate the STAT menu).
- Press 7 (this will activate the regression menu).
- Press 3 (this will give the correlation coefficient).
- Press x2 (to get the square of the correlation, resulting in the determination coefficient).
- Press =
Notice that the result is shown in the data table itself.

Finding the b's
Notice that Casio uses A, B, etc. to indicate b0, b1 etc. First we have to delete the R2:
- Press UP DEL (the R2 is now deleted).
- Press SHIFT 1 7 (this activates the regression menu).
- Now to find b0 = A press 1 =
Notice that again the value is placed in the table. You can delete it, and follow the steps again, now choosing option 2 to get b1.

Changing model
At any time you can press SHIFT 1 1 to change the type of model.

Forecasting
Use the previous steps to find A and B for the linear model. If all went well, you should have found A = 26.56 and B = 1.66.
- Press MODE 3 to enter the TABLE mode.
- Press 2 6 . 5 6 + 1 . 6 6 x ALPHA ) =
- Now you will be asked what the first value of x should be. By default it is set to 1, which is okay, so press =
- Now for the end value: since we would like to forecast we will use 15, so press 1 5 =
- As a last option you can define the step. By default it is set to 1, which will result in only using integers for x. This is okay for this example, so press =
A table is now generated showing all the values according to the regression equation we specified.
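The table produced in TABLE mode can be reproduced with a few lines of code. The sketch below is only an illustration of what the calculator does with the rounded values A = 26.56 and B = 1.66 from the text; the function name and its parameters (start, end, step) are chosen here and mirror the three questions the calculator asks.

```python
def forecast_table(a, b, start, end, step=1):
    """Rows (x, a + b*x) for x = start, start + step, ... up to end."""
    rows = []
    x = start
    while x <= end:
        rows.append((x, round(a + b * x, 2)))
        x += step
    return rows


# A and B as found on the calculator for the linear model.
table = forecast_table(26.56, 1.66, 1, 15)
for x, y_hat in table:
    print(x, y_hat)
```

The rows for x = 12 to 15 are the actual forecasts, since the data itself only runs up to t = 11.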

Remarks
Unfortunately the values in the STAT mode cannot be saved and need to be written down (or remembered). Also, once you go back to computational (normal) mode the data will be lost. No confidence intervals can be computed. For further explanations and details please read the manual, available at http://ftp.casio.co.jp/pub/world_manual/edu/en/fx-82ES_83ES.etc_Eng.pdf (pg E33 - E43)

2.2 TI-83 / PLUS / PLUS Silver edition


Regression equations: The TI has the following regression models available. Please note that the column Show as gives the notation that TI uses for the model:
Model        Equation                       Show as       R2
Linear       $y = b_0 + b_1 t$              LinReg(ax+b)  Yes
Quadratic    $y = b_0 + b_1 t + b_2 t^2$    QuadReg       No
Logarithmic  $y = b_0 + b_1 \ln(t)$         LnReg         Yes
Exponential  $y = b_0 e^{b_1 t}$            ExpReg        Yes
Power        $y = b_0 t^{b_1}$              PwrReg        Yes

Picture taken from http://education.ti.com/educationportal/sites/US/productDetail/us_ti83p.html

Setting up the calculator
Clearing the lists
- Press STAT 5 ENTER
This will set up the first 6 lists.

Entering the data
- Press STAT 1
Three columns are shown, named L1, L2 and L3. The first row of column L1 is highlighted.
- Press 1 ENTER 2 ENTER 3 ENTER etc. to fill in the values.
- Use the arrow keys to go to the first row of the column L2.
- Press 2 9 ENTER 2 8 ENTER etc.
The values have now been entered.

Working with the data
Setting up the calculator
- Press 2nd 0 to activate the Catalog menu.
- Use the arrow keys to go to DiagnosticOn and press ENTER ENTER

Using a model
- Press STAT
- Use the arrow key to select the CALC menu.
- Press the value of the model you would like to use. For this example we will use the linear model, so press 8
- Press 2nd 1 to select L1, then press , 2nd 2 to select L2
- Now press ENTER
You will see the general equation followed by the variables (b0, b1 etc.), the correlation coefficient and the determination coefficient.

Storing the information
You can save the result as an equation. Repeat the steps from Using a model with one modification:
- When entering the L1 and L2 also add , VARS, use the arrow key to select the Y-VARS menu, and press 1 1
This will store the equation as Y1.
- Press ENTER
- Press Y=
Notice that the regression equation is now stored as a function.

Remarks
The TI-83 has some models that SPSS does not have. These are the Quartic and Sine models. You can also produce graphs in various ways and use them to show any forecast. Further details on how to show the graph and other options are available in the manual (http://education.ti.com/guidebooks/graphing/83p/83m$book-eng.pdf); mainly chapter 12 is of interest.

3 Bibliography
Anderson, D.R., Sweeney, D.J. and Williams, T.A. (1996) Statistics for business and economics (6th ed.) St Paul, West Publishing
Barrow, B. (2006) Statistics for Economics, Accounting and Business Studies (4th ed.) Essex, Pearson Prentice Hall
Berenson, M.L., Levine, D.M. and Krehbiel, T.C. (2006) Basic Business Statistics (10th ed.) New Jersey, Pearson Prentice Hall
Brinkman, J. (2002) Cijfers spreken voor economie, Groningen, Wolters Noordhoff
Buijs, A. (1999) Statistiek om mee te werken (6th ed.) Houten, EPN
Burns, A.C. and Bush, R.F. Basic Marketing Research, New Jersey, Pearson Prentice Hall
Burns, A.C. and Bush, R.F. (2006) Principes van marktonderzoek (4th ed.), original title: Marketing research, update with SPSS 12.0, edited by Swart, F. de, Iterson, A. van and Maks, H.; Amsterdam, Pearson Prentice Hall
Casio (2006) Manual for fx-82ES (online), available at URL: http://ftp.casio.co.jp/pub/world_manual/edu/en/fx-82ES_83ES.etc_Eng.pdf
De Veaux, R.D., Velleman, P.F. and Bock, D.E. (2006) Intro Stats (2nd ed.) Pearson Addison Wesley
Easton, V.J. and McColl, J.H. (2006) Statistics Glossary - Time series data (online), available at URL: http://www.stats.gla.ac.uk/steps/glossary/time_series.html
Gibbs, P. updated by Hiroshi S. (1996, 1997) What is Occam's Razor? (online), available at URL: http://math.ucr.edu/home/baez/physics/General/occam.html
Graziadio Business Report, The (2000) Forecast (online), available at URL: http://gbr.pepperdine.edu/001/forecast.html
Hanke, J.E. and Wichern, D.W. (2005) Business Forecasting (8th ed.) New Jersey, Pearson Prentice Hall
Hill, A. (2006) StockCharts - Arthur Hill on Moving Averages (online), available at URL: http://www.stockcharts.com/education/TradingStrategies/AHillMAcrossover.html
Hopkins, W.G. (2001) New View of Statistics: Models (online), available at URL: http://www.sportsci.org/resource/stats/modelsdetail.html
Markitquest (2006) Sales Forecasting and Demand Planning: Basic principles (online), available at URL: http://www.markitquest.com/controller-series/sales-controller/sales-forecasting/getting-it-right.htm
McClave, J.T., Benson, P.G. and Sincich, T. (2005) Statistics for Business and Economics (9th ed.) New Jersey, Pearson Prentice Hall
Miller, J. (2006) History of Mathematics (online), available at URL: http://members.aol.com/jeff570
Pasternak, M. (2006) Street Authority - Simple Vs. Exponential Moving Averages (online), available at URL: http://www.streetauthority.com/terms/simpleandexponentialmovingaverages.asp
Paulos, J.A. Innumeracy, pg 118, 119
Ross, S.M. (2005) Introductory Statistics (2nd ed.) Elsevier Academic Press
Saunders, M., Lewis, P. and Thornhill, A. (2004) Methoden en technieken van onderzoek (3rd ed.), original title: Research methods for business students, Amsterdam, Pearson Prentice Hall
Siegel, A.F. (2003) Practical Business Statistics (5th ed.) New York, McGraw-Hill Irwin
Sparling, D. (2004) Forecasting (online), available at URL: http://www.uoguelph.ca/~dsparlin/forecast.htm
StockCharts (2006) Moving Averages (online), available at URL: http://www.stockcharts.com/education/IndicatorAnalysis/indic_movingAvg.html
Tamhane, A.C. and Dunlop, D.D. (2000) Statistics and Data Analysis, New Jersey, Prentice Hall
Texas Instruments (2006) Manual for TI83 Plus (online), available at URL: http://education.ti.com/guidebooks/graphing/83p/83m$book-eng.pdf
UCLA (2006) Annotated SPSS output: Regression (online), available at URL: http://www.ats.ucla.edu/STAT/SPSS/output/reg_spss.htm
Wikipedia (2006) Occam's Razor (online), available at URL: http://en.wikipedia.org/wiki/Occam's_Razor
Wolfram Mathworld (2006) The web's most extensive mathematics resource (online), available at URL: http://mathworld.wolfram.com/

Pictures were taken from:
- Evaluating Hypotheses; http://ranger.uta.edu/~cook/dm/lectures/l6/
- PennState; Department of Statistics; http://www.stat.psu.edu/~resources/Cartoons/cartoon010.gif
- MacTutor; The MacTutor History of mathematics archive; http://www-history.mcs.st-andrews.ac.uk/history/ , more direct URLs:
  o http://www-history.mcs.st-andrews.ac.uk/history/PictDisplay/Galton.html
  o http://www-history.mcs.st-andrews.ac.uk/history/PictDisplay/Fisher.html
  o http://www-history.mcs.st-andrews.ac.uk/history/PictDisplay/Pearson.html

4 Essential Formulas
Description: Arithmetic mean
Formula: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
Note: $x_i$ is the data element number i; n is the number of items.

Description: Variance
Formula: $s_x^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
Note: $x_i$ is the data element number i; n is the number of items.

Description: Standard deviation
Formula: $s_x = \sqrt{s_x^2}$

Description: Covariance
Formula: $s_{yY} = \frac{\sum_{i=1}^{n} (y_i - \bar{y})(Y_i - \bar{Y})}{n - 1}$
Note: n is the number of items. y represents the actual turnover and Y the values according to the model.

Description: Correlation
Formula: $r = \frac{s_{yY}}{s_y \, s_Y}$
Note: $s_y$ is the standard deviation of the actual turnover; $s_Y$ is the standard deviation of the model values.

Description: Determination coefficient
Formula: $r^2 = \left( \frac{s_{yY}}{s_y \, s_Y} \right)^2$
Note: $s_y$ is the standard deviation of the actual turnover; $s_Y$ is the standard deviation of the model values.

Simple linear regression
A simple linear regression has an equation of
$Y = b_0 + b_1 t$
Using the least-squares method one can find:
$b_1 = \frac{s_{ty}}{s_t^2}$
(with $s_{ty} = \frac{\sum_{i=1}^{n} (t_i - \bar{t})(y_i - \bar{y})}{n - 1}$ and $s_t^2$ being the variance of the t values) and
$b_0 = \bar{y} - b_1 \bar{t}$
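The formulas for the covariance, correlation and determination coefficient can be tried out with a short sketch. It uses the turnover example from the calculator appendix (t = 1 to 11) and takes the model values from Y = 26.56 + 1.66t, the rounded equation found there; the helper function names are chosen here for illustration.

```python
import math


def mean(xs):
    return sum(xs) / len(xs)


def variance(xs):
    """Sample variance with n - 1 in the denominator."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def covariance(xs, ys):
    """Sample covariance with n - 1 in the denominator."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


t = list(range(1, 12))
y = [29, 28, 31, 34, 37, 38, 38, 37, 40, 44, 46]  # actual turnover
Y = [26.56 + 1.66 * ti for ti in t]               # values according to the model

s_y = math.sqrt(variance(y))
s_Y = math.sqrt(variance(Y))
r = covariance(y, Y) / (s_y * s_Y)
r2 = r ** 2

# Slope and intercept from the covariance and variance formulas.
b1 = covariance(t, y) / variance(t)
b0 = mean(y) - b1 * mean(t)

print(round(r, 3), round(r2, 3), round(b0, 2), round(b1, 2))
```

The slope and intercept computed this way round to the same A and B that the calculator shows.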
