

Project Report Submitted in Partial Fulfilment of the requirements for the Award of the Degree

Master of Business Administration (Finance)


C.R. Narendra Kumar

(Regd. No. 10501)

Department of Management Studies


(Deemed to be University)
Prasanthi Nilayam 515134 Anantapur District, Andhra Pradesh, India


Dr. Subramanian S (Project Guide)

Prof. U. S. Rao (Project Guide)

List of Figures
Figure 3.1: Mind mapping of the project topic: Stock return distribution modeling
Figure 5.1: Relationship between VIX, Mean, Skewness and Kurtosis
Figure 5.2: Relationship between Mean, Standard deviation, Skewness and Kurtosis
Figure 5.3: Density of a standard Normal distribution with mean = 0 and standard deviation = 1
Figure 5.4: Distribution for the Index returns over the period 2000-2001
Figure 5.5: Distribution for the Index returns over the period 2001-2002
Figure 5.6: Distribution for the Index returns over the period 2002-2003
Figure 5.7: Distribution for the Index returns over the period 2003-2004
Figure 5.8: Distribution for the Index returns over the period 2004-2005
Figure 5.9: Distribution for the Index returns over the period 2005-2006
Figure 5.10: Distribution for the Index returns over the period 2006-2007
Figure 5.11: Distribution for the Index returns over the period 2007-2008
Figure 5.12: Distribution for the Index returns over the period 2008-2009
Figure 5.13: Distribution for the Index returns over the period 2009-2010

Figure 6.1: Histogram of Stock returns considered for the period of Study
Figure 6.2: A Normal Curve of the Stock Index returns for the period
Figure 6.3: Normals generated from EM algorithm
Figure 6.4: Mixture of two normals
Figure 6.5: Comparison of Mixture of normals and Normal distribution
Figure 6.6: Modeling Changes in the regime over the period using Stock Index prices

List of Tables and Equations

Table 5.1: Basic statistics summary of the data from April 1st 2000 to March 31st 2010 displayed year-wise

Equation 3.1: Skewness
Equation 3.2: Kurtosis
Equation 3.3: Excess Kurtosis

Equation 5.1: Normal distribution density function
Equation 5.2: Probability density function of mixture of two normals

Equation 6.1: Probability density function of Normal distribution
Equation 6.2: E-step function
Equation 6.3: M-step for mean value
Equation 6.4: M-step for standard deviation value
Equation 6.5: M-step for weights or Proportions

The movement of the stock market is one of the most uncertain phenomena of the present-day world. Many have tried to understand its nature and predict its movement, and a number of models have been developed that predict it with some degree of accuracy. This project focuses on understanding the behavior of stock returns by examining the mixture of normals distribution as a fit to stock Index returns. The study begins by analyzing the nature and behavior of stock Index returns. Understanding them is crucial because they directly affect the returns earned by investors: if the commonly used assumptions do not hold, relying on them in investment decisions may result in underestimation of risk, mispricing of assets and sub-optimal portfolios. The study then develops a model and tests its efficiency in describing stock Index returns. We apply the mixture of normals as the next probable fit, using the Expectation-Maximization (EM) algorithm and maximum likelihood estimation (MLE) to obtain the optimum parameters for the given data. We observe that the mixture of normals provides better descriptive validity than the Normal distribution. We also examine regime changes in stock prices with the help of a Markov switching model to understand the sensitivity of stock Index returns to changes in the economy. Towards the end, the study checks the dependence between VIX and the mean, skewness and kurtosis of the data considered, which can help in better understanding stock returns.


1. INTRODUCTION
1.1 Introduction:
In the present-day world, investing in the stock market has become such a common activity that almost every citizen is affected by its changes. The market itself is affected by a number of factors in the economy. People have started earning more money and so have started investing in stocks too. They want to gain optimum returns from their portfolios while reducing risk to a minimum. In the light of these circumstances, understanding the behavior of the stock market is gaining importance, and so is identifying the exact specification of the distribution of stock returns. This is because the distribution has very significant implications for mean-variance portfolio theory, for the pricing of contingent claims and for the capital asset pricing models that investors use for portfolio selection. Many statisticians and financial economists have therefore been concerned with the description of stock returns. For a very long time, many financial econometric models assumed that this distribution is normal and that all returns are independent and identically distributed. This assumption was first challenged by Benoit Mandelbrot (Mandelbrot 1963), and his view was supported by Fama E. F. (1963). Later on, many researchers came up with varied types of distributions to define stock returns. Famous and path-breaking papers among them are by Officer (1972), Blattberg and Gonedes (1974), Hagerman (1978) and Castanias (1979). In this study, we deal with the understanding of stock returns and the nature of their behavior in order to model them. We undertake a broad study of stock return distributions and the various characteristics they show at various points in time. We also study the mixed normal distribution as a good fit for stock returns.

1.2 Scope of the Study:

The study examines the ability of the mixed normal distribution to model stock returns. It is based on secondary sources of data alone and covers the period April 1st 2000 to March 31st 2010. The data was collected from the Yahoo Finance website.

1.3 Purpose of the Study:

There have been many studies that explored the distributional properties of stock returns. However, there appears to be no unique distribution that can model stock returns accurately at all times. A few significant milestone papers, starting with Bachelier in 1900, have shaped the views of researchers about returns. This study intends to explore the behavior of stock returns. It also seeks to investigate and understand the various characteristics of stock returns that influence the choice of distribution for modeling them. The study also explores the fit of the mixture of normal distributions for defining stock returns.

1.4 Scheme of Chapterization:

The remaining portion of the report is organized into six chapters.

Chapter 2 deals with the various studies and papers that have been published in this area of research. It then considers the latest studies and updates that have been published in this area.

Chapter 3 deals with the concepts and definitions relating to stock return distributions that have been documented and are generally used in this field. It then reviews the various types of distributions that have evolved to fit returns, based on the observed characteristics of the returns.

Chapter 4 describes the methodology of the data collection and the design of the study. It details the software used and also explains the limitations of the study.

Chapter 5 presents the study of the index return distributions of the S&P 500 during the 10-year period. It establishes the presence of certain characteristics of returns in the stock market.

Chapter 6 presents the application of the mixed normal distribution model to the S&P 500 and the findings of the study. It also tests the efficiency of the model in describing stock returns, and ends with an attempt to draw out a best-fit model for stock returns by comparing it with the Normal distribution.

Chapter 7 summarizes the results of the study, presents the conclusions and suggests scope for future work in this field.




Literature Review

2.1 Introduction:
In this section of the study, we review the existing research on stock return distributions. The works of the researchers can be classified into two types: either the proposal of a new model for defining stock returns, or a comparison of two or more proposed models to identify which is superior in defining stock returns. In the earlier part of the research, many new models were proposed. In the later part, i.e. in the last few years, the work consists mainly of comparisons, and the newly proposed models are only modifications of the earlier ones.

2.2 Literature Review:

The Variation of Certain Speculative Prices: By Benoit Mandelbrot (1963)

This breakthrough paper stated conclusions that changed the way researchers continued their work. The author challenged the path-breaking proposition of Louis Bachelier that stock returns follow a normal distribution. His empirical results showed that stock returns display fat tails and high peaks, which he referred to as outliers. He therefore introduced a new family of distributions for defining stock returns, the stable Paretian family. This family, introduced in Paul Lévy's classic Calcul des probabilités (1925), was proposed because it can accommodate all the observed characteristics of stock returns. In his analysis he found that the proposed model is reasonable in defining the returns. It may not be able to capture sudden changes in the movements, but the author states that all such sudden changes can be observed by a close look at the earlier movements, which are accounted for by the characteristic exponent of the stable Paretian distribution.

Mandelbrot and the Stable Paretian Hypothesis: By Eugene F. Fama (1963)

When Benoit Mandelbrot came up with a new type of distribution to define stock returns, it completely changed researchers' view of returns. Fama conducted this study to check the validity of Mandelbrot's proposed distribution. The stable Paretian distribution assumes that variances are infinite, which has tremendous implications. All the statistical tools in use were based on the assumption that the variance is finite, and if the variance is in fact infinite, those tools may give distorted results. In this paper, the author considered thirty stocks from the Dow Jones Industrial Average and tested the daily first differences of log prices. He found that in all cases the data showed distributions with long tails, and that the data was more consistent with the stable Paretian hypothesis than with the Gaussian hypothesis. The author therefore states that the stable Paretian is a superior model to the normal. He also asserts that the basis of testing should be enlarged to improve the model and to accommodate other types of speculative series. He further states that estimating the parameters from samples may not give the expected distribution, as the behavior of the samples is not known at that point in time.

The Behavior of Stock market Prices: By Eugene F. Fama (1965)

In this paper, the author follows the random walk theory of stock prices and gives a detailed study of the developments and assumptions made in this field until then. He then supports the model suggested earlier by Mandelbrot. The data consists of thirty stocks of the Dow Jones Industrial Average for the period 1957 to 1962, with about 1200-1700 observations per sample. The empirical analysis was done not on the daily closing prices but on the differences of their natural logarithms. The main conclusion of this paper is that the stable Paretian distribution with a characteristic exponent of less than 2 seems to be a better fit than the normal distribution. Towards the end, the author emphasizes possible future research: little effort had been spent in exploring the more basic processes that give rise to the empirical distributions, i.e. price formation in the stock market (Fama 1965).
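The transformation used in that analysis, first differences of the natural logarithms of prices, can be sketched as follows. This is a minimal illustration only: the function name `log_returns` and the sample prices are hypothetical and are not taken from Fama's data.

```python
import math


def log_returns(prices):
    """Return first differences of natural-log prices: ln(P_t) - ln(P_{t-1})."""
    if any(p <= 0 for p in prices):
        raise ValueError("prices must be strictly positive")
    logs = [math.log(p) for p in prices]
    # Pair each log price with its successor and take the difference.
    return [later - earlier for earlier, later in zip(logs, logs[1:])]


# Hypothetical daily closing prices, for illustration only.
closing_prices = [100.0, 102.0, 101.0, 105.0]
returns = log_returns(closing_prices)
```

One convenient property of this definition is that log returns add up over time: the sum of the daily values equals the log return over the whole period.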

The Distribution of Share Price Changes: By Peter D. Praetz (1972)

The entire paper is based on Osborne's Brownian motion theory of share price changes. This theory assumes that the variance of share price changes is constant, which is not true in practice. After a few modifications made to this model by Peter D. Praetz, it finally resulted in a scaled t-distribution. The author considered nine years of weekly observations from the Sydney stock exchange (from 1956-66, a total of 462 observations) for seventeen different stock indexes. According to him, the empirical evidence showed that the scaled t-distribution is a much better fit to the data than the normal, compound process and stable Paretian distributions.

The Distribution of Stock Returns: By R. R. Officer (1972)

In this paper the author checks the ability of the symmetric stable class to define the distribution of stock returns. The author states initially that the distribution of stock returns does not belong to the family of normal distributions but to the family of non-normal distributions. Taking the clue from the work done by Teichmoeller, the author then examines whether the symmetric stable family of distributions is able to define stock return distributions. The author took a random sample of 39 stocks that were listed continuously from January 1926 to June 1968, a total of 509 observations. From the empirical analysis, the author finds that the returns have some properties of a stable process, but that not all the characteristics of stock returns can be explained by the stable class. The author found that monthly stock returns follow the stability property and states that it is reasonable to use stable distributions for defining monthly returns (at least for sums of up to 5 months). The author also found that the sample standard deviation apparently behaved well as a measure of dispersion.
A Comparison of the Stable and Student Distributions as Statistical Models for Stock Prices: By Robert C. Blattberg and Nicholas J. Gonedes (1974)

The authors compared the then-proposed Stable distribution, which was an alternative to the Gaussian distribution, with the Student distribution. Praetz had previously compared the normal, compound events, stable and Student models. In a similar manner, Blattberg and Gonedes compared the Student and Stable models in this paper. After testing them empirically, they concluded that the Student model gives a better fit than the Stable model. They took both daily and weekly observations of 30 securities in the Dow Jones Industrial Average over the period 1957-62. They found that the stable model is not the better fit since it lacks convergence to normality. To examine this, they generated stable numbers with a known characteristic exponent. They also found a difference in the values of the log-likelihood ratios.

More Evidence on the Distribution of Security Returns: By Robert L. Hagerman (1978)

In this paper the author tried to provide more empirical evidence on the assumption that stock returns follow a stable symmetric distribution, as assumed in the Sharpe, Lintner and Mossin capital asset pricing model. He stated that the main aim of the paper is to answer the question of which distributional model should be assumed while testing hypotheses about stock returns. To find the most promising distribution, the author considered three hypotheses: 1) security returns follow a stable symmetric distribution with infinite variance, as proposed by Mandelbrot and Fama; 2) returns follow a mixture of normal underlying distributions, as discussed by Boness, Chen and Jatusipitak; 3) daily security returns follow a Student distribution. As many researchers had found that the characteristic exponent estimated for the stable symmetric distribution does not remain constant when the stock returns are summed, the author concluded that daily stock returns do not follow the stable symmetric distribution (including the stable normal case). The author claims to consider a larger sample in order to replicate the results of other studies, and also uses AMEX stocks. The daily security return data was taken from CRSP for August 14, 1962 to December 31, 1976 and consists of 805 securities traded on the NYSE and 286 securities traded on the AMEX. After the empirical research the author found that the symmetric stable distribution does not give an appropriate sketch of the distribution of stock returns. The author is therefore left with the remaining two distributions, i.e. the mixture of normal distributions and the Student t distribution, which he says pose a difficulty for theoretical and empirical research in finance.

A Compound Events Model for Security Prices: By S. James Press (1970s)

The author proposes a new distribution, the Compound Events model. This model differs from the earlier ones with respect to one assumption: the logged price changes are assumed to come from a Poisson mixture of normal distributions instead of a stable distribution. The author considered data on 10 stocks from the Dow Jones Index for the period 1926-1960. As the model is constructed on the assumption that stock prices follow the random walk theory, price prediction is inherently not possible; the paper instead aimed at studying the distribution of stock returns. The proposed model has to be compared with other competing models to check whether it is superior to them, but this comparison has not been done. The author also states that the model should be empirically tested with the samples considered and with other real-time data to check its efficiency, which has not been done either. The author also asserts that the distribution can be better defined if larger sample sizes are considered for analysis.

Macroinformation and the Variability of Stock Market Prices: By Richard P. Castanias (1979)

In this paper, the author tried to explain the efficiency of markets and to see whether the market is able to quickly absorb the macro information that is produced. The author goes on to say that this absorption will create observed distributions of market prices that are non-simple distributions. The author created a market factor that acts like a highly traded asset in order to study the market. The author studied the relationship between the market factor and various general economic indicators and found none that can explicitly explain the variation in the market factor. The difference between the logs of the daily Standard and Poor's Composite Index was used by the author as a substitute for the market factor. The period of study is from January 1, 1973 to June 30, 1977. From the empirical studies, the author found that the distribution of the market factor is affected consistently by other processes or distributions (possibly a process central to the occurrence of macro information). According to the author, this effect shows up mainly in the sample variance and not in the sample mean of the distribution of price changes. The author considered five different categories of economic events to check the efficiency of the market, looking at the speed with which the market absorbed the information rather than the accuracy of the absorption. The five series were: 1) the Department of Commerce series; 2) Federal Reserve Board announcements of news and policy statements; 3) the price index series; 4) the first market days after all weekends and holidays; 5) Federal Reserve Board announcements of routine statistics. The author took steps to avoid subjectivity in the information. The final conclusion was that the chief determinant of the variability of the market factor is the arrival of information of broad economic content. The author also found an indication of an anticipation effect ahead of the release of customary statistical series.


The Stable Paretian Distribution, Subordinated Stochastic Processes, and Asymptotic Lognormality: An Empirical Investigation: By David E. Upton and Donald S. Shannon (1979)

The authors investigate the appropriateness of the lognormal assumption for stock returns, as initially assumed in the study of investor behavior. The data set consisted of 235 monthly returns for 50 companies chosen at random, with the condition that they were listed continuously on the NYSE from January 1, 1956 to July 1, 1975. The study was done at various horizons, i.e. monthly, quarterly, semi-annual and annual, to check for the varied characteristics that may arise due to time differences. After the empirical studies, the authors summarized that the assumption of log-normality is reasonable for longer horizons but questionable even for monthly horizons. The authors also observed that the distributions of both portfolios and individual assets converge to log-normality in the long run. Their other finding was that the consistent leptokurtosis found at monthly horizons fell as the time horizon was lengthened. Finally, they proposed that the subordinated stochastic process is a preferable model to the stable Paretian model.

Models of Stock Returns - A Comparison: By Stanley J. Kon (1984)

In this article the author showed empirical evidence, through log-odds results, that the discrete mixture of normal distributions model is more explanatory than the symmetric Student distribution model. He used an 18-year time series (from July 2nd 1962 to December 31st 1980) of 4639 daily return observations on the Standard and Poor's composite (S&P) index to reach this conclusion. He also divided both the CRSP indexes, i.e. the CRSP value-weighted and the CRSP equal-weighted, on three bases: 1) on a yearly basis, to roughly account for time-ordered events; 2) on a day-of-the-week basis, to account for cyclical events; and 3) on both the year and the day of the week, to account for both effects. All three indexes showed significant kurtosis and skewness at the 1% probability level. On partitioning the data into annual sub-periods, the first two showed a reduction in the kurtosis coefficient test statistics. Testing the third then gave strong reason to prefer it over the alternatives, showing that the model should accommodate both cyclical and time-ordered shifts. He finally found, through maximum likelihood estimates, that the generalized discrete mixture of normals model is more descriptive than the Student model.
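One common way to obtain maximum likelihood estimates for a discrete mixture of normals is the Expectation-Maximization (EM) algorithm. The sketch below fits a two-component mixture and is an illustration only: it is not Kon's procedure, and the function names, the starting values and the synthetic data are all assumptions made for this example.

```python
import math
import random


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def fit_two_normal_mixture(data, iterations=200):
    """Fit a two-component univariate normal mixture by EM."""
    n = len(data)
    ordered = sorted(data)
    overall_mean = sum(data) / n
    overall_sd = math.sqrt(sum((x - overall_mean) ** 2 for x in data) / n)
    # Crude starting values (an assumption of this sketch): the quartiles as
    # means, the pooled standard deviation for both components, equal weights.
    means = [ordered[n // 4], ordered[(3 * n) // 4]]
    sds = [overall_sd, overall_sd]
    weights = [0.5, 0.5]
    for _ in range(iterations):
        # E-step: posterior probability that each point came from component 0.
        resp0 = []
        for x in data:
            p0 = weights[0] * normal_pdf(x, means[0], sds[0])
            p1 = weights[1] * normal_pdf(x, means[1], sds[1])
            total = p0 + p1
            resp0.append(p0 / total if total > 0.0 else 0.5)
        # M-step: responsibility-weighted mean, standard deviation and weight.
        for k in (0, 1):
            resp = resp0 if k == 0 else [1.0 - r for r in resp0]
            mass = sum(resp)
            if mass < 1e-12:
                continue  # component has collapsed; keep its old parameters
            means[k] = sum(r * x for r, x in zip(resp, data)) / mass
            variance = sum(r * (x - means[k]) ** 2 for r, x in zip(resp, data)) / mass
            sds[k] = max(math.sqrt(variance), 1e-8)
            weights[k] = mass / n
    return weights, means, sds


# Synthetic data drawn from two known normals, for illustration only.
random.seed(0)
sample = [random.gauss(-2.0, 0.5) for _ in range(600)]
sample += [random.gauss(3.0, 1.0) for _ in range(400)]
weights, means, sds = fit_two_normal_mixture(sample)
```

EM only finds a local maximum of the likelihood, so the result can depend on the starting values; in practice several starts are often tried and the fit with the highest likelihood is kept.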

A General Distribution for Describing Security Price Returns: By Richard M. Bookstaber and James B. McDonald (1987)

In this paper, the authors proposed a generalized distribution that combines different types of distributions such as the log-normal, log-Cauchy and log-t. This was done to incorporate the various characteristics that stock returns exhibit, such as fat tails and infinite higher moments. The resulting distribution, called GB2, is more flexible in accounting for the different characteristics of stock returns. It may look like a mixed distribution, and it has a density function that is easily expressible. The authors described the two ways in which researchers generally go about defining the distribution. One is to describe the process by which returns arise and thereby define the nature of stock returns. The other is through empirical observation of stock returns and of characteristics such as skewness and fat tails; based on those characteristics, researchers try to define the distribution of stock returns. In this paper, the authors tried to define it with a mixed distribution called GB2, i.e. the Generalized Beta of the second kind. The authors used a bootstrapping technique to generate data for the research. The study concludes that GB2 is a better fit than the log-normal, and significantly better for short time intervals.

On Measuring Skewness and Elongation in Common Stock Return Distributions: The Case of the Market Index: By S. G. Badrinath and Sangit Chatterjee (1988)

The authors carried out an exploratory investigation of a type of distribution known as the (gh) distribution, based on the work of Tukey (Tukey 1977; Hoaglin 1983). They wished to incorporate the skewness and elongation that they had observed in stock returns. They also stated that the (gh) distribution can be regarded as similar to a mixture of distributions. Many researchers had tried to define the stock return distribution, but none of them had given a single scalar to represent the shape patterns in return distributions. This paper gave a superior analysis in terms of defining the skewness and the skewness-induced elongation (kurtosis). The paper aimed at providing an easier means of describing stock return distributions and of understanding the skewness and kurtosis of stocks, enabling individuals to make wiser portfolio selection decisions. The authors considered Center for Research in Security Prices (CRSP) daily returns for the period 1962-1985. From the analysis, they found more complicated forms of skewness and kurtosis in the data: the elongation in the middle differs from the elongation at the tails. They felt that this is a much simpler way, in practice, of modeling the distribution functions of stock returns. Having done the analysis for the market index as a whole, they felt that it can be done for individual stocks too. According to them, all these findings have important implications for formulating skewed portfolio strategies.
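For reference, the usual moment-based measures of skewness and excess kurtosis can be computed as sketched below. The report's own Equations 3.1 to 3.3 are not reproduced in this section, so the population-moment forms used here are an assumption, and the function names are hypothetical. This is also not the (gh) approach of Badrinath and Chatterjee, only the conventional baseline.

```python
def central_moment(data, order):
    """Population central moment of the given order."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** order for x in data) / len(data)


def skewness(data):
    """Third central moment scaled by the cube of the standard deviation."""
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5


def excess_kurtosis(data):
    """Fourth standardized moment minus 3, the value for a normal distribution."""
    return central_moment(data, 4) / central_moment(data, 2) ** 2 - 3.0


# Hypothetical values, for illustration only.
symmetric_sample = [-2.0, -1.0, 0.0, 1.0, 2.0]
right_skewed_sample = [0.0, 0.0, 0.0, 10.0]
```

A positive excess kurtosis indicates fatter tails than the normal distribution, which is the leptokurtosis that the studies reviewed here repeatedly report for stock returns.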

On Estimating Skewness in Stock Returns: By Hon-Shiang Lau, John R. Wingender and Amy Hing-Ling Lau (1989)

In this paper, the authors alert researchers in this area to the need to consider skewness in their studies. Skewness is a major factor in returns, and ignoring it can lead to misleading results. The authors show a common error that researchers make and provide a dependable approach to approximating the skewness of stock returns. They based their work on the study done by Singleton and Wingender in 1986. They took log-normality as the assumption for stock returns and tried to provide a measure for estimating skewness. After the empirical study they found a rough estimate of the confidence interval for log-normally distributed variables and showed the importance of sampling errors, which may lead to wrong conclusions. One limitation of the paper is that the suggested confidence intervals are based on a normal distribution and not on other types of distributions, so they cannot be applied universally. The other limitation is that all their results are applicable only if the distribution is log-normal; for any non-lognormal distribution, these results cannot be applied.

The Distribution of Stock Market Returns: 1958-1973: By Peter Praetz and Edward J. G. Wilson (1980s)

In this paper, the authors looked for empirical evidence on whether the Student t distribution or the stable distribution is better able to define stock returns. The authors considered various contemporary papers and finally chose to work on these two leading distributions. They considered continuously compounded monthly share returns from January 1958 to December 1973 from the Melbourne stock exchange. They then constructed portfolios of 5, 10, 20, 40, 100 and 909 securities and used chi-squared statistics to assess the goodness of fit. They found that the Student t distribution is clearly a better fit than the stable Paretian. The authors also state that the normality assumption is not a bad approximation for medium-sized portfolios. According to the authors, non-stationary returns are the main determinant of the typical distribution shape, and errors in the price relative file are not a cause. They limit all their conclusions to monthly returns.

Empirical Comparisons of Distributional Models for Stock Index Returns: By J. Brian Gray and Dan W. French (1990)

In this paper, the authors tested the descriptive power of the normal distribution for modeling log-price returns of the S&P 500 composite Index. They also tested the logistic distribution as a model for stock returns. To find a superior fit, they tested four alternative models simultaneously on a large base of stock returns: the Normal distribution, the Logistic distribution, the Scaled-t distribution and the exponential power distribution (EPD). They used the Kolmogorov-Smirnov (K-S), Cramér-von Mises and Anderson-Darling tests to assess the goodness of fit of the data to a particular distribution. The data under study runs over a period of 8.5 years, i.e. from January 2nd 1979 to September 30th 1987. After testing the data empirically, all the tests rejected the normality hypothesis. The goodness-of-fit tests also showed that the EPD describes stock returns better than the other distributions under consideration. They found that at significance levels below 5% the K-S test did not reject the normality hypothesis as strongly, and they identified it as a weak test because it responds only to large deviations and ignores small ones.
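The Kolmogorov-Smirnov statistic mentioned above is the largest vertical distance between the empirical distribution function of the sample and the hypothesized distribution function. A minimal sketch for a normal hypothesis follows; the function names are hypothetical, and this is not the authors' test procedure.

```python
import math


def normal_cdf(x, mean, sd):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def ks_statistic_normal(data, mean, sd):
    """Largest gap between the empirical CDF and the normal CDF."""
    ordered = sorted(data)
    n = len(ordered)
    largest_gap = 0.0
    for i, x in enumerate(ordered):
        theoretical = normal_cdf(x, mean, sd)
        # The empirical CDF jumps from i/n to (i + 1)/n at x, so both
        # sides of the jump have to be compared with the normal CDF.
        largest_gap = max(largest_gap, (i + 1) / n - theoretical, theoretical - i / n)
    return largest_gap
```

For a fully specified distribution and a large sample of size n, the 5% critical value is approximately 1.36 divided by the square root of n. That value does not apply when the mean and standard deviation are estimated from the same sample, in which case adjusted critical values are needed.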

Behavior of International Stock Return Distributions: A Simple Test of Functional Form: By Vihang, kedreth and summon (1990s)

In this paper, the authors tried to find whether there are any differences between global return distributions in terms of their Pearson types. To check whether such differences exist, they introduced a simple criterion called kappa. They studied two types of markets, i.e. developed markets (DMS) and emerging markets (EMS), and found that the DMS differ only a little in distributional form whereas the EMS differ from each other significantly. Many distributional types have been suggested to define stock returns for a particular market, and the kappa criterion identifies the best among them. The data consisted of 8 DMS and 8 EMS for the period January 1976 to December 1988. After studying the data, they concluded that the functional forms of stock market returns for the DMS are similar over the period considered and are consistent with the lognormal distribution that is often used. In the case of the EMS, differences existed in the beginning, and as the markets developed their distributional forms tended to resemble those of the DMS.

The Empirical Distribution of UK and US Stock Returns: By Richard D. F. Harris and C. Coskun Kucukozmen (2001)

Defining an appropriate distribution model for stock returns has been a difficult task, and many distributions have been proposed for it. In this paper, the authors compared two of the most well-known distributions that can incorporate the varied levels of skewness and kurtosis generally observed in stock returns. Those two distributions are 1. the exponential generalized beta (EGB) and 2. the skewed generalized t distribution (SGT). These distributions were proposed by other researchers, but were considered in this paper to test their efficiency. The data consist of continuously compounded daily, weekly and monthly returns of UK and US stocks and indices from January 1 1979 to December 31 1999. The authors then implemented VaR to check the implications of their results. They concluded that the EGB and SGT gave a drastic improvement over the normal distribution, and that the SGT seemed to give a slightly better fit than the EGB. They also found that at lower levels of significance, the SGT gave a better VaR estimate for the UK than the normal and Student t distributions. But at higher levels of significance, the normal distribution proved to be a better tool for estimating the VaR of both UK and US portfolios.

A Model For Stock Return Distribution: By Mikael Linden (2001)

The author proposed a new type of distribution for characterizing stock returns. It is a type of mixture distribution in which a normal and an exponential distribution are mixed. Maximum likelihood estimation was used to check the validity of the distribution. The data consist of the 20 most traded securities in the Finnish market, i.e. the Helsinki stock market, from June 1 1987 to January 28 1989. According to the author, the Laplace distribution is the preferable alternative to the normal distribution for accommodating the observed excess kurtosis. This distribution has been extended to the asymmetric case too. After the empirical analysis, the author found that the Laplace distribution is a reasonably good model for daily stock returns. The author acknowledged, however, that factors not considered here, such as time dependency and the other regressions involved, would be taken up in subsequent research.

Mixed Normal Conditional heteroskedasticity: By Markus Haas, Stefan Mittnik, and Marc S. Paolella (2003)

In this paper, the authors combine mixed normal distributions, which are unconditional in nature, with conditional GARCH models. They use a mixed normal distribution combined with a GARCH-type structure that allows for conditional variance. The data consist of NASDAQ index daily returns from its inception, i.e. from February 1971 to June 2001, a sample of 7681 observations. They carried out the empirical analysis and also compared the model with the then most popular model, i.e. the Markov-switching model, to check the differences and efficacy of the models. Investigating the properties of stock returns, they found rich dynamics of time-varying skewness and kurtosis, which other GARCH models did not capture but which this model incorporates. They also gave directions for future investigation in the areas of time-varying mixture weights, more general GARCH structures, and multivariate models for realistic risk management problems.

A conditional distribution model for limited stock Index returns: By Ralph and Walter (2006)

Many stock markets have a price limit regime for all the stocks in an index. This is usually done to reduce volatility and to make investors rethink their investment strategies. Under such a price limit regime, stock returns can behave differently, so this paper tries to model the distribution of stock index returns conditional on price limits. Initially the authors tried the beta distribution for modeling returns and found that it cannot accommodate excess kurtosis, i.e. when the returns hit the limit. So they introduced a mixed beta distribution to describe returns. They found empirically that, within a 95% confidence band, the stock index returns considered (Shanghai-A and Shanghai-B) follow the specified mixed beta distribution. The analysis is based on value-weighted daily closing price indices of the Shanghai-A and Shanghai-B shares.

The Distribution of S&P 500 Index Returns: By William J. Egan. (2007)

This paper compares three basic types of distributions, i.e. the Normal, Lognormal and Student t distributions. The author considered S&P 500 index data for the period 1950-2005, i.e. around 14090 values. After the empirical analysis, the author found that the normal distribution is not at all a good fit for daily percentage returns: they showed a skewness of -0.91, signifying negative asymmetry, and a kurtosis of 24.7, signifying heavy tails in the returns distribution. A Jarque-Bera test of normality gave a p-value of <0.001, signifying a low probability that the null hypothesis of normality holds. The author then used the lognormal distribution to describe the stock returns; daily continuously compounded returns again showed a skewness of -1.3 and a kurtosis of 35.2, indicating that they are not normally distributed. Finally, the author found the t-distribution to be the best distribution for describing daily stock returns when it is given scale (standard deviation) and location (mean) parameters. This was found by plotting it against the daily percent changes of the S&P 500 data on a q-q plot. The result showed a p-value of 0.39, signifying that the t-distribution is not significantly different from the actual data distribution.

Empirical distributions of stock returns: Paris stock market: By Stella Kanellopoulou and Epaminondas Panas (2008)

Considering the importance of determining the stock return distribution in finance and economics, these authors tried to describe stock returns using log price relatives. The Jarque-Bera statistic clearly rejected the normality hypothesis: the returns showed skewness beyond the tolerance levels for a normal distribution and were leptokurtic. To allow for these characteristics, the authors selected the Levy-stable distribution as a suitable one. After testing it empirically, they found that returns are consistent with stable distributions. They also introduced the Hurst coefficient, estimated alongside the stable distributions, to measure the long memory process. Additionally they used ARFIMA models to test the long memory behavior of the stock returns and found that there exists long range dependence (long memory) among Paris stock returns. They took data on nine stocks from the CAC index covering the period from Jan 2nd 1980 to May 31st 2003, i.e. 6108 trading days. They concluded that stock returns exhibit long memory and so violate the well-known market efficiency hypothesis.

Distribution switching of stock evidence: By Kosei Fukuda (2009)

Six alternative distribution models are considered to check their validity in modeling the distribution of stock returns. They are the normal distribution model, the t-distribution model, normal with parameter change, t-distribution with parameter change, the normal and t model, and finally the t and normal model. The data consist of monthly time series from international financial statistics for Canada, France, Italy, Japan, UK and US for the period 1959 to 2005. The empirical analysis shows that the returns arise from different distributions on either side of switch points. In this study, the monthly stock returns before the switch point were generated from a normal distribution and after the switch point from a t-distribution. The author also states that these switch points were due to economic problems at the international level during the period of study.

Modeling skewness and kurtosis with the skewed Gauss-Laplace sum distribution: By Markus Haas (2009)

The author introduces a modified version of the Gauss-Laplace sum (GLas) distribution, which was introduced earlier by Haas, with asymmetry included in it. The new model is known as the skewed Gauss-Laplace sum distribution (sGLas); it includes a normal component and a fat-tailed component modeled as a Laplace random variable, along with an asymmetry component. The author considered daily returns of three indices (NASDAQ, Dow Jones and S&P 500) of the US stock market from January 1990 to September 2007. The author then compared the proposed distribution with other competing distributions to check the superiority of sGLas over the others. The distributions considered were Normal, Laplace, skewed Laplace, GLas, sGLas, Generalized Exponential Distribution (GED) and skewed GED. After the empirical analysis, the author found substantial kurtosis in the observed data, to the extent that even the normal distribution gets a higher log-likelihood. The Jarque-Bera test statistics show that the skewed GED improves over the other subclasses. But the author also states that it is still not an appropriate distribution, as it needs to incorporate more skewness and kurtosis than it does now.

2.3 Conclusions:
From the above literature review, we can conclude that many studies have tried to define stock returns and the nature of their distribution, and that the search for a better distribution for stock returns still continues. Before proceeding, we need to define each of the terms used in this report. So in the next chapter, we set out the meaning that we adopt for each of these terms.




Theoretical Background
resulting distribution from these continuous random variables is called a Continuous Distribution.

3.4.4 Univariate distributions:
A distribution is said to be a Univariate distribution if it is the probability distribution of one random variable. This is considered when there is only one dependent variable or each data point has only one scalar component.

3.4.5 Multivariate distributions:
A multivariate distribution is a more general account of random variables, i.e. it takes Univariate distributions to higher dimensions. This distribution is more practical and realistic in nature, as any variable is affected by more than one variable.

3.5 Moment:
In statistics, a moment is a numerical summary of how the points of a distribution are located and shaped. There are moments of different orders, such as the 1st moment, the 2nd moment and so on. Generally, the second moment is used to quantify the width of the distribution. There is also another type of moment, the central moment, which is a moment taken about the mean rather than about zero. Central moments are preferred over ordinary moments because they describe the spread and shape of a distribution independently of its location. Several orders of central moments are considered in defining a particular distribution.
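The difference between ordinary and central moments can be illustrated with a minimal sketch. The function names and the sample values below are hypothetical and are not taken from the study; the point is only that shifting the data changes the first ordinary moment while leaving the central moments unchanged.

```python
def raw_moment(data, k):
    """k-th ordinary (raw) moment: the average of x**k."""
    return sum(x ** k for x in data) / len(data)


def central_moment(data, k):
    """k-th central moment: the average of (x - mean)**k."""
    mean = raw_moment(data, 1)
    return sum((x - mean) ** k for x in data) / len(data)


# Hypothetical sample, and the same sample shifted by 100.
sample = [1.0, 2.0, 3.0, 4.0, 10.0]
shifted = [x + 100.0 for x in sample]

# The first raw moment (location) moves with the shift ...
print(raw_moment(sample, 1), raw_moment(shifted, 1))
# ... while the second central moment (spread) does not.
print(central_moment(sample, 2), central_moment(shifted, 2))
```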


3.5.1 Variance:
A measure of the dispersion of a set of data points around their mean value. Variance is a mathematical expectation of the average squared deviations from the mean. -Source: Investopedia

This is the second central moment. Variances are comparable across data sets and are additive for independent variables, since variance is calculated as the average squared deviation from the mean. The square root of the variance is the standard deviation, which describes the width or spread of the data points around their mean.

3.5.2 Covariance:
Covariance measures the degree to which two variables move together. Variance is the particular case of covariance in which both variables are the same. If the covariance is positive, the two sets of data tend to move in the same direction in response to a change. If the covariance is negative, they tend to move in opposite directions.
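As a small illustration of these statements, the sketch below computes a population covariance (dividing by n, which is an assumption made here for simplicity) on hypothetical data, and shows that the covariance of a variable with itself is its variance.

```python
def covariance(xs, ys):
    """Population covariance: average product of deviations from the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n


# Hypothetical series: ys moves with xs, zs moves against xs.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
zs = [8.0, 6.0, 4.0, 2.0]

print(covariance(xs, ys))  # positive: the series move together
print(covariance(xs, zs))  # negative: the series move inversely
print(covariance(xs, xs))  # covariance of xs with itself is the variance of xs
```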

3.5.3 Skewness:
Skewness is based on the third central moment of the data. It describes the asymmetry of a distribution relative to a symmetric distribution such as the standard normal. The asymmetry can be either left skewed or right skewed. If a distribution has a long tail on its left side, with most of the data clustered on the right side of the curve, it is called negatively skewed. On the other hand, if it has a long tail on the right, with most of the data clustered on the left side of the curve, it is called positively skewed. Skewness can be calculated using the following formula:

$$\text{Skewness} = \frac{\mu_3}{\sigma^3}$$

Equation 3.1: Skewness

Where $\mu_3$ is the third moment about the mean and $\sigma$ is the standard deviation of the data. For a normal distribution, there is a certain tolerable level of sample skewness within which the data can still be classified as normally distributed. We can test whether a distribution is significantly skewed using the following steps:
1. Calculate the standard error of skewness.
2. Double this standard error and set the range from its negative to its positive value.
3. Check whether the calculated skewness falls within this range.

For example, if the standard error is 0.123, doubling it gives 2 × 0.123 = 0.246, so the range runs from -0.246 to +0.246. If the calculated skewness falls within this range, the distribution is not significantly skewed; otherwise it is significantly skewed, which indicates non-normality of the distribution.
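The three steps above can be sketched as follows. The text does not state which formula is used for the standard error of skewness, so this sketch assumes the common large-sample approximation sqrt(6/n); the function names and data are hypothetical.

```python
import math


def skewness(data):
    """Skewness as the third central moment divided by sigma cubed."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    return m3 / m2 ** 1.5


def is_significantly_skewed(data):
    """Steps 1-3: compare the skewness with a band of +/- 2 standard errors."""
    std_error = math.sqrt(6.0 / len(data))  # assumed approximation
    return abs(skewness(data)) > 2.0 * std_error


# Hypothetical data: one symmetric sample and one with a long right tail.
symmetric = [-2.0, -1.0, 0.0, 1.0, 2.0] * 20
right_tailed = [0.0] * 90 + [10.0] * 10

print(skewness(symmetric), is_significantly_skewed(symmetric))
print(skewness(right_tailed), is_significantly_skewed(right_tailed))
```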

3.5.4 Kurtosis:
Kurtosis is based on the fourth central moment of the data. It helps us describe the shape of a probability distribution in terms of its peakedness. A distribution can be more peaked or more flat depending upon the concentration of the data points around its mean. Different types of distributions have different acceptable kurtosis levels. Depending on the dispersion or volatility of the data, the distribution curve can be either peaked or flat. If a distribution has a sharper and higher peak with fat tails, it will have high kurtosis. On the other hand, if it has a round peak and thinner tails, it will have low kurtosis. Kurtosis can be calculated using the following formula:

$$\text{Kurtosis} = \frac{\mu_4}{\sigma^4}$$

Equation 3.2: Kurtosis

Where $\mu_4$ is the fourth moment about the mean and $\sigma$ is the standard deviation of the data. As with skewness, kurtosis has tolerance limits within which data can be accepted as normally distributed. These are checked with the same three steps explained for skewness, applied to kurtosis measured in excess of the normal value (defined below), so that the range is centred on zero:
1. Calculate the standard error of kurtosis.
2. Double this standard error and set the range from its negative to its positive value.
3. Check whether the calculated kurtosis falls within this range.

If the calculated kurtosis falls within the range, it is at an acceptable level for the distribution to be classified as normal. Excess kurtosis is the additional kurtosis that the data display over the normal distribution. It is calculated as follows.

$$\text{Excess kurtosis} = \frac{\mu_4}{\sigma^4} - 3$$

Equation 3.3: Excess Kurtosis
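A minimal sketch of the two kurtosis formulas is given below, together with the classification by the sign of the excess kurtosis. The function names and the two samples are hypothetical and are chosen only so that the values can be checked by hand.

```python
def kurtosis(data):
    """Kurtosis as the fourth central moment divided by sigma to the fourth."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    return m4 / m2 ** 2


def excess_kurtosis(data):
    """Kurtosis in excess of the normal distribution's value of 3."""
    return kurtosis(data) - 3.0


def classify(data):
    """Label a sample by the sign of its excess kurtosis."""
    excess = excess_kurtosis(data)
    if excess > 0:
        return "leptokurtic"
    if excess < 0:
        return "platykurtic"
    return "mesokurtic"


# Hypothetical samples: a two-point sample (no tails) and a sample
# that is mostly zeros with two extreme values (fat tails).
two_point = [-1.0, 1.0] * 50
fat_tailed = [0.0] * 98 + [-10.0, 10.0]

print(kurtosis(two_point), classify(two_point))
print(kurtosis(fat_tailed), classify(fat_tailed))
```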


The subtraction of 3 at the end is a correction that makes the kurtosis of the normal distribution equal to zero. If the excess kurtosis is positive, the distribution is said to be leptokurtic. If the excess kurtosis is negative, the distribution is said to be platykurtic.

The following is a brief history of the developments that have taken place in defining the distribution of stock returns. Defining stock market returns has occupied many researchers since 1900. The first to attempt it was Louis Bachelier in 1900, in his thesis titled Theory of Speculation. He developed a set of models assuming that stock returns are normally distributed. His assumption was challenged when Mandelbrot (1963) showed through empirical analysis that stock returns exhibit heavier tails than are tolerable under the normal distribution. So he proposed a different family of distributions to incorporate these heavy tails: the stable Paretian distribution with a characteristic exponent of less than two. In 1964 Osborne found from his sample of stock index returns that stock returns can be approximated closely by the normal distribution. But Fama E.F (1963) also provided evidence in support of Mandelbrot that stock returns can be explained through the stable Paretian family of distributions.

Peter D. Praetz (1972) gave theoretical as well as empirical evidence that the distribution of stock returns should be a scaled t distribution. He based his work on Osborne's Brownian motion theory. In the same year R.R. Officer reported that the proposed stable distribution can be used only as an approximation for monthly returns and does not apply to daily returns up to twenty days. He also found that the standard deviation behaved well as a measure of dispersion. Robert C. Blattberg and Nicholas J. Gonedes (1974) compared the Student t and symmetric stable distributions to find which describes stock returns better. They found that the Student t distribution has more descriptive validity than the symmetric stable model. In 1978, Robert L. Hagerman provided more evidence that stock returns do not follow a symmetric stable distribution when he compared the symmetric stable model with a mixture of underlying normals. He concluded that the distribution has to be either a Student t distribution or a mixture of underlying normal distributions.

In the 1980s, P. Praetz and Edward J.G. Wilson compared the ability of the Student t distribution and the stable distribution to describe stock returns. They found that the Student t distribution is clearly a better fit than the stable Paretian. They also found that the normality assumption is not a bad approximation for medium-sized portfolios. In 1984, Stanley J. Kon published a very relevant paper on the importance of the discrete mixture of normals. According to his study, the discrete mixture of normal distributions model has substantially more descriptive validity than the Student model (Kon 1984). Later, in 1987, Richard M. Bookstaber and McDonald proposed another type of mixture distribution, the generalized beta of the second type, which is a mixture of many kinds of distributions. Along the same line of thought, Badrinath and Chatterjee proposed a different kind of mixture distribution, the (gh) distribution, in 1987. Their major contribution to this field is that of defining the skewness and kurtosis of the data more accurately and easily.

The next comparison between distributions was made by J. Brian Gray and Dan W. French in 1990. They compared how well different distributions describe stock returns, considering the Normal, Logistic, Scaled t and Exponential power distributions. From their empirical study they found that the Exponential power distribution gives a much superior fit to all the other distributions considered. In 2001, Mikael Linden proposed a type of mixture distribution in which a normal and an exponential distribution are mixed. According to him, the Laplace distribution is the next best alternative to the earlier proposed normal distribution. In 2007, another researcher found by comparing three distributions that the scaled t distribution is better at describing stock returns (Egan 2007). Stella Kanellopoulou and Epaminondas Panas proposed the Levy-stable distribution as a better distribution to accommodate the observed leptokurtosis in stock returns. They also used the Hurst coefficient in defining their distribution (Stella and Epaminondas 2008).

In 2009, Markus Haas compared the Normal, Laplace, skewed Laplace, GLas, skewed GLas, Generalized Exponential Distribution (GED) and skewed GED. After empirical analysis, he found that the skewed GED surpasses all the other distributions in describing stock returns. In the same year, Kosei Fukuda compared various distributions: the normal distribution model, the t-distribution model, normal with parameter change, t-distribution with parameter change, the normal and t model, and finally the t and normal model. He found that stock returns switch from one distribution to another at various points: the monthly stock returns came from a normal distribution before the switch point and from a t-distribution after it. He adds that these switch points are due to economic problems. The data he considered consist of statistics for Canada, France, Italy, Japan, UK and US for the period 1959 to 2005.

So, to date, no single distribution has been able to describe stock returns. This gives an opportunity to study the nature of stock returns in depth and then define an appropriate distribution for them.

3.6 Conclusion:
There have been many distributions that have been tested in literature. But none of them gave a universal standard fit to define the stock returns. This gives us the opportunity to try for an ideal distribution that can accommodate all the variations that are displayed by the stock returns at various times. Presently people use various assumptions in considering a particular distribution for their portfolios depending upon their nature of risk taking ability and the size of portfolios.





This chapter discusses the way in which the study has been done. It explains the method of collecting the data, collating the data and filtering the data as per the requirements. Each of these data collection techniques is based on the methods generally used by other researchers.

4.2 Data Collection:

The daily data has been collected for the stock market index considered, i.e. the S&P 500. The daily data consists of the open, close, high and low index figures from April 1st 2000 to March 31st 2010. We collected the data from a source that is freely available on the Internet.

4.3 Nature of Study:

A lot of work has been done in this area to define an appropriate distribution model that explains stock returns across the globe. Many distributions have already been proposed that can reasonably accommodate stock returns. In this study, we tried to find a better distribution model for describing stock returns. The present study builds on the distributions that have already been proposed; we tried to modify them and make them more universal. The entire study is based on empirical analysis.



4.4 Objective of the study:
The main objectives of the study are as follows:
1. To understand the nature and behavior of stock index returns.
2. To find an appropriate distribution model that can reasonably describe the stock index returns.
3. To use the proposed distribution model on real-time data and test its efficiency in describing the stock returns.
4. To compare the proposed model with the existing models and assess its relative superiority in explaining the stock returns.

4.5 Methodology:
In this report, we observed the stock returns of the S&P 500 for a period of 10 years, from April 1st 2000 to March 31st 2010. We first fitted the data to a normal distribution to find whether the data follows normality and, if not (as found by many, such as (Mandelbrot 1963), (Blattberg and Gonedes 1974), (Praetz 1972) etc.), how it deviates. We then used these findings to formulate a suitable distribution considering all the parameters, i.e. mean, standard deviation, skewness and kurtosis. We test the mixed normal distribution as a probable fit for the distribution of returns, as it gives the user flexibility in choosing the proportions or weights of the normal distributions used to fit the observed data.



In modeling the mixed normal distribution, we considered two normal distributions, appropriately weighted, to form the mixture. The mixed normal distribution is formed in such a way that it can accommodate the various characteristics and variances that stock returns display.
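To make the idea of weighting two normal distributions concrete, the sketch below writes the mixture density as w times the first normal density plus (1 - w) times the second. The weight and the component parameters are hypothetical values chosen for illustration, not estimates from this study. The sketch also shows that mixing two normals with the same mean but different standard deviations produces positive excess kurtosis, which is the feature needed to accommodate heavy tails.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and std deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def mixture_pdf(x, w, mu1, sigma1, mu2, sigma2):
    """Two-component normal mixture with weight w on the first component."""
    return w * normal_pdf(x, mu1, sigma1) + (1.0 - w) * normal_pdf(x, mu2, sigma2)


def scale_mixture_excess_kurtosis(w, sigma1, sigma2):
    """Excess kurtosis of the mixture when both components share one mean."""
    variance = w * sigma1 ** 2 + (1.0 - w) * sigma2 ** 2
    fourth_moment = 3.0 * (w * sigma1 ** 4 + (1.0 - w) * sigma2 ** 4)
    return fourth_moment / variance ** 2 - 3.0


# Hypothetical parameters: a calm component (weight 0.8) and a volatile one.
w, mu1, sigma1, mu2, sigma2 = 0.8, 0.0, 1.0, 0.0, 3.0

# The mixture is itself a density: a simple Riemann sum is close to 1.
step = 0.01
area = sum(mixture_pdf(-40.0 + i * step, w, mu1, sigma1, mu2, sigma2) * step
           for i in range(8001))

print(area)
print(scale_mixture_excess_kurtosis(w, sigma1, sigma2))
```

With equal standard deviations the mixture collapses to a single normal and the excess kurtosis returns to zero, so the weights and the ratio of the two standard deviations control how heavy the tails are.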

4.6 Software used in the study:

The main aim of the study, i.e. defining a better distribution that can incorporate all the variations in stock returns, has been achieved using various software tools. Among them, R is the most prominent. We have also used Microsoft Excel in a few cases. In R, we used many packages such as mixtools, xts, quantmod, RcompHam94 and others. All of these helped us achieve our goal of defining a mixture of Gaussians comfortably.

4.7 Limitations of the study:

The following are the limitations of the study:
1. The study covers only 10 years, so the results may not be comprehensive.
2. We considered the closing prices of the S&P 500 for the 10 years. The stock market may not be a strong-form efficient market that incorporates all expected information, so the results may not be universally applicable across the globe.
3. We used approximate values in a few cases, so we may not get fully accurate results.



Study of the Index returns distributions of S&P 500 during the period of the study

Initial Analysis

The basic aim of this chapter is to conduct an initial analysis of the data to find out the nature of the returns, so that we understand their behavior. Initial analysis of the data is of utmost importance because it gives us an idea of the character of the returns, based on which we can decide on an appropriate model to fit them. In this chapter, we check whether the data follows a normal distribution and obtain a basic summary of the data to understand its behavior. The following table gives the basic information about the data considered, i.e. the S&P 500 from April 1st 2000 to March 31st 2010, consisting of 2513 daily observations. The data is presented year-wise to give a more detailed picture of the index returns.
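A summary of this kind can be sketched as below. The sketch assumes that daily returns are computed as log differences of closing prices, which this chapter does not state explicitly, and the closing prices shown are hypothetical. The study itself used R; Python's standard library is used here only for illustration.

```python
import math
import statistics


def log_returns(closes):
    """Daily continuously compounded returns from consecutive closing prices."""
    return [math.log(today / yesterday)
            for yesterday, today in zip(closes, closes[1:])]


def summarize(returns):
    """Basic statistics of the kind reported in a year-wise summary table."""
    n = len(returns)
    stdev = statistics.stdev(returns)  # sample standard deviation
    return {
        "n": n,
        "min": min(returns),
        "max": max(returns),
        "mean": statistics.mean(returns),
        "median": statistics.median(returns),
        "sum": sum(returns),
        "se_mean": stdev / math.sqrt(n),
        "variance": statistics.variance(returns),
        "stdev": stdev,
    }


# Hypothetical closing prices for five consecutive trading days.
closes = [100.0, 101.0, 99.0, 100.5, 102.0]
returns = log_returns(closes)
summary = summarize(returns)

for name, value in summary.items():
    print(name, value)
```

Applying `summarize` separately to the returns of each April-to-March year would give one column of statistics per year.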

5.2 Summary of the Stock Index returns:

| Statistic | 2000_01 | 2001_02 | 2002_03 | 2003_04 | 2004_05 | 2005_06 | 2006_07 | 2007_08 | 2008_09 | 2009_10 |
|---|---|---|---|---|---|---|---|---|---|---|
| No. of observations | 251 | 246 | 253 | 253 | 251 | 253 | 250 | 251 | 253 | 252 |
| Minimum | 0.060045 | 0.050468 | 0.042423 | 0.025234 | 0.016455 | 0.018496 | 0.035343 | 0.032519 | 0.094695 | 0.043732 |
| Maximum | 0.048884 | 0.042753 | 0.055744 | 0.025781 | 0.016027 | 0.019544 | 0.021336 | 0.041535 | 0.109572 | 0.037347 |
| 1. Quartile | 0.008593 | 0.007352 | 0.013523 | 0.004479 | 0.004252 | 0.003563 | -0.00317 | 0.006811 | 0.013819 | 0.003565 |
| 3. Quartile | 0.007438 | 0.006898 | 0.008225 | 0.006434 | 0.004463 | 0.004261 | 0.0037 | 0.006479 | 0.009597 | 0.008407 |
| Mean | 0.001039 | -4.60E-05 | 0.001194 | 0.001121 | 0.000188 | 0.000365 | 0.000371 | 0.000285 | 0.001998 | 0.001517 |
| Median | 0.001515 | 0.000202 | 0.001903 | 0.001291 | 0.000588 | 0.000841 | 0.000842 | 0.000589 | 8.10E-05 | 0.002347 |
| Sum | 0.260733 | 0.011215 | 0.302152 | 0.28352 | 0.047156 | 0.092396 | 0.092852 | 0.071587 | 0.505485 | 0.382326 |
| SE Mean | 0.000882 | 0.000792 | 0.001078 | 0.00054 | 0.000423 | 0.000399 | 0.000427 | 0.00076 | 0.001753 | 0.000759 |
| LCL Mean | 0.002775 | 0.001605 | 0.003317 | 5.80E-05 | 0.000645 | -0.00042 | 0.000471 | 0.001783 | 0.005451 | 2.10E-05 |
| UCL Mean | 0.000697 | 0.001514 | 0.000929 | 0.002184 | 0.001021 | 0.001151 | 0.001213 | 0.001213 | 0.001455 | 0.003013 |
| Variance | 0.000195 | 0.000154 | 0.000294 | 7.40E-05 | 4.50E-05 | 4.00E-05 | 4.60E-05 | 0.000145 | 0.000778 | 0.000145 |
| Standard deviation | 0.013967 | 0.012415 | 0.017147 | 0.008585 | 0.006702 | 0.006346 | 0.006759 | 0.012048 | 0.027887 | 0.012056 |
| Skewness | 0.000326 | 0.023498 | 0.445533 | 0.046936 | 0.105125 | 0.050473 | -0.478434 | 0.113523 | 0.022857 | 0.331595 |
| Kurtosis | 1.519084 | 1.378859 | 0.322367 | 0.030774 | -0.179404 | 0.168834 | 3.520454 | 0.695755 | 2.386663 | 1.061594 |

Table 5.1: Basic statistics summary of the data from April 1 2000 to March 31 2010, displayed year-wise.



From the above table we can clearly see that the skewness of the returns varies from as high as 0.445533 in 2002-03 to as low as -0.478434 in 2006-2007. Similarly, the kurtosis varies from as high as 3.520454 in 2006-2007 to as low as -0.179404 in 2004-2005. We can also infer that the year 2006-2007 showed extremes of both skewness and kurtosis, signifying non-normality.

5.3 Relation between VIX and standard deviation is as follows:

[Figure: Relationship between VIX, Mean, Skewness and Kurtosis. Series plotted: Quarterly avg VIX, Skewness, Kurtosis, Quarterly avg mean. Horizontal axis: quarterly periods.]

Figure 5.1: Relationship between VIX, Mean, Skewness and Kurtosis.


From the above diagram, we can clearly see that VIX and Mean are inversely related throughout. Kurtosis and VIX are directly related until Apr-Jun 2005, after which they start to behave inversely. The important point to note is that kurtosis reacts with a slight lag to the movements of the VIX, so we can expect the stock market to react along similar lines, based on the movement of the VIX. Kurtosis and skewness are inversely related throughout.

5.4 Relation between mean, standard deviation, kurtosis and skewness:

[Figure: Plot of mean, standard deviation, kurtosis and skewness. Series plotted: Mean, Stdev, Skewness, Kurtosis. Horizontal axis: quarterly periods.]

Figure 5.2: Relationship between Mean, standard deviation, Skewness and Kurtosis.


From the above diagram, we clearly see that until 2005 skewness and kurtosis are directly related. After the crossover, they are inversely related. The mean and standard deviation, however, show hardly any relationship with kurtosis and skewness; they overlap each other in the above graph.

5.5 Normal distribution:

At this point, we need to know about the normal distribution, as it will be used repeatedly through the rest of the paper. The normal distribution was first discovered by De Moivre in 1733, who obtained this continuous distribution as a limiting case of the binomial distribution. It was also known to Laplace by 1774, but through a historical error it has been credited to Gauss, who referred to it in 1809. The normal distribution draws its importance from the central limit theorem. The central limit theorem states that the mean of a large number of independent observations with finite variance is approximately normally distributed, whatever the distribution of the individual observations; a sample size of more than 30 is the usual rule of thumb. So the larger the sample size, the closer the distribution of the sample mean is to normal. This is because the values collected tend to hover around a particular value, the mean, and as more data is collected the variation of the sample mean around it reduces. The formal definition of the Normal distribution is as follows: A random variable X is said to have a normal distribution with parameters $\mu$ (mean) and $\sigma^2$ (variance) if its probability density function is given by the probability law:


$$f(x;\mu,\sigma) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}\right]$$

Equation5. 1: Normal distribution density function.
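As a quick check of Equation 5.1, the density can be written out term by term and compared with R's built-in `dnorm()`. This is a minimal sketch; the function name `normal_pdf` is our own illustrative choice.

```r
# Normal density written out term by term from Equation 5.1.
normal_pdf <- function(x, mu = 0, sigma = 1) {
  1 / (sigma * sqrt(2 * pi)) * exp(-0.5 * ((x - mu) / sigma)^2)
}

x <- seq(-3, 3, by = 0.5)
# Compare with the built-in dnorm(); the difference should be negligible.
max_diff <- max(abs(normal_pdf(x) - dnorm(x)))
# A density should integrate to (approximately) one.
area <- integrate(normal_pdf, lower = -Inf, upper = Inf)$value
print(c(max_diff = max_diff, area = area))
```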

Data that follow a normal distribution have a few characteristic properties, listed as follows:
1. The curve is bell-shaped and symmetrical about the line $x = \mu$.
2. All three measures of central tendency, i.e. mean, median and mode, coincide with each other.
3. As the sample size, i.e. the number of observations, increases, the distribution of the sample mean tends to get closer to normal.
There are many other characteristics that go into defining the normal distribution. The density of a standard normal distribution is as follows:

Figure5. 3: Density of a standard Normal distribution with mean=0 and standard deviation= 1



5.6 Mixture distributions:

A mixture distribution is a model in which two distributions are combined, each with its own weight, to produce a new distribution that has the characteristics of both. Given the parameters of the two distributions and their weights, the density function of the new distribution follows directly. In this paper, we mix two normal distributions to better fit the distribution of the stock index returns.

5.6.1 Mixture of Normals:

This type of distribution belongs to the family of loss distributions, as it helps in identifying the loss from the given data. A mixture of normals combines two normals, given the necessary parameters, in different proportions so as to get the best fit for the data. In this paper, we fit the stock returns with this type of distribution. We observed from the data that the distribution of stock returns for the same period of the year differs from year to year. So, to get the best fit, we mix two different normal distributions in some proportion, giving a mixture of normals that can accommodate the data falling under both component distributions. The advantage of mixture distributions is that, after first observing the nature and behavior of the data, we can construct the mixture accordingly so as to fit the distribution.

Reasons for choosing the Mixture of Normals: these models can handle issues such as
1. non-normality,
2. violation of homogeneity of variances,
3. complex random-effects structures,
4. missing data, etc.

The probability density function of mixture of two normal distributions is as follows:

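$$f(x; p, \mu_1, \sigma_1, \mu_2, \sigma_2) = \frac{1}{\sqrt{2\pi}}\left[\frac{p}{\sigma_1}\exp\left(-\frac{(x-\mu_1)^{2}}{2\sigma_1^{2}}\right) + \frac{q}{\sigma_2}\exp\left(-\frac{(x-\mu_2)^{2}}{2\sigma_2^{2}}\right)\right], \qquad q = 1 - p$$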

Equation5. 2: Probability density function of a mixture of two normals

where p and q = 1 - p are the proportions in which the two normals, with standard deviations $\sigma_1$ and $\sigma_2$ respectively, are mixed. In the initial analysis, the ten years of data were divided into yearly data, and each year's data were plotted quarter by quarter. This helps us to identify whether the distribution changes from quarter to quarter within a year, and whether the distribution for the same quarter changes across years. We consider the financial year rather than the calendar year because many companies listed on the S&P 500 follow the financial year for their business records and strategies, so their performance is reflected more accurately in the index returns over this period. The distributions obtained from these yearly returns therefore give a more realistic picture of the returns. The following graphs show the return distributions plotted quarterly for every year, to check whether the stock returns change their distribution pattern over the course of a particular year.
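To make the mixture density of Equation 5.2 concrete, the following R sketch evaluates a two-component mixture and compares its peak with that of a single normal of the same variance. The function name `mixture_pdf` and all parameter values are illustrative assumptions (a narrow and a wide component with equal means), not estimates from the data.

```r
# Two-component normal mixture density in the spirit of Equation 5.2.
# p is the weight of the first component; q = 1 - p is the weight of the second.
mixture_pdf <- function(x, p, mu1, sigma1, mu2, sigma2) {
  q <- 1 - p
  p * dnorm(x, mean = mu1, sd = sigma1) + q * dnorm(x, mean = mu2, sd = sigma2)
}

# Made-up parameters: a narrow "calm" component and a wide "turbulent" one.
step <- 1e-4
x <- seq(-0.3, 0.3, by = step)
dens <- mixture_pdf(x, p = 0.8, mu1 = 0, sigma1 = 0.01, mu2 = 0, sigma2 = 0.03)
area <- sum(dens) * step                        # Riemann sum, should be close to 1

# Single normal with the same variance (valid here because both means are zero).
mix_sd <- sqrt(0.8 * 0.01^2 + 0.2 * 0.03^2)
peak_mixture <- mixture_pdf(0, 0.8, 0, 0.01, 0, 0.03)
peak_normal  <- dnorm(0, mean = 0, sd = mix_sd)
print(c(area = area, peak_mixture = peak_mixture, peak_normal = peak_normal))
```

With the same overall variance, the mixture is more peaked at the centre than the single normal, which is the kind of shape the quarterly plots are examined for.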


Figure5. 4: Distribution for the Index returns over the period 2000-2001.




Figure5. 5: Distribution for the Index returns over the period 2001-2002.


Figure5. 6: Distribution for the Index returns over the period 2002-2003.



Figure5. 7: Distribution for the Index returns over the period 2003-2004.


Figure5. 8: Distribution for the Index returns over the period 2004-2005.




Figure5. 9: Distribution for the Index returns over the period 2005-2006.


Figure5. 10: Distribution for the Index returns over the period 2006-2007.


Figure5. 11: Distribution for the Index returns over the period 2007-2008


Figure5. 12: Distribution for the Index returns over the period 2008-2009.


Figure5. 13: Distribution for the Index returns over the period 2009-2010

From the above graphs, we can see that in the APR-JUN quarter the returns distribution changed shape; in particular, in years such as 2007-08 and 2008-09 it showed very high kurtosis, visible as peakedness. The JUL-SEPT quarter showed reasonable normality in the graphs. However, in the first year of analysis, 2000-2001, it was more peaked than all other quarters, and in 2002-2003 it had a flatter curve than all other curves in that financial year. The next quarter, OCT-DEC, is a crucial one, as this part of the year shows the effect of the entire calendar year. Over the period of study, it showed a peaked distribution in the years 2003-2004, 2004-2005 and 2006-2007 compared with the other quarters' distributions, and significant flatness in 2008-2009. The JAN-MAR quarter displayed fairly normal curves in all years except 2007-2008, in which it showed flatness. The study was done on a quarterly basis so that we could identify changes in the distributions over time. Fukuda (2009) showed that the distribution of stock returns changes over time.

5.7 Conclusions:
In this chapter, after doing the initial analysis, we found that the stock index returns do not follow a normal distribution. This was checked with the Jarque-Bera normality test, which gave a P-value of 2.2e-16. The P-value is the probability of observing a test statistic at least as extreme as the one obtained if the null hypothesis of normality were true. As this value is far below the 5% significance level, we reject the null hypothesis, which indicates that the data considered are non-normal. From the quarterly return distributions, we also found that the shapes of the distributions change over time. In the next chapter, we apply the mixture of normals to the data as a candidate for a good fit.



Development and Application of the Mixture of Normals distribution to the Observed Index returns

Application of the model

6. Application of the Mixture of Normals distribution to the Observed Index returns

6.1 Introduction:
In this chapter, we develop a model for fitting the stock index returns, based on their observed behavior over time, and apply it to the data under consideration. First, we need to check the normality of the data under consideration.

6.2 Normality of the data considered:

As discussed in Section 5.5, the central limit theorem is commonly invoked to treat samples of more than 30 observations as approximately normal. A normal distribution has the following probability density function:

$$f(x;\mu,\sigma) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}\right]$$

Equation6. 1: Probability density function of Normal distribution.

where x is the random variable, $\mu$ is the mean and $\sigma$ is the standard deviation. In our case, the data from April 1st 2000 to March 31st 2010 were plotted to check their normality. When the data are graphed in R, the figure is as follows:



Figure6. 1: Histogram of Stock returns considered for the period of Study

The above graph shows the histogram of the stock Index returns over the period considered for our study. But if we try to plot a density graph of the data to know the distribution of returns, it would be as follows:



Figure6. 2: A Normal Curve of the Stock Index returns for the period

From the above graph, one may feel that the stock returns follow a normal distribution. But this is not so, as was shown by Mandelbrot in his pioneering paper The Variation of Certain Speculative Prices (Mandelbrot 1963). The fatness of the tails, i.e. the kurtosis, of this distribution is beyond what a normal distribution allows, which may not be visible graphically. To check the normality of a distribution, several normality tests are available. One effective test is the Jarque-Bera test, which we used to check the normality of the returns distribution. The test gave a P-value of 2.2e-16 for the stock returns, which is far below the 5% significance level. We therefore reject the null hypothesis that the data follow a normal distribution. The results thus indicate non-normality of the data, caused by skewness, kurtosis or both. We then tried the mixture of normals as the next probable distribution that can help us fit the index returns.
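The Jarque-Bera statistic combines the sample skewness S and kurtosis K as JB = n/6 (S^2 + (K - 3)^2 / 4) and is compared against a chi-square distribution with two degrees of freedom. The following base-R sketch illustrates the calculation on simulated data; the function name `jarque_bera` and both samples are illustrative assumptions, and the report does not state which implementation was used (packaged versions exist, for example `jarque.bera.test` in the `tseries` package).

```r
# Hand-rolled Jarque-Bera test (hypothetical helper, for illustration only).
jarque_bera <- function(x) {
  x <- x[!is.na(x)]
  n <- length(x)
  m <- mean(x)
  s2 <- mean((x - m)^2)
  skew <- mean((x - m)^3) / s2^1.5
  kurt <- mean((x - m)^4) / s2^2
  stat <- n / 6 * (skew^2 + (kurt - 3)^2 / 4)
  # Under normality the statistic is asymptotically chi-square with 2 df.
  list(statistic = stat, p.value = pchisq(stat, df = 2, lower.tail = FALSE))
}

set.seed(42)
normal_sample <- rnorm(2500, sd = 0.01)        # simulated normal "returns"
fat_sample    <- 0.01 * rt(2500, df = 3)       # simulated fat-tailed "returns"
jb_normal <- jarque_bera(normal_sample)
jb_fat    <- jarque_bera(fat_sample)
print(c(normal = jb_normal$statistic, fat_tailed = jb_fat$statistic))
```

A fat-tailed sample produces a very large statistic and a P-value near zero, which is the pattern reported for the index returns.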

6.3 Generation of Mixture of Normals distribution:

As the above analysis shows that the stock returns do not follow a normal distribution, a new distribution is needed that can fit the stock index returns. We therefore tried the mixture of normals as the next probable option. The distribution was generated with the mixtools package in R, using a function called normalmixEM. This function fits a mixture of univariate normals by alternating between two steps, the E-step and the M-step. The function needs initial values for the proportions, means and standard deviations of the two normals being mixed. In the E-step, i.e. the Expectation step, the likelihood is evaluated at the current parameter values, starting from the given initial values.


$$\log L = \sum_{t=1}^{T} \log \sum_{i=1}^{k} p_i\, \phi(x_t; \mu_i, \sigma_i)$$

Equation6. 2: E-step function

where k = number of Gaussians, T = number of observations and $\phi$ is the Gaussian density function.



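The expected values are then maximized in the M-step, i.e. the Maximization step. The log-likelihood obtained in the E-step is maximized by updating the parameters that were given, i.e. $\mu_i$, $\sigma_i$ and $p_i$. Writing $P(i \mid x_t)$ for the posterior probability, computed at the current parameter values, that observation $x_t$ belongs to component i, the updates are:

$$\mu_i^{(n+1)} = \frac{\sum_{t=1}^{T} P(i \mid x_t)\, x_t}{\sum_{t=1}^{T} P(i \mid x_t)}$$

Equation6. 3: M-step for mean value

$$\sigma_i^{(n+1)} = \sqrt{\frac{\sum_{t=1}^{T} P(i \mid x_t)\,\left(x_t - \mu_i^{(n+1)}\right)^{2}}{\sum_{t=1}^{T} P(i \mid x_t)}}$$

Equation6. 4: M-step for standard deviation value

$$p_i^{(n+1)} = \frac{1}{T}\sum_{t=1}^{T} P(i \mid x_t)$$

Equation6. 5: M-step for weights or Proportions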

The Expectation-Maximization algorithm is run iteratively until it converges; no iteration decreases the likelihood, so the fit improves over the iterations. The function requires initial values for the proportions in which the univariate normals are mixed and for the mean and standard deviation of each normal, and it returns the maximum likelihood estimates (MLE) of these parameters. We can also choose whether a location mixture or a scale mixture is fitted for the univariate normals.
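The E-step and M-step described above can be sketched in a few lines of base R. This is a minimal illustration under our own assumptions, not the internal code of normalmixEM: the function name `em_two_normals`, the stopping rule and the simulated data are all hypothetical.

```r
# Minimal EM for a two-component normal mixture (illustrative sketch only).
em_two_normals <- function(x, lambda, mu, sigma, tol = 1e-8, max_iter = 1000) {
  loglik_old <- -Inf
  for (iter in seq_len(max_iter)) {
    # E-step: weighted component densities and posterior probabilities
    d1 <- lambda[1] * dnorm(x, mu[1], sigma[1])
    d2 <- lambda[2] * dnorm(x, mu[2], sigma[2])
    total <- d1 + d2
    r1 <- d1 / total
    r2 <- d2 / total
    loglik <- sum(log(total))        # log-likelihood at the current parameters
    # M-step: update proportions, means and standard deviations
    lambda <- c(mean(r1), mean(r2))
    mu <- c(sum(r1 * x) / sum(r1), sum(r2 * x) / sum(r2))
    sigma <- c(sqrt(sum(r1 * (x - mu[1])^2) / sum(r1)),
               sqrt(sum(r2 * (x - mu[2])^2) / sum(r2)))
    if (abs(loglik - loglik_old) < tol) break
    loglik_old <- loglik
  }
  list(lambda = lambda, mu = mu, sigma = sigma, loglik = loglik, iterations = iter)
}

set.seed(123)
# Simulated data: 80% from a narrow normal, 20% from a wide normal.
x <- c(rnorm(4000, 0, 0.008), rnorm(1000, 0, 0.025))
fit <- em_two_normals(x, lambda = c(0.5, 0.5), mu = c(0, 0), sigma = c(0.005, 0.03))
print(fit[c("lambda", "mu", "sigma", "iterations")])
```

The starting values play the same role as the initial proportions, means and standard deviations passed to the fitting function; different starting values can lead to a different number of iterations.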



6.4 Application of the Mixture of Normals distribution to the Stock Index returns:
The above mixture of normals distribution was applied to the observed stock index returns to check the descriptive validity of the mixture of normals relative to the normal distribution. The function normalmixEM was given the full ten years of returns data along with the mean and standard deviation, with initial mixing proportions of fifty percent each. With these inputs, the normals fitted to the stock index returns for the period are displayed as follows:

Figure6. 3: Normals generated from EM algorithm.



The R code for the entire function is available in the Appendices. When the stock returns data were run through this function, it gave the following output:

$lambda
[1] 0.7863768, 0.2136232
$mu
[1] 0.0003096069, -0.0016108368
$sigma
[1] 0.008251659, 0.025431399

The output says that the mixing proportions (lambda) of the two normals are approximately 0.786 and 0.214, with the means (mu) and standard deviations (sigma) as displayed. To get a better picture of the final mixture of normals density curve, we mixed the two normals in the proportions given by the EM algorithm, with the specified means and standard deviations. The resulting output is as follows:



Figure6. 4: Mixture of two normals

As the above graph shows, the two univariate normals are mixed to form one mixture of normals density curve. The resulting mixture has the characteristics of both component normals, and so it is able to describe the data falling under both curves. The advantage of using a mixture of normals is that we specify initial proportions for mixing, which are then optimized over the iterations to describe the data under consideration. When the data were run through the function normalmixEM, the number of iterations varied from 100 to 160. As the iterations proceed, the fit of the generated distribution to the data improves.
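The moments implied by a fitted mixture follow directly from its parameters. As a worked illustration, the R sketch below plugs in the lambda, mu and sigma values reported in this section and computes the mean, variance, skewness and kurtosis of the resulting mixture. The variable names are our own, and the formulas are the standard central moments of a normal component taken about the mixture mean.

```r
# Fitted parameters as reported in the EM output.
lambda <- c(0.7863768, 0.2136232)
mu     <- c(0.0003096069, -0.0016108368)
sigma  <- c(0.008251659, 0.025431399)

mix_mean <- sum(lambda * mu)
d <- mu - mix_mean                               # component means about the mixture mean
mix_var <- sum(lambda * (sigma^2 + d^2))
# Third and fourth central moments of the mixture
m3 <- sum(lambda * (d^3 + 3 * d * sigma^2))
m4 <- sum(lambda * (d^4 + 6 * d^2 * sigma^2 + 3 * sigma^4))
mix_skew <- m3 / mix_var^1.5
mix_kurt <- m4 / mix_var^2                       # equals 3 for a single normal
print(c(mean = mix_mean, sd = sqrt(mix_var), skewness = mix_skew, kurtosis = mix_kurt))
```

A kurtosis above 3 and a negative skewness here indicate that a mixture with these parameters is fatter-tailed than a single normal and slightly skewed to the left.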


6.5 Testing the efficiency of Mixture of normals over the normal distribution:
In testing the efficiency of the mixture of normals over the normal, we can use either a graphical method or a numerical method. In this paper, we use the graphical method to show that the mixture of normals describes the stock index returns better than the normal distribution. When the mixture of normals is plotted with the parameters suggested by the EM algorithm, we get the curve displayed in the above graph. To test its descriptive validity against the normal distribution, we need to plot the two together. When we plot the stock returns along with the mixture of normals and the normal distribution curve in R, we get the following output:

Figure6. 5: Comparison of Mixture of normals and Normal distribution.



From the above graph, it is evident that the mixture normal distribution fits the stock index returns for the period under consideration better than the normal distribution. The mixture normal distribution is able to capture the leptokurtosis and skewness in the data, as the graph shows.
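A numerical comparison, which this paper does not carry out, could be based on the log-likelihood or on the Akaike information criterion (AIC). The sketch below is only an illustration on simulated data: the sample is drawn from a mixture with the parameters reported in Section 6.4, and the mixture likelihood is evaluated at those generating parameters rather than at re-estimated ones. It does not reproduce any result on the actual index returns.

```r
set.seed(2010)
lambda <- c(0.7863768, 0.2136232)
mu     <- c(0.0003096069, -0.0016108368)
sigma  <- c(0.008251659, 0.025431399)

# Simulate "returns" from the two-component mixture.
n <- 2500
comp <- sample(1:2, n, replace = TRUE, prob = lambda)
x <- rnorm(n, mean = mu[comp], sd = sigma[comp])

# Log-likelihood under a single normal with maximum likelihood parameters.
sd_mle <- sqrt(mean((x - mean(x))^2))
loglik_normal <- sum(dnorm(x, mean(x), sd_mle, log = TRUE))

# Log-likelihood under the mixture.
loglik_mix <- sum(log(lambda[1] * dnorm(x, mu[1], sigma[1]) +
                      lambda[2] * dnorm(x, mu[2], sigma[2])))

# AIC = 2 * (number of parameters) - 2 * log-likelihood; lower is better.
aic_normal <- 2 * 2 - 2 * loglik_normal
aic_mix    <- 2 * 5 - 2 * loglik_mix
print(c(aic_normal = aic_normal, aic_mix = aic_mix))
```

The mixture has five free parameters against two for the normal, so the AIC penalises it more heavily; it is preferred only if the gain in log-likelihood outweighs that penalty.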

6.6 Modeling Changes in the regime using Markov Switching Model:

A Markov switching model is a model of asset returns that can incorporate stochastic volatility components. In the present analysis, we used this model to identify changes in regime during the period of study. Such changes in regime may have various causes in the economy at large. For the current study we considered monthly price data, as weekly data have too many data points for the graph to be readable and quarterly data have too few. The monthly stock price data were plotted using the package RcompHam94, which is based on the work of James D. Hamilton. This package is not available on CRAN, but it can be downloaded from its website. When plotted, the figure looked as follows:



Figure6. 6: Modeling Changes in the regime over the period using Stock Index prices

From the above graph, we can see a few regions with spikes in the regime probabilities, which mark changes in the distribution pattern of the stock prices and, in turn, of their returns. These spikes may have many causes in the economy. We clearly observe that during 2008-2009 there is a long-lasting spike, which can be attributed to the financial crisis of that period. The main purpose of plotting this graph is to see how sensitive the stock index returns are to the continual changes in the economy.
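The regime probabilities in such a plot come from a filter that updates, period by period, the probability of being in each regime given the data seen so far. The R sketch below shows this idea for a deliberately simplified two-state model with regime-specific mean and standard deviation and no autoregressive lags, which is simpler than the lagged model in the appendix code. The function name `hamilton_filter`, the transition matrix and all parameter values are illustrative assumptions, and the data are simulated.

```r
# Simplified two-state filter (illustrative; no autoregressive terms).
# P[i, j] = Pr(next regime = j | current regime = i)
hamilton_filter <- function(y, P, mu, sigma, init = c(0.5, 0.5)) {
  n <- length(y)
  filtered <- matrix(NA_real_, nrow = n, ncol = 2)
  pred <- init                                    # regime probabilities before seeing y[1]
  for (t in seq_len(n)) {
    lik <- pred * dnorm(y[t], mean = mu, sd = sigma)   # joint weight of regime and y[t]
    filtered[t, ] <- lik / sum(lik)                    # Pr(regime | data up to t)
    pred <- as.vector(t(P) %*% filtered[t, ])          # one-step-ahead regime probabilities
  }
  filtered
}

set.seed(7)
P <- matrix(c(0.95, 0.05,
              0.10, 0.90), nrow = 2, byrow = TRUE)
mu <- c(1, -1)        # made-up monthly growth rates (percent) per regime
sigma <- c(3, 8)      # made-up volatilities per regime

# Simulate a regime path and monthly growth rates.
n <- 120
state <- integer(n)
state[1] <- 1L
for (t in 2:n) state[t] <- sample(1:2, 1, prob = P[state[t - 1], ])
y <- rnorm(n, mean = mu[state], sd = sigma[state])

prob <- hamilton_filter(y, P, mu, sigma)
plot(prob[, 2], type = "l", xlab = "month", ylab = "Pr(regime 2)")
```

In this sketch the parameters are taken as known; in practice they are estimated by maximizing the likelihood that the filter produces.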



6.7 Conclusion:
From the above analysis, we conclude that the mixture of normals provides a better fit than the normal distribution in describing the stock index returns. We also found that there is switching of regimes in the stock prices over the period of study. We also examined the relation between the VIX, mean, skewness and kurtosis, which helps in understanding the dependency among them.



Summary and Conclusions


The R-code that is used to generate the above results:

#************************include required libraries*********
require(PerformanceAnalytics)
require(quantmod)
require(xts)
require(mixtools)
require(RcompHam94)
require(dynlm)

# Download the S&P 500 index and compute daily returns from the closing prices
getSymbols("^GSPC",from="2000-04-01",to="2010-03-31")
stock<-ROC(Cl(GSPC))
# The original statement here was garbled ("stock[] =0"); replacing the
# missing first value produced by ROC() with zero is assumed
stock[is.na(stock)] = 0

#*****User Input Begins********
# Overlay the kernel density curves of every series in the list s
# (density() objects have only x and y components, so the unused z and a
# accumulators of the original were dropped)
plot.multi.dens <- function(s)
{
  junk.x = NULL
  junk.y = NULL
  for(i in 1:length(s)) {
    junk.x = c(junk.x, density(s[[i]])$x)
    junk.y = c(junk.y, density(s[[i]])$y)
  }
  xr <- range(junk.x)
  yr <- range(junk.y)
  plot(density(s[[1]]), xlim = xr, ylim = yr, main = "Returns distribution",ylab="Density")
  for(i in 1:length(s)) {
    lines(density(s[[i]]), xlim = xr, ylim = yr, col = i)
  }
}

# For 2000-01
aprjun00=stock[1:63]
julsept00=stock[64:126]
octdec00=stock[127:189]
janmar01=stock[190:251]

# rnorm() is assumed here: the original called dnorm(), which has no n argument
aprjun00=rnorm(n=length(aprjun00),mean=mean(aprjun00),sd=sd(aprjun00))
julsept00=rnorm(n=length(julsept00),mean=mean(julsept00),sd=sd(julsept00))
octdec00=rnorm(n=length(octdec00),mean=mean(octdec00),sd=sd(octdec00))
janmar01=rnorm(n=length(janmar01),mean=mean(janmar01),sd=sd(janmar01))

plot.multi.dens(list(aprjun00,julsept00,octdec00,janmar01))
legend("topright",legend=c("aprjun00","julsept00","octdec00","janmar01"), col=(1:4), lwd=2, lty = 1,cex=0.60,xjust=1)

year00_01=rbind(aprjun00,julsept00,octdec00,janmar01)
colnames(year00_01)="2000_01"

# For 2001-02
aprjun01=stock[252:314]
julsept01=stock[315:373]
octdec01=stock[374:437]
janmar02=stock[438:497]

# rnorm() is assumed here: the original called dnorm(), which has no n argument
aprjun01=rnorm(n=length(aprjun01),mean=mean(aprjun01),sd=sd(aprjun01))
julsept01=rnorm(n=length(julsept01),mean=mean(julsept01),sd=sd(julsept01))
octdec01=rnorm(n=length(octdec01),mean=mean(octdec01),sd=sd(octdec01))
janmar02=rnorm(n=length(janmar02),mean=mean(janmar02),sd=sd(janmar02))

# The plot call is missing in the original for this year; it is restored by analogy with the other years
plot.multi.dens(list(aprjun01,julsept01,octdec01,janmar02))
legend("topright",legend=c("aprjun01","julsept01","octdec01","janmar02"), col=(1:4), lwd=2, lty = 1,cex=0.60,xjust=1)

year01_02=rbind(aprjun01,julsept01,octdec01,janmar02)
colnames(year01_02)="2001_02"

# For 2002-03
aprjun02=stock[498:561]
julsept02=stock[562:625]
octdec02=stock[626:689]
janmar03=stock[690:750]

# rnorm() is assumed here: the original called dnorm(), which has no n argument
aprjun02=rnorm(n=length(aprjun02),mean=mean(aprjun02),sd=sd(aprjun02))
julsept02=rnorm(n=length(julsept02),mean=mean(julsept02),sd=sd(julsept02))
octdec02=rnorm(n=length(octdec02),mean=mean(octdec02),sd=sd(octdec02))
janmar03=rnorm(n=length(janmar03),mean=mean(janmar03),sd=sd(janmar03))

plot.multi.dens(list(aprjun02,julsept02,octdec02,janmar03))
legend("topright",legend=c("aprjun02","julsept02","octdec02","janmar03"), col=(1:4), lwd=2, lty = 1,cex=0.60,xjust=1)

year02_03=rbind(aprjun02,julsept02,octdec02,janmar03)
colnames(year02_03)="2002_03"

# For 2003-04
aprjun03=stock[751:813]
julsept03=stock[814:877]
octdec03=stock[878:941]
janmar04=stock[942:1003]

# rnorm() is assumed here: the original called dnorm(), which has no n argument
aprjun03=rnorm(n=length(aprjun03),mean=mean(aprjun03),sd=sd(aprjun03))
julsept03=rnorm(n=length(julsept03),mean=mean(julsept03),sd=sd(julsept03))
octdec03=rnorm(n=length(octdec03),mean=mean(octdec03),sd=sd(octdec03))
janmar04=rnorm(n=length(janmar04),mean=mean(janmar04),sd=sd(janmar04))

plot.multi.dens(list(aprjun03,julsept03,octdec03,janmar04))
legend("topright",legend=c("aprjun03","julsept03","octdec03","janmar04"), col=(1:4), lwd=2, lty = 1,cex=0.60,xjust=1)

year03_04=rbind(aprjun03,julsept03,octdec03,janmar04)
colnames(year03_04)="2003_04"

# For 2004-05
aprjun04=stock[1004:1065]
julsept04=stock[1066:1129]
octdec04=stock[1130:1193]
janmar05=stock[1194:1254]

# rnorm() is assumed here: the original called dnorm(), which has no n argument
aprjun04=rnorm(n=length(aprjun04),mean=mean(aprjun04),sd=sd(aprjun04))
julsept04=rnorm(n=length(julsept04),mean=mean(julsept04),sd=sd(julsept04))
octdec04=rnorm(n=length(octdec04),mean=mean(octdec04),sd=sd(octdec04))
janmar05=rnorm(n=length(janmar05),mean=mean(janmar05),sd=sd(janmar05))

plot.multi.dens(list(aprjun04,julsept04,octdec04,janmar05))
legend("topright",legend=c("aprjun04","julsept04","octdec04","janmar05"), col=(1:4), lwd=2, lty = 1,cex=0.60,xjust=1)

year04_05=rbind(aprjun04,julsept04,octdec04,janmar05)
colnames(year04_05)="2004_05"

# For 2005-06
aprjun05=stock[1255:1318]
julsept05=stock[1319:1382]
octdec05=stock[1383:1445]
janmar06=stock[1446:1507]

# rnorm() is assumed here: the original called dnorm(), which has no n argument
aprjun05=rnorm(n=length(aprjun05),mean=mean(aprjun05),sd=sd(aprjun05))
julsept05=rnorm(n=length(julsept05),mean=mean(julsept05),sd=sd(julsept05))
octdec05=rnorm(n=length(octdec05),mean=mean(octdec05),sd=sd(octdec05))
janmar06=rnorm(n=length(janmar06),mean=mean(janmar06),sd=sd(janmar06))

plot.multi.dens(list(aprjun05,julsept05,octdec05,janmar06))
legend("topright",legend=c("aprjun05","julsept05","octdec05","janmar06"), col=(1:4), lwd=2, lty = 1,cex=0.60,xjust=1)

year05_06=rbind(aprjun05,julsept05,octdec05,janmar06)
colnames(year05_06)="2005_06"

# For 2006-07
aprjun06=stock[1508:1570]
julsept06=stock[1571:1633]
octdec06=stock[1634:1696]
janmar07=stock[1697:1757]

# rnorm() is assumed here: the original called dnorm(), which has no n argument
aprjun06=rnorm(n=length(aprjun06),mean=mean(aprjun06),sd=sd(aprjun06))
julsept06=rnorm(n=length(julsept06),mean=mean(julsept06),sd=sd(julsept06))
octdec06=rnorm(n=length(octdec06),mean=mean(octdec06),sd=sd(octdec06))
janmar07=rnorm(n=length(janmar07),mean=mean(janmar07),sd=sd(janmar07))

plot.multi.dens(list(aprjun06,julsept06,octdec06,janmar07))
legend("topright",legend=c("aprjun06","julsept06","octdec06","janmar07"), col=(1:4), lwd=2, lty = 1,cex=0.60,xjust=1)

year06_07=rbind(aprjun06,julsept06,octdec06,janmar07)
colnames(year06_07)="2006_07"

# For 2007-08
aprjun07=stock[1758:1820]
julsept07=stock[1821:1883]
octdec07=stock[1884:1947]
janmar08=stock[1948:2008]

# rnorm() is assumed here: the original called dnorm(), which has no n argument
aprjun07=rnorm(n=length(aprjun07),mean=mean(aprjun07),sd=sd(aprjun07))
julsept07=rnorm(n=length(julsept07),mean=mean(julsept07),sd=sd(julsept07))
octdec07=rnorm(n=length(octdec07),mean=mean(octdec07),sd=sd(octdec07))
janmar08=rnorm(n=length(janmar08),mean=mean(janmar08),sd=sd(janmar08))

plot.multi.dens(list(aprjun07,julsept07,octdec07,janmar08))
legend("topright",legend=c("aprjun07","julsept07","octdec07","janmar08"), col=(1:4), lwd=2, lty = 1,cex=0.60,xjust=1)

year07_08=rbind(aprjun07,julsept07,octdec07,janmar08)
colnames(year07_08)="2007_08"

# For 2008-09
aprjun08=stock[2009:2072]
julsept08=stock[2073:2136]
octdec08=stock[2137:2200]
janmar09=stock[2201:2261]

# rnorm() is assumed here: the original called dnorm(), which has no n argument
aprjun08=rnorm(n=length(aprjun08),mean=mean(aprjun08),sd=sd(aprjun08))
julsept08=rnorm(n=length(julsept08),mean=mean(julsept08),sd=sd(julsept08))
octdec08=rnorm(n=length(octdec08),mean=mean(octdec08),sd=sd(octdec08))
janmar09=rnorm(n=length(janmar09),mean=mean(janmar09),sd=sd(janmar09))

plot.multi.dens(list(aprjun08,julsept08,octdec08,janmar09))
legend("topright",legend=c("aprjun08","julsept08","octdec08","janmar09"), col=(1:4), lwd=2, lty = 1,cex=0.60,xjust=1)

year08_09=rbind(aprjun08,julsept08,octdec08,janmar09)
colnames(year08_09)="2008_09"

# For 2009-10
aprjun09=stock[2262:2324]
julsept09=stock[2325:2388]
octdec09=stock[2389:2452]
janmar10=stock[2453:2513]

# rnorm() is assumed here: the original called dnorm(), which has no n argument
aprjun09=rnorm(n=length(aprjun09),mean=mean(aprjun09),sd=sd(aprjun09))
julsept09=rnorm(n=length(julsept09),mean=mean(julsept09),sd=sd(julsept09))
octdec09=rnorm(n=length(octdec09),mean=mean(octdec09),sd=sd(octdec09))
janmar10=rnorm(n=length(janmar10),mean=mean(janmar10),sd=sd(janmar10))

plot.multi.dens(list(aprjun09,julsept09,octdec09,janmar10))
legend("topright",legend=c("aprjun09","julsept09","octdec09","janmar10"), col=(1:4), lwd=2, lty = 1,cex=0.60,xjust=1)

year09_10=rbind(aprjun09,julsept09,octdec09,janmar10)
colnames(year09_10)="2009_10"


#*******************USER DEFINED FUNCTION ENDS******#

# Yearly summary statistics
require(fBasics)
summary00_01=basicStats(year00_01)
summary01_02=basicStats(year01_02)
summary02_03=basicStats(year02_03)
summary03_04=basicStats(year03_04)
summary04_05=basicStats(year04_05)
summary05_06=basicStats(year05_06)
summary06_07=basicStats(year06_07)
summary07_08=basicStats(year07_08)
summary08_09=basicStats(year08_09)
summary09_10=basicStats(year09_10)
summary=data.frame(summary00_01,summary01_02,summary02_03,summary03_04,summary04_05,summary05_06,summary06_07,summary07_08,summary08_09,summary09_10)
colnames(summary)=c("2000_01","2001_02","2002_03","2003_04","2004_05","2005_06","2006_07","2007_08","2008_09","2009_10")

# Quarterly summary statistics, one block per financial year
summary_aprjun00=basicStats(aprjun00)
summary_julsept00=basicStats(julsept00)
summary_octdec00=basicStats(octdec00)
summary_janmar01=basicStats(janmar01)
quarterlysummary00_01=data.frame(summary_aprjun00,summary_julsept00,summary_octdec00,summary_janmar01)
colnames(quarterlysummary00_01)=c("aprjun00","julsept00","octdec00","janmar01")

summary_aprjun01=basicStats(aprjun01)
summary_julsept01=basicStats(julsept01)
summary_octdec01=basicStats(octdec01)
summary_janmar02=basicStats(janmar02)
quarterlysummary01_02=data.frame(summary_aprjun01,summary_julsept01,summary_octdec01,summary_janmar02)
colnames(quarterlysummary01_02)=c("aprjun01","julsept01","octdec01","janmar02")

summary_aprjun02=basicStats(aprjun02)
summary_julsept02=basicStats(julsept02)
summary_octdec02=basicStats(octdec02)
summary_janmar03=basicStats(janmar03)
quarterlysummary02_03=data.frame(summary_aprjun02,summary_julsept02,summary_octdec02,summary_janmar03)
colnames(quarterlysummary02_03)=c("aprjun02","julsept02","octdec02","janmar03")

summary_aprjun03=basicStats(aprjun03)
summary_julsept03=basicStats(julsept03)
summary_octdec03=basicStats(octdec03)
summary_janmar04=basicStats(janmar04)
quarterlysummary03_04=data.frame(summary_aprjun03,summary_julsept03,summary_octdec03,summary_janmar04)
colnames(quarterlysummary03_04)=c("aprjun03","julsept03","octdec03","janmar04")

summary_aprjun04=basicStats(aprjun04)
summary_julsept04=basicStats(julsept04)
summary_octdec04=basicStats(octdec04)
summary_janmar05=basicStats(janmar05)
quarterlysummary04_05=data.frame(summary_aprjun04,summary_julsept04,summary_octdec04,summary_janmar05)
colnames(quarterlysummary04_05)=c("aprjun04","julsept04","octdec04","janmar05")

summary_aprjun05=basicStats(aprjun05)
summary_julsept05=basicStats(julsept05)
summary_octdec05=basicStats(octdec05)
summary_janmar06=basicStats(janmar06)
quarterlysummary05_06=data.frame(summary_aprjun05,summary_julsept05,summary_octdec05,summary_janmar06)
colnames(quarterlysummary05_06)=c("aprjun05","julsept05","octdec05","janmar06")

summary_aprjun06=basicStats(aprjun06)
summary_julsept06=basicStats(julsept06)
summary_octdec06=basicStats(octdec06)
summary_janmar07=basicStats(janmar07)
quarterlysummary06_07=data.frame(summary_aprjun06,summary_julsept06,summary_octdec06,summary_janmar07)
colnames(quarterlysummary06_07)=c("aprjun06","julsept06","octdec06","janmar07")

summary_aprjun07=basicStats(aprjun07)
summary_julsept07=basicStats(julsept07)
summary_octdec07=basicStats(octdec07)
summary_janmar08=basicStats(janmar08)
quarterlysummary07_08=data.frame(summary_aprjun07,summary_julsept07,summary_octdec07,summary_janmar08)
colnames(quarterlysummary07_08)=c("aprjun07","julsept07","octdec07","janmar08")

summary_aprjun08=basicStats(aprjun08)
summary_julsept08=basicStats(julsept08)
summary_octdec08=basicStats(octdec08)
summary_janmar09=basicStats(janmar09)
quarterlysummary08_09=data.frame(summary_aprjun08,summary_julsept08,summary_octdec08,summary_janmar09)
colnames(quarterlysummary08_09)=c("aprjun08","julsept08","octdec08","janmar09")

summary_aprjun09=basicStats(aprjun09)
summary_julsept09=basicStats(julsept09)
summary_octdec09=basicStats(octdec09)
summary_janmar10=basicStats(janmar10)
quarterlysummary09_10=data.frame(summary_aprjun09,summary_julsept09,summary_octdec09,summary_janmar10)
colnames(quarterlysummary09_10)=c("aprjun09","julsept09","octdec09","janmar10")

quarterlysummary=data.frame(quarterlysummary00_01,quarterlysummary01_02,quarterlysummary02_03,quarterlysummary03_04,quarterlysummary04_05,quarterlysummary05_06,quarterlysummary06_07,quarterlysummary07_08,quarterlysummary08_09,quarterlysummary09_10)

# Normal curve distribution for the stock.
h<-hist(stock,breaks=15)
xhist<-c(min(h$breaks),h$breaks)
yhist<-c(0,h$density,0)
xfit<-seq(min(stock),max(stock),length=2500)
yfit<-dnorm(xfit,mean=mean(stock),sd=sd(stock))
plot(xhist,yhist,type="s",ylim=c(0,max(yhist,yfit)),main="Normal pdf and histogram")
lines(xfit,yfit,col="red")

# Mixing normals
s=normalmixEM(x=stock,lambda=c(0.5,0.5),mu=c(0.000643,0.000442),sigma=NULL,k=2,epsilon=0.00000001,maxit=1000000,arbmean=TRUE,arbvar=TRUE)

s[c("lambda","mu","sigma")]
plot(s,whichplots=2,main="Normals generated from EM algorithm")
legend("topright",legend=c("Normal 1","Normal 2"), col=(1:2), lwd=2, lty = 1,cex=0.60,xjust=1)

# Simulating from the mixture of normals
n<-5000
lambda<-c(0.7863768,0.2136232)
mu<-c(0.0003096069,-0.0016108368)
sigma<-c(0.008251659,0.025431400)
# The name of the simulated sample was lost in extraction; "mixture" is a placeholder
mixture<-rnormmix(n, lambda, mu, sigma)

plot(s,whichplots=2)
lines(density(mixture),col="black")
legend("topright",legend=c("Mixture","Normal 1","Normal 2"), col=(1:3), lwd=2, lty = 1,cex=0.60,xjust=1)

# Comparison of Normal versus Mixture normal
plot(xhist,yhist,type="s",ylim=c(0,40),main="Comparison of Normal And Mixture normal")
lines(xfit,yfit,col="red")
lines(density(mixture),col="black")
legend("topright",legend=c("Mixture normal","Normal"), col=(1:2), lwd=2, lty = 1,cex=0.60,xjust=1)

# Modeling changes in the regime over the period using Markov switching models
z=GSPC$GSPC.Adjusted
z=to.monthly(z)
z=z$z.Close
z=as.zooreg(z)
selection <- window(z, start = c(2000,4),frequency=12)
# Monthly growth rate in percent
g <- diff(100 * log(as.vector(selection[, "z.Close"])))
d <- index(selection[-1])

nlags <- 4
nstates <- 2^(nlags + 1)
# Regime (1 or 2) at each of the current and lagged dates for every composite state
lagstate <- 1 + outer(1:nstates, 1:(nlags + 1), FUN = function(i,j) {
  trunc((i - 1)/2^(nlags + 1 - j))%%2
})
head(lagstate)

# Index into the vector of transition probabilities for every pair of composite states
transit <- outer(X = 1:nstates, Y = 1:nstates, FUN = function(i,j) {
  ((2 * lagstate[i, 1] + lagstate[j, 1] - 1) - 1) * (((i - 1)%%(2^nlags)) == trunc((j - 1)/2)) + 1
})
head(transit)

# Log-likelihood, filtered and smoothed regime probabilities for parameters THETA
infer.regimes <- function(THETA, YT) {
  phi <- THETA[grep("phi.*", names(THETA))]
  mu <- THETA[grep("mu.*", names(THETA))]
  sigma <- THETA["sigma"]
  p11star <- THETA["p11star"]
  p22star <- THETA["p22star"]
  T <- length(YT)
  tp <- c(0, p11star, 1 - p22star, 1 - p11star, p22star)
  P <- array(tp[transit], c(nstates, nstates))
  A <- rbind(diag(nstates) - P, rep(1, nstates))
  ergodic.pi <- (solve(t(A) %*% A) %*% t(A))[, nstates + 1]
  xi.t.t <- ergodic.pi %o% rep(1, nlags)
  xi.t.t_1 <- xi.t.t
  log.likelihood <- 0
  for (tt in (nlags + 1):T) {
    residuals <- as.vector(((rep(1, nstates) %o% YT[tt:(tt - nlags)]) - array(mu[lagstate], c(nstates, nlags + 1))) %*% c(1, -phi))
    eta.t <- dnorm(residuals, mean = 0, sd = sigma)
    fp <- eta.t * xi.t.t_1[, tt - 1]
    fpt <- sum(fp)
    xi.t.t <- cbind(xi.t.t, fp/fpt)
    log.likelihood <- log.likelihood + log(fpt)
    xi.t.t_1 <- cbind(xi.t.t_1, P %*% xi.t.t[, tt])
  }
  xi.t.T <- xi.t.t[, T] %o% 1
  for (tt in (T - 1):1)
    xi.t.T <- cbind(xi.t.t[, tt] * (t(P) %*% (xi.t.T[, 1]/xi.t.t_1[, tt])), xi.t.T)
  list(log.likelihood = log.likelihood, xi.t.t = xi.t.t, xi.t.T = xi.t.T)
}

# Starting values: AR(4) coefficients and residual standard deviation from a linear fit
g.lm <- dynlm(g ~ 1 + L(g, 1:4), data = zooreg(data.frame(g = g)))
THETA <- c(p11star = 0.85, p22star = 0.7, mu = c(1, 0), phi = as.vector(g.lm$coefficients[1 + (1:nlags)]), sigma = summary(g.lm)$sigma)

# Negative log-likelihood, to be minimized by optim()
objective <- function(THETA, YT) {
  -infer.regimes(THETA, YT)$log.likelihood
}
optimizer.results <- optim(par = THETA, hessian = TRUE, fn = objective, gr = NULL, YT = as.vector(g), method = "BFGS")
se <- diag(solve(optimizer.results$hessian))^0.5
print(optimizer.results$par)

regimes <- infer.regimes(optimizer.results$par, as.vector(g))
recession.probability <- as.vector((1:nstates > nstates/2) %*% regimes$xi.t.t)
smoothed.recession.probability <- as.vector((1:nstates > nstates/2) %*% regimes$xi.t.T)

# par(mfrow=c(2,1))
# plot(recession.probability,type="l")
plot(smoothed.recession.probability,type="l",main="Changes in the regime over the period",xlab="years",ylab="smoothed probabilities")

