

Asset Management

Paolo Vanini

University of Basel

May 17, 2019


Contents
1 Introduction 5
2 Fundamentals 11
2.1 Wealth of Nations and Assets under Management (AuM) . . . . . . . . . 13
2.2 Investors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.2.1 Sovereign Wealth Funds (SWFs) . . . . . . . . . . . . . . . . . . . 16
2.2.2 Pension Funds . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.2.3 Management of Pension Funds . . . . . . . . . . . . . . . . . . . . 21
2.2.4 TAA Construction . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
2.2.5 Forwards and Futures . . . . . . . . . . . . . . . . . . . . . . . . . 27
2.2.6 Private Investors . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
2.3 Returns and Performance Attribution . . . . . . . . . . . . . . . . . . . . 36
2.3.1 Time Value of Money (TVM) . . . . . . . . . . . . . . . . . . . . . 36
2.3.2 Interest Rate Swaps . . . . . . . . . . . . . . . . . . . . . . . . . . 40
2.3.3 Forward Rate Agreements . . . . . . . . . . . . . . . . . . . . . . . 44
2.3.4 Constructing Discount Factors . . . . . . . . . . . . . . . . . . . . 46
2.3.5 Return Bookkeeping . . . . . . . . . . . . . . . . . . . . . . . . . . 52
2.3.6 Returns and Rebalancing . . . . . . . . . . . . . . . . . . . . . . . 53
2.3.7 Stochastic Portfolio Theory (SPT) . . . . . . . . . . . . . . . . . . 60
2.3.8 Return Attribution . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
2.3.9 Returns and Leverage . . . . . . . . . . . . . . . . . . . . . . . . . 68
2.4 The Efficient Market Hypothesis (EMH) . . . . . . . . . . . . . . . . . 70
2.4.1 Predictability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
2.4.2 Long Time Horizon Predictions . . . . . . . . . . . . . . . . . . . . 81
2.4.3 Cross-Sectional vs Time Series Predictability . . . . . . . . . . . . 83
2.4.4 EMH Extensions and Critique . . . . . . . . . . . . . . . . . . . . . 84
2.5 Who Decides? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
2.5.1 MiFID II . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
2.5.2 Investment Process for Retail Clients . . . . . . . . . . . . . . . . . 91
2.5.3 Mandate Solutions for Pension Funds . . . . . . . . . . . . . . . . . 93
2.5.4 Conduct Risk . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
2.6 Risk, Return, Diversification and Reward-Risk Ratios . . . . . . . . . . 97
2.6.1 Long-term Risk and Return Distribution . . . . . . . . . . . . . . . 98

2.6.2 Diversification of Assets - Portfolios . . . . . . . . . . . . . . . 98
2.6.3 Two Mathematical Facts About Diversification . . . . . . . . . . . 102
2.6.4 Risk Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
2.6.5 Concentration and Diversity . . . . . . . . . . . . . . . . . . . . . . 106
2.6.6 Reward-Risk Ratio (RR) . . . . . . . . . . . . . . . . . . . . . . . . 110
2.6.7 Risk Scaling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
2.6.8 Costs and Performance . . . . . . . . . . . . . . . . . . . . . . . . . 112
2.6.9 Passive versus Active Investment; a First Step . . . . . . . . . . . . 113

3 Portfolio Construction 117


3.1 Steps in Portfolio Construction . . . . . . . . . . . . . . . . . . . . . . . . 117
3.2 Allocation - Foundations of Investment Decisions . . . . . . . . . . . . . . 118
3.2.1 Statistical Models . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
3.2.2 Rational Dynamic Decision Making . . . . . . . . . . . . . . . . . . 123
3.2.3 Growth Optimal Portfolios . . . . . . . . . . . . . . . . . . . . . . 124
3.2.4 Heuristic Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126
3.3 Portfolio Construction Examples . . . . . . . . . . . . . . . . . . . . . . . 128
3.3.1 Heuristic Allocation: Static 60/40 Portfolio . . . . . . . . . . . . . 128
3.3.2 Optimal Allocation: Dynamic Merton Model . . . . . . . . . . . . 130
3.3.3 Optimal Allocation: Goal Based Investment . . . . . . . . . . . . . 132
3.3.4 Optimal Allocation: Markowitz . . . . . . . . . . . . . . . . . . . . 133
3.3.5 Review Markowitz Model . . . . . . . . . . . . . . . . . . . . . . . 146
3.3.6 Views and Portfolio Construction - The Black-Litterman Model . . 151
3.3.7 Heuristic Allocation: Risk Budgeting Portfolio Construction . . . . 157
3.4 Estimation: The Covariance Matrix . . . . . . . . . . . . . . . . . . . . . . 162
3.4.1 Sample Estimates . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165
3.4.2 Dimension Reduction I: PCA . . . . . . . . . . . . . . . . . . . . . 166
3.4.3 Linear Factor Model . . . . . . . . . . . . . . . . . . . . . . . . . . 169
3.4.4 Dimension Reduction II: Random Matrix Theory . . . . . . . . . . 170
3.4.5 Equal Weighted Portfolios . . . . . . . . . . . . . . . . . . . . . . . 172
3.4.6 Optimization and Estimation Risk . . . . . . . . . . . . . . . . . . 174
3.4.7 Bayesian Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . 175
3.4.8 Linear Shrinkage of the Covariance Matrix . . . . . . . . . . . . . . 176
3.4.9 Non-Linear Shrinkage of the Covariance Matrix . . . . . . . . . . . 177
3.4.10 Comparing Different Approaches . . . . . . . . . . . . . . . . . . 178
3.4.11 Comparing Different Approaches - Asymptotics . . . . . . . . . . . 179
3.5 Factor Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 182
3.5.1 Industry Perspective . . . . . . . . . . . . . . . . . . . . . . . . . . 187
3.5.2 Non-Performance of Alternative Risk Premia . . . . . . . . . . . . 192
3.5.3 Theory: Factor Pricing Model . . . . . . . . . . . . . . . . . . . . . 194
3.5.4 The CAPM as a Beta Pricing Model . . . . . . . . . . . . . . . . . 196
3.5.5 Factor Investing: 3-Factor Model of Fama and French . . . . . . . 207
3.5.6 Factor Investing: 5-Factor Model of Fama and French . . . . . . . 209

3.6 Backtests . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 212


3.6.1 Data Snooping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 212
3.6.2 Overfitting . . . . . . . . . . . . . . . . . . . . . . . . . . . . 213
3.6.3 Backtesting and Multiple Testing . . . . . . . . . . . . . . . . . . . 216
3.6.4 Application to Factor Investing . . . . . . . . . . . . . . . . . . . . 220
3.6.5 p-Hacking . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 221
3.6.6 Active vs Passive Investments . . . . . . . . . . . . . . . . . . . . . 224

4 Investment Theory Synthesis 235


4.1 Absolute Pricing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 235
4.2 Simple General Equilibrium Model . . . . . . . . . . . . . . . . . . . . . . 236
4.3 Fundamental Asset Pricing Equation . . . . . . . . . . . . . . . . . . . . . 238
4.4 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 239
4.4.1 Risk Neutral Probabilities . . . . . . . . . . . . . . . . . . . . . . . 239
4.4.2 Constant Relative Risk Aversion . . . . . . . . . . . . . . . . . . . 241
4.4.3 Zero Coupon Bond . . . . . . . . . . . . . . . . . . . . . . . . . . . 241
4.5 Equivalent Formulation of the Fundamental Asset Pricing Equation . . . . 242
4.6 State Prices, Risk Neutral Probabilities, SDF . . . . . . . . . . . . . . . . 244
4.6.1 Projection Pricing and SDF Formulation . . . . . . . . . . . . . . . 249
4.6.2 Arbitrage Pricing Theory (APT) . . . . . . . . . . . . . . . . . . . 256
4.6.3 Pricing Real-Estate Risk . . . . . . . . . . . . . . . . . . . . . . . . 258
4.6.4 Multi-Period Asset Pricing and Multi-Risk-Factors Models . . . . . 265
4.6.5 Low Volatility Strategies . . . . . . . . . . . . . . . . . . . . . . . . 266
4.6.6 What Happens if an Investment Strategy is Known to Everyone? . 269
4.7 Optimal Investment Strategy and Rebalancing . . . . . . . . . . . . . . . 270
4.7.1 General Rebalancing Facts . . . . . . . . . . . . . . . . . . . . . . . 271
4.7.2 Convex and Concave Strategies . . . . . . . . . . . . . . . . . . . . 273
4.8 Short-Term versus Long-Term Investment Horizons . . . . . . . . . . . . . 276
4.8.1 Time-Varying Investment Opportunities . . . . . . . . . . . . . . . 276
4.8.2 Model Portfolios . . . . . . . . . . . . . . . . . . . . . . . . . . . . 278
4.8.3 Fallacies in Long Term Investment . . . . . . . . . . . . . . . . . . 281

5 Global Asset Management 283


5.1 Asset Management Industry . . . . . . . . . . . . . . . . . . . . . . . . . 283
5.1.1 The Demand and Supply Side . . . . . . . . . . . . . . . . . . . . . 283
5.1.2 Asset Management Industry in the Financial System - the Eurozone . 284
5.1.3 Global Figures 2007-2014, Market Structure . . . . . . . . . . . . . 286
5.1.4 Asset Management vs Trading Characteristics . . . . . . . . . . . . 289
5.1.5 Institutional Asset Management versus Wealth Management . . . . 290
5.2 The Fund Industry - An Overview . . . . . . . . . . . . . . . . . . . . . . 291
5.2.1 Types of Funds and Size . . . . . . . . . . . . . . . . . . . . . . . . 292
5.3 Mutual Funds and SICAVs . . . . . . . . . . . . . . . . . . . . . . . . . . . 293
5.3.1 US Mutual Funds versus European UCITS . . . . . . . . . . . . . 294
5.3.2 Functions of Mutual Funds . . . . . . . . . . . . . . . . . . . . . . 295

5.3.3 Fees for Mutual Funds . . . . . . . . . . . . . . . . . . . . . . . . . 298


5.3.4 The European Fund Industry - UCITS . . . . . . . . . . . . . . . . 300
5.4 Index Funds and ETFs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 302
5.4.1 Index Construction . . . . . . . . . . . . . . . . . . . . . . . . . . . 302
5.4.2 Capital Weighted Index Funds . . . . . . . . . . . . . . . . . . . . 305
5.4.3 Risk Weighted Index Funds . . . . . . . . . . . . . . . . . . . . . . 308
5.4.4 ETFs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 309
5.4.5 Evolution of Expense Ratios for Actively Managed Funds, Index
Funds and ETFs . . . . . . . . . . . . . . . . . . . . . . . . . . . . 315
5.5 Alternative Investments (AI) - Insurance-Linked Investments . . . . . . . 316
5.5.1 Insurance-Linked Investments . . . . . . . . . . . . . . . . . . . . 317
5.6 Private Markets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 320
5.7 Hedge Funds . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 322
5.7.1 What is a hedge fund (HF)? . . . . . . . . . . . . . . . . . . . . . . 322
5.7.2 Hedge Fund Industry . . . . . . . . . . . . . . . . . . . . . . . . . . 323
5.7.3 CTA Strategy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 325
5.7.4 Fees and Leverage . . . . . . . . . . . . . . . . . . . . . . . . . . . 325
5.7.5 Withdrawing Restrictions, Fund Flows and Capital Formation . . 327
5.7.6 Biases, Entries and Exits . . . . . . . . . . . . . . . . . . . . . . . 327
5.7.7 Investment Performance . . . . . . . . . . . . . . . . . . . . . . . . 329
5.8 Event-Driven Investment Opportunities . . . . . . . . . . . . . . . . . . . 334
5.8.1 Structured Products (SP) and Derivatives . . . . . . . . . . . . . . 336
5.8.2 Pricing and Risk Management . . . . . . . . . . . . . . . . . . . . . 337
5.8.3 Political Events: Swiss National Bank (SNB) and ECB . . . . . . . 343
5.8.4 Opportunities to Invest in High Dividend Paying EU Stocks . . . . 344
5.8.5 Low-Barrier BRCs . . . . . . . . . . . . . . . . . . . . . . . . . . . 345
5.8.6 Japan: Abenomics . . . . . . . . . . . . . . . . . . . . . . . . . . . 346
5.8.7 Market Events . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 347
5.8.8 Negative Credit Basis after the GFC . . . . . . . . . . . . . . . . . 348
5.8.9 Positive Credit Basis 2014 . . . . . . . . . . . . . . . . . . . . . . . 349
5.8.10 Currencies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 350
5.9 Collateral . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 356
5.9.1 Prime Finance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 356
5.9.2 Repo Transaction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 357

6 Asset Management Innovation 361


6.1 Views on Disruption . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 361
6.1.1 Replacement and Prices . . . . . . . . . . . . . . . . . . . . . . . . 361
6.1.2 Market Entrants . . . . . . . . . . . . . . . . . . . . . . . . . . . . 363
6.1.3 Value Chain, Investment Process and Technology . . . . . . . . . . 367
6.1.4 Some Innovations . . . . . . . . . . . . . . . . . . . . . . . . . . . . 369
6.2 Big Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 371
6.2.1 Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 371

6.2.2 Demand for Big Data . . . . . . . . . . . . . . . . . . . . . . . . . 372


6.2.3 Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 373
6.2.4 Machine Learning (ML) . . . . . . . . . . . . . . . . . . . . . . . . 374
6.2.5 Linear Threshold Model . . . . . . . . . . . . . . . . . . . . . . . . 393
6.2.6 Support Vector Machines (SVM) . . . . . . . . . . . . . . . . . . . 395
6.2.7 Tree Based Learning . . . . . . . . . . . . . . . . . . . . . . . . . . 397
6.2.8 Naive Bayes Classier . . . . . . . . . . . . . . . . . . . . . . . . . 400
6.2.9 Nearest Neighbour Analytics . . . . . . . . . . . . . . . . . . . . . 402
6.2.10 'Sentimental Risk' . . . . . . . . . . . . . . . . . . . . . . . . . . . 403
6.2.11 Customer Retention: Text Mining . . . . . . . . . . . . . . . . . . 404
6.2.12 Portfolio Construction with Machine Learning, I . . . . . . . . . . 406
6.2.13 Portfolio Construction with Machine Learning, II . . . . . . . . . . 409
6.2.14 Penalty Approaches in Portfolio Optimization . . . . . . . . . . . 413
6.3 Blockchain . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 414
6.3.1 Cryptography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 414
6.3.2 Modular Arithmetic (MA) . . . . . . . . . . . . . . . . . . . . . . 415
6.3.3 RSA Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 418
6.3.4 Hash Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 420
6.3.5 Digital Signatures . . . . . . . . . . . . . . . . . . . . . . . . . . . 421
6.3.6 Blockchain . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 421
6.3.7 Different Blockchain Types, Type of Consensus . . . . . . . . . . . 428
6.3.8 Blockchain Examples . . . . . . . . . . . . . . . . . . . . . . . . . . 429
6.4 Currencies and Crypto-Currencies . . . . . . . . . . . . . . . . . . . . . . . 436
6.4.1 Money and Payment Systems . . . . . . . . . . . . . . . . . . . . . 436
6.4.2 Fiat Money . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 436
6.4.3 Bitcoin . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 438
6.4.4 Bitcoin Blockchain Security . . . . . . . . . . . . . . . . . . . . . . 443
6.4.5 Initial Coin Offering (ICO) . . . . . . . . . . . . . . . . . . . . 444
6.5 Demography and Pension Funds . . . . . . . . . . . . . . . . . . . . . . . 445
6.5.1 Demographic Facts . . . . . . . . . . . . . . . . . . . . . . . . . . . 445
6.5.2 Pension Funds . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 449
6.5.3 Role of Asset Management . . . . . . . . . . . . . . . . . . . . . . 451
6.5.4 Investment Consultants . . . . . . . . . . . . . . . . . . . . . . . . 452
6.6 Uniformity of Minds . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 454
6.6.1 The Great Depression and the Great Recession . . . . . . . . . . . 456
6.6.2 Uniformity of Minds . . . . . . . . . . . . . . . . . . . . . . . . . . 456
6.7 Climate Change and Finance . . . . . . . . . . . . . . . . . . . . . . . . . 458
6.7.1 Green Bonds . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 463
6.7.2 Energy Contracting and Structured Finance . . . . . . . . . . . . . 464

7 Proofs 469

8 Appendix 485

9 References 487
Chapter 1

Introduction
Assets and their management (AM) are a key discipline in a modern economy: we manage
our assets to maintain our standard of living after retirement, to buy property later,
or because a sovereign wealth fund does not want to lose the assets of future genera-
tions. AM is the process of building, distributing, and maintaining assets throughout
their life cycle in a cost-efficient and compliant way. Pension funds, institutional
investors and private investors are different users of the AM process.

Game Changers
PwC (2015, 2012), McKinsey (2015), Oliver Wyman (2016) and many others identify the
following game changers for the asset management industry:¹

• Growth of wealth: Global assets under management (AuM) will exceed USD 100
trillion by 2020, up from USD 64 trillion in 2012.

• Regulation: In the past, banks dominated the financial industry; they were the
innovators. After the 2008 Great Financial Crisis (GFC), regulation focused on banks
and insurers. AM initially faced fewer regulatory requirements but is now moving
more and more to center stage.

• Technological Disruptions: Platforms, data analysis and mutual distributed ledger
technologies allow greater connectivity between market participants, a redesign of
the AM value chains, a reduction in life-cycle costs, access to new customers and
new approaches to horizontal integration.

• Longevity and demographics: Retirement and health care will become critical issues
as populations age. The ratio of pensioners to the working-age population will reach
25.4 percent by 2050, up from 11.7 percent in 2010. This puts a strain on pension
systems. Still-increasing life expectancy - each new generation in the developed
world lives three months longer than the previous one - increases the need for
individual wealth management solutions once people are retired. Asset managers will
therefore focus on long-term investments and on individual asset decumulation. This
change affects in particular the US, Japan, most European countries, South Korea,
Singapore, Taiwan and China.

¹ The data published by consulting firms are private; their results can be neither
verified nor replicated by a third party.

• The distribution of AM services will be redesigned. On the one hand, economies of
scale force distribution onto global platforms; on the other hand, increasing
compliance complexity strengthens regional platforms.

• Fees will continue to decrease for most asset management solutions, and regulation
requires many existing fee models to be transformed.

• Alternative investments are turning into traditional ones, and exchange-traded
funds (ETFs) continue to proliferate.

Climate change is missing from the list above, although it will be one of the most
important game changers. The game changer 'performance' is also missing, although
performance is a notorious problem for many investors and there is no consensus about
optimal investment behavior. We give this topic wide scope.

Today's technology enables new approaches to investment. Such connections between
technology and investment methodology are as important as technology seen in terms
of process efficiency, changes in market infrastructure, and data integration. Asset
managers face competition from new entrants - FinTechs with a technological advantage
but no customer base. The Medallion Fund of Renaissance Technologies shows that the
interplay between technology and scientific mastery can make all the difference: in
26 years of investment history (1988-2016), the fund returned 88% per year, with a
loss (of 4%) in only a single year. Today, the question even arises as to whether
technology can generally replace human abilities - can one generate a digital alpha?
But it also worked without technology: for more than 19 years, Lord Keynes
outperformed the S&P 500 by 17% per annum.

While regulation dominated the decade after the GFC, the changes caused by technology
are even more profound for the future of AM.

• Technology is irreversible while regulation is not. Regulators can revoke any
regulatory rule, but a technology which proves useful to people cannot be stopped -
how would one stop the use of iPhones?

• Technology still has an overall positive connotation - it improves living
conditions and it is creative. Regulation, despite its goals of making the financial
system safer and protecting customers, fails to be seen in the same way.

• Technology puts clients center stage; regulation merely intends to do so.

The current digitization wave differs from well-known automation. The technology
has matured to a level where abstract banking and asset management products can be
understood, researched and valued by clients in a completely different way than in
the past. Today's technology is closer to humans than it ever was. Technology is also
able to replace human labor even for complex activities in the AM value chain - which
work will still be human-specific in the AM industry?

Contents

From a methodological point of view, the content splits into two parts: classical
methods and innovation. The former covers some of the main developments of the last
decades which are in use in the AM industry: the many ways portfolios are constructed
using the models or methods of Markowitz, factor investing, Black-Litterman and many
others, but also the way the AM value chain is structured and organized. Within
innovation we focus on two topics. The first is data science, i.e. the ways in which
possibly better forecasts can be made or customer needs measured. The second is
platforms and blockchain, i.e. new forms in which the asset management infrastructure
and value chain can be designed. The traditional models are discussed in Chapter 3
and innovation is considered in Chapter 6.

From a topical point of view, the content splits into two parts: standard material
and trends. The first includes understanding how different assets or asset classes
behave and how they are selected and managed. Besides the technological trends
described above, the focus is on retirement provision. The standard material appears
in the first five chapters. The trends in retirement provision are presented in
Chapter 6.

From an intellectual point of view, the content splits into two parts: need-to-know
and need-to-think. It is important to know facts about the status and the projections
of the AM industry; otherwise, the whole approach to AM remains purely theoretical.
Therefore, facts and figures about the AM industry are given full weight. Since AM
always means turning information into numbers, analytical methods play a dominant
role. Besides the traditional techniques, we introduce the theory of learning and
blockchain, which are basic to AM innovation.

The first chapter focuses on global asset growth, various types of investors and the
Efficient Market Hypothesis (EMH). A regulatory section follows, before basic AM
concepts such as risk, return, performance, performance attribution, and
diversification are discussed without relying on sophisticated investment models.

Portfolio construction in Chapter 3 has three steps:

• The selection of assets and their grouping define the opportunity set of the investor.

• Asset allocation tells how much capital is invested in the selected assets.

• Asset implementation maps the theoretical asset allocation into trades.
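The three steps above can be sketched in a few lines of Python. This is a minimal sketch with hypothetical asset names and numbers, not the book's notation: the opportunity set fixes the assets, the allocation fixes the target weights, and the implementation step turns the weights into buy/sell amounts.

```python
# Step 1: asset selection -- the opportunity set of the investor (hypothetical names).
opportunity_set = ["EQ_WORLD", "BOND_GLOBAL"]

# Step 2: asset allocation -- how much capital goes into each selected asset.
target_weights = {"EQ_WORLD": 0.60, "BOND_GLOBAL": 0.40}

# Step 3: implementation -- map the theoretical allocation into trades,
# given the current holdings (market values).
def trades_to_target(holdings, weights):
    total = sum(holdings.values())
    # positive = buy, negative = sell
    return {a: weights.get(a, 0.0) * total - v for a, v in holdings.items()}

holdings = {"EQ_WORLD": 70.0, "BOND_GLOBAL": 30.0}  # a portfolio that has drifted
print(trades_to_target(holdings, target_weights))
# {'EQ_WORLD': -10.0, 'BOND_GLOBAL': 10.0}
```

Note that the trades sum to zero: rebalancing to target weights is self-financing apart from transaction costs, which the sketch ignores.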



We discuss the Capital Asset Pricing Model (CAPM), the Fama-French three- and
five-factor models, the 60/40 investment rule, Black-Litterman, risk budgeting and
general factor models. We focus on estimation of the covariance matrix, stability
issues and estimation risk. Factor investing, which has become fashionable in many
countries, has some fundamental problems. Harvey et al. (2015): 'Hundreds of papers
and hundreds of factors attempt to explain the cross-section of expected returns.
Given this extensive data mining, it does not make any economic or statistical sense
to use the usual significance criteria for a newly discovered factor, e.g., a t-ratio
greater than 2. [...] Echoing a recent disturbing conclusion in the medical
literature, we argue that most claimed research findings in financial economics are
likely false.' We therefore discuss backtesting in some detail.
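The multiple-testing problem behind this critique is easy to quantify. Assuming independent tests and a per-test false-positive rate of about 5% (roughly the |t| > 2 criterion), the chance of at least one spurious 'factor' grows quickly with the number of factors tried. A minimal sketch:

```python
# Chance of at least one spurious |t| > 2 "discovery" among n independent
# backtests when no factor is real (per-test false-positive rate ~ 5%).
def prob_false_discovery(n, alpha=0.05):
    return 1.0 - (1.0 - alpha) ** n

for n in (1, 10, 100, 300):
    print(n, round(prob_false_discovery(n), 3))
# With 100 tries, a false "discovery" is almost certain (~99.4%).
```

This is why a t-ratio of 2 for a newly discovered factor is, on its own, almost meaningless once hundreds of factors have been tried.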

Chapter 4 provides a synthesis of the statistical models described in the portfolio
construction chapter. In other words, asset pricing in the absolute and the relative
(no-arbitrage theory) sense is discussed.
Section 5.1 provides an overview of the AM industry. Some key findings from McKinsey
(2015) for the period 2002-2015 are:

• AM firms dominate both banks and insurers in capitalization and P/E ratios.

• The global annual AuM growth rate is 5%, with strong regional differences. The
main driver was market performance.

• The absolute value of profits increased while the profit margins decreased. The
absolute revenues in China, South Korea and Taiwan are almost at par with the
revenues in Japan, Germany, France and Canada.

• Retirement and defined contribution pension plans grew globally with a Compounded
Annual Growth Rate (CAGR) of 7.5%, almost twice as strong as the retail sector, and
the institutional customers' CAGR was 5%.

• The growth rate of passive investments is larger than for active solutions.

• Active management of less liquid asset classes, or with more complex strategies, is
increasing.

Table 1.1 summarizes expected AuM growth until 2020.

Clients                  2012, USD tr.   2020, USD tr.   Growth rate p.a.
Pension funds            33.9            56.5            6.5%
Insurance companies      24.1            35.1            4.8%
Sovereign wealth funds   5.2             8.9             6.9%
HNWIs                    52.4            76.9            4.9%
Mass affluent            59.5            100.4           6.7%

Table 1.1: Expected AuM growth until 2020 (PwC [2014]).

We discuss mutual funds, SICAVs, compare the US mutual fund and the European UCITS
industry, index funds, ETFs, expense ratios for different collective scheme wrappers,
insurance-linked investments, the real estate asset class and hedge funds.
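As a quick plausibility check, the growth rates in Table 1.1 can be reproduced as compound annual growth rates over the eight years from 2012 to 2020; the computed rates match the printed ones to within about a tenth of a percentage point. A sketch:

```python
# Reproduce the 'Growth rate p.a.' column of Table 1.1 as a CAGR
# over the eight years from 2012 to 2020.
def cagr(start, end, years):
    return (end / start) ** (1.0 / years) - 1.0

aum = {  # client segment: (AuM 2012, AuM 2020) in USD trillion
    "Pension funds": (33.9, 56.5),
    "Insurance companies": (24.1, 35.1),
    "Sovereign wealth funds": (5.2, 8.9),
    "HNWIs": (52.4, 76.9),
    "Mass affluent": (59.5, 100.4),
}
for segment, (a2012, a2020) in aum.items():
    print(f"{segment}: {cagr(a2012, a2020, 8):.1%}")
# e.g. insurance companies: (35.1/24.1)^(1/8) - 1 = 4.8% p.a.
```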

We compare fund wrappers with opportunistic market investments, i.e. with derivatives
and structured products. The event of January 15, 2015, when the Swiss National Bank
removed the minimum exchange rate for EURCHF, is analyzed. The fierce (over)reactions
led to investment opportunities that required rapid product issuance, as some of these
opportunities lasted only a few days. This was an ideal set-up for derivative
investment solutions.

The next chapter deals with AM innovation and technology. Investments in FinTech rose
from USD 1.4 billion in 2010 to USD 26 billion in 2017. The survey of McKinsey
(2015), based on a sample of more than 12'000 FinTechs, states that 62% of the
start-ups target private customers, 28% SMEs and the rest large enterprises. Most
FinTechs work in the area of payment services (43%), followed by loans (24%),
investments (18%) and deposits (15%).

AM is affected differently by digitization: market platforms, smarter and faster
machines, customer relocation, process outsourcing and empowered investors are key
features. Regulation must find its new role in this dynamic environment. We then look
at data analysis (big data), blockchain and crypto-currencies. Whatever the realized
changes in the future, the implications for employees are clear: fewer people will
work in the AM industry and the requirement profile will continue to become more
demanding. The remaining sections cover demographic trends, uniformity of minds in a
global information society and the relationship between climate change and financial
innovation.

Some topics are not taken into account: custody, execution, client reporting,
structuring of investment funds, a detailed discussion of tax issues and cross-border
asset management. The target audience for this book is students who have a bachelor's
degree in finance or economics. The proofs of the main statements are summarized in a
separate chapter.

I am grateful for the assistance of Dave Brooks and Theresia Büsser. I would like to
thank Sean Flanagan, Barbara Doebeli, Bruno Gmür, Jacqueline Henn-Overbeck, Tim
Jenkinson, Andrew Lo, Helma Klüver-Trahe, Roger Kunz, Tom Leake, Robini Matthias,
Attilio Meucci, Tobias Moskowitz, Tarun Ramadorai, Blaise Roduit, Olivier Scaillet,
Stephen Schaefer and Andreas Schlatter for their collaboration, their support or the
possibility to learn from them.
Chapter 2

Fundamentals
The two words in the term 'Asset Management' (AM) require an explanation: What do we
mean by an 'asset', who 'manages' the assets, and how is this done?

Definition 1. Financial assets are financial contracts that define resources over
which property rights are enforced and from which future economic benefits can flow
to the owner. An asset class is a group of financial assets that share predefined
economic, legal and regulatory characteristics.

Financial assets are intangible, non-physical assets. They are often more liquid than
tangible assets. Traditional asset classes are equities, fixed income securities,
money market instruments and currencies. Alternative asset classes include real
estate, commodities and private equity. Hedge funds are not an asset class but an
investment strategy defined on liquid asset classes.

Asset management is a systematic process of analyzing, trading, lending and borrowing
assets of all kinds. Since all assets belong to a person, the management of the assets
results from a decision on the investment strategy - a decision made by the owner of
the assets or by a third party. McKinsey (2013) estimates that third-party asset
managers managed a quarter of global financial assets worldwide. The main outsourcers
of assets are pension funds, sovereign wealth funds, family offices, insurance
companies, and private households. Third-party portfolios are managed either in mutual
funds or in discretionary mandates. In a mandate, the owner of the assets delegates
the investment decision to the asset manager. Funds combine assets with a certain
level of risk into a collective scheme; diversification is the key risk concept.
Investors buy and sell shares in funds (mutual funds, ETFs or hedge funds). The asset
management function can be organized as an independent firm (BlackRock, Amundi) or be
part of a bank or insurer (Goldman Sachs Asset Management).

AM means managing assets from savers to investors. The goal of investing is to save
today for the benefit of future consumption: the benefit after an investment period
should be greater than the present direct consumption of all resources. Investments
are made through the use of securities of all kinds - that is, money, stocks, bonds,
ETFs, mutual funds or derivatives. Securities are tradable financial assets. They are
issued through financial intermediaries (primary market) and can often be traded on
the secondary market. They differ, among other things, in their ownership, complexity,
liquidity, risk and reward profile, transaction fees, accessibility and regulatory
compliance.

Pricing and price forecasts of assets are particularly important. There are two ways
to price assets in theory: absolute pricing as an equilibrium outcome in an economy,
and relative pricing using the concept of no arbitrage, see Chapter ??. While relative
pricing is the standard approach for derivatives, empirical pricing models are often
used for cash assets. These models may fail to be backed by an equilibrium model;
instead, they follow from working with data empirically in an econometric way or, more
recently, by using machine learning and AI. This third approach is by far the most
widely used in the industry, although the lack of theoretical foundations and the
misuse of statistics often lead to flawed investment strategies - data mining, data
snooping and inaccurate backtests drive the results of the investment strategy.

To summarize, the four key questions in AM are:

1. Who decides?

2. How do we invest? The investment method question.

3. Where do we invest? The asset selection question.

4. How are asset management services produced and distributed in different jurisdic-
tions? - the profitability, process, client segmentation, regulation and technology
question.

In the past, technology was needed to implement the investment theory (estimation
of model parameters, for example). New technologies enable radically new investment ap-
proaches that differ from traditional statistical models such as the Capital Asset Pricing
Model (CAPM). But technology is also the key factor in scaling the business and managing
regulatory complexity, i.e. in keeping or increasing profitability.

Question 4 attracted a large part of the asset management resources in the decade
after the GFC due to regulatory and technological changes and also to different client
expectations. This question can be considered as the sum of the following strategic
business issues (UBS [2015]):

• In which countries does an AM firm want to compete? The answer to this
geographical question depends on the AM firm's actual strength, its potential, the
costs to comply with the country-specific regulation, the costs to build up the
human capital and the business and technological complexity.

• Which clients should be served?



• Which products and investment areas should the AM firm focus on? Often large
AM firms offer up to several hundred investment strategies.

• What services should be provided and which technologies should be used for them?

• What operating model should be used? This question has a distribution dimension
(global vs. (multi-)local offering), an operational one (centralized vs. decentral-
ized), a value-chain one (in-house vs. outsourcing) and a legal/tax environment
one (on-shore vs. off-shore).

2.1 Wealth of Nations and Assets under Management (AuM)


Prosperity growth is the raw material for asset management. Figure 2.1 shows the
distribution of wealth worldwide over the last 2000 years in absolute and relative terms.

Figure 2.1: The size of an area indicates the proportion of global GDP produced in that
area during the years concerned. GDP is measured in USD adjusted for purchasing power
parity. In each chart, the total assets are displayed in USD. 1 AD means the year 1 anno
Domini in the Julian calendar. (worldmapper.org)

In the period up to 1500, the distribution of wealth was stable and proportional to
the population, but not to the distribution within a population. This reflects the small
productivity differences. This changed radically as Europe and then North America came to dominate
the rest of the world. Globalization and the end of colonialism, under which the economic
differences between countries are shrinking, are shifting the distribution of GDP back
towards that of the time of the Roman Empire. In absolute terms, it took 400 years to double
global GDP from $1 trillion to $2 trillion (1500-1900), but it took only 30 years from
1960-1990 to triple global wealth.

Assets under Management (AuM) is the market value of assets that an investment
company manages on behalf of investors. AuM is often used as a measure of size and growth
when comparing asset managers. As profitability varies widely for different types of assets, AuM
should be used with caution to draw conclusions about an asset manager's profitability.
There are also very different views about what AuM means. For example, GIPS (Global
Investment Performance Standards) is the market standard for AuM reporting to investors.

PwC (2015) estimates that global AuM will exceed USD 100 trillion by 2020, up
from USD 64 trillion in 2012. Other estimates are similar. These figures imply
an annual global compounded growth rate of 6 percent. This rate varies across
geographic regions (Boston Consulting Group [2016]):

• Western Europe, North America, Japan: 1.6% p.a.

• Emerging Markets (EM): South America, BRIC states, Middle East, Eastern Eu-
rope: 8.5% p.a.

The different growth rates define opportunities for wealth managers in developed markets
to offer solutions in fast-growing markets. Therefore, market access plays a prominent role
for the development of AM. At the individual level, per capita GDP in 2016 was USD
11,000 for the emerging economies and USD 47,000 for the industrialized countries. The
growth estimates for the period 2016-2021 are 150% for the EM and 50% for the developed economies
(IMF World Economic Outlook [2016]).
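The quoted compound growth rate can be verified directly from the two AuM levels. The following is our own plausibility check, not from the source:

```python
# Plausibility check of the quoted 6 percent figure (our sketch, not from
# the source).

def cagr(start, end, years):
    """Compound annual growth rate between two levels."""
    return (end / start) ** (1 / years) - 1

# Global AuM: USD 64 trn (2012) -> USD 100 trn (2020)
print(f"implied AuM growth 2012-2020: {cagr(64, 100, 8):.1%} p.a.")
```

The implied rate is about 5.7 percent per annum, consistent with the rounded 6 percent quoted above.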

The evolution of EM can also be seen by considering specific assets; see Table 2.1 for
the emerging market bond market share growth. The credit quality of EM bonds

Markets                     31 Dec 1989   31 Mar 2016   trn USD

US                          61.30%        38.10%        37
Developed Markets ex US     37.80%        44.30%        44
Emerging Markets             1.00%        17.50%        17

Table 2.1: Bond market shares (Barclays Capital, BIS, FactSet, J.P. Morgan Asset Man-
agement [2016]).

also changed in the last two decades. Twenty years ago almost 100% of the bonds had a high
yield rating; in 2016 only 45% of the bonds in the JP Morgan EM bond index were rated
high yield, leaving 55% with an investment grade rating. Figure 2.2 shows other
dimensions of the EM developments.
Wealth growth must be compared to the dynamics of wealth inequality. An increase in
inequality is likely to destabilize the growth of wealth as it leads to social and political
instability. Inequality risks are among the highest risks in the annual global risk map of

[Figure 2.2 here. Panel titles: Share of Global Nominal Consumption (EM Consumption vs. US Consumption, 1990-2013); EM Private Credits vs. EM Real Interest Rate Growth; Taper Tantrum vs. 2017.]

Figure 2.2: Upper left panel: Share of global nominal consumption measured in current
USD expenditures. Upper right panel: EM country fundamentals at the time of the
taper tantrum and measured at the beginning of 2017. Lower panel: Creditworthiness
of EM countries; the right panel shows the divergence across different EM countries.
(J.P. Morgan Guide to the Markets, UN, World Bank, J.P. Morgan Global Economics
Research [2013, 2015, 2016])

the World Economic Forum. On the one hand, the global increase in wealth has been
the main reason why poverty has fallen worldwide to a level never seen before in history. But
on the other hand, CO2 emissions due to changed living conditions, mobility, meat-
dominated diets and tourism, among others, have also increased massively.

The global wealth projections of PwC (2015) for different types of investors are shown
in Table 2.2.

Mass affluent clients and HNWIs in emerging markets are the main drivers of AuM
growth. The global middle class is projected to grow by 180 percent between 2010 and
2040, with Asia replacing Europe as home to the highest proportion of the middle class as
early as 2015 (OECD, European Environment Agency, PwC [2014]). The growth of
pension funds will be large in countries with fast-growing GDPs, weak demographics and
defined contribution pension schemes.

Clients                2012, USD tr.   2016, USD tr.   E2020, USD tr.   Growth rate p.a.
Pension funds          33.9            38.3            53.1             6.5%
Insurance companies    24.1            29.4            38.4             4.8%
SWF                     5.2             7.4            10               6.9%
HNWIs                  52.4            72.3            93.4             4.9%
Mass affluent          59.5            67.2            84.4             6.7%
Total AuM             175.1           214.6           279.3

Table 2.2: There are double counts: assets of wealthy individuals (HNWIs) are also invested
in insurance and pension funds. Mass affluent refers to individuals with liquid assets
between USD 1-3 mn. HNWIs possess liquid assets of USD 3-20 mn. The categorization
is not unique. The predictions for 2020 AuM changed between the 2015 and the 2018 vantage points:
while the numbers were stable for pension funds and insurance companies, the forecast
for HNWIs was significantly corrected upwards and the mass affluent number for 2020
is now significantly lower (PwC [2015], PwC [2018]).

2.2 Investors
There are different types of investors: private clients, high net worth individuals, pension
funds, family offices or state investment funds. At a higher level, investors are divided
into private investors and institutional investors. The ownership of assets between these
two categories changes over time, see Figure 2.3 for the US.

2.2.1 Sovereign Wealth Funds (SWFs)


SWFs are among the largest wealth owners in the world. The largest SWF in 2018 was
the Norwegian government's pension fund with assets of $1,002 billion. The next largest
are from the Middle East or Far East: Abu Dhabi, United Arab Emirates, Saudi Arabia,
China, Kuwait, Hong Kong, Singapore and Qatar. All manage funds with assets ranging
from $200 to $800 billion.

Why are there so many SWFs in emerging markets? First, more than 50 percent
of all large SWFs originate in oil. Second, Asian governments are much more active in
managing their economies than some of their western counterparts. According to Ang
(2014), another reason is that the US, after the many state bankruptcies of the 1980s
and 1990s, told emerging markets to save more. In recent years, a debate has begun on
whether it is productive to accumulate so much capital in sovereign wealth funds. Would
it not be more productive to invest the capital directly in the local economy?

Many SWFs accumulate liquid assets as reserves against unexpected future economic
shocks. This forms a long-term precautionary motive for future generations. This mo-
tivation is crucial for the acceptance of a SWF. A SWF can only exist if it has public
support. This public support is a sensitive issue. Scandals due to incompetent fund
management, lack of integration of the fund into economic strategies, political misman-
agement and criminal acts should be avoided. All changes in the risk policy for asset
management must be documented and communicated to the owners of the fund. For
example, the Norwegian SWF initially invested only in bonds. Only after a broad public
discussion was a diversification of investments into other asset classes considered. This
behavior of the Norwegians is unique and rooted in their democratic tradition.

Figure 2.3: Equity ownership in the US. In the 1950s, 90% of equity in the US was held
by private investors. This number dropped almost linearly to 40% by the end of 2010
and then began to rise slightly. The fraction of equity ownership held by institutional
investors follows the opposite evolution. Source: Rohner [2014].

2.2.2 Pension Funds


Large pension funds can be managed at the state level, but most are privately organized,
unlike SWFs. The assets managed by pension funds vary between 70 percent (US) and
130 percent (Netherlands) of GDP (2017). Why are there pension funds? Pension funds
can provide individuals with risk-sharing mechanisms that are not feasible on their own.
Consider a 25-year-old person who wants to protect their future capital at retirement.
The financial markets do not offer capital-protected products with a maturity of 40 years
- the markets are incomplete. A retirement plan can mitigate today's generation's risks by
creating buffer stocks over the next 40 years across generations. A risk sharing between
generations takes place: in this sense, pension funds complete the market by adding a
synthetic infinitely lived market participant - the aggregate over all generations - which
allows individuals to share their life cycle-specific investment risk.

Pension funds are one part of the total pension system of a country, which is often
divided into three pillars:

• Pillar I - This pillar should cover the subsistence level and is often organized ac-
cording to the pay-as-you-go system. Each month, employees pay part of their
salary, which is immediately distributed to pensioners.

• Pillar II - This is the pillar of the pension funds. Together with pillar I, it should
be enough to cover the cost of living after retirement. The asset owners only
have limited access to their assets. There are two types of funds: Defined Benefit
(DB) and Defined Contribution (DC). DB plans are based on predetermined future
benefits for the beneficiaries, but keep the contributions flexible. DC plans fix the
contributions but not the future benefits. In summary, the contributions define the
benefits in DC plans and the benefits define the contributions in DB plans.

• Pillar III - Privately managed investments, which often have tax advantages. Access
to assets before retirement is usually limited.

Figure 2.4: Left panel: The importance of the three pillars in percentage of retirement
income (ABP [2014]). Right Panel: Basic form of DC and DB pension plans.

Figure 2.4 illustrates the importance of the different pillars in different countries. Retirement
systems are under pressure in most industrialized countries due to demographic changes
and increasing longevity. For the first pillar, demographic change means that working
people pay on average for a growing number of retirees. This jeopardizes the concept of
intergenerational risk sharing.

The threat to the first pillar has major implications for national budgets. The first
pillar accounts for more than 90 percent of retirement income in Spain. For Germany,
France and Italy, the value is between 75 percent and 82 percent. Given the extremely
low fertility rates and high unemployment among young people in Spain and Italy, the
first pillar cannot survive. Shifts into the second or third pillar are required, which rep-
resents an opportunity for asset management. But this only makes sense for workers
with a regular income. The pension problem of the mass of today's young people without
work remains unresolved.

The drivers in pay-as-you-go systems

The above statements can be illustrated by the following back-of-the-envelope cal-
culation. Assume that people work 40 years, then retire and live for another 20 years, that the
population is the same for every year of age (normalized to 1), that all workers earn 1 unit per
year and that, in the benchmark case, they pay 20% of their income into the first pillar.
The retired then receive 0.4 units per year from the first pillar (40 cohorts contribute 0.2
each, distributed over 20 retired cohorts), which is enough to survive
but also requires a second pillar. Assume next that 10 percent of the workers are unem-
ployed and receive the same 0.4 units from the workers, and that the working cohorts shrink
relative to the retired ones (demography). Then the first pillar contribution rate of the
workers increases by about a third. This puts the system under stress: in southern
Europe, unemployment rates for generations of young workers are higher than 20%, and in
Japan the working class will not shrink by 10% compared to the retired population but
is more likely to drop by 50% in the next 20 years. Under such stress scenarios, workers
will pay 50% of their income into the first pillar.
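The calculation above can be reproduced in a few lines. This is a sketch under one interpretation of the text, namely that the unemployed receive the same 0.4 units, financed by the employed workers' contributions, which yields the quoted increase of one third:

```python
# The back-of-the-envelope pay-as-you-go calculation above, as a sketch.
# Interpretation assumption (ours): the unemployed receive the same 0.4
# units, financed by the employed workers' contributions.

def payg_rate(work_years=40, retire_years=20, cohort=1.0, wage=1.0,
              pension=0.4, unemployment=0.0):
    """Required first-pillar contribution rate of the employed workers."""
    employed = work_years * cohort * (1 - unemployment)
    unemployed = work_years * cohort * unemployment
    benefits = (retire_years * cohort + unemployed) * pension
    return benefits / (employed * wage)

base = payg_rate()                      # the 20% benchmark case
stress = payg_rate(unemployment=0.10)   # ~26.7%: one third higher
print(f"benchmark: {base:.0%}, stress: {stress:.1%}, increase: {stress / base - 1:.0%}")
```

Adding the demographic shrinkage of the working cohorts on top of unemployment pushes the required rate up further, towards the 50% stress level quoted in the text.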

In DB plans, the pension is set in relation to the last average salaries, see Figure
2.4. The contributions are calculated in such a way that they generate a predefined cap-
ital stock at the end of working life. Therefore, an increase in salary requires additional
funds in order to maintain the full pension. On the other side, a year with very low income
can have dramatic effects for the contributor in the retirement period. Since the financing
amount can change on an annual basis, DB plans are considered opaque.

In DC plans, the fixed contributions are invested in several asset classes and the pension
is only weakly related to the most recent salary of the contributor. The growth of the
invested capital, including interest payments, implies a final capital value at the end of
working life. The conversion rate applied to that final capital level then defines the
annual pension. Contributors to DC plans - contrary to those who contribute to DB plans
- bear the investment risk. This makes this form of pension system cheaper for employers
to offer. Unlike in DB plans, the contributors can at least partially influence investment
decisions - that is, choose the risk and return levels of the investments. This is one reason
why DC plans have become more attractive to contributors than their DB counterparts.
Finally, in some jurisdictions, DC plans are portable from one job to the next, while DB
plans often are not.
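The DC mechanics just described can be sketched as follows; all parameter values (contribution, return, conversion rate) are illustrative and not from the source:

```python
# Sketch of the DC mechanics described above: fixed annual contributions
# grow to a final capital, and a conversion rate turns that capital into an
# annual pension. All parameter values are illustrative, not from the source.

def dc_pension(contribution, years, annual_return, conversion_rate):
    capital = 0.0
    for _ in range(years):                       # contribution paid at year end
        capital = capital * (1 + annual_return) + contribution
    annual_pension = capital * conversion_rate
    return capital, annual_pension

capital, pension = dc_pension(contribution=10_000, years=40,
                              annual_return=0.03, conversion_rate=0.068)
print(f"final capital: {capital:,.0f}, annual pension: {pension:,.0f}")
```

A lower conversion rate or a bad return sequence directly lowers the pension - this is the investment risk borne by the contributor.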

Underfunding is a serious problem. The biggest pension plans of S&P 500 companies faced
a $382 bn funding gap in 2018; 186 of the 200 biggest DB plans in the S&P 500 were not fully funded
in 2018. Companies like Intel have a ratio of pension assets to pension obligations of less
than fifty percent. In Switzerland, the average funding ratio of private pension funds
in 2013 was 107.9 percent (Kunz [2014]). The ratio for public funds was 87.8 percent,
showing strong underfunding. Private and public pension funds differ even more severely
when comparing the overfunding and underfunding gaps: for the Swiss private sector,
there is CHF 16.2 billion of overfunding capital and CHF 6.4 billion of underfunding. In
the public domain, the situation is the opposite: CHF 1.4 billion of overfunding versus a
CHF 49.5 billion funding gap.

2.2.2.1 DB versus DC Plans


There was a rapid shift from DB plans in the 1980s to DC systems in the US and
UK. In the United States, nearly 70 percent of the pension funds in 2017 are DC. This is a
percentage reversal of the situation 30 years ago. This system change took place more
quickly in the private sector than in the public sector, as the state can rely on taxpayers.
What are the causes of these changes? One reason is regulation, which, according to the
proposals of the Basel Committee and also under Solvency II, requires a certain coverage
ratio for the insurance industry. Furthermore, IFRS accounting standards since 2006
state that a funding deficit should be included in the balance sheet of the companies.
For DB systems, the shortfalls are financed by the employer, so that guarantees are on
the balance sheet of the companies. By switching to DC plans, which are not guaranteed,
the burden on the balance sheet disappears for the companies.

Another perspective associated with the transition to DC-based plans is the average
undersaving in such plans. Munnell et al. (2014) report that in 2013 the average DC
portfolio at retirement was USD 110,000, while over USD 200,000 is needed. Finally, DB
and DC differ in their costs. The CEM benchmarking (2011), which considers 360 global
DB plans with 7 trillion USD of assets, finds a fee range between 36 and 46 bps. Munnell
and Soto (2007) estimate the fees for DC plans between 60 and 170 bps.

Another perspective on DC and DB is financial literacy - the ability of decision makers
to understand their investments. By definition, employees make investment decisions in
DC plans. Several studies document that a majority of employees want to delegate their
investment decision. One reason is their limited knowledge of financial matters. Gale and Levine
(2011) test four traditional approaches to financial education - employer-based,
school-based, credit counseling and community-based. They note that none of the literacy
efforts have had a positive and substantial impact.

2.2.2.2 Demographic Changes and Longevity


An AM trend in many countries will reflect the increasing importance of asset consump-
tion by the baby boomer generation in retirement, compared to previous generations
whose main goal was accumulating assets. This shift from an accumulation regime to asset
consumption has a deep impact on the delivery of AM solutions. Asset consumption
in retirement is a personalized asset-liability management problem. Accumulation of
wealth, on the other hand, is much less individualized. Baby boomers will therefore
demand tailored asset-liability management solutions.

Furthermore, private savings are becoming more important due to the problems in
the first pillar. Individuals will be responsible for a larger part of their assets and bear the
investment risk. Given the inability to cover retirement losses, pension fund clients will
ask for less risky assets.

Many financing and redistribution risks between the actively insured and the retirees
exist. Many countries define a legal minimum fixed interest rate which has to be applied
to the minimum benefit pension plan. In Switzerland this rate was 1.75% for 2015 and
1.25% for 2016. Given that the 10-year CHF swap rate in 2015 was close to zero, it is not
possible for a pension fund to generate the legally fixed rate using risk-free investments.
This defines the financing risk for the contributing population of a pension plan.

To understand redistribution risk, we consider the technical interest rate. This is by
definition the discount rate for pensions. Since in Switzerland pensions cannot be changed
after the day of retirement (by the constitution), any reduction of the technical interest rate leads
to higher capital requirements for the retired population in order to maintain their pensions unchanged.
In 2016 the technical rates were significantly higher than the interest rates in most low-interest
countries: the pensions paid out are simply too high, see Figure 2.5. Axa Winterthur
(2015) estimates that in Switzerland CHF 3.4 bn is redistributed from the actively insured
to retired persons every year. If the ratio between the active and retired populations
changes in the future due to demographics and longevity, future low-interest
periods will sharply increase the annually redistributed amounts.
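The effect of the technical interest rate can be illustrated with the present value of a fixed pension: the lower the technical (discount) rate, the more capital is needed to keep the pension unchanged. A sketch with illustrative numbers, not from the source:

```python
# Why a lower technical interest rate requires more capital: the capital
# backing a fixed pension is the present value of the annuity, discounted
# at the technical rate. Numbers are illustrative, not from the source.

def annuity_pv(pension, years, rate):
    """Present value of a fixed annual pension paid in arrears for `years` years."""
    if rate == 0:
        return pension * years
    return pension * (1 - (1 + rate) ** -years) / rate

high = annuity_pv(40_000, 20, 0.035)   # technical rate 3.5%
low = annuity_pv(40_000, 20, 0.015)    # technical rate 1.5%
print(f"capital at 3.5%: {high:,.0f}, at 1.5%: {low:,.0f}, "
      f"increase: {low / high - 1:.0%}")
```

In this example, cutting the technical rate from 3.5% to 1.5% raises the required capital per retiree by roughly a fifth; as long as the rate is not cut, the gap is financed by the actively insured.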
If interest rates are low, pension funds are forced to consider alternative investments:
investing more, or for the first time, in stock markets, credit-linked notes, private markets, liquid invest-
ment strategies (smart beta or factor investing), insurance-linked investments, high-grade
securitized mortgages or senior unsecured loans. These alternatives induce different risks,
and the experience of many pension funds with them is limited. Pension funds can also reduce their
costs. This would help, but would not solve any of the above problems due to demographics,
low interest rates or longevity risk.

2.2.3 Management of Pension Funds


The obvious approach to managing pension funds is to match the assets with the liabilities.
This means optimizing the difference between assets and liabilities (the surplus). This

Figure 2.5: The return of the 10y Swiss government bond, the minimum legal rate for
Swiss pension plans and the technical rate for privately insured retired individuals. If
this status remains unchanged in the next years, then underfunding becomes a serious
issue and no significant return can be expected from investment in the fixed income
asset class. The technical rates are even higher than the minimum rates, which indicates
the extent to which actual pensions are too high. (Swisscanto [2015], SNB [2015], OAK
[2014])

is not a trivial task. The mismatches in risk, growth and maturity between assets
and liabilities are one reason. While asset values and risks are market-driven, the val-
ues and risks of liabilities are defined primarily by the characteristics of the contributors to
pension funds, demographic changes and policy interventions - all non-market factors.
Furthermore, the growth rate of liabilities is more stable than that of assets.

Another reason is implicit or explicit return guarantees on the liability side. Guar-
antees truncate the linear payoff of the liabilities; non-linear payoffs, i.e. options, arise. Unlike
standard financial derivatives on stocks, the pricing of these options is much more com-
plex and opaque: the underlying assets are not tradable (market incompleteness)
and risk-sharing mechanisms must be considered in the option pricing. For the most part,
these options are neither valued nor hedged. But they exist and can adversely affect the
goals of a pension fund. A third reason is the overlapping of generations in the design
of the pension system. How can collective pension funds generate higher welfare than
individual solutions? See Cui et al. (2011) for a formal model.

We pursue the less ambitious task of managing the asset side only, with the
liability side implicitly included in the asset return benchmark. It
is customary to divide the return contribution into three parts: strategic asset allocation
(SAA), tactical asset allocation (TAA) and stock selection. The SAA is an asset allo-
cation over a long-term period of 5-10 years. The SAA is based on unconditional past
information; returns are unconditional expectations. The TAA seeks to exploit the pre-
dictability of returns over a short- to medium-term horizon. TAA forecasts are conditional
expectations; the current state of the financial market or the business cycle matters. As
a result, SAA weights change slowly over time while TAA weights are more dynamic.
Formally,

SAA: $E^P(R_{t+1})$, TAA: $E^P(R_{t+1}\,|\,\mathcal{F}_t)$.
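The distinction between the two expectations can be made concrete with a small sketch: the SAA forecast is an unconditional sample mean, while the TAA forecast conditions on a current state variable. The predictor, data and model below are synthetic and made up for illustration:

```python
# The SAA/TAA distinction above as a sketch: the SAA forecast is the
# unconditional mean E(R), the TAA forecast is the conditional mean
# E(R | F_t) given today's state. Data and model are synthetic and made up.
import random

random.seed(0)
n = 1000
signal = [random.gauss(0.0, 1.0) for _ in range(n)]             # e.g. a valuation ratio
returns = [0.05 + 0.02 * s + random.gauss(0.0, 0.1) for s in signal]

# SAA: unconditional expectation, estimated by the sample mean
saa_forecast = sum(returns) / n

# TAA: conditional expectation, estimated by a one-factor regression
mean_s = sum(signal) / n
beta = (sum((s - mean_s) * r for s, r in zip(signal, returns))
        / sum((s - mean_s) ** 2 for s in signal))
alpha = saa_forecast - beta * mean_s

def taa_forecast(s_today):
    """E(R | F_t): the forecast depends on the current state s_today."""
    return alpha + beta * s_today

print(f"SAA: {saa_forecast:.3f}, TAA given s=1: {taa_forecast(1.0):.3f}")
```

The SAA forecast is one number; the TAA forecast is a function of the current state and therefore changes as the state changes, which is why TAA weights are more dynamic.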

Definition 2. (Sharpe (2007)) In SAA, an investor's return objectives, risk tolerance,
and investment constraints are integrated with long-run capital market expectations to
establish exposures to permissible asset classes and currencies. The end result is a set of
portfolio weights (of asset classes) that defines the investor's risk-return trade-off.

The SAA's primary objective is to create a long-term optimal expected risk and
return asset mix. The SAA divides assets into different asset classes, geographic regions,
sectors, currencies and various credit rating levels. The SAA is related to systematic risk: the
risk inherent in all assets (in an asset class) that cannot be reduced by adding more
assets.

Definition 3. A risk premium is the compensation for bearing systematic risk, defined
by its expected excess return (expected return less the risk-free (benchmark) return).

The TAA bets on the predictability of asset returns. Are asset returns predictable?
If they are not, markets are efficient - why then are CIOs permanently placing bets? Although
the concept of a TAA has existed for more than 40 years, practitioners and scientists
attribute different meanings to a TAA. Practitioners use a one-period setup to define
a TAA. Academics often use intertemporal portfolio theory to derive dynamic optimal
investment rules. This theoretically optimal TAA has a myopic one-period component and
a dynamic hedging demand component. The myopic part of the optimal TAA
corresponds to the TAA of practitioners. The other component is missing in practice,
see Sections 3.3.4.6 and 4.7.

Example: Historical background of the TAA

We refer to Lee (2000) for a detailed discussion. The first investment firm to consider
a TAA was Wells Fargo in the 1970s. The decline in many assets during the 1973-1974
oil crisis increased investor demand for alternatives to shifts within a particular asset
class. Wells Fargo proposed shifts across asset classes, i.e. between stocks and bonds. The system was able
to generate positive returns over a period when stock markets fell more than 40 percent.
In the 1980s, portfolio insurance became popular, based on option pricing theory. These
dynamic strategies seek to maintain a guaranteed minimum portfolio return (floor). The
Constant Proportion Portfolio Insurance (CPPI) approach largely simplified the option
approach, making portfolio insurance even more attractive to investors. The global stock
crash in 1987 shifted investors' interest away from portfolio insurance back to TAA,
as portfolio insurance strategies mostly did not deliver the guaranteed floor, while TAA
strategies suffered before the crash but outperformed shortly thereafter.
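The CPPI rule mentioned above can be sketched in a few lines: the risky exposure is a fixed multiple of the cushion (portfolio value minus the floor), with the remainder held in the safe asset. Parameters and the price path below are illustrative:

```python
# A minimal CPPI sketch (our illustration, not from the source): the risky
# exposure is multiplier * cushion, where cushion = portfolio value - floor;
# the remainder earns the safe return.

def cppi_step(value, floor, multiplier, risky_return, safe_return=0.0):
    cushion = max(value - floor, 0.0)
    risky = min(multiplier * cushion, value)      # cap exposure: no leverage here
    safe = value - risky
    return risky * (1 + risky_return) + safe * (1 + safe_return)

value, floor = 100.0, 90.0
for r in [0.05, -0.20, -0.20, 0.10]:              # an illustrative stress path
    value = cppi_step(value, floor, multiplier=4, risky_return=r)
print(f"terminal value: {value:.2f} (floor: {floor})")
```

As the cushion shrinks, the rule automatically de-risks the portfolio. With discrete rebalancing, however, a sufficiently large overnight jump can still breach the floor - the gap risk that materialized for many portfolio insurers in 1987.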

Let us go back to the management of pension funds. We assume that the people at the
top of the funds have little investment knowledge. Their decisions concern the SAA. At
the lower end of the fund hierarchy are the experienced asset managers. Their success is
measured relative to the TAA and they seek to generate excess returns (commonly known
as 'alpha') over the TAA benchmark by selecting assets. However, many empirical studies
show that the SAA is the most important determinant of total return and risk of a broadly
diversified portfolio. This defines the discrepancy between economic relevance and know-
how in the hierarchy of decision-makers.

• Brinson et al. (1986) report that around 90 percent of the return variance arises
from the passive investment part. Subsequent papers clarified these findings and
estimate the importance of these returns to be between 40 percent and 90 percent
(see, for example, Ibbotson and Kaplan [2000]). Schaefer (2015), one author of the
professors' report to Norway's Government Pension Fund Global, states that
the variance attribution to the benchmark return was 99.1% and only 0.9% was
attributed to the active return.

• Between 5 and 25 percent are due to the TAA and related to the Chief Investment
Officer (CIO) function.

• Between 1 and 5 percent are due to security selection by the portfolio managers.

We turn to the definition of alpha. Consider the excess return $R^e_t$, which can be
regressed on the excess return $R^B_t$ of a benchmark portfolio:
$$R^e_t = \alpha + \beta R^B_t + \epsilon_t,$$
with $\beta$ the constant regression parameter and $\epsilon_t$ the idiosyncratic, mean-zero
random component. The residual return is
$$R^r_t = R^e_t - \beta R^B_t = \alpha + \epsilon_t.$$
We will encounter such return decompositions over and over again.

Definition 4. Looking forward (ex-ante), alpha is a forecast of the residual return. Looking
backward (ex-post), alpha is the average of the realized residual returns.
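The regression and Definition 4 can be illustrated with a small ordinary least squares sketch on synthetic excess returns (all numbers are made up); it also verifies that the ex-post alpha equals the OLS intercept:

```python
# Estimating alpha and beta in the regression above by ordinary least
# squares on synthetic monthly excess returns (all numbers are made up).
import random

random.seed(1)
bench = [random.gauss(0.005, 0.04) for _ in range(120)]           # benchmark excess returns
port = [0.001 + 1.2 * b + random.gauss(0.0, 0.01) for b in bench] # true alpha 0.1%, beta 1.2

mb = sum(bench) / len(bench)
mp = sum(port) / len(port)
beta = (sum((b - mb) * (p - mp) for b, p in zip(bench, port))
        / sum((b - mb) ** 2 for b in bench))
alpha = mp - beta * mb

# Ex-post alpha: the average realized residual return
residuals = [p - beta * b for p, b in zip(port, bench)]
ex_post_alpha = sum(residuals) / len(residuals)

print(f"beta: {beta:.2f}, alpha: {alpha:.4f}, ex-post alpha: {ex_post_alpha:.4f}")
```

By construction of the OLS estimates, the average realized residual return equals the estimated intercept, which is exactly the ex-post reading of alpha in Definition 4.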

Alpha has the portfolio property, i.e. the alpha of a combination of two returns is equal to the
position-weighted sum of the two alphas. By definition, the benchmark portfolio always has zero
residual return: the alpha of the benchmark portfolio must be zero, which also holds for
a risk-free portfolio and in particular for cash.

We turn to active portfolio management. An active position is the difference between
the portfolio holdings and the benchmark holdings. To reach an active position, active
management decisions are needed. This means that the manager's forecasts differ from
the consensus. We consider this in some detail in the section about skill and luck.

2.2.4 TAA Construction


Having introduced the concept of a TAA, we consider in this section its construction for
a Swiss intermediary.

Country | FX | Position | ISV TAA Benchmark | ISV Weight | Duration | ETF Implementation | ETF Weight | ETF Duration
CH CHF Liquidity Libor 1 Month 4.50 Pictet CH Short-Term Money Market CHF 4.50 0.14
CH Govis 14.00 4.78 14.00 4.78
CH CHF Bonds 1 - 5 SBI Domestic AAA-BBB 1-5y TR 9.66 CS ETF (CH) on Swiss Bd Idx Dom Govt 1-3 8.76 1.82
CH CHF IG SBI Domestic Non-Gov AAA-BBB TR 14.00 ZKB-CIF Swiss Bd TM Idx AAA-BBB Dom E 14.00
EU EUR Govis JPM EMU Government Xy TR 2.50 4.90 2.50 4.90
EU EUR Bonds 5 - 10 JPM EMU GOV 5-7/7-10y 1.13 iShares Barclays Cap Euro Gov Bd3-5 1.64 3.60
EU EUR IG Citigroup EUROBIG Corporate TR 2.25 ZKB-CIF EUR Corp. Bond Index E 2.25
EU EUR HYCB 0.25 iShares Markit iBoxx Euro Hi-Yld Bd (IE) 0.25
UK GBP Govis JPM GBI UK Xy GBP TR 1.00 5.06 1.00 5.06
USA USD Govis JPM GBI US Xy USD TR 1.00 5.00 1.00 5.00
USA USD Bonds > 10 JPM GBI US 10y+ USD TR 0.05 iShares Barclays Cap $ Trsy Bd7-10 0.55 7.53
USA USD IG Citigroup USBIG Corporate TR 0.80 ZKB-CIF USD Corp. Bond Index E 0.80
USA USD HYCB BofA Merrill Lynch US High Yield TR 0.20 iShares iBoxx $ High Yield Corporate Bd 0.20
CAN CAD Govis JPM GBI Canada Xy CAD TR 1.00 4.85 1.00 4.85
J JPY Govis JPM GBI J Xy JPY TR 0.00 0.00 Swisscanto (CH) Inst BF-JPY I 0.00 4.81
AUS AUD Govis JPM GBI Australia Xy AUD TR 1.00 4.64 UBS (Lux) BF AUD P Acc 1.00 4.90
EM USD EM Bonds JP Morgan EMBI+ TR 3.00 iShares JPMorgan $ Emerging Markets Bond 3.00
CH CHF Stocks MSCI Switzerland NR 11.00 Amundi ETF MSCI Switzerland A 11.00
EU EUR Stocks MSCI Europe ex Switzerland NR 9.50 Amundi ETF MSCI Europe Ex Switzerland 9.50

N-Americas USD Stocks MSCI North America USD NR 8.50 iShares MSCI North America 8.50
Asia / Pacific USD Stocks MSCI Pacific USD NR 6.50 ComStage ETF MSCI Pacific 6.50
EM USD Stocks MSCI Emerging Markets USD NR 4.50 db x-trackers MSCI Emerg Mkts TRN 1C 4.50
Global CHF Hedge Funds HFRX Global Hedge Fund CHF Index 4.50 db x-trackers db Hedge Fund 5C 4.50
Global USD Commodities DJ UBS Commodity TR Hedge to CHF 3.00 ZKB-CIF Commodity Index hedged CHF E 3.00
Global USD Gold Spotpreis USD/Unze CHF Hedged 3.00 ZKB Gold ETF CHF Hedged 3.00
CH CHF Real Estate CH SXI Real Estate Funds Index 4.00 UBS-IS - SXI Real Estate Funds I 4.00
Total 100.00 100.00

Figure 2.6: Inputs in a TAA. The table shows the different asset classes, their volatility-adjusted benchmark, the implementation when using ETFs, and the weights and durations of the benchmark and ETF portfolio. Source: ZKB (2013)

Figure 2.6 shows the inputs in the TAA construction. The considered asset classes
are cash, bonds, stocks, hedge funds, commodities, gold and Swiss real estate. Bonds are
split into three time buckets (1-5y, 5-10y and more than ten years) and into government
bonds, corporate investment grade (IG) and high yield (HYCB) bonds. The list of ETFs
represents a possible implementation of the benchmarks. The weights of the portfolio
are the result of an optimization such as a mean-variance optimization or the solution of
a risk budgeting model. The weights of the different asset classes are volatility weighted
(the ISV notation), which will be explained below.
30 CHAPTER 2. FUNDAMENTALS

The positions in the TAA are replicated using instruments that are more liquid and
cheaper than ETFs: futures, forwards and swaps, which we call REP instruments, see
Table 2.3. We introduce swaps in Section 2.3.2 and discuss forwards and futures in the
next section. We see that equity, foreign bonds and commodities are replicated using
futures, currency exposure using forwards, and swaps are used for CHF bonds. This means
that asset class Aj is written as a linear combination of the REP instruments. Say the 11%
allocation of Swiss stocks is equal to 9% in SMI Fut and 2% in SMIM Fut. The splitting of
the individual asset classes into the REP instruments is done by minimizing the tracking
error of the REP instruments towards the benchmark. Hence, a given REP instrument
can contribute to several asset classes. The allocation of the REP instruments is shown
in the table column Allocation 100%. Hedge funds and real estate cannot be attributed
to the liquid REP instruments. Their allocation weight is therefore zero.
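The splitting into REP instruments can be sketched numerically. The following snippet is a minimal illustration with simulated return series (not the actual ZKB data): it finds the least-squares weights of two futures that minimize the tracking error of daily returns against a benchmark.

```python
import random

# Sketch: split one asset class (say Swiss stocks) into two REP instruments
# (SMI and SMIM futures) by least squares, i.e. by minimizing the tracking
# error of daily returns. All return series here are simulated.
random.seed(0)
n = 250
r_smi = [random.gauss(0.0003, 0.010) for _ in range(n)]
r_smim = [random.gauss(0.0003, 0.012) for _ in range(n)]
# The benchmark behaves like ~80% SMI + 20% SMIM plus a small residual:
r_bench = [0.8 * a + 0.2 * b + random.gauss(0, 0.001)
           for a, b in zip(r_smi, r_smim)]

# Solve the 2x2 normal equations (X'X) w = X'y for the weights.
sxx = sum(a * a for a in r_smi)
syy = sum(b * b for b in r_smim)
sxy = sum(a * b for a, b in zip(r_smi, r_smim))
sxz = sum(a * c for a, c in zip(r_smi, r_bench))
syz = sum(b * c for b, c in zip(r_smim, r_bench))
det = sxx * syy - sxy * sxy
w_smi = (syy * sxz - sxy * syz) / det
w_smim = (sxx * syz - sxy * sxz) / det

# Annualized tracking error of the fitted replication:
resid = [c - w_smi * a - w_smim * b
         for a, b, c in zip(r_smi, r_smim, r_bench)]
mean_e = sum(resid) / n
tracking_error = (sum((e - mean_e) ** 2 for e in resid) / n) ** 0.5 * 250 ** 0.5
```

The recovered weights are close to the true 80%/20% split; in practice the regression is run over all REP instruments jointly, which is why one instrument can serve several asset classes.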

Instrument Allocation 100% Allocation Index Instrument Allocation 100% Allocation Index
Liquidity CHF 4.5% 4.5% Australian 10 YR Bond Future 0.6% 1.2%
SMI Fut 10.7% 20.7% Natural Gas Future 0.3% 0.5%
SMIM Fut 1.2% 2.3% Crude Oil Future 0.2% 0.4%
FTSE Fut 4.1% 7.9% Brent Oil Future 0.4% 0.8%
Euro-Stoxx 50 Fut 6.2% 12% Live Cattle Future 0.2% 0.3%
S&P 500 E-mini Fut 8.4% 16.4% Wheat Future 0.2% 0.4%
TSX 60 Fut 0.7% 1.4% Corn Future 0.2% 0.4%
SPI 200 Fut 1.8% 3.5% Soybean Future 0.4% 0.8%
TOPIX Fut 4.2% 8.2% Sugar Future 0.2% 0.3%
Hang Seng Fut 0.6% 1.2% Aluminum Future 0.2% 0.4%
MSCI Singapore Fut 0.4% 0.7% Copper Future 0.4% 0.8%
MSCI EM Fut 4.8% 9.4% Gold Future 3.8% 7.3%
Swap CHF 3 YR 24.1% 46.8% EUR/CHF Fw 11.5% 22.4%
Swap CHF 7 YR 6.2% 12.1% GBP/CHF Fw 5.1% 10%
Swap CHF 10 YR 1.7% 3.2% USD/CHF Fw 18.7% 36.2%
Euro-Schatz Fut 1.6% 3.2% CAD/CHF Fw 1.8% 3.5%
Euro-Bobl Fut 2.4% 4.7% AUD/CHF Fw 2.9% 5.6%
Euro-Bund Fut 1.3% 2.6% JPY/CHF Fw 4.2% 8.2%
Short Gilt Fut 0.6% 1.2% HKD/CHF Fw 0.6% 1.2%
Long Gilt Fut 0.5% 0.9% SGD/CHF Fw 0.4% 0.7%
US 2 YR Note Fut 1.2% 2.3% Total Cash 4.5% 4.5%
US 5 YR Note Fut 2.4% 4.7% Total Futures 95.5% 185.5%
US 30 YR Note Fut 1.8% 3.4% Total Forwards 45.2% 87.8%
Can 10 YR Bond Fut 1.1% 2.1% Volatility 60d 4.1% 8%
Mini JGB 10 YR Bond Fut 0% 0% Investment Degree 100% 194.2%
AUS 3 YR Bond Fut 0.5% 0.9%
Table 2.3: Futures, forwards and swaps in the TAA replication. The Allocation 100% column is
scaled to the Allocation Index column, which takes into account that the total volatility of the
TAA should equal 8%; see the text for explanations.

The next step is to take volatility into consideration. For each instrument calculate
the daily returns over one year. Then, at each date, form the weighted sum of the
instrument returns using the Allocation 100% weights. This gives the return time series
of the allocation index. To calculate the volatility of this index, compute the standard
deviation of the daily returns and multiply the result by the square root of the number
of return days within one year (square-root rule). This implies a volatility of 4.12%.
Given the target volatility of 8%, the investment degree of 194.2% follows, see the
Allocation Index column. We described a technical, mechanistic way of how to construct
a TAA. We reconsider this issue in Section 3.3.6.2 where we allow for bets of the CIO
defining the TAA. That is, the model TAA of this section is an input for the CIO, who
makes pairwise bets, each of a given dollar value, at inception.

2.2.5 Forwards and Futures


A forward contract is an obligation for the buyer (seller) to buy (sell) a specified quantity
of an underlying asset at a specified price at a specified future date. The terms, i.e. the
date of exchange T, the underlying asset S, the quantity to exchange N and the delivery
price K, are all fixed at conclusion t of the contract. Contrary to futures contracts, all
terms of a forward can be freely fixed by the two counterparties. Hence, forwards are
OTC contracts and each party faces the counterparty risk of the other.

Forwards can be used to hedge risks. Consider a German-based firm which wants to
buy goods in the US. The firm could buy the goods at a future date at the then prevailing
USD-EUR spot rate, exposing itself to exchange rate risk. To avoid this risk the firm can
enter today into a forward contract.

The forward price F is fixed such that no cash flows exist at spot date t; the PV of
a forward is zero. The delivery price K is set equal to the forward price F at spot date
t, i.e. K = F(t,T). The equality K = F(t,T) does not hold any longer after the spot date.
At maturity T the following cases arise:

• S(T) = K: the contract value at maturity is zero, so the German buyer of the
contract faces neither losses nor gains.

• S(T) > K: This is favourable for the German firm since USD can be bought at the
cheaper price K than the spot price. If the spot is lower than K, the opposite holds
true.

Summarizing, a forward contract has the following value for the buyer at maturity:

N · (S(T) − K),

with N the notional amount. For the seller, the payoff is N · (K − S(T)). Hence, the
payoff is a linear function of S(T), contrary to options, where the payoff is non-linear.
The linearity implies that the price of a forward does not depend on spot volatility since
the probability of making a gain or a loss at maturity is symmetric.

• How is K calculated?

• We know the value of a forward contract at initiation (zero) and at maturity. What
is its value at intermediate dates?

Proposition 5. The unique arbitrage-free forward price F(t,T) given a risk-free rate r
is under continuous compounding:

F(t,T) = S(t) · e^{r(T−t)} (2.1)

This price only holds for forwards where no dividends, no cost of storage, no interest
rate differential costs and no convenience yield apply. The growth rate of the forward
price F is equal to r, F(T,T) = S(T) and F(t,t) = S(t). To prove the proposition the
standard no-arbitrage argument is used. Assume F(t,T) > S(t) · e^{r(T−t)}. We set up a

portfolio W as follows. We borrow a money amount S(t) to buy the cheap stock S and
go short the more expensive forward for the same amount. Then W(t) = 0. At T, we
pay back the loan, sell the stock to fulfil the forward contract obligation and settle the
forward contract, which for the short position pays F(t,T) − S(T). The cash balance at T is

W(T) = −S(t) · e^{r(T−t)} + S(T) + (F(t,T) − S(T))
     = −S(t) · e^{r(T−t)} + F(t,T) > 0.

Using such a strategy we start with zero value and end at T with certainty with a
positive value - this is an arbitrage which allows for the construction of a money machine.
A similar argument applies for the other inequality. For simple compounding we get by
using the approximation e^x ≈ 1 + x:

F(t,T) ≈ S(t) · (1 + r(T − t)). (2.2)

We generalize to forwards on stocks paying dividends with a continuous rate d, com-
modities with storage cost rate s, bonds with coupon payment rate c, FX transactions with
different foreign and domestic interest rates (interest rate differential i) and commodities
with a convenience yield y. These extensions are captured by the net cost-of-carry yield,
i.e.

q = r + s − y − d − c ± i.

Proposition 6 generalizes Proposition 5:

Proposition 6. The unique arbitrage-free forward price F(t,T) given q is under
continuous compounding:

F(t,T) = S(t) · e^{q(T−t)} (2.3)

If q > 0, the costs of possessing the underlying exceed the benefits of possession. This
happens if storage costs are very high for a commodity forward. The buyer of the
forward therefore has to compensate the seller for these high costs. We next consider the
valuation of a forward at intermediate dates. We recall that the forward contract has
value 0 at initiation time t, and at T we have S(T) − K = S(T) − F(t,T) for the long
position. For an intermediate date s, t < s < T, the value V(s) satisfies

V(s) := PV(S(T) − F(t,T)).

The PV of S(T) equals S(s) if no dividends are paid, and

PV(F(t,T)) = D(s,T)F(t,T),

which implies

V(s) = S(s) − D(s,T)F(t,T).

Using the no-arbitrage relation S(s) = F(s,T)D(s,T) we get:

Proposition 7. The value of a forward contract at time s is given by

V(s) = (F(s,T) − F(t,T))D(s,T). (2.4)

The consistency check, i.e. setting s = t or s = T, shows that the known initial and
final values follow.

Consider a stock with S(t = 0) = 25 CHF and a 6m forward contract. The 6m
interest rate is r = 7.12%. Using simple compounding we get

F(0, 0.5) = 25(1 + 0.0712/2) = 25.89 CHF.

After 3m the stock price is S(0.25) = 23 CHF and 3m interest rates are
r = 8.08%. The forward price of a new contract with the same maturity reads

F(0.25, 0.5) = 23(1 + 0.0808/4) = 23.46 CHF.


The value of the old contract is

V (0.25) = (F (0.25, 0.5) − F (0, 0.5))D(0.25, 0.5) = −2.38 CHF .


If the buyer (long) wants to leave the original contract after 3m, he has to pay 2.38 CHF
to the seller. The value of the old contract for the short position is +2.38 CHF,
since the sum of both positions is a zero-sum game. The difference between the forward
and spot price is called the basis b: b(t) = F(t,T) − S(t).
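The arithmetic of this example can be verified with a few lines of Python:

```python
# Recomputing the forward example with simple compounding.
def fwd_simple(spot, r, tau):
    """Forward price F = S * (1 + r * tau) under simple compounding."""
    return spot * (1 + r * tau)

def discount_simple(r, tau):
    """Simple-compounding discount factor D = 1 / (1 + r * tau)."""
    return 1.0 / (1.0 + r * tau)

f_old = fwd_simple(25.0, 0.0712, 0.5)      # F(0, 0.5)   = 25.89 CHF
f_new = fwd_simple(23.0, 0.0808, 0.25)     # F(0.25, 0.5) = 23.46 CHF

# Value of the old contract after 3m, formula (2.4):
value_old = (f_new - f_old) * discount_simple(0.0808, 0.25)   # = -2.38 CHF
```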

Forwards offer full flexibility to the two involved parties. But forwards also possess
some drawbacks. Each seller needs to find a buyer and both parties face counterparty
default risk. These two drawbacks are eliminated by using futures instead of forwards.
Futures are traded at a futures exchange. Each futures exchange has a clearinghouse,
a well-capitalized financial institution. The clearinghouse acts as an intermediary
between the two parties and guarantees contract performance to both of them.
The two parties then have an obligation to the clearinghouse and no longer to each
other. Risk of default is therefore reduced to the default risk of the clearinghouse. To reduce
the default risk of the clearinghouse and the exchange, the buyer and seller must deposit
funds with their broker; the margins. The margin must be posted in an eligible form such as
cash or specified securities. Since the initial margin is typically a single-digit percentage of
the goods represented in the futures contract, potential losses are much higher than the
margin deposit. To counteract this risk, the potentially large gains or losses in futures
contracts are not left to grow over time but are realized on a daily basis, i.e. the
exchange values the futures positions daily by marking them to market. The margin splits
at contract initiation into the initial margin, the maintenance margin, which reflects the
necessary minimum amount on the margin account, and the variation margin, payable
if the margin account falls below the maintenance margin.

Futures contracts are standardized contracts w.r.t. the delivery date T, the un-
derlying value and the quantity to deliver. The two parties only have to agree about the
delivery price K and the number of contracts.

Since a future's initial price is zero, buying a future is essentially equivalent to buying the
underlying value financed by borrowing. Therefore large leveraged positions are possible.
They allow for the same exposure as the underlying value but at lower costs - lower
fees and smaller bid-ask spreads. Since the futures price can vary heavily during its lifetime,
the investor needs enough liquidity for potential margin calls. For example, at the Chicago
Mercantile Exchange the minimum amount is 250 thousand US dollars. The largest futures
exchanges are the CME, CBOT and Eurex.

To explain the margining process, see Table 2.4, we consider an example:

• Monday

 - The investor buys a USD-EUR futures contract with a notional amount of EUR 1'250'000.
 - The underlying value is 0.7 USD/EUR and maturity is 1 year.

• Tuesday

 - Price of the underlying: 0.5 USD/EUR.
 - 0.2 USD/EUR × EUR 1'250'000 = USD 250'000 are taken away from the buyer's margin account.

• Wednesday

 - Price of the underlying: 0.8 USD/EUR.
 - 0.3 USD/EUR × EUR 1'250'000 = USD 375'000 are credited to the buyer's margin account.

Time Fut Price Margin Seller Margin Buyer


0 F0 0 0
1 F1 F0 − F1 F1 − F0
··· ··· ··· ···
T −1 FT −1 FT −2 − FT −1 FT −1 − FT −2
T FT FT −1 − FT FT − FT −1

Table 2.4: Margin calls.
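The daily variation margin flows of the Monday-to-Wednesday example can be computed as follows (a sketch from the buyer's perspective; real margining additionally involves initial and maintenance margins):

```python
# Variation margin flows for the buyer of a USD-EUR futures contract:
# notional EUR 1'250'000, daily settlement prices 0.7, 0.5, 0.8 USD/EUR.
def variation_margins(prices, notional):
    """Daily credits (+) / debits (-) to the buyer's margin account."""
    return [(p1 - p0) * notional for p0, p1 in zip(prices, prices[1:])]

flows = variation_margins([0.7, 0.5, 0.8], 1_250_000)
# First flow: debit of USD 250'000; second flow: credit of USD 375'000.
```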

In a Dax future the notional amount is equal to the Dax index value times 25 Euro.
For a Dax at 3'900 points the notional amount of a futures contract is 97'500 Euro. The
initial margin for a Dax future is Euro 30'850. With Euro 100'000 on the margin account
one can enter into at most 3 futures contracts. The maturity of Dax futures is typically
3 months. The tick size for the futures is 0.5 Dax index points. Since the value of a single
Dax point is Euro 25, the value of a tick is Euro 12.5. The exchange fee is 50 Euro
cents per contract.

How do we value futures?

Proposition 8. If there is no interest rate risk, no default risk of the counterparties and no
arbitrage, then the valuation of futures is the same as the valuation of the corresponding
forward contracts.

Table 2.5 proves the proposition, where r is the fixed one-day gross interest rate.

Time Forward Future

0 0 0
1 0 +r^1 (V_F(1) − V_F(0)) r^{T−1}
2 0 +r^2 (V_F(2) − V_F(1)) r^{T−2}
3 0 +r^3 (V_F(3) − V_F(2)) r^{T−3}
. . .
T−1 0 +r^{T−1} (V_F(T−1) − V_F(T−2)) r^1
T r^T (S(T) − F(t,T)) +r^T (S(T) − V_F(T−1))

Table 2.5: Equivalence of forwards and futures for deterministic interest rates

If we price futures in a market without any frictions, the pricing of futures is given
by the cost-of-carry model, i.e. no arbitrage is the driver. If C represents the expected
cost-of-carry, i.e. the costs which are necessary to carry the good forward from t to the
delivery date T, the no-arbitrage relation reads

F(t,T) = S(t)(1 + C), or F(t,T) = S(t)e^{q(T−t)}, (2.5)

which relates the costs C uniquely to the cost-of-carry yield q.

We illustrate (2.5) for S an equity index, D the value of the dividends before
maturity and r the annualized financing rate or money market yield. The fair futures
price reads

F(t,T) = S(t)(1 + r (T−t)/360) − D. (2.6)
Let B(t,T) be the market price of a bond including accrued interest (dirty price).
The fair futures price is given by

F(t,T) = Bond Price + Interest Cost − Coupon Income

where the interest costs are the interest opportunity costs and the coupon payments are
those up to expiration of the futures contract. Let c be the annualized coupon rate and A
the days of accrued interest, then

F(t,T) = B(t,T)(1 + r (T−t)/360) − cB(t,T) (T−t+A)/360.

Scenario 1 Scenario 2
today tomorrow today tomorrow
Spot 400 600 Spot 400 200
Future 425 637 Future 425 212
1:1 Hedge -25 -37 1:1 Hedge -25 -12
P&L -12 P&L 12

Table 2.6: Delta static hedge.

This formula assumes that the bond can be bought and delivered at any date. But this
need not be true. Consider US Treasury bond futures which are traded on the Chicago
Board of Trade (CBOT). These futures have quarterly expiration dates. The size of one
futures contract is equal to USD 100'000 face value of an eligible Treasury bond having
at least 15 years to maturity and which is not callable for at least 15 years. Therefore
B(t,T) in the second bond expression in the last formula is replaced by a bond tradeable
for the short seller - the cheapest-to-deliver bond B_cd(t,T). Since different bonds have
different characteristics, standardization is lost at this stage. To give the short seller
flexibility in choosing which bond is actually delivered, the actual Treasury bond selected
by the short seller for delivery is price adjusted by a delivery factor f such that the bond
reflects a standardized 8 percent coupon rate:

F(t,T) = [B(t,T)(1 + r (T−t)/360) − cB_cd(t,T) (T−t+A)/360] / f. (2.7)

These futures are quoted in 1/32 units, where 100 represents an 8 percent coupon bond
with a yield-to-maturity of 8 percent too.
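A small sketch of formula (2.7) in Python, with hypothetical inputs (dirty price 102, financing rate 3%, coupon 6%, 90 days to expiry, 30 days of accrued interest) and, for simplicity, one dirty price used in both terms; for f = 1 it reduces to the formula without the delivery factor:

```python
# Sketch of the bond futures fair price with a conversion factor f.
# All numbers are hypothetical and for illustration only.
def bond_future_fair(dirty_price, r, c, days, accrued, f=1.0):
    interest_cost = dirty_price * r * days / 360
    coupon_income = c * dirty_price * (days + accrued) / 360
    return (dirty_price + interest_cost - coupon_income) / f

f_std = bond_future_fair(102.0, 0.03, 0.06, 90, 30)          # f = 1: 100.725
f_cd = bond_future_fair(102.0, 0.03, 0.06, 90, 30, f=1.05)   # with factor
```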

Delta hedging is used to hedge the risk of a futures position. Consider gold with spot price
400 (in some currency), net cost-of-carry 6% p.a. and time-to-maturity one year, i.e. the
present futures price is F(t,T) = 400e^{0.06} ≈ 425, assuming no interest rate risk. In the
first scenario the investor's static hedge is short one future and long one spot position in gold.
Table 2.6 summarizes the profit and loss for two price scenarios, where we set the spot
and futures quantity to unity. The investor faces a gain or loss in the respective scenario.
This is not what we expect of a hedge. The reason is that spot and future only move
1:1 if the cost-of-carry is zero. This yield is not zero, therefore spot and futures price
evolve differently over time.

A Delta hedge is based on the Delta ∆^Fut_Spot, i.e. the sensitivity of the spot price with
respect to a variation in the futures price:

∆^Fut_Spot = Change Spot / Change Futures.

If x is the change in the futures price and τ = T − t, no arbitrage between spot and

today in 1m in 6m
Spot 400 600 500
Future 425 634 515
1:1 Hedge
Portfolio -25 -34 -15
Profit & Loss -9 19
Delta Hedge
Portfolio 0 0 0
P&L 0 0
Delta 0.942 0.946 0.970

Table 2.7: Delta hedge over time and the 1:1 static hedge.

futures price implies

∆^Fut_Spot = e^{−qτ} x / x = e^{−qτ}, (2.8)

i.e. the Delta is determined by the cost-of-carry. Hence, the Delta hedged portfolio
W is short ∆ times the future and long the spot:

W(t) = S(t) − ∆^Fut_Spot Fut(t).

If the future changes, the portfolio changes as follows:

∆W(t) = ∆S(t) − ∆^Fut_Spot ∆Fut(t) = ∆S(t) − ∆S(t) = 0,

which proves the hedging property. Applying the Delta hedge to the scenarios of Table 2.6,
the portfolio value change and the P&L are zero in each scenario.
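The hedging property for the gold example can be checked numerically: under the no-arbitrage futures price F = S e^{qτ} (holding τ fixed for the comparison), the Delta hedge has zero P&L while the 1:1 hedge reproduces the loss of −12 of Table 2.6:

```python
import math

# Gold example: spot 400, net cost-of-carry q = 6% p.a., one year to
# maturity, so F = S * exp(q * tau) and Delta = exp(-q * tau).
q, tau = 0.06, 1.0

def fut(s):
    return s * math.exp(q * tau)    # no-arbitrage futures price

delta = math.exp(-q * tau)          # Delta of spot w.r.t. the future

spot0, spot1 = 400.0, 600.0         # price move of scenario 1 in Table 2.6

# Long spot, short futures; the Delta hedge shorts only `delta` contracts.
pnl_delta = (spot1 - spot0) - delta * (fut(spot1) - fut(spot0))
pnl_static = (spot1 - spot0) - 1.0 * (fut(spot1) - fut(spot0))
```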

Table 2.7 shows how the Delta hedge changes over time for a 1m and a 6m scenario.
As an illustration the 1:1 static hedge is also shown. Consider an investor who possesses
a stock portfolio which replicates the SMI index. If the SMI is at 7'300 points, the value
of the portfolio is CHF 7.3 Mio. The investor wishes to hedge against price changes for
the next six months using SMI index futures. The interest rate is at 3 percent and the
dividend yield at 4 percent. The first, static hedge is to short 1'000 6m SMI futures.
Suppose that today the SMI falls to 7'200 points. The futures price for an index level
of 7'300 (7'200) points is

7'300 e^{(0.03−0.04)/2} = 7'263.59 CHF, 7'200 e^{(0.03−0.04)/2} = 7'164.09 CHF.

This gives a gain in the futures position of

(7'263.59 − 7'164.09) × 1'000 CHF = 99'500 CHF.

This gain is not sufficient to cover the loss on the stock portfolio of CHF 100'000. There-
fore, a change in the futures price is not equivalent to a change in the spot price. Us-
ing Delta hedging, the factor is e^{(0.03−0.04)/2} = 0.995. The futures price changes less
strongly than the spot, and one needs not 1'000 futures to hedge as in the static case but
1'000/0.995 = 1'005.025. Using this amount, the gain in the futures position if the SMI drops
from 7'300 to 7'200 points is

(7'263.59 − 7'164.09) × 1'005 CHF = 99'997.50 CHF,

which is an almost perfect hedge of the incurred loss of CHF 100'000 on the stocks.
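The SMI hedging numbers can be reproduced as follows:

```python
import math

# SMI hedge example: portfolio CHF 7.3 Mio at an index level of 7'300
# points, 6m horizon, interest rate r = 3%, dividend yield d = 4%.
r, d, tau = 0.03, 0.04, 0.5

def fut(s):
    return s * math.exp((r - d) * tau)   # cost-of-carry futures price

loss_stocks = (7300 - 7200) * 1000       # CHF 100'000 loss on the stocks

gain_static = (fut(7300) - fut(7200)) * 1000          # ~ CHF 99'500
n_delta = 1000 / math.exp((r - d) * tau)              # ~ 1'005 contracts
gain_delta = (fut(7300) - fut(7200)) * n_delta        # = CHF 100'000
```

The Delta-adjusted number of contracts makes the futures gain match the stock loss exactly.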

2.2.6 Private Investors


Private investors differ in many ways from sovereign wealth funds and pension funds.
Their biggest asset is human capital, which interacts with assets along the life cycle as
follows. When young, they only have human capital. In the course of their lives, human
capital generates an income that is converted into financial capital. When retiring, most
people no longer use their human capital to generate financial capital but consume the
accumulated financial capital. Pension funds, for example, are timeless - they face no
such life cycle.

Private investors show a strong real estate dependence in their balance sheet, see
Figure 2.7 for the Swiss case. In particular, younger investors face a large leverage effect
from mortgage financing: the ratio of assets (real estate) to own capital is large. Small
changes in the property price have a significant impact on the balance sheet equity
of the investor. Interest rate risk and real estate market price risk affect the asset side. The
latter is potentially more dangerous for the investor's default and thus for systemic risk.

Figure 2.7: Balance sheet of private households in Switzerland (SNB [2018]).



Consider a private investor who bought a house worth CHF 1 million. The 'golden
rule of affordability' in Swiss banking states that the investor needs to cover 20% of the
house price with his own capital and that the interest rate charge for the mortgage should
not exceed 1/3 of regular income, assuming a hypothetical high interest rate level of
5%. For a mortgage of CHF 800'000 the regular income of the investor has to be not lower
than CHF 3 × 0.05 × 800'000 = 120'000. Suppose that the investor gets a mortgage with a
fixed 5-year rate of 1%, which is a plausible number in 2016. He therefore has to pay
for the next 5 years, without any amortization payments, CHF 8'000 per annum, which is
much less than renting the same object. Assume that the remaining liquid capital of the
investor is CHF 100'000 and the annual salary is CHF 150'000.

The leverage ratio of the investor is λ = 1'000'000/100'000 = 10. Consider two scenarios. First,
interest rates are up to 3% in five years. Second, house prices fall by 15% in the next five
years. The first scenario implies that the investor has to pay CHF 24'000 per annum for
the interest rate charge - CHF 16'000 more per annum. In the second scenario,
the house is only worth CHF 850'000. Since the investor should always cover 20% of
the house price, a maximum mortgage of 80% means a mortgage value of CHF 680'000 given
the new house price of CHF 850'000. The investor has to pay the difference between the old and
new mortgage value of CHF 120'000. This is almost a full annual salary and exceeds the
investor's liquid capital - payable in principle within 30 days. Hence, house price risk is a
more severe risk than interest rate risk.
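The affordability and scenario arithmetic of this example in Python:

```python
# Affordability arithmetic of the 'golden rule' example.
house_price = 1_000_000
own_capital_share = 0.20
mortgage = house_price * (1 - own_capital_share)       # CHF 800'000
min_income = 3 * 0.05 * mortgage                       # CHF 120'000
interest_now = 0.01 * mortgage                         # CHF 8'000 p.a.

# Scenario 1: rates rise to 3% after five years.
interest_stress = 0.03 * mortgage                      # CHF 24'000 p.a.

# Scenario 2: house price falls 15%; mortgage capped at 80% of new value.
new_price = house_price * 0.85                         # CHF 850'000
max_mortgage = 0.80 * new_price                        # CHF 680'000
repayment_due = mortgage - max_mortgage                # CHF 120'000
```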

Given the importance of real estate risk for private clients, it is not understandable
why the myriad of sophisticated wealth management tools almost always only considers
the financial assets, leaving aside the house asset and the mortgage debt.

The different types of investors also differ in the type of financial assets they buy.
The more professional investors are, the more they invest in cash products such as bonds
and stocks. They do not use mutual funds or structured products, since they can create
the same payoffs without paying the wrapper costs. Figure 2.7 shows on the aggregate
of all investors that bond investments and structured products did not grow in the last
decade, in contrast to the growth of funds and shares.

Individuals and smaller pension funds prefer mutual funds and structured products.
One reason is a lack of capital to reach a reasonable diversification. We discuss below that
a Swiss investor needs about CHF 1.5 million in order to achieve a reasonable diver-
sification by investing in cash products. The second reason is that individuals fail to
have direct access to some markets: they cannot enter into short positions or are not
allowed to trade derivatives under the International Swaps and Derivatives Association
(ISDA) agreement. They are forced to buy derivatives in the packaged form of a mutual
fund or a structured product.

How do investors consider an asset and liability approach in investment? Research
from State Street (2014), using data from a worldwide survey of 3,744 investors, shows
that although nearly 80 percent of investors realize the importance of achieving long-term
goals, proficiency in achieving them can deviate strongly. In the US, public pension
funds were on average less than 70 percent funded, with more than USD 1.3 trillion
of unfunded liabilities. A similar picture holds for private investors. While 73 percent
cited long-term goals, only 12 percent could say with confidence that they were on target
to meet those goals. Many academic papers address the misalignment between what
investors say is important (ALM) and what they do (asset only). There is a myriad of
possible reasons for this difference between what they state and what they do, which are
discussed in the papers.

2.3 Returns and Performance Attribution


Returns are key in asset management for the calculation of risk and performance. The
calculation of returns is not as straightforward as one might guess. One needs to cal-
culate returns for arbitrarily complicated cash flow profiles where cash can be injected
or withdrawn at different dates. Different assets possess different time scales for
return calculations, varying from intraday to months for illiquid assets. Returns often
need to be aggregated for risk calculations to reduce the dimensionality, and risk models
are needed to estimate expected returns. Finally, the return for an investor can be the result
of several money managers, i.e. returns should be decomposable to account for the different
contributors.

Why do we work with returns and not with prices? Prices can show exponential
growth behaviour over time. Such price time series are statistically hard to handle.
The mean value has little meaning if prices grow exponentially. Therefore, one
works with a scale-free quantity: returns. Why does one work with log-returns? The
simple return over a period, (S_t − S_{t−1})/S_{t−1}, is not useful if one tries to model returns assuming
a normal distribution, since simple returns range from −1 to +∞ while the normal dis-
tribution ranges over the reals. Furthermore, returns aggregate multiplicatively, i.e. the
10-day gross return is the product of the ten one-day gross returns. But the product of
normal random variables is not normal. One therefore prefers to work with log-returns,
where the product in return aggregation is replaced by a sum, and the sum of normally
distributed log-returns is again normally distributed.
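A short demonstration of the additivity of log-returns versus the multiplicativity of gross returns:

```python
import math

# Log-returns aggregate additively: the log-return over the whole period is
# the sum of the one-period log-returns, while simple gross returns multiply.
prices = [100.0, 101.5, 99.8, 102.3, 103.1]

log_returns = [math.log(p1 / p0) for p0, p1 in zip(prices, prices[1:])]
total_log_return = math.log(prices[-1] / prices[0])

gross_returns = [p1 / p0 for p0, p1 in zip(prices, prices[1:])]
total_gross = prices[-1] / prices[0]
```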

2.3.1 Time Value of Money (TVM)


Since returns compare cash flows (CF) at different dates, the time value of money matters.
CFs at different dates cannot simply be added since the value of CHF 1 today is different from
CHF 1 at any other date. The microeconomic assumption of impatience rationalizes why
there is a time value of money. Consider consumption of the same good c at times t and
T > t. If investors prefer earlier consumption to later,

u(c_t) ≥ u(c_T),

where the utility function u values consumption. To make the investor indifferent, the
consumption good at time T must be larger than at time t, i.e.

u(c_t) = u(c_t + ∆_{t,T}) =: u(c_t (1 + R_{t,T}))

with ∆ the interest and R the interest rate to compensate for impatience.

The function which weights CFs at different dates T > t is the discount function
D(t,T). Discounting restores additivity, which makes it possible to calculate single
numbers, such as the present value PV or future value FV, for any complicated cash
flow profile and to compare different investment returns. Discounting is the necessary
ingredient such that the price of a product can be written as the probability- and
time-weighted sum of future cash flows.
The discount function has the form D(t,T) = D(T−t), i.e. the homogeneity of time or
the irrelevance of the vista time. The inverse operation of discounting is compound-
ing, i.e. D(t,T)D(t,T)^{−1} = 1.

Consider CHF 1 at time T and two scenarios. First, discount the CHF back directly
to t. Second, discount it first back to a time s, t < s < T, and then from s to t. We
assume that there is no risk. The value at t of the Swiss franc should be independent
of the chosen discounting path. Otherwise, buying low and selling high generates a money
machine (arbitrage in a risk-free environment). Formally,

D(t,s)D(s,T) = D(t,T), D(t,t) = 1. (2.9)

Cauchy proved that the exponential function is the unique continuous function which
satisfies (2.9):

D(t,T) = e^{−a(T−t)}, a > 0.

This motivates exponential discounting. a has the dimension of inverse time, and calculating
the growth rate of the discount factor, (∂D/∂T)/D = −a, identifies a with the interest rate R.
The discount function D(t,T) for different maturity dates T defines the spot rate term
structure, which we write {D(t,T)} := {D(t,T), T ≥ 0}. Assume next that there exists
interest rate risk. Then equation (2.9) makes no sense since D(s,T) is a random variable.
To restore the identity, we have to fix the rate between s and T at time t, i.e. with the
discount factor D(t,s,T) we again have that D(t,s)D(t,s,T) = D(t,T). This defines
the forward rate term structure {D(t,s,T)}. Given one term structure, the other
follows by the no-arbitrage relation (2.9).
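A small numerical sketch of the relation D(t,s)D(t,s,T) = D(t,T), assuming for illustration a flat 2% continuously compounded rate:

```python
import math

# Forward discount factor from two spot discount factors,
# assuming a flat 2% continuously compounded rate.
r = 0.02

def D(tau):
    """Spot discount factor D(t, t + tau)."""
    return math.exp(-r * tau)

D_ts = D(1.0)            # D(t, s), s = t + 1y
D_tT = D(3.0)            # D(t, T), T = t + 3y
D_fwd = D_tT / D_ts      # forward discount factor D(t, s, T) for [s, T]
```

Under a flat rate the forward factor equals e^{−r(T−s)}, i.e. the rate fixed today for the future period [s,T] is again 2%.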

The absence of arbitrage implies that there exists exactly one discount factor for
each currency and for each maturity; otherwise one could build a money machine. But
there are many different forms of interest rate, profit and loss and performance calculations.
The reasons are:

• The method of compounding - do investors reinvest their proceeds in future periods
(compounding) or do they consume them (simple compounding)?

• Do we use market rates for discounting or synthetic rates from an asset management
perspective, such as the yield-to-maturity (YtM), to value and compare different
investments?

• The calendar and day-count conventions differ: the number of days within a year
varies across countries, exchanges and products.

Examples

Compounding

Investing n years with compounding and simple compounding implies:

FV_n^D = PV (1 + R_d)^n, FV_n^S = PV (1 + nR_s). (2.10)

Hence, FV_n^D ≥ FV_n^S. The formulae can be generalized to the case with sub-annual
periods and where R is not constant. The limit future value is achieved if interest is
compounded instantaneously, which results in the exponential (continuous) compounding
formula as the limit of how fast capital can grow.
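The ordering of the three future values, with continuous compounding as the upper limit, can be checked numerically:

```python
import math

# FV of 1 unit over n years at 5%: simple vs discrete vs continuous
# compounding; continuous compounding is the upper limit.
pv, rate, n = 1.0, 0.05, 10

fv_simple = pv * (1 + n * rate)           # 1.50
fv_discrete = pv * (1 + rate) ** n        # ~ 1.6289
fv_continuous = pv * math.exp(rate * n)   # ~ 1.6487
```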

Unique discount factor

The continuous discount factor D_c = e^{−R_c(T−t)}, the discrete-time
D_d = (1 + R_d)^{−(T−t)} and the simple D_s = (1 + R_s(T−t))^{−1} all have to attribute
the same PV to a future CHF 1. Therefore, the different rates R_c, R_d, R_s are in a
one-to-one relationship. Equating for example FV_n^D = FV_n^S implies
R_d = (1 + nR_s)^{1/n} − 1.

Continuous discounting approximation

Continuous discounting,

PV = D(0,1)FV = e^{−R_c(1−0)}FV = e^{−R_c}FV,

implies

R_c = ln(FV/PV).

If we consider short time periods (say daily return calculations), the logarithm can be
approximated to first order by the simple return: ln(FV/PV) ≈ FV/PV − 1 =: R.

Remarks:

• Interest rates are quoted on a p.a. basis.

• Simple discounting is used for LIBOR rates and products with maturity less than a
year, discrete compounding for bonds and continuous compounding for derivatives
or Treasury Bills.

• The discount function is a simple function of the interest rate. But the interest rate
itself is a complicated function of a risk-free rate, the creditworthiness of counter-
parties, liquidity in the markets, etc. The construction of the discount function is
the key task in financial engineering. A whole industry has developed, and still
develops, methods to construct the discount function D(t,T) for different maturities
T - the term structure.

Example

Let p(t, T) be the price of a zero-coupon bond (ZCB) at time t paying USD 100 at maturity T if there is no default. Except for counterparty risk, a ZCB is the same as a discount factor. ZCBs are the simplest interest rate products. More complex products, such as coupon-paying bonds, can be written as linear combinations of ZCBs. Consider a coupon bond with yield R, i.e. the rate such that the PV of the bond's cash flows equals its present price. The slope of the price-yield graph is negative, since a bond issued today will have a lower price tomorrow if interest rates increase (opportunity loss). The relation is non-linear since p(t, T) = D(t, T) × 1 = (1 + R(t, T))^{−(T−t)} × 1.

Example: Effective rate of return and Yield-to-Maturity (YtM)

The effective simple rate R_{e,s} is the gross return needed to reach the FV from the PV:

(1 + R_{e,s})PV := FV .

Consider R_{e,s} for an n-year investment in a stock S (where PV = S_0, FV = S_n):

1 + R_{e,s} = S_n/S_0 = (S_n/S_{n−1})(S_{n−1}/S_{n−2}) ··· (S_1/S_0) = ∏_{k=1}^{n} (1 + R_{k,k−1})

where R_{k,k−1} is the sub-period return. The effective simple gross return is equal to the product of the period gross returns. The compounded effective rate R_{e,d} follows by taking the n-th root in the above formula. If compounding is continuous, the effective return is equal to the arithmetic sum of the period log returns. This is one reason why continuous compounding is preferred.
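A minimal sketch with hypothetical prices verifies that the effective gross return telescopes into the product of period gross returns, and that the log returns sum:

```python
import math

# The effective gross return over n periods is the product of the period
# gross returns; with log returns it is a sum (hypothetical prices).
S = [100.0, 104.0, 99.0, 107.0]           # S_0 ... S_3
period_returns = [S[k] / S[k - 1] - 1 for k in range(1, len(S))]

gross = 1.0
for r in period_returns:
    gross *= 1 + r                        # product of gross period returns
assert abs(gross - S[-1] / S[0]) < 1e-12  # telescoping product

log_sum = sum(math.log(1 + r) for r in period_returns)
assert abs(log_sum - math.log(S[-1] / S[0])) < 1e-12
```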

A particular decision problem for an investor is to choose between two bonds:

• Bond 1: Price 102, coupon 5%, maturity 5 years.

• Bond 2: Price 98, coupon 3%, maturity 5 years.



Bond 1 has more attractive future CFs but bond 2 is cheaper. Which one to prefer? If the maturity were longer, bond 1 should become more profitable; the opposite holds if bond 2 becomes even cheaper relative to bond 1. The yield-to-maturity (YtM) y is a decision criterion which assumes that products are kept until maturity. Then, by definition, the YtM y solves the equation:

Price = Σ_{j=1}^{n} c/(1 + y)^j + N/(1 + y)^n .

The bond with the higher resulting y is the preferred one. This equation is easily solved numerically. The YtM, which assumes a flat term structure, is the most important example of a Money-Weighted Rate of Return (MWR), see below.
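The YtM equation can be solved numerically, e.g. by bisection; the following sketch applies this to the two bonds above (coupons in percent of a notional of 100):

```python
# Solve Price = sum c/(1+y)^j + N/(1+y)^n by bisection for the two bonds
# of the example; the bond price is strictly decreasing in y.
def price(y, coupon, n, notional=100.0):
    return sum(coupon / (1 + y) ** j for j in range(1, n + 1)) + notional / (1 + y) ** n

def ytm(target_price, coupon, n, lo=0.0, hi=1.0):
    for _ in range(100):                  # bisection on [0, 100%]
        mid = (lo + hi) / 2
        if price(mid, coupon, n) > target_price:
            lo = mid                      # price too high -> yield too low
        else:
            hi = mid
    return (lo + hi) / 2

y1 = ytm(102.0, 5.0, 5)                   # bond 1: price 102, coupon 5%, 5y
y2 = ytm(98.0, 3.0, 5)                    # bond 2: price 98, coupon 3%, 5y
assert y1 > y2                            # bond 1 offers the higher yield
```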

2.3.2 Interest Rate Swaps


The construction of discount factors is a key discipline in financial engineering. The discount factors are derived from prices of liquid financial instruments, together with mathematical interpolation for maturities where no observable asset prices exist. For maturities up to one year, money market instruments, futures or forward rate agreements (FRAs) are used. For longer maturities, capital market instruments, bonds or swap rates are used. We therefore introduce interest rate swaps (IRS) and FRAs. Both instruments are derivatives.

Vanilla Interest Rate Swaps (IRS)

Vanilla interest rate swaps¹ are bilateral contracts. Typically, fixed versus floating rates are exchanged. The reference rate for the floating leg is typically LIBOR or EURIBOR. The notional amount is not exchanged; it serves only as a calculation figure. In USD or EUR the maturity ranges between 2 and 30 years. To enter such a contract, an ISDA agreement and counterparty risk limits are needed. The minimum contract size in Swiss Francs is CHF 2 Mio. For the fixed payments the day count convention of the bond markets, 30/360, is used, and act/360 is used for the floating leg.² The counterparty paying the fixed rate is called the 'payer', the other one the 'receiver'. The payer (receiver) is by convention long (short) the swap.

Originally, IRS were introduced for interest arbitrage reasons. Consider two firms A and B. A has a high creditworthiness, B a low one. Both firms can borrow at a fixed or a floating rate, given in Table 2.8.

¹ The notion vanilla is used for basic derivatives. Vanilla products on equity are call and put options. More complicated products are called exotic. Vanilla interest rate swaps do not possess, for example, complex prepayment or amortisation schedules.
² '30/360' means that each month has 30 days and each year has 360 days. The convention 'act' means that the actual calendar days are counted. The reset frequency is the frequency of floating payments.

          A        B                 Difference
Fixed     5%       6%                1%
Floating  LIBOR    LIBOR + 0.75%     0.75%

Table 2.8: Borrowing rates for firms A and B.

The two firms can both benefit from entering into an IRS, since the difference between the fixed borrowing rates (1%) differs from the difference between the floating ones (0.75%). Both parties can realize and divide the difference of 0.25% using an IRS. To lock in the profit, each party borrows where it has a comparative advantage: A borrows fixed and B floating. B agrees to pay A a fixed 5.9 percent and A agrees to pay B the floating rate LIBOR plus 0.75 percent. A thereby obtains floating rate funding at LIBOR minus 0.15 percent and B obtains an advantage in fixed funding of 0.1 percent.

The first swap was designed in 1981 between the World Bank and IBM. The motivations of the two parties are shown in Figure 2.8: IBM received DM from its funding program and used this money to finance projects in the US. Therefore IBM needed to change USD into DM periodically to serve the coupon payments. Since the USD strengthened against the DM in that period, IBM made currency gains. To realize these gains IBM needed to get rid of its DM liabilities. The World Bank borrowed in the capital markets and lent to developing countries for project finance. The costs of the loans were the same as the financing costs of the World Bank in the markets. US interest rates were at 17 percent in this period, while in Germany and Switzerland they were 12% and 8%, respectively. The World Bank wanted to raise funds in low interest rate currencies but was constrained in borrowing in these countries. It needed to find another party which owed DM/CHF and wanted to exchange them against USD. An investment banker at Salomon Brothers realized that a currency swap would solve the problems of both parties. Salomon Brothers managed the trade on a back-to-back basis where IBM could change its DM liabilities into USD and the World Bank could buy DM at favorable rates: the World Bank lent IBM the notional amounts and coupons denominated in DM and received from IBM notional and coupons in USD.

After this trade, in a second period in the '80s and '90s, banks started to enter into own-name transactions, i.e. swap counterparties discussed their desired risk and return profiles directly with the bank as intermediary. Entering in between the two parties, the bank faced counterparty risk twice. Standardized documentation was also developed which allowed customized transactions to be processed efficiently: the ISDA agreements. The third period was characterized by the beginning of market making. Banks started to trade swaps with several counterparties. Market and counterparty risk increased due to these wider activities, and large investments in risk management followed. Market risk was often compensated with transactions in other markets; for example, due to their liquidity, government bonds were preferred.

[Figure: diagram of the swap cash flows between the bond market, the World Bank, IBM, the FX market and the World Bank's loan clients: the World Bank pays DM & CHF coupons and notional to IBM and receives USD coupons and notional, while IBM serves its DM & CHF bond market debt with the amounts received.]

Figure 2.8: Swap between the World Bank and IBM.

2.3.2.1 Swap Pricing


We fix the swap initiation date 0 and the maturity date T. The fixed rate at which a new swap can be executed is the constant par swap rate s_{0,T}(0). This rate by definition sets the value of the swap at initiation to zero, i.e.

PV_Swap(0, s_{0,T}(0)) = 0 ,

since at initiation no cash flows are exchanged. Fixed payments s_{0,T}(0) are made annually.³ Floating payments are made quarterly.⁴ Figure 2.9 shows the replication of a swap by a par fixed bond and a floating rate note (FRN). We prove below that the FRN must be worth par at each quarterly LIBOR reset date. Since the initial value of a swap is zero, the initial value of the fixed leg must also be worth par. We have

PV_{Swap, fixed}(0, s_{0,T}(0)) = PV_{Swap, floating}(0) .

Solving for the swap rate, using that the PV of the floating leg is 1 − p(0, T) times the notional, we get

s_{0,T}(0) = (1 − p(0, T))/A_{0,T}(0) = PV Floating/Annuity   (2.11)

³ We assume that consecutive payment dates are equidistant.
⁴ The floating payments equal act/360 times the 3m LIBOR rate fixed at the beginning of the quarter. This is called setting in advance and paying in arrears.

[Figure: a payer swap's floating and fixed cash flows at t0, ..., t3 are decomposed, by adding and subtracting a notional of 1 at the start and at maturity, into a short fixed coupon bond plus a long floating rate note (FRN).]

Figure 2.9: Graphical representation of a payer swap replication (the payer is the party which pays the fixed rate and receives the floating one). Dotted lines represent floating cash flows. Replication is obtained by virtually adding and subtracting notional amounts at the beginning and at the maturity of the swap. We assume for simplicity the same periodicity for the floating and the fixed leg. The figure shows an important property of risk structuring: to obtain the cash flow profile of a new product, one can stack existing cash flow profiles vertically.

where A_{0,T}(0) = Σ_{j=1}^{T} p(0, j) is the present value of an annuity (the 'level' of the swap) and p(0, t) is the price of a zero coupon bond. It remains to prove the PV claim for the floating leg.

Proposition 9. The PV of a floating rate note is equal to the notional.

Since a FRN is worth par, i.e. N·D(0, 0) = N, the proposition implies that the PV of the floating coupon stream of the swap equals

PV(Float) = (1 − p(0, T))N .


To prove the claim, set L_j = L(t_{j−1}, t_j) for the LIBOR rate fixed at t_{j−1} and paid at t_j. At time 0 only L_1 is known. To replicate the random cash flow L_j we need one unit of currency at time t_{j−1}, which can be invested at the rate L_j such that we obtain the payoff 1 + L_j at t_j: buy a zero coupon bond p(0, t_{j−1}) and sell a zero coupon bond p(0, t_j). The balance of the two bonds at t_j is L_j; replication is accomplished. Consider the next cash flow L_{j+1}: the bond p(0, t_j), which was short for the former cash flow, now enters as a long bond, so the two positions cancel. For the full series of cash flows, all replicating bonds cancel except the first long bond p(0, t_0) = 1 and the last short bond p(0, t_T). This proves the claim.
Consider a 2y FRN with a reset date every 6m, notional 1'000 and given spot rates. Setting the day count fraction to 1/2, we get the values in Table 2.9:

Maturity   Spot Rate   Forward Rate   Cash Flow FRN   PV FRN
0.5        2.45%       2.45%          12.25
1          2.62%       2.76%          13.78
1.5        2.80%       3.08%          15.40
2          3.00%       3.45%          1017.27
Total                                                 1000

Table 2.9: Valuation of a FRN. The forward rates are calculated using simple compounding: F(0, S, T) = [(1 + T·R(0, T))/(1 + S·R(0, S)) − 1]/(T − S). The FRN cash flows are derived from CF(T) = 1'000 × F(0, S, T)/2, plus the notional at maturity, and the PVs follow from PV(CF(T)) = CF(T)/(1 + T·R(0, T)).
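The telescoping structure of the forward cash flows implies that the FRN is worth exactly par; a short sketch with the spot rates of Table 2.9 and a notional of 1'000:

```python
# Reproduce Table 2.9: forward rates from simply compounded spot rates,
# then discount the FRN cash flows; the PV must equal the notional.
spots = {0.5: 0.0245, 1.0: 0.0262, 1.5: 0.0280, 2.0: 0.0300}
N = 1000.0
times = sorted(spots)

def fwd(S, T):                            # F(0,S,T) with simple compounding
    if S == 0.0:
        return spots[T]
    return ((1 + T * spots[T]) / (1 + S * spots[S]) - 1) / (T - S)

pv = 0.0
prev = 0.0
for T in times:
    cf = N * fwd(prev, T) / 2             # semi-annual day count fraction 1/2
    if T == times[-1]:
        cf += N                           # notional repaid at maturity
    pv += cf / (1 + T * spots[T])         # discount with the simple spot rate
    prev = T

assert abs(pv - N) < 0.01                 # a FRN is worth par
```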

We close with some OTC market figures. Figure 2.10 shows notional and gross amounts in OTC markets. Notional amounts are USD 600 tr, which is 8 times worldwide GDP. The gross amount is more than a factor of 10 smaller. The markets cover OTC foreign exchange, interest rate, equity, commodity and credit derivatives. The gross positive market value is the sum of the replacement values of all contracts that are in a current gain position to the reporter at current market prices, and similarly for the gross negative market value. The gross market value is the sum of the two absolute values. Gross means that there is no netting or offsetting. Gross market values supply information about the potential scale of market risk in derivatives transactions and are a measure of comparable economic significance across markets and products.

[Figure: time series of notional amounts and gross market values in global OTC derivative markets, 1998-2017, together with a pie chart of the shares of gross market values by derivative category.]

Figure 2.10: OTC market figures. The statistics on the country level are based on data reported every six months by dealers in 12 jurisdictions (Australia, Canada, France, Germany, Italy, Japan, the Netherlands, Spain, Sweden, Switzerland, the United Kingdom and the United States) plus data reported every three years by dealers in more than 30 additional jurisdictions. (Source: BIS, 2018)

2.3.3 Forward Rate Agreements

An IRS can be considered as a sequence of forward rate agreements (FRAs): a FRA is a swap with a single floating and a single fixed leg. Consider a client who would like to

obtain a loan of CHF 10 Mio. starting in 6m with a 6m maturity. The LIBOR spot rate L(6m, 6m) is fixed in 6m. The client believes that the 6m LIBOR starting in 6m will be higher than the present 6m LIBOR. He would like to freeze the loan terms at the current interest rate level, i.e. at the fixed forward LIBOR rate K = F(0, 6m, 6m): he wants to swap the floating rate L(6m, 6m) against the fixed rate F(0, 6m, 6m). A FRA contract achieves this client need:

• At initiation time 0 no cash flows are exchanged.

• At time 12m the client has to pay CHF −10(1 + L(6m, 6m)) Mio. without a FRA. But the client would like to pay CHF −10(1 + F(0, 6m, 6m)) Mio.

• The FRA contract pays/receives an amount A in 6m, worth A(1 + L(6m, 6m)) in 12m, such that A balances in 12m the unwanted risky payment without a FRA against the wanted fixed payment, i.e. A solves in 12m the equation:

A(1 + L(6m, 6m)) − 10(1 + L(6m, 6m)) = −10(1 + F(0, 6m, 6m)) ,

where the three terms are the balance, the payment without the FRA and the desired payment.

Solving for A and inserting the year fraction α = (hedging period)/360 and K = F(0, 6m, 6m), we get:

A = 10α(L(6m, 6m) − K)/(1 + αL(6m, 6m)) .
The no-arbitrage relation between spot rates R(s, t) and forward rates F(s, t, u),

(1 + R(0, 6m)·183/360)(1 + K·182/360) = 1 + R(0, 12m)·365/360 ,   (2.12)

implies

K = F(0, 6m, 6m) = 7.26% .
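Since the spot rates behind (2.12) are not listed in the text, the following sketch solves (2.12) for K with hypothetical 6m and 12m rates; the resulting K therefore differs from the 7.26% quoted above, but the no-arbitrage relation is restored exactly:

```python
# Solve (2.12) for the forward rate K; R6 and R12 are hypothetical inputs.
R6, R12 = 0.070, 0.072                    # hypothetical 6m and 12m spot rates
d1, d2, d3 = 183, 182, 365                # actual day counts of the two stubs

K = ((1 + R12 * d3 / 360) / (1 + R6 * d1 / 360) - 1) * 360 / d2

# plugging K back in restores the no-arbitrage relation (2.12)
lhs = (1 + R6 * d1 / 360) * (1 + K * d2 / 360)
rhs = 1 + R12 * d3 / 360
assert abs(lhs - rhs) < 1e-12
```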
Returning to swap pricing and using the no-arbitrage relationship between zero bonds and forward rates, we get

s_{0,T}(0) = Σ_{j=1}^{T} w_j L(0, T_{j−1}, T_j) ,   w_j = p(0, j)/A_{0,T}(0) .

The sum over all weights w_j equals 1. This shows that an IRS is a weighted sum of FRAs.

2.3.4 Constructing Discount Factors


We construct the discount function starting from the par swap rates ('stripping the curve', 'bootstrapping'), using observable par swap rates. We start with a 1y par swap rate s_{0,1}(0) and 6m LIBOR for the floating leg. Proposition 9 implies

N(1 − D(0, 1)) = s_{0,1}(0)D(0, 1)α_{0,1}N .



Solving for the first discount factor,

D(0, 1) = 1/(1 + s_{0,1}(0)α_{0,1}) .

To obtain D(0, 2) we consider a 2y swap with par swap rate s_{0,2}(0). From

N(1 − D(0, 2)) = s_{0,2}(0)(D(0, 1)α_{0,1} + D(0, 2)α_{1,2})N

we obtain D(0, 2) as a function of D(0, 1) (bootstrapping, curve stripping). Solving,

D(0, 2) = (1 − s_{0,2}(0)α_{0,1}D(0, 1))/(1 + s_{0,2}(0)α_{1,2}) .

An immediate recursion gives

D(0, n) = (1 − s_{0,n}(0) Σ_{i=1}^{n−1} α_{i−1,i}D(0, i))/(1 + s_{0,n}(0)α_{n−1,n}) .   (2.13)
So far, we assumed that the necessary input rates exist. What if there are holes, i.e. maturities for which no observable instrument exists? Then we have to interpolate. Such a construction should satisfy several requirements:

• Liquid mark-to-market. The value of a dollar at a future date should be determined by liquid securities. This minimizes the risk that cash flows are mispriced.

• Stability. The constructed term structures should be stable when switching from one representation to another: switching from a meaningful discount curve to a forward curve should again yield a meaningful forward curve.

• Smoothness. Curves should not be ragged unless a sound economic explanation exists.

• Consistency. Term structures estimated today should be consistent with the dynamics of interest rate models. More precisely: which parameterized families used to estimate the forward rate curve are consistent with arbitrage-free interest rate models? We do not consider this issue and refer to Filipovic (2009).

Consider the data for CHF in Table 2.10 using money and capital market instruments.

There are several methods to find a curve which interpolates the observed data. The first takes a full cash flow view. A vector of the n market instruments X (LIBOR, futures, swaps, etc.) is represented as

X = CD + ε

with C the cash flow matrix, D the vector of the N discount factors of the searched term structure, and ε an error term which allows one to treat bid/ask differences or outliers in prices statistically.

Period   SARON   Period   LIBOR %   Period   Swiss Gov. Bonds %
o/n      0.04    1m       0.13833   2y       0.605
                 3m       0.17      3y       0.811
                 6m       0.24      4y       1.029
                 12m      0.52      5y       1.229
                                    7y       1.538
                                    8y       1.651
                                    10y      1.818
                                    20y      2.162
                                    30y      2.278

Table 2.10: CHF interest rates as of January 2011. SARON (Swiss Average Rate Overnight) is an overnight interest rate average referencing the Swiss Franc interbank repo market. The data in the table are blended: if several possibilities exist to construct the table, the most convenient instruments are used to fill it out. Source: Swiss National Bank.

The term structure D is found by minimizing the quadratic distance between X and CD. This optimization has serious drawbacks. Since n << N, many possible solutions exist. Furthermore, since most cash flows appear only once, the matrix C has many zero entries and its inversion is highly sensitive to small variations of the input parameters. Nothing in this approach prevents a ragged term structure curve, similar to the case where one linearly interpolates between existing rates. Suppose the zero rate curve is constructed in this way. To derive the forward rates, essentially a derivative of the zero rate curve is taken.⁵ Errors and kinks in the linear approximation then lead to jagged forward rate curves. We therefore need higher order polynomials. Approaches are the so-called B-splines or cubic splines, smoothing splines and the exponential-polynomial approach (Nelson-Siegel, Svensson). The last approach, which is used by most central banks, estimates the forward curve by minimizing the distance to bond prices, where the estimated forward curve has the functional form

F = z_1 + (z_2 + z_3 T)e^{−z_4 T} .

The approach is flexible in choosing the degree of the polynomial terms and it is consistent with the dynamics of the chosen interest rate model. For T → ∞ the estimated curve converges to the constant z_1, which is interpreted as the long term rate. The term z_2 represents the short term rate, due to the exponential decay, and z_3 represents the intermediate-term part of the curve.

⁵ From the no-arbitrage relation 1 + f(0, S, S+∆) = p(0, S)/p(0, S+∆), forward rates involve the derivative of the discount curve: in the limit ∆ → 0 one has f(0, S) = −∂ log p(0, S)/∂S.
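The limiting behaviour of this functional form can be checked directly; the parameters z_1, ..., z_4 below are hypothetical:

```python
import math

# Shape of the exponential-polynomial forward curve F(T) = z1 + (z2 + z3*T)e^{-z4*T}:
# z1 is the long end, z1 + z2 the short end (hypothetical parameters).
z1, z2, z3, z4 = 0.04, -0.02, 0.03, 0.8

def F(T):
    return z1 + (z2 + z3 * T) * math.exp(-z4 * T)

assert abs(F(0.0) - (z1 + z2)) < 1e-12    # short rate: z1 + z2
assert abs(F(50.0) - z1) < 1e-8           # long end converges to z1
```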

We consider an interpolation example with a 3rd order polynomial, i.e. the spot rate R(0, t) is given by

R(0, t) = at³ + bt² + ct + d

with the constraint that the curve matches known rates at specific dates t_k:

• t1 = 1y, t2 = 2y, t3 = 3y, t4 = 4y with

• values 4, 4.5, 5, 5.3 percent, respectively.

Using cubic interpolation we search for the intermediate rate R(0, 2.5y), i.e.

R(0, 2.5) = a(2.5)³ + b(2.5)² + c(2.5) + d .

That the curve matches the 4 given rates is equivalent to the linear system Mx = y with x = (a, b, c, d)', y = (4%, 4.5%, 5%, 5.3%)' and

        (  1   1   1   1 )
M  =    (  8   4   2   1 )
        ( 27   9   3   1 )
        ( 64  16   4   1 )

where M has the powers of the time indices as entries. Using

        (  −1/6    1/2   −1/2    1/6 )
M⁻¹ =   (   3/2   −4      7/2   −1   )
        ( −13/3   19/2   −7     11/6 )
        (   4     −6      4     −1   )

we obtain

x = (−0.00033, 0.002, 0.00133, 0.037)

and hence the rate

R(0, 2.5) = −0.00033(2.5)³ + 0.002(2.5)² + 0.00133(2.5) + 0.037 = 4.762% .
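The computation can be reproduced by solving the linear system numerically; a plain Gaussian elimination is used here instead of the tabulated inverse:

```python
# Fit R(0,t) = a t^3 + b t^2 + c t + d through the four given rates and
# evaluate the interpolated rate at t = 2.5y (values from the example).
ts = [1.0, 2.0, 3.0, 4.0]
ys = [0.04, 0.045, 0.05, 0.053]

# build the augmented system M x = y with rows (t^3, t^2, t, 1 | y)
A = [[t ** 3, t ** 2, t, 1.0, y] for t, y in zip(ts, ys)]
n = 4
for i in range(n):                         # forward elimination
    p = max(range(i, n), key=lambda r: abs(A[r][i]))   # partial pivoting
    A[i], A[p] = A[p], A[i]
    for r in range(i + 1, n):
        f = A[r][i] / A[i][i]
        A[r] = [u - f * v for u, v in zip(A[r], A[i])]
x = [0.0] * n
for i in range(n - 1, -1, -1):             # back substitution
    x[i] = (A[i][n] - sum(A[i][j] * x[j] for j in range(i + 1, n))) / A[i][i]

a, b, c, d = x
R25 = a * 2.5 ** 3 + b * 2.5 ** 2 + c * 2.5 + d
assert abs(R25 - 0.047625) < 1e-9          # the 4.762% of the text
```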

We now apply the methods to swap pricing. Table 2.11 shows how different rates are derived from the given swap rates. We first recall:

• The forward curve is the function t → F(t, T).

• The zero or discount curve is the function T → p(0, T).

• The par swap rate curve is the vector of spot starting swap rates for all maturities.

Maturity Swap Rate Discount Factor Spot Rates Forward Rates


1y 4.50% 0.95638 4.5615% -
2y 4.95% 0.90647 5.0324% 5.55%
3y 5.39% 0.85158 5.5015% 6.57%
4y 5.57% 0.80151 5.6872% 6.28%
5y 5.68% 0.75409 5.8071% 6.31%

Table 2.11: To obtain the discount factors from the swap rates we use (2.13). To get the spot rates from the discount factors we use R(0, T) = (1/D(0, T))^{1/T} − 1, and the forward rates follow from the no-arbitrage relation (1 + F(0, S, T))^{T−S} = D(0, S)/D(0, T). The day-count factor reads act/360: 365/36'000 = 0.0101388 per percentage point.
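The bootstrap (2.13) applied to the swap rates of Table 2.11 reproduces the tabulated discount factors up to rounding; following the caption, the act/360 day count factor 365/360 is used for every annual period:

```python
# Bootstrap discount factors via (2.13) from the par swap rates of Table 2.11.
swap_rates = [0.0450, 0.0495, 0.0539, 0.0557, 0.0568]
alpha = 365.0 / 360.0                     # act/360 day count for each year

D = []
for s in swap_rates:
    # D(0,n) = (1 - s * alpha * sum of earlier D) / (1 + s * alpha)
    D.append((1 - s * alpha * sum(D)) / (1 + s * alpha))

# matches the discount factors of Table 2.11 up to small rounding effects
for d, table in zip(D, [0.95638, 0.90647, 0.85158, 0.80151, 0.75409]):
    assert abs(d - table) < 5e-4
```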

1y 2y 3y 4y 5y
Rates 4.5615% 5.5518% 6.5750% 6.2827% 6.3128%
Cash ows -2'280'743 -2'775'921 -3'287'486 -3'141'361 -3'156'409
PV of cash ows -2'181'246 -2'516'291 -2'799'551 -2'517'843 -2'380'228

Table 2.12: Floating leg pricing. Up to 1y spot rates are used; for longer maturities forward rates apply.

             1y        2y        3y        4y        5y
Fixed rate   1%        1%        1%        1%        1%
Cash flows   500'000   500'000   500'000   500'000   500'000
PV           478'188   453'235   425'789   400'757   377'047

Table 2.13: Fixed leg pricing with an ad hoc fixed rate of 1%.



Using these rates we price a 5y swap with a notional of 50 Mio. in a given currency. Table 2.12 summarizes the floating leg pricing. The PV of the floating leg (see Proposition 9) is

−12'395'159 ≈ −50'000'000 × (1 − 0.75409) .

We price the fixed leg using 1% as an ad hoc fixed rate. The result is given in Table 2.13. The PV using the 1% fixed rate is 2'135'015, hence the fixed swap rate s follows as

s = −PV_Floating(0)/PV_fix at 1%(0) × 1% = 5.806% .
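The swap rate can be reproduced directly from the discounted cash flows of Tables 2.12 and 2.13:

```python
# Par swap rate of the 5y, 50 Mio. swap: ratio of the floating leg PV to the
# PV of the 1% fixed leg, scaled by the 1% test rate (Tables 2.12 and 2.13).
pv_float = 2_181_246 + 2_516_291 + 2_799_551 + 2_517_843 + 2_380_228
pv_fixed_1pct = 478_188 + 453_235 + 425_789 + 400_757 + 377_047

s = pv_float / pv_fixed_1pct * 0.01       # scale the 1% test coupon
assert abs(s - 0.05806) < 1e-4            # the 5.806% of the text
```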

We now apply this to the pricing of the IBM-World Bank swap. IBM had to make the following payments:

• CHF 12.375 Mio. p.a. on Mar 30 from 1982 to 1986, plus repayment of the principal of CHF 200 Mio. at maturity.

• DM 30 Mio. p.a. on the same dates, principal DM 300 Mio.

IBM wanted to receive these payments from the World Bank. What are the equivalent USD payments from IBM to the World Bank? We need the present value of the foreign currency payments the World Bank promised to make. We assume flat term structures with the interest rates of the transaction agreement date, Aug 11, 1981: 8 percent (CHF) and 11 percent (DM). The settlement date of the swap was Aug 25, 1981. We have for the PV of the fixed CHF payments

PV_fix, CHF in CHF = 191'367'478 = 12'375'000/(1.08)^x + Σ_{i=1}^{3} 12'375'000/(1.08)^{i+x} + 212'375'000/(1.08)^{4+x}

as of Aug 25, 1981. Here x corresponds to 215 days instead of 360 days, since the first payment was due on Mar 30, 1982; the subsequent years count in full. The terms of the swap were agreed two weeks before, i.e. on Aug 11. Using the 2w forward contract for Aug 25 for the conversion, where 1 USD was worth CHF 2.18, we get

PV_fix, CHF in USD = 87'783'247

as of Aug 25, 1981. With the same procedure one gets

PV_fix, DM in USD = 117'703'153

as of Aug 25, 1981 (with an FX rate of 2.56). Adding the two fixed PVs, the total USD payments of the World Bank were worth Z = 205'486'400. To borrow a present value of USD Z with a payment schedule matching that of the IBM debt, the World Bank had to issue debt and pay commissions and expenses of 2.15%. Consequently, it could issue debt at par for USD 210'000'000 with a coupon of 16 percent: the World Bank would receive 97.85% of USD 210'000'000, which amounts approximately to Z.
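A short sketch reproduces the PV calculation; small differences from the quoted figures remain, which we attribute to the exact day-count treatment of the original computation:

```python
# PV of the World Bank's CHF and DM payment streams as of Aug 25, 1981,
# with a first stub of x = 215/360 years and flat rates 8% (CHF), 11% (DM).
x = 215.0 / 360.0

def pv(coupon, principal, rate):
    v = sum(coupon / (1 + rate) ** (i + x) for i in range(4))   # 1982-1985
    return v + (coupon + principal) / (1 + rate) ** (4 + x)     # 1986

pv_chf_usd = pv(12_375_000, 200_000_000, 0.08) / 2.18   # 2w forward FX rate
pv_dm_usd = pv(30_000_000, 300_000_000, 0.11) / 2.56

Z = pv_chf_usd + pv_dm_usd
# day-count nuances aside, this recovers the USD 205'486'400 of the text
assert abs(Z - 205_486_400) < 25_000
```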

2.3.5 Return Bookkeeping


We always consider a finite economy: a finite number of dates 0, 1, 2, ..., T; a risk-less asset S_0, normalized to S_0(0) = 1; N risky investment opportunities S_j(t) ≥ 0 in all future states with known prices S_j(0) ≥ 0, j = 1, ..., N; and no frictions (taxes, spreads, etc.). The state space can be a tree or a continuum at each date.

The value or wealth process, position times price, is defined as follows (we often neglect the superscript ψ):

V^ψ(t) = ψ_0 S_0(t) + Σ_{j=1}^{N} ψ_j S_j(t) =: ⟨ψ(t), S(t)⟩ ,   (2.14)

with ψ_0 the amount invested in the risk-free asset, ψ_j the number of units of the risky security j held in the period [t, t+1) and ⟨ψ, S⟩ the scalar product. The vector ψ(t) is a portfolio or a strategy.

Definition 10. A normalized portfolio φ at time t is defined by

φ_0(t) = ψ_0(t)S_0(t)/V^ψ(t) ,   φ_k(t) = ψ_k(t)S_k(t)/V^ψ(t) ,   k = 1, ..., N .   (2.15)
If all positions are positive, i.e. a long-only portfolio, and there is no leverage, then the normalized weights are positive and add up to one: they are probabilities. Asset managers are interested in self-financing portfolios, i.e. the change in the value process in any period is due only to changes in asset values and not to an external money in- or outflow. Writing ∆X_t := X_t − X_{t−1}, the change in portfolio value V = ψS reads, with the product rule,

∆V_t = (∆ψ_t)S_t + ψ_t ∆S_t .

The first term on the RHS captures a change in portfolio value between two dates due to external money added or withdrawn. Self-financing rules out such strategies. The following portfolio accounting properties are immediate to prove:

Proposition 11. 1. The normalized portfolio components without leverage add up to 1.

2. The return of a portfolio is equal to the weighted sum of the portfolio constituents' returns:

R_φ = Σ_{j=1}^{N} φ_j R_j =: ⟨φ, R⟩ .   (2.16)

3. If the portfolio is self-financing, then

V(t) = V(0) + Σ_{s=1}^{t} Σ_{j=1}^{N} ψ_j(s)∆S_j(s) .

The simple return of a portfolio is invariant to the size of the portfolio: scaling the portfolio value by a factor, the factor cancels in the return calculation. Hence, without loss of generality, we set V(0) = 1.

The proposition implies for the growth rate of wealth R^φ_{[0,t]} from 0 to t:

1 + R^φ_{[0,t]} := V_t/V_0   (2.17)
             = (1 + R^φ(t))(1 + R^φ(t−1)) ··· (1 + R^φ(1))
             = (1 + ⟨φ, R(t)⟩)(1 + ⟨φ, R(t−1)⟩) ··· (1 + ⟨φ, R(1)⟩) ,   (2.18)

i.e.

V_t = V_0 ∏_{s=1}^{t} (1 + ⟨φ, R(s)⟩) .

Wealth grows at a geometric rate, not an arithmetic one.

2.3.6 Returns and Rebalancing


Literature for this section: Hallerbach (2014), Blitz (2015), Hayley (2015), White (2015), Pal and Wong (2013) and Quian (2014). We first define two basic investment strategies:

Definition 12. • ψ(t) is a buy-and-hold or static portfolio if ψ(t) = ψ(0) for all t ≥ 0.

• ψ(t) is a constant rebalanced portfolio if ψ_j(t−)S_j(t) = c_j for all positions j and all t, with c given. Here t− denotes a prior time arbitrarily close to t at which the period's asset values are realized; the portfolio weight ψ_j(t−1) chosen at t−1 is then changed to ψ_j(t) such that the position value equals the predefined value c_j.

The market portfolio is a buy-and-hold portfolio. Rebalancing to constant wealth levels keeps a constant dollar mix in the positions but not a constant risk mix. We only consider this type of rebalancing. The proportion of capital in stock k just before rebalancing is given by

ψ_k(t+1)− = ψ_k(t)(1 + R_k(t+1)) / Σ_{j=1}^{N} ψ_j(t)(1 + R_j(t+1)) .

The weights ψ_k(t+1)− are called the drifted weights. In a buy-and-hold portfolio the drifted weights are the portfolio weights at each date, since no rebalancing takes place.

Given that wealth grows at a geometric rate, we state:

Proposition 13. The geometric return GM_BH of a BH long-only portfolio over T periods for N assets is given by

(1 + GM_BH)^T = Σ_{j=1}^{N} φ_j(1 + g_j)^T

where Σ_j φ_j = 1 and (1 + g_j)^T = ∏_{k=1}^{T} (1 + r_{j,k}).

The geometric return GM_RB of a fixed-weight RB portfolio over T periods for N assets is given by

(1 + GM_RB)^T = ∏_{k=1}^{T} (1 + Σ_{j=1}^{N} φ_j r_{j,k})

where Σ_j φ_j r_{j,n} = r_{RB,n} is the fixed-weight return of the portfolio in period n.

Comparing BH and RB strategies faces two difficulties. First, the expressions are of a different algebraic form and one needs a method to make them comparable. Second, for RB strategies volatility matters. The first problem can be approached by comparing the two geometric returns with a third, fictitious return Ḡ; it is fictitious since it is not the geometric return of any actual portfolio. The second problem is solved by using the volatility drag relationship between geometric and arithmetic returns, see below.

We consider a portfolio value V which consists of two assets S and B, where at each date the weight of the S-asset is 60% of the total portfolio value. If φ represents the number of shares of S in the portfolio and ψ those of B, we have at time 0 (abusing our notation):

V_0 = φ_0 S_0 + ψ_0 B_0 = 0.6V_0 + 0.4V_0 .

To achieve the weights, the investor has to buy φ_0 = 0.6 V_0/S_0 units of asset S at time 0, and similarly for asset B. After one time step the absolute portfolio value before rebalancing reads

V_1 = φ_0 S_1 + ψ_0 B_1 ,   where in general φ_0 S_1 ≠ 0.6V_1 ,

since the change in portfolio value is entirely due to changes in asset values and not to changing the positions (self-financing investment strategy). The required weights are then restored by rebalancing. It follows that the weight of the asset whose price increased is reduced, and vice versa for the other asset.

Generalizing the framework to multiple periods, the rebalanced position in asset S at time k reads

φ_k = 0.6 (V_0/S_0) ∏_{j=1}^{k} (1 + R_j^V) / ∏_{j=1}^{k} (1 + R_j^S) ,   (2.19)
2.3. RETURNS AND PERFORMANCE ATTRIBUTION 59

where R_j^S is the one-period simple return of asset S and R_j^V the one-period portfolio return. A similar result holds for the other asset. The formula shows that if asset S returns less than asset B, and hence less than the portfolio, more and more units of S are bought. Rebalancing thus implements an implicit buy-low-sell-high mechanism.
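A small two-asset simulation with hypothetical period returns illustrates the drifted weights and the 60/40 rebalancing mechanics of the example:

```python
# Buy-and-hold vs. 60/40 rebalancing over hypothetical period returns.
rS = [0.10, -0.08, 0.06, -0.04]           # asset S period returns
rB = [0.01, 0.01, 0.01, 0.01]             # asset B period returns

V_bh_S, V_bh_B = 0.6, 0.4                 # buy-and-hold position values
V_rb = 1.0                                # rebalanced portfolio value
for rs, rb in zip(rS, rB):
    V_bh_S *= 1 + rs                      # BH positions just drift
    V_bh_B *= 1 + rb
    V_rb *= 1 + 0.6 * rs + 0.4 * rb       # weights reset to 60/40 each period

V_bh = V_bh_S + V_bh_B
drifted_S = V_bh_S / V_bh                 # drifted weight of S in BH portfolio
assert drifted_S < 0.6                    # S underperformed: its weight drifted down
```

With these oscillating returns the rebalanced portfolio ends above the buy-and-hold one, an instance of the buy-low-sell-high mechanism; this ordering is specific to the chosen return paths, not a general law.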

Consider an investment problem of the mean-variance type, i.e. the optimal policy follows, as in the Markowitz model (see below), from solving an optimization problem. Expected return and variance are inputs to this model. The expected return input for a single asset is typically the arithmetic mean return over T periods. The expected return of the portfolio then becomes the arithmetic mean of the returns of a portfolio which is rebalanced in each period to the mix specified by the optimal investment policy. Such an analysis of the Markowitz model implies that the long term return of an asset is given by the arithmetic mean, but we have seen that the effective growth rate is geometric.

Since the arithmetic mean of any return series is always greater than or equal to the geometric mean, the return predicted by the Markowitz analysis is always greater than the true long term return that would have been obtained using the actual rebalanced allocation. The next proposition makes this precise.

Proposition 14. For positive numbers x_i,

(1/N) Σ_{i=1}^{N} x_i ≥ (∏_{i=1}^{N} x_i)^{1/N} .   (2.20)

Equality holds only if all x_i are the same.

If one instead inputs the geometric asset returns, a non-optimal situation also occurs, since this always underestimates the true return of the rebalanced portfolio: the volatility drag.

We write GM for the geometric mean and AM for the arithmetic mean. GM over T periods reads:

GM = (∏_{k=1}^{T} (1 + R_k))^{1/T} − 1 .   (2.21)

Taking logarithms,

log(1 + GM) = (1/T) Σ_{k=1}^{T} log(1 + R_k)

with R_k the return of the portfolio between times k−1 and k. Writing µ for the expected mean return of the portfolio, we get:

E(log(1 + GM)) = (1/T) ∑_{k=1}^{T} E(log(1 + R_k))
             = (1/T) ∑_{k=1}^{T} [ log(1 + µ) + E( (R_k − µ)/(1 + µ) ) − E( (R_k − µ)² / (2(1 + µ)²) ) + o(µ) ]
             = log(1 + µ) + 0 − σ² / (2(1 + µ)²) + o(µ) .

If µ is small, then log(1 + µ) = µ + o(µ); using the Neumann series in the portfolio volatility term and approximating the log in GM implies the volatility drag equation:

E(GM) = µ − σ²/2 + o(µ) = E(AM) − σ²/2 + o(µ) .    (2.22)
The equation also holds in the riskless case and for individual assets.
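The volatility drag approximation (2.22) can be checked numerically. The sketch below simulates IID normal returns (an assumption made here purely for illustration; the moments are invented) and compares the geometric mean with AM − σ²/2:

```python
import numpy as np

# Simulate a long IID return series and compare the geometric mean
# with the volatility-drag approximation E(GM) ≈ AM − σ²/2.
rng = np.random.default_rng(42)
mu, sigma, T = 0.06, 0.15, 200_000   # hypothetical moments
returns = rng.normal(mu, sigma, T)

am = returns.mean()                          # arithmetic mean AM
gm = np.exp(np.log1p(returns).mean()) - 1    # geometric mean GM
drag_approx = am - 0.5 * sigma**2            # AM minus volatility drag

print(f"AM = {am:.4%}  GM = {gm:.4%}  AM - sigma^2/2 = {drag_approx:.4%}")
```

The geometric mean falls below the arithmetic mean by roughly half the variance; the residual gap is the o(µ) error of the approximation.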

2.3.6.1 Why Should Investors Rebalance?


The first reason arises from optimal investment models such as the Merton model. The general expression does not imply that one has to rebalance, but that it is not optimal to keep portfolio weights constant over time. In these models, the optimal investment weights in each period satisfy

φ(t) = MPR(t) × RRA⁻¹ + (1 − RRA⁻¹) ∆Y(t) × RIRA⁻¹    (2.23)

where:

• The Market Price of Risk (MPR): MPR = (α_t − r_t)/σ_t². MPR is proportional to the Sharpe ratio:

Definition 15. The Sharpe ratio SR is defined by

SR(R) = E(R)⁺ / σ(R) ≥ 0    (2.24)

with R a general return (absolute, relative, net, gross) and A⁺ = max(A, 0).

Often the Sharpe ratio is not constrained to be positive. But the ratio is not very meaningful for negative values, since for a fixed negative return, higher risk then implies a higher Sharpe ratio. Assuming log-normal returns, the square-root scaling rule implies that the Sharpe ratio scales with √T for an increasing time horizon, while the MPR is time-scale invariant. While conceptually simple, there are many different interpretations and calculation methods for the Sharpe ratio: Should one use linear or log returns? How do we scale the Sharpe ratio properly from one time horizon to another? What are the industry standards in the calculation of the ratio? The widely used square-root scaling rule only holds in the IID case, see the Section Risk Scaling. For non-IID returns the situation is more complex and Lo (2003) is the reference to follow.

• RRA⁻¹, the inverse relative risk aversion - the investor's risk tolerance. RIRA⁻¹, the inverse relative innovation risk aversion.

• ∆Y is the hedge of the changing opportunity set for the investor.

Hence, changing alphas, benchmark returns, or investment opportunities all impact the weights of the optimal portfolio. We consider this formula in Section 4.7.1.

A second reason is volatility harvesting. Volatility creates alpha-generating rebalancing opportunities for any portfolio. That volatility impacts the growth of wealth follows from the mathematical fact that, for small returns, growth in wealth (which is a compounded figure) approximately equals the arithmetic average return over the investment horizon less half the variance of the return - the volatility drag of (2.22). The stated relationship holds independently of the portfolio weights, and since volatility enters the return relationship, wealth growth is path dependent: a strategy which reduces average volatility reduces the drag and thus increases wealth growth. This means that strategies which take volatility into account, such as rebalancing strategies, are expected to outperform pure buy-and-hold or equal market weighted strategies.

2.3.6.2 Rebalancing Examples


Can we characterize the properties of the rebalancing strategy? What can be said about its performance? We consider these issues in Section 2.3.6.3. We illustrate rebalancing for the following indices: Swiss Market Index (SMI), MSCI World UCITS ETF (MXWO), JPM Global Aggregate Bond Index (SZG2TR), an equal-weighted index of 1,600 hedge funds (JAGGUSD), FTSE NAREIT All Equity REITs Index (BCOMT), the gold dollar price (XAU Currency) and the S&P 500 Index (SPX). We consider weekly data from Sept 2, 1993 to Mar 27, 2015. Weekly rebalancing is too frequent from a transaction cost point of view, but the analysis is for illustration purposes, see Figure 2.11.
The top left panel shows the index or price evolution. The dot-com and GFC crises are visible. The bottom left panel shows the rebalancing strategy: basically, winners are sold and losers are bought. This panel is a mirror image of the price chart. The panels on the right-hand side show the performance of different investment strategies. On the top right, the rebalanced strategy and the equal-weighted buy-and-hold strategy are shown. Both strategies fail to provide protection if markets are under stress, although the rebalancing strategy suffers a lower shortfall. But it also cuts the upside potential, which leads to overall underperformance. The red line assumes transaction costs of 10 bps per rebalancing.
[Figure 2.11 consists of four panels, weekly data 03.09.1993-03.09.2014: top left '7 Indices Prices'; bottom left 'Rebalanced Strategies of 7 Indices'; top right 'Performance of Investment Strategies' (Rebalanced to EW, Equal Weighted Buy-and-Hold, Average TX Costs); bottom right 'Performance of Investment Strategies' (Rebalanced to EW, IV, Momentum).]

Figure 2.11: Rebalancing example for the Swiss Market Index (SMI), MSCI World UCITS ETF (MXWO), JPM Global Aggregate Bond Index (SZG2TR), an equal-weighted index of 1,600 hedge funds (JAGGUSD), FTSE NAREIT All Equity REITs Index (BCOMT), the gold dollar price (XAU Currency) and the S&P 500 Index (SPX).

In the lower right panel, the rebalanced-to-EW strategy is compared with the inverse volatility (IV) strategy and a momentum strategy. In the IV strategy, the rebalancing update is adjusted by the past volatility of the indices - the more volatile an index was, the less weight it gets in the next period (negative leverage). With this strategy the large market stress periods are neutralized, but the strategy also annihilates the growth potential. In the momentum approach, strategies are updated according to whether they belonged to the winners or losers over the past month. More precisely, the average last-month return over all strategies is calculated and each strategy is compared to this average: if its performance is higher (lower) than the average, the strategy is a winner (loser) and its rebalancing weight is updated by adding (subtracting) a constant number. The strategy is long only, which is atypical for momentum strategies, which are usually implemented as long-short portfolios (buy the winners, sell the losers). The momentum strategy shows a boost and a crash before and during the GFC; these two effects are typically reinforced in a long-short set-up.

2.3.6.3 Rebalancing = Short Volatility Strategy


A short volatility strategy means that investors sell out-of-the-money call and put options. Since the price of an option is in a 1:1 relation with volatility, shorting an option is the same as shorting volatility. Therefore, 'Rebalancing = Short Volatility Strategy' means that the investor is effectively selling rewarded options by rebalancing, which leads to the additional growth rate. We follow Ang (2013).

Consider a single risky asset S and a risk-free bond that pays 10 percent each period
in a two-period binomial model. The stock starts with a value of 1 and can go up or
down in each period with the same probability of 50 percent (see the data in Figure
2.12). If an up state is realized, the stock value doubles; otherwise the stock loses half of
its value.

Figure 2.12: Rebalancing as a short volatility strategy in a binomial tree model (panels: stock dynamics, buy-and-hold wealth dynamics, rebalancing wealth dynamics). Left are the risky asset's dynamics, in the middle are the wealth values if a buy-and-hold strategy (60/40) is used, and right are the wealth levels for a rebalancing strategy to fixed (60/40) weights. Note that up and down is the same as down and up. Therefore, there are two paths for the stock value after period 2, both with the result of 1 (Ang [2013]).

Using these assumptions, wealth projections for the buy-and-hold strategy follow at once. The value in the node 'up - up' - that is, 2.884 - follows from

2.884 = 1.64 (0.7317 × 2 + 0.2683 × 1.1),

where 1.64 is the wealth level of the former period node; 2 and 1.1 are the gross returns of the risky asset (up) and the risk-free asset, respectively; and 0.7317 = 0.6 × 2/1.64 is the holding in equity after the first period. The rebalancing dynamics are calculated in the same way but with fixed proportions in the two assets.

The payoffs after period 2 show that rebalancing adds more value to the sideways paths but less value to the extremes (up - up or down - down) compared to the buy-and-hold strategy. This transforms the linear payoff of buy-and-hold - that is, a payoff which is a linear function of the stock value - in a non-linear way. Precisely, consider a European call option with a strike of 3.676 at time 2 and a European put option with a strike of 0.466. The option prices at dates 0 and 1 follow from no-arbitrage pricing.

Consider the following two strategies:

• A rebalancing strategy.

• A short call + short put + long bond + long buy-and-hold strategy. The first two positions are the short volatility strategy.

A calculation - see Ang (2013) - shows that:

• Both strategies start with the same value 1 at time 0.

• Both strategies attain the same values in all 3 states at time 2.

Therefore the two strategies are identical. This shows that a short volatility strategy, financed by bonds and the buy-and-hold strategy, is the same as a rebalancing strategy. Since rebalancing means being short volatility, the investor automatically earns the volatility risk premium. The short volatility strategy makes the payoff in the center of the probability distribution larger at the cost of the extreme payoffs. Short volatility or rebalancing underperforms buy-and-hold strategies if markets are either booming or crashing, but performs well if markets show time reversals.

2.3.7 Stochastic Portfolio Theory (SPT)


Stochastic portfolio theory (SPT) was introduced by Fernholz (2002). Its goal is to construct investment strategies that outperform a certain reference portfolio such as the market portfolio. We work in discrete time and follow Pal and Wong (2016). We have seen in the last sections that rebalancing can, but need not, outperform a benchmark return. SPT makes precise under which conditions such an outperformance is possible. The conditions are variants of the Fernholz master equation. The original equation depends on the assumed dynamics of the underlying asset prices. Pal and Wong showed in the discrete time set-up that the master equation does not depend on the asset dynamics: it allows a return decomposition for each path without relying on the stochastic law of the assets. Recently, Schied et al. (2018) showed that a dynamics-independent master equation is also possible in continuous time.

Since rebalancing and volatility pumping are key, we consider a classic example with two assets. Asset 1 earns a −50% return in all odd periods and a 100% return in all even periods. Asset 2 is a risk-free asset whose return is always 0%. Table 2.14 shows the portfolio values for a buy-and-hold (BH) portfolio and a portfolio rebalanced (RB) to equal weights in each period. Initial wealth is USD 1.

Period | BH 1 | BH 2 | RB 1+2 | RB 1  | RB 2
0      | 1    | 1    | 1      | 0.5   | 0.5
1−     | 0.5  | 1    | 0.75   | 0.25  | 0.5
1+     | 0.5  | 1    | 0.75   | 0.375 | 0.375
2      | 1    | 1    | 1.125  | 0.75  | 0.375

Table 2.14: Buy-and-hold versus equal-weight rebalanced portfolios. 1− denotes the time after the asset price is realized but before any rebalancing, and 1+ the portfolio and position values after rebalancing.

Investing the BH dollar in either of the two assets leads to zero growth of wealth. Rebalancing to equal weights in each period leads to a growth factor of 0.75 × 1.5 = 1.125 per pair of periods. This dominates all BH strategies. Hence, systematic rebalancing is capable of capturing profit from volatility even when the underlying assets experience zero growth. If we extend the model to many periods, a sequence of alternating return products 0.75 × 1.5 determines the excess return of RB relative to BH. The order of the growth factors does not matter, only how many pairs of 0.75 × 1.5 = 1.125 can be formed - the number of matchings. If we can form N pairs, growth is boosted by the factor (1.125)^N.

To formalize the discussion, consider a binomial tree with two risky asset prices S₁, S₂. Asset S₂ serves as the numeraire asset replacing the riskless cash asset in the example above. Then

X(t) = log( S₁(t)/S₂(t) )

is a measure of relative price.⁶ We set ∆X(t) := ±σ, where σ² is the instantaneous variance of relative prices. For a strategy φ = (φ₁, φ₂),

W^φ(t) := V^φ(t) / ( S₂(t)/S₂(0) )

is the value of the portfolio φ relative to asset 2 (for the buy-and-hold portfolio of both assets this equals [(S₁(t) + S₂(t))/(S₁(0) + S₂(0))] × [S₂(0)/S₂(t)]), and it satisfies the dynamics (using the same arguments as for (2.29)):

W^φ(t + 1) / W^φ(t) = 1 + φ₁(t) ( e^{∆X(t)} − 1 ) =: A(t) .

⁶ The log follows from the standard representation of up and down moves in the binomial tree.

Iterating this equation implies

W^φ(t) = ∏_{s=0}^{t−1} A(s) .

Assume that φ is constant. Then a volatility-matching pair of up and down moves generates a growth factor larger than 1, i.e. A(s)A(s − 1) > 1; it is maximal for the equal-weighted portfolio φ₁ = 0.5. The recursive form of wealth implies that the wealth growth of the constant-weighted portfolio dominates the benchmark growth rate of asset 2 if the number of matching pairs dominates the unmatched moves in the price path. In a perfect zig-zag price path all moves match; in a monotone increasing or decreasing path there is no matching at all, and volatility harvesting is a loser strategy.

We use these ideas to develop the formal set-up of SPT, starting with some notation. With Xⱼ(t) the market capitalization at time t of asset j, the relative weights

µⱼ(t) := Xⱼ(t) / ∑_{k=1}^{N} X_k(t)    (2.25)

define the market weight of asset j. Investing in each period according to the market weights, the investment portfolio is the market portfolio. The temporal update of the market weights, if only asset returns lead to capital changes, reads

µⱼ(t + 1) = µⱼ(t)(1 + Rⱼ(t + 1)) / ∑_{k=1}^{N} µ_k(t)(1 + R_k(t + 1)) ,    (2.26)

and a calculation shows for the market portfolio value:

V^µ(t) = ∑_j Xⱼ(t) / ∑_j Xⱼ(0) .    (2.27)

Let V^µ be the market portfolio value and V^φ any other portfolio value. The above relative performance considerations lead to defining the relative portfolio

V^{φ/µ}(t) := V^φ(t) / V^µ(t) .    (2.28)

The time evolution of the relative portfolio depends only on the market weights: for all t > 0,

V^{φ/µ}(t + 1) / V^{φ/µ}(t) = ∑_{k=1}^{N} φ_k(t) µ_k(t + 1)/µ_k(t)    (2.29)
and V^{φ/µ}(0) = 1. The formula follows at once by writing out the dynamics of the two portfolios. Considering log returns, we write

∆ log V^{φ/µ}(t) = log( ∑_{k=1}^{N} φ_k(t) µ_k(t + 1)/µ_k(t) )    (2.30)
  = ∑_{k=1}^{N} φ_k(t) log( µ_k(t + 1)/µ_k(t) ) + [ log( ∑_{k=1}^{N} φ_k(t) µ_k(t + 1)/µ_k(t) ) − ∑_{k=1}^{N} φ_k(t) log( µ_k(t + 1)/µ_k(t) ) ]
  =: ∑_{k=1}^{N} φ_k(t) log( µ_k(t + 1)/µ_k(t) ) + γ^{φ/µ}(t) .

The expression γ^{φ/µ} is always non-negative by Jensen's inequality, and it is strictly positive if Y := log( µ_k(t + 1)/µ_k(t) ) is not constant across k. If there is temporal volatility, gamma is strictly positive and generates an excess growth.⁷ We write Γ^{φ/µ}(t) = ∑_{s=0}^{t} γ^{φ/µ}(s) for the cumulated excess growth rate. Since γ is strictly positive if there is volatility, Γ grows to infinity as time goes to infinity - the energy term.
The second transformation for the log return in (2.30) is to rewrite ∑_{k=1}^{N} φ_k(t) log( µ_k(t + 1)/µ_k(t) ) in terms of relative entropy as follows:

∑_{k=1}^{N} φ_k(t) log( µ_k(t + 1)/µ_k(t) ) = ∑_{k=1}^{N} φ_k(t) log( φ_k(t)/µ_k(t) ) − ∑_{k=1}^{N} φ_k(t) log( φ_k(t)/µ_k(t + 1) ) .

Using the relative entropy notation S(p, q) = ∑_j p_j log(p_j/q_j) for two probability distributions we can write

∑_{k=1}^{N} φ_k(t) log( µ_k(t + 1)/µ_k(t) ) = S(φ(t), µ(t)) − S(φ(t), µ(t + 1)) .

Summarizing, the relative log return of any strategy can be decomposed into:

∆ log V^{φ/µ}(t) = γ^{φ/µ}(t) + S(φ(t), µ(t)) − S(φ(t), µ(t + 1)) .    (2.31)

If rebalancing takes place in each period to constant weights, then φ is constant and summing (2.31) over time yields the decomposition:

log V^{φ/µ}(t) = Γ^{φ/µ}(t − 1) + S(φ, µ(0)) − S(φ, µ(t)) .    (2.32)
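The energy-entropy decomposition of the cumulated relative log return can be verified pathwise in code. The sketch below draws hypothetical market-weight paths (the Dirichlet distribution is purely an illustrative assumption) and checks that, for constant weights, the cumulated log return equals the cumulated Jensen gap plus the change in relative entropy:

```python
import numpy as np

def rel_entropy(p, q):
    """Relative entropy S(p, q) = sum_j p_j log(p_j / q_j)."""
    return float(np.sum(p * np.log(p / q)))

rng = np.random.default_rng(0)
N, T = 4, 50
phi = np.full(N, 1.0 / N)                   # constant equal weights
mu = rng.dirichlet(np.ones(N), size=T + 1)  # simulated market weights

log_V = 0.0   # cumulated log relative portfolio value
Gamma = 0.0   # cumulated excess growth rate
for t in range(T):
    growth = mu[t + 1] / mu[t]
    d_log_V = np.log(phi @ growth)           # one-step relative log return
    gamma = d_log_V - phi @ np.log(growth)   # Jensen gap, >= 0 pathwise
    log_V += d_log_V
    Gamma += gamma

# Energy-entropy decomposition of the cumulated log return:
decomposed = Gamma + rel_entropy(phi, mu[0]) - rel_entropy(phi, mu[T])
print(abs(log_V - decomposed))   # agrees up to floating point
```

The identity holds path by path, with no distributional assumption on the weights; the Dirichlet draws only provide sample paths.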

⁷ To understand this, γ^{φ/µ} = log E_φ( e^{Y − E_φ(Y)} ) ≥ 0 follows by using the definition of Y and of the log, with E_φ the φ-weighted average over k. A Taylor approximation shows that Gamma is proportional to an excess growth rate.

Figure 2.13 shows the decomposition of a log portfolio value, rebalanced to constant weights, into its energy and entropy paths.

Figure 2.13: Decomposition of a constant-weight rebalanced portfolio into the energy (Log V) and entropy paths.

The cumulative excess growth rate Gamma measures the amount of market volatility captured by the portfolio. This corresponds to the number of matched pairs in the introductory example. The relative entropy term measures how much the relative performance deviates from Gamma. Since this term depends only on the initial and current positions of the market weight vector - i.e. how the change in the capital distribution affects the performance of the portfolio - it excludes the effect of volatility. The fluctuations of the log return are dominated in the short run by the entropy part, and long-term growth comes from the cumulated excess growth rate.

For constant-weighted portfolios the following theorem characterizes log return growth.

Theorem 16. Consider a constant-weighted strategy φ. Assume that the market weights µ(t) are, for all t, elements of a compact set K and that Γ(t) increases to infinity as t tends to infinity. Then the relative portfolio value V^{φ/µ}(t) also tends to infinity.

The theorem is a pathwise statement and completely free of any stochastic modeling assumptions. Long-term outperformance follows whenever the two path properties are satisfied, and their validity can be evaluated by a portfolio manager at each date. Pal and Wong extend the discussion to non-constant rebalancing strategies.


2.3.8 Return Attribution


We consider the Arithmetical Relative Return (ARR). It is defined as the difference between a portfolio return R^V and a benchmark return R^b:

ARR = R^V − R^b = ∑_j ( φ_j R_j^V − b_j R_j^b )    (2.33)

with b the benchmark portfolio weights. Figure 2.14 shows how this return difference - the blue minus the grey square - can be split into three parts for each j:

ARR_j = φ_j R_j^V − b_j R_j^b = 1 + 2 + 3 .

Using geometry (Figure 2.14 plots weights against returns: the portfolio point (R_j^V, φ_j) versus the benchmark point (R_j^b, b_j), with the rectangles 1, 2 and 3 marking the three effects):

Figure 2.14: Arithmetic return decomposition. Source: Adapted from Marty (2015)

ARR_j = 1 + 2 + 3 = (φ_j − b_j) R_j^b + (R_j^V − R_j^b) b_j + (φ_j − b_j)(R_j^V − R_j^b) =: A + S + I .    (2.34)

A, a difference arising from different portfolio weights, represents the tactical asset allocation (TAA) effect (the Brinson-Hood-Beebower (BHB) effect), S the stock selection effect, and I the interaction effect. This return decomposition was invented by BHB in 1986 and is used by many asset managers as the starting point for their performance attribution.
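A minimal sketch of the BHB split (2.34) for one segment; the weights and returns are invented illustration numbers:

```python
def bhb(phi_j, b_j, r_pf, r_bm):
    """Split ARR_j = phi_j * r_pf - b_j * r_bm into allocation (A),
    selection (S) and interaction (I), as in (2.34)."""
    A = (phi_j - b_j) * r_bm            # tactical asset allocation
    S = (r_pf - r_bm) * b_j             # stock selection
    I = (phi_j - b_j) * (r_pf - r_bm)   # interaction
    return A, S, I

phi_j, b_j, r_pf, r_bm = 0.30, 0.25, 0.08, 0.05
A, S, I = bhb(phi_j, b_j, r_pf, r_bm)
arr_j = phi_j * r_pf - b_j * r_bm
print(A, S, I, arr_j)   # the three effects add up to ARR_j
```

Overweighting (0.30 vs 0.25) a segment that also outperformed its benchmark return produces positive allocation, selection and interaction contributions.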

Figure 2.15 shows the performance attribution tree for the MSCI World ESG Quality
Index. The total return RT can be written in the form

RT = RT − Rb + Rb = ARR + Rb .

Since fees are not available, the total return is a gross return. The ARR has several
levels.

Figure 2.15: Performance attribution tree for the MSCI World ESG Quality Index: R_T = R_B + ARR, with fees giving the net-of-fee return, and the ARR split into asset classes, TAA & selection, and risk premia; the information written in red comes from me (Adapted from MSCI [2016]).

The ARR is first decomposed into asset classes. The asset class equity is then further decomposed into three types: sector and geographical diversification G, the selection part S, and a part which invests in a portfolio of factor risk premia. In the return attribution, the numbers add up along the hierarchy, whereas non-linear risk figures do not add up.

Given the above attribution, how do we calculate returns? This is trivial if no cash inflows or outflows need to be considered. There are two methods to calculate the investment return:

• Time-Weighted Rate of Return (TWR).

• Money-Weighted Rate of Return (MWR).


We only provide some basic results and refer to Marty (2015) for a detailed discussion. The TWR should measure the return of an investment such that in- or outflows do not affect the return figure: TWR should reflect the return due to the asset manager's past decisions. As an example, start with USD 100 in period one, add another USD 200 at the beginning of period two, and let the portfolio value at the end of period two be USD 300. The net gain of the portfolio is zero, but naively calculating the linear return gives 200/100, a 200 percent return, if the intermediate cash in- or outflows are not taken into account. TWR controls for these cash flows in the return calculation. MWR reflects the return from an investor's perspective: cash in- and outflows as well as the profit and loss matter. The MWR method is based on the no-arbitrage principle. Both MWR and TWR can be applied on an absolute or relative return basis.

The TWR R^{TWR}_{0,T} of an investment starting in 0 and ending in T, with T − 1 time points in between (not necessarily equidistant), is defined by:

1 + R^{TWR}_{0,T} = ∏_{i=0}^{T−1} (1 + R_{i,i+1}) = ∏_{i=0}^{T−1} ( 1 + (V_{i+1} − V_i)/V_i ) = ∏_{i=0}^{T−1} V_{i+1}/V_i = ∏_{i=0}^{T−1} ⟨φ_i, S_{i+1}⟩/⟨φ_i, S_i⟩    (2.35)

where ⟨φ_i, S_{i+1}⟩ := ∑_{j=1}^{N} φ_{i,j} S_{i+1,j} is the value of the N assets with the corresponding portfolio φ_i. Writing out the second product, it follows that 1 + R^{TWR}_{0,T} maps V_0 into V_T - all intermediate steps cancel. The following properties hold for TWR:

Proposition 17. 1. Adding or subtracting any cash flow ĉ_t̂ at any time t̂ does not change the TWR.

2. If φ_i(j) = λ_i φ_{i−1}(j) for all assets j and all time points i, then the TWR equals the return of the final portfolio value relative to its initial value; all intermediate returns cancel in (2.35).

The TWR method is used by most index providers since cash in- or outflows do not impact the return of the index. To prove the first property, fix a time t̂ and let ĉ_t̂ be an arbitrary cash flow. The relevant terms in the TWR with this additional cash flow are:

( 1 + (V_{t̂+1} − V̂_t̂)/V̂_t̂ ) ( 1 + (V̂_t̂ − V_{t̂−1})/V_{t̂−1} ) .

Setting V̂_t̂ = V_t̂ + ĉ_t̂, i.e. adding the cash flow, and inserting this into the last expression yields

V_{t̂+1} / V_{t̂−1} ,

which is the same result as simplifying the two corresponding terms in the TWR without any additional cash flow.
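A sketch of the TWR product (2.35); the prices and holdings are made-up numbers, and the change of holdings between the two periods stands in for an external cash flow:

```python
import numpy as np

# TWR per (2.35): chain sub-period returns <phi_i, S_{i+1}> / <phi_i, S_i>.
# Holdings change between the periods (e.g. due to a cash inflow), but
# each sub-period return reflects price moves only, so the flow itself
# does not enter the TWR.
prices = [np.array([10.0, 20.0]),
          np.array([12.0, 19.0]),
          np.array([11.0, 22.0])]
holdings = [np.array([5.0, 3.0]),   # phi_0, held over period 0 -> 1
            np.array([2.0, 6.0])]   # phi_1, held over period 1 -> 2

growth = 1.0
for phi, s0, s1 in zip(holdings, prices, prices[1:]):
    growth *= (phi @ s1) / (phi @ s0)
twr = growth - 1
print(twr)
```

Each factor is a pure price return over its sub-period, which is why intermediate in- or outflows leave the chained figure unchanged.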
In the MWR, cash flows c_j are reinvested at the internal rate of return (IRR), i.e. R^{MWR} solves:

PV(C, R^{MWR}) = ∑_{j=1}^{T} D(0, j; R^{MWR}) c_j    (2.36)

where the discount factor D depends explicitly on R^{MWR}. Since R^{MWR} enters the denominator of the discount factor, (2.36) is solved numerically. Using the first-order approximation D ∼ 1/(1 + R) transforms (2.36) into a linear equation for R - the so-called Dietz return (with AIC the average invested capital):

R^{Dietz} = P&L / AIC := ( S_T − S_0 − ∑_{j=1}^{T−1} c_j ) / ( S_0 + (1/2) ∑_{j=1}^{T−1} c_j ) .    (2.37)

This approximation implies simple compounding and assumes that the cash flows are realized in the middle of the respective periods.
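The simple Dietz approximation in code, applied to the USD 100/200/300 example from the TWR discussion (where the net gain is zero):

```python
def dietz(start_value, end_value, flows):
    """Simple Dietz return (2.37): P&L over average invested capital,
    with external flows assumed to occur mid-period."""
    pnl = end_value - start_value - sum(flows)
    aic = start_value + 0.5 * sum(flows)
    return pnl / aic

# Start with 100, inject 200, end at 300: zero gain on an average
# invested capital of 200, hence a 0% money-weighted return.
r = dietz(100.0, 300.0, [200.0])
print(r)   # → 0.0
```

Unlike the naive 200 percent figure, the money-weighted view correctly reports zero return, because the inflow sits in the denominator as invested capital rather than in the numerator as gain.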

2.3.9 Returns and Leverage


We consider two assets, see Anderson et al. (2014). The single-period return of the portfolio without leverage, R⁰, reads

R⁰ = ⟨φ, R⟩ , ∑_i φ_i = 1    (2.38)

with φ₁ and φ₂ = 1 − φ₁ the invested fractions in assets 1 and 2, respectively. Consider a leveraged position with leverage ratio λ ≥ 1. The portfolio value in absolute terms reads at any date

V^λ = λ(ψ₁S₁ + ψ₂S₂) + (1 − λ)ψ₃B    (2.39)

where the first part represents the levered portfolio and the last term the borrowing costs for the leveraged position, an investment in the borrowed asset B. Note that this term is negative for λ > 1. In relative terms,

1 = λ(φ₁ + φ₂) + (1 − λ)φ₃ ,

which shows that for λ = 1 we are back in the unlevered case.

As an example, consider two assets S₁, S₂, both with price 100 CHF at date 0, one unit of each asset in the portfolio, and leverage λ = 3. Hence λ(ψ₁S₁ + ψ₂S₂) = 600 CHF. Therefore, ψ₃B = 400 and V₀ = 200 CHF. Assume that the return of the borrowing asset B is 2%. If the S-assets' return is 10%, then the leveraged portfolio return is 22% = 3 × 10% − 4 × 2%. But if the S-assets fall by 10%, then the leveraged portfolio return is −38%. This means that, after the costs of leveraging, the boost of positive returns is smaller than the comparable negative return boost if asset prices drop.
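The example's arithmetic, using formula (2.40) with the text's bookkeeping (φ = (0.5, 0.5) relative to the equity V₀ = 200, φ₃ = 400/200 = 2, λ = 3):

```python
# Levered portfolio return per (2.40):
#   R^lambda = lam * <phi, R> + (1 - lam) * phi3 * R_B
lam, phi3, r_b = 3.0, 2.0, 0.02

def levered_return(r_assets):
    return lam * r_assets + (1 - lam) * phi3 * r_b

print(round(levered_return(0.10), 4), round(levered_return(-0.10), 4))
# → 0.22 -0.38
```

The asymmetry - a +10% asset move gains 22% while a −10% move loses 38% - is the fixed financing cost of the leverage shifting both outcomes down by the same amount.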

Calculating the return of the leveraged portfolio:

R^λ = λ⟨φ, R⟩ + (1 − λ)φ₃ R_B .    (2.40)

Set φ = (φ₁, φ₂) and φ₃ = 1 − φ₁ − φ₂. Inserting this into the budget constraint implies

φ₁ + φ₂ = λ / (2λ − 1) .

If there is no leverage, φ₃ = 0 or λ = 1. Calculating the excess return relative to a risk-free rate R_f and to the borrowing rate R_B we get:

R^λ − R_f = λ⟨φ, R − R_f⟩ + (1 − λ)φ₃(R_B − R_f)

R^λ − R_B = λ⟨φ, R − R_B⟩ .    (2.41)

The excess return relative to the borrowing rate scales linearly in the leverage ratio. But for the excess return relative to the risk-free rate, if R_B > R_f, increasing the leverage ratio reduces the gains of the original portfolio.

The leverage ratio λ is, in many investment strategy applications, not a constant over time but a random variable. Rewriting the second equation in (2.41) and taking expectations we get:

E(R^λ) = E(R_B) + E(λ) E(⟨φ, R − R_B⟩) + cov(λ, ⟨φ, R − R_B⟩) .    (2.42)

Anderson et al. (2014) call the first two terms on the right-hand side the magnified source return due to leveraging. How important is the covariance correction in the last term? To quantify it we need to consider the volatility drag. Formula (2.42) shows that the expected return of a levered portfolio also contains a covariance correction term between the random leverage ratio and the excess return. Summarizing, in a multi-period investment, there are three factors which matter:

• The covariance correction present in leverage portfolios.

• The volatility drag which is present in any multi-period investment.

• Transaction costs.

Anderson et al. (2014) consider these three factors in a 60/40 target-volatility investment in US equity and US Treasury bonds, using monthly returns from Jan 1929 to Dec 2012. The target volatility is set equal to the fixed 11.59% realized volatility of the observation period. Since volatility is not known ex ante, the leverage ratio is a random variable. The borrowing for the leverage is done at the 3m Eurodollar deposit rate and trading costs are proportional to the traded volume.
The authors find that the magnified source return in equation (2.42) dominates all other components, but this portfolio is not realizable. The gross return of the source portfolio - i.e. the 60/40 target (gross of trading costs) and the risk parity portfolio with 3m Eurodollar financing (net of trading costs) - is 5.75% in the period. The magnified source term contributes 9.72%, which implies that 3.97% is due to the leverage and excess borrowing return. The total levered arithmetic return is 6.84%; the difference is the covariance correction of −1.84% and the trading costs of −1.04%. Finally, the variance drag of −0.4% implies a total geometric levered return of 6.37%. Summarizing, the three effects - transaction costs, covariance correction and variance drag - reduce the positive leverage return impact of 3.97% by 82% to 0.69% (3.97 − 1.84 − 1.04 − 0.4 = 0.69).

2.4 The Efficient Market Hypothesis (EMH)

The TAA raises the question of whether asset prices are predictable. Predictability is part of the broader Efficient Market Hypothesis (EMH) concept.

Malkiel (2003): Revolutions often spawn counter-revolutions and the efficient market hypothesis [EMH] in finance is no exception. The intellectual dominance of the efficient-market revolution has more recently been challenged by economists who stress psychological and behavioural elements of stock-price determination and by econometricians who argue that stock returns are, to a considerable extent, predictable.

Lo (2007): The efficient market[s] hypothesis (EMH) maintains that market prices fully reflect all available information. [...] It is disarmingly simple to state, has far-reaching consequences ..., and yet is surprisingly resilient to empirical proof or refutation. Even after several decades of research and literally thousands of published studies, economists have not yet reached a consensus about whether markets - particularly financial markets - are, in fact, efficient.

Asness and Liew (2015): The concept of market efficiency has been confused with everything from the reason that you should hold stocks for the long run to predictions that stock returns should be normally distributed to even simply a belief in free enterprise.

Shiller (2014): [If markets are ecient] there is never a good time or bad time to
enter the market [...]

We start with the EMH definition.⁸

Definition 18. A financial market is efficient when market prices reflect all available information about value.

⁸ This section is based on Fama (1965, 1970, 1991), Cochrane (2011, 2013), Malkiel (2003), Asness (2014), Lo (2007), Nieuwerburgh and Koijen (2007), and Shiller (2014).
All available information includes past prices, public information, and private information. These different information sets F_t lead to different EMHs (see below). The statement 'reflecting all available information' is itself not defined: if a company announces that it expects twice as much earnings, do stock prices double, triple, or fall? 'Reflect all available information' means, in the sense of Jensen, that trading based on the information set does not lead to an economic profit. An asset pricing model is needed to make precise what reflecting all information means in the EMH. Efficiency testing means testing whether the properties of expected returns implied by the model of market equilibrium are observed in actual returns. This is referred to as the joint hypothesis problem (Fama [1970]):

• Pillar 1: Do prices reflect all available information - that is, are markets efficient? Prices can only change if new information arrives. The information content.

• Pillar 2: Developing and testing asset pricing models. The price formation mechanism (Asset Pricing Model).

We consider the meaning of the information set Ft, which is not a vector space. Hence the conditional expectation E[X|F] is linear in X but carries no arithmetic structure on the information set itself. Consider a 3-period recombining stock price model. In each period the stock price can move up or down with given probabilities. We can observe 8 possible path realizations ωk after three periods, see Figure 2.16:

ω1 = (u, u, u) , ω2 = (u, u, d) , ω3 = (u, d, u) , ω4 = (u, d, d)

ω5 = (d, u, u) , ω6 = (d, u, d) , ω7 = (d, d, u) , ω8 = (d, d, d) .

The set of all observable outcomes ωj is the sample space Ω. To understand the information dynamics, suppose that the first move was up. Then four paths are still possible after this step; the others are impossible. After, say, a down move in the second step, only two paths remain possible. After the last price move, a single realized path is left. This allows us to introduce possible events. For the 8 observable outcomes, the power set A = 2^Ω with 2^8 = 256 elements defines all possible events.

Filtrations (Ft) describe the dynamics of the possible event structure. Ft ⊂ A represents the possible information up to time t: only past information generated by all possible price paths enters the filtration. We require

Ft ⊂ Ft+1 , Ft ⊂ A ∀t .

Intuitively, increasing time means the information resolution increases (there are more sets). At t = 0, F0 = {∅, Ω}: everything is possible, i.e. all future information is random.
At t = 1, we define the sets

A1 = {ω1 , ω2 , ω3 , ω4 } , A2 = {ω5 , ω6 , ω7 , ω8 } .
9 The inclusion of the empty set guarantees that the set F0 is closed under countable intersection and complement formation.
Figure 2.16: Illustration of the information and filtration structure for the three-period CRR model.

A1 (A2) is the set of all paths whose first price move is 'up' ('down'). We set

F1 = {∅, Ω, A1 , A2 } .

This assures that F0 ⊂ F1. F2 is generated by the four sets of paths that share the first two moves, and F3 is the power set of all eight observable states. The information sets are generated by the evolution of the asset prices only. This is the standard information structure set-up in asset and derivatives pricing.
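The path enumeration and the sets A1, A2 above can be checked with a few lines of Python (a sketch; the tuple encoding of paths is our own choice, not notation from the text):

```python
from itertools import product

# Sample space: all 2^3 = 8 up/down paths of the three-period model.
omega = list(product("ud", repeat=3))

# The power set A of Omega contains 2^8 = 256 possible events.
n_events = 2 ** len(omega)

# F1 is generated by the first price move: A1 = 'first move up',
# A2 = 'first move down'; F1 = {empty set, Omega, A1, A2}.
A1 = [w for w in omega if w[0] == "u"]
A2 = [w for w in omega if w[0] == "d"]
F1 = [[], omega, A1, A2]

print(len(omega), len(A1), len(A2), n_events)  # → 8 4 4 256
```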

Returning to the EMH, let Rt+1 be an asset's return, FM the assumed information used in the market to set the equilibrium price of the asset, and F the real information used to form asset prices. Market efficiency means that the expected returns at t+1 given the two information sets at time t are the same (we assume that rational expectations are formed):

E(Rt+1 |FM,t ) = E(Rt+1 |Ft ) . (2.43)

The standard asset pricing equilibrium model of the 1960s assumed that the equilibrium expected returns are constant: E(Rt+1 |FM,t ) = constant. If the EMH (2.43) holds, then

E(Rt+1 |Ft ) = constant

follows. To test the EMH, the regression of future returns Rt+1 on the known information Ft should have a zero slope. If this is not the case, the market equilibrium model could be wrong, the definition of FM,t could overlook information used in price setting (FM,t and Ft are not equal), or both channels could be flawed.

Remarks
• The EMH does not hold if there are market frictions (trading costs, costs of obtaining information). In the US, reliable information about firms can be obtained relatively cheaply and trading securities is cheap too. For these reasons, US security markets are thought to be relatively efficient.

• Grossman and Stiglitz (1980) show that perfect market efficiency is internally inconsistent.

• The EMH does not assume rationality of investors, but to operationalize the EMH one often assumes rationality. Fama proposes the following form:

E[Rt+1 − E[Rt+1 |Ft ]|Ft ] = 0 . (2.44)

Given the information set, there is on average no systematic deviation of future returns from their expectations. In this sense prices reflect all available information. Clearly, investors are assumed to be rational under this assumption. The EMH does not assume that all investors have to be informed, skilled, and able to constantly analyze the information flow. One can prove that market efficiency is possible even if only a small number of market participants are informed and skilled.

• The EMH is applicable to all asset classes. If the EMH holds true, then prices react quickly to the disclosure of information.

Why is the EMH important for AM? Fama's work on market efficiency (1965, 1970) triggered passive investing, with the first index fund launched in 1971. In efficient markets, buying and selling securities is a game of chance rather than one of skill. Active management is a zero-sum game. If the EMH holds, the variation of the performance of active managers around the average is driven by luck alone. Many studies found little or no correlation between strong performers in one period and those in the next one, see Figure 2.17.

Suppose that one were able to pick in advance those managers who outperform others. According to the EMH, investors would give them all their money; no-one would select those managers doomed to underperform. But who would be on the other side of the outperformers' trades? This process would be self-defeating.

The same conclusion also holds for technical analysis, the study of past stock prices to predict future prices, and fundamental analysis, the analysis of financial company information to select undervalued stocks. If the EMH holds, both approaches are useless in predicting asset prices. The value of financial analysts lies not in predicting asset values but in analysing incoming information fast, so that the information is rapidly reflected in asset prices. In this sense analysts support the EMH. Fama (1970) defines three different forms of market efficiency, i.e. three different information sets F.

Figure 2.17: Performance ranking of the top 20 equity funds in the US in the 1970s and in the following decade. The average annual rate of return was 19 percent compared to 10.4 percent for all funds. In the following decade, the former top 20 funds had an average rate of return of 11.1 percent compared to 11.7 percent for all funds (Malkiel [2003]).

In the weak-form EMH, F is all available price information at a given date. Hence, future returns cannot be predicted from past returns or any other market-based indicator. This precludes technical analysis from being profitable. In the semi-strong EMH, F is all available public information at a given date, i.e. financial reports, economic forecasts, company announcements, etc. Technical and fundamental analyses are not profitable in this case. This is the form of the EMH usually meant in the literature. In the strong-form EMH, F is all available public and private information at a given date. This extreme form serves mainly as a limiting case.

Example

A well-known story tells of a finance professor and a student who come across a hundred dollar bill lying on the ground. As the student stops to pick it up, the professor says, 'Don't bother - if it were really a hundred dollar bill, it wouldn't be there.' This story illustrates well what financial economists usually mean when they say markets are efficient. But suppose that nobody has so far tested whether the bill is indeed real; everybody assumed that someone else checked the bill's validity. Then no effort was made to generate the information needed to value the bill. But if nobody bore the costs of generating that information, then Ft is empty and the EMH cannot hold. This shows that a reasonable assumption about human behavior can lead to a violation of the EMH.

Example

A firm announces a new drug that could cure a virulent form of cancer. Figure 2.18 shows possible reactions of the price path. The solid path is the EMH path: prices jump to the new equilibrium value instantaneously and in an unbiased fashion. The dotted line represents a path where market participants overreact and the dashed one a path where they underreact. The dash-dotted line is a strong signal for insider trading, front running, or some other form of illegal trading.

Figure 2.18: Possible price reactions as a function of the day relative to the announcement
of a new drug.

2.4.1 Predictability
If the EMH holds in a rational set-up, asset prices follow a random walk:

Definition 19. Let Rt be the return of an asset with the dynamics

Rt = Rt−1 + m + εt , m ∈ R, R0 = r . (2.45)

If the sequence (εt) is IID with mean zero, variance σ² and zero covariance cov(εt , εt−1 ) = 0, then Rt is a random walk with drift m.
Setting m = r = 0, then E[Rt ] = 0 and var(Rt ) = tσ². If we take the standard deviation as risk measure, then risk grows with the square root of time. Assuming simple returns given by Rt = (St − St−1 )/St−1 , the random walk equation implies

(St+1 − St )/St = (St − St−1 )/St−1 + εt+1

and after some algebra

St = S0 Π_{s=1}^{t} ( S1 /S0 + Σ_{k=2}^{s} εk ) ,

with the empty sum for s = 1 equal to zero. While in a random walk returns are a zero-sum game, it follows that prices are by no means driftless.
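A short simulation (a sketch with an arbitrary choice of σ) illustrates that var(Rt) = tσ² for the driftless random walk (2.45):

```python
import numpy as np

rng = np.random.default_rng(0)
sigma, t, n_paths = 0.02, 100, 200_000

# R_t is the sum of t IID shocks (m = R_0 = 0); simulate many paths at once.
eps = rng.normal(0.0, sigma, size=(n_paths, t))
R_t = eps.sum(axis=1)

# The empirical variance should be close to t * sigma^2 = 0.04.
print(R_t.var(), t * sigma**2)  # both ≈ 0.04
```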

A return Rt+1 is predictable by Ft if the return is measurable w.r.t. Ft. In particular, if the return is predictable,

E[Rt+1 |Ft ] = Rt+1 , ∀t. (2.46)

That is, conditioning on the information set Ft adds no value. When returns are not predictable, prices follow a martingale:
Definition 20. Assume

E^Q [St+1 |Ft ] = St , ∀t, (2.47)

where the expectation is under a probability measure Q. Then St is an (Ft , Q)-martingale.

If S is a martingale, the tower property of conditional expectation yields

E^Q [St+1 ] = E^Q [St ] . (2.48)

Hence, martingales are not predictable. The expected return of a non-predictable process is constant over time, but the return itself can vary. If returns are martingales, the operational form of the EMH (2.44) holds true.

Martingales are key in general equilibrium theory and in the pricing of derivatives. The Fundamental Asset Pricing Equation reads

E(Mt,t+1 St+1 |Ft ) = St , ∀t, (2.49)

with M the stochastic discount factor (SDF) and Mt,t = 1. Therefore, MS is a martingale. In derivative pricing, the SDF is derived from the interest rate term structure. In general equilibrium models with consumption, the SDF depends on the marginal utilities of consumption at different dates, see Chapter 4.

To relate the discussion to AM, consider asset prices St in discrete time t. The return Rt+1 in period t to t+1 of a stock is equal to the capital gain plus a dividend yield D, i.e.

Rt+1 = (St+1 − St )/St + Dt+1 /St . (2.50)

Rewriting this equation, St = (1/(1 + Rt+1 ))(St+1 + Dt+1 ) follows, which is a linear difference equation in the prices. Solving this equation for k periods implies

St = Σ_{j=1}^{k} ( Π_{m=1}^{j} 1/(1 + Rt+m ) ) Dt+j + ( Π_{m=1}^{k} 1/(1 + Rt+m ) ) St+k . (2.51)

It is common to assume that asset prices grow at a lower rate than the returns - the second term then tends to zero for k → ∞. We get:

Theorem 21. If asset prices grow at a lower rate than the asset returns, the price St is equal to the discounted future dividends, i.e.

St = Σ_{j=1}^{∞} ( Π_{m=1}^{j} 1/(1 + Rt+m ) ) Dt+j . (2.52)

Since there is no randomness, the future dividends are known in this PV formula. We extend this formula by adding risk and considering the EMH. Consider the operational form of the EMH (2.44) and take conditional expectations in (2.50):

St = ( E[St+1 |Ft ] + E[Dt+1 |Ft ] ) / ( 1 + E[Rt+1 |Ft ] ) .

This shows that an asset pricing model to evaluate E[St+1 |Ft ] is needed to calculate today's prices. Since the pricing formula has the same structure with and without risk, the formula of the last theorem carries over to the case with risk, providing the ex-ante version:

Theorem 22. If expected asset prices grow at a lower rate than the expected asset returns, the price St is equal to the expected discounted future dividends, i.e.

St = Σ_{j=1}^{∞} ( Π_{m=1}^{j} 1/(1 + E[Rt+m |Ft ]) ) E[Dt+j |Ft ] . (2.53)

If returns follow a random walk, the discount factors are deterministic, and if one assumes that dividends are known, the formula simplifies further, see the next example. Formula (2.53) is a fundamental asset pricing formula where the SDF is given by

10 Iterating the equation and guessing the solution, or using the general formula for linear difference equations.

the asset's return and not by the marginal rate of substitution of utilities as in an equilibrium model, see Chapter 4. The formula is highly non-linear, which makes it challenging to test empirically. Straightforward approximations (Taylor series) are used to linearize the formula. A second manipulation is to write the price of an asset today as the expected value of changes in dividends and returns: one shifts the pricing equation one step ahead and subtracts it from the non-shifted expression. We use these two approaches below without doing the algebra.
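The k-period formula (2.51) can be verified numerically: iterating St = (St+1 + Dt+1)/(1 + Rt+1) backwards from an (arbitrarily chosen) terminal price reproduces the discounted sum. A sketch:

```python
import numpy as np

rng = np.random.default_rng(1)
k = 20
D = rng.uniform(1.0, 3.0, size=k)    # dividends D_{t+1}, ..., D_{t+k}
R = rng.uniform(0.02, 0.10, size=k)  # returns R_{t+1}, ..., R_{t+k}
S_terminal = 100.0                   # terminal price S_{t+k}

# Backward iteration of S_t = (S_{t+1} + D_{t+1}) / (1 + R_{t+1}).
S = S_terminal
for j in reversed(range(k)):
    S = (S + D[j]) / (1.0 + R[j])

# Closed form (2.51): discounted dividends plus discounted terminal price.
disc = np.cumprod(1.0 / (1.0 + R))
S_closed = np.sum(disc * D) + disc[-1] * S_terminal

assert abs(S - S_closed) < 1e-9
```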

Example constant dividends and returns

If expected dividends and returns are constant, the above valuation equation reads

St = D/R = constant . (2.54)

But empirical evidence shows that expected returns and dividends are both not constant over time. Therefore, (2.54) is too naive. It implies that the volatilities of the growth rates are the same:

volatility(dRt /Rt ) = volatility(dDt /Dt ) .

But the return volatility is around 16% while the dividend volatility is only about 7%. Therefore something else must be time varying. Furthermore, the return volatility is itself time varying: monthly market return volatility fluctuated between values of 20% and more in market stress periods (Great Depression, Great Financial Crisis) and 2% in the 60s and mid-90s of the last century, see next section.
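As a quick numerical check of the constant-dividend case, the truncated discounted-dividend sum (2.52) converges to D/R (the parameter values are arbitrary):

```python
# Constant dividends D and returns R: the discounted dividend sum (2.52)
# becomes a geometric series with value D / R, i.e. formula (2.54).
D, R = 5.0, 0.07
partial = sum(D / (1.0 + R) ** j for j in range(1, 2000))
print(round(partial, 4), round(D / R, 4))  # → 71.4286 71.4286
```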

Example Skill and luck, martingales

We show that a small amount of skill makes a huge difference for wealth growth in a gamble - the same observation holds if one considers skill in active asset management. Consider an investor with initial capital W0 playing the following game: she invests in each period 1 unit of her capital. The outcome of the strategy in each period is +1 with probability p or −1 with probability q = 1 − p. She does not change her strategy over time. The outcomes form an IID sequence (Xk ) of random variables. Her wealth after n periods reads

Wn = W0 + Σ_{k=1}^{n} Xk .

What is the probability that she attains a final wealth level Wf > W0 before going bankrupt? To derive the wealth dynamics equation, the first step is to define disjoint events which allow us to calculate probabilities. We set

A_{W0,n} = { W0 + Σ_{k=1}^{n} Xk = Wf , 0 < W0 + Σ_{k=1}^{m} Xk < Wf , m < n }
k=1 k=1

for the event where she reaches the desired wealth level for the first time after n plays without being bankrupt before. Since the sets (A_{W0,n})_n describe the first passage time, they are disjoint. Therefore the probability p̃(W0 , Wf ) that the investor ever reaches the desired wealth level Wf is given by

p̃(W0 , Wf ) = P( ∪_{n=1}^{∞} A_{W0,n} ) = Σ_{n=1}^{∞} P(A_{W0,n}) .

The following dynamics

p̃(W0 , Wf ) = p p̃(W0 + 1, Wf ) + q p̃(W0 − 1, Wf ) (2.55)

captures the game logic: with probability p the first play is won and the capital rises to W0 + 1, with probability q it falls to W0 − 1. This is a linear difference equation. A solution is found by inserting the guess

p̃(W0 , Wf ) = A + B r^{W0} , r = q/p (2.56)

if q ≠ p, with A, B two constants. A and B are determined by the two boundary conditions p̃(0, Wf ) = 0, p̃(Wf , Wf ) = 1. We get for Wf > W0:

p̃(W0 , Wf ) = (r^{W0} − 1)/(r^{Wf} − 1) , if p ≠ q ;
p̃(W0 , Wf ) = W0 /Wf , if p = q . (2.57)

If the game is fair (a martingale), then the probability of reaching a wealth level 50 percent above the starting value of 100 units is 66%. If the investor's strategy has a small skill component such that q = 0.49 and p = 0.51, then the probability of reaching the desired level is 98%.
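Formula (2.57) and the two probabilities quoted above can be reproduced directly (the function name is ours):

```python
def p_reach(W0, Wf, p):
    """Probability (2.57) of reaching Wf before ruin, starting at W0, win prob p."""
    q = 1.0 - p
    if abs(p - q) < 1e-12:       # fair game: the martingale case
        return W0 / Wf
    r = q / p
    return (r**W0 - 1.0) / (r**Wf - 1.0)

print(round(p_reach(100, 150, 0.50), 3))  # → 0.667, the 66% of the fair game
print(round(p_reach(100, 150, 0.51), 3))  # → 0.984, the 98% with a small edge
```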

Predictability from a forecast point of view uses linear regressions of returns R on a variable xt of the form

Rt+1 = a + bxt + εt+1 (2.58)

with a, b constants and εt+1 a sequence of IID normal random variables with mean 0 and variance σ². The variable xt can be the return itself or a market price variable such as the price-dividend ratio. The regression (2.58) becomes a random walk if b = 0, or if a = 0, b = 1 and xt = Rt. For the latter choice, the random walk regression

Rt+1 = Rt + εt+1 (2.59)

implies

Rt+1 = R0 + Σ_{j=1}^{t+1} εj , E(Rt+1 ) = R0 , σ²(Rt ) = tσ² .

This shows that R is a martingale and that the variance increases over time. R0 = 0 is a reasonable assumption for short-term returns, and it implies that discounted price processes are martingales too. Therefore, discounted prices are martingales if the returns are martingales with zero expected return.
Cochrane (2013) tests for lagged-return predictability by considering

Rt+1 = a + bRt + εt+1 (2.60)

for US stocks and T-bills using annual data, see Table 2.15.

Object          b     t(b)   R²     E(R)   σ(Et (Rt+1 ))
Stock           0.04  0.33   0.002  11.4   0.77
T-bill          0.91  19.5   0.83   4.1    3.12
T-bill excess   0.04  0.39   0.00   7.25   0.91

Table 2.15: Regression of returns on lagged returns, annual data 1927-2008. t(b) is the t-statistic and σ(Et (Rt+1 )) represents the standard deviation of the fitted value bRt (Cochrane [2013]).

The results show that stocks are almost unpredictable while T-bill returns are predictable. A value of b = 0.04 for stocks means that if returns increase by 10% this year, the expectation is that they will increase by 0.4% next year. Also the R² is tiny and the t-statistic is below its standard threshold value of 2. For T-bill returns the story is different - high interest rates last year imply that the rates this year will again be high with high probability. Can this foreseeability of T-bills be exploited by a trader? Suppose first that stocks were highly predictable. Then one could borrow today and invest in the stock market. But this logic does not work for T-bills, since borrowing would mean paying the same high rate that one receives. To exploit T-bill predictability the investor has to change his behavior - save more and consume less today - which is totally different from the stock case. This is a main reason why one considers excess returns Re - the return on stocks minus the return on bonds - in forecasting, with Rb the benchmark return:

Re,t = Rs,t − Rb,t . (2.61)

By analysing the excess return one separates the motivation 'to consume less and to save' from the willingness to bear risk. Table 2.15 shows that considering excess returns, T-bills are back in the almost non-predictable stock case. Lo and MacKinlay (1999) find that short-run serial correlations are not zero and that the existence of 'too many' successive moves in the same direction enables them to reject the hypothesis that stock prices behave as random walks. There is some momentum in short-run stock prices. Even if the stock market is not a perfect random walk, statistical and economic significance have to be distinguished. The statistical dependencies are very small and difficult to transform into excess returns. Considering transaction costs, for example, will annihilate the small advantage due to the momentum structure (see Lesmond et al. [2001]).
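The flavour of the lagged-return regression (2.60) can be reproduced on simulated data (a sketch, not Cochrane's data): a persistent AR(1) series plays the role of T-bill rates and an IID series the role of stock returns, with all parameter values chosen arbitrarily:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 5000

# 'T-bill-like' series: persistent AR(1) with coefficient 0.9.
tbill = np.zeros(n)
for t in range(1, n):
    tbill[t] = 0.9 * tbill[t - 1] + rng.normal(0.0, 0.01)

# 'Stock-like' series: IID returns, no persistence.
stock = rng.normal(0.07, 0.16, size=n)

def slope(R):
    """OLS slope of R_{t+1} on R_t, as in regression (2.60)."""
    x, y = R[:-1], R[1:]
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)

print(round(slope(tbill), 2))  # close to 0.9: persistent, 'predictable'
print(round(slope(stock), 2))  # close to 0.0: unpredictable
```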

2.4.2 Long Time Horizon Predictions

We consider longer time horizons and use market prices or yields to forecast returns. This section is based on Cochrane (2005). Following the dividend/price (D/P) discussion of the last section, we consider the return-forecasting regressions of Cochrane (2013) in Table 2.16. The regression equation reads

R^e_{t→t+k} = a + b Dt /St + εt+k (2.62)

with R^e the excess return, defined as the CRSP value-weighted return less the three-month Treasury bill return. The return-forecasting coefficient estimate b is large and grows with the time horizon.

Horizon   b     t(b)   R²    σ(Et (R^e_{t+1} ))   σ(Et (R^e_{t+1} ))/E(R^e_{t+1} )
1 year    3.8   (2.6)  0.09  5.46                 0.76
5 years   20.6  (3.4)  0.28  29.3                 0.62

Table 2.16: Return-forecasting regressions, 1947-2009, annual data. t(b) is the t-statistic and σ(Et (R^e_{t+1} )) = σ(b Dt /St ) represents the standard deviation of the fitted value (Cochrane [2013]).

Hence, high dividend yields (low prices) mean high subsequent returns and vice versa. The R² of 0.28 is large when compared with the R² of predicting stock returns on, say, a weekly basis, where returns are seen to be unpredictable. Therefore, excess returns are predictable by D/P ratios. Fama and French (1988) document that 25 to 40 percent of the variation in long-holding-period returns can be predicted in terms of a negative correlation with past returns. Behaviorists attribute this 'forecastability' to stock market price 'overreaction': investors face periods of optimism and pessimism which cause deviations from fundamental asset values (DeBondt and Thaler (1995)).

The above tests are not stable. First, the point estimates of the return-forecasting coefficients and their associated t-statistics vary significantly if different sample periods are considered. Second, the definition used for 'dividends' impacts the results.

If we take conditional expectations in equation (2.62),

Et (R^e_{t+1} ) = a + b Dt /St . (2.63)

Since the dividend/price ratio varies over time between 1 and 7, return predictability is the same as saying that expected returns vary over time. Using b = 3.8 and a variation of D/P by 6 percentage points translates into a long-term variation of expected returns of 3.8 × 6 = 22.8 percentage points, which is too high given that the long-term average expected return is 7 percentage points.

11 Center for Research in Security Prices at Chicago Booth business school.

When we analyze the regression of dividend growth, where Dt+k /Dt replaces the return in (2.62), Cochrane (2013) states: 'Returns, which should not be predictable, are predictable [see Table 2.16]. Dividend growth, which should be predictable, is not predictable.' This contradicts the traditional view that expected returns are constant and that if prices fall then future dividends should also decline: dividends have to be predictable since they have to approach the low price levels. The above observation states that on average we observe a different pattern. To deepen the discussion, we consider the multi-period Fundamental Asset Pricing equation (2.53),

St = Et [ Σ_{j=1}^{∞} ( Π_{k=1}^{j} 1/(1 + Rt+k ) ) Dt+j ] . (2.64)

Using log-variables (lower case symbols) changes products into sums, and we get for one period from (2.64):

st − dt = Et (∆dt+1 ) − Et (rt+1 ) . (2.65)

This generalises to many periods, with ρ the discount factor:

st − dt ≈ Σ_{j=1}^{∞} ρ^{j−1} Et (∆dt+j − rt+j ) . (2.66)

Rearranging, it follows that long-run return uncertainty comes from cash-flow uncertainty (changes in dividends and D/P ratios). The more persistent r and ∆d are, the stronger is their effect on the D/P ratio, since more terms in the summation matter. If dividend growth and returns are not predictable - their conditional expectations constant over time - then the D/P ratio is constant, which is not observed. This extension to many periods for the D/P ratio also holds for the variance equation (2.68), where the discounted summation enters the return and dividend growth variables. As in the one-period model, the long-run return and long-run dividend growth regression coefficients must differ by one. Regressing the long-term return and dividend growth, Cochrane (2013) states:

Return forecasts - time-varying discount rates - explain virtually all the variance of
market dividend yields, and dividend growth forecasts or bubbles - prices that keep rising
forever - explain essentially none of the variance of price.

This changes the traditional view of the EMH. Traditionally, expected returns were assumed to be constant (asset pricing model) and stocks were martingales with zero drift (random walks). In this reasoning, high D/P ratios (low prices) occur when people expect declines in dividend growth, and variations in D/P are due entirely to cash-flow news (dividend predictability). The above result states that the opposite is true: the variance of D/P is due to return news and not to cash-flow news.

Predictability is also related to the volatility of prices. Shiller states that if prices are expected discounted dividends, then prices should vary less than the variables being discounted. But prices vary wildly more than they should, even if we knew future dividends perfectly. This is the excess volatility of stock returns pointed out by Shiller.

We claim that return predictability and excess volatility have the same cause. To obtain an equation for the variance, we first write regressions of returns and dividend growth on dt − pt, with br , bd the respective coefficients. Plugging the regressions into (2.65) we get:

1 = br − bd , 0 = εt+1,r − εt+1,d (2.67)

where εt+1,r , εt+1,d are the residuals of the two regressions. Therefore, the expected return can be higher if the expected dividend is higher or the initial price is lower. The only way the unexpected return can be higher is if the unexpected dividend is higher, since the initial price cannot be unexpected. Since a regression coefficient is a covariance over a variance, 1 = br − bd reads:

σ²(pt − dt ) = cov(pt − dt , ∆dt+1 ) − cov(pt − dt , rt+1 ) . (2.68)

This shows that D/P ratios can only vary if they forecast dividend growth or forecast returns. Since the difference between the two coefficients must be one by (2.67), if one coefficient is small in the regression then the other one has to be large.

To capture any positive autocorrelation in price movements, econometricians often use the Autoregressive Moving Average (ARMA) model of Box and Jenkins (1970)

Rt = c + Σ_{k=1}^{p} ak Rt−k + Σ_{k=1}^{q} bk εt−k + εt

where the IID error terms are εt ∼ N(0, σ²). The variance of the error terms is then often modelled using the Generalised Autoregressive Conditional Heteroskedasticity (GARCH) model of Bollerslev (1986). The literature documents patterns of persistence which vary across the asset classes and markets under consideration. While such patterns are found on a daily, weekly, monthly or even annual basis for stocks and bonds, the time periods are much shorter for FX markets, which we consider below.

2.4.3 Cross-Sectional vs Time Series Predictability

The prediction of financial returns can be based on a cross-sectional or a time-series approach. Which approach is better suited for return prediction? Consider a momentum strategy where past winning stocks in a cross section enter a long position and past losers define a short position. The net investment is zero, i.e. the long and short exposures equalize at inception of the strategy. In a time-series momentum strategy, investors are long stocks with past returns above zero and short the other ones. The exposures do not add up to a zero investment; investors are net long in a bullish market. Using NYSE-quoted stocks between 1946 and 2013, with the past annual performance driving the long/short selection for the next month, the cross-sectional strategy earned an annual return of 5 percent compared to 9.3 percent for the time-series strategy. This observation was reported by Moskowitz et al. (2012). They conclude that time-series strategies fully explain and subsume cross-sectional strategies. The findings hold for individual assets as well as for indices of stocks, currencies or commodities.

Evidently, the two strategies are not directly comparable - one being a net zero investment and the other either net long or short. Furthermore, the threshold levels, i.e. which asset enters as a long or short position, are arbitrary; possibly there is a natural or optimal threshold at which the observed differences qualitatively change.

Goyal and Jegadeesh (2018) make the two strategies comparable by correcting for the net long/short position. Since more stocks earned positive returns than negative returns during the sample period, the time-series strategy's long positions are bigger than its short positions. The average long and short positions are $1.24 and $0.76, respectively. Therefore, the time-series-constructed portfolio earned returns simply for being net long during a bullish period. The authors therefore add to the cross-sectional strategy a time-varying investment in the market equal to the dollar value of the difference between the long and short sides of the time-series strategy each month. Doing this exercise for NYSE-quoted stocks, the adjusted cross-sectional strategy shows an annual return of 9.4 percent, similar to the 9.3 percent found for the time-series strategy. Therefore, the claim in the literature that time-series return predictability methods dominate cross-sectional ones is erroneous.
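The different net exposures of the two momentum constructions can be sketched for a single formation period (simulated past returns; the $1-per-side scaling mirrors the discussion above):

```python
import numpy as np

rng = np.random.default_rng(5)
n_stocks = 100
past_ret = rng.normal(0.10, 0.15, size=n_stocks)  # past returns, bullish market

# Cross-sectional momentum: long past winners, short past losers,
# scaled to $1 long and $1 short, so the net investment is zero.
w_cs = np.where(past_ret > np.median(past_ret), 1.0, -1.0)
w_cs = w_cs / np.abs(w_cs).sum() * 2.0

# Time-series momentum: long if the stock's own past return is positive,
# short otherwise; the net exposure does not balance to zero.
w_ts = np.sign(past_ret) / n_stocks

print(round(w_cs.sum(), 4), round(w_ts.sum(), 4))  # CS net ≈ 0, TS net > 0
```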

2.4.4 EMH Extensions and Critique

The debate about the EMH raises questions which are not about how to proceed with the joint hypothesis testing problem but about whether the EMH per se is a meaningful concept. Considering the definition of the EMH in its operationalized form, three objects matter in the EMH [Lo (2004)]: prices, probabilities and preferences. By the martingale property, see equation (2.49), prices, probabilities and preferences enter the decision making of investors in such a form that no profits are possible by trading on the available information, since any profit is already captured in present prices.

Given the three parts prices, probabilities and preferences of the EMH and the stringent martingale property which combines them, two critical points are immediate: behaviour and technology. From a behavioural perspective it is not convincing that these highly behaviour-sensitive parts of the EMH are tied together by a single mathematical property in which behavioural facets do not matter. Whether investors are greedy or whether they fear market crashes affects neither how they perceive the odds in price formation (probability) nor how they value the outcome of their decisions. The adaptive EMH of Lo adapts the original EMH and makes it context-dependent and dynamic. The adaptive EMH becomes a statement dependent on the environment of the economy and the markets and on the behavior of market participants.

Definition 23 (Lo (2004)). Prices reflect as much information as dictated by the combination of environmental conditions and the number and nature of species [types of agents] in the economy.

The behavior of the agents is considered to follow evolutionary principles. The dynamics of how market participants interact, and therefore how the price dynamics of the assets evolve, is driven by evolutionary principles, which are better suited to describing the market dynamics than the equilibrium concept in the EMH.

Lo (2004) states the following implications of the adaptive EMH. The risk and reward relation is not stable over time, since the population of agents and how they interact are time varying. Similarly, the institutional and regulatory set-ups are not constant over time either. A second implication is that temporary arbitrage opportunities are possible, and therefore the EMH critique of Grossman and Stiglitz (1980) does not apply to the adaptive EMH. The possibility of temporary arbitrage also shapes the performance of active investment strategies, which under the EMH are useless. As an example, consider the rolling monthly first-order autocorrelation coefficient of the S&P Composite Index returns from January 1871 to 2003. By the EMH, the coefficient should be zero. The empirical plot shows that the coefficient is typically positive, with periods of clustering values. Finally, innovation is the key to survival. The EMH states that certain levels of expected returns can be achieved simply by bearing a sufficient degree of risk. Since in the adaptive form the risk/reward relation is not constant, adaptation to the changing market conditions is the main source of a stable risk/return reward.

Technology is one environmental factor which is under permanent change. Consider the information set Ft which is used in decision-making by the agents or investors. The amount of data which is processed today, and which will be made available in the future due to big data analytics, is not comparable with the information available in the past. The Medallion Fund, see below, is an example where technology leads to a comparative advantage in specifying the information set. More precisely, the fund is able to separate valuable information from noise by permanently processing the extremely large number of signals from the markets, the news and other communication channels.

The discussion so far considered non-specific individuals. The examples of Warren Buffett and the Renaissance Medallion Fund show that particular skills and expertise allow individuals to generate excess returns as if the markets were inefficient, while for many other investors the same markets are efficient. Buffett and the Renaissance Medallion Fund use their skills to predict future returns in very different forms. Buffett's goal is to understand a specific firm in detail on an idiosyncratic level and then to embed this firm-specific investment view in a sector and macro context. The Medallion Fund is a quant fund founded by the mathematician Jim Simons. The fund, which was set up in 1998 for the employees of Renaissance, generated in the period 1998 to 2016 an annual return of 80%, with 1999 the only year with a loss (of around 4 percent). The fund generated in this period more than USD 55 billion in profit, several billions of USD more than the next best funds. Even more notably, the invested AuM have been smaller than those of their competitors.

The fund has always been very secretive about the methodology used. One knows that Simons hired top scientists from the computer industry, notably from IBM, and Ph.D.s in mathematics or physics from the top universities. The model which they constructed is based on signal detection. This means that their powerful IT system processes all types of signals which are generated in the world. Signals are not only simple ones such as realized price changes; signals from speeches and from documents are also detected. The success of their model lies in their power to separate noise from valuable information and then to translate the result into trades. The model itself is not a single strategy encoded by a quantitative model; many different strategies are integrated into one system.

2.5 Who Decides?


Investors can decide for themselves or delegate the decision to third parties. Each type of decision is subject to a comprehensive regulatory framework. After the introduction, we focus on the MiFID II rules.

An investment decision today has to meet many more regulatory standards than in the past. Regulation defines restrictions and rules for decision-making, but it never sets an AM firm's goals. Figure 2.19 illustrates the avalanche of regulations following the GFC and their implementation time line.
12 PRIIPs are the Packaged Retail Investment and Insurance-based investment Products documents, and UCITS is the Undertakings for Collective Investment in Transferable Securities Directive for collective investments by the European Union. Obligations for central clearing and reporting (EMIR, Dodd-Frank) and higher capital requirements for non-centrally cleared contracts (CRR) apply; the obligation to trade on exchanges or electronic trading platforms is addressed by revising MiFID, the so-called Markets in Financial Instruments Regulation (MiFIR). US T+2 means the realization of a T+2 settlement cycle in the US financial markets for trades in cash products and unit investment trusts (UITs). FIDLEG is part of the new Swiss financial architecture, which should be equivalent to MiFID II of the euro zone. In 2013, following the LIBOR and EURIBOR market-rigging scandals, the EU Commission published a legislative proposal for a new regulation on benchmarks (Benchmark Regulation). The Asia Derivative Reform mainly focuses on the regulation of OTC derivatives and should therefore be compared with EMIR and the Dodd-Frank Act. The Market Abuse Directive (MAD) in 2005 and its update MAD II resulted in an EU-wide market abuse regime and a framework for establishing a proper flow of information to the market. BCBS refers to the principles of risk data aggregation and reporting of the Basel Committee on Banking Supervision. The Comprehensive Capital Analysis and Review (CCAR) is a regulatory framework introduced by the Federal Reserve in order to assess, regulate, and supervise large banks and financial institutions. EU FTT means the EU Financial Transaction Tax. IRS 871(m) are regulations of the IRS about dividend equivalent payment withholding rules for equity derivatives. CRS are the Common Reporting Standards of the OECD for the automatic exchange of bank account information.

Figure 2.19: Regulatory initiatives and their implementation time line. See the footnote for the description (UBS [2015]).

Individual regulations can have strategic or operational implications for AM. UCITS, PRIIPs, EMIR, and MiFID II have a high operational impact. PRIIPs and MAD II have a low strategic impact, while MiFID II, the Volcker Rule or Dodd-Frank Act, and UCITS have a high strategic importance. The ability of international banks and large AMs after the GFC to comply quickly and integrate the regulatory program into their strategic planning resulted in a competitive advantage over smaller institutions. The know-how of the international institutions enables them to participate actively and efficiently in the technological change. They are almost invulnerable, despite the many heavy fines imposed on them following the many scandals in recent years.

Example Impact of Regulation on the Swiss banking sector and AM sector

The absence of the above mentioned advantages for smaller intermediaries impacts the structure of regional banking sectors. It is estimated that of the approximately 300 Swiss banks in 2014, about one-third will stop operating as an independent brand. A KPMG study from 2013 (KPMG [2013]) summarizes:

• A total of 23 percent of Swiss banks faced losses in 2012, all of them with AuM of less than CHF 25 billion.

• Non-profitable banks in 2012 were mostly not profitable in previous years either.

• The dispersion between successful banks (large and small ones) and non-performing banks (small ones) is increasing.

• The performance of small banks is much more volatile than that of larger ones.

• Changes of business model in large banks seem to be successful.

• A total of 53 percent of the banks reported negative net new money (NNM).

Small asset managers, many of them firms with fewer than five employees, faced a cost problem and a lack-of-knowledge problem after the GFC's regulatory requirements. They lacked legal and compliance know-how, and it was also not profitable to hire specialists in these fields. Similarly, they could not invest in new, scalable technologies for accounting, strategy construction, performance calculation and attribution, etc. Both factors led to platform-as-a-service (PaaS) innovations where the different services are outsourced and are bought by connecting via API technology.

Many of the regulatory initiatives launched in recent years are related to asset management and trading. We consider the eurozone. The Alternative Investment Fund Managers Directive (AIFMD) mainly acts in the hedge fund sector, whereas UCITS is key for the fund industry. EMIR regulates the OTC derivative markets, and the PRIIPs initiative is responsible for the key information for retail investors in the eurozone. MiFID II provides harmonized regulation for investment services across the member states of the EU, with one of the main objectives being to increase competition and consumer protection in investment services. In the US, the Dodd-Frank Act is the counterpart of many European initiatives.

Regulatory initiatives place greater demands on asset managers and their service
providers. They force changes in the areas of customer protection, agreements with
service providers, disclosure of regulatory and investor information, distribution channels,
trade transparency, and compliance and risk management functions (PwC [2015]).

2.5.1 MiFID II
The MiFID II Directive implements the G20 Pittsburgh Summit Agreement of 2009 in the euro area and for all non-EU financial intermediaries offering investment products in the eurozone. It requires the adoption of 32 legal acts by the European Commission, 47 regulatory standards, 14 performance standards, and 10 packages of measures.13 MiFID II has the following goals:

• The creation of a robust framework for all financial market players and financial instruments.

• Improving the supervision of the various market segments and market practices, in particular OTC financial instruments.

• Strengthening market integrity and competition through greater market transparency.

• Harmonization and strengthening of regulation.

• Improving investor protection.

• Limiting the risks of market abuse in relation to derivatives on commodities, in particular for futures on essential goods.

Investor protection is based on four topics. First, inducements, i.e. the need to disclose the independent versus non-independent status of advice and the prohibition for discretionary managers and independent advisers to be involved in inducements. Product governance means that the manufacturers' product approval process has to include the target market definition, which has to be taken into account by the distributors and which has to be tracked by the asset managers. Suitability and appropriateness requires all investment firms operating in EU countries to provide clients with adequate information for assessing the suitability and appropriateness of their products and services, and to comply with best execution obligations. Finally, client information requires that enhanced information is shared with clients, regarding both content and method, in particular costs and charges for services or advice.

In the eurozone, suitability and appropriateness have to follow client segmentation and intermediation segmentation (see Figure 2.20). This segmentation applies to all EU and all non-EU banks offering investment products in the zone.

Intermediation Channel Segmentation

• Execution only: Investors decide themselves and investment firms only execute orders.

• Advisory: Investors and investment firm staff interact. While relationship managers or specialists advise the investor, the investment decision is finally made or approved by the investors themselves. Advisory was the traditional intermediation channel before the financial crisis of 2007.

13 Similar remarks apply also to other regulatory initiatives, such as the Dodd-Frank Act in the US. Its implementation requires the creation of 398 new rules governing financial activities, disclosures, and processes, the conduct of 67 studies, and the issuance of 22 periodic reports. The law itself consists of 2'300 pages, without counting the final implementation documents.

Figure 2.20: Client segmentation and intermediation segmentation as per MiFID II.

• Mandate: The investor delegates the investment decision in a mandate. The mandate contract reflects the investor's preferences. The portfolio manager chooses investments within the contracted limits. Many banks and asset managers motivated their clients to switch from the advisory to the mandate channel after the GFC. The main reasons for this are lower business conduct risk and better opportunities for automation, which reduce production costs and enhance economies of scale. Since the active portfolio managers are benchmarked against the CIO's TAA mandates, they face the same problems as actively managed funds - most of them will turn out to be zero-alpha funds, see Section 3.6.6.3. This will motivate many customers to move back to the advisory or execution-only channel.

Client Segmentation. Investment firms must define written policies and procedures according to the following categorization:

• Eligible counterparties such as banks, large corporates, and governments.

• Professional clients. A professional client possesses the experience, knowledge, and expertise with which to make his or her own investment decisions and properly assess the risks thus incurred.

• Retail clients (all other clients).

Wealth as the sole variable for the classification of customers is no longer applicable. Customers can opt up or opt down, i.e., choose a more or a less stringent protection category. Suitability and appropriateness requirements are defined in each cell of the 3×3 segmentation matrix (Figure 2.20). Client suitability addresses the following six points:

1. Information on clients

2. Information provided to clients

3. Client knowledge and experience

4. Financial circumstances of the client

5. Investment objective

6. Risk awareness and risk appetite

These six points reflect the parameters that define the optimization problem of a rational economic investor. To determine the preferences of an investor, one needs general information about the investor (1) and specific risk attitudes (6), which both enter into the objective function (5). The optimization of the objective function leading to the optimal investment rule is carried out under various restrictions: the budget restriction (4) and restrictions on admissible securities due to their complexity or the experience of the investor (3). Tax issues, legal constraints, and compliance issues also enter into the restriction set and require information to be provided to the client (2). These six points are therefore sufficient for the investor to determine his or her optimal investment strategy.
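A minimal numerical sketch of this optimization problem follows. All figures - the expected returns, the risk-aversion parameter lam (points 1 and 6), and the admissibility cap phi_max (point 3) - are invented for illustration.

```python
# Mean-variance toy problem: maximize E[r_p] - (lam/2) * Var[r_p] over the
# fraction phi invested in one risky asset, subject to 0 <= phi <= phi_max.
mu_risky, sigma_risky, r_safe = 0.07, 0.20, 0.01   # illustrative figures
lam = 4.0          # risk aversion elicited from the client profile
phi_max = 0.6      # complexity/experience restriction on admissible exposure

def utility(phi):
    """Objective function (point 5) of the rational investor."""
    expected = phi * mu_risky + (1 - phi) * r_safe
    variance = (phi * sigma_risky) ** 2
    return expected - 0.5 * lam * variance

# Grid search over admissible weights; the budget restriction (point 4) is
# already built in, since the two weights sum to one.
grid = [i / 1000 for i in range(int(phi_max * 1000) + 1)]
phi_star = max(grid, key=utility)
# The unconstrained optimum (mu - r) / (lam * sigma^2) = 0.375 lies inside
# the admissible set here, so the restriction does not bind.
print(round(phi_star, 3))
```

With a binding restriction (e.g. phi_max = 0.3), the optimal weight would sit at the boundary instead, which is exactly how the admissibility restrictions of points (2) and (3) shape the final investment rule.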
Client product suitability consists of requirements that ensure that the product
is suitable:

1. Specific service-/product-related restrictions

2. Adverse tax impact

3. Requirements for prospectuses

4. Disclaimer

These requirements become less demanding the more experienced the client is. Suitability in advisory services requires qualified staff and an appropriate incentive structure in the asset management firm.

2.5.2 Investment Process for Retail Clients


How are the investor's preferences elicited, transformed into investment guidelines, and managed over time for retail clients? Figure 2.21 illustrates an investment process. Given the client's need, his or her preferences are compared with the CIO view and its transformation into CIO portfolios. This comparison defines the theoretical client portfolio. Using the securities from the producers, the theoretical portfolio is transformed into the (real) client portfolio. Life-cycle management controls the evolution of the client portfolio over its life cycle and compares the risk and return properties with the initially defined client profile. If necessary, this process sends warning or required-activity messages to the client and/or advisor. A CIO view typically consists of several inputs, such as a quantitative model, a research macro view, and a market view. Smaller institutions do not have the resources to provide all these inputs. They then buy the CIO view from another bank.

Figure 2.21: An investment process. The three channels from left to right are the client-advisor channel, the investment office, and the producers of the assets or portfolios (trading and asset management).

Traditionally, intermediaries use questionnaires to reveal investors' preferences. This approach has several drawbacks.

• Reliability. It is difficult to test to what extent the investor understands the questions.

• Zero emotion. Questions are by definition drawn up in a laboratory environment.

• Restricted offering. Due to the low degree of automation, the solutions offered cannot consider individual preferences on a fine level.

• Missing scenarios and consequences.

• Life-cycle management, when investment circumstances are changing, is difficult to handle.

• Time and place dependent.



• Missing economies of scale for intermediaries; lack of control standards.

Current technologies make it possible to use scenario engines to obtain a more reliable client profile, to generate a more refined portfolio offering, to set up more comprehensive and real-time life-cycle management (portfolio surveillance; reporting), and to make some steps in the whole investment process scalable for the intermediary.

New trends in technology allow the process outlined in Figure 2.21 to be reshaped. In extremis, there will be no need for an investor to disclose his or her investment preferences, since the data already exist in the virtual world. If, furthermore, the investment views are formed in a fully automated manner using publicly available data, then the functions of both the advisor and the CIO will become superfluous.

These approaches fall under the label 'big data'; see the Section Big Data.

2.5.3 Mandate Solutions for Pension Funds


This section follows Lanter (2015). Figure 2.22 illustrates the investment decision process
for a pension fund.

Figure 2.22: Process for a mandate in a pension fund (Lanter [2015]).

Asset Liability Management (ALM) is the first step, often involving external advice. This analysis provides a transparent picture of current assets and liabilities and how they may change in the future due to the various risk factors. Fulfillment of the pension fund's long-term objectives based on the analysis defines the strategic asset allocation, i.e., the allocation that should be stable through the possible future economic and financial market cycles. The tactical asset allocation is the next step. The pension fund must decide whether to delegate the TAA to external portfolio managers in the form of a mandate or whether to manage the assets within the fund. Furthermore, the benchmark and the definition of risk-based bands for the tactical asset allocation have to be determined. It must also be decided whether the reporting, administration, and risk controlling functions of the investment portfolios should be outsourced as well. In the case of outsourcing, a request for proposal is used. The entire investment decision outsourcing process is conducted with the involvement of external consultants. Goyal and Wahal (2008) estimate that 82 percent of US public pension funds use pension consultants.

We discuss in Section 6.5.4 that the extensive use of investment consultants is by no means free of conflicts, both for the performance of the delegated investments and for the selected asset managers. Critics, for example, often accuse them of being drivers of new investment strategies which turn out to be more complex (hence more difficult to handle and understand, and also more expensive) than the ones actually in use, while it is not clear whether they lead to a larger performance.

The other steps in the process, as illustrated and described in Figure 2.22, are evident.

2.5.4 Conduct Risk


The largest risk for investment firms is conduct risk in the investment process. Conduct risk comprises a wide variety of activities and types of behavior that fall outside the other main risk categories. It refers to risks attached to the way in which all employees conduct themselves. A key source of this risk is the difficulty of managing information flows, their impact, their perception, and responsibilities in an unambiguous way. Consider an execution-only investor who does not understand a particular statement in a given research report. Can the relationship manager help the execution-only investor without entering into conflict with his or her 'execution-only' status - that is, help without advising? To hedge their conduct risk sources, investment firms are forced to work out detailed and well-documented processes concerning the information flow between themselves and the customer. While this paperwork may be effective as a hedge against conduct risk, its efficiency is questionable.

Example

The Financial Stability Board (FSB) stated in 2013: One of the key lessons from the crisis was that reputational risk was severely underestimated; hence, there is more focus on business conduct and the suitability of products, e.g., the type of products sold and to whom they are sold. As the crisis showed, consumer products such as residential mortgage loans could become a source of financial instability. The FSB considers the following issues key for a strong risk culture:

• Tone from the top: The board of directors and senior managers set the institution's core values and risk culture, and their behaviour must reflect the values being espoused.

• Accountability: Successful risk management requires employees at all levels to understand the core values of the institution's risk culture. They are held accountable for their actions in relation to the institution's risk-taking behaviour.

• Effective challenge: A sound risk culture promotes an environment of effective challenge in which decision-making processes promote a range of views, allow for testing of current practices, and stimulate a positive, critical attitude among employees and an environment of open and constructive engagement.

• Incentives: Financial and non-financial incentives should support the core values and risk culture at all levels of the financial institution.

Conduct risk is a real source of risk for investment firms: fines worldwide amounted to more than USD 100 billion for the period 2009-2014. These fines and the new regulatory requirements raise serious profitability concerns for investment firms and banks (see Figure 8). But there is more than just financial cost at play for the intermediaries. A loss of trust in large asset managers and banks can prove disastrous, in particular if new entrants without any reputational damage can offer better services thanks to FinTech. Figure 2.23 shows the evolution of the fines imposed by the British regulatory authorities (left panel) and the global value of fines (right panel). One sees that it took about three years after the GFC for the fines to be charged to the banks, insurance companies, and asset managers. The global figures now exceed USD 230 bn since the start of the GFC. The horizontal lines in the histogram show how large the individual fines were. It follows, for example, that in 2014 a fine of more than USD 15 bn was imposed on a single institution. In the US, enforcement statistics from the Securities and Exchange Commission (SEC) show an increase in enforcement actions in the categories investment advisor and investment company of roughly 50% following the GFC. Compared to the pre-crisis figures of 76 and 97 cases per year, respectively, 2011-2014 returned respective figures of 130 and 147 cases.

Anti-tax-evasion and anti-money-laundering measures are driven by the OECD. After the Base Erosion and Profit Shifting (BEPS) report of 2013, asset managers operate in a world with country-specific reporting of profits and tax paid. Therefore, offshore financial centers try to gain access to double tax treaties (DTTs), which motivates asset managers to use cross-border passports and reciprocities. But it also forces asset managers to decide in which locations they want to be active and from which they want to step back.

Figure 2.23: Left Panel: Table of fines imposed in the UK (FSA and FCA web pages). Right Panel: Global value of fines (FT research, June 2015).

Example - Hedge fund disclosure

Patton et al. (2013) show that disclosure requirements for hedge funds are not sufficient to protect investors. The SEC, for example, requires US-based hedge funds managing over USD 1.5 billion to provide quarterly reports on their performance, trading positions, and counterparties. The rules for smaller hedge funds are less detailed. Beyond such rules, the quality of the information disclosed has to be taken seriously.

We consider monthly self-reporting of investment performance, where thousands of individual hedge funds provide data to one or more publicly available databases, which are then widely used by researchers, investors, and the media.

Are these voluntary disclosures by hedge funds reliable guides to their past performance? The authors state:

... track changes to statements of performance in 'vintages' of these databases recorded at different points in time between 2007 and 2011. In each such 'vintage', hedge funds provide information on their performance from the time they began reporting to the database until the most recent period.

Vintage analysis refers to the process of monitoring groups and comparing performance across past groups. These comparisons allow deviations from past performance to be detected. The authors find

that in successive vintages of these databases, older performance records (as far back as 15 years) of hedge funds are routinely revised: nearly 40 percent of the 18,382 hedge funds in the sample have revised their previous returns by at least 0.01 percent at least once, and over 15 percent of funds have revised a previous monthly return by at least 1 percent. These are very substantial changes, given the average monthly return in the sample period is 0.64 percent.

Less than 8 percent of the revisions are attributable to data entry errors. About 25 percent of the changes were based on differences between estimated values at the reporting dates for illiquid investments and true prices at later dates. Such revisions can be reasonably expected. In total, 25 percent (50%) of the revisions relate to returns that are less than three months old (more than 12 months old). The authors find that negative revisions are more common, and larger when they do occur, than positive ones. They conclude that, on average, initially provided returns signal a better performance compared to the final, revised performance. These signals can therefore mislead potential investors. Moreover, the dangerous revision patterns are significantly more likely for funds-of-funds and hedge funds in the emerging-markets style than for other hedge funds.

Can any predictive content be gained from knowing that a fund has revised its history of returns? Comparing the out-of-sample performance of revising and non-revising funds, Patton et al. (2013) find that non-revising funds significantly outperform revising funds by around 25 basis points a month.
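The vintage comparison described above can be sketched as follows. The fund names, dates, and figures are invented for illustration; a real vintage analysis would compare full database snapshots.

```python
# Two hypothetical vintages of a hedge fund database: fund -> month -> return.
vintage_2007 = {"Fund A": {"2005-01": 0.012, "2005-02": -0.004},
                "Fund B": {"2005-01": 0.020, "2005-02": 0.007}}
vintage_2011 = {"Fund A": {"2005-01": 0.012, "2005-02": -0.004},
                "Fund B": {"2005-01": 0.009, "2005-02": 0.007}}

def revisions(old, new, threshold=0.0001):
    """Funds whose previously reported returns changed by at least `threshold`
    (0.01 percentage points, the smallest revision counted in the study)."""
    flagged = {}
    for fund, history in old.items():
        changes = {m: new[fund][m] - r for m, r in history.items()
                   if fund in new and m in new[fund]
                   and abs(new[fund][m] - r) >= threshold}
        if changes:
            flagged[fund] = changes
    return flagged

print(revisions(vintage_2007, vintage_2011))
# Fund B's January 2005 return was revised downward by 1.1 percentage points,
# the kind of change the study finds predicts weaker future performance.
```

Run across successive real vintages, such a scan yields exactly the revision statistics quoted from Patton et al. (2013).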

2.6 Risk, Return, Diversification and Reward-Risk Ratios


The first step toward investment theory is to gain insights into the interplay between risk, return, and diversification without relying on a particular investment model. We:

• show on an ad hoc basis when a portfolio is more than the sum of its parts - that is, more return and less risk;

• analyze the long-term performance of investments before and after costs;

• consider risk scaling;

• discuss two propositions from statistics concerning diversification;

• introduce diversity and concentration risk;

• show how fees impact long-term returns;

• introduce the debate between active and passive management.

2.6.1 Long-term Risk and Return Distribution


Table 2.17 shows the risk and return distribution and the wealth growth for the period 1925-2013 for different asset classes (Kunz [2014]).

Investment of CHF 100          Return                  Risk
after 88 years gives           Average annual return   Standard deviation
Stocks USA       71,239        7.75%                   23.50%
Stocks CHF       70,085        7.73%                   19.30%
Stocks DEU       44,669        7.18%                   41.30%
Stocks GBR       34,619        6.87%                   25.30%
Stocks FRA       18,939        6.14%                   29.20%
Stocks JPN        5,367        4.63%                   29.80%
Stocks ITA        2,552        3.75%                   28.30%
Bonds CHF         3,611        4.16%                    3.70%
Bonds GBR         1,880        3.39%                   12.70%
Bonds USA         1,196        2.86%                   12.50%
Bonds FRA           212        0.86%                   15.00%
Bonds ITA           195        0.76%                   20.40%
Bonds JPN            57       -0.64%                   21.20%
Deposit CHF       1,070        2.73%                    1.20%
Gold              1,052        2.71%                   15.80%

Table 2.17: Average annual returns and standard deviations of the asset classes and growth of capital after 88 years. The calculation logic: 71,239 ≈ 100 (1 + 0.0775)^88.
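The compounding logic of the table can be checked directly; a minimal sketch, using the 7.75% (US stocks) and 4.16% (CHF bonds) figures from the table:

```python
def terminal_wealth(initial, annual_return, years):
    """Wealth after compounding a constant annual return over `years` years."""
    return initial * (1 + annual_return) ** years

# CHF 100 at 7.75% p.a. over 88 years grows to roughly CHF 71,000, matching
# the US-stocks row; at 4.16% p.a. (CHF bonds) only to about CHF 3,600.
print(round(terminal_wealth(100, 0.0775, 88)))
print(round(terminal_wealth(100, 0.0416, 88)))
```

The gap between the two results illustrates the compounding effect discussed below: a return difference of 3.6 percentage points per year multiplies terminal wealth by a factor of about twenty over 88 years.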

Figure 2.24 shows the distribution of return and risk, measured by the standard deviation, over 88 years of investments.

In the long run, equity had in most economies higher returns and risks than its bond counterpart. We discuss in Chapter 4 why, nevertheless, the advice to invest in stocks only if the investor has a long-term horizon is not an optimal strategy. Furthermore, a small difference in the average return creates a large difference in wealth accumulation: the compounding effect. Finally, gold has over this long period a large risk component but only a small average return. This first analysis allows us to consider diversification next.

Figure 2.24: The distribution of return and risk, measured by the standard deviation, over 88 years of investments. The square marks represent equity, the diamonds bonds, the triangle is cash, and the circle is gold (data from Kunz [2014]).

2.6.2 Diversification of Assets - Portfolios

Can we combine different investment classes to form a portfolio with higher return and lower risk than the individual asset classes above? This is the diversification question. If there is a positive answer, is there an optimal way of diversifying the investment? We apply diversification to the data in Table 2.17 using an ad hoc portfolio construction approach: the weights are not optimally chosen using a statistical model but are fixed based on heuristics (experience). We form four portfolio strategies - called conservative, balanced, dynamic, and growth - see Table 2.18.
Strategy                                 Conservative   Balanced   Dynamic   Growth
Equity                                       25%           50%        75%     100%
  CH                                         10%           20%        30%      40%
  Rest of world total (six countries)*       15%           30%        45%      60%
  Rest of world per country                   2.5%          5%         7.50%   10%
Bonds                                        75%           50%        25%       0%
  CH                                         66%           44%        22%       0%
  Rest of world total (six countries)*        9%            6%         3%       0%
  Rest of world per country                   1.50%         1%         0.50%    0%

Table 2.18: Investment weights in four investment strategies (data from Kunz [2014]). *Investment in G, F, I, J, USA, UK.

Using the data from Table 2.17 for the different asset classes, we get the returns in Table 2.19.
Investment of CHF 100          Return                  Risk
after 88 years gives           Average annual return   Standard deviation
Growth           143,131       8.61%                   19.80%
Dynamic           76,949       7.84%                   15.00%
Balanced          33,318       6.82%                   10.40%
Conservative      11,702       5.56%                    6.30%

Table 2.19: Average annual return, risk, and wealth growth for the four investment strategies.

Figure 2.25 shows that a combination of risk and return figures of basic asset classes can lead to a portfolio from which more return can be expected for the same risk, or less risk for the same return. The green marks for the investment strategies form a virtual boundary line. In fact, the Markowitz model is an example in which there is an efficient frontier such that, within this model approach, no portfolio construction can have more return and lower risk than a portfolio on the efficient frontier.

Figure 2.25: Distribution of return and risk, measured by the standard deviation, over
88 years of investments. The square marks represent equity, the diamonds bonds, the
triangle is cash, and the circle is gold. The dots represent the four investment strategies
- conservative, balanced, dynamic, and growth (data from Kunz [2014]).
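The diversification mechanics behind Figure 2.25 can be sketched with two assets. The return, risk, and correlation figures below are illustrative choices, not the Kunz data.

```python
# Two-asset mix: portfolio risk lies below the weighted average of the
# individual risks whenever the correlation rho is below one.
mu = (0.077, 0.042)     # illustrative stock and bond returns
sigma = (0.235, 0.037)  # illustrative standard deviations
rho = 0.2               # illustrative correlation

def portfolio(phi):
    """Return and standard deviation of a mix with weight phi in asset 1."""
    r = phi * mu[0] + (1 - phi) * mu[1]
    var = (phi * sigma[0]) ** 2 + ((1 - phi) * sigma[1]) ** 2 \
          + 2 * rho * phi * (1 - phi) * sigma[0] * sigma[1]
    return r, var ** 0.5

for phi in (0.25, 0.5, 0.75):
    r, s = portfolio(phi)
    naive = phi * sigma[0] + (1 - phi) * sigma[1]
    print(f"phi={phi}: return {r:.2%}, risk {s:.2%} (naive risk {naive:.2%})")
```

Sweeping phi from 0 to 1 traces out exactly the kind of curved boundary line that the four strategies approximate in Figure 2.25.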

Two questions regarding diversification arise:

• What are the risks of not diversifying? Concentration risk.

• When does diversification make little sense?



Consider the first question. Employees often own many stocks of their employer, directly
or indirectly through their pension scheme. Such stock concentration can be disastrous. Enron
employees, for example, had over 60% of their retirement assets in company stock. They
faced heavy losses when Enron went bankrupt. Diversification reduces such idiosyncratic
risk.

Institutional investors also fail to diversify sufficiently. The University of Rochester's
endowment in 1971 was USD 580 million, placing it fourth in the respective ranking of
private universities. In 1992, it ranked twentieth, and by 2011 it had dropped to thirtieth
place. A main reason was its concentrated holding in Eastman Kodak, which filed for
bankruptcy in February 2012. Boston University invested USD 107 million in a privately
held local biotech company in the 1980s. The firm went public and suffered a setback;
in 1997, the university's stake was worth only USD 4 million. The Norwegian sovereign
wealth fund, in contrast, was created precisely to reap the gains from diversification: the
fund swapped the highly concentrated oil revenues into a diversified financial portfolio.
While anticipated events are incorporated into market prices, most of the return ulti-
mately realized will be the result of unanticipated events. Investors do not know their
timing, direction, or magnitude. Investment diversification is a means to reduce these
risks.

Considering the second question, Warren Buffett states: 'Diversification is protection
against ignorance. It makes little sense if you know what you are doing.'

Diversification also reduces the complexity of portfolio risk management. If a portfolio
is well diversified, the hope is that idiosyncratic risks compensate for each other, leaving
only market risk to manage instead of many idiosyncratic risk factors. Figure
2.26 shows that the dependence between asset classes varies strongly and can even change
sign. Diversification benefits vary over time and are difficult to predict.

To highlight these statements, consider the fraction of wealth φ invested in asset 1
and the remainder in asset 2. The portfolio variance reads

σ_p² = σ_1² φ² + σ_2² (1 − φ)² + 2ρ σ_1 σ_2 φ(1 − φ)    (2.69)

with ρ the correlation between the two assets. Portfolio risk becomes additive only if
the assets are uncorrelated. A negative correlation value reduces portfolio risk, which
motivates the search for negatively correlated risks. If the correlation is −1, the portfolio
variance becomes a complete square and risk can be eliminated completely in the
two-risky-asset case by solving σ_p² = 0 w.r.t. the strategy. If the correlation is +1, which
is typical for many asset classes when markets are under stress, portfolio risk is maximal.
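The role of the correlation in (2.69) can be checked numerically. The following sketch (with illustrative volatilities of 20% and 10%, not taken from the text) evaluates the two-asset portfolio volatility for ρ ∈ {+1, 0, −1} and verifies that for ρ = −1 the risk vanishes at φ* = σ_2/(σ_1 + σ_2):

```python
import math

def port_vol(phi, s1, s2, rho):
    """Two-asset portfolio volatility from eq. (2.69)."""
    var = (s1 * phi) ** 2 + (s2 * (1 - phi)) ** 2 + 2 * rho * s1 * s2 * phi * (1 - phi)
    return math.sqrt(max(var, 0.0))  # guard against tiny negative rounding errors

s1, s2 = 0.20, 0.10
for rho in (1.0, 0.0, -1.0):
    print(rho, round(port_vol(0.5, s1, s2, rho), 4))  # risk falls as rho falls

# For rho = -1 the variance is the complete square (s1*phi - s2*(1 - phi))^2,
# so it vanishes at phi* = s2 / (s1 + s2):
phi_star = s2 / (s1 + s2)
print(port_vol(phi_star, s1, s2, -1.0))  # numerically zero
```

The last line illustrates the statement in the text: with perfectly negative correlation the two risky assets can be combined into a riskless position.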

Example: Needed Investment Amount for Diversification



Figure 2.26: Pair-wise correlations over time for different asset classes (Goldman Sachs
[2011]).

Elton and Gruber (1977) show that the individual risk of stocks could be reduced
from 49 percent to 20 percent by holding 20 stocks per market. Adding another
980 stocks only reduces risk further to 19.2 percent. Adding more and more
assets has a diminishing impact on risk diversification.

How much wealth is needed to achieve diversification across 20 securities? Given the
average price of stocks and bonds in Swiss francs, the amount invested in one security
should be around CHF 10,000; lower investments are not efficient. Therefore, one needs
CHF 200,000 for a pure equity portfolio of Swiss stocks. Diversifying this portfolio to
US, European, and Asia-Pacific stocks requires an investment of CHF 0.8 million. If the
portfolio should be an equal mixture of bonds and equities, the amount needed for
diversified single-security investments is CHF 1.6 million. Therefore, only wealthy indi-
viduals can invest directly in cash products to generate a sufficiently diversified portfolio.
This is a rationale for the existence of ETFs, mutual funds, and certificates, which offer a
similar diversification level to less wealthy clients as well.

2.6.3 Two Mathematical Facts About Diversification

The following propositions make precise how idiosyncratic and market risk behave as one
increases the number of assets; see Section 7 for the proofs.

Proposition 24. Assume N uncorrelated asset returns and an equally weighted (EW) in-
vestment, that is, φ_k = 1/N for all assets. Increasing the number of assets N reduces
the portfolio risk σ_p² arbitrarily and monotonically.

The EW assumption is not necessary but facilitates the proof. To eliminate portfolio
risk completely in a portfolio with uncorrelated returns, one only has to increase the
number of assets in the portfolio. The proof reads:

σ_p² = var( (1/N) Σ_{j=1}^N R_j ) = (1/N²) var( Σ_{j=1}^N R_j ) ≤ Nc/N² = c/N

with c the largest variance. If assets are correlated to each other, which removes an
unrealistic assumption in the last proposition, then:

Proposition 25. Consider an EW portfolio strategy 1/N. The portfolio variance is equal
to the sum of market risk and idiosyncratic risk. Increasing N, the latter vanishes
while market risk can only be reduced to the level of the average portfolio covariance cov.

The proof is only slightly more complicated than the former one and leads to:

σ_p² = var/N + (1 − 1/N) cov ,

where var is the average variance and cov the average covariance of the assets.
Hence, covariances prove more important than single-asset variances in determining the
portfolio variance. Taking the derivative of the portfolio variance w.r.t. the number of
assets N, the sensitivity becomes proportional to −1/N². Adding a further asset to N = 4
assets reduces portfolio risk by 1/25; adding another asset to 9 assets, the reduction is only 1/100.
Therefore, reducing portfolio risk by adding new assets becomes less and less effective
the larger the portfolio is.
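The 1/N behaviour in the two propositions can be illustrated numerically. The sketch below uses a hypothetical covariance matrix with common variance 0.04 (a 20% volatility) and common covariance 0.01; the numbers are purely illustrative:

```python
import numpy as np

def ew_portfolio_variance(C):
    """Variance of the equally weighted portfolio for covariance matrix C."""
    N = len(C)
    phi = np.full(N, 1.0 / N)
    return float(phi @ C @ phi)

N, var, cov = 50, 0.04, 0.01           # all variances 0.04, all covariances 0.01
C = np.full((N, N), cov)
np.fill_diagonal(C, var)

direct = ew_portfolio_variance(C)
formula = var / N + (1 - 1 / N) * cov  # Proposition 25
print(direct, formula)                 # both are approximately 0.0106

# The idiosyncratic part var/N vanishes as N grows; the floor is cov = 0.01:
for n in (2, 10, 100, 1000):
    print(n, var / n + (1 - 1 / n) * cov)
```

The loop shows the diminishing effect described above: the portfolio variance converges to the average covariance and cannot be pushed below it.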

We now show that the two-asset intuition does not carry over to three or more as-
sets. Consider an investor who wants to increase the return on investment by selling
volatility and correlation of two stocks S1 and S2. He sells the risk that any of the two
stocks breaches a barrier level in a specified time period. The price for this sold volatility
and correlation risk is transformed into a fixed coupon which the investor receives. The
sold option is a down-and-in put option, since the barrier level is typically lower than the
strike of the option and the option has a value different from zero only if the barrier is
breached ('in'). Barrier reverse convertibles are a wrapper for such a payoff. At maturity, an investor
gets his invested amount plus the coupon if there was no breach, or the
coupon plus the lowest stock value at maturity in case of a breach. The higher the prob-
ability of a breach, the higher the coupon to the investor.

Suppose that both stocks can move up and down with the same probability. If the correlation is +1,
the chance of a barrier breach is 50% - either both move up, or both move down and breach. If it is −1,
the probability is 1, since one stock has to go down and breach the barrier. If they are
uncorrelated, the probability is 75%, since among the four equally likely states only the state
with both stocks up avoids a breach. Hence, for two assets, the more negatively
correlated the assets are, the higher the risk of breaching the barrier and therefore the
higher the coupon.

Consider the same investment with three stocks. The intuition of the two-asset case
does not generalize. Given three assets, there are three pairwise correlations. That all three
correlations equal −1, which would lead to the highest coupon, is not possible: if two
correlations are −1, then the third one has to be +1. This shows that the two-asset
logic does not extend to the three-asset case.
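The breach probabilities of the two-stock example can be reproduced by enumerating the joint states of the stylized up/down model described above (a sketch, not part of the original text):

```python
from itertools import product

# Stylized model: each stock moves up (+1) or down (-1) with probability 1/2,
# and the barrier is breached as soon as any stock moves down.
def breach_prob(joint):
    """joint maps a tuple of moves (+1 up, -1 down) to its probability."""
    return sum(p for moves, p in joint.items() if min(moves) < 0)

perfectly_pos = {(+1, +1): 0.5, (-1, -1): 0.5}                 # correlation +1
independent = {m: 0.25 for m in product((+1, -1), repeat=2)}   # correlation 0
perfectly_neg = {(+1, -1): 0.5, (-1, +1): 0.5}                 # correlation -1

print(breach_prob(perfectly_pos), breach_prob(independent), breach_prob(perfectly_neg))
# 0.5 0.75 1.0
```

The output matches the text: lowering the correlation from +1 to −1 raises the breach probability from 50% to 100%, which is why more negative correlation pays a higher coupon.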

2.6.4 Risk Model

A general risk model arises from the mapping of asset returns into a portfolio context:

Model Asset Return → Portfolio Risk Analytics

A portfolio context is used since building a risk model on, say, 10'000 individual assets
would mean considering 10'000 models. Therefore, a risk model is built for all assets.
Traditionally, risk is defined as the variance of returns. Most risk models in asset
management are based on linear multi-factor return models. These models are simple,
clear, and tractable. The hope is to capture the dependency structure between the many
assets by considering a much smaller number of factors. Factors should be independent
of one another. If we have N assets, the dimension of the covariance matrix, N (N − 1)/2,
is reduced to K + N (K + 2) if there are K factors. Formally, for asset i out of N assets,
a generic linear model reads

R_{i,t} = α_i + β_i' F_t + ε_{i,t}    (2.70)

where there are K factors F, and R, F, ε are IID Gaussian:

R_t ∼ N(α, C) ,  F_t ∼ N(0, I) ,  ε_t ∼ N(0, D²)

with D² the diagonal idiosyncratic covariance matrix with the variances of the idiosyn-
cratic risks ε as entries, and I the identity matrix. The (N × K) matrix β is the loadings
matrix. The dynamics (2.70) imply

σ_i² = β_i' C_F β_i + D_i² ,  C = β C_F β' + D²    (2.71)

with C_F the factor covariance matrix.

Consider the following correlation matrix:

ρ = [ 1                     ]
    [ 0.09  1               ]
    [ 0.02  0.12  1         ]
    [ 0.01  0.18  0.94  1   ]

The matrix indicates that the first and second assets, as well as the third and fourth
assets, are driven by the same risk factor. The other correlations are all of the same
order of magnitude. Instead of considering (4 × 3)/2 = 6 correlations, one would start
with a two-factor model.

The linear factor model for the assets transforms into the same functional form for
portfolios. Let φ_j be the portfolio weights (long or short), which add up to 1. Then the
portfolio return R_p can be written as:

R_{p,t} = α_p + β_p' F_t + ε_{p,t}    (2.72)

where all parameters are portfolio weighted, for example α_p = φ'α, and the portfolio risk
reads

σ_p² = β_p' C_F β_p + Σ_i φ_i² D_i² .    (2.73)

Therefore, specifying a risk model means fixing the factor covariance matrix C_F, the
factor exposures β, and the residual risks D².
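Equation (2.73) can be checked against a direct computation with the asset covariance matrix implied by the factor model, C = β C_F β' + D². The sketch below uses randomly generated loadings; all numbers are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 6, 2
beta = rng.normal(size=(N, K))                 # factor loadings (N x K)
CF = np.eye(K)                                 # factor covariance (orthonormal factors)
D2 = np.diag(rng.uniform(0.01, 0.04, size=N))  # idiosyncratic variances

C = beta @ CF @ beta.T + D2                    # implied asset covariance matrix

phi = np.full(N, 1.0 / N)                      # EW portfolio
beta_p = beta.T @ phi                          # portfolio factor exposures

# Portfolio variance via eq. (2.73) versus directly via the full covariance:
var_factor = float(beta_p @ CF @ beta_p + np.sum(phi**2 * np.diag(D2)))
var_direct = float(phi @ C @ phi)
print(var_factor, var_direct)                  # identical up to rounding
```

The agreement of the two numbers shows why the factor structure is useful: the portfolio risk of N assets can be evaluated through K factor exposures plus N residual variances.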

The risk model hierarchy decisions are as follows:

• First, neither factors nor betas are specified. Then a statistical factor model is used, such
as an Asset Pricing Theory (APT) model. A model provider is Sungard. Principal
Component Analysis (PCA) is used for the estimation. Statistical factor models
are the best in-sample performing ones by construction. The resulting factors are
difficult to interpret, and they can vary strongly. The models are not meaningful
in wealth management, where portfolio risk has to be explained, but they are used
in trading thanks to their precision for short time horizons, which circumvents the
instability problem.

• Second, factors are defined and betas are estimated by a time-series regression.
This set-up is used by UBS, Blackrock, swissQuant, Quantec, R-Squared.

• Third, betas are defined and factors are estimated using a cross-sectional regression.
Providers of this model are Barra, Axioma, Bloomberg.

The second and third models both lose information, i.e., estimation error enters the risk
model either in the stock betas or in the factor returns and covariances. In the second method,
the estimation error in the betas is diversified away on the portfolio level if N is large.
This is not true for the third model: estimation risk on the portfolio and individual
asset level are the same. Both methods assume that the variables in the estimation are
observable.

The time-series model (type 2) can only be used when the stock betas are stable over
time. But style investing (factor investing) assumes that betas are not stable. In risk
models where style factors enter, a hybrid approach is necessary - one part for the stable
model (second model) and one part for the style part (third model).

2.6.5 Concentration and Diversity

The attentive reader will have remarked that we have not defined the notion of 'diversification'.
There is no single, widely accepted definition. We consider some concepts of concen-
tration risk and diversity: the diversification index of Tasche (2008), the concentration
indices of Herfindahl (1950) and Gini (1921), and the Shannon entropy, which measures
diversity; see Roncalli (2014) for a detailed discussion.

Tasche's diversification index

The diversification index of Tasche (2008) is the ratio between the risk measurement of
a portfolio and the weighted risk measurement of the assets. If one specifies the risk
measure to be the volatility, the Tasche diversification index TA reads

TA(φ) = √⟨φ, Cφ⟩ / ⟨φ, D⟩ ,    (2.74)

where D is the vector of volatilities. The numerator is equal to the portfolio risk term in
the Markowitz model (3.1). For general risk measures, the index is defined as the ratio
of the portfolio risk over the sum of the risks over all risky positions.

The most-diversified portfolio (MDP) minimizes the diversification index of
Tasche; see Choueifaty and Coignard (2008). Consider N risky assets, D the vector of the
asset volatilities, and φ a portfolio whose components sum up to 1. The diversification
ratio is defined by

DR(φ) = 1 / TA(φ) ,    (2.75)

i.e., the ratio of the weighted average of volatilities divided by the portfolio volatility.
This ratio is at least one, and equal to one only if all wealth is invested in a single
asset. Given a set of constraints M, the MDP is the portfolio which maximizes the
diversification ratio under the set of constraints. If the expected returns of the assets
are proportional to their volatilities, expected returns replace the numerator
⟨φ, D⟩ in DR. Then, maximizing DR is the same as maximizing the Sharpe ratio of the portfolio,
and the MDP is the tangency portfolio.
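A short sketch (with illustrative volatilities and a uniform correlation of 0.5) illustrates that TA(φ) ≤ 1 ≤ DR(φ), with equality exactly for a single-asset portfolio:

```python
import numpy as np

vols = np.array([0.25, 0.22, 0.14, 0.30])   # illustrative asset volatilities
corr = np.full((4, 4), 0.5)
np.fill_diagonal(corr, 1.0)
C = corr * np.outer(vols, vols)             # covariance matrix

def tasche_index(phi):
    """TA(phi) of eq. (2.74): portfolio volatility over weighted volatilities."""
    return float(np.sqrt(phi @ C @ phi) / (phi @ vols))

phi_ew = np.full(4, 0.25)
phi_single = np.array([1.0, 0.0, 0.0, 0.0])

print(round(tasche_index(phi_ew), 4), round(1 / tasche_index(phi_ew), 4))
print(tasche_index(phi_single))             # 1.0: no diversification benefit
```

The EW portfolio has TA below one (diversification lowers risk relative to the weighted average of volatilities), while the concentrated portfolio earns no diversification benefit at all.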

Herfindahl's concentration index

Consider the relative weight vector φ ≥ 0 of a long-only portfolio; the weights can
then be read as probabilities. Maximum concentration occurs if one weight has the value one and all
other weights are zero. Risk concentration is minimal if the portfolio weights are equally
weighted. The Herfindahl index, which is similar to the Gini index, is defined by

Herfindahl Index = Σ_{k=1}^N φ_k² .    (2.76)

It takes the value 1 in the case of maximum concentration and the value 1/N (= N · (1/N)²)
in the EW portfolio case.
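The two extreme cases of (2.76) in a minimal sketch:

```python
def herfindahl(weights):
    """Herfindahl concentration index, eq. (2.76)."""
    return sum(w * w for w in weights)

N = 4
print(herfindahl([1.0] + [0.0] * (N - 1)))  # maximum concentration: 1.0
print(herfindahl([1.0 / N] * N))            # EW portfolio: 1/N = 0.25
```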

Shannon entropy diversity measurement

The Shannon entropy S for a relative-weight long-only portfolio vector φ is defined by

S(φ) = − Σ_{k=1}^N φ_k ln φ_k .    (2.77)

To understand the entropy measurement, consider two dice - one symmetric and the other
distorted. The outcome for the symmetric one is more uncertain than for the other die.
Shannon axiomatized this notion of uncertainty in the 1940s in the context of information
theory. He proved that there exists only the function S(φ) above which satisfies his eight
axioms describing uncertainty.
In finance, entropy measures how close different probability laws are to each other.
The prior and the posterior distribution in the Black-Litterman model are an example.
The space of probability laws is just a set and not a vector space. It is not trivial to find
a reasonable measuring stick for the nearness of, say, two normal distributions, one
with mean 0.1 and variance 0.2 and the other with mean 0.2 and variance 0.1. The
relative entropy S(p, q), the Kullback-Leibler divergence, for two discrete distributions p
and q, defined by

S(p, q) = Σ_k p_k ln(p_k / q_k) ,    (2.78)

measures the similarity of two probability distributions.
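A minimal sketch of the relative entropy (2.78) for discrete distributions (the two example distributions are hypothetical):

```python
import math

def kl_divergence(p, q):
    """Relative entropy S(p, q) of eq. (2.78) for discrete distributions p, q."""
    return sum(pk * math.log(pk / qk) for pk, qk in zip(p, q) if pk > 0)

uniform = [0.25] * 4
skewed = [0.7, 0.1, 0.1, 0.1]
print(kl_divergence(uniform, uniform))           # 0.0: identical laws
print(round(kl_divergence(skewed, uniform), 4))  # positive: the laws differ
```

Note that S(p, q) is not symmetric in p and q, so it is a divergence rather than a distance in the metric sense - which reflects the remark above that the space of probability laws is not a vector space.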
Roncalli (2014) illustrates the different notions of diversification. There are six assets
with volatilities 25%, 22%, 14%, 30%, 40%, and 30%, respectively, and the same returns.
Asset 3 has the lowest volatility. The correlation matrix reads

ρ = [ 100%                               ]
    [  60%  100%                         ]
    [  60%   60%  100%                   ]
    [  60%   60%   60%  100%             ]
    [  60%   60%   60%   60%  100%       ]
    [  60%   60%   60%   60%   20%  100% ]

How will these local deviations - the lower volatility for asset 3 and the lower correlation
between assets 5 and 6 - be perceived (if at all) and valued by different portfolio diversity
measures? The following portfolios are compared: the global minimum variance (GMV),
the equal risk contribution (ERC), the most diversified (MDP), and the equal weights
(EW) portfolios. The GMV portfolio is the Markowitz optimal solution in (3.1) with
minimal risk. ERC is the portfolio in which the risk contribution of each of the six assets is
set equal to 1/6 (16.67 percent); see the discussion following the example. Roncalli (2014) provides
us with the results in Table 2.20, where the weights φ_j and the risk contributions RC_j are expressed in
percentage values.
Since the correlation structure is uniform except for one pair of assets, it is 'overlooked' in the GMV alloca-
tion. Therefore, the GMV optimal portfolio picks asset 3 with the lowest volatility. The
GMV portfolio is heavily concentrated. Portfolio risk measured by GMV is the smallest,

Asset            GMV              ERC              MDP              EW
                 φj      RCj     φj      RCj     φj      RCj     φj      RCj
1                0       0       15.7    16.67   0       0       16.67   16.18
2                3.61    3.61    17.84   16.67   0       0       16.67   14.08
3                96.39   96.39   38.03   16.67   0       0       16.67   8.68
4                0       0       13.08   16.67   0       0       16.67   19.78
5                0       0       10.86   16.67   42.86   50      16.67   24.43
6                0       0       14.49   16.67   57.14   50      16.67   16.86
Portfolio σ      13.99           19.53           26.56           21.39
Tasche index     0.98            0.8             0.77            0.8
Gini index       0.82    0.82    0.17    0       0.69    0.67    0       0.16
Herfindahl index 0.92    0.92    0.02    0       0.41    0.4     0       0.02

Table 2.20: Comparison of the global minimum variance (GMV), equal risk contribu-
tion (ERC), most diversified (MDP), and equal weights (EW) portfolios. All values are
percentages (Roncalli [2014]).

which comes as no surprise.

The MDP, on the other hand, focuses on assets 5 and 6, which are the only ones that
do not possess the same correlation structure as the others. Contrary to the GMV, the MDP
is attracted by local differences in the correlation structure. The diversification index is
lowest for the MDP. Considering the concentration measures, the EW portfolio
should be chosen if the investor wishes to have the broadest weight diversity, and
the ERC if risk concentration is the appropriate diversification risk measurement for the
investor.

We consider the ERC in more detail. The risk contribution of asset j to the portfolio risk
is by definition the sensitivity of portfolio risk w.r.t. φ_j times the weight φ_j. The
Euler allocation principle states when the sum of all risk contributions equals portfolio
risk.

Proposition 26. Let f be a continuously differentiable function on an open subset of R^n.
If f is positively homogeneous of degree 1, which means tf(u) = f(tu) for t > 0, then

f(u) = Σ_{k=1}^n u_k ∂f(u)/∂u_k ,  u ∈ R^n .    (2.79)

See Section 7 for the proof. Applying the Euler theorem to risk measures means:

R(φ) = Σ_j φ_j ∂R(φ)/∂φ_j =: Σ_j RC_j(φ) .    (2.80)

Calculating portfolio risk for, say, 10'000 positions in a portfolio is complicated. But using
Euler's theorem, we need only calculate 10'000 sensitivities, multiply them by their positions,
and sum the results, which is a much simpler task. For the volatility risk measure this
means:

R(φ) = σ_p(φ) = Σ_j φ_j ∂R(φ)/∂φ_j = Σ_j φ_j (Cφ)_j / √(φ'Cφ)    (2.81)

where (Cφ)_j denotes the j-th component of the vector Cφ. The Euler risk decomposition
holds true for the volatility, VaR, and expected shortfall risk measurements.

Example - Euler allocation principle

Consider four assets in a portfolio with equal weights of 25 percent. The volatilities
are 30%, 20%, 40%, and 25%. The correlation structure is

ρ = [ 1                 ]
    [ 0.8   1           ]
    [ 0.7   0.9   1     ]
    [ 0.6   0.5   0.6  1 ]

The covariance matrix C is then calculated as (using the formula C_km = ρ_km σ_k σ_m)

C = [ 9%                        ]
    [ 4.8%   4%                 ]
    [ 8.4%   7.2%   16%         ]
    [ 4.5%   2.5%    6%   6.25% ]

The portfolio variance

σ_p² = Σ_{i,j=1}^4 φ_i φ_j C_ij = 6.37%

follows. Taking the square root gives the portfolio volatility of 25.25%. Using (2.81),
the marginal risk contribution vector

Cφ / √(φ'Cφ) = (26.4%, 18.3%, 37.2%, 19%)'

follows. Multiplying each component of this vector by the portfolio weight gives the
risk contribution vector RC = (6.6%, 4.5%, 9.3%, 4.7%). Adding the (unrounded) components of this
vector gives 25.25%, which is equal to the portfolio volatility. This verifies the Euler
formula.
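The numbers in this example can be verified directly; the sketch below is a transcription of the computation above into code:

```python
import numpy as np

vols = np.array([0.30, 0.20, 0.40, 0.25])
rho = np.array([[1.0, 0.8, 0.7, 0.6],
                [0.8, 1.0, 0.9, 0.5],
                [0.7, 0.9, 1.0, 0.6],
                [0.6, 0.5, 0.6, 1.0]])
C = rho * np.outer(vols, vols)             # C_km = rho_km * sigma_k * sigma_m

phi = np.full(4, 0.25)                     # equal weights
sigma_p = float(np.sqrt(phi @ C @ phi))    # portfolio volatility, ~25.25%
mrc = C @ phi / sigma_p                    # marginal risk contributions, eq. (2.81)
rc = phi * mrc                             # risk contributions

print(round(sigma_p, 4))
print(np.round(rc, 4))                     # close to (6.6%, 4.5%, 9.3%, 4.7%)
print(abs(rc.sum() - sigma_p) < 1e-12)     # Euler decomposition holds exactly
```

The exact equality of the sum of the risk contributions and the portfolio volatility is the content of the Euler allocation principle; the small differences to the rounded figures in the text are rounding only.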

Table 2.21 shows that a seemingly well-diversified portfolio in terms of capital is in
fact heavily concentrated in equity risk.

Asset class diversification                                    Risk allocation

Cash                  2%    Real estate        17%             Cash        2%
Domestic equities    14%    Hedge funds        10%             Equity     79%
IEQ                   8%    Private equity      5%             Commodity   8%
EM equities           4%    Venture capital     9%             CCR        10%
Domestic govt bonds   9%    Natural resources   8%             Other       4%
ICB                  10%    Distressed debt     4%

Table 2.21: Asset class diversification and risk allocation. The first two columns contain
the diversification using the asset class view. The third column shows the result using
risk allocation. While the investment seems to be well diversified using the asset classes,
the risk allocation view shows that almost 80% of the risk is due to equity. IEQ means
international equities, ICB international corporate bonds, and CCR corporate credit
risk.

This fact is often encountered in practice: equity turns out to be the main risk factor
in many portfolios. Capital diversification is then a poor concept from a risk per-
spective.

The asset allocation of European asset managers in 2013 was (EFAMA (2015)):

• 43% bonds;

• 33% equity;

• 8% cash and money market instruments;

• 16% other assets (property, private equity, structured products, hedge funds, other
alternatives).

The allocation has been fairly stable in the past, except in the GFC, when equities lost
massive value. This average allocation differs significantly across countries. The UK, for
example, had an equity allocation between 46% and 52% in the past, while in
France the same class is around 20%. This difference is due to differences in the preferences
of home-domiciled clients and the large differences in cross-border delegation of asset
management. The ratio of AuM/GDP in the UK is 302%, which shows the importance of
the UK as the leading asset management center of Europe, with a strong client base outside
of the UK. Comparing the allocations of investment funds and discretionary mandates,
the bond allocation is 28% in investment funds and 58% in the mandates, while equities
have a share of 39% in the funds and 26% in the mandates. Hence, self-deciders are less
risk averse than those who delegate the investment decisions using mandates.

2.6.6 Reward-Risk Ratio (RR)

We link risk and returns via reward-risk ratios. These ratios are used to determine
performance, to communicate the success of the asset manager, and to compute
performance-related fees. The numerator represents a return and the denominator
a risk. The RR has a deep anchoring in investment theory: in the Markowitz model,
the CAPM, the Black-Litterman model, and dynamic investment models, the optimal portfolio
weights are proportional to an RR.

The best-known reward-risk ratio is the Sharpe ratio, see Definition 2.84. But it is
not monotonic: an investment can be preferred to another one although the latter
has a higher return in all possible states of the world. Therefore, a variety of alternative
reward-risk ratios have been proposed in recent years.

Cheridito and Kromer (2013) set up an axiomatic system an RR should satisfy. Let
R be the return in one period [0, T], where R can be an absolute return, a benchmark
return, a gross return, or a net return. The reward-risk ratio RR is defined by:

RR(R) = RE(R)^+ / RI(R)^+    (2.82)

where R is the return, RE(·)^+ is the reward function, RI(·)^+ the risk function, and
A^+ = max(A, 0). The axioms are:

1. RR should be monotone [M]: more return is better than less. [14]

2. RR should be quasi-concave [Q]: RR should prefer averages to extremes and en-
courage diversification. [15]

3. Scaling [S]: scaling a return by a given factor does not change the RR. [16]

Table 2.22 lists the properties of some well-known ratios. A different motivation for RRs,
not related to any of the above-mentioned investment models, dates back for the Sharpe ratio
to the safety-first principle of Roy (1957). Roy argued that an investor first wants to
make sure that a certain amount of the investment is preserved before he thinks about the
optimization of risk and return. The investor therefore searches for a strategy φ such that
the probability of the invested return being smaller than a level R_f is minimized. By
Tchebychev's inequality, this probability cannot be larger than σ²/(μ − R_f)², independent
of the chosen probability law. Therefore, the best thing to do is to
minimize σ/(μ − R_f), which is the same as maximizing the Sharpe ratio.

2.6.7 Risk Scaling

Is it possible, given some assumptions, to calculate risk for a new time horizon given
its value on a different horizon, without needing further data, running simulations, or

[14] For all returns R1, R2, where R1 dominates R2 in all states, RR(R1) ≥ RR(R2).
[15] The RR of a convex combination of two returns should not be smaller than the minimum of the two
single RRs.
[16] The authors also state a fourth axiom, parameter uncertainty [D]: RRs are estimated by considering
not a single probability law (the historical one, for example) but a whole set of probability laws that
can generate the returns. This is the case of robust decision making.

RR                     Definition                                     (M)  (Q)  (S)  (D)

Sharpe ratio           SR = E(R)^+ / σ(R)                              -    x    x    x
Sortino-Satchel ratio  SSR = E(R)^+ / ||R^-||_p                        x    x    x    x
Gains-Loss ratio       GLR = E(R)^+ / E(R^-)                           x    x    x    x
Black-Treynor ratio    BT = E(R)^+ / (Cov(R,B)/Var(B))^+               -    x    x    -
Value-at-Risk ratio    VaR = E(R)^+ / VaR(R)^+                         x    -    x    x
Mean-Entropic ratio    MER = aE(R)^+ / (log E(exp(−aR)))^+ , a > 0     x    x    -    x
MiniMax ratio          MMR = E(R)^+ / ||R^-||_∞                        x    x    x    x

Table 2.22: RR properties. A^- denotes the negative part of A, B a benchmark return,
and || · ||_p the p-norm (Cheridito and Kromer [2013]).

developing a new risk model? If one assumes IID normally distributed returns with
zero mean, then the square-root-of-time rule can be used to scale volatilities. Consider
investments with two different holding periods t < T. The volatility for the T-period
follows from the t-period volatility by the square-root scaling law

σ(T) = σ(t) √(T/t) .    (2.83)

To prove this, consider n IID returns:

σ²(R_1 + . . . + R_n) = σ²(R_1) + . . . + σ²(R_n) = n σ²(R) .

For an asset with a one-day volatility of 2%, the monthly volatility - assuming 20 trading
days - is equal to 2% × √(20/1) = 8.9%. The square-root rule provides a simple solution
to a complex risk scaling problem. The method fails in any of the following situations:

• Modelling volatility at a short horizon and then scaling to longer horizons can
be inappropriate, since temporal aggregation should reduce volatility fluctuations,
whereas scaling amplifies them.

• Returns in short-term financial models are often not predictable, but they can be
predictable in longer-term models. Applying the scaling law connects the
volatilities in two time domains that are structurally different.

• The scaling rule does not apply if jumps occur in the returns.

• If returns are serially correlated, the square-root rule needs to be corrected (see
Rab and Warnung [2011] and Diebold et al. [1997]).
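The scaling law (2.83) in a minimal sketch, reproducing the 2%-per-day example from the text:

```python
import math

def scale_vol(vol_t, t_days, T_days):
    """Square-root-of-time rule, eq. (2.83); valid only for IID, zero-mean returns."""
    return vol_t * math.sqrt(T_days / t_days)

daily = 0.02
monthly = scale_vol(daily, 1, 20)   # scale to a 20-trading-day month
print(round(monthly, 4))            # 0.0894, i.e. ~8.9%
```

The rule is a convenience under the stated assumptions; as the bullet points above note, serial correlation, jumps, or predictability break it.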

2.6.8 Costs and Performance

We did not consider any frictions in the risk, return, and performance analysis starting in
Section 2.6.1. We now add fees, taxes, and inflation, which changes the results in Figure 2.25.
We consider Swiss stocks with a gross average return of 7.73 percent and assume (Kunz
[2014]):

• 25% of the return arises from dividends, which face a taxation rate of 30%;

• the long-term inflation rate is 2%;

• investments can be made via an investment fund (mutual fund, SICAV) with annual
costs of 1.5 percent, or via an index fund with annual costs of 0.5 percent.

The net returns using these figures are given in Table 2.23.

                  Return after ...
                  ... fees    ... fees and taxes    ... fees, taxes, and inflation
Market index      7.73%       7.15%                 5.15%
Investment fund   6.23%       5.65%                 3.65%
Index fund        7.23%       6.65%                 4.65%

Table 2.23: Returns after fees (Kunz [2014]).

Given these net returns, an investment of CHF 100 takes after 25 years the values in
Table 2.24.

                  Value of CHF 100 after 25 years ...
                  ... fees    ... fees and taxes    ... fees, taxes, and inflation
Market index      643         562                   351
Investment fund   453         395                   245
Index fund        573         500                   312

Table 2.24: Net growth of wealth (Kunz [2014]).
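The entries of Table 2.24 follow from compounding the net returns of Table 2.23 over 25 years. The sketch below reproduces the market index row:

```python
def terminal_value(initial, annual_return, years):
    """Compound an initial amount at a constant annual net return."""
    return initial * (1 + annual_return) ** years

# Market index net returns from Table 2.23:
for label, r in [("fees", 0.0773),
                 ("fees and taxes", 0.0715),
                 ("fees, taxes, and inflation", 0.0515)]:
    print(label, round(terminal_value(100, r, 25)))   # 643, 562, 351
```

The other rows of Table 2.24 follow in the same way from the corresponding net returns.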

Fact 27. Using a cost- and tax-efficient wrapper for an investment amounts to an annual
return gain of 1.45% compared to an investment fund.
Given the zero-sum game of active investment (see the next section), the finding that only 0.6%
of 2,076 actively managed US open-end domestic equity mutual funds generate a posi-
tive alpha after costs (see Section 3.6.6.3), and the possibility to wrap many investment
ideas in cheap index funds or ETFs, it becomes clear why many investors prefer passive
investments.

2.6.9 Passive versus Active Investment: a First Step

Let μ_m, μ_p, μ_a be the expected returns of the fully diversified market portfolio, a passive
portfolio, and an active investment, respectively. We assume that the fraction λ of
investors is passively invested and the fraction 1 − λ is invested in active vehicles. By definition,
passive management means following an index, benchmark, or another portfolio using
quantitative techniques. Active investors are the non-passive ones. Since any investor is
either active or passive, and since the market return follows from the aggregate
return of the active and passive investors, we have:

μ_m = λμ_p + (1 − λ)μ_a .    (2.84)

Assuming that the return of the passive investment equals that of the market, (2.84)
implies that the active return equals the market return, independent of the fraction λ.
Therefore, without any probabilistic or behavioural assumptions, before costs the three
investments pay the same return:

Proposition 28 (Sharpe). Before costs, the return on the average actively managed
dollar will equal the return on the average passively managed dollar.
Because active managers bear greater costs than passive investments:

Proposition 29 (Sharpe). After costs, the return on the average actively managed dollar
will be less than the return on the average passively managed dollar.
These statements are strong, and they are based on strong assumptions. Despite its
beauty, the assumptions that lead to (2.84) trivialize the problem. Suppose all investors
are active ones - who is on the other side of the trades? Returns are not independent
of the demand and supply side but in fact follow in market equilibrium: demand and
supply matter. Pedersen (2018) extended the Sharpe arithmetic to cases where active
management can on average be more profitable than passive management in an equilibrium con-
text. He replaced the unrealistic assumption that an active investor's gain is the loss of
another active investor, which leads in the aggregate to a zero-sum game. Next, the market
portfolio is not constant. It changes over time since new shares are issued and corporate
actions happen: passive investors also need to trade regularly. If they have to trade at
less favourable prices than the active investors do, then the logic of Sharpe is broken.

Roll pointed out that a true market portfolio is not observable since it would include
every single asset. Market-weighted indices are used as an approximation. In the US,
the Wilshire 4'500 Index contains 4'500 stocks of approximately 5'000 listed stocks. In
Switzerland, the SPI Index contains 210 of 270 listed stocks. The global market portfolios
also differ significantly depending on who is calculating them. The major contributors are debt
and equity, where equity is split into global equity, EMMA equity, private equity, and small
cap equity, and debt is split into government bonds, agency bonds, asset-backed securities,
EMMA bonds, and corporate bonds. The assumption that passive investment means being
invested in the market portfolio is an approximation. Consider funds: US retail funds
are different from US institutional funds and also from non-US funds. The one-
fits-all argument of Sharpe does not consider the heterogeneity of investment wrappers
across different asset classes, different geographical regions, and different client segmenta-
tions. Finally, the result is based on the average active manager. It does not account for
the differences between skill and luck.

The goal of active asset management is to outperform benchmarks. The manager tries
to beat the benchmark within a given Tracking Error (TE) limit. Proponents of
active investment use argumentation following the work of Berk and Green (2004), who
show that efficient markets do not contradict the existence of skilled fund managers who
beat the market consistently. The concept of benchmarking and hence relative performance
has several advantages for the portfolio manager: performance measurement is
simple relative to the benchmark, benchmarking has a disciplining force acting on the
asset manager, and the structuring of the investment portfolio is simplified.

Active management often has both a passive component, the long-term goals in a
benchmark portfolio, and an active component, playing the views to exploit market
opportunities (TAA). The passive portfolio stabilizes the whole investment.

Definition 30. A passive investment strategy tracks a market-weighted index or portfolio
(the benchmark). The goal of an active investment strategy is to beat the market-weighted
index by changing market weights (asset selection) at the right time (market timing)
within a TE limit.

ETFs, trackers and index funds are examples of passive strategies. Mutual funds, the
opportunistic use of derivatives, and hedge funds are examples of active strategies. While
the deviation of a strategy from a benchmark, the tracking error, should be as small as
possible in passive investment, the tracking error in active investment describes how far
the active manager moves away from the benchmark.

Different types of benchmarks are used. Either the benchmark is used to compare
the performance of a fund with its peers or the benchmark is a market index. While
both methods are meaningful for active investment, in a passive investment only index
benchmarking makes sense.

The main stock benchmark indices are the MSCI World Index, FTSE, S&P 500 and
some other well-known stock market indices. Since bond securities do not trade on
open exchanges, there is less transparency about bond prices, and the indices used for
benchmarking are those created by the largest bond dealers, such as the Barclays Global
Aggregate Bond Index, which tracks the largest bond issuers globally. Benchmark indices
for commodities are for example provided by S&P and Goldman Sachs (S&P GSCI) or
by Bloomberg (Bloomberg Commodity Index). For credit risk, the Markit iTraxx
indices reflect the creditworthiness of large corporates. A provider of real estate indices
is MSCI. There are four different types of income-producing real estate assets: offices,
retail, industrial and leased residential. Non-income producing assets are houses, vacation
properties or vacant commercial buildings. These different types of real estate assets
lead, together with the geographical segmentation, to many different real estate indices.
Chapter 3

Portfolio Construction
3.1 Steps in Portfolio Construction
So far, we did not consider the logic of portfolio construction but used different portfolios
in examples on an ad hoc basis. Several steps define portfolio construction:

• Grouping of assets: How do we select the parts (securities) of a portfolio?

• Allocation of assets: How much wealth (weights) do we invest at each date in the
specic securities?

• Implementation of the strategy: How do we transform the asset allocation into
trades?

The grouping of the assets or asset selection can be done on different levels:

• Asset classes (AC)

• Single assets

• Risk factors

The allocation of the assets can follow different rules:

• Optimal investment (Markowitz, CAPM, Black-Litterman, dynamic Merton-type
models, mean-surplus maximization)

• Heuristic rules (EW, ERC, risk budgeting, factor investing)

• Big data based methods
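To make one of the heuristic rules concrete: for uncorrelated assets, the equal risk contribution (ERC) rule reduces to inverse-volatility weighting. A minimal sketch (the volatilities are illustrative assumptions, not data from the text):

```python
# Equal risk contribution (ERC) heuristic for uncorrelated assets:
# weights proportional to 1/volatility equalize each asset's share
# of portfolio variance. The volatilities below are illustrative.
vols = [0.15, 0.05, 0.10]                 # assumed asset volatilities
inv = [1.0 / v for v in vols]
weights = [x / sum(inv) for x in inv]

# risk contribution of asset i (no correlation): w_i^2 * sigma_i^2 / sigma_p^2
port_var = sum((w * v) ** 2 for w, v in zip(weights, vols))
risk_contrib = [(w * v) ** 2 / port_var for w, v in zip(weights, vols)]
print(weights)        # low-volatility assets receive the larger weights
print(risk_contrib)   # all contributions are equal (1/3 each)
```

With correlated assets the ERC weights have no closed form and are found numerically; the uncorrelated case already shows how risk budgeting differs from dollar budgeting.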

The implementation of the asset allocation can be done using different liquid assets:

• Cash products such as stocks and bonds

• Derivatives such as futures, forwards and swaps


• Options

• Mutual funds, certificates, ETFs, money market funds

Further implementation issues are liquidity, tax and compliance (eligibility, suitability
and appropriateness).

3.2 Allocation - Foundations of Investment Decisions


The risk, return and diversification properties of assets in the last sections were the result
of ad hoc or experience-based decision rules. We now focus on optimal investment. Optimal
investment is based on rational decision-making in a probabilistic set-up (statistical
models). We distinguish between optimal investment where people consume and invest
(ALM) and where they only invest (asset-only).

We assume that investors use the expected utility criterion as a rule of choice: the
higher the expected value of an investment, the more such an investment is preferred.
Like any mathematical model, expected utility theory is an abstraction and simplification
of reality. There exists a large academic literature which reports systematic
violations of expected utility theory's predictions in investors' empirical behavior.
A prominent alternative is prospect theory by Kahneman and Tversky (1979), which
is also an optimization problem but enriches the typical behavior of models such as
Markowitz's. But most investment theories used in practice are still based on expected
utility theory.

The theory assumes that investors form beliefs correctly and that they choose optimal
actions or decisions. The beliefs define the probabilistic set-up for the dynamics
of future returns. One optimal action is the choice of the portfolio weights over time. The
optimal decision is based on the investor's preferences, which are represented by her utility
function. Optimization requires maximizing expected utility subject to constraints
such as the budget constraint. This representation of the decision problem in terms of
mathematical optimization is a well-developed field in mathematics and the approach is
very general.

If investors face situations where the risks (probabilities) are not known - where uncertainty
dominates - then it makes no sense to rely on optimal investment theory; heuristic
reasoning should be used instead, see Section 3.2.4.

3.2.1 Statistical Models


Most classic investment models such as Markowitz, the CAPM, arbitrage pricing theory
(APT) and Black-Litterman are asset-only models. There are no liabilities except in the
case of pension funds. With new technologies it is possible to also consider the liabilities
or goals of private investors optimally.

Preferences are described by a utility function u of wealth W. Utility increases with
wealth, u′(W) > 0 (positive marginal utility), but marginal utility decreases,
u″(W) < 0.¹ These mathematical conditions imply that investors:

• Prefer more money to less;

• Are risk averse.

We further assume that investors are impatient: they prefer 1 CHF today to 1 CHF
tomorrow.

The mean-variance model was the first model in portfolio optimization or asset allocation
based on the return-risk trade-off. Markowitz stated the principle in 1952:
The investor should consider expected return a desirable thing and variance of return
an undesirable thing. Three methods are common to operationalize this principle:

1. The investor chooses a portfolio φ to maximize the expected return subject to
volatility not exceeding a predefined level σ, or

2. volatility is minimized subject to the expected return not falling below a predefined
level r, or

3. a mean-variance utility function is optimized. The solution φ is parametrized by
the risk aversion θ.

All solutions are equivalent.² We formalize the ideas. Consider N risky assets with a
return vector R in a single period. The expected returns are µ = E(R) and the covariance
matrix C of the returns is given by

C = E((R − µ)(R − µ)′) .


The objective is to maximize the quadratic utility function which reflects the trade-off
between reward and risk:

u(R) = φ′R − (θ/2) φ′(R − µ)(R − µ)′φ .

θ is the risk aversion of the investor.³ Taking expectations,

EP(u(R)) = φ′µ − (θ/2) φ′Cφ .

Optimization means to find a portfolio φ which maximizes the above expected utility,
i.e.

max_φ EP(u(R)) = max_φ ( φ′µ − (θ/2) φ′Cφ )    (3.1)
¹ We always assume that the utility functions are continuously differentiable.
² All three problem formulations are smooth convex quadratic optimization problems which possess a
unique solution.
³ The factor 1/2 is used to cancel a factor 2 in calculating the optimal portfolios.

with the solution

φ∗ = (1/θ) C⁻¹µ .    (3.2)

The matrix C⁻¹ is the information matrix. Suppose that there is only one risky asset
and a risk-free asset with return µf. Then the above optimal rule reads:

φ∗ = (1/θ) (µ − µf)/σ² .    (3.3)

The fraction (µ − µf)/σ² is the market price of risk. It is proportional to the Sharpe ratio.

An investor with zero risk aversion puts all the money in the asset with the largest
expected return. If risk aversion is not zero, then since risk is always positive, the higher
the risk, the lower the optimal level of expected utility. Formula (3.2) states that the optimal
amount invested in each asset is given by a mix of the expected returns of all assets, with
the information matrix doing the mixing. Assume that C is the identity matrix - there
is no risk dependence structure. Then C⁻¹ = C and φ∗ = (1/θ)µ. Hence, investment is
proportional to the expected returns and there is no mixing.
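The closed-form solution (3.2) is easy to check numerically. A minimal two-asset sketch (the values of µ, C and θ are illustrative assumptions), verified against the first order condition µ − θCφ = 0:

```python
# Optimal mean-variance weights phi* = (1/theta) C^{-1} mu  (formula 3.2),
# computed with an explicit 2x2 matrix inverse. All inputs are illustrative.
theta = 4.0                          # assumed risk aversion
mu = [0.08, 0.03]                    # assumed expected returns
C = [[0.04, 0.006],
     [0.006, 0.01]]                  # assumed covariance matrix

det = C[0][0] * C[1][1] - C[0][1] * C[1][0]
C_inv = [[ C[1][1] / det, -C[0][1] / det],
         [-C[1][0] / det,  C[0][0] / det]]

phi = [(C_inv[i][0] * mu[0] + C_inv[i][1] * mu[1]) / theta for i in range(2)]

# first order condition: the gradient mu - theta*C*phi must vanish at phi*
grad = [mu[i] - theta * (C[i][0] * phi[0] + C[i][1] * phi[1]) for i in range(2)]
print(phi)    # the information matrix mixes the expected returns
print(grad)   # approximately [0.0, 0.0]
```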

An insightful investor doubts that the probability law is known. He could therefore
consider the investment situation where different probabilities matter in the portfolio
choice problem. Then uncertainty, besides risk, matters. Formally, let P be a set of
admissible probabilities. The optimization becomes

min_{P∈P} max_φ EP(u(R)) = min_{P∈P} max_φ ( φ′µ − (θ/2) φ′Cφ ) .    (3.4)

The investor assumes that out of all possible probabilities (who defines this set?) the
worst one is chosen by a second player called 'nature'. This defines a robust optimization
problem. The solution will be more conservative than the original one. If one asset is
risk-free, this asset will attract a large part of the invested money. Although theoretically
sound, robust investments in this sense are hardly considered since the wealth allocation
is often too conservative and it is difficult to single out the set of admissible probabilities.
We do not consider this approach any further.
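A toy discretization shows how the min-max in (3.4) makes the allocation more conservative. With one risky asset, the inner maximization has the value µ²/(2θσ²), so 'nature' picks the admissible µ with the smallest absolute value. The candidate set of expected returns below is an invented illustration:

```python
# Robust one-asset version of (3.4): for each admissible expected return mu,
# the inner max over phi gives phi*(mu) = mu/(theta*sigma^2) with optimal
# value mu^2/(2*theta*sigma^2); nature then minimizes over the candidates.
theta, sigma = 2.0, 0.2
candidate_mus = [0.02, 0.05, 0.08]        # assumed admissible set P

values = [m * m / (2 * theta * sigma ** 2) for m in candidate_mus]
worst_mu = candidate_mus[values.index(min(values))]   # nature's choice

phi_robust = worst_mu / (theta * sigma ** 2)
phi_nominal = 0.05 / (theta * sigma ** 2)  # e.g. trusting the middle scenario
print(phi_robust, phi_nominal)             # robust weight is markedly smaller
```

The robust investor acts as if the worst admissible expected return were true, which is exactly why the resulting allocation tends to be very conservative.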

Summarizing, maximizing expected utility under constraints has the following general
structure.

• There is a utility function u, decision variables χ such as consumption and portfolio
weights, and state variables ξ such as wealth.

• The constraints define the admissible set A(ξ), the most well-known being the full
investment constraint, the budget constraint, the maximum and minimum amounts for each
asset class, a turnover constraint and many more.

3.2.1.1 Examples
Consider a single period investment problem where the investor derives utility u(W1)
from final wealth W1. The investor chooses a portfolio φ ∈ R^n for n assets to maximize
E(u(W1)) under the two budget constraints at times 0 and 1: Σj φj Sj(0) = W0 with
Sj(0) the price of asset j, and W1 = Σj φj Sj(1). The first order condition (FOC) for
optimality reads:

E(u′(W1)(Ri − Rj)) = 0 ,    (3.5)

for all asset pairs i, j. This equation has several implications. First, Ri − Rj means that
a long-short combination is optimal; this zero-cost portfolio is called the excess return.
Second, one can choose the risk-free asset for asset j. Third, geometrically the condition
states that the excess return vector and marginal utility are orthogonal to each other,
that is⁴

⟨u′(W1), Ri − Rj⟩ = 0 .    (3.6)

Fourth, assume that the investor is risk averse, u″ < 0. Then it is never optimal to
fully invest in the risk-free asset. By contradiction, assume that the investor puts all
his initial wealth in the risk-free asset. Then final wealth W1 is non-random, u′(W1)
is deterministic and can be taken outside the expected value in
(3.5). But then, unless all risky returns are the same, the FOC cannot be satisfied.
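The FOC (3.5) can be checked numerically. A sketch with one two-state risky asset, a risk-free asset and log utility (all the numbers are illustrative assumptions): a grid search finds the optimal risky fraction, and the expectation in (3.5) vanishes there.

```python
import math

# Two-state risky return plus a risk-free return, log utility u(W) = log(W).
# Grid-search the fraction a invested in the risky asset, then verify the
# first order condition E[u'(W1)(R_risky - R_f)] = 0 at the optimum.
p = 0.5                                # probability of the up state (assumed)
r_up, r_dn, r_f = 0.20, -0.15, 0.01    # assumed returns

def wealth(a, r):                      # W0 = 1, fraction a in the risky asset
    return 1 + r_f + a * (r - r_f)

def expected_log_utility(a):
    return p * math.log(wealth(a, r_up)) + (1 - p) * math.log(wealth(a, r_dn))

def foc(a):                            # E[u'(W1)(R_risky - R_f)], u'(W) = 1/W
    return (p * (r_up - r_f) / wealth(a, r_up)
            + (1 - p) * (r_dn - r_f) / wealth(a, r_dn))

grid = [i / 10000 for i in range(10001)]          # a in [0, 1]
a_opt = max(grid, key=expected_log_utility)
print(a_opt, foc(a_opt))   # interior optimum, FOC approximately zero
```

The optimum is interior (0 < a < 1), illustrating the fourth point: a risk-averse investor never holds only the risk-free asset when a positive excess return is available.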

The utility function defines risk preferences. Consider an investor who is given the
choice between a lottery that pays off either 50 or 100 with the same probability and a lottery
with a guaranteed payoff of 75: the bet has the same expected value as the guaranteed payoff. A
risk-neutral investor is indifferent between the two lotteries; a risk-averse investor prefers
the guaranteed payoff.⁵

Figure 3.1 shows the payoff and utilities for the risk-averse and the risk-neutral investor.
For the risk-averse investor, the expected value of the bet also lies on a straight
line but its utility value (yellow dot) is strictly lower than the utility of the guaranteed
payoff (red dot). A risk-averse investor needs an extra compensation 'red minus yellow
dot' such that he becomes indifferent.
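The extra compensation is the risk premium and can be computed from any concrete concave utility. A sketch for the 50/100 lottery from the text, with the concave utility u(W) = √W chosen purely for illustration:

```python
import math

# Risk premium of the 50/100 lottery under the concave utility u(W) = sqrt(W).
# The certainty equivalent CE solves u(CE) = E[u(W)]; the risk premium is
# the expected payoff minus CE ('red minus yellow dot' in Figure 3.1).
payoffs, probs = [50.0, 100.0], [0.5, 0.5]
expected_payoff = sum(p * x for p, x in zip(probs, payoffs))          # 75
expected_utility = sum(p * math.sqrt(x) for p, x in zip(probs, payoffs))
certainty_equivalent = expected_utility ** 2      # inverse of u(W) = sqrt(W)
risk_premium = expected_payoff - certainty_equivalent
print(certainty_equivalent, risk_premium)         # CE ~ 72.86, premium ~ 2.14
```

Jensen's inequality guarantees CE < 75 for any concave u; the size of the premium depends on the curvature of the chosen utility.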
Investment restrictions are widely used.⁶ Practitioners often impose constraints
if the output of an investment optimization is not in line with what they consider a
reasonable strategy. But every constraint has an economic price: the shadow price. The
larger this price, the lower the constrained optimum compared to the unconstrained one.
Furthermore, adding many ad hoc constraints makes it difficult to explain whether a
portfolio is optimal due to the investor's preferences or due to the many constraints. Often in
wealth management several dozen constraints are imposed - constraints expressing the client's
preferences ('not investing in hedge funds'), compliance constraints ('Chinese bonds are
excluded') or CIO-related constraints ('the weight of Swiss equity is between 20% and 40% for a
specific investor').

Figure 3.1: Risk-neutral and risk-averse investors.

⁴ The inner product of the square integrable random variables is used.
⁵ Jensen's inequality E[u(W)] ≤ u(E[W]) for a concave utility implies this. Hence, risk aversion is
the same as a concave utility function.
⁶ Some restrictions are:
• Preference restrictions - limiting the fraction of capital invested in equities.
• Legal restrictions - prohibiting access to some markets.
• Taxation - different taxation for the same investment idea wrapped in different securities such as
mutual funds or structured products.
• Budget restrictions.
• Liquidity restrictions - large investors do not want to move asset prices when they trade.
• Transaction fee restrictions.

We show the loss of utility in restricted optimization. The optimal value of an unrestricted
optimization problem is never lower than the value of a restricted problem.
Consider the minimization of the parabola u(x, y) = x² + y². The minimum is achieved
at the vector (0, 0) and the optimal value is u(0, 0) = 0. We insert the restriction
x + y = r > 0. This means that x and y are positioned on a line. The optimal values
are x = y = r/2 and u(r/2, r/2) = r²/2, which is larger than the optimal unrestricted value. The
Lagrange multiplier λ associated with the constraint x + y = r has the value λ = r. Since
the unrestricted optimum is at the origin, the larger we choose r, that is, the more distant
the line is from the origin, the more value is lost. This is exactly the statement of the
shadow price.
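The parabola example can be verified numerically: substituting y = r − x turns the constrained problem into a one-dimensional search, and the lost value grows like r²/2, in line with the shadow price λ = r.

```python
# Minimize u(x, y) = x^2 + y^2 subject to x + y = r by substituting y = r - x.
# The constrained minimum r^2/2 exceeds the unconstrained minimum 0, and the
# loss grows with r, as the Lagrange multiplier (shadow price) lambda = r says.
def constrained_min(r, steps=100000):
    vals = []
    for i in range(steps + 1):
        x = i * r / steps
        vals.append(x * x + (r - x) ** 2)
    return min(vals)

for r in (1.0, 2.0, 4.0):
    print(r, constrained_min(r), r * r / 2)   # numerical and analytic agree
```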

3.2.2 Rational Dynamic Decision Making


Investors often face a long-term investment horizon. Pension funds have to satisfy a
liability stream over time, and private clients would like to finance different goals in the
future. This defines a dynamic expected utility problem, see Sections 3.3.2 and 3.3.3. The
investor searches for a portfolio φt at different dates such that the expected present value of
the investment is maximized. To solve such an investment problem optimally one has
to proceed backwards: solve, in discrete time, the last period investment decision. This
is an optimal single period decision. Then solve the second-to-last decision given that the
last one will be optimal, and so on. This backward induction is based on the optimality
principle of Bellman (1954): An optimal policy has the property that whatever the initial
state and initial decisions are, the remaining decisions must constitute an optimal
policy with regard to the state resulting from the first decision.

Why should one consider dynamic investment at all?

Fact 31. Optimal dynamic investment allows investment risk to be distributed not only in
the cross-section (single-period models) but also over time.

Despite the meaningfulness of multi-period models, most investment models used are
static ones. There are three main reasons. First, technology was not able in the past
to solve dynamic problems in time, i.e. machines were not fast enough. Second, most
asset managers are well educated in static models but knowledge about dynamic models
is sparse. Third, static models are already flawed by parameter uncertainty (estimation risk);
the intertemporal set-up adds additional uncertainty.

Optimal dynamic investment is able to take into account changing future investment
opportunities in an optimal way. Static models have no foresight to react today to what
could happen in future periods. Changing investment opportunities are key
for long-term investors such as pension funds.

Example - Backward versus forward induction

Consider the case where you have to drive from New York to Boston. Using a repeated
static model (forward induction) you decide at each crossroad, given the traffic situation,
which direction to follow next. Using this strategy you will never arrive in Boston.

Dynamic optimality means that you start with the end in mind: you work backwards
starting in Boston. At each crossroad in the backward approach, you calculate whether
it is best to turn left or right, knowing that all decisions which follow are optimal. Given
the circumstances it may be globally optimal to take a non-optimal small road in a
single step, if for example this road leads to the next highway. This singles out, among
the myriad of paths between New York and Boston, the truly optimal one.

Repeating, say, 10 times an optimal one-period model decision (forward solution) is
not the same as making optimal investment decisions backwards, except in some
particular situations.
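The road example above can be sketched as a tiny dynamic program. The network below is an invented illustration, constructed so that the greedy forward rule (always take the locally cheapest road) is strictly worse than backward induction:

```python
# Backward induction on a toy road network: value[n] = minimal remaining cost
# from node n to 'Boston'. A greedy forward rule can be much worse.
roads = {                           # invented travel times (assumptions)
    'NY':     {'A': 1, 'B': 2},
    'A':      {'Boston': 10},
    'B':      {'Boston': 1},
    'Boston': {},
}

# Bellman backward pass (nodes visited in reverse topological order)
value = {'Boston': 0}
for node in ['B', 'A', 'NY']:
    value[node] = min(cost + value[nxt] for nxt, cost in roads[node].items())

# greedy forward rule: at each node take the locally cheapest road
node, greedy_cost = 'NY', 0
while roads[node]:
    nxt = min(roads[node], key=roads[node].get)
    greedy_cost += roads[node][nxt]
    node = nxt

print(value['NY'], greedy_cost)   # backward induction beats the greedy rule
```

Greedy chooses the cheap first road to A and then pays dearly; the backward pass accepts the locally worse first step to B because everything after it is optimal.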

3.2.3 Growth Optimal Portfolios


Growth optimal portfolios (GOP) by definition have a maximal expected growth rate over
any time horizon. Therefore, a GOP dominates any other portfolio strategy when the
time horizon increases. If such a strategy exists, why do people then care about any
other investment strategies?

The origin of the GOP is attributed to Kelly (1956), which also leads to the Kelly
criterion in investment. Kelly was not interested in investment but wrote his work with
gambling and information theory in mind. The Kelly strategy, i.e. a GOP, is an optimal
strategy such that with probability one the strategy accumulates more wealth than
any other strategy. The expression 'with probability one' is key. Deleting this expression
leads to wrong statements and decisions concerning the GOP.

To motivate the GOP, consider a binary gamble, see Rotando and Thorp (1993). Let
W0 be initial wealth, Bk the bet k, p the probability of winning and q the probability of
losing the bet. Then,

E(Wn) = W0 + Σ_{k=1}^{n} (p − q) E(Bk) .

If the game's expectation is positive, p > q, then maximizing E(Wn) is the same as
maximizing E(Bk) at each trial. Therefore, it is optimal to bet all resources in each
trial - W0 = B1 is the starting bet. The ruin probability of such a strategy is 1 − p^n,
i.e. you go bankrupt fast almost certainly. Conversely, if one minimizes the ruin probability
then one also minimizes the expected return. The GOP is an intermediate strategy between
these two over-aggressive and over-timid strategies.

Consider the strategy of investing a fixed fraction c of present wealth in the next bet,
i.e. Bk = cW_{k−1}. If s is the number of successful bets and f the number of failures in n
bets, then

Wn = W0 (1 + c)^s (1 − c)^f .

If 0 < c < 1, ruin is not possible. Using the compounding identity

e^{n log(Wn/W0)^{1/n}} = Wn/W0 ,

the exponential growth rate per trial is given by

log(Wn/W0)^{1/n} = (s/n) log(1 + c) + (f/n) log(1 − c) .

Setting G(c) equal to the expected value of this growth rate, we get

G(c) = p log(1 + c) + q log(1 − c) .

Since G(c) = (1/n) E(log Wn) − (1/n) log W0, maximizing G(c) is equivalent to the maximization
of log utility E(log(Wn)). Taking the derivative, G′ = 0 if the optimal fixed fraction
c∗ = p − q is chosen. Furthermore G″ < 0, so c = c∗ is the unique maximum.
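A quick numerical check of c∗ = p − q (the win probability p = 0.6 is an illustrative choice):

```python
import math

# Expected growth rate per trial G(c) = p*log(1+c) + q*log(1-c); the maximizer
# is the Kelly fraction c* = p - q. Grid search confirms the closed form.
p = 0.6
q = 1 - p

def G(c):
    return p * math.log(1 + c) + q * math.log(1 - c)

grid = [i / 100000 for i in range(1, 100000)]   # c in (0, 1)
c_opt = max(grid, key=G)
print(c_opt, p - q, G(c_opt))   # grid optimum matches the Kelly fraction 0.2
```

Note that over-betting is punished: G(0.5) is negative for these parameters, so betting half of wealth each trial leads to ruin in the long run even though each bet has positive expectation.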

Proposition 32. 1. If c ∈ (0, c∗) is chosen, then wealth grows without limit as n → ∞,
except for a finite number of wealth terms.⁷ Conversely, if c ∈ (c∗, 1) is chosen, ruin
follows as n → ∞.

2. Consider the optimal fixed fraction strategy c∗ and any other strategy φ. Then the
ratio of wealth W(c∗)/W(φ) tends to infinity as the number of trials goes to infinity,
except for a finite number of wealth terms.

3. The fastest time to reach a target wealth level starting from any level W is given
asymptotically by a strategy which maximizes expected log-wealth utility.

Rotando and Thorp apply the GOP to S&P investing using data from 1926-1984.
First, they calculate the probability of a return below a T-bill return. This probability
decreases from 38% for n = 2 years to 21% after ten years and to 8% after 30 years.
The optimal fixed fraction to invest is 117%, i.e. it is optimal to borrow 17% of existing
wealth each year. This suggests that the GOP needs long-term investment horizons
and that the optimal strategy is leveraged. Summarizing, a GOP has the theoretical advantage
of a maximum rate of growth of wealth but it turns out to be too risky in practice.

These results triggered many discussions about the usefulness of the GOP. A main critique
was formulated by Samuelson in the 1960s. He states that if one
is not willing to accept a single bet then one will rationally never accept a sequence of
such bets: if the ruin probability is not acceptable for the first year of investment given c∗,
one will never accept 30 bets of this type. This non-transitivity of preferences is refused by
Samuelson. Thorp answered that the limit GOP respects transitivity.

This discussion about transitivity is central to the GOP: is a property valued in
the limit n → ∞ or for a finite but large n, and how large is such an n in practice?
Christensen considers geometric Brownian motions.⁸ The GOP strategy is to replace the
volatility of the price process by the market price of risk θ = (a − r)/σ and the drift a by the
risk-free rate r. Then even with a Sharpe ratio of 0.5 it would take almost 30 years to
beat the risk-free bond with a 90% probability.
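Under the lognormal dynamics of footnote 8, the GOP's log-outperformance over the risk-free bond at horizon T is normally distributed with mean θ²T/2 and standard deviation θ√T, so the probability of beating the bond is Φ(θ√T/2). A sketch reproducing the 'almost 30 years' figure for θ = 0.5 (the lognormal set-up is the assumption here):

```python
import math

# P(GOP beats the risk-free bond by horizon T) = Phi(theta*sqrt(T)/2)
# for geometric Brownian motion with market price of risk theta.
theta = 0.5                      # Sharpe ratio from the text

def norm_cdf(x):                 # standard normal CDF via the error function
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def p_beat(T):
    return norm_cdf(theta * math.sqrt(T) / 2)

# smallest horizon (in years) with at least 90% probability of winning
T_star = next(T for T in range(1, 200) if p_beat(T) >= 0.90)
print(T_star, p_beat(T_star))    # roughly 27 years, consistent with the text
```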

Summarizing, GOPs are too risky: no money manager can survive with a limit-offering
if he is hit, say, twice in the first 5 years of his mandate. The impatience of investors rules
⁷ Formally, Wn approaches infinity almost surely.
⁸ The asset price dynamics dSt = St(a dt + σ dWt) with S0 = s have the solution
St = s e^{(a − σ²/2)t + σWt}.

out any long-term investment strategies which focus on maximum return growth
without controlling the possible finite shortfall risks. But controlling for shortfall risks
in a mathematical way brings us back to a return-risk framework. A different approach
is to mix the mathematics of the GOP with business experience by selecting for the GOP
those stocks which are not expected to have shortfall risks. W. Buffett seems to apply an
investment approach of this form.

3.2.4 Heuristic Models


The heuristic approach is radically different from the statistical one. Heuristics are
methods used to solve problems using rules of thumb or experience. Heuristics need
not be optimal in a statistical modelling sense and could be seen as a poor
man's concept compared to statistical models. But there are situations where heuristic
approaches are meaningful.

One reason for the use of heuristics arises if one distinguishes between risk and uncertainty.
According to Knight (1921), risk refers to situations of perfect knowledge about
the probabilities of all outcomes for all alternatives. This makes it possible to calculate
optimal choices. Uncertainty refers to situations in which the probability distributions
are unknown or unknowable - that is to say, risk cannot be calculated at all. Situations
of known risk are relatively rare. Savage (1954) argues that applying standard statistical
theory to decisions in large, uncertain worlds would be utterly ridiculous because there is
no way of knowing all the alternatives, consequences, and probabilities. Using optimal
solutions in a world with uncertainty just adds non-controllable model risk. To understand
when people use statistical models in decision-making and when they prefer heuristics
requires the study of how the human brain functions, see Camerer et al. [2005] and
Glimcher and Fehr [2013].

Example - Uncertainty examples

Ellsberg (1961) invented the following experiment to reveal the distinction between
risk and uncertainty.⁹ An individual considers the draw of a ball from one of two urns:

• Urn A has 50 red and 50 black balls.

• Urn B has 100 balls, with an unknown mix of red and black.

First, subjects are offered a choice between two bets:

• USD 1 if the ball drawn from urn A is red and nothing if it is black.

• USD 1 if the ball drawn from urn B is red and nothing if it is black.

Second, the same subjects are offered a choice between the following two bets:

• USD 1 if the ball drawn from urn A is black and nothing if it is red.

• USD 1 if the ball drawn from urn B is black and nothing if it is red.

In both cases, the first bet is generally preferred in experiments. That is, individuals believe
in the first case that the number of red balls in urn B is less than 50%, and in the second
case the same individuals assume that the number of black balls in urn B is also smaller
than 50%. These probability assessments are inconsistent. Ellsberg's interpretation was
that individuals are averse to the ambiguity regarding the odds for the ambiguous urn B.
They therefore prefer to bet on events with known odds. Consequently they rank bets
on the unambiguous urn A higher than the risk-equivalent bets on B.

Example - Uncertainty in macroeconomics

Caballero (2010) and Caballero and Krishnamurthy (2008) consider the behavior of
investors in the following flight-to-quality episodes:

• 1970 - Default by Penn Central Railroad's prime-rated commercial paper caught the
market by surprise.

• 1987 - Speed of the stock market's decline led investors to question their models.

• 1998 - Co-movement of Russian, Brazilian, and US bond spreads surprised almost
all market participants.

• 2008 - Default on commercial paper by Lehman Brothers created tremendous
uncertainty. The Lehman bankruptcy also caused profound disruption in the markets
for credit default swaps and interbank loans.

They find that investors re-evaluated their models, behaved conservatively, or even
disengaged from risky activities. These reactions cannot be addressed by
increasing risk aversion about macroeconomic phenomena. The reaction of investors in
an uncertain environment is fundamentally different from that in a risky situation with a
known environment.

Example - Greece and the EU

In spring 2015, uncertainty about the future of Greece in the EU increased. Four
different scenarios were considered:

• Status quo. Greece and the EU institutions agree on a new reform agenda such
that Greece receives the remaining financial support of EUR 7.2 billion from the
second bailout package.

• Temporary introduction of a currency parallel to the euro. If the negotiations under
A are taking longer than Greek liquidity can last, Greece will introduce a parallel
currency to fulfill domestic payment liabilities.

• Default with subsequent agreement between the EU and Greece. There is no agreement
under A. Greece fails to repay loans and there will be a bank run in Greece.
The ECB takes measures to protect the European banking sector.

• Grexit - that is, Greece leaves the eurozone. Greece stops all payments and the
ECB abandons its emergency liquidity assistance. Similar conclusions hold for the
Greek banking sector as under C. Greece needs to create a new currency since the
country cannot print euros.

The evaluation of the four alternatives is related to uncertainty and not to risk: the probability
of each scenario is not known, there are no historical data with which to estimate
the probabilities, and the scenarios have dependencies, but these are of a fundamental
cause-effect type which cannot be captured by the statistical correlation measure. This
shows that valuable management is related to situations which are based on uncertainty.

The use of 'uncertainty' and 'risk' does not follow clear standards and conventions
in practice. A volatility index such as the VIX is sometimes called a measure of uncertainty:
if volatility increases, one often states that uncertainty increases. Strictly speaking this
makes no sense since the VIX is a calculated index of risk. Hence, risk increases or
decreases, but this has a priori no relation to uncertainty. A similar logic is the statement of
investors that if uncertainty increases, markets often become more volatile and equity markets fall
(negative leverage effect), or that if uncertainty increases, then the credit spreads of corporates or
governments should widen. Again risk and uncertainty are used interchangeably. 2016
provides an example that one should not mix risk and uncertainty. In 2016 many events
happened where it was impossible to calculate risk - Brexit, the election of Trump, increasing
geopolitical tensions in the Middle East, and political instability in major countries such as
Brazil and Turkey. There were, for example, no data to assess the risk of the
Trump election. But if large uncertainty meant large risk, then heavy market reactions
should have followed. Instead, most asset classes ended the year with positive returns; there was
almost no market reaction to the events. Furthermore, plotting an uncertainty index such
as policyuncertainty.com versus credit spreads measured in USD shows that uncertainty
increased in 2016 while the spreads fell.

3.3 Portfolio Construction Examples


3.3.1 Heuristic Allocation: Static 60/40 Portfolio
A classic portfolio construction is the so-called '60/40 portfolio'. After each time period,
the portfolio values are rebalanced such that the value of equity is 60 percent of the actual
wealth level and the fixed income government bond investment has weight 40 percent.

The two components equity and government bonds are equally weighted portfolios of
stocks and bonds ('dollar weighted'). The 60/40 portfolio in the US has generated a 4
percent average annual return back to 1900.

The 60/40 portfolio turns out to be insufficiently diversified when markets are distressed
or booming. The dot-com bubble and the financial crisis of 2008 revealed that
different asset classes moved in the same direction and behaved as if they were all of
the same type, although capital diversification was maintained: risk weights are not the
same as dollar weights.

Deutsche Bank (2012) reports the following risk contributions, using volatility risk
measurement, for 60/40 portfolios with the S&P 500 and US 10y government bonds. The
long-term risk contribution, 1956 to 2012, by asset class was 79/21 percent - different
from the 60/40 capital diversification. The risk contribution of
US government bonds in extreme market periods varied between 53% in 1998 and 7% in 1973.
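The gap between dollar weights and risk weights can be made explicit with a small computation. The volatilities and correlation below are illustrative assumptions (not the Deutsche Bank estimates), but they show the same effect: equity dominates the risk of a 60/40 portfolio.

```python
# Risk contribution of asset i: RC_i = w_i * (C w)_i / (w' C w).
# With equity far more volatile than bonds, the 60/40 dollar split
# translates into a much more lopsided risk split.
w = [0.60, 0.40]                       # dollar weights: equity / bonds
vol = [0.15, 0.08]                     # assumed volatilities (illustrative)
rho = 0.2                              # assumed equity-bond correlation
C = [[vol[0] ** 2,           rho * vol[0] * vol[1]],
     [rho * vol[0] * vol[1], vol[1] ** 2]]

Cw = [C[i][0] * w[0] + C[i][1] * w[1] for i in range(2)]
port_var = w[0] * Cw[0] + w[1] * Cw[1]
risk_contrib = [w[i] * Cw[i] / port_var for i in range(2)]
print(risk_contrib)   # equity carries well over 80% of the portfolio risk
```

The risk contributions sum to one by construction (Euler decomposition of portfolio variance); only the split between the two assets depends on the assumed parameters.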

The left panel in Figure 3.2 illustrates the strong positive correlation between equity
and bonds: in the left panel, worldwide equity portfolios are compared to a balanced
equity and bond portfolio. The linear relationship between the two returns with low
variability indicates that a single global equity portfolio is as good as a balanced equity-bond
portfolio. The performance and risk of traditional balanced portfolios is mostly driven by
the equities quota. The R² is 95%, i.e. 95% of the risk is explained by equity risk. Hence,
asset classes consist of a bundle of 'risk factors', where the same risk factors can belong
to several asset classes. This extends to all assets in the case of systemic liquidity events:
the monthly dollar returns between the classic asset classes and alternative classes show
rather low correlation between 2000 and 2007 but increase sharply during the GFC and
remain elevated as the sovereign debt crisis follows in 2011. This failure of alternatives
to diversify during the GFC led to critique of the diversification concept based on
asset classes per se, see Figure 3.2. In the middle panel commodities and hedge funds are
added to the balanced portfolio. While the variability increases, one still sees that equity
risk factors are driving the returns; the allocation of risk is only slightly improved. Still
90% of the risk is explained by the equity risk factor. Finally, if one replaces equity by bonds
in the right panel, a cloud-type scatter plot follows. This indicates that equity and not
bond risk factors are the return drivers.

The time-varying correlation in Figure 2.26 shows that the correlation between stocks
and bonds varies over time. Historically, periods of rising inflation and heightened
sovereign risk have driven stock and bond correlations sharply positive. In contrast,
correlations often turned negative when inflation and sovereign risk were at low levels.

If stocks and bonds can be described by their exposure to macroeconomic factors,
their correlations could be determined entirely through their relative exposures to the
same set of factors. Therefore, why not measure the exposures of stocks and bonds to
common factors and act according to the volatility and correlation forecasts instead of using

Figure 3.2: Left panel: monthly return equities world vs monthly return balanced portfolio
(equities world: 50%, bonds world: 50%), Bloomberg: 12/1998-3/2013. Middle
panel: monthly return equities world vs monthly return balanced portfolio (equities
world: 40%, bonds world: 40%, commodities: 10%, hedge funds global: 10%); commodities
database: DJUBSTR, hedge funds database: HFRXG. Right panel: monthly
return bonds world vs monthly return balanced portfolio (equities world: 50%, bonds
world: 50%), Bloomberg: 12/1998-3/2013, local data.

the static 60/40 rule? This is not effective since the true factor structure is unobservable,
economic factors are not investable, or investor sentiment impacts the correlation
structure, which makes the prediction of changing correlations difficult. Kaya et al. (2011)
find that the economic factors growth and inflation have accounted for only 2 percent of
the total volatility of the 60/40 portfolio in the US since 1957, while 98 percent of the
volatility of the portfolio has been the result of missing factors, mis-specified factors, or
risks that are specific to each asset class.

Summarizing the 60/40 asset allocation based on asset classes: correlations between
asset classes are time-varying, not risk-stable and difficult to forecast. Risk weights are
not the same as dollar weights. Asset classes seem not to be the right level for risk
aggregation.

3.3.2 Optimal Allocation: Dynamic Merton Model


We consider the Merton model (1973) of dynamic optimal consumption and investment.
This seminal contribution is the benchmark model for dynamic optimal decision making.

The choice variable is a vector (c, φ) with consumption rate c, φ the fraction of wealth
invested in the risky asset, and 1 − φ in the risk-less asset. The state variable W_t represents
wealth and utility reads

u(t, c) = e^{-rt} \frac{c^a}{a}, \quad 0 < a < 1.

The individual optimizes expected utility:

V(W_0) = \max_{c,\phi} E\left[\int_0^\infty e^{-rt} \frac{c_t^a}{a}\, dt\right], \quad 0 < a < 1.

The maximization is done subject to the dynamic budget constraint for the wealth
dynamics W_t. Wealth growth is driven by the price evolution of a single risky asset S,
a risk-free asset and the consumption rate at each date. The risky asset S follows a
geometric Brownian motion with constant drift µ and volatility σ, and the risk-free asset
grows at rate r. Inserting this information provides us with the dynamic budget constraint

dW = (\phi\mu W + (1 - \phi) r W - c)\, dt + \sigma\phi W\, dB

with B the standard Brownian motion. The optimality principle of Bellman starting in
t_0 for a period t_0 + dt reads:

V(t_0, W_0) = \max_{c,\phi} E\left[\int_{t_0}^{t_0+dt} u(t, c, W)\, dt + V(t_0 + dt, W_0 + dW)\right]. \quad (3.7)

Hence, the value at t_0 is equal to the sum of optimal utility over the short time dt plus
the value reached at t_0 + dt, i.e. all decisions are optimal after t_0 + dt. Expanding the
future value in a Taylor series and using the dynamics of the assets transforms the above
equation into a non-linear partial differential equation for the value function V. The
solution of this equation implies the following optimal strategies, see Section 7 for
details:^{10}

V(W) = \alpha^* W^a, \quad c^* = W (a\alpha^*)^{\frac{1}{a-1}}, \quad \phi^* = \frac{\mu - r}{\sigma^2} \frac{1}{1 - a} \quad (3.8)

where \alpha^* is the explicit solution of an algebraic equation involving the preference
and growth rate parameters. The optimal investment in the risky asset \phi^* is equal to
the market price of risk (MPR) \frac{\mu - r}{\sigma^2} times the relative risk aversion \frac{1}{1-a}. The MPR
is itself proportional to the Sharpe ratio (which is also the solution of the Markowitz
problem). This validates the claim that the Markowitz solution also holds in a dynamic
context unless the investment opportunity sets are changing over time, see Section 4.7.1.
Optimal consumption is proportional to the wealth level, which is reasonable. There
are many extensions of the basic Merton model - such as many assets, adding income,
allowing for a bequest motive, or adding linear investment constraints. In most of these
extensions, however, analytical tractability is lost.

10 Only very few problems can be explicitly solved.
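The optimal risky fraction in (3.8) is a one-line computation; the parameter values below are illustrative assumptions, not taken from the text:

```python
# Merton optimal risky-asset fraction: phi* = (mu - r)/sigma^2 * 1/(1 - a).
# All parameter values are illustrative assumptions.
mu, r, sigma, a = 0.07, 0.02, 0.20, 0.5

phi_star = (mu - r) / sigma**2 * 1.0 / (1.0 - a)
```

With these numbers φ* = 2.5, i.e. a leveraged position; the fraction is constant over time because the investment opportunity set is constant.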



3.3.3 Optimal Allocation: Goal Based Investment


The ideas of Merton can be applied to maximizing the probability of financing several
goals (liabilities) at different future dates, since private clients often think in terms of
goals G. They are interested in choosing an optimal strategy such that the probability of
financing their future goals is maximized. The trivial case is where initial wealth W_0 is
larger than the risk-free present value of all goals.

Assume that risk is needed to finance the goals. Goal based investment (GBI) means
to find a strategy φ(t) which maximizes the probability

\max_\phi P(W_T \geq G_T). \quad (3.9)

To this objective function one adds the asset dynamics, the initial wealth level and
additional constraints. Assume that there are N risky assets which are all coupled by
a time-varying but deterministic covariance matrix C and where each asset has a time-varying
expected return µ(t). There is a risk-less asset with a time-varying deterministic
short-term rate r(t). The asset dynamics defines the wealth dynamics dW_t starting at
W_0. The optimal policy, using the Bellman Principle, is derived by Browne (1999):

\phi^S(t) = \frac{C^{-1}(t)\Theta(t)}{\sqrt{\int_t^T \Theta(s)'\Theta(s)\, ds}}\, \frac{\varphi(N^{-1}(z(t)))}{z(t)}\, W_t \quad (3.10)

with the discount factor D(t, T) = e^{-\int_t^T r(s)\, ds}, \varphi the density function of a standard
normal distribution, N the associated cumulative distribution function, \Theta = C^{-1}(\mu - re)
the market price of risk (MPR), e an N-dimensional unit vector, and z(t) = \frac{W_0}{G_T D(t, T)}
the percentage of the discounted goal reached at time t.

The optimal investment formula (3.10) states:

• At each time t ≤ T, optimal investment is a linear function of wealth.

• The linear function is weighted by a time-dependent part proportional to the MPR
and by the ratio \varphi(N^{-1}(z(t)))/z(t), which measures how much of the goal has been
achieved at time t.

• The investor or asset manager at each date t observes the optimal wealth W_t and
then chooses the investment for the next (infinitesimal) period according to the
optimal formula. The problem can be discretized in order to obtain real investment
periods.

• At each date the deterministic expected means and covariances enter. These
functions can be determined by the CIO office or the advisory function using a SAA
and TAA approach. Besides the actual values, the values for the remaining
life-time also matter. Therefore, changing these forecast values at time t implies a
reshaping of the optimal investment policy at this date. Given the simplicity of
the optimal formula, the investment universe can be set up with a large number of
different assets, ensuring diversification of wealth growth.

• Suppose that all assets lose value from the beginning for some time. If wealth
has dropped enough in value, there is not enough time left for the wealth level to
beat the goal. Then the investor has to borrow, to inject additional money, or to
reduce the size of the goal. Browne shows in an example that for T = 10y wealth
has to drop by more than 62% in the first year in order to create a need to borrow.
If there is only one month left, the investor must borrow unless wealth has already
reached 88% of the investment goal.

The approach can be generalized to include income and consumption streams, beating a
benchmark portfolio and controlling for downside risk, see Browne (1999).
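A scalar sketch of formula (3.10) with one risky asset and constant parameters can be written with the standard library only; Θ = (µ − r)/σ² is then constant and the integral reduces to Θ²(T − t). All parameter values below are illustrative assumptions:

```python
from math import exp, sqrt
from statistics import NormalDist

# Scalar sketch of the Browne policy (3.10): one risky asset, constant mu,
# sigma, r. All parameter values are illustrative assumptions.
mu, sigma, r = 0.06, 0.18, 0.02
W0, G_T, T, t = 70.0, 100.0, 10.0, 0.0

nd = NormalDist()
theta = (mu - r) / sigma**2              # Theta = C^-1 (mu - r) in the scalar case
D = exp(-r * (T - t))                    # discount factor D(t, T)
z = W0 / (G_T * D)                       # fraction of the discounted goal reached
phi_S = (theta / sigma**2) / sqrt(theta**2 * (T - t)) \
        * nd.pdf(nd.inv_cdf(z)) / z * W0  # optimal EUR/USD amount in the risky asset
```

Note that z must lie in (0, 1) for N^{-1}(z) to be defined, and φ^S can exceed wealth, i.e. the policy may prescribe leverage when the goal is far away.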

3.3.4 Optimal Allocation: Markowitz


3.3.4.1 The Two-Asset Case
Consider two assets and two portfolios A, B shown in the portfolio expected return and
standard deviation space in Figure 3.3.

Figure 3.3: Portfolio frontiers in the two-asset case. The portfolio opportunity set is a
hyperbola in the portfolio coordinates expected return and standard deviation.

Solving the mean-variance optimization problem in this two-risky-asset case shows
that the portfolio opportunity set is a hyperbola in the (σ, µ)-portfolio coordinates (line
3). It is maximally bowed for perfect negative correlation. The lower the correlation is, the
higher are the gains from diversification. For perfect positive or negative correlation the
hyperbola degenerates to straight lines. Line 1 represents all possible portfolio choices
if there is perfect positive correlation, +1. Similarly, for perfect negative correlation the
straight lines 2a and 2b follow. In the presence of perfect negative correlation we can fully
eliminate portfolio risk (point C).
The following definitions are common.

Definition 33. 1. If a portfolio offers a larger expected return than another portfolio
for the same risk, then the latter portfolio is strictly dominated by the first one.

2. Portfolios that are not strictly dominated are called mean-variance efficient.
The set of these portfolios forms the efficient frontier.

3. The portfolio φ_m at the point D is the global minimum variance (GMV) portfolio.

The lines 1, 2b and the line between D and B are efficient frontiers.
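The degeneration of the hyperbola at perfect correlation can be checked numerically; the volatilities below are illustrative assumptions:

```python
from math import sqrt

# Two-asset portfolio volatility for different correlations; the asset
# volatilities are illustrative assumptions.
s1, s2 = 0.20, 0.10

def port_vol(w: float, rho: float) -> float:
    """Volatility of a portfolio with weight w in asset 1 and 1 - w in asset 2."""
    var = (w * s1) ** 2 + ((1 - w) * s2) ** 2 + 2 * w * (1 - w) * rho * s1 * s2
    return sqrt(max(var, 0.0))  # guard against tiny negative rounding

# With rho = -1, the weight w = s2/(s1 + s2) removes all risk (point C);
# with rho = +1 the opportunity set degenerates to the straight line 1.
w_hedge = s2 / (s1 + s2)
vol_minus1 = port_vol(w_hedge, -1.0)
vol_plus1 = port_vol(w_hedge, +1.0)
```

At w = s2/(s1 + s2) and ρ = −1 the two legs offset exactly and portfolio risk is zero, while for ρ = +1 the volatility is just the weighted average of the two asset volatilities.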

3.3.4.2 Many Risky Assets


Considering many assets does not add new economic insights. We therefore keep this
section short. The assumptions of the Markowitz model are:

1. There are N risky assets and no risk-free asset. Prices of all assets are exogenously
given.

2. There is a single time period. Hence risks cannot be distributed over time but only in
the cross-section.

3. There are no transaction costs. This assumption can nowadays be relaxed by
solving the model numerically.

4. Markets are liquid for all assets.

5. Assets are infinitely divisible. Without this assumption, we have to rely on integer
programming, which makes sense and which today is feasible.

6. If borrowing and lending is excluded, full investment holds, \langle e, \phi\rangle = 1 with e =
(1, \ldots, 1) \in \mathbb{R}^N.

7. Portfolios are selected according to the mean-variance criterion.

8. The vectors e, µ are linearly independent. If they are dependent, then the optimization
problem does not have a unique solution.

9. All first and second moments of the random variables exist, i.e. the mean and
covariance are well-defined.

We define the auxiliary variables a = \langle\mu, C^{-1}\mu\rangle, b = \langle e, C^{-1}e\rangle, c = \langle e, C^{-1}\mu\rangle, \Delta =
ab - c^2 (the determinant of A), and

A = \begin{pmatrix} a & c \\ c & b \end{pmatrix}.

Proposition 34. Consider N risky assets and the above assumptions. Then the Markowitz
problem

\min_{\phi \in \mathbb{R}^N} \frac{1}{2}\langle\phi, C\phi\rangle \quad (M) \quad (3.11)

s.t. \langle e, \phi\rangle = 1, \quad \langle\mu, \phi\rangle = r

has a unique solution

\phi_{MV} = r\,\phi_1^* + \phi_2^* \quad (3.12)

with

\begin{pmatrix}\phi_1^* \\ \phi_2^*\end{pmatrix} = A^{-1}\begin{pmatrix}C^{-1}\mu \\ C^{-1}e\end{pmatrix}. \quad (3.13)

See Section 7 for the proof. The portfolio weights are linear in the expected portfolio
return r. Inserting \phi_{MV} into the variance implies the optimal minimum portfolio variance
\sigma_p^2-hyperbola:

\sigma_p^2(r) = \langle\phi_{MV}, C\phi_{MV}\rangle = \frac{1}{\Delta}\left(r^2 b - 2rc + a\right). \quad (3.14)

Diversification in the mean-variance model means that adding more assets causes the
efficient frontier to widen: for the same risk, a higher expected return follows (see Figure
3.4).
The Markowitz model fails to be stable in the following sense. Consider a GMV
portfolio with two assets; the optimal portfolio then only depends on the covariance but
not on the returns. Suppose that both assets have a volatility of 20 percent and full positive
correlation of 1. Then the optimal weights are 50 percent in each asset. Suppose next
that asset 1 has only 19.9% volatility, all other numbers unchanged. Then 100 percent
is invested in this asset and zero in the second one.

Example

Consider three assets with expected returns (20%, 30%, 40%) and covariance matrix

C = \begin{pmatrix} 0.10 & 0.08 & 0.09 \\ 0.08 & 0.15 & 0.07 \\ 0.09 & 0.07 & 0.25 \end{pmatrix}.

Figure 3.4: Different efficient frontiers for different numbers of assets. Adding new assets
allows for a higher expected return for a given risk level (measured by the portfolio
standard deviation - Stdev). The portfolio with the lowest standard deviation is the
global minimum variance (GMV) portfolio (Ang [2012]).

We assume that the investor expects a minimum return of r = 30%. He could then fully
invest in asset 2 to achieve this return goal. But the optimization shows that he can
reach this target with lower risk, where

\phi_{MV} = (0.28, 0.43, 0.28)'.

The investor is fully invested and long in all assets. The risk of the optimal portfolio
is σ_p = 10.7 percent, which is less than the 15 percent if the investor only invests in
the second asset. We compare the Markowitz portfolio with the equally weighted (EW)
portfolio and the risk-parity portfolio of inverse volatility (IV) - that is to say, investment
in each asset is inversely proportional to its volatility. We get

• \phi_{MV} = (0.28, 0.43, 0.28)',

• \phi_{EW} = (0.33, 0.33, 0.33)',

• \phi_{IV} = (0.48, 0.32, 0.19)'.


The MV strategy considers variances and covariances, EW does not consider them at
all, and the risk-parity strategy only considers variances. The statistics for the three
strategies are:

Strategy Expected return Portfolio σP


MV 35.7% 10.7%
EW 29.7% 10.6%
IV 26.8% 9.6%
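The allocation φ_MV can be reproduced with the closed form of Proposition 34, solving the two first-order conditions for the Lagrange multipliers. This is a sketch using the example's own inputs; note that the quoted risk of 10.7 percent is recovered here as the portfolio variance ⟨φ, Cφ⟩:

```python
import numpy as np

# Reproduce the three-asset example via the Markowitz closed form.
mu = np.array([0.20, 0.30, 0.40])
C = np.array([[0.10, 0.08, 0.09],
              [0.08, 0.15, 0.07],
              [0.09, 0.07, 0.25]])
e = np.ones(3)
r = 0.30                                       # expected-return constraint

Ci = np.linalg.inv(C)
a, b, c = mu @ Ci @ mu, e @ Ci @ e, e @ Ci @ mu
A = np.array([[a, c], [c, b]])
lam = np.linalg.solve(A, np.array([r, 1.0]))   # Lagrange multipliers
phi_mv = lam[0] * Ci @ mu + lam[1] * Ci @ e    # optimal weights

port_ret = phi_mv @ mu                         # meets the 30% target
port_var = phi_mv @ C @ phi_mv                 # about 0.108
```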

Example

Consider two assets with expected returns of µ_1 = 1 and µ_2 = 0.9 and

C = \begin{pmatrix} 0.10 & -0.10 \\ -0.10 & 0.15 \end{pmatrix}.

Asset 1 seems more attractive than asset 2: it has a higher expected return and lower
risk. Naively one would invest fully in the first asset. But the negative correlation makes
an investment in asset 2 necessary to obtain an optimal allocation. The expected return
constraint is set equal to r = 0.96. We consider four strategies:

• φ_1 = (1, 0), full investment in asset 1.

• φ_2 = (1/2, 1/2), an equal distribution.

• φ_3 = (5/9, 4/9), the optimal Markowitz strategy without the expected return
constraint.

• φ*_{MV} = (0.6, 0.4), the optimal Markowitz solution with the expected return constraint.

The following expected portfolio returns and risks (reported as the portfolio variance
σ_P²) hold for the different strategies:

Strategy µ σ_P²
φ_1 1 0.1
φ_2 0.95 0.0125
φ_3 0.955 0.011
φ*_{MV} 0.96 0.012

φ_1 satisfies the expected return condition but its risk is much larger than in all other
strategies - a lack of diversification. The risk of φ_3 is minimal but the return is smaller than
required. To generate the return and keep risk minimal, 40 percent has to be optimally
invested in the not very attractive asset. This is the Markowitz phenomenon: to reduce
the variance as much as possible, a combination of negatively correlated assets should be
chosen.
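The four strategies can be compared directly from the example's inputs (a sketch; the risk column of the table above corresponds to the variance ⟨φ, Cφ⟩):

```python
import numpy as np

# Compare the four strategies of the two-asset example.
mu = np.array([1.0, 0.9])
C = np.array([[0.10, -0.10],
              [-0.10, 0.15]])

strategies = {
    "phi1": np.array([1.0, 0.0]),     # full investment in asset 1
    "phi2": np.array([0.5, 0.5]),     # equal distribution
    "phi3": np.array([5/9, 4/9]),     # GMV portfolio, no return constraint
    "phiMV": np.array([0.6, 0.4]),    # Markowitz with return constraint r = 0.96
}
# expected return and variance per strategy
stats = {name: (w @ mu, w @ C @ w) for name, w in strategies.items()}
```

The output shows the Markowitz phenomenon numerically: φ_MV meets the 0.96 return target with a variance of 0.012, far below the 0.1 of investing only in asset 1, while the GMV portfolio φ_3 has the smallest variance of all but misses the return target.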

Figure 3.5 shows that an under-diversified portfolio follows for portfolios on the
efficient frontier, which becomes more pronounced for higher risks.

Figure 3.5: Efficient allocations for 21 different portfolios. The first portfolio is the GMV
portfolio; moving to the right, optimal portfolios on the efficient frontier follow. Data
1991-2016, monthly data, long-only portfolio constraint.

Furthermore, the steep vertical changes in the asset allocation indicate that the
allocations are not robust: small changes in covariance data lead to large changes in the
asset allocations. Does a Markowitz portfolio provide reasonable diversification for
portfolios over time? The answer, see Figure 3.10, is again no: one observes
under-diversification and non-stability of the asset allocation.

3.3.4.3 Mutual Fund Theorem


The Mutual Fund Theorem (MFT)^{11} is one of the most important results for investment
theory and investment practice. Under suitable assumptions, which are satisfied in the
Markowitz model, the optimal investment strategy is to invest in the risk-free asset and
a second risky asset (ETF, mutual fund) which is a linear combination of the risky assets
available on the financial market. The Proposition for the case with risky assets only
reads:

Proposition 35. Any minimum variance portfolio can be written as a convex combination
of two distinct minimum variance portfolios.
11 Sometimes called the 'two fund theorem' or 'separation theorem'.

See Section 7 for the proof.

Formally, if \phi^*_{MV}(r) is any optimal minimum variance portfolio, then for any two other
distinct optimal minimum variance portfolios \phi_1^*(r), \phi_2^*(r) there exists a function \nu(r)
such that

\phi^*_{MV}(r) = \nu\,\phi_1^*(r) + (1 - \nu)\,\phi_2^*(r). \quad (3.15)

The entire mean-variance frontier can be generated from just two distinct portfolios.
This holds since the efficient frontier is a one-dimensional affine subspace of \mathbb{R}^N. The
Mutual Fund Theorem allows investors to generate an optimal portfolio by searching
for cheaper or more liquid portfolios and investing in these portfolios in the prescribed
way. This theorem led to the growth of the mutual fund and ETF industry. The Mutual
Fund Theorem also holds for some dynamic models such as the Merton model of the last
sections. But if there are risk sources for assets which cannot be hedged, then more than
two funds are needed to construct an optimal investment strategy, see Section 4. In
general, the structure of the investor's preferences and the structure of the asset markets
together determine whether a mutual fund theorem is valid.

3.3.4.4 Markowitz Model with a Risk-Free Asset


If we assume that one asset is risk-less and the other ones are risky, the whole optimization
program of Markowitz can be repeated. Most properties of the risky-only asset case carry
over to the case with a risk-free asset.

The efficient frontier is now a straight line which has at least one point in common with
the risky-only efficient frontier - the point where the portfolio is fully invested in risky
assets. The portfolio where the two frontiers intersect is the tangency portfolio T (see
Figure 3.6, left panel).

Natural candidates for the mutual fund theorem are the tangency portfolio and the
risk-less-asset investment. In the right panel of Figure 3.6, different portfolios on the
efficient frontier are shown. Investors can add cash to become more conservative
or borrow cash for an aggressive investment. The portfolios on the Capital Market Line
(CML) depend on the investor's preferences θ in (3.1). The higher the risk aversion, the
closer is the point on the CML to the risk-free investment. Ang (2012) estimates an
aggregate risk aversion parameter value as follows. He calculates the optimal minimum
variance portfolio using USA, JPN, GBR, DEU, and FRA risky assets only. Then he
adds a risk-free asset and searches for the point on the CML that delivers the highest
utility. This point implies a risk aversion of θ = 3. The optimal portfolio with a risk-free
asset can be seen in Figure 3.6 in the region where the aggressive investor is shown. The
investor is long all risky assets and short the risk-free asset. But in reality, only
half of investors invest their money in the stock market and the remainder keep their
money risk free. In some European countries stock market participation is lower than 10
percent. This is the non-participation puzzle of mean-variance investing.

Figure 3.6: Mean-variance model with a risk-free asset. Left panel - straight-line efficient
frontier (CML), which is tangential to the efficient frontier when there are risky assets
only. The tangency point T is the tangency portfolio where investment in the risk-free
asset is zero. Right panel - investors' preferences on the efficient frontier. Moving from
the tangency portfolio to the right, the investor starts borrowing money to invest in the
risky assets. The investor is short cash in this region to finance the borrowing amount.

Geometry implies for the CML

\mu_p = R_f + \frac{\mu_T - R_f}{\sigma_T}\,\sigma_p

with \mu_T, \sigma_T the expected mean and standard deviation of the tangency portfolio,
respectively. The slope of the CML is the Sharpe ratio SR = \frac{\mu_T - R_f}{\sigma_T}. This is the price
of one unit of risk for an efficient portfolio.
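The tangency portfolio and the CML slope can be computed with the standard mean-variance formula φ_T = C⁻¹(µ − R_f e)/⟨e, C⁻¹(µ − R_f e)⟩ (a textbook result, not stated explicitly in the text above); all inputs below are illustrative assumptions:

```python
import numpy as np

# Tangency portfolio and CML; Rf, mu, C are illustrative assumptions.
Rf = 0.02
mu = np.array([0.06, 0.08, 0.10])
C = np.array([[0.04, 0.01, 0.00],
              [0.01, 0.09, 0.02],
              [0.00, 0.02, 0.16]])
e = np.ones(3)

excess = np.linalg.solve(C, mu - Rf * e)   # C^-1 (mu - Rf e)
phi_T = excess / (e @ excess)              # tangency weights, sum to 1
mu_T = phi_T @ mu
sigma_T = np.sqrt(phi_T @ C @ phi_T)
sharpe = (mu_T - Rf) / sigma_T             # slope of the CML

def cml(sigma_p: float) -> float:
    """Expected return of the efficient portfolio with volatility sigma_p."""
    return Rf + sharpe * sigma_p
```

By construction the CML passes through the risk-free point (0, R_f) and through the tangency portfolio (σ_T, µ_T).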

3.3.4.5 Mean-Value-at-Risk Portfolios


One critique of the mean-variance criterion for optimal portfolio selection concerns
the variance as a symmetric risk measure: why penalize the upside in portfolio
selection? Also, the variance is not seen as a true measure of risk since it fails to
detect the states that reflect stress situations. Risk-sensitive asset managers prefer to
use a mean-downside risk approach such as mean-Value-at-Risk (VaR) instead.

To gain some idea about stress periods, Table 3.1 reports data about periods when
the Swiss stock market was under stress. Besides the maximum drawdown, the periods
when prices were falling and when they rebounded are shown. The last two periods
represent the global financial crisis and the dot-com bubble, respectively. On average it
takes longer for the markets to recover than to drop; a second observation is the
heavy maximum drawdowns. This illustrates that the evaporation of diversification -
correlations becoming close to 1 - in times of market stress also matters for optimal
portfolio choice.

Period 1928-1941 1961-1968 1972-1979 1989-1992 2000-2005 2008-2013 Av.

Low 1935 1966 1974 1990 2002 2008
MDD % 41.3 37.5 47.2 20.2 42.3 34.1 36
yfp 7 5 2 1 2 2 2.86
yrp 6 2 5 2 3 5 3.57

Table 3.1: Periods involving large drawdowns in Swiss equity markets. The drawdown is
the measurement of the decline from a historical peak. The maximum drawdown (MDD)
up to time T is the maximum of the drawdown over the overall time period considered,
yfp means years with falling prices, yrp years with rising prices and Av. average (Kunz
[2014]).

We consider mean-VaR portfolio optimization. VaR(a) is the minimum dollar amount
an investor can lose with a confidence of 1 − a for a given holding period where the portfolio
is not changed.^{12} If the portfolio returns are normal N(µ, σ), the dollar amount VaR(a)
for a unit time period is

VaR(a) = \sigma k(a) + \mu, \quad (3.16)

where µ is the portfolio return, σ the volatility of the portfolio return, and k(a) is a
tabulated function of the confidence level 1 − a; see Section 7 for a proof of (3.16). Hence,
under normality, VaR is proportional to volatility. This translates into the optimization
problem: mean-variance is equivalent to mean-VaR by rescaling the volatility.

Figure 3.7 shows an ecient frontier and several VaR constraints, i.e. the problem is
to maximize expected return under the constraint

P (R ≤ x) ≤ a . (3.17)

12 To gain intuition for VaR, consider:

• A stock with an initial price S0 of USD 100 and the random price S1 in one year.
• An investor faces a loss if S1 < 100.
What is the probability that the loss exceeds USD 10 - that is to say, P(100 − S1 > 10) = ? Here
the loss amount is given; the probability of the loss is unknown. VaR answers a related question: the
investor searches for a USD amount - the VaR - such that the probability of a larger loss is not above the
predefined quantile level. That is to say,
P(100 − S1 > ?) ≤ a%,

where ? is the dollar VaR amount. Hence, the probability of the loss is given; the loss amount is
unknown.

These VaR(a) constraints define straight lines under the normality assumption. The
impact on the optimal portfolio choice is as follows. Start with a benchmark loss capacity
of, say, x = −3% (the straight blue line). The intersection between this line and the mean-variance
frontier selects the optimal mean-VaR portfolio. If loss capacity increases, the line moves
parallel to the right, implying higher possible optimal risks and returns. The same effect
follows if, for a fixed loss capacity, the confidence level is lowered - more risk and return
becomes optimal.


Figure 3.7: Mean-drawdown optimal portfolio. The straight lines represent the VaR
constraints. If loss capacity increases, the VaR lines move to the right, indicating higher
risk and returns in the optimal portfolio. The same result follows if the confidence level
is lowered.

We conclude with two VaR calculations. Consider a position with value USD 1
million. Assuming normality of returns, the goal is to calculate the one-day VaR at the
95 percent level. The estimated daily mean is 0.3 percent and the volatility is 3 percent.
With k(5%) = 1.6449 follows

VaR(a) = (1.6449 × 0.03 + 0.003) × USD 1mn = USD 52,347.

Therefore, on average in 1 out of 20 days the loss is larger than the calculated VaR of
USD 52,347.
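The calculation, with the quantile taken from the standard library rather than a table, reads:

```python
from statistics import NormalDist

# One-day 95% VaR of a USD 1 million position under normal returns,
# using the daily mean and volatility quoted in the text.
position = 1_000_000
mu_d, sigma_d, alpha = 0.003, 0.03, 0.05

k = NormalDist().inv_cdf(1 - alpha)          # about 1.6449
var_usd = (k * sigma_d + mu_d) * position    # about USD 52,347
```

The small difference to the text's figure comes only from rounding k to four decimals.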

To be more realistic for asset management, we calculate VaR for a euro investor with the
following portfolio: there are three equity risk sources (DAX, DJ, Novartis), two FX
risks, USDEUR (spot 1.05) and CHFEUR (spot 0.8), and US interest rate risk for the
bond, i.e. 6 risk factors.

Position Type Market Price Currency


1 10 Equity Funds Shares DAX 1'000 Euro
2 5 Equity Funds Shares DJ 5'000 USD
3 200 Novartis Stocks 50 CHF
4 10 US Treasury, 10y, Zero Coupon Bonds 800 USD

Table 3.2: Initial value of the investor's portfolio.

The goal is to calculate the weekly Euro VaR on a 95% level.

We first need the variance and covariance information, then the calculation of the
exposure in EUR and the allocation of the EUR exposure to the risk factors, using the
market data in the following table.

σ DAX DJ Novartis USDEUR CHFEUR US 10y


DAX 30% 1 0.5 0.6 0.4 0.55 -0.2
DJ 20% 0.50 1 0.55 0.77 0.66 -0.4
Novartis 25% 0.60 0.55 1 0.23 0.72 -0.22
USDEUR 15% 0.40 0.77 0.23 1 0.73 -0.49
CHFEUR 5% 0.55 0.66 0.72 0.73 1 -0.21
US 10y Treasury 10% -0.20 -0.4 -0.22 -0.49 -0.21 1

Table 3.3: Market Data.

Table 3.4 shows the EUR exposure and its allocation to the risk factors. The portfolio variance is given by σ_p² = ⟨X, CX⟩

Posit. Price EUR Exp DAX DJ Nov. USDEUR CHFEUR USD10y


10 DAX 1'000 10'000 10'000
5 DJ 5'000 26'250 26'250 26'250
200 Novartis 50 8'000 8'000 8'000
10 US Treas. 800 8'400 8'400 8'400
Sum (X ) 10'000 26'250 8'000 34'650 8'000 8'400

Table 3.4: EUR exposure and allocation of the exposure to the risk factors.

where X is the EUR exposure vector allocated to the risk factors and C_ij = σ_i σ_j ρ_ij.
Calculating these matrix products gives σ_p² = 160'804'032. This is the value on an
annual basis. To obtain the result on a weekly basis, we get for the volatility

\sigma_w = \sqrt{160'804'032/52} = 1'758.

The critical value on the 95% level is k_{95\%} = 1.644853. Using

VaR_\alpha = \sigma k_\alpha \sqrt{T} = \sqrt{X'CX}\, k_\alpha \sqrt{T}

with zero drift, this implies the one-week EUR VaR

VaR = 1'758 × 1.644853 = 2'892 EUR.

We then get for the VaR contribution:

\frac{\partial\, \mathrm{VaR}}{\partial X} = k_\alpha \sqrt{T}\, \frac{CX}{\sqrt{X'CX}} = k_\alpha^2 T\, \frac{CX}{\mathrm{VaR}}.

This implies the contribution rule:

\mathrm{VaR} = \sum_j X_j \frac{\partial\, \mathrm{VaR}}{\partial X_j} = k_\alpha \sqrt{T} \sum_j X_j \frac{(CX)_j}{\sqrt{X'CX}} = \sum_j \mathrm{VaR}_j. \quad (3.18)

Applying this to the portfolio, the contribution of the US Treasury bond is negative:
due to its negative correlations to the other factors, it reduces the VaR by 6 percent.
The largest VaR contribution comes from the DAX risk factor with 31 percent, although
its exposure is only 10.5 percent. The contribution of USDEUR to VaR is 19 percent,
whereas its factor exposure is the largest one at 36 percent.
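The totals can be reproduced from Tables 3.3 and 3.4 (a sketch; the per-factor split of the contributions is sensitive to the exact inputs, but the Euler allocation (3.18) always sums to the total VaR):

```python
import numpy as np
from statistics import NormalDist

# Weekly 95% EUR VaR of the six-factor portfolio and per-factor contributions.
X = np.array([10_000, 26_250, 8_000, 34_650, 8_000, 8_400], float)  # EUR exposures
vol = np.array([0.30, 0.20, 0.25, 0.15, 0.05, 0.10])                # annual vols
rho = np.array([                    # correlations, Table 3.3
    [1.00, 0.50, 0.60, 0.40, 0.55, -0.20],
    [0.50, 1.00, 0.55, 0.77, 0.66, -0.40],
    [0.60, 0.55, 1.00, 0.23, 0.72, -0.22],
    [0.40, 0.77, 0.23, 1.00, 0.73, -0.49],
    [0.55, 0.66, 0.72, 0.73, 1.00, -0.21],
    [-0.20, -0.40, -0.22, -0.49, -0.21, 1.00],
])
C = np.outer(vol, vol) * rho        # covariance matrix

ann_var = X @ C @ X                 # about 160'804'032 (annual basis)
sigma_w = np.sqrt(ann_var / 52)     # weekly volatility, about EUR 1'758
k = NormalDist().inv_cdf(0.95)      # 1.644853
var_eur = k * sigma_w               # weekly VaR, about EUR 2'892

# Euler allocation (3.18): contributions sum to the total VaR.
contrib = k * X * (C @ X) / np.sqrt(ann_var) / np.sqrt(52)
```

The US Treasury contribution (last entry of `contrib`) is negative, reflecting its negative correlations to the other five factors.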

3.3.4.6 SAA and TAA


The Markowitz optimization approach can be used to define the SAA and TAA rigorously.
We follow Leippold (2011), Lee (2000) and Roncalli (2014). Consider the optimization
problem (3.1):

\max_\phi \left\{ \langle\phi, \mu\rangle - \frac{\theta}{2}\langle\phi, C\phi\rangle \right\}

where we assume the full investment condition. Then, the solution can be written as a
sum of the GMV portfolio and a second portfolio \phi_X:^{13}

\phi = \phi_{GMV} + \phi_X.

To introduce the SAA, we use the unconditional long-term (equilibrium) mean of the
returns. Adding and subtracting the long-term mean \tilde\mu in the second component, the
solution can be written after some algebra in the form:^{14}

\phi = \phi_{GMV} + \phi_S + \phi_T. \quad (3.19)

The second and the third component are the SAA and the TAA component, respectively.
The sum of the three components is an ecient portfolio.

13 We have
\phi_X = \frac{C^{-1}}{\theta}\left(\mu - \frac{\langle e, C^{-1}\mu\rangle}{\langle e, C^{-1}e\rangle}\, e\right).
14 \phi_S = \frac{1}{\theta}\, \frac{C^{-1}(\tilde\mu e' - e\tilde\mu')C^{-1}e}{\langle e, C^{-1}e\rangle} and \phi_T = \frac{1}{\theta}\, \frac{C^{-1}\left((\mu - \tilde\mu)e' - e(\mu - \tilde\mu)'\right)C^{-1}e}{\langle e, C^{-1}e\rangle}.

Each SAA component \phi_{j,S} is proportional to \tilde\mu_j - \tilde\mu_k for k \neq j. If the long-term
forecasts of all assets are the same, the SAA component is zero. If the long-term forecasts
differ, the holdings are shifted to the asset with the higher equilibrium return. The size
of the pairwise bets depends on the relative risk aversion θ and the covariance C which enter
\phi_S. The sum of the GMV and the strategic portfolio is called the benchmark portfolio
in the asset management industry and the strategic mix portfolio in investment theory.

Each TAA component \phi_{j,T} is proportional to

\mu_j - \tilde\mu_j - (\mu_k - \tilde\mu_k)

for k \neq j. Hence, there are again pairwise bets between the assets, there are no
bets of an asset against itself, and the bets are of an excess return type with the SAA as
benchmark. For N assets, there are N(N − 1)/2 bets. As in the SAA case, the bets are
weighted by the covariance matrix and the relative risk aversion.
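The decomposition (3.19) can be verified numerically from the footnote formulas; θ, the current forecasts µ, the long-term means µ̃ and the covariance C below are illustrative assumptions:

```python
import numpy as np

# Verify phi = phi_GMV + phi_S + phi_T using the footnote formulas.
# theta, mu, mu_bar and C are illustrative assumptions.
theta = 3.0
mu = np.array([0.06, 0.08, 0.10])       # current expected returns
mu_bar = np.array([0.05, 0.07, 0.09])   # long-term (equilibrium) means
C = np.array([[0.04, 0.01, 0.00],
              [0.01, 0.09, 0.02],
              [0.00, 0.02, 0.16]])
e = np.ones(3)

Ci = np.linalg.inv(C)
b = e @ Ci @ e

phi_gmv = Ci @ e / b                    # GMV portfolio
phi_S = Ci @ (np.outer(mu_bar, e) - np.outer(e, mu_bar)) @ Ci @ e / (theta * b)
d = mu - mu_bar
phi_T = Ci @ (np.outer(d, e) - np.outer(e, d)) @ Ci @ e / (theta * b)

# Direct solution phi = phi_GMV + phi_X with phi_X from footnote 13.
phi_X = Ci @ (mu - (e @ Ci @ mu) / b * e) / theta
phi = phi_gmv + phi_X
```

Both φ_S and φ_T sum to zero (pure bets), so the total portfolio stays fully invested.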

3.3.4.7 Active Investment and Benchmarking


We consider mean-variance optimization relative to a general benchmark b.
The tracking error difference e between an actively managed portfolio φ and the benchmark
b defines the return difference

e = R(\phi) - R(b) = (\phi - b)' R, \quad (3.20)

where the difference

\psi := \phi - b \quad (3.21)

defines the active bets of the investor. Taking expectations, the expected tracking error
difference reads

\mu(\phi, b) = (\phi - b)' \mu. \quad (3.22)

The tracking error TE is by definition the volatility of the tracking error difference:

TE = \sigma(\phi, b) = \sigma(e) = \sqrt{(\phi - b)' C (\phi - b)}. \quad (3.23)
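Definitions (3.21)-(3.23) translate directly into code; the benchmark and active weights below are illustrative assumptions:

```python
import numpy as np

# Active bets, expected active return and tracking error; all inputs are
# illustrative assumptions.
mu = np.array([0.06, 0.08, 0.10])
C = np.array([[0.04, 0.01, 0.00],
              [0.01, 0.09, 0.02],
              [0.00, 0.02, 0.16]])
bench = np.array([0.50, 0.30, 0.20])    # benchmark weights b
phi = np.array([0.45, 0.35, 0.20])      # actively managed portfolio

psi = phi - bench                       # active bets (3.21), sum to zero
active_mu = psi @ mu                    # expected tracking error difference (3.22)
te = np.sqrt(psi @ C @ psi)             # tracking error (3.23)
```

Since both portfolios are fully invested, the bets ψ sum to zero - the investor finances overweights by underweights.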

The investor then chooses the bets such that the quadratic utility is maximized:

\max_\psi \left\{ \langle\psi, \mu\rangle - \frac{\theta}{2}\langle\psi, C\psi\rangle \right\} \quad (3.24)

Assuming full investment, the solution of this active risk and return program can be
written as a sum of two parts: one part is given by the benchmark and the second
one by the bets. Note that in general this bet vector is different from the tactical asset
allocation vector of the last section.

Proposition 36. Consider the active risk and return optimization in (3.24) with the full
investment constraint. The efficient frontiers are straight lines in the (σ(ψ, b), µ(ψ, b))-
space. Adding further linear constraints, the efficient frontiers are non-degenerate hyperbolas.
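As a sketch of the unconstrained version of program (3.24): dropping the full-investment constraint, the first-order condition gives ψ* = (1/θ)C⁻¹µ. The numbers below are illustrative and not taken from the text.

```python
import numpy as np

def optimal_active_bets(mu, C, theta):
    # FOC of max_psi <psi, mu> - theta/2 <psi, C psi>:  mu - theta * C psi = 0
    return np.linalg.solve(C, mu) / theta

def tracking_error(psi, C):
    # eq. (3.23): volatility of the tracking error difference, with psi = phi - b
    return float(np.sqrt(psi @ C @ psi))

mu = np.array([0.02, 0.01])          # expected excess returns vs. benchmark (illustrative)
C = np.array([[0.04, 0.01],
              [0.01, 0.09]])         # return covariance (illustrative)
theta = 4.0                          # relative risk aversion

psi = optimal_active_bets(mu, C, theta)
te = tracking_error(psi, C)
```

The higher the risk aversion θ, the smaller the bets and the tracking error, in line with the linear frontier of Proposition 36.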

3.3.4.8 Comparing Mean-Variance Portfolios with Other Approaches


We follow Ang (2012). Consider four asset classes - Barcap US Treasury (US govt
bonds), Barcap US Credit (US corporate bonds), S&P 500 (US stocks), and MSCI EAFE
(international stocks) - for the period 1978 to 2011. The different strategies are chosen monthly,
and the past five years of data are used to estimate the parameters:

• Mean-variance (MV), equal weights (EW), global minimum variance (GMV), and
equal risk contribution (ERC) are four strategies.

• Diversity weights, which are transformations of market weights using entropy as a
measure of diversity, define another strategy.

• Risk parity (RP). The optimal portfolio weights are chosen proportional to inverse
volatility. This approach mimics the leverage effect in the markets - if asset prices
fall, volatility rises. This strategy ignores the correlation structure.

• The Kelly rule.

Strategy          | Return | Volatility | Sharpe ratio | USD 100 after 33 years
------------------|--------|------------|--------------|-----------------------
Mean-variance     | 6.06   | 11.59      | 0.07         | 697
Market weights    | 10.25  | 12.08      | 0.41         | 2,503
Diversity weights | 10.14  | 10.48      | 0.46         | 2,422
EW                | 10     | 8.66       | 0.54         | 2,323
RP                | 8.76   | 5.86       | 0.59         | 1,598
GMV               | 7.96   | 5.12       | 0.52         | 1,252
ERC               | 7.68   | 7.45       | 0.32         | 1,149
Kelly rule        | 7.97   | 4.98       | 0.54         | 1,256

Table 3.5: Risk and return figures for the different investment strategies (Ang [2012]
and own calculations).

The mean-variance portfolio is the strategy with the worst performance: choosing
market weights, diversity weights, or EW leads to higher returns and lower risk. A reason
for the outperformance of the GMV portfolio is the tendency for low-volatility
assets to have higher returns than high-volatility assets.

3.3.5 Review Markowitz Model


So far we have considered properties of the Markowitz model without asking basic questions
about the pros and cons of the model. This is the objective of this section.

First, the Markowitz model is the most used model in portfolio allocation. There
are two main reasons for this fact. First, its economic assumption about the risk and
return trade-off is simple and convincing. Second, it defines a quadratic optimization
problem (QP): ⟨µ, φ⟩ − (θ/2)⟨φ, Cφ⟩ is maximized under a set of linear constraints
Aφ ≤ b, with A a matrix and b a vector. In its simplest form, QP problems are even
analytically solvable. Adding more constraints, the problem is approached numerically,
where decades of research in this direction provide efficient algorithms. Summarizing,
portfolio optimization with a benchmark, the tracking-error problem, the problem of
Black-Litterman with views, index sampling, turnover constraints, and the case of linear
and quadratic transaction costs are all QPs! This specific mathematical form is therefore
a success factor for mean-variance portfolio allocation.

Having explained why mean-variance analysis is successful, we consider its general
properties:

• Portfolio theory in general and the mean-variance approach in particular are as-
sumed to be related to diversification. But what does this really mean?

• Does optimal investment in the Markowitz model correspond to exposure to risk
factors, arbitrage factors, or hedging factors?

• How smooth and robust is mean-variance optimization?

We start with the diversification issue and recall that optimal investment
in the basic Markowitz model is proportional to C⁻¹µ: the information matrix C⁻¹, and not the
covariance matrix, mixes the expected returns for the optimal allocation. But what can be said about
the information matrix? Stevens (1998) derives an expression for the information matrix
in the Markowitz model. Using general matrix inversion, the OLS regression of Rt,i on
the returns of all other assets Rt,−i, plus a noise term εi,t which is normally distributed with
mean zero and variance σ̂²i, reads:

Rt,i = β0 + β′i Rt,−i + εi,t .

He proves that

C⁻¹ii = 1/(σ²i(1 − R²i)) ,   C⁻¹ij = −βij/(σ²i(1 − R²i)) .

Using this model, the elements of the information matrix follow as ratios between the estimated
betas and the unhedgeable risk of the regression.

Proposition 37 (Stevens (1998)). Consider the standard Markowitz model 3.1. Then

φ*i = (1/λ) · (µ̂i − Σ_{k≠i} βik µ̂k) / (σ²i(1 − R²i)) ,

where R²i = 1 − σ̂²i/σ²i.

In other words, the optimal MV allocation rewards large hedging errors in
the above sense, weighted by the tracking error of the hedging portfolio. The
difference µ̂i − Σ_{k≠i} βik µ̂k is a long-short combination of the asset's expected return and
the hedge portfolio. MV optimality therefore does not mean that it is optimal to diversify
across the risk sources but to concentrate the exposure on the long-short combination of
risk minus the hedge. This is a completely different story than 'do not put all eggs in the
same basket'. The better the hedge, i.e. the larger the R² of the regression, the smaller is
the denominator in the optimal policy and therefore the more weight the asset receives.
But a high R² means that asset i is strongly correlated with the other assets. Hence,
even small variations of the dependence create strong variations in the optimal policy. This
shows why strongly correlated assets are a source of instability of mean-variance optimal
portfolios. An investor is long in asset i if the expected return of this asset is larger than
the return of its hedge portfolio of all other assets, and similarly for a short position.
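Stevens' identity can be checked numerically: invert a sample covariance matrix and compare one of its rows with an OLS regression of that asset on the others. A minimal sketch with simulated returns (the covariance matrix is illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
C_true = np.array([[0.0400, 0.0120, 0.0080],
                   [0.0120, 0.0900, 0.0200],
                   [0.0080, 0.0200, 0.0625]])
R = rng.multivariate_normal(np.zeros(3), C_true, size=200_000)

C = np.cov(R, rowvar=False)
C_inv = np.linalg.inv(C)

# OLS regression of asset 0 on assets 1 and 2 (with intercept)
i, others = 0, [1, 2]
X = np.column_stack([np.ones(len(R)), R[:, others]])
coef, *_ = np.linalg.lstsq(X, R[:, i], rcond=None)
resid_var = (R[:, i] - X @ coef).var()   # unhedgeable risk sigma_i^2 (1 - R_i^2)

diag_check = 1.0 / resid_var             # should match C_inv[0, 0]
off_check = -coef[1:] / resid_var        # should match C_inv[0, [1, 2]]
```

The diagonal entry of the information matrix is the inverse residual variance, and the off-diagonal entries are the regression betas scaled by it, as stated in the text.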

Bourgeron et al. (2018) provide a characterization of the above difference between
the expected return and the optimal hedge:

Proposition 38 (Bourgeron et al. (2018)). Consider the standard Markowitz model 3.1. Then

φ*i = φ*i,0 + ω(φ*i,0 − φ*i,h)

where φ*i,0 is the optimal portfolio assuming zero correlation, φ*i,h is the optimal portfolio
of the hedging strategies, and ω is the leverage, defined as the ratio between the
idiosyncratic variance and the tracking error variance.

If the tracking error is small, a large leverage follows. This characterization shows that
MV diversification means leveraging a hedge portfolio: the MV optimal portfolio is an
aggressive portfolio that selects a few bets!

To control for these incentives one uses constraints. The simplest one is full investment,
Σi φi = 1. Solving (3.1) with such a constraint amounts to considering a Lagrangian
function and then calculating the first-order condition (FOC). Further constraints can
be added. Real optimizers in asset and wealth management can consider up to hundreds
of constraints. This destroys analytical tractability and in some sense leads optimization
ad absurdum: if you know what you want by imposing many constraints, why don't you
simply state the investment policy? Furthermore, each constraint has an economic price,
the shadow price, which reduces unconstrained utility. It should be made transparent
to the investor what the economic price of his or her own constraints is, such as loving a
particular stock, and what the price of the constraints induced by the AM firm is, such as
the bandwidth of the SAA and TAA. From an efficient frontier perspective, adding con-
straints transforms the hyperbola into piecewise straight lines and piecewise hyperbolas.
Globally, the constrained frontier lies below the efficient frontier and is shifted towards more
risk.

We now consider the second challenge: how to complement the Markowitz optimization
such that, for example, solutions are smooth, i.e. varying inputs slightly should lead
to smooth changes of the allocation, while controlling for a smooth rebalancing and
for turnover costs. The main strategy is to add a regularization term to the quadratic
utility function (Tikhonov regularization):

⟨µ, φ⟩ − (θ/2)⟨φ, Cφ⟩ − c||Γφ − φ0||²₂

with c > 0, Γ a matrix, φ0 an initial portfolio, and || · ||₂ the Euclidean norm. The parameter c controls
the importance of the regularization term. These terms are commonly added to promote
sparsity or to reduce sensitivity to outliers.
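A minimal sketch of this regularized program for Γ = I: the first-order condition gives φ = (θC + 2cI)⁻¹(µ + 2cφ0), so for c → 0 the plain MV solution is recovered and for large c the allocation is pulled towards φ0. All numbers below are illustrative.

```python
import numpy as np

def ridge_mv(mu, C, theta, c, phi0):
    # FOC of <mu, phi> - theta/2 <phi, C phi> - c ||phi - phi0||_2^2 (Gamma = I):
    #   mu - theta * C phi - 2c (phi - phi0) = 0
    n = len(mu)
    return np.linalg.solve(theta * C + 2.0 * c * np.eye(n), mu + 2.0 * c * phi0)

mu = np.array([0.05, 0.04, 0.03])
C = np.array([[0.040, 0.018, 0.010],
              [0.018, 0.090, 0.025],
              [0.010, 0.025, 0.0625]])
theta = 3.0
phi0 = np.array([0.4, 0.3, 0.3])                # anchor portfolio (assumed)

phi_mv = ridge_mv(mu, C, theta, 0.0, phi0)      # plain mean-variance solution
phi_reg = ridge_mv(mu, C, theta, 100.0, phi0)   # heavily regularized solution
```

Sweeping c between these two extremes produces exactly the kind of smooth weight paths shown in Figure 3.8.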

Figure 3.8: Ridge solution for a portfolio with and without a target. The figure shows
how the ridge solution provides smooth portfolio components (denoted by xj) as a function
of the control parameter c (Source: Roncalli (2018)).

The de-noising techniques for the covariance matrix are not sufficient for obtaining
stability of the solution. Figure 3.9 provides the intuition. Each covariance matrix
can be diagonalized, where the eigenvalues are all positive, real numbers in the diagonal
matrix. Ordering the eigenvalues according to their size, we show below that the largest
eigenvalues of C account for portfolio risk. The smallest eigenvalues, however, matter for
the information matrix C⁻¹, which is the proportionality constant of the optimal investment
rule: the noisy eigenvalues drive optimal investment. Regularization techniques handle
this small-eigenvalue problem. But this is not sufficient to obtain a meaningful optimal
asset allocation since, as stated above, the Markowitz model makes bets on the long-short
portfolio of expected return minus the beta hedge. These factors are distributed over
the whole range of eigenvalues. Therefore, considering the largest ones and treating the
smallest ones using regularization leaves out all intermediate eigenvalues, which also impact
the stability and smoothness of the optimal allocation. Hence, more than de-noising
of the covariance matrix is needed.

Some practitioners prefer to introduce restrictions into the optimization problem to
stabilize it. This approach has drawbacks:

• Each restriction has an economic price.

• Compare two constrained models. Is one allocation better than the other because
of a better model or because of the chosen constraints? Constraints are ad hoc,
discretionary decisions that impact a model's performance in a complicated way.

Figure 3.9: Distribution of eigenvalues of a covariance matrix. [The figure orders the
eigenvalues of C by size: the largest eigenvalues dominate portfolio risk, the smallest
eigenvalues (via their inverses) dominate the optimal allocation and are treated by
regularization, and the intermediate eigenvalues lie between these two regions.]

To introduce regularization, we consider the problem of optimizing over
φ1, φ2 in the objective ||1 − (φ1 + φ2)||²₂. There are infinitely many solutions φ1 = 1 − φ2;
φ1 = 10¹⁰⁰ and φ2 = 1 − 10¹⁰⁰ are two solutions which we do not like. Adding the penalty
||φ||₁ := |φ1| + |φ2|, an optimal solution is the sparse solution φ1 = 1, φ2 = 0. For different
choices of c and Γ, different well-known regularization approaches follow. If Γ is set equal
to the identity matrix, ridge regularization follows. We next denote by Ĉ the unbiased
empirical covariance matrix, by F an estimator which is biased but converges more quickly
than the empirical covariance, Ĉ(ν) := νĈ + (1 − ν)F, and by ν* the minimizer of
E(||Ĉ(ν) − C||²). If we set c = (1 − ν*)/ν* and Γ equal to the Cholesky decomposition of F,
then the Ledoit-Wolf covariance shrinkage method follows; see Section 3.4.8 for a discussion.
We discuss regularization in different sections below.
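A minimal shrinkage sketch in the spirit of Ĉ(ν) = νĈ + (1 − ν)F, with a scaled identity as the structured target F and a fixed ν (in Ledoit-Wolf, ν is estimated from the data); it illustrates how shrinkage improves the conditioning of a noisy covariance matrix. All numbers are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
N, T = 20, 30                              # few observations relative to the dimension
R = rng.normal(0.0, 0.02, size=(T, N))     # toy return sample

C_hat = np.cov(R, rowvar=False)            # unbiased but noisy empirical covariance
F = (np.trace(C_hat) / N) * np.eye(N)      # structured target: scaled identity
nu = 0.5                                   # shrinkage intensity (fixed here, estimated in Ledoit-Wolf)
C_shrunk = nu * C_hat + (1.0 - nu) * F

cond_raw = np.linalg.cond(C_hat)           # very large: small eigenvalues are noise
cond_shrunk = np.linalg.cond(C_shrunk)     # smaller: eigenvalues pulled towards their mean
```

Shrinking towards the identity compresses the eigenvalue spectrum, which is exactly the small-eigenvalue problem discussed above.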

3.3.6 Views and Portfolio Construction - The Black-Litterman Model


In all models considered so far, the views of investors did not enter in a systematic
form. But most investors have views about specific assets, and they wish to apply
these views in their asset management. For example, high past returns may not persist
in the future, and the asset manager would like to correct for this by implementing
a prior view in the model. By doing this, the acceptance of the model increases. The
views should be integrated consistently in the model such that stable estimates of the
expected returns and covariances follow.
There are many different approaches how views can be used in portfolio construction.
We consider the Black-Litterman (BL) model (BL [1990]) to be the first, and still the
most popular, model used by practitioners.
For further reading, in addition to BL, we cite Walters (2014), Satchell and Scowcroft
(2000), Brand (2010), Meucci (2010), Idzorek (2006), Herold (2003), and He and Litterman
(1999).

3.3.6.1 Black-Litterman Model


The two contributions of the BL model to the asset allocation problem are:

• The equilibrium market portfolio serves as a starting prior for the estimation of
asset returns.

• It provides a clear way of specifying an investor's linear views on returns and of
blending these views with prior information. The investor is not forced to have a
view for all assets, and the views can span arbitrary combinations of assets.

For non-linear views, consider entropy pooling. In the construction of BL, the first step is
to define the reference model. Assume that the returns satisfy R ∼ N(µ, C), where both mean
and covariance are unknown. Since the goal of BL is to model expected returns, we start
with a model for the mean: µ ∼ N(π, Cπ). Hence, µ = π + ε with ε ∼ N(0, Cπ). The
covariance CR of the returns about the estimate π is - given that µ and ε are not correlated -
given by

CR = C + Cπ .   (3.25)

Therefore the reference BL model is given by R ∼ N(π, CR). The mean π represents the
best guess for µ, and the covariance Cπ measures the uncertainty of the guess. How do
we fix π, the prior estimate of returns, that is to say the returns before we consider views?

BL uses a general equilibrium approach: if a portfolio is at the equilibrium of supply
and demand in the markets, then each sub-portfolio must be at equilibrium too. Therefore,
an equilibrium approach for the return estimate is independent of the size of the
portfolio under consideration. BL and many others use the CAPM in the following reverse
engineering way. But there is no model restriction.

Using the CAPM means that all investors have a mean-variance utility function.
Without any investment constraints, the optimal strategy φ maximizes the expected
utility given in (3.1),

E(u) = φ′π − (θ/2)φ′Cφ ,

where we have replaced the expected returns by the unknown expected return estimate π.
The solution gives us the optimal strategy φ as a function of the return and covariance:
φ = (1/θ)C⁻¹π.
Given the equilibrium strategy φ in the CAPM, we immediately get the excess return
estimate

π = θCφ .   (3.26)

How do we fix the risk aversion parameter? Multiplying (3.26) with the (transposed) market portfolio
φ′ implies that

RM − Rf = θσ²M   (3.27)

with RM the total return of the market portfolio. In other words, the risk aversion
parameter is equal to the market price of risk. Using (3.27) in (3.26), the CAPM specifies
in equilibrium the prior estimate of returns π.
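Reverse optimization per (3.26)-(3.27) can be sketched in a few lines; the market weights and covariance below are illustrative.

```python
import numpy as np

def implied_equilibrium_returns(C, w_mkt, market_excess_return):
    # eq. (3.27): theta = (R_M - R_f) / sigma_M^2, then eq. (3.26): pi = theta * C w
    sigma2_m = float(w_mkt @ C @ w_mkt)
    theta = market_excess_return / sigma2_m
    return theta * (C @ w_mkt), theta

C = np.array([[0.040, 0.012],
              [0.012, 0.090]])       # return covariance (illustrative)
w_mkt = np.array([0.6, 0.4])         # market-capitalization weights (illustrative)

pi, theta = implied_equilibrium_returns(C, w_mkt, market_excess_return=0.05)
```

By construction, the implied prior returns π price the market portfolio exactly: φ′π equals the market excess return.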

We consider next the insertion of views, where we follow Walters (2014). A view is
a statement on the market. Views can exist in an absolute or a relative form. A portfolio
manager can, for example, believe that the fifth asset class will outperform the fourth
one. BL assumes that views

• apply linearly to the market mean µ,

• face uncertainty,

• are fully invested (the sum of weights is zero for relative views or one for absolute
views), and

• do not need to exist for some assets.

More precisely, an investor with k views on N assets uses the following matrices:

• The k×N matrix P of the asset weights within each view.

• The k×1 vector Q of the returns for each view. That is, Pπ = Q expresses the
views.

• The k×k diagonal matrix Ω of the covariances of the views, with ωnn the matrix
entries. The matrix is diagonal as the views are required to be independent and un-
correlated. The entries 1/ωnn of the inverse matrix are known as the confidence
in the investor's views.

The conditional distribution of the mean and variance can be represented in the view
space as

P(View|Prior) ∼ N(Q, Ω) .

Since the matrix P is in general not invertible, this expression cannot be written in a
useful way in the asset space. But using Bayes' theorem, a posterior distribution of the
returns that blends the above prior and conditional distribution follows. Since the asset
returns and views are normally distributed, the posterior is also normally distributed. It
is given by the Black-Litterman master formula for the mean returns πBL and the
covariance CBL:

πBL = (Cπ⁻¹ + P′Ω⁻¹P)⁻¹ (Cπ⁻¹π + P′Ω⁻¹Q)   (3.28)

CBL = C + Cπ,BL ,   Cπ,BL = (Cπ⁻¹ + P′Ω⁻¹P)⁻¹ .
The parameters Ω and C are not observable and must be fixed additionally. C is typically
replaced by the estimated covariance matrix Ĉ. There are several ways of specifying Ω:
one can assume that the variance of the views is proportional to the variance of
the asset returns, one can use a confidence interval, or one can use the variance of the residuals if
a factor model is used. We refer to Walters (2014) for details. How do we estimate the
variance of the mean π - that is, how do we fix Cπ? BL assume the proportionality

Cπ = τC   (3.29)

with τ the proportionality factor. The uncertainty level τ can be chosen
proportional to the inverse investment period 1/T. The longer the investment horizon
is, the less uncertainty exists about the market mean; the higher the value of τ, the less
weight is attached to the CAPM. Summarizing, the prior return distribution is a normally
distributed random variable with the mean given in (3.26) and variance (1 + τ)C. With
these choices, the Black-Litterman master formula for the mean returns πBL and the
covariance CBL reads

πBL = π + τCP′(PτCP′ + Ω)⁻¹(Q − Pπ)   (3.30)

CBL = ((τC)⁻¹ + P′Ω⁻¹P)⁻¹ .
Several consistency checks can be applied to (3.30). First, if Ω vanishes, which means
absolute certainty about the views, the posterior mean becomes independent of, or
insensitive to, the parameter τ. Next, if the investor has a view on every asset, the
matrix P becomes invertible. Since the covariances are by definition invertible, the posterior
mean equation simplifies to πBL = P⁻¹Q. Finally, if the investor is fully uncertain about
the validity of his or her views - that is to say, the matrix entries of Ω tend to infinity -
there is no value added by adding any views to the model since the prior and posterior
return distributions agree: πBL = π.

Example

Consider four assets and two views. The investor believes that asset 1 will outperform
asset 3 by 2 percent with confidence ω11 and that asset 2 will return 3 percent with
confidence ω22. The investor has no other views. Mapping these views into the above-
defined matrices implies

P = ( 1  0  −1  0 ; 0  1  0  0 ) ,   Q = ( 2 ; 3 ) ,   Ω = diag(ω11, ω22) .   (3.31)
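The views of this example can be fed through the master formula (3.30). The prior π, the covariance C, τ, and the ω's below are illustrative placeholders, not values from the text.

```python
import numpy as np

def bl_posterior_mean(pi, C, P, Q, Omega, tau):
    # eq. (3.30): pi_BL = pi + tau C P' (P tau C P' + Omega)^{-1} (Q - P pi)
    tauC = tau * C
    A = P @ tauC @ P.T + Omega
    return pi + tauC @ P.T @ np.linalg.solve(A, Q - P @ pi)

P = np.array([[1.0, 0.0, -1.0, 0.0],   # view 1: asset 1 outperforms asset 3
              [0.0, 1.0, 0.0, 0.0]])   # view 2: absolute view on asset 2
Q = np.array([0.02, 0.03])
Omega = np.diag([1e-4, 1e-4])          # view confidences (assumed)
pi = np.array([0.040, 0.025, 0.035, 0.030])   # CAPM prior returns (assumed)
C = 0.02 * np.eye(4) + 0.01                   # toy covariance matrix (assumed)

pi_bl = bl_posterior_mean(pi, C, P, Q, Omega, tau=0.05)
```

The consistency checks of the text hold numerically: as the ω's tend to zero the posterior satisfies the views exactly, and as they tend to infinity the posterior reverts to the prior.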

The technique developed by BL provides a framework in which more satisfactory
results are obtained from a larger set of inputs than are obtained using the mean-variance
framework. The model is usually applied to asset classes rather than single assets. Besides
generating higher returns, the BL model leads to more stable portfolio allocations over
time.

3.3.6.2 CIO Investment Process and Black-Litterman


A Black-Litterman-oriented investment process would have at least the following steps
(Walters [2014]):

• Determine which assets constitute the market.

• Compute the historical covariance matrix for the assets.

• Determine the market capitalization for each asset class.

• Use reverse optimization to compute the CAPM equilibrium returns for the assets.

• Specify views on the market.

• Blend the CAPM equilibrium returns with the views using the Black-Litterman
model.

• Feed the estimates (estimated returns, covariances) generated by the Black-Litterman
model into a portfolio optimizer.

• Select the efficient portfolio that matches the investor's risk preferences.

These steps define only one part of the investment process of a CIO. In general, the
CIO receives information from different sources in the investment process: a macroeconomic
view from research analysts, market information, chartist information, and valuation
information. Assume that one output of this information is to 'overweight Swiss
stocks - underweight European stocks'.

This defines a pair-wise bet. All bets of this type form the tactical asset allocation
(TAA). Several questions follow:

A How strong is the bet - that is to say, how much should the two stock positions deviate
from the actual level in 'overweight Swiss stocks - underweight European stocks'?

B Should any possible currency risk in the bet be hedged?

C How long should this bet last?

D How confident is the CIO and his or her team about the bet?

E Is the bet implementable, and what is the precision of such an implementation
measured by the tracking error?

F Will there be a stop-loss or profit-taking mechanism once the bet has been implemented?

G How does the CIO measure the performance of the bet?

The approach to question A is often based on the output of a formal model. That
is to say, a risk budgeting model, a BL model, or a mean-variance optimization model
proposes to increase Swiss stocks by 5 percent and to reduce the European stock exposure
by 5 percent. It is then common practice that this proposal is corrected by the
CIO, either because it creates too much turnover for the portfolio managers or because
he or she considers such a change to be too strong.

Question B is - among other things - a consistency question since, on the one hand,
the +/−5 percent change in equities also changes the FX exposure of the whole TAA
and, on the other hand, there could be a CHF-EUR bet following from the many
information sources. Typically - question C - bets are made for one month. This is the
standard time after which the CIO and his or her team review the TAA.

Question D is the information risk issue. Information risk is different from statistical
risk. The most well-known statistical risk measure in the industry is the tracking
error, which measures the volatility of alpha over a period of time. The risk sources
are market, counterparty, and liquidity risk of the assets. Bernstein (1999) defines
information risk as the quality of the information advantage of a decision-maker under
uncertainty.

Reconsider the above Swiss stock-European stock bet. This view must be driven by
our information set, as well as by the proprietary process of analyzing the information
and data. To evaluate information risks, we ask (Lee and Lam [2001]):

• What is the completeness and timeliness of our information set?

• Have we missed something?



• Have we misinterpreted something?

• How condent are we about our models and strategies?

These questions suggest that some information risk may be quantified with a good deal of
precision, while in most cases precise measurement of information risks seems impossible,
and well-informed judgement may be necessary. This may result in a final statement on
the decision-maker's confidence of adding alpha. If, say, the confidence is 50 percent, we
are not confident at all about the bet. A standard approach to measuring the performance
of bets is the hit rate (HR).

A hit rate of 60 percent means that we add alpha in 60 percent of the months in
which we make an active bet. The confidence in adding alpha can be interpreted as the
expected value of the hit rate. Information risk is then quantified by the expected hit
rates of our investment views.

Example

We follow Lee and Lam (2001). They assume that alpha is normally distributed
around its mean value. Then, there is a unique one-to-one mapping between the hit rate
HR and the information ratio IR. To derive this relation, we have for the α of an asset
which follows a normal distribution:

HR = P(α > 0) ,   α ∼ N(ᾱ, TE)

with ᾱ the arithmetic average alpha and TE the tracking error. Changing variables to
y = (α − ᾱ)/TE:

HR = (1/√2π) ∫_{−ᾱ/TE}^{∞} e^{−y²/2} dy .

Defining the information ratio

IR = ᾱ/TE ,

we get:

HR = ∫_{−IR}^{∞} f(y) dy = 1 − Φ(−IR) ,   (3.32)

with f the standard normal density function, Φ the standard normal distribution function,
and IR the information ratio. Once the expected alpha and expected tracking error, and
therefore the expected information ratio, are stated, the complete ex ante distribution of
alpha is specified. The hit rate is the area to the right of 0% alpha. Using the square-root
law, the following information risks, confidence levels, and information ratios follow:

Information risks | Confidence (monthly HR) | Monthly IR | Annualized IR
------------------|-------------------------|------------|--------------
Low               | 60%                     | 0.25       | 0.88
Medium            | 56%                     | 0.15       | 0.52
High              | 52%                     | 0.05       | 0.17
Infinity          | 50%                     | 0          | 0

Table 3.6: Information risks, confidence levels, and information ratios (Lee and Lam
[2001]).
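Equation (3.32) together with the square-root-of-time rule reproduces Table 3.6 up to rounding; a minimal sketch:

```python
from math import erf, sqrt

def hit_rate(ir_monthly):
    # eq. (3.32): HR = 1 - Phi(-IR) = Phi(IR), Phi the standard normal cdf
    return 0.5 * (1.0 + erf(ir_monthly / sqrt(2.0)))

def annualize(ir_monthly):
    # square-root law: IR_annual = IR_monthly * sqrt(12)
    return ir_monthly * sqrt(12.0)

for ir in (0.25, 0.15, 0.05, 0.0):
    print(f"monthly IR {ir:.2f}: HR {hit_rate(ir):.0%}, annualized IR {annualize(ir):.2f}")
```

For example, a monthly IR of 0.25 gives Φ(0.25) ≈ 60%, matching the "Low information risk" row of the table.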

3.3.7 Heuristic Allocation: Risk Budgeting Portfolio Construction


Risk-based portfolio construction - called risk budgeting - has two basic properties:

1. It is not based on the optimization of an investor's utility function, unlike the
Markowitz model.

2. Only the risk dimension of the investment enters explicitly.

The first property derives from the mentioned problems of quadratic optimization.
The second one reflects the difficulty of forecasting expected returns. Although only risk
is explicit, returns enter implicitly, and the approach therefore does not a priori lead to very
conservative portfolios.

Constructing risk-based portfolios has three steps:

• Define how risk is measured.

• Consider the risk allocation.

• Define and solve the risk-budgeting problem. This implies the investment strategy.

3.3.7.1 Risk Measurements


The foundations of coherent risk measurement are given in the work of Artzner et al.
(1999). They define a set of properties that each risk measure should satisfy, prove the
existence of such measures, construct such measures, and show that some widely used
measures violate some of these properties. The properties that a coherent risk measure
should satisfy (Artzner et al. [1999]) are:

1. The risk of two portfolios is not larger than the sum of the individual risks.

2. The risk of a leveraged portfolio is equal to the leveraged risk of the original portfolio.

3. Adding a cash amount to a portfolio reduces the risk of the portfolio by the cash
amount.

There are several variations of this axiomatic approach to risk theory.

Value at risk (VaR) is a coherent risk measure only for elliptical return distributions.
In general, VaR fails to satisfy axiom 1. Expected shortfall, i.e. the expected
loss given that the loss exceeds a VaR value, is a coherent and convex risk measure.
Volatility risk measures do not satisfy an additional property which is often assumed:
if a portfolio's return dominates another portfolio's return in all scenarios, the risk of
the former portfolio dominates the risk of the latter. Under normally distributed returns,
VaR, expected shortfall, and volatility risk measures are equivalent up to a scaling of the
risk figures.

3.3.7.2 Risk Allocation


The main tool for risk allocation is the Euler allocation principle, see equations (2.80)
and (2.81).

3.3.7.3 Risk Budgeting


We restrict ourselves to the case of two risk budgets; the generalization is obvious. The
main idea is that the portfolio is chosen such that the individual risk contributions, using
a specific risk metric, equal a predefined risk budget.

Let B1 and B2 be two risk budgets in USD. For a strategy φ = (φ1, φ2), the risk budgeting
problem is defined by two constraints, which equate the two risk contributions
RC1 and RC2 to the risk budgets - that is to say, the strategy is chosen such that the
following equations hold:

RC1(φ) = B1 ,   RC2(φ) = B2 .   (3.33)

The sum of the left-hand sides of (3.33) is, by the Euler principle, equal to the total portfolio
risk. The sum of the right-hand sides is the total risk budget. Problem (3.33) is often
recast in a relative form. If bk = cBk is the percentage of the sum of total risk budgets,
(3.33) reads

RC1(φ) = b1 R(φ) ,   RC2(φ) = b2 R(φ) .   (3.34)

The goal is to find the strategies φ which solve (3.33) or (3.34). This is in general a
complex numerical problem. But introducing the beta βk of asset k,

βk = cov(Rk, R(φ))/σ²(φ) = (Cφ)k/σ²(φ) ,

the weights are given by

φk = bk βk⁻¹(φ) / Σj bj βj⁻¹(φ) .   (3.35)

The weight allocated to component k is thus inversely proportional to its beta. This
equation is only implicit since the beta depends on the portfolio φ. The next proposition
summarizes some explicitly solvable cases.

Theorem 39. Consider the risk budgeting program (3.34) for N assets with the volatility
risk measure.

1. If the correlation ρ = 0 among all assets,

φk = √bk σk⁻¹ / Σj √bj σj⁻¹ .   (3.36)

2. If the correlation ρ = 1 among all assets,

φk = bk σk⁻¹ / Σj bj σj⁻¹ .   (3.37)

3. If the correlation is minimal, i.e. ρ = −1/(N − 1) among all assets, the ERC portfolio
follows:

φk = σk⁻¹ / Σj σj⁻¹ .   (3.38)

4. In all other correlation cases, the implicit formula (3.35) holds.

5. If all volatilities are the same:

φk ∼ ( Σj φj ρkj )⁻¹ .   (3.39)
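For case 4, the implicit formula (3.35) has no closed form, but a damped fixed-point iteration on φk ∝ bk/βk(φ) usually converges quickly. A sketch with illustrative inputs:

```python
import numpy as np

def risk_budgeting_weights(C, b, n_iter=1000, damp=0.5):
    # iterate eq. (3.35): phi_k proportional to b_k / beta_k(phi), renormalized each step
    n = len(b)
    phi = np.full(n, 1.0 / n)
    for _ in range(n_iter):
        beta = (C @ phi) / (phi @ C @ phi)
        new = (b / beta) / (b / beta).sum()
        phi = damp * phi + (1.0 - damp) * new
    return phi

vols = np.array([0.10, 0.20, 0.30])
corr = np.array([[1.0, 0.5, 0.3],
                 [0.5, 1.0, 0.4],
                 [0.3, 0.4, 1.0]])
C = np.outer(vols, vols) * corr
b = np.array([0.5, 0.3, 0.2])        # target risk budget shares

phi = risk_budgeting_weights(C, b)
risk_shares = phi * (C @ phi) / (phi @ C @ phi)   # realized risk contribution shares
```

At the fixed point, the realized risk contribution shares coincide with the budgets b, which is exactly condition (3.34).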

Case 1 implies, for example, that the higher the volatility of a component, the lower its
weight in the RB portfolio. For the equal risk contribution (ERC) model, where all risk budgets
bk are set equal to 1/N, Maillard et al. (2008) show that the volatility
of the ERC model is furthermore located between the volatility of the minimum variance
portfolio (MVP) and the volatility of an equally capital-weighted (EW) portfolio:¹⁵

σMVP ≤ σERC ≤ σEW .   (3.41)

The ERC portfolio is equal to the MVP if (i) the correlation is constant and
(ii) the correlation attains its lowest possible value. The ERC is equal to the EW
portfolio if all volatilities are identical.
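The ordering (3.41) can be checked numerically in the constant-correlation case, where ERC reduces to inverse-volatility weights; the volatilities and correlation below are illustrative.

```python
import numpy as np

vols = np.array([0.08, 0.15, 0.25])
rho = 0.4                                  # constant correlation (assumed)
corr = np.full((3, 3), rho)
np.fill_diagonal(corr, 1.0)
C = np.outer(vols, vols) * corr

def vol(phi):
    return float(np.sqrt(phi @ C @ phi))

ew = np.full(3, 1.0 / 3.0)                 # equally capital-weighted portfolio
erc = (1.0 / vols) / (1.0 / vols).sum()    # ERC under constant correlation
x = np.linalg.solve(C, np.ones(3))
mvp = x / x.sum()                          # minimum variance portfolio

s_mvp, s_erc, s_ew = vol(mvp), vol(erc), vol(ew)
```

Under constant correlation the inverse-volatility portfolio indeed equalizes the risk contributions φk(Cφ)k, and its volatility sits between the MVP and EW volatilities.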

Definition 40. The ERC approach is also called the risk parity (RP) approach.

¹⁵ The three portfolios are defined as follows:

φk = φj (EW) ,   ∂σ(φ)/∂φj = ∂σ(φ)/∂φk (MVP) ,   φj ∂σ(φ)/∂φj = φk ∂σ(φ)/∂φk (ERC) .   (3.40)

Although closed-form analytical solutions for risk budgeting problems are possible only
in some particular cases, there is a simplified heuristic allocation mechanism - inspired
by the allocation (3.35):

φk = L × Riskk^(−m) / Σk Riskk^(−m)   (3.42)

with Risk any risk measure, L the portfolio leverage, which is needed if one defines ex
ante a risk level for the portfolio (risk-targeting approach), and m a positive number.
If m = 0, the portfolio is equally weighted. For increasing m, the portfolio allocation
becomes more and more concentrated on the assets with the lowest individual risk. For
example, the GMV portfolio follows if all correlations are set equal to zero and m = 2,
and the ERC follows by assuming that all correlations are constant and m = 1.
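The heuristic (3.42) fits in a few lines of plain Python (volatilities illustrative):

```python
def heuristic_weights(risks, m, L=1.0):
    # eq. (3.42): phi_k = L * risk_k^(-m) / sum_j risk_j^(-m)
    inv = [r ** (-m) for r in risks]
    total = sum(inv)
    return [L * x / total for x in inv]

vols = [0.10, 0.20, 0.40]
w_ew = heuristic_weights(vols, m=0)    # equal weights
w_iv = heuristic_weights(vols, m=1)    # inverse volatility (ERC for constant correlation)
w_gmv = heuristic_weights(vols, m=2)   # inverse variance (GMV for zero correlation)
```

Increasing m visibly concentrates the allocation on the lowest-risk asset, as stated in the text.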

Teiletche (2014) illustrates some properties of the above four portfolios using Kenneth
French's US industry indices, 1973-2014; see Figure 3.10.

Figure 3.10: Risk-weighting solutions for EW, GMV, MD, and RP (ERC) portfolios
using sector indices from Kenneth French. The variance-covariance matrix is based on
five years of rolling data (Teiletche [2014]).

Figure 3.10 indicates that GMV has a preference for lower-volatility sectors (e.g.,
utilities or consumer non-durables), MD prefers low correlation (e.g., utilities or energy),
EW is not sensitive at all to risk measures, and RP (ERC) is mixed. RP and EW
show similar, regular asset allocation patterns, while the GMV and MD asset allocation patterns
are much less regular. The latter react much more to changing economic circumstances
and are therefore more defensive.

Maillard et al. (2009) compare the ERC portfolio with the 1/N and MVP portfolios for
a representative set of the major asset classes from Jan 1995 to Dec 2008.¹⁶ The ERC
portfolio has the best Sharpe ratio and average returns. The Sharpe ratio of the 1/N
portfolio (0.27) is largely dominated by MVP (0.49) and ERC (0.67). MVP and ERC
differ in their balance between risk and concentration. The ERC portfolios are much
less concentrated than their MVP counterparts, and their turnover is much lower.
The lack of diversification in the MVP portfolios can be seen by comparing the maximum
drawdown values: the value for MVP is −45% compared to −22% for the ERC portfolio.

When we restrict the risk measurement to volatilities, the heuristic approach (3.42) takes the following generic component-wise form (Jurczenko and Teiletche [2015]):

φ = kσ^{−1},  (3.43)

where k is a positive constant, σ is the vector of volatilities of the N assets, and φ is the vector of risk-based portfolio weights. Equation (3.43) corresponds to the risk parity and maximum diversification portfolio solutions when the correlation among assets is constant, to the minimum variance portfolio when the correlation is zero, and to the 1/N portfolio when all volatilities are equal. Many practitioners use (3.43) to scale their individual exposures, and the MSCI Risk Weighted Indices attribute the weights proportionally to the inverse of the stock variances.

The constant k can be calibrated in different ways. Using a capital-budgeting constraint, that is to say, the sum of the components φ_i is equal to 1, implies

k = 1 / Σ_k σ_k^{−1}.

So, (3.43) becomes the heuristic model (3.42) with m = 1 and no leverage. If we use a volatility-target constraint σ_T for the risk-based portfolio, we get

k = σ_T / (N × Concentration) = σ_T / (N C(ρ̄))  (3.44)

with ρ̄ the average pair-wise correlation coefficient of the assets and C(ρ̄) the concentration measure¹⁷

C(ρ̄) = √( N^{−1}(1 + (N − 1)ρ̄) ).  (3.45)

¹⁶ The asset class representatives are: S&P 500, Russell 2000, DJ Euro Stoxx 50, FTSE 100, Topix, MSCI Latin America, MSCI Emerging Markets Europe, MSCI AC Asia ex Japan, JP Morgan Global Govt Bond Euro, JP Morgan Govt Bond US, ML US High Yield Master II, JP Morgan EMBI Diversified, S&P GSCI.
¹⁷ To prove this formula, write Λ_σ for the diagonal matrix with the vector of volatilities σ on its diagonal, ρ for the correlation matrix of returns, and I for the identity matrix. The covariance matrix can be written in the form C = Λ_σ ρ Λ_σ, which implies

⟨σ^{−1}, Λ_σ ρ Λ_σ σ^{−1}⟩ = ⟨e, ρe⟩.

The concentration measure varies from 0, when the average pair-wise correlation reaches its lowest value, to +1, when the average correlation is +1. Hence, k increases when the diversification benefits are important - that is, when the correlation measure decreases. In this case, each constituent's weight needs to be increased to reach the desired volatility target: the risk-based portfolio may even become leveraged. Risk-based investing often faces the criticism that it cannot allow for views. This is not true; see Jurczenko and Teiletche (2015) and Roncalli (2014).
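The calibration (3.44)-(3.45) can be checked numerically under a constant-correlation covariance matrix. In the sketch below the volatilities, the average correlation, and the 10% volatility target are illustrative assumptions; the point is that the weights φ = kσ^{−1} hit the target exactly:

```python
import numpy as np

N, rho_bar, sigma_T = 4, 0.3, 0.10                  # illustrative size, correlation, target
sigma = np.array([0.08, 0.12, 0.15, 0.20])          # illustrative asset volatilities

rho = np.full((N, N), rho_bar)                      # constant-correlation matrix
np.fill_diagonal(rho, 1.0)
C = np.outer(sigma, sigma) * rho                    # C = Lambda_sigma rho Lambda_sigma

conc = np.sqrt((1 + (N - 1) * rho_bar) / N)         # concentration measure (3.45)
k = sigma_T / (N * conc)                            # calibration (3.44)
phi = k / sigma                                     # risk-based weights (3.43)

vol = np.sqrt(phi @ C @ phi)                        # realized portfolio volatility = sigma_T
```

The result is independent of the individual volatility levels, since only the correlation structure enters ⟨e, ρe⟩.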

3.4 Estimation: The Covariance Matrix


The estimation of the input variables, expected returns and the covariance matrix, is key. If they are of low quality, the optimal portfolio rules are contaminated. Estimating expected returns has long been the focus of researchers, whereas the estimation of the covariance matrix received much less attention. There are several reasons why return estimates were in the focus. The fundamental pricing equation, see Chapter 4, states that changes in asset price returns are driven by changing expectations of the cash flows, changing correlations between the assets, or changes in the discount factors. We refer the reader to Ilmanen (2012) for a discussion of estimating expected returns.

There is no economic equation comparable to the fundamental pricing equation which identifies the drivers of the covariance matrix. The main theoretical property is that each covariance matrix can be diagonalized. This representation, the Principal Component Analysis (PCA), reduces the complexity of an N × N covariance matrix to the study of N eigenvalues (the entries of the diagonal matrix). Many approaches to estimating the covariance matrix start with an analysis of the eigenvalues.

The estimation problem of the covariance matrix always involves the inverse matrix and the optimal allocation rule. Estimation risk means parameter uncertainty: the true parameters of a model, say the Markowitz model, are not known and have to be estimated from a finite data set. Whichever statistical approach we choose, there is a risk that the estimated parameters differ from the unknown, true parameter values.

The volatility of the risk-based portfolio is then given by (using (3.43)):

σ_RB = √⟨φ, Cφ⟩ = k √⟨e, ρe⟩ = k √( N + Σ_i Σ_{j≠i} ρ_ij ).

Introducing the average pairwise correlation coefficient

ρ̄ = (1 / (N(N − 1))) Σ_i Σ_{j≠i} ρ_ij

implies (3.44).

There are different methods to estimate the covariance. The obvious one is to use the unbiased sample¹⁸ covariance matrix C^S. But given N assets and a time series of length T, say of the same order as N, one cannot estimate a number of parameters of order N² with the same order of magnitude of available data. This leads to a large estimation error.

Let R(j) be the rate of return in the past month j. The average return of n observations, assuming IID returns, has itself a mean R and a standard deviation σ/√n. These are the true values. For an assumed annual return of 12%, the true monthly return is R_1m = 1%. For an annual standard deviation of σ = 5%, the monthly estimate σ_1m = 5%/√12 = 1.44% follows. This estimate is larger than the mean itself, i.e. not meaningful. Using n = 60 (five years of data), the standard deviation estimate becomes 0.00645, which is not significantly smaller than the mean. If we would like a standard deviation estimate of, say, 1/10 of the mean, the equation 0.05/√n = 0.001 implies n = 2,500. This corresponds to a time series of more than 208 years (2,500/12).
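The arithmetic of this example takes a few lines (the 12% mean and 5% volatility are the assumed figures from the text):

```python
import math

R_1m = 0.12 / 12                        # true monthly mean return: 1%
sigma = 0.05                            # assumed standard deviation

se_12 = sigma / math.sqrt(12)           # one year of data: ~1.44%, larger than the mean
se_60 = sigma / math.sqrt(60)           # five years: ~0.645%, still not << 1%

n = (sigma / (R_1m / 10)) ** 2          # observations for a standard error of 1/10 the mean
years = n / 12                          # more than 208 years of monthly data
```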

To illustrate estimation risk, we estimate the sample mean μ̂^S and sample covariance Ĉ^S from the data and plug the values into the optimal portfolio rule (3.3):

φ_MV = (1/θ) (Ĉ^S)^{−1} μ̂^S.  (3.46)

Assuming that the plugged-in parameters are the true ones leads to zero estimation risk. But this is not an optimal approach.¹⁹ One has to define a procedure outside of the investment optimization program which fixes the values of the parameters.

Bouchaud and Potters (2009) illustrate this. They consider the Markowitz model without the full investment constraint. The optimal policy, if we assume the true ρ is known, is

φ_MV = r ρ^{−1}μ / ⟨μ, ρ^{−1}μ⟩  (3.47)

with r the expected mean return. The true minimal risk is then

σ²_MV = ⟨φ_MV, ρφ_MV⟩ = r² / ⟨μ, ρ^{−1}μ⟩.  (3.48)

We compare this optimal case with the in-sample and out-of-sample risks. The in-sample estimate uses the known empirical correlation matrix ρ̂^S of the corresponding period. The out-of-sample matrix uses the empirical correlation ρ̃^S which is observed in the next period. The portfolio risks read:

¹⁸ The expected sample covariance matrix is equal to the true covariance matrix.

¹⁹ Tu and Zhou (2003), Kan and Zhou (2011), Zellner and Chetty (1965), Pastor and Stambaugh (2000).

σ²_MV,in = r² / ⟨μ, (ρ̂^S)^{−1}μ⟩,  σ²_MV,out = r² ⟨μ, ρ̃^{−1}ρρ̃^{−1}μ⟩ / ⟨μ, ρ̃^{−1}μ⟩².  (3.49)

If the posterior estimate is equal to the true one, then the risk of the out-of-sample estimate is equal to the optimal one. Assuming that the in-sample estimate is not biased, convexity properties for positive definite matrices imply:

σ²_MV,in ≤ σ²_MV ≤ σ²_MV,out.  (3.50)

How far away are the in- and out-of-sample risks from the true risk? Pafka and Kondor (2004) show that for IID returns and large portfolios:

σ²_MV,in = σ²_MV √(1 − q) = σ²_MV,out (1 − q),  q = N/T.  (3.51)

The in-sample risk is smaller than the true risk by the factor √(1 − q), while the out-of-sample risk is larger than the true risk by the factor 1/√(1 − q). This defines data snooping.

But estimation risk of the mean matters, too. Ang (2014) estimates the original mean-variance frontier using data from January 1970 to December 2011. The mean of US equity returns is 10.3 percent. Ang changes the mean to 13.0 percent. Such a change is within two standard error bounds. The minimum variance portfolios for a desired portfolio return of 12 percent are given in Table 3.7. This change caused the US position to change from −9 percent to 41 percent, and the UK position to move from 48 percent to approximately 5 percent.

Asset   US mean = 10.3%   US mean = 13.0%
USA        −0.0946            0.4101
JPN         0.2122            0.3941
GBR         0.4768            0.0505
DEU         0.1800            0.1956
FRA         0.2257           −0.0502

Table 3.7: MV portfolios for two different expected equity returns (Ang [2014]).

Returning to the estimation of the covariance matrix, the main question is how to reduce the number of degrees of freedom for the estimation. In the last two decades, one approach was to reduce the order of magnitude to a low number of, say, 1, 2 or 3 degrees.²⁰ Today, the approach is to consider estimators of the covariance matrix which are optimal in a low dimensional space. The EW portfolio 1/N, the linear shrinkage approach, or the factor model approach are examples. We compare these approaches with the order-N approach of Ledoit and Wolf (2018), a non-linear shrinkage approach, and with the sample estimate, i.e. order N². The non-linear shrinkage approach positions itself between the too loose sample estimate and the possibly too tight low dimensional methods.

²⁰ Ledoit and Wolf (2003, 2004a,b), Kan and Zhou (2007), Brandt et al. (2009), DeMiguel et al. (2009, 2013), Frahm and Memmel (2010), and Tu and Zhou (2011).

We proceed as follows: we consider the sample estimate, the diagonalization (PCA) approach, and the general linear factor model. Then the Bayesian, linear, and non-linear shrinkage methods are introduced. The results are compared in a finite and an asymptotic set-up.

3.4.1 Sample Estimates


The asset returns follow an N-dimensional normal distribution with mean zero and covariance C. Consider T independent copies of the return distribution. The maximum likelihood estimators μ̂^S, Ĉ^S, ρ̂^S are given by:

μ̂^S = (1/T) Σ_{t=1}^T R_t,  Ĉ^S = (1/(T − 1)) Σ_{t=1}^T (R_t − μ̂)(R_t − μ̂)′  (3.52)

and the estimates are distributed as:

μ̂^S ∼ N(0, (1/T) C),  (T − 1)Ĉ^S ∼ W(C, T − 1)  (3.53)

with W(C, T − 1) the N-dimensional central Wishart distribution with scale matrix C and T − 1 degrees of freedom. If N << T, ρ̂^S should be close to the true value ρ. If N increases for T fixed, then the estimation error and the out-of-sample performance worsen. Ledoit and Wolf (2003) estimate that for N assets around T ∼ 10N observations should exist to control for estimation error.

Consider the EW strategy with return variance σ_p² = (1/N²)⟨e, Ce⟩. The sample estimator σ̂_p² is distributed as

σ̂_p² = (1/N²)⟨e, Ĉe⟩ ∼ σ_p² χ²_{T−1}/(T − 1)

with χ² the chi-square distribution with T − 1 degrees of freedom. This variance estimator is unbiased.

We compare EW with the GMV strategy φ_GMV = C^{−1}e / ⟨e, C^{−1}e⟩ with σ²_GMV = 1/⟨e, C^{−1}e⟩. Then

σ̂²_GMV = 1/⟨e, Ĉ^{−1}e⟩ ∼ σ²_GMV χ²_{T−N}/(T − 1),

i.e. the number of assets N matters. This variance estimator is biased, with expectation proportional to (1 − N/T)σ²_GMV. Consistency follows only in the limit of infinitely many copies T of returns. The true variance is underestimated by the factor (1 − N/T)^{−1}. Considering 20 assets and five years of monthly data (T = 60), the true variance is expected to be around 50% above its estimate.
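A quick Monte Carlo reproduces this downward bias. Here the true covariance is assumed to be the identity, so the true GMV variance is 1/N; the seed and the number of trials are arbitrary choices of this sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, trials = 20, 60, 500
e = np.ones(N)
true_var = 1.0 / N                        # GMV variance when the true covariance is I

est = []
for _ in range(trials):
    R = rng.standard_normal((T, N))       # T IID return observations, true C = I
    C_hat = np.cov(R, rowvar=False)       # sample covariance (T - 1 denominator)
    est.append(1.0 / (e @ np.linalg.solve(C_hat, e)))

ratio = np.mean(est) / true_var           # ~ (T - N)/(T - 1) ~ 0.68: underestimation
```

With N = 20 and T = 60 the average estimate is about two thirds of the true variance, i.e. the true variance is roughly 50% above the estimate, as stated in the text.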

3.4.2 Dimension Reduction I: PCA


The estimation of the covariance is high dimensional. It is natural to reduce its dimension by projecting to a lower dimensional space or by finding a better data representation. Principal Component Analysis (PCA) is a simple and successful method which dates back to a 1901 paper by Pearson. Let (R_k) ∈ R^N be a return time series of length T. The goal is to project the data to a lower dimensional subset: first, find the K-dimensional affine space such that the projections P_K(R_k) are the best approximations to R_k; second, find the K-dimensional projections P_K R_k such that as much variance of the data as possible is preserved.

Let

R_k ∼ μ + Σ_{j=1}^K β_kj v_j

be an approximation with W = (v_1, ..., v_K), where (v_j) is an orthonormal basis for the subspace. Then W′W = I due to the orthonormality of the vectors (v_j). Finding the best linear fit means solving the least squares optimization:

min_{μ, W, β_k : W′W = I} Σ_{k=1}^N ||R_k − (μ + Wβ_k)||².

Optimizing for μ implies

μ* = μ_N,

i.e. the sample mean follows. Optimizing β_k implies (orthonormality of the v's):

β_k = W′(R_k − μ_N).

Inserting this into the objective function shows that the minimization is equivalent to maximizing

(N − 1) tr(W′C^S W)

with C^S the sample covariance and tr the trace of a matrix. The matrix under the trace can be diagonalized since the spectral theorem of linear algebra applies:

Proposition 41. Let C be a symmetric, positive definite, real matrix of dimension N × N. There exists a diagonal matrix Λ and a matrix W such that

W′CW = Λ.  (3.54)

The diagonal elements of Λ are real-valued and positive (the eigenvalues λ_1, ..., λ_N). The eigenvalues solve the polynomial equation det(C − λI) = 0 with I the identity matrix. Given any eigenvalue λ_k, the solution of the linear equation Cv_k = λ_k v_k is called an eigenvector v_k. The eigenvectors form an orthonormal basis and W = (v_1, ..., v_N).

Hence, the PCA is given by finding the largest eigenvalues and the corresponding eigenvectors. The restriction to the largest eigenvalues means 'de-noising' the covariance matrix. A covariance matrix C of dimension N × N does not tell us how much the unobservable risk drivers of the N assets add to the total portfolio variance. Transforming the matrix using PCA allows us to derive how important the risk factors are in explaining portfolio risk. Consider Figure 3.11, where the left panel shows the closing values of the Dow and the S&P 500 index. The two series are heavily dependent: a data point of the Dow corresponds to an S&P closing price such that the pair is close to the diagonal (think about the bifurcation for low closing prices). The dependence can be offset if we rotate the coordinate system. In the new coordinate system, data points have almost no variance in the y_2 direction but only in the y_1 direction. Therefore, the y_1-direction factor explains most of the portfolio variance. De-noising then means neglecting the y_2 risk contribution. PCA does this.

Figure 3.11: Closing values for the S&P 500 and Dow Jones Index in 2006. The red
coordinate systems denote the rotation applied in PCA.

The eigenvectors explain the variance of the factors in (2.70):

σ_p² = ⟨φ, Cφ⟩ = ⟨φ, WΛW′φ⟩ = ⟨W′φ, ΛW′φ⟩ = ⟨ψ, Λψ⟩ = Σ_i λ_i ψ_i²  (3.55)

with ψ = W′φ.

Factors with low eigenvalues add only little to the portfolio risk and are therefore avoided - the de-noising of the covariance matrix. But the eigenvalues that are important from a risk perspective are the least important ones from a portfolio optimization perspective, where C^{−1} matters, see (3.3), φ = (1/θ)C^{−1}μ. The eigenvalues of the information matrix are the reciprocal values 1/λ_k of the eigenvalues λ_k. This trade-off between risk and investment is one reason why portfolio managers often do not use portfolio optimization methods. Furthermore, the small eigenvalues, whose large reciprocals are needed for optimal portfolios, are not robust - a small change of the values heavily changes the portfolio. Therefore, regularization techniques are used. Regularization techniques for the covariance matrix are factor analysis, shrinkage methods, or random matrix theory. Regularization of the whole optimization program is done by introducing constraints on the decision and state variables.

Consider the matrix

M = ( 2.25   0.4330
      0.4330 2.75 ).

This matrix is symmetric. Solving the eigenvalue equation

det(M − λI) = (2.25 − λ)(2.75 − λ) − 0.4330² = 0

we get the two eigenvalues, λ = 3 and λ = 2. Therefore, the matrix M is also positive definite and satisfies all the mathematical properties of a covariance matrix. The information matrix M^{−1} has the inverse eigenvalues 1/3 and 1/2 on its diagonal, which shows the reversal of the ranking order. Solving the two linear systems for the eigenvectors implies

v_1 = (−1.73205, 1)′,  v_2 = (1, 1.73205)′.

The two vectors are orthogonal.
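The same computation with a numerical eigensolver (a sketch; numpy's `eigh` returns the eigenvalues of a symmetric matrix in ascending order):

```python
import numpy as np

M = np.array([[2.25, 0.4330],
              [0.4330, 2.75]])

lam, W = np.linalg.eigh(M)       # eigenvalues [~2, ~3] in ascending order
inv_lam = 1.0 / lam              # eigenvalues of the information matrix M^-1

D = W.T @ M @ W                  # W diagonalizes M: W' M W = Lambda
```

Note how the ordering of `inv_lam` is reversed relative to `lam`, the ranking reversal discussed above.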

[%]                            PCA of C
                           Factor 1  Factor 2  Factor 3
Asset 1                       65       −72       −22
Asset 2                       70        69       −20
Asset 3                       30        −2        95
EV                             8        0.8       0.3
Cumulated σ_p-contribution    88        97       100

Table 3.8: PCA analysis of a covariance matrix. Note that the eigenvalues of C^{−1} are 12, 119, 380 for the factors 1, 2, 3, i.e. the inverse ordering relation compared to the covariance matrix. The first factor in the covariance matrix is a market factor since all components in the eigenvector are positive. It has the largest eigenvalue and contributes 88 percent to the portfolio's volatility.

3.4.3 Linear Factor Model


Consider the linear factor model (2.70) with N assets and K risk factors F. How does one estimate the model? Let X = (R_1, . . . , R_T) be the N × T return matrix and assume that K < N. Then (Kempthorne, Factor Models, MIT Lecture Notes, Fall 2013):

• Step 1: PCA analysis

• x̄ = (1/T) X1, the vector of row means, with 1 the T-vector of ones.

• De-meaning of returns: X* = X − x̄1′.

• Sample covariance: Ĉ = (1/T) X*X*′.

• PCA: Ĉ = ŴΛ̂Ŵ′.

• Step 2: Fixing initial estimates

• α̂₀ = x̄

• β̂₀ = Ŵ_m (Λ̂_m)^{1/2}, where the subindex m indicates the submatrix of the first m columns

• D̂₀ = diag(Ĉ) − diag(β̂₀β̂₀′)

• Ĉ₀ = β̂₀β̂₀′ + D̂₀

• Step 3: Adjustment

• Adjust the sample covariance to Ĉ* = Ĉ − D̂₀

• Adjust the PCA, i.e. compute eigenvalues and eigenvectors for Ĉ* = ŴΛ̂Ŵ′; update the eigenvalue and eigenvector matrices

• Repeat the initial fixing steps, leading to β̂₁, D̂₁ and Ĉ₁ = β̂₁β̂₁′ + D̂₁

• Step 4: Generate a sequence of estimates

• Repeat the adjustments of the last step

• This leads to a sequence of triples β̂_k, D̂_k, Ĉ_k for k = 1, 2, . . . until the change in D̂ becomes sufficiently small

• Finally, use the estimates from the last step
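The steps above can be sketched as follows. This is a simplified version: the number of factors m, the convergence tolerance, the stopping rule on the change in D̂, and the synthetic one-factor test data are all choices of this sketch, not of the source:

```python
import numpy as np

def pca_factor_cov(X, m, n_iter=50, tol=1e-10):
    """Iterative PCA estimate of a covariance C ~ beta beta' + D with m factors.
    X is the N x T matrix of returns (one row per asset)."""
    N, T = X.shape
    x_bar = X.mean(axis=1, keepdims=True)            # Step 1: row means
    Xs = X - x_bar                                   # de-meaned returns
    C_hat = Xs @ Xs.T / T                            # sample covariance

    D = np.zeros(N)
    for _ in range(n_iter):
        lam, W = np.linalg.eigh(C_hat - np.diag(D))  # Step 3: PCA of adjusted covariance
        lam_m, W_m = lam[-m:], W[:, -m:]             # m largest eigenvalues/vectors
        beta = W_m * np.sqrt(np.clip(lam_m, 0.0, None))  # Step 2: beta = W_m Lambda_m^(1/2)
        D_new = np.diag(C_hat) - np.sum(beta**2, axis=1) # residual (idiosyncratic) variances
        if np.max(np.abs(D_new - D)) < tol:          # Step 4: stop when updates are small
            D = D_new
            break
        D = D_new
    return beta, D, beta @ beta.T + np.diag(D)

# illustrative use on synthetic one-factor data
rng = np.random.default_rng(1)
b_true = rng.standard_normal((4, 1))
X = b_true @ rng.standard_normal((1, 1000)) + 0.2 * rng.standard_normal((4, 1000))
beta, D, C_model = pca_factor_cov(X, m=1)
```

By construction the model covariance matches the sample covariance on the diagonal at every iteration.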

Another approach to find the parameters is to use maximum likelihood estimation. Geometrically, the matrix β is found by an orthogonal projection of the returns on the set generated by the factors. This projection gives the betas in beta pricing models or the factor risk premia in the APT model. Analytically, β is given by the eigenvectors of the PCA.

Example (Roncalli [2014])

Consider the S&P 500, SMI, Eurostoxx 50, and Nikkei 225 indices from Apr 1995 to Apr 2015. Calculating the correlation matrix on a weekly basis using the closing prices gives:

ρ = ( 1
      0.80  1
      0.82  0.88  1
      0.67  0.56  0.58  1 ).

The data indicate that the correlation between the European and American markets is stronger than between the Japanese market and the European or American one. We therefore set up a two-linear-factor model. The matrix β follows from the likelihood estimation:

β = ( −0.015  0.21  0.29  0.35
       0.91   0.93  0.96  0.76 ).

The portfolio is long only in one factor, the market factor by definition, and long/short in the second factor. Here it is short in the S&P 500 and long in the other three indices.

3.4.4 Dimension Reduction II: Random Matrix Theory


Given a PCA analysis of a covariance matrix - how noisy are the estimated eigenvalues? Random Matrix Theory (RMT) studies the eigenvalues and eigenvectors of large-dimensional matrices whose entries are sampled according to known probability densities.²¹ Basically, if the eigenvalue distribution of a covariance matrix is close to that of a matrix of completely random entries, then randomness dominates in the covariance matrix. A main feature of RMT is universality: the asymptotic behavior of random matrices is often independent of the distribution of the entries. A second one is that the limiting distribution takes non-zero values only on a bounded interval, displaying sharp edges. Sharp edges indicate that eigenvalues outside of the asymptotic range are non-random.

We write the empirical covariance (3.53) in the form

C^S = RR′

where R is an N × T matrix whose rows are the time series of the returns, one row for each stock. We assume that returns are normalized by their standard deviation such that their variance is 1. Suppose that the entries of R are random IID variables with

²¹ References for RMT are Wigner (1951), Wishart (1928), Pastur and Marchenko (1967), Levina and Vershynin (2012), Vershynin (2012), Gatheral (2008).

mean zero and variance σ², i.e. R ∼ N(0, C). R is a random matrix. Using PCA, the hope is to find a low dimensional structure in the distribution which corresponds to large eigenvalues of C. How close are the spectral properties of C^S and C? If N is fixed and T → ∞, the law of large numbers guarantees E[C^S] = C. But N is often of the order of T or even larger. In this case it is not clear whether C^S converges towards C.

We start with C = I, i.e. there is no low dimensional structure. For T = 500, N = 1000, the histogram in Figure 3.12 shows that for finite N there is a positive probability of finding eigenvalues above or below the theoretical bounds. The red line is the eigenvalue distribution predicted by the Marchenko-Pastur distribution.


Figure 3.12: Simulation for N = 1000 stocks and T = 500, i.e. daily data for two years.
The red line is the theoretical eigenvalue distribution of Marcenko and Pastur. Source:
Gatheral [2008].

The random matrix corresponds to the null hypothesis that the set of stocks considered is strictly independent, and that the correlation matrix is the identity matrix. Any deviation from this structure in the empirical correlation matrix suggests the presence of true information. All eigenvalues which belong to the theoretical spectrum of eigenvalues are noisy and should not be considered in portfolio optimization.

How are the PCA eigenvalues related to the eigenvalues of the random matrix covariance? The theorem of Marcenko and Pastur provides the answer:

Proposition 42 (Marcenko-Pastur). Let R be the random matrix defined above. In the limit N, T → ∞ where the ratio Q := T/N ≥ 1 is kept constant, the density of eigenvalues λ is given by

ρ(λ) = Q √((λ₊ − λ)(λ − λ₋)) / (2πσ²λ)  (3.56)

where

λ± = σ²(1 ± √(1/Q))².

The random matrix C^S is a random Wishart matrix, λ± are the theoretical minimum and maximum eigenvalues of the random correlation matrix, and ρ is the Marcenko-Pastur density.²² The proof of the theorem is based on the following moment expansion and combinatorics:

(1/N) E[tr((R′R)^k)] = ∫_{λ₋}^{λ₊} λ^k ρ(λ) dλ.
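The proposition is easy to verify by simulation. In the sketch below the empirical covariance is normalized by T (so that σ² = 1); the choices of N, T, the seed, and the tolerance band are illustrative:

```python
import numpy as np

rng = np.random.default_rng(42)
N, T = 200, 800                                   # so Q = T/N = 4
Q, sigma2 = T / N, 1.0

R = rng.standard_normal((N, T))                   # pure-noise return matrix
C_S = R @ R.T / T                                 # empirical covariance, normalized by T
lam = np.linalg.eigvalsh(C_S)

lam_minus = sigma2 * (1 - np.sqrt(1 / Q)) ** 2    # lower edge of the support: 0.25
lam_plus  = sigma2 * (1 + np.sqrt(1 / Q)) ** 2    # upper edge of the support: 2.25

# almost the whole spectrum lies inside [lam_minus, lam_plus]
inside = np.mean((lam >= lam_minus - 0.05) & (lam <= lam_plus + 0.05))
```

Eigenvalues well outside [λ₋, λ₊] would signal structure that pure noise cannot produce, which is exactly the de-noising criterion discussed above.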

What can be said about the distribution of the largest eigenvalue? Can we find the cut-off separating noisy eigenvalues from eigenvalues which carry true information? What can be said about the eigenvalue distribution if N > T, and can the IID assumption in RMT be relaxed? The answer to the first question is given by the Tracy-Widom law: the probability distribution of the largest eigenvalue can be expressed analytically in the case of normally distributed random variables. We refer to the literature for details.

Figure 3.13 compares the case of a random identity matrix with the eigenvalue distribution of a risk model (blue histogram). The higher frequency for large eigenvalues indicates that the largest eigenvalues in the risk model, which determine the risk, are not driven by noise, in contrast to the identity matrix assumption. In other words, the risk model is able to capture true risk information. A similar conclusion follows for the small eigenvalues, which dominate in optimal portfolio construction. For the intermediate eigenvalues there is virtually no difference to the pure noise case. These are the factors which matter in the Markowitz model in the long-short portfolio of the expected return minus the beta hedge, and which lead to unstable optimal allocations.

3.4.5 Equal Weighted Portfolios


A radical approach to controlling the estimation error of the covariance matrix is to get rid of this matrix. DeMiguel et al. (2009) compare 14 optimized portfolio approaches across 7 datasets with the 1/N EW investment. This model does not need any correlation estimate. Surprisingly, 1/N is difficult to beat by the 14 optimal portfolios. They empirically compare the Sharpe ratios, analytically derive the critical estimation window length for the mean-variance strategy to outperform 1/N, and use simulations to extend the models to classes of models which are designed to control estimation risk. The findings are:

• Empirically, none of the 14 portfolio models consistently dominates 1/N across all data sets in terms of Sharpe ratio and turnover.

²² ρ(λ) is defined as the limit of ρ_N(λ) := (1/N) Σ_{j=1}^N δ(λ − λ_j) with δ the Dirac delta function.


Figure 3.13: Simulation for N = 1000 stocks and T = 500, i.e. daily data for two years. The blue histogram corresponds to the eigenvalue frequencies of a risk model.

• Using US stock data, for 25 assets in the portfolio the critical estimation window is around 3,000 months of data. This figure doubles for twice as many assets in the portfolio.

• Models which control for estimation risk also need very long data series to outperform 1/N.²³

These results contradict the common view that heuristics are less successful than statistical optimization models. Ignoring part of the ambiguous information - insufficient historical data for the estimation of model input parameters - is what makes the heuristic 1/N robust for the unknown future.
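The mechanism behind these findings can be illustrated in a stylized case. If the true covariance is assumed to be the identity, the true GMV portfolio is exactly 1/N, and a plug-in GMV portfolio estimated from a finite window carries a strictly larger out-of-sample variance. The parameters (N = 25, T = 100, 200 trials) are illustrative choices of this sketch:

```python
import numpy as np

rng = np.random.default_rng(7)
N, T, trials = 25, 100, 200
e = np.ones(N)
ew_var = 1.0 / N                          # out-of-sample variance of 1/N when C = I

oos = []
for _ in range(trials):
    R = rng.standard_normal((T, N))       # estimation window drawn from N(0, I)
    C_hat = np.cov(R, rowvar=False)       # sample covariance
    w = np.linalg.solve(C_hat, e)
    w /= w.sum()                          # plug-in GMV weights
    oos.append(w @ w)                     # out-of-sample variance under true C = I

ratio = np.mean(oos) / ew_var             # > 1: plug-in GMV loses to 1/N out of sample
```

The estimated weights always have a larger norm than 1/N (which minimizes ||w||² under the budget constraint), so the estimation noise alone is enough for 1/N to win in this setting.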

The 1/N model can also be used to get insights about which estimation risk is more severe - return or covariance risk. We follow Rohner (2014). Assume that μ and C are known for 10 assets such that 1/10 is invested optimally in each asset. To see the impact if either of the two parameters is not known, simulate multivariate normal returns, use rolling window estimation with an integration period of 100, and calculate 300 sample means and estimated covariances. Use these estimates to calculate the optimal portfolio weights for each date; see Figure 3.14.
Even if the distribution of returns is known, estimated portfolio weights can deviate

²³ They consider Bayesian portfolios, portfolios with moment restrictions, portfolios with short-sale constraints, and combinations of optimal portfolios.

Figure 3.14: Signicance of return and covariance estimation risk. Source: Rohner [2014].

largely from their theoretical optimal values. It follows that estimation error in expected returns has a larger impact on the optimal portfolio weights than estimation errors in the dependencies. Therefore, GMV portfolios, which do not depend on estimated returns, are more robust than other efficient portfolios.

3.4.6 Optimization and Estimation Risk


We noted that one has to define a procedure outside of the investment optimization program to fix the values of the model parameters. We write for the expected mean-variance utility function, see also (3.1),

U(φ) := E[u] = ⟨φ, μ⟩ − (θ/2)⟨φ, Cφ⟩ =: μ_p − (θ/2)σ_p²,  (3.57)

and for the portfolio rule φ̂ based on the estimation of historical data D_T in T periods

φ̂ = f(D_T)  (3.58)

with f a statistical function. φ̂ is a random function and therefore the out-of-sample variance and mean are also random:

σ̂_p² = ⟨φ̂, Cφ̂⟩,  μ̂_p = ⟨φ̂, μ⟩.

The random out-of-sample objective function reads

Û(φ̂) := ⟨φ̂, μ⟩ − (θ/2)⟨φ̂, Cφ̂⟩.  (3.59)

The random difference L = U − Û is the loss function and its expected value is the risk function. We assume that all risky asset returns are IID normally distributed and that the length of the historical time series T is large enough compared to the number of risky assets N such that estimated matrices can be inverted.

We first consider plug-in examples. Using the maximum likelihood estimates for the two unknowns²⁴ as plug-in values in (3.58), the rule φ̂ follows. This rule is the most efficient estimator of φ_MV. But this estimator is not optimal if we want to optimize the expected out-of-sample performance. The specific assumptions allow us to compare explicitly the estimated strategy φ̂ with the optimal but unknown φ_MV in (3.3):

E(φ̂) = T/(T − N − 2) φ_MV.

If T > N + 2, the investor using the estimated values will take riskier positions, as measured by the portfolio variance, than the investor who unrealistically assumes to know the true parameters (φ_MV). For infinitely long time series, the two strategies agree. Kan and Zhou (2011) provide a detailed analysis.

3.4.7 Bayesian Approach


The investor cares about the prior distribution of the model parameters and the goal is to update this distribution to the posterior one. Note that there is no optimal prescription for how the prior distribution is found. Assuming normal returns, the prior distribution - called the diffuse prior P₀ - of the mean and the covariance matrix is distributed as (see Stambaugh (1997))

P₀(μ, C) ∼ |C|^{−(N+1)/2}.  (3.60)

Using this prior, the posterior distribution P(μ, C|Data) conditional on the available data set is a t-distribution with T − N degrees of freedom. The optimal Bayesian investment rule becomes

φ̂_Bay = f(T, N) (1/θ) Ĉ^{−1} μ̂ = f(T, N) φ̂_MV,  f(T, N) := (T − N − 2)/(T + 1).  (3.61)

The Bayesian investor therefore holds the same proportional portfolio C^{−1}μ as the MV investor who does not consider estimation risk. But the function f(T, N) shapes the portfolio investment uniformly for all assets. Since f < 1 for any reasonable problem, the investment in the risky assets is smaller in the Bayesian approach - estimation risk is identified and priced - than in the MV optimal plug-in case without estimation risk. This follows from

E(φ̂_Bay) = T/(T + 1) φ_MV.
²⁴ We estimate the sample mean by μ̂ = (1/T) Σ_{t=1}^T R_t and similarly for the covariance estimate.

The difference between the Bayesian and the optimal investment φ_MV is small in the case of a diffuse prior. For arbitrarily long time series of data the model has 'learned' the true model parameters - the two rules then agree. To obtain a more sensitive difference between the Bayesian and the optimal approach, other priors, informative priors, are considered; see Pastor and Stambaugh (2000).

Kan and Zhou (2011) show that the Bayesian approach leads to a better out-of-sample performance than plug-in approaches: the Bayesian portfolio rule based on a diffuse prior uniformly dominates the classic plug-in approaches. They consider a two-fund portfolio rule

φ̂_out = f* φ̂_MV  (3.62)

which dominates the Bayesian rule based on the diffuse prior uniformly (and hence the plug-in rules). f* maximizes the expected out-of-sample performance. This holds since f(T, N) in (3.61) does not maximize the expected out-of-sample performance. Optimizing f, however, requires unrealistically long time series in order to get a substantial improvement. If this is not the case, the optimized expected out-of-sample approach still leads to negative performances. How can we avoid negative performances? Kan and Zhou therefore consider a three-fund separation portfolio rule which shows, in simulation experiments, higher expected out-of-sample performance than the former methods. This is not an optimal portfolio. But it hedges against parameter uncertainty risk and therefore leads to positive expected out-of-sample returns if the weights in the combination are properly chosen. The shrinkage approach of Jorion (1991) is a particular three-fund rule, see below.

3.4.8 Linear Shrinkage of the Covariance Matrix


Linear shrinkage is the approach where a highly structured estimator F is combined with
the unstructured sample covariance matrix C^S in the form

νF + (1 − ν)C^S , ν ∈ [0, 1].

The shrinkage value ν is constant. Ledoit and Wolf (2003) state: 'The beauty of the
principle is that by properly combining two extreme estimators one can obtain a com-
promise estimator that performs better than either extreme.'

To illustrate the shrinkage idea, consider an N × N covariance matrix where σ_ij is the
non-observable true covariance for i ≠ j and cov_ij is the sample covariance. Shrinking
the sample covariance towards zero, the squared deviation of the shrunk estimate from the
true value reads ((1 − ν) cov_ij − σ_ij)², which serves as a loss measure. Since the sample
covariances are random, we take the expected loss function; its minimization is a routine
quadratic optimization with the following optimal shrinkage intensity:

ν = Σ_{j>i} var(cov_ij) / Σ_{j>i} (var(cov_ij) + σ_ij²) .
3.4. ESTIMATION: THE COVARIANCE MATRIX 181

Hence, to fix the optimal shrinkage intensity ν, the variances have to be estimated. Since
the numerator and the denominator are both positive and the latter dominates the former,
the shrinkage intensity takes values in the unit interval.
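A short sketch of the linear-shrinkage recipe with the constant-correlation target used by Ledoit and Wolf (2008); the intensity ν is set by hand here rather than estimated:

```python
import numpy as np

def linear_shrinkage(returns, nu):
    """Shrink the sample covariance toward a constant-correlation target F.

    returns : (T, N) array of asset returns
    nu      : shrinkage intensity in [0, 1], fixed by hand in this sketch
    """
    C_s = np.cov(returns, rowvar=False)           # unstructured sample estimate
    sd = np.sqrt(np.diag(C_s))
    corr = C_s / np.outer(sd, sd)
    n = len(sd)
    # structured target F: all off-diagonal correlations equal their average
    r_bar = (corr.sum() - n) / (n * (n - 1))
    F = r_bar * np.outer(sd, sd)
    np.fill_diagonal(F, sd ** 2)                  # keep sample variances on the diagonal
    return nu * F + (1.0 - nu) * C_s

rng = np.random.default_rng(0)
R = rng.normal(size=(60, 5))                      # T = 60 months, N = 5 assets
C_shrunk = linear_shrinkage(R, nu=0.5)
```

Because the target keeps the sample variances, shrinkage only pulls the off-diagonal covariances toward the average-correlation structure.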

The choice of the structured estimator is not unique. Ledoit and Wolf (2003) have
chosen the Sharpe (1963) single-factor matrix, Ledoit and Wolf (2008) used the constant
correlation matrix, and Jorion uses the GMV returns as follows. Since the GMV portfolio is
the only efficient portfolio for which no return estimates are needed, moving away from the
GMV portfolio increases estimation risk. The idea of Jorion is to shrink the estimated
asset returns towards the GMV return. This shrinking is used to define the expected
return μ^S as a convex combination

μ^S = (1 − ν) μ̂ + ν μ_{0,GMV} e (3.63)

with e a vector of 1's, μ̂ the vector of sample mean returns and

μ_{0,GMV} = ⟨e, Ĉ^{-1} μ̂⟩ / ⟨e, Ĉ^{-1} e⟩

the estimated return of the GMV portfolio, where Ĉ is the estimated covariance
matrix of the returns. Jorion shows that choosing

ν = φ̂ / (φ̂ + T) , φ̂ = (N + 2) / ⟨μ̂ − μ_{0,GMV} e, Ĉ^{-1} (μ̂ − μ_{0,GMV} e)⟩

minimizes estimation risk. For T → 0, distrust in the sample means increases since ν converges to 1.
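Jorion's shrinkage of the sample means towards the GMV return can be sketched directly from (3.63); the helper name is ours:

```python
import numpy as np

def jorion_shrunk_means(returns):
    """Shrink sample mean returns toward the GMV return, cf. (3.63)."""
    T, N = returns.shape
    mu = returns.mean(axis=0)
    C_inv = np.linalg.inv(np.cov(returns, rowvar=False))
    e = np.ones(N)
    mu_gmv = (e @ C_inv @ mu) / (e @ C_inv @ e)   # estimated GMV return
    d = mu - mu_gmv * e
    phi = (N + 2) / (d @ C_inv @ d)
    nu = phi / (phi + T)                          # shrinkage intensity in (0, 1)
    return (1 - nu) * mu + nu * mu_gmv * e, nu

rng = np.random.default_rng(2)
R = rng.normal(0.005, 0.04, size=(120, 6))        # T = 120 months, N = 6 assets
mu_s, nu = jorion_shrunk_means(R)
```

The shrunk means are a convex combination, so their cross-sectional dispersion is strictly smaller than that of the raw sample means.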

3.4.9 Non-Linear Shrinkage of the Covariance Matrix


The class of non-linear shrinkage estimators was introduced by Stein (1975, 1986). Intu-
itively, small eigenvalues of the sample covariance matrix are pushed up and the large ones
pulled down by an amount that is determined individually for each eigenvalue. Given
N eigenvalues, there are N degrees of freedom. This therefore defines an intermediate
approach to covariance estimation between the low-dimensional optimized approaches
and the unstructured sample estimate. Based on the work of Stein and the theorem of
Marchenko and Pastur, see also the Random Matrix Section, Ledoit and Wolf (2018)
derive the following proposition, where we use a verbal description.
We compare in this asymptotic set-up the different models of the next sections, where the
goal of the investor is to maximize the reward-to-risk ratio

max_φ ⟨φ, m⟩ / √⟨φ, Cφ⟩

where m is a predictive signal of future returns and not the usual expected return.
A solution portfolio is proportional to C^{-1} m, and Ledoit and Wolf (2018) choose the
proportionality constant such that scale independence holds, i.e.

φ = (√⟨m, m⟩ / ⟨m, Cm⟩) C^{-1} m , φ̂ = (√⟨m, m⟩ / ⟨m, Ĉm⟩) Ĉ^{-1} m , (3.64)

are the optimal portfolio vectors, where the second formula is the plug-in estimator. Since
the goal is to obtain the best estimator out-of-sample, the loss function for portfolio
selection is defined as the out-of-sample variance

L = ⟨φ̂, C φ̂⟩. (3.65)

Contrary to the above loss function definition, in the asymptotic limit the loss function
converges to a non-stochastic limit, see below, and no expectation is needed. Calculations
show that, given the signals m, the minimization of the above loss function is equivalent to
maximizing the Sharpe ratio, which is equivalent to maximizing a quadratic utility function
of wealth such as given in (3.59).
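The scale independence of (3.64) is easy to verify numerically; a toy sketch with a known C:

```python
import numpy as np

def signal_portfolio(m, C):
    """Scale-independent portfolio of (3.64):
    phi = sqrt(<m, m>) / <m, C m> * C^{-1} m."""
    return np.sqrt(m @ m) / (m @ C @ m) * np.linalg.solve(C, m)

m = np.array([1.0, 1.0])             # toy predictive signal
C = np.eye(2)                        # toy true covariance
phi = signal_portfolio(m, C)
loss = phi @ C @ phi                 # out-of-sample variance (3.65) with true C
```

Multiplying the signal by any positive constant leaves the portfolio unchanged, which is exactly the scale independence the normalization is designed to deliver.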

Proposition 43. Assume the asymptotic setup in Section 3.4.11, the class of rotation-
equivariant estimators (Stein [1975]) and that the signal distribution m_T is rotation in-
variant and independent of C_T^S.

A The loss function (3.65) is a deterministic function. There is no need to consider the
expected loss function.

B The limit loss function is characterized by the limit shrinkage function and the Stieltjes
transform of the limiting empirical distribution of sample eigenvalues.

C There exists a limiting loss function estimator.

We refer to Ledoit and Wolf (2018) for a detailed discussion.

3.4.10 Comparing Dierent Approaches


Kan and Zhou (2011) compared the expected out-of-sample performance for different
time windows T of the historical data for 13 portfolios. They assume a relative risk
aversion θ = 3 in the optimal portfolio rule (3.3). The asset space is given by the N = 10
largest stocks in the NYSE from Jan 1926 to Dec 2003. The mean and covariance matrix
are estimated from monthly returns, and the excess returns of the 10 assets are assumed to
be generated from a multivariate normal distribution with the estimated mean
and covariance as parameter values. They report results for the following strategies:

• I: Theoretical optimal, i.e. the investor knows the true µ and C.

• II: Investor knows the squared Sharpe ratio ⟨μ, C^{-1} μ⟩ of the tangency portfolio but
not the two components of the Sharpe ratio. The investor theoretically invests an
optimal amount in the ex-ante optimal tangency portfolio.

• III: Theoretical three-fund portfolio.

• IV: Plug-in portfolio. μ̂ and Ĉ resulting from the maximum likelihood method are
plugged in.

• V: Bayesian portfolio rule.



• VI: Rule II where the theoretical squared Sharpe ratio is replaced by its estimated
value.

• VII: Jorion's shrinkage rule.

• VIII: Estimated three-fund portfolio, i.e. III where the theoretical values are re-
placed by their estimates.

Rule T = 60 T = 180 T = 300 T = 420


I 0.419 0.419 0.419 0.419
II 0.044 0.122 0.171 0.210
III 0.133 0.191 0.224 0.248
IV -5.122 -0.748 -0.225 -0.025
V -2.996 -0.584 -0.170 0.002
VI -0.185 0.060 0.133 0.177
VII -0.899 -0.030 0.117 0.182
VIII -0.343 0.051 0.143 0.189

Table 3.9: Out-of-sample performance for 8 portfolio rules with 10 risky assets (Kan and
Zhou [2011]).

Table 3.9 shows that very long time series are needed in order to obtain positive expected
out-of-sample performance for any rule; T = 420 months means 35 years of data.
The first three rules all lead to positive performance but, unfortunately, they are
theoretical models. Replacing the unknown theoretical parameter values by their sample
estimates, the positivity vanishes for short windows. The direct plug-in approach based
on maximum likelihood estimates is the worst model w.r.t. out-of-sample performance.
The shrinkage rule and the three-fund rule lead to the same values for large windows, but
for shorter time windows the superiority of the three-fund rule over Jorion's rule is
evident.

3.4.11 Comparing Dierent Approaches - Asymptotics


Ledoit and Wolf (2018) provide an asymptotic analysis on a monthly basis using data from
the Center for Research in Security Prices (CRSP), Jan 1 1972 until Dec 31 2011. The
out-of-sample period ranges from Jan 19 1973 to Dec 31 2011, i.e. T = 480 months.
Each month the covariance matrix is estimated using the most recent T = 250 daily
returns. Portfolio sizes N are 30, 50, 100, 250, 500, covering the majority of important
stock indices. They first fix the 500 largest stocks with a complete return history over the
past year and an expected complete history over the next month, and then select the N
stocks at random.

For the asymptotic case where N, T tend to infinity while N/T stays finite, the
asymptotic set-up has to be defined properly, see Ledoit and Wolf (2018) for details.
All variables are indexed with the subscript T. One basic assumption is that the eigen-
values of C_T are sorted in increasing order and the empirical eigenvalue distribution
E_T converges for T, N → ∞ to a limit law E whose support is bounded away
from zero. Furthermore, one assumes the sample covariance matrix representation

C_T^S = (1/T) Y_T' Y_T = (1/T) √C_T X_T' X_T √C_T

consisting of the observable Y and the non-observables X, C, with C_T a symmetric
positive-definite matrix, √C_T its matrix square root, and the T × N data-generating
matrix X_T having IID random entries with finite moments.

Then the sample C_T^S admits a PCA decomposition and the empirical distribution
function E_T^S of the eigenvalues converges to a continuously differentiable limit distribu-
tion E^S for T to infinity.

The test goal is the estimation of the GMV portfolio without any short-sale restrictions.
They consider 11 portfolios, but we restrict ourselves to the following cases:

• 1/N portfolio.

• Sam: Sample covariance matrix estimate portfolio.

• Lin: Linear Shrinkage portfolio.

• Non-Lin: Non-linear Shrinkage portfolio.

• Inv-Non-Lin: Non-linear shrinkage portfolio where the information matrix is esti-
mated.

• Sharpe: The portfolio where the covariance estimate is given by a single-factor model.

• FF: Estimated covariance is given by the Fama-French three-factor model.

Table 3.10 presents the results. Since the standard deviation of the true GMV portfolio
decreases in N, this should be reflected by the different constructions. Increasing from
N = 30 to N = 500, the standard deviation of 1/N decreases by only 1.1 percentage
points, compared to 3.9 and 4.4 percentage points for Lin and Non-Lin. In general, 1/N
is consistently outperformed in terms of the standard deviation by all other portfolios
with the exception of the sample portfolio. Non-Lin has the uniformly best performance among
the rotation-equivariant portfolios. For N = 250 and 500, Sharpe ratio gains are 0.08
and 0.06, or in relative terms 15% and 12%, respectively. If one forms the factor portfolio
Non-Lin-Sharpe, then it outperforms FF, which outperforms SF (numbers not displayed).
Summing up, Non-Lin dominates all other rotation-equivariant portfolios in
terms of the standard deviation and additionally Lin in terms of the Sharpe ratio. Con-
sidering the summary statistics of portfolio weights over time, the most dispersed weights
among the rotation-equivariant portfolios are found for Sam. The three shrinkage meth-
ods generally have the least dispersed weights. The authors provide robustness tests,
tests with transaction costs and tests where individual stocks are replaced by the Ken
French portfolios.

Portfolio 1/N Sam Lin Non-Lin Non-Lin-Inv Sharpe FF


N=30
AV 11.14 8.64 8.52 8.71 8.72 8.22 9.39
SD 20.05 14.21 14.16 14.08* 14.08 14.08 14.59
SR 0.56 0.61 0.6 0.62 0.62 0.56 0.66
N=50
AV 9.54 4.65 5.1 5.21 5.22 5.22 5.44
SD 19.78 13.15 12.75 12.68*** 12.68 13.04 12.51
SR 0.48 0.35 0.4 0.41 0.41 0.4 0.43
N=100
AV 10.53 4.74 4.99 5.1 5.12 4.81 5.8
SD 19.34 13.11 11.79 11.52*** 11.55 11.96 11.3
SR 0.54 0.36 0.42 0.44 0.44 0.4 0.51
N=250
AV 9.57 275.02 5.81 6.26 6.43 5.95 6.6
SD 18.95 3,542.90 10.91 10.34*** 10.49 11.3 10.47
SR 0.5 0.08 0.53 0.61 0.61 0.52 0.63

Table 3.10: Performance measures for various estimators of the GMV portfolio. AV,
average; SD, standard deviation; SR, Sharpe ratio; FF, Fama-French three-factor model.
All measures are based on 10,080 daily out-of-sample returns in excess of the risk-free
rate. In the rows SD, the lowest number appears in bold. In the columns Lin and Non-
Lin, significant out-performance of one of the two portfolios over the other in terms of
SD is denoted by asterisks: *, **, and *** indicate significance at the 10%, 5%, and 1%
level, respectively (Ledoit and Wolf [2018]).

3.5 Factor Models


One searches for liquid objects - risk factors - which (i) are random variables that
should explain/generate the returns of assets, (ii) are not divisible into smaller parts and (iii)
are such that different risk factors do not contain the same risk sources (risk unbundling). Do
risk factors exist? How are they selected? Several different approaches are used to select
factors. The first one uses theory. The classic is the CAPM, where the market portfolio
return is the only factor which determines expected returns. Merton (1973) extended the the-
ory to the inter-temporal context. In this model, any state variable that predicts future
investment opportunities - such as the term premium, volatility premium, default premium
or inflation - defines an additional factor. Then, Breeden added consumption to capital asset
pricing, which links asset returns to their covariances with the marginal utility of consumption.

Statistical factor selection is a second approach, with the arbitrage pricing theory
(APT) of Ross as the classic model. Finally, identifying factors based on firm charac-
teristics, with the famous three-factor model of Fama and French (1993), defines the
empirical approach to factor selection.

From a risk perspective, priced risk such as the market risk premium is distinguished
from idiosyncratic risk25:

Definition 44. Let R_i be the return of an asset and R_Sys be the return of a portfolio with
the same systematic risk. The difference

E(R_i) − R_Sys = α_i (3.66)

defines the idiosyncratic risk (or alpha) of asset i.

Adding more and more stocks to a portfolio reduces idiosyncratic risk, see Proposition
25. In this sense, alpha does not scale. Is there a decomposition of asset returns which is
scalable? The so-called Professors' Report on the Norwegian GPFG (Ang et al. [2009])
states that the risk factor decomposition represents 99.1 percent of the fund's return
variation.

3.5.0.1 Style Investment: Quality and Momentum


Empirical observations of some liquid trading strategies, all different from investing in
the broad market, can show empirically persistent, time-averaged return patterns
in the data. The patterns are based on grouping the assets into so-called styles or factors.

25 Alpha has different meanings:

• Active portfolio management: a better active manager will have a more positive alpha at a given
level of risk.
• Passive portfolio management: the better the passive portfolio matches the benchmark portfolio,
the smaller is alpha.

The Fama-French factors value and size are examples. The factors capture firm charac-
teristics such as valuation ratios derived from the balance sheets and income statements,
or market parameters such as volatility. A characteristic is then mapped into a tradeable
liquid strategy in a long-only or a long-short (market-neutral) combination. Figure 3.15
illustrates the decomposition of risk premia into traditional and alternative ones. We
stress that the different premia are overlapping, i.e. they are not orthogonal to each
other.

[Diagram: risk premia are split into traditional premia (equities, interest rates, real estate), which are covered by the global market portfolio, and alternative risk premia, which are only weakly covered by it and whose driving factors are assumed orthogonal. The alternative premia are grouped by asset class (equities, interest rates, credit, currencies, commodities) and by style: carry, value, volatility, momentum and idiosyncratic; the detailed factor list is repeated in Figure 3.20.]

Figure 3.15: Overview of traditional and alternative risk premia.

3.5.0.2 Quality Premium


We consider the quality risk factor (EQ Quality), see Figure 3.16, for all stocks in the
MSCI Europe. One calculates on a monthly basis firm-specific figures such as profitability,
net profit or degree of indebtedness. This allows one to calculate the quality figure (Q-
figure) for all firms. To account for the sector structure, the Q-figure is normalized by
using the average sector Q-figure and the sector volatility. This defines the Q-score
- the characteristic. Ranking these scores, one observes that on average those firms
with a high score delivered a larger return than those with a lower score. This is the
empirical feature called EQ Quality. If one believes that this historical return pattern
will continue to hold on average in the future, one can invest in such a strategy. Large
investment banks and asset managers offer tradeable products which transform the
above empirical observation into a financial asset. A long-short EQ implementation
removes directional risks. Some institutional investors do not want to or cannot
invest in long-short vehicles. But investing long only in a risk premium is not market
neutral: market neutrality is lost, and the correlations between risk premia and between
traditional asset classes move significantly away from a weak correlation structure. A
long-short strategy, however, is not free of risk either, see the momentum crash below. The
producers offer the risk premia products as fully transparent indices. Different wrappers are
used for risk premia investment - UCITS funds, ETFs or structured notes.
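The Q-score construction can be sketched on hypothetical data (stock names, sectors and Q-figures invented for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical universe: Q-figures per stock with sector labels.
df = pd.DataFrame({
    "stock":  ["A", "B", "C", "D", "E", "F"],
    "sector": ["Tech", "Tech", "Bank", "Bank", "Tech", "Bank"],
    "Q":      [2.5, 1.6, 3.8, 0.1, 2.0, 2.9],
})

# Q-score: normalize each stock's Q-figure by its sector mean and volatility.
grp = df.groupby("sector")["Q"]
df["Q_score"] = (df["Q"] - grp.transform("mean")) / grp.transform("std")

# Long the highest-scoring names, short the lowest (here: top/bottom stock).
ranked = df.sort_values("Q_score", ascending=False)
long_leg = ranked.iloc[0]["stock"]
short_leg = ranked.iloc[-1]["stock"]
```

In a production index the long and short buckets would be the top and bottom 20% of scores after the liquidity and borrowing-cost screens of Figure 3.16.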

[Diagram: starting from the MSCI Europe universe, monthly company figures yield a quality figure (Q-figure) Q_K per stock, which is normalized per sector as Q_score = (Q_K − Q_K^Sector)/σ_Sector. Stocks are ranked by Q-score and, after a final selection step (liquidity, borrowing costs), the 20% highest scores form the long positions and the 20% lowest scores the short positions of the ARP strategy.]

Figure 3.16: Construction of the risk factor quality.

3.5.0.3 Momentum Factor


The idea is to extrapolate past performance into the future by 'buying the past winners
(long) and selling the past losers (short)', see Figure 3.17.
Daniel and Moskowitz (2012) consider a time series from 1932 to 2011 using inter-
national equities; there are also 27 commodities, 9 currencies, and 10 government bonds in
their data set. They find that in the period from post-WWII through 2008, the long/short
equity momentum strategy had an average return of 16.5 percent per year, a negative
correlation (beta) with the market of −0.125, and an annualized Sharpe ratio of 0.82.
They document that momentum is pervasive for equities, currencies, commodities, and
futures. The maximum monthly momentum return was 26.1%, and the worst five
monthly returns were −79%, −60%, −46%, −44%, and −42%. Intuitively, the premium
is positive if the winners' return is larger than the losers' return. In a momentum crash,
past winners are future losers and vice versa - one is wrong on both the long and the short
leg of the investment. This happened in fast market rebounds:

Figure 3.17: We assume that stocks are screened based on their past return over the
last J = 3 months (J = 6 or 12 months are also used). This screening identifies
the past winners and losers and defines the formation period. After this identification,
no action is taken for one month; the reason is to filter out possible erratic price
fluctuations in the past winners and losers selection portfolio. Finally, in the holding
period the selected stocks are held for K = 3 months, where again longer holding periods
are possible. Afterwards the positions are closed. This procedure is repeated monthly,
which leads to an overlapping roll-over portfolio allocation.

• In June 1932 the market bottomed. In the period July-August 1932, the market
rose by 82 percent. Over these two months, losers outperformed winners by 206
percent.

• In March 2009 the US equity market bottomed. In the following two months, the
market was up by 29 percent, while losers outperformed winners by 149 percent.
Firms in the loser portfolio had fallen by 90 percent or more (such as Citigroup,
Bank of America, Ford, GM). In contrast, the winner portfolio was composed of
defensive or countercyclical firms like AutoZone.

The rationale is simple. Suppose markets are crashing. The losers have already lost value
before the crash, and during the crash they become extremely cheap if one believes
that they will not default. But once investors are convinced that markets will recover,
the demand for the losers exceeds that for the winners, which leads to the winner-loser
reversal.
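The formation/skip/holding mechanics of Figure 3.17 reduce to a simple ranking; a sketch on synthetic monthly prices (function name ours):

```python
import numpy as np

def momentum_legs(prices, J=3, skip=1):
    """J-month formation, 1-month skip screen (the mechanics of Figure 3.17).

    prices : (T, N) array of monthly prices. Returns index arrays of the
    winner (long) and loser (short) quintiles, measured over the formation
    window ending `skip` months before the current month.
    """
    past = prices[-1 - skip] / prices[-1 - skip - J] - 1.0   # formation return
    order = np.argsort(past)                                 # ascending ranks
    q = max(1, prices.shape[1] // 5)                         # quintile size
    return order[-q:], order[:q]                             # winners, losers

prices = np.ones((6, 5))                      # 6 months, 5 assets, flat except...
prices[4] = [0.5, 0.9, 1.0, 1.1, 2.0]         # ...the month ending the formation window
winners, losers = momentum_legs(prices)
```

In the full strategy the winner and loser legs are then held for K months and the screen is rolled forward monthly, producing overlapping portfolios.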
Byun and Jeon (2018) suggested adapting the momentum strategy in order to reduce
the impact of momentum crashes. They propose to observe past returns for 12 months
but to invest for only 1 month, with the decision criterion for going long or short being the
past cumulated 52-week return. The authors therefore expect that the 52-week high sub-
sumes the predictive power of the past 12-month return, while investing for only one month
adapts to the often-observed fast momentum reversals. This mimics that, as the market
rebounds, investor demand increases for stocks that are far from their 52-week highs. This
bias induces a negative relation between the 52-week high and future returns. The authors
show that during crash periods, stocks far from their 52-week highs outperform stocks near
their 52-week highs.

Figure 3.18: Long-only momentum strategies. Left panel - momentum strategies 1947-
2007. Right panel - momentum strategies during the GFC (Daniel and Moskowitz [2012]).

Figure 3.19 shows the return of investing $1 from 1956 until 2015 in a market factor
and in the styles size, value and momentum.26 In this long-term view, three observations
are immediate: first, simple grouping of assets can lead to significant outperformance of
the market return over long periods; second, there are short time periods in which factor
investments can crash - most dramatically seen in the momentum crash during the GFC.
Finally, the factors do not seem to be independent - momentum and the market crash and
boom in parallel during the GFC and the following decade.

Is there a theoretical foundation of styles/risk factors? Can theory differentiate be-
tween styles which are supported in equilibrium and styles which are fantasies? How
are risk factors identified and turned into tradeable strategies? How can it be that in the
long run simple grouping of assets produces persistently much higher returns, i.e. why
aren't they arbitraged away? Who is on the other side of the trades? We consider these
questions in the sequel.

26 Value effect: low price-to-book (P/B) stocks (value stocks) typically outperform high P/B stocks
(growth stocks). Size effect: smaller stocks typically outperform larger stocks.

Figure 3.19: Investment return of $1 in 1956-2014 in the market, market plus value,
market plus size and market plus momentum factor (Ken French's website).

3.5.1 Industry Perspective


A key step for the industry towards factor investing were the requirements published in
the Professors' Report (2009) written for the Norwegian Government Fund. The authors
required that factors, different from the market risk premium but able to explain the
cross-section of asset excess returns, should:

• have an intellectual foundation (rational or behavioural) [Explainable].

• exhibit significant premiums which are expected to persist in the future [Persis-
tence].

• not be correlated among themselves and with asset classes in good times, and be nega-
tively correlated in bad times [Independence].

• be implementable in liquid, tradeable instruments [Liquidity].



The notion of 'good' and 'bad' times is made precise in economic theory by the stochastic
discount factor (SDF), see Section 4.

The financial industry defines factor investing similarly to the Professors' report. Deutsche
Bank [2015] states, in addition to the above requirements:

• Accessible - risk factors must be accessible at a level of cost that is sufficiently low
to avoid the dilution of the return.

• Fully transparent - strategies are fully systematic and work within well-defined rules.

• Low cost - a well-defined systematic approach makes efficient transaction costs
possible.

• Flexible access - strategies can be accessed in a variety of formats – either funded
or unfunded as a portfolio overlay – and in a variety of wrappers (OTC, structured
notes, UCITS funds, etc.).

Factor investing means alternative strategies defined on liquid assets, not the creation
of new, illiquid asset classes. Transparency has changed radically in the last decade. Some
years ago, an investment bank's offering of a momentum strategy was basically a black
box for the investor. Today, each factor is constructed as an index with comprehensive
documentation about the index mechanics, the risks and governance issues. Hedge funds
often use factor investing strategies, but they are not transparent.

3.5.1.1 Industry Oering


We consider the practice of factor offering by large asset managers.27 The process of
building a risk factor portfolio is as follows (Deutsche Bank [2015]):

• Identify rst the key objectives of the portfolio and the preferences of the investor.

• Start with a long list of potential risk factors and select a core portfolio made up
of the most attractive risk factors. Figure 3.20 shows the cross-asset risk factor
list of DB.

• Add any uncorrelated factors if they offer a benefit to the portfolio.

• Finalize the list of selected risk factors and construct a portfolio using a risk-parity
methodology.

• Review and test the portfolio against general measures of diversification.
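A common simplification of the risk-parity step - not DB's proprietary methodology - is inverse-volatility weighting, in which each factor sleeve contributes the same standalone volatility:

```python
import numpy as np

def inverse_vol_weights(C):
    """Naive risk-parity sketch: weight each factor by its inverse volatility,
    so low-vol premia are levered up relative to high-vol premia."""
    inv_vol = 1.0 / np.sqrt(np.diag(C))
    return inv_vol / inv_vol.sum()

# Two-factor toy example: 2% vol rates premium vs. 12% vol equity premium.
C = np.diag([0.02 ** 2, 0.12 ** 2])
w = inverse_vol_weights(C)
```

With these numbers the low-vol rates sleeve receives six times the weight of the equity sleeve, which is exactly why leverage is needed to reach a common volatility target.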

Figure 3.20, upper panel, shows the cross-asset risk factor list of DB and some key figures.
Risk and return properties of the different risk factors differ. Therefore, if one invests
into a portfolio with a target volatility to control downside risk, leverage is needed because
27 The data are from Deutsche Bank (DB) or JP Morgan (JPM).



Category       EQ Equities                 IR Interest Rates        CR Credit           FX Currencies      CO Commodities
Carry          EQ Dividends,               IR Carry Diversified,    CR Carry HY vs. IG  FX Global Carry    CO Carry (Curve)
               EQ Merger Arb               IR Muni/Libor
Value          EQ Value                                                                 FX Value           CO Value
Volatility     EQ Glob Vol Carry,          IR Vol                                       FX Vol Basket,     CO Vol Divers.,
               EQ Mean Reversion                                                        FX Vol Single      CO Vol Single
Momentum       EQ Moment.                  IR Moment.               CR Moment.          FX Moment.         CO Trend,
                                                                                                           CO Momentum
Idiosyncratic  EQ Low Beta, EQ Quality

[Lower panel: bar chart of annualized returns, volatilities and Sharpe ratios per risk factor; not reproduced.]

Figure 3.20: Upper panel: risk factor list of DB London. Risk factors are grouped
according to their asset class base and the five styles used by practitioners. Lower panel:
average annualized volatilities, returns and Sharpe ratios for the risk factors (DB [2015]).

else combining a low-vol 2% interest rate risk premium with a 12%-vol equity premium
makes no sense. Table 3.11 shows monthly correlations. The lower-triangular-matrix
correlations are calculated for turbulent markets; those for normal markets are shown in
the upper triangular matrix.28
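The leverage argument can be made concrete: running each sleeve at a common volatility target scales the position by target volatility over sleeve volatility (illustrative numbers, not DB's calibration):

```python
import numpy as np

# Vol-target sketch: scale each factor sleeve to a common volatility target,
# so low-vol premia are levered up and high-vol premia are de-levered.
target_vol = 0.075                        # e.g. a 7.5% volatility target
sleeve_vols = np.array([0.02, 0.12])      # rates carry vs. equity premium
leverage = target_vol / sleeve_vols       # per-sleeve leverage
```

Here the 2% vol premium is levered 3.75x while the 12% vol premium is scaled down to 0.625x.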
The correlation for the equally weighted portfolio of risk factors implies an annual-
ized correlation of 4% in normal markets and 5% in stressed ones. The correlation with
traditional asset classes is also low. Correlations between the different asset classes are
28 The following periods define turbulent markets:

• May 97 to Feb 98: Asian financial crisis
• Jul 98 to Sep 98: Russian default and collapse of LTCM
• Mar 00 to Mar 01: Dot-com bubble bursts
• Sep 01 to Feb 03: 9/11 and market downturn of 2002
• Sep 08 to Mar 09: US subprime crisis and collapse of Lehman Bros.
• May 10 to Sep 10: European sovereign debt crisis

- ARP EQ Bonds Commodities HF Real Estate PE

ARP 5%/4% 10% 4% 7% 16% 8% 9%


EQ 4% - 6% 39% 47% 64% 27%
Bonds 6% 4% - 18% 5% 7% -20%
Commodities 6% 43% 19% - 30% 24% 28%
HF 13% 41% 11% 32% - 29% 40%
Real Estate 4% 66% 9% 34% 28% - 52%
PE 4% 78% -21% 36% 35% 52% -

Table 3.11: The correlation in the top-left cell is the average equally-weighted correlation
of a portfolio of all DB risk premia. PE means Private Equity. In the lower triangular
matrix the correlations are calculated for turbulent markets; those for normal markets
appear in the upper triangular matrix (DB [2015]).

however much larger. In this sense the risk factors are nearly mutually independent
compared to the asset classes.

Low beta portfolios - that is to say, portfolios of risk factors which have low cor-
relation to equities and bonds in all normal periods and negative correlation to equity
in turbulent markets - are of particular importance, since they promise to resist a joint
downturn of all asset classes. Suitable risk factors are value and momentum risk factors
for all asset classes, low beta risk factors, quality, and US muni curves vs Libor. The
correlation of this portfolio to equity is −1.6% and to bonds 7.6%. In turbulent markets,
the correlation to equity is −37.5% and to bonds 8.8%. The Sharpe ratio is very high and
the maximum drawdown is low, at −5.6%, see Table 3.12.

Statistics Low beta portfolio

% positive 12m returns 99.5%


IRR 10.7%
Volatility 5.0%
Sharpe ratio=IRR/volatility 2.16
Maximum drawdown -5.6%
IRR/MDD 1.93
Days to recover from MDD 120
Correlation to equity -1.6%
Correlation to bonds 7.6%
Stress correlation to equity -37.5%
Stress correlation to bonds 8.8%

Table 3.12: Summary statistics for the low beta portfolio (DB [2015]).

A deeper analysis of the correlation structure reveals that the risk factors can be
clustered into three broad groups (DB [2015]):

• High beta, higher information ratio factors. These factors exhibit high information
ratios but also contain some equity market risk.

• Low beta, stable correlation factors. Factors with moderate correlation levels which
are typically stable.

• Negative beta, lower information ratio factors. Factors that exhibit negative corre-
lations to equity markets.

This observation then leads to timed factor portfolio investments; see the literature for
details.
We conclude this section by comparing a low-volatility portfolio of risk premia of JP
Morgan - the 7.5% target volatility index - with the MSCI World, see Figure 3.21.

Great Financial Crisis: JPM 15.86%, MSCI -42.86%. European Debt Crisis: JPM 13.70%,
MSCI -11.90%. Stress Q1 2016: JPM 3.16%, MSCI 0.02%.

[Table: monthly and annual returns of the XRJPBE5E 7.5% volatility target index, 2006-2016; for the stress episodes in 2008, 2009, 2011 and 2016 the corresponding MSCI monthly returns are listed as well. Annual JPM returns: 2006 5.78%, 2007 18.01%, 2008 25.15%, 2009 16.20%, 2010 15.23%, 2011 16.93%, 2012 20.45%, 2013 11.70%, 2014 7.12%, 2015 5.20%.]

Figure 3.21: Top panel: the return of the JP Morgan risk premia index and the MSCI World 2006-2016. The middle statistics show the cumulative returns of the two indices for three stress events. The bottom panel shows the monthly returns of the JPM index; for the three stress events - GFC, EU debt crisis, Q1 2016 - the returns of the MSCI are also shown. (JPM [2016]).

The top panel shows that investing worldwide diversified did not provide any positive return in the ten-year investment period if the concept of asset diversification is used. The JPM index, in contrast, showed an impressive performance. In more detail, the risk premia performance slope is not the same over the ten-year period: after the GFC of 2008 until the end of 2012, the returns were largest with very low risk. Then, for about one and a half years, there was a standstill period, which was followed by a positive return period with larger risks - the return chart is more zigzagged than in previous years. If we compare the performance of the JP Morgan index with the MSCI in three stress periods - GFC, EU debt crisis and Q1 2016 - we observe that the risk premia index did well compared to the MSCI in the GFC and the EU debt crisis: the construction mechanics of being uncorrelated to traditional asset classes in general and negatively correlated in market stress situations worked. In the Q1 2016 event, things are more complicated. While the same can be said for January and February 2016, the March data show that the risk premia index largely underperformed the MSCI. The reason, from an asset class perspective, is that there was a sharp and fast rebound of stock markets after ECB president Draghi's speech. This rebound was too fast for the risk premia index' quarterly rebalancing frequency. Furthermore, Draghi's speech also affected credit risk premia in a way which is the exception rather than the rule: the credit spread tightening was more pronounced for the iTraxx Europe Main index than for the Crossover index of the same family. This means that risk factors collecting the credit risk premia generated negative returns, since both the long and the short risk premia portfolios were positioned wrongly. A similar remark applies to interest rate risk premia.

How many risk factors are there? Harvey et al. (2015) use 313 published works and selected working papers to catalogue 316 risk factors, and Hou, Xue, and Zhang (2017) report in their study 447 factors. It is clear that not hundreds of factors will be rewarded, i.e. most of them are anomalies. We show in the section on backtesting that an appropriate use of statistical methods rules out most of them. Hou et al. (2017), for example, find that two-thirds of the 447 factors are insignificant at the 5 percent level using the usual critical t-value of two, and 85 percent become insignificant if a critical value of three is used.

3.5.2 Non-Performance of Alternative Risk Premia


Are alternative risk premia performing? The HFR Bank Systematic Risk Premia Indices reflect the performance of the universe of investible risk premia strategies. The indices, which represent many styles across six asset classes, comprise over one thousand individual component strategies.

Table 3.13 shows, for premia indices and for individual premia, that most premia fail to deliver the promised performance. Essentially, the indices show over the last three years zero performance. What are the reasons for this underperformance compared to the promising values of the premia providers in the section before? First, overestimated backtesting results, see Section Backtesting. Second, a mixture within the backtesting period of a period of true investment in the premia and a prior period where returns were calculated theoretically.

Premia (index)                              YTD       Last 36M


Risk Premia Commodity Index -11.72% -2.95%
Risk Premia Credit Index -13.00% 5.70%
Risk Premia Currency Index -3.98% -1.34%
Risk Premia Equity Index -16.29% -2.18%
Risk Premia Multi-Asset Index -20.81% 0.02%
Risk Premia Rates Index -2.93% -3.46%
Risk Premia Commodity Carry Index -6.99% 1.19%
Risk Premia Commodity Volatility Index -16.52% -9.89%
Risk Premia Credit Momentum Index -26.87% -3.12%
Risk Premia Credit Multi-Style Index -4.30% 20.29%
Risk Premia Currency Momentum Index -6.08% -6.65%
Risk Premia Currency Value Index -5.43% 6.50%
Risk Premia Equity Size Index -25.10% -11.85%
Risk Premia Equity Smart Beta Index -13.66% 5.73%
Risk Premia Equity Value Index -11.81% 0.03%
Risk Premia Equity Volatility Index -28.59% -5.38%
Risk Premia Multi-Asset Momentum Index -17.83% 3.19%
Risk Premia Multi-Asset Volatility Index -35.20% -31.94%
Risk Premia Rates Carry Index -5.80% -8.52%
Risk Premia Rates Volatility Index -5.72% 4.28%
Risk Premia Alternative Income Index -0.47% 4.99%
Risk Premia Risk Mitigation Index -6.91% -4.55%

Table 3.13: Performance of risk premia YTD (12.12.2018) and for the last three years. In the upper part, risk premia indices' performances are shown. Below, I selected the best and worst performing individual risk premia for the six asset classes on the three-year basis. (Source: HFR Database (2018)).
198 CHAPTER 3. PORTFOLIO CONSTRUCTION

3.5.3 Theory: Factor Pricing Model


We define factor pricing models and their relation to beta pricing models; see Back (2010) and LeRoy and Werner (2000) for a detailed discussion. The starting point is the linear factor model for returns (2.70).

Linear regressions of the returns on the linear space generated by the factors are equivalent to projections when we assume that the space of returns is an infinite-dimensional complete inner product space (a Hilbert space). Although of infinite dimension, the geometric intuition of R^3 can be applied. This means that the notions of a basis of vectors, orthogonality, projection, and least squares distance common in R^3 are well-defined. The norm is induced by the scalar product for random variables x, y:

\langle x, y \rangle := E(xy) . (3.67)

Since the factor returns span a subspace of the large Hilbert space spanned by the asset returns, a regression is a projection on the smaller space where the difference between the return and its projection is an orthogonal error, i.e. residuals are orthogonal to the factors. The goal in empirical asset pricing is to construct factors such that the error remains small, i.e. returns and their projection differ only slightly.

Consider the one-factor CAPM regression formula, which contains covariances and variances. We discuss why these are nothing but explicit representations of the projections. Let R \in R^3 and the factor space F be 2-dimensional. We distinguish the cases where (i) the factor space is a vector space or (ii) the factor space plane does not intersect the origin (an affine hyperplane), see Figure 3.22. The second case is generic in our context, since the factor space is generated by random variables plus a constant: the risk-free return.

A map P is an orthogonal projection of a real vector space onto a subspace if it is linear, P^2 = P and P' = P.29 We project R on F = span\{F_1, F_2\}, where the two factors are assumed to be orthogonal. Then

P_F(R) = \sum_{i=1}^{2} \frac{\langle R, F_i \rangle}{\langle F_i, F_i \rangle} F_i . (3.68)

It is straightforward to check the two properties that make P an orthogonal projection. If R is orthogonal to both factors, the projection vector is zero, i.e. the orthogonal error is maximal: the factors used for regression are not able to explain the return at all. The opposite holds if one factor is collinear to the return vector.
29 'Projecting twice = projecting once' and P is symmetric.

30 Assuming that the factors are orthogonal or even orthonormal is not a restriction, since the Gram-Schmidt procedure allows us to construct from linearly independent vectors a set of orthonormal vectors recursively as follows. Set F̃_1 = F_1. To get the second vector, we denote by F^⊥ the orthogonal complement of F; the full spanning property holds:

P_F(R) + P_{F^⊥}(R) = R ,   P_F + P_{F^⊥} = I . (3.69)

Figure 3.22: Left panel - projection on the factor space which is a vector space. Right panel - projection where the factor space is an affine space, i.e. translated away from the zero vector.

If we project R on an affine space F = a + span\{F_1, F_2\}, where the two factors are assumed to be orthogonal, then

P_F(R) = a + \sum_{i=1}^{2} \frac{\langle R - a, F_i \rangle}{\langle F_i, F_i \rangle} F_i . (3.70)

Therefore, a simple shift follows. As a consistency check, if a is equal to R then the projection returns R itself. If the factors are orthonormal and not only orthogonal, then the denominator \langle F_i, F_i \rangle = 1. Using that the inner product in factor models is induced by the expectation, the last formula reads:

P_F(R) = a + \sum_{i=1}^{2} \frac{\mathrm{cov}(R - a, F_i)}{\sigma^2(F_i)} F_i , (3.71)

and defining \mathrm{cov}(x,y)/\sigma^2(y) =: \beta_{x,y}, the standard formula follows.
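As a numerical illustration of (3.71), the following sketch (simulated data and all names are mine, not from the text) projects a simulated return on an affine space spanned by two orthogonalized factors and checks that the residual is empirically orthogonal to the factors; the second factor is orthogonalized against the first exactly as in the Gram-Schmidt construction of footnote 30.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
F1 = rng.standard_normal(n)
G = rng.standard_normal(n)
# orthogonalize the second factor against the first (Gram-Schmidt, footnote 30)
F2 = G - (F1 @ G / (F1 @ F1)) * F1
a = 0.02                                   # constant of the affine factor space
R = a + 0.8 * F1 - 0.3 * F2 + 0.1 * rng.standard_normal(n)

def project_affine(R, a, factors):
    """Projection of R on a + span(factors), eq. (3.71); factors uncorrelated."""
    P = np.full_like(R, a)
    for F in factors:
        beta = np.cov(R - a, F)[0, 1] / np.var(F, ddof=1)
        P = P + beta * F
    return P

resid = R - project_affine(R, a, [F1, F2])
# residuals are (empirically) orthogonal to the factors
print(np.corrcoef(resid, F1)[0, 1], np.corrcoef(resid, F2)[0, 1])
```

Both printed correlations are numerically indistinguishable from zero, which is the defining property of the orthogonal projection.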

Using this formalism we define factors, factor models and beta pricing models.

(Continuation of footnote 30.) Then

F̃_2 = P_{(F̃_1)^⊥}(F_2) = F_2 - P_{F̃_1}(F_2) = F_2 - \frac{\langle F̃_1, F_2 \rangle}{\langle F̃_1, F̃_1 \rangle} F̃_1 .

This vector is orthogonal to the first one, and the construction is continued for the next vector by projecting the third vector on the orthogonal complement of the first two orthogonal vectors.

Definition 45 (Factors). Factors F = (F_1, ..., F_K) are independent square-integrable random variables with zero expectation. F is the linear space of dimension K generated by the factors. A factor model is quantified by an SDF M = a + b'F, where a is a number and b a vector of dimension K.
Definition 46 (Beta Pricing Model). Let F = (F_1, ..., F_K) be a vector of random variables, R_0 a constant and λ a K-dimensional constant vector. There exists a multi-factor beta pricing model with factors F if for each return R:

E(R) = R_0 + λ'β (3.72)

with

β := C_F^{-1} \mathrm{cov}(F, R) (3.73)

the vector of multiple regression betas of the return R on the factors F and C_F the covariance matrix of the F's.
The elements F_j are the risk factors and λ is the factor risk premium. If λ > 0, then an investor is compensated for holding extra risk by a higher expected return when risk is measured with the beta w.r.t. F. The coefficient β is the coefficient of an orthogonal projection of the return R on the space generated by the factors F plus a constant. Factors can be abstract random variables, portfolio returns, excess returns or dollar-neutral returns. The model is exact because there is no error term in (3.72). One can always take factors to have zero means, unit variances and be mutually uncorrelated by using

F̂ := C_D^{-1}(F - E(F)) (3.74)

with C_D the Cholesky decomposition of C_F.
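A quick numerical check of (3.74), as a sketch with simulated data (the covariance matrix and names are illustrative assumptions): whitening correlated factors with the Cholesky factor of their covariance matrix yields factors with zero mean and identity covariance.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
# correlated raw factors F with covariance C_F
C_F = np.array([[1.0, 0.4, 0.2],
                [0.4, 2.0, 0.5],
                [0.2, 0.5, 1.5]])
F = rng.multivariate_normal(mean=[0.1, -0.2, 0.3], cov=C_F, size=n)

# eq. (3.74): F_hat = C_D^{-1} (F - E(F)), C_D the Cholesky factor of C_F
C_D = np.linalg.cholesky(np.cov(F, rowvar=False))
F_hat = np.linalg.solve(C_D, (F - F.mean(axis=0)).T).T

# F_hat has zero mean and identity covariance
print(np.round(np.cov(F_hat, rowvar=False), 4))
```

Using the sample covariance for the Cholesky factor makes the identity hold exactly up to floating-point error, not only in expectation.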

When is a factor pricing model exact? By the Riesz representation theorem, the price of an asset is given by a scalar product of the future asset prices and the SDF. If the SDF is an element of the space spanned by the factors, then the pricing of the asset can be done by using the factors with arbitrary precision. Then beta pricing and factor pricing models are equivalent, see Proposition 69.

If the factors F are returns, then the factor risk premium λ becomes an ordinary risk premium λ = E(F) - R_0.31

3.5.4 The CAPM as a Beta Pricing Model


The capital asset pricing model (CAPM) is an exact one-factor beta pricing model. We start with a linear time series regression for the asset returns, i.e. constant betas are assumed for the risk model. Consider, for a stock i with return R_{t,i}, the risk-free rate R_{t,f}, and the return R_{t,M} of a broad market index, the linear regression

R_{t,i} - R_{t,f} = \alpha_i + \beta_{i,M}(R_{t,M} - R_{t,f}) + \epsilon_t (3.75)

where \alpha_i is the intercept, \beta_{i,M} the slope or regression coefficient, and \epsilon_t the standard normal error term satisfying (2.70). The slope indicates the unit change in stock excess return for every unit change in market excess return. The intercept indicates the performance of the stock that is not related to the market and that a portfolio manager attributes to her skills.

31 To prove this, consider a one-factor model F = R in (3.72). If there is a risk-free asset, then R_0 = R_f.

Example

For both regression coefficients α and β, confidence intervals can be determined using the estimated parameter value, the standard error of the estimate (SEE), the significance level for the t-distribution and the degrees of freedom. The formula for the β confidence interval reads β ± t_c × SEE, where β is the estimated value and t_c the critical t-value at the chosen significance level.

Consider the linear regression between a European equity fund's returns (dependent variable) and the EUROSTOXX 50 index (independent variable). Statistical analysis implies for 20 observation dates the estimates β = 1.18, SEE = 0.147 and 18 = 20 - 2 degrees of freedom. The critical value of Student's t-distribution at the 0.05 significance level with 18 degrees of freedom is 2.101. This implies the confidence interval 1.18 ± 0.147 × 2.101 = [0.87, 1.49]. There is only a 5 percent chance that β is either less than 0.87 or greater than 1.49. There is 95% confidence that this fund is at least 87% as volatile as the EUROSTOXX 50, but no more than 149% as volatile, based on our sample.
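The interval above can be reproduced in a few lines; a sketch (function and variable names are mine):

```python
def beta_confidence_interval(beta_hat, see, t_c):
    """Two-sided confidence interval for a regression slope: beta_hat ± t_c × SEE."""
    return beta_hat - t_c * see, beta_hat + t_c * see

# t_c = 2.101: critical value of Student's t with 18 degrees of freedom at the
# 5% two-sided level (scipy.stats.t.ppf(0.975, 18) reproduces this number)
low, high = beta_confidence_interval(beta_hat=1.18, see=0.147, t_c=2.101)
print(round(low, 2), round(high, 2))  # → 0.87 1.49
```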

We relate this empirical approach to the unconditional equilibrium asset pricing model, the CAPM. The CAPM states that within the model the following exact cross-section relation has to hold (deleting time indices):

E(R_i) - R_f = \beta_{i,M}(E(R_M) - R_f) =: \beta_{i,M} F . (3.76)

The risk premium of the asset i is E(R_i) - R_f and the market portfolio risk factor is F = E(R_M) - R_f. The CAPM states that some assets have higher average returns than others, but it is not about predicting returns. An asset has a higher expected return because of a large beta and not the other way around. Furthermore, projection theory implies:

\beta_{i,M} = \frac{\mathrm{cov}(R_i, R_M)}{\sigma^2(R_M)} . (3.77)

Summarizing, the time series regression (3.75) defines the β which enters the CAPM model (3.76), which predicts that alpha should be zero.

The linear relation (3.76) in the CAPM between the excess return of an asset and
the market excess return follows from the following assumptions:

• Investors act competitively and optimally, have a one-period investment horizon, and there are many investors with small individual endowments. Hence, they cannot influence prices and are so-called price takers.

• All investors have mean-variance preferences.

• All investors have the same beliefs about the future security values.

• Investors can borrow and lend at the risk-free rate, short any asset, and hold any
fraction of an asset.

• There is a risk-free asset in zero net supply. Since markets clear in equilibrium, total supply has to equal total demand. Given the net supply of the risk-free asset, we combine the investors' portfolios to get a market portfolio. This will imply that the optimal risky portfolio for each investor is the same.

• All information is accessible to all investors at the same time - there is no insider information.

• Markets are perfect: There are no frictions such as transaction costs or lending or
borrowing costs, no taxes, etc.

Proposition 47. Under the above assumptions:


• Each investor is investing in the risk-less asset and the tangency portfolio.

• The tangency portfolio is the market portfolio.

• All investors hold the same portfolio of risky securities.

• For each asset i, the linear relationship between risk and return (the security market line [SML]) (3.76) holds:

E(Ri ) − Rf = βi,M (E(RM ) − Rf ) (3.78)

with the beta given in (3.77) measuring the risk between asset i and the market
portfolio M .

See Section 7 for the proof. The SML implies that beta measures how systematic risk is rewarded in the CAPM; there is no idiosyncratic risk entering the SML.32 There is no reward, via a high expected rate of return, for taking on risk that can be diversified away. A higher beta value does not imply a higher variance, but a higher expected return.33

32 If an asset i is uncorrelated with the market, its beta is zero although the volatility of the asset may be arbitrarily large.
33 β = 1 implies E(R_i) = E(R_M), β = 0 implies E(R_i) = R_f and β < 0 implies E(R_i) < R_f.

The behavioural implications of the CAPM assumptions are the following. First, all investors consider the same mean-standard deviation chart. Hence, all possess a mean-variance efficient portfolio. By the mutual fund theorem, each minimum variance portfolio is a combination of a riskless asset and a fixed risky asset portfolio. Therefore, all investors invest in all risky assets in the same proportions. Since demand equals supply in the asset market equilibrium, all investors must hold the market portfolio, which in turn is mean-variance efficient. Therefore, no investor needs to perform a mean-variance analysis but can just invest in the market portfolio.

The linearity of (3.78) implies that the portfolio beta is the sum of asset betas multiplied by the portfolio weights. The volatility risk measure of a portfolio instead requires all pairwise covariances and volatilities, which amounts for N assets to 2N + N(N-1)/2 parameters, compared to 3N + 2 for the CAPM (N covariances with the market return, N returns, N volatilities, the market return and the risk-free rate). Furthermore, volatility of a portfolio is an absolute measure while beta is a relative measure. In the CAPM, all optimal portfolios are a combination of the risk-free portfolio and the market portfolio. Tobin's separation states how individually tailored portfolios can be constructed. First, the portfolio manager constructs the risk-free and the market portfolio. Then, the investment advisor or, more recently, the client herself digitally determines the risk profile, which fixes the optimal allocation between risk-free and risky investments. Inserting \mathrm{cov}(R_k, R_M) = \rho(k, M)\sigma_k\sigma_M in (3.78) implies

SR_k := \frac{\mu_k - R_f}{\sigma_k} = \rho(k, M)\frac{\mu_M - R_f}{\sigma_M} . (3.79)

The Sharpe ratio of asset k is equal to the slope of the CML times the correlation coefficient. Comparing SML and CML, see Figure 3.23, all portfolios lie on the SML but only efficient portfolios lie on the CML.34 Finally, the SML plots rewards vs systematic risk while the CML plots rewards vs total risk.
Consider three risky assets A, B, and C and three investors with capital of 250, 300, and 500, respectively, who hold the following portfolios:

Investor Risk-less asset A B C


1 50 50 50 100
2 -150 150 200 100
3 100 75 75 250
Market Cap. 1,050 0 275 325 450

Table 3.14: Investor holdings (CAPM example).

Market capitalization is then 1,050; the tangency portfolio from the Markowitz model is φ_T = (0.2619, 0.3095, 0.4286) and the market portfolio is φ_M = (275/1050, 325/1050, 450/1050).

34 A portfolio lies on both the SML and CML if the correlation between the portfolio return and the
market portfolio is 1.

Figure 3.23: Left panel - capital market line in the Markowitz model. Right panel - security market line in the CAPM model. Exercise: assume that the borrowing and lending rates are different and draw the CML for these two rates.

It follows that the two portfolios are equal.
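The aggregation in Table 3.14 is easy to verify numerically; a sketch (the data layout is mine):

```python
import numpy as np

# rows: investors 1-3; columns: riskless asset, A, B, C
holdings = np.array([[  50,  50,  50, 100],
                     [-150, 150, 200, 100],
                     [ 100,  75,  75, 250]])

totals = holdings.sum(axis=0)    # aggregate demand per asset
risky = totals[1:]               # A, B, C
market_cap = risky.sum()         # 1,050; the riskless asset nets to zero
phi_M = risky / market_cap       # market portfolio weights

print(totals[0], market_cap, np.round(phi_M, 4))
```

The riskless positions sum to zero (zero net supply), and the weights reproduce the tangency portfolio φ_T = (0.2619, 0.3095, 0.4286).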

Consider three risky assets, the market portfolio, and a risk-free asset given by the data in Table 3.15 (taken from Kwok (2010)):

Portfolio σ ρ with market portfolio β µ


1 10% 1 0.5 13%
2 20% 0.9 0.9 15.4%
3 20% 0.5 0.5 13%
Market portfolio 20% 1 1 16%
Risk-free asset 0% 0 0 10%

Table 3.15: Asset pricing in the CAPM.

The CML implies, at the standard deviation levels of 10 percent and 20 percent, respectively, expected returns of 13 percent and 16 percent. Therefore portfolio 1 is efficient, but the other two portfolios are not. Portfolio 1 is perfectly correlated with the market portfolio, but the other two portfolios have non-zero idiosyncratic risk. Since portfolio 2 has a correlation closer to one, it lies closer to the CML. The expected rates of return of the portfolios for the given values of beta, calculated with the SML, agree with the expected returns in the table. To see this, for portfolios 1 and 3,

\mu = \mu_f + (\mu_M - \mu_f)\beta = 10\% + 6\% \times 0.5 = 13\% .

Therefore, there is no mis-pricing.
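Both checks, the SML pricing and the Sharpe ratio relation (3.79), can be scripted for the data in Table 3.15; a sketch (data encoding is mine):

```python
# data from Table 3.15: (sigma, rho with market, beta, mu) per portfolio
portfolios = {1: (0.10, 1.0, 0.5, 0.130),
              2: (0.20, 0.9, 0.9, 0.154),
              3: (0.20, 0.5, 0.5, 0.130)}
mu_f, mu_M, sigma_M = 0.10, 0.16, 0.20

results = {}
for k, (sigma, rho, beta, mu) in portfolios.items():
    sml = mu_f + (mu_M - mu_f) * beta               # SML expected return
    sharpe = (mu - mu_f) / sigma                    # Sharpe ratio of portfolio k
    rho_times_cml = rho * (mu_M - mu_f) / sigma_M   # eq. (3.79): rho × CML slope
    results[k] = (sml, sharpe, rho_times_cml)
    print(k, round(sml, 3), round(sharpe, 2), round(rho_times_cml, 2))
```

For every portfolio the SML return matches the table value and the Sharpe ratio equals the correlation times the CML slope, confirming there is no mis-pricing.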

The following assumptions hold for the regression of asset k in the empirical CAPM equation: E(\epsilon_k) = \mathrm{cov}(\epsilon_k, R_M) = 0. Then,

\sigma_k^2 = \beta_{k,M}^2 \sigma_M^2 + \mathrm{var}(\epsilon_k) ,

which is a decomposition into systematic and idiosyncratic risk. Idiosyncratic risk is uncorrelated with the market risk and can be reduced by diversification. Clearly, two stocks can then possess the same total risk but different systematic and idiosyncratic risk components. Consider two stocks:

• Stock 1: chemical sector, market beta 1.5 and residual variance of 0.1.

• Stock 2: software sector, market beta 0.5 and residual variance of 0.18.

The total risk of the two assets is, for a market standard deviation of 20 percent,

\sigma_1^2 = \beta_{1,M}^2 \sigma_M^2 + \mathrm{var}(\epsilon_1) = (1.5)^2 (0.2)^2 + 0.1 = 0.19
\sigma_2^2 = \beta_{2,M}^2 \sigma_M^2 + \mathrm{var}(\epsilon_2) = (0.5)^2 (0.2)^2 + 0.18 = 0.19 .

The two stocks have the same total risk but different systematic risk: the percentage of systematic risk for the first stock is (1.5)^2 (0.2)^2 / 0.19 = 47\% and for the second stock only 5%.
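A sketch of the decomposition above in code (function names are mine):

```python
def risk_decomposition(beta, resid_var, sigma_M):
    """Total variance = systematic part + idiosyncratic part."""
    systematic = beta ** 2 * sigma_M ** 2
    total = systematic + resid_var
    return total, systematic / total

results = {}
for name, beta, resid_var in [("chemical", 1.5, 0.10), ("software", 0.5, 0.18)]:
    total, share = risk_decomposition(beta, resid_var, sigma_M=0.20)
    results[name] = (total, share)
    print(name, round(total, 2), f"systematic share {share:.0%}")
```

Both stocks show total risk 0.19, with systematic shares of 47% and 5%, respectively.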

3.5.4.1 Performance Measurement


We considered several risk-reward ratios (RR) in Section 2.6.6. The considered ratios mostly differed in the risk measurement. The Sharpe and Treynor ratios, which we consider in the sequel, turned out not to be monotonic RRs in the sense of guaranteeing that more return is ranked better than less. Despite this weakness, the Sharpe ratio encourages diversification: it can rank portfolios, portfolio managers and funds, and identify poorly diversified portfolios and funds with excessive fees. Which measure should one choose if the portfolio is less diversified? Jensen's alpha, the appraisal ratio, and the Treynor ratio are used in practice. They are all based on the SML, while the Sharpe ratio is based on the CML.

Jensen's alpha

\alpha_k := \mu_k - R_f - \beta_k(\mu_M - R_f) (3.80)

is a performance measurement between the realized and theoretical returns of the CAPM. Since alpha is a return, it should be used for the compensation of portfolio managers. While the Sharpe ratio can be illustrated in the return-volatility space, Jensen's alpha is shown in the return-beta space. Jensen's alpha measures how far above the SML the asset's performance is. It does not consider the systematic risk that an investment took on in earning the alpha.

The Treynor ratio (TR) adjusts for this systematic risk taken:

TR_k := \frac{\mu_k - R_f}{\beta_k} .

The TR equals the slope of the SML for the actively managed portfolio. If the CAPM holds, then the Treynor ratio is the same for all securities. Both the Jensen and the Treynor measurements do not adjust for idiosyncratic risk in the portfolio.

The appraisal ratio (AR) or information ratio (IR) divides the excess return over the benchmark by the tracking error (TE).

Values of the IR around 0.5 are considered good, while a value greater than 1 is extraordinary. The IR generalizes the Sharpe ratio since it substitutes a passive benchmark for the risk-free rate.

Example - Performance Measurement

We calculate the different ratios for the data in Table 3.16.

Portfolios Return Volatility Correlation with market


A 12% 15% 0.9
B 16% 24% 0.94
C 18% 17% 0.98
Market 15% 20% -
Risk-free rate 4% - -

Table 3.16: Data set for the performance ratios.

The beta of A is equal to its market portfolio correlation times its volatility divided by the market volatility - that is, 0.9 × 15%/20% = 0.675. The Sharpe ratio for A is SR = (12% - 4%)/15% = 0.53. Jensen's alpha for portfolio A reads 12% - 4% - 0.675 × (15% - 4%) = 0.575% and the Treynor ratio for A is given by (12% - 4%)/0.675 = 0.119. The IR and the TE follow in the same way. We finally get:

Portfolio Beta TE SR Jensen TR IR


A 0.675 9.22% 0.53 0.58% 0.119 0.062
B 1.128 8.58% 0.5 -0.41% 0.106 -0.048
C 0.833 4.75% 0.84 4.84% 0.168 1.017
Market 1 0% 0.55 0% 0.11 -

Table 3.17: Performance measures for the portfolios of Table 3.16.
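Table 3.17 can be reproduced with a few lines; a sketch (names are mine; the tracking error is taken versus the market, TE² = σ_k² + σ_M² − 2ρσ_kσ_M, and the IR is computed as Jensen's alpha over TE, which matches the tabulated numbers):

```python
import math

mu_M, sigma_M, r_f = 0.15, 0.20, 0.04
funds = {"A": (0.12, 0.15, 0.90), "B": (0.16, 0.24, 0.94), "C": (0.18, 0.17, 0.98)}

metrics = {}
for name, (mu, sigma, rho) in funds.items():
    beta = rho * sigma / sigma_M
    te = math.sqrt(sigma**2 + sigma_M**2 - 2 * rho * sigma * sigma_M)  # TE vs market
    sharpe = (mu - r_f) / sigma
    alpha = mu - r_f - beta * (mu_M - r_f)        # Jensen's alpha, eq. (3.80)
    treynor = (mu - r_f) / beta
    ir = alpha / te                               # information ratio
    metrics[name] = (beta, te, sharpe, alpha, treynor, ir)
    print(name, round(beta, 3), f"{te:.2%}", round(sharpe, 2),
          f"{alpha:.2%}", round(treynor, 3), round(ir, 3))
```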

It follows that portfolio C is the best portfolio. We summarize the relevance of the different performance measurements:

• Beta is relevant if the individual risk contribution of a security to the portfolio risk is considered.

• TE is relevant for risk budgeting issues and risk control of the portfolio manager relative to a benchmark.

• The Sharpe ratio is relevant if return compensation relative to total portfolio risk is considered.

• Jensen's alpha is the maximum amount one should pay an active manager.

• The Treynor measurement should be used when one adds an actively managed portfolio, besides the many already existing actively managed ones, to a passive portfolio.

• The information ratio measures the risk-adjusted return in active management. It is frequently used by investors to set portfolio constraints or objectives for their managers, such as tracking risk limits or attaining a minimum information ratio; see Grinold and Kahn (2000).

Warnings: if return distributions are not normal, since they show fatter tails, higher peaks, or skewness, then the use of these ratios can be problematic, since moments higher than the second one contribute to risk. Furthermore, the IR depends on the chosen time period and benchmark index. Finally, the chosen benchmark index affects all benchmark-based ratios: managers benchmarked against the S&P 500 Index have lower IRs than managers benchmarked against the Russell 1000 Index [Goodwin (2009)].

3.5.4.2 Testing the CAPM


The CAPM triggered an enormous econometric literature that addresses the verification of (3.76). Although Black, already in 1972, verified that the risk premia are not proportional to their betas, it took many more years and much more academic writing for a majority of researchers to accept the lack of empirical evidence for (3.76). The many assumptions of the CAPM are the cause of its empirical failure. The CAPM can, for example, not explain the size or value effect. The CAPM on average explains only 80 percent of portfolio returns. One needs more factors than just the covariance between the asset return and the return on the market portfolio. The classic papers are Black et al. (1972), Fama and MacBeth (1973) and, for a review of cross-section regressions, Goyal (2012).

Standard assumptions for testing the CAPM are rational expectations, i.e. in particular that realized returns are a proxy for expected theoretical returns, and that the holding period of assets is known, typically one month. The CAPM equation which should be tested,

E(R_i) - R_f = \beta_{i,M}(E(R_M) - R_f) , (3.81)

raises several questions: Are betas stable measures of systematic risk? Are the expected returns linearly related to the betas (Q1)? Is beta the only systematic risk measure (Q2)? Does the expected return of the market portfolio exceed the expected return of assets uncorrelated to the market (Q3)? Finally, do assets uncorrelated to the market portfolio earn the risk-free rate of return (Q4)? There are two linear tests of the CAPM equation. In one, the returns of different assets are regressed on the betas (cross-section, Q2, Q3); in the other, the CAPM equation for each individual asset is regressed over time (time series, Q4).

The cross-sectional regression is used to test the CAPM equation over a period of T years. Since expected returns are not measurable, the CAPM equation is tested for average annual realized returns. The temporal individual asset test using time series regression tests the CAPM on a number of fixed sub-periods up to time T: excess asset return is regressed on the excess market return in each sub-period. Using the time series regression equation

R_{t,k} - R_{t,f} = \alpha_k + \beta_{k,M}(R_{t,M} - R_{t,f}) + \epsilon_t

to estimate alpha, beta and epsilon, it follows that the estimates \hat\beta of beta are volatile both for stocks and for sectors; see Figure 3.24. Since the CAPM is only interesting for portfolios where beta is the significant risk measure, an application to single securities does not make sense.

We consider tests of the CAPM, where we restrict ourselves to three key papers. The beta instability led to CAPM tests for portfolios only. The first one is the paper of Black, Jensen and Scholes (1972).35 It starts with the observation that zero-beta assets earned more than the risk-free rate and that the beta premium was lower than the market excess return. This violates Q3. The authors support these findings and postulated a modified version of the CAPM, the zero-beta CAPM: it accommodates zero-beta returns above the risk-free rate by relaxing the assumption that all investors can lend and borrow any amount of money at the risk-free rate. They formed ten portfolios ordered from highest to lowest beta securities. Their time series regression finds that high-beta (low-beta) portfolios consistently show negative (positive) alphas, which violates Q4. In the cross-section, the regression line has a flatter slope than the SML and an intercept which is significantly greater than zero. But linearity between return and beta is confirmed.

35 Consider a period of, say, 50 years, i.e. 600 months. Use, say, 60 months to estimate the beta for each stock (pass one). Rank the securities by the estimated betas and form ten portfolios. Recalculate the betas for the next five years, and so on, which defines a rolling regression. We then have monthly returns for the time period minus five years for each portfolio. Calculate mean portfolio returns and estimate the beta coefficient for each of the ten portfolios. This provides beta estimates for the portfolios. Do pass two for the portfolios, i.e. regress the portfolio means against portfolio betas, that is, estimate the ex-post SML.

Figure 3.24: Beta estimates for AT&T (left panel) and the oil industry (right panel) (Papanikolaou [2005]).

Consider the cross-section where the factors F are non-traded portfolios and a risk-free rate exists. Using the time series regression, estimates of factor risk premia and pricing errors can be obtained. But in the cross-section the estimation is simplified using the two-pass regressions. First, betas are estimated from the time series regressions, and then a cross-sectional regression of average returns on betas follows; that is, the estimated betas are in the second step the explanatory variables.36 The pricing errors are given by the cross-sectional residuals \tilde\alpha. The estimates of the cross-section can be obtained by OLS or, since the cross-section residuals are correlated, more efficient estimates follow by using GLS. The betas in the second-pass CSR are time series estimates, which leads to the problem of errors-in-variables. Shanken (1992) showed how to correct the standard errors of the risk premium and pricing error estimates. The predictions of the CAPM are that alpha is zero, that lambda is equal to the market premium, and that any other variables are zero. Typically, alpha is estimated to be positive, lambda is positive but smaller than the market premium, and other factors are not rejected.

36 Formally, for an arbitrary factor F:

Time series: R_{e,t,i} = \alpha_i + \beta_i F_t + \epsilon_t \Rightarrow \hat\beta_i \Rightarrow Cross-section: R_{e,i} = \hat\beta_i \lambda + \tilde\alpha_i . (3.82)
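The two-pass procedure of footnote 36 can be sketched on simulated data (a toy illustration under my own parameter choices, not the cited papers' code): a one-factor model is simulated, betas are estimated asset by asset in the time series, and the factor premium λ is then recovered from the cross-sectional regression of average excess returns on the estimated betas.

```python
import numpy as np

rng = np.random.default_rng(2)
T, N, lam = 600, 25, 0.005                   # months, assets, true monthly premium
beta_true = rng.uniform(0.5, 1.5, N)
F = lam + 0.04 * rng.standard_normal(T)      # factor realizations
eps = 0.02 * rng.standard_normal((T, N))
R_e = F[:, None] * beta_true[None, :] + eps  # excess returns, eq. (3.82)

# pass 1: time series OLS of each asset's excess return on the factor
X = np.column_stack([np.ones(T), F])
beta_hat = np.linalg.lstsq(X, R_e, rcond=None)[0][1]

# pass 2: cross-sectional OLS of average excess returns on the estimated betas
Z = np.column_stack([np.ones(N), beta_hat])
alpha_cs, lam_hat = np.linalg.lstsq(Z, R_e.mean(axis=0), rcond=None)[0]
print(round(lam_hat, 4))
```

The recovered premium is close to the true λ; with noisier betas (shorter T, larger residual variance) the errors-in-variables bias discussed above becomes visible.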

So far all results are under the assumption of a small number of assets, and the estimators are consistent in the time horizon T. If the number of assets increases for a fixed T, the errors-in-variables problem also leads to biased and inconsistent coefficient estimates. Shanken (1992b) derives an estimator that is N-consistent. Finally, Gagliardini et al. (2011) explore the properties of these estimators when both T, N → ∞.

Suppose that the R² is large in the cross-sectional CAPM equation (3.76). The CAPM then explains the cross-section of average returns successfully and the alpha in the cross-section is small. This can be the case even if the R² of the time series regression (3.75) is low. The main goal of the CAPM is to see whether high average returns in the cross-section are associated with high values of the factors.

Summarizing, the findings are that excess returns on high-beta stocks are low, that excess returns are high for small stocks, and that value stocks have high returns despite low betas, while momentum stocks have high returns and low betas. The CAPM does not explain why in the past firms with high B/M ratios outperformed firms with low B/M ratios (value premium), or why stocks with high returns during the previous year continue to outperform those with low past returns (momentum premium). Despite these findings, the CAPM is used for figuring out the appropriate compensation for risk, is used as a benchmark model for other models, and is elegantly simple and intuitive.

3.5.4.3 Conditional CAPM


Some researchers assumed that the poor empirical performance of the CAPM could be
due to its assumption of constant conditional moments but that a conditional CAPM
will possess a better empirical performance. They therefore model explicitly the time
varying conditional distribution of returns as a function of lagged state variables.

The conditional CAPM works as follows. Consider two stocks. Suppose that the times of recessions and expansions are not of equal length in an economy, that the market risk premia differ, and that the two stocks have different betas in the different periods. The CAPM then observes only the average beta of each stock over both periods. Assume that this beta is 1 for both stocks. The CAPM will therefore predict the same excess return for the two stocks. But in reality the two stocks will, due to their heterogeneity, show different returns in the two economic periods. One stock can for example earn a higher return than explained by the CAPM since its risk exposure increases in recessions, when bearing risk is painful, and decreases in expansions. Such a stock is therefore riskier than the CAPM suggests, and the CAPM would detect an abnormally high return, suggesting this is a good investment. The conditional CAPM corrects this since the return comes from bearing the extra risk of undesirable beta changes.

Lewellen and Nagel (2006) did not question the fact that betas vary considerably over time. But they provide evidence that betas do not vary enough over time to explain large unconditional pricing errors. As a result, the performance of the conditional CAPM is similarly poor to that of the unconditional model: it is unlikely that the conditional CAPM can explain asset-pricing characteristics like book-to-market and momentum. These statistical criticisms are not unique to the CAPM. Most asset pricing models are rejected in tests with power.

3.5.5 Factor Investing: 3-Factor Model of Fama and French


The non-zero alpha in the CAPM and the non-vanishing of factors different from the market premium led Fama and French (FF) to add the additional factors value (HML) and size (SMB) in the 1990s. They sorted stocks into five market cap and five book-to-market equity (B/M) groups at a specific date, which leads to 25 portfolios. The sorted portfolios scatter around the CAPM line. The interpolated line between the sorted portfolios is too flat compared to the CAPM line: portfolios with high average returns should have high betas, but they do not. The betas even have the wrong sign - they are lower for higher-return securities. This observation led FF to introduce the two new factors and state the exact beta pricing relation:

E(Ri ) − Rf = βi,M (E(RM ) − Rf ) + βi,SMB E(RSMB ) + βi,HML E(RHML ) . (3.83)

While the CAPM has a theoretical foundation, the FF model is an ad hoc model introduced to better fit empirical data. The three-factor model is routinely included in empirical research.
We follow Kenneth French's web site for the FF factor construction. The factors are
constructed using the six value-weighted portfolios formed on size and book-to-market.

• SMB (small minus big) is the average return on the three small portfolios minus the average return on the three big portfolios:

SMB = (1/3)(Small Value + Small Neutral + Small Growth) − (1/3)(Big Value + Big Neutral + Big Growth) . (3.84)

• HML (high minus low) is the average return on the two value portfolios minus the average return on the two growth portfolios:

HML = (1/2)(Small Value + Big Value) − (1/2)(Small Growth + Big Growth) .

• Whether a stock belongs to, say, Small Value depends on its ranking. Small Value contains all stocks whose market value is smaller than the median market value, say, of the NYSE and whose book-to-market ratio is greater than the 70th percentile of the book-to-market ratios of NYSE stocks.
• SMB for July of year t to June of t + 1 includes all NYSE, AMEX, and NASDAQ
stocks for which there exist market equity data for December of t − 1 and June of
t, and (positive) book equity data for t − 1.
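The SMB and HML definitions above can be checked with a toy computation; the six portfolio returns below are hypothetical placeholders, not Kenneth French's data:

```python
# Hypothetical monthly returns (in percent) for the six size/value portfolios;
# the portfolio names follow the convention used above.
six = {
    "Small Value": 1.4, "Small Neutral": 1.1, "Small Growth": 0.9,
    "Big Value": 1.0,   "Big Neutral": 0.8,   "Big Growth": 0.7,
}

# SMB: average of the three small portfolios minus average of the three big ones.
smb = (six["Small Value"] + six["Small Neutral"] + six["Small Growth"]) / 3 \
    - (six["Big Value"] + six["Big Neutral"] + six["Big Growth"]) / 3

# HML: average of the two value portfolios minus average of the two growth ones.
hml = (six["Small Value"] + six["Big Value"]) / 2 \
    - (six["Small Growth"] + six["Big Growth"]) / 2

print(round(smb, 4))   # 0.3
print(round(hml, 4))   # 0.4
```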
Why should one include factors which cannot explain average returns? The CAPM worked until stocks were grouped by their book-to-market ratio (value), but it still works when stocks are grouped according to their size. If FF were only to consider factors which explain the average returns, they could have left size out. But size is important for reducing return variance.

To see how this works, assume that the CAPM is perfect. Then

E(Rk ) = βk E(RM )

where we set the risk-free rate to zero. Include an additional industry portfolio in the regression, i.e.

Rt,k = αk + βk,M Rt,M + βk,I Rt,I + εt,k .
The regression generically leads to a coefficient βk,I > 0, and taking expectations:

E(Rt,k ) = αk + βk,M E(Rt,M ) + βk,I E(Rt,I ) .

This additional industry portfolio return contradicts the assumption that the CAPM is perfect. To resolve the puzzle, one uses a nested projection approach. First project the industry portfolio on the market return:

Rt,I = αI + βI,M Rt,M + εt,I .

If the CAPM is right, the industry alpha is zero and

E(Rt,I ) = βI,M E(Rt,M ) .

Then orthogonalize the industry return, i.e.:

R̃t,I := Rt,I − βI,M Rt,M .

This is equivalent to beta-hedging the portfolio. The expected value of the new return is zero if the CAPM is right. Run a regression with this orthogonalized industry return added to the CAPM. This improves the R², the t-statistics, and the volatility of the residual, while the mean prediction of the CAPM is unchanged.
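A small simulation, with made-up parameters, illustrates the orthogonalization step: after beta-hedging, the industry return has a mean close to zero and exactly zero market beta in-sample:

```python
import random
import statistics

random.seed(1)

T = 3000
r_m = [random.gauss(0.5, 2.0) for _ in range(T)]   # market excess return
# An industry portfolio that, under the CAPM, is beta times the market plus noise.
beta_im = 0.7
r_i = [beta_im * rm + random.gauss(0.0, 1.0) for rm in r_m]

def ols_slope(x, y):
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var = sum((a - mx) ** 2 for a in x)
    return cov / var

# Project the industry return on the market and beta-hedge it.
b_hat = ols_slope(r_m, r_i)
r_i_orth = [ri - b_hat * rm for ri, rm in zip(r_i, r_m)]

# The orthogonalized return has sample mean near zero and is
# uncorrelated with the market by construction of the OLS slope.
print(statistics.fmean(r_i_orth))
print(ols_slope(r_m, r_i_orth))   # essentially zero
```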

Considering different portfolios, the CAPM R² statistics increased from 78 percent to 93 percent for the FF portfolios. Roncalli (2013) states that the improvement in the R² is not uniform:
3.5. FACTOR MODELS 213

• The difference in R² between the FF model and the CAPM is between 18 percent and 23 percent in the period 1995-1999.

• This difference is around 30 percent during 2000 and 2004.

• The difference then decreases and is around 11 percent during the GFC.

• In the period starting after the GFC and running until 2013 the difference is 7 percent.

• SMB and HML explain the variation of returns across stocks; the market factor explains why stock returns are on average higher than the risk-free rate.

Are the FF factors global or country specific? Griffin (2002) concludes that the FF model exhibits its best performance on a country-specific basis. This view is largely accepted. While FF originally performed regressions on portfolios of stocks, Huij and Verbeek (2009) and Cazalet and Roncalli (2014) provide evidence that mutual fund returns are more reliable than stock returns: transaction costs, trade impact, and trading restrictions have less impact.

Figure 3.25 illustrates the performance of the different FF factors and the momentum factor since 1991. The size factor only generates low returns. This is the reason why most risk premia providers do not offer size risk premia.37 Cyclicality is common to most risk factors. Some factors show persistent excess risk-adjusted returns over long time periods, but over shorter horizons they show cyclical behavior with underperformance. Ang (2013) argues that the premia exist to reward long-horizon investors for bearing that risk. FF (1993) tested their model in the period 1963-1991. They rejected the assertion that all intercepts from the regression of excess stock returns on excess market return, SMB, and HML are zero. The FF model performed better than any single factor model, and it failed only slightly due to the low-B/M portfolios. Their return was too low and the return on the big-size portfolio was too high; i.e. the size effect was missing in the lowest-B/M quintile.

3.5.6 Factor Investing: 5-Factor Model of Fama and French


Fama and French (2015) proposed a five-factor extension of their three-factor model. The motivation of the model follows from the firm valuation equation:

Mt / Bt = ( Σ_{j=1}^{∞} Et[ (Yt+j − ∆Bt+j) / (1 + R)^j ] ) / Bt (3.85)
37 The figure shows periods with momentum crashes. Heavy monthly losses occurred during the Great Depression. The risk factor faced losses of up to 50 percent in one month. The risk factor performed much better in the post-WWII period until the burst of the dot-com bubble. In this period, investing USD 100, say, in 1945 led to a payback of USD 3,500 around 50 years later. The average monthly return over the whole period is 0.67 percent.

[Figure 3.25 appears here. Left panel: annual factor performance with legend Mkt-RF, SMB, HML, WML, RF. Right panel: monthly returns of the momentum risk factor, 1927-2014.]

Figure 3.25: Left panel - FF annual factor performance in the period 1991-2014 starting
each year in January and ending in December. Mkt is the market return, RF the risk-
free return, and WML the momentum factor. Right panel - monthly returns of the
momentum risk factor (Kenneth French's web site).

with M the current market cap, Y total equity earnings, ∆B the change in total book value in the period, and R the internal rate of return on the expected dividends. Equation (3.85) follows from the fundamental pricing equation; see Equation (4.31) in Chapter 4. Equation (3.85) implies that the B/M value is an imperfect proxy for expected returns: the market cap M also responds to forecasts of earnings and investment (expected growth in book value), which define the two new factors. The regression (3.83) becomes (neglecting time indices)

Ri − Rf = βi,M (RM − Rf ) + Σ_{k∈{SMB, HML, RMW, CMA}} βi,k Rk + αi + εi , (3.86)

with RRMW the earnings risk factor (difference between robust and weak profitability) and RCMA the investment risk factor (difference between low- and high-investment firms). This is again an exact factor model by definition. The explicit construction of the risk factors is a long/short combination similar to the SMB and HML construction above; see Fama and French (2015).

Fama and French (2015) first analyze the factor patterns in average returns following the construction of the three-factor model:

• Returns are one-month returns in excess of the one-month US Treasury bill rate.

• Returns are reported for 25 value-weighted portfolios of US stocks from independent sorts of stocks into five size and five B/M groups (quintiles). The authors label the quintiles from Small to Big (size) and Low to High (B/M).

• Data are from 1963 to 2013.

Figure 3.26: Return estimates for the 5x5 size and B/M sorts. Size is shown on the
vertical and B/M on the horizontal. OP are the earnings factor portfolios and Inv the
investment factor portfolios. Returns are calculated on a monthly basis in excess to the
one-month US treasury bill rate returns. Data start in July 1963 and end in December
2013, thus covering 606 months (Fama and French, 2015).

Panel A in Figure 3.26 shows that average returns typically fall from small to big stocks - the size effect. There is only one outlier - the low portfolio. In every row, the average return increases with B/M - the value effect. It also follows that the value effect is stronger among small stocks. In Panel B, the sort on B/M is replaced by operating profitability as defined in Fama and French's 2015 paper. Patterns are similar to the size-B/M sort in Panel A. For every size quintile, extremely high rather than extremely low operating profitability (OP) is associated with a higher average return. In Panel C the average return on the portfolio in the lowest investment quintile dominates the return in the highest quintile. Furthermore, the size effect exists in the lowest four quintiles of the investment factor.

The authors perform an analysis to isolate the effect of the factors on average return. The main results are:

• Persistent average return patterns exist for the factors HML, CMA, RMW, and SMB.

• As expected, statistical tests reject a five-factor model constructed to capture these patterns.

• The model explains between 71 percent and 94 percent of the cross-sectional variance of expected returns for HML, CMA, RMW, and SMB.

• HML (value) becomes a redundant factor. Its high average return can be completely generated by the other four factors, in particular by RMW and CMA.

• Small stock portfolios with negative exposure to RMW and CMA are problematic: negative CMA exposures are in line with evidence that small firms invest a lot. Negative exposure to RMW, in contrast, is not in line with low profitability.

Why did Fama and French not introduce momentum? Asness et al. (2015) state that momentum and value are best viewed together, as a system, and not stand-alone. Therefore, it is not a surprise that value becomes redundant in the five-factor model, where momentum is not considered. The authors then redo the estimation of the five-factor model, where they also find that HML can be reconstructed and is better explained by a combination of RMW and CMA. But the other direction is not true: CMA cannot be explained, for example, by HML and RMW. The authors then add momentum, which is negatively correlated to value: value then becomes statistically significant in explaining returns.

3.6 Backtests
Backtests are historical simulations of quantitative investment strategies. The tests compute the P&L the strategy would have generated had it been run over that time period. The performance is expressed using performance measures such as the Sharpe ratio. Backtests often look very promising for investment. But many practitioners fear that once they invest in a backtested strategy, the backtesting performance evaporates. This fear is justified if statistics is not used appropriately, as is too often the case.

3.6.1 Data Snooping


Data snooping and data mining are often the reasons for a disagreement between backtesting results and future investment performance. They suggest findings which are supported by the data but in fact are spurious. Consider an investment strategy which has been fine-tuned on the in-sample data set. Applied out-of-sample, the claimed in-sample performance can disappear if the strategy exploited specific in-sample characteristics which are missing out-of-sample. The data snooping example from Andrew Lo (1994) shows how pure nonsense can lead to a spurious outperformance. The investment strategist believes that the following mathematical proposition of Fermat regarding prime numbers provides meaningful investment signals:

Proposition 48. For any prime number p, the division of 2^(p−1) by p always leads to a remainder of 1.

Dividing 2^(13−1) = 4096 by 13, for example, gives 315 with a remainder of 1. This holds for all prime numbers. But the converse is not true: if a division of 2^(p−1) by p leads to a remainder of 1, it does not imply that p is a prime number. The converse is, however, 'almost true': there are very few numbers that satisfy the division property and are not prime. In the first 10,000 numbers there are only seven such numbers, one of them being 1105.
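A quick sketch of the Fermat test in Python (the primality check below is a naive trial division, for illustration only):

```python
# Fermat's test: for an odd prime p, 2**(p-1) % p == 1. The converse fails
# for rare composites such as 1105 = 5 * 13 * 17, which also pass the test.
def passes_fermat_base2(n: int) -> bool:
    return n > 1 and pow(2, n - 1, n) == 1

def is_prime(n: int) -> bool:
    if n < 2:
        return False
    return all(n % d for d in range(2, int(n ** 0.5) + 1))

print(pow(2, 12, 13))               # 1, since 2**12 = 4096 = 315 * 13 + 1
print(passes_fermat_base2(1105))    # True, yet ...
print(is_prime(1105))               # ... 1105 is composite
```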

He relates this prime number property to stock market performance as follows: select those stocks where one of these seven numbers is embedded in the CUSIP identifier.38 Given the aforementioned seven numbers, there is only one CUSIP code that contains such a number: CUSIP 03 110510. This CUSIP represents the stock Ametek. This stock had exhibited, by the time of Lo's writing, extraordinary performance: a Sharpe ratio of 0.86, a Jensen alpha of 5.15, a monthly return of 0.017, and so on.

There is no reason why the link 'Prime Number Theorem - CUSIP Stock Selection' should work in general. The relationship-driven return is simply luck. Here the highly non-linear prime number property leads to a spurious return pattern.

Consider an order statistics example. Assume that there are N = 100 IID securities with normally distributed annual returns with a mean of 10 percent and a standard deviation of 20 percent. The probability that the return of security k exceeds 50 percent is then 2.3 percent.39 It is unlikely that security k will show this strong return. But if we ask for the winner return40 - that is to say, the probability that the maximum return will exceed 50 percent - the probability is 90 percent.

But this winning question does not tell us anything about the nature of the winning stock since the returns are IID. Nothing can be inferred about the future return if one knows at a given date which stock is the winner. Choosing today the past winner and predicting that it will also be the future winner is data snooping. The prediction is only related to luck.
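The two probabilities can be verified directly; a short sketch using only the standard library:

```python
import math

def norm_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# P(single security return > 50%) with mean 10% and sd 20%: z = 2.
p_single = 1.0 - norm_cdf((0.50 - 0.10) / 0.20)

# P(maximum of N = 100 IID returns > 50%) = 1 - (1 - p_single)**100.
p_winner = 1.0 - (1.0 - p_single) ** 100

print(round(p_single, 4))   # about 0.023
print(round(p_winner, 4))   # about 0.90
```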

3.6.2 Overfitting
Researchers in investment algorithms often publish in-sample results where the number of trials is not stated. Not reporting the number of all trials increases the probability of overfitting: the published investment algorithm fails to fit additional data or to predict future observations reliably. There is a risk that a high Sharpe ratio in-sample coincides with a zero Sharpe ratio out-of-sample. Consider an investment

38 A CUSIP is a nine-character alphanumeric code that identifies a North American financial security for the purposes of facilitating clearing and settlement.
39 P (Rk ≥ 50%) = 2.3%.
40 P (maxk (Rk ) ≥ 50%) ≈ 90%.

algorithm for stock investment where 1,000 paths are simulated. If one selects and publishes the best-performing path, then all investors using this algorithm will be disappointed.

The example is from Bailey et al. (2014). Consider an IID sequence of normal returns with mean µ and volatility σ. The annualized Sharpe ratio can be computed as (Lo (2002))

SR = (µ/σ) √T

where T is the number of returns per year. The true values of the drift and volatility are not known. Hence they are estimated, leading to an estimated annualized Sharpe ratio ŜR. Lo proves that this estimate converges asymptotically, for large y (the number of years used to estimate the Sharpe ratio), to:

ŜR → N( SR, (1 + SR²/(2T)) / y ) .

If µ = 0 and y = 1, then

ŜR → N(0, 1) .
The following proposition is key:

Proposition 49. Let Xn , n = 1, . . . , N, be IID standard normally distributed and y = 1. The expected maximum of the sample is for large N approximated by

E[max_{n≤N} Xn ] ∼ (1 − γ)Φ⁻¹(1 − 1/N) + γ Φ⁻¹(1 − 1/(eN)) , N ≫ 1, (3.87)

with γ = 0.57721... the Euler-Mascheroni constant. An upper bound for the expected maximum is √(2 ln N).

Figure 3.27 illustrates the proposition. For N = 10 alternative configurations of an investment strategy, one expects to find a strategy with an in-sample Sharpe ratio of 1.57, although all strategies are expected to deliver a Sharpe ratio of zero out-of-sample. Increasing the number of tested strategies, an increasing non-null probability follows of selecting in-sample a strategy with null expected performance out-of-sample. Hence, unless the maximum estimated Sharpe ratio is much larger than the expected maximum Sharpe ratio, the discovered strategy is likely to be a false positive.
The results carry over to y ≠ 1 by scaling the above result. Again, the more independent configurations a researcher tries, the more likely is overfitting. Hence, increasing N means using a higher acceptance threshold for the backtested result to be trusted. Increasing the sample size y, the above overfitting problem can be at least partially mitigated. This means that a minimum backtest length can be calculated such that one does not select an in-sample strategy whose Sharpe ratio merely equals the expected maximum but whose expected out-of-sample Sharpe ratio is zero; see Figure 3.27.
This trade-off implies, for say 6 years of data at hand, that no more than 100 independent model configurations should be tried. Otherwise, strategies are almost surely produced with positive Sharpe ratios in-sample but zero ones out-of-sample. The authors state: A researcher that does not report the number of trials N used to identify the selected backtest configuration makes it impossible to assess the risk of overfitting.

[Figure 3.27 appears here: the expected maximum and its upper bound as a function of the number of trials N.]

Figure 3.27: Overfitting of backtests for µ = 0 and y = 1 and minimum expected backtest length.
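Approximation (3.87) can be evaluated directly; for N = 10 it reproduces the 1.57 quoted in the text (the inverse normal CDF comes from Python's statistics.NormalDist):

```python
import math
from statistics import NormalDist

GAMMA = 0.5772156649   # Euler-Mascheroni constant
Phi_inv = NormalDist().inv_cdf

def expected_max_sharpe(n: int) -> float:
    """Approximation (3.87) to the expected maximum of n IID N(0,1) Sharpe ratios."""
    return (1 - GAMMA) * Phi_inv(1 - 1 / n) + GAMMA * Phi_inv(1 - 1 / (math.e * n))

def upper_bound(n: int) -> float:
    """Upper bound sqrt(2 ln n) for the expected maximum."""
    return math.sqrt(2 * math.log(n))

print(round(expected_max_sharpe(10), 2))   # about 1.57, as quoted above
print(round(upper_bound(10), 2))
```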

Example Backtest Q&A

The Financial Math Organization presents some questions and answers on overfitting in a blog41 which relates to the paper of Bailey et al. (2014). We focus on some of the questions and answers.

• Why do so many quantitative investments fail? ... some of the most successful investment funds in history apply rigorous mathematical models (..., Winton, Citadel, ...). Many of them are closed to outside investors, and the public rarely hears about them. This void is often filled by pseudo-mathematical investments, which apply mathematical tools improperly as a marketing strategy. One of the most widely misunderstood experimental techniques is historical simulation, or backtesting.

• Is it true that every backtest is intrinsically flawed? Not at all. ... The purpose of our research is to highlight how easily backtest results can be manipulated, ...

• Can the 'hold-out' method, i.e., reserving a testing set to validate the model discovered in the training set, prevent overfitting? Unfortunately, this method cannot prevent overfitting. ... Perhaps the most important reason for hold-out's failure is that this method does not control for the number of trials attempted. If we apply the hold-out method enough times (say 20 times for a 95% confidence level), it is expected that we will obtain a false negative (i.e., the test fails to discard an overfit strategy). ...

• Are you saying that Technical Analysis is a form of charlatanism? No. Technical analysis tools rely on a variety of filters that make them prone to overfitting. We are simply stating that technical analysts and their investors should be particularly aware of the risks of overfitting. When the probability of backtest overfitting is correctly monitored, technical analyses may provide valuable insights to investors.

3.6.3 Backtesting and Multiple Testing


We consider adjusting the p-value when multiple tests are involved. The discussion is taken from Harvey and Liu (2015). Consider a single initial zero-dollar investment strategy φ with return Rt. A single test is used to evaluate the hypothesis that the expected return E(Rt) of the strategy is different from zero. To test the hypothesis, a sample statistic is constructed by considering the time series of historical returns and estimating the sample mean µ̂ and volatility σ̂. The t-statistic which tests the null hypothesis of zero expected return is t = (µ̂/σ̂)√T, which implies that

ŜR = t / √T . (3.88)

Therefore, for a fixed time horizon an increasing ŜR implies an increasing t-ratio, which implies a higher significance level, and vice versa. This is equivalent to a lower p-value for a single strategy test:

pS = P(|R| > t) = P(|R| > ŜR √T) . (3.89)

Assuming a distribution for the returns, a distribution for the t-statistic and hence for the Sharpe ratio follows. Summarizing, if ŜR is the right measure to value performance, (3.88) states that it is one-to-one related to the t-statistic. What is the appropriate p-level if multiple tests are used? A common practitioners' approach is to apply ad hoc rules in backtesting: discount the Sharpe ratios of single tests by 30% or even 50%. While easy to implement, this approach fails to have any justification.

We generalize the above discussion to multiple N ≥ 1 independent tests. Assume again that we want to test whether a single investment strategy is profitable. The hypothesis is that if the strategy is not profitable, then with 5% probability we make a wrong decision (type I error) by implementing a strategy which will lose money. This is a false discovery. We test 100 strategies independently. While 5% is acceptable for one test, for many tests this percentage can result in a large number of false positives. This is the multiple testing problem. In the 100 tests, the chance to err is around 99%.42 If we want to keep the 5% level, a solution is to use 5%/100 = 0.0005 for each test. This makes sure that the chance of making the wrong decision that one of those 100 strategies is working is less than 5%. This is called controlling the family-wise error rate (FWER). It is a very restrictive control in multiple testing. The critical t-value of 1.66 for the 5% level becomes 3.4. Only extremely well-performing strategies allow us to keep the 5% decision level. Many strategies which perform well will be missed.
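The numbers in this paragraph follow from the independence assumption; a short check:

```python
# With a 5% single-test level, the chance of at least one false discovery
# across N independent tests is 1 - 0.95**N.
p_single = 0.05
N = 100
fwer = 1 - (1 - p_single) ** N
print(round(fwer, 3))          # about 0.994 -- "the chance to err is around 99%"

# Bonferroni keeps the family-wise level below 5% by testing each strategy
# at level 5%/N.
bonferroni_level = p_single / N
fwer_bonf = 1 - (1 - bonferroni_level) ** N
print(bonferroni_level)        # 0.0005
print(round(fwer_bonf, 3))     # just under 0.05
```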

As an example of multiple testing, consider a hedge fund manager using Commodity Trading Advisors (CTAs) strategies. That is, he relates detected changes of trend in the securities to changes in the exposure. Different parameters define the change detection, such as the length of the time series used to calculate the moving averages, thresholds to enter and to exit, and, from a risk management perspective, stop-loss trigger points. Given that many assets are tested, the number of combinations runs into the millions or even billions. Suppose that each strategy is individually tested, say by calculating the Sharpe ratio for each trial and testing its significance at the 95% level. Given the large number of individual tests, multiple testing raises the concern that an increasing number of them will be positive purely due to chance. That is, a large fraction of the individual tests that are ex post positive will be false discoveries, i.e. are due to chance. If the false discovery rate is 100%, the significance of all individual tests is completely uninformative.

The conservative FWER rule was improved by Holm (1979) and Benjamini and Hochberg (1995). They proposed to allow non-performing strategies as long as there are enough performing ones. In doing this, we gain power to detect the skillful managers. But how many non-performing strategies are we willing to accept? We fix the false discovery rate (FDR) at 20% - we are willing to accept that one out of five discovered strategies is a non-profitable one. Assume that 2 out of the 100 strategies add value while the other ones destroy wealth. Benjamini and Hochberg found an upper bound: even if all 100 strategies are null, we will keep our 20 percent by adjusting the threshold. In this case it is also assumed that the test statistics are normally distributed. If some strategies are profitable, then we get a better rate than 20 percent.

How do we find the threshold which gives the chosen FDR? The theory is rather involved, but an algorithm is used to derive the correct threshold. We expect 100 × 0.05 = 5 significant variables by pure chance. Starting with a t-value of 2 we observe, say, 7 significant variables: only two skillful managers, while 5 are expected to have no skill. The ratio 5/7 = 71% is much higher than the 20% accepted FDR. The algorithm then increases the threshold such that the ratio of expected false discoveries to observed significant variables becomes 20%. The resulting number of observed variables s is such that dividing the expected false discoveries by s equals the FDR rate: if we know that there are 2 performing strategies among the 100, then by controlling the FDR at 20% the test has the power to discover these strategies. In variable selection terms, we are willing to add estimation noise to our model (a variable which is not important) as long as we add relevant information as well (include more relevant variables).

42 To prove this, let pM be the p-value for the multiple test, defined as

pM = P( max_{i=1,...,N} |Ri | > t ) = 1 − (1 − pS )^N . (3.90)

Using the standard t = 2 value for a single test, or equivalently pS = 5%, implies pM = 99% for N = 100. The search for a strategy which is at least as profitable as the observed strategy largely reduces the statistical significance of the single test. In this sense pM is seen as the adjusted p-value which takes data mining into account. Equating the two p-values, the adjusted or haircut Sharpe ratio follows, which is smaller than ŜR (since pM > pS ).

If the tests are dependent, then pM depends on the joint distribution of all N single test statistics. To limit the occurrence of incorrectly discovered profitable strategies - false rejections of the null hypothesis occur more often than in a single test - two methods are used: control of the family-wise error rate (FWER) and control of the false discovery rate (FDR). Both methods define type I errors in multiple testing, thus generalizing type I error probabilities of single tests. Summarizing, FDR conceptualizes the rate of type I errors in null hypothesis testing when conducting multiple comparisons. FDR-controlling procedures are designed to control the expected proportion of discoveries, i.e. rejected null hypotheses, that are false (incorrect rejections).

Formally, we denote by R the number of rejections, N the number of tested hypotheses, and N0|r the number of false rejections.

Definition 50. FWER, defined by

FWER = P(N0|r ≥ 1) , (3.91)

is the probability of making at least one false discovery.

FDR considers the proportion of false rejections and is based on the false discovery proportion (FDP), the proportion of type I errors, defined by

FDP = N0|r / R if R > 0, and FDP = 0 if R = 0. (3.92)

FDR measures the expected proportion of false discoveries among all discoveries, i.e. FDR = E[FDP]. Given the type I error definitions, p-value adjustments control for data mining. Based on the adjusted p-values, the corresponding t-ratios are transformed into Sharpe ratios. There are different methods to adjust p-values. Two methods for FWER, stated for the ascending order statistics p(1) ≤ ... ≤ p(N), are:

Bonferroni's Method: pBonf(i) = min(N p(i) , 1)

Holm's Method: pHolm(i) = min( max_{j≤i} (N − j + 1) p(j) , 1 ) .
j<i
3.6. BACKTESTS 223

For FDR, the method of Benjamini, Hochberg, and Yekutieli (BHY) reads

pBHY(i) = p(N) if i = N

and, if i ≤ N − 1,

pBHY(i) = min( pBHY(i+1) , (N c(N)/i) p(i) )

with the normalization constant c(N) = Σ_{k=1}^{N} 1/k, where the algorithm proceeds from the largest p-value downwards.
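A minimal sketch of the two FWER adjustments above, assuming the p-values may be supplied unsorted (the helper names are ours, not from Harvey and Liu):

```python
def bonferroni(pvals):
    """Bonferroni adjusted p-values: p_i -> min(N * p_i, 1)."""
    n = len(pvals)
    return [min(n * p, 1.0) for p in pvals]

def holm(pvals):
    """Holm step-down adjustment, enforcing monotonicity over the
    ascending order statistics of the p-values."""
    n = len(pvals)
    order = sorted(range(n), key=lambda i: pvals[i])
    adj = [0.0] * n
    running_max = 0.0
    for rank, idx in enumerate(order):        # rank 0 corresponds to p_(1)
        running_max = max(running_max, (n - rank) * pvals[idx])
        adj[idx] = min(running_max, 1.0)
    return adj

pvals = [0.001, 0.01, 0.04, 0.20]
print(bonferroni(pvals))
print(holm(pvals))
```

Holm is uniformly less conservative than Bonferroni while still controlling the FWER.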

To illustrate the methods, consider the 8 investment funds given in Table 3.18.

Fund                   Ret     Vol    SR    √T    t-stat  Φ(|t|)   p-value
Energy                -19.58  16.16  -1.21  1.41  -1.71   0.95637  0.08726
Diversified Dividend    6.70   3.87   1.73  1.41   2.45   0.99266  0.01468
Multi-Asset Income      1.58   3.70   0.43  1.41   0.60   0.72575  0.54850
Global RE Income        5.14   2.14   2.40  1.41   3.40   0.99966  0.00068
Low Vol Equity Yield    8.03   5.38   1.49  1.41   2.11   0.98257  0.03486
Low Volatility Yield    7.77   5.37   1.45  1.41   2.05   0.97982  0.04036
Real Estate             9.20   9.37   0.98  1.41   1.39   0.91621  0.16758
Dividend Income         9.25   4.37   2.12  1.41   2.99   0.99861  0.00278

Table 3.18: 8 investment funds from Invesco. Data from January 2015 to December 2016 (Engesser (2018)).

The constant c(N) = 2.72 and, ordering the p-values, p^BHY_(8) = 0.5485 is the largest adjusted p-value. Using the BHY algorithm,

p^BHY_(7) = min(0.5485, (8 × 2.72 / 7) × 0.16758) = 0.5209

and the other adjusted p-values follow in the same way. Doing the calculation, we observe that all p-values increased except the highest one and that only two of them, p^BHY_(2) = 0.0302 and p^BHY_(1) = 0.0148, are statistically significant, compared to the five significant strategies in Table 3.18 before correcting the p-values.
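The BHY step-down can be checked numerically on the p-values of Table 3.18; the sketch below uses the exact harmonic sum for c(N) rather than the rounded 2.72, and the function name is ours.

```python
# Unadjusted p-values of the 8 funds in Table 3.18 (same order as the table)
p = [0.08726, 0.01468, 0.54850, 0.00068, 0.03486, 0.04036, 0.16758, 0.00278]

def bhy_adjust(pvals):
    """BHY step-down: p_(N) stays; p_(i) = min(p_(i+1), N*c(N)/i * p_(i))."""
    n = len(pvals)
    c = sum(1.0 / k for k in range(1, n + 1))           # c(N), ~2.72 for N = 8
    order = sorted(range(n), key=lambda i: pvals[i])    # ascending p-values
    adj_sorted = [0.0] * n
    adj_sorted[-1] = pvals[order[-1]]                   # largest p-value unchanged
    for i in range(n - 2, -1, -1):                      # step down to the smallest
        adj_sorted[i] = min(adj_sorted[i + 1],
                            n * c / (i + 1) * pvals[order[i]])
    adj = [0.0] * n
    for rank, idx in enumerate(order):                  # map back to fund order
        adj[idx] = adj_sorted[rank]
    return adj

adj = bhy_adjust(p)
print(round(adj[3], 4))   # 0.0148  (Global RE Income)
print(round(adj[7], 4))   # 0.0302  (Dividend Income)
```

Only these two adjusted p-values stay below 5 percent, reproducing the reduction from five significant strategies to two.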

The next example considers the FWER for adaption to the momentum strategy following the construction of Kenneth French. He considers all stocks on NYSE and NASDAQ, where six portfolios are formed according to market cap (small, big) and historical returns (high, medium and low). We consider data from July 1963 to December 2012, i.e. 594 monthly returns. The null hypothesis is that returns are not different from zero. Calculating first the performance of the strategy without any adjustments using the Sharpe ratio we get:

SR_p.a. = (µ/σ) √12 = (0.7/4.29) √12 = 0.57.
224 CHAPTER 3. PORTFOLIO CONSTRUCTION

Calculating the p-value using

p = 2(1 − Φ(t-value)) = 0.00006

follows. We reject the null hypothesis. Assuming that there are N = 50 strategy improvements, p^Bonf = 0.003 follows, which is significant. If N = 1 000, then the Bonferroni-adjusted p-value becomes 0.06, that is, the null hypothesis cannot be rejected for 1 000 strategies.
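The momentum numbers above can be reproduced in a few lines; the standard normal CDF is built from `math.erf`, and the trial counts N = 50 and N = 1 000 follow the text.

```python
import math

mu, sigma, T = 0.7, 4.29, 594          # monthly mean/vol in percent, sample size

sr_annual = mu / sigma * math.sqrt(12)                        # annualized Sharpe ratio
t_stat = mu / (sigma / math.sqrt(T))                          # ~3.98
phi = 0.5 * (1 + math.erf(t_stat / math.sqrt(2)))             # standard normal CDF
p = 2 * (1 - phi)                                             # two-sided p-value

print(round(sr_annual, 2))                                    # 0.57
# Bonferroni: still significant for N = 50 trials, no longer for N = 1000
print(min(50 * p, 1.0) < 0.05, min(1000 * p, 1.0) < 0.05)     # True False
```

The unadjusted p-value is of order 10^-4, so even a moderate number of unreported trials is enough to overturn the rejection.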

3.6.4 Application to Factor Investing


Harvey et al. (2015) used 313 published works and selected working papers to catalogue 316 risk factors. The 316 risk factors are the result of various sorting mechanisms. Which of these factors are truly independent and which of them are subsumed by other variables? The standard criterion of using a t-ratio greater than 2.0 as a hurdle is no longer adequate. There are three main reasons for this.

First, the multiple testing problem using FDR control replaces the single testing problem p-values. Second, there must be a huge number of papers that did not find any significant explanation for the cross section of expected returns. These papers were never published and hence their information content did not enter the traditional statistical setup. There are two reasons for these non-publications: one does not make an academic career in finance by publishing non-results, and it is also difficult to publish a replication of a successful argument. There is a bias toward publishing papers that establish new factors. Third, Lewellen et al. (2010) show, using cross-sectional R-squared and pricing errors to judge the success of new factors, that the explanatory power of many documented factors is spurious. The Fama-French three-factor model explains more than 90% of the time-series variation in the returns of the 25 size-B/M portfolios (and 75% of the cross-sectional variation in their average returns). Any new factor added to this model which is correlated with size and value but not with the residuals will produce a large cross-sectional R-squared.

Harvey et al. (2015) apply the false discovery proportion (FDP) and the false discovery rate (FDR). The authors derive the following results. Between 1980 and 1991, only one factor is discovered per year, growing to around five factors per year in the period 1991-2003. In the last nine years, the annual discovery rate has increased sharply to around 18: 164 factors were discovered in the last nine years, doubling the cumulated 84 discovered factors of the past. They calculate t-ratios for each of the 316 factors discovered, including those in working papers. The vast majority of t-ratios exceed the 1.96 benchmark and the non-significant factors typically belong to papers that propose a number of factors.

The authors apply their method first to the case in which all tests of factor cross-section returns are published. This false assumption defines a lower bound for the true t-ratio benchmark. They obtain three benchmark t-ratios, two of which we describe:

• Factor-related sorting results in cross-sectional return patterns that are not explained by standard risk factors. The t-ratio for the intercept of the long/short strategy returns regressed on common risk factors is usually reported.

• Factor loadings as explanatory variables. They are related to the cross section of
expected returns after controlling for standard risk factors. Individual stocks or
stylized portfolios (for example FF 25 portfolios) are used as dependent variables.
The t-ratio for the factor risk premium is taken as the t-ratio for the factor.

They transform the calculated t-ratios into p-values for all three methods. Then, these p-values are transformed back into t-ratios, assuming that the standard normal distribution accurately approximates the t-distribution, see Figure 3.28.

Figure 3.28 presents the benchmark t-ratios for the three different methods. Using Bonferroni, the benchmark t-ratio starts at 1.96, increases to 3.78 by 2012 and will reach 4.00 in 2032. The corresponding p-value for 3.78 is, for example, 0.02 percent, which is much lower than the starting level of 5 percent. Since Bonferroni detects fewer discoveries than Holm, the t-ratios of the latter are lower. BHY t-ratio benchmarks are not monotonic but fluctuate before the year 2000 and stabilize at 3.39 after 2010.

Figure 3.28 shows the t-ratios of a few prominent factors - the main result in this section:

Result 51. Book-to-market, momentum, durable consumption goods, short-run volatility and market beta are significant across all types of t-ratio adjustments; consumption volatility, earnings-price ratio and liquidity are sometimes significant; and the rest are never significant.

The authors extend the analysis by testing, for example, for robustness and assuming correlation between the factors. The above results did not change notably. The analysis suggests that a newly discovered factor today should have a t-ratio that exceeds 3.0, which corresponds to a p-value of 0.27 percent. The authors argue that the value of 3.0 should not be applied uniformly. For factors derived from first principles, the value should be lower.

Harvey et al. (2015) conclude that many of the factors discovered in the field of finance are likely false discoveries: of the 296 published significant factors, 158 would be considered false discoveries under Bonferroni, 142 under Holm, 132 under BHY (1%) and 80 under BHY (5%). In addition, the idea that there are so many factors is inconsistent with principal component analysis, where perhaps five 'statistical' common factors drive the time-series variation in equity returns (Ahn, Horenstein and Wang (2012)).

3.6.5 p-Hacking
In general, p-hacking means pushing down the p-value to create significance. For example, testing multiple hypotheses increases the likelihood of false results. That is, the

Figure 3.28: The green solid curve shows the historical cumulative number of factors discovered, excluding those from working papers. Forecasts (dotted green line) are based on a linear extrapolation. The dark crosses mark selected factors proposed by the literature. They are MRT (market beta; Fama and MacBeth [1973]), EP (earnings-price ratio; Basu [1983]), SMB and HML (size and book-to-market; Fama and French [1992]), MOM (momentum; Carhart [1997]), LIQ (liquidity; Pastor and Stambaugh [2003]), DEF (default likelihood; Vassalou and Xing [2004]), IVOL (idiosyncratic volatility; Ang, Hodrick, Xing, and Zhang [2006]), DCG (durable consumption goods; Yogo [2006]), SRV and LRV (short-run and long-run volatility; Adrian and Rosenberg [2008]), and CVOL (consumption volatility; Boguth and Kuehn [2012]). T-ratios over 4.9 are truncated at 4.9 (Harvey et al. [2015]).

null hypothesis is rejected although it is correct: the p-value is actually larger and not significant. Chordia, Goyal and Saretto (2017) show how the published performance of investment strategies is doubtful since the manner in which they are evaluated does not align with research quality standards. First, there is a publication bias since only those

strategies that are significant are reported, as only they have a viable path to publication. Second, data snooping leads to a number of false rejections of the null. Finally, a number of data choices, test procedures, and samples may be tried until a significant result is discovered and only the significant result is reported. All this is referred to as p-hacking.

They use all accounting variables in the Compustat database and basic market variables in the CRSP database. They construct all possible trading signals based on the data items of Compustat satisfying minimal requirements. The signals consist of all types of levels and growth rates, and ratios of two levels or growth rates, i.e.

(x1 − x2) / x3

and all possible permutations. This leads to a total of approximately 2.1 million signals in 1972-2015. It is clear that most of these signals are economically meaningless combinations of items. But this large sample accounts for existing and yet-to-be-studied trading strategies. Using this sample they ask whether they can put a bound on the magnitude of p-hacking and furthermore, after accounting for p-hacking, how likely a researcher is to find a truly abnormal trading strategy. To strengthen the robustness of the results, non-tradeable and peculiar assets are removed: a 6-month period between portfolio formation and the database timestamp is required, and all stocks worth less than USD 3 and all stocks in the bottom quintile of the NYSE market cap distribution are removed.

The authors use the FDP to control the proportion of false discoveries, since the trading strategies are not independent of each other (cross-correlation in stock returns) and the FDP delivers statistical cutoffs that rely on the cross-correlations present in the data. They calculate measures of risk-adjusted performance for each strategy by first constructing a long-short portfolio based on the top and bottom decile of each signal's distribution, computing portfolio alphas using the Fama and French (2015) five-factor model augmented with the Carhart (1997) momentum factor, and calculating the Fama and MacBeth (1973) (FM) coefficient for each signal.

Imposing a tolerance of 5 percent FDP and the same significance level, the critical value for the alpha t-statistic is 3.79 (for FM it is 3.12). These numbers are comparable to those of Harvey et al. (2015). At these thresholds, 2.76 percent of strategies have significant alphas and 10.80 percent have significant FM coefficients.43 Using single hypothesis testing (SHT) with a t-statistic higher than 1.96 rejects the null hypothesis in about 30 percent of the cases for both alpha and FM t-statistics. The majority of the discoveries (rejections of the null of no predictability) based on SHT, without accounting for the very large number of strategies that are never made public, are likely false.

The authors add economic reasoning to these so far purely statistical considerations to gain more robust conclusions. They impose consistency between performance measures

43 The larger critical values for FM than for the alphas are due to the longer tails of the former.

obtained by portfolio sorts (alpha) and those derived from FM regressions. Eliminating strategies that have a statistically significant t-value for alpha but an insignificant one for FM, or vice versa, reduces the number of successful strategies to 806 under MHT and to 33,881 under SHT.

The second restriction is an economic hurdle based on the Sharpe ratio, i.e. they eliminate strategies that do not have a Sharpe ratio higher than that of the value-weighted market portfolio. Imposing the two hurdles leaves 17 strategies that are both statistically and economically significant under MHT and 801 under SHT. The likelihood of a researcher finding a truly abnormal trading strategy tends to zero.

Surprisingly, the 17 surviving strategies fail to have any economic meaning - the sorting makes no economic sense for these strategies. The authors conclude that the standard of market efficiency is as strong as ever. A different conclusion is that while accounting- and economics-based sorting is meaningless, this could be different for sorting based on financial market signals such as implied vs realized volatility, credit basis trades or carry trades.

3.6.6 Active vs Passive Investments


The simple arithmetic of Bill Sharpe (see (2.84)) showed that, before costs, the return on the average actively managed dollar will equal the return on the average passively managed dollar. The analysis did not tell us whether an active manager who beats the average is skilled or just lucky. The skill/luck question is impacted by several factors. Scale, for example, often impacts performance negatively: a more skillfully managed large fund can under-perform a less skillfully managed small fund. Pastor et al. (2014) empirically analyze the returns-scale relationship in active mutual fund management. They find strong evidence of decreasing returns at the industry level and that a fund's performance deteriorates over its lifetime.

3.6.6.1 The Success of the Active Strategy


Leaving the size-skill dependence aside, how can we define and measure skill in active management? We take a certain skill level of the asset managers for granted in this and the next section.

Assume IID returns R ∼ N(0, σ²). Profitable trades have by definition a positive return and then the expected return E(R) of one profitable trade is44

E(R) = σ √(2/π) ≈ 0.8 × σ (roughly the 80% percentile).
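The half-normal mean σ√(2/π) can be checked by Monte Carlo; σ = 1 below is an assumption for the sketch.

```python
import math
import random

random.seed(0)
sigma = 1.0
draws = [random.gauss(0, sigma) for _ in range(200_000)]
profitable = [x for x in draws if x > 0]

mc_mean = sum(profitable) / len(profitable)   # empirical E(R | R > 0)
exact = sigma * math.sqrt(2 / math.pi)        # ~0.7979, i.e. roughly 0.8 * sigma
print(round(exact, 3))                        # 0.798
```

The simulated mean of the profitable trades agrees with σ√(2/π) up to sampling noise.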
Since risk scales with the square root of the number of trades, risk equals √n σ for n trades. Consider two portfolio managers. One manager is always successful; the other

44 E(R) = (2/√(2πσ²)) ∫_0^∞ x e^{−x²/(2σ²)} dx = σ √(2/π).

is successful in x% of all trades. Both trade n times. The information ratio (IR), the
measure of a manager's generated value, measures the excess return of the active strategy
over risk:
IR = Excess Return of Active Strategy over Benchmark / Tracking Error (Active Risk) ,   (3.93)

where the tracking error is the standard deviation of the active return. For the investor with a 100% success rate, we get

IR = n σ √(2/π) / (√n σ) = √(2n/π) .
The trader with a success rate of x percent faces a loss in 1 − x percent of the trades, leading to a net profit x − (1 − x) = 2x − 1 per trade. Hence, after n trades

E_x(R) = (2x − 1) n σ √(2/π) ,   IR_x = (2x − 1) √(2n/π) .   (3.94)
For a fixed success rate x, an increasing trading frequency n increases the information ratio. But raising the trading frequency brings about diminishing returns due to the square-root function. Numerically, an IR of 50 percent needs a success rate x of two-thirds if the manager trades quarterly. Hence, a high success rate is necessary to obtain a moderate IR. Assuming that active management is a zero-sum game centered at zero, Table 3.19 relates the IR to the percentiles: It follows that a top-quartile manager has

Percentile IR
90 1
75 0.5
50 0
25 -0.5
10 -1

Table 3.19: Percentiles of an IR distribution.

an IR of one-half and an IR of +1 is exceptional.
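Equation (3.94) can be evaluated directly; the function below is a sketch with our own naming, using the quarterly-trading example from the text.

```python
import math

def information_ratio(success_rate, n_trades):
    """IR of eq. (3.94): IR_x = (2x - 1) * sqrt(2n / pi)."""
    return (2 * success_rate - 1) * math.sqrt(2 * n_trades / math.pi)

# A quarterly trader (n = 4) needs a hit rate near two-thirds for an IR of ~0.5:
print(round(information_ratio(2 / 3, 4), 2))   # 0.53
# A perfect trader recovers IR = sqrt(2n / pi):
print(round(information_ratio(1.0, 4), 2))     # 1.6
```

Doubling the trading frequency only raises the IR by a factor √2, which is the diminishing-returns effect noted above.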

The skill versus frequency of trading (breadth) trade-off reads qualitatively, see (3.94),

x ∼ IR / √n   (3.95)

and is of different severity for different asset classes. Many investors in interest rate risk trade on a monthly or quarterly basis since they are exposed to fundamental economic variables. They cannot increase their trading frequency arbitrarily. To achieve a high IR they need to be very successful. But if markets are efficient, this is not possible. One expects to observe more skill among (global) asset managers which can exploit inefficiencies between different markets. It is easier to increase the IR by increasing the trading

frequency, but this increases trading costs. Besides the naive approach of trading more often, other methods are to enlarge the set of eligible assets for the asset managers or to expand the risk dimension by allowing investment strategies which generate separate risk premia.

Following this first example, we add some structure to the discussion. Skill has different meanings. In its basic form a measure of skill is a hit ratio: it accounts for playing a game well, but it is not a statistical measure. The information coefficient IC is such a statistical measure of skill. It correlates forecast residual return with ex post residual return. The information ratio relates skill, say IC, directly to capital market theory such as the CAPM, i.e. by assuming specific IC properties and an investor decision process.

The IR has, similar to the alpha, an ex-post and an ex-ante interpretation. Ex post it measures an achievement: the ratio of (annualized) residual return to (annualized) residual risk. Such a realized IR is often negative and in a return regression it is related to the t-statistic one obtains for the alpha. Roughly, the IR is equal to the alpha's t-statistic divided by the square root of the number of observation years. The ex-ante IR measures opportunities given by the expected level of annual residual return per unit of annual residual risk.

3.6.6.2 Fundamental Law of Active Management


Formula (3.93) is one of many formulas to be found in the literature related to skill in active portfolio management. The most famous formula, the fundamental law of active management, expressed by Grinold (1989), states:

Proposition 52. Consider mean-variance portfolio optimization where the optimal active weights φ_A maximize the utility function µ_A − λ σ²_A with the expected active return and active return variance. If the residual stock returns are uncorrelated and if no budget constraint is imposed, then:

IR ∼ IC √BR = Skill × Frequency ,   (3.96)

where IC is the information coefficient of the manager and BR - the strategy breadth - is the number of independent forecasts of exceptional returns made per year.

The proof of a more general result is given in Chapter 7. IC measures the correlation between actual realized and predicted returns and provides a measure of a manager's forecasting ability. Equation (3.96) states that investors have to play often (high BR) and play well (high IC) to win a high IR. The fundamental law (3.96) is additive in the squared information ratios. Formula (3.93) shows the same intuition: 2x − 1 represents IC and n represents BR. The derivation of (3.96) depends on several assumptions; see Buckle (2005) for a review. Roughly, on the behavioral side, the portfolio manager knows the metric of skill and optimizes it according to a model, say the CAPM. Regarding securities, the same skill level applies to all asset choices and the sources of information are independent - forecasts are unbiased and residual returns have zero expected value. Next, the information coefficient is a small number and the impact of estimation error in investment information on out-of-sample optimized investment performance is not considered. Some consequences following Grinold (1999) are:

• Combine models, because breadth applies across models as well as assets.

• Don't market-time. Such strategies are unlikely to generate high information ratios.
While such strategies can generate very large returns in a particular year, they're
heavily dependent on luck. On a risk-adjusted basis, the value added will be small.
This will not surprise most institutional managers, who avoid market timing for
just this reason.

• Tactical asset allocation has a high skill hurdle. This strategy lies somewhere be-
tween market timing and stock picking - it provides some opportunity for breadth,
but not nearly the level available to stock pickers. Therefore, to generate an equiv-
alent information ratio, the tactical asset allocator must demonstrate a higher level
of skill.

We apply this to portfolio management. To continue, we restate the definition of the IR of a portfolio given in (3.93) as

IR = Portfolio Alpha / Portfolio Residual Risk = α_p / ω_p .   (3.97)

For a portfolio P relative to a benchmark B we have:

ω²_p = σ²_p − β²_p σ²_B ,   (3.98)

i.e. residual risk is orthogonal to the systematic return. The objective of an active mean-variance asset manager is to maximize:

E(u) = α_p − (θ/2) ω²_p .   (3.99)

Replacing the alpha by the IR using (3.97) implies the optimal level of residual risk:

ω*_p = IR / θ .   (3.100)

Using the fundamental law,

ω*_p = IR / θ = IC √BR / θ .   (3.101)

The breadth allows for diversification among the active bets and skill increases the possibility of success, so that the overall level of aggressiveness ω* can increase.

Example (Grinold and Kahn (2000))

A manager wants to forecast the direction of the market each quarter. The market direction takes only two values - up and down, i.e. the random variable x(t) = ±1 with mean zero and standard deviation 1. The forecast of the manager y(t) takes the same values and has the same mean and standard deviation as x(t). The information coefficient IC is given by the covariance of x and y. If the manager makes N bets and is correct N1 times (x = y) and wrong N − N1 times (x = −y), then

IC = (1/N) (N1 − (N − N1)) .   (3.102)
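The hit-ratio IC of (3.102) plugs directly into the fundamental law (3.96); a minimal sketch with our own function names:

```python
import math

def ic_from_bets(n_correct, n_total):
    """Information coefficient of the binary forecaster, eq. (3.102)."""
    return (n_correct - (n_total - n_correct)) / n_total

def fundamental_law_ir(ic, breadth):
    """Grinold's fundamental law, IR ~ IC * sqrt(BR), eq. (3.96)."""
    return ic * math.sqrt(breadth)

# A quarterly forecaster who is right in 3 of 4 quarters:
ic = ic_from_bets(3, 4)                 # 0.5
print(fundamental_law_ir(ic, 4))        # 1.0
```

A 75 percent hit rate with only four bets per year already corresponds to the exceptional IR of +1 from Table 3.19, which illustrates how demanding such hit rates are.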

The fundamental law of active management has been generalized. One reason is that the IR given in (3.96) seems to overestimate the IR which a portfolio manager can reach. Assume a forecast signal with an average monthly IC of 0.03 and a stock universe of 1,000. Then the expected annualized IR from (3.96) is 3.29. This is beyond what the best portfolio managers can realize. Ding (2010) generalizes the law by considering time series dynamics and cross-sectional properties. He shows that cross-sectional ICs are different from time-series ICs and that IC volatility over time is much more important for a portfolio IR than breadth: playing a little better has a stronger impact on the IR than playing a little more often. He proves

IR = IC / √(1 − IC²) × √BR ,   (3.103)

i.e. for a small IC, (3.103) is approximately the same as (3.96).
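For the numbers in the text (monthly IC of 0.03 on 1,000 stocks, hence BR = 12,000), both (3.96) and Ding's (3.103) give the implausibly high IR of about 3.29, confirming the small-IC equivalence:

```python
import math

ic, breadth = 0.03, 12 * 1000          # monthly forecasts on 1,000 stocks

ir_grinold = ic * math.sqrt(breadth)                       # eq. (3.96)
ir_ding = ic / math.sqrt(1 - ic**2) * math.sqrt(breadth)   # eq. (3.103)

print(round(ir_grinold, 2), round(ir_ding, 2))   # 3.29 3.29
```

The two formulas only diverge for large ICs; the correction for the overestimate comes from modelling IC volatility over time, not from the formula itself.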

From an information processing point of view, active management is forecasting. There are different types of forecast quality. The naive forecast is the consensus expected return. This is the informationless forecast and if it can be implemented efficiently, the expected returns of the market or the benchmark follow. There are so-called raw and refined forecasts (Grinold and Kahn [2000]). Raw forecasts are based on corporate earnings estimates or buy and sell recommendations. A raw forecast is not directly a forecast of exceptional return. Refined forecasts are conditional expected return forecasts based on the raw forecast information. The following forecast formula holds for the excess return vector R and the raw forecast vector g, where the two vectors have a joint normal distribution:

E(R|g) = E(R) + (cov(R, g) / var(g)) (g − E(g)) .   (3.104)

The covariance term is the IC. This equation relates forecasts that differ from their expected levels. The refined forecast is then defined as the difference between E(R|g) and the naive forecast E(R), the consensus expected return, which is the informationless forecast and leads to the benchmark holdings. The forecast formula has the same structure as the CAPM or any other single factor model. This is not a surprise but follows from a linear regression analysis.
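The refinement (3.104) is an ordinary least-squares step; a simulated sketch with a hypothetical weak signal (true slope 0.05, all names ours):

```python
import random

random.seed(7)
n = 100_000
g = [random.gauss(0, 1) for _ in range(n)]                 # raw forecasts
R = [0.05 * gi + random.gauss(0, 1) for gi in g]           # excess returns, weak link

mu_g, mu_R = sum(g) / n, sum(R) / n
cov = sum((gi - mu_g) * (ri - mu_R) for gi, ri in zip(g, R)) / n
var = sum((gi - mu_g) ** 2 for gi in g) / n

def refined_forecast(g_obs):
    """Conditional expected return E(R|g) of eq. (3.104)."""
    return mu_R + cov / var * (g_obs - mu_g)
```

The estimated slope cov/var recovers the true 0.05 up to sampling noise, and a raw signal at its mean returns exactly the naive (consensus) forecast mu_R - i.e. the benchmark holdings.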

3.6.6.3 Skill and Luck in Mutual Fund Management I


The approach so far has not addressed the problem of how one can distinguish between skill and luck. Peter Lynch, the manager of the Magellan fund, exhibited statistically significant abnormal performance. Lynch beat the S&P 500 in 11 of the 13 years from 1977 to 1989. This is in itself not evidence of value enhancement. Consider 500 coin-flippers. Each flips 13 coins and we count the number of heads for each flipper. The winner, on average, flips 11.63 heads. But the excess return of Lynch in this period relative to the S&P is a remarkable 10.5%, which is strong evidence of skill.
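The 11.63 figure is the expected maximum of 500 independent Binomial(13, 1/2) draws and can be computed exactly; the helper below is our own sketch.

```python
import math

N_FLIPPERS, N_COINS = 500, 13

def binom_cdf(k, n=N_COINS, p=0.5):
    """P(Binomial(n, p) <= k); empty sum for k < 0."""
    return sum(math.comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k + 1))

# E[max] via P(max <= k) = F(k)^500
e_max = sum(k * (binom_cdf(k)**N_FLIPPERS - binom_cdf(k - 1)**N_FLIPPERS)
            for k in range(N_COINS + 1))
print(round(e_max, 2))   # 11.63
```

So 11 wins out of 13 is roughly what pure luck delivers to the best of 500 managers; it is the size of Lynch's excess return, not the win count, that points to skill.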

We analyze how skillful fund managers are. Scaillet et al. (2013) use the FDR to control for false discoveries, i.e. mutual funds that exhibit significant alphas by luck alone. They estimate the proportions of unskilled, zero-alpha, and skilled funds in the population. A fund is unskilled if the return from stock picking is smaller than the costs (alpha is negative net of trading costs and expenses), a zero-alpha fund if the difference is zero, and a skilled fund otherwise (alpha is strictly positive).

We consider the distribution function for the three groups: unskilled, zero-alpha, and skilled funds. Plotting the three distributions as a function of the t-statistics, we have three density functions with the zero-alpha group's density function in the middle, see Figure 3.29. Neighbouring density functions overlap - unskilled overlaps with zero-alpha and zero-alpha with skilled. Pick the latter region of overlap. If a fund in the zero-alpha group has a high t-value, this high t-value is driven by luck. Therefore, in the cross-sectional distribution of all funds, some funds with high t-values are genuinely skilled and others are merely lucky.

Of course, it is not possible to observe the true alpha for each fund. The inference for the three skill groups is carried out as follows. First, for each fund, the alpha and its standard deviation are estimated. The ratio of the two estimates defines the t-statistic. Choosing a significance level, the t-estimate lies within or outside the threshold implied by the significance level. Estimates outside are labelled significant. The FDR measures the proportion of lucky funds among the funds with significant estimated alphas. The data set consists of monthly returns of 2,076 actively managed US open-end, domestic equity mutual funds that existed at any time between 1975 and 2006 (inclusive).

Of the funds, 75.4 percent are zero-alpha, 24.0 percent are unskilled, and 0.6 percent are skilled. Unskilled funds under-perform for long time periods. Aggressive growth funds have the highest proportion of skilled managers, while none of the growth and income funds exhibit skills. During the period 1990-2006, the proportion of skilled funds decreases from 14.4 to 0.6 percent, while the proportion of unskilled funds increases from 9.2 percent to 24.0 percent. Although the number of actively managed funds increases over this period, skilled managers have become exceptionally rare. This is also reflected in a decreasing overall alpha in the period, reaching -1% in 2016, see Figure 3.30. These facts seem to be a good motivation for passive investments.

Figure 3.29: Intuition about luck and skill for the three groups of mutual funds: unskilled, zero-alpha and skilled. (Scaillet et al. [2013]).

What could be the reasons for these facts, given that the education level of the average asset manager increased during the two decades? After the peak in 1993, when the alpha started to decline, the internet was launched. The cost of information started to decrease over time and markets therefore became more and more efficient. In other words, luck has become more important than skill over time. But luck is not persistent. This leads to an overall decreasing alpha of the industry. The authors test whether funds lose their outperformance skills due to their increasing size. They treat each five-year fund record as a separate 'fund' and find that the proportion of skilled funds equals 2.4 percent, implying that a small number of managers have 'hot hands' over short time periods.

A plausible further explanation is the movement of skilled and performing managers to the hedge fund industry, since hedge funds use performance-based fees contrary to the payments used in mutual funds; see the same analysis for hedge funds in Section 5.7.

Figure 3.30: Proportion of unskilled and skilled funds (Panel A) and total number of
mutual funds in the US versus average alpha (Scaillet et al. [2013]).

Skilled funds are concentrated in the extreme right tail of the estimated alpha distribution. This suggests a way to detect them. If in a year the tests indicate higher proportions of lucky, zero-alpha funds in the right tail, then the goal is to eliminate these false discoveries by moving further into the extreme tail. Carrying out this control each year, they find a significant annual alpha of 1.45 percent. They also find that all outperforming funds waste, through operational inefficiencies, the entire created surplus.

The authors re-examine the relation between fund performance and turnover, expense ratio, and size. For each characteristic, the proportion of zero-alpha funds is around 75%. The proportion of unskilled funds is qualitatively larger for funds with high turnover - many unskilled funds trade on noise to pretend that they are skilled. The size of the fund has a bipolar effect: for large funds, both the proportions of unskilled and skilled funds are larger than for smaller funds.

What about European funds? Scaillet (2015) considers 939 open-end funds between 2001 and 2006. The main findings are, first, that the proportion of zero-alpha funds is 72.2 percent, the proportion of skilled funds is 1.8 percent, and the proportion of unskilled funds is 26 percent. Second, skilled funds exhibit low betas with respect to the MSCI Europe. Some skilled funds are known to play bonds and depart from their pure equity mandates.

3.6.6.4 Skill and Luck in Mutual Fund Management II


Leippold and Rueegg (2018) reconsider the skill and luck question by changing or extending the analysis of the last section as follows.

First, they do not consider equity markets only but also take into account a multi-risk-factor analysis for fixed income mutual funds. The risk factors are changes in the level, slope, and curvature of the local yield curve, together with a credit spread. Second, they compare value-weighted returns of active against index mutual funds within the same investment category. This allows them to avoid choosing multi-factor benchmarks and they can compare two investable alternatives where in both alternatives the corresponding friction costs and restrictions are included. They use 30 different investment categories across asset classes. Finally, they distinguish between retail and institutional funds and they change the statistical methods of the last section.

We consider the last point in more detail. The studies of Scaillet et al. (2010) and Fama and French (2010) state or assume that autocorrelation is of minor importance. Leippold and Rueegg test for autocorrelation in mutual fund returns using a distribution-free test. They find that already in the first three lags serial dependence can be found for 20 percent of single mutual funds and 30 percent of mutual fund portfolios. This evidence calls for temporal dependence control in the analysis of the alphas of single mutual funds and of portfolios of mutual funds against different benchmark models. They suggest block-bootstrapping the alpha of a strategy to its benchmark returns, see Ledoit and Wolf (2008, 2011). This improves inference accuracy for dependent time series data, and the bootstrapped t-statistics and p-values are then the inputs to the multiple hypothesis frameworks, see Romano and Wolf (2005a). Since the authors test whether single active or index funds significantly outperform the theoretical multi-factor models, there are many hypotheses and thus they use the FDR. For portfolios of mutual funds there are only a few hypotheses and they use the FWER.

Table 3.20 summarizes some findings, which are comparable to those of the former section. The results show the differences between retail and institutional funds, for example the percentage of skilled active institutional funds of 3.5% compared to 1.4% and 1.9% for skilled single mutual funds. For the active and index mutual funds, only managers in Europe and Japan have skills. For fixed income funds the number of zero-alpha funds is lower. The highest skills are observed in the USD and EUR markets.

Figure 3.31 represents the hall of fame of successful investors who have outperformed
the S&P 500 for more than 10 years. The only persistent quantitatively
managed investment, from Renaissance, is based on strict secrecy about the methods used

              US    Glob.  EU    Jap   Asia  Aver   USD   CHF   EUR   GBP   Aver
Retail
 Active
  Zero alpha  55.1  39.6   66.2  67.9  83.0  62.3   38.9  71.0  77.8  83.3  67.8
  Skilled      0.0   0.0    3.0   5.7   0.0   1.7   23.3   3.3  22.2  16.7  16.4
  Unskilled   44.9  60.4   30.8  26.4  17.0  35.9   37.8  25.7   0.0   0.0  15.9
 Index
  Zero alpha  61.9  30.1   76.5  73.7 100.0  68.4   41.6  93.3  71.6 100.0  76.6
  Skilled      0.0   0.0    3.6   5.9   0.0   1.9   29.2   6.7  28.4   0.0  16.1
  Unskilled   38.1  69.9   19.9  20.4   0.0  29.7   29.2   0.0   0.0   0.0   7.3
Instit.
 Active
  Zero alpha  69.3  53.5   78.5  88.2  97.4  77.4   38.5  77.4  60.1  82.7  64.7
  Skilled      0.0   0.0    8.2   9.4   0.0   3.5   40.7  22.6  39.9  17.3  30.1
  Unskilled   30.7  46.5   13.3   2.4   2.6  19.1   20.8   0.0   0.0   0.0   5.2
 Index
  Zero alpha  66.9  55.7   91.9  92.5  90.6  79.5   57.5  71.4  56.5  95.0  70.1
  Skilled      0.0   0.0    6.8   0.0   0.0   1.4   21.3  26.2  43.5   0.0  22.7
  Unskilled   33.1  44.3    1.4   7.5   9.4  19.1   21.3   2.4   0.0   5.0   7.2

Table 3.20: For equity, the five-factor benchmark model includes the regional models from
the Fama and French homepage for MKT, SMB, HML and WML and from the AQR homepage for
BAB. For the fixed income benchmark model the four factors are 'shift', 'twist', 'butterfly'
and the spread of the BBB to the AAA credit spread from MSCI. The Morningstar
database from Dec 1991 to Dec 2016 includes 61,269 funds (Source: Leippold and Ruegg
[2018]).

and the hiring of top scientists from the natural and IT sciences who apply algorithms.
Only one money manager of the alternative investment group is listed in the hall of fame.
Furthermore, it is notable that the macro investors dominate those fundamental investors
who cannot be grouped into the Buffett/Graham school. Finally, the appearance of Lord
Keynes shows that it was possible to successfully outperform the US markets in days
when technology was in its infancy, relying instead on a deep understanding of
the macro economy.

Figure 3.31: Hall of Fame of investors (gurufocus, Hens and FuW [2014]).
Chapter 4

Investment Theory Synthesis


We discussed several approaches to portfolio construction where asset prices were exoge-
nous. We now introduce demand and supply for assets, which fix the asset prices. We first
consider absolute asset pricing where all individuals make their optimal choices (such as
portfolio weights or consumption) and markets clear, i.e. supply equals demand, which
defines the endogenous asset prices. Relative asset pricing is a second approach. Base
asset prices are exogenously given, but prices of derivatives relative to the base asset prices
are derived assuming no arbitrage. This assumption is behaviourally minimal (more
money is preferred to less) and in this sense opposite to the rich absolute pricing models.
But relative pricing is the most used pricing in finance and its price accuracy
is unique in economics. Absolute and relative pricing are related to each other: the
absence of arbitrage is necessary for an equilibrium to exist. In other words, a general
equilibrium cannot exist if money machines are possible.

4.1 Absolute Pricing


In absolute pricing investors solve a fully-fledged economic model: they choose opti-
mal consumption and investment portfolios over time to maximize their expected utility.
The optimal policies clear the markets of all goods and financial assets, which
determines the equilibrium asset price dynamics. Only a few models can be solved
explicitly; for more complicated ones numerical approximation methods are used. In equi-
librium no rational investor has an incentive to deviate from the equilibrium allocation:
if an investor is optimally short, there must be an investor who optimally buys the asset,
else markets do not clear.

1 We follow Cochrane (2005), Back (2010), Campbell and Viceira (2002), Cochrane (2011), Culp and
Cochrane (2003), Merton (1971, 1973), Martellini and Milhau (2015), Schaefer (2015) and Shiller (2013).


4.2 Simple General Equilibrium Model


We consider two investors i = 1, 2 who can consume a single good c at the beginning and
end of a single period. Both derive utility from a logarithmic utility function over
consumption.

The two investors face the same endowment (salary). They only differ in their im-
patience: the time discount rates b1 and b2 are different. The only asset to invest in in
the financial market is a risk-free bond S, which they can exchange, i.e. there is no money.

An optimal policy fixes the optimal consumption levels at the two dates and the
investment amount in the bond at the first date. These optimizations determine optimal
consumption ci(S) and investment φi(S) for each investor. The policies depend on the
yet exogenously given bond price S. Inserting these strategies in the market clearing
condition fixes the endogenous price S = e−Rf of the bond, i.e. the risk-free interest
rate Rf follows from the interaction of the investors. Let φk(S) be the number of bonds
investor k buys and keeps at time 0. Market clearing means φ1 + φ2 = 0: what investor 1
sells (buys), investor 2 must buy (sell). Inserting the individual optimal investment strategy
functions fixes the equilibrium risk-free interest rate

$$R_f = \frac{2\,(1 - b^1 b^2)}{b^1 + b^2 + 2 b^1 b^2}\ .$$

All quantities which enter symmetrically in the optimization such as endowment have to
cancel in the equilibrium expressions. The time value of money is driven by impatience. If
impatience is zero, the risk-free rate is zero. Other limit or sensitivity cases follow at once.

We formally derive the solution, allowing for more heterogeneity. The log prefer-
ences are:

$$u^i(c^i_0, c^i_1) = \log(c^i_0) + b^i \log(c^i_1)\ , \quad i = 1, 2\ , \quad 0 \le b^i \le 1$$

with $b^i$ the time preference rate. The budget restrictions read for investor i (with e the
endowment):

$$c^i_0 - e^i_0 = -\frac{1}{1+R_f}\,\phi^i\ , \qquad c^i_1 - e^i_1 = \phi^i$$

with $R_f$ the yet unspecified risk-free rate. We introduce the Lagrangian L:

$$L^i(c^i, \phi^i, \lambda^i) = u^i - \lambda^i_0\Big(c^i_0 - e^i_0 + \frac{1}{1+R_f}\,\phi^i\Big) - \lambda^i_1\big(c^i_1 - e^i_1 - \phi^i\big)\ .$$

The FOC read:

$$\frac{\partial L^i}{\partial c^i_j} = 0 \;\Longrightarrow\; c^i_0 = \frac{1}{\lambda^i_0}\ , \quad c^i_1 = \frac{b^i}{\lambda^i_1}$$
$$\frac{\partial L^i}{\partial \phi^i} = 0 \;\Longrightarrow\; \lambda^i_0 = \lambda^i_1\,(1+R_f)$$
$$\frac{\partial L^i}{\partial \lambda^i_j} = 0\ .$$

Solving these equations implies:

$$c^i_0 = \frac{e^i_0}{1+b^i} + \frac{e^i_1}{(1+R_f)(1+b^i)} = \frac{1}{1+b^i}\,\mathrm{PV}(e^i)$$
$$c^i_1 = \frac{b^i\big((1+R_f)\,e^i_0 + e^i_1\big)}{1+b^i}$$
$$\phi^i = \frac{b^i (1+R_f)\,e^i_0 - e^i_1}{1+b^i}$$
$$\lambda^i_0 = \frac{(1+R_f)(1+b^i)}{(1+R_f)\,e^i_0 + e^i_1}$$
$$\lambda^i_1 = \frac{1+b^i}{(1+R_f)\,e^i_0 + e^i_1}\ .$$
Using market clearing, we get the equilibrium interest rate:

$$R_f = \frac{e^1_1 + e^2_1 + e^2_1 b^1 + e^1_1 b^2 - e^1_0 b^1 - e^2_0 b^2 - e^1_0 b^1 b^2 - e^2_0 b^1 b^2}{e^1_0 b^1 + e^2_0 b^2 + e^1_0 b^1 b^2 + e^2_0 b^1 b^2}\ .$$

Assuming that the endowments are the same for both agents, they cancel in this
expression and the above equilibrium rate follows.

If risk enters the model, the FOC become equations where expectations
matter, but the same logic applies as in this basic example.
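As a numerical illustration of the closed-form solution (the impatience parameters and the unit endowments are assumed values):

```python
# Two-investor log-utility equilibrium with equal endowments e0 = e1 = 1.
b1, b2 = 0.90, 0.95                                 # assumed impatience parameters
rf = 2 * (1 - b1 * b2) / (b1 + b2 + 2 * b1 * b2)    # equilibrium risk-free rate

def bond_demand(b, rf, e0=1.0, e1=1.0):
    """Optimal bond holding phi^i from the first-order conditions."""
    return (b * (1 + rf) * e0 - e1) / (1 + b)

phi1, phi2 = bond_demand(b1, rf), bond_demand(b2, rf)
```

At the equilibrium rate the market clears, phi1 + phi2 = 0, and the more impatient investor 1 borrows (phi1 < 0) while investor 2 lends.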

Example - SNB Policy

In January 2015 the Swiss National Bank (SNB) removed the floor value between
the euro and the Swiss franc. This floor had been introduced in August 2011 after
EUR/CHF had moved from more than 1.6 to close to parity in three years. This
had proved to be a significant burden for the Swiss export industry since two-thirds
of exports are denominated in euros. In 2011, the floor was set to 1.2, up from around
1.1. When the floor was removed, the exchange rate fell within minutes from 1.2 to 0.9
and stabilized over the following days at around 1.05. If we consider the non-regulated
exchange rate to represent the equilibrium rate, the first intervention forced the rate to

move out of equilibrium, and when the floor was removed the rate was allowed to return to
its equilibrium value.

Whatever the utility function of the SNB is, the market clearing condition shows
its importance. If the SNB wants to move a rate out of equilibrium, it has to change
the demand or supply side. By buying euros, the SNB increased euro demand, at the price
of accepting that its balance sheet grew from CHF 100 bn to almost CHF 800 bn.

Example - Logarithmic Utility

Logarithmic utility facilitates calculations and it also has specific implications for
investment. Log investors always act optimally myopically (with a one-period view)
independent of the dynamic context. Their demand for hedging long-term risks is zero.
To understand why, note that a log investor maximizes log returns. The log return
over a long time horizon is equal to the sum of the one-period log returns. Long-term
return is therefore maximized if the sum over the one-period returns is maximized,
which is the same as maximizing each one-period return.
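The additivity of log returns behind this myopia can be verified directly (the simulated returns are illustrative only):

```python
import numpy as np

rng = np.random.default_rng(1)
gross = np.exp(0.01 + 0.05 * rng.standard_normal(60))  # 60 one-period gross returns

# Terminal log wealth of a buy-and-hold investor equals the sum of the
# one-period log returns, so maximizing each period maximizes the total.
terminal_log_wealth = np.log(np.prod(gross))
sum_log_returns = np.log(gross).sum()
```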

4.3 Fundamental Asset Pricing Equation


We recall the relationship between prices and returns; both matter in the sequel.
The gross return $R_j = X_j/S_j$ of security j with non-zero price $S_j$ is its payoff $X_j$ divided
by its price. Net return is gross return minus 1. Payoffs can be the price at a future date plus
dividends, or transformations f(S) such as for options.

The rational investor derives expected utility from two-period consumption at the
present date t and a future date t+1,

$$E_t[u(c_t, c_{t+1})] = u(c_t) + b\,E_t[u(c_{t+1})]\ , \quad 0 \le b \le 1$$

with b the time preference rate. He chooses the investment to maximize expected utility
where consumption is assumed to be already optimally chosen. There is only a single
risky asset and two budget constraints at times t and t+1 (with e the endowment):

$$c_t - e_t = -\phi_t S_t\ , \qquad c_{t+1} - e_{t+1} = \phi_t X_{t+1}$$

with S the single risky asset price. Introducing the Lagrangian, the FOC imply the
Fundamental Asset Pricing Equation (4.1) for asset S at time t:

$$S_t = E_t(M_{t+1} X_{t+1}) \qquad (4.1)$$



where M is the stochastic discount factor (SDF),

$$M_{t+1} = b\,\frac{u'(c_{t+1})}{u'(c_t)} \qquad (4.2)$$

with $u'(c)$ the marginal utility of consumption. Hence, price is expected discounted pay-
off. (4.1) assumes that there is an underlying general equilibrium model which ensures
that a single SDF exists which can be used to price all assets by discounting payoffs.
Equivalent to the existence of a single SDF is the absence of arbitrage.

The SDF relationship between asset prices and consumption states that investments
proposed by asset managers should protect investors' optimal consumption in the short
and long run. This sound theoretical model has some drawbacks. First, investments
based on consumption data of investors often underperform. Second, precise knowl-
edge of investors' preferences in the analytical sense is limited. Data science today offers
a much more precise capture of investor behavior, but without relying on a single utility
function.

Since consumption at time t+1 is stochastic viewed from time t, the discount factor
$M_{t+1}$ is stochastic too. The SDF is high if time t+1 turns out to be a bad time -
consumption is low in future states, see Figure 4.1. Future payoffs are then discounted
weakly in the pricing equation (4.1), which attributes a high price to assets that pay off
in bad times. The ratio of marginal utilities entering the SDF reflects that investors value
money more when they need it in bad times than in good times. Marginal utility can therefore
be seen as an index of bad times, and the SDF, as a substitution measure between present
and future consumption, is an index of growth in different times. The price changes of
S in the fundamental pricing equation (4.1) can have three causes: the probability p,
the discount factor M or the payoff X changes. There is strong evidence that expected
return variation over time and across assets dominates and that asset valuation moves far
more on news affecting the discount factor than on news about expected cash flows, that is,
the payoff X.

4.4 Examples
4.4.1 Risk Neutral Probabilities
Consider a discrete market model with s = 1, 2, ..., S future states. The market is com-
plete, that is, for each state s there exists a financial product which pays $1 in this
state s and zero otherwise. These base assets are called Arrow-Debreu securities. In such
a market all securities can be replicated by the base assets - all risks can be shared over
time and in the cross section between all investors. We write $S_c(n)$ for the price today
of a base asset for state n. The price S(X) of any payoff X reads:

$$S(X) = \sum_{s=1}^{S} S_c(s)\,X(s)\ .$$

Figure 4.1: Marginal utility $u'$ is a decreasing function of consumption. Hence, in bad
times, where consumption is lower at the future date t+1 than at present t, the ratio of
marginal utilities in (4.2) is larger than one.

Multiplying and dividing by the probability $p(s) \neq 0$ for each state:

$$S(X) = \sum_{s=1}^{S} S_c(s)\,\frac{p(s)}{p(s)}\,X(s) = \sum_{s=1}^{S} p(s)\,M(s)\,X(s) = E[MX]$$

where the SDF M is the ratio of state prices and state probabilities. This definition of
the SDF and the former one are equivalent.

If we consider a risk-less asset $S_0$, that is, the payoff X(s) = 1 in all states s, then

$$S_0 = E(M)\ .$$

Therefore, the risk-less rate $R_f$ satisfies

$$1 + R_f = \frac{1}{S_0} = \frac{1}{E(M)}\ .$$

If we define risk neutral probabilities

$$q(s) := (1 + R_f)\,S_c(s) = \frac{M(s)}{E(M)}\,p(s)\ ,$$

then the fundamental asset pricing equation reads

$$S(X) = \frac{1}{1+R_f}\sum_{s=1}^{S} q(s)\,X(s) = \frac{1}{1+R_f}\,E^Q(X)\ .$$

The price of any asset is equal to the expected discounted value of the payoff using risk
neutral probabilities. The discounted value of the payoff is then a Q-martingale (note
that the discount factor is 1 today). More explicitly, replacing the future payoff by the
future asset price, the fundamental equation reads

$$S_t = E_t^Q(D_{t,t+1}\,S_{t+1})$$


with D the discount factor. This representation is the essence of relative pricing, see
Section 4.6 for the discussion.
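These equivalences are easy to verify in a small numerical example. The probabilities, state prices and payoff below are assumed values; the three pricing formulas - state prices, SDF and risk neutral probabilities - must give the same price:

```python
import numpy as np

# Three future states: physical probabilities p and state (Arrow-Debreu) prices Sc.
p  = np.array([0.3, 0.4, 0.3])
Sc = np.array([0.35, 0.30, 0.25])          # hypothetical state prices

S0 = Sc.sum()                              # price of the risk-less payoff (1,1,1)
Rf = 1 / S0 - 1                            # risk-free rate, 1 + Rf = 1/E(M)
M  = Sc / p                                # SDF: state price per unit of probability
q  = (1 + Rf) * Sc                         # risk neutral probabilities

X  = np.array([2.0, 1.0, 0.0])             # some payoff, e.g. a call-like claim
price_state   = Sc @ X                     # sum of state prices times payoffs
price_sdf     = p @ (M * X)                # E[M X]
price_neutral = (q @ X) / (1 + Rf)         # E^Q[X] / (1 + Rf)
```

All three prices coincide, and the q's sum to one as required of probabilities.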

4.4.2 Constant Relative Risk Aversion

We pay USD 1 today and receive $1 + R_f$ USD at the risk-less rate, i.e. $1 + R_f = 1/E(M)$.
Assuming a constant relative risk aversion utility function $u(c) = c^{1-\gamma}$ with $0 < \gamma < 1$,
the SDF reads

$$M = b\left(\frac{c_{t+1}}{c_t}\right)^{-\gamma} = b\,e^{-\gamma \ln\frac{c_{t+1}}{c_t}} \sim b\,(1 - \gamma\,\Delta c_{t+1})$$

with a first-order Taylor approximation, where $\Delta c_{t+1} = \ln\left(\frac{c_{t+1}}{c_t}\right)$. Again
expanding up to first order:

$$1 + R_f = \frac{1}{E(M)} \sim \frac{1}{b}\,\big(1 + \gamma\,E_t(\Delta c_{t+1})\big)\ .$$
Hence interest rates are higher if people are impatient (low b) or if expected consumption
growth is high. Since high consumption growth means people get richer in the future, one
has to offer a high risk-free rate such that they consume less now and save.

Asking how much $R_f$ varies over time is the same as asking how much one must offer
individuals to postpone consumption. This variation is governed by the risk aversion factor
$\gamma$. Expanding the risk-free rate relation up to second order:

$$1 + R_f \sim \frac{1}{b}\Big(1 + \gamma\,E_t(\Delta c_{t+1}) - \frac{1}{2}\gamma^2\,\sigma_t^2(\Delta c_{t+1})\Big)\ .$$

Therefore, higher consumption growth volatility lowers interest rates, which motivates
investors to save more in uncertain times.
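A quick numerical check of the second-order approximation against the exact rate $1/E(M)$; the preference parameters and the two-state distribution of consumption growth are assumptions:

```python
import numpy as np

b, gamma = 0.98, 3.0                       # assumed impatience and risk aversion
# Two equally likely states for log consumption growth: mean 2%, vol 2%.
dc = np.array([0.00, 0.04])
p  = np.array([0.5, 0.5])

M = b * np.exp(-gamma * dc)                # SDF M = b (c_{t+1}/c_t)^{-gamma}
exact = 1 / (p @ M)                        # exact 1 + Rf = 1 / E(M)

mean = p @ dc
var  = p @ dc**2 - mean**2
second_order = (1 / b) * (1 + gamma * mean - 0.5 * gamma**2 * var)
```

For these small growth rates the approximation is accurate to a few basis points; raising `var` lowers both the exact and the approximate rate.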

4.4.3 Zero Coupon Bond

Consider a zero-coupon bond price $S_{t,t+1}$ with one year maturity. Since the bond pays $1
at maturity in all states, X = 1,

$$S_{t,t+1} = E_t(M_{t+1})\ .$$

Hence, pricing zero-coupon bonds means constructing discount factors.

4.5 Equivalent Formulation of the Fundamental Asset Pricing Equation

Equation (4.1) can be equivalently rewritten to derive the factor pricing models in their
traditional form. In investment one prefers to think about rates of return instead of
prices. Dividing (4.1) by the price, we get the equivalent return formula

$$1 = E_t(M_{t+1}\,R_{t+1})$$

with R the gross return on the asset. Similarly, with $R^e$ the excess return over the
risk-free rate, we get

$$0 = E_t(M_{t+1}\,R^e_{t+1}) = \langle M_{t+1}, R^e_{t+1}\rangle\ , \qquad (4.3)$$

i.e. excess return and the SDF are orthogonal to each other. Therefore, the expected
return is the orthogonal projection (beta) of the return on the SDF:

$$E_t(R^e_{t+1}) = P_{M_{t+1}}(R^e_{t+1})\ . \qquad (4.4)$$

Using

$$E(MR) = E(M)E(R) + \mathrm{cov}(M, R)$$

and defining the regression coefficient $\beta_i = \mathrm{cov}(M, R_i)/\mathrm{var}(M)$ and $\lambda = -\mathrm{var}(M)/E(M)$,
beta pricing is equivalent to (4.1) (deleting time indices):

$$E(R^e_i) = \beta_i\,\lambda\ . \qquad (4.5)$$

The beta is calculated relative to the SDF. In the CAPM, the market return replaces
the SDF in the beta calculation. The risky asset's risk premium is proportional to the
covariance between its returns and the SDF (its systematic risk). All exact factor pricing
models are particular cases of (4.5) where one substitutes a series of factors for the
general SDF. If the asset payoff is uncorrelated with consumption ($\beta_i = 0$ in (4.5)), then
the asset does not pay a risk premium, irrespective of how volatile its returns are. All
equations hold true for an additional small investment. For big asset purchases,
however, portfolio variance can matter a lot. The variance of the payoff will affect - in
equilibrium - via the marginal utility, the SDF and finally the risk premium.
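In a discrete-state model the beta representation (4.5) holds exactly, as a short computation shows; the two-state SDF and payoff below are assumed numbers:

```python
import numpy as np

p = np.array([0.5, 0.5])                   # two equally likely states
M = np.array([1.1, 0.9])                   # SDF: high in the bad state
X = np.array([0.5, 1.6])                   # risky payoff, pays off in good times

Rf = 1 / (p @ M) - 1                       # here E(M) = 1, so Rf = 0
S  = p @ (M * X)                           # price S = E[M X]
Re = X / S - (1 + Rf)                      # excess return

EM     = p @ M
varM   = p @ M**2 - EM**2
covMRe = p @ (M * Re) - EM * (p @ Re)
beta   = covMRe / varM                     # regression coefficient on the SDF
lam    = -varM / EM                        # price of SDF risk
```

The excess return is orthogonal to the SDF, E(M R^e) = 0, and the expected excess return equals beta times lambda exactly; both beta and lambda are negative, so the premium is positive.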

Rewriting equation (4.1), we can define systematic and idiosyncratic risk:

$$S_t = E_t(M_{t+1})\,E_t(X_{t+1}) + \mathrm{cov}_t(M_{t+1}, X_{t+1})\ . \qquad (4.6)$$

Hence, asset prices are equal to an expected discounted cash flow plus a risk premium.
Idiosyncratic risk in this context is by definition the part that is not correlated with the
SDF and hence does not generate any premium.

Example

We reconsider the case $u(c) = c^{1-\gamma}$ with $0 < \gamma < 1$. Inserting the explicit utility
function up to first order into (4.4) we get

$$E_t(R^e_{t+1}) = \beta\lambda \sim \underbrace{\frac{\mathrm{cov}(R^e_{t+1}, \Delta c_{t+1})}{\sigma_t^2(\Delta c_{t+1})}}_{=\beta}\;\underbrace{\gamma\,\sigma_t^2(\Delta c_{t+1})}_{=\lambda} = \gamma\,\mathrm{cov}(R^e_{t+1}, \Delta c_{t+1})\ . \qquad (4.7)$$

If assets covary positively with consumption growth, or equivalently negatively with the
SDF, then they must pay a higher average return. High expected returns are equivalent to
low asset prices. From a risk perspective, the above equations state that average returns
are high if the beta on the SDF or on consumption growth $\Delta c$ is large. This is the above
'bad times - low consumption growth - high SDF - high returns or low asset prices' story.

Using the fundamental equation (4.1) with a risk-free rate and using the approxima-
tion for the SDF we get:

$$S_t = E_t(M_{t+1} X_{t+1}) \sim \frac{E_t(X_{t+1})}{1+R_f} - \gamma\,\mathrm{cov}(X_{t+1}, \Delta c_{t+1})\ . \qquad (4.8)$$

Again, the price is higher if the asset payoff is a good hedge against consumption growth
(negative correlation).

Example

We reconsider the zero-coupon bond pricing problem $S_{t,t+1} = E_t(M_{t+1})$. To value
the bond one needs a model for the discount factor M. The simplest model is the discrete
Vasicek model:

$$\ln M_{t+1} = z_t + \epsilon_{t+1}\ , \qquad z_{t+1} = (1-a)d + a z_t + \epsilon'_{t+1}\ .$$

The second equation states that the interest rate is a mean-reverting process with d the
long-term mean. Using this model in the pricing equation and calculating the expecta-
tions, the yield y for the two-period zero-coupon bond follows:

$$y_{t,t+2} = c_0 + c_1\,y_{t,t+1} + \frac{1}{2}\,\mathrm{cov}(\epsilon_{t+1}, \epsilon'_{t+1})\ .$$

The last term represents the risk premium between the discount factor and the interest
rate shocks. This is the premium which the zero-coupon bond must pay since its payoff
moves up or down with the interest rate.

Example

The price of a forward satisfies:

$$0 = E_t(M_T X_T) = E_t\big(M_T (S_T - f_{t,T})\big)\ .$$

This orthogonality equation states that the forward rate is given by the orthogonal pro-
jection:

$$f_{t,T} = E_t(S_T) + (1+R_f)\,\mathrm{cov}_t(M_T, S_T)\ .$$

The forward price is therefore equal to the expected future spot price at time T plus a
risk premium.
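Numerically, the projection $f = E(M S_T)/E(M)$ and the premium decomposition agree; the SDF, the probabilities and the spot price distribution below are assumed, and the single-period gross rate $1 + R_f = 1/E(M)$ is used:

```python
import numpy as np

p  = np.array([0.25, 0.5, 0.25])
M  = np.array([1.15, 0.95, 0.85])          # SDF: high when the spot is low
ST = np.array([90.0, 100.0, 115.0])        # spot price at delivery T

EM = p @ M
f  = p @ (M * ST) / EM                     # forward from 0 = E[M (S_T - f)]
Rf = 1 / EM - 1
covMS = p @ (M * ST) - EM * (p @ ST)
f_decomposed = p @ ST + (1 + Rf) * covMS   # expected spot plus risk premium
```

Here the covariance is negative, so the forward lies below the expected future spot price.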

What can be said about the sign of the covariance in (4.6)? Since the SDF is an
indicator of bad times but assets pay off well in good times, the covariance between them
is typically negative:

$$S_t < E_t(M_{t+1})\,E_t(X_{t+1})\ . \qquad (4.9)$$

This generates a risk premium and allows risky assets to pay more than the interest rate.
Setting X equal to the stock price S and writing $\tilde{S}_t = S_t/M_t$, (4.9) becomes

$$\tilde{S}_t < E_t(\tilde{S}_{t+1})\ . \qquad (4.10)$$

Investors expect positive gross asset returns. Therefore, the asset price dynamics are not a
martingale under the objective or empirical probability. If asset price dynamics were
a fair coin toss, then returns would not be predictable. On the contrary, to generate risk
premia, asset prices have to be predictable in the statistical sense.

Insurance investments show the opposite behavior to financial assets in equation
(4.6): a financial investment's return is positive in good times and negative in bad
times. In contrast, an insurance investment's return is negative in good times but pays off
well in bad times. The covariance in equation (4.6) is positive. Therefore the value of
the insurance, the left-hand side of (4.9), is larger than the right-hand side.

4.6 State Prices, Risk Neutral Probabilities, SDF


There are different equivalent views on asset pricing. State prices are the traditional view
in financial economics, risk neutral pricing is the approach in derivative pricing, and pro-
jection pricing is a modern approach to formulate the Markowitz, CAPM
and APT models geometrically based on the SDF.

We always assume in this and the next section that there are N risky assets in
a single period with a finite number of states S. This model covers the basic economic
ideas except information generation over time. The S states describe future uncertainty.
$S^j(k)$ is the price of asset j in state k. $R^N$ is the space of portfolios where each
component represents an amount of an asset held. The linear payoff map $P : R^N \to R^S$
associates to a portfolio $\phi$ a payoff $P\phi$. In the simplest market structure each payoff can
be reached by a portfolio given a payoff map. Every risk in the economy can be per-
fectly replicated. But typically, the space of payoffs which can be reached is smaller than
the state space. This smaller vector space is called the asset span $\langle S\rangle \subset R^S$. Proper-
ties of the larger state space are mapped to the smaller span using orthogonal projections.

We impose the weak internal consistency condition of no arbitrage in the market: we
exclude portfolios which make no losses in all future states and gains in some
states in a risky environment. This minimal structure translates into properties of the SDF
(or of any other equivalent formalism below).

Definition 53. Consider a one-period model with S > 1 states at time T, N − 1
risky assets S and a risk-less asset B. The price of asset j at time T in state k is given
by $S^j(k)$. The payoff matrix P is defined by²

$$\mathcal{P} = \begin{pmatrix} B^1(1) & S^2(1) & \cdots & S^N(1) \\ \vdots & \vdots & \ddots & \vdots \\ B^1(S) & S^2(S) & \cdots & S^N(S) \end{pmatrix}\ . \qquad (4.11)$$

A portfolio or a strategy is a vector $\phi = (\phi^1, \ldots, \phi^N)'$.

The matrix P has dimension S × N. The payoff or portfolio value X at time T is

$$X = \mathcal{P}\phi\ . \qquad (4.12)$$

Definition 54. A payoff X is attainable given a market structure P if a portfolio $\phi$
exists such that $X = \mathcal{P}\phi$. The portfolio $\phi$ is called a replication portfolio. The space of
attainable payoffs, the asset span, is denoted $\langle S\rangle \subset R^S$.

Investors are interested in finding $\phi$ given the payoff and the market payoff matrix. The
system $X = \mathcal{P}\phi$ has the solution

$$\phi = \mathcal{P}^+ X + N(\mathcal{P}) \qquad (4.13)$$

with $\mathcal{P}^+$ the Moore-Penrose pseudo inverse³ and $N(\cdot)$ the kernel of the payoff
operator. If P is invertible, the pseudo inverse equals the inverse and the kernel is
² We suppress the time index T for the assets.
³ Let $A \in C^{m\times n}$. A Moore-Penrose pseudo inverse is a matrix $A^+ \in C^{n\times m}$ satisfying: (i) $AA^+A = A$, (ii) $A^+AA^+ = A^+$, (iii) $(AA^+)^* = AA^+$ and (iv) $(A^+A)^* = A^+A$. If A has
linearly independent columns, then $A^*A$ is invertible and
$$A^+ = (A^*A)^{-1}A^*\ ,$$
i.e. $A^+$ is a left inverse: $A^+A = I$. $P = AA^+$ and $Q = A^+A$ are orthogonal projection operators with
the properties $PA = AQ = A$ and $A^+P = QA^+ = A^+$. P is the orthogonal projector onto the range of A
and $(I - P) = (I - AA^+)$ is the orthogonal projector onto the kernel of $A^*$.

trivial. The pseudo inverse matters when the kernel of the payoff matrix is non-trivial:
the inverse then does not exist and the pseudo inverse minimizes $\|\mathcal{P}\phi - X\|_2$, which in
the invertible case is solved exactly by the inverse matrix. If N = S and the payoff matrix
is onto, then the inverse exists. For N = S the sources of randomness can be spanned in
all states by the assets. If there are more states than assets, S > N, the replication
problem generically has no solution. In the case S < N an infinite number of solutions is
the generic case. The number of states and the number of risky assets determine the
market structure and hence the possibility of finding a (unique) replicating portfolio.
The 'perfect' market set-up is defined:

Definition 55. A market with payoff matrix P is complete if each claim X is attainable.

For P invertible, market completeness follows. Given a market with payoff matrix P
and asset price vector $S_0 := (B_0, S_0^1, \ldots, S_0^N)$, arbitrage is defined:

Definition 56. Consider a market with payoff matrix P and asset price vector $S_0$. An
arbitrage is a portfolio $\phi = (\phi^1, \ldots, \phi^N)'$ such that
• the initial portfolio value $V_0 = \langle S_0, \phi\rangle \le 0$,
• $\mathcal{P}\phi \ge 0$,
• and there exists at least one state $\tilde{k}$ where $(\mathcal{P}\phi)(\tilde{k}) > 0$ holds.
An arbitrage is thus a zero-cost portfolio today which ends up with no loss tomorrow
in all states and with the chance of a profit in at least one state. Consider a single
risky asset with future prices $S^+$ and $S^-$ and a risk-free asset with future value B. In a
risky environment $S^- < B < S^+$ is the natural market structure. This in fact implies the
absence of arbitrage. If $B < S^- < S^+$, you borrow a large amount by selling the risk-free
asset and invest the whole amount in the risky one. Whatever the realized future state is,
you will make a certain huge profit. When is a market free of arbitrage? Using state prices
$\psi$, the First Fundamental Theorem of Finance (FFTF) answers the question.

Proposition 57 (First Fundamental Theorem of Finance (FFTF)). There is no arbitrage
opportunity in a finite discrete market model if and only if there exists a vector of state
prices $\psi \in R^S$, $\psi_j > 0$ for all j, such that

$$\mathcal{P}^\top \psi = S_0\ . \qquad (4.14)$$

See Section 7 for the proof. The FFTF states that the price of asset i at time 0 is
given by

$$S_0^i = \sum_{j=1}^{S} \psi_j\,\mathcal{P}_i(j)\ .$$

To interpret state prices, consider a risk-less asset:

$$B(0) =: B_0 = \sum_{j=1}^{S} \psi_j B^1(j) = \sum_{j=1}^{S} \psi_j \times 1 = \sum_{j=1}^{S} \psi_j =: \psi_0$$

since the risk-less asset pays 1 in all possible S states. We define the probabilities in
(0, 1)

$$q_i := \psi_i/\psi_0\ , \quad \forall i\ .$$

The risk-less asset can be rewritten

$$B_0/\psi_0 = \sum_{j=1}^{S} q_j B^1(j) = 1\ .$$

Therefore, $\psi_0$ is the discount on a risk-less borrowing. If r is the risk-less annual interest
rate, we write

$$B_0 = \psi_0 = \frac{1}{(1+r)^T}\ .$$

This implies for all other risky assets:

$$S_0^i = \sum_{j=1}^{S} \psi_j S^i(j) = \frac{1}{(1+r)^T}\sum_{j=1}^{S} q_j S^i(j) = E^Q\left(\frac{S_T^i}{(1+r)^T}\right) = E^Q\big(D(0,T)\,S_T^i\big) \qquad (4.15)$$

with D(0, T) the discount factor. The sum of the $q_i$'s is equal to 1 and $q_i > 0$; the q's are
the risk neutral probabilities. This shows the equivalence of risk neutral probabilities
and state prices:

$$\frac{q_i}{(1+r)^T} \longleftrightarrow \psi_i\ . \qquad (4.16)$$
State prices are the prices of Arrow-Debreu securities e(j), j = 1, ..., S, where the security
e(m) pays 1 CHF if state m is realized and zero otherwise. The First Fundamental Theorem
of Finance implies

$$\begin{pmatrix} 1 \\ e(1) \\ \vdots \\ e(S) \end{pmatrix} = \begin{pmatrix} (1+r)^T & \cdots & (1+r)^T \\ 1 & 0 & \cdots & 0 \\ & \ddots & & \\ 0 & \cdots & 0 & 1 \end{pmatrix} \begin{pmatrix} \psi_1 \\ \vdots \\ \psi_S \end{pmatrix}\ .$$

Hence, $e(j) = \psi_j$ follows. We summarize.

Corollary 58. State-price densities are the prices of the Arrow-Debreu securities. State
price densities and risk neutral probabilities are equivalent.

Each payoff can be written as a linear combination of the Arrow-Debreu securities.
In practice one often prefers to work with risk neutral probabilities instead of state
prices. We state the FFTF in this representation, which given the above equivalence is
evident.

Proposition 59 (First Fundamental Theorem of Finance). There is no arbitrage oppor-
tunity if and only if a risk neutral probability exists.

The absence of arbitrage does not imply that the risk neutral probability is unique.
The Second Fundamental Theorem of Finance considers this question.

Proposition 60 (Second Fundamental Theorem of Finance). Consider an arbitrage-free
market. The risk neutral probability is unique if and only if the market is complete.

Proof. To show that completeness implies uniqueness of the state vector, assume that
there exist two state vectors $\psi_1, \psi_2$ which both solve the equation $\mathcal{P}^\top\psi = S_0$. This implies
$0 = \mathcal{P}^\top(\psi_1 - \psi_2)$, i.e. the vector difference is orthogonal to all attainable payoffs. Since
the market is complete, every payoff is attainable, and the only vector orthogonal to the
whole payoff space is zero, hence $\psi_1 = \psi_2$. The other direction is proven with a similar
argument.

The theorem is equivalent to the uniqueness of the SDF or of the risk neutral probabilities.
We consider examples and start with the following market structure:

$$\mathcal{P} = \begin{pmatrix} 4 & 6 & 2 \\ 12 & 3 & 9 \end{pmatrix}\ , \qquad S_0' = (7, 3, 5)\ .$$

The equation for the state price density reads

$$\mathcal{P}^\top \psi = S_0\ .$$

A solution of the linear system is:

$$\psi' = (1/4, 1/2)\ .$$

The market is free of arbitrage. The existence of a solution is an exception since
there are 3 equations and 2 unknowns. Typically, the payoffs of three non-redundant
securities are conflicting.
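The overdetermined system can be checked with a least-squares solve; this is a sketch, and numpy's `lstsq` returns the exact state prices here because the system happens to be consistent:

```python
import numpy as np

P  = np.array([[4.0, 6.0, 2.0],      # payoffs of the three assets in state 1
               [12.0, 3.0, 9.0]])    # payoffs in state 2
S0 = np.array([7.0, 3.0, 5.0])

# Solve the overdetermined system P^T psi = S0 (3 equations, 2 unknowns).
psi, *_ = np.linalg.lstsq(P.T, S0, rcond=None)
```

Since all components of the solution are strictly positive and all three pricing equations hold, the market is free of arbitrage.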

Consider a market with a risk-less asset with zero interest rate and a risky asset with
3 states:

$$\mathcal{P} = \begin{pmatrix} 1 & 180 \\ 1 & 150 \\ 1 & 120 \end{pmatrix}\ , \qquad S_0 = (1, 150)'\ .$$

This market is incomplete. Solving $\mathcal{P}^\top\psi = S_0$ with $\psi_j > 0$, the set of state prices is
parametrized by:

$$\psi = \{(a, 1-2a, a)\ ,\ a \in (0, 1/2)\}\ .$$

This incomplete market is free of arbitrage within the given parametrization set. Given
the incompleteness, there exist claims $X \notin \langle S\rangle$, i.e. $X \neq \mathcal{P}\phi$ for every portfolio
$\phi$. Consider a call option with payoff (30, 0, 0). This call option is not attainable. Since
$X - \mathcal{P}\phi$ is not zero in all states, hedge risk exists. No arbitrage alone does not lead to a
unique price in this case. A second criterion is needed to enforce uniqueness. There are
many possible criteria. One is that the market chooses the single risk neutral probability
Q which is used for pricing. The derivative price is then fixed by mapping the parametrized
theoretical prices to observed market prices. This approach is used in interest rate
modelling ('inverting the yield curve').
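Both claims - the one-parameter family of state prices and the non-attainability of the call - can be checked numerically; a = 0.2 is an arbitrary point in (0, 1/2):

```python
import numpy as np

P = np.array([[1.0, 180.0],
              [1.0, 150.0],
              [1.0, 120.0]])               # risk-less asset and stock, 3 states
X = np.array([30.0, 0.0, 0.0])             # call payoff with strike 150

# Any state-price vector (a, 1-2a, a), 0 < a < 1/2, prices the two assets:
a = 0.2
psi = np.array([a, 1 - 2 * a, a])

# Best replication in the least-squares sense (pseudo inverse) leaves a
# non-zero residual: the call is not in the asset span, so hedge risk remains.
phi = np.linalg.pinv(P) @ X
residual = X - P @ phi
```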
4.6. STATE PRICES, RISK NEUTRAL PROBABILITIES, SDF 253

We consider an incomplete market with two securities S (risky) and B (risk-less)
with r the interest rate. The risky security S can reach three states: $S^u = S_0 u >
S^m = S_0 m > S^d = S_0 d$ with $S_0$ the initial price and the up/mid/down parameters
u/m/d. This trinomial model is incomplete since the payoff matrix has dimension
3 × 2. Since there is a risk-less asset, the three state prices add up to the
risk-less discount factor:

$$\frac{1}{1+r} = \psi_1 + \psi_2 + \psi_3\ . \qquad (4.17)$$

The second condition, $S_0 = \sum_j \psi_j S^j$, follows from no arbitrage. Since $S^j = S_0 \times x$ with
x = u, m or d, this condition reads:

$$1 = u\psi_1 + m\psi_2 + d\psi_3\ . \qquad (4.18)$$

The two equations (4.17) and (4.18) should determine the three-dimensional state price
vector. The solution of the two equations, which are two planes, is in general a line of
arbitrage-free prices - and not a point as in a complete market. The state price vector is
not unique. Despite the incompleteness the no arbitrage condition is the same as in the
well-known binomial model: there is no arbitrage if and only if

$$d < 1 + r < u\ .$$

The line is bounded by the requirement that state prices are positive. Each vector on
the line segment used to price derivatives leads to arbitrage-free prices. Solving the two
equations, the boundary points of the line segment follow: for $m \ge 1 + r$

$$\psi_1 = \frac{1+r-d}{(1+r)(u-d)}\ , \quad \psi_2 = 0\ , \quad \psi_3 = \frac{u-1-r}{(1+r)(u-d)}\ ,$$

and

$$\psi_1 = 0\ , \quad \psi_2 = \frac{1+r-d}{(1+r)(m-d)}\ , \quad \psi_3 = \frac{m-1-r}{(1+r)(m-d)}\ .$$
A similar corner solution holds for m < 1 + r.

The boundary values do not lead to arbitrage-free prices since some components of
the state price densities are zero. If $m \to d$ or $m \to u$, the trinomial model collapses to
the binomial one with the state prices:

$$\psi_1 = \frac{1}{1+r}\,q\ , \quad \psi_2 = \frac{1}{1+r}\,(1-q)\ , \quad \psi_3 = 0\ .$$
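The boundary state-price vectors can be verified against the bond equation (4.17) and the stock equation (4.18); the parameters r, u, m, d below are assumptions with m ≥ 1 + r:

```python
import numpy as np

r, u, m, d = 0.02, 1.2, 1.05, 0.9          # assumed parameters, m >= 1 + r

# Boundary points of the line segment of admissible state prices:
psi_a = np.array([1 + r - d, 0.0, u - 1 - r]) / ((1 + r) * (u - d))
psi_b = np.array([0.0, 1 + r - d, m - 1 - r]) / ((1 + r) * (m - d))

def prices_bond_and_stock(psi):
    """Left-hand sides of (4.17) and (4.18): bond price and normalized stock price."""
    return psi.sum(), np.array([u, m, d]) @ psi
```

Both boundary vectors reprice the bond at 1/(1 + r) and the (normalized) stock at 1; every convex combination of the two does as well, which is the line segment of arbitrage-free state prices.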

4.6.1 Projection Pricing and SDF Formulation


We showed that pricing of assets can be represented by state prices or by risk-neutral probabilities. We discuss as a third method the pricing kernel representation. It is based on the geometry of Hilbert spaces, and the existence of the pricing kernel follows from the Riesz representation theorem. The ideas are simple: use projections to write any payoff as the sum of a payoff in the asset span and an orthogonal part, span any vector using a basis, and use the Riesz representation, which states that any linear functional on a Hilbert space can be represented by a scalar product, which in our set-up is induced by an expected value. The results in this section follow from the application of these three ideas. We recall that in our context Hilbert spaces are complete vector spaces over the reals where the inner product is induced by the expectation, i.e. ⟨x, y⟩ := E(xy). The main source for this section is LeRoy and Werner (2000). The Riesz theorem states that any linear functional defined on a Hilbert space can be represented by the scalar product with a unique kernel; see Section 7 for a proof.

Theorem 61. (Riesz) Let X be a Hilbert space and p : X → R a linear map. Then there exists a vector r∗ ∈ X, the Riesz kernel, such that

p(x) = ⟨r∗, x⟩

for all x ∈ X.
The pricing functional is the linear functional that maps future payoffs into current prices. In the discrete market set-up, the existence of an SDF, of state prices and of risk-neutral probabilities are equivalent. Since the asset span ⟨S⟩ is, as a subspace of the Hilbert space R^S, a Hilbert space in its own right, the Riesz representation theorem applies to linear functionals defined on the asset span. The expectations functional and the payoff pricing functional turn out to be two such functionals of particular interest. To define them, we first comment on pricing and valuation functionals. A pricing functional ψ is a function from the span to the reals which attributes a price to a payoff, i.e. ψ(x) is the price of the payoff x in the span. If markets are free of arbitrage, then ψ is strictly positive. Taking x to be an Arrow-Debreu state payoff, ψ(x) is a state price. Hence, a pricing functional is a linear combination of the state prices. The pricing functional ψ can be extended to the whole payoff space R^S; the extension is called the valuation functional. If markets are complete, the unique representation ψ(x) = ⟨ψ, x⟩ holds for the valuation (and hence the pricing) functional.

Definition 62. The expectations functional E maps every payoff x in the asset span into its expectation E(x). The payoff pricing functional ψ maps every payoff x in the asset span into its price ψ(x).

By the Riesz representation theorem applied to these two functionals, there exist unique vectors k∗ and M∗ in the asset span such that

E(x) = E(k∗ x)

and

p(x) = E(M∗ x).

M∗ is called the pricing kernel and k∗ the expectation kernel. The vector space generated by M∗ and k∗ is denoted E.
4.6. STATE PRICES, RISK NEUTRAL PROBABILITIES, SDF 255

The construction of the different kernels is straightforward. For the Riesz kernel, let ⟨S⟩ = span(1, 1) ⊂ R², let the probabilities of the expectation in the inner product be (1/4, 3/4), and consider the functional p(s) = 2s1 for s = (s1, s2) ∈ ⟨S⟩. Since (1, 1) is a basis of the span, the Riesz kernel has to be a multiple a(1, 1) of the basis vector, with a ∈ R. Then,

p(1, 1) = 2 × 1 = 2 = E(r∗(1, 1)′) = (a/4) × 1 + (3a/4) × 1 = a,

i.e. a = 2 and the kernel reads r∗ = (2, 2). To calculate the expectation kernel, let x1, . . . , xm be m payoffs, each with S state-contingent components, and let qs, s = 1, . . . , S, be the state probabilities. Then E(xj) = Σs qs xjs and E(k∗ xj) = Σs qs ks xjs. Since the expectation kernel is in the asset span, k∗ = Σv av xv, i.e. the kernel can be spanned by the payoffs with unknown coefficients av. But then

Σs qs xjs = Σv av Σs qs xjs xvs , j = 1, . . . , m,

defines a linear system for the a's. Solving this system and using k∗ = Σv av xv provides the expectation kernel. If there are three states with equal probability and two payoffs x1 = (1, 1, 0) and x2 = (0, 1, 1), then the expectation kernel reads k∗ = (a1, a1 + a2, a2), and 2/3 = E(k∗ xj), j = 1, 2, gives the linear system

2/3 = (1/3)a1 + (1/3)(a1 + a2) , 2/3 = (1/3)(a1 + a2) + (1/3)a2 .

Solving the system, k∗ = (2/3, 4/3, 2/3) follows.
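The small example can be checked numerically. The following sketch solves the Gram system E(xj xv) av = E(xj) for the payoffs x1 = (1, 1, 0) and x2 = (0, 1, 1) of the text:

```python
import numpy as np

# Three equally likely states, two payoffs x1 = (1,1,0), x2 = (0,1,1).
q = np.array([1/3, 1/3, 1/3])
X = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0]])   # rows: payoffs, columns: states

# k* = sum_v a_v x_v must satisfy E(x_j) = E(k* x_j) for every payoff x_j,
# i.e. the Gram system  sum_v E(x_j x_v) a_v = E(x_j).
G = (X * q) @ X.T        # G[j, v] = E(x_j x_v)
b = (X * q).sum(axis=1)  # b[j]   = E(x_j)
a = np.linalg.solve(G, b)
k_star = a @ X           # -> k* = (2/3, 4/3, 2/3), as in the text
```

By construction, E(k∗ xj) = E(xj) holds for every traded payoff xj.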

The next theorem summarizes some basic facts (see Section 7 for a proof).

Theorem 63. 1. If the risk-free payoff is in the asset span, then the expectations kernel is risk-free and equal to one in every state.

2. If the risk-free payoff is not in the asset span, then the expectations kernel is the orthogonal projection of the risk-free payoff on the asset span.

3. The pricing kernel is unique regardless of whether markets are complete or incomplete.

4. Let ψ1, . . . , ψS be the state prices of the S states and qs the corresponding probabilities of the states. Then M∗ is the orthogonal projection of the vector ψ/q on the asset span.

5. For any SDF M, E[(M − M∗)x] = 0 for all x ∈ ⟨S⟩, i.e. M∗ is the projection of M on ⟨S⟩.

6. k∗ = e, the payoff equal to one in every state, if markets are complete.

We apply the representation to mean-variance theory.



Definition 64. The mean-variance frontier is the set M which consists of all payoffs x ∈ ⟨S⟩ such that there exists no other payoff x′ in the asset span with the same expected value, the same price, and a smaller variance.

The next proposition is the main result (see Section 7 for a proof).

Proposition 65. In a discrete, finite-state market, M = E. The expectations kernel and the pricing kernel are collinear iff all portfolios have the same expected return. If the risk-free asset is an element of the asset span, then the expectations kernel and the pricing kernel are collinear iff the expected gross return of each asset equals the gross risk-free return r. Then k∗ = e and M∗ = (1/r) e.

Hence, a payoff is a mean-variance frontier payoff iff it lies in the span of the expectations kernel and the pricing kernel. Since a return is defined as payoff divided by price and the price is given by the valuation functional, we have for x = M∗:

R^{M∗} := M∗/p(M∗) = M∗/E[(M∗)²]

and similarly

R^{k∗} := k∗/p(k∗) = k∗/E[M∗ k∗].

These two returns are frontier returns, i.e. returns on frontier payoffs, or equivalently frontier payoffs with unit price (see Section 7 for a proof).

Proposition 66. Assume that the pricing and expectations kernels are not collinear.

a) The set of frontier returns is the line spanned by the two frontier returns R^{k∗} and R^{M∗}, i.e. for any number λ,

Rλ = R^{k∗} + λ(R^{M∗} − R^{k∗})

is a frontier return.

b) If the expectations kernel is risk-free,

var(Rλ) = λ² var(R^{M∗}).    (4.19)

c) If the risk-free payoff is in the asset span, then the risk-free return is the minimum-variance frontier return. If the risk-free payoff is not in the asset span, then the choice

λ0 = −cov(R^{k∗}, R^{M∗} − R^{k∗}) / var(R^{M∗} − R^{k∗})

defines the minimum-variance frontier return Rλ0.

d) Given any frontier return Rλ different from the minimum-variance frontier return, there exists a zero-covariance frontier return RλC, i.e. cov(Rλ, RλC) = 0.

Using this proposition, we recover beta pricing models. Let Rj be the return of an asset j. Then

Rj = P_E Rj + εj

defines the projection on the space E, with the error term εj orthogonal to this space. Since E is generated by the expectations and pricing kernels, εj is orthogonal to these two kernels and hence has zero expectation and zero price. This implies that the projected return P_E Rj is a frontier return. We span this return in the basis given by a frontier return Rλ and the corresponding zero-covariance return, i.e. for some parameter βj,

Rj = RλC + βj(Rλ − RλC) + εj .

Taking expectations of this equation and the covariance with respect to Rλ, which is uncorrelated with both the zero-covariance return and εj, shows that the beta coefficient is the ordinary regression coefficient of Rj on Rλ. If the risk-free payoff is in the asset span, the beta pricing equation

E(Rj) = Rf + βj(E(Rλ) − Rf)

follows. Since the market return in the CAPM turns out to be a frontier return as well, Rλ can be replaced by the market return. Hence, the SML of the CAPM is a special case of beta pricing. The analysis holds not only for a single asset but also for portfolios.

The beta pricing model with the single factor Rλ generalizes in this geometric set-up in a straightforward way. The span E is replaced by a span F generated by K normalized factors fk, i.e. factors with zero expected value, and the risk-free asset. Projecting an arbitrary payoff xj on the new span and switching from prices to returns, the usual representation

Rj = E(Rj) + Σ_{k=1}^{K} βjk fk + εj    (4.20)

follows, with the betas the factor loadings. As in the proof of the beta pricing representation, if the pricing kernel and the risk-free asset are elements of F, then the exact factor pricing equation

E(Rj) = Rf + Σ_k βjk λk

holds with λk = −E(R^{M∗} fk) Rf.

So far we did not consider any equilibrium analysis in this representation set-up. To do so, consider a two-period economy where agents derive utility from consumption of a single good, the utility function is smooth, individuals are strictly risk averse, there are K factors fj, and the expected error εj in (4.20) conditional on the factors is zero.

Theorem 67. Under the above assumptions, if the risk-free asset, the factors, and the agents' date-1 endowments lie in the asset span, and if the aggregate date-1 endowment lies in the factor span, then exact factor pricing holds in any equilibrium in which the consumption allocation is interior.

We finally relate this representation to the case where the SDF M is linearly related to factor returns, such as for the CAPM

M_{t+1} = a + b R_{M,t+1} ,    (4.21)

with a, b constants. Using this SDF, the CAPM formulation

E(Rj) = Rf + βj,M (E(RM) − Rf)    (4.22)

follows if the parameters a and b are appropriately chosen; similarly for the mean-variance model

M_{t+1} = a + b R_{mv,t+1} ,

where R_{mv,t+1} is any mean-variance efficient return. As for the CAPM, given any R_{mv,t+1} and a risk-free rate, we find an SDF that prices all assets, and vice versa. This shows that the CAPM and the Markowitz model are approximations to the general equilibrium pricing kernel or SDF: the ratio of marginal utilities of consumption at different dates is approximated by affine functions in the market return and the mean-variance return, respectively.

It is worth expressing the relationship between factor models and beta representations in general, since the expression for a risk premium given in (4.5) is of limited practical use because it involves the unobservable SDF. The idea is to start with investable factors and then derive the beta representation, which is equivalent to the SDF approach.

Definition 68. A K-factor model is specified by M = a + b′F, where F is the K-dimensional vector of factors, a is a scalar and b is a vector of coefficients. A factor Fk with a non-zero loading bk is called a pricing factor.

The equivalence between factor models and beta pricing models is given in the next
proposition.

Proposition 69. A scalar a and a vector b exist such that M = a + b′F prices all assets if and only if a scalar κ and a vector λ exist such that the expected return of each asset j is given by

E(Rj) = κ + λ′βj    (4.23)

where

λ = −cov(M, F)/E(M) , κ = 1/E(M) − 1 .

The K × 1 vector βj is the vector of multivariate regression coefficients of the return of asset j on the risk factor vector F.
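The identity (4.23) can be illustrated in a small finite-state sketch with one factor; all numbers below (states, probabilities, factor values, SDF coefficients, payoffs) are assumptions chosen for illustration:

```python
import numpy as np

q = np.array([0.3, 0.4, 0.3])        # assumed state probabilities
F = np.array([-0.10, 0.00, 0.20])    # assumed factor realisations per state
a, b = 0.97, -1.5
M = a + b * F                        # SDF M = a + b F, positive in every state

def E(z):                            # expectation under q
    return q @ z

X = np.array([[1.0, 1.0, 1.0],       # riskless payoff
              [0.8, 1.0, 1.3]])      # a risky payoff
prices = X @ (q * M)                 # p_j = E(M x_j)
R = X / prices[:, None] - 1.0        # net returns

varF = E(F**2) - E(F)**2
lam = -(E(M * F) - E(M) * E(F)) / E(M)   # factor risk premium lambda
kappa = 1.0 / E(M) - 1.0                 # equals the risk-free rate here

for Rj in R:
    beta = (E(F * Rj) - E(F) * E(Rj)) / varF  # regression coefficient of Rj on F
    assert np.isclose(E(Rj), kappa + lam * beta)  # equation (4.23) holds exactly
```

The check passes exactly because (4.23) is an algebraic identity once M = a + b′F prices the assets.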

See Section 7 for the proof. The vector λ is called the vector of factor risk premia. The constant κ is the same for all assets and equals the risk-free rate if such a rate exists. We mentioned above that factors often are given neither as payoffs nor as returns, while the fundamental pricing equation is expressed using payoffs. It is possible to replace a given set of pricing factors by a set of payoffs that carries the same information. The following proposition summarizes:

Proposition 70. Starting with an SDF in the factor model format M = a + b′F, we can always construct a new SDF M∗ = a a∗ + b′F∗, where a∗ is the payoff mimicking the constant payoff and F∗ the vector of factor-mimicking payoffs. These mimicking expressions depend on the original factors and the payoff vector x as follows:

a∗ = E(x)′E(xx′)⁻¹ x , f∗k = E(Fk x)′E(xx′)⁻¹ x , k = 1, . . . , K.

'Mimicking' means that the new SDF components are chosen as close as possible to the original ones within the span of the payoffs. Summarizing, there is no loss of generality in searching for pricing factors among payoffs.
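A sketch of the mimicking construction, again with assumed numbers; it projects the constant and the factor onto the span of the traded payoffs and checks that the mimicking SDF assigns the same price as M to every traded payoff:

```python
import numpy as np

q = np.array([0.3, 0.4, 0.3])        # assumed state probabilities
X = np.array([[1.0, 1.0, 1.0],
              [0.8, 1.0, 1.3]])      # traded payoffs (rows); F is NOT in their span
F = np.array([-0.10, 0.00, 0.20])    # assumed factor realisations
a, b = 0.97, -1.5
M = a + b * F

Exx = (X * q) @ X.T                  # E(x x'), the 2x2 Gram matrix

def mimic(z):
    """Orthogonal projection of the random variable z onto the asset span."""
    w = np.linalg.solve(Exx, (X * q) @ z)   # weights E(xx')^{-1} E(z x)
    return w @ X

a_star = mimic(np.ones(3))           # constant-mimicking payoff
f_star = mimic(F)                    # factor-mimicking payoff
M_star = a * a_star + b * f_star

# M and M* assign the same price to every traded payoff ...
assert np.allclose(X @ (q * M), X @ (q * M_star))
# ... but M* lies in the asset span while M does not.
```

The equality of prices holds because projection preserves inner products with elements of the span.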

Cochrane (2013) distinguishes between pricing factors and priced factors. Consider M = a + b′F and the factor risk premia λ of Proposition 69. The coefficient b in the SDF is the multivariate regression coefficient of the SDF on the factors. Each component of the factor risk premia is proportional to the univariate beta of the SDF with respect to the corresponding factor. If b is non-zero for a given factor, the factor adds value in pricing the assets given all other factors: it is a pricing factor. If the corresponding component of the factor risk premia is non-zero, then the factor is rewarded: it is a priced factor. The two concepts are not equivalent except when all factors are independent.

Summarizing, the three representations - discount factors, mean-variance frontiers, and beta representations - are all equivalent. They all carry the same information, and given one representation, the others can be found. Economists prefer to use discount factors, finance academics prefer the mean-variance language, and practitioners the beta or factor model expressions.

But there is bad news. Factors are related to the consumption data entering the SDF. While multi-factor models try to identify variables that are good indicators of bad versus good times - such as the market return, price/earnings ratios, the level of interest rates, or the value of housing - the performance of these models often varies over time. The overall difficulty is that the construction of the SDF from empirical risk factors is more an art than a science. There is no constructive method that explains which risk factors approximate the SDF reasonably well in all possible future events.

So far we did not consider how to choose risk factors for investment. We discuss some theoretical recommendations for the choice of risk factors. First, factors should explain common time variation in returns. Second, assuming that there exists a risk-free rate rf and M = a + b′F, the definition of the SDF implies for the gross return rk of any asset k:

b′ cov(rk, F) = 1 − E(rk)/(1 + rf) .

For all assets earning an expected return different from the risk-free rate, the vector of covariances between the risk factors and the asset's return must be non-zero. Regressing the returns on the candidate pricing factors, all assets should have a statistically significant loading on at least one factor. This choice recommendation is model independent.

The next recommendation is based on the APT model. APT not only requires that factors explain common variation in returns; the theory suggests that these factors should also explain the time variation in individual returns. This ensures that the payoff, and hence the price, of an asset can be approximated by the payoff of a portfolio of factors. Therefore, the idiosyncratic terms should be as small as possible. Performing a PCA of the return covariance matrix, the components with the largest eigenvalues follow, and hence the main factor candidates.
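A minimal PCA sketch on simulated returns (one assumed common factor plus idiosyncratic noise) illustrates how the largest eigenvalues single out the main factor candidates:

```python
import numpy as np

# Simulated return panel under the assumption of one common factor plus noise.
rng = np.random.default_rng(0)
T, n = 2000, 5
common = rng.normal(size=T)                       # one systematic factor
loadings = np.linspace(0.8, 1.2, n)               # assumed factor loadings
returns = np.outer(common, loadings) + 0.3 * rng.normal(size=(T, n))

C = np.cov(returns, rowvar=False)                 # sample covariance matrix
eigval, eigvec = np.linalg.eigh(C)                # eigenvalues in ascending order
eigval, eigvec = eigval[::-1], eigvec[:, ::-1]    # sort descending

explained = eigval / eigval.sum()
# The first principal component captures the bulk of the common variation
# and is the natural first factor candidate.
assert explained[0] > 0.8
```

With a genuine factor structure, the leading eigenvector is close to the (normalized) loading vector, while the remaining eigenvalues reflect idiosyncratic noise.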

4.6.2 Arbitrage Pricing Theory (APT)


Ross's (1976b) arbitrage pricing theory (APT) starts from the SML and no arbitrage. Like the CAPM, APT assumes that asset prices are based on systematic risk and not total risk. But different from the CAPM, (i) it does not assume that all investors behave alike, i.e. not all investors need to hold the same portfolio, (ii) nor that the tangency portfolio or capital-weighted market portfolio is the only risky asset that investors hold, and (iii) more factors than the single market factor act as risk sources: APT is based on the idea that returns of stock portfolios can be well approximated by linear combinations of the returns of a few basic macroeconomic factors. Specifically, the assumptions underlying the APT are:

• security returns can be described by a linear factor model;

• there are sufficiently many securities available to diversify away any idiosyncratic risk: in a large and diversified portfolio the idiosyncratic risk contributions are negligible due to the law of large numbers, so investors holding such a portfolio require compensation only for the systematic part;

• arbitrage opportunities do not exist.

APT assumes neither an economic equilibrium nor the existence of risk factors driving the opportunity set for investments. While the CAPM and ICAPM represent the SDF as an affine combination of factors, APT decomposes returns into factors. The CAPM explains the risk premia; APT leaves the risk premia unspecified.

Assume that there are K factors Fk with a non-singular covariance matrix CF and N ≫ K returns. Projecting the returns orthogonally on the set generated by the factors plus a constant:

Ri = E(Ri) + cov(F, Ri)′ CF⁻¹ F̃ + εi    (4.24)

with F̃k = Fk − E(Fk) the centred factors and idiosyncratic risks εi satisfying E(εj) = cov(F̃k, εj) = 0 and E(εj εk) = 0 for all j ≠ k. The restriction that the residuals are uncorrelated across assets implies

C = β′ CF β + Cε    (4.25)

where Cε is a diagonal matrix whose non-zero elements are the variances of the idiosyncratic risks, CF is the factor covariance matrix and β is a K × N matrix of betas.

Definition 71. The returns in equation (4.24) have a factor structure with the factors F1, . . . , FK if all residuals are uncorrelated.

To understand APT, assume first that the idiosyncratic risks in (4.24) are zero. We can then derive an exact beta pricing model starting from the fundamental asset pricing equation E(M Ri) = 1. Writing the expectation of the product as the product of expectations plus the covariance term, inserting (4.24) for the return and rearranging implies the beta pricing equation (4.23) of Proposition 69:

E(Rj) = κ + λ′βj .    (4.26)

If the residuals are not zero, we get

E(Rj) = κ + λ′βj − E(M εj)/E(M)    (4.27)

with an additional pricing error term. The idea is that E(M εj) → 0 if we increase the number of uncorrelated assets, see Proposition 24. The analysis requires precise mathematical modelling under the assumption of no arbitrage. The APT theorem states that if there are enough assets, then the beta pricing equation is approximately true for most assets.

Example

Consider two assets with two different factor loadings β1 ≠ β2 but the same factor F. What relationship holds between their expected returns if there is no arbitrage? Let φ be the weight of the first asset in a portfolio and 1 − φ the weight of the second one. The portfolio return reads (setting idiosyncratic risk to zero for simplicity)

RP = (μR,1 + β1 F)φ + (μR,2 + β2 F)(1 − φ) .

Choosing φ = β2/(β2 − β1), the factor risk cancels and the portfolio return becomes

RP = (μR,1 − μR,2)β2/(β2 − β1) + μR,2 .

This is a risk-free portfolio. Therefore, its return must be equal to the risk-free rate μ0. Rearranging,

(μR,1 − μ0)/β1 = (μR,2 − μ0)/β2 = λ .

Since the same expression holds for any asset, the ratios must equal a constant value λ, the factor risk premium: it represents the expected excess return above the risk-free rate per unit of risk (as quantified by F). The two assets have the same factor risk premium; otherwise, arbitrage is possible. This equality can be rewritten as

μR = μ0 + βλ ,    (4.28)

i.e. the exact beta factor relation holds.
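The arbitrage argument can be replayed numerically; the loadings and premia below are assumed values chosen to satisfy (4.28):

```python
import numpy as np

# Assumed numbers: risk-free rate 2%, factor risk premium 4%.
mu0, lam = 0.02, 0.04
beta1, beta2 = 0.5, 1.5
mu1, mu2 = mu0 + beta1 * lam, mu0 + beta2 * lam   # exact beta relation (4.28)

phi = beta2 / (beta2 - beta1)   # weight of asset 1 that removes all factor risk

for F in (-0.3, 0.0, 0.5):      # the hedged portfolio return does not depend on F ...
    RP = phi * (mu1 + beta1 * F) + (1 - phi) * (mu2 + beta2 * F)
    assert np.isclose(RP, mu0)  # ... and equals the risk-free rate

# Both assets carry the same factor risk premium lambda:
assert np.isclose((mu1 - mu0) / beta1, (mu2 - mu0) / beta2)
```

If the expected returns violated (4.28), the same hedged portfolio would earn a riskless return different from μ0, an arbitrage.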

4.6.3 Pricing Real-Estate Risk


We apply the asset pricing theory to real estate risk. This is an important risk source: in the US, the value of real estate owned by households and non-profit organizations in 2017 was USD 23.8 tr. We state some characteristics of real estate risk.

First, the market for real estate is often larger in valuation than the entire stock market. In Switzerland, the value of real estate in 2014 was about 4 to 5 times larger than the value of all companies listed on the SIX exchange. Second, pure real estate risk is illiquid. The annual turnover of privately owned real estate is in the low one-digit percentage domain. Table 4.1 illustrates the illiquidity using data from the state of Zurich in 2011.

Number of houses in the state: 690'000
Of which property: 210'000 (30%)
New constructions in 2011: 11'000 (1.6%)
Of which property: 4'300 (40%)
Arm's-length transactions: 7'110
Resales: 3'700 (1.7%)

Table 4.1: Liquidity for the state of Zurich.

The median holding period of private persons' homes is 25 years. Hence, the construction of a repeated sales index, which would be a transaction-based price index, is not meaningful. Third, one cannot short property. Fourth, historically real estate risk is the most prominent and a frequent driver of financial crises. Fifth, friction costs for direct real estate transactions are high. Sixth, since each property is unique, the construction of a standardized asset which can be aggregated to form an index is not a trivial task. Summarizing, real estate markets are incomplete and inefficient. Hence pricing real estate risk is more an art than a science.

What do we mean by real estate risk? Figure 4.2 provides an overview of investments and consumption in the real estate asset class.

Figure 4.2: Different uses of the real estate asset class (extension of Zürcher Kantonalbank (2015)).

4.6.3.1 US market: Repeated Sales Index versus Constant Quality Index


Case and Shiller (1994) tested the efficiency of the US market for single-family homes. Since resales of houses occur over time periods of decades, the usual tests that work for equities could not be applied. The quarterly published 'Constant Quality Index' produced by the US Census Bureau is compared with the Case and Shiller 'Repeat Sales' price index (Case and Shiller [1987, 1989, 1990]) in Figure 4.3. A constant quality index corrects for different quality characteristics zk, k = 1, . . . , K, such as size of the property, view, shopping facilities, number of bedrooms, location, etc. 'Hedonic' refers to the concept that the value of a home is given by the value of its constituent components. Therefore, hedonic house prices are not inherently skewed by, for example, people shifting to larger properties: without accounting for the change in the characteristic size, property prices would increase due to this change in demand; and if, say, the characteristic 'nearness to the city center' changes due to new traffic connections, then only this characteristic's price changes. With enough data points, a regression model can be used to determine the relationship between each of these characteristics and the value of a home. In the time dummy linear model, a single hedonic regression equation is estimated from data across characteristics for quarterly periods 0, 1, . . . , T. The dummy variable

Figure 4.3: Two indices of US home prices divided by the Consumer Price Index (CPI-U),
both scaled to 1987=100. Monthly observations in the period 1987-2013 are considered
(Shiller [2014]).

takes the value δt if the house is sold in period t ≠ 0, and zero otherwise. For property j with prices S in two adjacent periods t, t + 1, the regression reads⁴

S_j^{t,t+1} = β0 + δ^{t+1} D_j^{t+1} + Σ_{k=1}^{K} βk z_{k,j}^{t,t+1} + ε_j^{t,t+1}    (4.29)

with βk the estimated weights of the characteristics. More general models account for time-varying betas over longer time periods. Hedonic models contain between 20 and 30 different characteristics for private property.
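A minimal sketch of the time-dummy regression (4.29) on synthetic, noiseless data; the two characteristics and all coefficient values are assumptions for illustration, not estimates from real transactions:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
Z = rng.uniform(size=(n, 2))             # two quality characteristics, e.g. size and location score
D = rng.integers(0, 2, size=n)           # time dummy: 1 if sold in period t+1, 0 if sold in t
beta0, delta, betas = 12.0, 0.05, np.array([0.8, 0.3])   # assumed true coefficients
price = beta0 + delta * D + Z @ betas    # noiseless prices for illustration

A = np.column_stack([np.ones(n), D, Z])  # design matrix [1, D, z_1, z_2]
coef, *_ = np.linalg.lstsq(A, price, rcond=None)
# The least-squares fit recovers the constant, the time dummy delta
# (the period price effect) and the characteristic prices.
assert np.allclose(coef, [beta0, delta, *betas])
```

The coefficient on the time dummy, δ, is the constant-quality price change between the two periods, which is exactly what the index tracks.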

Figure 4.3 shows that both indices are smooth over time: for real estate prices, momentum dominates price volatility. The boom in house prices after 2000 is visible in the Case-Shiller index but not in the Census Constant Quality Index. The reason is that new homes are built where it is possible and profitable to build them, which is often not the case in the expensive areas of a city. Therefore, the constant quality index level through time is more accurately determined by simple construction costs if, as in the US, there is a huge reservoir of cheap land.

⁴ See Fisher, Geltner, and Webb (1994), Hansen (2009), Silver (2018) and Shimizu et al. (2010).

4.6.3.2 Constant Quality Index: Greater London and Zurich Area


Figure 4.4 shows the house price indices for the Greater London and Zurich areas. Both indices, the Halifax and the ZWEX, are transaction-based hedonic models which include condominiums and single-family houses.

Figure 4.4: Left panel: the Halifax Greater London price index and the Zurich price index (ZWEX) (ZKB and Lloyds Banking Group). Right panel: Halifax Greater London price index and forwards on the index (Syz and Vanini (2008)).

Figure 4.4, left panel, shows that in the mid-1990s house prices in Zurich and London started to grow at different rates. This is in line with London becoming the world's major financial center. In the GFC, the greater vulnerability of the Halifax index is visible, while during the whole GFC house prices in Zurich never fell.

The right panel shows forwards on the Halifax index at different time points in the GFC period. In May 2007 the forecast was still for an increasing value of house prices: market participants failed to identify the GFC. During the GFC, forward levels of the index corrected sharply in each month. The culmination point was October 2008, where the forward levels were predicted too low but the turning point of the index was identified almost perfectly.

The EMH requires that markets are free of frictions. But in housing markets there are many sources of friction. Figure 4.5 shows friction sources for different types of real estate investments in Switzerland. 'Direct' means that investors buy houses, 'indirect' means investing in housing-related stocks or investment funds, and 'derivative' refers to property derivatives defined on property indices.

Figure 4.5: Frictions for investment in real estate markets in Switzerland. Lex Koller is a federal law which restricts the purchase of property by foreigners (Syz and Vanini [2008]).

4.6.3.3 Investment, Derivatives


Property derivatives on property indices are niche products compared to REITs in the US or property investment funds; property derivative markets never really became established. The first reliable indices which overcame the standardization problem were launched in London in 1994. In the US the market started in 2005, where again OTC products dominated. In 2006 the CME launched futures on the S&P/Case-Shiller Index for residential investment, with very limited success. Derivative instruments allow investors to gain exposure to the real estate asset class without having to buy or sell properties, by replacing the real property with the performance of a real estate return index. The most popular instruments are swaps, in particular total return swaps, while options are much less established.

Consider the case of derivatives on the ZWEX, the residential, hedonic, transaction-based property index for the Zurich area. In 2006, warrants (calls and puts) on the ZWEX were issued to allow investors to protect homeowners' capital against falling future house prices (index mortgages, i.e. an ordinary mortgage plus a put on the real estate index) and to offer leveraged investments at the same time. Consider a homeowner who seeks protection from falling house prices at the end of his 5y fixed mortgage contract. He buys a put option on the ZWEX, see Salvi et al. (2008). The put option should finance possible forced amortizations at maturity of the mortgage if the ZWEX falls. To show the impact on capital protection, consider a present house price of CHF 1 mn and a maximum loan-to-value (LTV) of 80%, i.e. a mortgage notional of CHF 800'000. Suppose that house prices are down 20% after five years. Then an 80% LTV on the new house value of CHF 800'000 means a maximum loan of CHF 640'000. Without a protective put, the homeowner is forced to amortize CHF 160'000. The costs of the put option
are 50 bps p.a. Figure 4.6 shows the effectiveness of the hedge for three different real estate house price evolutions.

Figure 4.6: Effectiveness of the put option hedge for a 5 year mortgage under three different real estate price scenarios (Syz and Vanini (2008)).

How can the bank, as the issuer of the put, hedge its risk? To achieve this, a cross hedge between homeowners seeking protection and investors betting on house prices by buying puts from the trading department applies. To be effective, such a cross hedge requires that the demand on the mortgage side and on the trading side are similar. But while investors buying puts expressed a strong belief that Zurich house prices would drop in the GFC period, homeowners did not share this belief. This disequilibrium led to a failure of the product innovation. One reason was nearness of horizons: the buyer of a put
during or after the GFC has a short time horizon, while the homeowner has a long time horizon in mind, i.e. the maturity of the mortgage contract.
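The amortization arithmetic above can be sketched in a few lines, under the simplifying assumption that the home's value tracks the ZWEX one-for-one (the numbers are those of the text):

```python
# Forced amortization at mortgage maturity, with and without the index put,
# assuming the home's value moves one-for-one with the ZWEX.
house_value = 1_000_000                   # CHF
max_ltv = 0.80
mortgage = max_ltv * house_value          # CHF 800'000

def forced_amortization(index_return, with_put=False):
    new_value = house_value * (1 + index_return)
    shortfall = max(0.0, mortgage - max_ltv * new_value)
    if with_put:
        # Put on the index struck at the initial level, notional = mortgage:
        # its payoff covers the shortfall under the tracking assumption.
        shortfall = max(0.0, shortfall - mortgage * max(0.0, -index_return))
    return shortfall

assert abs(forced_amortization(-0.20) - 160_000) < 1e-6   # the CHF 160'000 of the text
assert forced_amortization(-0.20, with_put=True) < 1e-6   # put payoff covers the shortfall
assert forced_amortization(+0.10) < 1e-6                  # no amortization if prices rise
```

The hypothetical strike and notional are chosen so that the put payoff exactly offsets the forced amortization in the downside scenario.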

A second example of property derivatives are property swaps, i.e. OTC contracts; see Geltner and Miller. Assume that a small firm BUY wants to invest in real estate without facing the high costs and illiquidity of a direct investment. The firm SELL is over-invested in real estate and wants to reduce its real estate market risk exposure. Neither party intends to buy or sell the objects in which they are actually invested, in order to circumvent large transaction costs and to keep the regular income stream from the physical objects. An NCREIF Appreciation Swap ('Swap') allows BUY to swap a fixed return for the NPI appreciation return, i.e. the return of the property index; SELL takes the short position of BUY, pays the floating quarterly NPI appreciation return and receives from BUY quarterly the fixed return. Netting of the payments occurs quarterly and notional amounts are not exchanged.

We price the Swap using a replication portfolio and no arbitrage. We assume that it is possible to replicate the NPI return with a portfolio of assets, that there are no frictions, and that short-selling is possible. Although these assumptions are violated in practice, this pricing defines a benchmark which can be compared to the equilibrium pricing discussed second. The assumptions allow us to construct a riskless hedge using the replicating portfolio and the swap. We consider the periods t, t + 1, t + 2, with It the value level of the NPI, E[y] the expected income of the NPI with y the same random income in each period, and S the unknown fixed leg (spread) of the swap.

              t             t+1                        t+2
Short index   It            −gt+1 It − Et[y] It        −gt+2 It − Et[y] It − It
Riskless ZCB  −Rf It        0                          It
Long swap     0             gt+1 It − S It             gt+2 It − S It
Hedge         (1 − Rf) It   −(S + Et[y]) It            −(S + Et[y]) It

Table 4.2: Riskless hedge of the Swap, long position of BUY. ZCB means zero coupon bond; cash flows are discounted at the risk-free rate Rf and g is the growth rate of the NPI.

The t + 1 and t + 2 parts of the hedge are riskless. Setting the NPV of the hedge equal to zero, which excludes arbitrage, implies for the fixed leg

S = Rf − Et[y].    (4.30)

The fixed spread S is independent of the NPI level; only the borrowing costs of the investor BUY and the expected income stream matter. If we consider a total return swap, i.e. all proceeds from the index are also exchanged, then the expected income Et[y] is part of the index value and S = Rf follows from the same replication approach.

For the equilibrium valuation, we introduce the risk premium

FI = E[RI ] − Rf
4.6. STATE PRICES, RISK NEUTRAL PROBABILITIES, SDF 269

and decompose E[y] = E[R_I] - E[g] with E[g] the real estate appreciation rate. Then
the fixed no-arbitrage spread

S = E[g] - F_I,

is equal to the expected index appreciation rate minus the risk premium. In equilibrium
a no-arbitrage argument is not available since it is not possible to short I_t; we assume
linear pricing rules instead. BUY expects the net return, which consists of the NPI appreciation
return E^BUY(g), minus S, plus the rate R_f received from the covering bond position, to be not
smaller than the swap risk premium:

E^BUY[g] - S + R_f >= R_f + F_I.

SELL also considers his overall net return. It consists of S plus the expected return on
his real estate portfolio E^SELL(R_S), which should be as close as possible to the return of
the NPI, minus the NPI appreciation return E^SELL(g). Since by assumption E[y] is constant
and the NPI swap obligation is covered by the bond portfolio, the net risk exposure is zero.
Summarizing, SELL's requirement is:

S - E^SELL(g) + E^SELL(R_S) >= R_f.

Connecting the two requirements implies the price range:

R_f - E^SELL(R_S) + E^SELL(g) <= S <= E^BUY(g) - F_I.

This is not an equilibrium condition since beliefs can differ. If beliefs are the same for
BUY and SELL and E^SELL(R_S) = E(R_I), then the two inequalities become the equality
S = E[g] - F_I, i.e. the single price of the complete market using no-arbitrage follows.
Assuming that BUY's expectation of g is b bps higher than the market expectation and that
SELL expects g to be s bps, s >= 0, below the market expectation, then S can vary in a
range of width b + s bps around the no-arbitrage spread. Clearly, infeasible beliefs are also
possible. If, say, SELL assumes that market expectations will be lower than his own
expectations, then no S may exist.
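The belief-dependent range for S can be illustrated with hypothetical numbers; all inputs below are assumptions, and E^SELL(R_S) is assumed to track the index return so that SELL's bound reduces to S >= E^SELL(g) - F_I:

```python
# Hedged sketch of the belief-dependent spread range; all inputs are
# hypothetical and E^SELL(R_S) is assumed to track the index return.
rf, FI = 0.03, 0.02            # risk-free rate, index risk premium F_I
E_g = 0.04                     # market-expected appreciation E[g]
b, s = 0.005, 0.003            # BUY above / SELL below market expectation

S_mid = E_g - FI               # single price if beliefs agree
S_upper = (E_g + b) - FI       # BUY's bound:  S <= E^BUY(g)  - F_I
S_lower = (E_g - s) - FI       # SELL's bound: S >= E^SELL(g) - F_I

# The admissible range has width b + s around the no-arbitrage spread:
assert S_lower <= S_mid <= S_upper
```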

4.6.4 Multi-Period Asset Pricing and Multi-Risk-Factor Models


If we consider equity with D = X the dividends, we get over many periods the fundamental
value equation of the dividend discount model of corporate finance, generalizing (4.1):

S_t = E_t [ Sum_{j=1}^infinity D_{t+j} / (1 + R)^j ],                  (4.31)

with R the internal rate of return on expected dividends: for two stocks with the same
expected dividends but different prices, the stock with the lower price has to have the
higher expected return.

Merton's (1973) multi-factor inter-temporal CAPM (ICAPM) generalizes the CAPM
to the case of several factors, assuming:
270 CHAPTER 4. INVESTMENT THEORY SYNTHESIS

• Investors choose an optimal consumption path and an optimal investment portfolio
to maximize their lifetime expected utility.

• Investors care about the risk factors market return R_M and so-called innovations
Y.

• Innovation factors describe changes in the investment opportunity set, which by
definition is equal to the set of all attainable portfolios. Examples are changing volatilities,
changing interest rates, or labour income. Innovations are orthogonal to the asset
space generated by the market return.

In the Markowitz model, the investment opportunity set consists of all efficient and
inefficient portfolios. If the investment opportunity set changes over time, then variables
Y other than the market return drive returns. Working without these factors trivializes
human behavior and needs: using the market return only, for example, all investors are
jobless since no labor income exists. A possible change of the investment opportunity
set is more important for longer-term investment horizons than for shorter
ones, since the deviations from a static opportunity set can become larger over longer time
horizons. The solution of the ICAPM generalizes (4.5) to

E_t(R^e) = b_M lambda_M + b_I lambda_I = Theta cov(R^e, R_M) - Omega cov(R^e, R_I)   (4.32)

where Theta is the average relative risk aversion of all investors and Omega is the average aversion
to innovation risk. Mean excess returns are driven by covariances with the market
portfolio and with each innovation risk factor. The geometric intuition of this beta
pricing model is the same as in the case of fixed opportunity sets. The first term
in (4.32) is mean-variance efficient but the total portfolio is no longer mean-variance
efficient. Economically, the average investor is willing to give up some mean-variance
efficiency for a portfolio that better hedges innovation risk. The mutual fund theorem of
the Markowitz model generalizes to a K + 2 fund theorem if there are K innovation risk
sources: investors split their wealth between the risk-free asset, the tangency portfolio and
K portfolios for innovation risk.
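The two-factor pricing relation (4.32) can be evaluated numerically; Theta, Omega and the covariances below are hypothetical inputs chosen for illustration:

```python
# Illustrative evaluation of the ICAPM pricing relation (4.32); Theta,
# Omega and the covariances are hypothetical inputs.
def icapm_excess_return(theta, omega, cov_m, cov_i):
    """Mean excess return from market and innovation covariances."""
    return theta * cov_m - omega * cov_i

theta, omega = 3.0, 1.5        # average risk / innovation-risk aversion
cov_m, cov_i = 0.02, 0.004     # cov(R^e, R_M), cov(R^e, R_I)
er = icapm_excess_return(theta, omega, cov_m, cov_i)

# An asset that covaries positively with innovations (a hedge demanded
# by the average investor) earns less than its market covariance alone
# would imply:
assert er < theta * cov_m
```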

4.6.5 Low Volatility Strategies


Low-beta stocks outperform high-beta stocks in many empirical studies, and volatility
negatively predicts equity returns (negative leverage effect); see Haugen and Heins (1975),
Ang et al. (2006), Baker et al. (2011), Frazzini and Pedersen (2014), Schneider et al.
(2016). This means that high beta (risk) is not rewarded as it should be according to the
asset pricing equations. These findings define the beta and volatility low risk anomalies.

There are different ways to rationalize these anomalies by enlarging the models which
lead to them. Schneider et al. (2016) show that taking equity return skewness
into consideration rationalizes these anomalies. They generalize the CAPM, which serves as
an approximation, and allow for higher moments of the return distribution. This leads to

skew-adjusted betas. They use the credit worthiness of firms as the source of skewness
in returns: "The higher a firm's credit risk, the more the CAPM overestimates the firm's
market risk, because it ignores the impact of skewness on asset prices" (Schneider et al.
(2016)). Returns benchmarked against the CAPM then appear to be too low since the
CAPM fails to capture the skewness effect. Formally, starting with (4.5), defining the
regression coefficient beta_i = cov(M, R_i)/var(M) and the variable lambda = -var(M)/E(M), we
get the equation equivalent to (4.1)

E_t(R_i^e) = beta_i lambda = - cov(M, R_i)/E(M).                       (4.33)

Schneider (2015), Kraus and Litzenberger (1976) and Harvey and Siddique (2000) define
the risk premium as the difference between the expected value of a derivative X under
the historical probability P and under the risk-neutral probability Q:

Risk Premium = E_t^P(X_T) - E_t^Q(X_T).                                (4.34)

The two probabilities P, Q can be related to each other by the state price density L:5

L = dQ/dP,   E^P(L) = 1.                                               (4.35)

To illustrate the technique, consider two states with probabilities P = (1/2, 1/2) and
Q = (1/3, 2/3). Then L_1 = (1/3)/(1/2) = 2/3 and L_2 = (2/3)/(1/2) = 4/3. Therefore,

E^Q(X) = q_1 X_1 + q_2 X_2 = (1/3) X_1 + (2/3) X_2 = E^P[LX] = p_1 L_1 X_1 + p_2 L_2 X_2.
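The two-state change of measure can be checked numerically; this minimal sketch verifies the identities E^P(L) = 1 and E^Q(X) = E^P(LX), with a hypothetical payoff X:

```python
# Numeric check of the two-state change of measure: L = dQ/dP with
# E^P(L) = 1 and E^Q(X) = E^P(LX); the probabilities are those of the
# text, the payoff X is a hypothetical example.
P = [0.5, 0.5]                             # historical probabilities
Q = [1 / 3, 2 / 3]                         # risk-neutral probabilities
L = [q / p for q, p in zip(Q, P)]          # state price density: [2/3, 4/3]
X = [10.0, 20.0]                           # payoff of a derivative per state

EP_L = sum(p * l for p, l in zip(P, L))              # should equal 1
EQ_X = sum(q * x for q, x in zip(Q, X))              # E^Q(X)
EP_LX = sum(p * l * x for p, l, x in zip(P, L, X))   # E^P(LX)

assert abs(EP_L - 1.0) < 1e-9
assert abs(EQ_X - EP_LX) < 1e-9
```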
Using M = L in (4.33) and the risk premium for the market return, we get:

E_t(R_i^e) = (cov(L, R_i)/cov(L, R_M)) E_t(R_M^e).                     (4.36)

The expected excess return on asset i is proportional to the expected excess return on the
market, scaled by the asset's covariation ratio with the pricing kernel - the true beta. Since
L is not observable, the authors approximate L(R) := E^P(L|R) by a power series in R.6
Using a linear or a quadratic approximation of L in (4.36) turns the true beta into
a CAPM beta (linear case) or a skew-adjusted beta in the quadratic case.
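A discrete sketch of this mechanism under assumed kernel coefficients and a hypothetical three-state market: with a quadratic approximation L(R) = a + bR + cR^2 the 'true beta' of (4.36) is cov(L, R_i)/cov(L, R_M), which falls below the CAPM beta for an asset that does comparably well in extreme states:

```python
# Skew-adjusted vs. CAPM beta under a quadratic kernel approximation;
# states, probabilities and kernel coefficients are hypothetical.
p = [0.25, 0.50, 0.25]                       # state probabilities
RM = [-0.30, 0.05, 0.40]                     # market return per state
Ri = [0.8 * r + 0.5 * r * r for r in RM]     # asset with positive co-skewness

def E(x):
    return sum(pi * xi for pi, xi in zip(p, x))

def cov(x, y):
    return E([u * v for u, v in zip(x, y)]) - E(x) * E(y)

b, c = -2.0, 3.0                             # decreasing, convex kernel terms
a = 1.0 - (b * E(RM) + c * E([r * r for r in RM]))   # normalizes E(L) = 1
L = [a + b * r + c * r * r for r in RM]

beta_capm = cov(RM, Ri) / cov(RM, RM)        # linear approximation of L
beta_true = cov(L, Ri) / cov(L, RM)          # quadratic, skew-adjusted

# The asset performs comparably well in extreme states, so its
# skew-adjusted beta is below its CAPM beta:
assert beta_true < beta_capm
```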

"... a firm's market risk also explicitly depends on how its stock reacts to extreme market
situations ... and whether its reaction is disproportionally strong or weak compared
to the market itself. A firm that performs comparably well ... in such extreme market
situations, has a skew-adjusted beta that is lower relative to its CAPM beta. ... investors
require comparably lower expected equity returns for firms that are less co-skewed with the
market." Schneider et al. (2016)

5 The Radon-Nikodym derivative L (mathematics), the state price density (economics), the likelihood ratio
(econometrics).
6 To achieve this, L is written as an infinite series. The coefficients in the series depend on P, Q, i.e.
the price dynamics of the assets, and on the risk aversion of the investor. Geometrically, the representation
of L is equivalent to orthogonal projections of L onto the space generated by the powers of R.

To incorporate time-varying skewness in stock returns, the authors model corporate
credit risk using the Merton (1974) model. In this model, equity value at the
maturity date is a European call option on the firm value with strike equal to the debt
(a zero-coupon bond). For firms with high credit risk, the increased probability
of default is reflected in a strong negative skew of the return distribution. The forward
value of equity is then given by the expected value of the call option discounted with the
SDF M = L under P. This forward value, together with the call option value, defines the
firm's excess equity return R_i^e. The expected gross return is given by (4.36) with the linear
or quadratic approximation replacing the SDF. For the linear CAPM the betas increase
with credit risk, i.e. with asset volatility or leverage, and with the firm's correlation to the
market. Comparing this beta with the skew-adjusted one, the CAPM beta is in general
larger. The difference increases with credit risk: the stronger the skew, the more the firm
becomes an 'idiosyncratic risk factor' and hence the less connected it is to the market.
In this sense the CAPM approximation overestimates expected equity returns,
i.e. the return anomaly.

Schneider et al. (2016) apply their model implications to low risk anomalies, in particular
the so-called Betting-Against-Beta (BAB) strategy, see Frazzini and Pedersen (2014).
BAB is based on the empirical observation that stocks with low CAPM betas outperform
high-beta stocks. Hence, investors believing in BAB go long a portfolio of low-beta
stocks and short a portfolio of high-beta stocks. To reach an overall zero beta, the
strategy takes a larger long than short position. The strategy is financed with risk-less
borrowing. Frazzini and Pedersen (2014) document that the BAB strategy produces
significant profits across a variety of asset markets. Using empirical evidence from 20
international stock markets, Treasury bond markets, credit markets, and futures markets,
Frazzini and Pedersen (2014) ask:

• How can an unconstrained arbitrageur exploit this effect, i.e., how do you bet against
beta?

• What is the magnitude of this characteristic relative to the size, value, and momentum
effects?

• How is BAB rewarded in different countries and asset classes?

They find that for all asset classes, alphas and Sharpe ratios decline almost monotonically
in beta. Alphas decrease from low-beta to high-beta portfolios for US
equities, international equities, Treasuries, credit indices by maturity, commodities and
foreign exchange rates. Constructing the BAB factors within 20 stock markets, they find
for the US a Sharpe ratio of 0.78 between 1926 and March 2012, which is twice that
of the value effect and still 40% larger than that of momentum. The results for international
assets are similar. They also report that BAB returns are consistent across countries,
time, within deciles sorted by size, and within deciles sorted by idiosyncratic risk, and
are robust to a number of specifications. Hence, coincidence or data mining are unlikely
explanations.

The BAB strategy is rationalized in the model of Schneider et al. (2016) as follows.
The CAPM betas increase, for fixed credit risk (fixed volatilities and leverage), with the
firm's correlation to the market: buy stocks with low and sell stocks with high correlation
to the market. The alpha of this strategy, the excess expected return relative to market
covariance risk, is given by the firm's expected return for the skewness. These typically
positive alphas increase with increasing credit risk. Summarizing, the BAB returns can
be directly related to the return skewness induced by credit risk.
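The zero-beta construction described above can be sketched in a few lines; the betas are hypothetical, and the leverage rule (lever the low-beta leg by 1/beta_L, de-lever the high-beta leg by 1/beta_H) follows Frazzini and Pedersen (2014):

```python
# Minimal sketch of the BAB zero-beta long-short construction; the two
# leg betas are hypothetical values.
beta_low, beta_high = 0.6, 1.5

w_low = 1.0 / beta_low       # leveraged long position in low-beta stocks
w_high = 1.0 / beta_high     # smaller short position in high-beta stocks

# Net market beta of the self-financing long-short portfolio is zero:
net_beta = w_low * beta_low - w_high * beta_high
assert abs(net_beta) < 1e-9

# The long side is larger than the short side and is financed with
# risk-less borrowing, as described in the text:
assert w_low > w_high
```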

4.6.6 What Happens if an Investment Strategy is Known to Everyone?


We follow Asness (2015), who considers the value risk factor - that is to say, bets that
cheap stock investments will beat expensive investments. What happens to a strategy
if it becomes more and more widely known? Intuitively, at the beginning of a strategy
one faces true alpha, which is expected to move towards a beta strategy the more widely
the strategy is used. But even a publicly known strategy can continue working, for different
reasons. First, the investor may be receiving a rational risk premium, i.e. the strategy exists
in equilibrium. If the long (cheaper) stocks are riskier than the short, more
expensive ones on a portfolio level which cannot be diversified away, then it is rational
that there is a persistent risk premium. A second reason is behavioural: investors make
errors from a rational point of view. The long stocks have a higher expected return not
because they are riskier, but because of these errors - the stocks are too cheap and one
earns a return when they return to their rational value.

The relative impact of the two explanations can vary over time. During the tech bubble
of 1999-2000, for example, cheap value stocks - which typically are cheaper because they
are riskier - were cheaper because investors were making errors.

The two explanations behave differently when a strategy becomes known. In the rational
model the value strategy still works, but at a level consistent with the equilibrium
demand and supply side. The equilibrium property conserves both the expected return
and the risk of the strategy.

In the behavioural explanation, the risk source is not systematically linked to the
return in equilibrium. There is no systematic demand and supply, as in the equilibrium
model, to guarantee that the risk premium will not vanish. It is therefore difficult to be
convinced that the risk remains stable over time.

Asness (2015) compares these two different views using historical data and the Sharpe
ratio. If a strategy has an impact on the risk premium once it becomes more common, the
Sharpe ratio is expected to fall: either because the excess return diminishes or because risk
increases. With regard to returns, one could argue that if the value strategy becomes
more popular, then the 'value spread' between the long and short sides of the strategy
gets smaller. This spread measures how cheap the long portfolio is versus the short portfolio.
If more and more investors invest in this strategy, then both sides face a
price movement - the long side is bid up and the short side is bid down. This reduces the
value spread.

The author uses the FF approach for value factor construction. He calculates the
ratio of the book-to-market ratio (BE/ME) of the cheapest one-third over the BE/ME of the most
expensive one-third of large stocks. Clearly, cheaper stocks always have a higher BE/ME
than expensive stocks. But the point is to compare how the ratio of large-cheap over
large-expensive changes over time as an approximation of the attractiveness of the value
strategy. Considering 60 years of data, the ratio is very stable, with a 60-year median
value of 4. There is no downward or upward trend. The only two periods during which
the ratio grew significantly - reaching a value of 10 - correspond to the dot-com bubble
and the oil crisis of 1973. This measurement shows little evidence that the simple value
strategy was arbitraged away over the last 60 years.

To analyze the risk dimension, the annualized, rolling, 60-month realized volatility
of the value strategy over the last 56 years is considered. Again, the dot-com bubble is
the strongest outlier, followed by the GFC and the '73 oil crisis. There is again little
evidence that the volatility of the strategy is steadily rising or falling. The attractiveness
of a strategy is best measured by the in- and outflows of investment in the strategy.
Increasing inflows should, on a longer time scale, lower the return of a strategy, and
the opposite holds if large outflows occur. This was not observed in the above return
analysis.

4.7 Optimal Investment Strategy and Rebalancing


Investors are interested in optimal investment strategies. Merton laid the
foundations in his works from 1969 and 1971 (Merton [1969, 1971]). Rational agents
optimize their lifetime expected utility of consumption by choosing their optimal consumption
path and optimal investment portfolio.

The work of Merton triggered a myriad of academic papers. These papers differ from
one another in many respects, including the innovation risk sources, the agents' preferences
and information asymmetries. Fortunately, for most models the optimal investment
strategy has the same structural form which we considered in equation (2.23). We recall:

• A static strategy (buy and hold) is the choice of a portfolio at initiation without
changing the portfolio weights in the future.

• A rebalancing strategy is a constant proportion trading strategy where the portfolio
proportions invested in the assets do not vary.

• Myopic strategies are strategies that are independent of returns that lie more than
one period ahead.

The optimal strategy φ(t), for time-separable expected utility maximizing models,
consists of two parts:

φ(t) = Short-Term Weight + Opportunistic Weight                        (4.37)

The short-term weight is also called the myopic investment demand, and the opportunistic
weight the hedging demand or long-term weight. This general rule follows from
the 'Principle of Optimality' of R. Bellman, see Section 3.2.1.

4.7.1 General Rebalancing Facts


We write (4.37) in the form (2.23):

φ(t) = MPR × RRA^{-1} + (1 − RRA^{-1}) ∆Y × RIRA^{-1}                  (4.38)

where:

• MPR is the Market Price of Risk: MPR = (α_t − r_t)/σ_t².

• RRA^{-1} is the inverse relative risk aversion - the investor's risk tolerance.

• RIRA^{-1} is the inverse relative innovation risk aversion.

• ∆Y is the hedge of the innovation risk factors.

If the opportunity set is constant, ∆Y = 0, and the optimal investment is equal to the myopic
investment. The myopic component MPR × RRA^{-1} is equal to the optimal solution
of a one-period model. The second component (1 − RRA^{-1}) ∆Y × RIRA^{-1} represents
the desire to hedge against future opportunity changes.

Definition 72. Equation (4.38) defines the theoretical TAA.

The myopic part corresponds to the one-period TAA used in practice, see Section
3.3.4.6; the difference to the theoretical TAA is the harder-to-handle long-term part.
The main reasons for not considering the intertemporal hedging demand are complexity
and uncertainty.

We comment on the basic optimal strategy formula (4.38).

First, the optimal investment strategy is time-varying; in general, portfolio rebalancing
is optimal.

Second, rebalancing is countercyclical. Consider the two-asset case: if an asset's price
increases, its portfolio weight grows, and rebalancing means selling this asset.
Third, even if transaction costs are considered, rebalancing remains optimal; only
the frequency and the strength of the rebalancing change.

Fourth, which of the two components in (4.37) is more important? In the extreme
cases where returns are not predictable, or stochastic opportunities are not changing over
time, or the investor has a logarithmic utility function, the long-term investment
strategy part in (4.37) vanishes and it is optimal to invest myopically. Otherwise, some
studies show evidence that the first component is more important, while other
ones indicate that the second term dominates. Basically, the result depends on the size
of the opportunistic weight, which is driven by two factors: predictability and investment
opportunity variation. The closer asset returns are to being unpredictable and/or the less
stochastic opportunity set variations matter, the lower is the opportunistic component.

Fifth, the parameters in the MPR are in general time-dependent. If the expected
return rises above the risk-free rate or if the risky asset's volatility decreases, invest
more in the risky asset. If the risky asset's expected return is low enough or even negative,
go short the risky asset and use the money raised to invest more than 100 percent of
the capital in the risk-free asset. If there is more than one risky asset, the MPR keeps its
form but the division by the variance is replaced by a multiplication with the information
(inverse covariance) matrix:

MPR = C^{-1}(µ_t − r_t).                                               (4.39)

Comparing this with the solution of the Markowitz problem (3.3), φ = (1/θ) C^{-1} µ, where
there is no risk-free asset, shows that the first component of the optimal investment
strategy (4.37) defines a mean-variance efficient portfolio. The MPR is proportional
to the Sharpe ratio. In this sense, portfolio theory following the seminal work of Merton
without innovation risk factors rationalizes the Sharpe ratio and extends the Markowitz model
to many-period investing.

Sixth, the inverse relative risk aversion measures the curvature of the utility function
as a function of wealth: for an investor with logarithmic utility, RRA^{-1} = 1. The more risk averse
an investor is, the lower RRA^{-1} is and the more is optimally invested in the risk-free
asset. The notion of relative risk aversion raises two delicate issues. First, there is a
calibration result by Rabin (2000) which shows that expected-utility theory is an utterly
implausible explanation for appreciable risk aversion over modest stakes. That is,
explaining risk aversion in expected utility solely by the curvature of the utility function
leads to unreasonable results. Second, the measurement of RRA is, in itself, a
delicate matter.

Seventh, the opportunistic weight consists of three different terms. First, the expression
1 − RRA^{-1} is meant literally in the sense that in some models, if the investor becomes
more risk averse, so that RRA^{-1} decreases, the myopic component in the optimal portfolio
becomes less important whereas the long-term or opportunistic weight is attributed more
weight. Second, the aversion to innovation risk sources. Third, a hedging demand against
innovation risk. This is proportional to cov(R^e, R_I) in (4.37) - that is to say, the hedging
demand follows from the correlation pattern of the innovation's portfolio return with
the overall portfolio return. Investors will increase their holding of the risky asset given
by the first term if it covaries negatively with state variables that matter in the investor's
value function. A bond is such a hedge against falling interest rates.
Eighth, if liabilities matter, such as in goal-based investment, then in both expressions
in (4.38) functions of time differences f(T − t) enter, where T is the realization time of a
liability. These functions take into account the 'way to go' effect. Given a positive drift
and an actual financing degree, it is for example optimal to take more risk if there are
5 years left to finance a liability than if only one month is left until the maturity of the
liability.
To see how the optimal investment formula can fail to apply in reality, consider the
Great Financial Crisis (GFC). Pick an investor with a relative risk aversion of 2, a normal
market return of 6% in stocks, a risk-free rate of 2% and a volatility of 18%. The investor
assumes that returns are IID; he is a myopic investor. The optimal portfolio formula
(4.38) gives φ = (0.06 − 0.02)/(2 × 0.18²) ≈ 0.6. That means the investor holds 60% in equities and 40% in a
risk-less asset. In the GFC, volatility (both realized and implied) increased to levels
around 70%. The optimal myopic formula then implies φ ≈ 0.04, i.e. a 4% equity position or
a reduction by 93% from the pre-crisis investment. But stock market participation was
not reduced by 93%. Since the average investor holds the market, he did not show the
same behavior as our theoretical investor.
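The example can be reproduced with the myopic formula φ = (µ − r)/(RRA · σ²); the 60% and 4% figures are consistent with a relative risk aversion of 2:

```python
# Myopic weight phi = (mu - r) / (RRA * sigma^2); a relative risk
# aversion of 2 reproduces the 60% / 4% figures of the GFC example.
def myopic_weight(mu, r, sigma, rra):
    return (mu - r) / (rra * sigma ** 2)

pre = myopic_weight(0.06, 0.02, 0.18, 2.0)   # roughly 0.62: ~60% equities
gfc = myopic_weight(0.06, 0.02, 0.70, 2.0)   # roughly 0.04:  4% equities
reduction = 1.0 - gfc / pre                  # about 93%
```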

4.7.2 Convex and Concave Strategies


We compare three well-known strategies:

• Do nothing (buy-and-hold) [In (4.38) all parameters are constant];

• Buy falling stocks, sell rising ones (constant-mix strategies) [Contrarian view to
the myopic part in (4.38)];

• Sell falling stocks, buy rising ones (portfolio insurance strategies) [In-line with the
myopic part of (4.38)].

We follow Perold and Sharpe (1988) and Dangel et al. (2015). They consider buy-and-hold,
constant mix (say 60/40 strategies), constant-proportion portfolio insurance and
option-based portfolio insurance. We focus first on the first two strategies with a risky
asset S and a risk-free asset B. In the payoff diagrams the value of the assets is shown as a
function of the value of the stock, and in the exposure diagrams the relation of dollars invested
in stocks to total assets is calculated.

The payoff diagram for the 60/40 buy-and-hold rule is a straight line with a slope of 0.6;
the maximum loss is 60% of the initial investment and the upside is unlimited, see Figure 4.7.
The exposure diagram is also a straight line in the space relating dollars invested in stocks
to the value of the assets. For a buy-and-hold strategy, the slope is 1 and the line
intersects the axis of the value of the assets at USD 40: if the portfolio is worth less than
USD 40, the stock investment is zero. For a constant-mix strategy, the slope of the
exposure profile is 0.6 and the line passes through the origin. Hence, an investor
with a constant-mix strategy holds stocks at all wealth levels.

If there is no volatility in the market, stocks either rise or fall forever. Then the buy-and-hold
payoff always dominates the constant-mix portfolio. But with volatile markets,
the success of the strategy depends on the paths of asset prices. Since rebalancing is the
same as a short volatility strategy, see Section 2.3.6.3, a constant-mix portfolio tends to
be the superior strategy if markets show reversal behavior instead of trends. If trends
dominate, buy-and-hold is superior.
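The path dependence can be illustrated with a small simulation on two stylized price paths; the prices are hypothetical and the bond return is set to zero for simplicity:

```python
# Buy-and-hold vs. 60/40 constant-mix rebalancing on two stylized
# stock paths (hypothetical prices, bond return zero).
def terminal_value(prices, rebalance, w=0.6, v0=100.0):
    stock, bond = w * v0, (1 - w) * v0
    for p0, p1 in zip(prices, prices[1:]):
        stock *= p1 / p0
        if rebalance:                      # restore the 60/40 mix each step
            total = stock + bond
            stock, bond = w * total, (1 - w) * total
    return stock + bond

trend = [100, 110, 121, 133.1]             # steady up-trend
reversal = [100, 80, 100, 80, 100]         # oscillating, ends flat

# Trends favor buy-and-hold, reversals favor the constant mix:
assert terminal_value(trend, False) > terminal_value(trend, True)
assert terminal_value(reversal, True) > terminal_value(reversal, False)
```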

Figure 4.7: Payoff and exposure diagrams for constant-mix and buy-and-hold strategies
(adapted from Perold and Sharpe [1988]). The left panels show the payoff diagram for
the 60/40 buy-and-hold strategy and the exposure diagrams for the 60/40 strategy, once
buy-and-hold and once dynamic, that is assuming a constant mix. The upper right panel shows
the superiority of the buy-and-hold strategy when there are only trends, and the lower
diagram shows that the constant-mix strategy can dominate the buy-and-hold one if there
is volatility, depending on the stock price path, which is represented by the thickness of
the asset value line.

This shows that the performance of rebalancing depends on the investment environment:
different economic and financial market periods lead to different results. Ang
(2013) compares the period 1926-1940 with the period 1990-2011. He compares buy-and-hold
investments in US equities and US Treasury bonds, and pure investments in
the two asset classes, with the rebalanced (60/40) investment portfolio in the two assets.
The countercyclical behavior of rebalancing smoothes the individual asset returns. It
leads to much lower losses after the stock market crash in 1929, but it was not able to
follow the strong stock markets before the crash compared to the static strategy. The
rebalancing strategy also leads to a much less volatile performance than the single equity or
bond strategy.

We consider the third alternative - portfolio insurance. Maximizing expected utility
with constant absolute risk aversion implies that optimal static sharing rules are linear
in the investment's payoff: it is optimal to hold a certain fraction of a risky investment
rather than negotiating contracts with nonlinear payoffs. This also holds in some dynamic
models such as the Merton (1971) model. If investment opportunity sets are not
changing, the proportions of risky and risk-free assets are kept unchanged over time. But
this requires portfolio rebalancing: buying the risky asset when it decreases in value and
selling it when prices increase - the constant-mix strategy. Theoretically, with this strategy
investors invest in risky assets even in market stress situations. In practice, however,
there is a strong demand for portfolio insurance since investors have a considerable
downside-risk aversion. Therefore, a rebalancing method 'opposite' to the constant mix
is required: selling stocks as they fall.

Returning to the three alternatives - do nothing, buy (sell) stocks as they fall (rise),
sell (buy) stocks as they fall (rise) - the payoffs of the strategies are linear, concave or
convex, respectively. The last strategy is called convex since the payoff function increases at an
increasing rate as the stock value rises. Hence, the kind of rebalancing has an impact
on the payoff without making reference to a specific decision rule. Concave strategies,
such as constant-mix strategies, are the mirror image of convex strategies such as
portfolio insurance. The buyer of one strategy is also the seller of the other one.

Summarizing, buying stocks as they fall leads to concave payoffs. These are good
strategies in markets with no clear trend, since the principle 'buy low, sell high' applies.
In markets under stress, losses are aggravated since more and more assets are bought.
The convex payoff of portfolio insurance strategies limits the losses in stressed markets
while keeping the upside intact. But if markets oscillate, their performance is poor.

There are many ways to construct convex payoffs:

• Stop-loss strategies. The investor sets a minimum wealth target or floor that must
be exceeded by the portfolio value at the investment horizon. This strategy is
simple, but once the loss is triggered the portfolio is no longer invested in the
risky asset and hence participation in a recovery of the risky asset is not possible.

• In the option-based approach one buys a protective put option. While simple,
this strategy has several drawbacks. First, it acts against many investors' behavior
that one should buy portfolio insurance when it is cheap - when stock markets boom.
Second, buying an at-the-money option is expensive compared to the expected
risky asset return, and since one has to roll the strategy, the costs multiply. Therefore,
such option-based strategies are often used in long-short combinations (buying an
out-of-the-money put and selling an out-of-the-money call).

• Constant Proportion Portfolio Insurance (CPPI). This strategy is a simpler version
of the protective put strategy.
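A minimal CPPI sketch under an assumed multiplier and floor: the exposure to the risky asset is m times the cushion (portfolio value minus floor), rebalanced at each step; all numbers are hypothetical:

```python
# CPPI sketch: risky exposure = m * (portfolio value - floor), with an
# assumed multiplier m and floor; prices below are hypothetical.
def cppi(prices, floor=80.0, m=3.0, v0=100.0, rf=0.0):
    v = v0
    for p0, p1 in zip(prices, prices[1:]):
        cushion = max(v - floor, 0.0)
        risky = min(m * cushion, v)        # cap the exposure at total wealth
        v = risky * (p1 / p0) + (v - risky) * (1 + rf)
    return v

crash = [100, 90, 80, 70, 60]              # hypothetical falling market
v_end = cppi(crash)
# With these step sizes the floor is preserved while the stock loses 40%
# (with large discrete jumps the floor can in general be breached):
assert v_end >= 80.0
```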

4.8 Short-Term versus Long-Term Investment Horizons


This section is based on Campbell and Viceira (2002). The theoretical set-up allows us
to discuss relevant practical questions and observations:

• Financial planners often recommend that investors with a longer investment horizon
take more risk than in the case of a short time horizon.

• Conservative investors are advised to hold more bonds relative to stocks than
aggressive investors. This contrasts with the constant bond-stock ratio in the tangency
portfolio of the CAPM. This is the asset allocation puzzle.

• The judgement of risk may differ for long-term and short-term investors. Cash,
which is considered risk-free in the short term, becomes risky in the longer term
since it must be reinvested at an uncertain level of real interest rates.

4.8.1 Time-Varying Investment Opportunities


When investment opportunities vary, optimal long-term portfolio choice is dierent from
myopic portfolio choice. Investment opportunities can vary because market factors do so
(interest rates, volatility, risk premia) or because non-market factors vary (labor income).

We consider the case of time-varying short-term interest rates. An investor with


constant relative risk aversion maximizes his consumption paths by investing in a single
risky equity asset and a risky short-term rate asset. The optimal investment in the risky
asset in (4.37) reads

    φ(t) = (µt − rt)/σt² · RRA⁻¹ + (1 − RRA⁻¹) · cov(It+1, −Et(It+1))/σt²        (4.40)

with It+1 the short-term interest rate at time t + 1.
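A numerical reading of the decomposition in (4.40) into a myopic and an inter-temporal hedging component; all parameter values below (µ, r, σ, RRA and the covariance term) are hypothetical and chosen only for illustration.

```python
# Decomposition of the optimal risky-asset weight into a myopic term and an
# inter-temporal hedging term; all parameter values are hypothetical.
mu, r = 0.06, 0.02        # expected risky return and short rate
sigma = 0.18              # volatility of the risky asset
RRA = 5.0                 # relative risk aversion
hedge_cov = 0.004         # the covariance term in the hedging demand

myopic = (mu - r) / (RRA * sigma**2)            # short-term risk premium capture
hedging = (1 - 1 / RRA) * hedge_cov / sigma**2  # hedge against rate changes
phi = myopic + hedging
print(f"myopic {myopic:.3f} + hedging {hedging:.3f} = {phi:.3f}")
```

As RRA grows, the myopic term vanishes while the hedging term approaches cov/σ², matching the conservative-investor limit discussed in the text.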

If the interest rate return is IID, then the optimal strategy is the myopic one, i.e., the
second term in (4.40) is zero. Assume returns are not IID. If the investor becomes more
risk averse, RRA⁻¹ tends to zero. A conservative investor will then not invest in the risky
asset to capture its short-term risk premium but will rather fully hedge the future risk
of the risky asset.

Hence, short-term market funds are not a risk-less asset for a long-term investor. Camp-
bell and Viceira (2002) show that the risk-less asset is in this case an inflation-indexed
perpetuity or consol. Note that all results take an individual investor's viewpoint but
not an equilibrium one. Hence, possible equilibrium feedback effects on asset prices and
returns are missing.

Predictable asset returns lead to a hedging demand. If equity is predictable, there will
be an inter-temporal hedging demand for stocks. Campbell and Viceira (2002) consider
a model where long-term investors face a time-varying opportunity set due to changing
interest rates or changing equity risk premia. A striking result is that a conservative
investor will hold stocks even if the expected excess return of the stock is negative. How
is he compensated for doing so? We first assume that the covariance between risky asset
returns at two consecutive future dates is negative. This captures that equity returns
are mean-reverting: an unexpectedly high return today reduces expected returns in the
future. This describes how the investment opportunities related to equity vary over time.
If the average expected return is positive, the investor will typically be long stocks.
Given the negative correlation, a high stock return today implies low future returns,
and hence the investment opportunity set deteriorates. The conservative investor
wants to hedge this deterioration. Stocks are precisely an asset that delivers increasing
wealth when investment opportunities are poor. Figure 4.8 illustrates, for a conservative
investor, three alternative portfolio rules.

Figure 4.8: Portfolio allocation to stocks for a long-term investor, a myopic investor, and
for a CIO choosing the TAA (Campbell and Viceira [2002]).

The horizontal line represents the optimal investment rule if the expected excess
stock return is constant and equal to the unconditional average expected excess stock
return. The TAA is the optimal strategy for an investor who observes, in each period,
the conditional expected stock return. The myopic strategy and TAA cross at the point
at which the conditional and unconditional returns are the same. The TAA-investor
is a myopic investor with a one-period horizon. The SAA line represents the optimal
investment of a long-term investor. There is a positive demand for stocks even if the
expected return is negative. This reveals that the whole discussion in this section can
be seen as describing the structure of strategic asset allocation (SAA). In fact, Formula
(4.37) can be transformed as follows:

φ(t) = Short-Term Weight + Opportunistic Weight

= Short-Term Weight - Long Run Myopic Weight

+ Long Run Myopic Weight + Opportunistic Weight (4.41)

The long-term investor should hold long-term, inflation-indexed bonds and increase
the average allocation to equities in response to the mean-reverting stock returns (time-
varying investment opportunities). Empirical tests suggest that the response to changing
investment opportunities occurs with a higher frequency for stocks than for the interest
rate risk factor. Therefore, this long-term weight or SAA should be periodically reviewed
and the weights should be reset.

4.8.2 Model Portfolios


Whether or not investors use long-term investments as described in the last sections
depends on constraints taken from WEF (2011):

1. Liability profile - the degree to which the investor must service short-term obliga-
tions, such as upcoming payments to beneficiaries.

2. Investment beliefs - whether the institution believes long-term investing can produce
superior returns.

3. Risk appetite - the ability and willingness of the institution to accept potentially
sizable losses.

4. Decision-making structure - the ability of the investment team and trustees to exe-
cute a long-term investment strategy.
Comparing this with the optimal investment formula (4.38), point 3 is captured by risk
aversion, point 2 defines the asset universe selection of the model, and point 1 is part of
the utility function.

The WEF (2011) report considers the question of who the long-term investors are. It
builds the following five categories: family offices with USD 1.2 trillion AuM, endowments
or foundations with USD 1.3 trillion AuM, SWFs with USD 3.1 trillion AuM, DB pension
funds with USD 11 trillion AuM, and insurers with USD 11 trillion AuM. Matching
these different types of investors to the four constraints listed above leads to the following
long-term investment table (source: WEF (2011) and the many sources cited therein):

Investor            Liability constraint   Risk appetite   Decision   Estimated

Family offices      In perpetuity          High            Low        35%
Endowments          In perpetuity          High            Low        20%
SWFs                In perpetuity          Moderate        Moderate   10%
DB pension funds    D 2-15 yrs             Low             High       9%
Insurers            D 5-15 yrs             Low             High       4%

Table 4.3: Decision represents the decision-making structure, D the average duration,
and Estimated the estimated allocation to illiquid investments (WEF [2011]).

The following model portfolio construction of Ang et al. (2018) provides a prac-
titioner's approach to long- and short-term investment in asset classes.

Their model portfolios are parametrized by investor preferences such as risk toler-
ance and the selection of the asset universe. The construction combines three portfo-
lios: a performance benchmark reflecting the investor's risk appetite, a strategic model
constructed relative to the benchmark which reflects long-term views on the market,
and finally a tactical model portfolio mimicking short-term views.

The benchmark portfolio φB is a fixed equity-bond portfolio, say 80/20. The chosen
fraction mimics the risk tolerance of the investor. Such benchmarks can be implemented
at low cost and the performance of more complicated portfolios can be measured against
them without difficulty. The strategic portfolio φS is the solution of a mean-variance
optimization problem relative to φB where both the risk aversion and the covariance
matrix are long-term parameters. Several constraints are used, such as equalizing the
equity components of the strategic and the benchmark portfolio, long-only, full
investment and many more. The short-term or tactical portfolio φT also solves a
mean-variance problem, where short-term expected returns and a short-term covariance
matrix CS enter. The two main constraints are ⟨φT, e⟩ = 0, i.e., it is a zero-dollar
long-short portfolio which shapes the strategic allocation, and ⟨φT, CS φT⟩ = 1, i.e., the
short-term risk aversion follows from this constraint. This short-term portfolio is
weighted by market signals, implying the so-called long-short combined portfolio
φC = ⟨w, φT⟩ where the weights wi add up to one. Adding φC + φS defines the target
portfolio φ∗. Finally, the model portfolio φM is the portfolio which minimizes the
tracking variance ⟨φ − φ∗, CS(φ − φ∗)⟩ subject to the full-investment constraint
and linear constraints for the asset classes.
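The final step, minimizing the variance of the deviation from the target φ∗ under the full-investment constraint, has a closed form when only the budget constraint binds. A sketch with a hypothetical 3-asset covariance matrix and target weights (the numbers are illustrative, not from Ang et al.):

```python
import numpy as np

# Minimize the tracking variance (phi - target)' C (phi - target) subject to the
# full-investment constraint e' phi = 1. The Lagrangian gives the closed form
#   phi = target + C^{-1} e (1 - e' target) / (e' C^{-1} e).
C = np.array([[0.04, 0.01, 0.00],
              [0.01, 0.02, 0.00],
              [0.00, 0.00, 0.01]])       # hypothetical short-term covariance
target = np.array([0.55, 0.30, 0.10])   # hypothetical target phi*, sums to 0.95
e = np.ones(3)
Cinv_e = np.linalg.solve(C, e)
phi_M = target + Cinv_e * (1 - e @ target) / (e @ Cinv_e)
print(phi_M, phi_M.sum())
```

The budget gap (here 5%) is spread across assets in proportion to C⁻¹e, i.e., low-risk assets absorb more of the adjustment; additional linear asset-class constraints would require a numerical solver instead.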

The authors use liquid ETFs to implement the approach. Besides the broad equity
and bond ETFs which enter the benchmark portfolio, they use ETFs for the styles
momentum, minimum volatility, value, quality and size. These more complicated indices
are added in the strategic portfolio and the tactical portfolio. Figure 4.9 illustrates their
model portfolio construction.

Figure 4.9: Tactical U.S. equity model portfolio construction process. White MSCI USA
Index, Red MSCI USA Minimum Volatility Index, Blue MSCI USA Momentum Index,
Green MSCI Risk Weighted Index, Light Blue MSCI Value Index. (Ang et al. (2018)).

The figure shows that the starting point is to choose a performance benchmark reflecting
the preferences, i.e., blending the MSCI USA Index with the MSCI USA Minimum
Volatility Index. Active risk of 250 bps relative to the performance benchmark is added
in the strategic model portfolio. For the model portfolio capturing short-term tactical
views on the chosen styles, the factors are re-weighted relative to the strategic portfolio.
This adds an additional average of 110 bps of active risk, so that the full amount of
active risk relative to the performance benchmark is approximately 300 bps. The model
is tested using data from Jan 2000 to Jun 2017. Note that not all styles existed for the
whole period, i.e., the index values are then theoretically calculated. The final model
portfolio generated an annual return of 8.9%, outperforming the performance benchmark
by 3.4% per year. The outperformance is attributed to two sources. First, the strategic
portfolio tilts toward factors which possess inherent and persistent risk premia. Second,
the short-term indicators have some ability to predict factor returns. These time-varying
active positions versus the strategic benchmark generate excess return.

Comparing this approach with the general optimal decision-making formula, the two
components of long term and short term are present. The way they enter the
final model portfolio is a multi-stage process which consists of several plausible particular
optimizations. Why the whole approach should be optimal at all is, however, not
considered. Furthermore, since one-period decisions are made in each model type, risks
are not distributed over time in an optimal way but split into a kind of static long-term
part and a varying short-term allocation.

4.8.3 Fallacies in Long Term Investment


When asset returns are IID, the variance of a cumulative risky return is proportional to
the time horizon implying that the standard deviation is proportional to the square root
of the time horizon (the square-root rule). Since the Sharpe ratio uses standard devia-
tion, the ratio grows with the square-root of the time horizon. It is therefore tempting
to increase the investment time horizon to increase the Sharpe ratio. This is a pseudo
risk-return improvement since Sharpe ratios must always be measured over the same time
interval.
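The square-root rule and the resulting pseudo improvement can be seen directly; the annual excess return and volatility below are hypothetical round numbers:

```python
import numpy as np

# Under IID returns, the cumulative mean grows like T and the standard deviation
# like sqrt(T), so a Sharpe ratio measured over T years grows like sqrt(T).
mu_ex, sigma = 0.05, 0.20   # annual excess return and volatility (hypothetical)
for T in (1, 4, 16):
    sharpe_T = mu_ex * T / (sigma * np.sqrt(T))
    print(f"T = {T:2d} years: Sharpe = {sharpe_T:.2f}")
```

The ratio quadruples from the 1-year to the 16-year horizon purely by construction; comparing the two numbers says nothing about risk-return quality.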

Are equities less risky than bonds in the long run? Siegel states (Siegel [1994]):

It is widely known that stock returns, on average, exceed bonds in the long run. But it
is little known that in the long run, the risks in stocks are less than those found in bonds
or even bills! [...] But as the horizon increases, the range of stock returns narrows far
more quickly than for fixed-income assets [...] Stocks, in contrast to bonds or bills, have
never offered investors a negative real holding period return yield over 20 years or more.
Although it might appear riskier to hold stocks than bonds, precisely the opposite is true:
the safest long-term investment has clearly been stocks, not bonds.

Using the standard deviation, Siegel advises that long-term investors should buy and
hold equities due to the reduced risk of stock returns at long maturities. But such a risk
reduction only holds if stock returns are mean-reverting, i.e., if returns are not IID. And
we showed that a long-term buy-and-hold strategy is then not optimal. The optimal
strategy is a strategic market-timing strategy with a mixture of myopic and hedging
demand parts. If one follows Siegel's advice, the buy-and-hold investment strategy is not
optimal. The other logical direction also holds: an optimal long-term investment strategy
does not produce the portfolio weights suggested by Siegel.
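The narrowing of long-horizon stock return ranges requires mean reversion; a small simulation makes the point. The AR(1) dynamics and the parameter values are illustrative assumptions, not Siegel's data:

```python
import numpy as np

def per_period_variance(phi_ar, T, n_paths=20000, seed=1):
    """Variance of the T-period cumulative return divided by T, when one-period
    returns follow an AR(1): r_t = phi_ar * r_{t-1} + eps_t, eps_t ~ N(0,1)."""
    rng = np.random.default_rng(seed)
    r = np.zeros(n_paths)
    cum = np.zeros(n_paths)
    for _ in range(T):
        r = phi_ar * r + rng.standard_normal(n_paths)
        cum += r
    return cum.var() / T

v_iid = per_period_variance(0.0, 20)    # IID benchmark: roughly 1 per period
v_mr = per_period_variance(-0.3, 20)    # mean reversion: below the IID level
print(round(v_iid, 2), round(v_mr, 2))
```

With negative autocorrelation the per-period variance of cumulative returns falls below the IID benchmark; this is exactly the risk reduction a buy-and-hold reading of Siegel relies on, and it disappears when returns are IID.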
The herding of pension funds. Pension funds consider, by their very definition,
an infinite time horizon in their investments since each year there are new entrants to
the pension scheme. As long-term investors, one would expect pension funds to focus
on their long-term investment strategies. They should therefore behave differently than
typical short-term asset-only managers. But there is a different investment motivation,
which may counteract long-term investment behavior: the fear of underperforming rela-
tive to their peer group, which defines such funds' incentive to herd.

Such herding may be stronger for institutional investors than for private investors.
First, there is more trade transparency between institutional investors than between
private investors. Second, the trading signals that reach institutional investors are more
correlated and hence increase the likelihood of eliciting similar reactions. Finally, because
of the size of the investments, institutional herding is more likely to result in stronger
price impacts than is the herding of private investors. Therefore, to adopt a position, as
an institutional investor, outside the herd will have a stronger return impact than would
such a position if adopted by private clients.

Blake et al. (2015) study the investment behavior of pension funds in the UK, an-
alyzing - on an asset-class level - to what extent herding occurs. Their data set covers
UK private sector and public sector defined-benefit (DB) pension funds' monthly asset
allocations over the past 25 years. They present information on the funds' total portfolios
and asset class holdings, and are also able to decompose changes in portfolio weights into
valuation effects and flow effects.

These authors find robust evidence of reputational herding in subgroups of pension
funds: similar pension funds follow each other. Public-sector funds, for example, follow
other public-sector funds of a similar size. This follows from a positive relationship be-
tween the cross-sectional variation in pension funds' net asset demands in a given month
and their net demands in the preceding month. A second result is that pension funds seem
to use strong short-term portfolio rebalancing. Funds rebalance their long-term portfo-
lios such that they match their liabilities. Since the maturity of pension fund liabilities
increased, pension funds have systematically switched from UK equities to conventional
and index-linked bonds.

The authors also find that pension funds mechanically rebalance their short-term
portfolios if restrictions in their mandates are breached. They therefore, on average, buy
in falling markets on a monthly basis and sell in rising markets. This is suboptimal given
the optimal investment rule (4.32). Therefore, pension funds' investments fail to move
asset prices toward their fundamental values, and hence do not stabilize financial
markets. The market exposure of the average pension fund and the peer-group benchmark
returns match very closely the returns on the relevant external asset-class market index.
This is evidence that pension fund managers herd around the average fund manager:
they could simply invest in the index without paying any investment fees.

As a final result, the pension funds studied captured a positive liquidity premium,
contrary to the expectation that these long-term investors should be able to provide
liquidity to the markets and earn a risk premium in return.
Chapter 5

Global Asset Management


The asset management industry faces many challenges and opportunities: regulation,
technology, economic growth, demographics and climate change; see Walter (2013),
UBS (2015), PwC (2015). AM faces a shift in the investor base, meaning increasing
wealth in developing countries. The trend of managing household assets in the form of
professional, collective schemes such as mutual funds (US) or SICAV vehicles (eurozone)
will continue. Robo-advisors define a disruptive technology if they reach an acceptable
level of quality. This technology has the potential to radically change the way investment
decisions are made. Related to demographics, untenable, government-sponsored pension
systems (pay-as-you-go schemes) need to be fundamentally redesigned as asset pool
systems which are in line with demographic change. From a return perspective, the
search for alternative asset classes due to the increasing efficiency, and hence decreasing
alphas, of traditional asset classes continues. Distribution is being redrawn both globally
and locally based on technology. This means that platforms and open architecture will
dominate, allowing for economies of scale, mastering of regulatory complexity and cost
transparency. The transformation of fee models and alternatives becoming more
mainstream are two further changes.

5.1 Asset Management Industry


5.1.1 The Demand and Supply Side
The AM industry's clients are segmented into private and institutional clients. Insti-
tutional clients include pension funds, insurance companies, family offices, corporate
treasuries, and government authorities. The two categories differ in many respects.
Retail clients pay higher fees than institutional ones. Institutional investors ask for pure
asset management services, while private clients often combine their asset management
demands with other banking services such as financial planning or mortgage lending.
Private clients invest more heavily in wrappers of investment solutions such as mutual
funds, ETFs or structured products. Institutional clients prefer to invest in cash products
directly (bonds or stocks), using overlays to cut and paste the return profile.


Finally, institutional clients have better or exclusive market access such as to alternative
investments.

Trading units and asset management firms are the suppliers of assets for investment.
Mutual funds or ETFs are often offered by non-banking firms such as BlackRock. These
firms issue products but also provide other services.¹

The largest asset management organizations in 2017 were BlackRock with USD 6.3
trillion AuM, followed by the Vanguard Group.² The largest fund in 2014 was the SPDR
ETF on the S&P 500 managed by State Street Global Advisors with assets of USD 224
bn; see the Appendix for details.

5.1.2 Asset Management Industry in the Financial System - the Eurozone

We follow EFAMA (2015). Asset management companies are one channel between
providers and users of funds in the case where the parties do not exchange the assets
directly using organized market places. AM firms provide a pooling of funds for in-
vestment purposes. Banks, another channel, also offer non-asset-management functions.
Insurance companies or pension funds take savings from households or companies and
invest them in money markets and capital markets. The main services of the AM indus-
try to clients are savings management (diversification, reduction of risk by screening out
bad investment opportunities), liquidity provision (providing liquid assets to clients while
investing in not necessarily liquid assets) and reduction of transaction costs (size matters).

But asset management firms also contribute to the real economy. Firms, banks
and governments use AM firms to meet their short-term funding needs and long-term
capital requirements. The AM contribution to debt financing is 23%: European asset
managers held this share of all debt securities outstanding, which also represents 33%
of the value of euro-bank lending. The equity financing figures are similar. The AM
industry held 29% of the market value of euro area listed firms and 42% of the free float.

From a corporate finance perspective, the valuation and market capitalization of asset
management firms compared to banks and insurance companies between 2002 and 2015
is shown in Table 5.1 (McKinsey (2015)):
The number of asset management companies in Europe in 2017 was approximately
4 200, up from 3 300 in 2014. Most companies are located in France, Ireland, Luxem-
bourg, Germany, the UK, the Netherlands and Switzerland. The high number of companies in Ireland and

¹ BlackRock Solutions - the risk management division of BlackRock - was mandated by the US Trea-
sury Department to manage the mortgage assets owned by Bear Stearns, Freddie Mac, Morgan Stanley,
and other financial firms that were affected by the financial crisis in 2008. The expertise gained boosted
BlackRock Solutions to become more important than the asset management part of the firm.
² The Vanguard Group 5.1 tr USD, Charles Schwab 3.4 tr USD, UBS 3.1 tr USD, State Street 2.8 tr
USD.

Feature                    Asset management firms   Banks   Insurance

Market Cap (100 in 2002)   516                      313     231
P/E ratio                  16.1                     11.3    14.8
P/B value                  3.2                      1.2     1.6

Table 5.1: Key figures 2015 for asset management firms, banks and insurance companies
(McKinsey [2015]).

Luxembourg is due to the role these countries play in the cross-border distribution of
UCITS funds (see below). The main asset management center, where the investment
management functions are carried out, is London. The average AuM per asset manager
ranges from EUR 9 billion in the UK to less than one billion in Portugal and Turkey,
for example. The industry is highly concentrated in each country. Market shares are
concentrated in the UK (35%), France (17%) and Germany (9%). The top 5 asset
managers control 94% of all assets in Germany; in the UK the corresponding figure is
36%. In the UK and France, less than 20% of the firms are owned by banking groups.
In Germany (60%) and Austria (71%), the majority of asset management functions
are part of a bank. Insurance companies play a significant role in Italy, the UK,
France and Germany (all 13%) and in Greece (21%). Institutional investors represent
the largest client category of the European asset management industry, accounting for
71% of total AuM at end 2016. Pension funds and insurance companies accounted for
28% and 25% of total AuM, respectively.

Total Assets under Management (AuM) in Europe increased by 10% in 2017 to EUR
25.2 trillion. Comparing the growth of investment funds versus discretionary mandates
in Europe, both categories increased to a similar level: EUR 13.1 (9.1) trillion in
investment funds and EUR 12 (9.9) trillion in discretionary mandates in 2017 (2014)
(EFAMA (2018) and (2015)). The share of investment funds compared to mandates
was falling from 2007 until 2011 but started to increase again in the last three years.
While mandates represented more than 70% of the AuM in the UK, the Netherlands,
Italy and Portugal, more than 70% of all AuM in Germany, Turkey or Romania were
invested in investment funds. The dominance of either type of investment can have
different causes. In the UK and the Netherlands pension funds play an important role
in asset management and they prefer to delegate the investment decisions. The pool of
professionally managed assets in Europe remains centered in the UK (37% market
share), France (20%), Germany (10%), Italy, the Nordic countries and Switzerland.

The number of individuals directly employed in the industry (asset managers, analysts)
is estimated for 2017 (2013) at 110 000 (90 000), with the dominant part of one-third in
the UK. Indirect employment such as IT, marketing, legal, compliance and admin-
istration is estimated to boost the total number of employees in the whole industry up
to half a million individuals.

5.1.3 Global Figures 2007-2014, Market Structure


The following figures for the period 2007-2014 are from McKinsey (2015).

• Per annum, global AuM growth between 2007 and 2014 is 5%. The main driver was
market performance. Typically, the net AuM flows are between 0% and 2% per annum.

• The growth of AuM is 13.1% in Europe, 13.5% in North America and 226% in
emerging markets which is largely due to the money market boom in China.

• The absolute value of profits increased by 5% in Europe, 29% in North America
and 79% in the emerging markets.

• Profit margins, defined as the difference between the net revenue margin and the
operating cost margin, are 13.3 bps in Europe, 12.5 bps in North America and 20.6
bps in emerging markets. The observed revenue decline in Europe is due to the shift
from active to passive investments, the shift to institutional clients and the decrease
in management fees. The revenue margin in the emerging markets is only slightly
lower in 2014 compared to 2007 (down to 68.1 bps from 70.6 bps), but the increase
in the operating cost margin from 33.8 bps to 47.4 bps in 2014 is significant.

• The absolute revenues in some emerging markets such as China, South Korea and
Taiwan range between USD 3.7 bn and USD 10.1 bn. They are almost at par with
the revenues in Japan, Germany, France and Canada (all around USD 10 bn). The
revenue pools of the UK (USD 21.2 bn) and the US (USD 150.8 bn) still lead the
global league table.

• The cost margins in Europe are stable between 21 bps and 23 bps. The cost margin
splits into sales and marketing (around 5 bps), fund management (around 8 bps),
middle/back office (around 3.5 bps) and IT/support (around 6 bps). There is an
increasing cost trend for IT/support and decreasing costs for sales and marketing
and middle/back office.

• From a customer segment perspective, retirement/DC grew with a Compounded
Annual Growth Rate (CAGR) of 7.5%, almost twice as strong as the retail sector
with 4% between 2007 and 2014. The institutional customers' CAGR was 5%.
These average global rates differ across geographic regions. In Europe, the retire-
ment/DC CAGR dominates the retail one by a factor of 4, whereas in the emerging
markets the CAGR for institutional customers is 13% compared to 11% for
retirement/DC.

When considering the above facts one should take into account the particular circum-
stances in the years after the GFC, such as the decreasing interest rate level and the
stock market boom, which were the main factors in the success of the asset management
industry in this period.

Investment type    2003   2008   2012   2016   E2025

Passive/ETF        2      3.3    7.9    14.6   E36.6
LDIs               0.6    1.6    2.5    -      -
Active Core        24.8   28.1   30.9   60.1   E87.5
Active Solutions   8.2    10.8   15.1   -      -
Alternatives       1.9    3.9    6      10     21.1

Table 5.2: Global distribution of AuM by product and its dynamics in the last decade in
trillion USD. Alternatives includes hedge, private-equity, real-estate, infrastructure, and
commodity funds. Active solutions includes equity specialties (foreign, global, emerging
markets, small and mid caps, and sector) and fixed-income specialties (credit, emerging
markets, global, high yield, and convertibles). LDIs (liability-driven investments) in-
cludes absolute-return, target-date, global-asset-allocation, flexible, income, and volatil-
ity funds. Active core includes active domestic large-cap equity, active government fixed-
income, money market, and traditional balanced and structured products (Valores Cap-
ital Partners [2014]). The figures for 2016 and the projection to 2025 are from PwC (2018).

Table 5.2 illustrates the global distribution of AuM by product and its dynamics in
the last decade.
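The growth differentials in Table 5.2 can be made explicit by computing CAGRs from the table's own 2003 and 2016 figures (13 years); this is only a small arithmetic check:

```python
def cagr(v0, v1, years):
    """Compounded annual growth rate between two AuM levels."""
    return (v1 / v0) ** (1 / years) - 1

# AuM figures from Table 5.2, 2003 vs 2016 (13 years), in trillion USD.
for name, v0, v1 in [("Passive/ETF", 2.0, 14.6),
                     ("Active core", 24.8, 60.1),
                     ("Alternatives", 1.9, 10.0)]:
    print(f"{name:12s} {cagr(v0, v1, 13):.1%} p.a.")
```

Passive grew at roughly 16.5% p.a. and alternatives at about 13.6%, versus about 7% for active core, consistent with the shift from active to passive discussed in the text.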

The table indicates that the growth rate of passive investments is larger than that of
active solutions. McKinsey (2015) states for the period 2008-2014 that the cumulated
flows are 36% for passive fixed income and 22% for passive equity. Standard active
management is decreasing for some asset classes and strategies: active equity strategies
lost 20% on a cumulated flow basis while active fixed income gained 52%. A further
observation is that active management of less liquid asset classes, or with more complex
strategies, is increasing: an increase of 49% cumulated flows for active balanced multi-
asset and of 23% for alternatives. The global figures vary strongly across regions and
countries. Swiss and British customers adopted the use of passive investments much
faster than, for example, Spanish, French or Italian investors. Figure 5.1 shows the
distribution of global investable assets by region and by type of investor.

Regulation imposes a great deal of complexity on the whole business of asset man-
agement and banking. On the other side of the fence, there is a so-called shadow banking
sector with much less regulatory oversight. Although the expression 'shadow bank' makes
no sense at all - either an institution has a banking license or not - there is an incentive
for banks to consider outsourcing their asset management units to this 'shadow banking'
sector.

Figure 5.1: Global investable assets by region in trillions of USD (Brown Brothers
Harriman [2013]).

Traditional and non-traditional asset managers' (alternative asset class managers')
roles are converging. Traditional asset managers have continuously lost market share to
low-cost ETFs. They therefore consider liquid alternative products to stop the bleeding.
This is one reason for the convergence. Non-traditional asset managers, on the other
hand, want to expand into traditional segments since their non-traditional products are
becoming more liquid and more transparent. This is the other reason for the coming
together of the two, previously distinct, roles. The hedge fund AQR Capital Management
opted for the Investment Company Act of 1940 (the 40-Act) mutual fund industry
regulatory regime. This act requires much more transparency in reporting than hedge
funds usually provide. This allowed AQR access to a new customer base. This business
had grown to USD 19 billion AuM by 2014.
Forward-looking estimates by PwC (2014, 2018) for the period 2014-2020 suggest
that actively managed funds will grow at a CAGR of 5.4 percent and mandates at 5.7
percent (PwC [2014]). The growth driver for actively managed funds is the growing
global middle-class client base. Mandates growth factors are institutional investors
(pension funds and SWFs) and HNWIs, see Table 5.3. Furthermore, the ratio
active:passive was 7:1 by 2012 and is estimated to fall to 3:1 by 2020.

Investment type           2014 - USD trillions   E2020 - USD trillions

Actively managed funds    30                     41.2
Mandates                  32                     47.5
Alternative investments   6.9                    13

Table 5.3: Actively managed funds, mandates, and alternative investments (PwC [2014]).

By the end of 2014, the AuM in actively managed funds were distributed as follows:
60% in the Americas, 32% in Europe, and 12% in Asia. Compared to 2010, there is a
relative stagnation or decrease in Europe and Asia, whereas the proportion in the
Americas is increasing.

The formation of four regional blocs in AM - North Asia, South Asia, Latin America,
and Europe - creates opportunities, costs, and risks. These blocs develop regulatory
and trade linkages with each other based on reciprocity - AM firms can distribute their
products in other blocs. The US, given the actual trends, will stay apart since it prefers
to adhere to its own regulatory model. But integration will not only increase between
these blocs but also within blocs. There will be, for example, strong regulatory
integration inside the South Asia bloc. The ASEAN platform between Singapore,
Thailand, and Malaysia will be extended to include Indonesia, the Philippines, and Viet-
nam. All these countries possess a large, wealthy middle class of potential AM service
investors. The global structure UCITS continues to gain attraction worldwide, and reci-
procity between emerging markets and Europe will be based on the European AIFMD
model for alternative funds. By 2013, more than 70 memoranda of understanding for
AIFMD had been signed.

The traditional AM hubs London, New York and Frankfurt will continue to dom-
inate the AM industry. But new centers will emerge due to the global shift in asset
holdings. There will be a balance between global and local platforms. Whether a global
or a local platform is pushed depends on many factors: time-to-market, regulatory and
tax complexity, behavior and social norms in the jurisdiction, and the education level
matter. AM firms recruit local teams in the key emerging markets - the people factor.
The education of these local individuals started originally in the global centers but will
diffuse more and more to the new centers in the emerging markets. Due to the positive
brand identities that tech firms have, they can integrate part of the business layer into
their infrastructure layer and offer AM services under tech firm brands instead of more
traditional banking or AM company brands (branding reversal). Finally, alternative
asset managers on the one hand offer new products - asset managers move into the
space banks left vacant - and on the other hand try to make their alternative funds
mainstream. New products include primary lending, secondary debt market trading,
primary securitizations, and off-balance-sheet financing.

5.1.4 Asset Management vs Trading Characteristics


Some key characteristics of asset management firms:

• Agency business model. Asset managers are not the asset owners; they act on a
best-effort basis for their clients and the performance is attributed to their clients.

• Low balance sheet risk. Since asset managers do not provide loans, do not act as
counterparties in derivatives, financing, or securities transactions, and seldom
borrow money (leverage), their balance sheet does not face the risks of a bank's
balance sheet.

• Protection of client assets. Asset managers are regulated and, in mandated asset
management, the client assets are held separately from the asset management firm's
assets.

• Fee-based compensation. Asset managers generate revenue principally from an
agreed-upon fee. There is no profit and loss as in trading.

From a risk perspective, asset management is a fee business with conduct, business,
and operational risk as the main risk sources.

Trading, by contrast, is a market, counterparty, and liquidity risk business which needs a
strong balance sheet of the intermediary. Trading is a mixture of a fee (agency trading)
and a risk-taking business (principal and proprietary trading). Agency trading is a fee
business based on client flow. Clients place their orders and the trading unit executes
the orders on behalf of the client's account. For example, a stock order is routed by
the trader to the stock exchange where the trade is matched. The bank receives a fee
for this service. Principal trading already requires active market risk or counterparty
risk taking by the bank since the bank's balance sheet is affected by the profits and
losses from trading. Principal trading is still based on clients' orders but it requires the
traders to take some trading positions in their market-making function or in order to
meet future liabilities in issued structured products. This is a key difference to agency
trading. Proprietary trading is not based on client flow at all. Proprietary traders
implement trading ideas without any reference to a client activity. This type of trading
puts the bank's capital at risk. New regulations limit proprietary trading by investment
banks, such as the Volcker Rule in the US and 'ring-fencing' in the UK.

AM firms wrap the underlying assets into collective investment schemes ('funds')
while the trading unit of a bank offers issuance and market making for cash products,
derivatives, and structured products. Despite their differences, trading and asset
management are linked. Portfolio managers in the asset management function execute
their trades via the trading unit or a broker. The market making of ETF and listed fund
trading takes place in the trading unit. Cash products are used by the asset management
function in the construction of collective schemes, and asset managers use derivatives
(overlays) in their portfolios to manage risk and return characteristics.

5.1.5 Institutional Asset Management versus Wealth Management


Investors in institutional asset management (IAM) are legal entities such as pension
funds; in wealth management (WM) they are private clients. The investment goal in IAM
is often based on a non-maturing asset-liability analysis while in WM the goal is linked
to the life cycle of the client. Although this defines long-term investment horizons for
both types of investors, we refer to Section 3.6 for the difficulties pension funds face in
following a long-term strategy. If WM clients use short- or mid-term investment
horizons, opportunistic behavior is motivated. The performance of the investment for
IAM is benchmarked while WM clients also prefer absolute returns. Therefore, for IAM
beta is the first concern and alpha is added in a satellite form. The responsibility for the
performance in IAM is attached to investment boards, CFOs, and boards of trustees. In
WM, the mandate manager is responsible for the performance. IAM companies use
several mandates, often one for each asset class, to manage investments while WM
clients either use a smaller number of mandates or even decide on their own in the
advisory channel.

The size of investment is very large for IAM and smaller for WM. The risk management
for IAM is comprehensive and of the same quality as that used by, say, banks for their
own purposes. In WM, risk management is often less sophisticated. Fees are typically
lower for IAM than for WM. While IAM is highly regulated, the regulation of WM was
in the past much less strict. This changed after the GFC, when MiFID II,
Know-Your-Client rules, product information sheets, etc. heavily increased the WM
regulatory setup. Finally, the loyalty of IAM clients is decreasing while WM clients are
more loyal. It will be interesting to observe in the future how the loyalty of WM clients
changes if technology makes investments not only more tailor-made but also more
open-platform oriented and, therefore, less strongly linked to the home institution of the
WM clients.

5.2 The Fund Industry - An Overview


In 1774 Abraham van Ketwich, an Amsterdam broker, offered a diversified pooled
security specifically designed for citizens of modest means. The security was similar to a
present-day closed-end fund. It invested in foreign government bonds, banks, and West
Indian plantations. The word 'diversification' is explicit in the prospectus of this fund.

The 1920s saw the creation in Boston of the first open-end mutual fund - the
Massachusetts Investors' Trust. By 1951 more than 100 mutual funds existed and 150
more were added in the following twenty years. The challenging 1970s - oil crisis - were
marked by a number of innovations. Wells Fargo offered a privately placed, equally
weighted S&P 500 index fund in 1971. This fund was unsuccessful and Wells created a
successful value-weighted fund in 1973. It required huge efforts - tax and regulatory
compliance, building up stable operations, and educating potential investors. Bruce
Bent established the first money market fund in the US in 1971 such that investors had
access to high money market yields in a period when bank interest rates were regulated.
In 1975, John Bogle created a mutual fund firm - Vanguard. It launched in 1976 the
first retail index fund based on the S&P 500 Index. In 1993, Nathan Most developed an
ETF based on the S&P 500 Index.

The fund industry is not free of scandals. In 2003, for example, illegal late trading and
market timing practices were uncovered in hedge fund and mutual fund companies. Late
trading means that trading is executed after the exchanges are closed. Traders could buy
mutual funds when markets were up at the previous day's lower closing price, and sell at
the purchase date's closing price for a guaranteed profit.

5.2.1 Types of Funds and Size


There are different types of funds: mutual funds, index funds, ETFs, hedge funds, and
alternative investments. We note some broad characteristics:

• Index mutual funds and most ETFs are passively managed.

• Index funds seek to match the fund's performance to a specic market index, such
as the S&P 500, before fees and expenses.

• Mutual funds are actively managed and try to outperform market indexes. They
are bought and sold at the current day's closing price - the NAV (net asset value).

• ETFs are traded in real time at the current market price and may cost more or less
than their NAV.

NAV is a company's total assets minus its total liabilities. If an investment company's
assets are worth USD 100 and its liabilities are USD 10, the company's NAV is USD 90.
Since assets and liabilities change daily, NAV also changes daily. Mutual funds generally
must calculate their NAV at least once every business day. An investment company
calculates the NAV of a single share by dividing its NAV by the number of outstanding
shares.
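The NAV arithmetic above can be sketched in a few lines of Python. The snippet below is illustrative only; it uses the USD 100/USD 10 example from the text and the footnote's per-share example.

```python
# Illustrative sketch of the NAV calculation described in the text.

def nav_per_share(assets: float, liabilities: float, shares_outstanding: float) -> float:
    """NAV per share = (total assets - total liabilities) / shares outstanding."""
    return (assets - liabilities) / shares_outstanding

# Investment company with USD 100 of assets and USD 10 of liabilities.
fund_nav = 100.0 - 10.0
print(fund_nav)  # 90.0

# Footnote example: USD 10.5 mn securities plus USD 2 mn cash, USD 0.5 mn
# liabilities, and 1 mn shares outstanding give a NAV of USD 12 per share.
print(nav_per_share(10.5e6 + 2e6, 0.5e6, 1e6))  # 12.0
```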

Funds can be open- or closed-end. Open-end funds are forced to buy back fund shares
at the end of every business day at the NAV, see Table 5.4. Prices of shares traded during
the day are expressed in NAV. Total investment varies based on share purchases, share
redemptions, and fluctuations in market valuation. There is no limit on the number of
shares that can be issued. Closed-end funds issue shares only once. The shares are listed
and traded on a stock exchange: An investor cannot give back his or her shares to the
fund but must sell them to another investor in the market. The prices of traded shares
can be different from the NAV - either higher (premium case) or lower (discount case). The
vast majority of funds are of the open-end style.

The legal environment is crucial for the development of the fund industry. About
three-quarters of all cross-border funds in Europe are sold in Luxembourg. Luxembourg
offers favorable framework conditions for holdings/holding companies, investment
funds, and asset-management companies. These companies are partially or completely
tax-exempt; typically, profits can be distributed tax free. For private equity funds,
two-thirds have the US state of Delaware as their domicile. For hedge funds, one-third
are in the Caymans and one-quarter in Delaware. As of Q3 2013, 48 percent of mutual
funds had their domicile in the US, 9 percent in Luxembourg, and around 6 percent in
Brazil, France, and Australia, respectively.

3 We assume that at the close of trading a mutual fund held USD 10.5 mn securities, USD 2 mn of
cash, and USD 0.5 mn of liabilities. With 1 million shares outstanding, the NAV is USD 12 per share.

Feature Open-end fund Closed-end fund


Number of outstanding shares Flexible Fixed
Pricing Daily NAV Continuous demand and supply
Redemption At NAV Via exchange
Market share > 95% < 5%
US terminology Mutual fund Closed-end fund
UK terminology Unit trust Investment trust
EU terminology SICAV SICAF

Table 5.4: Features of open-end and closed-end funds. A SICAV (Société
d'Investissement à Capital Variable) is an open-ended collective investment scheme.
SICAVs are cross-border marketed in the EU under the UCITS directive (Undertakings
for Collective Investments in Transferable Securities, see below). SICAFs are the
closed-end fund equivalent of SICAVs.

5.3 Mutual Funds and SICAVs


The Securities and Exchange Commission (SEC) defines mutual funds as follows:

Definition 73. A mutual fund is a company that pools money from many investors
and invests the money in stocks, bonds, short-term money-market instruments, other
securities or assets, or some combination of these investments. The combined holdings
the mutual fund owns are its portfolio. Each share represents an investor's proportionate
ownership of the fund's holdings and the income those holdings generate.
In Europe, mutual funds are regulated under the UCITS regime and mutual fund
equivalents are called SICAVs. When we refer below to mutual funds, we always have
US mutual funds in mind. Some characteristics of mutual funds: investors purchase
mutual fund shares from the fund and not via a stock exchange; investors can sell their
shares at any time; they pay for mutual fund shares the NAV plus any shareholder fees;
if there is new demand, mutual funds create and sell new shares; and finally, investment
portfolios are managed by separate entities (investment advisers) that are registered
with the SEC. Mutual funds are non-listed public companies that neither pay taxes nor
have employees.

The major benets of mutual funds for investors are:

• Diversication and professional management.

• Investor protection (regulation)

• Affordability - the basic unit of a fund requires only little money from the
investors and gives access to assets.

• Partial transparency about the investment process, performance, the investment
portfolio, and the fees.

• Default remoteness. Fund capital is treated as segregated capital.

• Liquidity. Mutual fund investors can redeem at any time their shares at the current
NAV plus any fees and charges assessed on redemption.

• Investment strategy. The investor can choose between active and passive
investment, can have access to rule-based strategies, etc. But he cannot choose a
guaranteed payoff as for structured products. Hence, investors in funds believe that
the fund manager's skills generate the performance.

Some disadvantages of mutual funds:

• Lack of control over the securities in the portfolios.

• Price uncertainty. Pricing follows the NAV methodology, which the fund might
calculate hours after the placement of an order.

The Investment Company Institute and US Census Bureau (2015) state that a total
of 43.3% of US households, with a median income of USD 85,000, own mutual funds.
The median mutual fund holding is USD 103,000 and the median of household financial
assets is USD 200,000. 86% own equity funds, 33% hybrids, 45% bond funds, and 55%
money-market funds. Only 36% invest in global or international equity funds. The
primary financial goal (74%) for mutual fund investment is retirement.

5.3.1 US Mutual Funds versus European UCITS


Mutual funds and SICAVs are both collective investment schemes. But there are some
major differences between the two types of wrappers and the respective industries. We
follow Pozen and Hamacher (2015).

Cross-border distribution has been most successful within the European UCITS format.
This is not only true for Europe. UCITS dominate global fund distribution in more
than 50 local markets (Europe, Asia, the Middle East, and Latin America). This kind
of global fund distribution is the preferred business model in terms of economies of scale
and competitiveness. In 2016 around 80,000 registrations for cross-border UCITS funds
existed. The average fund is registered in eight countries. Furthermore, UCITS are not
required to distribute all income annually.

UCITS do not need to accept redemptions more than twice a month. Although the
two previous points hold in general, many funds offer - for example - the option to
distribute income annually or to make redemptions possible on a daily basis. UCITS
sponsors must comply with the EU guidelines on compensation for key personnel: the
remuneration directive.

Both UCITS funds and mutual funds were originally quite restrictive in their investment
guidelines. Then UCITS (similar remarks apply to mutual funds) were allowed to use
derivatives extensively. Using derivatives means, among other things, leveraging
portfolios or creating synthetic short positions - UCITS are not allowed to sell physical
assets short. The strategies of these funds - referred to as 'newCITS' - are similar to
hedge fund strategies and they showed strong growth to USD 294 billion in 2013 accord-
ing to Strategic Insight (2013).

But there are also differences between US mutual funds and European UCITS on a
more fundamental level. US clients invest in existing funds while European investors
are regularly offered new funds. That is, the number of US mutual funds has been
decreasing in the last decade while the European funds have shown a strong increase
in numbers; see Table 5.5. The stability of the US fund industry is due to the influence
of US retirement plans (defined contribution), which do not change investment options
often. The tendency to innovate permanently in Europe leads to funds which are on
average around six times smaller than their US counterparts.

2003 2013
US
Number of funds 8,125 7,707
Total Assets USD tr 7.4 15.0
Asset per fund USD mn 911 1,949
Europe
Number of funds 28,541 34,743
Total Assets USD tr 4.7 9.4
Asset per fund USD mn 164 270
Asia
Number of funds 11,641 18,375
Total Assets USD tr 1.4 3.4
Asset per fund USD mn 116 183

Table 5.5: Number of funds, average fund size and assets by region (Investment Company
Institute [2010, 2014] and Pozen and Hamacher [2015]).

5.3.2 Functions of Mutual Funds


5.3.2.1 How They Work
Buying and selling mutual funds is not done via a stock exchange - the shares are bought
directly from the fund. Therefore, the share price is not fixed by traders but is equal to
the net asset value (NAV). Investors pay the NAV plus the sales load fee when they buy;
if they sell, they get the NAV minus the redemption fee. While the calculation of the
NAV is theoretically simple, the process of implementing the calculation is not, since
one has to accurately record all securities transactions, consider corporate actions, and
determine the liabilities, for example. Digitization offers an opportunity to overcome
present NAV calculation problems. If, say, the NAV can be calculated in real time, why
should fund shares not be listed on a stock exchange?

Mutual funds as companies pay out almost all of their income - dividends and realized
capital gains - to shareholders every year and pass on all their tax duties to investors.
Hence, mutual funds do not pay corporate taxes. Therefore, the income of mutual funds
is taxed only once while the income of 'ordinary' companies is taxed twice.

5.3.2.2 Organization of Mutual Funds


The fund's board of directors is elected by the fund's shareholders. It should govern
and oversee the fund; see Figure 5.2. Mutual funds are required to have independent
directors on their boards. The investment adviser manages the fund's portfolio following
the guidance described in the prospectus. The fund administrator offers administrative
services to the fund and ensures that the fund's operations comply with internal and
external legal requirements. The fund's distributor or principal underwriter sells
fund shares. Mutual funds are required to protect their portfolio securities by placing
them with a custodian. The largest custodians are Bank of New York Mellon, J.P. Morgan,
and State Street Bank and Trust Company (see the Appendix for a list of assets under
custody).

5.3.2.3 Taxonomy of Mutual Funds


Money Market (MM) Funds

There are tax-exempt and taxable fund types. The former invest in securities backed
by municipal authorities and state governments; income from these securities is exempt
from federal income tax. Which fund to choose is only a question of the after-tax yield.
Tax-exempt funds make sense for investors who face a high tax bracket. In all other
cases, taxable funds show a better after-tax yield. Fund sponsors typically offer a retail
and an institutional investor series of MM funds.

Bond Funds

There are many types of bond funds. Bond funds can be tax-exempt or taxable, and
invest in US or global bonds. In each category, different factors matter: the
creditworthiness of the bonds, the maturity of the bonds, the segmentation of global
bonds into emerging market bonds and general global bonds, and the classification of
bonds according to different economic sectors or specific topics. Finally, alternative
bond funds use techniques from hedge funds to shape the risk and return profile.

4 Mutual funds make two types of taxable distributions to shareholders: ordinary dividends and capital
gains. The Internal Revenue Service (IRS) defines rules that prevent ordinary firms from transforming
themselves into mutual funds to save taxes: a rule demands, for example, that mutual funds have only a
limited ownership of voting securities and that funds must distribute almost all of their earnings.

Figure 5.2: The organization of a mutual fund (ICI Fact Book [2006]).

Morningstar adopted in 2012 a new classification system to overcome the excessive
number of dimensions that a bond fund can have. The system classifies bonds in the
two dimensions creditworthiness (credit quality) and interest-rate sensitivity, where
each dimension has three classes, such as high/medium/low credit quality and
limited/moderate/extensive interest-rate sensitivity. That is, each bond is classified in
this 3×3 matrix. The credit dimension indicates the likelihood that investors will get
their invested money back. The interest-rate sensitivity states the impact of changing
interest rates on the value of the bonds.

Stock Funds

For stock funds the difference between tax-exempt and taxable does not exist since
most of their income comes from price appreciation and income from dividends is very
low. Categories are US versus global funds, sectors, regions, style, etc. As for bond
funds, a 3 × 3 style box from Morningstar exists, with size as one dimension and style
as the other.

5.3.3 Fees for Mutual Funds


5.3.3.1 Definitions
The SEC (2008) defines the following components for mutual fund fees: (i) fees paid by
the fund out of fund assets to cover the costs of marketing and selling fund shares ... (ii)
'distribution fees', including fees that compensate brokers and others who sell fund shares
and that pay for advertising, the printing and mailing of prospectuses to new investors ...
(iii) 'shareholder service fees' - fees paid to persons who respond to investor inquiries and
who provide investors with information about their investments.

The expense ratio is the fund's total annual operating expenses, including management
fees, distribution (12b-1) fees, and other expenses. All fees are expressed as a percentage
of average net assets. Other fees include fees related to the selling and purchasing of
funds: the back-end sales load is a sales charge investors pay when they redeem mutual
fund shares; the front-end sales load is the analogous fee charged when funds are bought.
It is generally used by the fund to compensate brokers. Purchase and redemption fees
are not the same as the back-end and front-end sales loads; they are both paid to the
fund. The SEC generally limits redemption fees to 2 percent.

5.3.3.2 Share Classes


While different stock classes are used to express different voting rights, different mutual
fund classes are used for different customers and different fees. The most prominent
classes in the US are the A-, B-, and C-classes.

Class-A shares, for example, charge a front-end load and have low 12b-1 (distribution)
fees. They are beneficial for long-run investors. In Europe, the type of share class can
define the client segmentation, specify the investment amount, and specify the
investment strategy. For example:

• AA-class: Admissible for all investors, distribution of earnings.

• AT-class: Admissible for all investors, reinvestment of earnings.

• CA-class: Admissible for qualified investors only, distribution of earnings.

• D-class: Same as CA but with reinvestment of earnings.

• N-class: Only for clients who possess a mandate contract or an investment
contract with the bank.

5.3.3.3 TER and Performance


The total expense ratio (TER) is a percentage ratio defined as the ratio between total
business expenses and the average net fund value. The TER expresses the total of costs
and fees that are continuously charged. Business expenses are fees for the fund's board
of directors, the asset manager, the custodian bank, administration, distribution,
marketing, the calculation agent, audit, and legal and tax authorities.

The following approach is widely used for performance calculations. Consider a
period starting at 0 with length T. The performance P is defined by:

    P% = ( (NAV_T × f_1 × ... × f_T) / NAV_0 − 1 ) × 100            (5.1)

with f the adjustment factor for a payout, such as dividends,

    f = (NAV_ex + BA) / NAV_ex,

with BA the gross payout - that is to say, the gross amount of the earning and
capital-gain payout per unit share to the investors - and NAV_ex the NAV after the
payout.

Example

Consider a NAV at year-end 2005 of CHF 500 million, 2006 earnings of CHF 10
million, and a capital-gain payout of CHF 14 million. The NAV after payments is CHF
490 million and the NAV at the end of 2006 is CHF 515 million. The adjustment factor
is

    f = (490 + 10 + 14) / 490 = 1.04898.

This gives the performance for 2006:

    P = (515 × 1.04898 / 500 − 1) × 100 = 8.045%.
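The example can be checked directly; the following Python sketch reproduces the adjustment factor and the performance figure of formula (5.1), with all values in CHF million.

```python
# Sketch of the performance formula (5.1) applied to the example above.

nav_0 = 500.0      # NAV at year-end 2005
nav_ex = 490.0     # NAV after the payout
ba = 10.0 + 14.0   # gross payout: earnings plus capital-gain payout
nav_t = 515.0      # NAV at year-end 2006

f = (nav_ex + ba) / nav_ex          # adjustment factor
p = (nav_t * f / nav_0 - 1) * 100   # performance in percent

print(round(f, 5))  # 1.04898
print(round(p, 3))  # 8.045
```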

There are several reasons why it is important to measure the performance of a fund
correctly: selection of the best fund, checking whether the fund managers do what they
promise, and verifying whether the fund manager added value.

The performance formula (5.1) can be rewritten in the effective return form

    (1 + P) NAV_0 = NAV_T × f_1 × ... × f_T = NAV_T × ∏_{k=1}^{T} (1 + BA_k / NAV_ex,k).    (5.2)

If the gross payouts are zero in all periods, then the performance reads

    (1 + P) NAV_0 = NAV_T

with P the simple effective return. Contrarily, assume that in each period a constant
fraction g = BA / NAV_ex is paid out. Then,

    (1 + P) NAV_0 = NAV_T (1 + g)^T ≥ NAV_T.

Since (1 + g)^T is larger than one, with the same effective return P the fund without
any payouts achieves a larger final NAV than the fund with payouts.
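A small numerical sketch of this comparison (all figures are assumed for illustration): solving (1 + P) NAV_0 = NAV_T (1 + g)^T for NAV_T shows that, for the same effective return P, the payout fund ends with a smaller final NAV.

```python
# Illustrative comparison of final NAV with and without payouts, based on
# (1 + P) * NAV_0 = NAV_T * (1 + g)**T. All numbers are assumed.

nav_0 = 100.0  # initial NAV
p = 0.50       # effective return over the whole horizon
g = 0.02       # constant payout fraction BA / NAV_ex per period
t = 10         # number of periods

nav_t_no_payout = (1 + p) * nav_0                   # g = 0 case
nav_t_with_payout = (1 + p) * nav_0 / (1 + g) ** t  # solve for NAV_T

print(round(nav_t_no_payout, 2))    # 150.0
print(round(nav_t_with_payout, 2))  # 123.05
```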

Example
The return calculation for funds can be misleading. Consider the following reported
annual returns: 5%, 10%, −10%, 25%, 5%. The arithmetic mean is 7%. The geometric
mean is 6.41%. How much would an investor earn after 5 years if he or she starts with
USD 100?

    100 × 1.05 × 1.1 × 0.9 × 1.25 × 1.05 = USD 136.4.

If the fund reports the arithmetic mean, the investor would expect

    100 × 1.07^5 ≈ USD 140.3.

Using the geometric mean of 6.41%, the true value of USD 136.4 follows. Although it
is tempting to report the higher arithmetic mean, such a report would be misleading.
Some jurisdictions require funds to report returns in the correct geometric way.
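The computation can be replicated in a short Python sketch comparing the arithmetic and geometric means of the reported returns:

```python
# Arithmetic vs. geometric mean of the reported annual fund returns.

returns = [0.05, 0.10, -0.10, 0.25, 0.05]
n = len(returns)

arithmetic_mean = sum(returns) / n       # 7%

growth = 1.0
for r in returns:
    growth *= 1 + r                      # compound the yearly returns
geometric_mean = growth ** (1 / n) - 1   # ~6.41%

print(round(100 * growth, 1))                      # 136.4: true USD value
print(round(100 * (1 + arithmetic_mean) ** n, 1))  # 140.3: misleading figure
print(round(geometric_mean, 4))                    # 0.0641
```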

5.3.4 The European Fund Industry - UCITS


Luxembourg attracts different kinds of funds by providing different vehicles with which
to pool their investments. It offers both regulated and non-regulated structures. For
regulated funds in Luxembourg, two options are available. First, an 'undertaking for
collective investment' (UCI), a category which itself is divided into UCIs whose
securities are distributed to the public and UCIs made up of securities that are reserved
for institutional investors. The most common legal form of UCI is a SICAV (Société
d'Investissement à Capital Variable) - that is, an open-ended collective investment
scheme that is similar to open-ended mutual funds in the US. A SICAV takes the form
of a public limited company. Its share capital is - as its name suggests - variable and at
any time its value matches the value of the net assets of all the sub-funds. Closed-end
funds are referred to as SICAFs. Second, a Société d'Investissement en Capital à Risque
(SICAR). These provide a complementary regime to that of UCIs. They are tailor-made
for private equity and venture capital investment. There are no investment
diversification rules imposed by law and a SICAR may adopt an open-ended or
closed-ended structure.

Both schemes are supervised by the Luxembourg financial sector regulator. A main
reason for Luxembourg's attractiveness is taxation. Both SICAV and SICAF investment
funds domiciled in Luxembourg are exempt from corporate income tax, capital gains tax,
and withholding tax. They are only liable for a subscription tax at a rate of 0.05 percent
on the fund's net assets. Also, favorable terms apply with regard to withholding tax.

The UCITS - undertakings for collective investment in transferable securities -
directives were introduced in 1985. They comprise the main European framework
regulating investment funds. Their principal aim is to allow open-ended collective
investment schemes to operate freely throughout the EU on the basis of a single
authorization from one member state ('European Passport'). Their second objective is
the definition of levels of investor protection (investment limits, capital organization,
disclosure requirements, asset safekeeping, and fund oversight).

In summary, UCITS funds are open-ended, diversified collective investments in liquid
financial assets and are 'product passported' in 27 EU countries.

Total UCITS funds' AuM grew from EUR 3.4 trillion at the end of 2001 to EUR 5.8
trillion by 2010, with a value of EUR 6.8 trillion at the end of 2014. Roughly 85 percent
of the European investment fund sector's assets are managed within the UCITS
framework. On average, 10 percent of European households invest directly in funds:
Germany, 16%; Italy, 11%; Austria, 11%; France, 10%; Spain, 7%; and the UK, 6%.

There have been five framework initiatives - UCITS I (1985) to UCITS V (2016).
Goals of UCITS IV:

• Reduce the administrative burden by the introduction of a notification procedure.

• Increase investor protection by the use of key investor information (KID). The KID
replaces the simplified prospectus.

• Increase market efficiency by reducing the waiting period for fund distribution
abroad to 10 days.

The Madoff fraud case and the default of Lehman Brothers highlighted some
weaknesses in, and a lack of harmonization of, depositary duties and liabilities across
different EU countries, leading to UCITS V. It considers the following issues. First, it
defines what entities are eligible as depositaries and establishes that they are subject to
capital adequacy requirements, ongoing supervision, prudential regulation, and some
other requirements. Second, client money is segregated from the depositary's own funds.
Third, the depositary is confronted with several criteria regarding the holding of assets.
Fourth, remuneration is considered. A substantial proportion of remuneration, for
example, and at least 50 percent of variable remuneration, shall consist of units in the
UCITS funds and be deferred over a period that is appropriate in view of the holding
period. Fifth, sanctions shall generally be made public and pecuniary sanctions for legal
and natural persons are defined. Finally, measures are imposed to encourage
whistle-blowing.

5.4 Index Funds and ETFs


The work of Fama on market efficiency was one reason for the rise in the 1970s of
low-cost and passively managed investing through index funds. Another theoretical
milestone in the development of passive management was established by Jensen's (1968)
work on the performance of 115 equity mutual funds:

The evidence on mutual fund performance indicates not only that these 115 mutual
funds were on average not able to predict security prices well enough to outperform a
buy-the-market-and-hold policy, but also that there is very little evidence that any
individual fund was able to do significantly better than that which we expected from
mere random chance.

A growth analysis of the top ten global asset managers over the past five years confirms
this trend. Vanguard, with its emphasis on passive products, is the strongest growing
AM, followed by BlackRock with its passive products forming the iShares family. Both
index funds and ETFs aim at replicating the performance of their benchmark indices as
closely as possible. Issuers and exchanges set forth the diversification opportunities
they provide - like mutual funds - to all types of investors at a lower cost than mutual
funds, but also highlight their tax efficiency, transparency, and low management fees.
Although actively managed ETFs were launched around twenty years ago, their
importance remains negligible. One major reason is that actively managed ETFs lose
their cost advantage compared to mutual funds. As of June 2012 about 1,200 ETFs
existed in the US, including only about 50 that were actively managed.

Example Core-satellite

Core-satellite approaches are common in many investment processes. They comprise
a core of long-term investments with a periphery of more specialist or shorter-term in-
vestments. The core follows a passive investment style where index funds or ETFs are
used to implement the passive strategy at low cost (see the following sections for index
funds and ETFs). Satellites, conversely, are often actively managed, and the hope is that
they are only weakly correlated with the core.

We next consider in some detail the construction of different types of indices.

5.4.1 Index Construction


Besides the member asset prices, there are four other main factors determining the index
value: member weighting, divisor, index return type, and value fixing. The general formula
for index calculation reads

I = \frac{\sum_{i=1}^{M} w_i S_i}{D},

where M is the number of assets in the index, S_i is the price of a tradable unit of asset
i (e.g., the price of a stock), w_i is the weight assigned to the price of that asset, and D is
the divisor.
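The general formula can be sketched in a few lines of code. The prices, weights, and divisor below are hypothetical illustration values, not from the text:

```python
# Index value I = sum(w_i * S_i) / D for a small hypothetical index.

def index_value(prices, weights, divisor):
    """General index formula: weighted sum of member prices over a divisor."""
    return sum(w * s for w, s in zip(weights, prices)) / divisor

prices = [250.0, 80.0, 40.0]   # price S_i of one tradable unit of each member
weights = [1.0, 1.0, 1.0]      # w_i = 1 for all members: a price-weighted index
divisor = 0.37                 # D, chosen at initiation to normalize the level

print(round(index_value(prices, weights, divisor), 2))  # 1000.0
```

The divisor is what lets an index start at a round level such as 1,000 regardless of the member prices.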

5.4.1.1 Weighting
Various methods are used for determining the weight of individual members in the index.
Within the same category of members there can be subcategories.

• Market capitalization weighting: The members are weighted proportionally to the
total market value of the asset issuer, i.e., w_i depends on the size of the
company in the equity case. There, this corresponds to the number of
outstanding free-floating shares multiplied by the share price. Subgroups of this
weighting arise if weights are capped at some level, or if free float is not taken
into consideration. This is the most common form of weighting for public
indices and the rule for indices such as the S&P, FTSE, MSCI, and SMI.

• Equal weighting 1 (price weighting): The weight assigned to different assets is
the same. As a consequence, the price of a tradable unit of the asset has a
determining effect on the weight of an asset in the index. The Dow 30 and Nikkei 225
indices are calculated using this weighting scheme.

• Equal weighting 2 (currency weighting): The CHF amount assigned to each asset
is the same, i.e., S_i w_i is equal across assets. This means that if CHF 500 is
to be invested in a basket of 10 assets, the amount bought of each asset would be
CHF 50.

• Share weighting: The members are weighted proportionally to the total number of
tradable units issued, i.e., w_i depends on the number of shares outstanding
in the equity case.

• Attribute weighting: The members are weighted according to their ranking score in
the selection process. If the ranking is based on ethical and environmental criteria,
and asset Y has a score of 75 and asset X of 25, then the weight ratio between asset Y
and asset X will be Weight Y / Weight X = 3.

• Hybrid or custom weighting: The weighting scheme can be a combination of the
above alternatives or something entirely new, for example based on a client's request.

Free float is the portion of total shares held for investment purposes, as opposed
to shares held for strategic purposes, i.e., for control. Some indices, e.g. the MSCI,
are quoted using different weighting schemes. However, the main quoted value uses the
market capitalization weighting method.

Remark:
The difference between the asset weighting scheme and the weight of an asset in the
index is as follows. For a price-weighted index, w_1 = w_2 for assets 1 and 2. However,
if S_1/S_2 = 3, the weight of asset 1 in the index will be three times larger than the weight of
asset 2.
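The weighting schemes above can be compared on a toy two-asset universe; the share counts and prices below are made up for illustration:

```python
# Three weighting schemes from the list above, with hypothetical data.
shares = {"A": 100e6, "B": 400e6}   # free-floating shares outstanding
price = {"A": 300.0, "B": 30.0}     # unit prices S_i

# Market capitalization weighting: w_i proportional to shares * price.
mcap = {k: shares[k] * price[k] for k in price}
total = sum(mcap.values())
w_mcap = {k: v / total for k, v in mcap.items()}   # A: ~0.714, B: ~0.286

# Equal weighting 1 (price weighting): the same w_i for every member, so the
# asset's *price* drives its effective weight in the index (remark above).
w_price = {k: 1.0 for k in price}

# Equal weighting 2 (currency weighting): S_i * w_i equal across members,
# e.g. CHF 50 invested per asset -> w_i = 50 / S_i.
w_chf = {k: 50.0 / price[k] for k in price}

print({k: round(v, 3) for k, v in w_mcap.items()})
```

Note how asset A, with fewer shares but a much higher price, dominates under both market-cap and price weighting, while currency weighting equalizes the invested amounts by construction.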

5.4.1.2 Divisor
The divisor is a crucial part of the index calculation. At initiation it is used for normal-
izing the index value. For instance, the initial SMI divisor in June 1998 was chosen as
the value that normalized the index to 1,500. However, the main role of the divisor is
to remove the unwanted effects of corporate actions and index member changes on the
index value. It ensures continuity in the index value in the sense that changes in the
index should stem only from investor sentiment and not originate from "synthetic"
changes. Which corporate actions need to be accounted for by changing the divisor
value depends on the weighting scheme used for the index.
As an example, consider the effect of a stock split for:

• Market capitalization weighting: The price of the stock will be reduced, but the number
of free-floating shares will increase. These two effects offset each other and no
change has to be made to the divisor.

• Equal weighting 1 (price weighting): The stock price reduction will have an effect,
but the number of free-floating shares has no impact on such a weighting. Therefore,
the divisor has to be changed to a lower value in order to avoid a discontinuity in
the index value.

It is important to have a good understanding of the influence of common corporate actions
such as splits, dividends, spin-offs, mergers & acquisitions, rights offerings, bankruptcy, etc.
on the index value so that index value continuity can be ensured.
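The divisor adjustment for a split in a price-weighted index can be sketched numerically; the three member prices and the initial divisor below are hypothetical:

```python
# Hypothetical three-member price-weighted index; member 0 has a 2-for-1 split.
prices = [120.0, 60.0, 20.0]
D = 2.0
level_before = sum(prices) / D      # 100.0

prices[0] /= 2                      # split: price halves, share count doubles
# For market-cap weighting the market cap is unchanged: no divisor change.
# For price weighting the divisor must be rescaled to keep the level continuous:
D_new = sum(prices) / level_before  # solve sum(prices) / D_new = level_before
level_after = sum(prices) / D_new

print(D_new, round(level_after, 6))  # 1.4 100.0
```

The divisor drops from 2.0 to 1.4 so that the split itself leaves the index level untouched; only subsequent price moves change the index.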

5.4.1.3 Return Type


How dividends are handled in the index calculation determines the return type of
the index. There are three versions of how dividends can be incorporated into the index
value calculation:

• Price return index: No consideration is given to the dividend amount paid out by
the assets. The day-to-day change in the index value reflects the change in the
asset prices.

• Total return index: The full amount of the dividend payments is reflected in the
index value. This is done by adding the dividend amount on the ex-dividend date to
the asset price. Thus, the index value 'acts' as if all dividend payments were
reinvested in the index.

• Total return index after tax: The dividend amount used in the index calculation
is the after-tax amount, i.e., the net cash amount. In contrast, the total return
index uses the gross dividend amount.
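The difference between a price and a total return index can be illustrated with a short sketch; the price path and the single dividend below are invented for the example:

```python
# Price vs. total return index from daily prices and dividends (toy data).
prices = [100.0, 101.0, 99.0, 102.0]
dividends = [0.0, 0.0, 2.0, 0.0]   # a dividend of 2 goes ex on day 2

pri = tri = 100.0
for t in range(1, len(prices)):
    pri *= prices[t] / prices[t - 1]                    # price return only
    tri *= (prices[t] + dividends[t]) / prices[t - 1]   # dividend reinvested

print(round(pri, 2), round(tri, 2))  # 102.0 104.06
```

The price return index sees the ex-dividend price drop as a loss, while the total return index adds the dividend back and therefore ends higher.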

5.4.1.4 Value Fixing


Another set of rules that characterizes an index calculation concerns the data values used
and the frequency with which they are used. An index value is usually calculated in real time or once
a day. The values needed for the index calculation can be quoted in various
versions. For example, the most important value is the asset price. It has to be defined
whether the value used is the mid price, the bid or ask price, the last trade price, or any other
price value provided.

In addition, if the index constituents have a wide geographical span, there are other
issues that need to be taken into consideration. Some of the rules that need to be defined are:
the index value quotation currency, the source of currency rates, index opening and closing hours,
and the treatment of assets registered on multiple exchanges. For most major indices the quotation is in real
time and the currency rates used are also real time. The opening hour for the constructed
index starts with the opening of the exchange of any index member, and the closing occurs
when no index member exchange is open. A global index, with constituents from
Japan to the USA, would therefore be "open" most hours of the day.

5.4.2 Capital Weighted Index Funds


Index funds are used to gain access to (globally) diversified equity market performance.
Traditionally, these indices are constructed using capitalization weights (CWs). In recent
years, new types of weights have been considered. These alternative methods are often
called smart beta approaches. The rationale for CW is the CAPM: all investors hold
the CW market portfolio. The second theoretical input is the efficient market hypothesis
(EMH). These two theoretical streams were the foundation for cost-effective, passive in-
vestment in CW instruments: McQuown developed the first index fund - at Wells Fargo
- in 1970.

One must distinguish between the theoretical index and a strategy that replicates
the theoretical index using securities. The theoretical index is not an investable asset or
security. If φ_{i,t} denotes the weight of asset i in the index at time t and R_{i,t} the gross
return of the asset in the period t−1 to t, the index value I_t satisfies the dynamics

I_t = I_{t-1} \left( \sum_{k=1}^{N} \phi_{k,t} R_{k,t} \right), \quad I_0 = 100. \qquad (5.3)

The value of the index tomorrow equals today's value times the return of each
stock generated until tomorrow, weighted by the asset weight. The index fund F_t aims
to replicate (5.3) by investing in the stocks. At each date t the fund holds a number n_{k,t}
of stocks k, and F_t is equal to the sum over all stocks of these numbers times their prices P_{k,t}. The difference
between the values F_t and I_t is the tracking error; the accuracy of the replication
is often measured with the volatility of the tracking error.

Example

The tracking error (TE) can be calculated directly or indirectly. Consider the follow-
ing returns for a portfolio and its benchmark (the market portfolio).

Period [month]   Portfolio   Market   Return difference

 1    0.37%    0.53%   -0.16%
 2   -1.15%   -1.36%    0.21%
 3   -1.81%   -1.43%   -0.38%
 4   -0.04%   -0.34%    0.30%
 5   -1.22%   -1.59%    0.37%
 6    0.08%   -0.30%    0.37%
 7    1.18%    1.12%    0.07%
 8   -0.52%   -0.39%   -0.13%
 9    1.83%    1.94%   -0.11%
10   -0.70%   -0.36%   -0.33%
11   -0.66%   -0.60%   -0.06%
12   -1.60%   -1.85%    0.25%
σ                 1.10%    1.14%    0.27%
σ_{1y} = σ·√12    3.80%    3.93%    0.92%

Table 5.6: Direct tracking error calculation. The annualized TE is 0.92%.

The indirect method uses the following replication of the tracking error: the TE position
is equivalent to buying the portfolio and selling the benchmark. We can use the general variance
formula for two random variables with the weights φ_1 = +1 and φ_2 = −1:

σ² = σ_1² + σ_2² − 2ρσ_1σ_2.

The TE is equal to σ. The sample covariance of the two return series is 0.012 percent. Dividing by
the product of the two volatilities, the correlation ρ ≈ 0.97 follows. This gives
the TE per period, and scaling it with the square-root law, the annualized TE of 0.92%
follows.
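Both routes can be checked numerically. The sketch below uses the monthly returns from Table 5.6 and sample statistics (division by n−1); small rounding differences relative to the table's two-decimal figures are expected:

```python
# Direct and indirect tracking error for the Table 5.6 returns (in percent).
import math

port = [0.37, -1.15, -1.81, -0.04, -1.22, 0.08, 1.18, -0.52, 1.83, -0.70, -0.66, -1.60]
mkt  = [0.53, -1.36, -1.43, -0.34, -1.59, -0.30, 1.12, -0.39, 1.94, -0.36, -0.60, -1.85]
n = len(port)

def sd(x):
    m = sum(x) / len(x)
    return math.sqrt(sum((v - m) ** 2 for v in x) / (len(x) - 1))

# Direct method: volatility of the return differences, annualized with sqrt(12).
diff = [p - m for p, m in zip(port, mkt)]
te_direct = sd(diff) * math.sqrt(12)

# Indirect method: sigma^2 = s1^2 + s2^2 - 2*rho*s1*s2 (weights +1 and -1).
s1, s2 = sd(port), sd(mkt)
mp, mm = sum(port) / n, sum(mkt) / n
cov = sum((p - mp) * (m - mm) for p, m in zip(port, mkt)) / (n - 1)
rho = cov / (s1 * s2)
te_indirect = math.sqrt(s1 ** 2 + s2 ** 2 - 2 * rho * s1 * s2) * math.sqrt(12)

print(round(te_direct, 2), round(te_indirect, 2), round(rho, 2))
```

The two methods agree by construction, and recomputing the correlation from the raw returns gives ρ ≈ 0.97 and an annualized TE of roughly 0.93%, consistent with the rounded 0.92% in the table.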

Example

This example follows ZKB (2013). Examples of capital-weighted indices include
the S&P 500, FTSE, MSCI, and SMI. Other indices use equal weighting (EW); the Dow
Jones 30 and Nikkei 225 are both equally weighted indices. Further types include share
weighting and attribute weighting. In attribute weighting the weights are chosen
according to their ranking score in the selection process. If the ranking is based on
ethical and environmental criteria, and asset Y has a score of 75 and asset X of 25, then
the weight ratio between asset Y and asset X will be 3.

The divisor is a crucial part of the index calculation. At initiation it is used for
normalizing the index value. The initial SMI divisor in June 1998 was chosen as the value
that normalized the index to 1,500. However, the main role of the divisor is to remove the
unwanted effects of corporate actions and index member changes on the index value. It
ensures continuity in the index value in the sense that changes in the index should stem only
from investor sentiment and not originate from 'synthetic' changes. The impact of
corporate actions depends on the weighting scheme used for the index. Consider a stock
split for an index with:

• Market capitalization weighting - The price of the stock will be reduced and the
number of free-floating shares increases. These two effects offset each other and no
change has to be made to the divisor.

• Equal weighting (price weighting) - The stock price reduction will have an effect,
but the number of free-floating shares has no impact on such a weighting. Therefore,
the divisor has to be changed to a lower value in order to avoid a discontinuity in
the index value.

How dividends are handled in the index calculation determines the return type of
the index. There are three versions of how dividends can be incorporated into the index
value calculation:

• Price return index - No consideration is given to the dividend amount paid out
by the assets. The day-to-day change in the index value reflects the change in the
asset prices.

• Total return index - The full amount of the dividend payments is reflected in the
index value. This is done by adding the dividend amount on the ex-dividend date
to the asset price. Thus, the index value acts as if all dividend payments were
reinvested in the index.

• Total return index after tax - The dividend amount used in the index calculation is
the after-tax amount; that is to say, the net cash amount.

The relative weights φ are, for a CW index, defined by

\phi_{k,t} = \frac{M_{k,t} P_{k,t}}{\sum_{j=1}^{N} M_{j,t} P_{j,t}} \qquad (5.4)

with M the number of outstanding shares. The numerator is the market capitalization
of stock k and the denominator is the market capitalization of the index. The weights φ
can change as follows, where we write MC for the index market capitalization:

\Delta\phi_{k,t} = \frac{\Delta M_{k,t}\, P_{k,t}}{MC} + \frac{\Delta P_{k,t}\, M_{k,t}}{MC} - \frac{M_{k,t} P_{k,t}\, \Delta MC}{(MC)^2}. \qquad (5.5)

The three possible changes of the weights reflect changes in the outstanding shares,
price changes, and changes in the index market capitalization. The second change is
the most important one; the other two are more constant in nature. If the numbers of
outstanding shares are constant over time, the same holds true for the numbers of shares
n_{k,t} that are needed to construct the fund. This is one of the main reasons why CW is often used:
the constancy of the share holdings implies low trading costs. This reason and the simplicity of
the CW approach have made it the favorite index construction method.
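The low-turnover property of CW replication can be made explicit in a short sketch; the share counts, prices, and fund size below are hypothetical:

```python
# CW weights per (5.4) with hypothetical share counts and prices.
M = [50e6, 20e6, 10e6]          # outstanding shares M_k
P = [40.0, 100.0, 80.0]         # prices P_k

mcap = [m * p for m, p in zip(M, P)]
MC = sum(mcap)                  # index market capitalization
phi = [x / MC for x in mcap]    # weights phi_k = M_k P_k / MC

# A replicating fund with wealth F holds n_k = phi_k * F / P_k shares of stock k.
# Substituting (5.4): n_k = M_k * F / MC, i.e. proportional to M_k alone --
# if the share counts M_k stay fixed, price moves require no rebalancing.
F = 1e6
n = [w * F / p for w, p in zip(phi, P)]
print([round(w, 3) for w in phi])   # [0.417, 0.417, 0.167]
```

This is the algebraic reason behind the low trading costs of CW indexing mentioned above: the fund's share holdings track the (slowly changing) share counts, not the prices.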

5.4.3 Risk Weighted Index Funds


There are reasons why one searches for alternatives to the CW approach: the rejection
of the CAPM and the procyclical behavior of the CW strategy. Suppose that one single stock
in the CW index formula (5.4) outperforms all others at a very high rate. Then, over time, the
weights will be concentrated in this single stock. Diversification is lost and the
index construction turns into a concentration of idiosyncratic risk, with the
correspondingly large drawdown risk that such constructions have shown in the past.

Alternative weighting schemes - smart beta approaches - weight the indices not by
their capital weights but either by other weights that should measure the economic
size of companies better (fundamental indexation) or by risk-based indexation. Most
often, investors will use a mixture of CW and alternative schemes. A first requirement for
such a mix is that the two approaches show a low correlation. Fundamental indexation
serves the purpose of generating alpha to dominate the CW approach, while risk-based
constructions focus on diversification.

Examples of risk-weighted allocations are EW, MV, MDP, and ERC. Roncalli
(2014) compares the different methods for the Euro Stoxx 50 index using data from
December 31, 1992, to September 28, 2012. He computes the empirical covariance matrix
using daily returns and a one-year rolling window; rebalancing takes place on the first
trading date of each month and all risk-based indices are computed daily as a price index;
see Table 5.7.

CW EW MV MDP ERC
Expected return p.a. 4.47 6.92 7.36 10.15 8.13
Volatility 22.86 23.05 17.57 20.12 21.13
Sharpe ratio 0.05 0.16 0.23 0.34 0.23
Information ratio - 0.56 0.19 0.42 0.62
Max. drawdown -66.88 -61.67 -56.04 -50.21 -56.85

Table 5.7: Statistics (in percent) for the different index constructions of the Euro Stoxx 50. CW is
capital weighting, EW is equal weighting, MV is minimum variance, MDP is most
diversified portfolio, and ERC is equal risk contribution (Roncalli [2014]).
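As a small illustration of one risk-based scheme, the two-asset equal risk contribution (ERC) portfolio has closed-form weights proportional to inverse volatilities, independently of the correlation; the volatilities and correlation below are assumed values (for more than two assets ERC weights require a numerical solution, see Roncalli [2014]):

```python
# Two-asset ERC sketch: inverse-volatility weights equalize the risk
# contributions RC_i = w_i * (Sigma w)_i for any correlation rho.
vol = [0.10, 0.30]      # assumed asset volatilities
rho = 0.4               # assumed correlation

inv = [1.0 / v for v in vol]
w = [x / sum(inv) for x in inv]        # ERC weights, here [0.75, 0.25]

cov = [[vol[0] ** 2, rho * vol[0] * vol[1]],
       [rho * vol[0] * vol[1], vol[1] ** 2]]
sigma_w = [sum(cov[i][j] * w[j] for j in range(2)) for i in range(2)]
rc = [w[i] * sigma_w[i] for i in range(2)]

print([round(x, 2) for x in w], [round(x, 6) for x in rc])
```

The low-volatility asset gets the larger weight, which is the diversification logic behind the risk-based columns of Table 5.7.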

5.4.4 ETFs
Exchange traded funds (ETFs) are a mixture of open- and closed-end funds; the main
source for this section is Deville (2007). They are hybrid instruments that combine the advantages of
both fund types. Mutual funds must buy back their units for cash, with the disadvantage
that investors can only trade once a day at the NAV computed after the close. Further-
more, the trustee needs to keep a fraction of the portfolio invested in cash to meet
possible redemption outflows. Closed-end funds avoid this cash problem. But since it is not
possible to create or redeem fund shares, there is no way to react to changes in the
demand for shares in such funds: if there are strong shifts in demand, price reactions
follow, such as significant premiums or discounts with respect to their NAV.

ETFs trade on the stock market, and shares can be created or redeemed directly from
the fund due to the in-kind creation and redemption process.

The in-kind process idea is due to Nathan Most. ETFs are organized like commodity
warehouse receipts: the physicals are delivered and stored while only the receipts
are traded, although holders of a receipt can take delivery. This 'in-kind' principle -
securities are traded for securities at creation and redemption - has been extended from
commodities to stock baskets; see Figure 5.3.

The figure illustrates the dual structure of the ETF trading process, with a primary mar-
ket open to institutional investors (authorized participants, APs) for the creation and redemption of ETF shares
directly from the fund. The ETF shares are traded on a secondary market. The perfor-
mance earned by an investor who creates new shares and redeems them later is equal to
the index return less fees, even if the composition of the index has changed in the mean-
time. Only authorized participants can create new shares of specified minimal amounts
(creation units). They deposit the respective stock basket plus an amount of cash into
the fund and receive the corresponding number of shares in return. ETF shares are
not individually redeemable. Investors who want to redeem are offered the portfolio of
stocks that make up the underlying index plus a cash amount in return for creation units.

Since ETFs are negotiated on two markets - the primary and the secondary market - an ETF has

[Figure 5.3: schematic of the primary market (ETF sponsor/fund, institutional investors,
authorized participants; creation and redemption of ETF shares against stock basket plus cash)
and the secondary market (market makers, exchange, buyers and sellers of ETF shares).]

Figure 5.3: Primary and secondary ETF market structure where the 'in-kind' process for
the creation and redemption of ETF shares is shown. Market makers and institutional
investors can deposit the stock basket underlying an index with the fund trustee and
receive fund shares in return. These created shares can be traded on an exchange like
simple stocks or later redeemed for the stock basket then making up the underlying
index. Market makers purchase the basket of securities that replicates the ETF index
and deliver it to the ETF sponsor. In exchange, each market maker receives ETF
creation units (50,000 shares or multiples thereof). The transaction between the market maker
and the ETF sponsor takes place in the primary market. Investors who buy and sell the
ETF then trade in the secondary market through brokers on exchanges. (Adapted from
Deville [2007] and Ramaswamy [2011].)

two prices: the NAV of the shares in the primary market and their market price in the
secondary market. These two prices may deviate from each other if there is pressure
to sell or buy. The 'in-kind' creation and redemption helps market makers to absorb
such liquidity shocks on the secondary market, either by redeeming outstanding shares or by
creating new ones. It also ensures that departures between the two prices do not become too large,
since authorized participants in the primary market could arbitrage any sizable differ-
ences between the ETF and the underlying index component stocks. If the secondary
market price is below the NAV, APs can buy cheap ETF shares in the secondary market, take
on a short position in the underlying index stocks, and then ask the fund manager to
redeem the ETF shares for the stock basket before closing the short position at a profit. Since
ETF fund managers do not need to sell any stocks on the exchange to meet redemptions,
they can fully invest their portfolio, and creations do not cause any additional costly
trading within the fund. Finally, in the US, 'in-kind' operations are a nontaxable event.
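The AP arbitrage that keeps the two prices close can be reduced to a toy calculation; the NAV and market price below are assumed numbers:

```python
# Toy sketch of the AP arbitrage when the ETF trades below its NAV.
nav = 100.0         # value of the underlying basket per ETF share (primary market)
etf_price = 99.0    # secondary-market price, trading at a discount to NAV

# If price < NAV: buy the ETF share cheap on the exchange, short the basket
# (worth NAV), redeem the share in kind for the basket, close the short.
profit_per_share = nav - etf_price
print(profit_per_share)   # 1.0 per share, before trading costs and fees
```

The same logic runs in reverse (create shares, sell them, buy the basket) when the ETF trades at a premium, so sizable deviations are arbitraged away in both directions.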

Most ETFs track an index and are passively managed. ETFs generally provide diver-
sification, low expense ratios, and the tax efficiency of index funds, while still maintaining
all the features of ordinary stock, such as limit orders, short selling, and options. ETFs
can be used as a long-term investment for asset allocation purposes and also to im-
plement market-timing investment strategies. All of these features rely on the specific
'in-kind' creation and redemption principle described above. ETFs are constructed by
index providers, exchanges, or index fund managers (the originators).

The costs of an ETF have two components: transaction costs and the total expense ratio
(TER). Transaction costs are divided into explicit and implicit costs. Explicit trans-
action costs include fees, charges, and taxes for the settlement by the bank and the
exchange. Implicit costs are bid-ask spreads and costs incurred due to adverse mar-
ket movements. ETFs can be constructed by direct replication (physical) or by using a
swap-backed construction (synthetic). Physical replication is a transparent approach
with low counterparty risk (which arises from securities lending). Physical replication
can be expensive for tracking broad emerging market equity or fixed income indices.
Commodity ETFs and leveraged ETFs do not necessarily employ full replication because
the physical assets are either difficult to store or to leverage. Using only a subset
of the underlying index securities for physical replication leads to a significant tracking
error in returns between the ETF and the index. In a swap-backed construction, the
performance of a basket is exchanged between the ETF and the swap counterparty.

Trends in ETF investment arise from regulation and investors' preferences. From a reg-
ulatory perspective there have been barriers for active managers due to the Retail
Distribution Review (RDR) in the UK and MiFID II in the euro zone. Growth
in passive strategies will also be driven by cost transparency and the search for cheap
investments. New uses for ETFs will emerge as well: institutions will use them to get
access to specific asset class or geographic exposures, and retail investors will invest in
ETFs as a lower-cost alternative to mutual funds and UCITS funds. Finally, recent trends
are to construct ETFs not on a CW basis but on a risk-weighted one using
risk parity methods, and to focus on risk factors instead of asset classes as underlying
instruments.

5.4.4.1 Unfunded Swap-Based Approach


In the swap-based approach one invests indirectly in a basket by achieving the index
performance via a total return swap (TRS); see Figure 5.4. The ETF sponsor pays cash
to the swap counterparty and indicates which index matters for the ETF. The
swap counterparty is often the parent investment bank of the ETF sponsor. The TRS
swaps the index return against a basket return - that is to say, the ETF sponsor receives
the desired index return needed for the ETF and delivers a basket return to the swap
counterparty. The basket should be close to the index; the closer it is, the lower the
tracking error borne by the swap counterparty. The swap counterparty delivers a basket
of securities to the ETF sponsor as collateral for the cash paid.

This approach minimizes the tracking error for the ETF investor and enables more
underlyings to be accessed. The basket of securities used as collateral is typically not
related to the basket delivered to the swap counterparty, which mimics the index. Why
should an investment bank, as swap counterparty, enter into such a contract? See the next
example.

Example

Assume that three securities - S_1, S_2, and S_3 - make up an index I. The weights
of S_1 and S_2 are each 48%, and S_3 contributes 4% to the index. The ETF sponsor
delivers the basket consisting of assets S_1 and S_2 only to the swap counterparty. The
missing S_3 asset is the tracking error source. The swap counterparty (the investment
bank, IB) delivers to the ETF sponsor seven securities, C_1, ..., C_7, as collateral. These
assets are in the inventory of the IB due either to its market-making activities or to the
issuance of derivatives, i.e. business that is not related to ETFs. When these securities
C_i are less liquid, they have to be funded either in unsecured markets or in repo
markets with deep haircuts. The IB has, for example, to pay 120% for a security C_i that
is worth only 100% at a given date. By transferring these securities to the ETF sponsor,
the IB may benefit from reduced warehousing costs for these assets. Part of these cost
savings may then be passed on to the ETF investors through a lower total expense ratio
for the fund holdings. The cost savings accruing to the investment banking activities
can be directly linked to the quality of the collateral assets transferred to the ETF
sponsor. A second possible benefit for the IB is lower regulatory and internal economic
capital requirements, since the regulatory charge for the less liquid securities C_i is larger than
for the more liquid securities S_1 and S_2 in the basket delivered by the ETF sponsor.
Summarizing, a synthetic swap has a positive impact on the security inventory costs of
the IB arising from non-ETF business and on regulatory or internal economic risk capital
charges.

The drawbacks of synthetic swaps are counterparty risk and documentation require-
ments (International Swaps and Derivatives Association [ISDA]).

5.4.4.2 ETFs for Different Asset Classes


The first and most popular ETFs track broad stock indices, sector indices, or specific niche
areas like green power. The evolution of ETFs by region between 2010 and 2013 (World
Federation of Exchanges [2014]) shows the dominance of the Americas with around 90%
of the traded ETF volumes, followed by Asia and Europe with 5% and 6%, respectively. The
size in Europe declined over this period whereas the size in Asia doubled.

Figure 5.4: Unfunded swap ETF structure (Ramaswamy [2011]).

Worldwide ETF assets were USD 9,670 bn in 2010 and USD 11,893 bn in 2013.

Bond ETFs typically face huge demand when stock markets are weak, such as during
recessions. An asset rotation from stocks to bonds is often observed in such
cases. Figure 5.5 shows bond inflows of USD 800 billion and equity redemptions in long-
only equities (LO equities) after the GFC. In recent years an opposite rotation began
due to close-to-zero interest rates.

Commodity ETFs invest in oil, precious metals, agricultural products, etc. The idea
of a gold ETF was conceptualized in India in 2002. At the end of 2012 the SPDR Gold
Shares ETF was the second-largest ETF. Rydex Investments launched the first cur-
rency ETF in 2005. These funds are total return products where the investor gets access to the
FX spot change, local institutional interest rates, and a collateral yield.

Actively managed ETFs have been offered in the United States since 2008. Initially, they
grew faster than index ETFs did in their first three years. But the growth rate was not
sustainable: the number of actively managed ETFs has not grown for several years.
Many academic studies question the value of active ETF management, since it faces
the same skill-versus-luck issue as mutual funds and much higher costs than static ETFs.

5.4.4.3 Leveraged ETFs (LETFs)


Leveraged ETFs (LETFs) or inverse leveraged ETFs use derivatives to seek a return that
corresponds to a multiple of the unleveraged ETF's return.

Figure 5.5: Bond inflows and equity redemptions (BoA Merrill Lynch Global Investment
Strategy, EPFR Global [2013]).

LETFs require financial engineering
techniques in their construction and life cycle management to achieve the desired
return. Trading futures contracts is a common way to construct leveraged ETFs. Re-
balancing and re-indexing of LETFs can be costly in turbulent markets. LETFs deliver
positive or negative multiples of a benchmark's return on a daily basis. Several empirical
studies show that LETFs deviate significantly from their underlying benchmark. This
tracking error has two main causes - a compounding effect and a rebalancing effect (Dobi
and Avellaneda [2012]).

Example

Consider a LETF with a positive leverage factor of 2 (bullish leverage); we follow Dobi
and Avellaneda (2012). There are three time periods 0, 1, 2 in the example (see Table
5.8). The index value of the ETF starts at 100, loses 10%, and then gains 10%.

Time grid              t_0      t_1−     t_1+     t_2−     t_2+

Index value            100          90                99
AuM                    1,000        800               960
TRS exposure needed    2,000        1,600             1,920
Notional TRS           2,000    1,800    1,600    1,760    1,920
Exposure adjustment    0        -        -200     -        +160

Table 5.8: Data for the leveraged ETF example. t_{k,−} denotes the time t_k before the adjust-
ment of the TRS and t_{k,+} the time after the adjustment.

The initial AuM is USD 1,000 at day 0, and the AuM is USD 800 at day 1 due to
the 10% drop on day 1:

USD 800 = 1,000 × (1 − 2 × 0.1).

This implies a required TRS exposure of 2 × 800 = USD 1,600. The notional value of the
TRS from day 0 has become, at day 1,

USD 2,000 × (1 − 0.1) = 1,800.

This is the exposure before adjustment. Since the exposure needed at day 1 is USD 1,600,
the swap counterparty must sell (short the synthetic stock) USD 200 = 1,800 − 1,600
of TRS notional. Doing the same calculation for day 2, the AuM is USD 960 and the exposure
needed is USD 1,920. Similarly, on day 2 the swap counterparty must buy a
TRS amount of USD 160 = 1,920 − 1,760, where USD 1,760 = 1,600 × (1 + 0.1) is the
exposure before adjustment.
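The rebalancing mechanics of Table 5.8 can be reproduced step by step in a short loop:

```python
# Rebalancing of the 2x bull LETF from Table 5.8: index -10% then +10%.
leverage = 2.0
aum = 1000.0
notional = leverage * aum       # TRS notional at day 0: 2,000

log = []
for r in [-0.10, 0.10]:         # index returns on day 1 and day 2
    aum *= 1 + leverage * r     # fund NAV moves with twice the index return
    notional *= 1 + r           # the old TRS notional drifts with the index
    needed = leverage * aum     # target exposure after the move
    log.append((round(aum), round(notional), round(needed - notional)))
    notional = needed           # swap is adjusted back to the target exposure

print(log)   # [(800, 1800, -200), (960, 1760, 160)]
```

The adjustment is negative after the down day (sell low) and positive after the up day (buy high), which is exactly the procyclical rebalancing discussed in the text.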

Example

We consider the compounding problem for a LETF. Fix an index and a two-time
LETF, both beginning at 100. Assume that the index rst rises 10% to 110 and then
drops back to 100, a drop of 9.09%. The LETF will rst rise 20% and then drop 18.18% =
2 × 9.09%. But 18.18%120 = 21.82. Therefore, while the index has value 100, the LETF
is at 98.18. which implies a loss of 1.82%. Such losses always occur for LETF when the
underlying index value changes direction. The more frequent such directional changes
are - hence it is a volatility eect - the more pronounced the losses.
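The compounding arithmetic above can be checked directly:

```python
# Compounding effect: index goes 100 -> 110 -> 100, yet the 2x LETF ends lower.
index = [100.0, 110.0, 100.0]
letf = 100.0
for t in range(1, len(index)):
    r = index[t] / index[t - 1] - 1   # daily index return: +10%, then -9.09%
    letf *= 1 + 2 * r                 # LETF delivers twice the *daily* return

print(round(letf, 2))   # 98.18, a 1.82% loss although the index is flat
```

Because the leverage multiplies daily returns, not cumulative ones, every direction change eats into the LETF value, and the effect grows with volatility.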

These examples illustrate that a LETF always rebalances in the same direction as
the underlying index, regardless of whether the LETF is a bullish one (positive leverage)
or a bearish one (negative leverage). The fund always buys high and sells low in order to
maintain a constant leverage factor. A similar result holds for inverse LETFs.

5.4.5 Evolution of Expense Ratios for Actively Managed Funds, Index Funds and ETFs

Figure 5.6 shows the evolution of expense ratios for actively managed funds and index
funds.

[Figure 5.6 here: line chart, 2000-2013, of expense ratios in bps p.a. for actively managed
bond funds, index bond funds, actively managed equity funds, and index equity funds.]

Figure 5.6: Expense ratios of actively managed (upper lines) and index funds (lower
lines) - bps p.a. (Investment Company Institute and Lipper [2014]).

The trend of decreasing fees continues, but for index funds a bottom level seems
to be close. Table 5.9 also considers ETF fees.

Equity Bonds
Mutual funds (*) 0.74% 0.61%
Index funds (*) 0.12% 0.11%
ETFs (**, ]) 0.49% 0.25%
ETF core (**,+) 0.09% 0.09%

Table 5.9: Fees p.a. in 2013 ((*) Investment Company Institute, Lipper; (**) DB
Tracker; (]) Barclays; (+) BlackRock).

5.5 Alternative Investments (AI) - Insurance-Linked Invest-


ments
AIs are often defined as investments in asset classes other than stocks, bonds, commodities,
currencies, and cash. These investments can be illiquid. We only consider insurance-linked
securities in the sequel. It is estimated that alternative investments will reach
USD 13 trillion by 2020, up from USD 6.9 trillion in 2014. One expects that more
and more investors can access AIs as regulators begin to allow them access to specific

regulated vehicles such as alternative UCITS funds in Europe and alternative mutual
funds in the US.

5.5.1 Insurance-Linked Investments


This section is based on LGT (2014). Insurance-linked investments are based on the
events of life insurers and of non-life insurers, such as insurers against natural catastrophes.
The main products are insurance-linked securities (ILS, such as CAT
bonds) and collateralized reinsurance investments (CRI). The size, in global terms, of
this relatively young market is USD 200 bn as of 2014. Regulation plays a significant
role in the use of alternatives: from a regulatory perspective, the creditworthiness of
insurance and reinsurance companies requires a large capital base for the catastrophe
cases. To reduce the capital charge under Solvency II, the catastrophe part of the risks
is transferred to the capital markets using ILS and CRI.

5.5.1.1 ILS
Insurance buyers such as primary insurers, reinsurers, governments, and corporates enter
into a contract with a special purpose vehicle (SPV). They pay a premium to the SPV
and receive insurance cover in return. The SPV finances the insurance cover with the
principal paid by investors. The principal is returned at the end of the contract if no
event has occurred. The investor receives, in addition to the principal repayment, the
premium and a collateral yield.

An example is the catastrophe or CAT bond 'Muteki'. The Muteki SPV provided the
insurance buyer Munich Re with protection against Japanese earthquake losses. Central
to ILS investing is the description of the events. The description has to be transparent,
unambiguous, measurable, verifiable, and comprehensive. The parametrization in Muteki
is carried out using parameters from the 1,000 observatories located in Japan that use
seismographs. 'Ground acceleration' is used to calculate the value of the CAT bond
index. This determines whether a payout from the investors to the insurance protection
buyers is due.5 Figure 5.7 shows the peak ground velocities measured during the 11
March, 2011 earthquake. The star indicates the epicenter; the regions with the highest
ground velocities also experienced the related tsunami.
The insurance industry lost an estimated USD 30-35 billion. The ground acceleration
data became available on 25 March, 2011. Multiplying the ground velocity chart by the
weight-per-station chart of Munich Re implied an index level for the CAT bond of 1,815
points. This index level led to a full payout from the investors to the insurance buyer
since the trigger level - that is to say, the level of the index at which a payout starts to
be positive - of 984 was exceeded, and the exhaustion level of 1,420 points
was also breached. Hence, investors in this CAT bond suffered a 100 percent loss.
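The trigger and exhaustion mechanics can be made concrete. The sketch below assumes the common linear payout schedule between the two levels (the function name is illustrative; the levels are those of Muteki as given above):

```python
def cat_bond_payout_fraction(index_level, trigger=984.0, exhaustion=1420.0):
    """Fraction of principal paid from investors to the protection buyer.

    Zero below the trigger level, full payout at or above the exhaustion
    level, and (by assumption) linear in between."""
    if index_level <= trigger:
        return 0.0
    if index_level >= exhaustion:
        return 1.0
    return (index_level - trigger) / (exhaustion - trigger)

# Muteki after the 11 March 2011 earthquake: index level 1,815 exceeds
# the exhaustion level, so investors lose 100 percent of the principal.
print(cat_bond_payout_fraction(1815.0))  # 1.0
```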

5 The exposure of Munich Re in Japan is not uniformly spread over the whole country. The insurer
therefore weights the signals of the measuring stations such that the payout in the CAT bond matches
the potential losses of Munich Re from claims incurred due to the event.

Figure 5.7: Ground velocities measured by Japan's 1,000 seismological observatories
during the earthquake of 11 March, 2011, which also caused a huge tsunami and almost
20,000 fatalities (Kyoshin [2011]).

5.5.1.2 CRI
In collateralized reinsurance investments (CRIs), the same insurance protection buyers as
for ILS buy insurance cover from an SPV in exchange for a premium. The SPV hands
over the premium and collateral yield to the investor. The investor pays, in cases where
he receives proof of loss, the loss payment to the SPV. Between the investor and the
insurance buyer a letter of credit is set up to guarantee the potential loss payment. Table
5.10 summarizes ILS and CRI product specifications. The ILS pays out if an event is

Parameter ILS CRI


Wrapping Fixed-income security Customized contract
Return Collateral yield + premium Collateral yield + premium
Term 12 to 60 months 6 to 18 months
Size USD 2 to 500 mn USD 2 to 50 mn
Liquidity Tradable asset; liquid Non-tradable asset
Market size for non-life risk (2014) USD 24 bn USD 35 bn

Table 5.10: Comparison between ILS and CRI investments (LGT [2014]).

realized and triggers are met; the bond then pays out. For the CRI, if an event is
realized and triggers are met, the investor makes a loss payment.

ILS and CRI comprise 13 percent and 18 percent, respectively, of total reinsurance
investments. The remainder is traditional uncollateralized reinsurance investments.
The cumulative issuance volume of CAT bonds and ILS, which started in 1995, reached
USD 20 bn in 2007, 40 bn in 2010, and 70 bn in 2015.6 Figure 5.8 shows the average
catastrophe bond and ILS expected loss and coupon by year.

Figure 5.8: Average expected coupon and average expected loss of CAT bonds and ILS
issuance by year (artemis.com [2015]).

The correlations with traditional asset classes are smaller than comparable correlations

ILS Govt bonds Corporate bonds Equities


ILS 100%
Govt bonds 8% 100%
Corporate bonds 25% 35% 100%
Equities 23% -22% 63% 100%

Table 5.11: Correlation matrix for different asset classes. Monthly data in USD from 31
Dec 2003 until 30 Nov 2014 (LGT [2014], Barclays Capital, Citigroup Index, Bloomberg).

between bonds and stocks, see Table 5.11. Nevertheless, correlation is weakly positive.
This is due to the fact that catastrophe events always have an impact on firm value in
both directions. The correlation with government bonds is much less affected and would

6 The main intermediaries or service providers to the catastrophe bond and insurance-linked securitization
market in 2014 were Aon Benfield Securities, Swiss Re Capital Markets, GC Securities, Goldman
Sachs, and Deutsche Bank Securities.

become stronger if a catastrophe event had a significant impact on the entire economy
of a nation.

5.6 Private Markets


Private markets (PM), compared to public markets, are characterized as follows. First,
the assets to invest in are not publicly traded. Second, shares can only be bought and sold
in large quantities. Third, information about the company in which the investment takes place
is more detailed than in public markets but only accessible to shareholders. Fourth,
shareholders are typically heavily involved and often hold a majority stake in the company.
This means that a private equity firm such as The Blackstone Group not only
buys shares for the investors but is heavily involved in the management of the company.7
Fifth, PM transactions are characterized by significant access to capital and to networks
with strong expertise.

The evolution of PM can be roughly classified into three periods. In the era 1970-1990,
private markets meant the emergence of leveraged buyouts (LBOs), focused on the US and on
the retail, chemical, and manufacturing sectors. A leveraged buyout means buying
a company using a combination of equity and debt, where the company's cash flow is used
to repay the borrowed money. Debt is used since its cost of capital is lower than for equity:
interest payments reduce the corporate income tax liability, whereas dividend payments
based on equity do not. The use of leveraged buyouts led to several defaults of firms
whose debt ratios were too high, which led banks to require lower debt-to-equity ratios.
In the period 1990-2010, private equity became broader in the industries invested
in (healthcare, education) and PM became a global activity. In the last period, starting
after the GFC, PM became broader with three pillars: debt, real estate, and infrastructure.
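The LBO mechanics described above - debt whose interest is paid from the company's cash flow, with the remainder amortizing principal - can be sketched numerically. All inputs below are illustrative, and taxes and fees are ignored:

```python
def lbo_equity_multiple(purchase_price, equity_share, annual_cash_flow,
                        interest_rate, years, exit_price):
    """Multiple on invested equity in a stylized LBO: each year the company's
    cash flow first covers interest, and the remainder repays debt."""
    equity = purchase_price * equity_share
    debt = purchase_price - equity
    for _ in range(years):
        interest = debt * interest_rate
        debt = max(debt - (annual_cash_flow - interest), 0.0)
    return (exit_price - debt) / equity

# Buy at 100 with 30% equity, cash flow of 10 p.a., 6% debt cost, exit at 120:
print(round(lbo_equity_multiple(100.0, 0.30, 10.0, 0.06, 5, 120.0), 2))  # 2.76
```

The debt paydown concentrates the exit value in the small equity slice, which is the source of both the high returns and the default risk when the debt ratio is too high.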

The low interest rate environment made private markets attractive for investors such
as pension funds, which before the GFC did not invest in these markets. A study by Towers
Watson in 2017 highlighted that 94% of the actual PM investors will increase or
maintain their private market allocations in the longer term. The AuM in PM steadily
increased from USD 1.5 tr in 2006 to more than USD 4 tr in 2017. Dry powder, however,
did not increase in the same period but dropped from around 40 percent before the GFC
to values between 30 and 35 percent in recent years. Dry powder refers to capital committed
by investors but not yet deployed, held in cash or highly liquid securities. If deal activity
falls and dry powder accumulates, a risky situation can emerge when investors add pressure
on the PM firm to deploy that capital, i.e., to do transactions they might not otherwise do.

A second observation relates PM and public markets over the last 25 years. First, the
number of publicly listed firms dropped from 7,322 in 1996 to 3,671 in 2016 (Credit Suisse;
Doidge et al. (2016)), and second, private firms stay private longer or even forever.
Facebook, for example, was founded in 2004 and had its IPO in 2012.

7 The ten largest PE firms in 2017 according to PEI Media are The Blackstone Group, Kohlberg
Kravis Roberts, The Carlyle Group, TPG Capital, Warburg Pincus, Advent International Corporation,
Apollo Global Management, EnCap Investments, Neuberger Berman and CVC Capital Partners.

Valuations of shares in PM and in public markets were both at historic highs in 2018.
The S&P 500 index increased by a factor of almost 2.5 in the preceding 6 years, and
EV/EBITDA multiples in PM also increased by around 40 percent in the same period, to
values of 14x for large caps and 12x for small and mid caps (Sources: S&P and Partners Group (2018)).

Figure 5.9 shows that operational value creation now drives performance more than
financial engineering, which is the opposite of the leveraged buyout period.

Figure 5.9: Drivers of performance in PM. (Partners Group [2017]).

A further significant tendency among investors is to abstain from excessive diversification
and instead to seek high-conviction portfolios. Excessive diversification was a result of a lack of
transparency, whereas high conviction is the result of experience and successful selection
of investments. In the past, institutional investors typically spread their PM investments
among hundreds of assets. Today, the most successful PM firms spread their investments
across only a few dozen assets.

Comparing returns in PM after the GFC with public markets, PM earned on average
roughly 3 to 4 percent higher returns than their public counterparts in equity, debt,
real estate, and infrastructure investments. Considering maximum drawdowns in the
period 2000-2015, the figures for PM are between 20 and 30 percent lower in the above
four classes compared to the public counterparts.

Some major players in PM have started to offer part of their PM offering to wealthy
private clients or affluent clients. This requires transforming some of the PM offerings
into public ones. Since many investors became familiar with PM in recent years, they
increased their allocations and invest globally. This requires PM firms to consider portfolio
construction techniques at a more sophisticated level than in the past.

5.7 Hedge Funds


5.7.1 What is a hedge fund (HF)?
HFs allow for private placement collective investments for mostly qualified investors. HFs
are an investment strategy and not an asset class in their own right since they often trade
in common liquid asset classes.8 HFs often use short positions, derivatives, and leverage
in their strategies. From a regulatory and tax perspective, HFs were often offshore
domiciled on certain islands/countries that offer tax advantages or have low regulation
standards. But regulation of hedge funds is changing: since 2012, HFs with assets
exceeding USD 150 million have to register and report information to the SEC.9 HFs
have to satisfy less stringent disclosure rules than mutual funds.

HFs often have a limited number of wealthy investors. If a HF restricts the number
of investors, it is not a registered investment company. It is then, in the US, exempt from
most parts of the Investment Company Act of 1940 (the 40-Act). Most HFs in the US
have a limited-partnership structure. The limitation of the number of investors automatically
increases the minimum investment amount to USD 1 million or more. Many
HFs do not allow investors to redeem their money immediately. The reason is the short
positions of the funds: to carry this risk, a HF needs to post margin. If losses on short positions
increase, HFs need to add more and more margin and would eventually face liquidity
problems if, at the same time, investors redeem their money. Mutual funds are not
allowed to earn non-linear fees, while most HFs do charge a flat management fee and a
performance fee (rule 2/20 for a 2% management fee and a 20% performance fee). The business
of running a hedge fund has become more expensive due to the increased regulatory
burden. KPMG (2013) outlines the following figures for the average set-up costs: USD
700,000 for a small fund manager, USD 6 million for a medium-sized one, and USD
14 million for the largest. In all, KPMG estimated hedge funds had spent USD 3 billion
meeting compliance costs associated with new regulation since 2008 - equating to, roughly,
a 10 per cent increase in their annual operating costs (KPMG (2013)).

8 The main sources are the hedge fund review of Getmansky, Lee, and Lo (2015) and Ang (2013).
9 Fatca, the Foreign Account Tax Compliance Act, is a US extraterritorial regime of hedge fund
regulation. It requires all non-US hedge funds to report information on their US clients. Europe's
Alternative Investment Fund Managers Directive (AIFMD) requires information from any fund manager,
independent of where they are based, if they sell to an EU-based investor.

HFs can face losses due to their construction or the market structure even in cases
where there are no specific market events. As Khandani and Lo (2007) state, quantitative
HFs faced a perfect financial storm in August 2007 in a normal market environment. The
Global Alpha Fund, managed by Goldman Sachs Asset Management, lost 30 percent in a
few days although it claimed to be designed for low-volatility and low-correlation strategies.
The HF received an injection of USD 3 billion to stabilize it.

5.7.2 Hedge Fund Industry


The first HF was set up by Jones in 1949. This fund was based on three principles.
First, it was not transparent how Jones was managing the fund. Second, there was a
performance fee of 20 percent, but no management fee. Third, the fund was set up as a
non-public fund. This framework is still applied by most HFs today.

The largest HFs in 2014 and 2017 are shown in Table 5.12. Total HF size in 2014 was
USD 2.85 trillion versus USD 2.6 trillion in 2013. The average growth in HF assets from
1990 to 2012 was roughly 14 percent per year. The decrease in AuM after the GFC
was fully recovered six years later. The losses incurred during the GFC were around 19
percent, which is only around half the losses of some major stock market indices. In the
period 2009 to 2012, HF performance was lower than the S&P 500, ranging between 4.8
percent and 9.8 percent on an annual basis. The decreases in AuM during the GFC and

Hedge Funds USD bn 2014 USD bn 2017 Growth 14-17 [%]


Bridgewater Associates USA 87.1 122.2 40
AQR Capital Management USA 29.9 69.9 134
J.P. Morgan Asset USA 59.0 45.0 -24
Renaissance Technologies USA 24.0 42.0 75
Two Sigma Investments/Advisers USA 17.5 38.9 122
D.E. Shaw USA 22.2 34.7 56
Millennium Management USA 21.0 33.9 61
Man Group, London UK 28.3 33.9 20
Och-Ziff Capital Management USA 36.1 33.5 -7
Winton Capital Management UK 24.7 32.0 30
Elliott Management Corporation USA 23.3 31.3 34

Table 5.12: Largest hedge funds. (Barclays Hedge Fund Database)

the European debt crisis from USD 2.1 tr to 1.5 tr show that investors allocate money
pro-cyclically to HFs, similar to mutual funds or ETFs. The following facts regarding
the largest HFs are from Milnes (2014) (the number after the hedge fund's name is its
ranking in the list of the world's largest HFs as of 2014).

• Bridgewater Associates (1). There was a relatively poor performance of the three
flagship funds in 2012 and 2013 of 3.5%, 5.25%, and 4.62%. The performance over
ten years is 8.6%, 11.8%, and 7.7%.

• J.P. Morgan Asset Management (2). J.P. Morgan bought the global multi-strategy
firm Highbridge Capital Management in 2004 for USD 1.3 billion. Highbridge's
assets have since multiplied by nearly 400 percent to USD 29 billion.

• Brevan Howard Capital Management (3). This HF maintains both solid returns
and asset growth - which is the exception for a HF. The flagship is a global macro-
focused HF (USD 27 bn AuM), which - since its launch in 2003 - has never lost
money on an annual basis.

• Och-Ziff Capital Management (4) offers publicly traded hedge funds in the US with
far greater disclosure than other HFs. Its popularity is mainly due to Daniel Och's
conservative investing style.

• BlueCrest Capital (5) was a spin-off from a derivative trading desk at J.P. Morgan
in 2000. It has grown rapidly and is one of the biggest algo hedge fund firms. Its
reputation was boosted in 2008 when it made large profits while most other HFs
faced losses.

• AQR Capital Management (7), co-founded by Cliff Asness, gives retail investors
access to hedge fund strategies. Asness is also well known for his critique of the
unnecessarily high fees charged by most HFs and for his scientific contributions.

• Man Group (9) was founded in 1783 by James Man as a barrel-making firm. It has
225 years of trading experience and 25 years in the HF industry. In recent years,
its flagship fund AHL struggled due to its performance.

• Baupost Group (11) is an unconventional, successful HF. Baupost avoids leverage,
is biased toward long trades, holds an average of a third of its portfolio in cash,
and charges only a 1 percent fee.

• Winton Capital Management (13) has its roots in the quant fund AHL (founded
in 1987 and bought by Man Group in 1989). David Harding, like many in the
quantitative trading field equipped with a math or physics education, was also a pioneer
in the commodity trading adviser (CTA) field. Winton is the biggest managed futures
firm in the world.

• Renaissance Technologies (15). The famous mathematician Jim Simons founded
Renaissance Technologies. Simons became the pioneer of quantitative analysis in
the hedge fund industry. Renaissance mainly relies on scientists and mathematicians
to write its moneymaking algorithms. It has been consistently successful over
the years.

The largest loss a HF has suffered was the USD 6 billion loss of Amaranth in 2006.
This loss, of around 65 percent of the fund's assets, was possible due to extensive leverage
and a wrongheaded bet on natural gas futures.

5.7.2.1 HF Strategies
An important selling argument for HFs is that their investments only weakly correlate
with traditional markets. Starting in 2000, the correlation between the MSCI World and the
broad DJ CS Hedge Fund Index (HF Index) changed on a two-year rolling basis: correlation
was 0.16 (HF Index) in the years 2000-2007 and jumped to 0.8 in 2007-2009
since a significant number of HF managers started in 2007 to invest traditionally in stocks
and commodities. Many HFs use similar strategies as in factor investing. The main
differences are the transparency of the latter, the implementation of the factors as indices, and
the construction of a cross-asset offering of factors. These advantages make it attractive
for investors to switch their investments from the more opaque and often more expensive
HFs to a factor portfolio.

5.7.3 CTA Strategy


CTA strategies are managed futures strategies where the HF invests in highly liquid,
transparent, exchange-traded futures markets and foreign exchange markets.10 Investments
are made in different markets following a rule-based investment strategy. The
predominant investment strategy is market-neutral trend following: there is no need for
any fundamental input nor for a forward-looking market opinion. The portfolio construction
is usually risk-weighted. Figure 5.10 shows the size evolution of the managed futures
industry.
The figure shows the strong inflow in 2009 after the GFC, when managed futures
were successful and other investments in HFs faced heavy losses. The last 4 years show
stagnation in the growth of AuM. Many events in the recent past made trend following
difficult: the euro sovereign debt crisis, Greece, the China crisis of 2015, etc. The zig-zag
behaviour of markets due to such events is the natural enemy of trend models since trend
reversal signals are 'too late'. The largest player as of end of 2017, with around USD 32 bn,
is Winton Capital, followed by Man AHL and Two Sigma Investments. Geographically,
the London area dominates, followed by the US and Switzerland. In the last two decades
there has been a significant shift from the US to London and other European countries.
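A rule-based trend-following signal of the kind such strategies use can be sketched as a moving-average comparison. The window lengths below are illustrative; real CTAs combine many horizons and risk-weight positions across markets:

```python
def trend_signal(prices, fast=5, slow=20):
    """+1 (long) if the fast moving average exceeds the slow one, else -1.

    No fundamental input or market opinion enters: the rule reacts to the
    price history only, which is why sharp reversals are signalled 'too late'."""
    if len(prices) < slow:
        return 0  # not enough history: no position
    fast_ma = sum(prices[-fast:]) / fast
    slow_ma = sum(prices[-slow:]) / slow
    return 1 if fast_ma > slow_ma else -1

uptrend = [100.0 + 0.5 * t for t in range(30)]
downtrend = list(reversed(uptrend))
print(trend_signal(uptrend), trend_signal(downtrend))  # 1 -1
```

In a zig-zag market the fast average whipsaws around the slow one, producing exactly the late reversal signals described above.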

5.7.4 Fees and Leverage


Most hedge funds charge annual fees of a fixed percentage of AuM (1%-2% of the NAV
per year) and an incentive fee that is a percentage, typically 20%, of the fund's annual
net profit, defined as the fund's total earnings above some minimum threshold, such as
the LIBOR return, and net of previous cumulative losses (high-water mark).
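The 2/20 mechanics with a high-water mark can be sketched as follows. Annual fee crystallization and a zero hurdle rate are simplifying assumptions here; in practice a threshold such as the LIBOR return would be subtracted first:

```python
def nav_after_fees(nav_start, gross_returns, mgmt=0.02, perf=0.20):
    """NAV path under a flat management fee plus a performance fee charged
    only on gains above the high-water mark (crystallized yearly)."""
    nav, hwm = nav_start, nav_start
    for r in gross_returns:
        nav *= 1.0 + r          # gross performance
        nav *= 1.0 - mgmt       # flat management fee on NAV
        if nav > hwm:           # performance fee only above the high-water mark
            nav -= perf * (nav - hwm)
            hwm = nav           # the mark ratchets up, never down
    return nav

# +20%, -10%, +25%: no performance fee is due in the loss year, and the
# year-3 fee applies only to the gain above the year-1 high-water mark.
print(round(nav_after_fees(100.0, [0.20, -0.10, 0.25]), 2))  # 121.42
```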

Are HF fees justified? Titman and Tiu (2011) document that on average HFs in the
lowest R2 quartile charge 12 basis points more in management fees and 385 basis points
10 The abbreviation CTA means Commodity Trading Advisors, which are heavily regulated in the US
by the NFA / CFTC. Typically traded instruments are futures (and options) on equities, equity indices,
commodities, and fixed income, as well as spot, forwards, futures, and options in the FX asset class.

Figure 5.10: Development of the managed futures industry. Data are from Barclay CTA
index (Gmür [2015]).

more in incentive fees compared to hedge funds in the highest quartile. Feng et al. (2013)
find that incentive fees act similarly to a call option at maturity, and that HF managers
can therefore increase the value of this option by increasing the volatility of their
investments. For CTAs one observes that very professional investors in CTAs prefer to
set the fixed management fee to zero and instead to share even more than 20% of the
performance fee.

Fees are particularly opaque in the case of the double fee layer of funds of funds, see Brown et al.
(2004). They find that individual funds dominate funds of funds in terms of net-of-fee
returns and Sharpe ratios. The performance fee impacts the compensation of HF managers
or owners: top hedge fund managers can earn billions of USD in one year, which
dominates the salaries of blue-chip CEOs by factors of 10 to 30.

The fee discussion continues to damage the reputation of HFs. The California Public
Employees' Retirement System (CalPERS) decided in 2014 to divest itself of its entire USD 4
billion portfolio of HFs.

Hedge funds often use leverage to boost returns. Since leverage increases both returns
and risks, leverage is most relevant for low-volatility strategies. Besides return volatility,
illiquidity is another risk source for leveraged investments, i.e., the loans are linked to
margin calls. This can force HFs to shut down in a crisis when the HF is unable to cover

the large margin calls. Ang et al. (2011): '... hedge fund leverage decreased prior to the
start of the financial crisis in 2007 and was at its lowest in early 2009 when the leverage
of investment banks was at its highest.'

Leverage is not constant over time. Cao et al. (2013) find that HFs are able to adjust
their portfolios' market exposure as a function of market liquidity conditions. Several
pitfalls exist in the context of leverage. Consider the use of futures for CTAs. Suppose
that an investor invests USD 100 with a margin of 10 but desires a leveraged exposure
of USD 200, which requires a margin of USD 20. How much can the investor lose? In the
worst case USD 100, when there is a margin call which exceeds USD 80. If the investor
cannot comply with the margin call, or is not able to pay the called amount,
the positions are closed and the loss of the investor is USD 100.
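The worst-case arithmetic of this example can be verified directly. A minimal sketch (the function name is illustrative; the numbers match the text):

```python
def worst_case_loss(capital, exposure, adverse_move):
    """Loss on a leveraged futures position given an adverse index move.

    The loss is capped at the posted capital: once margin calls exhaust it,
    the positions are closed."""
    loss = exposure * adverse_move
    return min(loss, capital)

# USD 100 of capital, USD 200 of exposure (leverage 2): a 50% adverse move
# produces a USD 100 loss - the entire capital, not just the USD 20 margin.
print(worst_case_loss(100.0, 200.0, 0.50))  # 100.0
```

The point of the example is that leverage exposes the full capital, not only the margin initially posted.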

5.7.5 Withdrawing Restrictions, Fund Flows and Capital Formation


Getmansky et al. [2015] state various restrictions on investors withdrawing money from
a hedge fund:

• a subscription process for investors,

• the capacity constraints of a given strategy,

• new investors are often forced into a one-year 'lockup' period during which they
cannot withdraw their funds,
• withdrawals that are subject to advance notice,

• temporary restrictions on how much of an investor's capital can be redeemed in a crisis.

Such restrictions protect against fire-sale liquidations causing extreme losses for the
HF's remaining investors. The discretionary right to impose withdrawal gates can be very
costly for investors if losses accumulate during the period when withdrawing is not
possible, see Ang and Bollen (2010). Several studies document a positive empirical relationship
between fund flows and recent performance. HF investors seek positive returns
and flee from negative returns (Goetzmann et al. [2003], Baquero and Verbeek [2009],
and Getmansky et al. [2015]). The relationship between fund flows and investment
performance is often non-linear.11

5.7.6 Biases, Entries and Exits


Hedge fund managers report, voluntarily, their returns to databases. They are free to stop
reporting at any time. Hence, a number of biases are possible in HF returns databases.

11 Aragon, Liang, and Park (2013), Goetzmann et al. (2003), Baquero and Verbeek (2009), Teo
(2011), and Aragon and Qian (2010) report some non-linear relations.

• Survivorship bias and selection bias, i.e., there is a stronger reporting incentive if
returns are positive. This bias increases the average fund's return by between
0.16% and 3%, see Ackermann et al. [1999], Liang [2000] and Amin and Kat [2003].

• Backfill bias. The primary motivation for disclosing return data is marketing. HFs
start to report after they have been successful: they fill in their positive past
returns; the 'backfill bias'. Fung and Hsieh (2000) estimate a backfill bias of 1.4
percent p.a. for the Lipper TASS database (1994-1998). Malkiel and Saha (2005)
estimate that the return of HFs that backfill is twice the return figure for those not
backfilling.

Backfilling means that part of the left tail of the loss return distribution is missing in HF
databases. Since large, well-known HFs do not need to engage in marketing by reporting
to commercial databases, part of the right-hand return tail is also missing in the
databases. We recall the findings of Patton et al. (2013) in Section 2.5.4 about the
revision of previously reported returns.
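The direction of the survivorship effect is easy to reproduce with simulated fund returns. The return distribution and the shut-down threshold below are assumptions for illustration only; the bias magnitudes cited above come from the referenced studies:

```python
import random

def survivorship_gap(n_funds=10_000, seed=7):
    """Difference between the average return of surviving (reporting) funds
    and the true all-funds average, for simulated one-period returns."""
    rng = random.Random(seed)
    returns = [rng.gauss(0.05, 0.15) for _ in range(n_funds)]
    survivors = [r for r in returns if r > -0.10]  # assumed shut-down threshold
    all_avg = sum(returns) / len(returns)
    surviving_avg = sum(survivors) / len(survivors)
    return surviving_avg - all_avg

# The database average overstates the true all-funds average.
print(f"{survivorship_gap():.2%}")
```

Because the dead funds take the left tail of the return distribution with them, the reported average is biased upward, exactly the effect the correction in Figure 5.11 removes.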

Given these biases, why do databases not correct them in a transparent and standardized
way when publishing their data? Figure 5.11 shows the impact: if one corrects
for survivorship and backfill biases, annualized returns halve.

Figure 5.11: Summary statistics for cross-sectionally averaged returns from the Lipper
TASS database from January 1996 through December 2014. The last value - Box p-value -
represents the p-value of the Ljung-Box Q-statistic with three reported lags (Getmansky
et al. [2015]).

We now consider entries and exits of HFs. From Jan 1996 to Dec 2006, more than twice
as many new funds entered the Lipper TASS database each year as exited, despite the high
attrition rates. This process reversed in the GFC period. After the peak number of new HFs
in 2007-2008, the attrition rate jumped to 21 percent, the average return was at its lowest
at -18.4 percent, and 71 percent of all hedge funds experienced negative performance.

The survival rates of hedge funds have been estimated by several authors, see Horst and
Verbeek (2007). Summarizing, 30-50 percent of all HFs disappear within 30 months of
entry and 5 percent of all HFs last more than 10 years. These rates differ significantly
across styles, see Getmansky et al. (2004).

5.7.7 Investment Performance


We use the strategy categorization of the Lipper TASS database with its 11 main groupings.12

5.7.7.1 Basic Performance Studies


There are several facts that limit the alpha of the HF industry. The number of HF
managers has increased from hundreds to more than 10,000 in the last two decades.
Although the average fund manager today has higher technical skills than in the past,
it is becoming increasingly difficult for the individual manager to beat the HF market:
take out the superstars, and you are left with an expensive, below-benchmark industry.
A second limitation is the increased efficiency of some markets. The closer markets are
to the EMH, the less possible it is to predict future returns. Finally, an increasing fund
size typically leads to weaker performance.

Asness (2014) plots the realized alpha of hedge funds over a period of 36 months. He
takes the monthly returns over cash, subtracts 37 percent of the S&P 500 excess return
- 0.37 being the full-period, long-term beta - and looks at the annualized average of this
realized alpha (see Figure 5.12).
We observe a decreasing alpha over time, which ends up negative in the recent past.
Recent years seem to have been peculiar. Unlike for mutual funds, a number of studies
document positive risk-adjusted returns in the HF industry before the GFC. Ibbotson et
al. (2011) report positive alphas in every year of the period 1995-2009. While the alphas
of the HF industry have been decreasing steadily over the last two decades, the correlation
with broad stock market indices shows the opposite evolution.

The performance of HFs is often linked to specific circumstances. Gao and Huang
(2014) report that hedge fund managers gain an informational advantage in securities
trading through their connections with political lobbyists. They find that politically
connected hedge funds outperform non-connected funds by between 1.6 percent and 2.5
12 Convertible Arbitrage, Dedicated Short Bias, Emerging Markets, Equity Market Neutral, Event
Driven, Fixed Income Arbitrage, Global Macro, Long/Short Equity Hedge, Managed Futures, Multi-
Strategy, and Fund of Funds.

Figure 5.12: Average monthly returns (realized alpha) of the overall Credit Suisse Hedge
Fund Index and the HFRI Fund Weighted Composite Index for a rolling 36 months
(Asness [2014]).

percent per month on their holdings of politically sensitive stocks as compared to their
less politically sensitive holdings.

5.7.7.2 Performance Persistence


There is mixed evidence regarding performance persistence.

• Agarwal and Naik (2000a), Chen (2007) and Bares et al. (2003) find performance
persistence for short periods.

• Brown et al. (1999) and Edwards and Caglayan (2001) find no evidence of performance
persistence.

• Fung et al. (2008) find a positive alpha-path dependency. Given that a fund has a
positive alpha, the probability that the fund will again show a positive alpha in
the next period is 28 percent. The probability for a non-alpha fund is only half of
this value. The year-by-year alpha-transition probability for a positive-alpha fund
is always higher than that of a non-alpha fund.

While performance persistence is sought out by investors, excessive persistence is a
signal that something is wrong.
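The transition probabilities quoted from Fung et al. (2008) can be read as a two-state Markov chain; the long-run share of positive-alpha funds then follows from the stationarity condition. The stationary computation below is an added illustration, not part of the cited study:

```python
def stationary_alpha_share(p_repeat=0.28, p_switch=0.14):
    """Long-run share of positive-alpha funds when a positive-alpha fund
    repeats with probability p_repeat and a non-alpha fund turns positive
    with probability p_switch (half of 28%, as quoted from Fung et al.)."""
    # Stationarity: pi = pi * p_repeat + (1 - pi) * p_switch
    return p_switch / (1.0 - p_repeat + p_switch)

print(round(stationary_alpha_share(), 3))  # 0.163
```

Under these quoted probabilities, only about one fund in six would be a positive-alpha fund in the long run.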

Figure 5.13: Monthly return distribution of Fairfield Sentry (line) and S&P 500 (dots)
returns (Ang [2013]).

Figure 5.13 shows the extremely smooth return profile of Fairfield Sentry compared
to the S&P 500. Fairfield Sentry was the feeder fund to Madoff Investment Securities.

We consider the performance of the CTAs Winton and Chesapeake, starting with
USD 1 invested in October 1997 (Quantica [2015]). The first CTA pays out around USD 9
at the end of 2013 and the second one USD 18. Both CTAs
had positive returns until the GFC. Then Chesapeake's volatility started to increase and
the positive past trend became essentially a flat one. This behaviour is typical for other
CTAs too. For Winton, returns hardly suffered during and after the GFC.
The reason is risk: Winton takes much less risk than Chesapeake. Why can a CTA strategy
work? Empirical evidence for the equity index market shows that skewness and the
Sharpe ratio are highly positively related in equity markets: investors are compensated
with excess returns for assuming excess skewness rather than excess volatility. Trend-
following strategies offer positive risk premia with positively skewed returns. Market
participants often believe that hedge funds excessively use short strategies. This is
not the case for CTAs - around 80% of the investments are long-only strategies and 20%
use short strategies.

Figure 5.14 shows the attribution of the profit and loss to the different asset classes in the last decade. During the GFC, CTAs did not produce a positive return by huge short positions in equity markets but by long positions in the trend model for fixed income: the decreasing rates in this period were a constant source of positive returns.

Figure 5.14: Annual sector attribution of the profit and loss for the Quantica CTA
(Quantica [2015]).

5.7.7.3 Timing Ability


Hedge funds are much less restricted than mutual funds in engaging in several forms of timing, including market timing, volatility timing, and liquidity timing. The study of Aragon and Martin (2012) gives evidence that HFs successfully use derivatives to profit from private information about stock fundamentals. Cao et al. (2013) find that HF managers increase (decrease) their portfolios' market exposure when equity market liquidity is high (low), and that liquidity timing is most pronounced when market liquidity is very low.

5.7.7.4 Luck and Skill


Criton and Scaillet (2014) apply the false discovery methodology to hedge funds. They use a multi-factor model with time-varying alphas and betas. This means that they consider different risk factors for the different asset classes. For equity, one risk factor is the S&P 500 minus the risk-free rate and for bonds one factor is represented by the monthly change in the 10-year treasury constant maturity yield.

They consider equity long/short, emerging markets, equity market neutral, event driven, and global macro strategies. The main results are that the majority of funds are still zero-alpha funds (ranging from 41% to 97% for different strategies), similar to mutual funds. But there is a higher proportion of positive-alpha funds compared to mutual funds (0% − 45%) and the proportion of negative-alpha funds ranges between 2.5% and 18.6%. The highest skilled funds are emerging market strategies, followed by global macro and equity long/short. The proportion of skilled or unskilled funds differs across market stress periods. But there is not a uniform decline of skilled funds observed over the period from 1992 to 2006 as for mutual funds. This is some evidence that successful mutual fund asset managers moved to the HF industry and/or that markets are less efficient for HF strategies than for mutual fund ones.

5.7.7.5 Hedge Fund Styles


Hedge fund styles are highly dynamic and behave very differently from those used by
mutual funds. Getmansky et al. (2015), see Figure 5.15, report correlations of monthly
average returns of hedge funds in each Lipper TASS style category.

• High correlation. The correlation between the Event Driven and Convertible Arbitrage categories is 0.77.

• Negative correlation. The correlation between Long/Short Equity Hedge and Dedicated Short Bias is −0.74.

• Virtually no correlation. Managed Futures have no correlation with other categories except for Global Macro.

Getmansky et al. (2015) use a factor model based on PCA to gain more insight into possible correlations. The size of the eigenvalues indicates that 79% of the strategies' volatility-equalized variances is explained by only three factors. This suggests that a large fraction of hedge funds' returns are generated by a very small universe of uncorrelated strategies. The largest estimated eigenvalue takes the value 52.3%. The authors simulate one million correlation matrices using IID Gaussian returns and compute the matrices' largest eigenvalues. The mean of this distribution is 13.51%, while the minimum and maximum are 11.59% and 17.18%, respectively. These values are much smaller than 52.3%. This is strong evidence that the different HF returns, although they are claimed to be different in their styles and even unique, are in fact driven by a few common factors. Since 79% of HF category returns are driven by three factors, the benefits of diversification are limited for HFs.
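The null distribution used by Getmansky et al. can be sketched in a few lines. The following Python snippet (not the authors' code, standard library only) draws IID Gaussian returns for 10 strategies over 228 months (January 1996 - December 2014), computes the sample correlation matrix, and records the fraction of total variance carried by the largest eigenvalue. Even with far fewer simulations than the authors' one million, the mean fraction comes out around 13 - 15%, far below the observed 52.3%.

```python
import random

def sample_corr(returns):
    """Sample correlation matrix of a T x n list of return vectors."""
    T, n = len(returns), len(returns[0])
    mean = [sum(r[j] for r in returns) / T for j in range(n)]
    cov = [[sum((r[i] - mean[i]) * (r[j] - mean[j]) for r in returns) / T
            for j in range(n)] for i in range(n)]
    sd = [cov[i][i] ** 0.5 for i in range(n)]
    return [[cov[i][j] / (sd[i] * sd[j]) for j in range(n)] for i in range(n)]

def largest_eigenvalue(m, iters=100):
    """Power iteration; adequate for symmetric PSD matrices such as correlations."""
    n = len(m)
    v = [1.0] * n
    for _ in range(iters):
        w = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
        s = max(abs(c) for c in w)
        v = [c / s for c in w]
    mv = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
    return sum(a * b for a, b in zip(mv, v)) / sum(b * b for b in v)

random.seed(42)
n_strategies, n_months, n_sims = 10, 228, 100   # 228 months = Jan 1996 - Dec 2014
fractions = []
for _ in range(n_sims):
    rets = [[random.gauss(0.0, 1.0) for _ in range(n_strategies)]
            for _ in range(n_months)]
    lam = largest_eigenvalue(sample_corr(rets))
    fractions.append(lam / n_strategies)        # trace of a correlation matrix is n
mean_fraction = sum(fractions) / n_sims
```

Under the IID null the largest eigenvalue explains only about one seventh of the variance, so an observed share of 52.3% is a many-sigma rejection of independent styles.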

The heterogeneity and commonality among HF styles is shown in Figure 5.16. Dedicated Short Bias underperformed all other categories. Multi-Strategy hedge funds outperformed Funds of Funds. Managed Futures funds' returns appear roughly IID and Gaussian. The returns of the average Convertible Arbitrage fund are auto-correlated and have fat tails. The styles Long/Short Equity, Event Driven, and Emerging Markets have correlations with the S&P 500 total return index between 0.64 − 0.74. Return volatility of the average Emerging Markets fund is three times greater than for the average Fixed Income Arbitrage fund.

Figure 5.15: Monthly correlations of the average returns of funds for the 10 main Lipper
TASS hedge fund categories in the Lipper TASS database from January 1996 through
December 2014. Correlations are color-coded with the highest correlations in blue, in-
termediate correlations in yellow, and the lowest correlations in red (Getmansky et al.
[2015]).

The CTA Quantica shows a low correlation with the traditional asset classes, inclusive of the global hedge fund index: between 10 − 15% correlations to the S&P 500, USD Gov Bonds 3-5y and the GSCI commodity index.24

24 The large correlation with the CTA index indicates that many CTAs use trend-following models which are broadly diversified. Although CTAs show a persistent upwards drift in ...
It follows that the CTA index shows much less heavy drawdowns than an equity or a commodity index. The main reason is discipline in investment. This has two components. First, CTAs are fully rule based. If a stop-loss trigger is breached, losses are realized. Second, CTA allocations are risk-based, where again the risk attribution is carried out mechanically. CTAs therefore follow the investment advice of David Ricardo written in The Great Metropolis (1838): Cut short your losses, and let your profits run on.

5.8 Event-Driven Investment Opportunities


The models so far assumed diversified investments of investors.13 We consider a different set-up: markets are disrupted unpredictably by certain events and investors want to choose an investment in response to the event. Investments should hence be deployed fast

13 This section is an (almost) verbatim transcription of Mahringer et al. (2015).



Figure 5.16: Summary statistics for the returns of the average fund in each Lipper TASS
style category and summary statistics for the corresponding CS-DJ Hedge Fund Index
from January 1996 through December 2014. Sharpe and Sortino ratios are adjusted for
the three-month US treasury bill rate. The 'All Single Manager Funds' category includes
the funds in all 10 main Lipper TASS categories and any other single-manager funds
present in the database (relatively few) while excluding funds of funds (Getmansky et al.
[2015]).

and are not meant to capture any diversification needs; they are bets, made in response to the market disruption, on the belief that markets will drift back to normal levels. Whether there is a pre-trade portfolio suitability check of a possible trade is not the issue in this section.

There are different causes for these events - macroeconomic shocks, policy interventions, breakdowns of investment strategies, or firm-specific events (for example, Lehman Brothers). While some events are isolated and affect only single corporates, events at the political or market level often lead to broader investment opportunities. Policy interventions can trigger market reactions that in turn can lead to new policy interventions. The Swiss National Bank's announcement, in January 2015, that it would remove the euro cap and introduce negative interest rates had an effect on Swiss stock markets, EUR/CHF rates, and fixed-income markets.

Such events can impact different financial markets for a short period of time (a flash crash), a medium time period (the GFC), or a long time (the Japanese real-estate shock of the 1990s). For investors with an investment view, taking a bet when markets are under

Figure 5.17: Drawdown periods for the S&P 500 total return, the GS commodity total return index and the Barclays US Managed Futures index BTOP 50. Data are from Dec 1986 to Mar 2013 (Bloomberg).

stress is simpler than in normal times, where a bet has more dimensions: the time at which an event happens and the direction of the event. Once an event has occurred, an investor no longer needs to guess whether any event could happen in the future that would affect the investment. We stress that a general requirement for investments based on events is the fitness of all parties involved - investors, advisory, and the issuer.

If an event occurs, the time-to-market to generate investment solutions and to make an investment decision is central.

5.8.1 Structured Products (SP) and Derivatives


The wrappers of such solutions are no longer funds or ETFs - it takes too long to construct them. The wrappers are derivatives and structured products. Both are manufactured and issued by trading units or derivative firms - not by traditional asset management firms. SPs are a combination of traditional investment instruments and at least one derivative. SPs are therefore not an asset class of their own. SPs define payoff liabilities for the issuer and hence affect the balance sheet of the issuer, and the investor faces issuer risk.

The replication of the payoff of an SP with cash products and vanilla options is central to the pricing and hedging of the SP. The price of the SP is equal to the sum of the prices of the building blocks; the no-arbitrage paradigm applies. The hedge corresponds to the position of the dealer of the bank, which must generate the promised payoff of the SP. Theoretically equivalent replications can differ in practice if components have different liquidity or if taxation differs. The buyer of an SP holds only claims but no obligations, and these claims can only be asserted against the issuer. The only counterparty for the investor is therefore the issuer, whose creditworthiness enters the pricing of the SP.

The issuer must deliver the relevant security to the buyer within 1-5 days (depending on the jurisdiction). Short positions are only possible if one can borrow the securities needed for delivery (via securities lending), but as a rule there is no securities lending for SPs; short positions in SPs are therefore not possible for investors. The issuer is by definition short by issuing the SP. But it is possible to acquire an SP with a short position in the underlying. In this case, the investor holds a long position in the SP but a short position in the underlying. Table 5.13 compares mutual funds with structured products.

Mutual funds | Structured Products
Mass products | Tailor-made, starting from CHF 10'000
No issuer risk | Issuer risk (but COSI, TCM)
Long time-to-market | Short time-to-market
Performance promise | Payoff promise
Large setup costs | Low setup costs
Liquid and illiquid assets | Liquid assets
Strong legal setup, standards, market access | No legally binding definition of Structured Products
 | High-quality secondary markets
 | On balance sheet

Table 5.13: Mutual funds vs. structured products. COSI are structured products with a minimal issuer risk thanks to collateralization via SIX exchange. Triparty Collateral Management (TCM) serves the same purpose.

Investing in a fund means that the investor trusts the experience and capabilities of the portfolio manager. SPs and derivatives promise a payoff; the ability of the issuer to manage or hedge the product is irrelevant for the product performance (as long as the issuer does not default).

5.8.2 Pricing and Risk Management


We consider a discount certificate (DC); see Figure 5.18 for the payoff profile at maturity of the DC. The investor is willing to give up the upside of the underlying in exchange for a buffer if the underlying drops. If the spot price of the underlying is CHF 236 and the issuing price of the DC is CHF 210, i.e. 11 percent lower than the price of the underlying, then the investor gets a discount of 11% and the maximum return is 250/210 − 1 = 19%. The final payoff of the DC is given by min(ST , K) with ST the value of the underlying at

Figure 5.18: Payoff of a discount certificate at maturity: the investor participates in the underlying up to the cap at the strike of CHF 250.

maturity and K the strike value. To price a DC one replicates the payoff using simpler products: by no arbitrage, the price of the DC equals the price of the replicating portfolio. Since the payoff is non-linear, options are needed for replication and a model such as Black and Scholes is used to price the options. The replication portfolio is long a LEPO (Low Exercise Price Option) and short a call with strike K = 250. LEPOs are European-type call options with a very low strike, for example K = 0.01 CHF. The current value of a LEPO is equal to the current price of the underlying share compounded at the risk-free interest rate, less the accumulated value of dividends and the strike price. Since K is close to zero, the Delta of the LEPO is close to one, which implies that the price of the LEPO is well approximated by the price of the underlying minus the PV of the dividends. Hence, the DC payoff is graphically equivalent to a straight line (LEPO) plus a short call payoff.
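The replication argument can be checked payoff-by-payoff at maturity. A minimal Python sketch, approximating the LEPO by the underlying itself (strike ≈ 0, dividends ignored):

```python
def dc_payoff(s_t, cap=250.0):
    """Discount certificate payoff at maturity: min(S_T, K)."""
    return min(s_t, cap)

def replication_payoff(s_t, cap=250.0):
    """Long LEPO (approximated by the underlying) plus a short call struck at the cap."""
    lepo = s_t
    short_call = -max(s_t - cap, 0.0)
    return lepo + short_call
```

For every terminal price the two payoffs coincide, which is exactly why the DC price must equal the LEPO price minus the call price.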

More specifically, in the model of Black and Scholes, where the frictionless and complete market consists of a risky asset following a geometric Brownian motion and a risk-less asset with risk-free rate R, the prices of a European call C0 and put P0 with maturity T, strike K, and initial price S0 of the underlying are given by:

C0 = S0 Φ(d1 ) − Ke−RT Φ(d2 ) (5.6)

P0 = Ke−RT Φ(−d2 ) − S0 Φ(−d1 )

with

d1 = (ln(S0 /K) + RT )/(σ√T ) + σ√T /2 ,  d2 = d1 − σ√T .

These prices are the solutions of the Black and Scholes pricing differential equation, see (5.7). The first term Φ(d1 ) is equal to the price sensitivity of the call w.r.t. an underlying price move, the Delta ∆ = Φ(d1 ). Φ(d2 ) states the probability that the call will be in the money at maturity, i.e. that the value of the underlying is not lower than the strike price.
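Formula (5.6) is straightforward to implement. The following Python sketch (standard library only, not the book's code) reproduces the prices, Deltas, and Gammas of the two options that appear later in Table 5.15:

```python
from math import log, sqrt, exp, erf, pi

def norm_cdf(x):
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def norm_pdf(x):
    return exp(-0.5 * x * x) / sqrt(2.0 * pi)

def bs_call(S0, K, R, sigma, T):
    """Black and Scholes call price, Delta and Gamma, as in (5.6)."""
    sq = sigma * sqrt(T)
    d1 = (log(S0 / K) + R * T) / sq + 0.5 * sq
    d2 = d1 - sq
    price = S0 * norm_cdf(d1) - K * exp(-R * T) * norm_cdf(d2)
    delta = norm_cdf(d1)
    gamma = norm_pdf(d1) / (S0 * sq)
    return price, delta, gamma

# Option 1: TtM 90 days, strike 60; Option 2: TtM 60 days, strike 65
opt1 = bs_call(60.0, 60.0, 0.08, 0.30, 90.0 / 365.0)   # ~ (4.14452, 0.581957, 0.043688)
opt2 = bs_call(60.0, 65.0, 0.08, 0.30, 60.0 / 365.0)   # ~ (1.37825, 0.312373, 0.048502)
```

Note that d1 is written with the rate-only drift RT plus the separate σ√T /2 term, exactly as in the text; this is algebraically identical to the more common (ln(S0 /K) + (R + σ²/2)T )/(σ√T ).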

Risk management for options is based on the Greeks, i.e. sensitivities or partial derivatives of option prices with respect to parameters. Since differentiation is linear, a Greek of a portfolio is equal to the sum of the position Greeks times the quantities. Since there are many parameters in the pricing formula, several Greeks exist. Delta (∆) was defined above. A Delta of +0.5 implies that if an underlying stock rises by CHF 1, the theoretical option price increases by CHF 0.5. Consider an investor who is long 10 option contracts of 50-calls (i.e. strike 50) on Nestle stock with a Delta of 0.5, with 70 shares of the stock per option contract, and who is short 200 Nestle stocks. The position Delta is:

−200 + 0.5 × 10 × 70 = +150 .

Gamma Γ states how much the Delta of an option changes when the price of the stock moves. A large Gamma means that the Delta changes strongly even for a small move in the stock price.

Theta Θ, or time decay, is an estimate of how much the theoretical value of an option decreases when 1 day passes. The Thetas for same-parameter calls and puts are not equal. The difference depends on the cost-of-carry for the underlying stock. When the dividend yield is less than the interest rate - the cost-of-carry for the stock is positive - Theta for the call is higher than for the put. The difference between the extrinsic value of the option with more days to expiration and the option with fewer days to expiration is due to Theta. Therefore, long options have negative Theta and short options have positive Theta. If options are continuously losing extrinsic value, a long (short) option position will lose (gain) money because of Theta. Theta does not decrease linearly over time since the extrinsic value is not linearly distributed between OTM, ATM and ITM options. Gamma and negative Theta are dual to each other: if Gamma is highest for a long call position, negative Theta is also largest.

We consider Delta and Gamma hedging for the portfolio V:

• Short 1'000 calls, Time-to-Maturity (TtM) 90 days, strike 60, volatility 30%, riskless rate 8%. The currency is irrelevant.

• The fair option price using Black and Scholes is 4.14452 with Delta 0.581957. We therefore receive a premium of 4'144.52 by selling the options.

• To hedge the position we buy 581.96 stocks at the price 60. For this we borrow (cash)

581.96 × 60 − 4'144.52 = 34'917.39 − 4'144.52 = 30'772.88 .

The portfolio value today is zero. We consider the portfolio value after 1 day, i.e. TtM is 89 days. In the scenario 'unchanged' the underlying value remains at 60. Using Black and Scholes, the option is worth 4.11833, i.e. Theta acts. This lower option liability value is partly off-set by the increased cash liability:

30'779.62 = 30'772.88 × (1 + 0.08/365) .

A gain of 19.44 follows, see Table 5.14. The results for the two other scenarios 'up' and 'down' are also reported in Table 5.14.

Value | unchanged | up | down
Underlying | 34'917.39 | 35'499.35 | 34'335.44
Cash | -30'779.62 | -30'779.62 | -30'779.62
Option | -4'118.33 | -4'721.50 | -3'559.08
Sum | 19.44 | -1.77 | -3.26

Table 5.14: Value of the portfolio V after 1 day for different scenarios.

This shows that the Delta hedge is effective for small changes in the underlying value.
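The scenario analysis of Table 5.14 can be reproduced directly from the Black and Scholes formula. A Python sketch (standard library only); it recovers the reported gains and losses up to small rounding differences, since the table rounds the number of hedge shares:

```python
from math import log, sqrt, exp, erf

def Phi(x):
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def call_price_delta(S, K, R, sigma, T):
    sq = sigma * sqrt(T)
    d1 = (log(S / K) + R * T) / sq + 0.5 * sq
    return S * Phi(d1) - K * exp(-R * T) * Phi(d1 - sq), Phi(d1)

S0, K, R, sigma, n_opt = 60.0, 60.0, 0.08, 0.30, 1000
p0, delta0 = call_price_delta(S0, K, R, sigma, 90.0 / 365.0)
shares = delta0 * n_opt                 # ~581.96 stocks bought for the hedge
cash = shares * S0 - p0 * n_opt         # ~30'772.88 borrowed

def pnl_after_one_day(S1):
    """Portfolio value after 1 day (TtM 89 days): stocks - cash - option liability."""
    p1, _ = call_price_delta(S1, K, R, sigma, 89.0 / 365.0)
    return shares * S1 - cash * (1.0 + R / 365.0) - p1 * n_opt

scenarios = {s: pnl_after_one_day(s) for s in (60.0, 61.0, 59.0)}
```

The residual P&L of a few francs on a position of roughly 35'000 confirms that the Delta hedge neutralizes first-order moves; what remains is Theta and Gamma.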

Can we additionally hedge the Gamma? Since one option is used for the Delta hedge, we need a second option to also achieve Gamma neutrality. The data of this option are:

• Call, TtM 60 days, strike 65.

• All other parameters are the same as for the first option, see Table 5.15.

 | TtM | Strike | Option Price | Delta | Gamma
Option 1 | 90/365 | 60 | 4.14452 | 0.581957 | 0.043688
Option 2 | 60/365 | 65 | 1.37825 | 0.312373 | 0.048502

Table 5.15: Option data.

Delta and Gamma neutrality means choosing a number x of stocks and a number z of the second option such that:

∆V = x − 1'000 ∆Opt1 + z ∆Opt2 = 0
ΓV = −1'000 ΓOpt1 + z ΓOpt2 = 0 .

Solving these two linear equations gives

x = 300.58 , z = 900.76 .

To fix the cash amount, one solves V = 0 at time 0:

V = xS + Cash − 1'000 Opt1 + z Opt2 = 0 =⇒ Cash = −15'131.77 .
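The two linear equations above decouple: Gamma neutrality pins down z, then Delta neutrality pins down x. A Python sketch using the Greeks of Table 5.15 (taken as exact inputs):

```python
# Prices and Greeks from Table 5.15
n_short = 1000                     # short 1'000 units of option 1
delta1, gamma1, price1 = 0.581957, 0.043688, 4.14452
delta2, gamma2, price2 = 0.312373, 0.048502, 1.37825
S0 = 60.0

z = n_short * gamma1 / gamma2      # Gamma neutrality: -1'000*Gamma1 + z*Gamma2 = 0
x = n_short * delta1 - z * delta2  # Delta neutrality: x - 1'000*Delta1 + z*Delta2 = 0
cash = -(x * S0 - n_short * price1 + z * price2)   # V = 0 at time 0
```

This recovers x ≈ 300.58, z ≈ 900.76 and a cash position of about −15'131.8, matching the text up to rounding of the inputs.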

To be Delta and Gamma neutral we are long the underlying, long option 2, and short cash. Table 5.16 compares the hedge effectiveness of Delta and Delta & Gamma hedging.

Underlying after 1d | Delta & Gamma | Delta
58 | -2.04 | -71.35
58.5 | 0.3 | -31.56
59 | 1.07 | -3.26
59.5 | 0.81 | 13.69
60 | 0.02 | 19.45
60.5 | -0.79 | 14.22
61 | -1.11 | -1.77
61.5 | -0.49 | -28.24
62 | 1.52 | -64.93

Table 5.16: Delta & Gamma vs. Delta Hedge.

Vega is an estimate of how much the theoretical value of an option changes when volatility changes by 1 percent. Option prices and volatility are in a 1:1 relation; therefore, instead of quoting prices, one can equivalently quote volatility. Positive Vega means that the value of an option position increases when volatility increases. Vega is highest for ATM options. Rho ρ is an estimate of how much the theoretical value of an option changes when interest rates move by 1.00 percent. The Rho for a call and a put at the same strike price and the same expiration month are not equal.

Sensitivity w.r.t. | Math | Finance | Expression
Underlying S | ∂C(S)/∂S | Delta ∆ | ∆C = Φ(d1 ) > 0
 | ∂P (S)/∂S | | ∆P = Φ(d1 ) − 1 < 0
Time-to-maturity τ | ∂C(τ )/∂τ | Theta Θ | ΘC = −Sσφ(d1 )/(2√τ ) − rKe−rτ Φ(d2 ) < 0
 | ∂P (τ )/∂τ | | ΘP = −Sσφ(d1 )/(2√τ ) + rKe−rτ Φ(−d2 ) < 0
Risk-free rate r | ∂C(r)/∂r | Rho ρ | ρC = Ke−rτ τ Φ(d2 ) > 0
 | ∂P (r)/∂r | | ρP = −Ke−rτ τ Φ(−d2 ) < 0
Vola σ | ∂C(σ)/∂σ | Vega ω | ωC = φ(d1 )S√τ > 0
 | ∂P (σ)/∂σ | | ωP = ωC
Underlying S | ∂²C(S)/∂S² | Gamma Γ | ΓC = φ(d1 )/(Sσ√τ ) > 0
 | ∂²P (S)/∂S² | | ΓP = ΓC

Step 1 | Product | Size | B&S | Tr.Pr. | Pos.Val. | Delta | P&L
 | Option LH | -1000 | 7.232 | | -7'232 | -587 |
 | Position | | | | -7'232 | -587 | 0

Step 2 | Product | Size | B&S | Tr.Pr. | Pos.Val. | Delta | P&L
 | Option LH | -1000 | 7.232 | | -7'232 | -587 |
 | Stock LH | 620 | 80 | 80 | 49'600 | 620 |
 | Position | | | | 42'368 | 33 | 0

Step 3 | Product | Size | B&S | Tr.Pr. | Pos.Val. | Delta | P&L
 | Option LH | -1000 | 7.232 | 7.5 | -7'232 | -587 | 268
 | Stock LH | 620 | 80 | 80 | 49'600 | 620 | 0
 | Position | | | | 42'368 | 33 | 268

Step 4 | Product | Size | B&S | Tr.Pr. | Pos.Val. | Delta | P&L
 | Option LH | -1000 | 7.232 | 7.5 | -7'232 | -587 | 268
 | Stock LH | 620 | 81 | 81 | 50'220 | 620 | 0
 | Position | | | | 42'988 | 33 | 268

Table 5.17: Positions in the option portfolio construction. Tr.Pr. means Trading Price, B&S the theoretical Black and Scholes model price and Pos.Val. Position Value.

We discussed the sensitivities as if they were independent. But they are not: they are linked by the Black and Scholes differential equation:

(1/2) σ² S² Γ + rS∆ − rC = −Θ . (5.7)

This equation follows from the assumed market structure and the assumption of no arbitrage - no further economic assumptions are needed. Adding the specific option contract as a terminal condition, the solution C of the equation is the famous Black and Scholes formula for the call option (or any other contract).
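The duality between Gamma and Theta noted earlier is exactly this constraint. The following Python sketch plugs the closed-form Greeks into (5.7) and verifies that the residual vanishes for a European call:

```python
from math import log, sqrt, exp, erf, pi

def Phi(x):
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def phi(x):
    return exp(-0.5 * x * x) / sqrt(2.0 * pi)

def pde_residual(S, K, r, sigma, tau):
    """Theta + (1/2) sigma^2 S^2 Gamma + r S Delta - r C; zero by (5.7)."""
    sq = sigma * sqrt(tau)
    d1 = (log(S / K) + r * tau) / sq + 0.5 * sq
    d2 = d1 - sq
    C = S * Phi(d1) - K * exp(-r * tau) * Phi(d2)
    delta = Phi(d1)
    gamma = phi(d1) / (S * sq)
    theta = -S * sigma * phi(d1) / (2.0 * sqrt(tau)) - r * K * exp(-r * tau) * Phi(d2)
    return theta + 0.5 * sigma ** 2 * S ** 2 * gamma + r * S * delta - r * C
```

The cancellation is exact: the Gamma term offsets the first part of Theta and the rS∆ − rC term offsets the second, for any parameter choice.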

We consider the creation of an option trading book in a liquid market. Consider the liquid stock Lafargeholcim (LH). We start with a short position of 1'000 calls on LH with price 7.232 CHF (Step 1). The option price is theoretically calculated. If the LH stock moves, the position behaves to first order, ∂C/∂S =: ∆, like −587 shares; a CHF 1 rise in LH thus implies a loss of CHF 587 on the derivative position, see Table 5.17.

Step 2: To reduce the Delta risk, we buy 620 LH stocks at the price 80. To generate P&L, different possibilities exist. First (Step 3), one sells the options at a slightly higher price than their theoretical value; this gives a P&L of CHF 268. Second, price movements as described above lead to P&L (Step 4, where LH gains 1). Step 5 describes how volatility movements generate P&L. We assume that the portfolio V is Delta neutral.

Volatility is 20%. If volatility increases by 1 volatility point, the bank loses 304 CHF. If the trader hedges the Vega exposure he needs to trade in different options. Step 6 shows that if he trades in a second option, the Vega of the position is reduced but the Delta moves away from zero. As we have seen above, both Greeks can be controlled.

Step 5 | Product | Size | Price | Pos.Val. | Delta | Vega in CHF
 | Option Holcim | -1000 | 7.232 | -7'232 | -587 | -304
 | Stock Holcim | 588 | 80 | 47'040 | 588 |
 | Position | | | 39'808 | 1 | -304

Step 6 | Product | Size | Price | Pos.Val. | Delta | Vega in CHF
 | Option Holcim | -1000 | 7.232 | -7'232 | -587 | -304
 | Stock Holcim | 588 | 80 | 47'040 | 588 | 0
 | Option Holcim 2 | 400 | 7.232 | 2'893 | 235 | 122
 | Position | | | 42'701 | 236 | -182

Table 5.18: Positions in the option portfolio construction. The figure Delta is expressed in numbers of Holcim shares.

We finally consider structural risk management aspects of SP trading. The management of the risks of globally active SP issuers is characterized by strongly localized sub-markets (customer needs, taxes, law, regulatory requirements): SPs are issued by local units, and the risks are sent via internal OTC transactions to the responsible trading units.

The Market Maker (MM) serves the bank's clients as a counterparty for derivatives transactions. He manages the risks of the bank. Hence, contrary to an asset management firm, he is a risk taker. But he also has to earn money. Assume that a market maker is responsible for all derivatives on a particular underlying asset. He then exploits structural volatility arbitrage as follows. For most SPs, customers indirectly write options (sell volatility). A market maker can therefore buy relatively cheap volatility through the issuance of SPs. For leverage products such as warrants, clients buy options. The market maker can sell relatively expensive volatility by issuing warrants. The MM sets the prices of SPs and warrants relative to a calculated reference volatility curve, e.g. the Eurex curve: volatilities of the warrants are larger than the corresponding reference values and vice versa for the SPs. Hence, the MM buys volatility cheaply in the SPs and sells it more expensively in the warrants. Is this structural arbitrage persistent? Yes, because the market for SPs is incomplete: short positions in SPs are not possible. The only force reducing this arbitrage is competition between several issuers.

5.8.3 Political Events: Swiss National Bank (SNB) and ECB


The SNB announced, on 15 January 2015, the removal of the euro cap and the introduction of negative CHF short-term interest rates. This decision caused the SMI to lose about 15 percent of its value within 1 - 2 days, and the FX rate EUR/CHF dropped

from 1.2 to near parity. Similar changes occurred for USD/CHF. Swiss stocks of export-oriented companies or companies with a high cost base in Swiss francs were most affected. The drop in stock prices led to a sudden and large increase in Swiss stock market volatility. Swiss interest rates became negative for maturities of up to thirteen years.

It was also known at the time that the ECB would make public its stance on quantitative easing (QE) one week later. The market participants' consensus was that Mario Draghi - president of the ECB - would announce a QE program. The events in Switzerland, which came as a surprise, and the ECB QE measures subsequently announced paved the way for the following investment opportunities:

1. A Swiss investor could invest in high quality or high dividend paying EUR shares at
a discount of 15 percent. EUR shares were expected to rise due to the forthcoming
ECB announcement.

2. All Swiss stocks, independent of their market capitalization, faced heavy losses
independently of their exposure to the Swiss franc.

3. The increase in volatility made BRCs with very low barriers feasible.

4. The strengthening of the Swiss franc versus the US dollar, and the negative CHF
interest rates, led to a USD/CHF FX swap opportunity that only qualified investors
could benefit from.

5. The negative interest rates in CHF and rates of almost zero in the eurozone made
investments in newly issued bonds very unattractive. Conversely, the low credit risk
of corporates brought about by the ECB's decision offered opportunities to invest
in the credit risk premia of large European corporates via structured products.

Before certain investment opportunities are discussed in more detail, it should be noted that by the time this paper had been written (about five months after the events described above took place), all investments were profitable and some even had two-digit returns. This certainly does not mean that the investments were risk free, as such investments never are. But it shows that many investment opportunities are created by policy interventions. This contrasts with the often voiced complaints about negative interest rates and the absence of investment opportunities for firms, pension funds, and even private investors. Some investment ideas will now be considered in more detail.

5.8.4 Opportunities to Invest in High Dividend Paying EU Stocks


The idea was to buy such stocks at a discount due to the gain in value of the Swiss franc against the euro. The first issuer of a tracker offered such products on Monday, 19 January 2015 - that is to say, two business days after the SNB's decision was announced.
With all products, investors participated in the performance of a basket of European

shares with a high dividend forecast. The basket's constituents were selected following suggestions from the issuing banks' research units. Investors could choose between a structured product denominated in Swiss francs or in euros, depending on their willingness to face - besides the market risk of the stock basket - also the EUR/CHF FX risk.

This investment had two main risk sources. If it was denominated in euros, the EUR/CHF risk held and one faced the market risk of the large European companies whose shares comprised the basket. Most investors classified the FX risk as acceptable since a significant further strengthening of the Swiss franc against the euro would meet with counter measures from the SNB. More specifically, a tracker on a basket of fourteen European stocks was issued. The issuance price was fixed at EUR 98.75. As of 1 April 2015 the product was trading at EUR 111.10 (mid-price) - equivalent to a performance of 12.51 percent pro rata. Similar products were launched by all the large issuers.

Other issuers launched a tracker on Swiss stocks, putting into a basket all large Swiss stocks that had only little exposure to the Swiss franc, but which also faced a heavy price correction after the SNB announcement in January. Again, the input of each issuing bank's research unit in identifying these firms was key. The underlying investment idea for this product can be seen as a typical application of behavioral finance: an over-reaction of market participants to events is expected to vanish over time.

The risk in this investment was twofold. First, one did not know whether the SNB would consider further measures, such as lowering interest rates further, which would have led to a second drop in the value of Swiss equity shares. Second, international investors with euros or US dollars as their reference currency could realize profits since the drop in Swiss share values - around 15 percent - was more than offset by the gain from the currency, which lost around 20 percent in 'value'; roughly, an institutional investor could earn 5 percent by selling Swiss stocks. Since large investors exploit such opportunities rapidly, it became clear three days after the SNB's decision was announced that the avalanche of selling orders from international investors was over.

5.8.5 Low-Barrier BRCs


Investors and private bankers searched for cash alternatives with a 100 percent capital guarantee. The negative CHF interest rates made this impossible: if 1 Swiss franc invested today is worth less than 1 Swiss franc tomorrow, one has to invest more than 100 percent today to get a 100 percent capital guarantee in the future.

Low-barrier BRCs - say, with a barrier at 39 percent - could be issued with a coupon of 1 to 2 percent, depending on the issuer's credit worthiness and risk appetite, for a maturity of one to two years. The S&P 500, Eurostoxx 50, SMI, NIKKEI 225, and other broadly diversified stock indices were used in combination as underlying values for the BRCs. The low fixed coupon of 1 - 2 percent takes into account that the product is considered as a cash alternative with a zero percent, or even a negative, return.

Therefore, investors received, at maturity, the coupon payment - in any case - and also 100 percent of the investment back if no equity index lost more than 61 percent during the life-span of the product. If at least one index lost more than 61 percent, the investor received the worst performing index at maturity, together with the coupon. The risks of such an investment differ clearly from those of a deposit. For a deposit in Switzerland, there is a deposit guarantee of up to CHF 100'000. Furthermore, almost all banks in Switzerland did not charge their clients the negative interest rate costs. Hence, in this period a deposit was seen by many customers as 'less risky', albeit also with zero performance before costs.

A low-barrier BRC, apart from issuer risk, has market risk. Can one estimate the
probability that one of the indices in a basket will lose more than 61 percent in one
year? One could simulate the basket and simply count the frequency of events leading
to a breach. Such a simulation has the drawback that one needs to assume parameters
for the indices. Another method would be to consider the historical lowest level of such
a basket - that is to say, what was the maximum loss in the past if one invested in a
low-barrier BRC? Using data going back to the initiation of the indices, no index lost - in
one year - more than 60 percent. This was the rationale to set the barrier at 39 percent.
This is obviously not a guarantee that this statement will apply also in the future, but it
helps investors to decide whether they accept the risk or not. Although this discussion
has concerned a BRC on equity, a similar discussion applies to such convertibles that
have currencies and commodities as underlyings. Relevant political and market events
in the recent past - and to which the above discussion also applies - occurred in October
2014 and, due to the European debt crisis, in August 2011. With regard to the former
set of events, the pressure on equity markets was due to uncertainty regarding Russia
and what would happen next in Ukraine; and on 15 October 2014 liquidity evaporated in
Treasury futures and prices - an event known as the 'flash crash' in the Treasury market.
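The simulation approach mentioned above can be sketched as follows. The drift, the volatility, and the independence of the indices are deliberate simplifying assumptions (real index baskets are correlated and parameters would have to be estimated); the sketch only shows the path-counting logic for a barrier monitored during the life of the product.

```python
import math, random

def breach_probability(n_paths=2000, n_steps=126, n_indices=3,
                       mu=0.05, sigma=0.35, barrier=0.39, seed=7):
    """Estimated probability that at least one index in the basket closes
    below barrier * initial level at some monitoring date within one year.
    Indices are simulated as independent geometric Brownian motions."""
    rng = random.Random(seed)
    dt = 1.0 / n_steps
    drift = (mu - 0.5 * sigma ** 2) * dt
    vol = sigma * math.sqrt(dt)
    breaches = 0
    for _ in range(n_paths):
        levels = [1.0] * n_indices
        hit = False
        for _ in range(n_steps):
            for i in range(n_indices):
                levels[i] *= math.exp(drift + vol * rng.gauss(0.0, 1.0))
                if levels[i] < barrier:
                    hit = True
            if hit:          # barrier touched: stop simulating this path
                break
        breaches += hit
    return breaches / n_paths

print(f"estimated breach probability: {breach_probability():.2%}")
```

The historical method in the text avoids exactly this parameter dependence, at the price of assuming the past is representative.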

5.8.6 Japan: Abenomics


As expected, the Liberal Democratic Party of Japan gained a substantial parliamentary
majority in the 2012 elections. The economic program introduced by the newly elected
PM Shinzo Abe was built on three pillars: 1) fiscal stimulus, 2) monetary easing, and 3)
structural reforms ('Abenomics'). Subsequently, the yen (JPY) plunged versus its main
trading currencies, providing a hefty stimulus to the Japanese export industry. The issuer
of one product offered an outperformance structured product on the Nikkei 225 in quanto
Australian dollars, meaning that the structured product in question is denominated in
AUD and not in JPY, which would be the natural currency given the underlying Nikkei
225. This means that investors did not face JPY/AUD currency risk, but if they were
Swiss investors, who think in Swiss francs, they still faced AUD/CHF risk. The term
'quanto' means 'quantity-adjusting option'.

Outperformance certificates enable investors to participate disproportionately in price



advances in the underlying instrument if it trades higher than a specified threshold value.
Below the threshold value the performance of the structured product is the same as that
of the underlying. How can investors invest in an index in such a way as to gain more
than a single market-index investment when markets rise, but still not lose more if
the index drops? The issuer uses the anticipated dividends of the stocks in the index to
buy call options. These options lead to the leveraged position on the upside (see Figure
5.19).

Figure 5.19: Payoff of an outperformance structured product.

The reason for using quanto AUD is the higher AUD interest rates compared to JPY
interest rates. Higher interest rates lead to higher participation and the participation
in the quanto product was 130 percent. The risk of the investment lay in whether
Abenomics would work as expected, and possibly in the FX rate AUD/CHF. The economic
program in Japan worked out well and the redemption rate lay at 198 percent after two
years. This redemption contains a loss of 16.35 percent due to the weakness of the
Australian dollar against the Swiss franc.
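The redemption rule of such a certificate can be written down directly. In this sketch the initial level of 100 is a normalization; the 130 percent participation is taken from the example above.

```python
def outperformance_payoff(final_level, initial_level=100.0, participation=1.30):
    """Redemption per 100 invested: 1:1 with the index below the threshold,
    leveraged participation above it (no capital protection on the downside)."""
    if final_level <= initial_level:
        return final_level
    return initial_level + participation * (final_level - initial_level)

print(outperformance_payoff(80.0))   # 80.0  -> tracks the index down
print(outperformance_payoff(150.0))  # 165.0 -> 100 + 1.3 * 50
```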

5.8.7 Market Events


The focus here will be on the credit risk of structured products. Although the examples
are presented under the heading of market events, the status of the market in the most
recent GFC and in 2014/2015 was the result of a complicated catenation of business
activities, policy interventions, and market participants' reactions. The discussion below
shows that structured products with underlying 'credit risk' offer, under specific circum-
stances, valuable investment opportunities to some investors. But the number of such
products issued is much smaller than the number of equity products. One reason for
this is that not all issuers are equally experienced or satisfy the requirements for issu-
ing credit-risky structured products (necessary FI trading desk, balance sheet, and risk
capital constraints). Another reason is the lack of acceptance of such products among
investors, regulators, portfolio managers, and relationship managers, all of whom often
do not have the same level of experience and know-how as they have regarding equity
products.

5.8.8 Negative Credit Basis after the GFC


The credit basis measures the price of the same credit risk in two different markets:
once in the derivatives markets and once in the bond markets. Theoretically, one would
expect that the credit risk of ABB has the same value independent of whether an ABB
bond or a credit derivative defined on ABB's credit risk is being considered. This is indeed
true if markets are not under stress - at which point the credit basis is close to zero. But if
liquidity is an issue, the basis becomes either negative or positive. In the most recent
GFC, liquidity was a scarce resource. The basis became negative since investing in bonds
required funding the notional while for credit derivatives only the option premium needs
to be financed. For large corporates, the basis became strongly negative, by up to −7
percent. Table 5.19 shows how the positive basis in May 2003 changed to a negative one
in November 2008.

Corporate Credit basis in May 2003 (bps) Credit basis in November 2008 (bps)
Merrill Lynch 47 -217
General Motors -32 -504
IBM 22 -64
J.P. Morgan Chase 22 -150

Table 5.19: Credit basis for a sample of corporates in 2003 and their negative basis in
the most recent GFC.

To invest in a negative basis product, the issuer of a structured product locks in the
negative basis for an investor by forming a portfolio of bonds and credit derivatives of
those firms with a negative basis. For each day on which the negative basis exists a cash
flow follows, which defines the participation of the investor. When the negative basis
vanishes, the product is terminated.

Example

Investing in the negative credit basis of General Motors (see Table 5.19) leads to a
return, on an annual basis, of 5.04 percent if the basis remains constant for one year.

If the product has a leverage of 3, the gross return is 15.12 percent. To obtain the net
return, one has to deduct the financing costs of the leverage.
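The arithmetic of this example can be sketched as follows; treating the funding cost as a flat rate on the borrowed part is a simplifying assumption.

```python
def negative_basis_return(basis_bps, leverage=1, funding_rate=0.0):
    """Annual return from holding a negative-basis package: the carry equals
    the absolute basis, scaled by leverage, minus funding on the borrowed part."""
    carry = abs(basis_bps) / 10_000.0
    return leverage * carry - (leverage - 1) * funding_rate

print(f"{negative_basis_return(-504):.2%}")              # unlevered: 5.04%
print(f"{negative_basis_return(-504, leverage=3):.2%}")  # gross, 3x leverage: 15.12%
```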

Structured products with this idea in mind were offered in spring 2009 to qualified
investors. The products offered an annual fixed coupon of around 12 percent and partic-
ipation in the negative basis. The high coupons were possible as some issuers leveraged
investors' capital. This could only be offered by those few issuers in the most recent GFC
that were cash rich; typically AAA-rated banks. The products paid one coupon and were
then terminated after 14 months since the negative basis approached its normal value.
The product value led to a performance of around 70 percent for a 14-month investment
period. Was this formidable performance realized ex ante a free lunch - that is to say,
a risk-less investment? No. If the financial system had fallen apart, investors would
have lost all the invested capital. But the investors basically only needed to answer the
following question: Will the financial system and real economy return to normality? If
yes, the investment was reduced to the AAA issuer risk of the structured product.

Many lessons can be drawn from these products. A very turbulent time for markets
can offer extraordinary investment opportunities. The valuation of these opportunities
by investors must follow different patterns than in times of normal markets: there is,
for example, no history and no extensive back-testing, and hence an impossibility of
calculating any risk and return figures. But there is a lot of uncertainty. Making an
investment decision when uncertainty is the main market characteristic is an entirely
different proposition to doing so when markets are normal and the usual risk machinery
can be used to support decision-making with a range of forward-looking risk and return
figures. If uncertainty matters, investors who are cold-blooded, courageous, or gamblers,
and analytically strong, will invest, while others will prefer to keep their money in a safe
haven.

5.8.9 Positive Credit Basis 2014


The monetary interventions of the ECB and other central banks led to excess liquidity,
which was mirrored in a positive basis for several large firms. Monetary policy also im-
plied low or even negative interest rates. This made investment in newly issued bonds
unattractive. To summarize, investors were searching for an alternative to their bond
investments, but an alternative that was similar to a bond.

A credit linked note (CLN) is a structured product. Its payoff profile corresponds to
a bond's payoff in many respects. A CLN pays - similarly to a bond - a regular coupon.
The size of the coupon and the amount of the nominal value repaid at maturity both
depend on the creditworthiness of a third party, the so-called reference entity (the issuer
of the comparable bond). This is also similar to the situation for bonds. But the size
of the CLN coupon derives from credit derivative markets. Hence, if the credit basis is
positive, a larger CLN coupon follows, as compared to the bond coupon of the same ref-
erence entity. CLNs are typically more liquid than their corresponding bonds since credit
derivative markets are liquid while many bonds, even from large corporates, often suffer
from illiquidity. CLNs are flexible in their design of interest payments, maturities, and
currencies. CLNs also possess tax advantages compared to bonds; in fact, in this negative
interest rate environment the after-tax return of bonds bought at a price above 100
percent is often negative. The investor in a CLN faces two sources of credit risk: the
reference entity risk, as for bonds, and the issuer risk of the structured product.
As an example, Glencore issued a new 1.25 percent bond with a coupon in Swiss francs.
Due to the positive basis, the coupon of the CLN was 1.70 percent. Another product,
with Arcelor Mittal as the reference entity, implied a CLN effective yield in EUR that
was 1.02 percent higher than that of the bond.

Let us consider a more detailed example. Consider the reference entity Citigroup
Inc. The bond in CHF matures in April 2021 and its price is 102.5 with a coupon of
2.75 percent. The bond spread is 57 bps, which leads to a yield to maturity of −0.18
percent - an investor should sell the bond. The CLN has a spread of 75 bps, which
proves the positive basis, and an issuance price of 100. The coupon of the CLN is then
0.71 percent, which leads to a yield to maturity of 0.57 percent if funding is subtracted.
Therefore, selling the bond and buying the CLN generates an additional return of 75 bps.
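A minimal sketch of the yield pickup in the Citigroup example, using the yields quoted above:

```python
def yield_pickup(bond_ytm, cln_ytm):
    """Additional annual yield, in basis points, from switching bond -> CLN."""
    return round((cln_ytm - bond_ytm) * 10_000)

# Citigroup example from the text: bond YTM -0.18%, CLN YTM 0.57%
print(yield_pickup(-0.0018, 0.0057), "bps")  # 75 bps
```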

5.8.10 Currencies
Currency is typically an exposure derived from an investment in asset classes such as
fixed income or equity. It provides no earnings growth, dividends, capital appreciation,
or coupon payments. Therefore, many investors assume zero expected return from cur-
rency exposure. But currency exposure can impact the return of international investment
portfolios. Investors' practices range from unhedged to fully hedged currency exposure.
We follow ETF.com (1999).

Even investors who hold stocks and bonds passively often manage currency exposures
actively, since a purely passive (un)hedged policy can result in a large P&L. Active
management is supported by the low transaction costs in currency trading. In most
programs, active currency management aims to reduce exposure to foreign currency by
keeping the upside and avoiding the downside when currencies are falling. Such a non-
linear participation can be implemented using derivatives, and the cost of the derivatives
can be optimized by using long-short combinations of them; using options, a dynamic
hedging strategy follows. A different approach relies on fundamental and/or technical
techniques, using models or traditional judgmental analysis to anticipate the direction
of the market.

Consider a bank periodically fixing a tactical asset allocation (TAA). The market
assessments result in concrete forecasts for the individual investment categories and cur-
rencies. The goal is to replicate the TAA as closely as possible with tradable, liquid

financial products. The investment categories real estate and alternative investments do
not meet these requirements and are excluded from the index. Assume that the TAA
has an average volatility of 8%, i.e. a balanced TAA. The TAA should be implemented
as cost-effectively as possible. Therefore, investments in the asset classes equities, bonds,
commodities, and currencies of the index are implemented by means of liquid futures,
swaps, and forwards. Forwards are used for currency pairs such as EURCHF and
USDCHF, i.e. all against CHF. Futures are used for stock market indices and
commodities, and swaps for interest rates. The index value I_t at time t, which
replicates the TAA, is updated from the index value at a prior time s < t as follows:

I_t = \Big(1 + \sum_k \phi_s^k \, \frac{FX_t^k}{FX_s^k} \, R_{Fut,t}^k
          + \sum_l \phi_s^l \, \mathrm{Swap}_{s,t}^l
          + \sum_m \phi_s^m \, \mathrm{Forw}_{s,t}^m \Big) \, I_s

where R_{Fut,t}^k is the simple return of futures contract k; the FX component only
matters for the futures, since the swaps and the forwards are denominated in CHF.
FX_t^k is the exchange rate at time t of the currency of futures contract k against the
Swiss franc, i.e. 1 unit of currency k is worth FX_t^k Swiss francs. The value of futures
contract k is calculated as its price at time s in local currency multiplied by the contract
unit; an oil future with contract unit 1,000 and a local-currency price of USD 108 has
value USD 108,000. Swap_{s,t}^l is the value at time t of swap l entered at time s at the
fair swap rate with a nominal of CHF 1.00. The value Forw_{s,t}^m of currency forward
m at time t in Swiss francs is given by the fair forward rate fixed at time s with a
nominal value of CHF 1.00. The maturity of the forward corresponds to the next planned
roll date. The above formula shows that the currency exposure of the futures follows from
the chain rule.
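The index update can be sketched in a few lines. The instrument weights and values below are illustrative assumptions, not data from the text; the point is that only the futures returns pass through an FX conversion.

```python
def update_index(I_s, futures, swaps, forwards):
    """One-period update I_t = (1 + sum of weighted instrument returns) * I_s.
    `futures` holds (weight, fx_s, fx_t, simple_return) tuples, so each
    futures return is converted into CHF via the ratio fx_t / fx_s; swaps
    and forwards are CHF-denominated, so (weight, value) tuples suffice."""
    growth = 1.0
    growth += sum(w * (fx_t / fx_s) * r for w, fx_s, fx_t, r in futures)
    growth += sum(w * v for w, v in swaps)
    growth += sum(w * v for w, v in forwards)
    return growth * I_s

# illustrative: one equity future in USD, one CHF swap, one EURCHF forward
I_t = update_index(100.0,
                   futures=[(0.5, 0.97, 0.98, 0.02)],
                   swaps=[(0.3, 0.001)],
                   forwards=[(0.2, -0.002)])
print(round(I_t, 4))
```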

In the absence of arbitrage, forwards and futures imply the Covered Interest Rate
Parity. This basic equation relates interest rates and FX rates. There are two parity
relations.

• Covered parity: The return of a domestic risk-free investment equals the return of
a foreign risk-free investment if the FX risk is hedged using a forward contract.

• Uncovered parity: The interest rate differential between two countries is compensated
by the expected FX changes.

A speculative strategy tries to make money by betting against the uncovered parity;
such trades are called 'carry trades'. We consider first the covered parity and discuss it
using the Japanese yen (JPY) and the Brazilian real (BRL). If JPY is exchanged against
BRL, there is no guarantee that the BRL does not devalue. Using an FX forward we
eliminate this risk. We assume

• Interest rate yen R_{JPY} = 1% p.a., interest rate real R_{BRL} = 10% p.a., spot rate
S(t) = 0.025 BRLJPY.

We consider two dates, 0 and T = 1y, for a Japanese investor. The investor acts as
follows at 0:

• He borrows JPY 1,000 at 1% for 1y, i.e. he pays back JPY 1,010 at T.

• He changes the JPY into BRL at the spot rate, which gives BRL 25.

• He invests the BRL 25 at 10% for 1y, i.e. he receives BRL 27.50 at T.

At T = 1y, the investor changes the BRL 27.50 into JPY at the spot rate S(T), which
is not known at 0; the above strategy is therefore risky. To choose a risk-free FX strategy
he replaces the today-unknown spot rate S(T) by the known forward price F(0,T). The
forward price is determined with the following no-arbitrage argument, which leads to CIP.
We write R_d for the nominal interest rate in the domestic currency JPY and R_f for the
interest rate in the foreign currency BRL, both for one year. Figure 5.20 illustrates the
strategy, where borrowing is in the foreign currency.

At 0:

• The investor borrows BRL for one year. He exchanges the BRL at the spot rate S(0)
into JPY and invests the JPY for 1y.

• He buys a forward F(0,T) to exchange JPY against BRL in one year.

At T:

• The investor exchanges the (1 + R_d) × S(0) JPY into BRL at the forward rate,
receiving (1 + R_d) S(0)/F(0,T) BRL.

• He pays back the borrowed BRL amount, i.e. (1 + R_f) BRL.

To avoid arbitrage, at time T the amount received in foreign currency cannot be larger
than the amount of foreign currency paid back. This implies the Covered Interest
Rate Parity Theorem (CIP)

F(t, T) = S(t) \, \frac{1 + R_d}{1 + R_f} \qquad (5.8)

with R_d − R_f the interest rate differential. CIP states that the difference between
domestic and foreign interest rates determines the forward price.

Assume S(t) = 106 JPYUSD, R_d (in JPY) = 0.034 and R_f (in USD) = 0.050. Using
the linearized CIP, F(t, T) ≈ S(t)(1 + R_d − R_f), we get for F(t, 1y)

F(t, 1y) = 106 × (1 − 0.016) JPY/USD = 104.304 JPY/USD.

Assume that a bank offers the forward for 100 JPYUSD. Then an investor exploits this
mispricing by borrowing USD (the higher-rate currency), changing this amount into
Japanese yen, earning the Japanese interest rate on this amount, and buying USD on a
forward contract

At t: borrow 1 BRL, receive S(t) JPY. At T: pay back (1 + R_f) BRL, receive
(1 + R_d) S(t)/F(t,T) BRL.

Figure 5.20: Representation of the forward strategy to hedge the FX risk.

basis. At maturity he changes the Yen into USD at the forward price and pays back the
loan.
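The CIP-fair forward and the arbitrage against the mispriced quote can be checked numerically. Note that the exact CIP value (about 104.38 JPY/USD) differs slightly from the text's linear approximation of 104.304.

```python
def cip_forward(spot, r_dom, r_for):
    """Fair one-year forward (domestic units per foreign unit) under CIP."""
    return spot * (1 + r_dom) / (1 + r_for)

def arbitrage_profit(spot, r_dom, r_for, forward_quote):
    """Profit per unit of foreign currency borrowed when the quoted forward
    lies below the CIP-fair level: borrow 1 foreign unit, invest the
    domestic proceeds, buy the foreign currency back on the cheap forward."""
    domestic_at_T = spot * (1 + r_dom)            # JPY from the investment
    foreign_back = domestic_at_T / forward_quote  # USD bought forward
    return foreign_back - (1 + r_for)             # minus the loan repayment

fair = cip_forward(106.0, 0.034, 0.050)  # ~104.38 JPY/USD
profit = arbitrage_profit(106.0, 0.034, 0.050, forward_quote=100.0)
print(round(fair, 2), round(profit, 4))
```

At the fair forward the profit is zero by construction; any other quote opens a riskless gain.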

Consider next a USD-based investor with a 5 bn yen equity investment. He wishes to
fully hedge the currency risk for the next year. The current spot rate is 96.50 JPYUSD;
one-year interest rates are 5.63% in the US and 0.70% in Japan. The one-year forward
rate is
(1 + 0.70%)
× 96.50 = 92.00 JPYUSD.
(1 + 5.63%)
Since interest rates in Japan are significantly lower than in the US, the forward price of
yen is at a significant premium to the current spot price (92.00 vs. 96.50). To hedge
the investment, the investor enters into a forward contract to sell 5 billion yen at 92.00
JPYUSD one year from today. If the JPYUSD exchange rate ends the year lower (higher)
than the forward rate, the investor will realize a loss (gain) on the contract. Consider a
falling-yen scenario with different market data: the initial yen investment of 5,000,000,000
is worth USD 34,722,222 at an initial spot of 144.0; after one year, the assumed spot is
156.7. The forward rate is 136.8, the spot return is −8.1%, the forward premium on yen
is 5.0%, and the currency surprise is −13.3%. The spot P&L is USD −2,810,847 and the
hedging P&L is USD +4,551,885, which results in a total P&L of USD +1,741,039.
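The cash flows of the falling-yen scenario can be recomputed directly from the quoted rates. The figures below differ slightly from the text's, which appear to be built from rounded percentage returns rather than raw cash flows; the qualitative decomposition is the same.

```python
def hedged_pnl(yen_notional, spot_0, spot_T, forward):
    """P&L in USD of a yen investment fully hedged by selling the yen
    notional forward.  Rates are quoted JPY per USD, so the USD value of a
    yen amount is notional / rate."""
    usd_0 = yen_notional / spot_0
    usd_T = yen_notional / spot_T
    spot_pnl = usd_T - usd_0                    # negative if the yen falls
    hedge_pnl = yen_notional / forward - usd_T  # gain on the short yen forward
    return spot_pnl, hedge_pnl, spot_pnl + hedge_pnl

spot, hedge, total = hedged_pnl(5_000_000_000, 144.0, 156.7, 136.8)
print(f"spot {spot:,.0f}  hedge {hedge:,.0f}  total {total:,.0f}")
```

Note that for a full hedge the total collapses to the locked-in forward value minus the initial value, independent of the realized spot.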

What is the difference between the uncovered parity (UIP) and the covered parity?
UIP replaces the forward price in the CIP relation by the expected spot price, i.e. F(t,T)
by E_t[S(T)]:

UIP: \quad E_t[S(T)] = S(t) \, \frac{1 + R_d}{1 + R_f}. \qquad (5.9)

Which view enters the expectation? If the view is that the best guess is the forward rate,
we are back to the CIP; there is no FX risk left. Carry trades are bets that expectation
formation differs from the forward-rate view:

E_t[S(T)] \neq F(t, T) = S(t) \, \frac{1 + R_d}{1 + R_f}. \qquad (5.10)
Consider a Swiss investor who needs JPY in 30 days. He buys the 30d JPYCHF forward,
which fixes the exchange rate in CHF for 30 days ahead. This is a covered position, i.e.
there is no FX risk. A different strategy is to exchange the CHF into JPY at the spot
rate S(t) and to invest the amount in the Japanese money market for 30 days to meet
the JPY payment. This also leads to CIP. Finally, a third strategy is to invest the CHF
amount and to exchange it into JPY in 30 days. This investment is not covered: FX risk
is only zero if the realized 30d spot rate equals the forward price. Hence, if the forward is
lower than indicated by the CIP, one borrows money in the foreign currency, exchanges
it into domestic currency at the spot price, and lends in the domestic currency.

A currency carry trade is by definition a strategy to borrow in a currency with low
interest rates and to invest simultaneously in a currency with high interest rates. This
can only be profitable if the expected spot price and the forward price deviate; otherwise
the interest rate difference is compensated by the forward price difference. For JPY this
means borrowing at close-to-zero Japanese interest rates and investing in a currency with
a high rate. Assume that the Japanese rate is 0.5%. The loan in JPY is exchanged at spot
prices into USD, where USD interest rates for one year are 5.25%. If the exchange rate
between USD and JPY remains unchanged, the net gain is 4.75%. If the USD weakens
relative to JPY, the gain shrinks since one then needs more USD to repay the debt in
JPY.
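The carry-trade arithmetic can be sketched as follows; the FX change is the free parameter that turns the interest differential into a risky bet.

```python
def carry_trade_return(r_funding, r_target, fx_change):
    """Return per unit borrowed: fund at r_funding, invest at r_target.
    fx_change is the relative change of the target currency against the
    funding currency (negative = target currency weakens)."""
    return (1 + r_target) * (1 + fx_change) - (1 + r_funding)

print(f"{carry_trade_return(0.005, 0.0525, 0.0):.2%}")    # FX unchanged: 4.75%
print(f"{carry_trade_return(0.005, 0.0525, -0.05):.2%}")  # USD weakens 5%: carry wiped out
```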

Given the uncovered interest rate parity (UIP), arbitrage implies that the change of
an FX rate is equal to the nominal interest rate differential between the two currencies.
Hence monetary policy (fixing interest rates) and exchange rates are dependent. The
so-called Trilemma or Impossible Trinity holds: a country cannot simultaneously
choose three policies, 1) a fixed exchange rate (exchange rate stability), 2) open capital
markets (financial integration), and 3) monetary policy autonomy. It can pick two; the
third follows by no arbitrage. If a country chooses open capital markets, uncovered
interest parity must hold: arbitrage equalizes expected returns at home and abroad; the
domestic interest rate must equal the foreign interest rate plus the expected appreciation
of the foreign currency. If a country chooses open capital markets and fixed exchange
rates, domestic interest rates have to equal the base-country interest rate, ruling out
monetary policy autonomy. If a country chooses open capital markets and wishes to set
domestic interest rates at levels suitable to domestic conditions, then exchange rates can
no longer be fixed.

Figure 5.21: Left panel: if a nation adopts position a, it maintains a fixed exchange
rate and allows free capital flows, the consequence of which is a loss of monetary
sovereignty. Sweden, for example, decided to control the interest rate and allow free
international capital flows, and accepts that the exchange rate follows, i.e. cannot be
controlled. Source: Wikipedia. Right panel: monetary policy selections for four
countries. Source: J.P. Danthine, Swiss Finance Institute (2011).

Since 2014, CIP between the USD and other major currencies is broken. Borio et al.
(2016) analyze the reasons that can explain why such a basic relationship can fail to
hold. The goal in this section is to show how one can exploit this arbitrage opportunity
in actual trades.

Consider a firm that denominates its income and balance sheet in CHF - a currency
for which CIP with the USD has been broken since 2014. Although CHF interest
rates are negative up to several years of maturity, the firm cannot profit from this
fact directly since retail interest rates are floored: deposits pay zero interest and loan
rates are shifted upwards.

The broken CIP makes it possible for the firm to participate in the negative interest
rate environment. First, the Swiss firm takes out its loan in USD. Together with a US-
DCHF swap the FX risk is hedged and participation in the negative CHF interest rates
follows.

With market rates as of June 12, 2017, the mechanics are the following:

• At t = 0, an FX USDCHF swap is fixed, the USD loan is received, the amount is
changed into CHF at the spot rate, and the 3m forward USDCHF is fixed. Consider
a loan value of CHF 10 mn and spot USDCHF 0.9730. The firm then receives USD
10,277,490 for a CHF loan of 10 mn.

• At t = 3m, the USD are bought back at the 3m forward rate USDCHF 0.9670,
which implies a pay-back CHF amount of

CHF 9,979,660 = 10,277,490 × (1 + 0.0158/4) × 0.9670

where a margin of 0.40 percent is added to the USD 3m LIBOR rate of 1.18%.
Hence a P&L over 3m of CHF 20,340 follows, which corresponds to a return of
0.81 percent p.a.

The strategy can be rolled over until CIP is eventually restored.
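The mechanics can be sketched with the quoted market data. The resulting P&L (about CHF 22,400) is close to, but does not exactly reproduce, the text's CHF 20,340, which relies on rounded intermediate amounts.

```python
def broken_cip_pnl(chf_loan, spot, forward, usd_rate, margin, months=3):
    """CHF P&L of funding in USD with the FX risk hedged: borrow USD,
    convert at spot, and buy the USD back at the forward to repay the loan.
    Rates are quoted USDCHF (CHF per USD); USD interest accrues for `months`."""
    usd_borrowed = chf_loan / spot
    usd_due = usd_borrowed * (1 + (usd_rate + margin) * months / 12)
    chf_repaid = usd_due * forward
    pnl = chf_loan - chf_repaid               # CHF saved versus a CHF loan at 0%
    return pnl, pnl / chf_loan * 12 / months  # P&L and annualized return

pnl, ann = broken_cip_pnl(10_000_000, 0.9730, 0.9670, 0.0118, 0.004)
print(f"P&L CHF {pnl:,.0f}, {ann:.2%} p.a.")
```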

5.9 Collateral
5.9.1 Prime Finance
Prime finance is an important trading activity frequently used by asset management
firms. Prime finance has different aspects:

• Lending and borrowing of securities, the Securities Lending Business (SLB).

• Repos, i.e. sale and repurchase agreements.

• Synthetic finance, such as synthetic SLB or synthetic repo. These transactions
combine an SLB with a derivative whose underlying is the security of the SLB
transaction.

The general motivation for repos is the borrowing or lending of cash. In securities lend-
ing, the purpose is to temporarily obtain the security for other purposes, such as covering
short positions or for use in complex financial structures. Securities are generally lent
out for a fee. Securities lending trades are governed by different types of legal agreements
than repos.

The prime finance business changed heavily after the GFC and is still transforming.
Several rationales motivate prime finance activities and their transformation. A first
rationale is collateralized banking. The repo business can be considered as secured
banking where collateral serves as creditor protection for non-retail investors. Creditors
are banks, insurance companies, governments, firms, asset managers, or pension funds.
Markets that are widely collateralized are, for example, fixed income repo, equity finance,
exchange-traded securities, OTC derivatives, securities lending, bank loans, and asset-
backed securities. An important property of collateral is its eligibility, i.e. the extent to
which collateral can be converted into an economic value if the counterparty defaults.
Liquidity, quality in terms of embedded credit risk, and the possibility to settle the
collateral define the collateral eligibility. Cash is the most used collateral, followed by
government bonds and large-cap shares. For traders, repos are used to finance long
positions, obtain access to cheaper funding costs for other speculative investments, and
cover short positions in securities. A second rationale is cost reduction in the custody of
securities, where lending and borrowing securities generates earnings that lower these
costs. Third, to cover short positions one has to borrow securities. Short positions can
be the result of market making, the hedging of derivative positions, or part of an
investment strategy. Finally, regulatory requirements lead to lower risk-weighted assets
in the regulatory capital charge if one switches from unsecured to secured transactions.

5.9.2 Repo Transaction


A repo, a bilateral contract between a buyer and a seller, allows a borrower to use a
financial security as collateral for a cash loan at a fixed rate of interest. The borrower
agrees to sell a security to a lender immediately, at 0, and also agrees to buy the same
security back from the lender at a fixed price at some later date, 1. A repo is equivalent
to a cash transaction combined with a forward contract. The difference between the
forward price and the spot price is the interest on the loan, while the settlement date of
the forward contract is the maturity date of the loan. A repo can be cash or security
driven. It is security driven if the investor wishes to lend a security. Repos can be
described as follows:

• At 0: Assignment of the securities from the seller to the buyer.

• At 1: Redemption of the loan and interest rate payments to the buyer, and reas-
signment of the security from the buyer to the seller.

The purchase price at 0 equals the market value (dirty price) of the underlying security
minus a deduction (haircut). The haircut provides a restricted protection against falling
security prices. The payback price equals the purchase price plus an agreed interest pay-
ment (repo rate), which depends upon the quality of the security. If the security loses
value, a margin call follows. Using a repo, the seller obtains favourable rates compared
to an unsecured loan and the buyer receives collateral.

Almost any security may be employed in a repo, but highly liquid securities are
preferred because they can easily be sourced in the open market when the buyer has
created a short position in the repo security through a reverse repo and a market sale.
Treasury and government bills, corporate and Treasury/government bonds, and stocks
may all be used as collateral in a repo transaction. Coupons that are paid while the repo
buyer owns the securities are passed on to the repo seller, although the ownership of the
collateral rests with the buyer during the repo agreement. There are three types of repo
maturities: overnight, term (i.e. with a specified date), and open repo.

The most important forms of repo transactions are specified delivery and tri-party.
The first form requires the delivery of a prespecified bond at the onset and at the maturity
of the contractual period. Tri-party essentially is a basket form of transaction and allows
for a wider range of instruments in the basket or pool. The tri-party agent acts as an
intermediary between the two parties to the repo and is responsible for the administration
of the transaction, marking to market, and substitution of collateral. The largest tri-party
agents are Clearstream and J.P. Morgan Chase.

A reverse repo is the same repurchase agreement viewed from the buyer's side rather
than the seller's. The term reverse repo is also used to describe a short position in a debt
instrument where the buyer in the repo transaction immediately sells the security
provided by the seller on the open market.

Example:
While investors trade bonds on a stand-alone basis, trading desks use repo jointly with
bond trading. Buying a bond is completed immediately by selling the bond in a repo,
i.e. one finances the bond. We consider a US Treasury bond with the following dates:

• T: trading day, on which the bond is bought.

• T_1 = T + 1: settlement day for the bond; start/opening of the repo.

• T_2 = T + 2: closing of the 1-day repo.

At T the trader buys the bond for the price B(T) from a counterparty A. At T + 1 the
repo transaction starts to finance the bond. To achieve this,

• the repo desk delivers the bond for 1 day - i.e. the period of the repo transaction is
overnight, from T_1 to T_2 - for a price B(T_1^{Repo}) to the repo counterparty, and

• the repo desk agrees to buy the bond back at T_2 for the price

B(T_1^{Repo})(1 + r/360)

with r the repo rate.

The prices B(T) and B(T_1^{Repo}) can differ at T_1. The difference is a residual cash
position earning a cash rate r_{cash}, which is in general different from the repo rate. At
T_2 the repo desk pays B(T_1^{Repo})(1 + r/360) to the counterparty, receives the bond
back, and delivers the bond for the price B(T_1) to the buyer. The P&L of this
transaction over 1 day reads:

P&L = B(T_1) − B(T)  (price change of the bond)
    − B(T_1^{Repo}) r/360  (repo costs)  (5.11)
    + (B(T_1^{Repo}) − B(T)) r_{Cash}/360  (difference repo vs. cash market).

Using the data - notional USD 100 mio., coupon 4 percent, trading date T of Oct 2,
settlement Oct 3 - the clean price of the bond is USD 100'078'125 (= 100-02+ in US
Treasury notation). Adding accrued interest of (3/183) × 0.04/2 gives the settlement
price of USD 100'110'911: the bond has accrued interest since Sept 30, and a half-year
has 183 days. The repo rate r equals 3.4 percent; the cash rate is 3.5 percent. Since the
bond settles Oct 3, the repo desk finances the bond. The bond price changes from Oct 2
to Oct 3 to 100-05. Therefore, the value of the position in dirty prices increased to

USD 100'189'036 = (1 + (5/32)/100 + (3/183) × 0.04/2) × 100 mio. USD.
At Oct 3 the following payments/transactions are made:

• Bonds are received with value USD 100'110'911 and exchanged for a secured loan
of USD 100'189'036 with the repo counterparty.

• The difference of 78'125 USD is a residual cash position, invested at the cash rate.

At Oct 4 the following payments/transactions are made:

• The repo counterparty hands back the lent bond and obtains the repo rate interest:

100'198'499 = 100'189'036 × (1 + 0.034/360) .

• The bond is sold by the repo desk to the buyer. The price equals the clean price
of Oct 3 with Oct 4 settlement plus accrued interest. Since the bond increased to
100-08, we have

100'293'715 = (1 + (8/32)/100 + (4/183) × 0.04/2) × 100 Mio. USD .
183

The P&L components are:

• Change in bond price: 100'293'715 − 100'110'911 = +182'803 USD.

• Repo costs: −100'189'036 × 0.034/360 = −9'462 USD.

• Difference repo vs. cash market: 78'125 × 0.035/360 = +7.6 USD.

A 1-day P&L of 173'349 USD follows.
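The bookkeeping of the example can be checked with a short script (a sketch; all names are ours, and the figures are those of the example above):

```python
# Repo-financed bond position: reproduce the 1-day P&L of the example.
# Prices are per 100 nominal; US Treasury 32nds: 100-02+ = 100 + 2.5/32.

NOTIONAL = 100_000_000      # USD
COUPON = 0.04               # annual coupon, paid semiannually
DAYS_HALF_YEAR = 183        # days in the current coupon period

def dirty_price(ticks_32, accrued_days):
    """Dirty value of the position: clean price in 32nds over par
    plus interest accrued since the last coupon date (Sept 30)."""
    clean = 1 + ticks_32 / 32 / 100
    accrued = accrued_days / DAYS_HALF_YEAR * COUPON / 2
    return NOTIONAL * (clean + accrued)

settle = dirty_price(2.5, 3)   # bought Oct 2 at 100-02+, settles Oct 3
repo = dirty_price(5.0, 3)     # repo loan: Oct 3 dirty value at 100-05
sale = dirty_price(8.0, 4)     # sold Oct 3 at 100-08, settles Oct 4

repo_cost = repo * 0.034 / 360           # overnight repo interest
residual_cash = repo - settle            # 78'125 USD at the cash rate
cash_interest = residual_cash * 0.035 / 360

pnl = (sale - settle) - repo_cost + cash_interest
print(round(pnl))  # 173349
```

The small discrepancies to the rounded figures printed above stem from truncating the dirty prices to full dollars.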

Contrary to the SLB business, repo is always of the type cash against security. Both
transaction types face the same market risk, but settlement risk can be different.

Eurex, one of the worldwide largest exchanges for futures and options trading, also
offers platforms for bond trading and for repo (Eurex Repo). The platform is open to all
financial institutions. The Eurex Repo platform is a TriParty platform with integrated
trading and settlement functionalities. This means that a third party to the buyer and
the seller is responsible for administration and operations. The largest providers of
TriParty repo programs are Clearstream and JP Morgan Chase. The Eurex platform integrates
trading, settlement and legal documentation. Participants at Eurex Repo can choose
from a broad menu of repo transactions. An advantage of the Eurex Repo platform is
that the securities which are received as collateral can be used immediately for a new
repo transaction. This allows banks to raise cash if they need to do so. The Eurex market
consists of four links for the participants in CHF repos:

• Trading is via the Eurex Repo platform.

• Clearing, Settlement and Collateral Management takes place at SIX SIS.

• Cash Clearing is done via SIX Interbank Clearing.

• There is a link to SNB which publishes the SNB-eligible securities.

As an example, consider a bond trader (seller) who wishes to borrow CHF 20 Mio.
to finance for one week an investment of CHF 18 Mio. Swiss government bonds with
a 3 percent coupon. A repo buyer offers a repo rate of 2 percent. The seller accepts the
rate and delivers CHF 18 Mio. nominal against CHF 20 Mio. cash: on the same day the
buyer pays him CHF 20 Mio. in exchange for the CHF 18 Mio. of bonds. After one week
the buyer gives back the bonds to the seller, and the seller pays back the loan plus accrued
interest of

20'000'000 × 0.02 × 7/360 = 7'777.8 CHF .
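The accrued interest follows the Act/360 money-market convention; as a minimal sketch (the helper name is ours):

```python
def money_market_interest(cash, rate, days, basis=360):
    """Simple money-market interest: cash * rate * days / basis (Act/360)."""
    return cash * rate * days / basis

print(round(money_market_interest(20_000_000, 0.02, 7), 1))  # 7777.8
```

The same convention gives the overnight repo cost in the US Treasury example above.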
Chapter 6

Asset Management Innovation


6.1 Views on Disruption
6.1.1 Replacement and Prices
At the root of disruption lies replacement. Existing successful goods and services are
replaced by new ones: automobiles replacing rail transport, high-speed rail replacing
short-distance flights, word processing software replacing the typewriter, ultrasound
replacing X-ray imaging, plastic replacing metals and wood, personal computers replacing
workstations and Wikipedia replacing traditional encyclopaedias.

Disruption is considered uncontrollable - unlike a transformation. After the financial
crisis in 2008, digital disruption for the financial intermediaries (FI) meant foremost
replacing the semi-automated value chains by automated ones. Cost reductions and
scalability were the drivers.

But digital disruption has a much broader meaning than efficiency. Innovation and
the new entrants, FinTechs and the Tech Giants, can disrupt ownership of the FI both
on the production and the customer side. FinTech innovation is driven from an end-
customer perspective, and customers will follow their business model. Hence FI have to
adopt this view too and leave their bank-centric approach. Since FI can integrate or copy
the solutions of the many FinTechs, these are not a real threat. But the few Tech Giants
are. They already have a broad customer base, a technological advantage and almost
unlimited resources. The Revised Payment Service Directive of the European Union,
effective 2018, is an example. It has disruption potential since it is an important step
towards an open finance system, i.e. a system where the end-customers choose their
best products and services from a platform and the FI deliver their services and products to
the platform. Compared to the traditional business model, in an open finance economy
the link end-customer / FI is broken and the FI are in competition on the platform.
Clearly, in an open economy business becomes much less profitable unless the FI owns
the platform. The above directive is a regulatory driver for Tech Giants to offer their
superior data analytic capabilities to end-customers.

Besides breaking the customer-FI link, new entrants can act disruptively by changing
the topology of the financial architecture (blockchain, cryptocurrencies, platforms). This
means redefining the market participants' connections and reallocating ownership rights
in the architecture. Their broad assumption is that the action space of FI in the value
chains can be largely reduced, sometimes even completely replaced. In fact, technology
is able to replace monopolistic or oligopolistic ownership of key centralized FI functions
by decentralized solutions based on game theory (aka blockchain). This defines the in-
frastructure channel of digital disruption in the financial industry.

The internet revolution in the 90s revolutionized the flow of information by sending
information quickly and free of charge to many individuals. This is an efficient way
to copy and distribute information. But financial intermediation is based on asset values
and their distribution based on contracts. To revolutionize the existing generation
and flow of values, the internet solution approach is useless: copying a USD 10 bill for
payment purposes is useless. In a digital value flow, someone has to validate that the
payer owns USD 10, that he has not promised it to anyone else and that the millions of
payments in the system are synchronized to prevent fraudulent actions. Formally, trans-
action feasibility, transaction legitimization and transaction consensus have to be assured
for each transaction at each date. In the current financial world, banks, central banks
and exchanges are offering and owning these functions: they provide a payment system
and they validate as third parties the transactions (consensus). Bitcoin, based on the
mutually distributed ledger technology, proved that completely digitized payment systems
and currencies are possible, where code and mathematics replace all functions of the FI
in fiat money banking. Whether the thousands of cryptocurrencies survive is not clear.
Each currency needs to reflect an economic value and not just a belief of investors;
they need to be competitive (transaction fees, speed, security), ecologically sound (energy
consumption) and address the monetary perspective (stiff supply side).
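The validation tasks can be illustrated with a toy ledger (a sketch only; real systems add cryptographic signatures and distributed consensus):

```python
# Toy ledger: balances are the state; each transfer is validated for
# feasibility (the payer owns enough) before it is applied, which also
# rules out promising the same funds twice.
def apply_transfers(balances, transfers):
    balances = dict(balances)
    accepted = []
    for payer, payee, amount in transfers:
        if amount > 0 and balances.get(payer, 0) >= amount:  # feasibility
            balances[payer] -= amount
            balances[payee] = balances.get(payee, 0) + amount
            accepted.append((payer, payee, amount))
    return balances, accepted

state = {"alice": 10, "bob": 0}
# Alice tries to spend the same 10 USD twice; only one transfer is accepted.
state, ok = apply_transfers(state, [("alice", "bob", 10), ("alice", "carol", 10)])
print(ok)  # [('alice', 'bob', 10)]
```

In a decentralized system, the open problem is precisely who runs this validation and how the participants reach consensus on the resulting state.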

The attack on the end-customer / FI link is different in nature. The iPhone method
of integration makes it possible to integrate all FI activities in one single device.
Furthermore, the methods to express and analyze customers' needs will in the very near
future replace the capabilities and quality of any relationship advisor. This sets the stage
for an open finance system: end-customers want a single access to an intelligent platform
where they can decide in a user-friendly way. The FI, if they do not own the platform,
are reduced to product service providers and to running the accounts in the background.
The quality of the digital services will ultimately create a time- and location-independent
emotional relationship with the end-customers. At this stage there will definitively be
no further need for a human FI interaction. Some FI have proven able to adapt
quickly to a new environment, and they use their powerful resources to act as a shaper.

6.1.2 Market Entrants


There are start-up (FinTech) and Tech Giant entrants. Start-up firms produce goods
and services in a better way than traditional AM firms. 'Better' can mean cheaper, tailor-
made to the customers, scalable, interconnected with other needs of the customers, or with
increased functionality or quality. Most FinTechs are mono-liners offering a single
service or product; they have no client base and they innovate mostly from an end-client
perspective, i.e. at the end-client / FI interface. This makes FinTechs vulnerable. The
main strategy observed is to enter into a cooperation contract with FI. But cooperations
are unlikely to be stable end states: the FI can break the cooperation once they have caught
up with the technological advantage of the FinTechs, and the FinTechs themselves can
leave once they have access to customers.

The financial crisis of 2008 can be considered a starting point for digital disruption in
the financial sector. Of the 248 surveyed European FinTechs in the Roland Berger (2016)
study, 15 were founded before 2008 and the rest after the financial crisis. Three triggers
cumulated in this period: the iPhone made it possible to empower the end client,
FI had to spend many of their resources to meet the regulatory avalanche, and FI had to
increase profitability by lowering costs. The survey of McKinsey (2015) for a sample
of more than 12'000 FinTech start-ups states:

• Target clients: 62% of the start-ups target private customers, 28% SMEs and the
rest large enterprises.

• Function: Most start-ups work in the area of payment services (43%) followed by
loans (24%), investments (18%) and deposits (15%).

Even FinTechs consider Tech Giants to be more dangerous for FI than they are them-
selves (Roland Berger (2016)). The Big Four, Amazon, Apple, Google and Facebook, are
examples of Tech Giant entrants in the financial sector. Our Western-centric view is
to simplify the discussion: for each of the Big Four there is a comparable and equally
successful Chinese counterpart.1 While the Big Four are less agile than FinTechs, their
almost unlimited resources, their strong client base and their technological advantage
make them a real threat for FI.

Although it has long been speculated that the Big Four will enter the banking business
on a large scale, so far this has not happened. Google has had a banking license for Europe
since 2011; Facebook requested one, but nothing has happened so far. One can speculate
about the reasons: more profitable alternatives, too heavy regulatory costs, or business
risk such as the program AdWords Business Credit, which was discontinued? Facebook
could offer banking services to its 1.5 billion users, many of whom are living in countries with a

1 JD.com, Renren, Baidu, Sina, Tencent and Alibaba are counterparts to the Big Four. Tencent, for
example, started with a market cap of USD 210 m at its IPO in 2004, which has risen to USD 233 bn
according to Bloomberg. The user base of its application WeChat grew from 50 million in 2011 to over
800 million in 2016.

non-stable political, financial, social and legal system. Apple, which is active with Apple
Pay, could do a lot more. It is worth asking what disruption could mean here. One scenario
is for the Big Four to become full FI. But they could also prefer to take over the end-
customer interface due to their superior technology and data analytics methods. This
latter model fits well into the so-called open finance paradigm where end-customers are
self-decision makers; they are connected to a platform or a cloud where data analytics
methods provide decision-making services, for portfolio management for example, where
the data of the customers in different FI are aggregated in the platform, and where the best
FI is selected to deliver products and services once an end-customer has made its decision.
Hence, FI become pure product providers and lose the interface to their clients. Since
2018, the Revised Payment Service Directive (PSD2) of the European Union points
in this direction and puts core banking functions under stress. PSD2 obliges banks
which are active in payments to reveal customer data to third parties if the customer
wishes so. Banks could then lose a main part of their value chain since the Tech
Giants could use their excellent analytics to provide services to the end-clients. Banks
will defend their value chain. Their main weapon is the existing payment infrastructure
which they built up, such as IBAN and SWIFT. They will charge these costs to the new entrants.

The four main channels for disruption are:

• Efficiency channel.

• Customer-centricity channel.

• Transaction values or verification channel, see Blockchain.

• Data channel, see Big Data.

Disruptive efficiency is the classical view of financial intermediaries: 'All banks are
looking at ways to cut costs and also generate more revenues' (Ermotti (2016)). Digital
efficiency has a different meaning than past automation-based efficiency, which meant
digitalizing the information workflow in a value chain to reduce human activity and to
reach scalability: doing the same at lower costs, with fewer errors and using scalability.
Disruption means redesigning the workflows and eliminating humans to a degree not seen
before, using bots, avatars or smart contracts. The latter are not only digitized legal docu-
ments - say trade confirmations - but they also contain code which makes it possible for
the documents to manage themselves over the life cycle. Platforms are a second example
of disruption. While platforms have always changed the connectivity of the participants,
present platforms possess two new features: not only numerical information flows but
any form of information gained from structured and unstructured data, and platforms
use AI to analyze, manage, control and direct the information flow in the platform.
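The idea of a document that manages itself over its life cycle can be sketched as follows (a toy illustration; the class and its states are invented and do not refer to any particular smart-contract platform):

```python
from datetime import date

class TradeConfirmation:
    """Digitized trade confirmation carrying code that manages
    its own life cycle (from confirmation to settlement)."""
    def __init__(self, notional, settle_date):
        self.notional = notional
        self.settle_date = settle_date
        self.state = "confirmed"

    def tick(self, today):
        # The document advances its own state as time passes,
        # without a human operations step.
        if self.state == "confirmed" and today >= self.settle_date:
            self.state = "settled"
        return self.state

doc = TradeConfirmation(1_000_000, date(2018, 1, 3))
print(doc.tick(date(2018, 1, 2)))  # confirmed
print(doc.tick(date(2018, 1, 3)))  # settled
```

A real smart contract would additionally trigger the payment and custody legs instead of only tracking its state.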

FinTechs drive customer-centricity by developing solutions starting with the end-
customer in mind. The customers will have more bargaining power and autonomy to
decide. They can decide, value and compare, say, any investment advice in a much deeper
and more intuitive way than in the past. The digital instruments also allow taking into account
the customer's environment or context. Customers expect tailor-made, convenient,
affordable and integrated (smartphone) solutions for their asset and wealth management
purposes.

How will asset protection be considered in a digital world where cyber criminals, govern-
ments and bank defaults define the risk sources? The FinTechs do not have the reputation
and size for protecting accounts, insurance contracts and security deposits. The Tech
Giants' reputation is decreasing, in particular in the US and Europe. In the FinTech
study conducted by Roland Berger (2016), the European FinTechs mentioned customer
trust in the financial intermediaries as the only success factor for financial intermediaries.
Their protection function has worked for decades. So far there is no strong alternative to FI
regarding safekeeping of money and financial assets.

Customer-centricity also affects regulation. It changes the interactions between regu-
lation and innovation that have existed for decades: new regulation leads to innovations,
which in turn trigger regulation. This occurred on a structural (Glass-Steagall Act) or a
product level (Eurobonds). This cat-and-mouse game becomes less important due to
customer centricity, where the interaction financial innovation/customer will dominate.
This challenges regulators. How should they behave in the spotlight of these new dynam-
ics and what is the best approach to regulation? Evidently, the new dynamic,
complex interaction cannot be effectively and efficiently controlled with static, large reg-
ulatory frameworks as it was done in the past.

The WEF 2015 document The Future of Financial Services (2015) (FFS) summarizes
and extends the discussion. The paper identified 11 clusters of innovation in six functions
of financial services, see Figure 6.1.
The approach of considering six independent intermediary functions and identifying
within these functions the eleven clusters is a silo business view. The clusters can be
grouped into six themes that cut across traditional functions:

• Streamlined Infrastructure: Emerging platforms and decentralised technologies
provide new ways to aggregate and analyse information, improving connectivity and
reducing the marginal costs of accessing information and participating in financial
activities.

• Automation of High-Value Activities ...

• Reduced Intermediation: Emerging innovations are streamlining or eliminating
traditional institutions' role as intermediaries, and offering lower prices and/or
higher returns to customers.

• The Strategic Role of Data ...

• Niche, Specialised Products: New entrants with deep specialisations are creating
highly targeted products and services, increasing competition in these areas and
creating pressure for the traditional end-to-end financial services model to unbundle.

• Customer Empowerment ...

Figure 6.1: The six functions (payments, market provisioning, investment management,
capital raising, deposits and lending, and insurance) and the 11 innovation clusters (new
market platforms, smarter & faster machines, cashless world, emerging payment rails,
insurance disaggregation, connected insurance, alternative lending, shifting customer
preferences, crowd funding, process externalization, empowered investors) (The Future of
Financial Services [2015]).

In 2017, two years later, the FFS paper was reconsidered and updated by the working
group of the WEF in a status report. Some expected trends had materialized in the
two-year period, while for others the expectations were revised due to lack of demand,
technological immaturity or regulatory considerations. The main findings are:

• Fintechs have seized the initiative – defining the direction, shape and pace of
innovation across almost every subsector of financial services.

• Fintechs have reshaped customer expectations, setting new and higher bars for user
experience.

• Failure: customer willingness to switch away from incumbents has been overesti-
mated.

• Fintechs have struggled to create new infrastructure and establish new financial
services ecosystems.

We close this section with sentiments about the digital disruption. On a
broad scale, two-thirds of 400 CEOs in the US surveyed by KPMG in 2016 believed that
the next three years will be more critical for the business performance of their companies
than the past 50 years. Additionally, Grossman (2016) states in a CEO survey, based
on Russell Reynolds Associates, that there are only three industry sectors, Health Care,
Asset Management and Industries, where less than 50% of the CEOs expect massive
or moderate digital disruption. For Media, Consumer Financial Services and Telecom
more than 60% expect such a scenario. 34 percent of all 4'000 Chief Information Officer
respondents surveyed in more than 50 countries note that digital disruption is already a
reality in their companies, and a further 28% say that this will happen in the next 1 to 2
years (Harvey Nash (2015)). Those responsible for information expect a much stronger
disruption for the services industry, due to the lack of physical components, than for the
processing, pharmaceutical or energy sectors.

It is a scientific fact that financial literacy of the population is at a low level. Hence,
any link to end-customers which is based not on a rational but on an emotional paradigm
is likely to win the battle for the end-customer connection. Although in principle an emotional
link can be formed by using a human interaction with the end-customer, this approach
is not scalable: a client advisor in Europe has between 100 and 400 clients to serve.
Therefore a digital link is a more promising solution. But how can communication
between a human and a software generate an emotional basis? The software needs to
care and perform. This means understanding the customer's needs in his life-cycle context
and giving meaningful advice. AI is pointing in this direction. If this is possible, why
should end-customers bother that they do not communicate with a human?

6.1.3 Value Chain, Investment Process and Technology


Asset management is more than just investment theory. Roughly, by knowing an in-
vestment strategy we have not yet set up the machinery that shows how the strategy can be
implemented, priced, sold and managed for many investors efficiently, and we do not
know how to export our AM capacity to other cultures and jurisdictions in a compliant
and profitable way. These issues define the value chain of AM, see Figure 6.2, where the
production part of the value chain is shown.

Figure 6.2: Structure of the AM value chain.

The chain has two layers: the business and the infrastructure level. The business layer
has the following main functions (see Figure 6.2):

• The front, middle and back office.

• Product management.

• Solution providers.

The front office consists of the distribution channel and the investment process. In
this part of the chain the investor's preferences, risk capacity, and the type of investment
delegation (execution-only, mandate, or advisory) are defined. All communication to end
clients is made via this channel - new solutions, performance, risk reporting, etc. The
investment process, headed by the CIO, starts with the investment view applied to the
admissible investment universe. The view is then implemented by portfolio managers,
where different procedures can be followed. More precisely, the investment process has
the following sub-processes for mandate clients:

• Investment view by the CIO.

• Tactical asset allocation (TAA) construction.

• Implementation of the TAA by asset managers.

• Matching of the eligible client portfolio to the implemented portfolios.

The middle office is responsible for reporting and for controlling the client portfolio
with respect to suitability, appropriateness, performance and risk, and it also constructs the
eligible client portfolio. The back office is responsible for the execution and settlement
of the trades.

Product management defines for the investor an eligible, suitable and appropriate
offering. It is also responsible for overall governance, such as market access and regu-
latory requirements. The product management strategy tries to understand where the
market is headed, how this compares with current products, client segments served, and
firms' capabilities, and how competitors price their services in different channels. Product
managers anticipate the people, process, and technology requirements for the product.
They also assess gaps versus current capabilities and propose counter-measures. A main
function is the new-product-approval (NPA) process office. This office guarantees both
an optimal time-to-market and an effective implementation of new products. Finally,
product management also oversees out- or insourcing opportunities in the business value
chain. The solution providers in the investment process provide the building blocks for
implementing the portfolios, including funds, cash products, ETFs and derivatives.

The infrastructure layer naturally develops, maintains, and optimizes the IT infras-
tructure for the several functions of the business layer. The technology officer oversees
the developments in technology and data management and considers the out- or insourc-
ing opportunities along the infrastructure value chain.

To deal with the digital disruption, many leading companies are looking at their
businesses and operations anew, taking something of a 'blank sheet of paper' view of
the world. Many outsource important parts of their back offices (NAV calculations, 'on-
boarding', investor statements, etc.), largely as a reaction to investor pressure following
the scandals, see Section 2.5.4. According to PwC's recently released Alternative Ad-
ministration Survey, 75 percent of alternative fund managers currently outsource some
portion of their back office to administrators, and 90 percent of hedge funds behave in
this way. While the initial experience has been mixed in many respects, it has helped to
rethink the business from scratch.

6.1.4 Some Innovations


Platforms are a synonym for technical connectivity. Novus is such a platform provider.
As of Dec 2017, almost 200 of the world's top investment managers and investors - managing
a combined total of approximately USD 3.5 trillion - are using the Novus platform. At its
essence, Novus is a platform via which the industry's top investors can collectively inno-
vate. Novus aggregates funds' performance and position data. This defines a single point
of access for asset managers. Using this platform, almost all worldwide funds and their
performance are catalogued and analyzed based on an automated collection of regulatory
reporting data.

Externalization of processes is a key strategy for FI in the digital world. FFS classifies
different innovations in process externalization:

• Advanced analytics: using advanced computing power, algorithms and analytical
models to provide a level of sophistication for the solutions.

• Cloud computing to improve connectivity with and within institutions. This allows
for simpler data sharing, lowers implementation costs, streamlines the maintenance
of processes, and enables real-time processing.

• Natural language interfaces, leading to more intuitive processes for end users.

Kensho, as an example, models investment scenarios for fully automated decision-making.
The costs per generated scenario are much lower than those of the few manually generated
scenarios. Using Kensho, institutions can shift their resources away from the management
of processes to functions with higher value and where the asset management firm has
comparative advantages. Kensho threatens the ability of quants at large financial
institutions to model market projections and hypotheses by offering next-generation tools,
applications, technology and databases. Common models of the process externalisation
providers are:

• Platforms, real-time databases or expert systems leverage automation for the users
and the solution providers.

• As-a-service offerings reduce infrastructure investments to a minimum level by
externalization.

• Capability sharing between institutions frees them from building up all possible
capabilities and allows integration of different legal and technical standards.

Process externalization means for the AM industry:

• AM firms use advanced technologies to externalize, consolidate, and commoditize
processes in a more efficient and sophisticated manner.

• Winning AM activities shift from process execution to more 'human' factors.

• External service providers give small and medium-size asset managers access to
sophisticated capabilities that were previously not attainable due to lack of scale.
This gives small and medium-size asset managers access to top-tier processes,
and smaller players are able to compete with large incumbents.

• Cross-border offerings become profitable with well-controlled conduct and regulatory
risk due to the platforms. But externalization could also amplify the risks of non-
compliant activities and unclear liabilities when centralized externalization providers
fail. Automation also increases the speed at which financial institutions implement
regulatory changes. Therefore, regulators will receive faster and more consistent
inputs from financial institutions.

• Since more capabilities, technologies, and processes are externalized, asset manage-
ment firms become more dependent on third parties and lose negotiating power and
continuity.

The constantly evolving regulation across geographies, and the corresponding increase
of compliance resources, require solutions which treat regulation and its changes con-
sistently within and across different jurisdictions. New entrants are able to interpret
regulatory changes and translate them into rules. Such a rules-based approach is scalable
and allows asset managers to respond fast to regulatory changes.
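Such a rules-based approach can be sketched in a few lines (a toy illustration; the regimes and thresholds are only indicative, and real disclosure regimes are far richer):

```python
# Toy rules engine for shareholding-disclosure monitoring.
# Each rule encodes a regime's threshold as data, not code, so a
# regulatory change translates into a rule update, not a code change.
RULES = [
    {"regime": "UK", "threshold": 0.03},   # disclose at 3% of voting rights
    {"regime": "DE", "threshold": 0.03},
    {"regime": "US", "threshold": 0.05},   # e.g. a 5%-style threshold
]

def disclosures(holding_fraction, rules=RULES):
    """Return the regimes in which a holding triggers a disclosure."""
    return [r["regime"] for r in rules if holding_fraction >= r["threshold"]]

print(disclosures(0.04))  # ['UK', 'DE']
```

Because the rules are data, the same engine scales to hundreds of regimes and daily rule updates.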

FundApps is such a FinTech firm. It organizes regulatory information from various
sources and delivers a cloud-based service that automates shareholding disclosure and
monitors investment restrictions across over one hundred regulatory regimes. FundApps
partners with a global legal service provider to monitor and translate changes in rele-
vant regulations into rules on a daily basis. If regulatory agencies partner with firms such as
FundApps in the future, they could ensure consistent compliance across financial institu-
tions, make dissemination of regulatory changes in disclosure regimes faster, and reduce
the compliance burden faced by the industry (FFS).

6.2 Big Data


6.2.1 Definitions
Big data means a business process, see Figure 6.3: the goal is to answer business ques-
tions using a large amount of different types of data, with algorithms extracting the
information from the data that answers the business questions.2 From a volume
perspective, the data volume ranges from peta- to zettabytes (1 mio. petabytes).3 Besides
structured data, unstructured data such as text are considered. This requires different
tools and architectures to handle the large and heterogeneous data. The amount and
growth rate of unstructured data exceed those of structured data. For the algorithms,
open-source code is widely used, compared to the secret-code approach of the past. Since
storage capacities are almost free and computational speed is still increasing, handling
massive data is possible and profitable.

Broadly spoken, the process can be split into two steps. First, raw data are trans-
formed into model variables such as averages, aggregates, and conditionings of the raw
data. The raw data are complex, huge, structured and unstructured. The second step is
to generate analytics using algorithms. Note that the economic value of this process starts
at the end of the two steps - a clear business perspective is needed to make the big data
2 The sources in this section are Lin (2015), Roncalli (2014), McKinsey Global Institute (2011, 2013),
Varian (2013), Hastie et al. (2009), Harvey et al. (2014), Novy-Marx (2014), Bruder et al. (2011), Freire
(2015), Fastrich et al. (2015), Zou (2006), DeMiguel et al. (2009), Belloni et al. (2012), Burges (1998),
Smola and Schölkopf (2004), Jaakkola (2006).
3 1 peta means 10^15 or 1'000 trillions.

Figure 6.3: Definition of big data, adapted from Roncalli [2014]. Internal and external
data (structured, semi-structured and unstructured, at the petabyte scale) feed analytics
(prediction, visualization, clustering, learning algorithms), which support developing,
retaining and acquiring customers, optimizing pricing, and improving products and
marketing.

process meaningful.
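The two steps can be sketched as follows (a toy illustration; the records, field names and the decision rule are invented):

```python
from statistics import mean

# Step 1: transform raw, heterogeneous records into model variables.
# Raw data: structured transactions mixed with unstructured text notes.
raw = [
    {"customer": "A", "amount": 120.0, "note": "asked about mortgage"},
    {"customer": "A", "amount": 80.0,  "note": ""},
    {"customer": "B", "amount": 15.0,  "note": "complaint about fees"},
    {"customer": "B", "amount": 25.0,  "note": "complaint again"},
]

def features(records):
    """Aggregate raw records into per-customer model variables."""
    out = {}
    for c in {r["customer"] for r in records}:
        rs = [r for r in records if r["customer"] == c]
        out[c] = {
            "avg_amount": mean(r["amount"] for r in rs),
            "complaints": sum("complaint" in r["note"] for r in rs),
        }
    return out

# Step 2: run an algorithm on the model variables to answer a business
# question, here: which customers are at risk of churning?
def churn_risk(feats, max_avg=50.0, min_complaints=1):
    return sorted(c for c, f in feats.items()
                  if f["avg_amount"] < max_avg and f["complaints"] >= min_complaints)

print(churn_risk(features(raw)))  # ['B']
```

The business value sits entirely in the last line; the two preceding steps are only its enablers.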

Pre-processing the data for the second, analytic step is a challenge. For this purpose, the given structures of the different databases must be brought together. The data are not only available in different formats; they are also incomplete, have different integrity properties, are only intermittently flexible, and only partially digitized. The quality of pre-processing the data determines the value of the subsequent analytics. While in the past years the main focus was on pre-processing, innovation has shifted to analytics (algorithms). This is possible thanks to powerful, privately and publicly usable pre-processing and processing applications.
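The first step - transforming raw records into model variables - can be sketched in a few lines of Python (a hypothetical example: the field names and the chosen aggregates are illustrative, not taken from the text):

```python
from collections import defaultdict
from statistics import fmean

# Raw, semi-structured records: per-customer transaction logs.
raw = [
    {"customer": "A", "amount": 120.0, "channel": "online"},
    {"customer": "A", "amount": 80.0,  "channel": "branch"},
    {"customer": "B", "amount": 300.0, "channel": "online"},
]

# Step 1: transform raw data into model variables (averages, aggregates, conditioning).
by_customer = defaultdict(list)
for rec in raw:
    by_customer[rec["customer"]].append(rec)

features = {
    c: {
        "avg_amount": fmean(r["amount"] for r in recs),
        "n_trades": len(recs),
        "online_share": sum(r["channel"] == "online" for r in recs) / len(recs),
    }
    for c, recs in by_customer.items()
}
print(features["A"])  # -> {'avg_amount': 100.0, 'n_trades': 2, 'online_share': 0.5}
```

The same pattern scales from three records to billions; only the storage and execution engine change, not the logic.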

6.2.2 Demand for Big Data


One can broadly divide big data analytics in the financial industry (FI) into the two often overlapping clusters 'customers' and 'internal FI' data analytics. In the context of customers, big data should help to

1. develop customers,

2. retain customers,

3. acquire customers.

Internal goals of data analytics are:



1. optimize distribution, i.e. the relationship managers' performance and potential,

2. optimize marketing and branding,

3. analyze competitors and pricing,

4. ensure cyber and fraud security.

Optimization means that relationship managers know their potential given their customers' needs with respect to products and services. Another application is customer selection for a new product offering. Typically, customers selected using big data lead to a success rate several times higher than that of traditional selection methods.

The market for big data rose from USD 7.3 billion in 2010 to 130 billion in 2016 (wikibon.org, Forbes, IDC). The revenues of big data providers are split into hardware, software and service revenues. Large IT companies like IBM, HP or Dell dominate in absolute revenues, but big data's share of total sales in these companies is still in the low single-digit percentage range. Newer companies with large big data revenues are Palantir and Pivotal.

There are, however, also a number of substantive criticisms regarding big data analytics. The construction of the analysis function f often lacks theoretical foundations. This may result in insufficient performance of the analysis; the plethora of so-called Robo Advisors, for example, is leaving many investors disappointed. Data protection and data privacy are key to data science, although many individuals at present do not really care about their privacy. But this will and has to change if humanity does not want to be ruled in the future by data owned by a few firms. This is one of the great challenges for politics and will undoubtedly be a key hurdle for the growth of data analytics.

6.2.3 Algorithms
There are different types of artificial intelligence (AI) algorithms, see Figure 6.4. While AI is about algorithms, big data analytics is a business process. Funding of AI industry start-ups rose from $282 million in 2011 to $2.4 billion in 2015 (WEF (2017)), and the number of merger and acquisition deals in AI also rose, to 20 to 40 deals per annum. Vaguely, AI is the theory and application of software to perform tasks which require human intelligence. Machine learning (ML) is a narrower concept. ML, a statistical theory, extends well-known methods such as linear regression to situations where the data set is enormous or where the linearity assumption is not suitable. While econometrics is based on causal inference, ML is not: ML is based on prediction and categorization using optimization. A learner or algorithm detects characteristics in a training set, such as typical words in spam emails, and applies the insight to new emails. But such inductive reasoning might lead us to false conclusions. The word 'casino' is labelled as a spamming indicator, but the word can also appear in a non-spamming email. While human learners can rely on common sense to filter the meaning of such a word, a machine learner needs well-defined
Figure 6.4: Scheme of algorithms and big data. [The figure depicts ML as a subset of AI, overlapping with big data. ML stands for the self-learning, algorithmic identification of patterns and objects in large data sets, with applications in image/voice/speech/text recognition, fraud detection and prediction; AI differs from ML in its use of deductive as well as inductive logic. The ML types are supervised learning (classification on test sets with known results), unsupervised learning (estimation and pattern recognition, e.g. outliers and anomalies), reinforcement learning (a mix of the two preceding types) and deep learning (neural nets with hidden layers, e.g. multi-stage image recognition).]

principles in order not to reach useless conclusions. Basic is the incorporation of prior knowledge that biases the learning mechanism: the inductive bias. Evidently, there is a trade-off between too restrictive and too broad an implementation of a priori knowledge.

6.2.4 Machine Learning (ML)


Machine learning means learning from examples (a training set) which are shown to the machine; the machine tries to infer rules which can then be applied to new, unseen examples, see Figure 6.5. In this sense, ML means the automation of the process of learning from experience. We will exclusively study the statistical perspective of learning: how many samples are needed for learning? That is, we focus on the amount of information learning requires. We do not consider the important question of how much computation is involved in carrying out a learning task. ML is similar to a child which learns what a car is by being shown examples of cars. After learning, the child can decide whether a new object is a car or not. But in this example, learning covers neither how a car functions nor how a car is driven. Different from general AI, the goal is not to generate any kind of intelligent behaviour but to discover rules, tasks or mechanisms which can be learned by a computer.

If a prediction rule or algorithm f learns the correct answers on the training set - a human tells the machine what is correct on the training set - this is called a supervised learning problem. If the values of the output are not known, learning means finding structures or meaningful groups in the inputs: unsupervised learning.

Figure 6.5: Upper panel: distinction between traditional programming (data and a program feed a computer which produces output) and machine learning (data and output feed a computer which produces a program). Lower panel: a supervised learning example. Data consist of images of cats and non-cats. A supervisor classifies the images into cats and non-cats. The classification and the corresponding digital animal data feed the training algorithm such that the algorithm is then able to classify, with high precision, new, not yet seen images into cat or non-cat.

Unsupervised learning arises in consumer behaviour and investment behaviour situations. The algorithm tries for example to pool customers with similar behaviour. Reinforcement learning means optimal control, such as in intertemporal decision making in investment; key are solution concepts such as the Bellman principle. The algorithm starts with unlabelled data, chooses actions and receives feedback from a machine or a human. Reinforcement learning is used in robotics and self-driving car projects. Deep learning uses algorithms inspired by the structure and function of the brain. It uses several layers, and the algorithms are called artificial neural networks. These algorithms can be used for supervised, unsupervised or reinforcement learning. This section is based on Luxburg and Schölkopf (2008), Shalev-Schwartz (2016), Hazan (2016), Bruna (2018).

Formally, X is the set of examples or instances (pictures of cats). Every point in X has features such as four legs or two ears; they are often represented as a vector. Y is the label space. The data set S consists of all pairs of instances and labels,

S = {(x1, y1), . . . , (xm, ym)} ⊂ X × Y.

Y is in its simplest form the binary set {+1, −1} (cat, no cat). We always consider this output space in classification tasks unless otherwise stated. The data are randomly split into

the typically labelled training data, the test data with hidden labels, and the validation data used for parameter tuning. The proofs of all propositions are given in Section 7.

X, Y, S are the inputs to the statistical learning model (ML). The output is a prediction rule or hypothesis f : X → Y, where f ∈ F with F a space of functions or sets: circles, rectangles, linear functions, polynomials or more general functions. F is called the hypothesis class. A classification algorithm, for example, takes the training data as input and produces a classifier f ∈ F as output. The selection of f is the algorithm, although we often call f directly the algorithm.
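To fix ideas, the objects X, Y, S and F can be written down directly (a minimal sketch; the feature encoding and the threshold hypothesis class are illustrative assumptions, not from the text):

```python
# Instances x in X are feature vectors; labels in Y are +1 (sweet) or -1 (sour).
S = [((150.0, 70.0), +1), ((120.0, 60.0), -1), ((170.0, 80.0), +1)]

def make_classifier(threshold):
    """Prediction rule f: X -> Y with f(x) = +1 if the weight feature x[0]
    is at least `threshold`, else -1."""
    return lambda x: +1 if x[0] >= threshold else -1

# A small hypothesis class F: threshold rules on the weight feature.
F = [make_classifier(t) for t in range(100, 200, 10)]

f = F[4]  # the hypothesis with threshold 140 g
predictions = [f(x) for x, _ in S]
print(predictions)  # -> [1, -1, 1]
```

A learning algorithm then amounts to a rule for selecting one f out of F given S.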

No assumptions are made about the sets X and Y, but about the mechanism which generates the data: there exists a joint probability distribution P on X × Y, each training example (xi, yi) is sampled IID from P, and the labels are given by some function h : X → Y.

Note that P is not known at the time of learning - else learning becomes trivial. By this definition, a P changing over time is ruled out. The stationarity of the unknown distribution must be relaxed if financial time series, which are not stationary, are forecasted. We write |F| for the cardinality of a set F, i.e. the number of its elements. If F is the set of all yes/no classifiers on an X with m examples, then |F| = 2^m.

Given a training set S and a set F, the goal is to find the 'best' function f ∈ F such that the classifier is able to classify well all new data defining the test set. To illustrate what learning means, we first reconsider regression analysis and, as a second example, the problem of classifying apples into sweet and sour ones.

6.2.4.1 Regression from a ML Perspective


We follow the review of Mehta et al. (2019). Assume that the data set S is randomly split into training data Strain and test data Stest. As a rule of thumb, the training set is much larger than the test set.

The setup is given by the dataset S, the model f(θ) ∈ F, which is a function of the parameters θ, and the cost or risk function R(S, f(θ)), which measures how well f(θ) performs on the observations S. Given F, the goal is to find the value of the parameters θ∗ that minimizes the risk function, where we assume the squared error risk function. The cost or risk function is given by an expected value of a loss function or, equivalently, by the probability of the set of all x where the true h(x) and the model prediction f(x) differ.

The model is fitted only on the training set:

θ∗ = argminθ R(Strain, f(θ)).

The performance is evaluated by calculating R(Stest, f(θ∗)) on the test set, and we define the in-sample error Rin := R(Strain, f(θ∗)) and the out-of-sample error Rout := R(Stest, f(θ∗)). Since the two risks are calculated from observed data, both values Rin, Rout can be computed for any f ∈ F that minimizes empirical risk; this is the empirical risk minimization (ERM) algorithm which we consider below.

In general, Rout ≥ Rin holds. More can be said if additional assumptions are made. Consider the case of linear regression where the least squares error minimizer θ∗ is computed from an IID sample of the observations. Then the averages R̄out, R̄in satisfy

R̄out = σ²(1 + p/m), R̄in = σ²(1 − p/m)

where p is the number of features in each sample. Therefore,

|R̄out − R̄in| = 2σ² p/m.

If the number of features p is much larger than the number of data points m, the error between the in- and out-of-sample risks (the generalization error) is large: the model is not learning. To improve the situation, either more samples are needed or regularization is used, such as Ridge regression or the LASSO penalty.
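The size of the generalization gap 2σ²p/m can be checked by simulation. The following sketch (an illustrative setup with p = 2 parameters, intercept and slope, seeded for reproducibility) averages the in- and out-of-sample errors of a simple least-squares regression over many trials:

```python
import random
import statistics

random.seed(0)
SIGMA, M_TRAIN, M_TEST, TRIALS = 1.0, 20, 200, 2000

def sample(m):
    """Draw m IID observations from y = 1 + 2x + Gaussian noise."""
    xs = [random.uniform(0.0, 10.0) for _ in range(m)]
    ys = [1.0 + 2.0 * x + random.gauss(0.0, SIGMA) for x in xs]
    return xs, ys

def ols_fit(xs, ys):
    """Closed-form least squares for intercept a and slope b."""
    xbar, ybar = statistics.fmean(xs), statistics.fmean(ys)
    b = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / \
        sum((x - xbar) ** 2 for x in xs)
    return ybar - b * xbar, b

def mse(a, b, xs, ys):
    return statistics.fmean((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

r_in, r_out = [], []
for _ in range(TRIALS):
    xtr, ytr = sample(M_TRAIN)
    xte, yte = sample(M_TEST)
    a, b = ols_fit(xtr, ytr)
    r_in.append(mse(a, b, xtr, ytr))    # in-sample error of this trial
    r_out.append(mse(a, b, xte, yte))   # out-of-sample error of this trial

gap = statistics.fmean(r_out) - statistics.fmean(r_in)
print(round(gap, 3))  # close to 2 * SIGMA**2 * p / m = 0.2 for p = 2
```

With σ = 1, p = 2 and m = 20, the averaged gap should come out close to 2σ²p/m = 0.2.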

Since we do not know the exact model f ∈ F, several models f in F are considered and the model minimizing Rout is chosen as the best model. But typically the model with the lowest Rout does not have the lowest Rin. Consider data which on a large scale have a linear drift. Then a first-order polynomial model does not fit the training data perfectly, but it is expected to do well on unseen test data. A 10th-order polynomial, which has a higher model complexity, will do much better on the training data, but any new data which are not in the bulk of the training data will lead to a large error: the goal is to obtain a model that is useful for prediction, not one with the best in-sample fit. Moreover, the difference |Rin − Rout| increases with increasing model complexity, since the increasing number of parameters forces us to consider high-dimensional spaces. The 'curse of dimensionality' ensures that many phenomena that are absent or rare in low-dimensional spaces become generic.

Consider a probabilistic process that assigns a label yi to an observation xi, generated by drawing samples from the equation

yi = h(xi) + εi (6.1)

where h(xi) is some fixed unknown function and εi is a Gaussian, uncorrelated noise variable with mean zero and variance σ². The larger εi, the noisier the data. To make predictions, consider three polynomial classes: the set F1 of all linear polynomials, the set F3 of order-3 polynomials and the set F10 of order-10 polynomials. F1 has two, F3 four and F10 eleven parameters, i.e. model complexity is increasing.

We ask how the size of the training dataset Strain and the noise strength ε affect the ability to make predictions. To train the three models, the x are uniformly sampled in an interval, Strain is constructed using (6.1), and fitting on this sample is done using least-squares regression.

With zero noise, σ = 0, for any size of Strain the model class that generated the data also provides the best fit and the most accurate out-of-sample predictions. For noisy data, σ ≠ 0, and a large training set, the 10th-order model provides the lowest Rin but the worst out-of-sample predictions Rout, independent of the order of the polynomial in the data generation. If the data set is small, a simple linear model cannot represent possibly complex patterns, but complex models can. Yet if sparse data are driven by noise, the complex model will do poorly out of sample - the overfitting problem. Hence, for small data sets simple models have more predictive power: although they have a higher bias (a simple set F1 cannot learn the true model), the error on new data points is lower, i.e. the variance is lower. This is called the bias-variance trade-off, see Figure 6.6. Therefore, although the F10 models have a better predictive performance for an infinite amount of training data (less bias), the training errors stemming from finite-size sampling (variance) cause simpler models to outperform.
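A minimal version of this experiment contrasts the simple class F1 with a maximally complex model that interpolates every training point (a sketch under the assumption of a noisy linear data-generating process as in (6.1)):

```python
import random
import statistics

random.seed(1)
SIGMA = 0.5
xs_train = sorted(random.uniform(0.0, 1.0) for _ in range(8))
ys_train = [2.0 * x + random.gauss(0.0, SIGMA) for x in xs_train]

# Class F1: least-squares straight line (two parameters).
xbar, ybar = statistics.fmean(xs_train), statistics.fmean(ys_train)
slope = sum((x - xbar) * (y - ybar) for x, y in zip(xs_train, ys_train)) / \
        sum((x - xbar) ** 2 for x in xs_train)
intercept = ybar - slope * xbar

def line(x):
    return intercept + slope * x

# Maximal complexity: the Lagrange polynomial interpolating all training points.
def interpolant(x):
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs_train, ys_train)):
        weight = 1.0
        for j, xj in enumerate(xs_train):
            if j != i:
                weight *= (x - xj) / (xi - xj)
        total += yi * weight
    return total

def risk(f, xs, ys):
    """Mean squared error of predictor f on the data (xs, ys)."""
    return statistics.fmean((y - f(x)) ** 2 for x, y in zip(xs, ys))

xs_test = [random.uniform(0.0, 1.0) for _ in range(200)]
ys_test = [2.0 * x + random.gauss(0.0, SIGMA) for x in xs_test]

err_line = risk(line, xs_test, ys_test)
err_interp = risk(interpolant, xs_test, ys_test)
print(risk(interpolant, xs_train, ys_train))  # -> 0.0 (perfect in-sample fit)
print(err_line < err_interp)                  # overfitting: True is expected here
```

The interpolant has zero training error but oscillates wildly between the noisy points, so the simple line wins out of sample - the bias-variance trade-off in miniature.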

Can we say something general about the relationship between Rin and Rout? See Figure 6.6.

Figure 6.6: Left panel: Rin and Rout as a function of the training set size, where by assumption the model cannot exactly fit the true function h(x). Right panel: bias-variance trade-off and model complexity. Shown is Rout as a function of the model complexity for a training dataset of fixed size. The bias decreases and the variance increases with model complexity.

The two risks are a function of the amount of training data. Assume that the true data are driven by a distribution so complicated that we cannot exactly learn the function h(x). Rin increases with the number of data points, since our models are not powerful enough to learn the true function. Rout decreases for more data points, and Rout and Rin must approach the same value: the 'bias' of our model. The difference between the two errors is called the generalization error. Bias means how much, on average, the predicted values differ from the actual ones, i.e. erroneous assumptions about F such that relevant relations between features and outputs are missed, i.e. underfitting. Hence, the more complex F, the smaller the bias.

But an infinite amount of data is not available. To get the best predictive power it is better to minimize Rout rather than the bias. Rout can be decomposed into a bias and a variance. Models with a large gap |Rout − Rin| 'overfit' the data, i.e. fitting and prediction differ heavily. It is not enough to minimize the training error, since Rout can still be large. The second panel in Figure 6.6 shows Rout as a function of the complexity of the model class F assumed to approximate the true function h(x). ML makes precise what complexity means, see below. Rout is a non-monotonic function of the model complexity on the training set: it is minimized for models with intermediate complexity. Using more complicated models F reduces the bias, but for too complex F the generalization error becomes large due to high variance. Thus, to minimize Rout, a more biased model with small variance is better suited than a less-biased model with large variance (bias-variance trade-off). A high variance leads to modelling random noise in the training data instead of the intended outputs, i.e. overfitting. Models with a lower bias in parameter estimation have a higher variance of the parameter estimates across samples, and vice versa. Models with low bias are more complex, which allows for a better representation of the training set. Summarizing, an optimal function f makes both (i) the training error and (ii) the gap between training and test error small. Failing these two goals corresponds to underfitting, where the model is not able to achieve a small training error, and overfitting, where the gap is too large.

We discuss the bias-variance trade-off for continuous regression predictions. Consider a dataset S consisting of m data points generated from the noisy model (6.1). Let f̂ be the least squares predictor for a new data point x, obtained by minimizing a squared error cost function. We want to find a function f̂(x) that approximates the true function h by making the mean squared error E[(y − f̂(x))²] minimal. Finding an f̂ that generalizes to points outside of the training set can be done with any algorithm used for supervised learning. Whichever function f̂ we select, we can decompose its expected error on an unseen sample x as follows:

Rout = E[(y − f̂(x))²] = Bias[f̂(x)]² + Var[f̂(x)] + σ²

where

Bias[f̂(x)] = E[f̂(x)] − h(x)

and

Var[f̂(x)] = E[f̂(x)²] − E[f̂(x)]².

See Section 7 for a proof. Since all three terms are non-negative, the irreducible noise σ² forms a lower bound on the expected error on unseen samples.
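The decomposition can be verified numerically. The sketch below (illustrative choices: true function h(x) = 2x and a deliberately biased constant predictor equal to the mean training label) estimates both sides by Monte Carlo:

```python
import random
import statistics

random.seed(2)
SIGMA, M, TRIALS = 0.5, 10, 20000
X0 = 0.8  # fixed unseen test point

def h(x):
    return 2.0 * x  # true function, assumed for the experiment

f_hats, sq_errors = [], []
for _ in range(TRIALS):
    # Fresh training set; the (biased) model is the constant predictor
    # f_hat(x) = mean of the training labels.
    ys = [h(random.uniform(0.0, 1.0)) + random.gauss(0.0, SIGMA) for _ in range(M)]
    f_hat = statistics.fmean(ys)
    f_hats.append(f_hat)
    y0 = h(X0) + random.gauss(0.0, SIGMA)  # noisy observation at X0
    sq_errors.append((y0 - f_hat) ** 2)

lhs = statistics.fmean(sq_errors)            # E[(y - f_hat(X0))^2]
bias = statistics.fmean(f_hats) - h(X0)      # E[f_hat(X0)] - h(X0)
var = statistics.pvariance(f_hats)           # Var[f_hat(X0)]
rhs = bias ** 2 + var + SIGMA ** 2
print(round(lhs, 3), round(rhs, 3))  # the two sides agree up to sampling error
```

Both sides agree up to Monte Carlo error, with the bias term dominating because the constant model cannot represent the linear h.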

6.2.4.2 Classifying Apples


We want to classify apples into sweet and sour ones. We assume that the two features weight (g) and diameter (mm) matter, i.e. X is a two-dimensional lattice with spacings of 1 g and 1 mm, respectively, and Y = ±1 according to the outcome sweet or sour. For each date t = 1, 2, . . ., an apple xt is presented. The learner then predicts ŷt ∈ Y, i.e. sweet or sour. The environment then reveals the true label yt ∈ Y. The goal of the learner is to make as few mistakes as possible when he has to classify the randomly chosen apples, which are IID, without knowledge of the distribution P. Without any hypotheses, if the number of apples is unlimited such that at each date a new apple is shown, the learner might always err; he cannot know the labels of the apples. If there are only a finite number of apples, an algorithm could memorize all labels, but this is not what we would call learning. This learning setting is called on-line learning, since the learner receives one sample (apple) at a time and makes a prediction for this sample. A different setting is batch learning, where the learner receives the full training sample for prediction.

The way out to make learning meaningful is to provide the learner with more knowledge. We assume that the environment produces the labels of the apples by applying a function h : X → Y which is an element of F. By assumption, F is a finite set of rectangles which are aligned with the axes. That is, we restrict the set of classifiers to be finite and given by rectangles. Explicitly, the maximum rectangle is given by 200 g and 100 mm. The prediction rule f is f(x) = 1 if x is an element of the interior of the rectangle, else the value is −1. Therefore, the learner knows F but not h. Figure 6.7 illustrates the construction. The right panel shows the problem of overclassification: there is no rectangle which classifies the apples. This also shows that learning is different from fitting. The learner prefers in this case a less complex classifier - a rectangle - to the complex fitting area. Hence, a key task is to define complexity or simplicity in ML.

Given the set F of rectangles, does the unknown h exist? That is, does there exist a rectangle which fully determines which apples are sweet? The right panel in Figure 6.7 shows an example where realizability does not hold: there does not exist a rectangle h ∈ F such that the probability that h and f agree is one. The realizability assumption means to assume the existence of h. This simplifies the arguments, but it can be waived by using so-called agnostic learning, see below.

Given this hypothesis, how could we define learning? We consider:

1. Consistent learner. He starts with F1 = F, i.e. all rectangles at date 1. At each future date t, given an apple xt, he picks an f ∈ Ft and predicts f(xt) = ŷt. Having observed yt, the set Ft is updated to Ft+1 by removing f if the hypothesis was not successful (ŷt ≠ yt). This algorithm rules out more and more of the initial set of rectangles.
Figure 6.7: Left panel: optimal rectangle classifier. Blue denotes sour and red sweet apples. Right panel: there exists no optimal rectangle classifier. The optimal region (yellow) is a complex domain which correctly classifies the shown apples but will hardly classify additional apples correctly. This corresponds to overclassification (similar to overfitting), which means that the optimal algorithm shown will poorly generalize to further, not yet classified apples.

2. Halving learner. He behaves as the consistent learner, except that he predicts the majority vote of f(xt) over f ∈ Ft.

Theorem 74. The consistent learner makes at most |F| − 1 errors; the halving learner at most log2 |F|.

The proof is simple. Suppose that at t the f ∈ Ft used for prediction leads to an error, i.e. f ∉ Ft+1 and hence |Ft+1| ≤ |Ft| − 1. By induction,

|Ft+1| ≤ |Ft| − 1 ≤ |Ft−1| − 2 ≤ . . . ≤ |F1| − t.

For the halving learner, the result follows by induction on |Ft+1| ≤ |Ft|/2, since for any error at least half of the functions in Ft will not be in Ft+1. Although the halving learner makes dramatically fewer errors, the runtime of halving grows with |F|. That is, efficient computation is needed in learning theory, else the whole methodology becomes useless.
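The halving learner is a few lines of code. The sketch below (an illustrative 10 × 10 feature grid; the true rectangle is an assumption) runs the online protocol and confirms the mistake bound of Theorem 74:

```python
import math
import random

random.seed(3)

# Finite hypothesis class F: all axis-aligned rectangles on a 10 x 10 grid.
# A rectangle (w1, w2, d1, d2) predicts +1 (sweet) inside, -1 (sour) outside.
F = [(w1, w2, d1, d2)
     for w1 in range(10) for w2 in range(w1, 10)
     for d1 in range(10) for d2 in range(d1, 10)]

def predict(rect, x):
    w1, w2, d1, d2 = rect
    return +1 if w1 <= x[0] <= w2 and d1 <= x[1] <= d2 else -1

target = (2, 7, 3, 8)  # the unknown h; realizability holds since target is in F

consistent, mistakes = list(F), 0
for _ in range(500):  # online stream of randomly drawn apples
    x = (random.randrange(10), random.randrange(10))
    votes = sum(predict(rect, x) for rect in consistent)
    y_hat = +1 if votes >= 0 else -1   # halving learner: majority vote
    y = predict(target, x)             # the environment reveals the true label
    if y_hat != y:
        mistakes += 1
    # Remove every hypothesis inconsistent with the revealed label.
    consistent = [rect for rect in consistent if predict(rect, x) == y]

print(mistakes <= math.log2(len(F)))  # -> True: at most log2 |F| mistakes
```

Note the runtime issue mentioned above: every round touches the whole remaining class, so the cost grows with |F|.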

How well does the algorithm f perform? The error function or true risk measures the performance given a perfect classifier h (if it exists) and the unknown P:

R(f) = P(f(x) ≠ h(x)). (6.2)

The identity P(A) = E(χA) shows for a set A that risk is an expected value. Therefore,

R(f) = P(f(x) ≠ h(x)) = E(χ{f(x)≠h(x)}) = E(l(x, y)) (6.3)

with l the loss function. Hence, the error or risk can equivalently be expressed as the expected loss. To compare risk with the best possible learning rule, we define the

minimum risk value, the Bayes risk,

R∗ = inf_{f∈F} R(f).

For a binary classification, the classifier leading to minimum risk can be explicitly calculated.

Theorem 75. Let F be the set Fall of all possible measurable functions f : X → Y = {−1, 1}. Then the Bayes classifier

fBayes(x) := +1 if P(Y = 1|X = x) ≥ 1/2, and −1 else,

defines an optimal classifier, i.e. it attains the Bayes risk.

Although the Bayes risk is smaller than the risk of any other chosen classifier f, since we don't know P, we can compute neither the Bayes classifier nor the associated risk. But this classifier serves as a benchmark.

Is it possible to find f such that R is zero? No. Consider X = {x1, x2} with P({x1}) = 1 − ε, P({x2}) = ε, 0 < ε < 1, and m IID samples. The probability that x2 is not seen among all samples is (1 − ε)^m ∼ e^{−εm} for m large.4 If ε is much smaller than 1/m, then the probability of not seeing x2 tends to one: the label of x2 is not known. Therefore, we are satisfied if

R(f) ≤ ε

with the accuracy ε chosen. There is a second problem, arising from the randomness of the input. The probability that the learner observes the same example over and over again is not zero: R(f) ≤ ε cannot be guaranteed by any algorithm. We allow the algorithm to fail with a chosen confidence probability δ over the random choice of examples. Summarizing, the learner asks for training data S containing m(ε, δ) examples. This defines Probably (with probability at least 1 − δ) Approximately (up to accuracy ε) Correct - PAC - learning.

Definition 76. m(ε, δ) is the sample complexity function.

This function does not depend on P and f.

Definition 77 (Statistically Learnable). A set F is statistically learnable if for all ε, δ > 0 there exist a sample complexity function m(ε, δ) = |S| and an algorithm that produces f with R(f) < ε with probability 1 − δ and

m(ε, δ) = Poly(1/ε, ln(1/δ), ln |F|)

with Poly representing polynomial growth. F is PAC-learnable if the runtime of the algorithm is polynomial in S.

Learnability assumes that the number of samples required for generalization depends logarithmically on the size of F and that it increases with increasing accuracy requirements, i.e. decreasing ε and δ.
4 (1 − ε)^m = e^{m ln(1−ε)} ∼ e^{−εm}, using ln(1 − ε) ≈ −ε for small ε.

6.2.4.3 Learning Finite Classes


Assume F, S and realizability given. Empirical risk

Remp(f) = |{(xi, yi) | f(xi) ≠ yi}| / m, (6.4)

which counts the errors of the algorithm, is observable, contrary to theoretical risk (6.2). Empirical risk Remp(f) depends on the data set, i.e. one often writes Remp,m(f). Empirical risk minimization (ERM) is given by any algorithm fERM that minimizes empirical risk:

fERM := argmin_{f∈F} Remp(f). (6.5)

This is the most important estimator for the unknown theoretical risk. It should be considered with care, since overfitting may lead to a very low performance of the ERM. Consider the apple classification problem where there exists a small rectangle R′ of area 1 which perfectly classifies all sweet apples: they are elements of R′ and the sour ones are elements of the complement. Consider a second, larger rectangle R with area 2. Then fERM(x) = yi if x = xi for some i, and sweet otherwise, is optimal and has zero error on the training set. But outside of the sample the error is 1/2. We have not restricted the hypothesis class in which we searched for an explanation of the data, i.e. it was too large and we faced overfitting. In general, if we choose the set of all functions Fall and set fm equal to the classifier which minimizes empirical risk among all functions, then in general consistency does not hold true: empirical risk will not converge to the true unobservable risk as the number of observations tends to infinity.
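For a finite class, ERM as in (6.4)-(6.5) is a direct search. The sketch below (illustrative setup: one feature, a class of threshold classifiers, and 10% label noise) selects the empirical risk minimizer:

```python
import random

random.seed(4)

# Training set: one feature x in [0, 10]; true rule is sweet (+1) iff x >= 6,
# with 10% label noise, so zero training error is not attainable in general.
S_train = []
for _ in range(100):
    x = random.uniform(0.0, 10.0)
    y = +1 if x >= 6.0 else -1
    if random.random() < 0.1:
        y = -y
    S_train.append((x, y))

# Finite hypothesis class F: threshold classifiers f_t(x) = +1 iff x >= t.
thresholds = [t / 10.0 for t in range(0, 101)]

def emp_risk(t, S):
    """Empirical risk (6.4): the fraction of misclassified examples."""
    return sum(1 for x, y in S if (+1 if x >= t else -1) != y) / len(S)

# ERM (6.5): select the hypothesis with minimal empirical risk.
t_erm = min(thresholds, key=lambda t: emp_risk(t, S_train))
print(round(t_erm, 1), round(emp_risk(t_erm, S_train), 2))
```

By construction, the selected threshold has empirical risk no larger than that of the true threshold 6.0 - which is exactly why, without the sample size restrictions of Theorem 78, ERM can overfit.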

Theorem 78 (ERM PAC Learnable, Finite Case). Assume that F is a finite set, fERM is defined in (6.5) and

m ≥ (1/ε) log(|F|/δ)

for all ε, δ > 0; then with probability 1 − δ,

R(fERM) ≤ ε.

This theorem applies to any machine learning model satisfying the assumptions; apart from the finiteness of F, it restricts neither P nor F. We prove more general theorems below.

We reconsider the classification of apples with the two features weight (g) and diameter (mm), where the maximum rectangle classifier of these two features is given by 200 g and 100 mm. Apples are sampled IID uniformly from this maximum rectangle and realizability holds. If F is the set of all indicator functions for all possible rectangles R′ with a precision of 1 g and 1 mm, then the cardinality |F| of F is bounded by the huge number

|F| ≤ 200² × 100² × 2 = 800 mn,

where the last factor two is due to apples being sweet or sour. The theorem states that the problem is PAC learnable. Note that log |F| ≤ 50, and it follows that the sample size |S| can be taken as

|S| = 100 × log(100² × 200² × 2) ≤ 100,000.

The theorem is generalized to account for the case where f ∉ F (the case of agnostic learning), to the case where F is not finite, and to the questions of how to choose F and how to pick the best f. And if there exists an optimal algorithm in F, how can we find it without going through the class elements one after the other?
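The bound of Theorem 78 is easy to evaluate numerically (illustrative choices ε = δ = 0.01; the natural logarithm is assumed):

```python
import math

# Theorem 78: m >= (1 / eps) * log(|F| / delta) samples suffice for
# R(f_ERM) <= eps with probability 1 - delta, for a finite class F.
card_F = 200**2 * 100**2 * 2   # the 800 million rectangle classifiers from the text
eps, delta = 0.01, 0.01        # illustrative accuracy and confidence parameters

m = math.ceil((1.0 / eps) * math.log(card_F / delta))
print(card_F)  # -> 800000000
print(m)       # -> 2511
```

Despite the huge class, only a few thousand samples are needed: the sample complexity grows only logarithmically in |F|.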

6.2.4.4 Agnostic Learning


We drop the realizability assumption but keep the finiteness of F; this defines agnostic learning. Reconsider the apple classification problem following equation (6.5). Apples are sampled IID uniformly from a maximum rectangle, and F was chosen to be the set of all indicator functions for all possible rectangles R′ with a precision of 1 g and 1 mm. If there is noise in the data, it cannot be possible to perfectly classify the training set with a rectangle, i.e. with error zero on the training set. This absolute error goal is then weakened to a relative PAC criterion.

Given PAC, agnostic learning is defined not with respect to the zero-error case but with respect to the error compared to the best f ∈ F: Definition 77 remains unchanged except that RP(f) < ε is replaced by

RP(f) < min_{f∗∈F} RP(f∗) + ε.

Every finite F is agnostically ERM learnable: the learning algorithm of Theorem 78 carries over with a change in the sample complexity function (note the square in the epsilon term):

m(ε, δ) = Poly(1/ε², ln(1/δ), ln |F|).

6.2.4.5 Generalization
Given a hypothesis class F, let fm be the classifier with the smallest empirical risk Remp(fm). But is the true risk R(fm, X) =: R(fm) small too? Is the error still small if fm is applied to all of X (not only the test data) with the unknown P? The strong law of large numbers states that |R(fm) − Remp(fm)| converges to zero for m → ∞ for a fixed f. The classifier fm then generalizes well. But there are two issues to consider. First, the rate of convergence is unknown. Convergence can be so slow that the number m of test data needed for a given accuracy becomes extremely large. Second, empirical risk should approximate true risk uniformly in F and P, i.e. not only for a fixed f. This defines the concept of consistency. Why is consistency important? Consider a finite set F such that for each f there exists a sample where the difference between true and empirical risk is small. But for a given sample we don't know how many of the functions in F satisfy the inequality. Furthermore, fm, which minimizes empirical risk, need not also minimize true risk - the difference between the two risk measures for fm can become large. We want to rule out both cases; empirical risk needs to converge towards true risk uniformly in F (independently of f ∈ F). We address these two issues.

Writing fF for the theoretical classifier which minimizes true risk given a set F, we define:

Definition 79. An algorithm is called consistent w.r.t. F and P if for all ε > 0: P(R(fm) − R(fF) > ε) → 0, m → ∞.

The inequalities of Hoeffding and Chernoff, see below, address the speed of convergence: for a fixed function f, empirical risk is close to the actual risk,

P(|Remp(f) − R(f)| ≥ ε) ≤ 2e^{−2mε²}. (6.6)
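Inequality (6.6) can be illustrated by simulation (a sketch; the true risk of 0.3 is an assumed value for a fixed classifier):

```python
import math
import random

random.seed(5)
M, EPS, TRIALS = 100, 0.1, 5000
TRUE_RISK = 0.3  # fixed classifier f with R(f) = P(f(x) != h(x)) = 0.3 (assumed)

deviations = 0
for _ in range(TRIALS):
    # Empirical risk on m IID samples: each sample is an error with prob. 0.3.
    errors = sum(1 for _ in range(M) if random.random() < TRUE_RISK)
    if abs(errors / M - TRUE_RISK) >= EPS:
        deviations += 1

freq = deviations / TRIALS
bound = 2.0 * math.exp(-2.0 * M * EPS**2)  # Hoeffding bound 2 e^{-2 m eps^2}
print(freq <= bound)  # -> True: the Hoeffding bound (6.6) holds
```

The observed deviation frequency is far below the bound, which is distribution-free and therefore conservative.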

Hence, if m is sufficiently large, it is highly probable that the training error provides a good estimate of the test error. These results are not sufficient to prove consistency of empirical risk: the difference between empirical and true risk should become simultaneously small for all functions f ∈ F. Formally,

sup_{f∈F} |R(f) − Remp(f)| ≤ ε.

But then also

|R(fm) − Remp(fm)| ≤ sup_{f∈F} |R(f) − Remp(f)| ≤ ε.

The quantity on the right-hand side is what the uniform law of large numbers deals with. Proving uniform convergence of empirical risk implies convergence of |R(fm) − R(fF)|:

|R(fm) − R(fF)|
= R(fm) − R(fF)
= R(fm) − Remp(fm) + Remp(fm) − Remp(fF) + Remp(fF) − R(fF)
≤ R(fm) − Remp(fm) + Remp(fF) − R(fF)
≤ 2 sup_{f∈F} |R(f) − Remp(f)|,

where we used in the second-to-last line Remp(fm) − Remp(fF) ≤ 0, by definition of fm. Therefore,

P(|R(fm) − R(fF)| ≥ ε) ≤ P(sup_{f∈F} |R(f) − Remp(f)| ≥ ε/2).

Hence, if we can prove consistency for the empirical risk inequality, consistency for |R(fm) − R(fF)| follows.

The proof of consistency of sup_{f∈F} |R(f) − Remp(f)| is done in several steps, where F can be an infinite set.

• Step I: Symmetrization or ghost sample trick. R(f) is not known. The trick
  replaces this quantity by virtually doubling the existing sample of m points and
  replacing true risk by a computable empirical risk.

  Proposition 80 (Vapnik-Chervonenkis Symmetrization Lemma). For mε² ≥ 2:

  P(sup_{f ∈ F} |R_emp(f) − R(f)| ≥ ε) ≤ 2P'(sup_{f ∈ F} |R_emp(f) − R'_emp(f)| ≥ ε/2),

  where the second distribution P' refers to the IID distribution of a sample of size
  2m, R_emp measures the risk on the first m samples and R'_emp on the second m
  samples.

• Step II: Finiteness. Step I implies that F can be treated as finite: restricted to
  the 2m sample points, there are at most 2^{2m} distinct behaviours of functions in
  F. Writing F̄ for this finite set, we replace the infinite set F:

  2P'(sup_{f ∈ F} |R_emp(f) − R'_emp(f)| ≥ ε/2) = 2P'(sup_{f ∈ F̄} |R_emp(f) − R'_emp(f)| ≥ ε/2).

• Step III: Shattering coefficient, union bound, Hoeffding. The last step is
  to bound the above expression by

  P'(sup_{f ∈ F} |R_emp(f) − R'_emp(f)| ≥ ε/2) ≤ S(F, 2m) e^{−mε²/4}, (6.7)

  with S the shattering coefficient; the union bound trick as well as the Hoeffding
  inequality are used.

If the RHS of the last expression converges to zero for m to infinity, then ERM is
consistent for the infinite function set F. Given the exponential function, if the shattering
coefficient does not grow too strongly, then convergence follows.

Given uniform convergence, how should the space F be chosen such that the shattering
coefficient times the exponential function in Step III converges? If we choose F to be the
set of all functions, then the classifier f_m contains all Bayes classifiers, which leads to
inconsistency. More precisely, consider an ERM predictor over F_all from X to {0, 1}.
This means that no a priori knowledge exists or that every possible function is considered
a good candidate. But the No-Free-Lunch theorem of data science predicts that the ERM
predictor will fail on some learning task.⁵ Therefore, the class of all functions is not PAC
learnable.

To prevent such a situation, we use prior information to restrict the space of all
functions. Clearly, we do not want to reduce it to the extent that the classifier with zero

5 This theorem broadly states that there does not exist a learner or an algorithm which can succeed
on all learning tasks.

error (PAC) or small error (agnostic) is ruled out. The error decomposition shows that
there is an intermediate reduction. To achieve this, see Figure 6.8:

R(f_m) − R(f_Bayes) = [R(f_m) − R(f_F)] + [R(f_F) − R(f_Bayes)], (6.8)

where the first bracket is the estimation error and the second one the approximation
error.

Figure 6.8: Estimation and approximation error. The space of all functions F_all
contains the Bayes classifier f_Bayes; the algorithm searches in the subspace F, which
contains f_F and f_m.

The first term is a random quantity depending on the data; it measures how close f_m
is to the best choice f_F in F. Approximation risk is not driven by randomness but by
the error made if we search in a space F instead of in the space of all functions F_all.
Choosing an appropriate space F balances the two risks. The approximation error does
not depend on the sample size. Enlarging the class leading to f_F can decrease the
approximation error. If the realizability assumption holds, this risk is by definition zero.
The estimation error arises because the empirical risk is only an estimate of true risk.
The quality of this estimation depends on the training set size and on the size or
complexity of the hypothesis class. We have shown in the finite-F theory that this error
increases logarithmically with the size |F| and decreases with m. The notions of
shattering and the VC-dimension will make explicit which classes F are learnable and
which are not. In order to minimize total risk, the above decomposition induces the
so-called bias-complexity tradeoff. Choosing F large decreases the approximation error
and increases the estimation error due to overfitting. A very small F has the opposite
effects due to underfitting.

6.2.4.6 Inequality of Hoeffding

We provide the details of the three steps in the last section and start with the inequality
of Chernoff (1952) and its generalization by Hoeffding (1963). They describe how well
empirical risk on a finite sample approximates expected true risk. Their bounds are used
over and over in statistical learning theory.

Proposition 81 (Hoeffding). Let x_1, ..., x_m be independent bounded random variables
with x_i taking values in [a_i, b_i]. Let S_m = Σ_i x_i. Then for every ε > 0:

P(|S_m − E(S_m)| ≥ ε) ≤ 2e^{−2ε²/W_m²} (6.9)

with W_m² = Σ_i (b_i − a_i)².

An increasing number m of training data reduces the probability of large deviations
of the empirical mean from the expected value. Replacing the variables in the theorem
with those used in learning theory, where the random variables take values in the unit
interval, we get for the deviation of empirical risk from true risk:

P(|R_emp(f) − R(f)| ≥ ε) ≤ 2e^{−2mε²}. (6.10)

We rewrite (6.10) using δ := 2e^{−2mε²}, i.e. ε = sqrt(log(2/δ)/(2m)), as

P(|R_emp(f) − R(f)| > sqrt(log(2/δ)/(2m))) ≤ δ. (6.11)

Inverting the probability means that with probability at least 1 − δ,

|R_emp(f) − R(f)| ≤ sqrt(log(2/δ)/(2m)).
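The inversion can also be read as a sample-size requirement. A minimal numeric sketch (the function names are ours, not from the text):

```python
import math

def hoeffding_tail(m, eps):
    """Right-hand side of (6.10): bound on P(|Remp(f) - R(f)| >= eps)."""
    return 2 * math.exp(-2 * m * eps ** 2)

def sample_size(eps, delta):
    """Smallest m with 2 exp(-2 m eps^2) <= delta, obtained by solving for m."""
    return math.ceil(math.log(2 / delta) / (2 * eps ** 2))

m = sample_size(0.05, 0.05)   # number of samples needed for eps = delta = 0.05
```

For ε = δ = 0.05 this gives m = 738 training points: the bound at m = 738 is just below 0.05, while at m = 737 it is still above.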

For any function f and any positive δ, true risk is bounded by empirical risk plus the
term sqrt(log(2/δ)/(2m)) with probability at least 1 − δ. If m is sufficiently large, the
training error becomes a good estimate of the true error. But the result is limited since
it holds for a given function f: for this function there is a training set where the bound
holds with probability 1 − δ, but for a different function f' the training set may be
different and the bound fails with probability δ. The bound does not rule out the case
that the deviation is large for a generic function while for a fixed f this is unlikely to
happen. The bounds do not hold uniformly. Hence, Hoeffding's bound is not sufficient
to prove consistency of empirical risk minimization. Applying the Glivenko-Cantelli
theorem, which holds under the same assumptions as the DKW theorem below, uniform
convergence follows.⁶

6 Convergence means in the almost-everywhere sense. The DKW inequality quantifies how fast an
empirical distribution function approaches the distribution function from which the empirical samples
are drawn. It strengthens the Glivenko-Cantelli lemma on uniform convergence of empirical distribution
functions.

This means: ∀ε > 0,

P(sup_{f ∈ F} |R_emp(f) − R(f)| ≥ ε) → 0,  m → ∞. (6.13)
The next theorem summarizes:

Proposition 83 (Vapnik and Chervonenkis). Uniform convergence for all positive ε in
(6.13) is necessary and sufficient for consistency of empirical risk minimization w.r.t. F.

This abstract result is not very useful for applications since there is no characterization
of whether for a given set F the uniform law of large numbers holds. The theorems of
DKW and Glivenko-Cantelli are one-dimensional results. This defines a second challenge
since our set F is high dimensional. We address these issues - which properties of F
determine uniform convergence - starting with the union bound trick.

6.2.4.7 Union Bound

The trick is based on the elementary statement that the probability of a union of events
is smaller than or equal to the sum of the individual probabilities. We apply this to the
case where F consists of only finitely many functions f_1, ..., f_m. Then we have finitely
many 'or' operations:

P(sup_{f ∈ F} |R_emp(f) − R(f)| ≥ ε)
= P(|R_emp(f_1) − R(f_1)| ≥ ε or |R_emp(f_2) − R(f_2)| ≥ ε or ...). (6.14)

Applying the union bound trick and the Hoeffding inequality, with n the sample size:

P(sup_{f ∈ F} |R_emp(f) − R(f)| ≥ ε) ≤ Σ_i P(|R_emp(f_i) − R(f_i)| ≥ ε) ≤ 2m e^{−2nε²}. (6.15)

This proves that empirical risk minimization over a nite set F is consistent with respect
to F: The supremum can be taken outside of the probability. Equivalently, for each
function f ∈F we have with probability 1−δ
s
log 2δ
R(f ) = Remp (f ) + log m + . (6.16)
2n
We see that uniformity, which is needed for consistency, increases the error bound by the
factor log m. The nite dimensional theory is next generalized to innite sets F since
(6.16) becomes meaningless if m tends to innity.

We define the empirical distribution function

F_m(x) = (1/m) Σ_{i=1}^m χ_{X_i ≤ x},  x ∈ R.

Proposition 82 (Dvoretzky-Kiefer-Wolfowitz inequality). Let the X_i be IID. Then

P(sup_{x ∈ R} |F_m(x) − F(x)| > ε) ≤ 2e^{−2mε²}  for every ε ≥ sqrt(ln 2 / (2m)). (6.12)

6.2.4.8 VC Theory

Can F be learned if its cardinality is infinite, i.e. does the theorem about agnostic
learning for finite F generalize? We start with an example which shows that finiteness
of F is a sufficient condition for learnability but not a necessary one. Hence, the size of
the class F is not the measure needed to classify the complexity of ML models into
learnable and non-learnable ones.

Let f_r(x) = 1 if x ≥ r ∈ R, and zero else. Set F+ equal to the set of all f_r, where r
runs through the positive reals. That is, F+ consists of all positive half-lines. It has
uncountably many elements and we set X = R, Y = {0, 1}. Despite its dimensionality,
F+ is statistically PAC learnable and agnostically learnable. To simplify the calculation,
assume realizability, i.e. there is an f* = f_{r*} which perfectly classifies the data. To
find f_ERM, the algorithm selects the maximal threshold r such that no sample point
x < r is labelled 1; clearly r ≥ r*. Let f̂ = f_{r̂} be the function chosen by our
algorithm. Consider the interval [r*, r̂] where f* and f̂ may disagree, i.e. f̂ assigns 0
to an x in this interval while f* assigns 1. The error of f̂ exceeds ε only if the interval
of probability mass ε to the right of r* contains no training point. Then,

P(R_P(f̂) > ε) ≤ P(∀(x_i, y_i) ∈ S : x_i ∉ [r*, r̂]) ≤ Π_{i=1}^{|S|} (1 − ε) = (1 − ε)^{|S|}.

Using 1 + x ≤ e^x we get

P(R_P(f̂) > ε) ≤ e^{−ε|S|}.

If we choose |S| ≥ m(ε, δ) = (1/ε) log(1/δ), the probability that the error of f̂ is larger
than ε can be made smaller than δ, i.e. the class is learnable. The infinite set F+ is
described by a single parameter. We may guess that if an infinite set F can be described
by a finite number of parameters, then the set is statistically learnable. This is true for
the above example in higher dimensions. But it fails to be true in general.
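The ERM rule for F+ is simply "take the smallest positively labelled point as threshold". A simulation sketch (the true threshold 0.6 and the uniform sampling are our illustrative assumptions):

```python
import random

random.seed(0)
r_true = 0.6                                 # unknown true threshold, f* = f_{r_true}
xs = [random.random() for _ in range(200)]   # 200 uniform training points
ys = [1 if x >= r_true else 0 for x in xs]

# ERM for half-lines: the smallest point labelled 1 is a consistent threshold
r_hat = min(x for x, y in zip(xs, ys) if y == 1)

# under the uniform distribution, the true risk of f_hat is the mass of [r_true, r_hat]
err = r_hat - r_true
```

With 200 points the disagreement region [r*, r̂] is tiny, in line with the sample complexity m(ε, δ) = (1/ε) log(1/δ).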

The key step in determining which infinite sets F can be learned is based on the
ghost sample trick of Vapnik and Chervonenkis: it reduces an infinite to a finite problem
where the union bound trick can be applied and where the factor m of the finite theory
is replaced by a capacity measure which can be computed for infinite sets.

Let x_1, ..., x_m be data points and Z_m the sample of the m points (x_i, y_i). Set
|F_{Z_m}| equal to the cardinality of F restricted to Z_m. Although F is infinite,
|F_{Z_m}| is finite. The shattering coefficient S(F, m) of F is defined by:

S(F, m) = max{|F_{Z_m}| : x_1, ..., x_m ∈ X}.

In other words, a set of m instances X_m from input space X is shattered by a function
class F if all possible 2^m labellings can be generated using functions from F. If we
consider three points in the plane, i.e. m = 3, there are 2^3 = 8 labellings. Using

hyperplanes as the only functions in F, the points are shattered by the hyperplanes, see
Figure 6.9.

Figure 6.9: Computing the VC dimension of hyperplanes, i.e. F = hyperplanes, in
dimension 2 for three points. For any of the eight possible labellings of these points, we
can find a linear classifier that obtains zero training error on them. But there is no set
of four points that this hypothesis class can shatter: a set of 3 points can be shattered,
but no set of 4 points.

The number S(F, m) ≤ 2^m is independent of the dimension of the set F. The
growth function S(F, m) is the maximum number of ways in which m points can be
classified by F. Consider the case where F consists of all possible functions. This class
is so rich that it can classify each sample in every possible way. That means
S(F, 2m) = 2^{2m}. To prove convergence of ERM, we have to consider from (6.7):

S(F, 2m) e^{−mε²/4} = 2^{2m} e^{−mε²/4} = e^{2m log 2} e^{−mε²/4} = e^{m(2 log 2 − ε²/4)}.

If 2 log 2 − ε²/4 > 0, i.e. ε² < 8 log 2, the expression diverges for m to infinity. That
is, consistency of ERM for the unrestricted set of all functions F_all does not follow.
Since the bound for consistency is only an upper bound, i.e. a sufficient condition, we
cannot directly conclude that ERM is inconsistent if we use all functions. The condition

log S(F, m)/m → 0

is necessary and sufficient for ERM to be consistent. Using this on the unrestricted set
of functions gives

log S(F_all, 2m)/(2m) = log(2^{2m})/(2m) = log 2 > 0.

Therefore, finally:

Theorem 84. ERM is not consistent on F_all.

But if the shattering coefficient grows polynomially, say S(F, 2m) ≤ (2m)^k, then ERM
is consistent. The following theorem summarizes the above discussion, i.e. that the
growth function is the right quantity to generalize the finite theory.

Proposition 85 (Vapnik and Chervonenkis). For any δ > 0, with probability 1 − δ any
function f ∈ F satisfies

R(f) ≤ R_emp(f) + sqrt((4/m)(2 log S(F, 2m) − log δ)). (6.17)
The shattering coefficient which we used so far has the drawback that it is difficult
to calculate. It turns out that a different capacity figure, the VC dimension, is better
suited. To define this number: a sample Z_m of size m is shattered by the class F if the
function class can realize any labelling on the given sample, i.e. |F_{Z_m}| = 2^m. The
VC-dimension of F is defined as the largest number m such that there exists a sample
of size m which is shattered by F. If the VC dimension of F is finite, then F is learnable:

Theorem 86. F is PAC learnable if and only if the VC dimension of F is finite. Then,
the sample complexity m_F(ε, δ) grows at the same rate as

(VC-dim(F)/ε) log(1/δ).

For agnostic learning, the first ε is replaced by its square.

The VC-dimension measures the ability of a set of functions to fit available finite data.
A set of functions has VC-dimension h if there exist h samples that can be shattered by
this set of functions, but there do not exist h + 1 samples that can be shattered. If one
considers the half-spaces in R^d, then the VC dimension is d + 1, see Figure 6.9 for the
case d = 2: there exist three points that can be shattered but four points cannot be
shattered. If e_n(y), n = 1, ..., m, is a set of m linearly independent functions, then the
function

f(y, θ) = χ_{Σ_n θ_n e_n(y) + a > 0}

is equivalent to linear functions in m-dimensional space; the VC dimension is m + 1. If
arbitrarily large samples can be shattered, the VC dimension is set equal to infinity.
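For the half-line class F+ from the earlier example the shattering coefficient can be enumerated directly, confirming S(F+, m) = m + 1 << 2^m; a sketch (function name ours):

```python
def shattering_coefficient_halflines(points):
    """Count the distinct labellings of the points realizable by f_r(x) = 1_{x >= r}."""
    pts = sorted(set(points))
    # one threshold below all points, one between each consecutive pair, one above all
    thresholds = ([pts[0] - 1.0]
                  + [(a + b) / 2 for a, b in zip(pts, pts[1:])]
                  + [pts[-1] + 1.0])
    labelings = {tuple(1 if x >= r else 0 for x in pts) for r in thresholds}
    return len(labelings)
```

Only "suffix" labellings are realizable, so the growth is linear in m rather than exponential, consistent with the VC dimension of F+ being 1.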
This concludes our presentation of machine learning theory. Summarizing, PAC
learnability as discussed so far allows the sample sizes to depend on the accuracy and
confidence parameters, but they are uniform with respect to the labelling rule and the
underlying data distribution. Classes that are learnable must have a finite VC-dimension.
We refer to the literature for weaker notions of learnability.

6.2.5 Linear Threshold Model

Linear predictors are one of the most useful and most used families of hypothesis classes
F. They are intuitive, easy to interpret, and they fit the data reasonably well in many
problems. To define them, we start with the class of affine functions of dimension d,

A_d := {x ↦ x_0 + ⟨θ, x⟩ : θ ∈ R^d, x_0 ∈ R},

that is, each function is parametrized by θ and x_0. The different classifiers (hypothesis
classes) of linear predictors are compositions g ∘ A_d, i.e. maps from X = R^d → R → Y.
In binary classification, the case under consideration here, g is chosen to be the sign
function, and in a regression g is the identity function.

The next theorem summarizes learnability:

Theorem 87. If x_0 = 0, then the VC dimension of the class of halfspaces F in R^d is
d. If x_0 ≠ 0, the VC dimension is d + 1.

In the case of binary classification, F is set equal to the class of halfspaces:

F = sign ∘ A_d = {x ↦ f(x, θ) = sign(⟨θ, x⟩)},

where (setting x_0 = 0)

f(x, θ) = sign(⟨θ, x⟩) = +1 if ⟨θ, x⟩ ≥ 0, −1 if ⟨θ, x⟩ < 0,  θ ∈ R^d.

Each classifier forms a hyperplane that is perpendicular to the vector θ and passes
through the origin. A label transition corresponds to crossing the plane ⟨θ, x⟩ = 0. The
vector θ is orthogonal to the plane and points in the direction where ⟨θ, x⟩ increases
most, see Figure 6.10. The sign does not change if x is rescaled by a positive factor: the
linear classifier cares only about the side of the boundary, not the distance to it.

Assuming n points in S and realizability, the ERM classifier for half-spaces is expected
to make zero error on the training set. The ERM can be implemented by running the
perceptron algorithm on half-spaces. The idea is to adjust the parameters θ incrementally
to reduce the classifier training error step by step. By the perceptron update rule, on an
image-by-image basis in the training set, the adjustment after k steps for image m reads:

θ^{(k+1)} = θ^{(k)} + y_m x_m

if a mistake is realized (y_m ⟨θ^{(k)}, x_m⟩ < 0). Then

y_m ⟨θ^{(k+1)}, x_m⟩ = y_m ⟨θ^{(k)} + y_m x_m, x_m⟩ = y_m ⟨θ^{(k)}, x_m⟩ + y_m² ⟨x_m, x_m⟩,

i.e.

y_m ⟨θ^{(k+1)}, x_m⟩ = y_m ⟨θ^{(k)}, x_m⟩ + ||x_m||² ≥ y_m ⟨θ^{(k)}, x_m⟩.
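The update rule above can be sketched in a few lines (the toy data set and the convention that a score of zero counts as a mistake are our own illustrative choices):

```python
def perceptron(data, max_epochs=100):
    """data: list of (x, y) with x a tuple of floats and y in {+1, -1}."""
    theta = [0.0] * len(data[0][0])
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in data:
            # treat a non-positive score as a mistake so training starts from theta = 0
            if y * sum(t * xi for t, xi in zip(theta, x)) <= 0:
                theta = [t + y * xi for t, xi in zip(theta, x)]  # theta <- theta + y x
                mistakes += 1
        if mistakes == 0:      # converged: all training points classified correctly
            break
    return theta

data = [((1.0, 1.0), 1), ((2.0, 0.5), 1), ((-1.0, -1.0), -1), ((-0.5, -2.0), -1)]
theta = perceptron(data)
```

On this linearly separable toy set the algorithm converges after a single mistaken update, and all four points end up on the correct side of the learned boundary.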

Figure 6.10: Geometry of the classifier problem. Left: a linear classifier without offset
parameter; the decision boundary ⟨θ, x⟩ = 0 separates the +1 labelled images
(⟨θ, x⟩ > 0) from the −1 labelled images (⟨θ, x⟩ < 0), with θ orthogonal to the
boundary. Right: the maximum margin linear classifier with geometric margin γ_geom.

Therefore, given a mistake, the updated value becomes more positive, and for a fixed
image, after a certain number of updates the value becomes positive and the image is
classified correctly. Then the next image is considered. Updating the parameters again
leads to a correctly classified new image. But will these updates leave the former updates
stable? This is the convergence question of the algorithm.

Proposition 88. Assume that for all training images m there exist a θ* and a constant
γ > 0 such that y_m ⟨θ*, x_m⟩ ≥ γ, and that all training images have bounded norm
||x_m|| ≤ r. Then the perceptron algorithm converges in a finite number of steps k with

k ≤ r² ||θ*||² / γ². (6.18)

The number γ is called the margin, a name whose meaning will become clear below.
θ* is the parameter of the separating plane ⟨θ*, x⟩ = 0. Therefore, the assumption
y_m ⟨θ*, x_m⟩ ≥ γ > 0 means that there exists a linear classifier in our class with finite
parameter values that correctly classifies all training images. The quantity γ/||θ*|| is
the smallest distance in the image space from any image to the decision boundary
specified by θ*. It measures how well the two classes of images are separated by a linear
boundary. This is the geometric margin γ_geom, and its inverse is a measure of how
difficult the problem is: the smaller the geometric margin, the more difficult the
problem, see Figure 6.10.

The bound in the theorem can be rewritten as

k ≤ r² / γ_geom².

Remarkably, the bound depends neither directly on the dimension of the images (pixels)
nor on the number of training images. Nevertheless, the bound turns out to be related
to the measure of complexity of the problem of learning linear classifiers - the
VC-dimension.

How well does the perceptron classify images which are not in the training set? If
the two assumptions of the theorem also hold true for new images, then after at most
k ≤ r²/γ_geom² mistakes in classifying the new images⁷, all further images will be
classified correctly. In this sense, the above result generalizes to the new images.

We assumed that there exists a linear classifier that has a large geometric margin. Is
it possible to find such a large margin classifier directly? The next section provides the
answer.

6.2.6 Support Vector Machines (SVM)

We show how the optimal margin classifier can be found directly. The classifier is called
the Support Vector Machine (SVM). Intuitively, see Figure 6.11, where the optimal and
a non-optimal classifier are shown, one could find the maximum margin linear classifier
by first identifying any classifier that correctly classifies all examples and then increasing
the geometric margin towards its maximum. The SVM finds the separating hyperplane
with the largest margin directly. This hyperplane minimizes the upper bound on the
classification error, i.e. it minimizes overfitting.

This defines an optimization problem: maximize the geometric margin under the
constraint that the classifier is correct on all training examples:

max_θ γ²/||θ||²,  y_k ⟨θ, x_k⟩ ≥ γ, ∀k. (6.19)

This problem is recast in a more suitable form. Replacing max by min provides a
quadratic objective function; inserting the usual factor 1/2 and, since the result depends
only on the ratio θ/γ, setting without loss of generality γ = 1, the problem reads

min_θ (1/2)||θ||²,  y_k ⟨θ, x_k⟩ ≥ 1, ∀k. (6.20)

This defines a quadratic optimization problem which can be generalized. Using the
Lagrangian, the Kuhn-Tucker conditions are necessary and also sufficient for this convex

7 The algorithm does not know when it makes a mistake. This detection has to be added to the model.

Figure 6.11: The optimal SVM hyperplane is shown together with a non-optimal
hyperplane whose margin is smaller than in the optimal case. The two data points
which lie on the two margin hyperplanes belonging to the SVM-optimal one are marked
with a square.

problem for an optimum. If α_k is the Lagrange multiplier associated with constraint k
in the optimization problem, the complementarity condition of Kuhn-Tucker,

α_k (y_k ⟨θ, x_k⟩ − 1) = 0,

implies that only data points which are elements of the two hyperplanes in Figure 6.11
marked with the square can have α_k > 0, since they are the only points where the
constraint holds with equality. These data points are called support vectors. For all
other data points the α_k are zero. In the pixel example, the solution depends only on
the subset of images which are exactly on the margin. The remaining images do not
matter. Hence, the support vectors are sufficient to define the training set.

The many conditions in the Kuhn-Tucker system which are due to the inequality
constraints make it difficult to solve the problem. Therefore, one transforms the
optimization problem from the above formulation (the primal model) to its dual form,
which is easier to solve. To that end, one solves in the primal model the equation
∂L/∂θ = 0, with L the Lagrangian, and substitutes this solution back into the
Lagrangian. This yields the dual Lagrangian L_D, which depends only on α, y and x_k.
From a statistical learning theory perspective, maximizing the margin means minimizing
the VC dimension of the support vector machine. Support vector machines minimize
both empirical risk and the confidence interval.

So far we did not consider the typical situation where images are difficult to classify
because of labelling errors, i.e. a few images pop up in the wrong half-plane in the
optimal solution. We alter the SVM optimization problem to account for these types of
errors in the maximum margin linear classifier. The simplest form is to introduce 'slack'
variables. Slackness means that we measure the degree to which each margin constraint
is violated and associate a cost for the violation in the objective function. The problem
then reads:

min_θ (1/2)||θ||² + c Σ_{k=1}^n ξ_k,  y_k ⟨θ, x_k⟩ ≥ 1 − ξ_k, ξ_k ≥ 0, ∀k, (6.21)

where the ξ_k are the slack variables. If we have to set ξ_k > 0, then the margin
constraint is violated (possible misclassification) and the penalty costs occur. Increasing
the constant c, i.e. increasing the penalty costs, leads to ξ_k = 0 for all k; we are back
in the original problem. For small c many margin constraints can be violated. It is
reasonable to ask whether this is indeed the trade-off we want.
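The soft-margin problem can be minimized by subgradient descent on the equivalent hinge-loss form ½||θ||² + c Σ_k max(0, 1 − y_k⟨θ, x_k⟩); a sketch with our own toy data and step size (not from the text):

```python
def svm_subgradient(data, c=10.0, lr=0.01, epochs=500):
    """Full-batch subgradient descent on 0.5||theta||^2 + c * sum_k hinge_k."""
    d = len(data[0][0])
    theta = [0.0] * d
    for _ in range(epochs):
        grad = list(theta)                        # gradient of the 0.5||theta||^2 term
        for x, y in data:
            if y * sum(t * xi for t, xi in zip(theta, x)) < 1:
                for i in range(d):                # subgradient of an active hinge term
                    grad[i] -= c * y * x[i]
        theta = [t - lr * g for t, g in zip(theta, grad)]
    return theta

data = [((1.0, 1.0), 1), ((2.0, 0.5), 1), ((-1.0, -1.0), -1), ((-0.5, -2.0), -1)]
theta = svm_subgradient(data)
margins = [y * sum(t * xi for t, xi in zip(theta, x)) for x, y in data]
```

On this separable toy set the iterate stays close to the hard-margin optimum, so all functional margins y_k⟨θ, x_k⟩ end up near or above 1; the two points attaining the smallest margin play the role of the support vectors.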

So far we assumed that data sets can be separated linearly by a hyperplane. But
often a non-linear boundary is needed instead. We refer to the literature for the powerful
methods which rely on the clever idea to transform the non-linear problem into a linear
one by mapping the data into a higher-dimensional space and then using the above
linear theory in that space (kernel methods).

6.2.7 Tree Based Learning

This section is based on material from AnalyticsVidhya. Tree based (TB) learning
algorithms are supervised learning methods. They are very popular since they can be of
high accuracy and stability and are easy to interpret. TB algorithms can be used both
for classification and regression. Unlike linear models such as linear regressions, they
can account for non-linear relationships. Examples of TB methods are random forests,
decision trees and boosted trees.

As an example, assume that we have N = 30 stocks, with each stock described by
three input variables: creditworthiness score (high, low), sector (energy, transportation)
and sustainability score (high, low). 15 of the stocks performed well in the last
investment period. The goal is to find a model to predict which stocks will perform well
in the next period based on the input variables. Decision trees identify the variables
which create the most homogeneous sets of stocks. How are these variables and the
splittings identified? While the variables in the example are all categorical (of the
yes-or-no type), the variables can also take continuous values. Figure 6.12 explains the
main terminology for decision trees. The only missing term is pruning, i.e. removing
sub-nodes of a decision node, which is the opposite operation of splitting.

Figure 6.12: Terminology for decision trees. The root node splits into decision nodes
and terminal nodes (leaves); a decision node A is the parent node of its child nodes B
and C; the connections are called branches, a decision node with all its children forms a
sub-tree, and the operation creating child nodes is called splitting.

It is common to speak about regression trees if the dependent variable is continuous;
else the expression classification tree is used.

Definition 89. A classification tree is a decision tree in which each node has a binary
decision based on X_i < a or not, for a fixed value a ∈ R.

The root node contains all data (X_i, Y_i). For both models, the prediction space is
cut into disjoint subsets. Splitting from the top down cuts the prediction space into new
branches until a user-defined criterion terminates the splitting process. If there are too
many splits, overfitting follows: performance will be poor when applied to new data.
Pruning is the counter-measure to reduce overfitting.

How does the tree decide when to make the next split? Many different algorithms
are used. The general goal is that at each node, the feature X_i and the threshold a are
chosen to minimize the resulting diversity in the children nodes. Consider the splits
w.r.t. Creditworthiness and Sector, respectively. The Gini method first calculates the
purity score p² + (1 − p)² for each sub-node, with p the fraction of well-performing
stocks in the node, and then combines the sub-node scores, weighted by node size, into
the Gini score of the split.

The weighted Gini score for the split on Creditworthiness is

10/30 ((5/10)² + (5/10)²) + 20/30 ((10/20)² + (10/20)²) = 0.5.

For the split on Sector, the score is

12/30 ((7/12)² + (5/12)²) + 18/30 ((8/18)² + (10/18)²) = 0.51.

Figure 6.13: Split on Creditworthiness and split on Sector for the N = 30 assets.
Creditworthiness: Low node with 10 assets, of which 5 performed well last period; High
node with 20 assets, of which 10 performed well. Sector: Energy node with 12 assets, of
which 7 performed well; Transportation node with 18 assets, of which 8 performed well.

The Gini score for the split on Sector is higher than the other one; the node split will
be on Sector. Intuitively, the Sector split is purer than the other split, where the
prediction is close to a random coin toss. From an information perspective, the purer a
node is, the less information is needed to describe it. Hence, entropy is another quantity
used to calculate a split. A method for continuous variables is the reduction-of-variance
calculation, which uses the above two-step procedure of calculating the variance for
each node and then taking the weighted average as the split value.
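The two-step calculation above can be coded directly; a sketch (function name ours) of the weighted purity score p² + (1 − p)² applied to the two candidate splits:

```python
def weighted_gini(splits):
    """splits: list of (node_size, well_performing_in_node); returns the
    node-size-weighted sum of the purity scores p^2 + (1 - p)^2."""
    total = sum(size for size, _ in splits)
    score = 0.0
    for size, positives in splits:
        p = positives / size
        score += (size / total) * (p ** 2 + (1 - p) ** 2)
    return score

gini_credit = weighted_gini([(10, 5), (20, 10)])   # split on Creditworthiness
gini_sector = weighted_gini([(12, 7), (18, 8)])    # split on Sector
```

The Sector split scores higher (about 0.51 versus 0.5), so it is the one chosen.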

Controlling overfitting, that is, avoiding the extreme of 100% accuracy on the training
set by making one leaf for each observation, is achieved either by setting constraints or
by pruning. Constraints can be set on the parameters of the tree: one can fix the
minimum number of observations in a node for a split, define the minimum number of
samples for a terminal node, the maximum depth of the tree, the maximum number of
terminal nodes or the maximum number of features to consider for a split. These
restrictions prevent the model from learning relations which are specific to a node but
do not generalize.

Splitting is a myopic approach: the algorithm checks locally whether a split should
happen but does not take a global view. The algorithm only stops if it reaches a
constraint value. In this sense, the algorithms are greedy: myopic decision makers which
do not take into account any future decisions. In analogy to investment, they act as
one-period optimizers in a multi-period context. Pruning is the choice which considers
effects a few steps ahead. The implementation follows the usual backward induction
logic: first, grow the decision tree to a large depth, and then work backwards by
removing all nodes which imply negative returns.

Comparing tree based models with linear logistic regression for classification and with
linear regressions: the classic models are appropriate if the relationship between the
dependent and independent variables is indeed linear. But if there are non-linearities
between the variables, then only tree based models can account for them. Furthermore,
tree based models are often simpler to explain than their linear counterparts.

Often not a single model is used but an ensemble of models, to achieve better accuracy
and model stability. Like any ML model, tree based models suffer from the bias-variance
tradeoff. Small trees, for example, lead to low variance and high bias. Increasing the
complexity of the model first reduces the prediction error due to lower bias. But at some
point, high complexity starts to overfit the model, that is, variance increases. Ensemble
models are a method to manage the bias-variance trade-off. Ensemble methods include
bagging, boosting and stacking approaches. Boosting combines many 'weak' or high
bias models in an ensemble that has lower bias than the individual models, while
bagging combines 'strong' learners in a way that reduces their variance.

Bagging reduces the variance of predictions by combining the results of multiple
classifiers modelled on different sub-samples of the same data set. Starting with a
training set X of size N, bagging generates M new training sets X'_i, each of size N, by
sampling from X uniformly and with replacement. The M models are fitted using the
above M samples and combined by averaging the output for regression or voting for
classification. This averaging procedure stabilizes the single algorithms.
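A minimal bagging sketch on one-dimensional data with threshold stumps as base learners (the data set, seed and stump fitting are our illustrative choices):

```python
import random

def bootstrap_samples(data, n_models, seed=0):
    """Draw n_models bootstrap samples of the same size as data, with replacement."""
    rng = random.Random(seed)
    return [[rng.choice(data) for _ in data] for _ in range(n_models)]

def fit_stump(sample):
    """Best threshold r among sample points for the rule 'predict 1 iff x >= r'."""
    best_r, best_err = None, None
    for r, _ in sample:
        err = sum((1 if x >= r else 0) != y for x, y in sample)
        if best_err is None or err < best_err:
            best_r, best_err = r, err
    return best_r

data = [(x / 20, 1 if x / 20 >= 0.55 else 0) for x in range(20)]
thresholds = [fit_stump(s) for s in bootstrap_samples(data, 11)]

def bagged_predict(x):
    votes = [1 if x >= r else 0 for r in thresholds]
    return max(set(votes), key=votes.count)       # majority vote over the ensemble
```

Each stump is fitted on a different bootstrap sample, and the ensemble prediction is the majority vote, which is more stable than any single stump.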

6.2.8 Naive Bayes Classifier

Consider a training set X, where each training instance x is an n-dimensional attribute
vector x = (x_1, x_2, ..., x_n), and let C be a set of classes c_1, ..., c_m. In which class
should a new instance x̃ be classified? That is, we want to find the most probable class

c* = arg max_{c ∈ C} P(c|x̃).

Using Bayes' theorem,

c* = arg max_{c ∈ C} P(x̃|c)P(c)/P(x̃),

and since P(x̃) is the same for all classes,

c* = arg max_{c ∈ C} P(x̃|c)P(c). (6.22)

The Naive Bayes classifier assumes that the attributes are conditionally independent
given the classification:

P(x̃|c) = Π_{i=1}^n P(x̃_i|c). (6.23)

Definition 90. The Naive Bayes classifier finds the most probable class for x̃:

c* = arg max_{c ∈ C} P(c) Π_{i=1}^n P(x̃_i|c). (6.24)

We apply this to the following asset management example. Consider 18 investors of
an AM firm, which have five attributes: Risk Profile, Experience, Home Bias, Liquidity
and Buy Portfolio, see Table 6.2.8.

ID Risk Profile Experience Home Bias Liquidity Buy Portfolio


1 Low High No Fair No
2 Low High No Excellent No
3 Medium High No Fair Yes
4 High Medium No Fair Yes
5 High Low Yes Fair Yes
6 High Low Yes Excellent No
7 Medium Low Yes Excellent Yes
8 Low Medium No Fair No
9 Low Low Yes Fair Yes
10 High Medium Yes Fair Yes
11 Low Medium Yes Excellent Yes
12 Medium Medium No Excellent Yes
13 Medium High Yes Fair Yes
14 High Medium No Excellent No
15 Low Low Yes Fair Yes
16 High Medium Yes Fair Yes
17 Low Medium Yes Excellent Yes
18 Medium Medium No Excellent Yes

The prediction ŷ is whether a new client with given features will buy the offered portfolio solution or not. More specifically, we consider a client with features x̃ = (Risk Profile = Low, Experience = Medium, Home Bias = Yes, Liquidity = Fair) and ask whether she will buy the offered portfolio product. Two classes are of interest: c1, which means that an investor will buy the portfolio, and c2, that she will not do so. To decide this question, we use the Naive Bayes Classifier formula and start with the priors P(c1) = 13/18, P(c2) = 5/18. Table 6.2.8 summarizes the necessary conditional probabilities.

P(RP=Low|c1) = 4/13    P(RP=Low|c2) = 3/5
P(EXP=Med|c1) = 7/13   P(EXP=Med|c2) = 2/5
P(BIAS=Yes|c1) = 9/13  P(BIAS=Yes|c2) = 1/5
P(LIQ=Fair|c1) = 8/13  P(LIQ=Fair|c2) = 2/5

Multiplying the conditional probabilities implies

P(x̃|c1) = 0.07, P(x̃|c2) = 0.019
406 CHAPTER 6. ASSET MANAGEMENT INNOVATION

and calculating finally

P(c1)P(x̃|c1) = 0.05, P(c2)P(x̃|c2) = 0.005,

the first score is larger, hence the classifier predicts that the individual x̃ will buy the portfolio product.
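The example can be reproduced with a few lines of Python. The code below is a direct transcription of Table 6.2.8 and of equation (6.24); no smoothing is applied:

```python
from collections import Counter

# Training data from Table 6.2.8: (risk, experience, home_bias, liquidity, buy)
data = [
    ("Low","High","No","Fair","No"), ("Low","High","No","Excellent","No"),
    ("Medium","High","No","Fair","Yes"), ("High","Medium","No","Fair","Yes"),
    ("High","Low","Yes","Fair","Yes"), ("High","Low","Yes","Excellent","No"),
    ("Medium","Low","Yes","Excellent","Yes"), ("Low","Medium","No","Fair","No"),
    ("Low","Low","Yes","Fair","Yes"), ("High","Medium","Yes","Fair","Yes"),
    ("Low","Medium","Yes","Excellent","Yes"), ("Medium","Medium","No","Excellent","Yes"),
    ("Medium","High","Yes","Fair","Yes"), ("High","Medium","No","Excellent","No"),
    ("Low","Low","Yes","Fair","Yes"), ("High","Medium","Yes","Fair","Yes"),
    ("Low","Medium","Yes","Excellent","Yes"), ("Medium","Medium","No","Excellent","Yes"),
]

def naive_bayes(x_new):
    """Return the class scores P(c) * prod_i P(x_i | c) for c in {Yes, No}."""
    classes = Counter(row[-1] for row in data)
    n = len(data)
    scores = {}
    for c, n_c in classes.items():
        score = n_c / n                        # prior P(c)
        for i, value in enumerate(x_new):      # conditional P(x_i | c)
            n_match = sum(1 for row in data if row[-1] == c and row[i] == value)
            score *= n_match / n_c
        scores[c] = score
    return scores

scores = naive_bayes(("Low", "Medium", "Yes", "Fair"))
print(scores)  # Yes-score ≈ 0.051, No-score ≈ 0.0053
```

The scores match the hand calculation above, and the larger Yes-score yields the prediction that the client buys the product.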

6.2.9 Nearest Neighbour Analytics


Consider clients where each instance x is a vector with n attributes. We want to identify clients which are near to and distant from each other in n-dimensional space. If clients cluster, similar demand for goods and services can be inferred, which allows for up-selling sales activities. Figure 6.14 plots a set of customers scattered according to their risk aversion index and their income. A third feature describes whether the client bought a specific product (green) or refused to do so (blue).

Figure 6.14: Nearest neighbours.

Consider a new client x̃ (red dot). If we consider the first nearest neighbour only, the client is assigned to the class which does not buy the product. Considering the 3 nearest neighbours, the majority class among them determines the assignment; with the 5 nearest neighbours the assignment may change again, and so on. The reason to consider only an odd number of neighbours is to avoid ambiguous assignments. We denote by N_k(x̃) the neighbourhood of x̃ of the k closest instances given a distance metric d. Which metric d should one choose? If the inputs x are real numbers, the Euclidean distance is a possible metric.8

8 The Manhattan distance is given by d(x, y) = Σ_i |x_i − y_i|, the p-norms are used, or the Chebyshev distance, which is given by d(x, y) = max_i |x_i − y_i|.

If inputs are binary valued, the Hamming distance is used.

If we use the Euclidean distance, the distance in income and the distance in risk aversion are of different magnitudes. Therefore, the attributes are normalized to take values in [0, 1]; furthermore, if attributes have different weights, the terms in the Euclidean norm are weighted accordingly. Typically, the weighting function v(x, y) is chosen inversely proportional to d(x, y): the closer two instances x, y are, the more weight is attributed. The classification task is to find the class c_j ∈ C that maximizes the weighted vote in the neighbourhood, i.e.

c(x̃) = arg max_{c_j∈C} Σ_{x∈N_k(x̃)} v(x, x̃) δ(c_j, c(x)), (6.25)

where δ(c_j, c(x)) equals 1 if c(x) = c_j and 0 otherwise.
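A minimal sketch of the k-nearest-neighbour vote with min-max normalization. The client data are hypothetical, and for simplicity the unweighted vote v ≡ 1 is used:

```python
def knn_classify(train, x_new, k=3):
    """train: list of (feature_vector, label). Majority vote among the k nearest
    neighbours under the Euclidean distance on min-max normalized features."""
    dims = len(x_new)
    points = [p for p, _ in train] + [x_new]
    lo = [min(p[d] for p in points) for d in range(dims)]
    hi = [max(p[d] for p in points) for d in range(dims)]

    def norm(p):  # map each attribute to [0, 1], as described in the text
        return [(p[d] - lo[d]) / ((hi[d] - lo[d]) or 1.0) for d in range(dims)]

    z_new = norm(x_new)

    def dist(p):  # Euclidean distance on the normalized attributes
        z = norm(p)
        return sum((a - b) ** 2 for a, b in zip(z, z_new)) ** 0.5

    nearest = sorted(train, key=lambda t: dist(t[0]))[:k]
    labels = [lab for _, lab in nearest]
    return max(set(labels), key=labels.count)   # unweighted majority vote

# Hypothetical clients: (risk aversion index, income) -> bought the product or not
clients = [((0.2, 30000), "no"), ((0.3, 35000), "no"),
           ((0.8, 90000), "buy"), ((0.7, 80000), "buy"), ((0.9, 95000), "buy")]
print(knn_classify(clients, (0.75, 85000), k=3))  # → buy
```

Without the normalization, the income coordinate would dominate the distance completely, which is exactly the scale problem discussed above.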

6.2.10 'Sentimental Risk'


The goal is to integrate additional, forward-looking signals, based on the computer-assisted analysis of a variety of corporate and financial news, into a standard factor risk model.

Consider the factor returns

R_k^F(t) = Σ_m A_{k,m} ε_m, R^F ∈ R^M,

with AA⊤ = C, where C is the covariance matrix of the factor returns and ε is IID normally distributed. Given the factors, the linear pure time series risk model reads

R_n(t) = β_n' R^F + σ̃_n ε̃_n

with σ̃_n the idiosyncratic volatilities and ε̃ IID normally distributed. Data analytics providers deliver news statistics for many financial instruments such as stocks, stock indices, commodities and many more, for example the number of positive and the number of negative messages per time unit, or a signal which summarizes all sentiments of an asset at a given date. There are many different ways in which such sentiment signals can be integrated into the pure time series risk model.

One way is to assume that the signals impact the time series volatility of the factors. This implies a new adjusted volatility σ_a which captures the additional information in the sentiment signals. The simplest relationship between the original and adjusted volatilities is a linear regression

σ_a(t) = a_0 + a_1 Ξ(t) σ(t)

with Ξ the signal following the sentiment analytics. If there exists an adjusted volatility for each factor, the factor θ_k = σ_{a,k}/σ_k is integrated into the risk model as follows. Setting

A^θ_{ij} = θ_i A_{ij}

we get

σ²_{a,k} = θ_k² σ_k².

Therefore, the modified model X = A^θ ε changes the volatilities as desired without changing the correlation structure.
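The row-scaling argument can be checked numerically. The sketch below assumes an illustrative two-factor covariance matrix and sentiment factors θ; it verifies that scaling the rows of the Cholesky factor A (with AA⊤ = C) rescales the volatilities while leaving the correlations unchanged:

```python
import numpy as np

# Illustrative two-factor covariance matrix C and Cholesky factor A with A A^T = C
C = np.array([[0.04, 0.012],
              [0.012, 0.09]])
A = np.linalg.cholesky(C)

# Sentiment adjustment factors theta_k = sigma_{a,k} / sigma_k (invented values:
# news raises the vol of factor 1 and lowers the vol of factor 2)
theta = np.array([1.5, 0.8])
A_theta = theta[:, None] * A          # scale row i of A by theta_i
C_adj = A_theta @ A_theta.T           # covariance of the modified model X = A^theta eps

vol = np.sqrt(np.diag(C))
vol_adj = np.sqrt(np.diag(C_adj))
corr = C / np.outer(vol, vol)
corr_adj = C_adj / np.outer(vol_adj, vol_adj)
```

The adjusted volatilities equal θ_k σ_k, and the correlation matrix is unchanged, exactly as claimed.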

6.2.11 Customer Retention: Text Mining


Consider an AM firm which receives information from its customers. Some of this information consists of Complaints, which we distinguish from all other information, called General Contacts. The task is to automatically classify any client feedback as a Complaint or a General Contact. This is a text mining exercise. For texts, data preprocessing is the first main activity, and it has several steps: the written texts have to be transformed so that a machine can read them. Figure 6.15 illustrates the big data process for the text mining task and the data preprocessing part. Some steps of preprocessing are:

• Tokenization: remove all punctuation marks, special characters, numbers, etc.

• Lemmatization: morphological replacement such as 'was' → 'be', 'angry' → 'anger'.

• Synonyms: reduce the word set by replacing words with a common synonym, such as ('annoyance', 'displeasure', 'hostility') → 'anger'.

• Stemming: retrieve the basic form of words, i.e. remove 'ly', 'ed', 'ing' at the end of a word.

• Filtering: remove stop words such as 'and', 'be', 'or'.
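A toy version of this pipeline can be written in a few lines of Python. The word lists below are tiny illustrative stand-ins for real lemmatizers, synonym dictionaries and stop-word lists:

```python
import re

STOP_WORDS = {"and", "be", "or", "the", "to", "i", "a", "is"}
SYNONYMS = {"annoyance": "anger", "displeasure": "anger", "hostility": "anger"}
LEMMAS = {"was": "be", "angry": "anger"}
SUFFIXES = ("ing", "ed", "ly")

def preprocess(text):
    text = text.lower()
    tokens = re.findall(r"[a-z]+", text)          # tokenization: drop punctuation, numbers
    out = []
    for tok in tokens:
        tok = LEMMAS.get(tok, tok)                # lemmatization
        tok = SYNONYMS.get(tok, tok)              # synonym replacement
        for suf in SUFFIXES:                      # naive suffix stemming
            if tok.endswith(suf) and len(tok) > len(suf) + 2:
                tok = tok[: -len(suf)]
                break
        if tok not in STOP_WORDS:                 # stop-word filtering
            out.append(tok)
    return out

print(preprocess("I was angry: the billing displeasure!"))  # → ['anger', 'bill', 'anger']
```

Real systems use full morphological dictionaries and language detection, but the order of the steps is the one shown in Figure 6.15.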

An original complaint text

To whom it may concern: Can ANYONE help? I have fulfilled all requirements re Claim # XXXX dd XXXX Green Tree Loan Servicing, & I am being told that I MUST wait until further investigations are complete before the check which was issued by XXXX XXXX can be released to me for completion of repairs to the home I live in. I have an XXXX 15 year old daughter who is suffering in this HEAT WAVE, along with XXXX cats, a bird, my wife and myself. No hotel will take us. My next step will be an attorney & the Media. XXXX XXXX

reads after the pre-processing (XXXX hides confidential information):

may concern can anyone help fulfilled requirements re claim dd green tree loan servicing told must wait investigations complete check issued can released completion repairs home live i year old daughter suffering heat wave along cats bird wife no hotel take us my next step attorney media

The next step is the word selection. We know that most words are used either very often or very rarely. Information gain, which is based on entropy, is useful for selecting

Figure 6.15: Text mining big data analytics (Swissquant [2017]). The process chains data acquisition (acquisition, cleaning), text preprocessing (remove special characters, convert to lower case, detect and select a language, morphological replacement, synonym replacement, stemming, selection of the most significant words), modeling (training a classifier) and evaluation/validation (performance assessment, feedback loop, testing the classifier), followed by the application of the discovered knowledge (presentation, campaign, software).

words that are best able to discriminate between the two categories, General Contacts and Complaints. We define

IG(t_j) = How well does a given word t_j separate the two categories?

Expressed in mathematical terms, this function reads:

IG(t_j) = Σ_{m=0}^{1} p(t_j = m) Σ_{c=1}^{2} p(L_c|t_j = m) log_2 p(L_c|t_j = m) − Σ_{c=1}^{2} p(L_c) log_2 p(L_c) (6.26)

where L_c describes the two categories and where the second term is a constant. Using this function, we discard all words below, e.g., the 90th percentile.
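Formula (6.26) can be implemented directly. The sketch below computes IG(t_j) as the prior label entropy minus the label entropy conditional on the presence of the word; the example documents are invented:

```python
from math import log2

def entropy(probs):
    return -sum(p * log2(p) for p in probs if p > 0)

def information_gain(docs, word):
    """docs: list of (set_of_words, label). IG(word) = H(L) - H(L | word presence)."""
    n = len(docs)

    def label_entropy(labels):
        if not labels:
            return 0.0
        counts = {}
        for lab in labels:
            counts[lab] = counts.get(lab, 0) + 1
        return entropy([c / len(labels) for c in counts.values()])

    h_prior = label_entropy([lab for _, lab in docs])
    with_w = [lab for words, lab in docs if word in words]
    without_w = [lab for words, lab in docs if word not in words]
    h_cond = (len(with_w) / n) * label_entropy(with_w) \
           + (len(without_w) / n) * label_entropy(without_w)
    return h_prior - h_cond

docs = [({"refund", "angry"}, "complaint"), ({"angry", "wait"}, "complaint"),
        ({"hello", "info"}, "general"), ({"info", "account"}, "general")]
print(information_gain(docs, "angry"))   # perfectly separating word → IG = 1 bit
print(information_gain(docs, "refund"))  # partially separating word → smaller IG
```

Words whose IG falls below the chosen percentile are then discarded from the feature set.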

To obtain a systematic answer to the separation problem, several different classifiers can be used, such as Naive Bayes, Random Forest, SVM or a Neural Network. In order to test a classifier, the data set is divided into a training set and a testing set. Naive Bayes already has an accuracy of 0.951. This can be improved with advanced algorithms: using SVM, the accuracy increases to 0.985. The confusion matrix for the Naive Bayes classifier is the business-relevant output.

Predicted Predicted
Complaint General
Actual Complaint 49.54% 3.12%
Actual General 1.87% 45.47%

The diagonal elements show the correct predictions. The classifier has a very good hit ratio. There is almost no general text predicted to be a complaint (Type I error). About 6% of the complaints are not detected as such (Type II error). The business risk the firm faces due to big data analytics is that if the employees who write the answers do not recognize a Type II error communication, the customer will receive an improper response. The parametrization of the algorithm can change the ratio of Type I to Type II errors.

6.2.12 Portfolio Construction with Machine Learning, I


Gu et al. (2018) use machine learning to measure asset risk premia. They perform a comparative analysis of ML methods in the prediction of the cross section and time series of stock returns. They identify trees and neural nets as the best predictors; that is, non-linear predictor interactions are the source of the result. Regressions fail to account for this type of structure. As a second finding, the methods agree on the same small set of risk premia: dominant predictive signals are found for momentum, liquidity, and volatility. They measure risk premia both for the aggregate market and for individual stocks.

Measurement of an asset's risk premium means predicting the conditional expectation of a future realized excess return. Relative to traditional econometric methods, ML provides a far more expansive list of potential predictor variables and richer specifications of functional forms. ML does not force one to use linear predictions as in the Fama-French model, for example. Furthermore, the zoo of predictors is a serious problem, both for academia and for investors. ML is used for variable selection and dimension reduction.

The authors use roughly 30,000 individual stocks over a horizon of 60 years, from 1957 to 2016. "Our predictor set includes 94 characteristics for each stock, interactions of each characteristic with eight aggregate time series variables, and 74 industry sector dummy variables, totalling more than 900 baseline signals. Some of our methods expand this predictor set much further by including non-linear transformations and interactions of the baseline signals." (Gu et al. (2018))

Their benchmark panel of regressions of individual stock returns consists of three lagged stock-level characteristics: size, book-to-market, and momentum. This benchmark is parsimonious and simple, and comparison against it is conservative, since the three characteristics are routinely demonstrated to be among the most robust return predictors. This finding is in line with Lewellen (2015). In their huge sample, the out-of-sample R-squared of the benchmark model is 0.16% per month for individual stock returns. Using OLS on the full set with more than 900 predictors leads to a negative R-squared: OLS is not able to generate any predictive power out-of-sample if that many parameters need to be estimated (overfitting).

Dimension reduction or penalization techniques are needed in the OLS case. Using parameter shrinkage and variable selection, which both limit the degrees of freedom in the regression, brings the out-of-sample R-squared back to 0.09% per month. Principal component regression (PCR) and partial least squares (PLS), which reduce the dimension of the predictor set to a few linear combinations of predictors, raise the out-of-sample R-squared to 0.28% and 0.18%, respectively. Non-linear specifications further improve predictions. The authors use generalized linear models, regression trees, and neural networks. Regression trees and neural nets lead to an R² between 0.27% and 0.39%. The economic gains are considerable. An investor in the S&P 500 who uses neural network forecasts reaches a 21 percentage point increase in the annualized out-of-sample Sharpe ratio (0.63) relative to the 0.42 Sharpe ratio of a buy-and-hold investor. Forming a long-short decile spread, sorted on stock return predictions from a neural network, the strategy earns an annualized out-of-sample Sharpe ratio of 2.35, compared to the Sharpe ratio of 0.89 of their benchmark.

We describe some of the machine learning methods used. All methods have the objective of minimizing the mean squared prediction error (MSE). They describe the excess return of asset j as an additive prediction error model:

R^j_{t+1} = E(R^j_{t+1}|F_t) + ε^j_{t+1} = g(z^j_t) + ε^j_{t+1}, (6.27)

where z^j_t is the vector of predictors. Since the function g depends neither on time nor on the individual stock, the estimates of risk premia are more stable for any individual asset, contrary to standard methods where the cross-sectional model is re-estimated in each period. Since g depends only on z^j_t, information prior to t or from other stocks is not used. ML requires careful construction of the sub-samples for testing, estimation and hyperparameter tuning to control model complexity for out-of-sample performance. Depending on the algorithm used, different tuning methods are in force; see Gu et al. (2018) for details.

Different choices of the function g define different models. The simple linear model imposes that conditional expectations can be approximated by a linear function of the raw predictor variables, i.e. g = ⟨z^j_t, φ⟩ with parameter vector φ. The objective function is the ordinary mean square error loss

L = (1/NT) Σ_{j=1}^{N} Σ_{t=1}^{T} (R^j_{t+1} − ⟨z^j_t, φ⟩)².

Using statistical robustness methods (the Huber loss function), this least squares objective can be tuned to account better for observations which are more informative. Penalty models induce sparsity in the sense that they force small coefficients to zero.

If predictors are highly correlated, the above shrinkage and selection methods are not optimal. It is better to choose an average of the predictors as the sole predictor in a univariate regression. This averaging is the essence of dimension reduction; principal components regression (PCR) and partial least squares (PLS) are two such approaches. PCR starts with a principal components analysis (PCA), which conserves the covariance structure among the predictors, and then the leading components, given by the highest eigenvalues, are used in the predictive regression. PCR thus rules out coefficients by considering the covariation among the predictors before considering their goodness in predicting future returns. PLS, on the contrary, first performs a dimension reduction by exploiting the covariation of the predictors with the forecast target. We refer to Yeniay and Göktas for a comparison of PCR, OLS and PLS.

The generalized linear model expresses the model return forecast error as the sum of an approximation error (bias, from not knowing the true model g∗), an estimation error (variance, from not knowing the true parameters of the model g) and an intrinsic error term. Generalized linear means that non-linear univariate transformations of the predictors are considered. As it turns out ex post, a weakness is that it does not allow for interactions among predictors. Considering multivariate functions of predictors would generate such interactions, but the number of parameters of such a model becomes computationally intractable.

Instead, regression trees are used to incorporate multi-way predictor interactions. Formally,

g(z_{i,t}; θ, K, L) = Σ_{k=1}^{K} θ_k χ_{{z_{i,t} ∈ C_k(L)}},

where C_k(L) is one of the K partitions of the data and θ_k is the sample average of the outcomes within partition k. This formula says: given a tree, consider all paths from the root node to the end nodes (the sum over K); at each node on a given path, check whether the feature is above or below the threshold value (the indicator function); and multiply all indicator functions along the path. Based on the basic decision tree model, boosting and random forest ensemble methods are introduced in order to stabilize the results, to improve the performance and to manage the bias-variance tradeoff.

Given these models, the out-of-sample R² for individual excess stock return forecasts is calculated as

R² = 1 − Σ_{(i,t)∈τ} (R_{i,t+1} − R̂_{i,t+1})² / Σ_{(i,t)∈τ} R²_{i,t+1},

where τ indicates that only the testing sub-sample is used. The metric is used without demeaning, which is meaningful since individual stocks and not broad indices are considered: using historical averages adds a lot of noise. All models under consideration increase their monthly R² by 3 percentage points when benchmarked against historical means. Table 6.1 shows the monthly out-of-sample stock level prediction performance.
The negative results for OLS reflect the in-sample overfit. Restricting OLS to three style premia, or using penalization as in ENet, improves the performance significantly.

             OLS*   OLS*-3  PLS   ENet*  RF    GBRT*  NN1   NN3   NN5

All          -4.60  0.16    0.18  0.19   0.27  0.30   0.35  0.39  0.35
Top 1000    -14.21  0.15   -0.10  0.10   0.62  0.53   0.44  0.72  0.69
Bottom 1000  -2.13  0.37    0.29  0.18   0.29  0.27   0.41  0.46  0.40

Table 6.1: Monthly R² for the entire panel of stocks using OLS, OLS using only size, book-to-market, and momentum (OLS-3), PLS, elastic net (ENet), random forest (RF), gradient boosted regression trees (GBRT), and neural networks with one to five layers (NN1-NN5). The generalized linear model (GLM) is not reported (bad performance anyway). '*' indicates the use of the Huber loss instead of the l2 loss. Top 1,000 or bottom 1,000 means the 1,000 largest or smallest stocks by market value. (Gu et al. [2018])

Regularizing the linear model via dimension reduction improves predictions even further (the PLS case). Hence, dimension reduction dominates variable selection. Boosted trees and random forests are competitive with these methods. Neural networks are the best performing predictors overall. But the drawback of neural networks is the difficulty of interpreting the results: what is the economic meaning of the different layers one to five, and why do they generate the performance? Neural networks thus fail to be interpretable, which is a serious drawback in asset management, both from a client perspective and from a regulatory one.

To assess the statistical significance of the return performances, the authors use Diebold-Mariano test statistics for pairwise comparisons of a column model versus a row model. The statistics imply that the performance differences among regularized linear models are all insignificant; that is, all OLS models, ENet, PLS and PCA produce statistically indistinguishable forecast performance. Random forests and boosted trees improve over linear models only marginally. Again, neural networks are the only models that produce large and statistically significant gains over all linear models. When one considers which characteristics matter in the different models, a few characteristics turn out to contribute significantly to the return performance in all models: momentum on several time scales, volatility characteristics, and spreads, for example. That is, as we have seen in other parts, market-driven characteristics dominate macroeconomic or accounting-type characteristics.
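The out-of-sample R², computed without demeaning as described above, is easily coded; the return series below are illustrative:

```python
import numpy as np

def r2_oos(actual, predicted):
    """Out-of-sample R^2 without demeaning: the implicit benchmark forecast is a
    zero excess return, not the historical mean."""
    actual, predicted = np.asarray(actual, float), np.asarray(predicted, float)
    return 1.0 - np.sum((actual - predicted) ** 2) / np.sum(actual ** 2)

# Perfect forecasts give R^2 = 1, zero forecasts give R^2 = 0, and an
# overfitted model can easily produce a negative out-of-sample R^2.
print(r2_oos([0.01, -0.02, 0.03], [0.01, -0.02, 0.03]))  # → 1.0
print(r2_oos([0.01, -0.02], [0.05, 0.04]))               # negative: worse than zero forecast
```

This makes concrete why the OLS column of Table 6.1 can be negative: the forecasts are worse, out-of-sample, than predicting a zero excess return.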

6.2.13 Portfolio Construction with Machine Learning, II


This example is due to de Prado (2016). The author introduces the so-called Hierarchical Risk Parity (HRP) approach. HRP addresses the sensitivity, concentration and under-performance issues of classical optimization models, with the Markowitz model as a particular case. One source of these issues is the need to invert the covariance matrix; HRP is a method where this inversion is not needed. HRP is also expected to produce less risky portfolios out-of-sample compared to traditional risk parity methods.

Correlation matrices can be represented as complete graphs, which lack the notion of hierarchy: each investment is substitutable with any other, there is no hierarchical relationship, and all nodes are of the same importance. Small estimation errors are magnified in such a structure. Consider an investor who invests in many assets, where some assets are close substitutes for each other while others are complementary. Say stocks with a similar liquidity and of the same economic sector are more substitutable than stocks which have different characteristics. Such a classification of the dependence leads to a tree structure, which includes hierarchical relations, and not to a symmetric complete graph where weights between any nodes can vary freely, see Figure 6.16.

Figure 6.16: A complete-graph representation (top) of a 50 × 50 correlation matrix and a tree-graph (bottom) structure (Lopez [2016]).

While a covariance matrix has N × (N − 1)/2 edges to connect the N nodes, a tree has only N − 1 edges to rebalance the weights among peers at the various hierarchical levels. Furthermore, in the covariance matrix the weight distribution has no natural starting point, whereas in a tree the weights are distributed top-down, which is consistent with many asset managers' investment behaviour.

The HRP algorithm is constructed in three steps. First, similar investments are grouped into clusters based on a proper distance metric; this defines the tree clustering. Second, the rows and columns of the covariance matrix are reorganized so that the largest values lie along the diagonal. This leads to a quasi-diagonalization of the clustered tree; with such a quasi-diagonalization, the problems of inverting the covariance matrix are circumvented. Third, the allocation is split top-down through a recursive bisection of the reordered covariance matrix.

The tree construction, step one, is done in several stages. The first stage generates the tree clustering out of the data. Let a T × N matrix of observations X be given, with N the number of assets and T the number of periods. The goal is to map the N column vectors into a hierarchical structure of clusters, such that allocations can flow downstream. The first step is to define a 'distance of distances'. Let ρ be the correlation matrix of dimension N × N. Then

D = (d_ij), d_ij = d(X_i, X_j) = sqrt{(1 − ρ_ij)/2}

is an N × N distance matrix. The Euclidean distance d̄ between any two column vectors of D is defined by

d̄_ij = sqrt{ Σ_{n=1}^{N} (d_ni − d_nj)² }.

Note that d_ij is defined on column vectors of X, but d̄_ij is defined on column vectors of D, the distance of distances: it is a distance function defined on the entire correlation matrix and not only on particular cross-correlations. The distances-of-distances are clustered as

u(1) = arg min_{i≠j} (d̄_ij),

i.e. the smallest distance-of-distances entry in the d̄ matrix is selected. Since clustering does not apply to the whole distance matrix, we have to define the distance between the new cluster u(1) and the unclustered items as follows:

d̄_{i,u(1)} = min_{j∈u(1)} (d̄_ij).

Finally, the matrix d̄ is updated by appending d̄_{i,u(1)} and dropping the clustered columns and rows j ∈ u(1); see the example for illustration:

ρ =
[ 1      −      − ]
[ 0.7    1      − ]
[ 0.2   −0.2    1 ]
→ d =
[ 0      −      − ]
[ 0.3873 0      − ]
[ 0.6325 0.7746 0 ]
→ d̄ =
[ 0      −      − ]
[ 0.5659 0      − ]
[ 0.9747 1.1225 0 ]
→ u(1) = (1, 2)

and

u(1) → d̄_{i,u(1)} = (0, 0, 0.9747)ᵀ → (d̄)_{i,h=1,...,4} =
[ 0      −      −      − ]
[ 0.5659 0      −      − ]
[ 0.9747 1.1225 0      − ]
[ 0      0      0.9747 0 ]

after which the clustered rows and columns 1 and 2 are dropped. The above steps are applied recursively, so that N − 1 clusters are appended to the matrix D until the algorithm stops when the final cluster contains all of the original items. The sequence of the cluster formation can be illustrated using a dendrogram.
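The numbers of the example can be reproduced with a few lines of numpy, following the definitions of d and d̄ above:

```python
import numpy as np

rho = np.array([[1.0, 0.7, 0.2],
                [0.7, 1.0, -0.2],
                [0.2, -0.2, 1.0]])

d = np.sqrt(0.5 * (1.0 - rho))            # correlation distance d_ij

# Euclidean 'distance of distances' between the columns of d
n = len(d)
d_bar = np.zeros((n, n))
for a in range(n):
    for b in range(n):
        d_bar[a, b] = np.sqrt(np.sum((d[:, a] - d[:, b]) ** 2))

# first cluster u(1): the off-diagonal pair with the smallest d_bar entry
mask = ~np.eye(n, dtype=bool)
i, j = divmod(np.where(mask, d_bar, np.inf).argmin(), n)
print(np.round(d_bar, 4))   # 0.5659, 0.9747, 1.1225 off the diagonal
print((i, j))               # → (0, 1), i.e. u(1) = (1, 2) in the text's 1-based indexing
```

Libraries such as scipy.cluster.hierarchy automate the recursive part, but the toy loop makes the two nested distance definitions explicit.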

The next stage, quasi-diagonalization, reorganizes the rows and columns of the covariance matrix so that the largest values lie along the diagonal. This operation places similar investments close to each other and dissimilar ones far apart. The algorithm used, which we do not discuss, preserves the order of the clustering.

Figure 6.17: Quasi-diagonalization of the clustered correlation matrix, in the sense that the largest values lie along the diagonal. Unlike the PCA approach, for example, HRP does not require a change of basis: HRP works with the original investments. (Lopez [2016]).

Stage three uses the fact that inverse-variance allocation is optimal for a diagonal covariance matrix. For the quasi-diagonal matrix of stage two, one approach to achieve a recursive bisection is to split the allocation of the quasi-diagonal matrix between adjacent subsets in inverse proportion to their aggregated variances.

The author compares, in- and out-of-sample, the minimum variance portfolio (GMV), the inverse volatility portfolio construction (IVP) of risk budgeting (where correlation information is discarded), and HRP. In all portfolio constructions long-only constraints apply. The simulation is done for 10 assets. The GMV allocates 92.66% to its 5 top holdings and 0 to three assets. The HRP allocation lies between the highly concentrated GMV and the almost equal distribution of the IVP. The GMV and the HRP portfolios have almost the same risk although GMV only uses half of the assets; therefore, an event impacting the top five assets will have a more severe impact than in the HRP case. For the out-of-sample perspective, Gaussian returns are generated with mean zero and 10% standard deviation, random shocks are added to account for price jumps, and the portfolios are rebalanced monthly (every 22 observations). The simulations are repeated 10,000 times. All mean portfolio returns are essentially zero out-of-sample, but the variances of the out-of-sample portfolios differ heavily. The GMV variance is the highest one out-of-sample, 72.47% greater than HRP's. Intuitively, shocks affecting a specific investment penalize the GMV concentration, and shocks involving several correlated investments penalize IVP, which ignores the correlation structure. HRP protects against both common and idiosyncratic shocks by balancing diversification across all investments and diversification across clusters of investments at multiple hierarchical levels.

6.2.14 Penalty Approaches in Portfolio Optimization


One method to reduce the estimation risk in the unknown parameters µ and C is the Least Absolute Shrinkage and Selection Operator (LASSO) approach of Tibshirani (1996). There is empirical evidence that this approach provides higher out-of-sample performance and that the Sharpe ratios are more stable. The optimization problem is still convex, and therefore any numerically found local minimum is a global minimum. The optimization problem reads:

min_{φ∈R^N} φ'Cφ + λ Σ_{j=1}^{N} |φ_j|, s.t. e'φ = 1, φ'µ ≥ r. (6.28)

Deviations from the zero vector are linearly punished, since one superimposes a 'V'-type function on the risk function. Small values of φ_j are eventually reduced to zero. The investment capacity gained this way is distributed to the other investment components, which results in a sparser investment vector. There are many different variants of the LASSO approach; see Fastrich et al. (2013), and Zou (2006) for the adaptive LASSO approach, which counteracts some biases inherent in (6.28).

Bruder et al. (2013) compare the OLS-mean variance approach with the LASSO-mean variance one for the S&P 500, with monthly rebalancing and data from January 2000 to December 2011, see Table 6.2.

Method Return Volatility Sharpe Ratio Max. Drawdown Turnover


OLS-MV 3.60% 14.39% 0.25 -39.71% 19.4
LASSO MV 5.00% 13.82% 0.36 -35.42% 5.9

Table 6.2: OLS-mean variance versus LASSO-mean variance (Bruder [2011])

The LASSO approach shows a better risk-adjusted performance than the traditional one. The extreme losses are comparable in both approaches, although the LASSO approach does not provide any form of a tail hedge. The turnover is much smaller for the LASSO approach, which is a consequence of the sparse optimal investment vector and information matrix in the LASSO approach. The Google stock, for example, is hedged by 99 stocks in the OLS model compared to only 13 stocks in the LASSO model.

What has this to do with big data? If we consider the S&P 100 asset universe, the problem is the integration of data from different data sources rather than the dimension of the analytics problem. It requires powerful software tools. Take the MSCI World with around 10,500 stocks: the LASSO approach requires a numerical optimization. Since convexity of the problem is lost in most variants of the LASSO approach, the guarantee that a local minimum is indeed a global one is no longer given. One therefore needs to search for a global minimum. Since the covariance matrix is high dimensional and its inversion becomes delicate due to the sparsity of the matrix, one has to use advanced algorithms to find the global minimum.
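As an illustration of how the L1 penalty produces sparse weights, the following coordinate-descent sketch solves an unconstrained mean-variance variant of (6.28). The budget and minimum-return constraints of (6.28) are omitted for simplicity, so this is not the full problem; the covariance matrix and expected returns are invented:

```python
import numpy as np

def lasso_mean_variance(C, mu, gamma=1.0, lam=0.0, n_iter=500):
    """Coordinate descent for min_phi  phi'C phi - gamma * mu'phi + lam * ||phi||_1.
    Each coordinate update is a closed-form soft-thresholding step."""
    n = len(mu)
    phi = np.zeros(n)
    for _ in range(n_iter):
        for j in range(n):
            # residual target for coordinate j: gamma*mu_j - 2 * sum_{k != j} C_jk phi_k
            r = gamma * mu[j] - 2.0 * (C[j] @ phi - C[j, j] * phi[j])
            # soft-threshold: small coordinates are set exactly to zero
            phi[j] = np.sign(r) * max(abs(r) - lam, 0.0) / (2.0 * C[j, j])
    return phi

C = np.diag([0.04, 0.09, 0.01])            # invented (diagonal) covariance matrix
mu = np.array([0.05, 0.06, 0.001])         # invented expected returns
print(lasso_mean_variance(C, mu, lam=0.0))   # all three assets held
print(lasso_mean_variance(C, mu, lam=0.02))  # the weak third asset is dropped to 0
```

Raising λ shrinks all positions and forces the asset with the negligible expected return to exactly zero, which is the sparsity mechanism described above.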

6.3 Blockchain
Cryptography is a main mathematical discipline in a digital world. It makes protection and validation possible in a world of strangers, when we want to exchange values in a blockchain at a zero human trust level.

6.3.1 Cryptography
The main goals of cryptography are

1. Confidentiality (access protection)

2. Integrity (change protection)

3. Authenticity (counterfeit protection)

4. Obligation (non-repudiation)

For the first goal, encryption and decryption are used. For the other three goals, digital signatures are used.

Today, the operationalization of these goals means defining mathematical problems (cryptography) which are hard to solve even for supercomputers. While the goals have mattered for thousands of years to humans who want to trade, cooperate or fight, the difference is that today computers are used for encryption, decryption and signing. The main source for this section on cryptography is Goldwasser and Bellare (2008).

The ancient problem of cryptography is secure communication over an insecure channel. Alice wants to send Bob a secret message. In the traditional solution, Alice and Bob meet before transmission starts, where they agree on a pair of encryption and decryption algorithms E, D and the secret key S, which is the same for both of them: they use the same key and the same algorithms for encryption and decryption. This is called symmetric cryptography or private key encryption. The meeting to exchange the secret information is risky and does not scale to many individuals. An adversary Eve may know E, D but not the secret key S. Alice encrypts the message M by computing the ciphertext c = E(S, M) and sends it to Bob, who decrypts c by computing

D(S, c) = D(S, E(S, M)) = M.
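A toy symmetric scheme makes the identity D(S, E(S, M)) = M concrete: XOR with a shared pad is its own inverse. The fixed pad below is for illustration only; a real one-time pad must be random and never reused:

```python
def encrypt(key: bytes, message: bytes) -> bytes:
    """One-time-pad style XOR encryption: c = E(S, M)."""
    assert len(key) >= len(message), "the pad must be at least as long as the message"
    return bytes(k ^ m for k, m in zip(key, message))

decrypt = encrypt   # XOR is its own inverse, so D(S, c) = E(S, c)

key = b"this is a demo pad, not secure!!"   # the shared secret S (illustrative, NOT random)
cipher = encrypt(key, b"meet at noon")      # c = E(S, M)
plain = decrypt(key, cipher)                # D(S, E(S, M)) = M
print(plain)                                # → b'meet at noon'
```

Eve sees only the ciphertext; without S she cannot recover M, and anyone who must first meet to exchange S faces exactly the scaling problem described above.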



This shows that cryptography needs inverse functions or properties of the preimages of functions, the so-called one-way functions. These are functions where it is easy to calculate f(x) from x but hard to invert, i.e. to calculate x from f(x).

Although one-way functions are believed to exist, mathematical proofs of their existence are missing. An example is the multiplication of prime numbers, i.e. f : (x, y) → f(x, y) = xy. The product is simple to calculate, but finding the prime factors is difficult.

Asymmetric cryptography overcomes the difficulties of symmetric cryptography by
still using the same algorithms for encryption and decryption but by allowing different
keys for the sender and receiver of a message, see Diffie and Hellman (1976). Alice and
Bob do not have a common secret key S. Instead, each has their own private key. To
achieve this, Bob publishes information, the public key vkB , which is used for verification.
Everybody knows this key and can send messages to Bob using this key without the
need to meet Bob. Bob also generates a private key pkB which is known only to him
and allows him to decrypt any message sent to him. For secure public key encryption,
trapdoor functions are used, i.e. one-way functions for which there exists some trapdoor
information, known to the receiver Bob alone, which Bob can use to invert the one-way
function, see Figure 6.18. Alice also generates a public and a private key.

Definition 91. Private keys are denoted pkX , with X the owner of the private key, and vkX
denotes a public key.

Rivest, Shamir and Adleman proposed a first candidate trapdoor function, the RSA
system. Before we consider this algorithm, we introduce some basic number theory:
modular arithmetic (MA).

6.3.2 Modular Arithmetic (MA)


This mathematics is used to formalize the encryption and decryption algorithms as well
as the generation of the keys. MA is arithmetic for integers where the addition of two
numbers restarts after a certain value (the modulus). The clock is a prototype example:
hour arithmetic is modulo 12, written mod 12. 3 = 15 mod 12 means 3 − 15 = −12 is an
integer multiple of 12. An equivalent definition of a = b mod n is that there exists an
integer k : a = kn + b.9 We summarize some calculus rules.

9 Examples are 40 = 18 mod 11, −40 = 8 mod 6, −40 = 8 mod 8. The modulus operation defines an
equivalence relation on the integers. The relation satisfies reflexivity, symmetry and transitivity.
420 CHAPTER 6. ASSET MANAGEMENT INNOVATION


Figure 6.18: Symmetric key, asymmetric key and Diffie–Hellman key exchange (Source:
Wikipedia [2016]).

Proposition 92. Let a = b mod n, ai = bi mod n with i = 1, 2, and k an integer. Then

a + k = b + k mod n
ka = kb mod n
a1 ± a2 = b1 ± b2 mod n
a1 a2 = b1 b2 mod n
a^k = b^k mod n, k > 0

but in general k^a ≠ k^b mod n.

Basic for cryptography is the existence of the inverse a^(−1) of a. If c = d mod φ(n),
where φ is Euler's totient function, then a^c = a^d mod n, provided a is coprime with n;
this replaces the last false statement in the above properties. The totient function

φ(n) := card{a ∈ N | 1 ≤ a < n ∧ gcd(a, n) = 1},

describes all natural numbers smaller than n with greatest common divisor (gcd) 1
with n: the totient function counts the positive integers up to a given integer n that
are coprime to n. For example, φ(20) = card{1, 3, 7, 9, 11, 13, 17, 19} = 8.10 The totient

10 Two integers a and b are coprime if the only positive integer that divides both of them is 1, or
equivalently, if their greatest common divisor is 1.

function is a multiplicative function: φ(pq) = φ(p)φ(q) for distinct primes p, q, which is
at the basis of multiplication and inversion in MA. The key property for inversion is
Euler's Theorem: if a and n are relatively prime, then

a^φ(n) ≡ 1 mod n.
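The totient values above can be checked directly from the definition (a brute-force sketch; the counting loop is only practical for small n, whereas for n = pq with known primes one would use φ(n) = (p − 1)(q − 1)):

```python
from math import gcd

def phi(n: int) -> int:
    """Euler's totient: count of 1 <= a < n with gcd(a, n) = 1."""
    return sum(1 for a in range(1, n) if gcd(a, n) == 1)

print(phi(20))                        # 8, matching the example in the text
print(phi(5 * 7), phi(5) * phi(7))    # multiplicativity: 24 = 4 * 6

# Euler's Theorem: a^phi(n) = 1 mod n whenever gcd(a, n) = 1
a, n = 3, 20
print(pow(a, phi(n), n))              # 1
```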

There exists an integer modular multiplicative inverse, denoted a^(−1), such that
aa^(−1) = 1 mod n if and only if a is coprime with n. If a = b mod n and a^(−1) exists, then
a^(−1) = b^(−1) mod n. Finally, for ax = b mod n with a coprime to n, x = a^(−1)b mod n.
For the RSA algorithm, the inverse of the function a → a^vk mod n is the function b → b^pk,
with pk the multiplicative inverse of vk mod φ(n). Not knowing the factorization of n makes it
difficult to compute the totient function and hence the computation of the private key.
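Euler's Theorem gives a direct way to compute the inverse, a^(−1) = a^(φ(n)−1) mod n (a sketch with a brute-force totient; in practice the extended Euclidean algorithm, or Python 3.8+'s `pow(a, -1, n)`, is used instead):

```python
from math import gcd

def mod_inverse(a: int, n: int) -> int:
    """Modular inverse via Euler's Theorem; requires gcd(a, n) = 1."""
    assert gcd(a, n) == 1, "inverse exists only for a coprime with n"
    phi = sum(1 for k in range(1, n) if gcd(k, n) == 1)  # Euler's totient
    return pow(a, phi - 1, n)

inv = mod_inverse(9, 26)
print(inv, (9 * inv) % 26)   # 3, 1 -- the inverse of 9 used in the cipher example
```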

As an example, assume that to each letter of the alphabet we associate one of the
numbers 0 to 25. Encryption means

y = ax + b mod n

and decryption is given by

x = a^(−1)(y − b) mod n.

We set the key (a, b) = (9, 13), with gcd(a, 26) = 1. We encrypt the letter C, which is
mapped to the number 2. Then

y = 9 · 2 + 13 mod 26 = 31 mod 26 = 5.

Decryption implies

x = 3(5 − 13) mod 26 = 2,

where the calculation of the inverse a^(−1) = 3 is the tedious part. Since the gcd of 9 and
26 is 1, the inverse a^(−1) exists. The number of possible keys, 12 · 26 = 312, is very small.
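The affine cipher above can be reproduced in a few lines (a toy sketch with the text's key (a, b) = (9, 13); the default arguments are this specific key, not part of any standard):

```python
def affine_encrypt(x: int, a: int = 9, b: int = 13, n: int = 26) -> int:
    """Encrypt a letter-number x with the affine cipher y = ax + b mod n."""
    return (a * x + b) % n

def affine_decrypt(y: int, a_inv: int = 3, b: int = 13, n: int = 26) -> int:
    """Decrypt with x = a^(-1)(y - b) mod n, here with a^(-1) = 3."""
    return (a_inv * (y - b)) % n

y = affine_encrypt(2)     # letter C -> number 2
print(y)                  # 5, as in the text
print(affine_decrypt(y))  # 2, i.e. the letter C again
```

With only 312 keys, such a cipher falls to exhaustive search immediately, which is the point of the remark in the text.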

We now consider the discrete logarithm, which is assumed to give a one-way function.
We recall that in the reals x = logb a solves b^x = a. To define the discrete analogue, the
concept of a group is needed.

Definition 93. A set G is a group if there is an operation ∗ such that for two elements
g1 , g2 ∈ G also g1 ∗ g2 ∈ G, the operation ∗ is associative, there exists a unit element
e such that g ∗ e = e ∗ g = g for all g, and for each g there exists an inverse g^(−1) such that
g ∗ g^(−1) = g^(−1) ∗ g = e.

The integers with the addition operation, the permutations of n numbers
and the rotations in R^n define groups. If g is a group element, g^k = g ∗ g ∗ . . . ∗ g is well
defined for k a positive integer, and if k is negative, the definition applies to g^(−1). For
a, b ∈ G, an integer k which solves b^k = a is the discrete logarithm k = logb a.

The group of integers modulo n, Zn := Z/nZ, is an important group in cryptography.
Zn means that two numbers a, c ∈ Z are the same if they have the same remainder when
divided by n.

Writing ai = si n + ri , we have

a1 + a2 = s1 n + r1 + s2 n + r2 = (s1 + s2 + q)n + r = r mod n,

where r1 + r2 = qn + r. Hence, Zn = {0, 1, . . . , n − 1} is an additive group if any
number exceeding n is 'cut'. Consider Z7 . Then 6 + 3 = 9 = 1 × 7 + 2 and therefore
[6 + 3] = [9] = [2] in Z7 . The numbers 2 and 9 belong to the same equivalence class [2],
which for simplicity is written 2.

The set Z∗n consists of all integers m with 1 ≤ m ≤ n such that gcd(m, n) = 1.

Proposition 94. Zn is a group under addition modulo n. Z∗n is a group under multipli-
cation modulo n.

Z∗n is a group since a = b mod n implies gcd(a, n) = gcd(b, n), i.e. the congruence
classes modulo n are well-defined. Next, gcd(a, n) = 1 and gcd(b, n) = 1 implies
gcd(ab, n) = 1, i.e. closure under multiplication follows. Finally, given gcd(a, n) = 1,
finding the inverse with aa^(−1) = 1 mod n is possible using Bézout's Lemma. The inverse
satisfies gcd(a^(−1), n) = 1, i.e. it is also an element of the group.

The order of the group Z∗n is given by Euler's totient function, and the group is cyclic -
all group elements can be generated by multiplication of a single element - if and only if
n = 1, 2, 4, q^k, 2q^k, where k is a positive integer and q any prime different from 2.

Consider the function f : (n, g, x) → (g^x mod n, n, g), where the group is assumed to
be cyclic with generator g; this function is conjectured to be one-way: computing
f can be done by repeated multiplication, but calculating the inverse requires
computing the discrete logarithm. There is no efficient algorithm to invert the function
for large enough values of the parameters. This is the basic method used in cryptogra-
phy. To set up such a function one needs to find primes and generators, see the literature.
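The asymmetry can be seen in a toy sketch (p = 23 and generator g = 5 are illustrative; real parameters use primes of thousands of bits, for which the exhaustive search below is hopeless):

```python
def discrete_log(b: int, a: int, n: int):
    """Find k with b^k = a mod n by exhaustive search -- exponential in the bit length."""
    x = 1
    for k in range(n):
        if x == a:
            return k
        x = (x * b) % n
    return None

p, g = 23, 5                 # 5 generates the cyclic group Z_23^*
a = pow(g, 13, p)            # the easy direction: repeated squaring
print(a)                     # 21
print(discrete_log(g, a, p)) # 13 recovered, feasible only because p is tiny
```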

6.3.3 RSA Algorithm


We consider an RSA algorithm example based on Sullivan (2013) and Corbellini (2015).
The RSA algorithm is used both for encryption, the case we consider now, and for digi-
tal signatures. Bob wants to send the message M 'Hello Alice' to Alice. The goal is to
convert all letters into a deterministic number sequence; these numbers are then mapped
into random-looking numbers (encryption) which can only be mapped back to the orig-
inal sequence if the private decryption key is used. Since computers prefer to work with
not too large numbers, a maximum number (the modulus) is used. The private and public
key are two numbers larger than zero and smaller than the maximum number.

To start with, assume that the two prime numbers 13 and 7 are chosen. The maximum
number is 91 = 7 × 13. The public key of Alice is the number vkA = 5. An algorithm

based on the information 91 and 5 generates the private key pkA = 29
for Alice. How can this be used to transmit the letter C in the message 'Hello
Alice'? First, the letter has to be turned into a number. The UTF-8 scheme attributes
the number 67 to the letter C. Then, the number 67 is multiplied 5 times - the public key
- with itself. Since already 67 × 67 > 91, the calculation is done modulo 91, keeping the
remainder. This means

67 × 67 = 4489 = 91 × 49 + 30 .

Therefore, the result after the first multiplication is 30. This is then again multiplied
by 67, which gives a number larger than 91, and applying the same division as above, the
result is 8 (the remainder). This is repeated 5 times in total, leading to the number 58 -
the encryption E of C = 67 is E(67) = 58. This is the message Alice receives. Now she
uses the private key number 29 and multiplies 58 with itself 29 times, using the same
logic - after each multiplication the next multiplication is done with the remainder - for
decryption D:

58 × 58 × · · · × 58 (29 factors, modulo 91) = 67,

which is the letter C. In other words,

D(58) = D(E(67)) = 67 .

If you do not know the private key number 29, then you do not know how many times
you have to multiply 58 with itself in the above time-consuming way, taking the remainder
in each step. Besides the easy part of multiplication (encryption), decryption is a hard-
to-solve factoring-type problem.
Summarizing,

• Alice chooses prime numbers p, q, which she keeps secret, and sets n = pq.

• Alice chooses vkA such that the greatest common divisor of vkA and φ(n) is 1, i.e.
the public key and the Euler totient number φ(n) are coprime: vkA ∈ Z∗φ(n).

• Alice computes the inverse of vkA , the private key satisfying pkA vkA = 1 mod φ(n).

• Alice makes n and the public key public, keeping p, q and the private key secret.

• Bob encrypts C = M^vkA mod n.

• Alice decrypts C as C^pkA = M^(vkA pkA) = M mod n.
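The steps above can be reproduced with Python's built-in modular exponentiation (a toy sketch using the text's numbers p = 13, q = 7; `pow(vk, -1, phi)` requires Python 3.8+, and real RSA uses primes of hundreds of digits plus padding schemes):

```python
p, q = 13, 7
n = p * q                 # 91, the public modulus
phi = (p - 1) * (q - 1)   # 72, Euler's totient of n
vk = 5                    # public key, coprime with phi
pk = pow(vk, -1, phi)     # private key: inverse of vk mod phi
print(pk)                 # 29

M = 67                    # UTF-8 code of the letter 'C'
C = pow(M, vk, n)         # encryption: 67^5 mod 91
print(C)                  # 58
print(pow(C, pk, n))      # decryption: 58^29 mod 91 = 67
```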


Although the example explains the basic concept, real-life algorithms are more refined.
11

The example did not consider in detail how the keys are distributed and managed
in a public key system. Diffie and Hellman invented the secret key exchange (SKE).

11 There, more refined mathematical concepts are used for the multiplication and factorization problem.
Instead of using multiplication defined on finite integer sets, one uses so-called elliptic curve cryptography;
see Sullivan (2013) and Corbellini (2015) for an introduction.

Fix a prime p and a generator g of the cyclic group Z∗p, where g, p are publicly known.
Alice picks at random an element x ∈ Zp−1 and Bob picks at random an element y ∈ Zp−1 .

Alice calculates

a = g^x mod p

and Bob calculates b = g^y mod p. The keys x, y are private to Alice and Bob, respectively.
Alice sends Bob a and Bob sends Alice b. But then

a^y = (g^x)^y = g^(xy) = (g^y)^x = b^x ∈ Z∗p .

Hence, Alice and Bob can both calculate the shared key without a prior meeting to
generate it. If Eve wants to calculate a^y or b^x, she faces the problem that she does
not know x or y. To find these numbers she has to compute the discrete logarithm,
which is believed to be intractable.
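The exchange can be sketched end to end (toy parameters p = 23, g = 5 again; real Diffie–Hellman uses primes of thousands of bits and authenticated channels to prevent man-in-the-middle attacks, which this sketch ignores):

```python
import random

p, g = 23, 5                     # public: a prime and a generator of Z_p^*
x = random.randrange(1, p - 1)   # Alice's private key
y = random.randrange(1, p - 1)   # Bob's private key

a = pow(g, x, p)                 # Alice broadcasts a = g^x mod p
b = pow(g, y, p)                 # Bob broadcasts b = g^y mod p

k_alice = pow(b, x, p)           # Alice computes b^x
k_bob = pow(a, y, p)             # Bob computes a^y
assert k_alice == k_bob          # both hold g^(xy) mod p
print(k_alice)
```

Eve sees p, g, a and b, but recovering x or y from a or b is exactly the discrete logarithm problem of the previous subsection.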

6.3.4 Hash Functions


Important functions in cryptography are cryptographic hash functions (algorithms).
To motivate these functions, note that many algorithms such as RSA are slow and they
generate output of the same size as the input. Therefore, it makes sense to shorten (hash)
the message first, and process the short hash. A hash function has a 160-512 bit output.
SHA-1 was finalized in 1995 and maps strings of almost arbitrary length to strings of 160
bits. The hash function # acts on a message M of any length and produces an output
- the hash or digest - of fixed length:

# : M → #(M ) = hash .

Hash functions accelerate database lookup by detecting duplicated records in a large file.
The hash function is deterministic: for the same input, the same hash output always
follows. The term 'cryptographic' means that the hash function needs to satisfy some
security, authentication or privacy criteria. First, the time to compute the hash should
be short for any message input. Second, reconstructing a message given a hash result is
impossible unless one tries all possible combinations; there are too many combinations.
Changing the message by only a small amount of information should change the hash
value in a way that the new and the old hash look uncorrelated. Finally, it should be a
hard problem to find two different inputs which lead to the same output - the so-called
collision resistance. Summarizing, a cryptographic hash function makes it easy to
verify that some input data maps to a given hash value, but if the input is unknown, it
is difficult to reconstruct it from the hash value. For the proof-of-work in Bitcoin
transactions one has, for example, to compare data of arbitrary size fast and easily and to
be sure that the message which was digitally signed did not change.

Hash functions are not one-to-one functions: since they compress arbitrary-length inputs
to a fixed length, collisions must exist. Historically, popular cryptographic hash functions
have had a lifetime of around 10 years before being broken.
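The fixed output length and the avalanche property can be seen with the standard library's SHA-256 (shown instead of the older SHA-1 mentioned in the text, which is now considered broken):

```python
import hashlib

def sha256_hex(message: str) -> str:
    """Hash a string message and return the digest as hex."""
    return hashlib.sha256(message.encode()).hexdigest()

h1 = sha256_hex("Hello Bob")
h2 = sha256_hex("Hello Bob!")  # one character appended
print(h1)
print(h2)                      # looks completely unrelated to h1
print(len(h1) * 4)             # 256: fixed output length in bits
```

Verifying a claimed hash is a single function call, while recovering an unknown input from its digest requires exhaustive search over the message space.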

6.3.5 Digital Signatures


Digital signatures are important in physical and virtual transactions of assets. A digital
signature scheme provides a way for Alice and Bob to sign messages so that the signatures
can be verified by anyone else. Alice creates a private and public key, signs the message
using the private key and sends it to Bob. Bob uses Alice's public key to verify that Alice
signed the message, that the message contents have not been altered since the message
was signed, and that Alice cannot later repudiate having signed the message, since she
is the only owner of her private key. The digital signature DS depends on the private
key pkA and the message M. If Eve, the adversary in the game, changes M, then DS
changes to DS'. Applying the public key of Alice to DS' does not confirm authorship by
Alice. Unlike physical signatures, DS changes with every message, so that signatures
cannot be learned and misused.

The RSA system allows one to implement digital signatures as follows. Alice wants to
sign a document M electronically. Alice signs M by appending the digital signature
DS(M ) = f^(−1)(M ), where f is Alice's trapdoor function, i.e. only Alice knows the trapdoor
information. But then anybody can check the validity of the signature since f (f^(−1)(M )) =
M. This shows that the signature becomes invalid if the message M is changed. In the
RSA system, DS(M ) = M^pkA mod n. Using Alice's public key, anybody can calculate

(M^pkA)^vkA mod n.

If the result equals M, then the signature must have been created by Alice, who is
the only one to know pkA . Figure 6.19 illustrates the digital signature process for a hashed
message.
For a numerical example, the message M = 9 takes values in 0, 1, . . . , n − 1, with n
the modulus. Choosing n = 1591 = p · q = 37 · 43, the product of two prime numbers,
the totient function property φ(n) = (p − 1)(q − 1) = 36 · 42 = 1512 and the public key
vkA = 17 imply, via the Euclidean algorithm, pkA = 89. Signing M with this generated
key gives

DS = M^pkA mod n = 9^89 mod 1591 = 440 .

To verify the result,

D((n, vkA ), DS) = DS^vkA mod n = 440^17 mod 1591 = 9 = M,


i.e. the validation using the public key only and the known number n proves that Alice
signed the document.
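The numerical example can be checked directly (a sketch; `pow(vk, -1, phi)` computes the Euclidean-algorithm inverse and requires Python 3.8+):

```python
p, q = 37, 43
n = p * q                 # 1591
phi = (p - 1) * (q - 1)   # 1512
vk = 17                   # Alice's public key
pk = pow(vk, -1, phi)     # 89, the private key
M = 9

DS = pow(M, pk, n)        # signing with the private key
print(DS)                 # 440

check = pow(DS, vk, n)    # verification with the public key only
print(check == M)         # True: only Alice could have produced DS
```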

6.3.6 Blockchain

Definition 95. A blockchain or mutual distributed ledger technology (MDLT) defines
ownership (mutual), a technology (distributed servers) and the object (ledger).12

12 The records in the ledger cover ownership, transactions and the identity of assets. In order to allow for
communication between agents, they need to agree (consensus) about the state and authenticity of the
ledger.


Figure 6.19: The process of signing a hashed document: hashing the document, signing
the hashed document using the private key, broadcasting the document plus the signed
hash, and decomposition of the broadcast into two pieces: the hashed document and the
verification of the signed hash. If the two results agree, then Alice signed the document
and the document did not change during broadcasting.

Blockchains13 are decentralised protocols for recording transactions and asset owner-
ship. Contrary to centralised protocols with an authority in charge of maintaining a
unique common ledger, blockchains operate within a network of participants who possess
and update their own version of the ledger (distributed). The ledger acts as the custodian
of the transaction information. The blockchain in its original format is public (such as for
Bitcoin or Ethereum), i.e. it belongs to everyone or nobody, and participants are anony-
mous. In other words, any trust function in a value chain attributed to a third party,
such as a bank in payment systems, is transferred to a trust function in the blockchain.
We use the expressions blockchain and MDLT as synonyms. Decentralized architectures,
public or private, are the alternative to the often existing centralized architectures in
the financial industry such as stock exchanges, interbank clearing, or the monetary policy
of a central bank. When does it make sense to replace a centralized architecture
by a decentralized one?

While the internet revolutionized information exchange, blockchain can revolutionize
value exchange, and it is based on the internet. While ownership is of low importance
for the internet - since the goal is to spread information, not value - it is critical when
it comes to the exchange of values. Blockchain technology will not replace all existing
centralized ledgers. It will compete with well-established structures owned by exchanges,
central banks or other financial intermediaries. In fact, the number of blockchain projects
is large but the number of working profitable blockchains is still low: many centralized
solutions in the financial industry are difficult to beat in terms of costs, performance,
privacy or a mixture of all of them.

13 Rubin's YouTube video is an excellent introduction to the basics. References for this section are Duivestein
et al. (2016), Tasca (2016), Aste (2016), Rifkin (2014), Swan (2015), Peter and Panayi (2015), Davidson
et al. (2016), UBS (2015), Nakamoto (2008), Franco (2014), Bliss and Steigerwald (2006), Peters et al.
(2014), Zyskind et al. (2015), Berentsen and Schär (2018).

Two requirements are necessary: decentralization and trust. One requirement alone is
not sufficient. Decentralization is a well-known concept, and the definition of trust in a
network populated by strangers without a validating third party is not trivial. Crypto-
security and game theory prove to be the main tools. One reason for the blockchain
hype in recent years was the assumption that blockchain could be a meaningful solution
if only one of the two requirements is met. But in that case traditional solutions are
better suited than a blockchain solution. We remark that legal, tax and regulatory
issues are independent of the chosen technology.

We consider blockchains for money transfer in more detail, since the Bitcoin cryptocur-
rency uses the archetype of a blockchain and is one of the few up-and-running blockchain
applications. Traditional money transfer using banking services and trust is shown in
Figure 6.20. We neglect for the moment how money is generated and how money is
represented, and consider the third control structure - transaction execution. When
Alice sends Bob CHF 10, they use a trusted third party. Alice orders her bank to
transfer the money to Bob, i.e. the bank of Alice connects to the bank of Bob, which
maintains his account. Both banks keep the accounts - that is, the ledgers. Both banks
are trusted - Alice does not need to know Bob and vice versa. The banks check whether
the money can be transferred. The central bank then is a trusted third party acting
between the ledgers of the two banks and running its own ledger where the banks'
account balances are recorded.

This traditional transaction execution consists of three parts:

1. Transaction Feasibility. The network of bank branches, online banking and
payments initiated by written payment order define feasibility.

2. Transaction Legitimization. Banks are permanently checking the legitimization
of payment orders. At each date only a single version of the ledger exists.

3. Transaction Consensus. At each date a central party allows for efficient exe-
cution and fixes at each date in a unique way the distribution of money in the
whole system.

These three parts of the transaction execution hold for all monetary systems, in par-
ticular also for cryptocurrencies. Blockchains attempt to change this classical money
transfer in three respects:


Figure 6.20: Alice paying CHF 10 to Bob using the centralized banking system. On the
payment level, Alice announces her willingness to pay to her bank A. The bank checks
whether Alice possesses CHF 10 and makes sure that there is no double spending. This
third-party validation is repeated by the central bank, where the bank accounts of banks
A and B are checked.

1. There is no central third party validation.

2. The transactions are done at a higher speed.

3. The transaction fees are lower.

We describe in principle how this works, leaving aside the many complicated details
which matter if one considers an implementation of a blockchain.

One has to assure that Alice possesses CHF 10, that she did not promise to pay the
same CHF 10 to multiple recipients, and that indeed Bob is receiving the money. The
first concept is the use of an MDLT, see Figure 6.21.

Consider Alice, Bob and Eve, where Alice sends CHF 10 to Bob and Bob CHF 5 to
Eve. The central open ledger records that Alice indeed has CHF 20 in her account and
that she is able to pay CHF 10 to Bob. Both transactions are recorded and linked in
time order. If Alice wants to pay CHF 15 to Eve while only CHF 10 is left, the partici-
pants realize from the open ledger that she fails to have enough cash. This central ledger
is then replaced by copying it to the servers of all participants: a distributed or decen-
tralized open ledger architecture.


Figure 6.21: Left Panel: Open and distributed open ledger technology. Right Panel:
Validation of transactions. New transactions are grouped into a new block, and after its
validation - the consensus work to install unambiguous asset ownership - the block is
added to the existing blockchain. Each block is further marked with a time stamp and a
digital fingerprint (identification number) of the previous block. This digital fingerprint
('K' in the above example) - called a hash - identifies a block uniquely, and the verification
of the fingerprint can easily be done by any node in the network.

Transaction feasibility means that a payment from Alice to John, who is not directly
connected to Alice, is possible: Alice broadcasts the payment instructions to her next
node Bob, who broadcasts to his next node and so on. There are many paths linking
Alice and John, and it does not matter how the message reaches John. The payment
network also works if some links are not functioning: a decentralized system is more
robust than a centralized one. The drawback of the decentralized system is that there
are no admission constraints, each node can broadcast any type of information, and in
the Bitcoin network all nodes are anonymous. Each node needs to be able to check the
validity of each piece of transaction information.

The distribution of the ledgers and unrestricted broadcasting generate challenges.
The first is synchronization. All copies of transactions have to be identical at each
date: new transactions need to be validated, and the validated transactions have to be
added to all ledger copies; consider the right-hand panel in Figure 6.21. Given the three
verified (blue) transactions, the transaction of Alice to Eve of CHF 5 is not yet validated.
First, one verifies that indeed Alice generated the transaction message. To achieve this,
Alice signs the message using her private key. Everybody in the network can verify that
Alice generated the message using Alice's public key. This legitimization check has to
be done afresh by each node in a given path in the network. Given a legitimate trans-
action, the transaction ends in a queue of transactions which wait to be validated as a
block such that they can be written to the ledgers. Validation means finding transaction
consensus. This consensus should, for example, prevent double spending. Assume Alice
tries to double spend. In a centralized system, the first transaction arriving is valid and
hence there is no problem. In a decentralized system, the two messages sent by Alice are
a priori both ready to be validated. How should one observe that the same amount of
money is planned to be spent twice? Which transaction is finally validated is irrelevant
for the network, but consensus is needed about the validated one. To achieve this agree-
ment without a central party, the open transactions are arranged into blocks by the
miners, i.e. the nodes in the network which are willing to do the consensus work.

In a centralized banking architecture, trust in the banks is sufficient: each bank does
the validation. In the Bitcoin network, populated by anonymous strangers, a different
approach is needed: economic incentives and cryptography define the incentives for
miners to do the validation of a block.

The consensus mechanism for Bitcoin is called proof-of-work (PoW). Each miner is free
to choose the number of transactions which he wants to validate. The miners solve a
purely numerical problem unrelated to the block's content (mining). More precisely,
they solve a cryptographic puzzle using a trial-and-error approach, indicated by 'K' in
the figure. A miner who solves his problem first attaches his proof-of-work to his block
and then broadcasts it, where all other miners can easily verify the correctness of the
PoW: the PoW requires computing power while the validation of the PoW is simple.
The validated block is added to the blockchain. The PoW requires effort to play (energy
spent by the computers, investment in hardware, for example). Nakamoto (2008) argues
that PoW generates a stable consensus, i.e. a single chain, if miners always take the last
solved block as the parent for their next block. Each block under PoW consideration
needs to make reference to an already validated block, to which the new block, after
consensus finding, is linked; that is why we speak about a blockchain. This reference is
done using a hash value. Changing a past validated block changes the hash, which leads
to inconsistencies in the blockchain. The participants by construction always consider
the longest chain which contains legitimate transactions. Therefore, to cheat, a miner
needs to be able to recalculate a whole chain afresh for validation before a single new
block is validated by another miner. This is practically not feasible. Generating a new
block takes a miner just a short time. Without restricting this block generation process,
validation for consensus would become impossible, since the frequency at which blocks
are generated would dominate the speed of propagation in the network. Therefore, the
process is slowed down such that on average every ten minutes a new block is mined and
verified.
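The puzzle-and-verify asymmetry can be sketched as a toy proof-of-work (illustrative only: Bitcoin hashes a structured block header twice with SHA-256 against a numeric difficulty target, not a fixed count of leading zeros as here):

```python
import hashlib

def mine(block_data: str, difficulty: int = 4):
    """Search for a nonce so that the block hash starts with `difficulty` zero hex digits."""
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{block_data}{nonce}".encode()).hexdigest()
        if digest.startswith("0" * difficulty):
            return nonce, digest
        nonce += 1

block = "Alice -> Bob CHF 10"
nonce, digest = mine(block)
print(nonce, digest)  # finding the nonce takes ~16^4 hash attempts on average ...

# ... but any node verifies the claimed PoW with a single hash:
check = hashlib.sha256(f"{block}{nonce}".encode()).hexdigest()
assert check == digest and check.startswith("0000")
```

Raising the difficulty by one hex digit multiplies the expected mining work by 16 while the verification cost stays one hash, which is exactly the property the network uses to throttle block production.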

The winner is remunerated for his efforts (newly generated Bitcoins). He takes it all.
In a PoW one authenticates the fact that resources have been spent, such as calculation
power, time, money or physical energy, to solve a cryptographic problem. This defines
the economic incentives: the more computing power a miner invests, alone or pooled
with other miners, the higher the likelihood that they will mine the block first. If Alice
wants to cheat by using a double-spending strategy, she first has to spend resources in
order to validate the block containing her fraudulent transactions. PoW validation is a
peer-to-peer type consensus mechanism since the validation can be verified to be true by
all miners. No trust is needed, and no node can simply claim to have found a key without
having spent resources, due to the easy verification of the candidate solution.

Summarizing, PoW means for MDLT to reach decentralized consensus by interacting
with distributed miners which keep the record. A decentralized consensus architecture
can be more robust for contracting between the network agents. On the other side, to
achieve any consensus, information has to be spread to a certain minimum degree in the
network for validation of the PoW. These two forces - robustness of decentralized
consensus, which enhances contractability, versus the need for information distribution -
define a natural tension on the MDLT. In other words, the viability of MDLT depends
on the balance between these two forces.

Summarizing, key aspects of blockchains are:

• The rules are contained in the Bitcoin protocol, an open source cryptographic
protocol.

• A blockchain exchanges values using the internet in a distributed ledger framework.

• Trust for transactions switches from single third-party trust to distributed ledger trust.

• Unambiguous ownership rights at any moment in time due to the consensus mech-
anism.

• The P2P complete-stranger consensus PoW is the most expensive and slowest
consensus mechanism.

• Approved data in the distributed ledger cannot be changed - an immutable history
of transactions exists.

• Persistence: the blockchain is independent of service providers, device manufac-
turers or any type of application.

• Fork. Assume that a miner attaches his mined block not to the last validated block
but to the second last one - a fork follows. Miners can choose to attach validated
blocks to the original chain or to the other branch of the fork. Then there are
competing versions of the ledger. Forks reduce the credibility and reliability of the
blockchain.

Even if, eventually, all miners agree to attach their blocks to the same chain, the
occurrence of the fork is not innocuous. A fork can also occur when some miners
adopt a new version of the mining software that is incompatible with the current
version. Does the blockchain protocol rule out the occurrence of forks?

We close this section with an analogy to blockchains where the values are massive
pieces of stone. The Coin of Yap problem is a problem which the population of the
Yap islands in the Western Pacific Ocean faced. It shows some similarities to the
blockchain trust and ledger issue for the cryptocurrency Bitcoin. The Yaps produced
stone money. There were five different sizes of stones, where the largest one needed
around 20 men to be transported. It was not possible to carry the stones from one
island to the next one for exchange purposes using canoes. How could one use the
stones for payment if they could not be physically exchanged against the goods? The
solution was to store the ownership information in the consciousness of the Yap people
(the blockchain): the Yap knew who owned the different stone pieces. They did not
need to move the stones when ownership changed, since the public memory records the
changes in ownership. There is a society-wide consensus over ownership. If there is a
conflict, the stronger side wins. Due to the limited size of the islands and population,
the system costs never became high enough to make the system ineffective.

6.3.7 Different Blockchain Types, Types of Consensus


There are different blockchain types or architectures. From a ledger-distribution
perspective among the users, centralized, decentralized or distributed topologies are
possible, see Figure 6.22. From an authorization perspective, permissionless and
permissioned blockchains can be distinguished. In the former, anyone can participate
in the verification process, whereas in the latter, verification nodes are preselected
by a central authority or consortium. Contrary to permissionless networks, the actors
in a permissioned network are named. The intention is that they are also legally
accountable for their activity. The transactions in such networks will predominantly
concern so-called off-chain assets - fiat currencies, titles of ownership, digital
representations of securities - whereas in the permissionless world on-chain assets such
as virtual currency are transacted. Since the number of actors is smaller in permissioned
blockchains, only a small number of participants need to operate, which makes such
networks more scalable than permissionless ones. Furthermore, actors in a permissioned
network are not anonymous and therefore the time-consuming and expensive PoW is not
needed. Much simpler and faster consensus schemes apply. An example of a permissioned
blockchain is Ripple. Summarizing, mutually distributed blockchain technology, which is
owned by nobody, and the powerful monopolistic or oligopolistic structures owned by
financial institutions (FI) have to be differentiated. For the latter, blockchain should
be used to reduce complexity and to lower costs (no trade reconciliation, digitized
versions of contracts (smart contracts)), but it is not in their interest to give up
ownership over their business value chains.
The market dynamics for blockchain consisted in 2017 of about 300 start-ups worldwide
and more than eighty percent of the global banks running blockchain projects (WEF
(2017)). 20% of the global banks will have a commercial blockchain product by the end
of 2017 (IBM (2016)) and global investments in this technology are estimated to be USD
1.5 bn (WEF (2017)).

Figure 6.22: Emergence of different network topologies (Celent [2015], UBS [2015]).

The PoW is a costly way to reach consensus. In February 2018, the energy needed to
perform the PoW was similar to the total energy consumption of Romania.

There are alternative consensus mechanisms. Proof-of-stake is also based on algorithms.
Users of the technology are asked to prove ownership over a stake (a currency or
any other asset). There is no competition as in the PoW, and the mechanism is much
less energy consuming than the PoW, faster and overall cheaper. Whatever consensus
mechanism one uses, cryptography plays a dominant role.
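A stake-weighted validator selection of this kind can be sketched as follows (a minimal illustration, assuming the chance of validating the next block is proportional to the stake held; the names and stake figures are hypothetical):

```python
# Minimal proof-of-stake sketch: the right to validate the next block is
# assigned at random with probability proportional to each user's stake,
# so no energy-intensive hash competition is needed.
import random

def pick_validator(stakes, rng):
    """Stake-weighted random choice of the next block validator."""
    total = sum(stakes.values())
    r = rng.uniform(0, total)
    cumulative = 0.0
    for user, stake in stakes.items():
        cumulative += stake
        if r <= cumulative:
            return user
    return user  # guard against floating-point edge cases

stakes = {"alice": 60.0, "bob": 30.0, "carol": 10.0}
rng = random.Random(42)                  # fixed seed for reproducibility
picks = [pick_validator(stakes, rng) for _ in range(10_000)]
print(picks.count("alice") / len(picks))  # roughly 0.6, matching alice's share
```

The stake replaces the hash power of PoW as the scarce resource that makes manipulating the ledger expensive.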

6.3.8 Blockchain Examples


Blockchains can be used in many areas. Besides logistics and transportation, healthcare
or the energy industry, the technology can have an impact in several areas of the
financial industry: clearing and settlement, brokerage and financial research activities,
correspondent banking, trade finance, remittance and payments, trust and custody
functions in asset management, smart contracts for automated, self-controlled management
of financial contracts, and distributed storage, authentication and anonymization of
private information.

6.3.8.1 Bitcoin
Consider miners who want to do the PoW, see Figure 6.23. The problem in the PoW
should be difficult to solve, have a solution that is easy to verify and have a difficulty
that can be scaled over time. A miner selects the message M of Alice ready for validation,
selects a random number k, the nonce, and lets all information run through the hash,
i.e. he calculates #(M + k). If this result is larger than the threshold T, he chooses a
new k and continues until #(M + k) < T. Then the miner broadcasts k and Bob can
easily check that the hash is indeed smaller than the threshold level. We note that the
nonce is a 32-bit data string. Varying the 32-bit nonce is a trivial task since 2^32 amounts
to around 4 billion possibilities, which today can be checked in a few seconds. Therefore,
to increase the complexity, the transactions are grouped into a so-called Merkle tree.
In a Merkle tree, data blocks are grouped in pairs and the hash of each of these blocks is
stored in a parent node. The parent nodes are in turn grouped in pairs and their hashes
stored one level up the tree. This continues until the root node is reached.

#(Block)<Target

Figure 6.23: Proof-of-work. The information which is mined consists of the time stamp,
the information of the transactions, the reference to the hash of the previously mined
block and, among other pieces of information, the nonce (How Bitcoin Works Under the
Hood, https://www.youtube.com/watch?v=Lx9zgZCMqXE).
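The nonce search and the Merkle-tree grouping described above can be sketched as follows (a toy illustration: the threshold is expressed here as a number of leading zero hex digits, kept small so the search terminates quickly; real Bitcoin difficulty corresponds to far more leading zeros, and the block layout and transactions are simplified):

```python
# Sketch of the PoW: hash the transactions into a Merkle root, then vary
# the nonce k until #(block + k) falls below the target threshold.
import hashlib

def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()  # 64 hex digits = 256 bits

def merkle_root(transactions):
    """Pairwise-hash the transactions up to a single root hash."""
    level = [sha256_hex(tx.encode()) for tx in transactions]
    while len(level) > 1:
        if len(level) % 2 == 1:            # duplicate the last node if odd
            level.append(level[-1])
        level = [sha256_hex((level[i] + level[i + 1]).encode())
                 for i in range(0, len(level), 2)]
    return level[0]

def mine(prev_hash, transactions, leading_zeros=2):
    """Search for a nonce such that the block hash starts with enough zeros."""
    root = merkle_root(transactions)
    target = "0" * leading_zeros
    nonce = 0
    while True:
        h = sha256_hex(f"{prev_hash}{root}{nonce}".encode())
        if h.startswith(target):           # plays the role of #(M + k) < T
            return nonce, h
        nonce += 1

nonce, h = mine("00abc...", ["Alice pays Bob 1 BTC", "Bob pays Carol 0.5 BTC"])
print(nonce, h)   # finding the nonce is hard; verifying it is one hash call
```

Note the asymmetry: the miner may need many attempts, while anyone can verify the broadcast nonce with a single hash evaluation.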

The SHA-256 hash function is used, whose output is a 64-digit hexadecimal string. Consider the hash

000000000000004c296e6376db3a241271f43fd3f5de7ba18986e517a243baa7,

which was the hash in 2013 of a block ready for the miners. Each digit takes one of 16
values: the numbers from 0 to 9 and the letters from a to f. The hash starts with a run
of zeros, which reflects the threshold level. The difficulty of the problem is not constant
over time; this means the number of zeros in the header varies in a non-manipulable way.
The difficulty is calibrated in such a way that it is possible to find a block in about
10 minutes. The SHA hash goes one way: it has 2^256 possible outputs, which one would
need to evaluate in order to break the hash, i.e. to calculate the input.

The Bitcoin system is managed differently from a centralized network. How is the
management organized such that the system can be improved and deficiencies can be
corrected when there is no central party with the power to do so? Changes to the
network which are not in the interest of the users can be prevented either by sanctioning
such actions or by setting incentives such that for each member the dominating strategy
is not to deviate from the existing rules, i.e. a kind of Nash equilibrium. It is this
game-theoretic concept which is implemented in the Bitcoin system. To allow for changes,
a voting system is used where a predefined majority has to exist before a change is
implemented. This democratic rule is very complicated since not all nodes have the same
rights and action spaces (miners have an advantage), but other user groups also have
the possibility to form coalitions which can then try to enforce their views. In any case,
in such a MDLT, no one can be forced to follow any decision. If part of the community
is not willing to follow a change and decides to use the old code, then the system
separates into two systems: a fork is realized. In a soft fork, the rules for consensus
are stricter than in the original chain, i.e. new ledger entries are also valid under the
old system. In a hard fork, the new register entries are no longer valid under the old
rules of the blockchain before the fork happened.

6.3.8.2 Settlement
The period between the date where a buyer and a seller agree to exchange a security
(trade execution) and the date where the trade is settled (assets are exchanged) can be
2 or 3 days, depending on the jurisdiction and the type of asset. A longer period between
trade execution and settlement raises settlement risk - the risk that one leg of the
transaction may be completed but not the other - and counterparty risk - one party
defaults on its obligation. Besides the reduction of risk, a decentralized blockchain
technology could also reduce the costs of the trade and settlement process.

A standard trade-clearing-settlement process life cycle can be described as follows
(Bliss and Steigerwald [2006]):

Trading.

• The investors (buyer and seller) who wish to trade contact their trading member,
which places their orders on the exchange.

• The trades are executed on the exchange or any other platform such as a multilateral
trading facility or an organized trading system.

Clearing.
• Clearing members, who have access to the clearing house or the central counterparty
and who are also trading members, settle the trades.

• Clearing and settlement can be bilateral, i.e. settled by the parties to each contract.
After the GFC, the G20 enforced a switch from bilateral to central counterparty
(CCP) clearing for OTC derivatives. A CCP acts as a counterparty for the
two parties in the contract. This simplifies the risk management process, as firms
now have a single counterparty to their transactions. Through a process termed
novation, the CCP enters into bilateral contracts with the two counterparties, and
these contracts essentially replace what would have been a single contract in the
bilateral clearing case. This also leads to contract standardisation, and there is a
general reduction in the risk capital required due to multilateral netting of cash and
fungible securities. Therefore, a CCP means that the bilateral clearing topology is
transformed into a centralized or star-shaped one. From a systemic risk perspective,
while the more risky bilateral connections are replaced by less risky centralized ones,
the major risk concentration is now located in the few CCPs.

Settlement.
• The two custodians, who are responsible for safeguarding the assets, exchange the
assets where a typical instruction is 'delivery versus payment': Delivery of the
assets will only occur if the associated payment occurs.

Using a blockchain means transforming the centralized CCP topology back into a decentralized
one where there is no need for a CCP. In the trading-clearing-settlement cycle,
a consortium blockchain can be used as follows to satisfy the present standards. On the
trading level, a consortium of brokers can set up a distributed exchange, where each of
them operates a node to validate transactions. The investors still trade through a broker,
but the exchange fees can be drastically reduced. On the clearing level, a consortium of
clearing members can set up a distributed clearing house, thus eliminating the need for a
CCP. Contrary to bilateral clearing, the contract stipulations are administered through
a smart contract, which reduces risk management issues. If the securities and money are
digitalized, settlement does not need any custodians with securities depositories; the
assets are part of the permissioned blockchain.
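The 'delivery versus payment' instruction above can be sketched as an atomic swap: both legs of the trade settle or neither does (a toy model; the account structure, party names and amounts are hypothetical):

```python
# Toy 'delivery versus payment' sketch: the security leg and the cash leg
# of a trade settle atomically - either both transfers happen or neither.

def settle_dvp(accounts, buyer, seller, security, qty, cash):
    """Atomically swap qty units of a security against cash between two parties."""
    if accounts[seller]["securities"].get(security, 0) < qty:
        return False                        # seller cannot deliver: nothing moves
    if accounts[buyer]["cash"] < cash:
        return False                        # buyer cannot pay: nothing moves
    accounts[seller]["securities"][security] -= qty
    accounts[buyer]["securities"][security] = (
        accounts[buyer]["securities"].get(security, 0) + qty)
    accounts[buyer]["cash"] -= cash
    accounts[seller]["cash"] += cash
    return True

accounts = {
    "buyer":  {"cash": 1_000.0, "securities": {}},
    "seller": {"cash": 0.0, "securities": {"bond": 10}},
}
ok = settle_dvp(accounts, "buyer", "seller", "bond", 10, 990.0)
print(ok, accounts["buyer"]["securities"], accounts["seller"]["cash"])
# True {'bond': 10} 990.0
```

Encoding both checks and both transfers in one routine is what a settlement smart contract would enforce on-chain, eliminating the risk that only one leg completes.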

6.3.8.3 R3CEV, Corda


R3CEV is a firm that leads a consortium partnership with over 100 of the world's leading
financial institutions. The goal is to design and deliver advanced distributed ledger
technologies to the financial markets around the world. The blockchain used is described
in the white paper on Corda (Brown et al. (2016)).

Consider banks (the nodes) which search for a technology to record and enforce
financial contracts such as cash, derivatives or any other type of product. More precisely,
the banks want to record and manage the initiation and the life cycle of financial contracts
between two or more parties, grounded in the legal documentation of the contracts
and compatible with existing and emerging regulation, in an

• efficient way: duplications and reconciliation of transactions are not necessary.

• open way: every regulated institution can use the technology.

• appropriate privacy/public mix way: consensus about transactions is reached on a
smaller than full ledger level.

These requirements lead to the solution Corda. We state the most important changes
compared to the Bitcoin blockchain.

First, there are no miners and there is no proof-of-work, since no currency needs to
be generated (mining) and, due to the mixed private/public association of information,
no general consensus on the ledger is needed. The advantages are the avoidance of costly
mining activities, of a deflationary currency and of a concentration of the mining
capabilities in a few nodes. Second, Bitcoins can only contain a small amount of data due
to the fixed-length data format. This is not useful if one considers all economic, legal and
regulatory information in an interest rate swap between two parties. Corda encodes the
information of arbitrarily complex financial contracts in contract code - the prose of the
allowable operations defined in term sheets is encoded. Corda calls this code a state
object. Consider a cash payment from bank A to a company C. The state object contains
the legal text describing the issuer, the date, the currency, the recipient etc. and the
codification of this information. This state is then transformed into a true transaction if
the bank digitally signs the transaction and if it is verified that the state object is not
used by another transaction. Hence, there are two types of consensus mechanisms. First,
one has to validate the transaction by running the code in the state object to see whether
it is successful and to check all required signatures. This consensus is carried out only
by the parties engaged in the transaction. In other words, the state object is a digital
document which records all information of an agreement between two or more parties.
Second, parties need to be sure that the transaction under consideration is unique. This
consensus, which checks the whole existing ledger, is done by an independent third party.
Summarizing, the ledger is not globally visible to all nodes. The state objects in the
ledger are immutable in the same way as we described for blockchains. Given that
not all data is visible to all banks, strong cryptographic hashes are used to identify the
different banks and the data.
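The two consensus checks - validity (all required signatures present) and uniqueness (no state object consumed twice) - can be sketched as a notary that records consumed state objects (an illustrative model only, not the actual Corda API; the names and identifiers are hypothetical):

```python
# Sketch of the uniqueness consensus: an independent notary keeps track of
# consumed state objects, so a state object that has already backed one
# transaction cannot back a second one (double-spend prevention).

class Notary:
    def __init__(self):
        self.consumed = set()   # ids of state objects already spent

    def notarise(self, state_id, signatures, required):
        # Validity: every party to the agreement must have signed.
        if not required.issubset(signatures):
            return "rejected: missing signatures"
        # Uniqueness: the state object must not have been consumed before.
        if state_id in self.consumed:
            return "rejected: state already consumed"
        self.consumed.add(state_id)
        return "committed"

notary = Notary()
required = {"bank_A", "company_C"}
print(notary.notarise("cash-state-1", {"bank_A", "company_C"}, required))  # committed
print(notary.notarise("cash-state-1", {"bank_A", "company_C"}, required))  # rejected: state already consumed
```

Only the uniqueness check needs the third party; the validity check runs between the transacting parties alone, which is why the full ledger never has to be globally visible.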

Why are the leading banks pushing this system? They can all use one single ledger,
which makes reconciliation and error fixing in today's individual ledgers a topic of the
past. Furthermore, the single ledger does not change the competitive power of the banks
in the ledger. The economic rationale, profit and risks to enter into a swap remain within
UBS and Goldman Sachs, but the costs and operational risks of the infrastructure are
reduced due to the collaboration to maintain shared records. In other words, while the
banks keep the profit and loss from their banking transactions unchanged compared to
the present competitive situation, they reduce the technology cost part by cooperation.

6.3.8.4 Smart Contracts, Ethereum


The concept of smart contracts was invented by Szabo (Szabo (1997)):

Definition 96. A smart contract is a computerized transaction protocol that executes the
terms of a contract. The general objectives are to satisfy common contractual conditions
(such as payment terms, liens, confidentiality, and even enforcement), minimize exceptions
both malicious and accidental, and minimize the need for trusted intermediaries.
Related economic goals include lowering fraud loss, arbitrations and enforcement costs,
and other transaction costs.

The functionality of a smart contract means contracting on contingencies based on a
decentralized consensus, at low cost and with algorithmic execution. To achieve
decentralized consensus, a self-executing distributed ledger is needed. Contingencies in
a smart contract are codified, making automated execution feasible and reducing
enforcement costs. One can then define:

Definition 97. Smart contracts are digital contracts allowing terms contingent on decentralized
consensus that are self-enforcing and tamper-proof through automated execution.

Thus, in a blockchain, other types of protocols than currency transactions can be
performed. Smart contracts, unlike cryptocurrencies, do not require validation (consensus)
through a cryptographic system. The blockchain network automatically enforces
execution of the contract when trigger events are realized. When an event occurs, the
computer code in the document triggers a pre-programmed action. This digitizes the
life cycle management of contracts. The rationale is to reduce manual expenses in the
management of contracts over time and to reduce error rates. The risk is that
someone improperly programs the smart contract, which then makes wrong decisions.

An example of a smart contract is a Bitcoin transfer between two agents which is
made dependent on some other conditions. Ethereum, a decentralized blockchain
platform, extends this idea and handles such smart contracts. Applications run as they
were programmed. This takes place without any downtime, censorship, fraud or
interference from third parties. Such contracts are useful cost cutters in the life cycle
management of financial contracts since, for example, the built-in software automatically
carries out a corporate action in the documents for given market signals.
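Such an event-triggered contract can be sketched as follows (a toy model of a conditional transfer; the trigger level, balances and party names are hypothetical, and a real contract would run on-chain rather than in a local process):

```python
# Toy smart-contract sketch: a conditional transfer that self-executes
# once an observed market signal crosses a pre-programmed trigger level.

class ConditionalTransfer:
    def __init__(self, payer, payee, amount, trigger_level):
        self.payer, self.payee = payer, payee
        self.amount, self.trigger_level = amount, trigger_level
        self.executed = False

    def on_market_signal(self, level, balances):
        """Called for every new market observation; pays out at most once."""
        if self.executed or level < self.trigger_level:
            return False
        balances[self.payer] -= self.amount
        balances[self.payee] += self.amount
        self.executed = True                # contract terms are now fulfilled
        return True

balances = {"A": 100.0, "B": 0.0}
contract = ConditionalTransfer("A", "B", 25.0, trigger_level=50.0)
for signal in [40.0, 48.0, 51.0, 60.0]:    # the payout fires once, at 51.0
    contract.on_market_signal(signal, balances)
print(balances)  # {'A': 75.0, 'B': 25.0}
```

The contract logic, not a back office, decides when the payment leg of the agreement is carried out - which is exactly the life-cycle cost saving described above, and also the source of risk if the coded condition is wrong.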

Vitalik Buterin wrote the white paper in 2013. The market capitalization of Ethereum
amounted to USD 1 bn in October 2016 and USD 74 bn in December 2017. Ethereum is
an application platform. Developers can create applications without building their own
blockchain; again, smart contracts sit on top of an existing blockchain. Ethereum enables
peer-to-peer contracts and peer-to-peer applications through its own currency carrier. In
Ethereum, the block time is set to 14 to 15 seconds, compared to the 10 minutes of
Bitcoin. Ethereum can be used for crowdfunding, voting systems, options markets and many
other applications. What happens if the software of a smart contract has a fault or the
logic of the software allows someone to use the software in his favour?

This was the case in the so-called Decentralized Autonomous Organization (DAO)
hack. The DAO was a form of investor-directed venture capital fund. It was the biggest
crowdfunding experiment in the world, raising USD 150 million within 21 days. However,
on June 17, 2016, a hacker exploited a security bug in the smart contract and transferred
USD 50 million to his own account. The cryptocurrency Ether lost 50% of its value
on the same day. Since the hacker did nothing illegal but was just smarter than those
who wrote the smart contract code, there was a priori no reason to consider any actions
regarding the validity of the transaction. But many in the community were invested and
hence faced personal losses if one would not offset the hacker's transaction.

Figure 6.24: The hard fork in the Ethereum protocol.

The first alternative was to cancel the transaction and restore the money to the DAO
users. The second choice was to do nothing. Then the hacker would keep the USD
50 million and a lot of people invested in the DAO would lose their investment. The
cancellation of the transaction, leading to a hard fork, would enable all DAO investors
to exchange their tokens at a fixed price, as in a currency reform. They only needed to
update their software. The old DAO would exist on the old Ethereum blockchain, but
should die out without investors. The token of the hacker would become worthless. But
parts of the community refused the update. They saw a violation of the ideals of Ethereum.
In protest, they stayed on the old blockchain and baptized it Ethereum Classic. Instead of
losing value, the old chain gained it. The event damaged the reputation of the technology
from a security perspective. In addition, the community damaged its reputation during
this period: responsibilities were not clear, blaming started and it was not possible to
find a single solution.

6.4 Currencies and Crypto-Currencies


6.4.1 Money and Payment Systems
Something is considered to be money if: It stores value, can be used as a medium for
exchange of goods and services and is a unit of account.14

6.4.2 Fiat Money


Fiat money is money backed by a monetary framework, an economy and a monetary
system. Central banks act as monetary policy makers and as the ultimate settlement agent
for monetary transactions, and hold reserves in gold as a trust-backing facility. But
ultimately, the government's ability to raise taxes and the willingness of banks to lend
money define the major monetary policy forces. Most currencies such as the USD or the
Renminbi are fiat money, and most digital money is fiat money too. We consider the three
components economy, monetary framework and monetary system. The economy defines the
fluctuating quantity value of a currency, given by the GDP, interest rate levels, inflation
etc. The monetary framework defines the stable acceptance value - does the population
accept a currency? Trust in a currency is a function of these three system components. For
crypto currencies, the economy component is empty or it consists of a business promise,
such as in the Initial Coin Offerings (ICOs) used by start-up firms. An investor sells his
dollars to buy, say, Ether, hands over the Ether to the ICO, receives the start-up token,
and hopes that the ICO business will not default, which then boosts the value of the token.
The ICO will exchange the Ether for dollars to pay for its investment activities. It is
difficult to make economic sense out of this long chain of transactions, which could be
shortened by simply investing directly using dollars and receiving equity.

That people believe in the value of money is fundamental to any currency: it is not
possible to enforce the value of a currency if people do not want to accept it.

14 Store of value means that money must be able to be reliably saved, stored, and retrieved, and its
value must remain relatively stable over time. Medium of exchange means that it is an accepted way to
settle a debt, i.e. it serves as a standard of deferred payment. Unit of account means that it is a standard
numerical monetary unit of measurement of the market value of goods, services, and other transactions,
used to compare the values of dissimilar objects. Divisibility and fungibility are other characteristics of
a unit of account.

Figure 6.25: Three components of fiat money. Trust in the CHF rests on the Swiss economy,
the monetary system (banks, the SIC payment system, CLS) and the monetary framework
(the Swiss state, the SNB). Trust in Bitcoin rests on a monetary system (blockchain,
P2P, PoW) and a monetary framework (code), while the economy component is open.

Figure 6.26 provides an overview of different currencies.

Figure 6.26: Overview of the different currencies. Source: Bech and Garratt (2017).

Table 6.3 summarizes some features of fiat money, money issued in a permissioned
blockchain and money issued in a MDLT.

Feature          | Fiat               | Crypto DLT (Permissioned) | Crypto MDLT
-----------------|--------------------|---------------------------|----------------------
Storage Holdings | Accounts FI        | DLT                       | MDLT
Double Spend.    | Identity           | P2P restricted            | P2P open
TX Processing    | FI                 | Trusted ledger nodes      | Proof-of-work
Settlement       | Central Bank       | Encoded                   | Follow longest chain
Supply           | CB and loan policy | Protocol                  | Protocol
TCM              | Reputation FI      | Reputation issuer/DLT     | PoW
Scalability      | High (Visa, etc.)  | Not major importance      | Bounded
Users            | KYC                | KYC                       | Anonymous
Power            | Centralized        | Centr. / Decentr.         | Majority rules
Recon            | Needed             | Not needed                | Not needed
Trade Reversal   | Yes                | Yes                       | No
Risk             | Systemic           | Trusted nodes             | Loss pk, exchanges

Table 6.3: Summary of different monetary systems. TX Transaction, Rep Reputation,
Recon Reconciliation, TCM Trust Creating Mechanism, pk Private Key. Source:
Adapted and extended from Natarajan et al. (2017)

6.4.3 Bitcoin
First, Bitcoin represents a crypto-currency.15 This means a unit of Bitcoin is used
to store and transmit values between individuals who believe in this currency. Second,
Bitcoin represents a communication medium. All individuals using or creating Bitcoins
communicate via the Bitcoin protocol over the internet. The protocol is the code which
contains the set of rules used in the Bitcoin system.

At the time of writing, the number of Bitcoin transactions is around 300'000 transactions
per day, which is approximately equal to USD 3 bn at market exchange rates in
November 2017, and the market cap of Bitcoin by the end of 2017 is USD 261 billion
(Source: Blockchain.info). A crypto-currency combines two main components: a new
currency such as Bitcoin and a new decentralized payment system - the blockchain.

Bitcoin has value because people believe in it. If people stop believing in it, then, as
long as there is no real economic production backing the coin, the value evaporates. That
belief can be created quickly and can also vanish rapidly. Consider the period after the
first Gulf War in the Kurdish region of Iraq. The Kurds used in their areas of Iraq the
Iraqi Swiss Dinar.16 Hence, although a legal tender, the Saddam dinar, existed in Iraq,
it became worthless in the Kurdish regions. People cannot be forced to believe in a
currency.

15 The text follows Antonopoulos (2015), Jogenfors (2016), Aste (2016), Khan Academy (2016), Boehme
et al. (2015), Tasca (2016) and BIS (2018). For an economic review, see Bank of England (2014) and
Boehme et al. (2015).
16 'Swiss' because the printing plates were made in Switzerland and stolen.

So far, we have not compared digital and crypto currencies, see Figure 6.27.

Digital currencies:

• Every non-physical currency is a digital currency.
• Digital currencies consist of numbers and digits.
• 90% of global currency is digital, most of it fiat currency.
• Online banking, mobile payments, PayPal, Mint and credit cards are based on digital currencies.
• Digital currencies possess a monetary regulatory and institutional setting; they are accepted as legal tender.
• Money generation is mostly done by the inside money mechanism and secondarily by central banks.
• Convertible into cash; payments are not public.
• There is no anonymity.
• 7x24 worldwide payments; no need to know the recipient.
• Banks and other intermediaries act as third-party validators using accounts as ledgers which are centrally stored and not public. Trust is placed in these validators. Errors can be offset, stolen coins are often replaced and a lost identification for authentication can be replaced by a new one (lost ID card).

Crypto currencies:

• A subset of digital currencies.
• Features: privacy, a distributed mutual ledger for transaction recording, cryptography.
• Ethereum, Bitcoin, Litecoin and more than 1'000 other crypto currencies.
• 99% are without a regulatory or institutional backing; most are not considered to be legal tender.
• A lost cryptographic key for accessing the cryptocurrency or stolen coins are not replaced and are lost to the economic system forever, since there is no third party in the system.
• Coins are generated by the mutually distributed ledger technology.
• A fully transparent but fully anonymous payment system.
• 7x24 payments without knowing the recipients; payments are only possible in the peer group which accepts the coin.
• Trust, security and protection, see below.
• High volatility, i.e. the volatility of BTCUSD is around 14 times larger than the volatility of CHFUSD.

Figure 6.27: Comparing digital and crypto currencies.

6.4.3.1 Facts and Figures


By 2011, 10 USD was worth 1 Bitcoin. In 2013, the exchange rate was up to 266 USD
for one Bitcoin. Shortly after this high, the exchange rate dropped by 80 percent. In
November 2013, the exchange rate was 1'200 USD/Bitcoin. After the default of the
platform Mt. Gox, the rate dropped to a value of 340 USD/Bitcoin. In 2017, the value of
Bitcoin exploded: from around USD 800 in January to more than USD 16'000 in December
2017, then crashing to less than USD 6'000 at the beginning of February 2018. This shows
the risks of Bitcoin and, at the present status, its non-usability to store value or to
make payments.

The Bank of England (2014) states that the volatility of Bitcoin is 17 times larger than
the volatility of the British pound: the use of Bitcoin as a short-term storage medium
is questionable, although nothing can be inferred about its value as a long-term storage
medium. The number of transactions of retail clients is used to measure their willingness
to accept Bitcoins as a medium of payment. Since this number is not observable, proxy
variables are used instead, such as data from 'My Wallet', see Bank of England (2014).
The analysis shows that the number of transactions per wallet has been decreasing since
2012 to a value of 0.02 transactions per wallet. Most clients buy and hold their Bitcoins
instead of using them. Finally, there is little evidence that Bitcoins are used as units of
account.

The traditional payment systems which crypto currencies attack turned out in the
past to be safe, cost-effective and scalable, i.e. they handle high volumes. Visa,
Mastercard and PayPal handle between 240 and 3'500 transactions per second, while for
Bitcoin and Ether the number is around 7-20.17 Bitcoins are so far only cheaper to produce
than payments in centralized systems because the miners in the crypto-currency system
receive new currency coins as a subsidy for their proof-of-work efforts. Given that the
production of new Bitcoins decreases over the next decades, that the energy consumption
to achieve consensus grows over-proportionally, and if the exchange value of Bitcoin does
not stabilize at a large value compared to the USD, the effect of the subsidies diminishes,
leading to increasing costs for Bitcoin issuance. Removing centralized trust by using a
P2P trustless MDLT is costly in several respects. The energy consumption of the Bitcoin
miners equals in 2018 the total energy consumption of Romania, a nation of 20 million.
Ethereum is also highly energy intensive. It will be of vital importance whether consensus
mechanisms other than the PoW can be designed and accepted such that the MDLT consumes
much less energy.

The number of hashes drives the energy costs. Aste (2016) estimates that keeping a
capital of around USD 10 bn secure in the Bitcoin blockchain requires annual costs of 10%.
The reason is the number of hashes generated every second for the PoW: 1 bn times 1 bn.
Given the high transaction costs, most users access their cryptocurrency not directly but
via an intermediary such as a crypto-wallet provider or a crypto exchange. That is, the
main motivation of Bitcoin - not needing a central third party such as a central bank -
ends in trusting often unregulated third parties. It is then no surprise that fraudulent
or hacked institutions such as Mt. Gox lead to thefts and zero-recovery losses for the
users.

Permissioned crypto currencies often do not face some of the above problems of Bitcoin.
The World Food Programme's blockchain-based system handles payments for food aid
serving Syrian refugees in Jordan. The unit of account is centrally controlled by the
World Food Programme. Using a permissioned version of the Ethereum protocol, the
deficits of Ethereum (slow, expensive) were overcome and transaction costs are reduced
by 98%, also relative to bank-based alternatives.

Scalability is another limitation, since the transaction ledger grows over time.
The Bitcoin ledger amounted to 170 GB in 2017, having grown by 50 GB in that year. Therefore
a simple Fermi-type calculation shows that the network size needed to replace standard
currency regimes is out of any feasible range. This concerns not only the storage of data but also
the processing capacity needed for transaction verification.
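The Fermi-type estimate can be made concrete with a linear-scaling sketch. The round numbers below are assumptions of the sketch: ledger growth of about 50 GB per year, Bitcoin throughput on the order of 10^5 transactions per day, and card-network volume of about 340 million transactions per day.

```python
# Fermi estimate: ledger growth if a Bitcoin-style chain handled
# card-network transaction volumes, assuming storage grows linearly
# in the number of transactions.
btc_growth_gb_per_year = 50.0   # observed Bitcoin ledger growth, 2017
btc_tx_per_day = 1e5            # order of magnitude of Bitcoin throughput
card_tx_per_day = 340e6         # Visa + MasterCard + other schemes

scale = card_tx_per_day / btc_tx_per_day            # ~3,400x more volume
ledger_growth_tb_per_year = btc_growth_gb_per_year * scale / 1000.0

print(f"Bitcoin share of card volume: {btc_tx_per_day / card_tx_per_day:.2%}")
print(f"Implied ledger growth: ~{ledger_growth_tb_per_year:.0f} TB per year")
```

Roughly 170 TB of new ledger data per year, replicated across every full node, illustrates why storage and verification load are out of any feasible range.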

17 Committee on Payments and Market Infrastructures, Statistics on payment, clearing and settlement
systems in the CPMI countries, December 2017; www.bitinfocharts.com; Digiconomist; Mastercard;
PayPal; Visa; BIS calculations.
6.4. CURRENCIES AND CRYPTO-CURRENCIES 445

Figure 6.28 shows the market capitalization of Bitcoin, Ripple and Ethereum, the average
transaction costs, that the Bitcoin mining time is around the 10 minutes it should be, and that
mining in Ethereum takes much less time, reflecting the proof-of-stake approach. Com-
paring the number of daily Bitcoin transactions - around 100,000 by the end of 2015 (Coinometrics,
Capgemini) - with the number of daily transactions by Visa (212 mio.), MasterCard (93
mio.) and all other traditional entities, which together sum up to 340 mio., the Bitcoin
share is 0.03% of this total transaction volume.

[Figure omitted: four panels showing the market capitalization in USD bn, the average transaction fee in USD, the average block time in min, and the average hashrate (#/s) per day.]

Figure 6.28: Bitcoin statistical data (BitInfoCharts.com [2018]).

Although we focus on Bitcoin, there is an inflation of cryptocurrencies. Coinmar-
ketcap.com reports that by September 2015 there were 676 listed cryptocurrencies; in
December 2017 there were 1,376 cryptocurrencies with a total market cap of USD 556 bn.
In 2015, Bitcoin accounted for 85% of market capitalization, with number two Ripple follow-
ing at 6%. By 2017, the share of Bitcoin had fallen to 43 percent, and Ethereum as
number two had 12 percent. The tenth-largest entity - Bytecoin - represented a market
capitalization of only 0.2% in 2015. In 2017, the currency Monero represented, as num-
ber ten, only 0.9%. Since the Bitcoin system can be copied, most new coins are pure
copies of the Bitcoin system or copies with minor changes. These coins are also called
altcoins. The many cryptocurrencies are undergoing a selection process in which some of
them sharply increase in market share in one period but can then also quickly evaporate
into insignificance. This evolutionary process triggers the question: how often will
people accept losses in a cryptocurrency which vanished before they lose complete trust
in the whole cryptocurrency business?
446 CHAPTER 6. ASSET MANAGEMENT INNOVATION

The algorithms of the Bitcoin protocol define the supply side. Supply is therefore
fixed and inelastic, which is one source of the high price volatility. Since every currency
loses value if it fails to be a scarce resource, new Bitcoins are issued in a controlled way.
Bitcoins do not constitute a claim on anybody, contrary to digital money created by the
granting of loans, where each loan creates a deposit position on the borrower's bank
account. The demand for and supply of Bitcoins has no physical foundation, and the total
supply is limited to the creation of 21 million Bitcoins. Given the rule-based
creation process, this amount will be reached around 2140. With this fixed supply side
and its diminishing rate of production, Bitcoin is a deflationary currency. Bitcoin
miners are in some sense the clearing houses which maintain the bookkeeping system
and verify the validity of transactions.
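The 21 million cap follows from the protocol's halving schedule; a short check, using the standard protocol constants of a 50 BTC initial reward, integer satoshi arithmetic and a halving every 210,000 blocks, sums the geometric series:

```python
# Total Bitcoin supply implied by the halving schedule: the block reward
# starts at 50 BTC and is halved (in integer satoshis) every 210,000
# blocks until it rounds down to zero.
BLOCKS_PER_EPOCH = 210_000
SATOSHI_PER_BTC = 100_000_000

total_satoshi = 0
reward = 50 * SATOSHI_PER_BTC
while reward > 0:
    total_satoshi += BLOCKS_PER_EPOCH * reward
    reward //= 2                 # integer halving, as in the protocol

total_btc = total_satoshi / SATOSHI_PER_BTC
print(f"Total supply: {total_btc:,.2f} BTC")   # just under 21 million
```

The integer rounding in each halving is why the limit is slightly below 21 million coins rather than exactly 21 million.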

Bitcoin does not have a well-defined governance structure as central banks have. The
identity of any participant in the network is, for example, unverified. This contradicts the
increasing regulatory and legal fight against money laundering and tax evasion. Prominent
in the early days was the use of Bitcoin on the anonymous Silk Road platform, whose main
activity was trading narcotics. U.S. investigators estimated that in the period from
February 2011 to July 2013, 9.9 million Bitcoin payments were made with an equivalent
of USD 214 million. After the demise of Silk Road, an unclear number of successors or
competitors are actively using Bitcoin. But the initially significant fraction of money
inflow into the Bitcoin system from criminal activities has decreased significantly. Tasca
(2016) reports that in 2012 black markets and online gambling had a share of around 70%
in the Bitcoin income flow. This number collapsed in the following two years to less than
10%. Bitcoin transactions are, contrary to real or electronic payments, strictly irreversible.
This property is due to the desire to keep the Bitcoin system at a manageable level. Changing
the protocol, as we discussed above, follows a complicated game-theoretic logic which can
lead to forks, and where different types of network members have different rights and the
possibility to form coalitions if a change in the protocol is suggested.

From the risk perspective, counterparty risk of currency exchanges is critical. Ex-
changes active in Bitcoin charge transaction fees between 20 and 200 bps. The number
of such exchanges is modest, since an exchange needs an internet infrastructure which
is able to withstand attacks. The rules for setting up an exchange are strict in the U.S. and
also, for example, in the UK or Germany. Prominent is the default of the Mt. Gox exchange in
Japan in 2014. They reported that they lost 750,000 Bitcoins of their customers, which
amounts to USD 450 million. The counterparty risk of exchanges matters for the clients
since most convert their electronic currencies into Bitcoins and leave the Bitcoins at the
exchange. The exchange acts as a bank. Moore and Christin (2013) estimate that 45
percent of the currency exchanges terminated operations. While large exchanges often
faced security problems, the reasons for the smaller ones are unknown. Therefore, if the
exchange, which in fact acts as a bank holding Bitcoin accounts of the customers, shuts
down, counterparty risk is realized. The loss given default following Moore and Christin
(2013) is 46% - only 54% of the closed exchanges reimburse their customers.
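The two estimates just cited combine into a stylized expected loss for a client who leaves coins at a randomly chosen exchange. The product below ignores exchange size, age and time horizon; it is an illustration, not part of the cited study.

```python
# Stylized counterparty expected loss for Bitcoin held at an exchange:
# probability of exchange closure times loss given default.
p_closure = 0.45      # share of exchanges that terminated operations
lgd = 0.46            # share of closed exchanges not reimbursing clients

expected_loss = p_closure * lgd
print(f"Expected loss per unit deposited: {expected_loss:.1%}")   # ~20.7%
```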

Furthermore, the proof-of-work mechanics consumes a lot of physical energy. The
PoW is estimated to need as much energy in 2021 as Denmark. Therefore, only a few
networks such as the one for Bitcoin can be added in the world before touching the
limits of energy consumption.

The following statement, which is often heard in the financial industry, summarizes the discussion:

Blockchain yes; Bitcoin no.

6.4.4 Bitcoin Blockchain Security


So far the Bitcoin network has not suffered from a fork. This leads to the folklore that
the blockchain underlying Bitcoin is secure. Can this statement be proven?

We start with some market facts. In 2016 the most active miners were located in China,
covering around 50% of the total market share (Tasca (2016)), followed by Europe with
around 25%. This is also reflected in the traded currency pairs: the traded volume in
CNY/BTC is about three times larger than in USD/BTC. This dominance of
Chinese activity can also be observed in the number of active Bitcoin clients normalized
by the number of users with direct access to the internet: the number in China
is around 5 times larger than the second-largest numbers, those of the US or Russia. Bitcoin
start-ups raised around USD 1 bn in the three years 2012-2015, with an annual growth
rate of 150%. This rate dominates other start-up rates, such as crowdfunding, lending
or banking in general, by a factor of 2-3. If a mining pool gains 51 percent of computing
capacity, it can attack the network by rewriting in principle all blocks and generating a
new blockchain. The pool gash.io possessed 45 percent of the mining power on January 9,
2014 and had to appeal to pool members to exit the pool. Summarizing, the mining
industry is an oligopoly in which the market share of the ten largest miners was between
70% and 80% by the end of 2015 (Tasca (2016)). This raises security concerns, since the
fewer miners it takes to form a 51% majority for block transaction verification, the easier
collusion becomes.
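This concentration risk can be quantified with the classical double-spend calculation from the original Bitcoin whitepaper, a standard result rather than part of the cited survey: an attacker controlling a share q of the hash rate eventually rewrites a transaction buried under z confirmations with the probability computed below.

```python
import math

def attacker_success(q: float, z: int) -> float:
    """Probability that an attacker with hash-rate share q ever overtakes
    the honest chain after z confirmations (Nakamoto's 2008 calculation)."""
    if q >= 0.5:
        return 1.0                       # a majority attacker always wins
    p = 1.0 - q
    lam = z * q / p                      # expected attacker progress
    prob = 1.0
    for k in range(z + 1):
        poisson = math.exp(-lam) * lam ** k / math.factorial(k)
        prob -= poisson * (1.0 - (q / p) ** (z - k))
    return prob

# With 10% of the hash rate and 6 confirmations an attack is hopeless;
# close to a majority it becomes likely.
print(attacker_success(0.10, 6))   # ~0.0002
print(attacker_success(0.45, 6))   # ~0.77
```

The success probability jumps from essentially zero to certainty as q approaches 50%, which is why the concentration of mining power in a few pools is the relevant security variable.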

A stream of theoretical work focuses on a rational analysis of the system. It treats
Bitcoin as a game between competing rational single miners, or pools of miners, which
maximize a utility function capturing the incentive structure of the system. The
goal is to prove under which conditions Bitcoin achieves a stable game-theoretic equi-
librium. Overall, the results are rather pessimistic: unless one imposes strong conditions,
attacks on the Bitcoin mining protocol follow, leading for example to forks of the
blockchain. Eyal and Sirer (2013), for example, show that the Bitcoin protocol is not
incentive-compatible: an attack from colluding miners leads to a revenue which exceeds
their fair revenue value. They propose a modification of the protocol which protects
against selfish mining pools. Sompolinsky and Zohar (2013) analyze the implications of
high transaction throughput on Bitcoin's security against double-spend attacks. They
show that attacks can strengthen to the point of reversing even accepted transactions as
volume increases, and they propose a reorganization of the Bitcoin blockchain by new
rules which have been implemented by the Ethereum project. Lewenberg et al. (2015)
analyze the stability of mining pools. The authors examine the dynamics of pooled mining
and how pools should share the rewards when they behave in a cooperative way. Using
cooperative game theory, they show that for particular networks under high transaction
loads the distribution of the rewards is unstable. This means that some miners have an
incentive to switch between the pools. These findings are in contrast with the empirical
observation that no fork or substantial slowdown attributable to rational attacks has been
observed to date.

Given this difference between theory and observations, Badertscher et al. (2018) ask:

How come Bitcoin is not broken using such an attack? Or, stated differently, why
does it work and why do majorities not collude to break it?
Why do honest miners keep mining given the plausibility of such attacks?

They use a rational-cryptography framework to capture the economic forces that
underlie the tension between honest miners and deviating miners, and explain how these
forces affect the miners' behavior. They show how the expected revenues of the miners, in
combination with a high monetary value of Bitcoin, can explain the fact that Bitcoin is
not being attacked in reality even though majority coalitions are in fact possible. Hence,
assumptions about the miners' incentives, which depend solely on costs and rewards for
mining, can substitute for the honest-majority assumption.

6.4.5 Initial Coin Offering (ICO)


Tokens are values derived from values that are maintained on a blockchain. Tokens can
thus be cryptocurrencies themselves but can also represent commodity values or firm values.
They can be exchangeable and tradable. Creating tokens is a much simpler process than
creating blocks: smart contracts and the templates of the underlying blockchain
create the tokens. Tokens are created to finance projects and are distributed to the pub-
lic. This is done through an Initial Coin Offering (ICO), in analogy to an initial public
offering (IPO) for shares. However, there are important differences between an ICO and
an IPO. Summarizing, tokens are pieces of code, while coins store value (Bitcoin) and can
merely be used for transaction payments. Research as of July 2018 from Satis Research
states that of all 1,592 traded tokens and coins, 828 were tokens and 764 were coins.
The market cap of the tokens was only 13%, compared to 87% for the coins.

What are the means for risk management and due diligence for an investor? ICOs
write so-called white papers in which they describe their business model, and set up a nice
webpage. That is mostly all the information one has. Hence, besides the usual IPO risks,
investors also face the token-value risk. Given how easy it is to raise money in this way,
it is evident that the ICO scene is crowded with criminals. The regulators in some
countries, such as China and South Korea, have therefore banned ICOs. Catalini et al.
(2018) estimate that more than 20 percent of all ICOs are fraudulent, either because the
website of the ICO was discontinued after the ICO or due to the assessment of the white
paper. We consider the Swiss regulation by the Finma regulatory authority. The Finma
differentiates between three types of tokens: payment tokens (PT), i.e. cryptocurrencies
that do not contain any rights against a counterparty; utility tokens (UT), which give access
to a protocol or a decentralized service; and asset tokens (AT), which have a counterparty
and give the right to a cash flow stream. For PT, the focus is on the regulation of
money laundering; there is no regulation for UT; and AT can comparatively be regarded
as unsecured financial assets under Swiss law.

6.5 Demography and Pension Funds


We have already considered parts of the topics demography, retirement provision and pension
systems. Before we continue to discuss these topics, also from an asset management
perspective, I remark that asset management is only one tool for the solution of
the problems in the different retirement pillars which many countries face. Necessary for
the change of the different systems are deep political reforms which restore the trust
of the populations in the retirement systems.

6.5.1 Demographic Facts


Not so long ago, in the years following World War II, the world was preoccupied with
population growth. Though the population explosion is no longer the burning issue it once
was, we are still experiencing staggering population growth of 2 to 3 percent per annum.
Population pressure will of course mean a growing likelihood of mass emigration to other
parts of the world, in particular if those countries with strong population growth are hit
by the effects of climate change or war.

The economically most advanced societies face another population problem: each fu-
ture generation will be smaller than the one that preceded it. For some, this has already
become a matter of national survival. Triggered by low fertility rates, this phenomenon
is gaining ground worldwide: 46 percent of the world's population has fallen into a low-
fertility regime, and there is nothing to indicate that this rate is going to recover. Magnus
(2013) states that (i) the ratio of children to older citizens stands at about 3:1 but is
declining; by around 2040, there will be more older citizens than children, and by 2050, there
will be twice as many older citizens as there are children; (ii) the number of over-60s in
the rich world is predicted to rise by 2.5 times by 2050 to 418 million, but the trajectory
starts to level off in about 20 years' time, and within this cohort, the number of people aged
over 80 will rise six times to about 120 million; and (iii) in the emerging and developing
worlds, the number of over-60s will grow by more than seven times to over 1.5 billion by
2050, behind which lies a 17-fold increase in the expected population of those
aged over 80, to about 262 million (Magnus (2013)).

Malthus (1798) was the first to study the interdependence between economic growth
and population growth. He assumed that as long as there was enough to eat, people
would continue to produce children.
Since this would lead to population growth rates in excess of the growth in the food
supply, people would be pushed down to the subsistence level. According to Malthus's
theory, sustained growth in per capita incomes was not possible; population growth
would always catch up with increases in production and push per capita incomes
down. Of course, today we know that Malthus was wrong, at least as far as the now
industrialized countries are concerned. Still, his theory was an accurate description of
population dynamics before the industrial revolution, and in many countries it seems
to apply even today. Malthus lived in England just before the demographic transition
took place. The very first stages of industrialization were accompanied by rapid popula-
tion growth, and only with some lag did the fertility rates start to decline (Doepke (2012)).

Hence, for Malthus children were a normal good: when income went up, more
children were 'consumed' by parents. In a microeconomic model (see the exercises),
the equilibrium supports this intuition: an increase in productivity causes a rise
in the population, but only until the wage is driven back down to its steady-state level.
Even sustained growth in productivity will not raise per capita incomes. The population
size will catch up with technological progress and put downward pressure on per capita
incomes. This model explains the relationship between population and output for almost
all of history, and it still applies to large parts of the world today (Doepke (2012)).
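A minimal numerical sketch of these Malthusian dynamics can make the mechanism visible. The functional forms below are illustrative assumptions, not the model from the exercises: the wage falls with population, w = A/L, and population grows whenever the wage exceeds the subsistence level w*.

```python
# Malthusian adjustment: a permanent doubling of productivity A raises
# wages only temporarily; population grows until the wage is back at
# the subsistence level, leaving per capita income unchanged.
A, w_star = 1.0, 1.0     # productivity and subsistence wage (illustrative)
L = 1.0                  # initial population, at the old steady state

for t in range(200):
    if t == 50:
        A *= 2.0                     # permanent productivity shock
    w = A / L                        # wage falls as population grows
    L *= (w / w_star) ** 0.5         # fertility responds to the wage gap

print(round(L, 3), round(A / L, 3))  # population doubled, wage back at w_star
```

In the long run the productivity gain is absorbed entirely by a larger population, exactly the mechanism described in the text.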

Since fertility rates decreased in Europe in the nineteenth century, per capita income could
grow. What are the causes of this growth? We consider the time cost of raising
children. In the Malthusian model, all labor is of equal quality. In modern economies,
human capital has two components: innate human capital that is possessed by every
worker, regardless of education, and extra human capital that people can acquire
through education by their parents. A further new feature is that parents must invest
their time, and not goods, to raise children. As a result, the growth rate of the population
is not constant. It depends on the human capital of the parents: the lower their human
capital, the higher the number of children; conversely, if human capital is high, fertility
falls. Two factors drive this outcome. First, increasing human capital increases
the value of time. The education of children then becomes very costly and hence
parents decide to have fewer of them. The second reason is that people with high human
capital prefer quality over quantity, since they are better at teaching children, which
makes it more attractive for them to invest in few children.

In developed, Western countries, persistent sub-replacement fertility levels, ageing,
and immigration are recognized as the three major population policy issues. Sub-
replacement fertility and immigration, in particular, are areas in which effective policies
are hard to come by. The debate (May (2012)) is marred by controversy and passion,
and discussions on policy issues are polarized. Policy actors seem to be torn between a
laissez-faire attitude and increasing immigration. Increasing immigration has two serious
limitations. First, the level of immigration cannot grow arbitrarily high without gen-
erating political tensions. Second, it is becoming increasingly difficult to find the kind
of migrants one wishes to attract, since more and more countries are striving to attract
highly skilled migrants. The populations of Japan, South Korea, and Taiwan are shrinking.
Yet they still resist immigration and choose automation as a response to dwindling
manpower. In Western democracies, immigration has become an ideology to the extent
that any rational discussion thereof is barely possible. While any forecasts regarding
personal longevity are uncertain, in the last 150 years women have seen their average life
expectancies increase at a rate of three months each year. All those who have forecast
that growth in personal longevity will come to a standstill have been proved wrong. But
there are currently two factors that could well put a stop to growth in average longevity:
the rapid growth of so-called lifestyle illnesses and increasing medical care costs. The
breakdown of the Soviet Union showed that once medical care fails to maintain its level
of quality for the whole population, that population's life expectancy quickly falls signif-
icantly.

The speed of ageing differs across countries; see Magnus (2013). In France,
for example, it took 100 years for the proportion of the population over 60 years old to
double from 7 percent to 14 percent. The pace in emerging markets is much faster:
for Indonesia, Brazil, China, or India, the time taken for this proportion to double is
only around 20 years. That is, the speed of ageing is rising rapidly in emerging economies.

But ageing in developed countries occurs in parallel with better health, more extensive
education, and related societal changes. We are not just living longer; we are slower to
age. Boersch-Supan et al. (2005, 2006, 2013) make this precise. They find that:

• The expected healthy life expectancy at the age of 65 is larger than 5 years for men
living in any European country.

• Using more than 4.8 million data sets of a large insurance company, the authors
measured the productivity of workers of different ages for different types of work:
contract negotiation (the most challenging jobs), standard customer advice and
repetitive jobs. They found that older workers made more errors in the repetitive
jobs than younger ones but were significantly more productive in the challenging
jobs than their younger counterparts.

• Intergenerational warfare is a myth: the data used to analyse conflicts between
children and parents provide no support for such a potential warfare.

Fact 98. The discussion whether we can work until the age of, say, 67 is for the average
population not related to its healthiness. From a health point of view, the retirement age
could be raised to 70 years. The tendency of firing older workers destroys productivity,
since the experience of older, motivated workers generates a higher productivity in
demanding jobs than is achieved by younger ones.

We spend longer in education; we travel more before permanently joining the work-
force; we start families later. We don't think of ourselves as being as old as previous
generations would have at the same age. The effect of all these changes taken together
is not that society is ageing, but that it is getting younger. Finally, a society with
a predominantly young population has a different productivity level than a more aged
population. Syl and Galenson show that 40 percent of productivity increases are down
to young people who enter new markets. These young people break with tradition and
manifest new ways of thinking; Google and Facebook are two prominent examples. Older
individuals possess more experience and wisdom, but Syl and Galenson state that this
only gradually changes productivity.
To manage the emerging demographic regime, innovative policies and new ways of think-
ing about population are called for (Romaniuk (2012)). This change in the structure of
society will have many consequences. One of the most significant will be a labor short-
age. If societies are going to maintain their standard of living, they are going to have
to avoid any reduction in the workforce as a proportion of the total population. At the
same time, many people are going to reach retirement age and realize that they do not
have enough income to maintain what they feel is an acceptable standard of living. The
combination of these two issues will put a lot of pressure on our current views on the
relationship between working and retirement. Employment and retirement laws designed
for a young and growing population no longer suit populations that are predominantly
old but healthy and capable of being productive, all the more so in a work environment
of automated technology. Prevailing family assistance policies are equally antiquated.
Though the maternity instinct may still be as present as it always was, women's conditions
have radically changed. The women of today in developed countries, and throughout
the modernizing world, are faced with many deterrents to maternity (e.g., widespread
celibacy, marital instability, financial insecurity) on the one hand, and with many ful-
filling, financially well-rewarded opportunities on the other - so much so that they are left
with little incentive to trade the latter for the uncertainties of motherhood.

It is easier to bring population down than to make it up, writes John May (2012).
And that is why - in order to escape the sub-replacement fertility trap and to bring the
fertility rate to, and sustain it at, even a generational replacement level (Romaniuk (2012))
- we need to bring to bear meaningful financial and social rewards for maternity. The
current family allowances and other welfare-type assistance to families cannot do this.
Societies under a demographic maturity regime may need to have in place permanent,
'life-sustaining' mechanisms to prevent fertility from sliding ever lower. Instead we need
a more balanced resource allocation between production and reproduction.

Impact on Retirement Systems

With such demographic developments, it will not be possible to meet the promises of the
three pillars of social welfare in many countries. This will lead to more saving on
an individual basis, and solidarity between generations (the first pillar) will come
under stress. In order for the retirement system not to collapse, the state will have to
define reforms; see Albrecher et al. (2016). Will it save the first pillar - that is, will it
secure the minimum necessary standard of living for all? How will the second and third
pillars be changed, or will they disappear? As a result, people will individually save more -
because they have to and because confidence in the social welfare system will not increase.

The Melbourne Mercer Global Pension Index report (MMGPI [2015]) from the Aus-
tralian Centre for Financial Studies and Mercer compared the status of the retirement
systems of 25 countries. The index is based on the following construction; see Figure
6.29.

The index weights three sub-indices, each built from several components:

• Adequacy (40%): Benefits, Savings, Tax Support, Benefit Design, Growth Assets

• Sustainability (35%): Coverage, Total Assets, Contributions, Demography, Government Debt

• Integrity (25%): Regulation, Governance, Protection, Communication, Costs

Figure 6.29: The Melbourne Mercer Global Pension Index (MMGPI [2015]).

Although it is called a 'pension index', it allows one to assess the entire retirement
systems of the different countries. Figure 6.30 summarizes the results for the 25 countries
surveyed.

6.5.2 Pension Funds


The pension fund assets in the OECD member countries amounted to USD 23 trillion
in 2014. The collision between demographics and the strong reliance on pay-as-you-go
systems in developed countries requires resolution; if not, these problems can be expected

• Grade A (index value >80): DK, NL. A robust retirement system that delivers good benefits, is sustainable and has a high level of integrity.

• Grade B+ (75-80): AU; Grade B (65-75): S, CH, Finland, CA, Chile, UK. Compared to an A-grade system, there are some areas for improvement.

• Grade C+ (60-65): Singapore, D, Ireland; Grade C (50-60): F, USA, Poland, SA, BR, A, I, Mexico. A system with some good features but also major risks and/or shortcomings; without addressing them, its efficacy and/or sustainability can be questioned.

• Grade D (35-50): Indonesia, China, J, South Korea, India. A system with major weaknesses and omissions that need to be addressed.

• Grade E (<35): no countries. A poor or non-existent system.

Figure 6.30: Summary for the 25 countries in the Melbourne Mercer Global Pension
Index as of 2015 (Adapted from MMGPI [2015]).

to spread to the rest of the world. There are a number of ways of approaching this,
including (Walter (2007)):

• Raising mandatory social charges on employees to cover increasing pension obli-
gations. This is very problematic due to the 'inverse' demographic pyramid and
becomes even more difficult to implement in countries where individuals already
face a high tax burden.

• Cutting retirement benefits. Limiting the growth of pension expenditures to the
projected rate of economic growth starting in 2015 reduces the income-replacement
rate from 45 percent to 30 percent over a period of 15 years (Walter (2007)). This
would push retired people with low personal savings into poverty.

• Increasing the retirement age. For countries with a high unemployment rate this
is not a feasible alternative.

• Reforming the systems away from pay-as-you-go toward defined-contribution or
defined-benefit pension plans. This is a possibility, and would create a huge demand
for professional asset management.

• Keeping the pay-as-you-go systems and reducing the contributions to the pension
funds.

These changes impact asset management. The demographic problems in devel-
oped countries and the difficulties in finding structural solutions will force pension funds
to increase their investment performance.
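The benefit-cut arithmetic cited in the second option (a replacement rate falling from 45 to 30 percent over 15 years) can be reproduced under the simplifying assumption of a constant annual gap between wage growth and capped pension growth:

```python
# Replacement-rate erosion when pension spending is capped below wage
# growth: the income-replacement rate shrinks by the growth gap each year.
start_rate, end_rate, years = 0.45, 0.30, 15

# Constant annual gap consistent with the quoted 45% -> 30% decline:
gap = 1.0 - (end_rate / start_rate) ** (1.0 / years)
print(f"Implied annual growth gap: {gap:.2%}")        # ~2.67% per year

# Check: compounding the gap over 15 years recovers the 30% rate.
print(round(start_rate * (1.0 - gap) ** years, 4))    # 0.3
```

A modest, steady gap of under 3 percent per year is thus enough to cut the replacement rate by a third, which illustrates how quietly such reforms compound.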

The asset allocation of pension fund assets differs significantly between countries.
The exposure to growth assets (including equities and property) ranges from
less than 10 percent in India, Korea, and Singapore to about 70 percent in Australia,
South Africa, the UK, the US, and Switzerland (GlobalPensionIndex (2015)). The more
growth assets are included in the asset allocation, the larger are the risks: there were
significant declines in the value of assets in 2010 and 2011, reflecting the consequences of
the global financial crisis of 2007 and 2008. However, since that time there has been a
steady recovery in the level of pension assets in each country surveyed as equity markets
have recovered (GlobalPensionIndex (2015)).

6.5.3 Role of Asset Management


Asset management can support the pension system in three respects. First, asset manage-
ment could become more efficient - this means saving costs. Second, asset management
could expand the range of solutions - the investment strategies. Finally, asset man-
agement could expand the investment opportunity set - the assets.

The expansion of investment strategies means, for example, applying factor investing.
All pros and cons of the last sections also apply to the pension system case. The third pos-
sibility means making some illiquid asset classes accessible for pension funds. Examples
are private equity, insurance-linked investments and securitized loans. These are the
typical examples given.

Example

Asset managers can become more important financial actors by driving the raising of
capital and the capital deployment required to meet the demands of growing urbaniza-
tion and cross-border trade. The world's urban population is expected to increase by 75
percent between 2010 and 2050, from 3.6 billion to 6.3 billion. The urban profile in the
east will see many more 'megacities' (cities with a population in excess of 10 million)
emerging. Today's number of 23 megacities will be augmented by a further 14 by 2025,
of which 12 will be in emerging markets.

This will create significant pressure on infrastructures. According to the OECD, USD
40 trillion needs to be spent on global infrastructure through 2030 to keep pace with the
growth of the global economy. Some policy makers appear to have taken the problem
on board: in Europe - after considerable debate - the European Long Term Investment
Funds (ELTIF) initiative was finally created in 2013, helping European asset managers
to invest in infrastructure. But infrastructure investments will disproportionately target
emerging markets, and emerging markets' asset managers have recognized this and have
already started to focus on it.

Example - Metro Lima 2 project

The city of Lima intends to extend its metro network. Peru wants to narrow the exist-
ing large infrastructure deficit in connection with projects in telecommunications, water,
sanitation and many more. Therefore, the government issues bonds to finance some of the
projects. To attract investors for such a project, there are three main challenges. First, a
sound governance of all parties involved in the project: there are more than 15 parties
involved, and an important one for investors is the grantor. Second, a project plan
linked to payments: the plan consists of milestones, with a compensation for each
milestone defined in an agreement. All of the projects use a build-operate-transfer or
BOT model, under which the project is eventually transferred to the government after the
private developer has been able to get his capital back plus a return. The BOT structure
incorporates the RPICAO, a payment mechanism under which, by submitting a construction
progress report (CAO) to a government agency or state-owned company, a concessionaire
earns the right to receive compensation for construction costs incurred in connection with
a project. RPICAOs are denominated in either US dollars or local currency (adjusted
for inflation), and represent an irrevocable and unconditional payment obligation of the
relevant government agency or state-owned company (Chadbourne (2015)). Third, the legal
setup: Peru, as grantor, is the direct obligor of the RPICAO. Figure 6.31 shows the legal
entities and the cash flows.

Whichever of the above measures is considered, it is evident that asset management
alone is not able to solve some of the fundamental problems of pension funds which we
discussed above. At its best, the asset management function can help to reduce some
costs or to improve the likelihood of higher investment returns. But it cannot achieve
the impossible - that is, it cannot solve the problem of demographic change.
The asset management function can, however, play an important role in two other respects.
First, it can provide solutions for the baby-boomers with their asset decumulation needs.
Second, asset management will be central for the increased private savings of individuals due
to the weakness of the first and second pillars. The growth of future AuM will arise
much more from this channel than from the traditional pension fund channel.

6.5.4 Investment Consultants


Investment consultants play an important role as intermediaries, in particular for insti-
tutional investors and pension funds. They offer the following services: asset/liability
modelling, strategic asset allocation, benchmark selection, fund manager selection, and
performance monitoring. Goyal and Wahal (2008) estimate that 82 percent of US public
plan sponsors use investment consultants, as do 50 percent of corporate sponsors. In-
vestment consultants have largely avoided the attention of academics, with one notable
exception - Jenkinson et al. (2014). A recent survey by Pensions and Investments [2013]
found that 94 percent of plan sponsors employed investment consultants. The five leading
investment consultants worldwide - ranked by 'assets under advisement' - in 2011 were
Hewitt Ennis Knupp (USD 4.4 trillion), Mercer (USD 4.0 trillion), Cambridge Associates
(USD 2.5 trillion), Russell Investments (USD 2.4 trillion), and Towers Watson (USD 2.1
trillion).

Figure 6.31: The legal setup for the Metro Lima 2 project. Tariff payments
are collected and deposited into a collection account (1). If the amount collected is not
sufficient to make the payments under the RPICAO, then the grantor is required to deposit
sufficient funds (2). The Peruvian Trustee makes the payments on the RPICAO to the
Indenture Trustee on behalf of the Issuer, as RPICAO titleholder (3). If a commitment
termination event occurs prior to the payment of the purchase price by the RPICAO
purchaser, the Concessionaire pays the applicable amount to the Issuer. MTC represents
Peru as grantor of the concession. [Lima Metro Line 2 Finance Ltd].

Jenkinson et al. (2014) ask the following questions: What drives investment consultants'
recommendations of institutional funds? What impact do these recommendations
have on flows? Do recommendations add value for plan sponsors?

The authors use data from eVestment and limit their analysis to US long-only equity
products, which can be considered to be among the efficient markets. In the approximate
period 1999 to 2011, one-quarter of these products were recommended annually by
investment consultants and the rest were not recommended. This proportion of
recommended to non-recommended products remained stable over the different
years studied.

Regarding the first question, the authors find that consultants' recommendations are partly
driven by past fund performance, but also by other soft factors such as service quality and
investment quality factors (Jenkinson et al. 2014): to be recommended, it is not sufficient
to have a strong return history. The authors then analyze whether the size of the
fees charged has an impact on the recommendation rate. If this were the case, conflicts
of interest would be suspected. The analysis shows that this is not the case. Fees are
very similar for recommended and non-recommended products, independent of the size of
the products and their styles (growth, value, small- and mid-cap). The fees are in line
with the fees in Section 5.3.3.3 - that is to say, close to 70 bps for larger products.

Recommendations, and in particular changes in recommendations, have a strong impact
on product flows (question 2): moving from zero recommendations to the case where
all consultants recommend leads to an additional inflow of assets of USD 2.4 billion. On
a percentage basis, the extra inflow equals on average 29 percent of the assets managed
by that product in the previous year, compared to a non-shortlisted product.

The answer to the third question created a lot of public attention. The authors construct
equal- and value-weighted portfolio returns of recommended and non-recommended products.
Using the returns of these portfolios, they estimate one- (CAPM), three- (FF),
and four-factor (FFC) alphas and excess returns over portfolios of selected benchmarks.

For the equally weighted portfolios, the returns of the recommended products were
significantly lower than those of the non-recommended ones, by on the order of 1 percent in
magnitude, independent of the factor model chosen (see Figure 6.32). For value-weighted
portfolios, different factor models lead to different returns for the two alternatives. 'Value-
weighted returns and alphas are consistently lower, suggesting that smaller products perform
relatively better' (Jenkinson et al. 2014). Summarizing the evidence: investment
consultants are not able to consistently add value by selecting superior investment
products.

The underperformance of recommended products in the equally weighted case could
be explained by the tendency of consultants to recommend large products that perform
worse. However, after adjusting for different sizes, this explanation turns out to be wrong.
These results raise several questions. First, why do pension funds use - on a rational
basis - investment consultants that add no value? The argument that consultants act as
insurance against being sued is simply not justifiable. Second, it is difficult to understand
why investment consultants are virtually unregulated in most jurisdictions.

Figure 6.32: The table shows the performance of portfolios of actively managed US equity
products that experience a net increase (decrease) in the number of recommendations in
the twelve or twenty-four month period following the recommendation change. Performance
is measured using raw returns; returns in excess of a benchmark chosen to match
the product style and market capitalization; and one-, three-, and four-factor alphas
(corresponding to the CAPM, the Fama - French three-factor model, and the Fama -
French - Carhart model). Excess returns and alphas are expressed in percent per year.
All reported figures are gross of fees. The first part of the table shows the results for
equally weighted portfolios of products, whereas the second part of the table shows the
same statistics for portfolios of products weighted using total net assets at the end of the
previous year. t-statistics based on standard errors - robust to conditional heteroscedasticity
and serial correlation of up to two lags as in Newey and West (1987) - are reported
in parentheses. ***, **, * mean statistically significant at the 1, 5, and 10 percent levels,
respectively. The benchmarks for the investment products are the corresponding Russell
indices. The investment product large cap growth is benchmarked by the Russell 1000
Growth, small cap value by the Russell 2000 Value, etc. (Jenkinson et al. [2014]).

6.6 Uniformity of Minds

Technology not only connects the different worldwide market places, it also allows information
to spread about any given local event, without delay, to the rest of the world.
This fact may also homogenize the way in which people think and make decisions in
geographically and culturally different places. Is such an alignment of minds taking place,
and - if so - what are the possible consequences? We follow Bacchetta and van Wincoop
(2013) and Bacchetta et al. (2013), who compare the GFC or 'Great Recession'
of 2008 with the global economic recession (Great Depression) of the 1930s.

6.6.1 The Great Depression and the Great Recession


Figure 6.33 compares the economic impact on the US and non-US economies during the
GFC and the Great Depression.

Figure 6.33: Comparing global GDP growth (percent, annual, real) in the Great Recession
and Great Depression for the US and developed non-US countries (Bacchetta and van
Wincoop [2013]).

There was basically no difference during the Great Recession between GDP growth
in the US and that in the G20 states, which represent the main worldwide economy
without the US. But in the Great Depression, the decline in US GDP growth did
not spread with comparable intensity to the rest of the world. This indicates that while
the Great Recession can be called a global crisis, the Great Depression was more local
in nature. The authors show that the Great Recession was, in historical terms, the first
global recession. The first question is: How could the crisis spread from the US financial
sector to the US real sector? The second question is: Why did the Great Recession
spread almost instantaneously from the US economy to the global economy - how did
the recession become a global one?

6.6.2 Uniformity of Minds


Bacchetta and van Wincoop (2013) show that standard macroeconomic approaches fail
to provide convincing explanations. Before we turn to the global issue, we reconsider
the US. First, one can consider - for the US - direct effects of the financial sector on the
real economy. Examples of these direct effects include broken financial intermediation
leading to a credit crunch, or stock market declines leading to negative wealth effects.
While such explanations sound convincing, they are flawed due to the main methodological
problem that the GFC was not exogenous but endogenous in the macroeconomic cycle.
That is to say, the impact of the financial crisis is not a separate output variable acting
on the economy but is part of the whole economy and must therefore impact the real
economy.

As many authors have shown, the financial crisis was part of the so-called boom -
bust cycle of the real economy. Of particular importance are real-estate boom - bust
cycles. Reinhart and Rogoff (2008) illustrate the following pattern. Set T to be the date
of a banking crisis and consider the growth rate of the real-estate asset class some years
before and after this date. One typically observes that prices increase before T and that
they fall after, or shortly before, the banking crisis. In this sense, a financial crisis is part
of a boom - bust cycle. The surprising aspect of the most recent crisis was not that
it happened, but that such a crisis could be strong enough to destabilize the financial
system of a developed economy (here, the US).

Given this US view, how could the recession become a global one? The standard
channel for explaining global linkages is trade. But the US is not a very open economy,
and imports to the US are - for many countries - relatively small. There is no empirical
evidence of a link between openness in terms of trade and a decline in growth. Hence,
the macroeconomic trade channel fails to provide an answer to the question of how the
recession spread globally. Another possible channel is the financial channel. That is
to say, the decline in asset prices and real-estate prices and changes to the credit supply
channelled into the real economies outside of the US. But this hypothesis is not supported
by empirical evidence either. While real-estate prices dropped in, say, Spain and Ireland,
they did not in Germany or Switzerland. While Switzerland has a much stronger financial
link to the US than do most European countries, the European countries were much
more affected by the Great Recession. While some countries faced a decline in credit
supply, others did not. Although policy makers have often used the expression 'credit
crunch', firms participating in surveys about the period have indicated that - during the
Great Recession - lower demand was more important to them than reduced credit supply.
Summarizing, standard macroeconomic models cannot explain the global recession.

Bacchetta et al. (2013) argue that there must have been other drivers that caused
the global recession. They argue that it was not the globalization of the economy, as
considered above, but rather the globalization of how individuals form expectations that
was responsible for the recession spreading worldwide. This argument is, of course,
linked to questions of information technology, information transmission, and information
quality in worldwide terms. In contrast with the past, information today is spread almost
in real time around the world, it is more difficult to control information distribution, and
mainstream information is mostly costless to the consumer. Therefore, one can argue
that - given a financial crisis and its related information flow - individuals around the
world had access to similar information sets upon which to form their expectations. The
authors claim that panic, by consumers and firms throughout the world, led to declines
in aggregated demand in most countries. Such panic must show a systemic component
to have a worldwide impact. They assume therefore that such panic is rational or self-
fulfilling:

• Agents first expect low future income due to the information available and the uncertainty
at play at the beginning of the financial crisis.

• This leads to low current consumption.

• This reduction in consumption lowers firms' current profits.

• This leads to low future production and income, which matches the agents' expectations
as outlined in the first step.

6.7 Climate Change and Finance


Climate change is one of the major threats and opportunities of this century. We
first provide an overview of some key facts of climate change, including its potential
impact on society, biodiversity and the economy. Different examples are given
where the interplay of financial innovation and technological progress leads to ecologically
and economically meaningful solutions. The goal is to design solutions which do not rely on
governments' command & control.

Data show that humankind is facing a climate change which will largely be irreversible.
Global temperature anomalies of the recent past, compared to the 1951–1980 average,
show that the last years were the warmest ones, see Figure 6.34.

Energy demand will further increase due to population growth, progressing industrialization
and increasing wealth. Without any countermeasures, this will increase
man-made CO2 emissions. But keeping in line with the 2 degree goal requires a drastic
reduction of CO2 emissions: for most countries a reduction of 30 percent, and better
50 percent, is needed.

As stated above, we do not focus on governmental law or laissez-faire, but we consider
cases where it pays economically to reduce CO2 emissions and to invest in energy efficiency.
The results depend on the following facts:
The results depend on the following facts:

• There is enough clean energy, i.e. energy which can be used to replace CO2-emitting
energy.

• The technology to transport energy efficiently exists.

• There exist financial market solutions which match investors' demand for sustainable
investment with the demand for energy project finance.

Figure 6.34: 2015 was the warmest year in the NASA/NOAA temperature record, which
starts in 1880. It has since been superseded by the following years (NASA/NOAA; 20
January 2016).

Figure 6.35 shows the impact of potential climate change on different dimensions of
humanity and the ecosystem.

Figure 6.35: Impact of potential climate change. Sources: Stern Review, IPCC, 4th
Assessment Report, Climate Change 2007, WWF and Credit Suisse 2011.

The increase in CO2 over the last 100 years has led to a measurable change in climate.
The data from researchers show that climate change is less related to risk and more
to uncertainty: there is a lack of knowledge about the speed, the irreversibility, possible
feedback effects and some hidden non-linearities. The worst case scenarios forecast a drop
in GDP of between 5 and 20 percent. Estimates of Credit Suisse and
other institutions state that investment flows of between USD 700 and 2000 billion per
annum are required over the next decade to limit warming to 2 degrees Celsius. This would
mean around 2 percent of global GDP per annum. The majority of the required capital
investment is concentrated in low carbon energy, energy efficiency, and low carbon
transport infrastructure. Low carbon energy is primarily linked to investment in renewables
and electricity infrastructure such as grids, transmission and storage. The opportunity
is concentrated in China, the US and the EU27; they represent nearly 60 percent of
the mitigation cost. Figure 6.36 shows the distribution of the investments necessary to
achieve the 2 degrees Celsius pathway. The matrix has the dimensions geographical location
and area of investment. The authors state different types of barriers affecting the current
decarbonization efforts. Since regulatory mechanisms which price the externalities of
carbon emissions do not exist yet, technical and financial barriers persist: the economics of
low carbon projects are often less attractive than those of their high carbon alternatives.
Structural barriers include network effects (consumers will not buy electric cars unless
there are workable and available charging solutions, but private investors hesitate to build
a charging network unless there is sufficient demand), agency problems (the party making
the low carbon investment is, under existing structures, often not the one which will benefit
from the savings) or the status quo bias (a strong bias towards maintaining the status quo
instead of making changes).

Figure 6.36: Annual investment required to achieve the 2 degrees Celsius pathway is USD
700 bn. Sources: Credit Suisse/WWF analysis (2011), based on McKinsey's Climate Desk
tool.

To highlight the challenges in setting up projects on a large scale we consider the
DESERTEC concept. The bottom line of the project was that a 300 by 300 kilometer
thermal solar energy plant in a desert is sufficient to generate enough energy
to cover the worldwide electricity demand. DESERTEC includes energy security and
climate protection as well as drinking water production, socio-economic development, security
policy and international cooperation. DESERTEC was founded in 2009 in Germany
and rapidly gained support from large corporates and politics. The goal was to produce
energy in the Sahara for Europe. To some extent, similar projects on a much smaller scale
have been implemented in the US and other countries. DESERTEC faced several risks.

• Political risk. Countries such as Algeria, Libya and Saudi Arabia, given their natural
oil resources, could be expected to have no interest in a solar energy project. But this is
not the case; in fact, Saudi Arabia owns leading solar energy institutes. Furthermore,
the solar energy project would benefit employment and job creation in these countries to a
far larger extent, and due to the excess solar energy these states would be able to
create new farming land.

• Why should these countries produce energy for Europe given their own needs? This
point also led to heavy debates in Europe and to a standstill of the project.

• Another risk factor is political instability. The events starting in 2011 demonstrate
that this risk exists and that without the protection of an army the project
cannot be sustainable. But which army should guarantee the functioning of the
technology?

• A further political risk is that the middle and northern parts of Europe would depend
on a single point of entry in the Mediterranean region.

• Technology and financing risk. Natural damage risks and energy losses in
transport are not material. But the need to construct a new powerful energy
infrastructure in Europe triggers delicate financial issues.

Besides the DESERTEC example, other examples show the risks of large scale environmental
projects. The Lisbon Strategy, adopted in 2000, largely failed on its three pillars,
where the environmental pillar recognized that economic growth must be decoupled
from the use of natural resources. The overly complex structure with multiple goals and
actions, an unclear division of responsibilities and tasks, and a lack of political engagement
from the member states led to its failure, and the GFC was then the final blow to the
strategy. At the Spring Summit 2010, EU leaders endorsed the European Commission's
proposal for a Europe 2020 strategy. This new strategy puts knowledge, innovation and
green growth at the heart of the EU's blueprint for competitiveness and proposes tighter
monitoring of national reform programmes, one of the greatest weaknesses of the Lisbon
Strategy.

Another example is water pollution in Switzerland in the 1960s. By defining incentives
and providing financial support, the Swiss government enabled a new industry to
emerge (sewage treatment plants) and the treatment of farming land was changed. After some
decades, water from Swiss lakes or rivers is often potable. In the US, acid rain led to the
implementation of the Clean Air Act, which also solved the problem.

6.7.1 Green Bonds


When it comes to financial innovation, many different initiatives have been launched since the
1980s. But most of them failed to achieve sustainable success, while so-called green bonds
are proving to be the favourite wrapper.

Market-based solutions have proven consistently more effective in protecting the
environment than government regulation alone. Project financing, public/private partnerships,
and tradable permits have come to supplement or replace conventional regulation
and purely tax-based instruments. This approach can minimize the aggregate costs
of achieving environmental targets while providing dynamic incentives for the adoption
and diffusion of greener technologies. The most practical solution for building a greener
economy is to correct faulty pricing by making consumers and firms pay for the environmental
damage they cause. Once these negative externalities are internalized,
they will be incorporated in the prices of goods and services, creating real incentives for
the creation and adoption of clean technologies. One of the most compelling examples of
using these principles to fix broken markets is that of cap-and-trade pollution markets.
In such markets, the cap (or maximum amount of total pollution allowed) is usually set
by government. Businesses, factory plants, and other entities are given or sold permits
to emit some portion of the region's total amount. If an organization emits less than its
allotment, it can sell or trade its unused permits to other businesses that have exceeded
their limits. Entities can trade permits directly with each other, through brokers, or in
organized markets.
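The cost advantage of trading permits over uniform abatement can be sketched with a two-firm example (all numbers are hypothetical and for illustration only):

```python
# Two firms each emit 100 tons; a cap of 150 tons allots 75 permits to each,
# so 50 tons of total abatement are required (all numbers hypothetical).
mac_a, mac_b = 20.0, 60.0   # marginal abatement costs per ton for firms A and B

# Without trading, each firm abates its own 25-ton shortfall.
cost_no_trade = 25 * mac_a + 25 * mac_b

# With trading, the cheap abater A does all 50 tons and sells its 25 spare
# permits to B, who then abates nothing; the same cap is met at lower cost.
cost_trade = 50 * mac_a

print(cost_no_trade, cost_trade)   # 2000.0 1000.0
```

At any permit price between the two marginal abatement costs (here, between 20 and 60 per ton), both firms are better off trading, which is why the aggregate cost of meeting the same cap falls.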

The first green bond was issued in 2007 by the European Investment Bank (EIB) and the
World Bank. For more details see www.climatebonds.net, from where the following data are
taken. In November 2013 the first corporate green bond was issued by a Swedish
company, and Tesla Energy issued the first solar ABS. The biggest ABS
issuer is Fannie Mae. ABS include solar ABS, green MBS, green RMBS, green CMBS,
and other types. Green bond issuance in 2018 reached USD 167.3 bn, with over
USD 500 bn currently outstanding.

Using debt capital markets to fund climate solutions

The majority of the green bonds issued are green 'use of proceeds' or asset-linked
bonds. The following products fall into the category green bond (taken from www.climatebonds.net).

• 'Use of proceeds' bonds. Proceeds from these bonds are earmarked for green
projects. The same credit rating applies as for the issuer's other bonds. Barclays Green
Bonds are an example.

• 'Use of proceeds' revenue bond or ABS. Earmarked for financing green projects;
revenue streams of the issuer through fees, taxes, etc. serve as collateral for the debt.
The Hawaii State ABS is backed by a fee on the electricity bills of the state utilities.

• Project bond. These are ring-fenced for the specific underlying green project. Recourse
is only to the project's assets and balance sheet. An example is the Invenergy
Wind Farm bond, which is backed by the Invenergy Campo Palomas wind farm.

• Securitisation (ABS) bond. These refinance portfolios of green projects, or their proceeds
are earmarked for green projects. Recourse is to the asset pool, such as a
pool of green mortgages. Tesla Energy, for example, is backed by residential solar
leases.

• Covered bond. These are earmarked for eligible projects included in the covered
pool. Recourse is to the issuer and to the collateral pool. The Berlin Hyp green
Pfandbrief is an example.

• Loan. A loan is not a security. Loans are earmarked for eligible projects, with full
recourse to the borrower in the case of unsecured loans and to the collateral in the case
of covered loans. Examples are MEP Werke, Ivanhoe Cambridge and Natixis
Assurances (DUO).

Green bonds entail additional costs compared to non-green bonds, since issuers must
track, monitor and report on the use of proceeds. For issuers, however, the benefits -
reputation, branding and the build-up of know-how about environmental investments -
outweigh these additional costs.

Green bonds price flat to ordinary bonds, i.e. they rank pari passu with
vanilla issuance. As an outlook, investors with USD 45 trillion of assets under management
have made public 'commitments' to climate and responsible investment. This is around 50
percent of all AuM.

6.7.2 Energy Contracting and Structured Finance


We consider in some detail how the technological and financing aspects are designed for energy
contracting. Such contracts are defined between the following parties:

• Energy efficiency searching institution. An institution - a public entity or a corporate
- wants in our case to reduce energy costs in an existing building or a new project.
To be specific, we consider a large city administration.

• Energy solution provider. A corporate provides the technology to realize the energy
cost gains. The energy solution should lead to a substantial reduction in energy
costs.

• Financial solution provider. A financial institution offers different possibilities to finance
the project. The financial solution should reflect the particular financial and political
needs of the city administration.

As an overview, the following figures hold as rough rules in case buildings are made energy
efficient. The data are from Siemens (2015).

Type of Optimization            Energy Saving    Amortization Period
Measure & Visualize             ∼ 10%            ∼ 1 − 2y
Optimizing Operations           ∼ 10 − 25%       ∼ 2 − 3y
Building Services Engineering   ∼ 25 − 35%       ∼ 3 − 7y
Renew                           ∼ 30 − 45%       ∼ 6 − 12y

Table 6.4: Rough figures for building energy efficiency measures.

Measure and Visualize means that a firm makes its energy consumption transparent
at well-chosen locations within the firm. Elevators are often used, since most people in
an elevator search for a fixed point to focus on; the entrance lobby is also well suited.
It has by now been reported in several studies that simple transparency or monitoring,
without any other actions, leads to an energy reduction of about 10 percent.
It seems that such transparency changes the behavior of some employees, leading to this
reduction.

When it comes to financing the project, a major requirement is that the project also
makes economic sense. That is, we require Gain > 0. The gain is the sum of the investment
costs I and the savings of energy costs over time. Saving of the energy costs has four
risk sources:

• Investment risk. The amount I = I¯ + dI is equal to the expected costs I¯ plus
possible deviations dI.

• Volume risk. The amount of saved energy ct is given by

ct = c̄ + dct

with c̄ the expected amount of saved energy once the project is finished and dct
the risk of deviation from the expectation.

• Energy price risk. The price pt of saved energy (oil, electricity, or a mixture of
them) is equal to

pt = p̄ + dpt

with p̄ the forward/futures prices and dpt the deviation risk from the forward prices.

• The last risk is counterparty risk of the energy solution user - here the city administration.
Depending on the type of financing of the project, the counterparty risk matters
for the investors or not; see below for details. We write default risk in the form
u = 1 − dk, with 1 for not defaulting and dk the expected default rate.

The gain of the project can be written symbolically - i.e. without using summation and
discounting notation, but focussing on the different parts of the gain function - as follows:

Gain = − I¯           (expected investment costs)
       − dI           (investment risk)
       + c̄ × p̄       (estimated savings, costs and volume)
       + dc × p̄       (volume risk)
       + c̄ × dp       (energy price risk)
       + dc × dp      (cross risk)
       − dk × c × p   (default risk).
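Since Gain = (1 − dk) × ct × pt − I, with ct = c̄ + dct and pt = p̄ + dpt, expanding the product yields exactly the seven labeled parts. This can be checked numerically; a minimal sketch with illustrative numbers (not from the text):

```python
# Check that the seven-term decomposition reproduces Gain = (1 - dk)*c*p - I.
# All numeric values are illustrative assumptions.
I_bar, dI = 100.0, 5.0     # expected investment cost and its deviation
c_bar, dc = 25.0, -2.0     # expected saved energy volume and its deviation
p_bar, dp = 1.0, 0.1       # forward energy price and its deviation
dk = 0.001                 # expected default rate of the counterparty

c, p, I = c_bar + dc, p_bar + dp, I_bar + dI

gain_direct = (1 - dk) * c * p - I

gain_terms = (-I_bar - dI            # expected investment costs and investment risk
              + c_bar * p_bar        # estimated savings (costs and volume)
              + dc * p_bar           # volume risk
              + c_bar * dp           # energy price risk
              + dc * dp              # cross risk
              - dk * c * p)          # default risk

print(abs(gain_direct - gain_terms) < 1e-9)   # True - the terms add up exactly
```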

This defines the risk profile for the city without any structuring of risk. Therefore, the
next question is: Who bears which risk? Professional technology providers keep the
investment and volume risk due to their experience and their large project portfolios;
variations in these two factors are absorbed in a large project portfolio. Now consider
an investor. The investor is willing to pay the expected investment costs I¯ in exchange for
participating in the future energy savings. That is, the city and the investor share future
energy savings: the city participates with c̄ × p̄ × a and the investor with c̄ × p̄ × (1 − a) in
future energy savings. This defines the performance contract. The function a defines,
as a function of time, the future participation. Since the investment has to be paid back to the
investor, he will participate more strongly at the beginning than the city; otherwise, the payback
time increases. In this setup the whole investment is risk free for the city. The only risk
which is not attributed is the default risk of the city: either it is passed on to the investor,
who is compensated, or the bank keeps this risk. This type is a structured product solution.
Other possible solutions are:

• The city pays for the project in cash.

• The city issues a bond.

• The city issues a green bond.

• A bank issues a structured product (the solution above).

• A special purpose vehicle is set up.

Before we consider some of these solutions, we provide an example for the structured
product. Assume a project with payback time 4y; then the amount of saved energy is
c̄ = 25% of the investment per annum. Assume that the project costs 100 in a currency, that a
increases linearly from 10 to 40 percent (so 1 − a decreases linearly from 90 to 60 percent),
that energy price risk is ±2 percent per annum, that the default risk of the city is 10 bps,
that the fees for structuring the deal are 1 percent per annum and that interest rates are
flat at 2 percent.
Then,

• After 8y the whole energy savings belong to the city.

• After 5 years the investment amount is amortized, i.e. the years 6-8 generate return
for the investor.
6.7. CLIMATE CHANGE AND FINANCE 471

• The return for the investor is 6.3 percent in case of constant energy prices, 5.3 percent if energy prices fall by 2 percent each year and 7.1 percent in the monotonically increasing case. These returns have to be corrected for a possible default of the city: if the investor does not want to take this default risk, the returns are lowered by the city's credit risk costs.

Finally, if an investor wishes to get rid of energy price risk, the structuring can deliver fixed energy prices or prices which are kept within a bandwidth.

From the other financing possibilities we only mention the green bond. This bond is issued by the city as an ordinary bond; the difference lies in the coupon payment. The value of the coupon each year is determined by the price of the saved energy amount, i.e. it is a coupon derived from the underlying value 'energy price × saved energy volume'.

Clearly such a construction requires substantial legal and documentation work for and between the different parties. Furthermore, moral hazard issues exist: the energy solution provider can charge an excessive price I for the investment to cover possible price risk dI, or it can predict biased low saved energy amounts to reduce its energy volume risk. To avoid such disincentives, a simple solution is to let the energy firm itself invest in the project, i.e. to take a part of the investor's stake. This reduces the moral hazard related both to the investment amount and to the expected energy volume savings, since systematic deviations reduce the firm's return on investment.
472 CHAPTER 6. ASSET MANAGEMENT INNOVATION
Chapter 7

Proofs
We prove Proposition 24:

Proof. The proof follows from the fact that the variance of the sum equals the sum of the variances, since there is no covariance:
$$\sigma_p^2 = \mathrm{var}\Big(\sum_{j=1}^N \frac{1}{N} R_j\Big) = \frac{1}{N^2}\,\mathrm{var}\Big(\sum_{j=1}^N R_j\Big) \le \frac{Nc}{N^2} = \frac{c}{N}\,,$$
with c the largest variance of all N assets.

We prove Proposition 25:

Proof. The proof is only slightly more complicated than the previous one and leads to the result
$$\sigma_p^2 = \frac{\overline{\mathrm{var}}}{N} + \Big(1 - \frac{1}{N}\Big)\overline{\mathrm{cov}}\,.$$
By increasing the number N of assets, the first term with the average variance $\overline{\mathrm{var}}$ can be made arbitrarily small; the portfolio variance is then determined by the average covariance $\overline{\mathrm{cov}}$, which approaches a non-zero value.
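The limiting behaviour can be illustrated numerically. The sketch below assumes equally weighted assets with a common variance of 0.04 and a common pairwise covariance of 0.01 (illustrative numbers, not from the text):

```python
# Portfolio variance of an equally weighted portfolio:
# sigma_p^2 = avg_var / N + (1 - 1/N) * avg_cov  (illustrative numbers).
avg_var, avg_cov = 0.04, 0.01

def portfolio_variance(n):
    return avg_var / n + (1 - 1 / n) * avg_cov

for n in (2, 10, 100, 1000):
    print(n, round(portfolio_variance(n), 5))
# The variance decreases towards avg_cov but never falls below it.
```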

We prove Proposition 26:

Proof. Differentiate both sides of the equation f(tu) = tf(u) with respect to t, apply the chain rule, and choose t = 1. For the converse, let g(t) = f(tu). Since $\langle tu, \nabla f(tu)\rangle = f(tu)$ we have
$$g'(t) = \langle u, \nabla f(tu)\rangle = \frac{1}{t} f(tu) = \frac{1}{t} g(t)\,.$$
Solving this differential equation for g gives g(t) = g(1)t, which implies f(tu) = t f(u).
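Euler's relation $\langle u, \nabla f(u)\rangle = f(u)$ for a function homogeneous of degree one can be checked numerically; the sketch below uses the (assumed, illustrative) function $f(x) = \sqrt{x_1^2 + x_2^2}$ and a finite-difference gradient.

```python
import math

# f is homogeneous of degree one: f(t*u) = t*f(u) for t > 0.
def f(x):
    return math.sqrt(x[0] ** 2 + x[1] ** 2)

def grad(fun, x, h=1e-6):
    # Central finite differences, accurate enough for this check.
    g = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        g.append((fun(xp) - fun(xm)) / (2 * h))
    return g

u = [3.0, 4.0]
euler_lhs = sum(ui * gi for ui, gi in zip(u, grad(f, u)))  # <u, grad f(u)>
print(euler_lhs, f(u))  # both close to 5
```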

We prove Proposition ??:


Proof. Starting from the assumption µm = λµa + (1 − λ)µp and assuming that the expected return of the passive investment equals the market return, it follows at once that the active return has to be equal to the passive return.

We prove the optimal dynamic investment decision rules of the Merton model 3.8:

Proof. We first split the integral into two parts for small dt:
$$J(t_0, w_0) = \max_c E_{t_0,w_0}\left[\int_{t_0}^{t_0+dt} u(t,c,W)\,dt + \int_{t_0+dt}^{T} u(t,c,W)\,dt + f(W(T),T)\right],$$
$$dW_t = g(t,c,W)\,dt + \sigma(t,c,W)\,dB_t\,, \qquad W(t_0) = w_0\,. \tag{7.1}$$

Using the Principle of Optimality, the control function in the second integral should be optimal for the problem beginning at $t_0+dt$ in the state $W(t_0+dt) = w_0 + dW$. Hence,

$$J(t_0, w_0) = \max_c E_{t_0,w_0}\left[\int_{t_0}^{t_0+dt} u(t,c,W)\,dt + E_{t_0+dt,w_0+dW}\left[\int_{t_0+dt}^{T} u(t,c,W)\,dt + f(W(T),T)\right]\right].$$

Optimality implies $E_{t_0+dt,w_0+dW}\left[\int_{t_0+dt}^{T} u(t,c,W)\,dt + f(W(T),T)\right] = J(t_0+dt, w_0+dW)$, i.e.

$$J(t_0, w_0) = \max_c E_{t_0,w_0}\left[\int_{t_0}^{t_0+dt} u(t,c,W)\,dt + J(t_0+dt, w_0+dW)\right]. \tag{7.2}$$

We next approximate the second value function since dt is small. This also allows us to assume that the control c is constant over a time interval of length dt. We get:
$$J(t_0, w_0) = \max_c E_{t_0,w_0}\Big[u(t,c,W)\,dt + J(t_0,w_0) + \partial_t J(t_0,w_0)\,dt + \partial_w J(t_0,w_0)\,dW + \tfrac{1}{2}\partial_{ww} J(t_0,w_0)\,(dW)^2\Big] + o(dt)\,. \tag{7.3}$$
This looks like a second-order expansion in the state variable, but the square of Brownian motion $(dB)^2$ is linear in time (see the part and appendix on continuous-time finance), i.e. $(dW)^2 = (g(t,c,W)\,dt + \sigma(t,c,W)\,dB)^2 = \sigma^2\,dt$ up to terms of order $o(dt)$. The only random component in the above value function expression is therefore the term $\partial_w J\,dW$. Since $E[dB] = 0$, we get
$$E[\partial_w J\,dW] = \partial_w J\,g\,dt\,.$$

Cancelling $J(t_0, w_0)$ on both sides and dividing by dt, we finally get the fundamental partial differential equation (PDE)
$$0 = \max_c \Big[u + \partial_t J + \partial_w J\,g + \tfrac{1}{2}\partial_{ww} J\,\sigma^2\Big]\,. \tag{7.4}$$
Therefore,

1. Taking formally the derivative w.r.t. c in the above PDE gives the optimal decision c as a function of the unknown value function J.

2. Reinsert this candidate into the fundamental PDE (7.4) and solve the resulting J-equation with the boundary and terminal conditions (if any).

3. Use this explicit solution J to obtain the fully specified optimal policy $c_t^*$ and the optimal controlled state dynamics $W_t^*$.

Inserting $J(t, W) = e^{-rt}V(W)$ into the fundamental PDE leads, after cancelling the exponential function, to
$$0 = \max_{c,\omega}\Big[\frac{c^a}{a} - rV + \partial_w V\,g + \tfrac{1}{2}\partial_{ww} V\,\sigma^2\Big]\,. \tag{7.5}$$

The wealth dynamics $W_t$ follows from the asset dynamics and the consumption rate. There is a risky asset with dynamics $dS/S = \mu\,dt + \sigma\,dB$, where the drift and the volatility are constant, and a so-called riskless asset $B^0$ with dynamics $dB^0 = B^0 r\,dt$ (written $B^0$ to distinguish the bond from the Brownian motion B). The growth rate of wealth equals the weighted sum of the asset growth rates minus the consumption rate, i.e.
$$dW/W = \omega\, dS/S + (1-\omega)\, dB^0/B^0 - (c/W)\,dt\,.$$
The weight $\omega$ equals the number of risky assets times their price S divided by total wealth. Inserting the asset dynamics into the wealth growth rate equation gives the final wealth dynamics:
$$dW = \big(\omega\mu W + (1-\omega) r W - c\big)\,dt + \sigma\omega W\,dB\,.$$

Inserting this dynamics into the fundamental PDE gives:
$$0 = \max_{c,\omega}\Big[\frac{c^a}{a} - rV + \big(\omega\mu W + (1-\omega) r W - c\big)\,\partial_w V + \tfrac{1}{2}(\sigma\omega W)^2\,\partial_{ww} V\Big]\,. \tag{7.6}$$
Taking the derivatives w.r.t. the two choice variables and setting them to zero gives the candidate solutions (first order conditions):
$$c^* = (\partial_w V)^{\frac{1}{a-1}}\,, \qquad \omega^* = \frac{r-\mu}{\sigma^2 W}\,\frac{\partial_w V}{\partial_{ww} V}\,. \tag{7.7}$$
These candidate optimal choices possess a drawback: they depend on the yet unknown value function V. To determine it, we reinsert the optimal candidate functions into the fundamental PDE. This gives an equation for the unknown value function V:
$$rV = \frac{1-a}{a}\,(\partial_w V)^{\frac{a}{a-1}} + rW\,\partial_w V - \frac{(r-\mu)^2}{2\sigma^2}\,\frac{(\partial_w V)^2}{\partial_{ww} V}\,. \tag{7.8}$$

This is a highly non-linear equation, but the value function V(W) is proportional to the expected utility of consumption $c^a$. Therefore a natural guess is $V(W) = \alpha W^a$ with $\alpha$ a constant. Testing this guess in the PDE we see that all terms are proportional to $W^a$: we can factor out this power function times a complicated expression which does not depend on the state variable W. Since the product has to be zero for all W, the complicated expression has to vanish, which gives a value for the constant $\alpha$ and hence a solution for the unknown value function. To carry this out we insert the guess into (7.8):

$$0 = W^a\,\alpha \underbrace{\left(\frac{1-a}{a}\,\alpha^{\frac{1}{a-1}} - 1 + ra - \frac{(r-\mu)^2}{2\sigma^2}\,\frac{a}{a-1}\right)}_{=:F(\alpha)}\,.$$

That is, the state variable dependence W a appears in each term of the original PDE
a
and can be factored out. Therefore V (W ) = αW solves the PDE if F (α) = 0. This

equation can be solved explicitly, leading to a constant α . Hence we found a solution
for the value function PDE which then provides us an explicit solution for the choice
variables:
1 µ−r 1
V (W ) = α∗ W a , c∗ = W (aα∗ ) a−1 , ω ∗ = .
σ2 1 − a

We prove the Markowitz Proposition 34:

Proof. With the Lagrangian

1
L = hφ, Cφi + λ1 (1 − he, φi) + λ2 (r − hµ, φi),
2
the first order conditions
$$0 = \frac{\partial L}{\partial \phi} := \Big(\frac{\partial L}{\partial \phi_1}, \frac{\partial L}{\partial \phi_2}, \ldots, \frac{\partial L}{\partial \phi_N}\Big)' \tag{7.9}$$

are a set of N equations. From

$$\frac{1}{2}\,\frac{\partial \langle \phi, C\phi\rangle}{\partial \phi} = C\phi\,, \qquad \frac{\partial \langle \phi, \mu\rangle}{\partial \phi} = \mu$$
we get

0 = Cφ − λ1 e − λ2 µ (7.10)

1 = he, φi (7.11)

r = hµ, φi . (7.12)

The optimality conditions are therefore N + 2 linear equations in the N + 2 variables $\phi, \lambda_1, \lambda_2$. To solve this system we proceed as follows. Since C is strictly positive definite, $C^{-1}$ exists and (7.10) implies
$$\phi = \lambda_1 C^{-1} e + \lambda_2 C^{-1}\mu\,.$$

Multiplying this last equation from the left with e and µ, respectively, and using the normalization condition and the return constraint, we get a linear system for the two Lagrange multipliers:
$$1 = \lambda_1 \langle e, C^{-1} e\rangle + \lambda_2 \langle e, C^{-1}\mu\rangle\,, \qquad r = \lambda_1 \langle \mu, C^{-1} e\rangle + \lambda_2 \langle \mu, C^{-1}\mu\rangle\,. \tag{7.13}$$
If we set $\tau = (\lambda_1, \lambda_2)'$ and $y = (1, r)'$, the last system reads
$$y = \begin{pmatrix} \langle e, C^{-1} e\rangle & \langle e, C^{-1}\mu\rangle \\ \langle \mu, C^{-1} e\rangle & \langle \mu, C^{-1}\mu\rangle \end{pmatrix} \tau =: A\tau\,. \tag{7.14}$$

If A is invertible we are done, since then y = Aτ can be trivially solved for τ. This determines the Lagrange multipliers $\lambda_i$, and inserting the result in $\phi^* = \lambda_1^* C^{-1} e + \lambda_2^* C^{-1}\mu$ gives the optimal portfolio and proves the proposition. We now prove that within the given model the matrix A is invertible, i.e. we claim that $\det A = \Delta > 0$. For this we use the Cauchy-Schwarz inequality: for two arbitrary vectors x, y we have
$$|x|^2 |y|^2 \ge \langle x, y\rangle^2\,,$$
where the inequality is strict if the two vectors are linearly independent. To rewrite the determinant in the form needed for the Cauchy-Schwarz inequality, we first have to define the vectors x, y. For this we use the decomposition $C = U U'$, which always exists for strictly positive definite, symmetric matrices. Using this, we get
$$\langle e, C^{-1} e\rangle = \langle e, (UU')^{-1} e\rangle = \langle e, (U')^{-1} U^{-1} e\rangle = \langle U^{-1} e, U^{-1} e\rangle =: \langle x, x\rangle\,,$$
where we used
$$\langle x, A'Ax\rangle = \langle Ax, Ax\rangle$$
and properties of the matrix inverse. Proceeding in the same way with the other elements of A and defining $\langle \mu, C^{-1} e\rangle = \langle y, x\rangle$ we get
$$\det A = \langle x, x\rangle \langle y, y\rangle - \langle x, y\rangle^2\,.$$
Hence, using the Cauchy-Schwarz inequality, $\det A \ge 0$ follows. Since µ and e are linearly independent, the same holds for $x = U^{-1} e$ and $y = U^{-1}\mu$. This finally proves $\det A > 0$ and the proof of the proposition is complete. For further reference, we note the optimal multiplier values:

$$\lambda_1^* = (A^{-1} y)_1 = \frac{1}{\Delta}\big(\langle \mu, C^{-1}\mu\rangle - r\langle e, C^{-1}\mu\rangle\big)\,, \tag{7.15}$$
$$\lambda_2^* = (A^{-1} y)_2 = \frac{1}{\Delta}\big(r\langle e, C^{-1} e\rangle - \langle e, C^{-1}\mu\rangle\big)\,. \tag{7.16}$$

We prove the Mutual Fund Proposition 35:



Proof. Let φ and ψ be two solutions of the Markowitz portfolio problem. Then they satisfy the linear first order conditions, and hence any convex combination aφ + (1 − a)ψ also satisfies them. Using that the weights in each solution sum to one, the combination weight a follows.

We prove the Value-at-Risk formula (3.16):

Proof. The VaRα for the quantile α and a fixed time horizon solves implicitly the inequality
$$P(X \le -\mathrm{VaR}_\alpha) \le \alpha\,.$$
If $X \sim N(\mu, \sigma^2)$, the inequality reads
$$\frac{1}{\sqrt{2\pi}\,\sigma}\int_{-\infty}^{-\mathrm{VaR}_\alpha} e^{-\frac{(x-\mu)^2}{2\sigma^2}}\,dx \le \alpha\,.$$
With the change of coordinates $z = \frac{x-\mu}{\sigma}$ this becomes
$$\frac{1}{\sqrt{2\pi}\,\sigma}\int_{-\infty}^{-\frac{\mathrm{VaR}_\alpha+\mu}{\sigma}} e^{-\frac{1}{2}z^2}\,\sigma\,dz \le \alpha\,,$$
with σ the Jacobian, i.e.
$$\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{-\frac{\mathrm{VaR}_\alpha+\mu}{\sigma}} e^{-\frac{1}{2}z^2}\,dz \le \alpha\,.$$

The upper limit of the integral depends on α, the mean and the variance. Setting the variance to unity and the mean to zero, for a given α the critical factor $k_\alpha$ and hence the VaR follow. For α = 0.01, i.e. a VaR at the 99% confidence level, numerically solving
$$\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{k_\alpha} e^{-\frac{1}{2}z^2}\,dz \le 0.01$$
gives $k_\alpha = -2.33$. Increasing the confidence level to 99.9 percent, i.e. α = 0.001, the critical value becomes $k_\alpha = -3.09$.

We use these insights in the VaR calculation. From
$$\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{-\frac{\mathrm{VaR}_\alpha+\mu}{\sigma}} e^{-\frac{1}{2}z^2}\,dz \le \alpha$$
follows
$$-\frac{\mathrm{VaR}_\alpha+\mu}{\sigma} \le k_\alpha\,,$$
i.e. under normality
$$-\mathrm{VaR}_\alpha \le \sigma k_\alpha + \mu\,.$$

Since the VaR constraint binds,
$$-\mathrm{VaR}_\alpha = \sigma k_\alpha + \mu\,.$$
This is the VaR for a fixed time horizon. If, for example, the volatility is estimated on a weekly basis but the VaR is required on an annual basis, the square-root rule implies
$$-\mathrm{VaR}_\alpha = \sigma k_\alpha \sqrt{T}\,,$$
assuming zero mean; in the example T = 52.
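The critical factors quoted above can be reproduced with the standard normal quantile function; the sketch uses Python's standard library `statistics.NormalDist` (an implementation choice, not from the text).

```python
from statistics import NormalDist

# k_alpha is the alpha-quantile of the standard normal distribution.
def critical_factor(alpha):
    return NormalDist().inv_cdf(alpha)

k_99 = critical_factor(0.01)    # about -2.33 (99% confidence)
k_999 = critical_factor(0.001)  # about -3.09 (99.9% confidence)

# VaR under normality: -VaR = sigma * k + mu (here mu = 0, sigma = 0.2).
var_99 = -(0.2 * k_99)
print(round(k_99, 2), round(k_999, 2), round(var_99, 3))
```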

We prove Proposition 47:

Proof. We prove the SML relationship. Form a portfolio consisting of asset i and the market portfolio M, where we invest the fraction of wealth φ in i and 1 − φ in M. The expected rate of return of this portfolio is
$$\mu_\phi = \phi\mu_i + (1-\phi)\mu_M$$
and the variance is
$$\sigma_\phi^2 = \phi^2\sigma_i^2 + (1-\phi)^2\sigma_M^2 + 2\phi(1-\phi)\,\mathrm{cov}(i, M)\,.$$

As a function of φ, the pair $(\sigma_\phi, \mu_\phi)$ traces out a curve in risk-return space. The curve cannot cross the CML; this would violate the property that the CML is an efficient boundary of the feasible region. Hence, as φ passes through zero, the curve traced out by $(\sigma_\phi, \mu_\phi)$ must be tangent to the CML at M, the point where φ = 0. In other words, the slopes of the CML and of the curve at the point M must be equal. Calculating the slopes and setting them equal implies
$$\mu_i - \mu_0 = \beta_i(\mu_M - \mu_0)\,. \tag{7.17}$$

To check this:
$$\frac{d\sigma_\phi}{d\phi}\Big|_{\phi=0} = \frac{\mathrm{cov}(i,M) - \sigma_M^2}{\sigma_M}\,.$$
Next,
$$\frac{d\mu_\phi}{d\sigma_\phi}\Big|_{\phi=0} = \frac{\frac{d\mu_\phi}{d\phi}\big|_{\phi=0}}{\frac{d\sigma_\phi}{d\phi}\big|_{\phi=0}} = \frac{(\mu_i - \mu_M)\,\sigma_M}{\mathrm{cov}(i,M) - \sigma_M^2}\,.$$
Since this slope at φ = 0 should equal the slope of the CML, we have
$$\frac{(\mu_i - \mu_M)\,\sigma_M}{\mathrm{cov}(i,M) - \sigma_M^2} = \frac{\mu_M - R_f}{\sigma_M}\,.$$
Solving for the expected return of asset i proves the claim.
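Solving the last equality for $\mu_i$ gives the familiar β-form, which can be verified numerically; the market and asset moments below are illustrative assumptions.

```python
# Verify that solving the tangency condition for mu_i reproduces the SML:
# mu_i = R_f + beta_i * (mu_M - R_f), with beta_i = cov(i, M) / sigma_M^2.
mu_m, sigma_m, r_f = 0.08, 0.15, 0.02    # illustrative market moments
cov_im = 0.027                           # cov of asset i with the market

beta_i = cov_im / sigma_m ** 2
mu_i_sml = r_f + beta_i * (mu_m - r_f)

# Tangency condition:
# (mu_i - mu_M) * sigma_M / (cov - sigma_M^2) = (mu_M - R_f) / sigma_M
lhs = (mu_i_sml - mu_m) * sigma_m / (cov_im - sigma_m ** 2)
rhs = (mu_m - r_f) / sigma_m
print(round(beta_i, 3), round(mu_i_sml, 4), abs(lhs - rhs) < 1e-9)
```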

We prove Proposition 52:

Proof.

We prove Proposition 57:

Proof. The proof uses the Separating Hyperplane Theorem. A hyperplane can be written in the form
$$\langle x, a\rangle = d\,.$$
Linear subspaces of $\mathbb{R}^n$ are kernels of linear maps F; the Riesz-Fischer theorem implies
$$F(x) = \langle x, a\rangle = 0$$
on the kernel. Since each affine subspace is representable by F(x) = d, we define:

Definition 99. The hyperplane $H_x$ through the vector x is defined by
$$H_x = \{a \in \mathbb{R}^n \,|\, \langle x, a\rangle = d\}\,, \tag{7.18}$$
and the half spaces $H_x^{+,-}$ are defined by
$$H_x^{-} = \{a \in \mathbb{R}^n \,|\, \langle x, a\rangle \le d\}\,, \qquad H_x^{+} = \{a \in \mathbb{R}^n \,|\, \langle x, a\rangle \ge d\}\,. \tag{7.19}$$

Let U, V be two subsets of $\mathbb{R}^n$. The hyperplane H separates the sets U, V ⟺ U and V lie in different half spaces. The hyperplane H separates the sets U, V strictly ⟺ H separates the sets and neither set intersects H.

Proposition 100 (Separating hyperplane theorem). Let C and K be two disjoint and convex subsets of $\mathbb{R}^n$. Let C be compact and K be closed. Then there exists a hyperplane H which separates C and K strictly.
The compactness of one set is necessary: consider
$$U = \{(x,y) \in \mathbb{R}^2 \,|\, x > 0,\ y \ge \tfrac{1}{x}\}\,, \qquad V = \{(x,y) \in \mathbb{R}^2 \,|\, x > 0,\ y \le -\tfrac{1}{x}\}\,.$$
The sets are disjoint, convex and closed, but neither is compact, and they cannot be strictly separated: their distance is zero.
We show that there exists a hyperplane through the midpoint $z_0 = \frac{1}{2}(x_0 + y_0)$, perpendicular to the segment $\overline{y_0 x_0}$, which intersects neither C nor K. Let
$$d(C,K) = \inf_{x\in C,\, y\in K} \|x - y\|$$
be the shortest distance between C and K. Since C is compact and K is closed, a minimizing pair $x_0 \in C$, $y_0 \in K$ exists, i.e.
$$d(C,K) = \|x_0 - y_0\| > 0\,.$$

Figure 7.1: The hyperplane separates the two convex sets A and B in $\mathbb{R}^2$. A set is convex if any line segment with start and end point in the set remains fully in the set.

Let $H_{x_0}$ be the hyperplane through $x_0$ which is perpendicular to the line $\overline{y_0 x_0}$. We write $H_{x_0}$ as follows:
$$H_{x_0} = \{z \in \mathbb{R}^n \,|\, \langle y_0 - x_0, z - x_0\rangle = 0\}\,.$$
For $x \in C$, the function $\varphi(\lambda)$ measures the distance between $y_0$ and the point $x_0 + \lambda(x - x_0)$:
$$\varphi(\lambda) := \|y_0 - (x_0 + \lambda(x - x_0))\|^2 = \langle y_0 - x_0, y_0 - x_0\rangle - 2\lambda\langle y_0 - x_0, x - x_0\rangle + \lambda^2\langle x - x_0, x - x_0\rangle\,.$$
This function is continuously differentiable, and since C is convex and $x_0$ is closest to $y_0$, we have $\varphi(\lambda) \ge \varphi(0)$ for all $\lambda \in [0,1]$. Therefore, $\varphi'(\lambda) = -2\langle y_0 - x_0, x - x_0\rangle + 2\lambda\langle x - x_0, x - x_0\rangle$ and
$$\varphi'(0) = -2\langle y_0 - x_0, x - x_0\rangle \ge 0\,,$$
i.e.
$$\langle y_0 - x_0, x - x_0\rangle \le 0\,, \quad \forall x \in C\,.$$
In the same way one shows for $H_{y_0}$ the inequality
$$\langle x_0 - y_0, y - y_0\rangle \le 0\,, \quad \forall y \in K\,.$$
Since for all $y \in K$
$$\langle y_0 - x_0, y - x_0\rangle = \langle y_0 - x_0, y_0 - x_0\rangle + \langle y_0 - x_0, y - y_0\rangle \ge \|y_0 - x_0\|^2 > 0\,,$$
it follows that $H_{x_0}$ separates the sets C and K, and the same is true for $H_{y_0}$. Therefore $H_{z_0}$, the parallel hyperplane through the midpoint $z_0$, separates the sets strictly.

We prove Proposition 57.

Proof. ⇒. Let ψ be a vector with strictly positive components. We first claim that ψ is a state vector iff each attainable payoff V = Pφ satisfies $\langle \psi, V\rangle = \langle S_0, \phi\rangle$ (we omit the time index T). Indeed, V = Pφ implies
$$\langle \psi, V\rangle = \langle \psi, P\phi\rangle = \langle P'\psi, \phi\rangle\,.$$
If ψ is a state vector, $S_0 = P'\psi$, and hence $\langle \psi, V\rangle = \langle S_0, \phi\rangle$ follows. Conversely, if $\langle \psi, V\rangle = \langle S_0, \phi\rangle$ for all φ, then
$$\langle P'\psi, \phi\rangle = \langle S_0, \phi\rangle\,,$$
i.e. $S_0 = P'\psi$. This proves the claim.

Hence, for each attainable payoff V = Pφ the identity $\langle \psi, V\rangle = \langle S_0, \phi\rangle$ holds. Therefore, if all components of V are positive, $\langle S_0, \phi\rangle \ge 0$ also holds, i.e. arbitrage is not possible.

⇐. We set
$$M = \{(x, x_{K+1}) \in \mathbb{R}^{K+1} \,|\, x = P\phi\,,\ x_{K+1} = -\langle S_0, \phi\rangle = -V_0\}$$
and
$$K = \{x \in \mathbb{R}^{K+1} \,|\, x_i \ge 0\,,\ \textstyle\sum_i x_i = 1\}\,.$$

M is an augmented space of payoffs: it consists of all payoffs at date T together with the price $-\langle S_0, \phi\rangle$ of the portfolio at time zero. K is a simplex. M is a convex and closed set and K is compact. Since the compact set lies in the positive orthant, the definition of no arbitrage implies that M and K are disjoint. The Separation Theorem then applies:

There exists a vector $z \in \mathbb{R}^{K+1}$ such that $\langle z, x\rangle < b < \langle z, y\rangle$ for all $x \in M$, $y \in K$.

Since M is a linear space, these inequalities can only hold if $z \in M^\perp$. But then b > 0. Since also $\langle z, y\rangle > b > 0$ for $y \in K$, all components of the vector z are strictly positive. This allows us to define the state price density as $\psi_k := \frac{z_k}{z_{K+1}}$; then ψ solves $S_0 = P'\psi$. To prove this, recall that $z \in M^\perp$ and therefore for each strategy vector $\phi \in \mathbb{R}^N$:
$$0 = \langle z_{K+1}\psi, P\phi\rangle - z_{K+1}\langle S_0, \phi\rangle = z_{K+1}\big(\langle P'\psi, \phi\rangle - \langle S_0, \phi\rangle\big)\,.$$
Therefore, $\langle P'\psi, \phi\rangle = \langle S_0, \phi\rangle$, i.e. $P'\psi = S_0$, holds for all strategies φ. This proves the claim.
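The state price relation $S_0 = P'\psi$ can be illustrated with a small two-asset, two-state market. The payoff matrix and prices below are assumptions chosen so that strictly positive state prices exist:

```python
# Two states (rows of P), two assets (columns): V = P phi are attainable payoffs.
P = [[1.0, 2.0],    # state-1 payoffs of (bond, stock)
     [1.0, 0.5]]    # state-2 payoffs
S0 = [0.95, 1.10]   # asset prices today; S0 = P' psi defines state prices psi

# Solve the 2x2 system P' psi = S0 by Cramer's rule.
a, b = P[0][0], P[1][0]   # bond payoffs across states
c, d = P[0][1], P[1][1]   # stock payoffs across states
det = a * d - c * b
psi = [(S0[0] * d - S0[1] * b) / det, (S0[1] * a - S0[0] * c) / det]

# Strictly positive state prices => every attainable payoff is priced
# consistently: <psi, P phi> = <S0, phi> for any strategy phi.
phi = [1.0, 1.0]
V = [sum(P[k][i] * phi[i] for i in range(2)) for k in range(2)]
lhs = sum(psi[k] * V[k] for k in range(2))
rhs = sum(S0[i] * phi[i] for i in range(2))
print([round(p, 4) for p in psi], round(lhs, 4), round(rhs, 4))
```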

We prove the Riesz-Fischer Proposition 61 in finite dimension. That is, we prove:



Theorem 101 (Riesz). Let X be a Hilbert space and $p: X \to \mathbb{R}$ a linear map. Then there exists a vector $r^* \in X$, the Riesz kernel, such that
$$p(x) = \langle r^*, x\rangle$$
for all $x \in X$.

Proof. We first recall some facts from linear algebra and projection geometry. Let M and M′ be subspaces of $\mathbb{R}^n$. M′ is a complement of M iff each vector $x \in \mathbb{R}^n$ can be written as the sum of two vectors $z \in M$ and $z' \in M'$, i.e.
$$x = z + z'\,,$$
and we write $\mathbb{R}^n = M \oplus M'$. Equivalently, M′ is a complement of M iff M and M′ have only the zero vector in common and the two spaces span $\mathbb{R}^n$, i.e.
$$\dim M + \dim M' = \dim \mathbb{R}^n = n\,.$$
For a linear subspace M of $\mathbb{R}^n$ we define the orthogonal complement $M^\perp$:
$$M^\perp = \{x \in \mathbb{R}^n \,|\, \langle x, y\rangle = 0\,,\ \forall y \in M\}\,.$$


Two vectors x, y are orthogonal iff $\langle x, y\rangle = 0$, where the norm $\|\cdot\| = \sqrt{\langle \cdot, \cdot\rangle}$ is induced by the scalar product. Considering the orthogonal decomposition of $\mathbb{R}^n$ into $M \oplus M^\perp$, the vector $y \in M$ defined by
$$x = y + y'\,, \quad y' \in M^\perp\,,$$
is the orthogonal projection of x onto M: $y = P_M x$. This vector has minimal distance to x, i.e.
$$y = \arg\min_{w\in M} \|x - w\|^2\,, \quad \text{if } x = y + y'\,,\ y' \in M^\perp\,,\ y \in M\,.$$

The kernel and the image of a linear map $f: \mathbb{R}^n \to \mathbb{R}^m$ are defined as follows:
$$\ker f := \{x \in \mathbb{R}^n \,|\, f(x) = 0\} \subset \mathbb{R}^n$$
and
$$\mathrm{im}\, f := \{y \in \mathbb{R}^m \,|\, y = f(x)\,,\ x \in \mathbb{R}^n\} \subset \mathbb{R}^m\,.$$
The dimension formula
$$\dim \mathbb{R}^n = \dim \ker f + \dim \mathrm{im}\, f$$
holds.

We now prove the theorem, writing l for the linear map p and M for the space X. If l ≡ 0, we set $r^* = 0$. Suppose that l ≠ 0. Since $\mathrm{im}\, l \subset \mathbb{R}$, we have
$$\dim M = \dim \ker l + \dim \mathrm{im}\, l = \dim \ker l + 1 = \dim \ker l + \dim(\ker l)^\perp\,.$$

Since the kernel is a subspace, it follows that $\dim(\ker l)^\perp = 1$. Let $e \in M$ be a basis vector of $(\ker l)^\perp$. We decompose a vector $y \in M$ as
$$y = y' + \lambda e\,, \quad y' \in \ker l\,,\ \lambda \in \mathbb{R}\,.$$
Since e and y′ are orthogonal,
$$\langle e, y\rangle = \langle e, y'\rangle + \lambda\langle e, e\rangle = \lambda\langle e, e\rangle$$
and therefore
$$\lambda = \frac{\langle e, y\rangle}{\langle e, e\rangle}\,.$$
For all $y \in M$ we get:
$$l(y) = l(y' + \lambda e) = l(y') + \lambda l(e) = \lambda l(e)\,,$$
where we used the linearity of l and that $y' \in \ker l$. But this implies
$$l(y) = \lambda l(e) = \frac{\langle e, y\rangle}{\langle e, e\rangle}\, l(e) = \langle \tilde e, y\rangle\,, \qquad \tilde e = \frac{l(e)\, e}{\langle e, e\rangle}\,.$$

This proves that each linear functional can be represented in the claimed form by a scalar product. Uniqueness follows by taking two different representing vectors $\tilde e$ and $\tilde e'$ and showing that they have to agree.

We prove Proposition 63:

Proof. To do.

We prove Proposition 65:

Proof. To do.

We prove Proposition 69:

Proof. We first prove the direction 'SDF implies the expected return representation'. Take an SDF $M = a + b'f$ and consider an asset i with return $R_i$. The general covariance formula applied to $1 = E[M(1+R_i)]$ implies
$$E[R_i] = \frac{1}{E[M]} - 1 - \frac{1}{E[M]}\,\mathrm{cov}(M, R_i) = \frac{1}{E[M]} - 1 - \frac{1}{E[M]}\, b'\,\mathrm{cov}(f, R_i)\,.$$

For the multivariate regression of asset i on the factors,
$$R_i = \alpha_i + \beta_i' f + \epsilon_i\,,$$
the vector of betas is given by $\beta_i = C_f^{-1}\,\mathrm{cov}(f, R_i)$ with $C_f$ the factor covariance matrix. Substituting this expression into the above expected return formula we get
$$E[R_i] = \kappa + \Lambda'\beta_i\,,$$
where
$$\kappa = \frac{1}{E[M]} - 1\,, \qquad \Lambda = -\frac{1}{E[M]}\, C_f\, b\,.$$
This proves the claim.

To prove the other direction, we assume that $E[R_i] = \kappa + \Lambda'\beta_i$ holds for some scalar κ and some vector Λ for each asset i. We search for a, b such that $M = a + b'f$ is an SDF. Since
$$E[R_i] = \kappa + \Lambda'\beta_i = \kappa + \Lambda' C_f^{-1}\,\mathrm{cov}(f, R_i)\,,$$
it suffices to have $\kappa = \frac{1}{E[M]} - 1$ and $b = -E[M]\, C_f^{-1}\Lambda$. Choosing
$$b = -\frac{1}{1+\kappa}\, C_f^{-1}\Lambda\,, \qquad a = \frac{1}{1+\kappa}\big(1 + \mu_f' C_f^{-1}\Lambda\big)\,,$$
the random variable $M = a + b'f$ is such that for each asset i the equation $E[R_i] = \kappa + \Lambda'\beta_i$ holds. Therefore,
$$E[R_i] = \frac{1}{E[M]} - 1 - \frac{1}{E[M]}\, b'\,\mathrm{cov}(f, R_i)$$
holds too, and M is an SDF.

We prove the bias-variance equation 6.2 (the proof follows the Wikipedia presentation):

Proof. For any random variable X we have
$$\mathrm{Var}[X] = E[X^2] - \big(E[X]\big)^2\,.$$
Rearranging,
$$E[X^2] = \mathrm{Var}[X] + \big(E[X]\big)^2\,.$$
Since f is deterministic, $E[f] = f$. Given $y = f + \varepsilon$ and $E[\varepsilon] = 0$, it follows that $E[y] = E[f + \varepsilon] = E[f] = f$. Since $\mathrm{Var}[\varepsilon] = \sigma^2$,
$$\mathrm{Var}[y] = E[(y - E[y])^2] = E[(y - f)^2] = E[(f + \varepsilon - f)^2] = E[\varepsilon^2] = \mathrm{Var}[\varepsilon] + \big(E[\varepsilon]\big)^2 = \sigma^2\,.$$
Thus, since ε and $\hat f$ are independent, we can write



$$E\big[(y - \hat f)^2\big] = E[y^2 + \hat f^2 - 2y\hat f] = E[y^2] + E[\hat f^2] - E[2y\hat f]$$
$$= \mathrm{Var}[y] + E[y]^2 + \mathrm{Var}[\hat f] + E[\hat f]^2 - 2f E[\hat f]$$
$$= \mathrm{Var}[y] + \mathrm{Var}[\hat f] + \big(f^2 - 2f E[\hat f] + E[\hat f]^2\big)$$
$$= \mathrm{Var}[y] + \mathrm{Var}[\hat f] + \big(f - E[\hat f]\big)^2 = \sigma^2 + \mathrm{Var}[\hat f] + \mathrm{Bias}[\hat f]^2\,.$$
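The decomposition can be verified by simulation. The sketch below uses an assumed setup (true value f = 2, noise σ = 1, and a deliberately biased shrinkage estimator $\hat f = 0.8\,\bar y$ over n = 10 observations); the Monte Carlo mean squared prediction error should match $\sigma^2 + \mathrm{Var}[\hat f] + \mathrm{Bias}[\hat f]^2$.

```python
import random

random.seed(0)
f_true, sigma, n, shrink = 2.0, 1.0, 10, 0.8
trials = 100_000

mse = 0.0
for _ in range(trials):
    sample = [f_true + random.gauss(0.0, sigma) for _ in range(n)]
    f_hat = shrink * sum(sample) / n           # biased shrinkage estimator
    y_new = f_true + random.gauss(0.0, sigma)  # independent test point
    mse += (y_new - f_hat) ** 2
mse /= trials

bias2 = ((shrink - 1.0) * f_true) ** 2         # (E[f_hat] - f)^2
var_fhat = shrink ** 2 * sigma ** 2 / n        # Var[f_hat]
theory = sigma ** 2 + var_fhat + bias2
print(round(mse, 3), round(theory, 3))         # the two numbers agree closely
```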

We prove Proposition 81:

Proof. The following bounds are used over and over in statistical learning theory.

Theorem 102 (Hoeffding). Let $X_1, \ldots, X_n$ be independent bounded random variables with $X_i$ taking values in $[a_i, b_i]$. Let $S_n = \sum_{i=1}^n X_i$. Then for every $\epsilon > 0$:
$$P\big(|S_n - E(S_n)| \ge \epsilon\big) \le 2\, e^{-2\epsilon^2 / W_n^2} \tag{7.20}$$
with $W_n^2 = \sum_i (b_i - a_i)^2$.

The proof uses a technical lemma and the Chernoff bounding method.

Lemma 103. Let X be a random variable with expected value zero taking values in the interval [a, b]. For s > 0,
$$E[e^{sX}] \le e^{s^2(b-a)^2/8}\,.$$

Proof. The convexity of the exponential function implies
$$e^{sx} \le \frac{x - a}{b - a}\, e^{sb} + \frac{b - x}{b - a}\, e^{sa}\,.$$
Since E[X] = 0 and setting $p = -a/(b-a)$, it follows that
$$E[e^{sX}] \le -\frac{a}{b-a}\, e^{sb} + \frac{b}{b-a}\, e^{sa} = \big(1 - p + p\, e^{s(b-a)}\big)\, e^{-sp(b-a)}\,.$$
Setting $g(u) = -pu + \log\big(1 - p + p\, e^u\big)$, $u := s(b-a)$, we can write
$$E[e^{sX}] \le e^{g(u)}\,.$$
The function g satisfies $g(0) = g'(0) = 0$, and by differentiating twice one finds $g''(u) \le 1/4$. Taylor's theorem up to second order around zero implies for some $c \in [0, u]$ (the first two terms in the series are zero):
$$g(u) = \frac{1}{2}\, u^2 g''(c) \le \frac{u^2}{8} = \frac{s^2 (b-a)^2}{8}\,.$$

Using this lemma, we prove Hoeffding's theorem. Let X be a non-negative random variable and $\epsilon > 0$. Markov's inequality states
$$P[X \ge \epsilon] \le \frac{E[X]}{\epsilon}\,.$$
Hence for s > 0:
$$P[X \ge \epsilon] = P[e^{sX} \ge e^{s\epsilon}] \le \frac{E[e^{sX}]}{e^{s\epsilon}}\,.$$
The Chernoff method means finding a positive s such that an upper bound on a random expression is minimized:
$$P\big(S_n - E[S_n] \ge \epsilon\big) \le e^{-s\epsilon}\, E\big[e^{s\sum_i (X_i - E[X_i])}\big] = e^{-s\epsilon} \prod_i E\big[e^{s(X_i - E[X_i])}\big] \le e^{-s\epsilon} \prod_i e^{s^2 (b_i - a_i)^2/8} = e^{-s\epsilon + s^2 W_n^2/8} = e^{-2\epsilon^2 / W_n^2}\,,$$
by using first the Markov inequality, then the independence of the random variables, then the technical lemma, and finally by choosing $s = 4\epsilon/W_n^2$. This concludes the proof for $S_n - E[S_n]$. The same bound holds for $E[S_n] - S_n$, and hence the theorem follows.

We prove Proposition 85:

Proof. Let f be the function where the supremum is attained. On the event where $R_{emp}(f) - R(f) \ge \epsilon$ and $R'_{emp}(f) - R(f) \le \epsilon/2$, we have $R_{emp}(f) - R'_{emp}(f) \ge \epsilon/2$, i.e.
$$\chi_{(R_{emp}(f) - R(f) \ge \epsilon)}\; \chi_{(R'_{emp}(f) - R(f) \le \epsilon/2)} \le \chi_{(R_{emp}(f) - R'_{emp}(f) \ge \epsilon/2)}\,.$$
Taking expectations w.r.t. the ghost sample:
$$\chi_{(R_{emp}(f) - R(f) \ge \epsilon)}\; P'\big(R'_{emp}(f) - R(f) \le \epsilon/2\big) \le P'\big(R_{emp}(f) - R'_{emp}(f) \ge \epsilon/2\big)\,.$$
The inequality of Chebyshev implies:
$$P'\big(R'_{emp}(f) - R(f) > \epsilon/2\big) \le \frac{4\,\mathrm{var}(f)}{n\epsilon^2} \le \frac{1}{n\epsilon^2}\,,$$
since random variables with values in the unit interval have a variance of at most 1/4. Putting things together we have:
$$\chi_{(R_{emp}(f) - R(f) \ge \epsilon)}\Big(1 - \frac{1}{n\epsilon^2}\Big) \le P'\big(R_{emp}(f) - R'_{emp}(f) \ge \epsilon/2\big)\,.$$
Taking expectations w.r.t. the first sample proves the result.

We prove Proposition 88:

Proof. By definition of the update, we get in update k
$$\langle \theta^*, \theta^{(k)}\rangle = \langle \theta^*, \theta^{(k-1)}\rangle + y_m\langle \theta^*, x_m\rangle \ge \langle \theta^*, \theta^{(k-1)}\rangle + \gamma\,.$$
Iterating over k updates,
$$\langle \theta^*, \theta^{(k)}\rangle \ge k\gamma\,.$$
The next step is to bound the norm $\|\theta^{(k)}\|^2$. By the definition of the update, the boundedness assumption on x and k iterations we get:
$$\|\theta^{(k)}\|^2 = \|\theta^{(k-1)} + y_m x_m\|^2 \le \|\theta^{(k-1)}\|^2 + r^2 \le k r^2\,.$$
Therefore, $\langle \theta^*, \theta^{(k)}\rangle$ grows at least linearly in k while $\|\theta^{(k)}\|^2$ increases at most linearly. We consider the cosine
$$\cos(\theta^*, \theta^{(k)}) = \frac{\langle \theta^*, \theta^{(k)}\rangle}{\|\theta^{(k)}\|\, \|\theta^*\|} \ge \frac{k\gamma}{\sqrt{k r^2}\, \|\theta^*\|}\,.$$
Combining the two bounds shows that the cosine of the angle between $\theta^{(k)}$ and $\theta^*$ grows like $\sqrt{k}$. Since the cosine is bounded by one, $k \le r^2\|\theta^*\|^2/\gamma^2$, i.e. only a finite number of updates is possible.
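A minimal perceptron illustrating the finite-update guarantee on linearly separable data (the toy data set below is an assumption for illustration):

```python
# Perceptron: theta <- theta + y_m * x_m on each misclassified point.
# On linearly separable data the number of updates is finite.
data = [([2.0, 1.0], 1), ([1.0, 2.5], 1),
        ([-1.5, -1.0], -1), ([-2.0, 0.5], -1)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

theta, updates = [0.0, 0.0], 0
for _ in range(100):                       # epochs; converges much earlier
    mistakes = 0
    for x, y in data:
        if y * dot(theta, x) <= 0:         # misclassified (or on the boundary)
            theta = [t + y * xi for t, xi in zip(theta, x)]
            updates += 1
            mistakes += 1
    if mistakes == 0:
        break
print(theta, updates)
```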
Chapter 8

Appendix
AM Firm / Fund    AuM (USD bn)    52w    2y    3y    5y
Vanguard S&P 500 ETF 224 20.2 14 10.4 15.7
Vanguard 500 Inx 182 19.6 14.1 10.3 15.6
Vanguard TSM Idx, Adm 138 20.2 14 10.4 15.7
iShares:Core Instl Idx, Inst 133 20.2 14 10.4 15.7
Vanguard S&P 500 123 19.5 14 10.1 15.48
Vanguard TSM Idx;Inv 121 19.6 14.1 na na
Vanguard TSM Idx;Inst+ 116 28.1 12.6 6.7 8.1
Vanguard Tot I Stk, Ins 108 19.6 14.1 10.3 15.6
Vanguard TSM Idx, Inst 92 20.2 14 10.4 15.7
Fidelity Instl Indx, InsP 89 30.8 15.5 13.3 16.8
Vanguard Contrafund 88 28.3 12.7 6.8 8.2
Vanguard Tot I Stk, Ins 86 19.6 14.1 10.3 15.6
Vanguard TSM, Idx, ETF 85 14.3 10.5 7.7 10.8
Vanguard Wellington;Adm 85 3.5 2.75 2.3 1.9
American Tot Bd II, INV 84 24.1 15 12.1 16.4
iShares:MSCI Funds Gro, A 81 27.4 9.8 6.1 8.7
Vanguard EAFE ETF 81 3.7 2.9 2.4 2.0
Vanguard Tot BD, Adm 78 20.2 14 10.4 15.7
American 500 Index, ETF 77 12.7 9.8 6.2 9.6
Fidelity Funds Inc, A 72 20.1 14 10.4 15.7
American 500 Idx, Pr 71 15.08 8.5 4.9 7.9
Dodge Funds CIB, A 68 14.9 15.1 9.6 16.4
Vanguard Cox Stock 65 27.5 11.2 7.3 9.3
Dodge FTSE ETF 65 26.5 11.8 4.51 10

Table 8.1: Source: Lipper Performance Report. November 2017


Largest Custodians
Rank Provider Assets under custody USD bn Reference date
1 BNY Mellon 28,300 Sep 30, 2014
2 J.P. Morgan 21,000 Mar 31, 2014
3 State Street 20,996 Mar 31, 2014
4 Citi 14,700 Mar 31, 2014
5 BNP Paribas 9,447 Jun 30, 2014
6 HSBC Securities Services 6,210 Dec 31, 2013
7 Northern Trust 5,910 Sep 30, 2014
8 Societe Generale 4,915 Sep 30, 2014
9 Brown Brothers Harriman 3,800 Mar 31, 2014
10 UBS AG 3,438 Sep 30, 2014
11 SIX Securities Services 3,247 Dec 31, 2013
12 CACEIS 3,200 Dec 31, 2013

Table 8.2: Source: globalcustody.net.


Chapter 9

References
1. D. Acemoglu, A. Malekian and A. Ozdaglar, Network Security and Contagion, Journal of
Economic Theory 166, 536-585, 2016.
2. A. Acquisti, C. Taylor, and L. Wagman, The Economics of Privacy, Journal of Economic Literature 54(2), 442-492, 2016.
3. Accenture, Digital Business Era: Stretch Your Boundaries, Accenture Technology Vision
2015, 2015.
4. C. Ackermann, R. McEnally and D. Ravenscraft, The Performance of Hedge Funds: Risk,
Return, and Incentives. Journal of Finance, 833-874, 1999.
5. V. Agarwal, N.D. Daniel and N.Y. Naik, Role of Managerial Incentives and Discretion in
Hedge Fund Performance. The Journal of Finance, 64(5), 2221-2256, 2009.
6. A. Agrawal, J. Horton, N. Lacetera and E. Lyons, Digitization and the Contract Labor
Market: A Research Agenda, in A. Goldfarb, S. Greenstein and C. Tucker, Economics of
Digitization: An Agenda. National Bureau of Economic Research, 2013.
7. H. Albrecher, P. Embrechts, D. Filipović, G. W. Harrison, P. Koch, S. Loisel, P. Vanini and J. Wagner, Old-Age Provision: Past, Present, Future. European Actuarial Journal, 6(2), 287-306, 2016.
8. G.S. Amin and H.M. Kat, Hedge Fund Performance 1990-2000: Do the 'Money Machines' Really Add Value?, Journal of Financial and Quantitative Analysis, 38(2), 251-274, 2003.
9. R.M. Anderson, S.W. Bianchi and L.R. Goldberg, Determinants of Levered Portfolio Per-
formance, Forthcoming Financial Analysts Journal, UCLA at Berkeley, 2014.
10. R. Anderson and T. Moore, The Economics of Information Security, Science 314, 610-613,
2006.
11. A. Ang, Mean-Variance Investing, Lecture Notes Columbia University, ssrn.com, 2012.
12. A. Ang, Asset Management. A Systematic Approach to Factor Investing, Oxford Univer-
sity Press, 2014.
13. A. Ang, W. Goetzmann, and S. Schaefer, Evaluation of Active Management of the Nor-
wegian GPFG, Norway: Ministry of Finance, 2009. (the Professor's Report)
14. A. Ang, S. Gorovyy and G.B. Van Inwegen, Hedge Fund leverage. Journal of Financial
Economics, 102(1), 102-126, 2011.


15. A. Ang, D. Basu, M. D.Gates and V. Karir, Model Portfolios, ssrn.com, 2018.
16. A. M. Antonopoulos, Mastering Bitcoin, O'Reilly Books, New York, 2015.
17. F. Allen and D. Gale, Financial Markets, Intermediaries and Intertemporal Smoothing, J.
Pol. Econom., 105, 523-546, 1997.
18. P. Artzner, F. Delbaen, J.-M. Eber and D. Heath, Coherent Measures of Risk, Mathematical Finance, 9(3), 203-228, 1999.
19. T. Aste, Blockchain, University College London, Center for Blockchain Technologies,
preprint ssrn.com, 2016.
20. C. S. Asness, Hedge Funds: The (Somewhat Tepid) Defense, AQR, October 24, 2014.
21. C.S. Asness, How Can a Strategy Still Work if Everyone Knows About it? International
Invest Magazine, September, 2015.
22. C.S. Asness and J. Liew, The Great Divide of Market Efficiency, Institutional Investor, March 03, 2014.
23. C.S. Asness, A. Frazzini, R. Israel and T. Mokowitz, Fact, Fiction, and Value Investing,
Forthcoming, Journal of Portfolio Management, Fall 2015, 2015.
24. V. Agarwal, N. D. Daniel, and N. Y. Naik, Do Hedge Funds Manage Their Reported
Returns?, Review of Financial Studies, forthcoming, 2011.
25. V. Agarwal and N.Y. Naik, Multi-Period Performance Persistence Analysis of Hedge Funds,
JFQE, 35(03), 327-342, 2000.
26. F. Allen, J. Barth and G. Yago, Fixing the Housing Market: Financial Innovations for
the Future, Wharton School Publishing-Milken Institute Series on Financial Innovations,
Upper Saddle River, NJ: Pearson Education, 2012.
27. F. Allen and G. Yago, Financing the Futures. Market-Based Innovations for Growth.
Wharton School of Publishing and Milken Institute, 2012.
28. G.O. Aragon and J.S. Martin, A Unique View of Hedge Fund Derivatives Usage: Safeguard
or Speculation? Journal of Financial Economics, 105(2), 436-456, 2012.
29. Assenagon Asset Management, 1. Assenagon Derivatetag am See, 2013.
30. M. Avellaneda and D. Dobi, Structural Slippage of Leveraged ETFs, ssrn.com, 2012.
31. D. Avramov, R. Kosowski, N.Y. Naik and M. Teo, Hedge Funds, Managerial Skill, and
Macroeconomic Variables. Journal of Financial Economics, 99(3), 672-692, 2011.
32. Ph. Bacchetta, C. Tille and E. van Wincoop, Self-Fulfilling Risk Panics, American Economic Review 102, 3674-3700, 2013.
33. K. E. Back, Asset Pricing and Portfolio Choice Theory, Oxford University Press, 2010.
34. Bank of England, The Economics of Digital Currencies, Quarterly Bulletin, Q3, 2014.
35. D. H. Bailey, J. M. Borwein, M. L. de Prado and Q. J. Zhu, Pseudo-Mathematics and Financial Charlatanism: The Effects of Backtest Overfitting on Out-of-Sample Performance, Notices of the American Mathematical Society, 61(5), 458-471, 2014.
36. M. Baker, B. Bradley and J. Wurgler, Benchmarks as Limits to Arbitrage: Understanding
the Low-Volatility Anomaly, Financial Analysts Journal, 67(1):40-54, 2011.

37. N. Barberis and A. Shleifer, Style Investing, Journal of Financial Economics 68 (2), 181-99,
2003.
38. L. Barras, O. Scaillet, and R. Wermers, False Discoveries in Mutual Fund Performance:
Measuring Luck in Estimated Alphas, The Journal of Finance 65.1, 179-216, 2010.
39. G. Baquero and M. Verbeek, A Portrait of Hedge Fund Investors: Flows, Performance and Smart Money, ssrn.com, 2005.
40. P.A. Bares, R. Gibson and S. Gyger, Performance in the Hedge Funds Industry: An Analysis of Short- and Long-Term Persistence, The Journal of Alternative Investments, 6(3), 25-41, 2003.
41. M. Bech and R. Garratt, Central Bank Cryptocurrencies, BIS Quarterly Review, September, 55-70, 2017.
42. I. Ben-David, F. Franzoni, A. Landier and R. Moussawi, Do Hedge Funds Manipulate Stock Prices?, Fisher College of Business Working Paper Series, 2012.
43. A. Berentsen and F. Schär, Bitcoin: A Currency Here to Stay?, Swiss Finance Institute Seminar, Zurich, October, 2014.
44. A. Berentsen and F. Schär, Bitcoin, Blockchain, Kryptoassets, Universität Basel, 2017.
45. Roland Berger, FinTechs in Europe – Challenger and Partner, Zurich, November, 2016.
46. P. L. Bernstein, Wimps and Consequences, The Journal of Portfolio Management, p.1,
1999.
47. BIS, Cryptocurrencies: Looking Beyond the Hype, Bank for International Settlements, Basel, 2018.
48. Black Rock, ETF landscape: Global Handbook Q1, 2011.
49. F. Black, M. C. Jensen and M.S. Scholes, The Capital Asset Pricing Model: Some Empirical Tests, papers.ssrn.com, 1972.
50. F. Black and R. Litterman, Asset Allocation: Combining Investor Views with Market Equilibrium, Goldman Sachs Fixed Income Research Note, September, 1990.
51. R. B. Bliss and R. Steigerwald, Derivatives Clearing and Settlement: A Comparison of
Central Counterparties and Alternative Structures, Economic Perspectives, 30(4), 2006.
52. D. Blitz, Strategic Allocation to Premiums in the Equity Market, ssrn.com, 2011.
53. J.-P. Bouchaud and M. Potters, Financial Applications of Random Matrix Theory: a Short
Review, arXiv preprint arXiv:0910.1205, 2009.
54. D. Blitz, Is Rebalancing the Source of Factor Premiums?, The Journal of Portfolio Man-
agement, Summer 2015, 2015.
55. R. Boehme, N. Christin, B. Edelmann and T. Moore, Bitcoin: Economics, Technology and
Governance, Journal of Economic Perspectives, Vol. 29, 2, Spring 2015, 213-238, 2015.
56. A. Börsch-Supan, K. H. Alcser, Health, Aging and Retirement in Europe: First Results
from the Survey of Health, Ageing and Retirement in Europe. Mannheim: Mannheim
Research Institute for the Economics of Aging (MEA), 2005.
57. A. Börsch-Supan, A. Ludwig, and J. Winter, Ageing, Pension Reform and Capital Flows: A Multi-Country Simulation Model, Economica, 73(292), 625-658, 2006.
494 CHAPTER 9. REFERENCES

58. A. Börsch-Supan, M. Brandt, C. Hunkler, T. Kneip, J. Korbmacher, F. Malter and S. Zuber, Data Resource Profile: The Survey of Health, Ageing and Retirement in Europe (SHARE), International Journal of Epidemiology, dyt088, 2013.
59. C. Badertscher, J. Garay, U. Maurer, D. Tschudi and V. Zikas, But why does it Work?
A Rational Protocol Design Treatment of Bitcoin, In Annual International Conference on
the Theory and Applications of Cryptographic Techniques, Springer, Cham 34-65, 2018.
60. T. Bourgeron, E. Lezmi and T. Roncalli, Robust Asset Allocation for Robo-Advisors,
arXiv, arxiv.org/abs/1902.07449, 2018.
61. M.W. Brandt, Portfolio Choice Problems, in Y. Ait-Sahalia and L.P. Hansen (eds.), Handbook of Financial Econometrics, Volume 1: Tools and Techniques, North Holland, 269-336, 2010.
62. M. Brenner and Y. Izhakian, Asset Prices and Ambiguity: Empirical Evidence, Stern School of Business, Finance Working Paper Series, FIN-11-10, 2011.
63. R. Briand, F. Nielsen and D. Stefek, Portfolio of Risk Premia: A New Approach to Diversification, MSCI Barra Research Insights, 2009.
64. S. Browne, Reaching Goals by a Deadline: Digital Options and Continuous-Time Active
Portfolio Management, Adv. Appl. Prob. 31, 551-557, 1999.
65. S. J. Byun and B.H. Jeon, Momentum Crashes and the 52-Week High, 2018.
66. R. G. Brown, J. Carlyle, I. Grigg and M. Hearn, Corda: An Introduction, squarespace.com,
2016.
67. S.J. Brown, W. Goetzmann, R.G. Ibbotson and S.A. Ross, Survivorship Bias in Perfor-
mance Studies, Review of Financial Studies, 5(4), 553-580, 1992.
68. S.J. Brown, W. Goetzmann and R.G. Ibbotson, Offshore Hedge Funds: Survival and Performance, 1989-95, Journal of Business, 72(1), 1999.
69. S.J. Brown, W. Goetzmann and J.M. Park, Conditions for Survival: Changing Risk and
the Performance of Hedge Fund Managers and CTAs, ssrn.com, 1999.
70. B. Bruder, N. Gaussel, J.-C. Richard and T. Roncalli, Regularization of Portfolio Alloca-
tion, Lyxor White Paper Series, 10, 2013.
71. J. Bruna, Mathematics of Deep Learning, Courant Institute of Mathematical Science,
NYU, 2018.
72. C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, 2(2), 121-167, 1998.
73. A. Corbellini, Elliptic Curve Cryptography: A Gentle Introduction, webpage of A. Cor-
bellini, 2015.
74. R.J. Caballero, Macroeconomics after the Crisis: Time to Deal with the Pretense-of-
Knowledge Syndrome, Journal of Economic Perspectives, Volume 24, Number 4, Fall, 85
- 102, 2010.
75. R.J. Caballero and A. Krishnamurthy, Collective Risk Management in a Flight to Quality Episode, The Journal of Finance, 63(5), 2195-2230, 2008.
76. C. Camerer, G. Loewenstein, and D. Prelec, Neuroeconomics: How Neuroscience can Inform Economics, Journal of Economic Literature, 9-64, 2005.
77. J.Y. Campbell and L. M. Viceira, Strategic Asset Allocation: Portfolio Choice for Long-Term Investors, books.google.com, 2002.
78. C. Cao, Y. Chen, B. Liang and A.W. Lo, Can Hedge Funds Time Market Liquidity?,
Journal of Financial Economics, 109(2), 493-516, 2013.
79. M.M. Carhart, On Persistence in Mutual Fund Performance, The Journal of Finance, 52(1), 57-82, 1997.
80. Z. Cazalet and T. Roncalli, Style Analysis and Mutual Fund Performance Measurement
Revisited, Lyxor Research Paper, 2014.
81. Y. Chen, Timing Ability in the Focus Market of Hedge Funds, Journal of Investment
Management, 5(2), 66, 2007.
82. Y. Chen, Derivatives Use and Risk Taking: Evidence from the Hedge Fund industry,
Journal of Financial and Quantitative Analysis, 46(04), 1073-1106, 2011.
83. CEM Benchmarking, CEM Toronto, 2014.
84. P. Cheridito and E. Kromer, Reward-Risk Ratios, Journal of Investment Strategies, 3(1), 1-16, 2013.
85. T. Chordia, A. Goyal and A. Saretto, p-hacking: Evidence from Two Million Trading
Strategies, University of Lausanne, preprint, 2017.
86. Y. Choueifaty and Y. Coignard, Toward Maximum Diversification, Journal of Portfolio Management, 35(1), 40, 2008.
87. M.M. Christensen, On the History of the Growth Optimal Portfolio, University of Southern Denmark, Preprint, 2005.
88. J. Cochrane, Asset Pricing, Princeton University Press, 2005.
89. J. Cochrane, The Dog That Did Not Bark: A Defense of Return Predictability, Review of Financial Studies, 21(4), 1533-1575, 2008.
90. J. Cochrane, Discount Rates, Presidential Address AFA 2010, Journal of Finance, Vol
LXVI, 4, August, 2011.
91. N. Cuche-Curti, O. Sigrist and F. Boucard, Blockchain: An Introduction, Research and
Policy Notes, Swiss National Bank, 2016.
92. J. Cui, F. De Jong and E. Ponds, Intergenerational Risk Sharing within Funded Pension
Schemes. Journal of Pension Economics and Finance 10.01, 1-29, 2011.
93. C. Culp and J. Cochrane, Equilibrium Asset Pricing and Discount Factors: Overview and
Implications for Derivatives Valuation and Risk Management, Modern Risk Management:
A History. Peter Field, ed. London: Risk Books, 2003.
94. T. Dangl, O. Randl and J. Zechner, Risk Control in Asset Management: Motives and
Concepts, K. Glau et al. (eds), Innovation in Quantitative Risk Management, Springer
Proceedings in Mathematics and Statistics 99, 239-266, 2015.
95. V. DeMiguel, L. Garlappi and R. Uppal, Optimal Versus Naive Diversification: How Inefficient is the 1/N Portfolio Strategy?, Review of Financial Studies, 22(5), 1915-1953, 2009.
96. V. DeMiguel, Y. Plyakha, R. Uppal, G. Vilkov, Improving Portfolio Selection using Option-
Implied Volatility and Skewness, Forthcoming in Journal of Financial and Quantitative
Analysis, 2010.
97. M. L. de Prado, Building Diversified Portfolios that Outperform Out-of-Sample, ssrn.com, May, 2016.
98. L. Deville, Exchange Traded Funds: History, Trading, and Research, Handbook of Finan-
cial Engineering, Zopounidis, Doumpos and Pardalos (eds)., 67-99, 2007.
99. K. Daniel and T. Moskowitz, Momentum Crashes, The Q-Group: Fall Seminar, 2012.
100. K. Daniel and S. Titman, Evidence on the Characteristics of Cross Sectional Variation in Stock Returns, Journal of Finance, 52(1), 1-33, 1997.
101. Deutsche Bank, Equity Risk Premia, Deutsche Bank London, February, 2015.
102. Deutsche Bank, A New Asset Allocation Paradigm, Deutsche Bank London, July, 2012.
103. F.X. Diebold, A. Hickman, A. Inoue, and T. Schuermann, Converting 1-Day Volatility to
h-Day Volatility: Scaling by Root-h is Worse than You Think, Risk, 11, 104-107, 1998.
104. D. Dobi and M. Avellaneda, Structural Slippage of Leveraged ETFs, Preprint NYU, 2012.
105. J. Dow and S. R. d. C. Werlang, Uncertainty Aversion, Risk Aversion, and the Optimal Choice of Portfolio, Econometrica, 60(1), 197-204, 1992.
106. M. Dudler, B. Gmür and S. Malamud, Risk-Adjusted Time Series Momentum, Working
Paper, 2014.
107. S. Duivestein, M. van Doorn, T. van Manen, J. Bloem and E. van Ommeren, Design to Disrupt, Blockchain: Cryptoplatform for a Frictionless Economy, SogetiLabs, 2016.
108. F.R. Edwards and M.O. Caglayan, Hedge Fund Performance and Manager Skill, ssrn.com, 2011.
109. EFAMA, European Fund and Asset Management Association, Annual Figures 2013, 2014.
110. EFAMA, European Fund and Asset Management Association, Annual Figures 2017, 2018.
111. D. Ellsberg, Risk, Ambiguity, and the Savage Axioms, Quarterly Journal of Economics,
75, 643-669, 1961.
112. E.J. Elton and M. J. Gruber, Risk Reduction and Portfolio Size: An Analytical Solution,
Journal of Business: 415-437, 1977.
113. Ernst & Young, What's new? Innovation for Asset Management, 2012 Survey, 2012.
114. Ethereum, www.ethereum.org, 2016.
115. ETF Staff, A Short Course in Currency Overlay, etf.com, April, 1999.
116. I. Eyal and E. G. Sirer, Majority is not Enough: Bitcoin Mining is Vulnerable, International
Conference on Financial Cryptography and Data Security. Springer Berlin Heidelberg,
2014.
117. F. Fabozzi, R. J. Shiller, and R. Tunaru, Hedging Real-Estate Risk, working paper 09-12,
Yale International Center for Finance, 2009.
118. M. Faber, A Quantitative Approach to Tactical Asset Allocation. Journal of Wealth
Management 9 (4), 69 - 79, 2007.
119. E.F. Fama, The Behavior of Stock Market Prices, Journal of Business, 38, 34-101, 1965.
120. E.F. Fama, Efficient Capital Markets: A Review of Theory and Empirical Work, Journal of Finance, 25, 383-417, 1970.
121. E.F. Fama, Efficient Markets: II, Journal of Finance, 46(5), 1575-1618, 1991.
122. E. F. Fama and J. D. MacBeth, Risk, Return, and Equilibrium: Empirical Tests, Journal of Political Economy, 81(3), 607-636, 1973.
123. E.F. Fama and K. R. French, Permanent and Temporary Components of Stock Prices, Journal of Political Economy, 96(2), 246-267, 1988.
124. E.F. Fama and K.R. French, Disagreement, Tastes, and Asset Prices, Journal of Financial
Economics 83 (3), 667-89, 2007.
125. E.F. Fama and K.R. French, A Five-Factor Asset Pricing Model, Journal of Financial
Economics, 116, 1-22, 2015.
126. B. Fastrich, S. Paterlini and P. Winker, Constructing Optimal Sparse Portfolios Using
Regularization Methods, ssrn.com, 2013.
127. J. D. Fisher, D.M. Geltner, and R.B. Webb, Value Indices of Commercial Real Estate: A Comparison of Index Construction Methods, The Journal of Real Estate Finance and Economics, 9(2), 137-164, 1994.
128. T. Fletcher, Machine Learning for Financial Market Prediction, PhD Thesis University
College London, 2012.
129. A. Frazzini and L. H. Pedersen, Betting Against Beta, Journal of Financial Economics
111.1, 1-25, 2014.
130. G. Frahm and C. Memmel, Dominating estimators for minimum-variance portfolios. Jour-
nal of Econometrics, 159(2), 289-302, 2010.
131. P. Franco, Understanding Bitcoin: Cryptography, Engineering and Economics, John Wiley & Sons, 2014.
132. J. Freire, Massive Data Analysis: Course Overview, NYU School of Engineering, 2015.
133. C.B. Frey and M.A. Osborne, The Future of Employment: How Susceptible are Jobs to Computerisation?, Oxford, September, 2013.
134. W. Fung, D.A. Hsieh, N.Y. Naik and R. Ramadorai, Hedge Funds: Performance, Risk,
and Capital Formation, The Journal of Finance, 63(4), 1777-1803, 2008.
135. W. Fung and D.A. Hsieh, Empirical Characteristics of Dynamic Trading Strategies: The Case of Hedge Funds, Review of Financial Studies, 10(2), 275-302, 1997.
136. W. Gale and R. Levine, Financial Literacy: What Works? How Could it be More Effective?, Financial Security Project, Boston College, 2011.
137. J. Gatheral, Random Matrix Theory and Covariance Estimation, New York, October 3,
2008.
138. M. Gao and J. Huang, Capitalizing on Capitol Hill: Informed Trading by Hedge Fund
Managers, In Fifth Singapore International Conference on Finance, 2011.
139. D.M. Geltner, N. G. Miller, J. Clayton, and P. Eichholtz, Commercial real estate analysis
and investments (Vol. 1, p. 642). Cincinnati, OH: South-western, 2001.
140. C. R. Genovese, A Tutorial on False Discovery Control, Carnegie Mellon University, 2004.
141. D.M. Geltner and J. Fisher, Pricing and Index Considerations in Commercial Real Estate Derivatives, Journal of Portfolio Management, Special Issue: Real Estate, 1-21, 2007.
142. M. Getmansky, B. Liang, C. Schwarz and R. Wermers, Share Restrictions and Investor
Flows in the Hedge Fund Industry, Working Paper, University of Massachusetts, Amherst,
2015.
143. M. Getmansky, M.P. Lee, and A. Lo, Hedge Funds: A Dynamic Industry In Transition,
NBER, 2015.
144. G. Gigerenzer and D.G. Goldstein, Reasoning the Fast and Frugal Way: Models of Bounded Rationality, in Heuristics: The Foundations of Adaptive Behavior, eds. G. Gigerenzer, R. Hertwig and T. Pachur, New York: Oxford University Press, 31-57, 2011.
145. C. Gini, Measurement of Inequality of Incomes, The Economic Journal: 124-126, 1921.
146. P. W. Glimcher, and E. Fehr, eds. Neuroeconomics: Decision making and the brain.
Academic Press, 2013.
147. W.N. Goetzmann, J.E. Ingersoll and S.A. Ross, High-water Marks and Hedge Fund Man-
agement Contracts, Journal of Finance 58, 1685 - 1717, 2003.
148. W.N. Goetzmann and A. Kumar, Equity Portfolio Diversication, Review of Finance, Vol.
12, No. 3, 433 - 463, 2008.
149. W.N. Goetzmann and K. Rouwenhorst, The History of Financial Innovation, Carbon Finance Speaker Series at Yale, 2007.
150. S. Goldwasser and M. Bellare, Lecture Notes on Cryptography, MIT, 2008.
151. I. Goodfellow, Y. Bengio and A. Courville, Deep Learning, MIT Press, 2016.
152. A. Goyal, Empirical Cross-Sectional Asset Pricing: a Survey, Financial Markets and Port-
folio Management, 26(1), 3-38, 2012.
153. A. Goyal and N. Jegadeesh, Cross-Sectional and Time-Series Tests of Return Predictability: What Is the Difference?, Review of Financial Studies, 31(5), 1784-1824, 2018.
154. A. Goyal and S. Wahal, The Selection and Termination of Investment Management Firms by Plan Sponsors, Journal of Finance, 63, 1805-1847, 2008.
155. M. Grinblatt and S. Titman, Mutual Fund Performance: An Analysis of Quarterly Portfolio Holdings, Journal of Business, 393-416, 1989.
156. R.C. Grinold, The Fundamental Law of Active Management, The Journal of Portfolio
Management 15.3, 30-37, 1989.
157. R.C. Grinold and R.N. Kahn, Active Portfolio Management. A Quantitative Approach
for Providing Superior Returns and Controlling Risk, McGraw-Hill, Second Edition, New
York, 2000.
158. S.J. Grossman and J. E. Stiglitz, On the Impossibility of Informationally Efficient Markets, The American Economic Review, 393-408, 1980.
159. S. Gu, B. T. Kelly and D. Xiu, Empirical Asset Pricing Via Machine Learning, Chicago Booth, 2018.
160. C. R. Harvey, Y. Liu and H. Zhu, … and the Cross-Section of Expected Returns, The Review of Financial Studies, 29(1), 5-68, 2016.
161. W. Hallerbach, Disentangling Rebalancing Return, Journal of Asset Management, 15, 301-
316, 2014.
162. J. Hansen, Australian House Prices: A Comparison of Hedonic and Repeat-Sales Measures, Economic Record, 85(269), 132-145, 2009.
163. L. Hansen and T. Sargent, Robust Control and Model Uncertainty, American Economic Review, 91(2), 60-66, 2001.
164. C.R. Harvey, Y. Liu and H. Zhu, The Cross-Section of Expected Returns, Working Paper
ssrn.com, 2015.
165. J. Hasanhodzic, A. W. Lo, and E. Viola, Is It Real, or Is It Randomized?: A Financial
Turing Test, MIT Working Papers, 2010.
166. M. Hassine and T. Roncalli, Measuring Performance of Exchange Traded Funds, ssrn.com, 2013.
167. C. R. Harvey and Y. Liu, Backtesting, Journal of Portfolio Management, Volume 42,
Number 1, 13-28, 2015.
168. C. Harvey and A. Siddique, Conditional Skewness in Asset Pricing Tests, Journal of Fi-
nance, 55:1263-1295, 2000.
169. R. Haugen and A. Heins, Risk and the Rate of Return on Financial Assets: Some Old Wine in New Bottles, Journal of Financial and Quantitative Analysis, 10, 775-784, 1975.
170. S. Hayley, Diversification Returns, Rebalancing Returns and Volatility Pumping, City University London, 2015.
171. J. M. Griffin, Are the Fama and French Factors Global or Country Specific?, Review of Financial Studies, 15(3), 783-803, 2002.
172. S. Gu, B. Kelly and D. Xiu, Empirical Asset Pricing via Machine Learning, Booth School of Business, University of Chicago, July 21, 2018.
173. E. Hazan, Theoretical Machine Learning, Princeton University, 2017.
174. G. He and R. Litterman, The Intuition Behind Black-Litterman Model Portfolios, Goldman
Sachs Asset Management Working paper, 1999.
175. R.D. Henriksson and R.C. Merton, On Market Timing and Investment Performance. II.
Statistical Procedures for Evaluating Forecasting Skills, Journal of business, 513-533, 1981.
176. O.C. Herfindahl, Concentration in the Steel Industry, Diss., Columbia University, 1950.
177. U. Herold, Portfolio Construction with Qualitative Forecasts, Journal of Portfolio Man-
agement, Fall 2003, 61-72, 2003.
178. E. Hjalmarsson, Portfolio Diversification Across Characteristics, The Journal of Investing, 20(4), 2011.
179. S. Holden and J. VanDerhei, 401 (k) Plan Asset Allocation, Account Balances, and Loan
Activity in 2003, Investment Company Institute, Perspective, Vol. 6, No. 1., 2004.
180. K. Hou, C. Xue, and L. Zhang. Replicating Anomalies. No. w23394. National Bureau of
Economic Research, 2017.
181. G. Huberman and Z. Wang, Arbitrage Pricing Theory, Federal Reserve Bank of New York Staff Reports, Staff Report No. 216, 2005.
182. J. Huij and M. Verbeek, On The Use of Multifactor Models to Evaluate Mutual Fund
Performance, Financial Management, 38(1), 75-102, 2009.
183. M. Hulbert, The Prescient are Few, New York Times, July 13, 2008.
184. R.G. Ibbotson, P. Chen and K.X. Zhu, The ABCs of Hedge Funds: Alphas, Betas, and
Costs, Financial Analysts Journal, 67(1), 15-25, 2011.
185. T. Idzorek, A Step-By-Step Guide to the Black-Litterman Model, Incorporating User-Specified Confidence Levels, Working Paper, 2005.
186. T. Idzorek, and M. Kowara, Factor-Based Asset Allocation vs. Asset-Class-Based Asset
Allocation, Financial Analysts Journal, Vol. 69 (3), 2013.
187. A. Ilmanen, Expected Returns: An Investor's Guide to Harvesting Market Rewards, Wiley
Finance, 2011.
188. A. Ilmanen and J. Kizer, The Death of Diversification Has Been Greatly Exaggerated, The Journal of Portfolio Management, 38(3), 2012.
189. Investment Company Institute, Profile of Mutual Fund Shareholders, 2014, ICI Research Report, 2014.
190. T. Jaakkola, Machine Learning, MIT OpenCourseWare, MIT, 2016.
191. R. Jagannathan and T. Ma, Risk Reduction in Large Portfolios: Why Imposing the Wrong
Constraints Helps, Journal of Finance 58, 1651 - 1684, 2003.
192. R. Jagannathan, A. Malakhov and D. Novikov, Do Hot Hands Exist Among Hedge Fund
Managers? An Empirical Evaluation. The Journal of Finance, 65(1), 217-255, 2010.
193. T. Jaakkola, Machine Learning, MIT OpenCourseWare, 2016.
194. N. Jegadeesh and S. Titman, Profitability of Momentum Strategies: An Evaluation of Alternative Explanations, The Journal of Finance, 56(2), 699-720, 2001.
195. T. Jenkinson, H. Jones and J.V. Martinez, Picking winners? Investment consultants'
recommendations of fund managers, Forthcoming Journal of Finance, 2014.
196. M.C. Jensen, Some Anomalous Evidence Regarding Market Efficiency, Journal of Financial Economics, 6, 95-101, 1978.
197. H. Jiang and B. Kelly, Tail risk and Hedge Fund Returns, Chicago Booth Research Paper,
(12-44), 2012.
198. B. Johnson, A. Laszka, J. Grossklags, M. Vasek and T. Moore, Game-Theoretic Analysis
of DDoS Attacks Against Bitcoin Mining Pools, International Conference on Financial
Cryptography and Data Security. Springer Berlin Heidelberg, 2014.
199. B. Jones, Re-thinking Asset Allocation - The Role of Risk Factor Diversication, Deutsche
Bank Macro Investment Strategy, September 2011.
200. B. Jones, Rethinking Portfolio Construction and Risk Management, Deutsche Bank Macro
Investment Strategy, January 2012.
201. JP Morgan and Oliver Wyman, Unlocking Economic Advantage with Blockchain. A Guide
for Asset Managers, 2016.
202. J. Jogenfors, Key Distribution and Trust, Elliptic Curve Cryptography, Cryptography Lecture 9, Linköping University, 2014.
203. J. Jogenfors, Digital Cash and Bitcoin, Cryptography Lecture 12, Linköping University, 2014.
204. E. Jurczenko and J. Teiletche, Active Risk-Based Investing, Working Paper ssrn.com, 2015.
205. Khan Academy, https://www.khanacademy.org.
206. D. Kahneman, Thinking Fast and Slow. New York: Farrar, Straus and Giroux, 2011.
207. D. Kahneman and A. Tversky, Prospect Theory: An Analysis of Decision under Risk, Econometrica, 47, 263-291, 1979.
208. S. Kandel and R. F. Stambaugh, On the Predictability of Stock Returns: An Asset-
Allocation Perspective, The Journal of Finance, Vol LI, No. 2, 385-424, 1996.
209. H. Kaya, W. Lee and Y. Wan, Risk Budgeting with Asset Class and Risk Class Approaches, The Journal of Investing, 21(1), 2012.
210. J.L. Kelly, A New Interpretation of Information Rate, Bell System Technical Journal, 35, 917-926, 1956.
211. F. Knight, Risk, Uncertainty, and Profit, New York: Houghton Mifflin, 1921.
212. M.P. Kritzman, Puzzles of Finance: Six Practical Problems and Their Remarkable Solu-
tions, John Wiley, New York, NY, 2000.
213. R. Kunz, Asset Management, DAS in Banking and Finance, SFI, 2014.
214. Y.K. Kwok, Lecture Notes, University of Hong Kong, 2010.
215. C.H. Lanter, Institutional Portfolio Management, Swiss Finance Institute, Asset Manage-
ment Program, 2015.
216. O. Ledoit and M. Wolf, Improved Estimation of the Covariance Matrix of Stock Returns
with an Application to Portfolio Selection, Journal of Empirical Finance, 10(5), 603-621,
2003.
217. W. Lee, Advanced Theory and Methodology of Tactical Asset Allocation, Duke University,
2000.
218. W. Lee and D.Y. Lam, Implementing Optimal Risk Budgeting, The Journal of Portfolio
Management, 28, 1, 73-80, 2001.
219. O. Ledoit and M. Wolf, A Well-Conditioned Estimator for Large-Dimensional Covariance Matrices, Journal of Multivariate Analysis, 88(2), 365-411, 2004.
220. O. Ledoit and M. Wolf, Nonlinear Shrinkage of the Covariance Matrix for Portfolio Selection: Markowitz Meets Goldilocks, Review of Financial Studies, 30(12), 2017.
221. B. Lehmann and D.M. Modest, Mutual Fund Performance Evaluation: A Comparison of Benchmarks and Benchmark Comparisons, Journal of Finance, 42(2), 233-265, June, 1987.
222. M. Leippold, Resampling and Robust Portfolio Optimization, Lecture Notes University of
Zurich, 2010.
223. M. Leippold, Asset Management, Lecture Notes University of Zurich, 2011.
224. M. Leippold and R. Rüegg, Fifty Shades of Active and Index Alpha, ssrn.com, 2018.
225. E. Levina and R. Vershynin, Partial Estimation of Covariance Matrices, Probability Theory and Related Fields, 153(3-4), 405-419, 2012.
226. S. F. LeRoy and J. Werner, Principles of Financial Economics, Lecture Notes, UC Santa
Barbara and U Minnesota, 2000.
227. J. Lewellen, S. Nagel and J. Shanken, A Sceptical Appraisal of Asset Pricing Tests, Journal
of Financial Economics 96, 175-194, 2010.
228. Y. Lewenberg, Y. Bachrach, Y. Sompolinsky, A. Zohar and J. Rosenschein, Bitcoin Mining
Pools: A Cooperative Game Theoretic Analysis, Proceedings of the 2015 International
Conference on Autonomous Agents and Multiagent Systems. International Foundation for
Autonomous Agents and Multiagent Systems, 2015.
229. H. Li, X. Zhang and R. Zhao, Investing in Talents: Manager Characteristics and Hedge
Fund Performance, Journal of Financial and Quantitative Analysis, 46(01), 59-82, 2011.
230. B. Liang, Hedge Funds: The Living and the Dead. Journal of Financial and Quantitative
Analysis, 35(03), 309-326, 2000.
231. C.-Y. Lin, Big Data Analytics, Lecture Notes, Columbia University, 2015.
232. A. Lo, Data-Snooping Biases in Financial Analysis, AIMR Conference Proceedings, Vol. 1994, No. 9, Association for Investment Management and Research, 1994.
233. A. Lo, The Statistics of Sharpe Ratios, Financial Analysts Journal, 58(4), 2002.
234. A. Lo, Efficient Markets Hypothesis, The New Palgrave: A Dictionary of Economics, L. Blume, S. Durlauf, eds., 2nd Edition, Palgrave Macmillan Ltd., 2007.
235. D. Luenberger, Projection Pricing, Stanford University, researchgate.net, 2014.
236. F. Maccheroni, M. Marinacci and D. Ruffino, Alpha as Ambiguity: Robust Mean-Variance Portfolio Analysis, Econometrica, 81(3), 1075-1113, May, 2013.
237. G. Magnus, The Age of Ageing: Global Demographics, Destinies, and Coping Mechanisms,
First webcast: The Conference Board, 2013.
238. D. Mahringer, W. Pohl and P. Vanini, Structured Products: Performance, Costs and
Investments, SFI White Papers, 2015.
239. S. Maillard, T. Roncalli and J. Teiletche, On the Properties of Equally-Weighted Risk
Contributions Portfolios, ssrn.com 1271972, 2008.
240. B.G. Malkiel, The Efficient Market Hypothesis and Its Critics, Journal of Economic Perspectives, 59-82, 2003.
241. B.G. Malkiel and A. Saha, Hedge Funds: Risk and Return, Financial Analysts Journal, 61(6), 80-88, 2005.
242. L. Martellini and V. Milhau, Factor Investing: A Welfare-Improving New Investment
Paradigm or Yet Another Marketing Fad? EDHEC-Risk Institute Publication, July, 2015.
243. W. Marty, Portfolio Analytics. An Introduction to Return and Risk Measurement, Springer
Texts in Business and Economics (2nd edition), Springer Berlin, 2015.
244. J. F. May, World Population Policies: Their Origin, Evolution, and Impact, Canadian
Studies in Population 39, No. 1 - 2 (Spring/Summer 2012):125 - 34, Dordrecht: Springer,
2012.
245. McKinsey & Company, Looking Ahead in Turbulent Times - Strategic Imperatives for Asset Managers Going Forward, SFI Asset Management Education, R. Matthias, 2015.
246. McKinsey & Company, State of the Industry 2014/15 - A Perspective on Global Asset Management, SFI Asset Management Education, R. Matthias, 2015.
247. P. Mehta, M. Bukov, C.-H. Wang, A.G.R. Day, C. Richardson, C.K. Fisher and D. J. Schwab, A High-Bias, Low-Variance Introduction to Machine Learning for Physicists, Physics Reports, March, 2019.
248. Melbourne Mercer Global Pension Index, Report, 2015.
249. The Memo, Looking for a UK Business Loan? Amazon Might be the Answer, 2015.
250. The Millennial Disruption Index, Viacom Media Networks, 2013.
251. MIT, Applied Macro- and International Economics II, Spring 2016, MIT OpenCourseWare,
2016.
252. E. Moritz, The Big Four - werden Amazon, Google, Apple und Facebook die besseren
Banken?, Finance News, 2016.
253. R. C. Merton, Lifetime Portfolio Selection under Uncertainty: the Continuous-Time Case,
The Review of Economics and Statistics 51 (3): 247 - 257, 1969.
254. R. C. Merton, Optimum consumption and portfolio rules in a continuous-time model,
Journal of Economic Theory 3 (4): 373 - 413, 1971.
255. R. C. Merton, An Intertemporal Capital Asset Pricing Model, Econometrica: Journal of
the Econometric Society, 867-887, 1973.
256. R. C. Merton, On the Pricing of Corporate Debt: The Risk Structure of Interest Rates,
Journal of Finance, 29:449-470, 1974.
257. A. Meucci, Black - Litterman Approach, Encyclopedia of Quantitative Finance, Wiley
Finance, 2010.
258. A. Meucci, Fully Flexible Views: Theory and Practice, ssrn.com library, 2010b.
259. The Millennial Disruption Index, Viacom Media Networks, 2013.
260. P. Milnes, The Top 50 Hedge Funds in the World, hedgethink.com, 2014.
261. T. J. Moskowitz, Y.H. Ooi, and L. H. Pedersen, Time series momentum, Journal of Fi-
nancial Economics 104.2, 228-250, 2012.
262. A. H. Munnell, M.S. Rutledge and A. Webb, Are Retirees Falling Short? Reconciling the Conflicting Evidence, CRR WP 2014-16, November, 2014.
263. A.H. Munnell and M. Soto, State and Local Pensions are Different from Private Plans, Center for Retirement Research at Boston College, Number 1, November, 2007.
264. S. Nakamoto, Bitcoin: A Peer-to-Peer Electronic Cash System, 2008.
265. NHGRI Genome Sequencing Program (GSP), www.genome.gov/sequencingcostsdata. 2017
266. S.V. Nieuwerburgh and R.S.J. Koijen, Financial Economics, Return Predictability, and Market Efficiency, University of Tilburg, Preprint, 2007.
267. R. Novy-Marx and J. D. Rauh, Policy Options for State Pension Systems and their Impact on Plan Liabilities, Journal of Pension Economics and Finance, 10(2), 173-194, 2011.
268. OECD Science, Technology and Industry Scoreboard: Innovation for Growth, Paris, 2013.
269. S. Pafka and I. Kondor, Estimated Correlation Matrices and Portfolio Optimization, Phys-
ica A, 343, 623-634, 2004.
270. S. Pal and T.-K. L. Wong, Energy, Entropy, and Arbitrage, arXiv preprint arXiv:1308.5376, 2013.
271. A. Patton, T. Ramadorai and M. Streatfield, Change You Can Believe In? Hedge Fund Data Revisions, Journal of Finance, 2013.
272. L. Pastor, R. F. Stambaugh and L. A. Taylor, Scale and Skill in Active Management,
Journal of Financial Economics, 2014
273. L. Pastor, and R. F. Stambaugh, Comparing Asset Pricing Models: An Investment Per-
spective, Journal of Financial Economics, 56, 335-381, 2000.
274. A. F. Perold and W. F. Sharpe, Dynamic Strategies for Asset Allocation, Financial Analysts Journal, January, 16-27, 1988.
275. L. H. Pedersen, Sharpening the arithmetic of active management. Financial Analysts
Journal, 74(1), 21-36, 2018.
276. G. W. Peters, E. Panayi and A. Chapelle, Trends in Crypto-Currencies and Blockchain
Technologies: A Monetary Theory and Regulation Perspective, arXiv preprint, 2015.
277. E. Podkaminer, Risk Factors as Building Blocks for Portfolio Diversification: The Chemistry of Asset Allocation, Investment Risk and Performance, CFA Institute, 2013.
278. PricewaterhouseCoopers, Asset Management 2020: A Brave New World, 2014.
279. PricewaterhouseCoopers, Asset & Wealth Management Revolution: Embracing Exponential Change, 2018.
280. E. Qian, A Mathematical and Empirical Analysis of Rebalancing Alpha, www.ssrn.com, 2014.
281. N. Rab and R. Warnung, Scaling Portfolio Volatility and Calculating Risk Contributions in the Presence of Serial Cross-Correlations, arXiv q-fin.RM, preprint, 2011.
282. M. Rabin, Risk Aversion and Expected-Utility Theory: A Calibration Theorem, Econo-
metrica 68.5, 1281-1292, 2000.
283. T. Ramadorai, Capacity Constraints, Investor Information, and Hedge Fund Returns,
Journal of Financial Economics, 107(2), 401-416, 2013.
284. S. Ramaswamy, Market Structures and Systemic Risks of Exchange-Traded Funds, BIS, 2011.
285. S.C. Rambaud, J.G. Perez, M.A. Granero and J.E. Segovia, Markowitz Model with Euclidean Vector Spaces, European Journal of Operational Research, 196, 1245-1248, 2009.
286. R. Rebonato and A. Denev, Portfolio Management under Stress: A Bayesian Net Approach to Coherent Asset Allocation, Cambridge University Press, Cambridge, 2013.
287. L. M. Rotando and E.O. Thorp, The Kelly Criterion and the Stock Market, The American
Mathematical Monthly, December, 1992.
288. J. Rifkin, The Zero Marginal Cost Society: The Internet of Things, the Collaborative
Commons, and the Eclipse of Capitalism, Palgrave Macmillan Trade, 2014.
289. C. O. Roche, Understanding Modern Portfolio Construction, ssrn.com working paper,
2016.
290. P. Rohner, Seminar Asset Management, University of Zurich, 2014.
291. R. Roll, A Critique of the Asset Pricing Theory's Tests, Journal of Financial Economics, 4, 129-176, 1977.
292. T. Roncalli, Introduction to Risk Parity and Budgeting, Chapman & Hall, Financial Math-
ematics Series, 2014.
293. T. Roncalli, How Machine Learning Can Improve Portfolio Allocation of Robo-Advisors,
swissQuant Conference, 2018.
294. S.A. Ross, The Arbitrage Theory of Capital Asset Pricing, Journal of Economic Theory, 13, 341-360, 1976.
295. S. Satchell and A. Scowcroft, A Demystification of the Black-Litterman Model: Managing Quantitative and Traditional Portfolio Construction, Journal of Asset Management, 1(2), 138-150, 2000.
296. L. J. Savage, The Foundations of Statistics, Wiley, New York, 1954.
297. W. F. Sharpe, Capital asset prices: A theory of market equilibrium under conditions of
risk, Journal of Finance, 19 (3), 425-442, 1964.
298. B. Scherer, Portfolio Construction and Risk Budgeting, Third Edition, Risk Books, 2007.
299. SEC, Mutual Funds: A Guide for Investors, New York, 2008.
300. S. Schaefer, Factor Investing, Lecture at SFI Annual Meeting, 2015.
301. P. Schneider, Generalized Risk Premia, Journal of Financial Economics. forthcoming,
2015.
302. P. Schneider, C. Wagner and J. Zechner, Low Risk Anomalies, Preprint SFI, 2016.
303. C. Shimizu, H. Takatsuji, H. Ono, and K. Nishimura, Structural and temporal changes
in the housing market and hedonic housing price indices: A case of the previously owned
condominium market in the Tokyo metropolitan area. International Journal of Housing
Markets and Analysis, 3(4), 351-368, 2010.
304. J. Siegel, Stocks for the Long Run, McGraw-Hill, New York, NY, 1994.
305. S. Shalev-Shwartz, Introduction to Machine Learning, Lecture Notes, The Hebrew University of Jerusalem, 2016.
306. R. J. Shiller, The Use of Volatility Measures in Assessing Market Efficiency, Journal of Finance, 36, 291-304, 1981.
307. R. J. Shiller, From Efficient Markets Theory to Behavioral Finance, Journal of Economic Perspectives, 17(1), 83-104, 2003.
308. R. J. Shiller, Speculative Asset Prices, Cowles Foundation Paper No. 1424, 2014.
506 CHAPTER 9. REFERENCES
309. R. J. Shiller, Market Eciency and Role of Finance in Society, Key Note Lecture, EFA
2014, Lugano, 2014.
310. R. J. Shiller and A.N. Weiss, Home Equity Insurance, The Journal of Real Estate Finance
and Economics, 19(1): 21-47, 1999.
311. M. Silver, How to better measure hedonic residential property price indexes, IMF Working
Paper, 2018.
312. A. J. Smola and B. Schölkopf, A Tutorial on Support Vector Regression, Statistics and Computing, 14(3), 199-222, 2004.
313. Y. Sompolinsky and A. Zohar, Secure High-Rate Transaction Processing in Bitcoin, In-
ternational Conference on Financial Cryptography and Data Security. Springer Berlin
Heidelberg, 2015.
314. State Street, The Folklore of Finance, Center of Applied Research. 2014.
315. G.V.G. Stevens, On the Inverse of the Covariance Matrix in Portfolio Analysis, The Journal
of Finance, Vol. 53(5), 1821-1827, 1998.
316. R. Sullivan, A. Timmermann, and H. White, Data-snooping, Technical Trading Rule Per-
formance , and the Bootstrap, The Journal of Finance 54 (5), 1647 - 1691, 1999.
317. M. Swan, Blockchain: Blueprint for a New Economy, O'Reilly Media, 2015.
318. swissQuant, Customer Retention, Big Data Analytics, 2017.
319. J. Syz, M. Salvi and P. Vanini, Property Derivatives and Index-Linked Mortgages, Journal
of Real Estate Finance and Economics, Vol. 36, No. 1, 2008.
320. J. Syz and P. Vanini, Real Estate, Swiss Finance Institute Annual Meeting, 2008.
321. N. Sullivan, A (Relatively Easy To Understand) Primer on Elliptic Curve Cryptography, Cloudflare blog, 2013.
322. N. Szabo, Formalizing and Securing Relationships on Public Networks, First Monday, 2(9),
1997.
323. P. Tasca, Economic Foundation of the Bitcoin Economy, University College London, Center
for Blockchain Technologies, Blockchain Workshop Zurich, 2016.
324. N. Taleb, The Black Swan. The Impact of the Highly Improbable. New York: Random
House, 2010.
325. J. Teiletche, Risk-Based Investing: Myths and Realities, CFA UK Masterclass, London
June 9th, 2015.
326. J. Teiletche, Active Risk-Based Investing, CQ Asia, Hong Kong, 2014.
327. M. Teo, The Liquidity Risk of Liquid Hedge Funds, Journal of Financial Economics, 100(1),
24-44, 2011.
328. J. Ter Horst and M. Verbeek, Fund Liquidation, Self-Selection, and Look-Ahead Bias in
the Hedge Fund Industry, Review of Finance, 11(4), 605-632, 2007.
329. J. Treynor and K. Mazuy, Can Mutual Funds Outguess the Market?, Harvard Business Review, 44(4), 131-136, 1966.
330. F. Trojani and P. Vanini, A Note on Robustness in Merton's Model of Intertemporal Consumption and Portfolio Choice, Journal of Economic Dynamics and Control, 26(3), 423-435, 2002.
331. J. Tu and G. Zhou, Data-Generating Process Uncertainty: What Difference Does it Make in Portfolio Decisions?, Journal of Financial Economics, 72, 385-421, 2004.
332. S. Tilly and F. Triebel, Automobilindustrie 1945-2000, Stephanie Tilly and Florian Triebel (eds), Oldenbourg Verlag, München, 2013.
333. UBS, Strategy and Regulation. Impact of Regulation on Strategy and Execution, SFI
Conference on Managing International Asset Management, N. Karrer, 2015.
334. UBS, Distribution Strategies in Action, SFI Conference on Managing International Asset
Management, A. Benz, 2015.
335. R. Vershynin, How Close is the Sample Covariance Matrix to the Actual Covariance Matrix?, Journal of Theoretical Probability, 25(3), 655-686, 2012.
336. Viacom Media Networks, 2013.
337. L. Vignola and P. Vanini, Optimal Decision-Making with Time Diversification, Review of Finance, 6(1), 1-30, 2002.
338. I. Walter, The Asset Management Industry: Dynamics of Growth, Structure and Performance, edited by Michael Pinedo and Ingo Walter, 2013.
339. J.H. White, Volatility Harvesting: Extracting Return from Randomness, arXiv, November,
2015.
340. World Economic Forum, The Future of Long-term Investing, New York, 2011.
341. World Economic Forum, Future of Financial Services, New York, 2015.
342. World Economic Forum, Beyond Fintech: A Pragmatic Assessment Of Disruptive Potential
In Financial Services, New York, 2017.
343. A. Yeniay and A. Göktas, A Comparison of Partial Least Squares Regression with Other Prediction Methods, Journal of Mathematics and Statistics, 31, 99-111, 2002.
344. A. Zellner and V.K. Chetty, Prediction and Decision Problems in Regression Models from the Bayesian Point of View, Journal of the American Statistical Association, 60, 608-616, 1965.
345. ZKB, Index Methoden, 2013.
346. H. Zou, The Adaptive LASSO and its Oracle Properties, Journal of the American Statistical Association, 101(476), 1418-1429, 2006.
347. G. Zyskind, O. Nathan and A. Pentland, Enigma: Decentralized Computation Platform with Guaranteed Privacy, arXiv preprint, 2015.
Index

Active Investment and Benchmarking, 145
Active Portfolio Management, 25
Active versus Passive
    Sharpe's Arithmetics, 114
Altcoins, 441
Alternative Investments (AIs)
    Insurance-Linked Investments, 316
Arithmetical Relative Return (ARR), 65
Asset Class
    Definition, 11
Asset Management Industry
    Expectations 2020, 8
    Overview 2002-2015, 8
    Wealth 2020, 15
Asset Management Overview, 12
Asset Pricing
    Absolute Pricing, 235
    Fundamental Asset Pricing Equation, 238
    General Equilibrium, 236
    Good and Bad Times, 239
    Low Volatility Strategies, 266
    Multi Factor Models, 265
    Multi Period, 265
    What Happens if an Investment Strategy is Known to Everyone?, 269
Average Investment Capital (AIC), 68

Backtests
    Data Snooping, 212
    False Discovery Rate (FDR), 218
    Multiple Testing, 218
Barrier Reverse Convertibles, 103
Basis of Forwards, 29
Bayesian Approach to Estimation Risk, 175
Benchmark Return, 65
Benchmarking, 115
Beta and Volatility Based Low Risk Anomalies, 266
Bias-Variance Trade-Off, 378
Bitcoin Protocol, 427
Bitcoin Security, 443
Black and Scholes
    Call Price, 338
Black-Litterman Model, 151
Black-Scholes Equation, 342
Brinson-Hood-Beebower (BHB) Effect, 65
Broken Covered Interest Parity (CIP), 355
Buy-and-hold, static, 53

Capital Weighted Index Funds, 305
CAPM, 8
    Appraisal (Information) Ratio, 202
    Assumption, 197
    Beta Pricing Model, 196
    CML and SML, 199
    Conditional CAPM, 206
    Empirical Failure, 203
    Jensen's alpha, 202
    Performance Measurement, 201
    Proposition, 198
    Tracking Error, 201
    Treynor Ratio, 202
CAPM Cross Section, 204
CAPM Time Series, 204
Cash Flow (CF), 36
Centralized and Decentralized Architecture, 422
Cholesky Decomposition, 196
CIO Investment Process, 154
Compounding, 38
Conduct Risk, 94
Cost and Risk Function, 376
Covered Interest Parity (CIP), 351
Cross-Sectional vs Time Series Predictability, 83
CRR
    Filtration, 72
    Possible State, 71
Cryptocurrencies, 436
Currency Overlay, 350

Data Pre-Processing, 372
Definition Alpha, 24
Demography and Pension Funds, 445
Dietz Return, 68
Digital Signatures, 414, 421
Discount Factor, 37
Discount Function, 37
Discrete Logarithm, 417
Discrete Model
    Arrow-Debreu Securities, 247
    Discount Factor, 247
    First Fundamental Theorem of Finance (FFTF), 246
    Hedge Risk, 248
    No Arbitrage, 246
    Payoff Matrix, 245
    Risk Neutral Probabilities, 247
    Second Fundamental ..., 248
    State Prices, 246, 247
    Trinomial Model, 249
    Complete Market, 246
Diversification, 98
    Asset Allocation Europe, 110
    Conservative, Balanced, Dynamic, Growth Portfolios, 99
    Costs and Performance, 113
    Different Portfolio Constructions, 108
    Herfindahl Index, 106
    Needed Investment Amount, 102
    Risk Scaling, Square-root Rule, 112
    Shannon Entropy, 107
    Tasche Index, 106
    Two Statistical Propositions, 102
Drifted Weights, 53
Dynamic Investment
    Goal Based Investment (GBI), 132
    Merton Model, 130

Effective Rate, 39
Ellsberg Paradox, 126
Empirical Risk Minimization (ERM), 377
Energy Term, 63
Estimation Risk, 162
ETF
    Construction, 309
    Different Asset Classes, 312
    Leveraged ETFs (LETFs), 313
    Unfunded Swap-Based Approach, 311
Exact Factor Pricing Equation, 253
Expectation Functionals, 250
Expectation Kernel, 250
Expected Loss, 381
Expense Ratios for Actively Managed Funds, Index Funds and ETFs, 315

Factor Investing, 207
Factor Investment
    Industry Approach, 188
Factor Risk Premium, 196
False Discovery Methodology
    FDR, 332
False Discovery Rate
    FDR, 229
Fama-French
    3-Factor Model, 207
    5-Factor Model, 209
Fee Models, 6
Fork, 431
Forward Rate Agreements (FRA), 40
Forwards and Futures, 27
Frontier Returns, 252
Fund Industry
    Mutual Funds, 295
    Taxonomy of Mutual Funds, 296
    Mutual Funds and SICAVs, 293
    Overview, 291
    US Mutual Funds versus European UCITS, 294
    Active vs Passive Investments, 224
    Fees for Mutual Funds, 298
    Fundamental Law of Active Management, 226
    Skill and Luck in Mutual Fund Management, 229
    Success of the Active Strategy, 224
    TER and Performance, 298
    UCITS, 300
FX Forward, 351

Game Theoretic Concept Blockchain, 431
General Linear Model, 408
Generalization Error, 377
Geometric Margin, 394
Global AM
    2014-2020, 288
    AM versus Trading, 289
    AM versus Wealth Management, 291
    Demand and Supply Side, 284
    Eurozone, 284
    Global Figures 2007-2014, 286
Greece and EU Uncertainty, 127
Greeks and Black and Scholes, 342
Greeks, Delta, 339
Greeks, Gamma, 339
Greeks, Rho, 341
Greeks, Theta, 339
Greeks, Vega, 341
Growth of Wealth, 5
Growth Optimal Portfolios, 124
Growth Rate of Wealth, 53
Growth Rates AuM, 14

Hedge Funds
    CTA Strategy, 325
    Definition, 322
    Entries and Exits, 329
    Fees, 325
    Industry, 323
    Investment Performance, 329
    Strategies, 325
    Withdrawing Restrictions, 327
Hedonic Index, 259
Herding of Pension Funds, 281
Heuristic Models, 126

Index Construction, 302
Index Funds and ETFs, 302
Information Coefficient (IC), 226
Information Ratio IR, 156, 226
Interest Rate Parity
    Carry Trade, 354
    CIP, 352
    Covered, 351
    Trilemma, 354
    UIP, 354
    Uncovered, 351
Interest Rate Swaps (IRS), 40
Internal Rate of Return (IRR), 68
Investment Consultants, 452

Kelly Criterion, 124

Ledoit-Wolf Shrinkage, 151
Liquid TAA Replication, 351
Long Run Return and Risk, 98
Longevity and Demographics, 5
Loss Function, 175

Machine Learning, 374
    Agnostic Learning, 384
    Approximation Error, 387
    Batch Setting, 380
    Bayes Classifier, 382
    Bias-Variance Tradeoff, 379
    Classification Algorithm, 376
    Complexity, 380
    Consistency, 385
    Customer Retention: Text Mining, 404
    Dvoretzky-Kiefer-Wolfowitz Inequality, 388
    Empirical Risk Minimization, 388
    Empirical Risk Minimization (ERM), 383
    Ensemble Models, 400
    Error Function, 381
    Estimation Error, 387
    Generalization, 385
    Geometric Margin, 394
    Hypothesis Class, 376
    Inequality of Hoeffding, 388
    Linear Threshold Model, 393
    Margin, 394
    Naive Bayes Classifier, 400
    No-Free-Lunch Theorem, 386
    On-line Setting, 380
    Perceptron Rule, 393
    Probably Approximately Correct (PAC), 382
    Realizability, 380
    Sentimental Risk Model, 403
    Support Vector Machines (SVM), 395
    Symmetrization Trick, 386
    Threshold Linear Classifier, 393
    Tree Based Learning, 397
    Union Bounds, 389
    Vapnik and Chervonenkis, 389
    Vapnik and Chervonenkis Symmetrization Lemma, 386
Macro Economic Uncertainty, 127
Margins, 29
Market Evolution
    Option Trading Book, 342
Market Neutral, 183
Market Portfolio, 62
Market Price of Risk (MPR), 131
Market Weighted Indices, 114
Markowitz, 119
    Comparing Other Models, 146
    Estimation Covariance, 165
    Many Risky Assets, 134
    Mutual Fund Theorem, 138
    Principle, 119
    Principal Component Analysis (PCA), 167
    Risk-Free Asset, 139
    TAA and SAA, 144
    Tangency Portfolio, Capital Market Line, 140
Minimum Risk Value, 382
Money-Weighted Rate of Return (MWR), 67
Moore-Penrose Pseudo Inverse, 245
Mutual Distributed Ledger Technology, 421
Mutual Funds
    UCITS, 9

No Arbitrage Condition, 37
Normalized Portfolio, 52

One-Way Function, 415
Optimal Investment
    Convex and Concave Strategies, 273
    Introduction, 270
    Long-term (hedging demand), 271
    Rebalancing = Short Volatility, 59
    Rebalancing and Leverage, 69
    Rebalancing Facts, 271
    Short-term (myopic), 271
    Volatility Drag, 55
Ordinary Risk Premium, 196

Par Swap Rate, 42
Parameter Uncertainty, Estimation Risk, 163
Payoff Pricing Functional, 250
Pension Fund
    DB versus DC, 20
    Management, 21
Pension Funds, 449
    Defined Benefit (DB), 18, 19
    Defined Contribution (DC), 18, 19
    Longevity and Fertility, 19
    SAA, TAA, 23
    TAA and SAA, 24
    Technical Interest Rates, 21
    Three Pillar System, 18
Performance Attribution Tree, 66
Platform-as-a-Service (PaaS), 88
Plug-In, Estimation Risk, 175
Popularity of Markowitz Model, 146
Portfolio, 52
Portfolio Construction
    Overview, 7
Predictability
    Definition, 76
    Forecast Regression, 79
    Martingale, 76
    Return Predictability, 80
Pricing Kernel, 250
Prime Finance, 356
Principal Component Analysis, 167
Private Investors and Institutional Investors, 16
Private Markets, 320
Profit and Loss (P&L), 342
Projections AuM 2020, 15
Proof-of-Work (PoW), 426, 427
PV, FV, 37

Real Estate Equilibrium Valuation, 264
Real Estate Replication Portfolio Valuation, 264
Real Estate Risk, 258
Rebalanced, 53
Regularization, 377
Regularization Techniques, 168
Regulation
    Anti-Tax-Evasion, 95
    CIO Investment Process, 92
    Client Segmentation, 90
    Conduct Risk, 94
    Fines in UK, 95
    Hedge Fund Disclosure, 96
    Impact Swiss Banking Industry, 87
    Intermediation Channel Segmentation, 89
    Mandate Solutions, 93
    MiFID II, 88
    Overview, 86
    Product Suitability, 91
Relative Entropy Return Decomposition, 63
Relative Pricing
    Arbitrage Pricing Theory (APT), 256
Renaissance Medallion Fund, 85
Repo Transaction, 357
Return and Leverage, 68
Reward-Risk Ratio, 110
Reward-risk ratio (RR), 111
Riesz Kernel, 250
Riesz-Fischer Theorem, 250, 479
Risk Budgeting
    Budgeting Problem, 158
    Equal Risk / Risk Parity Contribution (ERC), 160
    Introduction, 157
    Risk Allocation, 158
    Risk Measurements, 157
Risk Factors
    Idiosyncratic Risk, 182
    Industry Evolution, 187
    Momentum, 184
    Quality, 183
Risk Function, 175
Risk Preferences, 121
Risk Premia, 23
Risk Weighted Index Funds, 308
Rule 2/20 for Hedge Funds, 322

Self-Financing, 52
Self-Financing Strategy, 52
Sharpe Ratio (SR), 56
Short-Term versus Long-Term Investment Horizons, 276
Shrinkage Rule, 178
Sovereign Wealth Funds (SWFs), 16
Stability Markowitz, 135
Statistical Learning Model, 376
Statistical Models, 119
Stochastic Discount Factor (SDF), 76
Stochastic Portfolio Theory (SPT), 60
Structural Volatility Arbitrage, 343
Structured Products, 334
Symmetric Cryptography, 414
Synchronization in Blockchains, 425

TAA Construction, 25
Technology
    Cryptography Examples, 429
    Different Currencies, 436
    Big Data Definition, 371
    Bitcoin, 438
    Customer-centricity, 364
    DAO Hack, 435
    Disruptive Efficiency, 364
    Ethereum, 434
    FinTech and Big Data Summary, 9
    Hierarchical Risk Parity Portfolio Construction, 409
    ICO, 444
Test Set, 376
Tikhonov Regularization, 149
Time Value of Money, 36
Time-Weighted Rate of Return (TWR), 66
Tobin Separation, 199
Tracking Error, 115, 145
Training Set, 376
Transaction Consensus, 423
Transaction Execution, 423
Transaction Feasibility, 423
Transaction Legitimization, 423
True Risk, 381
Two-Pass Regressions, 205

Uncertainty, 120
Uncovered Interest Parity, 354
Uniformity of Minds, 454
Universality, 170

Value Chain, Investment Process and Technology, 367
Vector Space E, 250
Vega, 343
Volatility Drag Equation, 56
Volatility Harvesting, 57

Warren Buffett, 85
Wealth of Nations, 13

Yap Problem, 428
Yield-to-Maturity (YtM), 38