SSRN-id2590588

Johan Perols

Associate Professor of Accounting

University of San Diego

jperols@sandiego.edu

Robert Bowen

Distinguished Professor of Accounting

University of San Diego

rbowen@sandiego.edu

Carsten Zimmermann

Associate Professor of Management

University of San Diego

zimmermann@sandiego.edu

Basamba Samba

RWTH Aachen University

basamba.samba@rwth-aachen.de

April 2016

We appreciate support from the University of San Diego and helpful comments from Darren Bernard, Nicole Cade, Ed deHaan,

Weili Ge, Jane Jollineau, Yen-Ting (Daniel) Lin, Sarah Lyon, Dawn Matsumoto, Barry Mishra,

Ted Mock, Ryan Ratcliff, Terry Shevlin, Brady Williams, and workshop participants at the

University of California, Riverside and the University of San Diego. All remaining errors are

our own.

Finding Needles in a Haystack: Using Data Analytics to Improve Fraud Prediction

Abstract: Developing models to detect financial statement fraud involves challenges related

to (i) the rarity of fraud observations, (ii) the relative abundance of explanatory variables

identified in the prior literature, and (iii) the broad underlying definition of fraud. Following

the emerging data analytics literature, we introduce and systematically evaluate three

methods to address these challenges. Results from evaluating actual cases of financial

statement fraud suggest that two of these methods improve fraud prediction performance by

approximately ten percent relative to the best current techniques. Improved fraud prediction

can result in meaningful benefits, such as improving the ability of the SEC to detect

fraudulent filings and improving audit firms' client portfolio decisions.

Key words: Financial statement fraud, Data analytics, Fraud prediction, Risk

assessment, Data rarity, Data imbalance.

Data availability: Data are available from sources identified in the text.

I. INTRODUCTION

Organizations lose an estimated 5 percent of annual revenues to fraud in general and 1.6

percent of annual revenues specifically to financial statement fraud (ACFE 2014). Further, when

resources are misallocated because of misleading financial data, fraud can harm the efficiency of

capital, labor, and product markets. Financial statement fraud (henceforth fraud) also increases

business risk. For example, audit firms can face lawsuits, reputational costs, and loss of clients;

investors and banks are more likely to make suboptimal investment and loan decisions.

Data analytics is an important emerging field in both academic research (e.g., Agarwal and

Dhar 2014; Chen, Chiang, and Storey 2012) and in practice (e.g., Brown, Chui, and Manyika

2011; LaValle, Lesser, Shockley, Hopkins, and Kruschwitz 2013).1 In the fraud context, data

analytics can, for example, be used to create fraud prediction models that help (i) auditors

improve client portfolio management and audit planning decisions and (ii) regulators and other

oversight agencies identify firms for potential fraud investigation (e.g., SEC 2015; Walter 2013).

However, the usefulness of data analytics in fraud prediction is hindered by three challenges.

First, fraud prediction is a needle in a haystack problem. That is, the relative rarity of fraud

firms compared to non-fraud2 control firms (Bell and Carcello 2000) makes fraud prediction

difficult (Perols 2011). Second, fraud prediction is complicated by the curse of data

dimensionality (Bellman 1961). The rarity of fraud observations relative to the large number of

explanatory variables identified in the fraud literature (Whiting, Hansen, McDonald, Albrecht,

1 Data analytics refers to techniques that are grounded in data mining (e.g., decision trees, artificial neural networks, and support

vector machines) and statistics (e.g., ANOVA, regression analysis, and logistic regressions) (Chen et al. 2012). Data analytics

draws from statistics, artificial intelligence, computer science, and database research. It is related to big data in that it provides

tools that enable the analysis of large datasets. Data analytics is typically focused on prediction as opposed to explanation.

2 We use the term fraud as opposed to other terminology, such as material misstatements (Dechow, Ge, Larson, and Sloan 2011)

or misreporting. The primary difference between fraud and misstatements is that fraud is intentional while misstatements can be

either intentional or errors. Further, we use the term non-fraud firms to describe all firms for which fraud has not been detected.

This primarily includes firms that have not committed fraud, but also includes undetected cases of fraud. To the extent that

undetected fraud exists in our data, noise is introduced. This noise reduces the effectiveness of all prediction models, and

methods that address this noise might further improve fraud prediction. However, this noise is not likely to bias performance

comparisons among prediction models that use the same data.


and Albrecht 2012) can result in over-fitted prediction models that perform poorly when

predicting new observations. Third, prior research generally treats all frauds as homogeneous

events. This can make fraud prediction more difficult because prediction models have to detect

patterns that are common across different fraud types (e.g., revenue vs. expense fraud).

While prior fraud detection research enhances our general understanding of fraud indicators

and prediction methods, this research rarely addresses these problems explicitly. With a primary focus on addressing these challenges, we introduce and systematically evaluate three methods grounded in data analytics research.3 The methods we examine have performed well in

other settings characterized by data rarity, such as predicting credit card fraud (e.g., Chan and

Stolfo 1998). The first method, Multi-subset Observation Undersampling (OU), addresses the

imbalance between the low number of fraud observations relative to the number of non-fraud

observations by creating multiple subsets of the original dataset that each contain all fraud

observations and different random subsamples of non-fraud observations. The second method,

Multi-subset Variable Undersampling (VU), addresses the imbalance between the low number of

fraud observations relative to the number of explanatory variables identified in the fraud literature by randomly splitting the explanatory variables into multiple non-overlapping subsets.

The third method, VU partitioned by type of fraud (PVU), is a variation of the second method

that addresses issues associated with treating all fraud cases as homogeneous events. Rather than

randomly selecting variables, we instead use our a priori knowledge to partition the variables

into subsets based on their relation to specific types of fraud (e.g., revenue vs. expense fraud).

We use a dataset with 51 fraud firms, 15,934 non-fraud firm years, and 109 explanatory

variables from prior research. We then analyze over 10,000 prediction models to systematically

3 We evaluate our results on out-of-sample data and thus perform predictive modeling. To clearly delineate our work from

explanatory models, we refer to our models as prediction models throughout the paper.


evaluate how to best implement these methods, e.g., how many data subsets to use in OU. In addition, we compare the performance of these methods to benchmarks that represent the current standard in the literature, e.g., model 2 in Dechow et al.

(2011) and simple undersampling as used in Perols (2011). To avoid biasing the results, we

evaluate prediction performance using the prediction models' probability predictions on hold-out samples.

Results indicate that including additional data subsets (up to approximately 12 subsets)

increases OU fraud prediction performance, i.e., additional subsets after 12 do not appear to further improve performance.

While results indicate that VU also has the potential to improve fraud prediction, the

performance of this method is highly dependent on the specific variables selected in the various subsets. Performance improves consistently, however, when we instead partition the independent variables into different subsets based on the type of fraud they are likely to predict,

e.g., revenue or expense fraud. This method, i.e., PVU, improves fraud prediction performance

by 9.6 percent relative to the best performing VU benchmark. Additional analyses also show

that performance can be further improved by combining OU and PVU, but only under certain conditions.

Our paper makes at least five important contributions. First, by introducing and

systematically evaluating three new methods and showing that two of these methods improve

4 We follow recent fraud data analytics research (e.g., Cecchini et al. 2010) and findings in Perols (2011) and implement all

prediction models using support vector machines. Support vector machines determine how to separate fraud firms from non-

fraud firms by finding the hyperplane that provides the maximum separation in the training data between fraud and non-fraud

firms. In additional analyses we also use logistic regression and bootstrap aggregation to examine the robustness of our results.


fraud prediction performance, we contribute to research that focuses on improving the performance of fraud prediction models. The

performance improvements from OU and PVU are large relative to other approaches for

improving prediction performance, e.g., (i) a 0.9 percent performance advantage in Dechow et al.

(2011) when two additional significant independent variables are added to their initial model and

(ii) a 2.2 percent improvement in Price, Sharp, and Wood (2011), when comparing Audit

Integrity's Accounting and Governance Risk measure to Dechow et al. (2011) model 2.5

Second, the finding that OU significantly improves prediction performance has important

methodological implications for research that evaluates the value of new explanatory variables.

This research can potentially benefit from applying OU to ensure that (i) results are robust across

different subsamples and (ii) new variables provide incremental predictive value relative to models based on existing variables.

Third, we show that the ability of VU to predict fraud improves consistently only when we

recognize that not all frauds are alike and subdivide the general fraud problem into types of

fraud. The importance of this approach likely extends beyond variable undersampling. For

example, future research could reorganize or design new fraud variables to predict a specific type of fraud.

Fourth, OU and PVU can be extended to address rarity and data dimensionality problems that arise in other prediction settings.

5 Dechow et al. (2011) do not report predictive performance and the 0.9 percent difference is based on a separate analysis that we

performed using the two models in their paper (Model 1 and Model 2). This analysis uses the same procedures used in our

material misstatement analysis described in Section IV. Price et al. (2011) compare Audit Integrity's Accounting and

Governance Risk measure, which is considered the gold standard in commercial risk measures, to Dechow et al. (2011) Model 1

using material misstatement data. Based on their results, we calculate a 3.16 percent fraud prediction performance improvement

of the commercial measure over Model 1. This implies a 2.24 percent improvement over Dechow et al. (2011) Model 2, which we

include as one of our benchmark models.


Finally, the introduction and evaluation of these methods makes an important contribution to

practice. Better prediction models can, for example, help the SEC and external auditors improve

their identification of potentially fraudulent accounting practices (Walter 2013; SEC 2015).

The remainder of the paper is organized as follows. Section II summarizes the fraud

literature, discusses data rarity, and describes how methods drawn from the data analytics

literature can be applied to fraud prediction. Section III describes the data, performance

measure, and experimental design. Section IV provides results, and section V concludes.

Prior Fraud Prediction Research

Research on financial statement fraud prediction contributes to understanding factors that can

be used to predict fraud. Prior research includes testing fraud hypotheses grounded in the

earnings management and corporate governance literatures (e.g., Beasley 1996; Dechow, Sloan,

and Sweeney 1996; Summers and Sweeney 1998; Beneish 1999; Sharma 2004; Erickson

Hanlon, and Maydew 2006; Lennox and Pittman 2010; Feng, Ge, Luo, and Shevlin 2011; Perols

and Lougee 2011; Caskey and Hanlon 2012; Armstrong, Larcker, Ormazabal, and Taylor 2013;

Markelevich and Rosner 2013). This research also evaluates the significance of a variety of

other potential explanatory variables, such as red flags emphasized in auditing standards,

discretionary accruals measures, and non-financial indicators (e.g., Loebbecke, Eining, and

Willingham 1989; Beneish 1997; Lee, Ingram, and Howard 1999; Apostolou, Hassell, and

Webber 2000; Kaminski, Wetzel, and Guan 2004; Ettredge, Sun, Lee, and Anandarajan 2008;

Jones, Krishnan, and Melendrez 2008; Brazel, Jones, and Zimbelman 2009; Dechow et al. 2011).

Varian (2014) highlights the importance of the emerging field of data analytics. He suggests

that researchers using traditional econometric methods should consider adapting recent advances

from this field. A second stream of financial statement fraud prediction research follows this


suggestion and applies developments in data analytics research to improve fraud prediction.

Early research within this stream concludes that artificial neural networks perform well relative

to discriminant analysis and logistic regressions (e.g., Green and Choi 1997; Fanning and Cogger

1998; Lin, Hwang, and Becker 2003). More recent research in this stream examines additional

classification algorithms, such as support vector machines, decision trees, and adaptive learning

methods (e.g., Cecchini et al. 2010; Perols 2011; Abbasi, Albrecht, Vance, and Hansen 2012;

Gupta and Gill 2012; Whiting et al. 2012) and text mining methods (e.g., Glancy and Yadav

2011; Humpherys, Moffitt, Burns, Burgoon, and Felix 2011; Goel and Gangolly 2012; Larcker and Zakolyukina 2012).

Data rarity is observed in diverse prediction settings, such as credit card fraud (Chan and

Stolfo 1998), auto insurance fraud (Phua, Alahakoon, and Lee 2004), bankruptcy (Shin, Lee, and

Kim 2005), and financial statement fraud (Whiting et al. 2012). Classification algorithms (e.g.,

logistic regression) have inherent difficulties in processing rarity (Weiss 2004), and data rarity is

regarded as one of the primary challenges in data analytics research (Yang and Wu 2006). Data

rarity is particularly severe in financial statement fraud detection because financial statement

fraud is characterized by both (i) relative rarity (a.k.a., the needle in the haystack problem) and

(ii) absolute rarity combined with an abundance of explanatory variables proposed in the literature.

The needle in a haystack problem. Relative rarity occurs when detected fraud observations

are a relatively small percentage of the majority non-fraud observations, e.g., approximately only

0.6 percent of all audited U.S. financial reports have been identified as fraudulent (Bell and

Carcello 2000). Relative rarity is a challenge since it forces classification algorithms to consider

a large number of potential patterns without having enough fraud observations to determine


which patterns are driven by noisy data. This increases the risk that identified patterns are based

on spurious relations in a particular sample, resulting in increased false positive rates for a given

false negative rate when the developed model is applied to a new sample (Weiss 2004). Further, classification algorithms that maximize overall accuracy tend to focus on classifying observations from the majority class correctly (e.g., Maloof 2003). To illustrate, if 99 percent of

all observations are non-fraudulent, a prediction model identifying all observations as non-fraudulent achieves an overall accuracy of 99 percent by correctly classifying 100 percent of the non-fraud observations and none of the fraud observations.
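This accuracy trap can be reproduced in a few lines. The 1 percent fraud rate and sample size below are illustrative, not the paper's data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative labels: 1% fraud (1), 99% non-fraud (0).
y_true = np.zeros(10_000, dtype=int)
y_true[rng.choice(10_000, size=100, replace=False)] = 1

# A degenerate model that predicts "non-fraud" for every observation.
y_pred = np.zeros_like(y_true)

accuracy = (y_pred == y_true).mean()            # fraction classified correctly
fraud_recall = (y_pred[y_true == 1] == 1).mean()  # fraction of frauds caught

print(accuracy)      # 0.99: looks excellent
print(fraud_recall)  # 0.0: catches no fraud at all
```

Overall accuracy is therefore a misleading objective under relative rarity, which motivates the probability-ranking evaluation used later in the paper.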

Perols (2011) takes an initial step towards addressing the relative rarity problem in a fraud

context by examining the performance of classification algorithms after undersampling the non-

fraud observations. However, while the simple undersampling method used in Perols (2011),

i.e., a method that simply removes non-fraud observations from the sample, generates more

balanced datasets, it also discards potentially useful non-fraud observations. We, therefore,

introduce a more sophisticated undersampling method that does not discard non-fraud observations. This method, Multi-subset Observation Undersampling (OU), builds on Chan and Stolfo (1998) to address relative rarity. OU uses multiple data subsets, where each

subset contains all fraud observations but different subsamples of non-fraud observations. We

specifically select OU because prior research shows that it performs well in other settings

constrained by relative rarity, such as predicting credit card fraud (e.g., Chan and Stolfo 1998).

OU is also effective compared to (i) other undersampling and oversampling methods (Nguyen,

Cooper, and Kamei 2012) and (ii) various types of bootstrap aggregation, boosting, and hybrid

ensemble data rarity methods used in the data analytics literature (Galar, Fernández,


Barrenechea, Bustince, and Herrera 2012). OU is conjectured to improve performance (e.g.,

Nguyen et al. 2012) not only because it improves the balance between minority (fraud) and

majority (non-fraud) observations, but also because it more efficiently incorporates potentially

useful majority cases. By creating multiple prediction models that are based on different non-

overlapping subsets of majority observations, each prediction model is likely to differ somewhat

from the other prediction models. Importantly, patterns that are predictive of fraud are likely to

be present in multiple subsets. However, spurious patterns that exist by random chance in

individual subsets are unlikely to also exist in other subsets. By using a combination of these

models rather than a model built using a single data set, potentially important patterns are more

likely to be identified and estimated accurately (assuming that each model has a slightly different

estimate of the pattern). Additionally, when individual models are combined, spurious patterns

are likely to be discarded (or given less weight). This decreases the risk of overfitting, i.e., that

the prediction model has good in-sample performance but does not generalize to new

observations.

When applied in the fraud setting, OU first preprocesses the model building data by dividing

the data into multiple subsets, where each subset includes all fraud observations and a random

sample of non-fraud observations selected without replacement (Figure 1). Thus, all fraud

observations are included in all subsets while each non-fraud observation is part of at most one

subset. Each subset is then used in combination with a classification algorithm to build a fraud

prediction model.6 To perform fraud prediction, each prediction model is then applied to out-of-

sample data. For each observation in the out-of-sample data, the resulting model predictions are


averaged into an overall fraud probability prediction for the observation.7 For example, if OU is

implemented with 12 subsets, the method first creates 12 subsets as described above. Each

subset is then used to build a prediction model, for a total of 12 prediction models. The

prediction models are then applied to out-of-sample data, resulting in 12 fraud probability

predictions for each observation in the out-of-sample data. The probability predictions for each

observation are then combined by taking the average of the 12 probability predictions.
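A minimal sketch of OU might look as follows; this is not the authors' implementation — the data are synthetic, scikit-learn's SVC stands in for their support vector machine setup, and the sizes (51 frauds, 20 percent fraud ratio, 12 subsets) merely mirror figures mentioned in the text:

```python
import numpy as np
from sklearn.svm import SVC

def multi_subset_ou(X_fraud, X_nonfraud, X_new, n_subsets=12, fraud_ratio=0.2, seed=0):
    """Multi-subset Observation Undersampling (OU), sketched: each subset keeps
    ALL fraud observations plus a non-overlapping random sample of non-fraud
    observations; the final score is the average of the per-subset models'
    fraud probability predictions."""
    rng = np.random.default_rng(seed)
    n_fraud = len(X_fraud)
    # Non-fraud count per subset so frauds make up `fraud_ratio` of each subset.
    n_per_subset = int(n_fraud * (1 - fraud_ratio) / fraud_ratio)
    order = rng.permutation(len(X_nonfraud))  # sampling without replacement
    probs = []
    for i in range(n_subsets):
        idx = order[i * n_per_subset:(i + 1) * n_per_subset]
        X = np.vstack([X_fraud, X_nonfraud[idx]])
        y = np.r_[np.ones(n_fraud), np.zeros(len(idx))]
        model = SVC(kernel="linear", probability=True).fit(X, y)
        probs.append(model.predict_proba(X_new)[:, 1])  # P(fraud) per observation
    return np.mean(probs, axis=0)  # combined fraud probability prediction

# Toy demonstration with synthetic, well-separated data (assumed, illustrative).
rng = np.random.default_rng(1)
X_fraud = rng.normal(2.0, 1.0, size=(51, 5))
X_nonfraud = rng.normal(0.0, 1.0, size=(4000, 5))
X_new = np.vstack([rng.normal(2.0, 1.0, size=(3, 5)),   # fraud-like
                   rng.normal(0.0, 1.0, size=(3, 5))])  # non-fraud-like
scores = multi_subset_ou(X_fraud, X_nonfraud, X_new)
print(scores.shape)  # (6,)
```

Note that each non-fraud observation appears in at most one subset, matching the non-overlapping sampling described above.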

The curse of data dimensionality problem. According to the curse of data dimensionality,

data requirements increase exponentially with the number of explanatory variables in the dataset

(Bellman 1961).8 This is a potential problem in fraud prediction because the number of known

fraud cases is small relative to the extensive number of independent variables identified in prior

fraud research. Hence, only a small number of fraud observations are available to identify

patterns among the large number of independent variables and fraud. This may result in over-

fitted prediction models that perform poorly when predicting new observations.

By using stepwise backward variable selection to develop their final model, Dechow et al. (2011) partially address the problem of data dimensionality in the fraud

context. However, while stepwise backward variable selection is designed to retain explanatory

7 This method has been found to perform well compared to more complex combiner methods (Duin and Tax 2000). Other

combiner methods, such as a Dempster-Shafer Fusion method, may be able to further improve the effectiveness of our proposed

methods; we encourage future research to examine this and other methods in more detail.

8 More specifically, when the number of explanatory variables increases, data used to fit models are spread across an increasingly

large feature space that grows exponentially with each additional explanatory variable, e.g., with one explanatory variable the

feature space is a line, with two variables the feature space is a plane, with three variables the feature space is a three-dimensional

space, etc. For example, with a dataset containing 50 fraud and 50 non-fraud observations and only one continuous explanatory

variable, the 100 observations are positioned on a line. If another variable is added, these same 100 observations are spread

across a two dimensional space. If a third variable is added, the 100 observations are spread within a three dimensional space.

For every variable that is added, the observations cover a smaller portion of the feature space. Thus, to cover a given percentage

of the feature space, the number of required observations would have to increase exponentially with the number of variables.
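The exponential growth described in this footnote can be made concrete with a small illustrative calculation (the choice of 10 points per axis is arbitrary):

```python
# To keep per-axis coverage fixed at n points per axis, a d-dimensional
# feature space needs n**d observations -- exponential growth in d.
points_per_axis = 10  # illustrative coverage requirement for one axis

required = [points_per_axis ** d for d in range(1, 6)]
print(required)  # [10, 100, 1000, 10000, 100000]
```

With only ~50 fraud observations available, even a handful of variables leaves the feature space almost entirely uncovered, which is the dimensionality problem VU targets.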

-9-

variables with the highest significance levels, it may discard potentially useful variables. We

build on Dechow et al. (2011) and introduce a new method that attempts to address the curse of data dimensionality.

To reduce the imbalance between minority fraud observations and the number of variables

identified in the literature to predict fraud, we design a new data rarity method, Multi-subset

Variable Undersampling (VU).9 VU randomly splits the set of explanatory variables without

replacement into different subsets (Figure 2). Each subset contains the same observations, but

different non-overlapping sets of explanatory variables. As with OU, each subset is then used in

combination with a classification algorithm to build a fraud prediction model that is applied to

out-of-sample data. For each observation in the out-of-sample data, the resulting model

predictions are then combined into an overall fraud probability prediction for the observation.
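The VU preprocessing step can be sketched as a random, non-overlapping column split; the 109-variable count matches the paper's dataset, while everything else below is illustrative:

```python
import numpy as np

def vu_variable_subsets(n_vars, n_subsets, seed=0):
    """Multi-subset Variable Undersampling (VU) preprocessing, sketched:
    randomly split the explanatory variables WITHOUT replacement into
    non-overlapping subsets. Every subset keeps all observations but only
    its own columns; model fitting and probability averaging then proceed
    as in OU."""
    rng = np.random.default_rng(seed)
    cols = rng.permutation(n_vars)                 # random variable order
    return [np.sort(chunk) for chunk in np.array_split(cols, n_subsets)]

# Illustrative: 109 variables split into 4 subsets.
subsets = vu_variable_subsets(109, 4)
sizes = [len(s) for s in subsets]
print(sizes)  # [28, 27, 27, 27]

# Non-overlapping and exhaustive: the subsets partition all 109 columns.
all_cols = np.sort(np.concatenate(subsets))
print(bool(np.array_equal(all_cols, np.arange(109))))  # True
```

Each column subset would then be paired with the full observation matrix to train one prediction model per subset.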

Managers commit financial statement fraud by manipulating specific accounts, e.g., they may overstate revenues or understate expenses. Many financial statement fraud variables used in the literature are inherently related to a specific type

of fraud. For example, abnormal revenue growth is a potential measure of revenue fraud while

9 In an attempt to further mitigate problems associated with having a small number of fraud observations to learn from, we

examine the usefulness of an observation oversampling method named SMOTE in fraud prediction. SMOTE was developed by

Chawla, Bowyer, Hall, and Kegelmeyer (2002) and performs well across multiple classification problems (e.g., Chawla et al.

2002; He and Garcia 2009). We perform two experiments to investigate (i) the number of fraud observations to use when

creating a new synthetic fraud observation and (ii) the oversampling ratio to use, which determines how many additional

synthetic fraud observations are generated. In the first experiment, untabulated results indicate that SMOTE only performs

significantly better than the benchmark (simple oversampling, i.e., duplication of fraud observations in the training data) in one

out of 27 comparisons. In the second experiment, we again fail to find a significant performance advantage for SMOTE relative

to simple oversampling. Finally, we implement SMOTE after partitioning the data on fraud types and find that this

implementation does not statistically differ from the original implementation of SMOTE. Based on the above results, we cannot

recommend SMOTE to address data rarity in the fraud context.
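For readers unfamiliar with SMOTE, the following is a from-scratch sketch of the interpolation idea in Chawla et al. (2002); it is not the implementation evaluated in this footnote, and the sizes are illustrative:

```python
import numpy as np

def smote(X_minority, n_synthetic, k=5, seed=0):
    """Minimal SMOTE sketch (Chawla et al. 2002): each synthetic minority
    observation is a random interpolation between a minority observation
    and one of its k nearest minority-class neighbors."""
    rng = np.random.default_rng(seed)
    n = len(X_minority)
    # Pairwise distances among minority observations only.
    d = np.linalg.norm(X_minority[:, None, :] - X_minority[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)                    # exclude self-matches
    neighbors = np.argsort(d, axis=1)[:, :k]       # k nearest minority neighbors
    synthetic = np.empty((n_synthetic, X_minority.shape[1]))
    for i in range(n_synthetic):
        a = rng.integers(n)                        # random minority observation
        b = neighbors[a, rng.integers(k)]          # one of its neighbors
        gap = rng.random()                         # interpolation weight in [0, 1)
        synthetic[i] = X_minority[a] + gap * (X_minority[b] - X_minority[a])
    return synthetic

# Oversampling-ratio example: 51 fraud firms augmented with 51 synthetic cases.
X_fraud = np.random.default_rng(2).normal(size=(51, 5))
X_syn = smote(X_fraud, n_synthetic=51)
print(X_syn.shape)  # (51, 5)
```

Because synthetic points lie on segments between real fraud observations, they stay inside the observed fraud region rather than merely duplicating it, which is SMOTE's intended contrast with simple oversampling.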


an abnormally low amount of allowance for doubtful accounts is a potential measure of expense

fraud. Although these variables may provide useful information about a specific type of fraud,

they are less likely to detect multiple types of fraud.10 When different fraud types are combined

into a binary classification problem, variables that are helpful when detecting a specific type of

fraud may be discarded if they do not do well in predicting fraud in general. For example, a

variable that provides a good signal about expense fraud but provides no useful information

about other types of fraud will only provide value when classifying expense fraud cases, which

in our sample is only about ten percent of the fraud cases. Additionally, by combining different

fraud types into a binary classification problem, the classification algorithms focus on finding

patterns common to all fraud types. Given heterogeneity among different fraud types, such common patterns may be weak or difficult to detect.

To reduce the potential negative effects associated with combining different fraud types into a single binary classification problem, we introduce a variation of VU that partitions the variables based on different fraud types (PVU).11 When implementing PVU, we place all variables that appear

to predict a specific fraud type into a separate variable subset. Variables that can be used to

predict multiple fraud types are placed in multiple subsets. This creates four subsets of variables

relating to revenue, expenses, assets, and liabilities (each subset is also restricted to fraud

observations that represent the associated fraud type). We also include three additional variable

subsets, because some fraud variables measure general attributes of fraud, such as incentives,

opportunities, or the aggregate effect of fraud. The first of these subsets includes all variables

10 Since accounting information is recorded using a double entry system, specific variables may capture the effect of multiple

fraud types.

11 Additionally, the use of multiple VU variable subsets that focus on different fraud types increases the likelihood that different

prediction models capture different fraud patterns, which improves diversity among the prediction models. Prediction model

diversity is important for performance when combining multiple models (Kittler et al. 1998). We do not modify OU based on

different fraud types because OU only undersamples the non-fraud data and does not preprocess the fraud data.


not categorized as a specific fraud type variable. The second subset includes the variables used

in Dechow et al. (2011). These variables are included for their utility in binary fraud prediction.

The third subset includes all variables and is created to allow the classifiers to find patterns

among both fraud type specific and non-fraud type specific variables.
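The seven-subset PVU partition described above might be represented as a simple mapping from subset to variable list; all variable names below are hypothetical placeholders, since the paper's actual variables appear in its Appendix A:

```python
# PVU variable partition, sketched. Variable names are hypothetical examples.
fraud_type_subsets = {
    "revenue":     ["abnormal_revenue_growth", "receivables_growth"],
    "expenses":    ["allowance_for_doubtful_accounts", "expense_ratio_change"],
    "assets":      ["asset_quality", "soft_assets"],
    "liabilities": ["deferred_revenue_change", "accrued_liabilities_change"],
}
general_subsets = {
    # Variables not categorized as a specific fraud type.
    "uncategorized": ["ceo_chairman_duality", "analyst_forecast_error"],
    # Variables from Dechow et al. (2011), kept for binary fraud prediction.
    "dechow_2011":   ["rsst_accruals", "change_in_receivables"],
    # All variables, so classifiers can mix type-specific and general signals.
    "all_variables": sorted({v for s in fraud_type_subsets.values() for v in s}
                            | {"ceo_chairman_duality", "analyst_forecast_error",
                               "rsst_accruals", "change_in_receivables"}),
}
# A variable predictive of multiple fraud types would appear in several subsets,
# and each fraud-type subset is trained only on frauds of the matching type.
print(len(fraud_type_subsets) + len(general_subsets))  # 7 subsets total
```

One model per subset is then trained and combined exactly as in VU.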

Sample Data

We obtain a sample containing 51 fraud firms12 and 15,934 non-fraud firm years from Perols

(2011). We only include one firm year for each fraud observation that corresponds to the first

year that the Accounting and Auditing Enforcement Release (AAER) alleges that fraud was

committed. We do not include previous years as the fraud may have predated the reported first

fraud year. We do not include multiple fraud years for each fraud firm to prevent a single fraud

firm from being included in both the model building dataset and the out-of-sample model

evaluation dataset.

Perols (2011) identifies fraud firms in SEC investigations reported in AAERs between 1998

and 2005 that explicitly reference Section 10(b) Rule 10b-5 (Beasley 1996) or contain

descriptions of fraud. This fraud firm dataset excludes: financial firms; firms without the first

fraud year specified in the SEC release; non-annual financial statement fraud; foreign firms;

10-KSB or IPO; and firms with missing Compustat (financial statement data), Compact D/SEC

(executive and director names, titles and company holdings), or I/B/E/S data (one-year-ahead

analyst earnings per share forecasts and actual earnings per share) in relevant years.13 Randomly

12 This sample size of 51 fraud firms is comparable to other fraud studies (e.g., Beasley 1996, Erickson et al. 2006; Brazel et al.

2009). Other research (e.g., Dechow et al. 2011) uses AAERs to create samples focused on material misstatements. Material

misstatement data include firms with AAERs that explicitly allege fraud as well as other firms that describe a material

misstatement without explicitly alleging fraud. While such samples are larger, they do not necessarily focus on fraud.

13 Since we add additional variables to the Perols (2011) dataset, some of the variables have missing values. Missing values are

replaced by global means/modes. The effect of this is a reduction in the utility of variables that have many missing values.


selected Compustat non-fraud firms (excluding observations following the applicable criteria

specified for fraud firms above) are added to the fraud firm dataset to create a sample with 0.3

percent fraud firms, which allows us to examine the robustness of the results around best

estimates of prior fraud probability, i.e., 0.6 percent (Bell and Carcello 2000), in the population

of interest. We include explanatory variables (summarized in Appendix A) that have been used

in recent literature to predict fraud or material misstatements (Cecchini et al. 2010; Dechow et al.

2011; Perols 2011). More specifically, we include all variables from Perols (2011) and all

variables from the final Dechow et al. (2011) model that can be calculated using Compustat data.

Following and extending Cecchini et al. (2010), we also include 48 variables measuring levels

and changes in levels, percentage changes in levels, and abnormal percentage changes of selected financial statement items.

Experimental Design

Overview of the experiments. As summarized in Table 1, we perform multiple experiments

to (i) determine how to best implement OU and VU (e.g., how many subsets to use) and (ii)

evaluate their relative performance compared to various benchmarks. The primary objective in

these experiments is to detect trends that indicate how to implement the methods in future

research. By detecting clear trends between the number of subsets and predictive ability rather

than selecting implementations that happen to be the most predictive, we reduce the risk that we

recommend implementations that perform well on our test data, but do not generalize well.

In experiment 1, we use OU to create observation subsets that contain all fraud observations

and random samples of non-fraud observations that yield 20 percent fraud observations per

subset. In an evaluation of simple undersampling ratios, Perols (2011) finds that this ratio

provides relatively good performance. We then evaluate how many observation subsets to

include when implementing OU. In experiment 2a, we use VU to randomly divide the variables


used in prior fraud prediction research into 20 subsets. We then assess how many variable

subsets to include when implementing VU. In experiment 2b, we examine whether the number

of variables included in each subset affects performance by dividing the total number of

variables into subsets as follows: one subset with all variables, two subsets each with one-half of

the variables, four subsets each with one-quarter, six subsets each with one-sixth, eight subsets

each with one-eighth, etc. We then evaluate how many variables per subset to include when implementing VU. Finally, we evaluate PVU, in which independent variables are grouped together based on their relation to specific types of fraud.

Because simple undersampling has previously been introduced to the fraud detection literature to reduce the imbalance between the number of fraud versus non-fraud observations, we use simple undersampling as a benchmark (Perols 2011) for OU.14 This

benchmark randomly removes non-fraud observations from the sample to generate a more

balanced model-building sample. OU and the OU benchmarks use all variables (as independent

variable reduction is examined in the VU analysis) and are implemented using support vector

machines, following recent fraud data analytics research (e.g., Cecchini et al. 2010; Perols 2011). VU is a variable selection method that has the potential to improve the performance over currently used variable selection

methods. As a baseline we include a benchmark (the Dechow benchmark) that uses the

independent variables from model 2 in Dechow et al. (2011). We also use (i) a benchmark that

randomly selects variables and (ii) a benchmark that includes all variables (the all variables

14 We also used no undersampling as an additional benchmark. However, because simple undersampling performs better than

no undersampling by 7.3 percent, we adopted simple undersampling as the benchmark.


benchmark) where data dimensionality is not reduced. The benchmark that randomly selects

variables performs better than both the Dechow benchmark and the all variables benchmark.15

Thus, we report our VU (and PVU) results using the benchmark that randomly selects variables.

VU, PVU, and their benchmarks use all observations (observation undersampling is examined in the OU analysis). We use out-of-sample rather than in-sample performance measures because they provide a more realistic measure of prediction

performance than measures commonly used in economics (Varian 2014: 7), and cross-

validation is particularly useful. We use stratified 10-fold cross-validation, where 10 folds (i.e.,

subsamples of observations) are generated using random sampling without replacement. The 10

folds rotate between being used for training and testing the prediction models. In each rotation,

nine folds are used for training (i.e., model building) and one fold is used for testing (i.e., model

evaluation). For example, in the first round, folds one through nine are used for training and fold 10 is used for testing; in round two, folds one through eight and fold 10 are used for training, and fold nine is used for testing. By using stratified cross-validation, we ensure that

the ratio of fraud to non-fraud observations is kept consistent across the training and test sets in

the different rounds. With a total of 51 fraud firms in the sample, 45 or 46 fraud firms are used

for model building and five or six fraud firms are used for model evaluation in each cross-

validation round. In our experiments, the OU and VU methods are only applied to training data.
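The stratified fold assignment described above can be sketched in a few lines. This is an illustrative implementation, not the authors' code; with 51 fraud observations, each fold receives five or six fraud firms:

```python
import random

def stratified_folds(labels, k=10, seed=0):
    """Assign observation indices to k folds so that the fraud /
    non-fraud ratio stays roughly constant in every fold (sampling
    without replacement)."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in (0, 1):  # 0 = non-fraud, 1 = fraud
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds

# Hypothetical sample: 51 fraud and 500 non-fraud firm-years.
labels = [1] * 51 + [0] * 500
folds = stratified_folds(labels, k=10)
# Each fold serves once as the test set while the other nine
# folds are used for model building.
```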

Prediction performance metric. Following prior financial statement fraud research (e.g.,

Beneish 1997; Feroz, Kwon, Pastena, and Park 2000; Lin et al. 2003), we use expected cost of

misclassification (ECM) as the preferred performance metric. ECM allows the researcher to

15 The Dechow benchmark performed 0.02 percent better than the all variables benchmark, and the random variable selection benchmark performed 3.87 percent better than the Dechow benchmark.


vary two important parameters in evaluating the prediction models' performance on out-of-

sample data: (i) estimated percentage of fraud firms in the population of interest and (ii)

estimated ratio of the cost of a false negative to the cost of a false positive in the population of

interest. Including both parameters is important in settings such as fraud prediction that are

characterized by relative rarity and uneven misclassification costs. Given specific classification results, ECM is calculated as:

ECM = P(Fraud) × (nFN / nP) × CFN + P(Non-Fraud) × (nFP / nN) × CFP,

where CFP and CFN are estimates of the cost of false positive and false negative classifications,

respectively, deflated by the lower of CFP or CFN; P(Fraud) and P(Non-Fraud) are estimates of

prior probability of fraud and non-fraud, respectively; nFP and nFN are the number of false

positive and false negative classifications, respectively, on the cross-validation test data;16 and nP

and nN are the number of fraud and non-fraud observations, respectively, in the cross-validation

test data. Bayley and Taylor (2007) estimate that actual cost ratios (FN to FP cost) average

between 20:1 and 40:1, while Bell and Carcello (2000) estimate that approximately 0.6 percent

of all firm years represent detected fraud. Thus, in experiments that compare model

prediction performance at best estimates of prior fraud probability and cost ratios, we calculate

ECM at a cost ratio of 30:1 and a prior fraud probability of 0.6 percent (together with the

prediction models' actual false positive and false negative rates). The goal of the prediction models is to minimize ECM.

16 Following prior research (e.g., Beneish 1997; Feroz, Kwon, Pastena, and Park 2000; Lin et al. 2003), nFP and nFN are obtained

using optimal fraud classification thresholds (e.g., probability cutoffs for classifying an observation as fraud or non-fraud) for

each combination of prior fraud probability and cost ratio. These optima are established by examining ECM scores using all

unique fraud probability predictions as potential thresholds.
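The ECM calculation and the threshold search described in footnote 16 can be sketched as follows. This sketch assumes the standard formulation ECM = P(Fraud) × (nFN/nP) × CFN + P(Non-Fraud) × (nFP/nN) × CFP, with costs deflated so that CFP = 1; all names and predictions are illustrative:

```python
def ecm(probs, labels, threshold, p_fraud=0.006, cost_ratio=30.0):
    """Expected cost of misclassification at one classification
    threshold.  Costs are deflated by the lower cost, so CFP = 1 and
    CFN = cost_ratio."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    n_fn = sum(1 for p, y in zip(probs, labels) if y == 1 and p < threshold)
    n_fp = sum(1 for p, y in zip(probs, labels) if y == 0 and p >= threshold)
    return (p_fraud * (n_fn / n_pos) * cost_ratio
            + (1 - p_fraud) * (n_fp / n_neg) * 1.0)

def min_ecm(probs, labels, **kwargs):
    """Try every unique predicted probability as a candidate
    threshold and keep the lowest ECM (footnote 16)."""
    return min(ecm(probs, labels, t, **kwargs) for t in set(probs))

# Hypothetical predictions for two fraud and three non-fraud cases.
probs = [0.9, 0.8, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 0, 0]
```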


1) The full sample is first separated into model-building data (a.k.a., training data) and model-

evaluation data (a.k.a., test data) using 10-fold cross-validation.

2) For each cross-validation round and OU implementation, the OU method is applied to the

training data (but not the test data, which is left intact) to partition the training data into OU

subsets. For example, in the first cross-validation round when evaluating the OU

implementation with 12 subsets, the OU method creates 12 subsets of the first training set.

3) A classification algorithm is used with each OU training subset generated in step 2 to build

one prediction model for each OU subset. For example, in OU with 12 subsets, a total of 12

prediction models are generated.

4) The test set, which was not modified using the OU method, is applied to each of the

prediction models generated in step 3.

5) For each observation in the test set, the probability predictions from each prediction model

are averaged. After combining the probability predictions, each observation in the test set

has a single probability prediction representing the average prediction of all the prediction

models developed in step 3.

6) The probability predictions along with the class labels (e.g., fraud or non-fraud) are used to

calculate ECM scores. When calculating ECM scores, optimal fraud classification thresholds

(cutoffs) are first determined for each combination of prior fraud probability and cost ratio

by examining ECM scores at different classification threshold levels (Beneish 1997).

Optimal thresholds are then used to calculate ECM scores for each combination of prior

fraud probability and cost ratio for that specific test dataset.

7) The experimental procedure repeats steps two through six for each cross validation round and

each OU implementation, e.g., OU with two subsets, OU with three subsets, etc., within each

cross validation round.

8) After completing all ten rounds, each OU implementation has ten ECM scores (one for each

test set) for each prior fraud probability and cost ratio combination. Averages of the ten ECM

scores are then used to examine prediction performance of different OU implementations and

against the benchmarks at different prior fraud probability and cost ratio levels.
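Steps 2 through 5 of the procedure above can be sketched as follows. This is an illustrative implementation (the sample sizes and function names are hypothetical), with each subset containing all fraud observations, a non-overlapping draw of non-fraud observations, and 20 percent fraud per subset:

```python
import random

def ou_subsets(fraud_idx, nonfraud_idx, n_subsets, fraud_ratio=0.2, seed=0):
    """Steps 2-3: every subset keeps all fraud observations and adds a
    distinct, non-overlapping random draw of non-fraud observations,
    sized so that fraud makes up `fraud_ratio` of the subset."""
    rng = random.Random(seed)
    pool = list(nonfraud_idx)
    rng.shuffle(pool)
    per_subset = int(round(len(fraud_idx) * (1 - fraud_ratio) / fraud_ratio))
    return [list(fraud_idx) + pool[s * per_subset:(s + 1) * per_subset]
            for s in range(n_subsets)]

def average_predictions(model_probs):
    """Step 5: average each test observation's fraud probability
    across the models built on the different OU subsets."""
    return [sum(col) / len(col) for col in zip(*model_probs)]

# Hypothetical sample: 50 fraud and 2,400 non-fraud training indices.
fraud_idx = list(range(50))
nonfraud_idx = list(range(50, 2450))
subsets = ou_subsets(fraud_idx, nonfraud_idx, n_subsets=12)
```

A classifier would then be fit on each subset, and the averaged probabilities scored against the intact test fold.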


IV. RESULTS

Main Results

Figures 4-6 summarize the performance results of different OU and VU implementations.

For each implementation, the results represent the average expected cost of misclassification

(ECM) from ten test folds. ECM is reported at the best estimates of (i) prior fraud probability,

i.e., 0.6 percent, and (ii) false negative to false positive cost ratios, i.e., 30:1. The results are

presented as the percentage difference in ECM between each OU and VU implementation and

their respective benchmarks.17 Given that each figure is plotted using a single benchmark that is

held constant across different implementations, we first use the figures to look for clear trends

that indicate how to implement OU and VU, respectively. We then compare the performance of the selected implementations against the benchmarks. Figure 4 (with details in Table 2) presents the performance results of OU relative to the best performing OU

benchmark (i.e., simple undersampling) as the number of subsets in OU increases. Our results

indicate that the benefit provided by OU initially increases as additional subsets are used but then plateaus. Figure 4 also includes the corresponding results from two sensitivity analyses, i.e., the

experiments in which the subsets are selected in a different order and the random selection of

non-fraud cases is repeated. The results across all three versions of experiment 1 are similar in

that each shows a performance benefit from using OU that initially increases in the number of

OU subsets, but starts to plateau after about 10 subsets. These results indicate that the marginal

17 Reported p-values are based on pairwise t-tests using the average and standard deviation of ECM scores across the ten test

folds and are one-tailed unless otherwise noted. Assumptions related to normality and independent observations are unlikely to

be satisfied, and p-values are only included as an indication of the relation between the magnitude and the variance of the

difference between each implementation and the respective benchmarks.


performance benefit from adding subsets declines as new subsets become less and less likely to contribute new information. Taken together, these experiments indicate that OU provides performance benefits and that

the number of subsets to include in OU is relatively consistent in the fraud setting. In an attempt

to balance performance benefits (we want to include enough subsets to make sure that we have

reached the performance plateau) with analysis costs (given that we have reached the plateau, we

want to keep the number of subsets low since adding additional subsets increases processing costs), we recommend using 12 subsets, i.e., OU(12). This configuration lowers the expected cost of misclassification in the primary analysis by 10.8 percent relative to simple undersampling.18 Turning to VU (Figure 5), we examine two versions. The dashed line shows the results when the number of variables in each

subset remains constant per experimental round (Experiment 2a). The round dotted line shows

the results when all variables are included and divided evenly across the subsets in each round (Experiment 2b).

18 OU, which uses all variables and under-sampled non-fraud firm observations (across multiple subsets), appears to improve

performance in two ways. First, simple undersampling improves performance over no undersampling by 7.3 percent. Second,

OU(12) further improves the performance over simple undersampling by another 10.8 percent. This indicates that OU improves

performance relative to the benchmarks that use all observations (i) because it undersamples observations, but more importantly

(ii) because of the way it undersamples these observations. That is, it creates multiple subsets including non-overlapping non-

fraud observations. This suggests that OU creates diverse models using different subsets. To better understand the source of this

diversity, i.e., if using different observations in the subsets allows OU to obtain more robust parameter estimates of a subset of

important variables or if different variables are emphasized in the different models, we perform an additional comparison. This

supplemental analysis indicates that OU(12) with all variables (as implemented in the paper) performs 7.0 percent better than

OU(12) with only the Dechow variables, which in turn performs 11.1 percent better than the Dechow benchmark that uses all

observations. The improvement in the Dechow benchmark when combined with OU(12) suggests that some performance benefit

is obtained by OU(12) creating more robust parameter estimates. The additional performance benefit of OU(12) with all

variables over OU(12) with only the Dechow variables (together with results in footnote 19), indicates that different models at

least partially rely on different variables. OU thus appears to improve performance by generating more robust parameter

estimates and by emphasizing different variables in different models.


<Insert Figure 5 Here>

When the number of variables is kept constant in each subset (the dashed line), the performance benefit initially increases, plateauing between 11 and 18 subsets, and then decreasing at 19

subsets), the performance difference between VU and the benchmark only approaches statistical

significance (p = 0.125 on average). In addition, the jagged line indicates that VU is sensitive to the specific variables included in each subset. When all available variables are divided into the selected subsets (the round dotted line), VU

does not provide a performance benefit relative to the random variable selection benchmark.

Consistent with the results from the analysis where the number of variables is kept constant in

each subset, these results indicate that the performance of VU is dependent on the specific

variables included in each subset. This second VU experiment also emphasizes the importance of how the variables are partitioned into subsets. The VU results discussed above suggest that a more deliberate partitioning of variables may be

important. We earlier argued that fraud consists of multiple types (e.g., revenue vs. expense

fraud) and that it might be beneficial to partition the explanatory variables with this in mind. Our

results for PVU support this conjecture. More specifically, in untabulated results, PVU lowers

the expected cost of misclassification by 9.6 percent (p = 0.019) relative to the best performing

VU benchmark.19

19 To better understand why PVU (and VU) improves performance over the benchmarks, we first note that the small performance

difference (0.02 percent) between the all variables benchmark (that uses all observations and all variables) and the Dechow

benchmark (that uses all observations and a subset of variables as selected in Dechow et al. 2011) suggests that performance does

not improve by simply adding more variables. Given that VU (as well as PVU that performs even better) improves performance

relative to the all variables benchmark by 7.2 percent, it appears that the segmentation of the variables rather than the inclusion of

additional variables contributes to the performance improvement. Additionally, because PVU performs 6.3 percent better than

VU, it appears that how the variables are segmented matters.


Additional Analyses

Further validation using misstatement data. We use the observations in a material

misstatement dataset that is an expanded version (additional years) of the data used in Dechow et

al. (2011) to perform three additional analyses. This dataset is available from the Center for

Financial Reporting and Management at the University of California, Berkeley and includes the

fraud firms used in our primary dataset as well as additional material misstatement firms reported

in AAERs by the SEC.20 Unless otherwise noted, the prediction models are implemented using

the same variables as in the main experiments (e.g., OU is implemented using all variables) and

we use the Dechow benchmark given that these data are based on Dechow et al. (2011). To

evaluate predictive performance, we again use 10-fold cross validation. Further, due to a lack of

good estimates of prior probabilities and cost ratios for material misstatements, we use a

performance metric known as area under the Receiver Operating Characteristic (ROC) curve, or simply AUC.21 The first analysis provides further validation of out-of-sample prediction performance of the

proposed methods and compares OU and PVU to the Dechow benchmark when using the

20 We exclude firms from the finance industry and, following Dechow et al. (2011), add all Compustat non-fraud firms in the

same year and industry as the fraud firms. We do, however, only include the first fraud year, i.e., we do not include multiple

years for each fraud firm, due to the potential bias introduced when including fraud firm years. We also follow the procedure

used in Dechow et al. (2011) to eliminate observations with missing values in one or more of the variables included in the

Dechow benchmark. We use mean replacement to handle missing values in the remaining variables. We also perform the

analyses reported in this section after eliminating all observations with one or more missing values. Before performing this

elimination, we remove six variables with over 25 percent missing values: abnormal change in order backlog, allowance for

doubtful accounts, allowance for doubtful accounts to accounts receivable, allowance for doubtful accounts to net sales, expected

return on pension plan assets, and change in expected return on pension plan assets.

21 While ECM is a preferred performance metric when prior probabilities and cost ratios are known, AUC is preferred over other

performance measures in settings with unknown error costs and prior probabilities (Provost, Fawcett, and Kohavi 1998). AUC

has become the de facto standard performance measure in machine learning research and has also been used in accounting

research (e.g., Larcker and Zakolyukina 2012). A single ROC curve is generated for each predicted evaluation dataset by

changing the classification threshold and then plotting the true positive rate (the proportion of positive cases classified correctly) against the false positive rate (the proportion of negative cases classified incorrectly). ROC curves thus depict the trade-off

between classifying additional positive cases correctly and the cost of classifying additional negative cases incorrectly, as the

classification threshold decreases. Alternatively, they also show how well the prediction model performs in ranking the

evaluation dataset observations. The area under the ROC curve (AUC) provides a numeric value of this trade-off and represents

the probability that a randomly selected positive instance is ranked higher than a randomly selected negative instance. An AUC

of 0.5 is equivalent to a random rank order while an AUC of 1 is perfect ranking of the evaluation cases.
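The rank-order interpretation of AUC described in this footnote can be computed directly from pairwise comparisons. A minimal sketch (illustrative names; ties count as one-half):

```python
def auc(probs, labels):
    """AUC as the probability that a randomly selected positive case
    is ranked above a randomly selected negative case; ties count as
    one-half."""
    pos = [p for p, y in zip(probs, labels) if y == 1]
    neg = [p for p, y in zip(probs, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# A perfect ranking of the evaluation cases gives 1.0; a reversed
# ranking gives 0.0; a random rank order is equivalent to 0.5.
```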


observations in the material misstatement data. This analysis also provides insight into the

usefulness of the proposed methods in a slightly different setting (material misstatements vs.

fraud). Results in Table 3 suggest that OU and PVU (panel A) continue to improve performance

over the Dechow benchmark when using material misstatement data now by 16.9 (p = 0.004)

The second analysis examines the sensitivity of the results to the classification algorithm

used. This analysis evaluates the performance of the methods when combined with logistic

regression and bootstrap aggregation instead of support vector machines (which are used in all other

analyses). Results in Table 3 (panel B) show that the performance of OU and PVU is consistent (OU more so than PVU) across the different classification algorithms, while the performance of the Dechow benchmark varies more with the algorithm used. More importantly, OU and PVU perform significantly better than the Dechow

benchmark across all of the different classification algorithms. OU improves the performance

over the benchmark by 3.6 (p = 0.004) and 50.5 (p = 0.003) percent when logistic regression and

bootstrap aggregation are used, respectively.22 Similarly, PVU improves the performance over

the benchmark by 7.8 (p < 0.001) and 36.0 (p < 0.001) percent when logistic regression and

bootstrap aggregation are used, respectively. These results suggest that the performance benefits

from using OU and PVU are robust to the specific classification algorithm used.23

22 The difference between OU and the Dechow benchmark when using logistic regression does not appear to be as strong as that

suggested in the main experiment using fraud data. In the main experiment, we used the same classification algorithm (support

vector machines) for all methods and benchmarks to maintain internal validity and to avoid making the experiments overly

complex. To evaluate the effect of a potential bias against the Dechow benchmark associated with this decision, we examine the

performance of the Dechow benchmark in the main experiment with logistic regression instead of support vector machines. The

results indicate an insignificant difference between the two implementations (p = 0.984, two tailed), and this result is robust

across different prior probability levels and cost ratios. Thus, the decision to use support vector machines for all methods and

benchmarks does not appear to have biased the results against the Dechow benchmark.

23 These results are also robust to an additional analysis using a sample that excludes all variables with over 25 percent missing

values and all observations with one or more missing values in remaining variables. We also performed some limited analysis

using boosting, and OU and PVU continue to outperform the Dechow benchmark by 44.8 (p < 0.001) and 5.1 (p = 0.044)

percent, respectively. However, the performance of both PVU and the Dechow benchmark fell considerably (while the


<Insert Table 3 Here>

The third analysis provides insight into (i) the usefulness of OU when used in combination

with a different set of independent variables (based on the financial kernel of Cecchini et al.

2010) and (ii) whether OU provides incremental predictive power when used in combination

with this kernel. Cecchini et al. (2010) based their financial kernel on 23 financial statement

variables commonly used to construct independent variables for fraud prediction models. The

financial kernel divides each of the 23 original variables by each other both in the current year

and in the prior year and calculates changes in the ratios. Both current and lagged ratios as well

as their changes are then used to construct a dataset with 1,518 independent variables.
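A simplified sketch of the kernel-variable construction is given below. It reproduces the counts described (23 × 22 = 506 ordered pairs, times three variants, equals 1,518 variables); the exact ratio and change definitions in Cecchini et al. (2010) may differ, so treat this as an assumption-laden illustration with hypothetical variable names:

```python
def financial_kernel(current, prior):
    """For every ordered pair (a, b) of base variables, form the ratio
    a/b in the current year, the same ratio in the prior year, and the
    change in the ratio: 23 * 22 = 506 pairs, times three variants,
    equals 1,518 constructed variables."""
    feats = {}
    names = sorted(current)
    for a in names:
        for b in names:
            if a == b:
                continue
            r_cur = current[a] / current[b]
            r_pri = prior[a] / prior[b]
            feats[a + "/" + b + "_cur"] = r_cur
            feats[a + "/" + b + "_pri"] = r_pri
            feats[a + "/" + b + "_chg"] = r_cur - r_pri
    return feats

# Hypothetical base variables for one firm-year and the prior year.
current = {"v%d" % i: float(i + 1) for i in range(23)}
prior = {"v%d" % i: float(i + 2) for i in range(23)}
feats = financial_kernel(current, prior)
```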

We use the same initial set of observations used in the previous analysis and recreate the

financial kernel following Cecchini et al. (2010). We also follow their procedures and exclude

all observations with missing values. We then compare OU implemented with the variables in

the financial kernel to the Cecchini benchmark, which uses the financial kernel but does not apply OU. We do not attempt to implement PVU, as it is not clear how we would separate the 1,518 variables into

different fraud types. Results in Table 3 (panel C) indicate that OU (AUC = 0.67) outperforms

the financial kernel (AUC = 0.59) in misstatement prediction by 14.2 percent (p = 0.004).24

Combining the Methods. We analyze whether various combinations of OU, PVU, and

SMOTE (see footnote 9) provide additional performance benefits compared to OU(12), the best

performing individual method. Figure 6 plots the performance difference of various method

combinations compared to OU(12) at different cost ratios. The selection of the specific

performance of OU only fell slightly) when using boosting. Similarly, we performed some limited experiments using Bayesian

learning, but the performance of all three methods fell drastically. Thus, boosting and Bayesian learning do not appear to be

viable options, and we do not tabulate these results.

24 When including fraud firm years, OU performs 5.7 percent (p < 0.001) better than the Cecchini benchmark and both

approaches have high AUC values (AUC = 0.863 and AUC = 0.816, respectively).


configurations used in these combinations is based on their general performance in the previous

experiments. The combinations are generated by creating prediction models using OU and PVU separately and then averaging the predictions from the OU and PVU prediction models.25

In untabulated results, the three-method combination does not perform significantly different

than OU(12) (p = 0.465) at best estimates of prior fraud probability and cost ratios. Similarly,

the two-method combination of OU(12) and PVU also does not perform significantly different

than OU(12) (p = 0.421) at best estimates of prior fraud probability and cost ratios. Thus, in

typical fraud prediction research settings, we recommend using OU(12). However, the two- and

three-method combinations provide performance benefits over OU(12) at higher cost ratios and

higher prior fraud probability levels (see Figure 6). Given that the combination of OU(12) and

PVU either performs significantly better or not significantly different than OU(12), we

recommend using this combination of the two methods if maximizing predictive ability is more

important than minimizing implementation costs. For example, when the SEC uses a prediction

model to help decide which firms to investigate for potential fraud, the additional

implementation costs associated with using the combination is likely to be small relative to the

costs of misclassifying a non-fraud firm and using resources to investigate the firm (and even

more so relative to misclassifying a fraud firm and not detecting the fraud).

Prior research also seeks to identify new explanatory variables to improve fraud prediction. Traditionally, this research uses

the entire sample (i.e., all observations) or a single matched sample to evaluate the significance

of one or more independent variables that are hypothesized to be associated with the dependent

25 SMOTE is incorporated by oversampling the data used by OU and PVU. We also first create the OU subsets and then apply

SMOTE and PVU to these subsets, but this more integrated and complex combination does not improve performance further.


variable. However, the predictive performance benefits of OU reported earlier suggest that

classification algorithms (e.g., logistic regression) recognize different fraud patterns when

trained on different subsets of non-fraud firms. Thus, when evaluating explanatory variables in

hypothesis testing research, it may be important to consider the robustness of results across different subsamples of non-fraud firms. The following example uses data from the additional analyses that examine misstatement data. In this example

we examine the significance of Sales to Employees given a set of control variables selected

based on prior research (the control variables in this example were selected using step-wise

backward feature selection). Traditionally, the hypothesis would be tested using all observations

in the sample, i.e., the full sample. The results for the full sample in Table 4 indicate that the

hypothesis is supported (p = 0.0116). However, the OU subsample analyses indicate that this

result might not be robust. For example, the average p-value of all Sales to Employees estimates

across the 12 models obtained using different sub-samples is p = 0.180, well above conventional significance levels. Results in Table 4 suggest that OU yields similar results to the traditional hypothesis testing

analysis, i.e., the most significant variables in the traditional approach tend to be the most

significant in the OU analysis. However, the OU results are generally more conservative. For

example, in only two cases are the median p-values from the OU results numerically smaller

(more significant) than the corresponding parametric result. For 12 of 17 variables, the median p-

values are numerically larger (less significant) than their parametric counterparts. Thus, we


encourage future research to consider applying OU as a robustness check for hypothesis testing.26

V. CONCLUSION

Financial statement fraud is a costly problem that has far-reaching negative consequences.

Hence, the accounting literature investigates a wide range of explanatory variables and various

classification algorithms that contribute to more accurate prediction of fraud and material

misstatements. However, the rarity of fraud data, the relative abundance of variables identified in

prior literature, and the broad definition of fraud create challenges in specifying effective

prediction models.

Research in the emerging field of data analytics has been applied successfully in other

settings constrained by data rarity, such as predicting credit card fraud (Chan and Stolfo 1998).

We, therefore, follow the call of Varian (2014) to apply recent advances in data analytics in other

settings and investigate the ability of methods drawn from data analytics to improve fraud prediction. Our first method, Multi-subset Observation Undersampling (OU), uses repeated undersampling of non-fraud observations to establish a more effective balance with scarce fraud

observations. When used with 12 subsamples, this method improves fraud prediction by

lowering the expected cost of misclassification by more than ten percent relative to the best

performing benchmark. This method is also both efficient and relatively easy to implement.

Our second method, Multi-subset Variable Undersampling (VU), undersamples explanatory variables to put them more in balance with scarce fraud observations. Fraud

26 In untabulated results, we repeat the analysis using bootstrapping. More specifically, the full sample is used to generate 1,000

bootstrap subsamples (each sample contained observations selected randomly with replacement). Each bootstrap subsample is

then used to fit a logistic regression model from which 2.5 and 97.5 percentiles of independent variable coefficient estimates are

obtained. The bootstrapping results are similar to the OU results in that they are also generally more conservative.
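The percentile-bootstrap procedure described in this footnote can be sketched as follows. For self-containment the statistic here is a simple mean rather than a logistic regression coefficient; the data and names are hypothetical:

```python
import random

def bootstrap_percentile_ci(values, stat, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap: resample the data with replacement,
    recompute the statistic on each resample, and take the 2.5th and
    97.5th percentiles of the resulting distribution."""
    rng = random.Random(seed)
    stats = sorted(
        stat([rng.choice(values) for _ in values]) for _ in range(n_boot)
    )
    k = int(n_boot * alpha / 2)  # e.g., 25 of 1,000 resamples
    return stats[k], stats[n_boot - 1 - k]

# Hypothetical data; the footnote applies the same idea to logistic
# regression coefficient estimates.
data = [0.1, 0.4, 0.35, 0.8, 0.2, 0.6, 0.5, 0.3, 0.7, 0.45]
lo, hi = bootstrap_percentile_ci(data, lambda xs: sum(xs) / len(xs))
```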


prediction performance can improve when VU randomly partitions the variables into different subsets. However, it does not do so reliably. When we instead implement Multi-

subset Variable Undersampling by partitioning variables into subsets based on the type of fraud

they are likely to predict (PVU), the expected cost of misclassification is reduced by 9.6 percent.

Our research makes multiple contributions to the prior literature. First, we identify and

directly address financial statement fraud data rarity problems by systematically evaluating

multiple methods that we believe are new to the accounting literature. Based on our

experiments, we conclude that OU and PVU each produce economically and statistically

significant reductions in the expected cost of misclassification of about ten percent.27 This

compares to, for example, a 0.9 percent prediction performance advantage when, following

Dechow et al. (2011), two additional significant independent variables are added to their initial

model. The introduction and evaluation of these methods directly contributes to research that

focuses on improving fraud prediction. Beneish (1997) and Dechow et al. (2011), among others,

create fraud prediction models that can be used to indicate the likelihood that a company has

committed financial statement fraud. Our methods can be used to improve the quality of such

fraud predictions. We also directly extend research that examines the usefulness of data

analytics methods in fraud prediction (e.g., Cecchini et al. 2010; Perols 2011; Larcker and

27 We specifically recommend the use of OU(12) at times in combination with PVU. The choice between using OU by itself or

in combination with PVU depends on the cost ratio and the prior fraud probability assumed by the specific entity that is trying to

predict fraud (see Figure 6).

28 Future research that tries to improve fraud prediction using data analytics methods can examine other problems related to

rarity, such as (i) noisy data that potentially have more significant negative effects on rare cases (Weiss 2004), and (ii) mislabeled

non-fraud firms, i.e., firms that are labeled non-fraud but have actually committed fraud. We performed a limited analysis of one

potential approach. We (1) manipulated the training data in each cross-validation round by using OU to generate fraud

probability predictions for all the observations in the training data and then removed all non-fraud firms with high fraud

probability predictions (we tried five different thresholds: 0.9, 0.8, 0.7, 0.6, and 0.5) from the training data; (2) used the modified

training data from step 1 as input into OU; and (3) compared the results from step 2 to the original OU implementation.

Untabulated results did not show any significant performance improvements over the original OU implementation. When

compared to the original implementation, the average change in AUC across the ten test folds was -0.08% (p = 0.809; two-tailed),


Second, by showing that performance benefits can be gained by (i) addressing data rarity

problems in fraud detection and (ii) partitioning financial statement fraud into different fraud

types, our results provide an indication of the potential benefits that may result from addressing

similar problems in other settings. For example, bankruptcy, financial statement restatements,

material weaknesses in internal control over financial reporting, and audit qualifications are also relatively rare events, suggesting that prediction in those settings may benefit from similar methods.

Third, our research has implications for research that focuses on designing new explanatory

variables and developing parsimonious prediction models (e.g., Dechow et al. 2011;

Markelevich and Rosner 2013). Our findings suggest that classification algorithms recognize

different fraud patterns when trained on different subsets of non-fraud firms. Thus, even if an

explanatory variable is deemed significant in one subsample, it is valuable to show that it is also significant in other subsamples. This is similar in spirit to the robustness measure proposed by Athey and Imbens (2015) that creates subsamples based on values of the

independent variables in the model. While we perform additional analyses that suggest that OU

(i) performs better than bootstrapping in predictive modeling and (ii) can be used to evaluate the robustness of hypothesis testing results, we leave more definitive recommendations about which method(s) to use for hypothesis testing to future research.29 Further, research that

concludes that a new explanatory variable provides incremental predictive power should

0.08% (p = 0.360; one-tailed), 0.12% (p = 0.337; one-tailed), 0.31% (p = 0.182; one-tailed), and 0.24% (p = 0.228; one-tailed)

when using thresholds of 0.5, 0.6, 0.7, 0.8, and 0.9, respectively. Future research is also needed to more directly address the

challenges associated with biases introduced by only having few fraud observations in absolute terms, while at the same time

having a potentially large number of undetected fraud cases. For example, to assess the impact of a potential overreliance on a

small sample of fraud firms and to attempt to improve out-of-sample predictive performance, future research could use random

subsamples rather than all fraud firms in each OU subset.
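The three-step screen described in footnote 28 can be sketched as follows (a hypothetical illustration, not the authors' code; logistic regression stands in for the support vector machines used in the paper, and all function and parameter names are ours):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def screen_training_data(X_train, y_train, threshold=0.7, n_subsets=5, seed=0):
    """Step 1 of the screen: score the TRAINING data with an OU-style
    ensemble, then drop non-fraud firms whose combined fraud probability
    is at or above `threshold` (candidates for mislabeled fraud firms).
    The filtered data would then be fed back into OU (step 2)."""
    rng = np.random.default_rng(seed)
    fraud = np.flatnonzero(y_train == 1)
    nonfraud = rng.permutation(np.flatnonzero(y_train == 0))
    probs = []
    for u in np.array_split(nonfraud, n_subsets):
        idx = np.concatenate([fraud, u])
        m = LogisticRegression(max_iter=1000).fit(X_train[idx], y_train[idx])
        probs.append(m.predict_proba(X_train)[:, 1])
    p = np.mean(probs, axis=0)
    keep = ~((y_train == 0) & (p >= threshold))  # drop suspicious non-fraud firms
    return X_train[keep], y_train[keep]
```

Step 3 (comparing the retrained OU implementation to the original) would repeat the evaluation on the unmodified test data.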

29 Future research can also examine the use of OU in conjunction with propensity score matching. For example, can OU be used to generate more robust propensity scores? Alternatively, by applying OU, then generating propensity scores and matched samples, and evaluating differences between the samples within each OU subset, can OU be used to evaluate the robustness of propensity score matching results?


consider showing that the variable provides incremental predictive value to models implemented using these methods.30

Fourth, we also make a contribution by following the call to consider different types of fraud

(Brazel et al. 2009). We partition financial statement fraud into types and show that this

reframing improves the performance of VU in fraud prediction. The importance of this finding

may extend beyond VU. Research that examines predictors of fraud could, similar to Brazel et

al. (2009), design new explanatory variables to detect a specific type of fraud instead of fraud in

general. For example, fraud research could potentially develop variables that predict different

fraud types using different types of analyst forecasts (e.g., revenue vs. earnings) or different

types of debt covenants (e.g., leverage vs. interest coverage). For example, an independent variable that indicates whether a firm uses a leverage (interest coverage) debt covenant can in turn be used in a prediction model that predicts liabilities (expense) fraud. This reframing could thus contribute to a better theoretical understanding of fraud and a more precise evaluation of explanatory variables.

Finally, we believe that regulators and practitioners can potentially benefit from our findings.

Regulators, such as the SEC, are investing resources in developing better fraud risk models

(Walter 2013; SEC 2015). Our findings may enhance their ability to identify firms that have

committed fraud. This is important because, due to resource constraints, the SEC has to focus its investigative resources on a subset of filings, and improved fraud prediction models can be cost effective in identifying potential fraud firms. The negative effects

30 Please refer to www.fraudpredictionmodels.com/ou for further details on OU in general and more specific guidance on how to use OU to evaluate (1) the robustness of independent variable hypothesis testing results and (2) the incremental predictive performance of new independent variables. The hypothesis testing example includes further details on the analysis performed in Table 4 of this paper and also includes mock data. The predictive performance example explains how to use OU in combination with out-of-sample testing and includes mock data and SAS code.


of fraud on stakeholders such as customers and lenders can also be potentially reduced. For example, auditors can use our

methods to potentially improve fraud risk assessment models that, in turn, can improve audit

client portfolio management and audit planning decisions. Given the significant costs and

widespread effects of financial statement fraud, improvements in fraud prediction models can yield substantial benefits.


REFERENCES

Abbasi, A., C. Albrecht, A. Vance, and J. Hansen. 2012. MetaFraud: A Meta-Learning

Framework for Detecting Financial Fraud. MIS Quarterly. 36(4): 1293-1327.

Agarwal, R., and V. Dhar. 2014. Editorial - Big Data, Data Science, and Analytics: The

Opportunity and Challenge for IS Research. Information Systems Research. 25(3): 443-448.

Apostolou, B., J. Hassell, and S. Webber. 2000. Forensic Expert Classification of Management

Fraud Risk Factors. Journal of Forensic Accounting. 1(2): 181-192.

Armstrong, C. S., D. F. Larcker, G. Ormazabal, and D. J. Taylor. 2013. The relation between

equity incentives and misreporting: the role of risk-taking incentives. Journal of Financial

Economics. 109(2): 327-350.

Association of Certified Fraud Examiners. 2014. Report to the Nations on Occupational Fraud

and Abuse. Austin, TX.

Athey, S., and G. Imbens. 2015. A Measure of Robustness to Misspecification. American

Economic Review. 105(5): 476-80.

Bayley, L., and S. Taylor. 2007. Identifying earnings management: A financial statement

analysis (red flag) approach. Proceedings of the American Accounting Association Annual

Meeting, Chicago, IL.

Beasley, M. 1996. An Empirical Analysis of the Relation between the Board of Director

Composition and Financial Statement Fraud. The Accounting Review. 71(4): 443-465.

Bell, T., and J. Carcello. 2000. A Decision Aid for Assessing the Likelihood of Fraudulent

Financial Reporting. Auditing: A Journal of Practice & Theory. 19(1): 169-184.

Bellman, R. 1961. Adaptive Control Processes: A Guided Tour. Princeton University Press, Princeton, NJ.

Beneish, M. 1997. Detecting GAAP Violation: Implications for Assessing Earnings Management

among Firms with Extreme Financial Performance. Journal of Accounting and Public Policy.

16(3): 271-309.

Beneish, M. 1999. Incentives and Penalties Related to Earnings Overstatements That Violate

GAAP. The Accounting Review. 74(4): 425-457.

Brazel, J. F., K. L. Jones, and M. F. Zimbelman. 2009. Using nonfinancial measures to assess

fraud risk. Journal of Accounting Research. 47(5): 1135-1166.

Breiman, L. 1996. Bagging predictors. Machine learning. 24(2): 123-140.

Brown, B., M. Chui, and J. Manyika. 2011. Are you ready for the era of big data? McKinsey

Quarterly. 4: 24-35.

Caskey, J., and M. Hanlon. 2013. Dividend Policy at Firms Accused of Accounting Fraud.

Contemporary Accounting Research. 30(2): 818-850.

Cecchini, M., G. Koehler, H. Aytug, and P. Pathak. 2010. Detecting Management Fraud in

Public Companies. Management Science. 56(7): 1146-1160.

Chan, P., and S. Stolfo. 1998. Toward Scalable Learning with Non-uniform Class and Cost

Distributions: A Case Study in Credit Card Fraud Detection. Proceedings of the Fourth

International Conference on Knowledge Discovery and Data Mining, New York, NY.

Chawla, N. V., K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer. 2002. SMOTE: Synthetic

Minority Over-sampling Technique. Journal of Artificial Intelligence Research. 16: 321-357.


Chen, H., R. H. Chiang, and V. C. Storey. 2012. Business Intelligence and Analytics: From Big

Data to Big Impact. MIS Quarterly. 36(4): 1165-1188.

Dechow, P. M., R. G. Sloan, and A. P. Sweeney. 1996. Causes and consequences of earnings

manipulation: An analysis of firms subject to enforcement actions by the SEC. Contemporary

Accounting Research. 13(1): 1-36.

Dechow, P. M., W. Ge, C. R. Larson, and R. G. Sloan. 2011. Predicting Material Accounting

Misstatements. Contemporary Accounting Research. 28(1): 17-82.

Duin, R. P. W., and D. M. J. Tax. 2000. Experiments with Classifier Combining Rules.

Proceedings of the International Workshop on Multiple Classifier Systems 2000.

Erickson, M., M. Hanlon, and E. L. Maydew. 2006. Is There a Link between Executive Equity

Incentives and Accounting Fraud? Journal of Accounting Research. 44(1): 113-143.

Ettredge, M. L., L. Sun, P. Lee, and A. A. Anandarajan. 2008. Is earnings fraud associated with

high deferred tax and/or book minus tax levels? Auditing: A Journal of Practice & Theory.

27(1): 1-33.

Fanning, K., and K. Cogger. 1998. Neural network detection of management fraud using

published financial data. International Journal of Intelligent Systems in Accounting, Finance

and Management. 7(1): 21-41.

Feng, M., W. Ge, S. Luo, and T. Shevlin. 2011. Why do CFOs become involved in material

accounting manipulations? Journal of Accounting and Economics. 51(1): 21-36.

Feroz, E., T. Kwon, V. Pastena, and K. Park. 2000. The Efficacy of Red-Flags in Predicting the

SEC's Targets: An Artificial Neural Networks Approach. International Journal of Intelligent

Systems in Accounting, Finance & Management. 9(3): 145-157.

Galar, M., A. Fernández, E. Barrenechea, H. Bustince, and F. Herrera. 2012. A review on

ensembles for the class imbalance problem: bagging-, boosting-, and hybrid-based

approaches. IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and

Reviews. 42(4): 463-484.

Glancy, F. H., and S. B. Yadav. 2011. A computational model for financial reporting fraud

detection. Decision Support Systems. 50(3): 595-601.

Goel, S., and J. Gangolly. 2012. Beyond The Numbers: Mining The Annual Reports For Hidden

Cues Indicative Of Financial Statement Fraud. Intelligent Systems in Accounting, Finance

and Management. 19(2): 75-89.

Green, B. P., and J. H. Choi. 1997. Assessing the Risk of Management Fraud Through Neural

Network Technology. Auditing: A Journal of Practice & Theory. 16(1): 14-28.

Gupta, R., and N. S. Gill. 2012. A Solution for Preventing Fraudulent Financial Reporting using

Descriptive Data Mining Techniques. International Journal of Computer Applications. 58(1):

22-28.

He, H., and E. A. Garcia. 2009. Learning from imbalanced data. IEEE Transactions on

Knowledge and Data Engineering. 21(9): 1263-1284.

Humpherys, S. L., K. C. Moffitt, M. B. Burns, J. K. Burgoon, and W. F. Felix. 2011.

Identification of fraudulent financial statements using linguistic credibility analysis. Decision

Support Systems. 50(3), 585-594.

Jones, K. L., G. V. Krishnan, and K. D. Melendrez. 2008. Do Models of Discretionary Accruals


Detect Actual Cases of Fraudulent and Restated Earnings? An Empirical Analysis.

Contemporary Accounting Research. 25(2): 499-531.

Kaminski, K., S. Wetzel, and L. Guan. 2004. Can Financial Ratios Detect Fraudulent Financial

Reporting. Managerial Auditing Journal. 19(1): 15-28.

Kittler, J., M. Hatef, R.P.W. Duin, and J. Matas. 1998. On Combining Classifiers. IEEE

Transactions on Pattern Analysis and Machine Intelligence. 20(3): 226-239.

Larcker, D. F., and A. A. Zakolyukina. 2012. Detecting deceptive discussions in conference

calls. Journal of Accounting Research. 50(2): 495-540.

LaValle, S., E. Lesser, R. Shockley, M. S. Hopkins, and N. Kruschwitz. 2011. Big data, analytics and the path from insights to value. MIT Sloan Management Review. 52(2): 21-32.

Lee, T. A., R. W. Ingram, and T. P. Howard. 1999. The Difference between Earnings and

Operating Cash Flow as an Indicator of Financial Reporting Fraud. Contemporary

Accounting Research. 16(4): 749-786.

Lennox, C., and J. A. Pittman. 2010. Big Five Audits and Accounting Fraud. Contemporary

Accounting Research, 27(1): 209-247.

Lin, J., M. Hwang, and J. Becker. 2003. A Fuzzy Neural Network for Assessing the Risk of

Fraudulent Financial Reporting. Managerial Auditing Journal. 18(8): 657-665.

Loebbecke, J. K., M. M. Eining, and J. J. Willingham. 1989. Auditors' experience with material

irregularities: Frequency, nature, and detectability. Auditing: A Journal of Practice and

Theory. 9(1): 1-28.

Maloof, M. 2003. Learning When Data Sets are Imbalanced and When Costs are Unequal and

Unknown. Proceedings of the Twentieth International Conference on Machine Learning,

Washington, DC.

Markelevich, A., and R. L. Rosner. 2013. Auditor Fees and Fraud Firms. Contemporary

Accounting Research. 30(4), 1590-1625.

Nguyen, H. M., E. W. Cooper, and K. Kamei. 2012. A comparative study on sampling

techniques for handling class imbalance in streaming data. Soft Computing and Intelligent

Systems. 1762-1767.

Perols, J. 2011. Financial statement fraud detection: An analysis of statistical and machine

learning algorithms. Auditing: A Journal of Practice & Theory. 30(2): 19-50.

Perols, J. L., and B. A. Lougee. 2011. The relation between earnings management and financial

statement fraud. Advances in Accounting. 27(1): 39-53.

Phua, C., D. Alahakoon, and V. Lee. 2004. Minority Report in Fraud Detection: Classification of

Skewed Data. SIGKDD Explorations. 6(1): 50-59.

Price III, R. A., N. Y. Sharp, and D. A. Wood. 2011. Detecting and predicting accounting

irregularities: A comparison of commercial and academic risk measures. Accounting

Horizons. 25(4): 755-780.

Provost, F. J., T. Fawcett, and R. Kohavi. 1998. The case against accuracy estimation for

comparing induction algorithms. Proceedings of the Fifteenth International Conference on

Machine Learning, Madison, WI. 98: 445-453.

Quinlan, J. R. 1993. C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers Inc.,

San Francisco, CA, USA.


SEC. 2015. Examination Priorities for 2015. Retrieved from

http://www.sec.gov/about/offices/ocie/national-examination-program-priorities-2015.pdf.

Sharma, V. 2004. Board of Director Characteristics, Institutional Ownership, and Fraud:

Evidence from Australia, Auditing: A Journal of Practice & Theory. 23(2): 105-117.

Shin, K. S., T. Lee, and H. J. Kim. 2005. An Application of Support Vector Machines in

Bankruptcy Prediction Models. Expert Systems with Application. 28: 127-135.

Summers, S. L., and J. T. Sweeney. 1998. Fraudulently Misstated Financial Statements and

Insider Trading: An Empirical Analysis. The Accounting Review. 73(1): 131-146.

Varian, H. R. 2014. Big data: New tricks for econometrics. The Journal of Economic

Perspectives. 28(2): 3-27.

Walter, E. 2013. Harnessing Tomorrow's Technology for Today's Investors and Markets. Speech presented at American University School of Law, Washington, D.C. (February).

Weiss, G. 2004. Mining with Rarity: A Unifying Framework. ACM SIGKDD Explorations

Newsletter. 6(1): 7-19.

Whiting, D. G., J. V. Hansen, J. B. McDonald, C. Albrecht, and W. S. Albrecht. 2012. Machine

Learning Methods For Detecting Patterns Of Management Fraud. Computational

Intelligence. 28(4): 505-527.

Yang, Q., and X. Wu. 2006. 10 challenging problems in data mining research. International

Journal of Information Technology & Decision Making. 5(4): 597-604.


APPENDIX A: Definitions of explanatory variables (a)

Variable: Definition (b)

Abnormal change in order backlog: (OB - OBt-1)/OBt-1 - (SALE - SALEt-1)/SALEt-1
Actual issuance: IF SSTK > 0 OR DLTIS > 0 THEN 1 ELSE 0
Book to market: CEQ / (CSHO * PRCC_F)
Change in expected return on pension plan assets: PPROR - PPRORt-1
Change in free cash flows: (IB - RSST Accruals)/Average total assets - (IBt-1 - RSST Accrualst-1)/Average total assetst-1
Change in inventory: (INVT - INVTt-1)/Average total assets
Change in operating lease activity: ((MRC1/1.1 + MRC2/1.1^2 + MRC3/1.1^3 + MRC4/1.1^4 + MRC5/1.1^5) - (MRC1t-1/1.1 + MRC2t-1/1.1^2 + MRC3t-1/1.1^3 + MRC4t-1/1.1^4 + MRC5t-1/1.1^5))/Average total assets
Change in receivables: (RECT - RECTt-1)/Average total assets
Change in return on assets: IB/Average total assets - IBt-1/Average total assetst-1
Deferred tax expense: TXDI / ATt-1
Demand for financing (ex ante): IF (OANCF - (CAPXt-3 + CAPXt-2 + CAPXt-1)/3)/ACT < -0.5 THEN 1 ELSE 0
Earnings to price: IB / (CSHO * PRCC_F)
Existence of operating leases: IF MRC1 > 0 OR MRC2 > 0 OR MRC3 > 0 OR MRC4 > 0 OR MRC5 > 0 THEN 1 ELSE 0
Expected return on pension plan assets: PPROR
Level of finance raised: FINCF/Average total assets
Leverage: DLTT / AT
Percentage change in cash margin: ((1 - (COGS + (INVT - INVTt-1))/(SALE - (RECT - RECTt-1))) - (1 - (COGSt-1 + (INVTt-1 - INVTt-2))/(SALEt-1 - (RECTt-1 - RECTt-2))))/(1 - (COGSt-1 + (INVTt-1 - INVTt-2))/(SALEt-1 - (RECTt-1 - RECTt-2)))
Percentage change in cash sales: ((SALE - (RECT - RECTt-1)) - (SALEt-1 - (RECTt-1 - RECTt-2)))/(SALEt-1 - (RECTt-1 - RECTt-2))
RSST accruals: (ΔWC + ΔNCO + ΔFIN)/Average total assets, where WC = (ACT - CHE) - (LCT - DLC); NCO = (AT - ACT - IVAO) - (LT - LCT - DLTT); FIN = (IVST + IVAO) - (DLTT + DLC + PSTK)
Soft assets: (AT - PPENT - CHE)/Average total assets
Unexpected employee productivity (c): FIRM((SALE/EMP - SALEt-1/EMPt-1)/(SALEt-1/EMPt-1)) - INDUSTRY((SALE/EMP - SALEt-1/EMPt-1)/(SALEt-1/EMPt-1))
WC accruals: (((ACT - ACTt-1) - (CHE - CHEt-1)) - ((LCT - LCTt-1) - (DLC - DLCt-1) - (TXP - TXPt-1)) - DP)/Average total assets

Accounts receivable to sales: RECT/SALE
Accounts receivable to total assets: RECT/AT
Allowance for doubtful accounts: RECD
Allowance for doubtful accounts to accounts receivable: RECD/RECT
Allowance for doubtful accounts to net sales: RECD/SALE
Altman Z score: 3.3*(IB + XINT + TXT)/AT + 0.999*SALE/AT + 0.6*CSHO*PRCC_F/LT + 1.2*WCAP/AT + 1.4*RE/AT
Big four auditor: IF 0 < AU < 9 THEN 1 ELSE 0
Current minus prior year inventory to sales: INVT/SALE - INVTt-1/SALEt-1
Days in receivables index: (RECT/SALE)/(RECTt-1/SALEt-1)
Debt to equity: LT/CEQ
Declining cash sales dummy: IF SALE - (RECT - RECTt-1) < SALEt-1 - (RECTt-1 - RECTt-2) THEN 1 ELSE 0
Fixed assets to total assets: PPEGT/AT
Four year geometric sales growth rate: (SALE/SALEt-3)^(1/4) - 1
Gross margin: (SALE - COGS)/SALE
Holding period return in the violation period: (PRCC_F - PRCC_Ft-1)/PRCC_Ft-1
Industry ROE minus firm ROE: NIindustry/CEQindustry - NI/CEQ
Inventory to sales: INVT/SALE
Net sales: SALE
Positive accruals dummy: IF (IB - OANCF) > 0 AND (IBt-1 - OANCFt-1) > 0 THEN 1 ELSE 0
Prior year ROA to total assets current year: (NIt-1/ATt-1)/AT
Property plant and equipment to total assets: PPENT/AT
Sales to total assets: SALE/AT
The number of auditor turnovers: (IF AU <> AUt-1 THEN 1 ELSE 0) + (IF AUt-1 <> AUt-2 THEN 1 ELSE 0) + (IF AUt-2 <> AUt-3 THEN 1 ELSE 0)
Times interest earned: (IB + XINT + TXT)/XINT
Total accruals to total assets: (IB - OANCF)/AT
Total debt to total assets: LT/AT

Total discretionary accrual: RSST Accrualst-1 + RSST Accrualst-2 + RSST Accrualst-3
Value of issued securities to market value: IF CSHI > 0 THEN CSHI*PRCC_F/(CSHO*PRCC_F) ELSE IF (CSHO - CSHOt-1) > 0 THEN ((CSHO - CSHOt-1)*PRCC_F)/(CSHO*PRCC_F) ELSE 0
Whether accounts receivable > 1.1 of last year's: IF (RECT/RECTt-1) > 1.1 THEN 1 ELSE 0
Whether firm was listed on AMEX: IF EXCHG = 5, 15, 16, 17, or 18 THEN 1 ELSE 0
Whether gross margin percent > 1.1 of last year's: IF ((SALE - COGS)/SALE)/((SALEt-1 - COGSt-1)/SALEt-1) > 1.1 THEN 1 ELSE 0
Whether LIFO: IF INVVAL = 2 THEN 1 ELSE 0
Whether new securities were issued: IF (CSHO - CSHOt-1) > 0 OR CSHI > 0 THEN 1 ELSE 0
Whether SIC code larger (smaller) than 2999 (4000): IF 2999 < SIC < 4000 THEN 1 ELSE 0

Level, change, percentage change, and abnormal percentage change variables (d):

Sales: SALE
Change in sales: SALE - SALEt-1
% Change in sales: (SALE - SALEt-1)/SALEt-1
Abnormal % change in sales: (SALE - SALEt-1)/SALEt-1 - INDUSTRY((SALE - SALEt-1)/SALEt-1)
Sales to assets: SALE/AT
Change in sales to assets: SALE/AT - SALEt-1/ATt-1
% Change in sales to assets: (SALE/AT - SALEt-1/ATt-1)/(SALEt-1/ATt-1)
Abnormal % change in sales to assets: (SALE/AT - SALEt-1/ATt-1)/(SALEt-1/ATt-1) - INDUSTRY((SALE/AT - SALEt-1/ATt-1)/(SALEt-1/ATt-1))
Sales to employees: SALE/EMP
Change in sales to employees: SALE/EMP - SALEt-1/EMPt-1
% Change in sales to employees: (SALE/EMP - SALEt-1/EMPt-1)/(SALEt-1/EMPt-1)
Sales to operating expenses: SALE/XOPR
Change in sales to operating expenses: SALE/XOPR - SALEt-1/XOPRt-1
% Change in sales to operating expenses: (SALE/XOPR - SALEt-1/XOPRt-1)/(SALEt-1/XOPRt-1)
Abnormal % change in sales to operating expenses: (SALE/XOPR - SALEt-1/XOPRt-1)/(SALEt-1/XOPRt-1) - INDUSTRY((SALE/XOPR - SALEt-1/XOPRt-1)/(SALEt-1/XOPRt-1))
Return on assets: NI/AT

Change in return on assets: NI/AT - NIt-1/ATt-1
% Change in return on assets: (NI/AT - NIt-1/ATt-1)/(NIt-1/ATt-1)
Abnormal % change in return on assets: (NI/AT - NIt-1/ATt-1)/(NIt-1/ATt-1) - INDUSTRY((NI/AT - NIt-1/ATt-1)/(NIt-1/ATt-1))
Return on equity: NI/CEQ
Change in return on equity: NI/CEQ - NIt-1/CEQt-1
% Change in return on equity: (NI/CEQ - NIt-1/CEQt-1)/(NIt-1/CEQt-1)
Abnormal % change in return on equity: (NI/CEQ - NIt-1/CEQt-1)/(NIt-1/CEQt-1) - INDUSTRY((NI/CEQ - NIt-1/CEQt-1)/(NIt-1/CEQt-1))
Return on sales: NI/SALE
Change in return on sales: NI/SALE - NIt-1/SALEt-1
% Change in return on sales: (NI/SALE - NIt-1/SALEt-1)/(NIt-1/SALEt-1)
Abnormal % change in return on sales: (NI/SALE - NIt-1/SALEt-1)/(NIt-1/SALEt-1) - INDUSTRY((NI/SALE - NIt-1/SALEt-1)/(NIt-1/SALEt-1))
Accounts payable to inventory: AP/INVT
Change in accounts payable to inventory: AP/INVT - APt-1/INVTt-1
% Change in accounts payable to inventory: (AP/INVT - APt-1/INVTt-1)/(APt-1/INVTt-1)
Abnormal % change in accounts payable to inventory: (AP/INVT - APt-1/INVTt-1)/(APt-1/INVTt-1) - INDUSTRY((AP/INVT - APt-1/INVTt-1)/(APt-1/INVTt-1))
Liabilities: LT
Change in liabilities: LT - LTt-1
% Change in liabilities: (LT - LTt-1)/LTt-1
Abnormal % change in liabilities: (LT - LTt-1)/LTt-1 - INDUSTRY((LT - LTt-1)/LTt-1)
Liabilities to interest expenses: LT/XINT
Change in liabilities to interest expenses: LT/XINT - LTt-1/XINTt-1
% Change in liabilities to interest expenses: (LT/XINT - LTt-1/XINTt-1)/(LTt-1/XINTt-1)
Abnormal % change in liabilities to interest expenses: (LT/XINT - LTt-1/XINTt-1)/(LTt-1/XINTt-1) - INDUSTRY((LT/XINT - LTt-1/XINTt-1)/(LTt-1/XINTt-1))
Assets: AT
Change in assets: AT - ATt-1
% Change in assets: (AT - ATt-1)/ATt-1
Abnormal % change in assets: (AT - ATt-1)/ATt-1 - INDUSTRY((AT - ATt-1)/ATt-1)
Assets to liabilities: AT/LT

Change in assets to liabilities: AT/LT - ATt-1/LTt-1
% Change in assets to liabilities: (AT/LT - ATt-1/LTt-1)/(ATt-1/LTt-1)
Abnormal % change in assets to liabilities: (AT/LT - ATt-1/LTt-1)/(ATt-1/LTt-1) - INDUSTRY((AT/LT - ATt-1/LTt-1)/(ATt-1/LTt-1))
Expenses: XOPR
Change in expenses: XOPR - XOPRt-1
% Change in expenses: (XOPR - XOPRt-1)/XOPRt-1
Abnormal % change in expenses: (XOPR - XOPRt-1)/XOPRt-1 - INDUSTRY((XOPR - XOPRt-1)/XOPRt-1)

Notes:

(a) The explanatory variables included represent a relatively comprehensive set of variables based on recent fraud and material misstatement literature (Cecchini et al. 2010; Dechow et al. 2011; Perols 2011). We include all variables from Perols (2011) and all variables from the final Dechow et al. (2011) model that can be calculated using Compustat data. Dechow et al. (2011) perform step-wise backward feature selection to derive more parsimonious material misstatement models. We use their second model, which is the most complete model in their study that relies only on Compustat data (they also include a model that requires market-related data). This model predicts material misstatements using the following variables: RSST accruals, change in receivables, change in inventory, soft assets, percentage change in cash sales, change in return on assets, actual issuance of securities, abnormal change in employees, and existence of operating leases. The model in Cecchini et al. (2010) includes a total of 1,518 explanatory variables derived from 23 financial statement items. These items are divided by each other both in the current year and in the prior year and used to calculate changes in the ratios. Both current and lagged ratios as well as their changes are then used to construct a dataset with 1,518 independent variables. Rather than including all 1,518 variables in our study, we follow and extend the approach used in Cecchini et al. (2010) by including 48 variables measuring levels, changes in levels, percentage changes in levels, and abnormal percentage changes of commonly manipulated financial statement items and ratios. We examine a model with all 1,518 variables from Cecchini et al. (2010) in an additional analysis.

(b) ACT is Current Assets - Total; AT is Assets - Total; AU is Auditor; CAPX is Capital Expenditures; CEQ is Common/Ordinary Equity - Total; CHE is Cash and Short-Term Investments; COGS is Cost of Goods Sold; CSHI is Common Shares Issued; CSHO is Common Shares Outstanding; DLC is Debt in Current Liabilities - Total; DLTIS is Long-Term Debt Issuance; DLTT is Long-Term Debt - Total; DP is Depreciation and Amortization; EMP is Employees; EXCHG is Stock Exchange; FINCF is Financing Activities - Net Cash Flow; IB is Income Before Extraordinary Items; INVT is Inventories - Total; INVVAL is Inventory Valuation Method; IVAO is Investment and Advances - Other; IVST is Short-Term Investments - Total; LCT is Current Liabilities - Total; LT is Liabilities - Total; MRC1 is Rental Commitments - Minimum - 1st Year; MRC2 is Rental Commitments - Minimum - 2nd Year; MRC3 is Rental Commitments - Minimum - 3rd Year; MRC4 is Rental Commitments - Minimum - 4th Year; MRC5 is Rental Commitments - Minimum - 5th Year; NI is Net Income (Loss); OANCF is Operating Activities - Net Cash Flow; OB is Order Backlog; PPEGT is Property, Plant and Equipment - Total (Gross); PPENT is Property, Plant and Equipment - Total (Net); PPROR is Pension Plans - Anticipated Long-Term Rate of Return on Plan Assets; PRCC_F is Price Close - Annual - Fiscal Year; PSTK is Preferred/Preference Stock (Capital) - Total; RE is Retained Earnings; RECD is Receivables - Estimated Doubtful; RECT is Receivables - Total; SALE is Sales/Turnover (Net); SIC is SIC Code; SSTK is Sale of Common and Preferred Stock; TXDI is Income Taxes - Deferred; TXP is Income Taxes Payable; TXT is Income Taxes - Total; WCAP is Working Capital (Balance Sheet); XINT is Interest and Related Expense - Total; and XOPR is Operating Expense. We also included controls for year and industry (two-digit SIC code).

(c) Similar variable used in both Dechow et al. (2011) and Perols (2011).

(d) Variable construction based on the Financial Kernel in Cecchini et al. (2010).
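To illustrate how these definitions map onto the data, a few of the variables above can be computed from annual Compustat-style data as follows (a hypothetical sketch, not the authors' code; the DataFrame `df`, the `gvkey` firm identifier, and the function name are our assumptions, and columns follow the mnemonics in note (b)):

```python
import pandas as pd

def add_fraud_predictors(df):
    """Compute a few Appendix A variables from annual Compustat-style
    data (one row per firm-year, sorted by fiscal year within firm).
    Lags are taken within each firm (gvkey)."""
    g = df.groupby("gvkey")
    lag = lambda col: g[col].shift(1)
    avg_at = (df["AT"] + lag("AT")) / 2            # average total assets
    df["soft_assets"] = (df["AT"] - df["PPENT"] - df["CHE"]) / avg_at
    df["chg_receivables"] = (df["RECT"] - lag("RECT")) / avg_at
    df["chg_inventory"] = (df["INVT"] - lag("INVT")) / avg_at
    df["leverage"] = df["DLTT"] / df["AT"]
    df["book_to_market"] = df["CEQ"] / (df["CSHO"] * df["PRCC_F"])
    return df
```

First-year observations for each firm come out as missing for the lagged variables, mirroring the data requirements implied by the definitions.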


Figure 1 Multi-subset Observation Undersampling (OU)

Notes:

Column 1 represents the raw data with the fraud observations stacked on top and non-fraud cases below. Column 1

also shows that model building and out-of-sample data are kept separated. Column 2 shows the data subsets that are

created based on the OU method. All fraud data are used in each subset while the non-fraud data are under-sampled

to address data rarity within each subset. Cumulatively across all subsets, all of the non-fraud data can be used, but

a single non-fraud observation is only used in one subset. In column 3, a classification algorithm is used to build

one prediction model per subset with the goal of accurately classifying firms into fraud or non-fraud cases. Each

model is then applied out-of-sample and generates a fraud probability prediction for each observation in the out-of-

sample data. In column 4, for each out-of-sample observation, the individual fraud prediction probabilities are then

combined to arrive at an overall combined fraud probability prediction for each observation.

More formally, let M = {f1, f2, f3, ..., fk} be a set of k fraud observations f and let C = {c1, c2, c3, ..., cn} be a set of n non-fraud observations c, where M is the minority class, i.e., k < n. Note that the union of M and C, i.e., M ∪ C, forms a set that contains k fraud and n non-fraud observations. To achieve a more balanced dataset, d non-fraud observations c are removed from the non-fraud set C, where 0 < d ≤ n - k. However, instead of deleting these removed non-fraud observations, OU segments the non-fraud observations into n / (n - d) or fewer subsets Ui that each contain n - d different non-fraud observations c, i.e., C = {U1, U2, U3, ..., Un/(n-d)}. Note that all subsets Ui contain mutually exclusive (disjoint) sets of non-fraud observations, i.e., Ui ∩ Uj = ∅ for i ≠ j. OU then combines all fraud observations, i.e., the entire set M, with each Ui to create subsets Wi. OU thus creates up to n / (n - d) subsets Wi that each contain all k fraud observations f and n - d unique non-fraud observations c. Each subset Wi is then used to build a prediction model that is used to predict out-of-sample observations. In our experiments, OU is applied only to the model-building data; the model evaluation data are left intact. Finally, for each out-of-sample observation, the different prediction models' probability predictions are averaged into an overall probability prediction for that observation.
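Assuming the notation above, the OU procedure can be sketched in Python as follows (a hypothetical illustration, not the authors' implementation; logistic regression stands in for the support vector machines used in the experiments, and all names are ours):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def ou_fit_predict(X_train, y_train, X_test, n_subsets, seed=0):
    """Multi-subset Observation Undersampling (OU): combine ALL fraud
    observations (M) with each of `n_subsets` disjoint random subsets of
    the non-fraud observations (U_i), build one model per subset W_i,
    and average the out-of-sample fraud probability predictions."""
    rng = np.random.default_rng(seed)
    fraud_idx = np.flatnonzero(y_train == 1)
    nonfraud_idx = rng.permutation(np.flatnonzero(y_train == 0))
    probs = []
    for u in np.array_split(nonfraud_idx, n_subsets):  # disjoint U_i
        idx = np.concatenate([fraud_idx, u])           # W_i = M combined with U_i
        model = LogisticRegression(max_iter=1000).fit(X_train[idx], y_train[idx])
        probs.append(model.predict_proba(X_test)[:, 1])
    return np.mean(probs, axis=0)  # combined probability prediction
```

Because every non-fraud observation appears in exactly one subset, no training information is discarded even though each individual model sees a balanced sample.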


Figure 2 Multi-subset Variable Undersampling (VU)

Notes:

Column 1 represents the raw data that include all explanatory variables used to predict fraud. These explanatory

variables are partitioned into different subsets represented by the vertical lines. Each subset contains a subset of the

explanatory variables and all of the observations. Column 1 also shows that model building and out-of-sample data

are kept separated. In column 2, a classification algorithm is used to build one prediction model per variable subset

with the goal of classifying firms into fraud vs. non-fraud cases. Each prediction model is then applied out-of-

sample to generate a fraud probability prediction for each observation in the out-of-sample data. In column 3, for

each out-of-sample observation, the fraud prediction probabilities from the different prediction models are combined

to arrive at an overall combined fraud prediction probability for each observation.

More formally, let W denote a dataset with m variables x, i.e., W = {x1, x2, x3, ..., xm}. VU reduces data dimensionality by randomly dividing the variables in W into q subsets X, where each X contains m/q variables, i.e., VU creates the variable subsets X1 = {x1, x2, x3, ..., xm/q}, X2 = {xm/q+1, xm/q+2, xm/q+3, ..., x2m/q}, X3 = {x2m/q+1, x2m/q+2, x2m/q+3, ..., x3m/q}, ..., Xq = {xm-m/q+1, xm-m/q+2, xm-m/q+3, ..., xm}. The subsets X are then used to build q prediction models. The prediction models are then (i) used to predict out-of-sample observations, and (ii) for each out-of-sample observation, the prediction models' probability predictions are combined into an overall prediction for that observation by taking the average of the individual probability predictions.
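The VU procedure can likewise be sketched in Python (a hypothetical illustration, not the authors' implementation; logistic regression stands in for the support vector machines, and all names are ours):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def vu_fit_predict(X_train, y_train, X_test, n_subsets, seed=0):
    """Multi-subset Variable Undersampling (VU): randomly partition the m
    explanatory variables into `n_subsets` disjoint groups, build one model
    per group using ALL observations, and average the out-of-sample fraud
    probability predictions."""
    rng = np.random.default_rng(seed)
    cols = rng.permutation(X_train.shape[1])
    probs = []
    for g in np.array_split(cols, n_subsets):  # variable subsets X_1..X_q
        model = LogisticRegression(max_iter=1000).fit(X_train[:, g], y_train)
        probs.append(model.predict_proba(X_test[:, g])[:, 1])
    return np.mean(probs, axis=0)  # combined probability prediction
```

Note the contrast with OU: VU splits columns (variables) while keeping all rows, whereas OU splits the non-fraud rows while keeping all columns.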


Figure 3 Experimental Procedures Multi-subset Observation Undersampling (OU) Example

[Flowchart: Starting from the raw data, a 10-fold cross-validation loop (rounds n = 1, 2, 3, …, 10) splits the data into round-n training and test sets. Within each round, for each OU implementation l = 1, 2, 3, …, 20, l OU subsets are created from the round-n training data, one prediction model is built per subset, and each model predicts the round-n test data. For each test observation, the l probability predictions are averaged into a combined prediction; classification thresholds are then applied and ECM scores are calculated for each test set. The procedure ends once l = 20 and n = 10.]
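The subset-creation step in this procedure can be sketched as follows (a hypothetical Python sketch; the function and parameter names are ours, and the 20 percent fraud ratio follows Perols (2011)):

```python
import random

def ou_subsets(fraud_cases, nonfraud_cases, n_subsets, fraud_ratio=0.20, seed=0):
    """Create OU subsets: each subset keeps every fraud case and adds a fresh
    random sample (without replacement) of non-fraud cases, sized so that
    fraud cases make up `fraud_ratio` of the subset."""
    rng = random.Random(seed)
    n_nonfraud = round(len(fraud_cases) * (1 - fraud_ratio) / fraud_ratio)
    return [fraud_cases + rng.sample(nonfraud_cases, n_nonfraud)
            for _ in range(n_subsets)]
```

Each subset then trains its own classifier, and the per-observation probabilities are averaged as in the combine step above.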


Figure 4 Multi-subset Observation Undersampling (OU) with Different Numbers of Subsets -

Percentage Performance Improvement Relative to Benchmark

[Line chart: ECM percentage improvement relative to the benchmark (0 to 15 percent, vertical axis) versus the number of OU subsets (1 to 20, horizontal axis), with three series: original order, new order, and new subsets.]

Notes:

ECM is calculated using a 0.6 percent fraud probability and a 30:1 false negative to false positive cost ratio.

As discussed in the text, three versions of the experiment were conducted. Original order refers to the main

OU experiment; new order refers to the analysis in which the OU subsets are selected in a different order; and

new subsets refers to the analysis in which the random sampling of non-fraud cases is repeated using a different

random draw.

The benchmark is simple undersampling (Perols 2011), which randomly removes non-fraud observations from

the sample to generate a more balanced training sample. This benchmark performs better than a benchmark that

includes all fraud and non-fraud observations. OU and the OU benchmarks use all variables (independent

variable reduction is examined in the VU analysis) and are implemented using support vector machines.


Figure 5 Multi-subset Variable Undersampling (VU) with Different Numbers of Subsets of

Explanatory Variables - Percentage Performance Improvement Relative to Benchmark

[Line chart: ECM percentage improvement relative to the benchmark (-6 to 8 percent, vertical axis) versus the number of VU subsets (1 to 20, horizontal axis), with two series: constant number of variables in each subset (dashed line) and all variables in each round (dotted line).]

Notes:

ECM is calculated using a 0.6 percent fraud probability and a 30:1 false negative to false positive cost ratio.

As discussed in the text, two versions of the experiment were conducted. The constant number of variables in

each subset experiment (the dashed line) uses subsets that contain five or six variables in each subset; the all

variables in each round experiment (the round dotted line) uses all variables in each experimental round by

randomly dividing all 109 variables into different subsets (consequently, as the number of subsets increases, the

number of variables in each subset decreases).

The all variables in each round experiment only manipulates the number of VU subsets in even increments.

The benchmark contains six randomly selected variables (from the 109 variables described in Appendix A) and

is equivalent to the VU implementation with only one subset. This benchmark performed better than

benchmarks implemented using (i) all the variables in the dataset and (ii) the variables selected in Dechow et al.

(2011), i.e., RSST accruals, change in receivables, change in inventory, soft assets, percentage change in cash

sales, change in return on assets, actual issuance of securities, abnormal change in employees, and existence of

operating leases. VU and the VU benchmarks use all fraud and non-fraud observations (observation

undersampling is examined in the OU analysis) and are implemented using support vector machines.


Figure 6 Performance of combinations of OU, PVU, and SMOTE

Percentage Performance Improvement Relative to OU(12)

[Line chart: ECM percentage difference relative to OU(12) (-12 to 8 percent, vertical axis) versus the false negative to false positive cost ratio (1:1 to 100:1, horizontal axis), with four series: PVU + OU(12) + SMOTE(600), PVU + OU(12), PVU, and SMOTE(600).]

Notes:

OU is Multi-subset Observation Undersampling. OU(12) represents the best performing individual OU

implementation.

PVU is Multi-subset Variable Undersampling partitioned on fraud type.

SMOTE(600) is Multi-subset Observation Oversampling with an oversampling ratio of 600 percent. This represents

the best performing SMOTE implementation.

ECM is calculated assuming an evaluation fraud probability of 0.6 percent.
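SMOTE generates synthetic minority (fraud) observations by interpolating between a minority case and one of its nearest minority neighbors. The sketch below is our own simplified illustration (plain Euclidean distance over feature tuples), not the exact implementation evaluated in the paper:

```python
import random

def smote(minority, ratio=6.0, k=5, seed=0):
    """Generate len(minority) * ratio synthetic cases (ratio=6.0 corresponds
    to 600 percent oversampling, as in SMOTE(600)). Each synthetic case lies
    on the segment between a minority case and one of its k nearest
    minority neighbors."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(int(len(minority) * ratio)):
        a = rng.choice(minority)
        # k nearest minority neighbors of a, by squared Euclidean distance
        neighbors = sorted((b for b in minority if b is not a),
                           key=lambda b: sum((u - v) ** 2
                                             for u, v in zip(a, b)))[:k]
        b = rng.choice(neighbors)
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append(tuple(u + gap * (v - u) for u, v in zip(a, b)))
    return synthetic
```

Because each synthetic point is a convex combination of two real minority cases, it stays within the minority class's feature range.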


TABLE 1

Summary of Experiments

Experiment 1. Evaluating the number of Multi-subset Observation Undersampling (OU) subsets.

Method: We create OU subsets with 20 percent fraud cases in each subset.b Each subset includes all original fraud cases and a random sample of the original non-fraud cases selected without replacement. We then empirically examine the optimal number of subsets for implementing OU.c As sensitivity analyses, we repeat the experiment using the same subsets, but select the subsets in a different order, and re-perform the random selection procedure of non-fraud cases. OU and the benchmarks use all variables (data dimensionality reduction is examined in the VU analyses below).

Benchmarks: We use two benchmarks: (i) simple observation undersampling, i.e., OU with only one subset, as used in Perols (2011), and (ii) no undersampling.

Experiment 2a. Evaluating the number of Multi-subset Variable Undersampling (VU) subsets.

Method: To evaluate how many VU subsets to use, we first randomly select variables from the 109 variables used in the prior literature and place these variables into 20 different subsets. Thus, each variable subset contains five or six variables. To determine how many VU variable subsets to use, we perform an experiment in which the subsets are randomly added one by one to VU.

Benchmarks: We use three benchmarks: (i) simple variable undersampling (i.e., VU with only one subset); (ii) a model that includes all variables; and (iii) model 2 in Dechow et al. (2011).

Experiment 2b. Evaluating the number of variables in each VU subset.

Method: This experiment evaluates the performance of VU as we change the number of variables in each subset. We use all variables in each experimental round and randomly divide the variables into the different subsets. Thus, as the number of subsets increases, the number of variables in each subset decreases. For example, all variables are included in one set in the first experimental round, half the variables are included in each of two subsets in the second experimental round, etc. This experiment skips all uneven rounds except for the first round to reduce processing time.

Benchmarks: We use the same three benchmarks as the first VU experiment.

Experiment 3. Evaluating VU partitioned on fraud types (PVU).

Method: This experiment evaluates the performance of VU when partitioned on fraud types. Note that we do not examine the performance of different PVU implementations in this experiment, as the specific subsets included in PVU are driven by the partitioning rather than an empirical evaluation.

Benchmark: We compare the performance of PVU to the best performing benchmark in the VU experiments (i.e., simple variable undersampling).


TABLE 1 (continued)

Summary of Experiments

Notes:

a

Since we introduce OU to the fraud detection literature to reduce the imbalance between the number of fraud and the number of non-fraud observations, we use

simple undersampling as a benchmark (Perols 2011) when evaluating the performance of OU. This benchmark randomly removes non-fraud observations from

the sample to generate a more balanced training sample. We also use no undersampling as an additional benchmark. However, simple undersampling performs

on average 7.3 percent better than no undersampling and we consequently report only simple undersampling. The OU and the OU benchmarks use all variables (as

data dimensionality reduction is examined in the VU analysis). VU is introduced as a data dimensionality reduction method that is argued to improve the

performance over currently used variable selection methods. As a baseline, we use a benchmark that was created using the variables included in Dechow et al.

(2011) model 2 (the Dechow benchmark): RSST accruals, change in receivables, change in inventory, soft assets, percentage change in cash sales, change in

return on assets, actual issuance of securities, abnormal change in employees, and existence of operating leases. This model compares different fraud detection

variables with the objective of creating a parsimonious fraud prediction model. We also use (i) a benchmark that randomly selects variables and (ii) a benchmark

that includes all variables (the all variables benchmark), i.e., where data dimensionality is not reduced. The benchmark that randomly selects variables performs

better than the Dechow benchmark and the all variables benchmark. More specifically, VU with 12 variable subsets performs on average 7.2 percent better than

both the All Variable Benchmark and the benchmark based on Dechow et al. (2011). Thus, we report our results using the benchmark that randomly selects

variables. VU and the VU benchmarks use all fraud and non-fraud observations. Following recent fraud prediction research (e.g., Cecchini et al. 2010) and

findings in Perols (2011), all prediction models are implemented using support vector machines. Sensitivity analyses are used to examine other classification

algorithms.

b

Perols (2011) finds that a simple undersampling ratio of 20 percent provides relatively good performance compared to other undersampling ratios.

c

More specifically, we first create one subset and examine the performance of OU with this single subset. We then create a second subset and use this subset

along with the previously created subset to evaluate the performance of OU with two subsets. Note that while it is possible to derive a total of 41 subsets

following Chan and Stolfo's (1998) approach, the addition of another OU subset is only valuable if the additional subset contains new information. We expect

that the marginal benefit of adding an additional subset decreases as the total number of subsets in OU increases. Additionally, for each subset that is added,

another prediction model has to be built, used for prediction, and combined with the other prediction models' predictions. Thus, there is a computational cost

associated with increasing the number of subsets used. Based on this and the results that indicate that the performance benefit tapers off around 12 subsets, we

do not extend the experiment beyond 20 subsets.
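The one-by-one subset evaluation described in note (c) can be sketched as follows (an illustrative Python sketch; `evaluate` stands in for the ECM scoring on the test folds, and the function name is ours):

```python
def incremental_ensemble_scores(per_model_probs, evaluate):
    """After adding the l-th prediction model, score the ensemble that
    averages the first l models' probability predictions, tracing how the
    marginal benefit of another subset tapers off."""
    scores = []
    for l in range(1, len(per_model_probs) + 1):
        combined = [sum(p) / l for p in zip(*per_model_probs[:l])]
        scores.append(evaluate(combined))
    return scores
```

Plotting the resulting scores against l reproduces the kind of curve shown in Figures 4 and 5, where the benefit levels off around 12 subsets.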


TABLE 2

Multi-subset Observation Undersampling (OU)

Performancea - Increasing the Number of Subsets

Number of                     Percentage Difference
OU Subsets        ECM         to Benchmark            p-valueb

Benchmarkc        0.160

ECM improving:
 2                0.156        2.3%                   0.146
 3                0.151        5.4%                   0.015
 4                0.151        5.4%                   0.036
 5                0.148        7.3%                   0.031
 6                0.149        6.7%                   0.039
 7                0.148        7.4%                   0.012
 8                0.146        8.9%                   0.005
 9                0.145        9.3%                   0.005
10                0.143       10.8%                   0.003
11                0.142       11.1%                   0.003

Performance plateau:
13                0.142       11.1%                   0.006
14                0.143       10.4%                   0.011
15                0.144       10.1%                   0.013
16                0.142       11.2%                   0.008
17                0.142       11.1%                   0.008
18                0.143       10.7%                   0.009
19                0.143       10.4%                   0.010
20                0.143       10.6%                   0.009

Notes:

a

Performance is the average Expected Cost of Misclassification (ECM)

across the ten test folds. ECM is measured at best estimates of prior fraud

probability, i.e., 0.6 percent, and cost ratios, i.e., 30:1.

b

Reported p-values are based on pairwise t-tests using the average and

standard deviation in ECM scores across the ten test folds and are one-tailed

unless otherwise noted. Assumptions related to normality and independent

observations are unlikely to be satisfied and p-values are only included as

an indication of the relation between the magnitude and the variance of the

difference between each implementation and the respective benchmarks.

c

The benchmark is simple undersampling (Perols 2011), which randomly

removes non-fraud observations from the sample to generate a more

balanced training sample. This benchmark performed better than a

benchmark that included all fraud and non-fraud observations. OU and the

OU benchmarks use all variables (independent variable reduction is

examined in the VU analysis) and are implemented using support vector

machines. (Other classification algorithms are used in additional analyses.)
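The ECM measure used throughout combines the error rates with the prior fraud probability and the cost ratio. The functional form below is the standard expected-cost formulation and is our assumption about the exact computation:

```python
def ecm(fn_rate, fp_rate, prior_fraud=0.006, cost_ratio=30.0, c_fp=1.0):
    """Expected Cost of Misclassification (assumed standard form):
    ECM = P(fraud) * FN_rate * C_FN + P(non-fraud) * FP_rate * C_FP,
    with C_FN = cost_ratio * C_FP (30:1 at the paper's best estimates)."""
    c_fn = cost_ratio * c_fp
    return prior_fraud * fn_rate * c_fn + (1.0 - prior_fraud) * fp_rate * c_fp
```

Under the defaults, classifying everything as non-fraud (FN rate of 1) costs 0.18, while flagging everything as fraud (FP rate of 1) costs 0.994, which is why the 30:1 cost ratio still penalizes false positives heavily at a 0.6 percent prior.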


TABLE 3

Prediction Performancea,b of OU and PVU on a

Material Misstatements Hold-Out Sample

Notes:

a

Prediction performance is evaluated using 10-fold cross-validation in which separate datasets are used for model

building vs. model evaluation. Performance is area under the ROC curve (AUC). AUC provides a numeric

value of how well the prediction model ranks the observations in the test sets and represents the probability that a

randomly selected positive (misstatement) instance is ranked higher than a randomly selected negative (non-

misstatement) instance. An AUC of 0.5 is equivalent to a random rank order while an AUC of 1 is perfect

ranking of the evaluation cases.

b

The results in Panel A compare the performance of OU and PVU to the Dechow benchmark using material

misstatement data (all methods and benchmarks are implemented using support vector machines; Panel B reports

results when other classification algorithms are used). This comparison provides further validation of the results

reported earlier on fraud data and provides insight into the usefulness of the proposed methods in a slightly

different setting. The results in Panel B examine the sensitivity of the proposed methods to the use of other

classification algorithms, i.e., logistic regression and bootstrap aggregation. Please see footnotes 4, 22, and 26 in

the text for details about support vector machines and bootstrap aggregation. The results in Panel C compare the

performance of the financial kernel from Cecchini et al. (2010) with and without OU (both implementations use

support vector machines). This analysis provides insight into (i) the usefulness of OU when used in combination

with a different set of independent variables (created using the financial kernel of Cecchini et al. (2010)) and (ii)

whether OU provides incremental predictive power when used in combination with the financial kernel.

- 49 -

TABLE 3 (continued)

Prediction Performance of OU and PVU on a

Material Misstatements Hold-Out Sample

c

In panels A and B, given the source, i.e., Dechow et al. (2011), and the nature of the material misstatement data,

we use the Dechow et al. (2011) benchmark in these comparisons. This benchmark is based on model 2 from

Dechow et al. (2011): material misstatement = RSST accruals + change in receivables + change in inventory +

soft assets + percentage change in cash sales + change in return on assets + actual issuance of securities +

abnormal change in employees + existence of operating leases. The independent variables in this model were

selected using a material misstatement sample that is similar to the sample used in this experiment. Because the

entire sample was used when selecting these variables it is possible that the benchmark performance represents

an overfitted model. In this experiment, OU uses all 107 variables, but under-samples the non-fraud

observations using the OU method. PVU uses all data, but partitions the original 107 variables based on fraud

types.

d

The financial kernel consists of 1,518 independent variables representing current and lagged ratios and changes

in the ratios of 23 financial statement variables commonly used to construct independent variables in fraud

research. In this experiment, OU is implemented using the same 1,518 independent variables and support vector

machines. PVU is not implemented in this experiment, as it is not clear how to partition the 1,518 independent

variables into different fraud categories.

e

p-values are one-tailed based on pairwise t-tests using the average and standard deviation of ECM scores across

the ten test folds. Assumptions related to normality and independent observations are unlikely to be satisfied and

p-values are only included as an indication of the relation between the magnitude and the variance of the

difference between each implementation and the benchmark.
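The AUC interpretation in note (a), the probability that a randomly selected misstatement case is ranked above a randomly selected non-misstatement case, can be computed directly from the predicted scores (a brute-force sketch of the pairwise definition; ties counted as one half):

```python
def auc(pos_scores, neg_scores):
    """Pairwise AUC: the fraction of (positive, negative) pairs in which the
    positive case receives the higher score; ties count as 0.5."""
    wins = sum((p > n) + 0.5 * (p == n)
               for p in pos_scores for n in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))
```

A value of 0.5 corresponds to a random rank order and 1.0 to a perfect ranking, matching the description in note (a).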


TABLE 4

Hypothesis Testing: Results on Full Sample Logistic Regressions versus 12 OU Subsamples Logistic Regressions

Columns 2-5 report the full sample logistic regression; columns 6-13 summarize the 12 OU subsample logistic regressions.

Variables | Estimate | Std Error | ChiSquare | Prob>ChiSq | Average Estimate | St. Dev. Estimates | p-value Mean | p-value Minimum | Percent p-values below 0.05 | p-value Lower Quartile | p-value Median | p-value Upper Quartile

7.820 0.698 125.46 <0.001 5.513 0.632 <0.001 <0.001 100% <0.001 <0.001 <0.001

SOFT_ASSETS -3.012 0.611 24.34 <0.001 -3.251 0.636 <0.001 <0.001 100% <0.001 <0.001 <0.001

FOURYGEOM_S -1.750 0.392 19.96 <0.001 -1.917 0.873 0.017 <0.001 92% <0.001 <0.001 0.004

AZSCORE -0.101 0.026 15.15 <0.001 -0.118 0.056 0.012 <0.001 83% <0.001 <0.001 0.009

TACCRU_T_TA -3.247 1.007 10.40 0.0013 -3.383 0.729 0.012 <0.001 92% <0.001 0.002 0.008

T_XOPR 0.000 0.000 9.52 0.0020 0.000 0.000 0.016 <0.001 92% 0.002 0.006 0.012

PPANDEQ_T_TA -3.644 1.193 9.33 0.0023 -3.887 1.379 0.041 <0.001 83% <0.001 <0.001 0.012

NETS 0.000 0.000 7.26 0.0070 0.000 0.000 0.034 <0.001 83% 0.003 0.016 0.046

S_T_T_EMP 0.001 0.000 6.37 0.0116 0.001 0.001 0.180 <0.001 67% <0.001 0.044 0.280

FA_T_TA 1.816 0.720 6.36 0.0117 1.841 1.146 0.138 <0.001 58% <0.001 0.029 0.206

PCHG_ACCP_T_INV -0.005 0.002 5.93 0.0149 -0.006 0.002 0.045 <0.001 83% 0.010 0.017 0.034

T_APCHG_ACCP_T_INV 0.004 0.002 4.39 0.0362 0.004 0.001 0.112 0.002 25% 0.054 0.076 0.108

ASS_T_LIAB 0.162 0.081 3.97 0.0462 0.166 0.095 0.191 <0.001 33% 0.008 0.178 0.319

PCHG_ASS_T_LIAB -0.005 0.003 3.72 0.0538 -0.005 0.002 0.114 0.013 33% 0.043 0.112 0.150

ACCR_T_TA 1.550 0.817 3.60 0.0578 1.694 1.391 0.168 <0.001 58% 0.008 0.029 0.321

INDUSTRY_FIRM_ROE -0.060 0.033 3.29 0.0695 -0.058 0.012 0.127 0.052 0% 0.076 0.113 0.189

LIAB_T_IEXP 0.001 0.001 2.90 0.0887 0.002 0.001 0.159 0.024 25% 0.050 0.103 0.237

RSST_ACCRUALS 0.022 0.013 2.82 0.0928 0.021 0.005 0.140 0.054 0% 0.078 0.102 0.227

Note: Average estimates, standard deviation estimates, and average p-values are based on estimates and p-values from the 12 OU subsample logistic regression

results. P-values less than 0.0001 were converted to 0.0001 before taking the average.
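The averaging described in the note, flooring p-values at 0.0001 before averaging across the 12 OU subsample regressions, amounts to the following (the function name is ours):

```python
def average_floored_pvalues(pvalues, floor=0.0001):
    """Average p-values across the OU subsample regressions, first converting
    any p-value below the floor to the floor itself."""
    floored = [max(p, floor) for p in pvalues]
    return sum(floored) / len(floored)
```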

