Naveen Prakash · Deepika Prakash

Data Warehouse Requirements Engineering

A Decision Based Approach

Naveen Prakash
ICLC Ltd.
New Delhi, India

Deepika Prakash
Central University of Rajasthan
Kishangarh, India

ISBN 978-981-10-7018-1

https://doi.org/10.1007/978-981-10-7019-8

ISBN 978-981-10-7019-8 (eBook)

Library of Congress Control Number: 2017961755

© Springer Nature Singapore Pte Ltd. 2018

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Printed on acid-free paper

This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd. part of Springer Nature. The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore

To

Our Family

Preface

That requirements engineering is part of the systems development life cycle, and is about the first activity to be carried out when building systems, is today considered basic knowledge in computer science/information technology. Requirements engineering produces requirements specifications that are carried through to system design and implementation. It is assumed that systems automate specific activities that are carried out in the real world. These activities are transactions, for example reservations, cancellations, buying, selling, and the like. Thus requirements engineering produces requirements specifications of transactional systems. So long as systems were not very complex, the preparation of a requirements specification was feasible and did not compromise on system delivery times. However, as systems became more and more complex, iterative and incremental development came to the fore. Producing a requirements specification is now frowned upon and we need to produce, in the language of Scrum, user stories for small parts of the system.

About the time requirements engineering was developing, data warehousing also became important. Data warehouse development faced the same challenges as transactional systems do, namely determination of the requirements to be met and the role of requirements engineering in the era of agile development. However, both these issues have been taken up relatively recently. Due to this recent interest in the area, requirements engineering for data warehousing is relatively unknown. We fear that there is widespread paucity of understanding of the nature of data warehouse requirements engineering, how it differs from traditional transaction-oriented requirements engineering, and what new issues it raises.

Perhaps the role of agility in data warehouse development is even more crucial than in transactional systems development. This is because of the inherent complexity of data warehouse systems, long lead times to delivery, and the huge costs involved in their development. Indeed, the notion of data marts and the bus approach to data warehouse development is an early response to these challenges.

This book is our attempt at providing exposure to the problem of data warehouse requirements engineering. We hope that the book shall contribute to a wider


awareness of the difference between requirements engineering for transactional and data warehouse systems, and of the challenges that data warehousing presents to the requirements engineering community.

The position adopted in this book is that even in the face of agile development, requirements engineering continues to be relevant. Requirements engineering today is not done to produce requirements specifications of entire systems. Rather, it is done to support incremental and iterative development. In other words, rather than restrict incremental and iterative development to the downstream tasks of design and implementation, we must extend it to the requirements engineering task as well. We argue that the entire data warehouse systems development life cycle should become agile. Thus, we make requirements and requirements engineering the fulcrum for agile data warehouse development. Just as requirements specifications of systems formed the basis for proceeding with systems development earlier, so also now requirements specifications of system increments must form the basis of incremental and iterative development.

Following this line of argument, instead of a requirements specification, we propose to develop requirements granules. It is possible to consider building a requirements granule per data mart. However, we consider a data mart as having very large granularity because it addresses an entire subject like sales, purchase, etc. Therefore, the requirements granule that would be produced shall be large-grained, resulting in relatively long lead times to delivery of the intended product increment. It is worth developing an approach to requirements engineering that can produce requirements granules of smaller sizes.

To reduce the sizes of requirements granules, we introduce the notion of a decision and propose to build data warehouse fragments for decisions. Thus, data warehouse requirements engineering is for discovering the decisions of interest and then determining the information relevant to each decision. A requirements granule is the collection of information relevant to a decision. If this information is available, then it is possible for the decision maker to obtain it from the data warehouse fragment, evaluate it, and decide whether to take the decision or not. This implies that the size of a granule is determined by the amount of information that is associated with a decision.

The notion of a decision is thus central to our approach. A decision represents the useful work that the data warehouse fragment supports, and a data warehouse fragment is the implementation of a requirements granule. The approach in this book represents a departure from the conventional notion of a data mart that is built to analyze a subject area. Analysis for us is not an aim in itself, but taking a decision is, and analysis is only in support of the decision-making task.

As more and more decisions are taken up for development, there is a proliferation of requirements granules and data warehouse fragments. This results in problems of inconsistent information across the enterprise, and of proliferating costs due to multiple platforms and ETL processes. This is similar to what happens in the bus-of-data-marts approach, except that a decision may be of a lower granularity than a data mart. This means that we can expect many more data warehouse


fragments than data marts, and the problem of inconsistency and costs is even more severe. Given the severity of the problem, we do not consider it advisable to wait for the problem to appear and then take corrective action by doing consolidation. It is best to take a preventive approach that minimizes fragment proliferation. Again, keeping in mind that for us requirements are the fulcrum for data warehouse development, we consolidate requirements granules even as they are defined.

This book is a summary of research in the area of data warehouse requirements engineering carried out by the authors. To be sure, this research is ongoing and we expect to produce some more interesting results in the future. However, we believe that we have reached a point where the results we have achieved form a coherent whole from which the research and industrial community can benefit.

The initial three chapters of the book form the backdrop for the last three. We devote Chap. 1 to the state of the art in transactional requirements engineering, whereas Chap. 2 is for data warehouse requirements engineering. The salient issues in data warehouse requirements engineering addressed in this book are presented in Chap. 3. Chapter 4 deals with the different types of decisions and contains techniques for their elicitation. Chapter 5 is devoted to information elicitation for decisions, and the basic notion of a requirements granule is formulated here. Chapter 6 deals with agility built around the idea of requirements granules and data warehouse fragments. The approach to data warehouse consolidation is explained here.

The book can be used in two ways. For those readers interested in a broad-brush understanding of the differences between transactional and data warehouse requirements engineering, the first three chapters would suffice. However, for those interested in deeper knowledge, the rest of the chapters would be of relevance as well.

New Delhi, India      Naveen Prakash
Kishangarh, India     Deepika Prakash
September 2017

Contents

1 Requirements Engineering for Transactional Systems
   1.1 Transactional System Development Life Cycle
   1.2 Transactional Requirements Engineering
   1.3 Requirements Engineering (RE) as a Process
   1.4 Informal Approaches to Requirements Elicitation
   1.5 Model-Driven Techniques
      1.5.1 Goal Orientation
      1.5.2 Agent-Oriented Requirements Engineering
      1.5.3 Scenario Orientation
      1.5.4 Goal Scenario Coupling
   1.6 Conclusion
   References

2 Requirements Engineering for Data Warehousing
   2.1 Data Warehouse Background
   2.2 Data Warehouse Development Experience
   2.3 Data Warehouse Systems Development Life Cycle, DWSDLC
   2.4 Methods for Data Warehouse Development
      2.4.1 Monolithic Versus Bus Architecture
      2.4.2 Data Warehouse Agile Methods
   2.5 Data Mart Consolidation
   2.6 Strategic Alignment
   2.7 Data Warehouse Requirements Engineering
      2.7.1 Goal-Oriented DWRE Techniques
      2.7.2 Goal-Motivated Techniques
      2.7.3 Miscellaneous Approaches
      2.7.4 Obtaining Information
   2.8 Conclusion
   References

3 Issues in Data Warehouse Requirements Engineering
   3.1 The Central Notion of a Decision
      3.1.1 The Decision Process
      3.1.2 Decision-Oriented Data Warehousing
   3.2 Obtaining Information Requirements
      3.2.1 Critical Success Factors
      3.2.2 Ends Achievement
      3.2.3 Means Efficiency
      3.2.4 Feedback Analysis
      3.2.5 Summary
   3.3 Requirements Consolidation
   3.4 Conclusion
   References

4 Discovering Decisions
   4.1 Deciding Enterprise Policies
      4.1.1 Representing Policies
      4.1.2 Policies to Choice Sets
   4.2 Deciding Policy Enforcement Rules
      4.2.1 Representing Enforcement Rules
      4.2.2 Developing Choice Sets
   4.3 Defining Operational Decisions
      4.3.1 Structure of an Action
   4.4 Computer-Aided Support for Obtaining Decisions
      4.4.1 Architecture
      4.4.2 User Interface
   4.5 Conclusion
   References

5 Information Elicitation
   5.1 Obtaining Multidimensional Structure
   5.2 Decisional Information Elicitation
   5.3 The Decision Requirement Model
      5.3.1 The Notion of a Decision
      5.3.2 Metamodel of Decisions
      5.3.3 Information
   5.4 Eliciting Information
      5.4.1 CSFI Elicitation
      5.4.2 ENDSI Elicitation
      5.4.3 MEANSI Elicitation
      5.4.4 Feedback Information Elicitation
   5.5 The Global Elicitation Process
   5.6 Eliciting Information for Policy Decision-Making
      5.6.1 CSFI Elicitation
      5.6.2 Ends Information Elicitation
   5.7 Eliciting Information for PER Formulation
   5.8 Information Elicitation for Operational Systems
      5.8.1 Elicitation for Selecting PER
      5.8.2 Information Elicitation for Actions
   5.9 The Late Information Substage
      5.9.1 ER Schema for Policy Formulation
      5.9.2 ER Schema for PER Formulation and Operations
      5.9.3 Guidelines for Constructing ER Schema
   5.10 Computer-Based Support for Information Elicitation
      5.10.1 User Interfaces
      5.10.2 The Early Information Base
   5.11 Conclusion
   References

6 The Development Process
   6.1 Agile Data Warehouse Development
   6.2 Decision Application Model (DAM) for Agility
   6.3 A Hierarchical View
   6.4 Granularity of Requirements
      6.4.1 Selecting the Right Granularity
   6.5 Showing Agility Using an Example
   6.6 Comparison of DAM and Epic-Theme-Story Approach
   6.7 Data Warehouse Consolidation
   6.8 Approaches to Consolidation
   6.9 Consolidating Requirements Granules
      6.9.1 An Example Showing Consolidation
   6.10 Tool Support
   6.11 Conclusion
   References

7 Conclusion

About the Authors

Naveen Prakash started his career with the Computer Group of Bhabha Atomic Research Centre, Mumbai in 1972. He obtained his doctoral degree from the Indian Institute of Technology Delhi (IIT Delhi) in 1980. He subsequently worked at the National Center for Software Development and Computing Techniques, Tata Institute of Fundamental Research (NCSDCT, TIFR) before joining the R&D group of CMC Ltd where he worked for over 10 years doing industrial R&D. In 1989, he moved to academics. He worked at the Department of Computer Science and Engineering, Indian Institute of Technology Kanpur (IIT Kanpur), and at the Delhi Institute of Technology (DIT) (now Netaji Subhas Institute of Technology (NSIT)), Delhi. During this period he provided consultancy services to Asian Development Bank and African Development Bank projects in Sri Lanka and Tanzania, respectively, as well as to the Indira Gandhi National Centre for the Arts (IGNCA) as a United Nations Development Programme (UNDP) consultant. He served as a scientific advisor to the British Council Division, New Delhi and took up the directorship of various educational institutes in India. Post-retirement, he worked on a World Bank project in Malawi. Prof. Prakash has lectured extensively in various universities abroad. He is on the editorial board of the Requirements Engineering Journal, and of the International Journal of Information System Modeling and Design (IJISMD). He has published over 70 research papers and authored two books. Prof. Prakash continues to be an active researcher. Besides Business Intelligence and Data Warehousing, his interests include the Internet-of-Things and NoSQL databases. He also lectures at the Indira Gandhi Delhi Technical University for Women (IGDTUW), Delhi and IIIT Delhi.

Deepika Prakash obtained her Ph.D. from Delhi Technological University, Delhi in the area of Data Warehouse Requirements Engineering. Currently, she is an Assistant Professor at the Department of Big Data Analytics, Central University of Rajasthan, Rajasthan.


Dr. Prakash has five years of teaching experience, as well as two years of experience in industrial R&D, building data marts for purchase, sales, and inventory and in data mart integration. Her responsibilities in industry spanned the complete life cycle, from requirements engineering through conceptual modeling to extract-transform-load (ETL) activities. As a researcher, she has authored a number of papers in international forums and has delivered invited lectures at a number of institutes throughout India. Her current research interests include Business Intelligence, Health Analytics, and the Internet-of-Things.

Chapter 1

Requirements Engineering for Transactional Systems

Transactional systems have been the forte of Information Systems/Software Engineering. These systems deal with automating the functionality of systems, to provide value to the users. Initially, up to the end of the decade of the 1960s, transactional systems were simple, single-function systems. Thus, we had payroll systems that accounts people would use to compute the salary of employees and print it out. Information Systems/Software Engineering technology graduated to multi-functional systems that looked at the computerization of relatively larger chunks of the business. Thus, it now became possible to deal with the accounts department, the human resource department, the customer interface, etc. Technology to deal with such systems stabilized in the period 1960-1980. Subsequently, attention shifted to even more complex systems, the computerization of the entire enterprise, and to inter-organization information systems.

The demand for engineering of ever more complex systems led to the "software crisis", a term widely used in the 1990s to describe the difficulties that the industry of that time faced. A number of studies were carried out, and some of the problems highlighted were systems failure/rejection by clients and the inability to deliver complex and large software well. The Standish Group's Chaos reports [1] presented the software industry's record in delivering large-sized systems using traditional development methods. The Group conducted a survey of 8380 projects carried out by 365 major American companies. The results showed that projects worth up to even $750,000 exceeded budget and time. Further, they failed to deliver the promised features more than 55% of the time. As the size of the applications grew, the success rate fell to 25% for efforts over $3 million and down to zero for projects over $10 million.

Bell Labs and IBM [2] found that 80% of all defects in software products lie in the requirements phase. Boehm and Papaccio [3] said that correcting requirements errors is 5 times more expensive when carried out during the design phase; the cost of correction is 10 times during the implementation phase; the cost rises to 20 times for corrections done during testing; and it becomes an astronomical 200 times after the system has been delivered. Evidently, such corrections result in expensive products



and/or total rejection of software. The Standish Group [4] reported that one of the reasons for project failure is "incomplete requirements". Clearly, the effect of poorly engineered requirements ranges from outright rejection of the system by the customer to major reworking of the developed system. The Software Hall of Shame [5] surveyed around 30 large software development projects that failed between 1992 and 2005 to try to identify the causes of this failure. It was found that failures arise either because projects go beyond actual needs or because of expansion in the scope of the original project. This implied that requirements changed over the course of product development and this change was difficult to handle.

The foregoing suggested that new methods of software development were needed that delivered on time, on budget, met their requirements, and were also capable of handling changing requirements. The response was twofold:

An emphasis on incremental and iterative product development rather than one-shot development of the entire product. Small, carefully selected product parts were developed and integrated with other parts as and when the latter became available. As we shall see, this took the form of agile software development.

The birth of the discipline of requirements engineering in which the earlier informal methods were replaced by model-driven methods. This led to the systematization of the requirements engineering process, computer-based management of requirements, guidance in the requirements engineering task, and so on.

We discuss these two responses in the rest of this chapter.

1.1 Transactional System Development Life Cycle

The System Development Life Cycle, SDLC, for transactional systems (TSDLC) starts from gathering system/software requirements and ends with the deployment of the system. One of the earliest models of the TSDLC is the waterfall model. The waterfall model has six sequential phases. Each phase has different actors participating in it. The output of one phase forms the input to the next phase. This output is documented and used by the actors of the next phase. The documentation produced is very large and time-consuming to prepare. Since the model is heavy on documentation, it is sometimes referred to as document driven. Table 1.1 shows the actors and the document produced against each phase of the life cycle.

The process starts with identifying what needs to be built. There are usually several stakeholders of a system. Each stakeholder sits down with the requirements engineer and details what s/he specifically expects from the system. These needs are referred to as requirements. A more formal definition of the term requirements is available in the subsequent sections of this chapter.


Table 1.1 The different phases of TSDLC

TSDLC phase                | Actor                               | Document
Requirements engineering   | Stakeholder, Requirements engineer  | System requirements specification
System and software design | System analysts                     | High-level and low-level design documents
Implementation             | Development team                    | Code
Verification               | Tester                              | Test case document
Maintenance                | Project manager, Stakeholder        | User manuals

These requirements, as given by the stakeholders, are documented in a System Requirements Specification (SRS) document. Once the SRS is produced, the actors of the system design phase, the system analysts, convert the requirements into a high-level design and a low-level design. The former describes the software architecture. The latter discusses the data structures to be used, the interfaces, and other procedural details. Here, two documents are produced, the high-level and the low-level design document. The design documents are made available to the implementation team for the development activity to start. Apart from development, unit testing is also a feature of this phase. In the Verification phase, functional and non-functional testing is performed and a detailed test case document is produced. Often test cases are designed with the involvement of the stakeholders. Thus, apart from the testers, stakeholders are also actors of this phase. Finally, the software is deployed and support is provided for maintenance of the product.

Notice that each phase is explored fully before moving on to the next phase. Also notice that there is no feedback path to go back to a previously completed phase. Consider the following scenario. The product is in the implementation phase and the developers realize that an artifact has been poorly conceptualized. In other words, there is a need to rework a part of the conceptual model for development to proceed. However, there is no provision in the model to go back to the system design phase once the product is in the development phase. This model also implies that

(a) Requirements once specified do not change. However, this is rarely the case. A feedback path is required in the event of changing requirements. This ensures that changes are incorporated in the current software release rather than waiting for the next release to adopt the changed requirement.

(b) All requirements can be elicited from the stakeholders. The requirements engineering phase ends with a sign-off from the stakeholder. However, as already brought out, studies have shown that it is not possible to elicit all the requirements upfront from the stakeholders. Stakeholders are often unable to envision changes that could arise 12-24 months down the line and generally mention requirements as of the day of the interview with the requirements engineer.


Being sequential in nature, a working model of the product is released only at the end of the life cycle. This leads to two problems. One is that feedback can be obtained from the stakeholder only after the entire product is developed and delivered. Even a slightly negative feedback means that the entire system has to be redeveloped; considerable time and effort in delivering the product is wasted. The second problem is that these systems suffer from long lead times for product delivery. This is because the entire requirements specification is made before the system is taken up for design and implementation.

An alternate method of system development is to adopt an agile development model. The aim of this model is to provide an iterative and incremental development framework for delivery of a product. An iteration is defined by clear deliverables which are identified by the stakeholder. Deliverables are pieces of the product usable by the stakeholder. Several iterations are performed to deliver the final product, making the development process incremental. Also, iterations are time-boxed, with the time allocated to each iteration remaining almost the same till the final product is delivered.

One of the popular approaches to agile development is Scrum. In Scrum, iterations are referred to as sprints. There are two actors, product owner and developer. The product owner is the stakeholder of the waterfall model. The requirements are elicited in the form of user stories. A user story is defined as a single sentence that identifies a need. User stories have three parts: Who identifies the stakeholder, What identifies the action, and Why identifies the reason behind the action. A good user story is one that is actionable, meaning that the developer is able to use it to deliver the need at the end of the sprint. Wake [6] introduced the INVEST test as a measure of how good a user story is. A good user story must meet the following criteria: Independent, Negotiable (not too specific), Valuable, Estimable, Small, and Testable. One major issue in building stories is that of determining when the story is "small". Small is defined as that piece of work that can be delivered in a sprint. User stories as elicited from the product owner may not fit in a sprint. Scrum uses the epic-theme-user story decomposition approach to deal with this. Epics are stories identified by the product owner in the first conversation. They require several sprints to deliver. In order to decompose the epic, further interaction with the product owner is performed to yield themes. However, a theme by itself may take several sprints, though fewer than its epic, to deliver. Therefore, a theme is further decomposed into user stories of the right size. A minimal sketch of this decomposition is given after the comparison list below.

When comparing the agile development model with the waterfall model, there are two major differences:

1. In Scrum, sprints do not wait for the full requirements specification to be produced. Further, the requirements behind a user story are also not fully specified but follow the 80-20 principle: 80% of the requirements need to be clarified before proceeding with a sprint and the remaining 20% are discovered during the sprint. Thus, while in the waterfall model stakeholder involvement in the requirements engineering phase ends with a sign-off from the stakeholder, in Scrum the stakeholder is involved during the entire life cycle. In fact, iterations proceed with the feedback of the stakeholder.


2. At the end of one iteration, a working sub-product is delivered to the stakeholder. This could either be an enhancement or a new artifact. This is unlike the waterfall model where the entire product is delivered at the end of the life cycle.
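The epic-theme-story decomposition described above can be pictured as a simple containment hierarchy in which stories are split until each fits in one sprint. The following minimal Python sketch illustrates that idea; the story texts, point estimates, and sprint-capacity figure are illustrative assumptions of ours, not taken from Scrum literature or from this book.

```python
from dataclasses import dataclass, field
from typing import List

SPRINT_CAPACITY = 8  # assumed team capacity in story points per sprint

@dataclass
class Story:
    text: str     # "As a <who>, I want <what>, so that <why>"
    points: int   # estimated size

    def fits_in_sprint(self) -> bool:
        # "Small" in the INVEST sense: deliverable within one sprint
        return self.points <= SPRINT_CAPACITY

@dataclass
class Theme:
    name: str
    stories: List[Story] = field(default_factory=list)

@dataclass
class Epic:
    name: str
    themes: List[Theme] = field(default_factory=list)

    def oversized_stories(self) -> List[Story]:
        # Stories that still need further conversation with the product owner
        return [s for t in self.themes for s in t.stories if not s.fits_in_sprint()]

# Hypothetical example: a retail epic decomposed into themes and stories
epic = Epic("Online ordering", [
    Theme("Catalogue", [
        Story("As a buyer, I want to search products, so that I find items quickly", 5),
    ]),
    Theme("Checkout", [
        Story("As a buyer, I want to pay by card, so that I can complete my order", 13),
    ]),
])

print([s.text for s in epic.oversized_stories()])  # the 13-point story must be split further
```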

1.2 Transactional Requirements Engineering

Let us start with some basic definitions that tell us what requirements are and what requirements engineering does.

Requirements

A requirement has been defined in a number of ways. Some definitions are as follows.

Definition 1: A requirement is defined in [7] as "(1) a condition or capability needed by a user to solve a problem or achieve an objective; (2) a condition or capability that must be met or possessed by a system or system component to satisfy a contract, standard, specification or other formally imposed documents; (3) a documented representation of a condition or capability as in (1) or (2)". According to this definition, requirements arise from users, the general organization, standards, government bodies, etc. These requirements are then documented. A requirement is considered as a specific property of a product by Robertson and Kotonya, as shown in Definition 2 and Definition 3 below.

Definition 2: "Something that the product must do or a quality that the product must have" [8].

Definition 3: "A description of how the system shall behave, and information about the application domain, constraints on operations, a system property, etc." [9].

Definition 4: "Requirements are high-level abstractions of the services the system shall provide and the constraints imposed on the system."

Requirements have been classified into functional requirements, FR, and non-functional requirements, NFR. Functional requirements are statements about what a system should do, how it should behave, what it should contain, or what components it should have, and non-functional requirements are statements of quality, performance, and environment issues with which the system should conform [10]. Non-functional requirements are global qualities of a software system, such as flexibility, maintainability, etc. [11].

Requirements Engineering

Requirements engineering, RE, is the process of obtaining and modeling requirements. Indeed, a number of definitions of RE exist in the literature.

Definition 1: Requirements engineering (RE) is defined [7] as "the systematic process of developing requirements through an iterative cooperative process of analyzing


the problem, documenting the resulting observations in a variety of representation formats, and checking the accuracy of the understanding gained". The process is cooperative because different stakeholders have different needs and therefore varying viewpoints. RE must take into account conflicting views and interests of users and stakeholders. Capturing different viewpoints allows conflicts to surface at an early stage in the requirements process. Further, the resulting requirements are the ones that are agreeable to both customers and developers.

Definition 2, Zave [12]: Requirements engineering deals with the real-world goals for functions and constraints of the software system. It makes a precise specification of software behavior and its evolution over time. This definition incorporates real-world goals; in other words, it hopes to capture requirements that answer the "why" of software systems. Here, the author is referring to functional requirements. Further, the definition also gives emphasis to "precise requirements". Thus the quality of the requirements captured is also important.

Definition 3, van Lamsweerde [13]: RE deals with the identification of goals to be achieved by the system to be developed, and the operationalization of such goals into services and constraints.

Definition 4, Nuseibeh and Easterbrook [14]: RE aims to discover the purpose behind the system to be built, by identifying stakeholders and their needs, and documenting these. Here, the emphasis is on identifying stakeholders and capturing their requirements.

1.3 Requirements Engineering (RE) as a Process

Evidently, requirements engineering can be viewed as a process with an input and an output. Stakeholders are the problem owners. They can be users, designers, system analysts, business analysts, technical authors, and customers. In the RE process, requirements are elicited from these sources. The output of the process is generally a set of agreed requirements, system specifications, and system models. The first two of these are in the form of use cases, goals, agents, or NFRs. System models can be object models, goal models, domain descriptions, behavioral models, problem frames, etc.

There are three fundamental concerns of RE, namely, understanding the problem, describing the problem, and attaining an agreement on the nature of the problem. The process involves several actors for the various activities. We visualize the entire process as shown in Fig. 1.1. There are four stages, each with specific actors, marked in green in the figure. A requirements engineer is central to the entire process. Let us elaborate the components of the figure.


[Fig. 1.1 The RE process. The figure shows four stages: Requirements elicitation, Analysis and negotiation, Specification and documentation, and Verification and validation, connected by feedback loops for a different understanding of the system, new conflicting requirements, new information required, and inconsistencies, missing requirements, or ambiguous requirements. Inputs include users, literature, existing software, and stakeholders; actors include the requirements engineer, stakeholders, a facilitator, system analysts, domain experts, and the project manager; outputs may be expressed in formal specification languages or knowledge representation languages. A dotted line separates the early phase from the late phase.]

Requirements Elicitation: Requirements are elicited from users, domain experts, literature, existing software that is similar to the one to be built, stakeholders, etc. As can be seen in the figure, these form the input into this step. The actors involved in this step are the requirements engineer and the stakeholders. The requirements engineer, using one or more of the several elicitation techniques described in later sections of this chapter, elicits requirements from the stakeholder. Usually, several sessions are required, where each session may employ a different elicitation technique. This is a labor-intensive step and usually takes a large amount of time and resources.

Analysis/Negotiation: The requirements elicited during the previous step are the input into this step. As can be seen in the figure, a facilitator is also an actor, along with the requirements engineer and the stakeholders. There are multiple stakeholders of the system to-be. Each stakeholder may have a different view of what functionality the


system must have and what the goal of building the system to-be is. This gives rise to conflicts. In this step, an agreement between the various stakeholders on the requirements of the system to-be is established with the help of a facilitator. Notice the red arrow in Fig. 1.1 from this step back to the requirements elicitation step. It may happen that during resolution of the conflicts there is a new and different understanding of the system entirely or of a part of the system. By going back to the requirements elicitation stage, new requirements are elicited.

Specification and Documentation: Once all conflicts are resolved, requirements are documented for use in subsequent stages of system development. As shown in Fig. 1.1, the document may be in formal specification languages, knowledge representation languages, etc. System analysts and domain experts are involved in this task. It is possible that during documentation new conflicting requirements are found. To accommodate this, there is a feedback loop, shown by the red arrow, which goes from this stage to the analysis/negotiation stage. It may also happen that more information in the form of requirements is needed for the system to be built. For this, there is a loop (red arrow in the figure) back to the requirements elicitation stage.

Verification and Validation (V&V): The main goal here is to check whether the document meets the customers'/clients' needs. The input into this stage is the documented requirements. The project manager along with the stakeholder is involved in this task. Consistency and completeness are some aspects of the document that are checked. Once the requirements have been verified and validated, the RE process is considered complete and the other phases of the TSDLC described in Sect. 1.1 are executed. However, if any inconsistencies, missing requirements, or ambiguous requirements are found, then the entire process is repeated.

There are two phases of RE, an early phase and a late RE phase. The dotted line running diagonally across Fig. 1.1 divides the RE process into the two phases. The early RE phase focuses on whether the interests of stakeholders are being addressed or compromised. Requirements elicitation and analysis/negotiation form the early RE phase. The late RE phase focuses on consistency, completeness, and verification of requirements [15].
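Since Fig. 1.1 can only be summarized here, the stage sequence and feedback loops it depicts can also be written down compactly as a small directed graph. The fragment below is a descriptive sketch of that structure only; the stage identifiers are abbreviations we introduce, not terms from the book.

```python
# Forward flow of the RE process (early phase: first two stages; late phase: last two)
forward = {
    "elicitation": "analysis_negotiation",
    "analysis_negotiation": "specification_documentation",
    "specification_documentation": "verification_validation",
}

# Feedback loops described in the text: a stage may send work back upstream
feedback = {
    "analysis_negotiation": ["elicitation"],                  # new understanding of the system
    "specification_documentation": ["analysis_negotiation",   # new conflicting requirements
                                    "elicitation"],           # more information required
    "verification_validation": ["elicitation"],               # inconsistent, missing, or ambiguous requirements
}

def next_stages(stage: str, problem_found: bool) -> list:
    """Stages that may be executed next; backward edges are taken only when a problem is found."""
    if problem_found:
        return feedback.get(stage, [])
    return [forward.get(stage, "done")]

print(next_stages("specification_documentation", problem_found=True))
```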

1.4 Informal Approaches to Requirements Elicitation

Requirements elicitation was initially performed using techniques like interviews, analyzing existing documentation, questionnaires, group elicitation techniques, and brainstorming, and eventually evolved into JAD/RAD, workshops, prototyping, and contextual, cognitive, and ethnographic studies. This set of techniques is "informal", in contrast to the model-driven techniques described later. Let us look at each of these approaches individually, in terms of the technique employed to elicit requirements, and their advantages and disadvantages.


(i) Interviews: Interviews are held between the requirements engineer and the stakeholder. Interviews are commonly acknowledged to be a stimulus-response interaction [16]. This interaction is based on some (usually unstated) assumptions [16]. The requirements engineer prepares a set of relevant questions in order to get to know what the stakeholders want. It is assumed that the questions will be read without variation, will be interpreted in an unambiguous way, and will stimulate a valid response. Suchman and Jordan [17] argue that this validity is not assured. There is another problem, of the requirements engineer trying to impose their views through the questions they ask.

(ii) Analyzing existing documentation: Documentation like organizational charts, process models, standards, or existing manuals can be analyzed to gather requirements for systems that closely resemble, or are a replacement of, an old system. If the documentation is well done, then it can form a rich source of requirements [10]. Great caution has to be exercised while analyzing the documentation. A tendency to over-analyze the existing documentation often leads to the new system being too constrained [18].

(iii) Questionnaires: This technique is generally used for problems that are fairly concrete and for understanding the external needs of a customer [18]. This method has several advantages. One can quickly collect information from large numbers of people, administer the questionnaire remotely, and collect attitudes, beliefs, and characteristics of the customer. However, this technique has certain disadvantages [16]. The questionnaire can have simplistic (presupposed) categories that provide very little context. It also limits the room for users to convey their real needs. One must also be careful while selecting the sample to prevent any bias.

(iv) Group elicitation techniques: Focus groups are a kind of group interview [16]. They overcome the problem of the interviewing technique being a very rigid interaction. This technique exploits the fact that a more natural interaction between people helps elicit richer needs [14]. The groups are generally formed ad hoc, based on the agenda of the day. A group usually consists of stakeholders and requirements engineers. These group elicitation techniques are good for uncovering responses to products. They overcome the disadvantages of interviews. However, they are not found to be effective in uncovering design requirements. Two popular group formats are Joint Application Development (JAD) and Rapid Application Development (RAD).

(v) Brainstorming [19]: Here, highly specialized groups consisting of actual users, middle level, and/or total stakeholders brainstorm in order to elicit the requirements. This process has two phases: idea generation and idea reduction. In the idea generation phase, as many ideas as possible are generated. These ideas are then mutated. Ideas can also be combined. The idea reduction phase involves pruning ideas generated in the idea generation phase that are not worthy of further discussion. Similar ideas are grouped into one super topic. The ideas that survive the idea reduction phase are


documented and prioritized. There is a trained facilitator during both phases. The job of this facilitator is to see to it that criticism and the personal egos of the group members do not come into play. He/she should also ensure that a published agenda is strictly followed.

(vi) Workshops: This is one of the most powerful techniques for eliciting requirements [19]. Workshops are attended by all key stakeholders for a short but intensely focused period. Brainstorming is the most important part of the workshop [14]. It is agenda based. The agenda is published along with the other pre-workshop documentation. Balance is the key to workshops. There is an outside facilitator to see that the group tries to follow the agenda, but not to obey it too strictly, especially if a good discussion is going on.

(vii) Prototyping: This technique has been used where requirements are uncertain or fuzzy, or when customer feedback is required [14]. Here, tangible operating subsystems are built and feedback is sought from the customers. Prototyping is generally combined with other elicitation techniques. Building prototypes is useful when the system to be built is small and the cost of prototyping is low. An additional constraint is that rapid prototyping should not be done unless building the subsystem really is rapid [18]. Prototyping is proposed for exploring the friendliness of user interfaces, thus helping in converting a vague goal like user friendliness into specific system properties or behavior.

(viii) Contextual: Software systems are used in a social and organizational context. This can influence or even dominate the system requirements. So it is necessary to determine the requirements corresponding to the social system [20]. For example, consider a hospital procedure for the treatment of patients. In India, the hospital first asks for the patient to register; however, in Europe the first thing asked for is insurance coverage. Therefore, the system designed for an Indian hospital will have a different set of requirements. Another example is booking a ticket. In India, we first block a seat and then pay; however, in Europe payment has to be made before blocking the seat. To capture this social and organizational context, a social scientist observes and analyzes how people actually work.

(ix) Cognitive (knowledge elicitation) techniques: Knowledge related to the domain and to performance can be elicited by these techniques [21].

Eliciting Performance Knowledge. This is done by a procedure called Protocol Analysis. In this method, the experts think aloud. There is an observer who observes the expert and tries to understand the cognitive process of the expert. This method of analysis is good for understanding interaction problems with existing systems.

Eliciting Domain Knowledge

(a) Laddering: Probes are used to elicit the structure and content of stakeholder knowledge.


(b) Card Sorting: Each card carries some domain entity. Stakeholders sort the cards into groups. This technique helps elicit requirements based on classification knowledge.

Multiple Experts (the Delphi technique): Used where contact between experts is difficult. Each expert submits her judgment. All judgments are circulated anonymously. Each expert then submits a revised judgment. The process iterates.

1.5 Model-Driven Techniques

Informal techniques rely heavily on the intuition of the requirements engineer and depend on stakeholders' views. The nature of the questions to be asked, the depth to which a question is to be answered, and the information to be elicited all lie in the mind of the requirements engineer. There is little computer-based support to manage elicited requirements or to guide the requirements engineer in the RE task. Further, as we move to systems of increasing complexity, requirements reflect only the data and processes needed by the system to-be, thereby making it difficult to understand requirements with respect to high-level concerns [22] of the business.

Modeling requirements has today become a core process in requirements elicitation. Generally, the system and possible alternate configurations of the system are modeled. These techniques shift the focus from the "what" of the system to the "why" of the system [23]. While the former focuses on the activities of the system, the latter focuses on the rationale for setting the system up. There are two techniques, goal- and agent-oriented modeling, both of which are interrelated.

1.5.1 Goal Orientation

Goal-oriented requirements engineering (GORE) is concerned with the use of goals for eliciting, elaborating, structuring, specifying, analyzing, negotiating, documenting, and modifying requirements [24]. This indicates that goals can be used in almost every activity of the requirements process. Goals have been looked upon in a number of ways, some of which are described below:

(i) Dardenne et al. [ 25 ] state that goals are high-level objectives of the business, organization, or system; they capture the reasons why a system is needed and guide decisions at various levels within the enterprise.

(ii) According to [23], "Goals are targets for achievement which provide a framework for the desired system. Goals are high-level objectives of the business, organization, or system."


It is also interesting to observe that goals are prescriptive statements as against descriptive statements [26], in that they state what is expected from the system and are not statements describing the domain of the system. Goals have been used in RE for eliciting functional requirements [25] as well as non-functional requirements [11]. Hard goals can be satisfied by the system and are used for modeling and analyzing FRs [10]. Satisfaction and information goals are examples of hard goals. Softgoals are goals that do not have a clear-cut criterion for their satisfaction [11] and are used to model and analyze NFRs.

Goals are modeled using the goal decomposition method. It was noticed that goals positively or negatively "support" other goals [27]. These goal links are used as the basis for building a refinement tree, and the links are expressed in terms of AND/OR associations. Thus, goal models are directed acyclic graphs with the nodes of the graphs representing goals [28] and achievement as edges. An AND association means that all the subgoals, g1, ..., gn, must be satisfied to satisfy the parent goal, g. An OR association means that satisfying at least one subgoal among g1, ..., gn is enough to satisfy the parent goal, g. A third link, "conflict", was also added to the refinement tree to capture the case when satisfying one goal causes another goal not to be satisfied. Further links were added to this basic model. van Lamsweerde [13] introduced pre-conditions, post-conditions, and trigger conditions. A link between goals and operations was also introduced by Dardenne et al. [25], where lowest level goals were said to be operational goals. This meant that operational goals can be implemented using functions of a functional system. (A small illustrative sketch of such an AND/OR refinement tree is given after the two techniques below.)

Before one can start goal modeling, goals need to be identified. One source of goals is current systems and documents like ER diagrams, flowcharts, etc. Another source is stakeholder interviews. Stakeholders own goals; requirements, though, are expressed by them not in terms of goals but as actions and operations [23]. Goals can be extracted from actions by selecting appropriate "action words". It is the agents/actors that fulfill goals. So during goal modeling, goals are identified, and then during operationalization, agents are allocated to goals. The KAOS method described below, though, does model agents as having wishes, and they participate in the RE process. Two Goal-Oriented Requirements Engineering, GORE, techniques are briefly described below:

KAOS [25] defines a formal specification language for goal specification consisting of objects, operations, agents, and goals. Objects can be entities, relationships, or events. The elicitation process is in two parts. Initially, an initial set of system goals and objects and an initial set of agents and actions are defined. In the second part, refining goals using AND/OR decomposition, identifying obstacles to goals, operationalizing goals into constraints, and refining and formalizing definitions of objects and actions are done iteratively. Goal refinement ends when every subgoal is realizable by some agent.

Goal-Based Requirements Analysis Method, GBRAM, Antón [23] identifies, elaborates, and refines goals as requirements. It deals with two issues: how can goals be identified, and what happens to requirements when goals change?


The first part of the question is answered by Goal Analysis and the second part by Goal Evolution. In the former, goals, stakeholders, actors, and constraints are identified. This gives a preliminary set of goals. Once validated by the stakeholders, this initial set can be refined.
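The AND/OR refinement discussed above lends itself to a simple recursive reading: an AND-refined goal is satisfied only if all its subgoals are, while an OR-refined goal needs at least one satisfied subgoal. The sketch below shows one way to represent and evaluate such a refinement tree; the goal names and satisfaction flags are illustrative assumptions of ours, not drawn from KAOS or GBRAM case material.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Goal:
    name: str
    refinement: Optional[str] = None      # "AND", "OR", or None for a leaf goal
    subgoals: List["Goal"] = field(default_factory=list)
    satisfied_leaf: bool = False          # satisfaction asserted for leaf (operational) goals

    def satisfied(self) -> bool:
        if not self.subgoals:
            return self.satisfied_leaf
        if self.refinement == "AND":
            # AND refinement: every subgoal must hold
            return all(g.satisfied() for g in self.subgoals)
        # OR refinement: one satisfied subgoal suffices
        return any(g.satisfied() for g in self.subgoals)

# Hypothetical refinement tree for a library system
root = Goal("Book issued to member", "AND", [
    Goal("Membership verified", satisfied_leaf=True),
    Goal("Book located", "OR", [
        Goal("Found on shelf", satisfied_leaf=False),
        Goal("Reserved copy retrieved", satisfied_leaf=True),
    ]),
])

print(root.satisfied())  # True: the AND goal holds because the OR subtree is satisfied
```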

It has been observed by Antón and Potts [29] that identifying the goals of the system is not the easiest task. GORE is subjective, dependent on the requirements engineer's view of the real world from which goals are identified [28]. Horkoff and Yu [30] also point out that such models are informal and incomplete and "difficult to precisely define". Horkoff and Yu [31] observe that "goal modeling is not yet widely used in practice" [32] and notice that constructs used in KAOS are not used in practice.

1.5.2 Agent-Oriented Requirements Engineering

Agents have been treated in software engineering as autonomous units that can change state and behavior. They can be humans, machines, or of any other type. Agents have the following properties [15, 22, 33]:

(i) Agents are intentional in that they have properties like goals, beliefs, abilities, etc. associated with them. These goals are local to the agent. It is important to note that there is no global intention that is captured.

(ii) Agents have autonomy. However, they can influence and constrain one another. This means that they are related to each other at the intentional level.

(iii) Agents are in a strategic relationship with each other. They are dependent on each other and are also vulnerable with respect to other agents' behavior.

Agents help in defining the rationale and intentions behind building the system. This enables them to ask and answer the "why" question. Agent-oriented RE focuses on early RE (see Fig. 1.1). The central concept is that goals belong to agents, rather than the GORE concept where agents fulfil goals. Notice that even though it is possible to have goals without agents and agents without goals, goals and agents complement each other. We discuss the i* framework. The i* framework was developed for modeling and reasoning about the organizational environment and its information system. The central concept of i* is that of the "intentional actor". This model has two main concepts, the Strategic Dependency Model (SDM) and the Strategic Rationale Model (SRM). Both early- and late-phase requirements can be captured through this model. The SDM component of the model describes the actors in their organizational environments and captures the intentional dependencies between them. The freedom and the constraints of the actors are shown in terms of different dependencies like goal, task, softgoal, and resource dependencies. The SRM is at a much lower level of abstraction than the SDM. It captures the intentional relationships that are internal


to and inside actors. Intentional properties are modeled as external dependencies, using means-ends relationships as well as task decomposition. The means-ends relationship helps us understand why an actor would engage in some task. This can also assist in the discovery of new softgoals and therefore provide more alternative solutions. During modeling, we can travel from means to ends or vice versa. Task decomposition results in a hierarchy of the intentional elements that are part of a routine. Matulevičius and Heymans [32] notice that constructs used in i* are not used in practice. Further, using ontological studies, they found similarities between i* and KAOS. They concluded that constructs like the i* goal and the soft goal of KAOS, and the means-end link of i* and the contribution relation of KAOS, are conceptually the same.
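An SDM can be read as a set of dependency links, each with a depender, a dependee, a dependum, and a dependency type. The fragment below is a minimal illustrative encoding of that idea only; the actors and dependums are invented for the example and are not taken from the i* literature or from this book.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Dependency:
    depender: str   # actor that depends on another
    dependee: str   # actor being depended upon
    dependum: str   # what is depended upon
    kind: str       # "goal", "task", "resource", or "softgoal"

# Hypothetical Strategic Dependency Model for a course-registration setting
sdm: List[Dependency] = [
    Dependency("Student", "Registrar", "Course registered", "goal"),
    Dependency("Registrar", "Student", "Fee payment", "resource"),
    Dependency("Student", "Registrar", "Registration be quick", "softgoal"),
]

def vulnerabilities(actor: str) -> List[str]:
    # An actor is vulnerable with respect to every dependum it relies on others for
    return [d.dependum for d in sdm if d.depender == actor]

print(vulnerabilities("Student"))  # ['Course registered', 'Registration be quick']
```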

1.5.3 Scenario Orientation

Scenarios have been used for requirements engineering [34], particularly for eliciting, refining, and validating requirements, that is, in the late RE phase. Scenarios have also been used to support goals formulated in the early requirements phase. They show whether the system satisfies (fulfillment) or does not satisfy (non-fulfillment) a goal. In other words, scenarios concretise goals.

Holbrook [35] states that scenarios can be thought of as "stories that illustrate how a perceived system will satisfy a user's needs". This indicates that scenarios describe the system from the viewpoint of the user. They have a temporal component, as seen in the definition given by van Lamsweerde and Willemet [36]: a scenario is "a temporal sequence of interaction events between the software to-be and its environment in the restricted context of achieving some implicit purpose(s)". Scenarios have also been defined with respect to agents. Plihon et al. [37] say that a scenario is "... possible behaviours limited to a subset of purposeful communications taking place among two or several agents".

A meta schema was proposed by Sutcliffe et al. [34] that shows the relationship between goals, scenarios, and agents. A scenario is a single instance of a use case. Use cases are composed of actions that help in the fulfillment of goals. One use case fulfills one goal. A single action involves one or more agents.
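The cardinalities just stated (one use case fulfils one goal, a scenario instantiates a use case, an action involves one or more agents) can be written down directly as types. The following minimal sketch does only that; the seat-booking example it instantiates is our own hypothetical illustration.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Action:
    description: str
    agents: List[str]          # a single action involves one or more agents

@dataclass
class UseCase:
    name: str
    goal: str                  # one use case fulfils exactly one goal
    actions: List[Action] = field(default_factory=list)

@dataclass
class Scenario:
    use_case: UseCase          # a scenario is a single instance (pathway) of a use case
    event_trace: List[str] = field(default_factory=list)

# Hypothetical instantiation for a seat-booking system
uc = UseCase("Book seat", goal="Seat reserved for traveller", actions=[
    Action("Select seat", ["Traveller"]),
    Action("Confirm payment", ["Traveller", "Payment gateway"]),
])
happy_path = Scenario(uc, ["seat selected", "payment confirmed", "ticket issued"])

print(happy_path.use_case.goal)  # Seat reserved for traveller
```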

Several elicitation techniques exist, two of which are described below:

SBRE [35]: There are two worlds, the user's and the designer's world. The goal set is defined in the user's world. It contains information regarding the goals and constraints of the system. The goals are represented as subgoals. The design set is in the designer's world. This set consists of design models that represent the system. The goal set and the design set communicate with each other with the help of scenarios in the scenario set. This set shows how a specific design meets a goal. Scenarios have a one-to-one relationship with the design models. A specific scenario may satisfy many goals. Any issue that may arise is captured in the issue set. A feedback cycle captures the user's response to issues and designs. Scenarios form part of the specification of the required system.


CREWS [34]: This technique is integrated with OO development and employs use cases to model the functionality of the system to-be. Here, a scenario is represented by one pathway through a use case, and one scenario is composed of one or more events. Thus, many scenarios can be generated from one use case. Use cases are elicited from users with the help of formatting guidelines. The use cases are compared with generic requirements, and finally normal and exception flows are modeled. From the former, normal scenarios are generated and, from the latter, exception scenarios. They are validated using validation frames. The technique captures both scenarios that originate from system design and those captured from actual experience.
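As a rough illustration only (the use case and its pathways below are invented, not taken from CREWS), the idea that one use case yields many scenarios, one per pathway, with normal and exception pathways giving normal and exception scenarios, might be sketched as:

# One scenario per pathway through a use case; the pathway kind
# (normal or exception) determines the scenario kind.
use_case = {
    "name": "Withdraw cash",
    "normal":    [["insert card", "enter PIN", "select amount", "dispense cash"]],
    "exception": [["insert card", "enter PIN", "PIN rejected", "retain card"]],
}

scenarios = []
for kind in ("normal", "exception"):
    for pathway in use_case[kind]:
        scenarios.append({"use_case": use_case["name"],
                          "kind": kind,
                          "events": pathway})

for s in scenarios:
    print(s["kind"], "scenario:", " -> ".join(s["events"]))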

1.5.4 Goal Scenario Coupling

Proposals for goal-scenario coupling also exist in the literature [38–41]. The coupling can be unidirectional, from goals to scenarios, or bidirectional. Unidirectional coupling says that goals are realized by scenarios, and this reveals how goals can be achieved. Bidirectional coupling considers going from scenarios to goals in addition to going from goals to scenarios. It says that scenarios can be sources of subgoals of the goal for which the scenario is written.

1.6 Conclusion

The origins of requirements and requirements engineering, RE, lie in information systems/software engineering (SE), with the aim of delivering the needed functionality into the hands of the user. The system development task starts when requirements are elicited from users, collected, and prioritized, and a system specification is made. The area of agile development has grown independently of the subject of requirements engineering. Therefore, the influence of one on the other has been rather limited. In the next chapter, we will consider the impact of transactional requirements engineering on data warehouse requirements engineering. We will also see that, given the complexity of data warehouse systems, agility is of the essence. Therefore, the influence of agile development on data warehouse development shall also be taken up in the next chapter.


References

1. Chaos Report, Standish Group. (1994).
2. Hooks, I. F., & Farry, K. A. (2001). Customer-centered products: Creating successful products through smart requirements management. New York: AMACOM Div American Mgmt Assn.
3. Boehm, B. W., & Papaccio, P. N. (1988). Understanding and controlling software costs. IEEE Transactions on Software Engineering, 14(10), 1462–1477.
4. Standish Group. (2003). Chaos Chronicles Version 3.0. West Yarmouth, MA.
5. Charette, R. N. (2005). Why software fails. IEEE Spectrum, 42(9), 42–49.
6. Wake, W. C. (2003). INVEST in good stories and SMART tasks. Retrieved December 29, 2005, from http://xp123.com/xplor/xp0308/index.shtml.
7. IEEE Standard, IEEE-Std 610. (1990).
8. Robertson, S., & Robertson, J. (2012). Mastering the requirements process: Getting requirements right. MA: Addison-Wesley.
9. Kotonya, G., & Sommerville, I. (1998). Requirements engineering: Processes and techniques. New York: Wiley.
10. Sutcliffe, A. (2002). User-centred requirements engineering. Berlin: Springer.
11. Mylopoulos, J., Chung, L., & Yu, E. (1999). From object-oriented to goal-oriented requirements analysis. Communications of the ACM, 42(1), 31–37.
12. Zave, P. (1997). Classification of research efforts in requirements engineering. ACM Computing Surveys (CSUR), 29(4), 315–321.
13. van Lamsweerde, A. (2000, June). Requirements engineering in the year 00: A research perspective. In Proceedings of the 22nd International Conference on Software Engineering (pp. 5–19). New York: ACM.
14. Nuseibeh, B., & Easterbrook, S. (2000, May). Requirements engineering: A roadmap. In Proceedings of the Conference on the Future of Software Engineering (pp. 35–46). New York: ACM.
15. Yu, E. S. (1997, January). Towards modelling and reasoning support for early-phase requirements engineering. In Proceedings of the Third IEEE International Symposium on Requirements Engineering (pp. 226–235). IEEE.
16. Goguen, J. A., & Linde, C. (1993). Techniques for requirements elicitation. Requirements Engineering, 93, 152–164.
17. Suchman, L., & Jordan, B. (1990). Interactional troubles in face-to-face survey interviews. Journal of the American Statistical Association, 85(409), 232–241.
18. Hickey, A., & Davis, A. (2003). Barriers to transferring requirements elicitation techniques to practice. In 2003 Business Information Systems Conference.
19. Leffingwell, D., & Widrig, D. (2000). Managing software requirements. MA: Addison-Wesley.
20. Davis, G. B. (1982). Strategies for information requirements determination. IBM Systems Journal, 21, 4–30.
21. Burton, A. M., Shadbolt, N. R., Rugg, G., & Hedgecock, A. P. (1990). The efficacy of knowledge elicitation techniques: A comparison across domains and levels of expertise. Journal of Knowledge Acquisition, 2, 167–178.
22. Lapouchnian, A. (2005). Goal-oriented requirements engineering: An overview of the current research. University of Toronto.
23. Antón, A. I. (1996, April). Goal-based requirements analysis. In Proceedings of the Second International Conference on Requirements Engineering (pp. 136–144). IEEE.
24. van Lamsweerde, A. (2004, September). Goal-oriented requirements engineering: A roundtrip from research to practice. In Requirements Engineering Conference, 2004. Proceedings. 12th IEEE International (pp. 4–7). IEEE.


26. Pohl, K. (2010). Requirements engineering: Fundamentals, principles, and techniques. Berlin: Springer.
27. van Lamsweerde, A. (2001). Goal-oriented requirements engineering: A guided tour. In Proceedings of the Fifth IEEE International Symposium on Requirements Engineering, 2001 (pp. 249–262). IEEE.
28. Haumer, P., Pohl, K., & Weidenhaupt, K. (1998). Requirements elicitation and validation with real world scenes. IEEE Transactions on Software Engineering, 24(12), 1036–1054.
29. Antón, A. I., & Potts, C. (1998, April). The use of goals to surface requirements for evolving systems. In Proceedings of the 1998 International Conference on Software Engineering (pp. 157–166). IEEE.
30. Horkoff, J., & Yu, E. (2010). Interactive analysis of agent-goal models in enterprise modeling. International Journal of Information System Modeling and Design (IJISMD), 1(4), 1–23.
31. Horkoff, J., & Yu, E. (2012). Comparison and evaluation of goal-oriented satisfaction analysis techniques. Requirements Engineering Journal, 1–24.
32. Matulevičius, R., & Heymans, P. (2007). Comparing goal modelling languages: An experiment. In Requirements engineering: Foundation for software quality (pp. 18–32). Berlin Heidelberg: Springer.
33. Castro, J., Kolp, M., & Mylopoulos, J. (2002). Towards requirements-driven information systems engineering: The Tropos project. Information Systems, 27(6), 365–389.
34. Sutcliffe, A. G., Maiden, N. A., Minocha, S., & Manuel, D. (1998). Supporting scenario-based requirements engineering. IEEE Transactions on Software Engineering, 24(12), 1072–1088.
35. Holbrook, H., III. (1990). A scenario-based methodology for conducting requirements elicitation. ACM SIGSOFT Software Engineering Notes, 15(1), 95–104.
36. van Lamsweerde, A., & Willemet, L. (1998). Inferring declarative requirements specifications from operational scenarios. IEEE Transactions on Software Engineering, 24(12), 1089–1114.
37. Plihon, V., Ralyte, J., Benjamen, A., Maiden, N. A., Sutcliffe, A., Dubois, E., & Heymans, P. (1998). A reuse-oriented approach for the construction of scenario based methods. In Proceedings of the International Conference on Software Process (pp. 1–16).
38. Liu, L., & Yu, E. (2004). Designing information systems in social context: A goal and scenario modelling approach. Information Systems, 29(2), 187–203.
39. CREWS Team. (1998). The CREWS glossary, CREWS report 98-1. http://SUNSITE.informatik.rwth-aachen.de/CREWS/reports.htm.
40. Pohl, K., & Haumer, P. (1997, June). Modelling contextual information about scenarios. In Proceedings of the Third International Workshop on Requirements Engineering: Foundations of Software Quality REFSQ (Vol. 97, pp. 187–204).
41. Cockburn, A. (1997). Structuring use cases with goals. Journal of Object-Oriented Programming.

Chapter 2

Requirements Engineering for Data Warehousing

Whereas in the last chapter we considered transactional systems, in this chapter we consider data warehouse systems. The former kind of system deals with delivering system functionality into the hands of the user. The latter, however, does not carry out any action. Rather, such systems supply information to their users, who are decision-makers, so that they can take appropriate decisions. These decisions are made after decision-makers carry out suitable analysis of the information retrieved from the data warehouse. In this chapter, we consider the manner in which data warehouses are developed. In Sect. 2.1, we provide a brief background of data warehousing so as to form the basis for the rest of the chapter. In Sect. 2.2, we look at some studies that bring out the problems experienced in developing data warehouses. The systems development life cycle, SDLC, for data warehouses, DWSDLC, is the subject of Sect. 2.3. The methods that can be used to realize this life cycle are presented in Sect. 2.4. These methods are the monolithic, top-down approach, the data mart approach, and the agile approach. The problem of consolidation that arises in the data mart and agile approaches is considered thereafter in Sect. 2.5. Success of a data warehouse project is crucially dependent on its alignment with the business environment in which it is to function. It is important therefore to involve business people in planning and laying out a roadmap for data warehouse roll-out. We identify, in Sect. 2.6, the several factors that go into making good alignment and highlight the role of requirements engineering. Thereafter, in Sect. 2.7 we present a survey of data warehouse requirements engineering techniques.

2.1 Data Warehouse Background

There are two perspectives to data warehousing, the organizational and the technological. From the organizational standpoint, data warehouse technology is for providing service to the organization: it provides Business Intelligence, BI.



The Data Warehouse Institute considers BI in three parts, namely, data warehousing, tools for business analytics, and knowledge management. The value of BI [1] is realized as profitable business action. This means that BI is of little value if knowledge that can be used for profitable action is ignored. Conversely, if discovered knowledge is not realized into a value-producing action, then it is of little value. Thus, managers should be able to obtain the specific information that helps in making the optimal decision so that specific actions can be taken. It follows that business intelligence [1] incorporates the tools, methods, and processes needed to transform data into actionable knowledge.

Turning now to the technological point of view, the classical definition of a data warehouse was provided by Inmon, according to which a data warehouse is a subject-oriented, integrated, time-variant, non-volatile collection of data for supporting management's decisional needs. Another view is to look upon a data warehouse as the data, processes, tools, and facilities to manage and deliver complete, timely, accurate, and understandable business information to authorized individuals for effective decision-making. According to this view, a data warehouse is not just a storehouse of data but is an environment or infrastructure for decision-making.

The central difference between a data warehouse and a database lies in what they aim to deliver. The former supports Online Analytical Processing, OLAP, whereas the latter is for Online Transaction Processing, OLTP. A database contains in it data of all transactions that were performed during business operations. Thus, for example, data of every order received is available in the database. If modification of the order occurred, then the modified data is available. In this sense, a database is an image, a snapshot, of the state of the business at a given moment, T. Though it is possible that data in databases is specifically timestamped to keep historical data, normally databases do not maintain historical data but reflect data at the current time T only.

In contrast, the purpose of the data warehouse is to provide information to facilitate making a business decision. Interest is in analyzing the state of the business at time t′ (this may include current data at t′ as well as historical data) so as to determine what went wrong and needs correction, what to promote, what to optimize and, in general, to decide how to make the business perform better. The state of the business lies in the collection of data sources of the business, the several databases, files, spreadsheets, documents, emails, etc. at time t′. Therefore, a data warehouse at t′ is a collection of all this information. Unlike a database that is updated each time a new transaction occurs, a data warehouse is refreshed at well-defined refresh times. Thus, the data warehouse does not necessarily contain the current state of the business, that is, t′ may be older than the current time t. In other words, it is possible that data in the data warehouse may be older than that currently contained in the individual data sources of the business. For purposes of decision-making, this is tolerable so long as business events between t′ and t do not make decisions based on data at t′ irrelevant.

This difference between a database and a data warehouse makes the usual create, update, and delete operations of a database largely irrelevant to a data warehouse.


The traditional relational read is also found to be very restrictive, and a more versatile way of querying the data warehouse is needed. In other words, we need a different model of data than the database model. This OLAP model enables data to be viewed and operated upon so as to promote analysis of business data.

A data warehouse provides a multidimensional view of data. Data is viewed in terms of facts and dimensions, a fact being the basic data that is to be analyzed, whereas dimensions are the various parameters along which facts are analyzed. Both facts and dimensions have their own attributes. Thus, sales data expressed as the number of units sold or in revenue terms (rupees, dollars) is basic sales data that can be analyzed by location, customer profile, and time. These latter are the dimensions. The n dimensions provide an n-dimensional space in which facts are placed. A three-dimensional fact is thus represented as a cube (see Fig. 2.1). The X-, Y-, and Z-axes of the cube represent the three dimensions, and the cells of the cube contain facts. For our example, the three axes correspond to location, customer profile, and time, respectively. Each cell in the cube contains sales data, i.e., units sold or revenue. Of course, facts may have more than three dimensions. These form hypercubes, but often in data warehouse terminology the words cubes and hypercubes are used interchangeably.
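As an illustration (the locations, customer profiles, and figures below are invented), such a three-dimensional sales fact can be pictured in Python as cells addressed by coordinates drawn from the three dimensions:

# Each key is a point in the 3-dimensional space formed by the dimensions
# location, customer profile, and time (month); the value holds the fact
# attributes (measures) stored in that cell.
sales_cube = {
    ("Delhi",  "retail",    "2017-01"): {"units_sold": 120, "revenue": 60000},
    ("Delhi",  "corporate", "2017-01"): {"units_sold": 45,  "revenue": 31500},
    ("Jaipur", "retail",    "2017-02"): {"units_sold": 80,  "revenue": 40000},
}

for coordinates, measures in sales_cube.items():
    print(coordinates, "->", measures)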

It is possible for attributes of dimensions to be organized in a hierarchy. For example, the attributes month, quarter, half year, and year of the dimension time form a hierarchy. Monthly facts can be aggregated into quarterly, half-yearly, and yearly facts, respectively. Such aggregations may be computed on the fly or, once computed, may be physically materialized. In the reverse direction, one can obtain finer grained information by moving from yearly facts to half-yearly, quarterly, and monthly facts, respectively. The multidimensional structure comes with its own operations for Online Analytical Processing, OLAP. These operations are as follows (a small sketch after the list illustrates two of them):

Fig. 2.1 A three-dimensional fact


Roll up: aggregating by moving up the hierarchy as described above.

Drill down: the inverse of roll up. As described above, this operation is for obtaining finer data from aggregated/rolled-up data.

Pivot or Rotate: in order to visualize the faces of a cube, the cube is rotated in space.

Slice: generates a cube that has one fewer dimension than the original cube. A single value is chosen for one dimension, thus creating (for a three-dimensional cube) a rectangular, two-dimensional subset of the original cube.

Dice: here, specific values across multiple dimensions are picked, yielding a sub-cube.

Drill across: this operation allows obtaining data from two different cubes that have common (conformed) dimensions.
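The following small sketch, using a toy cube keyed by (location, customer profile, month) with invented figures, illustrates two of these operations: a roll up of monthly sales to quarterly sales along the time hierarchy, and a slice that fixes the location dimension to a single value.

from collections import defaultdict

sales_cube = {
    ("Delhi",  "retail", "2017-01"): 120,
    ("Delhi",  "retail", "2017-02"): 135,
    ("Delhi",  "retail", "2017-04"): 90,
    ("Jaipur", "retail", "2017-01"): 80,
}

def quarter(month):                      # month -> quarter in the time hierarchy
    year, m = month.split("-")
    return f"{year}-Q{(int(m) - 1) // 3 + 1}"

# Roll up: aggregate along the time hierarchy (month -> quarter).
rolled_up = defaultdict(int)
for (loc, profile, month), units in sales_cube.items():
    rolled_up[(loc, profile, quarter(month))] += units

# Slice: choose a single value for one dimension (location = "Delhi"),
# yielding a cube with one fewer dimension.
delhi_slice = {(profile, month): units
               for (loc, profile, month), units in sales_cube.items()
               if loc == "Delhi"}

print(dict(rolled_up))
print(delhi_slice)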

Designers of multidimensional structures need knowledge of the information that is of interest in the organization. This information is then converted into facts and dimensions that are expressed in the model adopted by the data warehouse package (Cognos, SQL Server, Hyperion, etc.) used for implementation. Finally, the physical structure of the data warehouse, the partitions, etc. need to be defined. Broadly, we have three steps: conceptual design for developing a conceptual schema of the needed information, logical design for obtaining and representing facts and dimensions, and physical design for defining the physical properties of the data warehouse. These three phases are similar to the corresponding phases in databases: conceptual design refers to a semantic abstraction that results in, for example, an entity-relationship schema of the database, logical design refers to representation as relations, and physical design to the indices, buffers, etc. required for an optimal physical layout of the database.

A major difference between data warehouse development and database development is due to the need in the former to draw data from disparate data sources. This is done in the Extraction, Transformation and Loading, ETL, step, where data is taken from the different sources, standardized, any inconsistencies removed, and thereafter the data is brought into multidimensional form. This cleaned-up data is then loaded into the data warehouse. Though data preparation, entry, and initially populating the data are important in databases, the task is nowhere near as complex as the ETL process is. As we will see, the presence of the ETL step has an important bearing on data warehouse project management.
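A highly simplified sketch of the ETL idea is given below; the sources, attribute names, and transformations are invented for illustration and hide the real complexity of standardizing and reconciling disparate sources.

# Extract rows from two toy sources, transform them to a standard form,
# and load them into a fact table keyed by dimension values.
def extract():
    # In practice these would come from databases, files, spreadsheets, etc.
    source_a = [{"city": "delhi", "sold": "120", "month": "Jan-2017"}]
    source_b = [{"location": "Jaipur", "units": 80, "period": "2017-01"}]
    return source_a, source_b

def transform(source_a, source_b):
    month_names = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
                   "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
    rows = []
    for r in source_a:                   # standardize names, types, and formats
        mon, year = r["month"].split("-")
        month = f"{year}-{month_names.index(mon) + 1:02d}"
        rows.append({"location": r["city"].title(), "month": month,
                     "units_sold": int(r["sold"])})
    for r in source_b:                   # already close to the target form
        rows.append({"location": r["location"], "month": r["period"],
                     "units_sold": r["units"]})
    return rows

def load(rows, fact_table):
    for r in rows:                       # place each cleaned fact in the cube
        fact_table[(r["location"], r["month"])] = r["units_sold"]

fact_table = {}
load(transform(*extract()), fact_table)
print(fact_table)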

2.2 Data Warehouse Development Experience

Data warehouses have been developed for the last 25 years or so. The broad learning is that data warehouse projects are complex, time-consuming, and expensive. They are also risky and have a high propensity to fail. Often, they do not meet expectations and have a poor record of delivering the promised products. As a result, there are issues concerning data warehouse project management as well as around DWSDLC.


A number of studies have found that data warehouse projects are expensive in financial terms as well as in terms of the effort required to deliver them. In [2], we find a number of specific indicators:

One company hired the services of a well-qualified systems integrator. It cost two million USD, and after 3 years' time they got a single report but not a working system.

Having bought a ready-to-use data model, another company found that it needed 2 years to customize the model to their needs and 3 years to populate it with their data. Only after that would they get their first outputs.

Yet another company required 150 people for over 3 years, and the project got so expensive as to hurt their share price.

Aside from being expensive, data warehouse projects are risky. Ericson [3] cites a survey showing that data warehouse projects whose cost averaged above $12M failed 65% of the time. Hayen et al. [4] refer to studies that indicate the typical cost of a data warehouse project to be one million dollars in the very first year and that one-half to two-thirds of most data warehouse projects fail. Loshin [1] points out that data warehouse projects generate high expectations but bring many disappointments. This is due to failure in the way that DW projects are taken from conception through to implementation. This is corroborated by [2], who concludes that the lessons learnt from data warehouse projects cover the entire development life cycle:

Requirements: The business had little idea of what they wanted because they had never experienced a data warehouse. Further, requirements gathering should have been done better.

Design: Design took a very long time, and conflicting definitions of data made designs worthless.

Coding: It went slowly, testing came under pressure, and crucial defects were not caught.

The implications of the foregoing were deviations from initial cost estimates, reduced number and value of the delivered features, and long delivery times. The conclusion of Alshboul [5] is that one of the causes of data warehouse project failure is inadequate determination of the relationship of the DW with strategic business requirements.

From the foregoing, we observe in Table 2.1 that there are three main causes of data warehouse project failure. The first is inadequate DW-business alignment. It is necessary to ensure that the data warehouse brings value to the organization. This is possible if the data warehouse delivers information relevant to making business decisions. The second issue is that of requirements gathering. In its early years, data warehouse requirements engineering was de-emphasized. Indeed, requirements were the last thing to be discovered [6]. However, considerable effort has been put in over the last 15 years or so to systematize requirements engineering.


Table 2.1 Failure and its mitigation strategies

Failure cause                                   | Mitigating failure
Inadequate alignment of DW with business needs  | Relate information to business decisional needs
Requirements gathering                          | Systematize and broad-base the requirements gathering process
Project delivery                                | Reduce long delivery time

Starting from initial, ad hoc techniques, we have seen the emergence of model-driven requirements engineering.

Finally, we have the issue of project delivery and long delivery times. We need to move beyond the waterfall model. Stage-wise delivery does not allow a downstream DWSDLC activity to be completed till upstream activities of the development life cycle are completed. In large data warehouse projects, this leads to unacceptable delays in product delivery. We need a development method that produces a steady stream of product deliverables as early as possible from the time the project starts.

2.3 Data Warehouse Systems Development Life Cycle, DWSDLC

The activities that are to be performed in order to develop a data warehouse are laid out in the DWSDLC. The manner in which these activities are performed is defined in process models. For example, in the waterfall model, the activities comprising the SDLC are performed in a linear manner: an activity is performed when the previous one is completed. It is possible to do development in an iterative and incremental manner, in which case this strict linear ordering is not followed.

It is interesting to see the evolution of the DWSDLC over the years. In its early years, data warehouse development was largely concerned with implementation issues. Therefore, upstream activities of conceptual design and requirements engineering were de-emphasized. Data warehouse development started from analysis of data in existing databases. The nature of this analysis was largely experience based, and data warehouse developers used their expertise to determine the needed facts and dimensions. This was viewed in [7] in terms of the requirements, conceptual design, and construction stages of the DWSDLC. In the requirements stage, facts and preliminary workload information are obtained starting from a database schema. Subsequently, in conceptual design, the database schema, workload, and facts are all used to obtain the dimensional schema.

Notice that there is no real conceptualization of the data warehouse, as no conceptual schema is built. The process of identifying dimensions is not apparent and seems to rely on designer insight and understanding of the information requirements of the decision-maker.


The nature of the requirements engineering activity is also rather ad hoc. Thus, the manner in which facts and workload are obtained, and the underlying modeling and systematization of this process, is not articulated. Little effort is made to go beyond the existing database schema, and no real enquiry to identify any other needed information is launched. Finally, there is no assurance that the resulting data warehouse does, indeed, address decision-making needs. This is because not much effort is put into interacting with decision-makers to determine their information needs.

Conceptual Design Stage

The next step in SDLC evolution was the introduction of a conceptual design stage. The argument was that conceptual schemas like the ER schema could be good starting points since they captured the entire information to be kept in the data warehouse. This could then be analyzed together with domain experts to determine interesting measures, dimensions, and initial OLAP queries. Thereafter, a multidimensional schema is built that is taken into the construction phase.

The approach of Hüsemann et al. [8] produces measures and dimensional attributes of the data warehouse from ER schemas of databases. The E/R schema is assumed to be available to the data warehouse designer. If not, then it can either be reverse engineered by using various tools/algorithms or built afresh. The domain expert picks up the relevant operational attributes needed for multidimensional analysis and specifies the purpose of using them in a tabular form. The tabulation of the dimensional attributes obtained is then converted into a dimensional schema. This schema, in graphical form, comprises fact schemas along with their dimension hierarchies and fact attributes or measures. There are two main drawbacks with this technique. First, there is no defined method that helps the designer in identifying the facts, dimensions, and measures. Similarly, determining aggregates requires designers to be experienced and skilled. The absence of guidance in these tasks means that the process is essentially an art.

The second method [9], also based on the ER diagram, derives data warehouse structures from ER schemas. As before, if an ER schema does not exist, then it is either to be developed afresh or reverse engineered from existing databases.

The proposal is that entities are classified as transaction entities, component entities, or classification entities. Transaction entities form fact tables. Component entities and classification entities form dimension tables and answer the "who", "what", "when", "where", "how", and "why" of business events. All hierarchies that exist in the data model are identified. In the final stage, dimensional models are produced from the identified entities. This second method also offers no guidance in selecting which transaction entities are important for decision-making and therefore become facts. A precedence hierarchy for resolving ambiguities that arise while classifying entities has been defined. But again, no guidance in terms of an algorithm or method has been provided.
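The core idea of this classification-based derivation can be sketched as below; the entities and their classification are invented, and the classification itself is supplied by the designer rather than computed, which is precisely the guidance gap noted above.

# Transaction entities become fact tables; component and classification
# entities become dimension tables.
er_entities = {
    "Order":        "transaction",
    "Customer":     "component",
    "Product":      "component",
    "ProductGroup": "classification",
    "Region":       "classification",
}

fact_tables = [e for e, kind in er_entities.items() if kind == "transaction"]
dimension_tables = [e for e, kind in er_entities.items()
                    if kind in ("component", "classification")]

print("Facts:", fact_tables)
print("Dimensions:", dimension_tables)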


Golfarelli et al. [10] developed a process that can be used for converting an ER diagram into multidimensional form. This is partially automated and requires developers to bring additional knowledge to decide on the final multidimensional structure.

Moving to the conceptual design stage did present one major advantage over database schema-based approaches. This move was based on the argument that the process of discovery of data warehouse concepts should be rooted in an analysis of the conceptual schema. Thus, it provided a foundation for obtaining facts, dimensions, etc. It contributed to somewhat de-mystifying this process. Conceptual schema/ER-driven techniques have nevertheless been criticized on several grounds:

Limited data: If reverse engineered from operational databases, then the information carried by ER schemas is limited to that in the database schema. It is difficult to take into account other sources, both external sources and other internal ones [7].

Modeling deficiencies: ER schemas are not designed to model historical information or aggregate information, both of which are so important in data warehousing.

Ignoring the user: ER-based techniques do not give primary importance to the users' perspective [10, 11]. As a result, the DW designer ends up deciding on the relevance of data, but this decision should be taken by the user and not by designers.

The Requirements Engineering Stage

The introduction of the requirements engineering stage in the DWSDLC addressed the concerns raised about conceptual schema-driven techniques. A clear effort was made to take into account the needs of stakeholders in the data warehouse to-be. The definition of multidimensional structures was based on an understanding of business goals, business services, and business processes. This understanding was gained by interaction with decision-makers. Thus, the context of the data warehouse was explored, and the requirements of the data warehouse to-be were seen to originate in the business. Since ab initio investigation into decision-maker needs is carried out in the requirements engineering stage, existing data sources and/or their conceptualization in conceptual schemas did not impose any limitations. Rather, the determined requirements could use data from existing data sources or come up with completely new data not available in these sources.

With the introduction of the requirements engineering stage in the DWSDLC, there is today no difference between the stages of the TSDLC of transactional systems and the stages of the DWSDLC. However, the tasks carried out in these stages are different. This is brought out in Table 2.2. It is important to notice that the problem of data warehouse requirements engineering, DWRE, is that of determining the information that shall be contained in the data warehouse to-be. On the other hand, requirements engineering for transactional systems, TRE, aims to identify the needed functionality of the transactional system to-be.


Table 2.2 SDLCs of transactional and data warehouse systems

SDLC stage               | Transactional systems                                                      | Data warehouse systems
Requirements engineering | Defining system functionality                                              | Defining needed information
Conceptual design        | Building the conceptual schema: structural, behavioral, functional models | Building structural schema
Construction             | Implementing the functions                                                 | Implementing the multidimensional structure

This functionality is eventually to be implemented in the construction stage of the TSDLC, for which a suitable design must be produced in the conceptual design stage. This design is expressed in structural, behavioral, and functional models, typically in UML notation. In contrast, in the construction stage of the DWSDLC, the determined information must be collected from disparate sources, standardized and made consistent, and integrated together to populate the multidimensional schema. For this, a multidimensional structure must be obtained. It is traditional to focus on this structural aspect, and consequently, expression in behavioral and functional models is de-emphasized.

There are two ways in which the DWSDLC supports determination of multidimensional structures. The first (see Fig. 2.2a) is to consider the requirements engineering stage as producing not only the required information but also the facts and dimensions. In this way, there is no need for the conceptual schema in the form of an ER schema to be built. Instead, a direct translation to facts/dimensions is done. The second way (shown in Fig. 2.2b) is to build the conceptual schema and then use any of the methods for obtaining facts/dimensions proposed in the ER-driven approaches outlined earlier. We will see requirements engineering techniques of these two kinds in the next section. The possibility of skipping over the conceptual design stage is a major variation between the TSDLC and the DWSDLC.


Fig. 2.2 a Bypassing conceptual stage, b using all stages


2.4 Methods for Data Warehouse Development

Methods for developing data warehouses need to go through the stages of the DWSDLC. There are two possibilities: to traverse the DWSDLC breadth first or depth first. The breadth-first approach calls for the three stages in the DWSDLC to be done sequentially, construction after conceptual design after requirements engineering. The depth-first approach breaks down the task of data warehouse development into small pieces or vertical slices, and the DWSDLC is followed for each slice produced.

2.4.1 Monolithic Versus Bus Architecture

Breadth-first traversal can be done based on two different assumptions. The first assumption is that the deliverable is the complete data warehouse. Hence, requirements of the entire data warehouse must be identified; the multidimensional model for the enterprise must be designed and then taken into implementation. Thereafter, specific subject-oriented data marts are defined so as to make appropriate subsets of the data warehouse available to specific users. This is shown in Fig. 2.3, where sales, purchase, and production data marts are built on top of the enterprise-wide data warehouse. Defining data marts in this manner is analogous to the construction of subschemas on the schema of a database. The main idea in both is to provide a limited view of the totality, limited to that which is relevant to specific users.

This approach of constructing the monolithic data warehouse follows the waterfall model; each stage of the DWSDLC must be completed before moving to the next stage. This implies that the lead time in delivering the project is very high. There is a danger that the requirements might change even as work is in progress.


Fig. 2.3 Monolithic development



Fig. 2.4 The bus architecture

In short, monolithic development is prone to all the problems associated with the waterfall model of development. However, the likely benefit from the waterfall model is that it could produce a long-lasting and reliable data architecture.

A different process model results if the assumption of delivering the full data warehouse is relaxed. Rather than building the entire monolithic data warehouse, this approach calls for first building data marts and then integrating them by putting them on a common bus. This bus consists of conformed dimensions, dimensions that are common across data marts and therefore allow the drill across operation to be performed. Consequently, data held in different data marts can be retrieved. This approach is shown in Fig. 2.4. Data marts are built independently of one another. Since the size of a data mart is smaller than the entire data warehouse, the lead time for release is lower. Therefore, business value can be provided even with the release of the first data mart. Freshly built data marts can then be added onto the bus. Thus, the data warehouse consists of a number of integrated, self-contained data marts rather than a big centralized data warehouse. Evidently, the bus approach promotes iterative and incremental development, and no complete plan is required upfront. The risks are that data marts may contain missing or incompatible measures, and dimensions may contain replicated data and display inconsistent results.

The success of the bus architecture is crucially dependent on conforming facts and dimensions. Thus, if one data mart contains product information as the number of cases shipped and another keeps product information as units sold, then moving across these data marts yields incompatible information. Such facts must be conformed, by keeping, along with shipping data, unit data as well. This allows units shipped to be compared with units sold. Dimensions need to be conformed too. If one data mart has attributes day, quarter, and year for the dimension time and another has day, month, and half year, then drill across becomes difficult. The dimension attributes must be made to conform, and the lowest granularity attribute must be kept in both the dimensions. The product information must also be available on a daily basis in our example.

There are two possibilities for conforming dimensions. The first is to do it on the fly, as each new data mart is added onto the bus. This may involve reworking existing data marts to make them conform.


This can be adopted so long as the effort to bring about conformity is within limits and does not offset the benefits of early release. When this boundary is crossed, attention must be paid to designing for conformity. This means that the bus of conformed dimensions must be determined either all upfront, in the waterfall-model style, or enough investigation should be carried out to get a modicum of assurance that the bus is well defined. The trade-off between the two is apparent: delayed release versus the risk of rework.
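The following toy sketch (data marts and figures invented) illustrates why conformance matters: because both marts keep product movement at the same daily grain against shared product and date dimensions, their facts can be combined by a drill across.

# Two toy data marts sharing conformed product and date dimensions.
shipments = {("P1", "2017-01-05"): 10,   # units shipped, by product and date
             ("P2", "2017-01-05"): 7}
sales     = {("P1", "2017-01-05"): 9,    # units sold, by product and date
             ("P2", "2017-01-05"): 7}

# Drill across: combine facts from both marts over the shared dimensions.
drill_across = {key: {"shipped": shipments.get(key, 0), "sold": sales.get(key, 0)}
                for key in set(shipments) | set(sales)}
print(drill_across)

# Had one mart kept only monthly totals while the other kept daily data,
# the keys would not match and the combination would silently lose facts;
# conforming the dimensions to the lowest common granularity avoids that.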

2.4.2 Data Warehouse Agile Methods

Iterative and incremental development that forms the basis for the bus architecture is at the core of agile methods. Indeed, agility has been extended to data warehouse development as well. However, agile methods for data warehousing, which we refer to as DW-agile methods, differ from agile methods for transactional systems or T-agile methods. We bring these out by considering two DW-agile methods.

Using Scrum and User Stories

In Hughes [2], we see the adoption in DW-agile methods of the notions of sprints and user stories from T-agile methods. Recall that user stories are not complete requirements specifications but identify the needs, with the details left to be discovered as the sprint progresses. Defining user stories is an art, and defining good stories requires experienced story writers. Story writing follows the epic-theme-story trajectory, and the INVEST test is applied to check whether a story is appropriately defined or not. Over the years, in order for teams to better author user stories, agile practitioners have devised a number of strategies and tools. Since a user story aims to answer the "who," "what," and "why" of a product, a more detailed examination of these components is suggested. Strategies like user role modeling, vision boxes, and product boards have also been devised. Finally, Hughes also introduced the T-agile roles of product owner, Scrum master, and the development team in DW-agile methods.

The point of real departure in DW-agile methods is reached when defining sprints for doing data integration. During this stage, data from disparate sources is (a) brought together in a staging area, (b) integrated, and (c) converted into dimensional form, and then (d) dashboards are built. This is to be done for each fact and dimension comprising the multidimensional schema. Thus, we get four sprints, one each for (a) to (d): one sprint that does (a) for all facts and dimensions of the schema, another that does (b), and so on, for all four stages. If we now ask the question, what is the value delivered by each sprint and to whom, then we do not get a straightforward answer. Indeed, no business value is delivered at the end of the sprints for (a) to (c) to any stakeholder. The only role aware that progress in the task of delivering the data warehouse is being made is the product owner, but this role is not the end user.

The Data Warehouse Business Intelligence, DWBI, reference data architecture shown in Fig. 2.5 makes the foregoing clearer.


This architecture separates DWBI data and processes into two layers, back end and front end. Within the back-end part, we have sub-layers for staging, integration, and the part of the presentation sub-layer relevant to integration, whereas the front-end layer comprises the presentation sub-layer interfaces to the semantic sub-layer, the semantic sub-layer itself, as well as the dashboard sub-layer. Delivering a dashboard requires the preceding four layers to be delivered and can be likened to the delivery of four applications rolled into one. Delivering such a large application is unlikely to be done in a single sprint of a few weeks' duration and needs to be broken down into sub-deliverables.

To deal with this, Hughes introduces the idea of developer stories. A developer story is linked to a user story and is expressed in a single sentence in the who-what-why form of user stories. However, these stories have the product owner as the end user and are defined by the development team. A developer story provides value to the product owner and is a step in delivering business value to the stakeholder. It defines a sprint.

Fig. 2.5 The DWBI reference data architecture


Developer stories must pass the DILBERT'S test: they should be demonstrable, independent, layered, business valued, estimable, refinable, testable, and small. Large developer stories are made smaller by decomposing them. The difference between user stories and developer stories lies in the Demonstrable, D, and Layered, L, elements of the DILBERT'S test. The other elements of the DILBERT'S test can be mapped to elements of the INVEST test for user stories. Let us consider the meanings of these two elements:

1. Demonstrable indicates that at the end of a sprint for the developer story, the development team should demonstrate the result achieved to the product owner. There are several benefits of this demonstration. For example, involvement of the product owner could enable a check on the quality of the data in source systems and an assessment of whether the staged data is adequate for business decision-making. Similarly, transformations and dashboards could be checked out. Equally importantly, demonstrability provides assurance to the product owner that the development team is making progress. If required, the product owner could even carry out some operations relevant to the business to convince end users about this progress and also obtain feedback.

2. Layered: Each developer story must show progress in only one layer of the DWBI reference data architecture. This promotes independence of developer stories (the I in DILBERT'S).

Introduction of developer stories requires a number of additional roles, other than the three roles of product owner, Scrum master, and development team. These are as follows:

Project architect: This role is for conceptualizing the application and communicating it to both business stakeholders and technical people. The job involves relating source data to target data in the presentation layer and formulating the major functions of the dashboards.

Data architect: Ensures that the semantics of the data are clear, manages the data models of the various layers, implements normalization, etc.

Systems analyst: Starting from user stories, determines the transformations of source data required to meet business needs. In doing so, the systems analyst will need to look at developer stories to determine the transformations across the multiple layers of the DWBI reference architecture. This role may also need to work with the data architect to define any integrity constraints that must be satisfied by the data before it is accepted into the next layer.

Systems tester: Ascertains whether the build is correct and complete. This is normally done at the end of each day, at the end of each iteration, and when a release is issued.


In the DW-agile method considered above, the issue of how the conformed bus is built is not addressed. Presumably, it is to be built on the fly, since no provision has been made in the method to build the bus.

Using Agile Data Modeling: Data Stories

Yet another approach to developing data warehouses in an agile manner is that of Business Event Analysis and Modeling, BEAM* [12]. This method is based on Agile Data Modeling. The argument behind using Agile Data Modeling is that techniques like Scrum and user stories will improve BI application development, but only once the data warehouse is already in position. However, not much guidance is available in such techniques for developing the data warehouse per se. Therefore, we must move towards building the dimensional models in an agile manner. This is where the role of Agile Data Modeling lies.

Agile Data Modeling [13] is for exploring data-oriented structures. It provides for incremental, iterative, and collaborative data modeling. Incremental data modeling refers to the availability of more requirements when they are better understood or become clear to the stakeholder. The additional requirements are obtained on the fly when the developer needs them for completing the implementation task at hand. Iterative data modeling emphasizes reworking to improve existing work. As requirements become better understood and as the need for changing data schemas is felt, rework such as correcting errors and including missing information just discovered, referred to as refactoring in the data warehouse community, is carried out. Collaborative data modeling calls for close interaction between the developers and stakeholders in obtaining and modeling data requirements. Thus, it moves beyond merely eliciting and documenting data requirements with stakeholder participation and also includes stakeholder participation in the modeling of data.

BEAM* uses the notion of data stories that are told by stakeholders to capture data about the business events that comprise business processes. These data stories are answers to seven types of questions about events, and each answer provides a fact or dimension of the multidimensional schema. These questions, called the 7W, are (1) Who is involved in the event? (2) What did they do? To what was it done? (3) When did it happen? (4) Where did it take place? (5) Why did it happen? (6) How did it happen, in what manner? (7) How many or how much was recorded, how can it be measured? Out of these, the first six supply dimensions, whereas the last one supplies facts. As an example, the event "order delivered" can have three who-type dimensions, namely, Customer, Product, and Carrier, and two when-type dimensions, order date and shipment date.
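As an illustration, the 7W answers for the "order delivered" event might be recorded as below; the where, why, and how entries are invented for the example and are not taken from BEAM*.

# Answers to the first six question types supply dimensions; the
# "how many/how much" answer supplies the facts (measures).
order_delivered_7w = {
    "who":      ["Customer", "Product", "Carrier"],
    "what":     ["Order"],
    "when":     ["Order date", "Shipment date"],
    "where":    ["Delivery location"],          # assumed for illustration
    "why":      ["Promotion"],                  # assumed for illustration
    "how":      ["Delivery mode"],              # assumed for illustration
    "how many": ["Quantity delivered", "Delivery charge"],
}

dimensions = [d for q in ("who", "what", "when", "where", "why", "how")
              for d in order_delivered_7w[q]]
facts = order_delivered_7w["how many"]
print("Dimensions:", dimensions)
print("Facts:", facts)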

Even as facts and dimensions are being discovered, stakeholder-developer interaction attempts to make them conform. The key issue is ensuring that identification of conformed facts and dimensions is done in an agile manner. To do this, an event matrix is built. This matrix has business events as rows and dimensions as columns. There is a special column in the matrix, labeled Importance, that contains a number showing the importance of the event.


Table 2.3 The event matrix

           | Importance | Dimension 1 | Dimension 2 | Dimension i | Dimension n
Importance |            | 610         | 640         | 650         | 430
Event 1    | 700        |             |             |             |
Event 2    | 600        |             |             |             |
Event j    | 500        |             |             |             |
Event m    | 200        |             |             |             |

Associated with each row is an indication of the importance of the event of that row and, similarly, a row labeled Importance contains the importance of each dimension (Table 2.3).

When associating dimensions with events, the product owner initiates discussions to make sure that there is agreement on the meaning of dimensions across the different events to which these are applicable. As a result, conformed dimensions are entered into the event matrix. It is possible to follow the waterfall model and build the event matrix for all events in all processes in the organization. However, agility is obtained when just enough events have been identified so as to enable defining the next sprint. Further, a prioritization of the backlog on the basis of the Importance value is done. Thus, the event matrix is the backlog.

Events have event stories associated with them. Since conformed dimensions have already been identified, it is expected that event stories will be written using these. The expression of an event is as a table whose attributes are (a) specific to the event and (b) the conformed dimensions already obtained from the event matrix. The table is filled in with event stories; each event story is a row of the event table. An event table is filled in with several event stories so as to ensure that all stakeholders agree on the meaning of each attribute in the event table. If there is no agreement, then attributes that are homonyms have been discovered, and separate attributes for each meaning must be defined. Reports that are desired by stakeholders are captured in report stories. These stories are taken further to do data profiling and then on to development in a sprint. This is the BI application aspect of data warehouse development.
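A small sketch of how an event matrix might serve as a prioritized backlog is given below; the events, dimensions, and Importance values are invented for illustration.

# Rows are business events, each with its Importance and the (conformed)
# dimensions applicable to it; sorting by Importance gives the backlog order.
event_matrix = {
    "Order placed":     {"importance": 700, "dimensions": ["Customer", "Product", "Date"]},
    "Order delivered":  {"importance": 600, "dimensions": ["Customer", "Product", "Carrier", "Date"]},
    "Payment received": {"importance": 500, "dimensions": ["Customer", "Date"]},
}

backlog = sorted(event_matrix.items(),
                 key=lambda item: item[1]["importance"], reverse=True)
for event, row in backlog:          # the top entry feeds the next sprint
    print(event, row["importance"], row["dimensions"])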

2.5 Data Mart Consolidation

As seen in the previous section, there are essentially two kinds of data marts as follows:

Dependent data marts: These are built from an already operational data warehouse, and so data for the data mart is extracted directly from the latter. Therefore, such data marts have data which has already been integrated as part of developing the data warehouse. Data in a dependent data mart will also, quite naturally, be consistent with the data in the enterprise data warehouse.


The enterprise data warehouse represents the single version of the truth, and these data marts comply with this.

Independent data marts: Developed independently of the enterprise data warehouse, these are populated with data often taken directly from an application, an OLTP database, or operational data sources. Consequently, the data is not integrated and is likely to be inconsistent with the data warehouse. Independent data marts are built by several different teams using the technologies preferred by these teams. Therefore, there is a proliferation of tools, software, hardware, and processes. Clearly, the foregoing happens if conformity across data marts is handled on the fly. Notice, however, that it happens even if consolidation is designed for, as in BEAM*, because post-design, data marts are developed independently and independent teams work on the several data marts.

As already discussed, building independent data marts results in early delivery. This mitigates two pressures that development teams are under: (a) meet information needs early and (b) show that the financial investment made is providing returns. As a result, there is great momentum behind building data marts as and when needed, with minimal concern for the enterprise data warehouse. Since departments gain early benefit from this, data mart proliferation has come to be widely accepted. Further, since data marts are developed taking only departmental requirements into account, they facilitate departmental control and better response times to queries. However, the downside of this is that independent data marts lead to the creation of departmental data silos [14–16]. That is, the data needs of individual departments are satisfied, but the data is not integrated across all the departments. This leads to data having inconsistent definitions, inconsistent collection and update times, and difficult sharing and integration. Data mart proliferation raises a number of issues as follows:

A large number of data marts implies increased hardware and software costs as well as higher support and maintenance costs.

Each data mart has its own ETL process, and so there are several such processes in a business.

The same data existing in a large number of data marts leads to redundancy and inconsistency between data.

There is no common data model. Multiple data definitions, differing update cycles, and differing data sources abound. This leads to inconsistent/inaccurate reports and analyses.

Due to the lack of consistency between similar data, it could happen that decision-making is inaccurate or inconsistent.

Data mart proliferation can be a drain on company resources. Industry surveys [14] show that 59% of companies maintain 30 data marts, and there are companies that maintain 100 or more data marts. Maintenance of a single data mart can cost between $1.5 million and $2 million annually. Out of these costs, 35–70% are redundant costs.


The foregoing implies that there is a tipping point beyond which independent data mart proliferation becomes very expensive. Beyond this stage, an enterprise-wide data warehouse supporting dependent data marts can meet demand better. This is because such a data warehouse has enterprise scope, and therefore (a) supports multiple work areas and applications across the business, and (b) has consistent definitions of the data. The dependent data mart approach enables faster delivery than building yet another independent data mart does, because new applications can leverage the data in the data warehouse. It follows that at this tipping point, consolidating the disparate data marts together starts to create value.

Data mart consolidation involves building a centralized enterprise data warehouse (EDW). Data from multiple, disparate sources is centralized or consolidated into a single EDW. Anyone in the organization authorized to access data in the EDW will be able to do so. Thus, consolidation allows the business to (a) retain the functional capabilities of the original sources and, at the same time, (b) broaden the business value of the data. Data mart consolidation provides benefits as follows:

A centralized EDW results in common resources like the hardware used, software and tools, processes, and personnel. This results in a significant reduction in cost per data mart.

Since it is easier to secure centralized data than data distributed across different platforms in multiple locations, better information security can be provided. This also aids in being compliant with regulatory norms.

There is a single version of the truth, which enables better decision-making by providing more relevant information. Enterprise managers, as different from department managers, require data from all departments, and this is made possible by data consolidation.

There are two factors in consolidation, the data warehouse implementation platform and the data model. This yields four possible approaches to doing consolidation:

1. Platform change but no change in data models: This addresses only the issue of consolidating the platform. All existing data marts are brought to the same platform. We get the same common procedures for backup, recovery, and security. Proliferation of platforms and the associated hardware/software costs is mitigated. Further, the business gets cost savings in support and maintenance staff. However, this is a mere re-hosting of existing data models, and several data models continue to exist, though on a centralized platform. This form of consolidation is relatively easy to carry through. The main effort lies in redoing those procedures that might have used platform-specific features. However, with this approach, multiple ETL processes continue to be needed, and there is no metadata integration.

2. No platform change but changed data model: This type of consolidation integrates the data of the several data marts. As a result, problems of inconsistency, redundancy, missing data, etc. are removed.


BI applications give better results, and the costs of keeping redundant data are minimized. This approach requires the construction of the bus of conformed dimensions. These may not have been determined earlier, as is likely in the approach of using Scrum and user stories, or consolidation may have been designed for, as in the approach of BEAM*. Clearly, the former shall require more work than the latter. To the extent that conformed dimensions are used, some standardization of metadata does occur in this approach. However, non-conformed data continues to have different metadata. There could be changes in schemas due to conformed dimensions, and these may require changes in the ETL processes. However, such changes are minimal. Similarly, there may be changes in the code that produces reports to take into account the changed schema. Note, however, that due to the diverse platforms, the cost savings of using a common platform do not accrue. According to [16], organizations use this as a first step in moving to consolidation as per approach (4) below.

3. No platform change, no change in data model: This leads to no consolidation and can be discarded.

4. Changed platform and changed data model: In this case, we get the benefits of both a common platform and integrated data models. As mentioned earlier, this is the culminating step in data mart consolidation and is usually preceded by following approach (2) above.

There are two ways in which this kind of consolidation can be done. These are as follows:

a. Consolidate by merging with primary: Two data marts, a primary and a secondary data mart, are selected out of the several data marts that exist. The secondary data mart is to be merged with the primary. As a first step, the primary data mart is moved to the new platform. The secondary data mart is then migrated to the new platform and conformed to the primary or, in other words, conformed dimensions and facts are determined. Once merging is completed, the secondary data mart can be discarded. This migration to the new platform and integration with the primary is repeated for all remaining data marts to yield the enterprise-wide data warehouse. The merge-with-primary approach works well if the schema of the primary does not have to undergo major changes in accommodating the independent data marts. If this condition is not satisfied, then the approach considered below is deployed.

b. Consolidate by doing a redesign: In this case, a fresh design is made keeping in mind the common information across the independent data marts. Existing data marts are not used except to gain some understanding of the department view of the business, thereby laying a basis for development of the enterprise-wide data warehouse schema. Evidently, this approach can require considerable effort and time before delivery.
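As a minimal sketch of the conforming step in the merge-with-primary approach, the fragment below maps the members of a secondary data mart's Product dimension onto the primary's conformed dimension. All table, key, and attribute names are invented for illustration and are not prescribed by any of the approaches described here.

# Minimal sketch (hypothetical names): conforming a secondary data mart's
# Product dimension to the primary during merge-with-primary consolidation.

# Primary (conformed) dimension: business key -> conformed surrogate key.
primary_product_dim = {
    "SKU-001": 1,   # e.g., espresso machine
    "SKU-002": 2,   # e.g., coffee grinder
}

# Secondary mart's local dimension rows: local surrogate key -> attributes.
secondary_product_dim = {
    901: {"business_key": "SKU-002", "name": "Grinder, coffee"},
    902: {"business_key": "SKU-099", "name": "Milk frother"},  # not yet in the primary
}

def conform(primary, secondary):
    """Map local surrogate keys to conformed keys; extend the conformed
    dimension with members the primary does not yet know about."""
    key_map, new_members = {}, []
    next_key = max(primary.values(), default=0) + 1
    for local_key, row in secondary.items():
        business_key = row["business_key"]
        if business_key not in primary:
            primary[business_key] = next_key
            new_members.append(business_key)
            next_key += 1
        key_map[local_key] = primary[business_key]
    return key_map, new_members

key_map, new_members = conform(primary_product_dim, secondary_product_dim)
print(key_map)       # {901: 2, 902: 3} -- secondary fact rows can now be re-keyed
print(new_members)   # ['SKU-099'] -- members added to the conformed dimension

Once every fact row of the secondary mart has been re-keyed through such a mapping, the secondary data mart can be discarded, as described above.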

To sum up, the simplest form of data mart consolidation saves the cost of software and hardware infrastructure. More complex forms of consolidation can further help in eliminating redundant and dormant processes, ensuring a new optimized system. Data mart consolidation reaches fulfillment when it also removes inconsistencies and presents a 'single version of the truth'. When this happens, the data warehouse addresses both departmental and enterprise-wide concerns. Through consolidation, then, a stage is reached when functionality, processes, and hardware/software infrastructure are all rationalized.

2.6 Strategic Alignment

By strategic alignment, we refer to the alignment of a data warehouse with business strategy. This alignment is fostered when business and IT people come together with a realization that a data warehouse provides critical value to the organization and that the data warehouse can prove highly beneficial to the business. This means that the strategy for the deployment of IT should be co-developed with business strategy, since IT processes play a major role in delivering business value. In an aligned business, IT and business managers are jointly responsible for identifying IT investments to be made, prioritizing these, and deciding resource allocation. Since top business management is seen to be cooperating with IT managers, this cooperating culture flows down to all levels of the organization. As a result, there is effective communication to facilitate the realization of IT strategy and the development of IT products and services by bringing together business and technological capabilities. The foregoing is possible if there is corporate commitment at the top levels to alignment.

On the other hand, IT professionals also need to learn the importance of alignment. Whereas traditionally the IT professional was largely evaluated on technical skills, today they additionally need skills for listening, negotiation and consensus building, teamwork, and customer service. The IT professional must be sensitive to the manner in which technology can be brought into the business, what benefits it can bring, and what changes in business practice will be needed. These new skills go a long way in developing cooperation with all stakeholders, thereby minimizing conflict that may negatively affect IT plans.

We can now consider the alignment of the data warehouse with business strategy. IT managers responsible for data warehousing must work together with business managers to achieve alignment. Alignment implies establishing coordination and communication besides doing joint envisioning of the data warehouse strategy and setting priorities. This is built on the basis that both IT and business see the data warehouse as a critical resource that shall provide competitive advantage to the company. Bansali [17] proposes five factors that apply specifically to the alignment of a data warehouse with business. These are as follows:

1. Joint responsibility between data warehouse and business managers: Since data warehousing involves multiple stakeholders, each having their own data, severe data quality issues emerge that need to be resolved. Senior management needs to impose data standards and, additionally, overcome any resistance to changing current practice.

2. Alignment between data warehouse plan and business plan: The vision of the data warehouse in the business needs to be defined. If this is only a short-term vision, then a lower budget will be allocated, early delivery shall be required, and the independent data mart approach shall be adopted. If, on the other hand, full organizational control is needed, then an enterprise-wide data warehouse would be required. The strategy for this may be through building data marts and then doing consolidation.

3. Flexibility in data warehouse planning: If the business strategy is likely to change even while data warehouse development is ongoing, then a change in the business requirements of the data warehouse could be necessitated. In such a situation, iterative development with short lead times to delivery may be the answer.

4. Technical integration of the data warehouse: The business case for a data warehouse must first be established and business needs determined before opting for a particular technology. Selecting technology is based on its ability to address business and user requirements. The inability of the organization to absorb large amounts of new technology may lead to failure. Conversely, deploying old technology may not produce the desired results. Similarly, dumping huge quantities of new data in the lap of users may be as negative as providing very little new data.

5. Business user satisfaction: End-user participation is essential so as to both manage user expectations and satisfy their requirements. The selection of appropriate users in the project team is crucial.

The data warehouse community has responded to the need for alignment in several ways. One is the adoption of agile techniques. The agile manifesto consists of four value statements:

Individuals and interactions over processes and tools,

Working software over comprehensive documentation,

Customer collaboration over contract negotiation, and

Responding to change over following a plan.

It can be seen that this manifesto addresses the five factors discussed above. Agile development, as we have already seen, provides a broad developmental approach to data warehouse development but does not provide techniques by which the various stages shall be handled in a real project. To realize its full potential, it relies on models, tools, and techniques in the areas of requirements, design, and construction engineering. Of interest to us, in this book, is requirements engineering. All work in the area of data warehouse requirements engineering, DWRE, is predicated upon close requirements engineer-stakeholder interaction.


2.7 Data Warehouse Requirements Engineering

The importance of requirements gathering was highlighted in Sect. 2.2. The area of data warehouse requirements engineering, DWRE, aims to arrive at a clear requirements specification on which both organizational stakeholders and the development team agree. As already seen, this specification may be a complete enterprise-wide specification if the DWSDLC is being followed breadth first, or it may be partial if the DWSDLC is being sliced vertically. The first question that arises is, what is a data warehouse requirement? Notionally, this question can be answered in two ways:

(a) What shall the data warehouse do?

(b) What information shall the data warehouse provide?

Data warehouse technology does not directly address the first question. One answer that is provided is that the data warehouse supports analysis of different forms: analyze sales, analyze customer response, etc. The second answer is that the data warehouse can be queried, mined, and have Online Analytical Processing (OLAP) operations performed on it. It follows that a data warehouse per se does not provide value to the business. Rather, value is obtained because of the improved decision-making that results from the better information that is available in it. This situation is different from that in transactional systems. These systems provide functionality and can perform actions. Thus, a hotel reservation system can do room bookings, cancellations, and the like. On the other hand, by providing capabilities to query, mine, and do OLAP, a data warehouse can be used by decision-makers to make decisions about what to do next. Therefore, data warehouse requirements cannot be expressed in terms of the functionality they provide, because they are not built to provide functionality. Asking what data warehouses do is the wrong question to ask.

The second question is of relevance to data warehousing. If the information to be kept in the data warehouse is known, then it is possible to structure it in multidimensional form and thereafter to query it, mine it, and do OLAP with it. Thus, the data warehouse requirements engineering problem is that of determining the information contents of the data warehouse to-be. Again, notice the difference from transactional systems, where supplying information is not the priority and asking what information to keep would be the wrong question to ask.
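As a small illustration of what information contents in multidimensional form can look like, the sketch below declares a hypothetical sales star schema: a fact carrying measures, analyzed along dimensions. The fact, measure, and dimension names are invented for the example and are not drawn from any of the techniques discussed below.

from dataclasses import dataclass, field

@dataclass
class Dimension:
    name: str
    attributes: list          # descriptive attributes used for slicing and rollup

@dataclass
class Fact:
    name: str
    measures: list            # numeric values that decision-makers analyze
    dimensions: list = field(default_factory=list)

# Hypothetical "Sales" star schema: the kind of information content that a
# requirements engineering technique must elicit before design can begin.
sales = Fact(
    name="Sales",
    measures=["quantity_sold", "revenue"],
    dimensions=[
        Dimension("Product", ["category", "brand"]),
        Dimension("Store", ["city", "region"]),
        Dimension("Time", ["day", "month", "quarter", "year"]),
    ],
)

# A typical OLAP-style question this structure supports, "revenue by region and
# quarter", aggregates one measure over two of the dimensions.
print([d.name for d in sales.dimensions])   # ['Product', 'Store', 'Time']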

Now, the information to be kept in the data warehouse cannot be determined in isolation and requires a context within which it is relevant. Thus, information for a human resource data mart is different from that of the finance data mart. Due to this, requirements engineering techniques explore the context and then arrive at the information that is to be kept in the data warehouse. There are several proposals for exploring the context and determining information relevant to the context. Broadly speaking, there are two approaches, as shown in Fig. 2.6. On the left side of the figure, we see that interest is in the immediate concern that motivates obtaining information from the data warehouse. This may be a requirement for analyzing sales, forecasting sales, or simply asking questions about sales. Once these needs are identified, then it is a matter of eliciting the information that should be kept in the data warehouse. The important point to note is that though the immediate context may be derived from an organizational one, the latter is not modeled and is only informally explored.

The second approach is shown on the right side of Fig. 2.6. Here, the organizational context that raises the immediate concern is also of interest and is, consequently, modeled. There is a clear representation of the organizational context from which the immediate context can be derived. For example, forecasting sales may be of interest because the organization is launching a variant of an existing product. It may also be of interest to know trends of sales of existing products. The organizational context then provides the rationale for the immediate context. It provides a check that the immediate context is indeed relevant to the organization and is not merely a fanciful analysis. How many levels deep is the organizational context? We will show that there are proposals for organizing data warehouse requirements engineering in multiple levels of the organizational context.

Immediate Context

Hughes [2] makes a case for agile data warehouse engineering and builds user stories that form the basis for subsequent data warehouse development. User stories principally specify the analysis needs of decision-makers, for example, analyze sales. This is determined by moving down the epic-theme-story levels of Scrum. The technique is completely based on interviewing and deriving stories. Paim and Castro [18] proposed the DWARF technique and used traditional techniques like interviews and prototyping to elicit requirements.

Fig. 2.6 Contexts of DWRE

Winter and Strauch [19] propose a cyclic process which maps the information demand made by middle-level managers and knowledge workers with information supplied in operational databases, reports, etc. They have an initial phase, an 'as is' phase, and a 'to be' phase. In the first phase, they argue that since different users can result in different data models, the dominant users must be identified. This helps target a specific business process. In the 'as is' phase, an information map is created by analyzing (a) existing information systems and (b) reports that the users commonly use. According to the authors, analyzing the latter helps identify more sources of information that one is not commonly aware of. In the 'to be' phase, information demand is elicited from the user by asking business questions. The information supply and information demand are compared and inconsistencies analyzed. Finally, information requirements are modeled using semantic models.
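The comparison of information demand with information supply in this method can be pictured as a simple set comparison. The sketch below uses invented information items purely to make the idea concrete; it is not the notation of Winter and Strauch.

# Hypothetical information items gathered in the two phases.
supply = {"sales by store", "sales by product", "stock level by warehouse"}  # 'as is': systems and reports
demand = {"sales by product", "sales by region", "returns by product"}       # 'to be': business questions

covered = demand & supply    # demand already served by existing sources
gap = demand - supply        # demand with no identified source: needs new data or renegotiation
unused = supply - demand     # supplied information that nobody has asked for

print(sorted(covered))   # ['sales by product']
print(sorted(gap))       # ['returns by product', 'sales by region']
print(sorted(unused))    # ['sales by store', 'stock level by warehouse']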

Organizational-Immediate Context

Traditionally, there are two major approaches: one sets the organizational context with goal modeling and the other with business process modeling. There is a third group of techniques that modify goal orientation by introducing additional business-related concepts. We refer to these as goal-motivated techniques.

As we have already seen, there is much interest in RE for transactional systems in goal-oriented [20, 21] and scenario-oriented techniques [22, 23]. These were coupled together to yield the goal-scenario coupling technique [24, 25]. Goal orientation uses means-ends analysis to reduce goals, and the goal hierarchy identifies the goals that are to be operationalized in the system. Notice the near absence of the data/information aspect in goal orientation. Scenario orientation reveals typical functionality and its variations by identifying typical interaction between the system and the user. Even though example data is shown to flow across the system-user interface, the focus is not on the data aspect; data and its modeling are largely ignored in scenario-oriented RE. Goal-scenario coupling allows development of a scenario for a goal of the goal hierarchy. Consequently, variations of goals are discovered in its scenario. Any new functionality indicated by the scenario is then introduced in the goal hierarchy. Thus, a mutually cooperating system is developed to better discover system goals. Again, notice that data is largely ignored. A number of proposals for goal-oriented data warehouse requirements engineering, GODWRE, are available, and all of these link goals with data; that is, all are aimed at obtaining facts and dimensions of data warehouses from goals [7, 26-31]. We consider each of these in turn.

The second approach takes business processes as the basis for determining the organizational context. An example of a business process is order processing. Events that take place during a business process generate/capture data, and a business would like to analyze this data. Thus, data may be, for example, logs of web service execution, application logs, event logs, resource utilization data, financial data, etc. Interest is in analyzing the data to optimize processes, resource allocation, load prediction and optimization, and exception understanding and prevention. When starting off from business processes, the several processes carried out in a business are first prioritized and the process to be taken up next is selected.


Requirements of the business process are then obtained and taken into one or more dimensional models. The data resulting from events of business processes is essentially performance metrics and can be mapped to facts of the multidimensional model, whereas parameters of analysis become dimensions. Therefore, business intelligence can be applied to this data. There are also a number of hybrid approaches that follow from goal-oriented approaches, one of which is to couple goals and processes. Others, for example, couple goals with key performance indicators or couple goals with decisions. We refer to these as approaches that are motivated by goal modeling. We consider these three types of DWRE techniques in the rest of this section.

2.7.1 Goal-Oriented DWRE Techniques

Goal-Oriented DWRE, GODWRE, techniques draw heavily from the notion of goals developed in GORE, considered in Chap. 1. Thus, the organizational context of the data warehouse is represented in terms of goals that the business wants to achieve. Goal reduction techniques as in GORE are adopted to yield the goal hierarchy. Thereafter, facts and dimensions are associated with goals.

An early proposal for GODWRE was due to Bonifati et al. [27], who obtained DW structures from users' goals and operational databases. This was done by three levels of analysis: (i) top-down using goals, (ii) bottom-up for operational databases, and (iii) integration for integrating the data warehouse structures obtained from steps (i) and (ii). Our interest is in step (i) only, which relies on Goal-Question-Metric (GQM) analysis. Users' requirements are collected through traditional techniques like interviewing and brainstorming to obtain goals. A goal is expressed in terms of

Object of study: the part of the reality being studied,

Purpose: why the study is being done,

Quality focus: the characteristics of interest in the study,

Viewpoints: who is interested in the study, and

Environment: the application context in which the study is being done.

Goals are further decomposed into simpler subgoals using goal reduction techniques. Now, in order to obtain the information contents of the data warehouse, goal characteristics are collected on GQM Abstraction sheets. These sheets are in four parts as follows:

(a) Quality focus: The interesting question here is, 'How can the quality focus be detailed?' There is no guidance on how to obtain these details, but some examples are provided. These are cost, performance, resources required, etc. These factors yield the facts of the data warehouse to-be.


(b) Variation factor: The relevant question to ask here is, 'What factors can influence the quality focus?' Examples of such factors are customers, time, work center, etc. Again, eliciting variation factors requires considerably skilled and experienced requirements engineers who ask the right questions and understand the responses.

(c) Baseline hypothesis: What are the values assigned to the quality focus of interest? These are the typical queries that shall be asked when the warehouse becomes operational, for example, the average cost of activities of a certain type.

(d) Impact on baseline hypothesis: How do the baseline hypotheses vary the quality focus? These tell us the query results that the data warehouse will produce once it becomes operational.
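One possible way to record such an abstraction sheet, and to read candidate facts and dimensions off it, is sketched below. The sheet's entries (activity cost, customer, work center, and so on) are invented examples in the spirit of those mentioned above; they are not output of the actual method.

# Hypothetical GQM abstraction sheet for a goal such as
# "analyze the cost of production activities".
abstraction_sheet = {
    "quality_focus": ["activity cost", "activity duration"],     # detailed quality focus
    "variation_factors": ["customer", "time", "work center"],    # what influences the focus
    "baseline_hypotheses": ["average cost of activities of type T is about 120"],
    "impact_on_hypotheses": ["cost is expected to vary strongly with work center"],
}

def candidate_schema(sheet):
    """Quality focus yields candidate facts; variation factors yield candidate dimensions."""
    return {
        "facts": list(sheet["quality_focus"]),
        "dimensions": list(sheet["variation_factors"]),
    }

print(candidate_schema(abstraction_sheet))
# {'facts': ['activity cost', 'activity duration'],
#  'dimensions': ['customer', 'time', 'work center']}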

The requirements engineering aspect is over once the abstraction sheets are built. However, just to complete the description of the technique, we consider the manner in which the star schema is constructed. First, using the information obtained from the abstraction sheets, ideal star schemas are constructed. Thereafter, in the bottom-up analysis phase, step (ii) above, entity relationship diagrams of existing operational databases are obtained and converted to star schemas. Finally, in step (iii), the ideal star schemas and those of step (ii) are matched. A selection metric is applied and the star schemas are ranked. The designer then chooses the best fit for system design. Notice that in this technique, the organizational context is the goal structure, whereas the immediate context is the abstraction sheets.

Yet another goal-oriented technique is due to Mazón et al. [30], who base their approach on the i* methodology. They relate the goals supported by the DW with information requirements. Facts and dimensions are discovered from information requirements. An intentional actor refers to a decision-maker involved in the decision-making process. For each intentional actor, there are three intentional elements: goals, tasks, and resources. Goals can be of three kinds:

Strategic goals are at the highest level of abstraction. These goals are the main objectives of the business process and cause a beneficial change of state in the business. Thus, 'increase sales' is a strategic goal.

Decision goals are at the next lower level of abstraction. These goals are for achieving strategic goals. As an example, 'open new store' is a decision goal that helps achieve the strategic goal of increasing sales.

Information goals are at the lowest level of abstraction. These goals identify the information required to achieve a decision goal. For example, 'analyze purchases' is an information goal.

Information is derived from information goals and is represented as tasks that must be carried out to achieve information goals. The requirements process starts with the identification of decision-makers, and a strategic dependency model is built that shows the dependency between different decision-makers. In the Strategic Rationale model, SR of i*, specific concepts for multidimensional structures are introduced. These are business process, measures, and context. Business processes are related to goals of decision-makers, measures are related to the information obtained from information goals, and context is the way in which information is analyzed. Relations between contexts are defined that enable aggregates to be determined. We can summarize the steps to be carried out as follows:

Discovering the intentional actors, i.e., the decision-makers, and defining SR models for each decision-maker;

Discovering the three kinds of goals of decision-makers;

From information goals discovered, arriving at information requirements; and

Extracting multidimensional concepts from information requirements.
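A compact way to picture the resulting goal chain and the multidimensional concepts read off it is sketched below. The goals are the illustrative ones used above ('increase sales', 'open new store', 'analyze purchases'), while the measures, contexts, and field names are invented for the example.

# Hypothetical record of one decision-maker's goal chain in the spirit of the
# i*-based approach: strategic -> decision -> information goal, with measures
# and contexts attached to the resulting information requirement.
goal_chain = {
    "actor": "sales manager",
    "strategic_goal": "increase sales",
    "decision_goal": "open new store",
    "information_goal": "analyze purchases",
    "information_requirement": {
        "measures": ["purchase amount", "number of purchases"],   # candidate facts
        "contexts": ["customer", "store location", "time"],       # candidate dimensions
    },
}

requirement = goal_chain["information_requirement"]
print("facts:", requirement["measures"])
print("dimensions:", requirement["contexts"])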

This basic technique has been used by Leal et al. [32] to develop a business strategy based approach. The basic idea is to go deeper into the organizational context. Thus, we can consider the proposal to be the introduction of another layer, the business strategic layer, as shown in Fig. 2.7. The new layer is for VMOST analysis of the business: vision, mission, objectives, strategy, and tactics. First, decision-makers are identified. The DW is also considered as an actor. VMOST components along with strategic goals are obtained from decision-makers. Thereafter, intentional elements like objectives, tasks, and tactics are elicited, as are means-ends links. After verifying that the VMOST components are in accordance with the BMM model concepts, the approach of strategic, decision, and information goals outlined above is followed.

Fig. 2.7 The business strategic layer

The GRAND approach [31, 33] divides the requirements elicitation process into two perspectives, the organizational perspective and the decisional perspective. The former models the requirements of the business and includes actors who may not be decision-makers. This perspective consists of two steps, goal analysis and fact analysis. In the goal analysis phase, goals for the actor are represented using an actor diagram. Each goal is decomposed by AND/OR decomposition and the rationale diagram built. During the fact analysis phase, facts are associated with goals. This association arises because facts are the data to be kept when a goal is achieved. Finally, facts are augmented with their attributes. In the decisional phase, the organizational model is reviewed but with the decision-maker as the actor. The focus in this phase is on determining the analysis needs of the decision-maker, and goals like 'analyze sales' are established. Such goals are decomposed to yield their own goal hierarchy. Facts are normally imported from the organizational perspective, but some additional facts may be obtained when the analyst investigates the goal model of the decisional phase. Dimensions are obtained by considering the leaf goals of the decision-maker goal hierarchy and the facts in the upper layers of this hierarchy.
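The decisional-perspective derivation can be pictured as walking a small AND/OR goal tree in which facts hang off the upper-level goals and dimensions are suggested by the leaves. The tree below is invented purely for illustration and is not taken from the GRAnD papers.

# Hypothetical decision-maker goal hierarchy in the spirit of GRAnD:
# facts are attached to upper-level goals; leaf goals suggest dimensions.
goal_tree = {
    "goal": "analyze sales",
    "facts": ["sales amount", "quantity sold"],
    "subgoals": [
        {"goal": "analyze sales by product", "facts": [], "subgoals": []},
        {"goal": "analyze sales by store and period", "facts": [], "subgoals": []},
    ],
}

def leaf_goals(node):
    """Collect the leaf goals, from which analysis dimensions are read off."""
    if not node["subgoals"]:
        return [node["goal"]]
    leaves = []
    for child in node["subgoals"]:
        leaves.extend(leaf_goals(child))
    return leaves

print(goal_tree["facts"])      # facts from the upper layers of the hierarchy
print(leaf_goals(goal_tree))   # leaves hint at dimensions such as product, store, period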

2.7.2 Goal-Motivated Techniques

The techniques discussed in the previous section associate facts and dimensions with goals. There are other approaches that start out with goals but introduce an intermediate concept using which facts and dimensions are obtained.

The goal-process approach of Boehnlein and Ulbricht [7, 26] relies on the Semantic Object Model, SOM, framework. After building a goal model for the business at hand, the business processes that are performed to meet the goals are modeled. The business application systems resulting from these are then used to yield a schema in accordance with the Structured Entity Relationship Model, SERM. Business objects of the business processes get represented as entities of SERM, and dependencies between entities are derived from the task structure. Thereafter, a special fourth stage is added to SOM in which only those attributes that are relevant for the information analysis required for decision-making are identified. Thereafter, the developer converts the SERM schema to facts and dimensions; facts are determined by asking the question, how can goals be evaluated by metrics? Dimensions are identified from dependencies of the SERM schema.

The Goal-Decision-Information, GDI, technique [28, 29] associates decisions with business goals. A decision is a selection from a choice set of alternatives. Each alternative is a way of achieving a goal. The decision-maker needs information in order to select an alternative. For each decision, relevant information is obtained by writing informational scenarios. These scenarios are sequences of information requests expressed in an SQL-like language. An information scenario is thus a typical system-stakeholder interaction to identify the information required for a decision. Once information for all decisions is elicited, an ER diagram is built from which the multidimensional schema is constructed. Typical information retrieval requests use the rather fuzzy notion of 'relevant information'. What constitutes relevance is not spelt out.
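An information scenario of this kind, a sequence of SQL-like information requests attached to a decision, might look like the sketch below. The decision, the request texts, and the table and column names are invented for illustration and are not the notation actually prescribed in [28, 29].

# Hypothetical information scenario for the decision "open a new store":
# the sequence of SQL-like information requests a decision-maker might pose.
decision = "open a new store"

information_scenario = [
    "SELECT region, SUM(revenue) FROM sales GROUP BY region",
    "SELECT region, COUNT(DISTINCT customer_id) FROM sales GROUP BY region",
    "SELECT region, AVG(delivery_time) FROM shipments GROUP BY region",
]

# The attributes referenced across the requests indicate the information that the
# data warehouse to-be must hold in order to support this decision.
needed_information = {"region", "revenue", "customer_id", "delivery_time"}
print(decision, "->", sorted(needed_information))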


2.7.3 Miscellaneous Approaches

Though the DWRE area is highly oriented toward goals, techniques that start off from notions other than goals do exist. One such example is that of BEAM*. This approach [12] gives prominence to the business events that comprise a business process. Each business event is represented as a table, and the RE problem now is to identify the table attributes. This is done by using the 7W framework that provides for asking questions of seven types, namely, (1) Who is involved in the event? (2) What did they do, and to what was it done? (3) When did it happen? (4) Where did it take place? (5) Why did it happen? (6) How did it happen, in what manner? and (7) How many or how much was recorded, that is, how can it be measured? Out of these, the first six supply dimensions, whereas the last one supplies facts.

Yet another proposal kicks off from use cases [34]. Use cases are used for communication between stakeholders, domain experts, and DW designers. The authors propose an incremental method to develop use cases. Facade iteration is the first iteration, where use case outlines and high-level descriptions are captured. Its purpose is to identify actors for other major iterations. The information gathered concerns names and short descriptions of actor interactions with the DW system. During the next iteration, the ideas of the use cases are broadened and deepened. They generally include functional and information requirements plus requirement attributes. Since the requirements gathered can be too large, use cases are first individually evaluated for errors and omissions, then prioritized and pruned. This is done so that at the end only the use cases that provide sufficient information to build the DW system are left. Thereafter, conflicting/inconsistent use cases are identified and reassessed. Finally, use cases are used for obtaining relevant information.

The use of key performance indicators has also formed the basis of DWRE techniques. References [35, 36] model business indicators as functions and identify the needed parameters and return type. That is, the input and output information needed to compute a business indicator is determined.
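Both of these ideas can be sketched briefly. Below, a hypothetical 'order placed' event is recorded with the 7W questions, and a business indicator is modeled as a function whose parameters and return type name the input and output information. The event, the indicator, and all names are invented examples, not taken from BEAM* or from [35, 36].

# Hypothetical 7W record for an "order placed" business event: the answers to
# the first six questions suggest dimensions; the seventh supplies the facts.
order_event = {
    "who": "customer",                        # dimension
    "what": "product",                        # dimension
    "when": "order date",                     # dimension
    "where": "store",                         # dimension
    "why": "promotion",                       # dimension
    "how": "sales channel",                   # dimension
    "how_many": ["quantity", "order value"],  # facts / measures
}

# A business indicator modeled as a function: its parameters name the input
# information and its return type names the output the warehouse must support.
def revenue_growth(revenue_this_period: float, revenue_last_period: float) -> float:
    """Percentage growth in revenue between two consecutive periods."""
    return 100.0 * (revenue_this_period - revenue_last_period) / revenue_last_period

print([question for question in order_event if question != "how_many"])  # dimension questions
print(revenue_growth(120_000.0, 100_000.0))                              # 20.0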

2.7.4 Obtaining Information

It can be seen that there is a clear attempt to obtain the context in which facts and dimensions of interest carry meaning. This context is explored through a variety of concepts like goals, decisions, business processes, business events, and KPIs. Thereafter, attention turns to obtaining data warehouse information. The techniques for this second part are summarized in Table 2.4.

Table 2.4 Approaches to obtaining information

Approach                  Obtaining data warehouse information
Boehnlein and Ulbricht    Relevant business objects and attributes; edges of the SERM schema
Bonifati                  Quality focus and variation factors
Prakash and Gosain        Information scenarios
Mazón et al.              Measures, context
Giorgini et al.           Goal achievement measures; dimensions from leaves of the goal hierarchy
Nasiri et al. [37]        Follows the approach of Mazón et al., as in row 4 of this table
Corr and Stagnitto        Uses the 7W framework

The primary difficulty with Boehnlein and Ulbricht is the absence of any model or guideline to discover the attributes relevant to the analysis. The authors do not indicate how stakeholders articulate the analysis to be performed. Consequently, attribute identification becomes an unfocused activity. Further, the approach is for obtaining nominal information for the company as a whole. Therefore, individual stakeholders' information needs are de-emphasized.

Bonifati relies on quality focus and variation factors. Evidently, merely asking questions like 'how can the quality focus be detailed?' and 'what factors can influence the quality focus?' is not enough, since no guidance and support is provided for answering the questions. We need some model that can be populated, and a suitable investigation needs to be carried out to perform this task. Even though an SQL-like structure of queries is provided to express information scenarios by Prakash and Gosain, there is no guidance on what information to ask for and what factors to consider. Thus, the approach relies heavily on the experience of the scenario writer. Mazón et al. rely on obtaining measures and contexts for information goals. This is an ad hoc activity that relies completely on stakeholder experience. Giorgini et al., similarly, do not provide guidance in the task of analyzing leaf goals and on the aspects to be considered in arriving at dimensions.

The last row of Table 2.4 deals with the 7W framework used in Corr and Stagnitto. This approach is rather simplistic; compared to this, the other techniques discussed here provide some concepts using which stakeholders can identify their information needs.

2.8 Conclusion

Data warehouse development has two major concerns, namely

1. What method to adopt to build data warehouses without imposing excessive costs on organizations while minimizing the lead times for product delivery? This is the issue of the development strategy to be adopted.

2. How to ensure that the data warehouse to-be meets the information requirements of decision-makers? This is the issue of requirements engineering.


As with transactional systems, the two issues have been treated independently of one another. That is, the manner in which requirements engineering can support an efficient, iterative, and incremental development strategy has not been addressed. It is evident, however, that there is a fundamental difference between requirements engineering for transactional systems and that for data warehousing. The former is oriented toward discovering the functionality of the system to-be. The discovered functionality is then implemented or operationalized in the system to be built. In contrast, the problem of DWRE is to determine the information contents of the data warehouse to-be. However, our analysis of information elicitation techniques shows that these are rather ad hoc and provide little guidance in the requirements engineering task. We need models, tools, and techniques to do this task better.

References

1. Loshin, D. (2013). Business intelligence: The savvy manager's guide (2nd ed.). Elsevier.

2. Hughes, R. (2013). Agile data warehousing project management: Business intelligence systems using Scrum. Morgan Kaufmann.

3. Ericson, J. (2006, April). A simple plan. Information Management Magazine. http://www.information-management.com/issues/20060401/1051182-1.html. Accessed September 2011.

4. Hayen, R., Rutashobya, C., & Vetter, D. (2007). An investigation of the factors affecting data warehousing success. Issues in Information Systems, VIII(2), 547-553.

5. Alshboul, R. (2012). Data warehouse explorative study. Applied Mathematical Sciences, 6(61), 3015-3024.

6. Inmon, B. (2005). Building the data warehouse (4th ed.). New York: Wiley.

7. Boehnlein, M., & Ulbrich vom Ende, A. (1999). Deriving initial data warehouse structures from the conceptual data models of the underlying operational information systems. In Proceedings of Workshop on Data Warehousing and OLAP (pp. 15-21). ACM.

8. Hüsemann, B., Lechtenbörger, J., & Vossen, G. (2000). Conceptual data warehouse design. In Proceedings of the International Workshop on Design and Management of Data Warehouses (DMDW 2000), Stockholm, Sweden, June 5-6.

9. Moody, L. D., & Kortink, M. A. R. (2000). From enterprise models to dimensional models: A methodology for data warehouses and data mart design. In Proceedings of the International Workshop on Design and Management of Data Warehouses, Stockholm, Sweden (pp. 5.1-5.12).

10. Golfarelli, M., Maio, D., & Rizzi, S. (1998). Conceptual design of data warehouses from E/R schemes. In Proceedings of the Thirty-First Hawaii International Conference on System Sciences, 1998 (Vol. 7, pp. 334-343). IEEE.

11. Prakash, N., Prakash, D., & Sharma, Y. K. (2009). Towards better fitting data warehouse systems. In The practice of enterprise modeling (pp. 130-144). Springer, Berlin, Heidelberg.

12. Corr, L., & Stagnitto, J. (2012). Agile data warehouse design. UK: Decision One Press.

13. Ambler, S. www.agiledata.org.

14. CMP. Data mart consolidation and business intelligence standardization. www.businessobjects.com/pdf/investors/data_mart_consolidation.pdf.

15. Muneeswara, P. C. Data mart consolidation process: What, why, when, and how. Hexaware Technologies white paper. www.hexaware.com.

17. Bansali, N. (2007). Strategic alignment in data warehouses: Two case studies (Ph.D. thesis). RMIT University.

18. Paim, F. R. S., & de Castro, J. F. B. (2003). DWARF: An approach for requirements definition and management of data warehouse systems. In 11th IEEE Proceedings of International Conference on Requirements Engineering, 2003 (pp. 75-84). IEEE.

19. Winter, R., & Strauch, B. (2003). A method for demand-driven information requirements analysis in data warehousing projects. In Proceedings of the 36th Annual Hawaii International Conference on System Sciences, 2003 (p. 9). IEEE.

20. Antón, A. I. (1996, April). Goal-based requirements analysis. In Proceedings of the Second International Conference on Requirements Engineering (pp. 136-144). IEEE.

21. Lamsweerde, A. (2000). Requirements engineering in the year 00: A research perspective. In Proceedings of the 22nd International Conference on Software Engineering (pp. 5-19). ACM.

22. Sutcliffe, A. G., Maiden, N. A., Minocha, S., & Manuel, D. (1998). Supporting scenario-based requirements engineering. IEEE Transactions on Software Engineering, 24(12), 1072-1088.

23. Lamsweerde, A., & Willemet, L. (1998). Inferring declarative requirements specifications from operational scenarios. IEEE Transactions on Software Engineering, 24(12), 1089-1114.

24. CREWS Team. (1998). The CREWS glossary, CREWS Report 98-1. http://SUNSITE.informatik.rwth-aachen.de/CREWS/reports.htm.

25. Liu, L., & Yu, E. (2004). Designing information systems in social context: A goal and scenario modelling approach. Information Systems, 29(2), 187-203.

26. Boehnlein, M., & Ulbrich vom Ende, A. (2000). Business process oriented development of data warehouse structures. In Proceedings of Data Warehousing 2000 (pp. 3-21). Physica Verlag HD.

27. Bonifati, A., Cattaneo, F., Ceri, S., Fuggetta, A., & Paraboschi, S. (2001). Designing data marts for data warehouses. ACM Transactions on Software Engineering and Methodology, 10(4), 452-483.

28. Prakash, N., & Gosain, A. (2003). Requirements driven data warehouse development. In CAiSE Short Paper Proceedings (pp. 13-17).

29. Prakash, N., & Gosain, A. (2008). An approach to engineering the requirements of data warehouses. Requirements Engineering Journal, Springer, 13(1), 49-72.

30. Mazón, J. N., Pardillo, J., & Trujillo, J. (2007). A model-driven goal-oriented requirement engineering approach for data warehouses. In Advances in Conceptual Modeling - Foundations and Applications (pp. 255-264). Springer, Berlin, Heidelberg.

31. Giorgini, P., Rizzi, S., & Garzetti, M. (2008). GRAnD: A goal-oriented approach to requirement analysis in data warehouses. Decision Support Systems, 45(1), 4-21.

32. Leal, C. A., Mazón, J. N., & Trujillo, J. (2013). A business-oriented approach to data warehouse development. Ingeniería e Investigación, 33(1), 59