
Automation in Construction 15 (2006) 800-807
www.elsevier.com/locate/autcon

A project-oriented data warehouse for construction


Thammasak Rujirayanyong a, Jonathan J. Shi b

a Department of Civil Engineering, Rangsit University, Patum-Thani, Thailand
b Department of Civil and Architectural Engineering, Illinois Institute of Technology, Chicago, IL, USA

Accepted 16 November 2005

Abstract

A construction organization generates a great amount of operational data that are distributed across various functional systems to support its daily operations. Although those data may be useful for future projects, they are not widely collected and centrally stored in the organization. This research presents a Project-oriented Data Warehouse (PDW) for contractors. PDW is designed with dimensional data models consisting of 26 tables. Sixteen of the tables are dimension tables for storing general descriptive information, and the other ten are fact tables for detailing various facts that are captured in the lifecycle of construction projects. PDW can be directly populated with data from existing operational systems, such as P3 files, MS Access, P3/e databases, and Excel files. It maintains each data item in the context of its associated project so that a user can retrieve a specific piece of information plus any background information on the related project. PDW has been populated with data from three sample projects. Through the user interface, a user can generate query reports of interest as needed. The presented warehouse structure and data models are scalable. They may be adopted by medium or large contractors for developing company-level data facilities.

© 2005 Elsevier B.V. All rights reserved.
Keywords: Database; Information system; Decision support system; Construction; Data warehousing

Corresponding author. E-mail address: jonathan.shi@iit.edu (J.J. Shi).
0926-5805/$ - see front matter © 2005 Elsevier B.V. All rights reserved.
doi:10.1016/j.autcon.2005.11.001

1. Introduction

Every day, organizations large or small create billions of bytes of data about various aspects of their business, such as customers, products, operations and people [1]. An organization needs access to a variety of information to support either its daily operations or its business decisions. Construction organizations also generate a great amount of operational data that are distributed across various functional databases. These data play an important role in securing a project's completion on time, within budget and meeting design specifications [2].

The information systems in an organization are generally divided into two categories: operation support systems (OSS) and decision support systems (DSS). OSS serve the need of running the daily operations of the business. DSS provide historical information for analyzing the business so that important business decisions can be made appropriately. Many companies have realized the importance of the hidden treasure of information, which can significantly improve the quality of decisions [3]. Moreover, unlike consumable resources, information as an organization's intangible asset can be reused over and over without losing its value; it may even be enriched in the process. Therefore, it is in the interest of an organization to collect and store the information for future use.

Data warehousing is a new technology that evolved in the last decade. It intends to provide all users in an organization with timely access to whatever level of information they need [1]. A data warehouse provides an architectural model for the flow of data from operational systems already in place to decision-support environments [4,5]. It is periodically populated with data from operational systems such as equipment management, accounting, material inventory and customer management systems. Essentially, a data warehouse collects all of the relevant data into one central system, organizes the data efficiently so it is consistent and easy to retrieve, keeps old data for historical analysis, and makes the data convenient to access and use so that users can do it themselves [6]. William Inmon, who coined the term data warehouse in 1990, defined a data warehouse as a subject-oriented, integrated, nonvolatile, and time-variant collection of data in support of management decisions [7].

T. Rujirayanyong, J.J. Shi / Automation in Construction 15 (2006) 800-807

Data warehousing has become popular among organizations that seek competitive advantage [8,9]. A data warehouse essentially holds the business intelligence (BI) for the enterprise to enable strategic decision making. It contains critical measurements of the business processes stored along business dimensions. The Data Warehousing Institute (TDWI) defines BI as the processes, tools, and technologies that are required to turn data into information and to turn information into knowledge and effective business plans. Objective information obtained from the warehouse allows the business to examine its strategy and to build up its competitive advantage in the market. Historical data shows how successful the business was in the past; the current information tells where the business stands; and the sum of past and present information helps to position the business for the future. In general, information provides the basis for decision makers to gain better knowledge of the business and to develop superior business strategies accordingly. To succeed in today's business environment, we must understand how we performed in the past, where we are, and how to correctly position ourselves for the future.

Data warehousing is gaining popularity as organizations realize the benefits of having a central database for supporting efficient management functions. It has become an instant phenomenon in many large organizations [10]. More than half of the companies in the United States have committed to implementing the technology [11]. Despite its popularity in manufacturing and other business sectors, studies and implementations are still very limited in the construction industry. In 1997, Decker et al. reported a cost engineering data warehouse for supporting cost estimating for Amoco Corporation [4]. In 2002, Chau et al. developed a prototype of a material inventory data warehouse for inventory management in Hong Kong [12].
To take full advantage of the technology, more research is needed on how to collect and store company-wide construction data for future use and how to develop a data warehouse for assisting construction business decision making.

Current industry practice shows that many local databases are maintained by different offices in a construction company to support its management functions. For instance, the estimating office keeps a historical cost database; the operations office maintains a project database; and the equipment department runs a database system for all equipment the company owns. Each of these databases may be associated with a specific operational application. For example, project data is stored in Primavera Project Planner (P3) and accounting data is maintained in the selected accounting system. These operational systems and data sources are segregated along the physical boundaries between management offices. The segregation causes many problems for construction business, including:

- It prevents data sharing between management functions;
- Multiple entries for the same data are generated at various locations;
- It causes misunderstandings if the same data at different locations are not updated simultaneously;
- It slows down the decision-making process if data obtained from different sources conflict with each other; and
- It does not support advanced analysis and complex queries that are essential for supporting business decisions.

Construction business is project-oriented. All data can be associated with related projects and all decisions are made for projects. Collecting and maintaining data in relation to projects is therefore practical and logical. This research develops a company-level data warehouse named Project-oriented Data Warehouse (PDW) for large and medium contractors.

2. The development of a project-oriented data warehouse (PDW)

The PDW organizes construction data in the context of its associated projects. Its architecture is shown in Fig. 1 with four major components: data sources, data staging area, data storage servers, and data access. The data source component includes source databases that supply data to the warehouse. The data staging area is an intermediate database which is used for transferring data from its sources to the warehouse. The data storage servers store the data in the warehouse. The data access component provides an interface for end users to retrieve data, to process, organize or analyze data, and to export data to external environments as appropriate. Data marts and data mining tools may be added to the system for advanced data retrieval and analysis.

[Figure: four components shown side by side. Data Source (Project performance, Estimate, Material, Contract & Bidding, External DB); Data Staging Area; Data Storage Server (PDW, Data Marts); Data Access (Query/Reporting, Analysis, Data Mining). ETL steps link the components.]
Fig. 1. The project-oriented data warehouse architecture.

2.1. Construction project data sources

While designing a data warehouse, the first challenge is to determine what data will be uploaded to the warehouse. Two distinct approaches may be used to determine the corresponding strategy: need-based and availability-based. The need-based approach examines what data will be needed in the future based on the nature of the business, so that the needed data can be collected and uploaded to the warehouse. The availability-based approach examines what data is currently available in the operational systems, and the available data is selected for the warehouse. Some data loaded into the warehouse may have no immediate use but may prove useful in the future; it is much cheaper to store data than to collect it later.

Construction project data is generated along the project life cycle, from bidding to construction. This research classifies project data into four categories: performance, materials, estimates, and bidding/contracts. Project performance data may exist in a large variety of formats or even in different systems, such as Primavera Project Planner (P3) or Microsoft Project. However, many construction applications have been re-engineered in recent years around a client-server architecture with central databases. For example, Primavera has released its on-line project planning, scheduling and management tools for Enterprise (P3/e) with a central database which can be supported by either Oracle or MS SQL Server [13,14]. Estimating data is usually created in spreadsheets (e.g., MS Excel), while material management systems and contract/bidding databases generally operate in relational database systems.
The PDW will include these major types of project data.

2.2. Dimensional modeling

The data models in the PDW must be compatible with how project data is currently maintained and how it may be used in the future. Moreover, the data in the warehouse must be structured to meet the management needs at both the project and company levels.

Two major types of data models are widely used for constructing data warehouses. An Entity-Relationship (ER) model removes all redundancies in the data. It provides advantages in transactional processing by making transactions simple and deterministic. Dimensional modeling, on the other hand, enables speedy access and queries. Although it may use more space to store data, it provides one of the most practical techniques for delivering data to end users in a data warehouse [15].

A dimensional data model is composed of a central fact table and a set of surrounding dimension tables, each of which corresponds to one of the components or dimensions of the fact table. The fact table is the primary table that contains quantitative or factual data of the business, although it may also contain textual attributes in order to limit the number of dimension tables. A dimension table contains descriptive attributes for constraining and grouping data. Its size is usually smaller than that of a fact table. Each dimension table is defined with a primary key field. The fact table uses foreign key fields to reference its dimension tables. A dimensional data model is scalable, allowing new fact and dimension tables to be added as needed.

An example data model with a star schema is shown in Fig. 2 with one fact table and four dimension tables. The fact table details the cost, duration, and quantity for each activity. The four dimension tables describe: the time dimension, when an activity is constructed; the activity dimension, which defines its characteristics; the WBS dimension, which defines the relationships between activities or processes through the work breakdown structure (WBS); and the project dimension, which describes the general project information.

[Figure: one fact table surrounded by four dimension tables]
  Cost Fact:  Date, Project_key, WBS_key, Activity_key, Cost, Duration, Quantity
  Project:    Project_key, Project_name, Description, Category
  Calendar:   Date, Month, Year
  Activity:   Activity_key, Activity_name, Activity_code, Acitivity_type
  WBS:        WBS_key, WBS_name, WBS_code
Fig. 2. A star schema data model.

A star schema can be refined to a snowflake schema, which can support hierarchical dimension tables. A star schema offers better performance than a snowflake schema and is relatively easier to manage. A snowflake schema increases the number of joins and can slow down queries, but it provides a necessary logical separation of data for complex business. Considering the complexity of construction data, a snowflake schema is used in this research, with an example shown in Fig. 3. In this example, the project dimension is further broken down with a lower-level dimension, the category dimension, which allows a more specific description of project types.

[Figure: the schema of Fig. 2 with the project dimension linked to a lower-level category dimension]
  Cost Fact:  Date, Project_key, WBS_key, Activity_key, Cost, Duration, Quantity
  Project:    Project_key, Project_name, Description, Category_key
  Category:   Category_key, Category
  Calendar:   Date, Month, Year
  Activity:   Activity_key, Activity_name, Activity_code, Acitivity_type
  WBS:        WBS_key, WBS_name, WBS_code
Fig. 3. A snowflake schema data model.

2.3. The PDW database structure

A construction project is commonly broken down into controllable work packages/activities by following a work breakdown structure (WBS), which may also be associated with an organization breakdown structure (OBS) to detail management responsibilities. Activities are assigned account numbers for monitoring and control purposes. The essential data of each activity includes its as-planned and as-built schedules, costs, resources, and change orders incurred during construction.

After a careful examination of the available project data and the essential data for potential future uses, the PDW is designed with 26 tables: 16 dimension tables and 10 fact tables. The 16 dimension tables, summarized in Table 1, provide descriptive information about a construction project including: its owner, geographical location, time, cost accounts, materials, suppliers, relationships between activities, organizational responsibility, subcontractors, and activities. The 10 fact tables, summarized in Table 2, detail project information in 10 categories: bid, change order, contract, estimate, expense, material, relationship, resource, schedule, and subcontract.

Table 1 The 16 dimension tables
Dimension table       Table description
Activity              General activity data
Activity predecessor  Predecessors of activities
Awarded project       Awarded project name from project database
Calendar              Date when an event occurs
Change extra          Change and extra work data
Cost account          Cost accounts of a project
Expense               Types of expense information
Geography             Geographic information of parties and projects
Material              General materials information
OBS                   Assigned management responsibilities to the WBS
Owner                 Owner information of a project
Project               General data of participated projects
Resource              Labor and equipment information
Subcontractor         Subcontractor information
Supplier              Supplier information
WBS                   WBS of a project

Table 2 The 10 fact tables
Fact table     Table description
Bid            Bid data for the projects that the company has submitted bids
Change order   Change and extra work
Contract       Contract data for all projects
Estimate       Estimate data of all submitted bids
Expense        Expense data (material, subcontractor and other) assigned to activities
Material       Material data delivered to the completed projects
Relationship   Logical relationship between activities
Resource       Detailed labor and equipment data assigned to activities
Schedule       As-built and as-planned activity schedule data for the completed projects
Subcontract    Subcontract data of a project

The relationships between the 26 tables are shown in Fig. 4, in which the boxes represent the fact tables and the ovals represent the dimension tables. Some dimension tables, such as the project, activity, and geography dimensions, are shared by multiple fact tables. The details of the 26 tables are not provided in this article due to a limit on space. Interested readers may write to the authors for more information.

[Figure: boxes for the 10 fact tables (Schedule, Estimate, Relationship, Contract, Resource, Bid, Expense, Subcontract, Change Extra, and Material Facts) connected to ovals for the 16 dimension tables (Activity, Activity Predecessor, Calendar, Owner, Awarded-Project, Resource, Project, Geography, Expense, Cost Account, Subcontractor, Supplier, Change Extra, WBS, Material, OBS); the individual links are not recoverable from the extracted text.]
Fig. 4. The 26 tables and their relationships.

To explain the relationship between a fact table and its dimension tables, an example schema for materials is shown in Fig. 5 with six dimension tables: geography, project, supplier, material, cost account, and calendar. The relationships between the tables are established by the primary keys in the dimension tables and the foreign keys in the fact table. To further illustrate the difference between a dimension table and a fact table, the material dimension table and the material fact table in the schema are detailed in Tables 3 and 4.

The material dimension table in Table 3 has four fields: Material_wk, Material_ID, Material_name, and Unit. It contains the textual attributes for constraining and grouping materials in queries. The fact table in Table 4 contains 10 fields which provide detailed material and usage information. The first six fields in Table 4 are the foreign keys referencing the six relevant dimension tables. The other four fields in the table contain fact information including: unit price, quantity, cost and cost discount for each material. The last two rows in the fact table show that PO 1007 includes two purchase items. From their Material_wk values 11 and 7 (column 6), we know that the two items are Door and Paint by reference to the material dimension table (Table 3). Similarly, by referencing the other five dimension tables, we can determine the project ID, the cost account, the time the PO was placed, the time that goods were received, and the suppliers.

2.4. The development of the PDW

A data warehouse is a central database residing on servers, and users access and retrieve data from their client computers. The PDW must therefore be constructed in a client-server environment. After the warehouse architecture and data models have been designed, a database management system must be selected for constructing the warehouse. MS SQL Server 2000 was chosen in this research because it supports client-server data warehouse applications and comes with the Data Transformation Services (DTS) package. Implementing the PDW involves two major tasks: (1) creating the warehouse structure with the 26 tables; (2) designing the strategy and tools for populating the warehouse.
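The first of these tasks, creating tables linked by primary and foreign keys, can be sketched with Python's built-in sqlite3 module. The sketch below builds only the small snowflake schema of Fig. 3, not the full 26-table PDW. SQLite stands in for MS SQL Server 2000, and the lowercase names, the column types, the helper functions, and the sample rows (including the category names) are assumptions made for illustration.

```python
import sqlite3

# Hypothetical DDL for the Fig. 3 snowflake schema; column types are assumed.
DDL = """
CREATE TABLE category (category_key INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE project (
    project_key  INTEGER PRIMARY KEY,
    project_name TEXT,
    description  TEXT,
    category_key INTEGER REFERENCES category(category_key)
);
CREATE TABLE calendar (date TEXT PRIMARY KEY, month INTEGER, year INTEGER);
CREATE TABLE activity (
    activity_key INTEGER PRIMARY KEY,
    activity_name TEXT, activity_code TEXT, activity_type TEXT
);
CREATE TABLE wbs (wbs_key INTEGER PRIMARY KEY, wbs_name TEXT, wbs_code TEXT);
CREATE TABLE cost_fact (
    date         TEXT    REFERENCES calendar(date),
    project_key  INTEGER REFERENCES project(project_key),
    wbs_key      INTEGER REFERENCES wbs(wbs_key),
    activity_key INTEGER REFERENCES activity(activity_key),
    cost REAL, duration REAL, quantity REAL
);
"""


def create_cost_schema():
    """Create the dimension tables first, then the fact table that references them."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
    conn.executescript(DDL)
    return conn


def load_sample_rows(conn):
    """Insert a few made-up rows: dimensions before facts."""
    conn.executemany("INSERT INTO category VALUES (?, ?)",
                     [(1, "Bridge"), (2, "Building")])
    conn.executemany("INSERT INTO project VALUES (?, ?, ?, ?)",
                     [(1, "Project A", "", 1), (2, "Project B", "", 2)])
    conn.execute("INSERT INTO calendar VALUES ('2002-01-05', 1, 2002)")
    conn.execute("INSERT INTO activity VALUES (1, 'Excavation', 'A100', 'task')")
    conn.execute("INSERT INTO wbs VALUES (1, 'Substructure', '1.1')")
    conn.executemany("INSERT INTO cost_fact VALUES (?, ?, ?, ?, ?, ?, ?)", [
        ("2002-01-05", 1, 1, 1, 1000.0, 5.0, 10.0),
        ("2002-01-05", 1, 1, 1, 500.0, 2.0, 4.0),
        ("2002-01-05", 2, 1, 1, 250.0, 1.0, 2.0),
    ])
    conn.commit()


def cost_by_category(conn):
    """Total cost per project category; the extra join is the snowflake step."""
    return conn.execute("""
        SELECT c.category, SUM(f.cost)
        FROM cost_fact f
        JOIN project  p ON p.project_key  = f.project_key
        JOIN category c ON c.category_key = p.category_key
        GROUP BY c.category
        ORDER BY c.category
    """).fetchall()


conn = create_cost_schema()
load_sample_rows(conn)
print(cost_by_category(conn))
```

The category table is what turns the star schema of Fig. 2 into the snowflake schema of Fig. 3: reaching the category name costs one more join than reading a Category column stored directly in the project dimension.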

Fig. 5. The material fact table and its supporting dimension tables.

Table 3 The material dimension table
Material_wk  Material_ID  Material_name              Unit
1            RB10         Steel rebar                ton
2            C101         Structural concrete        cu yd
3            PC103        Barrier pre-cast concrete  each
4            HR1          Hand rails                 lf
5            C104         Topping concrete           cu yd
6            WM3          Wire mesh                  sf
7            P005         Paint                      gal
8            BK213        Brick                      sf
9            CAP11        Carpet                     sf
10           WA5          Interior wall              sf
11           DR200        Door                       each
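How a fact row is resolved through its dimension keys can be sketched with Python's built-in sqlite3 module. The example loads the material dimension of Table 3 and, from the material fact table (Table 4), only the Material_wk, PO_Number, Unit_price, and Quantity columns. SQLite, the lowercase table and column names, and the helper functions are assumptions for illustration; the computed line amount (unit price times quantity) is an illustrative calculation, not a column taken from the paper's table.

```python
import sqlite3

# Table 3: (material_wk, material_id, material_name, unit)
MATERIALS = [
    (1, "RB10", "Steel rebar", "ton"),
    (2, "C101", "Structural concrete", "cu yd"),
    (3, "PC103", "Barrier pre-cast concrete", "each"),
    (4, "HR1", "Hand rails", "lf"),
    (5, "C104", "Topping concrete", "cu yd"),
    (6, "WM3", "Wire mesh", "sf"),
    (7, "P005", "Paint", "gal"),
    (8, "BK213", "Brick", "sf"),
    (9, "CAP11", "Carpet", "sf"),
    (10, "WA5", "Interior wall", "sf"),
    (11, "DR200", "Door", "each"),
]

# Subset of Table 4: (material_wk, po_number, unit_price, quantity)
FACTS = [
    (1, 1001, 50.00, 110.00),
    (2, 1002, 65.00, 346.15),
    (3, 1003, 810.00, 20.00),
    (4, 1004, 10.00, 120.00),
    (5, 1004, 55.00, 196.36),
    (7, 1004, 25.00, 8.00),
    (1, 1005, 50.00, 30.00),
    (2, 1005, 65.00, 21.54),
    (10, 1006, 3.00, 800.00),
    (11, 1007, 150.00, 14.67),
    (7, 1007, 25.00, 52.00),
]


def build_material_tables():
    """Create and fill the material dimension and a reduced material fact table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE material_dim (
            material_wk INTEGER PRIMARY KEY,
            material_id TEXT, material_name TEXT, unit TEXT
        );
        CREATE TABLE material_fact (
            material_wk INTEGER REFERENCES material_dim(material_wk),
            po_number INTEGER, unit_price REAL, quantity REAL
        );
    """)
    conn.executemany("INSERT INTO material_dim VALUES (?, ?, ?, ?)", MATERIALS)
    conn.executemany("INSERT INTO material_fact VALUES (?, ?, ?, ?)", FACTS)
    conn.commit()
    return conn


def items_on_po(conn, po_number):
    """Resolve each fact row of one purchase order to its material name and unit."""
    return conn.execute("""
        SELECT d.material_name, d.unit, f.quantity, f.unit_price,
               ROUND(f.unit_price * f.quantity, 2) AS amount
        FROM material_fact f
        JOIN material_dim d ON d.material_wk = f.material_wk
        WHERE f.po_number = ?
        ORDER BY f.rowid
    """, (po_number,)).fetchall()


for row in items_on_po(build_material_tables(), 1007):
    print(row)
```

The fact rows carry only the surrogate key Material_wk; the name and unit come from the dimension table at query time, which is why PO 1007 can be read as a door and a paint purchase.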


Table 4 The material fact table (columns 1-4; the remaining columns follow in the continuation, row for row)
Project_wk  Cost_Account_wk  Order_Date_wk  Received_date_wk
1           1                20020105       20020112
1           1                20020112       20020120
1           3                20020203       20020210
1           2                20020215       20020119
1           2                20020215       20020222
1           4                20020215       20020225
2           1                20020412       20020417
2           1                20020412       20020421
2           7                20020515       20020522
2           5                20020520       20020527
2           4                20020520       20020529

Creating a warehouse structure is similar to creating a database in any database environment. Implementing the PDW starts with creating the 26 tables in the MS SQL Server 2000 environment. In the process, each table is defined with its fields and the attributes of each field. Additionally, each dimension table is defined with a primary key field that maps to the foreign key(s) in other tables. After the 26 tables are created, the relationships between the tables are created according to the primary keys and foreign keys.

A data warehouse must be loaded with data from the operational database systems. Populating a warehouse starts with extracting data from its sources. The extracted data must then be processed and checked for correctness before it is uploaded to the warehouse. In general, populating a warehouse involves executing an Extraction, Transformation, and Loading (ETL) process as follows [16]:

- Extract: extraction is the action of pulling data from a source system or systems;
- Transform: transformation reconciles data type and format differences from different sources, resolves uniqueness issues, and ensures conformity. A major task is to format the extracted data to meet the requirements of the warehouse; and
- Load: loading moves the data to the dimension or fact tables in the warehouse.

In order to avoid interference with the source systems, a temporary working area is needed to host the extracted data. This area is commonly called the data staging area; some writers also refer to it as the construction site of a data warehouse [15]. The warehouse shows what data is needed and how it is to be stored, and the sources of the different data may be located on various servers across the company's computer network. After all data has been extracted from the multiple sources and held in one place, data transformation can be performed efficiently. The data staging area requires reconciling the data structures of the source systems with the data structures in the warehouse. It is usually created with flat files and/or databases to meet these needs.

In order to automate the population of a data warehouse, an ETL procedure must be developed. Developing an ETL tool may consume about half of the time of a warehouse project. An ETL tool must map the source and the destination for each piece of data. It must be given the correct paths of the data sources and the corresponding destinations so that it can pull the data from the given sources and send it to the right destinations in the warehouse. Moreover, the ETL tool must clearly define what data is to be pulled from each source and what transformation is to be performed on the data. The DTS package of MS SQL Server 2000 provides a collection of objects and tools that allow users to import, export, and transform heterogeneous data between one or multiple types of data formats, such as MS SQL Server, MS Excel, MS Access, text files, etc. It provides an efficient means for developing ETL tools.

An audit database is usually created to keep a record of the operations of each ETL process, such as the time when the populating process is executed and the specific data movements and transformations that are performed. A data warehouse must be updated periodically.
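The extract, transform, and load steps, together with the staging area and the audit record, can be sketched in Python. This is a minimal illustration, not the DTS package used in the paper: the source rows, the table names (`staging_material`, `material_dim`, `etl_audit`), and the functions `make_warehouse` and `run_etl` are hypothetical, SQLite stands in for the warehouse server, and the audit table is used here to skip rows already moved by an earlier run.

```python
import sqlite3
from datetime import datetime, timezone


def make_warehouse():
    """Create a staging table, one dimension table, and an audit table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE staging_material (material_id TEXT, material_name TEXT, unit TEXT);
        CREATE TABLE material_dim (
            material_wk INTEGER PRIMARY KEY AUTOINCREMENT,
            material_id TEXT UNIQUE, material_name TEXT, unit TEXT
        );
        CREATE TABLE etl_audit (run_at TEXT, material_id TEXT, action TEXT);
    """)
    return conn


def run_etl(conn, source_rows):
    """Run one populate cycle and return the number of rows loaded."""
    # Extract: copy the raw source rows into the staging table.
    conn.execute("DELETE FROM staging_material")
    conn.executemany("INSERT INTO staging_material VALUES (?, ?, ?)", source_rows)

    # Transform: reconcile formats and resolve duplicates within the batch.
    staged = conn.execute(
        "SELECT material_id, material_name, unit FROM staging_material"
    ).fetchall()
    cleaned = {}
    for material_id, name, unit in staged:
        key = material_id.strip().upper()
        cleaned[key] = (key, name.strip(), unit.strip().lower())

    # Load: move only rows that the audit table has not recorded yet.
    already_loaded = {row[0] for row in conn.execute("SELECT material_id FROM etl_audit")}
    run_at = datetime.now(timezone.utc).isoformat()
    loaded = 0
    for key, row in cleaned.items():
        if key in already_loaded:
            continue
        conn.execute(
            "INSERT INTO material_dim (material_id, material_name, unit) VALUES (?, ?, ?)",
            row,
        )
        conn.execute("INSERT INTO etl_audit VALUES (?, ?, 'loaded')", (run_at, key))
        loaded += 1
    conn.commit()
    return loaded


conn = make_warehouse()
print(run_etl(conn, [(" rb10", "Steel rebar ", "TON"), ("C101", "Structural concrete", "cu yd")]))
print(run_etl(conn, [("RB10", "Steel rebar", "ton"), ("P005", "Paint", "gal")]))
```

Keeping the extracted rows in a staging table means the transformation runs against a local copy rather than the source system, and the audit rows record when each item was moved.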
Each time an ETL process is executed, the audit database is updated so that it can help detect what data has already been moved to the warehouse before a new update takes place. It can therefore support an incremental loading approach, as used in the PDW, to ensure that only new data is uploaded to the warehouse in every populating process.

Table 4 (continued) The material fact table (columns 5-10)
Supplier_wk  Material_wk  PO_Number  Unit_price  Quantity  Discount
1            1            1001       $50.00      110.00    $2.20
1            2            1002       $65.00      346.15    $6.92
3            3            1003       $810.00     20.00     $0.00
2            4            1004       $10.00      120.00    $2.40
2            5            1004       $55.00      196.36    $3.93
2            7            1004       $25.00      8.00      $0.00
1            1            1005       $50.00      30.00     $0.60
1            2            1005       $65.00      21.54     $0.00
4            10           1006       $3.00       800.00    $16.00
4            11           1007       $150.00     14.67     $0.00
4            7            1007       $25.00      52.00     $1.04

2.5. Querying a warehouse

Retrieving information from PDW is achieved by queries. Common queries are divided into three types: drill up for summary information, drill down for detailed information on a particular project, and drill across for retrieving information across multiple fact tables or projects. MS SQL Server 2000 provides a user interface for creating queries. With the interface, creating a query report involves identifying one or more fact tables and the relevant dimension tables, and then selecting fields from these tables to form a new table as the query report. By executing the query, the database management system retrieves the relevant data from the warehouse and generates a report consisting of the retrieved records. A created query can be saved as a sample report template. Without recreating the query, a user can directly execute a query template to generate an updated query report. Common queries may be created and saved as sample reports. Query reports can be exported to external systems as needed, such as MS Excel, MS Access, MS Query and Query editor, or any in-house applications.

An example is used here to illustrate the process of creating a query. The example searches for the suppliers who delivered pre-cast concrete to the Canal bridge project, together with the relevant cost data. To create the query, the material fact table is first selected. The following dimension tables are also selected: Project, Cost Account, Material, and Supplier. Checking the columns in these tables enables them to appear in the query report. The selected columns include: Supplier Name, Material Name, Unit, Quantity, and Unit cost. Additionally, the Canal bridge project and Pre-cast concrete are set as the query constraints in the Project and Cost Account tables respectively. The total cost is calculated by multiplying Quantity and Unit cost. Fig. 6 shows the query schema and the generated report. The report, shown in the table under the query schema in Fig. 6, contains four records which show that two companies supplied the pre-cast concrete products to the project.

Fig. 6. The sample query schema and report.

3. Conclusions

Historical project data can assist construction managers in answering questions about the business, the performance of operations of interest, business trends, and what can be done to improve the business. Although the data is there, it is always a challenge to find the needed data in time when a decision is to be made. Decisions can be slowed down by inconsistency and/or inaccuracy of data. Data warehousing provides the technology for storing historical construction project data which can be extracted from existing operational databases/systems. However, unlike many other application systems, an organization cannot simply buy a data warehouse off the shelf. Instead, a proper design is needed to ensure that the data structure of the warehouse addresses the nature of the organization and meets its business needs, so that the data captured in the warehouse reflects what is available and what will be needed by the company.

PDW underlines the project-oriented nature of the construction business. It is intended to provide a robust tool for collecting, storing, and utilizing historical construction project data. The PDW architecture and data models presented in this paper may be adopted by large and medium contractors. PDW can serve as a central data facility for users to retrieve the right data for making business decisions. Moreover, the quality data available in the PDW will provide useful information for conducting in-depth business analyses or data mining studies.

The PDW warehouse architecture is scalable, with new components to be added as needed, such as new data sources, data access modules, data marts, new fact and dimension tables, etc. To meet particular business analysis needs, data marts for areas such as productivity, contract performance, and resource allocation may be developed in future research. Those functions can be added to the data access component in the architecture, with the data to be migrated from the warehouse.

References
[1] K. Orr, Data Warehouse Technology: Revised Edition 2000, White Paper, The Ken Orr Institute, Kansas, 1996.
[2] I. Ahmed, Data warehousing in construction organizations, Construction Congress VI 2000 Proceeding, ASCE, 2000, pp. 194-203.
[3] N. Goyal, Data Warehousing Lecture Notes, Birla Institute of Technology & Science, Pilani, India, 2003.
[4] K. Decker, A. Oaks, M. Salinas, Building a Cost Engineering Data Warehouse, AACE International Transactions, IM.06, AACE International, Morgantown, 1997.
[5] I. Manning, Data warehousing - what is it? http://www.csesolutions.com/data_warehousing.htm, 2002.
[6] M. Corey, M. Abbey, I. Abramsom, Oracle 8 Data Warehousing - A practical Guide to Successful Data Warehouse Analysis, ORACLE Press, 1998.
[7] W.H. Inmon, Building the Data Warehouse, John Wiley and Sons, Inc., New York, 1993.
[8] R. Adhikari, Migrating legacy data, Software Magazine 16 (1) (1996 (January)) 75-80.
[9] J. Kador, One on one, Midrange Systems 8 (20) (1995 (October)).
[10] L. Miller, S. Nilakanta, Data warehouse modeler: a CASE tool for warehouse design, Proc. of the 31st Hawaii Intl. Conf. on Sys. Sciences, IEEE Computer Society Press, 1998.
[11] P. Ponniah, Data Warehousing Fundamentals, John Wiley & Sons, Inc., New York, 2001.
[12] K.W. Chau, Y. Cao, M. Anson, J. Zhang, Application of data warehouse and decision support system in construction management, Automation in Construction (12) (2002) 213-224.
[13] Primavera Systems, Inc., Primavera Enterprise: Administrator's Guide, Pennsylvania, 2002.
[14] Primavera Systems, Inc., Primavera Enterprise: User's Guide, Pennsylvania, 2002.
[15] R.L. Kimball, L. Reeves, M. Ross, W. Thornthwaite, The Data Warehouse Lifecycle Toolkit: Expert Methods for Designing, Developing, and Deploying Data Warehouses, John Wiley & Sons, Inc., New York, 1998.
[16] M. Chaffin, B. Knight, T. Robinson, Professional SQL Server 2000 DTS, Wrox Press Inc., Illinois, 2000.
