QS
BIC160015
2019
TABLE OF CONTENTS
CHAPTER ONE
INTRODUCTION
1.1 Introduction
1.2 Research Background
1.3 Problem statement
1.4 Research Questions
1.5 Aim and Objectives
1.6 Significance of the Research
1.7 Scope of Research
1.8 Research Methodology
1.9 Dissertation Structure
CHAPTER TWO
LITERATURE REVIEW
2.1 Introduction
2.2 Construction Cost Estimation
2.2.1 Definition
2.2.2 Factors influencing cost estimation
2.2.3 Type of Construction Cost Estimates Approaches
2.3 Data Mining
2.3.1 Definition
2.3.2 The Scope of Data Mining
2.3.3 Method of Data Mining
2.3.4 Data mining model
2.3.5 Data Mining process
2.4 Data Mining in improving the accuracy of construction cost estimation
2.5 Framework for data mining
CHAPTER THREE
References
CHAPTER ONE
INTRODUCTION
1.1 Introduction
This chapter presents an overview of the study. It begins with the research background, followed by the problem statement and the research questions. The research aim and objectives are then outlined, followed by the significance of the study, the research scope and the research methodology. Finally, the structure of the thesis is presented, and the chapter ends with a conclusion.
By definition, a cost estimate is a compiled set of analyzed items that together give the total cost of a project (Adnan Enshassi, 2007). Project estimates generally fall into two types: approximate estimates and detailed estimates. The choice of estimate type depends on the ease of estimating and the amount of information available.
Data mining is well suited to cost estimation, given the large amount of uncertain data generated in construction. If this uncertain construction data can be fully utilized, the construction industry stands to benefit. In the future, data mining should be able to address cost estimation issues at every stage of construction.
Data mining has been widely used in construction over the past few decades, and advances in technology have helped the industry improve its quality and services. For example, data mining is used to predict construction risk on site: it helps predict likely outcomes from the data collected. However, improvements are still needed to fully utilize the potential of data mining.
Mining data for cost estimation first requires collecting data, and in general the larger the dataset, the more accurate the result is likely to be. According to Akintoye (2000), the data that need to be mined include the complexity of the construction scope, market conditions, the construction method used, site constraints, the client's financial standing, project buildability and project location. However, these data are often not properly utilized or stored (Soibelman & Kim, 2002). Ahmed, Tezel, Aziz, and Sibley (2017) classified this problem as an operational challenge, attributing it to an inability to work with new technology and to the high cost of acquiring technology and recruiting technologists.
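As a minimal sketch of what collecting and storing such data could look like, the Python example below keeps each historical project as one record holding the attributes listed by Akintoye (2000), plus its final cost. The field names, the assumed 1 to 5 rating scales and the example values are hypothetical, not a schema from the cited studies. It also illustrates the storage problem: a record with a missing attribute cannot be used for mining, so incomplete records are counted rather than silently lost.

```python
from dataclasses import dataclass, fields
from typing import Optional

@dataclass
class ProjectRecord:
    # Hypothetical fields mirroring the factors listed by Akintoye (2000)
    scope_complexity: Optional[int]           # assumed scale: 1 (simple) to 5 (very complex)
    market_condition: Optional[int]           # assumed scale: 1 (weak) to 5 (overheated)
    construction_method: Optional[str]
    site_constraint: Optional[int]            # assumed scale: 1 (none) to 5 (severe)
    client_financial_standing: Optional[int]
    buildability: Optional[int]
    location: Optional[str]
    final_cost: Optional[float]               # the value a cost model would learn to predict

def is_complete(record: ProjectRecord) -> bool:
    """A record is usable for mining only if every attribute was captured."""
    return all(getattr(record, f.name) is not None for f in fields(record))

def usable_records(records):
    """Return the complete records and the number of incomplete ones."""
    complete = [r for r in records if is_complete(r)]
    return complete, len(records) - len(complete)

records = [
    ProjectRecord(3, 2, "cast in situ", 1, 4, 3, "Selangor", 12.5e6),
    ProjectRecord(4, None, "precast", 2, 3, 4, "Johor", 20.1e6),  # market data never stored
]
complete, dropped = usable_records(records)
print(len(complete), dropped)  # 1 1
```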
1.5 Aim and Objectives
The aim of this research is to examine the use of data mining in improving the accuracy of construction cost estimates.
1.6 Significance of the Research
Academic contribution
This study will expand the existing body of knowledge, particularly the quantity surveying literature in Malaysia on data mining, which is still at an infancy stage.
Practical contribution
1.7 Scope of Research
The scope of this research is guided by its aim, which is to examine the use of data mining in improving the accuracy of construction cost estimates. The respondents for this research will be data mining experts with information technology knowledge. The study is limited to the field of construction cost estimation.
1.8 Research Methodology
This research is conducted through a literature review and expert interviews. The literature review provides early findings on the challenges of obtaining accurate cost estimates and on how data mining improves their accuracy. These findings serve as the motivation and rationale for the expert interviews.
The interviews were audio recorded, transcribed and analyzed using content analysis. The analysis is developed by sorting the interview results into categories derived from the literature review. The coding process involved sorting, organizing and assigning the raw data quoted by the respondents to codes that fit those categories, and the categories were then coded accordingly. Lastly, the data obtained were explained and mapped against the factors previously categorized in the literature review.
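The coding step above is carried out by the researcher, but its logic (assign each quote to the literature-derived categories it matches, and set aside quotes that fit none) can be illustrated with a small keyword-matching sketch. The categories, keywords and quotes below are hypothetical examples, not the study's actual coding frame; keyword matching is only a stand-in for the researcher's judgement.

```python
from collections import defaultdict

# Hypothetical literature-derived categories and indicator keywords (assumptions)
CATEGORIES = {
    "data availability": ["record", "database", "historical", "stored"],
    "technology cost": ["expensive", "license", "software cost"],
    "estimate accuracy": ["accurate", "accuracy", "overrun"],
}

def code_quotes(quotes):
    """Assign each (respondent, quote) pair to every category whose keywords appear in it."""
    coded = defaultdict(list)
    for respondent, quote in quotes:
        text = quote.lower()
        hits = [c for c, words in CATEGORIES.items() if any(w in text for w in words)]
        # Quotes matching no category are kept aside for manual review
        for category in hits or ["uncategorized"]:
            coded[category].append(respondent)
    return dict(coded)

quotes = [
    ("R1", "Most firms never stored their historical cost records properly."),
    ("R2", "Hiring a data scientist is expensive for a QS firm."),
    ("R3", "Mining past tenders could make early estimates more accurate."),
    ("R4", "Clients rarely ask about this."),
]
print(code_quotes(quotes))
```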
1.9 Dissertation Structure
Chapter 1: Introduction
This chapter summarizes the thesis, covering the research background, problem statement, research questions, aim and objectives, significance of the research, and scope of the study.
construction cost estimates. Then, a framework is presented for conducting data mining to improve the accuracy of construction cost estimates.
CHAPTER TWO
LITERATURE REVIEW
2.1 Introduction
This literature review is divided into several key sections. It begins with the definition of construction cost estimation, followed by the factors affecting the accuracy of construction cost estimates and the types of construction cost estimation available. It then defines data mining and describes the types of data mining available.
2.2 Construction Cost Estimation
2.2.1 Definition
2.2.2 Factors influencing cost estimation
There has been considerable research on the factors influencing construction cost estimates. Elcin Tas (2005) highlighted that the reliability of information depends on how recent it is: the most up-to-date information results in a more accurate estimate. Aibinu (2008) also addressed factors related to project information, emphasizing factors for obtaining accurate estimates such as project value, gross floor area, number of storeys, project location, procurement route, project type, structural material used and price intensity. Furthermore, Stoy, Pollalis, and Schalcher (2008) focused on project information in terms of building compactness, number of elevators, absolute project size, construction duration, proportion of openings and region. Last but not least, Azman, Abdul-Samad, and Ismail (2013) concentrated on project information such as project value and project size, price intensity theory, number of bidders, location (state), type of schools and contract period.
Project attributes are another factor influencing cost estimates. Akintoye (2000) showed that the project attributes requiring attention are project complexity, technological requirements, project information, project team requirements, contract requirements, project duration and market requirements. Similarly, Adnan Enshassi (2005) reported findings close to those of Akintoye (2000): project complexity, project information, technological requirements, contract efficiency, market requirements, project duration and project risk. In contrast, Serpell (2004) focused mainly on scope quality, information quality, uncertainty level, estimator performance and the quality of the estimating procedure.
In other research, Li Liu (2007) developed a framework for cost estimating and identified two main types of factors: control factors and idiosyncratic factors. Control factors can be broken down into more specific input, behavioral and output control categories, whereas idiosyncratic factors are those the estimator cannot control, such as market conditions, weather and site constraints. Cheng (2014) combined various factors influencing construction projects and classified them into four groups: environmental and circumstantial influences, contract scope, project risks, and management and technique. Steven M. Trost (2003) highlighted five main factors that affect the accuracy of cost estimates: basic process design, team experience and cost information, the time allowed for preparing estimates, site requirements, and labor climate. Finally, Chan and Park (2005) suggested three groups of influencing factors: project design; complexity and time; and the level of professional competency of project team members and contractors.
Noor Akmal Adillah Ismail, Erezi Utiome, Robert Owen, and Drogemuller (2015) conducted intensive research on the factors influencing construction cost estimates, comparing multiple previous studies in terms of their differences and similarities. As a result, the factors were recategorized into six groups: 1) project information; 2) project characteristics; 3) project team requirements; 4) client requirements; 5) contract requirements; and 6) external influences. Table 1 lists these primary factors and their sub-components.
Table 1 Summary of cost estimating factors (Noor Akmal Adillah Ismail et al., 2015)
Authors (years): Akintoye (2000); Trost & Oberlender (2003); Serpell (2004); Enshassi et al. (2005); Chan & Park (2005); Elhag et al. (2005); Liu & Zhu (2007); Aibinu & Pasco (2008); Stoy et al. (2008); Koleola & Henry (2008); Cheng (2013); Azman et al. (2013)
Main factors
Project information
Project value X X
Price intensity X X
Project location X X X X X X
Project type X X X
Project duration X X X
Storey/compactness/volume/opening X X X
Project characteristics
Design/construction (drawing/scope/process) X X X X X X X X
Information (flow/availability/quality) X X X X X X X
Project complexity (design/construction) X X X X
Project team requirements
Experience/expertise/professional level X X X X X X X X
Team alignment/capacity/communication X X X
Personal characteristic/performance
Estimation design/process/procedure X X X
Management & technique (time/cost control) X X X
Client requirements
Client’s budget/financial status X X X
Return profit/money issues X
Client characteristic/type X X
Time/quality requirements X
Contract requirements
Scope of contract X
Tender/contract period X X X
Tender selection method X
Procurement route/contractual arrangement X X X X
Type of contract/standard X
Pre-contract (design/construction) X X
External influences
Site requirements X
Bidding/contractor attributes X X X X X
Market conditions (rates/inflation/fluctuation) X X X X X X
Technology requirements X X X X
Uncertainties (contingencies/variations) X X X X
Political situation X X
Environmental (climate/geology/disaster) X
Disputes (contract/regulations/payment) X
2.2.3 Type of Construction Cost Estimates Approaches
The parametric method is a high-level estimating approach that uses various factors, with data extracted from historical databases, engineering practice and technology. The cost data used for this estimate come from previous projects, and the estimate is produced using historical costs and relevant historical percentages. The effectiveness of this method depends on the project definition available and on the similarity between the new project and the historical models. This method is recommended when little or no design information is available. Selecting suitable cost data from similar projects requires good judgement and analysis. The parametric method is commonly used at the early stage of a construction project, when few details are available for the cost estimate.
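A minimal sketch of a parametric estimate, assuming the "historical cost" is a cost per m² of gross floor area from similar past projects, rebased to current prices with a cost index, and the "historical percentage" is a single contingency allowance. The project data, the cost index values, the location factor and the 10% contingency are all hypothetical illustrations, not figures from the cited guidance.

```python
from statistics import mean

# Hypothetical historical projects: (gross floor area in m2, final cost, cost index at completion)
HISTORICAL = [
    (1_000, 2_000_000, 100.0),
    (2_000, 4_400_000, 110.0),
]

def parametric_estimate(gfa_m2, current_index, location_factor=1.0, contingency_pct=10.0):
    """Average rebased cost per m2 x new floor area x location factor,
    plus a percentage allowance (all parameters are illustrative assumptions)."""
    # Rebase each historical cost per m2 to today's prices using the cost index
    rates = [cost * current_index / (gfa * index) for gfa, cost, index in HISTORICAL]
    base = mean(rates) * gfa_m2 * location_factor
    return base * (1 + contingency_pct / 100)

print(round(parametric_estimate(1_500, current_index=110.0)))  # 3630000
```

Both historical projects rebase to 2,200 per m², so a 1,500 m² project gives 3.3 million before the assumed 10% allowance. The quality of the result rests entirely on how similar the historical projects are to the new one, which is why the text stresses judgement in choosing the cost data.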
The elemental parametric method also uses historical costs and relevant historical percentages; however, the costing is at a much more detailed level, often consisting of composite unit prices. The elemental parametric estimating method is commonly used on transportation infrastructure projects. It can be used for projects of any size and is particularly recommended for very large, complex projects. This method is also effective for developing project option analyses for comparison purposes. Elemental parametric estimates are often used as a basis for developing a project budget.
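The difference from the plain parametric method can be sketched as follows: instead of one rate for the whole building, each element is priced as quantity multiplied by a composite unit price, and a historical percentage is added on top. The element list, quantities, rates and the 12% allowance (assumed here to cover preliminaries) are hypothetical.

```python
# Hypothetical elements: (element, quantity, unit, composite unit price)
ELEMENTS = [
    ("Substructure", 1_200, "m2 footprint", 450.0),
    ("Frame", 3_600, "m2 floor area", 380.0),
    ("External walls", 2_100, "m2 wall area", 260.0),
    ("Roof", 1_200, "m2 roof area", 210.0),
]

def elemental_estimate(elements, preliminaries_pct=12.0):
    """Sum quantity x composite unit price per element, then add a
    historical percentage (assumed here to cover preliminaries)."""
    breakdown = {name: qty * rate for name, qty, _unit, rate in elements}
    subtotal = sum(breakdown.values())
    preliminaries = subtotal * preliminaries_pct / 100
    return breakdown, subtotal, subtotal + preliminaries

breakdown, subtotal, total = elemental_estimate(ELEMENTS)
print(breakdown["Frame"], subtotal, total)  # 1368000.0 2706000.0 3030720.0
```

Keeping the per-element breakdown is what makes this method useful for option analysis: swapping one element's rate or quantity shows its effect on the budget directly.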
Detailed cost estimating is the most accurate estimating method: every cost item is quantified and priced. This approach is only suitable once the design definition has advanced and the units of work can be quantified. There are basically two approaches to detailed cost estimating: the historical bid-based approach and the cost-based approach. The cost-based approach does not rely on historical cost data; instead, the estimate determines the contractor’s cost of labor, equipment, materials and any specialty subcontractor effort for each item needed to complete the work. Although the cost-based approach is not commonly used, contractors generally use it to prepare bids.
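The cost-based approach described above can be sketched as a sum, over work items, of labor, equipment, material and subcontract costs. The item names, hours, rates and the optional markup (an assumed overhead-and-profit percentage a contractor might add when bidding) are hypothetical values for illustration.

```python
from dataclasses import dataclass

@dataclass
class WorkItem:
    name: str
    labor_hours: float
    labor_rate: float        # cost per labor-hour
    equipment_hours: float
    equipment_rate: float    # cost per equipment-hour
    material_cost: float
    subcontract_cost: float = 0.0

    def direct_cost(self) -> float:
        """Contractor's own cost for this item: labor + equipment + materials + subcontract."""
        return (self.labor_hours * self.labor_rate
                + self.equipment_hours * self.equipment_rate
                + self.material_cost
                + self.subcontract_cost)

def cost_based_estimate(items, markup_pct=0.0):
    """Total direct cost of all items plus an assumed overhead-and-profit markup."""
    direct = sum(item.direct_cost() for item in items)
    return direct + direct * markup_pct / 100

items = [
    WorkItem("Excavation", 40, 25.0, 16, 120.0, 0.0),
    WorkItem("Concrete slab", 120, 30.0, 8, 90.0, 18_000.0, 2_500.0),
]
print(cost_based_estimate(items), cost_based_estimate(items, markup_pct=10))  # 27740.0 30514.0
```

No historical cost data appear anywhere in the calculation, which is the defining feature of this approach compared with the historical bid-based one.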
2.3 Data Mining
2.3.1 Definition
Data mining is the process of discovering new, useful information from sets of data (Jiawei Han, 2011). It is also capable of detecting hidden relationships, patterns and trends in large amounts of data (Sumathi & Sivanandam, 2006). Data mining is also referred to as knowledge discovery in databases (KDD), data exploration, and so on.
2.3.2 The Scope of Data Mining
The term data mining originates from the similarity between searching for valuable business information in a large database and mining a mountain for a vein of valuable ore. Both procedures involve sifting through a large amount of material to find exactly where the treasure lies. If the database is of sufficient size and quality, data mining technology can create new opportunities through these capabilities (Agyapong, Hayfron-Acquah, & Asante, 2016).
2.3.3 Method of Data Mining
Time series analysis comprises methods and techniques for analyzing time series data in order to extract meaningful statistics and other characteristics of the data. A time series is a collection of temporal data objects; time series data are characterized by large size, high dimensionality and continuous updating. Time series tasks commonly rely on three components: representation, similarity measures and indexing (Esling & Agon, 2012; Fu, 2011).
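The three components named above can be sketched together on hypothetical monthly material price series. In this sketch, the representation is a z-normalized Piecewise Aggregate Approximation (PAA), the similarity measure is Euclidean distance, and "indexing" is reduced to a linear scan over the reduced representations; these are illustrative choices, not techniques prescribed by the cited reviews, and a real index would avoid scanning every series.

```python
import math

def z_normalize(series):
    """Remove offset and scale so series are compared by shape, not price level."""
    mu = sum(series) / len(series)
    sd = math.sqrt(sum((x - mu) ** 2 for x in series) / len(series))
    return [0.0] * len(series) if sd == 0 else [(x - mu) / sd for x in series]

def paa(series, segments):
    """Representation: Piecewise Aggregate Approximation (mean of equal-length segments)."""
    if len(series) % segments:
        raise ValueError("this sketch assumes the length is divisible by `segments`")
    size = len(series) // segments
    return [sum(series[i * size:(i + 1) * size]) / size for i in range(segments)]

def euclidean(a, b):
    """Similarity measure: Euclidean distance between two equal-length representations."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest(query, database, segments=4):
    """Stand-in for indexing: linear scan over the reduced representations."""
    q = paa(z_normalize(query), segments)
    return min(database, key=lambda name: euclidean(q, paa(z_normalize(database[name]), segments)))

prices = {  # hypothetical monthly price series
    "steel":  [100, 102, 104, 106, 108, 110, 112, 114],
    "cement": [114, 112, 110, 108, 106, 104, 102, 100],
    "timber": [100, 105, 100, 105, 100, 105, 100, 105],
}
print(nearest([200, 203, 207, 210, 214, 218, 221, 225], prices))  # steel
```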
Lastly, outlier analysis deals with data objects whose behavior deviates from the general behavior of the rest of the data. Outlier detection refers to the problem of finding patterns in data that differ markedly from the rest of the data according to appropriate metrics. Such patterns often contain useful information about abnormal behavior of the system described by the data. Distance-based algorithms calculate the distances among objects in the data, which gives them a geometric interpretation. Density-based algorithms estimate the density distribution of the input space and then identify outliers as those lying in low-density regions. Rough-set-based algorithms introduce rough sets or fuzzy rough sets to identify outliers (Gogoi, Bhattacharyya, Borah, & Kalita, 2011).
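A minimal sketch of the distance-based family described above: each project's outlier score is its mean distance to its k nearest neighbors, and projects scoring well above the typical score are flagged. The project data (gross floor area and cost per m², both in thousands), k = 2 and the "three times the median score" rule are hypothetical choices; in practice the features would normally be scaled to comparable ranges first.

```python
import math

def knn_outlier_scores(points, k=2):
    """Distance-based score: mean distance from each point to its k nearest neighbors."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores

def flag_outliers(points, k=2, threshold=3.0):
    """Flag points whose score exceeds `threshold` times the median score (assumed rule)."""
    scores = knn_outlier_scores(points, k)
    median = sorted(scores)[len(scores) // 2]
    return [i for i, s in enumerate(scores) if s > threshold * median]

# Hypothetical projects: (GFA in thousand m2, cost per m2 in thousands)
projects = [(5.0, 1.8), (5.5, 1.9), (6.0, 1.85), (4.8, 1.75), (5.2, 4.0)]
print(flag_outliers(projects))  # [4]
```

The flagged project has a cost per m² far above its neighbors; whether it is a data error or a genuinely unusual project is a question for the estimator, which is the "useful information regarding abnormal behavior" the text refers to.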
2.3.4 Data mining model
There are two main types of data mining models: descriptive and predictive (Agyapong et al., 2016). A descriptive model recognizes patterns or relationships in data and discovers the properties of the data studied. Delen and Demirkan (2013) defined predictive analytics as requiring data modeling as a prerequisite for making authoritative predictions about the future through business forecasting and simulation. Predictive models address the questions “what will happen?” and “why will it happen?” In investigating a domain-specific framework for predictive analytics in manufacturing, Lechevalier, Narayanan, and Rachuri (2014) defined predictive analytics as a tool that “uses statistical techniques, machine learning, and data mining to discover facts in order to make predictions about unknown future events.”
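The distinction can be illustrated on the same hypothetical cost data. In the sketch below, the descriptive model only summarizes what is already there (mean cost per m² by project type), while the predictive model fits a straight line of cost against gross floor area by ordinary least squares and uses it for a project not yet built. The data and the choice of a one-variable linear model are assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical historical data: (project type, gross floor area in m2, final cost)
PROJECTS = [
    ("school", 2_000, 4_500_000),
    ("school", 3_000, 6_500_000),
    ("office", 4_000, 8_500_000),
    ("office", 6_000, 12_500_000),
]

def describe(projects):
    """Descriptive model: summarize the existing data (mean cost per m2 by type)."""
    rates = defaultdict(list)
    for kind, gfa, cost in projects:
        rates[kind].append(cost / gfa)
    return {kind: sum(r) / len(r) for kind, r in rates.items()}

def fit_line(xs, ys):
    """Predictive model: ordinary least-squares fit of cost = a + b * GFA."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b

print(describe(PROJECTS))
a, b = fit_line([p[1] for p in PROJECTS], [p[2] for p in PROJECTS])
print(a + b * 5_000)  # predicted cost of a new 5,000 m2 project: 10500000.0
```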
2.3.5 Data Mining process
Figure: Data mining process (data source, target data, preprocessed data, patterns, knowledge)
The first step of data mining is data preparation, which includes three sub-steps. First, data from various sources are integrated and cleaned of noise. Second, the relevant part of the data is extracted into the data mining system. Third, the data are preprocessed to facilitate mining. The next step is data mining itself, in which algorithms are applied to the data to discover patterns, and those patterns are evaluated to identify whether they represent useful knowledge. The last step is presentation, in which the results are visualized and the knowledge is presented to the user.
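The steps above can be sketched end to end on two hypothetical data sources: tender records and final accounts, keyed by an assumed project ID. Integration joins them, noise cleaning drops incomplete records, selection keeps only the fields needed, preprocessing derives cost per m², mining finds a simple pattern (mean cost per m² by type), evaluation keeps only patterns supported by at least two projects (an assumed usefulness test), and presentation prints the result.

```python
# Hypothetical sources keyed by project ID
tender_db = {"P1": {"type": "school", "gfa": 2000}, "P2": {"type": "school", "gfa": 3000},
             "P3": {"type": "office", "gfa": 4000}, "P4": {"type": "office", "gfa": None}}
final_accounts = {"P1": 4_600_000, "P2": 6_900_000, "P3": 9_000_000, "P4": 7_000_000}

def prepare(tenders, accounts):
    """Integrate the sources, clean noise, select fields and preprocess."""
    rows = []
    for pid, t in tenders.items():
        cost = accounts.get(pid)             # integration on project ID
        if cost is None or not t.get("gfa"):
            continue                         # noise cleaning: drop incomplete records
        # Selection and preprocessing: keep only the type and derive cost per m2
        rows.append({"type": t["type"], "rate": cost / t["gfa"]})
    return rows

def mine(rows, min_support=2):
    """Discover mean rate per type; keep patterns backed by `min_support` projects."""
    groups = {}
    for r in rows:
        groups.setdefault(r["type"], []).append(r["rate"])
    return {k: sum(v) / len(v) for k, v in groups.items() if len(v) >= min_support}

def present(patterns):
    for kind, rate in sorted(patterns.items()):
        print(f"{kind}: average {rate:,.0f} per m2")

present(mine(prepare(tender_db, final_accounts)))  # school: average 2,300 per m2
```

The office pattern is discarded at the evaluation step because only one complete office record survives cleaning, which illustrates why the quality of the prepared data limits what mining can find.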
actual final cost. This indicates the model’s ability to generalize satisfactorily when
validated with new data. The models are being deployed within the operations of the
industry partner involved in this research to help increase the reliability and accuracy
of initial cost estimates.
The table below shows the framework for selecting the most suitable data mining method according to the requirement. Regression is used when something must be predicted; the technique used is regression analysis, chosen for its flexibility. Clustering is used when pattern discovery is required; the technique used is the support vector machine (SVM), chosen for its accuracy. Classification is used when surveillance is required, and the surveillance is carried out with self-organizing maps. Visualization is used when performance is required; the technique applied is the genetic algorithm, which offers good interpretability. Lastly, summarization is used when measuring business understanding, mainly because of its ease of deployment.
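The selection framework described above can be held as a simple lookup table. The sketch below encodes the requirement-to-method pairings exactly as this section reports them, with `None` where the text gives no technique or reason; the dictionary layout and the `select_method` helper are illustrative assumptions, not part of the original framework.

```python
# Requirement -> (data mining method, technique, reason), as reported in this section
FRAMEWORK = {
    "prediction": ("regression", "regression analysis", "flexibility"),
    "pattern discovery": ("clustering", "support vector machine (SVM)", "accuracy"),
    "surveillance": ("classification", "self-organizing maps", None),
    "performance": ("visualization", "genetic algorithm", "interpretability"),
    "business understanding": ("summarization", None, "ease of deployment"),
}

def select_method(requirement):
    """Look up the method, technique and stated reason for a requirement."""
    try:
        method, technique, reason = FRAMEWORK[requirement.lower()]
    except KeyError:
        raise ValueError(f"no method listed for requirement: {requirement!r}") from None
    return {"method": method, "technique": technique, "reason": reason}

print(select_method("Prediction"))
```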
After the type of data mining has been selected, the work proceeds through four important phases (Weiss & Indurkhya, 2018). In general, Phase 1 is data preparation, Phase 2 is data reduction, Phase 3 is data modelling and prediction, and Phase 4 is case and solution analysis.
Phase 1: Data Preparation
Figure (Phase 1, data preparation): define goals; data warehouse; data transformations; initial standard form; time-dependencies
The most crucial part of data mining is the preparation and transformation of data. This task is rarely covered in the literature because some consider it too application-specific. The Phase 1 figure illustrates the data preparation procedure, in which raw data are moved into an initial standard form, the first complete spreadsheet. This is done to constrain the original data to a relatively simple, uniform representation that is almost universally acceptable to data mining methods. Some of the data preparation tasks can be performed during the design of the data warehouse, but many specialized transformations may still be needed.
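A minimal sketch of moving raw records into a standard form, assuming "standard form" means one row per case with every column numeric: numbers stored as text are parsed, and each categorical value becomes its own 0/1 column. The field names and raw values are hypothetical, and real projects would need further specialized transformations (units, dates, time-dependencies) that this sketch does not attempt.

```python
def to_standard_form(raw_records, numeric_fields, categorical_fields):
    """Convert heterogeneous raw records into a uniform table: one row per case,
    numeric columns as floats and each categorical value as a 0/1 column."""
    categories = {f: sorted({str(r[f]) for r in raw_records}) for f in categorical_fields}
    header = list(numeric_fields) + [f"{f}={v}" for f in categorical_fields for v in categories[f]]
    rows = []
    for r in raw_records:
        # Numbers may arrive as text with thousands separators, e.g. "2,000"
        row = [float(str(r[f]).replace(",", "")) for f in numeric_fields]
        for f in categorical_fields:
            row += [1.0 if str(r[f]) == v else 0.0 for v in categories[f]]
        rows.append(row)
    return header, rows

raw = [{"gfa": "2,000", "storeys": 3, "type": "school"},
       {"gfa": 4500.0, "storeys": "10", "type": "office"}]
header, rows = to_standard_form(raw, ["gfa", "storeys"], ["type"])
print(header)  # ['gfa', 'storeys', 'type=office', 'type=school']
print(rows)    # [[2000.0, 3.0, 0.0, 1.0], [4500.0, 10.0, 1.0, 0.0]]
```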
Phase 2: Data Reduction
Figure (Phase 2, data reduction): initial standard form; initial training set; initial test set; data-reduction methods (value and feature reductions); reduced standard form; final training set; final test set
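The Phase 2 labels (initial training and test sets, value and feature reductions, final training and test sets) suggest a procedure that can be sketched as below. The specific choices are assumptions for illustration: a seeded random split, feature reduction that drops columns taking fewer than two distinct values in the training set (so the test set is never consulted), and value reduction that rounds every value to a multiple of one step. Real data would normally need a step chosen per column.

```python
import random

def split(rows, test_fraction=0.25, seed=0):
    """Divide standard-form rows into an initial training set and test set."""
    rows = rows[:]
    random.Random(seed).shuffle(rows)
    n_test = max(1, round(len(rows) * test_fraction))
    return rows[n_test:], rows[:n_test]

def feature_reduction(train, test, min_distinct=2):
    """Keep only columns with at least `min_distinct` values in the training set."""
    keep = [j for j in range(len(train[0])) if len({r[j] for r in train}) >= min_distinct]
    return [[r[j] for j in keep] for r in train], [[r[j] for j in keep] for r in test], keep

def value_reduction(rows, step):
    """Coarsen values by rounding each one to the nearest multiple of `step`."""
    return [[round(v / step) * step for v in r] for r in rows]

data = [[float(i), 1.0, i * 10.0] for i in range(8)]   # column 1 is constant
train, test = split(data)
train, test, kept = feature_reduction(train, test)
print(len(train), len(test), kept)  # 6 2 [0, 2]
```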
Phase 3: Data Modelling and Prediction
Figure content: available training data; subset of cases; prediction method; solution; validation test set; compare to best; change parameter
Figure 3: Iterative Data Modelling and Prediction (Weiss & Indurkhya, 2018)
The available training data obtained from the previous phase are then divided into subsets of cases. These data are processed by the prediction method to produce a solution, which is then validated on the test set and compared with the best solution obtained so far. If the goal has not yet been achieved, the parameters are changed and the prediction method is run again.
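The iterative loop above (change parameter, predict, validate on the test set, compare to best) can be sketched with a simple prediction method. Here the method is assumed to be k-nearest-neighbor regression on gross floor area, the parameter being changed is k, and the error measure is mean absolute error; the data, the candidate values of k and the optional target error are hypothetical.

```python
def knn_predict(train, x, k):
    """Predict cost as the mean cost of the k training cases nearest in GFA."""
    nearest = sorted(train, key=lambda case: abs(case[0] - x))[:k]
    return sum(cost for _, cost in nearest) / k

def mae(train, test, k):
    """Validate a solution: mean absolute error on the test set."""
    return sum(abs(knn_predict(train, x, k) - y) for x, y in test) / len(test)

def tune(train, test, ks=(1, 2, 3, 4), target_error=None):
    best_k, best_err = None, float("inf")
    for k in ks:                        # change parameter
        err = mae(train, test, k)       # validate on the test set
        if err < best_err:              # compare to best
            best_k, best_err = k, err
        if target_error is not None and best_err <= target_error:
            break                       # goal achieved, stop iterating
    return best_k, best_err

train = [(1_000, 2.0e6), (1_500, 3.0e6), (2_000, 4.0e6), (5_000, 9.0e6)]
test = [(1_200, 2.4e6), (1_900, 3.8e6)]
print(tune(train, test))  # (2, 200000.0)
```

Without a target error, every candidate is tried and the best is kept; with one, the loop stops as soon as the goal is met, which mirrors the "if the goal is not achieved yet" condition in the text.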
Phase 4: Case and Solution Analysis
Figure (Phase 4, case and solution analysis): available training data; increment cases; subset of cases; prediction method; solution; error analysis; validation test set
This phase is similar to Phase 3. The difference is that the available training data undergo incremental case addition (increment cases), an attempt to remove error. The data are then formed into a subset of cases and run through the prediction method. When a solution is produced, an error analysis is performed on it. The solution is also validated, and if errors remain, the cases are incremented again.
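One possible reading of "increment cases" is sketched below: the subset of training cases is grown step by step, the prediction method is rerun on each larger subset, and an error analysis on the validation test set decides whether to continue. The prediction method (average cost per m²), the error measure (mean absolute percentage error), the step size of two cases and the 5% target are all assumptions for illustration.

```python
def rate_model(cases):
    """Prediction method (assumed): average cost per m2 of the cases used so far."""
    rate = sum(cost / gfa for gfa, cost in cases) / len(cases)
    return lambda gfa: rate * gfa

def error_analysis(model, test):
    """Mean absolute percentage error of the solution on the validation test set."""
    return 100 * sum(abs(model(gfa) - cost) / cost for gfa, cost in test) / len(test)

def incremental_cases(available, test, step=2, target_pct=5.0):
    """Grow the subset of cases step by step until the error is acceptable."""
    sizes = list(range(step, len(available), step)) + [len(available)]
    history = []
    for n in sizes:
        model = rate_model(available[:n])       # increment the subset of cases
        err = error_analysis(model, test)       # error analysis on the new solution
        history.append((n, err))
        if err <= target_pct:                   # error acceptable: stop adding cases
            break
    return history

available = [(1_000, 2_600_000), (1_000, 2_400_000), (2_000, 3_800_000),
             (2_000, 3_800_000), (1_500, 3_300_000), (1_500, 3_000_000)]
test = [(1_800, 3_960_000)]
print(incremental_cases(available, test))
```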
CHAPTER THREE
References
Adnan Enshassi, S. M., Ibrahim Madi. (2005). Factors affecting accuracy of cost estimation
of building contracts in the Gaza strip. 10(2), 115-125.
Adnan Enshassi, S. M., Ibrahim Madi. (2007). Cost Estimation Practice in the Gaza Strip: A
Case Study. IUG Journal for Natural and Engineering Studies, 15(2), 153-177.
Agrawal, R., Imieliński, T., & Swami, A. (1993). Mining association rules between sets of items in large databases. Paper presented at the ACM SIGMOD Record.
Agyapong, K. B., Hayfron-Acquah, J., & Asante, M. (2016). An Overview of Data Mining
Models (Descriptive and Predictive).
Ahiaga-Dagbui, D. D., & Smith, S. D. (2014). Dealing with construction cost overruns using
data mining. Construction Management and Economics, 32(7-8), 682-694.
doi:10.1080/01446193.2014.933854
Ahmed, V., Tezel, A., Aziz, Z., & Sibley, M. (2017). The future of Big Data in facilities
management: opportunities and challenges. 35(13/14), 725-745. doi:10.1108/F-06-
2016-0064
Aibinu, A. A., & Pasco, T. (2008). The accuracy of pre-tender building cost estimates in Australia. Construction Management and Economics, 26(12), 1257-1269.
Akintoye, A. (2000). Analysis of factors influencing project cost estimating practice.
Construction Management and Economics, 18(1), 77-89.
doi:10.1080/014461900370979
Ansari, S., Chetlur, S., Prabhu, S., Kini, G. N., Hegde, G., & Hyder, Y. (2013). An overview of clustering analysis techniques used in data mining. International Journal of Emerging Technology and Advanced Engineering, 3(12), 284-286.
Azman, M. A., Abdul-Samad, Z., & Ismail, S. (2013). The accuracy of preliminary cost estimates in Public Works Department (PWD) of Peninsular Malaysia. International Journal of Project Management, 31(7), 994-1005.
Chan, S. L., & Park, M. (2005). Project cost estimation using principal component regression.
Construction Management and Economics, 23(3), 295-304.
doi:10.1080/01446190500039812
Chau, M., Cheng, R., Kao, B., & Ng, J. (2006). Uncertain Data Mining: An Example in Clustering Location Data. Lecture Notes in Computer Science, 199–204. doi:10.1007/11731139_24
Chen, F., Deng, P., Wan, J., Zhang, D., Vasilakos, A. V., & Rong, X. (2015). Data Mining for
the Internet of Things: Literature Review and Challenges. 11(8), 431047.
doi:10.1155/2015/431047
Cheng, Y.-M. (2014). An exploration into cost-influencing factors on construction projects.
32(5), 850-860.
Delen, D., & Demirkan, H. (2013). Data, information and analytics as services. In: Elsevier.
Elcin Tas, H. Y. (2005). A building cost estimation model based on cost significant work
packages. 12(3), 251-263.
Elhag, T. M. S., Boussabaine, A. H., & Ballal, T. M. A. (2005). Critical determinants of construction tendering costs: Quantity surveyors’ standpoint. 23(7), 538-545.
Esling, P., & Agon, C. (2012). Time-series data mining. ACM Computing Surveys, 45(1), 12.
Fu, T.-c. (2011). A review on time series data mining. Engineering Applications of Artificial Intelligence, 24(1), 164-181.
Gogoi, P., Bhattacharyya, D., Borah, B., & Kalita, J. K. (2011). A survey of outlier detection methods in network anomaly identification. The Computer Journal, 54(4), 570-588.
Gosain, A., & Bhugra, M. (2013). A comprehensive survey of association rules on
quantitative data in data mining. Paper presented at the Information &
Communication Technologies (ICT), 2013 IEEE Conference on.
Holm, L., Schaufelberger, J. E., Griffin, D., & Cole, T. (2005). Construction cost estimating:
process and practices: Pearson Prentice Hall Upper Saddle River, New Jersey.
Jain, A. K., & Dubes, R. C. (1988). Algorithms for clustering data.
Jaseena, K., & David, J. M. (2014). Issues, challenges, and solutions: Big data mining. Computer Science & Information Technology, 131-140.
Jiawei Han, M. K., Jian Pei. (2011). Data Mining: Concepts and Techniques (3rd ed.). Morgan Kaufmann.
Kesavaraj, G., & Sukumaran, S. (2013). A study on classification techniques in data mining.
Paper presented at the Computing, Communications and Networking Technologies
(ICCCNT), 2013 Fourth International Conference on.
Koleola T. Odusami, H. N. O. (2008). Factors affecting the accuracy of a pre-tender cost
estimate in Nigeria. 50(9), 32.
Lechevalier, D., Narayanan, A., & Rachuri, S. (2014). Towards a domain-specific framework
for predictive analytics in manufacturing. Paper presented at the BigData
Conference.
Li Liu, K. Z. (2007). Improving cost estimates of construction projects using phased cost
factors. 133(1), 91-95.
MOTI, Ministry of Transportation and Infrastructure. (2013). Project cost estimating guidelines (Version 01.02). Blanshard Street, Victoria BC.
Noor Akmal Adillah Ismail, Erezi Utiome, Robert Owen, & Drogemuller, R. (2015). Exploring
Accuracy Factors in Cost Estimating Practice towards Implementing Building
Information Modelling (BIM).
Peurifoy, R. L., & Oberlender, G. D. (2008). Estimating Construction Costs: McGraw-Hill
Education.
Piatetsky-Shapiro, G., Matheus, C., Smyth, P., & Uthurusamy, R. (1994). KDD-93: Progress and challenges in knowledge discovery in databases. AI Magazine, 15(3), 77.
Serpell, A. F. (2004). Towards a knowledge-based assessment of conceptual cost estimates.
32(2), 157-164.
Soibelman, L., & Kim, H. (2002). Data Preparation Process for Construction Knowledge
Generation through Knowledge Discovery in Databases. Journal of Computing in
Civil Engineering, 16(1), 39-48. doi:10.1061/(ASCE)0887-3801(2002)16:1(39)
Steven M. Trost, G. D. O. (2003). Predicting accuracy of early cost estimates using factor
analysis and multivariate regression. 129(2), 198-204.
Stoy, C., Pollalis, S., & Schalcher, H.-R. (2008). Drivers for cost estimating in early design: Case study of residential construction. Journal of Construction Engineering and Management, 134(1), 32-39.
Sumathi, S., & Sivanandam, S. (2006). Introduction to data mining and its applications (Vol.
29): Springer.
Sumera Ahmad, G. R. B. (2017). Software Cost Estimation Using Data Mining: Review.
International Journal of Scientific Development and Research (IJSDR), 2(7), 181-183.
Weiss, S., & Indurkhya, N. (2018). Predictive data mining: A practical guide.
Williams, J. (2013). Estimating for Building & Civil Engineering Work: Routledge.