Research Paper Afiq

B. QS
DATA MINING IN IMPROVING THE ACCURACY OF CONSTRUCTION COST ESTIMATE
MOHAMAD AFIQ ZUHAIRY MARZUKI
BIC160015
2019
FACULTY OF BUILT ENVIRONMENT
TABLE OF CONTENTS
CHAPTER ONE
INTRODUCTION
1.1 Introduction
1.2 Research Background
1.3 Problem statement
1.4 Research Questions
1.5 Aim and Objectives
1.6 Significance of the Research
1.7 Scope of Research
1.8 Research Methodology
1.9 Dissertation Structure
CHAPTER TWO
LITERATURE REVIEW
2.1 Introduction
2.2 Construction Cost Estimation
2.2.1 Definition
2.2.2 Factors influencing cost estimation
2.2.3 Type of Construction Cost Estimates Approaches
2.3 Data Mining
2.3.1 Definition
2.3.2 The Scope of Data Mining
2.3.3 Method of Data Mining
2.3.4 Data mining model
2.3.5 Data Mining process
2.4 Data Mining in improving the accuracy of construction cost estimation
2.5 Framework for data mining
CHAPTER THREE
References
CHAPTER ONE
INTRODUCTION
1.1 Introduction
This chapter gives an overview of the study. It starts with the research
background, followed by the problem statement and the research questions. The
research aim and objectives are then outlined, followed by the significance and
scope of the research. Lastly, the research methodology and the structure of the
dissertation are presented.
1.2 Research Background
The construction industry is one of the most challenging industries because of its
unique nature. No construction project is entirely the same as another. Projects
differ from one another in project type, the financial status of the client,
design requirements, location and other factors. One of the most important steps in
starting a construction project is cost estimating. The result of a cost estimate
determines whether the project is viable to proceed. The cost estimate also serves
as the cost basis throughout the construction process.
A cost estimate is, by definition, a set of compiled items that have been
analyzed to give the total cost of a project (Adnan Enshassi, 2007).
Estimates generally fall into two types: approximate estimates and detailed
estimates. The choice of estimate type depends on the ease of estimating and the
amount of information available.
The accuracy of a construction cost estimate is crucial. If the estimated total
cost is more than the actual cost, the project can be considered a failure, as it
causes significant loss to the client and the other stakeholders. Over the years,
cost estimating experts such as quantity surveyors have carried out extensive
research to improve the accuracy of estimates. With advances in technology, data
mining is set to be used to improve the accuracy of cost estimates.
Data mining, also known as knowledge discovery from data (KDD), is a method of
transforming large data sets into useful patterns and knowledge (Jiawei Han, 2011).
According to Sumera Ahmad (2017), data mining is categorized into two types:
descriptive and predictive. The paper explains that “descriptive mining means
cauterizing the data on the bases of general properties of data into databases”,
while “predictive mining tasks deals with performing inference on current data for
prediction”.
The nature of data mining suits cost estimating, which involves a large amount of
uncertain construction data. Such uncertain data can be put to full use for the
benefit of the construction industry. In future, data mining should be able to
address cost estimating issues at every stage of construction.
1.3 Problem statement
Data mining has been widely used in construction over the past few decades.
Advances in technology have helped those in the industry to improve their quality
and services. For example, data mining is used to predict construction risk on
site, using collected data to predict likely outcomes. However, there is still room
for improvement before the potential of data mining is fully utilized.
Mining data for cost estimation requires data collection. The larger the data set,
the more accurate the result tends to be. According to Akintoye (2000), the data to
be mined include the complexity of the construction scope, market conditions, the
construction method used, site constraints, the financial standing of the client,
project buildability and project location. However, these data are not utilized and
stored properly (Soibelman & Kim, 2002). Ahmed, Tezel, Aziz, and Sibley (2017)
classified this problem as an operational challenge. Their research highlighted
that the problem arises from the inability to work with new technology and the high
cost of technology and of recruiting technologists.
Data collected may contain imprecision and uncertainty. Data uncertainty results
from inaccurate measurement, sampling discrepancies, outdated data sources, or
other errors (Chau, Cheng, Kao, & Ng, 2006). Jaseena and David (2014) highlighted
the same issue. Their research explains that the uncertainty arises from the large
scale of the data as well as the presence of mixed data based on different patterns
or rules in the collected and stored data. Only a few studies have identified the
issue of uncertainty in mining data. With this uncertainty, the value of a data
item is no longer atomic. Uncertain data need to be summarized into atomic values
before traditional data mining techniques are applied. However, the quality of the
mining results can be seriously affected by discrepancies between the summarized
recorded values and the actual values. In clustering, for example, many objects
could be put into the wrong cluster if the algorithm relies solely on the recorded
values. The problem is compounded because each misplaced cluster member shifts the
centroid of its cluster, resulting in further errors.
Lastly, Chen et al. (2015) highlighted another problem: accessing and extracting
large-scale data from different data storage locations. The information needed is
hard to obtain because there is no well-defined automated mechanism for site
managers to extract, preprocess and analyze the data and summarize the results
(Soibelman & Kim, 2002). Integrating data mining into existing software has been an
issue for quite some time. Piatetsky-Shapiro, Matheus, Smyth, and Uthurusamy (1994)
highlighted that, for newly deployed systems, only a few interfaces are changed
while the rest of the software stays the same.
1.4 Research Questions
1. What are the factors affecting the accuracy of construction cost estimation?
2. What types of data mining methods are available for construction cost
estimation?
3. How does data mining help to improve the accuracy of construction cost
estimation?
1.5 Aim and Objectives
The aim of this research is to examine the usage of data mining in improving the
accuracy of construction cost estimates.
The research aim can be broken down into these objectives:
• To evaluate the current challenges in improving the accuracy of construction
cost estimates.
• To identify the various methods of data mining in construction cost estimating.
• To develop a procedure strategies framework (PSF) for predictive cost
analytics by using data mining.
1.6 Significance of the Research
Academic contribution
The study will expand the existing body of knowledge, particularly the quantity
surveying literature in Malaysia on data mining, which is still in its infancy.
Practical contribution
The study will assist stakeholders, especially construction managers, in
understanding the correct way to carry out data mining. This understanding will
help the construction industry to formulate strategies that facilitate the
decision-making process of investors and therefore increase efficiency in
construction cost estimating.
1.7 Scope of Research
The scope of this research is guided by its aim, which is to examine the usage of
data mining in improving the accuracy of construction cost estimates. The
respondents for this research will be data mining experts with information
technology knowledge. The research is limited to the field of construction cost
estimation.
1.8 Research Methodology
This research is carried out through a literature review and expert interviews.
Early findings on the challenges of obtaining accurate cost estimates and on how
data mining improves the accuracy of cost estimates were drawn from the literature
review. The findings from the literature review serve as the motivation and
rationale for the expert interviews.
A qualitative approach was adopted to support the information found in the
literature review. Semi-structured interviews were chosen to gain more insight into
data mining from data mining experts. Although interviews consume much time and
commitment, they allowed the respondents to give as much detail as they wanted on
the topic. As a result, the data from the literature review are validated where
the interviews confirm them.
The interviews were audio recorded, transcribed and analyzed using content
analysis. The analysis was developed by sorting the interview results into
categories derived from the literature review. The coding process involved sorting,
organizing and assigning the raw data quoted by the respondents to codes that fit
the categories. The categories were then coded accordingly. Lastly, the data
obtained were explained and mapped against the factors previously categorized in
the literature review.
1.9 Dissertation Structure
This research report is structured into six chapters.
Chapter 1: Introduction
This chapter summarizes the thesis and covers the research background, problem
statement, research questions, aim and objectives, significance of the research,
and scope of the study.
Chapter 2: Literature Review
In this chapter, in-depth research was done by extracting valuable information from
previous research, articles and scholarly materials related to construction cost
estimates and data mining. The chapter covers the definitions, types and processes
involved in improving the accuracy of construction cost estimates. It also explains
the usage of data mining in the construction industry, especially in construction
cost estimating. It then presents a framework for applying data mining to improve
the accuracy of construction cost estimates.
Chapter 3: Research Methodology
In this chapter, the research methodology is explained briefly. The methodology
consists of a literature review and a qualitative approach. The methods used must
be sufficient to achieve the aim and objectives of this study.
Chapter 4: Results of Data Analysis
In this chapter, the data collected are based on information obtained from the
interviews. The results are recorded with an audio recorder and sorted accordingly.
Chapter 5: Discussion of Results
The outcomes of the research are discussed in this chapter. The findings are
compared with the literature review. Conclusions are drawn based on the
understanding gained throughout the research.
Chapter 6: Conclusion and Recommendation
This chapter summarizes all the findings of this research. The research questions
are answered, and recommendations for further research are presented.
CHAPTER TWO
LITERATURE REVIEW
2.1 Introduction
This literature review is divided into several sections. It begins with the
definition of construction cost estimation, followed by the factors affecting the
accuracy of construction cost estimates and the types of construction cost
estimating approaches available. It then explains the definition of data mining,
followed by the types of data mining available.
2.2 Construction Cost Estimation
2.2.1 Definition
Construction cost estimating is the process of examining the costs related to a
specific scope of work and the total cost of completing the work (Adnan Enshassi,
2007). This process involves gathering the available data related to the building
project cost, then analyzing and summarizing the data (Holm, Schaufelberger,
Griffin, & Cole, 2005). According to Williams (2013), a construction cost estimate
takes into consideration elements such as the labor, material and plant unit costs
of the various items of work as itemized in a Bill of Quantities or Specification
of Works. He added that the total construction cost should also include site
charges (known as preliminaries), overheads and establishment charges, provisional
sums, dayworks and prime cost amounts.
2.2.2 Factors influencing cost estimation
Much research has been done on the factors influencing construction cost
estimates. Elcin Tas (2005) highlighted that the reliability of an estimate depends
on using the most recent information: the more up to date the information, the
more accurate the estimate. Aibinu (2008) also touched on factors related to
project information. That research emphasized factors that influence the accuracy
of estimates, such as project value, gross floor area, number of storeys, project
location, procurement route, project type, structural material used and price
intensity. Furthermore, Stoy, Pollalis, and Schalcher (2008) focused on project
information in terms of the compactness of the building, number of elevators,
absolute size of the project, construction duration, proportion of openings and
region. Last but not least, Azman, Abdul-Samad, and Ismail (2013) concentrated on
project information in terms of project value and project size, price intensity
theory, number of bidders, location (state), type of school and contract period.
The next factor found to influence cost estimates is project attributes. Akintoye
(2000) showed that the project attributes requiring attention are project
complexity, technological requirements, project information, project team
requirements, contract requirements, project duration and market requirements.
Adnan Enshassi (2005) had findings similar to Akintoye (2000), namely project
complexity, project information, technological requirements, contract efficiency,
market requirements, project duration and project risk. Serpell (2004), in
contrast, gave most attention to scope quality, information quality, uncertainty
level, estimator performance and quality of the estimating procedure.
Elhag (2005) categorized six groups of factors influencing construction project
costs at the pre-tender stage for building projects in the UK, namely client
characteristics, consultant & design parameters, contractor attributes, project
characteristics, contract procurement methods and external factors/market
conditions. The research concluded that cost is influenced by the actions of the
stakeholders at the early stage of construction rather than by those of the
contractor during the later stage of construction. Nonetheless, a Nigerian study
shows minor differences from these common factors. Koleola T. Odusami (2008) took
into consideration factors such as consultants’ expertise, information quality &
flow requirements, the project team’s experience of the construction type, tender
period & market conditions, extent of completion of pre-contract design, and
complexity of design & construction.
In other research, Li Liu (2007) developed a framework for cost estimating and
identified two main groups of factors: control factors and idiosyncratic factors.
Control factors can be broken down into more specific classes: input, behavioral
and output control categories. Idiosyncratic factors are influences that cannot be
controlled by the estimator, such as market conditions, weather and site
constraints. Cheng (2014) combined various factors influencing construction
projects and classified them into four groups: environmental and circumstantial
influences, contract scope, project risks, and management & technique. Steven M.
Trost (2003) highlighted five main factors that affect the accuracy of cost
estimates: basic process design, team experience & cost information, the time
allowed for preparing estimates, site requirements, and labor climate. Chan and
Park (2005), in turn, suggested three groups of influencing factors, which are
project design, complexity & time, as well as the level of professional competency
of project team members and contractors.
Noor Akmal Adillah Ismail, Erezi Utiome, Robert Owen, and Drogemuller (2015)
carried out intensive research on the factors influencing construction cost
estimates. Previous studies on these factors were compared for their differences
and similarities. As a result, the factors were recategorized into six groups:
1) information of the project, 2) characteristics of the project, 3) requirements
from the project team, 4) requirements of the client, 5) requirements for the
contract and 6) external influences. The table below lists the primary factors and
their sub-components.
Table 1 Summary of cost estimating factors (Noor Akmal Adillah Ismail et al., 2015)
Authors (years), in column order: Akintoye (2000); Trost & Oberlender (2003);
Serpell (2004); Enshassi et al. (2005); Chan & Park (2005); Elhag et al. (2005);
Liu & Zhu (2007); Aibinu & Pasco (2008); Stoy et al. (2008); Koleola & Henry
(2008); Cheng (2013); Azman et al. (2013)
Main factors
Project information
Project value X X
Project size/floor area X X X X
Price intensity X X
Project location X X X X X X
Project type X X X
Project duration X X X
Storey/compactness/volume/opening X X X
Project characteristics
Design/construction (drawing/scope/process) X X X X X X X X
Information (flow/availability/quality) X X X X X X X
Project complexity (design/construction) X X X X
Project team requirements
Experience/expertise/professional level X X X X X X X X
Team alignment/capacity/communication X X X
Personal characteristic/performance
Estimation design/process/procedure X X X
Management & technique (time/cost control) X X X
Client requirements
Client’s budget/financial status X X X
Return profit/money issues X
Client characteristic/type X X
Time/quality requirements X
Contract requirements
Scope of contract X
Tender/contract period X X X
Tender selection method X
Procurement route/contractual arrangement X X X X
Type of contract/standard X
Pre-contract (design/construction) X X
External influences
Site requirements X
Bidding/contractor attributes X X X X X
Market conditions (rates/inflation/fluctuation) X X X X X X
Technology requirements X X X X
Uncertainties (contingencies/variations) X X X X
Political situation X X
Environmental (climate/geology/disaster) X
Disputes (contract/regulations/payment) X
2.2.3 Type of Construction Cost Estimates Approaches
Choosing the right approach is very important at the early stage of construction.
A good selection saves cost and avoids wasted time. Different approaches yield
different levels of accuracy. The selection of a cost estimating approach commonly
depends on the availability of data and the ease of application. Generally,
construction cost estimates are divided into two types: approximate/parametric
estimates and detailed estimates (Peurifoy & Oberlender, 2008). However, MOTI
(2013) categorized cost estimates into three types, namely parametric estimates,
elemental parametric estimates and detailed cost estimates.
Parametric Estimating Method
The parametric method is a high-level estimate that uses various factors, with data
extracted from historical databases, engineering practices and technologies. The
cost data used for this estimate come from previous projects, and the estimate is
produced using historical costs and relevant historical percentages. The
effectiveness of this method depends on the project definition available and on the
similarity between the new project and the historical models. Choosing comparable
cost data requires good judgement and analysis. The method is recommended when
there is little or no design information available, so it is commonly used at the
early stage of construction, when few details are available for cost estimating.
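As a rough illustration of the parametric approach described above, the sketch below scales the average historical cost per square metre to the floor area of a new project. The project records, the field names and the optional `adjustment_factor` are hypothetical values chosen for illustration and are not data from this study.

```python
from statistics import mean

# Hypothetical historical projects: gross floor area (m2) and final cost.
historical_projects = [
    {"gross_floor_area_m2": 1000.0, "final_cost": 2_000_000.0},
    {"gross_floor_area_m2": 1500.0, "final_cost": 3_300_000.0},
    {"gross_floor_area_m2": 2000.0, "final_cost": 4_800_000.0},
]


def parametric_estimate(projects, new_area_m2, adjustment_factor=1.0):
    """Estimate cost as mean historical unit cost x new area x adjustment."""
    unit_costs = [p["final_cost"] / p["gross_floor_area_m2"] for p in projects]
    return mean(unit_costs) * new_area_m2 * adjustment_factor


estimate = parametric_estimate(historical_projects, new_area_m2=1200.0)
print(round(estimate, 2))
```

The adjustment factor is one simple way to reflect the judgement mentioned above, for example to allow for a difference in location or market conditions between the historical projects and the new one.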
Elemental Parametric Estimating Method
This method produces estimates based on elements and parameters. Examples of
elements are building blocks, number of rooms and so on, while parameters are
variables that need to be defined, such as the number of culverts for drainage,
depth of materials and so on. The method combines the elemental and parametric
approaches. It does not provide full cost details. However, it provides a
consistent and increasingly detailed breakdown for decision-making over the project
life cycle. The elemental parametric method also uses historical costs and relevant
historical percentages, but the costing is at a much more detailed level than in
the parametric method, often consisting of composite unit prices. The elemental
parametric estimating method is commonly used on transportation infrastructure
projects. It can be used for projects of any size and is particularly recommended
for very large, complex projects. This method is also effective for developing
project option analyses for comparison purposes. Elemental parametric estimates are
often used as a basis for developing a project budget.
Detailed Cost Estimating Method
Detailed cost estimating is the most accurate estimating method. Every cost item is
quantified and priced. This approach is only applicable when the design definition
has advanced and the units of work can be quantified. There are two approaches used
in detailed cost estimating, namely the historical bid-based approach and the
cost-based approach. The cost-based approach does not rely on historical cost data.
Instead, the estimator determines the contractor’s cost for labor, equipment,
materials and any specialty subcontractor effort for each item needed to complete
the work. The cost-based approach is not commonly used, but contractors generally
utilize it to prepare bids.
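The idea that every cost item is quantified and priced can be sketched as a simple sum of quantity times unit rate. The bill items, unit rates and the 10% preliminaries allowance below are hypothetical and serve only to show the arithmetic.

```python
# Hypothetical bill items: (description, quantity, unit, unit rate).
bill_items = [
    ("Concrete in slab", 120.0, "m3", 300.0),
    ("Reinforcement bar", 8.0, "tonne", 3500.0),
    ("Formwork", 400.0, "m2", 45.0),
]


def detailed_estimate(items, preliminaries_rate=0.10):
    """Sum quantity x rate per item, then add preliminaries as a percentage."""
    measured_work = sum(quantity * rate for _, quantity, _, rate in items)
    return measured_work * (1.0 + preliminaries_rate)


measured = sum(quantity * rate for _, quantity, _, rate in bill_items)
total = detailed_estimate(bill_items)
print(measured, round(total, 2))
```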
2.3 Data Mining
2.3.1 Definition
Data mining is the process of creating new, useful information from sets of data
(Jiawei Han, 2011). It is capable of detecting hidden relationships, patterns and
trends in large amounts of data (Sumathi & Sivanandam, 2006). Data mining is also
referred to as knowledge discovery in databases (KDD), data exploration and so on.
2.3.2 The Scope of Data Mining
Data mining takes its name from the similarity between searching for valuable
business information in a large database and mining for valuable ore. Both
processes involve searching to find exactly where the value lies. If the given
database is satisfactory in terms of size and quality, data mining technology can
create new opportunities through these capabilities (Agyapong, Hayfron-Acquah, &
Asante, 2016).
Automated prediction of trends and behaviors. Data mining automates the process of
finding predictive information in large databases. Questions that traditionally
required extensive hands-on analysis can now be answered quickly and directly from
the data. Target marketing is a classic predictive problem: data mining uses data
on past promotional mailings to identify the targets most likely to maximize the
return on investment in future mailings. Other predictive problems include the
prediction of bankruptcy and other forms of default and the identification of
population segments that are likely to respond similarly to certain events
(Agyapong et al., 2016).
Pattern detection is another area of application. Examples include detecting
fraudulent credit card transactions and identifying anomalous data that could
represent input errors (Agyapong et al., 2016).
Data mining techniques can bring the benefits of automation to existing software
and hardware platforms and can be implemented on new systems as existing platforms
are upgraded and new products are developed. Data mining tools can use
high-performance parallel processing systems to analyze large databases in a few
minutes. Faster processing means that users can automatically experiment with more
models to understand complex data. High speed makes it practical for users to
analyze massive amounts of data. Larger databases, in turn, tend to yield better
predictions (Agyapong et al., 2016).
2.3.3 Method of Data Mining
There are several methods of data mining, namely classification, clustering,
association analysis, time series analysis and outlier analysis (Chen et al.,
2015). Each method has different goals and suitability. When deciding which method
to use, the goals of the data mining exercise should be clearly defined so that the
characteristics of the chosen method meet the requirements.
Classification is the process of finding a set of models or functions that describe
and distinguish data classes, so that objects can be assigned to the appropriate
class. Classification helps in making the right decision. Assigning an object to
one of a set of predefined target classes is called classification. The goal is to
accurately predict the target class for the given data (Kesavaraj & Sukumaran,
2013). Classification can be achieved by applying decision tree induction,
frame-based or rule-based expert systems, hierarchical classification, neural
networks, Bayesian networks and support vector machines.
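As a minimal sketch of decision tree induction, the simplest possible tree is a single split (a "decision stump") learned from labelled examples. The training data, the "low"/"high" cost classes and the function names below are hypothetical and are used only for illustration.

```python
def fit_stump(samples):
    """Learn a one-level decision tree: a threshold on one numeric feature.

    samples is a list of (feature_value, label) pairs with labels "low"/"high".
    """
    values = sorted({x for x, _ in samples})
    # Candidate thresholds are midpoints between consecutive distinct values.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    best_threshold, best_errors = None, len(samples) + 1
    for t in candidates:
        errors = sum(("high" if x > t else "low") != y for x, y in samples)
        if errors < best_errors:
            best_threshold, best_errors = t, errors
    return best_threshold


def predict(threshold, x):
    """Assign the predefined class according to the learned threshold."""
    return "high" if x > threshold else "low"


# Hypothetical training data: gross floor area (m2) -> cost class.
training = [(500, "low"), (800, "low"), (1200, "low"),
            (2500, "high"), (3000, "high"), (4000, "high")]
threshold = fit_stump(training)
print(threshold, predict(threshold, 2800))
```

A full decision tree repeats this split search recursively on each resulting subset and over several features.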
The next method is clustering, which analyzes data objects without consulting a
known class model. Clustering algorithms (Jain & Dubes, 1988) split the data into
meaningful groups so that patterns in the same group are similar in some sense and
patterns in different groups are dissimilar in the same sense. Searching for
clusters involves unsupervised learning (Ansari et al., 2013). In information
retrieval, for example, a search engine clusters billions of web pages into
different groups, such as news, reviews, videos and audio.
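One common clustering algorithm is k-means, which is used here only as an example because the text above does not name a specific algorithm. The sketch below groups hypothetical unit costs into two clusters; the data and the starting centroids are assumptions.

```python
def k_means_1d(values, centroids, iterations=10):
    """Simple k-means on one-dimensional data with given starting centroids."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        # Assignment step: put each value in the cluster of its nearest centroid.
        for v in values:
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its members.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters


# Hypothetical unit costs (cost per m2) from completed projects.
unit_costs = [950, 1000, 1050, 2900, 3000, 3100]
centroids, clusters = k_means_1d(unit_costs, centroids=[1000, 2000])
print(centroids, clusters)
```

No class labels are supplied, which is what makes the procedure unsupervised: the groups emerge from the similarity of the values alone.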
Next, association analysis is the discovery of association rules showing
attribute-value conditions that frequently occur together in a given set of data.
Association rule mining (Agrawal, Imieliński, & Swami, 1993) focuses on market
basket analysis or transaction data analysis. It targets the discovery of rules
showing attribute-value associations that occur frequently, and it helps to
generate more general and qualitative knowledge, which in turn helps in decision
making (Gosain & Bhugra, 2013).
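Association rules are usually assessed by their support (how often the items occur together) and confidence (how often the rule holds when its left-hand side is present). The sketch below computes both for a hypothetical rule; the project attributes are invented for illustration.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated share of antecedent transactions that also hold the consequent."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


# Hypothetical project records, each described by a set of attributes.
projects = [
    {"high_rise", "urban", "cost_overrun"},
    {"high_rise", "urban", "cost_overrun"},
    {"high_rise", "rural"},
    {"low_rise", "urban"},
]
rule_support = support(projects, {"high_rise", "urban"})
rule_confidence = confidence(projects, {"high_rise", "urban"}, {"cost_overrun"})
print(rule_support, rule_confidence)
```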
Time series analysis comprises methods and techniques for analyzing time series
data in order to extract meaningful statistics and other characteristics of the
data. A time series is a collection of temporal data objects; the characteristics
of time series data include large data size, high dimensionality and continuous
updating. Commonly, a time series task relies on three components: representation,
similarity measures and indexing (Esling & Agon, 2012; Fu, 2011).
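To make the representation and similarity-measure components concrete, here is a minimal sketch. It uses segment means as the representation and Euclidean distance as the similarity measure; both are illustrative choices rather than techniques prescribed by the cited sources, and the price index values are invented.

```python
from math import sqrt


def piecewise_means(series, segments):
    """Representation: reduce a series to the mean of equal-length segments."""
    size = len(series) // segments
    return [sum(series[i * size:(i + 1) * size]) / size
            for i in range(segments)]


def euclidean_distance(a, b):
    """Similarity measure: a smaller distance means more similar series."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical monthly price indices for two materials over eight months.
steel = [100, 102, 104, 106, 110, 112, 114, 116]
cement = [100, 100, 101, 101, 102, 102, 103, 103]

steel_repr = piecewise_means(steel, segments=4)
cement_repr = piecewise_means(cement, segments=4)
distance = euclidean_distance(steel_repr, cement_repr)
print(steel_repr, cement_repr, distance)
```

Reducing each series to a few segment means addresses the high dimensionality noted above, since distances are then computed on four values instead of eight.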
Lastly, outlier analysis describes and models regularities or trends for objects
whose behavior changes over time. Outlier detection refers to the problem of
finding patterns in data that are very different from the rest of the data based on
appropriate metrics. Such patterns often contain useful information regarding
abnormal behavior of the system described by the data. Distance-based algorithms
calculate the distances among objects in the data with a geometric interpretation.
Density-based algorithms estimate the density distribution of the input space and
then identify outliers as those lying in low-density regions. Rough sets-based
algorithms introduce rough sets or fuzzy rough sets to identify outliers (Gogoi,
Bhattacharyya, Borah, & Kalita, 2011).
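A distance-based approach can be sketched as follows: a value is flagged as an outlier when too few other values lie within a given distance of it. The `radius` and `min_neighbours` parameters and the unit rates are hypothetical and would need tuning for real data.

```python
def distance_based_outliers(values, radius, min_neighbours):
    """Flag values with fewer than min_neighbours other values within radius."""
    outliers = []
    for i, v in enumerate(values):
        neighbours = sum(1 for j, w in enumerate(values)
                         if j != i and abs(v - w) <= radius)
        if neighbours < min_neighbours:
            outliers.append(v)
    return outliers


# Hypothetical unit rates for the same work item across past projects.
unit_rates = [300, 310, 295, 305, 900, 298]
flagged = distance_based_outliers(unit_rates, radius=20, min_neighbours=2)
print(flagged)
```

In a cost database, a flagged rate may be a data entry error or a genuinely unusual project, so it is usually reviewed rather than deleted automatically.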
2.3.4 Data mining model
There are two main types of data mining model, namely descriptive and predictive
(Agyapong et al., 2016). The descriptive model recognizes the patterns or
relationships in data and discovers the properties of the data studied. Predictive
analytics has been defined by Delen and Demirkan (2013) as having data modeling as
a prerequisite for making authoritative predictions about the future using business
forecasting and simulation. These address the questions of “what will happen?” and
“why will it happen?” In a different study investigating a domain-specific
framework for predictive analytics in manufacturing, Lechevalier, Narayanan, and
Rachuri (2014) define predictive analytics as a tool that “uses statistical
techniques, machine learning, and data mining to discover facts in order to make
predictions about unknown future events”.
2.3.5 Data Mining process
[Figure: three stages. Data preparation: Data source → (Integration) → Data →
(Extraction) → Target data → (Preprocess) → Preprocessed data. Data mining:
(Mining) → Patterns. Presentation: (Visualization) → Knowledge.]
The data mining process (Chen et al., 2015).
The first step of data mining is preparing the data. This step includes three
sub-steps. First, data from various sources are integrated and cleaned of noise.
Second, part of the data is extracted into the data mining system. Lastly, the data
are preprocessed to facilitate data mining. The next step is where the data mining
itself happens: algorithms are applied to the data to discover patterns, which are
evaluated to identify whether they represent useful knowledge. The last step is
presentation, where the patterns produced are visualized and the knowledge is
presented to the user.
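The steps above can be sketched as a small pipeline in which each function stands for one stage. The record fields, the two data sources and the choice of an average unit cost as the "pattern" are hypothetical simplifications, not part of the process described by Chen et al. (2015).

```python
from statistics import mean


def integrate(sources):
    """Integration: merge records from several sources and drop noisy ones."""
    merged = [record for source in sources for record in source]
    return [r for r in merged if r.get("cost") is not None and r["area"] > 0]


def extract(records, project_type):
    """Extraction: keep only the target data relevant to the question."""
    return [r for r in records if r["type"] == project_type]


def preprocess(records):
    """Preprocessing: derive the feature to be mined (cost per m2)."""
    return [r["cost"] / r["area"] for r in records]


def mine(unit_costs):
    """Mining: here simply a summary pattern, the average unit cost."""
    return {"mean_unit_cost": mean(unit_costs), "projects": len(unit_costs)}


def present(pattern):
    """Presentation: turn the pattern into a readable statement."""
    return (f"Average cost per m2 over {pattern['projects']} projects: "
            f"{pattern['mean_unit_cost']:.2f}")


# Hypothetical records from two data sources.
source_a = [{"type": "school", "area": 1000, "cost": 2_000_000},
            {"type": "office", "area": 800, "cost": 2_400_000}]
source_b = [{"type": "school", "area": 2000, "cost": 4_400_000},
            {"type": "school", "area": 1500, "cost": None}]

cleaned = integrate([source_a, source_b])
report = present(mine(preprocess(extract(cleaned, "school"))))
print(report)
```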
2.4 Data Mining in improving the accuracy of construction cost estimation
Existing theories on construction cost overrun suggest a number of causes, ranging
from technical difficulties and optimism bias to managerial incompetence and
strategic misrepresentation. However, much of the budgetary decision-making in the
early stages of a project is carried out in an environment of high uncertainty,
with little information available for accurate estimation. In the past few years,
it has been suggested that data mining can improve cost estimation under high
uncertainty and with little information available.
Ahiaga-Dagbui and Smith (2014) developed final project cost-forecasting models
through data mining, using non-parametric bootstrapping and ensemble modelling in
artificial neural networks with data from 1600 completed projects. This helped to
extract information embedded in data on completed construction projects and to
address the dearth of information in the early stages of a project. It was found
that 92% of the 100 validation predictions were within ±10% of the actual final
cost of the project, while 77% were within ±5% of the actual final cost. This
indicates the models’ ability to generalize satisfactorily when validated with new
data. The models are being deployed within the operations of the industry partner
involved in that research to help increase the reliability and accuracy of initial
cost estimates.
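The accuracy figures quoted above are shares of predictions that fall within a tolerance band around the actual final cost. A minimal sketch of that calculation is given below; the predicted and actual costs are invented and do not reproduce the study's data or its neural network models.

```python
def share_within_tolerance(predicted, actual, tolerance):
    """Fraction of predictions whose relative error is within +/- tolerance."""
    hits = sum(abs(p - a) / a <= tolerance for p, a in zip(predicted, actual))
    return hits / len(actual)


# Hypothetical predicted and actual final costs for five validation projects.
predicted = [102.0, 96.0, 130.0, 99.0, 108.0]
actual = [100.0, 100.0, 100.0, 100.0, 100.0]

within_10 = share_within_tolerance(predicted, actual, 0.10)
within_5 = share_within_tolerance(predicted, actual, 0.05)
print(within_10, within_5)
```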
2.5 Framework for data mining
The table below shows the framework for selecting the most suitable data mining
technique according to the requirement. The regression category is used when the
requirement is prediction. The technique used is regression analysis, which is
chosen for its flexibility. Next is the clustering category, which is used when the
requirement is pattern discovery. The technique used is the support vector machine
(SVM), chosen for its accuracy. After that is the classification category, which is
used when the requirement is surveillance. The surveillance is done with
self-organizing maps. The next category is visualization, which is used when the
requirement is performance. The technique applied is the genetic algorithm, which
is good in interpretability. Lastly, the summarization category is used when
measuring business understanding. It is mainly used because of its ease of
deployment.
Table 2: Framework for selecting data mining technique (Ahiaga-Dagbui & Smith, 2014)

Data mining category | Data mining requirement             | Data mining technique        | Technique characteristic
Regression           | Prediction                          | Regression                   | Flexibility
Clustering           | Pattern discovery                   | Support vector machine (SVM) | Accuracy (precision)
Classification       | Surveillance                        | Self-organizing maps         | Power
Visualization        | Performance                         | Genetic algorithm            | Interpretability
Summarization        | Measurement, business understanding | (none listed)                | Ease of deployment
After selecting the type of data mining to be used, the process moves through four
important phases (Weiss & Indurkhya, 2018). Phase 1 is data preparation, Phase 2
is data reduction, Phase 3 is data modelling and prediction, and Phase 4 is case
and solution analysis.
Phase 1: Data Preparation
[Figure 1 elements: Define Goals; Data Warehouse; Data Transformations; Initial
Standard Form; Time-dependencies]
Figure 1: Data Preparation (Weiss & Indurkhya, 2018)
The most crucial part of data mining is the preparation and transformation of
data. This task is seldom covered in the literature because some consider it too
application-specific. Figure 1 shows the procedure for data preparation, in which
raw data are moved into an initial standard form, the first complete spreadsheet.
This is done to constrain the original data to a relatively simple, uniform
representation that is almost universally acceptable to data mining methods. Some
of the data preparation tasks can be performed during the design of the data
warehouse, but many specialized transformations may still be needed.
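As an illustration of moving raw records into a standard spreadsheet form, the
sketch below converts a list of project records into one numeric row per case,
with each categorical value expanded into a 0/1 column. The function name
to_standard_form and the field names (floor_area_m2, storeys, type) are
assumptions made for this example; Weiss and Indurkhya (2018) do not prescribe
this particular encoding.

```python
def to_standard_form(records, numeric_fields, categorical_fields):
    """Convert raw records (dicts of strings) into a header and rows of numbers."""
    # Collect the sorted set of values observed for each categorical field
    levels = {f: sorted({r[f] for r in records}) for f in categorical_fields}
    header = list(numeric_fields) + [
        f"{f}={v}" for f in categorical_fields for v in levels[f]
    ]
    rows = []
    for r in records:
        row = [float(r[f]) for f in numeric_fields]  # parse numeric text
        for f in categorical_fields:
            # One 0/1 indicator column per observed category value
            row.extend(1.0 if r[f] == v else 0.0 for v in levels[f])
        rows.append(row)
    return header, rows


# Hypothetical raw project records (illustrative values only)
raw_projects = [
    {"floor_area_m2": "1200", "storeys": "3", "type": "school"},
    {"floor_area_m2": "800", "storeys": "2", "type": "office"},
]
header, rows = to_standard_form(raw_projects, ["floor_area_m2", "storeys"], ["type"])
print(header)
print(rows)
```

Each row then describes one case using the same ordered set of numeric features,
which is the uniform representation the later phases operate on.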
Phase 2: Data Reduction
[Figure 2 elements: Initial Standard Form; Initial Training Set; Initial Test Set;
Data-Reduction Methods; Value and Feature Reductions; Reduced-Data Standard Form;
Final Train Set; Final Test Set]
Figure 2: Data Reduction (Weiss & Indurkhya, 2018)
In theory, having a large amount of data for training and testing is good, but in
practice the data may be too big. The dimensions can exceed the capacity of a
prediction program, or the program may take too long to process the data and
produce a solution. Large data sets also make it costly to repeat experiments.
Once the data are in standard form, there are a number of effective techniques to
reduce their dimensions. Figure 2 illustrates the principal steps of data
reduction. Given the standard-form spreadsheet, the data are reduced by either
features or values, and a new spreadsheet is produced. When the dimensions of the
standard form are within acceptable bounds, data reduction may be bypassed.
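The reduction step can be sketched as follows. This example performs one simple
kind of feature reduction, removing columns whose values never vary, and bypasses
reduction when the number of features is already within the limit. The variance
criterion, the function name reduce_features and the sample values are
assumptions chosen for illustration; they stand in for whichever reduction
method is actually selected.

```python
from statistics import pvariance


def reduce_features(header, rows, max_features, min_variance=0.0):
    """Drop low-variance columns unless the data are already small enough."""
    if len(header) <= max_features:
        return header, rows  # dimensions acceptable: bypass reduction
    # Keep only the columns whose variance exceeds the threshold
    keep = [
        j for j, column in enumerate(zip(*rows))
        if pvariance(column) > min_variance
    ]
    new_header = [header[j] for j in keep]
    new_rows = [[row[j] for j in keep] for row in rows]
    return new_header, new_rows


# Hypothetical standard-form data; "has_site" is constant and carries no information
header = ["area", "storeys", "has_site"]
rows = [[1200.0, 3.0, 1.0], [800.0, 2.0, 1.0], [950.0, 2.0, 1.0]]

reduced_header, reduced_rows = reduce_features(header, rows, max_features=2)
print(reduced_header)
```

Note that this sketch only removes uninformative columns; it does not guarantee
that the result has at most max_features columns, so a stricter selection rule
would be needed where a hard limit applies.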
Phase 3: Data Modelling and Prediction
[Figure 3 elements: Available Training Data; Subset of cases; Prediction Method;
Solution; Validation Test Set; Compare to Best; Change parameter]
Figure 3: Iterative Data Modelling and Prediction (Weiss & Indurkhya, 2018)
The training data obtained from the previous phase are then divided into subsets
of cases. Each subset is processed by the prediction method to produce a
solution. The solution is then evaluated on the validation test set and compared
with the best solution found so far. If the goal has not yet been achieved, the
parameters are changed and the prediction method is run again.
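The loop described above (predict, validate, compare to best, change parameter)
can be sketched as a simple parameter search. The prediction method used here is
a k-nearest-neighbour average on a single feature, chosen only because it is
short; it is an assumption for illustration and not the method used in the cited
studies. All names and data values are hypothetical.

```python
def knn_predict(train, x, k):
    """Predict cost as the mean cost of the k training cases nearest to x."""
    nearest = sorted(train, key=lambda case: abs(case[0] - x))[:k]
    return sum(cost for _, cost in nearest) / k


def mean_abs_error(train, validation, k):
    """Average absolute prediction error on the validation test set."""
    return sum(abs(knn_predict(train, x, k) - y) for x, y in validation) / len(validation)


def search_parameter(train, validation, candidates):
    """Try each parameter value and keep the one with the lowest validation error."""
    best_k, best_error = None, float("inf")
    for k in candidates:                                 # change parameter
        error = mean_abs_error(train, validation, k)     # predict and validate
        if error < best_error:                           # compare to best
            best_k, best_error = k, error
    return best_k, best_error


# Hypothetical cases: (floor area, cost), illustrative values only
train = [(100, 10.0), (200, 20.0), (300, 30.0), (400, 40.0)]
validation = [(150, 15.0), (350, 35.0)]

best_k, best_error = search_parameter(train, validation, [1, 2, 4])
print(best_k, best_error)
```

For these values the search selects k = 2, which reproduces both validation costs
exactly; k = 1 and k = 4 give larger errors.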
Phase 4: Case and Solution Analysis
[Figure 4 elements: Available Training Data; Increment Cases; Subset of Cases;
Prediction Method; Solution; Validation Test Set; Error Analysis]
Figure 4: Case and solution analysis (Weiss & Indurkhya, 2018)
This phase is similar to the previous one. The difference is that the number of
cases drawn from the available training data is incremented; incrementing the
cases is an attempt to reduce error. The enlarged subset of cases is then run
through the prediction method. When a solution is produced, error analysis is
carried out on it using the validation test set. If the error is still
unacceptable, the cases are incremented again.
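A minimal sketch of this case-increment loop is shown below. It grows the
training subset step by step and records the validation error at each size,
stopping once the error reaches a target or the available cases run out. The
nearest-case predictor, the function names and the data values are assumptions
made for illustration only.

```python
def nearest_cost(train, x):
    """Predict cost from the single training case closest to x."""
    return min(train, key=lambda case: abs(case[0] - x))[1]


def validation_error(train, validation):
    """Average absolute prediction error on the validation test set."""
    return sum(abs(nearest_cost(train, x) - y) for x, y in validation) / len(validation)


def increment_cases(available, validation, start, step, target_error):
    """Grow the training subset until the error target is met or cases run out."""
    history = []
    size = start
    while True:
        subset = available[:size]                      # subset of cases
        error = validation_error(subset, validation)   # error analysis
        history.append((len(subset), error))
        if error <= target_error or size >= len(available):
            return history
        size += step                                   # increment cases


# Hypothetical cases: (floor area, cost), illustrative values only
available = [(100, 10.0), (400, 40.0), (200, 20.0), (300, 30.0)]
validation = [(210, 21.0), (290, 29.0)]

history = increment_cases(available, validation, start=2, step=1, target_error=1.0)
print(history)
```

The recorded history shows the validation error falling as cases are added, which
is the pattern the error analysis in this phase is intended to reveal.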
CHAPTER THREE
References
Adnan Enshassi, S. M., Ibrahim Madi. (2005). Factors affecting accuracy of cost estimation
of building contracts in the Gaza strip. 10(2), 115-125.
Adnan Enshassi, S. M., Ibrahim Madi. (2007). Cost Estimation Practice in the Gaza Strip: A
Case Study. IUG Journal for Natural and Engineering Studies, 15(2), 153-177.
Agrawal, R., Imieliński, T., & Swami, A. (1993). Mining association rules between sets of
items in large databases. Paper presented at the ACM SIGMOD Record.
Agyapong, K. B., Hayfron-Acquah, J., & Asante, M. (2016). An Overview of Data Mining
Models (Descriptive and Predictive).
Ahiaga-Dagbui, D. D., & Smith, S. D. (2014). Dealing with construction cost overruns using
data mining. Construction Management and Economics, 32(7-8), 682-694.
doi:10.1080/01446193.2014.933854
Ahmed, V., Tezel, A., Aziz, Z., & Sibley, M. (2017). The future of Big Data in facilities
management: opportunities and challenges. 35(13/14), 725-745. doi:10.1108/F-06-
2016-0064
Aibinu, A. A. P., Thomas. (2008). The accuracy of pre-tender building cost estimates in
Australia. Construction Management and Economics, 26(12), 1257-1269.
Akintoye, A. (2000). Analysis of factors influencing project cost estimating practice.
Construction Management and Economics, 18(1), 77-89.
doi:10.1080/014461900370979
Ansari, S., Chetlur, S., Prabhu, S., Kini, G. N., Hegde, G., Hyder, Y. J. I. J. o. E. T., &
Engineering, A. (2013). An overview of clustering analysis techniques used in data
mining. 3(12), 284-286.
Azman, M. A., Abdul-Samad, Z., & Ismail, S. J. I. J. o. P. M. (2013). The accuracy of
preliminary cost estimates in Public Works Department (PWD) of Peninsular
Malaysia. 31(7), 994-1005.
Chan, S. L., & Park, M. (2005). Project cost estimation using principal component regression.
Construction Management and Economics, 23(3), 295-304.
doi:10.1080/01446190500039812
Chau, M., Cheng, R., Kao, B., & Ng, J. (2006). Uncertain Data Mining: An Example in
Clustering Location Data. Lecture Notes in Computer Science, 199–204
doi:10.1007/11731139_24
Chen, F., Deng, P., Wan, J., Zhang, D., Vasilakos, A. V., & Rong, X. (2015). Data Mining for
the Internet of Things: Literature Review and Challenges. 11(8), 431047.
doi:10.1155/2015/431047
Cheng, Y.-M. (2014). An exploration into cost-influencing factors on construction projects.
32(5), 850-860.
Delen, D., & Demirkan, H. (2013). Data, information and analytics as services. In: Elsevier.
Elcin Tas, H. Y. (2005). A building cost estimation model based on cost significant work
packages. 12(3), 251-263.
Elhag, T. M. S., Boussabaine, A. H., & Ballal, T. M.A. . (2005). Critical determinants of
construction tendering costs: Quantity surveyors’ standpoint. 23(7), 538-545.
Esling, P., & Agon, C. J. A. C. S. (2012). Time-series data mining. 45(1), 12.
Fu, T.-c. J. E. A. o. A. I. (2011). A review on time series data mining. 24(1), 164-181.
Gogoi, P., Bhattacharyya, D., Borah, B., & Kalita, J. K. J. T. C. J. (2011). A survey of outlier
detection methods in network anomaly identification. 54(4), 570-588.
Gosain, A., & Bhugra, M. (2013). A comprehensive survey of association rules on
quantitative data in data mining. Paper presented at the Information &
Communication Technologies (ICT), 2013 IEEE Conference on.
Holm, L., Schaufelberger, J. E., Griffin, D., & Cole, T. (2005). Construction cost estimating:
process and practices: Pearson Prentice Hall Upper Saddle River, New Jersey.
Jain, A. K., & Dubes, R. C. (1988). Algorithms for clustering data.
Jaseena, K., David, J. M. J. C. S., & Technology, I. (2014). Issues, challenges, and solutions:
Big data mining. 131-140.
Jiawei Han, M. K., Jian Pei. (2011). Data Mining. Concepts and Techniques, 3rd Edition-
Morgan Kaufmann
Kesavaraj, G., & Sukumaran, S. (2013). A study on classification techniques in data mining.
Paper presented at the Computing, Communications and Networking Technologies
(ICCCNT), 2013 Fourth International Conference on.
Koleola T. Odusami, H. N. O. (2008). Factors affecting the accuracy of a pre-tender cost
estimate in Nigeria. 50(9), 32.
Lechevalier, D., Narayanan, A., & Rachuri, S. (2014). Towards a domain-specific framework
for predictive analytics in manufacturing. Paper presented at the BigData
Conference.
Li Liu, K. Z. (2007). Improving cost estimates of construction projects using phased cost
factors. 133(1), 91-95.
MOTI, M. o. T. a. I. (2013). PROJECT COST ESTIMATING GUIDELINES Version 01.02.
Blanshard Street, Victoria BC
Noor Akmal Adillah Ismail, Erezi Utiome, Robert Owen, & Drogemuller, R. (2015). Exploring
Accuracy Factors in Cost Estimating Practice towards Implementing Building
Information Modelling (BIM).
Peurifoy, R. L., & Oberlender, G. D. (2008). Estimating Construction Costs: McGraw-Hill
Education.
Piatetsky-Shapiro, G., Matheus, C., Smyth, P., & Uthurusamy, R. J. A. m. (1994). Kdd-93:
Progress and challenges in knowledge discovery in databases. 15(3), 77.
Serpell, A. F. (2004). Towards a knowledge-based assessment of conceptual cost estimates.
32(2), 157-164.
Soibelman, L., & Kim, H. (2002). Data Preparation Process for Construction Knowledge
Generation through Knowledge Discovery in Databases. Journal of Computing in
Civil Engineering, 16(1), 39-48. doi:10.1061/(ASCE)0887-3801(2002)16:1(39)
Steven M. Trost, G. D. O. (2003). Predicting accuracy of early cost estimates using factor
analysis and multivariate regression. 129(2), 198-204.
Stoy, C., Pollalis, S., Schalcher, H.-R. J. J. o. C. E., & Management. (2008). Drivers for cost
estimating in early design: Case study of residential construction. 134(1), 32-39.
Sumathi, S., & Sivanandam, S. (2006). Introduction to data mining and its applications (Vol.
29): Springer.
Sumera Ahmad, G. R. B. (2017). Software Cost Estimation Using Data Mining: Review.
International Journal of Scientific Development and Research (IJSDR), 2(7), 181-183.
Weiss, S., & Indurkhya, N. (2018). Predictive data mining: A practical guide.
Williams, J. (2013). Estimating for Building & Civil Engineering Work: Routledge.
