Big data is a term for data sets that are so large or complex that
traditional data processing application software is inadequate to
deal with them.
But…
The term "big data" often refers simply to the use of predictive
analytics, user behaviour analytics, or certain other advanced data
analytics methods that extract value from data, and seldom to a
particular size of data set.
And…
What counts as "big data" varies depending on the capabilities of
the users and their tools, and expanding capabilities make big data a
moving target.
https://en.wikipedia.org/wiki/Big_data
Saturday, 21 March 2020 10
Objective of Big Data
[Figure: data grows in volume, variety and velocity, from OLTP (1X) to 10X and 100X. Source labels shown: video recording, human, sensors, enterprise content, M2M, log files, external sources, documents, email, business process, satellite imaging, bioinformatics, web logs, database data, social.]
More Data with More Complex Relationships…in Real Time and At Scale
To manage, govern and analyze
Big Data Sources: Machine Generated Data
Advantages:
• Remote monitoring
• Real time monitoring
See: http://greywale.blogspot.co.uk/2014/06/iot-success-batteries-backhaul.html
Big Data Sources: People
Examples
• Social Media
• Blogging
• Commenting
• Email
• Images
Text heavy
Unstructured → doesn’t conform to a predefined data model
(doesn’t “fit” with traditional databases)
Big Data Sources: Organizations
Examples
• Transactions
• Customer Relationship Management
• Website click stream
• Medical records
• Survey Data
Distributed Computing
• Split data, support large data volumes
• Computing nodes connected by network
• Fast access
• Fault tolerance
• Optimized for a variety of data types
• Shared environment
• Increased performance/reduced costs
• Many applications (large support community)
• Support for Scaling Out – adding more computing nodes
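The idea of splitting data across computing nodes and combining their partial results can be sketched on a single machine. This is only an illustration under stated assumptions: worker threads stand in for networked nodes, the review texts are made-up sample data, and the function names are hypothetical rather than part of any particular framework.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_partitions(records, num_partitions):
    """Split the records round-robin so each worker gets a similar share."""
    return [records[i::num_partitions] for i in range(num_partitions)]


def count_words(partition):
    """'Map' step: count words within a single partition."""
    counts = Counter()
    for text in partition:
        counts.update(text.lower().split())
    return counts


def distributed_word_count(records, num_workers=3):
    """Count words per partition in parallel, then merge the partial results."""
    partitions = split_into_partitions(records, num_workers)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partial_counts = list(pool.map(count_words, partitions))
    total = Counter()
    for counts in partial_counts:  # 'reduce' step: merge partial counts
        total.update(counts)
    return total


# Hypothetical sample data standing in for a large collection of reviews.
reviews = [
    "great beach great food",
    "quiet beach",
    "food was great",
    "long queue at the museum",
]
word_counts = distributed_word_count(reviews, num_workers=3)
print(word_counts.most_common(3))
```

Raising `num_workers` mimics scaling out by adding nodes: the merged result stays the same while each worker handles a smaller share of the data.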
Value comes from the integration of multiple sources and types of data
Data Acquisition → Data Preparation → Data Analysis → Data Report → Data Usage/Decision
Data Acquisition
Popular APIs
Twitter
Instagram
Facebook
LinkedIn
Pinterest
Foursquare
Flickr
Google Plus
OpenStreetMap
Data Preparation
1. Clean
• Invalid data
• Missing Values
• Duplicate Records
• Outliers
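The four cleaning tasks listed above can be illustrated with a small sketch. It assumes made-up booking records and simple policies (drop rather than impute, and treat prices above five times the median as outliers); a real project would choose these rules to suit its data.

```python
from statistics import median

# Hypothetical booking records: (booking_id, nights, price_per_night)
raw_records = [
    ("B001", 3, 120.0),
    ("B002", 2, None),      # missing value
    ("B003", -1, 95.0),     # invalid: negative number of nights
    ("B001", 3, 120.0),     # duplicate of the first record
    ("B004", 4, 110.0),
    ("B005", 2, 130.0),
    ("B006", 5, 105.0),
    ("B007", 1, 5000.0),    # outlier price
]


def clean(records, outlier_factor=5.0):
    """Drop missing, invalid, duplicate and outlying records, in that order."""
    # 1. Missing values: drop any record with an empty field.
    complete = [r for r in records if all(field is not None for field in r)]
    # 2. Invalid data: nights and price must be positive.
    valid = [r for r in complete if r[1] > 0 and r[2] > 0]
    # 3. Duplicate records: keep the first occurrence of each record.
    unique = list(dict.fromkeys(valid))
    # 4. Outliers: drop prices above outlier_factor times the median price
    #    (the factor is an assumed threshold, not a standard value).
    typical_price = median(r[2] for r in unique)
    return [r for r in unique if r[2] <= outlier_factor * typical_price]


cleaned = clean(raw_records)
for record in cleaned:
    print(record)
```

Dropping records is only one option; missing values could instead be imputed, and outliers could be flagged for review rather than removed.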
Data Analysis
1. Classification
Predict category (will the visitor come back?)
2. Clustering
Organize similar items into groups (market segmenting)
3. Regression
Predict numeric value (forecasting visitor arrivals)
4. Association Analysis
Find rules to capture associations between items (market basket analysis →
recommender systems)
5. Graph Analytics
Find connections between entities (Visitor Flow Analysis)
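As one illustration of the techniques listed above, association analysis can be sketched with the usual support and confidence measures. The baskets below are made-up sample data, the thresholds are arbitrary assumptions, and only rules between pairs of items are considered to keep the example short.

```python
from itertools import permutations

# Hypothetical baskets: items bought or visited together by one visitor.
baskets = [
    {"museum", "cafe", "boat tour"},
    {"museum", "cafe"},
    {"museum", "boat tour"},
    {"cafe", "boat tour"},
    {"museum", "cafe", "gift shop"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def pair_rules(transactions, min_support=0.5, min_confidence=0.7):
    """Return rules (antecedent, consequent, support, confidence) for item pairs."""
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in permutations(items, 2):
        pair_support = support({a, b}, transactions)
        if pair_support < min_support:
            continue  # the pair is too rare to be worth a rule
        # Confidence of "a -> b": how often b appears when a appears.
        confidence = pair_support / support({a}, transactions)
        if confidence >= min_confidence:
            rules.append((a, b, pair_support, confidence))
    return rules


rules = pair_rules(baskets)
for a, b, s, c in rules:
    print(f"{a} -> {b}: support={s:.2f}, confidence={c:.2f}")
```

A rule such as "cafe -> museum" could then feed a simple recommender: visitors who chose the antecedent are shown the consequent.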
Select Technique → Build Model → Validate Model
Reporting
Data Usage/Decision
Structured Decisions
– established situation, programmable decision, situation fully understood, routine
Unstructured Decisions
– emergent situation, creative decision, situation unclear, one-shot, general processes
Semi-structured Decisions
– have some structured elements and some unstructured elements
From SHARDA, RAMESH; DELEN, DURSUN; TURBAN, EFRAIM, BUSINESS INTELLIGENCE AND
ANALYTICS: SYSTEMS FOR DECISION SUPPORT, 10th Edition, © 2015. Used by permission of
Pearson Education, Inc., New York, NY. All Rights Reserved.
Business Intelligence Applications
1. Intelligence Phase
Enabling continuous scanning of external and internal information sources
to identify problems and/or opportunities
2. Design Phase (generating alternatives)
Structured/simple problems → standard and/or special models
Unstructured/complex problems → human experts, brainstorming, OLAP, data/text mining
3. Choice Phase
Use sensitivity analyses, what-if analyses, goal seeking
Simulation and other descriptive models
4. Implementation Phase
Decision communication, explanation and justification to reduce resistance
to change
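The choice-phase tools named above (what-if analysis, sensitivity analysis and goal seeking) can be sketched against a toy model. Everything here is an assumption for illustration: the profit formula, the cost figures and the function names are hypothetical, and bookings are assumed not to react to price.

```python
def package_profit(price, bookings, cost_per_booking=400.0, fixed_costs=20000.0):
    """Hypothetical profit model for a holiday package (all figures assumed)."""
    return (price - cost_per_booking) * bookings - fixed_costs


def what_if(price, scenarios):
    """What-if analysis: evaluate profit under named booking scenarios."""
    return {name: package_profit(price, n) for name, n in scenarios.items()}


def sensitivity(price, bookings, change=0.10):
    """Sensitivity analysis: profit change when price moves by +/- change."""
    base = package_profit(price, bookings)
    return {
        "price_down": package_profit(price * (1 - change), bookings) - base,
        "price_up": package_profit(price * (1 + change), bookings) - base,
    }


def goal_seek_price(target_profit, bookings, low=0.0, high=10000.0, tol=0.01):
    """Goal seeking: bisect for the price that yields target_profit.

    Relies on profit increasing with price, which holds for this toy model.
    """
    while high - low > tol:
        mid = (low + high) / 2
        if package_profit(mid, bookings) < target_profit:
            low = mid
        else:
            high = mid
    return (low + high) / 2


scenarios = {"pessimistic": 100, "expected": 200, "optimistic": 300}
what_if_results = what_if(600.0, scenarios)
sensitivity_results = sensitivity(600.0, 200)
required_price = goal_seek_price(50000.0, 200)

print(what_if_results)
print(sensitivity_results)
print(round(required_price, 2))
```

What-if analysis asks what happens under a given scenario, sensitivity analysis asks how strongly the outcome reacts to one input, and goal seeking works backwards from a desired outcome to the input that produces it.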
Big Data
Stienmetz, J. L. (2018). Deconstructing Visitor Experiences: Structure and Sentiment. In Information and
Communication Technologies in Tourism 2018 (pp. 489-500). Springer, Cham.
[Figure: process steps; recoverable labels: 1. Photos Uploaded, 2. VGI data, 4. Path Aggregation.]
[Chart: Residents vs. Visitors; categories: 1 day, 2 days, 3 days, 4 days, 5 days, 6 days, 7 days, 8-14 days, 15-21 days, 22 to 31 days; axis ticks from 0.03 to 0.055.]
Responsible Practice
Imagine that you are the chief information officer for Thomson Travel and that
you have been asked to use Big Data to make recommendations for the
creation of a new holiday package for the millennial market. Describe, using
examples and illustrations, the five-step process you would follow using Big
Data. Discuss both the challenges and benefits of using Big Data for this
purpose.
Fuchs, M., Höpken, W., & Lexhagen, M. (2014). Big data analytics for
knowledge generation in tourism destinations. Journal of Destination Marketing
& Management, 3, 198–209.