Big Data Computing
Lecture # 2

Dr. Syed Saood Zia
Assistant Professor
Software Engineering Department
Sir Syed University of Engineering & Technology
Content
• Idea of Big Data
• Example Models
• Relevance of Big Data
• Key Computing Resources for Big Data
• Scalability — Scale Up & Scale Out
• Techniques towards Big Data
• Why Big Data now?
• Contrasting Approaches in Adopting High-Performance Capabilities
• Big Data Market
Idea of Big Data
• Methods of obtaining knowledge
▫ Cycle: theory (model), hypothesis, experiment, analysis (repeat)
▫ Explorative: start with observations of phenomena and derive a theory
▫ Constructivism: start with axioms and reason about their implications
• (Big) Data + Analytics ==> Insight (prediction of the future)
▫ For industry: insight = business advantage and money...
• Analytics: follow an explorative approach and study the data
▫ To infer knowledge, use statistics / machine learning
• Construct a theory (model) and validate it with the data
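The "construct a model and validate it with the data" step can be sketched as follows. This is a minimal illustration, assuming a straight-line model and made-up observations; the function names `fit_line` and `mean_abs_error` and all data values are hypothetical and not part of the lecture.

```python
def fit_line(xs, ys):
    """Fit y = a*x + b by ordinary least squares and return (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def mean_abs_error(model, xs, ys):
    """Average absolute gap between the model's predictions and the data."""
    a, b = model
    return sum(abs(a * x + b - y) for x, y in zip(xs, ys)) / len(xs)


# Hypothetical observations: build the model on one part of the data ...
train_x = [1, 2, 3, 4]
train_y = [2.1, 3.9, 6.2, 7.8]
# ... and validate it against data the model has not seen.
test_x = [5, 6]
test_y = [10.1, 11.9]

model = fit_line(train_x, train_y)
error = mean_abs_error(model, test_x, test_y)
print("slope, intercept:", model)
print("validation error:", error)
```

A small validation error suggests the theory is consistent with the data; a large one sends us back to the hypothesis step.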
Example Models
• Similarity is a (very) simplistic model and predictor for the world
▫ Humans use this approach in their cognitive process
▫ Takes advantage of Big Data
• Weather prediction
▫ You may develop and rely on complex models of physics
▫ Or use a simple model for a particular day; e.g. expect it to be similar to the weather on that day over the last X years.
• Preferences of Humans
▫ Identify a set of people who liked the items you like
▫ Predict that you will also like the other items those people like (items you haven’t rated so far)
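Both similarity models above can be sketched in a few lines. This is only an illustration under stated assumptions: the data layout (a dict of years for the weather, a dict of liked-item sets for the preferences), the function names, and all values are hypothetical.

```python
def predict_temperature(history, day, years):
    """Predict the value for `day` as the average over the last `years` years.

    `history` maps year -> {day: temperature}. Assumes at least one of the
    selected years has a record for `day`.
    """
    recent = sorted(history)[-years:]
    values = [history[year][day] for year in recent if day in history[year]]
    return sum(values) / len(values)


def recommend(likes, user):
    """Suggest unrated items liked by people who share liked items with `user`.

    `likes` maps person -> set of liked items. Items are ranked by the total
    overlap of the people who like them, ties broken alphabetically.
    """
    mine = likes[user]
    scores = {}
    for other, theirs in likes.items():
        if other == user:
            continue
        overlap = len(mine & theirs)
        if overlap == 0:
            continue  # not similar to `user`, so ignore this person
        for item in theirs - mine:
            scores[item] = scores.get(item, 0) + overlap
    return sorted(scores, key=lambda item: (-scores[item], item))


# Hypothetical data
history = {2021: {"03-01": 10.0}, 2022: {"03-01": 12.0}, 2023: {"03-01": 14.0}}
likes = {
    "ana": {"A", "B", "C"},
    "ben": {"A", "B", "D"},
    "cy": {"C", "E"},
    "dee": {"F"},
}
print(predict_temperature(history, "03-01", years=2))
print(recommend(likes, "ana"))
```

Neither function encodes any physics or psychology; the more past days and ratings are available, the better such a simple predictor tends to work, which is where Big Data helps.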
Relevance of Big Data
• Big Data Analytics is emerging
• Relevance increases compared to supercomputing

Figure: Google Search Trends, relative searches
Key Computing Resources for Big Data
Source: “Big Data Analytics”, David Loshin, 2013
• Processing capability: CPU, processor, or node.
• Memory
• Storage
• Network
Scalability — Scale Up & Scale Out
• Scale out
▫ Use more resources to distribute workload in parallel
▫ Higher data access latency is typically incurred
• Scale up
▫ Efficiently use the resources
▫ Architecture-aware algorithm design
Scalability — Scale Up & Scale Out

• For independent data ==> scale up may have no obvious advantage over scale out
• For linked data ==> use scale up as much as possible before scaling out
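Scaling out over independent data can be sketched as partitioning the data and handing each chunk to a separate worker. This is a minimal sketch, assuming that threads from Python's standard `concurrent.futures` module stand in for the nodes of a cluster; the function names and the sum-of-squares workload are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(data, parts):
    """Split `data` into `parts` contiguous chunks of nearly equal size."""
    size, extra = divmod(len(data), parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def chunk_sum_of_squares(chunk):
    """Work done by one worker; it needs no data from any other chunk."""
    return sum(x * x for x in chunk)


def scale_out_sum_of_squares(data, workers):
    """Distribute independent chunks to workers, then combine partial results."""
    chunks = partition(data, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(chunk_sum_of_squares, chunks))
    return sum(partials)


print(scale_out_sum_of_squares(list(range(10)), workers=3))
```

Because the chunks are independent, adding workers needs no communication between them. With linked data (for example a graph), a worker would need values held by other workers, which is why scale up is preferred there before scaling out.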
Techniques towards Big Data
• Massive Parallelism
• Huge Data Volumes Storage
• Data Distribution
• High-Speed Networks
• High-Performance Computing
• Task and Thread Management
• Data Mining and Analytics
• Data Retrieval
• Machine Learning
• Data Visualization
➔ These techniques have existed for years to decades. Why is Big Data hot now?
Why Big Data now?
• More data are being collected and stored
• Open source code
• Commodity hardware / Cloud
➔ High-Volume, High-Velocity, High-Variety
➔ Artificial Intelligence
Contrasting Approaches in Adopting High-Performance Capabilities
• Application Development
▫ Typical scenario: applications that take advantage of massive parallelism, developed by specialized developers skilled in high-performance computing, performance optimization, and code tuning.
▫ Big Data: a simplified application execution model encompassing a distributed file system, application programming model, distributed database, and program scheduling is packaged within Hadoop, an open source framework for reliable, scalable, distributed and parallel computing.
• Platform
▫ Typical scenario: uses high-cost massively parallel processing (MPP) computers, utilizing high-bandwidth networks and massive I/O devices.
▫ Big Data: innovative methods of creating scalable and yet elastic virtualized platforms take advantage of clusters of commodity hardware components (cloud-based utility computing services) coupled with open source tools and technology.
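The simplified programming model that Hadoop packages is MapReduce. Its three steps (map, shuffle, reduce) can be sketched in plain Python with the classic word count. This is an in-memory illustration only: it does not use any Hadoop API, and all function names are hypothetical.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values of one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map over every document, shuffle, then reduce every key."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


print(word_count(["big data", "Big compute"]))
```

The developer writes only the map and reduce functions. In a real framework the documents would live in a distributed file system, and the scheduling of map and reduce tasks across the cluster would be handled by the framework rather than by the application.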
Contrasting Approaches in Adopting High-Performance Capabilities
• Data Management
▫ Typical scenario: limited to file-based or relational database management systems (RDBMS) using standard row-oriented data layouts.
▫ Big Data: alternative models for data management (i.e. NoSQL) provide a variety of methods for managing information to best suit specific business process needs, such as in-memory data management (for rapid access), columnar layouts to speed query response, and graph databases (for social network analytics).
• Resources
▫ Typical scenario: requires large capital investment in purchasing high-end hardware to be installed and managed in-house.
▫ Big Data: the ability to deploy systems like Hadoop on virtualized platforms allows small and medium businesses to utilize cloud-based environments.
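The difference between the row-oriented layout of a typical RDBMS and the columnar layout mentioned under Data Management can be sketched as follows. This is a toy in-memory illustration, assuming every row has the same fields; the table contents and the name `to_columns` are hypothetical.

```python
# Row-oriented layout: each record is stored together.
rows = [
    {"id": 1, "region": "north", "sales": 120},
    {"id": 2, "region": "south", "sales": 80},
    {"id": 3, "region": "north", "sales": 200},
]


def to_columns(rows):
    """Convert a list of records into one list of values per field."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


# Columnar layout: each field is stored together.
columns = to_columns(rows)

# The same aggregate query on both layouts.
total_from_rows = sum(row["sales"] for row in rows)  # touches every record
total_from_columns = sum(columns["sales"])           # touches one column only
print(total_from_rows, total_from_columns)
```

Both layouts give the same answer, but the columnar query reads only the `sales` values, which is why columnar stores can speed up analytical queries that scan a few fields over many records.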
Big Data Market
Human brain is a graph / network of 100 B nodes and 1 T edges
Summary
• Idea of Big Data
• Example Models
• Relevance of Big Data
• Key Computing Resources for Big Data
• Scalability — Scale Up & Scale Out
• Techniques towards Big Data
• Why Big Data now?
• Contrasting Approaches in Adopting High-Performance Capabilities
• Big Data Market
