
BDA Technologies & Selected Case Studies
Ettikan Kandasamy Karuppiah (Ph.D),
Principal Researcher & Director of Accelerative Technologies Lab
MIMOS Berhad

SEMINAR INTERNET COMPUTING TECHNOLOGY


Theme: Delivering Values From Hyperconnectivities
2.00-2.45 pm @ Bilik Serbaguna 1 (Multipurpose Room 1), MAMPU

19th January 2015

Big Data Analytics at a Glance


Big data is defined by the high volume, velocity, variety, veracity and value of data generated every second, minute, hour and day by devices, humans, etc.

VOLUME (growing data): 90% of the world's data was generated over the last 2 years.
VARIETY (broadening data): 80% of the world's data is unstructured (text, geospatial, audio, video).
VELOCITY (increasing data): 175,000 tweets per second.
VERACITY (establishing the veracity of big data sources): Big Data technology allows us to establish quality and accuracy, especially in unstructured data.
VALUE (turning big data into value): economic benefits, government benefits and societal benefits.

Big Data Computing in the ICT Sector

The Malaysian ICT services sub-sector has huge growth potential, with a projected 35% share of the nation's Digital Economy in 2020.

Business value: requires a transformative platform.

Source: MDEC, as taken from the APeJ Big Data MaturityScape Assessment 2013 by IDC.

Software Solutions and Support Are the Key GDP Contributor

MIMOS BigData Technologies R&D

Phases: Acquire, Train, R&D

- Intel Malaysia/US MoU: multicore Java compiler
- Conducted workshop and Hadoop programming training for the Malaysian research community
- Sentiment analysis model, data modeling and data warehouse for PIK MOH; GPGPU video data analytics library
- AMD Malaysia/US/Europe MoU
- nVidia COE for GPGPU established
- Data cleansing engine and data warehouse for PERKESO
- ESRI Inc./US MoU established
- Established work on general-purpose graphics processing units (GPGPU) for text manipulation; Hadoop trainings
- GE13 electoral roll analysis with Hadoop and GPU
- GPU-accelerated libraries for data cleansing and financial risk modeling
- R&D collaboration: MiAccLib Cleansing
- High-risk profiling, illicit, taxable and drugs detection (PoC)
- Accelerated libraries for databases: accelerator library (Galactica)
- Data modeling and visualization for PDRM workforce planning; GPGPU data security library
- Data encryption/decryption for national data protection
- MiAccLib family: MiAccLib Crypto, MiAccLib Image, MiAccLib BigData, MiAccLib Video, MiAccLib Finance, MiAccLib Algo/Map, MiAccLib Cleansing

RM10 (10th Malaysia Plan) -> Foundation & Early Adaptation for Heterogeneous Computing
RM11 (11th Malaysia Plan) -> Maturation & Progressive Deployment of Scalable Heterogeneous Computing

© 2014 MIMOS Berhad. All Rights Reserved.

Assisting Both Government & Private Sector Needs

- National public sector
- Private sector to go global

DECISIONS REQUESTED
FCC is requested to:
1. Take note of data science upskilling for civil servants.
2. Take note of MAMPU developing the Government Open Data framework by 2015.
3. Endorse the DG Lab on BDA to identify use cases and pilot projects that address societal wellbeing.
4. Take note of MIMOS defining and developing the Big Data technology platform for Government by 2015.
5. Mandate the opening up of all relevant data (Open/Non-Open) to the DG Lab on BDA for the pilot projects.

Source: MDeC

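Opening Up Non-Sensitive Government Data

Policy | Technology

Government data classification levels, from most to least sensitive: Rahsia Besar (Top Secret), Rahsia (Secret), Sulit (Confidential), Terhad (Restricted), Terbuka (Open).

- Policy for all government agencies to open up data categorised as Terbuka (Open)
  - E.g. non-sensitive data such as meteorology, transport timetables and pricing of essential goods, selected based on Open Data criteria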
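The Terbuka rule above can be expressed as a simple filter over a dataset catalogue. This is a minimal sketch, assuming a hypothetical catalogue in which each dataset is tagged with one of the five categories; the dataset names and the `open_data_candidates` helper are illustrative only, and the sketch does not model the Open Data criteria themselves.

```python
from enum import IntEnum


class Classification(IntEnum):
    """Data categories, ordered from least to most sensitive."""
    TERBUKA = 0       # Open
    TERHAD = 1        # Restricted
    SULIT = 2         # Confidential
    RAHSIA = 3        # Secret
    RAHSIA_BESAR = 4  # Top Secret


def open_data_candidates(catalogue):
    """Return names of datasets categorised as Terbuka (hypothetical helper)."""
    return [name for name, level in catalogue.items()
            if level == Classification.TERBUKA]


# Hypothetical catalogue; names are for illustration only.
catalogue = {
    "meteorology_daily": Classification.TERBUKA,
    "bus_timetables": Classification.TERBUKA,
    "essential_goods_prices": Classification.TERBUKA,
    "staff_records": Classification.SULIT,
}

print(open_data_candidates(catalogue))
# ['meteorology_daily', 'bus_timetables', 'essential_goods_prices']
```

Using an ordered enum also lets an agency express rules such as "everything at Terhad or above stays in the sandbox" as a single comparison.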

Developing BDA Open Innovation Platform

An open-innovation platform between Government, businesses and the Rakyat (citizens) to improve e-participation and user satisfaction, prioritised through the development of high-impact, low-cost, demand-driven life-event solutions.

- Data: community data and government data, with a secure environment (sandbox) for government data; open data published through Data.gov.my
- BDA Technology Platform
- Project sponsor
- Sector-specific use cases/life events, e.g. welfare, education, healthcare, transportation
- BDA DG (Digital Government) Lab
- Outcomes: expertise; PoCs, pilots & apps

BDA Technology Platform Strategy

Research & development on key data extraction, processing and analytics components:
- Data: community and government data
- Data extraction, data staging, cleansing, harmonisation and anonymisation
- Data DB store and accelerated computing
- Secured cloud services, security and infrastructure management
- Data model & analytics: machine learning in the Malaysian context (BM, English, Chinese, Tamil)
- Data visualization from a Malaysian perspective
- Traceability

Key values:
i. National data sovereignty
ii. Trusted data
iii. Secured data

Localized entity (i.e. MIMOS, Cybersecurity)

BDA Technology Platform Strategy

Components: Data Source; Data Extraction; Data Staging; Cleansing; Harmonisation; Anonymisation; Data DB Store; Data Management; Data Model & Analytics; Data Visualization; Security; Traceability; Infrastructure Management; Applications & Customization.

MIMOS products: Mi-Morphe, Mi-Helio, Mi-UAP, Mi-Harvester, Mi-Harmony, Mi-BIS, Mi-ARMC, Mi-Doc, Mi-Scrambler, Mi-Portal, Mi-Trust, Mi-DW, Mi-AccLib, Mi-DSS, Mi-SP (Video Analytics), Mi-Market, Mi-Trace, Mi-AccLytics, Mi-STP, Galactica, Mi-ROSS, Mi-HPDW, Mi-CLIP, Mi-Cloud, Mi-Mobile, Mi-Target, Mi-MOCHA.

Data sources: structured + open linked data; unstructured data.
Foundation: 3rd party systems & hardware.

Extracting Value from Data

Data sources: structured and unstructured.
1. Data harvesting: unstructured data collector (Mi-Clip); knowledge harvester (LOD) (Mi-Harvester)
2. Staging data: Mi-Morphe + Mi-AccLib
3. Data cleansing: detect exceptions, then correct the data
4. Data harmonisation: harmonise terminologies (Mi-Harmony + Mi-Semantics)
5. Data anonymisation: Mi-Scramble + Mi-Crypto + MiAccLib
6. Data sharing: scrambled database & datamarts; published data marts on the data warehouse platform (Mi-Galactica, Mi-AccConnect, Mi-HPDW), backed by a granular primary database
7. Data analytics, modeling and visualization:
   - Social network analytics: Mi-Visualitic
   - Data analytics: Mi-Portal; Mi-HPDW; Mi-Target
   - Data statistics: Mi-AccStat
   - Sentiment analytics: Mi-Intelligence; Mi-NLP
   - Data visualization: Mi-HELIO; Mi-BIS

Authentication & authorization: Mi-UAP, Mi-ARMC
Virtualized platform & integrity manager: Mi-CLOUD + Mi-Mocha
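The harvest, cleanse, harmonise, anonymise and share flow above can be illustrated with a toy in-memory pipeline. This is a sketch under stated assumptions, not the MIMOS implementation: every stage function, field name and terminology mapping is hypothetical, and dropping direct identifiers is only a first step towards anonymisation, since the remaining fields can still allow re-identification.

```python
# Toy flow: harvest -> cleanse -> harmonise -> anonymise -> share.
# All stage functions and field names are hypothetical illustrations.

def harvest(*sources):
    """Stage raw records from structured and unstructured sources together."""
    return [dict(record) for source in sources for record in source]


def cleanse(records):
    """Drop records without an 'id' and strip stray whitespace from strings."""
    cleaned = []
    for record in records:
        if not record.get("id"):
            continue  # exception detected: no key, cannot be corrected here
        cleaned.append({k: v.strip() if isinstance(v, str) else v
                        for k, v in record.items()})
    return cleaned


# Illustrative Malay-to-English terminology map (assumed, not exhaustive).
TERMS = {"lelaki": "male", "perempuan": "female"}


def harmonise(records):
    """Map local terms in the 'gender' field onto one canonical vocabulary."""
    out = []
    for record in records:
        value = record.get("gender")
        if isinstance(value, str):
            record = {**record, "gender": TERMS.get(value.lower(), value.lower())}
        out.append(record)
    return out


DIRECT_IDENTIFIERS = {"name", "phone"}


def anonymise(records):
    """Remove direct identifier fields before data leaves the sandbox."""
    return [{k: v for k, v in r.items() if k not in DIRECT_IDENTIFIERS}
            for r in records]


def share(records):
    """Stand-in for publishing to a data mart: returns the records unchanged."""
    return records


structured = [{"id": "A1", "name": "Ali", "gender": "Lelaki "}]
unstructured = [{"id": "", "text": "no key"}, {"id": "B2", "gender": "female"}]

mart = share(anonymise(harmonise(cleanse(harvest(structured, unstructured)))))
print(mart)  # [{'id': 'A1', 'gender': 'male'}, {'id': 'B2', 'gender': 'female'}]
```

Keeping each stage as a separate function mirrors the slide: any stage can be swapped for an accelerated or product-specific component without touching the others.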

Technology Challenges Ahead (11th Malaysia Plan)

Technology pull:
- Newer sources of data (e.g. high-speed streams)
- Newer channels of consumption (e.g. omni-channel data market)
- Newer methods of visualization (e.g. multi-dimensional views)

MIMOS platform: Mi-CLIP, Mi-Morphe, Mi-Helio, Mi-UAP, Mi-Harvester, Mi-Harmony, Mi-BIS, Mi-ARMC, Mi-Doc, Mi-Scrambler, Mi-Portal, Mi-Trust, Mi-DW, Mi-AccLib, Mi-DSS, Mi-SP (Video Analytics), Mi-Market, Mi-Trace, Mi-AccLytics, Mi-STP, Galactica, Mi-ROSS, Mi-HPDW, Mi-Target, Mi-Cloud, Mi-Mobile, Mi-MOCHA

Technology push:
- Newer paradigms in computing (e.g. Docker)
- New platforms & revisions
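High-speed streams are typically summarised over time windows rather than stored and rescanned. A minimal sketch of a tumbling (fixed, non-overlapping) window count, assuming each event carries a timestamp in seconds; stream engines in the stack such as Spark or Storm provide windowing at scale, and this plain-Python `events_per_window` helper is hypothetical.

```python
from collections import Counter


def events_per_window(timestamps, window_seconds=1.0):
    """Count events in tumbling time windows.

    Each timestamp t (in seconds) falls into window floor(t / window_seconds).
    The Counter keeps one entry per window, so memory grows with the number
    of windows rather than the number of events; `timestamps` may be a
    generator fed by a live stream.
    """
    counts = Counter(int(t // window_seconds) for t in timestamps)
    return dict(sorted(counts.items()))


arrivals = [0.1, 0.4, 0.9, 1.2, 1.3, 2.7]  # hypothetical arrival times
print(events_per_window(arrivals))  # {0: 3, 1: 2, 2: 1}
```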

Big Data Moving Forward

- IoT: Internet of Things
- IoE: Internet of Everything
- II: Industrial Internet
- IoA: Internet of Anything

Related areas: Big Data processing, cloud computing, software-defined networks, mobile systems, wearables, cyber-physical systems, cyber-biological systems, Internet of Humans.

Open Platform & BDA Middleware Architecture

Layers: Data Source (structured, semi-structured & unstructured data sources; RDBMS; web & social media; open linked data; files); Data Extraction; Data Staging; Data Cleansing; Data Harmonization; Data Anonymisation; Data Model; Data Analytics Tools (Machine Learning); Data Visualisation; Data Management; Data Storage; Data Security; Infrastructure.

MIMOS solutions: Mi-Clip, Mi-Morphe, Mi-Harvester, Mi-AccLib, Mi-HPDW, Mi-AccConnect, Mi-Intelligence, Mi-NLP, Mi-Harmony, Galactica, Galactica Connector, Galactica FS, Mi-Target, Mi-AccStat, Mi-Visualitics, Mi-Helio, Mi-Portal, Mi-BIS, Mi-Scramble, Mi-Crypto, MiTrust, Mi-UAP, Mi-Cloud, Mi-Mocha.

3rd party solutions: Sqoop, Flume, Kafka, Cloudera Search & Solr, Mahout, MLlib (Spark), Apache Drill, Spark/Shark, Hue, Pig, Hive, Impala, YARN, Cloudera Manager/Falcon, ZooKeeper, Oozie, Sentry, Hadoop (HDFS, NoSQL), RDBMS (data warehouse/data mart), RDF graph DB, GIS.

MIMOS BigData Stack With Reference to Hadoop Stack

- Visualization: Mi-Helio | Mi-Portal | Mi-BIS (Mi-AccConnect) | 3rd Party Apps
- Security and Authentication: Sentry | Mi-UAP | Mi-ARMC | Mi-Trust
- Batch Query: MapReduce v2 | Pig | Hive
- Machine Learning: Mi-BIS (Weka) | AccStats (R and Cloudera C++) | MLlib (Spark) | Revolution R, Weka
- Analytics: Simulator | Planning Tool | Predictive | Prescriptive | Prediction Algorithm; Mi-BIS (Mi-AccStats); Mi-BIS (Data Mining); Revolution R (3rd party); GIS (3rd party)
- Processing: Mi-Morphe | Morphlines | Mi-AccLib; MapReduce v2 (Accelerated ETL); HPDW Data Model Plugin (for Mi-Morphe v3/Pentaho)
- Real Time Query: Mi-BIS with Impala through Mi-AccConnect; Hue | Galactica | Apache Drill | Spark/Shark | HPDW-BigData DB
- Management: YARN (resource management) | Big Data Orchestration Engine/Layer | ZooKeeper (configuration and synchronization) | Oozie (workflow scheduler) | Cloudera Manager | Management for Lustre
- Application Programming Interface: Thrift | REST | Java API | Avro
- Data Management: Sqoop | Flume
- Stream: Spark | Kafka | Spring XD & Storm
- Search: Cloudera Search & Solr
- Storage: HDFS | HPDW-Storage | Galactica FS | NoSQL (HBase) | Distributed Database (Cassandra) | RDBMS (PostgreSQL, MySQL)
- Multi- & Many-Core Processors (CPU + GPU)
- Data sources: Streaming (Twitter, logs, etc.) | RDBMS | NoSQL data types

Legend: Complete 3rd Party | 3rd Party & MIMOS Offering | MIMOS Technologies | 3rd Party Technologies

Proofs of Concept: Selected Use Cases

Proofs of Concept: Mixed Scenario (Technology Capabilities)

Challenges to Be Addressed During Initial Roll-Outs

Data Challenges (Stage 1)

- Data is stored in partial and distributed locations
- Data exists in both digital and non-digital formats, some of it paper-based
- Incomplete data sets (quality issues)
- Cleanliness of the data
  - Missing values (random, non-random, CR) and noise
  - Cleaning while maintaining integrity and value
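One way to clean missing values while maintaining integrity is to impute them and keep a flag recording which values were filled in. A minimal sketch, assuming numeric values with `None` marking a missing entry; median imputation is a swapped-in illustrative technique and is only reasonable when values are missing at random, since non-random missingness can bias the result.

```python
from statistics import median


def impute_with_flags(values):
    """Fill missing values (None) with the median of the observed values.

    Returns (filled, flags) so analysts can still tell which values were
    observed and which were filled in, instead of silently overwriting gaps.
    """
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values to impute from")
    fill = median(observed)
    filled = [fill if v is None else v for v in values]
    flags = [v is None for v in values]
    return filled, flags


readings = [120.0, None, 95.5, 130.0, None]  # hypothetical column
filled, flags = impute_with_flags(readings)
print(filled)  # [120.0, 120.0, 95.5, 130.0, 120.0]
print(flags)   # [False, True, False, False, True]
```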

- Extracting the features
  - Data in multiple languages (at least English and Malay)
  - Structured data has longer historical value to be acquired
  - Data storage media and formats for extraction and usage
- How to authenticate the key values? Where is the reference point?
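Authenticating key values needs a reference point, such as a master list of valid identifiers held by the data owner. A minimal sketch, assuming such a reference list exists; the `check_against_reference` helper and the sample identifiers are hypothetical. Unmatched records are set aside for investigation rather than silently dropped.

```python
def check_against_reference(records, reference_ids, key="id"):
    """Split records by whether their key appears in a reference (master) list."""
    reference = set(reference_ids)  # set lookups are O(1) on average
    matched, unmatched = [], []
    for record in records:
        if record.get(key) in reference:
            matched.append(record)
        else:
            unmatched.append(record)  # keep for follow-up, do not discard
    return matched, unmatched


reference_ids = ["A1", "B2", "C3"]  # hypothetical master list
records = [{"id": "A1"}, {"id": "Z9"}, {"id": "C3"}, {"name": "no key"}]

matched, unmatched = check_against_reference(records, reference_ids)
print(len(matched), len(unmatched))                  # 2 2
print(f"match rate: {len(matched) / len(records):.0%}")  # match rate: 50%
```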


- For unstructured data (e.g. social media), current technology is adequate to support pre-processing and analytics, with some local challenges
- Who are the data owners? How can the security level of the data be ensured for sharing? Confusion over PDP (Personal Data Protection) compliance
- More to be shared on a visit to the MIMOS Lab

Analytics Challenges (Stage 2)

- Tools are available, but the right approach is still critical for evaluation
  - Which are the best/right algorithms to use?
  - Can you identify the right domain expert within the organization?
  - Who are the local domain experts to be consulted on method/algorithm selection?
  - A specific government organization may not have a data scientist, but it can form an analytics team (external + internal)
- What exactly are the data owners' business needs?
  - Why do they need to do this?
  - A headache for them: best to leave the data to rest in peace!!
- Which data should be included, which excluded, and what should be anonymized?
  - Concern about meaning/trend extraction
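Deciding what to anonymize is easier when the chosen technique keeps trends extractable. One option is keyed pseudonymisation: each identifier is replaced with an HMAC-SHA256 token, so the same person maps to the same token across records, while the token cannot be recomputed without the secret key. This is a sketch of a swapped-in technique, not the method used by Mi-Scrambler; the key and identifiers are hypothetical, and pseudonymised records can still be re-identifiable in combination with other fields, so this does not replace a proper anonymisation review.

```python
import hashlib
import hmac
from collections import Counter


def pseudonymise(identifier: str, secret_key: bytes) -> str:
    """Map an identifier to a stable, keyed token using HMAC-SHA256."""
    return hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical key; in practice it would be held only by the data owner.
SECRET_KEY = b"example-key-held-by-data-owner"

# Hypothetical visit log keyed by a personal identifier.
visits = ["ID-0001", "ID-0002", "ID-0001"]
tokens = [pseudonymise(v, SECRET_KEY) for v in visits]

# Trend extraction still works: repeat visits share the same token.
print(Counter(tokens).most_common(1)[0][1])  # 2
```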

- Plurality of languages and interpretation accuracy
  - Semantification of language-specific analytics
- Bottlenecks must be identified, and an accelerated approach is required for the specific processing
- Agile is the best way

Results Challenges (Stage 3)

- Visualization of the results in a simple, actionable and communicable form
- How to handle continuously changing analytics (and results) due to:
  - New data inclusion
  - New domain expert inclusion
  - New additional factors to be considered
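When new data keeps arriving, summary statistics can be updated incrementally instead of recomputed from scratch. A minimal sketch using Welford's online algorithm for the running mean and sample variance; the `RunningStats` class is illustrative, and it covers only new data inclusion, not changes introduced by new domain experts or additional factors.

```python
class RunningStats:
    """Welford's online algorithm: update mean and variance one value at a time."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # sum of squared deviations from the current mean

    def update(self, x: float) -> None:
        """Fold one new observation into the statistics."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        """Sample variance (n - 1 denominator); 0.0 until two values are seen."""
        return self._m2 / (self.n - 1) if self.n > 1 else 0.0


stats = RunningStats()
for x in [2.0, 4.0, 6.0]:
    stats.update(x)
print(stats.mean, stats.variance)  # 4.0 4.0

stats.update(8.0)  # new data arrives; no need to re-read earlier values
print(stats.mean)  # 5.0
```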

- Who validates the results?
  - How to translate results into value for the (government) organization?
  - How to translate the value into actions?
  - How to follow up on the 2nd cycle of activities?

Benefiting Humanity Through Technology

Thank You
