BIG DATA: recognizing, managing and analyzing them
Dedagroup Highlights
Issue 1
Featuring research from Gartner

Big Data and the three Vs: what they are, how to manage them
Big Data Analytics: a mine to exploit
From the Gartner Files: Big Data and Content Will Challenge IT Across the Board
Company Profiles
Big Data and the three Vs: what they are, how to manage them

Companies increasingly feel the need to store, manage and process
ever-growing volumes of data. It is hard to estimate how much the
volume of data generated, and in some way to be managed, will grow over
the coming years; what is certain is that it will grow, and
substantially. The need to expand data management architectures, if not
yet addressed, will therefore soon be on the table of the IT department
of many companies. But what exactly is meant by Big Data?

An interesting view of what Big Data is was put forward by Alexander
Jaimes, a researcher at Yahoo Research, who stated at a recent
conference in Italy that "we are the data". The now widespread use of
electronic devices of every kind generates a mass of information, often
indirect, that can feed very large databases. But is size alone enough
to speak of Big Data? And how does one distinguish simple unstructured
data from Big Data? According to many analysts, if the information has
the characteristics of Variety, Velocity and Volume, then one is
dealing with true Big Data.
How to recognize them

Let us start with the last component, apparently the most obvious one:
Volume. It is easy to see that we are dealing with information that
ranges from terabytes to petabytes and on into the world of zettabytes,
and that volumes keep growing. For those not used to these magnitudes,
one zettabyte equals no less than one billion terabytes.
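
The one-billion figure follows from the decimal (SI) prefixes: a zettabyte is 10^21 bytes and a terabyte is 10^12 bytes. A minimal Python sketch of that conversion follows; the `SI_BYTES` table and the `convert` helper are illustrative names introduced here, not part of the article.

```python
# Decimal (SI) multiples of the byte, as powers of ten.
SI_BYTES = {
    "KB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12,
    "PB": 10**15, "EB": 10**18, "ZB": 10**21, "YB": 10**24,
}

def convert(value, from_unit, to_unit):
    """Convert between two SI byte units (whole units, rounded down)."""
    return value * SI_BYTES[from_unit] // SI_BYTES[to_unit]

print(convert(1, "ZB", "TB"))  # 1000000000
```

Integer arithmetic keeps the result exact for these magnitudes, at the cost of rounding down when converting to a larger unit.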
The Variety aspect is something new for us: the era of Big Data is
characterized by the need and desire to explore unstructured data as
well, beyond and together with traditional information. If we think of
a Facebook post, a tweet or a blog, they may come in a structured
format (JSON), but the real value lies in the unstructured part of the
data. In setting the rules that define a data warehouse and the
analyses that follow from it, we will therefore need tools different
from traditional databases, which require consistency and integrity
(think, for example, of a log file that will not be kept for long).

Finally, the third characteristic: Velocity. Contrary to what one might
think, velocity does not refer to growth, that is to volume, but to the
need to compress management and analysis times: data can become
obsolete in a very short time. It is therefore strategic to oversee and
manage the life cycle of Big Data. The software tools that make it
possible to analyze information quickly are therefore the ones best
suited to handling this type of data.

The explosion of devices that have automated, and perhaps improved,
everyone's lives has generated an enormous mass of unstructured
information that all analysts expect to grow at exponential rates over
the coming years. In this section of the monograph we will try to
analyze where Big Data is found, how to recognize it and how to store
it so as to make it as usable as possible by Analytics applications.

Let us look at some examples of where Big Data is found. From 2005 to
2011, RFID (Radio Frequency ID) tags grew from 1.3 billion to thirty
billion, and the expected growth will be even greater over the next six
years. Not everyone knows that one aircraft engine generates about
10 TB of data for every thirty minutes of flight. Since aircraft on
domestic routes have two engines, this means that a Genoa-Catania
flight generates 60 TB, while a Milan-New York flight on a four-engine
aircraft generates as much as 640 TB. Managing such a volume of data
requires an approach completely different from the traditional one.

Multiples of the byte
Unit             Bytes
kilobyte (KB)    10^3
megabyte (MB)    10^6
gigabyte (GB)    10^9
terabyte (TB)    10^12
petabyte (PB)    10^15
exabyte (EB)     10^18
zettabyte (ZB)   10^21
yottabyte (YB)   10^24
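
The per-flight figures quoted in the text follow from simple multiplication. The sketch below reproduces them, assuming flight times of about 1.5 hours for Genoa-Catania and 8 hours for Milan-New York; these durations are inferred from the article's totals, not stated in it, and `flight_data_tb` is an illustrative name.

```python
TB_PER_ENGINE_PER_HALF_HOUR = 10  # figure quoted in the article

def flight_data_tb(engines, flight_minutes):
    """Estimated terabytes of engine data generated by one flight."""
    half_hours = flight_minutes / 30
    return engines * half_hours * TB_PER_ENGINE_PER_HALF_HOUR

# Flight durations are assumptions chosen to match the article's totals.
print(flight_data_tb(engines=2, flight_minutes=90))   # 60.0
print(flight_data_tb(engines=4, flight_minutes=480))  # 640.0
```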

Another example comes from the utilities sector, in particular from
smart metering, which requires complex measurements from every sensor
over a very long time horizon: an immense quantity of real-time
measurements.

One field in which Big Data is certainly present is meteorology: there
are millions of sensors, cameras and detectors around the world. All
this information can now be used proactively and not only for
statistical purposes. Two examples show how Big Data can be used in
this field to improve everyone's lives. In the USA, the National
Hurricane Center (http://www.nhc.noaa.gov/) already runs a track
analysis application that, besides forecasting intensity, makes it
possible to automate the supply of the warehouses that will serve as
first-aid points along the hurricane's path. Imagine instead that, when
a river flowing through a city swells and there is a high risk of
flooding, all information and all the city's traffic lights are
automated to channel traffic toward safe areas of the city.

And then there is us: we feed the social networks, which today are the
most interesting Big Data for analysts in every sector. As early as
2009 Facebook generated 25 TB of log data per day (Facebook log data
reference:
http://www.datacenterknowledge.com/archives/2009/04/17/a-look-inside-facebooks-data-center/)
and about 7-8 terabytes of information uploaded to the Internet.
Twitter is at about 50% of Facebook, but over the last two years its
growth relative to its competitor has significantly narrowed the gap.
Google does not publish figures, but considering how widespread
applications such as Gmail, Google Maps, YouTube and Google Analytics
are, we can guess their scale. Do not rely too heavily on these
numbers: as you read this monograph, check its publication date,
because in one month data grows on average at a double-digit rate. A
great many companies already feel the need to manage, analyze and
understand this type of data; among them is one of our customers, which
operates in worldwide commerce and wants to understand what its
customers think of its products by analyzing blogs and social networks.
This calls for innovative analysis tools.

How to organize them

Once Big Data has been recognized, one needs to think about how and
where to store it. Traditional storage can offer low-cost storage, but
in some cases with unsatisfactory analysis performance and the need to
invest in tuning and in redesigning data warehouses to optimize them
for the required analyses. All analysts agree in identifying data
warehouse appliances as the tools of the future. Donald Feinberg of
Gartner maintains that "by 2015, at least 50% of enterprises with data
warehouses in production will include data warehouse appliances".

Using appliances with advanced search capabilities relieves data
warehouse administrators of many of the design and tuning activities
needed to run business analytics applications, such as creating
indexes, partitioning and the related monitoring. The needs of the
users of applications that analyze company data change very quickly,
and the related tasks of tuning, defining new indexes or creating
ad hoc data marts constantly take up the time of administration staff.
In many cases appliances help reduce these costs considerably.

In conclusion, we can say that Big Data is all around us and that we
ourselves are among its biggest generators. The need to analyze it will
become ever more important for companies to stay competitive in the
market. The correct use of the new tools for storing and analyzing
information will be the challenge facing information systems in the
coming years.
Source: Dedagroup

Big Data Analytics: a mine to exploit

Big Data Analytics is a new concept that is in fact the union of two
concepts that have been discussed for many years and therefore are not
new. On one side is Big Data, with all the related issues, as we have
seen.

On the other side is Business Analytics. The dimensional data model and
OLAP applications have been discussed for more than 20 years; Business
Intelligence and Performance Management are among the IT areas that
have received the most attention and investment in recent years; Data
Mining and predictive analysis were the latest frontier leading to
Business Analytics. Today it is hard to find a company that has not
tackled at least one of these topics.

What is new, instead, is the concept of Big Data Analytics, a union
that is not the mere juxtaposition of the two concepts described above.
Big Data Analytics does not simply, or not only, mean being able to
analyze large volumes of data. The other two factors that characterize
Big Data, namely the variety of the data and the need to turn data into
information as quickly as possible, have also helped define this new
category of applications.

The variety of information

There are probably two distinct categories of Big Data. The first
concerns traditional types of data. The second concerns data generated
by the network. If we focus on the latter category, it is not hard to
imagine that a company, probably any company, wanting to exploit the
immense information potential implicitly contained in the network would
face the problem of the variety of the data even before the problem of
their quantity. To give just a few examples, let us try to list the
main potential data sources for Big Data Analytics:

Data structured in (relational) tables
This is the data on which traditional Business Intelligence and its
recent evolution, Business Analytics, are based. The ever-growing
volumes of storable data and ever more powerful architectures still
make relational tables the main data source for Big Data Analytics
today. All existing business management systems produce data that is
structured, or can be structured, in relational tables. They remain the
preferred data model for the main analytics platforms.

Semi-structured data (XML and similar standards)
This is the type of data that is challenging the dominance of
structured data. Applications, transactional or otherwise, natively
output data in XML or in formats typical of specific industries (SWIFT,
ACORD). This is mostly business-to-business data that can be organized
hierarchically.
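
As an illustration of how hierarchical, semi-structured data can be turned into rows for analysis, here is a minimal Python sketch using the standard library's `xml.etree.ElementTree`. The `<orders>` document and its element and attribute names are invented for the example; real SWIFT or ACORD messages follow their own schemas.

```python
import xml.etree.ElementTree as ET

# Hypothetical business-to-business document (not a real SWIFT/ACORD message).
XML_DOC = """
<orders>
  <order id="A1" customer="ACME">
    <item sku="X-10" qty="2"/>
    <item sku="Y-20" qty="1"/>
  </order>
  <order id="A2" customer="Globex">
    <item sku="X-10" qty="5"/>
  </order>
</orders>
"""

def flatten_orders(xml_text):
    """Turn the order/item hierarchy into flat, table-like rows."""
    root = ET.fromstring(xml_text.strip())
    rows = []
    for order in root.findall("order"):
        for item in order.findall("item"):
            rows.append({
                "order_id": order.get("id"),
                "customer": order.get("customer"),
                "sku": item.get("sku"),
                "qty": int(item.get("qty")),
            })
    return rows

for row in flatten_orders(XML_DOC):
    print(row)
```

Each nested `<item>` becomes one row that repeats the attributes of its parent `<order>`, which is the shape a relational table or data mart would expect.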

Event and machine data (messages, batch or real time, sensors, RFID
and devices)
This is the data that typically qualifies as Big Data; until a few
years ago it was stored only with very short retention periods (one
month at most) because of storage constraints.

Unstructured data (human language, audio, video)
These are enormous quantities of metadata, mostly stored on the web,
from which structured information can be extracted through advanced
semantic analysis techniques.

Unstructured data from social media (social networks, blogs, tweets)
This is the latest frontier of unstructured data sources. Crawling,
parsing and entity extraction are among the techniques used to extract
structured, analyzable data. Volumes grow exponentially over time.
Their use can open up new analysis paradigms that were previously
unthinkable.
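
To make the parsing and entity-extraction step concrete, the sketch below pulls hashtags and user mentions out of short posts with regular expressions. It is a deliberately simple stand-in for real entity extraction, which usually relies on linguistic models; the sample posts and the `extract_entities` helper are invented for illustration.

```python
import re
from collections import Counter

HASHTAG = re.compile(r"#(\w+)")
MENTION = re.compile(r"@(\w+)")

def extract_entities(post):
    """Return the hashtags and mentions found in one post, lowercased."""
    return {
        "hashtags": [h.lower() for h in HASHTAG.findall(post)],
        "mentions": [m.lower() for m in MENTION.findall(post)],
    }

# Invented sample posts.
posts = [
    "Loving the new #Phone from @AcmeStore!",
    "@AcmeStore delivery was late again #phone #shipping",
]

hashtag_counts = Counter()
for post in posts:
    hashtag_counts.update(extract_entities(post)["hashtags"])

print(hashtag_counts.most_common())
```

The counts are already structured data: they could be loaded into a table and joined with product or customer dimensions.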

Web navigation data (clickstream)
Web logs, JavaScript tags and packet sniffing are used to obtain Web
Analytics. These are enormous quantities of data carrying information
on the consumption and preferences of millions of users. For this data
too, volumes grow exponentially over time.
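
As a small example of turning web logs into analyzable data, the sketch below parses lines in the widely used Common Log Format and counts page views per URL. The log lines are invented, and the regular expression covers only this simple format, not every variant a real server can emit.

```python
import re
from collections import Counter

# host ident user [timestamp] "METHOD path protocol" status size
LOG_LINE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) (?P<size>\d+|-)'
)

def parse_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = LOG_LINE.match(line)
    return match.groupdict() if match else None

# Invented sample log lines.
log = [
    '10.0.0.1 - - [10/Oct/2012:13:55:36 +0200] "GET /products HTTP/1.1" 200 2326',
    '10.0.0.2 - - [10/Oct/2012:13:55:40 +0200] "GET /cart HTTP/1.1" 200 512',
    '10.0.0.1 - - [10/Oct/2012:13:56:02 +0200] "GET /products HTTP/1.1" 304 -',
    'malformed line',
]

records = [r for r in map(parse_line, log) if r is not None]
page_views = Counter(r["path"] for r in records)
print(page_views)
```

Lines that do not match are skipped rather than raising an error, since real logs routinely contain malformed entries.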

GIS data (geospatial, GPS)
Geospatial data is generated by increasingly widespread applications.
Storing it is now standard practice and volumes keep growing.
Geospatial data, analyzed statistically and displayed on maps,
complements structured data by providing, for example, business,
security or social information.

Scientific data (astronomy, genetics, physics)
Like event data, this is Big Data by definition. All the most
innovative computational techniques in the recent history of computing
have been tried out on its processing and analysis, and over time all
the most powerful computers have been designed for this data. Its
volumes are enormous and constantly growing.

After this list, which is in any case not exhaustive, even the reader
least familiar with the issue will see how varied the data to be
handled can be in an application developed to turn data into business
information. This last consideration lets us introduce the final factor
behind the birth of Big Data Analytics.

The velocity of decisions

In fact, velocity is not a new concept for analytic applications
either. The question has always been how to make queries perform
better, how to obtain the information I need in real time and, more
generally, how to turn data into information and information into
business decisions in the shortest possible time. With the scenario and
the complexity seen above, however, things become even more
complicated. I need speed both to capture data quickly and to store it
quickly in structured form. The structure of the data then makes it
possible to identify a pattern-based strategy for extracting
consistent, comparable and up-to-date information. Out-of-date
information, even when based on Big Data, loses value to the point of
becoming useless if not actually harmful. Aligning the databases,
processing queries and returning results all require optimized,
dedicated technologies, architectures and applications.

The solidity of knowledge

The Business Intelligence paradigm is traditionally based on three
layers: source systems, data warehouse, and an analytic and
presentation front end. The same architecture can also be applied to
Big Data Analytics. In the process driven by the classic flow from Data
to Knowledge, high-performance architectures must be adopted and
efficient applications implemented on advanced platforms, so that the
information is not outdated or, worse, inaccurate.
Source: Ecos

From the Gartner Files:
Big Data and Content Will Challenge IT Across the Board
The impact of big data is extremely broad, for both the business and
information management and utilization. We discuss a diverse set of
analytic impacts which affect some of the most sensitive IT initiatives
in your organization.

Overview
Big data forces organizations to address the variety of information
assets and how fast these new asset types are changing information
management demands. It is important to understand what the impacts of
big data are on the existing information management approach, in order
to develop a plan to respond to these issues.

Impacts
• IT and business professionals integrating big data structured assets
  with content must increase their business requirement identification
  skills.
• IT support teams will be tasked with supporting end-user-deployed big
  data solutions, and allocation of funding for support will be
  contentious.
• Enterprise data warehouses will undergo major revisions to address
  big data, or face being decommissioned.
• Business analysts using context-aware algorithmic analysis of big
  data must address the fidelity and contract aspects of extreme
  information management, or false analysis output could actually drive
  customers away.

Recommendations
• CIOs and IT leaders should investigate the opportunities for training
  their existing staff regarding the challenges of big data tools and
  solution architectures.
• Allocate budgeting and staff for taking over end-user clusters
  deployed for big data analytics in your IT planning.
• Consider introducing access to an existing MapReduce cluster and do
  not focus only on consolidated repositories.
• Align technology which analyzes big data assets with at least one
  pilot business personalization initiative.

Analysis
Enterprise architects, information managers, and data management and
integration leaders often delve into the challenge of big data and find
that the volume of data represents only one aspect of the problem.
Clients and vendors increasingly encounter a phenomenon they call big
data, but the term is sometimes misleading because the challenge has
many dimensions beyond the volume of data under management. Gartner has
identified 12 dimensions in three categories: quantification, access
enablement and control, and information qualification and assurance.
These dimensions interact with each other to exacerbate the challenges
of next-generation information management. IT leaders must recognize
all these challenges, design information architectures and management
strategies to address them, and then deploy new technologies and
practices to manage data extremes because traditional methods will
fail. Failure to plan for all of the extreme dimensions in systems
deployed over the next three years will force a massive redesign for
more expansive capabilities within two or three years. However,
processing matters, too: a complex statistical model can make a 300GB
database seem bigger than a 110TB database, even if both are running on
multicore, distributed parallel processing platforms.

In 2012, big data has reached a point of inflection. Gartner inquiries
note the increasing incidence of big data as part of the issue: in
2011, more than 2,000 end-user inquiries included some aspect of the
topic. Tools are now offered by a variety of vendors for implementing
MapReduce as one solution.
• The Apache Hadoop open-source project offers a variety of tools which
  can be self-deployed or implemented via managed distribution.
• Major vendors such as IBM and Microsoft are developing, or offering
  their own products for, certain components of a MapReduce
  implementation.
• Some traditional data aggregators and analytics vendors also offer
  big data solutions, although not necessarily MapReduce; for example,
  LexisNexis.
• Smaller vendors such as Cloudera offer a combination of managed
  Hadoop distributions coupled with professional services for
  implementation.
• Some large vendors are partnering to support MapReduce technology
  (for example, Oracle's offering of Cloudera as part of its Oracle Big
  Data Appliance).

Figure 1. Impacts and Top Recommendations for Big Data
Source: Gartner (February 2012)

However, MapReduce is a technology approach, not a product, and is not
equal to big data, which is some combination of volume, variety and
velocity issues. Big data is also not equal to the Hadoop solution
approach. Graph is a part of big data analytics, and big data issues
also abound in text, document and media analysis. Certain
infrastructure as a service (IaaS) vendors are also offering big data
processing and analysis solutions. Finally, NoSQL solutions, including
key-value, graph, document and column-style data stores, are also
increasing in analytic use cases.
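
To illustrate MapReduce as an approach rather than a product, the sketch below runs the classic word-count example in plain Python, with no Hadoop involved. The three phases (map, shuffle, reduce) run sequentially in one process here; a real framework distributes them across a cluster. All function names are illustrative.

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: combine the values of each key into a single result."""
    return {key: sum(values) for key, values in groups.items()}

documents = ["big data is big", "data is everywhere"]

pairs = [pair for doc in documents for pair in map_phase(doc)]
counts = reduce_phase(shuffle_phase(pairs))
print(counts)
```

Because each document is mapped independently and each key is reduced independently, both phases can be spread over many machines, which is what makes the approach suitable for large volumes.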
Impact: IT and business professionals
integrating big data structured
assets with content must increase
their business requirement
identification skills.
The broader context of big data
challenges existing practices of
selecting which data to integrate,
with the proposition that all
information can be integrated and
that technology should be developed
to support this. As a new issue
driving requirements (that demands
a new approach), the breaching of
traditional boundaries will occur
extremely fast because the many
sources of new information assets are
increasing geometrically (for example,
desktops became notebooks and now
tablets; portable data is everywhere
and in multiple context formats) and
this is causing exponential increases
in data volumes. Additionally, the
information assets include the entire
spectrum of information content:
from fully undetermined structure
(content) to fully documented
and traditionally accessed structures
(structured). As a result,
organizations will seek to address the
full spectrum of extreme information
management issues, and will use
this as differentiation from their
competitors to become leaders in
their markets in the next two to five
years. Big data is, therefore, a current
issue (focused on combinations
of volume, velocity, variety and
complexity of data), which highlights
a much larger extreme information
management topic that demands
almost immediate solutions. Gartner
estimates that organizations which
have introduced the full spectrum of
extreme information management
issues to their information
management strategies by 2015, will
begin to outperform their unprepared
competitors within their industry
sectors by 20% in every available
financial metric.
Recommendations:
• Identify a large volume of datasets and content assets that can form
  a pilot implementation for distributed processing, such as MapReduce.
  Enterprises already using portals as a business delivery channel
  should leverage the opportunity to combine geospatial, demographic,
  economic and engagement preferences data in analyzing their
  operations, and/or to leverage this type of data in developing new
  evaluation models. Particular focus on supply chain situations which
  include location tracking through route and time and which can be
  combined with business process tracking is a good starting point. The
  life sciences industry will also be able to leverage big data: for
  example, large content volumes in clinical trials or genomic research
  and environmental analysis as contributing factors to health
  conditions.
• CIOs and IT leaders should utilize opportunities to train their
  existing staff in the challenges of extreme information management.
  Staff will then be able to deliver big data solutions directly, or
  supervise their delivery.
Impact: IT support teams will be
tasked with supporting end-user-
deployed big data solutions, and
allocation of funding for support will be
contentious.
End users have deployed
MapReduce clusters using
their departmental budgets and
discretionary funds over the past
two to four years. As the analytics
output from these deployments
demonstrates their valuable decision-
support capabilities, business
executives will want to leverage both
the infrastructure and the analysis
processes. IT will be asked to develop
a strategy for supporting these
expanded use cases. However, these
are custom deployments that are not
tools-based. Additionally, business
unit budgets are not accessible to
IT (either generally or to allocate
funds to maintain and support these
deployments); these funds are allocated
on a project basis and considered one-
time investments. At the same time,
attempts to leverage these personal
and departmental clusters will be met
with resistance, because the users
deploying these systems have neither
the time to support leveraging these
systems in an enterprise manner, nor
the budget to introduce enterprise-class
infrastructure. Control of the systems
will also be contentious because the
end users have grown accustomed
to using these clusters as dedicated,
personal systems. IT will once again
have the task of identifying tools,
versions and distribution standards,
publishing those standards and
then encouraging business-managed
deployments to follow the standards
even if they appear to be
rogue projects.
Recommendations:
• Allocate budgeting and staff for taking over end-user-deployed
  MapReduce clusters in your IT planning. Budgeted amounts should
  include travel and training for IT specialists to learn how the
  software tools and hardware infrastructure operate. If your
  organization requires funding reallocation, plan on the transfer of
  these funds out of the IT budget and into business unit accounting
  categories.
• IT should only plan to assume control and support of these
  deployments where three or more business units are already leveraging
  the cluster(s). IT should avoid taking over the management of any
  deployments which are being used by one or two business
  organizational units, and the owning units should remain responsible
  for both staffing and budgets. If the business units refuse to
  continue funding and staffing such units they should be
  decommissioned on the basis that "You have to pay to play."
• Identify which software is being used and develop a standardized
  approach for what IT will support. This can be a managed
  distribution, a vendor product, or even a contract with a
  professional services provider; IT only assumes the management of the
  relationship.
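
The three-business-unit threshold in the recommendations above can be written as a simple decision rule. The sketch below only illustrates that rule; the cluster names, the unit lists and the `it_should_take_over` function are hypothetical.

```python
MIN_BUSINESS_UNITS = 3  # threshold stated in the recommendation

def it_should_take_over(business_units):
    """True when enough distinct business units already use the cluster."""
    return len(set(business_units)) >= MIN_BUSINESS_UNITS

# Hypothetical inventory of end-user-deployed clusters.
clusters = {
    "marketing-hadoop": ["marketing", "sales", "finance"],
    "risk-sandbox": ["risk"],
}

for name, units in clusters.items():
    owner = "IT" if it_should_take_over(units) else "owning business unit(s)"
    print(f"{name}: managed by {owner}")
```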
Impact: Enterprise data warehouses
will undergo major revisions to
address big data, or face being
decommissioned.
Over the years, the various
options (centralized enterprise
data warehouses [DWs], federated
marts, the hub-and-spoke array of
central warehouse with dependent
marts, and the virtual warehouse)
have all served to emphasize certain
aspects of the service expectations
for a DW. The common thread
running through all styles is that
they were repository-oriented. A
repository-only style of warehouse
will be completely overwhelmed
by the simultaneous increases in
volume, variety and velocity of data
assets, and the demand for data
integration toward a repository
as the only strategy will fail. This,
however, is changing: the DW is
evolving from competing repository
concepts to include a fully-enabled
data management and information-
processing platform. This new
warehouse forces a complete rethink
of how data is manipulated, and
where in the architecture each type
of processing occurs to support
transformation and integration. It
also introduces a governance model
that is only loosely coupled with
data models and file structures, as
opposed to the very tight, physical
orientation previously used.
This new type of warehouse, the logical data warehouse (LDW), is a
series of information management and access engines that takes an
architectural approach, and which de-emphasizes repositories in favor
of new guidelines:
• The LDW follows a semantic directive to orchestrate the consolidation
  and sharing of information assets, as opposed to one that focuses
  exclusively on storing integrated datasets in dedicated repositories.
  The LDW is highly dependent upon the introduction of information
  semantic services.
• The semantics are described by governance rules from data creation
  and use-case business processes in a data management layer, instead
  of via a negotiated, static transformation process located within
  individual tools or platforms. These semantics include leveraging
  externally managed processing clusters for MapReduce, Graph and other
  big data processes.
• Integration leverages both steady-state data assets in repositories
  and services in a flexible, audited model via the best available
  optimization and comprehension solution.
Recommendations:
• Start your evolution toward an LDW by identifying data assets that
  are not easily addressed by traditional data integration approaches
  and/or easily supported by a single version of the truth.
• Consider all technology options for data access and do not focus only
  on consolidated repositories. This is especially relevant to big data
  issues.
• Identify pilot projects in which to use LDW concepts, by focusing on
  highly volatile and significantly interdependent business processes.
• Use an LDW to create a single, logically consistent information
  resource independent of any semantic layer that is specific to an
  analytic platform. The LDW should manage reused semantics and reused
  data.
Impact: Business analysts using
context-aware algorithmic analysis
of big data must address the fidelity
and contract aspects of extreme
information management, or false
analysis output could actually drive
customers away.
In a brave new world of transparency and customer fairness legislation,
financial institutions and other online services (offered to consumers
in other industries) could, quite conceivably, be providing biased
interfaces (including supposed transparency) based on the provider's
interpretation of what should be transparent (for example, the
information that its computerized algorithms have generated), and the
consumer wouldn't know the difference between viewing all the
information or only selected information. This could impact product
pricing, terms and conditions, trust, and service levels, and not
necessarily for the true benefit of the customer. Also, inappropriate
use of this information could be extremely damaging. Sometimes, just
the perception that a bank has used data a customer doesn't want them
to use can damage trust and the brand.
In the world of hyperdigitized and algorithmic decisions, the customer
might think he or she has made the best online/mobile personalized
choices based on the filtering and openness of the information
available. However, this openness and the filtering techniques were
being defined and controlled by the marketer via particular search and
decision algorithms that IT developed for the business to support a
more personalized customer experience. In other words, the algorithms
begin to tune the experience, based on the individual's ability to use
the interfaces and information, instead of optimizing the use for all
options available. Any online provider could, in theory, enable such
algorithm-defined, context-aware personalization, and control that
personalization via computerized algorithms without the customer's
being aware. (Anecdotal evidence reveals that filtering differences
exist for even simple financial services search terms.)
Recommendations:
• Reconstitute your big data personalization tools. Organizations need
  to be aware of the potentially significant negative brand impact from
  incorrectly applied algorithm-oriented, context-aware personalization
  technologies.
• Align technology which analyzes big data assets with at least one
  pilot business personalization initiative. Organizations need to
  ensure that these personalization and context-aware capabilities
  align with customer expectations and legal/regulatory requirements.
• Review and revise the policies and codes of conduct around the use of
  not manually intermediated decision making and personalization. The
  use of computerized search, filtering and personalization needs to
  include more than just relevance as the core part of the information
  collection, analysis, decisioning and dissemination.
Source: Gartner RAS Core Research, G00231456,
M. Beyer, D. Cearley, 15 February 2012
Profili Aziendali
DEDAGROUP: the federation of competencies
Dedagroup ICT Network is an industrial group operating in the ICT market, providing
software solutions, distinctive expertise and services for banks and financial institutions,
public administration and industry. The Dedagroup brand is the visible symbol of the decision to embark on a new
entrepreneurial path based on a federated model. Today the Group comprises twelve companies, each with a clearly defined
offering, which represent the highest level of competence and specialization in their respective segments.
The Dedagroup Technology 4Business offering in particular provides clients with independent consulting and complete IT
solutions for managing and growing their business. Thanks in part to numerous partnerships with the major players in the market,
clients can rely on a single partner for know-how, certifications and breadth of offering in the area of Information
Protection & Management, through targeted services in Infrastructure IT & Datacenter, ICT Security & Information
Protection, Consultancy Services and Information Management. In this area, Dedagroup ICT Network counts among its
clients some of the leading Italian players in industry, finance and services, including A2A,
Agsm Verona, Artoni Trasporti, Dolomiti Energia, Edipower, Eurogroup Italia, Fercam, GDF Suez, Iren, Istituto Nazionale
dei Tumori Milano, Itas Assicurazioni, Lichtstudio, Noema Life, PensPlan and TNT Post.
www.dedagroup.it
Source: Dedagroup
ECOS
Ecos is a consulting firm that has worked in Business Intelligence and Performance
Management for 15 years. In autumn 2003 Ecos founded its own competence center, quickly becoming
a key point of reference in the world of IBM Business Analytics consulting. The Ecos consultants' strong
technological, systems and application expertise, their functional skills, their rigor in applying models and methodologies,
and their reliability and the quality of their relationships have led to numerous significant and
successful engagements with many of the most important Italian and European organizations, both public and private. Ecos has
significant experience across all areas of Performance Management: Datawarehousing, Business Intelligence, Planning, BSC. The application of
established methodologies and best practices guarantees a reliable and satisfactory result for every client.
In 2010 ECOS joined the Dedagroup ICT Network. Its historic headquarters are in Tortona (AL), in a
former industrial building that is the subject of a major renovation project. The company's other offices are
in Trento and Rome. The number and importance of ECOS clients are a clear sign of the standing it has achieved. A selection:
Leaf, UCIMU, GKN Driveline, IVECO Group, Lear Corporation, Toyota Motor Italia, Baxter, Novartis, TRS Trussardi, Banca
d'Italia, Cassa di Compensazione e Garanzia (London Stock Exchange), Intesa San Paolo, Mercedes Benz Italia, Monte dei Paschi
di Siena, Unicredit, Equens, Lillo MD Discount, Bticino, Giovanni Bozzetto, Editoriale Domus, Consip, INPS, Ministero Economia
e Finanze (MEF), Provincia Autonoma di Trento, Informatica Trentina, Provincia di Milano, Ospedale Pediatrico Bambino Gesù,
Aeroporti di Roma (ADR), Alpitour, Costa Crociere, IBM, Manpower, Entel Chile, Entel Bolivia, Mobilkom Austria, Telecom
Italia, TIM, TIM Brasil, TIM International, Hupac, ENEL, Sorgenia, ACNielsen
www.ecos2k.it
DEDAGROUP HIGHLIGHTS is published by Dedagroup. Editorial supplied by Dedagroup is independent of Gartner analysis. All Gartner research is © 2012 by Gartner, Inc. All rights reserved. All Gartner
materials are used with Gartner's permission. The use or publication of Gartner research does not indicate Gartner's endorsement of Dedagroup's products and/or strategies. Reproduction or distribution of this
publication in any form without prior written permission is forbidden. The information contained herein has been obtained from sources believed to be reliable. Gartner disclaims all warranties as to the accuracy,
completeness or adequacy of such information. Gartner shall have no liability for errors, omissions or inadequacies in the information contained herein or for interpretations thereof. The opinions expressed
herein are subject to change without notice. Although Gartner research may include a discussion of related legal issues, Gartner does not provide legal advice or services and its research should not be construed
or used as such. Gartner is a public company, and its shareholders may include firms and funds that have financial interests in entities covered in Gartner research. Gartner's Board of Directors may include senior
managers of these firms or funds. Gartner research is produced independently by its research organization without input or influence from these firms, funds or their managers. For further information on the
independence and integrity of Gartner research, see "Guiding Principles on Independence and Objectivity" on its website, http://www.gartner.com/technology/about/ombudsman/omb_guide2.jsp.
