BIG DATA
SWASTIKAA MOUDGIL, JAGREET KAUR
Computer Science and Engineering Department, Chandigarh College of Engineering and Technology, Sector 26,
Chandigarh, 160019, India, Panjab University
Abstract -- Big Data is an all-encompassing term for any collection of datasets so large and complex that it becomes
difficult to process using traditional data processing applications. It is commonly defined by the 3V model (Volume,
Velocity, Variety). Big Data has found applications in science, the government sector, climate control, the private sector
and elsewhere. Currently, Big Data is being worked on by companies such as Microsoft, through Microsoft Research (MSR),
and by IBM and Apple, which are collaborating on the IBM MobileFirst for iOS applications, among many others. It has
increased the demand for information specialists at companies such as Oracle, Dell and IBM. Hence, the management of
exponentially increasing data and the intelligent use of this heterogeneous data are becoming a prime concern for the
entire industrial sector.
Keywords: 3V Model of Big Data, Herrenhausen Conference, Introduction to Hadoop, MobileFirst app
1. Introduction
Big data can also be defined as "a large volume of unstructured data which cannot be handled by standard
database management systems like DBMS, RDBMS or ORDBMS". Big data usually includes data sets with sizes beyond
the ability of commonly used software tools to capture, curate, manage, and process within a tolerable elapsed time.
Big data is also described as a set of techniques and technologies that require new forms of integration to uncover
large hidden values from large datasets that are diverse, complex, and of a massive scale. Big data is an all-encompassing
term for any collection of data sets so large and complex that it becomes difficult to process using traditional data
processing applications. The challenges include analysis, capture, search, sharing, storage and transfer. Scientists
regularly encounter limitations due to large data sets in many areas, including meteorology, genomics, complex physics
simulations and e-Science. The limitations also affect Internet search, finance and business informatics. Big data is
difficult to work with using most relational database management systems and desktop statistics and visualization
packages, requiring instead "massively parallel software running on tens, hundreds, or even thousands of servers".
2. Characteristics
In a 2001 research report and related lectures, META Group (now Gartner) analyst Doug Laney defined data growth
challenges and opportunities as being three-dimensional, i.e. increasing volume, velocity, and variety. Gartner, and now
much of the industry, continue to use this "3Vs" model for describing big data. In 2012, Gartner updated its definition as
follows: "Big data is high volume, high velocity, and high variety information assets that require new forms of processing
to enable enhanced decision making, insight discovery and process optimization." Additionally, some organizations add a
fourth V, "Veracity", to describe it. Big data can be described by the following characteristics:
Volume - The quantity of data that is generated. The size of the data determines its value and potential, and whether it
can be considered Big Data at all.
Variety - The category to which the data belongs. Knowing it helps the analysts who work closely with the data to use it
effectively to their advantage, which in turn upholds the importance of Big Data.
Velocity - The speed at which data is generated and processed to meet the demands and challenges that lie ahead in the
path of growth and development.
Variability - The inconsistency that the data can show at times, which hampers the ability to handle and manage it
effectively.
Big Data Analytics in the integrated Industry 4.0 and Cyber-Physical Systems environment consists of the 6Cs: Connection
(sensors and networks), Cloud (computing and data on demand), Cyber (model and memory), Content/context (meaning and
correlation), Community (sharing and collaboration), and Customization (personalization and value).
3. Applications
3.1 Science and Research
The Large Hadron Collider experiments represent about 150 million sensors delivering data 40 million times per second.
There are nearly 600 million collisions per second. Even when working with less than 0.001% of the sensor stream data,
the data flow from all four LHC experiments represents an annual rate of 25 petabytes before replication.
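As a rough sanity check on these figures, the 25 petabyte annual rate can be converted into an average sustained throughput. The sketch below assumes decimal (SI) petabytes and a 365-day year; both are assumptions made here, not values stated in the text, and the function name is chosen only for illustration.

```python
# Convert the quoted LHC annual data volume into an average throughput.
# Assumptions (not from the source): 1 PB = 10**15 bytes, 1 year = 365 days.

PETABYTE = 10**15
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def average_throughput_bytes_per_s(annual_petabytes: float) -> float:
    """Average bytes per second implied by an annual volume in petabytes."""
    return annual_petabytes * PETABYTE / SECONDS_PER_YEAR


rate = average_throughput_bytes_per_s(25)
print(f"{rate / 10**6:.0f} MB/s on average")
```

Under these assumptions the stored stream averages a little under 0.8 gigabytes per second, which gives a sense of scale even after more than 99.999% of the sensor data has been discarded.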
__________________________________________________________________________________________
2015 ,IRJIE-All Rights Reserved
Page -58
ISSN: 2395-0560
4. Market Growth
Big data has increased the demand for information management specialists to the point that Software AG, Oracle
Corporation, IBM, FICO, Microsoft, SAP, EMC, HP and Dell have spent more than $15 billion on software firms
specializing in data management and analytics. In 2010, this industry was worth more than $100 billion and was growing at
almost 10 percent a year, about twice as fast as the software business as a whole. The world's effective capacity to
exchange information through telecommunication networks was 281 petabytes in 1986, 471 petabytes in 1993, 2.2 exabytes
in 2000 and 65 exabytes in 2007, and the amount of traffic flowing over the Internet was predicted to reach 667 exabytes
annually by 2014. It is estimated that one third of the globally stored information is in the form of alphanumeric text and
still image data, which is the format most useful for most big data applications. This also shows the potential of as yet
unused data (i.e. in the form of video and audio content). Data sets grow in size in part because they are increasingly
being gathered by ubiquitous information-sensing mobile devices, aerial sensory technologies (remote sensing), software
logs, cameras, microphones, radio-frequency identification (RFID) readers, and wireless sensor networks. The world's
technological per-capita capacity to store information has roughly doubled every 40 months since the 1980s; as of 2012,
2.5 exabytes (2.5×10^18 bytes) of data were created every day; as of 2014, 2.3 zettabytes (2.3×10^21 bytes) of data were
created every day.
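The growth figures in this section can be put on a common annual footing. The sketch below is a minimal illustration, assuming decimal (SI) units and smooth compound growth between the quoted data points; the helper names are hypothetical. It converts the "doubling every 40 months" statement into a yearly rate and computes the compound annual growth of telecommunication capacity between 1986 and 2007.

```python
# Put the quoted growth figures on a per-year basis.
# Assumptions (not from the source): SI units and smooth compound growth.

PETABYTE = 10**15
EXABYTE = 10**18


def annual_growth_from_doubling(months_to_double: float) -> float:
    """Yearly growth rate implied by a fixed doubling period in months."""
    return 2 ** (12 / months_to_double) - 1


def compound_annual_growth(start: float, end: float, years: int) -> float:
    """Constant yearly growth rate that turns `start` into `end` over `years`."""
    return (end / start) ** (1 / years) - 1


storage_growth = annual_growth_from_doubling(40)
telecom_growth = compound_annual_growth(281 * PETABYTE, 65 * EXABYTE, 2007 - 1986)

print(f"Per-capita storage capacity: {storage_growth:.1%} per year")
print(f"Telecommunication capacity:  {telecom_growth:.1%} per year")
```

Doubling every 40 months corresponds to roughly 23% growth per year, while the telecommunication figures imply close to 30% per year over the same era.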
5. Present Scenario
Large amounts of data, a variety of sources, high-speed production, but also high-speed processing: these are the basic
characteristics of Big Data. The amount of data that is generated and collected each second grows exponentially. The
management of Big Data, the intelligent use of large, heterogeneous data sets, is becoming increasingly important for
competition. It is affecting all sectors: industry and academia, but also the public sector. While the economy is exploring
Big Data as a new gold mine, politicians are fighting over the problem of data capitalism, and science is tackling the
question of cross-disciplinary benefits as well as the challenges and likely consequences for technology, innovation,
and society. As a marketing term or industry description, big data is so omnipresent these days that it doesn't mean much.
But it is pretty clear that we are at a tipping point. The global scale of the Internet, the ubiquity of mobile devices, the
ever-declining costs of cloud computing and storage, and an increasingly networked physical world create an explosion of
data unlike anything we've seen before. Big data is nothing new. In fact, although the official definition, which refers to
big data in terms of data volume, velocity and variety, only came about in 2001, companies have been gathering large
amounts of data for decades. But big data has taken on a new lease of life in the last five years, largely as a result of
companies finding new ways to analyze data. Experts at GP Bullhound, an investment banking firm, outline the future of
big data in a new report entitled 'Extracting Insights from Exabytes'. In short, big data has moved on from the initial
stage, where the challenge was storing the data, to the next, which is all about the insights companies can obtain from
the data.
5.1 On March 25-27, 2015, researchers and international experts meet in Hannover for a Herrenhausen Conference
on "Big Data in a Transdisciplinary Perspective". The focus of the Herrenhausen Conference lies on open questions,
unsolved problems, and future perspectives. The conference on Big Data therefore will not focus on a particular discipline
but will provide a transdisciplinary forum for Big Data researchers. The organizers would like to discuss the challenges
and consequences of Big Data research for society as well as innovation and technology, address its influence on economics
as well as the legal framework, and close on the challenges for research and research funding in the field of Big Data.
Their goal is to create an inspiring setting for the discussion of new ideas.
5.2 Big Data Talent Hotspots
San Francisco Bay Area: Despite high talent costs, the San Francisco Bay Area is expected to be the premier destination for
Big Data and Analytics talent. The high-tech ecosystem provides a favorable setting for the development and advancement of
new skill sets such as Big Data. The Bay Area is home to the Big Data teams of reputed global firms like Google, Amazon,
Yahoo, Apple, LinkedIn and Facebook. The presence of premier research institutions such as Stanford and the University of
California ensures a steady supply of qualified graduate engineers. They offer intensive programs in the field of Big Data
research. The Bay Area is also a cradle for new-age Big Data startups such as Platfora and Adchemy.
Bangalore: By 2020, Bangalore is expected to emerge as the second-largest destination for Big Data R&D, driven by its
fast-growing and cost-effective talent pool. MNCs such as Amazon, IBM, EMC and eBay have big data teams operating from
Bangalore. Local companies such as TCS, Wipro and Infosys are also building Big Data capabilities to cater to their
international clientele. The Indian Institute of Science, situated in Bangalore, is a premier institute involved in
cutting-edge research in the field of statistics and analytics. In addition, the presence of startups is enriching the big
data ecosystem.
6. Unique Features
6.1 Unstructured data has never been so ubiquitous
One of the elements that makes big data, well, big, is the data type. Unlike traditional business insight, which analyses
structured data (such as financial details, sales and inventory), big data analytics tends to focus on unstructured data,
such as emails, videos, photos and even posts on social media networks. According to a 2011 IBM report, IBM Big Data
Success Stories, 90% of the world's data was created in the two years before publication. Here are some figures to help you
understand just where such data is coming from: every minute, 208,300 photos are uploaded to Facebook and 350,000 updates
are sent on Twitter.
6.2 Tools such as 'Hadoop' mean that storing large amounts of data has become incredibly cheap
As unstructured data has become more pervasive, Hadoop, an open-source framework for storing large-scale data, has
developed substantially in the last decade. No longer a research project, Hadoop underpins data processing at some of the
world's largest internet businesses. Why? It can deal with unstructured data, and it is faster and cheaper than the tools
that came before it.
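Hadoop's processing layer follows the MapReduce model: a map step emits key-value pairs, the framework groups the pairs by key, and a reduce step aggregates each group. The sketch below imitates those three phases in plain Python on a few in-memory lines of unstructured text. It only illustrates the programming model and is not Hadoop's actual API; the function names and the sample posts are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Emit a (word, 1) pair for every word in one line of text."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group all emitted values by key, as the framework would between phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Aggregate the values collected for one key."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce in sequence on a single machine."""
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical sample input standing in for unstructured social media posts.
posts = ["big data is big", "data is everywhere"]
counts = word_count(posts)
print(counts)
```

In a real cluster the map and reduce functions run in parallel on many servers and the input lives in a distributed file system; here everything runs in one process purely to show the data flow.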
6.3 Big data analytics could lead to $610 billion in productivity gains in only four sectors
If there were mainstream adoption of big data analytics, the retail and manufacturing industries alone could see an
increase of $325 billion in their annual GDP as a result of increased efficiency, according to a report by McKinsey in
July. Healthcare and government services could also see productivity gains of as much as $285 billion by 2020.
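The $610 billion headline figure is simply the sum of the two sector estimates quoted above. A short check, with variable names chosen here only for illustration:

```python
# Sector estimates in billions of US dollars, as quoted in the text.
retail_and_manufacturing = 325
healthcare_and_government = 285

total_gain = retail_and_manufacturing + healthcare_and_government
print(f"Total potential gain: ${total_gain} billion")
```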
6.4 Big data analytics saw nearly $1.4 billion of VC funding in the last 12 months
Venture capitalists have started to look at big data analytics with increasing scrutiny. In the last 12 months, they
invested $1.37 billion in various companies, an increase of 217% over investment in the previous period. There were 19
deals in the last quarter alone, according to GP Bullhound's report.
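An increase of 217% means the latest figure is 3.17 times the earlier one, so the previous period's investment can be backed out from the $1.37 billion total. The sketch below assumes that the 217% is measured relative to the previous period's total, which is the usual reading but is not spelled out in the text; the function name is hypothetical.

```python
def previous_value(current: float, percent_increase: float) -> float:
    """Value before a given percentage increase produced `current`."""
    return current / (1 + percent_increase / 100)


# Figures in billions of US dollars, as quoted in the text.
previous = previous_value(1.37, 217)
print(f"Previous period: about ${previous:.2f} billion")
```

Under that assumption, venture funding in the earlier period was a little over $0.4 billion.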
7. Conclusion
Big data is now "enterprise-ready", i.e. it is commercially useful. It is now cheaper to store and process this data, and
increases in computer processing speeds mean that more businesses can leverage big data analytics. Analytics tools are
opening up big data to people without specialized skills. Although only PhD-level specialists could understand the earliest
versions of tools like Hadoop, new iterations and companies are democratizing big data. The most common feature is for
companies to show the data in easy-to-understand visualisations. Analysis can now be done in real time. While Hadoop was
not designed for real-time analysis, new companies are now building on the framework to give companies instant insight.
Such technology is being used by companies like Hailo, the taxi-calling app, to assign drivers to prospective passengers.