
ISSN: 2395-0560

International Research Journal of Innovative Engineering


www.irjie.com
Volume1, Issue 2 of February 2015

BIG DATA
SWASTIKAA MOUDGIL, JAGREET KAUR
Computer Science and Engineering Department, Chandigarh College of Engineering and Technology, Sector 26,
Chandigarh 160019, India, Panjab University

Abstract -- Big Data is an all-encompassing term for any collection of datasets so large and complex that it becomes
difficult to process them using traditional data processing applications. It is commonly defined by the 3V model (Volume,
Velocity, Variety). Big Data has found applications in science, the government sector, climate, the private sector and elsewhere.
Big Data is currently being worked on by companies such as Microsoft, through Microsoft Research (MSR), and by IBM and Apple,
which collaborate on the IBM MobileFirst for iOS applications, among many others. It has also increased the demand for information
management specialists at companies such as Oracle, Dell and IBM. Hence, managing exponentially growing data and making intelligent
use of this heterogeneous data are becoming prime concerns for the entire industrial sector.

Keywords: 3V Model of Big Data, Herrenhausen Conference, Introduction to Hadoop, MobileFirst app

1. Introduction
Big data can also be defined as "a large volume of unstructured data which cannot be handled by standard
database management systems like DBMS, RDBMS or ORDBMS". Big data usually includes data sets with sizes beyond
the ability of commonly used software tools to capture, curate, manage, and process data within a tolerable elapsed time.
It is a set of techniques and technologies that require new forms of integration to uncover large hidden values from
datasets that are diverse, complex, and of a massive scale. The challenges include analysis, capture, search, sharing,
storage and transfer. Scientists regularly encounter limitations due to large data sets in many areas, including meteorology,
genomics, complex physics simulations, and e-Science. The limitations also affect Internet search, finance and business
informatics. Big data is difficult to work with using most relational database management systems and desktop statistics
and visualization packages, requiring instead "massively parallel software running on tens, hundreds, or even thousands of servers".

2. Characteristics
In a 2001 research report and related lectures, META Group (now Gartner) analyst Doug Laney defined data growth
challenges and opportunities as being three-dimensional, i.e. increasing volume, velocity, and variety. Gartner, and now
much of the industry, continue to use this "3Vs" model for describing big data. In 2012, Gartner updated its definition as
follows: "Big data is high volume, high velocity, and high variety information assets that require new forms of processing
to enable enhanced decision making, insight discovery and process optimization." Some organizations also add a fourth V,
"Veracity", to describe it. Big data can be described by the following characteristics:
Volume - The quantity of data generated is central in this context. The size of the data determines its value and potential,
and whether it can actually be considered Big Data at all.
Variety - The category or type to which the data belongs is an essential fact for data analysts to know. It helps the people
who are closely analyzing and working with the data to use it effectively to their advantage, which upholds the importance of Big Data.
Velocity - The speed at which data is generated and processed to meet the demands and challenges that lie ahead in the
path of growth and development.
Variability - The inconsistency that the data can show at times. This can be a problem for analysts because it hampers
the ability to handle and manage the data effectively.
In the integrated Industry 4.0 and Cyber Physical Systems environment, Big Data Analytics consists of 6Cs: Connection
(sensor and networks), Cloud (computing and data on demand), Cyber (model & memory), Content/context (meaning and
correlation), Community (sharing & collaboration), and Customization (personalization and value).

3. Applications
3.1 Science and Research
The Large Hadron Collider (LHC) experiments use about 150 million sensors delivering data 40 million times per second,
and there are nearly 600 million collisions per second. Even after filtering, which leaves less than 0.001% of the sensor
stream data to work with, the data flow from all four LHC experiments amounts to 25 petabytes per year before
replication.

__________________________________________________________________________________________
2015 ,IRJIE-All Rights Reserved

Page -58

This becomes nearly 200 petabytes after replication. If all sensor data were recorded at the LHC, the data flow would be
extremely hard to work with: it would exceed 150 million petabytes per year, or nearly 500 exabytes per day,
before replication. The Square Kilometer Array is a telescope which consists of millions of antennas and is expected to be
operational by 2024. Collectively, these antennas are expected to gather 14 exabytes and store one petabyte per day. It is
considered to be one of the most ambitious scientific projects ever undertaken. When the Sloan Digital Sky Survey (SDSS)
began collecting astronomical data in 2000, it amassed more in its first few weeks than all data collected in the history of
astronomy. Continuing at a rate of about 200 GB per night, SDSS has amassed more than 140 terabytes of information.
The NASA Center for Climate Simulation (NCCS) stores 32 petabytes of climate observations and simulations on the
Discover supercomputing cluster.
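The LHC figures above can be sanity-checked with simple arithmetic. The Python sketch below assumes decimal (SI) units (1 PB = 10^15 bytes, 1 EB = 10^18 bytes) and a 365-day year; the constant and function names are illustrative only. Multiplying the sensor count by the per-sensor rate gives 6 × 10^15 readings per second, and dividing 150 million petabytes by 365 days gives roughly 410 exabytes per day, so "nearly 500 exabytes" is best read as a loose rounding of a quantity the text describes as a lower bound.

```python
# Back-of-envelope check of the LHC figures quoted in Section 3.1.
# Assumption: decimal (SI) units, 1 EB = 1,000 PB, and a 365-day year.

SENSORS = 150_000_000                    # "about 150 million sensors"
READINGS_PER_SENSOR_PER_S = 40_000_000   # "delivering data 40 million times per second"
UNFILTERED_PB_PER_YEAR = 150_000_000     # "would exceed 150 million petabytes" per year
PB_PER_EB = 1_000
DAYS_PER_YEAR = 365


def readings_per_second(sensors: int, rate: int) -> int:
    """Total sensor readings produced per second across all sensors."""
    return sensors * rate


def pb_per_year_to_eb_per_day(pb_per_year: float) -> float:
    """Convert an annual volume in petabytes to a daily volume in exabytes."""
    return pb_per_year / DAYS_PER_YEAR / PB_PER_EB


if __name__ == "__main__":
    print(f"{readings_per_second(SENSORS, READINGS_PER_SENSOR_PER_S):.1e} readings/s")
    print(f"{pb_per_year_to_eb_per_day(UNFILTERED_PB_PER_YEAR):.0f} EB/day")
```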
3.2 Government
In 2012, the Obama administration announced the Big Data Research and Development Initiative to explore how big
data could be used to address important problems faced by the government. The initiative is composed of 84 different big
data programs spread across six departments. In India, big data analysis was partly responsible for the success of the
BJP and its allies in the 2014 general election.
3.3 Private Sector
The Utah Data Center is a data center currently being constructed by the United States National Security Agency. When
finished, the facility will be able to handle information on the scale of exabytes collected by the NSA over the
Internet. eBay.com uses two data warehouses of 7.5 PB and 40 PB, as well as a 40 PB Hadoop cluster, for search,
consumer recommendations, and merchandising. Amazon.com handles millions of back-end operations every day, as well
as queries from more than half a million third-party sellers. The core technology that keeps Amazon running is Linux-based,
and as of 2005 it had the world's three largest Linux databases, with capacities of 7.8 TB, 18.5 TB, and 24.7 TB. Walmart
handles more than 1 million customer transactions every hour, which are imported into databases estimated to contain
more than 2.5 petabytes of data. Facebook handles 50 billion photos from its user base.
3.4 Big Data Terrorism
The recent Sony hacking case is notable because it appears to be the first state-sponsored act of cyber-terrorism
in which a company has been successfully threatened under the glare of the national media. Whether Sony's decision to
postpone releasing an inane farce was prudent or cowardly is left to the pundits to argue. What is interesting is that the
cyber-terrorists caused real fear at Sony by publicly releasing internal enterprise data, including salaries, email
conversations and information about actual movies. Security software companies are investing in big data analytics to help
companies better protect against future attacks; for example, the FICO Falcon credit card fraud detection system protects
2.1 billion active accounts worldwide.
3.5 Manufacturing
The generated big data acts as the input to predictive tools and preventive strategies such as Prognostics and Health
Management (PHM). Current PHM implementations mostly use data from actual usage, while analytical algorithms
can perform more accurately when more information from throughout the machine's lifecycle, such as system configuration,
physical knowledge and working principles, is included. With this motivation, a coupled model scheme has been
developed. The coupled model is a digital twin of the real machine that operates on a cloud platform and simulates its
health condition using integrated knowledge from both data-driven analytical algorithms and other available
physical knowledge.
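To make the idea of combining physical knowledge with data concrete, here is a deliberately tiny hybrid (grey-box) sketch in Python. It assumes a hypothetical linear wear law as the physics-based part and a mean-residual bias correction as the data-driven part; the function names, wear rate and sample readings are all illustrative. It is not the coupled model scheme referred to above, only a minimal example of how the two sources can feed one health estimate.

```python
# Toy hybrid health model: a physics-based prior corrected by measured data.
# Hypothetical throughout: the linear wear law, its rate, and the sample readings.
from statistics import mean


def physics_health(hours: float, wear_rate: float = 1e-4) -> float:
    """Physics-based prior: health falls linearly from 1.0 with operating hours."""
    return max(0.0, 1.0 - wear_rate * hours)


def fit_bias(observations: list[tuple[float, float]], wear_rate: float = 1e-4) -> float:
    """Data-driven part: average gap between measured health and the physics prediction."""
    return mean(measured - physics_health(h, wear_rate) for h, measured in observations)


def coupled_health(hours: float, bias: float, wear_rate: float = 1e-4) -> float:
    """Combine both: physics prediction shifted by the learned bias, clipped to [0, 1]."""
    return min(1.0, max(0.0, physics_health(hours, wear_rate) + bias))


if __name__ == "__main__":
    # Hypothetical sensor-derived health estimates: (operating hours, measured health).
    history = [(1000, 0.88), (2000, 0.78), (3000, 0.68)]
    bias = fit_bias(history)
    print(f"bias={bias:+.3f}, predicted health at 5000 h: {coupled_health(5000, bias):.2f}")
```

A realistic digital twin would replace the linear law with a validated degradation model and the bias term with a proper learning algorithm; the sketch only shows the structure of a physics prior plus a data-driven correction.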
3.6 Climate
Take the Climate Corporation, for instance. Open access to weather data powers the company's insurance products and Internet
software, which help farmers manage risk and optimize their fields. Zillow is another example: the successful
real estate media site uses federal and local government data, including satellite photography, tax assessment data and
economic statistics, to give potential buyers a more dynamic and informed view of the housing market.
3.7 Personalized Medicine
Even as we engage in a vibrant discussion about the need for personal privacy, big data pushes the boundaries of what is
possible in health care. Whether we label it precision medicine or personalized medicine, two aligned trends, the
digitization of the health care system and the introduction of wearable devices, are quietly revolutionizing health and
wellness. In the not-too-distant future, doctors will be able to create customized drugs and treatments tailored to a
patient's genome, activity level, and actual health. Big data analytics has the potential to disrupt the way we practice
health care and change the way we think about our wellness.
3.8 Digital Learning, Everywhere
There is broad agreement that digital learning, inside and outside the classroom, is an unavoidable trend. From Massive Open
Online Courses (MOOCs) to adaptive learning technologies that personalize the delivery of instructional material to the
individual student, educational technology thrives on data. From names that you grew up with (McGraw-Hill, Houghton
Mifflin, Pearson) to some you didn't (Cengage, Amplify), companies are making bold investments in digital products that
do more than just push content online; they're touting products that fundamentally change how and when students learn and
how instructors evaluate individual student progress and aid their development. Now that we've moved past mere adoption
to implementation and utilization, 2015 will undoubtedly be big data's break-out year.

4. Market Growth
Big data has increased the demand for information management specialists: Software AG, Oracle
Corporation, IBM, FICO, Microsoft, SAP, EMC, HP and Dell have spent more than $15 billion on software firms
specializing in data management and analytics. In 2010, this industry was worth more than $100 billion and was growing at
almost 10 percent a year, about twice as fast as the software business as a whole. The world's effective capacity to
exchange information through telecommunication networks was 281 petabytes in 1986, 471 petabytes in 1993, 2.2 exabytes
in 2000 and 65 exabytes in 2007, and the amount of traffic flowing over the internet was predicted to reach 667 exabytes
annually by 2014. It is estimated that one third of the globally stored information is in the form of alphanumeric text and
still image data, which is the format most useful for most big data applications. This also shows the potential of as-yet
unused data (i.e. in the form of video and audio content). Data sets grow in size in part because they are increasingly being
gathered by ubiquitous information-sensing mobile devices, aerial sensory technologies (remote sensing), software logs,
cameras, microphones, radio-frequency identification (RFID) readers, and wireless sensor networks. The world's
technological per-capita capacity to store information has roughly doubled every 40 months since the 1980s; as of 2012,
2.5 exabytes (2.5 × 10^18 bytes) of data were created every day, and as of 2014, 2.3 zettabytes (2.3 × 10^21 bytes) of data were created every day.
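A doubling time translates directly into a compound growth rate: if capacity doubles every T months, it grows by a factor of 2^(t/T) after t months. The short Python sketch below, with illustrative function names, applies this to the 40-month figure quoted above. It works out to roughly 23% growth per year, and about 133 months (a little over 11 years) for a tenfold increase.

```python
# Growth implied by a fixed doubling time (here, the 40 months quoted in the text).
import math

DOUBLING_MONTHS = 40


def growth_factor(months: float, doubling_months: float = DOUBLING_MONTHS) -> float:
    """Multiplicative growth over `months` if capacity doubles every `doubling_months`."""
    return 2 ** (months / doubling_months)


def annual_growth_rate(doubling_months: float = DOUBLING_MONTHS) -> float:
    """Equivalent compound growth rate per year, as a fraction (0.23 means 23%)."""
    return growth_factor(12, doubling_months) - 1


def months_to_multiply(factor: float, doubling_months: float = DOUBLING_MONTHS) -> float:
    """Months needed to grow by `factor` at the given doubling time."""
    return doubling_months * math.log2(factor)


if __name__ == "__main__":
    print(f"{annual_growth_rate():.1%} per year")
    print(f"{months_to_multiply(10):.0f} months for a tenfold increase")
```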

5. Present Scenario
Large amounts of data, a variety of sources, high-speed production, but also high-speed processing: these are the basic
characteristics of Big Data. The amount of data generated and collected each second grows exponentially. The
management of Big Data, that is, the intelligent use of large, heterogeneous data sets, is becoming increasingly important for
competition. It affects all sectors: industry and academia, but also the public sector. While the economy is exploring
Big Data as a new gold mine, politicians are fighting over the problem of data capitalism, and science is tackling the
question of cross-disciplinary benefits as well as the challenges and likely consequences for technology, innovation,
and society. As a marketing term or industry description, big data is so omnipresent these days that it doesn't mean much.
But it is fairly clear that we are at a tipping point. The global scale of the Internet, the ubiquity of mobile devices, the
ever-declining costs of cloud computing and storage, and an increasingly networked physical world create an explosion of data
unlike anything we've seen before. Big data is nothing new. In fact, although the official definition, which refers to big
data in terms of data volume, velocity and variety, only came about in 2001, companies have been gathering large
amounts of data for decades. But big data has taken on a new lease of life in the last five years, largely as a result of
companies finding new ways to analyze data. Experts at GP Bullhound, an investment banking firm, discuss the future of
big data in a new report entitled 'Extracting Insights from Exabytes'. In their view, big data has moved on from the initial
stage, where the challenge was storing the data, to the next stage, which is all about the insights companies
can obtain from the data.
5.1 On March 25-27, 2015, researchers and international experts will meet in Hannover for a Herrenhausen Conference
on "Big Data in a Transdisciplinary Perspective". The focus of the Herrenhausen Conference lies on open questions,
unsolved problems, and future perspectives. The conference therefore will not focus on a particular discipline
but will provide a transdisciplinary forum for Big Data researchers. The organizers aim to discuss the challenges and
consequences of Big Data research for society, innovation and technology, address its influence on economics and the
legal framework, and close with the challenges for research and research funding in the field of Big Data. Their goal is
to create an inspiring setting for the discussion of new ideas.
5.2 Big Data Talent Hotspots
San Francisco Bay Area: Despite high talent costs, the San Francisco Bay Area is expected to be the premier destination for Big
Data and Analytics talent. Its high-tech ecosystem provides a favorable setting for the development and advancement of
new skill sets such as Big Data. The Bay Area is home to the Big Data teams of reputed global firms like Google, Amazon,
Yahoo, Apple, LinkedIn and Facebook. The presence of premier research institutions such as Stanford and the University of
California ensures a steady supply of qualified graduate engineers, and these institutions offer intensive programs in Big Data
research. The Bay Area is also a cradle for new-age Big Data startups such as Platfora and Adchemy.
Bangalore: By 2020, Bangalore is expected to emerge as the second largest destination for Big Data R&D, driven by its
fast-growing and cost-effective talent pool. MNCs such as Amazon, IBM, EMC and eBay have big data teams operating from
Bangalore. Local companies such as TCS, Wipro and Infosys are also building Big Data capabilities to cater to their
international clientele. The Indian Institute of Science, situated in Bangalore, is a premier institute involved in cutting-edge
research in statistics and analytics. In addition, the presence of startups is enriching the big data ecosystem.
InMobi, a mobile advertising platform headquartered in Bangalore, is building digital solutions for global customers
using Big Data. Mu-Sigma Analytics, recently valued at over a billion dollars, has built significant analytical capabilities
and employs a large number of decision scientists and data scientists.
Shanghai: Shanghai is still in its nascent stages as a Big Data R&D hotspot, but is expected to grow rapidly, driven by
talent and cost benefits. MNCs such as eBay, IBM, HP and Intel have small Big Data teams working on Big Data platforms.
5.3 How Big Data Can Make Cities Work for the Poor (Axel van Trotsenberg, February 2, 2015)
Big data can be a critically important tool in managing urbanization, which is the focus of the World Bank's new report
titled East Asia's Changing Urban Landscape: Measuring a Decade of Spatial Growth. The report uses satellite imagery and
geospatial mapping to provide an analytical overview of the region's urbanization in the first decade of the 21st century.
It uses comparable data on an international scale to build a foundation that helps planners ensure that the rapid expansion
of cities benefits people. This is critical for poverty reduction, because urbanization is already known to be associated
with increased incomes. The new data is part of the World Bank's ongoing series of initiatives to engage with governments
across the region on urbanization.
5.4 A Golden Era of Insight: Big Data's Bright Future
REDMOND, Wash., Feb. 15, 2013: At Microsoft Research labs around the world, some very deep thinkers are
contemplating big data. They include Eric Horvitz, distinguished scientist at Microsoft and co-director of Microsoft
Research's Redmond lab, who was recently elected to the National Academy of Engineering for his work on
computational mechanisms for decision making under uncertainty and with bounded resources. He sees a future where
machines, fueled by large amounts of data, become empowering, lifelong digital companions who know what you
want or need and where you want to go, and generally work with a passion on your behalf. Capturing data, storing it,
interpreting it, and leveraging it can provide insights on small and large scales, in both high-tech and mainstream
fields. Microsoft News Center recently spoke to Horvitz about how Microsoft Research (MSR) is investing time and talent
in big data and machine intelligence, what breakthroughs MSR has made, and his vision for the future of these
fields. "Looking out at the longer-term future, I expect that machine learning, and machine intelligence more broadly, is
going to provide us with foundational new tools for doing scientific research, and that many breakthroughs over the next
few decades will come as a collaboration between people and machine learning and reasoning tools," Horvitz said. "There are
opportunities to learn new things from large amounts of data, including getting to the bottom of healthcare mysteries by
going through data with automated learning. Another direction is working to weave together a set of technologies
(machine learning, speech recognition, natural language understanding, machine vision and decision making) to create
systems that act like bright collaborators and that complement human intellect in new kinds of ways."
5.5 Data Scientists: Next Big Opportunity for India (Pradeep Thakur, TNN, 2012)
After the success of India's software and BPO industry, the next big thing is likely to be Big Data, with US multinationals
looking at the Indian market for a business proposition worth $150 million to be created in the next few years. In what
could be good news for Indians, these MNCs plan to hire around 1 lakh (100,000) professionals in a new category called
Data Scientists by 2014. Indian IT majors are also building up analytics practices to compete with global MNCs. A
NASSCOM-CRISIL report puts the Big Data opportunity for the Indian IT industry at $1 billion globally by 2015. Academic
courses have been designed in association with universities and were recently launched in Mumbai by EMC Corporation, a
New York Stock Exchange-listed firm, which has set a target of training at least 30,000 scientists in "Big Data"
management in 2013. Harvard Business Review has called the data scientist "the sexiest job of the 21st century". India is
emerging as the most lucrative market for global IT giants, with independent studies projecting the Big Data solutions
market to double in two years, from $80 million in 2012 to $153 million in 2014. An EMC report said, "Globally, we
generated and consumed 1.8 zettabytes of data in 2011, which is expected to grow to 35 zettabytes by 2020. In India over
the next decade, by 2020, digital information will grow from 40,000 petabytes to 2.3 million petabytes, twice as fast as
the worldwide rate." A zettabyte is a trillion gigabytes, or a billion terabytes. EMC, which has been providing data
storage facilities for the UPA government's ambitious Unique Identification number project, has been in talks with the
government to analyze the billions of data pieces it will capture, to provide solutions for efficient management of
resources and to study citizens' behavioral patterns to address their needs.
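The unit definitions above assume decimal (SI) prefixes, where each step up is a factor of 1,000; binary units (GiB, TiB, and so on) would give slightly different numbers. A minimal Python conversion helper under that assumption, with an illustrative `UNITS` table, confirms the zettabyte definition and shows that India's projected 2.3 million petabytes equals 2.3 zettabytes, while its 40,000-petabyte starting point equals 40 exabytes.

```python
# Decimal (SI) byte units, as used in the figures of Section 5.5.
UNITS = {"GB": 10**9, "TB": 10**12, "PB": 10**15, "EB": 10**18, "ZB": 10**21}


def convert(value: float, from_unit: str, to_unit: str) -> float:
    """Convert a quantity between decimal byte units, e.g. PB -> ZB."""
    return value * UNITS[from_unit] / UNITS[to_unit]


if __name__ == "__main__":
    print(convert(1, "ZB", "GB"))          # gigabytes in one zettabyte (a trillion)
    print(convert(1, "ZB", "TB"))          # terabytes in one zettabyte (a billion)
    print(convert(40_000, "PB", "EB"))     # India's starting volume, in exabytes
    print(convert(2_300_000, "PB", "ZB"))  # India's 2020 projection, in zettabytes
```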
5.6 Apple and IBM Deliver First Wave of IBM MobileFirst for iOS Apps
Big Data Analytics and Security Capabilities Arrive on iPhone & iPad
CUPERTINO, California and ARMONK, New York, December 10, 2014: Apple and IBM today delivered the first wave of IBM
MobileFirst for iOS solutions in a new class of made-for-business apps and supporting cloud services that bring IBM's big
data and analytics capabilities to iPhone and iPad users in the enterprise. IBM MobileFirst for iOS solutions are now
available to enterprise customers in banking, retail, insurance, financial services, telecommunications, government and
airlines, thanks to an unprecedented collaboration between Apple and IBM. IBM clients announcing support for IBM MobileFirst
for iOS solutions include Citi, Air Canada, Sprint and Banorte. Apple and IBM have launched a big data and analytics
platform for iOS devices, designed to help
businesses integrate secure, analytics-based apps and link them to current processes. It can be managed and upgraded via
cloud services from IBM specifically for iOS devices, making the process simple and secure for everyone involved.

6. Unique Features
6.1 Unstructured data has never been so ubiquitous
One of the elements that makes big data, well, big, is the data type. Unlike traditional business insight, which analyses
structured data (such as financial details, sales and inventory), big data analytics tends to focus on unstructured data,
such as emails, videos, photos and even posts on social media networks. According to a 2011 IBM report, IBM Big Data
Success Stories, 90% of the world's data was created in the two years before publication. Here are some figures to help you
understand just where such data is coming from: every minute, 208,300 photos are uploaded to Facebook and 350,000 updates
are sent on Twitter.
6.2 Tools such as Hadoop mean that storing large amounts of data has become incredibly cheap
As unstructured data has become more pervasive, Hadoop, an open-source framework for storing and processing large-scale
data, has developed substantially in the last decade. No longer a research project, Hadoop underpins data processing at some
of the world's largest internet businesses, because it can deal with unstructured data and is faster and cheaper than the
tools before it.
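Hadoop's original processing layer is an implementation of the MapReduce programming model: a map function emits key-value pairs, the framework groups them by key (the shuffle), and a reduce function combines each group. The Python sketch below simulates those three phases in a single process for the classic word-count example. It illustrates the model only, does not use Hadoop's actual (Java) API, and its function names are hypothetical.

```python
# MapReduce-style word count, simulated in one process to show the three phases.
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(document: str) -> Iterator[tuple[str, int]]:
    """Emit (word, 1) for every word in one input record."""
    for word in document.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group intermediate values by key, as the framework does between map and reduce."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents: list[str]) -> dict[str, int]:
    """Run map over every record, shuffle the pairs, then reduce each key."""
    intermediate = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle_phase(intermediate)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data is big", "data about data"]))
    # {'big': 2, 'data': 3, 'is': 1, 'about': 1}
```

Because each map call depends only on its own record and each reduce call only on its own key, both phases can be spread across many servers, which is what lets frameworks like Hadoop scale out.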
6.3 Big data analytics could lead to $610 billion in productivity gains in only four sectors
If there were mainstream adoption of big data analytics, the retail and manufacturing industries alone could see an
increase of $325 billion to their annual GDP as a result of increased efficiency, according to a report by McKinsey in July.
Healthcare and government services could also see productivity gains of as much as $285 billion by 2020.
6.4 Big data analytics saw nearly $1.4 billion of VC funding in the last 12 months
Venture capitalists have started to look at big data analytics with increasing scrutiny. In the last 12 months, they
invested $1.37 billion in various companies, an increase of 217% over investment in the previous period. There were 19
deals in the last quarter alone, according to GP Bullhound's report.

7. Conclusion
Big data is now "enterprise-ready", i.e. it is commercially useful. It is now cheaper to store and process this data, and
increases in computer processing speeds mean that more businesses can leverage big data analytics. Analytics tools are
opening up big data to people without specialized skills. Although only PhD-level specialists could understand the earliest
versions of tools like Hadoop, new iterations and companies are democratizing big data. The most common approach is for
companies to present the data in easy-to-understand visualizations. Analysis can now also be done in real time. While Hadoop
was not designed for real-time analysis, new companies are building on the framework to give companies instant insight.
Such technology is used by companies like Hailo, the taxi-calling app, to assign drivers to prospective passengers.

References
[1] https://en.wikipedia-org/wiki/Big-data
[2] https://agenda.weforum.org/2015/02/how-big-data-can-make-cities-work-for-the-poor/
[3] https://timesofindia.indiatimes.com/tech/tech-news/Data-scientists-Next-big-opportunity-for-India/articlesshow/17282008.cms
[4] https://www.microsoft.com/enterprise/en-esa/it-trends/big-data/articles/a-golden-era-of-insight-big-data-s-brightfuture.aspx#fbid=tdyL9dPhDgz
[5] https://www.volkswagenstiftung.de
[6] https://www.apple.com/pr/library/2014/12/10Apple-and-IBM-Deliver-First-Wave-of-IBM-MobileFirst-for-iOS-Apps.html
[7] https://www.nature.com/nature/journal/v455/n7209/full/455028a.html
[8] https://www.sciencedirect.com/science/journal/22145796/1
