
BIG DATA MANAGEMENT

SAW WAN SYNN, LIAU SHUK YEE, NOR AZREENA HUSNA BT MOHD JAMAIL, ARMILA AZIRA BT
MOHD SHARIFF

Introduction / Background

Definition of big data

Big data refers to data sets that are so large and complex that traditional
data processing application software is inadequate to deal with them. The term
is also used for predictive analytics, user behaviour analytics and other
advanced data analytics methods that extract value from data, and it seldom
refers to a particular size of data set. Big data can also mean a massive
volume of both structured and unstructured data that is so large it is
difficult to process using traditional database and software techniques. In
most enterprise scenarios, the volume of data is too big, it moves too fast, or
it exceeds current processing capacity.

Big data has the potential to help companies improve operations and make
faster, more intelligent decisions. When captured, formatted, manipulated,
stored and analysed, this data can help a company gain useful insight to
increase revenues, win or retain customers, and improve operations.
Furthermore, big data relates to data creation, storage, retrieval and analysis
that is remarkable in terms of volume, velocity and variety. These 3Vs are the
three defining properties, or dimensions, of big data. Volume refers to the
amount of data: for example, organizations collect data from a variety of
sources, including business transactions, social media, and sensor or
machine-to-machine data. Velocity refers to the speed of data processing: data
streams in at an unprecedented speed and must be dealt with in a timely manner.
RFID tags, sensors and smart metering are driving the need to deal with
torrents of data in near-real time. Variety refers to the number of types of
data: data comes in all types of formats, from structured, numeric data in
traditional databases to unstructured text documents, email, video, audio,
stock ticker data and financial transactions.

Figure 1: 3Vs of Big Data

What is big data management?

Big data management (BDM) is where data management disciplines, tools and
platforms are applied to the management of big data. Traditional data and new
big data can be quite different in terms of content, structure and intended
use, and each category has many variations within it. To accommodate this
diversity, software solutions for BDM tend to include multiple types of data
management tools and platforms, as well as diverse user skills and practices.

In addition, big data management can be described as the organization,
administration and governance of large volumes of both structured and
unstructured data. Unstructured data is a generic label for data that is not
contained in a database or some other type of data structure; it can be textual
or non-textual. A data structure is a specialized format for organizing and
storing data. General data structure types include the array, the file, the
record, the table, the tree and so on.
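
To make the distinction concrete, the following minimal Python sketch contrasts
a structured record with a piece of unstructured text. The class name, field
names and sample values are invented for illustration and do not come from the
sources cited in this report.

```python
from dataclasses import dataclass


# Structured data: every record has the same named, typed fields,
# so it can be stored in a table and queried by field.
@dataclass
class Transaction:
    customer_id: int
    amount: float
    currency: str


structured = [
    Transaction(1, 25.50, "MYR"),
    Transaction(2, 110.00, "MYR"),
]

# Unstructured data: free text with no predefined fields.
unstructured = "Customer 2 emailed to say the delivery arrived late."

# A question about structured data is a simple filter on a known field.
large_payments = [t for t in structured if t.amount > 100]

# The text has no fields to filter on; at best it can be searched for keywords.
mentions_delay = "late" in unstructured.lower()
```

The structured list can be filtered by any field, whereas the text has to be
searched or parsed before any comparable question can be answered.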

Besides that, the goal of big data management is to ensure a high level of data
quality and accessibility for business intelligence and big data analytics
applications. Corporations, government agencies and other organizations employ
big data management strategies to help them contend with fast-growing pools of
data, typically involving many terabytes or even petabytes of information saved
in a variety of file formats. Effective big data management helps companies
locate valuable information in large sets of unstructured and semi-structured
data from a variety of sources, including call detail records, system logs and
social media sites.

Why is managing big data becoming increasingly important?

Managing big data is becoming increasingly important because big data
technology makes data collection, analysis, processing and application
possible. Big data technology helps governments obtain and use massive amounts
of data, provides visualization applications, and supports strategic and
business decisions. Big data is therefore very helpful to organizations:
managers can measure, and hence know more about, their businesses, and directly
translate that knowledge into improved decision making and performance.
Maintaining good data performance is crucial, especially in industries that
deal with massive amounts of data daily (Norazah M. Khushairi, Nurul, A. Emran
2014). Besides that, when big data is effectively and efficiently captured,
processed and analysed, companies are able to gain a more complete
understanding of their business, customers, products and competitors, which can
lead to efficiency improvements, increased sales, lower costs, better customer
service, and improved products and services. Big data can also help identify
hidden patterns and unknown correlations, and many organizations are using it
to target customer-centric outcomes, tap into internal data and build a better
information ecosystem.

Challenges of big data management

One of the challenges of big data management is data visualization. It is hard
to present a mountain of data in a consumable form. If the interpreters,
whether human or software, analyse the data and produce output that cannot be
understood, the analysis is of little use. Good data visualization depends on a
deep understanding of the range and vagaries of human cognition, and getting
this right is critical.
Secondly, data quality is another challenge of big data management. The problem
is that the accumulation of data makes it hard to keep all data consistent,
correct and complete (Emran et al. 2008), (Leza & Emran 2014). The more data is
stored, the harder it is to maintain data integrity. It is important to make
sure the data stays consistent at all times. If an update does not reach every
place where the data is stored, the copies are no longer synchronized and the
output will differ from the original, so data quality is poor in this
situation. Data should be stored and updated in all locations together to
achieve the best data quality in big data management.
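
The ideas of completeness and consistency can be illustrated with a minimal
Python sketch. The functions `completeness` and `inconsistent_keys`, and the
sample records, are hypothetical and only show one simple way such checks might
be expressed; they are not taken from the cited works.

```python
def completeness(records, required_fields):
    """Return the fraction of required field values that are present (not None)."""
    total = len(records) * len(required_fields)
    if total == 0:
        return 1.0
    present = sum(
        1
        for record in records
        for field in required_fields
        if record.get(field) is not None
    )
    return present / total


def inconsistent_keys(primary, replica):
    """Return the keys whose values differ between two copies of the data."""
    all_keys = set(primary) | set(replica)
    return sorted(k for k in all_keys if primary.get(k) != replica.get(k))


records = [
    {"id": 1, "name": "Aminah", "email": "aminah@example.com"},
    {"id": 2, "name": "Wei", "email": None},  # missing email
]
score = completeness(records, ["id", "name", "email"])

primary = {"stock_a": 10, "stock_b": 7}
replica = {"stock_a": 10, "stock_b": 5}  # stale copy: update not propagated
stale = inconsistent_keys(primary, replica)
```

Here five of the six required values are present, and the comparison flags
`stock_b` as the item whose copies are out of step.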

Besides that, the more data is stored, the harder it is to analyse and
interpret. How the data is handled affects the interpretation, but the key
point is how well the stored data is understood. It is definitely not easy to
read, understand and analyse the data, because big data includes many different
kinds of content. Efficient filters and pattern recognizers have to be designed
to sieve through the huge volume of data. As a result, patterns can be found
that may be relevant to the dimension of interest.
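
As a rough illustration of a filter and pattern recognizer, the Python sketch
below reads records one at a time, keeps only those that match a pattern, and
summarizes them. The log format and the function name `error_counts` are
assumptions made for this example.

```python
import re
from collections import Counter

# Hypothetical log format: "<LEVEL> <service>: <message>"
LINE_PATTERN = re.compile(r"^(?P<level>[A-Z]+) (?P<service>\w+): (?P<message>.*)$")


def error_counts(lines):
    """Count ERROR lines per service, reading one line at a time."""
    counts = Counter()
    for line in lines:
        match = LINE_PATTERN.match(line)
        if match and match.group("level") == "ERROR":
            counts[match.group("service")] += 1
    return counts


sample = [
    "INFO billing: invoice created",
    "ERROR billing: payment gateway timeout",
    "ERROR search: index unavailable",
    "ERROR billing: payment gateway timeout",
    "this line does not match the format",
]
counts = error_counts(sample)
```

Because the function processes one line at a time, the same approach could in
principle be applied to a file or stream that is too large to hold in memory.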

Furthermore, querying is hard within big data. The way the data is stored may
affect how it can be queried. With such a huge amount of data stored, the
relationships between data items must be known before the overall output can be
obtained. In addition, output is certainly returned faster when data is
retrieved from only 10 rows than from thousands of rows. The high complexity
and high volume of data make querying a critical challenge.
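
The effect of storage layout on querying can be sketched in Python by comparing
a full scan with a lookup through an index. The table contents and the helper
names `scan` and `build_index` are hypothetical; real database systems use more
elaborate index structures.

```python
from collections import defaultdict

# A hypothetical table of 100,000 orders spread over 1,000 customers.
rows = [
    {"order_id": i, "customer": f"c{i % 1000}", "total": i * 1.5}
    for i in range(100_000)
]


def scan(rows, customer):
    """Full scan: examines every row, so the cost grows with the data size."""
    return [r for r in rows if r["customer"] == customer]


def build_index(rows, column):
    """Build a hash index that maps each column value to its matching rows."""
    index = defaultdict(list)
    for r in rows:
        index[r[column]].append(r)
    return index


by_customer = build_index(rows, "customer")

# Both approaches return the same rows; the index answers the query
# with one dictionary lookup instead of reading all 100,000 rows.
from_scan = scan(rows, "c42")
from_index = by_customer["c42"]
```

The index costs extra memory and must be kept up to date when rows change,
which is the usual trade-off for faster queries.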

Security is also an issue of interest among the challenges of big data
management. Besides security, privacy and regulatory considerations also need
attention. Security prevents unauthorized people from logging in to accounts
and prevents data and information from being accessed by unauthorized users. It
is not easy to retrieve the intended data from big data, and the content that
is captured must be reliable, that is, not edited by unauthorized people.
Privacy is a matter of keeping big data confidential. For instance, personal
health data should be well protected because the information is confidential to
the patient.
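
One common way to detect whether content has been edited by someone without
authorization is to store a keyed hash alongside each record. The Python sketch
below uses the standard `hmac` and `hashlib` modules; the key, the record
layout and the function names are assumptions for illustration, and a real
system would also need proper key management and access control.

```python
import hashlib
import hmac

SECRET_KEY = b"demo-key-not-for-production"  # hypothetical key for illustration


def sign(record: bytes) -> str:
    """Return an HMAC-SHA256 tag for the record."""
    return hmac.new(SECRET_KEY, record, hashlib.sha256).hexdigest()


def is_unmodified(record: bytes, tag: str) -> bool:
    """Compare tags in constant time to detect tampering."""
    return hmac.compare_digest(sign(record), tag)


original = b"patient=123;blood_type=O+"
tag = sign(original)

# Someone without the key changes the record but cannot produce a valid tag.
tampered = b"patient=123;blood_type=AB-"

original_ok = is_unmodified(original, tag)
tampered_ok = is_unmodified(tampered, tag)
```

This addresses integrity only; keeping the health record confidential would
additionally require encryption and restricted access.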

Tools for big data management

The tools that work with big data are used for storage, analysis and querying.
Storing big data is an issue that data managers need to deal with, especially
in organizations that set green data management as a priority (Emran et al.
2013). Good big data tools provide the infrastructure to support all the
related activities. This is because the data we deal with is much larger than
traditional data. Therefore, good tools are needed to get the best performance
on big data. Hadoop, Cloudera and Talend are examples of big data management
tools.

Hadoop is one of the tools used to manage big data. It is a popular tool for
organizing and processing data. Hadoop is an open-source software framework
produced by Apache. It is prevalent in many industries because it provides a
software library for processing voluminous data sets across clusters of
computers using effective programming models. Hadoop offers great processing
power and can handle virtually limitless concurrent tasks or jobs. In addition,
the developers provide improvements and updates to the product regularly.
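
The programming model most often associated with Hadoop is MapReduce. The
Python sketch below imitates its map, shuffle and reduce steps in a single
process to count words. It does not use the Hadoop API, and the function names
and sample documents are invented for illustration.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group the emitted values by key, as the framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)


documents = ["big data is big", "data management"]

# On a real cluster, each document (or block) would be mapped on a different node.
mapped = [pair for doc in documents for pair in map_phase(doc)]
word_counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
```

Because each map call and each reduce call depends only on its own input, the
work can be spread over many machines, which is what allows the model to scale
to voluminous data sets.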

Cloudera's main purpose is to create a data repository that can be accessed by
all corporate users who need the data for different purposes. Cloudera helps a
business build an enterprise data hub and gives people in the organization
better access to the data that is stored. It acts as an enterprise solution for
managing both the business's data and the Hadoop ecosystem at the same time.
This combination helps to increase competitiveness.

Talend also provides a good platform for working with big data. It is
open-source, and the software is called Talend Open Studio. Talend offers an
Eclipse-based IDE for combining tasks with Hadoop. The company focuses on its
Master Data Management (MDM) offering, which combines real-time data,
applications and process integration with embedded data quality and
stewardship. Talend Studio lets users build jobs by dragging and dropping
little icons onto a canvas. To get an RSS feed, for example, a Talend component
will fetch the RSS and add proxying if necessary. There are dozens of
components for gathering information and dozens more for doing things like a
"fuzzy match", after which the results are output. Stringing together blocks
visually can be simple once the user gets a feel for what the components
actually do and don't do. This is easier to figure out by looking at the source
code being assembled behind the canvas. Visual programming in Talend may seem
like a lofty goal, but the icons can never represent the mechanisms with enough
detail to make it possible to understand what is going on.
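
To give a feel for what a "fuzzy match" step does, the Python sketch below
matches a misspelled name against a list of known names using the standard
`difflib` module. It is not Talend's implementation; the function name, the
0.8 threshold and the sample names are assumptions for illustration.

```python
from difflib import SequenceMatcher


def fuzzy_match(name, candidates, threshold=0.8):
    """Return the candidate most similar to name, or None if below threshold."""
    best, best_score = None, 0.0
    for candidate in candidates:
        score = SequenceMatcher(None, name.lower(), candidate.lower()).ratio()
        if score > best_score:
            best, best_score = candidate, score
    return best if best_score >= threshold else None


master_customers = ["Acme Corporation", "Globex Sdn Bhd", "Initech"]

match = fuzzy_match("Acme Corporaton", master_customers)  # misspelled input
no_match = fuzzy_match("Umbrella", master_customers)      # nothing similar
```

In a data quality workflow, a step like this might be used to link records that
refer to the same customer despite spelling differences.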

Conclusion

Big data management is a current trend. It helps to raise the level of business
intelligence. Moreover, it improves performance in running operational data,
cleaning data, enriching data, modelling data and other tasks, for the best
analysis results. The majority of big data management software is open-source,
which makes it easy to deploy data analysis that copes with data volume,
velocity and variety together. Big data management helps organizations keep up
with the rapid pace of innovation. In a nutshell, big data is the coming era of
data warehousing and business analysis, and it is poised to deliver top-line
revenue cost-efficiently for enterprises.

References

Search Data Management. (2016). Big Data Management. Retrieved from
http://searchdatamanagement.techtarget.com/definition/big-data-management

Wikipedia. (2017). Big Data. Retrieved from
https://en.wikipedia.org/wiki/Big_data

Webopedia. (2017). Big Data. Retrieved from
http://www.webopedia.com/TERM/B/big_data.html

TechTarget. (2013). 3Vs. Retrieved from
http://whatis.techtarget.com/definition/3Vs

Search Business Analytics. (2010). Unstructured Data. Retrieved from
http://searchbusinessanalytics.techtarget.com/definition/unstructured-data

Search SQL Server. (2006). Data Structure. Retrieved from
http://searchsqlserver.techtarget.com/definition/data-structure

Tom Jager. (2016). Top 10 tools for working with big data for successful analytics
developers. Retrieved from
http://bigdata-madesimple.com/top-10-tools-for-working-with-big-data-for-successful-analytics-developers-2/

Import.io. (2017). All the best big data tools and how to use them. Retrieved from
https://www.import.io/post/all-the-best-big-data-tools-and-how-to-use-them/

James Nunns. (2015). 10 of the most popular Big Data tools for developers. Retrieved
from
http://www.cbronline.com/news/big-data/10-of-the-most-popular-big-data-tools-for-developers-4570483/

Kathy Simpson. (2016). 10 Tips to Prevent Data Theft for Your Small Business.
Retrieved from
https://sba.thehartford.com/managing-risk/10-tips-to-prevent-data-theft

Rajeev Agrawal, Christopher Nyamful. (2016). Challenges of big data storage and
management. Retrieved from
https://www.researchgate.net/publication/298433319_Challenges_of_big_data_storage_and_management

Bill Carmody. (2016). Biggest problem with big data management in 2016. Retrieved
from
https://www.inc.com/bill-carmody/biggest-problem-with-big-data-management-in-2016.html

Kirk Borne. (2014). Top 10 big data challenges: A serious look at 10 big data Vs.
Retrieved from
https://mapr.com/blog/top-10-big-data-challenges-serious-look-10-big-data-vs/

Peter Wayne. (2012). 7 top tools for taming big data. Retrieved from
http://www.infoworld.com/article/2616959/big-data/7-top-tools-for-taming-big-data.html

John Parkinson. (2012). Managing Big Data: Six Operational Challenges. Retrieved
from
http://www.cioinsight.com/c/a/Expert-Voices/Managing-Big-Data-Six-Operational-Challenges-484979

Emran, N.A., Abdullah, N. & Isa, M.N.M., 2013. Storage space optimisation for
green data center. In Procedia Engineering. pp. 483-490.

Emran, N., Embury, S. & Missier, P., 2008. Model-driven component generation for
families of completeness. In 6th International Workshop on Quality in
Databases and Management of Uncertain Data, Very Large Databases (VLDB).

Leza, F.N.M. & Emran, N.A., 2014. Data accessibility model using QR code for
lifetime healthcare records. World Applied Sciences Journal, 30(30), pp. 395-402.

Norazah M. Khushairi, Nurul, A. Emran, M.M.Y., 2014. Database Performance
Tuning Methods for Manufacturing Execution System. World Applied Sciences
Journal, 30(30), pp.97-199.
