
About Data

The word “data” comes from the Latin word “datum,”
meaning a “thing given.” Although the term “data”
has been used since as early as the 1500s, modern
usage started in the 1940s and 1950s as practical
electronic computers began to input, process, and
output data.

The inventor of the World Wide Web, Tim
Berners-Lee, is often quoted as having said “Data is
not information, information is not knowledge,
knowledge is not understanding, understanding is
not wisdom.” This quote suggests a kind of pyramid,
where data are the raw materials that make up the
foundation at the bottom of the mountain, and
information, knowledge, understanding and wisdom
represent higher and higher levels of the pyramid.
Data is: Big!
● Around 100 hours of video are uploaded to YouTube every minute; at that rate, it
would take roughly 16 years of non-stop viewing to watch every video uploaded in one day
● Everything around you collects/generates data:
   ● Social media sites
   ● Business transactions
   ● Location-based data
   ● Sensors
   ● Digital photos, videos
   ● Consumer behavior (online and store transactions)
● More data is publicly available
● Database technology is advancing
● Cloud based & mobile applications are widespread
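The YouTube figure in the first bullet can be checked with simple arithmetic. The sketch below assumes the 100 hours per minute quoted in the slide and non-stop viewing, 24 hours a day; the function name is made up for illustration.

```python
# Back-of-the-envelope check of the YouTube bullet above.
HOURS_UPLOADED_PER_MINUTE = 100   # figure quoted in the slide
MINUTES_PER_DAY = 60 * 24

def years_to_watch_one_day_of_uploads(hours_per_minute: float) -> float:
    """Years of non-stop viewing needed to watch one day's uploads."""
    hours_uploaded_per_day = hours_per_minute * MINUTES_PER_DAY
    days_of_viewing = hours_uploaded_per_day / 24   # watching 24 hours a day
    return days_of_viewing / 365

years = years_to_watch_one_day_of_uploads(HOURS_UPLOADED_PER_MINUTE)
print(f"{years:.1f} years of continuous viewing")
```

Under these assumptions the result is a little over 16 years, close to the estimate quoted in the first bullet.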
Databases You Use
Pretty much every website you interact with:
● Social media
● Banking
● File sharing
● Search engines
● Online shopping
● Course registration / Canvas
● Travel
● Etc.
You broadcast/generate data everywhere you go:
● Cell phones
● Purchases
● Driving (GPS)
● Streaming music
● Email
● Posting status updates
● Attending events
● Etc.
What is data science?
The basic idea underlying most definitions
of data science is that it is used
to acquire knowledge from data in a
relevant field and to support
existing scientific research and
management decision making.
Data science affects academic and applied
research in many domains, including
machine translation, speech recognition,
robotics, search engines, and the digital economy,
as well as the biological sciences, medical
informatics, health care, social sciences
and the humanities. It heavily influences
economics, business and finance.
Why Data Science?
Say you have a company that makes mobile phones.
You released your first product, and it became a massive hit.
Every technology has a limited life, so now it is time to come up
with something new. But you don’t know what to
innovate to meet the expectations of the users who
are eagerly waiting for your next release.
Somebody in your company comes up with the idea of
using user-generated feedback to pick out the things
users are expecting in the next release.
This is where data science comes in: you apply data mining
techniques such as sentiment analysis to that feedback and get the
results you need.
It doesn’t stop there: you can make better decisions,
reduce your production costs by finding more efficient
ways of working, and give your customers what they actually want!
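As a rough illustration of the sentiment analysis step mentioned above, here is a minimal lexicon-based sketch. The word lists and the feedback messages are made up for this example; real sentiment analysis relies on much larger lexicons or trained models.

```python
import re
from collections import Counter

# Hypothetical word lists, for illustration only.
POSITIVE = {"love", "great", "fast", "excellent", "good"}
NEGATIVE = {"poor", "slow", "bad", "weak", "hate"}

def sentiment_score(text: str) -> int:
    """Number of positive words minus number of negative words."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

# Made-up user feedback about the first phone.
feedback = [
    "Love the screen, great camera",
    "Battery is poor and charging is slow",
    "Good phone but the speaker is weak",
]

scores = [sentiment_score(f) for f in feedback]
labels = Counter(
    "positive" if s > 0 else "negative" if s < 0 else "neutral" for s in scores
)
print(scores, dict(labels))
```

Counting how often each feature appears in the negative messages would then point to what users expect to see fixed in the next release.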
Benefits and uses of data science
and big data
• Data science and big data are used almost
everywhere in both commercial and
noncommercial settings. Commercial
companies in almost every industry use data
science and big data to gain insights into their
customers, processes, staff, competition, and
products. Many companies use data science
to offer customers a better user experience,
as well as to cross-sell, up-sell, and
personalize their offerings.
Data science is the science of
studying scientific data.
In theory, the scientific method works like this:
Researchers ask a question, construct a hypothesis, collect
data, and evaluate their results, and the world gains valuable
scientific insights.
As scientific data have become more accessible, the term data
science has been used to better characterize the data-
intensive nature of today’s science and engineering. Many
disciplines use data technology to deal with scientific data
from their respective areas.
For example, researchers at NuMedii, Inc., a big-data
company in Silicon Valley, predicted whether existing drugs
could be used to treat cancer by examining gene expression
data from over 2,500 tumor samples.
• A good example of a data product is the recommendation
engine, which ingests data generated by users and produces tailored
recommendations based on that data. Some
common examples of data products are:
• Amazon's recommendation engine suggests items for
users to buy and shows associated items and products, as
determined by its algorithms.
• Another data product is Gmail's spam filter, which runs an
algorithm in the background on incoming emails and
decides whether each message is spam or genuine.
• Self-driving cars are another data product; they use machine
learning techniques and algorithms that can detect traffic lights,
other cars, pedestrians on the road,
trees, pillars, and other obstacles.
Data Science Components
Statistics:
Statistics is the most critical component of data science. It is the method or science
of collecting and analyzing numerical data in large quantities to get useful
insights.
Visualization:
Visualization techniques present huge amounts of data as visuals that are
easy to understand and digest.
Machine Learning:
Machine learning explores the building and study of algorithms that learn
to make predictions about unseen/future data.
Deep Learning:
Deep learning is a newer branch of machine learning research in which
multi-layer neural networks learn their own representations of the data.
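To make the Machine Learning component concrete, here is a minimal sketch of an algorithm that learns from past data and predicts an unseen value. It uses an ordinary least squares line fit, chosen only because it is short; the data and names are made up for illustration.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    covariance = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    variance = sum((x - x_bar) ** 2 for x in xs)
    slope = covariance / variance
    intercept = y_bar - slope * x_bar
    return slope, intercept

# Made-up training data: advertising spend vs. units sold.
spend = [1, 2, 3, 4, 5]
units = [12, 14, 16, 18, 20]

slope, intercept = fit_line(spend, units)

# Predict for a spend level that was not in the training data.
prediction = slope * 6 + intercept
print(slope, intercept, prediction)
```

The "learning" here is just estimating two numbers from examples; more elaborate models follow the same pattern of fitting on known data and predicting on new data.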
Data science is the science of
studying business data.
In 2013, Provost pointed out that “extracting knowledge from data to
solve business problems” is one of the fundamental concepts of data
science.
Providing support for Business Intelligence (BI) methodology research
makes up a significant portion of the work performed by many data
scientists. As a result, a large proportion of BI
practitioners have transitioned into data scientist roles. Amazon, Google,
LinkedIn, Facebook, and other internet companies opened job positions
for data scientists and established data science teams. These data
scientists study and analyze business data to provide services for
management decision making. For example, Amazon uses collaborative
filtering to generate high-quality product recommendations, and
Facebook uses a “People you may know” feature to recommend friend
connections. From this point of view, the acquisition of knowledge from
business data in order to make decisions is one aspect of data science.
This is similar to what BI practitioners work on. For this
reason, many BI practitioners are also called data scientists.
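The collaborative filtering idea mentioned above can be sketched in a few lines. This is a toy item-based version using cosine similarity over made-up purchase histories; it is not a description of how Amazon's system actually works, and all names and data are hypothetical.

```python
from math import sqrt

# Made-up purchase history: user -> set of items bought.
purchases = {
    "ana":   {"phone", "case", "charger"},
    "ben":   {"phone", "case"},
    "chloe": {"phone", "charger", "headphones"},
    "dev":   {"laptop", "mouse"},
}

def buyers(item):
    """Set of users who bought the given item."""
    return {user for user, items in purchases.items() if item in items}

def similarity(item_a, item_b):
    """Cosine similarity between two items, based on who bought them."""
    a, b = buyers(item_a), buyers(item_b)
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))

def recommend(user, top_n=2):
    """Rank items the user does not own by similarity to items they own."""
    owned = purchases[user]
    candidates = set().union(*purchases.values()) - owned
    scores = {c: sum(similarity(c, o) for o in owned) for c in candidates}
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, score in ranked[:top_n] if score > 0]

print(recommend("ben"))
```

Items are recommended because people with overlapping purchases bought them, without the system knowing anything about the items themselves.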
Data science is an integration of statistics,
computing technology, and artificial
intelligence (AI).
This viewpoint often comes up in discussions on what
data scientists are. It is generally believed that data
scientists should have skills in statistics, computing
technology, AI, and related fields and that data scientists
are not individual people specializing in one field so much
as teams consisting of statisticians, computer scientists,
AI experts, and domain experts. For example, the data
scientist teams at Google and Facebook are composed of
statisticians, computer scientists, AI scientists, and
experts in other relevant fields. This viewpoint is simple:
Because statistics, computing technology, and AI are all
used to process and analyze data, they are all a natural
part of data science.
However, all work described above is still not enough to
establish data science as a new, unique branch of science. This
is because the objects of their study are things in the natural
world, and their research issues are also addressed in existing
scientific fields.
With the development of digital equipment, things in the
natural world are increasingly being stored in cyberspace in
the form of data. Data are entered, generated, and created in
cyberspace in a variety of ways and have become more and
more diverse, complex, and out of human control. More and
more data are unknown to or poorly understood by humans.
Data in cyberspace already show features of an
independent world, like the natural world, so all data in
cyberspace are here referred to as data nature.
Real Data & Virtual Data
It should be noted that there are two types of data in
cyberspace. The first is the data that represent things in
the natural world, here called real data. An example is
personal information, which is data representative of
personal characteristics. The second is data that do not
represent things in the natural world, here called virtual
data. Virtual data means that the instances of such data
have no references in the natural world. An example is
computer viruses, which are neither viruses in the natural
world nor data representation of real viruses; instead,
they only exist in cyberspace.
Data science – discovery of data insight
This aspect of data science is all about uncovering findings from
data: diving in at a granular level to mine and understand
complex behaviors, trends, and inferences. It's about surfacing
hidden insight that can help companies make smarter
business decisions. For example:
•Netflix mines movie viewing patterns to understand what
drives user interest, and uses that to decide which
Netflix original series to produce.
•Imtiaz Super Market identifies the major customer
segments within its base and the unique shopping behaviors
within those segments, which helps guide messaging to
different market audiences.
•Procter & Gamble uses time series models to understand
future demand more clearly, which helps it plan production levels
more optimally.
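The demand forecasting example above does not say which time series model is used. As a stand-in, here is a minimal sketch of simple exponential smoothing, one of the most basic forecasting methods; the demand figures and the smoothing factor are made up.

```python
def exponential_smoothing_forecast(series, alpha=0.5):
    """One-step-ahead forecast using simple exponential smoothing.

    alpha close to 1 follows recent values; alpha close to 0 smooths heavily.
    """
    if not series:
        raise ValueError("series must not be empty")
    level = series[0]
    for value in series[1:]:
        level = alpha * value + (1 - alpha) * level
    return level

# Made-up monthly demand figures.
monthly_demand = [100, 110, 120, 130]
forecast = exponential_smoothing_forecast(monthly_demand, alpha=0.5)
print(forecast)
```

Because this method only tracks a smoothed level, its forecast lags behind a steady upward trend like the one in the sample data; models with trend and seasonality terms address that.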
What is Search Engine Optimization (SEO)?
Search Engine Optimization (SEO) is the practice of increasing the number
and quality of visitors to a website by improving its rankings in the algorithmic search
engine results.
Research shows that websites on the first page of Google receive almost
95% of clicks, and studies show that results that appear higher up the page receive a
higher click-through rate (CTR) and more traffic.

How Does SEO Work?


•Google (and Bing, which also powers Yahoo search results) scores its search results largely
on the relevance and authority of the pages it has crawled and included in its web index, in
order to give the best answer to a user's query.
•SEO, therefore, involves making sure a website is accessible, technically sound, uses words that
people type into the search engines, and provides an excellent user experience, with useful,
high-quality, expert content that helps answer the user’s query.
How do data scientists mine out insights?
• It starts with data exploration. When given a challenging
question, data scientists become detectives. They investigate
leads and try to understand patterns or characteristics within
the data. This requires a big dose of analytical creativity.
• Then, as needed, data scientists may apply quantitative
techniques in order to get a level deeper, e.g. inferential
models, segmentation analysis, time series forecasting,
synthetic control experiments, etc. The intent is to
scientifically piece together a forensic view of what the data is
really saying.
• This data-driven insight is central to providing strategic
guidance. In this sense, data scientists act as consultants,
guiding business stakeholders on how to act on findings.
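The data exploration step described in the first bullet often begins with simple summaries. The sketch below uses made-up order records and Python's standard library to compute overall statistics and a per-segment breakdown; the segment names and values are hypothetical.

```python
from collections import defaultdict
from statistics import mean, median

# Made-up order records: (customer segment, order value).
orders = [
    ("student", 20), ("student", 25), ("student", 30),
    ("family", 80), ("family", 120), ("family", 100),
]

# Overall summary of order values.
values = [value for _, value in orders]
overall = {
    "mean": mean(values),
    "median": median(values),
    "min": min(values),
    "max": max(values),
}

# The same data broken down by segment.
by_segment = defaultdict(list)
for segment, value in orders:
    by_segment[segment].append(value)
segment_means = {segment: mean(vals) for segment, vals in by_segment.items()}

print(overall)
print(segment_means)
```

Here the overall mean hides two very different groups, which is the kind of pattern that exploration is meant to surface before any deeper modelling.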
Data science – development of data product
• A "data product" is a technical asset that: (1) utilizes data as
input, and (2) processes that data to return algorithmically-
generated results. The classic example of a data product is a
recommendation engine, which ingests user data, and makes
personalized recommendations based on that data. Here are
some examples of data products:
• Amazon's recommendation engines suggest items for you to
buy, determined by their algorithms. Netflix recommends
movies to you. Spotify recommends music to you.
• Gmail's spam filter is a data product: an algorithm behind the
scenes processes incoming mail and determines if a message
is junk or not.
• Computer vision used for self-driving cars is also a data product:
machine learning algorithms are able to recognize traffic
lights, other cars on the road, pedestrians, etc.
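To show what "utilizes data as input and returns algorithmically generated results" can look like, here is a toy spam filter built as a naive Bayes classifier with add-one smoothing. The training messages are made up and far too few for real use, and this is not a description of how Gmail's filter works.

```python
import math
import re
from collections import Counter

# Made-up training messages, for illustration only.
TRAINING = [
    ("win a free prize now", "spam"),
    ("free money claim your prize", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday with the team", "ham"),
]

def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())

# Count words per label and messages per label.
word_counts = {"spam": Counter(), "ham": Counter()}
message_counts = Counter()
for text, label in TRAINING:
    word_counts[label].update(tokenize(text))
    message_counts[label] += 1
vocabulary = set(word_counts["spam"]) | set(word_counts["ham"])

def log_score(text, label):
    """log P(label) + sum of log P(word | label), with add-one smoothing."""
    total_words = sum(word_counts[label].values())
    score = math.log(message_counts[label] / sum(message_counts.values()))
    for word in tokenize(text):
        count = word_counts[label][word] + 1
        score += math.log(count / (total_words + len(vocabulary)))
    return score

def classify(text):
    """Return the label with the higher score for this message."""
    return max(("spam", "ham"), key=lambda label: log_score(text, label))

print(classify("claim your free prize"))
print(classify("team meeting on monday"))
```

The input is data (labelled messages) and the output is a decision for each new message, which is the defining shape of a data product.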
“Hardware” and “Software” of
Data Science
Data storage, data communication, security, and
computing machinery can be considered as the
“hardware” in data science. Artificial intelligence (AI),
statistical methods, algorithm design, and possible new
mathematical methodologies can be viewed as the
“software” of data science.
Applications of Data science
• Internet Search:
Google Search uses data science technology to return
specific results within a fraction of a second.
• Recommendation Systems:
Recommendation systems such as "suggested friends"
on Facebook or "suggested videos" on YouTube
are built with the help of data science.
• Image & Speech Recognition:
Speech recognition systems such as Siri, Google Assistant, and
Alexa run on data science techniques.
Moreover, Facebook recognizes your friends when you
upload a photo with them, with the help of data
science.
What is Artificial Intelligence?
According to the father of Artificial Intelligence, John
McCarthy, it is “the science and engineering of making
intelligent machines, especially intelligent computer programs.”
Artificial Intelligence is a way of making a computer, a
computer-controlled robot, or a piece of software think intelligently, in
a manner similar to how intelligent humans think.
AI is accomplished by studying how the human brain thinks
and how humans learn, decide, and work while trying to solve a
problem, and then using the outcomes of this study as a basis for
developing intelligent software and systems.
Artificial Intelligence –
Research Areas
• The domain of artificial intelligence is huge in breadth and
depth. Here we consider the most common
and thriving research areas in the domain of AI.
Real Life Applications of Research Areas
There is a large array of applications where AI is serving common people in
their day-to-day lives −
Sr.No.  Research Area                 Real Life Application
1       Expert Systems                Examples − Flight-tracking systems, clinical systems.
2       Natural Language Processing   Examples − Google Now feature, speech recognition, automatic voice output.
3       Neural Networks               Examples − Pattern recognition systems such as face recognition, character recognition, handwriting recognition.
4       Robotics                      Examples − Industrial robots for moving, spraying, painting, precision checking, drilling, cleaning, coating, carving, etc.
5       Fuzzy Logic Systems           Examples − Consumer electronics, automobiles, etc.
SIR NASEEM AHMED KHAN
DOW VOCATIONAL & TECHNICAL TRAINING CENTRE
Email: naseemahmedkhan@hotmail.com
