Principles of Data Science
About this ebook

You should be fairly well acquainted with basic algebra and should feel comfortable reading snippets of R/Python as well as pseudo code. You should have the urge to learn and apply the techniques put forth in this book on either your own data sets or those provided to you. If you have the basic math skills but want to apply them in data science or you have good programming skills but lack math, then this book is for you.
Language: English
Release date: Dec 16, 2016
ISBN: 9781785888922


    Principles of Data Science - Sinan Ozdemir

    Table of Contents

    Principles of Data Science

    Credits

    About the Author

    About the Reviewers

    www.PacktPub.com

    eBooks, discount offers, and more

    Why subscribe?

    Preface

    What this book covers

    What you need for this book

    Who this book is for

    Conventions

    Reader feedback

    Customer support

    Downloading the example code

    Downloading the color images of this book

    Errata

    Piracy

    Questions

    1. How to Sound Like a Data Scientist

    What is data science?

    Basic terminology

    Why data science?

    Example – Sigma Technologies

    The data science Venn diagram

    The math

    Example – spawner-recruit models

    Computer programming

    Why Python?

    Python practices

    Example of basic Python

    Example – parsing a single tweet

    Domain knowledge

    Some more terminology

    Data science case studies

    Case study – automating government paper pushing

    Fire all humans, right?

    Case study – marketing dollars

    Case study – what's in a job description?

    Summary

    2. Types of Data

    Flavors of data

    Why look at these distinctions?

    Structured versus unstructured data

    Example of data preprocessing

    Word/phrase counts

    Presence of certain special characters

    Relative length of text

    Picking out topics

    Quantitative versus qualitative data

    Example – coffee shop data

    Example – world alcohol consumption data

    Digging deeper

    The road thus far…

    The four levels of data

    The nominal level

    Mathematical operations allowed

    Measures of center

    What data is like at the nominal level

    The ordinal level

    Examples

    Mathematical operations allowed

    Measures of center

    Quick recap and check

    The interval level

    Example

    Mathematical operations allowed

    Measures of center

    Measures of variation

    Standard deviation

    The ratio level

    Examples

    Measures of center

    Problems with the ratio level

    Data is in the eye of the beholder

    Summary

    3. The Five Steps of Data Science

    Introduction to data science

    Overview of the five steps

    Ask an interesting question

    Obtain the data

    Explore the data

    Model the data

    Communicate and visualize the results

    Explore the data

    Basic questions for data exploration

    Dataset 1 – Yelp

    Dataframes

    Series

    Exploration tips for qualitative data

    Nominal level columns

    Filtering in Pandas

    Ordinal level columns

    Dataset 2 – titanic

    Summary

    4. Basic Mathematics

    Mathematics as a discipline

    Basic symbols and terminology

    Vectors and matrices

    Quick exercises

    Answers

    Arithmetic symbols

    Summation

    Proportional

    Dot product

    Graphs

    Logarithms/exponents

    Set theory

    Linear algebra

    Matrix multiplication

    How to multiply matrices

    Summary

    5. Impossible or Improbable – A Gentle Introduction to Probability

    Basic definitions

    Probability

    Bayesian versus Frequentist

    Frequentist approach

    The law of large numbers

    Compound events

    Conditional probability

    The rules of probability

    The addition rule

    Mutual exclusivity

    The multiplication rule

    Independence

    Complementary events

    A bit deeper

    Summary

    6. Advanced Probability

    Collectively exhaustive events

    Bayesian ideas revisited

    Bayes theorem

    More applications of Bayes theorem

    Example – Titanic

    Example – medical studies

    Random variables

    Discrete random variables

    Types of discrete random variables

    Binomial random variables

    Poisson random variables

    Continuous random variables

    Summary

    7. Basic Statistics

    What are statistics?

    How do we obtain and sample data?

    Obtaining data

    Observational

    Experimental

    Sampling data

    Probability sampling

    Random sampling

    Unequal probability sampling

    How do we measure statistics?

    Measures of center

    Measures of variation

    Definition

    Example – employee salaries

    Measures of relative standing

    The insightful part – correlations in data

    The Empirical rule

    Summary

    8. Advanced Statistics

    Point estimates

    Sampling distributions

    Confidence intervals

    Hypothesis tests

    Conducting a hypothesis test

    One sample t-tests

    Example of a one sample t-test

    Assumptions of the one sample t-test

    Type I and type II errors

    Hypothesis test for categorical variables

    Chi-square goodness of fit test

    Assumptions of the chi-square goodness of fit test

    Example of a chi-square test for goodness of fit

    Chi-square test for association/independence

    Assumptions of the chi-square independence test

    Summary

    9. Communicating Data

    Why does communication matter?

    Identifying effective and ineffective visualizations

    Scatter plots

    Line graphs

    Bar charts

    Histograms

    Box plots

    When graphs and statistics lie

    Correlation versus causation

    Simpson's paradox

    If correlation doesn't imply causation, then what does?

    Verbal communication

    It's about telling a story

    On the more formal side of things

    The why/how/what strategy of presenting

    Summary

    10. How to Tell If Your Toaster Is Learning – Machine Learning Essentials

    What is machine learning?

    Machine learning isn't perfect

    How does machine learning work?

    Types of machine learning

    Supervised learning

    It's not only about predictions

    Types of supervised learning

    Regression

    Classification

    Data is in the eyes of the beholder

    Unsupervised learning

    Reinforcement learning

    Overview of the types of machine learning

    How does statistical modeling fit into all of this?

    Linear regression

    Adding more predictors

    Regression metrics

    Logistic regression

    Probability, odds, and log odds

    The math of logistic regression

    Dummy variables

    Summary

    11. Predictions Don't Grow on Trees – or Do They?

    Naïve Bayes classification

    Decision trees

    How does a computer build a regression tree?

    How does a computer fit a classification tree?

    Unsupervised learning

    When to use unsupervised learning

    K-means clustering

    Illustrative example – data points

    Illustrative example – beer!

    Choosing an optimal number for K and cluster validation

    The Silhouette Coefficient

    Feature extraction and principal component analysis

    Summary

    12. Beyond the Essentials

    The bias variance tradeoff

    Error due to bias

    Error due to variance

    Two extreme cases of bias/variance tradeoff

    Underfitting

    Overfitting

    How bias/variance play into error functions

    K folds cross-validation

    Grid searching

    Visualizing training error versus cross-validation error

    Ensembling techniques

    Random forests

    Comparing Random forests with decision trees

    Neural networks

    Basic structure

    Summary

    13. Case Studies

    Case study 1 – predicting stock prices based on social media

    Text sentiment analysis

    Exploratory data analysis

    Regression route

    Classification route

    Going beyond with this example

    Case study 2 – why do some people cheat on their spouses?

    Case study 3 – using tensorflow

    Tensorflow and neural networks

    Summary

    Index

    Principles of Data Science

    Copyright © 2016 Packt Publishing

    All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

    Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.

    Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

    First published: December 2016

    Production reference: 1121216

    Published by Packt Publishing Ltd.

    Livery Place

    35 Livery Street

    Birmingham B3 2PB, UK.

    ISBN 978-1-78588-791-8

    www.packtpub.com

    Credits

    Author

    Sinan Ozdemir

    Reviewers

    Samir Madhavan

    Oleg Okun

    Acquisition Editor

    Sonali Vernekar

    Content Development Editor

    Samantha Gonsalves

    Technical Editor

    Anushree Arun Tendulkar

    Copy Editor

    Shaila Kusanale

    Project Coordinator

    Devanshi Doshi

    Proofreaders

    Safis Editing

    Indexer

    Tejal Daruwale Soni

    Graphics

    Jason Monteiro

    Production Coordinator

    Melwyn Dsa

    Cover Work

    Melwyn Dsa

    About the Author

    Sinan Ozdemir is a data scientist, startup founder, and educator living in the San Francisco Bay Area with his dog, Charlie; cat, Euclid; and bearded dragon, Fiero. He spent his academic career studying pure mathematics at Johns Hopkins University before transitioning to education. He spent several years conducting lectures on data science at Johns Hopkins University and at the General Assembly before founding his own start-up, Legion Analytics, which uses artificial intelligence and data science to power enterprise sales teams.

    After completing the Fellowship at the Y Combinator accelerator, Sinan has spent most of his days working on his fast-growing company, while creating educational material for data science.

    I would like to thank my parents and my sister for supporting me through life, and also, my various mentors, including Dr. Pam Sheff of Johns Hopkins University and Nathan Neal, the chapter adviser of my collegiate leadership fraternity, Sigma Chi.

    Thank you to Packt Publishing for giving me this opportunity to share the principles of data science and my excitement for how this field will impact all of our lives in the coming years.

    About the Reviewers

    Samir Madhavan has over six years of rich data science experience in the industry and has also written a book called Mastering Python for Data Science. He started his career with Mindtree, where he was part of the fraud detection algorithm team for the UID (Unique Identification) project, called Aadhar, which is the equivalent of a Social Security number for India. After this, he joined Flutura Decision Sciences and Analytics as the first employee, where he was part of the core team that helped the organization scale to over a hundred members. At Flutura, he helped establish the big data and machine learning practice and also helped out in business development. At present, he is leading the analytics team for a Boston-based pharma tech company called Zapprx, helping the firm create data-driven products that will be sold to its customers.

    Oleg Okun is a machine learning expert and an author/editor of four books, numerous journal articles, and conference papers. His career spans more than a quarter of a century. He was employed in both academia and industry in his mother country, Belarus, and abroad (Finland, Sweden, and Germany). His work experience includes document image analysis, fingerprint biometrics, bioinformatics, online/offline marketing analytics, and credit-scoring analytics.

    He is interested in all aspects of distributed machine learning and the Internet of Things. Oleg currently lives and works in Hamburg, Germany.

    I would like to express my deepest gratitude to my parents for everything that they have done for me.

    www.PacktPub.com

    eBooks, discount offers, and more

    Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at for more details.

    At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.

    https://www.packtpub.com/mapt

    Get the most in-demand software skills with Mapt. Mapt gives you full access to all Packt books and video courses, as well as industry-leading tools to help you plan your personal development and advance your career.

    Why subscribe?

    Fully searchable across every book published by Packt

    Copy and paste, print, and bookmark content

    On demand and accessible via a web browser

    Preface

    The topic of this book is data science, a field of study and application that has been growing rapidly for the past several decades. As a growing field, it is gaining a lot of attention in both the media and the job market. The United States recently appointed its first ever chief data scientist, DJ Patil. This move was modeled after tech companies that, honestly, only recently started hiring massive data teams. These skills are in high demand, and their applications extend much further than today's job market.

    This book will attempt to bridge the gap between math, programming, and domain expertise. Most people today have expertise in at least one of these (maybe two), but proper data science requires a little bit of all three. We will dive into topics from all three areas and solve complex problems. We will clean, explore, and analyze data in order to derive scientific and accurate conclusions, and we will apply machine learning and deep learning techniques to solve complex data tasks.

    What this book covers

    Chapter 1, How to Sound Like a Data Scientist, gives an introduction to the basic terminology used by data scientists and a look at the types of problem we will be solving throughout this book.

    Chapter 2, Types of Data, looks at the different levels and types of data out there and how to manipulate each type. This chapter will begin to deal with the mathematics needed for data science.

    Chapter 3, The Five Steps of Data Science, uncovers the five basic steps of performing data science, including data manipulation and cleaning, and sees examples of each step in detail.

    Chapter 4, Basic Mathematics, helps us discover the basic mathematical principles that guide the actions of data scientists by seeing and solving examples in calculus, linear algebra, and more.

    Chapter 5, Impossible or Improbable – A Gentle Introduction to Probability, is a beginner's look into probability theory and how it is used to gain an understanding of our random universe.

    Chapter 6, Advanced Probability, uses principles from the previous chapter and introduces and applies theorems, such as the Bayes Theorem, in the hope of uncovering the hidden meaning in our world.

    Chapter 7, Basic Statistics, deals with the types of problem that statistical inference attempts to explain, using the basics of experimentation, normalization, and random sampling.

    Chapter 8, Advanced Statistics, uses hypothesis testing and confidence intervals in order to gain insight from our experiments. Being able to pick which test is appropriate and how to interpret p-values and other results is very important as well.

    Chapter 9, Communicating Data, explains how correlation and causation affect our interpretation of data. We will also be using visualizations in order to share our results with the world.

    Chapter 10, How to Tell If Your Toaster Is Learning – Machine Learning Essentials, focuses on the definition of machine learning and looks at real-life examples of how and when machine learning is applied. A basic understanding of the relevance of model evaluation is introduced.

    Chapter 11, Predictions Don't Grow on Trees, or Do They?, looks at more complicated machine learning models, such as decision trees and Bayesian-based predictions, in order to solve more complex data-related tasks.

    Chapter 12, Beyond the Essentials, introduces some of the mysterious forces guiding data science, including bias and variance. Neural networks are introduced as a modern deep learning technique.

    Chapter 13, Case Studies, uses an array of case studies in order to solidify the ideas of data science. We will be following the entire data science workflow from start to finish multiple times for different examples, including stock price prediction and handwriting detection.

    What you need for this book

    This book uses Python to complete all of its code examples. A machine (Linux/Mac/Windows OK) with access to a Unix-style Terminal with Python 2.7 installed is required. Installation of the Anaconda distribution is also recommended as it comes with most of the packages used in the examples.

    Who this book is for

    This book is for people who are looking to understand and utilize the basic practices of data science for any domain.

    The reader should be fairly well acquainted with basic mathematics (algebra, perhaps probabilities) and should feel comfortable reading snippets of R/Python as well as pseudocode. The reader is not expected to have worked in a data field; however, they should have the urge to learn and apply the techniques put forth in this book to either their own datasets or those provided to them.

    Conventions

    In this book, you will find a number of text styles that distinguish between different kinds of information. Here are some examples of these styles and an explanation of their meaning.

    Code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles are shown as follows: "For these operators, keep the boolean datatype in mind."

    A block of code is set as follows:

    tweet = "RT @j_o_n_dnger: $TWTR now top holding for Andor, unseating $AAPL"

    words_in_tweet = tweet.split(' ') # list of words in tweet

    When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:

    for word in words_in_tweet:                # for each word in list

        if '$' in word:                        # if word has a cashtag

            print 'THIS TWEET IS ABOUT', word  # alert the user
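    The snippets above use the Python 2.7 syntax that this book targets. For readers working in Python 3, where print is a function, an equivalent sketch of the same cashtag check might look like this:

```python
tweet = "RT @j_o_n_dnger: $TWTR now top holding for Andor, unseating $AAPL"

words_in_tweet = tweet.split(' ')           # list of words in tweet

for word in words_in_tweet:                 # for each word in list
    if '$' in word:                         # if word has a cashtag
        print('THIS TWEET IS ABOUT', word)  # alert the user
```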

    Note

    Warnings or important notes appear in a box like this.

    Tip

    Tips and tricks appear like this.

    Reader feedback

    Feedback from our readers is always welcome. Let us know what you think about this book—what you liked or disliked. Reader feedback is important for us as it helps us develop titles that you will really get the most out of.

    To send us general feedback, simply e-mail <feedback@packtpub.com>, and mention the book's title in the subject of your message.

    If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, see our author guide at www.packtpub.com/authors.

    Customer support

    Now that you are the proud owner of a Packt book, we have a number of things to help you to get the most from your purchase.

    Downloading the example code

    You can download the example code files for this book from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.

    You can download the code files by following these steps:

    Log in or register to our website using your e-mail address and password.

    Hover the mouse pointer on the SUPPORT tab at the top.

    Click on Code Downloads & Errata.

    Enter the name of the book in the Search box.

    Select the book for which you're looking to download the code files.

    Choose from the drop-down menu where you purchased this book from.

    Click on Code Download.

    You can also download the code files by clicking on the Code Files button on the book's webpage at the Packt Publishing website. This page can be accessed by entering the book's name in the Search box. Please note that you need to be logged in to your Packt account.

    Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:

    WinRAR / 7-Zip for Windows

    Zipeg / iZip / UnRarX for Mac

    7-Zip / PeaZip for Linux

    The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/Principles-of-Data-Science. We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

    Downloading the color images of this book

    We also provide you with a PDF file that has color images of the screenshots/diagrams used in this book. The color images will help you better understand the changes in the output. You can download this file from https://www.packtpub.com/sites/default/files/downloads/PrinciplesofDataScience_ColorImages.pdf.

    Errata

    Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you find a mistake in one of our books—maybe a mistake in the text or the code—we would be grateful if you could report this to us. By doing so, you can save other readers from frustration and help us improve subsequent versions of this book. If you find any errata, please report them by visiting http://www.packtpub.com/submit-errata, selecting your book, clicking on the Errata Submission Form link, and entering the details of your errata. Once your errata are verified, your submission will be accepted and the errata will be uploaded to our website or added to any list of the existing errata under the Errata section of that title.

    To view the previously submitted errata, go to https://www.packtpub.com/books/content/support and enter the name of the book in the search field. The required information will appear under the Errata section.

    Piracy

    Piracy of copyrighted material on the Internet is an ongoing problem across all media. At Packt, we take the protection of our copyright and licenses very seriously. If you come across any illegal copies of our works in any form on the Internet, please provide us with the location address or website name immediately so that we can pursue a remedy.

    Please contact us at <copyright@packtpub.com> with a link to the suspected pirated material.

    We appreciate your help in protecting our authors and our ability to bring you valuable content.

    Questions

    If you have a problem with any aspect of this book, you can contact us at <questions@packtpub.com>, and we will do our best to address the problem.

    Chapter 1. How to Sound Like a Data Scientist

    No matter which industry you work in, IT, fashion, food, or finance, there is no doubt that data affects your life and work. At some point in this week, you will either have or hear a conversation about data. News outlets are covering more and more stories about data leaks, cybercrimes, and how data can give us a glimpse into our lives. But why now? What makes this era such a hotbed for data-related industries?

    In the 19th century, the world was in the grip of the industrial age. Mankind was exploring its place in industry alongside giant mechanical inventions. Captains of industry, such as Henry Ford, recognized major market opportunities at the hands of these machines, and were able to achieve previously unimaginable profits. Of course, the industrial age had its pros and cons. While mass production placed goods in the hands of more consumers, our battle with pollution also began around this time.

    By the 20th century, we were quite skilled at making huge machines; the goal now was to make them smaller and faster. The industrial age was over and was replaced by what we refer to as the information age. We started using machines to gather and store information (data) about ourselves and our environment for the purpose of understanding our universe.

    Beginning in the 1940s, machines like ENIAC (considered one of the first, if not the first, computers) were computing math equations and running models and simulations like never before.

    The ENIAC, http://ftp.arl.mil/ftp/historic-computers/

    We finally had a decent lab assistant who could run the numbers better than we could! As with the industrial age, the information age brought us both the good and the bad. The good included extraordinary pieces of technology, such as mobile phones and televisions. The bad in this case was not as bad as worldwide pollution, but it still left us with a problem in the 21st century: so much data.

    That's right, the information age, in its quest to procure data, has exploded the production of electronic data. Estimates show that we created about 1.8 trillion gigabytes of data in 2011 (take a moment to just think about how much that is). Just one year later, in 2012, we created over 2.8 trillion gigabytes of data! This number is projected to explode further, hitting an estimated 40 trillion gigabytes created in a single year by 2020. People contribute to this every time they tweet, post on Facebook, save a new resume in Microsoft Word, or just send their mom a picture through text message.

    Not only are we creating data at an unprecedented rate, we are consuming it at an accelerated pace as well. Just three years ago, in 2013, the average cell phone user used under 1 GB of data a month. Today, that number is estimated to be well over 2 GB a month. We aren't just looking for the next personality quiz; what we are looking for is insight. Of all this data out there, some of it has to be useful to me! And it can be!

    So we, in the 21st century, are left with a problem. We have so much data and we keep making more. We have built insanely tiny machines that collect data 24/7, and it's our job to make sense of it all. Enter the data age. This is the age when we take machines dreamed up by our 19th century ancestors and the data created by our 20th century counterparts and create insights and sources of knowledge that every human on Earth can benefit from. The United States created an entirely new role in the government: the chief data scientist. Tech companies, such as Reddit, which up until now did not have data scientists on their teams, are now hiring them left and right. The benefit is quite obvious—using data to make accurate predictions and simulations gives us a look into our world like never before.

    Sounds great, but what's the catch?

    This chapter will explore the terminology and vocabulary of the modern data scientist. We will see key words and phrases that are essential in our discussion on data science throughout this book. We will also look at why we use data science and the three key domains data science is derived from before we begin to look at code in Python, the primary language used in this book:

    Basic terminology of data science

    The three domains of data science

    The basic Python syntax

    What is data science?

    Before we go any further, let's look at some basic definitions that we will use throughout this book. The great/awful thing about this field is that it is so young that these definitions can differ from textbook to newspaper to whitepaper.

    Basic terminology

    The definitions that follow are general enough to be used in daily conversations and work to serve the purpose of the book, an introduction to the principles of data science.

    Let's start by defining what data is. This might seem like a silly first definition to have, but it is very important. Whenever we use the word data, we refer to a collection of information in either an organized or unorganized format:

    Organized data: This refers to data that is sorted into a row/column structure, where every row represents a single observation and the columns represent the characteristics of that observation.

    Unorganized data: This is the type of data that is in the free form, usually text or raw audio/signals that must be parsed further to become organized.

    Whenever you open Excel (or any other spreadsheet program), you are looking at a blank row/column structure waiting for organized data. These programs don't do well with unorganized data. For the most part, we will deal with organized data as it is the easiest to glean insight from, but we will not shy away from looking at raw text and methods of processing unorganized forms of data.
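    As a minimal, hypothetical sketch of the distinction (the log string and field names here are invented purely for illustration), unorganized free-form text can be parsed into an organized row/column structure:

```python
# Unorganized data: a free-form string that must be parsed further.
raw_log = "2016-12-16 ERROR disk full; 2016-12-17 INFO backup done"

# Parse it into organized data: each dict is one observation (a row),
# and the shared keys act as the columns.
rows = []
for entry in raw_log.split('; '):
    date, level, message = entry.split(' ', 2)
    rows.append({'date': date, 'level': level, 'message': message})

# rows[0] -> {'date': '2016-12-16', 'level': 'ERROR', 'message': 'disk full'}
```

    Once in this form, every row represents a single observation and every key represents a characteristic of that observation—exactly the structure that spreadsheet programs (and the dataframes of later chapters) expect.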

    Data science is the art and science of acquiring knowledge through data.

    What a small definition for such a big topic, and rightfully so! Data science covers so many things that it would take pages to list it all out (I should know, I tried and got edited down).

    Data science is all about how we take data, use it to acquire knowledge, and then use that knowledge to do the following:

    Make decisions

    Predict the future

    Understand the past/present

    Create new industries/products

    This book is all about the methods of data science, including how to process data, gather insights, and use those insights to make informed decisions and predictions.

    Data science is about using data in order to gain new insights that you would otherwise have missed.

    As an example, imagine you are sitting around a table with three other people. The four of you have to make a decision based on some data. There are four opinions to consider. You would
