Jeffrey Leek
Material for Introduction to Chromebook Data Science
This book is for sale at http://leanpub.com/universities/courses/jhu/cbds-intro
Copyright © Johns Hopkins University 2018. Creative Commons Attribution 4.0 International License.
Contents
Program Philosophy
How To Learn
Finding Help
Account Setup
Google Sheets
RStudio Cloud
DataCamp
Welcome to Chromebook Data Science
Hello and welcome! This is the first course in the Chromebook Data Science program. The goal of
Chromebook Data Science is to help anyone with an internet connection and a computer learn to
do data science. The program will start with the very basics of using a computer on the internet
and work all the way up to doing data science and data analysis. We hope that by building this
program we can help people get into the exciting tech world in one of the fastest growing¹ and most
satisfying² jobs in the United States. There are only going to be more and more jobs asking for data
science skills³ in the future. We believe that by making this career accessible to anyone we can have
a positive impact on the world.
Course Details
Before we jump into the content, we just wanted to orient you to how this course and all the courses
in this program will be laid out:
• Courses - There are multiple courses in the Chromebook Data Science program. The first one
is “Introduction to Chromebook Data Science”, which is the course you’re in right now.
• Lessons - Each course will consist of lessons. You’re looking at the first lesson here. It’s called
“Welcome to Chromebook Data Science”. You can see a list of all the lessons in this course in
the left panel. The lessons contain text and images to walk you through the material of each
course.
• Videos - At the end of each lesson there will be a link to a YouTube video. This video contains
the same information as the text of the lesson; however, we know that some people learn better
by listening. Some people may find the videos more helpful, while others may find the text more
helpful, so use whichever works best for you.
• Slides - Each lesson also has a link at the bottom to an accompanying slide show. Feel free to
look through these slides if you find them helpful. They are the same images that were used to
generate the video.
• Quizzes - Most lessons will have a quiz to evaluate your understanding of the material in that
lesson. Successful completion of these quizzes is required for receipt of the certificate at the
end of each course.
¹https://www.pwc.com/us/en/publications/data-science-and-analytics.html
²https://www.techrepublic.com/article/is-data-scientist-the-most-rewarding-tech-job-new-report-says-yes/
³https://www.forbes.com/sites/louiscolumbus/2017/05/13/ibm-predicts-demand-for-data-scientists-will-soar-28-by-2020/
• Exercises - A few of the courses will have associated exercises. Think of these as larger projects.
They won’t be required to receive the certificate at the end of the course; however, the skills
the exercises require will be essential if you’re interested in getting a job in data science, so
we highly suggest you complete them. Also, occasionally, there will be DataCamp exercises.
DataCamp is a company that generates content to help people learn to code. These cost money
and are not required for completion of the program; however, they will help you get additional
practice if you choose to do them.
“Data science is asking a question that can be answered with data, collecting and cleaning
the data, studying the data, creating models to help understand and answer the question,
and sharing the answer to the question with other people.”
The reason this field is growing so fast is that nearly every government, company, and organization
is now collecting data. As data have become cheaper and cheaper to collect, the ability to analyze
them and find useful information has become a more and more valuable skill. But most people don’t have
training or experience sifting through big piles of data to make interesting and valuable discoveries.
The people who can do this well are called data scientists. They have a job that is exciting, interesting,
and promises to be in high demand for years to come.
Most of the people who are currently data scientists have degrees in math, statistics, or physics. They
can afford computers that cost thousands of dollars and specialized computing software to help them
do their jobs. They also mostly live in a few major cities like New York, San Francisco, Seattle, and
Washington D.C. Many of these data scientists are former software engineers or other white-collar
workers who moved into data science when they saw the demand for this kind of job.
It is our goal with Chromebook Data Science to help people who would otherwise not have access to
this exciting career get into it. To do that, we need to remove some of the challenges above, so we
designed this program to tackle the challenges that are preventing more people from entering this
career.
• Chromebook Data Science is being released as a set of online courses with a pay-what-you-can
model. That means you can take the whole series of courses for free or for whatever cost you
can afford.
• Chromebook Data Science is designed to be done entirely online using only tools you can
access from a web browser. This means that you can do the entire program on a Chromebook⁴,
which you can get for as little as $150.
• Chromebook Data Science starts at the very basics of how to set up all of your accounts, which
websites and apps to use, and simple little projects that anyone can do. The only prerequisites
are high school math/reading and the ability to use a computer.
⁴https://www.google.com/chromebook/
• Chromebook Data Science includes resources for finding, getting, and working at data science
jobs. It also includes resources for finding and working at remote data science jobs that can be
done from anywhere in the world.
This program is focused on people who want to learn to do data science, but it can be completed
by anyone! We hope that it will be useful for anyone who wants to learn something new about data
science.
In some cases this program may not be the most efficient way to learn about data science.
If you already have a background in statistics, math, or computer science, and want to jump directly
to more advanced topics, we have already created a Data Science Specialization⁵ on Coursera just
for you. There are many jobs that require people to understand or manage a data science project. If
you are a leader or executive who just wants a high level overview of what data science is all about,
we have also created an Executive Data Science Specialization⁶.
Our goal here is also to create a supportive and inclusive learning experience. Data science is
frustrating and slow to learn. Often the best way is to learn from other people who have discovered
similar solutions or made similar mistakes. Fortunately, there are communities in data science that
are cheerful, friendly and willing to help new people get involved. Throughout the program we will
introduce you to these communities and hope that you will also make an effort to help your fellow
students as they discover this exciting field.
• Courses: Courses are designed to be completed in about a month working in your spare
time or in a day or two working full time. You can receive a certificate for each course and all
courses are based on a pay-what-you-can model. Each course consists of:
⁵https://www.coursera.org/specializations/jhu-data-science
⁶https://www.coursera.org/specializations/executive-data-science
To keep up on the latest information about the program, courses and more go to http://jhudatascience.org/chromeboo
• Slides⁸
• Take this quiz online⁹
⁸https://docs.google.com/presentation/d/18q2gRHXGZxBL7pSWcQg_HThmgoo5qDeO9O372QkAnYU/edit?usp=sharing
⁹http://leanpub.com/courses/jhu/cbds-intro/quizzes/quiz_00_welcome
Program Philosophy
Our philosophy with building this course and this program is to try to make data science accessible
to the widest audience possible.
This course is part of the “Chromebook Data Science”¹⁰ series of courses.
These courses are designed to tackle some of the challenges that prevent people from getting into
data science in the first place. Some of those challenges are geographic - we’ll talk more about that
later. Some are due to the price of education - that is why we are offering these courses as MOOCs.
But one of the key barriers is that the type of computer you usually need to do data science is
expensive.
Chromebooks¹¹, on the other hand, are a very cheap type of computer. Chromebooks aren’t exactly
like normal computers and they have a few unique characteristics:
A simple way to think about it is that a Chromebook is a computer that only lets you use an internet
browser like Chrome¹³. You can’t really do much on the computer itself. Some people call this way
of working - working only through the internet - “cloud computing”¹⁴.
It’s called cloud computing because the computer you are using most of the time is not the one
sitting in front of you. You are using the internet to access tools and computers to do your work.
But the physical computers doing the work are stored somewhere else - it could be nearby or on the
other side of the globe. That is why people call the computers “in the cloud”.
The goal of Chromebook Data Science is not that you have to use a Chromebook to finish the
program, it is just that you could use a Chromebook to finish the whole program. You can finish the
entire sequence of courses using any computer with an internet connection and a web browser.
We took this approach because we want data science to be accessible to everyone. We have found
that in earlier classes we taught online, the cost of computers, difficulties installing software, and
lack of computing resources prevented students from completing our courses. We wanted to
strip all those barriers away so that more students would have access to our program.
¹⁰http://jhudatascience.org/chromebookdatascience/
¹¹https://www.google.com/chromebook/
¹²https://www.google.com/chromebook/find-yours/
¹³https://www.google.com/chrome/
¹⁴https://en.wikipedia.org/wiki/Cloud_computing
We also believe that the future of data science is increasingly cloud based. So this educational choice
matches a trend we see in the field that we can help you take advantage of. It is less and less likely
that you will work only on your laptop as a data scientist. Through the internet you will access data
and computing power so that you can magnify the impact of what you are working on. We hope to
show you how to use those resources to maximize the value you can bring as a new data scientist.
We do recognize that internet access is also a limiting factor for many people. We have tried to
make it so that you don’t have to download data so hopefully the broadband requirements will be
minimal. If internet access is a challenge for you, we hope that you can leverage the resources
you have, whether they are local libraries, coffee shops, or internet cafes, to complete this program.
If that isn’t an option for you we’d love to hear from you and see if we can find ways to make data
science accessible to everyone, everywhere.
• Slides¹⁶
¹⁶https://docs.google.com/presentation/d/1s7sLqa0GAUqVaD9TE63ntBKJfXm2Ac2dG90G_41egLI/edit?usp=sharing
Why Automated Videos
What’s the deal with these videos? And why do the videos say almost exactly the same thing as
the printed lecture material? You have probably already noticed that the lectures and videos for this
class are structured a little differently than in many MOOCs you have taken. We created this video
to explain to you why we made this change and why we think it highlights the awesome power of
R and data science.
We create a lot of massive open online courses at the Johns Hopkins Data Science Lab. We have
created more than 30 courses on multiple platforms over the last 5 years. Our goal with these classes
is to provide the best and most up to date information to the broadest audience possible.
But there are significant challenges to maintaining this much material online. R packages go out of
date, new workflows are invented, and typos - oh the typos!
We used to make these courses like many other universities. We’d create course material in the form
of lecture slides, then we’d record videos of ourselves delivering those lectures. In some ways this was
great - you actually got to hear our voices delivering your lectures, including all the “yawns”, “ums”,
“buts”, and “so’s”. But the downside is that it is difficult and time consuming to update the content
when we have to book a recording studio, set up special equipment, record ourselves delivering a
lecture, edit those lectures, and then upload them to a system.
The result is that a lot of our lectures have been out of date, include errors, or don’t include the latest,
best versions of workflows and pipelines. This has been a problem for a while, but as the number of
courses we offer grows, it has become more and more of a challenge for us to keep them up to date.
So we started to think about how to solve this challenging problem. We realized that while recording
and editing videos is extremely time consuming, there is another type of content we can edit, update,
and maintain much more frequently - regular old plain text documents¹⁷. We aren’t the only ones
who have thought this - massive open online course innovators like Lorena Barba have been saying
that videos aren’t even necessary for these types of courses¹⁸.
So when we sat down to develop our new process for creating and maintaining our courses we
wanted to see if we could figure out how to make a class made entirely out of plain text documents.
We broke down a massive open online course into its basic elements:
• Tutorials - these we can easily write in plain text formats like markdown or R markdown.
• Slides - these are easy enough to maintain and share if we make them with something like
Google Slides¹⁹.
¹⁷https://simplystatistics.org/2017/06/13/the-future-of-education-is-plain-text/
¹⁸https://www.class-central.com/report/why-my-mooc-is-not-built-on-video/
¹⁹https://www.google.com/slides/about/
• Assessments - here we can use a markup language²⁰ to create quizzes and other assessments.
• Videos - this was the sticking point, how were we going to make videos from plain text
documents?
By a happy coincidence, the data science and artificial intelligence communities were solving a huge
part of this problem for us by improving text-to-speech synthesis! So we could now write a script for a
video and use Amazon Polly²¹ to synthesize our voices!
To take advantage of this new technology we created two new R packages: ari²² and didactr²³.
Ari will take a script and a set of Google Slides and narrate the script over the slides using Amazon
Polly. It will also generate the closed caption file needed to include captions and ensure that the
videos are accessible to those with hearing impairment. didactr automates several of the steps from
creating the videos with ari, to uploading them to YouTube, so that we can quickly make edits to
the scripts or slides, remake the videos, re-upload them and reduce our maintenance overhead for
keeping our content fresh.
Whenever we change the text file or edit the slides we can recreate the video in a couple of minutes.
Everything is done in R. One of the coolest features of going to this new process is showing you how
powerful the R programming language is. This is the main language you will learn in this program
and we hope you will be able to build cool things like this system by the time you are done with our
courses.
Why did we choose this approach instead of creating each piece of a lesson separately? Well, first,
this process makes it a lot easier for us to maintain and update the courses. If you report an issue
or find a mistake with a lesson, all we need to do is change the script or the slides and recreate
the video. This gives us a much more efficient way of maintaining and updating the course content.
Second, by using this process we have made our instruction more accessible. Since the videos have
transcripts and the transcripts have voice-over, the content is accessible to those of us who have
disabilities. Everyone else can choose between reading, listening to, or watching the content as
they wish.
Finally, a cool feature of using text to speech synthesis is that our videos will keep getting better
as the voice synthesis software improves. It also means that we can switch to different voices.
Ultimately, it will allow us to translate our courses into different languages quickly and
automatically using machine learning. We think this highlights the incredible power of data science
and artificial intelligence to improve the world.
If you find the robot voice annoying, we get it. We know that the technology isn’t perfect yet. That’s
why we’ve made the written lecture material reflect as closely as possible the video lectures. This
means that you don’t have to watch these videos. Using this setup, you can pick how you want
to consume our classes. We hope that this change will allow us to better serve you with the best
content at the fastest speed. Thanks for participating in this new phase of course development with
us!
²⁰https://leanpub.com/markua/read#leanpub-auto-quizzes-and-exercises
²¹https://aws.amazon.com/polly/
²²https://cran.r-project.org/web/packages/ari/index.html
²³https://github.com/muschellij2/didactr
• Slides²⁵
²⁵https://docs.google.com/presentation/d/1FtdynwBR8IAE8x9cMTZrnWKfTPXEhH-KwkAZKQ0zwk8/edit?usp=sharing
The Data Science Process
In the first few lessons of this course we discussed what data is and talked about the fact that data
are everywhere. We also introduced you to the philosophy of this program: that everyone should
have access to the knowledge needed to become a data scientist and that these materials should be
able to be updated with ease as technologies and methodologies change over time. What we haven’t
yet covered is what an actual data science project looks like. To do so, we’ll first step through an
actual data science project, breaking down the parts of a typical project, and then provide a number
of links to other interesting data science projects. Our goal in this lesson is to expose you to the
process one goes through as they carry out data science projects.
The Question
When setting out on a data science project, it’s always great to have your question well-defined.
Additional questions may pop up as you do the analysis, but knowing what you want to answer
with your analysis is a really important first step. Hilary Parker’s question is included in bold in her
post. Highlighting this makes it clear that she’s interested in answering the following question:
Is Hilary/Hillary really the most rapidly poisoned name in recorded American history?
The Data
To answer this question, Hilary collected data from the Social Security website²⁹. This dataset
included the 1000 most popular baby names from 1880 until 2011.
²⁹https://www.ssa.gov/OACT/babynames/
Data Analysis
As explained in the blog post, Hilary was interested in calculating the relative risk for each of the
4,110 different names in her dataset from one year to the next from 1880 to 2011. By hand, this would
be a nightmare. Thankfully, by writing code in R, all of which is available on GitHub³⁰, Hilary was
able to generate these values for all these names across all these years. It’s not important at this
point in time to fully understand what a relative risk calculation is (although Hilary does a great job
breaking it down in her post!), but it is important to know that after getting the data together, the
next step is figuring out what you need to do with that data in order to answer your question. For
Hilary’s question, calculating the relative risk for each name from one year to the next from 1880 to
2011 and looking at the percentage of babies named each name in a particular year would be what
she needed to do to answer her question.
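To make this step a little more concrete, here is a minimal R sketch that computes each name's percentage of babies per year and a year-over-year ratio. The names and counts are made up for illustration, and the definition of relative risk used here (a name's share in one year divided by its share in the next year, so values above 1 mean the name became less popular) is our assumption; Hilary's own code on GitHub may define it differently.

```r
# Made-up counts of babies given each name in two consecutive years
# (illustrative numbers only, not the real Social Security data)
names_df <- data.frame(
  name  = c("Ann", "Bea", "Cal", "Ann", "Bea", "Cal"),
  year  = c(1992, 1992, 1992, 1993, 1993, 1993),
  count = c(500, 300, 200, 200, 300, 500)
)

# Percentage of all babies born that year who received each name
# (ave() repeats each year's total on every row of that year)
names_df$percent <- 100 * names_df$count /
  ave(names_df$count, names_df$year, FUN = sum)

# Put the two years side by side so each row is one name
wide <- merge(
  names_df[names_df$year == 1992, c("name", "percent")],
  names_df[names_df$year == 1993, c("name", "percent")],
  by = "name", suffixes = c("_1992", "_1993")
)

# Assumed definition: share in one year divided by share in the next year,
# so values above 1 mean the name became less popular
wide$relative_risk <- wide$percent_1992 / wide$percent_1993
print(wide)
```

With the full dataset, the same idea would be repeated for every pair of consecutive years from 1880 to 2011.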
What you don’t see in the blog post is all of the code Hilary wrote to get the data from the Social
Security website³¹, to get it in the format she needed to do the analysis, and to generate the figures.
As mentioned above, she made all this code available on GitHub³² so that others could see what she
did and repeat her steps if they wanted. In addition, data science projects often involve writing a lot
of code and generating a lot of figures that aren’t included in your final results. This is part of the
data science process too. Figuring out how to do what you want to do to answer your question of
interest is part of the process, doesn’t always show up in your final project, and can be very
time-consuming.
³⁰https://github.com/hilaryparker/names
³¹https://www.ssa.gov/OACT/babynames/
³²https://github.com/hilaryparker/names
Now that Hilary had the necessary values calculated, she began to analyze the data. The first thing
she did was look at the names with the biggest drop in percentage from one
year to the next. By this preliminary analysis, Hilary was sixth on the list, meaning there were five
other names that had had a single year drop in popularity larger than the one the name “Hilary”
experienced from 1992 to 1993.
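A rough sketch of this kind of ranking in base R might look like the following. The percentages are hypothetical, and the drop is measured here as the difference in percentage points between consecutive years; that choice is an assumption for illustration, not necessarily the exact measure Hilary used.

```r
# Hypothetical yearly percentages of babies given each name
# (made-up values for illustration only)
pct <- data.frame(
  name    = rep(c("Ann", "Bea", "Cal"), each = 3),
  year    = rep(1991:1993, times = 3),
  percent = c(0.50, 0.48, 0.20,   # Ann: sharp fall in 1993
              0.30, 0.31, 0.29,   # Bea: fairly steady
              0.10, 0.40, 0.05)   # Cal: sudden rise, then a fall
)

# Sort by name and year so consecutive rows are consecutive years
pct <- pct[order(pct$name, pct$year), ]

# Largest single-year drop for each name (previous year minus current year)
biggest_drop <- sapply(split(pct$percent, pct$name), function(p) max(-diff(p)))

# Names ordered from the biggest drop to the smallest
print(sort(biggest_drop, decreasing = TRUE))
```

In this toy data, "Cal", the name that shot up and then fell, tops the list. A ranking like this can therefore be crowded by names that were only briefly popular.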
In looking at the results of this analysis, the first five names appeared peculiar to Hilary Parker.
(It’s always good to consider whether or not the results were what you were expecting, from any
analysis!) None of them seemed to be names that were popular for long periods of time. To see if this
hunch was true, Hilary plotted the percent of babies born each year with each of the names from
this table. What she found was that, among these “poisoned” names (names that experienced a big
drop from one year to the next in popularity), all of the names other than Hilary became popular
all of a sudden and then dropped off in popularity. Hilary Parker was able to figure out why most
of these other names became popular, so definitely read that section of her post! The name Hilary,
however, was different. It was popular for a while and then completely dropped off in popularity.
To figure out what was specifically going on with the name Hilary, she removed names that became
popular for short periods of time before dropping off, and only looked at names that were in the
top 1000 for more than 20 years. The results from this analysis definitively show that Hilary had the
quickest fall from popularity in 1992 of any female baby name between 1880 and 2011. (“Marian”’s
decline was gradual over many years.)
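The filtering step can be sketched in base R as shown here, again with made-up data. The sketch assumes you have one row for every year a name appeared in the top 1000; the 20-year cutoff comes from the text above.

```r
# Hypothetical record of which years each name appeared in the top 1000
# (made-up data for illustration only)
top1000 <- data.frame(
  name = c(rep("Ann", 25), rep("Cal", 4)),
  year = c(1968:1992, 1989:1992)
)

# Count how many distinct years each name spent in the top 1000
years_in_top <- tapply(top1000$year, top1000$name, function(y) length(unique(y)))

# Keep only the names that stayed in the top 1000 for more than 20 years
long_lived <- names(years_in_top)[years_in_top > 20]
print(long_lived)
```

You could then rerun the drop ranking on just the names in `long_lived`, so short-lived spikes no longer dominate the results.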
Communication
The final step in this data analysis process was communication: once Hilary Parker had answered her
question on her computer, it was time to share the answer with the world. An important part of any data science project is
effectively communicating the results of the project. Hilary did so by writing a wonderful blog post
that communicated the results of her analysis, answered the question she set out to answer, and did
so in an entertaining way.
Additionally, it’s important to note that most projects build off someone else’s work. It’s really
important to give those people credit. Hilary accomplishes this by:
• linking to a blog post³³ where someone had asked a similar question previously
• linking to the Social Security website³⁴ where she got the data
• linking to a resource where she learned about web scraping³⁵
reports and web applications that allow you to effectively communicate your results. To show you
what can be built with the R programming language and the suite of tools that use R, below are a few
examples of projects built using the data science process and R - the types of things that you’ll be
able to generate by the end of this series of courses.
Master’s students at the University of Pennsylvania set out to predict the risk of opioid overdoses in
Providence, Rhode Island. They include details on the data they used, the steps they took to clean
their data, their visualization process, and their final results³⁶. While the details aren’t important
now, seeing the process and what types of reports can be generated is important. Additionally,
they’ve created a Shiny App³⁷, which is an interactive web application. This means that you can
choose what neighborhood in Providence you want to focus on. All of this was built using R
programming.
³⁶https://pennmusa.github.io/MUSA_801.io/project_5/index.html
³⁷https://jordanbutz.shinyapps.io/directory/
• Text analysis of Trump’s tweets confirms he writes only the (angrier) Android half³⁸, by David
Robinson³⁹
• Where to Live in the US⁴⁰, by Maelle Salmon⁴¹
• Sexual Health Clinics in Toronto⁴², by Sharla Gelfand⁴³
Conclusions
In this lesson, we hope we’ve conveyed that sometimes data science projects are tackling difficult
questions (‘Can we predict the risk of opioid overdose?’) while other times the goal of the project
is to answer a question you’re interested in personally (‘Is Hilary the most rapidly poisoned baby
name in recorded American history?’). In either case, the process is similar. You have to form your
question, get data, explore and analyze your data, and communicate your results. With the tools
you’ll learn in this series of courses, you will be able to set out and carry out your own data science
projects, like the examples included in this lesson!
³⁸http://varianceexplained.org/r/trump-tweets/
³⁹http://varianceexplained.org/about/
⁴⁰http://www.masalmon.eu/2017/11/16/wheretoliveus/
⁴¹http://www.masalmon.eu/about/
⁴²https://sharlagelfand.netlify.com/posts/tidying-toronto-open-data/
⁴³https://sharlagelfand.netlify.com/about/
• Slides⁴⁵
Failure
Even when a project is successful, know that there was failure on the way to success! The problem
is that what you see in a final blog post or a product put out by data scientists at a company is the
final product. This product may be something that is functional, really important, or even beautiful.
What you don’t see is all the failure that happened on the way to getting the end product. Data
science projects can be a lot like social media accounts. On social media, it’s easy to only show the
good stuff about one’s life. Similarly, the end product of a data science project may be awesome, so
the user will only see the good stuff. But there’s a lot of struggle and failure that went into
creating the awesome end product!
In fact, the pathway to success in data science is always full of failure. And, often, failure followed
by figuring out why you just failed is a great way to learn.
That doesn’t make failure easier. It will be frustrating from time to time, and figuring out why
something isn’t working can be hard. That’s ok! Know that you’re not alone. Even experienced data
scientists who have built really cool stuff experience lots of failure along the way.
The Mindset
To learn how to learn, it’s important to recognize just how much your mindset matters. Your goal should be
to answer an interesting question. Your objective is not to memorize a bunch of functions. It’s to use
those functions to do something interesting. The path to accomplishing that goal may be circuitous.
You may take a few steps backward and experience a setback or two before moving forward. That’s
ok!
The Path
When carrying out a data science project, there is always more than one way to solve a problem.
Your path may be different than someone else’s path.
In fact, while you may not know R code yet, it’s worth knowing that several different lines of code
can produce the exact same output, and any one of them would be a reasonable approach. We use this
to explain that there is more than one way to approach and to answer a question! Your path may be
different than someone else’s. Your approaches may not be identical. And, that is more than ok!
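As a small illustration of our own, here are four different lines of base R that all compute the average of the same four numbers. Don't worry about the details yet; the point is that each line takes a different path to the same answer.

```r
# Four different ways to find the average of the same numbers
x <- c(2, 4, 6, 8)

avg1 <- mean(x)                                   # the built-in mean() function
avg2 <- sum(x) / length(x)                        # total divided by how many values
avg3 <- Reduce(`+`, x) / length(x)                # add the values up one at a time
avg4 <- weighted.mean(x, w = rep(1, length(x)))   # every value gets equal weight

# All four approaches give the same answer
print(c(avg1, avg2, avg3, avg4))
# [1] 5 5 5 5
```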
We’ll point out where to find help when you’re stuck throughout this course set; however,
it may not be obvious when to ask for help. While this is not a hard and fast rule, if you’ve been
trying to find the answer to something you’re stuck on for half an hour and cannot figure it out,
it may be time to post your question online for someone else to answer or to reach out directly to
someone to get your question answered. During the half hour when you’re trying on your own, you
should Google for the answer. If it’s a coding question, you should try running code to test to see
if the fixes from Google fix your problem. If you’re getting error messages, paste those messages
into Google. If after trying all of these things you’re still stuck, then you should ask for help every
time. Rather than give up because you’re stuck, ask questions!
Ask Questions
Summary
Learning how to learn and asking questions may seem simple when you read this lesson, but in
practice it can be tough. It’s hard to admit you don’t know something and it can be difficult
sometimes to explain what it is you don’t know. Try anyway! Everyone was a beginner at some point.
Those who moved from beginner to advanced did so because they learned the material, practiced
and because they asked questions along the way. We’ll remind you of the information included in
this lesson throughout the course set because while it’s easy to read the information here, it’s not
always easy to remember it when you’re struggling!
• Slides⁴⁸
⁴⁸https://docs.google.com/presentation/d/1sgE2Um0t2AhkUlPHLJDSVLTJlyTabg1gtz1ybOgO-kY/edit?usp=sharing
Finding Help
In data science and computer-related work in general, it is common to ask for help multiple times
per day. While sometimes we ask our colleagues for help in-person, most of the time we search the
web.
Throughout this coursework it may surprise you just how frequently other people have run into the
same problem or had the exact same question you have. Often, there is an answer that was publicly
shared previously on the Internet that can help answer the very question you’re asking. There are
a number of websites and discussion boards where people frequently ask and answer questions. By
knowing how to effectively search the web, you can easily find these answers.
Google search
In the search box, as you start typing your question you will see suggestions based on what you
have written so far. This is called Google auto-complete. Here is an example where Google suggests
a few common searches that start with “how to find help in”.
Google auto-complete
The auto-complete feature can be useful because it helps us refine our search query, which leads to
more relevant results and answers. Throughout this coursework, we’ll be using the R programming
language to complete data analyses. Thus, you will often be searching for help related to the R
programming language. So, in this example, let’s select “how to find help in r”. When we click on the
Google search button, we get a list of websites that are most related to our question as shown
below.
Google highlights some of our key terms from our search in the search results list. For example, the
word help is bolded twice on the first link title “R: Getting Help with R”.
Each search result includes a short title, the web link, a short extract from the website, and some of
our search terms (words) highlighted. Using this information we have to decide if our search was
specific enough. For example, we could have searched “how to get help”. Google search would have
had no way of knowing that we had an R question specifically. Alternatively, searching “how to
get help for all the questions I’ve ever had or may have in the R programming language today or
tomorrow” is also not ideal. Devising a search with the fewest words that help accurately answer
your questions is the goal!
We will cover different ways of finding help. Throughout this coursework, you’ll likely learn that
part of being a data scientist means being good at Googling. Effectively searching the web is an
important skill to have.
Search Guidelines
The best way to get a response to your question is to be able to boil it down to relatively few words.
Less is usually better, and shorter searches are faster to type! So, when you’re Googling things, keep a few things
in mind:
• Use the fewest words possible - full sentences and correct grammar are not necessary when
searching Google
• Be specific - include words that are important to your specific search
• Know specific websites where you can get help - while Google is generally a great place to start,
it can be helpful to go straight to websites built for questions like yours. StackOverflow⁵¹
and the RStudio Community⁵² will likely be helpful places as you learn to program in R. These
resources will be covered in detail in a future course; for now, it’s enough to know that
they exist
• Slides⁵⁴
Account Setup
Choosing a Username
Choosing an appropriate username is important. Some combination of your first and last name is a
good idea. For example, if your name were Jane Doe, a username such as “JaneDoe” or “Jane.Doe”
would work. If the first username you attempt is taken, you can try another, similar username. In
this case, maybe try “JDoe”.
Appropriate Usernames
But be sure that whatever name you choose, you would be comfortable sharing it with your boss
or a family member. Usernames with nicknames or profanity are not a good idea.
Accounts
To give you an idea of where we’re going, the first account (and arguably the most important
account) you set up in the next lesson will be a Google account⁵⁶. After that we will walk you
through the steps to get you set up with accounts on:
• Slack⁵⁹ - this is a website where you will be able to chat online with your fellow students and
instructors.
• RStudio Cloud⁶⁰ - this is a website where you can use RStudio, the main tool you will use to learn
data science.
• DataCamp⁶¹ - this is a website where you can practice using R and RStudio.
• GitHub⁶² - this is a website where we will share the results of our data science projects with
each other and the world.
Accounts
⁵⁹https://slack.com/
⁶⁰https://rstudio.cloud
⁶¹https://www.datacamp.com/
⁶²https://github.com
• Slides⁶⁴
Google Account Setup
Google Products
If you already have a Google account with an appropriate username that you would like to use
throughout this course, you can skip the next section and move to the “Log off Guest Chromebook”
section. However, it’s probably best to create a new account dedicated to your data science work,
since you will use it to set up many other accounts in the next lesson.
Google website
Once this new tab is open, type ‘www.gmail.com’ in the web address bar at the top of your
Chrome window. After pressing Enter, you will be brought to a login screen. Here, click on
“More options”.
Google Sign in
You will then click on “Create account” to start the process of getting a Google Login.
Begin filling in the blank spaces in the box to the right with your information. Google will alert you
if the username you’ve chosen has already been taken. Once you’ve filled out all the blanks, click
on “Next Step” at the bottom right.
You’ll be asked to read the “Privacy and Terms.” To scroll through the entire document, click on the
blue arrow in the middle at the bottom of the document. After reading over the Privacy and Terms,
click “I AGREE” to continue.
You will then be asked to verify your account. To do so, ensure that a valid phone number for you
is in the ‘Phone number’ box. Select whether you prefer to be contacted by “Text message (SMS)”
or ‘Voice Call’. Click ‘Continue’ once the information has been entered. You will then be sent a
verification code by text message or by phone call, depending on your choice. Enter
the verification code into the box on the screen and click “Continue”.
Congratulations! You now have a Google username and account! Be sure to remember your
username and password! This will be used for your email address (Gmail) and all other Google
products.
Google Welcome!
Chromebook Sign in
Enter your new Google account name here. Click ‘Next’. Enter your password. Click ‘Next.’ You will
now be logged on. Anytime you work on this Chromebook now, you will simply log in using your
new Google account.
• Slides⁶⁷
Other Accounts Setup
LinkedIn Account
LinkedIn⁶⁹ is a social networking site for employment. Think of it as Facebook for getting a job. It
allows you to put your qualifications online (like an online resume), has a space where you can look
for jobs, and can put you in contact with employers. Don’t worry about the details now. Through
this program, you will have the chance to set up your LinkedIn gradually. For right now, we’re just
worried about getting this account set up.
To begin set up, you’ll go to the web address bar in your Chrome browser. You will type
www.linkedin.com and hit ‘Enter.’
⁶⁹http://www.linkedin.com
LinkedIn website
This will bring you to LinkedIn’s login screen. On this screen you’ll begin filling out the boxes in the
middle of the screen. When asked for your email address, be sure to use the Gmail address you
created in the last chapter. Choose a password that cannot be easily guessed by someone else.
Once the four boxes are filled in, click “Join now”.
A screen may pop up asking you to verify that you’re not a robot. Whenever this happens, just click
on the empty square box to let the computer know you’re a real person.
Not a robot
After clicking “Join now” (and maybe after you verify that you are a human) you will be brought to
a new screen. Here, at the top, it will show you that you almost have a LinkedIn login but that first
you have to confirm your email address.
To do so, open a new tab at the top of the Chrome browser window. Then type in
‘www.gmail.com’ and press ‘Enter.’
gmail website
This will bring you to your email account. An email from ‘LinkedIn Messages’ should be there. Click
on that email to open it.
In that email there will be a button you can click to ‘Confirm your email’.
This will open a new screen where your Google username will be in the box already. Click ‘Continue’
on this screen.
LinkedIn Continue
While other boxes may pop up for you to go further on LinkedIn and get your profile set up, this
is all you have to do for now. You now have a LinkedIn username and account! We will now go
through similar processes for the other accounts needed in this program.
Twitter Account
The second account you will need will be a Twitter⁷⁰ account. You may already be familiar with
Twitter; however, data scientists tend to use Twitter for work rather than socializing. Twitter is a
social media platform where users can “post and interact with messages.” These messages are known
as ‘tweets.’ Twitter is a great place to learn new things, connect with other data scientists, and to
ask/answer questions quickly.
You will need a professional Twitter account for our program. If you already have a Twitter
account you use for personal tweets and communicating with friends, you should still create a
new, professional account where you will only post professional links and interact with other data
scientists.
To get a Twitter account, first type ‘www.twitter.com’ in the web address bar at the top of your
Chrome window.
⁷⁰http://www.twitter.com
Twitter website
This will bring you to a screen where at the top right you’ll want to click ‘Sign up’.
Twitter sign in
This will bring you to a screen that prompts you for some information and asks you to create a new
password. After filling out the information, you will click ‘Sign up.’
Twitter sign in
You will be brought to a screen asking for your phone number. After entering your phone number,
you will click ‘Next.’
This will bring you to a screen where you will choose a username. This will be your Twitter
‘handle’. For simplicity, it is best for your Gmail username and Twitter handle to match as closely
as possible. Twitter handles can only contain letters, numbers, and underscores, so if your Gmail
address is Jane.Doe@gmail.com, ‘JaneDoe’ or ‘Jane_Doe’ would be a great Twitter username. If that
name is unavailable, choose a different, but appropriate and simple, Twitter username. Once you have
chosen a Twitter username, click ‘Next’.
Twitter username
At this point you will be brought to a screen that will have a button saying ‘Let’s go!’ This will lead
you to set up your profile further, which is not needed at this time. Instead, go to your Gmail, look
for an email from ‘Twitter’ and click on the email.
In this email, there will be a ‘Confirm now’ button to click. To verify your Twitter account, click on
this button.
Slack Account
Slack⁷¹ is a place where teams of people can easily communicate and work together on a project.
As a data scientist, you are often working in a group on a project. Slack is a place where everyone
working on that project can communicate. Slack is where communication throughout this course
will happen. You will be able to ask questions, answer questions, and communicate with others on
Slack about the things you are learning and the projects you are working on.
To get a Slack account, first open a new tab in your browser by pressing Ctrl and T. Once you
have a new tab open, type ‘www.slack.com’ in the web address bar at the top of your
browser.
⁷¹http://www.slack.com
On this webpage, type your Gmail address in the ‘email address’ box, and click ‘GET STARTED.’
We won’t be signing into any workspaces yet; however, later in the program, when we do, you will
have an account! That’s all you need to do with Slack for now!
RStudio Cloud Account
To get an RStudio Cloud account, go to rstudio.cloud in your browser. This will bring you to a
screen where you can click on ‘Get Started.’ This will bring you to a login screen where, instead
of typing in your information, you can just click on ‘Sign up with Google,’ since you already have
a Google account.
You will be prompted to choose which Google account you want to use. Choose your professional
Google account. Then, you will be brought to a screen where you will have to enter a username.
Again, for simplicity, try to use the same username across all accounts.
Then, click ‘Create Account’ and you’re all set! You now have an RStudio Cloud account!
DataCamp Account
DataCamp⁷² is an online platform where people learn Data Science. Throughout this program, you
will take courses and do exercises on DataCamp to ensure that you are acquiring the skills necessary
to be a successful data scientist.
Getting a DataCamp account will be very similar to getting an RStudio Cloud account because you
can again sign in using your Google account. To do so, go to www.datacamp.com.
⁷²http://www.datacamp.com
You will be brought to a screen where you can ‘Create Your Free Account.’ Click on the ‘Google+’
button. You will again be brought to a screen where you’re asked to choose your Google account.
DataCamp Google+
GitHub Account
GitHub⁷³ is a website that hosts computer code and allows for version control. We’ll get back to
what version control is later, but for now, know that GitHub is where you’ll be ‘saving’ all of the
code you write. It’s also a place where you can look at other people’s code. And, throughout this
program, you’ll realize that you can learn a lot from other people’s code!
To get a GitHub account, first type www.github.com into the web address bar at the top of your
Chrome window and hit ‘Enter’.
⁷³http://www.github.com
You will be brought to a page where you should fill in your information. As with the other accounts,
try to use the same username if possible. Enter your Gmail address. And, create a password
that cannot be easily guessed by others. Then, click ‘Sign up for GitHub.’
GitHub sign up
One final note about GitHub usernames in particular: your GitHub username will be used for your
website (which you’ll build later) and all the code you write. You’ll use GitHub a lot, so this is a
case where it is especially helpful to choose a good username, ideally one that has something to do
with your name and not much else. For example, the person writing this lesson is named Shannon
Ellis. Her GitHub username is “ShanEllis.” While it is possible to change your GitHub username down
the line, it’s a bit of a pain, so choose wisely now!
You now have a GitHub account and all of the online accounts needed for this program!
• Slides⁷⁵
Your first data science project
“Data science is asking a question that can be answered with data, collecting and cleaning
the data, studying the data, creating models to help understand and answer the question,
and sharing the answer to the question with other people.”
Rather than try to explain data science with examples made by other people, we are going to show
you the process of data science through a project that you will complete.
The first step in any data science project is to come up with a question. You are taking this course
on Leanpub⁷⁷. Leanpub is a website where you can sell books and courses. For this first project the
question we are trying to answer is:
“How does the readership of a bestselling book relate to how much the author is charging
for that book?”
This question isn’t about data. It is just something we might be curious about. In this case, if you
were going to write and sell a book on Leanpub you might want to know what price to pick in order
to try to sell the most books. Many good data science questions don’t start out with data. They are
just questions you wish that you knew the answer to. Later, you try to find out if there is data to
answer your question.
In this case, to answer our question, we need some information on books on the Leanpub website.
If you go to https://leanpub.com/bookstore you will see a website that looks like this.
⁷⁷https://leanpub.com/
This shows the bestseller books for the last week. If you click on one of the pictures of a book you
can get some information on that book. If I click on the page for the first book “PowerShell 101” I
see something like this.
It will probably be a different book for you since it will be a different weekly bestseller. But you can
look in the top left corner and see how many people read the book. This information is there for
most books, but is sometimes missing if the author decides not to publish that number. In this case
there are 1,036 total readers of this book.
Next we can find out the suggested price. This is on the right hand side and is the price the author
thinks is the appropriate price for their book. In this case the suggested price is $15.99.
But one nice thing about Leanpub is that you can set up a “pay what you want” model where people
can choose how much they pay for a book. When authors do this, there is also a minimum price
they set for the book. If there is a minimum price it is also on the right hand side. In this case the
minimum price is $7.99.
We could do this for each book and then we’d have a nice data set that would tell us something
about the number of readers for a book and the price of that book. Then we could start to look at
the numbers we collected and see whether there are any patterns in the data that help answer our
question.
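One common way to put a number on a pattern between two columns is the Pearson correlation coefficient, which measures how closely two columns follow a straight-line relationship. The course itself looks for patterns in R with a plot, so the Python sketch below only illustrates the idea and is not the course's method. The function name `pearson_correlation` is hypothetical, and the sketch assumes readers and prices have already been collected as two lists of numbers with any NA values removed.

```python
import math


def pearson_correlation(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length lists of numbers.

    The result lies between -1 and 1. Values near 1 mean the two columns tend to
    rise together, values near -1 mean one tends to fall as the other rises, and
    values near 0 mean there is little straight-line relationship.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length with at least 2 values each")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # How much the two columns vary together around their means.
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # How much each column varies on its own.
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("a column where every value is the same has no defined correlation")
    return covariation / (spread_x * spread_y)
```

For example, `pearson_correlation([1, 2, 3], [2, 4, 6])` returns a value very close to 1, because the second list rises in step with the first. A correlation near 0 does not rule out every kind of pattern, which is one reason a plot is still worth making.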
We’ll go through the steps necessary to do all of this and answer the project question “How does
readership of a bestselling book relate to how much the author is charging for that book?” in the
following lessons.
Slides⁷⁹
⁷⁹https://docs.google.com/presentation/d/1auByZV5pghzELH-SMKLwxrZtigtXd-PC4Q5SrcT4qlE/edit?usp=sharing
Google Sheets
Google Sheets is a free, online spreadsheet program. If you’re familiar with Excel, Google Sheets is
similar. If you are unfamiliar with Excel, that’s ok! We’ll go through everything you need to know to
get started on the project here. And, later in the program, we will go into more details to get you
fully comfortable working with Google Sheets. For now, just know that when you have data
that you want to input into a spreadsheet, Google Sheets is a good place to start. Google Sheets is also
great because you never have to worry about saving your work. If you are online, Google Sheets
automatically saves your work.
What is a spreadsheet?
A spreadsheet is a type of document where data are stored in rows and columns of a grid. Each
square is referred to as a ‘cell’ in the spreadsheet. In Google Sheets (and many other spreadsheet
programs like Excel), the rows are numbered (like 1,2,3,…) and the columns are labeled with capital
letters (like A, B, C,…).
spreadsheet
If you want to talk about a specific spot on the grid, you can use the letter and number corresponding
to that point. For example, A2 refers to the cell in the first column (A) and second row (2) of
the spreadsheet.
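The letter-then-number naming scheme can be turned into plain column and row numbers, which is how code usually refers to a cell. Here is a minimal Python sketch; the function name `parse_cell_reference` is hypothetical, and it assumes the usual Google Sheets and Excel convention that the columns after Z continue as AA, AB, and so on.

```python
def parse_cell_reference(ref):
    """Split a cell reference such as "A2" into a (column number, row number) pair.

    Column letters count like A=1, B=2, ..., Z=26, AA=27, AB=28, ...
    and the digits after the letters are the row number.
    """
    ref = ref.strip().upper()
    letters = ""
    i = 0
    # Collect the leading column letters.
    while i < len(ref) and "A" <= ref[i] <= "Z":
        letters += ref[i]
        i += 1
    digits = ref[i:]
    # Whatever follows the letters must be a plain row number of 1 or more.
    if not letters or not digits or not all("0" <= d <= "9" for d in digits) or int(digits) < 1:
        raise ValueError(f"not a cell reference: {ref!r}")
    column = 0
    for letter in letters:
        # Read the letters like a base-26 number in which A stands for 1.
        column = column * 26 + (ord(letter) - ord("A") + 1)
    return column, int(digits)
```

With this sketch, `parse_cell_reference("A2")` returns `(1, 2)` and `parse_cell_reference("AA10")` returns `(27, 10)`.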
spreadsheet position
When you are working with data in a spreadsheet you can type directly into the spreadsheet. It is
important to make sure you double check all the numbers you type since there isn’t a good way to
“spellcheck” your work when you are editing a spreadsheet.
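We will talk a lot more in future courses about how to organize data that you have collected. Mostly
we will want to collect “tidy data”⁸⁰, which is data where each variable is in its own column, each
observation is in its own row, and each “kind” of data is in its own table.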
Here we are only collecting one “kind” of data - just data on books. The columns will be different
types of information about the books. We will collect information on the name of the book, the
number of readers of that book, the minimum price of the book, and the suggested price of the book.
⁸⁰https://en.wikipedia.org/wiki/Tidy_data
Each of those will be in a separate column. Then, for each book, we will make a new row with the
data for that book.
Remember we are collecting information on the bestselling books from the last week on Leanpub.
You can find the list of bestsellers here: https://leanpub.com/bookstore⁸¹. Remember that if you click
on the image of one book you will get something that looks like this.
Now click on the big plus sign and you will get a new spreadsheet that will look like this.
Untitled sheet
If you click on the words “Untitled Spreadsheet” you can rename the spreadsheet. Type in the words
“leanpub_data” to change the name of your spreadsheet. You should now have a spreadsheet that
looks like this.
leanpub_data sheet
We are almost done setting up the spreadsheet; now we just need to label the different kinds of data
we are going to collect. Start by clicking on the upper left-hand cell (A1) and typing “title”. This will
be the column where we store the title of each book.
Then move one cell to the right, click, and type “readers”. This is where we will store how many
readers a book has. Move one more cell to the right and type “suggested”, and then one more cell and
type “minimum”. Make sure your column names are not capitalized.
Collecting data
Now you are all set to start collecting data! To do this open another new tab by holding ctrl and
pressing t, then go to the webpage: https://leanpub.com/bookstore. Click on the book and write the
title, number of readers, suggested, and minimum prices on a row in the spreadsheet tab. When you
are doing this make sure that:
• There are no commas in numbers. Just leave them out. So don’t write “1,036” write “1036”
instead.
• You don’t put dollar signs for the price, just include the number like “7.99.”
• If a book’s minimum price is free, enter “0” in the cell.
• If the book has no readers, put “0” in the cell.
• If the book’s author opted not to include how many readers their book has, put “NA” in the
“readers” column for that book.
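These rules are mechanical enough to write down as code. The Python sketch below turns what you see on a Leanpub book page into what should go in each cell. The function names `clean_entry` and `clean_row` are hypothetical, and the sketch assumes each value has been copied from the book page as text.

```python
import math


def clean_entry(raw):
    """Clean one number copied from a book page and return the text to type into the cell."""
    value = raw.strip()
    if value.upper() == "NA":
        # The author chose not to show a reader count: keep it marked as missing.
        return "NA"
    if value.upper() == "FREE":
        # A free minimum price is recorded as the number 0.
        return "0"
    # Drop dollar signs and thousands separators: "$1,036" becomes "1036".
    value = value.replace("$", "").replace(",", "")
    number = float(value)  # raises ValueError if anything other than a number is left
    if not math.isfinite(number):
        raise ValueError(f"not a usable number: {raw!r}")
    return value


def clean_row(title, readers, suggested, minimum):
    """Return one spreadsheet row in the column order title, readers, suggested, minimum."""
    return [title.strip(), clean_entry(readers), clean_entry(suggested), clean_entry(minimum)]
```

For the first book used in this lesson, `clean_row("PowerShell 101", "1,036", "$15.99", "$7.99")` returns `["PowerShell 101", "1036", "15.99", "7.99"]`.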
For me, the first book is “PowerShell 101,” so after entering the data for that book my
spreadsheet looks like this.
Continue this process, entering each book into a new row. Collect information on ten or twenty
books. One book for every row. At the end you should have a data set that looks something like this.
But yours will have different numbers and names in it.
1. You have at least 11 rows with reader and minimum price information (one header row and at
least 10 books - if you have NAs anywhere, you’ll want more than 10 books)
2. Your dollar amounts do NOT have dollar signs next to them.
3. Your number of readers does not include any commas.
4. If a book’s minimum price is FREE, you have put the number 0 in the cell, rather than “FREE”
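The checklist above can also be written as a small Python function that reports problems before you move on. This is a sketch under stated assumptions, not part of the course materials: `check_sheet` is a hypothetical name, the sheet is assumed to be a list of rows (each a list of cell text), and the header is assumed to be exactly the four lowercase column names set up earlier.

```python
EXPECTED_HEADER = ["title", "readers", "suggested", "minimum"]


def check_sheet(rows, min_books=10):
    """Return a list of problems with the sheet; an empty list means every check passed."""
    if not rows or rows[0] != EXPECTED_HEADER:
        return [f"row 1 should be the header {EXPECTED_HEADER}"]
    problems = []
    complete = 0  # books with both a reader count and a minimum price
    # Spreadsheet rows are numbered from 1 and row 1 is the header, so data starts at row 2.
    for number, row in enumerate(rows[1:], start=2):
        if len(row) != len(EXPECTED_HEADER):
            problems.append(f"row {number}: expected 4 cells, found {len(row)}")
            continue
        _title, readers, suggested, minimum = row
        if "," in readers:
            problems.append(f"row {number}: remove the commas from {readers!r}")
        for price in (suggested, minimum):
            if "$" in price:
                problems.append(f"row {number}: remove the dollar sign from {price!r}")
        if minimum.strip().upper() == "FREE":
            problems.append(f"row {number}: enter 0 instead of {minimum!r}")
        if readers.strip() != "NA" and minimum.strip() != "NA":
            complete += 1
    if complete < min_books:
        problems.append(f"only {complete} books have readers and a minimum price; collect at least {min_books}")
    return problems
```

Reporting every problem at once, rather than stopping at the first one, lets you fix the whole sheet in a single pass.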
This is great! You now have a question you want to answer and you have collected some data to
answer that question. You are on your way to becoming a data scientist!
Publish
A box will appear to confirm that you would like to publish this Google Sheet. Click “OK.”
OK
Share
A “Share with others” box will pop up. Click on “Get shareable link.”
Your screen will update so that this document can now be viewed by anyone, as long as they have
the link to the spreadsheet.
Shareable
Congrats! You have successfully made this spreadsheet shareable and the link has been copied. You’ll
be asked to paste this link in the quiz for this lesson, and we’ll use this spreadsheet link in the next
lesson when you get started using RStudio Cloud, so don’t close your Google Sheets tab quite yet.
• Slides⁸³
RStudio Cloud
First, the top left-hand portion of the window is the scripting area. This is where you will shortly
see the code to run for your first project. In the future, this will be where you type all your code. The
code typed in this space can be saved and re-run later whenever you need it.
In the bottom left-hand portion of the window is the Console. This is where the code you type in
the scripting window above will actually run. You script what you want to happen in the scripting
window. In the Console, what you wanted to happen actually happens.
The coding language R is an object-oriented programming language: everything you create while
coding, such as a data set or a stored value, is an object. We’ll talk in detail about what that means
later. For now, know that any objects you create while coding will be listed here in the Environment
section in the top right-hand portion of the RStudio Cloud window.
The fourth component is at the bottom on the right-hand side of the window. Here, any files or
folders you create, such as the scripts you save, will be listed.
You’ll also note that there are multiple tabs in each of these sections. We’ll talk about the other tabs
shortly; however, we’ll note now that in the bottom right-hand section, there is a “Plots” tab. If you
were to click on that you would simply see an empty blank space because you haven’t made any
plots yet. However, when you do the project you’ll be generating a plot. The plot you create will
show up in this tab.
“How does readership of a bestselling book relate to how much the author is charging for
that book?”
To start working in RStudio Cloud, open up a new tab by pressing Ctrl and T, then copy this
URL and paste it into the web address bar: http://bit.ly/cbds_projects⁸⁷. If you get a login page, press
the button to “Log in with Google” just like you did when you were setting up your account.
You should now see a page that looks like this. You should see a Project listed that is called
“leanpub_project”.
⁸⁷http://bit.ly/cbds_projects
On the right-hand side, you should see an icon to “Copy” the project. Click on this icon.
You should now see a page that looks like this across the top.
You’ll first want to title your project. Click on ‘leanpub_project’ at the top and begin typing. Title it
with ‘leanpub_project_lastname’. So, for example if your last name were Doe, the project would be
titled ‘leanpub_project_doe’. You’re ready to get going!
You are now using the RStudio software! The first thing that you should do is go to the bottom right
hand side of the screen and click on the file called “leanpub_googlesheets_analysis.R”.
This should open up a file full of code in the top left-hand portion of the screen. Your screen should
now look like this.
This file already has computer code in it. That computer code will read the data from the Google
Sheet you have created and make a plot. If you scroll through this code, you will see lines that start
with “#”. Any line of code that starts with a pound sign (#) is a comment. This is text
that is added to explain to anyone looking at the code what the code does. The rest of the text in
this file tells the computer what to do. Using this code, we’ll do a few things:
1. Get things set up. The details aren’t important now, but we’ll definitely get into them later in
the series.
2. Read in the Google Sheet you generated.
3. Check to make sure that the data are in the correct format.
4. Make a plot that will look at the relationship between the number of readers and minimum
price for Leanpub books.
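The course's R script handles steps 2 and 3 for you. Purely as an illustration of what those steps involve (this is not the script's code), here is a Python sketch that reads the sheet's contents and keeps only the books with usable numbers. It assumes the sheet has been exported as CSV text whose header row is title, readers, suggested, minimum; the function name `read_books` is hypothetical.

```python
import csv
import io


def read_books(csv_text):
    """Turn the sheet's CSV text into (readers, minimum price) pairs, one per usable book."""
    pairs = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        readers = row["readers"].strip()
        minimum = row["minimum"].strip()
        if readers == "NA" or minimum == "NA":
            continue  # a missing value can't be placed on a plot, so skip this book
        # float() raises ValueError on leftovers such as "$7.99" or "1,036",
        # which is one way a format check can catch spreadsheet mistakes.
        pairs.append((float(readers), float(minimum)))
    return pairs
```

For a sheet containing only the PowerShell 101 row, `read_books("title,readers,suggested,minimum\nPowerShell 101,1036,15.99,7.99\n")` returns `[(1036.0, 7.99)]`, a pair ready to be placed on a plot.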
In the future, you’ll learn how to write this code. For now, all the code is available to you. All you
should have to do to make this work is paste in the public URL for the Google Sheet that you made
in the last chapter of the course. To do this, scroll through the code in the top left-hand panel of
RStudio Cloud. Find the place in the computer code that says “PASTE_YOUR_GOOGLE_SHEET_LINK_HERE!”
and replace that text with your link.
Now you should be ready to run your code! You can run it all at once by highlighting all the code
in the “leanpub_googlesheets_analysis.R” script. Then, find the button that says “Run”
at the top of the code file and click on that button.
You should see code running in the bottom left-hand panel. As code runs, there will be some output
in red text, letting you know that the code is running. This red text does not necessarily mean
anything is wrong: red text in RStudio is sometimes an error, while other times it is just providing
you with information. If it says error, then it’s an error. But don’t be alarmed just because red text
is appearing on your screen. If the code runs, a plot should appear on the lower right-hand side.
If no plot appears but you don’t see any red Xs in your code, there is likely an error with how you
formatted your spreadsheet. The errors will appear in the bottom left-hand Console panel. Scroll
through the text there to see if any of the error messages help point you to what mistake may have
been made. Then, edit your spreadsheet in Google Sheets and re-run all the code.
Once you have your plot, you have what you need to make the Google Doc and finish your project
in the next lesson. Keep this tab open so that you can copy your plot in the next lesson!
• Slides⁸⁹
Google Docs
This will open a Blank Google Doc. You’re now ready to get started working in Google Docs.
In this document, you’ll want to include a short summary about what question you were asking,
what data you collected, and where these data were collected from in a section titled “Summary”.
You’ll then want to paste your results and explain what you see in the plot you generated in a
“Results” section. Finally, you’ll conclude how the readership of a bestselling book relates to how
much the author is charging for that book in a “Conclusion” section.
In order to get the plot to paste into your report, open a new tab by pressing Ctrl and T at the
same time and go to http://bit.ly/cbds_projects. You should see your project here. Click on that
project. The analysis you already carried out will be here. To copy the plot you generated,
click on ‘Export’ in the ‘Plots’ tab in the bottom right-hand portion of the RStudio window.
Export in rstudio.cloud
Copy to Clipboard
Plot in rstudio.cloud
With your cursor over the plot that pops up, tap the touchpad with two fingers at the same time to
bring up a new menu. On this menu, select ‘Copy Image.’
You can now return to Google Docs, place your cursor where you’d like the plot to go, tap the
touchpad with two fingers at the same time to bring up a new menu, and click ‘Paste’ to paste your
plot from RStudio Cloud into your Google Doc.
A ‘Share with others’ box will pop up in the middle of the screen. Click on ‘Get shareable link.’
Shareable Link
A new screen will pop up informing you that your link has been copied. This is the link you will
paste by pressing ctrl and v in the quiz below when asked for your Google Doc link. Congrats!
You’ve completed your first report from a data science project!
• Slides⁹²
Google Slides
Presentation Guidelines
We’ll go into more details later, but there are three things to keep in mind anytime you are making
a slide presentation:
3. Make the font and pictures big enough to be seen when the presentation is projected.
This will open up a blank and simple slide where you can begin to work on your presentation.
Similar to the Google Doc you created, you’ll want to rename this file. To do so, click on ‘Untitled
presentation’ in the top left-hand corner of the presentation. Again, title this slideshow using your
last name. For example, if your last name were Doe, you would title this ‘leanpub_presentation_doe.’
You’re now ready to start working on your first slide!
A reasonable title would be ‘Leanpub Data Science Project.’ You’d then want to include who did the
analysis as a subtitle. By clicking ‘Click to add subtitle’ you can then include your name on your
presentation.
If you wanted to change the font size of any of the text to make it bigger or smaller, you would
highlight that text and then click on the font size at the top of the menu to display a drop down
menu. Font size can be selected from this list or typed in that box directly.
You can use a similar process of highlighting text and then selecting from the toolbar to change
formatting in a number of other ways. You can change the font of the text, make the text bold,
italicize the text, underline the text, or change the color of the text as well.
Once you’re happy with how your title slide looks, you’ll want to start working on the next slide
in your presentation. To start the next slide, click the plus sign in the top left-hand corner of
your Google Slides presentation.
A second slide will appear in your presentation. You can add text to this slide the same way you did
on the title slide. Pictures can also be copied and pasted into your Google Slides.
You will want to create a Google Slides presentation with approximately four slides summarizing
the Leanpub data science project you have been working on. These slides should include:
• Title slide
• The question you were asking in your data science project
• Information about how the data were collected, where the data came from, and what data were
collected
• The results (including your plot!) and conclusions from your analysis
A ‘Share with others’ window will pop up. Here you will click ‘Get shareable link’.
This will bring up a new box indicating that your link has been copied. This is the link you will
paste when the quiz at the end of this lesson asks for your Google Slides link.
• Slides⁹⁵
⁹⁵https://docs.google.com/presentation/d/1sjOuMmP1oXuqvTMeKlAoOSCqD-TOncWraD67b_pzrUE/edit?usp=sharing
• DataCamp allows you to practice R without having to set up all the software
• DataCamp has courses covering a broad range of the topics we are going to cover in these courses
Logging on to DataCamp
You signed on to DataCamp in an earlier lesson, and you will simply repeat that process now: go to
‘www.datacamp.com’ and click the ‘Google+’ logo to log on.
You will then be on the DataCamp home page. On this page, you will click on ‘Learn’ from the menu
across the top.
DataCamp Learn
This will open a drop-down menu. From this menu, select ‘Introduction to R’ from the course listings
in the left-most column.
DataCamp Introduction to R
Introduction to R
This will bring you to the course page for ‘Introduction to R.’ Here you will click ‘Start Course For
Free.’
This will open up the DataCamp course. This layout will be used throughout the course and should
look somewhat familiar. It is similar to RStudio Cloud in that you have a place where you will write
your code (SCRIPT.R) and a place where that code will run (R CONSOLE). However, DataCamp
is different in that it has lessons and exercises to help teach you how to code in the programming
language R.
The information you need to learn will always be on the left side of the DataCamp window. At the
top is the ‘EXERCISE’ section. The text here will explain what you need to know to complete
the lesson.
DataCamp Exercise
Below the ‘EXERCISE’ is the ‘INSTRUCTIONS’ section. This window will include the specific
instructions for what you will need to do before continuing on to the next part of the course.
DataCamp Instructions
If you scroll through this part of the window, you will notice a ‘Take Hint’ button. You should always
try the exercise without taking a hint first; however, if you get stuck, clicking ‘Take Hint’ may
help you.
Now that you know where to find instructions, you’re ready to start learning how to code. All code
will be written in the SCRIPT.R pane, in the top right-hand portion of the DataCamp window.
DataCamp Script.R
The code you write will then execute, or be carried out, in the R CONSOLE in the bottom right-hand
corner of the screen.
DataCamp R Console
To run a line of code, first highlight the line you want to run, then click ‘Run Code.’ This sends
the code to the console to execute. In this example, you will see that R acts as a calculator: when
you run the code ‘3 + 4’ in the R Console, R returns the answer ‘7.’
Once you’ve completed the task described in the ‘INSTRUCTIONS’ section and clicked ‘Run Code’ to
test your answer, click ‘Submit Answer.’
If your response is correct, the screen at left will pop up to let you know that you’re ready to
continue on to the next section of the course. Press ‘Enter’ to continue.
DataCamp Continue
• Slides⁹⁹
⁹⁹https://docs.google.com/presentation/d/1Kgpmw00v_OjhhXkf_ULGV4pWIJjNuu3Sukmd2aqbHUk/edit?usp=sharing
¹⁰⁰http://leanpub.com/courses/jhu/cbds-intro/quizzes/quiz_10_datacamp