In the linear algebra boot camp, the instructor will draw from a combination of the following
books:
For the time being, you can rely on your linear algebra book from college, as well as the free
linear algebra book by Jim Hefferon at
http://joshua.smcvt.edu/linearalgebra/book.pdf.
In your initial review, focus on the following topics: vectors, matrices, and associated oper-
ations, solving linear equations, determinants, vector spaces, eigenvalues and eigenvectors,
and linear transformations. Also start looking
If you are looking for an online learning venue, consider taking the OCW Scholar course in
linear algebra at the Massachusetts Institute of Technology at
http://ocw.mit.edu/courses/mathematics/18-06sc-linear-algebra-fall-2011/.
Khan Academy also offers linear algebra lessons at
https://www.khanacademy.org/math/linear-algebra.
A fair warning about online courses: watching the lectures without doing the homework has
next to no value. Engage in the work.
Probability and Statistics
In the probability and statistics boot camp (MSAN 504), the instructor will use a combina-
tion of the following books:
Selected excerpts from the above-mentioned books are available on the Canvas page for
MSAN 504. Students will have access to this Canvas page as soon as they log onto MyUSF
(and the Canvas page is published).
The following represent some resources that you may find helpful:
• Data Science for Business by Foster Provost and Tom Fawcett contains a wealth of
useful information.
• Web Analytics 2.0: The Art of Online Accountability and Science of Customer
Centricity by Avinash Kaushik. The following chapters of this book are useful: 1 – web
analytics 2.0 framework; 5 – conversions, revenue, and satisfaction; 6 – customer
centricity; 7 – experimentation and testing; 8 – competitive intelligence analysis; 13 –
planning your career for success; and 14 – creating a data-driven culture.
• Next, consider reading Predictive Analytics: The Power to Predict Who Will Click,
Buy, Lie, or Die by Eric Siegel, with a foreword by Thomas H. Davenport.
• If your English skills (particularly your writing skills) are weak, practice some of the
writing exercises at http://www.autoenglish.org/writing.htm. Also, consider
reviewing The Elements of Style by W. Strunk and E.B. White and Style: Ten Lessons
in Clarity and Grace by J. Williams.
Computer Programming Languages and Tools
Laptops
Every student must have a laptop with enough computing power and memory to complete
projects and practicum work. We strongly recommend you buy a Mac (or Linux) laptop
with at least the following specs:
We further recommend that you get 16 GB rather than 8 GB of RAM (memory). The more
memory you have, the better.
Let us be very clear that Mac OS X is the preferred operating system. You can get
away with Linux, but should avoid Windows. If you choose to use Windows, you are on your
own: Faculty will not be able to help you install software or packages on Windows! Most
of the software we use in this program does not work well or cannot be easily installed on
Windows.
Programming
The greater your facility with writing software and using programming tools, the easier
you will find the entire curriculum. When students have difficulty in the programming
assignments, our first question to them is: What did you do in the months prior to the boot
camp? Every year a few students do not pass the computational boot camp and must exit
the program. We have created these notes so that you can properly prepare yourself.
Python 3.x
To ensure that you satisfy #1 in Python (3.x) you must be able to read and write code
using the following items. There will be a quiz at the start of the first class of computational
boot camp to verify this.
• Types vs. variables. Common types: strings, integer and floating-point numbers.
Difference between objects and values (see section 8.11 of the book “Python for Everybody”).
Conversion between types. Multiple assignment: a,b = f().
• List and string element access, s[i], and slicing, s[1:5], s[1:], etc.
• Function declaration syntax; the difference between local variables, parameters, and
global variables; return values.
• Importing and using common functions and libraries: range(), round(), len(), min(),
max(), splitting strings with split(), reading and writing text files.
• Be able to look up functions to learn about their parameters and return values.
• An understanding of Python packages and how to import code from one file to another.
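As a rough self-check, the short sketch below touches most of the items in the list above: type conversion, multiple assignment, slicing, local versus global variables, a few built-in functions, and text file I/O. The function names (min_max, scaled), the variable names, and the file name prep_demo.txt are made up for illustration; this is not the quiz itself, only an example of the level of code you should be able to read and write.

```python
import os
import tempfile

# Conversion between types
count = int("42")        # str -> int
ratio = float("3.5")     # str -> float
label = str(count)       # int -> str


def min_max(values):
    """Return the smallest and largest element of a non-empty list."""
    return min(values), max(values)


# Multiple assignment: unpack the two return values
lo, hi = min_max([4, 9, 1, 7])

# Element access and slicing
s = "bootcamp"
first = s[0]       # 'b'
middle = s[1:5]    # 'ootc'
tail = s[1:]       # 'ootcamp'

# Global variable, parameters, and a local variable
total_calls = 0


def scaled(x, factor=2):
    """Multiply x by factor and count how often the function was called."""
    global total_calls          # refers to the module-level name
    total_calls += 1
    result = x * factor         # local: exists only during this call
    return result


doubled = scaled(3)

# Common built-in functions
squares = [i * i for i in range(5)]    # [0, 1, 4, 9, 16]
rounded = round(3.14159, 2)            # 3.14
n_squares = len(squares)               # 5

# Writing and reading a text file (in a temporary directory that is cleaned up)
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "prep_demo.txt")   # hypothetical file name
    with open(path, "w") as f:
        f.write("alpha beta\ngamma\n")
    with open(path) as f:
        words_in_file = f.read().split()            # split on any whitespace
```

If you can predict the value of every variable before running this, you are in reasonable shape for the items listed; if not, work through the corresponding bullet again.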
You can learn all of the skills listed above from the book “Python for Everybody” at
http://www.pythonlearn.com/book.php or by going through any of the following free courses:
• https://www.coursera.org/learn/python
• https://www.coursera.org/learn/interactive-python-1
• https://www.udacity.com/course/programming-foundations-with-python--ud036
• https://www.edx.org/course/introduction-computer-science-mitx-6-00-1x-10
• https://www.edx.org/course/cs-all-introduction-computer-science-harveymuddx-cs005x-0
• https://www.codecademy.com/learn/python
Python Tutor. You will also find http://pythontutor.com/ to be a very useful tool
when trying to visualize what’s going on with the objects in your running program. For
example, it will show you how a list of numbers is laid out in memory. Being able to
visualize data structures is a critical skill.
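A small snippet like the following is a reasonable thing to paste into Python Tutor and step through. It is only an illustrative sketch (the variable names are arbitrary), but it shows the difference between two names that refer to the same list object and a separate copy of that list.

```python
a = [1, 2, 3]
b = a            # no copy is made: b refers to the same list object as a
c = list(a)      # a new list object with the same contents

equal_before = (a == c)    # True: same contents
identical = (a is c)       # False: two distinct objects

b.append(4)      # mutates the single object shared by a and b

shared = (a is b)          # True: a and b are still the same object
equal_after = (a == c)     # False: c did not change
```

In the visualization, a and b should appear as two arrows pointing at one list, while c points at its own list.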
How to learn. Many of you have already gone through these courses, but you might
not have gotten that much out of them, particularly if you didn’t do the projects. If you want
to learn to write software, there is no substitute for actually typing in code. Don’t just
listen and watch the instructor write software. Study the code, understand the problem it is
solving, and then with a blank screen try to reproduce the code yourself without looking at
the solution unless you get stuck. Just as in a foreign natural language, we are much better
at listening than speaking. To improve your speaking, you must get as much practice as
possible speaking that language. You should be typing in (not just cut/pasting) lots and lots
of Python between now and the boot camp.
If you are an incoming student who already has some Python experience, consider
reviewing the first three chapters of Python for Data Analysis: Data Wrangling with Pandas,
NumPy, and IPython, by Wes McKinney.
R
You will do a lot of programming in R. If you are new to R, you should familiarize yourself
with Robert Kabacoff’s “R in Action.” A good introduction to data science techniques can
be found in “R for Data Science” by Grolemund and Wickham, which is available free
online at http://r4ds.had.co.nz/. There will be a strong focus on the following packages:
dplyr, ggplot2 and magrittr. Students with more substantial programming experience,
but in languages other than R, should consider reviewing Chapters 1 to 3 of Software for
Data Analysis: Programming with R (Statistics and Computing) by John M. Chambers.
Tools
Throughout the program you will use a number of tools in order to write software, execute
code, and communicate with team members and faculty. Before arriving at orientation, you
are required to have the following software installed on your laptop:
• Python 3.x, not 2.7.x. Install Anaconda 3, a Python installation that includes
most of the packages you will need in this program:
https://www.anaconda.com/download
Be aware that there is almost certainly an existing Python installation on your laptop,
which you do not want to use. To ensure that you are using the correct version from
the command line (see bash below), you should see “Anaconda” when you start up
the Python interactive shell:
$ python
Python 3.6.3 |Anaconda custom (64-bit)| (default, Oct 6 2017, 12:04:38)
[GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>>
• Jupyter notebooks (or in beta: Jupyter Lab). See http://jupyter.org/, but these
notebooks are built into the Anaconda installation. (Run jupyter notebook from
the shell to start.) Notebooks are combined programs and text that are great for
presentations as well as development. You can intersperse your thoughts with the code
that implements those ideas. Machine learning programming and many other kinds of
data science programming are particularly challenging because of the amount of data
involved. Being able to display graphs, data frames, and output directly in line with
the code in the same document is extremely valuable.
• Bash. Using the command line is a critical skill in this degree program. The command
line is also called “the shell” or referred to by the name of the specific shell we use: Bash.
When we create servers in the cloud, we communicate with them through the command line. Further,
you will manage and process files on your laptop or remote server using the command
line. Bash is the default shell on both OS X and Linux. You should go through a
course such as this one: http://www.bash.academy
• git. The most important collaborative tool used by programmers is a revision control
system. We will use a tool called git in particular, which is hideous but powerful and
is the most commonly used. Very often you will be submitting software for grading
through git to http://www.github.com. Certainly, it is how multiple students work
on the same project. It is likely something that you will use on the first day of your
practicum. Potential employers will look at your sample projects on GitHub. You should
go through these courses:
https://www.codecademy.com/learn/learn-git
https://www.udacity.com/course/how-to-use-git-and-github--ud775
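Once everything in the list above is installed, a quick way to confirm the setup is a short script like the sketch below. It reports which Python interpreter is running and whether git, jupyter, and bash can be found on your PATH. The function name check_environment and the report keys are invented for this example, and the Anaconda test is only a heuristic: it looks for the word "Anaconda" in the version banner, as in the sample session shown earlier, so other builds may not match.

```python
import shutil
import sys


def check_environment(tools=("git", "jupyter", "bash")):
    """Describe the running interpreter and locate each tool on PATH."""
    report = {
        "python_version": sys.version.split()[0],
        "is_python3": sys.version_info.major == 3,
        # Heuristic only: assumes the build mentions "Anaconda" in its banner
        "mentions_anaconda": "anaconda" in sys.version.lower(),
        "executable": sys.executable,
    }
    for tool in tools:
        # shutil.which returns the full path, or None if the tool is not found
        report[tool] = shutil.which(tool)
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

Save it as a file and run it with python from the shell. If executable does not point into your Anaconda directory, or a tool is reported as None, fix that before orientation.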