How to use the Machine Learning Salon's Kit?
What is the Machine Learning Salon's Kit?
What is not the Machine Learning Salon's Kit?
Why are we not on GitHub?
If you are a CTO who wants to recruit smart Machine Learning developers
If you want to become a contributor
If you want to remove a link
If you want to add a better description of your website
If you are willing to give a discount to the Machine Learning Salon's readers
About the Founder of The Machine Learning Salon's Website & Kit
Contact
MOOC or Opencourseware English
Coursera
Machine Learning Stanford Course
Practical Machine Learning
Machine Learning Washington Course
Core Concepts in Data Analysis (Higher School of Economics)
Neural Networks for Machine Learning
Natural Language Processing
Probabilistic Graphical Models
Stanford Engineering Everywhere
EdX
Learning from data (Caltech)
Artificial Intelligence (BerkeleyX)
Big Data and Social Physics (Ethics)
Introduction to Computational Thinking and Data Science
MIT OpenCourseWare (OCW)
VLAB MIT Enterprise Forum Bay Area, Machine Learning Videos
Foundations of Machine Learning by Mehryar Mohri - 10 years of Homeworks with Solutions and Lecture Slides, not to be missed!
IPAM, Institute for Pure and Applied Mathematics, Videos, UCLA
Carnegie Mellon University
Carnegie Mellon University (CMU) Video resources
Convex Optimisation, Fall 2013, by Barnabas Poczos and Ryan Tibshirani, CMU
Machine Learning, Spring 2011, by Tom Mitchell, CMU
Metacademy Concept list and roadmap list
Harvard University
Advanced Machine Learning, Fall 2013 (Free access to most of videos)
Data Science Course, Fall 2013
Oxford University, Nando de Freitas video lectures
Cambridge University Machine Learning Slides, Spring 2014
Caltech University, Learning from Data
University College London Discovery
University College London, Supervised Learning
Yann LeCun's Publications
Francis Bach, Ecole Normale Supérieure - Courses and Exercises with solutions (English-French)
Technion, Israel Institute of Technology, Machine Learning Videos
NPTEL, National Programme on Technology Enhanced Learning, India
Probability Theory and Applications
Pattern Recognition
Videolectures.net
MLSS Machine Learning Summer Schools Videos
MLSS Videos from 2004 to 2012
MLSS Videos 2012
MLSS Videos 2012
Max Planck Institute for Intelligent Systems Tübingen, MLSS Videos 2013
GoogleTechTalks
Machine Learning
Deep Learning
Udacity Opencourseware
Supervised Learning (select "View Courseware" for free access)
Unsupervised Learning (select "View Courseware" for free access)
Reinforcement Learning (select "View Courseware" for free access)
Mathematicalmonk Machine Learning
Judea Pearl Symposium
Machine Learning Reading Group, Indian Institute of Science
SIGDATA, Indian Institute of Technology Kanpur
Hakka Labs
Open Yale Course
Columbia University
Machine Learning resources
Applied Data Science by Ian Langmore and Daniel Krasner
Deep Learning
BigDataWeek Videos
Neural Information Processing Systems Foundation (NIPS) Video resources
Hong Kong Open Source Conference 2013 (English & Chinese)
ICLR 2014 Videos
ICLR 2013 Videos
Machine Learning Conference Videos
Internet Archive
University of Berkeley
AMP Camps, Big Data Bootcamp, UC Berkeley
Resources and Tools of Noah's ARK Research Group
ESAC DATA ANALYSIS AND STATISTICS WORKSHOP 2014
The Royal Society
Statistical and causal approaches to machine learning by Professor Bernhard Schölkopf
Deep Learning
Deep Learning RNNaissance with Dr. Juergen Schmidhuber
Introduction to Deep Learning with Python by Alec Radford
Miscellaneous
Introduction To Modern Brain-Computer Interface Design by Swartz Center for Computational Neuroscience
Distributed Computing Courses (lectures, exercises with solutions) by ETH Zurich, Group of Prof. Roger Wattenhofer
The wonderful and terrifying implications of computers that can learn | Jeremy Howard | TEDxBrussels
OCTAVE
JULIA
Julia by example
The R PROJECT for Statistical Computing
R
R Graph Gallery
Code School - R Course
Coursera R programming
Open Intro R Labs
R Tutorial
DataCamp R Course
R Bloggers
STAN Software
List of Machine Learning Open Source Software
Google Prediction API
Reddit
SHOGUN toolbox
Infer.NET, Microsoft Research
F# Software Foundation
BigML
BRML Toolbox in Matlab David Barber Toolbox, University College London
Dmitry Efimov Software
SCILAB
OverFeat and Torch7, CILVR Lab @ NYU
Mloss.org
Sourceforge
Freecode
Open Machine Learning Workshop organized by Alekh Agarwal, Alina Beygelzimer, and John Langford, August 2014
Maxim Milakov Software
Alfonso Nieto-Castanon Software
Lib Skylark
Mutual Information Text Explorer
Data Science Resources by Jonathan Bower on GitHub
Joseph Misiti's Blog
Michael Waskom GitHub repositories
Visualizing distributions of data
Exploring Seaborn and Pandas based plot types in HoloViews by Philipp John Frederic Rudiger
Open Source Hong Kong
Lamda Group, Nanjing University
If you are willing to give a discount to the Machine Learning Salon's readers
Just tell us what you're willing to propose and, if we think that the readers will find it relevant, we will add it without any exchange of money. You will never get an email list of our visitors because we don't have any information about our visitors. We are working with a basic Adobe Business Catalyst website: we've got the geographical location of our visitors, their loyalty, their page views, etc., but no IP addresses. We have nothing to sell and we are not willing to.
About the Founder of The Machine Learning Salon's Website & Kit
The Machine Learning Salon was founded by Jacqueline I. Forien, who very much enjoyed her Master of Science in Machine Learning at University College London thanks to all her wonderful peers and teachers on the Machine Learning and the Computational Statistics and Machine Learning programmes. Jacqueline would like to express special gratitude to her director of Machine Learning studies at UCL, Professor Mark Herbster; her tutor, Professor David Barber; and the supervisor of her Master's project, Professor Nadia Berthouze.
In addition, Jacqueline would like to express many thanks to Igor Carron, who initiated the smart association of 'Machine Learning' and 'Salon', and gave her the opportunity to organise in London a wonderful event: the Europe Wide Machine Learning Meetup between Paris, Berlin, Zurich and London, with Andrew Ng as a guest speaker.
Contact
Please contact us if you want to add a contribution, remove a link, etc. Any suggestion is welcome!
Contact at contact@machinelearningsalon.org
technologies are having a dramatic impact on the way people interact with computers, on the way people interact with each other through the use of language, and on the way people access the vast amount of linguistic data now in electronic form. From a scientific viewpoint, NLP involves fundamental questions of how to structure formal models (for example statistical models) of natural language phenomena, and of how to design algorithms that implement these models.
https://www.coursera.org/course/nlangp
Probabilistic Graphical Models
Uncertainty is unavoidable in real-world applications: we can almost never predict with certainty what will happen in the future, and even in the present and the past, many important aspects of the world are not observed with certainty. Probability theory gives us the basic foundation to model our beliefs about the different possible states of the world, and to update these beliefs as new evidence is obtained. These beliefs can be combined with individual preferences to help guide our actions, and even in selecting which observations to make. While probability theory has existed since the 17th century, our ability to use it effectively on large problems involving many inter-related variables is fairly recent, and is due largely to the development of a framework known as Probabilistic Graphical Models (PGMs). This framework, which spans methods such as Bayesian networks and Markov random fields, uses ideas from discrete data structures in computer science to efficiently encode and manipulate probability distributions over high-dimensional spaces, often involving hundreds or even many thousands of variables. These methods have been used in an enormous range of application domains, which include: web search, medical and fault diagnosis, image understanding, reconstruction of biological networks, speech recognition, natural language processing, decoding of messages sent over a noisy communication channel, robot navigation, and many more. The PGM framework provides an essential tool for anyone who wants to learn how to reason coherently from limited and noisy observations.
https://www.coursera.org/course/pgm
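As a small illustration of what "updating beliefs as new evidence is obtained" can mean in practice, here is a minimal sketch of Bayes' rule on the simplest possible Bayesian network, a single edge from a hidden condition to an observed test result. The function name and all the numbers are hypothetical and chosen only for this example; they do not come from the course.

```python
def posterior_given_positive(prior, sensitivity, false_positive_rate):
    """Return P(condition | positive test) for a two-node network: condition -> test."""
    # Total probability of observing a positive test (the evidence term in Bayes' rule).
    evidence = sensitivity * prior + false_positive_rate * (1.0 - prior)
    # Bayes' rule: likelihood times prior, normalised by the evidence.
    return sensitivity * prior / evidence


# Hypothetical numbers, used only for illustration.
belief = posterior_given_positive(prior=0.01, sensitivity=0.90, false_positive_rate=0.05)
print(round(belief, 3))
```

With these assumed numbers the belief in the condition rises from 1% to roughly 15% after one positive test, which shows how a rare prior keeps the updated belief modest even when the test is fairly reliable.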
Stanford Engineering Everywhere
SEE programming includes one of Stanford's most popular engineering sequences: the three-course Introduction to Computer Science taken by the majority of Stanford undergraduates, and seven more advanced courses in artificial intelligence and electrical engineering.
VLAB is the San Francisco Bay Area chapter of the MIT Enterprise Forum, a non-profit organization dedicated to promoting the growth and success of high-tech entrepreneurial ventures by connecting ideas, technology and people.
We provide a forum for San Francisco and Silicon Valley's leading entrepreneurs, industry experts, venture capitalists, private investors and technologists to exchange insights about how to effectively grow high-tech ventures amidst dynamic market risks and challenges. In a world where markets change at breakneck speed, knowledge is a critical source of competitive advantage. Our forums provide an excellent opportunity to network and learn about pivotal business issues, emerging industries and the latest technologies.
http://www.youtube.com/user/vlabvideos/search?query=machine+learning
Foundations of Machine Learning by Mehryar Mohri - 10 years of Homeworks with Solutions and Lecture Slides, not to be missed!
Added the 11-Nov-2014
Course Description
This course introduces the fundamental concepts and methods of machine learning, including the description and analysis of several modern algorithms, their theoretical basis, and the illustration of their applications. Many of the algorithms described have been successfully used in text and speech processing, bioinformatics, and other areas in real-world products and services. The main topics covered are:
Probability tools, concentration inequalities
PAC model
Rademacher complexity, growth function, VC-dimension
Perceptron, Winnow
Support vector machines (SVMs)
Kernel methods
Decision trees
Boosting
Density estimation, maximum entropy models
Logistic regression
Regression problems and algorithms
Ranking problems and algorithms
Halving algorithm, weighted majority algorithm, mistake bounds
Learning automata and transducers
Reinforcement learning, Markov decision processes (MDPs)
http://www.cs.nyu.edu/~mohri/ml14/
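To give a flavour of one topic from the list above, here is a minimal sketch of the classic perceptron update rule in plain Python. It is not taken from the course materials: the function names and the toy data are invented for this example, and the data is assumed to be linearly separable so that training stops after a finite number of mistakes.

```python
def train_perceptron(samples, epochs=10):
    """Train a perceptron on (features, label) pairs with labels in {-1, +1}."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        mistakes = 0
        for features, label in samples:
            score = sum(w * x for w, x in zip(weights, features)) + bias
            if label * score <= 0:  # misclassified, or exactly on the boundary
                # Move the separating hyperplane towards the misclassified point.
                weights = [w + label * x for w, x in zip(weights, features)]
                bias += label
                mistakes += 1
        if mistakes == 0:  # a full pass with no updates: training has converged
            break
    return weights, bias


def predict(weights, bias, features):
    """Return +1 or -1 depending on the side of the hyperplane."""
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if score > 0 else -1


# Toy, linearly separable data invented for this example.
data = [((2.0, 1.0), 1), ((3.0, 3.0), 1), ((-1.0, -2.0), -1), ((-2.0, -1.0), -1)]
weights, bias = train_perceptron(data)
predictions = [predict(weights, bias, x) for x, _ in data]
```

On this toy set a single update is enough, after which every point is classified correctly. The number of updates is exactly the quantity that mistake-bound analyses count.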
IPAM, Institute for Pure and Applied Mathematics, Videos, UCLA
IPAM records many of its lectures and makes them available to the public so that a wider audience may benefit from the scientific programs we offer. Since July 2012, IPAM has begun to record most of its lectures. You can access the lectures for a particular program or workshop (such as Materials Defects Tutorials) by following the program link listed below to the relevant workshop schedule. Each speaker is listed along with available slide shows and videos. For public lectures, the link will take you directly to the video. The programs and public lectures are listed in reverse chronological order. Older videos play on Real Player only; recent videos will play on Flash supported browsers and software.
https://www.ipam.ucla.edu/videos.aspx
machinelearningsalon kit, 28th December 2014. Don't keep an old version! machinelearningsalon kit is regularly updated!
Carnegie Mellon University
Carnegie Mellon University (CMU) Video resources
"The videos below are intended to serve as resources for our current students, and not as online learning materials for students outside of our program." - The Machine Learning Department
http://www.ml.cmu.edu/teaching/video-resources.html
Convex Optimisation, Fall 2013, by Barnabas Poczos and Ryan Tibshirani, CMU
Overview and objectives
Nearly every problem in machine learning and statistics can be formulated in terms of the optimization of some function, possibly under some set of constraints. As we obviously cannot solve every problem in machine learning or statistics, this means that we cannot generically solve every optimization problem (at least not efficiently). Fortunately, many problems of interest in statistics and machine learning can be posed as optimization tasks that have special properties (such as convexity, smoothness, separability, sparsity, etc.) permitting standardized, efficient solution techniques.
This course is designed to give a graduate-level student a thorough grounding in these properties and their role in optimization, and a broad comprehension of algorithms tailored to exploit such properties. The main focus will be on convex optimization problems, though we will also discuss nonconvex problems at the end. We will visit and revisit important applications in statistics and machine learning. Upon completing the course, students should be able to approach an optimization problem (often derived from a statistics or machine learning context) and:
(1) identify key properties such as convexity, smoothness, sparsity, etc., and/or possibly reformulate the problem so that it possesses such desirable properties;
(2) select an algorithm for this optimization problem, with an understanding of the advantages and disadvantages of applying one method over another, given the problem and properties at hand;
(3) implement this algorithm or use existing software to efficiently compute the solution.
http://www.stat.cmu.edu/~ryantibs/convexopt/#videos
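To make the idea of exploiting convexity and smoothness concrete, here is a minimal sketch of fixed-step gradient descent on a small convex quadratic. It is an illustration only and is not taken from the course: the objective f(x, y) = (x - 1)^2 + 2(y + 2)^2, the step size and the iteration count are all assumptions chosen so that the method converges.

```python
def gradient_descent(grad, start, step=0.1, iterations=200):
    """Minimise a smooth function by repeatedly stepping against its gradient."""
    point = list(start)
    for _ in range(iterations):
        g = grad(point)
        point = [p - step * gi for p, gi in zip(point, g)]
    return point


def grad_f(point):
    """Gradient of the assumed objective f(x, y) = (x - 1)^2 + 2 * (y + 2)^2."""
    x, y = point
    return [2.0 * (x - 1.0), 4.0 * (y + 2.0)]


# The unique minimiser of this convex objective is (1, -2).
minimum = gradient_descent(grad_f, start=[0.0, 0.0])
print([round(v, 6) for v in minimum])
```

Because the objective is convex and smooth, any sufficiently small fixed step shrinks the distance to the minimiser at every iteration. On a nonconvex objective the same loop could stall at a local minimum or a saddle point.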
Machine Learning, Spring 2011, by Tom Mitchell, CMU
Machine Learning is concerned with computer programs that automatically improve their performance through experience (e.g., programs that learn to recognize human faces, recommend music and movies, and drive autonomous robots). This course covers the theory and practical algorithms for machine learning from a variety of
http://discovery.ucl.ac.uk
http://www.youtube.com/watch?v=Euaoblv_nL8
Max Planck Institute for Intelligent Systems Tübingen, MLSS Videos 2013
Our goal is to understand the principles of Perception, Action and Learning in autonomous systems that successfully interact with complex environments and to use this understanding to design future systems. The Institute studies these principles in biological, computational, hybrid, and material systems ranging from nano to macro scales. We take a highly interdisciplinary approach that combines mathematics, computation, material science, and biology.
The MPI for Intelligent Systems has campuses in Stuttgart and Tübingen. Our Stuttgart campus has world-leading expertise in small-scale intelligent systems that leverage novel material science and biology. The Tübingen campus focuses on how intelligent systems process information to perceive, act and learn.
http://www.youtube.com/channel/UCty-pPOWlWUk4gXNm5pydcg
GoogleTechTalks
Machine Learning
https://www.youtube.com/user/GoogleTechTalks/search?query=machine+learning
Deep Learning
https://www.youtube.com/user/GoogleTechTalks/search?query=deep+learning
Udacity Opencourseware
Supervised Learning (select "View Courseware" for free access)
Why Take This Course?
In this course, you will gain an understanding of a variety of topics and methods in Supervised Learning. Like function approximation in general, Supervised Learning prompts you to make generalizations based on fundamental assumptions about the world.
Michael: So why wouldn't you call it "function induction?"
Charles: Because someone said "supervised learning" first.
Topics covered in this course include: Decision trees, neural networks, instance-based learning, ensemble learning, computational learning theory, Bayesian learning, and many other fascinating machine learning concepts.
https://www.udacity.com/course/ud675
Unsupervised Learning (select "View Courseware" for free access)
Why Take This Course?
You will learn about and practice a variety of Unsupervised Learning approaches, including: randomized optimization, clustering, feature selection and transformation, and information theory.
You will learn important Machine Learning methods, techniques and best practices, and will gain experience implementing them in this course through a hands-on final project in which you will be designing a movie recommendation system (just like Netflix!).
https://www.udacity.com/course/ud741
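Clustering, one of the approaches named in this course description, can be sketched with Lloyd's k-means algorithm. This is an illustrative sketch only, not code from the course; the function name `kmeans`, the fixed starting centroids and the sample points are all assumptions.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Lloyd's algorithm: alternate an assignment step and a centroid update."""
    clusters = []
    for _ in range(iterations):
        # Assignment step: group each point with its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else old
            for cluster, old in zip(clusters, centroids)
        ]
    return centroids, clusters

# Hypothetical data: two obvious groups in the plane.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0),
          (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centroids, clusters = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
```

The starting centroids are fixed here so the run is reproducible; in practice they are usually chosen at random, and the result can depend on that choice.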
machinelearningsalon
kit
28th
December
2014
Don't
keep
an
old
version!
machinelearningsalon
kit
is
regularly
updated!
folks
that
will
power
innovation
and
disrupt
industries,
and
ultimately
shape
our
future.
Hakka
originally
launched
in
SF
Bay
&
NYC
and
rapidly
built
relationships
with
the
top
companies,
CTOs
and
tech
influencers
in
these
key
areas.
We
have
deep
connections
to
the
software
engineering
worlds
on
both
coasts
and
often
invite
groups
of
CTOs
and
engineers
to
our
office
in
Soho,
or
meet
with
them
at
engineering
events
that
we
either
run
or
participate
in.
We're
also
currently
up
&
running
in
Berlin
&
Moscow,
and
plan
to
continue
to
rapidly
expand
worldwide.
Not
too
shabby
for
a
scrappy
startup
with
a
small
marketing
budget!
http://www.hakkalabs.co
https://www.youtube.com/user/g33ktalktv/videos
Open
Yale
Course
Game
Theory
Each
course
includes
a
full
set
of
class
lectures
produced
in
high-quality
video
accompanied
by
such
other
course
materials
as
syllabi,
suggested
readings,
exams,
and
problem
sets.
The
lectures
are
available
as
downloadable
videos,
and
an
audio-only
version
is
also
offered.
In
addition,
searchable
transcripts
of
each
lecture
are
provided.
http://oyc.yale.edu/courses
Columbia
University
Machine
Learning
resources
Course
related
notes
Regression
by
linear
combination
of
basis
functions
[ps]
[pdf]
The
perceptron
[ps]
[pdf]
Document
classification
with
the
multinomial
model
[ps]
[pdf]
Sampling
from
a
Gaussian
[ps]
[pdf]
Slides
on
exponential
family
distributions
[ps]
[pdf]
http://www.cs.columbia.edu/~jebara/4771/tutorials.html
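The perceptron listed among these notes is simple enough to sketch directly. The code below is a generic textbook version of the perceptron update rule, not taken from the Columbia notes; the function names and the tiny AND-style dataset are assumptions for illustration.

```python
def train_perceptron(samples, epochs=100, lr=1.0):
    """Train on (features, label) pairs with labels in {-1, +1}."""
    w = [0.0] * len(samples[0][0])
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, y in samples:
            activation = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * activation <= 0:  # misclassified, or exactly on the boundary
                # Nudge the separating hyperplane towards the correct side.
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
                mistakes += 1
        if mistakes == 0:  # a full pass with no errors: converged
            break
    return w, b

def predict(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1

# Hypothetical, linearly separable data: logical AND with -1/+1 labels.
samples = [((0.0, 0.0), -1), ((0.0, 1.0), -1), ((1.0, 0.0), -1), ((1.0, 1.0), 1)]
w, b = train_perceptron(samples)
predictions = [predict(w, b, x) for x, _ in samples]
```

Convergence is only guaranteed when the classes are linearly separable, which is why the `epochs` cap is there.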
Applied
Data
Science
by
Ian
Langmore
and
Daniel
Krasner
The
purpose
of
this
course
is
to
take
people
with
strong
mathematical/statistical
knowledge
and
teach
them
software
development
fundamentals.
This course will cover:
Design of small software packages
Working in a Unix environment
Designing software in teams
Fundamental statistical algorithms such as linear and logistic regression
Overfitting and how to avoid it
Working with text data (e.g. regular expressions)
Time series
And more...
http://columbia-applied-data-science.github.io/appdatasci.pdf
http://columbia-applied-data-science.github.io
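As a small illustration of the "working with text data (e.g. regular expressions)" item, here is a sketch of regex-based tokenising and word counting. It is not taken from the course material; the pattern, the function name `word_counts` and the sample sentence are assumptions.

```python
import re
from collections import Counter

# A word is a run of letters, optionally followed by an apostrophe and more
# letters, so contractions such as "don't" stay in one piece.
WORD = re.compile(r"[a-z]+(?:'[a-z]+)?")

def word_counts(text):
    """Lower-case the text, extract words with the regex, and count them."""
    return Counter(WORD.findall(text.lower()))

counts = word_counts("Don't overfit: the model's fit to the data isn't the goal.")
most_common_word, most_common_count = counts.most_common(1)[0]
```

A pattern this simple ignores digits, hyphenated words and non-ASCII letters; real text usually needs a more careful tokeniser.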
Deep
Learning
Deep
Learning
is
a
new
area
of
Machine
Learning
research,
which
has
been
introduced
with
the
objective
of
moving
Machine
Learning
closer
to
one
of
its
original
goals:
Artificial
Intelligence.
This
website
is
intended
to
host
a
variety
of
resources
and
pointers
to
information
about
Deep
Learning.
In
these
pages
you
will
find
a
reading
list,
links
to
software,
datasets,
a
list
of
deep
learning
research
groups
and
labs,
a
list
of
announcements
for
deep
learning
related
jobs
(job
listings),
as
well
as
tutorials
and
cool
demos.
For
the
latest
additions,
including
papers
and
software
announcements,
be
sure
to
visit
the
Blog
section
and
subscribe
to
our
RSS
feed
of
the
website.
Contact
us
if
you
have
any
comments
or
suggestions!
http://www.deeplearning.net/tutorial/
http://deeplearning.net
BigDataWeek
Videos
Big
Data
Week
is
one
of
the
most
unique
global
platforms
of
interconnected
community
events
focusing
on
the
social,
political,
technological
and
commercial
impacts
of
Big
Data.
It
brings
together
a
global
community
of
data
scientists,
data
technologies,
data
visualisers
and
data
businesses
spanning
six
major
commercial,
financial,
social
and
technological
sectors.
http://www.youtube.com/user/BigDataWeek/videos
Neural
Information
Processing
Systems
Foundation
(NIPS)
Video
resources
The
Foundation:
The
Neural
Information
Processing
Systems
(NIPS)
Foundation
is
a
non-profit
corporation
whose
purpose
is
to
foster
the
exchange
of
research
on
neural
information
processing
systems
in
their
biological,
technological,
mathematical,
and
theoretical
aspects.
Neural
information
processing
is
a
field
which
benefits
from
a
combined
view
of
biological,
physical,
mathematical,
and
computational
sciences.
The
primary
focus
of
the
NIPS
Foundation
is
the
presentation
of
a
continuing
series
of
professional
meetings
known
as
the
Neural
Information
Processing
Systems
Conference,
held
over
the
years
at
various
locations
in
the
United
States,
Canada
and
Spain.
http://www.youtube.com/user/NeuralInformationPro/feed
Machine
Learning
Conference
Videos
Events
matching
your
search:
ICML 2011
Scale
Learning)
Big
Data
Meets
Computer
Vision:
First
International
Workshop
on
Large
Scale
Visual
Recognition
and
Retrieval
2nd
Workshop
on
Semantic
Perception,
Mapping
and
Exploration
(SPME)
Object,
functional
and
structured
data:
towards
next
generation
kernel-based
methods
-
ICML
2012
Workshop
Tutorial
on
Statistical
Learning
Theory
in
Reinforcement
Learning
and
Approximate
Dynamic
Programming
beyond
Performance
Evaluation
for
Learning
Algorithms:
Techniques,
Application
and
Issues
2013
The
4th
International
Workshop
on
Music
and
Machine
Learning:
Learning
from
Musical
Structure
Inferning
2012:
ICML
Workshop
on
interaction
between
Inference
and
Learning
PAC-Bayesian
Analysis
in
Supervised,
Unsupervised,
and
Reinforcement
Learning
Sixteenth
International
Conference
on
Artificial
Intelligence
and
Statistics
(AISTATS)
2013
NYU
Course
on
Deep
Learning
(Spring
2014)
NYU
Course
on
Machine
Learning
and
Computational
Statistics
2014
http://techtalks.tv/search/results/?q=machine+learning
Internet
Archive
Hello
Patron,
Every
day
3
million
people
use
our
collections.
We
have
archived
over
ten
petabytes
(that's
10,000,000,000,000,000
bytes!)
of
information,
including
everything
ever
written
in
Balinese.
This
year
we
also
launched
our
groundbreaking
TV
News
Search
and
Borrow
service,
which
former
FCC
Chairman
Newton
Minow
said
"offers
citizens
exceptional
opportunities"
to
easily
do
their
own
fact
checking
and
"to
hold
powerful
public
institutions
accountable."
Your
support
helps
us
build
amazing
services
and
keep
them
free
for
people
around
the
globe.
https://archive.org/search.php?query=machine%20learning
University of California, Berkeley
http://www.youtube.com/user/UCBerkeley/search?query=machine+learning
AMP
Camps,
Big
Data
Bootcamp,
UC
Berkeley
AMP
Camps
are
Big
Data
training
events
organized
by
the
UC
Berkeley
AMPLab
about
big
data
analytics,
machine
learning,
and
popular
open-source
software
projects
produced
by
the
AMPLab.
All
AMP
Camp
curriculum,
and
whenever
possible
videos
of
instructional
talks
presented
at
AMP
Camps,
are
published
here
and
accessible
for
free.
http://ampcamp.berkeley.edu
AMP
Camp
5
was
held
at
UC
Berkeley
and
live-streamed
online
on
November
20
and
21,
2014.
Videos
and
exercises
from
the
event
are
available
on
the
AMPCamp
5
page.
http://ampcamp.berkeley.edu/5/
Resources
and
Tools
of
Noah's
ARK
Research
Group
The
following
were
developed
by
ARK
researchers
(*developed
in
whole
or
in
part
before
joining
ARK):
NLP
tools:
universal
part-of-speech
tagset,
set
of
twelve
coarse
POS
tags
that
generalizes
across
several
languages
Semantics:
SEMAFOR,
an
open-source
statistical
frame-semantic
parser;
AMALGr,
an
open-source
statistical
analyzer
for
multiword
expressions
in
context
The
Society's
fundamental
purpose,
reflected
in
its
founding
Charters
of
the
1660s,
is
to
recognise,
promote,
and
support
excellence
in
science
and
to
encourage
the
development
and
use
of
science
for
the
benefit
of
humanity.
The
Society
has
played
a
part
in
some
of
the
most
fundamental,
significant,
and
life-changing
discoveries
in
scientific
history
and
Royal
Society
scientists
continue
to
make
outstanding
contributions
to
science
in
many
research
areas.
The
Royal
Society
is
the
national
Academy
of
science
in
the
UK,
and
its
core
is
its
Fellowship
and
Foreign
Membership,
supported
by
a
dedicated
staff
in
London
and
elsewhere.
The
Fellowship
comprises
the
most
eminent
scientists
of
the
UK,
Ireland
and
the
Commonwealth.
A
major
activity
of
the
Society
is
identifying
and
supporting
the
work
of
outstanding
scientists.
The
Society
supports
researchers
through
its
early
and
senior
career
schemes,
innovation
and
industry
schemes,
and
other
schemes.
The
Society
facilitates
interaction
and
communication
among
scientists
via
its
discussion
meetings,
and
disseminates
scientific
advances
through
its
journals.
The
Society
also
engages
beyond
the
research
community,
through
independent
policy
work,
the
promotion
of
high
quality
science
education,
and
communication
with
the
public.
https://www.youtube.com/user/RoyalSociety/videos?spfreload=10
Statistical
and
causal
approaches
to
machine
learning
by
Professor
Bernhard
Schölkopf
https://www.youtube.com/watch?v=ek9jwRA2Jio&spfreload=10
Deep
Learning
Deep
Learning
RNNaissance
with
Dr.
Juergen
Schmidhuber
A
great
session
of
NYC-ML
Meetup
Hosted
by
ShutterStock
in
the
glorious
Empire
State
building.
Details:
Deep
Learning
RNNaissance
Machine
learning
and
pattern
recognition
are
currently
being
revolutionised
by
"Deep
Learning"
(DL)
https://www.youtube.com/watch?v=6bOMf9zr7N8&spfreload=10
Introduction
to
Deep
Learning
with
Python
by
Alec
Radford
Alec
Radford,
Head
of
Research
at
indico
Data
Solutions,
speaking
on
deep
learning
with
Python
and
the
Theano
library.
The
emphasis
of
the
talk
is
on
high
performance
computing,
natural
language
processing
using
recurrent
neural
nets,
and
large
scale
learning
with
GPUs.
https://www.youtube.com/watch?v=S75EdAcXHKk
SlideShare
presentation
is
available
here:
http://slidesha.re/1zs9M11
Miscellaneous
Introduction
To
Modern
Brain-Computer
Interface
Design
by
Swartz
Center
for
Computational
Neuroscience
This
is
an
online
course
on
Brain-Computer
Interface
(BCI)
design
with
a
focus
on
modern
methods.
The
lectures
were
first
given
by
Christian
Kothe
(SCCN/UCSD)
in
2012
at
University
of
Osnabrueck
within
the
Cognitive
Science
curriculum
and
have
now
been
recorded
in
the
form
of
an
open
online
course.
The
course
includes
basics
of
EEG,
BCI,
signal
processing,
machine
learning,
and
also
contains
tutorials
on
using
BCILAB
and
the
lab
streaming
layer
software.
http://sccn.ucsd.edu/wiki/Introduction_To_Modern_Brain-Computer_Interface_Design
Distributed
Computing
Courses
(lectures,
exercises
with
solutions)
by
ETH
Zurich,
Group
of
Prof.
Roger
Wattenhofer
Mission
We
are
interested
in
both
theory
and
practice
of
computer
science
and
information
technology.
In
our
group
we
cultivate
a
large
breadth
of
areas,
reflecting
our
different
backgrounds
in
computer
science,
mathematics,
and
electrical
engineering.
This
gives
us
a
unique
blend
of
basic
and
applied
research,
proving
mathematical
theorems
on
the
one
hand,
and
building
practical
systems
on
the
other.
We
currently
study
the
following
topics:
Distributed
computing
(computability,
locality,
complexity),
distributed
systems
(Bitcoin),
wireline
networks
(software
defined
networks),
wireless
networks
(media
access
theory
and
practice),
social
networks
(influence),
algorithms
(online
algorithms,
game
theory),
learning
theory
(recommendation
theory
and
practice).
We
regularly
publish
in
different
communities:
distributed
computing
(e.g.
PODC,
SPAA,
DISC),
networking
(e.g.
SIGCOMM,
MobiCom,
SenSys),
theory
(e.g.
STOC,
FOCS,
SODA,
ICALP),
and
from
time
to
time
at
random
in
areas
such
as
machine
learning
or
human
computer
interaction.
Members
of
our
group
have
won
several
best
paper
awards
at
top
conferences
such
as
PODC,
SPAA,
DISC,
MobiCom,
or
P2P.
Roger
Wattenhofer
has
won
the
Prize
for
Innovations
in
Distributed
Computing
in
2012,
for
extensive
contributions
to
the
study
of
distributed
approximation.
Some
projects
turned
into
startup
companies,
e.g.
Wuala,
StreamForge,
BitSplitters.
Several
projects
have
been
covered
by
popular
media
and
blogs,
e.g.
Gizmodo,
Lifehacker,
New
York
Times,
NZZ,
PC
World
Magazine,
Red
Herring,
or
Technology
Review.
Some
of
the
software
developed
by
our
students
is
very
popular:
The
music
application
Jukefox
and
the
peer-to-peer
client
BitThief
have
together
more
than
1
million
downloads.
A
branch
of
the
United
States
FBI
has
requested
to
use
a
version
of
BitThief
as
a
tool
to
uncover
illegal
activities.
About
half
of
the
former
PhD
students
are
in
academic
positions,
some
others
founded
startup
companies.
http://dcg.ethz.ch/courses.html
The
wonderful
and
terrifying
implications
of
computers
that
can
learn
|
Jeremy
Howard
|
TEDxBrussels
Published
on
6
Dec
2014
This
talk
was
given
at
a
local
TEDx
event,
produced
independently
of
the
TED
Conferences.
The
extraordinary,
wonderful,
and
terrifying
implications
of
computers
that
can
learn
https://www.youtube.com/watch?v=xx310zM3tLs&spfreload=10
fractional, of vertices or of edges, Kneser graphs), graph traversal problems (Eulerian trails, Hamiltonian cycles, De Bruijn graphs, etc.) and the notion of a random walk on a graph (Markov chains, existence of the limiting distribution, mixing time, etc.). Several graph problems have elegant solutions; others, of course, are NP-complete, so part of this course will cover complexity theory (NP and NP-complete problems, Cook's theorem, reduction algorithms).
https://cours.ift.ulaval.ca/2012a/ift7012_89927/
Hugo Larochelle, Apprentissage automatique (Machine Learning), French Canadian
I am interested in machine learning algorithms, that is, algorithms capable of extracting concepts or patterns from data. My work focuses on developing connectionist and probabilistic approaches to various artificial intelligence problems, such as computer vision and natural language processing. The research themes I am interested in include:
Problems: supervised, semi-supervised and unsupervised learning, structured output prediction, ranking, density estimation;
Models: deep neural networks (deep learning), autoencoders, Boltzmann machines, Markov random fields;
Applications: object recognition and tracking, document classification and ranking;
https://www.youtube.com/channel/UCiDouKcxRmAdc5OeZdiRwAg
http://www.dmi.usherb.ca/~larocheh/index_fr.html
Francis Bach, Ecole Normale Superieure - Courses and Exercises with solutions (English-French)
Spring 2014: Statistical machine learning - Master M2 "Probabilites et Statistiques" - Universite Paris-Sud (Orsay)
Fall 2013: An introduction to graphical models - Master M2 "Mathematiques, Vision, Apprentissage" - Ecole Normale Superieure de Cachan
Spring 2013: Statistical machine learning - Master M2 "Probabilites et Statistiques" - Universite Paris-Sud (Orsay)
Spring 2013: Statistical machine learning - Filiere Math/Info - L3 - Ecole Normale Superieure (Paris)
Fall 2012: An introduction to graphical models - Master M2 "Mathematiques, Vision, Apprentissage" - Ecole Normale Superieure de Cachan
Spring 2012: Statistical machine learning - Filiere Math/Info - L3 - Ecole Normale Superieure (Paris)
Spring 2012: Statistical machine learning - Master M2 "Probabilites et Statistiques" - Universite Paris-Sud (Orsay)
Fall 2011: An introduction to graphical models - Master M2 "Mathematiques, Vision, Apprentissage" - Ecole Normale Superieure de Cachan
Spring 2011: Statistical machine learning - Master M2 "Probabilites et Statistiques" - Universite Paris-Sud (Orsay)
more
to
come
Classification
Pattern recognition
Regression analysis
Prediction
Applied Statistics
Signal Processing
All Destinations
http://www.machinelearning.ru/wiki/index.php?title=_
Yandex
School
The
Yandex
School
of
Data
Analysis
The
School
of
Data
Analysis
is
a
free
Master's-level
program
in
Computer
Science
and
Data
Analysis,
which
has been offered
by
Yandex
since
2007
to
graduates
in
engineering,
mathematics,
computer
science
or
related
fields.
The
aim
of
the
School
is
to
train
specialists
in
data
analysis
and
information
retrieval
for
further
employment
at
Yandex
or
any
other
IT
company.
The
School's
courses
are
taught
by
Russian
and
international
experts
at
Yandex's
Moscow
office
in
the
evenings,
several
times
a
week.
The
average
study
load
is
15-20
hours
per
week,
including
9-12
hours
of
lectures
and
seminars.
The
School
also
runs
distance-learning
courses
and
provides
lectures
over
the
internet.
All
courses
at
the
Yandex
School
of
Data
Analysis
are
currently
taught
only
in
Russian.
http://shad.yandex.ru/lectures/
Alexander
Dyakonov
Resources
http://alexanderdyakonov.narod.ru/index.htm
Unknown
in
Data
Mining
and
Machine
Learning
(2013)
http://alexanderdyakonov.narod.ru/lpot4emu.pdf
Introduction
to
Data
Mining
(2012)
http://alexanderdyakonov.narod.ru/intro2datamining.pdf
Tricks
in
Data
Mining
(2011)
http://alexanderdyakonov.narod.ru/lpotdyakonov.pdf
Manual
"Logic
Games,
Data
Mining,
Weka,
RapidMiner,
MATLAB"
(2010)
http://www.machinelearning.ru/wiki/images/7/7e/Dj2010up.pdf
Artificial
Intelligence
http://mooc.guokr.com/search/?wd=%E4%BA%BA%E5%B7%A5%E6%99%BA%E8%83%BD
More
coming
soon
http://www.youtube.com/user/openofek/search?query=machine+learning
More
coming
soon
Applications
MIT
Media
Lab
The
real-time
city
is
now
real!
The
increasing
deployment
of
sensors
and
hand-held
electronics
in
recent
years
is
allowing
a
new
approach
to
the
study
of
the
built
environment.
The
way
we
describe
and
understand
cities
is
being
radically
transformed
-
alongside
the
tools
we
use
to
design
them
and
impact
on
their
physical
structure.
Studying
these
changes
from
a
critical
point
of
view
and
anticipating
them
is
the
goal
of
the
SENSEable
City
Laboratory,
a
new
research
initiative
at
the
Massachusetts
Institute
of
Technology.
http://senseable.mit.edu
TEDx
San
Francisco,
Connected
Reality
Connected
Reality
is
an
evening
that
explored
how
the
exponential
technologies
of
the
Internet
of
Things
will
give
us
deep
insights
that
augment
our
understanding
of
the
world
and
each
other
and
will
propel
our
ability
to
build
intelligent
tools
that
augment
our
lives.
We'll
briefly
see
the
future
through
the
eyes
of
presenters
from
varied
industries
of
medicine
to
manufacturing
who
will
illustrate
how
they
use
sensor
data
to
perceive
and
understand
the
world
differently
and
adjust
their
realities
based
on
their
new
connectivity
to
their
environment.
http://tedxsf.org/videos/#tedxsf-connected-reality
Emotion&Pain
Project
One
of
the
main
challenges
facing
healthcare
providers
in
the
UK
today
(and
in
Europe)
is
the
rising
number
of
people
with
chronic
health
problems.
Almost
1
in
7
UK
citizens
experiences
chronic
pain,
some
due
to
chronic
diseases
such
as
osteoarthritis,
but
much
of
it
mechanical
low
back
pain
(LBP)
with
no
treatable
pathology.
40%
of
these
people
experience
severe
pain
and
are
very
restricted
by
it.
The
capacity
of
our
current
health
care
system
is
insufficient
to
treat
all
these
patients
face-to-face.
Pain
experience
is
affected
by
physical,
psychological,
and
social
factors
and
hence
it
poses
a
problem
to
the
medical
profession.
This
has
prompted
the
development
of
a
multidisciplinary
approach
to
the
treatment
of
chronic
LBP,
primarily
involving
psychology
and
physiotherapy
alongside
specialist
clinicians
(see
British
Pain
Society
guidelines).
These
programmes
enable
patients
to
become
more
self-managing
through
improving
their
physical
and
psychological
functioning.
While
short
term
results
are
good,
maintenance
of
these
gains,
and
building
on
them,
remains
a
problem,
with
psychological
factors
being
one
of
the
primary
limiting
causes.
Rehabilitation-assistive
technologies
have
shown
some
success
in
helping
recovery
in
a
number
of
conditions
but
have
yet
to
have
an
impact
in
pain
management,
Google
Research
Google
publishes
hundreds
of
research
papers
each
year.
Publishing
is
important
to
us;
it
enables
us
to
collaborate
and
share
ideas
with,
as
well
as
learn
from,
the
broader
scientific
community.
Submissions
are
often
made
stronger
by
the
fact
that
ideas
have
been
tested
through
real
product
implementation
by
the
time
of
publication.
http://research.google.com/pubs/papers.html
Yahoo
Research
The
machine
learning
group
is
a
team
of
experts
in
computer
science,
statistics,
mathematical
optimization,
and
automatic
control.
They
focus
on
making
computers
learn
abstractions,
patterns,
conditional
probability
distributions,
and
policies
from
web
scale
data
with
the
goal
to
improve
the
online
experience
for
Yahoo!
users,
partner
publishers,
and
advertisers.
Machine
learning
has
such
a
broad
influence
on
the
internet,
it
can
be
quite
difficult
to
recognize.
Machine learning's benefits are often hidden: they are the spam emails you don't see, the uninteresting news articles you don't see, and the
irrelevant search results you don't see, just to name a few.
Machine
learning
is
one
of
the
best
technologies
we
have
for
solving
some
of
the
biggest
problems
on
the
Web.
http://labs.yahoo.com/areas/?areas=machine-learning
Microsoft
Research
The
Machine
Learning
Groups
of
Microsoft
Research
include
a
set
of
researchers
and
developers
who
push
the
state
of
the
art
in
machine
learning.
We
span
the
space
from
proving
theorems
about
the
math
underlying
ML,
to
creating
new
ML
systems
and
algorithms,
to
helping
our
partner
product
groups
apply
ML
to
large
and
complex
data
sets.
http://research.microsoft.com/en-us/groups/mldept/
Journal
from
MIT
Press
The
Journal
of
Machine
Learning
Research
(JMLR)
provides
an
international
forum
for
the
electronic
and
paper
publication
of
high-quality
scholarly
articles
in
all
areas
of
machine
learning.
All
published
papers
are
freely
available
online.
http://jmlr.org
INRIA
Access
to
Research
Papers
http://haltools.inrialpes.fr/Public/afficheRequetePubli.php?labos_exp=sierra&CB_auteur=oui&CB_titre=oui&CB_article=oui&langue=Anglais&tri_exp=annee_publi&tri_exp3=date_publi&ordre_aff=TA&Fen=Aff&css=../css/VisuCondense.css
JAVA
Weka
3:
Data
Mining
Software
in
Java
Weka
is
a
collection
of
machine
learning
algorithms
for
data
mining
tasks.
The
algorithms
can
either
be
applied
directly
to
a
dataset
or
called
from
your
own
Java
code.
Weka
contains
tools
for
data
pre-processing,
classification,
regression,
clustering,
association
rules,
and
visualization.
It
is
also
well-suited
for
developing
new
machine
learning
schemes.
http://www.cs.waikato.ac.nz/~ml/weka/index.html
A
deep-learning
library
for
Java
Distributed
Deep
Learning
Platform
for
Java
https://github.com/agibsonccc/java-deeplearning
List
of
Java
ML
Software
by
Machine
Learning
Mastery
http://machinelearningmastery.com/java-machine-learning/
List
of
Java
ML
Software
by
MLOSS
http://mloss.org/software/language/java/
PYTHON
Theano
Library
for
Deep
Learning
Theano
is
a
Python
library
that
allows
you
to
define,
optimize,
and
evaluate
mathematical
expressions
involving
multi-dimensional
arrays
efficiently.
Theano features:
tight integration with NumPy: use numpy.ndarray in Theano-compiled functions.
transparent use of a GPU: perform data-intensive calculations up to 140x faster than with a CPU (float32 only).
efficient symbolic differentiation: Theano does your derivatives for functions with one or many inputs.
speed and stability optimizations: get the right answer for log(1+x) even when x is really tiny.
dynamic C code generation: evaluate expressions faster.
extensive unit-testing and self-verification: detect and diagnose many types of mistake.
Theano
has
been
powering
large-scale
computationally
intensive
scientific
investigations
since
2007.
But
it
is
also
approachable
enough
to
be
used
in
the
classroom
(IFT6266
at
the
University
of
Montreal).
http://deeplearning.net/software/theano/
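Two of the ideas in the Theano description, symbolic differentiation and the numerically stable evaluation of log(1+x), can be illustrated without Theano. The sketch below is plain Python and is not Theano's API: the tuple-based expression format and the `diff` and `evaluate` helpers are hypothetical, and `math.log1p` from the standard library stands in for the kind of stability rewrite the description mentions.

```python
import math

# Expressions are nested tuples (a made-up format for this sketch):
#   ("const", c), ("var", name), ("add", a, b), ("mul", a, b)

def diff(expr, wrt):
    """Return the symbolic derivative of `expr` with respect to variable `wrt`."""
    kind = expr[0]
    if kind == "const":
        return ("const", 0.0)
    if kind == "var":
        return ("const", 1.0 if expr[1] == wrt else 0.0)
    if kind == "add":  # sum rule
        return ("add", diff(expr[1], wrt), diff(expr[2], wrt))
    if kind == "mul":  # product rule
        a, b = expr[1], expr[2]
        return ("add", ("mul", diff(a, wrt), b), ("mul", a, diff(b, wrt)))
    raise ValueError(f"unknown node type: {kind}")

def evaluate(expr, env):
    """Numerically evaluate an expression given variable values in `env`."""
    kind = expr[0]
    if kind == "const":
        return expr[1]
    if kind == "var":
        return env[expr[1]]
    left, right = evaluate(expr[1], env), evaluate(expr[2], env)
    if kind == "add":
        return left + right
    if kind == "mul":
        return left * right
    raise ValueError(f"unknown node type: {kind}")

x, y = ("var", "x"), ("var", "y")
# f(x, y) = x*x*y + 3*x
f = ("add", ("mul", ("mul", x, x), y), ("mul", ("const", 3.0), x))
point = {"x": 2.0, "y": 5.0}
df_dx = evaluate(diff(f, "x"), point)  # analytically 2*x*y + 3
df_dy = evaluate(diff(f, "y"), point)  # analytically x*x

# Stability: 1.0 + 1e-18 rounds to exactly 1.0 in double precision, so the
# naive form loses the answer entirely while log1p keeps it.
tiny = 1e-18
naive = math.log(1.0 + tiny)
stable = math.log1p(tiny)
```

The derivative expressions produced this way are unsimplified (they contain many multiplications by 0 and 1); graph simplification of that kind is part of what a real symbolic library adds.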
http://nbviewer.ipython.org/github/craffel/theano-tutorial/blob/master/Theano%20Tutorial.ipynb
Introduction
to
Deep
Learning
with
Python
Alec
Radford,
Head
of
Research
at
indico
Data
Solutions,
speaking
on
deep
learning
with
Python
and
the
Theano
library.
The
emphasis
of
the
talk
is
on
high
performance
computing,
natural
language
processing
using
recurrent
neural
nets,
and
large
scale
learning
with
GPUs.
https://www.youtube.com/watch?v=S75EdAcXHKk
Udacity
-
Programming
foundations
with
Python
You'll
pick
up
some
great
tools
for
your
programming
toolkit
in
this
course!
You
will:
Start
coding
in
the
programming
language
Python;
Reuse
and
share
code
with
Object
Oriented
Programming;
Create
and
share
amazing,
life-hacking
projects!
https://www.udacity.com/course/ud036
Scikit-learn,
Machine
Learning
in
Python
Simple
and
efficient
tools
for
data
mining
and
data
analysis
Accessible
to
everybody,
and
reusable
in
various
contexts
Built
on
NumPy,
SciPy,
and
matplotlib
Open
source,
commercially
usable
-
BSD
license
http://scikit-learn.org/stable/index.html
PyData
PyData
is
a
gathering
of
users
and
developers
of
data
analysis
tools
in
Python.
The
goals
are
to
provide
Python
enthusiasts
a
place
to
share
ideas
and
learn
from
each
other
about
how
best
to
apply
our
language
and
tools
to
ever-evolving
challenges
in
the
vast
realm
of
data
management,
processing,
analytics,
and
visualization.
https://www.youtube.com/user/PyDataTV/videos
PyData
NYC
2014
Videos
https://www.youtube.com/user/PyDataTV/videos?spfreload=10
PyData
is
a
gathering
of
users
and
developers
of
data
analysis
tools
in
Python.
The
goals
are
to
provide
Python
enthusiasts
a
place
to
share
ideas
and
learn
from
each
other
about
how
best
to
apply
our
language
and
tools
to
ever-evolving
challenges
in
the
vast
realm
of
data
management,
processing,
analytics,
and
visualization.
We
aim
to
be
an
accessible,
community-driven
conference,
with
tutorials
for
novices,
advanced
topical
workshops
for
practitioners,
and
opportunities
for
package
developers
and
users
to
meet
in
person.
A
major
goal
of
the
conference
is
to
provide
a
venue
for
users
across
all
the
various
domains
of
data
analysis
to
share
their
experiences
and
their
techniques,
as
well
as
highlight
the
triumphs
and
potential
pitfalls
of
using
Python
for
certain
kinds
of
problems.
http://pydata.org/nyc2014/about/about/
PyData,
The
Complete
Works
by
Rohit
Sivaprasad
Added
in
the
kit
11-Nov-2014
The
unofficial
index
of
all
PyData
talks.
This
was
initially
going
to
be
a
pickled
pandas
DataFrame
object,
but
then
I
decided
against
it.
So
here
it
is
-
in
beautiful
GitHub
flavored
markdown.
There
are
placeholders
for
links
to
the
video.
Currently,
the
hyperlinks
point
to
the
pydata.org
talk
pages.
Please
do
feel
free
to
make
it
better
by
contributing
to
the
repo.
https://github.com/DataTau/datascience-anthology-pydata
Anaconda
Completely
free
enterprise-ready
Python
distribution
for
large-scale
data
processing,
predictive
analytics,
and
scientific
computing
We
want
to
ensure
that
Python,
NumPy,
SciPy,
Pandas,
IPython,
Matplotlib,
Numba,
Blaze,
Bokeh,
and
other
great
Python
data
analysis
tools
can
be
used
everywhere.
We
want
to
make
it
easier
for
Python
evangelists
and
teachers
to
promote
the
use
of
Python.
We
want
to
give
back
to
the
Python
community
that
we
love
being
a
part
of.
https://store.continuum.io/cshop/anaconda/
IPython
Interactive
Computing
IPython
provides
a
rich
architecture
for
interactive
computing
with:
Powerful
interactive
shells
(terminal
and
Qt-based).
A
browser-based
notebook
with
support
for
code,
text,
mathematical
expressions,
inline
plots
and
other
rich
media.
Support
for
interactive
data
visualization
and
use
of
GUI
toolkits.
Flexible,
embeddable
interpreters
to
load
into
your
own
projects.
Easy
to
use,
high
performance
tools
for
parallel
computing.
http://ipython.org
SciPy
SciPy
refers
to
several
related
but
distinct
entities:
The
SciPy
Stack,
a
collection
of
open
source
software
for
scientific
computing
in
Python,
and
particularly
a
specified
set
of
core
packages.
The
community
of
people
who
use
and
develop
this
stack.
Several
conferences
dedicated
to
scientific
computing
in
Python
-
SciPy,
EuroSciPy
and
SciPy.in.
The
SciPy
library,
one
component
of
the
SciPy
stack,
providing
many
numerical
routines.
http://www.scipy.org
NumPy
NumPy
is
the
fundamental
package
for
scientific
computing
with
Python.
It
contains
among
other
things:
a
powerful
N-dimensional
array
object
emcee
emcee
is
an
extensible,
pure-Python
implementation
of
Goodman
&
Weare's
Affine
Invariant
Markov
chain
Monte
Carlo
(MCMC)
Ensemble
sampler.
It's
designed
for
Bayesian
parameter
estimation
and
it's
really
sweet!
http://dan.iel.fm/emcee/current/
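The idea behind an affine-invariant ensemble sampler can be sketched in plain Python. The code below is not emcee's code or API: it is a minimal one-dimensional version of the Goodman & Weare "stretch move", and the function names, the Gaussian target, the starting positions and the scale parameter `a=2.0` are assumptions chosen for illustration.

```python
import math
import random

def log_prob(x):
    """Log-density (up to a constant) of a 1-D Gaussian with mean 3 and std 1."""
    return -0.5 * (x - 3.0) ** 2

def stretch_move_sampler(log_prob, walkers, n_steps, a=2.0, seed=0):
    """Update each walker in turn by a stretch move along a line through
    another, randomly chosen walker."""
    rng = random.Random(seed)
    walkers = list(walkers)
    logp = [log_prob(w) for w in walkers]
    chain, accepted = [], 0
    for _ in range(n_steps):
        for k in range(len(walkers)):
            # Pick a different walker j to define the stretch direction.
            j = rng.randrange(len(walkers) - 1)
            if j >= k:
                j += 1
            # Draw z with density proportional to 1/sqrt(z) on [1/a, a]
            # using inverse-CDF sampling.
            z = ((a - 1.0) * rng.random() + 1.0) ** 2 / a
            proposal = walkers[j] + z * (walkers[k] - walkers[j])
            new_logp = log_prob(proposal)
            # In d dimensions the log acceptance ratio also carries a
            # (d - 1) * log(z) term; it vanishes here because d = 1.
            log_ratio = new_logp - logp[k]
            if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
                walkers[k], logp[k] = proposal, new_logp
                accepted += 1
        chain.extend(walkers)
    return chain, accepted / (n_steps * len(walkers))

start = [0.1 * i for i in range(10)]   # ten distinct starting positions
chain, acceptance = stretch_move_sampler(log_prob, start, n_steps=2000)
kept = chain[500 * len(start):]        # discard the first 500 steps as burn-in
mean = sum(kept) / len(kept)
variance = sum((x - mean) ** 2 for x in kept) / len(kept)
```

The walkers must start at distinct positions: a stretch move only interpolates or extrapolates between two walkers, so an ensemble that starts at a single point never moves.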
PyMC
PyMC
is
a
python
module
that
implements
Bayesian
statistical
models
and
fitting
algorithms,
including
Markov
chain
Monte
Carlo.
Its
flexibility
and
extensibility
make
it
applicable
to
a
large
suite
of
problems.
Along
with
core
sampling
functionality,
PyMC
includes
methods
for
summarizing
output,
plotting,
goodness-of-fit
and
convergence
diagnostics.
http://pymc-devs.github.io/pymc/
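To make "Bayesian statistical models and fitting algorithms, including Markov chain Monte Carlo" concrete, here is a small random-walk Metropolis sketch in plain Python. It does not use PyMC or imitate its API; the coin-flip model (7 heads in 10 flips with a uniform prior), the function names and the step size are assumptions for illustration.

```python
import math
import random

def log_posterior(theta, heads, flips):
    """Unnormalised log-posterior of a coin's bias under a uniform prior."""
    if not 0.0 < theta < 1.0:
        return float("-inf")  # zero prior mass outside (0, 1)
    return heads * math.log(theta) + (flips - heads) * math.log(1.0 - theta)

def metropolis(log_post, start, n_samples, step=0.2, seed=0):
    """Random-walk Metropolis with a symmetric Gaussian proposal."""
    rng = random.Random(seed)
    current, current_lp = start, log_post(start)
    samples = []
    for _ in range(n_samples):
        proposal = current + rng.gauss(0.0, step)
        proposal_lp = log_post(proposal)
        log_ratio = proposal_lp - current_lp
        # Accept uphill moves always, downhill moves with probability exp(log_ratio).
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            current, current_lp = proposal, proposal_lp
        samples.append(current)
    return samples

samples = metropolis(lambda t: log_posterior(t, heads=7, flips=10),
                     start=0.5, n_samples=20000)
kept = sorted(samples[2000:])          # drop burn-in, sort for quantiles
posterior_mean = sum(kept) / len(kept)
lower = kept[int(0.025 * len(kept))]   # approximate 95% credible interval
upper = kept[int(0.975 * len(kept))]
```

For this model the exact posterior is Beta(8, 4) with mean 8/12, which gives a simple check that the sampler's summary is reasonable.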
Pylearn2
Ian
J.
Goodfellow,
David
Warde-Farley,
Pascal
Lamblin,
Vincent
Dumoulin,
Mehdi
Mirza,
Razvan
Pascanu,
James
Bergstra,
Frédéric
Bastien,
and
Yoshua
Bengio.
"Pylearn2:
a
machine
learning
research
library".
arXiv
preprint
arXiv:1308.4214
(BibTeX)
https://github.com/lisa-lab/pylearn2
Giant
list
of
python
learning
resources
Keep
following
this
post,
we'll
keep
updating
this
huge
list
&
collection.
http://python2web.com/giant-list-of-python-learning-resources/
PyCon
US
2014
PyCon
is
the
largest
annual
gathering
for
the
community
using
and
developing
the
open-source
Python
programming
language.
It
is
produced
and
underwritten
by
the
Python
Software
Foundation,
the
501(c)(3)
nonprofit
organization
dedicated
to
advancing
and
promoting
Python.
Through
PyCon,
the
PSF
advances
its
mission
of
growing
the
international
community
of
Python
programmers.
Because
PyCon
is
backed
by
the
non-profit
PSF,
we
keep
registration
costs
much
lower
than
comparable
technology
conferences
so
that
PyCon
remains
accessible
to
the
widest
group
possible.
The
PSF
also
pays
for
the
ongoing
development
of
the
software
that
runs
PyCon
and
makes
it
available
under
a
liberal
open
source
license.
140
videos
http://pyvideo.org/category/50/pycon-us-2014
PyCon
India
2012
https://www.youtube.com/playlist?list=PL6GW05BfqWIdWaV_aP6kHJKFY0ybOOfoA
Montreal
Python
Montréal-Python's
mission
is
to
promote
the
growth
of
a
lively
and
dynamic
community
of
users
of
the
Python
programming
language
and
to
promote
the
use
of
the
latter.
Montréal-Python
also
aims
to
disseminate
the
local
Python
knowledge
to
build
a
stronger
developer
community.
Montréal-Python
promotes
Free
and
Open
Source
Software,
favors
its
adoption
within
the
community,
and
collaborates
with
community
players
to
achieve
this
goal.
http://www.youtube.com/user/MontrealPython/videos
http://montrealpython.org/en/
SciPy
2014
SciPy
is
a
community
dedicated
to
the
advancement
of
scientific
computing
through
open
source
Python
software
for
mathematics,
science,
and
engineering.
The
annual
SciPy
Conference
allows
participants
from
all
types
of
organizations
to
showcase
their
latest
projects,
learn
from
skilled
users
and
developers,
and
collaborate
on
code
development.
http://pyvideo.org/category/51/scipy-2014
PyLadies
London
Meetup
resources
PyLadies
is
an
international
mentorship
group
with
a
focus
on
helping
more
women
and
genderqueers
become
active
participants
and
leaders
in
the
Python
open-source
community.
Our
mission
is
to
promote,
educate
and
advance
a
diverse
Python
community
through
outreach,
education,
conferences,
events,
and
social
gatherings.
PyLadies
also
aims
to
provide
a
friendly
support
network
for
women
and
genderqueers,
and
a
bridge
to
the
larger
Python
world.
https://github.com/pyladieslondon/resources
Python
Tools
for
Machine
Learning
by
CB
Insights
http://www.cbinsights.com/blog/python-tools-machine-learning
Python
Tutorials
by
Jessica
McKellar
I
am
a
startup
founder,
software
engineer,
and
open
source
developer
living
in
San
Francisco,
California.
I
enjoy
the
Internet,
networking,
low-level
systems
engineering,
relational
databases,
tinkering
on
electronics
projects,
and
contributing
to
and
helping
other
people
contribute
to
open
source
software.
"Be
the
change
you
wish
to
see
in
the
world"
may
be
clichéd,
but
what
can
I
say,
I
believe
in
it.
I
am
committed
to
applying
my
skills,
in
individual
and
collective
efforts,
to
improve
the
world.
Right
now,
this
means
I
spend
a
lot
of
time
volunteering,
engaging
technologists
about
education,
and
empowering
effective
people
and
initiatives
in
my
capacity
as
a
Director
for
the
Python
Software
Foundation.
http://web.mit.edu/jesstess/
OCTAVE
GNU
Octave
is
a
high-level
interpreted
language,
primarily
intended
for
numerical
computations.
It
provides
capabilities
for
the
numerical
solution
of
linear
and
nonlinear
problems,
and
for
performing
other
numerical
experiments.
It
also
provides
extensive
graphics
capabilities
for
data
visualization
and
manipulation.
Octave
is
normally
used
through
its
interactive
command
line
interface,
but
it
can
also
be
used
to
write
non-interactive
programs.
The
Octave
language
is
quite
similar
to
Matlab
so
that
most
programs
are
easily
portable.
http://www.gnu.org/software/octave/
JULIA
Julia
is
a
high-level,
high-performance
dynamic
programming
language
for
technical
computing,
with
syntax
that
is
familiar
to
users
of
other
technical
computing
environments.
It
provides
a
sophisticated
compiler,
distributed
parallel
execution,
numerical
accuracy,
and
an
extensive
mathematical
function
library.
The
library,
largely
written
in
Julia
itself,
also
integrates
mature,
best-of-breed
C
and
Fortran
libraries
for
linear
algebra,
random
number
generation,
signal
processing,
and
string
processing.
In
addition,
the
Julia
developer
community
is
contributing
a
number
of
external
packages
through
Julia's
built-in
package
manager
at
a
rapid
pace.
IJulia,
a
collaboration
between
the
IPython
and
Julia
communities,
provides
a
powerful
browser-based
graphical
notebook
interface
to
Julia.
Julia
programs
are
organized
around
multiple
dispatch;
by
defining
functions
and
overloading
them
for
different
combinations
of
argument
types,
which
can
also
be
user-defined.
For
a
more
in-depth
discussion
of
the
rationale
and
advantages
of
Julia
over
other
systems,
see
the
following
highlights
or
read
the
introduction
in
the
online
manual.
http://julialang.org
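The multiple-dispatch idea described in this entry can be illustrated outside Julia. What follows is a minimal sketch in Python, used here only for illustration; it is not Julia syntax. The `MultiMethod` class and the `Circle` and `Square` types are hypothetical names invented for this example. As a simplification, the lookup takes the first registered signature that matches, whereas Julia selects the most specific method.

```python
class MultiMethod:
    """Toy multiple dispatch: pick an implementation from the types of ALL arguments."""

    def __init__(self, name):
        self.name = name
        self._impls = []  # (type tuple, function) pairs, in registration order

    def register(self, *types):
        def decorator(fn):
            self._impls.append((types, fn))
            return fn
        return decorator

    def __call__(self, *args):
        # First registered signature whose types all match wins (a simplification).
        for types, fn in self._impls:
            if len(types) == len(args) and all(
                isinstance(a, t) for a, t in zip(args, types)
            ):
                return fn(*args)
        names = [type(a).__name__ for a in args]
        raise TypeError(f"no method {self.name} for argument types {names}")


class Circle:
    pass


class Square:
    pass


overlap = MultiMethod("overlap")


@overlap.register(Circle, Circle)
def _(a, b):
    return "circle-circle test"


@overlap.register(Circle, Square)
def _(a, b):
    return "circle-square test"


@overlap.register(Square, Circle)
def _(a, b):
    # Reuse the (Circle, Square) method by swapping the arguments.
    return overlap(b, a)
```

Calling `overlap(Square(), Circle())` is routed through the `(Square, Circle)` method to the `(Circle, Square)` one, while `overlap(Square(), Square())` raises `TypeError` because no method was defined for that combination of user-defined types.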
Julia
by
example
http://www.scolvin.com/juliabyexample/
The
R
PROJECT
for
Statistical
Computing
R
R
is
a
language
and
environment
for
statistical
computing
and
graphics.
R
provides
a
wide
variety
of
statistical
(linear
and
nonlinear
modelling,
classical
statistical
tests,
time-series
analysis,
classification,
clustering,
...)
and
graphical
techniques,
and
is
highly
extensible.
The
S
language
is
often
the
vehicle
of
choice
for
research
in
statistical
methodology,
and
R
provides
an
Open
Source
route
to
participation
in
that
activity.
One
of
R's
strengths
is
the
ease
with
which
well-designed
publication-quality
plots
can
be
produced,
including
mathematical
symbols
and
formulae
where
needed.
Great
care
has
been
taken
over
the
defaults
for
the
minor
design
choices
in
graphics,
but
the
user
retains
full
control.
http://www.r-project.org
R
Graph
Gallery
The
blog
is
a
collection
of
script
examples
with
example
data
and
output
plots.
R
produces
excellent
quality
graphs
for
data
analysis,
science
and
business
presentation,
publications
and
other
purposes.
Self-help
codes
and
examples
are
provided.
Enjoy
nice
graphs!
http://rgraphgallery.blogspot.co.uk/2013/04/ploting-heatmap-in-map-using-maps.html
connect
and
follow
the
R
blogosphere
(you
can
view
a
7
minute
talk,
from
useR2011,
for
more
information
about
the
R-blogosphere).
http://www.r-bloggers.com
STAN
Software
Stan
is
a
probabilistic
programming
language
implementing
full
Bayesian
statistical
inference
with
MCMC
sampling
(NUTS,
HMC)
and
penalized
maximum
likelihood
estimation
with
Optimization
(BFGS).
Stan
is
coded
in
C++
and
runs
on
all
major
platforms
(Linux,
Mac,
Windows).
Stan
is
freedom-respecting,
open-source
software
(new
BSD
core,
GPLv3
interfaces).
Interfaces
Download
and
getting
started
instructions,
organized
by
interface:
RStan
v2.5.0
(R)
PyStan
v2.5.0
(Python)
CmdStan
v2.5.0
(shell,
command-line
terminal)
MatlabStan
(MATLAB)
Stan.jl
(Julia)
http://mc-stan.org
List
of
Machine
Learning
Open
Source
Software
To
support
the
open
source
software
movement,
JMLR
MLOSS
publishes
contributions
related
to
implementations
of
non-trivial
machine
learning
algorithms,
toolboxes
or
even
languages
for
scientific
computing.
http://jmlr.org/mloss/
Google
Prediction
API
Google's
cloud-based
machine
learning
tools
can
help
analyze
your
data
to
add
the
following
features
to
your
applications:
Customer
sentiment
analysis,
Message
routing
decisions,
Document
and
email
classification,
Recommendation
systems,
Churn
analysis,
Spam
detection,
Upsell
opportunity
analysis,
Diagnostics,
Suspicious
activity
identification,
and
much
more
Free
Quota:
Usage
is
free
for
the
first
six
months,
up
to
the
following
limits
per
Google
Developers
Console
project.
This
free
quota
applies
even
when
billing
is
enabled,
until
the
six-month
expiration
time.
Usage
limits:
Predictions:
100
predictions/day
Hosted
model
predictions:
Hosted
models
have
a
usage
limit
of
100
predictions/day/user
across
all
models.
F#
Software
Foundation
F#
is
ideally
suited
to
machine
learning
because
of
its
efficient
execution,
succinct
style,
data
access
capabilities
and
scalability.
F#
has
been
successfully
used
by
some
of
the
most
advanced
machine
learning
teams
in
the
world,
including
several
groups
at
Microsoft
Research.
Try
F#
has
some
introductory
machine
learning
algorithms.
Further
resources
related
to
different
aspects
of
machine
learning
are
below.
See
also
the
Math
and
Statistics
and
Data
Science
sections
for
related
material.
http://fsharp.org/machine-learning/
BigML
Now
Free
Unlimited
tasks
(up
to
16MB/Task)
https://bigml.com/
BRML
Toolbox
in
Matlab
David
Barber
Toolbox,
University
College
London
http://web4.cs.ucl.ac.uk/staff/D.Barber/pmwiki/pmwiki.php?n=Brml.Software
not
enough
to
bring
existing
and
freshly
developed
toolboxes
and
algorithmic
implementations
to
people's
attention.
More
importantly
the
MLOSS
platform
will
facilitate
collaborations
with
the
goal
of
creating
a
set
of
tools
that
work
with
one
another.
Far
from
requiring
integration
into
a
single
package,
we
believe
that
this
kind
of
interoperability
can
also
be
achieved
in
a
collaborative
manner,
which
is
especially
suited
to
open
source
software
development
practices.
https://mloss.org/software/view/501/
Sourceforge
Find,
Create,
and
Publish
Open
Source
Software
for
free
http://sourceforge.net/directory/os:mac/freshness:recently-updated/?q=machine%20learning
Freecode
Freecode
maintains
the
Web's
largest
index
of
Linux,
Unix
and
cross-platform
software,
and
mobile
applications.
Thousands
of
applications,
which
are
preferably
released
under
an
open
source
license,
are
meticulously
cataloged
in
the
Freecode
database,
and
links
to
new
applications
are
added
daily.
Each
entry
provides
a
description
of
the
software,
links
to
download
it
and
to
obtain
more
information,
and
a
history
of
the
project's
releases,
so
readers
can
keep
up-to-date
on
the
latest
developments.
Freecode
is
the
first
stop
for
Linux
users
hunting
for
the
software
they
need
for
work
or
play.
It
is
continuously
updated
with
the
latest
developments
from
the
"release
early,
release
often"
community.
In
addition
to
providing
news
on
new
releases,
Freecode
offers
a
variety
of
original
content
on
technical,
political,
and
social
aspects
of
software
and
programming,
written
by
both
Freecode
readers
and
Free
Software
luminaries.
The
comment
board
attached
to
each
page
serves
as
a
home
for
spirited
discussion,
bug
reports,
and
technical
support.
An
essential
resource
for
serious
developers,
Freecode
makes
it
possible
to
keep
up
on
who's
doing
what,
and
what
everyone
else
thinks
of
it.
http://freecode.com/search?q=machine+learning&submit=Search
Open
Machine
Learning
Workshop
organized
by
Alekh
Agarwal,
Alina
Beygelzimer,
and
John
Langford,
August
2014
The
goal
of
this
workshop
is
to
inform
people
about
open
source
machine
learning
systems
being
developed,
aid
the
coordination
of
such
projects,
and
discuss
future
plans.
http://hunch.net/~nyoml/
Maxim
Milakov
Software
I
am
a
researcher
in
machine
learning
and
high-performance
computing.
I
designed
and
implemented
nnForge
-
a
library
for
training
convolutional
and
fully
connected
neural
networks,
with
CPU
and
GPU
(CUDA)
backends.
You
will
find
my
thoughts
on
convolutional
neural
networks
and
the
results
of
applying
convolutional
ANNs
for
various
classification
tasks
in
the
Blog.
http://www.milakov.org
Alfonso
Nieto-Castanon
Software
http://www.alfnie.com/software
Lib
Skylark
The
Sketching
based
Matrix
computations
for
Machine
Learning
is
a
library
for
matrix
computations
suitable
for
general
statistical
data
analysis
and
optimization
applications.
Many
tasks
in
machine
learning
and
statistics
ultimately
end
up
being
problems
involving
matrices:
whether
you're
finding
the
key
players
in
the
bitcoin
market,
or
inferring
where
tweets
came
from,
or
figuring
out
what's
in
sewage,
you'll
want
to
have
a
toolkit
for
least-squares
and
robust
regression,
eigenvector
analysis,
non-negative
matrix
factorization,
and
other
matrix
computations.
Sketching
is
a
way
to
compress
matrices
that
preserves
key
matrix
properties;
it
can
be
used
to
speed
up
many
matrix
computations.
Sketching
takes
a
given
matrix
A
and
produces
a
sketch
matrix
B
that
has
fewer
rows
and/or
columns
than
A.
For
a
good
sketch
B,
if
we
solve
a
problem
with
input
B,
the
solution
will
also
be
pretty
good
for
input
A.
For
some
problems,
sketches
can
also
be
used
to
get
faster
ways
to
find
high-precision
solutions
to
the
original
problem.
In
other
cases,
sketches
can
be
used
to
summarize
the
data
by
identifying
the
most
important
rows
or
columns.
A
simple
example
of
sketching
is
just
sampling
the
rows
(and/or
columns)
of
the
matrix,
where
each
row
(and/or
column)
is
equally
likely
to
be
sampled.
This
uniform
sampling
is
quick
and
easy,
but
doesn't
always
yield
good
sketches;
however,
there
are
sophisticated
sampling
methods
that
do
yield
good
sketches.
http://xdata-skylark.github.io
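The uniform row sampling described in this entry can be sketched in a few lines. The example below is a minimal illustration in plain Python and does not use the Skylark library or its API; the function names `fit_line` and `uniform_row_sketch`, the synthetic data, and the sample size of 200 rows are all assumptions made for the example. It fits a line to the full data and to a uniformly sampled subset of rows, so the two solutions can be compared.

```python
import math
import random


def fit_line(rows):
    """Least-squares fit of y = slope * x + intercept over (x, y) rows."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def uniform_row_sketch(rows, k, seed=0):
    """Keep k rows, every row equally likely to be chosen (without replacement)."""
    rng = random.Random(seed)
    return rng.sample(rows, k)


# Synthetic input: y = 2x + 1 plus a small deterministic wiggle.
rows = [(i / 2000, 2 * (i / 2000) + 1 + 0.01 * math.sin(i)) for i in range(2000)]

full = fit_line(rows)                                 # solve on the full input
sketched = fit_line(uniform_row_sketch(rows, k=200))  # solve on the sketch
```

On well-behaved data like this, the solution from 200 rows stays close to the solution from all 2000. As the entry notes, uniform sampling does not always yield a good sketch; data with a few highly influential rows is the typical case where more sophisticated sampling is needed.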
Mutual
Information
Text
Explorer
The
Mutual
information
Text
Explorer
is
a
tool
that
allows
interactive
exploration
of
text
data
and
document
covariates.
See
the
paper
or
slides
for
information.
Currently,
an
experimental
system
is
available.
http://brenocon.com/MiTextExplorer/
Data
Science
Resources
by
Jonathan
Bower
on
GitHub
Added
in
the
kit
27-Oct-2014
This
repo
is
intended
to
provide
open
source
resources
to
facilitate
learning
or
to
point
practicing/aspiring
data
scientists
in
the
right
direction.
It
also
exists
so
that
I
can
keep
track
of
resources
that
are/were
helpful
to
me
and
hopefully
for
you.
I
aim
to
cover
the
full
spectrum
of
data
science
and
to
hopefully
include
topics
of
data
science
that
aren't
either
actively
covered
or
easy
to
find
in
the
open-source
world.
For
instance,
I
haven't
focused
on
in-depth
machine
learning
theory
since
that
is
well
covered.
If
you
are
looking
for
ML
theory
I
would
look
to
some
of
the
online
courses,
books
or
bootcamps.
There
is
a
lot
of
theory
information
available
online,
some
is
linked
lower
on
this
page
here,
here
and
other
info
is
available
with
many
purchasable
books.
Keep
in
mind
that
this
is
a
constant
work
in
progress.
If
you
have
anything
to
add,
any
feedback,
or
would
like
to
be
a
contributor
-
please
reach
out.
If
there
are
any
mistakes
or
typos,
be
patient
with
me,
but
please
let
me
know.
Lastly,
I
would
add
that
a
large
portion
of
data
science
is
exploratory
data
analysis
and
properly
cleaning
your
data
to
implement
the
tools
and
theory
necessary
to
solve
the
problem
at
hand.
For
each
problem
there
are
many
different
ways
and
tools
to
execute
a
successful
solution
-
if
one
method
isn't
working
re-evaluate,
re-work
the
problem,
try
another
approach
and/or
reach
out
to
the
community
for
support.
Good
luck
and
I
hope
this
repo
is helpful!
https://github.com/jonathan-bower/DataScienceResources
Joseph
Misiti's
Blog
A
curated
list
of
awesome
machine
learning
frameworks,
libraries
and
software
(by
language).
Inspired
by
awesome-php.
Other
awesome
lists
can
be
found
in
the
awesome-awesomeness
list.
https://github.com/josephmisiti/awesome-machine-learning
Michael
Waskom
GitHub
repositories
I'm
a
Ph.D.
student
in
the
Department
of
Psychology
at
Stanford
University,
where
I
work
with
Anthony
Wagner.
I
use
behavioral,
computational,
and
neuroimaging
methods
to
study
cognitive
control
and
decision
making
in
humans.
Previously,
I
spent
time
in
John
Gabrieli's
lab
at
MIT
investigating
whether
cognition
can
be
improved
through
training.
I
did
my
undergrad
at
Amherst
College,
where
I
studied
philosophy
and
neuroscience.
Complementing
this
research,
I
have
developed
a
set
of
software
libraries
for
statistical
analysis
and
visualization.
These
libraries
aim
to
make
computationally-based
research
more
reproducible
and
improve
the
visual
presentation
of
statistical
and
neuroimaging
results.
https://github.com/mwaskom
Visualizing
distributions
of
data
This
notebook
demonstrates
different
approaches
to
graphically
representing
distributions
of
data,
specifically
focusing
on
the
tools
provided
by
the
seaborn
package.
http://nbviewer.ipython.org/github/mwaskom/seaborn/blob/master/examples/plotting_distributions.ipynb
Exploring
Seaborn
and
Pandas
based
plot
types
in
HoloViews
by
Philipp
John
Frederic
Rudiger
In
this
notebook
we'll
look
at
interfacing
between
the
composability
and
ability
to
generate
complex
visualizations
that
HoloViews
provides
and
the
great
looking
plots
incorporated
in
the
seaborn
library.
Along
the
way
we'll
explore
how
to
wrap
different
types
of
data
in
a
number
of
Seaborn
View
types,
including:
-
Distribution
Views
-
Bivariate
Views
-
TimeSeries
Views
Additionally
we
explore
how
a
Pandas
dframe
can
be
wrapped
in
a
general
purpose
View
type,
which
can
either
be
used
to
convert
the
data
into
standard
View
types
or
be
visualized
directly
using
a
wide
array
of
plotting
options,
including:
-
Regression
plots,
correlation
plots,
box
plots,
autocorrelation
plots,
scatter
matrices,
histograms
or
regular
scatter
or
line
plots.
http://philippjfr.com/blog/seabornviews/
Open
Source
Hong
Kong
Open
Source
Hong
Kong
(OSHK)
is
an
open
source
organization
in
Hong
Kong
which aims to advocate open source and technology development.
http://opensource.hk/en/event
Lamda
Group,
Nanjing
University
Open
Source
Software
http://lamda.nju.edu.cn/Data.ashx#code
Apache
SPARK
Apache
Spark
Machine
Learning
Library
MLlib
is
a
Spark
implementation
of
some
common
machine
learning
(ML)
functionality,
as well as associated
tests
and
data
generators.
MLlib
currently
supports
four
common
types
of
machine
learning
problem
settings,
namely,
binary
classification,
regression,
clustering
and
collaborative
filtering,
as
well
as
an
underlying
gradient
descent
optimization
primitive.
http://spark.apache.org/docs/0.9.1/mllib-guide.html
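To make the "gradient descent optimization primitive" mentioned above concrete, here is a minimal single-machine sketch in plain Python. It does not use Spark or the MLlib API; the function name `gradient_descent_line`, the learning rate, the step count, and the toy data are assumptions chosen for illustration. It fits a line by repeatedly stepping against the gradient of the mean squared error.

```python
def gradient_descent_line(points, learning_rate=0.05, steps=2000):
    """Fit y = w * x + b by minimizing mean squared error with gradient descent."""
    w, b = 0.0, 0.0
    n = len(points)
    for _ in range(steps):
        # Gradients of (1/n) * sum((w*x + b - y)^2) with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in points) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in points) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Toy data lying exactly on y = 2x + 1.
points = [(x, 2.0 * x + 1.0) for x in range(5)]
w, b = gradient_descent_line(points)
```

On this toy data the estimates approach `w = 2` and `b = 1`. A distributed implementation would compute the same gradient sums in parallel across partitions of the data and combine them at each step.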
2013
Spark
Summit
exercises
Welcome
to
the
Spark
Summit
hands-on
exercises.
These
exercises
are
adapted
from
similar
exercises
that
were
prepared
for
and
run
at
AMP
Camp
Big
Data
Bootcamps.
They
were
written
by
volunteer
graduate
students
and
postdocs
in
the
UC
Berkeley
AMPLab.
Many
of
those
same
graduate
students
are
also
volunteers
here
on
the
Spark
Summit
Training
day
team
as
well.
The
exercises
we
cover
today
will
have
you
working
directly
with
the
Spark
specific
components
of
the
AMPLab's
open-source
software
stack,
called
the
Berkeley
Data
Analytics
Stack
(BDAS).
http://spark-summit.org/2013/exercises/index.html
Apache
Spark
is
100%
open
source,
and
at
Databricks
we
are
fully
committed
to
maintaining
this
model.
We
believe
that
no
computing
platform
will
win
in
the
Big
Data
space
unless
it
is
fully
open
source.
Spark
has
one
of
the
largest
open
source
communities
in
Big
Data,
with
over
200
contributors
from
50+
organizations.
Databricks
works
closely
with
the
community
to
maintain
this
momentum.
https://www.youtube.com/channel/UC3q8O3Bh2Le8Rj1-Q-_UUbA/videos
Apache
MAHOUT
Apache
Mahout
ML
library
The
Apache
Mahout
project's
goal
is
to
build
a
scalable
machine
learning
library.
Currently
Mahout
supports
mainly
three
use
cases:
Recommendation
mining
takes
users'
behavior
and
from
that
tries
to
find
items
users
might
like.
Clustering
takes
e.g.
text
documents
and
groups
them
into
groups
of
topically
related
documents.
Classification
learns
from
existing
categorized
documents
what
documents
of
a
specific
category
look
like
and
is
able
to
assign
unlabelled
documents
to
the
(hopefully)
correct
category.
https://mahout.apache.org
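The recommendation use case described above (finding items a user might like from users' behavior) can be sketched with simple item co-occurrence counts. This is a small single-machine illustration in plain Python and not Mahout's API or its actual algorithms; the `recommend` function and the sample purchase histories are hypothetical.

```python
from collections import defaultdict
from itertools import permutations


def recommend(histories, user, top_n=2):
    """Score unseen items by how often they co-occur with the user's items."""
    cooccur = defaultdict(int)
    for items in histories.values():
        # Count every ordered pair of distinct items seen together.
        for a, b in permutations(set(items), 2):
            cooccur[(a, b)] += 1

    seen = set(histories[user])
    scores = defaultdict(int)
    for (a, b), count in cooccur.items():
        if a in seen and b not in seen:
            scores[b] += count
    # Sort by score (descending), then by name so ties are deterministic.
    ranked = sorted(scores, key=lambda item: (-scores[item], item))
    return ranked[:top_n]


# Hypothetical behavior data: user -> items they interacted with.
histories = {
    "ann": ["book", "lamp"],
    "bob": ["book", "lamp", "desk"],
    "cat": ["book", "desk", "chair"],
    "dan": ["lamp", "desk"],
}
```

For `"ann"`, the item `desk` co-occurs four times with her items and `chair` once, so `recommend(histories, "ann")` ranks `desk` ahead of `chair`.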
Apache
Mahout
on
Javaworld
Enjoy
machine
learning
with
Mahout
on
Hadoop,
2014
Mahout
brings
the
power
of
scalable
processing
to
Hadoop's
huge
data
sets
http://www.javaworld.com/article/2241046/big-data/enjoy-machine-learning-with-mahout-on-hadoop.html
http://www.javaworld.com/article/2077907/open-source-tools/mapreduce-programming-with-apache-hadoop.html
Deeplearning4j
Deeplearning4j
is
the
first
commercial-grade
deep
learning
library
written
in
Java.
It
is
meant
to
be
used
in
business
environments,
rather
than
as
a
research
tool
for
extensive
data
exploration.
Deeplearning4j
is
most
helpful
in
solving
distinct
problems,
like
identifying
faces,
voices,
spam
or
e-commerce
fraud.
Deeplearning4j
aims
to
be
cutting-edge
plug
and
play,
more
convention
than
configuration.
By
following
its
conventions,
you
get
an
infinitely
scalable
deep-learning
architecture.
The
framework
has
a
domain-specific
language
(DSL)
for
neural
networks,
to
turn
their
multiple
knobs.
Deeplearning4j
includes
a
distributed
deep-learning
framework
and
a
normal
deep-learning
framework;
i.e.
it
runs
on
a
single
thread
as
well.
Training
takes
place
in
the
cluster,
which
means
it
can
process
massive
amounts
of
data.
Nets
are
trained
in
parallel
via
iterative
reduce.
The
distributed
framework
is
made
for
data
input
and
neural
net
training
at
scale,
and
its
output
should
be
highly
accurate
predictive
models.
By
following
the
links
at
the
bottom
of
each
page,
you
will
learn
to
set
up,
and
train
with
sample
data,
several
types
of
deep-learning
networks.
These
include
single-
and
multithread
networks,
Restricted
Boltzmann
machines,
deep-belief
networks
and
Stacked
Denoising
Autoencoders.
For
a
quick
introduction
to
neural
nets,
please
see
our
overview.
http://deeplearning4j.org/
Udacity
opencourseware
"Intro
to
Hadoop
and
MapReduce"
Course
Summary
The
Apache
Hadoop
project
develops
open-source
software
for
reliable,
scalable,
distributed
computing.
Learn
the
fundamental
principles
behind
it,
and
how
you
can
use
its
power
to
make
sense
of
your
Big
Data.
Why
Take
This
Course?
How
Hadoop
fits
into
the
world
(recognize
the
problems
it
solves)
Understand
the
concepts
of
HDFS
and
MapReduce
(find
out
how
it
solves
the
problems)
Write
MapReduce
programs
(see
how
we
solve
the
problems)
Practice
solving
problems
on
your
own
https://www.udacity.com/course/ud617
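The MapReduce concepts the course covers can be sketched on a single machine with the classic word-count example. This is a plain-Python illustration and not Hadoop code or course material; the function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are chosen for this example.

```python
from collections import defaultdict


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group all emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reducer: combine all the counts emitted for one word."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence on an in-memory list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())
```

For example, `word_count(["big data", "Big clusters"])` returns `{"big": 2, "data": 1, "clusters": 1}`. In a real cluster the mappers and reducers run on different machines, and the shuffle step moves data between them.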
Storm
Apache
Apache
Storm
is
a
free
and
open
source
distributed
realtime
computation
system.
Storm
makes
it
easy
to
reliably
process
unbounded
streams
of
data,
doing
for
realtime
processing
what
Hadoop
did
for
batch
processing.
Storm
is
simple,
can
be
used
with
any
programming
language,
and
is
a
lot
of
fun
to
use!
http://storm.incubator.apache.org
http://storm.incubator.apache.org/documentation/Tutorial.html
Michael
Vogiatzis
Blog
How
to
spot
first
stories
on
Twitter
using
Storm
As
a
first
blog
post,
I
decided
to
describe
a
way
to
detect
first
stories
(a.k.a.
new
events)
on
Twitter
as
they
happen.
This
work
is
part
of
the
Thesis
I
wrote
last
year
for
my
MSc
in
Computer
Science
in
the
University
of
Edinburgh. You
can
find
the
document
here.
http://micvog.com/2013/09/08/storm-first-story-detection/
Elasticsearch
Elasticsearch
is
a
flexible
and
powerful
open
source,
distributed,
real-time
search
and
analytics
engine.
Architected
from
the
ground
up
for
use
in
distributed
environments
where
reliability
and
scalability
are
must
haves,
Elasticsearch
gives
you
the
ability
to
move
easily
beyond
simple
full-text
search.
Through
its
robust
set
of
APIs
and
query
DSLs,
plus
clients
for
the
most
popular
programming
languages,
Elasticsearch
delivers
on
the
near
limitless
promises
of
search
technology.
http://www.elasticsearch.org
Prediction
IO
BUILD
SMARTER
SOFTWARE
with
Machine
Learning
PredictionIO
is
an
open
source
machine
learning
server
for
software
developers
to
create
predictive
features,
such
as
personalization,
recommendation
and
content
discovery.
http://prediction.io
https://hacks.mozilla.org/2014/04/introducing-predictionio/
http://www.youtube.com/channel/UCN0jVSCIEh7eeuWXIuo316g
Container
Cluster
Manager
Kubernetes
builds
on
top
of
Docker
to
construct
a
clustered
container
scheduling
service.
The
goals
of
the
project
are
to
enable
users
to
ask
a
Kubernetes
cluster
to
run
a
set
of
containers.
The
system
will
automatically
pick
a
worker
node
to
run
those
containers
on.
As
container
based
applications
and
systems
get
larger,
some
tools
are
provided
to
facilitate
sanity.
This
includes
ways
for
containers
to
find
and
communicate
with
each
other
and
ways
to
work
with
and
manage
sets
of
containers
that
do
similar
work.
When
looking
at
the
architecture
of
the
system,
we'll
break
it
down
to
services
that
run
on
the
worker
node
and
services
that
play
a
"master"
role.
https://github.com/GoogleCloudPlatform/kubernetes?utm_source
Domino
Data
Labs
Domino
is
a
platform
for
modern
data
scientists
using
Python,
R,
Matlab,
and
more.
Use
our
cloud-hosted
infrastructure
to
securely
run
your
code
on
powerful
hardware
with
a
single
command
without
any
changes
to
your
code.
If
you
have
your
own
infrastructure,
our
Enterprise
offering
provides
powerful,
easy-to-use
cluster
management
functionality
behind
your
firewall.
Special
offer
for
The
Machine
Learning
Salon's
readers:
Machine
Learning
Salon
readers
can
get
$50
worth
of
compute
credits
when
they
sign
up
for
Domino.
Domino
lets
you
run
your
analyses
on
powerful
cloud
hardware
in
one
step
without
any
setup
or
changes
to
your
code.
Sign
up
here,
or
email
support@dominoup.zendesk.com
and
tell
them
you
are
a
Machine
Learning
Salon
reader.
http://www.dominoup.com
Data
Science
Central
Data
Science
Central
is
the
industry's
online
resource
for
big
data
practitioners.
From
Analytics
to
Data
Integration
to
Visualization,
Data
Science
Central
provides
a
community
experience
that
includes
a
robust
editorial
platform,
social
interaction,
forum-based
technical
support,
the
latest
in
technology,
tools
and
trends
and
industry
job
opportunities.
http://www.datasciencecentral.com
Amazon
Web
Services
Videos
https://www.youtube.com/user/AmazonWebServices/playlists
Google
Cloud
Computing
Videos
https://developers.google.com/cloud/videos
VLAB:
Deep
Learning:
Intelligence
from
Big
Data,
Stanford
Graduate
School
of
Business
Added
the
22-Nov-2014
http://www.youtube.com/watch?v=czLI3oLDe8M&spfreload=10
Machine
Learning
and
Big
Data
in
Cyber
Security
Eyal
Kolman
Technion
Lecture
Added
the
22-Nov-2014
http://www.youtube.com/watch?v=G2BydTwrrJk&spfreload=10
Chaire
Machine
Learning
Big
Data,
Telecom
Paris
Tech
(Videos
in
French)
Télécom ParisTech held the first meetings of the Machine Learning for Big Data research chair on 26 November 2014, with its partners Fondation Télécom, Criteo, PSA Peugeot Citroën, and Safran.
http://www.dailymotion.com/video/x2cti71_chaire-ml-big-data-premieres-rencontres_school
https://www.youtube.com/user/TelecomParisTech1/search?query=big+data
An
Architecture
for
Fast
and
General
Data
Processing
on
Large
Clusters
by
Matei
Zaharia,
2014
The
past
few
years
have
seen
a
major
change
in
computing
systems,
as
growing
data
volumes
and
stalling
processor
speeds
require
more
and
more
applications
to
scale
out
to
distributed
systems.
Today,
a
myriad
data
sources,
from
the
Internet
to
business
operations
to
scientific
instruments,
produce
large
and
valuable
data
streams.
However,
the
processing
capabilities
of
single
machines
have
not
kept
up
with
the
size
of
data,
making
it
harder
and
harder
to
put
to
use.
As
a
result,
a
growing number of organizations (not just web companies, but traditional enterprises and research labs) need
to
scale
out
their
most
important
computations
to
clusters
of
hundreds
of
machines.
At
the
same
time,
the
speed
and
sophistication
required
of
data
processing
have
grown.
In
addition
to
simple
queries,
complex
algorithms
like
machine
learning
and
graph
analysis
are
becoming
common
in
many
domains.
And
in
addition
to
batch
processing,
streaming
analysis
of
new
real-time
data
sources
is
required
to
let
organizations
take
timely
action.
Future
computing
platforms
will
need
to
not
only
scale
out
traditional
workloads,
but
support
these
new
applications
as
well.
This
dissertation
proposes
an
architecture
for
cluster
computing
systems
that
can
tackle
emerging
data
processing
workloads
while
coping
with
larger
and
larger
scales.
Whereas
early
cluster
computing
systems,
like
MapReduce,
handled
batch
processing,
our
architecture
also
enables
streaming
and
interactive
queries,
while
keeping
the
scalability
and
fault
tolerance
of
previous
systems.
And
whereas
most
deployed
systems
only
support
simple
one-pass
computations
(e.g.,
aggregation
or
SQL
queries),
ours
also
extends
to
the
multi-pass
algorithms
required
for
more
complex
analytics
(e.g.,
iterative
algorithms
for
machine
learning).
Finally,
unlike
the
specialized
systems
proposed
for
some
of
these
workloads,
our
architecture
allows
these
computations
to
be
combined,
enabling
rich
new
applications
that
intermix,
for
example,
streaming
and
batch
processing,
or
SQL
and
complex
analytics.
We
achieve
these
results
through
a
simple
extension
to
MapReduce
that
adds
primitives
for
data
sharing,
called
Resilient
Distributed
Datasets
(RDDs).
We
show
that
this
is
enough
to
efficiently
capture
a
wide
range
of
workloads.
We
implement
RDDs
in
the
open
source
Spark
system,
which
we
evaluate
using
both
synthetic
benchmarks
and
real
user
applications.
Spark
matches
or
exceeds
the
performance
of
specialized
systems
in
many
application
domains,
while
offering
stronger
fault
tolerance
guarantees
and
allowing
these
workloads
to
be
combined.
We
explore
the
generality
of
RDDs
from
both
a
theoretical
modeling
perspective
and
a
practical
perspective
to
see
why
this
extension
can
capture
a
wide
range
of
previously
disparate
workloads.
http://www.eecs.berkeley.edu/Pubs/TechRpts/2014/EECS-2014-12.pdf
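The data-sharing idea behind RDDs, as summarized in the abstract above, can be loosely illustrated with a toy dataset that records how it was derived (its lineage), computes lazily, and can keep a result in memory for reuse across passes. This is a teaching sketch in plain Python under strong simplifying assumptions (one machine, no partitions); `ToyDataset` and its methods are hypothetical names and are not Spark's API.

```python
class ToyDataset:
    """Toy lineage-tracking dataset; a teaching sketch, not Spark's RDD API."""

    def __init__(self, source=None, parent=None, transform=None):
        self._source = source        # base records (root dataset only)
        self._parent = parent        # lineage: the dataset this one derives from
        self._transform = transform  # function applied to the parent's records
        self._keep = False           # whether to keep results in memory
        self._cached = None          # in-memory copy, filled on first compute

    def map(self, fn):
        return ToyDataset(parent=self, transform=lambda recs: [fn(r) for r in recs])

    def filter(self, pred):
        return ToyDataset(parent=self, transform=lambda recs: [r for r in recs if pred(r)])

    def cache(self):
        self._keep = True
        return self

    def collect(self):
        if self._cached is not None:
            return list(self._cached)
        if self._parent is None:
            records = list(self._source)
        else:
            records = self._transform(self._parent.collect())
        if self._keep:
            self._cached = records
        return list(records)

    def drop_cache(self):
        """Simulate losing the in-memory copy; lineage lets us rebuild it."""
        self._cached = None


calls = []  # records every invocation of square, to show when work is redone


def square(x):
    calls.append(x)
    return x * x


squares = ToyDataset(source=range(5)).map(square).cache()
total = sum(squares.collect())                          # first pass computes and caches
evens = squares.filter(lambda v: v % 2 == 0).collect()  # second pass reuses the cache
```

After both passes `square` has run only five times, because the second pass reads the cached records. Calling `squares.drop_cache()` and then `squares.collect()` recomputes the same records from the lineage, which mirrors in miniature how lost data can be rebuilt rather than replicated.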
for
members
of
the
global
workforce,
so
dream
big.
Proposals
will
be
evaluated
on
a
combination
of
three
factors,
all
equally
weighted:
Novelty:
Takes
into
account
the
thoughtfulness
and
originality
of
the
entry,
including
its
unique
approach
to
taking
advantage
of
data
from
the
Economic
Graph.
Impact:
Considers
the
potential
benefits
to
the
region,
country
and
the
world,
as
well
as
the
extensibility
of
the
proposal.
Feasibility:
This
criterion
will
weigh
the
practicality
of
the
submission,
measuring
the
likelihood
it
can
be
researched
and
implemented
within
a
reasonable
time
period
and
the
types
of
data
from
LinkedIn
that
will
be
necessary
for
the
proposed
research.
A
diverse
panel
of
judges
will
evaluate
and
select
winning
proposals.
Research
Award
Recipients
LinkedIn
will
select
up
to
three
proposals
as
winners
of
the
LinkedIn
Economic
Graph
Challenge.
Selected
winners
will
be
notified
in
early
2015.
Each
winning
submission
will
receive:
A
one-time
$25,000
(USD)
research
award.
Round-trip
travel
and
accommodations
to
LinkedIn
headquarters
in
Mountain
View,
CA
to
participate
in
the
LinkedIn
Economic
Challenge
Research
Reception
(early
2015)
and
Final
Presentation
(Fall
2015).
The
potential
to
receive
research
resources
to
execute
proposal
including
a
LinkedIn
employee
collaborator,
access
to
select
data
from
LinkedIn,
and
equipment
for
use
during
the
six
month
research
period.
Research
award
recipients
will
have
six
months
to
conduct
their
research,
and
will
return
to
Mountain
View,
CA,
for
a
final
presentation
in
Fall
2015.
Research
award
recipients
must
sign
agreements
covering
intellectual
property
and
non-disclosure
of
information,
and
may
not
publish
results
without
written
consent
from
LinkedIn
Corporation.
http://economicgraphchallenge.linkedin.com/details/
ChaLearn
Added
in
the
kit
before
24-Oct-2014
Mission:
Machine
Learning
is
the
science
of
building
hardware
or
software
that
can
achieve
tasks
by
learning
from
examples.
The
examples
often
come
as
{input,
output}
pairs.
Given
new
inputs
a
trained
machine
can
make
predictions
of
the
unknown
output.
Examples
of
machine
learning
tasks
include:
automatic
reading
of
handwriting
assisted
medical
diagnosis
automatic
text
classification
(classification
of
web
pages;
spam
filtering)
financial
predictions
We
organize
challenges
to
stimulate
research
in
this
field.
The
web
sites
of
past
challenges
remain
open
for
post-challenge
submission
as
ever-going
benchmarks.
ChaLearn
is
a
tax-exempt
organization
under
section
501(c)(3)
of
the
US
IRS
code.
DLN:
17053090370022.
http://www.chalearn.org
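The idea of learning from {input, output} pairs and then predicting the unknown output for a new input can be sketched with one of the simplest learners, a 1-nearest-neighbor classifier. This is a plain-Python illustration and not a ChaLearn benchmark or baseline; the function name `nearest_neighbor_predict` and the training data are hypothetical.

```python
import math


def nearest_neighbor_predict(examples, new_input):
    """Predict the output of the training example closest to new_input."""
    _, best_output = min(
        examples, key=lambda pair: math.dist(pair[0], new_input)
    )
    return best_output


# Hypothetical training set of {input, output} pairs:
# input = (height_cm, weight_kg), output = a size label.
examples = [
    ((150.0, 50.0), "small"),
    ((155.0, 55.0), "small"),
    ((185.0, 90.0), "large"),
    ((190.0, 95.0), "large"),
]
```

Given the new input `(152.0, 52.0)`, the closest training input carries the label `"small"`, so that is the prediction; `(188.0, 92.0)` is predicted as `"large"`.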
ILSVRC
2013
ILSVRC
2012
ILSVRC
2011
ILSVRC
2010
http://image-net.org/challenges/LSVRC/2014/
Kaggle
Added
in
the
kit
before
24-Oct-2014
Kaggle
is
the
world's
largest
community
of
data
scientists.
They
compete
with
each
other
to
solve
complex
data
science
problems,
and
the
top
competitors
are
invited
to
work
on
the
most
interesting
and
sensitive
business
problems
from
some
of
the
world's
biggest
companies
through
Masters
competitions.
http://www.kaggle.com/competitions

Kaggle Competition Past Solutions
Added in the kit before 24-Oct-2014
We learn more from code, and from great code. Not necessarily always the 1st-ranking solution, because we also learn what separates a stellar solution from a merely good one. I will post solutions I came upon so we can all learn to become better!
I collected the following source code and interesting discussions from the competitions held on Kaggle, for learning purposes. Not all competitions are listed, because I am collecting them manually; some competitions are also missing because no one shared a solution. I will add more as time goes by. Thank you.
http://www.chioka.in/kaggle-competition-solutions/

TEDx San Francisco, Jeremy Howard talk (Connecting Devices with Algorithms)
Added in the kit before 24-Oct-2014
http://tedxsf.org/videos/#tedxsf-connected-reality

CrowdANALYTIX
Added in the kit before 24-Oct-2014
https://crowdanalytix.com/jq/solver.html

Challenges for governmental applications
Added in the kit before 24-Oct-2014
https://challenge.gov

InnoCentive Challenge Center
Added in the kit before 24-Oct-2014
https://www.innocentive.com/ar/challenge/browse

TunedIT
Added in the kit before 24-Oct-2014
http://tunedit.org

Ants, AI Challenge, sponsored by Google, 2011
Added in the kit before 24-Oct-2014
The AI Challenge is all about creating artificial intelligence, whether you are a beginning programmer or an expert. Using one of the easy-to-use starter kits, you will create a computer program (in any language) that controls a colony of ants which fight against other colonies for domination.
http://ants.aichallenge.org

International Collegiate Programming Contest
Added in the kit before 24-Oct-2014

The focus of the Internet of Things (IoT) Innovation Grand Challenge is to spearhead an industry-wide initiative to accelerate the adoption of breakthrough technologies and products that will contribute to the growth and evolution of the Internet of Things. This global open competition aims to recognize, promote and reward innovators, entrepreneurs and early-stage startup businesses that can help us transform businesses and industries by re-inventing business processes, operational efficiencies and customer service innovations.
We are seeking submissions from early-stage businesses and teams that have technology-based prototypes and proofs of concept (PoC) in development.
https://iotchallenge.cisco.spigit.com/Page/Home

We have various types of data available to share. They are categorized into Ratings, Language, Graph, Advertising and Market Data, Computing Systems and an appendix of other relevant data and resources available via the Yahoo! Developer Network.
http://webscope.sandbox.yahoo.com/catalog.php

Windows Azure Marketplace
Added in the kit before 24-Oct-2014
One-Stop Shop for Premium Data and Applications
Hundreds of Apps, Thousands of Subscriptions, Trillions of Data Points
https://datamarket.azure.com/browse/data?price=free

Amazon Public Data Sets
Added in the kit before 24-Oct-2014
Public Data Sets on AWS provides a centralized repository of public data sets that can be seamlessly integrated into AWS cloud-based applications. AWS is hosting the public data sets at no charge for the community, and like all AWS services, users pay only for the compute and storage they use for their own applications. Learn more about Public Data Sets on AWS and visit the Public Data Sets forum.
http://aws.amazon.com/datasets/

Wikipedia: Database Download
Added in the kit before 24-Oct-2014
Wikipedia offers free copies of all available content to interested users. These databases can be used for mirroring, personal use, informal backups, offline use or database queries (such as for Wikipedia:Maintenance). All text content is multi-licensed under the Creative Commons Attribution-ShareAlike 3.0 License (CC-BY-SA) and the GNU Free Documentation License (GFDL). Images and other files are available under different terms, as detailed on their description pages. For our advice about complying with these licenses, see Wikipedia:Copyrights.
http://en.wikipedia.org/wiki/Wikipedia:Database_download

Gutenberg project (free books available in different formats, useful for NLP)
Added in the kit before 24-Oct-2014
Project Gutenberg offers 45,541 free ebooks to download. (Source: 5th June 2014)
http://www.gutenberg.org/ebooks/search/?sort_order=downloads

Freebase
Added in the kit before 24-Oct-2014
Use Freebase data
Freebase data is free to use under an open license. You can:
Query Freebase using our Search, Topic, or MQL APIs
Download our weekly data dumps
http://www.freebase.com

Datamob Data
Added in the kit before 24-Oct-2014
http://datamob.org/datasets

Reddit Datasets
Added in the kit before 24-Oct-2014
http://www.reddit.com/r/datasets/

100+ Interesting Data Sets for Statistics
Added in the kit before 24-Oct-2014
Summary: Looking for interesting data sets? Here's a list of more than 100 of the best stuff, from dolphin relationships to political campaign donations to death row prisoners.
http://rs.io/2014/05/29/list-of-data-sets.html

Data portal of the City of Chicago
Added in the kit before 24-Oct-2014
https://data.cityofchicago.org/browse?limitTo=datasets&utf8=
Remark: you need to copy the following link into your browser (temporary problem).
A gold mine where we can find data sets such as the names, salaries and positions of all persons working for the City of Chicago!
https://data.cityofchicago.org/Administration-Finance/Current-Employee-Names-Salaries-and-Position-Title/xzkq-xp2w

Data portal of the City of Seattle
Added in the kit before 24-Oct-2014
https://data.seattle.gov/browse

Data portal of the City of LA
Added in the kit before 24-Oct-2014
https://data.lacity.org/browse?limitTo=datasets&utf8=
Remark: you need to copy the following link into your browser (temporary problem).

California Department of Water Resources
Added in the kit 27-Oct-2014
DWR has many programs and data tools to collect and disseminate information on water resources.
All Water Data Topics
http://www.water.ca.gov/nav/nav.cfm?loc=t&id=106
CALIFORNIA DATA EXCHANGE CENTER (CDEC)
With the cooperation of over 140 other agencies, the CDEC provides real-time, forecast, and historical hydrologic data. This data includes water discharge in rivers, water storage in reservoirs, precipitation accumulation, and water content in snow pack, primarily focused in flood management. However, the data is also helpful for determining general water availability and natural supply trends.
More about CDEC
http://cdec.water.ca.gov/
CALIFORNIA IRRIGATION MANAGEMENT INFORMATION SYSTEM (CIMIS)
CIMIS is a network of over 120 automated weather stations in California. CIMIS was developed in 1982 by DWR and the University of California, Davis to assist California's irrigators to manage their water resources efficiently.
More about CIMIS
http://wwwcimis.water.ca.gov/cimis/welcome.jsp
WATER DATA LIBRARY

Finding the perfect house using open data, Justin Palmer's Blog
Added in the kit before 24-Oct-2014
http://dealloc.me/2014/05/24/opendata-house-hunting/

Synapse
Added in the kit before 24-Oct-2014
A private or public workspace that allows you to aggregate, describe, and share your research.
A tool to improve reproducibility of data intensive science, recording progress as you work with tools such as R and Python.
A set of living research projects enabling contribution to large-scale collaborative solutions to scientific problems.
https://www.synapse.org

NYC Taxi Trips Data from 2013
Added in the kit before 24-Oct-2014
These data were made publicly available thanks to Chris Whong, who did the heavy lifting. He is also providing links to a BitTorrent download where the data can be downloaded much faster. Read more about it here.
http://www.andresmh.com/nyctaxitrips/

Sebastian Raschka's Dataset Collections
Added in the kit before 24-Oct-2014
https://github.com/rasbt/pattern_classification/blob/master/resources/dataset_collections.md

Lamda Group Data
Image Data For Multi-Instance Multi-Label Learning
MDDM Data for multi-label dimensionality reduction.
Text Data for Multi-Instance Learning
MILWEB Data for Multi-Instance Learning Based Web Index Recommendation.
SGBDota Data for the PCES (Positive Concept Expansion with Single snapshot) problem.
Single Face Dataset Data for Face Recognition with One Training Image per Person.
Text Data For Multi-Instance Multi-Label Learning
http://lamda.nju.edu.cn/Data.ashx

Data Visualisation

Visualization Lab Gallery, Computer Science Division, University of California, Berkeley
Added the 15-Nov-2014
CS 294-10 Fall '14: Visualization. Instructors: Maneesh Agrawala and Jessica Hullman. Course Wiki
CS 160 Spring '14: User Interface Design. Instructors: Maneesh Agrawala and Bjoern Hartmann. TAs: Brittany Cheng, Steve Rubin, and Eric Xiao. Course Wiki
CS 294-10 Fall '13: Visualization. Instructor: Maneesh Agrawala. Course Wiki
CS 160 Spring '12: User Interface Design. Instructor: Maneesh Agrawala. TAs: Nicholas Kong, Anuj Tewari. Course Wiki
CS 294-69 Fall '11: Image Manipulation and Computational Photography. Instructor: Maneesh Agrawala. TA: Floraine Berthouzoz. Course Wiki
CS 294-10 Spring '11: Visualization. Instructor: Maneesh Agrawala. Course Wiki
CS 184 Fall '10: Computer Graphics. Instructor: Maneesh Agrawala. TAs: Robert Carroll, Fu-Chung Huang. Course Wiki
CS 160 Spring '10: User Interface. Instructors: Bjoern Hartmann, Maneesh Agrawala. TAs: Kenrick Kin, Anuj Tewari. Course Wiki
CS 294-10 Spring '10: Visualization. Instructor: Maneesh Agrawala. Course Wiki
CS 160 Spring '09: User Interfaces. Instructors: Maneesh Agrawala, Jeffrey Nichols. TA: Nicholas Kong. Course Wiki

D3 JS Data-Driven Documents
D3.js is a JavaScript library for manipulating documents based on data. D3 helps you bring data to life using HTML, SVG and CSS. D3's emphasis on web standards gives you the full capabilities of modern browsers without tying yourself to a proprietary framework, combining powerful visualization components and a data-driven approach to DOM manipulation.
http://d3js.org

Shan He, Research Fellow at MIT Senseable City Lab
Shan He is a research fellow at MIT Senseable City Lab. She is an architect and a computational design specialist. She is currently a student at MIT Department of Architecture pursuing her SMArchS in Design and Computation. At Senseable, her focus is on data visualization, interactive design and web applications. Prior to coming to MIT she worked as a product designer for Blu Homes, where she worked on developing an online 3-D customization tool with intellectual property. During her time at MIT she has worked as a research assistant for the Clean Energy City Lab at the Advanced Urbanism Center and also for the Mobile Experience Lab at the CMS. Shan holds a B.Arch from Tsinghua University in China and an M.Arch from University of Michigan, Ann Arbor.
http://cargocollective.com/shanhe/About-Shan-He

Gource software version control visualization
Software projects are displayed by Gource as an animated tree with the root directory of the project at its centre. Directories appear as branches with files as leaves. Developers can be seen working on the tree at the times they contributed to the project.
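The tree described here can be pictured with a small data-structure sketch: the root directory is the top node, directories are branches, and files are leaves. This is only an illustration of the idea; the file paths and the `build_tree` and `count_leaves` helpers are hypothetical and are not taken from Gource's implementation.

```python
def build_tree(paths):
    """Build a nested dict: directories map to dicts, files map to None."""
    root = {}
    for path in paths:
        parts = path.split("/")
        node = root
        for directory in parts[:-1]:
            node = node.setdefault(directory, {})
        node[parts[-1]] = None  # a file is a leaf
    return root


def count_leaves(node):
    """Count the files (leaves) below a directory node."""
    return sum(
        1 if child is None else count_leaves(child)
        for child in node.values()
    )


# Hypothetical repository contents
paths = ["README", "src/main.c", "src/util/log.c", "docs/index.html"]
tree = build_tree(paths)
print(tree)
print(count_leaves(tree))
```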
https://www.youtube.com/watch?v=NjUuAuBcoqs#t=73
https://code.google.com/p/gource/

Logstalgia, website access log visualization
Logstalgia (aka ApachePong) is a website access log visualization tool.
https://code.google.com/p/logstalgia/

Andrew Caudwell's Blog
Andrew Caudwell is a software developer and sometimes computer graphics programmer/artist located in Wellington, New Zealand. He is probably best known through his work as the author of several popular data visualizations:
Logstalgia (aka Apache Pong): a visualization of website traffic as a pong-like game
Gource: a force-directed layout software version control visualization
This blog is a collection of his work, experiments, thoughts and ideas on procedurally generated computer graphics and animation.
http://www.thealphablenders.com

Books English

An Architecture for Fast and General Data Processing on Large Clusters by Matei Zaharia, 2014
The past few years have seen a major change in computing systems, as growing data volumes and stalling processor speeds require more and more applications to scale out to distributed systems. Today, a myriad data sources, from the Internet to business operations to scientific instruments, produce large and valuable data streams. However, the processing capabilities of single machines have not kept up with the size of data, making it harder and harder to put to use. As a result, a growing number of organizations (not just web companies, but traditional enterprises and research labs) need to scale out their most important computations to clusters of hundreds of machines.
At the same time, the speed and sophistication required of data processing have grown. In addition to simple queries, complex algorithms like machine learning and graph analysis are becoming common in many domains. And in addition to batch processing, streaming analysis of new real-time data sources is required to let organizations take timely action. Future computing platforms will need to not only scale out traditional workloads, but support these new applications as well.
This dissertation proposes an architecture for cluster computing systems that can tackle emerging data processing workloads while coping with larger and larger scales. Whereas early cluster computing systems, like MapReduce, handled batch processing, our architecture also enables streaming and interactive queries, while keeping the scalability and fault tolerance of previous systems. And whereas most deployed systems only support simple one-pass computations (e.g., aggregation or SQL queries), ours also extends to the multi-pass algorithms required for more complex analytics (e.g., iterative algorithms for machine learning). Finally, unlike the specialized systems proposed for some of these workloads, our architecture allows these computations to be combined, enabling rich new applications that intermix, for example, streaming and batch processing, or SQL and complex analytics.
We achieve these results through a simple extension to MapReduce that adds primitives for data sharing, called Resilient Distributed Datasets (RDDs). We show that this is enough to efficiently capture a wide range of workloads. We implement RDDs in the open source Spark system, which we evaluate using both synthetic benchmarks and real user applications. Spark matches or exceeds the performance of specialized systems in many application domains, while offering stronger fault tolerance guarantees and allowing these workloads to be combined. We explore the generality of RDDs from both a theoretical modeling perspective and a practical perspective to see why this extension can capture a wide range of previously disparate workloads.
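To make the idea behind Resilient Distributed Datasets a little more concrete, here is a toy, single-machine sketch of one principle they rely on: a dataset can be described by the transformations that produced it, so a lost result can be recomputed from its source instead of being stored redundantly. The `ToyRDD` class and its methods are hypothetical and are not Spark's API; partitioning, distribution across machines, and caching are all ignored.

```python
class ToyRDD:
    """A read-only dataset defined by its lineage (source + transformations)."""

    def __init__(self, source, lineage=()):
        self._source = source    # base data, assumed to be re-readable
        self._lineage = lineage  # tuple of (kind, function) steps

    def map(self, fn):
        # Record the step; nothing is computed yet.
        return ToyRDD(self._source, self._lineage + (("map", fn),))

    def filter(self, fn):
        return ToyRDD(self._source, self._lineage + (("filter", fn),))

    def collect(self):
        """Replay the recorded lineage over the source and return a list."""
        data = iter(self._source)
        for kind, fn in self._lineage:
            data = map(fn, data) if kind == "map" else filter(fn, data)
        return list(data)


numbers = ToyRDD(range(10))
even_squares = numbers.map(lambda x: x * x).filter(lambda x: x % 2 == 0)

first_run = even_squares.collect()
# If the result were lost, replaying the same lineage rebuilds it.
second_run = even_squares.collect()
print(first_run)
```

Because each transformation returns a new object and never mutates its parent, the same lineage can be replayed any number of times with the same result.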
http://www.eecs.berkeley.edu/Pubs/TechRpts/2014/EECS-2014-12.pdf

Building towards including the mcRBM model, we have a new tutorial on sampling from energy models:
HMC Sampling - hybrid (aka Hamiltonian) Monte-Carlo sampling with scan()
Building towards including the Contractive auto-encoders tutorial, we have the code for now:
Contractive auto-encoders code - There is some basic doc in the code.
Energy-based recurrent neural network (RNN-RBM):
Modeling and generating sequences of polyphonic music
http://deeplearning.net/tutorial/deeplearning.pdf

Statistical Inference for Everyone, by Professor Bryan Blais, 2014
This is a new approach to an introductory statistical inference textbook, motivated by probability theory as logic. It is targeted to the typical Statistics 101 college student, and covers the topics typically covered in the first semester of such a course. It is freely available under the Creative Commons License, and includes a software library in Python for making some of the calculations and visualizations easier.
I am a professor of Science and Technology, Bryant University and a research professor at the Institute for Brain and Neural Systems, Brown University.
My interests include
Theoretical Neuroscience: learning and memory in neural systems; vision; spike-timing dependent plasticity
Bayesian Inference: frequentist versus Bayesian statistics; Bayesian approaches to learning and memory
Digital to Analog Computer Control: autonomous experiments; neural networks and robotics
Global Resources: dynamics of global resources and economics; population growth, Malthusian traps, and energy
http://web.bryant.edu/~bblais/statistical-inference-for-everyone-sie.html

Mining of Massive Datasets by Jure Leskovec, Anand Rajaraman, Jeff Ullman, 2014
The book
The book is based on Stanford Computer Science course CS246: Mining Massive Datasets (and CS345A: Data Mining).
The book, like the course, is designed at the undergraduate computer science level with no formal prerequisites. To support deeper explorations, most of the chapters are supplemented with further reading references.
The Mining of Massive Datasets book has been published by Cambridge University Press. You can get 20% discount here.
By agreement with the publisher, you can download the book for free from this page. Cambridge University Press does, however, retain copyright on the work, and we expect that you will obtain their permission and acknowledge our authorship if you republish parts or all of it. We are sorry to have to mention this point, but we have evidence that other items we have published on the Web have been appropriated and republished under other names. It is easy to detect such misuse, by the way, as you will learn in Chapter 3.
We welcome your feedback on the manuscript.
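Chapter 3 of the book covers finding similar items. As a rough sketch of why copied text is easy to spot, the snippet below compares documents by the Jaccard similarity of their character k-shingles, one of the techniques that chapter discusses. The shingle length, the sample strings, and the function names are illustrative assumptions, not code from the book.

```python
def shingles(text, k=5):
    """Return the set of all length-k substrings (k-shingles) of text."""
    normalized = " ".join(text.lower().split())
    return {normalized[i:i + k] for i in range(len(normalized) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity: size of the intersection over size of the union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


original = "It is easy to detect such misuse, as you will learn in Chapter 3."
copied = "It is easy to detect such misuse, as you will learn in chapter three."
unrelated = "Gaussian processes provide a probabilistic approach to learning."

sim_copy = jaccard(shingles(original), shingles(copied))
sim_other = jaccard(shingles(original), shingles(unrelated))
print(round(sim_copy, 2), round(sim_other, 2))
```

A lightly edited copy shares most of its shingles with the original, so its similarity stays close to 1, while unrelated text shares almost none. For large collections the book goes on to approximate this comparison (minhashing and locality-sensitive hashing) rather than comparing every pair of sets directly.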
The 2nd edition of the book (v2.1)
The following is the second edition of the book. There are three new chapters, on mining large graphs, dimensionality reduction, and machine learning. There is also a revised Chapter 2 that treats map-reduce programming in a manner closer to how it is used in practice.
Together with each chapter there is also a set of lecture slides that we use for teaching the Stanford CS246: Mining Massive Datasets course. Note that the slides do not necessarily cover all the material covered in the corresponding chapters.
Download the latest version of the book as a single big PDF file (511 pages, 3 MB).
Note to the users of provided slides: We would be delighted if you found our material useful in giving your own lectures. Feel free to use these slides verbatim, or to modify them to fit your own needs. PowerPoint originals are available. If you make use of a significant portion of these slides in your own lecture, please include this message, or a link to our web site: http://www.mmds.org/.
Comments and corrections are most welcome. Please let us know if you are using these materials in your course and we will list and link to your course.
http://infolab.stanford.edu/~ullman/mmds/book.pdf

Social Media Mining by Reza Zafarani, Mohammad Ali Abbasi, Huan Liu, 2014
Added in the kit 29-Oct-2014
The growth of social media over the last decade has revolutionized the way individuals interact and industries conduct business. Individuals produce data at an unprecedented rate by interacting, sharing, and consuming content through social media. Understanding and processing this new type of data to glean actionable patterns presents challenges and opportunities for interdisciplinary research, novel algorithms, and tool development. Social Media Mining integrates social media, social network analysis, and data mining to provide a convenient and coherent platform for students, practitioners, researchers, and project managers to understand the basics and potentials of social media mining. It introduces the unique problems arising from social media data and presents fundamental concepts, emerging issues, and effective algorithms for network analysis and data mining. Suitable for use in advanced undergraduate and beginning graduate courses as well as professional short courses, the text contains exercises of different degrees of difficulty that improve understanding and help apply concepts, principles, and methods in various scenarios of social media mining.
http://dmml.asu.edu/smm/book/
Slides
http://dmml.asu.edu/smm/slides/

Causal Inference by Miguel A. Hernán and James M. Robins, May 14, 2014, Draft
Added in the kit 29-Oct-2014
The book provides a cohesive presentation of concepts of, and methods for, causal inference. Much of this material is currently scattered across journals in several disciplines or confined to technical articles. We expect that the book will be of interest to anyone interested in causal inference, e.g., epidemiologists, statisticians, psychologists, economists, sociologists, other social scientists. The book is geared towards graduate students and practitioners.
We have divided the book in 3 parts of increasing difficulty: causal inference without models, causal inference with models, and causal inference from complex longitudinal data.
We will make drafts of selected book sections available on this website. The idea is that interested readers can submit suggestions or criticisms before the book is published. If you wish to share any comments, please email me or visit us on Facebook (user causalinference).
Warning: These documents are drafts. We are constantly revising and correcting errors without documenting the changes. Please make sure you use the most updated version posted here.
http://www.hsph.harvard.edu/miguel-hernan/causal-inference-book/

Slides for High Performance Python tutorial at EuroSciPy2014 by Ian Ozsvald
Added in the kit 29-Oct-2014
This is Ian Ozsvald's blog. I'm an entrepreneurial geek, a Data Science/ML/NLP/AI consultant, founder of the Annotate.io social media mining API, author of O'Reilly's High Performance Python book, co-organiser of PyDataLondon, co-founder of the SocialTies App, author of the A.I.Cookbook, author of The Screencasting Handbook, a Pythonista, co-founder of ShowMeDo and FivePoundApps and also a Londoner. Here's a little more about me.
https://github.com/ianozsvald/euroscipy2014_highperformancepython
http://ianozsvald.com/2014/08/30/slides-for-high-performance-python-tutorial-at-euroscipy2014-book-signing/

Neural Networks and Deep Learning, 2014
Added in the kit before 24-Oct-2014
Neural Networks and Deep Learning is a free online book. The book will teach you about:
Neural networks, a beautiful biologically-inspired programming paradigm which enables a computer to learn from observational data
Deep learning, a powerful set of techniques for learning in neural networks
Neural networks and deep learning currently provide the best solutions to many problems in image recognition, speech recognition, and natural language processing. This book will teach you the core concepts behind neural networks and deep learning.
The book is currently an incomplete beta draft. More chapters will be added over the coming months. For now, you can:
Read Chapter 1, which explains how neural networks can learn to recognize handwriting

may seem notoriously difficult to understand. Don't get me wrong, the information in those books is extremely important. However, if you are a programmer interested in learning a bit about data mining you might be interested in a beginner's hands-on guide as a first step. That's what this book provides.
This guide follows a learn-by-doing approach. Instead of passively reading the book, I encourage you to work through the exercises and experiment with the Python code I provide. I hope you will be actively involved in trying out and programming data mining techniques. The textbook is laid out as a series of small steps that build on each other until, by the time you complete the book, you have laid the foundation for understanding data mining techniques. This book is available for download for free under a Creative Commons license (see link in footer). You are free to share the book, and remix it. Someday I may offer a paper copy, but the online version will always be free.
http://guidetodatamining.com

Artificial Intelligence, Foundations of Computational Agents by David Poole and Alan Mackworth, 2010
Added in the kit before 24-Oct-2014
Artificial Intelligence: Foundations of Computational Agents is a book about the science of artificial intelligence (AI). The view we take is that AI is the study of the design of intelligent computational agents. The book is structured as a textbook but it is designed to be accessible to a wide audience.
We wrote this book because we are excited about the emergence of AI as an integrated science. As with any science worth its salt, AI has a coherent, formal theory and a rambunctious experimental wing. Here we balance theory and experiment and show how to link them intimately together. We develop the science of AI together with its engineering applications. We believe the adage, "There is nothing so practical as a good theory." The spirit of our approach is captured by the dictum, "Everything should be made as simple as possible, but not simpler." We must build the science on solid foundations; we present the foundations, but only sketch, and give some examples of, the complexity required to build useful intelligent systems. Although the resulting systems will be complex, the foundations and the building blocks should be simple.
http://artint.info/html/ArtInt.html

The Elements of Statistical Learning, T. Hastie, R. Tibshirani, and J. Friedman, 2009
Added in the kit before 24-Oct-2014
During the past decade there has been an explosion in computation and information technology. With it has come vast amounts of data in a variety of fields such as medicine, biology, finance, and marketing. The challenge of understanding these data has led to the development of new tools in the field of statistics, and spawned new areas such as data mining, machine learning, and bioinformatics. Many of these tools have common underpinnings but are often expressed with different terminology. This book describes the important ideas in these areas in a common
http://www.iro.umontreal.ca/~bengioy/papers/ftml_book.pdf

Chapters 9-21 build on the foundation of the first eight chapters to cover a variety of more advanced topics.
http://nlp.stanford.edu/IR-book/pdf/irbookprint.pdf
http://www-nlp.stanford.edu/IR-book/

Kernel Methods in Machine Learning by Thomas Hofmann, Bernhard Schölkopf, Alexander J. Smola, 2008
Added in the kit before 24-Oct-2014
We review machine learning methods employing positive definite kernels. These methods formulate learning and estimation problems in a reproducing kernel Hilbert space (RKHS) of functions defined on the data domain, expanded in terms of a kernel. Working in linear spaces of functions has the benefit of facilitating the construction and analysis of learning algorithms while at the same time allowing large classes of functions. The latter include nonlinear functions as well as functions defined on nonvectorial data. We cover a wide range of methods, ranging from binary classifiers to sophisticated methods for estimation with structured data.
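As a small illustration of a function "expanded in terms of a kernel", the sketch below trains a kernel perceptron, whose decision function has the form f(x) = sum_i alpha_i * y_i * k(x_i, x). It is one simple binary classifier of this kind and is not taken from the paper; the Gaussian kernel width, the XOR data, and the function names are assumptions chosen for illustration.

```python
import math


def rbf_kernel(x, z, gamma=1.0):
    """Gaussian (RBF) kernel, a standard positive definite kernel."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def decision(alphas, xs, ys, x):
    """Kernel expansion f(x) = sum_i alpha_i * y_i * k(x_i, x)."""
    return sum(a * y * rbf_kernel(xi, x) for a, xi, y in zip(alphas, xs, ys))


def train_kernel_perceptron(xs, ys, epochs=20):
    """Increase alpha_i each time training example i is misclassified."""
    alphas = [0.0] * len(xs)
    for _ in range(epochs):
        mistakes = 0
        for i, (x, y) in enumerate(zip(xs, ys)):
            if y * decision(alphas, xs, ys, x) <= 0:
                alphas[i] += 1.0
                mistakes += 1
        if mistakes == 0:  # every training point is classified correctly
            break
    return alphas


# XOR labels: not linearly separable in the original input space.
xs = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
ys = [-1, 1, 1, -1]

alphas = train_kernel_perceptron(xs, ys)
predictions = [1 if decision(alphas, xs, ys, x) > 0 else -1 for x in xs]
print(predictions)
```

The learned function is nonlinear in the inputs, yet the algorithm only ever manipulates the coefficients of a linear expansion over kernel evaluations, which is the convenience the abstract points to.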
https://archive.org/details/arxiv-math0701907

Introduction to Machine Learning, Alex Smola, S.V.N. Vishwanathan, 2008
Added in the kit before 24-Oct-2014
Over the past two decades Machine Learning has become one of the mainstays of information technology and with that, a rather central, albeit usually hidden, part of our life. With the ever increasing amounts of data becoming available there is good reason to believe that smart data analysis will become even more pervasive as a necessary ingredient for technological progress.
The purpose of this chapter is to provide the reader with an overview over the vast range of applications which have at their heart a machine learning problem and to bring some degree of order to the zoo of problems. After that, we will discuss some basic tools from statistics and probability theory, since they form the language in which many machine learning problems must be phrased to become amenable to solving. Finally, we will outline a set of fairly basic yet effective algorithms to solve an important problem, namely that of classification. More sophisticated tools, a discussion of more general problems and a detailed analysis will follow in later parts of the book.
http://alex.smola.org/drafts/thebook.pdf

Pattern Recognition and Machine Learning, Christopher M. Bishop, 2006
Added in the kit before 24-Oct-2014
Pattern recognition has its origins in engineering, whereas machine learning grew out of computer science. However, these activities can be viewed as two facets of the same field, and together they have undergone substantial development over the past ten years. In particular, Bayesian methods have grown from a specialist niche to become mainstream, while graphical models have emerged as a general framework for describing and applying probabilistic models. Also, the practical applicability of Bayesian methods has been greatly enhanced through the development of a range of approximate inference algorithms such as variational Bayes and expectation propagation. Similarly, new models based on kernels have had significant impact on both algorithms and applications.
Chapter 8 Graphical Models
Probabilities play a central role in modern pattern recognition. We have seen in Chapter 1 that probability theory can be expressed in terms of two simple equations corresponding to the sum rule and the product rule. All of the probabilistic inference and learning manipulations discussed in this book, no matter how complex, amount to repeated application of these two equations.
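For reference, the two rules mentioned here can be written as follows for random variables X and Y. The notation is the standard one and is assumed here rather than quoted from the sample chapter.

```latex
\begin{align}
  \text{sum rule:}     \quad & p(X) = \sum_{Y} p(X, Y) \\
  \text{product rule:} \quad & p(X, Y) = p(Y \mid X)\, p(X)
\end{align}
```

Combining the two with the symmetry p(X, Y) = p(Y, X) gives Bayes' theorem, p(Y | X) = p(X | Y) p(Y) / p(X), where the denominator follows from the sum rule as p(X) = sum over Y of p(X | Y) p(Y).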
We could therefore proceed to formulate and solve complicated probabilistic models purely by algebraic manipulation. However, we shall find it highly advantageous to augment the analysis using diagrammatic representations of probability distributions, called probabilistic graphical models. These offer several useful properties:
1. They provide a simple way to visualize the structure of a probabilistic model and can be used to design and motivate new models.
2. Insights into the properties of the model, including conditional independence properties, can be obtained by inspection of the graph.
3. Complex computations, required to perform inference and learning in sophisticated models, can be expressed in terms of graphical manipulations, in which underlying mathematical expressions are carried along implicitly.
http://research.microsoft.com/en-us/um/people/cmbishop/PRML/pdf/Bishop-PRML-sample.pdf
http://research.microsoft.com/en-us/um/people/cmbishop/prml/
Gaussian
processes
for
Machine
Learning,
C.
Rasmussen
and
C.
Williams,
2006
Added
in
the
kit
before
24-oct-2014
Gaussian
processes
(GPs)
provide
a
principled,
practical,
probabilistic
approach
to
learning
in
kernel
machines.
GPs
have
received
increased
attention
in
the
machine-learning
community
over
the
past
decade,
and
this
book
provides
a
long-needed
systematic
and
unified
treatment
of
theoretical
and
practical
aspects
of
GPs
in
machine
learning.
The
treatment
is
comprehensive
and
self-contained,
targeted
at
researchers
and
students
in
machine
learning
and
applied
statistics.
The
book
deals
with
the
supervised-learning
problem
for
both
regression
and
classification,
and
includes
detailed
algorithms.
A
wide
variety
of
covariance
(kernel)
functions
are
presented
and
their
properties
discussed.
Model
selection
is
discussed
both
from
a
Bayesian
and
a
classical
perspective.
Many
connections
to
other
well-known
techniques
from
machine
learning
and
statistics
are
discussed,
including
support-vector
machines,
neural
networks,
splines,
regularization
networks,
relevance
vector
machines
and
others.
Theoretical
issues
including
learning
curves
and
the
PAC-Bayesian
framework
are
treated,
and
several
approximation
methods
for
learning
with
large
datasets
are
discussed.
The
book
contains
illustrative
examples
and
exercises,
and
code
and
datasets
are
available
on
the
Web.
Appendixes
provide
mathematical
background
and
a
discussion
of
Gaussian
Markov
processes.
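To make the regression case concrete, here is a minimal sketch of Gaussian process prediction with a squared-exponential covariance function, using only the Python standard library. It assumes a zero-mean prior, scalar inputs, and fixed hyperparameters; the function names (`rbf_kernel`, `solve`, `gp_predict`) are hypothetical, and plain Gaussian elimination stands in for the Cholesky-based solvers normally used in practice.

```python
import math


def rbf_kernel(a, b, length_scale=1.0, variance=1.0):
    """Squared-exponential covariance between two scalar inputs."""
    return variance * math.exp(-0.5 * ((a - b) / length_scale) ** 2)


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def gp_predict(x_train, y_train, x_new, noise=1e-6):
    """Posterior mean and variance of a zero-mean GP at the point x_new."""
    n = len(x_train)
    # K + noise * I: covariance matrix of the noisy training targets.
    k_train = [
        [rbf_kernel(x_train[i], x_train[j]) + (noise if i == j else 0.0)
         for j in range(n)]
        for i in range(n)
    ]
    # Covariances between each training input and the new input.
    k_star = [rbf_kernel(x, x_new) for x in x_train]
    alpha = solve(k_train, y_train)  # (K + noise I)^-1 y
    v = solve(k_train, k_star)       # (K + noise I)^-1 k_star
    mean = sum(k * a for k, a in zip(k_star, alpha))
    variance = rbf_kernel(x_new, x_new) - sum(k * w for k, w in zip(k_star, v))
    return mean, variance
```

For example, `gp_predict([-1.0, 0.0, 1.0], [math.sin(x) for x in (-1.0, 0.0, 1.0)], 0.5)` returns a mean and a variance for the input 0.5. The variance shrinks towards the noise level near training inputs and returns to the prior variance far away from them.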
http://www.gaussianprocess.org/gpml/chapters/
Bayesian
Machine
Learning
by
Chakraborty,
Sounak,
2005
Added
in
the
kit
before
24-oct-2014
PhD
Thesis
https://archive.org/details/bayesianmachinel00chak
Machine
Learning
by
Tom
Mitchell,
2005
Added
in
the
kit
before
24-oct-2014
Policy
on
use:
You
are
welcome
to
download
these
chapters
for
your
personal
use,
or
for
use
in
classes
you
teach.
In
return,
I
ask
only
two
things:
Please
do
not
re-post
these
documents
on
the
internet.
If
you
wish
to
make
them
available
to
your
students,
point
them
directly
to
this
site.
If
you
find
errors
please
send
me
email
at
Tom.Mitchell@cmu.edu
I
hope
you
find
these
useful!
Tom
Mitchell
http://www.cs.cmu.edu/%7Etom/NewChapters.html
http://www.cs.cmu.edu/%7Etom/mlbook-chapter-slides.html
Information
Theory,
Inference,
and
Learning
Algorithms,
David
MacKay,
2003
Added
in
the
kit
before
24-oct-2014
This
book
is
aimed
at
senior
undergraduates
and
graduate
students
in
Engineering,
Science,
Mathematics,
and
Computing.
It
expects
familiarity
with
calculus,
probability
theory,
and
linear
algebra
as
taught
in
a
first-
or
second-year
undergraduate
course
on
mathematics
for
scientists
and
engineers.
Conventional
courses
on
information
theory
cover
not
only
the
beautiful
theoretical
ideas
of
Shannon,
but
also
practical solutions
to
communication
problems.
This
book
goes
further,
bringing
in
Bayesian
data
modelling,
Monte
Carlo
methods,
variational
methods,
clustering
algorithms,
and
neural
networks.
Why
unify
information
theory
and
machine
learning?
Because
they
are
two
sides
of
the
same
coin.
In
the
1960s,
a
single
field,
cybernetics,
was
populated
by
information
theorists,
computer
scientists,
and
neuroscientists,
all
studying
common
problems.
Information
theory
and
machine
learning
still
belong
together.
Brains
are
the
ultimate
compression
and
communication
systems.
And
the
state-of-the-art
algorithms
for
both
data
compression
and
error-correcting
codes
use
the
same
tools
as
machine
learning.
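One way to see the link between compression and probabilistic modelling is Shannon entropy: a model that assigns probability p(x) to a symbol can, ideally, encode it in -log2 p(x) bits. The sketch below estimates this from symbol frequencies in a string. The function names are hypothetical, and the empirical-frequency model is an assumption chosen for brevity rather than anything taken from the book.

```python
import math
from collections import Counter


def entropy_bits(text):
    """Empirical Shannon entropy of the symbols in text, in bits per symbol."""
    counts = Counter(text)
    total = len(text)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def ideal_code_length_bits(text):
    """Total bits if each symbol x costs -log2 p(x) under the empirical model."""
    return entropy_bits(text) * len(text)
```

A string with two equally frequent symbols needs about one bit per symbol under this model, while a string made of a single repeated symbol needs essentially none. A sharper predictive model therefore means shorter codes.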
http://www.inference.phy.cam.ac.uk/itprnn/book.html
https://archive.org/details/MackayInformationTheoryFreeEbookReleasedByAuthor
Free
Book
List
Added
in
the
kit
before
24-oct-2014
E-Books
for
free
online
viewing
and/or
download
http://www.e-booksdirectory.com/listing.php?category=284
Books
-
Spanish
Coming
soon
Books
-
German
Coming
soon
Books
-
Italian
Coming
soon
Books
-
French
Coming
soon
Books
-
Russian
Pattern
Recognition
by
..,
2011
http://www.recognition.mccme.ru/pub/RecognitionLab.html/slbook.pdf
Algorithmic
models
of
learning
classification:
rationale,
comparison,
selection,
2014
http://www.machinelearning.ru/wiki/images/c/c3/Donskoy14algorithmic.pdf
More
coming
soon
Books
-
Japanese
Coming
soon
Books
-
Chinese
Books
-
Portuguese
Coming
soon
https://skillsmatter.com/explore?content=skillscasts&location=&q=machine+learning
Slides
Slideshare.com
http://www.slideshare.net/search/slideshow?searchfrom=header&q=machine+learning
Slides.com
http://slides.com/explore?search=machine%20learning
Powershow.com
http://www.powershow.com/search/presentations/machine-learning
Speaker
Deck
https://speakerdeck.com/search?q=machine+learning
Conferences
ICML,
Bellevue,
US
2011
http://www.icml-2011.org
http://techtalks.tv/icml-2011/
ICML,
Haifa,
Israel
2010
http://www.icml2010.org
Full
archive
of
ICML
http://machinelearning.org/icml.html
Machine
Learning
Conference
Videos
http://techtalks.tv/search/results/?q=machine+learning
Annual
Machine
Learning
Symposium
6th
http://techtalks.tv/sixth-annual-machine-learning-symposium/
8th
http://www.nyas.org/Events/Detail.aspx?cid=2cc3521e-408a-460e-b159-e774734bcbea
Archive
http://www.nyas.org/whatwedo/fos/machine.aspx
Meetup
-
English
631
Machine
Learning
Meetups
in
the
World
http://machine-learning.meetup.com/
Data
Science
Weekly
List
of
Meetups
List
of
Data
Science
Meetups:
NYC,
San
Francisco,
Washington
DC,
Boston,
Chicago,
Seattle,
Denver,
Austin,
Atlanta,
Toronto,
Vancouver,
London,
Berlin,
Paris,
Amsterdam,
Tel
Aviv,
Dubai,
Delhi,
Bangalore,
Singapore,
Sydney
http://www.datascienceweekly.org/data-science-resources/data-science-meetups
Other
Meetups
missing
in
Data
Science
Weekly
London
Machine
Learning
Meetup
http://www.meetup.com/London-Machine-Learning-Meetup/
London
Deep
Learning
Meetup
http://www.meetup.com/Deep-Learning-London/
Blog
-
English
Data
Science
Weekly
The
Data
Science
Weekly
Blog
contains
interviews
to
better
understand
how
people
are
using
Data
and
Data
Science
to
change
the
world.
http://www.datascienceweekly.org/blog
Yann
LeCun,
Google+
My
main
research
interests
are
Machine
Learning,
Computer
Vision,
Mobile
Robotics,
and
Computational
Neuroscience.
I
am
also
interested
in
Data
Compression,
Digital
Libraries,
the
Physics
of
Computation,
and
all
the
applications
of
machine
learning
(Vision,
Speech,
Language,
Document
understanding,
Data
Mining,
Bioinformatics).
https://plus.google.com/+YannLeCunPhD/posts
Igor
Carron
Blog
Nuit
Blanche
is
a
blog
that
focuses
on
Compressive
Sensing,
Advanced
Matrix
Factorization
Techniques,
Machine
Learning
as
well
as
many
other
engaging
ideas
and
techniques
needed
to
handle
and
make
sense
of
very
high
dimensional
data
also
known
as
Big
Data.
http://nuit-blanche.blogspot.co.uk
KDD
Community,
Knowledge
discovery
and
Data
Mining
KDD
bringing
together
the
data
mining,
data
science
and
analytics
community
http://www.sigkdd.org/blog
Kaggle
Blog
http://blog.kaggle.com
Digg
Digg
is
a
news
aggregator
with
an
editorially
driven
front
page,
aiming
to
select
stories
specifically
for
the
Internet
audience
such
as
science,
trending
political
issues,
and
viral
Internet
issues.
(source
wikipedia)
http://digg.com/search?q=machine+learning
Feedly
Found
a
site
you
like?
Use
the
+feedly
button
to
add
it
to
your
feedly
reading
list
http://feedly.com/index.html#explore%2F%23Machine%20Learning
Mlwave
Learning
Machine
Learning
ML
Wave
is
a
platform
that
talks
about
machine
learning
and
data
science.
It
was
founded
in
2014
by
the
Dutch
Kaggle
user
Triskelion.
http://mlwave.com
FastML
Machine
Learning
made
easy
FastML
probably
grew
out
of
a
frustration
with
papers
you
need
a
PhD
in
math
to
understand
and
with
either
no
code
or
half-baked
Matlab
implementation
of
homework-assignment
quality.
We
understand
that
some
cutting-edge
researchers
might
have
no
interest
in
providing
the
goodies
for
free,
or
just
no
interest
in
such
down-to-earth
matters.
But
we
don't
have
time
nor
desire
to
become
experts
in
every
machine
learning
topic.
Fortunately,
there
is
quite
a
lot
of
good
software
with
acceptable
documentation.
http://fastml.com
Beating
the
Benchmark
http://beatingthebenchmark.blogspot.co.uk
YOU
CANalytics
Welcome
to
UCAnalytics.com,
the
idea
behind
this
website
is
to
explore
the
applications
of
advanced
Analytics
and
data
mining
in
business.
Analytics
is
an
effort
to
explore
interesting
but
hidden
patterns
in
data
for
business
growth.
This
idea
has
inspired
me
to
name
the
site
UCAnalytics:
YOU
CANalytics
UCAnalytics:
YOU
SEE
Analytics
UCAnalytics:
University
for
Analytics
This
is
sort
of
like
finding
patterns
in
a
cluster
of
clouds,
a
fun
exercise.
However,
we
will
explore
some
serious
business
applications
and
usage
of
Analytics
over
here.
A
few
topics
including
1.
Analytical
Scorecard
Development
2.
Customer
Segmentation
to
gain
deeper
knowledge
of
customer
behaviour
3.
Data
mining
and
Big
Data
Analytics
4.
Business
Applications
of
Bayesian
Statistics
Nate
Silver
has
made
Bayesian
cool!
5.
Challenges
&
Pitfalls
in
Business
Forecasting
Time
Series
Modelling
6.
Business
Growth
through
right
Design-of-Experiments
7.
Business
Growth
&
Risk
Estimation
through
Analytical
simulations
Look
forward
to
share
my
ideas
and
hear
back
from
you.
Roopam
Upadhyay
http://ucanalytics.com/blogs
Trevor
Stephens
Blog
http://trevorstephens.com
Mozilla
Hacks
Mozilla
Hacks
is
one
of
the
key
resources
for
people
developing
for
the
Open
Web,
talking
about
news
and
in-depth
descriptions
of
technologies
and
features.
https://hacks.mozilla.org/?s=machine+learning
Banach's
Algorithmic
Corner,
University
of
Warsaw
This
blog
is
maintained
by
members
of
Algorithmic
group
at
University
of
Warsaw:
http://corner.mimuw.edu.pl
DataCamp
Blog
http://blog.datacamp.com
Natural
Language
Processing
Blog,
Hal
Daume
http://nlpers.blogspot.co.uk
Maxim
Milakov
Blog
I
am
a
researcher
in
machine
learning
and
high-performance
computing.
I
designed
and
implemented
nnForge
-
a
library
for
training
convolutional
and
fully
connected
neural
networks,
with
CPU
and
GPU
(CUDA)
backends.
You
will
find
my
thoughts
on
convolutional
neural
networks
and
the
results
of
applying
convolutional
ANNs
for
various
classification
tasks
in
the
Blog.
http://www.milakov.org
Alfonso
Nieto-Castanon
Blog
I
work
on
the
field
of
computational
neuroscience,
and
my
background
is
on
neuroscience
(Ph.D.
Cognitive
and
Neural
Systems,
Boston
University)
and
engineering
(B.S./M.S.
Telecommunication
Engineering,
Universidad
de
Valladolid).
My
areas
of
specialization
are
modeling
and
statistics,
fMRI
analysis
methods,
and
signal
processing.
http://www.alfnie.com/home
Persontyle
Blog
Every
object
on
earth
is
generating
data,
including
our
homes,
our
cars
and
yes
even
our
bodies.
Data
is
the
by-product
of
our
new
digital
existence.
Data
has
the
potential
to
revolutionize
the
way
business,
government,
science,
research,
and
healthcare
are
carried
out.
Data
presents
unprecedented
opportunities
to
those
who
have
the
skills
and
expertise
to
use
it
to
unveil
patterns,
insights,
signals
and
predict
trends
which
was
never
possible
before.
In
massively
connected
data
driven
world,
it
is
imperative
that
the
workforce
of
today
and
tomorrow
is
able
to
understand
what
data
is
available
and
use
scientific
methods
to
analyze
and
interpret
it.
We're
here
to
help
you
learn
and
apply
the
art
and
science
of
turning
data
into
meaningful
insights
and
intelligent
predictions
http://www.persontyle.com/blog/
Analytics
Vidhya
Learn
everything
about
Analytics
Welcome
to
Analytics
Vidhya!
For
those
of
you,
who
are
wondering
what
is
Analytics
Vidhya,
Analytics
can
be
defined
as
the
science
of
extracting
insights
from
raw
data.
The
spectrum
of
analytics
starts
from
capturing
data
and
evolves
into
using
insights
/
trends
from
this
data
to
make
informed
decisions.
Vidhya
on
the
other
hand
is
a
Sanskrit
noun
meaning
Knowledge
or
Clarity
on
a
subject.
Knowledge,
which
has
been
gained
through
reading
literature
or
through
self
practice
/
experimentation.
Through
this
blog,
I
want
to
create
a
passionate
community,
which
dedicates
itself
in
study
of
Analytics.
I
share
my
learning
and
tips
on
Analytics
through
this
blog.
http://www.analyticsvidhya.com/blog/
Bugra
Akyildiz's
Blog
Great
Blog
(Notes)
both
theoretical
and
practical
I
work
as
a
Machine
Learning/NLP
Engineer
at
CB
Insights
where
I
apply
machine
learning
algorithms
to
NLP
problems.
I
received
B.S
from
Bilkent
University
and
M.Sc
from
New
York
University
focusing
signal
processing
and
machine
learning.
http://bugra.github.io
Data
origami
8
great
data
blogs
to
follow
https://www.dataorigami.net/blogs/great-data-blogs
Rasbt's
Blog
A
collection
of
tutorials
and
examples
for
solving
and
understanding
machine
learning
and
pattern
classification
tasks
Links
to
useful
resources
https://github.com/rasbt/pattern_classification#links-to-useful-resources
Gilles
Louppe's
Blog
Understanding
Random
Forest,
PhD
Thesis
https://github.com/glouppe/phd-thesis/blob/master/thesis.pdf
AI
Topics
AITopics
is
a
mediated
information
portal
provided
by
AAAI
(The
Association
for
the
Advancement
of
Artificial
Intelligence),
with
the
goal
of
communicating
the
science
and
applications
of
AI
to
interested
people
around
the
world.
Contents:
Good Starting Places
General Readings
Organizations
Educational Resources
Hardware and Software
Competitions
News
Videos
Podcasts
Classic Articles & Books
http://aitopics.org/topic/machine-learning
AI
International
This
international
AI
site
is
designed
to
help
you
locate
AI
research
efforts
in
your
country
or
region.
Pages
on
this
site
will
link
to
local
AI
societies,
universities,
labs,
and
other
research
efforts.
http://www.aiinternational.org/index.html
Joseph
Misiti's
Blog
machine-learning
+
applied
mathematics
+
django
+
hadoop.
Co-Founder
of
@socialq.
https://github.com/josephmisiti
https://medium.com/@josephmisiti
MIRI,
Machine
Intelligence
Research
Institute
The
mathematics
of
safe
machine
intelligence
MIRI's
mission
is
to
ensure
that
the
creation
of
smarter-than-human
intelligence
has
a
positive
impact.
We
aim
to
make
intelligent
machines
behave
as
we
intend
even
in
the
absence
of
immediate
human
supervision.
Much
of
our
current
research
deals
with
reflection,
an
AI's
ability
to
reason
about
its
own
behavior
in
a
principled
rather
than
ad-hoc
way.
We
focus
our
research
on
AI
approaches
that
can
be
made
transparent
(e.g.
principled
decision
algorithms,
not
genetic
algorithms),
so
that
humans
can
understand
why
the
AIs
behave
as
they
do.
http://intelligence.org/blog/
Kevin
Davenport
Data
Blog
Added
in
the
kit
04-Nov-2014
I'm
a
tech
enthusiast
interested
in
automation,
machine
learning,
and
conveying
complex
statistical
models
through
visualization.
Recent Posts:
Regularized Logistic Regression Intuition (October 27, 2014)
Dynamic Time-Series Modeling (May 22, 2014)
A Real World Introduction to Information Entropy (April 21, 2014)
The Cost Function of K-Means (February 14, 2014)
Mahalanobis Distance and Outliers (December 3, 2013)
Quick Look: Facebook's Kaggle Competition (October 21, 2013)
Significance Magazine Contribution (August 28, 2013)
Absolute Deviation Around the Median (August 8, 2013)
My Trip to Spain: The R User Conference 2013 (July 23, 2013)
Gradient Boosting: Analysis of LendingClub's Data (July 4, 2013)
Shiny Server on CentOS (June 29, 2013)
Data imputation I (June 12, 2013)
ggplot2 graphics in a loop (April 30, 2013)
Predicting Dichotomous Outcomes I (April 14, 2013)
Data visualization with R and ggplot2 (March 28, 2013)
Yvonne
Rogers
Blog
Yvonne
Rogers
is
a
Professor
of
Interaction
Design,
the
director
of
UCLIC
and
a
deputy
head
of
the
Computer
Science
department
at
UCL.
Her
research
interests
are
in
the
areas
of
ubiquitous
computing,
interaction
design
and
human-computer
interaction.
A
central
theme
is
how
to
design
interactive
technologies
that
can
enhance
life
by
augmenting
and
extending
everyday,
learning
and
work
activities.
This
involves
informing,
building
and
evaluating
novel
user
experiences
through
creating
and
assembling
a
diversity
of
pervasive
technologies.
http://www.interactiveingredients.com
Blog
-
Spanish
Coming
soon
Blog
-
Italian
Coming
soon
Blog
-
German
Coming
soon
Blog
-
French
Coming
soon
Blog
-
Russian
Coming
soon
Blog
-
Japanese
Coming
soon
Blog
-
Chinese
Coming
soon
Blog
-
Portuguese
Coming
soon
Journals
-
English
Journal
of
Machine
Learning
Research,
MIT
Press
http://jmlr.org
Machine
Learning
Journal
(last
article
could
be
downloaded
for
free)
http://link.springer.com/journal/10994
Machine
Learning
(Theory)
This
is
an
experiment
in
the
application
of
a
blog
to
academic
research
in
machine
learning
and
learning
theory
by
John
Langford.
Exactly
where
this
experiment
takes
us
and
how
the
blog
will
turn
out
to
be
useful
(or
not)
is
one
of
those
prediction
problems
we
so
dearly
love
in
machine
learning.
http://hunch.net
List
of
Journals
on
Microsoft
Academic
Research
website
http://academic.research.microsoft.com/RankList?entitytype=4&topDomainID=2&subDomainID=6&last=0&start=1&end=100
Wired
magazine
http://www.wired.com/tag/machine-learning/
Data
Science
Central
Data
Science
Central
is
the
industry's
online
resource
for
big
data
practitioners.
From
Analytics
to
Data
Integration
to
Visualization,
Data
Science
Central
provides
a
community
experience
that
includes
a
robust
editorial
platform,
social
interaction,
forum-based
technical
support,
the
latest
in
technology,
tools
and
trends
and
industry
job
opportunities.
http://www.datasciencecentral.com
Journals - Spanish
Coming soon
Journals - German
Coming soon
Journals - Italian
Coming soon
Journals - French
Coming soon
Journals - Russian
Coming soon
Journals - Japanese
Coming soon
Journals - Chinese
Coming soon
Journals - Portuguese
Coming soon
Reddit
in
Russian
http://www.reddit.com/r/MachineLearning_Ru
http://www.reddit.com/r/MachineLearning_Ru/comments/249f7x/meta______faq/
Fun - English
http://ai.stanford.edu/courses/
Carnegie
Mellon
University
Machine
Learning
Department
The
Machine
Learning
Department
is
an
academic
department
within
Carnegie
Mellon
University's
School
of
Computer
Science.
We
focus
on
research
and
education
in
all
areas
of
statistical
machine
learning.
Watch
an
interview
with
Tom
Mitchell,
Department
Head:
http://videolectures.net/mlas06_mitchell_itm/
http://www.ml.cmu.edu
Noah's
ARK
Research
Group,
Carnegie
Mellon
University
Noah's
ARK
is
Noah
Smith's
informal
research
group
at
the
Language
Technologies
Institute,
School
of
Computer
Science,
Carnegie
Mellon
University.
(The
research
is
formal;
the
group
is
informal.)
As
you
may
have
guessed,
our
research
focuses
on
problems
of
ambiguity
and
uncertainty
in
natural
language
processing,
including
morphology,
syntax,
semantics,
translation,
and
behavioral/social
phenomena
observed
through
language,
all
viewed
through
a
computational
lens.
http://www.ark.cs.cmu.edu
Intelligent
Interactive
Systems
Group
at
Harvard
University
Intelligent
Interactive
Systems
are
fundamentally
hard
to
design
because
they
require
intelligent
technology
that
is
well
suited
for
people's
abilities,
limitations,
and
preferences;
they
also
require
entirely
novel
interactions
that
can
give
the
user
a
predictable
and
reliable
experience
despite
the
fact
that
the
underlying
technology
is
inherently
proactive,
unpredictable,
and
occasionally
wrong.
Thus,
design
of
successful
intelligent
interactive
systems
requires
intimate
knowledge
of
and
ability
to
innovate
in
two
very
disparate
areas:
human-computer
interaction
and
artificial
intelligence
or
machine
learning.
Our
projects
span
the
full
range
from
formal
user
studies
to
statistical
machine
learning.
We
have
worked
on
developing
new
intelligent
technologies
to
enable
novel
interactions
(e.g.,
SUPPLE
system)
and
on
understanding
the
principles
underlying
how
people
interact
with
intelligent
systems
(e.g.,
the
project
on
exploring
the
design
space
of
adaptive
user
interfaces).
Our
Brain-Computer
Interface
project
aims
at
developing
a
new
set
of
interactions
for
efficiently
controlling
complex
applications,
and
we
are
also
interested
in
building
and
studying
complete
applications.
One
particular
area
of
interest
is
the
ability-based
user
interfaces
--
an
approach
for
adapting
interactions
to
the
individual
abilities
of
people
with
impairments
or
of
able-bodied
people
in
unusual
situations.
http://iis.seas.harvard.edu
http://iis.seas.harvard.edu/resources/
BIDS
brings
together
researchers
across
disciplines
and
enhances
career
paths
for
data
scientists
through
a
number
of
newly
created
Data
Science
Fellows
positions,
graduate
student
fellowships,
boot-camps,
special
classes,
and
conferences
of
interest
to
the
academic
community
and
general
public.
The
Institute's
initial
support
is
provided
by
a
5-year
$12.5
million
grant
from
the
Moore
and
Sloan
Foundations
together
with
significant
support
provided
by
UC
Berkeley.
The
Moore-Sloan
Data
Science
Environment
also
supports
similar
programs
with
shared
goals
and
objectives
at
the
University
of
Washington
and
New
York
University.
http://vcresearch.berkeley.edu/DATASCIENCE/BIDS
Data
Science
Lecture
Series:
Maximizing
Human
Potential
Using
Machine
Learning-Driven
Applications
https://www.youtube.com/channel/UCBBd3JxQl455JkWBeulc-9w?spfreload=10
Princeton
University
Department
of
Computer
Science
-
ARTIFICIAL
INTELLIGENCE
&
MACHINE
LEARNING
Machine
learning
and
computational
perception
research
at
Princeton
is
focused
on
the
theoretical
foundations
of
machine
learning,
the
experimental
study
of
machine
learning
algorithms,
and
the
interdisciplinary
application
of
machine
learning
to
other
domains,
such
as
biology
and
information
retrieval.
Some
of
the
techniques
that
we
are
studying
include
boosting,
probabilistic
graphical
models,
support-vector
machines,
and
nonparametric
Bayesian
techniques.
We
are
especially
interested
in
learning
from
large
and
complex
data
sets.
Example
applications
include
habitat
modeling
of
species
distributions,
topic
models
of
large
collections
of
scientific
articles,
classification
of
brain
images,
protein
function
classification,
and
extensions
of
the
Wordnet
semantic
network.
http://www.cs.princeton.edu/research/areas/mlearn
University
of
California,
Los
Angeles
(UCLA)
Research Laboratories and Groups:
Automated Reasoning Group (Adnan Darwiche)
Biocybernetics Laboratory (Joe DiStefano)
Center for Vision, Cognition, Learning and Art (Song-Chun Zhu)
Cognitive Systems Laboratory (Judea Pearl)
Concurrent Systems Laboratory (Yuval Tamir)
Digital Arithmetic and Reconfigurable Architecture Laboratory (Milos Ercegovac)
ER: Embedded and Reconfigurable System Design (Majid Sarrafzadeh)
Information and Data Management Group (multiple faculty)
Internet Research Laboratory (Lixia Zhang)
Laboratory for Embedded Collaborative Systems (LECS) (archived CENS documents)
Laboratory for Advanced Systems Research (LASR) (Peter Reiher)
MAGIX: Computer Graphics & Vision Laboratory (Demetri Terzopoulos)
University
of
Washington
Machine
Learning
UW
is
one
of
the
world's
top
centers
of
research
in
machine
learning.
We
are
active
in
most
major
areas
of
ML
and
in
a
variety
of
applications
like
natural
language
processing,
vision,
computational
biology,
the
Web,
and
social
networks.
Check
out
the
links
on
the
left
to
find
out
who's
who
and
what's
happening
in
ML
at
UW.
And
be
sure
to
see
our
CSE-wide
efforts
in
Big
Data
https://www.cs.washington.edu/research/ml/
"Big
Data"
Research
and
Education
UW
CSE
is
driving
the
"Big
Data"
revolution.
Our
traditional
strength
in
data
management
(Magda
Balazinska,
Bill
Howe,
Dan
Suciu),
machine
learning
(Pedro
Domingos),
and
open
information
extraction
(Oren
Etzioni,
Dan
Weld)
has
recently
been
augmented
by
key
hires
in
machine
learning
(Emily
Fox,
Carlos
Guestrin,
Ben
Taskar)
and
data
visualization
(Jeff
Heer).
Our
efforts
are
coordinated
with
those
of
outstanding
researchers
in
the
University
of
Washington's
top-ten
programs
in
Statistics,
Biostatistics,
and
Applied
Mathematics,
among
others.
Through
the
University
of
Washington
eScience
Institute
(directed
by
Ed
Lazowska)
we
are
integrally
involved
in
ensuring
that
researchers
across
the
campus
have
access
to
cutting-edge
approaches
to
data-driven
discovery.
http://www.cs.washington.edu/research/bigdata
Social
Robotics
Lab
-
Yale
University
The
members
of
our
lab
perform
research
over
a
diverse
collection
of
topics.
Though
these
projects
approach
social
and
developmental
research
from
varied
perspectives,
they
all
share
common
themes.
Robots
provide
an
embodied,
empirical
testbed
that
allows
for
repeated
validation.
Robots
also
enable
the
use
of
social
interactions
as
part
of
the
modeled
experimental
environment,
staying
grounded
in
real-world
perceptions,
and
appropriately
integrating
perceptual,
motor,
and
cognitive
skills.
http://scazlab.yale.edu/publications/all-publications
Georgia
Institute
of
Technology
ML@GT
http://ml.cc.gatech.edu
University
of
Texas
at
Austin
Machine
Learning
Research
Group
Machine
learning
is
the
study
of
adaptive
computational
systems
that
improve
their
performance
with
experience.
University
of
Montreal
Machine
Learning
Lab
The
LISA
(machine
learning
lab)
aims
towards
improving
our
understanding
of
the
principles
that
give
rise
to
powerful
learning
and
to
intelligence,
which
will
be
important
to
make
significant
progress
on
learning
algorithms
and
artificial
intelligence
(AI).
Acquiring
the
kind
of
complex
knowledge
necessary
for
AI
requires
some
form
of
learning,
with
the
ability
to
discover
hidden
relationships
and
statistical
structure
that
may
be
highly
complex,
with
many
interacting
factors
of
variations
explaining
the
observed
high-dimensional
data
that
sensors
can
provide.
According
to
us
this
is
the
main
challenge
for
machine
learning
and
AI.
Like
the
brain,
deep
learning
algorithms
are
based
on
several
levels
of
representation
and
processing,
creating
several
levels
of
abstraction.
Compared
to
learning
algorithms
based
on
shallower
architectures,
deep
learners
have
the
potential
to
efficiently
represent
highly
complex
functions
and
distributions.
We
explore
various
learning
algorithms
for
deep
learning,
based
in
particular
on
unsupervised
pre-training
(e.g.,
various
kinds
of
Boltzmann
machines
and
auto-encoders).
Unsupervised
pre-training
allows
us
to
exploit
very
large
quantities
of
mostly
unlabeled
examples
(such
as
documents,
images,
and
videos
from
the
web).
The
learned
representations
capture
the
salient
factors
of
variation
(and
invariances)
implicitly
present
in
the
data,
and
can
be
exploited
in
the
context
of
several
supervised
learning
tasks
(multi-task
learning,
self-taught
learning,
semi-supervised
learning).
http://lisa.iro.umontreal.ca/index_en.html
University of Sherbrooke, Artificial Intelligence
Three teams work in this research area; other projects are led by researchers working individually.
The research team on intelligent tutoring systems, ASTUS (Apprentissage par Système Tutoriel de l'Université de Sherbrooke), works on the following themes: knowledge representation, user modelling, human-machine interaction, educational psychology, and cognitive science.
The data mining research team, Prospectus (Prospection de données à l'Université de Sherbrooke), works on the following themes: data mining, knowledge mining and modelling, pattern recognition, segmentation and classification, non-symbolic artificial intelligence methods, neural networks and Bayesian networks, and detection of latent structures and behaviours.
The research team on planning in artificial intelligence, PLANIART, works on the following themes: trajectory planning, behaviour planning, and plan recognition in video games and in
More to come
Institute
will
empower
Imperial
and
its
partners
to
collaborate
in
the
pursuit
of
world
class
data-driven
innovation.
http://www3.imperial.ac.uk/data-science
The
University
of
Edinburgh,
Institute
for
Adaptive
and
Neural
Computation
http://www.anc.ed.ac.uk/machine-learning/
Cambridge
University
We
are
a
part
of
the
Computational
and
Biological
Learning
Laboratory
located
in
the
Department
of
Engineering
at
the
University
of
Cambridge.
The
research
in
our
group
is
very
broad,
and
we
are
interested
in
all
aspects
of
machine
learning.
Particular
strengths
of
the
group
are
in
Bayesian
approaches
to
modelling
and
inference
in
statistical
applications.
The
type
of
work
we
do
can
range
from
studying
fundamental
concepts
in
applied
Bayesian
statistics,
all
the
way
to
getting
our
algorithms
to
perform
competitively
against
the
state-of-the-art
in
big-data
applications.
We
also
work
in
a
broad
range
of
application
domains,
including
neuroscience,
bioinformatics,
finance,
social
networks,
and
physics,
just
to
name
a
few,
and
we
actively
seek
to
collaborate
with
other
groups
within
the
Department
of
Engineering,
throughout
the
university
as
a
whole,
and
with
other
groups
within
the
UK
and
around
the
world.
If
you
are
interested
in
finding
out
more
about
our
research,
please
visit
our
Publications
page,
or
visit
the
individual
research
pages
of
our
group
members.
http://mlg.eng.cam.ac.uk
About Us
also
invite
you
to
keep
up
to
date
with
our
activities
by
following
us
on
Twitter
@intelsensing
and
to
enjoy
our
research
videos
at
http://cis.eecs.qmul.ac.uk.
Professor
Andrea
Cavallaro
Director
http://cis.eecs.qmul.ac.uk
Videos
https://www.youtube.com/user/intelsensing/feed?spfreload=10
ICRI,
The
Intel
Collaborative
Research
Institute
The
Intel
Collaborative
Research
Institute
is
concerned
with
how
to
enhance
the
social,
economic
and
environmental
well-being
of
cities
by
advancing
compute,
communication
and
social
constructs
to
deliver
innovations
in
system
architecture,
algorithms
and
societal
participation.
http://www.cities.io
MACHINE
LEARNING
RESEARCH
GROUPS
in
EUROPE,
FRANCE
Magnet, MAchine learninG in information NETworks, INRIA, France
The Magnet project aims to design new machine learning based methods geared towards mining information networks. Information networks are large collections of interconnected data and documents like citation networks and blog networks among others. For this, we will define new structured prediction methods for (networks of) texts based on machine learning algorithms in graphs. Such algorithms include node classification, link prediction, clustering and probabilistic modeling of graphs. Envisioned applications include browsing, monitoring and recommender systems, and more broadly information extraction in information networks. Application domains cover social networks for cultural data and e-commerce, and biomedical informatics.
https://team.inria.fr/magnet/
Sierra Team - Ecole Normale Superieure, CNRS, INRIA
SIERRA is based in the Laboratoire d'Informatique de l'École Normale Supérieure (CNRS/ENS/INRIA UMR 8548) and is a joint research team between INRIA Rocquencourt, École Normale Supérieure de Paris and Centre National de la Recherche Scientifique. We follow four main research directions:
Supervised learning: This part of our research focuses on methods where, given a set of examples of input/output pairs, the goal is to predict the output for a new input, with research on kernel methods, calibration methods, structured prediction, and multi-task learning.
Unsupervised learning: We focus here on methods where no output is given and the goal is to find structure of certain known types (e.g., discrete or low-dimensional) in the data, with a focus on matrix factorization, statistical tests, dimension reduction, and semi-supervised learning.
machinelearningsalon kit 28th December 2014
Don't keep an old version! machinelearningsalon kit is regularly updated!
http://zaa.mimuw.edu.pl
More to come
MACHINE LEARNING RESEARCH GROUPS in ASIA, INDIA
Indian Institute of Science
Machine Learning and Learning Theory Group
Our research group focuses on the design and analysis of machine learning algorithms, and on understanding the mathematical and statistical properties of solutions to machine learning problems. Members of the group have strong backgrounds in several areas including probability, linear algebra, convex analysis, optimization, spectral graph theory, and others, enabling us to explore problems from a variety of different viewpoints. Our emphasis is on developing a strong fundamental understanding of various problems of current interest in machine learning and statistical learning theory. Some of our current research directions include designing and analyzing algorithms for problems such as ranking and various types of structured prediction tasks, understanding statistical consistency properties for such problems, exploring new issues in machine learning such as those related to privacy, and selected applications of machine learning in computational biology and medicine.
http://drona.csa.iisc.ernet.in/~mllt/
Indian Institute of Technology of Kanpur
https://www.google.com/search?q=machine%20learning&domains=iitk.ac.in&sitesearch=www.iitk.ac.in&gws_rd=ssl
More to come
http://eecs.pku.edu.cn/eecs_english/InstComputationalLinguistics.shtml
PKU Real course online
http://www.grids.cn/
Beijing University of Technology
Beijing Key Lab of Multimedia and Intelligent Software Technology
Artificial Intelligence and Knowledge Engineering
The research fields in this direction include fundamental research in Knowledge Science and Knowledge Engineering, research and application of Data Mining and Machine Learning, and Knowledge-Based Computer Aided Animation Generation. In those fields, the laboratory has carried out 8 programs from the National Natural Science Foundation (including 1 subprogram of a major research program of the National Natural Science Foundation), 1 program from Key Programs in the National Science
Nanjing University
LAMDA Group
LAMDA is affiliated with the National Key Laboratory for Novel Software Technology and the Department of Computer Science & Technology, Nanjing University, China. It is located in the Computer Science and Technology Building on the Xianlin campus of Nanjing University, mainly in Rm910. The Founding Director of LAMDA is Prof. Zhi-Hua Zhou. "LAMDA" means "Learning And Mining from DatA". The main research interests of LAMDA include machine learning, data mining, pattern recognition, information retrieval, evolutionary computation, neural computation, and some other related areas. Currently our research mainly involves: ensemble learning, semi-supervised and active learning, multi-instance and multi-label learning, cost-sensitive and class-imbalance learning, metric learning, dimensionality reduction and feature selection, structure learning and clustering, theoretical foundations of evolutionary computation, improving comprehensibility, content-based image retrieval, web search and mining, face recognition, computer-aided medical diagnosis, bioinformatics, etc.
http://lamda.nju.edu.cn/MainPage.ashx
More to come
MACHINE LEARNING RESEARCH GROUPS in ASIA, RUSSIA
Moscow State University
http://www.msu.ru/
More to come
Mona Singh
Added in the kit before 24-Oct-2014
My group develops algorithms for a diverse set of problems in computational molecular biology. We are particularly interested in predicting specificity in protein interactions and uncovering how molecular interactions and functions vary across context, organisms and individuals. We leverage high-throughput biological datasets in order to develop data-driven algorithms for predicting protein interactions and specificity; for analyzing biological networks in order to uncover cellular organization, functioning, and pathways; for uncovering protein functions via sequences and structures; and for analyzing proteomics and sequencing data. An appreciation of protein structure guides much of our research.
http://www.cs.princeton.edu/~mona/
Olga Troyanskaya
Added in the kit before 24-Oct-2014
The goal of my research is to bring the capabilities of computer science and statistics to the study of gene function and regulation in the biological networks through integrated analysis of biological data from diverse data sources--both existing and yet to come (e.g. from diverse gene expression data sets and proteomic studies). I am designing systematic and accurate computational and statistical algorithms for biological signal detection in high-throughput data sets. More specifically, I am interested in developing methods for better gene expression data processing and algorithms for integrated analysis of biological data from multiple genomic data sets and different types of data sources (e.g. genomic sequences, gene expression, and proteomics data).
http://reducio.princeton.edu/cm/node/13
UCLA, US
Judea Pearl, Cognitive Systems Laboratory
Added in the kit before 24-Oct-2014
Judea Pearl (born 1936) is an Israeli-born American computer scientist and philosopher, best known for championing the probabilistic approach to artificial intelligence and the development of Bayesian networks (see the article on belief propagation). He is also credited with developing a theory of causal and counterfactual inference based on structural models (see the article on causality). He is the 2011 winner of the ACM Turing Award, the highest distinction in computer science, "for fundamental contributions to artificial intelligence through the development of a calculus for probabilistic and causal reasoning". (source: Wikipedia)
http://bayes.cs.ucla.edu/csl_papers.html
Rice University, US
Justin Esarey Lectures, Assistant Professor of Political Science
Dr. Justin Esarey is an Assistant Professor of Political Science at Rice University who specializes in political methodology. His areas of expertise include detecting and presenting context-specific relationships, model specification and sensitivity, the analysis of binary data, laboratory social experimentation, and promoting thoughtful inference (and thinking about inference) by using technology to make methodological resources available to the scholarly public. His recent substantive projects study the relationship between corruption and female participation in government, the effect of "naming and shaming" on human rights abuse, and the behavioral implications of political ideology.
https://www.youtube.com/user/jeesarey/videos?spfreload=10
Justin Esarey Publications & Software, Assistant Professor of Political Science, Rice University
http://jee3.web.rice.edu/research.htm
University of Maryland, US
Hal Daumé III
Added in the kit before 24-Oct-2014
I am Hal Daumé III, an Associate Professor in Computer Science (also UMIACS and Linguistics) at the University of Maryland; I was previously in the School of Computing at the University of Utah (CV). Although I'd like to be known for my research in language (computational linguistics and natural language processing) and machine learning (structured prediction, domain adaptation and Bayesian methods), I am probably best known for my NLPers blog. I associate myself most with conferences like ACL, ICML, EMNLP and NIPS. At UMD, I'm affiliated with the Computational Linguistics lab, the machine learning reading group, the language science program and the AI group, and interact closely with LINQS and computer vision.
http://hal3.name
I love this field because it allows us to apply our expertise to a variety of tough problems, including film and photo special effects (computational photography), action analysis (of people, animals, and cells), and authoring systems (for architecture, animation, presentations) that make the most of user effort. "Motion reveals everything" used to be my main research mantra, but that has now taken hold sufficiently (obviously NOT just through my efforts!) that it no longer needs championing.
http://www0.cs.ucl.ac.uk/staff/g.brostow/#Research
Jun Wang
Added in the kit before 24-Oct-2014
My research focus is on the areas of information retrieval, large scale data mining, multimedia content analysis, and statistical pattern recognition; current research covers both theoretical and practical aspects: portfolio theory and statistical modeling of information retrieval, data mining and collaborative filtering (recommendation), web economy and online advertising, user-centric information seeking, social, the wisdom of crowds, approaches for content understanding, organisation, and retrieval, peer-to-peer information retrieval and filtering, and multimedia content analysis, indexing and retrieval.
http://scholar.google.com/citations?user=wIE1tY4AAAAJ&hl=en
David Jones Lab
Added in the kit before 24-Oct-2014
My main research interests are in protein structure prediction and analysis, simulations of protein folding, Hidden Markov Model methods, transmembrane protein analysis, machine learning applications in bioinformatics, de novo protein design methodology, and genome analysis including the application of intelligent software agents. New areas of research include the use of high throughput computing and Grid technology for bioinformatics applications, analysis and prediction of protein disorder, expression array data analysis and the analysis and prediction of protein function and protein-protein interactions.
http://bioinf.cs.ucl.ac.uk/publications/
Simon Prince
Added in the kit before 24-Oct-2014
My initial work addressed human stereo vision. My doctoral thesis concerned the solution of the binocular stereo correspondence problem in the human visual system. I also worked on the physiology of stereo vision in my subsequent post-doctoral research. I became interested in computer vision and made the switch in 2000. My first Computer Science research was on time-series methods for the solution of the inverse problem in Optical Tomography with Simon Arridge at UCL. In Singapore, I worked for several years on augmented reality. This involved developing algorithms
Karon MacLean
http://www.cs.ubc.ca/labs/spin/publications/index.html
Alan Mackworth
http://www.cs.ubc.ca/~mack/Publications/sort_date.html
Dinesh K. Pai
http://www.cs.ubc.ca/~pai/
David Poole
http://www.cs.ubc.ca/~poole/publications.html