Machinelearningsalon Kit 28-12-2014

machinelearningsalon.org
How to use the Machine Learning Salon's Kit? ........................................................... 16
What is the Machine Learning Salon's Kit? .................................................................................................. 16
What is not the Machine Learning Salon's Kit? .......................................................................................... 16
Why are we not on GitHub? ................................................................................................................................ 16
If you are a CTO who wants to recruit smart Machine Learning developers ............................... 16
If you want to become a contributor .............................................................................................................. 16
If you want to remove a link .............................................................................................................................. 16
If you want to add a better description of your website ........................................................................ 16
If you are willing to give a discount to the Machine Learning Salon's readers ............................ 17
About the Founder of The Machine Learning Salon's Website & Kit ............................................... 17
Contact ......................................................................................................................................................................... 17
MOOC or Opencourseware - English .......................................................................... 18
Coursera ...................................................................................................................................................................... 18
Machine Learning Stanford Course ................................................................................................................. 18
Practical Machine Learning ................................................................................................................................. 18
Machine Learning Washington Course .......................................................................................................... 18
Core Concepts in Data Analysis (Higher School of Economics) ........................................................... 19
Neural Networks for Machine Learning ........................................................................................................ 19
Natural Language Processing ........................................................................................................................... 19
Probabilistic Graphical Models ......................................................................................................................... 20
Stanford Engineering Everywhere .................................................................................................................. 20
EdX ................................................................................................................................................................................ 21
Learning from data (Caltech) ............................................................................................................................ 21
Artificial Intelligence (BerkeleyX) .................................................................................................................... 21
Big Data and Social Physics (Ethics) .............................................................................................................. 22
Introduction to Computational Thinking and Data Science ................................................................ 22
MIT OpenCourseWare (OCW) ........................................................................................................................... 22
VLAB MIT Enterprise Forum Bay Area, Machine Learning Videos ................................................... 22
Foundations of Machine Learning by Mehryar Mohri - 10 years of Homework with
Solutions and Lecture Slides, not to be missed! ....................................................................................... 23
IPAM, Institute for Pure and Applied Mathematics, Videos, UCLA ................................................... 23
Carnegie Mellon University ................................................................................................................................ 24
Carnegie Mellon University (CMU) Video resources ................................................................................ 24
Convex Optimisation, Fall 2013, by Barnabas Poczos and Ryan Tibshirani, CMU ..................... 24
Machine Learning, Spring 2011, by Tom Mitchell, CMU ........................................................................ 24
Metacademy Concept list and roadmap list ................................................................................................ 25
Harvard University ................................................................................................................................................. 25
Advanced Machine Learning, Fall 2013 (Free access to most of the videos) ........................................ 25
Data Science Course, Fall 2013 ......................................................................................................................... 25
Oxford University, Nando de Freitas video lectures ................................................................................ 26
Cambridge University Machine Learning Slides, Spring 2014 ............................................................ 26
Caltech University, Learning from Data ........................................................................................................ 26
University College London Discovery ............................................................................................................ 26
University College London, Supervised Learning ..................................................................................... 26
Yann LeCun's Publications .................................................................................................................................. 27
Don't keep an old version! machinelearningsalon kit is regularly updated!
Francis Bach, Ecole Normale Superieure - Courses and Exercises with solutions (English-French) ........................................................................................................................................................................ 27
Technion, Israel Institute of Technology, Machine Learning Videos ............................................... 28
NPTEL, National Programme on Technology Enhanced Learning, India ....................................... 28
Probability Theory and Applications .............................................................................................................. 28
Pattern Recognition ............................................................................................................................................... 28
Videolectures.net .................................................................................................................................................... 28
MLSS Machine Learning Summer Schools Videos .................................................................................... 28
MLSS Videos from 2004 to 2012 ....................................................................................................................... 28
MLSS Videos 2012 ................................................................................................................................................... 28
MLSS Videos 2012 ................................................................................................................................................... 28
Max Planck Institute for Intelligent Systems Tubingen, MLSS Videos 2013 .................................. 29
GoogleTechTalks ..................................................................................................................................................... 29
Machine Learning ................................................................................................................................................... 29
Deep Learning ........................................................................................................................................................... 29
Udacity Opencourseware .................................................................................................................................... 29
Supervised Learning (select "View Courseware" for free access) ..................................................... 29
Unsupervised Learning (select "View Courseware" for free access) ................................................ 29
Reinforcement Learning (select "View Courseware" for free access) ............................................. 30
Mathematicalmonk Machine Learning .......................................................................................................... 30
Judea Pearl Symposium ....................................................................................................................................... 30
Machine Learning Reading Group, Indian Institute of Science ........................................................... 30
SIGDATA, Indian Institute of Technology Kanpur .................................................................................... 30
Hakka Labs ................................................................................................................................................................. 30
Open Yale Course .................................................................................................................................................... 31
Columbia University .............................................................................................................................................. 31
Machine Learning resources .............................................................................................................................. 31
Applied Data Science by Ian Langmore and Daniel Krasner ............................................................... 31
Deep Learning .......................................................................................................................................................... 32
BigDataWeek Videos ............................................................................................................................................. 32
Neural Information Processing Systems Foundation (NIPS) Video resources ............................ 32
Hong Kong Open Source Conference 2013 (English&Chinese) .......................................................... 33
ICLR 2014 Videos .................................................................................................................................................... 33
ICLR 2013 Videos .................................................................................................................................................... 33
Machine Learning Conference Videos ........................................................................................................... 33
Internet Archive ...................................................................................................................................................... 35
University of Berkeley .......................................................................................................................................... 35
AMP Camps, Big Data Bootcamp, UC Berkeley ........................................................................................... 35
Resources and Tools of Noah's ARK Research Group ............................................................................. 35
ESAC DATA ANALYSIS AND STATISTICS WORKSHOP 2014 ............................................................... 36
The Royal Society .................................................................................................................................................... 36
Statistical and causal approaches to machine learning by Professor Bernhard Schölkopf ... 37
Deep Learning .......................................................................................................................................................... 37
Deep Learning RNNaissance with Dr. Juergen Schmidhuber .............................................................. 37
Introduction to Deep Learning with Python by Alec Radford ............................................................. 37
Miscellaneous ........................................................................................................................................................... 38
Introduction To Modern Brain-Computer Interface Design by Swartz Center for
Computational Neuroscience ............................................................................................................................. 38
Distributed Computing Courses (lectures, exercises with solutions) by ETH Zurich, Group of
Prof. Roger Wattenhofer ...................................................................................................................................... 38

The wonderful and terrifying implications of computers that can learn | Jeremy Howard |
TEDxBrussels ............................................................................................................................................................. 39
MOOC or Opencourseware - Spanish ......................................................................... 39

MOOC or Opencourseware - German ......................................................................... 39
MOOC or Opencourseware - Italian ........................................................................... 39
MOOC or Opencourseware - French .......................................................................... 39
University of Laval (French Canadian) .......................................................................................................... 39
Apprentissage automatique ............................................................................................................................... 39
Théorie algorithm. des graphes ........................................................................................................................ 39
Hugo Larochelle, Apprentissage automatique, French Canadian ...................................................... 40
Francis Bach, Ecole Normale Superieure - Courses and Exercises with solutions (English-French) ........................................................................................................................................................................ 40
College de France, Mathematics and Digital Science, French .............................................................. 41
MOOC or Opencourseware - Russian ......................................................................... 41

Russian Machine Learning Resources ........................................................................................................... 41
Yandex School The Yandex School of Data Analysis ................................................................................ 42
Alexander Dyakonov Resources ...................................................................................................................... 42
Unknown in Data Mining and Machine Learning (2013) ..................................................................... 42
Introduction to Data Mining (2012) ............................................................................................................... 42
Tricks in Data Mining (2011) ............................................................................................................................. 42
Manual "Logic Games, Data Mining, Weka, RapidMiner, MATLAB" (2010) ................................. 42
Machine Learning lectures by Konstantin Vorontsov. ........................................................................... 43
MOOC or Opencourseware - Japanese ....................................................................... 43
MOOC or Opencourseware - Chinese ........................................................................ 43
Yeeyan Coursera Chinese Classroom ............................................................................................................ 43
Hong Kong Open Source Conference 2013 .................................................................................................. 43
Guokr.com .................................................................................................................................................................. 43
Machine Learning ................................................................................................................................................... 43
Data Mining ............................................................................................................................................................... 43
Artificial Intelligence ............................................................................................................................................. 44
MOOC or Opencourseware - Portuguese .................................................................... 44
Aprendizado de Maquina by Bianca Zadrozny, Instituto de Computação, UFF, 2010 ............... 44
Algoritmo de Aprendizado de Máquina by Aurora Trinidad Ramirez Pozo, Universidade
Federal do Paraná, UFPR .................................................................................................................................... 44
Digital Library, Universidad de Sao Paulo ................................................................................................... 44
MOOC or Opencourseware - Hebrew&English ........................................................... 44

Open University of Israel ..................................................................................................................................... 44
Applications ............................................................................................................... 45
MIT Media Lab .......................................................................................................................................................... 45
TEDx San Francisco, Connected Reality ........................................................................................................ 45
Emotion&Pain Project .......................................................................................................................................... 45
NHK Documentary - Robot Revolution - Developing Robots for Dangerous Fukushima
Decommission Process ......................................................................................................................................... 46
IBM Research ............................................................................................................................................................ 46

Visualizing MBTA Data: An interactive exploration of Boston's subway system ....................... 46
Commercial Applications (listed without any transfer of money) ............................... 47

Google Glass ............................................................................................................................................................... 47
Google self-driving car .......................................................................................................................................... 47
SenseFly ...................................................................................................................................................................... 47
Free access to Research papers - English .................................................................... 47
Cambridge University Publications page ..................................................................................................... 47
Google Scholar .......................................................................................................................................................... 47
Google Research ...................................................................................................................................................... 47
Yahoo Research ....................................................................................................................................................... 47
Microsoft Research ................................................................................................................................................ 48
Journal from MIT Press ........................................................................................................................................ 48
INRIA ............................................................................................................................................................................ 48
Open Source Software - English ................................................................................. 49
JAVA .............................................................................................................................................................................. 49
Weka 3: Data Mining Software in Java .......................................................................................................... 49
A deep-learning library for Java ....................................................................................................................... 49
List of Java ML Software by Machine Learning Mastery ........................................................................ 49
List of Java ML Software by MLOSS ................................................................................................................. 49
PYTHON ...................................................................................................................................................................... 49
Theano Library for Deep Learning .................................................................................................................. 49
Introduction to Deep Learning with Python ............................................................................................... 50
Udacity - Programming foundations with Python .................................................................................... 50
Scikit-learn, Machine Learning in Python .................................................................................................... 50
PyData .......................................................................................................................................................................... 50
PyData NYC 2014 Videos ..................................................................................................................................... 50
PyData, The Complete Works by Rohit Sivaprasad .................................................................................. 51
Anaconda .................................................................................................................................................................... 51
IPython Interactive Computing ......................................................................................................................... 51
SciPy .............................................................................................................................................................................. 51
NumPy .......................................................................................................................................................................... 51
matplotlib ................................................................................................................................................................... 52
pandas .......................................................................................................................................................................... 52
SymPy ........................................................................................................................................................................... 52
Orange .......................................................................................................................................................................... 52
Pythonic Perambulations: How to be a Bayesian in Python ................................................................ 52
emcee ............................................................................................................................................................................ 52
PyMC ............................................................................................................................................................................. 53
Pylearn2 ...................................................................................................................................................................... 53
Giant list of Python learning resources .......................................................................................................... 53
PyCon US 2014 .......................................................................................................................................................... 53
PyCon India 2012 .................................................................................................................................................... 53
PyCon India 2013 .................................................................................................................................................... 53
Montreal Python ...................................................................................................................................................... 53
SciPy 2014 .................................................................................................................................................................. 54
PyLadies London Meetup resources ................................................................................................................ 54
Python Tools for Machine Learning by CB Insights .................................................................................. 54
Python Tutorials by Jessica McKellar ........................................................................................................... 54
OCTAVE ....................................................................................................................................................................... 54
JULIA ............................................................................................................................................................................. 55
Julia by example ....................................................................................................................................................... 55
The R PROJECT for Statistical Computing .................................................................................................... 55
R ...................................................................................................................................................................................... 55
R Graph Gallery ........................................................................................................................................................ 55
Code School - R Course .......................................................................................................................................... 56
Coursera R programming .................................................................................................................................... 56
Open Intro R Labs .................................................................................................................................................... 56
R Tutorial .................................................................................................................................................................... 56
DataCamp R Course ................................................................................................................................................ 56
R Bloggers ................................................................................................................................................................... 56
STAN Software ......................................................................................................................................................... 57
List of Machine Learning Open Source Software ...................................................................................... 57
Google Prediction API ........................................................................................................................................... 57
Reddit ........................................................................................................................................................................... 58
SHOGUN toolbox ................................................................................................................................................... 58
Infer.NET, Microsoft Research .......................................................................................................................... 58
F# Software Foundation ...................................................................................................................................... 58
BigML ........................................................................................................................................................................... 59
BRML Toolbox in Matlab - David Barber Toolbox, University College London .......................... 59
Dmitry Efimov Software ...................................................................................................................................... 59
SCILAB ......................................................................................................................................................................... 59
OverFeat and Torch7, CILVR Lab @ NYU ..................................................................................................... 59
Mloss.org .................................................................................................................................................................... 59
Sourceforge ............................................................................................................................................................... 60
Freecode ..................................................................................................................................................................... 60
Open Machine Learning Workshop organized by Alekh Agarwal, Alina Beygelzimer, and
John Langford, August 2014 ............................................................................................................................... 60
Maxim Milakov Software ..................................................................................................................................... 60
Alfonso Nieto-Castanon Software .................................................................................................................... 61
Lib Skylark ................................................................................................................................................................. 61
Mutual Information Text Explorer .................................................................................................................. 61
Data Science Resources by Jonathan Bower on GitHub ......................................................................... 61
Joseph Misiti's Blog ................................................................................................................................................ 62
Michael Waskom GitHub repositories ........................................................................................................... 62
Visualizing distributions of data ...................................................................................................................... 62
Exploring Seaborn and Pandas based plot types in HoloViews by Philipp John Frederic
Rudiger ........................................................................................................................................................................ 62
Open Source Hong Kong ...................................................................................................................................... 63
Lamda Group, Nanjing University ................................................................................................................... 63
Big Data/Cloud Computing - English .......................................................................... 63

Apache SPARK .......................................................................................................................................................... 63
Apache Spark Machine Learning Library ..................................................................................................... 63
2013 Spark Summit exercises ............................................................................................................................ 63
2014 Spark Summit Training ............................................................................................................................. 64
Apache Spark Summit Videos ............................................................................................................................ 64
Databricks Videos .................................................................................................................................................... 64
Apache MAHOUT ..................................................................................................................................................... 65
Apache Mahout ML library ................................................................................................................................. 65

Apache Mahout on Javaworld ............................................................................................................................ 65
Deeplearning4j ......................................................................................................................................................... 65
Udacity opencourseware "Intro to Hadoop and MapReduce" ............................................................ 66
Storm Apache ........................................................................................................................................................... 66
Michael Vogiatzis Blog ......................................................................................................................................... 66
Elasticsearch ............................................................................................................................................................. 67
Prediction IO ............................................................................................................................................................. 67
Container Cluster Manager ................................................................................................................................. 67
Domino Data Labs .................................................................................................................................................. 67
Data Science Central .............................................................................................................................................. 68
Amazon Web Services Videos ............................................................................................................................ 68
Google Cloud Computing Videos ...................................................................................................................... 68
VLAB: Deep Learning: Intelligence from Big Data, Stanford Graduate School of Business .... 68
Machine Learning and Big Data in Cyber Security Eyal Kolman Technion Lecture .................. 68
Chaire Machine Learning Big Data, Telecom Paris Tech (Videos in French) ................................ 68
An Architecture for Fast and General Data Processing on Large Clusters by Matei Zaharia,
2014 .............................................................................................................................................................................. 68
Predictive Modeling Competitions - English ............................................................... 69

LinkedIn Economic Graph Challenge, Deadline: 15-12-2014, $25,000 research award ......... 69
ChaLearn ..................................................................................................................................................................... 70
IMAGENET Large Scale Visual Recognition Challenge 2014 (closed) ............................................. 71
Kaggle ........................................................................................................................................................................... 71
Kaggle Competition Past Solutions ................................................................................................................. 71
Kaggle Connectomics Winning Solution Research Article .................................................................... 72
Solution to the Galaxy Zoo Challenge ............................................................................................................. 72
Winning 2 Kaggle in class competitions on spam ..................................................................................... 72
Matlab Benchmark for Packing Santas Sleigh translated in Python .............................................. 72
TEDx San Francisco, Jeremy Howard talk (Connecting Devices with Algorithms) .................... 72
CrowdANALYTICS .................................................................................................................................................. 72
Challenges for governmental applications .................................................................................................. 72
InnoCentive Challenge Center ........................................................................................................................... 72
TunedIT ....................................................................................................................................................................... 72
Ants, AI Challenge, sponsored by Google, 2011 ......................................................................................... 72
International Collegiate Programming Contest ........................................................................................... 72
Dream challenges .................................................................................................................................................... 73
Texata ........................................................................................................................................................................... 73
Cisco Internet of Things Innovation Grand Challenge ............................................................................ 73
Predictive Modeling Competitions - Spanish .............................................................. 74
Predictive Modeling Competitions - German .............................................................. 74
Predictive Modeling Competitions - Italian ................................................................ 74
Predictive Modeling Competitions - French ............................................................... 74
RATP OpenDataLab results ................................................................................................................................ 74
Predictive Modeling Competitions - Russian .............................................................. 74
Competition Avito.ru-2014: Recognition of contact information in images ................................. 75
Russian AI Cup - Competition Programming Artificial Intelligence, 2013 .................................... 75
Predictive Modeling Competitions - Portuguese ......................................................... 76

Open Dataset - English .............................................................................................. 76
The Text REtrieval Conference (TREC) Datasets ...................................................................................... 76
HDX Humanitarian Data Exchange ................................................................................................................. 77
World Data Bank ..................................................................................................................................................... 77
US Dataset .................................................................................................................................................................. 77
US City Open Data Census ................................................................................................................................... 78
Machine Learning repository ............................................................................................................................ 78
IMAGENET ................................................................................................................................................................. 78
Stanford Large Network Dataset Collection ................................................................................................ 78
Deep Learning datasets ........................................................................................................................................ 79
Open Government Data (OGD) Platform India ........................................................................................... 79
Yahoo Datasets ......................................................................................................................................................... 79
Windows Azure Marketplace ............................................................................................................................ 80
Amazon Public Data Sets ..................................................................................................................................... 80
Wikipedia: Database Download ....................................................................................................................... 80
Gutenberg project (Free books available in different format, useful for NLP) ............................ 80
Freebase ...................................................................................................................................................................... 80
Datamob Data ........................................................................................................................................................... 80
Reddit Datasets ........................................................................................................................................................ 81
100+ Interesting Data Sets for Statistics ...................................................................................................... 81
Data portal of the City of Chicago .................................................................................................................... 81
Data portal of the City of Seattle ....................................................................................................................... 81
Data portal of the City of LA ............................................................................................................................... 81
California Department of Water Resources ................................................................................................ 81
Data portal of the City of Dallas ........................................................................................................................ 82
Data portal of the City of Austin ....................................................................................................................... 82
How to produce and use datasets: lessons learned, mlwave ............................................................... 82
MITx and HarvardX release MOOC datasets and visualization tools ............................................... 82
Finding the perfect house using open data, Justin Palmer's Blog ...................................................... 82
Synapse ....................................................................................................................................................................... 82
NYC Taxi Trips Data from 2013 ........................................................................................................................ 82
Sebastian Raschka's Dataset Collections ...................................................................................................... 83
Awesome Public Datasets by Xiaming Chen, Shanghai, China ............................................................ 83
UK Dataset .................................................................................................................................................................. 83
LONDON DATASTORE - 591 datasets ........................................................................................................... 83
Transport For London Open Data, UK ........................................................................................................... 83
Gaussian Processes List of Datasets ............................................................................................................... 83
The New York Times Linked Open Data (Beta) ......................................................................................... 84
Google Public Data Explorer .............................................................................................................................. 84
Open Dataset - French ............................................................................................... 85
Montreal, Portail Donnees Ouvertes (French&English), Canada ....................................................... 85
Insee, France ............................................................................................................................................................. 85
RATP Open Data, French Tube in Paris, France ......................................................................................... 85
L'Open-Data français cartographié ................................................................................................................. 85
Open Dataset - China ................................................................................................. 85
Lamda Group ............................................................................................................................................................ 85
Data Visualisation ...................................................................................................... 86

Visualization Lab Gallery, Computer Science Division, University of California, Berkeley .... 86
Visualization Lab Software, Computer Science Division, University of California, Berkeley 87
Visualization Lab Course Wiki, Computer Science Division, University of California, Berkeley
......................................................................................................................................................................................... 87
Mike Bostock ............................................................................................................................................................. 87
Eyeo Festival ............................................................................................................................................................. 87
MIT Data Collider .................................................................................................................................................... 87
D3 JS Data-Driven Documents ........................................................................................................................... 88
Shan He, Research Fellow at MIT Senseable City Lab ............................................................................. 88
Gource software version control visualization .......................................................................................... 88
Logstalgia, website access log visualization ................................................................................................ 88
Andrew Caudwell's Blog ...................................................................................................................................... 88
Books - English .......................................................................................................... 89

An Architecture for Fast and General Data Processing on Large Clusters by Matei Zaharia,
2014 .............................................................................................................................................................................. 89
Deep Learning (Artificial Intelligence), An MIT Press book in preparation, by Yoshua
Bengio, Ian Goodfellow and Aaron Courville, 20-Oct-2014 ................................................................. 90
Deep Learning Tutorial by LISA Lab, University of Montreal, 2014 ................................................. 90
Statistical Inference for Everyone, by Professor Bryan Blais, 2014 ................................................. 91
Mining of Massive Datasets by Jure Leskovec, Anand Rajaraman, Jeff Ullman, 2014 ............... 91
Social Media Mining by Reza Zafarani, Mohammad Ali Abbasi, Huan Liu, 2014 ........................ 92
Causal Inference by Miguel A. Hernán and James M. Robins, May 14, 2014, Draft .................... 93
Slides for High Performance Python tutorial at EuroSciPy2014 by Ian Ozsvald ........................ 93
Neural Networks and Deep Learning, 2014 ................................................................................................ 93
Probabilistic Programming and Bayesian Methods for Hackers by Cameron Davidson-Pilon,
2014 .............................................................................................................................................................................. 94
Bayesian Reasoning and Machine Learning, David Barber, 2012 (online version 02-2014) 94
Past, Present, and Future of Statistical Science by COPSS, 2014 ........................................................ 94
Essentials of Metaheuristics by Sean Luke, 2013 ....................................................................................... 95
Statistical Model Building, Machine Learning, and the Ah-Ha Moment by Grace Wahba,
2013 .............................................................................................................................................................................. 95
An Introduction to Statistical Learning with Applications in R, by Gareth James, Daniela
Witten, Trevor Hastie, Robert Tibshirani, 2013 (first printing) .......................................................... 95
A course in Machine Learning by Hal Daume, 2012 ................................................................................ 95
Machine Learning in Action, Peter Harrington, 2012 ............................................................................. 95
A Programmer's Guide to Data Mining, by Ron Zacharski, 2012 ....................................................... 95
Artificial Intelligence, Foundations of Computational Agents by David Poole and Alan
Mackworth, 2010 .................................................................................................................................................... 96
The Elements of Statistical Learning, T. Hastie, R. Tibshirani, and J. Friedman, 2009 ............. 96
Learning Deep Architectures for AI by Yoshua Bengio, 2009 ............................................................... 97
An Introduction to Information Retrieval by Christopher D. Manning, Prabhakar Raghavan,
Hinrich Schütze, 2009 ........................................................................................................................................... 97
Kernel Methods in Machine Learning by Thomas Hofmann; Bernhard Schölkopf; Alexander
J. Smola, 2008 ......................................................................................................................................................... 98
Introduction to Machine Learning, Alex Smola, S.V.N. Vishwanathan, 2008 ................................ 98
Pattern Recognition and Machine Learning, Christopher M. Bishop, 2006 .................................. 98
Gaussian processes for Machine Learning, C. Rasmussen and C. Williams, 2006 ...................... 99
Bayesian Machine Learning by Chakraborty, Sounak, 2005 .............................................................. 100

Machine Learning by Tom Mitchell, 2005 .................................................................................................. 100

Information Theory, Inference, and Learning Algorithms, David MacKay, 2003 ....................... 100
Free Book List ......................................................................................................................................................... 100
Free resource book (need to sign in) ........................................................................................................... 101
Free ML ebooks on it-ebooks, but this website is controversial; please read stackoverflow
before accessing this website yourself ............................................................................................ 101
Wikipedia: Machine Learning, the Complete Guide ............................................................................... 101
ISSUU .......................................................................................................................................................................... 101
Books - Spanish ........................................................................................................ 102

Books - German ....................................................................................................... 102
Books - Italian .......................................................................................................... 102
Books - French ......................................................................................................... 102
Books - Russian ....................................................................................................... 102
Pattern Recognition by .., 2011 ................................................................................................ 102
Algorithmic models of learning classification: rationale, comparison, selection, 2014 ......... 102
Books - Japanese ...................................................................................................... 102
Books - Chinese ........................................................................................................ 103
Blog recommending useful books ................................................................................................................. 103
Textbook for Statistics ........................................................................................................................................ 103
Introduction to Pattern recognition ............................................................................................................. 103
Translated version of Machine Learning by Tom Mitchell: ................................................................ 103
Books - Portuguese .................................................................................................. 103
Presentation, Infographics and Documents - English ................................................ 103
Meetup's Presentations ...................................................................................................................................... 103
Slides .......................................................................................................................................................................... 103
Slideshare.com ....................................................................................................................................................... 103
Slides.com ................................................................................................................................................................ 103
Powershow.com .................................................................................................................................................... 103
Speaker Deck .......................................................................................................................................................... 103
Slides from Lectures ............................................................................................................................................ 104
Slides from Meetups ............................................................................................................................................ 104
Slides from Conferences ..................................................................................................................................... 104
Conferences ............................................................................................................. 105
International Conference in Machine Learning (ICML) ....................................................................... 105
ICML, Beijing, China 2014 ................................................................................................................................. 105
ICML, Atlanta, US 2013 ...................................................................................................................................... 105
ICML, Edinburgh, UK 2012 ............................................................................................................................... 105
ICML, Bellevue, US 2011 .................................................................................................................................... 105
ICML, Haifa, Israel 2010 .................................................................................................................................... 105
Full archive of ICML ............................................................................................................................................ 105
Machine Learning Conference Videos ......................................................................................................... 105
Annual Machine Learning Symposium ........................................................................................................ 105
6th ............................................................................................................................................................................... 105
8th ................................................................................................................................................................................. 105
Archive ...................................................................................................................................................................... 105

MLSS Machine Learning Summer Schools ................................................................................................. 106
Data Gotham 2012,2013 .................................................................................................................................... 106
Meetup - English ...................................................................................................... 107

631 Machine Learning Meetup in the World ............................................................................................ 107
Data Science Weekly - List of Meetups ....................................................................................................... 107
Other Meetups missing in Data Science Weekly ..................................................................................... 107
London Machine Learning Meetup ............................................................................................................... 107
London Deep Learning Meetup ...................................................................................................................... 107
Blog - English ........................................................................................................... 108
Data Science Weekly ............................................................................................................................................ 108
Yann LeCun, Google+ ........................................................................................................................................... 108
Igor Carron Blog .................................................................................................................................................... 108
KDD Community, Knowledge discovery and Data Mining .................................................................. 108
Kaggle Blog .............................................................................................................................................................. 108
Digg ............................................................................................................................................................................. 108
Feedly ......................................................................................................................................................................... 108
Mlwave ...................................................................................................................................................................... 108
FastML ....................................................................................................................................................................... 108
Beating the Benchmark ...................................................................................................................................... 109
YOU CANalytics ...................................................................................................................................................... 109
Trevor Stephens Blog .......................................................................................................................................... 109
Mozilla Hacks .......................................................................................................................................................... 109
Banach's Algorithmic Corner, University of Warsaw ............................................................................ 109
DataCamp Blog ....................................................................................................................................................... 110
Natural Language Processing Blog, Hal Daume ....................................................................................... 110
Maxim Milakov Blog ............................................................................................................................................ 110
Alfonso Nieto-Castanon Blog ........................................................................................................................... 110
Persontyle Blog ...................................................................................................................................................... 110
Analytics Vidhya .................................................................................................................................................... 110
Bugra Akyildiz's Blog .......................................................................................................................................... 111
Data origami ............................................................................................................................................................ 111
Rasbt's Blog ............................................................................................................................................................. 111
Gilles Louppe's Blog ............................................................................................................................................. 111
AI Topics ................................................................................................................................................................... 111
AI International ..................................................................................................................................................... 112
Joseph Misiti's Blog .............................................................................................................................................. 112
MIRI, Machine Intelligence Research Institute ........................................................................................ 112
Kevin Davenport Data Blog .............................................................................................................................. 112
Alexandre Passant's Blog .................................................................................................................................. 113
Daniel Nouri's Blog ............................................................................................................................................... 113
Yvonne Rogers Blog ............................................................................................................................................. 114
Blog - Spanish .......................................................................................................... 114
Blog - Italian ............................................................................................................. 114
Blog - German .......................................................................................................... 114
Blog - French ............................................................................................................ 114
Blog - Russian ........................................................................................................... 114

Blog - Japanese ........................................................................................................ 114
Blog - Chinese .......................................................................................................... 115
Blog - Portuguese ..................................................................................................... 115
Journals - English ..................................................................................................... 115
Journal of Machine Learning Research, MIT Press ................................................................................. 115
Machine Learning Journal (last article could be downloaded for free) ........................................ 115
Machine Learning (Theory) ............................................................................................................................. 115
List of Journals on Microsoft Academic Research website ................................................................. 115
Wired magazine ..................................................................................................................................................... 115
Data Science Central ............................................................................................................................................ 115
Journals - Spanish .................................................................................................... 115
Journals - German ................................................................................................... 116
Journals - Italian ...................................................................................................... 116
Journals - French ..................................................................................................... 116
Journals - Russian .................................................................................................... 116
Journals - Japanese ................................................................................................. 116
Journals - Chinese ................................................................................................... 116
Journals - Portuguese ............................................................................................... 116
Forum, Q&A - English ............................................................................................... 116
Data Tau .................................................................................................................................................................... 116
Hacker News ........................................................................................................................................................... 117
Metaoptimize .......................................................................................................................................................... 117
Kaggle Forums ....................................................................................................................................................... 117
Reddit in English ................................................................................................................................................... 117
Cross validated Stack Exchange ..................................................................................................................... 117
Open data Stack Exchange ................................................................................................................................ 117
Data Science Beta Stack Exchange ................................................................................................................. 118
Quora .......................................................................................................................................................................... 118
Machine Learning Impact Forum ................................................................................................................... 118
Forum, Q&A - Spanish .............................................................................................. 118
Forum, Q&A - German ............................................................................................. 118
Forum, Q&A - Italian ................................................................................................ 118
Forum, Q&A - French ............................................................................................... 118
Forum, Q&A - Russian .............................................................................................. 119
Reddit in Russian .................................................................................................................................................. 119
Forum, Q&A - Portuguese ....................................................................................... 119
Forum, Q&A - Chinese ............................................................................................. 119
Zhihu.com ................................................................................................................................................................. 119

Machine Learning ................................................................................................................................................ 119
Data Mining ............................................................................................................................................................ 119
Artificial Intelligence .......................................................................................................................................... 119
Guokr.com ................................................................................................................................................................ 119
Machine Learning ................................................................................................................................................ 119
Data Mining ............................................................................................................................................................ 119
Artificial Intelligence .......................................................................................................................................... 119
Governmental Reports - English ............................................................................... 120

Big Data report, Whitehouse, US .................................................................................................................... 120
Fun - English ............................................................................................................. 120
Founder of PhD Comics ...................................................................................................................................... 120
MACHINE LEARNING RESEARCH GROUPS ................................................................. 121
MACHINE LEARNING RESEARCH GROUPS in AMERICA, USA .......................................................... 121
MIT .............................................................................................................................................................................. 121
Stanford University .............................................................................................................................................. 121
Carnegie Mellon University .............................................................................................................................. 122
Intelligent Interactive Systems Group at Harvard University .......................................................... 122
University of California, Berkeley .................................................................................................................. 123
Princeton University ............................................................................................................................................ 124
University of California, Los Angeles (UCLA) ........................................................................................... 124
Cornell University ............................................................................................................................................. 125
University of Illinois at Urbana Champaign ............................................................................................. 125
California Institute of Technology, Caltech ............................................................................................... 125
University of Washington ................................................................................................................................. 126
Social Robotics Lab - Yale University ........................................................................................................... 126
Georgia Institute of Technology ..................................................................................................................... 126
University of Texas at Austin ....................................................................................................................... 126
University of Pennsylvania ............................................................................................................................... 127
Columbia University ............................................................................................................................................ 127
New York City University .................................................................................................................................. 127
University of Chicago .......................................................................................................................................... 127
The Johns Hopkins Center for Language and Speech Processing (CLSP) Archive Videos ..... 127
Miscellaneous ......................................................................................................................................................... 128
IARPA Organization ............................................................................................................................................ 128
MACHINE LEARNING RESEARCH GROUPS in AMERICA, CANADA ................................................ 128
University of Toronto .......................................................................................................................................... 128
University of Waterloo ....................................................................................................................................... 128
University of British Columbia ........................................................................................................................ 129
University of Montreal ........................................................................................................................................ 130
University of Sherbrooke ................................................................................................................................... 130
University of Laval ............................................................................................................................................... 131
MACHINE LEARNING RESEARCH GROUPS in AMERICA, BRAZIL ................................................... 132
USP - UNIVERSIDADE DE SÃO PAULO, Instituto de Ciências Matemáticas e de Computação
...................................................................................................................................................................................... 132
MACHINE LEARNING RESEARCH GROUPS in EUROPE, UK ............................................................... 132
University College London ................................................................................................................................ 132
CASA (Centre for Advanced Spatial Analysis) Working Papers, University College London . 133

Oxford University .................................................................................................................................................. 133

Imperial College .................................................................................................................................................... 133
The University of Edinburgh, Institute for Adaptive and Neural Computation ........................ 134
Cambridge University ......................................................................................................................................... 134
Centre for Intelligent Sensing, Queen Mary University of London, UK .......................................... 134
ICRI, The Intel Collaborative Research Institute .................................................................................... 135
MACHINE LEARNING RESEARCH GROUPS in EUROPE, FRANCE .................................................... 135
Magnet, MAchine learninG in information NETworks, INRIA, France .......................................... 135
Sierra Team - Ecole Normale Superieure , CNRS, INRIA ..................................................................... 135
ENS Ecole Normale Superieure ...................................................................................................................... 136
MACHINE LEARNING RESEARCH GROUPS in EUROPE, GERMANY ............................................... 137
Max Planck Institute for Intelligent Systems, Tübingen site ............................................................. 137
BRML Research Lab, Institute of Informatics at the Technische Universität München ....... 137
MACHINE LEARNING RESEARCH GROUPS in EUROPE, SWITZERLAND ..................................... 137
EPFL Ecole Polytechnique Federale de Lausanne, Switzerland ....................................................... 137
IDSIA: the Swiss AI Lab ...................................................................................................................................... 138
MACHINE LEARNING RESEARCH GROUPS in EUROPE, NETHERLANDS .................................... 138
Machine Learning Research Groups in The Netherlands .................................................................... 138
MACHINE LEARNING RESEARCH GROUPS in EUROPE, POLAND ................................................... 138
University of Warsaw, Dept. of Mathematics, Informatics and Mechanics ................................. 138
MACHINE LEARNING RESEARCH GROUPS in ASIA, INDIA ................................................................ 139
Indian Institute of Science ................................................................................................................................ 139
Indian Institute of Technology of Kanpur ................................................................................................. 139
MACHINE LEARNING RESEARCH GROUPS in ASIA, CHINA ............................................................... 139
Peking University .................................................................................................................................................. 139
Beijing University of Technology ................................................................................................................... 140
University of Science and Technology of China, USTC .......................................................................... 141
Nanjing University ............................................................................................................................................... 141
MACHINE LEARNING RESEARCH GROUPS in ASIA, RUSSIA ............................................................. 141
Moscow State University ................................................................................................................................... 141
MACHINE LEARNING RESEARCH GROUPS in AFRICA ......................................................................... 141
MACHINE LEARNING RESEARCH GROUPS in OCEANIA ..................................................................... 142
NICTA Machine Learning Research Group, Australia .......................................................................... 142
Academics (with free access to their publications), US ............................................. 143

Stanford University, US ...................................................................................................................................... 143
Andrew Ng ............................................................................................................................................................... 143
Princeton University, US .................................................................................................................................... 143
Robert Schapire ..................................................................................................................................................... 143
Mona Singh ............................................................................................................................................................. 143
Olga Troyanskaya ................................................................................................................................................ 144
UCLA, US ................................................................................................................................................................... 144
Judea Pearl, Cognitive System Laboratory ................................................................................................ 144
Rice University, US ............................................................................................................................................... 144
Justin Esarey Lectures, Assistant Professor of Political Science ....................................................... 144
University of Maryland, US ............................................................................................................................... 145
Hal Daume III ......................................................................................................................................................... 145
Academics (with free access to their publications), FRANCE ..................................... 145
Ecole Normale Superieure, FRANCE ............................................................................................................ 145
Francis Bach ........................................................................................................................................................... 145
Academics (with free access to their publications), UK ............................................. 145

University College London, UK ....................................................................................................................... 145
John Shawe-Taylor ................................................................................................................................................. 145
Mark Herbster ........................................................................................................................................................ 146
David Barber .......................................................................................................................................................... 146
Gabriel Brostow .................................................................................................................................................... 146
Jun Wang .................................................................................................................................................................. 147
David Jones Lab ..................................................................................................................................................... 147
Simon Prince ........................................................................................................................................................... 147
Massimiliano Pontil ............................................................................................................................................. 148
Cambridge University, UK ................................................................................................................................. 148
Richard E Turner .................................................................................................................................................. 148
Oxford University, UK ......................................................................................................................................... 148
Phil Blunsom ........................................................................................................................................................... 148
Nando de Freitas ................................................................................................................................................... 149
Karl Hermann ........................................................................................................................................................ 149
Edward Grefenstette ........................................................................................................................................... 149
Delft University of Technology, NETHERLANDS ..................................................................................... 149
Thomas Geijtenbeek Publications & Videos .............................................................................................. 149
Academics (with free access to their publications), CANADA .................................... 150
University of Montreal, CANADA ................................................................................................................... 150
Yoshua Bengio ....................................................................................................................................................... 150
University of Toronto, CANADA ..................................................................................................................... 150
Geoffrey Hinton ..................................................................................................................................................... 150
Universite de Sherbrooke, CANADA ............................................................................................................. 151
Hugo Larochelle .................................................................................................................................................... 151
University of British Columbia, CANADA ................................................................................................... 151
Giuseppe Carenini ................................................................................................................................................. 151
Cristina Conati ....................................................................................................................................................... 151
Kevin Leyton-Brown ............................................................................................................................................ 151
Holger Hoos ............................................................................................................................................................ 151
Jim Little ................................................................................................................................................................... 151
David Lowe .............................................................................................................................................................. 151
Karon MacLean ..................................................................................................................................................... 152
Alan Mackworth .................................................................................................................................................... 152
Dinesh K. Pai ........................................................................................................................................................... 152
David Poole ............................................................................................................................................................. 152
Academics (with free access to their publications), CHINA ....................................... 152
USTC, CHINA ........................................................................................................................................................... 152
En-Hong Chen ........................................................................................................................................................ 152
Linli Xu ...................................................................................................................................................................... 152
University of Beijing, CHINA ............................................................................................................................ 152
Yuan Yao, School of Mathematical Sciences ............................................................................................. 152
Academics (with free access to their publications), RUSSIA ...................................... 153
Moscow State University, RUSSIA ................................................................................................................. 153
Dmitry Efimov ........................................................................................................................................................ 153
Academics (with free access to their publications), POLAND .................................... 153
University of Warsaw, POLAND ..................................................................................................................... 153

Marcin Mucha ......................................................................................................................................................... 153
Academics (with free access to their publications), SWITZERLAND ........................... 154

Prof. Jürgen Schmidhuber's Home Page (Great resources! Not to be missed!) ........................ 154
Free access to a list of Machine Learning MSc/PhD Dissertations ............................. 154
Machine Learning Department, Carnegie Mellon University ............................................................ 154
Machine Learning Department, Columbia University .......................................................................... 154
PhD Dissertations, University of Edinburgh, UK .................................................................................. 154
MSc Dissertations, University of Oxford, UK ............................................................................................. 154
Machine Learning Group, Department of Engineering, University of Cambridge, UK .......... 155


How to use the Machine Learning Salon's Kit?

What is the Machine Learning Salon's Kit?

For now, the Machine Learning Salons Kit is just a collection of useful websites
gathered on Blogs such as Datatau.com, Groups on LinkedIn, posts on Twitter,
publications on Google Scholar, Universities websites, etc.

We are just speaking French and English but we are gathering information from all
over the World thanks to Google Translate! As an example, on MachineLearning.ru,
from Russian translated in English, weve found a very helpful link to the annual
report of the American Statistical Society. Who would expect that?

What is not the Machine Learning Salon's Kit?
The Machine Learning Salon's Kit is not a commercial product. We are not making
any money from it. It's free, without any registration, and free from any advertising.

Why are we not on GitHub?
We want to provide free information worldwide and, depending on the country, not
everybody wants to register on GitHub (which is very helpful).
We have found that the PDF file is the most universal solution.

If you are a CTO who wants to recruit smart Machine Learning developers
When The Machine Learning Salon moves from Beta Test to Live, you will have
the opportunity to post a video about the Machine Learning challenges that you
are facing. If ML and AI developers are interested, they will contact you
directly.

If you want to become a contributor
Just send us valuable information about ML and AI by email; we will add it to The
Machine Learning Salon's Kit and you will be listed as a contributor. We are very
keen on valuable resources in Spanish, Portuguese, Russian, Chinese, German, etc.
because we don't speak these languages and it's hard to find information.

If you want to remove a link
Just tell us, and tell us why, and we'll remove it.

If you want to add a better description of your website

Just tell us, and tell us why, and we'll add it.

If you are willing to give a discount to the Machine Learning Salon's readers
Just tell us what you're willing to offer and, if we think that the readers will find it
relevant, we will add it without any exchange of money.

You will never get an email list of our visitors because we don't have any
information about our visitors. We are working with a Basic Adobe Business
Catalyst website; we've got the geographical location of our visitors, their loyalty,
their page views, etc., but no IP addresses.

We have nothing to sell and we are not willing to.

About the Founder of The Machine Learning Salon's Website & Kit

The Machine Learning Salon was founded by Jacqueline I. Forien, who very much
enjoyed her Master of Science in Machine Learning at University College London
thanks to all her wonderful peers and teachers on the Machine Learning and the
Computational Statistics and Machine Learning programmes.

Jacqueline would like to express special gratitude to her director of Machine
Learning studies at UCL, Professor Mark Herbster; her tutor, Professor David
Barber; and the supervisor of her Master's project, Professor Nadia Berthouze.

In addition, Jacqueline would like to express many thanks to Igor Carron, who
initiated the smart association of 'Machine Learning' and 'Salon', and gave her the
opportunity to organise in London a wonderful event, the Europe Wide
Machine Learning Meetup between Paris, Berlin, Zurich and London, with Andrew
Ng as a guest speaker.

Contact

Please contact us if you want to add a contribution, remove a link, etc.
Any suggestion is welcome! Contact: contact@machinelearningsalon.org


MOOC or Opencourseware English

Coursera
Machine Learning Stanford Course
This course provides a broad introduction to machine learning, data mining, and
statistical pattern recognition. Topics include: (i) Supervised learning
(parametric/non-parametric algorithms, support vector machines, kernels, neural
networks). (ii) Unsupervised learning (clustering, dimensionality reduction,
recommender systems, deep learning). (iii) Best practices in machine learning
(bias/variance theory; innovation process in machine learning and AI). The course
will also draw from numerous case studies and applications, so that you'll also learn
how to apply learning algorithms to building smart robots (perception, control), text
understanding (web search, anti-spam), computer vision, medical informatics,
audio, database mining, and other areas.
https://www.coursera.org/course/ml
Practical Machine Learning
Prediction and machine learning are among the most common tasks performed by data
scientists and data analysts. This course will cover the basic components of
building and applying prediction functions with an emphasis on practical
applications. The course will provide basic grounding in concepts such as training
and test sets, overfitting, and error rates. The course will also introduce a range of
model-based and algorithmic machine learning methods including regression,
classification trees, Naive Bayes, and random forests. The course will cover the
complete process of building prediction functions including data collection, feature
creation, algorithms, and evaluation.
https://www.coursera.org/course/predmachlearn
Machine Learning Washington Course
Machine learning algorithms can figure out how to perform important tasks by
generalizing from examples. This is often feasible and cost-effective when manual
programming is not. Machine learning (also known as data mining, pattern
recognition and predictive analytics) is used widely in business, industry, science
and government, and there is a great shortage of experts in it. If you pick up a
machine learning textbook you may find it forbiddingly mathematical, but in this
class you will learn that the key ideas and algorithms are in fact quite intuitive. And
powerful!

Most of the class will be devoted to supervised learning (in other words, learning in
which a teacher provides the learner with the correct answers at training time). This
is the most mature and widely used type of machine learning. We will cover the
main supervised learning techniques, including decision trees, rules, instances,
Bayesian techniques, neural networks, model ensembles, and support vector
machines. We will also touch on learning theory with an emphasis on its practical
uses. Finally, we will cover the two main classes of unsupervised learning methods:
clustering and dimensionality reduction. Throughout the class there will be an
emphasis not just on individual algorithms but on ideas that cut across them and
tips for making them work.
https://www.coursera.org/course/machlearning

Core Concepts in Data Analysis (Higher School of Economics)
Learn both theory and application for basic methods that have been invented either
for developing new concepts (principal components or clusters) or for finding
interesting correlations (regression and classification). This is preceded by a
thorough analysis of 1D and 2D data.
This is an unconventional course in modern Data Analysis, Machine Learning and
Data Mining. Its contents are heavily influenced by the idea that data analysis should
help in enhancing and augmenting knowledge of the domain as represented by the
concepts and statements of relation between them. According to this view, two main
pathways for data analysis are summarization, for developing and augmenting
concepts, and correlation, for enhancing and establishing relations. The term
summarization embraces here both simple summaries like totals and means and
more complex summaries: the principal components of a set of features and cluster
structures in a set of entities. Similarly, correlation covers both bivariate and
multivariate relations between input and target features including Bayes classifiers.
https://www.coursera.org/course/datan

Neural Networks for Machine Learning
Neural Networks use learning algorithms that are inspired by our understanding of
how the brain learns, but they are evaluated by how well they work for practical
applications such as speech recognition, object recognition, image retrieval and the
ability to recommend products that a user will like. As computers become more
powerful, Neural Networks are gradually taking over from simpler Machine
Learning methods. They are already at the heart of a new generation of speech
recognition devices and they are beginning to outperform earlier systems for
recognizing objects in images. The course will explain the new learning procedures
that are responsible for these advances, including effective new procedures for
learning multiple layers of non-linear features, and give you the skills and
understanding required to apply these procedures in many other domains.
https://www.coursera.org/course/neuralnets

Natural Language Processing
Natural language processing (NLP) deals with the application of computational
models to text or speech data. Application areas within NLP include automatic
(machine) translation between languages; dialogue systems, which allow a human
to interact with a machine using natural language; and information extraction,
where the goal is to transform unstructured text into structured (database)
representations that can be searched and browsed in flexible ways. NLP
technologies are having a dramatic impact on the way people interact with
computers, on the way people interact with each other through the use of language,
and on the way people access the vast amount of linguistic data now in electronic
form. From a scientific viewpoint, NLP involves fundamental questions of how to
structure formal models (for example statistical models) of natural language
phenomena, and of how to design algorithms that implement these models.
https://www.coursera.org/course/nlangp

Probabilistic Graphical Models
Uncertainty is unavoidable in real-world applications: we can almost never predict
with certainty what will happen in the future, and even in the present and the past,
many important aspects of the world are not observed with certainty. Probability
theory gives us the basic foundation to model our beliefs about the different
possible states of the world, and to update these beliefs as new evidence is obtained.
These beliefs can be combined with individual preferences to help guide our actions,
and even in selecting which observations to make. While probability theory has
existed since the 17th century, our ability to use it effectively on large problems
involving many inter-related variables is fairly recent, and is due largely to the
development of a framework known as Probabilistic Graphical Models (PGMs). This
framework, which spans methods such as Bayesian networks and Markov random
fields, uses ideas from discrete data structures in computer science to efficiently
encode and manipulate probability distributions over high-dimensional spaces,
often involving hundreds or even many thousands of variables. These methods have
been used in an enormous range of application domains, which include: web search,
medical and fault diagnosis, image understanding, reconstruction of biological
networks, speech recognition, natural language processing, decoding of messages
sent over a noisy communication channel, robot navigation, and many more. The
PGM framework provides an essential tool for anyone who wants to learn how to
reason coherently from limited and noisy observations.
https://www.coursera.org/course/pgm
Stanford Engineering Everywhere
SEE programming includes one of Stanford's most popular engineering sequences:
the three-course Introduction to Computer Science taken by the majority of Stanford
undergraduates, and seven more advanced courses in artificial intelligence and
electrical engineering.
Introduction to Computer Science
Programming Methodology (CS106A)
Programming Abstractions (CS106B)
Programming Paradigms (CS107)
Artificial Intelligence
Introduction to Robotics (CS223A)
Natural Language Processing (CS224N)
Machine Learning (CS229)
Linear Systems and Optimization
The Fourier Transform and its Applications (EE261)
Introduction to Linear Dynamical Systems (EE263)
Convex Optimization I (EE364A)
Convex Optimization II (EE364B)
Additional School of Engineering Courses
Programming Massively Parallel Processors (CS193G)
iPhone Application Programming (CS193P)
Seminars and Webinars
http://see.stanford.edu/see/courses.aspx
EdX
Learning from data (Caltech)
This is an introductory course in machine learning (ML) that covers the basic
theory, algorithms, and applications. ML is a key technology in Big Data, and in many
financial, medical, commercial, and scientific applications. It enables computational
systems to automatically learn how to perform a desired task based on information
extracted from the data. ML has become one of the hottest fields of study today,
taken up by undergraduate and graduate students from 15 different majors at
Caltech. This course balances theory and practice, and covers the mathematical as
well as the heuristic aspects.
https://www.edx.org/course/caltechx/caltechx-cs1156x-learning-data-1120#.U5NNJxaRPwI
Artificial Intelligence (BerkeleyX)

CS188.1x is a new online adaptation of the first half of UC Berkeley's CS188:
Introduction to Artificial Intelligence. The on-campus version of this upper division
computer science course draws about 600 Berkeley students each year.
Artificial intelligence is already all around you, from web search to video games. AI
methods plan your driving directions, filter your spam, and focus your cameras on
faces. AI lets you guide your phone with your voice and read foreign newspapers in
English. Beyond today's applications, AI is at the core of many new technologies that
will shape our future. From self-driving cars to household robots, advancements in
AI help transform science fiction into real systems.
CS188.1x focuses on Behavior from Computation. It will introduce the basic ideas
and techniques underlying the design of intelligent computer systems. A specific
emphasis will be on the statistical and decision-theoretic modeling paradigm. By
the end of this course, you will have built autonomous agents that efficiently make
decisions in stochastic and in adversarial settings. CS188.2x (to follow CS188.1x,
precise date to be determined) will cover Reasoning and Learning. With this
additional machinery your agents will be able to draw inferences in uncertain
environments and optimize actions for arbitrary reward structures. Your machine
learning algorithms will classify handwritten digits and photographs. The
techniques you learn in CS188x apply to a wide variety of artificial intelligence
problems and will serve as the foundation for further study in any application area
you choose to pursue.
https://www.edx.org/course/uc-berkeleyx/uc-berkeleyx-cs188-1x-artificial-579#.U4CqKl6RPwI
Big Data and Social Physics (Ethics)

Social physics is a big data science that models how networks of people behave and
uses these network models to create actionable intelligence. It is a quantitative
science that can accurately predict patterns of human behavior and guide how to
influence those patterns to (for instance) increase decision making accuracy or
productivity within an organization. Included in this course is a survey of methods
for increasing communication quality within an organization, approaches to
providing greater protection for personal privacy, and general strategies for
increasing resistance to cyber attack.
https://www.edx.org/course/mitx/mitx-mas-s69x-big-data-social-physics-1737#.U4Cox5RdWG4

Introduction to Computational Thinking and Data Science

6.00.2x is aimed at students with some prior programming experience in Python
and a rudimentary knowledge of computational complexity. We have chosen to
focus on breadth rather than depth. The goal is to provide students with a brief
introduction to many topics, so that they will have an idea of what's possible when
the time comes later in their career to think about how to use computation to
accomplish some goal. That said, it is not a computation appreciation course.
Students will spend a considerable amount of time writing programs to implement
the concepts covered in the course. Topics covered include plotting, stochastic
programs, probability and statistics, random walks, Monte Carlo simulations,
modeling data, optimization problems, and clustering.
https://www.edx.org/course/mitx/mitx-6-00-2x-introduction-computational-2836
MIT OpenCourseWare (OCW)
OCW makes the materials used in the teaching of MIT's subjects available on the
Web.
http://ocw.mit.edu/index.htm
https://www.youtube.com/user/MIT
VLAB MIT Enterprise Forum Bay Area, Machine Learning Videos
Added on 22-Nov-2014
Discovery of Disruptive Innovations & Actionable Ideas.
VLAB is the San Francisco Bay Area chapter of the MIT Enterprise Forum, a non-
profit organization dedicated to promoting the growth and success of high-tech
entrepreneurial ventures by connecting ideas, technology and people. We provide a
forum for San Francisco and Silicon Valley's leading entrepreneurs, industry
experts, venture capitalists, private investors and technologists to exchange insights
about how to effectively grow high-tech ventures amidst dynamic market risks and
challenges. In a world where markets change at breakneck speed, knowledge is a
critical source of competitive advantage. Our forums provide an excellent
opportunity to network and learn about pivotal business issues, emerging industries
and the latest technologies.
http://www.youtube.com/user/vlabvideos/search?query=machine+learning
Foundations of Machine Learning by Mehryar Mohri - 10 years of Homework with
Solutions and Lecture Slides, not to be missed!
Course Description
This course introduces the fundamental concepts and methods of machine learning,
including the description and analysis of several modern algorithms, their
theoretical basis, and the illustration of their applications. Many of the algorithms
described have been successfully used in text and speech processing, bioinformatics,
and other areas in real-world products and services. The main topics covered are:
Probability tools, concentration inequalities
PAC model
Rademacher complexity, growth function, VC-dimension
Perceptron, Winnow
Support vector machines (SVMs)
Kernel methods
Decision trees
Boosting
Density estimation, maximum entropy models
Logistic regression
Regression problems and algorithms
Ranking problems and algorithms
Halving algorithm, weighted majority algorithm, mistake bounds
Learning automata and transducers
Reinforcement learning, Markov decision processes (MDPs)
http://www.cs.nyu.edu/~mohri/ml14/
IPAM, Institute for Pure and Applied Mathematics, Videos, UCLA
IPAM records many of its lectures and makes them available to the public so that a
wider audience may benefit from the scientific programs we offer. Since July 2012,
IPAM has begun to record most of its lectures. You can access the lectures for a
particular program or workshop (such as Materials Defects Tutorials) by following
the program link listed below to the relevant workshop schedule. Each speaker is
listed along with available slide shows and videos. For public lectures, the link will
take you directly to the video. The programs and public lectures are listed in reverse
chronological order.
Older videos play on Real Player only; recent videos will play on Flash supported
browsers and software.
https://www.ipam.ucla.edu/videos.aspx

Carnegie Mellon University
Carnegie Mellon University (CMU) Video resources
"The videos below are intended to serve as resources for our current students, and
not as online learning materials for students outside of our program." - The Machine
Learning Department
http://www.ml.cmu.edu/teaching/video-resources.html
Convex Optimisation, Fall 2013, by Barnabas Poczos and Ryan Tibshirani, CMU
Overview and objectives
Nearly every problem in machine learning and statistics can be formulated in terms
of the optimization of some function, possibly under some set of constraints. As we
obviously cannot solve every problem in machine learning or statistics, this means
that we cannot generically solve every optimization problem (at least not
efficiently). Fortunately, many problems of interest in statistics and machine
learning can be posed as optimization tasks that have special properties (such as
convexity, smoothness, separability, sparsity, etc.) permitting standardized,
efficient solution techniques.
This course is designed to give a graduate-level student a thorough grounding in
these properties and their role in optimization, and a broad comprehension of
algorithms tailored to exploit such properties. The main focus will be on convex
optimization problems, though we will also discuss nonconvex problems at the end.
We will visit and revisit important applications in statistics and machine learning.
Upon completing the course, students should be able to approach an optimization
problem (often derived from a statistics or machine learning context) and:
(1) identify key properties such as convexity, smoothness, sparsity, etc., and/or
possibly reformulate the problem so that it possesses such desirable properties;
(2) select an algorithm for this optimization problem, with an understanding of the
advantages and disadvantages of applying one method over another, given the
problem and properties at hand;
(3) implement this algorithm or use existing software to efficiently compute the
solution.
http://www.stat.cmu.edu/~ryantibs/convexopt/#videos

Machine Learning, Spring 2011, by Tom Mitchell, CMU
Machine Learning is concerned with computer programs that automatically improve
their performance through experience (e.g., programs that learn to recognize human
faces, recommend music and movies, and drive autonomous robots). This course
covers the theory and practical algorithms for machine learning from a variety of
perspectives. We cover topics such as Bayesian networks, decision tree learning,
Support Vector Machines, statistical learning methods, unsupervised learning and
reinforcement learning. The course covers theoretical concepts such as inductive
bias, the PAC learning framework, Bayesian learning methods, margin-based
learning, and Occam's Razor. Short programming assignments include hands-on
experiments with various learning algorithms, and a larger course project gives
students a chance to dig into an area of their choice. This course is designed to give a
graduate-level student a thorough grounding in the methodologies, technologies,
mathematics and algorithms currently needed by people who do research in
machine learning.
http://www.cs.cmu.edu/~tom/10701_sp11/lectures.shtml
Homework with solutions
http://www.cs.cmu.edu/~tom/10701_sp11/hws.shtml

Metacademy Concept list and roadmap list
Metacademy is a community-driven, open-source platform for experts to
collaboratively construct a web of knowledge. Right now, Metacademy focuses on
machine learning and probabilistic AI, because that's what the current contributors
are experts in. But eventually, Metacademy will cover a much wider breadth of
knowledge, e.g. mathematics, engineering, music, medicine, computer science.
http://www.metacademy.org/list
http://www.metacademy.org/roadmaps/
Harvard University
Advanced Machine Learning, Fall 2013 (Free access to most of the videos)
This course is about learning to extract statistical structure from data, for making
decisions and predictions, as well as for visualization. The course will cover many of
the most important mathematical and computational tools for probabilistic
modeling, as well as examine specific models from the literature and examine how
they can be used for particular types of data. There will be a heavy emphasis on
implementation. You may use Matlab, Python or R. Each of the five assignments
will involve some amount of coding, and the final project will almost certainly
require the running of computer experiments.
https://www.seas.harvard.edu/courses/cs281/

Data Science Course, Fall 2013
Learning from data in order to gain useful predictions and insights. This course
introduces methods for five key facets of an investigation: data wrangling, cleaning,
and sampling to get a suitable data set; data management to be able to access big
data quickly and reliably; exploratory data analysis to generate hypotheses and
intuition; prediction based on statistical methods such as regression and
classification; and communication of results through visualization, stories, and
interpretable summaries.
We will be using Python for all programming assignments and projects.
http://cm.dce.harvard.edu/2014/01/14328/publicationListing.shtml
Oxford University, Nando de Freitas video lectures
I am a machine learning professor at UBC. I am making my lectures available to the
world with the hope that this will give more folks out there the opportunity to learn
some of the wonderful things I have been fortunate to learn myself. Enjoy.
http://www.youtube.com/user/ProfNandoDF
Cambridge University Machine Learning Slides, Spring 2014
LECTURE SYLLABUS
This year, the exposition of the material will be centered around three specific
machine learning areas: 1) supervised non-parametric probabilistic inference using
Gaussian processes, 2) the TrueSkill ranking system and 3) the latent Dirichlet
Allocation model for unsupervised learning in text.
http://mlg.eng.cam.ac.uk/teaching/4f13/1314/
Caltech University, Learning from Data
Free, introductory Machine Learning online course (MOOC)
Taught by Caltech Professor Yaser Abu-Mostafa
Lectures recorded from a live broadcast, including Q&A
Prerequisites: Basic probability, matrices, and calculus
8 homework sets and a final exam
Discussion forum for participants
Topic-by-topic video library for easy review
http://work.caltech.edu/telecourse.html
http://work.caltech.edu/library/
University College London Discovery
UCL Discovery showcases UCL's research publications, giving access to journal
articles, book chapters, conference proceedings, digital web resources, theses and
much more, from all UCL disciplines. Where copyright permissions allow, a full copy
of each research publication is directly available from UCL Discovery.
You can search or browse UCL Discovery, see the most-downloaded publications,
and keep up to date with the latest UCL research by RSS or even on Twitter.
UCL Discovery supports UCL's Publications Policy.
http://discovery.ucl.ac.uk/cgi/search/simple?q=machine+learning&_order=bytitle&basic_srchtype=ALL&_satisfyall=ALL&_action_search=Search
http://discovery.ucl.ac.uk
http://www.youtube.com/watch?v=Euaoblv_nL8
University College London, Supervised Learning

The course covers supervised approaches to machine learning. It starts with
probabilistic pattern recognition, followed by an in-depth introduction to various
supervised learning algorithms such as Least Squares, Lasso, Perceptron Algorithm,
Support Vector Machines and Boosting.
http://www0.cs.ucl.ac.uk/staff/M.Herbster/GI01/
Yann LeCun's Publications

My main research interests are Machine Learning, Computer Vision, Mobile
Robotics, and Computational Neuroscience. I am also interested in Data
Compression, Digital Libraries, the Physics of Computation, and all the applications
of machine learning (Vision, Speech, Language, Document understanding, Data
Mining, Bioinformatics).
http://yann.lecun.com/exdb/publis/index.html#fulllist
Francis Bach, Ecole Normale Superieure - Courses and Exercises with solutions
(English-French)
Spring 2014: Statistical machine learning - Master M2 "Probabilites et Statistiques" - Universite Paris-Sud (Orsay)
Fall 2013: An introduction to graphical models - Master M2 "Mathematiques, Vision, Apprentissage" - Ecole Normale Superieure de Cachan
Spring 2013: Statistical machine learning - Master M2 "Probabilites et Statistiques" - Universite Paris-Sud (Orsay)
Spring 2013: Statistical machine learning - Filiere Math/Info - L3 - Ecole Normale Superieure (Paris)
Fall 2012: An introduction to graphical models - Master M2 "Mathematiques, Vision, Apprentissage" - Ecole Normale Superieure de Cachan
Spring 2012: Statistical machine learning - Filiere Math/Info - L3 - Ecole Normale Superieure (Paris)
May 2008: Probabilistic modelling and graphical models: Enseignement Specialise - Ecole des Mines de Paris
Fall 2007: An introduction to graphical models - Master M2 "Mathematiques, Vision, Apprentissage" - Ecole Normale Superieure de Cachan
May 2007: Probabilistic modelling and graphical models: Enseignement Specialise - Ecole des Mines de Paris
http://www.di.ens.fr/~fbach/
Technion, Israel Institute of Technology, Machine Learning Videos

Technion - Israel Institute of Technology is Israel's biggest scientific-technological
university and one of the largest centers of applied research in the world. Here the
future is being shaped - by over 13,000 of Israel's most dynamic students active in
18 faculties. Technion is Israel's flagship of world-class education, bringing Israel its
first Nobel Prizes in science. From the cornerstone laying ceremony in 1912,
Technion's over 70,000 alumni have built the state of Israel and created and lead the
majority of Israel's successful companies, impacting millions of scientists, students,
entrepreneurs and citizens worldwide.
http://www.youtube.com/user/Technion/search?query=machine+learning
NPTEL, National Programme on Technology Enhanced Learning, India
NPTEL provides E-learning through online Web and Video courses in Engineering,
Science and humanities streams. The mission of NPTEL is to enhance the quality of
Engineering education in the country by providing free online courseware.
http://nptel.ac.in
Probability Theory and Applications
http://nptel.ac.in/courses/111104079/
Pattern Recognition
http://nptel.ac.in/courses/106106046/1
Videolectures.net
VideoLectures.NET is an award-winning free and open access educational video
lectures repository. The lectures are given by distinguished scholars and scientists
at the most important and prominent events like conferences, summer schools,
workshops and science promotional events from many fields of Science. The portal
is aimed at promoting science, exchanging ideas and fostering knowledge sharing by
providing high quality didactic contents not only to the scientific community but
also to the general public. All lectures, accompanying documents, information and
links are systematically selected and classified through the editorial process taking
into account also users' comments.
http://videolectures.net/Top/Computer_Science/Machine_Learning/
http://videolectures.net/Top/Computer_Science/Machine_Learning/#o=top

MLSS Machine Learning Summer Schools Videos
MLSS Videos from 2004 to 2012
http://videolectures.net/site/search/?q=MLSS
MLSS Videos 2012
http://www.youtube.com/user/compcinemaucsc/feed
MLSS Videos 2012
http://www.youtube.com/channel/UCHhbDEKA7BP58mq1wfTBQNQ
Max Planck Institute for Intelligent Systems Tubingen, MLSS Videos 2013
Our goal is to understand the principles of Perception, Action and Learning in
autonomous systems that successfully interact with complex environments and to
use this understanding to design future systems. The Institute studies these
principles in biological, computational, hybrid, and material systems ranging from
nano to macro scales. We take a highly interdisciplinary approach that combines
mathematics, computation, material science, and biology.
The MPI for Intelligent Systems has campuses in Stuttgart and Tübingen. Our
Stuttgart campus has world-leading expertise in small-scale intelligent systems that
leverage novel material science and biology. The Tübingen campus focuses on how
intelligent systems process information to perceive, act and learn.
http://www.youtube.com/channel/UCty-pPOWlWUk4gXNm5pydcg

GoogleTechTalks
Machine Learning
https://www.youtube.com/user/GoogleTechTalks/search?query=machine+learning

Deep Learning
https://www.youtube.com/user/GoogleTechTalks/search?query=deep+learning
Udacity Opencourseware
Supervised Learning (select "View Courseware" for free access)
Why Take This Course?
In this course, you will gain an understanding of a variety of topics and methods in
Supervised Learning. Like function approximation in general, Supervised Learning
prompts you to make generalizations based on fundamental assumptions about the
world.
Michael: So why wouldn't you call it "function induction?"
Charles: Because someone said "supervised learning" first.
Topics covered in this course include: Decision trees, neural networks, instance-
based learning, ensemble learning, computational learning theory, Bayesian
learning, and many other fascinating machine learning concepts.
https://www.udacity.com/course/ud675
Unsupervised Learning (select "View Courseware" for free access)
You will learn about and practice a variety of Unsupervised Learning approaches,
including: randomized optimization, clustering, feature selection and
transformation, and information theory.
You will learn important Machine Learning methods, techniques and best practices,
and will gain experience implementing them in this course through a hands-on final
project in which you will be designing a movie recommendation system (just like
Netflix!).
Reinforcement Learning (select "View Courseware" for free access)

You will learn about Reinforcement Learning, the field of Machine Learning
concerned with the actions that software agents ought to take in a particular
environment in order to maximize rewards.
Michael: Reinforcement Learning is a very popular field.
Charles: Perhaps because you're in it, Michael.
Michael: I don't think that's it.
In this course, you will gain an understanding of topics and methods in
Reinforcement Learning, including Markov Decision Processes and Game Theory.
You will gain experience implementing Reinforcement Learning techniques in a final
project.
In the final project, we'll bring back the 80's and design a Pacman agent capable of
eating all the food without getting eaten by monsters.
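To make the idea of an agent choosing actions to maximize rewards in a Markov Decision Process concrete, here is a minimal sketch of value iteration. It is not taken from the Udacity course: the four-cell corridor, the reward of 1 for reaching the goal, the discount factor, and all names are assumptions chosen only for illustration.

```python
# Toy Markov Decision Process (hypothetical): a corridor of 4 cells.
# The agent moves left or right; entering the last cell yields reward 1.
GAMMA = 0.9          # discount factor (assumed)
N_STATES = 4
GOAL = 3             # terminal state
ACTIONS = (-1, +1)   # move left, move right


def step(state, action):
    """Deterministic transition: return (next_state, reward)."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if nxt == GOAL else 0.0
    return nxt, reward


def value_iteration(tol=1e-9):
    """Repeat Bellman optimality backups until the values stop changing."""
    values = [0.0] * N_STATES
    while True:
        delta = 0.0
        for s in range(N_STATES):
            if s == GOAL:
                continue  # the terminal state keeps value 0
            outcomes = [step(s, a) for a in ACTIONS]
            best = max(r + GAMMA * values[n] for n, r in outcomes)
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:
            return values


def greedy_policy(values):
    """Pick, in each non-terminal state, the action with the best backup."""
    def backup(s, a):
        nxt, reward = step(s, a)
        return reward + GAMMA * values[nxt]
    return {s: max(ACTIONS, key=lambda a: backup(s, a))
            for s in range(N_STATES) if s != GOAL}


values = value_iteration()
policy = greedy_policy(values)
print(values)
print(policy)
```

In this toy setting the greedy policy moves right in every cell, and the state values shrink by the discount factor with each extra step from the goal.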
Mathematicalmonk Machine Learning
Videos about math, at the graduate level or upper-level undergraduate.
https://www.youtube.com/playlist?list=PLD0F06AA0D2E8FFBA
Judea Pearl Symposium
Judea Pearl (born 1936) is an Israeli-born American computer scientist and
philosopher, best known for championing the probabilistic approach to artificial
intelligence and the development of Bayesian networks (see the article on belief
propagation). He is also credited for developing a theory of causal and
counterfactual inference based on structural models (see article on causality). He is
the 2011 winner of the ACM Turing Award, the highest distinction in computer
science, "for fundamental contributions to artificial intelligence through the
development of a calculus for probabilistic and causal reasoning". (source
Wikipedia)
http://www.youtube.com/playlist?list=PLMliWGoMCBYilM6tw6S_4BpL_t29jbWsp
http://www.youtube.com/user/UCLA/playlists
Machine Learning Reading Group, Indian Institute of Science
Focus Areas: Machine Learning & Convex Optimization
http://clweb.csa.iisc.ernet.in/achintya/mlrg/
SIGDATA, Indian Institute of Technology Kanpur
http://www.cse.iitk.ac.in/users/sigdata/
http://www.cse.iitk.ac.in/users/sesres/

Hakka Labs
Hakka Labs is passionate about helping professional software engineers level up in
their careers. Our content, events & community have grown by leaps and bounds
since our humble origin when we launched as a Tumblr blog in 2011.
We believe that "software is eating the world" and our passion is in building
valuable resources and community for startup-oriented software engineers - the
folks that will power innovation and disrupt industries, and ultimately shape our
future.
Hakka originally launched in SF Bay & NYC and rapidly built relationships with the
top companies, CTOs and tech influencers in these key areas. We have deep
connections to the software engineering worlds on both coasts and often invite
groups of CTOs and engineers to our office in Soho, or meet with them at
engineering events that we either run or participate in.
We're also currently up & running in Berlin & Moscow, and plan to continue to
rapidly expand worldwide. Not too shabby for a scrappy startup with a small
marketing budget!
http://www.hakkalabs.co
https://www.youtube.com/user/g33ktalktv/videos

Open Yale Course
Game Theory
Each course includes a full set of class lectures produced in high-quality video
accompanied by such other course materials as syllabi, suggested readings, exams,
and problem sets. The lectures are available as downloadable videos, and an audio-
only version is also offered. In addition, searchable transcripts of each lecture are
provided.
http://oyc.yale.edu/courses

Columbia University
Machine Learning resources
Course related notes
Regression by linear combination of basis functions [ps] [pdf]
The perceptron [ps] [pdf]
Document classification with the multinomial model [ps] [pdf]
Sampling from a Gaussian [ps] [pdf]
Slides on exponential family distributions [ps] [pdf]
http://www.cs.columbia.edu/~jebara/4771/tutorials.html

Applied Data Science by Ian Langmore and Daniel Krasner
The purpose of this course is to take people with strong mathematical/statistical
knowledge and teach them software development fundamentals. This course will
cover
Design of small software packages
Working in a Unix environment
Designing software in teams
Fundamental statistical algorithms such as linear and logistic regression
Overfitting and how to avoid it
Working with text data (e.g. regular expressions)
Time series
And more. . .
http://columbia-applied-data-science.github.io/appdatasci.pdf
http://columbia-applied-data-science.github.io

Deep Learning
Deep Learning is a new area of Machine Learning research, which has been
introduced with the objective of moving Machine Learning closer to one of its
original goals: Artificial Intelligence.
This website is intended to host a variety of resources and pointers to information
about Deep Learning. In these pages you will find
a reading list,
links to software,
datasets,
a list of deep learning research groups and labs,
a list of announcements for deep learning related jobs (job listings),
as well as tutorials and cool demos.
For the latest additions, including papers and software announcements, be sure to
visit the Blog section and subscribe to our RSS feed of the website. Contact us if
you have any comments or suggestions!
http://www.deeplearning.net/tutorial/
http://deeplearning.net
BigDataWeek Videos
Big Data Week is one of the most unique global platforms of interconnected
community events focusing on the social, political, technological and commercial
impacts of Big Data. It brings together a global community of data scientists, data
technologies, data visualisers and data businesses spanning six major commercial,
financial, social and technological sectors.
http://www.youtube.com/user/BigDataWeek/videos
Neural Information Processing Systems Foundation (NIPS) Video resources
The Foundation: The Neural Information Processing Systems (NIPS) Foundation is a
non-profit corporation whose purpose is to foster the exchange of research on
neural information processing systems in their biological, technological,
mathematical, and theoretical aspects. Neural information processing is a field
which benefits from a combined view of biological, physical, mathematical, and
computational sciences.
The primary focus of the NIPS Foundation is the presentation of a continuing series
of professional meetings known as the Neural Information Processing Systems
Conference, held over the years at various locations in the United States, Canada and
Spain.
http://www.youtube.com/user/NeuralInformationPro/feed

Hong Kong Open Source Conference 2013 (English&Chinese)

Wang Leung Wong
The Vice-Chairperson of the Hong Kong Linux User Group
This channel will post the videos of my life and opensource events in Hong Kong.

Hong Kong Linux User Group: http://linux.org.hk
Facebook: https://www.facebook.com/groups/hklug/
http://www.youtube.com/playlist?list=PL2FSfitY-hTKbEKNOwb-j0blK6qBauZ1f
http://www.youtube.com/playlist?list=PL2FSfitY-hTLOL6tT_12YUK4c67e-E0xh
ICLR 2014 Videos
It is well understood that the performance of machine learning methods is heavily
dependent on the choice of data representation (or features) on which they are
applied. The rapidly developing field of representation learning is concerned with
questions surrounding how we can best learn meaningful and useful
representations of data. We take a broad view of the field, and include in it topics
such as deep learning and feature learning, metric learning, kernel learning,
compositional models, non-linear structured prediction, and issues regarding non-
convex optimization.
Despite the importance of representation learning to machine learning and to
application areas such as vision, speech, audio and NLP, there is currently no
common venue for researchers who share a common interest in this topic. The goal
of ICLR is to help fill this void.
ICLR 2014 will be a 3-day event from April 14th to April 16th 2014, in Banff,
Canada. The conference will follow the recently introduced open reviewing and
open publishing publication process, which is explained in further detail
here: Publication Model.
https://www.youtube.com/playlist?list=PLhiWXaTdsWB-3O19E0PSR0r9OseIylUM8

ICLR 2013 Videos

ICLR 2013 will be a 3-day event from May 2nd to May 4th 2013, co-located
with AISTATS2013 in Scottsdale, Arizona. The conference will adopt a novel
publication process, which is explained in further detail here: Publication Model.
https://sites.google.com/site/representationlearning2013/program-details/program

Machine Learning Conference Videos
Events matching your search:
ICML 2011
Sixth Annual Machine Learning Symposium
1st Lisbon Machine Learning School
Copulas in Machine Learning Workshop 2011
NIPS 2011 Workshop on Integrating Language and Vision
Machine Learning in Computational Biology (MLCB) 2011
Learning Semantics Workshop
Sparse Representation and Low-rank Approximation
Big Learning: Algorithms, Systems, and Tools for Learning at Scale
ICML 2012 Oral Talks (International Conference on Machine Learning)
Big Data Meets Computer Vision: First International Workshop on Large Scale Visual Recognition and Retrieval
2nd Workshop on Semantic Perception, Mapping and Exploration (SPME)
Object, functional and structured data: towards next generation kernel-based methods - ICML 2012 Workshop
Tutorial on Statistical Learning Theory in Reinforcement Learning and Approximate Dynamic Programming
Tutorial on Causal inference - conditional independences and beyond
ICML 2012 Tutorial on Prediction, Belief, and Markets
Performance Evaluation for Learning Algorithms: Techniques, Application and Issues
2nd Lisbon Machine Learning School (2012)
OpenCV using Python
Big Learning : Algorithms, Systems, and Tools
NIPS 2012 Workshop on Log-Linear Models
Machine Learning in Computational Biology (MLCB) 2012
NYU Course on Big Data, Large Scale Machine Learning
International Conference on Learning Representations (ICLR) 2013
ICML 2013 Plenary Webcast
The 4th International Workshop on Music and Machine Learning: Learning from Musical Structure
ICML 2012 Workshop on Representation Learning
Inferning 2012: ICML Workshop on interaction between Inference and Learning
PAC-Bayesian Analysis in Supervised, Unsupervised, and Reinforcement Learning
Sixteenth International Conference on Artificial Intelligence and Statistics (AISTATS) 2013
NYU Course on Deep Learning (Spring 2014)
NYU Course on Machine Learning and Computational Statistics 2014
http://techtalks.tv/search/results/?q=machine+learning

Internet Archive
Hello Patron,
Every day 3 million people use our collections.
We have archived over ten petabytes (that's 10,000,000,000,000,000 bytes!) of
information, including everything ever written in Balinese. This year we also
launched our groundbreaking TV News Search and Borrow service, which former
FCC Chairman Newton Minow said "offers citizens exceptional opportunities" to
easily do their own fact checking and "to hold powerful public institutions
accountable."
Your support helps us build amazing services and keep them free for people around
the globe.
https://archive.org/search.php?query=machine%20learning

University of California, Berkeley
http://www.youtube.com/user/UCBerkeley/search?query=machine+learning

AMP Camps, Big Data Bootcamp, UC Berkeley
AMP Camps are Big Data training events organized by the UC Berkeley AMPLab
about big data analytics, machine learning, and popular open-source software
projects produced by the AMPLab. All AMP Camp curriculum, and whenever
possible videos of instructional talks presented at AMP Camps, are published here
and accessible for free.
http://ampcamp.berkeley.edu
AMP Camp 5 was held at UC Berkeley and live-streamed online on November 20 and
21, 2014. Videos and exercises from the event are available on the AMPCamp 5 page.
http://ampcamp.berkeley.edu/5/

Resources and Tools of Noah's ARK Research Group
The following were developed by ARK researchers (*developed in whole or in part
before joining ARK):
NLP tools:
universal part-of-speech tagset, set of twelve coarse POS tags that generalizes
across several languages
Semantics: SEMAFOR, an open-source statistical frame-semantic parser; AMALGr,
an open-source statistical analyzer for multiword expressions in context
Syntax: TurboParser, an open-source, trainable statistical dependency parser;
MSTParserStacked, an open-source, trainable statistical dependency parser based
on stacking; DAGEEM code for unsupervised dependency grammar induction
Information extraction: Arabic named entity recognizer
Libraries/languages: AD3, an approximate MAP decoder; *Dyna, a declarative
programming language for dynamic programming algorithms
Machine translation tools, including: *cdec, a framework for statistical translation
and other structure prediction problems; *Egypt, a statistical machine translation
toolkit that includes Giza; gappy pattern models, code for modeling monolingual and
bilingual textual patterns with gaps; Rampion, a training algorithm for statistical
machine translation models
Social media tools, including: Twitter NLP resources
Datasets: *STRAND (parallel text collections from the web); CURD (the Carnegie
Mellon University Recipe Database); 10-K Corpus (company annual reports and
stock return volatility data); political blog corpus; movie$ corpus; movie summary
corpus; question-answer data; Congressional bills corpus; Arabic named entity and
supersense corpora; NFL tweets corpus; multiword expressions corpus
Project websites: Flexible Learning for NLP; Low-Density MT; Compuframes, Big
Multilinguality, Corporate Social Network
http://www.ark.cs.cmu.edu/#resources

ESAC DATA ANALYSIS AND STATISTICS WORKSHOP 2014
ABOUT THE ESAC FACULTY
The ESAC Faculty was created in 2006 in order to foster an effective scientific
environment at ESAC, and to present a united face to the scientific work done at
the centre. The faculty includes all active (i.e. publishing papers) research scientists
at ESAC: ESA staff, Research Fellows, Science Contractors, and LAEFF members. For
an insight into the founding principles, see the Overview of the ESAC Faculty
presentation given at the first assembly.
The ESAC Faculty's main purpose is to stimulate and promote science activities at
ESAC. For this it maintains an active and attractive visitor programme for
short-to-medium term collaborative stays at ESAC, covering established researchers
as well as young post-docs, PhD and graduate students. The Faculty also supports
visiting seminar speakers, conferences, workshops and travel not possible via normal
mission budgets.
ESAC Faculty members pursue their own research (as per the scientific interests of
individual members), but are also involved in numerous internal and external
collaborations (overview of Faculty Science at ESAC). Faculty members are also
strongly involved in the ESAC Trainee programme.
http://www.cosmos.esa.int/web/esac-science-faculty/esac-statistics-workshop-2014
The Royal Society
The Royal Society is a self-governing Fellowship of many of the world's most
distinguished scientists drawn from all areas of science, engineering, and medicine.
The Society's fundamental purpose, reflected in its founding Charters of the 1660s,
is to recognise, promote, and support excellence in science and to encourage the
development and use of science for the benefit of humanity.
The Society has played a part in some of the most fundamental, significant, and life-
changing discoveries in scientific history and Royal Society scientists continue to
make outstanding contributions to science in many research areas.
The Royal Society is the national Academy of science in the UK, and its core is its
Fellowship and Foreign Membership, supported by a dedicated staff in London and
elsewhere. The Fellowship comprises the most eminent scientists of the UK, Ireland
and the Commonwealth.
A major activity of the Society is identifying and supporting the work of outstanding
scientists. The Society supports researchers through its early and senior career
schemes, innovation and industry schemes, and other schemes.
The Society facilitates interaction and communication among scientists via its
discussion meetings, and disseminates scientific advances through its journals. The
Society also engages beyond the research community, through independent policy
work, the promotion of high quality science education, and communication with the
public.
https://www.youtube.com/user/RoyalSociety/videos?spfreload=10
Statistical and causal approaches to machine learning by Professor Bernhard
Schölkopf
https://www.youtube.com/watch?v=ek9jwRA2Jio&spfreload=10

Deep Learning

Deep Learning RNNaissance with Dr. Juergen Schmidhuber
A great session of NYC-ML Meetup Hosted by ShutterStock in the glorious Empire
State building. Details:
Deep Learning RNNaissance
Machine learning and pattern recognition are currently being revolutionised by
"Deep Learning" (DL)
https://www.youtube.com/watch?v=6bOMf9zr7N8&spfreload=10

Introduction to Deep Learning with Python by Alec Radford
Alec Radford, Head of Research at indico Data Solutions, speaking on deep learning
with Python and the Theano library. The emphasis of the talk is on high
performance computing, natural language processing using recurrent neural nets,
and large scale learning with GPUs.
https://www.youtube.com/watch?v=S75EdAcXHKk
SlideShare presentation is available here:
http://slidesha.re/1zs9M11

Miscellaneous
Introduction To Modern Brain-Computer Interface Design by Swartz Center for
Computational Neuroscience
This is an online course on Brain-Computer Interface (BCI) design with a focus on
modern methods. The lectures were first given by Christian Kothe (SCCN/UCSD) in
2012 at University of Osnabrueck within the Cognitive Science curriculum and have
now been recorded in the form of an open online course.
The course includes basics of EEG, BCI, signal processing, machine learning, and also
contains tutorials on using BCILAB and the lab streaming layer software.
http://sccn.ucsd.edu/wiki/Introduction_To_Modern_Brain-Computer_Interface_Design

Distributed Computing Courses (lectures, exercises with solutions) by ETH Zurich,
Group of Prof. Roger Wattenhofer
Mission
We are interested in both theory and practice of computer science and information
technology. In our group we cultivate a large breadth of areas, reflecting our
different backgrounds in computer science, mathematics, and electrical engineering.
This gives us a unique blend of basic and applied research, proving mathematical
theorems on the one hand, and building practical systems on the other.
We currently study the following topics: Distributed computing (computability,
locality, complexity), distributed systems (Bitcoin), wireline networks (software
defined networks), wireless networks (media access theory and practice), social
networks (influence), algorithms (online algorithms, game theory), learning theory
(recommendation theory and practice). We regularly publish in different
communities: distributed computing (e.g. PODC, SPAA, DISC), networking (e.g.
SIGCOMM, MobiCom, SenSys), theory (e.g. STOC, FOCS, SODA, ICALP), and from time
to time at random in areas such as machine learning or human computer
interaction.
Members of our group have won several best paper awards at top conferences such
as PODC, SPAA, DISC, MobiCom, or P2P. Roger Wattenhofer has won the Prize for
Innovations in Distributed Computing in 2012, for extensive contributions to the
study of distributed approximation. Some projects turned into startup companies,
e.g. Wuala, StreamForge, BitSplitters. Several projects have been covered by popular
media and blogs, e.g. Gizmodo, Lifehacker, New York Times, NZZ, PC World
Magazine, Red Herring, or Technology Review. Some of the software developed by
our students is very popular: The music application Jukefox and the peer-to-peer
client BitThief have together more than 1 million downloads. A branch of the United
States FBI has requested to use a version of BitThief as a tool to uncover illegal
activities. About half of the former PhD students are in academic positions, some
others founded startup companies.
http://dcg.ethz.ch/courses.html

The wonderful and terrifying implications of computers that can learn | Jeremy
Howard | TEDxBrussels
Published on 6 Dec 2014
This talk was given at a local TEDx event, produced independently of the TED
Conferences. The extraordinary, wonderful, and terrifying implications of
computers that can learn
https://www.youtube.com/watch?v=xx310zM3tLs&spfreload=10

MOOC or Opencourseware - Spanish

Coming soon
MOOC or Opencourseware - German

Coming soon
MOOC or Opencourseware - Italian

Coming soon
MOOC or Opencourseware French

University of Laval (French Canadian)
Open access to the course material
Apprentissage automatique (Machine Learning)
Machine learning from data and supervised learning. Empirical risk minimization and
structural risk minimization. Methods for estimating the true risk from data, and
confidence intervals. Linear and non-linear classifiers. Dual form of the perceptron
algorithm. Mercer kernels. Large-margin classifiers. Hard-margin and soft-margin SVMs.
Probably approximately correct (PAC) learning and the Vapnik-Chervonenkis theory of
classifier prediction error. Sample-compression learning and applications to SCMs and
perceptrons.
https://cours.ift.ulaval.ca/2009a/ift7002_81602/
Théorie algorithmique des graphes (Algorithmic Graph Theory)
This course covers topics such as connectivity in a graph (maximum flow, min-max
duality, perfect matching, etc.), graph planarity (Euler's formula, Kuratowski's
theorem, dual graph), graph coloring (integer and fractional colorings of vertices or
edges, Kneser graphs), graph traversal problems (Eulerian tours, Hamiltonian cycles,
de Bruijn graphs, etc.) and random walks on a graph (Markov chains, existence of the
limiting distribution, mixing time, etc.). Several graph problems have elegant
solutions; others, of course, are NP-complete, so part of this course covers
complexity theory (NP and NP-complete problems, Cook's theorem, reduction algorithms).
https://cours.ift.ulaval.ca/2012a/ift7012_89927/

Hugo Larochelle, Apprentissage automatique (Machine Learning), French Canadian
I am interested in machine learning algorithms, that is, algorithms capable of
extracting concepts or patterns from data. My work focuses on developing
connectionist and probabilistic approaches to various artificial intelligence
problems, such as computer vision and natural language processing.
My research interests include:
Problems: supervised, semi-supervised and unsupervised learning, structured output
prediction, ranking, density estimation;
Models: deep neural networks (deep learning), autoencoders, Boltzmann machines,
Markov random fields;
Applications: object recognition and tracking, document classification and ranking;
https://www.youtube.com/channel/UCiDouKcxRmAdc5OeZdiRwAg
http://www.dmi.usherb.ca/~larocheh/index_fr.html

Francis Bach, Ecole Normale Superieure - Courses and Exercises with solutions
(English-French)
Vision, Apprentissage" - Ecole Normale Superieure de Cachan
Spring 2013: Statistical machine learning - Master M2 "Probabilites et Statistiques" - Universite Paris-Sud (Orsay)
Spring 2013: Statistical machine learning - Filiere Math/Info - L3 - Ecole Normale Superieure (Paris)
Fall 2012: An introduction to graphical models - Master M2 "Mathematiques, Vision, Apprentissage" - Ecole Normale Superieure de Cachan
Spring 2012: Statistical machine learning - Filiere Math/Info - L3 - Ecole Normale Superieure (Paris)
Vision, Apprentissage" - Ecole Normale Superieure de Cachan
May 2008: Probabilistic modelling and graphical models: Enseignement Specialise - Ecole des Mines de Paris
Fall 2007: An introduction to graphical models - Master M2 "Mathematiques, Vision, Apprentissage" - Ecole Normale Superieure de Cachan
May 2007: Probabilistic modelling and graphical models: Enseignement Specialise - Ecole des Mines de Paris
College de France, Mathematics and Digital Science, French
One of the Collège de France's missions is to promote French research and thought
abroad, and to participate in intellectual debates on major world issues. The
institution therefore participates in international exchange through its teaching and
the dissemination of knowledge, as well as through the research programmes
involving its Chairs and laboratories. The fact that one fifth of the professors are
currently from abroad confirms the Collège de France's widening research and
education policy.
This policy of international openness translates into:
Collège de France professors' teaching missions abroad
Lectures and lecture series by visiting professors
Junior Visiting Researchers scheme
Lecture series and symposia abroad
Internet broadcasts
http://www.college-de-france.fr/site/audio-video/_audiovideos.jsp?index=0&prompt=&fulltextdefault=mots-cles...&fulltext=&fields=TYPE2_ACTIVITY&fieldsdefault=0_0&TYPE2=0&ACTIVITY=mathematiques

more to come
MOOC or Opencourseware - Russian

Russian Machine Learning Resources
Google Translation from Russian:
A professional information and analytical resource dedicated to
machine learning, pattern recognition and data mining.
The resource now contains 831 articles in Russian. (Source 16-07-2014)
Classification
Pattern recognition
Regression analysis
Prediction
Analysis and understanding of images
Processing and analysis of texts
Applied Statistics
Applied Data Analysis Systems
Signal Processing
All Areas

http://www.machinelearning.ru/wiki/index.php?title=_

Yandex School
The Yandex School of Data Analysis
The School of Data Analysis is a free Master's-level program in Computer Science
and Data Analysis, which has been offered by Yandex since 2007 to graduates in
engineering, mathematics, computer science or related fields. The aim of the School
is to train specialists in data analysis and information retrieval for further
employment at Yandex or any other IT company.

The School's courses are taught by Russian and international experts at Yandex's
Moscow office in the evenings, several times a week. The average study load is
15-20 hours per week, including 9-12 hours of lectures and seminars. The School also
runs distance-learning courses and provides lectures over the internet. All courses
at the Yandex School of Data Analysis are currently taught only in Russian.
http://shad.yandex.ru/lectures/

Alexander Dyakonov Resources
http://alexanderdyakonov.narod.ru/index.htm
Unknown in Data Mining and Machine Learning (2013)

http://alexanderdyakonov.narod.ru/lpot4emu.pdf
Introduction to Data Mining (2012)

http://alexanderdyakonov.narod.ru/intro2datamining.pdf
Tricks in Data Mining (2011)
http://alexanderdyakonov.narod.ru/lpotdyakonov.pdf
Manual "Logic Games, Data Mining, Weka, RapidMiner, MATLAB" (2010)
http://www.machinelearning.ru/wiki/images/7/7e/Dj2010up.pdf


Machine Learning lectures by Konstantin Vorontsov.

http://shad.yandex.ru/lectures/machine_learning.xml

More to come
MOOC or Opencourseware - Japanese

Coming soon
MOOC or Opencourseware Chinese

Yeeyan Coursera Chinese Classroom

Google Translation from Chinese (Simplified Han) to English
Welcome to the Yeeyan Coursera Chinese classroom.
In this classroom, where study companions are always at hand, you can:
join collaborative translation;
exchange ideas;
sign up to become a class representative;
check in regularly and ask others to keep you accountable;
......
Finally, you are welcome to show off your certificate, whether it is a joint
Coursera and Yeeyan Translator's Certificate or a Coursera course certificate.
Either way, you have overcome yourself and are a winner in life!
http://coursera.yeeyan.org/
Hong Kong Open Source Conference 2013
Wang Leung Wong
The Vice-Chairperson of the Hong Kong Linux User Group
This channel will post the videos of my life and opensource events in Hong Kong.
Hong Kong Linux User Group: http://linux.org.hk
Facebook: https://www.facebook.com/groups/hklug/
http://www.youtube.com/playlist?list=PL2FSfitY-hTKbEKNOwb-j0blK6qBauZ1f

Guokr.com
Machine Learning
http://mooc.guokr.com/search/?wd=+%E6%9C%BA%E5%99%A8%E5%AD%A6%E4%B9%A0
Data Mining
http://mooc.guokr.com/search/?wd=%E6%95%B0%E6%8D%AE%E6%8C%96%E6%8E%98
Artificial Intelligence
http://mooc.guokr.com/search/?wd=%E4%BA%BA%E5%B7%A5%E6%99%BA%E8%83%BD
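The Guokr search links carry their Chinese search terms as percent-encoded UTF-8 in the `wd` query parameter. The sketch below uses Python's standard `urllib.parse` module to decode the three terms from the listing and to build a link for a new term. The helper names `search_term` and `build_search_url` are illustrative only, and whether the site still answers these queries has not been checked.

```python
from urllib.parse import parse_qs, quote, urlsplit

# The three search links from the listing, each joined onto one line.
GUOKR_SEARCH_URLS = [
    "http://mooc.guokr.com/search/?wd=+%E6%9C%BA%E5%99%A8%E5%AD%A6%E4%B9%A0",
    "http://mooc.guokr.com/search/?wd=%E6%95%B0%E6%8D%AE%E6%8C%96%E6%8E%98",
    "http://mooc.guokr.com/search/?wd=%E4%BA%BA%E5%B7%A5%E6%99%BA%E8%83%BD",
]


def search_term(url):
    """Return the decoded value of the `wd` query parameter."""
    query = urlsplit(url).query
    # parse_qs undoes the percent-encoding and turns '+' into a space.
    return parse_qs(query)["wd"][0].strip()


def build_search_url(term):
    """Hypothetical helper: percent-encode a term into the same URL shape."""
    return "http://mooc.guokr.com/search/?wd=" + quote(term)


terms = [search_term(url) for url in GUOKR_SEARCH_URLS]
for term in terms:
    print(term)
# 机器学习  (machine learning)
# 数据挖掘  (data mining)
# 人工智能  (artificial intelligence)
```

Calling `build_search_url("数据挖掘")` reproduces the second link exactly, which is a quick way to confirm that a line-wrapped URL was rejoined correctly.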

More coming soon

MOOC or Opencourseware - Portuguese

Aprendizado de Máquina (Machine Learning) by Bianca Zadrozni, Instituto de Computação, UFF, 2010
http://www2.ic.uff.br/~bianca/aa/

Algoritmo de Aprendizado de Máquina (Machine Learning Algorithms) by Aurora Trinidad Ramirez Pozo,
Universidade Federal do Paraná, UFPR
http://www.inf.ufpr.br/aurora/tutoriais/aprendizadomaq/
http://www.inf.ufpr.br/aurora/tutoriais/arvoresdecisao/
http://www.inf.ufpr.br/aurora/tutoriais/Ceapostila.pdf
http://www.inf.ufpr.br/aurora/

Digital Library, Universidade de São Paulo
http://www.teses.usp.br/index.php?option=com_jumi&fileid=20&Itemid=96&lang=en&cx=011662445380875560067%3Acack5lsxley&cof=FORID%3A11&hl=en&q=machine+learning&siteurl=www.teses.usp.br%2Findex.php%3Foption%3Dcom_jumi%26fileid%3D20%26Itemid%3D96%26lang%3Den&ref=www.teses.usp.br%2F&ss=5799j3321895j16

Coming soon
MOOC or Opencourseware Hebrew&English

Open University of Israel
http://www.youtube.com/user/openofek/search?query=machine+learning


More coming soon

Applications
MIT Media Lab
The real-time city is now real! The increasing deployment of sensors and hand-held
electronics in recent years is allowing a new approach to the study of the built
environment. The way we describe and understand cities is being radically
transformed - alongside the tools we use to design them and impact on their
physical structure.
Studying these changes from a critical point of view and anticipating them is the
goal of the SENSEable City Laboratory, a new research initiative at the
Massachusetts Institute of Technology.
http://senseable.mit.edu
TEDx San Francisco, Connected Reality
Connected Reality is an evening that explored how the exponential technologies of
the Internet of Things will give us deep insights that augment our understanding of
the world and each other and will propel our ability to build intelligent tools that
augment our lives. We'll briefly see the future through the eyes of presenters from
industries ranging from medicine to manufacturing, who will illustrate how they use
sensor data to perceive and understand the world differently and adjust their
realities based on their new connectivity to their environment.
http://tedxsf.org/videos/#tedxsf-connected-reality
Emotion&Pain Project
One of the main challenges facing healthcare providers in the UK today (and in
Europe) is the rising number of people with chronic health problems. Almost 1 in 7
UK citizens experiences chronic pain, some due to chronic diseases such as
osteoarthritis, but much of it mechanical low back pain (LBP) with no treatable
pathology. 40% of these people experience severe pain and are very restricted by it.
The capacity of our current health care system is insufficient to treat all these
patients face-to-face. Pain experience is affected by physical, psychological, and
social factors and hence it poses a problem to the medical profession. This has
prompted the development of a multidisciplinary approach to the treatment of
chronic LBP, primarily involving psychology and physiotherapy alongside specialist
clinicians (see British Pain Society guidelines). These programmes enable patients
to become more self-managing through improving their physical and psychological
functioning. While short term results are good, maintenance of these gains, and
building on them, remains a problem, with psychological factors being one of the
primary limiting causes.
Rehabilitation-assistive technologies have shown some success in helping recovery
in a number of conditions but have yet to have an impact in pain management,
mostly because of the complexity of dealing with emotional and motivational
aspects of self-directed activity increase. By providing the means to automatically
recognise, interpret, and act upon human affective states, recent developments in
sensing technology and the field of affective computing offer new avenues for
addressing these limitations and alleviating the difficulties patients face in building
on treatment gains.
Thus we propose the design and development of an intelligent system that will
enable ubiquitous monitoring and assessment of patients' pain-related mood and
movements inside (and in the longer term, outside) the clinical environment.
Specifically, we aim to
(a) develop a set of methods for automatically recognising audiovisual cues related
to pain, behavioural patterns typical of low back pain, and affective states
influencing pain, and
(b) integrate these methods into a system that will provide appropriate feedback
and prompts to the patient based on his/her behaviour measured during self-
directed physical therapy sessions. In doing so, we seek to develop a new generation
of multimodal patient-centred personal health technology.
http://www.emo-pain.ac.uk
NHK Documentary Robot Revolution Developing Robots for Dangerous Fukushima
Decommission Process
http://www.youtube.com/watch?v=mDD1TGv_2fo

IBM Research
Machine learning applications
Five innovations that will change our lives within five years
http://www.research.ibm.com/cognitive-computing/machine-learning-applications/index.shtml#fbid=Dp4uN7k8b2O

EPFL École Polytechnique Fédérale de Lausanne

EPFL is one of two Federal Institutes of Technology in Switzerland. Located along
the shore of Lake Geneva, the university has more than 9,000 students in seven
academic schools including Life Science, Architecture, and Computer Sciences.
http://www.youtube.com/channel/UClMJeVIVyGp-3_kWtspkS0Q

Visualizing MBTA Data: An interactive exploration of Boston's subway system
Boston's Massachusetts Bay Transit Authority (MBTA) operates the 4th busiest
subway system in the U.S. after New York, Washington, and Chicago. We attempt
to present this information to help people in Boston better understand the trains,
how people use the trains, and how the people and trains interact with each other.
http://mbtaviz.github.io

Commercial Applications (listed without any transfer of money)

Google glass
http://www.youtube.com/watch?v=D7TB8b2t3QE
Google self-driving car

http://www.youtube.com/watch?v=cdgQpa1pUUE
SenseFly
http://www.youtube.com/watch?v=NuZUSe87miY

Free access to Research papers - English

Cambridge University Publications page
http://mlg.eng.cam.ac.uk/pub/
Google Scholar
Stand on the shoulders of giants.
Google Scholar provides a simple way to broadly search for scholarly literature.
From one place, you can search across many disciplines and sources: articles, theses,
books, abstracts and court opinions, from academic publishers, professional
societies, online repositories, universities and other web sites. Google Scholar helps
you find relevant work across the world of scholarly research.
http://scholar.google.com/intl/en/scholar/about.html
http://scholar.google.com/citations?view_op=search_authors&hl=en&mauthors=machine+learning&before_author=m83-_28PAAAJ&astart=0
Google Research
Google publishes hundreds of research papers each year. Publishing is important to
us; it enables us to collaborate and share ideas with, as well as learn from, the
broader scientific community. Submissions are often made stronger by the fact that
ideas have been tested through real product implementation by the time of
publication.
http://research.google.com/pubs/papers.html
Yahoo Research
The machine learning group is a team of experts in computer science, statistics,
mathematical optimization, and automatic control. They focus on making computers
learn abstractions, patterns, conditional probability distributions, and policies from
web scale data with the goal to improve the online experience for Yahoo! users,
partner publishers, and advertisers.
Machine learning has such a broad influence on the internet that it can be quite difficult
to recognize. Machine learning's benefits are often hidden: they are the spam
emails you don't see, the uninteresting news articles you don't see, and the
irrelevant search results you don't see, just to name a few. Machine learning is one
of the best technologies we have for solving some of the biggest problems on the
Web.
http://labs.yahoo.com/areas/?areas=machine-learning
Microsoft Research
The Machine Learning Groups of Microsoft Research include a set of researchers
and developers who push the state of the art in machine learning. We span the space
from proving theorems about the math underlying ML, to creating new ML systems
and algorithms, to helping our partner product groups apply ML to large and
complex data sets.
http://research.microsoft.com/en-us/groups/mldept/
Journal from MIT Press
The Journal of Machine Learning Research (JMLR) provides an international forum
for the electronic and paper publication of high-quality scholarly articles in all areas
of machine learning. All published papers are freely available online.
http://jmlr.org
INRIA
Access to Research Papers
http://haltools.inrialpes.fr/Public/afficheRequetePubli.php?labos_exp=sierra&CB_auteur=oui&CB_titre=oui&CB_article=oui&langue=Anglais&tri_exp=annee_publi&tri_exp3=date_publi&ordre_aff=TA&Fen=Aff&css=../css/VisuCondense.css

Open Source Software English

JAVA
Weka 3: Data Mining Software in Java
Weka is a collection of machine learning algorithms for data mining tasks. The
algorithms can either be applied directly to a dataset or called from your own Java
code. Weka contains tools for data pre-processing, classification, regression,
clustering, association rules, and visualization. It is also well-suited for developing
new machine learning schemes.
http://www.cs.waikato.ac.nz/~ml/weka/index.html
A deep-learning library for Java
Distributed Deep Learning Platform for Java
https://github.com/agibsonccc/java-deeplearning
List of Java ML Software by Machine Learning Mastery
http://machinelearningmastery.com/java-machine-learning/

List of Java ML Software by MLOSS
http://mloss.org/software/language/java/

PYTHON
Theano Library for Deep Learning
Theano is a Python library that allows you to define, optimize, and evaluate
mathematical expressions involving multi-dimensional arrays efficiently. Theano
features:
tight integration with NumPy: use numpy.ndarray in Theano-compiled functions.
transparent use of a GPU: perform data-intensive calculations up to 140x faster than with CPU (float32 only).
efficient symbolic differentiation: Theano does your derivatives for functions with one or many inputs.
speed and stability optimizations: get the right answer for log(1+x) even when x is really tiny.
dynamic C code generation: evaluate expressions faster.
extensive unit-testing and self-verification: detect and diagnose many types of mistake.
Theano has been powering large-scale computationally intensive scientific
investigations since 2007. But it is also approachable enough to be used in the
classroom (IFT6266 at the University of Montreal).
http://deeplearning.net/software/theano/
http://nbviewer.ipython.org/github/craffel/theano-tutorial/blob/master/Theano%20Tutorial.ipynb
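The Theano feature list mentions getting the right answer for log(1+x) when x is really tiny. The underlying numerical issue can be seen without Theano. The following is a minimal sketch using only Python's standard math module; it illustrates the problem and is not Theano's own implementation.

```python
import math

x = 1e-20  # far smaller than double-precision machine epsilon (~2.2e-16)

# Naive form: 1 + x rounds to exactly 1.0, so the logarithm collapses to 0.
naive = math.log(1 + x)

# Stable form: log1p evaluates log(1 + x) without forming 1 + x explicitly.
stable = math.log1p(x)

print(naive)   # 0.0
print(stable)  # 1e-20
```

A library that rewrites expressions symbolically can, in principle, substitute the stable form automatically, which is the kind of optimization the feature list describes.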
Introduction to Deep Learning with Python
Alec Radford, Head of Research at indico Data Solutions, speaking on deep learning
with Python and the Theano library. The emphasis of the talk is on high
performance computing, natural language processing using recurrent neural nets,
and large scale learning with GPUs.
https://www.youtube.com/watch?v=S75EdAcXHKk
Udacity - Programming foundations with Python
You'll pick up some great tools for your programming toolkit in this course! You
will:
Start coding in the programming language Python;
Reuse and share code with Object Oriented Programming;
Create and share amazing, life-hacking projects!
Scikit-learn, Machine Learning in Python
Simple and efficient tools for data mining and data analysis
Accessible to everybody, and reusable in various contexts
Built on NumPy, SciPy, and matplotlib
Open source, commercially usable - BSD license
http://scikit-learn.org/stable/index.html
PyData
PyData is a gathering of users and developers of data analysis tools in Python. The
goals are to provide Python enthusiasts a place to share ideas and learn from each
other about how best to apply our language and tools to ever-evolving challenges in
the vast realm of data management, processing, analytics, and visualization.
https://www.youtube.com/user/PyDataTV/videos
PyData NYC 2014 Videos
https://www.youtube.com/user/PyDataTV/videos?spfreload=10
We aim to be an accessible, community-driven conference, with tutorials for
novices, advanced topical workshops for practitioners, and opportunities for
package developers and users to meet in person.
A major goal of the conference is to provide a venue for users across all the various
domains of data analysis to share their experiences and their techniques, as well as
highlight the triumphs and potential pitfalls of using Python for certain kinds of
problems.
http://pydata.org/nyc2014/about/about/

PyData, The Complete Works by Rohit Sivaprasad
Added in the kit 11-Nov-2014
The unofficial index of all PyData talks. This was initially going to be a pickled pandas
DataFrame object, but then I decided against it. So here it is - in beautiful GitHub
flavored Markdown.
There are placeholders for links to the video. Currently, the hyperlinks point to the
pydata.org talk pages. Please do feel free to make it better by contributing to the
repo.
https://github.com/DataTau/datascience-anthology-pydata
Anaconda
Completely free enterprise-ready Python distribution for large-scale data
processing, predictive analytics, and scientific computing
We want to ensure that Python, NumPy, SciPy, Pandas, IPython, Matplotlib, Numba,
Blaze, Bokeh, and other great Python data analysis tools can be used everywhere.
We want to make it easier for Python evangelists and teachers to promote the use of
Python.
We want to give back to the Python community that we love being a part of.
https://store.continuum.io/cshop/anaconda/
IPython Interactive Computing
IPython provides a rich architecture for interactive computing with:
Powerful interactive shells (terminal and Qt-based).
A browser-based notebook with support for code, text, mathematical expressions,
inline plots and other rich media.
Support for interactive data visualization and use of GUI toolkits.
Flexible, embeddable interpreters to load into your own projects.
Easy to use, high performance tools for parallel computing.
http://ipython.org
SciPy
SciPy refers to several related but distinct entities:
The SciPy Stack, a collection of open source software for scientific computing
in Python, and particularly a specified set of core packages.
The community of people who use and develop this stack.
Several conferences dedicated to scientific computing in Python - SciPy,
EuroSciPy and SciPy.in.
The SciPy library, one component of the SciPy stack, providing many numerical
routines.
http://www.scipy.org
NumPy
NumPy is the fundamental package for scientific computing with Python. It contains
among other things:
a powerful N-dimensional array object
sophisticated (broadcasting) functions
tools for integrating C/C++ and Fortran code
useful linear algebra, Fourier transform, and random number capabilities
Besides its obvious scientific uses, NumPy can also be used as an efficient multi-
dimensional container of generic data. Arbitrary data-types can be defined. This
allows NumPy to seamlessly and speedily integrate with a wide variety of databases.
http://www.numpy.org
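As an illustration of the N-dimensional array object and the broadcasting functions listed above, here is a minimal sketch that centres the columns of a small matrix. The data values and variable names are made up for the example.

```python
import numpy as np

# A 3x2 data matrix: three samples, two features (illustrative values).
data = np.array([[1.0, 10.0],
                 [2.0, 20.0],
                 [3.0, 30.0]])

# Column means have shape (2,). Broadcasting stretches them across all
# three rows, so no explicit loop is needed to centre each column.
col_means = data.mean(axis=0)
centred = data - col_means

print(col_means)
print(centred)
```

Broadcasting applies the shape (2,) vector to every row of the (3, 2) array without copying it, which is why the subtraction needs no loop.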
matplotlib
matplotlib is a Python 2D plotting library which produces publication-quality figures
in a variety of hardcopy formats and interactive environments across platforms.
matplotlib can be used in Python scripts, the Python and IPython shells (à la
MATLAB or Mathematica), web application servers, and six graphical user
interface toolkits.
http://matplotlib.org
pandas
Python Data Analysis Library
pandas is an open source, BSD-licensed library providing high-performance, easy-
to-use data structures and data analysis tools for the Python programming language.
http://pandas.pydata.org
SymPy
SymPy is a Python library for symbolic mathematics.
http://sympy.org/en/index.html
Orange
Open source data visualization and analysis for novices and experts. Data mining
through visual programming or Python scripting. Components for machine learning.
Add-ons for bioinformatics and text mining. Packed with features for data analytics.
http://orange.biolab.si
Pythonic Perambulations: How to be a Bayesian in Python
Below I'll explore three mature Python packages for performing Bayesian analysis
via MCMC:
emcee: the MCMC Hammer
pymc: Bayesian Statistical Modeling in Python
pystan: The Python Interface to Stan
http://jakevdp.github.io/blog/2014/06/14/frequentism-and-bayesianism-4-bayesian-in-python/
emcee
emcee is an extensible, pure-Python implementation of Goodman & Weare's Affine
Invariant Markov chain Monte Carlo (MCMC) Ensemble sampler. It's designed for
Bayesian parameter estimation and it's really sweet!
http://dan.iel.fm/emcee/current/

PyMC
PyMC is a Python module that implements Bayesian statistical models and fitting
algorithms, including Markov chain Monte Carlo. Its flexibility and extensibility
make it applicable to a large suite of problems. Along with core sampling
functionality, PyMC includes methods for summarizing output, plotting, goodness-
of-fit and convergence diagnostics.
http://pymc-devs.github.io/pymc/
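To give a feel for what a Markov chain Monte Carlo fitting algorithm does, here is a minimal random-walk Metropolis sampler written with the standard library only. It is a sketch of the general technique, not PyMC's API or implementation. The target density (a standard normal), the function names, and the step size are all assumptions chosen for the example.

```python
import math
import random


def log_target(x):
    """Unnormalised log-density of a standard normal (illustrative target)."""
    return -0.5 * x * x


def metropolis(log_density, n_samples, step=1.0, start=0.0, seed=0):
    """Random-walk Metropolis sampler for a one-dimensional target."""
    rng = random.Random(seed)
    x = start
    log_p = log_density(x)
    samples = []
    for _ in range(n_samples):
        proposal = x + rng.gauss(0.0, step)
        log_p_new = log_density(proposal)
        # Accept with probability min(1, p(proposal) / p(x)).
        accept_prob = math.exp(min(0.0, log_p_new - log_p))
        if rng.random() < accept_prob:
            x, log_p = proposal, log_p_new
        # A rejected proposal repeats the current state in the chain.
        samples.append(x)
    return samples


samples = metropolis(log_target, n_samples=20000)
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(f"mean ~ {mean:.2f}, variance ~ {var:.2f}")
```

With enough samples the printed mean and variance should approach 0 and 1, the moments of the target. Dedicated packages add the parts this sketch leaves out, such as convergence diagnostics and output summaries.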
Pylearn2
Ian J. Goodfellow, David Warde-Farley, Pascal Lamblin, Vincent Dumoulin, Mehdi
Mirza, Razvan Pascanu, James Bergstra, Frédéric Bastien, and Yoshua Bengio.
"Pylearn2: a machine learning research library". arXiv preprint arXiv:1308.4214
https://github.com/lisa-lab/pylearn2
Giant list of python learning resources
Keep following this post, we'll keep updating this huge list & collection.
http://python2web.com/giant-list-of-python-learning-resources/

PyCon US 2014
PyCon is the largest annual gathering for the community using and developing the
open-source Python programming language. It is produced and underwritten by the
Python Software Foundation, the 501(c)(3) nonprofit organization dedicated to
advancing and promoting Python. Through PyCon, the PSF advances its mission of
growing the international community of Python programmers.
Because PyCon is backed by the non-profit PSF, we keep registration costs much
lower than comparable technology conferences so that PyCon remains accessible to
the widest group possible. The PSF also pays for the ongoing development of the
software that runs PyCon and makes it available under a liberal open source license.
140 videos
http://pyvideo.org/category/50/pycon-us-2014
PyCon India 2012
https://www.youtube.com/playlist?list=PL6GW05BfqWIdWaV_aP6kHJKFY0ybOOfoA
PyCon India 2013

https://www.youtube.com/playlist?list=PL6GW05BfqWIdsaaV35jcHWPWTI-DAw6Yn

Montreal Python
Montréal-Python's mission is to promote the growth of a lively and dynamic
community of users of the Python programming language and to promote the use of
the latter. Montréal-Python also aims to disseminate the local Python knowledge to
build a stronger developer community. Montréal-Python promotes Free and Open
Source Software, favors its adoption within the community, and collaborates with
community players to achieve this goal.
http://www.youtube.com/user/MontrealPython/videos
http://montrealpython.org/en/
SciPy 2014
SciPy is a community dedicated to the advancement of scientific computing through
open source Python software for mathematics, science, and engineering. The annual
SciPy Conference allows participants from all types of organizations to showcase
their latest projects, learn from skilled users and developers, and collaborate on
code development.
http://pyvideo.org/category/51/scipy-2014
PyLadies London Meetup resources
PyLadies is an international mentorship group with a focus on helping more women
and genderqueers become active participants and leaders in the Python open-
source community. Our mission is to promote, educate and advance a diverse
Python community through outreach, education, conferences, events, and social
gatherings. PyLadies also aims to provide a friendly support network for women
and genderqueers, and a bridge to the larger Python world.
https://github.com/pyladieslondon/resources

Python Tools for Machine Learning by CB Insights
http://www.cbinsights.com/blog/python-tools-machine-learning

Python Tutorials by Jessica McKellar
I am a startup founder, software engineer, and open source developer living in San
Francisco, California.
I enjoy the Internet, networking, low-level systems engineering, relational
databases, tinkering on electronics projects, and contributing to and helping other
people contribute to open source software.
"Be the change you wish to see in the world" may be clichéd, but what can I say, I
believe in it. I am committed to applying my skills, in individual and collective
efforts, to improve the world. Right now, this means I spend a lot of time
volunteering, engaging technologists about education, and empowering effective
people and initiatives in my capacity as a Director for the Python Software
Foundation.
http://web.mit.edu/jesstess/

OCTAVE
GNU Octave is a high-level interpreted language, primarily intended for numerical
computations. It provides capabilities for the numerical solution of linear and
nonlinear problems, and for performing other numerical experiments. It also
provides extensive graphics capabilities for data visualization and manipulation.
Octave is normally used through its interactive command line interface, but it can
also be used to write non-interactive programs. The Octave language is quite similar
to Matlab so that most programs are easily portable.
http://www.gnu.org/software/octave/

JULIA
Julia is a high-level, high-performance dynamic programming language for technical
computing, with syntax that is familiar to users of other technical computing
environments. It provides a sophisticated compiler, distributed parallel execution,
numerical accuracy, and an extensive mathematical function library. The library,
largely written in Julia itself, also integrates mature, best-of-breed C and Fortran
libraries for linear algebra, random number generation, signal processing, and
string processing. In addition, the Julia developer community is contributing a
number of external packages through Julia's built-in package manager at a rapid
pace. IJulia, a collaboration between the IPython and Julia communities, provides a
powerful browser-based graphical notebook interface to Julia.
Julia programs are organized around multiple dispatch; by defining functions and
overloading them for different combinations of argument types, which can also be
user-defined. For a more in-depth discussion of the rationale and advantages of Julia
over other systems, see the following highlights or read the introduction in the
online manual.
http://julialang.org
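The idea of multiple dispatch described above (choosing an implementation from the combination of all argument types) can be sketched in Python for readers who do not have Julia installed. This is only an analogy, not how Julia implements dispatch. The dispatch decorator, the registry, and the combine function are hypothetical names invented for this example, and matching is on exact types only.

```python
# Registry mapping (function name, argument types) to an implementation.
_registry = {}


def dispatch(*types):
    """Register an implementation for one combination of argument types."""
    def decorator(func):
        _registry[(func.__name__, types)] = func

        def wrapper(*args):
            # Look up the implementation using the types of ALL arguments.
            key = (func.__name__, tuple(type(a) for a in args))
            impl = _registry.get(key)
            if impl is None:
                raise TypeError(f"no method matching {key}")
            return impl(*args)

        return wrapper
    return decorator


@dispatch(int, int)
def combine(a, b):
    return a + b


@dispatch(str, int)
def combine(a, b):
    return a * b


@dispatch(int, str)
def combine(a, b):
    return f"{a}:{b}"


print(combine(2, 3))     # 5
print(combine("ab", 2))  # abab
print(combine(2, "ab"))  # 2:ab
```

Calling combine with an unregistered combination, such as a float and an int, raises TypeError. Julia additionally resolves methods through its type hierarchy, which this exact-match sketch does not attempt.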
Julia by example
http://www.scolvin.com/juliabyexample/

The R PROJECT for Statistical Computing

R
R is a language and environment for statistical computing and graphics
R provides a wide variety of statistical (linear and nonlinear modelling, classical
statistical tests, time-series analysis, classification, clustering, ...) and graphical
techniques, and is highly extensible. The S language is often the vehicle of choice for
research in statistical methodology, and R provides an Open Source route to
participation in that activity.
One of R's strengths is the ease with which well-designed publication-quality plots
can be produced, including mathematical symbols and formulae where needed.
Great care has been taken over the defaults for the minor design choices in graphics,
but the user retains full control.
http://www.r-project.org

R Graph Gallery
The blog is a collection of script examples with example data and output plots. R
produces excellent-quality graphs for data analysis, science and business
presentations, publications and other purposes. Self-help code and examples are
provided. Enjoy nice graphs!
http://rgraphgallery.blogspot.co.uk/2013/04/ploting-heatmap-in-map-using-maps.html
Code School - R Course

Learn the R programming language for data analysis and visualization. This
software programming language is great for statistical computing and graphics.
https://www.codeschool.com/courses/try-r
Coursera R programming
In this course you will learn how to program in R and how to use R for effective data
analysis. You will learn how to install and configure software necessary for a
statistical programming environment and describe generic programming language
concepts as they are implemented in a high-level statistical language. The course
covers practical issues in statistical computing which includes programming in R,
reading data into R, accessing R packages, writing R functions, debugging, profiling
R code, and organizing and commenting R code. Topics in statistical data analysis
will provide working examples.
https://www.coursera.org/course/rprog
Open Intro R Labs
OpenIntro Labs promote the understanding and application of statistics through
applied data analysis. R is a widely used, stable, and free statistical software package.
RStudio is a user-friendly interface for R.
http://www.openintro.org/stat/labs.php
R Tutorial
Hierarchical Linear Model
Bayesian Classification with Gaussian Process
Bayesian Inference Using OpenBUGS
Significance Test for Kendall's Tau-b
Support Vector Machine with GPU, Part II
Hierarchical Cluster Analysis
http://www.r-tutor.com

DataCamp R Course
Introduction to R
Data Analysis and Statistical Inference
Introduction to Computational Finance and Financial Econometrics
How to work with Quandl in R
https://www.datacamp.com/courses

R Bloggers
R-Bloggers.com is a central hub (i.e., a blog aggregator) of content collected from
bloggers who write about R (in English). The site will help R bloggers and users to
connect and follow the R blogosphere (you can view a 7-minute talk, from
useR! 2011, for more information about the R blogosphere).
http://www.r-bloggers.com
Stan Software
Stan is a probabilistic programming language implementing full Bayesian statistical
inference with
MCMC sampling (NUTS, HMC)
and penalized maximum likelihood estimation with
Optimization (BFGS)
Stan is coded in C++ and runs on all major platforms (Linux, Mac,
Windows).
Stan is freedom-respecting, open-source software (new BSD core, GPLv3
interfaces).
Interfaces
Download and getting started instructions, organized by interface:
RStan v2.5.0 (R)
PyStan v2.5.0 (Python)
CmdStan v2.5.0 (shell, command-line terminal)
MatlabStan (MATLAB)
Stan.jl (Julia)
http://mc-stan.org
List of Machine Learning Open Source Software
To support the open source software movement, JMLR MLOSS publishes
contributions related to implementations of non-trivial machine learning
algorithms, toolboxes or even languages for scientific computing.
http://jmlr.org/mloss/

Google Prediction API
Google's cloud-based machine learning tools can help analyze your data to add the
following features to your applications: Customer sentiment analysis, Message
routing decisions, Document and email classification, Recommendation systems,
Churn analysis, Spam detection, Upsell opportunity analysis, Diagnostics, Suspicious
activity identification, and much more
Free Quota:
Usage is free for the first six months, up to the following limits per Google
Developers Console project. This free quota applies even when billing is enabled,
until the six-month expiration time.
Usage limits:
Predictions: 100 predictions/day
Hosted model predictions: Hosted models have a usage limit of 100
predictions/day/user across all models.
Training: 5MB trained/day
Streaming updates: 100 streaming updates/day
Lifetime cap: 20,000 predictions.
Expiration: Free quota expires six months after activating Google Prediction for
your project in the Google Developers Console.
https://developers.google.com/prediction/
Reddit
Reddit, stylized as reddit, is an entertainment, social networking service and news
website where registered community members can submit content, such as text posts
or direct links. Only registered users can then vote submissions "up" or "down" to
organize the posts and determine their position on the site's pages. Content entries
are organized by areas of interest called "subreddits". (Source: Wikipedia)
http://www.reddit.com/r/MachineLearning/
SHOGUN toolbox
A large scale machine learning toolbox. SHOGUN is designed for unified large-scale
learning for a broad range of feature types and learning settings, like classification,
regression, or explorative data analysis.
http://www.shogun-toolbox.org/page/home/
Comparison between ML toolboxes
https://docs.google.com/spreadsheet/ccc?key=0Aunb9cCVAP6NdDVBMzY1TjdPcmx4ei1EeUZNNGtKUHc&hl=en#gid=0
Infer.NET, Microsoft Research

Infer.NET is a framework for running Bayesian inference in graphical models. It can
also be used for probabilistic programming as shown in this video.
You can use Infer.NET to solve many different kinds of machine learning problems,
from standard problems like classification or clustering through to customised
solutions to domain-specific problems. Infer.NET has been used in a wide variety of
domains including information retrieval, bioinformatics, epidemiology, vision, and
many others.
A new feature in Infer.NET 2.5 is Fun, a library that turns the simple, succinct syntax of
F# into a probabilistic modeling language for Bayesian machine learning. You can
run your models with F# to compute synthetic data, and you can compile your
models with the Infer.NET compiler for efficient inference. See the Infer.NET Fun
website for additional information.
http://research.microsoft.com/en-us/um/cambridge/projects/infernet/default.aspx
F# Software Foundation
F# is ideally suited to machine learning because of its efficient execution, succinct
style, data access capabilities and scalability. F# has been successfully used by some
of the most advanced machine learning teams in the world, including several groups
at Microsoft Research.
Try F# has some introductory machine learning algorithms. Further resources
related to different aspects of machine learning are below.
See also the Math and Statistics and Data Science sections for related material.
http://fsharp.org/machine-learning/
BigML
Now Free
Unlimited tasks (up to 16MB/Task)
https://bigml.com/
BRML Toolbox in MATLAB, David Barber, University College London
http://web4.cs.ucl.ac.uk/staff/D.Barber/pmwiki/pmwiki.php?n=Brml.Software
Dmitry Efimov Software

http://mech.math.msu.su/~efimov/indexe.php
SCILAB
Scilab is free and open source software for numerical computation providing a
powerful computing environment for engineering and scientific applications.
Scilab includes hundreds of mathematical functions. It has a high level programming
language allowing access to advanced data structures, 2-D and 3-D graphical
functions.
http://www.scilab.org/en/scilab/about
OverFeat and Torch7, CILVR Lab @ NYU
OverFeat is an image recognizer and feature extractor built around a convolutional
network.
The OverFeat convolutional net was trained on the ImageNet 1K dataset. It
participated in the ImageNet Large Scale Recognition Challenge 2013 under the
name OverFeat NYU.
This release provides C/C++ code to run the network and output class probabilities
or feature vectors. It also includes a webcam-based demo.
Torch7 is an interactive development environment for machine learning and
computer vision. It is an extension of the Lua language with a multidimensional
numerical array library.
Lua is a very simple, compact and efficient interpreter/compiler with a
straightforward syntax. It is used widely as a scripting language in the computer
game industry. Torch extends Lua with an extensive numerical library and various
facilities for machine learning and computer vision.
Torch has computational back-ends for multicore/multi-CPU machines (using
Intel/AVX and OpenMP), NVidia GPUs (using CUDA), and ARM CPUs (using the Neon
instruction set).
Many research projects at the CILVR Lab are built with Torch.
http://cilvr.nyu.edu/doku.php?id=code:start

Mloss.org
Our goal is to support a community creating a comprehensive open source machine
learning environment. Ultimately, open source machine learning software should be
able to compete with existing commercial closed source solutions. To this end, it is
not enough to bring existing and freshly developed toolboxes and algorithmic
implementations to people's attention. More importantly the MLOSS platform will
facilitate collaborations with the goal of creating a set of tools that work with one
another. Far from requiring integration into a single package, we believe that this
kind of interoperability can also be achieved in a collaborative manner, which is
especially suited to open source software development practices.
https://mloss.org/software/view/501/

Sourceforge
Find, Create, and Publish Open Source Software for free
http://sourceforge.net/directory/os:mac/freshness:recently-updated/?q=machine%20learning

Freecode
Freecode maintains the Web's largest index of Linux, Unix and cross-platform
software, and mobile applications. Thousands of applications, which are preferably
released under an open source license, are meticulously cataloged in the Freecode
database, and links to new applications are added daily. Each entry provides a
description of the software, links to download it and to obtain more information,
and a history of the project's releases, so readers can keep up-to-date on the latest
developments.
Freecode is the first stop for Linux users hunting for the software they need for
work or play. It is continuously updated with the latest developments from the
"release early, release often" community. In addition to providing news on new
releases, Freecode offers a variety of original content on technical, political, and
social aspects of software and programming, written by both Freecode readers and
Free Software luminaries. The comment board attached to each page serves as a
home for spirited discussion, bug reports, and technical support. An essential
resource for serious developers, Freecode makes it possible to keep up on who's
doing what, and what everyone else thinks of it.
http://freecode.com/search?q=machine+learning&submit=Search
Open Machine Learning Workshop organized by Alekh Agarwal, Alina Beygelzimer,
and John Langford, August 2014
The goal of this workshop is to inform people about open source machine learning
systems being developed, aid the coordination of such projects, and discuss future
plans.
http://hunch.net/~nyoml/
Maxim Milakov Software
I am a researcher in machine learning and high-performance computing.
I designed and implemented nnForge - a library for training convolutional and fully
connected neural networks, with CPU and GPU (CUDA) backends.
You will find my thoughts on convolutional neural networks and the results of
applying convolutional ANNs for various classification tasks in the Blog.
http://www.milakov.org

Alfonso Nieto-Castanon Software
http://www.alfnie.com/software

Lib Skylark
Sketching-based Matrix Computations for Machine Learning is a library for
matrix computations suitable for general statistical data analysis and optimization
applications.
Many tasks in machine learning and statistics ultimately end up being problems
involving matrices: whether you're finding the key players in the bitcoin market, or
inferring where tweets came from, or figuring out what's in sewage, you'll want to
have a toolkit for least-squares and robust regression, eigenvector analysis,
non-negative matrix factorization, and other matrix computations.
Sketching is a way to compress matrices that preserves key matrix properties; it can
be used to speed up many matrix computations. Sketching takes a given matrix A
and produces a sketch matrix B that has fewer rows and/or columns than A. For a
good sketch B, if we solve a problem with input B, the solution will also be pretty
good for input A. For some problems, sketches can also be used to get faster ways to
find high-precision solutions to the original problem. In other cases, sketches can be
used to summarize the data by identifying the most important rows or columns.
A simple example of sketching is just sampling the rows (and/or columns) of the
matrix, where each row (and/or column) is equally likely to be sampled. This
uniform sampling is quick and easy, but doesn't always yield good sketches;
however, there are sophisticated sampling methods that do yield good sketches.
http://xdata-skylark.github.io
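
The uniform row sampling described above can be illustrated with a minimal sketch. It does not use the Skylark library or its API; the function names (`uniform_row_sketch`, `fit_line`, `make_data`) and the synthetic data are assumptions made for illustration. Each row `(x, y)` plays the role of one row of the matrix, and the least-squares line fitted on the sampled rows is compared with the one fitted on all rows.

```python
import random


def uniform_row_sketch(rows, k, seed=0):
    """Return k rows sampled uniformly at random, without replacement."""
    rng = random.Random(seed)
    return rng.sample(rows, k)


def fit_line(rows):
    """Ordinary least squares for y = slope * x + intercept over (x, y) rows."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def make_data(n=1000, seed=1):
    """Hypothetical data: points near y = 2x + 1 with small bounded noise."""
    rng = random.Random(seed)
    data = []
    for i in range(n):
        x = 10.0 * i / n
        data.append((x, 2.0 * x + 1.0 + rng.uniform(-0.1, 0.1)))
    return data


if __name__ == "__main__":
    data = make_data()
    sketch = uniform_row_sketch(data, 100)  # keep 100 of 1000 rows
    print("fit on all rows:   ", fit_line(data))
    print("fit on sampled rows:", fit_line(sketch))
```

On well-behaved data like this, the fit on the sampled rows stays close to the fit on all rows. As the description notes, uniform sampling does not always give a good sketch; data with a few highly influential rows is the typical failure case.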
Mutual Information Text Explorer
The Mutual Information Text Explorer is a tool that allows interactive exploration
of text data and document covariates. See the paper or slides for information.
Currently, an experimental system is available.
http://brenocon.com/MiTextExplorer/
Data Science Resources by Jonathan Bower on GitHub
Added in the kit 27-Oct-2014
This repo is intended to provide open source resources to facilitate learning or to
point practicing/aspiring data scientists in the right direction. It also exists so that I
can keep track of resources that are/were helpful to me and hopefully for you.
I aim to cover the full spectrum of data science and to hopefully include topics of
data science that aren't either actively covered or easy to find in the open-source
world. For instance, I haven't focused on in-depth machine learning theory since
that is well covered. If you are looking for ML theory I would look to some of the
online courses, books or bootcamps. There is a lot of theory information available
online, some is linked lower on this page here, here and other info is available with
many purchasable books.
Keep in mind that this is a constant work in progress. If you have anything to add,
any feedback, or would like to be a contributor - please reach out. If there are any
mistakes or typos, be patient with me, but please let me know.
Lastly, I would add that a large portion of data science is exploratory data analysis
and properly cleaning your data to implement the tools and theory necessary to
solve the problem at hand. For each problem there are many different ways and
tools to execute a successful solution - if one method isn't working, re-evaluate,
re-work the problem, try another approach and/or reach out to the community for
support. Good luck and I hope this repo is helpful!
https://github.com/jonathan-bower/DataScienceResources

Joseph Misiti's Blog
A curated list of awesome machine learning frameworks, libraries and software (by
language). Inspired by awesome-php. Other awesome lists can be found in the
awesome-awesomeness list.
https://github.com/josephmisiti/awesome-machine-learning
Michael Waskom GitHub repositories
I'm a Ph.D. student in the Department of Psychology at Stanford University, where I
work with Anthony Wagner. I use behavioral, computational, and neuroimaging
methods to study cognitive control and decision making in humans.
Previously, I spent time in John Gabrieli's lab at MIT investigating whether cognition
can be improved through training. I did my undergrad at Amherst College, where I
studied philosophy and neuroscience.
Complementing this research, I have developed a set of software libraries for
statistical analysis and visualization. These libraries aim to make
computationally-based research more reproducible and improve the visual presentation
of statistical and neuroimaging results.
https://github.com/mwaskom
Visualizing distributions of data
This notebook demonstrates different approaches to graphically representing
distributions of data, specifically focusing on the tools provided by the seaborn
package.
http://nbviewer.ipython.org/github/mwaskom/seaborn/blob/master/examples/plotting_distributions.ipynb

Exploring Seaborn and Pandas based plot types in HoloViews by Philipp John Frederic
Rudiger
In this notebook we'll look at interfacing between the composability and ability to
generate complex visualizations that HoloViews provides and the great looking
plots incorporated in the seaborn library. Along the way we'll explore how to wrap
different types of data in a number of Seaborn View types, including:
- Distribution Views
- Bivariate Views
- TimeSeries Views
Additionally we explore how a Pandas dframe can be wrapped in a general purpose
View type, which can either be used to convert the data into standard View types or
be visualized directly using a wide array of plotting options, including:
- Regression plots, correlation plots, box plots, autocorrelation plots, scatter
matrices, histograms or regular scatter or line plots.
http://philippjfr.com/blog/seabornviews/
Open Source Hong Kong
Open Source Hong Kong (OSHK) is an open source organization in Hong Kong
which aims to promote open source and technology development.
http://opensource.hk/en/event

Lamda Group, Nanjing University
Open Source Software
http://lamda.nju.edu.cn/Data.ashx#code

Big Data/Cloud Computing English

Apache SPARK
Apache Spark Machine Learning Library
MLlib is a Spark implementation of some common machine learning (ML)
functionality, as well as associated tests and data generators. MLlib currently supports
four common types of machine learning problem settings, namely, binary
classification, regression, clustering and collaborative filtering, as well as an
underlying gradient descent optimization primitive.
http://spark.apache.org/docs/0.9.1/mllib-guide.html
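
To make the idea of a gradient descent optimization primitive concrete, here is a minimal single-machine sketch of batch gradient descent applied to binary classification (logistic regression). It is plain Python, not MLlib code and not the MLlib API; the function names, step size and iteration count are illustrative assumptions.

```python
import math


def sigmoid(z):
    """Logistic function mapping a score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(points, labels, step=0.5, iterations=200):
    """Batch gradient descent on the mean logistic loss.

    points: list of feature lists; labels: list of 0/1 values.
    """
    dim = len(points[0])
    n = len(points)
    weights = [0.0] * dim
    bias = 0.0
    for _ in range(iterations):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for x, y in zip(points, labels):
            score = bias + sum(w * xi for w, xi in zip(weights, x))
            error = sigmoid(score) - y  # derivative of the loss w.r.t. the score
            for j in range(dim):
                grad_w[j] += error * x[j]
            grad_b += error
        # Move against the averaged gradient.
        weights = [w - step * g / n for w, g in zip(weights, grad_w)]
        bias -= step * grad_b / n
    return weights, bias


def predict(weights, bias, x):
    """Return class 1 if the predicted probability is at least 0.5, else 0."""
    score = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1 if sigmoid(score) >= 0.5 else 0


if __name__ == "__main__":
    xs = [[-2.0], [-1.5], [-1.0], [1.0], [1.5], [2.5]]
    ys = [0, 0, 0, 1, 1, 1]
    w, b = train_logistic(xs, ys)
    print([predict(w, b, x) for x in xs])
```

In a distributed setting the inner loop over examples is the part that would be split across workers, with the partial gradients summed before each update; the sketch above keeps everything in one process for clarity.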
2013 Spark Summit exercises
Welcome to the Spark Summit hands-on exercises. These exercises are adapted
from similar exercises that were prepared for and run at AMP Camp Big Data
Bootcamps. They were written by volunteer graduate students and postdocs in the
UC Berkeley AMPLab. Many of those same graduate students are also volunteers
here on the Spark Summit Training day team. The exercises we cover today
will have you working directly with the Spark-specific components of the AMPLab's
open-source software stack, called the Berkeley Data Analytics Stack (BDAS).
http://spark-summit.org/2013/exercises/index.html

2014 Spark Summit Training

Course Prerequisites:
Laptop with WiFi capabilities
Java 6 or 7
TRACK A: Introduction to Apache Spark Workshop
INTRO EXERCISES
The Introduction to Apache Spark workshop is for users to learn the core Spark
APIs. This session features hands-on technical exercises to get developers up to
speed in using Spark for data exploration, analysis, and building big data
applications.
The integrated lecture and lab format covers the following topics:
Overview of Big Data and Spark
Installing Spark Locally
Using Spark's Core APIs in Scala, Java, & Python
Building Spark Applications
Deploying on a Big Data Cluster
Building Applications for Multiple Platforms
TRACK B: Advanced Apache Spark Workshop

ADVANCED EXERCISES
The Advanced Apache Spark Workshop will cover advanced topics on architecture,
tuning, and each of Spark's high-level libraries (including the latest features).
Attendees will have the opportunity after the lunch break to work through labs on
each of the libraries.
Some familiarity with Spark or MapReduce is expected, as this workshop will not
cover basic Spark programming.
Topics covered include:

- Advanced Spark Internals and Tuning: Reynold Xin (slides)
- Spark SQL: Michael Armbrust (slides)
- Spark Streaming: Tathagata Das (slides)
- MLlib: Ameet Talwalkar (slides)
- GraphX: Ankur Dave (slides)
Apache Spark Summit Videos
Videos related to the Apache Spark cluster computing engine.
https://www.youtube.com/user/TheApacheSpark/playlists

Databricks Videos
Databricks was founded out of the UC Berkeley AMPLab by the creators of Apache
Spark. We've been working for the past six years on cutting-edge systems to extract
value from Big Data. We believe that Big Data is a huge opportunity that is still
largely untapped, and we're working to revolutionize what you can do with it.
Open Source Commitment
Apache Spark is 100% open source, and at Databricks we are fully committed to
maintaining this model. We believe that no computing platform will win in the Big
Data space unless it is fully open source.
Spark has one of the largest open source communities in Big Data, with over
200 contributors from 50+ organizations. Databricks works closely with the
community to maintain this momentum.
https://www.youtube.com/channel/UC3q8O3Bh2Le8Rj1-Q-_UUbA/videos

Apache MAHOUT
Apache Mahout ML library
The Apache Mahout project's goal is to build a scalable machine learning library.
Currently Mahout supports mainly three use cases: Recommendation mining takes
users' behavior and from that tries to find items users might like. Clustering takes
e.g. text documents and groups them into groups of topically related documents.
Classification learns from existing categorized documents what documents of a
specific category look like and is able to assign unlabelled documents to the
(hopefully) correct category.
https://mahout.apache.org

Apache Mahout on Javaworld
Enjoy machine learning with Mahout on Hadoop, 2014
Mahout brings the power of scalable processing to Hadoop's huge data sets
http://www.javaworld.com/article/2241046/big-data/enjoy-machine-learning-with-mahout-on-hadoop.html
Know this right now about Hadoop, 2014

From core elements like HDFS and YARN to ancillary tools like Zookeeper, Flume,
and Sqoop, here's your cheat sheet and cartography of the ever expanding Hadoop
ecosystem.
http://www.javaworld.com/article/2158789/data-storage/know-this-right-now-about-hadoop.html
MapReduce programming with Apache Hadoop, 2008

Process massive data sets in parallel on large clusters
http://www.javaworld.com/article/2077907/open-source-tools/mapreduce-programming-with-apache-hadoop.html
Deeplearning4j
Deeplearning4j is the first commercial-grade deep learning library written in Java. It
is meant to be used in business environments, rather than as a research tool for
extensive data exploration. Deeplearning4j is most helpful in solving distinct
problems, like identifying faces, voices, spam or e-commerce fraud.
Deeplearning4j aims to be cutting-edge plug and play, more convention than
configuration. By following its conventions, you get an infinitely scalable deep-
learning architecture. The framework has a domain-specific language (DSL) for
neural networks, to turn their multiple knobs.
Deeplearning4j includes a distributed deep-learning framework and a normal
deep-learning framework; i.e. it runs on a single thread as well. Training takes place
in the cluster, which means it can process massive amounts of data. Nets are trained
in parallel via iterative reduce.
The distributed framework is made for data input and neural net training at scale,
and its output should be highly accurate predictive models.
By following the links at the bottom of each page, you will learn to set up, and train
with sample data, several types of deep-learning networks. These include single-
and multithread networks, Restricted Boltzmann machines, deep-belief networks
and Stacked Denoising Autoencoders.
For a quick introduction to neural nets, please see our overview.
http://deeplearning4j.org/

Udacity opencourseware "Intro to Hadoop and MapReduce"
Course Summary
The Apache Hadoop project develops open-source software for reliable, scalable,
distributed computing. Learn the fundamental principles behind it, and how you can
use its power to make sense of your Big Data.
How Hadoop fits into the world (recognize the problems it solves)
Understand the concepts of HDFS and MapReduce (find out how it solves the
problems)
Write MapReduce programs (see how we solve the problems)
Practice solving problems on your own
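
As an illustration of what a MapReduce program looks like, the following sketch runs the classic word count in a single Python process. It is not Hadoop code and not taken from the course; the function names are hypothetical, and the `shuffle` step stands in for the grouping that a real framework performs between the map and reduce phases.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle and reduce over a list of text records."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data", "Big clusters"]))
```

Because each call to `map_phase` depends only on its own record and each call to `reduce_phase` only on one key, both phases can be spread over many machines, which is the property the framework exploits.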

Storm Apache
Apache Storm is a free and open source distributed realtime computation system.
Storm makes it easy to reliably process unbounded streams of data, doing for
realtime processing what Hadoop did for batch processing. Storm is simple, can be
used with any programming language, and is a lot of fun to use!
http://storm.incubator.apache.org
http://storm.incubator.apache.org/documentation/Tutorial.html

Michael Vogiatzis Blog
How to spot first stories on Twitter using Storm
As a first blog post, I decided to describe a way to detect first stories (a.k.a. new
events) on Twitter as they happen. This work is part of the thesis I wrote last year
for my MSc in Computer Science at the University of Edinburgh. You can find the
document here.
http://micvog.com/2013/09/08/storm-first-story-detection/


Elasticsearch
Elasticsearch is a flexible and powerful open source, distributed, real-time search
and analytics engine. Architected from the ground up for use in distributed
environments where reliability and scalability are must haves, Elasticsearch gives
you the ability to move easily beyond simple full-text search. Through its robust set
of APIs and query DSLs, plus clients for the most popular programming languages,
Elasticsearch delivers on the near limitless promises of search technology.
http://www.elasticsearch.org

Prediction IO
BUILD SMARTER SOFTWARE with Machine Learning
PredictionIO is an open source machine learning server for software developers to
create predictive features, such as personalization, recommendation and content
discovery.
http://prediction.io
https://hacks.mozilla.org/2014/04/introducing-predictionio/
http://www.youtube.com/channel/UCN0jVSCIEh7eeuWXIuo316g

Container Cluster Manager
Kubernetes builds on top of Docker to construct a clustered container scheduling
service. The goals of the project are to enable users to ask a Kubernetes cluster to
run a set of containers. The system will automatically pick a worker node to run
those containers on.
As container based applications and systems get larger, some tools are provided to
facilitate sanity. This includes ways for containers to find and communicate with
each other and ways to work with and manage sets of containers that do similar
work.
When looking at the architecture of the system, we'll break it down to services that
run on the worker node and services that play a "master" role.
https://github.com/GoogleCloudPlatform/kubernetes?utm_source

Domino Data Labs
Domino is a platform for modern data scientists using Python, R, Matlab, and more.
Use our cloud-hosted infrastructure to securely run your code on powerful
hardware with a single command without any changes to your code.
If you have your own infrastructure, our Enterprise offering provides powerful,
easy-to-use cluster management functionality behind your firewall.
Special offer for The Machine Learning Salon's readers:
Machine Learning Salon readers can get $50 worth of compute credits when they
sign up for Domino. Domino lets you run your analyses on powerful cloud hardware
in one step without any setup or changes to your code. Sign up here, or email
support@dominoup.zendesk.com and tell them you are a Machine Learning Salon
reader.
http://www.dominoup.com
Data Science Central
Data Science Central is the industry's online resource for big data practitioners.
From Analytics to Data Integration to Visualization, Data Science Central provides a
community experience that includes a robust editorial platform, social interaction,
forum-based technical support, the latest in technology, tools and trends and
industry job opportunities.
http://www.datasciencecentral.com
Amazon Web Services Videos
https://www.youtube.com/user/AmazonWebServices/playlists
Google Cloud Computing Videos
https://developers.google.com/cloud/videos
VLAB: Deep Learning: Intelligence from Big Data, Stanford Graduate School of Business
http://www.youtube.com/watch?v=czLI3oLDe8M&spfreload=10

Machine Learning and Big Data in Cyber Security Eyal Kolman Technion Lecture
http://www.youtube.com/watch?v=G2BydTwrrJk&spfreload=10

Chaire Machine Learning Big Data, Telecom Paris Tech (Videos in French)
Télécom ParisTech held the first meeting of the Machine Learning for Big Data
research chair on 26 November 2014, with its partners Fondation Télécom, Criteo,
PSA Peugeot Citroën and Safran.
http://www.dailymotion.com/video/x2cti71_chaire-ml-big-data-premieres-rencontres_school
https://www.youtube.com/user/TelecomParisTech1/search?query=big+data

An Architecture for Fast and General Data Processing on Large Clusters by Matei
Zaharia, 2014
The past few years have seen a major change in computing systems, as growing data
volumes and stalling processor speeds require more and more applications to scale
out to distributed systems. Today, a myriad data sources, from the Internet to
business operations to scientific instruments, produce large and valuable data
streams. However, the processing capabilities of single machines have not kept up
with the size of data, making it harder and harder to put to use. As a result, a
growing number of organizations (not just web companies, but traditional enterprises
and research labs) need to scale out their most important computations to clusters
of hundreds of machines.

At the same time, the speed and sophistication required of data processing have
grown. In addition to simple queries, complex algorithms like machine learning and
graph analysis are becoming common in many domains. And in addition to batch
processing, streaming analysis of new real-time data sources is required to let
organizations take timely action. Future computing platforms will need to not only
scale out traditional workloads, but support these new applications as well.
This dissertation proposes an architecture for cluster computing systems that can
tackle emerging data processing workloads while coping with larger and larger
scales. Whereas early cluster computing systems, like MapReduce, handled batch
processing, our architecture also enables streaming and interactive queries, while
keeping the scalability and fault tolerance of previous systems. And whereas most
deployed systems only support simple one-pass computations (e.g., aggregation or
SQL queries), ours also extends to the multi-pass algorithms required for more
complex analytics (e.g., iterative algorithms for machine learning). Finally, unlike
the specialized systems proposed for some of these workloads, our architecture
allows these computations to be combined, enabling rich new applications that
intermix, for example, streaming and batch processing, or SQL and complex
analytics.
We achieve these results through a simple extension to MapReduce that adds
primitives for data sharing, called Resilient Distributed Datasets (RDDs). We show
that this is enough to efficiently capture a wide range of workloads. We implement
RDDs in the open source Spark system, which we evaluate using both synthetic
benchmarks and real user applications. Spark matches or exceeds the performance
of specialized systems in many application domains, while offering stronger fault
tolerance guarantees and allowing these workloads to be combined. We explore the
generality of RDDs from both a theoretical modeling perspective and a practical
perspective to see why this extension can capture a wide range of previously
disparate workloads.
http://www.eecs.berkeley.edu/Pubs/TechRpts/2014/EECS-2014-12.pdf
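
The abstract describes RDDs as primitives for data sharing with fault tolerance. In the Spark design, a dataset records the transformations used to build it (its lineage), so lost data can be recomputed rather than replicated. The toy class below sketches that idea in one process. `ToyDataset` and its methods are hypothetical names for illustration only; this is not Spark's API or implementation.

```python
class ToyDataset:
    """Toy stand-in for an RDD: it remembers how to rebuild itself."""

    def __init__(self, source, lineage=()):
        self._source = source            # function returning the base records
        self._lineage = tuple(lineage)   # recorded transformations, in order
        self._cache = None               # materialized result, if any

    def map(self, fn):
        """Record a map step; nothing is computed yet."""
        return ToyDataset(self._source, self._lineage + (("map", fn),))

    def filter(self, pred):
        """Record a filter step; nothing is computed yet."""
        return ToyDataset(self._source, self._lineage + (("filter", pred),))

    def collect(self):
        """Materialize the records, replaying the lineage if nothing is cached."""
        if self._cache is None:
            records = list(self._source())
            for kind, fn in self._lineage:
                if kind == "map":
                    records = [fn(r) for r in records]
                else:
                    records = [r for r in records if fn(r)]
            self._cache = records
        return list(self._cache)

    def lose_cache(self):
        """Simulate losing the in-memory copy, e.g. after a node failure."""
        self._cache = None


if __name__ == "__main__":
    evens = ToyDataset(lambda: range(6)).map(lambda x: x * x).filter(lambda x: x % 2 == 0)
    print(evens.collect())   # computed once and kept in memory
    evens.lose_cache()
    print(evens.collect())   # rebuilt from the recorded lineage
```

Keeping the result in memory is what lets multi-pass algorithms reuse data between iterations, and the recorded lineage is what allows recovery without storing extra copies.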
Predictive Modeling Competitions English

LinkedIn Economic Graph Challenge, Deadline: 15-12-2014, $25,000 research award
Added in the kit 25-oct-2014
Eligibility
The LinkedIn Economic Graph Challenge is open to all U.S. residents (including
citizens, permanent residents and visa holders) ages 18 and up. Team entries are
allowed, but entries on behalf of a company are not allowed. Teams may have up to
five individuals. Only one proposal (either team or individual) per person. Sorry,
current LinkedIn employees, contractors, affiliates or interns are not eligible to
enter.
Selection
Proposals are due by midnight Pacific Time on December 15, 2014. Each entry must
be an idea or process designed to create positive economic opportunity and impact
for members of the global workforce, so dream big. Proposals will be evaluated on a
combination of three factors, all equally weighted:
Novelty: Takes into account the thoughtfulness and originality of the entry,
including its unique approach to taking advantage of data from the Economic Graph.
Impact: Considers the potential benefits to the region, country and the world, as
well as the extensibility of the proposal.
Feasibility: This criterion will weigh the practicality of the submission, measuring
the likelihood it can be researched and implemented within a reasonable time
period and the types of data from LinkedIn that will be necessary for the proposed
research.
A diverse panel of judges will evaluate and select winning proposals.
Research Award Recipients
LinkedIn will select up to three proposals as winners of the LinkedIn Economic
Graph Challenge. Selected winners will be notified in early 2015. Each winning
submission will receive:
A one-time $25,000 (USD) research award.
Round-trip travel and accommodations to LinkedIn headquarters in Mountain View,
CA to participate in the LinkedIn Economic Challenge Research Reception (early
2015) and Final Presentation (Fall 2015).
The potential to receive research resources to execute the proposal, including a
LinkedIn employee collaborator, access to select data from LinkedIn, and equipment
for use during the six-month research period.
Research award recipients will have six months to conduct their research, and will
return to Mountain View, CA, for a final presentation in Fall 2015. Research award
recipients must sign agreements covering intellectual property and non-disclosure
of information, and may not publish results without written consent from LinkedIn
Corporation.
http://economicgraphchallenge.linkedin.com/details/

ChaLearn
Added in the kit before 24-Oct-2014
Mission:
Machine Learning is the science of building hardware or software that can achieve
tasks by learning from examples. The examples often come as {input, output} pairs.
Given new inputs a trained machine can make predictions of the unknown output.
Examples of machine learning tasks include:
automatic reading of handwriting
assisted medical diagnosis
automatic text classification (classification of web pages; spam filtering)
financial predictions
We organize challenges to stimulate research in this field. The web sites of past
challenges remain open for post-challenge submissions as ongoing benchmarks.
ChaLearn is a tax-exempt organization under section 501(c)(3) of the US IRS code.
DLN: 17053090370022.
http://www.chalearn.org
IMAGENET Large Scale Visual Recognition Challenge 2014 (closed)

Introduction
This challenge evaluates algorithms for object detection and image classification at
large scale. This year there will be two competitions:
A PASCAL-style detection challenge on fully labeled data for 200 categories of
objects, and
An image classification plus object localization challenge with 1000 categories.
NEW: This year all participants are encouraged to submit object localization results;
in past challenges, submissions to classification and classification with localization
tasks were accepted separately.
One high level motivation is to allow researchers to compare progress in detection
across a wider variety of objects -- taking advantage of the quite expensive labeling
effort. Another motivation is to measure the progress of computer vision for large
scale image indexing for retrieval and annotation.
History
ILSVRC 2013
ILSVRC 2012
ILSVRC 2011
ILSVRC 2010
http://image-net.org/challenges/LSVRC/2014/
Kaggle
Kaggle is the world's largest community of data scientists. They compete with each
other to solve complex data science problems, and the top competitors are invited to
work on the most interesting and sensitive business problems from some of the
world's biggest companies through Masters competitions.
http://www.kaggle.com/competitions

Kaggle Competition Past Solutions
We learn more from code, and from great code. It need not always be the
first-ranked solution, because we also learn what separates a stellar solution from a
merely good one. I will post solutions I come across so we can all learn to become better!
I collected the following source code and interesting discussions from
Kaggle competitions for learning purposes. Not all competitions are listed because I
am collecting them manually, and some are missing because no one shared a
solution. I will add more as time goes by. Thank you.
http://www.chioka.in/kaggle-competition-solutions/


Kaggle Connectomics Winning Solution Research Article

Added in the kit before 24-oct-2014
Simple connectome inference from partial correlation statistics in calcium imaging
http://arxiv.org/abs/1406.7865
Solution to the Galaxy Zoo Challenge
http://benanne.github.io/2014/04/05/galaxy-zoo.html
https://github.com/benanne/kaggle-galaxies
Winning 2 Kaggle in class competitions on spam
http://mlwave.com/winning-2-kaggle-in-class-competitions-on-spam/
Matlab Benchmark for Packing Santa's Sleigh translated in Python
http://beatingthebenchmark.blogspot.co.uk/search?updated-min=2013-01-01T00:00:00-08:00&updated-max=2014-01-01T00:00:00-08:00&max-results=4
TEDx San Francisco, Jeremy Howard talk (Connecting Devices with Algorithms)
http://tedxsf.org/videos/#tedxsf-connected-reality
CrowdANALYTIX
https://crowdanalytix.com/jq/solver.html
Challenges for governmental applications
https://challenge.gov
InnoCentive Challenge Center
https://www.innocentive.com/ar/challenge/browse
TunedIT
http://tunedit.org
Ants, AI Challenge, sponsored by Google, 2011
The AI Challenge is all about creating artificial intelligence, whether you are a
beginning programmer or an expert. Using one of the easy-to-use starter kits, you
will create a computer program (in any language) that controls a colony of ants
which fight against other colonies for domination.
http://ants.aichallenge.org
International Collegiate Programming Contest
The ACM International Collegiate Programming Contest (ICPC) is the premier
global programming competition conducted by and for the world's universities. The
competition operates under the auspices of ACM, is sponsored by IBM, and is
headquartered at Baylor University. Over nearly four decades, the ICPC has grown to
be a game-changing global competitive educational program that has raised the
aspirations and performance of generations of the world's problem solvers in the
computing sciences and engineering.
http://icpc.baylor.edu/welcome.icpc

Dream challenges
The Dialogue on Reverse Engineering Assessment and Methods (DREAM) project is
an initiative to advance the field of systems biology through the organization of
Challenges to foster the development of predictive models that allow scientists to
better understand human disease. Challenges engage broad and diverse
communities of scientists to competitively solve a specific problem in a given time
period. The concept fosters collaboration between scientists through shared data
and approaches.
DREAM has developed the Challenge concept by launching 27 successful challenges
over the past seven years. Sage Bionetworks and DREAM merged in early 2013 in
order to develop Challenges that engage broader participation of the research
community in open science projects hosted on Synapse, and that provide a
meaningful impact to both discovery and clinical research. By presenting the
research community with well-formulated questions that usually involve complex
data, we effectively enable the sharing and improvement of predictive models,
accelerating many-fold the transformation of this data into useful scientific
knowledge. Our ultimate goal is to foster collaborations of like-minded researchers
that together will find the solution for vexing problems that matter most to citizens
and patients.
https://www.synapse.org/#!Wiki:syn1929437/ENTITY

Texata
Welcome to the Official 2014 TEXATA Big Data Analytics World Championships.
This global event is a fun, innovative and challenging competition for students and
professionals to develop and test their Big Data Analytics skills against their friends,
colleagues and top data experts from around the world. TEXATA 2014 is a World
Championship Event independently organized and administered by the Professional
Services Champions League (PSCL).
http://www.texata.com

Cisco Internet of Things Innovation Grand Challenge
The focus of the Internet of Things (IoT) Innovation Grand Challenge is to spearhead
an industry-wide initiative to accelerate the adoption of breakthrough technologies
and products that will contribute to the growth and evolution of the Internet of
Things.
This global open competition aims to recognize, promote and reward innovators,
entrepreneurs and early-stage startup businesses that can help us transform
businesses and industries by re-inventing business processes, operational
efficiencies and customer service innovations.
We are seeking submissions from early-stage businesses and teams that have
technology-based prototypes and proofs of concept (PoC) in development.
https://iotchallenge.cisco.spigit.com/Page/Home
Predictive Modeling Competitions - Spanish

Coming soon

Predictive Modeling Competitions - German

Coming soon
Predictive Modeling Competitions - Italian

Coming soon

Predictive Modeling Competitions French

RATP OpenDataLab results

http://data.ratp.fr/fr/actualites.html

Coming soon

Predictive Modeling Competitions - Russian


Competition Avito.ru-2014: Recognition of contact information in images

The Avito.ru contest on recognizing contact information in images is a competition
on applied problems in image analysis, held with the informational support of the
10th International Conference "Intellectualization of Information Processing 2014"
(IOI 2014), Crete, Greece, 4-11 October 2014.
The competition is organized by Avito.ru and its partner Forecsys.
Registered users of the MachineLearning.ru portal can put questions to the
organizers on the competition's discussion page. Questions can also be sent by e-mail
to competition.avito.2014@forecsys.ru with "Question" in the subject line.
Information about the organizer of the contest, its rules, the number of awards,
and when, where and how they are awarded can be found here.
Preliminary rating of participants.
Key dates of the competition
- October 1, 2014: start of the contest
- Until 23:59, November 4: registration of participants
- Until 23:59, November 13: training and collection of participants' algorithms
- November 14: release of control sample C and of the answers for sample B
- Until 23:59, November 18: collection of the algorithms' results on control sample C
- November 19 - December 10: selection of winners, checking the reproducibility of
results, and publication of the winners' presentations on the contest page
http://www.machinelearning.ru/wiki/index.php?title=%D0%9A%D0%BE%D0%BD%D0%BA%D1%83%D1%80%D1%81_Avito.ru-2014:_%D1%80%D0%B0%D1%81%D0%BF%D0%BE%D0%B7%D0%BD%D0%B0%D0%B2%D0%B0%D0%BD%D0%B8%D0%B5_%D0%BA%D0%BE%D0%BD%D1%82%D0%B0%D0%BA%D1%82%D0%BD%D0%BE%D0%B9_%D0%B8%D0%BD%D1%84%D0%BE%D1%80%D0%BC%D0%B0%D1%86%D0%B8%D0%B8_%D0%BD%D0%B0_%D0%B8%D0%B7%D0%BE%D0%B1%D1%80%D0%B0%D0%B6%D0%B5%D0%BD%D0%B8%D1%8F%D1%85
Russian AI Cup - Artificial Intelligence Programming Competition, 2013
An open artificial intelligence programming competition. Try your hand at
programming a game strategy! It's simple, clear and fun!
The second Russian AI Cup championship is called CodeTroopers. You have to program
the AI for a squad of soldiers. Strategies battle each other in the
Sandbox and in the championship. You can use any of these programming languages:
C++, Java, C#, Python or Pascal. The Sandbox is already open. Good luck!
Both novice programmers (school and university students) and professionals are
invited to participate. No special knowledge is required; basic programming
skills are enough.
http://russianaicup.ru/


Predictive Modeling Competitions - Portuguese

Coming soon
Open Dataset - English

The Text REtrieval Conference (TREC) Datasets
The Text REtrieval Conference (TREC), co-sponsored by the National Institute of
Standards and Technology (NIST) and U.S. Department of Defense, was started in
1992 as part of the TIPSTER Text program. Its purpose was to support research
within the information retrieval community by providing the infrastructure
necessary for large-scale evaluation of text retrieval methodologies. In particular,
the TREC workshop series has the following goals:
to encourage research in information retrieval based on large test
collections;
to increase communication among industry, academia, and government by
creating an open forum for the exchange of research ideas;
to speed the transfer of technology from research labs into commercial
products by demonstrating substantial improvements in retrieval
methodologies on real-world problems; and
to increase the availability of appropriate evaluation techniques for use by
industry and academia, including development of new evaluation techniques
more applicable to current systems.
TREC is overseen by a program committee consisting of representatives from
government, industry, and academia. For each TREC, NIST provides a test set of
documents and questions. Participants run their own retrieval systems on the data,
and return to NIST a list of the retrieved top-ranked documents. NIST pools the
individual results, judges the retrieved documents for correctness, and evaluates the
results. The TREC cycle ends with a workshop that is a forum for participants to
share their experiences.
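
The pooling and scoring steps described above can be sketched in a few lines of Python. This is only an illustration, not NIST's actual procedure or software: the run format (one ranked list of document IDs per system), the pool depth, the function names and the sample judgments are all assumptions made for the example.

```python
def build_pool(runs, depth):
    """Union of the top-`depth` documents from every system's ranked list."""
    pool = set()
    for ranked_docs in runs.values():
        pool.update(ranked_docs[:depth])
    return pool


def precision_at_k(ranked_docs, relevant, k):
    """Fraction of the top-k retrieved documents that were judged relevant."""
    top_k = ranked_docs[:k]
    if not top_k:
        return 0.0
    return sum(1 for doc in top_k if doc in relevant) / len(top_k)


# Hypothetical runs from two systems for a single topic.
runs = {
    "system_a": ["d1", "d2", "d3", "d4"],
    "system_b": ["d3", "d5", "d1", "d6"],
}
pool = build_pool(runs, depth=2)   # documents the assessors would judge
relevant = {"d1", "d5"}            # hypothetical judgments over the pool
scores = {name: precision_at_k(docs, relevant, k=2) for name, docs in runs.items()}
```

Only pooled documents are judged, so a document that no system ranks within the pool depth is treated as not relevant in this sketch.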
This evaluation effort has grown in both the number of participating systems and
the number of tasks each year. Ninety-three groups representing 22 countries
participated in TREC 2003. The TREC test collections and evaluation software are
available to the retrieval research community at large, so organizations can evaluate
their own retrieval systems at any time. TREC has successfully met its dual goals of
improving the state-of-the-art in information retrieval and of facilitating technology
transfer. Retrieval system effectiveness approximately doubled in the first six years
of TREC.
TREC has also sponsored the first large-scale evaluations of the retrieval of non-
English (Spanish and Chinese) documents, retrieval of recordings of speech, and
retrieval across multiple languages. TREC has also introduced evaluations for open-
domain question answering and content-based retrieval of digital video. The TREC
test collections are large enough so that they realistically model operational
settings. Most of today's commercial search engines include technology first
developed in TREC.
http://trec.nist.gov/data.html
HDX Humanitarian Data Exchange
What is HDX?
The goal of the Humanitarian Data Exchange (HDX) is to make humanitarian data
easy to find and use for analysis. We are working on three elements that will
eventually combine into an integrated data platform.
Repository
The HDX repository, where data providers can upload their raw data spreadsheets
for others to find and use.
Analytics
HDX analytics, a database of high-value data that can be compared across countries
and crises, with tools for analysis and visualisation.
Standards
Standards to help share humanitarian data through the use of a consensus
Humanitarian Exchange Language.
https://data.hdx.rwlabs.org/dataset
World Data Bank
Explore. Create. Share: Development Data
DataBank is an analysis and visualisation tool that contains collections of time series
data on a variety of topics. You can create your own queries; generate tables, charts,
and maps; and easily save, embed, and share them.
The World Bank Group has set two goals for the world to achieve by 2030:
End extreme poverty by decreasing the percentage of people living on less than
$1.25 a day to no more than 3%
Promote shared prosperity by fostering the income growth of the bottom 40%
for every country
The World Bank is a vital source of financial and technical assistance to developing
countries around the world. We are not a bank in the ordinary sense but a unique
partnership to reduce poverty and support development. The World Bank Group
comprises five institutions managed by their member countries.
Established in 1944, the World Bank Group is headquartered in Washington, D.C.
We have more than 10,000 employees in more than 120 offices worldwide.
http://databank.worldbank.org/data/home.aspx
US Dataset
The home of the U.S. Government's open data
Here you will find data, tools, and resources to conduct research, develop web and
mobile applications, design data visualizations, and more.
http://www.data.gov/
US City Open Data Census

http://us-city.census.okfn.org
Machine Learning repository
The UCI Machine Learning Repository is a collection of databases, domain theories,
and data generators that are used by the machine learning community for the
empirical analysis of machine learning algorithms. The archive was created as an ftp
archive in 1987 by David Aha and fellow graduate students at UC Irvine. Since that
time, it has been widely used by students, educators, and researchers all over the
world as a primary source of machine learning data sets. As an indication of the
impact of the archive, it has been cited over 1000 times, making it one of the top 100
most cited "papers" in all of computer science. The current version of the web site
was designed in 2007 by Arthur Asuncion and David Newman, and this project is in
collaboration with Rexa.info at the University of Massachusetts Amherst. Funding
support from the National Science Foundation is gratefully acknowledged.
https://archive.ics.uci.edu/ml/datasets.html
IMAGENET
ImageNet is an image database organized according to the WordNet hierarchy
(currently only the nouns), in which each node of the hierarchy is depicted by
hundreds and thousands of images. Currently we have an average of over five
hundred images per node. We hope ImageNet will become a useful resource for
researchers, educators, students and all of you who share our passion for pictures.
Who uses ImageNet?
We envision ImageNet as a useful resource to researchers in the academic world, as
well as educators around the world.
Does ImageNet own the images? Can I download the images?
No, ImageNet does not own the copyright of the images. ImageNet only provides
thumbnails and URLs of images, in a way similar to what image search engines do. In
other words, ImageNet compiles an accurate list of web images for each synset of
WordNet. For researchers and educators who wish to use the images for non-
commercial research and/or educational purposes, we can provide access through
our site under certain conditions and terms. For details click here
http://www.image-net.org
Stanford Large Network Dataset Collection
Social networks : online social networks, edges represent interactions between
people
Networks with ground-truth communities : ground-truth network communities in
social and information networks
Communication networks : email communication networks with edges representing
communication
Citation networks : nodes represent papers, edges represent citations
Collaboration networks : nodes represent scientists, edges represent collaborations
(co-authoring a paper)
Web graphs : nodes represent webpages and edges are hyperlinks
Amazon networks : nodes represent products and edges link commonly co-
purchased products
Internet networks : nodes represent computers and edges communication
Road networks : nodes represent intersections and edges roads connecting the
intersections
Autonomous systems : graphs of the internet
Signed networks : networks with positive and negative edges (friend/foe,
trust/distrust)
Location-based online social networks : Social networks with geographic check-ins
Wikipedia networks and metadata : Talk, editing and voting data from Wikipedia
Twitter and Memetracker : Memetracker phrases, links and 467 million Tweets
Online communities : Data from online communities such as Reddit and Flickr
Online reviews : Data from online review systems such as BeerAdvocate and
Amazon
Information cascades : ...
SNAP networks are also available from the UF Sparse Matrix collection, with
visualizations of SNAP networks by Tim Davis.
http://snap.stanford.edu/data/
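
Since every category above describes a graph in terms of nodes and edges, a small loader may help. The sketch below assumes a plain-text edge list with one "source target" pair per line and "#" comment lines; check the format of the specific dataset you download. The function name and the sample data are hypothetical.

```python
from collections import defaultdict


def parse_edge_list(text, directed=False):
    """Build an adjacency map from lines of 'source target' pairs."""
    adjacency = defaultdict(set)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):   # skip blanks and comment lines
            continue
        source, target = line.split()[:2]
        adjacency[source].add(target)
        if directed:
            adjacency.setdefault(target, set())  # keep sink nodes in the map
        else:
            adjacency[target].add(source)
    return dict(adjacency)


# Hypothetical three-edge collaboration network.
sample = """# FromNodeId ToNodeId
1 2
1 3
2 3
"""
graph = parse_edge_list(sample)
degrees = {node: len(neighbours) for node, neighbours in graph.items()}
```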

Deep Learning datasets
Deep Learning is a new area of Machine Learning research, which has been
introduced with the objective of moving Machine Learning closer to one of its
original goals: Artificial Intelligence.
This website is intended to host a variety of resources and pointers to information
about Deep Learning. In these pages you will find
a reading list,
links to software,
datasets,
a list of deep learning research groups and labs,
a list of announcements for deep learning related jobs (job listings),
as well as tutorials and cool demos.
http://deeplearning.net/datasets/
Open Government Data (OGD) Platform India
http://data.gov.in
Yahoo Datasets
We have various types of data available to share. They are categorized into Ratings,
Language, Graph, Advertising and Market Data, Computing Systems and an appendix
of other relevant data and resources available via the Yahoo! Developer Network.
http://webscope.sandbox.yahoo.com/catalog.php
Windows Azure Marketplace
One-Stop Shop for Premium Data and Applications
Hundreds of Apps, Thousands of Subscriptions, Trillions of Data Points
https://datamarket.azure.com/browse/data?price=free
Amazon Public Data Sets
Public Data Sets on AWS provides a centralized repository of public data sets that
can be seamlessly integrated into AWS cloud-based applications. AWS is hosting the
public data sets at no charge for the community, and like all AWS services, users pay
only for the compute and storage they use for their own applications. Learn more
about Public Data Sets on AWS and visit the Public Data Sets forum.
http://aws.amazon.com/datasets/
Wikipedia: Database Download
Wikipedia offers free copies of all available content to interested users. These
databases can be used for mirroring, personal use, informal backups, offline use or
database queries (such as for Wikipedia:Maintenance). All text content is multi-
licensed under the Creative Commons Attribution-ShareAlike 3.0 License (CC-BY-
SA) and the GNU Free Documentation License (GFDL). Images and other files are
available under different terms, as detailed on their description pages. For our
advice about complying with these licenses, see Wikipedia:Copyrights.
http://en.wikipedia.org/wiki/Wikipedia:Database_download
Gutenberg project (Free books available in different formats, useful for NLP)
Project Gutenberg offers 45,541 free ebooks to download (figure as of 5 June 2014).
http://www.gutenberg.org/ebooks/search/?sort_order=downloads
Freebase
Use Freebase data
Freebase data is free to use under an open license. You can:
Query Freebase using our Search, Topic, or MQL APIs
Download our weekly data dumps
http://www.freebase.com
Datamob Data
http://datamob.org/datasets
Reddit Datasets
http://www.reddit.com/r/datasets/
100+ Interesting Data Sets for Statistics
Summary: Looking for interesting data sets? Here's a list of more than 100 of the
best stuff, from dolphin relationships to political campaign donations to death row
prisoners.
http://rs.io/2014/05/29/list-of-data-sets.html
Data portal of the City of Chicago
https://data.cityofchicago.org/browse?limitTo=datasets&utf8=
Remark: because of a temporary problem, this link must be copied into your browser manually.
A gold mine of data sets, such as the names, salaries and position titles of all
employees of the City of Chicago!
https://data.cityofchicago.org/Administration-Finance/Current-Employee-Names-Salaries-and-Position-Title/xzkq-xp2w
Data portal of the City of Seattle
https://data.seattle.gov/browse
Data portal of the City of LA
https://data.lacity.org/browse?limitTo=datasets&utf8=
Remark: because of a temporary problem, this link must be copied into your browser manually.
California Department of Water Resources
DWR has many programs and data tools to collect and disseminate information on
water resources.
All Water Data Topics http://www.water.ca.gov/nav/nav.cfm?loc=t&id=106
CALIFORNIA DATA EXCHANGE CENTER (CDEC)
With the cooperation of over 140 other agencies, the CDEC provides real-time,
forecast, and historical hydrologic data. This data includes water discharge in rivers,
water storage in reservoirs, precipitation accumulation, and water content in snow
pack, primarily focused on flood management. However, the data is also helpful for
determining general water availability and natural supply trends.
More about CDEC http://cdec.water.ca.gov/
CALIFORNIA IRRIGATION MANAGEMENT INFORMATION SYSTEM (CIMIS)
CIMIS is a network of over 120 automated weather stations in California. CIMIS was
developed in 1982 by DWR and the University of California, Davis to assist
California's irrigators to manage their water resources efficiently.
More about CIMIS http://wwwcimis.water.ca.gov/cimis/welcome.jsp
WATER DATA LIBRARY
The library provides geographic-based data on water conditions.

More about the Water Data Library http://www.water.ca.gov/waterdatalibrary/
INTERAGENCY ECOLOGICAL PROGRAM
The Interagency Ecological Program (IEP) provides ecological information and
scientific leadership for use in management of the San Francisco Estuary.
More about IEP http://www.water.ca.gov/iep/
INTEGRATED WATER RESOURCES INFORMATION SYSTEM (IWRIS)
IWRIS is a one stop shop for state-wide water resources information. It integrates
multi-disciplinary data to support Integrated Regional Water Management.
More about IWRIS http://www.water.ca.gov/iwris/
http://www.water.ca.gov/data_home.cfm
Data portal of the City of Dallas
https://www.dallasopendata.com/browse
Data portal of the City of Austin
https://data.austintexas.gov
How to produce and use datasets: lessons learned, mlwave
http://mlwave.com/how-to-produce-and-use-datasets-lessons-learned/
MITx and HarvardX release MOOC datasets and visualization tools
http://newsoffice.mit.edu/2014/mitx-and-harvardx-release-mooc-datasets-and-vizualization-tools
Finding the perfect house using open data, Justin Palmer's Blog
http://dealloc.me/2014/05/24/opendata-house-hunting/
Synapse

A private or public workspace that allows you to aggregate,
describe, and share your research.

A tool to improve reproducibility of data intensive science, recording
progress as you work with tools such as R and Python.

A set of living research projects enabling contribution to large-scale
collaborative solutions to scientific problems.
https://www.synapse.org
NYC Taxi Trips Data from 2013
These data were made publicly available thanks to Chris Whong who did the heavy
lifting. He is also providing links to a bittorrent where the data can be downloaded
much faster. Read more about it here.
http://www.andresmh.com/nyctaxitrips/
Sebastian Raschka's Dataset Collections
https://github.com/rasbt/pattern_classification/blob/master/resources/dataset_collections.md

Awesome Public Datasets by Xiaming Chen, Shanghai, China

This list of public data sources is collected and tidied from blogs, answers, and
user responses. Most of the data sets listed below are free; however, some are not.
https://github.com/caesar0301/awesome-public-datasets
I am now a Ph.D. candidate with Prof. Yaohui Jin at Shanghai Jiao Tong University. I
received my B.S. (2010) in Optical Information Science and Technology from Xidian
University, Xi'an, China.
My research interests lie in the measurement and analysis of network traffic,
especially renewed models and characteristics of network traffic, using data
mining techniques and high-performance processing platforms such as Network
Processors and distributed processing systems such as Hadoop/MapReduce or Spark.
If you are interested in my articles, research, or projects, you can reach me via
email or other channels such as GitHub.
Enjoy! :-)
http://hsiamin.com/pages/about.html
UK Dataset
Opening up government
http://data.gov.uk/

LONDON DATASTORE - 591 datasets
Welcome to the new look DataStore

Over the last few months we have been busy updating London Datastore to deliver a
host of practical new features - improved (geography based) searches, dataset
previews and APIs all of which will make for a much sleeker experience. The
technical improvements are there to support our broader aim of kick-starting
collaboration so that the value of data in our city reaches its full potential.
Have a look around, read the introductory blog and let us know what you think.
http://data.london.gov.uk
Transport For London Open Data, UK

http://www.tfl.gov.uk/info-for/open-data-users/our-open-data
Gaussian Processes List of Datasets

Welcome to the web site for theory and applications of Gaussian Processes
Gaussian Processes are a powerful non-parametric machine learning technique for
constructing comprehensive probabilistic models of real world problems. They can
be applied to geostatistics, supervised, unsupervised, reinforcement learning,
principal component analysis, system identification and control, rendering music
performance, optimization and many other tasks.
People
Geology & Modelling Research Group at Rio Tinto Centre for Mine Automation,
ACFR, University of Sydney
http://gaussianprocess.com/datasets.php

The New York Times Linked Open Data (Beta)


For the last 150 years, The New York Times has maintained one of the most
authoritative news vocabularies ever developed. In 2009, we began to publish this
vocabulary as linked open data.
The Data
As of 13 January 2010, The New York Times has published approximately 10,000
subject headings as linked open data under a CC BY license. We provide both RDF
documents and human-friendly HTML versions. The table below gives a
breakdown of the various tag types and mapping strategies on data.nytimes.com.

Type            Manually Mapped Tags   Automatically Mapped Tags   Total
People          4,978                  0                           4,978
Organizations   1,489                  1,592                       3,081
Locations       1,910                  0                           1,910
Descriptors     498                    0                           498
Total                                                              10,467
http://data.nytimes.com

Google Public Data Explorer
The Google Public Data Explorer makes large, public-interest datasets easy to
explore, visualize and communicate. As the charts and maps animate over time, the
changes in the world become easier to understand. You don't have to be a data
expert to navigate between different views, make your own comparisons, and share
your findings.
Students, journalists, policy makers and everyone else can play with the tool to
create visualizations of public data, link to them, or embed them in their own
webpages. Embedded charts and links can update automatically so you're always
sharing the latest available data.
The Public Data Explorer launched in March, 2010. See this blog post, which
originally announced the product, for more background and historical perspective.
https://www.google.com/publicdata/directory?hl=en_US&dl=en_US#!st=DATASET

Open Dataset - French

Montreal, Portail Données Ouvertes (French & English), Canada
http://donnees.ville.montreal.qc.ca
Insee, France
http://www.insee.fr/fr/publications-et-services/depliant_webinsee.pdf
RATP Open Data, French Tube in Paris, France
http://data.ratp.fr/fr/les-donnees.html
L'Open-Data français cartographié (French open data, mapped)
Here are three maps of the French open data ecosphere. On a black background, the
three posters (downloadable in A0 format) give a general overview of French open
data today. The three maps are based on data provided by
Data-Publica, in particular two studies recently carried out by Guillaume
Lebourgeois, Pierrick Boitel and Perrine Letellier (the latter two attended
my course at UTC last semester). The aim of these maps is
to begin a fairly complete X-ray of the field, one that can be renewed over
time (perhaps every six months) and is directly tied to the data held
by Data-Publica. In short, a kind of observatory of French open data, which
I am launching through the work of the Atelier de Cartographie.
http://ateliercartographie.wordpress.com/2012/09/23/lopen-data-francais-cartographie/
Open Dataset - China

Lamda Group
Data
Image Data For Multi-Instance Multi-Label Learning
MDDM Data for multi-label dimensionality reduction.
Text Data for Multi-Instance Learning
MILWEB Data for Multi-Instance Learning Based Web Index
Recommendation.
SGBDota Data for the PCES (Positive Concept Expansion with Single
snapshot) problem.
Single Face Dataset Data for Face Recognition with One Training Image per
Person.
Text Data For Multi-Instance Multi-Label Learning
http://lamda.nju.edu.cn/Data.ashx
Data Visualisation

Visualization Lab Gallery, Computer Science Division, University of California, Berkeley

CS 294-10 Fall '14 Visualization
Instructors: Maneesh Agrawala and Jessica Hullman
Course Wiki
CS 160 Spring '14 User Interface Design
Instructor: Maneesh Agrawala and Bjoern Hartmann
TAs: Brittany Cheng, Steve Rubin, and Eric Xiao
Course Wiki
Instructor: Maneesh Agrawala
Course Wiki
CS 160 Spring '12 User Interface Design
TAs: Nicholas Kong, Anuj Tewari
Course Wiki
CS 294-69 Fall '11 Image Manipulation and Computational Photography
TA: Floraine Berthouzoz
Course Wiki
CS 294-10 Spring '11 Visualization
Course Wiki
CS 184 Fall '10 Computer Graphics
TAs: Robert Carroll, Fu-Chung Huang
Course Wiki
CS 160 Spring '10 User Interface
Instructors: Bjoern Hartmann, Maneesh Agrawala
TAs: Kenrick Kin, Anuj Tewari
Course Wiki
Course Wiki
CS 160 Spring '09 User Interfaces
Instructors: Maneesh Agrawala, Jeffrey Nichols
TAs: Nicholas Kong
Course Wiki
Course Wiki
CS 160 Spring '08 User Interfaces
TAs: Wesley Willett and Seth Horrigan
Course Wiki
Course Wiki
CS 160 Fall '06 User Interfaces
TAs: David Sun and Jerry Yu
Course Wiki
Organizers: Maneesh Agrawala, Jeffrey Heer
Course Wiki
http://vis.berkeley.edu/courses/cs294-10-fa14/wiki/index.php/Visualization_Gallery

Visualization Lab Software, Computer Science Division, University of California,
Berkeley

http://vis.berkeley.edu/software

Visualization Lab Course Wiki, Computer Science Division, University of California,
Berkeley

http://vis.berkeley.edu/courses/

Mike Bostock
Visualizing algorithms
http://bost.ocks.org/mike/
Eyeo Festival
Eyeo assembles an incredible set of creative coders, data designers and artists, and
attendees -- expect enthralling talks, unique workshops and interactions with open
source instigators and super fascinating practitioners. Join us for an extraordinary
festival.
http://eyeofestival.com
MIT Data Collider
A new language for data visualisation
http://datacollider.io
D3 JS Data-Driven Documents
D3.js is a JavaScript library for manipulating documents based on data. D3 helps you
bring data to life using HTML, SVG and CSS. D3's emphasis on web standards gives
you the full capabilities of modern browsers without tying yourself to a proprietary
framework, combining powerful visualization components and a data-driven
approach to DOM manipulation.
http://d3js.org

Shan He, Research Fellow at MIT Senseable City Lab
Shan He is a research fellow at MIT Senseable City Lab. She is an architect and a
computational design specialist. She is currently a student at MIT Department of
Architecture pursuing her SMArchS in Design and Computation. At Senseable, her
focus is on data visualization, interactive design and web application.
Prior to coming to MIT she worked as a product designer for Blu Homes where she
worked on developing an online 3-D customization tool with intellectual property.
During her time at MIT she has worked as a research assistant for the Clean Energy
City Lab at the Advanced Urbanism Center and also for the Mobile Experience Lab at
the CMS.
Shan holds a B.Arch from Tsinghua University in China and an M.Arch from the
University of Michigan, Ann Arbor.
http://cargocollective.com/shanhe/About-Shan-He

Gource software version control visualization
Software projects are displayed by Gource as an animated tree with the root
directory of the project at its centre. Directories appear as branches with files as
leaves. Developers can be seen working on the tree at the times they contributed to
the project.
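
The tree that Gource animates (directories as branches, files as leaves) can be pictured with a small data structure. The sketch below is not Gource code; it only shows one way to nest a list of file paths into such a tree, and the function names and sample paths are hypothetical.

```python
def build_tree(paths):
    """Nest file paths into a dict: directories map to dicts, files to None."""
    root = {}
    for path in paths:
        parts = path.strip("/").split("/")
        node = root
        for directory in parts[:-1]:
            node = node.setdefault(directory, {})
        node[parts[-1]] = None
    return root


def count_leaves(node):
    """Count the files (leaves) below a directory node."""
    return sum(1 if child is None else count_leaves(child) for child in node.values())


# Hypothetical list of files touched in a repository's history.
tree = build_tree(["src/main.c", "src/util/log.c", "README"])
file_count = count_leaves(tree)
```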
https://www.youtube.com/watch?v=NjUuAuBcoqs#t=73
https://code.google.com/p/gource/
Logstalgia, website access log visualization
Logstalgia (aka ApachePong) is a website access log visualization tool.
https://code.google.com/p/logstalgia/

Andrew Caudwell's Blog
Andrew Caudwell is a software developer and sometimes computer graphics
programmer/artist located in Wellington, New Zealand.
He is probably best known through his work as the author of several popular data
visualizations:
Logstalgia (aka Apache Pong) a visualization of website traffic as a pong-like game
Gource a force-directed layout software version control visualization
This blog is a collection of his work, experiments, thoughts and ideas on
procedurally generated computer graphics and animation.
http://www.thealphablenders.com

Books English

An Architecture for Fast and General Data Processing on Large Clusters by Matei
Zaharia, 2014
The past few years have seen a major change in computing systems, as growing data
volumes and stalling processor speeds require more and more applications to scale
out to distributed systems. Today, a myriad data sources, from the Internet to
business operations to scientific instruments, produce large and valuable data
streams. However, the processing capabilities of single machines have not kept up
with the size of data, making it harder and harder to put to use. As a result, a growing
number of organizations (not just web companies, but traditional enterprises
and research labs) need to scale out their most important computations to clusters
of hundreds of machines.
At the same time, the speed and sophistication required of data processing have
grown. In addition to simple queries, complex algorithms like machine learning and
graph analysis are becoming common in many domains. And in addition to batch
processing, streaming analysis of new real-time data sources is required to let
organizations take timely action. Future computing platforms will need to not only
scale out traditional workloads, but support these new applications as well.
This dissertation proposes an architecture for cluster computing systems that can
tackle emerging data processing workloads while coping with larger and larger
scales. Whereas early cluster computing systems, like MapReduce, handled batch
processing, our architecture also enables streaming and interactive queries, while
keeping the scalability and fault tolerance of previous systems. And whereas most
deployed systems only support simple one-pass computations (e.g., aggregation or
SQL queries), ours also extends to the multi-pass algorithms required for more
complex analytics (e.g., iterative algorithms for machine learning). Finally, unlike
the specialized systems proposed for some of these workloads, our architecture
allows these computations to be combined, enabling rich new applications that
intermix, for example, streaming and batch processing, or SQL and complex
analytics.
We achieve these results through a simple extension to MapReduce that adds
primitives for data sharing, called Resilient Distributed Datasets (RDDs). We show
that this is enough to efficiently capture a wide range of workloads. We implement
RDDs in the open source Spark system, which we evaluate using both synthetic
benchmarks and real user applications. Spark matches or exceeds the performance
of specialized systems in many application domains, while offering stronger fault
tolerance guarantees and allowing these workloads to be combined. We explore the
generality of RDDs from both a theoretical modeling perspective and a practical
perspective to see why this extension can capture a wide range of previously
disparate workloads.
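
RDDs are commonly described as recovering lost data by replaying the transformations that produced it (their lineage). The toy class below illustrates that idea in plain Python on a single machine. It is not Spark's API: the class name, its methods and the way a lost partition is simulated are assumptions made for the example.

```python
class ToyRDD:
    """Toy dataset that remembers how it was derived (its lineage)."""

    def __init__(self, source=None, parent=None, transform=None):
        self.source = source        # base data, only set on the root dataset
        self.parent = parent        # dataset this one was derived from
        self.transform = transform  # function applied to the parent's items
        self.cache = None           # materialised result, may be lost

    def map(self, fn):
        return ToyRDD(parent=self, transform=lambda items: [fn(x) for x in items])

    def filter(self, pred):
        return ToyRDD(parent=self, transform=lambda items: [x for x in items if pred(x)])

    def collect(self):
        if self.cache is None:      # nothing cached: recompute from lineage
            if self.parent is None:
                self.cache = list(self.source)
            else:
                self.cache = self.transform(self.parent.collect())
        return self.cache


numbers = ToyRDD(source=range(1, 6))
evens_squared = numbers.map(lambda x: x * x).filter(lambda x: x % 2 == 0)
first = evens_squared.collect()
evens_squared.cache = None          # simulate losing the in-memory copy
second = evens_squared.collect()    # rebuilt by replaying the transformations
```

Nothing is computed until `collect()` is called, and the second call rebuilds the result from the recorded steps rather than from a stored replica.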
http://www.eecs.berkeley.edu/Pubs/TechRpts/2014/EECS-2014-12.pdf

Deep Learning (Artificial Intelligence), an MIT Press book in preparation, by Yoshua
Bengio, Ian Goodfellow and Aaron Courville, 20-Oct-2014
Please help us make this a great book! This draft is still full of typos and can be
improved in many ways. Your suggestions are more than welcome. Do not hesitate
to contact any of the authors directly by e-mail or Google+ messages: Yoshua, Ian,
Aaron.
Table of Contents
Deep Learning for AI
Linear Algebra
Probability and Information Theory
Numerical Computation
Machine Learning Basics
Feedforward Deep Networks
Structured Probabilistic Models: A Deep Learning Perspective
Unsupervised and Transfer Learning
Convolutional Networks
Sequence Modeling: Recurrent and Recursive Nets
The Manifold Perspective on Auto-Encoders
Confronting the Partition Function
References
http://www.iro.umontreal.ca/~bengioy/dlbook/
Deep Learning Tutorial by LISA Lab, University of Montreal, 2014
The tutorials presented here will introduce you to some of the most important deep
learning algorithms and will also show you how to run them using Theano. Theano
is a python library that makes writing deep learning models easy, and gives the
option of training them on a GPU.
The algorithm tutorials have some prerequisites. You should know some python,
and be familiar with numpy. Since this tutorial is about using Theano, you should
read over the Theano basic tutorial first. Once you've done that, read through our
Getting Started chapter: it introduces the notation, and [downloadable] datasets
used in the algorithm tutorials, and the way we do optimization by stochastic
gradient descent.
The purely supervised learning algorithms are meant to be read in order:
1. Logistic Regression - using Theano for something simple
2. Multilayer perceptron - introduction to layers
3. Deep Convolutional Network - a simplified version of LeNet5
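
As a taste of the first item in the list above, here is a minimal sketch of logistic regression trained by stochastic gradient descent. It is written in plain Python with the standard library only, not in Theano, and is not taken from the tutorial: the function names, learning rate, epoch count and toy data are assumptions chosen for illustration.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_sgd(samples, labels, learning_rate=0.5, epochs=200, seed=0):
    """Fit weights and bias by stochastic gradient descent on the log loss."""
    rng = random.Random(seed)
    weights = [0.0] * len(samples[0])
    bias = 0.0
    order = list(range(len(samples)))
    for _ in range(epochs):
        rng.shuffle(order)                      # visit samples in random order
        for i in order:
            z = bias + sum(w * x for w, x in zip(weights, samples[i]))
            error = sigmoid(z) - labels[i]      # gradient of the log loss w.r.t. z
            weights = [w - learning_rate * error * x
                       for w, x in zip(weights, samples[i])]
            bias -= learning_rate * error
    return weights, bias


def predict(weights, bias, sample):
    z = bias + sum(w * x for w, x in zip(weights, sample))
    return 1 if sigmoid(z) >= 0.5 else 0


# Hypothetical, linearly separable toy data: the label is 1 when x0 > x1.
samples = [[0.0, 1.0], [1.0, 0.0], [0.2, 0.9], [0.8, 0.1]]
labels = [0, 1, 0, 1]
weights, bias = train_logistic_sgd(samples, labels)
predictions = [predict(weights, bias, s) for s in samples]
```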
The unsupervised and semi-supervised learning algorithms can be read in any order
(the auto-encoders can be read independently of the RBM/DBN thread):
Auto Encoders, Denoising Autoencoders - description of autoencoders
Stacked Denoising Auto-Encoders - easy steps into unsupervised pre-training for
deep nets
Restricted Boltzmann Machines - single layer generative RBM model
Deep Belief Networks - unsupervised generative pre-training of stacked RBMs
followed by supervised fine-tuning
Building towards including the mcRBM model, we have a new tutorial on sampling
from energy models:
HMC Sampling - hybrid (aka Hamiltonian) Monte-Carlo sampling with scan()
Building towards including the Contractive auto-encoders tutorial, we have the code
for now:
Contractive auto-encoders code - There is some basic doc in the code.
Energy-based recurrent neural network (RNN-RBM):
Modeling and generating sequences of polyphonic music
http://deeplearning.net/tutorial/deeplearning.pdf
Statistical Inference for Everyone, by Professor Bryan Blais, 2014
This is a new approach to an introductory statistical inference textbook, motivated
by probability theory as logic. It is targeted to the typical Statistics 101 college
student, and covers the topics typically covered in the first semester of such a
course. It is freely available under the Creative Commons License, and includes a
software library in Python for making some of the calculations and visualizations
easier.
I am a professor of Science and Technology, Bryant University and a research
professor at the Institute for Brain and Neural Systems, Brown University. My
interests include
Theoretical Neuroscience
learning and memory in neural systems
vision
spike-timing dependent plasticity
Bayesian Inference
frequentist versus Bayesian statistics
Bayesian approaches to learning and memory
Digital to Analog Computer Control
autonomous experiments
neural networks and robotics
Global Resources
Dynamics of global resources and economics
Population growth, Malthusian traps, and energy
http://web.bryant.edu/~bblais/statistical-inference-for-everyone-sie.html

Mining of Massive Datasets by Jure Leskovec, Anand Rajaraman, Jeff Ullman, 2014
The book
The book is based on Stanford Computer Science course CS246: Mining Massive
Datasets (and CS345A: Data Mining).
The book, like the course, is designed at the undergraduate computer science level
with no formal prerequisites. To support deeper explorations, most of the chapters
are supplemented with further reading references.
The Mining of Massive Datasets book has been published by Cambridge University
Press. You can get a 20% discount here.
By agreement with the publisher, you can download the book for free from this
page. Cambridge University Press does, however, retain copyright on the work, and
we expect that you will obtain their permission and acknowledge our authorship if
you republish parts or all of it. We are sorry to have to mention this point, but we
have evidence that other items we have published on the Web have been
appropriated and republished under other names. It is easy to detect such misuse,
by the way, as you will learn in Chapter 3.
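Chapter 3 of the book covers finding similar items. As a hedged illustration of how copied text can be detected, the sketch below compares documents by the Jaccard similarity of their character shingles. The function names, the shingle length k=5 and the sample strings are assumptions for this example, not the book's code, and the book goes further (minhashing and locality-sensitive hashing) to make this scale.

```python
def shingles(text, k=5):
    """Return the set of all length-k character substrings of text."""
    text = " ".join(text.lower().split())  # normalize case and whitespace
    return {text[i:i + k] for i in range(len(text) - k + 1)}

def jaccard(a, b):
    """Jaccard similarity: size of the intersection divided by size of the union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

# Hypothetical sample documents.
original = "We welcome your feedback on the manuscript."
copied = "we welcome your  feedback on the manuscript!"
unrelated = "Gaussian processes provide a probabilistic approach."

sim_copy = jaccard(shingles(original), shingles(copied))
sim_other = jaccard(shingles(original), shingles(unrelated))
```

A near-verbatim copy scores close to 1, while unrelated text scores close to 0, so a simple threshold on the similarity flags likely republication.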
We welcome your feedback on the manuscript.
The 2nd edition of the book (v2.1)
The following is the second edition of the book. There are three new chapters, on
mining large graphs, dimensionality reduction, and machine learning. There is also a
revised Chapter 2 that treats map-reduce programming in a manner closer to how it
is used in practice.
Together with each chapter there is also a set of lecture slides that we use for
teaching Stanford CS246: Mining Massive Datasets course. Note that the slides do
not necessarily cover all the material covered in the corresponding chapters.
Download the latest version of the book as a single big PDF file (511 pages, 3 MB).
Note to the users of provided slides: We would be delighted if you found our
material useful in giving your own lectures. Feel free to use these slides verbatim, or
to modify them to fit your own needs. PowerPoint originals are available. If you
make use of a significant portion of these slides in your own lecture, please include
this message, or a link to our web site: http://www.mmds.org/.
Comments and corrections are most welcome. Please let us know if you are using
these materials in your course and we will list and link to your course.
http://infolab.stanford.edu/~ullman/mmds/book.pdf
Social Media Mining by Reza Zafarani, Mohammad Ali Abbasi, Huan Liu, 2014
The growth of social media over the last decade has revolutionized the way
individuals interact and industries conduct business. Individuals produce data at an
unprecedented rate by interacting, sharing, and consuming content through social
media. Understanding and processing this new type of data to glean actionable
patterns presents challenges and opportunities for interdisciplinary research, novel
algorithms, and tool development. Social Media Mining integrates social media,
social network analysis, and data mining to provide a convenient and coherent
platform for students, practitioners, researchers, and project managers to
understand the basics and potentials of social media mining. It introduces the
unique problems arising from social media data and presents fundamental concepts,
emerging issues, and effective algorithms for network analysis and data mining.
Suitable for use in advanced undergraduate and beginning graduate courses as well
as professional short courses, the text contains exercises of different degrees of
difficulty that improve understanding and help apply concepts, principles, and
methods in various scenarios of social media mining.
http://dmml.asu.edu/smm/book/
Slides

http://dmml.asu.edu/smm/slides/
Causal Inference by Miguel A. Hernán and James M. Robins, May 14, 2014, Draft
The book provides a cohesive presentation of concepts of, and methods for, causal
inference. Much of this material is currently scattered across journals in several
disciplines or confined to technical articles. We expect that the book will be of
interest to anyone interested in causal inference, e.g., epidemiologists, statisticians,
psychologists, economists, sociologists, other social scientists. The book is geared
towards graduate students and practitioners.
We have divided the book in 3 parts of increasing difficulty: causal inference
without models, causal inference with models, and causal inference from complex
longitudinal data. We will make drafts of selected book sections available on this
website. The idea is that interested readers can submit suggestions or criticisms
before the book is published. If you wish to share any comments, please email me or
visit us on Facebook (user causalinference).
Warning: These documents are drafts. We are constantly revising and correcting
errors without documenting the changes. Please make sure you use the most
updated version posted here.
http://www.hsph.harvard.edu/miguel-hernan/causal-inference-book/
Slides for High Performance Python tutorial at EuroSciPy2014 by Ian Ozsvald
This is Ian Ozsvald's blog. I'm an entrepreneurial geek, a Data Science/ML/NLP/AI
consultant, founder of the Annotate.io social media mining API, author of O'Reilly's
High Performance Python book, co-organiser of PyDataLondon, co-founder of the
SocialTies App, author of the A.I.Cookbook, author of The Screencasting Handbook, a
Pythonista, co-founder of ShowMeDo and FivePoundApps and also a Londoner.
Here's a little more about me.
https://github.com/ianozsvald/euroscipy2014_highperformancepython
http://ianozsvald.com/2014/08/30/slides-for-high-performance-python-tutorial-
at-euroscipy2014-book-signing/
Neural Networks and Deep Learning, 2014
Neural Networks and Deep Learning is a free online book. The book will teach you
about:
Neural networks, a beautiful biologically-inspired programming paradigm which
enables a computer to learn from observational data
Deep learning, a powerful set of techniques for learning in neural networks
Neural networks and deep learning currently provide the best solutions to many
problems in image recognition, speech recognition, and natural language
processing. This book will teach you the core concepts behind neural networks and
deep learning.
The book is currently an incomplete beta draft. More chapters will be added over
the coming months. For now, you can:
Read Chapter 1, which explains how neural networks can learn to recognize
handwriting
Read Chapter 2, which explains backpropagation, the most important algorithm
used to learn in neural networks.
http://neuralnetworksanddeeplearning.com/index.html

Probabilistic Programming and Bayesian Methods for Hackers by Cameron Davidson-
Pilon, 2014
Bayesian Methods for Hackers is designed as an introduction to Bayesian inference
from a computational/understanding-first, and mathematics-second, point of view.
Of course as an introductory book, we can only leave it at that: an introductory book.
The mathematically trained may cure the curiosity this text generates with
other texts designed with mathematical analysis in mind. For the enthusiast with
less mathematical background, or one who is not interested in the mathematics but
simply the practice of Bayesian methods, this text should be sufficient and
entertaining.
https://github.com/CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-
Methods-for-Hackers

Bayesian Reasoning and Machine Learning, David Barber, 2012 (online version 02-
2014)
Machine learning methods extract value from vast data sets quickly and with
modest resources. They are established tools in a wide range of industrial
applications, including search engines, DNA sequencing, stock market analysis, and
robot locomotion, and their use is spreading rapidly. People who know the methods
have their choice of rewarding jobs. This hands-on text opens these opportunities to
computer science students with modest mathematical backgrounds. It is designed
for final-year undergraduates and master's students with limited background in
linear algebra and calculus. Comprehensive and coherent, it develops everything
from basic reasoning to advanced techniques within the framework of graphical
models. Students learn more than a menu of techniques, they develop analytical and
problem-solving skills that equip them for the real world. Numerous examples and
exercises, both computer based and theoretical, are included in every chapter.
Resources for students and instructors, including a MATLAB toolbox, are available
online.
http://web4.cs.ucl.ac.uk/staff/d.barber/pmwiki/pmwiki.php?n=Brml.Online
Past, Present, and Future of Statistical Science by COPSS, 2014
http://nisla05.niss.org/copss/past-present-future-copss.pdf


Essentials of Metaheuristics by Sean Luke, 2013

Fill the form and download for free
This is an open set of lecture notes on metaheuristics algorithms, intended for
undergraduate students, practitioners, programmers, and other non-experts. It was
developed as a series of lecture notes for an undergraduate course I taught at GMU.
The chapters are designed to be printable separately if necessary. As it's lecture
notes, the topics are short and light on examples and theory. It's best when
complementing other texts. With time, I might remedy this.
http://cs.gmu.edu/~sean/book/metaheuristics/
Statistical Model Building, Machine Learning, and the Ah-Ha Moment by Grace
Wahba, 2013
https://archive.org/details/arxiv-1303.5153

An Introduction to Statistical Learning with Applications in R, by Gareth James,
Daniela Witten, Trevor Hastie and Robert Tibshirani, 2013 (first printing)
http://web.stanford.edu/~hastie/local.ftp/Springer/ISLR_print1.pdf

A Course in Machine Learning by Hal Daumé III, 2012
Machine learning is the study of algorithms that learn from data and experience. It is
applied in a vast variety of application areas, from medicine to advertising, from
military to pedestrian. Any area in which you need to make sense of data is a
potential consumer of machine learning.
CIML is a set of introductory materials that covers most major aspects of modern
machine learning (supervised learning, unsupervised learning, large margin
methods, probabilistic modeling, learning theory, etc.). Its focus is on broad
applications with a rigorous backbone. A subset can be used for an undergraduate
course; a graduate course could probably cover the entire material and then some.
http://ciml.info

Machine Learning in Action, Peter Harrington, 2012
Chapters 1 and 7 are available for free on the publisher's website
http://www.manning.com/pharrington/MLiAchapter1sample.pdf
http://www.manning.com/pharrington/MLiAchapter7sample.pdf

A Programmer's Guide to Data Mining, by Ron Zacharski, 2012
About This Book
Before you is a tool for learning basic data mining techniques. Most data mining
textbooks focus on providing a theoretical foundation for data mining, and as a result,
may seem notoriously difficult to understand. Don't get me wrong, the information
in those books is extremely important. However, if you are a programmer interested
in learning a bit about data mining you might be interested in a beginner's hands-on
guide as a first step. That's what this book provides.
This guide follows a learn-by-doing approach. Instead of passively reading the book,
I encourage you to work through the exercises and experiment with the Python code
I provide. I hope you will be actively involved in trying out and programming data
mining techniques. The textbook is laid out as a series of small steps that build on
each other until, by the time you complete the book, you have laid the foundation for
understanding data mining techniques. This book is available for download for free
under a Creative Commons license (see link in footer). You are free to share the
book, and remix it. Someday I may offer a paper copy, but the online version will
always be free.
http://guidetodatamining.com

Artificial Intelligence, Foundations of Computational Agents by David Poole and Alan
Mackworth, 2010
Artificial Intelligence: Foundations of Computational Agents is a book about the
science of artificial intelligence (AI). The view we take is that AI is the study of the
design of intelligent computational agents. The book is structured as a textbook but
it is designed to be accessible to a wide audience.
We wrote this book because we are excited about the emergence of AI as an
integrated science. As with any science worth its salt, AI has a coherent, formal
theory and a rambunctious experimental wing. Here we balance theory and
experiment and show how to link them intimately together. We develop the science
of AI together with its engineering applications. We believe the adage, "There is
nothing so practical as a good theory." The spirit of our approach is captured by the
dictum, "Everything should be made as simple as possible, but not simpler." We
must build the science on solid foundations; we present the foundations, but only
sketch, and give some examples of, the complexity required to build useful
intelligent systems. Although the resulting systems will be complex, the
foundations and the building blocks should be simple.
http://artint.info/html/ArtInt.html

The Elements of Statistical Learning, T. Hastie, R. Tibshirani, and J. Friedman, 2009
During the past decade there has been an explosion in computation and information
technology. With it has come vast amounts of data in a variety of fields such as
medicine, biology, finance, and marketing. The challenge of understanding these
data has led to the development of new tools in the field of statistics, and spawned
new areas such as data mining, machine learning, and bioinformatics. Many of these
tools have common underpinnings but are often expressed with different
terminology. This book describes the important ideas in these areas in a common
conceptual framework. While the approach is statistical, the emphasis is on
concepts rather than mathematics. Many examples are given, with a liberal use of
color graphics. It should be a valuable resource for statisticians and anyone
interested in data mining in science or industry. The book's coverage is broad, from
supervised learning (prediction) to unsupervised learning. The many topics include
neural networks, support vector machines, classification trees and boosting--the
first comprehensive treatment of this topic in any book.
This major new edition features many topics not covered in the original, including
graphical models, random forests, ensemble methods, least angle regression & path
algorithms for the lasso, non-negative matrix factorization and spectral clustering.
There is also a chapter on methods for "wide" data (p bigger than n),
including multiple testing and false discovery rates.
http://statweb.stanford.edu/~tibs/ElemStatLearn/

Learning Deep Architectures for AI by Yoshua Bengio, 2009
Abstract
Theoretical results suggest that in order to learn the kind of complicated functions that can
represent high-level abstractions (e.g., in vision, language, and other AI-level tasks), one may
need deep architectures. Deep architectures are composed of multiple levels of non-linear
operations, such as in neural nets with many hidden layers or in complicated propositional
formulae re-using many sub-formulae. Searching the parameter space of deep architectures is a
difficult task, but learning algorithms such as those for Deep Belief Networks have recently been
proposed to tackle this problem with notable success, beating the state-of-the-art in certain areas.
This monograph discusses the motivations and principles regarding learning algorithms for deep
architectures, in particular those exploiting as building blocks unsupervised learning of
single-layer models such as Restricted Boltzmann Machines, used to construct deeper models such as
Deep Belief Networks.
http://www.iro.umontreal.ca/~bengioy/papers/ftml_book.pdf
An Introduction to Information Retrieval by Christopher D. Manning, Prabhakar
Raghavan and Hinrich Schütze, 2009
This book is the result of a series of courses we have taught at Stanford University
and at the University of Stuttgart, in a range of durations including a single quarter,
one semester and two quarters. These courses were aimed at early-stage graduate
students in computer science, but we have also had enrollment from upper-class
computer science undergraduates, as well as students from law, medical
informatics, statistics, linguistics and various engineering disciplines. The key
design principle for this book, therefore, was to cover what we believe to be
important in a one-term graduate course on information retrieval. An additional
principle is to build each chapter around material that we believe can be covered in
a single lecture of 75 to 90 minutes.
The first eight chapters of the book are devoted to the basics of information
retrieval, and in particular the heart of search engines; we consider this material to
be core to any course on information retrieval.
Chapters 9-21 build on the foundation of the first eight chapters to cover a variety
of more advanced topics.
http://nlp.stanford.edu/IR-book/pdf/irbookprint.pdf
http://www-nlp.stanford.edu/IR-book/

Kernel Methods in Machine Learning by Thomas Hofmann, Bernhard Schölkopf and
Alexander J. Smola, 2008
We review machine learning methods employing positive definite kernels. These
methods formulate learning and estimation problems in a reproducing kernel
Hilbert space (RKHS) of functions defined on the data domain, expanded in terms of
a kernel. Working in linear spaces of functions has the benefit of facilitating the
construction and analysis of learning algorithms while at the same time allowing
large classes of functions. The latter include nonlinear functions as well as functions
defined on nonvectorial data. We cover a wide range of methods, ranging from
binary classifiers to sophisticated methods for estimation with structured data.
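To make the idea of a positive definite kernel concrete, here is a small sketch in plain Python. It builds the Gram matrix of a Gaussian (RBF) kernel on a few points and evaluates the quadratic form c^T K c, which is non-negative for any coefficient vector c when the kernel is positive definite. The choice of kernel, the points and the coefficients are assumptions for this example and do not come from the paper.

```python
import math

def rbf_kernel(x, y, gamma=1.0):
    """Gaussian (RBF) kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return math.exp(-gamma * sq_dist)

def gram_matrix(points, kernel):
    """Matrix K with entries K[i][j] = kernel(points[i], points[j])."""
    return [[kernel(p, q) for q in points] for p in points]

def quadratic_form(K, c):
    """Compute c^T K c."""
    n = len(c)
    return sum(c[i] * K[i][j] * c[j] for i in range(n) for j in range(n))

def kernel_expansion(alphas, points, kernel, x):
    """A function in the RKHS written as f(x) = sum_i alpha_i * k(x_i, x)."""
    return sum(a * kernel(p, x) for a, p in zip(alphas, points))

# Hypothetical data points and coefficients.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (1.5, 1.5)]
K = gram_matrix(points, rbf_kernel)
coeffs = [1.0, -2.0, 0.5, 3.0]
q = quadratic_form(K, coeffs)
f_at_origin = kernel_expansion(coeffs, points, rbf_kernel, (0.0, 0.0))
```

The kernel_expansion helper shows the sense in which functions are "expanded in terms of a kernel": a learned function is a weighted sum of kernel evaluations at the data points.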
https://archive.org/details/arxiv-math0701907
Introduction to Machine Learning, Alex Smola, S.V.N. Vishwanathan, 2008
Over the past two decades Machine Learning has become one of the mainstays of
information technology and with that, a rather central, albeit usually hidden, part of
our life. With the ever increasing amounts of data becoming available there is good
reason to believe that smart data analysis will become even more pervasive as a
necessary ingredient for technological progress.
The purpose of this chapter is to provide the reader with an overview over the vast
range of applications which have at their heart a machine learning problem and to
bring some degree of order to the zoo of problems. After that, we will discuss some
basic tools from statistics and probability theory, since they form the language in
which many machine learning problems must be phrased to become amenable to
solving. Finally, we will outline a set of fairly basic yet effective algorithms to solve
an important problem, namely that of classification. More sophisticated tools, a
discussion of more general problems and a detailed analysis will follow in later
parts of the book.
http://alex.smola.org/drafts/thebook.pdf

Pattern Recognition and Machine Learning, Christopher M. Bishop, 2006
Pattern recognition has its origins in engineering, whereas machine learning grew
out of computer science. However, these activities can be viewed as two facets of the
same field, and together they have undergone substantial development over the
past ten years. In particular, Bayesian methods have grown from a specialist niche
to become mainstream, while graphical models have emerged as a general
framework for describing and applying probabilistic models. Also, the practical
applicability of Bayesian methods has been greatly enhanced through the
development of a range of approximate inference algorithms such as variational
Bayes and expectation propagation. Similarly, new models based on kernels have
had significant impact on both algorithms and applications.

Chapter 8 Graphical Models
Probabilities play a central role in modern pattern recognition. We have seen in
Chapter 1 that probability theory can be expressed in terms of two simple equations
corresponding to the sum rule and the product rule. All of the probabilistic
inference and learning manipulations discussed in this book, no matter how complex,
amount to repeated application of these two equations. We could therefore proceed
to formulate and solve complicated probabilistic models purely by algebraic
manipulation. However, we shall find it highly advantageous to augment the analysis
using diagrammatic representations of probability distributions, called probabilistic
graphical models. These offer several useful properties:
1. They provide a simple way to visualize the structure of a probabilistic model and
can be used to design and motivate new models.
2. Insights into the properties of the model, including conditional independence
properties, can be obtained by inspection of the graph.
3. Complex computations, required to perform inference and learning in
sophisticated models, can be expressed in terms of graphical manipulations, in which
underlying mathematical expressions are carried along implicitly.
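The sum rule, p(X) = sum over Y of p(X, Y), and the product rule, p(X, Y) = p(Y | X) p(X), mentioned in the excerpt can be checked on a small table. The sketch below is a plain Python illustration; the joint distribution and the function names are made up for this example and do not come from the book.

```python
# Hypothetical joint distribution p(X, Y) over two binary variables.
joint = {
    ("rain", "wet"): 0.27, ("rain", "dry"): 0.03,
    ("sun", "wet"): 0.07, ("sun", "dry"): 0.63,
}

def marginal_x(joint):
    """Sum rule: p(X) is the sum over Y of p(X, Y)."""
    out = {}
    for (x, _), p in joint.items():
        out[x] = out.get(x, 0.0) + p
    return out

def conditional_y_given_x(joint):
    """Product rule rearranged: p(Y | X) = p(X, Y) / p(X)."""
    px = marginal_x(joint)
    return {(x, y): p / px[x] for (x, y), p in joint.items()}

p_x = marginal_x(joint)
p_y_given_x = conditional_y_given_x(joint)

# Product rule: multiplying p(Y | X) by p(X) recovers the joint.
rebuilt = {(x, y): p * p_x[x] for (x, y), p in p_y_given_x.items()}
```

A graphical model encodes the same kind of factorization for many variables at once, which is why the graph can stand in for the algebra.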
http://research.microsoft.com/en-us/um/people/cmbishop/PRML/pdf/Bishop-PRML-sample.pdf
http://research.microsoft.com/en-us/um/people/cmbishop/prml/

Gaussian processes for Machine Learning, C. Rasmussen and C. Williams, 2006
Gaussian processes (GPs) provide a principled, practical, probabilistic approach to
learning in kernel machines. GPs have received increased attention in the machine-
learning community over the past decade, and this book provides a long-needed
systematic and unified treatment of theoretical and practical aspects of GPs in
machine learning. The treatment is comprehensive and self-contained, targeted at
researchers and students in machine learning and applied statistics. The book deals
with the supervised-learning problem for both regression and classification, and
includes detailed algorithms. A wide variety of covariance (kernel) functions are
presented and their properties discussed. Model selection is discussed both from a
Bayesian and a classical perspective. Many connections to other well-known
techniques from machine learning and statistics are discussed, including support-
vector machines, neural networks, splines, regularization networks, relevance
vector machines and others. Theoretical issues including learning curves and the
PAC-Bayesian framework are treated, and several approximation methods for
learning with large datasets are discussed. The book contains illustrative examples
and exercises, and code and datasets are available on the Web. Appendixes provide
mathematical background and a discussion of Gaussian Markov processes.
http://www.gaussianprocess.org/gpml/chapters/

Bayesian Machine Learning by Chakraborty, Sounak, 2005
PhD Thesis
https://archive.org/details/bayesianmachinel00chak

Machine Learning by Tom Mitchell, 2005
Policy on use: You are welcome to download these chapters for your personal use,
or for use in classes you teach. In return, I ask only two things:
Please do not re-post these documents on the internet. If you wish to make
them available to your students, point them directly to this site.
If you find errors please send me email at Tom.Mitchell@cmu.edu
I hope you find these useful! Tom Mitchell
http://www.cs.cmu.edu/%7Etom/NewChapters.html
http://www.cs.cmu.edu/%7Etom/mlbook-chapter-slides.html

Information Theory, Inference, and Learning Algorithms, David MacKay, 2003
This book is aimed at senior undergraduates and graduate students in Engineering,
Science, Mathematics, and Computing. It expects familiarity with calculus,
probability theory, and linear algebra as taught in a first- or second-year
undergraduate course on mathematics for scientists and engineers.
Conventional courses on information theory cover not only the beautiful theoretical
ideas of Shannon, but also practical solutions to communication problems. This
book goes further, bringing in Bayesian data modelling, Monte Carlo methods,
variational methods, clustering algorithms, and neural networks.
Why unify information theory and machine learning? Because they are two sides of
the same coin. In the 1960s, a single field, cybernetics, was populated by
information theorists, computer scientists, and neuroscientists, all studying
common problems. Information theory and machine learning still belong together.
Brains are the ultimate compression and communication systems. And the state-of-
the-art algorithms for both data compression and error-correcting codes use the
same tools as machine learning.
http://www.inference.phy.cam.ac.uk/itprnn/book.html
https://archive.org/details/MackayInformationTheoryFreeEbookReleasedByAuthor

Free Book List
E-Books for free online viewing and/or download
http://www.e-booksdirectory.com/listing.php?category=284
Free resource book (need to sign in)

There are too many machine learning resources on the internet, so much so that it
can feel overwhelming.
I have read the books and taken the courses and can give you good advice on where
to start.
Resources you can use to learn faster
I have hand-picked the best machine learning
books
websites
videos
university courses
software
competition sites
These resources have been listed in a handy PDF that you can download now
http://machinelearningmastery.com/machine-learning-resources/
Free ML ebooks on it-ebooks. This website is controversial; please read the
Stack Overflow discussion below before deciding whether to access it.
http://meta.stackoverflow.com/questions/255032/should-we-add-it-ebooks-info-to-the-stack-overflow-url-blacklist
Wikipedia: Machine Learning, the Complete Guide

This is a Wikipedia book, a collection of Wikipedia articles that can be easily saved,
rendered electronically, and ordered as a printed book. For information and help on
Wikipedia books in general, see Help:Books (general tips) and WikiProject
Wikipedia-Books (questions and assistance).
https://en.wikipedia.org/wiki/Book:Machine_Learning_-_The_Complete_Guide

ISSUU
Rediscover reading
With over 19 million publications, Issuu is the fastest growing digital publishing
platform in the world. Millions of avid readers come here every day to read the free
publications created by enthusiastic publishers from all over the globe with topics in
fashion, lifestyle, art, sports and global affairs to mention a few. And that's not all.
We've also got a prominent range of independent publishers utilizing the Issuu
network to reach new fans every day.
Created by a bunch of geeks with an undying love for the publishing industry, Issuu
has grown to become one of the biggest publishing networks in the industry. It's an
archive, library and newsstand all gathered in one reading experience.
http://issuu.com/search?q=%22machine+learning%22

Books - Spanish

Coming soon
Books - German

Coming soon

Books - Italian

Coming soon

Books - French

Coming soon

Books - Russian

Pattern Recognition by .., 2011

http://www.recognition.mccme.ru/pub/RecognitionLab.html/slbook.pdf

Algorithmic models of learning classification: rationale, comparison, selection, 2014

http://www.machinelearning.ru/wiki/images/c/c3/Donskoy14algorithmic.pdf

More coming soon

Books - Japanese

Coming soon


Books - Chinese

Blog recommending useful books

A blog written in Chinese which introduces and recommends many useful ML books
(the books are mostly written in English).
http://blog.csdn.net/pongba/article/details/2915005
Textbook for Statistics
http://baike.baidu.com/subview/1724467/13114186.htm
Introduction to Pattern recognition
http://baike.baidu.com/view/3911812.htm
Translated version of Machine Learning by Tom Mitchell:
http://book.douban.com/subject/1102235/

Books - Portuguese

Coming soon

Presentation, Infographics and Documents - English

Meetup's Presentations
https://skillsmatter.com/explore?content=skillscasts&location=&q=machine+learning

Slides
Slideshare.com
http://www.slideshare.net/search/slideshow?searchfrom=header&q=machine+learning
Slides.com
http://slides.com/explore?search=machine%20learning
Powershow.com
http://www.powershow.com/search/presentations/machine-learning
Speaker Deck
https://speakerdeck.com/search?q=machine+learning


Slides from Lectures

Introduction to Artificial Intelligence, 2014, University of Waterloo
https://www.student.cs.uwaterloo.ca/~cs486/syllabus.html
Aprendizado de Maquina, Conceitos e definicoes by Jose Augusto Baranauskas
http://dcm.ffclrp.usp.br/~augusto/teaching/ami/AM-I-Conceitos-Definicoes.pdf

Aprendizado de Maquina by Bianca Zadrozni, Instituto de Computação, UFF, 2010
http://www2.ic.uff.br/~bianca/aa/

More coming soon

Slides from Meetups

NYC ML Meetup, 2014
Natural Language Processing in Investigative Journalism by Jonathan Stray
http://www.scribd.com/doc/230605794/Natural-Language-Processing-in-Investigative-Journalism
https://github.com/overview/overview-server/wiki/Visualization-Plugin-API
Statistics with Doodles by Thomas Levine

http://thomaslevine.com/!/statistics-with-doodles-2014-03/

More coming soon

Slides from Conferences

More coming soon


Conferences

International Conference on Machine Learning (ICML)

ICML, Beijing, China 2014
http://icml.cc/2014/
ICML, Atlanta, US 2013
http://techtalks.tv/icml/2013/
ICML, Edinburgh, UK 2012
http://techtalks.tv/icml/2012/orals/
http://techtalks.tv/icml_2012_representation_learning/
http://techtalks.tv/icml/2012/inferning2012/
http://techtalks.tv/icml/2012/object2012/
http://techtalks.tv/icml/2012/icml_colt_2012_tutorials/icml-2012-tutorial-on-prediction-belief-and-market/

ICML, Bellevue, US 2011
http://www.icml-2011.org
http://techtalks.tv/icml-2011/
ICML, Haifa, Israel 2010
http://www.icml2010.org
Full archive of ICML
http://machinelearning.org/icml.html

Machine Learning Conference Videos
http://techtalks.tv/search/results/?q=machine+learning

Annual Machine Learning Symposium
6th
http://techtalks.tv/sixth-annual-machine-learning-symposium/
8th
http://www.nyas.org/Events/Detail.aspx?cid=2cc3521e-408a-460e-b159-e774734bcbea
Archive
http://www.nyas.org/whatwedo/fos/machine.aspx

MLSS Machine Learning Summer Schools

http://www.mlss.cc
http://www.mlss2014.com/index.html
Data Gotham 2012, 2013
http://www.youtube.com/user/DataGotham

Meetup - English
631 Machine Learning Meetups in the World
http://machine-learning.meetup.com/
Data Science Weekly List of Meetups
List of Data Science Meetups: NYC, San Francisco, Washington DC, Boston, Chicago,
Seattle, Denver, Austin, Atlanta, Toronto, Vancouver, London, Berlin, Paris,
Amsterdam, Tel Aviv, Dubai, Delhi, Bangalore, Singapore, Sydney
http://www.datascienceweekly.org/data-science-resources/data-science-meetups
Other Meetups missing in Data Science Weekly
London Machine Learning Meetup
http://www.meetup.com/London-Machine-Learning-Meetup/
London Deep Learning Meetup
http://www.meetup.com/Deep-Learning-London/


Blog - English
Data Science Weekly
The Data Science Weekly Blog contains interviews to better understand how people
are using Data and Data Science to change the world.
http://www.datascienceweekly.org/blog
Yann LeCun, Google+
My main research interests are Machine Learning, Computer Vision, Mobile
Robotics, and Computational Neuroscience. I am also interested in Data
Compression, Digital Libraries, the Physics of Computation, and all the applications
of machine learning (Vision, Speech, Language, Document understanding, Data
Mining, Bioinformatics).
https://plus.google.com/+YannLeCunPhD/posts
Igor Carron Blog
Nuit Blanche is a blog that focuses on Compressive Sensing, Advanced Matrix
Factorization Techniques, Machine Learning as well as many other engaging ideas
and techniques needed to handle and make sense of very high dimensional data also
known as Big Data.
http://nuit-blanche.blogspot.co.uk
KDD Community, Knowledge discovery and Data Mining
KDD brings together the data mining, data science and analytics community.
http://www.sigkdd.org/blog
Kaggle Blog
http://blog.kaggle.com
Digg
Digg is a news aggregator with an editorially driven front page, aiming to select
stories specifically for the Internet audience such as science, trending political
issues, and viral Internet issues. (source wikipedia)
http://digg.com/search?q=machine+learning
Feedly
Found a site you like? Use the +feedly button to add it to your feedly reading list
http://feedly.com/index.html#explore%2F%23Machine%20Learning
Mlwave
Learning Machine Learning
ML Wave is a platform that talks about machine learning and data science. It was
founded in 2014 by the Dutch Kaggle user Triskelion.
http://mlwave.com
FastML
Machine Learning made easy
FastML probably grew out of a frustration with papers you need a PhD in math to
understand and with either no code or half-baked Matlab implementation of
homework-assignment quality. We understand that some cutting-edge researchers
might have no interest in providing the goodies for free, or just no interest in such
down-to-earth matters. But we don't have time nor desire to become experts in
every machine learning topic. Fortunately, there is quite a lot of good software with
acceptable documentation.
http://fastml.com
Beating the Benchmark
http://beatingthebenchmark.blogspot.co.uk

YOU CANalytics
Welcome to UCAnalytics.com, the idea behind this website is to explore the
applications of advanced Analytics and data mining in business. Analytics is an effort
to explore interesting but hidden patterns in data for business growth. This idea has
inspired me to name the site
UCAnalytics: YOU CANalytics
UCAnalytics: YOU SEE Analytics
UCAnalytics: University for Analytics
This is sort of like finding patterns in a cluster of clouds, a fun exercise. However,
we will explore some serious business applications and usage of Analytics over
here. A few topics including
1. Analytical Scorecard Development
2. Customer Segmentation to gain deeper knowledge of customer behaviour
3. Data mining and Big Data Analytics
4. Business Applications of Bayesian Statistics: Nate Silver has made Bayesian cool!
5. Challenges & Pitfalls in Business Forecasting: Time Series Modelling
6. Business Growth through right Design-of-Experiments
7. Business Growth & Risk Estimation through Analytical simulations
Looking forward to sharing my ideas and hearing back from you.
Roopam Upadhyay
http://ucanalytics.com/blogs
Trevor Stephens Blog
http://trevorstephens.com
Mozilla Hacks
Mozilla Hacks is one of the key resources for people developing for the Open Web,
talking about news and in-depth descriptions of technologies and features.
https://hacks.mozilla.org/?s=machine+learning
Banach's Algorithmic Corner, University of Warsaw
This blog is maintained by members of Algorithmic group at University of Warsaw:
http://corner.mimuw.edu.pl
DataCamp Blog
http://blog.datacamp.com
Natural Language Processing Blog, Hal Daume
http://nlpers.blogspot.co.uk
Maxim Milakov Blog
I am a researcher in machine learning and high-performance computing.
I designed and implemented nnForge - a library for training convolutional and fully
connected neural networks, with CPU and GPU (CUDA) backends.
You will find my thoughts on convolutional neural networks and the results of
applying convolutional ANNs for various classification tasks in the Blog.
http://www.milakov.org

Alfonso Nieto-Castanon Blog
I work in the field of computational neuroscience, and my background is in
neuroscience (Ph.D. Cognitive and Neural Systems, Boston University) and
engineering (B.S./M.S. Telecommunication Engineering, Universidad de Valladolid).
My areas of specialization are modeling and statistics, fMRI analysis methods, and
signal processing.
http://www.alfnie.com/home
Persontyle Blog
Every object on earth is generating data, including our homes, our cars and yes even
our bodies. Data is the by-product of our new digital existence.
Data has the potential to revolutionize the way business, government, science,
research, and healthcare are carried out. Data presents unprecedented
opportunities to those who have the skills and expertise to use it to unveil patterns,
insights, signals and predict trends which was never possible before.
In a massively connected, data-driven world, it is imperative that the workforce of
today and tomorrow is able to understand what data is available and use scientific
methods to analyze and interpret it.
We're here to help you learn and apply the art and science of turning data into
meaningful insights and intelligent predictions.
http://www.persontyle.com/blog/

Analytics Vidhya
Learn everything about Analytics
Welcome to Analytics Vidhya!
For those of you who are wondering what Analytics Vidhya is: Analytics can be
defined as the science of extracting insights from raw data. The spectrum of
analytics starts from capturing data and evolves into using insights / trends from
this data to make informed decisions. Vidhya on the other hand is a Sanskrit noun
meaning Knowledge or Clarity on a subject. Knowledge, which has been gained
through reading literature or through self practice / experimentation.
Through this blog, I want to create a passionate community that dedicates itself to
the study of Analytics. I share my learning and tips on Analytics through this blog.
http://www.analyticsvidhya.com/blog/

Bugra Akyildiz's Blog
Great Blog (Notes) both theoretical and practical
I work as a Machine Learning/NLP Engineer at CB Insights where I apply machine
learning algorithms to NLP problems. I received a B.S. from Bilkent University and an
M.Sc. from New York University, focusing on signal processing and machine learning.
http://bugra.github.io
Data origami
8 great data blogs to follow
https://www.dataorigami.net/blogs/great-data-blogs

Rasbt's Blog
A collection of tutorials and examples for solving and understanding machine
learning and pattern classification tasks
Links to useful resources
https://github.com/rasbt/pattern_classification#links-to-useful-resources
Gilles Louppe's Blog
Understanding Random Forest, PhD Thesis
https://github.com/glouppe/phd-thesis/blob/master/thesis.pdf

AI Topics
AITopics is a mediated information portal provided by AAAI (The Association for
the Advancement of Artificial Intelligence), with the goal of communicating the
science and applications of AI to interested people around the world.
Contents
• Good Starting Places
• General Readings
• Organizations
• Educational Resources
• Hardware and Software
• Competitions
• News
• Videos
• Podcasts
• Classic Articles & Books
http://aitopics.org/topic/machine-learning


AI International
This international AI site is designed to help you locate AI research efforts in your
country or region. Pages on this site will link to local AI societies, universities, labs,
and other research efforts.
http://www.aiinternational.org/index.html

Joseph Misiti's Blog
machine-learning + applied mathematics + django + hadoop. Co-Founder of
@socialq.
https://github.com/josephmisiti
https://medium.com/@josephmisiti

MIRI, Machine Intelligence Research Institute
The mathematics of safe machine intelligence
MIRI's mission is to ensure that the creation of smarter-than-human intelligence has
a positive impact. We aim to make intelligent machines behave as we intend even in
the absence of immediate human supervision. Much of our current research deals
with reflection, an AI's ability to reason about its own behavior in a principled rather
than ad-hoc way. We focus our research on AI approaches that can be made
transparent (e.g. principled decision algorithms, not genetic algorithms), so that
humans can understand why the AIs behave as they do.
http://intelligence.org/blog/

Kevin Davenport Data Blog
I'm a tech enthusiast interested in automation, machine learning, and conveying
complex statistical models through visualization.
Recent Posts
Regularized Logistic Regression Intuition October 27, 2014
Dynamic Time-Series Modeling May 22, 2014
A Real World Introduction to Information Entropy April 21, 2014
The Cost Function of K-Means February 14, 2014
Mahalanobis Distance and Outliers December 3, 2013
Quick Look: Facebook's Kaggle Competition October 21, 2013
Significance Magazine Contribution August 28, 2013
Absolute Deviation Around the Median August 8, 2013
My Trip to Spain: The R User Conference 2013 July 23, 2013
Gradient Boosting: Analysis of LendingClub's Data July 4, 2013
Shiny Server on CentOS June 29, 2013
Data imputation I June 12, 2013
ggplot2 graphics in a loop April 30, 2013
Predicting Dichotomous Outcomes I April 14, 2013
Data visualization with R and ggplot2 March 28, 2013
Samsung Phone Data Analysis Project March 19, 2013

Layman's Random Forests March 19, 2013
Commercial Machine Learning Algorithms? March 4, 2013
Simple Count Probability February 24, 2013
Common & special cause variation: Part 1 February 13, 2013
Unknown Variance Two-Tailed Test of Population Mean February 11, 2013
Tidy Data January 31, 2013
http://kldavenport.com

Alexandre Passant's Blog
I'm a hacker, researcher, and entrepreneur. I'm passionate about the Web and I love
when smart algorithms and architectures power beautiful and useful products.
I'm co-founder of MDG Web (http://mdg.io), a music-tech start-up based in
Dogpatch Labs Dublin and focusing on the music discovery field. We're building
seevl (http://seevl.fm), a free, unlimited and targeted music discovery platform
available as a standalone app and a Deezer app. We also work with industry
stakeholders to let them promote their content on streaming platforms through their
own branded apps.
I was previously a Research Fellow and Unit Leader at DERI (http://deri.ie), the
world's largest Web 3.0 R&D lab, leading high-impact projects with partners such as
Google, Cisco, and more, on the Social / Semantic / Sensor Web, with a focus on
Knowledge Representation and Management, Personalisation, Privacy, Distributed
Systems, and Recommender Systems.
Overall, I'm trying to make the Web a better place, and I'm having fun doing it.
http://apassant.net

Daniel Nouri's Blog
Using convolutional neural nets to detect facial keypoints tutorial, Daniel
Nouri's Blog
This is a hands-on tutorial on deep learning. Step by step, we'll go about building a
solution for the Facial Keypoint Detection Kaggle challenge. The tutorial introduces
Lasagne, a new library for building neural networks with Python and Theano. We'll
use Lasagne to implement a couple of network architectures, talk about data
augmentation, dropout, the importance of momentum, and pre-training. Some of
these methods will help us improve our results quite a bit.
I'll assume that you already know a fair bit about neural nets. That's because we
won't talk about much of the background of how neural nets work; there are a few
good books and videos for that, like the Neural Networks and Deep Learning online
book. Alec Radford's talk Deep Learning with Python's Theano library is a great
quick introduction. Make sure you also check out Andrej Karpathy's mind-blowing
ConvNetJS Browser Demos.
http://danielnouri.org/notes/2014/12/17/using-convolutional-neural-nets-to-detect-facial-keypoints-tutorial/

Yvonne Rogers Blog
Yvonne Rogers is a Professor of Interaction Design, the director of UCLIC and a
deputy head of the Computer Science department at UCL. Her research interests are
in the areas of ubiquitous computing, interaction design and human-computer
interaction. A central theme is how to design interactive technologies that can
enhance life by augmenting and extending everyday, learning and work activities.
This involves informing, building and evaluating novel user experiences through
creating and assembling a diversity of pervasive technologies.
http://www.interactiveingredients.com
Blog - Spanish

Coming soon
Blog - Italian

Coming soon
Blog - German

Coming soon

Blog - French

Coming soon

Blog - Russian

Coming soon

Blog - Japanese

Coming soon

Blog - Chinese

Coming soon
Blog - Portuguese

Coming soon

Journals - English
Journal of Machine Learning Research, MIT Press
http://jmlr.org
Machine Learning Journal (the latest article can be downloaded for free)
http://link.springer.com/journal/10994
Machine Learning (Theory)
This is an experiment in the application of a blog to academic research in machine
learning and learning theory by John Langford. Exactly where this experiment takes
us and how the blog will turn out to be useful (or not) is one of those prediction
problems we so dearly love in machine learning.
http://hunch.net
List of Journals on Microsoft Academic Research website
http://academic.research.microsoft.com/RankList?entitytype=4&topDomainID=2&subDomainID=6&last=0&start=1&end=100
Wired magazine
http://www.wired.com/tag/machine-learning/

Data Science Central
Data Science Central is the industry's online resource for big data practitioners.
From Analytics to Data Integration to Visualization, Data Science Central provides a
community experience that includes a robust editorial platform, social interaction,
forum-based technical support, the latest in technology, tools and trends and
industry job opportunities.
http://www.datasciencecentral.com
Journals - Spanish

Coming soon

Journals - German

Coming soon
Journals - Italian

Coming soon
Journals - French

Coming soon
Journals - Russian

Coming soon
Journals - Japanese

Coming soon

Journals - Chinese

Coming soon

Journals - Portuguese

Coming soon

Forum, Q&A - English

Data Tau
Hacker News for Data Scientists
Great website with a lot of really good, leading-edge information! It respects
users' privacy by not asking for any personal information or email!
Remark: machinelearningsalon.org is using standard forum templates provided by
its website hosting system, but machinelearningsalon.org is looking forward to
doing the same as DataTau.com!
http://www.datatau.com
Hacker News
Great website like datatau.com, but less dedicated to Machine Learning! It respects
users' privacy by not asking for any personal information or email!
https://news.ycombinator.com

Metaoptimize
Where scientists ask and answer questions on machine learning, natural language
processing, artificial intelligence, text analysis, information retrieval, search, data
mining, statistical modeling, and data visualization!
http://metaoptimize.com/qa/
Kaggle Forums
44,032 posts in 8,087 topics in 439 forums. (source 4th June 2014)
https://www.kaggle.com/forums

Reddit in English
News, Research Papers, Videos, Lectures, Software and Discussions on:
Machine Learning
Data Mining
Information Retrieval
Predictive Statistics
Learning Theory
Search Engines
Pattern Recognition
Analytics
http://www.reddit.com/r/MachineLearning/
Beginners: Please have a look at our FAQ and Link-Collection
http://www.reddit.com/r/MachineLearning/wiki/index
Cross Validated Stack Exchange
Cross Validated is a question and answer site for people interested in statistics,
machine learning, data analysis, data mining, and data visualization. It's 100% free,
no registration required.
http://stats.stackexchange.com
Open Data Stack Exchange
Open Data Stack Exchange is a question and answer site for developers and
researchers interested in open data. It's 100% free, no registration required.
http://opendata.stackexchange.com

Data Science Beta Stack Exchange

Data Science Stack Exchange is a question and answer site for Data science
professionals, Machine Learning specialists, and those interested in learning more
about the field. It's 100% free, no registration required.
http://datascience.stackexchange.com
Quora
Quora is your best source for knowledge.
Why do I need to sign in?
Quora is a knowledge-sharing community that depends on everyone being able to
pitch in when they know something.
http://www.quora.com/Machine-Learning
Machine Learning Impact Forum
Welcome! Please contribute your ideas for what challenges we might aspire to solve,
changes in our community that can improve machine learning impact, and examples
of machine learning projects that have had tangible impact.
http://mlimpact.com

Forum, Q&A - Spanish

Coming soon

Forum, Q&A - German

Coming soon

Forum, Q&A - Italian

Coming soon

Forum, Q&A - French

Coming soon


Forum, Q&A - Russian
Reddit in Russian
http://www.reddit.com/r/MachineLearning_Ru
http://www.reddit.com/r/MachineLearning_Ru/comments/249f7x/meta______faq/

More coming soon

Forum, Q&A - Portuguese

Forum, Q&A - Chinese

Zhihu.com
Machine Learning
http://www.zhihu.com/search?q=%E6%9C%BA%E5%99%A8%E5%AD%A6%E4%B9%A0&type=question
Data Mining
http://www.zhihu.com/search?q=%E6%95%B0%E6%8D%AE%E6%8C%96%E6%8E%98&type=question
http://www.zhihu.com/search?q=%E4%BA%BA%E5%B7%A5%E6%99%BA%E8%83%BD&type=question
Guokr.com
Machine Learning
http://www.guokr.com/search/all/?wd=%E6%9C%BA%E5%99%A8%E5%AD%A6%E4%B9%A0
Data Mining
http://www.guokr.com/search/all/?wd=%E6%95%B0%E6%8D%AE%E6%8C%96%E6%8E%98&sort=&term=True
http://www.guokr.com/search/all/?wd=%E4%BA%BA%E5%B7%A5%E6%99%BA%E8%83%BD&sort=&term=True

More coming soon

Governmental Reports - English

Big Data report, Whitehouse, US
http://www.whitehouse.gov/sites/default/files/docs/big_data_privacy_report_may_1_2014.pdf
Fun - English
Founder of PhD Comics

Jorge is the creator of "PHD Comics", the popular comic strip about life (or the lack
thereof) in Academia. He is also the co-founder of PHDtv, a video science and
discovery outreach collaborative, and a founding board member of Endeavor
College Prep, a non-profit school for kids in East L.A. He earned his Ph.D. in Robotics
from Stanford University and was an Instructor and Research Associate at Caltech
from 2003-2005. He is originally from Panama.
http://jorgecham.com

Companies using Machine Learning and Artificial Intelligence techniques will
answer 3 questions, and their answers will be published for free on this website:
1- Why is Machine Learning important to your Business?
2- What Machine Learning algorithms and technologies are you using?
3- What Machine Learning development could you forecast in the near future?


MACHINE LEARNING RESEARCH GROUPS

MACHINE LEARNING RESEARCH GROUPS in AMERICA, USA
MIT
Computer Science and Artificial Intelligence Lab
The Computer Science and Artificial Intelligence Laboratory, known as CSAIL, is
the largest research laboratory at MIT and one of the world's most important
centers of information technology research.
CSAIL and its members have played a key role in the computer revolution. The Lab's
researchers have been key movers in developments like time-sharing, massively
parallel computers, public key encryption, the mass commercialization of robots,
and much of the technology underlying the ARPANet, Internet and the World Wide
Web.
CSAIL members (former and current) have launched more than 100 companies,
including 3Com, Lotus Development Corporation, RSA Data Security, Akamai,
iRobot, Meraki, ITA Software, and Vertica. The Lab is home to the World Wide Web
Consortium (W3C), directed by Tim Berners-Lee, inventor of the Web and a CSAIL
member.
CSAIL research is focused on developing the architectures and infrastructures of
tomorrow's information technology, and on creating innovations that will yield
long-term improvements in how people live and work. Lab members conduct
research in almost all aspects of computer science, including artificial intelligence,
the theory of computation, systems, machine learning, computer graphics, as well as
exploring revolutionary new computational methods for advancing healthcare,
manufacturing, energy and human productivity.
http://www.csail.mit.edu
Stanford University
Artificial Intelligence Laboratory
Welcome to the Stanford AI Lab
Founded in 1962, The Stanford Artificial Intelligence Laboratory (SAIL) has been a
center of excellence for Artificial Intelligence research, teaching, theory, and practice
for over fifty years.
Reading group
We have several weekly reading groups where we present and discuss papers on
various topics in machine learning, natural language processing, computer vision,
etc.
Autonomous Highway Driving
A deep learning model outputs the location of lane markings and surrounding cars
given only a single camera image.
http://ai.stanford.edu
http://ai.stanford.edu/courses/
Carnegie Mellon University
Machine Learning Department
The Machine Learning Department is an academic department within Carnegie
Mellon University's School of Computer Science. We focus on research and
education in all areas of statistical machine learning. Watch an interview with Tom
Mitchell, Department Head:
http://videolectures.net/mlas06_mitchell_itm/
http://www.ml.cmu.edu

Noah's ARK Research Group, Carnegie Mellon University
Noah's ARK[1] is Noah Smith's informal research group at the Language
Technologies Institute, School of Computer Science, Carnegie Mellon University.
(The research is formal; the group is informal.) As you may have guessed, our
research focuses on problems of ambiguity and uncertainty in natural language
processing, including morphology, syntax, semantics, translation, and
behavioral/social phenomena observed through language, all viewed through a
computational lens.
http://www.ark.cs.cmu.edu

Intelligent Interactive Systems Group at Harvard University
Intelligent Interactive Systems are fundamentally hard to design because they
require intelligent technology that is well suited for people's abilities, limitations,
and preferences; they also require entirely novel interactions that can give the user
a predictable and reliable experience despite the fact that the underlying technology
is inherently proactive, unpredictable, and occasionally wrong. Thus, design of
successful intelligent interactive systems requires intimate knowledge of and ability
to innovate in two very disparate areas: human-computer interaction and
artificial intelligence or machine learning.
Our projects span the full range from formal user studies to statistical machine
learning. We have worked on developing new intelligent technologies to enable
novel interactions (e.g., SUPPLE system) and on understanding the principles
underlying how people interact with intelligent systems (e.g., the project on
exploring the design space of adaptive user interfaces). Our Brain-Computer
Interface project aims at developing a new set of interactions for efficiently
controlling complex applications, and we are also interested in building and
studying complete applications. One particular area of interest is ability-based
user interfaces -- an approach for adapting interactions to the individual abilities of
people with impairments or of able-bodied people in unusual situations.
http://iis.seas.harvard.edu
http://iis.seas.harvard.edu/resources/

University of California, Berkeley

Statistical Machine Learning
Research Statement
Statistical machine learning merges statistics with the computational sciences---
computer science, systems science and optimization. Much of the agenda in
statistical machine learning is driven by applied problems in science and
technology, where data streams are increasingly large-scale, dynamical and
heterogeneous, and where mathematical and algorithmic creativity are required to
bring statistical methodology to bear. Fields such as bioinformatics, artificial
intelligence, signal processing, communications, networking, information
management, finance, game theory and control theory are all being heavily
influenced by developments in statistical machine learning.
The field of statistical machine learning also poses some of the most challenging
theoretical problems in modern statistics, chief among them being the general
problem of understanding the link between inference and computation.
Research in statistical machine learning at Berkeley builds on Berkeley's world-class
strengths in probability, mathematical statistics, computer science and systems
science. Moreover, by its interdisciplinary nature, statistical machine learning helps
to forge new links among these fields.
An education in statistical machine learning at Berkeley thus involves an immersion
in the traditions of statistical science broadly defined, a thoroughgoing involvement
in exciting applied problems, and an opportunity to help shape the future of
statistics.
http://www.stat.berkeley.edu/~statlearning/
UC Berkeley AMPLab, AMP: ALGORITHMS MACHINES PEOPLE
People will play a key role in data-intensive applications not simply as passive
consumers of results, but as active providers and gatherers of data, and to solve
ML-hard problems that algorithms on their own cannot solve. With crowdsourcing,
people can be viewed as highly valuable but unreliable and unpredictable resources,
in terms of both latency and answer quality. They must be incentivized
appropriately to provide quality answers despite varying expertise, diligence and
even malicious behavior. The AMPLab is addressing these issues in all phases of the
analytics lifecycle.
https://amplab.cs.berkeley.edu
Videos
https://www.youtube.com/user/BerkeleyAMPLab/videos?spfreload=10
Berkeley Institute for Data Science
The Berkeley Institute for Data Science (BIDS) was founded in fall 2013 to build on
existing campus strengths with a multidisciplinary emphasis that aims to facilitate
and enhance the development and application of cutting-edge data science
techniques in the biological, physical, social and engineering sciences. The Institute
aims to build on the many recent innovations in data science techniques so that they
can be applied in effective ways to domain science challenges.
BIDS brings together researchers across disciplines and enhances career paths for
data scientists through a number of newly created Data Science Fellows positions,
graduate student fellowships, boot-camps, special classes, and conferences of
interest to the academic community and general public.
The Institute's initial support is provided by a 5-year $12.5 million grant from the
Moore and Sloan Foundations together with significant support provided by UC
Berkeley. The Moore-Sloan Data Science Environment also supports similar
programs with shared goals and objectives at the University of Washington and New
York University.
http://vcresearch.berkeley.edu/DATASCIENCE/BIDS
Data Science Lecture Series: Maximizing Human Potential Using Machine Learning-Driven Applications
https://www.youtube.com/channel/UCBBd3JxQl455JkWBeulc-9w?spfreload=10

Princeton University
Department of Computer Science - ARTIFICIAL INTELLIGENCE & MACHINE LEARNING
Machine learning and computational perception research at Princeton is focused on
the theoretical foundations of machine learning, the experimental study of machine
learning algorithms, and the interdisciplinary application of machine learning to
other domains, such as biology and information retrieval. Some of the techniques
that we are studying include boosting, probabilistic graphical models,
support-vector machines, and nonparametric Bayesian techniques. We are especially
interested in learning from large and complex data sets. Example applications
include habitat modeling of species distributions, topic models of large collections of
scientific articles, classification of brain images, protein function classification, and
extensions of the Wordnet semantic network.
http://www.cs.princeton.edu/research/areas/mlearn

University of California, Los Angeles (UCLA)
Research Laboratories and Groups
Automated Reasoning Group (Adnan Darwiche)
Biocybernetics Laboratory (Joe DiStefano)
Center for Vision, Cognition, Learning and Art (Song-Chun Zhu)
Cognitive Systems Laboratory (Judea Pearl)
Concurrent Systems Laboratory (Yuval Tamir)
Digital Arithmetic and Reconfigurable Architecture Laboratory (Milos Ercegovac)
ER: Embedded and Reconfigurable System Design (Majid Sarrafzadeh)
Information and Data Management Group (multiple faculty)
Internet Research Laboratory (Lixia Zhang)
Laboratory for Embedded Collaborative Systems (LECS) (archived CENS documents)
Laboratory for Advanced Systems Research (LASR) (Peter Reiher)
MAGIX: Computer Graphics & Vision Laboratory (Demetri Terzopoulos)
Multimedia Information System Technology Group & Laboratory (Alfonso Cardenas)

Network Research Laboratory (Mario Gerla)
Software Systems Group (multiple faculty)
Vision Laboratory (Stefano Soatto)
VLSI Architecture, Synthesis & Technology (VAST) Laboratory (Jason Cong)
Web Information Systems Laboratory (Carlo Zaniolo)
WiNG (Wireless Networking Group) (Songwu Lu)
http://www.cs.ucla.edu/research/research-labs

Cornell University
https://confluence.cornell.edu/display/ml/Home
https://confluence.cornell.edu/display/ML/Courses

University of Illinois at Urbana-Champaign
Machine Learning Research
The Department of Computer Science at the University of Illinois at
Urbana-Champaign has several faculty members working in the area of machine learning,
learning theory, explanation based learning, learning in natural language processing
and data mining. In addition, many faculty members inside and outside the
department whose primary research interests are in other areas have specific
research projects involving machine learning in some way.
http://ml.cs.illinois.edu

California Institute of Technology, Caltech
Department of Computing + Mathematical Sciences
The Computing + Mathematical Sciences department pursues numerous research
interests covering a wide array of application areas. We take full advantage of
Caltech's unique interdisciplinary character by drawing on research expertise not
only from our own department, but from throughout the Institute. Research efforts
within the department evolve at a fast pace, and currently cover six discernible
focus areas:
Discrete Differential Modeling
DNA Computing and Molecular Programming
Perceptual and Machine Learning for Autonomous Systems
Rigorous Systems Research
Scientific Computing and Applied Analysis
Theory of Computation
http://www.cms.caltech.edu/research/foci


University of Washington
Machine Learning
UW is one of the world's top centers of research in machine learning. We are active
in most major areas of ML and in a variety of applications like natural language
processing, vision, computational biology, the Web, and social networks. Check out
the links on the left to find out who's who and what's happening in ML at UW.
And be sure to see our CSE-wide efforts in Big Data
https://www.cs.washington.edu/research/ml/
"Big Data" Research and Education
UW CSE is driving the "Big Data" revolution. Our traditional strength in data
management (Magda Balazinska, Bill Howe, Dan Suciu), machine learning (Pedro
Domingos), and open information extraction (Oren Etzioni, Dan Weld) has recently
been augmented by key hires in machine learning (Emily Fox, Carlos Guestrin, Ben
Taskar) and data visualization (Jeff Heer).
Our efforts are coordinated with those of outstanding researchers in the University
of Washington's top-ten programs in Statistics, Biostatistics, and Applied
Mathematics, among others. Through the University of Washington eScience
Institute (directed by Ed Lazowska) we are integrally involved in ensuring that
researchers across the campus have access to cutting-edge approaches to data-
driven discovery.
http://www.cs.washington.edu/research/bigdata
Social Robotics Lab - Yale University
The members of our lab perform research over a diverse collection of topics.
Though these projects approach social and developmental research from varied
perspectives, they all share common themes. Robots provide an embodied,
empirical testbed that allows for repeated validation. Robots also enable the use of
social interactions as part of the modeled experimental environment, staying
grounded in real-world perceptions, and appropriately integrating perceptual,
motor, and cognitive skills.
http://scazlab.yale.edu/publications/all-publications

Georgia Institute of Technology
ML@GT
http://ml.cc.gatech.edu

University of Texas at Austin
Machine Learning Research Group
Machine learning is the study of adaptive computational systems that improve
their performance with experience.
The Machine Learning Research Group at UT Austin is led by Professor Raymond
Mooney, and our research has explored a wide variety of issues in machine learning
for over two decades. Our current research focuses primarily on natural language
learning, statistical relational learning, transfer learning, and active learning.
https://www.cs.utexas.edu/~ml/

University of Pennsylvania
Penn Research in Machine Learning
Current projects:
Structured Prediction
Bandit and Limited-Feedback Problems
Computation and Statistics
Online Learning, Sequential Prediction, Regret Minimization
Statistical Learning Theory
http://priml.upenn.edu/Main/Research
Columbia University
Machine Learning @ Columbia
The Columbia Machine Learning Lab pursues
research in machine learning with applications
in vision, graphs and spatio-temporal data.
Funding provided by NSF.
http://www.cs.columbia.edu/learning/
New York University
CILVR Lab and Center for Data Science
The CILVR Lab (Computational Intelligence, Learning, Vision, and Robotics)
regroups three faculty members, research scientists, postdocs, and students
working on AI, machine learning, and a wide variety of applications, notably
computer perception, robotics, and health care.
http://cilvr.nyu.edu/doku.php
http://cds.nyu.edu

University of Chicago
http://ml.cs.uchicago.edu

The Johns Hopkins Center for Language and Speech Processing (CLSP) Archive Videos
The Johns Hopkins Center for Language and Speech Processing (CLSP) is an
interdisciplinary research and educational center focused on the science and
technology of language and speech. Within its field, CLSP is recognized as one of the
largest and most influential academic research centers in the world. The center
conducts research across a broad spectrum of fundamental and applied topics
including acoustic processing, automatic speech recognition, big data, cognitive
modeling, computational linguistics, information extraction, machine learning,
machine translation, and text analysis.
http://clsp.jhu.edu/seminars/archive/video/
Miscellaneous
IARPA Organization
The Intelligence Advanced Research Projects Activity (IARPA) invests in high-
risk/high-payoff research programs that have the potential to provide our nation
with an overwhelming intelligence advantage over future adversaries.
http://www.iarpa.gov

MACHINE LEARNING RESEARCH GROUPS in AMERICA, CANADA
University of Toronto
Machine Learning Lab
Machine Learning @ UofT:
The Department of Computer Science at the University of Toronto has several
faculty members working in the area of machine learning, neural networks,
statistical pattern recognition, probabilistic planning, and adaptive systems. In
addition, many faculty members inside and outside the department whose primary
research interests are in other areas have specific research projects involving
machine learning in some way.
http://learning.cs.toronto.edu/index.shtml
http://learning.cs.toronto.edu/index.shtml?section=research
The Fields Institute for Research in Mathematical Sciences, Canada
The Fields Institute is a center for mathematical research activity - a place where
mathematicians from Canada and abroad, from business, industry and financial
institutions, can come together to carry out research and formulate problems of
mutual interest. Our mission is to provide a supportive and stimulating environment
for mathematics innovation and education. The Fields Institute promotes
mathematical activity in Canada and helps to expand the application of mathematics
in modern society.
http://www.fields.utoronto.ca
University of Waterloo
Artificial Intelligence Research Group
The Artificial Intelligence Group conducts research in many areas of artificial
intelligence. The group has active interests in: models of intelligent interaction,
multi-agent systems, natural language understanding, constraint programming,
computational vision, robotics, machine learning, and reasoning under uncertainty.
http://ai.uwaterloo.ca
Course material
http://ai.uwaterloo.ca/coursegr.html
University of British Columbia
Artificial Intelligence Research Groups
Research Groups
Computer Vision and Robotics: This is one of the most influential vision and
robotics groups in the world. It is this group that created RoboCup and the
celebrated SIFT features. The students in this group have won most of the AAAI
Semantic Robot Challenges. The group has four active faculty: David Lowe, Jim Little,
Alan Mackworth and Bob Woodham.
Empirical Algorithmics: Led by Holger Hoos and Kevin Leyton-Brown, this
research group studies the empirical behaviour of algorithms and develops
automated methods for improving algorithmic performance. Work by the empirical
algorithmics group at UBC/CS has led to substantial improvements in the state of
the art in solving a wide range of prominent problems, including SAT, AI Planning
and Mixed Integer Programming, and has won numerous awards.
Game Theory and Decision Theory: With Kevin Leyton-Brown in the lead, this group
has made significant contributions to algorithmic game theory, multiagent systems
and mechanism design. David Poole also contributes to this group with his work on
decision processes and planning. The research problems attacked by this group are
therefore of great importance to e-commerce, auctions and advertising.
Intelligent User Interfaces: With Cristina Conati and Giuseppe Carenini this group's
goal is to investigate principles and techniques for preference modeling and
elicitation, interactive decision making, user-adaptive information visualization
and visual interfaces for text analysis.
Knowledge Representation and Reasoning: David Poole leads this group with his
foundational work on probabilistic first order logic and semantic science. This work
on logical and probabilistic reasoning has been of profound and broad impact in the
field of artificial intelligence (AI). Holger Hoos is also an important member of this
group with his work on satisfiability (SAT) and planning, which has won numerous
awards and competitions.
Machine Learning: With the guidance of Nando de Freitas and Kevin Murphy, this
group's vision is to advance the frontier of knowledge in Bayesian inference, Monte
Carlo algorithms, probabilistic graphical models, neural computation,
personalization, mining web-scale datasets, prediction and optimal decision
making.
Natural Language Processing: Under the leadership of Giuseppe Carenini and
Raymond Ng (Data Management and Mining Lab), this group's vision is to further
our understanding of abstractive summarization, mining conversations and
evaluative text, and natural language generation.
https://www.cs.ubc.ca/cs-research/lci/research-groups/machine-learning

University of Montreal
Machine Learning Lab
The LISA (machine learning lab) aims towards improving our understanding of the
principles that give rise to powerful learning and to intelligence, which will be
important to make significant progress on learning algorithms and artificial
intelligence (AI). Acquiring the kind of complex knowledge necessary for AI requires
some form of learning, with the ability to discover hidden relationships and
statistical structure that may be highly complex, with many interacting factors of
variation explaining the observed high-dimensional data that sensors can provide.
In our view, this is the main challenge for machine learning and AI.
Like the brain, deep learning algorithms are based on several levels of
representation and processing, creating several levels of abstraction.
Compared to learning algorithms based on shallower architectures, deep learners
have the potential to efficiently represent highly complex functions and
distributions. We explore various learning algorithms for deep learning, based in
particular on unsupervised pre-training (e.g., various kinds of Boltzmann machines
and auto-encoders).
Unsupervised pre-training makes it possible to exploit very large quantities of mostly
unlabeled examples (such as documents, images, and videos from the web). The
learned representations capture the salient factors of variation (and invariances)
implicitly present in the data, and can be exploited in the context of several
supervised learning tasks (multi-task learning, self-taught learning, semi-supervised
learning).
http://lisa.iro.umontreal.ca/index_en.html
University of Sherbrooke
Artificial Intelligence
Three teams work in this research area; other projects are conducted by
researchers acting individually.
The ASTUS research team (Apprentissage par Système Tutoriel de l'Université de
Sherbrooke), which works on intelligent tutoring systems, focuses on the
following themes: knowledge representation, user modeling, human-machine
interaction, educational psychology and cognitive science.
The Prospectus research team (Prospection de données à l'Université de
Sherbrooke), which works on data mining, focuses on the following themes: data
mining, knowledge mining and modeling, pattern recognition, segmentation and
classification, non-symbolic artificial intelligence methods, neural networks
and Bayesian networks, and detection of latent structures and behaviors.
The PLANIART research team, which works on planning in artificial intelligence,
focuses on the following themes: trajectory planning, behavior planning and plan
recognition in video games and mobile robotics. Planning makes it possible to
decide what to do (goal decomposition), how to do it (resource allocation) and
when to do it (scheduling).
http://www.usherbrooke.ca/informatique/recherche/domaines-de-recherche/intelligence-artificielle/
Centre de recherche sur les environnements intelligents (Research Centre on Intelligent Environments)
The Centre de Recherche sur les Environnements Intelligents (CREI) has 13
regular members, 11 associate members and more than sixty graduate students.
CREI brings together 7 laboratories whose research interests cover digital
imaging, artificial intelligence, modeling-validation and ambient intelligence.
CREI researchers have collaborated for years, developing applications related
to intelligent environments.
http://www.usherbrooke.ca/crei/

University of Laval
Machine Learning Research Group
Selected Papers
2014
Luc Bégin, Pascal Germain, François Laviolette and Jean-Francis Roy. PAC-Bayesian
Theory for Transductive Learning. International Conference on Artificial
Intelligence and Statistics (AISTATS), 2014.
2013
Sébastien Giguère, François Laviolette, Mario Marchand, Denise Tremblay, Sylvain
Moineau, Éric Biron and Jacques Corbeil. Improved design and screening of high
bioactivity peptides for drug discovery. Under Review.
Sébastien Giguère, Alexandre Drouin, Alexandre Lacoste, Mario Marchand, Jacques
Corbeil, François Laviolette. MHC-NP: Predicting Peptides Naturally Processed by
the MHC. Journal of Immunological Methods, 2013, vol. 400, p. 30-36.
Pascal Germain, Amaury Habrard, François Laviolette, Emilie Morvant. A PAC-Bayesian
Approach for Domain Adaptation with Specialization to Linear Classifiers.
In ICML 2013.
Sébastien Giguère, François Laviolette, Mario Marchand, Khadidja Sylla. Risk Bounds
and Learning Algorithms for the Regression Approach to Structured Output
Prediction. In ICML 2013.
Maxime Latulippe, Alexandre Drouin, Philippe Giguère, and François Laviolette.
Accelerated Robust Point Cloud Registration in Natural Environments through
Positive and Unlabeled Learning. In Proceedings of the International Joint Conference
on Artificial Intelligence (IJCAI 2013), 2013.
Sébastien Giguère, Mario Marchand, François Laviolette, Jacques Corbeil, and
Alexandre Drouin. Learning a Peptide-Protein Binding Affinity Predictor with Kernel
Ridge Regression. BMC Bioinformatics, 2013, vol. 14, no. 1, p. 82.
http://graal.ift.ulaval.ca

More to come
MACHINE LEARNING RESEARCH GROUPS in AMERICA, BRAZIL

USP - UNIVERSIDADE DE SÃO PAULO, Instituto de Ciências Matemáticas e de Computação
http://www.icmc.usp.br/Portal/

UFRJ - Federal University of Rio de Janeiro
UFMG - Federal University of Minas Gerais
UFRGS - Federal University of Rio Grande do Sul
Unicamp - University of Campinas
Unesp - São Paulo State University
UFSC - Federal University of Santa Catarina
UnB - University of Brasília
UFPR - Federal University of Paraná
UFPE - Federal University of Pernambuco
UNIFESP - Federal University of São Paulo
UFSCar - Federal University of São Carlos
UERJ - Rio de Janeiro State University
UFSM - Federal University of Santa Maria
PUC-Rio - Pontifical Catholic University of Rio de Janeiro
UFC - Federal University of Ceará
UFBA - Federal University of Bahia
UFF - Fluminense Federal University
PUCRS - Pontifical Catholic University of Rio Grande do Sul
UFV - Federal University of Viçosa
More coming soon

MACHINE LEARNING RESEARCH GROUPS in EUROPE, UK

University College London
The Centre for Computational Statistics and Machine Learning (CSML) spans three
departments at University College London, Computer Science, Statistical Science,
and the Gatsby Computational Neuroscience Unit.
The Centre will pioneer an emerging field that brings together statistics, the recent
extensive advances in theoretically well-founded machine learning, and links with a
broad range of application areas drawn from across the college, including
neuroscience, astrophysics, biological sciences, complexity science, etc. There is a
deliberate intention to maintain and cultivate a plurality of approaches within the
centre including Bayesian, frequentist, on-line, statistical, etc.
http://www.csml.ucl.ac.uk

CASA (Centre for Advanced Spatial Analysis) Working Papers, University College
London
http://www.bartlett.ucl.ac.uk/casa/latest/publications/working-papers
Example #198
A global inter-country economic model based on linked input-output models
We present a new, flexible and extensible alternative to multi-regional input-output
(MRIO) for modelling the global economy. The limited coefficient set of MRIO
(technical coefficients only) is extended to include two new sets of coefficients,
import ratios and import propensities. These new coefficient sets assist in the
interaction of the new model with other social science models such as those of trade,
migration, international security and development aid. The model uses input-output
models as descriptions of the internal workings of countries' economies, and
couples these more loosely than in MRIO using trade data for commodities and
services from the UN. The model is constructed using a minimal number of
assumptions, seeks to be as parsimonious as possible in terms of the number of
coefficients, and is based to a great extent on empirical observation. Two new
metrics are introduced, measuring sectors' economic significance and economic
self-reliance per country. The Chinese vehicles sector is shown to be the world's
most significant, and self-reliance is shown to be strongly correlated with
population. The new model is shown to be equivalent to an MRIO under an
additional assumption, allowing existing analysis techniques to be applied.
http://www.bartlett.ucl.ac.uk/casa/publications/working-paper-198

Oxford University
The Machine Learning Research Group is a sub-group within Information
Engineering (Robotics Research Group) in the Department of Engineering Science of
the University of Oxford.
We are interested in probabilistic reasoning applied to problems in science,
engineering and computing. We use the tools of statistical, and in particular
Bayesian, inference to deal rationally with uncertainty and information in a number
of domains including astronomy, biology, finance, image & signal processing and
multi-agent systems, as well as researching the theory of Bayesian modelling and
inference.
http://www.robots.ox.ac.uk/~parg/doku.php?id=home
Imperial College
The Data Science Institute at Imperial College is being established to conduct
research on the foundations of data science by developing advanced theory,
technology and systems that will contribute to the state-of-the-art in data science
and big data, and support data-driven research at Imperial and beyond. The
Institute will empower Imperial and its partners to collaborate in the pursuit of
world class data-driven innovation.
http://www3.imperial.ac.uk/data-science

The University of Edinburgh, Institute for Adaptive and Neural Computation
http://www.anc.ed.ac.uk/machine-learning/

Cambridge University
We are a part of the Computational and Biological Learning Laboratory located in
the Department of Engineering at the University of Cambridge. The research in our
group is very broad, and we are interested in all aspects of machine
learning. Particular strengths of the group are in Bayesian approaches to modelling
and inference in statistical applications. The type of work we do can range from
studying fundamental concepts in applied Bayesian statistics, all the way to getting
our algorithms to perform competitively against the state-of-the-art in big-data
applications. We also work in a broad range of application domains, including
neuroscience, bioinformatics, finance, social networks, and physics, just to name a
few, and we actively seek to collaborate with other groups within the Department of
Engineering, throughout the university as a whole, and with other groups within the
UK and around the world. If you are interested in finding out more about our
research, please visit our Publications page, or visit the individual research pages of
our group members.
http://mlg.eng.cam.ac.uk

Centre for Intelligent Sensing, Queen Mary University of London, UK
I am delighted to introduce you to the Centre for Intelligent Sensing (CIS).
CIS is a focal point for research in Intelligent Sensing at Queen Mary University of
London. The Centre focuses on breakthrough innovations in computational
intelligence that will have a major impact in transforming the way humans and
machines utilise a variety of sensor inputs for interpretation and decision making.
The Centre gathers 33 academics with expertise in all aspects of intelligent sensing
from the design and building of the physical sensors to the mathematical and
computational challenges of extracting key information from real-time streams of
high-dimensional data acquired by networks of sensors. The legal, ethical and social
implications of these processes are also addressed.
CIS researchers have an outstanding international reputation in camera and sensor
networks, image and signal processing, computer vision, data mining, pattern
recognition, machine learning, bio-inspired computing, human-computer
interaction, affective computing and social signal processing.
The Centre also provides post-graduate research and teaching in Intelligent Sensing,
and is responsible for the MSc programme in Computer Vision.
I do hope that you will enjoy reading this brochure and learning more about who we
are and how the research we do helps to address important societal challenges. I
also invite you to keep up to date with our activities by following us on Twitter
@intelsensing and to enjoy our research videos at http://cis.eecs.qmul.ac.uk.
Professor Andrea Cavallaro, Director
http://cis.eecs.qmul.ac.uk
Videos
https://www.youtube.com/user/intelsensing/feed?spfreload=10

ICRI, The Intel Collaborative Research Institute
The Intel Collaborative Research Institute is concerned with how to enhance the
social, economic and environmental well being of cities by advancing compute,
communication and social constructs to deliver innovations in system architecture,
algorithms and societal participation.
http://www.cities.io

MACHINE LEARNING RESEARCH GROUPS in EUROPE, FRANCE

Magnet, MAchine learninG in information NETworks, INRIA, France
The Magnet project aims to design new machine learning based methods geared
towards mining information networks. Information networks are large collections
of interconnected data and documents like citation networks and blog networks
among others. For this, we will define new structured prediction methods for
(networks of) texts based on machine learning algorithms in graphs. Such
algorithms include node classification, link prediction, clustering and probabilistic
modeling of graphs. Envisioned applications include browsing, monitoring and
recommender systems, and more broadly information extraction in information
networks. Application domains cover social networks for cultural data and e-
commerce, and biomedical informatics.
https://team.inria.fr/magnet/
Sierra Team - École Normale Supérieure, CNRS, INRIA
SIERRA is based in the Laboratoire d'Informatique de l'École Normale Supérieure
(CNRS/ENS/INRIA UMR 8548) and is a joint research team between INRIA
Rocquencourt, École Normale Supérieure de Paris and Centre National de la
Recherche Scientifique.
We follow four main research directions:
Supervised learning: This part of our research focuses on methods where, given a
set of examples of input/output pairs, the goal is to predict the output for a new
input, with research on kernel methods, calibration methods, structured prediction,
and multi-task learning.
Unsupervised learning: We focus here on methods where no output is given and
the goal is to find structure of certain known types (e.g., discrete or low-
dimensional) in the data, with a focus on matrix factorization, statistical tests,
dimension reduction, and semi-supervised learning.
Parsimony: The concept of parsimony is central to many areas of science. In the
context of statistical machine learning, this takes the form of variable or feature
selection. The team focuses primarily on structured sparsity, with theoretical and
algorithmic contributions.
Optimization: Optimization in all its forms is central to machine learning, as many
of its theoretical frameworks are based at least in part on empirical risk
minimization. The team focuses primarily on convex and bandit optimization.
http://www.di.ens.fr/sierra/
ENS École Normale Supérieure
The Computer Science Department of ENS (DI ENS) is both a teaching department
and a research laboratory affiliated with CNRS and INRIA (UMR 8548).
On the teaching side, the DI ENS trains students through its Pre-doctoral program
and the Masters program (MPRI).
On the research side, work is organized into research groups. The DI ENS is
a member of the Fondation Sciences Mathématiques de Paris.
The Computer Services (SPI) and the Mathematics and Computer Science Library
are common to the DI ENS and the Department of Mathematics and Applications
(DMA).
Teams of the Computer Science Department at École normale supérieure
Antique: Static analysis by abstract interpretation (head: Xavier Rival)
Cascade: Cryptography (head: David Pointcheval)
Data: Signal Processing and Classification (head: Stéphane Mallat)
Dyogene: Dynamics of Geometric Networks (head: Marc Lelarge)
Parkas: Parallelism of Synchronous Kahn Networks (head: Marc Pouzet)
Sierra: Machine Learning (head: Francis Bach)
Talgo: Theory, Algorithms, topoLogy, Graphs, and Optimization (head: Claire
Mathieu)
Willow: Artificial Vision (head: Jean Ponce)
http://www.di.ens.fr

WILLOW Publications and PhD Thesis
Our research is concerned with representational issues in visual object recognition
and scene understanding. Our objective is to develop geometric, physical, and
statistical models for all components of the image interpretation process, including
illumination, materials, objects, scenes, and human activities. These models will be
used to tackle fundamental scientific challenges such as three-dimensional (3D)
object and scene modeling, analysis, and retrieval; human activity capture and
classification; and category-level object and scene recognition. They will also
support applications with high scientific, societal, and/or economic impact,
such as quantitative image analysis in domains such as archaeology and
cultural heritage conservation; film post-production and special effects; and video
annotation, interpretation, and retrieval. Moreover, machine learning now
represents a significant part of computer vision research, and one of the aims of the
project is to foster the joint development of contributions to machine learning and
computer vision, together with algorithmic and theoretical work on generic
statistical machine learning.
http://www.di.ens.fr/willow/publications/YearOnly/publications.html

MACHINE LEARNING RESEARCH GROUPS in EUROPE, GERMANY
Max Planck Institute for Intelligent Systems, Tübingen site
Intelligent systems can optimise their structure and properties in order to
successfully function within a complex, partially changing environment. Three
sub-areas (perception, learning and action) can be differentiated here. The scientists at
the Max Planck Institute for Intelligent Systems are carrying out basic research and
development of intelligent systems in all three sub-areas. Research expertise in the
areas of computer science, material science and biology is brought together in one
Institute, at two different sites. Machine learning, image recognition, robotics and
biological systems will be investigated in Tübingen, while so-called learning
material systems, micro- and nanorobotics, as well as self-organisation will be
explored in Stuttgart. Although the focus is on basic research, the Institute has a
high potential for practical applications in, among other areas, robotics, medical
technology, and innovative technologies based on new materials.
http://www.mpg.de/1342929/intelligenteSystemeTuebingen

BRML Research Lab, Institute of Informatics at the Technische Universität München
Patrick van der Smagt's BRML is a collaborative research lab of fortiss (an Institute
at TUM); the Chair for Robotics and Embedded Systems, Institute of Informatics at the
Technische Universität München; and the DLR Institute of Robotics and
Mechatronics. The heart of our inforfacious research is formed by machine learning.
Within that, we focus on biomechanics and body-machine interfaces. We apply our
methods to advanced rehabilitation and assistive robotics.
http://brml.org

more to come

MACHINE LEARNING RESEARCH GROUPS in EUROPE, SWITZERLAND
EPFL École Polytechnique Fédérale de Lausanne, Switzerland
Artificial Intelligence & Machine Learning
The modern world is full of artificial, abstract environments that challenge
our natural intelligence. The goal of our research is to develop Artificial
Intelligence that gives people the capability to master these challenges,
ranging from formal methods for automated reasoning to interaction
techniques that stimulate truthful elicitation of preferences and opinions.
Another aspect is characterizing human intelligence and cognitive science,
with applications in human-computer interaction and computer animation.
Machine Learning aims to automate the statistical analysis of large
complex datasets by adaptive computing. A core strategy to meet growing
demands of science and applications, it provides a data-driven basis for
automated decision making and probabilistic reasoning. Machine learning
applications at EPFL range from natural language and image processing to
scientific imaging as well as computational neuroscience.
http://ic.epfl.ch/intelligence-artificielle-et-apprentissage-automatique

IDSIA: the Swiss AI Lab
The Swiss AI Lab IDSIA (Istituto Dalle Molle di Studi sull'Intelligenza Artificiale) is a
non-profit oriented research institute for artificial intelligence, affiliated with both
the Faculty of Informatics of the Università della Svizzera italiana and the
Department of Innovative Technologies of SUPSI, the University of Applied Sciences
of Southern Switzerland. We focus on machine learning (deep neural networks,
reinforcement learning), operations research, data mining, and robotics.
IDSIA researchers win nine international competitions
Our neural networks research team has won nine international competitions in
machine learning and pattern recognition. Follow the link to learn more about the
methods that allowed us to achieve these results.
http://www.idsia.ch

MACHINE LEARNING RESEARCH GROUPS in EUROPE, NETHERLANDS
Machine Learning Research Groups in The Netherlands
A large number of researchers and research groups are active in the broad area of
machine learning, ranging from Bayesian inference, to robotics and neural
networks. A brief overview is collected here; the researchers can be contacted
for more information.
http://www.mlplatform.nl/researchgroups/
MACHINE LEARNING RESEARCH GROUPS in EUROPE, POLAND
University of Warsaw, Dept. of Mathematics, Informatics and Mechanics
Algorithms group
Our research
The research of our group focuses on several branches of modern
algorithmics and the underlying fields of discrete mathematics. The latter
include combinatorics on words and on ordered sets, graph theory, formal
languages, computational geometry, information theory, foundation of
cryptography. The research on algorithms covers parallel and distributed
algorithms, large scale algorithms, approximation and randomized
algorithms, fixed-parameter and exponential-time algorithms, dynamic
algorithms, radio algorithms, multi-party computations, and cryptographic
protocols.
http://zaa.mimuw.edu.pl

more to come

MACHINE LEARNING RESEARCH GROUPS in ASIA, INDIA
Indian Institute of Science
Machine Learning and Learning Theory Group
Our research group focuses on the design and analysis of machine learning
algorithms, and on understanding the mathematical and statistical properties of
solutions to machine learning problems.
Members of the group have strong backgrounds in several areas including
probability, linear algebra, convex analysis, optimization, spectral graph theory, and
others, enabling us to explore problems from a variety of different viewpoints. Our
emphasis is on developing a strong fundamental understanding of various problems
of current interest in machine learning and statistical learning theory.
Some of our current research directions include designing and analyzing algorithms
for problems such as ranking and various types of structured prediction tasks,
understanding statistical consistency properties for such problems, exploring new
issues in machine learning such as those related to privacy, and selected
applications of machine learning in computational biology and medicine.
http://drona.csa.iisc.ernet.in/~mllt/

Indian Institute of Technology of Kanpur
https://www.google.com/search?q=machine%20learning&domains=iitk.ac.in&sitesearch=www.iitk.ac.in&gws_rd=ssl

More to come

MACHINE LEARNING RESEARCH GROUPS in ASIA, CHINA

Peking University
School of Electronics Engineering and Computer Science
We have built strong cooperation with many famous academic organizations, e.g.,
University of California at Berkeley, University of California at Los Angeles, Stanford
University, University of Illinois at Urbana-Champaign, Oxford University, University
of Edinburgh, Paris High Division, University of Tokyo, Waseda University.
These collaborations cover most of our research directions: from electronic
communication, optical communication, to quantum communication; from
computer hardware, software, to network; from micro-electromechanical system to
nano techniques; from machine perception to machine intelligence.
Center for Information Science
Main Research Areas
Machine Vision: Image processing, image and video compression, pattern
recognition and machine learning, biometrics, 3-D visual information processing.
Machine Audition: Computational auditory models, speech signal processing,
spoken language processing, natural language processing, intelligent
human-machine interaction.
Intelligent Information Systems: Computational intelligence, multimedia
resource organization and management, data mining and content-oriented massive
information integration, analysis, processing and service.
Physiology and Psychology for Machine Perception: Electro-physiology,
psychophysics and neurophysiology of vision and audition, theories and methods of
hearing rehabilitation.
http://www.cis.pku.edu.cn/
http://eecs.pku.edu.cn/eecs_english/CnterInfoScience.shtml
Institute of Computational Linguistics
Main Research Areas
Comprehensive Language Knowledge Databases, including large scale word-
level information database for the Chinese language.
Corpus based NLP, including large scale corpus processing and statistical
models and theories.
Domain Knowledge Construction, including computational terminology and
term database construction.
Multilingual Semantic Lexicons, focusing on the study of a Chinese concept
dictionary.
Computer-aided Translation, focusing on translation methods for technical
documents.
Information Retrieval, Extraction and Summarization, including various levels
of document processing such as document retrieval, topic extraction,
summarization, and question answering.
http://eecs.pku.edu.cn/index.aspx?menuid=5&type=articleinfo&lanmuid=84&infoid=232&language=cn
http://eecs.pku.edu.cn/eecs_english/InstComputationalLinguistics.shtml
PKU Real course online
http://www.grids.cn/

Beijing University of Technology
Beijing Key Lab of Multimedia and Intelligent Software Technology
Artificial Intelligence and Knowledge Engineering
The research fields in this direction include fundamental research of Knowledge
Science and Knowledge Engineering, research and application of Data Mining and
Machine Learning, and Knowledge-Based Computer Aided Animation Generation. In
those fields, the laboratory has carried out 8 programs from the National Natural
Science Foundation (including 1 subprogram of a major research program of the National
Natural Science Foundation), 1 program from the Key Programs in the National Science
& Technology Pillar Program, 5 programs from the 863 High-Tech Program, and 3
programs from the Beijing Natural Science Foundation, and has twice won the second
prize of the Advanced Science & Technology Award of Beijing.
http://bjut.edu.cn/bjut_en/detail.jsp?articleID=4171

University of Science and Technology of China, USTC
http://en.wikipedia.org/wiki/University_of_Science_and_Technology_of_China
Nanjing University
LAMDA Group
LAMDA is affiliated with the National Key Laboratory for Novel Software
Technology and the Department of Computer Science & Technology, Nanjing
University, China. It is located in the Computer Science and Technology Building on the
Xianlin campus of Nanjing University, mainly in Rm910. The Founding Director of
LAMDA is Prof. Zhi-Hua Zhou.
"LAMDA" means "Learning And Mining from DatA". The main research interests of
LAMDA include machine learning, data mining, pattern recognition, information
retrieval, evolutionary computation, neural computation, and some other related
areas. Currently our research mainly involves: ensemble learning, semi-supervised
and active learning, multi-instance and multi-label learning, cost-sensitive and class-
imbalance learning, metric learning, dimensionality reduction and feature selection,
structure learning and clustering, theoretical foundations of evolutionary
computation, improving comprehensibility, content-based image retrieval, web
search and mining, face recognition, computer-aided medical diagnosis,
bioinformatics, etc.
http://lamda.nju.edu.cn/MainPage.ashx

More to come

MACHINE LEARNING RESEARCH GROUPS in ASIA, RUSSIA

Moscow State University
http://www.msu.ru/

More to come

MACHINE LEARNING RESEARCH GROUPS in AFRICA

More to come


MACHINE LEARNING RESEARCH GROUPS in OCEANIA

NICTA Machine Learning Research Group, Australia
We want to change the world.
Machine learning is a powerful technology that can help solve almost any
problem. We think about it differently to much of the machine learning research
community.
We focus on important and challenging problems such as
Navigating the world's patent literature
Finding sites for geothermal energy production
Predicting the output of rooftop solar photovoltaic systems
Building actionable data analytics for the enterprise
Managing the traffic in large cities
Predicting failures of widespread infrastructure
We develop new technologies to solve these problems and make them freely
available or commercially deploy them.
We regularly host visitors and regularly have job openings and opportunities for
PhD students. If you also want to change the world, come and join us.
http://www.nicta.com.au/research/machine_learning
http://nicta.com.au/research/machine_learning/research_topics

More to come


Academics (with free access to their publications), US

Stanford University, US
Andrew Ng
Andrew Ng is a Co-founder of Coursera and the Director of the Stanford AI Lab. In
2011 he led the development of Stanford University's main MOOC (Massive Open
Online Courses) platform and also taught an online Machine Learning class that was
offered to over 100,000 students, leading to the founding of Coursera.
Ng's goal is to give everyone in the world access to a high quality education, for free.
Today, Coursera partners with some of the top universities in the world to offer high
quality free online courses. It is the largest MOOC platform in the world.
Outside online education, Ng's work at Stanford is on machine learning with an
emphasis on deep learning. He also founded and led a project at Google to develop
massive-scale deep learning algorithms. It resulted in the famous "cat detector",
popularly known as the "Google cat", in which a massive neural network with 1
billion parameters learned from unlabeled YouTube videos.
http://cs.stanford.edu/people/ang/?page_id=414

Princeton University, US
Robert Schapire
Robert Elias Schapire is the David M. Siegel '83 Professor in the computer science
department at Princeton University. His primary specialty is theoretical and applied
machine learning.
His work led to the development of the boosting meta-algorithm used in machine
learning. Together with Yoav Freund, he invented the AdaBoost algorithm in 1996.
He received the Gödel Prize in 2003 for his work on AdaBoost with Yoav Freund.
In 2014, Schapire was elected to the National Academy of Engineering for his
contributions to machine learning through the invention and development of
boosting algorithms. (Source: Wikipedia)
http://www.cs.princeton.edu/~schapire/
http://mitpress.mit.edu/sites/default/files/titles/content/9780262017183_sch_0001.pdf
Mona Singh
My group develops algorithms for a diverse set of problems in computational
molecular biology. We are particularly interested in predicting specificity in protein
interactions and uncovering how molecular interactions and functions vary across
context, organisms and individuals. We leverage high-throughput biological datasets
in order to develop data-driven algorithms for predicting protein interactions and
specificity; for analyzing biological networks in order to uncover cellular
organization, functioning, and pathways; for uncovering protein functions via
sequences and structures; and for analyzing proteomics and sequencing data. An
appreciation of protein structure guides much of our research.
http://www.cs.princeton.edu/~mona/
Olga Troyanskaya
The goal of my research is to bring the capabilities of computer science and
statistics to the study of gene function and regulation in the biological networks
through integrated analysis of biological data from diverse data sources--both
existing and yet to come (e.g. from diverse gene expression data sets and proteomic
studies). I am designing systematic and accurate computational and statistical
algorithms for biological signal detection in high-throughput data sets. More
specifically, I am interested in developing methods for better gene expression data
processing and algorithms for integrated analysis of biological data from multiple
genomic data sets and different types of data sources (e.g. genomic sequences, gene
expression, and proteomics data).
http://reducio.princeton.edu/cm/node/13
UCLA, US
Judea Pearl, Cognitive System Laboratory
Judea Pearl (born 1936) is an Israeli-born American computer scientist and
philosopher, best known for championing the probabilistic approach to artificial
intelligence and the development of Bayesian networks (see the article on belief
propagation). He is also credited for developing a theory of causal and
counterfactual inference based on structural models (see article on causality). He is
the 2011 winner of the ACM Turing Award, the highest distinction in computer
science, "for fundamental contributions to artificial intelligence through the
development of a calculus for probabilistic and causal reasoning". (source
Wikipedia)
http://bayes.cs.ucla.edu/csl_papers.html

Rice University, US
Justin Esarey Lectures, Assistant Professor of Political Science
Dr. Justin Esarey is an Assistant Professor of Political Science at Rice University who
specializes in political methodology. His areas of expertise include detecting and
presenting context-specific relationships, model specification and sensitivity, the
analysis of binary data, laboratory social experimentation, and promoting
thoughtful inference (and thinking about inference) by using technology to make
methodological resources available to the scholarly public. His recent substantive
projects study the relationship between corruption and female participation in
government, the effect of "naming and shaming" on human rights abuse, and the
behavioral implications of political ideology.
https://www.youtube.com/user/jeesarey/videos?spfreload=10
Justin Esarey Publications & Software, Assistant Professor of Political Science, Rice
University
http://jee3.web.rice.edu/research.htm

University of Maryland, US
Hal Daumé III
I am Hal Daumé III, an Associate Professor in Computer Science (also UMIACS and
Linguistics) at the University of Maryland; I was previously in the School of
Computing at the University of Utah (CV). Although I'd like to be known for my
research in language (computational linguistics and natural language processing)
and machine learning (structured prediction, domain adaptation and Bayesian
methods), I am probably best known for my NLPers blog. I associate myself most
with conferences like ACL, ICML, EMNLP and NIPS. At UMD, I'm affiliated with the
Computational Linguistics lab, the machine learning reading group, the language
science program and the AI group, and interact closely with LINQS and computer
vision.
http://hal3.name

Academics (with free access to their publications), FRANCE

Ecole Normale Superieure, FRANCE

Francis Bach

Academics (with free access to their publications), UK

University College London, UK
John Shawe-Taylor
John S Shawe-Taylor is a professor at University College London (UK) where he is
Director of the Centre for Computational Statistics and Machine Learning
(CSML). His main research area is Statistical Learning Theory, but his contributions
range from Neural Networks, to Machine Learning, to Graph Theory.
John Shawe-Taylor obtained a PhD in Mathematics at Royal Holloway, University of
London in 1986. He subsequently completed an MSc in the Foundations of Advanced
Information Technology at Imperial College. He was promoted to Professor of
Computing Science in 1996. He has published over 150 research papers. He moved
to the University of Southampton in 2003 to lead the ISIS research group. He has
been appointed the Director of the Centre for Computational Statistics and Machine
Learning at University College, London from July 2006. He has coordinated a
number of European wide projects investigating the theory and practice of Machine
Learning, including the NeuroCOLT projects. He is currently the scientific
coordinator of a Framework VI Network of Excellence in Pattern Analysis, Statistical
Modelling and Computational Learning (PASCAL) involving 57 partners.
http://www0.cs.ucl.ac.uk/staff/J.Shawe-Taylor/
Mark Herbster
My research currently focuses on the problem of predicting a labeling of a graph.
This problem is foundational for transductive and semi-supervised learning. Initial
bounds and experimental results are given in "Online learning over graphs". The
paper "Prediction on a graph with a perceptron" significantly improves on previous
results in terms of the tightness and interpretability of the bounds. In the recent
work "A fast method to predict the labeling of a tree" we've developed methods to
speed up graph prediction methods. I am also broadly interested in online learning,
see my publications page for more details.
http://www0.cs.ucl.ac.uk/staff/M.Herbster/pubs/
David Barber
David Barber received a BA in Mathematics from Cambridge University and
subsequently a PhD in Theoretical Physics (Statistical Mechanics) from Edinburgh
University. He is currently Reader in Information Processing in the department of
Computer Science UCL where he develops novel information processing schemes,
mainly based on the application of probabilistic reasoning. Prior to joining UCL he
was a lecturer at Aston and Edinburgh Universities.
http://web4.cs.ucl.ac.uk/staff/d.barber/publications/david_barber_online.html
Gabriel Brostow
My name is Gabriel Brostow, and I am an associate professor (Senior Lecturer) in
Computer Science here at UCL. My group explores research problems relating to
Computer Vision and Computer Graphics. The students and colleagues here have
diverse interests, but my focus is on "Smart Capture" for analysis and synthesis
applications. To me, smart capture of visual data (usually video) means having or
finding satisfying answers to these questions about a system, whether interactive or
fully automated:
I) Does the system know the intended purpose of the data being captured?
II) Can the system assess its own accuracy?
III) Does the system compare new inputs to old ones?

I love this field because it allows us to apply our expertise to a variety of tough
problems, including film and photo special effects (computational photography),
action analysis (of people, animals, and cells), and authoring systems (for
architecture, animation, presentations) that make the most of user effort. "Motion
reveals everything" used to be my main research mantra, but that has now taken
hold sufficiently (obviously NOT just through my efforts!) that it no longer needs
championing.
http://www0.cs.ucl.ac.uk/staff/g.brostow/#Research
Jun Wang
My research focus is on the areas of information retrieval, large scale data mining,
multimedia content analysis, and statistical pattern recognition; current research
covers both theoretical and practical aspects:
portfolio theory and statistical modeling of information retrieval,
data mining and collaborative filtering (recommendation),
web economy and online advertising,
user-centric information seeking,
social ("the wisdom of crowds") approaches for content understanding, organisation,
and retrieval,
peer-to-peer information retrieval and filtering, and
multimedia content analysis, indexing and retrieval.
http://scholar.google.com/citations?user=wIE1tY4AAAAJ&hl=en
David Jones Lab
My main research interests are in protein structure prediction and analysis,
simulations of protein folding, Hidden Markov Model methods, transmembrane
protein analysis, machine learning applications in bioinformatics, de novo protein
design methodology, and genome analysis including the application of intelligent
software agents. New areas of research include the use of high throughput
computing and Grid technology for bioinformatics applications, analysis and
prediction of protein disorder, expression array data analysis and the analysis and
prediction of protein function and protein-protein interactions.
http://bioinf.cs.ucl.ac.uk/publications/
Simon Prince
My initial work addressed human stereo vision. My doctoral thesis concerned the
solution of the binocular stereo correspondence problem in the human visual
system. I also worked on the physiology of stereo vision in my subsequent post-
doctoral research.
I became interested in computer vision and made the switch in 2000. My first
Computer Science research was on time-series methods for the solution of the
inverse problem in Optical Tomography with Simon Arridge at UCL. In Singapore, I
worked for several years on augmented reality. This involved developing algorithms
for camera pose estimation, and a three-dimensional video-conferencing system
using real-time image based rendering.
More recently, I have worked on face detection in a novel foveated sensor system. I
am interested in face recognition in general and have presented work on how to
recognize faces in the presence of large pose and lighting changes.
I am interested in most areas of computer vision and computer graphics, and still
maintain active links with the neuroscience and medical imaging communities.
http://web4.cs.ucl.ac.uk/research/vis/pvl/
http://www.computervisionmodels.com
Massimiliano Pontil
I am mainly interested in machine learning theory and pattern recognition. I also
have some interest in function representation and approximation, numerical
optimization and statistics. I have worked on different machine learning
approaches, particularly on regularization methods, such as support vector
machines and other kernel-based methods, multi-task and transfer learning, online
learning and learning over graphs. I have also worked on machine learning
applications arising in computer vision, natural language processing, bioinformatics
and user modeling.
http://www0.cs.ucl.ac.uk/staff/M.Pontil/pubs.html

Cambridge University, UK
Richard E Turner
Richard Turner holds a Lectureship (equivalent to US Assistant Professor) in
Computer Vision and Machine Learning in the Computational and Biological
Learning Lab, Department of Engineering, University of Cambridge, UK. Before
taking up this position, he held an EPSRC Postdoctoral research fellowship which he
spent at both the University of Cambridge and the Laboratory for Computational
Vision, NYU, USA. He has a PhD degree in Computational Neuroscience and Machine
Learning from the Gatsby Computational Neuroscience Unit, UCL, UK and a M.Sci.
degree in Natural Sciences (specialism Physics) from the University of Cambridge,
UK.
http://scholar.google.com/citations?user=DgLEyZgAAAAJ&hl=en

Oxford University, UK
Phil Blunsom
My research interests lie at the intersection of machine learning and computational
linguistics. I apply machine learning techniques, such as graphical models, to a range
of problems relating to the understanding, learning and manipulation of language.
Recently I have focused on structural induction problems such as grammar
induction and learning statistical machine translation models.
http://scholar.google.co.uk/citations?user=eJwbbXEAAAAJ&hl=en
Nando de Freitas
I want to understand intelligence and how minds work. My research is multi-
disciplinary and focuses primarily on the following areas:
Machine learning, big data, and computational statistics
Artificial intelligence, probabilistic reasoning, and decision making
Computational neuroscience, neural networks, and cognitive science
Randomized algorithms, and Monte Carlo simulation
Vision, robotics, and speech perception
http://scholar.google.co.uk/citations?user=nzEluBwAAAAJ&hl=en
Karl Hermann
My research is at the intersection of Natural Language Processing and Machine
Learning, with particular emphasis on semantics. Current topics of interest include:
Compositional Semantics
Learning from Multilingual Data
Semantic Frame Identification
Machine Translation
Hypergraph Grammars
http://www.cs.ox.ac.uk/people/publications/personal/KarlMoritz.Hermann.html
Edward Grefenstette
I am a Franco-American computer scientist, working as a research assistant on
EPSRC Project EP/I03808X/1 entitled A Unified Model of Compositional and
Distributional Semantics: Theory and Applications. I am also lecturing at Hertford
College to students taking Oxford's new computer science and philosophy course.
From October 2013, I will also be a Fulford Junior Research Fellow at Somerville
College.
http://www.cs.ox.ac.uk/people/publications/date/Edward.Grefenstette.html

Delft University of Technology, NETHERLANDS
Thomas Geijtenbeek Publications & Videos
I am a postdoctoral researcher at Delft University of Technology. My main research
interests are simulation, control, animation and artificial intelligence. In addition, I
work part-time as Manager Software Development at Motek Medical.
http://goatstream.com/research/

Academics (with free access to their publications), CANADA

University of Montreal, CANADA

Yoshua Bengio
My long-term goal is to understand intelligence; understanding the underlying
principles would deliver artificial intelligence, and I believe that learning algorithms
are essential in this quest.
Machine learning algorithms attempt to endow machines with the ability to capture
operational knowledge through examples, e.g., allowing a machine to classify or
predict correctly in new cases. Machine learning research has been extremely
successful in the past two decades and is now applied in many areas of science and
technology, some well known examples including web search engines, natural
language translation, speech recognition, machine vision, and data-mining. Yet,
machines still seem to fall short of even mammal-level intelligence in many respects.
One of the remaining frontiers of machine learning is the difficulty of learning the
kind of complicated and highly-varying functions that are necessary to perform
machine vision or natural language processing tasks at a level comparable to
humans (even a 2-year old).
See my lab's long-term vision web page for a broader introduction. An introductory
discussion of recent and ongoing research is below. See the lab's publications site
for a downloadable and complete bibliographic list of my papers.
http://www.iro.umontreal.ca/~bengioy/yoshua_en/research.html
http://www.iro.umontreal.ca/~bengioy/yoshua_en/

University of Toronto, CANADA
Geoffrey Hinton
I design learning algorithms for neural networks. My aim is to discover a learning
procedure that is efficient at finding complex structure in large, high-dimensional
datasets and to show that this is how the brain learns to see. I was one of the
researchers who introduced the back-propagation algorithm that has been widely
used for practical applications. My other contributions to neural network research
include Boltzmann machines, distributed representations, time-delay neural nets,
mixtures of experts, variational learning, contrastive divergence learning, dropout,
and deep belief nets. My students have changed the way in which speech recognition
and object recognition are done.
I now work part-time at Google and part-time at the University of Toronto.
http://www.cs.toronto.edu/~hinton/papers.html
http://www.cs.toronto.edu/~hinton/


Universite de Sherbrooke, CANADA

Hugo Larochelle
I am interested in machine learning algorithms, that is, algorithms
capable of extracting concepts or patterns from data. My work
focuses on developing connectionist and probabilistic approaches
to various artificial intelligence problems, such as computer vision and natural
language processing.
The research themes I am interested in include:
Problems: supervised, semi-supervised and unsupervised learning, structured
output prediction, ranking, density estimation;
Models: deep neural networks (deep learning), autoencoders,
Boltzmann machines, Markov random fields;
Applications: object recognition and tracking, document classification and
ranking.
http://www.dmi.usherb.ca/~larocheh/index_fr.html
http://info.usherbrooke.ca/hlarochelle/neural_networks/content.html

University of British Columbia, CANADA
Great access to all publications of the faculty members
Giuseppe Carenini
http://www.cs.ubc.ca/%7Ecarenini/storage/new-papers-frame.html

Cristina Conati
http://www.cs.ubc.ca/~conati/publications.php

Kevin Leyton-Brown
http://www.cs.ubc.ca/~kevinlb/publications.html

Holger Hoos
http://www.cs.ubc.ca/~hoos/publications.html

Jim Little
http://www.cs.ubc.ca/~little/links/papers.html

David Lowe
http://www.cs.ubc.ca/~lowe/pubs.html

Karon MacLean
http://www.cs.ubc.ca/labs/spin/publications/index.html

Alan Mackworth
http://www.cs.ubc.ca/~mack/Publications/sort_date.html

Dinesh K. Pai
http://www.cs.ubc.ca/~pai/

David Poole
http://www.cs.ubc.ca/~poole/publications.html

Academics (with free access to their publications), CHINA

USTC, CHINA
En-Hong Chen
My current research interests are data mining and machine learning, especially
social network analysis and recommender systems. I have published more than 100
papers in many journals and conferences, including international journals such as
IEEE Trans, ACM Trans, and important data mining conferences, such as KDD, ICDM,
NIPS. My research is supported by the National Natural Science Foundation of China,
National High Technology Research and Development Program 863 of China, etc. I
won the Best Application Paper Award at KDD 2008 and the Best Research Paper
Award at ICDM 2011.
http://staff.ustc.edu.cn/~cheneh/#pub

Linli Xu
My research area is Machine Learning. More specifically, my work combines
aspects from the following:
Unsupervised learning and semi-supervised learning, clustering
Large margin approaches, support vector machines
Optimization, convex programming
http://staff.ustc.edu.cn/~linlixu/papers.html
Peking University, CHINA
Yuan Yao, School of Mathematical Sciences
My most recent interests are focusing on mathematics for data sciences, in
particular topological and geometric methods for high dimensional data analysis
and statistical machine learning, with applications in computational biology and
information technology.
Publications and code to reproduce results
http://www.math.pku.edu.cn/teachers/yaoy/research.html

Academics (with free access to their publications), RUSSIA

Moscow State University, RUSSIA
Dmitry Efimov
Dmitry is an expert in promising areas of modern complex and functional analysis;
the author of original results. He begins with the systematic study of some classes of
analytic functions in the half-plane that are analogous to the well-known Privalov
classes and maximal Privalov classes in the disc. His main results are the
following: 1) A new factorization formula and accurate estimates of growth for
functions in these classes; 2) The introduction of natural invariant metrics under
which the classes form Fréchet algebras; 3) A complete description of the linear
isometries as well as the bounded and completely bounded subsets in the classes.
http://mech.math.msu.su/~efimov/indexe.php
https://www.kaggle.com/users/29346/dmitry-efimov

Academics (with free access to their publications), POLAND

University of Warsaw, POLAND
Marcin Mucha
I am an assistant professor at the Institute of Informatics, University of Warsaw,
member of the Algorithms Group (see our blog!).
I work on graph algorithms, approximation algorithms and on-line algorithms; you
can find most of my papers at DBLP or here.
You can find my PhD Thesis here; it contains a rather detailed exposition of the
algebraic approach to matching problems in graphs.
http://duch.mimuw.edu.pl/~mucha/wordpress/?page_id=58


Academics (with free access to their publications), SWITZERLAND

Prof. Jürgen Schmidhuber's Home Page (Great resources! Not to be missed!)
Prof. Jürgen Schmidhuber's Artificial Intelligence team has won nine international
competitions in machine learning and pattern recognition (more than any other AI
research group) and seven independent best paper/best video awards, achieved the
world's first superhuman visual classification results, pioneered Deep Learning
methods for Artificial Neural Networks since 1991, and established the field of
mathematically rigorous universal AI and optimal universal problem solvers. His formal theory of
creativity & curiosity & fun explains art, science, music, and humor. He generalized
algorithmic information theory, and the many-worlds theory of physics, to obtain a
minimal theory of all constructively computable universes - an elegant algorithmic
theory of everything. Google & Apple and many other leading companies are now
using the machine learning techniques developed in his group at the Swiss AI Lab
IDSIA & USI & SUPSI (ex-TUM CogBotLab). Since age 15 or so his main scientific
ambition has been to build an optimal scientist through self-improving AI, then
retire. Progress is accelerating - are 40,000 years of human-dominated history
about to converge within the next few decades?
http://people.idsia.ch/~juergen/

Free access to a list of Machine Learning MSc/PhD Dissertations

Machine Learning Department, Carnegie Mellon University
Added in the kit before 18-Nov-2014
https://www.ml.cmu.edu/research/phd-dissertations.html
Machine Learning Department, Columbia University
(Search for PhD on the page)
http://www.cs.columbia.edu/learning/papers.html
PhD Dissertations, University of Edinburgh, UK
https://www.era.lib.ed.ac.uk/handle/1842/3389/browse?type=dateissued&sort_by=2&order=DESC&rpp=20&etal=0&submit_browse=Update
MSc Dissertations, University of Oxford, UK
https://www.cs.ox.ac.uk/admissions/grad/A_list_of_some_recent_theses_that_received_high_marks
Machine Learning Group, Department of Engineering, University of Cambridge, UK

(Search for PhD on the page)
http://mlg.eng.cam.ac.uk/pub/

