
Notes on Prediction.io
Created 04/22/14
Updated 06/28/14, Updated 09/21/14, Updated 11/21/14, Updated 02/10/15

Introduction
PredictionIO is an open source machine learning server for software developers to create predictive features, such
as personalization, recommendation and content discovery.
The company's stated goal is to be the MySQL or LAMP Stack of Machine Learning and Analytics.
Examples of use:

Le Tote, a clothing subscription/rental service that is using PredictionIO to predict customers' fashion preferences.
PerkHub, which is using PredictionIO to personalize product recommendations in the weekly group buying emails they send out.

The current version is 0.8.6. The download was 151 MB.

Features
The server is written in Scala and runs on Spark. As a complete stack, it includes many elements of Hadoop
and Mahout. However, the Prediction.io marketing pitch is gradually shifting from presenting it as a replacement for Hadoop
to presenting it as an easy way to build on Spark.
Recommendation engine example
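import predictionio

# Connect with the app key, identify the user, and record a "view" action on an item
cli = predictionio.Client("<my key>")
cli.identify("John")
cli.record_action_on_item("view", "HackerNews")
# predict the top 5 preferences near a specified location ([latitude, longitude])
r = cli.get_itemrec_topn("myEngine", 5, {"pio_latlng": [37.9, 91.2]})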

Algorithms Supported
Item recommendation
Item similarity
Item rank
The implementations come from Spark's MLlib and include Naive Bayes and ALS.
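The following is a minimal illustration of what "item similarity" and "item rank" compute. It scores a pair of items by the cosine similarity of the sets of users who acted on each item. It then ranks a candidate list for a user by each candidate's best similarity to items that user has already acted on.

This is a hypothetical, self-contained sketch. The helper names (`users_by_item`, `item_similarity`, `rank_items`) and the choice of cosine similarity are assumptions for illustration. It is not how PredictionIO or MLlib implements these engines.

```python
import math
from typing import Dict, List, Set, Tuple

# Hypothetical interaction log: (user, item) pairs, e.g. "view" actions.
Actions = List[Tuple[str, str]]


def users_by_item(actions: Actions) -> Dict[str, Set[str]]:
    """Group the users who acted on each item."""
    out: Dict[str, Set[str]] = {}
    for user, item in actions:
        out.setdefault(item, set()).add(user)
    return out


def item_similarity(a: Set[str], b: Set[str]) -> float:
    """Cosine similarity between two items viewed as binary user vectors."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def rank_items(user: str, candidates: List[str], actions: Actions) -> List[str]:
    """Order candidates by their best similarity to items the user already acted on."""
    index = users_by_item(actions)
    seen = [item for u, item in actions if u == user]

    def score(item: str) -> float:
        # Unknown items get an empty user set, hence similarity 0.
        return max((item_similarity(index.get(item, set()), index[s]) for s in seen),
                   default=0.0)

    return sorted(candidates, key=score, reverse=True)
```

For example, `rank_items("u1", ["d", "c"], actions)` places `c` first when `c` shares users with the items `u1` has seen. Item `d` has no recorded users, so it scores 0.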

Company Information
Formed in early 2013, pivoted in late 2013, and raised its next round of funding in mid-2014. Located in Palo Alto and somewhere
in the UK.
The company competes with closed, black-box MLaaS services and software, such as Google Prediction API,
Wise.io, BigML, and Skytree. However, the company believes it has an advantage because Prediction.io is open, extensible,
and backed by a developer community.
The problem PredictionIO sets out to solve is that building Machine Learning into products is expensive and
time-consuming. In some cases it is only within reach of major, heavily funded tech companies such as Google
or Amazon, which can afford a large team of PhDs/data scientists. By using the startup's open-source Machine
Learning server, startups and larger enterprises no longer need to start from scratch. They also retain control
over the source code and over how PredictionIO integrates with their existing products.

People
Simon Chan, CEO (was at UMich, then startups in China, then UCL)
Donald Szeto, CTO (Stanford, UC Berkeley)


Kenneth Chan, engineer (UCB)
Thomas Stone, VP Sales (Cornell, University College London)

Funding
Raised $2.5M in July 2014 from StartX, XG Ventures (founded by ex-Googlers), Sood Ventures, Ironfire Capital
(activist investor firm), Quest Venture Partners (Menlo Park), and Azure Capital Partners (San Francisco and
Menlo Park).

Business Model
There was no discussion of pricing for either the server or service/support.

Architecture of the PredictionIO server


PredictionIO is built mainly in Scala. Scala runs on the JVM, so Java and Scala code can be mixed freely for
seamless integration. The PredictionIO Server consists of a few components:
Admin Server
IO Server
Scheduler
Data Store
Data Processing Stack

The DASE Concept (their counterpart of MVC)


PredictionIO's DASE architecture brings the separation-of-concerns design principle to predictive engine
development. DASE stands for the following components of an engine:

Data - includes Data Source and Data Preparator
Algorithm(s)
Serving
Evaluator
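The following makes the separation of concerns concrete. It writes the four DASE roles as small Python functions and wires them together.

Everything here is a hypothetical stand-in chosen for illustration: the function names, the hard-coded events, the popularity "algorithm", and the hit-rate evaluator. PredictionIO's real components are Scala classes with their own base types.

```python
from collections import Counter
from typing import Dict, List, Tuple

Event = Tuple[str, str]  # (user, item)


def data_source() -> List[Event]:
    """D (Data Source): read raw events; hard-coded here instead of an event store."""
    return [("1", "a"), ("2", "a"), ("2", "b"), ("3", "c"), ("3", "a"), ("", "b")]


def preparator(events: List[Event]) -> List[Event]:
    """D (Data Preparator): clean the data, here by dropping events with no user."""
    return [(u, i) for u, i in events if u]


def train(events: List[Event]) -> Counter:
    """A (Algorithm): a deliberately simple model that just counts item popularity."""
    return Counter(item for _, item in events)


def predict(model: Counter, query: Dict) -> List[Tuple[str, int]]:
    """A (Algorithm): answer {"user": ..., "num": n}; popularity ignores the user."""
    return model.most_common(query["num"])


def serving(predictions: List[List[Tuple[str, int]]]) -> List[Tuple[str, int]]:
    """S (Serving): combine algorithm outputs; with one algorithm, pass it through."""
    return predictions[0]


def evaluate(events: List[Event], held_out: List[Event], num: int) -> float:
    """E (Evaluator): share of held-out (user, item) pairs found in the top-num list."""
    model = train(preparator(events))
    hits = sum(item in dict(predict(model, {"user": u, "num": num})) for u, item in held_out)
    return hits / len(held_out)


model = train(preparator(data_source()))
print(serving([predict(model, {"user": "1", "num": 2})]))
```

The point of the split is that each role can be swapped independently. For example, a real algorithm can replace `train`/`predict` without touching data loading or evaluation.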

As the Quick Start shows, MyRecommendation takes a JSON prediction query, e.g. {"user": "1", "num": 4},
and returns a JSON predicted result. In MyRecommendation/src/main/scala/Engine.scala,
the Query case class defines the format of such a query:

case class Query(
  user: String,
  num: Int
) extends Serializable
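The following mirrors the `Query` case class as a Python dataclass and parses the JSON query format described above. This is a minimal sketch only.

The dataclass, the `parse_query` helper, and its validation rules are hypothetical. They illustrate how the JSON fields map onto the typed `user: String` and `num: Int` fields.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Query:
    """Python mirror of the Scala Query(user: String, num: Int) case class."""
    user: str
    num: int


def parse_query(body: str) -> Query:
    """Parse a JSON prediction query such as {"user": "1", "num": 4}."""
    raw = json.loads(body)
    user, num = raw["user"], raw["num"]
    # Reject values that would not fit the String / Int field types (bool is an int subclass).
    if not isinstance(user, str) or not isinstance(num, int) or isinstance(num, bool):
        raise ValueError(f"malformed query: {raw!r}")
    return Query(user=user, num=num)
```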

The PredictedResult case class defines the format of the predicted result, such as


{"itemScores":[
  {"item":"22","score":4.07},
  {"item":"62","score":4.05},
  {"item":"75","score":4.04},
  {"item":"68","score":3.81}
]}

The corresponding case classes are:

case class PredictedResult(
  itemScores: Array[ItemScore]
) extends Serializable

case class ItemScore(
  item: String,
  score: Double
) extends Serializable
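The following covers the result side. It takes raw per-item scores, keeps the top `num`, and serializes them into the `itemScores` JSON shape described above.

This is a minimal sketch. The input scores are made up (e.g. standing in for a trained model's output), and `to_predicted_result` is a hypothetical helper. The dataclasses only mirror the Scala case classes.

```python
import heapq
import json
from dataclasses import asdict, dataclass
from typing import Dict, List


@dataclass
class ItemScore:
    item: str
    score: float


@dataclass
class PredictedResult:
    itemScores: List[ItemScore]


def to_predicted_result(scores: Dict[str, float], num: int) -> PredictedResult:
    """Keep the num highest-scoring items, best first."""
    top = heapq.nlargest(num, scores.items(), key=lambda kv: kv[1])
    return PredictedResult([ItemScore(item, score) for item, score in top])


# Hypothetical raw scores for five items; only the best four are returned.
result = to_predicted_result({"22": 4.07, "75": 4.04, "62": 4.05, "68": 3.81, "10": 1.2}, num=4)
print(json.dumps(asdict(result)))
```

`dataclasses.asdict` recurses into the list of `ItemScore` objects, so the nested structure serializes directly to the JSON format above.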

Finally, RecommendationEngine is the Engine Factory that defines the components this engine will use:
Data Source, Data Preparator, Algorithm(s) and Serving components.

object RecommendationEngine extends IEngineFactory {
  def apply() = {
    new Engine(
      classOf[DataSource],
      classOf[Preparator],
      Map("als" -> classOf[ALSAlgorithm]),
      classOf[Serving])
  }
  ...
}
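The `Map("als" -> classOf[ALSAlgorithm])` argument suggests that an engine can register several algorithms by name and let Serving merge their outputs.

The following is a hypothetical Python sketch of that idea. A factory returns an engine holding named algorithms, and a serving function averages each item's score across them. The names, the averaging rule, and the second "popular" algorithm are assumptions for illustration only.

```python
from typing import Callable, Dict, List

# An "algorithm" here is just a function from a query to per-item scores.
Algorithm = Callable[[dict], Dict[str, float]]
Serving = Callable[[List[Dict[str, float]]], Dict[str, float]]


class Engine:
    def __init__(self, algorithms: Dict[str, Algorithm], serving: Serving):
        self.algorithms = algorithms  # name -> algorithm, like the Scala Map
        self.serving = serving

    def query(self, q: dict) -> Dict[str, float]:
        # Run every registered algorithm, then let Serving merge the results.
        return self.serving([algo(q) for algo in self.algorithms.values()])


def average_serving(results: List[Dict[str, float]]) -> Dict[str, float]:
    """Average each item's score over the algorithms that returned it."""
    totals: Dict[str, List[float]] = {}
    for scores in results:
        for item, s in scores.items():
            totals.setdefault(item, []).append(s)
    return {item: sum(v) / len(v) for item, v in totals.items()}


def recommendation_engine() -> Engine:
    """Hypothetical factory, loosely analogous to RecommendationEngine.apply()."""
    return Engine(
        {
            "als": lambda q: {"22": 4.0, "62": 3.0},      # stand-in for a trained model
            "popular": lambda q: {"22": 2.0, "75": 5.0},  # stand-in second algorithm
        },
        average_serving,
    )
```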

Spark MLlib's ALS algorithm takes training data of type RDD[Rating] and trains a model, which is
a MatrixFactorizationModel object.
The PredictionIO Recommendation Engine Template, which MyRecommendation is based on, integrates this
algorithm under the DASE architecture.
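The core idea of alternating least squares (ALS) can be shown without Spark. Hold the item factors fixed and solve a small regularized least-squares problem for each user. Then swap roles and repeat.

The following pure-Python sketch differs from MLlib in several ways:
- It works on an in-memory list of `(user, item, rating)` tuples instead of an `RDD[Rating]`.
- It returns plain dictionaries of factor vectors rather than a `MatrixFactorizationModel`.
- It uses a plain, unweighted L2 penalty `lam`, which may differ from MLlib's regularization scheme.

All names are hypothetical.

```python
import random
from typing import Dict, List, Tuple

# (user, item, rating): mirrors MLlib's Rating(user, product, rating), but with string ids.
Rating = Tuple[str, str, float]
Factors = Dict[str, List[float]]


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the small dense system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        p = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[p] = m[p], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def half_step(fixed: Factors, observed: Dict[str, List[Tuple[str, float]]],
              rank: int, lam: float) -> Factors:
    """Hold one side fixed; for each row solve (sum v v^T + lam I) x = sum r v."""
    out: Factors = {}
    for key, pairs in observed.items():
        a = [[lam if i == j else 0.0 for j in range(rank)] for i in range(rank)]
        b = [0.0] * rank
        for other, r in pairs:
            v = fixed[other]
            for i in range(rank):
                b[i] += r * v[i]
                for j in range(rank):
                    a[i][j] += v[i] * v[j]
        out[key] = solve(a, b)
    return out


def train_als(ratings: List[Rating], rank: int = 2, iterations: int = 20,
              lam: float = 0.01, seed: int = 0) -> Tuple[Factors, Factors]:
    """Alternate between solving for user factors and item factors."""
    by_user: Dict[str, List[Tuple[str, float]]] = {}
    by_item: Dict[str, List[Tuple[str, float]]] = {}
    for u, i, r in ratings:
        by_user.setdefault(u, []).append((i, r))
        by_item.setdefault(i, []).append((u, r))
    rng = random.Random(seed)
    items: Factors = {i: [rng.random() for _ in range(rank)] for i in by_item}
    users: Factors = {}
    for _ in range(iterations):
        users = half_step(items, by_user, rank, lam)  # items fixed -> solve users
        items = half_step(users, by_item, rank, lam)  # users fixed -> solve items
    return users, items


def predict(users: Factors, items: Factors, user: str, item: str) -> float:
    """Predicted rating = dot product of the user and item factor vectors."""
    return sum(x * y for x, y in zip(users[user], items[item]))
```

Consider a small, fully observed matrix whose ratings are products of a per-user and a per-item number. On such a matrix, `predict` recovers the observed ratings closely. Real workloads would use MLlib's distributed implementation instead.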

Data Processing Stack


Built on top of established data frameworks and technologies such as Hadoop, Cascading, Scalding, and Mahout,
PredictionIO is designed to handle large amounts of data efficiently. A variety of machine learning algorithms can be
deployed with just a few clicks.

Admin Server
PredictionIO's Admin Server component provides a web interface for developers to manage applications, engines
and algorithms. It is built on top of Play Framework.

IO Server
The IO Server offers scalable REST API services for communicating with your web or mobile app. It handles
data input and prediction output, and is built on top of Play Framework.
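The following shows what querying a deployed engine over REST can look like. It assumes an engine deployed with `pio deploy`, which in the 0.8 releases served prediction queries by POST to `http://localhost:8000/queries.json` by default.

The sketch builds such a request with Python's standard library without sending it. Treat the host, port, and path as assumptions to adjust for your deployment. This may not match the older IO Server API described in this section.

```python
import json
import urllib.request

# Assumed endpoint of a locally deployed engine (adjust host/port/path for your setup).
ENGINE_URL = "http://localhost:8000/queries.json"


def build_query_request(user: str, num: int) -> urllib.request.Request:
    """Build (but do not send) a POST carrying a JSON prediction query."""
    body = json.dumps({"user": user, "num": num}).encode("utf-8")
    return urllib.request.Request(
        ENGINE_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_query_request("1", 4)
# To actually query a running engine:
# with urllib.request.urlopen(req) as resp:
#     result = json.load(resp)  # expected to have the {"itemScores": [...]} shape
```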


Scheduler
A scalable scheduler that can manage schedules for tens, hundreds, or even tens of thousands
of jobs. Quartz is the default scheduler.

Data Store
The Data Store manages the collected data, the predictive model, and the cached prediction results. MongoDB is the
default data store.

Documentation

Android and Java SDK Endpoints


There are commands to send information, request a recalculation, and request results.


PHP API

Delivery in the Cloud


Preconfigured EC2 instances for Prediction.io can be launched from the AWS Marketplace:
https://aws.amazon.com/marketplace/pp/B00ECGJYGE
For usage information, see http://docs.prediction.io/current/installation/install-predictionio-on-aws.html

Developer Community
There is a forum at https://groups.google.com/forum/#!forum/predictionio-user
The developer community of PredictionIO supports a number of projects. To list a project on their site, contact
them or submit a pull request to the PredictionIO Docs project.
In early 2015, the CEO said they had over 300 developers in their ecosystem.

Questions and Open Issues


How does the server store and manage trained models?
What data sources can be integrated?

Chronology
Late Spring 2014: Learned about this tool
Summer 2014: Initial Evaluation
Fall 2014: Started another round of evaluation, since it was clearer that they were providing a server, and that the
server used Scala / Spark. They were developing templates which captured usage patterns. Also, they were
working to create a developer ecosystem.
Presentation at Predictive APIs conference in November 2014
http://www.slideshare.net/predictionio/predictionio-the-1st-international-conference-on-predictive-apis-and-apps
This was not very technical.
02/09/15: Went to a presentation by the CEO, hosted by the Scala Bay group. The presentation mostly summarized what we
already knew, but outlined several key directions. For example, they have developed usage templates that make the system
much easier to learn. The presentation is available at https://www.youtube.com/watch?v=EUDHFOyUumE&feature=youtu.be
