
09/10/2019 Turn Python Scripts into Beautiful ML Tools - Towards Data Science

Turn Python Scripts into Beautiful ML Tools
Introducing Streamlit, an app framework built for ML engineers
Adrien Treuille
Oct 1 · 7 min read

https://towardsdatascience.com/coding-ml-tools-like-you-code-ml-models-ddba3357eace

Coding a semantic search engine with real-time neural-net inference in 300 lines of Python.

In my experience, every nontrivial machine learning project is eventually
stitched together with bug-ridden and unmaintainable internal tools. These
tools, often a patchwork of Jupyter Notebooks and Flask apps, are
difficult to deploy, require reasoning about client-server architecture, and
don’t integrate well with machine learning constructs like TensorFlow GPU
sessions.

I saw this first at Carnegie Mellon, then at Berkeley, Google X, and finally
while building autonomous robots at Zoox. These tools were often born as
little Jupyter notebooks: the sensor calibration tool, the simulation
comparison app, the LIDAR alignment app, the scenario replay tool, and so
on.

As a tool grew in importance, project managers stepped in. Processes
sprouted. Requirements flowered. These solo projects gestated into scripts,
and matured into gangly maintenance nightmares.


The machine learning engineers’ ad-hoc app building flow.

When a tool became crucial, we called in the tools team. They wrote
fluent Vue and React. They blinged their laptops with stickers about
declarative frameworks. They had a design process:


The tools team’s clean-slate app building flow.

Which was awesome. But these tools all needed new features, like weekly.
And the tools team was supporting ten other projects. They would say,
“we’ll update your tool again in two months.”

So we were back to building our own tools, deploying Flask apps, writing
HTML, CSS, and JavaScript, and trying to version control everything from
notebooks to stylesheets. So my old Google X friend, Thiago Teixeira, and I
began thinking about the following question: What if we could make
building tools as easy as writing Python scripts?


We wanted machine learning engineers to be able to create beautiful apps
without needing a tools team. These internal tools should arise as a natural
byproduct of the ML workflow. Writing such tools should feel like training a
neural net or performing an ad-hoc analysis in Jupyter! At the same time,
we wanted to preserve all of the flexibility of a powerful app framework. We
wanted to create beautiful, performant tools that engineers could show off.
Basically, we wanted this:

The Streamlit app building flow.


With an amazing beta community including engineers from Uber, Twitter,
Stitch Fix, and Dropbox, we worked for a year to create Streamlit, a
completely free and open-source app framework for ML engineers. With
each prototype, the core principles of Streamlit became simpler and purer.
They are:

#1: Embrace Python scripting. Streamlit apps are really just scripts that
run from top to bottom. There’s no hidden state. You can factor your code
with function calls. If you know how to write Python scripts, you can write
Streamlit apps. For example, this is how you write to the screen:

import streamlit as st

st.write('Hello, world!')


Nice to meet you.

#2: Treat widgets as variables. There are no callbacks in Streamlit! Every
interaction simply reruns the script from top to bottom. This approach leads
to really clean code:

import streamlit as st

x = st.slider('x')
st.write(x, 'squared is', x * x)


An interactive Streamlit app in three lines of code.

#3: Reuse data and computation. What if you download lots of data or
perform complex computation? The key is to safely reuse information across
runs. Streamlit introduces a cache primitive that behaves like a persistent,
immutable-by-default data store that lets Streamlit apps safely and
effortlessly reuse information. For example, this code downloads data only
once from the Udacity self-driving car project, yielding a simple, fast app:

import streamlit as st
import pandas as pd

# Reuse this data across runs!
read_and_cache_csv = st.cache(pd.read_csv)

BUCKET = "https://streamlit-self-driving.s3-us-west-2.amazonaws.com/"
data = read_and_cache_csv(BUCKET + "labels.csv.gz", nrows=1000)
desired_label = st.selectbox('Filter to:', ['car', 'truck'])
st.write(data[data.label == desired_label])


Using st.cache to persist data across Streamlit runs.


The output of running the st.cache example above.

In short, Streamlit works like this:

1. The entire script is run from scratch for each user interaction.

2. Streamlit assigns each variable an up-to-date value given widget states.

3. Caching allows Streamlit to skip redundant data fetches and
computation.
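
To make these three steps concrete, here is a toy model of the rerun loop in plain Python. It is only a sketch of the idea and not how Streamlit is implemented: the names script, widget_state, load_data, cache, and fetch_count are all made up for illustration, and the widget state is a plain dict standing in for real UI widgets.

```python
# Toy model of a rerun-on-interaction loop. This is NOT Streamlit's
# implementation; every name below is hypothetical.

cache = {}        # the only state that survives between runs
fetch_count = 0   # counts how often the "expensive" fetch really runs


def load_data(n):
    """Pretend-expensive fetch, memoized on its argument."""
    global fetch_count
    if n not in cache:
        fetch_count += 1
        cache[n] = list(range(n))
    return cache[n]


def script(widget_state):
    """The whole 'app': runs top to bottom, with no callbacks."""
    output = []
    x = widget_state.get("x", 0)   # reading a widget is just reading a variable
    data = load_data(5)            # served from the cache after the first run
    output.append(f"{x} squared is {x * x}")
    output.append(f"sum of data is {sum(data)}")
    return output


# Each user interaction updates the widget state and reruns the script.
first_run = script({"x": 3})
second_run = script({"x": 4})
```

The second run sees the new slider value, but the fetch does not happen again because the cache outlives the run.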

Or in pictures:


User events trigger Streamlit to rerun the script from scratch. Only the cache persists across runs.

If this sounds intriguing, you can try it right now! Just run:

$ pip install --upgrade streamlit
$ streamlit hello

  You can now view your Streamlit app in your browser.

  Local URL: http://localhost:8501
  Network URL: http://10.0.1.29:8501


This will automatically pop open a web browser pointing to your local
Streamlit app. If not, just click the link.


To see more examples like this fractal animation, run streamlit hello from the command line.

. . .

Ok. Are you back from playing with fractals? Those can be mesmerizing.

The simplicity of these ideas does not prevent you from creating incredibly
rich and useful apps with Streamlit. During my time at Zoox and Google X, I
watched as self-driving car projects ballooned into gigabytes of visual data,
which needed to be searched and understood, including running models on
images to compare performance. Every self-driving car project I’ve seen
eventually has had entire teams working on this tooling.

Building such a tool in Streamlit is easy. This Streamlit demo lets you
perform semantic search across the entire Udacity self-driving car photo
dataset, visualize human-annotated ground truth labels, and run a
complete neural net (YOLO) in real time from within the app [1].


This 300-line Streamlit demo combines semantic visual search with interactive neural net inference.

The whole app is a completely self-contained, 300-line Python script, most
of which is machine learning code. In fact, there are only 23 Streamlit calls
in the whole app. You can run it yourself right now!

$ pip install --upgrade streamlit opencv-python
$ streamlit run https://raw.githubusercontent.com/streamlit/demo-self-driving/master/app.py

. . .


As we worked with machine learning teams on their own projects, we came
to realize that these simple ideas yield a number of important benefits:

Streamlit apps are pure Python files. So you can use your favorite editor
and debugger with Streamlit.

My favorite layout for writing Streamlit apps has VSCode on the left and Chrome on the right.


Pure Python scripts work seamlessly with Git and other source control
software, including commits, pull requests, issues, and comments. Because
Streamlit’s underlying language is pure Python, you get all the benefits of
these amazing collaboration tools for free 🎉.


Because Streamlit apps are just Python scripts, you can easily version control them with Git.

Streamlit provides an immediate-mode live coding environment. Just
click Always rerun when Streamlit detects a source file change.

Click “Always rerun” to enable live coding.


Caching simplifies setting up computation pipelines. Amazingly,
chaining cached functions automatically creates efficient computation
pipelines! Consider this code adapted from our Udacity demo:

import streamlit as st
import pandas as pd

@st.cache
def load_metadata():
    DATA_URL = "https://streamlit-self-driving.s3-us-west-2.amazonaws.com/labels.csv.gz"
    return pd.read_csv(DATA_URL, nrows=1000)

@st.cache
def create_summary(metadata, summary_type):
    one_hot_encoded = pd.get_dummies(metadata[["frame", "label"]], columns=["label"])
    return getattr(one_hot_encoded.groupby(["frame"]), summary_type)()

# Piping one st.cache function into another forms a computation DAG.
summary_type = st.selectbox("Type of summary:", ["sum", "any"])
metadata = load_metadata()
summary = create_summary(metadata, summary_type)
st.write('## Metadata', metadata, '## Summary', summary)


A simple computation pipeline in Streamlit.


Basically, the pipeline is load_metadata → create_summary. Every time the
script is run, Streamlit only recomputes whatever subset of the pipeline
is required to get the right answer. Cool!
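
To see why chaining cached functions recomputes only what changed, here is a rough stand-in that needs neither Streamlit nor pandas. It swaps st.cache for Python's functools.lru_cache, replaces the CSV download with a few hard-coded rows, and simplifies the summary to a per-frame label count. The calls counter and run_script are hypothetical helpers added purely for illustration.

```python
import functools

# Hypothetical counters so we can observe which stages actually execute.
calls = {"load_metadata": 0, "create_summary": 0}


@functools.lru_cache(maxsize=None)
def load_metadata():
    calls["load_metadata"] += 1
    # Stand-in for the CSV download: hard-coded (frame, label) rows.
    return (("a.jpg", "car"), ("a.jpg", "truck"), ("b.jpg", "car"))


@functools.lru_cache(maxsize=None)
def create_summary(metadata, summary_type):
    calls["create_summary"] += 1
    counts = {}
    for frame, _label in metadata:
        counts[frame] = counts.get(frame, 0) + 1
    if summary_type == "sum":
        return tuple(sorted(counts.items()))
    # "any": does the frame have at least one label?
    return tuple(sorted((frame, n > 0) for frame, n in counts.items()))


def run_script(summary_type):
    """One top-to-bottom run of the 'app' for a given selectbox value."""
    metadata = load_metadata()
    return create_summary(metadata, summary_type)


sum_result = run_script("sum")    # both stages execute
any_result = run_script("any")    # only create_summary executes again
sum_again = run_script("sum")     # nothing executes; both results are cached
```

Across the three runs the metadata is loaded once and the summary is computed twice, once per distinct summary type.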

To make apps performant, Streamlit only recomputes whatever is necessary to update the UI.

Streamlit is built for GPUs. Streamlit allows direct access to machine-level
primitives like TensorFlow and PyTorch and complements these libraries.
For example, in this demo, Streamlit’s cache stores the entire NVIDIA
celebrity face GAN [2]. This approach enables nearly instantaneous
inference as the user updates sliders.
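
The pattern of keeping a loaded model in the cache can be sketched as follows. This is an illustrative assumption, not the demo's code: FakeModel is a hypothetical stand-in for a large network, and functools.lru_cache plays the role of Streamlit's cache.

```python
import functools


class FakeModel:
    """Hypothetical stand-in for a large network that is costly to load."""

    loads = 0  # how many times a model has been constructed

    def __init__(self):
        FakeModel.loads += 1
        self.weight = 2.0

    def predict(self, slider_value):
        return self.weight * slider_value


@functools.lru_cache(maxsize=1)
def get_model():
    # Runs once; later calls return the very same object.
    return FakeModel()


def script(slider_value):
    """One rerun of the 'app' after the user moves a slider."""
    model = get_model()
    return model.predict(slider_value)


results = [script(v) for v in (0.5, 1.0, 1.5)]
```

Every rerun reuses the same model object, so only the cheap predict call is paid per slider update.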


This Streamlit app demonstrates the NVIDIA celebrity face GAN [2] model using Shaobo Guan’s TL-GAN [3].

Streamlit is a free and open-source library rather than a proprietary
web app. You can serve Streamlit apps on-prem without contacting us. You
can even run Streamlit locally on a laptop without an Internet connection!
Furthermore, existing projects can adopt Streamlit incrementally.


Several ways to incrementally adopt Streamlit. (Icons courtesy of fullvector / Freepik.)

. . .

This just scratches the surface of what you can do with Streamlit. One of the
most exciting aspects of Streamlit is how these primitives can be easily
composed into complex apps that look like scripts. There’s a lot more we
could say about how our architecture works and the features we have
planned, but we’ll save that for future posts.


Block diagram of Streamlit’s components. More coming soon!

We’re excited to finally share Streamlit with the community today and see
what you all build with it. We hope that you’ll find it easy and delightful to
turn your Python scripts into beautiful ML apps.

. . .

Thanks to Amanda Kelly, Thiago Teixeira, TC Ricks, Seth Weidman, Regan
Carey, Beverly Treuille, Geneviève Wachtell, and Barney Pell for their helpful
input on this article.


References:

[1] J. Redmon and A. Farhadi, YOLOv3: An Incremental Improvement
(2018), arXiv.

[2] T. Karras, T. Aila, S. Laine, and J. Lehtinen, Progressive Growing of GANs
for Improved Quality, Stability, and Variation (2018), ICLR.

[3] S. Guan, Controlled image synthesis and editing using a novel TL-GAN
model (2018), Insight Data Science Blog.
