Practical guide to hyperparameter search for deep learning models
Unlike machine learning models, deep learning models are literally full of hyperparameters. Would you like some evidence? Just take a look at the Transformer base v1 hyperparameters definition.
I rest my case.
Of course, not all of these variables contribute in the same way to the model's learning process, but, given this additional complexity, it's clear that finding the best configuration for them in such a high-dimensional space is not a trivial challenge.

Luckily, we have different strategies and tools for tackling the search problem. Let's dive in!
Our Goal
Why?
Every scientist and researcher wants the best model for the task given the
available resources: 💻, 💰 and ⏳ (aka compute, money, and time).
Effective hyperparameter search is the missing piece of the puzzle that
will help us move towards this goal.
When?
Hyperparameter search is also common as a stage or component in a semi/fully automatic deep learning pipeline. This is, obviously, more common among data science teams at companies.
Hyperparameters are the knobs that you can turn when building
your machine / deep learning model.
We can likely agree that the Learning Rate and the Dropout Rate are
considered hyperparameters, but what about the model design variables?
These include embeddings, number of layers, activation function, and so
on. Should we consider these variables as hyperparameters?
For simplicity's sake, yes – we can also consider the model design
components as part of the hyperparameters set.
Finally, how about the parameters obtained from the training process –
the variables learned from the data? These weights are known as model
parameters. We'll exclude them from our hyperparameter set.
Okay, let's try a real-world example. The picture below illustrates the different classifications of variables in a deep learning model.
How?

The generic search strategy is a simple cycle: (1) pick a candidate configuration from the search space, (2) train a model with it, and (3) evaluate its performance. We'll keep going like this until we reach a terminating condition (such as running out of ⏳ or 💰).
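To make this cycle concrete, here's a toy, runnable sketch of it. All the function names, the toy objective, and the 1-second budget are made up for illustration; step 1 here samples randomly, as Random Search would.

import random
import time

# Toy stand-ins so the sketch runs end to end (all hypothetical)
def pick_next_configuration():
    # Step 1: pick a configuration from the search space
    return {"lr": 10 ** random.uniform(-5, -1)}

def train_and_evaluate(config):
    # Steps 2-3: train a model and return its score (toy objective:
    # pretend a learning rate of 0.01 is optimal)
    return -abs(config["lr"] - 0.01)

deadline = time.time() + 1.0  # terminating condition: a 1-second budget
best_config, best_score = None, float("-inf")

while time.time() < deadline:
    config = pick_next_configuration()
    score = train_and_evaluate(config)
    if score > best_score:
        best_config, best_score = config, score

print(best_config, best_score)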
We have four main strategies available for searching for the best configuration:

- Babysitting
- Grid Search
- Random Search
- Bayesian Optimization
Babysitting

Babysitting is also known as Trial & Error or Grad Student Descent in the academic field. This approach is 100% manual and is the one most widely adopted by researchers, students, and hobbyists.
Grid Search
From the imperative command "Just try everything!" comes Grid Search – a naive approach of simply trying every possible configuration. The workflow is:

Search over all the possible configurations and wait for the results to establish the best one: e.g. C1 = (0.1, 0.3, 4) -> acc = 92%, C2 = (0.1, 0.35, 4) -> acc = 92.3%, etc.
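Conceptually, this is nothing more than a Cartesian product over the per-variable value lists. Here's a minimal sketch; the candidate values and the commented-out helper are illustrative, not recommendations.

from itertools import product

# Candidate values for each hyperparameter (illustrative)
learning_rates = [0.1, 0.01, 0.001]
dropout_rates = [0.3, 0.35, 0.4]
batch_sizes = [4, 8]

# Grid Search enumerates every possible combination
for lr, dropout, batch in product(learning_rates, dropout_rates, batch_sizes):
    print(f"config: lr={lr}, dropout={dropout}, batch_size={batch}")
    # acc = train_and_evaluate(lr, dropout, batch)  # hypothetical helper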
The image below illustrates a simple grid search on two dimensions: the Dropout rate and the Learning rate.
What this means is that the more computational resources 💻 you have available, the more guesses you can try at the same time!
It's common to use this approach when the number of dimensions is less than or equal to 4. Even though it's guaranteed to find the best configuration in the grid at the end, it's still not preferable in practice. Instead, it's better to use Random Search — which we'll discuss next.
Run on FloydHub
from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import GridSearchCV

# create_model, x and y are assumed to be defined elsewhere
model = KerasClassifier(build_fn=create_model)

# Grid of candidate values to try (example values)
param_grid = dict(epochs=[5, 10], batch_size=[16, 32, 64])

# GridSearch in action
grid = GridSearchCV(estimator=model,
                    param_grid=param_grid,
                    n_jobs=1,   # number of parallel jobs
                    cv=3,       # cross-validation folds
                    verbose=1)
grid_result = grid.fit(x, y)
Random Search
A few years ago, Bergstra and Bengio published an amazing paper where
they demonstrated the inefficiency of Grid Search.
The only real difference between Grid Search and Random Search is in step 1 of the strategy cycle: Random Search picks the point randomly from the configuration space.
Let's use the image below (provided in the paper) to show the claims
reported by the researchers.
In the Grid Layout, it's easy to notice that, even though we've trained 9 models, we've used only 3 values per variable! Whereas, with the Random Layout, it's extremely unlikely that we will select the same variables more than once. It turns out that, with the second approach, we will have trained 9 models using 9 different values for each variable.

As you can tell from the space exploration at the top of each layout in the image, we have explored the hyperparameter space more widely with Random Search (especially for the more important variables). This will help us to find the best configuration in fewer iterations.
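You can check the coverage claim numerically. In this tiny sketch (the two 9-point layouts mirror the image above), the grid reuses 3 values per axis while the random layout gets 9 distinct values per axis:

import numpy as np

rng = np.random.default_rng(0)

# Grid layout: 9 configurations, but only 3 distinct values per variable
grid_x, grid_y = np.meshgrid([0.25, 0.5, 0.75], [0.25, 0.5, 0.75])
print(np.unique(grid_x).size)  # -> 3

# Random layout: 9 configurations, 9 distinct values per variable
random_points = rng.uniform(size=(9, 2))
print(np.unique(random_points[:, 0]).size)  # -> 9 (ties have probability ~0)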
Run on FloydHub
Click this button to open a Workspace on FloydHub. You can use the workspace to run the code below (Random Search using Scikit-learn and Keras) on a fully configured cloud machine.
from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import RandomizedSearchCV

# create_model, param_dist, X and Y are assumed to be defined elsewhere
model = KerasClassifier(build_fn=create_model)

# Search in action!
n_iter_search = 16  # Number of parameter settings that are sampled
random_search = RandomizedSearchCV(estimator=model,
                                   param_distributions=param_dist,
                                   n_iter=n_iter_search,
                                   n_jobs=1,
                                   cv=3)
random_search.fit(X, Y)
For example, it's common to use batch sizes that are powers of 2 and to sample the learning rate on a log scale.
Common scale space for batch size and learning rate
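Here's a minimal sketch of sampling on these scales; the ranges are illustrative assumptions, not recommendations.

import numpy as np

rng = np.random.default_rng(42)

# Batch size: pick from powers of 2
batch_size = int(rng.choice([16, 32, 64, 128, 256]))

# Learning rate: sample uniformly on a log scale between 1e-5 and 1e-1
learning_rate = 10 ** rng.uniform(-5, -1)

print(batch_size, learning_rate)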
Zoom In!
It's also very common to start with one of the layouts above for a certain number of iterations, and then zoom into a promising subspace by sampling more densely in each variable's range, even starting a new search with the same or a different search strategy.
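As an illustration, here's a tiny sketch of zooming in around the best learning rate found so far; the starting value and the half-decade window are made up.

import random

best_lr = 3e-3  # say this was the best learning rate from the first round

# Zoom in: sample more densely within half a decade around the best value
for _ in range(8):
    lr = best_lr * 10 ** random.uniform(-0.5, 0.5)
    print(lr)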
It may sound strange and surprising, but what makes Babysitting effective – despite the amount of time required – is the scientist's ability to drive the search and experimentation effectively by using past runs as a resource to improve the next ones.
Bayesian Optimization
This search strategy builds a surrogate model that tries to predict the metrics we care about from the hyperparameter configuration.
At each new iteration, the surrogate will become more and more confident about which new guesses can lead to improvements. Just like the other search strategies, it shares the same termination condition.
If this sounds confusing right now, don't worry – it's time for another
visual example.
The black dots represent the models trained so far. The red line is the ground truth, or, in other words, the function that we are trying to learn. The black line represents the mean of our current hypothesis for the ground truth function, and the grey area shows the related uncertainty, or variance, in the space.
Now that we've defined the starting point, we're ready to choose the next promising configuration on which to train a model. To do this, we need to define an acquisition function, which will tell us where to sample the next configuration.
The Expected Improvement acquisition function aims for the lowest possible value if we use the proposed configuration from the uncertainty area. The blue dot in the Expected Improvement chart above shows the point selected for the next training run.
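For reference, here's the standard Expected Improvement formula for a minimization problem in code. This is a generic sketch of the textbook formula; the candidate means, standard deviations, and best-so-far value are made up.

import numpy as np
from scipy.stats import norm

def expected_improvement(mu, sigma, best_so_far, xi=0.01):
    # mu, sigma: the surrogate's posterior mean and standard deviation
    # at the candidate points; best_so_far: lowest value observed so far
    sigma = np.maximum(sigma, 1e-9)  # guard against zero variance
    z = (best_so_far - mu - xi) / sigma
    return (best_so_far - mu - xi) * norm.cdf(z) + sigma * norm.pdf(z)

# The next configuration to try is the candidate that maximizes EI
candidate_mu = np.array([0.30, 0.25, 0.40])
candidate_sigma = np.array([0.05, 0.20, 0.10])
print(np.argmax(expected_improvement(candidate_mu, candidate_sigma, 0.28)))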
The more models we train, the more confident the surrogate will become
about the next promising points to sample. Here's the chart after 8 trained
models:
Please note that we've really just scratched the surface of this fascinating topic; if you're interested in a more detailed treatment of SMBO and how to extend it, then take a look at this paper.
Run on FloydHub
Click this button to open a Workspace on FloydHub. You can use the workspace to run the code below (Bayesian Optimization (SMBO-TPE) using Hyperas) on a fully configured cloud machine.
def data():
    """
    Data providing function:

    This function is separated from model() so that hyperopt
    won't reload data for each evaluation run.
    """
    # Load / Cleaning / Preprocessing
    ...
    return x_train, y_train, x_test, y_test
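For completeness, here's a minimal sketch of the companion model() function and the search call, following the standard Hyperas pattern: Hyperas preprocesses the {{...}} templates into hyperopt sampling expressions. The architecture, input shape, and value ranges below are illustrative assumptions.

from hyperopt import Trials, STATUS_OK, tpe
from hyperas import optim
from hyperas.distributions import choice, uniform
from keras.models import Sequential
from keras.layers import Dense, Dropout

def model(x_train, y_train, x_test, y_test):
    """
    Model providing function: Hyperas replaces each {{...}} template
    with a value sampled by hyperopt on each evaluation run.
    """
    model = Sequential()
    model.add(Dense(512, activation='relu', input_shape=(784,)))
    model.add(Dropout({{uniform(0, 1)}}))
    model.add(Dense(10, activation='softmax'))
    model.compile(loss='categorical_crossentropy',
                  optimizer={{choice(['adam', 'rmsprop'])}},
                  metrics=['accuracy'])
    model.fit(x_train, y_train,
              batch_size={{choice([32, 64, 128])}},
              epochs=5,
              verbose=0,
              validation_data=(x_test, y_test))
    score, acc = model.evaluate(x_test, y_test, verbose=0)
    # hyperopt minimizes, so return the negated validation accuracy
    return {'loss': -acc, 'status': STATUS_OK, 'model': model}

# Run the TPE-based search over the templated hyperparameters
best_run, best_model = optim.minimize(model=model,
                                      data=data,
                                      algo=tpe.suggest,
                                      max_evals=16,
                                      trials=Trials())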
Search strategy comparison
Summary
Bayes SMBO is probably the best candidate as long as resources are not
a constraint for you or your team, but you should also consider
establishing a baseline with Random Search.
Stopping criteria
The first three criteria are self-explanatory, so let's focus our attention on the last one.
It's common to cap the training time according to the class of experiment inside the research lab. This policy acts as a funnel for the experiments and optimizes the team's resources. In this way, we will be able to allocate more resources only to the most promising experiments.
The floyd-cli (the software used by our users to communicate with FloydHub, and which we've open-sourced on GitHub) provides a flag for this purpose: our power users use it extensively to regulate their experiments.
These criteria can be applied manually when babysitting the learning process, or you can do even better by integrating these rules into your experiments through the hooks/callbacks provided in the most common frameworks:
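For example, in Keras this is a one-liner with the EarlyStopping callback. A minimal sketch; the model and data variables are assumed to be defined as in the earlier snippets:

from keras.callbacks import EarlyStopping

# Stop training once the validation loss hasn't improved for 3 epochs
early_stopping = EarlyStopping(monitor='val_loss', patience=3)

model.fit(x_train, y_train,
          validation_data=(x_val, y_val),
          epochs=100,
          callbacks=[early_stopping])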
I'll limit the discussion here to the most used / trending frameworks. (I hope I haven't hurt the feelings of the other frameworks' authors. If so, you can direct your complaints to me and I'll be happy to update the content!)
I would like to share with you another interesting research effort from DeepMind, where they used a variant of the Evolution Strategy algorithm to perform hyperparameter search, called Population Based Training (PBT is also at the foundation of another amazing piece of research from DeepMind which wasn't widely covered by the press, but which I strongly encourage you to check out on your own). Quoting DeepMind:
PBT - like random search - starts by training many neural
networks in parallel with random hyperparameters. But instead
of the networks training independently, it uses information from
the rest of the population to refine the hyperparameters and direct
computational resources to models which show promise. This
takes its inspiration from genetic algorithms where each member
of the population, known as a worker, can exploit information
from the remainder of the population. For example, a worker
might copy the model parameters from a better performing
worker. It can also explore new hyperparameters by changing the
current values randomly.
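To make the exploit/explore mechanics concrete, here's a toy sketch of one PBT step over a population of workers. The population structure, the 20% cutoff, and the perturbation factors are illustrative assumptions, not DeepMind's implementation.

import copy
import random

def pbt_step(population):
    # Each worker is a dict: {'params': ..., 'hyper': {'lr': ...}, 'score': ...}
    population.sort(key=lambda w: w['score'], reverse=True)
    cutoff = max(1, len(population) // 5)
    top, bottom = population[:cutoff], population[-cutoff:]
    for worker in bottom:
        better = random.choice(top)
        # Exploit: copy model parameters and hyperparameters from a
        # better-performing worker
        worker['params'] = copy.deepcopy(better['params'])
        worker['hyper'] = dict(better['hyper'])
        # Explore: randomly perturb the copied hyperparameters
        worker['hyper']['lr'] *= random.choice([0.8, 1.2])
    return population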
Managing your experiments on FloydHub
One of the biggest features of FloydHub is the ability to compare the different models you're training when using different sets of hyperparameters.
The picture below shows a list of jobs in a FloydHub project. You can
see that this user is using the job's message field (e.g. floyd run --
message "SGD, lr=1e-3, l1_drop=0.3" ... ) to highlight the
hyperparameters used on each of these jobs.
Additionally, you can also see the training metrics for each job. These offer a quick glance to help you understand which of these jobs performed best, as well as the type of machine used and the total training time.
Project Page
The FloydHub dashboard gives you an easy way to compare all the training runs from your hyperparameter search, and it updates in real-time.
Training metrics
As mentioned above, you can easily emit training metrics with your jobs
on FloydHub. When you view your job on the FloydHub dashboard,
you'll find real-time charts for each of the metrics you've defined.
These charts help you assess how your model is training, given the configuration of hyperparameters you've selected.
For example, if you're babysitting the training process, then the training
metrics will certainly help you to determine and apply the stopping
criteria.
Training metrics
FloydHub HyperSearch
(Coming soon!)
We are currently planning to release some examples of how to wrap the floyd-cli command line tool with these proposed strategies to effectively run hyperparameter search on FloydHub. So, stay tuned!
We're really excited to improve FloydHub to meet all your training needs!