Speeding up your analysis with distributed computing

tutorial, qsub, peer, distcomp, matlab, MEG-language

Introduction

Many times you are faced with the analysis of multiple subjects and experimental conditions, or with
the analysis of your data using multiple analysis parameters (e.g. frequency bands). Parallel
computing in MATLAB can help you to speed up these types of analysis.

Note that this is usually referred to as distributed computing if you are submitting multiple
independent computations to multiple computers. The term parallel computing is usually reserved
for multiple CPUs or computers working simultaneously on the same problem, which requires constant
sharing of small snippets of data between the CPUs. Since the analyses of multiple subjects are done
independently of each other, we call it distributed computing.

This tutorial describes two approaches for distributing the analysis of multiple subjects and
conditions. The data used in this example is the same as in the tutorial scripts: 151-channel MEG
was recorded in 4 subjects, and in each dataset there are three experimental conditions (FC, FIC,
IC). Both approaches rely on the qsubcellfun function which applies a given function to each element
of a cell-array. The function execution is done in parallel on the Torque batch queue system.
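
The idea of applying one function to every element of a cell-array can be tried out locally before anything is sent to the cluster. The following is a minimal sketch that uses MATLAB's built-in cellfun as a local stand-in for qsubcellfun; the frequency bands and the bandwidth function are made up for illustration, and nothing is submitted to Torque.

```matlab
% local stand-in: cellfun applies the function to each cell element in turn,
% in this MATLAB session rather than on the cluster
freqbands = {[4 8], [8 12], [13 30]};        % hypothetical inputs, one per job
bandwidth = @(band) band(2) - band(1);       % hypothetical function to apply

result = cellfun(bandwidth, freqbands, 'UniformOutput', false);

% the output is a cell-array with the same size as the input
disp(size(result));
disp(result{3});
```

Each element of freqbands plays the role of one independent job, which is why this kind of loop distributes so easily.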

After this tutorial you should be able to execute your multi-subject analysis in parallel and design
analysis scripts that allow for easy parallelization, either over subjects or over parameters used in
the analysis.

In this tutorial we use the qsub toolbox that is released along with FieldTrip. There are alternative
methods for distributed computing, such as the MATLAB parallel computing toolbox (e.g. using
parfor or dfeval) or with the peer-to-peer toolbox (also included with FieldTrip). More general
information about the different approaches for distributed processing in MATLAB can be found in
the frequently asked questions.

Background

The qsub distributed computing toolbox was implemented with FieldTrip (and SPM) in mind. At the
moment however, FieldTrip itself does not yet make use of distributed computing, i.e. FieldTrip
functions do not automatically distribute the workload. We are of course planning to make that
possible, i.e. that a single cfg.parallel='yes' option will automatically distribute the computational
load over all available nodes.

At the moment the only way of distributing the workload over multiple nodes requires that you adapt
your scripts. The easiest approach is to distribute the analysis of multiple subjects and
conditions over multiple nodes.

http://www.fieldtriptoolbox.org/ Printed on 2016/06/20 07:34


Procedure

To distribute your processes and to speed up your analyses, we provide two examples. The first
example script will show you how to use basic FieldTrip functions for the distribution.

Using the basic FieldTrip functions in a memory efficient manner requires that you save the
intermediate data of each step to disk, and that you load it upon the next (parallel) step in the
analysis. If you prefer not to store all intermediate results, or if you want to have more control over
other aspects of the parallel execution, you can provide your own functions that are executed in
parallel. This is demonstrated in the second example script.

The distributed operations of FieldTrip functions in this example require the original MEG datasets
for the four subjects, which are available from

ftp://ftp.fieldtriptoolbox.org/pub/fieldtrip/tutorial/Subject01.zip
ftp://ftp.fieldtriptoolbox.org/pub/fieldtrip/tutorial/Subject02.zip
ftp://ftp.fieldtriptoolbox.org/pub/fieldtrip/tutorial/Subject03.zip
ftp://ftp.fieldtriptoolbox.org/pub/fieldtrip/tutorial/Subject04.zip

Or, when at the Donders Centre for Cognitive Neuroimaging, use

cd /home/common/matlab/fieldtrip/data

Example 1: using only FieldTrip functions in distributed computing

This example script demonstrates how to run basic FieldTrip functions in parallel. The idea is
schematically depicted in the following figure.

subjectlist = {
'Subject01.ds'
'Subject02.ds'
'Subject03.ds'
'Subject04.ds'
};

conditionlist = {
'FC'
'FIC'
'IC'
};

triggercode = [
9
3
5
];

% start with a new and empty configuration
cfg = {};

for subj=1:4
  for cond=1:3
    cfg{subj,cond} = [];
    cfg{subj,cond}.dataset = subjectlist{subj};
    cfg{subj,cond}.trialdef.prestim = 1;
    cfg{subj,cond}.trialdef.poststim = 2;
    cfg{subj,cond}.trialdef.eventtype = 'backpanel trigger';
    cfg{subj,cond}.trialdef.eventvalue = triggercode(cond);
  end
end

cfg = qsubcellfun(@ft_definetrial, cfg);

% this extends the previous configuration
for subj=1:4
  for cond=1:3
    cfg{subj,cond}.channel = {'MEG', '-MLP31', '-MLO12'};
    cfg{subj,cond}.demean = 'yes';
    cfg{subj,cond}.baselinewindow = [-0.2 0];
    cfg{subj,cond}.lpfilter = 'yes';
    cfg{subj,cond}.lpfreq = 35;
  end
end

data = qsubcellfun(@ft_preprocessing, cfg);

% start with a new and empty configuration
cfg = {};

for subj=1:4
  for cond=1:3
    % timelockanalysis does not require any non-default settings
    cfg{subj,cond} = [];
  end
end

timelock = qsubcellfun(@ft_timelockanalysis, cfg, data);

% from here on we won't process the data in parallel any more

% average each condition over all subjects
cfg = [];
avgFC = ft_timelockgrandaverage(cfg, timelock{:,1});
avgFIC = ft_timelockgrandaverage(cfg, timelock{:,2});
avgIC = ft_timelockgrandaverage(cfg, timelock{:,3});

cfg = [];
cfg.layout = 'CTF151.lay';
ft_multiplotER(cfg, avgFC, avgFIC, avgIC);

cfg = [];
cfg.channel = {'MLC33', 'MLC43', 'MLP11', 'MLP12', 'MLP13', 'MLP33', ...
               'MLP34', 'MLT14', 'MLT15', 'MLT25'};
ft_singleplotER(cfg, avgFC, avgFIC, avgIC);

In the code above all data is processed by the distributed computers and subsequently returned to
the workspace of your desktop computer. The data can take quite a lot of RAM, which you can check
like this.

>> whos
  Name               Size             Bytes  Class     Attributes

  conditionlist      3x1                350  cell
  subjectlist        4x1                544  cell
  triggercode        3x1                 24  double
  cfg                4x3            9732384  cell
  data               4x3         1203237420  cell
  timelock           4x3           65515200  cell
  ...
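
To get a single number instead of reading the Bytes column by eye, the output of whos can be captured in a variable and summed. This is a minimal sketch with small made-up arrays standing in for the large data and timelock cell-arrays of the tutorial; only the whos call and the summation carry over.

```matlab
% hypothetical stand-ins for the large variables in the workspace
data     = zeros(1000, 100);   % 1000*100 doubles, 8 bytes each
timelock = zeros(100, 100);    % 100*100 doubles, 8 bytes each

info = whos('data', 'timelock');       % one struct per variable
totalbytes = sum([info.bytes]);        % add up the Bytes column
fprintf('%d variables, %.2f MB in total\n', numel(info), totalbytes/2^20);
```

Calling whos without arguments and summing in the same way gives the total for the whole workspace.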

Instead of returning the 12 variables for the different subjects and conditions all to your workspace,
you can also use the cfg.inputfile and cfg.outputfile options to have the distributed computers
read/write the data to/from disk. For example the section on ft_preprocessing and
ft_timelockanalysis could be changed into

% ...

for subj=1:4
  for cond=1:3
    cfg{subj,cond}.channel = {'MEG', '-MLP31', '-MLO12'};
    cfg{subj,cond}.demean = 'yes';
    cfg{subj,cond}.baselinewindow = [-0.2 0];
    cfg{subj,cond}.lpfilter = 'yes';
    cfg{subj,cond}.lpfreq = 35;
    cfg{subj,cond}.outputfile = sprintf('subj%02d_cond%02d_raw.mat', subj, cond);
  end
end

% note that here we don't specify an output parameter
qsubcellfun(@ft_preprocessing, cfg);

cfg = {};

for subj=1:4
  for cond=1:3
    cfg{subj,cond}.inputfile = sprintf('subj%02d_cond%02d_raw.mat', subj, cond);
    cfg{subj,cond}.outputfile = sprintf('subj%02d_cond%02d_avg.mat', subj, cond);
  end
end

% note that here we don't specify an output parameter
qsubcellfun(@ft_timelockanalysis, cfg);

% ...
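
The hand-off through disk used above can be tried out without a cluster or FieldTrip. In this minimal sketch the scratch directory, the variable name value and the placeholder numbers are made up for illustration, and plain save and load stand in for whatever the FieldTrip functions do with cfg.outputfile and cfg.inputfile.

```matlab
% step 1: each job saves its result to its own file and returns nothing
outdir = tempname;                 % hypothetical scratch directory
mkdir(outdir);
filename = cell(4, 3);
for subj=1:4
  for cond=1:3
    filename{subj,cond} = fullfile(outdir, sprintf('subj%02d_cond%02d_raw.mat', subj, cond));
    value = 10*subj + cond;        % placeholder for the real data structure
    save(filename{subj,cond}, 'value');
  end
end

% step 2: a later job loads only the file that it needs
tmp = load(filename{2,3});
disp(tmp.value);
```

Because only file names travel between the steps, the workspace of the submitting MATLAB session stays small regardless of the size of the data.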

Example 2: writing custom functions for distributed computing

This example script demonstrates how you can efficiently design your custom code for distributed
computing.

subjectlist = {
'Subject01.ds'
'Subject02.ds'
'Subject03.ds'
'Subject04.ds'
};

conditionlist = {
'FC'
'FIC'
'IC'
};

triggercode = [
9
3
5
];

% start with a new and empty configuration
cfg1 = {};
cfg2 = {};
cfg3 = {};
cfg4 = {};
outputfile = {};

for subj=1:4
  for cond=1:3
    % this is for definetrial and preprocessing
    cfg1{subj,cond} = [];
    cfg1{subj,cond}.dataset = subjectlist{subj};
    cfg1{subj,cond}.trialdef.prestim = 1;
    cfg1{subj,cond}.trialdef.poststim = 2;
    cfg1{subj,cond}.trialdef.eventtype = 'backpanel trigger';
    cfg1{subj,cond}.trialdef.eventvalue = triggercode(cond);
    cfg1{subj,cond}.channel = {'MEG', '-MLP31', '-MLO12'};
    cfg1{subj,cond}.demean = 'yes';
    cfg1{subj,cond}.baselinewindow = [-0.2 0];
    cfg1{subj,cond}.lpfilter = 'yes';
    cfg1{subj,cond}.lpfreq = 35;

    % this is for timelockanalysis
    cfg2{subj,cond} = [];

    % this is for megplanar
    cfg3{subj,cond} = [];

    % this is for combineplanar
    cfg4{subj,cond} = [];

    % this defines the file that will contain the output
    outputfile{subj,cond} = sprintf('subj%02d_cond%02d_combined.mat', subj, cond);
  end
end

% note that the "preproc_timelock_planar" function is defined further down in this tutorial
qsubcellfun(@preproc_timelock_planar, cfg1, cfg2, cfg3, cfg4, outputfile);

% let's now load the individual subject data from the 12 *.mat files and
% average it for subsequent plotting
timelock = {};
for subj=1:4
  for cond=1:3
    tmp = load(sprintf('subj%02d_cond%02d_combined.mat', subj, cond));
    timelock{subj,cond} = tmp.combined;
    clear tmp
  end
end

% average each condition over all subjects
cfg = [];
avgFC = ft_timelockgrandaverage(cfg, timelock{:,1});
avgFIC = ft_timelockgrandaverage(cfg, timelock{:,2});
avgIC = ft_timelockgrandaverage(cfg, timelock{:,3});

cfg = [];
cfg.layout = 'CTF151.lay';
ft_multiplotER(cfg, avgFC, avgFIC, avgIC);

This way you can distribute your custom function (e.g. see below) along with the input and output
parameters.

function preproc_timelock_planar(cfg1, cfg2, cfg3, cfg4, outputfile)

cfg1 = ft_definetrial(cfg1);
data = ft_preprocessing(cfg1);

timelock = ft_timelockanalysis(cfg2, data);
clear data

planar = ft_megplanar(cfg3, timelock);
clear timelock

combined = ft_combineplanar(cfg4, planar);
clear planar

save(outputfile, 'combined');
clear combined
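
The call to qsubcellfun in Example 2 passes five cell-arrays of the same size, and each job receives the matching element from every one of them. The pairing can be checked locally with a minimal sketch that uses the built-in cellfun as a stand-in for qsubcellfun; the two input arrays and the describe function are made up for illustration.

```matlab
% two cell-arrays of equal size, one element per subject/condition pair
scale = {1, 2; 3, 4};                          % hypothetical first input
label = {'a', 'b'; 'c', 'd'};                  % hypothetical second input

% hypothetical worker: receives element (i,j) of every input cell-array
describe = @(s, l) sprintf('%s=%d', l, 10*s);

out = cellfun(describe, scale, label, 'UniformOutput', false);
disp(out);
```

Running the real worker function this way on one or two elements is a cheap check that the argument order is right before the jobs are queued.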

Summary and suggested further reading

This tutorial covered how to distribute your computations/workload over multiple computers in a
cluster that uses the Torque or SGE batch queue system. In our example, we have performed a
relatively simple timelock analysis (ERF) on MEG data, but one can imagine that it does not need
many adjustments to distribute any other type of analysis. Using the configuration demonstrated in
Example 2, one can distribute any form of analysis.

FAQs related to issues in this tutorial:

Does a firewall affect the communication between peers?
How can I combine FieldTrip with peer distributed computing?
How can I debug a problematic distributed job?
How can I determine the number of threads that MATLAB uses?
How can I prevent a job from executing twice?
How can I read and write files if I use other people's peers?
How can I set up the peer distributed computing on a large Linux cluster?
How can I set up the peer distributed computing on a single multicore computer?
How can I set up the peer distributed computing on a small number of computers?
How can I stop the different threads created by peermaster and peerslave?
How can I use the command-line peerslave and optimize the MATLAB licenses?
How do I avoid having to allocate N copies of my data if I want to execute N jobs?
How does the peer smartshare algorithm work?
How does the smartmem algorithm work?
How should I call peercellfun when a function requires many inputs (e.g. key-value pairs)?
How to compile MATLAB code into stand-alone executables?
How to get started with distributed computing using qsub?
How to get started with peer distributed computing on my own desktop computer?
How to get started with the MATLAB distributed computing toolbox?
What are the different approaches I can take for distributed computing?
What happens if a job fails to execute properly?
What happens with a job that has an error on the slave?
Why are the peers using multicast to announce themselves?
Why does peercellfun resubmit jobs that take too long to get started?

From:
http://www.fieldtriptoolbox.org/ - FieldTrip

Permanent link:
http://www.fieldtriptoolbox.org/tutorial/distributedcomputing

Last update: 2014/01/22 09:26
