Abstract | In this paper, we give a brief review of pattern classification algorithms based on
discriminant analysis. We then apply these algorithms to classify movement direction based on
multivariate local field potentials recorded from a microelectrode array in the primary motor
cortex of a monkey performing a reaching task. We obtain prediction accuracies between 55%
and 90% using different methods which are significantly above the chance level of 12.5%.
Journal of the Indian Institute of Science VOL 87:4 Oct–Dec 2007 journal.library.iisc.ernet.in 497
REVIEW Hariharan Nalatore et al.
Fast Robust Pattern Classification Algorithms for Real Time Neuro-Motor Prosthetic Applications REVIEW
primarily based on mean differences and the other based on covariance differences.

Under the assumption of multivariate normality (of the pdf's of different classes), the Bhattacharyya distance is computed by the equation:

\[
B = \frac{1}{8}\,[\mu_f - \mu_g]^T \left[\frac{\Sigma_f + \Sigma_g}{2}\right]^{-1} [\mu_f - \mu_g]
  + \frac{1}{2}\ln\!\left(\frac{\left|\frac{\Sigma_f + \Sigma_g}{2}\right|}{\sqrt{|\Sigma_f|\,|\Sigma_g|}}\right).
\]

Here $\mu_f$, $\mu_g$ are mean vectors and $\Sigma_f$, $\Sigma_g$ are covariance matrices of the two classes f and g. We pick those optimal features (e.g. channel averaged power at some optimal frequency) which maximise the Bhattacharyya distance in the feature space.

3. The partitioning of feature space

To classify a point in feature space, we should be able to divide the feature space into an exhaustive set of non-overlapping regions, one for each group of interest, so that every point in the space is uniquely associated with one of the named classes (groups). The locus of points which separates a pair of groups is called a 'decision boundary' [21]. These boundaries are estimated using training samples.

The case of multi-group classification (m > 2) is intrinsically harder than binary (m = 2) classification because the classification algorithm has to construct more decision boundaries, for which each group has to be explicitly defined. For m groups in the feature space, the probability of correctly classifying a point by chance is 1/m.

The modeling of each group is nothing but the modeling of the probability density function for that group in feature space. It must be done in such a way that different groups are as distinct from one another as possible. How well a classifier will work depends on how well the true class density functions can be determined from the training samples.

In the case of the Gaussian quadratic norm classifier, the decision boundary in the feature space will be a second-order hypersurface, and its form and location between the class mean values will depend on the $C_i$'s. This classifier is a maximum likelihood classifier.

4. The experiment

We consider an experiment in which a well trained monkey performs a sensorimotor cognitive task as described below. This experiment was performed at John Donoghue's laboratory at Brown University, USA.

The monkey sat on a chair and in front of her, there was a vertical panel with a center button and 8 buttons that were radially placed around the center one. Reaching one of these 8 buttons corresponded to a movement direction of 0°, 45°, 90°, 135°, 180°, 225°, 270° and 315°. The 0° movement corresponded to moving horizontally to the right. The other directions were marked in an anticlockwise direction with respect to this 0° direction.

The sequence of events in the task was as follows. First, the center button lit up for a short time. If the monkey reached the center button in less than 2 s, a trial began. The monkey had to hold the button for at least 500 ms, otherwise the trial was aborted. In any given trial, one of the radial buttons, chosen at random, glowed for 150 ms (this happened at a random time in the interval [500 ms, 1000 ms]). The start time of this event is here referred to as the "precue time". After this instructed delay, all the radial buttons glowed at once. This time is called the "gocue time". The monkey had to move its arm in that direction where the single radial button had lit up initially. If the monkey reached the correct radial button within 2 s after gocue, the trial was considered to be successful. On the other hand, if the monkey did not reach the correct radial button and held the center button for at least 50 ms, the target was aborted.

In this manner, 411 trials of successful hand reaching movements were performed by the monkey. In each trial, the multichannel Local Field Potentials (LFPs) (which can be thought of as embodying the collective synaptic input of local neuronal clusters [22]) were recorded from a 96-microelectrode array [17] chronically implanted in the primary motor cortex of the monkey contralateral to the moving arm. All correct trials were rewarded with juice. Of the 47 channels of data that were recorded, 43 were found to be good for further data analysis.

Figure 1 shows the spatial map of the multichannel electrodes during a recording session.

By considering a single trial multichannel LFP over a brief time interval of about 250 ms prior to gocue (i.e. during the instructed delay period), our goal is to decode the direction of hand movement based on spatiotemporal correlation patterns within the LFP signal. We approach this as a pattern classification problem. First we train a classifier using a training data set (which is a subset of the full LFP data). Once this is done, we assign each individual trial in the remaining data set (called the testing data set) to one of the eight possible directions using various pattern classification algorithms. Moreover, this needs to be accomplished before the hand movement actually starts. Since we know the end result for each trial, we then determine the probability of successful prediction. In our case, since we have to predict the direction of movement in one of eight possible directions, the probability of obtaining the correct result by chance is 0.125.
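To make the Bhattacharyya distance of Section 2 and the train/test procedure described above concrete, the following is a minimal sketch in Python with NumPy. It is not the authors' implementation: the function names (`bhattacharyya_distance`, `fit_gaussian_classes`, `predict_quadratic`), the synthetic two-dimensional features and the even/odd train/test split are all assumptions made for illustration. The classifier is written as a plain Gaussian maximum likelihood rule with one mean vector and one covariance matrix per class, which is one common reading of the "Gaussian quadratic norm" classifier.

```python
import numpy as np

def bhattacharyya_distance(mu_f, cov_f, mu_g, cov_g):
    """Bhattacharyya distance between two multivariate normal class densities."""
    mu_f, mu_g = np.asarray(mu_f, dtype=float), np.asarray(mu_g, dtype=float)
    cov_f, cov_g = np.asarray(cov_f, dtype=float), np.asarray(cov_g, dtype=float)
    cov_avg = 0.5 * (cov_f + cov_g)
    diff = mu_f - mu_g
    # First term: separation due to the difference of the class means.
    mean_term = 0.125 * diff @ np.linalg.solve(cov_avg, diff)
    # Second term: separation due to the difference of the covariances.
    # slogdet returns (sign, log|det|), which is safer than log(det(...)).
    logdet_avg = np.linalg.slogdet(cov_avg)[1]
    logdet_f = np.linalg.slogdet(cov_f)[1]
    logdet_g = np.linalg.slogdet(cov_g)[1]
    cov_term = 0.5 * (logdet_avg - 0.5 * (logdet_f + logdet_g))
    return mean_term + cov_term

def fit_gaussian_classes(X, y):
    """Estimate a mean vector and a covariance matrix for every class."""
    return {c: (X[y == c].mean(axis=0),
                np.atleast_2d(np.cov(X[y == c], rowvar=False)))
            for c in np.unique(y)}

def predict_quadratic(X, params):
    """Assign each row of X to the class with the largest Gaussian log-likelihood."""
    classes = sorted(params)
    scores = []
    for c in classes:
        mu, cov = params[c]
        diff = X - mu
        maha = np.einsum("ij,ij->i", diff @ np.linalg.inv(cov), diff)
        logdet = np.linalg.slogdet(cov)[1]
        scores.append(-0.5 * (maha + logdet))  # log-likelihood up to a constant
    return np.array(classes)[np.argmax(scores, axis=0)]

# A mean shift of (2, 0) with identity covariances gives B = (1/8) * 4 = 0.5.
example_distance = bhattacharyya_distance([0, 0], np.eye(2), [2, 0], np.eye(2))

# Synthetic stand-in for the experiment: 8 classes (directions), 2 features.
rng = np.random.default_rng(0)
n_classes, n_per_class = 8, 60
angles = 2 * np.pi * np.arange(n_classes) / n_classes
means = 3.0 * np.column_stack([np.cos(angles), np.sin(angles)])
X = np.vstack([rng.normal(m, 1.0, size=(n_per_class, 2)) for m in means])
y = np.repeat(np.arange(n_classes), n_per_class)

train = np.arange(len(y)) % 2 == 0            # even-indexed trials for training
params = fit_gaussian_classes(X[train], y[train])
accuracy = np.mean(predict_quadratic(X[~train], params) == y[~train])
chance = 1.0 / n_classes                      # 0.125 for eight directions
print(f"accuracy = {accuracy:.3f}, chance = {chance:.3f}")
```

The decision boundaries of this rule are quadratic because each class keeps its own covariance matrix; forcing a single shared covariance would reduce them to hyperplanes.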
Figure 2: Power spectrum in all channels for 45 degree direction during instructed delay period. [Plot omitted; horizontal axis: Frequency (Hz), 0 to 100.]
the corresponding electrodes are all closely spaced (within a distance of approximately 4 mm), the powers in different channels can be quite similar (i.e. not independent) and this can lead to overfitting. To avoid this, principal component analysis can be used to reduce the number of variables.

The objective of principal component analysis [25] is to take p variables $X_1, X_2, X_3, \ldots, X_p$ and find combinations of these to produce new variables $Z_1, Z_2, Z_3, \ldots, Z_p$ that are uncorrelated. These new variables are ordered so that $Z_1$ explains the largest amount of variation in the data, $Z_2$ explains the second largest amount of variation and so on. That is, $\mathrm{Var}(Z_1) \ge \mathrm{Var}(Z_2) \ge \ldots \ge \mathrm{Var}(Z_p)$. The new variables $Z_i$ are called principal components. In performing principal component analysis, there is a possibility that the variances of most $Z_i$'s will be so low as to be negligible. In such a case, the variation in the data set can be adequately described by the first few $Z_i$ variables whose variances are not negligible. Hence, principal component analysis is used to reduce the dimensionality of the problem and to transform interdependent coordinates into significant and uncorrelated ones.

The ith principal component $Z_i$ ($1 \le i \le p$) is given by

\[
Z_i = a_{i1} X_1 + a_{i2} X_2 + \cdots + a_{ip} X_p .
\]

$\mathrm{Var}(Z_i)$ is as large as possible subject to the constraint that

\[
a_{i1}^2 + a_{i2}^2 + a_{i3}^2 + \cdots + a_{ip}^2 = 1 .
\]

Also, $Z_i$ is uncorrelated with $Z_{i-1}, Z_{i-2}, \ldots, Z_2$ and $Z_1$. Without the normalization constraint on the coefficients $a_{ij}$, $\mathrm{Var}(Z_i)$ could be increased simply by increasing one of the $a_{ij}$ (for a given i).

The determination of the coefficients $a_{ij}$ is an eigenvalue problem of the sample covariance matrix C. The variances of the principal components are the eigenvalues of the matrix C. Let the eigenvalues of C be arranged as $\lambda_1 \ge \lambda_2 \ge \lambda_3 \ge \cdots \ge \lambda_p \ge 0$. Then $\lambda_i$ corresponds to the ith principal component $Z_i$. The coefficients $a_{i1}, a_{i2}, \ldots, a_{ip}$ are the elements of the corresponding eigenvector (these are scaled so that $\sum_{j=1}^{p} a_{ij}^2 = 1$).

5.2.2. Implementation

By considering the multitrial multichannel LFP recordings over the time interval of [250 ms, 500 ms], we compute the power in the gamma frequency band using the multitaper method for all the good channels and for all trials belonging to the training set. We choose a frequency $f_a$ in the gamma band. We transform the multichannel power at the frequency $f_a$ by principal component analysis and pick 5 principal components $Z_i$. Each of these principal components is a linear combination of the power in different channels at the frequency $f_a$. The first 5 components explain between 80 and 90 percent of the total variance across the eight directions. By evaluating these principal components for all trials belonging to the various directions, we determine the parameters of the classifier for the various directions. Here again, we use the Mahalanobis distance as the classifier.

To test the performance of the classifier on the testing set, we perform the actual classification by considering each single trial belonging to the testing set. We proceed as follows. To get the temporal evolution of the prediction, we divide the single trial multichannel LFP recordings using highly overlapped time windows of 150 ms duration. For the data in each time slice, the multichannel power at the frequency $f_a$ computed by the multitaper method is transformed to get 5 principal components, from which the Mahalanobis distance is determined for the various directions. The given trial is assigned to that direction for which the Mahalanobis distance is minimum. This classification, when repeated for all single trial data and for all time slices, gives the evolution of the probability of correct prediction of the monkey's movement direction.

This procedure is repeated for various frequencies $f_a$ in the gamma band, and the optimal frequency that gives the best predictability is chosen. The graphs of the probability of correct prediction for the various directions are shown in Figures 5 and 6.

5.3. Prediction by Canonical Discriminant function method

5.3.1. Theory

Next, we consider an alternative method for pattern classification, the canonical discriminant function method [25]. Here, we determine functions $Z_i$ which are linear combinations of the variables $X_1, X_2, X_3, \ldots, X_p$ that enable us to separate the m directions (groups) as much as possible. Hence Z

Figure 3: Time evolution of probability of correct prediction by single frequency feature space approach for various directions. The horizontal dashed line indicates chance level.
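The principal component approach of Section 5.2 (eigen-decomposition of the sample covariance matrix C, projection onto 5 components, then assignment to the direction with the smallest Mahalanobis distance) can be sketched as follows in Python with NumPy. This is only an illustration under stated assumptions: the data are synthetic stand-ins for the multichannel power at one frequency, the names `pca_fit`, `fit_classes` and `classify_mahalanobis` are hypothetical, and using a separate mean and covariance matrix per direction is an assumption about what "the parameters of the classifier" are.

```python
import numpy as np

def pca_fit(X, n_components):
    """Mean, leading eigenvectors and all eigenvalues of the covariance of X."""
    mean = X.mean(axis=0)
    C = np.cov(X, rowvar=False)              # sample covariance matrix C (p x p)
    eigvals, eigvecs = np.linalg.eigh(C)     # symmetric input, ascending eigenvalues
    order = np.argsort(eigvals)[::-1]        # reorder: largest variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    return mean, eigvecs[:, :n_components], eigvals

def fit_classes(Z, y):
    """Mean vector and covariance matrix of the components for each direction."""
    return {c: (Z[y == c].mean(axis=0), np.cov(Z[y == c], rowvar=False))
            for c in np.unique(y)}

def classify_mahalanobis(Z, params):
    """Assign each row of Z to the direction with the smallest Mahalanobis distance."""
    classes = sorted(params)
    d2 = []
    for c in classes:
        mu, cov = params[c]
        diff = Z - mu
        d2.append(np.einsum("ij,jk,ik->i", diff, np.linalg.inv(cov), diff))
    return np.array(classes)[np.argmin(d2, axis=0)]

# Synthetic stand-in: 43 correlated "channels" driven by 5 latent sources.
rng = np.random.default_rng(1)
n_dir, n_trials, n_chan, n_latent = 8, 40, 43, 5
mixing = rng.normal(size=(n_latent, n_chan))
latent_means = rng.normal(scale=3.0, size=(n_dir, n_latent))
y = np.repeat(np.arange(n_dir), n_trials)
latent = latent_means[y] + rng.normal(size=(len(y), n_latent))
X = latent @ mixing + 0.5 * rng.normal(size=(len(y), n_chan))

train = np.arange(len(y)) % 2 == 0           # even-indexed trials for training
mean, components, eigvals = pca_fit(X[train], n_components=5)
Z_train = (X[train] - mean) @ components     # Z_i = a_i1 X_1 + ... + a_ip X_p
Z_test = (X[~train] - mean) @ components

params = fit_classes(Z_train, y[train])
accuracy = np.mean(classify_mahalanobis(Z_test, params) == y[~train])
explained = eigvals[:5].sum() / eigvals.sum()
print(f"variance explained by 5 components: {explained:.2f}")
print(f"accuracy: {accuracy:.2f} (chance {1 / n_dir:.3f})")
```

The columns of `components` are the unit-norm eigenvectors of C, so the projected training data have a diagonal covariance matrix whose entries are the leading eigenvalues.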
Figure 4: Time evolution of probability of correct prediction by single frequency feature space approach for various directions. The horizontal dashed line indicates chance level.
Figure 5: Time evolution of probability of correct prediction by principal component analysis approach for various directions. The horizontal dashed line indicates chance level.
$Z_i$ being uncorrelated with $Z_1, Z_2, Z_3, \ldots$ and $Z_{i-1}$ within groups. Thus the functions $Z_1, Z_2, Z_3, \ldots, Z_s$ are chosen in such a way that $Z_1$ reflects group differences as much as possible; $Z_2$ captures as much as possible of the group differences not displayed by $Z_1$; $Z_3$ captures as much as possible of the group differences not displayed by $Z_1$ and $Z_2$; and so on.

evolution of the probability of the correct prediction. This procedure is repeated for each frequency $f_a$ in the gamma band and we pick the optimal frequency feature which gives the best predictability of the monkey's movement target. It was found that this procedure does not give good predictions. Hence we have not displayed the figures.
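As an illustration of how canonical discriminant functions can be computed, the sketch below uses the standard textbook formulation: the coefficient vectors are the leading solutions of the generalized eigenvalue problem B a = λ W a, where W and B are the within-group and between-group sums of squares and cross-products matrices, and at most s = min(p, m − 1) functions exist. This formulation, the function names `scatter_matrices` and `canonical_discriminant_functions`, and the synthetic data are assumptions for illustration; the source does not spell out its own computation here.

```python
import numpy as np

def scatter_matrices(X, y):
    """Within-group (W) and between-group (B) sums of squares and cross-products."""
    grand_mean = X.mean(axis=0)
    p = X.shape[1]
    W, B = np.zeros((p, p)), np.zeros((p, p))
    for c in np.unique(y):
        Xc = X[y == c]
        centred = Xc - Xc.mean(axis=0)
        W += centred.T @ centred
        d = Xc.mean(axis=0) - grand_mean
        B += len(Xc) * np.outer(d, d)
    return W, B

def canonical_discriminant_functions(X, y):
    """Eigenvalues and coefficient vectors (columns of A) solving B a = lambda W a."""
    W, B = scatter_matrices(X, y)
    s = min(X.shape[1], len(np.unique(y)) - 1)   # number of useful functions
    # With W = L L^T (Cholesky), the problem becomes symmetric:
    # (L^-1 B L^-T) v = lambda v, with a = L^-T v.
    L_inv = np.linalg.inv(np.linalg.cholesky(W))
    eigvals, V = np.linalg.eigh(L_inv @ B @ L_inv.T)
    order = np.argsort(eigvals)[::-1][:s]        # largest group separation first
    A = L_inv.T @ V[:, order]
    return eigvals[order], A

# Synthetic example: m = 3 groups, p = 4 variables, so s = 2 functions.
rng = np.random.default_rng(2)
group_means = np.array([[0, 0, 0, 0], [3, 1, 0, 0], [0, 2, 2, 0]], dtype=float)
y = np.repeat(np.arange(3), 30)
X = group_means[y] + rng.normal(size=(90, 4))

eigvals, A = canonical_discriminant_functions(X, y)
Z = X @ A                                        # canonical variables Z_1, Z_2
print("between/within ratios:", eigvals)
```

Because the eigenvectors v are orthonormal, the coefficient vectors satisfy a_i^T W a_j = 0 for i ≠ j, which is the statement that the functions are uncorrelated within groups.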
Figure 6: Time evolution of probability of correct prediction by principal component analysis approach for various directions. The horizontal dashed line indicates chance level.
(at frequency $f_a$). Hence, we have an estimate of the pdf's for each direction (group) from the training set.

Given a single trial LFP recording, we choose the multichannel recording in a given time window (slice). By evaluating the channel averaged power at the frequency $f_a$, we assign this trial to that direction that gives the maximum a posteriori probability. In the same manner, the direction allocation is repeated for all trials in the testing set. Repeating this procedure for all time windows enables us to get the temporal evolution of the probability of correct prediction. This procedure is repeated for all frequencies in the gamma band and the frequency that gives the best predictability is chosen. The results obtained are shown in Figures 7 and 8.

Figure 7: Time evolution of probability of correct prediction for various directions by histogram approach. The horizontal dashed line indicates chance level.

Figure 8: Time evolution of probability of correct prediction for various directions by histogram approach. The horizontal dashed line indicates chance level.

5.5. General remarks on subsequent methods

We now describe other methods that we used to classify the testing set data and obtain predictions of movement directions. However, none of these methods gave results better than the first few methods. Hence only a brief description of each method will be given for the sake of completeness. No figures will be included.

The following key steps are common to each of the subsequent methods for decoding the monkey's movement direction:

(1) By using the multitrial multichannel LFP recordings in the time interval [250 ms, 500 ms] during the instructed delay, we compute the power spectrum by the multitaper method for all channels and trials. The power in the gamma frequency band [31 Hz, 55 Hz] is used during the classification of single trial data in various directions.

(2) While doing actual classification on training set, we divide the single trial multichannel data using highly overlapped time windows of 150 ms duration, as the signal is non-stationary. We classify the trial using some algorithm in each time slice. As we slide the time window across the data, we obtain the temporal evolution of the probability of correct prediction for each of the methods.

5.6. Two frequency feature space approach

In this method, to choose a set of optimal frequencies (similar to the single frequency $f_a^*$ in the previous methods), we make use of the Bhattacharyya distance. First, the power in the gamma band is averaged over channels for all trials and frequencies. We want to pick a pair of frequencies $(f_1^*, f_2^*)$ so that the channel averaged power at this pair of frequencies separates the pdf's of the directions the most. To do this, we evaluate the Bhattacharyya distance using channel averaged power at all possible pairs of frequencies $(f_1, f_2)$. The optimal pair of frequencies $(f_1^*, f_2^*)$ corresponds to those frequencies for which the Bhattacharyya distance is maximum. The classifier used here is the Gaussian quadratic norm. The parameters of the classifier are evaluated using channel averaged power at this pair of optimal frequencies for each direction. The time evolution of the probability of correct prediction is obtained in the usual manner.

5.7. Four channel feature space approach

The best feature subset is obtained as follows. We sum the power over frequencies in the gamma band for each channel and trial from the training set. Here, the frequency summed power of the ith channel ($1 \le i \le 43$) is considered as the ith feature. We form all possible 4-tuple channel combinations of these frequency summed powers. For each such 4-tuple channel combination, we obtain the pairwise separability using the Bhattacharyya distance between all possible pairs of directions.

We consider that direction pair for which the average Bhattacharyya distance across all channel combinations is the least. Then for this direction pair, we choose that channel combination which gives the largest distance. Here, we are trying to maximize the successful classification for that pair of directions which is hardest to classify. By using the frequency summed power of this optimal 4-channel combination, we determine the parameters of the Gaussian quadratic norm for each direction. The time evolution of the probability of correct prediction is obtained in the usual manner.

5.8. 3-channel 1-frequency feature space approach

Here, the power at a given frequency $f_1$ of the ith channel is considered to be the ith feature to be used. We form a 3-tuple channel combination of power at the frequency $f_1$ and obtain pairwise separability using the Bhattacharyya distance between all pairs of directions. This pairwise separability distance is averaged across the pairs of directions for all 3-tuple channel powers (at frequency $f_1$). We pick that 3-tuple channel combination of power at the frequency $f_1^*$ for which the across direction average distance is maximum. Using this optimal 3-tuple channel combination of power at the frequency $f_1^*$, we determine the parameters of the Gaussian quadratic norm for all directions using the training set.

5.9. 3-channel 2-frequency feature space approach

Here also, the ith channel power at a given frequency is considered to be the ith feature to be used. We form a 3-tuple channel combination of power at a pair of frequencies $(f_1, f_2)$ and obtain pairwise separability using the Bhattacharyya distance between all pairs of directions. This pairwise separability distance is averaged across the pairs of directions for all possible 3-tuple combinations (of power at all pairs of frequencies). We pick the optimal 3-tuple channel combination of power at the pair of frequencies $(f_1^*, f_2^*)$ for which the across direction average distance is a maximum. Using this 3-tuple channel combination of power at the pair of frequencies $(f_1^*, f_2^*)$, the parameters of the various directions are determined. We again make use of the Gaussian quadratic norm as the classifier. The time evolution of the probability of correct prediction is obtained in the usual manner.

6. Reaction times

The reaction time is defined as the time difference between the go cue and the start of hand movement. It represents the total time taken by the visual signal to reach the brain and for the appropriate
10. Scherberger H., Jarvis M. R., Andersen R. A. (2005) Cortical local field potential encodes movement intentions in the posterior parietal cortex. Neuron 46, 347–354.
11. Mehring C., Nawrot M. P., Cardoso de Oliveira S., Vaadia E., Schulze-Bonhage A., Aertsen A., Ball T. (2005) Comparing information about arm movement direction in single channels of local and epicortical field potentials from monkey and human motor cortex. J. Physiol. 98, 498–506.
12. Rickert J., Cardoso de Oliveira S., Vaadia E., Aertsen A., Rotter S., Mehring C. (2005) Encoding of movement direction in different frequency ranges of motor cortical local field potentials. J. Neurosci. 25, 8815–24.
13. Kim, S. P., Simeral, J. D., Hochberg, L. R., Donoghue, J. P., Friehs, G. M., Black, M. J. (2007) Multi-state decoding of point-and-click control signals from motor cortical activity in a human with tetraplegia. Proceedings of the 3rd International IEEE EMBS Conference on Neural Engineering, pp. 486–489.
14. Truccolo, W. and Donoghue, J. P. (2007) Nonparametric modeling of Neural Point Processes via Stochastic Gradient Boosting Regression. Neural Computation, 19, 672–705.
15. Wu, W., Gao, Y., Bienenstock, E., Donoghue, J. P. and Black, M. J. (2005) Bayesian population coding of motor cortical activity using a Kalman Filter. Neural Computation, vol.18, pp. 80–1118.
16. Srinivasan, L., Eden, U.T., Mitter, S.K., Brown, E. N. (2007) General-purpose filter design for neural prosthetic devices. J Neurophys 98, 2456–2475.
17. Suner, S., Fellows, M. R., Vargas-Irwin, C., Nakata, K. and Donoghue, J. P. (2005) Reliability of signals from chronically implanted, silicon-based electrode array in non-human primate primary motor cortex. IEEE Trans. Neural Syst. Rehabil. Eng. 13, 524–541.
18. Scholkopf, B., Smola, A.J. (2002) Learning with Kernels - Support vector machines, optimization and beyond. MIT Press, MS.
19. Vapnik, V. (1998). Statistical Learning Theory. Wiley, New York.
20. Hastie, T., Tibshirani, T., and Friedman, J.H. (2001). Elements of Statistical Learning. Springer-Verlag, New York.
21. D. A. Landgrebe, Signal Theory Methods in Multispectral Remote Sensing, John Wiley, Hoboken, 2003.
22. C. M. Gray, J. Computat. Neurosci. 1, 11–38 (1994).
23. B. Pesaran et al., Nature Neurosci, 5, (2002), 805–811.
24. R. A. Andersen, J. W. Burdick, S. Musallam, B. Pesaran and J. Cham, Trends in Cognitive Sciences, 8, (2004), 486–493.
25. R. A. Johnson and D. W. Wichern, Applied Multivariate Statistical Analysis, Pearson, New Delhi, 2002.

Govindan Rangarajan completed his Ph.D. from University of Maryland, College Park, USA. Later he was employed as a Staff Scientist at the Lawrence Berkeley Laboratory, University of California, Berkeley. He returned to India in 1992 as an Assistant Professor in the Indian Institute of Science, Bangalore. He is currently Professor and Chairman, Department of Mathematics. He is also Convener, Digital Information Services Centre and Convener, IISc Mathematics Initiative. His research interests include nonlinear dynamics and chaos, applications of time series analysis to neuroscience, and applications of stochastic processes. He has over 60 papers to his credit. He was recently made a Chevalier (Knight) of the Order of Palms by the Government of France. He has been awarded the Homi Bhabha Fellowship and is a Fellow of the National Academy of Sciences.

Wilson Truccolo received the Ph.D. degree in Complex Systems and Brain Sciences from the Florida Atlantic University, Boca Raton, FL. He is currently a research Assistant Professor in the Neuroscience Department, Brown University, and a Research Fellow in the Neurology Department, Massachusetts General Hospital. Mr. Truccolo is a principal investigator having recently received a NIH-NINDS K01 career award. His research focuses on stochastic modeling of cortical dynamics & computation, sensorimotor neuroprostheses, neural modulation and control.

Hariharan Nalatore is currently a Project Scientist at Applied Research International, New Delhi. He obtained his Ph.D. from the Indian Institute of Science in 2005 and worked as a Postdoctoral Associate in the Department of Biomedical Engineering, University of Florida, USA. His primary research interest is in time series analysis applied to data generated from neuroscience experiments.