


From the Institute of Biosystems Engineering
and the
Institute of Production Engineering

Gerhard Jahns
Wojciech Kowalczyk
Klaus Walter

Sound Analysis to Recognize Individuals and Animal Conditions

Manuscript, available at www.fal.de

Braunschweig
Bundesforschungsanstalt für Landwirtschaft (FAL)
1998

Also available at: http://www.tb.fal.de/index.htm?page=/staff/jahns/


Sound Analysis to Recognize Individuals and Animal Conditions
Gerhard Jahns, Wojciech Kowalczyk, Klaus Walter¹

ABSTRACT

In the course of evolution, different species of animals have developed different means of
communication, e.g. sound. Ethologists and farmers are convinced that the sounds produced by
animals allow individual animals to be recognized and carry information about their state. This may
have practical implications for increasing the efficiency of keeping and handling farm and other
animals. This paper presents an application of digital signal processing techniques to identify
individual cows and to evaluate their condition using standard hardware.
Keywords: Sound analysis, Digital signal processing, FFT, Power spectral density,
Nearest-neighbour classifier

INTRODUCTION
Farmers know that animal sounds carry information, and some claim to be able to draw conclusions
from them. As farmers are faced with larger herds, increased yields and new demands for animal
breeding, there is a need for more precise information about their livestock. In dairy farming, for
example, efforts are made to monitor and interpret characteristics of individual animals in order to
detect abnormalities as early as possible. To this end, data such as food intake, activity (number of
steps per hour), body temperature, live weight, and amount of milk and its quality (pH value,
electric conductivity, fat and protein content, etc.) are collected from individual animals. The
detection and interpretation of animal vocalizations by technical means could be an additional clue,
enabling the farmer to obtain helpful information 24 hours a day.
It is well known that different species have developed manifold means of communication.
Vocalization is one of them. It is most probable that individual animals can be recognized, and
conclusions about their condition drawn, from their vocalizations. An advantage of sound is that a
single 'sensor', a microphone, can monitor a whole group or herd of animals.
The objective is to provide valuable information at reasonable cost and without additional
workload for the farmer. Sophisticated and expensive sound analysis equipment, as used in industry
and research or for speech recognition, is less suited to this purpose. The aim is to use low-cost
hardware such as a common PC with a standard sound card and a microphone, supplemented by
appropriate software. Even inexpensive sound cards have excellent frequency response curves.
Figure 1 shows the frequency response curve of a low-cost sound card. Because most microphones
and sound cards are tuned to the human hearing range (20-20000 Hz), it has to be verified that the
vocalizations of the species to be monitored lie within this frequency range. Fortunately this holds
true for most domestic animals. In the following, vocalizations of dairy cows serve as an example.
The results obtained so far demonstrate that it is possible to develop a digital signal
processing system that runs on standard hardware and is suitable for:
• identification of individual cows by their vocalizations (inter-class properties),
• identification of specific states of cows (intra-class properties).

1 Authors are Researcher, Professor, and Agricultural Engineer, respectively: Institute of Biosystems
Engineering, Federal Agricultural Research Centre, Braunschweig, Germany, e-mail: jahns@bst.fal.de;
Dept. of Mathematics and Computer Science, Vrije Universiteit, Amsterdam, Netherlands, e-mail:
wojtek@cs.vu.nl; and Institute of Production Engineering, Federal Agricultural Research Centre,
Braunschweig, Germany, e-mail: walter@bt.fal.de

XIII CIGR Congress on Agricultural Engineering, 2-6 February 1998, Rabat, Maroc Page 1
[Figure 1: frequency response plot. Vertical axis: dB (-10 to 10); horizontal axis: Frequency (20 Hz to 20 kHz)]

Figure 1. Frequency response range of a common sound card (Schwirzke and Hilgefort, 1997)

The applied method is based on an application of the Nearest-Neighbour algorithm (Dasarathy,
1991) to Power Spectral Density (PSD) characteristics of sounds (Proakis and Manolakis, 1996).
To get a better insight into the structure of cow vocalizations, the corresponding PSDs have
been mapped into 2-dimensional space. In this way it is possible to identify some regularities in
various sound samples. The reduction of dimension has been achieved by a combination of Principal
Component Analysis (PCA) and Multidimensional Scaling (MDS) (Cox and Cox, 1994).

COLLECTING VOICE SAMPLES


Voice samples first had to be collected in order to build a system that can identify individual
cows and their condition. All samples were split into two groups: a training or reference set (about
70% of cases) and a validation set (about 30%). The validation set was used for the final estimation
of the system performance. Recordings were made in an ordinary farm environment (with a noisy
background) and digitized. The sampling rate of 11025 Hz was sufficiently high, given that the main
frequencies of a typical moo (Figure 2) are below 5000 Hz (Figure 3). All samples were labeled by
human experts according to the state of the recorded cow: hunger, heat, milking delay, calving, claw
trimming, thirst, etc. To enable an analysis of inter- and intra-individual differences between the
samples, the cows' identifiers were also attached to the labels.
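
The adequacy of the sampling rate can be checked with the sampling theorem: a signal sampled at a given rate can represent frequencies up to half that rate. The short Python sketch below does this arithmetic for the figures quoted above; the function and constant names are illustrative and not part of the original system.

```python
def nyquist_frequency(sampling_rate_hz: float) -> float:
    """Highest frequency representable without aliasing at this sampling rate."""
    return sampling_rate_hz / 2.0


SAMPLING_RATE_HZ = 11025    # sampling rate stated in the text
MAX_SIGNAL_FREQ_HZ = 5000   # main frequencies of a typical moo lie below this

nyquist = nyquist_frequency(SAMPLING_RATE_HZ)
headroom = nyquist - MAX_SIGNAL_FREQ_HZ
print(f"Nyquist frequency: {nyquist} Hz, headroom: {headroom} Hz")
```

At 11025 Hz the representable band extends to 5512.5 Hz, which leaves roughly 500 Hz of margin above the main frequency content of the vocalizations.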
To be more specific, for the first problem (cow identification) 99 samples were taken from 4
different cows. For the second problem (identification of a certain condition of a cow) 51 samples
were taken from one cow, which was in one of the following states: hunger, heat or delayed milking.

SOUND PROPERTIES
Most systems for pattern classification use some specific features of patterns (Nadler and
Smith, 1993). In this approach, various features were tried that could be extracted from general
characteristics of the samples: envelopes, spectrograms and Power Spectral Density (PSD) estimates
(Proakis and Manolakis, 1996). Figure 3 shows the PSD estimate of a typical moo of a cow waiting
for milking. It turned out that a straightforward application of PSD characteristics was sufficient for
obtaining high-quality classifications.

[Figure 2: oscillogram. Vertical axis: Amplitude (−1 to 1); horizontal axis: Time [sec] (0 to 2.5)]

Figure 2. Oscillogram of a ‘moo’ typical of a cow waiting for milking

[Figure 3: spectrum plot. Vertical axis: Power Spectrum Magnitude (dB) (−70 to 10); horizontal axis: Frequency [Hz] (0 to 6000)]

Figure 3. Power Spectral Density estimate of the ‘moo’ shown in Figure 2

SYSTEM ARCHITECTURE

[Figure 4: flowchart]

Data Collection
  -> Analog-Digital Conversion
  -> Detachment of Single Vocalizations   [Database of Wave Files and Declarations]
  -> Calculation of PSD Profiles          [Database of PSD profiles]
       Parameters:
       - FFT points N = 128
       - window: Hanning, size = N, no overlapping
  -> Nearest Neighbour Classification
       Results:
       • Identification of Individuals
       • Identification of Conditions
  -> Reduce Dimension
       2 Dimensional Plot

Figure 4. The overall architecture of the sound classifying system

Using the PSD estimate, classification reduces to classifying vectors of fixed length, called
PSD profiles in the following. As the main classification procedure the so-called first-Nearest-
Neighbour classifier (Dasarathy, 1991) was used. It operates as follows: to classify a vector v, the
distances to all "known" vectors are calculated and v is assigned the type of the closest vector. To
apply this procedure two things are needed: a reference set of classified samples (or their PSD
profiles) and a distance function. The choice of a suitable distance measure is discussed later.
The overall system architecture is shown in Figure 4. The input signal is digitized and
converted to its PSD profile, a vector of fixed length. This profile is then compared by the Nearest-
Neighbour procedure to all reference profiles stored in the database. The type of the reference profile
that is 'closest' to the input profile determines the result of the classification: the cow or her condition.
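
The first-Nearest-Neighbour rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the Euclidean default distance and the three-element toy "profiles" are hypothetical and stand in for real 65-element PSD profiles and the distance functions discussed later.

```python
import math


def euclidean(a, b):
    """Euclidean distance, used here only as a placeholder distance function."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def classify_1nn(profile, references, distance=euclidean):
    """Return the label of the reference profile closest to `profile`.

    `references` is a list of (vector, label) pairs.
    """
    if not references:
        raise ValueError("reference set is empty")
    _, label = min(references, key=lambda ref: distance(profile, ref[0]))
    return label


# Toy reference set (hypothetical values, for illustration only)
references = [
    ([1.0, 0.2, 0.1], "cow A"),
    ([0.9, 0.3, 0.1], "cow A"),
    ([0.1, 0.8, 0.6], "cow B"),
]
result = classify_1nn([0.2, 0.7, 0.5], references)
print(result)
```

Because the distance function is passed in as a parameter, the same routine can be reused unchanged with any of the candidate distance measures.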

CALCULATING PSD PROFILES


Estimating the PSD involves numerous decisions about the settings of various parameters, and the
resulting procedure is quite complicated. Fortunately, both theoretical and practical aspects of
estimating PSD have been thoroughly investigated and are discussed in detail in many textbooks on
Digital Signal Processing, e.g. (Proakis and Manolakis, 1996). Specific algorithms for calculating
PSD are also widely available; for example, the commonly used MATLAB system (Math Works Inc.,
1992) contains a toolbox for signal processing with numerous sound processing routines.
To estimate the PSD of a given sample the following parameters should be set:
• N - the number of points for which the FFT is calculated, usually a power of 2.
• window type - to avoid some side-effects of the FFT, the data from each window are multiplied
by a function which takes 0 at both ends of the window and smoothly increases to reach 1 in the
middle of the window. The choice of a specific window type is not very important. The
following results were obtained using Hanning windows.
• window size - in our case chosen equal to N.
• overlapping parameter - the windows might be allowed to overlap. The following results were
obtained with non-overlapping windows.
Thus N is the only parameter that was varied in estimating the PSD.
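
With these settings the estimate amounts to averaging windowed periodograms over consecutive frames. The following pure-Python sketch illustrates the idea; it is not the MATLAB routine used for the results reported here. The recursive FFT helper, the function names, the absence of any power normalization and the synthetic 1 kHz test tone are all assumptions made for illustration.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of 2."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def psd_profile(signal, n_fft=128):
    """Average |FFT|^2 over non-overlapping, Hann-windowed frames of n_fft samples.

    Returns n_fft // 2 + 1 values (unnormalized power per frequency bin).
    """
    # Hann window: 0 at both ends, close to 1 in the middle
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / (n_fft - 1)) for i in range(n_fft)]
    n_frames = len(signal) // n_fft
    if n_frames == 0:
        raise ValueError("signal shorter than one frame")
    n_bins = n_fft // 2 + 1
    profile = [0.0] * n_bins
    for f in range(n_frames):
        frame = signal[f * n_fft:(f + 1) * n_fft]
        spectrum = fft([s * w for s, w in zip(frame, window)])
        for k in range(n_bins):
            profile[k] += abs(spectrum[k]) ** 2
    return [p / n_frames for p in profile]


fs = 11025  # sampling rate from the text
tone = [math.sin(2 * math.pi * 1000 * t / fs) for t in range(fs)]  # 1 s synthetic test tone
profile = psd_profile(tone)
peak_bin = max(range(len(profile)), key=profile.__getitem__)
print(len(profile), peak_bin * fs / 128)
```

For N = 128 the profile has 128/2 + 1 = 65 entries, and each bin covers 11025/128, or about 86 Hz, so a larger N gives finer frequency resolution at the cost of fewer frames to average over.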

NEAREST-NEIGHBOUR CLASSIFIER
The Nearest-Neighbour classifier requires classified samples (or their PSD profiles) and a
distance function. In this case the reference set consisted of carefully collected samples: 68 for the
first problem (cow identification or inter-class classification) and 35 for the second problem (state
determination or intra-class classification). It became evident that the choice of the distance measure
has a great influence on the result. The following functions were examined: $\sum_i |x_i - y_i|$,
$\sum_i (x_i - y_i)^2$, $\sum_i (x_i - y_i)^4$, $\sum_i (x_i - y_i)^6$ and $1 - \mathrm{corr}(x, y)$.
The last function, which involves the correlation coefficient, is not a metric. To get an idea which
choice is better and what the performance of the resulting classifier might be, the "leave-one-out"
validation procedure was applied: to evaluate the classifier on a single classified vector, this vector
is removed from the set of vectors used by the classifier, and the classifier is then applied to it. The
procedure was repeated for all classified vectors, and the count of misclassified vectors was used as
a measure of the accuracy of the classifier. In total 5*6=30 classifiers were investigated: 5 distance
functions and 6 values of the

parameter N = 32, 64, 128, 256, 512, 1024. One of the best settings was N=128 with the distance
measured by $1 - \mathrm{corr}(x, y)$.
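
The comparison of distance functions by leave-one-out validation can be sketched as follows. The five distance functions follow the formulas given above; everything else (function names, the four-element toy vectors and their labels) is hypothetical and only meant to show the mechanics, not to reproduce the reported results.

```python
import math


def power_sum_distance(p):
    """Return d(x, y) = sum(|x_i - y_i| ** p); the text examines p = 1, 2, 4, 6."""
    return lambda x, y: sum(abs(a - b) ** p for a, b in zip(x, y))


def one_minus_corr(x, y):
    """1 minus the Pearson correlation coefficient of x and y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation undefined for a constant vector")
    return 1.0 - cov / (sx * sy)


def leave_one_out_errors(samples, distance):
    """Count samples misclassified by 1-NN when each one is held out in turn."""
    errors = 0
    for i, (vec, label) in enumerate(samples):
        rest = samples[:i] + samples[i + 1:]
        _, predicted = min(rest, key=lambda s: distance(vec, s[0]))
        errors += predicted != label
    return errors


DISTANCES = {
    "sum |d|": power_sum_distance(1),
    "sum d^2": power_sum_distance(2),
    "sum d^4": power_sum_distance(4),
    "sum d^6": power_sum_distance(6),
    "1 - corr": one_minus_corr,
}

# Toy data (hypothetical): two well-separated classes of 4-element vectors
samples = [
    ([1.0, 0.5, 0.2, 0.1], "A"), ([0.9, 0.6, 0.2, 0.1], "A"), ([1.1, 0.5, 0.3, 0.1], "A"),
    ([0.1, 0.2, 0.6, 1.0], "B"), ([0.1, 0.3, 0.5, 0.9], "B"), ([0.2, 0.2, 0.6, 1.1], "B"),
]
error_counts = {name: leave_one_out_errors(samples, d) for name, d in DISTANCES.items()}
print(error_counts)
```

The correlation-based function ignores overall scale: a profile and a louder copy of it (every entry multiplied by the same positive factor) have distance 0 although they are different vectors, which is one reason it is not a metric.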

REDUCING DIMENSION
With N=128 every sample is represented by a vector of 65 numbers (N/2 + 1 frequency bins), which
is just its PSD characteristic. This vector corresponds to a single point in a 65-dimensional space,
which is difficult to imagine. To get a better insight into the structure of the samples they were
mapped into a 2-dimensional space in such a way that the original distances between samples were
preserved as much as possible. The technique used for finding such a mapping is called
Multidimensional Scaling (Cox and Cox, 1994). It must be emphasized that reducing the
65-dimensional space to a 2-dimensional space comes at the price of graphical distortion. The
so-called Kruskal's stress (Cox and Cox, 1994) is a measure of this distortion: the smaller the stress,
the better the mapping. Figures 5 and 6 show 2-dimensional versions of our data sets. The axes of
both figures deliberately carry no numbers since they are normalized.
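
To make the stress measure concrete, the sketch below computes a stress-1 value for a given mapping. It is a simplified illustration under stated assumptions: Euclidean distances are used in both spaces, and the original distances serve directly as disparities (the metric variant), whereas Kruskal's non-metric stress first applies a monotone regression. The function names and toy points are hypothetical.

```python
import math
from itertools import combinations


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def stress_1(original, embedded):
    """sqrt( sum (d_emb - d_orig)^2 / sum d_emb^2 ) over all pairs of points."""
    num = den = 0.0
    for i, j in combinations(range(len(original)), 2):
        d_orig = euclidean(original[i], original[j])
        d_emb = euclidean(embedded[i], embedded[j])
        num += (d_emb - d_orig) ** 2
        den += d_emb ** 2
    return math.sqrt(num / den)


# Points that already lie in a plane can be mapped to 2-D without distortion
flat_3d = [(0, 0, 0), (3, 0, 0), (0, 4, 0)]
flat_2d = [(0, 0), (3, 0), (0, 4)]
stress_flat = stress_1(flat_3d, flat_2d)

# Dropping the third coordinate of a genuinely 3-D configuration distorts distances
corner_3d = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]
corner_2d = [(0, 0), (1, 0), (0, 1), (0, 0)]
stress_corner = stress_1(corner_3d, corner_2d)
print(stress_flat, stress_corner)
```

A stress of 0 means all pairwise distances are reproduced exactly; values of a few percent indicate that the 2-dimensional picture is a reasonably faithful, though not perfect, view of the original distances.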

RESULTS


Figure 5. 2-dimensional map of 99 samples which were taken from 4 cows. The Kruskal's stress is 5.71%.

After selecting the most promising settings (distance function and N) the system was validated using
the validation sets. In the first problem the system was able to identify cows without making a single

mistake (on a validation set with 31 samples). Despite the distortion caused by reducing the
dimension, the clear clusters in Figure 5 also indicate the good performance of the system.


Figure 6. 2-dimensional map of 51 samples of one cow in three different states:
'x' - delayed milking, 'o' - hunger, '+' - heat. The Kruskal's stress is 6.1%.

The second problem, identifying the states of a cow, turned out to be more difficult: the system
made one mistake (on a validation set of size 16), confusing the states 'heat' and 'hunger'.
Moreover, across the various distance measures applied, the system clearly had no problem
distinguishing between the state 'milking delayed' and the other two states; the differences between
'heat' and 'hunger' were relatively small, making their distinction more difficult. This also becomes
obvious in Figure 6: sounds produced by a cow waiting for milking are quite different from sounds
that correspond to the other two states, hunger and heat, while sounds which correspond to these two
states do not form regular clusters.

CONCLUSIONS

The implemented prototype system turned out to be quite robust and classified cows by their voices
with high accuracy. Moreover, its implementation does not require sophisticated hardware: an
ordinary PC with a sound card and a microphone is sufficient. On the other hand, some restrictions
still have to be considered. The approach was tested on a relatively small data set, so the question
arises whether this straightforward approach will also be successful if the data set is expanded
considerably. The main effort is therefore now directed at enlarging the database.

Another problem is labeling the meaning of an animal's vocalization. While samples of different
cows can be labeled unequivocally, the reason why an animal makes a particular sound is much more
difficult to determine. In addition, even if the situation can be determined correctly there is no

guarantee that an animal will not make sounds of a different meaning. Two ways therefore seem
appropriate: the classification of samples in the way described above and, in parallel, the
development of an unsupervised classification. This offers the opportunity that, by using the
enlarged database and such an unsupervised classification, unknown "words" could be revealed.

Figure 7. Sonogram showing a section of the oscillogram of Figure 2

While the PSD estimates seem to be efficient for recognizing individual cows, it may become
necessary to supplement them with additional parameters to improve the recognition of animal
conditions. Such parameters could, for example, be derived from the variation of the vocalization
over time. Figure 7, showing a sonogram of the vocalization in Figure 2, reveals that there is much
more information in animal vocalizations than we have used in the foregoing.

Therefore the present work aims at:
- enlarging the database,
- checking the unsupervised classification, and
- evaluating the benefits of more sophisticated data preprocessing that takes time variance into account.

REFERENCES

1. Dasarathy, B.V. (editor). 1991. Nearest Neighbor (NN) Norms: NN Pattern Classification
Techniques. IEEE Computer Society Press.
2. Cox, T.F. and M.A.A. Cox. 1994. Multidimensional Scaling. Chapman & Hall.
3. Math Works Inc. 1992. MATLAB User's Guide.
4. Proakis, J.G. and D.G. Manolakis. 1996. Digital Signal Processing: Principles, Algorithms
and Applications. 3rd edition. Macmillan, New York.
5. Nadler, M. and E.P. Smith. 1993. Pattern Recognition Engineering. John Wiley & Sons.
6. Schwirzke, K. and U. Hilgefort. 1997. Klangwerk. c’t (2): 118-150.
