
APPLICATION OF PCA METHOD TO WEATHER PREDICTION TASK

Marcin Jaruszewicz(1) and Jacek Mańdziuk(1,2)

(1) Faculty of Mathematics and Information Science,
Warsaw University of Technology, Plac Politechniki 1, 00-661 Warsaw, Poland
e-mail: mandziuk@mini.pw.edu.pl, jaruszewicz@data.pl
(2) Corresponding author

ABSTRACT

A method of short-term weather forecasting based on artificial neural networks is presented. Each training sample consists of date information combined with meteorological data from the last three days gathered at the meteorological station in Miami, USA. The prediction goal is the next day's temperature. The prediction system is built on a multilayer perceptron network trained with the backpropagation algorithm with momentum.
The average prediction error of the network on the test set equals 1.12°C. The average percentage prediction error is equal to 5.72%. The results are very encouraging and provide a promise for further exploration of the issue. The so-called correlation ratio δ between predicted and real changes (trends) is equal to 0.7136. The relatively high value of δ additionally confirms the good quality of the presented results.
Experimental results of applying the Principal Component Analysis method at the stage of pre-processing of the input data are also presented. In that case the average prediction error and the average percentage prediction error are equal to 1.41°C and 7.93%, respectively. In order to explain the reasons for the poorer results obtained with the PCA method, a closer look at the principal components defined by the network is presented. Possible reasons for the PCA failure are pointed out.

1. INTRODUCTION

Conventional methods of weather forecasting are based on solving large sets of differential equations which describe an approximate model of the climate. Completing such a task in a reasonable time requires a huge amount of computing power, and therefore several attempts at solving the weather prediction problem with alternative methods have been proposed.
Some of them are based on AI systems, in particular artificial neural networks [1]. Simple network architecture and the power of parallel information processing provide a tempting alternative to classical methods and open an opportunity to produce encouraging, high-quality results in a relatively short time. A neural network, after appropriate training, is able to produce weather predictions almost instantaneously - only a few matrix operations are necessary. On the other hand, neural networks, as opposed to the exact methods based on differential equations, provide only approximate solutions, and therefore - in practice - lower bounds on the possible prediction error exist.
One of the key factors in the effective design of neural network-based systems trained in a supervised manner is the quality (reliability) and completeness of the training data (past observations).
Based on the completeness criteria, the part of the Forecast Systems Laboratory (FSL) [2] data collected by the University of Miami, Florida, USA was chosen as the training and testing data for our prediction system. Preliminary results of prediction experiments have been published in [3]. The aim of this work is further analysis of the PCA performance in this task.

2. SYSTEM ARCHITECTURE

The prediction system was built on a multilayer perceptron network. Two different learning algorithms were applied and tested. At first, the standard backpropagation method [4-5] with momentum was used. Each network input sample was composed of 17 variables: two describing the date of an observation (day and month), and 15 describing the weather data - temperature, pressure, dew point temperature, wind direction and wind speed - from the last three days preceding the prediction date. The network had one hidden layer composed of 10 neurons. The output layer was composed of one neuron predicting the temperature value.
In the other training method, the meteorological data was pre-processed by the Principal Component Analysis (PCA) method [6-7]. After applying the PCA, the size of the input sample was reduced from 17 to 5 orthogonal, uncorrelated components. The network architecture was composed of a 5-element input layer, two hidden layers of size 10, and a 1-element output layer. Standard backpropagation with momentum was used for training.
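To make the setup concrete, the sketch below reproduces the first network described above (17 inputs, one hidden layer of 10 neurons, one output) trained by backpropagation with momentum. It is a minimal illustration, not the authors' original code; the learning rate and momentum coefficient are assumptions, since the paper does not report them.

# Minimal sketch of the paper's first network: 17 inputs, 10 hidden
# neurons, 1 output, trained by backpropagation with momentum.
# eta and alpha are assumed values (not given in the paper).
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    # Bipolar sigmoid of eq. (1): f(x) = 2 / (1 + exp(-x)) - 1
    return 2.0 / (1.0 + np.exp(-x)) - 1.0

def f_prime(y):
    # Derivative of f expressed through the activation value y = f(x)
    return 0.5 * (1.0 + y) * (1.0 - y)

# Weights initialised uniformly in (-0.01, 0.01), as in the paper
W1 = rng.uniform(-0.01, 0.01, size=(10, 17))
W2 = rng.uniform(-0.01, 0.01, size=(1, 10))
V1 = np.zeros_like(W1)  # momentum buffers
V2 = np.zeros_like(W2)

eta, alpha = 0.01, 0.9  # assumed learning rate and momentum coefficient

def train_step(x, target):
    """One backpropagation-with-momentum update for a single sample."""
    global V1, V2
    h = f(W1 @ x)            # hidden activations
    y = f(W2 @ h)            # network output (scaled temperature)
    # Backward pass
    delta_out = (target - y) * f_prime(y)
    delta_hid = (W2.T @ delta_out) * f_prime(h)
    # Momentum updates: new step = alpha * previous step + eta * gradient
    V2 = alpha * V2 + eta * np.outer(delta_out, h)
    V1 = alpha * V1 + eta * np.outer(delta_hid, x)
    W2 += V2
    W1 += V1
    return float(y)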
In both cases the input data was normalized and scaled to the range (-1, 1). The following sigmoidal activation (transfer) function was used in all neurons except those in the input layer:

f(x) = \frac{2}{1 + e^{-x}} - 1 ,    (1)

For input layer neurons the identity activation function ( f(x) = x ) was used. Initially, all weights in the network were randomly chosen from the range (-0.01, 0.01).
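The paper does not specify the exact normalization scheme; the snippet below is a minimal sketch assuming simple per-variable min-max scaling to the target range.

# Hypothetical scaling step: map each input variable to (-1, 1).
# Min-max normalization is one common choice; the paper's exact
# scheme is not stated, so treat this as an assumption.
import numpy as np

def scale_to_unit_range(X):
    """X: (n_samples, n_features) array of raw inputs."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    return 2.0 * (X - lo) / (hi - lo) - 1.0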
3. THE PRINCIPAL COMPONENT ANALYSIS

The PCA method defines a transformation matrix from a set of input vectors composed of (possibly) correlated components to another set of vectors composed of orthogonal and uncorrelated components. PCA preserves the most relevant information in the multidimensional dataset and at the same time reduces its dimension. Some information - statistically irrelevant - is therefore discarded.
The PCA method is potentially very well suited to neural network training methods. Training is more effective when performed on uncorrelated and orthogonal data. Moreover, the smaller the network, the faster the training, and in several cases the better the generalisation properties. The PCA transformation is based on the following autocorrelation matrix:
R_{xx} = \frac{1}{n} \sum_{k=1}^{n} x_k x_k^T ,    (2)

where n is the number of vectors in the input set and x_k is the k-th vector.
Eigenvectors of the matrix R_{xx}, corresponding to eigenvalues sorted in decreasing order, point out the principal components. The first principal component is responsible for the highest percentage of the variance of the sample [6], the second one for the next highest variance, and so on. By choosing the required number of principal components one is able to build the matrix W of the PCA transformation:

W = [w_1, w_2, \ldots, w_M]^T ,    (3)

where M is the required number of components and w_k (for k = 1, ..., M) are the principal components themselves.
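A short sketch of the classical transformation of equations (2)-(3): build the autocorrelation matrix, take the leading eigenvectors, and project the 17-dimensional inputs onto M = 5 components. The function name is ours, for illustration only.

# PCA via eigendecomposition of the autocorrelation matrix, eqs. (2)-(3).
import numpy as np

def pca_matrix(X, M):
    """X: (n, d) array of (normalized) input vectors; returns (M, d) matrix W."""
    n = X.shape[0]
    Rxx = (X.T @ X) / n                      # eq. (2): (1/n) sum_k x_k x_k^T
    eigvals, eigvecs = np.linalg.eigh(Rxx)   # Rxx is symmetric, so eigh applies
    order = np.argsort(eigvals)[::-1]        # sort eigenvalues decreasingly
    return eigvecs[:, order[:M]].T           # eq. (3): W = [w_1 ... w_M]^T

# Usage, matching the paper's reduction from 17 inputs to 5 components:
# W = pca_matrix(X, M=5)      # X: (n_samples, 17), scaled to (-1, 1)
# X_pca = X @ W.T             # (n_samples, 5) uncorrelated inputs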
In the neural networks framework the PCA matrix can also be built in an alternative way, using a two-layer feed-forward network trained by Sanger's formula. The input and output layer sizes are equal to the dimensions of the input vectors in the initial dataset and in the transformed one, respectively. Learning epochs are repeated for all vectors from the dataset as long as the weights are changing. Weights are updated according to the following formula:

\Delta w_{i,j} = \eta \left[ y_i x_j - y_i \sum_{h=1}^{i} w_{j,h} y_h \right] ,    (4)

where η is the learning rate and w_{i,j} is the weight from the j-th neuron in the input layer to the i-th neuron in the output layer. After an appropriate learning period, the set of weights between the layers provides an estimation of the PCA transformation matrix.
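The sketch below implements the update of equation (4), i.e. Sanger's rule (the generalized Hebbian algorithm). It is written in the usual convention where W[i, j] is the weight from input j to output (component) i; the learning rate value is an assumption.

# Sanger's rule, eq. (4): an online update that drives the rows of W
# toward the leading principal components of the data.
import numpy as np

def sanger_epoch(W, X, eta=0.001):
    """One pass over the data; W: (M, d), X: (n, d). Returns updated W."""
    for x in X:
        y = W @ x                     # outputs of the linear PCA layer
        for i in range(W.shape[0]):
            # Subtract the part of x already explained by components 1..i
            residual = x - W[: i + 1].T @ y[: i + 1]
            W[i] += eta * y[i] * residual
    return W

# Repeated over many epochs (the paper stops once the weights no longer
# change), the rows of W approximate the PCA transformation matrix.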
4. WEATHER PREDICTION RESULTS

As mentioned previously, the weather data was collected for the area of Miami, FL, USA. The dataset consisted of 942 observations from the period 1998-2000. Variability of the average daily temperature is presented in Fig. 1.

Fig. 1. Variability of the daily temperature in Miami, USA - days of 1998-2000.

The average temperature is equal to 21.8°C, with a minimum of 3.2°C and a maximum of 28.8°C. Seasonal oscillations are clearly visible. The sample also contains some number of (statistically) extreme values. These outliers, as well as the structure of the temperature data, are presented in Fig. 2. It is important to note that outliers were not discarded from the training/testing data.

Fig. 2. Statistical variability of the temperature in Miami, USA - standard box and whisker plot. The central box covers the middle 50 percent of the data. The sides of the box are the lower and upper quartiles, and the vertical line drawn through the box is the median. The whiskers extend out to the lower and upper values of the data range. Small boxes are outliers.
Since the original size of the dataset (942 samples) was relatively small for our purpose, the dataset was artificially doubled by adding to each sample a small amount of Gaussian noise (not exceeding 0.01%). The resulting dataset of 1884 observations was divided into two disjoint sets: a learning set composed of 95% of the samples and a test set composed of the other 5% of the samples. All samples from the learning set were presented to the network 400,000 times in both the PCA algorithm and the backpropagation method.
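A minimal sketch of this data preparation step follows. Interpreting "not exceeding 0.01%" as a relative perturbation of each value is our assumption; the paper does not detail the noise scheme or the split procedure.

# Hypothetical augmentation and split: duplicate each sample with a
# small relative Gaussian perturbation, then hold out 5% for testing.
import numpy as np

def augment_and_split(X, y, test_frac=0.05, seed=0):
    rng = np.random.default_rng(seed)
    # Assumed noise model: zero-mean Gaussian scaled to ~0.01% of |x|
    noise = rng.normal(0.0, 1.0, size=X.shape) * (1e-4 * np.abs(X))
    X2 = np.vstack([X, X + noise])          # 942 -> 1884 samples
    y2 = np.concatenate([y, y])
    idx = rng.permutation(len(X2))
    n_test = int(test_frac * len(X2))
    test, train = idx[:n_test], idx[n_test:]
    return X2[train], y2[train], X2[test], y2[test]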
In the case of using the backpropagation algorithm without PCA, the average prediction error was equal to 1.12°C. The respective average percentage error - calculated as the mean value of all percentage errors within the test set - was equal to 5.72% (see Table 1). Additionally, the average comparative error, defined as

\varepsilon = \frac{\sum_{i=0}^{N} |P_i^t - R_i^t|}{\sum_{i=0}^{N} |R_i^t - R_i^{t-1}|} ,    (5)

and the correlation ratio between predicted and real changes,

\delta = \frac{\sum_{i=0}^{N} (R_i^t - R_i^{t-1})(P_i^t - R_i^{t-1})}{\sqrt{\sum_{i=0}^{N} (R_i^t - R_i^{t-1})^2 \; \sum_{i=0}^{N} (P_i^t - R_i^{t-1})^2}} ,    (6)

were calculated. In the above equations N denotes the size of the test set, and P_i^t and R_i^t are the i-th predicted and real values at time t, respectively. The ε coefficient was equal to 0.7736 (according to [8], acceptable weather prediction methods are characterised by ε < 1.0). The correlation ratio δ was equal to 0.7136. The relatively high value of δ additionally confirms the good quality of the presented results. The experiment was repeated several times with various initial weights. The results are repeatable, with relatively low variance (below 10%).
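The two evaluation measures translate directly into code. The sketch below computes the comparative error ε of equation (5), which relates the prediction error to a persistence baseline, and the trend correlation δ of equation (6).

# Evaluation metrics of eqs. (5) and (6).
import numpy as np

def comparative_error(pred, real, real_prev):
    """eq. (5): sum |P - R| over sum |R - R_prev|; values < 1.0 beat persistence."""
    return np.sum(np.abs(pred - real)) / np.sum(np.abs(real - real_prev))

def trend_correlation(pred, real, real_prev):
    """eq. (6): correlation between real and predicted day-to-day changes."""
    dr = real - real_prev   # real change vs. the previous day
    dp = pred - real_prev   # predicted change vs. the previous day
    return np.sum(dr * dp) / np.sqrt(np.sum(dr**2) * np.sum(dp**2))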
Table 1. Results for backpropagation without the PCA algorithm.

Error      Value [°C]   Value [%]
Average    1.12         5.72
Minimum    0.00         0.00
Maximum    4.40         41.59

In the case where PCA was initially used for pre-processing the data (and reducing the input dimension from 17 to 5), the average error was equal to 1.41°C, i.e. 7.93% of the real value (see Table 2). The ε coefficient (average comparative error) was equal to 0.8286 (7.1% larger than for the plain backpropagation network). The correlation ratio δ was equal to 0.6030, which is 15.6% smaller than for the plain backpropagation network. As shown above, the use of PCA lowered the quality of all measured coefficients. In particular, the correlation between predicted change and real change is much smaller than it was in the network with the plain backpropagation learning method.

Table 2. Results for backpropagation with the PCA pre-processing algorithm.

Error      Value [°C]   Value [%]
Average    1.41         7.93
Minimum    0.05         0.39
Maximum    7.50         60.54

The final PCA network architecture was chosen based on several preliminary experiments. Test results for example architectures are presented in Fig. 3.

Fig. 3. Percentage error for some of the tested architectures in the PCA experiment.

5. CLOSER LOOK AT THE PCA PERFORMANCE

The main qualitative conclusion of this work is that backpropagation alone outperformed backpropagation with PCA support. Several other experiments with different numbers of principal components, including the case of all components - i.e. without compression - showed that the usability of PCA in weather prediction seems doubtful. The possible reason for the PCA inefficiency may be very complex mutual relations (dependencies) between individual factors in the meteorological data.
A closer look at the principal components defined in our experiment - in the case of no data compression, i.e. with all 17 principal components - revealed some interesting properties of the way the input factors were "combined" into components. First, the sums of the absolute values of all weights incoming to every principal component (from the input vector) were calculated. Intuitively, these sums should be greater for the first few components (the most relevant ones) than for the less significant ones. As can be clearly seen from Fig. 4, this was actually not the case. The sums concerning several further components (e.g. PC 07 or PC 12) are quite large and comparable to the sums of the main ones.
Such a situation partly explains the poor results obtained for the PCA both with and without compression. From the rough estimation of the relative importance of the components presented in Fig. 4, it seems that the system is unable to select a few leading (main) components that would properly represent the data.

Fig. 4. Sum of the absolute values of all input weights to principal components PC 01, ..., PC 17.
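The diagnostic behind Fig. 4 is a one-liner; the sketch below shows it, assuming W holds one principal component per row as in the earlier snippets.

# Per-component sum of absolute loadings, as plotted in Fig. 4.
# If PCA had found a few dominant directions, the first rows of W
# would clearly outweigh the later ones; a flat profile suggests
# no small subset of components dominates the representation.
import numpy as np

def loading_sums(W):
    """W: (M, d) PCA matrix; returns the per-component sum of |weights|."""
    return np.sum(np.abs(W), axis=1)

# Example with all 17 components retained (no compression):
# sums = loading_sums(W)    # shape (17,), one value per PC 01..PC 17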

Another interesting observation comes from Fig. 5, in which the particular weights from the input to the principal components are presented. It can be seen in the figure that even for the last (i.e. the least important) components, there still exist input variables that provide a relatively high contribution (due to high weight values).

Fig. 5. Values of weights from the input layer to the principal components layer. The input vector is defined in the following way: (1) day of the month, (2) month, (3) pressure, (4) temperature, (5) dew point temperature, (6) wind direction, (7) wind speed; (8)-(17) are the same as (3)-(7) for the two previous days.

On the other hand, based on a rough analysis of the main components defined by the PCA method, we believe that the method is properly implemented, since the components are constructed in a "regular" and intuitive way. The weights incoming to the first component - presented in Fig. 6 - unambiguously suggest that the most important factor in this component is temperature (its value and dew point), represented by variables number 4 and 5 for the last day, 9 and 10 for the last but one day, and 14 and 15 for two days before. Quite relevant is also the contribution of the pressure value (inputs number 3, 8 and 13). At the same time, the component is clearly negatively correlated with the wind speed factor (inputs number 7, 12 and 17). Certainly, expert knowledge is required to evaluate the quality of the above "choices" as the contributions to the main component. However, regardless of the particular choice, the fact that the contributing input variables are chosen according to a clearly defined scheme (the same variables are selected across all three days) indicates the "rationality" and proper implementation of the PCA method.

Fig. 6. Values of weights from the input layer to the first principal component PC 01 in the principal components layer. The meaning of the input neurons is described in the caption of Fig. 5.

From a similar analysis applied to the second principal component (see Fig. 7), it can be seen that the most important input factor in that component is the wind direction, represented by inputs number 6, 11 and 16. A visibly negative correlation with the month number in the input vector (the second factor) can also be observed.
Fig. 7. Values of weights from the input layer to the second principal component PC 02 in the principal components layer. The meaning of the input neurons is described in the caption of Fig. 5.

Finally, in the case of the third principal component, strong regularities in the correlation scheme between this component and particular input variables can be observed (see Fig. 8). The most important contributions to this component come from the number of the month (second input) and the wind direction (variables number 6, 11 and 16).

Fig. 8. Values of weights from the input layer to the third principal component PC 03 in the principal components layer. The meaning of the input neurons is described in the caption of Fig. 5.

Another quite intuitive observation is the lack of relevance of the day of the month input variable, which did not appear to be relevant in any of the first three components. On the other hand, the other date-connected variable, representing the month of the year, appeared to be very important. This observation can partly explain the failure of our previous attempt to combine the day of the month and the month of the year inputs into one - the day of the year.

6. CONCLUSIONS

An approach to the weather forecasting problem based on artificial neural networks is presented. The obtained results - with an average error of 5.72% (1.12°C) and a high correlation between predicted and real changes - are very encouraging and provide a promise for further exploration of the issue, e.g. by applying more sophisticated learning algorithms like the variable metric method or by combining our approach with expert systems. One of the important conclusions of this piece of research is the problematic applicability of the PCA method as a supporting tool for data pre-processing in the problem considered. This issue deserves further investigation; however, an ad-hoc explanation may be the very high complexity and correlation of meteorological data, which prevents dimensionality reduction and efficient orthogonalisation. A closer look at the principal components defined by the PCA method confirms the above suggestion. Our current research is devoted to a deeper and more precise analysis of the reasons for the PCA failure.

7. REFERENCES

[1] D. Silverman, J. Dracup, Artificial Neural Networks and Long Range Precipitation Prediction in California, J. of Applied Meteorology, 31(1), 2000, 57-66.
[2] http://raob.fsl.noaa.gov
[3] M. Jaruszewicz, J. Mańdziuk, Short-term weather forecasting with neural networks, Proc. 6th Int. Conf. on Neural Networks and Soft Computing, Zakopane, Poland, 2002, (in print).
[4] P.J. Werbos, Beyond regression: new tools for prediction and analysis in the behavioral sciences, Ph.D. thesis, Harvard University, Cambridge, MA, USA, 1974.
[5] D.E. Rumelhart, G.E. Hinton, R.J. Williams, Learning Representations by Back-Propagating Errors, Nature, 323, 1986, 533-536.
[6] I.T. Jolliffe, Principal Component Analysis, Springer-Verlag, 1986.
[7] J. Karhunen, J. Joutsensalo, Representation and separation of signals using nonlinear PCA type learning, Neural Networks, 7(1), 1994, 113-127.
[8] Z. Sorbjan, Fundaments of numerical weather prediction, WPW, 1975.
