Hugo Hernandez
ForsChem Research, 050030 Medellin, Colombia
hugo.hernandez@forschem.org
doi: 10.13140/RG.2.2.12571.72489
Abstract
Keywords
Critical Derivatives, Dynamic Monte Carlo simulation, Dynamic Systems, Markovian behavior,
Minimum Variance, Modeling, Noise, Parameter Identification, Randomistics, Regression.
1. Introduction
Noise is the result of simultaneous random variations in different factors influencing a certain
measured variable. In principle, noise cannot be avoided although it can be significantly
reduced either by carefully controlling as many external factors as possible (reducing their
variation), by averaging replicated measurements, or by filtering the data using suitable
mathematical models. However, certain factors might be difficult to control or they are just
unknown significant factors; replicating measurements might be difficult, impractical or
expensive; noise models used for filtering might be inadequate. Furthermore, most
mathematical filtering methods result in loss of useful information about the system. Thus, the
mathematical identification of the behavior of a system is usually affected by noise.
Particularly in this report, the modeling and identification of noisy dynamic systems will be
discussed, although the results can be generalized to other types of systems (e.g. by replacing
time with any other independent variable such as position). For this purpose, the most general
case of randomistic dynamic variables will be considered.[1] The term randomistic basically
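describes any variable in general, whether it is random, deterministic or both.[2] Thus, any noisy
variable $X(t)$ can be represented by the sum of one deterministic and one random component:
$$X(t) = \mu(t) + \sigma(t)\,\tilde{\varepsilon}(t)$$
(1.1)
where $\mu(t)$ is the deterministic component, $\sigma(t)$ is the standard deviation of the random
component, and $\tilde{\varepsilon}(t)$ is a standard random variable.
The derivative of the randomistic dynamic variable with respect to time will also be
randomistic:
$$\frac{dX(t)}{dt} = \frac{d\mu(t)}{dt} + \frac{d\left(\sigma(t)\,\tilde{\varepsilon}(t)\right)}{dt} = \frac{d\mu(t)}{dt} + \sigma(t)\frac{d\tilde{\varepsilon}(t)}{dt} + \tilde{\varepsilon}(t)\frac{d\sigma(t)}{dt}$$
(1.2)
In this expression, the terms $d\mu(t)/dt$ and $d\sigma(t)/dt$ are deterministic, whereas the derivative of the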
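standard random variable can be expressed as:
$$\frac{d\tilde{\varepsilon}(t)}{dt} = \frac{\sqrt{2}}{\Delta t}\,\tilde{\varepsilon}'(t)$$
(1.3)
where $\sqrt{2}/\Delta t$ is the standard deviation of the derivative,[4] $\Delta t$ is the time step between
consecutive realizations, and $\tilde{\varepsilon}'$ is another standard random
variable with probability density function given by:[5]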
̃ ( )
̃ (̃ ) ∫ ̃ (̃ ( )) ̃ (̃ ( ) ̃ ) ̃ ( )
̃ ( )
(1.4)
Interestingly, for normal random variables their derivatives are again normal. For any other
distribution, its derivative will result in a different distribution. Such distribution of the
derivative of any standard random variable with respect to time will always be symmetrical.
Therefore, since negative and positive values of a symmetrical distribution are equally
probable, the $n$th derivative of the standard random variable will eventually become a large sum
of random variables from the same distribution (for large values of $n$), and thus, it will tend to
be a normal random variable according to the central limit theorem.[6] Afterwards, all
following derivatives will remain normal.
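On the other hand, the integral over time (starting at time $t_0$) of the dynamic variable is:[7]
$${}'X(t) = \int_{t_0}^{t} \left(\mu(t) + \sigma(t)\,\tilde{\varepsilon}(t)\right) dt = \int_{t_0}^{t} \mu(t)\,dt + \int_{t_0}^{t} \sigma(t)\,\tilde{\varepsilon}(t)\,dt = \int_{t_0}^{t} \mu(t)\,dt + \langle\tilde{\varepsilon}\rangle \int_{t_0}^{t} \sigma(t)\,dt$$
(1.5)
where ${}'X(t)$ represents the first integral over time of $X$, $\mu(t)$ and $\sigma(t)$ are the deterministic
component and the standard deviation of $X(t)$, and $\langle\tilde{\varepsilon}\rangle$ is the average value of the
standard random variable $\tilde{\varepsilon}$ in the time range from a certain initial time $t_0$ to $t$. Again, the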
integral is also randomistic as it combines deterministic and random terms. As long as the
number of realizations during that time interval is large, the average value is expected to be
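exactly zero. In practice, the sample of values of $\tilde{\varepsilon}$ in the interval may result in $\langle\tilde{\varepsilon}\rangle \neq 0$. For
this case, if $\left|\langle\tilde{\varepsilon}\rangle \int \sigma(t)\,dt\right| \gg \left|\int \mu(t)\,dt\right|$, then the dynamic behavior of the integral will be
randomly determined. This is the case, for example, of Markovian random variables.[1,4]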
Let us consider a typical random walk behavior described by the following expression:
( )
(1.6)
The position of the object is obtained by integrating Eq. (1.6). One possible result of the
integration, performed by dynamic Monte Carlo simulation, is presented in Figure 1. Figure 2
presents 50 different possible integration results. Clearly, the behavior of a single
integration is not representative of the behavior of the system. However, if we have only one
integration result available, how can we discriminate between the true dynamic behavior of the
system and just a mathematical artifact caused by randomness? Figure 3 shows a regression
model describing the behavior of the integration path obtained in Figure 1.
Figure 1. Random walk behavior of the displacement of an object over time with respect to its
initial position. Only one integration result is shown. Numerical integration step: 0.01 a.u.
While not perfect, the polynomial regression model obtained fits very well the data ( ,
standard deviation of model error: ). The observed behavior seems deterministic
although contaminated with some small noise. However, the truth is that it was obtained from
a pure random process.
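The effect described above can be reproduced with a minimal sketch. It is not the simulation used for Figure 1: the step standard deviation (0.01), the seeds, and the use of a straight-line fit instead of a polynomial are assumptions chosen only for illustration, and the function names are hypothetical.

```python
import random

def random_walk(n_steps, step_sd, seed):
    """Integrate pure noise with unit steps: X[k+1] = X[k] + step_sd * N(0, 1)."""
    rng = random.Random(seed)
    path = [0.0]
    for _ in range(n_steps):
        path.append(path[-1] + step_sd * rng.gauss(0.0, 1.0))
    return path

def linear_fit_r2(y):
    """Least-squares line y = a + b*k against the sample index k; returns (a, b, R^2)."""
    n = len(y)
    k_mean = (n - 1) / 2.0
    y_mean = sum(y) / n
    sxx = sum((k - k_mean) ** 2 for k in range(n))
    sxy = sum((k - k_mean) * (y[k] - y_mean) for k in range(n))
    b = sxy / sxx
    a = y_mean - b * k_mean
    ss_res = sum((y[k] - (a + b * k)) ** 2 for k in range(n))
    ss_tot = sum((v - y_mean) ** 2 for v in y)
    return a, b, 1.0 - ss_res / ss_tot

if __name__ == "__main__":
    # Each seed is one possible integration result (251 points, as in the first example).
    for seed in range(5):
        path = random_walk(250, 0.01, seed)
        _, slope, r2 = linear_fit_r2(path)
        print(f"seed={seed} slope={slope:+.5f} R2={r2:.3f}")
```

Running it for several seeds shows how much the fitted slope and the coefficient of determination change from one realization to another, even though no deterministic trend is present in the generating process.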
In this report, a method is proposed for identifying the true nature of a dynamic process
(random or deterministic), and for determining the most adequate model for describing both
random and deterministic effects.
Figure 2. Random walk behavior of the displacement of an object over time with respect to its
initial position. 50 different dynamic Monte Carlo simulation results are shown. Numerical
integration step: 0.01 a.u.
Figure 3. Polynomial regression (red dashed line) of the displacement of an object over time
with respect to its initial position presented in Figure 1.
By looking at Eq. (1.3), it is possible to observe that the standard deviation of the derivative
(and therefore the variance) of a random variable always increases (considering that $\sqrt{2}/\Delta t > 1$).
Similarly, the standard deviation (or variance) decreases by integration.[7]
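On the other hand, let us consider the behavior of a deterministic variable $\mu(t)$. For a time
interval between $t_1$ and $t_2$, the average value of the deterministic variable is:
$$\langle\mu(t)\rangle = \frac{1}{t_2 - t_1}\int_{t_1}^{t_2} \mu(t)\,dt$$
(2.1)
and its variance will be given by:
$$\mathrm{Var}(\mu(t)) = \frac{1}{t_2 - t_1}\int_{t_1}^{t_2} \left(\mu(t) - \langle\mu(t)\rangle\right)^2 dt = \frac{1}{t_2 - t_1}\int_{t_1}^{t_2} \mu^2(t)\,dt - \langle\mu(t)\rangle^2$$
(2.2)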
Furthermore, the average and variance of its derivative with respect to time will be (using the
ergodic-stochastic transformation [8]):
( )
( ) ∫ ( ) ( )
〈 〉
(2.3)
( ) ( ) ( )
∫ ( 〈 〉) ∫ ( ) ( ) ( )
( )
(2.4)
Assuming that for the time interval considered the deterministic variable can be approximated
by a truncated polynomial series expansion (as long as the function is continuous in the
interval):
( ) ∑ ( ) ∑
(2.5)
where . Then, Eq. (2.2) and (2.4) become:
∑∑ [ ]
( )( )( )
(2.6)
( )( )
∑∑ [ ]
(2.7)
Thus, for , the variance of the deterministic derivative is expected to decrease compared
to the variance of the original deterministic variable, which is the opposite result obtained for
random variables. Similarly, the integration of a deterministic variable will result in an increase
in the variance.
As it can be inferred, these numerical results depend on the unit of time considered. Therefore,
by changing (normalizing) the unit of time in such a way that the time step is 1 unit of time, both
conditions will be fulfilled ( √ ).
Then, by comparing the standard deviation of a certain dynamic variable with the standard
deviation of its derivative with respect to the normalized time, it is possible to determine if the
dominant behavior of such variable is deterministic (decrease in variance) or random (increase
in variance).
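The comparison can be illustrated with a short sketch on a normalized (unit) time step. The two test signals, the seed, and the function names are assumptions made for illustration: Gaussian white noise stands in for a purely random variable, and one period of a slow sine wave stands in for a smooth deterministic variable.

```python
import math
import random
import statistics

def forward_differences(values):
    """First forward finite differences with a unit (normalized) time step."""
    return [b - a for a, b in zip(values, values[1:])]

def variance_ratio(values):
    """Sample variance of the first differences divided by that of the values."""
    return statistics.variance(forward_differences(values)) / statistics.variance(values)

rng = random.Random(0)
n = 2000
white_noise = [rng.gauss(0.0, 1.0) for _ in range(n)]
smooth = [math.sin(2.0 * math.pi * k / n) for k in range(n)]  # slowly varying deterministic signal

print(variance_ratio(white_noise))  # ratio above 1: variance grows on differentiation
print(variance_ratio(smooth))       # ratio far below 1: variance shrinks on differentiation
```

For independent noise the difference of two consecutive values has twice the variance of a single value, so the first ratio is expected to be close to 2, while the second depends on how slowly the deterministic signal changes per time step.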
For randomistic variables where both deterministic and random effects are present, there is a
critical derivative (or integral) of the variable where the change from dominant deterministic to
dominant random behavior (or vice versa) will be observed. Such critical derivative (or integral)
can be easily identified because it will present a global minimum variance. The first integral of
such critical derivative might be a Markovian random variable if the random effect is larger than
the deterministic effect. In order to test the significance of the deterministic effect, let us
assume that the critical derivative ( ) has no deterministic contribution. Then, the variance
of the integrated random variable is:[7]
( ) (∫ ) ( ) (〈 〉 ) ( ) (〈 〉)
(2.8)
Given that (〈 〉 ) (since the time step is unit of time and assuming ), and
(2.9)
Thus, for a pure random process the observed behavior of at normalized time should be
within the confidence interval (for a confidence level of ) described by the following
expression:
( ( ) [ ( ) ( )√ ( ) ( )√ ])
(2.10)
where is the sample variance obtained from the values of the critical derivative, and ( )
is a critical value obtained from the probability distribution of . Assuming that the critical
derivative can be approximately described by a normal random variable, then:
( )
(2.11)
where represents the two-tailed critical Student’s T value obtained for a significance
level and degrees of freedom.
( ) ( )
( )
√
(2.12)
it would be expected that most of the observed T values will have an absolute value less than or
equal to the critical value for a purely random process. Otherwise, the deterministic effect can be
considered significant.
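A minimal sketch of this test is given below. It assumes unit (normalized) time steps, takes the T value at step k as the displacement from the initial value divided by s·√k (with s the sample standard deviation of the first differences, following the variance growth of Eq. 2.8), and replaces the Student's T critical value by a normal quantile, which is only reasonable for many degrees of freedom. Function names and test signals are hypothetical.

```python
import math
import random
import statistics

def markov_t_values(x, increments_sd):
    """T(k) = (x[k] - x[0]) / (s * sqrt(k)) for k = 1..n-1, with unit time steps."""
    return [(x[k] - x[0]) / (increments_sd * math.sqrt(k)) for k in range(1, len(x))]

def exceedance_fraction(x, confidence=0.99):
    """Fraction of points whose |T| exceeds the two-tailed critical value."""
    diffs = [b - a for a, b in zip(x, x[1:])]
    s = statistics.stdev(diffs)
    # Assumption: normal quantile used instead of Student's t (large sample).
    t_crit = statistics.NormalDist().inv_cdf(0.5 + confidence / 2.0)
    t_vals = markov_t_values(x, s)
    return sum(abs(t) > t_crit for t in t_vals) / len(t_vals)

rng = random.Random(1)
walk = [0.0]
for _ in range(500):
    walk.append(walk[-1] + rng.gauss(0.0, 1.0))               # pure random walk
trend = [0.5 * k + rng.gauss(0.0, 1.0) for k in range(501)]   # linear drift plus additive noise

print(exceedance_fraction(walk))
print(exceedance_fraction(trend))
```

A small fraction of exceedances is compatible with a random walk, whereas a signal with a real drift leaves the confidence band for most of the record.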
Now, if the $(c-1)$th derivative is found to behave randomly, it will be a Markovian random
variable. All other lower order derivatives (or integrals) will be the result of a random process,
even though they show an apparently deterministic behavior.
On the other hand, if the deterministic effect on the $(c-1)$th derivative is found significant, it
can be identified using conventional methods of identification of dynamic systems.
Furthermore, since the noise is reduced as the derivative order decreases (or the integration
order increases), it would be desirable to model the system at the lowest derivative (or highest
integration) order possible.
Another alternative to test the significance of the deterministic effect emerging at the critical
derivative consists of modeling the behavior of the critical derivative (e.g. using conventional
regression models or any other modeling approach), and then testing the significance of the
model after subtracting the effect of noise coming from the ( )th derivative. This implies
performing an ANOVA test for the model,[9] and correcting the F-value obtained as follows:
( )
(2.13)
where is the sum of square model errors (or residuals), and is the sum of square
integral error coming from the derivative of the modeled variable, assuming that it behaves as
white noise. If , the behavior of the critical derivative is completely random.
(2.14)
where is the total sum of square differences of the modeled variable with respect to its
average value. For the critical derivative, all these terms can be calculated as:
̂
∑( ( ) ( ))
(2.15)
〈 〉
( )( )
(2.16)
∑( ( ) 〈 〉)
(2.17)
̂
where ( ) represents the estimation of the model at normalized time , and 〈 〉 is the
mean normalized time step in the ( ) derivative data.
th
If the corrected F-value is larger than the corresponding critical F-value ( ), then the
deterministic model can be considered significant. Then, if the goodness-of-fit of the model
( ) is satisfactory for the modeler (e.g. ), such model can be used to describe
the deterministic component of the dynamic system.
The method proposed in this report for the identification of models in noisy dynamic systems is
summarized in Figure 4.
Figure 4. Proposed procedure for modeling and identification of dynamic systems with noise.
1. Obtain the discrete set of dynamic data: Information about the response variable(s)
( ) and the time of observation ( ) for each of observations.
(3.1)
There will be no value of the elapsed time for the last (final) observation.
( )
(3.2)
4. Normalize the time scale such that the initial transformed time is zero and one unit of
transformed time corresponds to :
(3.3)
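5. Calculate the sample average and sample variance of the response variable(s), as
estimates of the expected value and variance of the randomistic variable:
$$E(X) \approx \langle X\rangle = \frac{1}{N}\sum_{i=1}^{N} X_i$$
(3.4)
$$\mathrm{Var}(X) \approx s_X^2 = \frac{1}{N-1}\sum_{i=1}^{N}\left(X_i - \langle X\rangle\right)^2$$
(3.5)
where $X_i$ is the $i$th observation and $N$ is the number of observations.
6. Calculate the forward finite differences for the response variable(s) as an estimate of
its derivative:
$$\left(\frac{\Delta X}{\Delta\tau}\right)_i = \frac{X_{i+1} - X_i}{\tau_{i+1} - \tau_i}$$
(3.6)
where $\tau_i$ is the normalized time of the $i$th observation.
7. Estimate the expected value and variance of the derivative using the sample average
and sample variance of the finite differences:
$$E\left(\frac{dX}{d\tau}\right) \approx \left\langle\frac{\Delta X}{\Delta\tau}\right\rangle = \frac{1}{N-1}\sum_{i=1}^{N-1}\left(\frac{\Delta X}{\Delta\tau}\right)_i$$
(3.7)
$$\mathrm{Var}\left(\frac{dX}{d\tau}\right) \approx s_{\Delta X/\Delta\tau}^2 = \frac{1}{N-2}\sum_{i=1}^{N-1}\left(\left(\frac{\Delta X}{\Delta\tau}\right)_i - \left\langle\frac{\Delta X}{\Delta\tau}\right\rangle\right)^2$$
(3.8)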
8. If ⁄ , repeat steps 6 and 7 for the next derivative until a global minimum
variance is found, that is, until the variance increases. The derivative with the minimum
variance will be the critical derivative. If no global minimum is found after a certain
predefined maximum number of derivatives ( ), then the response
variable can be considered to be dominantly deterministic. In general, the expressions
for the th derivative will be:
( ) ( )
( ) ( )
(3.9)
∑ ( )
( ) 〈 ⁄ 〉
(3.10)
∑ (( ) 〈 ⁄ 〉)
( ) ⁄
(3.11)
() ( ) () () ( )
( ) ∫ ( )
(3.12)
()
where and it is non-existent for previous observation times.
The corresponding expected value and variance are estimated using the following
expressions:
()
() () ∑
( ) 〈 〉
(3.13)
()
∑ ( 〈 ( ) 〉)
()
( ) ()
(3.14)
10. Each derivative higher than the critical derivative ( ) can be modeled as a white-noise
random variable as follows:
(3.15)
function (e.g. normal) and testing the suitability of the best model obtained,[10] or ii)
Approximating the probability density function using a polynomial function, and
identifying the coefficients of the polynomial using the data available.[11]
11. Model the deterministic component of the critical derivative using any conventional
modeling method (e.g. linear or non-linear regression, or any other method). Perform
an analysis of variance for the model obtained. Additionally calculate the integral noise
term as:
∑ (( ) 〈 ⁄ 〉)
〈 〉
( )
(3.16)
Correct the F-value and coefficient of the model using Eq. (2.13) and (2.14),
respectively, and analyze significance and goodness-of-fit for the model obtained.
12. If no significant deterministic model for the critical derivative is found, then model all
other lower-order derivatives and/or integrals as random variables, as described in step
10. Otherwise, use conventional modeling approaches (e.g. statistical regression) for
determining both the deterministic and random components of the corresponding
model.
13. Particularly for the $(c-1)$th derivative, it is important to test if the supposed
deterministic behavior is truly deterministic or is caused by random walks. This requires
calculating the observed Student’s T values:
( ) ( )
( )
√( )
(3.17)
and computing the proportion ( ) of data points whose absolute T value is larger than
the critical :
∑ (| ( )| )
(3.18)
where represents Heaviside’s step function.
If , then the observed deterministic behavior is most probably independent of
randomness. Otherwise it is most probably caused by the random walk of a Markovian
random variable. In case of doubt, different confidence levels can be used.
(3.19)
Also, please notice that the in the original time scale is given by:
( )
(3.20)
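The search for the critical derivative described in the procedure above can be sketched as follows. This is not the dynoise implementation: it assumes equally spaced data on the normalized time scale (unit step), uses a cumulative sum started at zero as the numerical integral, and all function names are hypothetical.

```python
import random
import statistics

def differentiate(values):
    """Forward finite differences with a unit (normalized) time step."""
    return [b - a for a, b in zip(values, values[1:])]

def integrate(values):
    """Cumulative sum with a unit time step (rectangle rule, starting from zero)."""
    total, out = 0.0, []
    for v in values:
        total += v
        out.append(total)
    return out

def variance_profile(values, max_order=4):
    """Map order -> sample variance; positive orders are derivatives, negative are integrals."""
    profile = {0: statistics.variance(values)}
    derivative = list(values)
    integral = list(values)
    for order in range(1, max_order + 1):
        derivative = differentiate(derivative)
        integral = integrate(integral)
        profile[order] = statistics.variance(derivative)
        profile[-order] = statistics.variance(integral)
    return profile

def critical_order(profile):
    """Order (derivative > 0, integral < 0) showing the minimum variance."""
    return min(profile, key=profile.get)

rng = random.Random(2)
noise = [rng.gauss(0.0, 1.0) for _ in range(500)]   # white noise
walk = integrate(noise)                             # random walk built from that noise

print(critical_order(variance_profile(noise)))  # minimum expected at the data itself (order 0)
print(critical_order(variance_profile(walk)))   # minimum expected at the first derivative (order 1)
```

For unevenly sampled data the differences would have to be divided by the normalized time increments, and the integration rule would need the actual step sizes; both are omitted here for brevity.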
4. Examples
As a first example let us consider the data presented in Figure 1. This data set containing 251
data points was obtained by dynamic Monte Carlo simulation using the following discrete
dynamic model:
( ) ( )
(4.1)
The procedure described in Section 3 was performed to obtain the critical derivative. Four
derivatives and four integrals of the original data for the normalized time were considered. The
variances of these variables are summarized in Table 1 and Figure 5. The critical derivative
(minimum variance variable) was found at the first derivative of the original data set. This result
is consistent with the equation used to generate the data (Eq. 4.1), since it can be expressed as:
( ) ( )
(4.2)
The test for Markovian behavior at the original data set (which corresponds to the integral of
the critical derivative) indicates that with a 99% confidence, there is a 73.3% probability that the
behavior is the result of randomness. That is, this method successfully identifies that the
original data set is a sample of a Markovian variable. Figure 6 shows the test of Markovian
behavior for all data points using a 99% confidence interval. It can be observed that most of the
data points lie within the confidence interval for Markovian behavior, indicating that the
deterministic behavior was not significant compared to the effect of randomness.
Table 1. Value of variance obtained for different integrals and derivatives of the data set
considered in Example 4.1, using the normalized time.
Variable   Description          Variance
(4)X       Fourth integral      1.73E+13
(3)X       Third integral       4.69E+09
''X        Second integral      6.30E+05
'X         First integral       2.80E+01
X          Original data set    2.60E-03
X'         First derivative     9.48E-05
X''        Second derivative    1.95E-04
X(3)       Third derivative     6.04E-04
X(4)       Fourth derivative    2.06E-03
[Figure 5 plot: "Critical derivative identification". Vertical axis: log10(Variance). Horizontal axis: Variable derivative.]
Figure 5. Behavior of the decimal logarithm of the variance as a function of the order of the
derivative (positive) or integral (negative) for the data set of Example 4.1.
[Figure 6 plot: "Test for Markovian behavior: 99 % confidence". Vertical axis: X(c-1). Horizontal axis: Normalized time.]
Figure 6. Test for Markovian behavior at 99% confidence for the original data in Example 4.1.
[Figure 7 plot. Horizontal axis: Normalized time.]
Figure 7. Dynamic behavior of the critical derivative (first derivative) of the data in Example 4.1.
Now, let us model the behavior of the critical derivative. Figure 7 shows the dynamic behavior
of the first forward finite difference as an approximation to the first derivative of the original
data. A simple linear regression model in time results in the following equation:
(4.3)
Then, the data presented in Figure 7 is used to identify the parameters of a normal distribution.
The optimal model obtained for is a white-noise normal distribution with
§
The determination coefficient of the random model is obtained by determining the goodness-of-fit of
the model to the cumulative probability obtained from the data. The residuals of the model are
calculated as the closest distance between the value of the cumulative probability function obtained
from the model for each data point and the cumulative probability interval corresponding to the point,
similar to the Kolmogorov-Smirnov test. If the prediction lies within the interval, the residual is set to
zero. On the other hand, the total sum of squares is determined considering the central value of each
cumulative probability interval.
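The determination coefficient described in this note can be sketched as follows. The construction of the cumulative probability interval of the i-th sorted point as [(i-1)/n, i/n] is an assumption (the note does not give it explicitly), as are the synthetic sample and the function name.

```python
import random
import statistics

def random_model_r2(data, model_cdf):
    """R^2 of a probability model against the empirical cumulative probability intervals."""
    xs = sorted(data)
    n = len(xs)
    centers = [(i + 0.5) / n for i in range(n)]   # central value of each interval
    center_mean = sum(centers) / n
    ss_res = 0.0
    for i, x in enumerate(xs):
        lo, hi = i / n, (i + 1) / n   # assumed interval for the i-th sorted point (0-based)
        p = model_cdf(x)
        if p < lo:
            ss_res += (lo - p) ** 2   # prediction below the interval
        elif p > hi:
            ss_res += (p - hi) ** 2   # prediction above the interval
        # prediction inside the interval: residual is zero
    ss_tot = sum((c - center_mean) ** 2 for c in centers)
    return 1.0 - ss_res / ss_tot

rng = random.Random(3)
sample = [rng.gauss(0.0, 0.1) for _ in range(500)]
mean, sd = statistics.mean(sample), statistics.stdev(sample)

fitted = statistics.NormalDist(mean, sd)
shifted = statistics.NormalDist(mean + 5.0 * sd, sd)   # deliberately wrong model

r2_fitted = random_model_r2(sample, fitted.cdf)
r2_shifted = random_model_r2(sample, shifted.cdf)
print(r2_fitted, r2_shifted)
```

A model that matches the data gives a value close to 1, while a badly located model can give a negative value because its residuals exceed the total sum of squares.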
The second example consists of data obtained from the dynamic Monte Carlo simulation of the
following model describing the motion in one dimension of a certain particle subject to random
environmental forces:
(4.4)
where is the position of the particle in one direction, is the mass of the particle, is the
standard deviation of the random force acting on the particle in that direction, and is a
standard normal random number.
The data set was obtained by numerically solving the model presented in Eq. (4.4) assuming
mass unit, 〈 〉 force unit, and using Euler’s integration method with =0.01 time
units, but with a sampling time of 0.1 time units, that is, the data is recorded every 10
integration steps. The particle starts at rest at a position of 1 distance unit. The dynamic
behavior of the position of the particle for the first 2.5 time units is presented in Figure 8 along
with a polynomial regression model.
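A data set of this kind can be generated with a sketch like the one below. The numerical values of the mass and of the standard deviation of the force are not legible in the text, so unit values are assumed here; the way the random force enters each explicit Euler step (one independent standard normal draw per integration step) is also an assumption, and the function name is hypothetical.

```python
import random

def simulate_particle(sigma_force=1.0, mass=1.0, x0=1.0, dt=0.01,
                      sample_every=10, t_end=2.5, seed=0):
    """Explicit Euler integration of m * d2x/dt2 = sigma_force * N(0, 1), sampled every few steps."""
    rng = random.Random(seed)
    n_steps = round(t_end / dt)
    x, v = x0, 0.0            # particle starts at rest at position x0
    samples = [x]
    for step in range(1, n_steps + 1):
        force = sigma_force * rng.gauss(0.0, 1.0)   # assumed: new random force at every step
        x += v * dt                                 # position update with the current velocity
        v += (force / mass) * dt                    # velocity update with the current force
        if step % sample_every == 0:
            samples.append(x)                       # record every 10th integration step
    return samples

positions = simulate_particle()
print(len(positions), positions[0], positions[-1])
```

With an integration step of 0.01 and a sampling time of 0.1, the first 2.5 time units yield 26 recorded positions, including the initial one.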
Figure 8. Dynamic behavior of the position of a particle subject to random forces. Blue dots:
Data obtained from numerical integration of Eq. (4.4). Red line: Polynomial regression model.
The regression model identified from the data set describes the motion of a particle with an
initial position of distance unit, an initial velocity of distance units per time unit, and a
constant acceleration of distance units per square time unit, corresponding to a
constant force of force units. Knowing the source of the data, this is clearly incorrect.
However, if the source of the data is not known, this could have been the conclusion of this
analysis, supported by a high value of almost .
Now, using the dynoise algorithm implementing the method proposed in Section 3, the
following results are obtained:
Variance vector:
Var
X(-4) 1.464066e+07
X(-3) 5.234835e+05
X(-2) 9.649167e+03
X(-1) 6.698097e+01
X(0) 4.927131e-03
X(1) 3.026376e-05
X(2) 8.088210e-06
X(3) 1.046126e-05
X(4) 2.313044e-05
[Figure 9 plots. Left panel vertical axis: log10(Variance). Right panel vertical axis: X(c-1).]
Figure 9. Graphical results obtained for the data of Example 4.2. Left plot: Logarithm of
variance vs. derivative order. Right plot: Velocity vs. normalized time.
The method correctly identifies the main source of noise at the second derivative of the
position (acceleration). Furthermore, it is found that the velocity presents without any doubt a
Markovian behavior, as can be seen in Figure 9. Therefore, it can be concluded that the
behavior of the position is random and not deterministic, indicating that the regression model
presented in Figure 8 does not satisfactorily describe the true nature of the system. A white-
noise normal probability model of the second derivative yields ⁄ distance
units per square original time units ( ). This model underestimates the original
standard deviation, although the estimate is less than 20% below its true value.
In this example, the dynamic model previously presented in Eq. (4.4) is modified by
incorporating a constant force acting on the particle. The new model is:
(4.5)
where the same parameters as in the previous example are used, and considering
force units. The data obtained from the numerical integration, and the corresponding
polynomial regression model, are summarized in Figure 10.
Figure 10. Dynamic behavior of the position of a particle subject to random and deterministic
forces. Blue dots: Data obtained from numerical integration of Eq. (4.5). Red line: Polynomial
regression model.
The deterministic behavior predicted by the regression model indicates that the particle has an
initial velocity of distance units per unit of time, and that it is subject to a constant
force of distance units per square time unit. Although the initial velocity can be
neglected, the deterministic force is about larger than the real force applied to the
particle. Please notice that this model has a coefficient of , so there should be no
reason to doubt the goodness of this model. Let us, however, perform the dynamic analysis
of noise for this case. The graphical results are summarized in Figure 11.
Variance vector:
Var
X(-4) 1.478924e+07
X(-3) 5.337957e+05
X(-2) 1.004452e+04
X(-1) 7.400518e+01
X(0) 2.164395e-02
X(1) 1.452142e-04
X(2) 4.401511e-06
X(3) 7.367433e-06
X(4) 1.965444e-05
[Figure 11 plots. Left panel: variance vs. derivative order. Right panel vertical axis: X(c-1).]
Figure 11. Graphical results obtained for the data of Example 4.3. Left plot: Logarithm of
variance vs. derivative order. Right plot: Velocity vs. normalized time.
The critical derivative was again found to be the second derivative of the position. The velocity
in this case did not show a purely Markovian behavior. The probability of being a true
deterministic effect was found to be 68%. This is a remarkable result as the constant
deterministic force was 10 times smaller than the standard deviation of the random force.
Now, for the critical derivative data, a linear regression model with respect to time results in
the following corrected values: and ( ) . This
means that the second derivative of position is a fixed random variable with a non-zero mean.
Thus, modelling the second derivative with respect to the normalized time as a normal
distribution, the following parameters are found: ⁄ , ⁄
( ). Therefore, the value of the constant deterministic force acting on the particle
would be:
⁄
(4.6)
This model obtained at the critical derivative provides an improved estimation on the true
deterministic force acting on the particle ( ), compared to the value obtained from the
regression model of the observed variable. The estimated standard deviation of the force in
this case would be .
The next example is taken from data published by NASA's Goddard Institute for Space Studies
(GISS) (https://climate.nasa.gov/vital-signs/global-temperature/), indicating the Global
Temperature Anomaly (°C) observed from 1880 to 2018. This temperature anomaly is
determined as the change in global surface temperature relative to 1951-1980 average
temperatures. The non-smoothed data is graphically presented in Figure 12, along with a
polynomial regression model. The data shows a clear increase in global temperature,
particularly from the 1960’s. Given that the data is noisy, the purpose of this example is finding
a more accurate model of global temperature change.
The results obtained using the analysis of minimum variance (dynoise) are the following:
Variance vector:
Var
X(-4) 7.954541e+11
X(-3) 6.575329e+08
X(-2) 2.256168e+05
X(-1) 2.764175e+01
X(0) 1.134398e-01
X(1) 1.273689e-02
X(2) 3.102713e-02
X(3) 9.351267e-02
X(4) 3.100849e-01
Figure 12. Dynamic behavior of the global temperature anomaly between 1880 and 2018. Blue
dots: Data obtained from NASA. Red line: Polynomial regression model.
[Figure 13 plots. Left panel vertical axis: log10(Variance). Right panel vertical axis: X(c-1).]
Figure 13. Graphical results obtained for the data of Example 4.4 from 1880 to 2018. Left plot:
Logarithm of variance vs. derivative order. Right plot: Temperature anomaly vs. normalized
time.
The results obtained indicate that random processes are predominant in this dynamic system,
and that any deterministic effect is not larger than the effect of randomness. The right plot in
Figure 13 illustrates this effect. It can be seen that the data lies within the limits of the potential
random walk behavior. Given that the highest temperatures have been registered since 2001, it
is possible that a deterministic effect cannot be clearly observed in such a long period of time.
Thus, the same analysis is done considering only the data from 2001 to 2018:
Variance vector:
Var
X(-4) 1.951128e+05
X(-3) 1.684098e+04
X(-2) 7.214294e+02
X(-1) 1.202650e+01
X(0) 1.804477e-02
X(1) 9.034559e-03
X(2) 2.121958e-02
X(3) 6.736381e-02
X(4) 2.390725e-01
[Figure 14 plots. Left panel vertical axis: log10(Variance). Right panel vertical axis: X(c-1).]
Figure 14. Graphical results for the data of Example 4.4 from 2001 to 2018. Left plot: Logarithm
of variance vs. derivative order. Right plot: Temperature anomaly vs. normalized time.
Even during the present millennium, a possible deterministic effect on global temperature
cannot be differentiated from the behavior of a pure random walk.
Continuing with the modeling of the system, the behavior of the critical derivative over the
whole time range is presented in Figure 15. A Shapiro-Wilk normality test indicates that the first
derivative data is normal ( , ). Thus, it can be modeled as a white-
noise normal random variable. The optimal value identified for the standard deviation is
⁄ ( ). Thus, the dynamic model can be expressed as:
( ) ( )
(4.7)
where represents the year and is a random number from a standard normal random
distribution.
[Figure 15 plot. Vertical axis: Critical derivative. Horizontal axis: Normalized time.]
Figure 15. Dynamic behavior of the first derivative of global temperature from the data in
Example 4.4.
Figure 16. Probability density function of the random models obtained for the yearly change in
average global temperature (in °C/yr). Red dashed line: Model obtained using the 1880-2018
data (Eq. 4.7). Blue solid line: Model obtained using the 2001- 2018 data (Eq. 4.8).
A similar model obtained from the data between 2001 and 2018 is ( ):
( ) ( )
(4.8)
Both models are compared in Figure 16. The similitude [10] between the two random models is
. Both random models are very similar, although the standard deviation for the data
between 2001 and 2018 is slightly lower than the standard deviation observed for the whole
range (1880-2018).
Historical data for the USD/EUR daily exchange rate during 2017, obtained from investing.com,
is presented in Figure 17. A simple regression analysis indicates that in 2017, the USD/EUR
exchange rate decreased at an average rate of , or equivalently, .
Figure 17. Dynamic behavior of the USD/EUR daily exchange rate for 2017. Day 1 corresponds to
January 1st 2017. Blue dots: Data obtained from investing.com. Red line: Linear regression
model.
The analysis of the dynamic system in the presence of noise yields the following results (see
also Figure 18):
Variance vector:
Var
X(-4) 4.575447e+12
X(-3) 5.762850e+09
X(-2) 3.913140e+06
X(-1) 9.627974e+02
X(0) 1.648129e-03
X(1) 1.276320e-04
X(2) 2.321122e-03
X(3) 5.470130e-02
X(4) 1.380370e+00
[Figure 18 plots. Left panel vertical axis: log10(Variance). Right panel vertical axis: X(c-1).]
Figure 18. Graphical results for the USD/EUR daily exchange rate data of Example 4.5 for 2017.
Left plot: Logarithm of variance vs. derivative order. Right plot: Exchange rate vs. normalized
time.
According to these results, the decrease in exchange rate during 2017 could not be considered
significant compared to the dynamic effect of randomness. Furthermore, the optimal model
obtained for describing the behavior of the exchange rate at the first derivative is (
):
(4.9)
In order to validate both the linear regression model and the model obtained after analyzing
the behavior of noise, the daily exchange rate for 2018 will be included in the data. The two-
year data is presented in Figure 19, along with the prediction of the linear regression model
obtained for 2017. Clearly, the linear regression model previously obtained did not have a good
prediction capability. Let us now analyze again the noise of the two-year data set.
Figure 19. Dynamic behavior of the USD/EUR daily exchange rate for 2017 and 2018. Day 1
corresponds to January 1st 2017. Blue dots: Data obtained from investing.com. Red line: Linear
regression model obtained using only 2017 data.
Variance vector:
Var
X(-4) 1.275540e+15
X(-3) 3.841462e+11
X(-2) 6.248026e+07
X(-1) 3.693980e+03
X(0) 1.464498e-03
X(1) 1.177795e-04
X(2) 2.066066e-03
X(3) 4.915564e-02
X(4) 1.269285e+00
At first glance, the right plot in Figure 20 suggests that the random behavior of the exchange
rate considering both years is similar to the behavior observed only for 2017 (right plot of
Figure 18). Estimating again the standard deviation for a white-noise normal random
distribution, the new model obtained is ( ):
(4.10)
[Figure 20 plots. Left panel vertical axis: log10(Variance). Right panel vertical axis: X(c-1).]
Figure 20. Graphical results for the USD/EUR daily exchange rate data of Example 4.5 for 2017
and 2018. Left plot: Logarithm of variance vs. derivative order. Right plot: Exchange rate vs.
normalized time.
Figure 21. Probability density function of the random models obtained for the daily change in
USD/EUR exchange rate. Red dashed line: Model obtained from 2017 data (Eq. 4.9). Blue solid
line: Model obtained from 2017 and 2018 data (Eq. 4.10).
Figure 21 presents a comparison of the probability density distribution obtained in the models
presented in Eq. (4.9) and (4.10). The similitude [10] obtained between those two models is
. These results indicate that there is only a small decrease in the width of the
distribution by including the data from 2018, without a significant change in the behavior of the
random variable. Therefore, it can be concluded that both models (Eq. 4.9 and 4.10) are in
principle identical.
As a last example, let us consider the data previously reported [12] on the practical
implementation of a body-weight control method. The data, along with a linear regression
model, is presented in Figure 22.
Figure 22. Dynamic behavior of a body weight under a weight control method. Blue dots: Data
reported in [12]. Red line: Linear regression model.
Variance vector:
Var
X(-4) 1.723218e+08
X(-3) 2.708154e+07
X(-2) 2.259027e+06
X(-1) 7.029331e+04
X(0) 1.545614e+00
X(1) 5.319603e+00
X(2) 8.829995e+02
X(3) 1.683780e+05
X(4) 3.492943e+07
Since the critical derivative is at the original dataset (the minimum variance corresponds to
X(0)), the linear regression model is reliable. Furthermore, the deterministic behavior is
confirmed to be significant. It is, however, necessary to correct the R² of the regression model
using the SSI, and to validate its significance with the corrected F-value.
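The position of the critical derivative can be checked directly from the variance vector printed above. The following is a minimal sketch that simply re-enters the nine reported variances; the variable names (orders, critical_order) are illustrative and are not taken from the function used in this report.

```r
# Variances reported above for orders -4 (fourth integral) to 4 (fourth derivative)
orders <- -4:4
Var <- c(1.723218e+08, 2.708154e+07, 2.259027e+06, 7.029331e+04,
         1.545614e+00, 5.319603e+00, 8.829995e+02, 1.683780e+05,
         3.492943e+07)
names(Var) <- paste0("X(", orders, ")")

# The critical derivative is the order showing the global minimum variance
critical_order <- orders[which.min(Var)]
print(critical_order)
```

The minimum is found at order 0, that is, at the original dataset, and it is not located at either end of the tested range.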
On the other hand, the residuals of this regression model can be modelled as a white-noise
normal random distribution with ( ). Please notice that this model
already includes the integral noise. The obtained randomistic model is thus:
( ) ( )
(4.11)
where is the weight in kg, is the time in days from the beginning of the weight-control
method implementation, is a standard normal random variable, and the total value
was calculated as:
(4.12)
where is the determination coefficient of the deterministic model before the correction
( ), and is the determination coefficient of the random model based on the fit of
the cumulative probability distribution.
Since the random variable is the result of two effects, additive noise (measurement noise) in
the weight and integral noise from the weight change rate, a more detailed randomistic model
of the system would be:
(4.13)
( ) ( ) ∫
(4.14)
where the standard deviation of the derivative of the weight was obtained from the in
( ) 〈 〉
additive noise is: √ ( ) . ( ) and 〈 〉 are obtained
from the dataset. Alternatively, the standard error estimated from the regression model ( )
can be used instead of , resulting in . , ∫ and
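The separation between additive noise and integral noise described above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the intercept, slope and both standard deviations are hypothetical placeholders (they are not the values fitted in this report), and the integral of the white noise in the change rate is approximated by a cumulative sum with a time step of one day.

```r
set.seed(42)                 # reproducible illustration
n <- 120                     # hypothetical number of daily measurements
t <- 0:(n - 1)               # time in days
a <- 80                      # hypothetical intercept (kg)
b <- -0.05                   # hypothetical slope (kg/day)
sd_add <- 0.4                # hypothetical std. dev. of the additive noise (kg)
sd_rate <- 0.1               # hypothetical std. dev. of the change-rate noise (kg/day)

# Additive (measurement) noise: independent at every time point
additive_noise <- sd_add * rnorm(n)

# Integral noise: accumulated white noise of the derivative (dt = 1 day)
integral_noise <- cumsum(sd_rate * rnorm(n))

# Randomistic variable: deterministic trend plus both random contributions
W <- a + b * t + additive_noise + integral_noise
```

In such a simulation the spread of the additive term stays constant over time, whereas the spread of the integral term grows with time, which is why both contributions may need to be modeled separately.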
5. Conclusion
Randomness taking place at the time derivatives of a certain observed dynamic variable might
have a significant effect on the outcome of the dynamic variable, and in some cases, it might be
even more important than deterministic effects. When randomness is more relevant than
determinism at the derivative of the observed variable, the latter will behave as a Markovian
random variable. This means that the observed behavior is the result of chance, and therefore,
it is not a repeatable, predictable result. Modeling of noisy dynamic systems should be done
carefully in order to avoid reaching false conclusions about the system behavior. In this report,
a numerical procedure is proposed for identifying the best model (random, deterministic or
randomistic) that should be considered for a noisy dynamic system, using only the measured
data as an input. The procedure involves the identification of the critical derivative, which
defines the limit between randomness and determinism. This limit is found where a global
minimum variance is observed after a normalization of the time scale. Derivatives higher than
the critical derivative can be modeled as white noise (usually normal, according to an extension
of the central limit theorem). The standard deviation of such white noise can be determined by
minimizing the residuals of the cumulative probability distribution.[10] Derivatives (or integrals)
lower than the critical derivative can be modeled using conventional methods, as long as the
emerging determinism is not Markovian. A Markovian test is proposed for assessing the
significance of the emerging determinism, compared to the randomness present in the system.
Furthermore, when the random component of a derivative is modeled, the integral noise
coming from the next derivative should be taken into account in order to obtain a better model
of the system. Different examples were presented in order to test the method. The data of
some of these examples was obtained from dynamic Monte Carlo simulation results,
confirming that the proposed approach satisfactorily identified the original model of the
system. Additional real-life examples illustrate how random processes can easily be
misinterpreted as deterministic when the noise in the data is neglected (e.g. global temperature
change, exchange rates). The method also makes it possible to validate whether a certain
change in the system provides significant results or whether they are just the result of chance
(e.g. the efficacy of weight-loss methods).
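The identification of the critical derivative summarized above can be sketched on simulated data. The example below is a simplified illustration and not the function used in this report: it assumes a unit (already normalized) time step, approximates derivatives by plain finite differences, and tests derivatives only (no integrals). The data is a simulated random walk, for which white noise acts on the first derivative.

```r
set.seed(123)
n <- 500
x <- cumsum(rnorm(n))        # random walk: white noise enters at the first derivative

max_order <- 3               # highest derivative order tested (illustrative choice)
Var <- numeric(max_order + 1)
names(Var) <- paste0("X(", 0:max_order, ")")

Var[1] <- var(x)             # order 0: the data itself
for (k in 1:max_order) {
  # k-th finite difference as a stand-in for the k-th derivative (dt = 1)
  Var[k + 1] <- var(diff(x, differences = k))
}

# Critical derivative: the order showing the minimum variance
critical_order <- unname(which.min(Var)) - 1
print(Var)
print(critical_order)
```

For this simulated series the minimum variance is found at the first derivative, consistent with the way the data was generated.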
Acknowledgments
This research did not receive any specific grant from funding agencies in the public,
commercial, or not-for-profit sectors.
References
[1] Hernandez, H. (2018). On the Behavior of Dynamic Random Variables. ForsChem Research
Reports 2018-09. doi: 10.13140/RG.2.2.20135.19366.
[2] Hernandez, H. (2018). The Realm of Randomistic Variables. ForsChem Research Reports
2018-10. doi: 10.13140/RG.2.2.29034.16326.
[4] Hernandez, H. (2016). Variance algebra applied to dynamical systems. ForsChem Research
Reports 2016-2. doi: 10.13140/RG.2.2.36507.26403.
[6] Hernandez, H. (2019). Sums and Averages of Large Samples Using Standard
Transformations: The Central Limit Theorem and the Law of Large Numbers. ForsChem
Research Reports 2019-01. doi: 10.13140/RG.2.2.32429.33767.
[9] Kutner, M. H., Nachtsheim, C. J., Neter, J., & Li, W. (2005). Applied linear statistical models.
5th Ed. Boston: McGraw-Hill Irwin.
[11] Hernandez, H. (2018). Comparison of Methods for the Reconstruction of Probability Density
Functions from Data Samples. ForsChem Research Reports 2018-12. doi:
10.13140/RG.2.2.30177.35686.
} else {
omsg=paste("The critical derivative is at the",order[cr],"derivative of the data set.")
}
if (cr==M){
omsg=paste(omsg,"Please notice that this was the maximum derivative tested. Try increasing the maximum number of derivatives for finding a global minimum.")
}
}
if (cr<0){
if (cr<(-4)){
omsg=paste0("The critical derivative is at the ",-cr,"-th integral of the data set.")
} else {
omsg=paste("The critical derivative is at the",order[-cr],"integral of the data set.")
}
if (cr==(-L)){
omsg=paste(omsg,"Please notice that this was the maximum integral tested. Try increasing the maximum number of integrals for finding a global minimum.")
}
}
if (disp==TRUE){
Vardf=data.frame(Var)
rownames(Vardf)=colnames(X)
print("Variance vector:")
print(Vardf)
print(omsg)
plot((-L:M),log10(Var),xlab="Variable derivative",ylab="log10(Variance)",
  main="Critical derivative identification",type="l")
}
print(omsg)
plot(t,X[,L+cr],xlab="Normalized time",ylab="X(c-1)",
  main=paste("Test for Markovian behavior:",100*(1-alpha),"% confidence"),
  ylim=c(min(min(X[,L+cr],na.rm=TRUE),X[1,L+cr]-tcr*sqrt(max(t)*Var[jc]/3)),
  max(max(X[,L+cr],na.rm=TRUE),X[1,L+cr]+tcr*sqrt(max(t)*Var[jc]/3))))
lines(t,X[1,L+cr]+tcr*sqrt(t*Var[jc]/3))
lines(t,X[1,L+cr]-tcr*sqrt(t*Var[jc]/3))
}
#Calculation of SSI for the critical derivative
SSI=((N-abs(cr)-2)*mean(dt)/(2*max(dt)))*Var[jc+1]
omsg=paste("The sum of squares of the integral noise (SSI) is:",SSI,
  "in normalized time units, and",SSI/((max(dt))^(2*(cr+1))),"in original time units.")
if (disp==TRUE){
print(omsg)
}
}
#Results
output=data.frame(t,X)
return(output)
}