ELECTRONICS
LAB PRACTICE GUIDE
SANGOLQUÍ, ECUADOR
2018
Practice Guide for the Stochastic Processes Laboratory
INTRODUCTION
The practices will be carried out by the students after they have reviewed the corresponding
guide and completed the preparatory work. The preparatory work is individual and will be
checked before the practice begins.
The practices may be carried out in groups of at most 2 students, and all group members
must participate without exception. Each group must arrange in advance to have all the
items/requirements needed to carry out each practice.
A report on each practice must be submitted within 8 days through the virtual classroom
platform in use. The report must be uploaded to the platform in PDF format and will be
defended individually in the next laboratory session.
Report format
1. Title of the practice
2. Authors and affiliation
3. Abstract (overview in fewer than 200 words)
4. Introduction (theoretical background, motivation, and objectives)
5. Methods and materials (if applicable)
6. Procedure of the practice (process, components, code, functionality, etc.)
7. Results and analysis (use annexes if necessary)
8. Conclusions and recommendations
9. Bibliography
10. Annexes (if required)
Grading rubric
                                                     Performance
Activity              Excellent                   Good                        Fair                        Poor
Preparatory work      Knows the details of        Knows only                  Barely knows                Did not do the
                      the work                    generalities                the topic                   preparatory work
                      (6 points)                  (4 points)                  (2 points)                  (0 points)
Practice report       Report includes all         Report includes all         Report includes             Did not submit
                      sections and                activities but not          sections and activities     the report
                      activities                  all sections                only partially
                      (7 points)                  (5 points)                  (3 points)                  (0 points)
Presentation of       Knows the details           Knows only                  Barely knows                Did not carry out
the practice                                      generalities                the topic                   the practice
                      (7 points)                  (5 points)                  (3 points)                  (0 points)
Recommendations
UNIT 1
PRACTICE GUIDE No. 1.1
1. Topic
Introduction to counting and sampling in games of chance.
3. Deliverables
Each group's report in PDF format, submitted through the online platform.
4. Objectives
Familiarize the student with counting and sampling in random processes using Matlab.
Understand the functionality of the various Matlab commands available for modeling
games of chance.
5. Materials
Computer with Matlab installed.
6. Procedure
Carry out the activities listed in Annex 1 of this guide.
7. Questions
Describe the main Matlab commands available for generating random numbers.
8. Bibliography
1. Athanasios Papoulis. Probability, Random Variables and Stochastic Processes, 4th Edition.
McGraw-Hill, ISBN 978-0071226615, 2002.
2. Hwei P. Hsu. Schaum’s Outline of Theory and Problems of Probability, Random Variables,
and Random Processes, 2nd Edition. McGraw Hill, ISBN 0-07-030644-3, 2011.
UNIT 1
PRACTICE GUIDE No. 1.2
1. Topic
Calculation of probabilities and Bayes' theorem.
3. Deliverables
Each group's report in PDF format, submitted through the online platform.
4. Objectives
Familiarize the student with the calculation of probabilities and the use of Bayes' theorem
in Matlab.
Understand the functionality of the various Matlab commands available for determining
probabilities.
5. Materials
Computer with Matlab installed.
6. Procedure
Carry out the activities listed in Annex 2 of this guide.
7. Questions
Describe the main Matlab commands available for calculating probabilities.
8. Bibliography
1. Athanasios Papoulis. Probability, Random Variables and Stochastic Processes, 4th Edition.
McGraw-Hill, ISBN 978-0071226615, 2002.
2. Hwei P. Hsu. Schaum’s Outline of Theory and Problems of Probability, Random Variables,
and Random Processes, 2nd Edition. McGraw Hill, ISBN 0-07-030644-3, 2011.
UNIT 2
PRACTICE GUIDE No. 2.1
1. Topic
Modeling of Poisson and Markov processes.
3. Deliverables
Each group's report in PDF format, submitted through the online platform.
4. Objectives
Familiarize the student with the use of Poisson and Markov processes in Matlab.
Understand the functionality of the various Matlab commands available for modeling
Poisson and Markov processes.
5. Materials
Computer with Matlab installed.
6. Procedure
Carry out the activities listed in Annex 3 of this guide.
7. Questions
Describe the main Matlab commands available for modeling Poisson and Markov
processes.
8. Bibliography
1. Athanasios Papoulis. Probability, Random Variables and Stochastic Processes, 4th Edition.
McGraw-Hill, ISBN 978-0071226615, 2002.
2. Hwei P. Hsu. Schaum’s Outline of Theory and Problems of Probability, Random Variables,
and Random Processes, 2nd Edition. McGraw Hill, ISBN 0-07-030644-3, 2011.
UNIT 2
PRACTICE GUIDE No. 2.2
1. Topic
Analysis of Poisson processes.
3. Deliverables
Each group's report in PDF format, submitted through the online platform.
4. Objectives
Familiarize the student with the analysis of Poisson processes in Matlab.
Understand the functionality of the various Matlab commands available for analyzing
Poisson processes.
5. Materials
Computer with Matlab installed.
6. Procedure
Carry out the activities listed in Annex 4 of this guide.
7. Questions
Describe the main Matlab commands available for analyzing Poisson processes.
8. Bibliography
1. Athanasios Papoulis. Probability, Random Variables and Stochastic Processes, 4th Edition.
McGraw-Hill, ISBN 978-0071226615, 2002.
2. Hwei P. Hsu. Schaum’s Outline of Theory and Problems of Probability, Random Variables,
and Random Processes, 2nd Edition. McGraw Hill, ISBN 0-07-030644-3, 2011.
UNIT 3
PRACTICE GUIDE No. 3.1
1. Topic
3. Deliverables
4. Objectives
Familiarize the student with the simulation in Matlab of systems similar to a binary
communication channel.
Understand the functionality of the various Matlab commands available for simulating
random systems.
5. Materials
Computer with Matlab installed.
6. Procedure
Carry out the activities listed in Annex 5 of this guide.
7. Questions
Describe the main Matlab commands available for simulating random systems.
8. Bibliography
1. Athanasios Papoulis. Probability, Random Variables and Stochastic Processes, 4th Edition.
McGraw-Hill, ISBN 978-0071226615, 2002.
2. Hwei P. Hsu. Schaum’s Outline of Theory and Problems of Probability, Random Variables,
and Random Processes, 2nd Edition. McGraw Hill, ISBN 0-07-030644-3, 2011.
UNIT 3
PRACTICE GUIDE No. 3.2
1. Topic
Determining the reliability of a system.
3. Deliverables
Each group's report in PDF format, submitted through the online platform.
4. Objectives
Familiarize the student with random parameters such as the reliability of a system,
using Matlab.
Understand the functionality of the various Matlab commands available for calculating
the reliability of a system.
5. Materials
Computer with Matlab installed.
6. Procedure
Carry out the activities listed in Annex 6 of this guide.
7. Questions
Describe the main Matlab commands available for determining the reliability of a
system.
8. Bibliography
1. Athanasios Papoulis. Probability, Random Variables and Stochastic Processes, 4th Edition.
McGraw-Hill, ISBN 978-0071226615, 2002.
2. Hwei P. Hsu. Schaum’s Outline of Theory and Problems of Probability, Random Variables,
and Random Processes, 2nd Edition. McGraw Hill, ISBN 0-07-030644-3, 2011.
ANNEXES
Annex 1. Counting, Sampling, and Games in Matlab [1]
1. Introduction
In this lab, we will study counting experiments and demonstrate how they relate to random
sampling from a set. These ideas will be used to examine some of the games that people play. As
with most problems in engineering, you will be required to do some mathematical reasoning and
then verify your results using a software tool. For this class, that software tool will be Matlab.
Another Matlab function that is useful in this lab is randperm. For instance,
randperm(n,k) returns a row vector containing k unique integers selected randomly from 1 to
n inclusive.
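As a small illustration of the call described above, the following sketch draws 6 distinct numbers from 1 to 42 and also produces a full random ordering of 1 to 5. The variable names draw and order are chosen here for illustration only.

```matlab
% Draw 6 distinct integers from 1..42 (sampling without replacement).
draw = randperm(42, 6);

% With a single argument, randperm returns a random ordering of 1..n.
order = randperm(5);
```

Because the draw is random, the values change on every run; only the absence of repeated values is guaranteed.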
There is one function that both computes the value of $\binom{n}{k}$ and returns all of the
combinations of length k, namely nchoosek(n,k). This function takes two inputs. When the first
input is a single number n, the output is the numerical value of $\binom{n}{k}$. When the first
input is a vector of length n, nchoosek returns the $\binom{n}{k}$ combinations of the n elements
of that vector (the numbers 1 through n when the vector is 1:n), taken k at a time. To see this,
type nchoosek(5,3) to see that there are 10 combinations of 5 elements taken 3 at a time. To see
the actual combinations, type nchoosek(1:5,3).
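The two calling forms can be compared side by side. This is a minimal sketch; the names count and combos are illustrative.

```matlab
count  = nchoosek(5, 3);      % scalar first input: the number of combinations
combos = nchoosek(1:5, 3);    % vector first input: one combination per row

% The number of rows of combos equals count, and each row has 3 entries.
n_rows = size(combos, 1);
```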
When this last function is called with only one return variable (num_matches above), it only
returns the number of matches. If you want to see which values are actually matched, you can
replace the last line with
3. Lotto
Lotto is a game where each player chooses K unique numbers out of a possible N numbers
until the State closes the game. At that point, the State picks its own K unique numbers and
then pays each player based on the number of correctly matched numbers.
In a hypothetical Lotto game, let there be N = 42 possible numbers. Each player (and hence
the State) chooses K = 6 numbers. This means that there are $\binom{42}{6} = 5245786$ possible
ways to choose 6 numbers from 42 possible numbers. Therefore, the odds of matching all 6 numbers
are 5245786 to 1. But what are the odds of matching, say, only 4 numbers? To see this, break the
42 possible numbers into a desired set of 6 numbers and an unwanted set of 36 numbers. If a player
matches 4 of the desired numbers, then the player also holds 2 numbers from the unwanted
set. Therefore, the total number of combinations of 4 desired numbers and 2 unwanted numbers
is $\binom{6}{4}\binom{36}{2} = 9450$. This means that the odds of matching exactly 4 numbers are
5245786 to 9450, which is approximately 555 to 1. In general, there are
$$\binom{6}{k}\binom{42-6}{6-k}$$
ways to match k = 0, 1, ..., 6 numbers in a (42, 6) Lotto. The odds of matching k numbers are just
the number of possible outcomes vs. the number of outcomes that produce k matches, namely
$\binom{42}{6}$ to $\binom{6}{k}\binom{42-6}{6-k}$. The 'inverse of odds' is the probability of k matches,
$$P(k; 42, 6) = \frac{\binom{6}{k}\binom{42-6}{6-k}}{\binom{42}{6}}; \quad k = 0, 1, \ldots, 6.$$
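The counts and probabilities above can be checked numerically with nchoosek. The sketch below evaluates the formula for every k; the variable names (ways, total, P, odds_against) are illustrative and not part of the lab's required functions.

```matlab
N = 42; K = 6;
k = 0:K;
ways = zeros(size(k));
for i = 1:numel(k)
    % desired numbers matched times unwanted numbers picked
    ways(i) = nchoosek(K, k(i)) * nchoosek(N - K, K - k(i));
end
total = nchoosek(N, K);          % all possible tickets
P = ways / total;                % P(k; 42, 6) for k = 0..6
odds_against = total ./ ways;    % "total to ways" expressed as a single ratio
```

The entry for k = 4 reproduces the 9450 combinations and the roughly 555 to 1 odds quoted in the text, and the probabilities add up to one.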
There are two functions that you need to create for simulating a Lotto game. The first one,
lotto_game.m, must play a Lotto game with M players and return the draw that the
State made along with the player draws and their respective matches. To use this function, you
have to define three values: N = total possible numbers, k = number of draws for each player,
and M = number of players. For example, typing
lotto_game(42,6,5)
produces output such as
State =
35 2 25 41 12 37
player_data =
5 3 12 40 4 14 1
42 9 21 13 29 40 0
33 28 6 4 1 14 0
35 42 1 33 25 22 2
11 35 12 32 28 8 2
In the player_data matrix, the last column contains the number of matches that the player
got.
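One possible way to produce output of this shape is sketched below as a plain script rather than as the required function file. It assumes that each player picks at random with randperm and that matches are counted with intersect; neither choice is prescribed by the guide.

```matlab
N = 42; K = 6; M = 5;                  % same values as in the example call
State = randperm(N, K);                % the State's K unique numbers
player_data = zeros(M, K + 1);
for m = 1:M
    pick = randperm(N, K);             % one player's K unique numbers
    num_matches = numel(intersect(pick, State));
    player_data(m, :) = [pick, num_matches];   % picks, then match count
end
```

Wrapping these lines in a function with inputs (N, K, M) and outputs (State, player_data) would give the interface described above.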
The second function to create is lotto_histo.m, which plays a Lotto game with M players
and plots the measured and theoretical statistics for the game. To see this, you must type the
following:
way to do this is to set up a vector of length M that is one when the Powerball matches and
zero otherwise. Then, you can use find on this vector to select the correct indices for the various
matches vectors. Also, the first two plots are of conditional probabilities, so their probabilities
may not sum up to one. However, the final plot is a PMF and the sum of its probabilities should
be one.)
4. Keno
Keno is similar to Lotto in that the players choose K numbers out of a possible N and are
paid based on matching k numbers. However, in Keno the State draws n ≥ K
numbers. Such a game is referred to as an (N, n, K) Keno game. The probability of getting k
matches in an (N, n, K) Keno game is
$$P(k; N, n, K) = \frac{\binom{n}{k}\binom{N-n}{K-k}}{\binom{N}{K}}; \quad k = 0, 1, \ldots, K,$$
where $\binom{n}{k}$ is the number of ways a player can match k desired numbers, $\binom{N-n}{K-k}$ is
the number of ways a player can match K − k unwanted numbers, and $\binom{N}{K}$ is the total number
of ways that a player can choose K numbers from a possible N numbers.
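The formula can be evaluated directly with nchoosek. The sketch below uses an (80, 20, 10) game, which is a hypothetical choice made here so that it does not coincide with the game in the assignment.

```matlab
N = 80; n = 20; K = 10;     % hypothetical game parameters
k = 0:K;
P = zeros(size(k));
for i = 1:numel(k)
    P(i) = nchoosek(n, k(i)) * nchoosek(N - n, K - k(i)) / nchoosek(N, K);
end
total_probability = sum(P);  % a PMF over k = 0..K should sum to one
```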
4.1. Assignment
In a hypothetical scenario, consider playing a (60, 20, 10) Keno game. What is the proba-
bility of matching 0 ≤ k ≤ 10 numbers?
(Optional) Write Matlab functions named keno_game.m and keno_histo.m that provide
the same information as lotto_game.m and lotto_histo.m, respectively. Use these
programs to list and plot example plays and interesting graphs.
The Lotto program given earlier is a special case of the Keno program. How would you
use the Keno program to simulate the Lotto experiment?
5. Horse Racing
In a 12-horse race, there are 12! = 479001600 possible ways for the horses to finish, so
choosing the order of all 12 is a long shot. However, there are only (12)(11)(10) = $(12)_3$ = 1320
possible orders of finish for the first, second, and third horses. Matching the top three
horses in order is known as a 'trifecta.'
5.1. Assignment
(Optional) Write a Matlab program that simulates many horse races and keeps track of
the number of trifectas that are hit. Have the program display (or echo) the estimated
probability of hitting a trifecta. What is the theoretical probability of hitting a trifecta?
How does the estimated probability compare to the actual probability? Use many trials.
A 'trifecta box' bet allows a player to pick the top three horses without specifying the
order. These bets cost 6 times as much as a regular trifecta bet. Why? In how many possible
ways can a trifecta box hit? (Hint: Think about going from permutations to combinations.)
(Optional) Write a Matlab program that simulates many horse races and keeps track of the
number of trifecta boxes that are hit. Have the program display (or echo) the estimated
probability of hitting a trifecta box. What is the theoretical probability of hitting a trifecta
box? How does the estimated probability compare to the actual probability? Use many
trials.
Annex 2. Probability, Conditional Probability and Bayes' Theorem in Matlab [2]
1. Introduction
In this lab you will use Matlab to help solve a variety of problems in probability theory. The
last exercises require familiarity with Bayes' theorem.
You will use two of Matlab’s random-number functions, rand and randperm, to simulate
random experiments. These can be used to check your solutions to simple problems – for more
complex problems where it is difficult or impossible to find an analytic solution, these simulation-
based methods are often a good alternative.
2. Simple Simulation
Carrying out probability experiments is known as sampling. In probability theory it is useful
to distinguish between sampling with replacement and sampling without replacement. In the
former case, the conditions of a probability experiment remain the same from one experiment
to the next, so that the probabilities do not change. In the latter case, the conditions change,
based on the outcome of previous experiments. For example, consider selecting balls from a
bag containing 3 red balls and 3 green balls. If the balls are replaced after each experiment,
the probability of selecting a green or red ball will always be 0.5. However, if the balls are not
replaced, selecting a green ball on the first experiment will make selecting a green ball less likely
for the second experiment. Examples of sampling with replacement include tossing a coin or
throwing a die. Examples of sampling without replacement include lottery draws or selections
for a football team.
In Matlab, you can use the rand function to simulate sampling with replacement, and
randperm to simulate sampling without replacement.
Set w = rand(1,1).
Then if w ≤ 0.5, say event A has occurred; if 0.5 < w ≤ 0.8, say B has occurred; and
otherwise say C has occurred.
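The thresholding rule above can be repeated many times in one vectorized call. The following sketch assumes 10000 repetitions, which is an arbitrary choice, and estimates how often each event occurs; with these thresholds the estimates should be close to 0.5, 0.3 and 0.2.

```matlab
M = 10000;                          % number of repetitions (arbitrary choice)
w = rand(M, 1);                     % one uniform draw per experiment
isA = (w <= 0.5);
isB = (w > 0.5) & (w <= 0.8);
isC = (w > 0.8);
estimates = [mean(isA), mean(isB), mean(isC)];   % relative frequencies of A, B, C
```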
If the experiment is repeated many times, the results can be used to estimate event proba-
bilities. The more experiments there are, the more accurate the estimation becomes. You will
investigate this for the simple example of rolling a four-sided die, where each number from 1 to
4 occurs with probability 0.25. You should:
2. Say that a 4 has been thrown on the ith experiment if w(i) is less than or equal to
0.25. Count how many times a 4 has been thrown in the 100 experiments, and call it n.
4. How close is this estimate to the true value, 0.25? Repeat the steps above for 500, 1000,
5000 and 10000 experiments. What do you find?
If each experiment involved rolling 2 dice, you could simulate N experiments using rand(N,2),
with each row corresponding to one experiment.
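As a sketch of the two-dice case, each uniform draw can be mapped to a face of a four-sided die with ceil. The mapping and the event checked here (a double 4) are illustrative choices, not steps required by the guide.

```matlab
N = 1000;
w = rand(N, 2);                  % row i holds the two uniform draws of experiment i
dice = ceil(4 * w);              % map each draw to a face 1..4 with equal probability
double_four = all(dice == 4, 2); % true for rows where both dice show 4
p_hat = mean(double_four);       % relative frequency of a double 4
```

For fair dice the estimate should approach 1/16 as N grows.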
You will repeat this simulation to estimate the probability of drawing a red ball first, then
a green ball second.
1. Write a Matlab program to repeat the above simulation 100 times. Produce a count, n, of
how many times that a red ball was drawn first, and a green ball second.
2. After repeating the simulation, estimate the probability using p(red, green) = n/100
4. Repeat the above steps for 500, 1000, 5000 and 10000 experiments. What do you find?
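A possible structure for such a program is sketched below. It assumes the bag from the earlier example (3 red and 3 green balls) and uses randperm to draw two balls without replacement; under that assumption the theoretical value is (3/6)(3/5) = 0.3.

```matlab
bag = 'RRRGGG';                  % assumed bag: 3 red and 3 green balls
trials = 100;
n = 0;
for t = 1:trials
    order = randperm(6, 2);      % positions of the first and second ball drawn
    if bag(order(1)) == 'R' && bag(order(2)) == 'G'
        n = n + 1;               % red first, then green
    end
end
p_hat = n / trials;              % estimate of p(red, green)
```

Changing trials to 500, 1000, 5000 and 10000 shows how the estimate settles as the number of experiments grows.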
3. Problems
On paper, calculate the solutions to the following problems. In each case, check your answer
by writing a Matlab program to simulate the problem and estimating the required probability
from a very large number of experiments, using the methods from Section 2. You will need to
decide whether the problem is equivalent to sampling with replacement or sampling without
replacement.
1. A fair coin is tossed four times. What is the probability of getting two heads and two tails
(in any order)?
2. A lottery has balls numbered from 1 to 10. Five balls are drawn, and the winner must
match all five balls (ordering doesn’t matter). What is the probability of winning?
3. Two four-sided dice are thrown at the same time. This is repeated three times. What is
the probability that a double 4 is thrown at least once out of the three times?
4. A four-sided die is thrown 6 times. What is the probability of throwing two 4s
consecutively? (This is much easier to simulate than to calculate by hand!)
4. Case: Lecture Attendance
The head of the Department is worried about poor lecture attendance among students. He
decides to commission a survey to investigate possible causes. In particular, he is interested
in whether the timing of lectures affects attendance, and whether it varies between males and
females. The Department counts attendance at two lectures for near-identical courses, one held
at 9h00, the other at 10h00, and, using the database of students registered for each course, finds
the following data:
You can load the above data as two matrices, data1 and data2, from the file lab2.mat. You
should be able to answer the questions that follow by performing computations directly on these
matrices using vectorization techniques. We use M to denote the event that a student is male,
F for the event that a student is female; P for the event that a student is present, and A for
the event that a student is absent. Obviously these two pairs of events are mutually exclusive!
1. For each lecture, use the data to find matrices giving the joint probability tables of being
present or absent from a lecture and being male or female, i.e.,
             Present       Absent
   Males     p(M ∩ P)      p(M ∩ A)
   Females   p(F ∩ P)      p(F ∩ A)
2. For each lecture, find two vectors, one giving p(M) and p(F), the other giving p(P) and
p(A).
3. Use your answers to the above two questions to state, for each lecture, whether a student's
sex is a factor affecting lecture attendance, by finding whether the events M and F
are independent of the events P and A.
4. Now compute the matrices giving the conditional probabilities of a student attending a
lecture, given the student's sex, i.e.,
             Present       Absent
   Males     p(P|M)        p(A|M)
   Females   p(P|F)        p(A|F)
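The vectorized computations asked for above can be sketched on a made-up count matrix. The numbers below are hypothetical and are not the contents of lab2.mat; rows are assumed to be [males; females] and columns [present, absent].

```matlab
% Hypothetical counts, NOT the lab2.mat data.
counts = [30 20; 24 16];
joint  = counts / sum(counts(:));      % joint table: p(M and P), p(M and A); p(F and P), p(F and A)
p_sex  = sum(joint, 2);                % column vector [p(M); p(F)]
p_att  = sum(joint, 1);                % row vector [p(P), p(A)]
cond   = joint ./ [p_sex, p_sex];      % conditional table: p(P|M), p(A|M); p(P|F), p(A|F)
% Independence holds when the joint table equals the product of the marginals.
indep  = max(max(abs(joint - p_sex * p_att))) < 1e-12;
```

These made-up counts were chosen so that sex and attendance come out independent; with real data the comparison will usually show a nonzero difference whose size has to be judged.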
1. One of the two coins is selected (we don’t know which). The coin is flipped and comes up
heads. What is the probability that the coin chosen is the biased one, given that it came
up heads?
2. The same coin is flipped a second time. What is the probability that the coin comes
up heads, given that it came up heads on the first flip? Why are the two events not
independent?
3. Suppose one of the coins is flipped 2n times. Write a function in Matlab to compute the
probability of obtaining n heads and n tails, given that the coin is fair, and given that the
coin is biased (this should be an argument to the function). The function should work for
any value of n.
4. The coin has been randomly selected. Use your function to compute the probability that
the chosen coin is the biased one, given that n throws were heads and n throws were tails.
Plot the value of this probability for n from 0 to 40. Explain the shape of your plot.
5. Now suppose that there are two biased coins (both the same as before) and one unbiased
coin. One coin is selected randomly. Given the same scenario of n heads and n tails being
obtained, modify your calculations from question 4 to calculate the probability that the
coin chosen is biased. What is the lowest value of n for which it is more likely that the
coin chosen was the unbiased one?
Annex 3. Continuous Distributions and Language Modelling [3]
1. Introduction
In this practical you will use the exponential distribution to model the firing pattern of a
neuron. This requires familiarity with the lecture material on continuous probability distribu-
tions. In addition, you will study the distribution of letters (alphabetic characters) in the English
language.
1. Write down the probability density function for T , assuming that it has an exponential
distribution.
2. Given that we know the mean firing rate to be 10Hz, what is the best choice of the
parameter, λ, of the exponential distribution model?
You can use a histogram to plot the probability density of the sample firing data. To produce
a histogram, use the following Matlab commands:
The first command divides the samples into 'bins', each with width 0.025, and produces a
count, n, of how many samples are in each bin. The second line calculates the probability
density of each bin, by dividing the count by the total number of samples (1000) and the width
of the bin, and then plots this as a bar chart.
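The binning and normalization described above can be sketched as follows. The original commands are not reproduced here; this version uses synthetic samples in place of the lab's firing data, and the rate lambda_true and the variable names are assumptions made for illustration.

```matlab
lambda_true = 8;                              % hypothetical rate for the synthetic data
T = -log(rand(1000, 1)) / lambda_true;        % 1000 exponential samples (inverse-CDF method)
width   = 0.025;                              % bin width used in the text
centers = width/2 : width : max(T) + width;   % bin centers covering all samples
n = hist(T, centers);                         % count of samples in each bin
density = n / (numel(T) * width);             % counts scaled so the bars have total area one
```

Calling bar(centers, density) then draws the bar chart, and the scaled heights can be compared directly with a probability density function.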
2. On the same plot, display the probability density function for the exponential distribution
with your parameter chosen in Question 2 (use Matlab’s fplot function). How well does
it fit the experimental data?
1. p(T ≤ 0.15)
3. p(T > 0.15 | T > 0.05). This is the probability that the neuron waits more than 0.15 s before
firing, given we have observed that it has already waited 0.05 s.
4. Can you explain the connection between your last two answers?
chars, an array containing all the letters in order, and a 27th symbol, <b>, signifying the
gaps between words; this should be treated like any other character.
unigram_counts, a vector containing the number of occurrences in the dictionary of each
letter. For example, unigram_counts(1) is the number of times 'a' occurs in the dictionary.
bigram_counts, a matrix containing counts of pairs of adjacent letters in the dictionary.
For example, bigram_counts(1,2) is the number of times 'a' is followed by 'b' in the
dictionary.
4.1. Analysis
1. In Matlab, list the letters ordered by how frequently they occur in the dictionary of English.
3. Compute the probability of observing each letter (including word breaks), assuming suc-
cessive letters in a word are independent.
4. Calculate the entropy of the distribution. What is the expected number of bits per letter
needed to code an English word?
5. What assumptions have you made in question 4 that mean that your answer is unlikely to
be true in practice for coding English text?
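The entropy computation in question 4 can be sketched on a toy example. The counts below are hypothetical (a 4-symbol alphabet, not the dictionary data), chosen so that the result is easy to verify by hand.

```matlab
% Hypothetical counts for a 4-symbol alphabet, not the dictionary data.
counts = [8 4 2 2];
p  = counts / sum(counts);           % unigram probabilities: 0.5, 0.25, 0.125, 0.125
nz = p > 0;                          % skip zero entries, since 0*log2(0) is taken as 0
H  = -sum(p(nz) .* log2(p(nz)));     % entropy in bits per symbol
```

Here H evaluates to 1.75 bits, which matches the hand calculation 0.5(1) + 0.25(2) + 2(0.125)(3).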
1. Use the data in bigram counts to compute the full set of bigram probabilities for letters,
p(Li |Li−1 ), where Li is any letter and Li−1 is the preceding letter.
2. Use the original unigram model to compute the probability of observing the word ‘enjoy-
ment’, and of observing the fake word ‘eejmnnoty’. (It is helpful to work using logs).
3. Now use the bigram model to calculate the same probabilities. Comment on your findings.
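As a sketch of the bigram calculation, the example below normalizes each row of a small count matrix and sums log probabilities along a word. The 3-symbol alphabet, the counts, and the word are all hypothetical; the underscore stands in for the word-break symbol.

```matlab
chars = 'ab_';                                  % toy alphabet; '_' stands in for <b>
bigram_counts = [2 6 2; 5 1 4; 6 3 1];          % hypothetical counts, row = previous symbol
row_tot = sum(bigram_counts, 2);
P = bigram_counts ./ repmat(row_tot, 1, numel(chars));  % P(i,j) = p(chars(j) | chars(i))
word = '_ab_';                                  % word with its leading and trailing breaks
[~, idx] = ismember(word, chars);               % symbol indices of each character
logp = 0;
for t = 2:numel(idx)
    logp = logp + log(P(idx(t-1), idx(t)));     % add the log of each transition probability
end
```

Working with logs avoids numerical underflow for long words; exp(logp) recovers the probability itself, here 0.6 x 0.6 x 0.4 = 0.144.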
Annex 4. Poisson Regression [4]
1. Introduction
When dealing with two or more variables, the functional relation between the variables is
often of interest. For count data, one model that is frequently used is the Poisson regression model
and applications are found in most sciences: technology, medicine etc. The Poisson regression
model is also implemented in many packages for statistical analysis of data. In this computer
lab you will learn more about:
The Poisson regression model and how to estimate the model parameters.
Model selection, i.e., the number of explanatory variables to use.
Before starting the lab, read the theory and try to explain the difference between linear
regression and Poisson regression.
3. The Poisson Regression Model
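Let's say we have a sequence of count data, $n_i$, $i = 1, \ldots, k$, for some event, e.g., the
number of people killed in traffic accidents in a year. These counts are assumed to be
observations of random variables $N_i \in \mathrm{Po}(\mu_i)$ (called responses or dependent variables)
with mean value $\mu_i = \mu_i(x_{i1}, \ldots, x_{ip})$. The variables $x_{i1}, \ldots, x_{ip}$ are called
explanatory variables [6] and are assumed to measure factors that influence the count data.
We restrict $\mu_i$ to be a log-linear function [7],
$$\mu_i = \exp(\beta_0 + \beta_1 x_{i1} + \ldots + \beta_p x_{ip}).$$
The parameters are estimated with the maximum likelihood (ML) method. For independent
Poisson observations the likelihood function is
$$L(\beta) = \prod_{i=1}^{k} \frac{\mu_i^{n_i} e^{-\mu_i}}{n_i!},$$
where $\mu_i = \mu_i(\beta)$ is a function of $\beta = (\beta_0, \ldots, \beta_p)$. The ML estimates
$\beta^* = (\beta_0^*, \ldots, \beta_p^*)$ are the values of $\beta$ that maximize the likelihood function
$L(\beta)$. Often it is easier to maximize the log-likelihood function,
$$l(\beta) = -\sum_{i=1}^{k} \log(n_i!) + \sum_{i=1}^{k} n_i \log(\mu_i) - \sum_{i=1}^{k} \mu_i.$$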
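By setting the first-order derivatives of the log-likelihood equal to zero, we get a system of
(p + 1) non-linear equations in $\beta_j$ (with $x_{i0} = 1$ for the intercept),
$$\frac{\partial l(\beta)}{\partial \beta_j} = \sum_{i=1}^{k} \frac{\partial \mu_i}{\partial \beta_j}\left(\frac{n_i}{\mu_i} - 1\right) = \sum_{i=1}^{k} (n_i - \mu_i)\, x_{ij} = 0, \quad j = 0, \ldots, p.$$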
Usually, the equation system must be solved with some numerical method, e.g., the
Newton-Raphson algorithm. This is also the method implemented in the function lab4_regress,
which was written for the purpose of this lab and can be found on the course web page. Use the
command "type lab4_regress" to see the code.
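To make the Newton-Raphson idea concrete, the sketch below solves the score equations for a single explanatory variable on synthetic data. It is an independent illustration and not the course's lab4_regress; the data, the starting value, and the stopping rule are assumptions made here.

```matlab
% Synthetic, deterministic data (hypothetical, not the traffic data).
x = (-5:5)';
n = round(exp(1 + 0.3*x));          % counts that roughly follow a log-linear mean
X = [ones(size(x)), x];             % column of ones gives the intercept beta_0
beta = [log(mean(n)); 0];           % start from the constant-mean model
for iter = 1:50
    mu    = exp(X*beta);            % current fitted means
    score = X'*(n - mu);            % gradient of the log-likelihood
    info  = X'*diag(mu)*X;          % negative Hessian, X' * diag(mu) * X
    step  = info \ score;           % Newton-Raphson step
    beta  = beta + step;
    if norm(step) < 1e-10, break; end
end
residual_score = X'*(n - exp(X*beta));   % should be close to zero at the optimum
```

The last line evaluates the left-hand side of the score equations at the final estimate, which is a quick way to check that the iteration has reached a solution.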
The Poisson regression model belongs to a class of models called generalized linear models. In a
generalized linear model (GLM), the mean of the response, µ, is modeled as a monotonic
(non-linear) transformation of a linear function of the explanatory variables,
g(β0 + β1 x1 + β2 x2 + ...). The inverse of the transformation function g is called the canonical
link function. In Poisson regression this function is the log function, but in other GLMs
different link functions are used; see "doc glmfit" for a list of supported link functions in the
Matlab function glmfit [8]. Also, the response may take different distributions, such as the
normal or the binomial distribution.
Below, we will use the related function glmval with the logarithmic link function to make
predictions from the fitted model.
[6] Several other names exist in the literature: independent variables, regressor variables, predictor variables.
[7] Sometimes the model incorporates an extra term $t_i$: $\mu_i = t_i \exp(\beta_0 + \beta_1 x_{i1} + \ldots + \beta_p x_{ip})$.
traffic = struct('year',data(26:end,1),'killed',data(26:end,2),...
'cars',data(26:end,5),'petrol',data(26:end,6));
Question 1: Which are the explanatory variables? And which is the response?
Redraw the plot from above for the reduced data set:
plot(traffic.year,traffic.killed,'o')
figure(1), hold on
We start the analysis with one explanatory variable, traffic.year. Note the use of glmval, the
prediction routine for generalized linear models.
X1 = [traffic.year-mean(traffic.year)];
n = traffic.killed;
beta1 = lab4_regress(X1,n,1e-6);
my_fit = glmval(beta1, X1,'log');
plot(traffic.year, my_fit, 'b-')
Question 2: What is your estimate of β? Convince yourself that this is the solution. You
can utilize the following code for this purpose:
X0=ones(size(X1));
X=[X0, X1];
mu=exp(X*beta1);
X'*(n-mu)
Does it appear to be the solution? Judging from the plot, is this model sufficient to describe
the number of people killed in traffic accidents?
Although this simple model seems to capture the overall trend, adding further explanatory
variables may improve the fit. Thus, we try adding the number of cars as a variable in our model.
X2 = [traffic.year-mean(traffic.year), traffic.cars-mean(traffic.cars)];
beta2 = lab4_regress(X2,n,1e-6);
my_fit = glmval(beta2, X2,'log');
plot(traffic.year, my_fit, 'g-')
[8] glmfit uses a method called weighted least squares to compute the β estimates.
Question 3: Have your estimates β0∗ and β1∗ changed? Does accounting for the number of
cars improve the fit?
It also seems reasonable to add the quantity of petrol sold, as this would reflect the total
mileage of all cars [9].
X3 = [traffic.year-mean(traffic.year), traffic.cars-mean(traffic.cars),...
traffic.petrol-mean(traffic.petrol)];
beta3 = lab4_regress(X3,n,1e-6);
my_fit = glmval(beta3, X3,'log');
plot(traffic.year, my_fit, 'r-')
Question 4: Have your estimates of β changed now? Use the command format long to
display more digits. Which model do you choose?
Question 5: Use chi2inv to get the quantiles of the $\chi^2$ distribution. Consider a 5 %
significance level for your test.
DEV2 = 2*traffic.killed'*([X0,X3]*beta3-[X0,X2]*beta2)
Question 6: Is the improvement with model 3 significant compared to model 2? Repeat the
test for model 2 against model 1 and also for model 3 against model 1. Which model do you
choose? Do you think that a sufficient number of explanatory variables was used to
explain the traffic deaths? Why?
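The expression for DEV2 is twice the difference of the two maximized log-likelihoods. The $\log(n_i!)$ terms cancel, and because both models contain an intercept, the score equation for j = 0 gives $\sum \mu_i = \sum n_i$ for each fitted model, so the $\sum \mu_i$ terms cancel as well; what remains is $2 \sum n_i (\log \mu_i^{(3)} - \log \mu_i^{(2)})$. The sketch below shows the comparison with a $\chi^2$ quantile for a made-up deviance value. It uses gammaincinv to obtain the quantile, which is assumed here to give the same number as chi2inv without requiring the Statistics Toolbox.

```matlab
DEV   = 5.2;                        % hypothetical deviance difference, not a lab result
df    = 1;                          % number of extra parameters in the larger model
alpha = 0.05;                       % significance level
% Chi-square quantile via the inverse regularized incomplete gamma function;
% this is the value chi2inv(1 - alpha, df) would return.
threshold = 2 * gammaincinv(1 - alpha, df / 2);
reject_smaller_model = DEV > threshold;
```

With one degree of freedom the 5 % threshold is about 3.84, so a deviance difference above that value favours the larger model.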
5. Prediction
Now we want to use our model to predict the expected number of people killed in traffic
accidents six years from now, i.e., in 2016. To do this, we must first have an estimate of the
number of cars in that year. Start by plotting the number of cars vs. year,
figure(2)
plot(traffic.year, traffic.cars, 'o')
hold on
[9] Assuming that the mean fuel consumption of a car has been constant over the years: a 1970
model Volvo used about 10 l per 100 km, which is approximately the same as for the 2000 model.
Of course, the 2000 model has more than twice the horsepower.
We will here use a simple linear model for the number of cars, $y_i$, in year $x_i$,
$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i,$$
where the errors, $\varepsilon_i \in N(0, \sigma_\varepsilon^2)$, are assumed to be independent and
identically distributed. This is called a linear regression model. It is possible to estimate the
parameters with the maximum likelihood method, similarly to the Poisson regression model above.
In Matlab, the function regress computes the least-squares (LS) estimates of the linear
regression model. When the $\varepsilon_i$ are normally distributed, the LS method is equivalent
to the ML method, with exactly the same estimates.
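The least-squares fit can also be sketched with the backslash operator, which avoids the Statistics Toolbox dependency of regress. The data below are hypothetical and exactly linear, so the fit recovers the coefficients used to generate them; the name phat is chosen to match its use in the residual code that follows.

```matlab
year = (1975:2010)';                    % hypothetical years, not the lab data
cars = 2.0 + 0.05 * (year - 1975);      % hypothetical, exactly linear values
X = [ones(size(year)), year];           % intercept column and year column
phat = X \ cars;                        % least-squares estimates [beta0; beta1]
cars_2016 = phat(1) + phat(2) * 2016;   % value of the fitted line at 2016
```

With real data, regress(traffic.cars, X) would be expected to return the same estimates as X \ traffic.cars, since both solve the same least-squares problem.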
res = traffic.cars-(phat(1)+phat(2)*traffic.year);
figure(3), plot(traffic.year,res,'o')
figure(4), normplot(res)
Use the following code to predict the petrol consumption for 2016.
Notice that this time a quadratic model had to be fitted to the data.
Question 9: Are you satisfied with the obtained fits for the petrol and the number of cars?
However, for our purpose these rough estimates are sufficient. The expected number of people
killed can now be predicted using $\mu_i = \exp(\beta_0 + \beta_1 x_{i1} + \ldots + \beta_p x_{ip})$,
Annex 5. Simulating a Binary Communication Channel [10]
1. Introduction
A binary symmetric channel is a common communications channel model used in coding
theory and information theory. In this model, a transmitter wishes to send a bit (a zero or a
one), and the receiver receives a bit. It is assumed that the bit is usually transmitted correctly,
but that it will be “flipped” with a small probability (the crossover probability). This channel is
used frequently in information theory because it is one of the simplest channels to analyze.
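The result is the noisy output Y = ±µ + N (+µ for a transmitted 1 and −µ for a transmitted 0),
where N is the additive channel noise. To decode the transmission, we decide that a 0 was
transmitted if Y ≤ 0 and that a 1 was transmitted if Y > 0. The overall proposed system is shown in
figure 1.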
2.1. Assignment
Derive a formula for the error probability P(E). Then, from your simulations of Y, estimate
P(E) experimentally: increment an error counter whenever Y ≤ 0 for a transmitted 1 or
Y > 0 for a transmitted 0, and divide the number of errors by the number of transmissions.
Estimate P(E) for µ = 5, m = 1000, and σ = 50, 25, 5, 2.5, 0.5, 0.25 and 0.05. Overplot the
estimated P(E) and the exact P(E) vs. $\mathrm{SNR} = 10\log_{10}(\mu^2/\sigma^2)$. What do you
conclude from this plot?
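The simulation described above can be sketched for a single noise level as follows. This is only an illustration under assumptions that are not stated in the text: the bits are taken to be equiprobable, m is read as the number of transmitted bits, and N is taken to be zero-mean Gaussian with standard deviation σ. Under the Gaussian assumption the exact error probability is Q(µ/σ), written here with the base Matlab function erfc.

```matlab
mu    = 5;                          % signal amplitude (from the assignment)
m     = 1000;                       % number of transmitted bits (assumed meaning of m)
sigma = 5;                          % one noise level from the assignment list

bits    = rand(1, m) > 0.5;         % assumption: equiprobable bits
s       = mu*(2*bits - 1);          % map 1 -> +mu, 0 -> -mu
Y       = s + sigma*randn(1, m);    % assumption: zero-mean Gaussian noise
bitsHat = Y > 0;                    % decide 1 if Y > 0, otherwise 0

PEhat   = sum(bitsHat ~= bits)/m;               % errors / transmissions
PEexact = 0.5*erfc(mu/(sigma*sqrt(2)));         % Q(mu/sigma) for Gaussian noise
SNRdB   = 10*log10(mu^2/sigma^2);
fprintf('SNR = %.1f dB, estimated P(E) = %.4f, exact P(E) = %.4f\n', SNRdB, PEhat, PEexact);
```

Looping this over the listed values of σ and plotting PEhat and PEexact against SNRdB on a logarithmic vertical axis (for example with semilogy) gives the requested comparison. With only 1000 bits, error probabilities far below 1/1000 cannot be resolved by the simulation.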
¹⁰ Material based on the course Introduction to Communications Principles from Colorado State University.
Appendix 6. System Reliability¹¹
1. Introduction
The reliability of an engineering system¹² is often defined as the probability that the system
will function as intended. We will also refer to the opposite concept, namely the failure probability
$P_f$ ($f$ stands for failure), which is the probability that the system will not function as intended.
The level of performance of a system will obviously depend on the properties of the system.
Assume that all interesting properties of an engineering system are described by a set of
parameters $x_1, x_2, \dots, x_n$. We want the system to endure a set of loads of our choice¹³ (the
system might be subjected to more than one load). The magnitudes of these loads, which we
denote $y_1, y_2, \dots, y_m$, must however be limited, due to engineering imperfection, cost
limits, time limits, and the like: there are combinations of $y_1, y_2, \dots, y_m$ and
$x_1, x_2, \dots, x_n$ where the system capacity is exceeded and where the system will inevitably
break down. We formalize this by
$$\text{The system functions as intended} \iff h(y_1; \dots; y_m; x_1; \dots; x_n) > 0$$
$$\text{The system does not function as intended} \iff h(y_1; \dots; y_m; x_1; \dots; x_n) < 0$$
The function $h$ is called the failure function (also performance function or state function). If
the parameters and the applied “loads” are subject to randomness, we instead treat them as
random variables $Y_1, Y_2, \dots, Y_m$ and $X_1, X_2, \dots, X_n$. In these terms, we can now write
the failure probability $P_f$ as
$$P_f = P(h(Y_1; \dots; Y_m; X_1; \dots; X_n) < 0)$$
The random variable $Z = h(Y_1; \dots; Y_m; X_1; \dots; X_n)$ is sometimes referred to as the safety
margin.
In this computer exercise, our goal is to calculate $P_f$. The function $h$ will always be given, as
will the distribution functions of $Y_1, \dots, Y_m$ and $X_1, \dots, X_n$. We will obtain $P_f$ from
simulations.
No real-world data today!
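The simulation approach can be sketched on a toy case before turning to the MOSFET. Everything in the example below is hypothetical: a single capacity X and a single load Y, both normal with made-up parameters, and the failure function h = X − Y. This case is convenient because the safety margin Z is itself normal, so the simulated value can be checked against the exact one.

```matlab
N = 200000;                         % number of simulated systems

X = 10 + 1.0*randn(1, N);           % hypothetical capacity, N(10, 1^2)
Y =  7 + 1.5*randn(1, N);           % hypothetical load,     N(7, 1.5^2)

h     = X - Y;                      % hypothetical failure function (safety margin Z)
Pfhat = mean(h < 0);                % fraction of simulated systems that fail
seHat = sqrt(Pfhat*(1 - Pfhat)/N);  % standard error of the estimate

% Exact value for this toy case: Z ~ N(3, 1^2 + 1.5^2), so Pf = P(Z < 0).
Pfexact = 0.5*erfc((10 - 7)/sqrt(1^2 + 1.5^2)/sqrt(2));
fprintf('Pfhat = %.4f (s.e. %.4f), exact Pf = %.4f\n', Pfhat, seHat, Pfexact);
```

The standard error shrinks like 1/sqrt(N), which is why small failure probabilities need many simulated systems to be estimated with useful relative accuracy.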
¹¹ Material based on the course Probability, Statistics and Risk from the Chalmers-University of Gothenburg.
¹² e.g., a construction, a vehicle, a production line, a multi-article stock-room logistic system, a computer
network, a nuclear power-plant, a dam, a communication satellite, or a finance portfolio.
¹³ e.g., the construction must bear a certain amount of wind load or weight; the vehicle must cover a satisfactory
distance before its engine starts malfunctioning; the production line must produce goods continuously for at least
a week (say) to be profitable; the stock-room logistic system must deliver at least 99 % (say) of the goods on order
on time and to the right orderer; etc.
2. MOSFET
A depletion-mode MOSFET (Metal-Oxide-Semiconductor Field-Effect Transistor) is a three-
terminal electronic device. When an n-channel MOSFET is connected as in figure 2, it
has the following voltage-current characteristic
$$I = \begin{cases}
A V_{TR}^2, & U > V_{TR} \quad \text{(constant current region)} \\
A(2 V_{TR} U - U^2), & 0 < U < V_{TR} \quad \text{(triode region)} \\
\text{undefined}, & U < 0
\end{cases}$$
Here $U$ is the applied voltage (i.e., the drain-source voltage), $V_{TR}$ is a threshold voltage
(always positive for n-channel MOSFETs), and $A$ is the conductance parameter.
When current flows into the positive terminal of a passive device, electrical power is dissipated
in the device as heat. This electrical power $P$ is equal to the product of the port voltage and
port current. For a multiport device, the total electrical power input is given by the sum of input
power taken over all ports. The dissipated energy will increase the temperature of the device,
which affects its properties. Every device has a maximum allowable operating temperature
that must not be exceeded. In other words, there is a maximum electrical power limit $P_{max}$.
In our case, we require
$$U \times I < P_{max}$$
if the MOSFET is to work well. Assume that $U$, $V_{TR}$, and $A$ are independent random variables:
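Pmax = 300e-3;      % maximum allowable power
N = 20000;          % number of simulated devices
EU = 10;            % mean of U
DU = 2;             % standard deviation of U
medianA = 1e-3;     % median of A
sigma = 0.2;        % standard deviation of log(A)
aVTR = 3;           % lower limit of VTR
bVTR = 5;           % upper limit of VTR
U = EU + DU*randn(1,N);                 % U is normal
A = medianA*exp(sigma*randn(1,N));      % A is lognormal
VTR = aVTR + (bVTR-aVTR)*rand(1,N);     % VTR is uniform on (aVTR, bVTR)
I = zeros(1,N);
index1 = find(U >= VTR);
index2 = find(U < VTR);
I(index1) = A(index1).*VTR(index1).^2; % Constant current region
I(index2) = A(index2).*(2*VTR(index2).*U(index2)-U(index2).^2); % Triode region
h = Pmax-U.*I;                          % failure function: h < 0 means failure
Pfhat = sum(h<0)/N                      % estimated failure probability (displayed)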
2.1. Assignment
Report simulated probabilities of failure (do a few repetitions).
Is $P(U < 0)$ negligible? If it were not, this would be bad for the MOSFET, so let us
consider the case $U < 0$ to be a failure. Write down a failure function
$h(P_{max}; U; A; V_{TR})$ with this extra condition.
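The repetitions asked for above can be sketched as a loop around the simulation. The number of repetitions R is an illustrative choice, and the logical mask tri is just a compact alternative to the find calls in the listing. The last lines evaluate P(U < 0) for a normal U with mean 10 and standard deviation 2, using the base Matlab function erfc.

```matlab
Pmax = 300e-3;  N = 20000;
R = 10;                                   % number of repetitions (illustrative choice)
Pfhat = zeros(1, R);

for r = 1:R
    U   = 10 + 2*randn(1, N);             % normal, mean 10, standard deviation 2
    A   = 1e-3*exp(0.2*randn(1, N));      % lognormal, median 1e-3
    VTR = 3 + (5 - 3)*rand(1, N);         % uniform on (3, 5)

    I      = A.*VTR.^2;                   % constant current region (U >= VTR)
    tri    = U < VTR;                     % samples in the triode region
    I(tri) = A(tri).*(2*VTR(tri).*U(tri) - U(tri).^2);

    Pfhat(r) = mean(Pmax - U.*I < 0);     % fraction of failures in this repetition
end
fprintf('mean Pfhat = %.4f, std over repetitions = %.4f\n', mean(Pfhat), std(Pfhat));

% P(U < 0) = Phi(-EU/DU) for the normal U above.
PUneg = 0.5*erfc((10/2)/sqrt(2));
fprintf('P(U < 0) = %.3g\n', PUneg);
```

The spread of Pfhat over the repetitions indicates how much of the reported value is simulation noise.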