INFORMATION THEORY, CODING & CRYPTOGRAPHY (MCSE 202)
PREPARED BY ARUN PRATAP SINGH 5/26/14 MTECH 2nd SEMESTER
STOCHASTIC PROCESS : In probability theory, a stochastic process, or random process, is a collection of random variables; it is often used to represent the evolution of some random value, or system, over time. It is the probabilistic counterpart to a deterministic process (or deterministic system). Instead of describing a process that can evolve in only one way (as in the case, for example, of solutions of an ordinary differential equation), a stochastic or random process involves some indeterminacy: even if the initial condition (or starting point) is known, there are several (often infinitely many) directions in which the process may evolve.
In the simple case of discrete time, as opposed to continuous time, a stochastic process involves a sequence of random variables and the time series associated with these random variables (for example, a Markov chain, also known as a discrete-time Markov chain). Another basic type of stochastic process is a random field, whose domain is a region of space; in other words, a random function whose arguments are drawn from a range of continuously changing values.
One approach to stochastic processes treats them as functions of one or several deterministic arguments (inputs, in most cases regarded as time) whose values (outputs) are random variables: non-deterministic (single) quantities which have certain probability distributions. Random variables corresponding to various times (or points, in the case of random fields) may be completely different. The main requirement is that these different random quantities all have the same type, where type refers to the codomain of the function. Although the random values of a stochastic process at different times may be independent random variables, in most commonly considered situations they exhibit complicated statistical correlations.
Stock market fluctuations have been modeled by stochastic processes.
UNIT : II
Given a probability space $(\Omega, \mathcal{F}, P)$ and a measurable space $(S, \Sigma)$, an S-valued stochastic process is a collection of S-valued random variables on $\Omega$, indexed by a totally ordered set T ("time"). That is, a stochastic process X is a collection
$$\{X_t : t \in T\}$$
where each $X_t$ is an S-valued random variable on $\Omega$. The space S is then called the state space of the process.
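To make the definition concrete, the following is a minimal sketch of one S-valued process with S the integers and T = {0, 1, ..., n}: a simple random walk. The function name `random_walk`, the parameter `p_up`, and the choice of a random walk are illustrative assumptions, not part of the original notes.

```python
import random

def random_walk(n_steps, p_up=0.5, seed=None):
    """Return one sample path X_0, X_1, ..., X_n of a simple random walk.

    Each X_t is an integer-valued random variable; the index t plays the
    role of "time" in the definition above.
    """
    rng = random.Random(seed)
    path = [0]  # X_0 = 0 is the known starting point
    for _ in range(n_steps):
        step = 1 if rng.random() < p_up else -1
        path.append(path[-1] + step)
    return path

# Two realizations from the same starting point may evolve differently.
path_a = random_walk(10, seed=1)
path_b = random_walk(10, seed=2)
```

Each call produces one realization of the whole collection of random variables; fixing the seed only makes a particular realization reproducible.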
STATISTICAL INDEPENDENCE : In probability theory, to say that two events are independent (alternatively called statistically independent or stochastically independent) [1] means that the occurrence of one does not affect the probability of the other. Similarly, two random variables are independent if the realization of one does not affect the probability distribution of the other. In some instances, the term "independent" is replaced by "statistically independent", "marginally independent", or "absolutely independent".
For events :
Two events :
Two events A and B are independent if and only if their joint probability equals the product of their probabilities:
$$P(A \cap B) = P(A)\,P(B).$$
Why this defines independence is made clear by rewriting with conditional probabilities:
$$P(A \cap B) = P(A)\,P(B) \iff P(A) = \frac{P(A \cap B)}{P(B)} = P(A \mid B)$$
and similarly $P(A \cap B) = P(A)\,P(B) \iff P(B) = P(B \mid A)$. Thus, the occurrence of B does not affect the probability of A, and vice versa. Although the derived expressions may seem more intuitive, they are not the preferred definition, as the conditional probabilities may be undefined if P(A) or P(B) is 0. Furthermore, the preferred definition makes clear by symmetry that when A is independent of B, B is also independent of A.
More than two events :
A finite set of events $\{A_i\}$ is pairwise independent iff every pair of events is independent. [2] That is, if and only if for all distinct pairs of indices m, n
$$P(A_m \cap A_n) = P(A_m)\,P(A_n).$$
A finite set of events is mutually independent if and only if every event is independent of any intersection of the other events. [2] That is, iff for every subset $\{A_1, \ldots, A_n\}$
$$P\left(\bigcap_{i=1}^{n} A_i\right) = \prod_{i=1}^{n} P(A_i).$$
This is called the multiplication rule for independent events. For more than two events, a mutually independent set of events is (by definition) pairwise independent, but the converse is not necessarily true.
For random variables :
Two random variables :
Two random variables X and Y are independent iff the elements of the π-system generated by them are independent; that is to say, for every a and b, the events $\{X \le a\}$ and $\{Y \le b\}$ are independent events (as defined above). That is, X and Y with cumulative distribution functions $F_X(x)$ and $F_Y(y)$, and probability densities $f_X(x)$ and $f_Y(y)$, are independent if and only if (iff) the combined random variable (X, Y) has a joint cumulative distribution function
$$F_{X,Y}(x, y) = F_X(x)\,F_Y(y),$$
or equivalently, a joint density
$$f_{X,Y}(x, y) = f_X(x)\,f_Y(y).$$
More than two random variables :
A set of random variables is pairwise independent iff every pair of random variables is independent. A set of random variables is mutually independent iff for any finite subset $X_1, \ldots, X_n$ and any finite sequence of numbers $a_1, \ldots, a_n$, the events $\{X_1 \le a_1\}, \ldots, \{X_n \le a_n\}$ are mutually independent events (as defined above). The measure-theoretically inclined may prefer to substitute events $\{X \in A\}$ for events $\{X \le a\}$ in the above definition, where A is any Borel set. That definition is exactly equivalent to the one above when the values of the random variables are real numbers. It has the advantage of working also for complex-valued random variables or for random variables taking values in any measurable space (which includes topological spaces endowed with appropriate σ-algebras).
Conditional independence :
Intuitively, two random variables X and Y are conditionally independent given Z if, once Z is known, the value of Y does not add any additional information about X. For instance, two measurements X and Y of the same underlying quantity Z are not independent, but they are conditionally independent given Z (unless the errors in the two measurements are somehow connected). The formal definition of conditional independence is based on the idea of conditional distributions. If X, Y, and Z are discrete random variables, then we define X and Y to be conditionally independent given Z if
$$P(X \le x, Y \le y \mid Z = z) = P(X \le x \mid Z = z)\,P(Y \le y \mid Z = z)$$
for all x, y and z such that P(Z = z) > 0. On the other hand, if the random variables are continuous and have a joint probability density function p, then X and Y are conditionally independent given Z if
$$p_{XY|Z}(x, y \mid z) = p_{X|Z}(x \mid z)\,p_{Y|Z}(y \mid z)$$
for all real numbers x, y and z such that $p_Z(z) > 0$. If X and Y are conditionally independent given Z, then
$$P(X = x \mid Y = y, Z = z) = P(X = x \mid Z = z)$$
for any x, y and z with P(Z = z) > 0. That is, the conditional distribution for X given Y and Z is the same as that given Z alone. A similar equation holds for the conditional probability density functions in the continuous case. Independence can be seen as a special kind of conditional independence, since probability can be seen as a kind of conditional probability given no events.
Independent σ-algebras :
The definitions above are both generalized by the following definition of independence for σ-algebras. Let $(\Omega, \Sigma, \Pr)$ be a probability space and let $\mathcal{A}$ and $\mathcal{B}$ be two sub-σ-algebras of $\Sigma$. $\mathcal{A}$ and $\mathcal{B}$ are said to be independent if, whenever $A \in \mathcal{A}$ and $B \in \mathcal{B}$,
$$\Pr(A \cap B) = \Pr(A)\,\Pr(B).$$
Likewise, a finite family of σ-algebras $(\tau_i)_{i \in I}$ is said to be independent if and only if for all $A_i \in \tau_i$, $i \in I$,
$$\Pr\left(\bigcap_{i \in I} A_i\right) = \prod_{i \in I} \Pr(A_i),$$
and an infinite family of σ-algebras is said to be independent if all its finite subfamilies are independent. The new definition relates to the previous ones very directly:
(1) Two events are independent (in the old sense) if and only if the σ-algebras that they generate are independent (in the new sense). The σ-algebra generated by an event E is, by definition, $\sigma(\{E\}) = \{\emptyset, E, \Omega \setminus E, \Omega\}$.
(2) Two random variables X and Y defined over $\Omega$ are independent (in the old sense) if and only if the σ-algebras that they generate are independent (in the new sense). The σ-algebra generated by a random variable X taking values in some measurable space S consists, by definition, of all subsets of $\Omega$ of the form $X^{-1}(U)$, where U is any measurable subset of S.
Using this definition, it is easy to show that if X and Y are random variables and Y is constant, then X and Y are independent, since the σ-algebra generated by a constant random variable is the trivial σ-algebra $\{\emptyset, \Omega\}$. Probability zero events cannot affect independence, so independence also holds if Y is only Pr-almost surely constant.
Properties :
Self-dependence :
Note that an event is independent of itself iff $\Pr(A) = \Pr(A \cap A) = \Pr(A) \cdot \Pr(A)$, that is, iff $\Pr(A) = 0$ or $\Pr(A) = 1$. Thus if an event or its complement almost surely occurs, it is independent of itself. For example, if A is choosing any number but 0.5 from a uniform distribution on the unit interval, A is independent of itself, even though, tautologically, A fully determines A.
Expectation and covariance :
If X and Y are independent, then the expectation operator E has the property
$$E[XY] = E[X]\,E[Y]$$
and, since we have
$$\operatorname{cov}(X, Y) = E[XY] - E[X]\,E[Y],$$
the covariance cov(X, Y) is zero. (The converse, i.e. the proposition that two random variables with a covariance of 0 must be independent, is not true. See uncorrelated.)
Characteristic function :
Two random variables X and Y are independent if and only if the characteristic function of the random vector (X, Y) satisfies
$$\varphi_{(X,Y)}(t, s) = \varphi_X(t)\,\varphi_Y(s).$$
In particular the characteristic function of their sum is the product of their marginal characteristic functions:
$$\varphi_{X+Y}(t) = \varphi_X(t)\,\varphi_Y(t),$$
though the reverse implication is not true. Random variables that satisfy the latter condition are called sub-independent.
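The remark that zero covariance does not imply independence can be checked on a small case. The sketch below uses an illustrative pair that is not taken from the notes: X uniform on {-1, 0, 1} and Y = X², computed exactly with fractions.

```python
from fractions import Fraction

# Assumed illustrative pair: X is uniform on {-1, 0, 1} and Y = X**2.
support = [-1, 0, 1]
prob = Fraction(1, 3)  # P(X = x) for each x in the support

e_x = sum(prob * x for x in support)            # E[X]
e_y = sum(prob * x**2 for x in support)         # E[Y] = E[X^2]
e_xy = sum(prob * x * x**2 for x in support)    # E[XY] = E[X^3]
cov_xy = e_xy - e_x * e_y                       # cov(X, Y) = E[XY] - E[X]E[Y]

# Independence would require P(X=0, Y=0) == P(X=0) * P(Y=0).
p_joint = prob             # X = 0 forces Y = 0, so the joint probability is 1/3
p_product = prob * prob    # P(X=0) * P(Y=0) = 1/3 * 1/3

print(cov_xy)                  # 0
print(p_joint == p_product)    # False
```

The covariance vanishes because E[X] and E[X³] are both zero by symmetry, yet Y is completely determined by X.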
Examples :
Rolling a die :
The event of getting a 6 the first time a die is rolled and the event of getting a 6 the second time are independent. By contrast, the event of getting a 6 the first time a die is rolled and the event that the sum of the numbers seen on the first and second trials is 8 are not independent.
Drawing cards :
If two cards are drawn with replacement from a deck of cards, the event of drawing a red card on the first trial and that of drawing a red card on the second trial are independent. By contrast, if two cards are drawn without replacement from a deck of cards, the event of drawing a red card on the first trial and that of drawing a red card on the second trial are not independent.
Pairwise and mutual independence :
Consider the two probability spaces shown. In both cases, P(A) = P(B) = 1/2 and P(C) = 1/4. The first space is pairwise independent but not mutually independent. The second space is mutually independent. To illustrate the difference, consider conditioning on two events. In the pairwise independent case, although, for example, A is independent of both B and C, it is not independent of $B \cap C$:
$$P(A \mid B \cap C) \neq P(A).$$
In the mutually independent case, however:
$$P(A \mid B \cap C) = P(A).$$
There are also three-event examples in which
$$P(A \cap B \cap C) = P(A)\,P(B)\,P(C)$$
and yet no two of the three events are pairwise independent.
Pairwise independent, but not mutually independent, events.
Mutually independent events.
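The die example, and the gap between pairwise and mutual independence, can be verified by exact enumeration. This is a minimal sketch: the helper names `prob` and `independent` are hypothetical, and the odd/even events in the second half are an assumed illustration, not the probability spaces of the figures.

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))  # two rolls of a fair die

def prob(event):
    """Exact probability of an event given as a predicate on (first, second)."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

def independent(a, b):
    """Check the product rule P(A and B) == P(A) * P(B)."""
    return prob(lambda o: a(o) and b(o)) == prob(a) * prob(b)

def first_six(o): return o[0] == 6
def second_six(o): return o[1] == 6
def sum_eight(o): return o[0] + o[1] == 8

print(independent(first_six, second_six))  # True
print(independent(first_six, sum_eight))   # False

# Assumed illustration of pairwise but not mutual independence.
def first_odd(o): return o[0] % 2 == 1
def second_odd(o): return o[1] % 2 == 1
def sum_odd(o): return (o[0] + o[1]) % 2 == 1

pairs = [(first_odd, second_odd), (first_odd, sum_odd), (second_odd, sum_odd)]
pairwise = all(independent(a, b) for a, b in pairs)
triple = prob(lambda o: first_odd(o) and second_odd(o) and sum_odd(o))
mutual = triple == prob(first_odd) * prob(second_odd) * prob(sum_odd)

print(pairwise, mutual)  # True False
```

In the second example every pair satisfies the product rule, but two odd rolls can never have an odd sum, so the triple intersection has probability 0 rather than 1/8.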
BERNOULLI PROCESS : In probability and statistics, a Bernoulli process is a finite or infinite sequence of binary random variables, so it is a discrete-time stochastic process that takes only two values, canonically 0 and 1. The component Bernoulli variables $X_i$ are identically distributed and independent. Prosaically, a Bernoulli process is repeated coin flipping, possibly with an unfair coin (but with consistent unfairness). Every variable $X_i$ in the sequence is associated with a Bernoulli trial or experiment. They all have the same Bernoulli distribution. Much of what can be said about the Bernoulli process can also be generalized to more than two outcomes (such as the process for a six-sided die); this generalization is known as the Bernoulli scheme.
A Bernoulli process is a finite or infinite sequence of independent random variables X1, X2, X3, ..., such that
(1) for each i, the value of Xi is either 0 or 1;
(2) for all values of i, the probability that Xi = 1 is the same number p.
In other words, a Bernoulli process is a sequence of independent identically distributed Bernoulli trials. Independence of the trials implies that the process is memoryless. Given that the probability p is known, past outcomes provide no information about future outcomes. (If p is unknown, however, the past informs about the future indirectly, through inferences about p.) If the process is infinite, then from any point the future trials constitute a Bernoulli process identical to the whole process; this is the fresh-start property.
Interpretation :
The two possible values of each Xi are often called "success" and "failure". Thus, when expressed as a number 0 or 1, the outcome may be called the number of successes on the ith "trial".
Two other common interpretations of the values are true or false and yes or no. Under any interpretation of the two values, the individual variables Xi may be called Bernoulli trials with parameter p. In many applications time passes between trials, as the index i increases. In effect, the trials X1, X2, ..., Xi, ... happen at "points in time" 1, 2, ..., i, .... That passage of time and the associated notions of "past" and "future" are not necessary, however. Most generally, any Xi and Xj in the process are simply two from a set of random variables indexed by {1, 2, ..., n} or by {1, 2, 3, ...}, the finite and infinite cases.
Several random variables and probability distributions besides the Bernoullis may be derived from the Bernoulli process:
(1) The number of successes in the first n trials, which has a binomial distribution B(n, p);
(2) The number of trials needed to get r successes, which has a negative binomial distribution NB(r, p);
(3) The number of trials needed to get one success, which has a geometric distribution NB(1, p), a special case of the negative binomial distribution.
The negative binomial variables may be interpreted as random waiting times.
Formal definition :
The Bernoulli process can be formalized in the language of probability spaces as a random sequence of independent realisations of a random variable that can take values of heads or tails. The state space for an individual value is denoted by $2 = \{H, T\}$. Specifically, one considers the countably infinite direct product of copies of $2 = \{H, T\}$. It is common to examine either the one-sided set $\Omega = 2^{\mathbb{N}} = \{H, T\}^{\mathbb{N}}$ or the two-sided set $\Omega = 2^{\mathbb{Z}}$. There is a natural topology on this space, called the product topology. The sets in this topology are finite sequences of coin flips, that is, finite-length strings of H and T, with the rest of the (infinitely long) sequence taken as "don't care". These sets of finite sequences are referred to as cylinder sets in the product topology. The set of all such strings forms a sigma algebra, specifically, a Borel algebra.
This algebra is then commonly written as $(\Omega, \mathcal{F})$, where the elements of $\mathcal{F}$ are the finite-length sequences of coin flips (the cylinder sets). If the chances of flipping heads or tails are given by the probabilities $\{p, 1-p\}$, then one can define a natural measure on the product space, given by $P = \{p, 1-p\}^{\mathbb{N}}$ (or by $P = \{p, 1-p\}^{\mathbb{Z}}$ for the two-sided process). Given a cylinder set, that is, a specific sequence of coin flip results $[\omega_1, \omega_2, \ldots, \omega_n]$ at times $1, 2, \ldots, n$, the probability of observing this particular sequence is given by
$$P([\omega_1, \omega_2, \ldots, \omega_n]) = p^k (1-p)^{n-k}$$
where k is the number of times that H appears in the sequence, and n-k is the number of times that T appears in the sequence. There are several different kinds of notations for the above; a common one is to write
$$P(X_1 = \omega_1, X_2 = \omega_2, \ldots, X_n = \omega_n) = p^k (1-p)^{n-k}$$
where each $X_i$ is a binary-valued random variable. It is common to write $x_i$ for $\omega_i$. This probability P is commonly called the Bernoulli measure. [1]
Note that the probability of any specific, infinitely long sequence of coin flips is exactly zero; this is because $\lim_{n \to \infty} p^n = 0$ for any $0 \le p < 1$. One says that any given infinite sequence has measure zero. Nevertheless, one can still say that some classes of infinite sequences of coin flips are far more likely than others; this is given by the asymptotic equipartition property. To conclude the formal definition, a Bernoulli process is then given by the probability triple $(\Omega, \mathcal{F}, P)$, as defined above.
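As a small illustration of the cylinder-set probability $p^k (1-p)^{n-k}$, the sketch below samples a finite Bernoulli sequence and evaluates the probability of a specific prefix. The names `bernoulli_process` and `cylinder_probability` are hypothetical, and heads is encoded as 1 and tails as 0 by assumption.

```python
import random
from itertools import product

def bernoulli_process(n, p, seed=None):
    """Sample the first n trials X_1..X_n of a Bernoulli process (1 = H, 0 = T)."""
    rng = random.Random(seed)
    return [1 if rng.random() < p else 0 for _ in range(n)]

def cylinder_probability(sequence, p):
    """Probability p^k (1-p)^(n-k) of observing exactly this finite prefix."""
    k = sum(sequence)   # number of heads
    n = len(sequence)
    return p**k * (1 - p)**(n - k)

sample = bernoulli_process(20, p=0.3, seed=0)
print(cylinder_probability([1, 0, 1], 0.5))  # 0.125

# The probabilities of all cylinder sets of a fixed length add up to 1.
total = sum(cylinder_probability(seq, 0.3) for seq in product([0, 1], repeat=5))
```

Lengthening the prefix multiplies in further factors of p or 1-p, which is why any single infinite sequence ends up with probability zero.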
BINOMIAL DISTRIBUTION : The law of large numbers states that, on average, the expectation value of flipping heads for any one coin flip is p. That is, one writes
$$E[X_i] = P([X_i = H]) = p$$
for any one given random variable $X_i$ out of the infinite sequence of Bernoulli trials that compose the Bernoulli process. One is often interested in knowing how often one will observe H in a sequence of n coin flips. This is given by simply counting: given n successive coin flips, that is, given the set of all possible strings of length n, the number N(k,n) of such strings that contain k occurrences of H is given by the binomial coefficient
$$N(k,n) = \binom{n}{k} = \frac{n!}{k!\,(n-k)!}.$$
If the probability of flipping heads is given by p, then the total probability of seeing a string of length n with k heads is
$$P(k,n) = \binom{n}{k}\, p^k (1-p)^{n-k}.$$
This probability is known as the binomial distribution.
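A short sketch of the counting argument: P(k,n) is the number of strings with k heads times the probability of each such string. The function name `binomial_pmf` is an assumption made for illustration.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(k, n): probability of exactly k heads in n independent flips."""
    # comb(n, k) counts the strings N(k, n); each has probability p^k (1-p)^(n-k).
    return comb(n, k) * p**k * (1 - p)**(n - k)

print(binomial_pmf(2, 4, 0.5))  # 0.375

# Summing over k = 0..n covers every string of length n exactly once.
total = sum(binomial_pmf(k, 10, 0.3) for k in range(11))
```

For n = 4 and p = 1/2 there are 6 strings with two heads out of 16 equally likely strings, which gives 6/16 = 0.375.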
Of particular interest is the question of the value of P(k,n) for very long sequences of coin flips, that is, for the limit $n \to \infty$. In this case, one may make use of Stirling's approximation to the factorial, and write
$$n! = \sqrt{2\pi n}\; n^n e^{-n} \left(1 + \mathcal{O}\left(\frac{1}{n}\right)\right).$$
Inserting this into the expression for P(k,n), one obtains the Gaussian distribution; this is the content of the central limit theorem, and this is the simplest example thereof. The combination of the law of large numbers, together with the central limit theorem, leads to an interesting and perhaps surprising result: the asymptotic equipartition property. Put informally, one notes that, yes, over many coin flips, one will observe H exactly p fraction of the time, and that this corresponds exactly with the peak of the Gaussian. The asymptotic equipartition property essentially states that this peak is infinitely sharp, with infinite fall-off on either side. That is, given the set of all possible infinitely long strings of H and T occurring in the Bernoulli process, this set is partitioned into two: those strings that occur with probability 1, and those that occur with probability 0. This partitioning is known as the Kolmogorov 0-1 law.
The size of this set is interesting, also, and can be explicitly determined: the logarithm of it is exactly the entropy of the Bernoulli process. Once again, consider the set of all strings of length n. The size of this set is $2^n$. Of these, only a certain subset are likely; the size of this set is $2^{nH}$ for $H \le 1$. By using Stirling's approximation, putting it into the expression for P(k,n), solving for the location and width of the peak, and finally taking $n \to \infty$, one finds that
$$H = -p \log_2 p - (1-p) \log_2 (1-p).$$
This value is the Bernoulli entropy of a Bernoulli process. Here, H stands for entropy; do not confuse it with the same symbol H standing for heads.
von Neumann posed a curious question about the Bernoulli process: is it ever possible that a given process is isomorphic to another, in the sense of the isomorphism of dynamical systems? The question long defied analysis, but was finally and completely answered with the Ornstein isomorphism theorem. This breakthrough resulted in the understanding that the Bernoulli process is unique and universal; in a certain sense, it is the single most random process possible; nothing is 'more' random than the Bernoulli process (although one must be careful with this informal statement: mixing, a property that the Bernoulli process shares with many other systems, does not by itself imply randomness, since many purely deterministic systems are also mixing).
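The entropy formula is easy to evaluate numerically. The following sketch assumes the standard binary entropy in bits, $H = -p \log_2 p - (1-p) \log_2 (1-p)$; the function name `bernoulli_entropy` is hypothetical.

```python
from math import log2

def bernoulli_entropy(p):
    """Entropy in bits per trial of a Bernoulli process with P(heads) = p."""
    if p in (0.0, 1.0):
        return 0.0  # a certain outcome carries no information
    return -p * log2(p) - (1 - p) * log2(1 - p)

print(bernoulli_entropy(0.5))  # 1.0

# For a biased coin, the likely strings of length n number about 2**(n*H),
# a small fraction of all 2**n strings.
n = 100
typical_exponent = n * bernoulli_entropy(0.1)  # roughly 47, versus 100
```

A fair coin attains the maximum of one bit per flip, so only in that case are essentially all $2^n$ strings typical.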
POISSON PROCESS : In probability theory, a Poisson process is a stochastic process that counts the number of events and the times at which these events occur in a given time interval. The time between each pair of consecutive events has an exponential distribution with parameter $\lambda$, and each of these inter-arrival times is assumed to be independent of the other inter-arrival times. The process is named after the French mathematician Siméon Denis Poisson and is a good model of radioactive decay, [1] telephone calls [2] and requests for a particular document on a web server, [3] among many other phenomena. The Poisson process is a continuous-time process; the sum of a Bernoulli process can be thought of as its discrete-time counterpart. A Poisson process is a pure-birth process, the simplest example of a birth-death process. It is also a point process on the real half-line.
The basic form of Poisson process, often referred to simply as "the Poisson process", is a continuous-time counting process $\{N(t), t \ge 0\}$ that possesses the following properties:
(1) N(0) = 0;
(2) Independent increments (the numbers of occurrences counted in disjoint intervals are independent of each other);
(3) Stationary increments (the probability distribution of the number of occurrences counted in any time interval depends only on the length of the interval);
(4) The probability distribution of N(t) is a Poisson distribution;
(5) No counted occurrences are simultaneous.
Consequences of this definition include:
(1) The probability distribution of the waiting time until the next occurrence is an exponential distribution;
(2) The occurrences are distributed uniformly on any interval of time. (Note that N(t), the total number of occurrences, has a Poisson distribution over (0, t], whereas the location of an individual occurrence on $t \in (a, b]$ is uniform.)
Other types of Poisson process are described below.
1. Homogeneous
2. Non-homogeneous
Sample path of a counting Poisson process.
Homogeneous : The homogeneous Poisson process counts events that occur at a constant rate; it is one of the most well-known Lévy processes. This process is characterized by a rate parameter $\lambda$, also known as intensity, such that the number of events in the time interval $(t, t + \tau]$ follows a Poisson distribution with associated parameter $\lambda\tau$. This relation is given as
$$P[N(t + \tau) - N(t) = k] = \frac{e^{-\lambda\tau} (\lambda\tau)^k}{k!}, \qquad k = 0, 1, \ldots,$$
where $N(t + \tau) - N(t) = k$ is the number of events in the time interval $(t, t + \tau]$. Just as a Poisson random variable is characterized by its scalar parameter $\lambda$, a homogeneous Poisson process is characterized by its rate parameter $\lambda$, which is the expected number of "events" or "arrivals" that occur per unit time. N(t) is a sample homogeneous Poisson process, not to be confused with a density or distribution function.
Non-homogeneous : A non-homogeneous Poisson process counts events that occur at a variable rate. In general, the rate parameter may change over time; such a process is called a non-homogeneous Poisson process or inhomogeneous Poisson process. In this case, the generalized rate function is given as $\lambda(t)$. Now the expected number of events between time a and time b is
$$\lambda_{a,b} = \int_a^b \lambda(t)\, dt.$$
Thus, the number of arrivals in the time interval (a, b], given as $N(b) - N(a)$, follows a Poisson distribution with associated parameter $\lambda_{a,b}$:
$$P[N(b) - N(a) = k] = \frac{e^{-\lambda_{a,b}} (\lambda_{a,b})^k}{k!}, \qquad k = 0, 1, \ldots$$
A rate function $\lambda(t)$ in a non-homogeneous Poisson process can be either a deterministic function of time or an independent stochastic process, giving rise to a Cox process. A homogeneous Poisson process may be viewed as a special case when $\lambda(t) = \lambda$, a constant rate.
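One way to see the link between exponential inter-arrival times and Poisson counts is to simulate it. The sketch below covers the homogeneous case only, under the assumption of a constant rate; the function names `poisson_arrival_times` and `poisson_pmf` are illustrative.

```python
import random
from math import exp, factorial

def poisson_arrival_times(rate, horizon, seed=None):
    """Arrival times in (0, horizon] built from i.i.d. exponential gaps."""
    rng = random.Random(seed)
    times, t = [], 0.0
    while True:
        t += rng.expovariate(rate)  # inter-arrival time with mean 1/rate
        if t > horizon:
            return times
        times.append(t)

def poisson_pmf(k, mean):
    """P[N = k] for a Poisson distribution with the given mean."""
    return exp(-mean) * mean**k / factorial(k)

rate, horizon = 2.0, 5.0
counts = [len(poisson_arrival_times(rate, horizon, seed=s)) for s in range(500)]
average_count = sum(counts) / len(counts)  # should be near rate * horizon = 10
```

Comparing a histogram of `counts` with `poisson_pmf(k, rate * horizon)` is a natural next step; the non-homogeneous case would need a different sampling scheme (for example thinning), which is not shown here.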
RENEWAL PROCESS : Renewal theory is the branch of probability theory that generalizes Poisson processes for arbitrary holding times. Applications include calculating the expected time for a monkey who is randomly tapping at a keyboard to type the word Macbeth and comparing the long-term benefits of different insurance policies. A renewal process is a generalization of the Poisson process. In essence, the Poisson process is a continuous-time Markov process on the positive integers (usually starting at zero) which has independent identically distributed holding times at each integer i (exponentially distributed) before advancing (with probability 1) to the next integer: $i \to i + 1$. In the same informal spirit, we may define a renewal process to be the same thing, except that the holding times take on a more general distribution. (Note however that the independence and identical distribution (IID) property of the holding times is retained.)
Let $S_1, S_2, S_3, \ldots$ be a sequence of positive independent identically distributed random variables such that
$$0 < E[S_i] < \infty.$$
We refer to the random variable $S_i$ as the "i-th" holding time. Define for each n > 0:
$$J_n = \sum_{i=1}^{n} S_i,$$
each $J_n$ referred to as the "n-th" jump time and the intervals
$$[J_n, J_{n+1}]$$
being called renewal intervals. Then the random variable $(X_t)_{t \ge 0}$ given by
$$X_t = \sum_{n=1}^{\infty} \mathbb{I}_{\{J_n \le t\}} = \sup\{n : J_n \le t\}$$
(where $\mathbb{I}$ is the indicator function) represents the number of jumps that have occurred by time t, and is called a renewal process.
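The counting definition can be sketched directly: accumulate holding times into jump times and count how many fall at or before t. The function name `renewal_count` and the uniform holding-time distribution in the usage lines are assumptions chosen for illustration.

```python
import random
from itertools import accumulate

def renewal_count(holding_times, t):
    """Number of jump times J_n = S_1 + ... + S_n with J_n <= t."""
    return sum(1 for jump in accumulate(holding_times) if jump <= t)

# Jump times here are 1.0, 3.0 and 3.5.
print(renewal_count([1.0, 2.0, 0.5], 3.2))  # 2

# Assumed example: i.i.d. holding times uniform on [0.5, 1.5] (mean 1).
rng = random.Random(0)
holding = [rng.uniform(0.5, 1.5) for _ in range(1000)]
count_at_100 = renewal_count(holding, 100.0)  # typically close to 100
```

Replacing the uniform draw with `rng.expovariate(rate)` would recover a Poisson process, since only the holding-time distribution distinguishes the two.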
Sample evolution of a renewal process with holding times $S_i$ and jump times $J_n$.
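The renewal equation :
The renewal function $m(t) = E[X_t]$ satisfies
$$m(t) = F_S(t) + \int_0^t m(t - s)\, f_S(s)\, ds,$$
where $F_S$ is the cumulative distribution function of $S_1$ and $f_S$ is the corresponding probability density function.
Proof of the renewal equation : We may iterate the expectation about the first holding time:
$$m(t) = E[X_t] = E[E(X_t \mid S_1)].$$
But by the Markov property
$$E(X_t \mid S_1 = s) = \mathbb{I}_{\{t \ge s\}} \left(1 + E[X_{t-s}]\right).$$
So
$$m(t) = \int_0^\infty E(X_t \mid S_1 = s)\, f_S(s)\, ds = \int_0^\infty \mathbb{I}_{\{t \ge s\}} \left(1 + E[X_{t-s}]\right) f_S(s)\, ds$$
$$= \int_0^t \left(1 + m(t-s)\right) f_S(s)\, ds = F_S(t) + \int_0^t m(t-s)\, f_S(s)\, ds,$$
as required.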
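As a sanity check, and assuming the renewal equation in its standard form $m(t) = F_S(t) + \int_0^t m(t-s) f_S(s)\,ds$, one can verify it for the Poisson case. Take exponential holding times with an assumed rate $\lambda$, so that $F_S(t) = 1 - e^{-\lambda t}$ and $f_S(s) = \lambda e^{-\lambda s}$, and try the candidate $m(t) = \lambda t$:

```latex
\begin{align*}
\int_0^t m(t-s)\, f_S(s)\, ds
  &= \lambda^2 \int_0^t (t-s)\, e^{-\lambda s}\, ds \\
  &= \lambda^2 \left[ \frac{t}{\lambda} - \frac{1 - e^{-\lambda t}}{\lambda^2} \right]
     && \text{(integration by parts)} \\
  &= \lambda t - \left(1 - e^{-\lambda t}\right), \\
F_S(t) + \int_0^t m(t-s)\, f_S(s)\, ds
  &= \left(1 - e^{-\lambda t}\right) + \lambda t - \left(1 - e^{-\lambda t}\right)
   = \lambda t = m(t).
\end{align*}
```

So the expected number of renewals by time t grows linearly at rate $\lambda$, in agreement with the mean of a Poisson distribution with parameter $\lambda t$.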
RANDOM INCIDENCE : The Poisson process is one of many stochastic processes that one encounters in urban service systems. It is one example of a "point process", in which discrete events (arrivals) occur at particular points in time. For a general point process having its zeroth arrival at time $T_0$ and the remaining arrivals at times $T_1, T_2, T_3, \ldots$, the interarrival times are
$$Y_k = T_k - T_{k-1}, \qquad k = 1, 2, 3, \ldots$$
Such a stochastic process is fully characterized by the family of joint pdf's
$$f_{Y_{n_1}, Y_{n_2}, \ldots, Y_{n_p}}(y_{n_1}, y_{n_2}, \ldots, y_{n_p})$$
for all integer values of p and all possible combinations of different $n_1, n_2, \ldots, n_p$, where each $n_i$ is a positive integer denoting a particular interarrival time. Maintaining the depiction of a stochastic process at such a general level, although fine in theory, yields an intractable model and one for which the data (to estimate all the joint pdf's) are virtually impossible to obtain. So, in the study of stochastic processes, one is motivated to make assumptions about this family of pdf's that (1) are realistic for an important class of problems and (2) yield a tractable model.
We wish to consider here the class of point stochastic processes for which the marginal pdf's for all of the interarrival times ($Y_k$) are identical. That is, we assume that
$$f_{Y_k}(y) = f_Y(y) \qquad \text{for all } k = 1, 2, 3, \ldots$$
Thus, for $Y_k$, if we selected any one of the family of joint pdf's $f_{Y_{n_1}, Y_{n_2}, \ldots, Y_{n_p}}(y_{n_1}, y_{n_2}, \ldots, y_k, \ldots, y_{n_p})$ and "integrated out" all variables except $y_k$, we would obtain $f_Y(\cdot)$. Note that we have said nothing about independence of the $Y_k$'s. They need not be mutually independent, pairwise independent, or conditionally independent in any way. For the special case in which the $Y_k$'s are mutually independent, the point process is called a renewal process. The Poisson process is a special case of a renewal process, being the only continuous-time renewal process having "no memory." However, the kind of process we are considering can exhibit both memory and dependence among the inter-event times. In fact, the dependence could be so strong that once we know the value of one of the $Y_k$'s we might know a great deal about (perhaps even the exact values of) any number of the remaining $Y_k$'s.
Example : Consider a potential bus passenger arriving at a bus stop. The kth bus arrives $Y_k$ time units after the (k - 1)st bus. Here the $Y_k$'s are called bus headways. The probabilistic behavior of the $Y_k$'s will determine the probability law for the waiting time of the potential passenger (until the next bus arrives). Here it is reasonable to assume that the $Y_k$'s are identically distributed but not independent (due to interactions between successive buses). One could estimate the pdf $f_Y(\cdot)$ simply by gathering data describing bus interarrival times and displaying the data in the form of a histogram. (This same model applies to subways and even elevators in a multielevator building.) Suppose that buses maintain perfect headway; that is, they are always $T_0$ minutes apart. Then every headway equals $T_0$, and the time V until the next bus arrives has the pdf
$$f_V(v) = \frac{1}{T_0}, \qquad 0 \le v \le T_0$$
(and zero otherwise). That is, the time until the next bus arrives, given random incidence, is uniformly distributed between 0 and $T_0$, with a mean $E[V] = T_0/2$, as we might expect intuitively.
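The perfect-headway case can be checked by simulation. In this sketch the passenger's arrival instant is assumed to be uniform over a long window, which is one way to model random incidence; the function name and the 10-minute headway are illustrative choices.

```python
import random

def waiting_time_fixed_headway(headway, arrival_time):
    """Time until the next bus when buses depart at 0, headway, 2*headway, ..."""
    return (-arrival_time) % headway

print(waiting_time_fixed_headway(10, 7))  # 3

# Passengers arrive at instants unrelated to the schedule (random incidence).
rng = random.Random(0)
headway = 10.0
waits = [waiting_time_fixed_headway(headway, rng.uniform(0.0, 1000.0))
         for _ in range(20000)]
mean_wait = sum(waits) / len(waits)  # should be close to headway / 2
```

With headways that vary, the same experiment would generally give a mean wait larger than half the mean headway, because a random arrival is more likely to land in a long gap.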
MARKOV MODULATED BERNOULLI PROCESS : The Markov-Modulated Bernoulli Process (MMBP) model is used to analyze the delay experienced by messages in clocked, packet-switched Banyan networks with k × k output-buffered switches. This approach allows us to analyze both single-packet messages and multipacket messages with general traffic patterns, including uniform traffic, hot-spot traffic, locality of reference, etc. The ability to analyze multipacket messages is very important for multimedia applications. Previous work, which is only applicable to restricted message and traffic patterns, resorts to either heuristic correction factors to artificially tune the model or tedious computational efforts. In contrast, the proposed model, which is applicable to much more general message and traffic patterns, not only is an application of a theoretically complete model but also requires a minimal amount of computational effort. In all cases, the analytical results are compared with results obtained by simulation and are shown to be very accurate.
DTMC - Discrete Time Markov Chains
IRREDUCIBLE FINITE CHAINS WITH APERIODIC STATES : A Markov chain (discrete-time Markov chain or DTMC), named after Andrey Markov, is a mathematical system that undergoes transitions from one state to another on a state space. It is a random process usually characterized as memoryless: the next state depends only on the current state and not on the sequence of events that preceded it. This specific kind of "memorylessness" is called the Markov property. Markov chains have many applications as statistical models of real-world processes.
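For a finite chain that is irreducible with aperiodic states, repeatedly applying the transition matrix to any starting distribution is known to converge to a unique stationary distribution. The sketch below illustrates this by power iteration; the two-state matrix `P` and the function names are assumptions made for the example, not taken from the notes.

```python
def step_distribution(dist, P):
    """One step of the chain: new_dist[j] = sum_i dist[i] * P[i][j]."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]

def stationary_distribution(P, iterations=1000):
    """Approximate the stationary distribution by power iteration."""
    n = len(P)
    dist = [1.0 / n] * n  # any starting distribution works for such chains
    for _ in range(iterations):
        dist = step_distribution(dist, P)
    return dist

# Assumed two-state example; each row sums to 1.
P = [[0.9, 0.1],
     [0.5, 0.5]]
pi = stationary_distribution(P)  # close to [5/6, 1/6]
```

The limit can be confirmed by hand from the balance condition 0.1 · π₀ = 0.5 · π₁ together with π₀ + π₁ = 1.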
DISCRETE TIME BIRTH DEATH PROCESS : The birth-death process is a special case of a continuous-time Markov process in which the state transitions are of only two types: "births", which increase the state variable by one, and "deaths", which decrease the state by one. The model's name comes from a common application, the use of such models to represent the current size of a population where the transitions are literal births and deaths. Birth-death processes have many applications in demography, queueing theory, performance engineering, epidemiology and biology. They may be used, for example, to study the evolution of bacteria, the number of people with a disease within a population, or the number of customers in line at the supermarket. When a birth occurs, the process goes from state n to n + 1. When a death occurs, the process goes from state n to state n − 1. The process is specified by birth rates $\{\lambda_i\}_{i = 0, 1, 2, \ldots}$ and death rates $\{\mu_i\}_{i = 1, 2, 3, \ldots}$.
Example :
(1) A pure birth process is a birth-death process where $\mu_i = 0$ for all $i \ge 0$.
(2) A pure death process is a birth-death process where $\lambda_i = 0$ for all $i \ge 0$.
(3) A (homogeneous) Poisson process is a pure birth process where $\lambda_i = \lambda$ for all $i \ge 0$.
(4) The M/M/1 model and the M/M/c model, both used in queueing theory, are birth-death processes used to describe customers in an infinite queue.
Use in queueing theory : In queueing theory the birth-death process is the most fundamental example of a queueing model, the M/M/C/K/∞/FIFO (in complete Kendall's notation) queue. This is a queue with Poisson arrivals, drawn from an infinite population, and C servers with exponentially distributed service times and K places in the queue. Despite the assumption of an infinite population, this model is a good model for various telecommunication systems.
M/M/1 queue : The M/M/1 is a single-server queue with an infinite buffer size. In a non-random environment the birth and death rates in queueing models tend to be long-term averages, so the average rate of arrival is given as $\lambda$ and the average service time as $1/\mu$. The birth and death process is an M/M/1 queue when
$$\lambda_i = \lambda \quad \text{and} \quad \mu_i = \mu \quad \text{for all } i.$$
The differential equations for the probability $p_k(t)$ that the system is in state k at time t are
$$p_0'(t) = \mu\, p_1(t) - \lambda\, p_0(t),$$
$$p_k'(t) = \lambda\, p_{k-1}(t) + \mu\, p_{k+1}(t) - (\lambda + \mu)\, p_k(t), \qquad k \ge 1.$$
M/M/c queue : The M/M/c is a multi-server queue with C servers and an infinite buffer. This differs from the M/M/1 queue only in the service rate, which now becomes $\mu_i = i\mu$ for $i \le C - 1$ and $\mu_i = C\mu$ for $i \ge C$, with $\lambda_i = \lambda$ for all $i$.
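The state-dependent service rate of the M/M/c queue can be sketched in one line, assuming that at most min(i, C) servers are busy when i customers are present and that each busy server works at rate μ. The function name `mmc_death_rate` and the numeric values are hypothetical.

```python
def mmc_death_rate(i, mu, c):
    """Total service (death) rate with i customers and c servers.

    Only min(i, c) servers can be busy, each working at rate mu.
    """
    return min(i, c) * mu

# Illustrative values: 3 servers, each with service rate 2.0.
rates = [mmc_death_rate(i, 2.0, 3) for i in range(6)]
```

The rate grows linearly with the number of customers until all servers are busy and then stays constant.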
M/M/1/K queue : The M/M/1/K queue is a single-server queue with a buffer of size K. This queue has applications in telecommunications, as well as in biology when a population has a capacity limit. In telecommunication we again use the parameters from the M/M/1 queue with $\lambda_i = 0$ for $i \ge K$.
In biology, particularly the growth of bacteria, when the population is zero there is no ability to grow, so $\lambda_0 = 0$.
Additionally if the capacity represents a limit where the population dies from over population,
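The differential equations for the probability that the system is in state k at time t are $p_0'(t) = \mu_1 p_1(t) - \lambda_0 p_0(t)$ and $p_k'(t) = \lambda_{k-1} p_{k-1}(t) + \mu_{k+1} p_{k+1}(t) - (\lambda_k + \mu_k) p_k(t)$ for $1 \le k \le K$, where $p_{K+1}(t) = 0$ because states above K are never reached.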
MARKOV PROPERTY : In probability theory and statistics, the term Markov property refers to the memoryless property of a stochastic process. It is named after the Russian mathematician Andrey Markov. [1]
A stochastic process has the Markov property if the conditional probability distribution of future states of the process (conditional on both past and present values) depends only upon the present state, not on the sequence of events that preceded it. A process with this property is called a Markov process. The term strong Markov property is similar to the Markov property, except that the meaning of "present" is defined in terms of a random variable known as a stopping time. Both the terms "Markov property" and "strong Markov property" have been used in connection with a particular "memoryless" property of the exponential distribution. [2]
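For reference, the memoryless property of the exponential distribution can be written out as a short derivation. The symbols are introduced here for illustration and are not in the source: T is assumed to be exponentially distributed with rate λ, and s and t are non-negative times.

```latex
\Pr(T > s + t \mid T > s)
  = \frac{\Pr(T > s + t)}{\Pr(T > s)}
  = \frac{e^{-\lambda (s + t)}}{e^{-\lambda s}}
  = e^{-\lambda t}
  = \Pr(T > t), \qquad s, t \ge 0.
```

In words, having already waited s units of time does not change the distribution of the remaining wait.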
The term Markov assumption is used to describe a model where the Markov property is assumed to hold, such as a hidden Markov model. A Markov random field [3] extends this property to two or more dimensions or to random variables defined for an interconnected network of items. An example of a model for such a field is the Ising model. A discrete-time stochastic process satisfying the Markov property is known as a Markov chain.
FINITE MARKOV CHAIN :
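A minimal sketch of a finite Markov chain, assuming a hypothetical two-state chain: the transition matrix `P` below is illustrative and is not taken from the source. Each row of `P` holds the probabilities of moving from one state to every state, so each row sums to 1, and a distribution over states is advanced one step by multiplying it (as a row vector) by `P`.

```python
def step_distribution(dist, P):
    """Advance a row-vector distribution one step: dist' = dist * P."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]

# Hypothetical two-state chain; each row sums to 1.
P = [[0.9, 0.1],
     [0.5, 0.5]]

dist = [1.0, 0.0]           # start in state 0 with certainty
for _ in range(50):
    dist = step_distribution(dist, P)
```

After many steps `dist` approaches the stationary distribution of this chain, which for these illustrative numbers is (5/6, 1/6).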
CONTINUOUS-TIME MARKOV CHAIN : In probability theory, a continuous-time Markov chain (CTMC [1] or continuous-time Markov process [2]) is a mathematical model which takes values in some finite or countable set and for which the time spent in each state takes non-negative real values and has an exponential distribution. It is a continuous-time stochastic process with the Markov property, which means that the future behaviour of the model (both the remaining time in the current state and the next state) depends only on the current state of the model and not on its historical behaviour. The model is a continuous-time version of the Markov chain model, so named because the output from such a process is a sequence (or chain) of states.
A continuous-time Markov chain $(X_t)_{t \ge 0}$ is defined by a finite or countable state space S, a transition rate matrix Q with dimensions equal to that of the state space, and an initial probability distribution defined on the state space. For $i \ne j$, the elements $q_{ij}$ are non-negative and describe the rate at which the process transitions from state i to state j. The elements $q_{ii}$ are chosen such that each row of the transition rate matrix sums to zero. There are three equivalent definitions of the process. [3]
Infinitesimal definition : Let $X_t$ be the random variable describing the state of the process at time t, and assume that the process is in a state i at time t. Then $X_{t+h}$ is independent of previous values $(X_s : s \le t)$ and, as $h \to 0$, uniformly in t for all j,
$\Pr(X_{t+h} = j \mid X_t = i) = \delta_{ij} + q_{ij} h + o(h)$
using little-o notation, where $\delta_{ij}$ is the Kronecker delta. The $q_{ij}$ can be seen as measuring how quickly the transition from i to j happens.
Jump chain/holding time definition : Define a discrete-time Markov chain $Y_n$ to describe the nth jump of the process and variables $S_1, S_2, S_3, \dots$ to describe holding times in each of the states, where $S_i$ is exponentially distributed with rate $-q_{Y_i Y_i}$.
Transition probability definition : For any value n = 0, 1, 2, 3, ... and times indexed up to this value of n, $t_0, t_1, t_2, \dots$, and all states recorded at these times, $i_0, i_1, i_2, i_3, \dots$, it holds that
$\Pr(X_{t_{n+1}} = i_{n+1} \mid X_{t_0} = i_0, X_{t_1} = i_1, \dots, X_{t_n} = i_n) = p_{i_n i_{n+1}}(t_{n+1} - t_n)$
where $p_{ij}$ is the solution of the forward equation (a first-order differential equation)
$P'(t) = P(t) Q$
with the initial condition that P(0) is the identity matrix.
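The solution of the forward equation with P(0) equal to the identity is the matrix exponential P(t) = exp(Qt). The sketch below approximates it with a truncated Taylor series in plain Python. The helper names `mat_mul` and `transition_matrix`, the number of series terms, and the two-state rate matrix `Q` are assumptions made for illustration only.

```python
import math

def mat_mul(A, B):
    """Multiply two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def transition_matrix(Q, t, terms=60):
    """Approximate P(t) = exp(Q t) by a truncated Taylor series."""
    n = len(Q)
    Qt = [[q * t for q in row] for row in Q]
    # Start from the identity matrix, which is P(0).
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for m in range(1, terms):
        term = mat_mul(term, Qt)
        term = [[x / m for x in row] for row in term]   # term = (Qt)^m / m!
        result = [[r + x for r, x in zip(rr, tr)]
                  for rr, tr in zip(result, term)]
    return result

# Hypothetical two-state rate matrix; each row sums to zero.
Q = [[-2.0, 2.0],
     [1.0, -1.0]]
P = transition_matrix(Q, 0.5)
```

Each row of `P` sums to 1, as a matrix of transition probabilities should. The plain Taylor series is adequate only when the entries of Qt are small, so a dedicated matrix-exponential routine would be a safer choice for larger rates or longer times.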
HIDDEN MARKOV MODEL : A hidden Markov model (HMM) is a statistical Markov model in which the system being modeled is assumed to be a Markov process with unobserved (hidden) states. An HMM can be considered the simplest dynamic Bayesian network. The mathematics behind the HMM was developed by L. E. Baum and coworkers. [1][2][3][4][5] It is closely related to earlier work on the optimal nonlinear filtering problem (stochastic processes) by Ruslan L. Stratonovich, [6] who was the first to describe the forward-backward procedure. In simpler Markov models (like a Markov chain), the state is directly visible to the observer, and therefore the state transition probabilities are the only parameters. In a hidden Markov model, the state is not directly visible, but the output, which depends on the state, is visible. Each state has a probability distribution over the possible output tokens. Therefore the sequence of tokens generated by an HMM gives some information about the sequence of states. Note that the adjective 'hidden' refers to the state sequence through which the model passes, not to the parameters of the model; the model is still referred to as a 'hidden' Markov model even if these parameters are known exactly.
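To make the relation between hidden states and visible tokens concrete, the sketch below computes the probability of an observed token sequence by summing over all hidden state paths, using the forward half of the forward-backward procedure. The two-state model, its probabilities and the function name `forward_likelihood` are hypothetical and chosen only for illustration.

```python
def forward_likelihood(init, trans, emit, observations):
    """Probability of an observation sequence under an HMM (forward pass).

    init[i]     : probability of starting in hidden state i
    trans[i][j] : probability of moving from hidden state i to j
    emit[i][o]  : probability that hidden state i emits token o
    """
    n = len(init)
    # alpha[i] = P(tokens so far, current hidden state = i)
    alpha = [init[i] * emit[i][observations[0]] for i in range(n)]
    for o in observations[1:]:
        alpha = [sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][o]
                 for j in range(n)]
    return sum(alpha)

# Hypothetical two-state model with two output tokens (0 and 1).
init = [0.6, 0.4]
trans = [[0.7, 0.3],
         [0.4, 0.6]]
emit = [[0.9, 0.1],
        [0.2, 0.8]]

likelihood = forward_likelihood(init, trans, emit, [0, 1, 0])
```

The forward pass costs time proportional to the sequence length times the square of the number of states, instead of enumerating every possible hidden path.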
Hidden Markov models are especially known for their application in temporal pattern recognition such as speech, handwriting, gesture recognition, [7] part-of-speech tagging, musical score following, [8] partial discharges [9] and bioinformatics. A hidden Markov model can be considered a generalization of a mixture model where the hidden variables (or latent variables), which control the mixture component to be selected for each observation, are related through a Markov process rather than being independent of each other. Recently, hidden Markov models have been generalized to pairwise Markov models and triplet Markov models, which allow more complex data structures to be considered [10][11] and nonstationary data to be modeled.