
Statistics and Probability

Statistics and probability are branches of mathematics that deal with data collection and analysis. Probability is the study of chance and is a fundamental subject
that we apply in everyday living, while statistics is more concerned with how we handle data using different collection methods and analysis techniques. The two
subjects go hand in hand, and it is difficult to study one without the other.


Introduction to Statistics
This section introduces the concept of statistics and its relevance to everyday life. Data is defined, and the various methods of data collection,
such as sampling, are also introduced.

Averages
We have all used the term 'average' in some form or another at some point in our lives. Statistical averages are introduced and defined in this section. Mean, median,
mode and range are discussed both at an introductory level and at a more advanced level, including the concept of the assumed mean.

Frequency and its aspects like Cumulative Frequency are also discussed.

Probability
This section serves as an introduction to the concept of Probability, including definitions of the different terminology and the fundamental method of calculating
Probability.

Different concepts like Dependence and Independence of Events are discussed including the methods of dealing with such concepts.

Probability Distributions
This section sets the stage for a more advanced view of probability by introducing the idea of a random variable and the meaning and types of probability distributions,
including discrete and continuous probability distributions.

Joint Probability Distributions are also discussed. This entire section is fundamental in understanding the way Probability and Statistics interact.

Introduction to Statistics
Statistics is a branch of mathematics that deals with the collection, analysis and interpretation of data.

Data can be defined as groups of information that represent the qualitative or quantitative attributes of a variable or set of variables. In layman's terms, data in
statistics can be any set of information that describes a given entity. An example of data can be the ages of the students in a given class. When you collect those ages,
that becomes your data.

A set in statistics is referred to as a population. Though this term is commonly used to refer to the number of people in a given place, in statistics, a population refers to
any entire set from which you collect data.

Data Collection Methods


As we have seen in the definition of statistics, data collection is a fundamental aspect, and as a consequence there are different methods of collecting data which, when
used on the same population, will result in different kinds of data. Let's look at these individual methods of collection in order to better understand the types of
data that result.

Census Data Collection

Census data collection is a method of collecting data whereby all the data from each and every member of the population is collected.

For example, when you collect the ages of all the students in a given class, you are using the census data collection method since you are including all the members of
the population (which is the class in this case).

This method of data collection is very expensive (tedious, time consuming and costly) if the number of elements (population size) is very large. To understand the scope
of how expensive it is, think of trying to count all the ten year old boys in the country. That would take a lot of time and resources, which you may not have.

Sample Data Collection

Sample data collection, which is commonly just referred to as sampling, is a method which collects data from only a chosen portion of the population.

Sampling assumes that the portion that is chosen to be sampled is a good estimate of the entire population. Thus one can save resources and time by only collecting
data from a small part of the population. But this raises the question of whether sampling is accurate or not. The answer is that for the most part, sampling is
approximately accurate. This is only true if you choose your sample carefully to be able to closely approximate what the true population consists of.

Sampling is used commonly in everyday life, for example in the research polls conducted before elections. Pollsters don't ask all the people in a given
state who they'll vote for; instead they choose a small sample and assume that these people represent how the entire population of the state is likely to vote. History has
shown that these polls are almost always close to accurate, and as such sampling is a very powerful tool in statistics.

Experimental Data Collection

Experimental data collection involves one performing an experiment and then collecting the data to be further analyzed. Experiments involve tests and the results of
these tests are your data.

An example of experimental data collection is rolling a die one hundred times while recording the outcomes. Your data would be the results you get in each roll. The
experiment could involve rolling the die in different ways and recording the results for each of those different ways.

Experimental data collection is useful in testing theories and different products and is a very fundamental aspect of mathematics and all science as a whole.

Observational Data Collection

Observational data collection involves not carrying out an experiment but instead observing the population without influencing it at all. Observational data collection is
popular in studying trends and behaviors of society where, for example, the lives of a group of people are observed and data is collected for the different aspects of
their lives.

Data
Data can be defined as groups of information that represent the qualitative or quantitative attributes of a variable or set of variables, which is the same as saying that
data can be any set of information that describes a given entity. Data in statistics can be classified into grouped data and ungrouped data.

Any data that you first gather is ungrouped data. Ungrouped data is raw data. An example of ungrouped data is any list of numbers that you can think of.

Grouped Data
Grouped data is data that has been organized into groups known as classes. Grouped data has been 'classified' and thus some level of data analysis has taken place,
which means that the data is no longer raw.

A data class is a group of data which is related by some user-defined property. For example, if you were collecting the ages of the people you met as you walked down
the street, you could group them into classes such as those in their teens, twenties, thirties, forties and so on. Each of those groups is called a class.

Each of those classes is of a certain width and this is referred to as the class interval or class size. The class interval is very important when it comes to drawing
histograms and frequency diagrams. All the classes may have the same class size or they may have different class sizes depending on how you group your data. The
class interval is always a whole number.

Below is an example of grouped data where the classes have the same class interval.

Age (years)    Frequency
0 - 9          12
10 - 19        30
20 - 29        18
30 - 39        12
40 - 49
50 - 59
60 - 69

Below is an example of grouped data where the classes have different class intervals.

Age (years)    Frequency    Class Interval
0 - 9          15           10
10 - 19        18           10
20 - 29        17           10
30 - 49        35           20
50 - 79        20           30

Calculating Class Interval


Given a set of raw or ungrouped data, how would you group that data into suitable classes that are easy to work with and at the same time meaningful?

The first step is to determine how many classes you want to have. Next, you subtract the lowest value in the data set from the highest value and divide the result by
the number of classes you want:

Class interval = (Highest value - Lowest value) / Number of classes

Example 1:

Group the following raw data into ten classes.

Solution:

The first step is to identify the highest and lowest numbers in the data set and find their difference. Dividing this range by the required ten classes gives a class
interval of 2.8.

The class interval should always be a whole number, yet in this case we have a decimal number. The solution to this problem is to round up to the nearest whole
number.

In this example, 2.8 gets rounded up to 3. So now our class width is 3, meaning that we group the data into groups of 3 as in the table below.

Number     Frequency
1 - 3
4 - 6
7 - 9
10 - 12
13 - 15
16 - 18
19 - 21
22 - 24
25 - 27
28 - 30
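As a quick illustration of the whole procedure, here is a short Python sketch that groups a hypothetical list of numbers (made up purely for illustration) into ten classes: it computes the range, divides by the number of classes, rounds up to a whole-number class width, and then tallies each value into its class.

    import math
    from collections import Counter

    # Hypothetical raw (ungrouped) data, made up for illustration.
    data = [3, 7, 12, 25, 1, 18, 29, 14, 9, 22, 27, 5, 16, 11, 30, 2]

    num_classes = 10
    low, high = min(data), max(data)

    # Class interval = (highest - lowest) / number of classes, rounded up to a whole number.
    width = math.ceil((high - low) / num_classes)

    # Tally each value into its class: low to low+width-1, low+width to low+2*width-1, ...
    freq = Counter((value - low) // width for value in data)
    for i in range(num_classes):
        lower = low + i * width
        upper = lower + width - 1
        print(f"{lower} - {upper}: {freq.get(i, 0)}")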

Class Limits and Class Boundaries


Class limits refer to the actual values that you see in the table. Taking an example of the table above, 1 and 3 would be the class limits of the first class. Class limits are
divided into two categories: lower class limit and upper class limit. In the table above, for the first class, 1 is the lower class limit while 3 is the upper class limit.

On the other hand, class boundaries are not always shown in the frequency table. Class boundaries give the true class interval and, similar to class limits, are also
divided into lower and upper class boundaries.

The relationship between the class boundaries and the class interval is given as follows:

Class interval = Upper class boundary - Lower class boundary

Class boundaries are related to class limits by the following relationships (for data recorded to the nearest whole number, the gap between consecutive classes is 1, so half of that gap is 0.5):

Lower class boundary = Lower class limit - 0.5
Upper class boundary = Upper class limit + 0.5

As a result of the above, the lower class boundary of one class is equal to the upper class boundary of the previous class.

Class limits and class boundaries play separate roles when it comes to representing statistical data diagrammatically as we shall see in a moment.

Sampling
Sampling is a fundamental aspect of statistics, but unlike the other methods of data collection, sampling involves choosing a method of sampling which further
influences the data that you will result with. There are two major categories in sampling: probability and non-probability sampling.

Probability Sampling
Under probability sampling, each element of a given population has a chance of being picked to be part of the sample. In other words, no single
element of the population has a zero chance of being picked.

The odds (chances, or probability) of picking any element are known or can be calculated. This is possible if we know the total number of elements in the entire population, so that we
are then able to determine the odds of picking any one element.

Probability sampling involves random picking of elements from a population, and that is the reason as to why no element has a zero chance of being picked to be part
of a sample.

Methods of Probability Sampling


There are a number of different methods of probability sampling including:

1. Random Sampling

Random sampling is the method that most closely defines probability sampling. Each element of the sample is picked at random from the given
population, such that the probability of picking that element can be calculated by dividing the frequency of the element by the total number of
elements in the population. In this method, all elements with the same frequency are equally likely to be picked.

2. Systematic Sampling

Systematic sampling involves arranging the population in a given order and then picking every nth element from the ordered list of all the
elements in the population. The probability of picking any given element can be calculated but is not necessarily the same for all elements in the
population, even for elements with the same frequency.

3. Stratified Sampling

Stratified sampling involves dividing the population into groups (strata) and then sampling from those different groups according to certain set criteria.

For example, you might divide the population of a certain class into boys and girls and then, from those two groups, pick those who fall into the specific
category that you intend to study with your sample.

4. Cluster Sampling

Cluster sampling involves dividing up the population into clusters and assigning each element to one and only one cluster, in other words, an element
can't appear in more than one cluster.

5. Multistage Sampling

Multistage sampling involves the use of more than one probability sampling method and more than one stage of sampling, for example using the stratified
sampling method in the first stage and then the random sampling method in the second stage, and so on until you achieve the sample that you want.

6. Probability Proportional to Size Sampling

Under probability proportional to size sampling, the sample is chosen in proportion to the total size of the population. It is a form of multistage sampling
where in stage one you cluster the entire population and then in stage two you randomly select elements from the different clusters, with the number of
elements that you select from each cluster proportional to the size of that cluster's population.

Non-Probability Sampling
Unlike probability sampling, under non-probability sampling certain elements of the population might have a zero chance of being picked. This is because we can't
accurately determine the probability of picking a given element, so we do not know whether the odds of picking that element are zero or greater than zero.
Non-probability sampling is not always a consequence of the sampler's ignorance of the total number of elements in the population; it may also result from the
sampler's bias in the way he or she chooses the sample by excluding some elements.

Methods of Non-Probability Sampling


There are a number of different methods of Non-probability sampling which include:

1. Quota Sampling

Quota sampling is similar to stratified sampling, except that in this case, after the population is divided into groups, the elements are sampled from each
group using the sampler's judgement. As a consequence the method loses any aspect of being random and can be extremely biased.

2. Accidental or Convenience Sampling

Accidental sampling is a method of sampling whereby the sampler picks the sample based on the fact that the elements picked are
conveniently close at the moment. For example, if you walked down the street and sampled the first ten people you met, the fact that they happened to
be there is convenient for you but accidental for them, which leads to the name of the method.

3. Purposive or Judgemental Sampling

Purposive or judgemental sampling is a method of sampling whereby the sampler picks the sample from the entire population based solely on his or her
judgement. The sampler controls, to a very large extent, which elements have a chance of being selected to be in the sample and which ones don't.

4. Voluntary Sampling

Voluntary sampling, as the name suggests, involves picking the sample based on which elements of the population volunteer to participate in the sample.
This is the most common method used in research polls.

5. Snowball Sampling

Snowball sampling is a method of sampling that relies on referrals of previously selected elements to pick other elements that will participate in the
sample.

Averages
In statistics, an average is defined as the number that measures the central tendency of a given set of numbers. There are a number of different averages including but
not limited to: mean, median, mode and range.

Mean
Mean is what most people commonly refer to as an average. The mean refers to the number you obtain when you sum up a given set of numbers and then divide this
sum by the total number in the set. Mean is also referred to more correctly as arithmetic mean.

Given a set of n elements, a1, a2, ..., an,

the mean is found by adding up all the a's and then dividing by the total number of elements, n.

This can be generalized by the formula below:

Mean = (a1 + a2 + ... + an) / n

Mean Example Problems


Example 1

Find the mean of the set of numbers below

Solution

The first step is to count how many numbers there are in the set, which we shall call n

The next step is to add up all the numbers in the set

The last step is to find the actual mean by dividing the sum by n

Mean can also be found for grouped data, but before we see an example on that, let us first define frequency.

Frequency in statistics means the same as in everyday use of the word. The frequency of an element in a set refers to how many of that element there are in the set. The
frequency can be from 0 to as many as possible. If you're told that the frequency of an element a is 3, that means that there are 3 a's in the set.

Example 2

Find the mean of the set of ages in the table below

Age (years)    Frequency
10
11
12
13
14

Solution

The first step is to find the total number of ages, which we shall call n. Since it would be tedious to count all the ages individually, we can find n by adding up the frequencies:

Next we need to find the sum of all the ages. We can do this in two ways: we can add up each individual age, which will be a long and tedious process; or we can use the
frequency to make things faster.

Since we know that the frequency represents how many of that particular age there are, we can just multiply each age by its frequency, and then add up all these
products.

The last step is to find the mean by dividing the sum by n
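Here is a minimal Python sketch of the same calculation; since the frequencies in the table above are not fixed, the numbers used here are hypothetical and serve only to show the steps (multiply each age by its frequency, add the products, then divide by the total frequency).

    # Hypothetical frequencies for ages 10 to 14, chosen only for illustration.
    ages_and_freqs = [(10, 4), (11, 8), (12, 3), (13, 6), (14, 9)]

    n = sum(f for _, f in ages_and_freqs)              # total number of ages
    total = sum(age * f for age, f in ages_and_freqs)  # sum of all the ages, via the frequencies

    mean = total / n
    print(n, total, mean)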

Population Mean vs Sample Mean


In the Introduction to Statistics section, we defined a population and a sample whereby a sample is a part of a population.

In statistics there are two kinds of means: the population mean and the sample mean. The population mean is the true mean of the entire population of the data set, while the
sample mean is the mean of a small sample of the population. These different means appear frequently in both statistics and probability and should not be confused
with each other.

The population mean is represented by the Greek letter μ (pronounced mu) while the sample mean is represented by x̄ (pronounced x bar). The total number of elements in a
population is represented by N while the number of elements in a sample is represented by n. This leads to an adjustment in the formula we gave above for calculating
the mean:

Population mean: μ = (sum of all elements) / N
Sample mean: x̄ = (sum of the sampled elements) / n

The sample mean is commonly used to estimate the population mean when the population mean is unknown. This is because they have the same expected value.

Median
The median is defined as the number in the middle of a given set of numbers arranged in order of increasing magnitude. When given a set of numbers, the median is
the number positioned in the exact middle of the list when you arrange the numbers from the lowest to the highest. The median is another measure of average, and in
higher-level statistics it also appears in measures of dispersion such as the median absolute deviation. The median is important because it describes the behavior of the entire set of numbers.

Example 3

Find the median in the set of numbers given below

Solution

From the definition of median, we should be able to tell that the first step is to rearrange the given set of numbers in order of increasing magnitude, i.e. from the lowest
to the highest

Then we inspect the set to find that number which lies in the exact middle.

Let's try another example to emphasize something interesting that often occurs when solving for the median.

Example 4

Find the median of the given data

Solution

As in the previous example, we start off by rearranging the data in order from the smallest to the largest.

Next we inspect the data to find the number that lies in the exact middle.

We can see from the above that we end up with two numbers (4 and 5) in the middle. We can solve for the median by finding the mean of these two numbers as follows:

Median = (4 + 5) / 2 = 4.5
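Python's statistics.median carries out exactly these two steps (sort, then take the middle value, or the mean of the two middle values). A small sketch on hypothetical data:

    import statistics

    # Hypothetical data, chosen only for illustration.
    odd_data = [9, 2, 7, 4, 5]           # sorted: 2 4 5 7 9   -> median 5
    even_data = [9, 2, 7, 4, 5, 3]       # sorted: 2 3 4 5 7 9 -> median (4 + 5) / 2 = 4.5

    print(statistics.median(odd_data))   # 5
    print(statistics.median(even_data))  # 4.5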

Mode
The mode is defined as the element that appears most frequently in a given set of elements. Using the definition of frequency given above, mode can also be defined
as the element with the largest frequency in a given data set.

For a given data set, there can be more than one mode. As long as those elements all have the same frequency and that frequency is the highest, they are all the modal
elements of the data set.

Example 5

Find the Mode of the following data set.

Solution

Mode = 3 and 15

Mode for Grouped Data

As we saw in the section on data, grouped data is divided into classes. We have defined mode as the element with the highest frequency in a given data set. In
grouped data, we can find two kinds of mode: the modal class, which is the class with the highest frequency, and the mode itself, which we calculate from the modal class using
the formula below:

Mode = L + ((f1 - f0) / (2f1 - f0 - f2)) x h

where

L is the lower class limit of the modal class

f1 is the frequency of the modal class

f0 is the frequency of the class before the modal class in the frequency table

f2 is the frequency of the class after the modal class in the frequency table

h is the class interval of the modal class

Example 6

Find the modal class and the actual mode of the data set below

Number     Frequency
1 - 3
4 - 6
7 - 9      4
10 - 12    9
13 - 15    2
16 - 18
19 - 21
22 - 24
25 - 27
28 - 30

Solution

Modal class = 10 - 12

where

L = 10

f1 = 9

f0 = 4

f2 = 2

h=3

therefore,

Mode = 10 + ((9 - 4) / (2(9) - 4 - 2)) x 3

Solving the above using the order of operations:

Mode = 10 + (5 / 12) x 3 = 10 + 1.25 = 11.25
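A minimal sketch of the same calculation, using the values identified above (L = 10, f1 = 9, f0 = 4, f2 = 2, h = 3):

    def grouped_mode(L, f1, f0, f2, h):
        """Mode of grouped data: L + ((f1 - f0) / (2*f1 - f0 - f2)) * h."""
        return L + (f1 - f0) / (2 * f1 - f0 - f2) * h

    # Modal class 10 - 12 with frequency 9; neighbouring frequencies 4 and 2; class interval 3.
    print(grouped_mode(L=10, f1=9, f0=4, f2=2, h=3))  # 11.25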

Range
The range is defined as the difference between the highest and lowest number in a given data set.

Example 7

Find the range of the data set below

Solution

Assumed Mean
In the section on averages, we learned how to calculate the mean for a given set of data. The data we looked at was ungrouped data and the total number of elements
in the data set was not that large. That method is not always a realistic approach especially if you're dealing with grouped data.

That's where the assumed mean comes into play.

The assumed mean, as the name suggests, is a guess or an assumption of the mean. The assumed mean is most commonly denoted by the letter a. It doesn't need to be
correct or even close to the actual mean, and the choice of the assumed mean is at your discretion, except where the question explicitly asks you to use a certain
assumed mean value.

Assumed mean is used to calculate the actual mean as well as the variance and standard deviation as we'll see later.

The actual mean can be calculated from the assumed mean using the following formula:

x̄ = a + h x (Σfi·ui / Σfi)

It's very important to remember that the above formula only applies to grouped data with equal class intervals.

Now let us define each term used in the formula:

x̄ is the mean which we're trying to find.

a is the assumed mean.

h is the class interval, which we looked at in the section on data.

fi is the frequency of each class; we find the total frequency of all the classes in the data set (Σfi) by adding up all the fi's.

Each ui is found from the following formula:

ui = di / h

where h is the class interval and each di is the difference between the midpoint of a class and the assumed mean.

di is calculated from the following formula:

di = xi - a

where xi is the midpoint of a given class:

xi = (lower class limit + upper class limit) / 2

i.e. xi is the number in the middle of a given class.

Therefore ui becomes:

ui = (xi - a) / h

Let's try an example to see how to apply the assumed mean method for finding mean.

Example 1

The student body of a certain school were polled to find out what their hobbies were. The number of hobbies each student had was then recorded and the data
obtained was grouped into classes shown in the table below. Using an assumed mean of 17, find the mean for the number of hobbies of the students in the school.

Number of hobbies    Frequency
0 - 4                45
5 - 9                58
10 - 14              27
15 - 19              30
20 - 24              19
25 - 29              11
30 - 34
35 - 40

Solution

We have been given the assumed mean a as 17, and we know the formula for finding the mean from the assumed mean:

x̄ = a + h x (Σfi·ui / Σfi)

We can find the class interval by using the class limits: each class (for example 0 - 4) covers 5 whole numbers, so h = 5.

We now have one component we need and we're one step closer to finding the mean.

So we can solve the rest of this problem using a table whereby we find each remaining component of the formula and then substitute at the end:
Hobbies     Frequency fi   xi    di = xi - a   ui = di / h   fi·ui
0 - 4       45             2     -15           -3            -135
5 - 9       58             7     -10           -2            -116
10 - 14     27             12    -5            -1            -27
15 - 19     30             17    0             0             0
20 - 24     19             22    5             1             19
25 - 29     11             27    10            2             22
30 - 34                    32    15            3             24
35 - 40                    37    20            4
Totals      Σfi = 200                                        Σfi·ui = -202

Substituting into the formula:

x̄ = 17 + 5 x (-202 / 200) = 17 - 5.05 = 11.95

The mean number of hobbies is 11.95.
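A small sketch of the final substitution, using the totals from the table above (a = 17, h = 5, Σfi = 200, Σfi·ui = -202); the helper simply implements x̄ = a + h(Σfi·ui / Σfi).

    def mean_from_assumed(a, h, sum_fu, sum_f):
        """Actual mean from an assumed mean a: a + h * (sum of f*u / sum of f)."""
        return a + h * (sum_fu / sum_f)

    # Totals taken from the worked example above.
    print(mean_from_assumed(a=17, h=5, sum_fu=-202, sum_f=200))  # 11.95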

Cumulative Frequency, Quartiles and Percentiles


Cumulative Frequency
Cumulative frequency is defined as a running total of frequencies. The frequency of an element in a set refers to how many of that element there are in the set.
Cumulative frequency can also be defined as the sum of all previous frequencies up to the current point.

The cumulative frequency is important when analyzing data, where the value of the cumulative frequency indicates the number of elements in the data set that lie
below the current value. The cumulative frequency is also useful when representing data using diagrams like histograms.

Cumulative Frequency Table

The cumulative frequency is usually observed by constructing a cumulative frequency table. The cumulative frequency table takes the form as in the example below.

Example 1

The set of data below shows the ages of participants in a certain summer camp. Draw a cumulative frequency table for the data.

Age (years)    Frequency
10             3
11             18
12             13
13             12
14             7
15             27

Solution:

The cumulative frequency at a certain point is found by adding the frequency at the present point to the cumulative frequency of the previous point.

The cumulative frequency for the first data point is the same as its frequency since there is no cumulative frequency before it.

Age (years)    Frequency    Cumulative Frequency
10             3            3
11             18           3 + 18 = 21
12             13           21 + 13 = 34
13             12           34 + 12 = 46
14             7            46 + 7 = 53
15             27           53 + 27 = 80
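The running total in the table above is exactly what Python's itertools.accumulate produces; a quick sketch using the frequencies from this example:

    from itertools import accumulate

    ages = [10, 11, 12, 13, 14, 15]
    freqs = [3, 18, 13, 12, 7, 27]

    cumulative = list(accumulate(freqs))   # [3, 21, 34, 46, 53, 80]
    for age, f, cf in zip(ages, freqs, cumulative):
        print(age, f, cf)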

Cumulative Frequency Graph (Ogive)


A cumulative frequency graph, also known as an ogive, is a curve showing the cumulative frequency for a given set of data. For ungrouped data, the cumulative frequency is plotted on the
y-axis against the data values on the x-axis. When dealing with grouped data, the ogive is formed by plotting the cumulative frequency against
the upper class boundary of each class. An ogive is used to study the growth rate of data, as it shows the accumulation of frequency and hence how quickly the data builds up.

Example 2

Plot the cumulative frequency curve for the data set below

Age (years)    Frequency
10             5
11             10
12             27
13             18
14             6
15             16
16             38
17             9

Solution:

Age (years)    Frequency    Cumulative Frequency
10             5            5
11             10           5 + 10 = 15
12             27           15 + 27 = 42
13             18           42 + 18 = 60
14             6            60 + 6 = 66
15             16           66 + 16 = 82
16             38           82 + 38 = 120
17             9            120 + 9 = 129

Percentiles
A percentile is a value below which a certain percentage of a set of data falls. Percentiles are used to observe how much of a given set of data lies within a certain
percentage range; for example, the thirtieth percentile is the value below which 30% of the entire data set lies.

Calculating Percentiles

Let us designate a percentile as Pm, where m represents the percentile we're finding; for example, for the tenth percentile, m would be 10. Given that the total number of
elements in the data set is N, the position of Pm in the ordered data set is given by:

Position of Pm = (m / 100) x N

Quartiles
The term quartile is derived from the word quarter, which means one fourth of something. Thus a quartile is a certain fourth of a data set. When you arrange a data set in
increasing order from the lowest to the highest and then divide this data into groups of four, you end up with quartiles. There are three quartiles that are studied in
statistics.

First Quartile (Q1)

When you arrange a data set in increasing order from the lowest to the highest and then proceed to divide this data into four groups,
the data value at the lower fourth (1/4) mark of the data is referred to as the First Quartile.

The First Quartile is equal to the data value at the 25th percentile. The first quartile can also be obtained from the ogive: section off
the curve into four parts, and the data value that lies at the first quarter mark is the first quartile.

Second Quartile (Q2)

When you arrange a given data set in increasing order from the lowest to the highest and then divide this data into four groups, the
data value at the second fourth (2/4) mark of the data is referred to as the Second Quartile.

This is equivalent to the data value at the halfway point of all the data and is also equal to the data value at the 50th percentile.

The Second Quartile can similarly be obtained from an ogive by sectioning off the curve into four parts; the data value that lies at the second
quarter mark is the second quartile. In other words, the data value at the halfway line on the cumulative frequency curve
is the second quartile. The second quartile is also equal to the median.

Third Quartile (Q3)

When you arrange a given data set in increasing order from the lowest to the highest and then divide this data into four groups, the data
value at the third fourth (3/4) mark of the data is referred to as the Third Quartile.

This is the equivalent of the data value at the 75th percentile. The third quartile can be obtained from an ogive by dividing the curve into
four parts and then reading off the data value that lies at the 3/4 mark.

Calculating the Different Quartiles

The different quartiles can be calculated using the same method as with the median.

First Quartile

The first quartile can be calculated by first arranging the data in an ordered list and then dividing the data into two groups. If the
total number of elements in the data set is odd, you exclude the median (the element in the middle).

After this you only look at the lower half of the data and then find the median for this new subset of data using the method for finding
median described in the section on averages.

This median will be your First Quartile.

Second Quartile

The second quartile is the same as the median and can thus be found using the same methods for finding median described in the
section on averages.

Third Quartile

The third quartile is found in a similar manner to the first quartile. The difference here is that after dividing the data into two groups,
instead of considering the data in the lower half, you consider the data in the upper half and then you proceed to find the Median of this
subset of data using the methods described in the section on Averages.

This median will be your Third Quartile.

Calculating Quartiles from Cumulative Frequency

As mentioned above, we can obtain the different quartiles from the ogive, which means that we use the cumulative frequency to calculate the quartiles.

Given that the cumulative frequency for the last element in the data set is fc, the positions of the quartiles are calculated as follows:

Position of Q1 = fc / 4
Position of Q2 = fc / 2
Position of Q3 = 3fc / 4

The quartile is then located by matching up which element has the cumulative frequency corresponding to the position obtained above.

Example 3

Find the First, Second and Third Quartiles of the data set below using the cumulative frequency curve.

Age (years)    Frequency
10             5
11             10
12             27
13             18
14             6
15             16
16             38
17             9

Solution:

Age (years)    Frequency    Cumulative Frequency
10             5            5
11             10           15
12             27           42
13             18           60
14             6            66
15             16           82
16             38           120
17             9            129

From the ogive (or directly from the cumulative frequency column), fc = 129, so the quartile positions are 129/4 = 32.25, 129/2 = 64.5 and 3(129)/4 = 96.75. Matching these positions against the cumulative frequencies, we can approximate the quartiles as follows:

Q1 ≈ 12 years, Q2 ≈ 14 years, Q3 ≈ 16 years

Interquartile Range

The interquartile range is the difference between the third quartile and the first quartile.
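A sketch of the position-matching approach described above, applied to the cumulative frequency table from Example 3: each quartile position (fc/4, fc/2, 3fc/4) is matched to the first age whose cumulative frequency reaches it, and the interquartile range is then Q3 - Q1.

    from itertools import accumulate

    ages = [10, 11, 12, 13, 14, 15, 16, 17]
    freqs = [5, 10, 27, 18, 6, 16, 38, 9]
    cum = list(accumulate(freqs))          # [5, 15, 42, 60, 66, 82, 120, 129]
    fc = cum[-1]                           # 129

    def quartile_from_cum(position):
        """Return the first age whose cumulative frequency reaches the given position."""
        for age, cf in zip(ages, cum):
            if cf >= position:
                return age

    q1 = quartile_from_cum(fc / 4)         # position 32.25 -> age 12
    q2 = quartile_from_cum(fc / 2)         # position 64.5  -> age 14
    q3 = quartile_from_cum(3 * fc / 4)     # position 96.75 -> age 16
    print(q1, q2, q3, "IQR =", q3 - q1)    # 12 14 16 IQR = 4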

Dispersion - Deviation and Variance


Dispersion measures how the various elements behave with regard to some sort of central tendency, usually the mean. Measures of dispersion
include the range, interquartile range, variance, standard deviation and absolute deviation. We've already looked at the first two in the preceding sections, so let's move on to
the other measures.

Absolute Deviation
Absolute deviation for a given data set is defined as the average of the absolute differences between the elements of the set and the mean (average deviation) or the
median (median absolute deviation).

The average deviation is calculated as follows:

Average deviation = (1/n) Σ |xi - x̄|

which means that the average deviation is the average of the absolute differences between each element of the data set and the mean.

The absolute deviation from the median is calculated in the same way, with the median in place of the mean:

Median absolute deviation = (1/n) Σ |xi - median|

Example 1

The heights of a group of 10 students randomly selected from a given school are as follows (in ft):

5.5, 3.5, 4.6, 6.1, 5.7, 5.11, 4.9, 5.0, 5.0, 5.5

a) Find the absolute deviation from the mean.

b) Find the absolute deviation from the median.

Solution

a) To find the absolute deviation from the mean, we need to first find the mean of the heights.

We know that the mean x̄ is given by:

x̄ = (sum of all elements) / n

Using the above, we calculate the mean as:

x̄ = (5.5 + 3.5 + 4.6 + 6.1 + 5.7 + 5.11 + 4.9 + 5.0 + 5.0 + 5.5) / 10 = 50.91 / 10 = 5.091

The mean height is 5.091 ft.

The deviation from the mean for each of the elements in the data set is obtained by subtracting the mean from that element. For example, for 5.5:

|5.5 - 5.091| = 0.409

We find all the deviations and then take their average (remember that we only consider their absolute values):

Average deviation = (0.409 + 1.591 + 0.491 + 1.009 + 0.609 + 0.019 + 0.191 + 0.091 + 0.091 + 0.409) / 10 = 4.91 / 10 = 0.491

b) To find the absolute deviation from the median, we need to first find the median height for the data set.

We know that to find the median value, we arrange the elements in the data set in ascending or descending order and then find the element that lies in the middle.

Arranged in ascending order from the smallest to the largest:

3.5, 4.6, 4.9, 5.0, 5.0, 5.11, 5.5, 5.5, 5.7, 6.1

Finding the median:

Since we had an even number of elements in the data set, it comes as no surprise that we're unable to obtain a median by canceling out corresponding elements.
We're left with two elements and so we find their mean which then becomes our median.

The two middle elements are 5.0 and 5.11, so the median is (5.0 + 5.11) / 2 = 5.055. Having obtained our median, we can proceed to find the average deviation from the median using the same steps as in the previous part.
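A short sketch that checks both parts of this example with the heights as given; the average absolute deviations from the mean and from the median follow directly.

    import statistics

    heights = [5.5, 3.5, 4.6, 6.1, 5.7, 5.11, 4.9, 5.0, 5.0, 5.5]

    mean = statistics.mean(heights)        # 5.091
    median = statistics.median(heights)    # mean of the two middle values

    dev_mean = sum(abs(x - mean) for x in heights) / len(heights)
    dev_median = sum(abs(x - median) for x in heights) / len(heights)

    print(round(mean, 3), round(median, 3))
    print(round(dev_mean, 3), round(dev_median, 3))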

Variance and Standard Deviation


Variance, as the name suggests, is a measure of how different the elements in a given population are. Variance is used to indicate how spread out these elements are
from the mean of the population. There are two kinds of variance: population variance and sample variance.

Population variance is the variance of the entire population and is denoted by σ², while sample variance is the variance of a sample of the population and is
denoted by S².

Standard deviation is the square root of variance. Standard deviation is a measure of how precise the mean of a population or sample is. It is used to indicate trends in
the elements of a given data set with respect to the mean, i.e. the spread of these elements from the mean.

Just as we have a population and sample variance, we also have a population and sample standard deviation. The population standard deviation is denoted by σ while the
sample standard deviation is denoted by S.

Although absolute deviation is also a measure of dispersion, variance and standard deviation are often preferred because of the way they're calculated. Calculating
variance involves squaring the differences (deviations) between each element and the mean, which gives larger deviations a heavier weighting and thus makes trends
easier to spot.

The population variance can be calculated from the following:

σ² = (1/N) Σ (xi - μ)²

where μ is the population mean.

The sample variance is given by:

S² = (1/(n - 1)) Σ (xi - x̄)²

where x̄ is the sample mean.

Standard deviation is simply the square root of variance, so we can calculate it by taking the square root of the above variance formulae:

Population standard deviation: σ = sqrt[ (1/N) Σ (xi - μ)² ]

where μ is the population mean.

Sample standard deviation: S = sqrt[ (1/(n - 1)) Σ (xi - x̄)² ]

where x̄ is the sample mean.

The difference in calculating σ² and S² is that for σ² the average of the squared deviations is taken over the number of elements in the population, N, whereas for S²
we divide by one less than the sample size, n - 1. The reason for this is that using n - 1 ensures that S² is an unbiased estimator of σ².
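Python's statistics module implements exactly this N versus n - 1 distinction: pvariance and pstdev divide by N (population), while variance and stdev divide by n - 1 (sample). A quick sketch, reusing the heights from the absolute deviation example:

    import statistics

    data = [5.5, 3.5, 4.6, 6.1, 5.7, 5.11, 4.9, 5.0, 5.0, 5.5]

    print(statistics.pvariance(data))  # population variance (divides by N)
    print(statistics.variance(data))   # sample variance (divides by n - 1)
    print(statistics.pstdev(data))     # population standard deviation
    print(statistics.stdev(data))      # sample standard deviation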

Probability
Probability is the branch of mathematics that deals with the study of chance. Probability deals with the study of experiments and their outcomes.

Probability Key Terms

Experiment

An experiment in probability is a test to see what will happen in case you do something. A simple example is flipping a coin. When you
flip a coin, you are performing an experiment to see which side of the coin you'll end up with.

Outcome

An outcome in probability refers to a single (one) result of an experiment. In the example of an experiment above, one outcome would
be heads and the other would be tails.

Event

An event in probability is a set of outcomes of an experiment. Suppose you flip a coin multiple times; an example
of an event would be getting a certain number of heads.

Sample Space

A sample space in probability is the set of all the different possible outcomes of a given experiment. If you flipped a coin once,
the sample space S would be given by:

S = {Heads, Tails}

If you flipped the coin multiple times, all the different combinations of heads and tails would make up the sample space. A sample space
is also defined as a universal set for the outcomes of a given experiment.

Notation of Probability

The probability that a certain event will happen when an experiment is performed can in layman's terms be described as the chance that something will happen.

The probability of an event E is denoted by P(E).

Suppose that our experiment involves rolling a die. There are 6 possible outcomes in the sample space, as shown below:

S = {1, 2, 3, 4, 5, 6}

The size of the sample space is often denoted by N while the number of outcomes in an event is denoted by n.

From the above, we can denote the probability of an event as:

P(E) = n / N

For the sample space given above, if the event is rolling a 2, there is only one 2 in the sample space, thus n = 1 and N = 6.

Thus the probability of getting a 2 when you roll a die is given by:

P(2) = 1/6
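A tiny sketch of P(E) = n/N for the die example, together with a simulation showing that the relative frequency of rolling a 2 settles near 1/6 over many trials:

    import random
    from fractions import Fraction

    sample_space = [1, 2, 3, 4, 5, 6]
    event = [x for x in sample_space if x == 2]
    print(Fraction(len(event), len(sample_space)))   # 1/6

    # The relative frequency over many simulated rolls approaches 1/6 (about 0.167).
    rolls = 100_000
    hits = sum(1 for _ in range(rolls) if random.choice(sample_space) == 2)
    print(hits / rolls)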

Understanding the Magnitude of the Probability of an Event


The largest probability an event can have is one and the smallest is zero. There are no negative probabilities and no probabilities greater than one. Probabilities are real
numbers ranging from zero to one. The closer the probability is to 1, the more likely the event is to occur, while the closer it is to zero, the less likely the
event is to occur.

When an event has a probability of one, we say that the event must happen, and when the probability is zero we say that the event is impossible.

The probabilities of all the outcomes in a sample space add up to one.

Events with the same probability have the same likelihood of occurring. For example, when you flip a fair coin, you are just as likely to get a head as a tail. This is
because these two outcomes have the same probability, i.e.

P(Heads) = P(Tails) = 1/2

Further Concepts in Probability


The study of probability mostly deals with combining different events and studying these events alongside each other. How these different events relate to each other
determines the methods and rules to follow when we're studying their probabilities.

Events can be divided into two major categories: dependent and independent events.

Independent Events

When two events are said to be independent of each other, what this means is that the probability that one event occurs in no way affects the probability of the other
event occurring. An example of two independent events: say you rolled a die and flipped a coin. The probability of getting any number face on the die in no
way influences the probability of getting a head or a tail on the coin.

Dependent Events

When two events are said to be dependent, the probability of one event occurring influences the likelihood of the other event.

For example, suppose you were to draw two cards from a deck of 52 cards. If on your first draw you had an ace and you put that aside, the probability of drawing an ace on
the second draw is changed because you drew an ace the first time. Let's calculate these different probabilities to see what's going on.

There are 4 aces in a deck of 52 cards.

On your first draw, the probability of getting an ace is given by:

P(ace) = 4/52 = 1/13

If we don't return this card to the deck, there are now 3 aces left among 51 cards, so the probability of drawing an ace on the second pick is given by:

P(ace on second draw) = 3/51 = 1/17

As you can clearly see, the above two probabilities are different, so we say that the two events are dependent. The likelihood of the second event depends on what
happens in the first event.
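A short sketch of the two probabilities above using exact fractions; the change from 4/52 to 3/51 is what makes the two draws dependent.

    from fractions import Fraction

    aces, cards = 4, 52

    p_first = Fraction(aces, cards)                        # 4/52 = 1/13
    p_second_given_first = Fraction(aces - 1, cards - 1)   # 3/51 = 1/17

    print(p_first, p_second_given_first)
    # Probability that both draws are aces (multiplying the two, as covered under the rules below):
    print(p_first * p_second_given_first)                  # 1/221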

Conditional Probability
We have already defined dependent and independent events and seen how probability of one event relates to the probability of the other event.

Having those concepts in mind, we can now look at conditional probability.

Conditional probability deals with further defining dependence of events by looking at probability of an event given that some other event first occurs.

Conditional probability is denoted by the following:

P(B|A)

The above is read as the probability that B occurs given that A has already occurred.

The above is mathematically defined as:

P(B|A) = P(A ∩ B) / P(A)

Set Theory in Probability


A sample space is defined as a universal set of all possible outcomes from a given experiment.

Consider two events A and B that are part of a sample space S. This sample space can be represented as a set, with A and B drawn as (possibly overlapping) regions inside it.

The entire sample space S is then made up of four disjoint regions: the part of A outside B, the intersection A ∩ B, the part of B outside A, and the region outside both A and B.

Remember the following from set theory:

n(A ∪ B) = n(A) + n(B) - n(A ∩ B)

The different regions of the set S can be explained using the rules of probability.

Rules of Probability
When dealing with more than one event, there are certain rules that we must follow when studying the probability of these events. These rules depend greatly on whether
the events we are looking at are independent of or dependent on each other.

First acknowledge that the probabilities of all the outcomes in a sample space add up to one, i.e. P(S) = 1.

Multiplication Rule (A ∩ B)

This region is referred to as 'A intersection B', and in probability this region refers to the event that both A and B happen. When we use the word 'and' we are referring
to multiplication; thus A and B can be thought of as A x B or (using the dot notation that is more popular in probability) A·B.

If A and B are dependent events, the probability of this event happening can be calculated as shown below:

P(A ∩ B) = P(A) x P(B|A)

If A and B are independent events, the probability of this event happening can be calculated as shown below:

P(A ∩ B) = P(A) x P(B)

Conditional probability for two independent events can be redefined using the relationship above to become:

P(B|A) = P(A ∩ B) / P(A) = P(A) x P(B) / P(A) = P(B)

The above is consistent with the definition of independent events, the occurrence of event A in no way influences the occurrence of event B, and so the probability that
event B occurs given that event A has occurred is the same as the probability of event B.

Additive Rule (A ∪ B)

In probability we refer to the addition operator (+) as 'or'. Thus when we want to define some event such that it can be A or B, we are looking for the probability
of A ∪ B.

Thus it follows that:

P(A ∪ B) = P(A) + P(B) - P(A ∩ B)

But remember from set theory that an event and its complement together make up the whole sample space, A ∪ A' = S, and from the way we defined our sample space above, P(S) = 1, so that:

P(A) + P(A') = 1

So we can now write the probability of the complement of an event as:

P(A') = 1 - P(A)

The above is sometimes referred to as the subtraction rule.

Mutual Exclusivity
Certain special pairs of events have a unique relationship referred to as mutual exclusivity.

Two events are said to be mutually exclusive if they can't occur at the same time. For a given sample space, it's either one or the other but not both. As a consequence,
mutually exclusive events have the probability of occurring together defined as follows:

P(A ∩ B) = 0

An example of mutually exclusive events are the outcomes of a fair coin flip. When you flip a fair coin, you either get a head or a tail but not both. Adding their probabilities gives:

P(Heads) + P(Tails) = 1/2 + 1/2 = 1

In this case the two mutually exclusive events are also exhaustive (together they cover the entire sample space), which is why their probabilities sum to one; mutually exclusive events in general need not have probabilities that sum to one.

Rules of Probability for Mutually Exclusive Events

Multiplication Rule

From the definition of mutually exclusive events, we can immediately conclude the following:

P(A ∩ B) = 0

Addition Rule

As we defined above, the addition rule applies to mutually exclusive events as follows:

P(A ∪ B) = P(A) + P(B)

Subtraction Rule

From the addition rule above, we can conclude that the subtraction rule for mutually exclusive events takes the form:

P(A) = P(A ∪ B) - P(B)

Conditional Probability for Mutually Exclusive Events

We have defined conditional probability with the following equation:

P(B|A) = P(A ∩ B) / P(A)

We can redefine the above using the multiplication rule: for mutually exclusive events, P(A ∩ B) = 0,

hence

P(B|A) = 0 / P(A) = 0

In a Venn diagram of a sample space containing two mutually exclusive events A and B, the two events appear as non-overlapping regions.

Introduction to Probability Distributions - Random Variables


A random variable is defined as a function that associates a real number with each outcome of an experiment.

In other words, a random variable is a generalization of the outcomes or events in a given sample space. This is possible since the random variable by definition can
change so we can use the same variable to refer to different situations. Random variables make working with probabilities much neater and easier.

A random variable in probability is most commonly denoted by a capital X, and the lowercase letter x is then used to denote a particular value of the random variable.

For example, given that you flip a coin twice, the sample space for the possible outcomes is given by the following:

S = {HH, HT, TH, TT}

There are four possible outcomes as listed in the sample space above, where H stands for heads and T stands for tails.

The random variable X can be defined, for example, as the number of heads obtained, so that it takes the values:

x = 0, 1, 2

To find the probability of one of those outcomes we denote that question as:

P(X = x)

which means the probability that the random variable X is equal to some real number x.

In the above example, we can say:

Let X be a random variable defined as the number of heads obtained when two coins are tossed. Find the probability that you obtain two heads.

So now we've been told what X is and that x = 2, so we write the above information as:

P(X = 2)

Since we already have the sample space, we know that there is only one outcome with two heads (HH), so we find the probability as:

P(X = 2) = 1/4

We can also simply write the above as:

P(2) = 1/4

From this example, you should be able to see that the random variable X refers to any of the elements in a given sample space.
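A minimal sketch that enumerates the sample space for two coin flips and counts outcomes to get P(X = x), where X is the number of heads:

    from itertools import product
    from fractions import Fraction
    from collections import Counter

    # Sample space for two coin flips: HH, HT, TH, TT.
    sample_space = list(product("HT", repeat=2))

    # X = number of heads in each outcome.
    counts = Counter(outcome.count("H") for outcome in sample_space)

    N = len(sample_space)
    for x in sorted(counts):
        print(f"P(X = {x}) =", Fraction(counts[x], N))
    # P(X = 0) = 1/4, P(X = 1) = 1/2, P(X = 2) = 1/4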

There are two types of random variables: discrete variables and continuous random variables.

Discrete Random Variables


The word discrete means separate and individual. Thus discrete random variables are those that take on integer values only. They never include fractions or decimals.

A quick example is the number of heads in any number of coin flips: the outcomes will always be integer values, and you'll never have half a head or a quarter of a tail. Such a
random variable is referred to as discrete. Discrete random variables give rise to discrete probability distributions.

Continuous Random Variable

Continuous is the opposite of discrete. Continuous random variables are those that take on any value including fractions and decimals. Continuous random variables
give rise to continuous probability distributions.

Probability Distributions
A probability distribution is a mapping of all the possible values of a random variable to their corresponding probabilities for a given sample space.

The probability distribution is denoted as

P(X = x)

which can be written in short form as

P(x)

The probability distribution can also be referred to as a set of ordered pairs of outcomes and their probabilities. This is known as the probability function f(x).

This set of ordered pairs can be written as:

(x, f(x))

where the function is defined as:

f(x) = P(X = x)

Cumulative Distribution Function (CDF)

The Cumulative Distribution Function (CDF) is defined as the probability that a random variable X with a given probability distribution f(x) will be found at a value less
than or equal to x. The cumulative distribution function is a cumulative sum of the probabilities up to a given point.

The CDF is denoted by F(x) and is mathematically described as:

F(x) = P(X ≤ x)

Discrete Probability Distributions


Discrete random variables give rise to discrete probability distributions. For example, the probability of obtaining a certain number x when you toss a fair die is given by
the probability distribution table below.

x    P(X = x)
1    1/6
2    1/6
3    1/6
4    1/6
5    1/6
6    1/6

For a discrete probability distribution, the set of ordered pairs (x, f(x)), where x is each outcome in a given sample space and f(x) is its probability, must satisfy the
following:

P(X = x) = f(x)

f(x) ≥ 0

Σx f(x) = 1

Cumulative Distribution Function for a Discrete Random Variable

For a discrete random variable, the CDF is given as follows:

F(x) = P(X ≤ x) = Σ f(t) for all t ≤ x

In other words, to get the cumulative distribution function, you sum up the probabilities of all the outcomes less than or equal to the given value.

For example, given a random variable X defined as the face that you obtain when you toss a fair die, find F(3):

F(3) = P(X ≤ 3) = f(1) + f(2) + f(3) = 1/6 + 1/6 + 1/6 = 1/2

The probability function can also be found from the cumulative distribution function, for example:

f(3) = F(3) - F(2)

given that you know the full table of the cumulative distribution function of the sample space.
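A quick sketch of the die's probability function and CDF: F(3) sums the probabilities of 1, 2 and 3, and differencing the CDF recovers the probability function.

    from fractions import Fraction

    pmf = {x: Fraction(1, 6) for x in range(1, 7)}   # fair die

    def F(x):
        """Cumulative distribution function: sum of f(t) for all t <= x."""
        return sum(p for t, p in pmf.items() if t <= x)

    print(F(3))          # 1/2
    print(F(3) - F(2))   # recovers f(3) = 1/6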

Continuous Probability Distribution


Continuous random variables give rise to continuous probability distributions. Continuous probability distributions can't be tabulated, since by definition the probability
of any exact value is zero, i.e.

P(X = x) = 0

This is because the random variable X is continuous and as such can be infinitely divided into smaller parts, such that the probability of selecting an exact value x is
zero.

Consequently, a continuous probability distribution is described through probabilities over intervals, such as

P(a < X < b), P(X > c), P(X ≤ d)

and so on.

While a discrete probability distribution is characterized by its probability function (also known as the probability mass function), continuous probability distributions
are characterized by their probability density functions.

Since we look at regions in which a given outcome is likely to occur, we define the Probability Density Function (PDF) as the function that describes how the probability
is spread over the values of the random variable; the probability of landing in a given interval is the integral of the density over that interval.

This can be mathematically represented as:

P(a < X < b) = ∫[a to b] f(x) dx

In other words, the probability is the area under the curve of f(x) between a and b.

For a continuous probability distribution, the set of ordered pairs (x, f(x)), where x is each outcome in a given sample space and f(x) is its probability density, must satisfy the
following:

P(x1 < X < x2) = ∫[x1 to x2] f(x) dx

f(x) ≥ 0 for all real numbers x

∫[-∞ to ∞] f(x) dx = 1

Cumulative Distribution Function for a Continuous Probability Distribution

For a continuous random variable X, its CDF is given by

F(x) = P(X ≤ x) = ∫[-∞ to x] f(t) dt

which is the same as saying that F(x) accumulates all of the density up to the point x, and

P(a < X < b) = F(b) - F(a)

From the above, we can see that to find the probability density function f(x) when given the cumulative distribution function F(x),

f(x) = dF(x)/dx

if the derivative exists.

Continuous probability distributions are given in the form

f(x) = some expression for a ≤ x ≤ b, and f(x) = 0 elsewhere

whereby the above means that the probability density function f(x) exists within the region {x; a, b} but takes on the value of zero anywhere else.

For example, given a probability density function f(x) that is defined for x ≥ 1 and is zero elsewhere, find:

1. P(X ≤ 4)
2. P(X < 1)
3. P(2 ≤ X ≤ 3)
4. P(X > 1)
5. F(2)

Solutions:

1. P(X ≤ 4)

Since we're finding the probability that the random variable is less than or equal to 4, we integrate the density function from the given lower limit (1) to the limit we're
testing for (4).

We need not concern ourselves with the zero part of the density function; all it indicates is that the function only exists within the given region, and the probability of the
random variable landing anywhere outside of that region is always zero.

2. P(X < 1)

P(X < 1) = 0, since the density function f(x) doesn't exist outside of the given boundary.

3. P(2 ≤ X ≤ 3)

Since the region we're given lies within the boundary for which f(x) is defined, we solve this problem by integrating the density function from 2 to 3.

4. P(X > 1)

The above problem is asking us to find the probability that the random variable lies anywhere between 1 and positive infinity, so we integrate the density function from 1 to infinity,
remembering that terms involving the inverse of infinity go to zero in the limit.

The result is 1, which is our expected result since we already defined f(x) as lying within that region, hence the random variable will always be picked from there.

5. F(2)

The above is asking us to find the cumulative distribution function evaluated at 2, i.e. F(2) = P(X ≤ 2), which we obtain by integrating the density function from the lower limit 1 up to 2.
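Since every one of these answers is just an integral of the density, they are easy to check symbolically. The sketch below uses a hypothetical density f(x) = 1/x² for x ≥ 1 (and zero elsewhere), chosen only for illustration, to show the mechanics with sympy:

    import sympy as sp

    x = sp.symbols("x", positive=True)
    f = 1 / x**2    # hypothetical density: f(x) = 1/x^2 for x >= 1, zero elsewhere

    print(sp.integrate(f, (x, 1, sp.oo)))  # total area = 1, so f is a valid density; also P(X > 1) = 1
    print(sp.integrate(f, (x, 1, 4)))      # P(X <= 4) = 3/4
    print(sp.integrate(f, (x, 2, 3)))      # P(2 <= X <= 3) = 1/6
    print(sp.integrate(f, (x, 1, 2)))      # F(2) = P(X <= 2) = 1/2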

Joint Probability Distributions


In the section on probability distributions, we looked at discrete and continuous distributions, but we only focused on single random variables. Probability distributions
can, however, be applied to grouped random variables, which gives rise to joint probability distributions. Here we're going to focus on two-dimensional distributions (i.e.
only two random variables), but higher dimensions (more than two variables) are also possible.

Since all random variables are divided into discrete and continuous random variables, we end up having both discrete and continuous joint probability
distributions. These distributions are not so different from the single-variable distributions we just looked at, but understanding some concepts might require some
knowledge of multivariable calculus.

Essentially, joint probability distributions describe situations where both outcomes, represented by a pair of random variables, occur together. While before we used a single X to represent the
random variable, we now have X and Y as the pair of random variables.

Joint probability distributions are defined in the form below:

f(x, y) = P(X = x, Y = y)

whereby the above represents the probability that the events X = x and Y = y occur at the same time.

The Cumulative Distribution Function (CDF) for a joint probability distribution is given by:

F(x, y) = P(X ≤ x, Y ≤ y)

Discrete Joint Probability Distributions


Discrete random variables, when paired, give rise to discrete joint probability distributions. As with a single-variable discrete probability distribution, a discrete
joint probability distribution can be tabulated as in the example below.

The table below represents the joint probability distribution obtained for the outcomes when a die is tossed and a coin is flipped.

f(x,y)           x = 1   x = 2   x = 3   x = 4   x = 5   x = 6   Row Totals
y = Heads        a       b       c       d       e       f       P(Heads)
y = Tails        g       h       i       j       k       l       P(Tails)
Column Totals    P(1)    P(2)    P(3)    P(4)    P(5)    P(6)    1

In the table above, x = 1, 2, 3, 4, 5, 6 are the outcomes when the die is tossed, while y = Heads, Tails are the outcomes when the coin is flipped. The letters a through l represent
the joint probabilities of the different events formed from the combinations of x and y, while the row and column totals are the marginal probabilities, and all the joint probabilities together should sum to 1. The row
sums and column sums are referred to as the marginal probability distribution functions (PDFs).

We shall see in a moment how to obtain the different probabilities but first let us define the probability mass function for a joint discrete probability distribution.

The probability function, also known as the probability mass function, for a joint probability distribution f(x,y) is defined such that:

f(x,y) ≥ 0 for all (x,y)

which means that the joint probability should always be greater than or equal to zero, as dictated by the fundamental rules of probability;

Σx Σy f(x,y) = 1

which means that the sum of all the joint probabilities should equal one for a given sample space; and

f(x,y) = P(X = x, Y = y)

The joint probability mass function f(x,y) can be calculated in a number of different ways, depending on the relationship between the random variables X and Y.

As we saw in the section on probability concepts, these two variables can be either independent or dependent.

If X and Y are Independent:

In the example we gave above, the die toss and the coin flip are independent random variables; the outcome of one event does not in any way affect the
outcome of the other. Assuming that the coin and die are both fair, the probabilities a through l can be obtained by multiplying the probabilities of
the different x and y combinations.

For example, P(X = 2, Y = Tails) is given by

P(X = 2, Y = Tails) = P(X = 2) x P(Y = Tails) = 1/6 x 1/2 = 1/12

Since we claimed that the coin and the die are fair, the probabilities a through l are all the same, each equal to 1/12.

The marginal PDFs (the row and column totals) should be the probabilities you expect when you look at each outcome on its own.

For example:

P(Y = Heads) = 1/2 and P(X = 2) = 1/6
The table thus becomes:

f(x,y)           x = 1   x = 2   x = 3   x = 4   x = 5   x = 6   Row Totals
y = Heads        1/12    1/12    1/12    1/12    1/12    1/12    1/2
y = Tails        1/12    1/12    1/12    1/12    1/12    1/12    1/2
Column Totals    1/6     1/6     1/6     1/6     1/6     1/6     1

If X and Y are Dependent:

If X and Y are dependent variables, their joint probabilities are calculated using their particular relationship, as in the example below.

Given a bag containing 3 black balls, 2 blue balls and 3 green balls, a random sample of 4 balls is selected. Given that X is the number of black balls and Y is the number
of blue balls, find the joint probability distribution of X and Y.

Solution:

The random variables X and Y are dependent since the balls are picked from the same bag without replacement, so that picking one kind of ball affects the probability of picking the
other. We solve this problem using combinations.

There are 4 possible outcomes for X, i.e. {0, 1, 2, 3}, whereby you can pick zero, one, two or three black balls; similarly, for Y there are 3 possible
outcomes, {0, 1, 2}, i.e. zero, one or two blue balls.

The joint probability distribution is given by the table below:

f(x,y)           x = 0   x = 1   x = 2   x = 3   Row Totals
y = 0
y = 1
y = 2
Column Totals

To fill out the table, we need to calculate the different entries. We know the total number of black balls to be 3, the total number of blue balls to be 2, the total sample
need to be 4 and the total number of balls in the bag to be 3+2+3 = 8.

We find the joint probability mass function f(x,y) using combinations as:

f(x,y) = C(3,x) × C(2,y) × C(3, 4-x-y) / C(8,4)

The above counts the number of ways of picking x of the 3 black balls, y of the 2 blue balls, and the remaining 4-x-y balls from the 3 green balls, divided by C(8,4) = 70, the total number of ways of choosing any 4 balls from the 8 in the bag (which is why every probability below has a denominator of 70). We substitute the different values of x (0,1,2,3) and y (0,1,2) and solve for each pair.

f(0,0) is a special case. We don't need to calculate it: the probability of obtaining zero black balls and zero blue balls is simply zero. This is because of the size of the sample relative to the number of green balls. We need 4 balls from a bag of 8, and in order to pick neither a black nor a blue ball we would need at least 4 green balls; since there are only 3 green balls, every sample of 4 must contain at least one black or blue ball.

f(3,2) is likewise impossible, since 3 black and 2 blue balls would make 5, yet we only pick 4 balls.

From the above, we obtain the joint probability distribution as:

f(x,y)          x = 0    x = 1    x = 2    x = 3    Row Totals
y = 0              0      3/70     9/70     3/70      15/70
y = 1            2/70    18/70    18/70     2/70      40/70
y = 2            3/70     9/70     3/70       0       15/70
Column Totals    5/70    30/70    30/70     5/70        1
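The entries above can be verified with a short Python sketch (an illustration, not part of the lesson) that evaluates f(x,y) = C(3,x)C(2,y)C(3,4-x-y)/C(8,4) for every pair and checks that the probabilities sum to 1:

from fractions import Fraction
from math import comb

BLACK, BLUE, GREEN, SAMPLE = 3, 2, 3, 4
TOTAL = BLACK + BLUE + GREEN            # 8 balls in the bag

def f(x, y):
    """Joint pmf of (black, blue) counts in a sample of 4 balls drawn without replacement."""
    green_needed = SAMPLE - x - y
    if green_needed < 0 or green_needed > GREEN:
        return Fraction(0)              # impossible combination, e.g. f(3,2)
    return Fraction(comb(BLACK, x) * comb(BLUE, y) * comb(GREEN, green_needed),
                    comb(TOTAL, SAMPLE))

table = {(x, y): f(x, y) for x in range(BLACK + 1) for y in range(BLUE + 1)}
for (x, y), p in sorted(table.items()):
    print(f"f({x},{y}) = {p}")          # fractions print in lowest terms, e.g. 18/70 as 9/35

print("total =", sum(table.values()))   # total = 1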

Continuous Joint Probability Distribution


Continuous Joint Probability Distributions arise from groups of continuous random variables.

Continuous joint probability distributions are characterized by the Joint Density Function, which is similar to the density function of the single-variable case except that it is defined in two dimensions.

The joint density function f(x,y) is characterized by the following:

f(x,y) ≥ 0, for all (x,y)

∫∫ f(x,y) dx dy = 1, where the double integral runs over the entire xy plane

For any region A lying in the xy plane,

P[(X,Y) ∈ A] = ∫∫A f(x,y) dx dy

The marginal probability density functions are given by

g(x) = ∫ f(x,y) dy, integrated over all values of y,

whereby the above is the probability distribution of the random variable X alone.

The probability distribution of the random variable Y alone, known as its marginal PDF, is given by

h(y) = ∫ f(x,y) dx, integrated over all values of x.

Example:

A certain farm produces two kinds of eggs on any given day: organic and non-organic. Let these two kinds of eggs be represented by the random variables X and Y respectively. Given that the joint probability density function of these variables is given by

a) Find the marginal PDF of X

b) Find the marginal PDF of Y

c) Find P(X ≤ 1/2, Y ≤ 1/2)

Solution:

a) The marginal PDF of X is given by g(x) where

b) The marginal PDF of Y is given by h(y) where

c) P(X ≤ 1/2, Y ≤ 1/2)
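To see how such calculations can be automated, here is an illustrative Python (sympy) sketch; since the joint density of the egg example is not reproduced above, the sketch assumes a hypothetical density f(x,y) = x + y on the unit square purely for demonstration:

import sympy as sp

x, y = sp.symbols("x y")

# Assumed joint density, for illustration only: f(x, y) = x + y on the unit square
f = x + y

# It integrates to 1 over the region, so it is a valid joint density
assert sp.integrate(f, (x, 0, 1), (y, 0, 1)) == 1

# Marginal PDFs: integrate out the other variable
g = sp.integrate(f, (y, 0, 1))          # g(x) = x + 1/2
h = sp.integrate(f, (x, 0, 1))          # h(y) = y + 1/2

# P(X <= 1/2, Y <= 1/2): integrate the joint density over that corner of the square
p = sp.integrate(f, (x, 0, sp.Rational(1, 2)), (y, 0, sp.Rational(1, 2)))
print(g, h, p)                          # x + 1/2, y + 1/2, 1/8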

Mixed Joint Probability Distribution


So far we've looked at pairs of random variables where both variables are either discrete or continuous. A joint pair of random variables can also be composed of one discrete and one continuous random variable. This gives rise to what is known as a mixed joint probability distribution.

The density function for a mixed probability distribution is given by

whereby X is a continuous random variable, Y is a discrete random variable, and g(x) is the marginal PDF of X.

The cumulative distribution function is given by

Conditional Probability Distribution


Conditional probability distributions arise from joint probability distributions whereby we need to know the probability of one event given that another event has happened, and the random variables behind these events are jointly distributed.

Conditional probability distributions can be discrete or continuous, but they follow the same notation, i.e.

f(x|y) = f(x,y) / h(y), provided h(y) > 0

where the above is the conditional probability distribution of X given that Y = y.

The conditional probability distribution of the variable Y given that X = x is given by:

f(y|x) = f(x,y) / g(x), provided g(x) > 0

The conditional probability distribution for a discrete set of random variables can be found from:

P(a < X < b | Y = y) = ∑ f(x|y), summed over all values of x between a and b,

where the above is the probability that X lies between a and b given that Y = y.

For a set of continuous random variables, the above probability is given as:

P(a < X < b | Y = y) = ∫ f(x|y) dx, integrated from a to b.

Two random variables are said to be statistically independent if their conditional probability distributions satisfy the following:

f(x|y) = g(x) and f(y|x) = h(y), or equivalently f(x,y) = g(x) h(y),

where g(x) is the marginal PDF of X and h(y) is the marginal PDF of Y.
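As a small discrete illustration (a sketch, not part of the lesson), the conditional distribution of the number of black balls X given that exactly one blue ball was drawn (Y = 1) can be read off the earlier table by dividing each joint probability by the marginal h(1) = 40/70:

from fractions import Fraction

# The y = 1 row of the joint pmf f(x, y) from the black/blue-balls example
f = {(0, 1): Fraction(2, 70), (1, 1): Fraction(18, 70),
     (2, 1): Fraction(18, 70), (3, 1): Fraction(2, 70)}

h1 = sum(f.values())                         # marginal h(1) = 40/70
cond = {x: f[(x, 1)] / h1 for (x, _) in f}   # f(x | Y = 1) = f(x, 1) / h(1)

print(cond)                 # values 1/20, 9/20, 9/20, 1/20 for x = 0, 1, 2, 3
print(sum(cond.values()))   # 1 -> the conditional pmf sums to one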

Expected Values of Random Variables


We already looked at finding the mean in the section on averages. Random variables also have means, but their means are not calculated by simply adding up the different values and dividing by how many there are; each value has to be weighted by its probability.

The mean of a random variable is more commonly referred to as its Expected Value, i.e. the value you expect to obtain should you carry out some experiment whose
outcomes are represented by the random variable.

The expected value of a random variable X is denoted by E(X), and it is often written as μ since it plays the role of the mean of the distribution.

Given that the random variable X is discrete and has a probability distribution f(x), the expected value of the random variable is given by:

E(X) = ∑ x f(x), summed over all values of x

Given that the random variable X is continuous and has a probability distribution f(x), the expected value of the random variable is given by:

E(X) = ∫ x f(x) dx, integrated over all values of x

Example 1:

The probability distribution of X, the number of red cars John meets on his way to work each morning, is given by the following table:

x        0       1       2       3       4
f(x)    0.41    0.37    0.16    0.05    0.01

Find the number of red cars that John expects to run into each morning on his way to work.

Solution:

This question is asking us to find the average number of red cars that John runs into on his way to work. What makes this different from an ordinary mean question is that the odds (probabilities) of running into a given number of cars are not all the same.

Since X is a discrete random variable, the expected value is given by:

E(X) = ∑ x f(x) = 0(0.41) + 1(0.37) + 2(0.16) + 3(0.05) + 4(0.01) = 0.88
Although you wouldn't expect to run into 0.88 of a car on any one morning, this tells us that over, say, 100 mornings John would expect to come across roughly 88 red cars on his way to work.
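The same expectation can be reproduced with a few lines of Python; this is an illustrative sketch, using 0.01 as the last probability so that the distribution sums to 1:

# Probability distribution of X, the number of red cars John meets each morning
pmf = {0: 0.41, 1: 0.37, 2: 0.16, 3: 0.05, 4: 0.01}

assert abs(sum(pmf.values()) - 1.0) < 1e-9       # valid distribution

# E(X) = sum of x * f(x) over all x
expected = sum(x * p for x, p in pmf.items())
print(expected)                                  # approximately 0.88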

Example 2:

A software company uses a checking program to look for errors in the programs it builds, and discards a program if the number of errors found exceeds a certain limit. The number of errors found is represented by a random variable X whose density function is given by

Find the average number of errors the company expects to find in a given program.

Solution:

The random variable X is given as a continuous random variable, thus its expected value can be found as follows:

The company should expect to find approximately 14.93 errors.

Expected Value of an Arbitrary Function

In some cases, an event is represented by a function of the random variable, which we refer to as g(X). To find the expected value of this event, we substitute the function for the variable in the expectation formula, i.e.

For a discrete random variable X:

E[g(X)] = ∑ g(x) f(x), summed over all values of x

For a continuous random variable X:

E[g(X)] = ∫ g(x) f(x) dx, integrated over all values of x

Example 3:

X is a random variable given by the following probability distribution:

x        -3      6      9
f(x)     1/6    1/2    1/3

Given that g(X) = X² + 2, find E[g(X)].

Solution:

For a discrete random variable X, the expected value of an arbitrary function is given by

E[g(X)] = ∑ (x² + 2) f(x) = (9 + 2)(1/6) + (36 + 2)(1/2) + (81 + 2)(1/3) = 11/6 + 19 + 83/3 = 48.5
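The sum can be checked with exact fractions in Python (an illustrative sketch):

from fractions import Fraction

pmf = {-3: Fraction(1, 6), 6: Fraction(1, 2), 9: Fraction(1, 3)}

def g(x):
    return x**2 + 2                     # the arbitrary function g(X) = X^2 + 2

expected_g = sum(g(x) * p for x, p in pmf.items())
print(expected_g)                       # 97/2, i.e. 48.5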

Example 4:

Given that X is a continuous random variable whose PDF is given by

find E[g(X)] given that g(X) = 3X².

Solution:

For a continuous random variable, the expected value of an arbitrary function of the random variable g(X) is given by

Expected Value of Joint Random Variables


For a pair of random variables X and Y with a joint probability distribution f(x,y), the expected value can be found by use of an arbitrary function of the random variables g(X,Y) such that

E[g(X,Y)] = ∑x ∑y g(x,y) f(x,y)

for a discrete pair of random variables X and Y, and

E[g(X,Y)] = ∫∫ g(x,y) f(x,y) dx dy

for a continuous pair of random variables X and Y.

Example 5:

Given a pair of discrete random variables X and Y whose joint probability distribution function is given by the table below:

f(x,y)     x = 1    x = 2    x = 3
y = 2       0.10     0.20     0.10
y = 4       0.15     0.30     0.15

Find the expected value of the function g(X,Y) given that

Solution:

For a pair of discrete random variables, the expected value of g(X,Y) is given by:

E[g(X,Y)] = ∑x ∑y g(x,y) f(x,y)
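Since the definition of g(X,Y) for this example is not reproduced above, the following Python sketch (for illustration only) evaluates the double sum for the hypothetical choice g(X,Y) = XY using the table above:

# Joint pmf from the table: keys are (x, y) pairs
f = {(1, 2): 0.10, (2, 2): 0.20, (3, 2): 0.10,
     (1, 4): 0.15, (2, 4): 0.30, (3, 4): 0.15}

def g(x, y):
    return x * y        # hypothetical g(X, Y) = XY, chosen only for illustration

# E[g(X, Y)] = sum over all (x, y) of g(x, y) * f(x, y)
expected = sum(g(x, y) * p for (x, y), p in f.items())
print(expected)         # 6.4 (up to floating-point rounding)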

Example 6:

Given the random variables X and Y and the function g(X,Y) = XY, find E[g(X,Y)] if the joint density function is given by:

Solution:

The expected value is given by

Variance and Standard Deviation of a Random Variable


We have already looked at Variance and Standard deviation as measures of dispersion under the section on Averages. We can also measure the dispersion of Random
variables across a given distribution using Variance and Standard deviation. This allows us to better understand whatever the distribution represents.

The variance of a random variable X is denoted by σ² (or σ²X), but it can also be written as Var(X).

Variance of a random variable can be defined as the expected value of the square of the difference between the random variable and the mean.

Given that the random variable X has a mean of μ, the variance is expressed as:

σ² = E[(X - μ)²]

In the previous section on Expected value of a random variable, we saw that the method/formula for calculating the expected value varied depending on whether the
random variable was discrete or continuous. As a consequence, we have two different methods for calculating the variance of a random variable depending on
whether the random variable is discrete or continuous.

For a discrete random variable, the variance σ² is calculated as:

σ² = ∑ (x - μ)² f(x), summed over all values of x

For a continuous random variable, the variance σ² is calculated as:

σ² = ∫ (x - μ)² f(x) dx, integrated over all values of x

In both cases f(x) is the probability distribution function: the probability mass function in the discrete case and the probability density function in the continuous case.

The Standard Deviation in both cases can be found by taking the square root of the variance.

Example 1

A software engineering company tested a new product of theirs and found that the number of errors per 100 CDs of the new software had the following probability
distribution:
x        0       1       2       3       4
f(x)    0.01    0.25    0.40    0.30    0.04
Find the Variance of X

Solution

The probability distribution given is discrete, so we can find the variance from the following:

σ² = ∑ (x - μ)² f(x)

We need to find the mean first:

μ = E(X) = ∑ x f(x)

Then we find the variance by substituting the mean back into the formula above.
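For illustration, the following Python sketch takes the error counts to be x = 0, 1, 2, 3, 4 (an assumed labelling, since only the probabilities appear in the table above) and computes the mean and the variance in two equivalent ways:

# Assumed error counts x = 0..4, paired with the probabilities from the table
pmf = {0: 0.01, 1: 0.25, 2: 0.40, 3: 0.30, 4: 0.04}

mean = sum(x * p for x, p in pmf.items())                       # mu = E(X)
var_def = sum((x - mean) ** 2 * p for x, p in pmf.items())      # sum of (x - mu)^2 f(x)
var_short = sum(x**2 * p for x, p in pmf.items()) - mean**2     # E(X^2) - mu^2

print(mean)                  # approximately 2.11 under the assumed x values
print(var_def, var_short)    # both approximately 0.7379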

Example 2

Find the Standard Deviation of a random variable X whose probability density function is given by f(x) where:

Solution

Since the random variable X is continuous, we use the following formula to calculate the variance:

σ² = ∫ (x - μ)² f(x) dx, integrated over all values of x

First we find the mean:

μ = E(X) = ∫ x f(x) dx

Then we find the variance by substituting the mean back into the formula above, and the standard deviation is the square root of that variance.
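For a concrete illustration of the continuous case, the sketch below uses sympy with an assumed density f(x) = 3x² on [0, 1] (chosen purely for demonstration) to compute the mean, variance and standard deviation:

import sympy as sp

x = sp.symbols("x")
f = 3 * x**2                                      # assumed density on [0, 1], for illustration

assert sp.integrate(f, (x, 0, 1)) == 1            # valid density

mu = sp.integrate(x * f, (x, 0, 1))               # mean: 3/4
var = sp.integrate((x - mu) ** 2 * f, (x, 0, 1))  # variance: 3/80
sigma = sp.sqrt(var)                              # standard deviation

print(mu, var, sigma, float(sigma))               # 3/4, 3/80, sqrt(15)/20, ~0.1936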

Simplifying the Variance Formula

We have seen that the variance of a random variable is given by:

σ² = E[(X - μ)²]

We can attempt to simplify this formula by expanding the quadratic in the formula above as follows:

(X - μ)² = X² - 2μX + μ²

We shall see in the next section that the expected value of a linear combination behaves as follows:

E[aX + bY + c] = aE[X] + bE[Y] + c

Substituting the expanded form into the variance equation:

σ² = E[X² - 2μX + μ²] = E[X²] - 2μE[X] + E[μ²]

Remember that after you've calculated the mean μ, the result is a constant, and the expected value of a constant is that same constant, so E[μ²] = μ².

This simplifies the formula as shown below:

σ² = E[X²] - 2μE[X] + μ²

but

E[X] = μ

which means that

σ² = E[X²] - 2μ² + μ² = E[X²] - μ²

The above is a simplified formula for calculating the variance.

We can also derive the above for a discrete random variable as follows:

σ² = ∑ (x - μ)² f(x) = ∑ x² f(x) - 2μ ∑ x f(x) + μ² ∑ f(x)

but, since the total probability is 1,

∑ f(x) = 1

and

∑ x f(x) = μ

Therefore,

σ² = ∑ x² f(x) - 2μ² + μ² = ∑ x² f(x) - μ²

whereby

E[X²] = ∑ x² f(x)

Hence

σ² = E[X²] - μ²

For a continuous random variable:

σ² = ∫ (x - μ)² f(x) dx = ∫ x² f(x) dx - 2μ ∫ x f(x) dx + μ² ∫ f(x) dx = ∫ x² f(x) dx - μ²

whereby

E[X²] = ∫ x² f(x) dx

which means that

σ² = E[X²] - μ²

Variance of an Arbitrary Function of a Random Variable g(X)


Consider an arbitrary function g(X). We saw that the expected value of this function is given by:

E[g(X)] = ∑ g(x) f(x) for the discrete case

E[g(X)] = ∫ g(x) f(x) dx for the continuous case

The variance of this function g(X) is denoted by σ²_g(X) and can be found as follows:

For X a discrete random variable:

σ²_g(X) = E[(g(X) - μ_g(X))²] = ∑ (g(x) - μ_g(X))² f(x)

For X a continuous random variable:

σ²_g(X) = E[(g(X) - μ_g(X))²] = ∫ (g(x) - μ_g(X))² f(x) dx

where μ_g(X) = E[g(X)].
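Reusing the distribution from Example 3 above together with g(X) = X² + 2, a short sketch (for illustration) gives the variance of g(X) as follows:

from fractions import Fraction

pmf = {-3: Fraction(1, 6), 6: Fraction(1, 2), 9: Fraction(1, 3)}

def g(x):
    return x**2 + 2                     # g(X) = X^2 + 2, as in Example 3

mu_g = sum(g(x) * p for x, p in pmf.items())                   # E[g(X)] = 97/2
var_g = sum((g(x) - mu_g) ** 2 * p for x, p in pmf.items())    # sum of (g(x) - mu_g)^2 f(x)

print(mu_g, var_g)          # 97/2 and 2745/4, i.e. 48.5 and 686.25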

Covariance
In the section on probability distributions, we saw that at times we might have to deal with more than one random variable at a time, hence the need to study Joint
Probability Distributions.

Just as we can find the expected value of a joint pair of random variables X and Y, we can also measure how the two variables vary together, and this is what we refer to as the Covariance.

The Covariance of a joint pair of random variables X and Y is denoted by:

Cov(X,Y).
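As a preview of how this is computed, the sketch below (for illustration, using the standard shortcut Cov(X,Y) = E[XY] - E[X]E[Y]) finds the covariance for the joint table of Example 5; it comes out as zero because that particular pair of variables happens to be independent:

# Joint pmf from Example 5
f = {(1, 2): 0.10, (2, 2): 0.20, (3, 2): 0.10,
     (1, 4): 0.15, (2, 4): 0.30, (3, 4): 0.15}

e_xy = sum(x * y * p for (x, y), p in f.items())   # E[XY]
e_x = sum(x * p for (x, y), p in f.items())        # E[X], from the joint pmf
e_y = sum(y * p for (x, y), p in f.items())        # E[Y]

cov = e_xy - e_x * e_y
print(e_x, e_y, e_xy, cov)      # 2.0, 3.2, 6.4 and a covariance of (approximately) 0.0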
