Department of Public Health and Community Medicine

Faculty of Medicine, Assiut University


The first part of PhD Statistics (all specialties)
i. Choose the correct answer(s) in the following questions:

Question (1)
Select all of the following variables that are examples of categorical variables:
1. Number of episodes of disease in a patient over a year
2. Serum bilirubin level
3. Gender
4. Severity of haemophilia (mild/moderate/severe)
5. Reduction in blood pressure following antihypertensive treatment
The right answers are number 3, 4
Question (2)
Select all of the following variables that are examples of numerical variables:
1. Number of episodes of disease in a patient over a year
2. Serum bilirubin level
3. Gender
4. Severity of haemophilia (mild/moderate/severe)
5. Reduction in blood pressure following antihypertensive treatment
The right answers are number 1, 2, 5
Question (3)
Select all of the following statements which you believe to be true. An ordinal
variable is one for which:
1. The data are discrete and can take one of many values.
2. The data are continuous and follow an ordered sequence.
3. The data are categorical.
4. The categories of response are ordered.
5. There can only be two categories of response.
The right answers are number 3, 4

Question (4)
Select all of the following variables which are measured on a nominal scale:
1. Height in cm
2. Ethnic group.
3. Social class (I/II/III-N/III-M/IV/V).
4. Age categorised as young, middle-aged or old.
5. Age in years
The right answer is number 2
Question (5)
Select all of the following statements which you believe to be true. If you spot outliers
in your data set:
1. You should leave them out of your analysis.
2. You must transform your data before you analyse them.
3. You can use a non-parametric test to analyse the data.
4. You should repeat the experiment and collect a new data set.
5. You can repeat the analysis both including and excluding the outliers, and
include the outliers if the results are consistent.
The right answers are number 3, 5
Question (6)
Select all of the following statements which you believe to be true. A histogram:
1. Can be used instead of a pie chart to display categorical data.
2. Is similar to a bar chart but there are no gaps between the bars.
3. Contains contiguous bars, with the height of each bar being proportional to the
frequency of the observations in the range specified by the bar.
4. Can be used to display either a frequency or a relative frequency distribution.
5. Is used to show the relationship between two variables.
The right answers are number 2, 4
Question (7)
Select all of the following statements which you believe to be true. A bar chart:
1. Is used to display categorical data.
2. Can also be called a histogram.
3. Should be drawn without gaps between the bars.
4. Can only be used to display data which have a symmetrical distribution.
5. Contains separate bars, with the length of each bar being proportional to the
relevant frequency or relative frequency.
The right answers are number 1, 5
Question (8)
Select all of the following type(s) of figures that would be appropriate for illustrating
the distribution of heights of children in a class.
1. Bar chart
2. Pie chart
3. Stem-and-leaf plot
4. Histogram
5. Box-plot
6. Clustered bar chart
7. Segmented bar chart
8. Scatter plot
The right answers are number 3, 4, 5
Question (9)

Select all of the following type(s) of figures that would be appropriate for illustrating
the relationship between height and weight among individuals in a study.
1. Bar chart
2. Pie chart
3. Stem-and-leaf plot
4. Histogram
5. Box-plot
6. Clustered bar chart
7. Segmented bar chart
8. Scatter plot
The right answer is number 8
Question (10)
Select all of the following type(s) of figures that would be appropriate for illustrating
the distribution of blood groups in a sample of adults.
1. Bar chart
2. Pie chart
3. Stem-and-leaf plot
4. Histogram
5. Box-plot
6. Clustered bar chart
7. Segmented bar chart
8. Scatter plot
The right answers are number 1, 2
Question (11)
Select all of the following type(s) of figures that would be appropriate for illustrating
the relationship between gender and blood group in a sample of adults.
1. Bar chart
2. Pie chart
3. Stem-and-leaf plot
4. Histogram
5. Box-plot
6. Clustered bar chart
7. Segmented bar chart
8. Scatter plot
The right answers are number 6, 7
Question (12)
State whether the data reflecting the heights of individuals in the general population
are likely to be skewed to the right, skewed to the left or symmetrical
1. Skewed to the right
2. Skewed to the left
3. Symmetrical
The right answer is number 3
Question (13)
Select all of the following statements which you believe to be true. The arithmetic
mean of a set of values:
1. Is a particular type of average.
2. Is a useful summary measure of location if the data are skewed to the right.
3. Coincides with the median if the distribution of the data is symmetrical.
4. Is always greater than the median.
5. Cannot be calculated if the data set contains both positive and negative values.
The right answers are number 1, 3
Question (14)
Select all of the following statements which you believe to be true. The median:
1. Is a measure of the spread of the data.
2. Is a useful summary measure when the data are skewed to the right.
3. Is greater than the arithmetic mean when the data are skewed to the right.
4. Is wasteful of information.
5. Can be distorted by outliers.
The right answers are number 2, 4
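As a worked illustration of Questions (13) and (14), the following minimal sketch uses made-up, right-skewed data (the variable names and values are assumptions, not taken from the exam). It shows the mean being pulled above the median by one large value, while the median hardly moves when that value is replaced.

```python
from statistics import mean, median

# Hypothetical right-skewed data: most values are small, one is very large.
lengths_of_stay = [2, 3, 3, 4, 4, 5, 6, 30]

m = mean(lengths_of_stay)      # pulled upwards by the value 30
med = median(lengths_of_stay)  # middle of the ordered values

print(f"with outlier:    mean = {m}, median = {med}")

# Replace the extreme value with an unremarkable one and compare again.
without_outlier = lengths_of_stay[:-1] + [7]
print(f"without outlier: mean = {mean(without_outlier)}, "
      f"median = {median(without_outlier)}")
```

In this example the mean exceeds the median, which is the usual pattern for data skewed to the right.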
Question (15)
Select all of the following statements which you believe to be true. The standard
deviation of a set of observations:
1. Is a measure of 'location'.
2. Is the square root of the variance so it can only be determined if all the values
in the data set are positive.
3. Has the same units of measurement as the raw data.
4. Is a measure of spread which is equal to the range.
5. Is unaffected by outliers.
6. Is an inappropriate measure of spread for skewed data.
The right answers are number 3, 6
Question (16)
Select all of the following statements which you believe to be true. The 95%
confidence interval for a proportion:
1. Cannot be calculated if the sample size is small.
2. Is the interval within which 95% of sample proportions would lie if we were to
take repeated samples of a given size from the population.
3. Is the interval within which we expect the population proportion to lie with
95% certainty.
4. Is wider than the 99% confidence interval for the proportion.
5. Is calculated as the sample proportion ± standard error of the proportion.
The right answers are number 2, 3
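To see why statements 4 and 5 of Question (16) are not selected, here is a minimal sketch of the normal-approximation interval for a proportion. The function name `proportion_ci` and the sample of 40 responders out of 100 are assumptions made for illustration only.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(successes, n, level=0.95):
    """Normal-approximation confidence interval for a proportion."""
    p = successes / n
    se = sqrt(p * (1 - p) / n)                 # standard error of the proportion
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%, 2.58 for 99%
    return p - z * se, p + z * se

# Hypothetical sample: 40 of 100 patients respond to treatment.
low95, high95 = proportion_ci(40, 100, 0.95)
low99, high99 = proportion_ci(40, 100, 0.99)

print(f"95% CI: {low95:.3f} to {high95:.3f}")
print(f"99% CI: {low99:.3f} to {high99:.3f}")
```

The standard error is multiplied by about 1.96 rather than used on its own, and the 99% interval comes out wider than the 95% interval.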
Question (17)
Select all of the following statements which you believe to be true. The P-value is:
1. The probability that the null hypothesis is true.
2. The probability that the alternative hypothesis is true.
3. The probability of obtaining the observed or more extreme results if the
alternative hypothesis is true.
4. The probability of obtaining the observed results or results which are more
extreme if the null hypothesis is true.
5. Always less than 0.05.
The right answer is number 4
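The definition in statement 4 of Question (17) can be made concrete with a small exact calculation. The sketch below is an assumption-laden example (a fair-coin null hypothesis and a hypothetical helper `binomial_two_sided_p`); it adds up the probabilities of every outcome at least as extreme as the one observed, assuming the null hypothesis is true.

```python
from math import comb

def binomial_two_sided_p(k, n):
    """Two-sided exact P-value for k successes in n trials under H0: p = 0.5."""
    # Under H0 the distribution is symmetric, so "as or more extreme" means
    # at least as far from n/2 as the observed count, in either direction.
    distance = abs(k - n / 2)
    extreme = [x for x in range(n + 1) if abs(x - n / 2) >= distance]
    return sum(comb(n, x) for x in extreme) / 2 ** n

# Hypothetical experiment: 9 heads in 10 tosses of a coin.
p_value = binomial_two_sided_p(9, 10)
print(f"P = {p_value:.4f}")
```

Here the outcomes 0, 1, 9 and 10 heads count as "observed or more extreme", giving 22/1024.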
Question (18)
Select all of the following statements which you believe to be true. If the 95%
confidence interval for the mean of a variable of interest obtained from a sample of
values contains a hypothesized value, μ1, of the mean:
1. We are 95% certain that the sample mean lies within the interval.
2. We are 95% certain that the true mean in the population equals μ1.
3. We are 95% certain that the sample mean equals the population mean.
4. There is a 5% chance that the population mean lies outside this interval.
5. We can reject the null hypothesis that the true population mean equals μ1 at
the 5% level of significance.
The right answer is number 4
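The 5% figure in Question (18) comes from the repeated-sampling behaviour of the interval. The simulation below is only a sketch: the population mean, standard deviation, sample size and number of repeats are all hypothetical, and the standard deviation is treated as known so that a simple z-interval can be used.

```python
import random
from math import sqrt
from statistics import NormalDist, mean

random.seed(1)  # fixed seed so the run is reproducible

TRUE_MEAN, SIGMA, N, REPEATS = 83.0, 12.0, 25, 2000  # hypothetical population
z = NormalDist().inv_cdf(0.975)                      # about 1.96
half_width = z * SIGMA / sqrt(N)                     # sigma treated as known

covered = 0
for _ in range(REPEATS):
    sample = [random.gauss(TRUE_MEAN, SIGMA) for _ in range(N)]
    centre = mean(sample)
    if centre - half_width <= TRUE_MEAN <= centre + half_width:
        covered += 1

coverage = covered / REPEATS
print(f"{coverage:.3f} of the intervals contain the true mean")
```

Roughly 95% of the simulated intervals should contain the true mean, and roughly 5% should miss it.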
Question (19)
Select all of the following statements which you believe to be true. The paired t-test is
appropriate when:
1. The variable of interest is binary.
2. We want to compare two numerical variables when each is measured on every
individual in the sample.
3. The differences between the pairs of observations are Normally distributed.
4. We wish to test the null hypothesis that the mean of the differences between
the pairs of observations in the sample is equal to zero.
5. We wish to test the null hypothesis that the median of the differences between
the pairs of observations in the population is equal to zero.
The right answer is number 3
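For Question (19), the paired t-test works on the within-pair differences, which is why their Normality is the relevant assumption. The sketch below uses invented before-and-after measurements (not data from any study) to compute the test statistic.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical intraocular pressure (mm Hg) in 10 eyes, before and after a drop.
before = [24, 26, 22, 25, 27, 23, 28, 24, 26, 25]
after = [21, 24, 22, 22, 25, 20, 25, 23, 23, 22]

diffs = [b - a for b, a in zip(before, after)]   # one difference per pair
n = len(diffs)
t_stat = mean(diffs) / (stdev(diffs) / sqrt(n))  # H0: population mean difference = 0
df = n - 1

# Two-sided 5% critical value of the t distribution with 9 degrees of freedom.
T_CRITICAL_DF9 = 2.262
print(f"t = {t_stat:.2f} on {df} df, significant at 5%: {abs(t_stat) > T_CRITICAL_DF9}")
```

The null hypothesis concerns the mean difference in the population, not in the sample, which is why statement 4 is not selected.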

Question (20)
In a study to evaluate the importance of different factors on the development of
cirrhosis among hepatitis C virus positive individuals (Verbaan H et al. J Vir Hep
1998; 5: 43-51), 35/79 of those without cirrhosis reported alcohol abuse compared to
10/20 of the individuals with cirrhosis (P=0.84). Select all of the following statements
which you believe to be true.
1. We could use the unpaired t-test to analyse these data.
2. The Chi-squared test requires the calculation of observed frequencies in each
cell of the contingency table created from the results.
3. The Chi-squared test requires the calculation of expected frequencies in each
cell of the contingency table.
4. An appropriate null hypothesis is that the same number of individuals in the
populations with and without cirrhosis report alcohol abuse.
5. The P-value of 0.84 indicates that there is no difference in the true rates of
alcohol abuse in the two groups.
The right answers are number 3, 5
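The expected frequencies mentioned in statement 3 of Question (20) can be computed directly from the counts given in the question. The sketch below does this and then forms the Chi-squared statistic. The question does not say whether a continuity correction was applied; using Yates' correction is an assumption made here, and it gives a P-value close to the reported 0.84.

```python
from math import erfc, sqrt

# Rows: without cirrhosis, with cirrhosis. Columns: alcohol abuse, no abuse.
observed = [[35, 44], [10, 10]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
total = sum(row_totals)

chi2 = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        expected = row_totals[i] * col_totals[j] / total  # frequency expected under H0
        # Yates' continuity correction for a 2x2 table (an assumption here)
        chi2 += (abs(obs - expected) - 0.5) ** 2 / expected

# For 1 degree of freedom, P(chi-squared > x) = erfc(sqrt(x / 2)).
p_value = erfc(sqrt(chi2 / 2))
print(f"chi-squared = {chi2:.3f}, P = {p_value:.2f}")
```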
Question (21)
Select all of the following statements which you believe to be true. The Pearson
correlation coefficient between two variables, x and y:
1. Is always positive.
2. Is dimensionless.
3. Takes the same value when the variables x and y are interchanged.
4. Takes the value zero when there is no linear association between the two
variables x and y.
5. Takes the value +1 when one variable increases as the other variable
decreases in value, and it is possible to draw a straight line on the scatter
diagram with all the points lying on it.
The right answers are number 2, 3, 4
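The properties tested in Question (21) can be checked numerically. The sketch below computes Pearson's r from its definition; the helper `pearson_r` and the height and weight values are made up for illustration.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient computed from its definition."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical heights (cm) and weights (kg)
height_cm = [150, 160, 165, 170, 180]
weight_kg = [52, 58, 63, 66, 77]

r = pearson_r(height_cm, weight_kg)
r_swapped = pearson_r(weight_kg, height_cm)                    # x and y interchanged
r_metres = pearson_r([h / 100 for h in height_cm], weight_kg)  # change of units

# All points on a straight line with a negative slope
r_falling_line = pearson_r([1, 2, 3, 4], [8, 6, 4, 2])

print(r, r_swapped, r_metres, r_falling_line)
```

Swapping the variables or changing the units leaves r unchanged, and a perfect falling line gives -1 rather than +1, which is why statement 5 is not selected.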
Question (22)
Both the mean and median of the blood pressure are approximately 83 mm Hg and the
standard deviation is 12 mm Hg. These indices enable us to conclude each of the
following statements except:
1. Approximately 95% of the men have diastolic blood pressure between 59 and
107 mm Hg.
2. The distribution is nearly symmetrical.
3. The 95% confidence limits on the mean for all men, aged 30 to 69, in this
population are 59 and 107.
4. The mean is not distorted very much by extremely high blood pressure in this
case.
The right answer is number 3
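The arithmetic behind Question (22) separates a range for individuals from confidence limits for the mean. The sketch below uses the mean and standard deviation given in the question; the sample size `n = 100` is purely hypothetical, since the question does not state it.

```python
from math import sqrt

mean_bp, sd_bp = 83.0, 12.0  # values given in the question (mm Hg)

# Range expected to contain about 95% of individuals: mean +/- 2 SD
individual_range = (mean_bp - 2 * sd_bp, mean_bp + 2 * sd_bp)

# Confidence limits for the mean use the standard error, SD / sqrt(n).
n = 100  # hypothetical sample size; the question does not state it
sem = sd_bp / sqrt(n)
mean_ci = (mean_bp - 1.96 * sem, mean_bp + 1.96 * sem)

print(f"about 95% of individuals: {individual_range[0]:.0f} to {individual_range[1]:.0f} mm Hg")
print(f"95% CI for the mean (n = {n}): {mean_ci[0]:.1f} to {mean_ci[1]:.1f} mm Hg")
```

The limits 59 and 107 describe individuals, while confidence limits for the mean are much narrower for any realistic sample size.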
Question (23)
Select all of the following statements which you believe to be true. We use the
Chi-squared test to compare two proportions in a 2x2 contingency table provided that:
1. The rows (and columns) of the table are mutually exclusive.
2. The data are Normally distributed.
3. The observed frequency in each cell of the table is greater than or equal to 5.
4. The observed and expected frequencies in each cell of the table are equal.
5. There is no association between the factors that define the rows and columns
of the table.

The right answer is number 1


Question (24)
Select all of the following statements which you believe to be true. The slope of the
linear regression line between an explanatory variable, x, and a dependent variable, y,
is:
1. The same as the gradient of the line.
2. The value of Y when x = 0, where Y is the predicted value of y.
3. The average change in Y for a unit increase in x.
4. Always positive.
5. Often called the regression coefficient.

The right answers are number 1, 3, 5
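The meaning of the slope in Question (24) can be verified with a small least-squares fit. The function `least_squares_line` and the age and blood pressure values below are assumptions made for illustration.

```python
def least_squares_line(x, y):
    """Return (intercept, slope) of the least-squares line Y = a + b*x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx            # regression coefficient (gradient)
    intercept = my - slope * mx  # predicted Y when x = 0
    return intercept, slope

# Hypothetical data: age (years) and systolic blood pressure (mm Hg)
age = [40, 45, 50, 55, 60]
sbp = [120, 126, 129, 135, 140]

a, b = least_squares_line(age, sbp)
predicted_at_50 = a + b * 50
predicted_at_51 = a + b * 51

print(f"intercept = {a:.1f}, slope = {b:.2f}")
print(f"change in predicted Y for one extra year: {predicted_at_51 - predicted_at_50:.2f}")
```

The predicted value rises by exactly the slope for each unit increase in x, whereas the value of Y at x = 0 is the intercept, not the slope.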


Question (25)
Select all of the following statements which you believe to be true. We perform a
multiple linear regression analysis when we want:
1. To predict the value of the explanatory variable from two or more dependent
variables.
2. To know whether or not an explanatory variable is linearly related to a
dependent variable, after adjusting for other covariates.
3. To know if many covariates are linearly related to each other.
4. To create a model which describes the linear relationship between a dependent
variable and some covariates.
5. To establish whether the dependent variable is Normally distributed.
The right answers are number 2, 4

Question (26)
The following are true about confidence intervals:
1. the intervals are larger with smaller sample size
2. they indicate the presence or otherwise of a statistical difference between two
groups
3. a 95% confidence interval means that 95% of all observed values fall within
that interval.
4. the intervals give a range of values within which the true value will lie.
The right answers are number 1, 2, 4
Question (27)
True statements about non-parametric tests include:
1. they can be used on small samples
2. they can be used to analyse samples that are normally distributed
3. Student's paired t-test is a non-parametric test
4. they can be applied to ordinal data
5. they can not be used if the nature of the distribution of the data is unknown.
The right answers are number 1, 4
Question (28)
In statistics:
1. null hypothesis describes the probability that a relationship exists between
two samples.
2. analytical statistics are the same as inferential statistics.
3. descriptive statistics produce mean, median and mode from data.
4. the mode is the measurement which lies exactly between each end of a range
of values ranked in order.
The right answers are number 2, 3
Question (29)
Correlation coefficient:
1. is denoted by the symbol "r"
2. is measured on a scale of 0 to 1.
3. a positive value implies that a rise in one variable accompanies a rise in the
other.
4. describes the degree of association between two variables
The right answers are number 1, 3, 4

Question (30)
Non-parametric tests include
1. ANOVA
2. Student's t-test
3. Chi-squared test
4. Wilcoxon signed rank test
5. Mann-Whitney U test.
The right answers are number 3, 4, 5
Question (31)
In Student's t-test:
1. it is a parametric test
2. distribution is normal at infinite degrees of freedom
3. is especially useful for multivariate analysis
4. can be used to study the effect of an eye drop on intraocular pressure

The right answers are number 1, 2, 4


Question (32)
In a normal distribution:
1. the mean is the same as the mode
2. the mean is higher than the median
3. 95% of observations lie within one standard deviation of the mean
4. Mann-Whitney's test is suitable for analysis
5. the coefficient of variation measures the spread of the values

The right answers are number 1, 5


Question (33)
Paired t-test:
1. applies to normal distribution
2. used only on large number of patients
3. is not suitable for samples less than 20
4. used for the analysis of quantitative data
5. used for two independent samples
The right answers are number 1, 4
Question (34)
The standard error of the mean (SEM):
1. is the square root of the variance
2. measures the spread of observations around the mean
3. assesses the reliability of the mean
4. is always smaller than the standard deviation
5. is equal to the standard deviation (SD) divided by the number of samples (n)

The right answers are number 3, 4


Question (35)
Select all of the following statements which you believe to be true. The two-sample
t-test:
1. May also be called the paired t-test.
2. Is an alternative test to the Wilcoxon signed ranks test.
3. Is appropriate when our aim is to compare the medians in two independent
groups of observations.
4. Assumes that the observations in each of the two comparative groups have the
same variance.
The right answer is number 4
Question (36)
The estimated Pearson correlation coefficient between systolic blood pressure (mm
Hg) and age (years) in a sample of 30 middle-aged women from a given community
was r = 0.72 (P < 0.001). Hence r² = 0.52. Select all of the following statements
which you believe to be true.
1. There is substantial evidence that systolic blood pressure and age in these
women are linearly related.
2. 72% of the variability of systolic blood pressure in these women can be
explained by its linear relationship with age.
3. 48% of the variability of systolic blood pressure in these women is
unexplained by its linear relationship with age.
4. We can conclude that increasing age is a cause of rising systolic blood
pressure in these women.
5. The null hypothesis that has been tested is that there is no association between
systolic blood pressure and age in these women.
The right answers are number 1, 3
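The percentages in statements 2 and 3 of Question (36) follow from squaring the correlation coefficient, as this short calculation shows (only the value r = 0.72 comes from the question; the variable names are arbitrary).

```python
r = 0.72                     # correlation reported in the question
r_squared = r ** 2           # proportion of variability explained by the linear relationship
unexplained = 1 - r_squared  # proportion left unexplained

print(f"explained: {r_squared:.0%}, unexplained: {unexplained:.0%}")
```

About 52% of the variability is explained and about 48% is not, so the figure of 72% in statement 2 confuses r with r².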

Question (37)
Select all of the following statements which you believe to be true. The Wilcoxon
signed ranks test is appropriate when:
1. We cannot use the sign test because its underlying assumptions are not
satisfied.
2. The differences between pairs of observations are not Normally distributed.
3. The observations in each of two paired groups are measured on an ordinal
scale.
4. We have two independent groups of observations.
5. We require a more powerful test than the sign test.
The right answers are number 2, 3, 5

ii. Complete the following sentences with the correct answers:

1- Confidence interval = X̄ ± Z × SE, and its upper and lower ends are called
confidence interval boundaries (confidence limits).
2- What is meant by a 95% confidence interval? If we were to repeat the
experiment many times, the interval would contain the true population
mean on 95% of occasions. We usually interpret this confidence interval
as the range of values within which we are 95% confident that the true
population mean lies.
3- What is the P-value? The level of marginal significance within a statistical
hypothesis test: the probability of obtaining the observed or more extreme
results if the null hypothesis is true.
4- The t-test compares two means, while the Chi-squared test compares two
proportions.
5- One-way analysis of variance is used to compare more than two means.
6- The null hypothesis states that there is no effect (e.g. the difference in means
equals zero) in the population, while the alternative hypothesis states that there
is an effect (e.g. the difference in means does not equal zero) in the population.
7- The median equals the 50th percentile.
8- What are the possible results of correlation analysis?
Positive correlation
Negative correlation
No correlation
9- What types of regression analysis do you know?
Linear regression analysis
Logistic regression analysis
Multivariate regression analysis

10- Positive skewness means the 'shape' of the frequency distribution is skewed
to the right: a long tail to the right with one or a few high values.
Negative skewness means the 'shape' of the frequency distribution is
skewed to the left: a long tail to the left with one or a few low values.
11- The dependent variable is the predictor variable while the independent variable
is the outcome variable (false)
12- What is the power of the study?
The probability of rejecting the null hypothesis when it is false
13- The value of the correlation coefficient (r) lies between -1 and +1, and its
categories (by absolute value) are as follows:
Negligible: greater than 0 to less than 0.2
Weak: 0.2 to less than 0.5
Moderate: 0.5 to less than 0.8
Strong: 0.8 to less than 1
Perfect: r = 1 (or -1)
Zero: no correlation
14- In SPSS, where do you find the option for Spearman's Rho?
The right answer: Correlation.
15- What test would you use to assess whether there is a significant difference
between the mean ranks of two conditions?
The right answer: Both the Mann Whitney and the Wilcoxon can be used. Mann
Whitney is suitable for independent samples and Wilcoxon when you have
related samples.
16- In SPSS, where will you find an alternative to the independent samples t-test?
The right answer: Non parametric tests - 2 independent samples.
17- Where in SPSS would you find the option for conducting a Kruskal-Wallis
test?
The right answer: Non-parametric tests- K independent samples.
18- If you achieved a p-value of 0.04 on a two-tailed test, what would the equivalent
one-tailed p-value be?
The right answer: 0.02
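The halving in item 18 can be checked with the standard Normal distribution. This is a sketch that assumes a z test (the item does not name the test) and an effect in the hypothesised direction; under those assumptions the one-tailed P-value is the area in a single tail.

```python
from statistics import NormalDist

std_normal = NormalDist()

# Test statistic whose two-tailed P-value is 0.04 (z test assumed for illustration)
z = std_normal.inv_cdf(1 - 0.04 / 2)

two_tailed = 2 * (1 - std_normal.cdf(z))  # area in both tails
one_tailed = 1 - std_normal.cdf(z)        # area in the upper tail only

print(f"z = {z:.3f}, two-tailed P = {two_tailed:.3f}, one-tailed P = {one_tailed:.3f}")
```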

iii. Match these parametric tests with their non-parametric alternatives:

Option
Pearson's r
Related samples t-test
ANOVA (independent samples)

A. Kruskal-Wallis
B. Wilcoxon signed ranks
C. Spearman's Rho

The right answer

Option                          Non-parametric alternative
Pearson's r                     C. Spearman's Rho
Related samples t-test          B. Wilcoxon signed ranks
ANOVA (independent samples)     A. Kruskal-Wallis
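The pairing of Pearson's r with Spearman's Rho can be illustrated by noting that Spearman's coefficient is Pearson's r calculated on the ranks of the data. The sketch below is a hand-rolled illustration with invented data; the helper names `ranks`, `pearson_r` and `spearman_rho` are assumptions, not SPSS functions.

```python
from math import sqrt

def ranks(values):
    """Rank from 1 upwards, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def pearson_r(x, y):
    """Pearson correlation coefficient computed from its definition."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

def spearman_rho(x, y):
    """Spearman's rho: Pearson's r applied to the ranks."""
    return pearson_r(ranks(x), ranks(y))

# Hypothetical relationship that always increases but is far from a straight line
dose = [1, 2, 3, 4, 5]
response = [1, 4, 9, 16, 100]

rho = spearman_rho(dose, response)
r = pearson_r(dose, response)
print(f"Spearman's rho = {rho:.2f}, Pearson's r = {r:.2f}")
```

Because only the ordering matters, rho reaches 1 for any consistently increasing relationship, while Pearson's r stays below 1 unless the points lie on a straight line.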
