Anu Varghese

23 April 2014
STA 2023-022
Professor Taysseer Sharaf

Data Analysis Project
(EXTRA CREDIT)

Part A: Descriptive Statistics

1.) Find the Five number summary plus Mean, Variance, and standard deviation for Age, Tumor Size in
mm, and Survival time in months. What can you say about the shape of the distribution of the three
variables?

Descriptive Statistics: Age
Variable Age
Mean 57.036
Standard Deviation 17.438
Variance 304.075
Minimum 3.000
Q1 45.250
Median 58.000
Q3 70.000
Maximum 100.000
The distribution of the age variable is approximately normal. The mean (57.036) and the median (58.000) are
nearly equal, and the quartiles sit at roughly equal distances from the median (12.75 below and 12.00 above),
which indicates a roughly symmetric shape.

Descriptive Statistics: Tumor Size
Variable Tumor Size (mm)
Mean 0.6337 mm
Standard Deviation 1.8877 mm
Variance 3.5634 mm²
Minimum 0.0100 mm
Q1 0.0800 mm
Median 0.1500 mm
Q3 0.3300 mm
Maximum 9.9500 mm
The distribution of the tumor size variable is not normal: most of the values lie to the left of the mean. The
mean (0.6337 mm) is far larger than the median (0.1500 mm) and even exceeds Q3 (0.3300 mm), so the
distribution is strongly skewed to the right.

Descriptive Statistics: Survival Time
Variable Survival Time (months)
Mean 34.352 months
Standard Deviation 23.846 months
Variance 568.643 months²
Minimum 1.000 months
Q1 14.000 months
Median 31.000 months
Q3 53.000 months
Maximum 83.000 months

The survival time variable is widely spread over its range: the standard deviation (23.846 months) is large
relative to the mean (34.352 months). The mean is slightly larger than the median (31.000 months), which
suggests a mild skew to the right.
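
As a rough numerical check on the three shape descriptions, the quartiles reported above can be plugged into Bowley's quartile skewness coefficient, (Q3 + Q1 - 2 * Median) / (Q3 - Q1). This measure is not part of the original Minitab output; it is only a sketch that uses the summary values already listed, where a value near 0 suggests symmetry and a positive value suggests a right skew.

```python
def quartile_skewness(q1, median, q3):
    """Bowley's quartile skewness: 0 is symmetric, positive means right-skewed."""
    return (q3 + q1 - 2 * median) / (q3 - q1)

# (Q1, Median, Q3) copied from the descriptive statistics tables above.
summaries = {
    "Age": (45.25, 58.0, 70.0),
    "Tumor Size (mm)": (0.08, 0.15, 0.33),
    "Survival Time (months)": (14.0, 31.0, 53.0),
}

skewness = {name: quartile_skewness(*q) for name, q in summaries.items()}
for name, value in skewness.items():
    print(f"{name}: {value:+.3f}")
```

Age comes out close to zero, tumor size is clearly positive, and survival time is mildly positive, which is consistent with the comparison of means and medians.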

2.) Draw a histogram for Age of Patient at Diagnosis and their tumor sizes. What can you say about the
distribution of Age and tumor size? Do you agree with the answer in question 1?


According to the histogram of age, the distribution of this variable closely resembles a normal distribution, as
predicted in question 1.


According to the histogram of tumor size, the distribution of this variable is skewed to the right, which also
agrees with the answer to question 1.

3. From the given sample of Melanoma patients, which year has the highest number of diagnosed
patients? Hint: Use a bar chart to show the frequency.


According to the Chart of Year Diagnosed, the year with the highest number of diagnosed patients was 2009.

4. What can you say about the distribution of Survival time?


According to the Histogram of Survival Time, the number of patients decreases as survival time increases, so
the distribution is skewed to the right. This pattern can be expected for people who suffer from melanoma, since
it is one of the more life-threatening forms of cancer.

5. Is there a difference between the survival time of Male patients and female patients? Hint: Use Box
plot to compare between the two variables.


According to the box plots, there is essentially no difference between the survival times of male and female
patients: the medians of the two box plots are nearly the same. The box plots do suggest, however, that a
somewhat higher percentage of women than men survive longer.

Part B: Inferential Statistics

6. Based on this sample, compute 90% Confidence interval for the tumor size.
Write your conclusion.

One-Sample T: Tumor Size
Variable     N      Mean     SD       SE Mean   90% CI
Tumor Size   1380   0.6337   1.8877   0.0508    (0.5500, 0.7173)
According to the values in the above table, we can be 90% confident that the true mean tumor size lies between
0.5500 mm and 0.7173 mm. In other words, if random samples of this size were drawn repeatedly, about 90% of
the intervals built this way would contain the true mean.

7. Find 95% Confidence interval for the age of patients at diagnosis. Write your conclusion.

One-Sample T: Age
Variable   N      Mean     SD       SE Mean   95% CI
Age        1380   57.036   17.438   0.469     (56.115, 57.956)
According to the values in the above table, we can be 95% confident that the true mean age of patients at
diagnosis lies between 56.115 and 57.956.

8. It is known that the survival time of melanoma patients, if diagnosed early, is longer nowadays
than before. Do these data support the claim that the mean survival time is more than 90 months?
Use a 5% level of significance and interpret your results.

One-Sample T: Survival Time
Variable        N      Mean     SD       SE Mean   95% Lower Bound   T        P
Survival Time   1380   34.352   23.846   0.642     33.296            -86.69   1.000
Test of mu = 90 vs > 90
According to the values in the above table, the data do not support the claim that the mean survival time is
more than 90 months. The test statistic, -86.69, is far below the critical value of 1.645, and the P-value of 1.000
is greater than 0.05, so the null hypothesis is not rejected. The normal critical value applies because the sample
size (1380) is well above 30.
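
The T and P values for this test can be checked from the summary statistics. This is a minimal sketch that assumes the normal approximation for the P-value (the sample is large, so it is very close to the t distribution); the function name is illustrative only and is not Minitab's.

```python
from math import sqrt
from statistics import NormalDist

def upper_tail_mean_test(mean, sd, n, mu0):
    """Test H0: mu = mu0 against Ha: mu > mu0 from summary statistics."""
    se = sd / sqrt(n)                      # standard error of the mean
    stat = (mean - mu0) / se               # test statistic
    p_value = 1 - NormalDist().cdf(stat)   # upper-tail area (normal approximation)
    return stat, p_value

# Summary values copied from the One-Sample T table for Survival Time.
stat, p_value = upper_tail_mean_test(34.352, 23.846, 1380, 90)
print(f"T = {stat:.2f}, P = {p_value:.3f}")
```

Because the sample mean is far below 90, the statistic is strongly negative and the upper-tail P-value is essentially 1.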



9. From the given sample, it looks like more people are diagnosed in the year 2010 by a proportion of
17%. Test the hypothesis that the true proportion of patients diagnosed in year 2010 is less than 17%.
Use 5% level of significance and interpret your result.

Sample   X     N      Sample p   95% Upper Bound   Z-value   P-value
1        222   1380   0.160870   0.177138          -0.90     0.183
Test of p = 0.17 vs p < 0.17
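According to the values in the above table, the test statistic is -0.90 and the critical value is -1.645. Since
-0.90 is not less than -1.645 (and the P-value of 0.183 is greater than 0.05), the null hypothesis is not rejected:
the data do not provide sufficient evidence that the true proportion of patients diagnosed in 2010 is less than 17%.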
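
The Z-value and P-value in the table can be reproduced from X = 222 and N = 1380 with a one-proportion z-test. The sketch below uses the standard error under the null hypothesis (p = 0.17); the function name is illustrative only.

```python
from math import sqrt
from statistics import NormalDist

def lower_tail_proportion_test(successes, n, p0):
    """Test H0: p = p0 against Ha: p < p0 using the normal approximation."""
    p_hat = successes / n
    se = sqrt(p0 * (1 - p0) / n)       # standard error under H0
    z = (p_hat - p0) / se
    p_value = NormalDist().cdf(z)      # lower-tail area
    return z, p_value

# Counts copied from the table above.
z, p_value = lower_tail_proportion_test(222, 1380, 0.17)
print(f"Z = {z:.2f}, P = {p_value:.3f}")
```

A Z-value above the critical value of -1.645, or equivalently a P-value above 0.05, means the null hypothesis is not rejected.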

10. If a patient is diagnosed with melanoma with a tumor thickness of more than 2 mm, his/her diagnosis
is considered a late diagnosis. Do the given data indicate that, on average, patients are
diagnosed late? Use a 5% level of significance and interpret your result.

Variable     N      Mean     SD       SE Mean   95% Lower Bound   T        P
Tumor Size   1380   0.6337   1.8877   0.0508    0.5500            -26.89   1.000

The data above do not indicate that, on average, patients are diagnosed with a tumor thickness of more than
2 mm. The test of a mean of 2 mm against the alternative of more than 2 mm gives a test statistic of -26.89 and
a P-value of 1.000, which is greater than the 5% level of significance, so the null hypothesis is not rejected. The
sample mean of 0.6337 mm is well below 2 mm, so on average the patients were not diagnosed late.