Bikeshare Data Analysis

P1
January 16, 2017
1 P1: Test a Perceptual Phenomenon - Background Information

In a Stroop task, participants are presented with a list of words, with each word displayed in a
color of ink. The participant's task is to say out loud the color of the ink in which the word is
printed. The task has two conditions: a congruent words condition, and an incongruent words
condition. In the congruent words condition, the words being displayed are color words whose
names match the colors in which they are printed: for example RED, BLUE. In the incongruent
words condition, the words displayed are color words whose names do not match the colors in
which they are printed: for example PURPLE, ORANGE. In each case, we measure the time it
takes to name the ink colors in equally-sized lists. Each participant will go through and record a
time from each condition.
2 Questions For Investigation
2.1 What is our independent variable? What is our dependent variable?

Independent variable: the word condition, Congruent or Incongruent, determined by whether
the name of each displayed word matches the color of the ink in which it is printed.
Dependent variable: the time taken to name the ink colors of the words in the congruent or
incongruent list.
2.2 What is an appropriate set of hypotheses for this task? What kind of statistical
test do you expect to perform? Justify your choices.
Null hypothesis: the population means of the Congruent ($\mu_C$) and Incongruent ($\mu_I$) cases are
equal.
$H_0: \mu_C - \mu_I = 0$
Alternate hypothesis: the population means of the Congruent ($\mu_C$) and Incongruent ($\mu_I$) cases
are different.
$H_A: \mu_C - \mu_I \neq 0$
Since the sample size is n < 30, a two-tailed t-test for paired samples (equivalently, a
one-sample t-test on the per-participant differences) with $\alpha = 0.05$ is proposed. This will
determine whether there is a significant difference between the two samples, namely the
Congruent and Incongruent cases. We don't know the population standard deviation, hence
the Bessel-corrected standard deviation of the sample should be used.
Assumptions made:
We assume the distributions of the dependent samples and of their differences are normally
distributed (Gaussian).
We assume the samples are randomly selected.
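For reference, the proposed paired test can be written as a one-sample t-test on the per-participant differences. The following is a sketch of the statistic under the assumptions listed above; the symbols $\bar{x}_D$, $\sigma_D$ and $n$ are notation introduced here for illustration.

```latex
x_{D_i} = x_{C_i} - x_{I_i}, \qquad
\bar{x}_D = \frac{1}{n}\sum_{i=1}^{n} x_{D_i}, \qquad
\sigma_D = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_{D_i} - \bar{x}_D\right)^2}, \qquad
t = \frac{\bar{x}_D}{\sigma_D / \sqrt{n}}, \qquad df = n - 1
```

The $n - 1$ in the denominator of $\sigma_D$ is the Bessel correction mentioned above, and the null hypothesis is rejected when $|t|$ exceeds the two-tailed critical value for the chosen $\alpha$.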
In [3]: import csv

import numpy as np
import pandas as pd
from IPython.display import display
import matplotlib.pyplot as plt
%matplotlib inline
data = pd.read_csv('dataset.csv') # read the data
display(data)
Congruent Incongruent Diff_CminusI

0 12.079 19.278 -7.199
1 16.791 18.741 -1.950
2 9.564 21.214 -11.650
3 8.630 15.687 -7.057
4 14.669 22.803 -8.134
5 12.238 20.878 -8.640
6 14.692 24.572 -9.880
7 8.987 17.394 -8.407
8 9.401 20.762 -11.361
9 14.480 26.282 -11.802
10 22.328 24.524 -2.196
11 15.298 18.644 -3.346
12 15.073 17.510 -2.437
13 16.929 20.330 -3.401
14 18.200 35.255 -17.055
15 12.130 22.158 -10.028
16 18.495 25.139 -6.644
17 10.639 20.429 -9.790
18 11.344 17.425 -6.081
19 12.369 34.288 -21.919
20 12.944 23.894 -10.950
21 14.233 17.960 -3.727
22 19.710 22.058 -2.348
23 16.004 21.157 -5.153
2.2.1 Report some descriptive statistics regarding this dataset. Include at least one measure of
central tendency and at least one measure of variability.
The mean and the Bessel-corrected standard deviation are given for both cases.
For the congruent case (n = 24):
$\bar{x}_C = 14.051$, $\sigma_C = 3.559$
For the incongruent case (n = 24):
$\bar{x}_I = 22.016$, $\sigma_I = 4.797$
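As a cross-check, these figures can be reproduced with the standard library alone. This is a minimal sketch that hard-codes the values from the table above instead of reading `dataset.csv`; the helper name `describe` is hypothetical.

```python
from statistics import mean, stdev

# Times copied from the Congruent and Incongruent columns of the table above.
congruent = [12.079, 16.791, 9.564, 8.630, 14.669, 12.238, 14.692, 8.987,
             9.401, 14.480, 22.328, 15.298, 15.073, 16.929, 18.200, 12.130,
             18.495, 10.639, 11.344, 12.369, 12.944, 14.233, 19.710, 16.004]
incongruent = [19.278, 18.741, 21.214, 15.687, 22.803, 20.878, 24.572, 17.394,
               20.762, 26.282, 24.524, 18.644, 17.510, 20.330, 35.255, 22.158,
               25.139, 20.429, 17.425, 34.288, 23.894, 17.960, 22.058, 21.157]


def describe(times):
    """Return (n, sample mean, Bessel-corrected standard deviation)."""
    # statistics.stdev divides by n - 1, i.e. it applies the Bessel correction.
    return len(times), mean(times), stdev(times)


for name, times in (("Congruent", congruent), ("Incongruent", incongruent)):
    n, avg, sd = describe(times)
    print(f"{name}: n = {n}, mean = {avg:.3f}, sd = {sd:.3f}")
```

`statistics.pstdev` would give the uncorrected (population) value instead, which is slightly smaller for a sample of this size.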
In [4]: fig = plt.figure(figsize=(7, 5.5))
# Top left: histogram of the congruent times
plt.subplot(221)
plt.hist(data["Congruent"], color="#D86E3F")
plt.xlabel('Time Scores for Congruent', fontsize=10)
plt.ylabel('Frequency', fontsize=10)
# Top right: histogram of the incongruent times
plt.subplot(222)
plt.hist(data["Incongruent"], color="#2088B2")
plt.xlabel('Time Scores for Incongruent', fontsize=10)
# Bottom left: both histograms overlaid to show the overlap
plt.subplot(223)
plt.hist(data["Congruent"], color="#D86E3F", alpha=0.75,
         label="Congruent")
plt.hist(data["Incongruent"], color="#2088B2", alpha=0.75,
         label="Incongruent")
plt.xlabel('Time Scores', fontsize=10)
plt.legend(loc=1, prop={'size': 9})
# Bottom right: boxplots of both conditions side by side
plt.subplot(224)
data[["Congruent", "Incongruent"]].boxplot(return_type='dict', grid=False)
plt.ylabel('Time Scores', fontsize=10)
plt.xlabel('Type', fontsize=10)
# Adjust spacing once all four panels have been drawn
fig.tight_layout()
plt.show()
2.3 Provide one or two visualizations that show the distribution of the sample data.
Write one or two sentences noting what you observe about the plot or plots.
The distributions of the Congruent and Incongruent time scores are shown above.
Observations from the frequency distributions:
Most of the time scores for the Congruent case are lower than those for the Incongruent case,
with some overlap.
Both histograms peak at a frequency of 6 near the middle of each distribution, and the peak of
Congruent lies at lower times, i.e. Mode of Congruent < Mode of Incongruent.
The boxplot shows that the median of the congruent case is lower than that of the incongruent
case, with two outliers (34.288 and 35.255) in the incongruent case, i.e. Median of Congruent <
Median of Incongruent.
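The boxplot reading can be checked against the numbers. Below is a minimal sketch that flags outliers with the conventional 1.5 × IQR fences; the helper name `iqr_outliers` is hypothetical, and the "inclusive" quartile method is an assumption chosen to mirror linear interpolation between data points.

```python
from statistics import quantiles

# Times copied from the Congruent and Incongruent columns of the table above.
congruent = [12.079, 16.791, 9.564, 8.630, 14.669, 12.238, 14.692, 8.987,
             9.401, 14.480, 22.328, 15.298, 15.073, 16.929, 18.200, 12.130,
             18.495, 10.639, 11.344, 12.369, 12.944, 14.233, 19.710, 16.004]
incongruent = [19.278, 18.741, 21.214, 15.687, 22.803, 20.878, 24.572, 17.394,
               20.762, 26.282, 24.524, 18.644, 17.510, 20.330, 35.255, 22.158,
               25.139, 20.429, 17.425, 34.288, 23.894, 17.960, 22.058, 21.157]


def iqr_outliers(times, k=1.5):
    """Return the values lying more than k * IQR beyond the quartiles."""
    q1, _median, q3 = quantiles(times, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [t for t in times if t < low or t > high]


print("Congruent outliers:  ", iqr_outliers(congruent))
print("Incongruent outliers:", iqr_outliers(incongruent))
```

A different quartile convention can shift the fences slightly, so borderline points may be classified differently by other tools.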
2.4 Now, perform the statistical test and report your results. What is your confidence
level and your critical statistic value? Do you reject the null hypothesis or fail to
reject it? Come to a conclusion in terms of the experiment task? Did the results
match up with your expectations?
Measuring the sample differences as $x_{D_i} = x_{C_i} - x_{I_i}$, we can report:
mean $\bar{x}_D = -7.965$
standard deviation $\sigma_D = 4.865$
degrees of freedom $df = 23$
standard error of the mean $SEM = 0.993$
$t_{statistic} = -8.021$
For a two-tailed test at $\alpha = 0.05$, the critical t-value is $t_{critical} = \pm 2.0687$
Coefficient of determination $r^2 = 0.737$
p-value < 0.0001
Confidence interval $CI = (-10.019, -5.910)$
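The reported quantities can be recomputed step by step. This is a minimal sketch using only the standard library: the critical value 2.0687 is taken from the text and not computed, and the effect-size formula $r^2 = t^2 / (t^2 + df)$ is an assumption about how the reported figure was obtained.

```python
from math import sqrt
from statistics import mean, stdev

# Times copied from the Congruent and Incongruent columns of the table above.
congruent = [12.079, 16.791, 9.564, 8.630, 14.669, 12.238, 14.692, 8.987,
             9.401, 14.480, 22.328, 15.298, 15.073, 16.929, 18.200, 12.130,
             18.495, 10.639, 11.344, 12.369, 12.944, 14.233, 19.710, 16.004]
incongruent = [19.278, 18.741, 21.214, 15.687, 22.803, 20.878, 24.572, 17.394,
               20.762, 26.282, 24.524, 18.644, 17.510, 20.330, 35.255, 22.158,
               25.139, 20.429, 17.425, 34.288, 23.894, 17.960, 22.058, 21.157]

T_CRITICAL = 2.0687  # two-tailed, df = 23, alpha = 0.05 (value quoted in the text)

# Per-participant differences, Congruent minus Incongruent.
diffs = [c - i for c, i in zip(congruent, incongruent)]
n = len(diffs)
df = n - 1

mean_d = mean(diffs)
sd_d = stdev(diffs)                 # Bessel-corrected standard deviation
sem = sd_d / sqrt(n)                # standard error of the mean difference
t_stat = mean_d / sem               # paired t-statistic
ci = (mean_d - T_CRITICAL * sem, mean_d + T_CRITICAL * sem)
r_squared = t_stat ** 2 / (t_stat ** 2 + df)   # assumed effect-size formula

print(f"mean = {mean_d:.3f}, sd = {sd_d:.3f}, SEM = {sem:.3f}, df = {df}")
print(f"t = {t_stat:.3f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f}), r^2 = {r_squared:.3f}")
print("Reject H0" if abs(t_stat) > T_CRITICAL else "Fail to reject H0")
```

The decision rule compares the absolute value of the t-statistic with the critical value, so the sign of the difference only affects the direction of the effect, not the outcome of the two-tailed test.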
Since the t-statistic falls outside the critical values $\pm t_{critical}$ for $\alpha = 0.05$, the
difference between the two samples (congruent and incongruent) is significant, i.e. not likely
due to random chance. Equivalently, if the null hypothesis were true, the probability of
observing a difference at least this large would be less than 0.01%. Hence the null hypothesis
is rejected.
With 95% confidence, we can say that subjects require around 6 to 10 time units less to name
the ink colors of congruent words than of incongruent words.
Around 73.7% of the variability in the time differences is explained by the condition
(congruent versus incongruent).
Since these are experimental data, we can conclude that the time taken by subjects to identify
the ink color of a word was significantly influenced by whether the word's name matched or
mismatched the ink color.
Yes, the results match expectations.
2.5 Optional: What do you think is responsible for the effects observed? Can you
think of an alternative or similar task that would result in a similar effect? Some
research about the problem will be helpful for thinking about these two questions!
The verbal and visual centers of cognition in the brain seem to be linked. When there is a
contradiction between them, the brain seems to take longer to process information. It would be
interesting to see if there is a difference in the time taken to identify words with swapped letters.