
CHAPTER 1: INTRODUCTION

Statistics
o a complete system of knowledge and processes
o deals with the:
   - collection of data
   - presentation of data (organization)
   - analysis of data (comparing; computation of statistical measures; formulas)
   - interpretation of data

ESSENTIAL PROCESSES IN STATISTICS
1. Collection of Data
   o gathering of related info
   o involves what is useful & needed, where to get info, and how to get info
   o sample size
   o methods of collecting data (quantity):
      a. direct method (interview)
      b. indirect method (questionnaire)
      c. document / registration method
      d. observation method
      e. experimental method
   o sampling techniques (quality)
2. Presentation of Data
   o systematic way of organizing data
   o involves collecting, classifying, and arraying data in preparation for its analysis
   o textual form, graphical form, tabular form
3. Analysis of Data
   o extracting relevant info from the data
   o involves comparison, description, and statistical measurements
4. Interpretation of Data
   o drawing of logical statements from analyzed info
   o involves generalizing, forecasting, and recommending solutions

MAJOR FIELDS IN STATISTICS
1. Descriptive Statistics
   o functions and aims to describe the data
   o exposes basic summaries / characteristics of the data (examples)
2. Inferential Statistics
   o functions and aims to infer or to make interpretations
   o to know something about the population, based on the sample (examples)

Data
o a body of info or observations being considered by the researcher
Information
o the processed data; basis for decision making
Variable
o a categorical description that defines the set of certain observable values / characteristics

CLASSIFICATIONS OF DATA
A. According to Nature
   1. Quantitative Data: in the form of quantities
   2. Qualitative Data: in the form of categories, characteristics, names, or labels
B. According to Source
   1. Primary Data: first-hand info; from the original source
   2. Secondary Data: second-hand info
C. According to Measurement
   1. Discrete Data: countable; assume whole number form; obtained through counting
   2. Continuous Data: measurable; assume decimals or fractions; obtained through measuring; use of measuring tools
D. According to Arrangement
   1. Ungrouped Data: raw data w/o specific order or arrangement
   2. Grouped Data: organized set of data; arranged and tabulated

Population
o complete set of all possible numerical and qualitative responses / observations
Sample
o set of responses / observations taken from the population
o aims to estimate or identify the characteristics of the population
o serves as basis for decision making and for drawing conclusions about the population
Sample Size
o number of elements in the sample set
o considers precision in using the sample instead of the population

SCALES / LEVELS OF DATA MEASUREMENT
1. Nominal Scale: lowest level; used for identification purposes; no quantitative value; no application of mathematical operations
2. Ordinal Scale (rank): gives order / rank / arrangement; quantitative differences can't be determined
3. Interval Scale: quantitative differences can be determined; no true zero value (zero is a point on the scale but does not mean "nothing"); supports addition and subtraction
4. Ratio Scale: has an absolute (true) zero; supports multiplication and division

SAMPLING TECHNIQUES
o process of selecting sample elements
o working with a sample is less time-consuming and more economical than working with the whole population
A. Probability Sampling (non-biased; fair)
   1. Random Probability Sampling (requires no particular order)
      a. Lottery Method
      b. Using the Table of Random Numbers
         o Direct Selection Method
         o Remainder Method
   2. Non-Random Probability Sampling (samples are chosen w/o regard to their probability of occurrence)
      a. Systematic Sampling Technique (using the kth system)
      b. Stratified Sampling Technique
      c. Cluster Sampling (Area Sampling Technique)
      d. Multi-stage Sampling Technique
B. Non-Probability Sampling (biased; unfair; Judgment Sampling Technique)
   1. Purposive Sampling
   2. Quota Sampling
   3. Convenience Sampling

TYPES OF SURVEY ERRORS
1. Coverage Error: inadequate population; a group is unintentionally excluded (bias)
2. Non-response Error: a group of subjects failed to respond to the survey
3. Sampling Error: difference in info between the population and the sample when samples are not properly selected
4. Measurement Error: when the instrument is not properly worded; use of ambiguous or leading questions

METHODS OF GATHERING DATA
1. Interview Method (direct)
2. Questionnaire Method (indirect)
3. Document Method (registration)
4. Observation Method
5. Experiment Method
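As an added illustration (not from the original notes), a minimal Python sketch of two of the techniques above; the sampling frame of 100 element IDs and the sample size of 10 are assumptions.

```python
import random

population = list(range(1, 101))   # hypothetical sampling frame of 100 element IDs
n = 10                             # desired sample size

# Lottery method (simple random sampling): every element has an equal
# chance of selection, drawn without replacement.
random_sample = random.sample(population, n)

# Systematic sampling using the kth system: take every kth element
# after a random start, where k = N / n.
k = len(population) // n
start = random.randrange(k)
systematic_sample = population[start::k]

print(random_sample)
print(systematic_sample)
```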

PRESENTATION OF DATA
1. Textual Form: through printed or spoken words
2. Graphical Form: easily facilitates analyzing, summarizing, and understanding a volume of info
   o line graph, bar graph, pie graph, pictograph, cartograph, dot graph
3. Tabular Form: data is arrayed in rows and columns
   a. General Reference Table: commonly used; variables are arranged in columns or rows together with their corresponding summarized relative info
   b. Frequency Distribution Table: summary table where data are arranged into groupings

Frequency Polygon: a line graph presenting the frequency distribution of each class interval (represented by its class mark)
Histogram: rectangular bars presenting the frequency distribution of each class
Ogive: a line graph presenting the cumulative frequency distribution
   a) Less Than Ogive
   b) Greater Than Ogive

Terms Used in Relation to a Frequency Distribution:
1. class size (i): width of each class interval (number of values it spans)
2. class interval (CI): defined by the lower & upper limit relative to the class size
3. frequency (f): number of values contained w/in the class interval
4. class mark (x): middle value of the class interval
5. class boundary (CB): boundary of each class interval; the number situated halfway between the upper limit of one class interval and the lower limit of the next class interval
6. less than cumulative frequency (<cf): accumulated sum of the frequencies of each CI less than the corresponding upper class boundary (UCB)
7. greater than cumulative frequency (>cf): accumulated sum of the frequencies of each CI greater than the corresponding lower class boundary (LCB)
8. relative frequency (rf): percentage share of each f relative to the total f of the data
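A minimal Python sketch (added for illustration) that builds a frequency distribution table from a small, made-up data set with an assumed class size of i = 5; it computes the frequency, class mark, class boundaries, <cf, and relative frequency defined above.

```python
# Hypothetical raw (ungrouped) data and an assumed class size.
data = [12, 15, 17, 18, 21, 22, 22, 25, 27, 28, 31, 33, 34, 38, 41]
i = 5                               # class size (width of each class interval)
low = min(data) - min(data) % i     # lower limit of the first class interval

total = len(data)
less_cf = 0
rows = []
while low <= max(data):
    upper = low + i - 1
    f = sum(low <= x <= upper for x in data)   # frequency
    mark = (low + upper) / 2                   # class mark
    lcb, ucb = low - 0.5, upper + 0.5          # class boundaries
    less_cf += f                               # less than cumulative frequency
    rf = 100 * f / total                       # relative frequency (%)
    rows.append((low, upper, f, mark, lcb, ucb, less_cf, rf))
    low += i

for low, upper, f, mark, lcb, ucb, cf, rf in rows:
    print(f"{low}-{upper}  f={f}  x={mark}  CB={lcb}-{ucb}  <cf={cf}  rf={rf:.1f}%")
```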

CHAPTER 2: MEASURES OF CENTRAL TENDENCY AND QUANTILES

Measure of Central Tendency
o a single value that summarizes a set of data
o a value where the set of data tends to center
o average; common characteristics; locating what is common

MEASURES OF CENTRAL TENDENCY:
1. Mean
   o arithmetic average of the values; summation of all values divided by the number of values
   o always unique; always exists
   o all values are included in the computation, so it is affected by a change in any score and by extreme values
   o useful for comparing populations
   o should not be used for nominal data; best for interval and ratio data
   o weighted mean: obtained by multiplying each value by an appropriate weight, adding these products, and then dividing the result by the sum of the weights

2. Median
   o middlemost value; divides the data into 2 equal parts when the data is arranged in order
   o dependent on the position of the data; a position measure
   o always exists; a typical measure
   o unique; there is only one median for each set of data
   o not affected by extreme values; not affected by every value
   o for ordinal, interval, and ratio level data
   o median class: the class interval containing the (N/2)th item in the less than cumulative frequency (<cf)
3. Mode
   o the value that occurs most frequently
   o can be seen by inspection (for ungrouped data); requires no calculation for ungrouped data
   o a poor measure of location; may not exist; may not be unique
   o provides no info about the distribution, except that the other scores occurred less frequently than the mode
   o modal class: the class interval containing the highest frequency
   o no mode: all values occur with equal frequency
   o unimodal: 1 modal value; bimodal: 2 modal values; trimodal (multimodal): 3 or more modal values

Quantile
o a general descriptive measurement used to separate quantitative data into distinct groups
o quantiles are calculated similarly to the median; the data should be arranged in increasing order

MEASURES OF POSITION (QUANTILES)
1. Quartiles: divide the values into 4 parts of equal size, each comprising 25% of the observations; median = 2nd quartile
2. Deciles: divide the values into 10 parts of equal size, each comprising 10% of the observations; median = 5th decile
3. Percentiles: divide the values into 100 parts of equal size, each comprising 1% of the observations; median = 50th percentile
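A minimal Python sketch (added for illustration) of the measures of central tendency and the quantiles above; the data set, grades, and weights are made-up values.

```python
from statistics import fmean, median, multimode, quantiles

data = [7, 9, 9, 10, 12, 13, 13, 13, 15, 18]   # hypothetical ungrouped data

mean = fmean(data)                  # sum of all values / number of values
mid = median(data)                  # middlemost value of the ordered data
modes = multimode(data)             # most frequent value(s); may not be unique

# Weighted mean: multiply each value by its weight, add the products,
# then divide by the sum of the weights (e.g., grades weighted by units).
grades, units = [1.75, 2.00, 1.50], [3, 5, 2]
weighted_mean = sum(g * u for g, u in zip(grades, units)) / sum(units)

# Quantiles: quartiles give 3 cut points (Q2 = median),
# deciles give 9, percentiles give 99.
q1, q2, q3 = quantiles(data, n=4)
deciles = quantiles(data, n=10)

print(mean, mid, modes, weighted_mean, q1, q2, q3)
```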

CHAPTER 3: MEASURES OF DISPERSION (VARIABILITY)

Measure of Dispersion / Variability
o measures the extent to which data are dispersed or spread out
o measures the distance of the values from each other (how far? how close?)
o importance: it adds meaning to the mean (a value close to the mean is consistent with the average; a value far from the mean is not)

MEASURES OF DISPERSION:
1. Range
   o the difference between the highest and the lowest values
2. Quartile Deviation (Semi-Interquartile Range)
   o amount of spread within the middle half of the items arranged in an array
   o an improvement over the range; it eliminates the effect of the two extreme values
   o used for ordinal data
3. Mean Absolute Deviation (MAD)
   o the average of the absolute values of the differences between the individual values and the mean
4. Variance
5. Standard Deviation
   o variance and SD are special forms of average deviation from the mean
   o affected by all individual values of the items in any given distribution
   o measure the average scatter around the mean
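A short Python sketch (illustrative, with made-up values) computing the measures of dispersion listed above: range, quartile deviation, MAD, variance, and standard deviation.

```python
from statistics import fmean, pstdev, pvariance, quantiles

data = [4, 8, 6, 5, 3, 7, 8, 9]         # hypothetical values

value_range = max(data) - min(data)      # range = highest - lowest

q1, _, q3 = quantiles(data, n=4)         # quartile deviation = (Q3 - Q1) / 2
quartile_deviation = (q3 - q1) / 2

mean = fmean(data)
mad = fmean([abs(x - mean) for x in data])   # mean absolute deviation

variance = pvariance(data)               # population variance
sd = pstdev(data)                        # population standard deviation

print(value_range, quartile_deviation, mad, variance, sd)
```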

CHAPTER 4: PROBABILITY AND NORMAL DISTRIBUTION

Probability
o measure of the chance that an event will happen
o when the probability is 0, the event is impossible; when it is 1, the event is absolutely or certainly going to happen
o can be expressed as a fraction, decimal, or percent; ranges from 0 to 1 or from 0% to 100%
o a ratio between a part and the whole

Probability Distribution
o a function that assigns a measure of chance to events or propositions
o a set of interval values that measure the possibility that a random variable will take a value in the interval

Significance of the Probabilities Provided in the Distribution
o the probabilities are measurements of areas under the curve that graphically represents certain distributions
o measures the possibility of committing a certain type of error
o measures the magnitude of the rejection and non-rejection regions; determines the location of the value of a certain test statistic
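A tiny Python illustration (added, not from the notes) of probability as a ratio between a part and the whole, expressed as a fraction, a decimal, and a percent; the card-deck setting is an assumed example.

```python
from fractions import Fraction

# Probability of drawing one of the 13 hearts from a standard 52-card deck:
# the ratio between the part (favorable outcomes) and the whole (total outcomes).
favorable = 13
total = 52

p = Fraction(favorable, total)            # fraction form (reduces to 1/4)
print(p, float(p), f"{float(p):.0%}")     # 1/4  0.25  25%
```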


The graph can be described in two types:
1. Normal Distribution
   o a continuous, symmetric, bell-shaped distribution of a variable
   o the measures of central tendency are identical
   o the area within one SD below and above the mean is approx. 68.3%
   o the area within two SD below and above the mean is approx. 95.4%
   o the area within three SD below and above the mean is approx. 99.7%
   o the shape and position depend on two parameters (the mean and the SD)
   o 3 possible cases: same means, diff. SDs; diff. means, same SD; diff. means, diff. SDs
   o properties:
      - total area is equal to 1
      - symmetrical about the mean
      - measures of central tendency are equal and are located at the axis of symmetry
      - tails are asymptotic relative to the horizontal line
      - may be divided into at least 3 standard scores each to the left & right of the mean before the tails appear to touch the horizontal line
      - the distance from one integral standard score to the next is measured by the SD
2. Skewed Distribution
   o the mean is pulled in the direction of the extreme scores
   o Positively Skewed Distribution: mean > median > mode; the curve is pulled to the positive or right side of the distribution
   o Negatively Skewed Distribution: mean < median < mode; the curve is pulled to the negative or left side of the distribution

Skewness
o a measure of the shape of the distribution; reflects the lopsidedness of the data distribution
o expressed by the coefficient of skewness
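A short Python sketch (illustrative; the mean of 100 and SD of 15 are assumed values) checking the areas within 1, 2, and 3 standard deviations of the mean and converting a raw score to a standard (z) score.

```python
from statistics import NormalDist

# Hypothetical normal population: mean = 100, SD = 15.
nd = NormalDist(mu=100, sigma=15)

# Area within 1, 2, and 3 SDs of the mean (approx. 68.3%, 95.4%, 99.7%).
for k in (1, 2, 3):
    area = nd.cdf(100 + k * 15) - nd.cdf(100 - k * 15)
    print(k, round(area, 3))

# Standard score (z) of an individual value, and the area below it.
x = 130
z = (x - 100) / 15
print(z, nd.cdf(x))
```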

CHAPTER 5: HYPOTHESIS TESTING

Hypothesis
o a general statement regarding certain descriptions or conditions about the subject under consideration
o an assumption; a conjecture or inferential statement concerning a quantitative characteristic of the population involved
o a statement of claim or assertion about a population parameter or causal relationship among a group of subjects, based on the merits of sample information

Hypothesis Testing
o the process of finding enough evidence to conclude whether the rejection or non-rejection of a hypothesis is reasonable

Ten Steps:
1. Null Hypothesis (Ho) formulation
2. Alternative Hypothesis (Ha) formulation
3. Type of Test (1-tailed / 2-tailed)
4. Level of Significance; Degrees of Freedom
5. Test Statistic (z / t / F / χ² test)
6. Appropriate Formula
7. Test Statistic Value (Computed Value)
8. Tabular / Critical Value
9. Decision
10. Conclusion

STEPS 1 & 2: Formulation of Hypotheses
1. Null Hypothesis (Ho): the prevailing or old assumption; a statement of equality; no significant difference
2. Alternative Hypothesis (Ha): the new assumption; a statement of inequality; there is a significant difference

Forms of Establishing Hypotheses
1. Statement Form: literal or textual method of formulating the hypotheses
2. Quantitative Form: numerical method of expressing the mathematical relationship of the hypotheses, using equality and directional inequality

STEP 3: Type of Test
1. One-tailed Test: if Ha is directional (< or >)
2. Two-tailed Test: if Ha is non-directional

STEP 4: Level of Significance & Degrees of Freedom
Level of Significance
o probability of committing a Type I error (rejecting Ho when it is in fact true)
o measures the risk in decision-making using the hypothesis testing methodology
o denoted by α; determines the size of the rejection region (which consists of values of the test statistic that are unlikely to occur if the null hypothesis were true)
o directly under the control of the individual performing the test; from 0.01 (precision level of 99%) to 0.10 (precision level of 90%)
Degrees of Freedom
o z-test: N/A
o t-test: n - 1 (single sample); n1 + n2 - 2 (two samples)
o F-test: J - 1 (dfn); N - J (dfd)
o χ² test: r - 1 (when r > 1 and c = 1); c - 1 (when c > 1 and r = 1); (r - 1)(c - 1) (when r > 1 and c > 1)

STEP 5: Test Statistic
Test Statistic
o statistical tools / measurements
o basis for deciding whether to reject / accept the hypothesis
o based on the estimator of the parameter
o corresponds to the specific distribution to be used, as defined by its assumptions
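A minimal Python sketch (added for illustration) of how a chosen level of significance translates into the critical z values that bound the rejection region for one- and two-tailed tests; α = 0.05 is an assumed choice.

```python
from statistics import NormalDist

alpha = 0.05                 # chosen level of significance
std_normal = NormalDist()    # standard normal distribution (for the z-test)

# Critical z values: the boundary between the rejection and non-rejection regions.
z_two_tailed = std_normal.inv_cdf(1 - alpha / 2)   # about 1.96
z_one_tailed = std_normal.inv_cdf(1 - alpha)       # about 1.645

print(round(z_two_tailed, 3), round(z_one_tailed, 3))
```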

STEP 6: Appropriate Formula
1. z-test
   o parametric test; single or two sample means
   o population standard deviation (σ) and variance are known
   o sample size: n ≥ 30
   o normal distribution of the population
   o samples are randomly selected; independent samples
2. t-test
   o parametric test; single or two sample means
   o population standard deviation (σ) and variance are unknown; the sample variance is known
   o sample size: n < 30
   o samples are from a normally distributed population
   o the paired t-test is used to compare samples resulting from before-and-after experimentation
3. F-ratio test
   o parametric test for testing the homogeneity of a set of means
   o to test the equality of two or more means; to make inferences about whether multiple samples come from populations having equal means
   o describes the relationship between the dependent variable, the treatments / groups, and the random error
   o assumptions: random and independent samples; normal populations; equal population standard deviations
   o Mean Square Column (MSC) / Mean Square Between: measures the amount of variability between the columns, or the explained variability
   o Sum of Squares Column (SSC): yields the sum of squares between treatments
   o Mean Square Error (MSE) / Mean Square Within: measures the amount of variability within the columns, or the unexplained variability
   o Sum of Squares Error (SSE): yields the variation within columns
4. Chi-square (χ²) test
   o non-parametric test; based on fewer assumptions about the population than parametric tests
   o to test differences of proportions with one degree of freedom (2-by-2 table), or to test the normality of the distribution of data
   o assumptions: samples are randomly selected; data are classified into categories; nominal variables; independent cell entries of frequency

STEP 9: Decision Making
o the hypothesis testing methodology depends on the tabular (critical) value and the computed value
o Critical Value: divides the rejection and the non-rejection regions of the distribution and serves as the boundary between them
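A minimal one-sample z-test sketch in Python following the steps above; the hypothesized mean, known σ, sample size, and sample mean are all made-up figures for illustration.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical figures: Ho claims the population mean is 100, the population
# SD is known (sigma = 15), and a sample of n = 36 gave a mean of 104.5.
mu0, sigma = 100, 15
n, sample_mean = 36, 104.5
alpha = 0.05

# Computed value of the test statistic: z = (sample mean - mu0) / (sigma / sqrt(n))
z = (sample_mean - mu0) / (sigma / sqrt(n))

# Tabular / critical value for a two-tailed test at the chosen alpha.
critical = NormalDist().inv_cdf(1 - alpha / 2)

decision = "reject Ho" if abs(z) > critical else "do not reject Ho"
print(round(z, 2), round(critical, 2), decision)
```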


CHAPTER 6: SIMPLE CORRELATION AND REGRESSION ANALYSIS

Correlation Analysis
o used to measure the strength of the association between numerical variables

Coefficient of Correlation
o indicates the strength of the linear relationship between the two variables (x and y), independent of their respective scales of measurement

Coefficient of Determination
o determines how well the least squares regression line fits the sample data
o useful in assessing how much the errors of prediction of y can be reduced by using the info provided by x
o used to determine the goodness of fit between the regression line and the data

Spearman's Rank Correlation Coefficient (Rank Order Correlation Coefficient)
o the resulting coefficient when two variables are each ranked in ascending order and the correlation of these ranks is calculated

Regression Analysis
o used for prediction; to determine the relationship that may exist between variables
o the goal is to develop a statistical model that can be used to predict the values of a dependent variable based on the values of at least one independent variable
o the dependent variable is the value being predicted; the independent variable is the one used to predict or explain the dependent variable

Linear Regression
o method used to determine the relationship between 2 variables through a linear equation called the least squares regression equation (LSRE)

Scatter Diagram
o a graph in which each observation is represented by a dot

Trend Line
o represents the series of points plotted, in which the sum of the vertical distances of the points above the line is approximately equal to the sum of the vertical distances of the points below the line

Least Squares Regression Method
o finds the line that best fits the sample data by minimizing the sum of the squares of the vertical deviations of the data points from the estimating line

Regression Line
o the line that best fits the scatter diagram; best describes the pattern of the points; expressed using the LSRE

Linear Regression Analysis
o used to find out how well the corresponding regression line fits the data
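A short Python sketch (illustrative; the paired x and y values are made up) computing the coefficient of correlation, the coefficient of determination, and the least squares regression equation y = a + bx from the usual sums of squares.

```python
from math import sqrt

# Hypothetical paired observations (x = independent, y = dependent).
x = [1, 2, 3, 4, 5, 6]
y = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2]

n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n

sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
sxx = sum((xi - mean_x) ** 2 for xi in x)
syy = sum((yi - mean_y) ** 2 for yi in y)

r = sxy / sqrt(sxx * syy)        # coefficient of correlation (Pearson r)
r_squared = r ** 2               # coefficient of determination

# Least squares regression equation (LSRE): y = a + b*x
b = sxy / sxx                    # slope
a = mean_y - b * mean_x          # intercept

print(round(r, 3), round(r_squared, 3), round(a, 2), round(b, 2))
```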
