CONTENTS
THE VERY BASIC BASICS
1. ENTERING AND MODIFYING INFORMATION
   a) Formulae in Excel
   b) Writing Formulae
2. USING DATA ANALYSIS TOOLPAK
   2.1 Loading Data Analysis in MS Excel 2007
   2.2 Features in Data Analysis
   2.3 To Obtain Descriptive Statistics
   2.4 To Obtain a Histogram
   2.5 Generating Random Numbers
   2.6 Scatterplot and line of best fit (Regression)
3. EXCEL FUNCTIONS FOR DISTRIBUTIONS
4. NAMING CELLS and RANGES
5. FURTHER RESOURCES
Prepared by Dr Dianne Atkinson 2010. School of Mathematical Sciences, Monash University, Clayton, Vic. 3800
Introduction to Excel
The goal of this brief manual is to help you get started with Microsoft Excel 2007 or above for basic computations, data summary and data analysis. Excel will be a powerful tool for you when preparing reports in all science disciplines. Note that this manual is not a comprehensive guide to Excel. It is primarily intended to give you background in the Excel features used in statistics, with which you have probably had little or no experience. You will need to refer to it throughout this course.
Formulae are mathematical expressions that use the values or formulae in other cells to create new values or formulae. Examples: =B3, =A7+1, =B7*(1+B4), =SUM(A7:A12)
a) Formulae in Excel
All formulae must begin with an equals sign (=) and are entered directly into the cell. A formula can be written by hand, using the operators described below, or by using Excel's built-in functions. A formula may contain constants or references to cells containing values (see the examples above) to calculate the final result.

Operators
Arithmetic operator   Description      Example   Result
+ (plus sign)         Addition         =3+3      6
- (minus sign)        Subtraction      =3-1      2
* (asterisk)          Multiplication   =3*3      9
/ (forward slash)     Division         =3/3      1
^ (caret)             Exponentiation   =2^3      8
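For readers who also program, the operator table can be checked in Python. This is only an illustrative sketch, not part of Excel. One trap is worth knowing: Excel's ^ (exponentiation) is written ** in Python, where ^ means bitwise XOR, so 2 ^ 3 evaluates to 1 rather than 8.

```python
# Python equivalents of the Excel operator examples in the table.
# Excel uses ^ for exponentiation; in Python ^ is bitwise XOR, so ** is used instead.
examples = {
    "=3+3": 3 + 3,   # addition
    "=3-1": 3 - 1,   # subtraction
    "=3*3": 3 * 3,   # multiplication
    "=3/3": 3 / 3,   # division (always gives a float in Python 3)
    "=2^3": 2 ** 3,  # exponentiation
}

for formula, result in examples.items():
    print(f"{formula:>6}  ->  {result}")
```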
The Paste Function Wizard on the toolbar is a library of built-in functions, so you do not have to know all of Excel's shorthand. Investigate its menus, particularly the Statistical one. Some common functions built into Excel are:

Built-in function   Description                        Example
SQRT                Square root                        =SQRT(A6)
LOG                 Logarithm, base 10                 =LOG(B1)
LN                  Natural logarithm                  =LN(A1)
EXP                 Exponential                        =EXP(A1)
SUM                 Sum                                =SUM(D1:D23)
AVERAGE             Average (mean)                     =AVERAGE(D1:D23)
STDEV               Standard deviation                 =STDEV(D1:D23)
QUARTILE            Quartile, e.g. first quartile Q1   =QUARTILE(D1:D23,1)
SUMSQ               Sum of squares                     =SUMSQ(D1:D23)

Each function is described only briefly here; more detail is available via Help (?), which you can also use to learn about specific functions.
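As a cross-check, roughly equivalent calculations can be sketched with Python's standard math and statistics modules (Python 3.8 or later is needed for statistics.quantiles). The single-cell value and the eight-value range below are hypothetical, and quartile is an illustrative helper, not an Excel or Python built-in. It assumes that Excel's QUARTILE interpolates like the "inclusive" quantile method, where the minimum is the 0th percentile and the maximum the 100th.

```python
import math
import statistics


def quartile(values, k):
    """Sketch of Excel's QUARTILE(range, k) for k = 0, 1, 2, 3, 4.

    method="inclusive" interpolates between data points with the minimum
    as the 0th percentile and the maximum as the 100th percentile.
    """
    if k == 0:
        return min(values)
    if k == 4:
        return max(values)
    return statistics.quantiles(values, n=4, method="inclusive")[k - 1]


# Hypothetical values: x stands in for a single cell such as A1,
# data stands in for a range such as D1:D8.
x = 2.0
data = [1, 2, 3, 4, 5, 6, 7, 8]

print(math.sqrt(x))               # =SQRT(A1)
print(math.log10(x))              # =LOG(A1), base 10
print(math.log(x))                # =LN(A1), natural log
print(math.exp(x))                # =EXP(A1)
print(sum(data))                  # =SUM(D1:D8)
print(statistics.mean(data))      # =AVERAGE(D1:D8)
print(statistics.stdev(data))     # =STDEV(D1:D8), sample SD with n - 1 divisor
print(quartile(data, 1))          # =QUARTILE(D1:D8,1), first quartile Q1
print(sum(v * v for v in data))   # =SUMSQ(D1:D8)
```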
b) Writing Formulae
Exercise 1: Value of an Investment
Open a new spreadsheet and enter the information in the diagram for an investment earning annually compounded interest:
(Don't worry if the columns do not appear wide enough. You can easily change the column width by placing the cursor on the line between the column labels A and B at the top of the sheet and dragging it to the desired width.)
The formula =B3 will show the value 1000 when you press Enter.
To make values appear as currency or percentages, format the number using the Number section on the toolbar. Select the cell of interest and press the $ or % button as required:
The 0 and 1 in cells A7 and A8 of the Year column set up the pattern of counting in steps of 1. Highlight these two cells (A7 and A8) and use the Fill Handle (see NOTE 1 below) to copy this pattern down the column to A57 (50 years).
The problem here is that cell B5 no longer contains the interest rate. We need to make part of this reference absolute by placing a dollar sign ($) in front of the part of the cell designation that would otherwise change; when copying down a column, this is the row number. You can then copy the formula without changing the reference to the constant value. Select B8, click into the formula bar at the top and edit the formula to =B7*(1+B$4). The value 1050 will remain in cell B8 ($B$4 would also work). Select cell B8 and copy it down to cell B57 using the fill handle (+) as before.
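The logic of this exercise can be sketched in Python as a minimal check. The principal of 1000 comes from B3; the 5% rate is an assumption, inferred from B8 showing 1050. Each new value is the previous one multiplied by (1 + rate), just as =B7*(1+B$4) does when copied down, and the column should agree with the closed form 1000*(1.05)^n.

```python
# Inputs mirroring the spreadsheet: principal in B3, annual rate in B4.
# The 5% rate is an assumption consistent with B8 showing 1050.
principal = 1000.0
rate = 0.05
years = 50

# Column B: B7 holds =B3, then each row is =previous*(1+B$4).
# The rate stays fixed for every row, like the absolute row reference B$4.
values = [principal]
for year in range(1, years + 1):
    values.append(values[-1] * (1 + rate))

# The row-by-row column should agree with the closed form P*(1+r)^n.
for year in (0, 1, 10, 50):
    closed_form = principal * (1 + rate) ** year
    print(year, round(values[year], 2), round(closed_form, 2))
```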
2. USING DATA ANALYSIS TOOLPAK
2.1 Loading Data Analysis in MS Excel 2007
Click the Office button and select Excel Options at the bottom of the drop-down page. On the new screen, select Add-Ins from the list on the left.
At the bottom of this screen you see Manage Excel Add-ins next to Go. Click Go.
An Add-Ins pop-up screen appears. Tick the box next to Analysis ToolPak and click OK. A pop-up screen may then appear saying "Feature not currently installed. Would you like to install now?" Click YES and wait for the configuration process to finish. Be patient, as this may take a few minutes. The Data Analysis option should then appear at the right-hand side of the Data tab, in a new section called Analysis. This procedure only needs to be done once, as Data Analysis remains available after you log out.

Loading Data Analysis in Excel 2003:
In Excel 2003 the basic statistical operations are found under Tools as Data Analysis. First ensure that the Analysis ToolPak is enabled so that Data Analysis shows under Tools:
Select Tools > Add-Ins.
Tick the box for Analysis ToolPak.
Click OK.
Data Analysis should now show as an option under the Tools menu.
Data Analysis and Mac: Excel 2003 for Mac does have the Data Analysis tool, BUT Excel 2007 and above for Mac DO NOT include this feature, because Microsoft decided not to support VBA (Visual Basic for Applications), the language in which the Data Analysis macros are written.
Enter these values in an Excel column, e.g. cells A2 to A30, with a heading in A1.
Select Data Analysis > Descriptive Statistics.
Input Range: A1:A30, entered by highlighting the data (I have included the heading row!).
Tick Labels in First Row IF you included the heading.
Select Grouped By: Columns, as your data is down a column.
Select Output Range and enter C1 in the space. (Cell C1 will be the upper left-hand cell of the generated output.)
Tick Summary statistics so that the mean, median, standard deviation and other measures are reported.
You will notice that a large amount of information is calculated, much of which (e.g. kurtosis, skewness) will not concern this unit. Note the measures of the centre of the data: mean (5.45) and median (5.46). Note the measures of spread of the data: standard deviation (0.22) and range (= max - min = 5.85 - 4.88 = 0.97). The other important measure of spread, the interquartile range (IQR = Q3 - Q1), is not given here. You can find the quartiles using the built-in function =QUARTILE(cell range, x) (see the table of built-in functions in section 1a).
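The summary measures that matter for this unit, plus the IQR that the Toolpak output leaves out, can be sketched in Python. The sample values are hypothetical, because the exercise data are not reproduced here, and describe is an illustrative helper name. The quartiles assume the same inclusive interpolation as =QUARTILE, and Standard Error is computed as the standard deviation divided by the square root of the count, as in the Toolpak output.

```python
import math
import statistics


def describe(values):
    """Summary measures similar to the Toolpak's Descriptive Statistics,
    plus the interquartile range, which that output omits."""
    n = len(values)
    sd = statistics.stdev(values)  # sample standard deviation, like STDEV
    # Quartiles with inclusive interpolation, assumed to match =QUARTILE.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "Mean": statistics.mean(values),
        "Standard Error": sd / math.sqrt(n),
        "Median": statistics.median(values),
        "Standard Deviation": sd,
        "Sample Variance": statistics.variance(values),
        "Range": max(values) - min(values),
        "Minimum": min(values),
        "Maximum": max(values),
        "Sum": sum(values),
        "Count": n,
        "IQR (Q3 - Q1)": q3 - q1,
    }


# Hypothetical data; replace with the values entered in A2:A30.
sample = [4.9, 5.1, 5.3, 5.4, 5.5, 5.6, 5.8]
for name, value in describe(sample).items():
    print(f"{name:<20} {value:.4f}")
```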
NOTE that the output is a table of frequencies and a chart. The chart from Excel does need editing to have an acceptable appearance:
This edit was achieved as follows:
Select the Bin label, type the new wording and press Enter.
Select the heading Histogram and press Delete. A caption below a figure is a better explanation of the figure in a report.
Remove the legend Frequency by highlighting it and pressing Delete.
Click on one of the blue bars and right-click for the menu. Select Format Data Series, then:
in Series Options, slide the Gap Width to No Gap;
in Border Color, select Solid Line and set Color to Black to outline the bars.
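The frequency table behind the histogram can be sketched in Python. This assumes the Histogram tool's counting rule: a value is counted in the first bin whose upper limit is greater than or equal to it, and values above the last bin go into a final More row. The function name, data and bin limits are hypothetical.

```python
from bisect import bisect_left


def histogram_counts(values, bins):
    """Count values per bin: each value goes in the first bin whose upper
    limit is >= the value; values above the last limit go into 'More'."""
    bins = sorted(bins)
    counts = [0] * (len(bins) + 1)  # the final slot is the 'More' row
    for v in values:
        # bisect_left finds the first limit >= v, so a value equal to a
        # bin limit is counted in that bin.
        counts[bisect_left(bins, v)] += 1
    labels = [str(b) for b in bins] + ["More"]
    return list(zip(labels, counts))


# Hypothetical data and bin upper limits.
data = [4.9, 5.0, 5.2, 5.5, 5.5, 5.8, 6.1]
bins = [5.0, 5.5, 6.0]
for label, count in histogram_counts(data, bins):
    print(label, count)
```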
This is now an ORDERED LIST with a corresponding RANDOM NUMBER. If the two columns are linked and the RANDOM column is sorted into order, the LIST column will correspondingly be RANDOMISED:
Highlight the entire 26x2 array. You can include the headings to make it easier.
Open Data > Sort.
Tick My data has headers if column headings were included.
In the Sort by drop-down menu, select the column Random Number.
The LIST will now be randomised, and the first 5 letters will be my randomly chosen letters: D, R, P, N, and E in this example.
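The random-ordering trick (attach RAND() to each item, then sort on that column) can be sketched in Python. random_pick is a hypothetical helper; in practice Python's random.sample performs the same kind of selection directly. The seed is fixed only so that a run can be repeated and has no connection to the example letters above.

```python
import random
import string


def random_pick(items, k, rng=random):
    """Mimic the RAND()-and-sort method: pair each item with a random
    number, sort the pairs by that number, and keep the first k items."""
    paired = [(rng.random(), item) for item in items]  # like filling a RAND() column
    paired.sort(key=lambda pair: pair[0])             # like Data > Sort by Random Number
    return [item for _, item in paired[:k]]


letters = list(string.ascii_uppercase)  # the ordered list A..Z (26 rows)
rng = random.Random(2010)               # seeded only so the run is repeatable
print(random_pick(letters, 5, rng))
```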
Like all Excel charts, this plot requires editing (axis labels, line of best fit and its equation, correlation) to be presentable:
These edits were achieved as follows: with the plot selected by left-clicking in the plot area, go to the Chart Tools tab at the top centre. Select Chart Layouts. From the drop-down menu choose the layout that gives you axis labels and, if wanted, the added trendline and its equation.
Select the heading and press Delete. A caption below a figure is a better explanation of the figure in a report.
Select the legend and press Delete. Only one set of (x, y) pairs (called a series) is plotted, so there is no need for a legend.
Select the minor horizontal gridlines and press Delete.
Change the x-axis and y-axis labels by selecting each and typing an appropriate label with units. The text first appears only in the formula bar at the top; after typing, press Enter to place the words on the selected axis.
Change each axis scale so that the data fills the whole plot area:
   Place the cursor near the x-axis or one of its values and left-click to select the axis.
   Right-click for the menu and select Format Axis.
   Change Minimum to Fixed and type 75.
Repeat the 3 steps above for the y-axis and enter 100 as the minimum.
Move the equation box to a clearer position.
(ii) Regression Analysis including Residual Plot
Regression analysis involves placing this line of best fit AND assessing the goodness of fit of the added trendline through a residual plot. The full inferential statistical analysis of the slope is also given (STA1010 only). The residual is the difference between the data value and the line's value at each point. The residuals will be randomly scattered along the line IF the line is an appropriate representation of the data. If the linear relationship of the line of best fit is NOT appropriate, a pattern (such as a curve or U shape) will be seen in the residuals as you progress along the line.
To determine the regression line, residuals and residual plot:
Open Data Analysis > Regression. Note that it asks for the Y range first!
Input Y Range: B3:B13
Input X Range: A3:A13
Tick the Labels box if headings are included.
Output Range: A15
Tick Residuals and Residual Plots. (Do not ask for the Line Fit Plots here, as they are not good plots.)
Click OK.
The SUMMARY OUTPUT is a large table that contains full regression information.
In SCI1020 the important information here is:
Correlation (Multiple R) and correlation squared (R Square).
The line of best fit equation, given by y = mx + c, where c = Coefficients for Intercept and m = Coefficients for the slope: Weight in this example (sometimes titled X Variable).
The equation in this example is BP = 0.7506 Weight + 72.489 (as seen before on the scatterplot).
The residual plot.
In STA1010 the information needed is as for SCI1020, plus the inference on the slope, which uses the X-variable row's P-value, Lower 95% and Upper 95% values.
The residual plot is randomly scattered so the linear relationship (described by this line of best fit) is appropriate to the data in this range.
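The quantities read from the SUMMARY OUTPUT can be sketched with ordinary least squares in Python. The weight and blood-pressure values below are hypothetical, not the manual's data, and least_squares is an illustrative helper. Residuals are taken as observed minus fitted, the convention used in Excel's residual output, and for a single x variable Multiple R equals the absolute value of the correlation r.

```python
import math


def least_squares(xs, ys):
    """Fit y = m*x + c by ordinary least squares.

    Returns the slope m, intercept c, correlation r, R^2 and the residuals
    (observed minus fitted).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    m = sxy / sxx
    c = mean_y - m * mean_x
    r = sxy / math.sqrt(sxx * syy)
    residuals = [y - (m * x + c) for x, y in zip(xs, ys)]
    return m, c, r, r * r, residuals


# Hypothetical (weight, blood pressure) pairs; not the data used in this manual.
weights = [60, 65, 70, 75, 80, 85]
bp = [118, 121, 125, 128, 133, 135]
m, c, r, r_squared, residuals = least_squares(weights, bp)
print(f"y = {m:.4f}x + {c:.3f}")
print(f"Multiple R = {abs(r):.4f}, R Square = {r_squared:.4f}")
# Residuals with no pattern (no curve or U shape) along x support a straight-line fit.
for x, e in zip(weights, residuals):
    print(x, round(e, 3))
```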
TINV
Returns the t-value of Student's t-distribution as a function of the probability and the degrees of freedom. This is TDIST in reverse: given the probability, what is the t-value?
Syntax in Excel is =TINV(probability,degrees_freedom)
probability is the probability associated with the two-tailed Student's t-distribution.
degrees_freedom is the number of degrees of freedom with which to characterize the distribution; for a single sample this is the sample size minus 1 (n - 1).
Remarks
A one-tailed t-value can be returned by replacing probability with 2*probability. For a probability of 0.05 and 10 degrees of freedom, the two-tailed value is calculated with TINV(0.05,10), which returns 2.228. The one-tailed value for the same probability and degrees of freedom can be calculated with TINV(2*0.05,10), which returns 1.812.
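To see what TINV computes, here is a standard-library Python sketch that inverts the two-tailed t tail area by bisection. The helper names (tdist_two_tailed, tinv) are hypothetical. The tail area uses the identity P(|T| > t) = I_x(df/2, 1/2) with x = df/(df + t^2), where I is the regularized incomplete beta function, evaluated with a standard continued fraction (modified Lentz method). With 10 degrees of freedom it gives about 2.228 for probability 0.05 and about 1.812 for probability 0.10.

```python
import math


def _betacf(a, b, x):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, 300):
        m2 = 2 * m
        # Even step of the continued fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the continued fraction.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-14:
            break
    return h


def _betai(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log(1.0 - x))
    front = math.exp(log_front)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def tdist_two_tailed(t, df):
    """Two-tailed P(|T| > t) for Student's t with df degrees of freedom."""
    return _betai(df / 2.0, 0.5, df / (df + t * t))


def tinv(probability, df):
    """Sketch of TINV: the t > 0 whose two-tailed probability equals
    `probability`, found by bisection (the tail area falls as t grows)."""
    if not 0.0 < probability <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    lo, hi = 0.0, 1.0
    while tdist_two_tailed(hi, df) > probability:
        hi *= 2.0  # widen the bracket until it contains the answer
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if tdist_two_tailed(mid, df) > probability:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


print(round(tinv(0.05, 10), 3))      # two-tailed value
print(round(tinv(2 * 0.05, 10), 3))  # one-tailed value, as in the Remarks
```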
CHIDIST
Returns the one-tailed probability of the chi-squared distribution. The χ² distribution is associated with a χ² test. Use the χ² test to compare observed and expected values: by comparing the observed results with the expected ones, you can decide whether your original hypothesis is valid. CHIDIST gives the probability, from the sample's chi-squared value, of obtaining that value or one MORE EXTREME (the tail area).
Syntax in Excel is =CHIDIST(x,degrees_freedom)
x is the value at which you want to evaluate the distribution.
degrees_freedom is the number of degrees of freedom; for a two-way table this is (#rows - 1) x (#columns - 1).
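The quantity CHIDIST returns, the right-tail area of the chi-squared distribution, can be sketched in Python using the regularized lower incomplete gamma function P, since P(X >= x) = 1 - P(df/2, x/2). The helper names are hypothetical, and the simple power series used here is intended for moderate values of x rather than as a robust library routine.

```python
import math


def lower_gamma_regularized(a, x):
    """Regularized lower incomplete gamma P(a, x) via its power series.
    Adequate for moderate x; a sketch, not a robust library routine."""
    if x <= 0:
        return 0.0
    term = 1.0 / a
    total = term
    ap = a
    for _ in range(10_000):
        ap += 1
        term *= x / ap  # next term of the series sum x^n / (a(a+1)...(a+n))
        total += term
        if term < total * 1e-15:
            break
    return total * math.exp(-x + a * math.log(x) - math.lgamma(a))


def chidist(x, df):
    """Right-tail probability P(X >= x) for chi-squared with df degrees of freedom."""
    return 1.0 - lower_gamma_regularized(df / 2, x / 2)


# Hypothetical example: a chi-squared statistic of 4.2 from a 2 x 3 table.
rows, columns = 2, 3
df = (rows - 1) * (columns - 1)
print(chidist(4.2, df))
```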
Excel anticipates that the column heading will be the name and has it there already. If you want a different name, just type it in. Select OK.
Enter in cell B2 the formula for the Fahrenheit temperature conversion as =1.8*Celsius+32. This is used instead of =1.8*A2+32, which is not as informative. Note that the spelling must match the defined name (Excel names are not case-sensitive, so capitals do not matter). Copy this formula down to cell B72 to show the conversion for all Celsius values. Use the Microsoft Excel Help facility to find out more about naming cells and ranges yourself.
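A Python analogue of naming a range is giving the column a meaningful variable name instead of working with bare cell positions. The sketch below is illustrative only: fahrenheit is a hypothetical helper, and the Celsius values are hypothetical, since the worksheet's data are not listed here.

```python
def fahrenheit(celsius):
    """The same arithmetic as the named-range formula =1.8*Celsius+32."""
    return 1.8 * celsius + 32


# Hypothetical stand-in for the named range Celsius (cells A2:A72, i.e. 71 values).
# Like a range name, the variable name says what the numbers mean.
celsius = list(range(0, 71))
fahrenheit_column = [fahrenheit(c) for c in celsius]  # like copying B2 down to B72
print(fahrenheit_column[:5])
```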
5. FURTHER RESOURCES
Some useful resources for extra help are:
Microsoft Excel Help, found on the top toolbar of the program, and Microsoft's Support page for Excel: http://office.microsoft.com/en-us/excel-help/ . Choose your version and investigate the Help and How-to lists.
Excel 2007 Training, Microsoft Office Online: an audio course with many and various tutorials in the basics, Get to know Excel (http://office.microsoft.com/en-us/excel-help/CH010224830.aspx).
Clemson University Physics Excel tutorials, written for Excel 2003 but very good for the basics in any version: http://phoenix.phys.clemson.edu/tutorials/excel/
http://www.youtube.com , of mixed quality, but one suggestion is an uploader called ExcelIsFun who has a series on Excel Statistics, e.g. Excel Statistics 31 Histogram using Data Analysis Add In.