
Pie chart

From Wikipedia, the free encyclopedia


Pie chart of populations of English native speakers

A pie chart (or a circle graph) is a circular chart divided into sectors, illustrating proportion. In a pie chart, the arc length of each sector (and consequently its central angle and area) is proportional to the quantity it represents. When angles are measured with one turn as the unit, a number of percent is identified with the same number of centiturns. Together, the sectors create a full disk. It is named for its resemblance to a pie which has been sliced. The earliest known pie chart is generally credited to William Playfair's Statistical Breviary of 1801.[1][2]

The pie chart is perhaps the most ubiquitous statistical chart in the business world and the mass media.[3] However, it has been criticized,[4] and some recommend avoiding it,[5][6][7][8] pointing out in particular that it is difficult to compare different sections of a given pie chart, or to compare data across different pie charts. Pie charts can be an effective way of displaying information in some cases, in particular if the intent is to compare the size of a slice with the whole pie, rather than to compare the slices with one another.[1] Pie charts work particularly well when the slices represent 25 to 50% of the data,[9] but in general, other plots such as the bar chart or the dot plot, or non-graphical methods such as tables, may be better suited to representing certain information. A pie chart can also show the frequency within certain groups of information.

Contents

1 Example
2 Use, effectiveness and visual perception
3 Variants and similar charts
    3.1 Exploded pie chart
    3.2 Polar area diagram
    3.3 Spie chart
    3.4 Multi-level Pie, Radial tree, or Ring chart
    3.5 3-D pie chart
    3.6 Doughnut chart
4 History
5 Notes
6 See also
7 References

Example

A pie chart for the example data.

The following example chart is based on preliminary results of the election for the European Parliament in 2004. The table lists the number of seats allocated to each party group, along with the derived percentage of the total that they each make up. The values in the last column, the derived central angle of each sector, are found by multiplying each group's fraction of the total by 360°.

Group    Seats    Percent (%)    Central angle (°)
EUL         39        5.3              19.2
PES        200       27.3              98.4
EFA         42        5.7              20.7
EDD         15        2.0               7.4
ELDR        67        9.2              33.0
EPP        276       37.7             135.7
UEN         27        3.7              13.3
Other       66        9.0              32.5
Total      732       99.9*            360.2*

*Because of rounding, these totals do not add up to 100 and 360.

The size of each central angle is proportional to the size of the corresponding quantity, here the number of seats. Since the sum of the central angles has to be 360°, the central angle for a quantity that is a fraction Q of the total is 360Q degrees. In the example, the central angle for the largest group (European People's Party (EPP)) is 135.7° because 0.377 times 360, rounded to one decimal place, equals 135.7.
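The derivation of percentages and central angles from the seat counts can be sketched in a few lines of Python. This is only an illustration: the function name `central_angles` is made up for this sketch, and it rounds the unrounded fraction directly rather than rounding the percentage first.

```python
# Seat counts per group, taken from the example table.
seats = {
    "EUL": 39, "PES": 200, "EFA": 42, "EDD": 15,
    "ELDR": 67, "EPP": 276, "UEN": 27, "Other": 66,
}

def central_angles(counts):
    """Return {group: (percent, angle_in_degrees)}, each rounded to one decimal."""
    total = sum(counts.values())
    rows = {}
    for group, n in counts.items():
        fraction = n / total  # the fraction Q of the whole
        rows[group] = (round(100 * fraction, 1), round(360 * fraction, 1))
    return rows

rows = central_angles(seats)
for group, (percent, angle) in rows.items():
    print(f"{group:<6}{percent:>6}{angle:>8}")
```

Summing the rounded angles reproduces the 360.2 total noted under the table, which is why rounded columns should not be expected to add up exactly.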

Use, effectiveness and visual perception

Three sets of data plotted using pie charts and bar charts.

Pie charts are common in business and journalism.[citation needed] However, statisticians generally regard pie charts as a poor method of displaying information, and they are uncommon in scientific literature. One reason is that it is more difficult to compare the sizes of items in a chart when area is used instead of length and when different items are shown as different shapes. Stevens' power law states that visual area is perceived with a power of 0.7, compared to a power of 1.0 for length. This suggests that length is a better scale to use, since perceived differences would be linearly related to actual differences.
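As a rough numerical illustration of the power-law argument, the sketch below computes how large a difference would be perceived for a given actual difference. The function name `perceived_ratio` and the use of a pure power function with no scaling constant are assumptions made for illustration; the exponents 0.7 and 1.0 are the ones quoted in the text.

```python
def perceived_ratio(actual_ratio, exponent):
    """Perceived ratio between two stimuli under a simple power law."""
    return actual_ratio ** exponent

# One item is twice as large as another.
area_perceived = perceived_ratio(2.0, 0.7)    # encoded as area (pie slice)
length_perceived = perceived_ratio(2.0, 1.0)  # encoded as length (bar)

print(f"area encoding:   {area_perceived:.2f}x")
print(f"length encoding: {length_perceived:.2f}x")
```

Under these assumptions a slice with twice the area is perceived as only about 1.62 times as large, while a bar twice as long is perceived as twice as long.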

Further, in research performed at AT&T Bell Laboratories, it was shown that comparison by angle was less accurate than comparison by length. This can be illustrated with the diagram to the right, showing three pie charts, and, below each of them, the corresponding bar chart representing the same data. Most subjects have difficulty ordering the slices in the pie chart by size; when the bar chart is used the comparison is much easier.[10] Similarly, comparisons between data sets are easier using the bar chart. However, if the goal is to compare a given category (a slice of the pie) with the total (the whole pie) in a single chart and the proportion is close to 25 or 50 percent, then a pie chart can often be more effective than a bar graph.[11] Moreover, the research of Spence and Lewandowsky did not find pie charts to be inferior.[12][13] Participants were able to estimate values with pie charts just as well as with other presentation forms.

Variants and similar charts

Exploded pie chart

An exploded pie chart for the example data, with the largest party group exploded.

A chart with one or more sectors separated from the rest of the disk is known as an exploded pie chart. This effect is used either to highlight a sector or to draw attention to small segments of the chart that would otherwise be hard to see.

Polar area diagram

"Diagram of the causes of mortality in the army in the East" by Florence Nightingale.

The polar area diagram is similar to a usual pie chart, except that the sectors have equal angles and differ instead in how far each sector extends from the center of the circle. The polar area diagram is used to plot cyclic phenomena (e.g., count of deaths by month). For example, if the count of deaths in each month of a year is to be plotted, then there will be 12 sectors (one per month), each with the same angle of 30 degrees. The radius of each sector would be proportional to the square root of the death count for the month, so the area of a sector represents the number of deaths in a month. If the death count in each month is subdivided by cause of death, it is possible to make multiple comparisons on one diagram, as is clearly seen in the form of polar area diagram famously developed by Florence Nightingale.

The first known use of polar area diagrams was by André-Michel Guerry, who called them courbes circulaires, in an 1829 paper showing seasonal and daily variation in wind direction over the year and births and deaths by hour of the day.[14] Léon Lalanne later used a polar diagram to show the frequency of wind directions around compass points in 1843. The wind rose is still used by meteorologists. Nightingale published her rose diagram in 1858. The name "coxcomb" is sometimes used erroneously: this was the name Nightingale used to refer to a book containing the diagrams rather than the diagrams themselves.[15] It has been suggested[by whom?] that most of Nightingale's early reputation was built on her ability to give clear and concise presentations of data.

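The square-root rule for polar area diagrams can be checked with a short sketch. The monthly counts below are hypothetical, and the function names `polar_area_radii` and `sector_area` are invented for this illustration; the only facts used are that every sector spans the same angle and that a circular sector of radius r and angle θ (in radians) has area r²θ/2.

```python
import math

def polar_area_radii(counts, scale=1.0):
    """Radius of each equal-angle sector so that its area is proportional to its count."""
    return [scale * math.sqrt(c) for c in counts]

def sector_area(radius, n_sectors):
    """Area of one of n equal-angle sectors with the given radius."""
    angle = 2 * math.pi / n_sectors  # 30 degrees when n_sectors == 12
    return 0.5 * radius ** 2 * angle

# Hypothetical death counts for twelve months.
monthly_deaths = [100, 400, 900, 1600, 400, 100, 25, 25, 100, 225, 400, 900]

radii = polar_area_radii(monthly_deaths)
areas = [sector_area(r, len(monthly_deaths)) for r in radii]

# A month with four times the deaths gets twice the radius and four times the area.
print(radii[1] / radii[0], areas[1] / areas[0])
```

Plotting the raw count as the radius instead would exaggerate large months, since area grows with the square of the radius.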
Spie chart

A useful variant of the polar area chart is the spie chart designed by Feitelson.[16] This superimposes a normal pie chart on a modified polar area chart to permit the comparison of a set of data at two different states. For the first state, for example time 1, a normal pie chart is drawn. For the second state, the angles of the slices are the same as in the original pie chart, and the radii vary according to the change in the value of each variable. In addition to comparing a partition at two times (e.g. this year's budget distribution with last year's budget distribution), this is useful for visualizing hazards for population groups (e.g. the distribution of age and gender groups among road casualties compared with these groups' sizes in the general population). The R Graph Gallery provides an example.[17]

Multi-level Pie, Radial tree, or Ring chart

Multi-level pie or ring chart of disk usage in a Linux file system

A multi-level pie chart, also known as a radial tree chart, is used to visualize hierarchical data, depicted by concentric circles.[18] The circle in the centre represents the root node, with the hierarchy moving outward from the center. A segment of the inner circle bears a hierarchical relationship to those segments of the outer circle which lie within the angular sweep of the parent segment.[19]
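The rule that child segments lie within the angular sweep of their parent can be sketched as a recursive layout. Everything here is an assumption made for illustration: the nested-dictionary tree format, the function names `subtree_size` and `ring_layout`, and the choice to divide a parent's sweep among its children in proportion to their sizes.

```python
def subtree_size(node):
    """Size of a node: its own 'size' for a leaf, else the sum over its children."""
    children = node.get("children")
    if children:
        return sum(subtree_size(child) for child in children)
    return node["size"]

def ring_layout(node, start=0.0, sweep=360.0, depth=0, out=None):
    """Return (name, ring depth, start angle, sweep) for every node, in degrees."""
    if out is None:
        out = []
    out.append((node["name"], depth, start, sweep))
    children = node.get("children", [])
    total = sum(subtree_size(child) for child in children)
    angle = start
    for child in children:
        # Each child takes a share of the parent's sweep, so it stays inside it.
        child_sweep = sweep * subtree_size(child) / total if total else 0.0
        ring_layout(child, angle, child_sweep, depth + 1, out)
        angle += child_sweep
    return out

# Hypothetical disk-usage tree.
tree = {"name": "/", "children": [
    {"name": "home", "children": [{"name": "a", "size": 1}, {"name": "b", "size": 1}]},
    {"name": "usr", "size": 6},
]}

for row in ring_layout(tree):
    print(row)
```

The ring depth gives the concentric circle a segment is drawn on, with depth 0 being the central root circle.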

3-D pie chart


A perspective (3D) pie chart is used to give the chart a 3D look. Often used for aesthetic reasons, the third dimension does not improve the reading of the data; on the contrary, these plots are difficult to interpret because of the distorted effect of perspective associated with the third dimension. The use of superfluous dimensions not used to display the data of interest is discouraged for charts in general, not only for pie charts.[7][20]

Doughnut chart


A doughnut chart (also spelled donut) is functionally identical to a pie chart, with the exception of a blank center and the ability to display multiple statistics at once.

History
The earliest known pie chart is generally credited to William Playfair's Statistical Breviary of 1801, in which two such graphs are used.[1][2] This invention was not widely used at first;[1] the French engineer Charles Joseph Minard was one of the first to use it, in 1858, in particular in maps where he needed to add information in a third dimension.[21]

One of William Playfair's pie charts in his Statistical Breviary, depicting the proportions of the Turkish Empire located in Asia, Europe and Africa before 1789.

Minard's map using pie charts to represent the cattle sent from all around France for consumption in Paris (1858).

Line chart

This simple graph shows data over intervals with connected points

A line chart or line graph is a type of graph which displays information as a series of data points connected by straight line segments.[1] It is a basic type of chart common in many fields. It is an extension of a scatter graph, and is created by connecting a series of points that represent individual measurements with line segments. A line chart is often used to visualize a trend in data over intervals of time (a time series); thus the line is often drawn chronologically.[2]

Example
In the experimental sciences, data collected from experiments are often visualized by a graph that includes an overlaid mathematical function depicting the best-fit trend of the scattered data. This layer is referred to as a best-fit layer and the graph containing this layer is often referred to as a line graph. For example, if one were to collect data on the speed of a body at certain points in time, one could visualize the data by a data table such as the following:

Graph of Speed Vs Time

Elapsed Time (s)    Speed (m s⁻¹)
0                    0
1                    3
2                    7
3                   12
4                   20
5                   30
6                   45

The table "visualization" is a great way of displaying exact values, but a very bad way of understanding the underlying patterns that those values represent. Because of these qualities, the table display is often erroneously conflated with the data itself, whereas it is just another visualization of the data.

Understanding the process described by the data in the table is aided by producing a graph or line chart of Speed versus Time. In this context, versus (or the abbreviations vs and VS) separates the parameters appearing in an X-Y (two-dimensional) graph. The first argument indicates the dependent variable, usually appearing on the Y-axis, while the second argument indicates the independent variable, usually appearing on the X-axis. So, the graph of Speed versus Time would plot time along the x-axis and speed up the y-axis. Mathematically, if we denote time by the variable t, and speed by v, then the function plotted in the graph would be denoted v(t), indicating that v (the dependent variable) is a function of t.

It is simple to construct a "best-fit" layer consisting of a set of line segments connecting adjacent data points; however, such a "best-fit" is usually not an ideal representation of the trend of the underlying scatter data for the following reasons:

1. It is highly improbable that the discontinuities in the slope of the best-fit would correspond exactly with the positions of the measurement values.
2. It is highly unlikely that the experimental error in the data is negligible, yet the curve falls exactly through each of the data points.

A true best-fit layer should depict a continuous mathematical function whose parameters are determined by using a suitable error-minimization scheme, which appropriately weights the error in the data values. In either case, the best-fit layer can reveal trends in the data. Further, measurements such as the gradient or the area under the curve can be made visually, leading to more conclusions or results from the data.
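As a minimal sketch of an error-minimization scheme, the following fits a straight line v(t) = a + b·t to the speed table by ordinary least squares. This is only the simplest possible choice and not the method of any particular source: it weights every point equally, and the straight-line model is an assumption made for brevity (the example data are visibly curved, so a higher-order model would follow them more closely). The function name `fit_line` is made up for this sketch.

```python
# Speed-versus-time data from the example table.
times = [0, 1, 2, 3, 4, 5, 6]       # elapsed time in seconds
speeds = [0, 3, 7, 12, 20, 30, 45]  # speed in metres per second

def fit_line(ts, vs):
    """Ordinary least-squares fit of v = intercept + slope * t (equal weights)."""
    n = len(ts)
    t_mean = sum(ts) / n
    v_mean = sum(vs) / n
    sxx = sum((t - t_mean) ** 2 for t in ts)
    sxy = sum((t - t_mean) * (v - v_mean) for t, v in zip(ts, vs))
    slope = sxy / sxx
    intercept = v_mean - slope * t_mean
    return intercept, slope

intercept, slope = fit_line(times, speeds)
residuals = [v - (intercept + slope * t) for t, v in zip(times, speeds)]
print(f"v(t) = {intercept:.3f} + {slope:.3f} t")
print("residuals:", [round(r, 2) for r in residuals])
```

Unlike the connect-the-dots layer, the fitted line does not pass exactly through the data points; the non-zero residuals are the part of each measurement the model attributes to error or to curvature it cannot represent.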

Time series

Time series: random data plus trend, with best-fit line and different smoothings

In statistics, signal processing, econometrics and mathematical finance, a time series is a sequence of data points, measured typically at successive times spaced at uniform time intervals. Examples of time series are the daily closing value of the Dow Jones index or the annual flow volume of the Nile River at Aswan. Time series analysis comprises methods for analyzing time series data in order to extract meaningful statistics and other characteristics of the data. Time series forecasting is the use of a model to predict future data points, before they are measured, based on known past events. Time series are very frequently plotted via line charts.

Time series data have a natural temporal ordering. This makes time series analysis distinct from other common data analysis problems, in which there is no natural ordering of the observations (e.g. explaining people's wages by reference to their education level, where the individuals' data could be entered in any order). Time series analysis is also distinct from spatial data analysis, where the observations typically relate to geographical locations (e.g. accounting for house prices by the location as well as the intrinsic characteristics of the houses).

A time series model will generally reflect the fact that observations close together in time will be more closely related than observations further apart. In addition, time series models will often make use of the natural one-way ordering of time so that values for a given period will be expressed as deriving in some way from past values, rather than from future values (see time reversibility).

Methods for time series analyses may be divided into two classes: frequency-domain methods and time-domain methods. The former include spectral analysis and recently wavelet analysis; the latter include auto-correlation and cross-correlation analysis.

Contents

1 Analysis
    1.1 General exploration
    1.2 Description
    1.3 Prediction and forecasting
2 Models
    2.1 Notation
    2.2 Conditions
    2.3 Models
3 Related tools
4 See also
5 References
6 Further reading
7 External links

Analysis
There are several types of data analysis available for time series which are appropriate for different purposes.

General exploration

- Graphical examination of data series
- Autocorrelation analysis to examine serial dependence
- Spectral analysis to examine cyclic behaviour which need not be related to seasonality. For example, sunspot activity varies over 11-year cycles.[1][2] Other common examples include celestial phenomena, weather patterns, neural activity, commodity prices, and economic activity.
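The autocorrelation analysis listed above can be sketched with a small sample estimator. The estimator below (lagged covariance divided by the lag-0 sum of squares) is one common convention, chosen here as an assumption; the function name `autocorrelation` and the two short series are invented for illustration.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of a series at the given non-negative lag."""
    n = len(series)
    if not 0 <= lag < n:
        raise ValueError("lag must satisfy 0 <= lag < len(series)")
    mean = sum(series) / n
    denominator = sum((x - mean) ** 2 for x in series)
    numerator = sum(
        (series[i] - mean) * (series[i + lag] - mean) for i in range(n - lag)
    )
    return numerator / denominator

# Hypothetical series: one that flips sign at every step, one that drifts upward.
alternating = [1, -1, 1, -1, 1, -1]
trending = [1, 2, 3, 4, 5, 6, 7, 8]

print(autocorrelation(alternating, 1))  # strongly negative
print(autocorrelation(trending, 1))     # positive: neighbours are similar
```

A value near +1 at lag 1 indicates that neighbouring observations resemble each other, which is the serial dependence that distinguishes time series from unordered data.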

Description

- Separation into components representing trend, seasonality, slow and fast variation, cyclical irregular: see decomposition of time series
- Simple properties of marginal distributions

Prediction and forecasting

- Fully formed statistical models for stochastic simulation purposes, so as to generate alternative versions of the time series, representing what might happen over non-specific time-periods in the future
- Simple or fully formed statistical models to describe the likely outcome of the time series in the immediate future, given knowledge of the most recent outcomes (forecasting).

Models
Models for time series data can have many forms and represent different stochastic processes. When modeling variations in the level of a process, three broad classes of practical importance are the autoregressive (AR) models, the integrated (I) models, and the moving average (MA) models. These three classes depend linearly[3] on previous data points. Combinations of these ideas produce autoregressive moving average (ARMA) and autoregressive integrated moving average (ARIMA) models. The autoregressive fractionally integrated moving average (ARFIMA) model generalizes the former three.

Extensions of these classes to deal with vector-valued data are available under the heading of multivariate time-series models, and sometimes the preceding acronyms are extended by including an initial "V" for "vector". An additional set of extensions of these models is available for use where the observed time series is driven by some "forcing" time series (which may not have a causal effect on the observed series): the distinction from the multivariate case is that the forcing series may be deterministic or under the experimenter's control. For these models, the acronyms are extended with a final "X" for "exogenous".

Non-linear dependence of the level of a series on previous data points is of interest, partly because of the possibility of producing a chaotic time series. More importantly, however, empirical investigations can indicate the advantage of using predictions derived from non-linear models over those from linear models, as for example in nonlinear autoregressive exogenous models.

Among other types of non-linear time series models, there are models to represent the changes of variance over time (heteroskedasticity). These models represent autoregressive conditional heteroskedasticity (ARCH), and the collection comprises a wide variety of representations (GARCH, TARCH, EGARCH, FIGARCH, CGARCH, etc.). Here changes in variability are related to, or predicted by, recent past values of the observed series. This is in contrast to other possible representations of locally varying variability, where the variability might be modelled as being driven by a separate time-varying process, as in a doubly stochastic model.

In recent work on model-free analyses, wavelet transform based methods (for example locally stationary wavelets and wavelet decomposed neural networks) have gained favor. Multiscale (often referred to as multiresolution) techniques decompose a given time series, attempting to illustrate time dependence at multiple scales.

Notation

A number of different notations are in use for time-series analysis. A common notation specifying a time series X that is indexed by the natural numbers is written $X = \{X_1, X_2, \ldots\}$. Another common notation is $Y = \{Y_t : t \in T\}$, where T is the index set.

Conditions
There are two sets of conditions under which much of the theory is built:

- Stationary process
- Ergodicity

However, ideas of stationarity must be expanded to consider two important ideas: strict stationarity and second-order stationarity. Both models and applications can be developed under each of these conditions, although the models in the latter case might be considered as only partly specified. In addition, time-series analysis can be applied where the series are seasonally stationary or non-stationary. Situations where the amplitudes of frequency components change with time can be dealt with in time-frequency analysis, which makes use of a time-frequency representation of a time series or signal.[4]

Models
Main article: Autoregressive model

The general representation of an autoregressive model, commonly written AR(p), is

$$Y_t = \alpha_0 + \alpha_1 Y_{t-1} + \alpha_2 Y_{t-2} + \cdots + \alpha_p Y_{t-p} + \varepsilon_t$$

where the term $\varepsilon_t$ is the source of randomness and is called white noise. It is assumed to have the following characteristics:

$$E[\varepsilon_t] = 0, \qquad E[\varepsilon_t^2] = \sigma^2, \qquad E[\varepsilon_t \varepsilon_s] = 0 \ \text{for all } t \neq s.$$

With these assumptions, the process is specified up to second-order moments and, subject to conditions on the coefficients, may be second-order stationary. If the noise also has a normal distribution, it is called normal or Gaussian white noise. In this case, the AR process may be strictly stationary, again subject to conditions on the coefficients.
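A short simulation may make the AR(p) recursion concrete. The sketch assumes the standard form Y_t = α_0 + α_1 Y_{t-1} + ... + α_p Y_{t-p} + ε_t; the function name `simulate_ar`, the coefficient values, and the seed are all illustrative choices, and the noise is drawn as Gaussian white noise using the standard library.

```python
import random

def simulate_ar(alphas, alpha0, noise, history):
    """Generate one value per noise term from the AR(p) recursion.

    alphas  -- [alpha_1, ..., alpha_p]
    alpha0  -- constant term
    noise   -- sequence of white-noise terms, one per generated value
    history -- at least p starting values, oldest first
    """
    p = len(alphas)
    if len(history) < p:
        raise ValueError("need at least p starting values")
    series = list(history)
    for eps in noise:
        past = series[-1:-p - 1:-1]  # Y_{t-1}, Y_{t-2}, ..., Y_{t-p}
        series.append(alpha0 + sum(a * y for a, y in zip(alphas, past)) + eps)
    return series[len(history):]

# Gaussian white noise with zero mean and standard deviation 1 (illustrative).
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(200)]

# AR(1) with alpha_1 = 0.8: each value keeps 80% of the previous one.
ar1 = simulate_ar([0.8], 0.0, noise, [0.0])
print(ar1[:5])
```

Passing the noise in as an argument keeps the recursion itself deterministic, so the same function can be checked by hand with all-zero noise before being driven by random draws.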

Scatter plot

One of the Seven Basic Tools of Quality
First described by: Francis Galton
Purpose: To identify the type of relationship (if any) between two variables

Waiting time between eruptions and the duration of the eruption for the Old Faithful Geyser in Yellowstone National Park, Wyoming, USA. This chart suggests there are generally two "types" of eruptions: short-wait-short-duration, and long-wait-long-duration.

A 3D scatter plot allows for the visualization of multivariate data of up to four dimensions. The scatter plot takes multiple scalar variables and uses them for different axes in phase space. The different variables are combined to form coordinates in the phase space and they are displayed using glyphs and colored using another scalar variable.[1]

A scatter plot or scattergraph is a type of mathematical diagram using Cartesian coordinates to display values for two variables for a set of data. The data is displayed as a collection of points, each having the value of one variable determining the position on the horizontal axis and the value of the other variable determining the position on the vertical axis.[2] This kind of plot is also called a scatter chart, scattergram, scatter diagram or scatter graph.

Contents

1 Overview
2 Example
3 See also
4 References
5 External links

Overview
A scatter plot is used when a variable exists that is under the control of the experimenter. If a parameter exists that is systematically incremented and/or decremented by the other, it is called the control parameter or independent variable and is customarily plotted along the horizontal axis. The measured or dependent variable is customarily plotted along the vertical axis. If no dependent variable exists, either type of variable can be plotted on either axis and a scatter plot will illustrate only the degree of correlation (not causation) between two variables.

A scatter plot can suggest various kinds of correlations between variables with a certain confidence interval. Correlations may be positive (rising), negative (falling), or null (uncorrelated). If the pattern of dots slopes from lower left to upper right, it suggests a positive correlation between the variables being studied. If the pattern of dots slopes from upper left to lower right, it suggests a negative correlation. A line of best fit (alternatively called a 'trendline') can be drawn in order to study the correlation between the variables. An equation for the correlation between the variables can be determined by established best-fit procedures. For a linear correlation, the best-fit procedure is known as linear regression and is guaranteed to generate a correct solution in a finite time. No universal best-fit procedure is guaranteed to generate a correct solution for arbitrary relationships.

A scatter plot is also very useful when we wish to see how two comparable data sets agree with each other. In this case, an identity line, i.e., a y=x line or a 1:1 line, is often drawn as a reference. The more the two data sets agree, the more the scatters tend to concentrate in the vicinity of the identity line; if the two data sets are numerically identical, the scatters fall on the identity line exactly.

One of the most powerful aspects of a scatter plot, however, is its ability to show nonlinear relationships between variables. Furthermore, if the data is represented by a mixture model of simple relationships, these relationships will be visually evident as superimposed patterns.

The scatter diagram is one of the seven basic tools of quality control.[3]
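The sign of a correlation that a scatter plot suggests visually can also be computed. The sketch below uses the Pearson correlation coefficient, which is one standard measure of linear correlation and is chosen here as an assumption; the function name `pearson_r` and the small data sets are invented for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    syy = sum((y - y_mean) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: dots rising from lower left to upper right...
rising = pearson_r([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 8.0, 9.8])
# ...and dots falling from upper left to lower right.
falling = pearson_r([1, 2, 3, 4, 5], [9.8, 8.0, 6.2, 3.9, 2.1])

print(f"rising pattern:  r = {rising:+.3f}")
print(f"falling pattern: r = {falling:+.3f}")
```

A coefficient near zero only rules out a linear relationship; a strongly nonlinear pattern, which a scatter plot would reveal at a glance, can still produce a small value.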

Example
For example, to display the relationship between a person's lung capacity and how long that person could hold his breath, a researcher would choose a group of people to study, then measure each one's lung capacity (first variable) and how long that person could hold his breath (second variable). The researcher would then plot the data in a scatter plot, assigning "lung capacity" to the horizontal axis, and "time holding breath" to the vertical axis. A person with a lung capacity of 400 ml who held his breath for 21.7 seconds would be represented by a single dot on the scatter plot at the point (400, 21.7) in the Cartesian coordinates. The scatter plot of all the people in the study would enable the researcher to obtain a visual comparison of the two variables in the data set, and would help to determine what kind of relationship there might be between the two variables.

Bar chart
See also: Histogram

Example of a bar chart, with 'Country' as the discrete data set.

A bar chart or bar graph is a chart with rectangular bars whose lengths are proportional to the values that they represent. The bars can be plotted vertically or horizontally. Bar charts are used for plotting discrete (or 'discontinuous') data, that is, data which takes separate, distinct values rather than varying continuously. Examples of discontinuous data include 'shoe size' and 'eye colour', for which a bar chart would be used. In contrast, examples of continuous data include 'height' and 'weight'. A bar chart can nevertheless be useful for recording information whether the data is continuous or not. Bar charts also look a lot like histograms, and the two are often mistaken for each other.
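The proportionality between bar length and value can be shown with a small text-based sketch. The category counts are hypothetical, the function names `bar_lengths` and `render_bars` are made up, and the sketch assumes all values are positive and scales the largest value to a chosen maximum length.

```python
def bar_lengths(values, max_length=40):
    """Map each category to a bar length proportional to its (positive) value."""
    largest = max(values.values())
    return {label: round(max_length * v / largest) for label, v in values.items()}

def render_bars(values, max_length=40):
    """Return a horizontal bar chart as text, one category per line."""
    lengths = bar_lengths(values, max_length)
    width = max(len(label) for label in values)
    return "\n".join(
        f"{label:<{width}} | {'#' * lengths[label]} {values[label]}"
        for label in values
    )

# Hypothetical counts of a discrete variable.
eye_colour = {"blue": 10, "brown": 20, "green": 5}
print(render_bars(eye_colour))
```

Because every bar starts from the same baseline, the categories can be compared by length alone.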
