
Supporting Exploration in Social Data Analysis

Adam Perer
Human-Computer Interaction Lab
Department of Computer Science
University of Maryland
College Park, MD 20742
adamp@cs.umd.edu

Ben Shneiderman
Human-Computer Interaction Lab
Department of Computer Science
University of Maryland
College Park, MD 20742
ben@cs.umd.edu

Abstract
Recent implementations of social data analysis services have focused on
providing visual overviews of data and forums for public debate and
commentary. While these are necessary features for supporting data analysis
for social collaboration, we argue that more attention should be paid to
improving the exploration experience of users. Improved exploration will
allow users to dig deeper into the data. Furthermore, richer analysis tools
will benefit from the collaborative efforts of many users. There is a need
for sophisticated filtering, which allows users to find patterns, gaps and
outliers in the midst of an overwhelming visualization. There is also a need
for an integration of statistics with the visualizations, which facilitates
discovery by providing important clues to complex data. We believe that
supporting advanced filtering techniques and an integration of statistics and
visualization will increase the utility of social data analysis services.

Keywords
Exploratory data analysis, social data analysis, information visualization,
statistics, filtering

ACM Classification Keywords
H5.m. Information interfaces and presentation (e.g., HCI): Miscellaneous.

Copyright is held by the author/owner(s).
CHI 2008, April 5 – April 10, 2008, Florence, Italy
ACM 1-xxxxxxxxxxxxxxxxxx.

Introduction
Websites such as Data360 [1], ManyEyes [9] and Swivel [8] have introduced the
masses to the social capabilities of data analysis tools. On these websites,
users can upload data, create visualizations, and enable a community of
interested viewers to take part in the analysis. By sharing comments,
creating different projections, annotating interesting outliers or patterns,
or offering speculation on why the data is the way it is, the audience can go
beyond the data provider’s insights. They can point out missed insights,
provide different interpretations, or suggest mistakes in the underlying data
set. These features are extremely important in delivering accountability to
data analysis, as well as leveraging the intellect of many interested
analysts. Furthermore, these social data analysis websites act as forums to
communicate data in interactive ways previously only available to developers
or purchasers of expensive software suites.

Although the features offered by these services are both groundbreaking and
exemplary, we argue that improving the exploratory capabilities should
receive further attention. In this position paper, we focus on two specific
goals: 1) allowing users to dig deeper using sophisticated filtering tools,
and 2) integrating statistics with visualizations to guide users to data with
interesting properties. We believe these additional features will allow
richer exploration on social data analysis services, and consequently, richer
social activity.

Beyond Overviews: Digging Deeper with Filtering
In current social data analysis systems, the visualizations often feature
limited interactive capabilities for exploration. They present rich overviews
of the data, ranging from traditional scatterplots and histograms to more
modern treemaps and network visualizations. However, we argue that these
sites offer limited functionality for filtering and zooming, the second step
of the Visual Information Seeking Mantra (overview first, zoom and filter,
then details on demand) [6].

There are several side effects of not properly supporting filtering. First,
scalability is limited, as all data is forced to always be present in the
visualization. Second, users may compensate for this first constraint by
filtering off-site. This behavior carries the danger that users accidentally
throw out important data simply because they did not have access to a visual
filtering process.

It is important to note that enabling filtering increases the number of paths
of exploration and thus also the complexity of the system. These types of
interactions can further confuse novices who wish to contribute to analysis.
One possible solution is to allow the creators of the visualization to
produce guidance in the spirit of Spotfire guides [7], which can direct
novice users through important analytical steps. A more sophisticated
solution would support Systematic Yet Flexible (SYF) guides, which record
users’ progress towards authored goals while also supporting flexible
exploration [4].

Visual filtering works best when supported by dynamic queries and range
sliders to give users direct control. Since users should be able to filter in
as many dimensions as exist in the data set, the interface
should present the stack of filtered dimensions to users in a coherent
manner.

Integrating Statistics with Visualization
Even with filtering, sometimes certain phenomena cannot be found solely with
visualization. Particularly with large data sets, visualizations will not
always highlight important trends of the underlying data. Statistical
properties can be used to detect important datapoints, relationships, and
clusters. Statistical analysis can aid the comprehension of visualizations by
numerically suggesting (or confirming) visual output. Presenting boxplots,
standard deviations, and lower-triangular matrices will greatly improve the
exploratory data analysis capabilities of these websites.

When users are faced with data with many columns, choosing which dimensions
to plot in a scatterplot can be quite tedious and challenging. The
rank-by-feature (RBF) framework, shown in Figure 1, suggests statistically
interesting pair-wise columns that can help guide users to interesting
phenomena [5]. The resulting scatterplots (not pictured) appear when a user
hovers over each cell in the matrix. When users are navigating stack charts,
highlighting other similar stacks in the pattern-finding spirit of
TimeSearcher [2] will help users overcome the overview displacement
distortions. When users are trying to interpret a chaotic network
visualization, color-coding and filtering the nodes by centrality
measurements in the spirit of SocialAction [3] can increase comprehension.
Similarly, relevant statistical information can also be displayed in scented
widgets to further improve navigation [10].

Figure 1. The rank-by-feature framework helps users find important patterns
between many columns in data.

Figure 2. A complex network visualization (top) can be simplified using
statistical rankings, color-coding, and filtering (bottom).

Although statistical techniques are even more complicated to comprehend and
use effectively, the discoveries they can lead to can outweigh the cost of
instruction. Furthermore, in a collaborative environment, users can partition
effort by navigating in their respective areas of statistical expertise. This
statistical information will also empower users to filter out statistically
unimportant data, bringing simplicity to initially overwhelming
visualizations. Finally, having statistical overviews of visual information
will also help users trust the resulting information, because it makes it
harder for others to maliciously hide or distort the visual representations.

Conclusion
In this paper, we speculate that a richer experience on social data analysis
websites can be had if more attention is paid to the exploratory
capabilities. Advanced filtering techniques allow users to step beyond
overviews and take advanced paths to finding insights. Additionally,
integrating the visualizations with statistical analysis can simplify complex
visualizations while also guiding the users to interesting gaps, outliers and
patterns. While both of these requirements suggest an increase in the
complexity of an interface, the richer explorative capabilities can leverage
the true power of the masses: many explorative paths for many insights.

Citations
[1] Data360. Data360. (2007).
[2] Hochheiser, H. and Shneiderman, B. Dynamic Query Tools for Time Series
Data Sets: Timebox Widgets for Interactive Exploration. Information
Visualization, 3, 1 (2004), 1-18.
[3] Perer, A. and Shneiderman, B. Balancing Systematic and Flexible
Exploration of Social Networks. IEEE Transactions on Visualization and
Computer Graphics, 12, 5 (2006), 693-700.
[4] Perer, A. and Shneiderman, B. Systematic Yet Flexible Discovery: Guiding
Domain Experts through Exploratory Data Analysis. (Under Submission) (2007).
[5] Seo, J. and Shneiderman, B. A Rank-by-Feature Framework for Interactive
Exploration of Multidimensional Data. Information Visualization, 4, 2 (2005),
99-113.
[6] Shneiderman, B. The Eyes Have It: A Task by Data Type Taxonomy for
Information Visualization. In Proc. Visual Languages (1996), 336-343.
[7] Spotfire. DecisionSite. (2007).
[8] Swivel. Swivel. (2007).
[9] Viégas, F. B., Wattenberg, M., van Ham, F., Kriss, J. and McKeon, M. Many
Eyes: A Site for Visualization at Internet Scale. In Proc. IEEE Symposium on
Information Visualization (InfoVis 2007) (2007).
[10] Willett, W., Heer, J. and Agrawala, M. Scented Widgets: Improving
Navigation Cues with Embedded Visualizations. In Proc. IEEE Symposium on
Information Visualization (InfoVis 2007) (2007).
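
As a rough illustration of the rank-by-feature idea discussed under
Integrating Statistics with Visualization, the sketch below scores every pair
of columns in a small table and lists the pairs in order of interest. It
assumes the absolute Pearson correlation coefficient as the ranking
criterion. The function names, the `table` layout, and the sample values are
hypothetical and are not taken from the RBF implementation [5].

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column has no linear relationship to rank
    return cov / sqrt(var_x * var_y)


def rank_column_pairs(table):
    """Rank every pair of columns by absolute correlation, strongest first.

    `table` is a hypothetical dict mapping column name -> list of numbers.
    Returns a list of (score, column_a, column_b) tuples.
    """
    scored = [
        (abs(pearson(table[a], table[b])), a, b)
        for a, b in combinations(sorted(table), 2)
    ]
    return sorted(scored, reverse=True)


# Made-up illustrative data, not taken from any of the cited systems.
table = {
    "income": [1.0, 2.0, 3.0, 4.0, 5.0],
    "spending": [2.0, 4.0, 6.0, 8.0, 10.0],
    "age": [5.0, 3.0, 4.0, 1.0, 2.0],
    "height": [1.0, 1.0, 2.0, 1.0, 1.0],
}
ranking = rank_column_pairs(table)
for score, a, b in ranking:
    print(f"{a} vs {b}: |r| = {score:.2f}")
```

A social data analysis site could use such a ranking to order the cells of a
lower-triangular matrix, so that users inspect the most strongly related
column pairs first. Other criteria could be swapped in for `pearson` without
changing the ranking step.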
