
Building Rich Social Network Data

A schema to aid in designing, collecting and evaluating social network data


Eamonn O'Loughlin (eamonn.oloughlin@gmail.com)

Big Data = Different Challenges


Over the past few decades, techniques from Social Network Analysis (SNA) have become significantly more pervasive among sociologists, statisticians and computer scientists. Over the same period, the size, scope and complexity of the network data being analysed have grown substantially.

This growth has been driven in part by technological advances (and society's response to those advances) that have reduced the cost of collecting and analysing information about social networks.

Compared with more traditional multi-dimensional data (such as time series, panel and cross-sectional data), creating a social network dataset involves a significantly larger number of methodological and design decisions. These decisions must be taken with care, because the features of a dataset determine whether or not it is suitable for particular types of analysis. Because these design decisions operate at a more fundamental level than simple implementation details (e.g. which data structure to use), they can easily be overlooked.

In this paper we propose a standard schema for social network data. A standard schema is a mechanism that allows us to define the structure, the content and, to some extent, the semantics of a dataset. Our proposed schema defines the most common features that social network datasets may have in a consistent way, so that the structure, content and scope of social network data can be easily documented and communicated.

This work was based on an analysis of over 150 social network datasets prepared by the Dynamics Lab at University College Dublin. This repository of datasets has been made public and is available on the Dynamics Lab website at http://dl.ucd.ie.

What is a Schema?
A schema allows us to represent the structure and features of an object in a consistent, well-defined way.
In this research, our aim is to fill this gap by creating a standard way to describe the type and features of social network data.

A Social Network Data Schema


The schema (summarised below) covers all of the types of features that a network dataset may contain. This allows the researcher to describe (or assess) in detail the scope, assumptions and characteristics of their data.
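As an illustration only, the following minimal Python sketch shows how recording a dataset's features in a fixed structure lets a researcher check which analyses the dataset can support and what is missing for the rest. It assumes a small, hypothetical set of features (directedness, edge weights, time stamps, node attributes and number of relation types) and a hypothetical mapping from analyses to required features; neither the field names nor the requirements are taken from the schema itself.

```python
from dataclasses import dataclass, field


# Hypothetical feature set for illustration only: these fields are
# assumptions, not the fields of the schema described in this poster.
@dataclass(frozen=True)
class NetworkDatasetDescription:
    name: str
    directed: bool   # do edges have a direction (e.g. "follows")?
    weighted: bool   # do edges carry a strength or count?
    temporal: bool   # are edges or nodes time-stamped?
    # Recorded for documentation; not used by the checks below.
    node_attributes: frozenset = field(default_factory=frozenset)
    relation_types: int = 1  # number of distinct edge types (1 = single-layer)


# Hypothetical analysis requirements: each analysis lists the features
# a dataset must provide for that analysis to be meaningful.
REQUIREMENTS = {
    "reciprocity": {"directed"},
    "tie_strength": {"weighted"},
    "network_evolution": {"temporal"},
    "multiplex_analysis": {"multiplex"},
}


def features_of(desc):
    """Return the set of features the described dataset provides."""
    present = set()
    if desc.directed:
        present.add("directed")
    if desc.weighted:
        present.add("weighted")
    if desc.temporal:
        present.add("temporal")
    if desc.relation_types > 1:
        present.add("multiplex")
    return present


def missing_features(desc, analysis):
    """List the features required by `analysis` that the dataset lacks."""
    return sorted(REQUIREMENTS[analysis] - features_of(desc))


def supported_analyses(desc):
    """Return the analyses whose requirements the dataset fully meets."""
    return sorted(a for a in REQUIREMENTS if not missing_features(desc, a))


# Usage with a hypothetical dataset description.
email = NetworkDatasetDescription(
    name="example email network",
    directed=True,
    weighted=True,
    temporal=False,
    node_attributes=frozenset({"department"}),
)
print(supported_analyses(email))                     # ['reciprocity', 'tie_strength']
print(missing_features(email, "network_evolution"))  # ['temporal']
```

Encoding requirements as sets keeps the suitability check a simple set difference; a fuller description would need richer feature types (for example, sampling method or time resolution) than these booleans.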

Our Approach
- Review the structure, size and features of over 100 publicly available social network datasets
- Create functional groups of key features
- Outline a schema describing the features of social network datasets
- Discuss the different components of the schema and their role in the possible types of subsequent analysis
- Outline how the schema assists in designing and implementing a strategy for creating / collecting social network data

Why use a Schema?


A schema is useful in the early stages of research, when an approach is known or a hypothesis is under consideration. At this point, a schema helps in designing or locating appropriate data with which to test the hypothesis. In general, there are three distinct ways to access data; these are listed below, followed by a summary of how our schema helps.

Sourcing Data for Social Network Analysis Research

Option 1: Collect the data through direct observation or survey
Option 2: Retrieve the data by taking a subset of data from an existing electronic system
Option 3: Assess the appropriateness / quality of an existing social network dataset

The schema:
- Serves as a useful checklist prior to the commencement of data collection
- Helps researchers identify potential additional avenues of research after the initial analysis (i.e. aids in the creation of datasets that support multiple analyses)
- Supports communication with the data owner to increase the quality of the retrieved data
- Supports cross-teaming across academic disciplines (where data is required for different purposes)
- Helps in identifying appropriate or desirable publicly available datasets
- Enables prioritisation of desirable data features where constraints prevent all of them being met

Our Motivation
- Data collection is expensive
- Privacy concerns
- Network data is difficult to sample
- Many design decisions
- Different practitioners
- Increased ability to analyse data
- Reduced cost of data storage
- Pervasive sensor technology
