Diana Soto de Jess Research MA Thesis: Cultural Analysis University of Amsterdam June 2011 Supervisor: Joost De Bloois Second Reader: Markus Stauff

TABLE OF CONTENTS

I. Introduction: Data Love
II. The New Oil of the Internet: Data, Governmentality and Post-demographics
    A. The Numerification of Human Characteristics
    B. On Governmentality
    C. Population and New Media: Rise of Post-Demographics
    D. A Getting to Know to Better Mobilize
III. Tracing the UPR Student Movement through Facebook Data: a Case Study of Post-demographics
    A. Figuring Out Taste and the Like Button
    B. Using Create an Ad as Tool for Academic Research
    C. Tracing the UPR Student Movement
    D. Discussion of Results and Considerations Regarding the Data
    E. Online Post-demographics and Offline Demographical Data: Grounding Data
IV. Conclusion: Research and the Mobilization of Data
V. References

"Personal data is the new oil of the Internet and the new currency of the digital world," declared the European Consumer Commissioner, Meglena Kuneva, in 2009 at a roundtable discussion on Online Data Collection, Targeting and Profiling that took place in Brussels. It is a point that many have made about the media in general, but particularly about the Internet and the dynamics seen in the most popular of its dimensions, the Web. Richard Rogers describes the rise of a service-for-profile dynamic, the development of a personal information economy that works as "having to know you, in order to sell to you," characterized by Web giants such as Google and Amazon (Rogers, The Googlization 2). David Lyon emphasizes the ordinary and mundane nature that this having to know you (be it in order to sell to you or for other reasons) has acquired in contemporary societies. He remarks that even if practices of observing, monitoring and identifying people have always existed, "To an unprecedented extent ordinary people now find themselves under surveillance in the routines of everyday life. In numerous ways what was once thought of as the exception has become the rule" (4). Yet that surveillance does not necessarily operate in a sinister top-down way. Greg Elmer talks about a form of surveillance that works through enticement, the constant and structural compelling of people to exchange personal data for rewards or benefits. Mark Andrejevic theorizes the call to interactivity, so common in new media nowadays (from online social networks to TiVo), as a way to ensure that this new oil and new currency of (personal) data keeps running. He argues that although in common use the notion of interactivity implies reciprocity of information gathering and exchange, in practice it is largely asymmetrical or non-reciprocal, patterned more on a panoptic version of interactivity than on mutual transparency and accountability (395).

The vast collection of so-called personal data found in online social networks (OSN), baggage that gets added to users' profiles as they interact in the network, lends some weight to his point. Through the social networks enabled and incentivized by OSN systems, millions of users worldwide create and share data: data about their biological and material characteristics such as age, gender and location, but also about their tastes, preferences, friends, affects, political postures, etc. This data then gets used as input to guess and make current trends and to customize or personalize all sorts of experiences on the Web: from reading the New York Times while being informed of how many of your Facebook friends Like it, to getting targeted advertising on summer schools after emailing a cousin about how you were thinking of attending one this year, to promoting products according to trending topics on Twitter. The idea is that customizing and personalizing makes a direct appeal to the person/customer, that it is more enticing than a generic product since it appeals directly to the person/customer's interests, tastes and even affects; and that trendifying content ensures relevancy by being part of the conversation or, at least, makes it more noticeable. Especially on the Web, where there is such an overload of information, such a vast output of data, this customization, also presented as personalization, is argued to give an edge market-wise. And so this data from and about persons has become a currency of sorts, a valued commodity to obtain. Against this need to know the audience in order to appeal directly to it and be more relevant and/or noticeable in an oversaturated market, it must be mentioned that the

actual level of customization enabled by this kind of data has been questioned. A study of Google's customization practices by Martin Feuz, Matthew Fuller and Felix Stalder found that its customization is actually quite generic: quantitatively speaking, the diversity of results (either by sources provided or by ranking of results) was quite low in searches where customization, or personalized search as Google calls it, was allowed. The personalization aspect of this data as currency has also been questioned, since the irony behind personalization is that it is built upon profiles: profiles on, say, what teenage white girls from the West Coast of the US like versus Afro-American men from Missouri.1 In this research, the personal in personal data has been downplayed, sometimes by putting it within parentheses or by not using it at all, in order to focus on the technologies and logic of data mobilization itself. The research focuses on the mechanics and logics of the constant circulation of data in OSN, thus exploring notions of data, profiles and objects of data aggregation such as the Like Button. The personal is downplayed because the name personal data may deviate from the actual dynamics inherent in this kind of data. Yes, it is data generated by persons, and it says something about the person who originates it. But it is interesting and valuable not so much because it is personal data, data from and about that specific person, but because it is potentially profiling data, data that can be fitted into profiles. And the eloquence of profiles is that while they provide some level of customization, of uniqueness akin to the many diverse traits of human individuals, profiles are not really in-dividual or personal; rather, they are malleable and customizable. The data is used not so much in its quality as personal data, or data
that identifies a specific person or relates in a special manner to her/him, but rather as data that says something about humans and their characteristics as humans, which can then be aggregated and analyzed according to whatever needs the entity doing the data analytics has. This data, though coming from persons and saying something about them as persons, is mobilized not in its quality as personal data but rather as data that can be filled into profiles, profiles that can be arranged and rearranged according to whatever criteria are specified. And so the profile, precisely that one space in OSN where the user seems to have the greatest level of control, surfaces as a multilayered object key to the mobilization of data and to making sense of it. It is a multilayered object composed of explicit fill-in-the-blank layers and less obvious interactive layers, but also of different dimensions: there is the user profile page that the user can edit and that everyone sees, but there are also the composite profiles made and remade as marketers establish different parameters to target their advertising to an audience that they construct by specifying some characteristics and not others. Furthermore, the profile also operates as a concept: there is ample theoretical work on profiling, in all sorts of spheres. One can talk of criminal or psychological profiling, but one can also talk of profiles in art, precisely the field from which the word profile originated (Oxford English Dictionary). In this work the profile as object-concept resurfaces multiple times, opening avenues of exploration with regard to how this data from and about persons is mobilized in and through OSN.

1 Again, see Feuz, Fuller and Stalder, who argue that Google's customization is made in great part through group cluster profiles or statistical group profiles. To us the most likely interpretation is that Google not only relies on a user's personal semantic histories, but that it extrapolates from what it knows about a person to his or her association with statistical group profiles that Google has built over time in what ends up being an inversion of the promise of personalization (par. 56).

DATA AS CURRENCY... NOT JUST IN THE ECONOMY

When Kuneva spoke of (personal) data as currency, the image of currency surfaced because the mobilization of data on the Internet is related to economic imperatives in which this data from and about persons becomes a valuable commodity for companies trying to get to know consumers to better sell to them. But access to data (regardless of whether it comes from persons and describes their characteristics or, say, comes from books and details their content) has also become a valued commodity in the push towards a digital humanities and/or towards the integration of digital methods into humanities research. There is data love going all around, not just in the commercial field. There is a growing trend of trying to assess the qualities of culture through quantitative data. Projects like Culturomics and Cultural Analytics, proposed by a renowned institution and a renowned name, Harvard and Lev Manovich respectively, have pushed forward the idea that culture and fields traditional to the humanities, such as the analysis and assessment of creativity and its manifestations, of the qualities and immaterial aspects of the human, can be assessed through quantitative data, through numbers and analytics rather than more traditional forms of humanistic analysis. Harvard's Culturomics, for example, aims at a quantitative analysis of culture through the digitization of thousands of books (in collaboration with Google and Encyclopedia Britannica), which turns them into a database in which one can "Type in a word or phrase in one of seven languages (English, French, German, Spanish, Hebrew, Russian, Chinese) and see how its usage frequency has been changing throughout the past few centuries" (Cultural Observatory). In the second one, Lev Manovich's Cultural Analytics, culture is measured on a global level through the analysis of digitized objects such as magazine covers but also, and quite notably, through the trends seen in the traces left by users when they are online. "The data sets will come from two places: media content (from magazines to webpages), and digital traces left when people discuss, create, publish, consume, share, edit, and remix these media" (par. 3). And so, through the (personal) data left behind, Google searches, video uploads and the use of link sharing buttons become not just technologies but indicators of contemporary culture's heartbeat.

In a rather ironic twist, it turns out that the new and avant-garde way to assess the qualities of humans, such as issues regarding identity politics, perceptions and notions of taste, definitions of concepts and associations, is best done through quantitative data. Or so the trend goes. That is, even if these projects do not discard the use of non-quantitative methods, and may even argue for the integration of quantitative methods with qualitative ones, in practice they stop shortly after the datasets have been obtained. Even if they argue for a (digital) humanities in which up-and-coming quantitative methods are integrated with more traditional qualitative humanities methods, the tendency is to showcase the quantitative and focus on that aspect rather than on the integration of both. Post-demographics, a digital methods research approach originally proposed by Richard Rogers that I take up here to further develop and elaborate, partakes of this trend in the humanities. In post-demographics, social relations, a wide range of tastes (including political and ideological preferences) and, lately, immaterial skills are measured and predicted through the scraping and analyzing of multitudes of online profiles available in OSN. In that sense it is more

of an analytics than an analysis, though the idea is that post-demographics as a research strategy can help shed further light on all sorts of analysis. Post-demographics feeds off the growth of OSN in the past couple of years and the immense amount of data that they gather. With Facebook having over half a billion users (Facebook, Statistics) and Twitter generating an average of 110 million Tweets per day (Chiang), OSN have positioned themselves not only at the forefront of social media but also as a rich data environment. The huge quantities of data available in them are complemented by the specific characteristics of some OSN, such as Facebook, where strict policies regarding the authenticity of users and content lend more value to the data as data grounded in reality, as data that says something in a direct fashion about its users not just as (Internet) users, but as people.

Defining OSN has proven problematic due to the wide variety of formats out there, each with its own idiosyncrasies regarding privacy, what constitutes a network and what activities are carried out within it. Yet they do have one common element: profiles. In their essay "Social Network Sites: Definition, History, and Scholarship," Nicole B. Ellison and Danah Boyd, who has dedicated her academic work to OSN, observe that "While SNSs [social network sites] have implemented a wide variety of technical features, their backbone consists of visible profiles that display an articulated list of Friends who are also users of the system" (211).2 Taking up this commonality, this research departs from the profile as a digital object to explore the mobilization of data in online social networks.

2 In their definition Boyd and Ellison refer to OSN as SNS, or social network sites, rather than online social networks, but both names refer to the same thing. The difference in use lies rather in a theoretical disagreement on which name is more adequate. Here OSN is used because a) it seems to be the most commonly used term (i.e. if one Googles both terms, OSN has considerably more results) and b) it accentuates the media specificity of these systems since, with regard to the mobilization of post-demographic data, they would not really be replicable in any medium other than the Web.

DATA MOBILIZATION: OUTLINE OF THE THESIS

Much of the academic research on OSN centers on questions of privacy and surveillance, both approaching the circulation of data in social networks as a potential or already established threat. This research shares with them an interest in the circulation of data in social networks, but rather than framing it as a threat and dwelling on the problems of the personal in personal data, it focuses on how that data on and from persons is mobilized; it looks at the hows of this constant generation and circulation. This is explored in particular in the first chapter through the notion of governmentality and through a discussion and analysis of post-demographics itself as a research strategy, its conceptual baggage and how it is enabled by digital objects such as profiles and by dynamics seen on the Web, such as the emergence of commercial providers of (personal) data like RapLeaf.

But the idea of this research as a whole is to try and push farther than that. Rather than just sticking to a critique of data mobilization, it is interested in how to critically approach this data constantly mobilized in and through OSN. Literally approach data: how to get it, use it, understand it and actually integrate it with other forms of knowledge. And so the second chapter explores not only how data is mobilized in OSN but also ways in which to access and mobilize data for uses beyond those currently dominating it, which are mainly marketing concerns and needs. In particular it proposes a conceptual hacking of tools intended for marketing, such as Facebook's Create an Ad tool, to access data for academic research in general. This is done through a case study of a contemporary social movement, the University of Puerto Rico student movement, for which a composite profile is built from data gathered using the Create an Ad tool. This data is then articulated and integrated with issues concerning that movement, such as a discussion of how representative the movement is (or is not) of the student population, to showcase ways in which data obtained through post-demographics can be used not just for analytics but for analysis. The idea throughout is to explore concepts, methods and dynamics in practice, while still engaging them critically. Furthermore, the idea of conceptually hacking marketing tools such as Create an Ad is motivated by an intention to find openings for data mobilization and access in an environment that increasingly makes data a necessary commodity, a currency even, while at the same time said currency is severely limited to certain groups and certain knowledges. In the process, this research bridges methods traditional to the humanities, such as close reading and analysis of texts, objects and theories, with digital methods.

DIGITAL METHODS

Digital methods is a term proposed by Richard Rogers to describe a series of relatively new methods that are specific to the digital. That is, this method strategy both works through digital tools and is elaborated out of the logic of digital objects, and in particular of the Web sphere, such as the logic of relevance as defined by ranking systems in search engines. Digital methods surfaces out of the opposition between the digital and the digitized, that is, a distinction between methods and things native to the digital and things that were digitized, translated to it from another medium. But the issue of digital methods is not strictly a matter of medium specificity; it is also a way of approaching the digital as a space of study directly connected to and grounded in reality, rather than something that needs a mediation process (a digitization) in order to pass between two perceivably separate worlds.

For example, the Web, probably the most popular and best known aspect of the Internet and the digital, has seen different stages of content: from starting as a wild and alternate space dominated by animated gifs of aliens and monsters in Web portals, to getting indexed and organized by search engines, to becoming ever more segregated and polished content-wise through customization, personalization and social media.3 Similarly, the Web and the Internet have also seen different stages in how they have been studied and understood theoretically. The conception of the Web as an alternate space can also be seen in the original theoretical conceptualizations of and research approaches to the Internet (and with it, the Web). In The End of the Virtual Rogers describes three periods of research theories and methods with regard to the Internet. In the first period, theories on the Internet tended to refer to it as cyberspace, a space separate from reality. This meant that the Internet, as cyberspace, was not bound by the rules of real life and society: it was the place where gender, race, age and other social determinants did not matter. This meshes quite nicely with Web content dominated design-wise by images of aliens and monsters. Later theorizations started to construct bridges between these two perceived worlds (cyberspace vs. real life on earth, online vs. offline), bringing in tools from the social sciences to ascertain who precisely is using the Internet and how. Concerns like the digital divide (where real life structural differences affect Internet use) started surfacing. The Internet was still a separate

space, but not as much as before; it was now asserted that it was connected to reality, not necessarily opposing or replacing it. One could say that it was no longer an alternate space in a different reality, but instead a separate one in the same field. In that sense, the perceived distance between reality and the Internet (and with it, the Web) was decreasing. The word virtual started becoming more popular as a way to describe what occurred online. And the question of methods for studying the Internet started being broached more often and in a more methodical fashion. Surveys, interviews, observation and participant observation became the preferred methods of inquiry in what is subsequently characterized as virtual methods (Rogers, The End 6). Rogers characterizes these methods as digitized, in that they were originally developed in media other than the Internet and then implemented in it, translated to the digital. They were digitized much like digitization projects in libraries. In contrast to digitized methods are methods that were developed specifically in and for the Internet, which he terms digital methods.

But as remarked before, the proposal for digital methods goes beyond a call for a medium-specific methodological approach. It implies a certain reconceptualization of the Internet with regard to its relation to reality. Thus far one can see a progression from the Internet being theorized as an entirely different alternate space (cyberspace) to a certain bridging of the two worlds with the rise of virtual methods. Digital methods imply seeing the Internet as something that is not only related to reality in a supplementary fashion, but that can in fact capture reality itself. "The issue no longer is how much of society and culture is online, but rather how to diagnose cultural change and societal conditions using the Internet . . . Knowledge claims may be made on the basis of data collected and analyzed by devices such as search engines" (The End 8).

3 In terms of the origins of the Web and how it was conceived as an alternate and wild space, think of the names of the first browsers (Explorer, Safari and Navigator): they carry images akin to the discovery of a new world. For more on a history of the Web and its content as related to methods and theoretical approaches to its study, see Rogers, The End of the Virtual.
And so, instead of seeing the Internet and the Web as a separate space to which one can travel (going online), it is taken as a fact that by now culture is online, or at least a huge part of it, and that it is being archived, analyzed and processed by digital systems such as OSN and digital devices such as the Like Button, as will be explored further in chapter two. Rogers cites the example of Google Flu Trends, a non-commercial project launched in 2008, which anticipates local outbreaks of influenza by counting search engine queries for Flu, Flu symptoms and related terms, and geo-locating the places where the queries have been made (The End 8). It is something that posits the Web, this one far off medium, as far closer to the ground than one might expect (The End 8). Thus digital methods champion the possibility of online groundedness, where one can use the Internet as a tool for finding data that is grounded data: data that in and of itself can say something about reality, indicators about life on the ground rather than just data on online behavior or virtual communities.

This project partakes of a similar view: that things going on online, and especially in a social network that incentivizes and enforces authenticity such as Facebook, can tell something directly about real life issues, not just Internet phenomena. Hence, when looking at the UPR student movement through the conceptual hacking of marketing tools such as the Create an Ad tool, the idea is not to evaluate, say, the phenomenon of online activism as online activism. Rather, this research assumes, and shows through statistical correlations with data from more traditional sources such as the University registration records and the US census, that phenomena seen on Facebook such as the one studied here do correspond to real life; that data gathered through a medium such as Facebook, which enforces authenticity and works from personalization and socialization as grounded in real life, actually is grounded data.
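The counting logic attributed above to Google Flu Trends can be illustrated with a minimal sketch. This is not Google's actual method, which is not described in this text; it only shows, under stated assumptions, what counting queries for certain terms and grouping them by place amounts to. The function name count_flu_queries, the term list and the sample log are all invented for illustration.

```python
from collections import Counter

# Terms named in the text; a real list of "related terms" would be longer.
FLU_TERMS = ("flu", "flu symptoms")


def count_flu_queries(query_log, terms=FLU_TERMS):
    """Count queries that match a flu term, grouped by location.

    query_log is an iterable of (location, query) pairs. This structure
    is a hypothetical stand-in for geo-located search engine queries.
    """
    wanted = {term.lower() for term in terms}
    counts = Counter()
    for location, query in query_log:
        if query.strip().lower() in wanted:
            counts[location] += 1
    return counts


# Invented sample data, for illustration only.
sample_log = [
    ("San Juan", "flu symptoms"),
    ("San Juan", "Flu"),
    ("Ponce", "beach weather"),
    ("Ponce", "flu"),
    ("San Juan", "concert tickets"),
]

counts = count_flu_queries(sample_log)
for location, total in counts.most_common():
    print(location, total)
```

Even in this toy version someone has to decide which terms count as flu-related and how a query is tied to a place, which is to say that the grounded data is already the result of definitions.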

DATA LOVE

This work partakes of the data love going around. It is certainly fixated on the mobilization of data as currency, not just for economic needs but in relation to how power is managed and articulated nowadays. It departs from a reading of Foucault's governmentality as a system that believes in naturalism and liberalism, in respecting the givens of things and letting them happen, letting them circulate and move all around rather than fixing them in space and character, while at the same time demanding, needing, to know them (and so define them) in order to respect them and mobilize them. It is also in love, so to speak, with the possibilities of the relatively new dataset that is post-demographics, with the capacity to gather data, and potentially information (or even knowledge), from aspects of humanities that previously were quite difficult to access in a standardized, automated fashion and in such quantities. But it tries to approach that love critically, with some caution, so that the love is not confused with or exchanged for infatuation, for pure madness and oversimplification. Data is currency, but someone has to assign the values.

This work started with a quote, a premise on the current state of so-called personal data as the new oil of the Internet and the new currency of the digital world. Though the statement may be a bit grandiose in style, it highlights the importance of (personal) data nowadays as something to mobilize, to both generate and circulate, and as such something that is in constant movement, not strictly out of absolutely neutral and natural logics but constantly mediated. The framing of data as a commodity and a currency highlights the defining role of this mobilization of data that comes from persons and that, when communicated, when moved with intention, acquires value as information on humans as an aggregated population rather than as individuals. It is something channeled and managed, for example, through profiles in OSN, a particularly fertile environment for data generation and circulation, especially of post-demographic data. In this mobilization of (personal) data, data can be used as a currency by users wanting to join the system, who give their data, and advertisers' access to that data, in exchange for a service free of charge. But data can also be used as a commodity to sell and buy, circulating through a diversity of channels: from Facebook's own Create an Ad, to the leaking of data gathered through applications, to commercial data providers such as RapLeaf, which integrate data from all sources (including OSN unique IDs) to create a database on online users. Yet what surfaces as particularly new and interesting is the fact that the data that seems to hold the most value as a currency and a commodity in the digital world is post-demographic data. This is the data RapLeaf charges for. This is the data that allows for the Targeting and Advanced Demographics of Facebook's Create an Ad. And this is also the data that can help to assess compatibility between people through their tastes by establishing patterns and correlations in the data of MySpace profiles, as the research project elFriendo did.

ElFriendo, as an inquiry into post-demographics, partakes of a growing trend to assess the qualities of culture and humans through quantitative data. Data as currency and commodity operates not just in the explicitly commercial field but has also become a much valued unit in research trying to bring the quantitative methods and logic of digital objects to traditional subjects of the humanities. From customization and dreams of relevant advertisements to attempts at capturing culture's heartbeat through the automated processing of billions of pieces of data, there seems to be a big data love going all around. This data love, which sometimes operates instead as data infatuation, can be seen in demands for the humanities to get a grip, update themselves and start taking advantage of all the quantitative data being generated through the digital network, in order to make faster and more comprehensive work that ties in with the input of millions of people rather than whatever relatively minuscule research one lonely scholar can do with the limited number of books available in a library. The overall logic seems to run along the lines of "Why waste your time trying to define culture when you can Google it?"54 And Google, an algorithmic machine, a number crunching machine, will show you relevance in numbers, apparent agreements on the meaning of things through rankings determined by algorithms, and so it will show you knowledge in rankings and numbers... or so it seems.55


54 For more on how knowledge, and expectations about knowledge generation and understanding, have become influenced and even shaped by Google and the logic its technological construction proposes, see Nicholas Carr's The Shallows: What the Internet is Doing to Our Brains.

55 Lucas D. Introna and Helen Nissenbaum in particular have articulated a critique of the homogenization of information (and thus potentially knowledge) through search engine logic: of how search engines systematically exclude and give preference to certain websites over others, in what they articulate as search engine politics.

Yet numbers, like words, can only be read in context: in the context of other words, histories and politics, but also in the context of other numbers. Let's take one of the results of this research, the sentence "35% of Puerto Rico's population is on Facebook," and put it next to another of its results: "post-demographics is a research method in which social relations, a wide range of tastes (including political and ideological preferences) and, lately, immaterial skills are measured and predicted through the scraping and analyzing of multitudes of online profiles available in OSN." Especially when contrasted with the second one, the first result seems so incredibly clearer and easier to manage (and not just because it is a shorter sentence). With the second, one may feel the immediate need to ask: but what is this demographics that has a post? And what is this post? Is it post as in the previous part is over and this comes next? Or post as in it builds upon it, looks slightly different, but the previous is still there? And this is just sticking to the first word of the result/sentence. In contrast, "35% of Puerto Rico's population is on Facebook" seems so straightforward. So simple. Just one layer. Everybody knows percentages. Everybody knows population. Certainly everybody knows Facebook. And that's where the lack of numeracy kicks in.56 Because 35% is nothing without thinking at the same time of a 100%, and that 100% needs to be defined. Numbers are also relative and codependent on other numbers, especially when they come in the shape of statistics. Like words are relative and codependent on other words, numbers presented as statistics are


56 I refer here to an idea from Brian Kernighan, a computer scientist currently working at Princeton, who argues that despite the prevalent role of numbers, especially big numbers, nowadays (from budgets, deficits and bailouts to the data output and capacity of technological innovations), there is widespread innumeracy. That is, there is actually little understanding of, or skill at assessing, their meaning and hence their veracity; and so units may be swapped for others (talking about millions when it is actually billions) or even invented (especially when it comes to technology), among other numeracy failings.

relative and codependent upon other numbers.57 They need to be read in contrast to others. And they also have histories. To make sense of "35% of Puerto Rico's population is on Facebook" one needs to know: where is the data from? What was the method to get it? What criteria were considered for data gathering and how were they defined? For example, how was population defined? What is the total and how was it constituted? How were the percentages calculated, relative to the total or to those who reported? But to get real insight into this, one has to have access to data. To be able to question numbers, to assess quantitative data critically, one has to be able to access it not just as an end result, not just in its processed fashion as information (35% of PR's population is on Facebook), but as data in a context. It is for this reason that this research looked not just for a critique and analysis of data mobilization in OSN but for a way to literally approach said data mobilization: to get the data, define it, mobilize it and analyze it in its social context. It is because of this that it looked for a way of accessing data on Facebook through the case study of the UPR student movement. When it comes to New Media technologies, there is a constant bombardment of statistics, usually in the shape of infographics. But, especially with OSN, one can rarely get the data oneself. It is always subject to others and their worldviews: of what populations are worth studying, of how these populations should be defined, of who is seen and represented in the statistics and who is deemed unremarkable because there are not enough users or because it is not profitable to know them.
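The dependence of a percentage on its 100% can be made concrete with a small worked sketch. Only the 1.5 million Facebook users and the 35% figure come from this research; the three denominators below are hypothetical values chosen for illustration, not census figures, and the function names are invented for this example.

```python
def share(count, base):
    """Return count as a percentage of base."""
    if base <= 0:
        raise ValueError("base must be positive")
    return 100.0 * count / base


def implied_base(count, percentage):
    """Return the total that would make count equal to percentage of it."""
    if percentage <= 0:
        raise ValueError("percentage must be positive")
    return 100.0 * count / percentage


def shares_by_base(count, bases):
    """Compute the same count as a percentage of several candidate totals."""
    return {label: round(share(count, base), 1) for label, base in bases.items()}


facebook_users = 1_500_000  # figure reported in this research

# Hypothetical denominators, for illustration only (not census data).
bases = {
    "all residents": 4_000_000,
    "residents aged 13 and over": 3_300_000,
    "residents with Internet access": 2_000_000,
}

for label, pct in shares_by_base(facebook_users, bases).items():
    print(f"{label}: {pct}%")

# The 100% that would make 35% equal to 1.5 million users, assuming both
# figures refer to the same moment and the same definition of population.
print(round(implied_base(facebook_users, 35)))
```

The same count of users yields very different percentages depending on which population is taken as the total, which is why the questions about how population was defined cannot be skipped.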

Think of theories of language that posit that meaning is not intrinsic to the word but that language rather works as a matrix in which one knows a cat is a cat not because there is something inherently catty about cats but because a cat is not a dog, is not a chair, is not running, and so on.

In a way the project proposed here is about finding alternative ways of managing the tensions seen nowadays, when there is more data than ever but it is increasingly privately owned and segmented into different private domains, making access to it difficult if not altogether impossible. The problem being approached in this research is not so much whether personal data should be used for commercial purposes (though it does offer a critical perspective on that), but rather that the fact that it is mobilized for and due to commercial needs places some limitations on it, on how this data is perceived and processed and hence further gathered. And it also places limitations on research that may try to assess, value and mobilize that data in a more critical fashion than that driven by marketing and bottom-line needs. This limitation works in a twofold manner: limited as in not available, and limited as in predefined by needs and concerns that may be relevant to some but not others, that include some but not others. So, for example, one may be able to find data on OSN use for people in the US or other rich countries (be it through press releases made by online companies themselves, marketing websites, or research institutes of countries heavily invested in technology such as the US), but it proves particularly difficult to find something similar for smaller nations. The general justification is that there are not enough users, yet this is a justification based exclusively on quantitative reasoning, and a limited quantitative reasoning at that. As has been argued before, numbers need context, both of qualitative knowledge and of other numbers. To bring it back to the findings of this research, Facebook users from Puerto Rico are barely a blip in the OSN sphere, a measly 1.5 million in a world of hundreds of millions.
But when looked at in relation to other numbers, when framed in context, say in proportion to the general population or in relation to other Web practices, such as most Googled terms and Alexa rankings of Web traffic, that previously insignificant number surfaces as more valuable, as more significant. Yet the only way to find this out is to be able to dive directly into the data and define it according to one's needs and interests as a researcher with a specific contextualized objective rather than those established by others. And so it is about how to get the data, as in to have access to it, and as in to be able to manipulate it and define it enough to "get" it, to understand it, make sense of it and gather meaning from it in a relevant fashion. Targeted audiences and populations are not passively lying around in an external reality waiting to be discovered; they are constituted, they are defined, according to the needs and biases of whoever is doing the targeting. This is increasingly a problem for knowledge generation in an environment where, yes, there is more and more data, but this data is also increasingly owned by private companies which limit access to it. The irony of online life today with regard to data is that, though there are constant claims that privacy is dead, data remains private: privately owned and privately mobilized, even if it is publicly generated and shared. This forces researchers invested in actually getting data from leading OSN such as Facebook to even go against its Terms of Service, as in the case of the MIT researchers who used a crawler to get data from public profiles through Facebook's messaging system. The walls of the OSN garden are shut tight when it comes to researchers. Yet there are ways of working around that. Data found in and through Facebook can be used not just for marketing, as it is mainly employed nowadays, but also for academic research on contemporary phenomena. As has been done here, the Create an Ad tool can be retooled for non-marketing research to study a particular social movement and shed light on a
particular question. In an act of conceptual hacking, the logic of the Create an Ad tool as an instrument for targeting audiences can be translated from marketing needs and concerns to other needs and concerns, for example the study of social movements through data gathered in and through Facebook. One of the usual challenges of research done on and through an online digital medium is precisely this online-ness, which calls into question its relevance for offline realities. Yet things going on online, and especially in a social network that incentivizes and enforces authenticity such as Facebook, can tell something directly about real-life issues, not just Internet phenomena. These tools, which are being used as part of the organization, awareness and mobilization processes of social movements, can also be used as tools to understand those phenomena better. The data retrieved here could then be correlated with other digital methods strategies and with traditional sources and methods to better assess its validity through contextualization. It can be used to gather information and perspectives that have not been covered through other instruments. For example, in places and times when official sources have been compromised, this offers a look into the situation from the data provided directly by the actors involved. It is also up to date, and due to the nature of the medium (OSN) it is data that is constantly being updated and that gets shaped and reshaped live as things are occurring, as movements are being formed and reformed. There are also advantages specific to using Facebook's Create an Ad to gather data from Facebook. The first, and most obvious, is that it is in fact the largest social network in the world and one distinguished by the level of immersion of its users. Second, Facebook's enforcement of authenticity as a key requirement of its system lends further value to the data found in it, as data prone to be grounded in reality, and data that does say things about people and the things they care about. Finally, but no less important, the reach of the data Facebook collects and organizes through the social graph is huge not just in quantity but also in the types of data, something reflected in the variety of things aggregated under Likes & Interests, a particular kind of data (post-demographic data) of increasing value nowadays for both commercial and non-commercial purposes.

