
OIDD 662

BIG DATA & PRIVACY

JUSTIN CHANG, SALOMON COHEN, DANIEL LEE, PARSA SHAHRBABAKI, JAMES QI


OIDD 662 - Enabling Technologies
Fall 2016

Big data and privacy

1. Introduction

We present here our research and analysis on the privacy implications of today's big-data-dependent society. In particular, this work first addresses consumers' perceptions of privacy issues and their responsive behaviors. It then investigates regulators' responses to protect consumers' privacy, as well as the actions companies undertake to meet regulatory requirements. Finally, the research looks at cases where these measures are unsuccessful, identifying how companies can prepare for worst-case scenarios such as hacking.

2. Consumers' perspective

a. Definition of the problem

Big data is the massive collection, processing, and use of consumers' personal information by both companies and government agencies. The internet and related technologies enable these players to gather large volumes of high-quality data on consumers, which are then used either to implement better marketing strategies (companies) or to increase surveillance of citizens' behaviors (government agencies). As noted, such data are collected by means of new technologies. For instance, stores can monitor consumers' spending patterns through loyalty cards; government agencies can surveil people's actions via cameras in public spaces; internet companies can target ads to consumers using their personal e-mail addresses and online behaviors. As these players gather extensive amounts of data (big data) on consumers' and citizens' lives, actions, and preferences, ethical and regulatory concerns arise about protecting the private nature of such data. Our society is questioning the legitimacy of outside organizations invading people's privacy: many consumers feel uneasy giving away private information, companies try to expand their reach into private data, and regulators struggle to find the right balance in the use of big data.

Having said that, we want to highlight the precise scope of our analysis: rather than the issues posed by the illegal misappropriation of personal data, we focus on the concerns raised by the use of private data that are:

- collected by means of new technologies
- in the context of normal and legal business or governmental activities
- with the consent of private users
- usually in exchange for some benefit to the users themselves
- but whose use still represents a grey area in our society's current regulations.

To clarify what we mean, we have compiled the following mutually exclusive and collectively exhaustive list of the privacy issues related to big data:

- Privacy breaches & embarrassment. Description: data used to predict intimate personal details, and breaches and hacks that can reveal embarrassing and other personal information. Example: beginning to market products to a pregnant woman before she has told others in her family.

- Anonymization & data masking. Description: anonymized data sets can be combined to identify individuals. Example: a unique intersection of attributes can lead to the identification of an individual.

- Accuracy & data quality. Description: predictions based on big data analytics are not always accurate, due to inaccurate or unverified data or flawed algorithms; the resulting decisions can be inappropriate and harmful for individuals. Example: information from social media can be used to wrongly screen and select job applicants.

- Discrimination. Description: big data can generate automated discrimination. Example: a bank may not be able to tell an applicant's race from a credit application (asking would be illegal), but could deduce race from a wide variety of data (collected online and through IoT) and turn down the individual's loan.

- Government exemptions. Description: governmental agencies are subject to lighter privacy regulations. Example: the FBI asking Apple to unlock a criminal's phone.

- Data brokerage & misuse. Description: numerous companies collect and sell consumer profile data, which can be used against consumers. Example: an insurance company can charge a higher premium to a consumer who purchases books on Amazon related to a disease.
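The anonymization issue can be made concrete with a k-anonymity check, a standard way to measure re-identification risk: a data set is k-anonymous when every combination of quasi-identifiers (attributes such as ZIP code, birth year, and sex, which are not names but can be linked to outside data) is shared by at least k records. The Python sketch below is only an illustration; the records, field names, and the function name `reidentification_risk` are hypothetical.

```python
from collections import Counter

def reidentification_risk(records, quasi_identifiers):
    """Return (share of records unique on the quasi-identifiers, k).

    k is the size of the smallest group of records sharing the same
    quasi-identifier values; k == 1 means at least one person is singled out.
    """
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    group_sizes = Counter(keys)
    unique = sum(1 for key in keys if group_sizes[key] == 1)
    return unique / len(records), min(group_sizes.values())

# Hypothetical "anonymized" records: names removed, other attributes kept.
records = [
    {"zip": "19104", "birth_year": 1990, "sex": "F", "diagnosis": "flu"},
    {"zip": "19104", "birth_year": 1990, "sex": "F", "diagnosis": "asthma"},
    {"zip": "19104", "birth_year": 1985, "sex": "M", "diagnosis": "diabetes"},
    {"zip": "19103", "birth_year": 1990, "sex": "F", "diagnosis": "flu"},
]

share_unique, k = reidentification_risk(records, ("zip", "birth_year", "sex"))
print(share_unique, k)  # 0.5 1
```

In this toy data, half of the records are unique on just three ordinary attributes, so anyone who knows a neighbor's ZIP code, birth year, and sex could link that neighbor to a diagnosis. Coarsening or suppressing attributes raises k, at the cost of less useful data.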

b. Consumers' responses to privacy issues and concerns

While they wait for regulators to enact clear laws governing the use of big data, consumers respond to their concerns by trying to defend their privacy through autonomous protective behaviors, especially on the internet, which is both the largest source of big data for outside organizations and, arguably, the domain where the most sensitive private information is shared. As the chart below shows, a common practice among privacy-concerned users is to reduce the amount of private information publicly shared on their social media profiles; this behavior has increased over the years, along with companies' ever-increasing reliance on big data:

Figure 1: Social Media Disclosure Behavior

Other common defensive behaviors of consumers when surfing the web include: sharing demographic details but not contact details; reducing the number of vendors with which they share private information, to reduce the likelihood of leaks; not sharing location data on smartphones, to avoid targeted ads; and posting carefully on social networks, to minimize the possibility of negative screening when applying for jobs.

Nevertheless, research by the Pew Research Center shows that consumers are increasingly aware of the role of new technologies in their lives and increasingly understand the trade-offs posed by big data and the digital economy: by collecting and analyzing consumers' private data, companies can tailor their offers to consumers, who could benefit, for instance, from personalized discounts. Therefore, the research shows, consumers are becoming more willing to share their private information and give away some privacy in exchange for tangible benefits, such as saving money through loyalty cards, gaining free access to useful information and services (e.g. Gmail, Wikipedia), and facilitating social and business encounters (e.g. Facebook, LinkedIn). However, according to the research, the conditio sine qua non for customers to be willing to trade their private information appears to be: no spam, no risk of data breaches, and no violation of intimacy tied to location data. Some findings of the cited research are represented in the exhibit below, which shows how acceptable consumers consider some privacy-invading practices to be.

Figure 2: Public Perception of Data Collection


c. Privacy Concerns by the Numbers

Go-Globe, a website design company, reports in its study that 64% of internet users have some degree of privacy concern when browsing the internet, while Pew Research reports that most internet users would like to be anonymous at least once in a while. In aggregate statistics compiled by business2community in the online article "The State of Online Privacy," the numbers clearly show significant concern over online privacy across geographies, demographics, and content. Geographically, the percentage of internet users with some degree of concern about their online privacy ranged from a high of 81% in Latin America to a low of 51% in Europe; North America was the third lowest at 59%. The following are a few highlights from the compiled statistics:

- 68% believe current laws are not good enough
- 78% are worried about a lack of privacy as a result of so much information about them being available on the internet
- 70% noted that exposure of personal information online reduced their level of social media use and presence
- 60% have given inaccurate information about themselves on social media as a safety precaution
- 55% have asked someone to remove an online post or untag them because of privacy concerns
- 1 in 4 used unsecured public wi-fi, leaving their personal information in the open
- 91% believe it is not fair for a company to collect personal information without their knowledge in exchange for a discount

Furthermore, survey respondents reported that the following information about themselves was available online:

Information available online      Respondents
Photo of you                      66%
Birth date                        50%
Email address                     46%
Company you work for              44%
Home address                      30%
Cell number                       24%
Home phone number                 21%

Table 1: Privacy Survey Results
3. Government perspective

We have evaluated several key and lesser-known government regulations that relate both to data protection and to the government's access to data. These include both foreign and domestic regulations, and several are being reformed and debated in Congress. In this section, we explore some of the most important acts, which the government has frequently invoked.

a. Regulations on Data Protection:

Computer Fraud and Abuse Act (CFAA)

The Computer Fraud and Abuse Act (CFAA) has been one of the most abused regulations and has been used in many prosecutions since it was enacted. Congress enacted this anti-hacking statute in 1986 as an amendment to an existing computer fraud law that was included in the Comprehensive Crime Control Act of 1984. It made it a federal crime to access and share protected information without authorization. However, the act was written loosely enough that it has been abused ever since.

i. Robert Morris Jr. & the first computer worm

In 1990, the CFAA was used to convict Robert Morris Jr. for releasing the first computer worm in 1988. Since then, it has been used to convict countless more hackers, low and high profile alike. In 1994, the law was amended to allow civil actions as well; this allowed companies to sue workers who steal company secrets.

ii. Prosecution of Lori Drew

In 2008, 49-year-old mother Lori Drew was prosecuted for creating a fake Myspace profile to cyberbully a teenage girl. Drew conspired with her daughter to create the Myspace profile of a nonexistent boy, draw the girl into an online relationship with him, and then humiliate her; the girl subsequently committed suicide. To prosecute Drew, prosecutors adopted a novel interpretation of the CFAA. They argued that Drew had accessed Myspace's system without authorization because, by creating a fake account, she violated the website's terms of service, which require users to provide factual information about themselves and not to use Myspace to harass people. However, the judge ruled that this interpretation was too vague and went beyond the scope of the CFAA. Had the judge upheld the conviction, violating the terms of service of any website would have become a federal crime.

iii. MIT Students at the Hacker Conference

At the 2008 Def Con hacker conference, three MIT students were barred from giving a presentation on a flaw in the Massachusetts Bay Transportation Authority (MBTA) ticketing system. The MBTA sought to temporarily bar the students from presenting the flaw, and a judge invoked the CFAA on the grounds that speaking about the flaw would enable other people to hack the system. This ruling implied that talking about hacking was the same thing as actual hacking. When the MBTA later sought to make the gag order permanent, a different judge ruled that the CFAA does not apply to speech.

Gramm-Leach-Bliley Act & Health Insurance Portability and Accountability Act

These acts govern the financial services (GLBA) and healthcare (HIPAA) industries, requiring large service providers to protect against data threats and maintain data integrity. Compliance is mandatory, and firms in these highly regulated industries must protect private information.

Data Protection Act (1998)

The UK's Data Protection Act of 1998 made it illegal to use personal data for any purpose other than the one for which it was collected. In addition, the data cannot be seen or used without permission. The act applies to any storage system that holds data about a living person, including computers, servers, and external hard drives.

b. Regulations of Government Access to Data:

Electronic Communications Privacy Act (ECPA)

Gives the government the right to obtain electronic files (email, Facebook messages, cloud files, etc.) that are more than 180 days old with only a subpoena rather than a warrant. Its reach has since extended to allow employers to track employees and the government to track civilians.

Cyber Intelligence Sharing and Protection Act (CISPA)

Would have dictated how companies share information about cyber threats with the federal government. CISPA was a proposed amendment to the National Security Act of 1947 to address cybercrime; however, its loose definition of cyber threats prevented its passage.

Trans-Pacific Partnership (TPP)

Sets standards for intellectual property and related issues across international borders, including the free movement of data across borders. Though heavily debated in the 2016 election campaign, it could open the door to more cross-border data collection.

Computer Misuse Act of 1990

The UK introduced this act to counter threats that have become more frequent over time, such as hacking. Organizations rely on it because they do not tolerate any employee or customer gaining access to other users' computers and profiles on their systems. The act also allows government operatives to access data without a warrant.

c. The Patriot Act

Within 45 days of 9/11, the Patriot Act was passed to make it easier for the government to spy on American citizens by expanding its ability to monitor phone, email, and other communications data. Section 215, in particular, has been used to collect the phone records of virtually every person in the US. It also allows secret court orders for the collection of tangible things related to government investigations, which could include anything from identification records to internet browsing patterns.


i. Section 215 - NSA Initiatives

Alongside its use of Section 215 of the Patriot Act, the NSA ran several initiatives cloaked in secrecy. Leaked and government-released documents revealed that these initiatives, based on a range of legal authorities, included Section 702, Executive Order 12333, PRISM, UPSTREAM, MYSTIC, and 215 Metadata.

ii. FISA Amendments Act - Section 702

The FISA Amendments Act allows for the large-scale collection of internet communications that travel between the US and foreign countries. It can be applied to any internet communication hosted on a server outside the US or routed across the US border, and the collected data is stored in NSA databases. The law prohibits intentionally targeting domestic citizens, but allows the targeting of non-US persons who are reasonably believed to be outside the US.

iii. Executive Order 12333

Executive Order 12333 is the authority the NSA uses when other authorities aren't aggressive enough in collecting data, or when the NSA isn't collecting as much data as it would like. It applies when an internet communication provider passes data from server to server, sometimes including servers outside the US. Once the data passes outside the US, the NSA retains a copy.

iv. PRISM

PRISM is how the government extracts data from internet communication providers directly from their servers. In a sense, PRISM deputizes its partners (Google, Yahoo, Facebook, Skype, Apple, etc.) to monitor all of the data. The types of data involved include email, photos, videos, and much more online user data.

v. UPSTREAM

UPSTREAM is used by the NSA to capture citizens' data as it transits the internet. Once captured, the data is copied and searched. The NSA does this by installing surveillance equipment at several points along the internet backbone: the networks of high-capacity cables and internet routers.

vi. MYSTIC

With this initiative, the NSA can collect data on the voice calls of entire countries. In countries including the Bahamas, Mexico, the Philippines, and Kenya, the NSA can record call data such as the duration of each call, and in some countries even the content of the calls.

vii. Section 215 Metadata

The government uses 215 Metadata to compel companies to hand over any information relevant to intelligence. This means that the government can see with whom you are communicating and sharing your data.
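Why metadata alone is revealing can be illustrated with a small contact graph: even without any call content, records of who called whom let an analyst expand outward from one person to their contacts, and then to their contacts' contacts. The Python sketch below is a hypothetical illustration; the record format (caller, callee, duration in seconds) and the function names are assumptions, not a description of any actual NSA system.

```python
from collections import defaultdict

def build_contact_graph(call_records):
    """Build an undirected who-talks-to-whom graph from call metadata."""
    graph = defaultdict(set)
    for caller, callee, _duration in call_records:
        graph[caller].add(callee)
        graph[callee].add(caller)
    return graph

def contacts_within(graph, start, hops):
    """Return everyone reachable from `start` in at most `hops` calls."""
    seen = {start}
    frontier = {start}
    for _ in range(hops):
        # Expand one hop outward, skipping people already found.
        frontier = {n for node in frontier for n in graph[node]} - seen
        seen |= frontier
    return seen - {start}

# Hypothetical metadata: no content, only parties and call durations.
calls = [("A", "B", 120), ("B", "C", 30), ("C", "D", 45), ("E", "F", 10)]
graph = build_contact_graph(calls)
print(sorted(contacts_within(graph, "A", 2)))  # ['B', 'C']
```

Each additional hop can sweep in many people who have no direct connection to the original target, which is why the scope of metadata queries matters for privacy.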

4. Company responses

Though notoriously resistant to change, Congress has attempted to keep up with modern technology as best it could. The regulations mentioned above are some of its attempts to keep companies and the government itself in check. As regulations have changed, companies have needed to adapt in order to stay in the good graces of the government. Or have they?

The US government has attempted on numerous occasions to use technology to keep better track of residents and citizens through surveillance of online communications. There have also been discussions questioning net neutrality and proposing to mandate government backdoors into technologies we interact with daily. In this environment, technology companies need to balance government requests against maintaining consumer trust and privacy, and their responses to changing regulation vary widely. With that in mind, it is important to understand which companies lean more toward the user and which tend to work more often with the government.


a. Evaluation Criteria

There have been a number of studies looking into how companies react to changing regulation. When it comes to protecting users' digital rights, the Electronic Frontier Foundation (EFF) rates the titans of Silicon Valley in the following five categories.

i. Industry-Accepted Best Practices: This category combines three sub-criteria that have been widely adopted within the industry and are now considered essential to protecting users.

- Warrant requirement before handing over data: Are governments required to obtain a judicial warrant before the company will hand over user data to them?
- Transparency report: Does the company publish a transparency report that includes the number of times the government has requested user data and the number of times the company has complied?
- Law enforcement guide: Does the company publish how it responds to requests from the government?

ii. Informing users about government requests: Has the company promised to inform its users when the government wants their data, unless prohibited by law? The main rationale is to give users a chance to defend themselves by fighting the government request in court.

iii. Disclosure of data retention policy: This criterion concerns whether companies disclose how long they retain their users' data and make it available to the government. It specifically applies to data that isn't visible to users, such as the results of analysis done on their data, logs of users' IP addresses, and any content that users have deleted. The criterion considers only whether the retention period is disclosed, not its length.

iv. Disclosure of government requests to remove user data: Government censorship can take different forms, and this is one of them. This criterion acknowledges companies that publish the number of times the government has requested removal of user-generated content or suspension of users' accounts, and the number of times the company has complied with such requests. Since censorship is a serious issue, this criterion also evaluates whether the company has proper legal processes for dealing with such requests.

v. Opposition to government backdoors: With the recent clash between the FBI and Apple over access to an iPhone that belonged to one of the San Bernardino attackers, the issue of giving the government backdoor access to user data has gained prominence. Most cybersecurity experts agree that government-mandated backdoors could cause serious security issues and oppose the idea. This criterion looks for any public declaration by the company of its position on this issue.

b. Evaluation Summary

The following nine companies satisfied every criterion listed above: Adobe, Apple, CREDO, Dropbox, Sonic, Wickr, Wikimedia, WordPress, and Yahoo. While these are the shining stars of this study, two major telecom providers, Verizon and AT&T, lag behind, each passing only one of the five criteria. The vast majority of the companies evaluated oppose government-mandated backdoors, to the delight of security experts and users alike, with the notable exceptions of Reddit and Verizon. A more complete company-by-company result can be found in the appendix.

5. Liability

a. History

Managing privacy in the modern age of big data is no simple task. As previously discussed, this is a multi-party discussion in which consumers, companies, and regulators weigh their viewpoints in determining how to treat the data that is produced. Consumers choose to be online, or modern society practically requires it. With the introduction of cookies and other online trackers, consumers' choices are tracked and stored for later use. Companies collect this online information: they track, store, and build models around consumer data to optimize their decision-making. As consumer information is considered private, regulators must determine what is and what is not fair use of the data.
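The cookie mechanism mentioned above can be sketched in a few lines: on a visitor's first request, the server assigns a random identifier and asks the browser to store it; the browser then sends it back with every later request, so the server can link all of that visitor's page views. The Python sketch below uses the standard library's `http.cookies` module; the cookie name `visitor_id`, the one-year lifetime, the in-memory `visit_log`, and the page paths are hypothetical choices made for illustration.

```python
import secrets
from collections import defaultdict
from http.cookies import SimpleCookie
from typing import Tuple

# Hypothetical in-memory store: visitor ID -> pages that visitor has viewed.
visit_log = defaultdict(list)

def handle_request(cookie_header: str, page: str) -> Tuple[str, str]:
    """Log a page view and return (visitor_id, Set-Cookie value or "")."""
    incoming = SimpleCookie(cookie_header)
    if "visitor_id" in incoming:
        # Returning visitor: the browser sent back the ID issued earlier.
        visitor_id = incoming["visitor_id"].value
        set_cookie = ""
    else:
        # First visit: issue a random, persistent identifier.
        visitor_id = secrets.token_hex(8)
        outgoing = SimpleCookie()
        outgoing["visitor_id"] = visitor_id
        outgoing["visitor_id"]["max-age"] = 60 * 60 * 24 * 365  # one year
        set_cookie = outgoing["visitor_id"].OutputString()
    visit_log[visitor_id].append(page)
    return visitor_id, set_cookie

# The first request carries no cookie; the second sends back the stored one.
vid, header = handle_request("", "/home")
handle_request(header.split(";")[0], "/books/diabetes")
print(visit_log[vid])  # ['/home', '/books/diabetes']
```

Nothing in the cookie itself names the person; the privacy concern comes from the growing, linkable history attached to the identifier, which can later be joined with an e-mail address or purchase record.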

The previous sections offered substantial analysis of the give and take between these stakeholders. However, these actions are not always enough to protect consumers and the data companies store. Hacking, the act of accessing or stealing information through technological loopholes, has affected data collectors for decades. In the age of big data, it has become an even greater problem, as more, and more sensitive, information is being stored than ever before. In 2016 alone, over two billion records were stolen, many of them from well-known, reputable companies and federal agencies.

Some of the biggest hacking scandals of the last few years include those of Experian-Court Ventures, Target, JPMorgan Chase, the IRS, and Yahoo. In the Experian-Court Ventures case, 200 million files were accessed by an international fraud operation, which used them to construct false identities. Though investigations found limited company responsibility, Experian is still under fire from both internal and class-action lawsuits. Target, on the other hand, had around 50 million files accessed; this breach led it to pay out over $100 million in settlements to major banks as well as consumers. JPMorgan continues to deal with a cyber-attack that accessed 83 million accounts; though some of the hackers have been arrested and extradited, the firm has not been assigned any blame or settlement fee. The largest recently published data breaches are those of Yahoo!, where 500 million accounts were accessed in 2014 and potentially 1 billion in 2013. Reports that these attacks were state-sponsored add to the question of who is to blame, and these concerns are putting Verizon's $4.83 billion acquisition of Yahoo in question.

b. Assigning Guilt
Data breaches can lead to a wide range of outcomes. First and foremost, both internal and federal investigations are conducted to discover the culprit and how the hack was carried out. Data breaches occur in varied ways; possible causes include unchecked sale of data, stolen hardware, employee mistakes, bug exploitation, and cyber-attacks. Though it would be easy to assign blame to those conducting the illicit behavior, the resulting justice (payments, jail time) often does not make up for the damage done. Thus, it takes ample discussion to determine who is liable and for what amount.

So who is to blame? The answer is difficult, as there has been little consistency in dealing with large companies. In some cases, such as the Target data breach, the company was very much on the hot seat. Because consumer data was accessed, consumers filed a class-action lawsuit against Target, which was settled for $10 million. However, much larger sums were paid to large banks and credit card companies such as Visa and MasterCard. In this scenario, Target had to deal not only with customer trust and loyalty, but also with major partners who had to cover the losses, both monetarily and in terms of their shared data. These partner settlements ended up costing Target upwards of $106 million, far more than the amount given to consumers. This would seem to signify that the breached company is responsible for all the losses. However, that is not necessarily the case.

Conversely, in many cases, companies are much slower to respond to complaints. In both the JPMorgan and Experian-Court Ventures cases, neither firm has publicly paid any settlements. JPMorgan has benefited from the US's ability to extradite involved hackers from other countries. Experian-Court Ventures, though enjoying a similar luxury, is still dealing with the fact that it sold information without thoroughly vetting the buyer and continued to accept payments after discovering some misuse of the data. Though there is a clear culprit in this case, no final judgment on repercussions has been reached since the 2012 breach. Internal and class-action lawsuits continue to concern the company, but there is no clear conclusion yet.


c. Hack Protection

Given the high potential losses from hacking, it is important to invest heavily in data protection. Though it is costly to maintain, without it companies would be liable for even greater data losses and would face more customer skepticism. Companies need to be proactive in how they prepare for hacking attacks, both strategically and technically.

From a strategic perspective, it is vital that a company first and foremost is aware of and abides by applicable law and legal precedent. In the healthcare or financial services industries, that means complying with HIPAA and GLBA. In other industries, companies need to be aware of the other laws and legal discussions taking place to determine how best to protect their customers' privacy rights. Second, companies need to train employees at all levels to recognize potential cyber threats. This applies especially to non-technical hacks (social engineering), in which hackers obtain information directly from the company, e.g. by calling a service provider and pretending to be the customer. Third, the company needs to set up disaster plans: protocols to follow when technical staff notice something awry in the company's database, so that they can act as quickly as possible and prevent further losses.

From a technical perspective, there are a number of components a company must invest in. On the partnership side, it must confirm that all platforms are hosted by reputable providers. Companies must set up strong firewalls, cyber protection, and encryption software. It is also necessary to implement strong authentication protocols, such as hard-to-guess passwords and two-factor authentication. Most importantly, large companies must build an IT security team with real-time analytics to stay constantly aware of potential threats. By taking each of these steps, companies can best prepare themselves against cyber criminals.
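Two-factor authentication is commonly implemented with time-based one-time passwords (TOTP, RFC 6238), which an authenticator app generates from a secret shared with the server. The Python sketch below, using only the standard library, shows the core computation: an HMAC-SHA1 over the current 30-second time step, truncated to six digits as in HOTP (RFC 4226). It is a minimal illustration rather than a production implementation; secret provisioning, rate limiting, and replay protection are omitted, and the function names are our own.

```python
import hashlib
import hmac
import struct
import time
from typing import Optional

def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """HMAC-based one-time password (RFC 4226) for a given counter."""
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F  # dynamic truncation offset
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)

def totp(secret: bytes, at: Optional[float] = None, step: int = 30, digits: int = 6) -> str:
    """Time-based one-time password (RFC 6238): HOTP over the current time step."""
    now = time.time() if at is None else at
    return hotp(secret, int(now // step), digits)

def verify(secret: bytes, submitted: str, at: Optional[float] = None,
           step: int = 30, window: int = 1) -> bool:
    """Accept codes from the current step or +/- `window` steps (clock drift)."""
    now = time.time() if at is None else at
    return any(
        hmac.compare_digest(totp(secret, now + drift * step, step), submitted)
        for drift in range(-window, window + 1)
    )

# RFC test secret; at t = 59 s the 8-digit RFC 6238 value is 94287082.
secret = b"12345678901234567890"
print(totp(secret, at=59))  # 287082
```

Because the code changes every 30 seconds and depends on a secret the attacker does not hold, a stolen password alone is no longer enough to log in.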
6. Conclusions

As we move further into the digital age, we are presented with increasingly complex issues that have never been seen before, most notably in the collection and use of private, personal information. Such information is now collected and used by virtually every industry to remain competitive and serve customers better. While this data offers clear benefits to both consumers and companies, it could be abused in the wrong hands. Over time, consumers have become more aware of the implications of their private data being publicly available on the internet and have begun taking steps to protect themselves; however, where there are clear and tangible benefits, consumers are still very willing to share their information.

Governments have also recognized the importance of this issue and have enacted regulations such as the US Computer Fraud and Abuse Act and the UK Data Protection Act of 1998, in addition to prosecuting those who try to steal or abuse user information on the internet. While some in government have recognized the need to protect consumers from big data, others, most notably the NSA, have tried to take advantage of the increasing availability of information in the name of national security. Despite the efforts of some to regulate government access to data through legislation such as the Electronic Communications Privacy Act and the Cyber Intelligence Sharing and Protection Act, the NSA has faced little obstruction in its quest for data.

The task of protecting consumer data therefore falls largely on the giants of the technology industry. In its latest report, the Electronic Frontier Foundation found that while the overwhelming majority of technology companies oppose giving the government backdoor access to their user data, some companies, such as Verizon and AT&T, have fallen behind in consumer data protection.

The world of big data is a complex and multifaceted network. Navigating it may be difficult, but only by being aware of all the factors behind big data can users traverse this world privately and securely.

Sources

https://www.secureworldexpo.com/10-big-data-analytics-privacy-problems
http://www.csoonline.com/article/2855641/big-data-security/the-5-worst-big-data-privacy-risks-and-how-to-guard-against-them.html
https://epic.org/privacy/big-data/
http://www.informationweek.com/strategic-cio/security-and-risk-strategy/top-data-privacy-issues-to-scare-you-in-2016/a/d-id/1323752
http://www.business.com/technology/privacy-and-security-issues-in-the-age-of-big-data/
https://www.isaca.org/Groups/Professional-English/big-data/GroupDocuments/Big_Data_Top_Ten_v1.pdf
https://arxiv.org/ftp/arxiv/papers/1601/1601.06206.pdf
http://theconversation.com/big-data-security-problems-threaten-consumers-privacy-54798
http://searchsecurity.techtarget.com/feature/Managing-big-data-privacy-concerns-Tactics-for-proactive-enterprises
http://www.business2community.com/infographics/state-online-privacy-infographic-01567181#gCt2CdU98UUL84iF.97
http://www.pewinternet.org/2016/01/14/privacy-and-information-sharing/
https://www.democraticmedia.org/
https://www.cmu.edu/dietrich/sds/docs/loewenstein/PrivacyHumanBeh.pdf
https://www.eff.org/who-has-your-back-government-data-requests-2015
http://www.pcworld.com/article/2052813/3-essential-techniques-to-protect-your-online-privacy.html
http://www.usatoday.com/story/money/personalfinance/2016/04/16/8-ways-protect-your-privacy-online/83056240/
https://www.microsoft.com/en-us/safety/online-privacy/prevent.aspx
http://www.nytimes.com/2015/07/08/technology/code-specialists-oppose-us-and-british-government-access-to-encrypted-communication.html
http://www.business2community.com/infographics/state-online-privacy-infographic-01567181#WjVHrixgTZFqAeQy.97
https://www.wired.com/2014/11/hacker-lexicon-computer-fraud-abuse-act/
http://www.networkworld.com/article/2164315/lan-wan/4-internet-privacy-laws-you-should-know-about.html
https://www.eff.org/foia/section-215-usa-patriot-act

APPENDIX
