
Analyzing and Securing

Social Media:
Cloud-based
Assured Information Sharing
Dr. Bhavani Thuraisingham
The University of Texas at Dallas (UTD)

September 18, 2015


Outline
Objectives
Assured Information Sharing
Layered Framework for a Secure Cloud
Cloud-based Assured Information Sharing
Cloud-based Secure Social Networking
Other Topics
Secure Hybrid Cloud
Cloud Monitoring
Cloud for Malware Detection
Cloud for Secure Big Data
Education
Directions
Related Books
Team Members
Sponsor: Air Force Office of Scientific Research
The University of Texas at Dallas
Dr. Murat Kantarcioglu; Dr. Latifur Khan; Dr. Kevin Hamlen; Dr.
Zhiqiang Lin, Dr. Kamil Sarac
Sub-contractors
Prof. Elisa Bertino (Purdue)
Ms. Anita Miller, Late Dr. Bob Johnson (North Texas Fusion Center)
Collaborators
Late Dr. Steve Barker, Dr. Maribel Fernandez, King's College London (EOARD)
Dr. Barbara Carminati; Dr. Elena Ferrari, U of Insubria (EOARD)
Objectives
Cloud computing is an example of computing in which dynamically scalable
and often virtualized resources are provided as a service over the Internet.
Users need not have knowledge of, expertise in, or control over the
technology infrastructure in the "cloud" that supports them.
Our research on Cloud Computing is based on Hadoop, MapReduce, and Xen.
Apache Hadoop is a Java software framework that supports data-intensive
distributed applications under a free license. It enables applications to work
with thousands of nodes and petabytes of data. Hadoop was inspired by
Google's MapReduce and Google File System (GFS) papers.
XEN is a Virtual Machine Monitor developed at the University of Cambridge,
England.
Our goal is to build a secure cloud infrastructure for assured information
sharing and related applications
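As a quick illustration of the MapReduce programming model that this work builds on, here is a minimal Hadoop word-count job in Java; the class name and file paths are illustrative only and are not part of the UTD prototype.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Mapper: emits (word, 1) for every token in the input split
  public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reducer: sums the counts for each word
  public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      context.write(key, new IntWritable(sum));
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}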
Information Operations Across Infospheres:
Assured Information Sharing

Objectives
Develop a framework for secure and timely data sharing across infospheres
Investigate access control and usage control policies for secure data sharing
Develop innovative techniques for extracting information from trustworthy, semi-trustworthy and untrustworthy partners

Figure: Coalition information sharing. Components holding the data/policy for Agency A, Agency B and Agency C each publish data/policy to the coalition.
Scientific/Technical Approach
Conduct experiments as to how much information is lost as a result of enforcing security policies in the case of trustworthy partners
Develop more sophisticated policies based on role-based and usage-control-based access control models
Develop techniques based on game-theoretic strategies to handle partners who are semi-trustworthy
Develop data mining techniques to carry out defensive and offensive information operations

Accomplishments
Developed an experimental system for determining information loss due to security policy enforcement
Developed a strategy for applying game theory to semi-trustworthy partners; simulation results
Developed data mining techniques for conducting defensive operations against untrustworthy partners

Challenges
Handling dynamically changing trust levels; scalability
Our Approach
Policy-based Information Sharing
Integrate the Medicaid claims data and mine the data;
Enforce policies and determine how much information has
been lost (Trustworthy partners);
Application of Semantic web technologies
Apply game theory and probing to extract information from
semi-trustworthy partners
Conduct active defence and determine the actions of an
untrustworthy partner
Defend ourselves from our partners using data analytics
techniques
Conduct active defence: find out what our partners are
doing by monitoring them, so that we can defend
ourselves in dynamic situations
Policy Enforcement Prototype

Figure: Policy enforcement prototype for the coalition.
Layered Framework for Assured Cloud Computing

Figure 2. Layered framework for assured cloud computing: an applications layer governed by policies (XACML, RDF), QoS and resource allocation; a HIVE/SPARQL query layer; a Hadoop/MapReduce storage layer; a XEN/Linux VMM layer; and a secure virtual network monitor, with cloud monitors and risk/cost considerations spanning the layers.
Secure Query Processing with
Hadoop/MapReduce
We have studied clouds based on Hadoop
Query rewriting and optimization techniques designed and
implemented for two types of data
(i) Relational data: Secure query processing with HIVE
(ii) RDF data: Secure query processing with SPARQL
Demonstrated with XACML policies
Joint demonstration with King's College London and the University of Insubria
First demo (2011): Each party submits their data and policies
Our cloud will manage the data and policies
Second demo (2012): Multiple clouds
Fine-grained Access Control with Hive
System Architecture

Table/View definition and loading:
Users can create tables as well as
load data into tables. Further, they
can also upload XACML policies
for the table they are creating.
Users can also create XACML
policies for tables/views.
Users can define views only if
they have permissions for all
tables specified in the query used
to create the view. They can also
either specify or create XACML
policies for the views they are
defining.
CollaborateCom 2010
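The sketch below illustrates the enforcement idea only (it is not the prototype code): an XACML-style permission check runs before a CREATE VIEW statement is forwarded to HiveServer2 over JDBC. The permits() stub, table names, view definition and connection string are all hypothetical placeholders.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class SecureViewCreation {

  // Hypothetical stand-in for an XACML PDP decision (e.g., a call to a
  // policy decision point); always denies here so the example is inert.
  static boolean permits(String subject, String action, String resource) {
    return false; // replace with a real XACML evaluation
  }

  public static void main(String[] args) throws Exception {
    String user = "alice";
    // Base tables referenced by the query that defines the view
    String[] baseTables = {"claims", "patients"};
    String viewDdl =
        "CREATE VIEW texas_claims AS "
      + "SELECT c.claim_id, c.amount FROM claims c "
      + "JOIN patients p ON c.patient_id = p.patient_id "
      + "WHERE p.state = 'TX'";

    // A user may define a view only with permission on all base tables
    for (String table : baseTables) {
      if (!permits(user, "read", table)) {
        throw new SecurityException(user + " lacks read access on " + table);
      }
    }

    // Forward the DDL to Hive over JDBC (endpoint is illustrative)
    Class.forName("org.apache.hive.jdbc.HiveDriver");
    try (Connection conn = DriverManager.getConnection(
             "jdbc:hive2://localhost:10000/default", user, "");
         Statement stmt = conn.createStatement()) {
      stmt.execute(viewDdl);
    }
  }
}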
SPARQL Query Optimizer for Secure
RDF Data Processing
Goals: build an efficient storage mechanism using Hadoop for large amounts of data (e.g., a billion triples); build an efficient query mechanism for data stored in Hadoop; integrate with Jena.
Developed a query optimizer and query rewriting techniques for RDF data with XACML policies, implemented on top of Jena.

Figure: System architecture. A web interface accepts new data and queries and returns answers; a data preprocessor (N-Triples converter, prefix generator, predicate-based splitter, predicate-object-based splitter) prepares data for the server backend; the MapReduce framework parses, validates and rewrites queries against XACML policies (XACML PDP, policy-based query rewriter), then generates and executes the query plans.

IEEE Transactions on Knowledge and Data Engineering
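A minimal sketch of the policy-based rewriting idea using the Jena API (package names assume Jena 3.x, org.apache.jena): a restricted predicate is filtered out of the query before execution. The filter predicate, query and data file are illustrative; the actual optimizer rewrites query plans executed on Hadoop rather than an in-memory model.

import org.apache.jena.query.Query;
import org.apache.jena.query.QueryExecution;
import org.apache.jena.query.QueryExecutionFactory;
import org.apache.jena.query.QueryFactory;
import org.apache.jena.query.QuerySolution;
import org.apache.jena.query.ResultSet;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;

public class PolicyAwareQuery {

  public static void main(String[] args) {
    Model model = ModelFactory.createDefaultModel();
    model.read("data.nt", "N-TRIPLES"); // illustrative RDF data file

    // Original user query: fetch all triples.
    // Rewritten query: a FILTER derived from the access control policy
    // excludes triples whose predicate the requester may not see.
    String rewritten =
        "SELECT ?s ?p ?o WHERE { ?s ?p ?o . "
      + "FILTER (?p != <http://example.org/hasSSN>) }"; // policy-injected filter

    Query query = QueryFactory.create(rewritten);
    QueryExecution qexec = QueryExecutionFactory.create(query, model);
    try {
      ResultSet results = qexec.execSelect();
      while (results.hasNext()) {
        QuerySolution soln = results.next();
        System.out.println(soln.get("s") + " " + soln.get("p") + " " + soln.get("o"));
      }
    } finally {
      qexec.close();
    }
  }
}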
Demonstration: Concept of Operation

Figure: Agencies 1 through n interact through a common user interface layer; relational data is handled by fine-grained access control with Hive, and RDF data by the SPARQL query optimizer for secure RDF data processing.
RDF-Based Policy Engine
Technology developed by UT Dallas as an interface to the Semantic Web

Figure: Policies, ontologies and rules expressed in RDF are processed by an inference engine/rules processor (e.g., Pellet) running over the Jena RDF engine and RDF documents.
RDF-based Policy Engine on the Cloud
Determine how access is granted to a resource as well as how a document is shared
Users specify policies, e.g., access control, redaction, released policies
Parse a high-level policy into a low-level representation
Support graph operations and visualization; policies are executed as graph operations
Execute policies as SPARQL queries over large RDF graphs on Hadoop
Support for policies over traditional data and its provenance
A testbed for evaluating different policy sets over different data representations; provenance is supported as a directed graph, and policy outcomes can be viewed graphically
IFIP Data and Applications Security, 2010; ACM SACMAT 2011
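One way to picture "executing a policy as a SPARQL query": a redaction policy can be realized as a CONSTRUCT query that copies a graph while dropping triples with a sensitive predicate. The predicate URI and input file below are hypothetical; in the prototype such queries run over large RDF graphs on Hadoop rather than an in-memory model.

import org.apache.jena.query.QueryExecution;
import org.apache.jena.query.QueryExecutionFactory;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;

public class RedactionPolicy {

  public static void main(String[] args) {
    Model provenanceGraph = ModelFactory.createDefaultModel();
    provenanceGraph.read("provenance.ttl", "TURTLE"); // illustrative input graph

    // Redaction policy expressed as SPARQL: keep everything except
    // triples whose predicate is the (hypothetical) sensitive one.
    String redact =
        "CONSTRUCT { ?s ?p ?o } WHERE { ?s ?p ?o . "
      + "FILTER (?p != <http://example.org/derivedFromClassifiedSource>) }";

    QueryExecution qexec = QueryExecutionFactory.create(redact, provenanceGraph);
    try {
      Model released = qexec.execConstruct(); // the redacted graph to release
      released.write(System.out, "TURTLE");
    } finally {
      qexec.close();
    }
  }
}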
Integration with Assured Information Sharing: Architecture

Figure: Agencies 1 through n submit SPARQL queries through a user interface layer; RDF data and policies pass through a policy translation and transformation layer and an RDF data preprocessor into the MapReduce framework for query processing over Hadoop HDFS, which returns the results.
Policy Reciprocity
Agency 1 wishes to share its resources if
Agency 2 also shares its resources with it
Use our Combined policies
Allow agents to define policies based on reciprocity and mutual interest amongst
cooperating agencies

SPARQL query (schematic):
SELECT B
FROM NAMED <uri1>
FROM NAMED <uri2>
WHERE { P }
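To make the schematic query concrete, here is a minimal Jena sketch; the graph URIs, class URI, data files and the reciprocates() check are all illustrative. Each agency's shared data sits in its own named graph, and the query runs only if the reciprocity condition holds; the named graphs are supplied programmatically rather than via FROM NAMED.

import org.apache.jena.query.Dataset;
import org.apache.jena.query.DatasetFactory;
import org.apache.jena.query.QueryExecution;
import org.apache.jena.query.QueryExecutionFactory;
import org.apache.jena.query.ResultSetFormatter;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;

public class ReciprocityQuery {

  // Hypothetical policy check: Agency 1 answers Agency 2's query only if
  // Agency 2 also shares its resources with Agency 1.
  static boolean reciprocates(String requester) {
    return "agency2".equals(requester);
  }

  public static void main(String[] args) {
    Model agency1 = ModelFactory.createDefaultModel().read("agency1.ttl", "TURTLE");
    Model agency2 = ModelFactory.createDefaultModel().read("agency2.ttl", "TURTLE");

    Dataset dataset = DatasetFactory.create();
    dataset.addNamedModel("http://example.org/graphs/agency1", agency1);
    dataset.addNamedModel("http://example.org/graphs/agency2", agency2);

    // Equivalent in spirit to the schematic SELECT ... FROM NAMED ... WHERE ...
    String sparql =
        "SELECT ?g ?resource WHERE { GRAPH ?g { "
      + "?resource a <http://example.org/SharedResource> } }";

    if (reciprocates("agency2")) {
      QueryExecution qexec = QueryExecutionFactory.create(sparql, dataset);
      try {
        ResultSetFormatter.out(System.out, qexec.execSelect());
      } finally {
        qexec.close();
      }
    }
  }
}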
Develop and Scale Policies
Agency 1 wishes to extend its existing policies
with support for constructing policies at a finer
granularity.
The Policy engine
Policy interface that should be implemented by all
policies
Add newer types of policies as needed
Justification of Resources
Agency 1 asks Agency 2 for a justification of
resource R2

Policy engine
Allows agents to define policies over provenance
Agency 2 can provide the provenance to Agency 1
But protect it by using access control or redaction policies
Other Example Policies
Agency 1 shares a resource with Agency 2 provided
Agency 2 does not share with Agency 3
Agency 1 shares a resource with Agency 2 depending
on the content of the resource or until a certain time
Agency 1 shares a resource R with agency 2 provided
Agency 2 does not infer sensitive data S from R
(inference problem)
Agency 1 shares a resource with Agency 2 provided
Agency 2 shares the resource only with those in its
organizational (or social) network
Analyzing and Securing
Social Networks in the Cloud

Analytics
Location Mining from Online Social Networks
Predicting Threats from Social Network Data,
Sentiment Analysis
Cloud Platform for implementation

Security and Privacy


Preventing the Inference of Private Attributes
(liberal or conservative; gay or straight)
Access Control in Social Networks
Cloud Platform for implementation
Security Policies for On-Line Social
Networks (OSN)
Security policies are expressed in SWRL (Semantic Web Rule Language)
Security Policy Enforcement
A reference monitor evaluates the requests.
Admin requests for access control can be evaluated by rule rewriting
Example: assume Bob submits an admin request; it is rewritten as a rule that the reference monitor evaluates against the knowledge base
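The SWRL rules from the slide are not reproduced here; purely as an illustration of the rewriting idea, the sketch below uses Jena's generic rule reasoner instead: a hypothetical admin request ("friends of Bob may read Bob's photos") is encoded as a rule, and the reference-monitor check becomes an entailment query. All URIs, names and the rule itself are made up for this example.

import org.apache.jena.rdf.model.InfModel;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.rdf.model.Property;
import org.apache.jena.rdf.model.Resource;
import org.apache.jena.reasoner.rulesys.GenericRuleReasoner;
import org.apache.jena.reasoner.rulesys.Rule;

public class OsnPolicyRule {

  public static void main(String[] args) {
    String ns = "http://example.org/osn#"; // illustrative namespace

    // Social network knowledge base (normally loaded from the SN store)
    Model sn = ModelFactory.createDefaultModel();
    Resource bob = sn.createResource(ns + "Bob");
    Resource alice = sn.createResource(ns + "Alice");
    Resource photo = sn.createResource(ns + "BobPhoto1");
    Property friendOf = sn.createProperty(ns + "friendOf");
    Property owns = sn.createProperty(ns + "owns");
    Property canRead = sn.createProperty(ns + "canRead");
    sn.add(alice, friendOf, bob);
    sn.add(bob, owns, photo);

    // Admin request "friends of Bob may read Bob's photos", rewritten as a rule
    String rule =
        "[bobPhotoPolicy: (?u <" + ns + "friendOf> <" + ns + "Bob>) "
      + "(<" + ns + "Bob> <" + ns + "owns> ?r) "
      + "-> (?u <" + ns + "canRead> ?r)]";

    GenericRuleReasoner reasoner = new GenericRuleReasoner(Rule.parseRules(rule));
    InfModel inf = ModelFactory.createInfModel(reasoner, sn);

    // Reference monitor check: may Alice read BobPhoto1?
    System.out.println(inf.contains(alice, canRead, photo)); // true
  }
}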


Framework Architecture

Figure: A social network application sends an access request to the reference monitor and receives an access decision. The reference monitor rewrites the request, issues queries to the knowledge base and the social network knowledge base, and obtains reasoning results from a semantic web reasoning engine that retrieves policies from the policy store.
Secure Social Networking in the Cloud
with Twitter-Storm
Figure: Social networks 1 through N are served through a common user interface layer; relational data is handled by fine-grained access control with Hive, and RDF data by the SPARQL query optimizer for secure RDF data processing.
Secure Storage and Query Processing in a
Hybrid Cloud
The use of hybrid clouds is an emerging trend in cloud computing
Ability to exploit public resources for high throughput
Yet, better able to control costs and data privacy
Several key challenges
Data Design: how to store data in a hybrid cloud?
Solution must account for data representation used
(unencrypted/encrypted), public cloud monetary costs and
query workload characteristics
Query Processing: how to execute a query over a hybrid cloud?
Solution must provide query rewrite rules that ensure the
correctness of a generated query plan over the hybrid cloud
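A toy sketch of the data-design trade-off; the prices, benefit figures and threshold below are made-up illustrative parameters, not the actual cost model from this work. Sensitive partitions stay on the private cloud, and a non-sensitive partition goes to the public cloud only when its estimated throughput benefit outweighs the estimated monetary cost.

public class HybridPlacement {

  enum Site { PRIVATE_CLOUD, PUBLIC_CLOUD }

  // Illustrative per-partition statistics
  static class Partition {
    final String name;
    final boolean sensitive;     // must it remain under our control?
    final double sizeGb;
    final double queriesPerDay;  // workload touching this partition
    Partition(String name, boolean sensitive, double sizeGb, double queriesPerDay) {
      this.name = name; this.sensitive = sensitive;
      this.sizeGb = sizeGb; this.queriesPerDay = queriesPerDay;
    }
  }

  // Made-up cost parameters for illustration only
  static final double PUBLIC_STORAGE_COST_PER_GB = 0.03;    // $ per GB per month
  static final double PUBLIC_QUERY_COST = 0.0005;           // $ per query
  static final double THROUGHPUT_BENEFIT_PER_QUERY = 0.002; // $-equivalent speedup value

  static Site place(Partition p) {
    if (p.sensitive) {
      return Site.PRIVATE_CLOUD; // privacy constraint dominates
    }
    double monthlyCost = p.sizeGb * PUBLIC_STORAGE_COST_PER_GB
        + p.queriesPerDay * 30 * PUBLIC_QUERY_COST;
    double monthlyBenefit = p.queriesPerDay * 30 * THROUGHPUT_BENEFIT_PER_QUERY;
    return monthlyBenefit > monthlyCost ? Site.PUBLIC_CLOUD : Site.PRIVATE_CLOUD;
  }

  public static void main(String[] args) {
    Partition claims = new Partition("claims", true, 500, 2000);
    Partition weblogs = new Partition("weblogs", false, 800, 5000);
    System.out.println(claims.name + " -> " + place(claims));
    System.out.println(weblogs.name + " -> " + place(weblogs));
  }
}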
Hypervisor integrity and forensics
in the Cloud
Figure: The cloud stack: applications run on guest operating systems (Linux, Solaris, XP, MacOS) over the hypervisor/virtualization layer (Xen, vSphere) and the hardware layer. Hypervisor integrity and forensics operate at the virtualization layer; cloud integrity and forensics span the stack.

Cloud integrity & forensics
Secure control flow of hypervisor code
Integrity via in-lined reference monitor
Forensics data extraction in the cloud
Multiple VMs
De-mapping (isolating) each VM's memory from physical memory
Cloud-based Malware Detection

Figure: A stream of known malware and benign executables is buffered; feature extraction and selection using the cloud drives training and model updates for an ensemble of classification models. An unknown executable undergoes feature extraction and is classified; executables labeled malware are removed, benign ones are kept.
Cloud-based Malware Detection
Binary feature extraction involves enumerating binary n-grams from the binaries and selecting the best n-grams based on information gain
For a training set of 3,500 executables, the number of distinct 6-grams can exceed 200 million
On a single machine, this may take hours, depending on available computing resources, which is not acceptable for training from a stream of binaries
We use the cloud to overcome this bottleneck
A cloud MapReduce framework is used to extract and select features from each chunk (a sketch follows below)
A 10-node cloud cluster is 10 times faster than a single node
Very effective in a dynamic framework, where malware characteristics change rapidly
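A rough sketch of the extraction step (not the deployed system): a Hadoop mapper that emits every 6-gram of an executable, assuming a custom whole-file input format (not shown) that delivers each binary as a single BytesWritable; a reducer would then aggregate per-class counts and compute information gain for feature selection.

import java.io.IOException;

import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Emits (hex-encoded 6-gram, 1) for every byte position in an executable.
// Assumes an input format that provides each whole binary as the value.
public class NGramMapper extends Mapper<Text, BytesWritable, Text, IntWritable> {

  private static final int N = 6;
  private static final IntWritable ONE = new IntWritable(1);
  private final Text ngram = new Text();

  @Override
  protected void map(Text fileName, BytesWritable value, Context context)
      throws IOException, InterruptedException {
    byte[] bytes = value.copyBytes();
    for (int i = 0; i + N <= bytes.length; i++) {
      StringBuilder hex = new StringBuilder(2 * N);
      for (int j = 0; j < N; j++) {
        hex.append(String.format("%02x", bytes[i + j]));
      }
      ngram.set(hex.toString());
      context.write(ngram, ONE);
    }
  }
}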
Identity Management
Considerations in a Cloud
Trust model that handles
(i) various trust relationships, (ii) access control policies based on roles and attributes, (iii) real-time provisioning, (iv) authorization, and (v) auditing and accountability.
Several technologies are being examined to develop the trust
model
Service-oriented technologies; standards such as SAML and XACML; and
identity management technologies such as OpenID.
Does one size fit all?
Can we develop a trust model that is applicable to all types of clouds, such as private clouds, public clouds and hybrid clouds? The identity architecture has to be integrated into the cloud architecture.
Big Data and the Cloud
Big Data describes large and complex data that cannot be managed by traditional data management tools
From petabytes to exabytes to zettabytes of data
Need tools for capture, storage, search, sharing, analysis and visualization of big data
Examples include web logs, RFID and surveillance data, sensor networks, social network data (graphs), text and multimedia, and data pertaining to astronomy, atmospheric science, genomics, biogeochemistry, biology and video archives
Big Data Technologies
Hadoop/MapReduce platform, HIVE platform, Twitter Storm platform, Google App Engine, Amazon EC2 cloud, offerings from Oracle and IBM for Big Data management; others: Cassandra, Mahout, Pig Latin, etc.
Cloud computing is emerging as a critical tool for Big Data management
Critical to maintain security and privacy for Big Data
Security and Privacy for Big Data
Secure Storage and Infrastructure
How can technologies such as Hadoop and MapReduce be secured?
Secure Data Management
Techniques for secure query processing
Examples: securing HIVE, Cassandra
Big Data for Security
Analysis of security data (e.g., malware analysis)
Regulations, Compliance, Governance
What are the regulations for storing, retaining, managing, transferring and analyzing Big Data?
Are corporations compliant with the regulations?
Privacy of individuals has to be maintained, not just for raw data but also for data integration and analytics
Roles and responsibilities must be clearly defined
Security and Privacy for Big Data
Regulations Stifling Innovation?
A major concern is that too many regulations will stifle innovation
Corporations must take advantage of Big Data technologies to improve business
But this could infringe on individual privacy
Regulations may also interfere with privacy, for example, requirements for retaining data
Challenge: how can one carry out analytics and still maintain privacy?

National Science Foundation workshop planned for Spring 2014 at the University of Texas at Dallas
Education on Secure Cloud Computing
and Related Technologies
Secure Cloud Computing
NSF Capacity Building Grant on Assured Cloud Computing
Introduce cloud computing into several cyber security courses
Completed courses
Data and Applications Security, Data Storage, Digital Forensics, Secure Web
Services
Computer and Information Security
Capstone Course
One course that covers all aspects of assured cloud computing
Week long course to be given at Texas Southern University
Analyzing and Securing Social Networks
Big Data Analytics and Security
Directions
Secure VMM and VNM
Designing Secure XEN VMM
Developing automated techniques for VMM introspection
Determine a secure network infrastructure for the cloud
Integrate Secure Storage Algorithms into Hadoop
Identity Management in the Cloud
Secure cloud-based Big Data Management/Social
Networking
Related Books
Developing and Securing the Cloud, CRC Press (Taylor and
Francis), November 2013 (Thuraisingham)
Secure Data Provenance and Inference Control with
Semantic Web, CRC Press 2014, In Print (Cadenhead,
Kantarcioglu, Khadilkar, Thuraisingham)
Analyzing and Securing Social Media, CRC Press, 2014,
In preparation (Abrol, Heatherly, Khan, Kantarcioglu,
Khadilkar, Thuraisingham)
