Privacy-Preserving Data Mining
Supervised By: Prof-DR-Safaa O AL-Mamory
• Query auditing
The results of queries are either modified or restricted.
Examples: output perturbation (results are modified) and query restriction
(results are withheld).
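The two ideas can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the sample data, the function names, the threshold `MIN_QUERY_SET_SIZE`, and the choice of uniform noise are not taken from the slides.

```python
import random

# Made-up ages, used only for illustration.
AGES = [23, 35, 41, 29, 52, 47, 38, 61]

# Hypothetical threshold: smallest query set the auditor will answer.
MIN_QUERY_SET_SIZE = 3


def count_matching(records, predicate):
    """Exact answer to a COUNT query."""
    return sum(1 for r in records if predicate(r))


def restricted_count(records, predicate, min_size=MIN_QUERY_SET_SIZE):
    """Query restriction: refuse (return None) when the query set, or its
    complement, is so small that the answer would isolate individuals."""
    n = count_matching(records, predicate)
    if n < min_size or n > len(records) - min_size:
        return None
    return n


def perturbed_count(records, predicate, scale=1.0, rng=random):
    """Output perturbation: return the exact count plus zero-mean noise."""
    return count_matching(records, predicate) + rng.uniform(-scale, scale)
```

With this data, `restricted_count(AGES, lambda a: a > 60)` is refused because only one record matches, while `perturbed_count` always answers but never with a guaranteed exact value.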
– Randomization method
–The Datafly system was one of the earliest practical applications of
privacy-preserving transformations. It was designed to prevent
identification of the subjects of medical records, which may be stored in
multi-dimensional format. The multi-dimensional information may include
directly identifying information such as a Social Security number, or
indirectly identifying information such as age, sex, or zip code. The system
was designed in response to the concern that removing only directly
identifying attributes such as Social Security numbers is not sufficient to
guarantee privacy. While the work has a motive similar to the k-anonymity
approach of preventing record identification, it does not formally use a
k-anonymity model to prevent identification through linkage attacks.
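A minimal sketch of this kind of transformation follows. It is not the Datafly algorithm itself; the field names, the 10-year age bands, and the 3-digit zip prefix are assumptions chosen only to show the difference between dropping a direct identifier and coarsening indirect ones.

```python
def generalize_record(record, age_band=10, zip_digits=3):
    """Drop the direct identifier (ssn) and coarsen the indirect ones.

    Hypothetical record layout: ssn, age, sex, zip, diagnosis.
    """
    age_low = (record["age"] // age_band) * age_band
    zip_code = record["zip"]
    return {
        # Age is replaced by a band such as "30-39".
        "age": f"{age_low}-{age_low + age_band - 1}",
        "sex": record["sex"],
        # Only the leading digits of the zip code are kept.
        "zip": zip_code[:zip_digits] + "*" * (len(zip_code) - zip_digits),
        "diagnosis": record["diagnosis"],
    }


patient = {"ssn": "000-00-0000", "age": 37, "sex": "F",
           "zip": "02139", "diagnosis": "flu"}
released = generalize_record(patient)
```

Removing `ssn` alone would still leave the exact age, sex, and zip code, which together may be matched against an external list; coarsening them makes such a linkage less precise.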
Bioterrorism Applications
–Often a biological agent such as anthrax produces symptoms which are
similar to those of common respiratory diseases such as cough, cold and flu.
In the absence of prior knowledge of such an attack, health-care providers
may diagnose a patient affected by an anthrax attack as having
symptoms of one or more common respiratory diseases. The key is to
quickly distinguish a true anthrax attack from a normal outbreak of a
common disease. In many cases, an unusual number of such cases in a
given locality may indicate a bio-terrorism attack. In order to
identify such attacks, it is necessary to track incidences of these common
diseases as well. Therefore, the corresponding data would need to be
reported to public health agencies. However, the common respiratory
diseases are not reportable diseases by law. The solution is to
use “selective revelation”, which initially allows only limited access to
the data. However, in the event of suspicious activity, it allows
drill-down into the underlying data. This provides more identifiable
information in accordance with public health law.
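Selective revelation can be sketched as a report that exposes only per-locality counts until a count looks unusual. The record layout, the `baseline` table, and the `factor` threshold below are hypothetical; a real system would use a proper outbreak-detection statistic and legal access controls.

```python
from collections import Counter


def selective_revelation(cases, baseline, factor=3.0):
    """Return counts per locality; attach the underlying records only
    where the count exceeds `factor` times the expected baseline."""
    counts = Counter(case["locality"] for case in cases)
    report = {}
    for locality, n in counts.items():
        expected = max(baseline.get(locality, 0), 1)
        if n > factor * expected:
            # Suspicious: allow drill-down into the underlying records.
            records = [c for c in cases if c["locality"] == locality]
        else:
            # Normal: reveal the aggregate count only.
            records = None
        report[locality] = {"count": n, "records": records}
    return report


# Made-up data: 7 respiratory cases in locality A, 2 in locality B.
cases = ([{"locality": "A", "patient": f"a{i}"} for i in range(7)]
         + [{"locality": "B", "patient": f"b{i}"} for i in range(2)])
report = selective_revelation(cases, baseline={"A": 2, "B": 2})
```

Here locality A exceeds three times its baseline, so its records become available, while locality B is reported as a count only.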
Genomic Privacy
–Recent years have seen tremendous advances in the science of DNA
sequencing and forensic analysis with the use of DNA. As a result,
the databases of collected DNA are growing very fast in both the
medical and law enforcement communities. DNA data is considered
extremely sensitive, since it contains almost uniquely identifying
information about an individual. As in the case of multi-dimensional data,
simple removal of directly identifying data such as Social Security numbers
is not sufficient to prevent re-identification. It has been shown that a
software called CleanGene can determine the identifiability of DNA entries
independent of any other demographic or otherwise identifiable information.
The software relies on publicly available medical data and knowledge of
particular diseases in order to assign identifications to DNA entries. It has
been shown that 98-100% of the individuals are identifiable using this
approach. The identification is done by taking the DNA sequence of an
individual and then constructing a genetic profile corresponding to the sex,
genetic diseases, the location where the DNA was collected, etc. One way to
protect the anonymity of such sequences is the use of generalization lattices
which are constructed in such a way that an entry in the modified
database cannot be distinguished from at least (k-1) other entries. Another
approach constructs synthetic data which preserves the aggregate
characteristics of the original data, but preserves the privacy of the
original records.
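The "(k-1) other entries" condition can be checked mechanically once the generalized attributes are fixed. The sketch below is a generic check and not the lattice construction itself; the attribute names and the sample rows are made up for illustration.

```python
from collections import Counter


def is_k_anonymous(rows, quasi_identifiers, k):
    """True if every combination of quasi-identifier values is shared by
    at least k rows, i.e. each row has at least k-1 look-alikes."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return all(size >= k for size in groups.values())


# Made-up generalized genetic profiles.
profiles = [
    {"sex": "F", "region": "North", "marker": "X1"},
    {"sex": "F", "region": "North", "marker": "X2"},
    {"sex": "M", "region": "South", "marker": "X1"},
    {"sex": "M", "region": "South", "marker": "X3"},
]
ok_without_marker = is_k_anonymous(profiles, ["sex", "region"], k=2)
ok_with_marker = is_k_anonymous(profiles, ["sex", "region", "marker"], k=2)
```

In this sample the rows are 2-anonymous on sex and region, but adding the marker column makes every row unique, which suggests that attribute would need further generalization.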
Homeland Security Applications
-Credential validation problem: in the credential validation approach, an
attempt is made to exploit the semantics associated with the Social
Security number to determine whether the person presenting an SSN
credential truly owns it.
-Identity theft: the Identity Angel system crawls through cyberspace
and determines people who are at risk from identity theft. This
information can be used to notify appropriate parties.
-Web camera surveillance: one possible method for surveillance is
the use of publicly available webcams, which can be used to detect
unusual activity. The approach can be made more privacy-sensitive by
extracting only facial count information from the images and using
these in order to detect unusual activity. It has been hypothesized
that unusual activity can be detected in terms of facial count alone,
rather than using more specific information about particular
individuals.
-Video surveillance: in the context of sharing video-surveillance data, a
major threat is the use of facial recognition software, which can
match the facial images in videos to the facial images in a
driver license database. A balanced approach is to use selective
downgrading of the facial information, so that it scientifically
limits the ability of facial recognition software to reliably
identify faces, while maintaining facial details in images. The
algorithm is referred to as k-Same, and the key is to identify faces
which are somewhat similar, and then construct new faces which
combine features from these similar faces. Thus, the identity of the
underlying individual is anonymized to a certain extent, but the
video continues to remain useful.
-Watch list problem: the government typically has a list of
known terrorists or suspected entities which it wishes to track. The
aim is to examine transactional data such as store purchases, hospital
admissions, airplane manifests, and hotel registrations in order to
identify or track these entities. This is a difficult problem, since the
transactional data is private, and the privacy of subjects who
do not appear in the watch list needs to be protected.
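For the video surveillance item above, the grouping-and-averaging idea behind k-Same can be sketched on toy data. This is a simplified illustration under stated assumptions, not the published algorithm: faces are assumed to be plain feature vectors, similarity is assumed to be Euclidean distance, and the function names are hypothetical.

```python
import math


def euclidean(a, b):
    """Distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_same(faces, k):
    """Replace each face with the average of a group of at least k similar
    faces, so each output face stands for at least k originals."""
    if k < 1 or len(faces) < k:
        raise ValueError("need at least k faces")
    remaining = list(range(len(faces)))
    result = [None] * len(faces)
    while remaining:
        seed = remaining[0]
        if len(remaining) < 2 * k:
            # Too few left for two groups: merge them all into one.
            group = list(remaining)
        else:
            # Take the k faces closest to the seed (the seed included).
            group = sorted(
                remaining, key=lambda i: euclidean(faces[seed], faces[i])
            )[:k]
        dim = len(faces[seed])
        average = [sum(faces[i][d] for i in group) / len(group)
                   for d in range(dim)]
        for i in group:
            result[i] = average
        remaining = [i for i in remaining if i not in group]
    return result


# Made-up 2-dimensional "faces": two similar pairs.
faces = [[0.0, 0.0], [0.0, 2.0], [10.0, 10.0], [10.0, 12.0]]
anonymized = k_same(faces, k=2)
```

Each output vector is shared by at least k inputs, so a matcher cannot map it back to a single original, while the averaged vector still resembles the group it came from.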