NANO SCIENTIFIC RESEARCH CENTRE PVT.LTD., AMEERPET, HYD
WWW.NSRCNANO.COM, 09640648777, 09652926926
DOT NET PROJECTS LIST--2013
DOT NET 2013 IEEE PAPERS
Slicing: A New Approach to Privacy Preserving Data Publishing
Abstract:
Several anonymization techniques, such as generalization and bucketization, have been
designed for privacy preserving microdata publishing. Recent work has shown
that generalization loses a considerable amount of information, especially for
high-dimensional data. Bucketization, on the other hand, does not prevent
membership disclosure and does not apply to data that do not have a clear
separation between quasi-identifying attributes and sensitive attributes. In
this paper, we present a novel technique called slicing, which partitions the
data both horizontally and vertically. We show that slicing preserves better
data utility than generalization and can be used for membership disclosure protection.
Another important advantage of slicing is that it can handle high-dimensional
data. We show how slicing can be used for attribute disclosure protection and
develop an efficient algorithm for computing the sliced data that obey the
ℓ-diversity requirement. Our workload experiments confirm that slicing
preserves better utility than generalization and is more effective than
bucketization in workloads involving the sensitive attribute. Our experiments
also demonstrate that slicing can be used to prevent membership disclosure.
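The idea of partitioning "both horizontally and vertically" can be illustrated with a minimal sketch. It assumes fixed-size buckets and a hand-picked column grouping; the function name `slice_table`, the sample attributes, and the fixed `bucket_size` are hypothetical choices for illustration, not part of the published algorithm, which selects columns and buckets as described in the sections that follow.

```python
import random

def slice_table(rows, columns, bucket_size, seed=0):
    """Return a list of buckets; each bucket holds one shuffled value list per column."""
    rng = random.Random(seed)
    sliced = []
    for start in range(0, len(rows), bucket_size):   # horizontal partition into buckets
        bucket = rows[start:start + bucket_size]
        pieces = []
        for attrs in columns:                        # vertical partition into columns
            values = [tuple(row[a] for a in attrs) for row in bucket]
            rng.shuffle(values)                      # break the link between columns
            pieces.append(values)
        sliced.append(pieces)
    return sliced

# Hypothetical microdata: four tuples, two columns of two attributes each.
rows = [
    {"age": 22, "sex": "M", "zip": "47906", "disease": "dyspepsia"},
    {"age": 22, "sex": "F", "zip": "47906", "disease": "flu"},
    {"age": 33, "sex": "F", "zip": "47905", "disease": "flu"},
    {"age": 52, "sex": "F", "zip": "47905", "disease": "bronchitis"},
]
columns = [["age", "sex"], ["zip", "disease"]]
sliced = slice_table(rows, columns, bucket_size=2)
```

Within a bucket, each column keeps its own multiset of values, so correlations inside a column survive while the association between columns is hidden by the shuffle.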
Algorithm Used:
Slicing Algorithms:
Our algorithm consists of three phases: attribute partitioning, column
generalization, and tuple partitioning. The tuple-partitioning phase and the
diversity check it relies on are listed below.
Algorithm tuple-partition(T, ℓ)
1. Q = {T}; SB = ∅.
2. while Q is not empty
3.     remove the first bucket B from Q; Q = Q − {B}.
4.     split B into two buckets B1 and B2, as in Mondrian.
5.     if diversity-check(T, Q ∪ {B1,B2} ∪ SB, ℓ)
6.         Q = Q ∪ {B1,B2}.
7.     else SB = SB ∪ {B}.
8. return SB.
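The queue-driven loop of tuple-partition can be sketched in Python as follows. The split and the diversity check are passed in as callables. Both helpers shown here are simplified, hypothetical stand-ins: `median_split` cuts a bucket at the median of one attribute (only loosely "as in Mondrian"), and `distinct_values_check` merely requires ℓ distinct sensitive values per bucket, which is not the diversity-check algorithm listed in this section. Treating a bucket that cannot be split as final is also an assumption of this sketch.

```python
from collections import deque

def median_split(bucket, key):
    """Stand-in split: sort on one attribute and cut at the median."""
    if len(bucket) < 2:
        return None                      # cannot split a single tuple
    ordered = sorted(bucket, key=key)
    mid = len(ordered) // 2
    return ordered[:mid], ordered[mid:]

def distinct_values_check(table, buckets, ell, sensitive=lambda t: t[1]):
    """Simplified stand-in check: every bucket has at least ell distinct sensitive values."""
    return all(len({sensitive(t) for t in b}) >= ell for b in buckets)

def tuple_partition(table, ell, split, diversity_check):
    queue = deque([list(table)])         # Q = {T}
    sliced_buckets = []                  # SB = empty set
    while queue:
        bucket = queue.popleft()         # remove the first bucket B from Q
        parts = split(bucket)
        if parts is not None:
            b1, b2 = parts
            candidate = list(queue) + [b1, b2] + sliced_buckets
            if diversity_check(table, candidate, ell):
                queue.extend([b1, b2])   # keep refining both halves
                continue
        sliced_buckets.append(bucket)    # B cannot be split further
    return sliced_buckets

# Hypothetical (age, disease) tuples.
table = [(21, "flu"), (23, "cold"), (35, "flu"), (37, "cancer"),
         (52, "cold"), (58, "flu"), (61, "cancer"), (64, "cold")]
buckets = tuple_partition(
    table, 2,
    split=lambda b: median_split(b, key=lambda t: t[0]),
    diversity_check=distinct_values_check,
)
```

Because a split is accepted only when the whole candidate partition still passes the check, every bucket that ends up in the result satisfies the check used.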
Algorithm diversity-check(T, T*, ℓ)
1. for each tuple t ∈ T, L[t] = ∅.
2. for each bucket B in T*
3.     record f(v) for each column value v in bucket B.
4.     for each tuple t ∈ T
5.         calculate p(t,B) and find D(t,B).
6.         L[t] = L[t] ∪ {⟨p(t,B), D(t,B)⟩}.
7. for each tuple t ∈ T
8.     calculate p(t, s) for each s based on L[t].
9.     if p(t, s) ≥ 1/ℓ, return false.
10. return true.
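The text above does not define f(v), p(t,B), or D(t,B), so the following Python sketch rests on one possible reading, stated here as assumptions: f(v) is the frequency of a column value inside a bucket; p(t,B) is the product of the per-column matching fractions of t in B, normalized over all buckets; and D(t,B) is the distribution of sensitive values among the entries of the sensitive column in B that match t's non-sensitive part. The data layout is also hypothetical: each tuple is a sequence of column values, and the last column has the form `(non_sensitive_part, sensitive_value)`.

```python
from collections import Counter, defaultdict

def diversity_check(table, buckets, ell):
    """Return True if no tuple is linked to a sensitive value with probability >= 1/ell."""
    candidates = defaultdict(list)                   # L[t], keyed by tuple index
    for bucket in buckets:
        if not bucket:
            continue
        size = len(bucket)
        plain_cols = len(bucket[0]) - 1              # columns without the sensitive attribute
        # f(v): frequency of each column value in this bucket
        freq = [Counter(t[i] for t in bucket) for i in range(plain_cols)]
        key_freq = Counter(t[-1][0] for t in bucket)
        sens = defaultdict(Counter)                  # sensitive values per non-sensitive part
        for t in bucket:
            sens[t[-1][0]][t[-1][1]] += 1
        for idx, t in enumerate(table):
            match = key_freq[t[-1][0]] / size        # unnormalized p(t,B)
            for i in range(plain_cols):
                match *= freq[i][t[i]] / size
            if match > 0:
                seen = sens[t[-1][0]]
                total = sum(seen.values())
                dist = {s: c / total for s, c in seen.items()}   # D(t,B)
                candidates[idx].append((match, dist))
    for idx in range(len(table)):
        norm = sum(m for m, _ in candidates[idx])
        prob = Counter()                             # p(t,s) for each sensitive value s
        for m, dist in candidates[idx]:
            for s, q in dist.items():
                prob[s] += (m / norm) * q
        if any(p >= 1 / ell for p in prob.values()):
            return False
    return True

# Hypothetical table: column 1 = (age, zip), column 2 = ((), disease).
table = [((22, "47906"), ((), "flu")),
         ((22, "47905"), ((), "cold")),
         ((33, "47905"), ((), "cancer"))]
ok = diversity_check(table, [table], 2)
```

The comparison in the final loop follows step 9 as listed, so a tuple whose probability for some sensitive value reaches exactly 1/ℓ is rejected.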
System Architecture:
Existing System:
Attributes are clustered with the k-medoid method, for three reasons. First,
many existing clustering algorithms (e.g., k-means) require the calculation of
“centroids”, but there is no notion of “centroids” in our setting, where each
attribute forms a data point in the clustering space. Second, the k-medoid
method is very robust to the existence of outliers (i.e., data points that are
very far away from the rest of the data points). Third, the order in which the data
points are examined does not affect the clusters computed by the k-medoid method.
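A minimal sketch of k-medoid clustering shows why no centroids are needed: it works purely from a pairwise distance matrix, and every cluster center is one of the data points. The distance values below are made up for illustration, the function names are hypothetical, and the swap-based search is a basic variant chosen for brevity; the source does not say which k-medoid implementation or distance measure is used.

```python
def total_cost(dist, medoids):
    """Sum of each point's distance to its nearest medoid."""
    return sum(min(dist[i][m] for m in medoids) for i in range(len(dist)))

def k_medoids(dist, k):
    """Cluster points 0..n-1 given a symmetric distance matrix; returns {medoid: members}."""
    n = len(dist)
    medoids = list(range(k))                 # deterministic start: the first k points
    best = total_cost(dist, medoids)
    improved = True
    while improved:                          # repeat until no swap lowers the cost
        improved = False
        for pos in range(k):
            for cand in range(n):
                if cand in medoids:
                    continue
                trial = medoids[:pos] + [cand] + medoids[pos + 1:]
                cost = total_cost(dist, trial)
                if cost < best:
                    medoids, best, improved = trial, cost, True
    clusters = {m: [] for m in medoids}
    for i in range(n):
        nearest = min(medoids, key=lambda m: dist[i][m])
        clusters[nearest].append(i)
    return clusters

# Hypothetical distances between four attributes (smaller = more correlated).
dist = [
    [0.0, 0.1, 0.9, 0.9],
    [0.1, 0.0, 0.9, 0.9],
    [0.9, 0.9, 0.0, 0.2],
    [0.9, 0.9, 0.2, 0.0],
]
clusters = k_medoids(dist, 2)
```

In the slicing setting each point would be an attribute, and each resulting cluster would become one column of the sliced table.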
Disadvantages:
1. Existing anonymization algorithms, e.g., Mondrian, can be used for column
generalization. The algorithms can be applied to the subtable containing only
the attributes in one column to ensure the anonymity requirement.
2. Existing data analysis (e.g., query answering) methods can be easily used on
the sliced data.
3. Existing privacy measures for membership disclosure protection include
differential privacy and δ-presence.
Proposed System:
We present a novel technique called
slicing, which partitions the data both horizontally and vertically. We show
that slicing preserves better data utility than generalization and can be used
for membership disclosure protection. Another important advantage of slicing is
that it can handle high-dimensional data. We show how slicing can be used for
attribute disclosure protection and develop an efficient algorithm for
computing the sliced data that obey the ℓ-diversity requirement. Our workload
experiments confirm that slicing preserves better utility than generalization
and is more effective than bucketization in workloads involving the sensitive
attribute.
Advantages:
1. We introduce a novel data anonymization technique called slicing to improve the
current state of the art.
2. We show that slicing can be effectively used for preventing attribute disclosure,
based on the privacy requirement of ℓ-diversity.
3. We develop an efficient algorithm for computing the sliced table that satisfies
ℓ-diversity. Our algorithm partitions attributes into columns, applies column
generalization, and partitions tuples into buckets. Attributes that are highly
correlated are placed in the same column.
4. We conduct extensive workload experiments. Our results confirm that slicing
preserves much better data utility than generalization. In workloads involving
the sensitive attribute, slicing is also more effective than bucketization. In
some classification experiments, slicing shows better performance than using
the original data (which may overfit the model). Our experiments also show the
limitations of bucketization in membership disclosure protection and that
slicing remedies these limitations.
Module Description:
1. Original Data
2. Generalized Data
3. Bucketized Data
4. Multiset-based Generalization Data
5. One-attribute-per-Column Slicing Data
6. Sliced Data
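Two of the module outputs listed above differ only in how a bucket's values are summarized, which a short sketch can make concrete. It assumes that plain generalization replaces the values in a bucket with their covering range and that multiset-based generalization replaces them with the multiset of values and their counts; the function names and output formats are hypothetical.

```python
from collections import Counter

def generalize_range(values):
    """Plain generalization: replace a bucket's values with the covering range."""
    return f"[{min(values)}-{max(values)}]"

def generalize_multiset(values):
    """Multiset-based generalization: keep each distinct value with its count."""
    counts = Counter(values)
    return ",".join(f"{v}:{c}" for v, c in sorted(counts.items()))

ages = [22, 22, 33, 52]                    # hypothetical ages in one bucket
as_range = generalize_range(ages)          # "[22-52]"
as_multiset = generalize_multiset(ages)    # "22:2,33:1,52:1"
```

The multiset form retains the exact value distribution inside the bucket, whereas the range form only bounds it.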
System Configuration:
H/W System Configuration:
Processor : Pentium IV
RAM       : 256 MB (min)
Hard Disk : 20 GB
S/W System Configuration:
Operating System      : Windows 95/98/2000/XP
Application Server    : Tomcat 5.0/6.x
Front End             : HTML, Java, JSP, AJAX
Scripts               : JavaScript
Database Connectivity : MySQL