Wednesday, 16 January 2013

Slicing: A New Approach to Privacy Preserving Data Publishing


NANO SCIENTIFIC RESEARCH CENTRE PVT.LTD.,  AMEERPET, HYD
WWW.NSRCNANO.COM, 09640648777, 09652926926



Abstract:
            Several anonymization techniques, such as generalization and bucketization, have been designed for privacy preserving microdata publishing. Recent work has shown that generalization loses a considerable amount of information, especially for high-dimensional data. Bucketization, on the other hand, does not prevent membership disclosure and does not apply to data that do not have a clear separation between quasi-identifying attributes and sensitive attributes. In this paper, we present a novel technique called slicing, which partitions the data both horizontally and vertically. We show that slicing preserves better data utility than generalization and can be used for membership disclosure protection. Another important advantage of slicing is that it can handle high-dimensional data. We show how slicing can be used for attribute disclosure protection and develop an efficient algorithm for computing the sliced data that obey the ℓ-diversity requirement. Our workload experiments confirm that slicing preserves better utility than generalization and is more effective than bucketization in workloads involving the sensitive attribute. Our experiments also demonstrate that slicing can be used to prevent membership disclosure.
Algorithm Used:
Slicing Algorithms:
Our algorithm consists of three phases: attribute partitioning, column generalization, and tuple partitioning. We now describe the three phases.
Algorithm tuple-partition(T, ℓ)
1. Q = {T}; SB = ∅.
2. while Q is not empty
3. remove the first bucket B from Q; Q = Q − {B}.
4. split B into two buckets B1 and B2, as in Mondrian.
5. if diversity-check(T, Q ∪ {B1,B2} ∪ SB, ℓ)
6. Q = Q ∪ {B1,B2}.
7. else SB = SB ∪ {B}.
8. return SB.
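
The loop above can be sketched in Java as follows. This is only an illustration under stated assumptions, not the project's implementation: the Row type, the median split on age (a stand-in for the Mondrian split) and the distinctAtLeast check (a stand-in for diversity-check) are all hypothetical names introduced here.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

class TuplePartitionSketch {
    /** Hypothetical tuple shape: one numeric quasi-identifier and one sensitive value. */
    static final class Row {
        final int age;
        final String disease;
        Row(int age, String disease) { this.age = age; this.disease = disease; }
    }

    /** Stand-in for the Mondrian split (assumption): cut the bucket at the median age. */
    static List<List<Row>> split(List<Row> bucket) {
        List<Row> sorted = new ArrayList<>(bucket);
        sorted.sort(Comparator.comparingInt((Row r) -> r.age));
        int mid = sorted.size() / 2;
        List<List<Row>> halves = new ArrayList<>();
        halves.add(new ArrayList<>(sorted.subList(0, mid)));
        halves.add(new ArrayList<>(sorted.subList(mid, sorted.size())));
        return halves;
    }

    /** Stand-in check (assumption): every bucket holds at least l distinct sensitive values. */
    static boolean distinctAtLeast(List<List<Row>> buckets, int l) {
        for (List<Row> b : buckets) {
            Set<String> seen = new HashSet<>();
            for (Row r : b) seen.add(r.disease);
            if (seen.size() < l) return false;
        }
        return true;
    }

    /** Splits buckets while the candidate bucket set Q + {B1,B2} + SB still passes the check. */
    static List<List<Row>> tuplePartition(List<Row> table, Predicate<List<List<Row>>> check) {
        Deque<List<Row>> queue = new ArrayDeque<>();   // Q
        List<List<Row>> sliced = new ArrayList<>();    // SB
        queue.add(new ArrayList<>(table));
        while (!queue.isEmpty()) {
            List<Row> b = queue.removeFirst();
            boolean accepted = false;
            if (b.size() >= 2) {                       // a single tuple cannot be split further
                List<List<Row>> halves = split(b);
                List<List<Row>> candidate = new ArrayList<>(queue);
                candidate.addAll(halves);
                candidate.addAll(sliced);
                if (check.test(candidate)) {
                    queue.addAll(halves);
                    accepted = true;
                }
            }
            if (!accepted) sliced.add(b);              // keep B as a final bucket
        }
        return sliced;
    }

    /** Made-up sample table with eight tuples and alternating sensitive values. */
    static List<List<Row>> demo() {
        int[] ages = {21, 25, 30, 35, 40, 45, 50, 55};
        List<Row> table = new ArrayList<>();
        for (int i = 0; i < ages.length; i++) {
            table.add(new Row(ages[i], i % 2 == 0 ? "flu" : "cancer"));
        }
        return tuplePartition(table, buckets -> distinctAtLeast(buckets, 2));
    }
}
```

Passing the check as a Predicate keeps the partitioning loop independent of the privacy test, so a fuller diversity-check could be substituted without touching the loop.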
Algorithm diversity-check(T, T*, ℓ)
1. for each tuple t ∈ T, L[t] = ∅.
2. for each bucket B in T*
3. record f(v) for each column value v in bucket B.
4. for each tuple t ∈ T
5. calculate p(t,B) and find D(t,B).
6. L[t] = L[t] ∪ {⟨p(t,B), D(t,B)⟩}.
7. for each tuple t ∈ T
8. calculate p(t, s) for each s based on L[t].
9. if p(t, s) > 1/ℓ, return false.
10. return true.
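
A minimal Java sketch of such a check is given below. It rests on several assumptions that the text above does not spell out: tuples are String arrays, columns are arrays of attribute indices, the sensitive attribute is the last attribute of the last column, p(t,B) is taken as the product of the matching fractions of t's column values in B normalised over all buckets, D(t,B) is the multiset of sensitive values that co-occur in B with t's remaining values in the sensitive column, and a tuple fails when some p(t, s) exceeds 1/ℓ. All class and method names are hypothetical, and the sample table is made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class DiversityCheckSketch {
    /** Key for the first `count` attributes of a column, as they appear in tuple t. */
    static String key(String[] t, int[] attrs, int count) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < count; i++) sb.append(t[attrs[i]]).append('|');
        return sb.toString();
    }

    static boolean diversityCheck(List<String[]> table, List<List<String[]>> buckets,
                                  int[][] columns, int l) {
        int last = columns.length - 1;
        int sensitive = columns[last][columns[last].length - 1];
        for (String[] t : table) {
            Map<String, Double> weight = new HashMap<>(); // sum over B of f(t,B) * p(s | t,B)
            double ft = 0.0;                               // sum over B of f(t,B)
            for (List<String[]> b : buckets) {
                if (b.isEmpty()) continue;
                double ftb = 1.0;                          // f(t,B): product of matching fractions
                Map<String, Integer> d = new HashMap<>();  // D(t,B): candidate sensitive values
                int dSize = 0;
                for (int c = 0; c <= last; c++) {
                    // In the sensitive column, match only on the non-sensitive attributes.
                    int n = (c == last) ? columns[c].length - 1 : columns[c].length;
                    String want = key(t, columns[c], n);
                    int matches = 0;
                    for (String[] u : b) {
                        if (!key(u, columns[c], n).equals(want)) continue;
                        matches++;
                        if (c == last) { d.merge(u[sensitive], 1, Integer::sum); dSize++; }
                    }
                    ftb *= (double) matches / b.size();
                }
                if (ftb == 0.0) continue;                  // t cannot belong to this bucket
                ft += ftb;
                for (Map.Entry<String, Integer> e : d.entrySet()) {
                    weight.merge(e.getKey(), ftb * e.getValue() / dSize, Double::sum);
                }
            }
            for (double w : weight.values()) {
                // p(t, s) = w / ft; a small tolerance guards against rounding noise.
                if (w / ft > 1.0 / l + 1e-9) return false;
            }
        }
        return true;
    }

    /** Made-up sample: attributes are (age, sex, zipcode, disease), two buckets of four tuples. */
    static List<List<String[]>> sampleBuckets() {
        List<List<String[]>> buckets = new ArrayList<>();
        buckets.add(Arrays.asList(
            new String[]{"22", "M", "47906", "dyspepsia"},
            new String[]{"22", "F", "47906", "flu"},
            new String[]{"33", "F", "47905", "flu"},
            new String[]{"52", "F", "47905", "bronchitis"}));
        buckets.add(Arrays.asList(
            new String[]{"54", "M", "47302", "flu"},
            new String[]{"60", "M", "47302", "dyspepsia"},
            new String[]{"60", "M", "47304", "dyspepsia"},
            new String[]{"64", "F", "47304", "gastritis"}));
        return buckets;
    }
}
```

With the columns {age, sex} and {zipcode, disease}, every tuple in this sample can be linked to exactly two equally likely sensitive values, so the check passes for ℓ = 2 and fails for ℓ = 3.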
Existing System:
            First, many existing clustering algorithms (e.g., k-means) require the calculation of “centroids”, but there is no notion of “centroids” in our setting, where each attribute forms a data point in the clustering space. Second, the k-medoid method is very robust to outliers (i.e., data points that are very far away from the rest of the data points). Third, the order in which the data points are examined does not affect the clusters computed by the k-medoid method.
Disadvantages:
1.      Existing anonymization algorithms, e.g., Mondrian, can be used for column generalization. The algorithms can be applied to the subtable containing only the attributes in one column to ensure the anonymity requirement.
2.      Existing data analysis (e.g., query answering) methods can be easily used on the sliced data.
3.      Existing privacy measures for membership disclosure protection include differential privacy and presence.
Proposed System:
            We present a novel technique called slicing, which partitions the data both horizontally and vertically. We show that slicing preserves better data utility than generalization and can be used for membership disclosure protection. Another important advantage of slicing is that it can handle high-dimensional data. We show how slicing can be used for attribute disclosure protection and develop an efficient algorithm for computing the sliced data that obey the ℓ-diversity requirement. Our workload experiments confirm that slicing preserves better utility than generalization and is more effective than bucketization in workloads involving the sensitive attribute.
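
To make "horizontally and vertically" concrete, here is a small Java sketch. It assumes the buckets (horizontal partition) and the columns (vertical partition, given as groups of attribute indices) have already been chosen, skips column generalization, and simply permutes each column's values independently inside every bucket so that the linkage between columns is broken. The class, method names and sample table are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

class SlicingSketch {
    /** For each bucket, returns sliced rows; each row holds one String[] per column. */
    static List<List<String[][]>> slice(List<List<String[]>> buckets, int[][] columns, Random rnd) {
        List<List<String[][]>> out = new ArrayList<>();
        for (List<String[]> bucket : buckets) {
            List<String[][]> rows = new ArrayList<>();
            for (int r = 0; r < bucket.size(); r++) rows.add(new String[columns.length][]);
            for (int c = 0; c < columns.length; c++) {
                // Vertical partition: project every tuple onto the attributes of column c.
                List<String[]> values = new ArrayList<>();
                for (String[] t : bucket) {
                    String[] v = new String[columns[c].length];
                    for (int i = 0; i < v.length; i++) v[i] = t[columns[c][i]];
                    values.add(v);
                }
                // Permute the column values within the bucket, independently per column.
                Collections.shuffle(values, rnd);
                for (int r = 0; r < rows.size(); r++) rows.get(r)[c] = values.get(r);
            }
            out.add(rows);
        }
        return out;
    }

    /** Sorted values of column c in one sliced bucket, joined with commas (for inspection). */
    static List<String> columnValues(List<String[][]> bucket, int c) {
        List<String> vals = new ArrayList<>();
        for (String[][] row : bucket) vals.add(String.join(",", row[c]));
        Collections.sort(vals);
        return vals;
    }

    /** Made-up sample: attributes are (age, sex, zipcode, disease), two buckets of four tuples. */
    static List<List<String[]>> sample() {
        List<List<String[]>> buckets = new ArrayList<>();
        buckets.add(Arrays.asList(
            new String[]{"22", "M", "47906", "dyspepsia"},
            new String[]{"22", "F", "47906", "flu"},
            new String[]{"33", "F", "47905", "flu"},
            new String[]{"52", "F", "47905", "bronchitis"}));
        buckets.add(Arrays.asList(
            new String[]{"54", "M", "47302", "flu"},
            new String[]{"60", "M", "47302", "dyspepsia"},
            new String[]{"60", "M", "47304", "dyspepsia"},
            new String[]{"64", "F", "47304", "gastritis"}));
        return buckets;
    }
}
```

Calling slice(sample(), new int[][]{{0, 1}, {2, 3}}, new Random(7)) keeps age with sex and zipcode with disease, so correlations inside a column survive, while the pairing between the two columns is randomized within each bucket.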
Advantages:
1.      We introduce a novel data anonymization technique called slicing to improve the current state of the art.
2.      We show that slicing can be effectively used for preventing attribute disclosure, based on the privacy requirement of ℓ-diversity.
3.      We develop an efficient algorithm for computing the sliced table that satisfies ℓ-diversity. Our algorithm partitions attributes into columns, applies column generalization, and partitions tuples into buckets. Attributes that are highly correlated are placed in the same column.
4.      We conduct extensive workload experiments. Our results confirm that slicing preserves much better data utility than generalization. In workloads involving the sensitive attribute, slicing is also more effective than bucketization. In some classification experiments, slicing shows better performance than using the original data (which may overfit the model). Our experiments also show the limitations of bucketization in membership disclosure protection and that slicing remedies these limitations.
Module Description:
1.      Original Data
2.      Generalized Data
3.      Bucketized Data
4.      Multiset-based Generalization Data
5.      One-attribute-per-Column Slicing Data
6.      Sliced Data

System Configuration:

H/W System Configuration:

Processor    -    Pentium IV
RAM          -    256 MB (min)
Hard Disk    -    20 GB

S/W System Configuration:

Operating System         :   Windows 95/98/2000/XP
Application Server       :   Tomcat 5.0/6.X
Front End                :   HTML, Java, JSP, AJAX
Scripts                  :   JavaScript
Database Connectivity    :   MySQL
