NANO SCIENTIFIC RESEARCH CENTRE PVT.LTD., AMEERPET, HYD
WWW.NSRCNANO.COM, 09640648777, 09652926926
DOT NET PROJECTS LIST--2013
DOT NET 2013 IEEE PAPERS
Optimizing Bloom Filter Settings in Peer-to-Peer
Multi-keyword Searching
Abstract:
Peer-to-Peer
multi-keyword searching requires distributed intersection/union operations
across wide area networks, raising a large amount of traffic cost. Existing
schemes commonly utilize Bloom Filters (BFs) encoding to effectively reduce the
traffic cost during the intersection/union operations. In this paper, we
address the problem of optimizing the settings of a BF. We show, through mathematical
proof, that the optimal setting of BF in terms of traffic cost is determined by
the statistical information of the involved inverted lists, not the minimized
false positive rate as claimed by previous studies. Through numerical analysis,
we demonstrate how to obtain optimal settings. To better evaluate the
performance of this design, we conduct comprehensive simulations on TREC WT10G test
collection and query logs of a major commercial web search engine. Results show
that our design significantly reduces the search traffic and latency of the
existing approaches.
Existing System:
Existing P2P retrieval mechanisms provide a scalable
distributed hash table (DHT) [3] that allows every individual keyword to be
mapped to a set of documents/nodes across the network that contains the keyword.
Using this single-keyword- based index, a list of entries for each keyword in a
query can be retrieved by using existing DHT lookups. For multikeyword search,
the simple solution which merges the results of each keyword search incurs a
lot of traffic.
Proposed System:
It is well known that Bloom Filter (BF) [4] is an effective way to reduce
such communication cost [5], [6]. A BF is a lossy but succinct and efficient
data structure to represent a set S, which can efficiently process the
membership query such as “is the element x in the set S.” By transmitting the encoded
sets instead of raw sets among peers, the communication cost can be effectively
saved. Applying BF is not difficult, but how to get optimal results in terms of
minimum communication cost is not trivial
In this work, we show
mathematically that the optimal setting of BF in terms of communication cost is
determined by the global statistical information of the involved items on both
sides, not the minimized false positive rate as claimed by the previous studies
[5]. We further demonstrate how to get optimal settings through numerical
analysis. Moreover, we find that the intersection order between sets is indeed important
for multikeyword search; thus, we design optimal order strategies based on BF
for both queries with “AND” and “OR” operators.
Software
and Hardware Requirements
Hardware Required:
System : Pentium IV
Hard Disk : 80
GB
RAM : 512 MB
Software
Required:
Operating
System : Windows
XP
Language
: C#
Modules:
·
Authorization
·
Finding Network Computers
·
DHT creation
·
Single-Keyword Search
·
Multi-Keyword Search
No comments:
Post a Comment