NANO SCIENTIFIC RESEARCH CENTRE
PVT.LTD., AMEERPET, HYD
WWW.NSRCNANO.COM, 09640648777, 09652926926
JAVA PROJECTS LIST--2013
JAVA 2013 IEEE PAPERS
Clustering with Multiviewpoint-Based Similarity Measure
Abstract:
All clustering methods have to assume some cluster relationship among the data
objects they are applied to. Similarity between a pair of objects can be
defined either explicitly or implicitly. In this paper, we introduce a novel
multiviewpoint-based similarity measure and two related clustering methods. The
major difference between a traditional dissimilarity/similarity measure and
ours is that the former uses only a single viewpoint, which is the origin,
while the latter utilizes many different viewpoints, which are objects assumed
not to be in the same cluster as the two objects being measured. By using
multiple viewpoints, a more informative assessment of similarity could be
achieved. Theoretical analysis and an empirical study are conducted to support
this claim. Two criterion functions for document clustering are proposed based
on this new measure. We compare them with several well-known clustering
algorithms that use other popular similarity measures on various document
collections to verify the advantages of our proposal.
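To make the idea concrete, here is a minimal Java sketch of a multiviewpoint similarity. It assumes that the similarity seen from one viewpoint d_h is the dot product of (d_i - d_h) and (d_j - d_h), and that the overall score is the average over viewpoints drawn from documents outside the cluster of the two documents being compared. The class and method names are hypothetical, and this is an illustration of the idea rather than the paper's own implementation. With the origin as the only viewpoint, the score reduces to the ordinary dot product, which is the single-viewpoint case the abstract contrasts against.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/**
 * Minimal sketch of a multiviewpoint-based similarity, under these assumptions:
 * - documents are dense double[] term vectors of equal length;
 * - the similarity seen from one viewpoint v is the dot product (a - v) . (b - v);
 * - the overall score is the average over the supplied viewpoints, which the
 *   caller picks from documents assumed to lie outside the cluster of a and b.
 * Class and method names are hypothetical, not taken from the paper.
 */
public class MultiViewpointSimilarity {

    /** Similarity of a and b as seen from viewpoint v: (a - v) . (b - v). */
    static double viewpointSimilarity(double[] a, double[] b, double[] v) {
        double sum = 0.0;
        for (int t = 0; t < a.length; t++) {
            sum += (a[t] - v[t]) * (b[t] - v[t]);
        }
        return sum;
    }

    /** Average of viewpointSimilarity over every viewpoint in the list. */
    static double mvs(double[] a, double[] b, List<double[]> viewpoints) {
        if (viewpoints.isEmpty()) {
            throw new IllegalArgumentException("at least one viewpoint is required");
        }
        double total = 0.0;
        for (double[] v : viewpoints) {
            total += viewpointSimilarity(a, b, v);
        }
        return total / viewpoints.size();
    }

    public static void main(String[] args) {
        double[] d1 = {1.0, 0.0, 1.0};
        double[] d2 = {1.0, 0.0, 0.5};

        // A single viewpoint at the origin reduces to the plain dot product d1 . d2.
        List<double[]> originOnly = Collections.singletonList(new double[3]);
        System.out.println(mvs(d1, d2, originOnly)); // prints 1.5

        // Two viewpoints assumed to belong to other clusters.
        List<double[]> others = Arrays.asList(new double[] {0.0, 1.0, 0.0},
                                              new double[] {0.0, 2.0, 0.0});
        System.out.println(mvs(d1, d2, others)); // prints 4.0
    }
}
```

The same pair of documents gets a different score depending on where it is viewed from; the measure aggregates those views instead of relying on the origin alone.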
Existing System:
A common approach to the clustering problem is to treat it as an optimization
process. An optimal partition is found by optimizing a particular function of
similarity (or distance) among data. This rests on an implicit assumption that
the true intrinsic structure of the data can be correctly described by the
similarity formula embedded in the clustering criterion function. Hence, the
effectiveness of clustering algorithms under this approach depends on how
appropriate the similarity measure is for the data at hand. For instance, the
original k-means has a sum-of-squared-error objective function that uses
Euclidean distance. In a very sparse and high-dimensional domain like text
documents, spherical k-means, which uses cosine similarity (CS) instead of
Euclidean distance as the measure, is deemed to be more suitable.
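The difference between Euclidean distance and cosine similarity on text shows up even with a few toy term vectors. The Java sketch below compares the two measures and then runs a small spherical k-means loop: each document is normalized to unit length, assigned to the centroid with the highest cosine similarity, and each centroid is recomputed as the normalized sum of its members. The class and method names are hypothetical, and seeding the centroids with the first k documents is a naive choice made only to keep the run deterministic.

```java
import java.util.Arrays;

// Sketch: Euclidean vs cosine on term vectors, plus a small spherical k-means loop.
// Assumptions: dense double[] vectors; the first k documents seed the centroids (a naive
// choice made only for determinism); class and method names are hypothetical.
public class SphericalKMeansSketch {

    static double dot(double[] a, double[] b) {
        double s = 0.0;
        for (int t = 0; t < a.length; t++) {
            s += a[t] * b[t];
        }
        return s;
    }

    // Unit-length copy of a; a zero vector is returned unchanged.
    static double[] normalize(double[] a) {
        double norm = Math.sqrt(dot(a, a));
        double[] out = a.clone();
        for (int t = 0; norm > 0.0 && t < out.length; t++) {
            out[t] /= norm;
        }
        return out;
    }

    static double cosine(double[] a, double[] b) {
        return dot(normalize(a), normalize(b));
    }

    static double euclidean(double[] a, double[] b) {
        double s = 0.0;
        for (int t = 0; t < a.length; t++) {
            s += (a[t] - b[t]) * (a[t] - b[t]);
        }
        return Math.sqrt(s);
    }

    // Returns a cluster label in [0, k) for each document.
    static int[] cluster(double[][] docs, int k, int maxIter) {
        int n = docs.length, dims = docs[0].length;
        double[][] x = new double[n][];
        for (int i = 0; i < n; i++) {
            x[i] = normalize(docs[i]);
        }
        double[][] centroids = new double[k][];
        for (int c = 0; c < k; c++) {
            centroids[c] = x[c].clone();
        }
        int[] labels = new int[n];
        Arrays.fill(labels, -1);
        for (int iter = 0; iter < maxIter; iter++) {
            // Assignment step: highest cosine similarity (dot product of unit vectors) wins.
            boolean changed = false;
            for (int i = 0; i < n; i++) {
                int best = 0;
                for (int c = 1; c < k; c++) {
                    if (dot(x[i], centroids[c]) > dot(x[i], centroids[best])) {
                        best = c;
                    }
                }
                changed |= labels[i] != best;
                labels[i] = best;
            }
            if (!changed) {
                break;
            }
            // Update step: each centroid becomes the normalized sum of its members.
            double[][] sums = new double[k][dims];
            for (int i = 0; i < n; i++) {
                for (int t = 0; t < dims; t++) {
                    sums[labels[i]][t] += x[i][t];
                }
            }
            for (int c = 0; c < k; c++) {
                if (dot(sums[c], sums[c]) > 0.0) { // keep the old centroid if the sum is zero (e.g. empty cluster)
                    centroids[c] = normalize(sums[c]);
                }
            }
        }
        return labels;
    }

    public static void main(String[] args) {
        // Toy term counts over three terms: docs 0 and 2 share a topic, as do docs 1 and 3.
        double[][] docs = {{2, 0, 0}, {0, 0, 3}, {10, 1, 0}, {0, 1, 5}};
        // Euclidean distance ranks doc 1 (other topic) closer to doc 0 than the longer doc 2.
        System.out.println(euclidean(docs[0], docs[1]) < euclidean(docs[0], docs[2])); // prints true
        // Cosine similarity ignores vector length and prefers doc 2.
        System.out.println(cosine(docs[0], docs[2]) > cosine(docs[0], docs[1])); // prints true
        System.out.println(Arrays.toString(cluster(docs, 2, 20))); // prints [0, 1, 0, 1]
    }
}
```

Because doc 2 is simply a longer document about the same term as doc 0, Euclidean distance pushes the two apart, while cosine similarity, and therefore spherical k-means, keeps them together.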
Proposed System:
The work in this paper is motivated by the findings above and by similar
research. It appears to us that the nature of the similarity measure plays a
very important role in the success or failure of a clustering method. Our first
objective is to derive a novel method for measuring similarity between data
objects in sparse and high-dimensional domains, particularly text documents.
From the proposed similarity measure, we then formulate new clustering
criterion functions and introduce their respective clustering algorithms, which
are fast and scalable like k-means but are also capable of providing
high-quality and consistent performance.
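One reason a measure built on many viewpoints need not be slower than k-means is algebraic. Suppose, as an assumption for this sketch, that the per-viewpoint similarity is the dot product (d_i - d_h)·(d_j - d_h) averaged over m viewpoints. Expanding the product gives MVS(d_i, d_j) = d_i·d_j - (d_i·D + d_j·D)/m + Q/m, where D is the sum of the viewpoint vectors and Q is the sum of their squared norms. D and Q can be computed once and reused for every pair. The Java sketch below does this and checks the shortcut against the direct loop. This rearrangement is our own illustration, not necessarily how the paper's algorithms are implemented, and all names are hypothetical.

```java
import java.util.Random;

/**
 * Sketch: evaluating an averaged multiviewpoint similarity without looping over viewpoints.
 * Assumed definition: MVS(a, b) = (1/m) * sum over viewpoints v of (a - v) . (b - v).
 * Expanding the product gives
 *   MVS(a, b) = a.b - (a.D + b.D) / m + Q / m,
 * where D is the sum of the m viewpoint vectors and Q the sum of their squared norms.
 * Class and method names are hypothetical.
 */
public class MvsClosedForm {

    static double dot(double[] a, double[] b) {
        double s = 0.0;
        for (int t = 0; t < a.length; t++) {
            s += a[t] * b[t];
        }
        return s;
    }

    /** Precomputed summary of a viewpoint set: vector sum D, squared-norm sum Q, count m. */
    static final class ViewpointSummary {
        final double[] sum;
        final double sqNormSum;
        final int count;

        ViewpointSummary(double[][] viewpoints) {
            if (viewpoints.length == 0) {
                throw new IllegalArgumentException("at least one viewpoint is required");
            }
            sum = new double[viewpoints[0].length];
            double q = 0.0;
            for (double[] v : viewpoints) {
                for (int t = 0; t < v.length; t++) {
                    sum[t] += v[t];
                }
                q += dot(v, v);
            }
            sqNormSum = q;
            count = viewpoints.length;
        }
    }

    /** O(dims) per pair once the summary is built. */
    static double mvsFast(double[] a, double[] b, ViewpointSummary s) {
        return dot(a, b) - (dot(a, s.sum) + dot(b, s.sum)) / s.count + s.sqNormSum / s.count;
    }

    /** Direct definition, O(m * dims) per pair; used here only as a cross-check. */
    static double mvsBruteForce(double[] a, double[] b, double[][] viewpoints) {
        double total = 0.0;
        for (double[] v : viewpoints) {
            for (int t = 0; t < a.length; t++) {
                total += (a[t] - v[t]) * (b[t] - v[t]);
            }
        }
        return total / viewpoints.length;
    }

    public static void main(String[] args) {
        Random rnd = new Random(42); // fixed seed so the run is repeatable
        int dims = 5, m = 100;
        double[][] viewpoints = new double[m][dims];
        for (double[] v : viewpoints) {
            for (int t = 0; t < dims; t++) {
                v[t] = rnd.nextDouble();
            }
        }
        double[] a = new double[dims], b = new double[dims];
        for (int t = 0; t < dims; t++) {
            a[t] = rnd.nextDouble();
            b[t] = rnd.nextDouble();
        }
        ViewpointSummary summary = new ViewpointSummary(viewpoints); // built once, reused for every pair
        double fast = mvsFast(a, b, summary);
        double direct = mvsBruteForce(a, b, viewpoints);
        System.out.println(Math.abs(fast - direct) < 1e-9); // prints true
    }
}
```

With n documents and m viewpoints, the direct form costs O(m · dims) per pair, whereas the summarized form costs O(dims) per pair after a single O(m · dims) pass to build D and Q.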
Software Requirement Specification
Software Specification
Operating System : Windows XP
Technology : JAVA 1.6, JFreeChart
Hardware Specification
Processor : Pentium IV
RAM : 512 MB
Hard Disk : 80 GB