Open Problems in Relational Data Clustering
University of Maryland Baltimore County
Adam Anthony (aanthon2@umbc.edu) and Marie desJardins (mariedj@cs.umbc.edu)

Overview
Data clustering is the task of detecting patterns in a set of data. Most clustering algorithms take non-relational data as input, and they are sometimes unable to find significant patterns. Many data sets include relational information in addition to independent object attributes, and relational data clustering techniques can help find strong patterns in such data sets. Two areas of interest in relational data clustering are clustering heterogeneous data and relation selection.

Feature Space
A feature space is a set of objects with attributes, $FS = \{o_1, o_2, \ldots, o_n\}$, where each object $o_i$ is described by its attribute values.

Relation Space
A relation space is a set of relation graphs, $RS = \{RG_1, RG_2, \ldots, RG_K\}$, where $RG_i = \{O_i, R_i\}$, $O_i \subseteq FS$, and $R_i$ is the set of edges for a specific relation.

Example Data Sets
Internet Movie Database: Attributes include personal data such as awards received, financial earnings, age, gender, and Hollywood Stock Exchange rating. Examples of relations are acted-in, directed, and sequel.
CIA World Factbook: Attribute values come from categories such as government, economics, and population. Relations can be derived from sources such as common membership in international organizations.

Prior Research
- Join related objects to form independent compound objects, then cluster them normally (Yin et al., 2005).
- Use attribute-based distance measures as weights in a relation graph, and adapt a graph-cutting algorithm to use the edge weights (Neville et al., 2003).
- Use a probabilistic relational model with an adapted EM algorithm (Taskar et al., 2001).
- Calculate a hybrid metric that linearly combines relation similarity and attribute similarity, then run a single-link clustering algorithm (Bhattacharya and Getoor, 2005).

Heterogeneous Data
It can be very difficult to compare objects of different types. For example, how can actors be compared to directors?
One possibility is an inter-cluster relation signature:
1. Cluster one set of homogeneous data. This is the reference clustering.
2. For each object, create a vector that records the number of links from that object to each cluster discovered in step 1. This is the object's inter-cluster relation signature.
3. Cluster all objects based on their inter-cluster relation signatures.

[Figure: a relation graph linking Ron Howard, Norman Jewison, Carl Weathers, and Talia Shire through directed and acted-in edges, with movie clusters Boxing, Comedy, and Drama, and example signatures such as "1 Boxing, 1 Comedy" and "1 Boxing, 1 Drama".]

Relation Selection
Just as some features are not helpful for clustering a data set, it is intuitive that some relations might provide little information to a relational clustering algorithm, or might even harm its performance. As relational clustering algorithms continue to develop, detecting such relation graphs will become more important.

[Figure: two relation graphs over nodes labeled AU, G-77, Botswana, Kenya, Thailand, Japan, China, AsDB, US, UK, Italy, G-8, and UNSC.] The graph on the right includes an additional relation graph (blue links) that represents the World Trade Organization, which fully connects all countries shown (redundant links omitted). Including the WTO as one of the relation graphs obscures the patterns visible in the graph on the left, which makes a clustering harder to find. We consider this situation analogous to a feature-space attribute that has the same value for all objects. Removing the WTO graph reduces the size of the total graph and makes patterns easier to find.

Conclusion
Early research in relational clustering has been successful. Analyzing relational patterns can help us develop methods for comparing heterogeneous data objects. Developing relation selection techniques will help improve existing relational clustering algorithms.

This research was funded by NSF grant #0545726.

References
Bhattacharya, I., & Getoor, L. (2005).
Entity resolution in graph data (Technical Report CS-TR-4758). University of Maryland.

Neville, J., Adler, M., & Jensen, D. (2003).
Clustering relational data using attribute and link information. Proceedings of the Text Mining and Link Analysis Workshop.

Taskar, B., Segal, E., & Koller, D. (2001).
Probabilistic classification and clustering in relational data. Proceedings of IJCAI-01, 17th International Joint Conference on Artificial Intelligence (pp. 870–878). Seattle, US.

Yin, X., Han, J., & Yu, P. S. (2005).
Cross-relational clustering with user’s guidance. KDD ’05: Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery in Data Mining (pp. 344–353). New York, NY, USA: ACM Press.
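The three-step inter-cluster relation signature procedure from the Heterogeneous Data section can be sketched in Python as follows. This is a minimal sketch under stated assumptions. The reference clustering from step 1 is taken as given. Step 3 uses plain k-means, which the poster does not specify. The function names, movie identifiers, and people are hypothetical.

```python
from collections import defaultdict


def relation_signatures(links, reference_labels, num_clusters):
    """Step 2: count each object's links into every reference cluster.

    links: iterable of (obj, target) pairs; every target belongs to the
        homogeneous set clustered in step 1.
    reference_labels: dict mapping each target to a cluster index in
        range(num_clusters), i.e. the reference clustering from step 1.
    """
    signatures = defaultdict(lambda: [0] * num_clusters)
    for obj, target in links:
        signatures[obj][reference_labels[target]] += 1
    return dict(signatures)


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=20):
    """Step 3 (assumed algorithm): Lloyd's k-means over the signatures."""
    names = sorted(vectors)
    # Deterministic initialisation: the first k objects in sorted order.
    centroids = [list(map(float, vectors[n])) for n in names[:k]]
    labels = {}
    for _ in range(iterations):
        # Assign each object to its nearest centroid.
        labels = {
            n: min(range(k), key=lambda c: squared_distance(vectors[n], centroids[c]))
            for n in names
        }
        # Move each centroid to the mean of its members (empty clusters keep theirs).
        for c in range(k):
            members = [vectors[n] for n in names if labels[n] == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical step-1 output: movies already clustered by genre.
genres = ["Boxing", "Comedy", "Drama"]
reference_labels = {"movie1": 0, "movie2": 0, "movie3": 1, "movie4": 2}

# Hypothetical directed / acted-in links from people to movies.
links = [
    ("director_a", "movie1"), ("director_a", "movie3"),
    ("actor_b", "movie1"), ("actor_b", "movie3"),
    ("director_c", "movie2"), ("director_c", "movie4"),
    ("actor_d", "movie2"), ("actor_d", "movie4"),
]

signatures = relation_signatures(links, reference_labels, len(genres))
labels = kmeans(signatures, k=2)
print(signatures)
print(labels)
```

In this toy data, the hypothetical director_a and actor_b both receive the signature [1, 1, 0] (one Boxing link, one Comedy link), so they fall into the same cluster. The signature compares objects of different types only through their links, without needing a shared attribute set.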