Project Seminar on STABLE CLUSTERING ALGORITHM TO IDENTIFY CPU USAGE OF COMPUTERS BEHAVIOR IN GRID ENVIRONMENT Under the guidance of Prof. Lakshmi Rajamani.

Project Seminar on STABLE CLUSTERING ALGORITHM TO IDENTIFY CPU USAGE OF COMPUTERS BEHAVIOR IN GRID ENVIRONMENT Under the guidance of Prof. Lakshmi Rajamani (Head of the Department) SUBMITTED BY G. Naresh Kumar ( ) M.Tech(CSE)-III SEM

Contents: Introduction. Motivation. Problem statement. Work done so far. Work to be done. Conclusion. References.

Introduction: Grid computing, or simply the grid, is a generic term for technologies designed to make pools of distributed computer resources available on demand. Grid computing has become a well-established method for Internet-based high-performance computing. A grid provides widespread, dynamic, flexible and coordinated sharing of geographically distributed networked resources among dynamic user groups.

Data mining: Data mining, or knowledge discovery, refers to a variety of techniques that have been developed in the fields of databases, machine learning and pattern recognition. The process of finding useful patterns and information in raw data is often known as data mining.

Clustering: Clustering is a division of data into groups of similar objects. A cluster is a collection of data objects that are similar to one another within the same cluster and are dissimilar to the objects in other clusters. It is a process of unsupervised learning. Cluster analysis has been widely used in numerous applications, including market research, pattern recognition, data analysis, and image processing.

Clustering techniques:
1) Partitioning Clustering: PAM, CLARA, CLARANS, K-Means
2) Supervised Clustering: K-nearest neighbors
3) On-line mode clustering: ECM, Evoc
4) Fuzzy Clustering: Fuzzy c-means
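As an illustration of one partitioning method from the list above, here is a minimal K-Means sketch applied to one-dimensional CPU usage readings. It is not the implementation used in this project: the function names `assign` and `kmeans_1d`, the sample readings, and the choice of pure Python are all assumptions made for the example.

```python
import random


def assign(values, centers):
    """Assignment step: put each value in the cluster of its nearest center."""
    clusters = [[] for _ in centers]
    for v in values:
        nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
        clusters[nearest].append(v)
    return clusters


def kmeans_1d(values, k, iterations=50, seed=0):
    """Plain K-Means on 1-D data (hypothetical helper, e.g. CPU usage in %)."""
    rng = random.Random(seed)
    centers = rng.sample(values, k)  # start from k readings taken from the data
    for _ in range(iterations):
        clusters = assign(values, centers)
        # Update step: move each center to the mean of its members;
        # an empty cluster keeps its previous center.
        new_centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
        if new_centers == centers:  # no center moved, so we have converged
            break
        centers = new_centers
    return centers, assign(values, centers)


# Made-up CPU usage readings (percent) from six machines.
cpu_usage = [5, 7, 6, 90, 92, 95]
centers, clusters = kmeans_1d(cpu_usage, k=2)
```

On these readings the idle machines and the busy machines end up in separate clusters. Real grid data would be multi-dimensional and arrive continuously, which is why on-line methods such as ECM and Evoc also appear in the list.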

Motivation: In a grid environment, the number of participating computing nodes and users keeps increasing and may reach thousands or millions. The abundance of these resources creates new problems, such as how to collect massive amounts of evolving resource data in real time and extract useful information from it. Moreover, this data is unordered, random and chaotic, so a normal user cannot easily discover any knowledge or meaningful information in it. To meet these requirements, clustering is proposed as one of the best ways of processing large sets of raw data and turning them into meaningful information.

The Flow of Clustering Process in Grid Environment

Problem statement: Mining clusters in a single large database requires considerable processing power. As a result, the conventional technology used for centralized data mining is no longer suitable for new systems. We apply different clustering methods to CPU usage data to identify the behavior of computers. Identifying the stable algorithm requires evaluating dynamicity, accuracy and the ability to identify stable cluster members. The best of these clustering algorithms will be implemented for better processing and cluster stability in the grid environment. The results, however, depend on the threshold value, stability value and stability hour.

Work done so far: Survey on the existing clustering algorithms. Survey on Grid technologies. Installed Grid gain toolkit.

Work to be done: Test different types of clustering algorithms and calculate their performance and complexity on a single system. Test the clustering algorithms in a grid environment and measure their performance to find the stable clustering algorithm. Finally, implement the stable clustering algorithm in the grid environment for better processing and cluster stability.

Cluster Stability:
Stability Value: The value (as a percentage) that bounds the allowed change in cluster radius. For instance, if the stability value is defined as 5%, any cluster whose radius grows or shrinks by less than 5% of its original size is considered stable.
Stability Hour: The value that defines how long, in hours, a cluster member must stay in the same cluster in order to be considered stable. If the stability hour is set to 3 hours, any cluster member that stays in the same cluster for more than this amount of time is considered stable.
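The stability value test can be written down in a few lines. The sketch below is only an illustration under stated assumptions: the function names are hypothetical, the radius is taken to be a single positive number per cluster, and "less than" is read as a strict comparison.

```python
def radius_change_percent(original_radius, current_radius):
    """Relative change of the cluster radius, in percent of the original size."""
    if original_radius <= 0:
        raise ValueError("original_radius must be positive")
    return abs(current_radius - original_radius) / original_radius * 100.0


def is_cluster_stable(original_radius, current_radius, stability_value):
    """A cluster is stable if its radius grew or shrank by less than
    stability_value percent (strict comparison is an assumption)."""
    return radius_change_percent(original_radius, current_radius) < stability_value


# With a 5% stability value, a radius moving from 10.0 to 10.4 (a 4% change)
# counts as stable, while a move from 10.0 to 11.0 (a 10% change) does not.
stable = is_cluster_stable(10.0, 10.4, stability_value=5.0)
unstable = is_cluster_stable(10.0, 11.0, stability_value=5.0)
```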

Assumptions: A cluster is considered stable according to the stability value, which is pre-defined by the user (for instance, 20%). A cluster member is considered stable if it stays in the same stable cluster continuously for at least two hours. The stability hour is determined by the user.
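The member stability rule can be sketched in the same way. This example assumes that each machine's cluster assignment is sampled once per hour, that each sample stands for one full hour of membership, and that the set of stable clusters is already known. The function names and cluster labels are hypothetical.

```python
def longest_stay_hours(hourly_cluster_ids, stable_clusters):
    """Longest continuous run (in hourly samples) that a member spends
    in one and the same stable cluster."""
    longest = current = 0
    previous = None
    for cluster_id in hourly_cluster_ids:
        if cluster_id not in stable_clusters:
            current = 0   # time in an unstable cluster does not count
        elif cluster_id == previous:
            current += 1  # still in the same stable cluster
        else:
            current = 1   # just moved into a stable cluster
        longest = max(longest, current)
        previous = cluster_id
    return longest


def is_member_stable(hourly_cluster_ids, stable_clusters, stability_hour=2):
    """A member is stable if it stayed in the same stable cluster
    continuously for at least stability_hour hours."""
    return longest_stay_hours(hourly_cluster_ids, stable_clusters) >= stability_hour


# A machine observed in clusters A, A, B, A over four hours, where only A is stable.
history = ["A", "A", "B", "A"]
member_ok = is_member_stable(history, stable_clusters={"A"}, stability_hour=2)
```

The threshold here uses "at least", following the assumption above. If the stricter "more than" reading from the stability hour definition is intended, the comparison would change from `>=` to `>`.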

Conclusion: The stable clustering algorithm has been evaluated using three main criteria: dynamicity, accuracy and the ability to identify stable cluster members. This stable clustering algorithm can handle and process massive amounts of data without any significant error rate. From the experiment, we can conclude that the stable clustering algorithm is more dynamic than other existing clustering algorithms.

References:
Ahmar Abbas, "Grid Computing: A Practical Guide to Technology and Applications", Charles River Media Inc.
Jiawei Han and Micheline Kamber, "Data Mining: Concepts and Techniques", University of Illinois at Urbana-Champaign, Morgan Kaufmann Publishers, 2000.
Zhijie Xu, Laisheng Wang, Jiancheng Luo and Jianqin Zhang, "A Modified Clustering Algorithm for Data Mining".
Kee Sim Ee, Chan Huah Yang and Fazilah Haran, "Mining of Resource Usage Using Evoc Algorithm in Grid Environment".
Huimin Wang, Guihua Nie and Kui Fu, "Distributed data mining based on semantic web and grid", 2009 International Conference on Computational Intelligence and Natural Computing.
Ping Luo, Kevin Li, Zhongzhi Shi and Qing He, "Distributed data mining in grid computing environments".
David A. Cieslak, Nitesh V. Chawla and Douglas L. Thain, "Troubleshooting Thousands of Jobs on Production Grids Using Data Mining Techniques", 9th Grid Computing Conference, 2008.

Thank you