Ant Inspired Data Mining
Brandon Emerson
April 22, 2013



What is data mining?
Data mining is any process that analyzes and organizes data into clear and concise formats. It is particularly powerful for identifying relationships between data points. It is mainly used by companies with a consumer focus, specifically their marketing divisions, where it lets them draw meaningful relationships between products and consumers.

Applications in Physics
Efficient data mining techniques can improve data storage and retrieval in experiments that require a great deal of data collection. Effective mining can help analysts identify relationships between specific data points, and thus between the physical phenomena they describe.

Our Goals
Use basic ideas about ant behavior to develop an effective means of data mining. Discuss recent improvements to ant clustering algorithms, and compare data mining techniques using results from simple tests.

A Simple Model of Ants-1
[Figure: an ant and an object, labeled "Ant" and "Object"]

A Simple Model of Ants-2
The probability of picking up an object depends on a constant a and on f, the perceived fraction of objects nearby. The probability of placing an object depends on a constant b and on the same f. Assuming the ant moves randomly and has enough time to explore the entire area, all of the objects can be expected to end up clustered together.
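The two probability formulas are not preserved in this transcript. A minimal sketch, assuming the commonly cited Deneubourg-style form in which the slide's constants a and b act as thresholds; the function names and default values are hypothetical:

```python
def pick_probability(f, a=0.1):
    """Assumed form (a / (a + f))**2: near 1 when few objects are nearby."""
    return (a / (a + f)) ** 2


def place_probability(f, b=0.3):
    """Assumed form (f / (b + f))**2: near 1 when many objects are nearby."""
    return (f / (b + f)) ** 2


if __name__ == "__main__":
    for f in (0.0, 0.25, 0.5, 1.0):
        print(f, round(pick_probability(f), 3), round(place_probability(f), 3))
```

Under this assumed form, an isolated object (f close to 0) is almost always picked up, and a carried object is most likely to be placed where f is large, which is what drives the clustering described above.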

A Note on Perception
f is the perceived fraction of objects nearby. The computed value is used when f > 0; otherwise f is set to 0. f(x) is now a measure of the similarity of object x to each object y in the area around object x. The contribution of y differs depending on whether the two objects are the same or different. α is a scale factor for dissimilarity.
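The expression for f(x) is not preserved in this transcript. A minimal sketch, assuming the Lumer and Faieta style neighborhood function often used in ant clustering, where each neighbor y contributes 1 - d(x, y)/α and the average is clipped at 0; the function name, the argument layout, and the use of a dissimilarity d in [0, 1] are assumptions:

```python
def neighborhood_similarity(dissimilarities, alpha, n_cells):
    """Assumed form: f(x) = max(0, (1 / n_cells) * sum(1 - d / alpha)).

    dissimilarities: d(x, y) for every object y found around x
    alpha: scale factor for dissimilarity
    n_cells: number of cells in the neighborhood being inspected
    """
    total = sum(1.0 - d / alpha for d in dissimilarities)
    return max(0.0, total / n_cells)


if __name__ == "__main__":
    # Identical neighbors (d = 0) contribute 1 each.
    print(neighborhood_similarity([0.0] * 8, alpha=0.5, n_cells=8))
    # Very different neighbors (d = 1) push the sum negative, so f is 0.
    print(neighborhood_similarity([1.0] * 8, alpha=0.5, n_cells=8))
```

In this assumed form, a smaller α penalizes dissimilar neighbors more heavily, which matches its description as a scale factor for dissimilarity.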

The Basic Algorithm
    ...
          end if
        else
          if (ant w/object) and (empty site) then
            compute f(x) and probability of dropping
            draw random real number R
            if (R ≤ Prob) then
              drop object
            end if
          end if
        end if
        move to randomly selected ant-free adjacent site
      end for
    end for
    Print location of objects
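Only the second half of the listing survives in this transcript. A minimal runnable sketch of the whole loop, assuming a pick-up branch that mirrors the drop branch shown above, identical objects, a toroidal grid, f taken as the fraction of the 8 neighboring cells that hold an object, and Deneubourg-style probabilities. All names and parameter values are hypothetical:

```python
import random


def pick_probability(f, a=0.1):
    return (a / (a + f)) ** 2  # assumed form


def drop_probability(f, b=0.3):
    return (f / (b + f)) ** 2  # assumed form


def local_fraction(grid, size, x, y):
    """Fraction of the 8 surrounding cells (with wrap-around) holding an object."""
    count = 0
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            if (dx, dy) != (0, 0) and grid[(x + dx) % size][(y + dy) % size]:
                count += 1
    return count / 8


def run_basic_ants(size=10, n_objects=20, n_ants=5, steps=2000, seed=0):
    rng = random.Random(seed)
    cells = [(x, y) for x in range(size) for y in range(size)]
    grid = [[False] * size for _ in range(size)]
    for x, y in rng.sample(cells, n_objects):
        grid[x][y] = True
    # Each ant is [x, y, carrying]; ants start on distinct cells.
    ants = [[x, y, False] for x, y in rng.sample(cells, n_ants)]
    for _ in range(steps):
        for ant in ants:
            x, y, carrying = ant
            f = local_fraction(grid, size, x, y)
            if not carrying and grid[x][y]:
                # Assumed pick-up branch (missing from the transcript).
                if rng.random() <= pick_probability(f):
                    grid[x][y] = False
                    ant[2] = True
            elif carrying and not grid[x][y]:
                # Drop branch, as in the surviving listing.
                if rng.random() <= drop_probability(f):
                    grid[x][y] = True
                    ant[2] = False
            # Move to a randomly selected ant-free adjacent site.
            occupied = {(other[0], other[1]) for other in ants}
            free = [((x + dx) % size, (y + dy) % size)
                    for dx in (-1, 0, 1) for dy in (-1, 0, 1)
                    if (dx, dy) != (0, 0)
                    and ((x + dx) % size, (y + dy) % size) not in occupied]
            if free:
                ant[0], ant[1] = rng.choice(free)
    return grid, ants


if __name__ == "__main__":
    grid, _ = run_basic_ants()
    # Print location of objects.
    print([(x, y) for x, row in enumerate(grid) for y, cell in enumerate(row) if cell])
```

Objects are only ever moved, never created or destroyed, so the number of objects on the grid plus the number being carried stays constant throughout a run.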

Improvements-1
The ants were granted "short-term memory": each ant stores its last x locations, and after picking up a data item it proceeds to its remembered locations sequentially. The grid was normalized to enable efficient mining of data sets of various sizes: the grid size, step size, and number of iterations are each set as a function of N, where N is the maximum number of data items to be mined.
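The short-term memory can be sketched as a bounded queue of recent locations. This is only an illustration: the class name is hypothetical, and visiting the most recent location first is an assumption, since the slide says only that locations are visited sequentially.

```python
from collections import deque


class AntMemory:
    """Keeps an ant's last `capacity` locations, discarding the oldest."""

    def __init__(self, capacity):
        self.locations = deque(maxlen=capacity)

    def remember(self, location):
        self.locations.append(location)

    def candidates(self):
        """Remembered locations in visiting order (assumed: most recent first)."""
        return list(reversed(self.locations))


if __name__ == "__main__":
    memory = AntMemory(capacity=3)
    for location in [(0, 0), (1, 1), (2, 2), (3, 3)]:
        memory.remember(location)
    print(memory.candidates())
```

Using `deque(maxlen=...)` means the oldest location is dropped automatically once the memory is full, so no explicit bookkeeping is needed.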

Improvements-2
α determines the percentage of items that are treated as similar. If α is too small, clusters won't be formed. If α is too large, the clusters will combine into one super cluster. Each ant is assigned its own value of α and is allowed to change it as follows: the ant makes a set number of moves (100), during which it counts the number of times F that it has failed to drop a data item. The rate of failure is F/100, and α is adapted according to whether this rate is greater than 0.99 (rate > 0.99) or not (rate ≤ 0.99).
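The two update rules themselves are not preserved in this transcript; only the 0.99 threshold survives. A minimal sketch, assuming that α is raised when the failure rate exceeds the threshold and lowered otherwise, with a hypothetical step of 0.01:

```python
def adapt_alpha(alpha, failed_drops, moves=100, threshold=0.99, step=0.01):
    """Return the ant's new alpha after `moves` moves.

    The direction of the update and the step size are assumptions:
    frequent failed drops suggest alpha is too small, so it is raised.
    """
    rate = failed_drops / moves
    if rate > threshold:
        return alpha + step
    return alpha - step


if __name__ == "__main__":
    print(adapt_alpha(0.5, failed_drops=100))  # every drop failed: alpha goes up
    print(adapt_alpha(0.5, failed_drops=40))   # drops mostly succeed: alpha goes down
```

The assumed direction follows from the slide's own reasoning: an ant that almost never manages to drop is judging too few items as similar, which corresponds to α being too small.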

The Updated Algorithm
    ...
        move_agent to new location
        I = carried_object
        compute f*(x) and prob of drop
        if drop = true then
          while pick = false do
            I = random_select_object
            compute f*(x) and prob of pick
            pick_up_object
          end while
        end if
      end for
    end

Comparing Techniques
[Table: Iris 150 results for K-means and ACA. Rows: Clusters, Rand Index, F-measure, Dunn Index, Variance, Class. Err. A second table, "Best results", lists the same rows, with Clusters = 3.000. Annotations: "Maximize these values", "Minimize this value".]
Iris 150 is a data set from the UCI Machine Learning Repository. K-means is a standard technique for data mining, and is used here to benchmark the performance of the Ant Clustering Algorithm (ACA).
Important note: the ACA does not need to be given the correct number of clusters to proceed, whereas K-means does.
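As an illustration of one of the measures listed in the table, the Rand Index can be computed directly from two labelings by counting the pairs of items on which they agree. This is a generic sketch using the standard definition, not the evaluation code behind the results above; the function name is hypothetical.

```python
from itertools import combinations


def rand_index(labels_true, labels_pred):
    """Fraction of item pairs on which two labelings agree.

    A pair agrees when both labelings put the two items in the same
    cluster, or both put them in different clusters. Needs >= 2 items.
    """
    pairs = list(combinations(range(len(labels_true)), 2))
    agreements = sum(
        (labels_true[i] == labels_true[j]) == (labels_pred[i] == labels_pred[j])
        for i, j in pairs
    )
    return agreements / len(pairs)


if __name__ == "__main__":
    # Same grouping with different cluster names still scores 1.0.
    print(rand_index([0, 0, 1, 1], [1, 1, 0, 0]))
```

Because the measure only looks at pairs, it does not depend on how the clusters are named, and it can compare labelings that have different numbers of clusters.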

Summary
Ant simulation offers a unique technique for data mining, developed from simple ideas about ant behavior. Ant Clustering Algorithms could use improvement, but as they stand they are fairly effective. As our understanding of ant behavior improves, perhaps ACA could be refined into an even more efficient tool.

Just to be Clear…
None of the information presented, including data tables and code, is my personal work. All of the information was found in the paper below.
Boryczka, Urszula. "Ant Colony Metaphor in a New Clustering Algorithm." Control and Cybernetics 39.2 (2010): Print.