Mining Logs Files for Data-Driven System Management (Intelligent Database Systems Lab, National Yunlin University of Science and Technology)

Presentation transcript:

Intelligent Database Systems Lab, 國立雲林科技大學 (National Yunlin University of Science and Technology). Mining Logs Files for Data-Driven System Management. Advisor: Dr. Hsu. Presenter: Wen-Hsiang Hu. Authors: Wei Peng, Tao Li, Sheng Ma. SIGKDD Explorations, 2005, pages 44-51.

Outline: Motivation; Objective; Introduction; Common Categories; Message Categorization; Naïve Bayes Classifier; Modified Naïve Bayes Algorithm; Hidden Markov Model; Experiments; Conclusion; Future Work.

Motivation: A popular approach to system management is based on analyzing system log files. However, existing methods from the data mining and machine learning community have placed less emphasis on some aspects of log files, for example their temporal characteristics.

Objective: We describe our research efforts on mining system log files for automatic management. Automated log data analysis can be performed without much domain knowledge, and its results provide guidance that helps network managers perform their jobs more effectively.

Introduction: We apply text mining techniques to automatically categorize text messages with disparate formats into a set of common categories. Take the start-up category as an example: component A reports "A has started" in its log file, while component B reports "B has begun execution". We improve categorization accuracy by considering the temporal characteristics of log messages, and we use visualization tools to help users understand and interpret patterns in the data.

Common Categories: The messages in the log files are transformed into a set of common categories. We first manually determine a set of categories as the basis for the transformation. The set is based on the CBE (Common Base Event) format established by an IBM initiative [26]. It includes start, stop, dependency, create, connection, report, request, configuration, and other.

Message Categorization: The goal is to assign predefined category labels to unlabeled documents based on the likelihood inferred from a training set of labeled documents. We use Naïve Bayes as our classification approach. It uses the training data to calculate Bayes-optimal estimates of the model parameters, and the most probable class is then assigned to the test data.

Naïve Bayes Classifier: Suppose there are $L$ categories, denoted by $C_1, C_2, \ldots, C_L$. The likelihood of a document $d_i$ can be characterized as a sum of probabilities over all the categories: $P(d_i) = \sum_{j=1}^{L} P(d_i \mid C_j) P(C_j)$. Given a set of training samples $S$, the Naïve Bayes classifier uses $S$ to estimate $P(d_i \mid C_j)$ and $P(C_j)$. To classify a new sample, it uses Bayes' rule to compute the class posterior, $P(C_j \mid d_i) = P(d_i \mid C_j) P(C_j) / \sum_{k=1}^{L} P(d_i \mid C_k) P(C_k)$. The predicted class for the document $d_i$ is then $\arg\max_j P(C_j \mid d_i)$.
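
The classifier on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: messages are split on whitespace into a bag of words, add-one (Laplace) smoothing is used for the word probabilities, and the function names and the training messages (other than the two start-up messages quoted in the Introduction) are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy training set of (message, category) pairs.
TRAINING = [
    ("A has started", "start"),
    ("B has begun execution", "start"),
    ("service stopped", "stop"),
    ("process has stopped", "stop"),
]

def train_naive_bayes(samples, alpha=1.0):
    """Estimate log P(C_j) and log P(word | C_j) with add-alpha smoothing (an assumption)."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for message, category in samples:
        class_counts[category] += 1
        for word in message.lower().split():
            word_counts[category][word] += 1
            vocab.add(word)
    total = sum(class_counts.values())
    log_prior = {c: math.log(n / total) for c, n in class_counts.items()}
    log_like = {}
    for c in class_counts:
        denom = sum(word_counts[c].values()) + alpha * len(vocab)
        log_like[c] = {w: math.log((word_counts[c][w] + alpha) / denom) for w in vocab}
    return log_prior, log_like, vocab

def posterior(message, log_prior, log_like, vocab):
    """Return P(C_j | d_i) for every category via Bayes' rule; unseen words are skipped."""
    scores = {}
    for c in log_prior:
        score = log_prior[c]
        for word in message.lower().split():
            if word in vocab:
                score += log_like[c][word]
        scores[c] = score
    # Normalise in log space so the posteriors sum to one.
    top = max(scores.values())
    z = sum(math.exp(s - top) for s in scores.values())
    return {c: math.exp(s - top) / z for c, s in scores.items()}

def classify(message, log_prior, log_like, vocab):
    """Predicted class is argmax_j P(C_j | d_i)."""
    post = posterior(message, log_prior, log_like, vocab)
    return max(post, key=post.get)

model = train_naive_bayes(TRAINING)
predicted = classify("C has started", *model)
```

With this toy data, `predicted` is the start category, because "has" and "started" are both more likely under start than under stop.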

Incorporating the temporal information: Modified Naïve Bayes algorithm. In many scenarios, the text messages in log files carry timestamps. If a sequence of log messages is considered, the categorization accuracy for each message can be improved. For example, a component usually first starts a process and then stops it at a later time.

Incorporating the temporal information: Modified Naïve Bayes algorithm (cont.). Suppose we are given a sequence of adjacent messages $D = (d_1, d_2, \ldots, d_T)$. Let $Q_i$ be the category label for message $d_i$ (i.e., $Q_i$ is one of $C_1, C_2, \ldots, C_L$). Now we want to classify $d_{i+1}$. Two quantities are weighed against each other: the text classification probability $P(C_j \mid d_{i+1})$ and the state transition probability $P(C_j \mid Q_i)$.

Incorporating the temporal information: Modified Naïve Bayes algorithm (cont.). One example of the effectiveness of modified Naïve Bayes, starting from the text classification probability $P(C_j \mid d_{i+1})$. [Figure: classification with time considered versus without time considered; labels: previous state, current state.]

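Incorporating the temporal information: Modified Naïve Bayes algorithm (cont.). The example continues with the state transition probability $P(C_j \mid Q_i)$ for two candidate transitions, configuration to start and configuration to configuration. The candidates are compared by the product $P(C_j \mid d_{i+1}) \cdot P(C_j \mid Q_i)$. [Figure: numeric comparison of the two products; values not preserved.]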
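
The comparison on this slide can be sketched as follows. The numbers lost from the slide are not reconstructed here: the label sequences and the text posterior below are hypothetical, chosen only to show how the product can overturn the text-only decision. The add-one smoothing of the transition counts and the function names are also assumptions.

```python
from collections import Counter, defaultdict

CATEGORIES = ["configuration", "start", "stop"]

# Hypothetical labelled sequences of adjacent messages (Q_1, Q_2, ...).
LABEL_SEQUENCES = [
    ["configuration", "configuration", "start", "stop"],
    ["configuration", "configuration", "configuration", "start"],
]

def estimate_transitions(label_sequences, categories, alpha=1.0):
    """Estimate P(C_j | Q_i) from counts of adjacent label pairs (add-alpha smoothed)."""
    counts = defaultdict(Counter)
    for seq in label_sequences:
        for prev, cur in zip(seq, seq[1:]):
            counts[prev][cur] += 1
    trans = {}
    for prev in categories:
        total = sum(counts[prev].values()) + alpha * len(categories)
        trans[prev] = {cur: (counts[prev][cur] + alpha) / total for cur in categories}
    return trans

def classify_with_history(text_posterior, prev_label, trans):
    """Score each category by P(C_j | d_{i+1}) * P(C_j | Q_i) and take the argmax."""
    scores = {c: p * trans[prev_label][c] for c, p in text_posterior.items()}
    return max(scores, key=scores.get), scores

trans = estimate_transitions(LABEL_SEQUENCES, CATEGORIES)

# Hypothetical text-only posterior P(C_j | d_{i+1}) for the next message.
text_posterior = {"configuration": 0.45, "start": 0.50, "stop": 0.05}

text_only = max(text_posterior, key=text_posterior.get)
with_history, scores = classify_with_history(text_posterior, "configuration", trans)
```

In this toy setting the text-only choice is start, while the product favours configuration because configuration is more often followed by configuration than by start in the hypothetical training sequences.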

Incorporating the temporal information: Hidden Markov Model. Associated with each of a set of states, $S = \{s_1, \ldots, s_n\}$, is a probability distribution over the symbols in the emission vocabulary $K = \{k_1, \ldots, k_m\}$. There is also a prior state distribution $\pi(s)$. Training data consists of several sequences of observed emissions, one of which would be written $\{o_1, \ldots, o_x\}$. Here the category labels serve as the states and the log messages as the emission vocabulary. The HMM explicitly considers the state transition sequence.

Incorporating the temporal information: Hidden Markov Model (cont.). A single log message may be assigned several competing state labels by text classification. For example, the message "The Windows Installer initiated a system restart to complete or continue the configuration of 'Microsoft XML Parser'." has some probability of being categorized into the configuration state and some probability of being labeled as the stop state. The state transition probability is calculated from the training log data sets. The probability of emitting a message can be estimated as the ratio of the occurrences of that log message to the occurrences of its state in the training data. The Viterbi algorithm is used to find the most probable state sequence that emits the given log messages.
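
A minimal sketch of the Viterbi step, with category labels as states and log messages as emissions. All probabilities below are hypothetical and hand-set for illustration (in the setting described above they would be estimated from labelled training logs), the messages are abbreviated to single tokens, and the function and table names are assumptions.

```python
import math

def log_table(table):
    """Convert a (possibly nested) table of positive probabilities to log space."""
    return {k: log_table(v) if isinstance(v, dict) else math.log(v)
            for k, v in table.items()}

def viterbi(observations, states, log_start, log_trans, log_emit):
    """Return the most probable state sequence for the observed messages."""
    # best[t][s]: best log probability of any state path ending in s at step t.
    best = [{s: log_start[s] + log_emit[s][observations[0]] for s in states}]
    back = []
    for obs in observations[1:]:
        scores, pointers = {}, {}
        for s in states:
            prev = max(states, key=lambda p: best[-1][p] + log_trans[p][s])
            scores[s] = best[-1][prev] + log_trans[prev][s] + log_emit[s][obs]
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Trace the back-pointers from the best final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path

STATES = ["configuration", "start", "stop"]

# Hypothetical parameters; each row sums to one.
START = {"configuration": 0.6, "start": 0.2, "stop": 0.2}
TRANS = {
    "configuration": {"configuration": 0.6, "start": 0.3, "stop": 0.1},
    "start": {"configuration": 0.2, "start": 0.3, "stop": 0.5},
    "stop": {"configuration": 0.2, "start": 0.7, "stop": 0.1},
}
EMIT = {
    "configuration": {"install": 0.55, "restart": 0.35, "started": 0.05, "stopped": 0.05},
    "start": {"install": 0.05, "restart": 0.10, "started": 0.80, "stopped": 0.05},
    "stop": {"install": 0.05, "restart": 0.45, "started": 0.05, "stopped": 0.45},
}

path = viterbi(["install", "restart", "started"], STATES,
               log_table(START), log_table(TRANS), log_table(EMIT))
```

With these hand-set numbers the "restart" message is more likely to be emitted by stop than by configuration when taken alone, yet the decoded path is configuration, configuration, start, because the surrounding messages and the transition probabilities favour staying in configuration.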

Experiments: The training data from the raw log files are labeled with nine categories: configuration, connection, create, dependency, other, report, request, start, and stop. Table 2 lists the keywords and their probabilities in the corresponding classes.

Experiments (cont.)

Experiments (cont.): Figure 1 is the 2D plot of the "raven" data set. The X axis is the time and the Y axis is the state. We observe that the dotted lines in the create and connection categories occur alternately. We note that the connection problems occur after the create problems for at least three days.

Experiments (cont.): A second 2D plot of the "raven" data set, in which the X axis is the time and the Y axis is the component. Component 5 reports problems synchronously whenever there are problems in component 6. After component 0 starts, it tends to keep generating report messages.

Experiments (cont.): A 3D plot helps users understand and interpret complicated patterns.

Conclusion: We propose two approaches, modified Naïve Bayes and HMM, for incorporating temporal information to improve the performance of categorizing log messages.

Future work: Instead of manually determining the set of common categories, we could develop techniques (e.g., clustering) to infer them automatically from historical data. The number of different common categories for system management can be large, so the dependence relationships among categories could be utilized. We could also develop methods that efficiently discover interesting temporal patterns from the transformed log files.