Analyzing System Logs: A New View of What's Important Sivan Sabato Elad Yom-Tov Aviad Tsherniak Saharon Rosset IBM Research SysML07 (Second Workshop on.


1 Analyzing System Logs: A New View of What's Important
Sivan Sabato, Elad Yom-Tov, Aviad Tsherniak, Saharon Rosset (IBM Research)
SysML07 (Second Workshop on Tackling Computer Systems Problems with Machine Learning Techniques)
Presented by Hassan Wassel

2 Introduction
System logs are a critical tool for system administrators, but they are massive in volume, so we need to rank their messages by importance.
Previous work:
• Ranking using expert rules
• Visualization
• Analysis of a single machine's log

3 What is Important?
This paper proposes that an important message is one that appears with a higher probability than expected.
Represent all messages of the same type by a single message type.
Calculate the empirical distribution of each message type's probability and rank messages against it.
Systems are not homogeneous, so a single distribution does not fit all of them.
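The slide does not give the exact scoring formula, so the following is only a minimal sketch of one way to read "more frequent than expected". It assumes each log is summarized as a dictionary of message-type counts, and it scores a message type by the fraction of reference logs in which that type appears fewer times. The names `empirical_score` and `rank_messages` are hypothetical and not taken from the paper.

```python
def empirical_score(count, reference_counts):
    """Fraction of reference logs whose count for this type is below `count`.

    This is an assumed scoring rule (an empirical CDF evaluated just below
    `count`), not necessarily the one used in the paper.
    """
    if not reference_counts:
        return 0.0
    below = sum(1 for c in reference_counts if c < count)
    return below / len(reference_counts)


def rank_messages(log_counts, reference_logs):
    """Return the message types of one log, most surprising first."""
    scores = {}
    for msg_type, count in log_counts.items():
        # A type missing from a reference log counts as zero occurrences.
        ref = [ref_log.get(msg_type, 0) for ref_log in reference_logs]
        scores[msg_type] = empirical_score(count, ref)
    # Sort by descending score; break ties by name for a stable order.
    return sorted(scores, key=lambda t: (-scores[t], t))


reference_logs = [
    {"disk_full": 0, "login": 10},
    {"disk_full": 0, "login": 12},
    {"disk_full": 1, "login": 9},
    {"login": 11},
]
log = {"disk_full": 5, "login": 10}
print(rank_messages(log, reference_logs))
```

In this toy data, five `disk_full` messages exceed the count seen in every reference log, while ten `login` messages are ordinary, so `disk_full` ranks first.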

4 Algorithm
Use K-means clustering to divide the system logs into classes.
Estimate the empirical distribution of each class.
Given a system log, identify its class and rank its messages according to the class's P
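To illustrate why ranking is done per class, here is a rough sketch of the last two steps, assuming the class labels have already been produced by a clustering step and that each log is a dictionary of message-type counts. The scoring rule (fraction of class members with a lower count) and all function names are assumptions made for illustration, not the paper's definitions.

```python
from collections import defaultdict


def group_by_class(logs, labels):
    """Collect the logs that share a class label (labels assumed given)."""
    classes = defaultdict(list)
    for log, label in zip(logs, labels):
        classes[label].append(log)
    return dict(classes)


def score(count, reference_counts):
    """Assumed score: share of class members with a lower count."""
    return sum(1 for c in reference_counts if c < count) / len(reference_counts)


def rank_within_class(new_log, class_logs):
    """Rank the message types of `new_log` against one class of logs."""
    scores = {
        msg_type: score(count, [ref.get(msg_type, 0) for ref in class_logs])
        for msg_type, count in new_log.items()
    }
    return sorted(scores, key=lambda t: (-scores[t], t))


logs = [
    {"http_error": 40, "login": 5},
    {"http_error": 55, "login": 6},
    {"http_error": 1, "login": 5},
    {"http_error": 0, "login": 7},
]
labels = ["web", "web", "db", "db"]  # hypothetical output of clustering
classes = group_by_class(logs, labels)

new_log = {"http_error": 30, "login": 6}
print(rank_within_class(new_log, classes["web"]))
print(rank_within_class(new_log, classes["db"]))
```

The same log is ranked differently depending on the class it is compared against: thirty `http_error` messages are unremarkable among the toy "web" logs but stand out among the "db" logs.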

5 Clustering
K-Means tries to minimize the objective function $J = \sum_{j} \sum_{i} d^2(X_i, Z_j)$
Inputs:
• Number of clusters
• Distance matrix
Outputs:
• Membership matrix
• Objective function value
[Figure: diagram labeled Features, Clusters, Patterns]
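As a concrete reading of the objective J, the following is a small sketch of the standard Lloyd iteration for K-means with squared Euclidean distance. It departs from the slide in two ways that should be treated as assumptions: it takes feature vectors rather than a precomputed distance matrix, and it initializes the centers with the first k points so the result is reproducible.

```python
def sq_dist(x, z):
    """Squared Euclidean distance d^2(x, z)."""
    return sum((a - b) ** 2 for a, b in zip(x, z))


def kmeans(points, k, iters=100):
    """Lloyd's algorithm; returns (labels, centers, J). Requires iters >= 1."""
    centers = [list(p) for p in points[:k]]  # naive deterministic init
    labels = None
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points
        ]
        if new_labels == labels:
            break  # memberships stopped changing
        labels = new_labels
        # Update step: each center becomes the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    # Objective: sum of squared distances from points to their own center.
    J = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return labels, centers, J


points = [(0, 0), (0, 1), (10, 10), (10, 11)]
labels, centers, J = kmeans(points, k=2)
print(labels, centers, J)
```

Here `labels` plays the role of the membership matrix and `J` is the objective function value listed among the outputs.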

6 Dimensionality Problem
The data consisted of 3,000 system logs with 15,000 message types. However, the data is sparse.
Measuring distances over these 15,000 features is computationally intensive.
Solution: dimensionality reduction

7 Feature Construction
Use the Spearman correlation between every pair of system logs:
• $\mathrm{Corr}(x, y) = 1 - \frac{6\,\lVert r_x - r_y \rVert^2}{N(N^2 - 1)}$
This goes from a k × n matrix (k logs by n message types) to a k × k similarity matrix.
Question: How are the rank vectors calculated?
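One possible answer to the question about rank vectors, offered as a sketch rather than the authors' procedure: rank the message types of each log by their counts, giving tied counts their average rank, and then apply the standard Spearman formula with denominator N(N² − 1). The tie handling is an assumption (the simple formula is exact only without ties), and the function names are hypothetical.

```python
def rank_vector(counts):
    """1-based ranks of the entries; tied values share their mean rank."""
    order = sorted(range(len(counts)), key=lambda i: counts[i])
    ranks = [0.0] * len(counts)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of entries tied with position i.
        while j + 1 < len(order) and counts[order[j + 1]] == counts[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for t in range(i, j + 1):
            ranks[order[t]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation of two equal-length count vectors (N >= 2)."""
    n = len(x)
    rx, ry = rank_vector(x), rank_vector(y)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))


def similarity_matrix(logs):
    """k logs over n message types -> k x k matrix of pairwise correlations."""
    return [[spearman(a, b) for b in logs] for a in logs]


logs = [
    [1, 2, 3, 4],  # counts of 4 message types in log 0
    [4, 3, 2, 1],  # log 1: the reverse ordering
    [2, 4, 6, 8],  # log 2: same ordering as log 0
]
for row in similarity_matrix(logs):
    print(row)
```

Each row of the resulting k × k matrix can then serve as the reduced feature vector for one log, replacing its n message-type counts.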

8 Evaluation
Compare Spearman correlation to other feature construction schemes:
• Histogram of pairwise distances
• Maximal mutual information
• Improvement in score

9 Comment
Future work:
• Correlation-based clustering
• Feature extraction + choice of distance measure
• Bi-clustering
• Fuzzy clustering
Evaluation:
• Use of human expertise to evaluate the ranking
• Clustering index

10 Thank you! Pros and Cons!
