Analyzing System Logs: A New View of What's Important Sivan Sabato Elad Yom-Tov Aviad Tsherniak Saharon Rosset IBM Research SysML07 (Second Workshop on.

Analyzing System Logs: A New View of What's Important
Sivan Sabato, Elad Yom-Tov, Aviad Tsherniak, Saharon Rosset
IBM Research
SysML07 (Second Workshop on Tackling Computer Systems Problems with Machine Learning Techniques)
Presented by Hassan Wassel

Introduction
- System logs are a critical tool for system administrators.
- They are massive in volume.
- We need to rank log messages according to importance.
- Previous work:
  - Ranking using expert rules
  - Visualization
  - Analysis of a single machine's log

What is Important?
- This paper proposes that an important message is one that appears with a probability higher than expected.
- Represent all messages of the same type by a single message type.
- Calculate the empirical distribution of the message-type probabilities and rank the message types against it.
- Systems are not homogeneous, so one distribution does not fit every system.
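One possible reading of this ranking idea, as a minimal sketch rather than the authors' exact scoring: each message type in a new log is scored by the fraction of reference logs in which that type has a lower relative frequency. The function name `rank_message_types`, the count-matrix layout, and the use of a strict "less than" comparison are assumptions made for illustration; every log is assumed to contain at least one message.

```python
import numpy as np

def rank_message_types(reference_counts, log_counts):
    """Rank message types of one log against a set of reference logs.

    reference_counts: (num_logs, num_types) message counts (hypothetical layout).
    log_counts: (num_types,) message counts for the log being ranked.
    Returns (order, scores); order lists type indices, most surprising first.
    """
    reference_counts = np.asarray(reference_counts, dtype=float)
    log_counts = np.asarray(log_counts, dtype=float)

    # Convert raw counts to per-log relative frequencies.
    ref_freq = reference_counts / reference_counts.sum(axis=1, keepdims=True)
    log_freq = log_counts / log_counts.sum()

    # Empirical score: fraction of reference logs in which this type
    # is strictly less frequent than in the log being ranked.
    scores = (ref_freq < log_freq).mean(axis=0)

    # Highest score first; a stable sort keeps the original order among ties.
    order = np.argsort(-scores, kind="stable")
    return order, scores


reference = [[10, 1, 0], [9, 2, 0], [11, 1, 0]]
new_log = [10, 1, 5]
order, scores = rank_message_types(reference, new_log)
print(order, scores)
```

In this toy input the third message type never appears in the reference logs but makes up a large share of the new log, so it is ranked first.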

Algorithm
- Use k-means clustering to divide the system logs into classes.
- Estimate the empirical distribution of each class.
- Given a system log, identify its class and rank its messages according to its P

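Clustering
- K-means tries to minimize the objective function $J = \sum_j \sum_i d^2(X_i, Z_j)$, where the inner sum runs over the patterns $X_i$ assigned to cluster $j$ and $Z_j$ is the center of that cluster.
- Inputs:
  - Number of clusters
  - Distance matrix
- Outputs:
  - Membership matrix
  - Objective function value
[Figure labels: Features, Clusters, Patterns]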
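The objective above can be illustrated with a minimal sketch of Lloyd's k-means iteration. It is written under stated assumptions: it takes feature vectors and uses squared Euclidean distance, whereas the slide lists a distance matrix as input, and the function name `kmeans` and its parameters are hypothetical.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's iteration; returns (labels, centers, J)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random.
    centers = X[rng.choice(len(X), size=k, replace=False)]

    for _ in range(n_iter):
        # Squared distance from every pattern to every center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)

        # Move each center to the mean of its members (keep it if empty).
        new_centers = centers.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members):
                new_centers[j] = members.mean(axis=0)

        if np.allclose(new_centers, centers):
            break
        centers = new_centers

    # Final assignment and objective J = sum of squared distances
    # from each pattern to the center of its own cluster.
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    J = d2[np.arange(len(X)), labels].sum()
    return labels, centers, J


points = [[0, 0], [0, 1], [10, 10], [10, 11]]
labels, centers, J = kmeans(points, k=2)
print(labels, J)
```

The returned `labels` play the role of the membership matrix and `J` is the objective function value listed under the outputs.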

Dimensionality Problem
- The data consisted of 3,000 system logs with 15,000 message types.
- However, the data is sparse.
- Measuring distances over these 15,000 features is computationally intensive.
- Solution: dimensionality reduction

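Feature Construction
- Use the Spearman correlation between every two system logs:
  $Corr(x, y) = 1 - \frac{6 \| r_x - r_y \|^2}{N(N^2 - 1)}$, where $r_x$ and $r_y$ are the rank vectors of logs $x$ and $y$ and $N$ is the number of message types.
- This maps a k × n matrix (k logs × n message types) to a k × k similarity matrix.
- Question: how should the rank vectors be calculated?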
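A minimal sketch of this feature construction, using the standard Spearman formula with denominator N(N^2 - 1). The slide leaves open how rank vectors are computed; the sketch assumes average ranks for ties, which is a common convention and not necessarily the authors' choice. With ties the closed-form expression is only an approximation of the rank correlation. The helper names `average_ranks`, `spearman`, and `similarity_matrix` are hypothetical.

```python
import numpy as np

def average_ranks(v):
    """1-based ranks; tied values share the mean of their positions."""
    v = np.asarray(v, dtype=float)
    order = np.argsort(v, kind="stable")
    sorted_v = v[order]
    ranks = np.empty(len(v), dtype=float)
    i = 0
    while i < len(v):
        j = i
        # Extend j to the end of the run of equal values.
        while j + 1 < len(v) and sorted_v[j + 1] == sorted_v[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks

def spearman(x, y):
    """Corr(x, y) = 1 - 6 * ||r_x - r_y||^2 / (N * (N^2 - 1))."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    return 1.0 - 6.0 * np.sum((rx - ry) ** 2) / (n * (n * n - 1))

def similarity_matrix(logs):
    """Map a k x n count matrix to a k x k Spearman similarity matrix."""
    logs = np.asarray(logs, dtype=float)
    k = logs.shape[0]
    S = np.eye(k)
    for a in range(k):
        for b in range(a + 1, k):
            S[a, b] = S[b, a] = spearman(logs[a], logs[b])
    return S


logs = [[1, 2, 3, 4], [10, 20, 30, 40], [4, 3, 2, 1]]
print(similarity_matrix(logs))
```

Each row of the resulting k × k matrix can then serve as a k-dimensional feature vector for a log, replacing the original n message-type counts.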

Evaluation
- Compare Spearman correlation to other feature construction schemes.
  - Histogram of pairwise distances
  - Maximal mutual information
  - Improvement in score

Comment
- Future work
  - Correlation-based clustering
  - Feature extraction + choice of distance measure
  - Bi-clustering
  - Fuzzy clustering
- Evaluation
  - Use of human expertise to evaluate the ranking
  - Clustering index

Thank you! Pros and Cons!
