File Classification in self-* storage systems Michael Mesnier, Eno Thereska, Gregory R. Ganger, Daniel Ellard, Margo Seltzer.


1 File Classification in self-* storage systems Michael Mesnier, Eno Thereska, Gregory R. Ganger, Daniel Ellard, Margo Seltzer

2 Introduction: Self-* infrastructure needs information about users, applications, and policies. This information is not readily provided, and the system cannot depend on users to provide it. So it must be learned.

3 Self-* storage systems: A sub-problem of the self-* infrastructure. Key: get hints from what creators associate with their files (file size, file names, lifetimes). Once intentions are determined, decisions can be made. Results: better file organization and performance.

4 Classifying files: Current practice is rule-of-thumb policy selection, which is generic rather than optimized. Better: distinguish classes of files so that finer-grained policies can be applied. Ideally a class is assigned at file creation, so classes must be determined at creation time. The self-* system must learn this association from 1) traces or 2) the running file system.

5 So, how? Create a model that classifies files based on some of their attributes (name, owner, permissions). Irrelevant attributes must be filtered out, and the classifier must learn the rules to do so from a training set. Inference on new files happens after that.
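To make "classify based on some attributes" concrete, here is a minimal sketch of the creation-time attributes such a model might see. The helper name `creation_attributes` and the choice of extension, owner uid, and mode string as features are illustrative assumptions, not the ABLE implementation.

```python
import os
import stat


def creation_attributes(path: str, uid: int, mode: int) -> dict:
    """Hypothetical helper: gather attributes known when a file is created."""
    base = os.path.basename(path)
    _, ext = os.path.splitext(base)  # the file name is reduced to its extension
    return {
        "ext": ext.lower() or "<none>",  # files like "Makefile" have no extension
        "owner": uid,
        "perms": stat.filemode(mode),  # e.g. "-rw-r--r--"
    }


print(creation_attributes("/home/alice/notes.TXT", 1000, stat.S_IFREG | 0o644))
```

A learned model would then decide which of these fields actually matter; the rest are the irrelevant attributes the slide says must be filtered out.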

6 The right model: The model must be scalable, dynamic, cost-sensitive (aware of mis-prediction cost), and interpretable by humans. Model selected: decision trees.

7 ABLE: Attribute-Based Learning Environment. 1) Obtain traces, 2) build a decision tree, 3) make predictions. The tree is built top down, splitting the sample until the leaves hold files with similar attributes or all attributes are used. After the tree is created, querying begins.
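The top-down construction described above can be sketched as an ID3-style learner. The slide does not name a split criterion; choosing splits by information gain, and the toy "large"/"small" labels below, are assumptions for illustration only. Building stops when a leaf is pure or every attribute has been used, matching the slide.

```python
import math
from collections import Counter


def entropy(labels):
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def best_attribute(rows, labels, attrs):
    """Assumed criterion: pick the attribute with the highest information gain."""
    base = entropy(labels)

    def gain(attr):
        groups = {}
        for row, label in zip(rows, labels):
            groups.setdefault(row[attr], []).append(label)
        remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
        return base - remainder

    return max(attrs, key=gain)


def build_tree(rows, labels, attrs):
    majority = Counter(labels).most_common(1)[0][0]
    # Stop when the leaf is pure or every attribute has been used.
    if len(set(labels)) == 1 or not attrs:
        return majority
    attr = best_attribute(rows, labels, attrs)
    rest = [a for a in attrs if a != attr]
    branches = {}
    for value in {row[attr] for row in rows}:
        idx = [i for i, row in enumerate(rows) if row[attr] == value]
        branches[value] = build_tree([rows[i] for i in idx], [labels[i] for i in idx], rest)
    return {"attr": attr, "branches": branches, "default": majority}


def predict(tree, row):
    # Walk down the tree; unseen attribute values fall back to the node's majority label.
    while isinstance(tree, dict):
        tree = tree["branches"].get(row[tree["attr"]], tree["default"])
    return tree


rows = [
    {"ext": ".log", "owner": "root"},
    {"ext": ".log", "owner": "alice"},
    {"ext": ".c", "owner": "alice"},
    {"ext": ".o", "owner": "alice"},
]
labels = ["large", "large", "small", "small"]
tree = build_tree(rows, labels, ["ext", "owner"])
print(predict(tree, {"ext": ".log", "owner": "bob"}))
```

On this toy sample the extension separates the labels perfectly, so the owner attribute is never used, which is the kind of attribute filtering the earlier slides describe.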

8 Tests: Traces from several systems (DEAS03, EECS03, CAMPUS, LAB) are used to check that the approach is workload-independent. The control is the MODE algorithm, which places all files in a single cluster.
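The MODE control amounts to putting every file in one cluster and always predicting that cluster's most common label. The sketch below shows that baseline and a plain accuracy measure; the function names and sample labels are hypothetical.

```python
from collections import Counter


def mode_predictor(train_labels):
    """MODE baseline: one cluster, so always predict the most common training label."""
    mode = Counter(train_labels).most_common(1)[0][0]
    return lambda row: mode


def accuracy(predict_fn, rows, labels):
    hits = sum(predict_fn(row) == label for row, label in zip(rows, labels))
    return hits / len(labels)


baseline = mode_predictor(["small", "small", "small", "large"])
print(accuracy(baseline, [{}, {}, {}, {}], ["small", "large", "small", "small"]))  # 0.75
```

A learned classifier is only useful to the extent that it beats this number on the same test files.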

9 Results: Prediction results are quite good; accuracies of 90% to 100% are claimed. Clusters of files by attribute are clear. The authors predict that a model's ruleset will converge over time.

10 Benefits of incremental learning: The model is refined dynamically as samples become available. Incremental learners are generally better than one-shot learners, and one-shot learners sometimes perform poorly. The rulesets of incremental learners are also smaller.
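The slides do not say how the incremental learner is built. As a rough illustration of "refine the model as samples become available", here is a count-based stand-in (not an incremental decision tree): it keeps per-key label counts, updates them one sample at a time, and falls back to the overall majority for unseen keys. The class name and its behavior are assumptions.

```python
from collections import Counter, defaultdict


class IncrementalTable:
    """Hypothetical incremental learner: label counts per attribute tuple, updated per sample."""

    def __init__(self, attrs):
        self.attrs = attrs
        self.counts = defaultdict(Counter)
        self.overall = Counter()

    def _key(self, row):
        return tuple(row[a] for a in self.attrs)

    def update(self, row, label):
        # Each new sample refines the model immediately; no retraining pass is needed.
        self.counts[self._key(row)][label] += 1
        self.overall[label] += 1

    def predict(self, row):
        seen = self.counts.get(self._key(row))  # .get does not create empty entries
        source = seen if seen else self.overall
        if not source:
            return None  # nothing learned yet
        return source.most_common(1)[0][0]


t = IncrementalTable(["ext"])
t.update({"ext": ".log"}, "large")
print(t.predict({"ext": ".log"}))  # large
t.update({"ext": ".log"}, "small")
t.update({"ext": ".log"}, "small")
print(t.predict({"ext": ".log"}))  # small
```

The prediction for `.log` files changes as evidence accumulates, which a one-shot learner trained on the first sample alone would never do.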

11 On accuracy: More attributes increase the chance of over-fitting. More rules mean smaller compression ratios, so the compression benefits are lost. Predictive models can make false predictions, which can hurt performance (e.g., files that should be in RAM are placed on disk instead). Solution: cost functions that penalize errors and create a biased tree. System goals will need to be translated into these cost functions.
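One way a cost function can bias a tree is at its leaves: instead of predicting the majority label, predict the label with the lowest expected cost. The sketch below assumes correct predictions cost nothing; the "hot"/"cold" labels and the 10:1 penalty (placing a RAM-worthy file on disk is ten times worse) are hypothetical values, not figures from the talk.

```python
def min_cost_label(leaf_counts, cost):
    """Pick the prediction with the lowest expected cost at a leaf.

    leaf_counts: {actual_label: count}
    cost: {(predicted, actual): penalty}; missing pairs (e.g. correct guesses) cost 0.
    """
    total = sum(leaf_counts.values())

    def expected_cost(pred):
        return sum(n / total * cost.get((pred, actual), 0.0)
                   for actual, n in leaf_counts.items())

    return min(leaf_counts, key=expected_cost)


leaf = {"hot": 3, "cold": 7}
biased = {("cold", "hot"): 10.0, ("hot", "cold"): 1.0}
print(min_cost_label(leaf, biased))  # hot, even though "cold" is the majority
```

Here predicting "cold" has expected cost 0.3 × 10 = 3.0 while predicting "hot" costs 0.7 × 1 = 0.7, so the expensive error is avoided at the price of more cheap ones.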

12 Conclusion: These trees provide prediction accuracies in the 90% range and adapt via incremental learning. Continued work: integration into the self-* infrastructure.

13 Questions?

