Chad A. Williams † Peter C. Nelson Abolfazl (Kouros) Mohammadian University of Illinois at Chicago Department of Computer Science Colloquium July 16th,


1 Attribute Constrained Rules: A new approach for missing traveler data
Chad A. Williams†, Peter C. Nelson, Abolfazl (Kouros) Mohammadian
University of Illinois at Chicago
Department of Computer Science Colloquium, July 16th, 2009
† Supported by the NSF IGERT program under Grant DGE-0549489

2 Abstract
- Addresses the problem of reducing participant burden in travel behavior modeling
- Introduces a new approach that greatly reduces the cost of missing data
- Creates an opportunity to collect less data with limited impact on predictive performance
- Reviews implications for wider applications

3 Outline
- Overview and problem context
- Approach
- Illustrative example
- Experimental evaluation
- Summary and next steps

4 Problem space
Predicting sets within a sequence
- Introduces a technique for handling missing data
- Minimal impact on predictive performance, even with large quantities of data missing
Benefits
- Able to tolerate more problematic data sources
- Reduced data requirements
- Opportunity to reduce communication needs

5 Applications
- Reduce the user burden associated with data collection
  - Travel surveys
  - Applications such as an intelligent travelers assistant (ITA)
- Reduce communication costs
  - Mobile networks
  - Sensor networks

6 Implications for ITA
- Activity attributes: start time, length of activity, location, activity, people involved, when planned, time flexibility, location flexibility, accessibility
- Travel attributes: start time, travel time, mode of travel, trip distance, people involved, when mode planned
- Activities and travel episodes map to a sequence of sets over time
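A minimal sketch of the mapping to a sequence of sets, in Python. The record layout, the attribute names (type, activity, mode, start) and the time_bin cut-offs are hypothetical; the slide only says that activity and travel attributes map to a sequence of sets ordered in time, and the problem statement requires each attribute to take discrete values.

```python
from typing import Dict, List, Tuple

# Hypothetical raw record: (start time in minutes after midnight, attribute -> discrete value)
Record = Tuple[int, Dict[str, str]]


def time_bin(minutes: int) -> str:
    """Hypothetical discretization of a clock time into coarse bins."""
    if minutes < 6 * 60:
        return "night"
    if minutes < 12 * 60:
        return "morning"
    if minutes < 18 * 60:
        return "afternoon"
    return "evening"


def to_sequence(records: List[Record]) -> List[Dict[str, str]]:
    """Order episodes by start time; each episode becomes one set of attribute/value items."""
    sequence = []
    for start, attrs in sorted(records, key=lambda r: r[0]):
        episode = dict(attrs)               # copy so the input records are not modified
        episode["start"] = time_bin(start)  # start time kept only as a discrete value
        sequence.append(episode)
    return sequence


day = [
    (8 * 60 + 20, {"type": "travel", "mode": "car"}),
    (7 * 60, {"type": "activity", "activity": "home"}),
    (9 * 60, {"type": "activity", "activity": "work"}),
]
seq = to_sequence(day)
```

Activity and travel episodes share one sequence here, so the attributes present in a set also indicate which kind of episode it describes.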

7 Problem statement
Partially labeled sequence completion
- Given a database of sequences of sets:
  - The attributes/labels and the possible discrete values within each set are known
  - The value of any attribute within any sequence can be missing
- Given a target set and the surrounding sequence, predict the missing value(s)
Example: {a1, b2, c1}{a2, ?}{a1, b1}
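One way to hold a partially labeled sequence in Python, sketched under assumptions: each set is a dict from attribute to value, and None marks an attribute known to be present whose value is missing. Reading the "?" in the example as the missing value of attribute b follows the ACR form on slide 11; the helper names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# One set: attribute -> value; None means "attribute present, value missing"
ItemSet = Dict[str, Optional[int]]
Sequence = List[ItemSet]


def missing_attributes(itemset: ItemSet) -> List[str]:
    """Attributes of a set whose values must be predicted."""
    return sorted(attr for attr, value in itemset.items() if value is None)


def split_context(seq: Sequence, target: int) -> Tuple[Sequence, ItemSet, Sequence]:
    """Return (sets before the target, the target set, sets after the target)."""
    return seq[:target], seq[target], seq[target + 1:]


# {a1, b2, c1}{a2, ?}{a1, b1}, assuming the unknown is the value of attribute b
example: Sequence = [{"a": 1, "b": 2, "c": 1}, {"a": 2, "b": None}, {"a": 1, "b": 1}]
before, target_set, after = split_context(example, 1)
```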

8 Related work
- Much effort on mining frequent sequences and rules efficiently
- Techniques exist for associative mining with missing values, but they do not extend well to sequences
(Figure: Original DB, VDB X1, VDB X2)

9 Related work (cont.)
- Regular expression rule mining
- Recent work has started to examine tailoring the rule form to benefit specific classes of problems
Example rules:
  {a1, b2, c1}{a2}{a1, b1} → {a1, b2, c1}{a2, b2, c1}{a1, b1}
  {a1, b2, c1}{a2, *}{a1, b1} → {a1, b2, c1}{a2, b2, c1}{a1, b1}

10 Design
- Introduce a new rule form for attribute constrained sequences with missing values
  - Constrained rule similar to a template
  - Focus on attribute presence as well as values
- Allows more accurate confidence and support by using knowledge of the problem

11 Attribute constrained rule form
- Introduce a new rule form for attribute constrained, partially labeled sequences
- Allows more accurate confidence and support by using knowledge of the problem
Target: {a1, b2, c1}{a2, ?}{a1, b1}
Traditional sequential rule form:
  {a1, b2, c1}{a2}{a1, b1} → {a1, b2, c1}{a2, b2}{a1, b1}
  {a1, b2, c1}{a2}{a1, b1} → {a1, b2, c1}{a2, c1}{a1, b1}
Attribute constrained rule (ACR) form:
  {a1, b2, c1}{a2, b:*}{a1, b1} → {a1, b2, c1}{a2, b2}{a1, b1}
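The difference between the two rule forms comes down to how a premise set is matched against a data set. A minimal Python sketch, assuming sets are dicts from attribute to value and that the string "*" stands for b:* (attribute present, any value); element_matches is a hypothetical name.

```python
from typing import Dict, Union

WILDCARD = "*"  # "b:*": attribute b must be present, with any value
Pattern = Dict[str, Union[int, str]]
ItemSet = Dict[str, int]


def element_matches(pattern: Pattern, itemset: ItemSet) -> bool:
    """True if every attribute of the pattern set appears in the data set with a compatible value."""
    for attr, value in pattern.items():
        if attr not in itemset:
            return False  # the constrained attribute must be present
        if value != WILDCARD and itemset[attr] != value:
            return False  # a concrete value must match exactly
    return True


traditional_premise = {"a": 2}         # {a2}
acr_premise = {"a": 2, "b": WILDCARD}  # {a2, b:*}
```

The traditional premise {a2} also matches sets such as {a2, c1}, which is why it supports both traditional rules above; the ACR premise only matches sets in which attribute b is present.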

12 Example: Frequent sequence graph
Example sequence DB:
  {a1}{a2, b2}{b1}
  {a1}{a2, b2}{a2, b1}
  {a1}{b2, c2}{a2}{b1}
  {a1}{a2, c1}{b1}
Frequent patterns [support count]:
  Ø [4]
  {a1} [4], {a2} [4], {b1} [4], {b2} [3]
  {a1}{a2} [4], {a1}{b1} [4], {a1}{b2} [3], {a2, b2} [2], {a2}{b1} [4], {b2}{a2} [2], {b2}{b1} [3]
  {a1}{b2}{b1} [3], {a1}{b2}{a2} [2], {a1}{a2}{b1} [4], {a2, b2}{b1} [2], {a1}{a2, b2} [2]
  {a1}{a2, b2}{b1} [2]
Traditional sequential rules: [support = 2/4, confidence = 2/4]
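The support counts in the graph can be reproduced by testing, for each sequence in the database, whether the pattern occurs as an ordered subsequence of sets. A minimal Python sketch of that counting (not the mining algorithm that builds the graph), assuming sets are dicts from attribute to value; the helper names are hypothetical.

```python
from typing import Dict, List

ItemSet = Dict[str, int]
Sequence = List[ItemSet]


def element_matches(pattern: ItemSet, itemset: ItemSet) -> bool:
    """Traditional containment: every attribute=value item of the pattern set is in the data set."""
    return all(itemset.get(attr) == value for attr, value in pattern.items())


def contains(sequence: Sequence, pattern: Sequence) -> bool:
    """Greedy left-to-right test that the pattern sets match sets of the sequence in order."""
    pos = 0
    for element in pattern:
        while pos < len(sequence) and not element_matches(element, sequence[pos]):
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1  # the next pattern set must match a strictly later set
    return True


def support_count(db: List[Sequence], pattern: Sequence) -> int:
    """Number of sequences in the database that contain the pattern."""
    return sum(contains(seq, pattern) for seq in db)


# Example sequence DB from the slide
DB: List[Sequence] = [
    [{"a": 1}, {"a": 2, "b": 2}, {"b": 1}],
    [{"a": 1}, {"a": 2, "b": 2}, {"a": 2, "b": 1}],
    [{"a": 1}, {"b": 2, "c": 2}, {"a": 2}, {"b": 1}],
    [{"a": 1}, {"a": 2, "c": 1}, {"b": 1}],
]
```

Matching each pattern set to the earliest possible data set is safe because the match for one pattern set does not depend on the others, so the earliest match leaves the most room for the remaining sets.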

13 ACR extended frequent sequence graph
ACR templates added to the tree (node [support count]):
  Root: Ø [4]
  Level 0: {a1} [4], {a2} [4], {b1} [4], {b2} [3], {a:*} [4], {b:*} [4]
  Level 1: {a2, b2} [2], {a2, b:*} [2], {a:*, b2} [2], {a:*, b:*} [2], {b2}{b1} [3], {b2}{b:*} [3], {b:*}{b1} [3]

14 ACR extended frequent sequence graph
ACR templates added to the tree
Additions: O(#freq. patterns * 2^min(freq. pattern len., set length))
Nodes [support count]:
  Root: Ø [4]
  Level 0: {a1} [4], {a2} [4], {b1} [4], {b2} [3], {a:*} [4], {b:*} [4]
  Level 1: {a1}{a2} [4], {a1}{b1} [4], {a1}{b2} [3], {a2, b2} [2], {a2}{b1} [4], {b2}{a2} [2], {b2}{b1} [3], {a:*}{a2} [4], {a:*, b2} [2], {a:*}{b1} [4], {a:*}{b2} [3], {a1}{b:*} [4], {a1}{a:*} [4], {a2, b:*} [2], {a2}{b:*} [4], {b2}{a:*} [2], {b2}{b:*} [3], {b:*}{a2} [2], {b:*}{b1} [3], {a:*, b:*} [2]
  Level 2: {a1}{b2}{b1} [3], {a1}{b2}{a2} [2], {a1}{a2}{b1} [4], {a2, b2}{b1} [2], {a1}{a2, b2} [2], {a1}{b2}{b:*} [3], {a1}{b2}{a:*} [2], {a1}{a2}{b:*} [4], {a2, b2}{b:*} [2], {a1}{a:*}{b1} [4], {a1}{a:*, b2} [2], {a1}{b:*}{a2} [2], {a:*}{a2}{b1} [4], {a1}{b:*}{b1} [3], {a:*}{b2}{a2} [2], {a1}{a2, b:*} [2], {a:*}{b2}{b1} [3], {a1}{a:*, b:*} [2], {a:*, b2}{b1} [2], {a:*}{a2, b2} [2], {a2, b:*}{b1} [2], {a:*, b:*}{b1} [2]
  Level 3: {a1}{a2, b2}{b1} [2], {a1}{a2, b2}{b:*} [2], {a1}{a2, b:*}{b1} [2], {a1}{a:*, b2}{b1} [2], {a1}{a:*, b:*}{b1} [2], {a:*}{a2, b2}{b1} [2]
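The nodes in this graph appear consistent with a simple generation rule: an ACR template generalizes a frequent pattern by replacing a non-empty subset of the values within one set of the pattern by a wildcard (for example, {a1}{a:*, b:*}{b1} is present but {a:*}{b:*} is not). A minimal Python sketch of that generation step under that assumption; acr_templates is a hypothetical name and the string "*" stands for x:*.

```python
from itertools import combinations
from typing import Dict, List, Union

WILDCARD = "*"
ItemSet = Dict[str, Union[int, str]]
Pattern = List[ItemSet]


def acr_templates(pattern: Pattern) -> List[Pattern]:
    """Generalize one set at a time: every non-empty subset of its values becomes a wildcard."""
    templates = []
    for i, element in enumerate(pattern):
        attrs = sorted(element)
        for k in range(1, len(attrs) + 1):
            for chosen in combinations(attrs, k):
                generalized = {a: (WILDCARD if a in chosen else element[a]) for a in attrs}
                # the other sets of the pattern are kept unchanged
                templates.append(pattern[:i] + [generalized] + pattern[i + 1:])
    return templates


full = [{"a": 1}, {"a": 2, "b": 2}, {"b": 1}]  # {a1}{a2, b2}{b1}
for template in acr_templates(full):
    print(template)
```

Under this assumption a pattern with sets s contributes the sum of (2^|s| - 1) templates, so growth is exponential only in the size of an individual set; each template's support can then be counted against the database like any other pattern.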

15 Example
Example rules:
- Traditional sequential rules: [support = 2/4, confidence = 2/4]
- Attribute constrained rules (ACR): [support = 2/4, confidence = 2/2]
Example sequence DB:
  {a1}{a2, b2}{b1}
  {a1}{a2, b2}{a2, b1}
  {a1}{b2, c2}{a2}{b1}
  {a1}{a2, c1}{b1}
Support = (# of occurrences of the full sequence) / (# of sequences)
Confidence = (# of occurrences of the full sequence) / (# of occurrences of the premise)
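The two sets of fractions can be reproduced with wildcard-aware containment. The transcript does not preserve the example rules themselves, so the premises below, {a1}{a2}{b1} for the traditional rule and {a1}{a2, b:*}{b1} for the ACR, are an assumption chosen because they yield the reported support and confidence. The last lines sketch how an ACR could rank candidate values for a missing b; the candidate list (1, 2) is hypothetical. Sets are dicts from attribute to value, and "*" stands for b:*.

```python
from fractions import Fraction
from typing import Dict, List, Tuple, Union

WILDCARD = "*"
ItemSet = Dict[str, Union[int, str]]
Sequence = List[ItemSet]


def element_matches(pattern: ItemSet, itemset: ItemSet) -> bool:
    """The attribute must be present; a concrete (non-wildcard) value must also be equal."""
    return all(a in itemset and (v == WILDCARD or itemset[a] == v) for a, v in pattern.items())


def contains(seq: Sequence, pattern: Sequence) -> bool:
    """Greedy test that the pattern sets match sets of the sequence in order."""
    pos = 0
    for element in pattern:
        while pos < len(seq) and not element_matches(element, seq[pos]):
            pos += 1
        if pos == len(seq):
            return False
        pos += 1
    return True


def rule_metrics(db: List[Sequence], premise: Sequence, full: Sequence) -> Tuple[Fraction, Fraction]:
    """Support = full count / #sequences; confidence = full count / premise count."""
    n_full = sum(contains(seq, full) for seq in db)
    n_premise = sum(contains(seq, premise) for seq in db)
    confidence = Fraction(n_full, n_premise) if n_premise else Fraction(0)
    return Fraction(n_full, len(db)), confidence


DB: List[Sequence] = [
    [{"a": 1}, {"a": 2, "b": 2}, {"b": 1}],
    [{"a": 1}, {"a": 2, "b": 2}, {"a": 2, "b": 1}],
    [{"a": 1}, {"b": 2, "c": 2}, {"a": 2}, {"b": 1}],
    [{"a": 1}, {"a": 2, "c": 1}, {"b": 1}],
]
full = [{"a": 1}, {"a": 2, "b": 2}, {"b": 1}]
traditional = rule_metrics(DB, [{"a": 1}, {"a": 2}, {"b": 1}], full)
acr = rule_metrics(DB, [{"a": 1}, {"a": 2, "b": WILDCARD}, {"b": 1}], full)

# Rank hypothetical candidate values for the missing b in {a1}{a2, b:?}{b1} by ACR confidence
premise = [{"a": 1}, {"a": 2, "b": WILDCARD}, {"b": 1}]
scores = {b: rule_metrics(DB, premise, [{"a": 1}, {"a": 2, "b": b}, {"b": 1}])[1] for b in (1, 2)}
best = max(scores, key=scores.get)
```

Fraction keeps 2/4 and 2/2 exact (normalized to 1/2 and 1). The ACR premise excludes the two sequences whose middle set has no b at all, which is what raises the confidence from 2/4 to 2/2.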

16 Evaluation
Metrics:
  Precision = # true positives / (# true positives + # false positives)
  Recall = # true positives / (# true positives + # false negatives)
  F-measure = (2 * precision * recall) / (precision + recall)
Data: 2001 Atlanta household travel survey
- 21k people, 8k households, 126k places visited, 48 hrs
- 49,695 sets of information, avg. sequence length 7.4 sets
- Focus on 6 attributes (act. type, mode, time, duration)
- Results are averaged over all combinations
Target example: {a1, b2, c1}{a2, ?}{a1, b1}
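The three metrics in Python, as a sketch. How true positives, false positives and false negatives are counted is an assumption here: the predicted and actual attribute=value items of a target set are compared as sets. The function names and example values are hypothetical.

```python
from typing import Set, Tuple

Item = Tuple[str, int]  # (attribute, value)


def confusion_counts(predicted: Set[Item], actual: Set[Item]) -> Tuple[int, int, int]:
    """Return (true positives, false positives, false negatives) for one target set."""
    tp = len(predicted & actual)
    return tp, len(predicted - actual), len(actual - predicted)


def precision_recall_f(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Precision, recall and F-measure, returning 0.0 where a denominator is zero."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f_measure


counts = confusion_counts({("b", 2), ("c", 1)}, {("b", 2)})
p, r, f = precision_recall_f(*counts)
```

Here one of two predicted items is correct and the single true item is recovered, giving precision 0.5, recall 1.0 and an F-measure of 2/3.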

17 Performance vs. percent of values missing
Williams, C. A.; Nelson, P. C. & Mohammadian, A., “Attribute Constrained Rules for Partially Labeled Sequence Completion”, LNCS: Advances in Data Mining - Applications and Theoretical Aspects 5633, 338–352, 2009.

18 Traditional performance vs. # of target values missing

19 ACR performance vs. traditional performance w.r.t. # of targets missing

20 ACR performance vs. traditional performance: Recall

21 ACR performance vs. traditional performance: Precision

22 Summary  Introduced technique for handling missing data  New rule form for entire class of problems Much better predictions when data is missing than traditional methods Represents opportunity to reduce number of data points that need to be collected/communicated Computationally practical OverviewDesignExampleEvaluationSummary

23 Future work
- Study underway to confirm benefits in survey applications
- Adapt the technique for streaming data sources
  - Sensor applications

24 Other related research
- Transfer learning of activity patterns
  - Further data requirement reduction
  - Reduce learning time
- Sequential pattern mining of streams
  - Current model: offline modeling
  - Enable real-time pattern learning for ITA
  - Demo of pattern and location learning
  - Tilted time windows

25 Questions?
IGERT Program in Computational Transportation Science
IGERT: Integrative Graduate Education and Research Traineeship
This research was supported in part by the National Science Foundation IGERT program under Grant DGE-0549489.

26 Additional results
Sequential rules
C. Williams, A. Mohammadian, P. Nelson, and S. Doherty, “Mining Sequential Association Rules for Traveler Context Prediction”, The First International Workshop on Computational Transportation Science (IWCTS’08), Dublin, Ireland, July 2008.
