On the Designing of Popular Packages

Presentation transcript:

On the Designing of Popular Packages Yangjun Chen and Wei Shi Department of Applied Computer Science University of Winnipeg

Outline
Motivation
 - Data mining
 - Most popular packages
Signature trees and modified signature trees
Single package design (SPD)
 - Basic algorithm
 - Heuristic signature tree method
Multiple package design (MPD)
Experiments
Conclusion and Future Work

Motivation
Given a query log recording customers’ preferences on items or activities, design a package that satisfies as many customers as possible.
A query log from a travel agency:
[Table: query log with columns QueryId, Hot Spring, Ride, Glacier, Hiking, Airline, Boating and rows Q1 to Q6; cell values not recoverable.]
Most popular package: Hot Spring, Hiking, Airline. It satisfies three queries: Q1, Q3, Q5.
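To make the problem statement concrete, here is a minimal Python sketch of what "a package satisfies a query" could mean. It assumes each query is a string over '0', '1' and '?' (don't care), that a package is a plain bit string, and that a package satisfies a query when it agrees on every attribute the query cares about. The function names and the sample log are hypothetical; the log is not the travel-agency table from the slide.

```python
from itertools import product

def satisfies(package, query):
    """True if the package agrees with the query on every non-'?' attribute."""
    return all(q == '?' or q == p for p, q in zip(package, query))

def most_popular_package(queries):
    """Score all 2^m bit strings exhaustively; only feasible for small m."""
    m = len(queries[0])
    best, best_hits = None, []
    for bits in product('01', repeat=m):
        package = ''.join(bits)
        hits = [i for i, q in enumerate(queries) if satisfies(package, q)]
        if len(hits) > len(best_hits):
            best, best_hits = package, hits
    return best, best_hits

# Hypothetical query log with four attributes (illustration only).
log = ["1?01", "0110", "10?1", "0?10", "1001"]
print(most_popular_package(log))
```

The exhaustive loop is only a baseline for intuition: it enumerates every candidate package, which is exactly the cost the tree-based methods in the following slides try to avoid.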

Signature Files and Signature Trees
[Figure: a signature file with seven signatures, s1 to s7, and the signature tree built over it.]
Each path represents the identifier of a signature. Identifier(s7): (1, 1)(4, 1)(5, 0)(7, 0)
Y. Chen and Y.B. Chen, On the Signature Tree Construction and Analysis, IEEE Transactions on Knowledge and Data Engineering, Vol. 18, No. 9, 2006, pp. 1207-1224.

Modified signature tree over a query log
Construction of the signature tree for single package design (SPD): Let Q = {q1, …, qn} be a query log over m attributes. We use qi[j] to denote the value of the jth attribute in qi (i = 1, …, n; j = 1, …, m). Starting from the first attribute, we divide all queries in Q into two branches. For each query qi (1 ≤ i ≤ n): if qi[1] = ‘0’, we put qi into the left branch; if qi[1] = ‘1’, we put it into the right branch. However, if qi[1] = ‘?’, we put it into both the left and the right branch, which is quite different from traditional signature tree construction.
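The branching rule can be sketched in a few lines of Python. This is an illustrative reading rather than the authors' implementation: it assumes the same split is repeated on attributes 1, 2, …, m in order, that queries are strings over '0', '1', '?', and that the best package corresponds to the deepest-level node holding the most queries. The dictionary layout and function names are hypothetical.

```python
def build_spd(queries, ids, j=0):
    """Build a node holding the query indices `ids`, splitting on attribute j."""
    node = {"ids": ids, "left": None, "right": None}
    if j == len(queries[0]) or not ids:
        return node  # leaf: all attributes consumed, or no query reaches here
    # '?' queries are copied into BOTH branches.
    left = [i for i in ids if queries[i][j] in "0?"]
    right = [i for i in ids if queries[i][j] in "1?"]
    node["left"] = build_spd(queries, left, j + 1)
    node["right"] = build_spd(queries, right, j + 1)
    return node

def best_leaf(node, path=""):
    """Return (bit path, query indices) of the leaf holding the most queries."""
    if node["left"] is None:
        return path, node["ids"]
    l = best_leaf(node["left"], path + "0")
    r = best_leaf(node["right"], path + "1")
    return l if len(l[1]) >= len(r[1]) else r

# Hypothetical query log (illustration only).
log = ["1?01", "0110", "10?1", "0?10", "1001"]
tree = build_spd(log, list(range(len(log))))
print(best_leaf(tree))
```

Because don't-care queries are duplicated at every level, the tree can grow exponentially in the number of attributes, which is the motivation for the pruning heuristics.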

Single Package Design Tree (SPD)
[Figure: SPD tree with root S0 and nodes S10 to S11, S20 to S23, S30 to S37, S40 to S47, S50 to S59 and S60 to S69.]
S0 = {q1, q2, q3, q4, q5, q6}
S10 = {q3, q4, q5, q6} S11 = {q1, q2, q3, q5, q6}
S20 = {q3, q4, q5} S21 = {q4, q5} S22 = {q1, q2, q3, q5} S23 = {q1, q5}
S30 = {q3, q5} S31 = {q4} S32 = {q6} S33 = {q4, q6} S34 = {q1, q3, q5} S35 = {q2} S36 = {q1, q6} S37 = {q6}
S40 = {q3} S41 = {q4, q5} S42 = {q4, q6} S43 = {q4} S44 = {q1, q5} S45 = {q1, q3, q5} S46 = {q1, q6} S47 = {q1}
S50 = {q3} S51 = {q3, q5} S52 = {q6} S53 = {q4, q6} S54 = {q5} S55 = {q1, q5} S56 = {q5} S57 = {q1, q3, q5} S58 = {q1} S59 = {q1, q6}
S60 = {q3, q5} S61 = {q3} S62 = {q4} S63 = {q4, q6} S64 = {q1, q5} S65 = {q1} S66 = {q1, q3, q5} S67 = {q1, q3} S68 = {q1} S69 = {q1, q6}

Approximate Algorithm with Heuristics
Computational complexities of SPD:
 - time complexity: O(2^{m-1} n)
 - space complexity: O(mn)
where m = number of attributes in Q, n = number of queries in Q.
Approximate algorithm with a general rule to cut off subtrees: If the number of queries in any branch of a subtree is smaller than the number of queries in the candidate result, the subtree should be pruned.
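The cut-off rule can be illustrated with a depth-first search that keeps a running candidate and skips any branch holding fewer queries than that candidate. The sketch below is a hedged illustration, not the authors' algorithm: visiting the larger branch first is an added assumption (so that a good candidate is found early), and all names are hypothetical. Query sets only shrink along a path, so a branch that is already smaller than the candidate cannot lead to a better leaf.

```python
def spd_search(queries):
    """Depth-first SPD search with the size-based cut-off rule."""
    m = len(queries[0])
    best = {"package": None, "ids": []}

    def visit(ids, j, path):
        if j == m:  # every attribute fixed: `path` is a complete package
            if len(ids) > len(best["ids"]):
                best["package"], best["ids"] = path, ids
            return
        left = [i for i in ids if queries[i][j] in "0?"]
        right = [i for i in ids if queries[i][j] in "1?"]
        # Assumption: try the larger branch first to get a strong candidate early.
        branches = sorted((("0", left), ("1", right)), key=lambda b: -len(b[1]))
        for bit, child in branches:
            if not child or len(child) < len(best["ids"]):
                continue  # cut off: fewer queries than the current candidate
            visit(child, j + 1, path + bit)

    visit(list(range(len(queries))), 0, "")
    return best["package"], best["ids"]

# Hypothetical query log (illustration only).
log = ["1?01", "0110", "10?1", "0?10", "1001"]
print(spd_search(log))
```

On this small log the search reaches one complete package and then cuts off every other branch, since each of them holds fewer than three queries.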

Approximate Algorithm with Heuristics
Derived rules:
 - If, for all the queries represented by a node v, the attribute to be checked contains only ‘0’ or ‘?’, the subtree rooted at the right child of v can be pruned.
 - If, for all the queries represented by a node v, the attribute to be checked does not contain ‘0’ and at least one of them contains ‘1’, the subtree rooted at the left child of v can be cut off.
 - If, for all the queries represented by a node v, the attribute to be checked contains only ‘?’, we cut off the subtree rooted at the right child of v. (Notice that we could instead prune the subtree rooted at the left child of v; the result would be the same.)
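The three derived rules reduce to a check on which values occur in the column being tested. A small Python sketch, assuming the column of a non-empty node is passed in as a sequence of '0', '1' and '?' characters (the function name is hypothetical):

```python
def prunable_child(values):
    """Return which child of node v can be cut off: 'right', 'left' or None.

    `values` holds the value of the attribute to be checked for every
    query represented by v.
    """
    seen = set(values)
    if '1' not in seen:
        # Only '0' and/or '?': the right child would hold just the '?'
        # queries, which also sit in the left child (rules 1 and 3).
        return 'right'
    if '0' not in seen:
        # No '0' and at least one '1': the left child is redundant (rule 2).
        return 'left'
    return None  # both '0' and '1' occur: keep both children

print(prunable_child("0?0"), prunable_child("1?1"),
      prunable_child("???"), prunable_child("01?"))
```

In each prunable case the discarded child holds a subset of the queries in the surviving child, so no better package can be lost by cutting it off.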

Approximate Algorithm with Heuristics
Choose the attribute with the minimum number of ‘?’ values, to minimize the number of queries that must be placed in both branches because of don’t-care values. If more than one column contains the same number of ‘?’ values, we count the number of 1s and the number of 0s in each of them, and select the column in which these two counts are closest to each other, to keep the tree balanced.
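This selection heuristic amounts to a two-part sort key: fewest '?' values first, then the smallest gap between the counts of 1s and 0s. A minimal sketch, assuming string-encoded queries and 0-based column indices (names hypothetical; ties beyond the two criteria fall to the first candidate, which is an arbitrary choice made here):

```python
def choose_attribute(queries, candidates):
    """Pick the next attribute (column index) to split on."""
    def key(j):
        column = [q[j] for q in queries]
        dont_care = column.count('?')
        imbalance = abs(column.count('1') - column.count('0'))
        return (dont_care, imbalance)
    return min(candidates, key=key)

# Column 1 is the only one without '?', so it is chosen.
print(choose_attribute(["10?", "110", "011", "?00"], range(3)))
# No '?' anywhere; column 2 has two 1s and two 0s, so it is the most balanced.
print(choose_attribute(["100", "110", "111", "011"], range(3)))
```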

Approximate Algorithm with Heuristics
Example:
[Figure: Steps 1 to 3 of the tree construction, with nodes labeled "S0, 3", "S10, 2", "S11, 1", "S20, 5", S21, S22, S23 and S31.]
S0 = {q1, q2, q3, q4, q5, q6}
S10 = {q1, q3, q5, q6} S11 = {q2, q4, q6}
S20 = {q1, q3, q5} S21 = {q1, q5} S22 = {q4, q6} S23 = {q2, q6}
S31 = {q1, q3, q5}

Approximate Algorithm with Heuristics
Example:
[Figure: Steps 4 to 6 of the tree construction, extending the tree with nodes labeled "S31, 1", "S41, 4", "S51, 6" and S60.]
S41 = {q1, q3, q5}
S51 = {q1, q3, q5}
S60 = {q1, q3, q5}

Multiple Package Design Tree (MPD)
Algorithm 5. MPD based on modified signature trees
Input: a set of queries Q.
Output: a set of packages P satisfying all queries.
begin
  P ← ∅;
  while (Q ≠ ∅) {
    create the root node v;
    {P', Q'} ← ConstructSPD(v, Q, 1);
    Q ← Q \ Q';
    P ← P ∪ P';
  }
end
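The loop above is a greedy cover: find the best single package, remove the queries it satisfies, and repeat until the log is empty. The Python sketch below mirrors that loop under stated assumptions. `best_package` is a brute-force stand-in for ConstructSPD, whose internals the slide does not show, and all names and the sample log are hypothetical.

```python
from itertools import product

def satisfies(package, query):
    """True if the package agrees with the query on every non-'?' attribute."""
    return all(q == '?' or q == p for p, q in zip(package, query))

def best_package(queries):
    """Brute-force stand-in for ConstructSPD (assumption, small m only)."""
    m = len(queries[0])
    candidates = (''.join(bits) for bits in product('01', repeat=m))
    return max(candidates, key=lambda p: sum(satisfies(p, q) for q in queries))

def design_packages(queries):
    """Greedy MPD loop: P grows by one package per round, Q shrinks."""
    remaining = list(queries)   # Q
    packages = []               # P
    while remaining:
        p = best_package(remaining)
        packages.append(p)
        # Q <- Q \ Q', where Q' is the set of queries satisfied by p.
        remaining = [q for q in remaining if not satisfies(p, q)]
    return packages

# Hypothetical query log (illustration only).
log = ["1?01", "0110", "10?1", "0?10", "1001"]
print(design_packages(log))
```

The loop always terminates because the best package for a non-empty log satisfies at least one query, so the remaining set strictly shrinks in every round.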

Experiments
 - Signature tree for SPD: It works in two steps. In the first step, we construct a signature-tree-like structure, called an SPD-tree. In the second step, we search the SPD-tree to find the most popular package.
 - Heuristic signature tree for SPD: The basic algorithm can be dramatically improved by integrating the SPD-tree construction and the SPD-tree search into a single process. Doing so improves both response time and package quality.
 - Heuristic SPD: This algorithm was proposed by Miah [6]. It is in fact an algorithm for finding an approximate solution to an NP-complete problem, the so-called MINSAT problem: given a set U of Boolean variables and a collection of disjunctive clauses over U, find a truth assignment that minimizes the number of satisfied clauses. In [6], this algorithm is referred to as MINSAT HeuristicPD.

Experiments
All the experiments were performed on a Sony notebook with a 2.53 GHz Intel Core i3 CPU, a 300 GB hard disk and 8.0 GB of memory. The code is written in C++ and was run on 32-bit Windows 7 Professional.
Real data: the favourite dishes of 100 customers of a Chinese restaurant, surveyed during a large party. The survey was designed with 10 attributes such as lemon chicken, ginger beef, honey garlic shrimp, broccoli with seafood and so on. The customers responded “yes”, “no”, or “don’t care” to each attribute to state their preferences.
Synthetic data: 10000 queries with up to 30 attributes. Each query is represented by a string in which each position is ‘0’, ‘1’, or ‘?’, evenly populated. We may increase the number of ‘?’ values to obtain different experimental results.
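A synthetic log of this shape is easy to generate. The sketch below is one possible generator, not the one used in the experiments (which were written in C++): the `p_dont_care` parameter, the fixed seed and the function name are assumptions introduced here. With the default of 1/3 the three symbols are equally likely; raising it increases the share of '?' values.

```python
import random

def synthetic_log(n_queries, n_attrs, p_dont_care=1/3, seed=0):
    """Generate query strings over '0', '1', '?'.

    p_dont_care is the probability of '?' at each position; the rest of
    the probability mass is split evenly between '0' and '1'.
    """
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    p_bit = (1 - p_dont_care) / 2
    weights = [p_bit, p_bit, p_dont_care]
    return [''.join(rng.choices('01?', weights=weights, k=n_attrs))
            for _ in range(n_queries)]

log = synthetic_log(10000, 30)
print(len(log), len(log[0]))
```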

Experiments (SPD on real data) Test results on real data sets for SPD

Experiments (SPD on synthetic data) Test results for varying attributes on SPD

Experiments (SPD on synthetic data) Test results for varying query log size on SPD

Experiments (MPD on real data) Test results on real data sets for MPD

Experiments (MPD on synthetic data) Test results on varying attributes for MPD

Experiments (MPD on synthetic data) Test results on varying query log sizes for MPD

Conclusion and Future Work
Main contribution
 - Signature tree based method for SPD
 - Approximate algorithm for SPD
 - Approximate algorithm for MPD
 - Extensive tests
Future work
 - Theoretical analysis of the ratio of the approximate solutions to the optimal solution

Thank you!