On the Designing of Popular Packages

Yangjun Chen and Wei Shi, Department of Applied Computer Science, University of Winnipeg

Outline
- Motivation: data mining, most popular packages
- Signature trees and modified signature trees
- Single package design (SPD): basic algorithm, heuristic signature tree method
- Multiple package design (MPD)
- Experiments
- Conclusion and future work

Motivation
Given a query log recording customers' preferences on items or activities, design a package that satisfies as many customers as possible.
A query log from a travel agency:
[Table: queries Q1 to Q6 over the attributes Hot Spring, Ride, Glacier, Hiking, Airline, Boating; each cell is 1, 0, or ?]
Most popular package: Hot Spring, Hiking, Airline. It satisfies three queries: Q1, Q3, Q5.

Signature Files and Signature Trees
[Figure: a signature file with seven signatures s1 to s7 and the corresponding signature tree; node labels 1, 2, 4, 5, 7; leaves s1, s6, s2, s7, s3, s4, s5]
Each path represents the identifier of a signature. Identifier(s7): (1, 1)(4, 1)(5, 0)(7, 0)
Y. Chen and Y.B. Chen, On the Signature Tree Construction and Analysis, IEEE Transactions on Knowledge and Data Engineering, Vol. 18, No. 9, 2006, pp. 1207-1224.

Modified signature tree over a query log
Construction of the single package design (SPD) signature tree: Let Q = {q1, …, qm} be a query log. We use qi[j] to denote the value of the jth attribute in qi (i = 1, …, m). Starting from the first attribute, we divide all queries in Q into two branches. For a query qi (1 ≤ i ≤ m), if qi[1] = ‘0’, we put qi into the left branch. If qi[1] = ‘1’, we put it into the right branch. If qi[1] = ‘?’, however, we put it into both the left and the right branch, which is quite different from the construction of a traditional signature tree.
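The splitting rule above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: it assumes each query is a string over '0', '1', '?', that attributes are processed in their given order at every level, and the names `Node` and `build_spd_tree` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    queries: List[int]               # indices of the queries that reach this node
    left: Optional["Node"] = None    # branch where the attribute is fixed to '0'
    right: Optional["Node"] = None   # branch where the attribute is fixed to '1'


def build_spd_tree(log, ids=None, attr=0):
    """Recursively split the query log on attribute `attr` (assumed order: 0, 1, 2, ...)."""
    if ids is None:
        ids = list(range(len(log)))
    node = Node(ids)
    # Stop when no query reaches this node or all attributes are consumed.
    if not ids or attr == len(log[0]):
        return node
    # A '?' query is copied into BOTH branches; this is the difference
    # from a traditional signature tree.
    left_ids = [i for i in ids if log[i][attr] in "0?"]
    right_ids = [i for i in ids if log[i][attr] in "1?"]
    node.left = build_spd_tree(log, left_ids, attr + 1)
    node.right = build_spd_tree(log, right_ids, attr + 1)
    return node
```

Because a '?' query is duplicated at every level where it is undetermined, the number of nodes can grow exponentially with the number of attributes.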

Single Package Design Tree (SPD)
[Figure: SPD-tree for the query log q1 to q6] Query sets at the tree nodes:
S0 = {q1, q2, q3, q4, q5, q6}
S10 = {q3, q4, q5, q6}, S11 = {q1, q2, q3, q5, q6}
S20 = {q3, q4, q5}, S21 = {q4, q5}, S22 = {q1, q2, q3, q5}, S23 = {q1, q5}
S30 = {q3, q5}, S31 = {q4}, S32 = {q6}, S33 = {q4, q6}, S34 = {q1, q3, q5}, S35 = {q2}, S36 = {q1, q6}, S37 = {q6}
S40 = {q3}, S41 = {q4, q5}, S42 = {q4, q6}, S43 = {q4}, S44 = {q1, q5}, S45 = {q1, q3, q5}, S46 = {q1, q6}, S47 = {q1}
S50 = {q3}, S51 = {q3, q5}, S52 = {q6}, S53 = {q4, q6}, S54 = {q5}, S55 = {q1, q5}, S56 = {q5}, S57 = {q1, q3, q5}, S58 = {q1}, S59 = {q1, q6}
S60 = {q3, q5}, S61 = {q3}, S62 = {q4}, S63 = {q4, q6}, S64 = {q1, q5}, S65 = {q1}, S66 = {q1, q3, q5}, S67 = {q1, q3}, S68 = {q1}, S69 = {q1, q6}

Approximate Algorithm with Heuristics
Computational complexities of SPD: time complexity O(2^(m-1) · n), space complexity O(mn), where m = number of attributes in Q and n = number of queries in Q.
Approximate algorithm with a general rule to cut off subtrees: if the number of queries in any branch of a subtree is smaller than the number of queries in the candidate result, the subtree should be pruned.
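The general pruning rule can be illustrated with a depth-first search that builds and searches the tree in one pass. The sketch below is an assumption-laden illustration, not the authors' code: it assumes a package is a full 0/1 assignment, that a query is satisfied when every non-'?' position matches the package, and that attributes are visited in their given order. The function name `most_popular_package` is hypothetical.

```python
def most_popular_package(log):
    """Return (package, satisfied query indices) for a log of '0'/'1'/'?' strings."""
    n_attrs = len(log[0])
    best = {"package": None, "queries": []}   # current candidate result

    def search(ids, attr, prefix):
        # General rule: a branch holding fewer queries than the candidate
        # cannot improve on it, because query sets only shrink along a path.
        if not ids or len(ids) < len(best["queries"]):
            return
        if attr == n_attrs:
            if len(ids) > len(best["queries"]):
                best["package"], best["queries"] = prefix, ids
            return
        for bit in "10":
            # '?' queries follow both branches.
            kept = [i for i in ids if log[i][attr] in (bit, "?")]
            search(kept, attr + 1, prefix + bit)

    search(list(range(len(log))), 0, "")
    return best["package"], best["queries"]
```

For example, on the log `["1?0", "10?", "0?1"]` the search settles on package `100`, which satisfies the first two queries, and never descends into the branch holding only the third.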

Approximate Algorithm with Heuristics
Derived rules:
1. If, for all the queries represented by a node v, the attribute to be checked contains only ‘0’ or ‘?’, the subtree rooted at the right child of v can be pruned.
2. If, for all the queries represented by a node v, the attribute to be checked does not contain ‘0’ and at least one of them contains ‘1’, the subtree rooted at the left child of v can be cut off.
3. If, for all the queries represented by a node v, the attribute to be checked contains only ‘?’, we cut off the subtree rooted at the right child of v. (Notice that we could instead prune the subtree rooted at the left child of v; the result would be the same.)
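The three derived rules reduce to a small decision on the column of values seen at a node. A minimal sketch, assuming the column is given as a string over '0', '1', '?' (the helper name `branches_to_expand` is hypothetical):

```python
def branches_to_expand(column):
    """Given the values of the checked attribute for all queries at a node,
    return which child branches ('0' = left, '1' = right) are worth expanding."""
    has_zero = "0" in column
    has_one = "1" in column
    if not has_one:
        # Only '0' or '?' (rule 1), or only '?' (rule 3): prune the right child.
        return ["0"]
    if not has_zero:
        # No '0' and at least one '1' (rule 2): prune the left child.
        return ["1"]
    # Both '0' and '1' occur: neither child can be pruned by these rules.
    return ["0", "1"]
```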

Approximate Algorithm with Heuristics
Choose the attribute with the minimum number of ‘?’ values, to minimize the branching caused by don’t-care values. If more than one column contains the same number of ‘?’ values, we also count the number of 1s and the number of 0s in each of them, and select the column in which these two counts are closest to each other, to keep the tree balanced.
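This selection heuristic can be written as a single ordering key. The sketch below assumes queries are strings over '0', '1', '?'; the function name `choose_attribute` and its parameters are hypothetical, and ties that survive both criteria are broken by the order of `remaining`, which the slides do not specify.

```python
def choose_attribute(log, ids, remaining):
    """Pick the next attribute to check for the queries `ids`.

    log       -- list of query strings over '0', '1', '?'
    ids       -- indices of the queries at the current node
    remaining -- attribute positions not yet checked on this path
    """
    def key(attr):
        column = [log[i][attr] for i in ids]
        dont_care = column.count("?")
        imbalance = abs(column.count("1") - column.count("0"))
        # Fewest '?' first; among ties, the most balanced 0/1 split.
        return (dont_care, imbalance)

    return min(remaining, key=key)
```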

Approximate Algorithm with Heuristics
Example, Steps 1 to 3. [Figure: partial trees after each step; node labels shown: S0, 3; S10, 2; S11, 1; S20, 5]
S0 = {q1, q2, q3, q4, q5, q6}
S10 = {q1, q3, q5, q6}, S11 = {q2, q4, q6}
S20 = {q1, q3, q5}, S21 = {q1, q5}, S22 = {q4, q6}, S23 = {q2, q6}
S31 = {q1, q3, q5}

Approximate Algorithm with Heuristics
Example, Steps 4 to 6. [Figure: partial trees after each step; node labels shown: S0, 3; S10, 2; S11, 1; S20, 5; S31, 1; S41, 4; S51, 6]
S41 = {q1, q3, q5}
S51 = {q1, q3, q5}
S60 = {q1, q3, q5}

Multiple Package Design Tree (MPD)
Algorithm 5: MPD based on modified signature trees
Input: a set of queries Q.
Output: a set of packages P satisfying all queries.
begin
  P ← ∅;
  while (Q ≠ ∅) {
    create the root node v;
    {P′, Q′} ← ConstructSPD(v, Q, 1);
    Q ← Q \ Q′;
    P ← P ∪ P′;
  }
end
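The loop above is a greedy cover: design the single most popular package, remove the queries it satisfies, and repeat. The following Python sketch shows that loop under stated assumptions. `ConstructSPD` is not reproduced; a brute-force stand-in, `best_single_package`, enumerates all 0/1 packages and is only practical for a handful of attributes. All function names here are hypothetical, and the satisfaction test (every non-'?' position must match) is an assumed reading of the slides.

```python
from itertools import product


def satisfied(package, query):
    """A query is satisfied when every non-'?' position matches the package."""
    return all(q == "?" or q == p for p, q in zip(package, query))


def best_single_package(queries):
    """Brute-force stand-in for ConstructSPD: try every 0/1 package."""
    n_attrs = len(queries[0])
    best_pkg, best_hits = None, []
    for bits in product("01", repeat=n_attrs):
        pkg = "".join(bits)
        hits = [q for q in queries if satisfied(pkg, q)]
        if len(hits) > len(best_hits):
            best_pkg, best_hits = pkg, hits
    return best_pkg, best_hits


def design_packages(queries):
    """Greedy MPD loop: P grows, Q shrinks, until every query is covered."""
    remaining = list(queries)          # Q
    packages = []                      # P
    while remaining:
        pkg, _ = best_single_package(remaining)
        packages.append(pkg)           # P <- P U P'
        remaining = [q for q in remaining if not satisfied(pkg, q)]  # Q <- Q \ Q'
    return packages
```

The loop always terminates, since any query is satisfied by at least one package (replace each '?' with an arbitrary bit), so every iteration removes at least one query.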

Experiments
Three methods are compared:
- Signature tree for SPD: It works in two steps. In the first step, we construct a signature-tree-like structure, called an SPD-tree. In the second step, we search the SPD-tree to find the most popular package.
- Heuristic signature tree for SPD: The basic algorithm can be dramatically improved by integrating the SPD-tree construction and the SPD-tree search into a single process. Doing so improves both response time and package quality.
- Heuristic SPD: This algorithm was proposed by Miah [6]. It is in fact an algorithm for finding an approximate solution to an NP-complete problem, the so-called MINSAT problem: given a set U of Boolean variables and a collection of disjunctive clauses over U, find a truth assignment that minimizes the number of satisfied clauses. In [6], this algorithm is referred to as MINSAT HeuristicPD.

Experiments
All the experiments were performed on a Sony notebook with a 2.53 GHz Intel Core i3 CPU, a 300 GB hard disk, and 8.0 GB of memory. The code is written in C++ and runs on 32-bit Windows 7 Professional.
Real data: the favourites of 100 customers at a Chinese restaurant, surveyed during a large party. The survey was designed with 10 attributes, such as lemon chicken, ginger beef, honey garlic shrimp, and broccoli with seafood. The customers answered “yes”, “no”, or “don’t care” for each attribute to state their preferences.
Synthetic data: 10000 queries with up to 30 attributes. Each query is represented by a string in which each position is ‘0’, ‘1’, or ‘?’, evenly populated. The number of ‘?’ values can be increased to obtain different experimental results.
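A synthetic log of this shape is easy to generate. The sketch below is one possible generator, not the one used in the experiments: the function name, the `p_dont_care` knob for raising the share of '?' values, and the fixed seed are all assumptions made for illustration.

```python
import random


def synthetic_log(n_queries=10000, n_attrs=30, p_dont_care=1 / 3, seed=0):
    """Generate query strings over '0', '1', '?'.

    With p_dont_care = 1/3 the three symbols are evenly populated;
    a larger value increases the share of '?' positions.
    """
    rng = random.Random(seed)
    p_bit = (1 - p_dont_care) / 2          # '0' and '1' share the rest equally
    weights = [p_bit, p_bit, p_dont_care]
    return [
        "".join(rng.choices("01?", weights=weights, k=n_attrs))
        for _ in range(n_queries)
    ]
```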

Experiments (SPD on real data) Test results on real data sets for SPD

Experiments (SPD on synthetic data) Test results for varying attributes on SPD

Experiments (SPD on synthetic data) Test results for varying query log size on SPD

Experiments (MPD on real data) Test results on real data sets for MPD

Experiments (MPD on synthetic data) Test results on varying attributes for MPD

Experiments (MPD on synthetic data) Test results on varying query log sizes for MPD

Conclusion and Future Work
Main contributions:
- Signature-tree-based method for SPD
- Approximate algorithm for SPD
- Approximate algorithm for MPD
- Extensive tests
Future work:
- Theoretical analysis of the ratio of the approximate solutions to the optimal solution

Thank you!