Efficient summarization framework for multi-attribute uncertain data. Jie Xu, Dmitri V. Kalashnikov, Sharad Mehrotra




1 Efficient summarization framework for multi-attribute uncertain data. Jie Xu, Dmitri V. Kalashnikov, Sharad Mehrotra

2 The Summarization Problem. An uncertain data set is a collection of objects o1, o2, ..., on, each carrying several attributes: location (e.g. LA), faces (e.g. Jeff, Kate), and visual concepts (e.g. water, plant, sky). A summary can be extractive, a selected subset of objects such as {o1, o8, o11, o25}, or abstractive, a generated description such as "Kate and Jeff's wedding at LA".

3 Modeling the Summarization Process. What information does each object (e.g. an image) contain? Summarization extracts the best subset of the dataset as the summary. Which metrics should guide the selection? Coverage (Agrawal, WSDM'09; Li, WWW'09; Liu, SDM'09; Sinha, WWW'11), Diversity (Vee, ICDE'08; Ziegler, WWW'05), and Quality (Sinha, WWW'11).

4 Existing Techniques. Image summarization: Kennedy et al. WWW'08, Simon et al. ICCV'07, Sinha et al. WWW'11. Customer-review summarization: Hu et al. KDD'04, Ly et al. CoRR'11. Document/micro-blog summarization: Inouye et al. SocialCom'11, Li et al. WWW'09, Liu et al. SDM'09. These techniques do not consider information in multiple attributes and do not deal with uncertain data.

5 Challenges. Design a summarization framework for (1) multi-attribute data, with attributes such as visual concepts, face tags, location, time, and event; and (2) uncertain/probabilistic data, where data processing (e.g. vision analysis) yields probabilistic attribute values such as P(sky) = 0.7, P(people) = 0.9.

6 Limitations of existing techniques, part 1. Existing techniques typically model and summarize a single information dimension: they summarize only information about visual content (Kennedy et al. WWW'08, Simon et al. ICCV'07) or only information about review content (Hu et al. KDD'04, Ly et al. CoRR'11).

7 What information is in the image? Elemental IUs (information units) are single attribute values: {sky}, {plant}, ..., {Kate}, {Jeff}, {wedding}, {12/01/2012}, {Los Angeles}. Is that all? Intra-attribute IUs combine values within one attribute: {Kate, Jeff}, {sky, plant}, ... Is there even more information in the attributes? Inter-attribute IUs combine values across attributes: {Kate, LA}, {Kate, Jeff, wedding}, ...
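The three IU levels on this slide can be illustrated with a small sketch. The object, its attribute values, and the enumeration below are hypothetical examples, and inter-attribute IUs are restricted to pairs for brevity (the paper's exact IU definitions may differ):

```python
from itertools import combinations

# One object's attribute values (hypothetical example data).
obj = {
    "visual": {"sky", "plant"},
    "face": {"Kate", "Jeff"},
    "event": {"wedding"},
    "location": {"Los Angeles"},
}

# Elemental IUs: single attribute values.
elemental = [(a, frozenset({v})) for a, vals in obj.items() for v in vals]

# Intra-attribute IUs: combinations of 2+ values within one attribute.
intra = [(a, frozenset(c))
         for a, vals in obj.items()
         for r in range(2, len(vals) + 1)
         for c in combinations(sorted(vals), r)]

# Inter-attribute IUs: pairs of values drawn from two different attributes.
inter = [frozenset({v1, v2})
         for (a1, vs1), (a2, vs2) in combinations(obj.items(), 2)
         for v1 in vs1 for v2 in vs2]

print(len(elemental), len(intra), len(inter))  # 6 2 13
```

Even this tiny object already yields many more intra- and inter-attribute IUs than elemental ones, which motivates the next slide's question of which IUs are worth covering.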

8 Are all information units interesting? Is {Sharad, Mike} an interesting intra-attribute IU? Yes: they often have coffee together and appear frequently in other photos. Is {Liyan, Ling} interesting? Yes from my perspective, because they are both my close friends. But are all of the 2^n combinations of people interesting? Should we select a summary that covers all this information? Probably not: I don't care about person X and person Y who happen to be together in a photo of a large group.

9 Mine for interesting information units. Treat each object's face set as a transaction: T1 = {Jeff, Kate}, T2 = {Tom}, T3 = {Jeff, Kate, Tom}, T4 = {Kate, Tom}, T5 = {Jeff, Kate}, ..., Tn = {Jeff, Kate}. A modified item-set mining algorithm finds frequent, correlated sets such as {Jeff, Kate}.
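The mining step on this slide can be sketched as a simple frequent-pair count over the face attribute. The transactions and threshold are illustrative, and this plain support count stands in for the paper's modified (correlation-aware) item-set mining algorithm:

```python
from itertools import combinations
from collections import Counter

# Each object's face set is treated as one transaction.
transactions = [
    {"Jeff", "Kate"}, {"Tom"}, {"Jeff", "Kate", "Tom"},
    {"Kate", "Tom"}, {"Jeff", "Kate"}, {"Jeff", "Kate"},
]

min_support = 3  # illustrative threshold

# Count how many transactions contain each unordered pair of people.
pair_counts = Counter()
for t in transactions:
    for pair in combinations(sorted(t), 2):
        pair_counts[pair] += 1

frequent = {p for p, c in pair_counts.items() if c >= min_support}
print(frequent)  # {('Jeff', 'Kate')}
```

Here {Jeff, Kate} co-occurs in four of six transactions and survives the threshold, while {Kate, Tom} and {Jeff, Tom} do not, matching the slide's intuition that only genuinely correlated groups become intra-attribute IUs.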

10 Mine for interesting information units. Interesting IUs can also be mined from social context (e.g. Jeff is a friend of Kate; Tom is a close friend of the user), yielding units such as {Jeff, Kate} and {Tom}.

11 Limitation of existing techniques, part 2: they cannot handle probabilistic attributes. With uncertain data (e.g. P(Jeff) = 0.8 in one object but P(Jeff) = 0.6 in another), we are not sure whether an object covers an IU that appears in another object.

12 Deterministic Coverage Model: Example. With deterministic attributes, coverage is the fraction of the dataset's information units that also appear in the summary, e.g. Coverage = 8 / 14.
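The deterministic ratio on this slide (8/14 in the example) is just set containment over information units; a minimal sketch with made-up IU sets:

```python
def coverage(summary_ius, dataset_ius):
    """Fraction of the dataset's information units covered by the summary."""
    covered = dataset_ius & summary_ius
    return len(covered) / len(dataset_ius)

# Hypothetical IU sets (frozensets stand in for information units).
dataset_ius = {frozenset({x}) for x in "abcdefghijklmn"}  # 14 IUs in the dataset
summary_ius = {frozenset({x}) for x in "abcdefgh"}        # summary covers 8 of them
print(coverage(summary_ius, dataset_ius))  # 8/14 ≈ 0.571
```

This model only works when IU membership is certain; the next slide replaces the hard intersection with an expectation to handle probabilistic attributes.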

13 Probabilistic Coverage Model. Coverage is defined as the expected amount of information covered by the summary S divided by the expected amount of total information. The model can be simplified so that it is efficient to evaluate: it can be computed in polynomial time, and the resulting function is submodular.
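Under an independence assumption across objects (my reading of the slide; the paper's actual simplification may differ), the expected coverage of one IU by a summary is 1 minus the probability that every summary object misses it. A sketch with hypothetical probabilities:

```python
from math import prod

def expected_iu_coverage(probs):
    """P(at least one summary object covers the IU), assuming the
    objects cover it independently with the given probabilities."""
    return 1.0 - prod(1.0 - p for p in probs)

def expected_coverage(summary, iu_probs):
    """Expected number of IUs covered by the summary.
    iu_probs[iu][o] = P(object o covers information unit iu)."""
    return sum(
        expected_iu_coverage([p[o] for o in summary if o in p])
        for p in iu_probs.values()
    )

# Hypothetical probabilities: P(Jeff in o1) = 0.8, P(Jeff in o2) = 0.6.
iu_probs = {"Jeff": {"o1": 0.8, "o2": 0.6}, "sky": {"o1": 0.7}}
print(expected_coverage({"o1", "o2"}, iu_probs))  # ≈ 1.62 = (1 - 0.2*0.4) + 0.7
```

Normalizing this by the expected total information gives a ratio in [0, 1], mirroring the deterministic 8/14 example; the "at least one object covers it" structure is also what makes the function monotone and submodular.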

14 Optimization Problem for Summarization. Parameters: a dataset O = {o1, o2, ..., on} and a positive number K. Finding a summary of size K with maximum expected coverage is NP-hard. We developed an efficient greedy algorithm to solve it.

15 Basic Greedy Algorithm. Initialize S to the empty set. While |S| < K: for each object o in O \ S, compute the coverage of S ∪ {o}; select the o* with maximum coverage and add it to S. Two costs motivate our optimizations: computing Cov for a single object is expensive (addressed by the object-level optimization), and there are too many Cov computations per iteration (addressed by the iteration-level optimization).
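The loop on this slide can be sketched as follows. The toy coverage function (size of the union of covered IUs) and all names are illustrative stand-ins for the paper's expected-coverage objective:

```python
def greedy_summary(objects, coverage, k):
    """Basic greedy: repeatedly add the object with the highest coverage gain.
    `coverage` maps a set of selected objects to a score (assumed monotone)."""
    summary = set()
    while len(summary) < k:
        best, best_gain = None, float("-inf")
        for o in objects - summary:  # scan every object in O \ S
            gain = coverage(summary | {o}) - coverage(summary)
            if gain > best_gain:
                best, best_gain = o, gain
        summary.add(best)            # select o* with maximum gain
    return summary

# Toy example: each object covers a set of IUs; coverage = size of their union.
ius = {"o1": {"a", "b"}, "o2": {"b", "c"}, "o3": {"d"}}
cov = lambda S: len(set().union(*[ius[o] for o in S]))
print(greedy_summary(set(ius), cov, 2))
```

Note the two costs the slide calls out: each `coverage(...)` call may be expensive, and the inner loop calls it for every remaining object in every iteration, which is exactly what the object-level and iteration-level optimizations attack.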

16 Efficiency optimization, object-level. Reduce the time required to compute the coverage for one object: instead of directly computing and optimizing coverage in each iteration, compute the gain of adding one object o to the summary S, gain(S, o) = Cov(S ∪ {o}, O) − Cov(S, O). Updating gain(S, o) incrementally is much more efficient than recomputing coverage from scratch.

17 Submodularity of Coverage. The expected coverage Cov(S, O) is submodular: for any S ⊆ T ⊆ O and any object o ∉ T, Cov(S ∪ {o}, O) − Cov(S, O) ≥ Cov(T ∪ {o}, O) − Cov(T, O). That is, the marginal gain of an object diminishes as the summary grows.

18 Efficiency optimization, iteration-level. Reduce the number of object-level computations (i.e. evaluations of gain(S, o)) in each iteration of the greedy process. While traversing the objects in O \ S, we maintain the maximum gain seen so far, gain*, and an upper bound Upper(S, o) on gain(S, o), which holds by the definition of gain and by submodularity and can be updated in constant time. Any object o with Upper(S, o) < gain* can be pruned without computing its exact gain.
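The pruning rule above is in the spirit of lazy ("CELF-style") greedy selection: by submodularity an object's gain can only shrink as S grows, so a previously computed gain is a valid upper bound. This sketch uses stale gains in a max-heap as the bound, which is a standard choice and not necessarily the paper's exact Upper(S, o):

```python
import heapq

def lazy_greedy(objects, coverage, k):
    """Lazy greedy: cached gains are upper bounds by submodularity,
    so we only recompute the gain of the object at the top of the heap."""
    summary = set()
    # Max-heap of (-upper_bound_on_gain, object); initialized with gain(∅, o).
    heap = [(-(coverage({o}) - coverage(set())), o) for o in objects]
    heapq.heapify(heap)
    while len(summary) < k and heap:
        neg_bound, o = heapq.heappop(heap)
        gain = coverage(summary | {o}) - coverage(summary)  # fresh gain
        if not heap or gain >= -heap[0][0]:
            summary.add(o)  # o's fresh gain beats every other upper bound
        else:
            heapq.heappush(heap, (-gain, o))  # reinsert with tightened bound
    return summary

# Same toy coverage as before: size of the union of covered IUs.
ius = {"o1": {"a", "b"}, "o2": {"b", "c"}, "o3": {"d"}}
cov = lambda S: len(set().union(*[ius[o] for o in S]))
print(lazy_greedy(set(ius), cov, 2))
```

Objects whose bound stays below the current best are never re-evaluated in that iteration, which is the same effect as the slide's Upper(S, o) < gain* pruning test.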

19 Experiment: Datasets. Facebook Photo Set: 200 photos uploaded by 10 Facebook users (attributes: visual concept, event, time, face). Review Dataset: reviews about 10 hotels from TripAdvisor, with about 250 reviews per hotel on average (attributes: facets, rating). Flickr Photo Set: 20,000 photos from Flickr (attributes: visual concept, event, time).

20 Experiment: Quality. (Quality-comparison plots shown on the slide.)

21 Experiment: Efficiency. The basic greedy algorithm without optimization runs for more than 1 minute.

22 Summary. We developed a new extractive summarization framework that handles multi-attribute data and uncertain/probabilistic data, generates high-quality summaries, and is highly efficient.


