Presentation is loading. Please wait.

Presentation is loading. Please wait.

Tag-Cloud Drawing : Algorithms for Cloud Visualization Owen Kaser, University of New Brunswick, Saint John, NB, Canada Daniel Lemire, Universite du Quebeca.

Similar presentations

Presentation on theme: "Tag-Cloud Drawing : Algorithms for Cloud Visualization Owen Kaser, University of New Brunswick, Saint John, NB, Canada Daniel Lemire, Universite du Quebeca."— Presentation transcript:

1 Tag-Cloud Drawing : Algorithms for Cloud Visualization Owen Kaser, University of New Brunswick, Saint John, NB, Canada Daniel Lemire, Universite du Quebeca Montreal Montreal, QC, Canada WWW’07: 16th International World Wide Web Conference

2 Introduction Tag Cloud usually use font size to show the relative importance or frequency of tag. A consequence is wasteful white space that is problematic in small-display device. Clumps of white space are not aesthetically pleasing. Try to optimize the display of tag cloud and place associated tags near one another. Use EDA algorithm, min-cut placement for area minimization and clustering in tag clouds. Use Knuth-Plass algorithm for text justification and a book-placement exercise considered by Skiena.

3 Related Work Tag clouds have been attributed to Coupland but have been popularized by the Web site Flickr. Tag cloud are commonly associated with folksonomies and social software. Graph drawing suggest some metrics that make graph easy to understand and pleasing to eyes.

4 Related Work (Cont.) Other type of tag-cloud display. Hassan-Montero and Herrero-Solana have proposed improving tag-cloud by clustering similar tags together. Millen et al. have proposed that user be dynamically remove able to remove less significant tags and add index in large clouds. Bielenberg has proposed circular clouds, where the most heavily weighted tags appear closer to the center. Dubinko et al. have proposed a model to represent tags over a time line. Russel has proposed cloudalicious, a tool to study the evolution of the tag cloud over time. Jaffe et al. have integrated tag clouds inside maps for displaying tags having geographical information, such as pictures taken at a given location.

5 Related Work (Cont.) Improvement of layout of HTML Hurst et al. showed that it is possible to make HTML table more pleasing. Ongoing work to improve the layout of text in HTML pages using Cascading Style Sheet.

6 Background Typesetting EDA: Physical Design

7 Typesetting Greedy Method Fits as many words per line as possible, starting a new line whenever further words cannot be placed on the current line. This approach used by most browsers. Greedy approach can be done on-line, without waiting the end of paragraph. It is fast but can also produce suboptimal solutions.

8 Typesetting (Cont.) Dynamic Programming Knuth and Plass compute an optimal solution using dynamic programming. TEX system can quickly determine where to break line and fit text onto the page. Their total-fit algorithm minimizes the sum of squares of each line’s badness inline.

9 Typesetting (Cont.) The total-fit algorithm can be summarized excluding hyphenation and penalties. We can compute for all possible j = 1,….,n in time O(n 2 ) and O(n) space. Label the words of a paragraph from 1 to n. b k,j - The badness measure resulting from a line containing words k to j and b k,j = 0 while k > j. t j - The minimal possible sum of square of the line badness when the j th word ends a line and t 0 = 0. K j – For j > 1, the last word of the line prior to the one the one ending with j th word.

10 EDA: Physical Design Electronic design automation (EDA) is the category of tools for designing and producing electronic systems. Placement and floorplanning are two closely related stages during many physical design flows. Mathematically, floorplanning and placement solve the same problem. Floorplanning is often done early in the design stage and gives a “a bird’s eyes” view of the layout. On the other hand, placement is typically done with complete knowledge module shape. Recent tools blur the destinction.

11 EDA: Physical Design (Cont.) Placement Approaches in EDA Placement problem are typically NP-hard. Approaches include force-directed placement, simulated annealing, min-cut placement. For speed, min-cut placement is often chosen.

12 Models For Cloud Optimization Tag Clouds with Inline Text Tag Clouds with Arbitrary Placement Tag Relationships

13 Tag Clouds with Inline Text Inline text is a paragraph (block) made exclusively of inline HTML elements such as span, font, em, b, i, strong, a and br. Any area outside a tag but inside the tag cloud will be referred to as “white”. The primary view has the width and height of each tag fixed. But still can change the height and width of tag by using HTML style or CSS. Do not include a penalty for squeezing tags or spaces. Do not take into account symmetry or homogeneity.

14 Tag Clouds with Inline Text Badness Measuring k : numbers of tags i : ranging from 1 to k h j : height of tag i w j : width of tag i The badness of a line is only a function of the set of tag dimension (w j, h j ). W : normal width of a white space. h : h = max h j

15 Tag Clouds with Inline Text Example 1. We have tags on the line in (width, height) format: (32,14),(45, 16), (24,12). Tag cloud width w is 128 pixels. Expected white-space width of 4 pixels between tags. W = 4. The line height h = max{14,16,12} = 16 extra white space on the line 128 – 32 – 45 – 24 – (2 * 4) = 19 Contributing to the badness by 19 * 16 = 304 The first and last tags have lesser heights than the second tag, and they contribute respectively 32(16-14) + 24(16-12) = 160 Total badness 304 + 160 = 464

16 Tag Clouds with Inline Text In the spirit of the Knuth-Plass total-fit algorithm, we might define the overall badness of a tag cloud as the sum of the squares. (l 2 ) Summing the squares of the badness has the benefit of penalizing more heavily solutions with some very bad lines. (l 1 ) Merely summing the line badness tend to produce shorter clouds (l ∞ ) Minimize the maximum badness across all line might generate very tall clouds.

17 Tag Clouds with Arbitrary Placement Assumption 1. tags may be reordered and placed arbitrarily (but without overlap or rotation) in the plane; 2. tag relationships are known, and strongly related tags should be in close proximity; 3. tag-cloud width has an upper bound; 4. tag-cloud height should be small, to reduce scrolling; 5. (optional) tags may be deformed slightly (made shorter but wider, for instance), so long as tag area remains (nearly) constant; 6. (optional) large clumps of white space are bad.

18 Tag Clouds with Arbitrary Placement (Cont.) There is no analogue to a “line” of tags when arbitrary placement is allowed. We need to sum white area surrounding tags. Another goal is to obtain spatial clustering of semantically related tags. Small values indicated better clustering.

19 Tag Relationships One method of determining tag relationships counts co-occurrences, when a pair of tags have been assigned to the same resource. Another view is that each resource corresponds to a hyperedge in a hypergraph, whose members consist of the tags. For instance, the hyperedge {bottle, gas, beer} from the first view would correspond to the edges { (bottle, gas), (bottle, beer), (gas, beer) } in the second view. We should use graph instead of hypergraph.

20 Tag Relationships (Cont.)

21 Solutions Cloud Layout with Inline Text Cloud Layout with Arbitrary Placement

22 Cloud Layout with Inline Text Apply dynamic programming or shelf-packing. First breed of algorithms : take an ordered list of tags and choose where to break lines. 1.First design a simple greedy method :  Tags are added to line until the line is full and create new line when needed. 2. Then apply Knuth-Plass algorithm except that :  The last line is not an exception: it cannot be half empty without penalty ;  if, and only if, a tag exceeds the maximal width, then it will be given a line of its own; no other overfull lines are allowed.

23 Cloud Layout with Inline Text (Cont.) The second breed of algorithms : attempting to decrease the badness. (NP-hard) Strip packing problem (SPP) (with 10 time randomly shuffling tags) Other heuristic method are based on approximation algorithms for SPP. ◦ NEXT FIT DECREASING HEIGHT (NFDH) ◦ FIRST FIT DECREASING HEIGHT (NFDH) ◦ FIRST FIT DECREASING HEIGHT WEIGHT (NFDHW)

24 Cloud Layout with Arbitrary Placement Min-cut Placement Min-cut placement recursively decomposes a collection of tags by bipartitioning. Then each group is recursively split. Ideally, the bipartition must be fairly balanced The cut size (the number — or perhaps total weight — of edges/hyperedges containing tags in both groups) should be small. There should be an influence of “outside” tags. Min-cut placement can run in O(mlogn) time if we use the Fiduccia-Mattheyeses bipartitioning heuristic.

25 Cloud Layout with Arbitrary Placement (Cont.)

26 Slicing Floorplans Recursive bipartitioning’s effect can be represented in a slicing tree.

27 Cloud Layout with Arbitrary Placement (Cont.) Nested Tables for Slicing Floorplans The Table is either 2x1 or 1x2, denpending whether the slicing-tree node is tagged ‘H’ or ‘V’.

28 Cloud Layout with Arbitrary Placement (Cont.) EDA Placement Is Not (Quite) Tag Placement We can simply feed our tag-cloud data to an EDA placement, but we found it appropriate to modify the EDA tool. Long tags are unusual for EDA. Tags cannot be rotated. Tags do not need to consider wire area. Each tag in the cluster is related to every other tag, and thus dividing them should be much more expensive. Different solution quality levels and running time requirement.

29 Experimental Results Test Data Tag Clouds with In-line Text Tag Clouds with Arbitrary Placement

30 Test Data Tags and their accompanying importance levels (0-9) were obtained from ZoomClouds and Project Gutenberg. On average, clouds had 93 tags. ZoomClouds is Web site using the Yahoo! Content Analysis API. Experiment retreives 65 different tag clouds and normalized the weights with a linear function. Test data were also derived from word co-occurrences in 20 e-books produced by Project Gutenberg. The importance i of tag T was determined as f, r and t are respectively the frequencies of the most frequent tag, the least frequent retained tag, and the tag T.

31 Tag Clouds with In-line Text Alphabetically-sorted tags are, on average, 40% larger than weight-sorted tags. Dynamic programming does not reduce the area of the tag clouds for weight-sorted tags, but offers a reduction of about 3% for alphabetically sorted tags. The random-shuffling algorithm does worse than sorting by weight.

32 Tag Clouds with In-line Text (Cont.) The NFDH heuristic gives about the same average tag-cloud height as does the weight-sorted greedy algorithm. The FFDH and FFDHW heuristics offer an average reduction of about 3% in the height of the ZoomClouds tag clouds, and of 1 % and 2% respectively for the Project Gutenberg tag clouds.

33 Tag Clouds with In-line Text (Cont.) (l ∞ ) can generate unacceptably tall tag clouds (3 times taller than normal). The difference in height between (l 1 ) and (l 2 ) aggregates is well below 1 %. The most competitive algorithms are FFDH, FFDHW and either the greedy or dynamic- programming algorithms applied to weight- sorted tags. if the l 1 norm is chosen, the FFDHW heuristic is the clear winner and dynamic programming is not worth the effort If the sum of squares is preferred, it is a close race.

34 Tag Clouds with In-line Text (Cont.)

35 Tag Clouds with Arbitrary Placement Some changes of EDA algorithms ◦ Modified program to perform graph bipartitioning rather than hypergraph pratitioning. ◦ Fiduccia-Mattheyes heuristic. ◦ Estimating the correct amount of “padding” area is not required for tag placement. ◦ Add an estimate of the absolute width of a floorplan area.

36 Tag Clouds with Arbitrary Placement (Cont.) Results Interestingly, 100-tag is faster than 50-tag. Floorplaning sizing was only small part of over-all time. C-soft shows that unacceptably long runtimes.

37 Experimental Results – Tag Clouds with Arbitrary Placement (Cont.) compaSS has tighter cloud than other algorithms. The sorted greedy heuristic used 2–19% less area than min-cut heuristic. With 200 tags, it is remarkable that the more sophisticated compaSS approach was not as good as the greedy heuristic.

38 Tag Clouds with Arbitrary Placement (Cont.) The min-cut approach clearly (and unsurprisingly) outperformed greedy approaches and compaSS compaSS is apparently better at grouping than the sorted greedy heuristic. This is counterintuitive and reveals a weakness in using Equation 1.

39 Conclusion Future work should include browser-based implementations. For in-line text, our cloud-badness model is probably incomplete since it ignores some basic symmetry issues. Differences between tag-cloud layout and EDA placement, we plan to test an industrial strength min-cut placement tool. The new hyphenate property might encourage the use of slightly more sophisticated line-breaking algorithms in browsers.

Download ppt "Tag-Cloud Drawing : Algorithms for Cloud Visualization Owen Kaser, University of New Brunswick, Saint John, NB, Canada Daniel Lemire, Universite du Quebeca."

Similar presentations

Ads by Google