Document Analysis: Segmentation & Layout Analysis. Prof. Rolf Ingold, University of Fribourg. Master course, spring semester 2008.



2 Outline (© Prof. Rolf Ingold)
- Objectives of layout analysis
- Classification of layout analysis methods
- Top-down methods
- Run length smearing algorithm
- Bottom-up methods
- Connected component extraction
- A model-driven approach
- Conclusions

3 Objectives of layout analysis and segmentation
- The role of segmentation is to split a document image into regions of interest
- Regions of interest may be of different granularity levels: graphics or text blocks, text lines, words, characters
- The goal of layout analysis is to obtain a hierarchical description of the segmented objects

4 Segmentation strategies
- Segmentation produces a hierarchy of physical objects
- Two strategies can be used
  - top-down segmentation: starting with the entire image, split it recursively down to elementary shapes
  - bottom-up segmentation: starting at pixel level, detect connected components and group them hierarchically
- Hybrid methods combine both strategies
- Segmentation methods can be
  - data-driven, using only data properties (without contextual knowledge)
  - model-driven, i.e., using contextual knowledge

5 Top-down methods
- Top-down methods decompose the entire page into a hierarchy of
  - rectangular regions
- Top-down approaches perform recursive XY-cuts
  - horizontal and vertical projection profile analysis
  - white streams (spaces) analysis
  - run length smoothing algorithm (RLSA)

6 Recursive XY-Cut
- The page is cut alternately horizontally and vertically according to white spaces
- Robust for most modern printed documents
- Assumes that page images are deskewed
- Does not work for all kinds of layouts
  - non-rectangular formatting
  - complex mosaics (illustration next)
- The resulting hierarchy may not reflect the natural structure (illustration below)
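The recursive XY-cut described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the course's implementation: the page is a list of rows of 0/1 values (1 = black), boxes are half-open `(x0, y0, x1, y1)` tuples, and the names `content_spans`, `xy_cut`, and the single `min_gap` threshold are hypothetical.

```python
def content_spans(profile, min_gap):
    """Split a projection profile into half-open (start, end) spans of ink,
    cutting only where an empty run is at least min_gap long."""
    spans, start, last_ink = [], None, None
    for i, value in enumerate(profile):
        if value == 0:
            continue
        if start is None:
            start = i
        elif i - last_ink - 1 >= min_gap:
            spans.append((start, last_ink + 1))
            start = i
        last_ink = i
    if start is not None:
        spans.append((start, last_ink + 1))
    return spans


def xy_cut(img, box, min_gap, horizontal=True, tried_other=False):
    """Return the leaf blocks of `box`, alternating horizontal and vertical cuts."""
    x0, y0, x1, y1 = box
    if horizontal:   # one value per row: cuts are horizontal white bands
        profile = [sum(img[y][x0:x1]) for y in range(y0, y1)]
    else:            # one value per column: cuts are vertical white bands
        profile = [sum(img[y][x] for y in range(y0, y1)) for x in range(x0, x1)]

    def sub_box(a, b):
        return (x0, y0 + a, x1, y0 + b) if horizontal else (x0 + a, y0, x0 + b, y1)

    spans = content_spans(profile, min_gap)
    if not spans:
        return []                      # region is entirely white
    if len(spans) == 1:
        trimmed = sub_box(*spans[0])
        if tried_other:                # no cut possible in either direction
            return [trimmed]
        return xy_cut(img, trimmed, min_gap, not horizontal, True)
    blocks = []
    for a, b in spans:
        blocks.extend(xy_cut(img, sub_box(a, b), min_gap, not horizontal))
    return blocks


rows = ["111111111",
        "111111111",
        "000000000",
        "000000000",
        "111000111",
        "111000111"]
page = [[int(c) for c in row] for row in rows]
print(xy_cut(page, (0, 0, 9, 6), min_gap=2))
```

The sketch assumes a deskewed page, as the slide requires: on a skewed image the white bands no longer show up as zero runs in the profiles.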

7 Top-Down Segmentation
- Recursive splitting can be performed by horizontal and vertical profile analysis
- Images need to be deskewed first

8 Top-Down Segmentation (2)
- The order in which X-Y cuts are performed is critical

9 White streams analysis
- Principle: detect maximal rectangular white blocks
- Split regions recursively according to thresholds

10 Run Length Smearing Algorithm (RLSA)
- The Run Length Smearing Algorithm (RLSA) is a morphological operator
  - it replaces white runs whose length is smaller than or equal to a given threshold by black runs
  - it can be applied horizontally as well as vertically

11 RLSA based segmentation
- RLSA can be used to segment a page into blocks in three steps
  - applied horizontally
  - applied vertically
  - combined by a logical AND operator
- Threshold values are critical and have to be chosen
  - according to document class
  - using statistical white space analysis
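The three steps can be sketched as follows. This is a minimal sketch, assuming a binary image stored as a list of rows of 0/1 values (1 = black); the function names and the two thresholds `th` and `tv` are hypothetical. One further assumption: only white runs enclosed by black pixels on both sides are smeared, so white runs touching the image border are left alone.

```python
def smear_row(row, threshold):
    """Turn white runs of length <= threshold lying between two black pixels black."""
    out = list(row)
    last_black = None
    for i, value in enumerate(row):
        if value == 1:
            if last_black is not None and 0 < i - last_black - 1 <= threshold:
                for j in range(last_black + 1, i):
                    out[j] = 1
            last_black = i
    return out


def rlsa_horizontal(img, threshold):
    return [smear_row(row, threshold) for row in img]


def rlsa_vertical(img, threshold):
    # Smear each column by transposing, smearing rows, and transposing back.
    columns = [smear_row(col, threshold) for col in zip(*img)]
    return [list(row) for row in zip(*columns)]


def rlsa_blocks(img, th, tv):
    """Horizontal smearing AND vertical smearing, pixel by pixel."""
    h = rlsa_horizontal(img, th)
    v = rlsa_vertical(img, tv)
    return [[a & b for a, b in zip(row_h, row_v)] for row_h, row_v in zip(h, v)]


print(smear_row([1, 0, 0, 1, 0, 0, 0, 1, 0], threshold=2))
```

In the printed example the run of two white pixels is filled while the run of three is kept, which is why the choice of threshold decides whether characters merge into words or words into blocks.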

12 Bottom-up methods
- Bottom-up methods start at the pixel level and group pixels together into a hierarchy of
  - multi-rectangular regions (shapes delimited by horizontal and vertical segments)
  - arbitrary shapes
- Bottom-up methods use
  - connected component extraction
  - region grouping

13 Connected components
- In a binary image, a connected component is a set of black pixels connected by 4- or 8-adjacency
[Figure labels: five 4-connected components; two 8-connected components]
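The effect of the adjacency choice is easy to check with a small flood-fill counter. This is an illustrative sketch only: the image layout (list of rows of 0/1 values) and the names `N4`, `N8`, and `count_components` are assumptions, and the test image is not the one from the slide's figure.

```python
from collections import deque

N4 = [(1, 0), (-1, 0), (0, 1), (0, -1)]
N8 = N4 + [(1, 1), (1, -1), (-1, 1), (-1, -1)]


def count_components(img, neighbors):
    """Count connected sets of black pixels (value 1) under the given adjacency."""
    height, width = len(img), len(img[0])
    seen = [[False] * width for _ in range(height)]
    count = 0
    for y in range(height):
        for x in range(width):
            if img[y][x] != 1 or seen[y][x]:
                continue
            count += 1                      # unvisited black pixel: new component
            seen[y][x] = True
            queue = deque([(x, y)])
            while queue:                    # breadth-first flood fill
                cx, cy = queue.popleft()
                for dx, dy in neighbors:
                    nx, ny = cx + dx, cy + dy
                    if (0 <= nx < width and 0 <= ny < height
                            and img[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((nx, ny))
    return count


diagonal = [[1, 0, 0],
            [0, 1, 0],
            [0, 0, 1]]
print(count_components(diagonal, N4), count_components(diagonal, N8))
```

A diagonal stroke falls apart into single pixels under 4-adjacency but stays one component under 8-adjacency.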

14 Extraction of connected components
- Connected components can be extracted by different algorithms
  - by a one-pass scan of the full image, from top to bottom and from left to right
  - by a border following algorithm, starting from a border pixel that is assumed to be known

15 Scanning based CC Extraction
for each scan line l_y
    for each black run r
        if on line l_(y-1) there is no run k-adjacent to r
            create a new component containing r
        else if on line l_(y-1) there exists exactly one run r' k-adjacent to r
            add r to the component containing r'
        else if on line l_(y-1) there exist several runs r_i k-adjacent to r
            merge all components containing such an r_i
            add r to that component
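A runnable version of this scan could look like the sketch below. It makes several assumptions that are not in the slide: runs are half-open `(start, end)` column ranges, components are tracked with a small union-find structure so that the "one run" and "several runs" cases are handled by the same merge step, and the names `runs_of` and `label_runs` are hypothetical.

```python
def runs_of(row):
    """Return the black runs of a row as half-open (start, end) column ranges."""
    runs, start = [], None
    for x, value in enumerate(row):
        if value == 1 and start is None:
            start = x
        elif value != 1 and start is not None:
            runs.append((start, x))
            start = None
    if start is not None:
        runs.append((start, len(row)))
    return runs


def label_runs(img, k=4):
    """Group the runs of a binary image into k-connected components (k = 4 or 8)."""
    parent = []                      # union-find forest over run ids
    runs = []                        # run id -> (y, start, end)

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    slack = 1 if k == 8 else 0       # 8-adjacency also accepts diagonal contact
    previous = []                    # run ids found on line y-1
    for y, row in enumerate(img):
        current = []
        for start, end in runs_of(row):
            rid = len(runs)
            runs.append((y, start, end))
            parent.append(rid)       # tentatively a new component
            for pid in previous:
                _, p_start, p_end = runs[pid]
                if p_start < end + slack and start < p_end + slack:
                    parent[find(pid)] = find(rid)   # merge with the adjacent run
            current.append(rid)
        previous = current
    components = {}
    for rid, run in enumerate(runs):
        components.setdefault(find(rid), []).append(run)
    return list(components.values())


u_shape = [[1, 0, 1],
           [1, 0, 1],
           [1, 1, 1]]
print(label_runs(u_shape))
```

The U shape shows the merge case: its two arms start as separate components and are joined only when the bottom run touches both.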

16 Border following algorithm
[Figure: pixel P, its neighbor Q in direction d, and the candidate neighbors R1 and R2]
consider P0 ∈ S having a 4-neighbor Q0 ∉ S
P ← P0; Q ← Q0; d ← direction of Q relative to P
repeat
    let Ri be the neighbor of P in direction (d+i) mod 8
    if R2 ∉ S then Q ← R2; d ← (d+2) mod 8
    else if R1 ∉ S then P ← R2; Q ← R1
    else P ← R1; d ← (d−2) mod 8
    add P to the contour
until P = P0 and Q = Q0
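The border following loop can be turned into Python almost line by line. The sketch rests on assumptions: the membership symbols lost in the transcript are read as "R2 not in S" and "R1 not in S", S is a Python set of `(x, y)` black pixels with y growing downward, directions are numbered 0 to 7 with even numbers for the 4-neighbors, and a pixel is appended to the contour only when it differs from the previous entry. The names `DIRS` and `follow_border` are hypothetical.

```python
# Direction d -> (dx, dy); even directions are the 4-neighbors.
DIRS = [(1, 0), (1, -1), (0, -1), (-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1)]


def follow_border(S, p0, q0):
    """Trace the border of the 4-connected set S, starting at pixel p0 in S
    whose 4-neighbor q0 lies outside S. Returns the list of contour pixels."""
    d = DIRS.index((q0[0] - p0[0], q0[1] - p0[1]))
    if d % 2 != 0 or p0 not in S or q0 in S:
        raise ValueError("p0 must be in S and q0 a 4-neighbor of p0 outside S")

    def neighbor(p, direction):
        dx, dy = DIRS[direction % 8]
        return (p[0] + dx, p[1] + dy)

    p, q = p0, q0
    contour = []
    while True:
        r1, r2 = neighbor(p, d + 1), neighbor(p, d + 2)
        if r2 not in S:              # turn around P: next background pixel
            q, d = r2, (d + 2) % 8
        elif r1 not in S:            # straight edge: advance P along the border
            p, q = r2, r1
        else:                        # concave corner: step diagonally
            p, d = r1, (d - 2) % 8
        if not contour or contour[-1] != p:
            contour.append(p)
        if p == p0 and q == q0:
            return contour


square = {(x, y) for x in range(3) for y in range(3)}
print(follow_border(square, (0, 0), (0, -1)))
```

On the 3×3 square the trace visits the eight border pixels and never the center, which is the behavior expected from a contour follower.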

17 Illustration of connected components

18 Connected components from RLSA
- Connected components can be used to detect characters
- Words can be located using RLSA

19 Grouping components
- Grouping connected components is non-trivial
- Grouping rules are based on
  - relative positioning
  - distances and thresholds
  - component classification
- Parameters can be estimated statistically

20 Allen's relations in 2D space
- The relative positioning of two rectangles generates 169 configurations
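The figure of 169 follows from Allen's 13 relations between two intervals, applied independently to the x and y extents of the rectangles: 13 × 13 = 169. The sketch below enumerates them; the relation names follow the usual Allen terminology, while the function names and the rectangle layout `(x0, y0, x1, y1)` are assumptions.

```python
from itertools import combinations, product


def allen(a, b):
    """Name the Allen relation between intervals a = (a0, a1) and b = (b0, b1)."""
    (a0, a1), (b0, b1) = a, b
    if a1 < b0:
        return "before"
    if a1 == b0:
        return "meets"
    if b1 < a0:
        return "after"
    if b1 == a0:
        return "met-by"
    if a0 == b0 and a1 == b1:
        return "equals"
    if a0 == b0:
        return "starts" if a1 < b1 else "started-by"
    if a1 == b1:
        return "finishes" if a0 > b0 else "finished-by"
    if a0 > b0 and a1 < b1:
        return "during"
    if a0 < b0 and a1 > b1:
        return "contains"
    return "overlaps" if a0 < b0 else "overlapped-by"


def rect_relation(r, s):
    """Pair of Allen relations (x axis, y axis) between two rectangles."""
    return (allen((r[0], r[2]), (s[0], s[2])), allen((r[1], r[3]), (s[1], s[3])))


# All intervals with endpoints in 0..3 are enough to produce every relation.
intervals = list(combinations(range(4), 2))
relations_1d = {allen(a, b) for a, b in product(intervals, repeat=2)}
rects = [(x0, y0, x1, y1) for (x0, x1) in intervals for (y0, y1) in intervals]
relations_2d = {rect_relation(r, s) for r, s in product(rects, repeat=2)}
print(len(relations_1d), len(relations_2d))
```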

21 Threshold estimation
- Thresholds can be estimated from the statistical distributions of
  - horizontal spacing, for grouping characters into words and words into text lines
  - vertical spacing, for grouping text lines into text blocks
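One simple way to derive such a threshold from measured gaps is sketched below. The heuristic is an assumption chosen for illustration, not the method used in the course: it sorts the observed horizontal gaps and places the threshold in the middle of the largest jump between consecutive values, on the expectation that intra-word and inter-word gaps form two separate clusters. The function names and the `(x0, x1)` character spans are hypothetical.

```python
def estimate_gap_threshold(gaps):
    """Place the threshold in the widest empty interval of the sorted gap values."""
    values = sorted(gaps)
    if len(values) < 2:
        raise ValueError("need at least two gap measurements")
    best_jump, threshold = -1, None
    for low, high in zip(values, values[1:]):
        if high - low > best_jump:
            best_jump, threshold = high - low, (low + high) / 2
    return threshold


def group_into_words(char_spans, threshold):
    """Group left-to-right character spans (x0, x1) into words."""
    words = [[char_spans[0]]]
    for previous, current in zip(char_spans, char_spans[1:]):
        if current[0] - previous[1] > threshold:
            words.append([current])      # wide gap: start a new word
        else:
            words[-1].append(current)
    return words


chars = [(0, 4), (6, 10), (13, 17), (29, 33), (35, 39)]
gaps = [b[0] - a[1] for a, b in zip(chars, chars[1:])]
threshold = estimate_gap_threshold(gaps)
print(threshold, group_into_words(chars, threshold))
```

A histogram-based method would be more robust on real pages; the largest-jump rule is used here only because it fits in a few lines.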

22 Distributions of component sizes
- Components can be classified according to their size into
  - symbols
  - letters
  - hairlines
  - punctuation

23 Region grouping

24 Docstrum
- The docstrum method [O'Gorman] uses a graph that connects each connected component to its k closest neighbors
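The neighbor graph can be sketched with a brute-force search over component centroids. This is a minimal illustration under assumptions: components are reduced to `(x, y)` centroids, the function name is hypothetical, and each edge also records the distance and angle between the two centroids, the two quantities the docstrum is usually described as plotting.

```python
import math


def k_nearest_neighbor_edges(centroids, k):
    """Return edges (i, j, distance, angle_in_degrees) from each centroid
    to its k closest neighbors (brute force, quadratic in the number of points)."""
    edges = []
    for i, (xi, yi) in enumerate(centroids):
        candidates = sorted(
            (math.hypot(xj - xi, yj - yi), j)
            for j, (xj, yj) in enumerate(centroids) if j != i
        )
        for distance, j in candidates[:k]:
            xj, yj = centroids[j]
            angle = math.degrees(math.atan2(yj - yi, xj - xi))
            edges.append((i, j, distance, angle))
    return edges


# Three components on one text line and one on a line far below.
points = [(0, 0), (10, 0), (20, 0), (0, 30)]
for edge in k_nearest_neighbor_edges(points, k=1):
    print(edge)
```

For pages with many components a spatial index would replace the quadratic search; the graph itself stays the same.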

25 Model-driven layout analysis [Azokly95]

26 Component hierarchy

27 Generic macrostructures
- In a model-driven approach, generic macrostructures are used
  - a formal language describes margins and separators

28 Formal description of macrostructures
VOLUME Article IS
    WIDTH = 160; HEIGHT = 240;
    PAGE Garde IS... END;
    PAGE Paire IS
        HSEP hs1 = (4, 3, LEFT, RIGHT, BLANK);
        LAYER Principal IS
            VSEP vs1 = (40, 65, TOP, hs1, BLANK);
            VSEP vs2 = ([50,60], 4, hs1, BOTTOM, BLANK);
            REGION Centre = (vs2, RIGHT, hs1, BOTTOM, ANY, NORMAL);
            REGION Marge = (LEFT, vs2, hs1, BOTTOM, TEXT, SMALL);...
        END;
        LAYER Secondaire IS
            HSEP hs2 = ([10,220], 2, LEFT, RIGHT, BLANK) SUBST hs1;
            HSEP hs3 = ([20,240], 2, LEFT, RIGHT, BLANK) SUBST BOTTOM;
            REGION Figure = (LEFT, RIGHT, hs2, hs3, {TABLE, GRAPHICS});
        END;
    PAGE Impaire IS... END;
END;

29 Evaluation of segmentation results
- Segmentation is rarely perfect; it generates
  - undersegmentation: real components are merged
  - oversegmentation: a single component is split
- Special metrics have been developed to evaluate a segmentation result
- Scientific contests were organized at ICDAR'03 and ICDAR'05
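The two error types can be counted with a simple overlap test between ground-truth and detected regions. The sketch below is only an illustration of the definitions above and is not the metric used in the ICDAR contests: regions are assumed to be half-open rectangles `(x0, y0, x1, y1)`, any non-empty overlap counts as a match, and the function names are hypothetical.

```python
def overlaps(a, b):
    """True if two half-open rectangles (x0, y0, x1, y1) share some area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def count_errors(truth, detected):
    """Return (oversegmented, undersegmented) counts.

    A ground-truth region is oversegmented when several detected regions
    overlap it; a detected region is an undersegmentation when it overlaps
    several ground-truth regions."""
    over = sum(1 for t in truth if sum(overlaps(t, d) for d in detected) > 1)
    under = sum(1 for d in detected if sum(overlaps(d, t) for t in truth) > 1)
    return over, under


truth = [(0, 0, 10, 10), (20, 0, 30, 10), (40, 0, 50, 10)]
detected = [(0, 0, 4, 10), (6, 0, 10, 10),   # first block split in two
            (20, 0, 50, 10)]                 # last two blocks merged
print(count_errors(truth, detected))
```

A practical metric would also weight matches by overlap area so that a few stray pixels do not count as a split or a merge.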

30 Conclusion
- Segmentation is a crucial step in document analysis
- Segmentation is almost solved for
  - printed documents with regular layout
  - form analysis
- Results are rarely perfect
- Contextual knowledge may improve the results
- Advanced pattern recognition methods are required
- Segmentation remains an open problem for uncontrolled handwriting and graphical documents

