A New Approach for Video Text Detection and Localization M. Cai, J. Song and M.R. Lyu VIEW Technologies The Chinese University of Hong Kong.


1 A New Approach for Video Text Detection and Localization M. Cai, J. Song and M.R. Lyu VIEW Technologies The Chinese University of Hong Kong

2 Related work
Text area detection
– Uncompressed-domain methods: texture-based, color-based, edge-based
– Compressed-domain methods: DCT coefficients; number of intra-coded blocks in P- and B-frames
Text string localization
– Bottom-up scheme
– Top-down scheme

3 Language-independent characteristics
Contrast
– An adaptive contrast threshold is chosen according to the background complexity
Color
– Color bleeding caused by compression
Orientation
– Well-defined size and orientation make the text easy to understand
Stationary location
– Text stays on screen for a certain length of time

4 Language-dependent characteristics
Characteristic                 English            Chinese
Stroke density                 Roughly similar    Varies dramatically
Min(Font size)                 10 pixels high     20 pixels high
Min(Aspect ratio)              Relatively large   Relatively small
Stroke direction statistics    Mainly vertical    Vertical, horizontal, left diagonal, right diagonal

5 Workflow
– Sampling & color space conversion
– Multi-frame comparison
– Video text detection and localization on every sampled frame

6 A sequential multi-resolution paradigm
[Diagram] Edge detection turns the original image into an edge map, which is then processed at Level = 1, Level = 2, ..., Level = n-1, Level = n in sequence. At each level l:
– The edge map is scaled: Size / f(l)
– Text area detection
– Text string localization
– Text regions are mapped back to their original coordinates: Size × f(l)
Output: final text regions with original coordinates.
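The scaling in the diagram (Size / f(l) before detection, Size × f(l) afterwards) can be sketched as follows. This is only an illustration under stated assumptions: the slide does not define f(l), so f(l) = 2^(l-1) is assumed here; max-pooling is an assumed way to shrink the edge map; and `detect` is a hypothetical callback standing in for the text area detection and text string localization steps.

```python
def downscale(edge, factor):
    """Shrink a 2-D edge map by max-pooling factor x factor blocks (assumed method)."""
    h, w = len(edge) // factor, len(edge[0]) // factor
    return [[max(edge[y * factor + j][x * factor + i]
                 for j in range(factor) for i in range(factor))
             for x in range(w)]
            for y in range(h)]


def multi_resolution(edge, levels, detect, f=lambda level: 2 ** (level - 1)):
    """Run `detect` at each level and map boxes back to original coordinates.

    `detect` is a hypothetical callback returning (top, bottom, left, right)
    boxes for a scaled edge map. f(level) = 2 ** (level - 1) is an assumption;
    the slide only names the function f(l).
    """
    regions = []
    for level in range(1, levels + 1):
        scale = f(level)
        small = downscale(edge, scale)           # Size / f(l)
        if not small or not small[0]:
            break                                # image too small for this level
        for top, bottom, left, right in detect(small):
            # Size x f(l): back to original coordinates
            regions.append((top * scale, bottom * scale,
                            left * scale, right * scale))
    return regions


def bounding_box(edge):
    """Toy detector for the example: one box around all non-zero pixels."""
    points = [(y, x) for y, row in enumerate(edge) for x, v in enumerate(row) if v > 0]
    if not points:
        return []
    ys, xs = [p[0] for p in points], [p[1] for p in points]
    return [(min(ys), max(ys) + 1, min(xs), max(xs) + 1)]
```

With the toy detector, a blob found at a coarser level maps back to the same original coordinates as at level 1, which is the purpose of the Size × f(l) step.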

7 Text detection
Edge detection
– Sobel edge detector
Local thresholding
– Adaptive to background complexity
Text-like area recovery
– Enhance the density of text areas
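As a point of reference for the edge detection step, here is a minimal pure-Python sketch of a Sobel edge map. The slide only names the Sobel detector; using |gx| + |gy| as the edge strength and leaving border pixels at zero are assumptions made for this example.

```python
def sobel_edge_map(gray):
    """Return the Sobel edge strength |gx| + |gy| for a 2-D list of intensities.

    Border pixels are left at 0 (an assumption; other border policies exist).
    """
    h, w = len(gray), len(gray[0])
    edges = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Horizontal gradient: right column minus left column
            gx = (gray[y - 1][x + 1] + 2 * gray[y][x + 1] + gray[y + 1][x + 1]
                  - gray[y - 1][x - 1] - 2 * gray[y][x - 1] - gray[y + 1][x - 1])
            # Vertical gradient: bottom row minus top row
            gy = (gray[y + 1][x - 1] + 2 * gray[y + 1][x] + gray[y + 1][x + 1]
                  - gray[y - 1][x - 1] - 2 * gray[y - 1][x] - gray[y - 1][x + 1])
            edges[y][x] = abs(gx) + abs(gy)
    return edges
```

A vertical intensity step produces a strong response along the step, while a flat region produces none.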

8 Local thresholding
– Use a small kernel (gray) to scan the whole edge map row by row.
– In the bigger window surrounding the kernel, check the background type: "Clear" or "Noisy".
– For a Clear background, determine the local threshold from the low part of the edge strength histogram in the bigger window; for a Noisy background, from the high part.
Figure: (a) Concentric kernel (size h) and window (size 3h). (b) A window on a multi-line text area and the horizontal projection in it. (c) Local threshold selection: histogram of count against edge strength, from 0 to MAX, split into a low part and a high part.
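The threshold choice for a single window might look like the sketch below. The slide does not give the rule for classifying the background or the exact positions within the histogram, so both are assumptions here: a window counts as "clear" when its horizontal projection contains a nearly edge-free row (`blank_frac`), and the low and high parts of the histogram are represented by the hypothetical quantiles `low_q` and `high_q`.

```python
def local_threshold(window, blank_frac=0.1, low_q=0.25, high_q=0.75):
    """Pick an edge-strength threshold for one window of the edge map.

    Returns (threshold, background), where background is "clear" or "noisy".
    blank_frac, low_q and high_q are illustrative values, not from the slides.
    """
    strengths = sorted(v for row in window for v in row if v > 0)
    if not strengths:
        return 0, "clear"                     # no edges at all in this window
    # Horizontal projection: total edge strength per row
    row_sums = [sum(row) for row in window]
    # Assumed criterion: a nearly empty row suggests gaps between text lines
    if min(row_sums) <= blank_frac * max(row_sums):
        background = "clear"
    else:
        background = "noisy"
    # Clear background -> low part of the histogram, noisy -> high part
    q = low_q if background == "clear" else high_q
    return strengths[int(q * (len(strengths) - 1))], background
```

A lower threshold keeps more edge pixels where the background is clean, and a higher one removes weak background edges where it is cluttered.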

9 Thresholding result comparison
[Figure: video image, local thresholding results, global thresholding results]

10 Text-like area recovery
Labeling: classify the current edge pixels as "TEXT" or "NON_TEXT" based on their local density.
Recovery/Suppression:
– Bring back the neighboring lower-strength edge pixels of the TEXT edge pixels.
– Suppress the NON_TEXT edge pixels.
[Figure: before recovery, after recovery]
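The labeling and recovery steps can be sketched as below. The neighborhood size (`radius`), the density cut-off (`min_density`), and the use of the 8-neighborhood for recovery are assumptions for illustration; the slide states only that labeling depends on local density and that lower-strength neighbors of TEXT pixels are brought back.

```python
def recover_text_like_areas(strength, binary, radius=1, min_density=0.4):
    """Label thresholded edge pixels as TEXT / NON_TEXT, then recover and suppress.

    strength: 2-D list of raw edge strengths.
    binary:   2-D list of 0/1 flags, the edge pixels that survived thresholding.
    radius and min_density are illustrative parameters, not from the slides.
    """
    h, w = len(binary), len(binary[0])

    def density(y, x):
        # Fraction of surviving edge pixels in the neighborhood around (y, x)
        ys = range(max(0, y - radius), min(h, y + radius + 1))
        xs = range(max(0, x - radius), min(w, x + radius + 1))
        return sum(binary[j][i] for j in ys for i in xs) / (len(ys) * len(xs))

    # Labeling: TEXT pixels are edge pixels in a dense neighborhood
    text = [[bool(binary[y][x]) and density(y, x) >= min_density
             for x in range(w)] for y in range(h)]

    # Suppression: only TEXT pixels are carried over to the output
    out = [[1 if text[y][x] else 0 for x in range(w)] for y in range(h)]

    # Recovery: bring back weaker edge pixels next to a TEXT pixel
    for y in range(h):
        for x in range(w):
            if not text[y][x]:
                continue
            for j in range(max(0, y - 1), min(h, y + 2)):
                for i in range(max(0, x - 1), min(w, x + 2)):
                    if not binary[j][i] and strength[j][i] > 0:
                        out[j][i] = 1
    return out
```

In this sketch an isolated strong edge pixel is suppressed, while a weak pixel touching a dense cluster is restored, which increases the density of the text area.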

11 Coarse-to-fine text localization
Projection-based top-down localization, designed to handle complex text layouts.
Flowchart:
– Initialization: the whole edge map is the only region in the processing array.
– If the array is empty, terminate. Otherwise, pop the first region from the processing array.
– Horizontal projection: divisible? Y: add each sub-region to the processing array. N: continue with the vertical projection.
– Vertical projection: divisible? Y: add the sub-regions to the processing array. N: the region is indivisible.
– Indivisible regions: check aspect ratio. Y: add to the resulting text regions. N: discard false regions.
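A compact sketch of this projection-based loop follows. The divisibility test (a projection profile that splits into runs separated by empty gaps), the `col_gap` tolerance that keeps narrow gaps between characters from splitting a string, and the `min_aspect` cut-off are all assumptions for the example; the slides do not specify them.

```python
from collections import deque


def split_runs(profile, min_gap=1):
    """Return (start, end) pairs of non-zero runs; gaps shorter than min_gap are bridged."""
    runs, start = [], None
    for i, v in enumerate(list(profile) + [0]):      # trailing 0 closes the last run
        if v and start is None:
            start = i
        elif not v and start is not None:
            runs.append([start, i])
            start = None
    merged = []
    for s, e in runs:
        if merged and s - merged[-1][1] < min_gap:
            merged[-1][1] = e                        # bridge a narrow gap
        else:
            merged.append([s, e])
    return [tuple(m) for m in merged]


def localize(edge, col_gap=2, min_aspect=1.5):
    """Top-down localization; returns (top, bottom, left, right) boxes, end-exclusive."""
    queue = deque([(0, len(edge), 0, len(edge[0]))])  # whole edge map first
    results = []
    while queue:                                      # empty array: terminate
        t, b, l, r = queue.popleft()
        # Horizontal projection: one value per row
        rows = split_runs([sum(edge[y][l:r]) for y in range(t, b)])
        if rows != [(0, b - t)]:                      # divisible into row bands
            queue.extend((t + s, t + e, l, r) for s, e in rows)
            continue
        # Vertical projection: one value per column
        cols = split_runs([sum(edge[y][x] for y in range(t, b))
                           for x in range(l, r)], col_gap)
        if cols != [(0, r - l)]:                      # divisible into column bands
            queue.extend((t, b, l + s, l + e) for s, e in cols)
            continue
        # Indivisible region: keep it only if it is wide enough (assumed check)
        if (r - l) / (b - t) >= min_aspect:
            results.append((t, b, l, r))
    return results
```

Each split strictly shrinks the region, so the loop terminates; regions with no edge pixels produce no sub-regions and simply drop out.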

12 Localization steps
[Figure: panels (1) to (4) showing the localization steps]

13 Experimental results

14

15 Performance statistics
Statistics of 10 news videos:
– Processing time per frame: 0.25 s (PIII 1 GHz CPU)
– Detection rate = 93.6%
– Detection accuracy = 87.2%
– Localization accuracy > 90%

