A New Approach for Video Text Detection and Localization
M. Cai, J. Song and M.R. Lyu
VIEW Technologies, The Chinese University of Hong Kong

Related work
Text Area Detection
– Uncompressed domain methods: texture-based, color-based, edge-based
– Compressed domain methods: DCT coefficients, number of intra-coded blocks on P-/B-frames
Text String Localization
– Bottom-up scheme
– Top-down scheme

Language-independent characteristics
Contrast
– An adaptive contrast threshold according to the background complexity
Color
– Color bleeding caused by compression
Orientation
– Well-defined size and orientation make the text easy to read
Stationary location
– Text stays on screen for a certain length of time

Language-dependent characteristics

Characteristic                 English             Chinese
Stroke density                 roughly similar     varies dramatically
Min(Font size)                 10-pixel high       20-pixel high
Min(Aspect ratio)              relatively large    relatively small
Stroke direction statistics    mainly vertical     vertical, horizontal, left diagonal, right diagonal

Workflow
– Sampling & color space conversion
– Multi-frame comparison
– Video text detection and localization on every sampled frame

A sequential multi-resolution paradigm
[Diagram: the original image passes through edge detection to give an edge map. At each level l = 1, 2, ..., n-1, n, the edge map is scaled to Size / f(l); text area detection and text string localization then produce text regions, which are mapped back to the original coordinates of the text regions by Size × f(l). The output is the final text regions with original coordinates.]
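The scaling between levels can be sketched in a few lines of Python. This is only an illustration: the slide does not define f(l), so `scale_factor` below assumes f(l) = 2^(l-1), and `downsample` and `to_original` are hypothetical helper names. Regions are written as (top, bottom, left, right).

```python
def scale_factor(level):
    """Hypothetical f(l): assume the resolution halves at each level."""
    return 2 ** (level - 1)


def downsample(edge_map, level):
    """Shrink the edge map to Size / f(l), keeping the strongest edge per block."""
    f = scale_factor(level)
    rows, cols = len(edge_map), len(edge_map[0])
    return [[max(edge_map[y][x]
                 for y in range(by, min(rows, by + f))
                 for x in range(bx, min(cols, bx + f)))
             for bx in range(0, cols, f)]
            for by in range(0, rows, f)]


def to_original(region, level):
    """Map a (top, bottom, left, right) region found at a level back to Size x f(l)."""
    f = scale_factor(level)
    return tuple(v * f for v in region)
```

Keeping the maximum edge strength in each block is one possible choice; averaging or plain subsampling would fit the diagram equally well.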

Text detection
Edge detection
– Sobel edge detector
Local thresholding
– Adaptive to background complexity
Text-like area recovery
– Enhance the density of text areas
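As a rough illustration of the edge detection step, the sketch below applies the standard 3x3 Sobel kernels to a gray-level image stored as a list of rows. The function name `sobel_edge_map` is hypothetical, and two details are assumptions rather than something the slides state: edge strength is approximated as |Gx| + |Gy|, and border pixels are left at zero.

```python
def sobel_edge_map(gray):
    """Return the Sobel edge-strength map of a 2-D list of gray levels."""
    h, w = len(gray), len(gray[0])
    edges = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Horizontal gradient: right column minus left column.
            gx = (gray[y - 1][x + 1] + 2 * gray[y][x + 1] + gray[y + 1][x + 1]
                  - gray[y - 1][x - 1] - 2 * gray[y][x - 1] - gray[y + 1][x - 1])
            # Vertical gradient: bottom row minus top row.
            gy = (gray[y + 1][x - 1] + 2 * gray[y + 1][x] + gray[y + 1][x + 1]
                  - gray[y - 1][x - 1] - 2 * gray[y - 1][x] - gray[y - 1][x + 1])
            # Assumption: |gx| + |gy| as a cheap stand-in for the magnitude.
            edges[y][x] = abs(gx) + abs(gy)
    return edges
```

A flat image produces an all-zero map, while a vertical step edge produces a strong response only in the horizontal gradient.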

Local Thresholding
Use a small kernel (shown in gray) to scan the whole edge map row by row.
In the bigger window surrounding the kernel, check the background type: “Clear” or “Noisy”.
For a Clear background, determine the local threshold from the low part of the edge strength histogram in the bigger window; for a Noisy background, determine it from the high part.
[Figure: (a) Concentric kernel and window: a kernel of size h inside a window of size 3h. (b) A window on the multi-line text area and the horizontal projection in it. (c) Local threshold selection: histogram of count versus edge strength, from 0 to MAX, divided into a low part and a high part.]
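The scan described above might be sketched as follows. The slide does not give the Clear/Noisy test or the exact threshold rule, so this sketch fills both in with stated assumptions: a window counts as Noisy when more than `noisy_ratio` of its pixels carry an edge response, and the threshold is a fixed fraction of the strongest edge in the window (`low_frac` for Clear, `high_frac` for Noisy). All function and parameter names are hypothetical.

```python
def window_values(edge_map, y0, x0, h):
    """Edge strengths in the 3h x 3h window around the h x h kernel at (y0, x0)."""
    rows, cols = len(edge_map), len(edge_map[0])
    return [edge_map[y][x]
            for y in range(max(0, y0 - h), min(rows, y0 + 2 * h))
            for x in range(max(0, x0 - h), min(cols, x0 + 2 * h))]


def select_threshold(vals, noisy_ratio=0.5, low_frac=0.25, high_frac=0.75):
    """Return (threshold, background type) for one window. All rules are assumptions."""
    peak = max(vals)
    if peak == 0:
        return 0, "clear"
    active = sum(1 for v in vals if v > 0) / len(vals)
    if active > noisy_ratio:
        return high_frac * peak, "noisy"   # threshold in the high part
    return low_frac * peak, "clear"        # threshold in the low part


def local_threshold_map(edge_map, h=2):
    """Scan the edge map kernel by kernel, row by row, keeping pixels above threshold."""
    rows, cols = len(edge_map), len(edge_map[0])
    out = [[0] * cols for _ in range(rows)]
    for y0 in range(0, rows, h):
        for x0 in range(0, cols, h):
            t, _ = select_threshold(window_values(edge_map, y0, x0, h))
            for y in range(y0, min(rows, y0 + h)):
                for x in range(x0, min(cols, x0 + h)):
                    if edge_map[y][x] > 0 and edge_map[y][x] >= t:
                        out[y][x] = edge_map[y][x]
    return out
```

The point of the design is that a busy background raises the threshold locally, so weak background edges are dropped there without penalising text on a clean background elsewhere in the frame.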

Thresholding result comparison
[Figure: video image, local thresholding results, global thresholding results]

Text-like area recovery
Labeling: Classify the current edge pixels as “TEXT” or “NON_TEXT” based on their local density.
Recovery/Suppression:
– Bring back the neighboring lower-strength edge pixels of the TEXT edge pixels.
– Suppress the NON_TEXT edge pixels.
[Figure: before recovery, after recovery]
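A minimal sketch of labeling and recovery is given below. It takes the full edge map and the thresholded ("strong") map as inputs. The density measure, the neighborhood size and the cut-off are not given on the slide, so `radius` and `min_count` are assumed parameters, and recovery is assumed to use the 8-neighborhood of each TEXT pixel.

```python
def recover_text_like_areas(edge_map, strong, radius=1, min_count=2):
    """Label strong edge pixels by local density, then recover or suppress."""
    rows, cols = len(strong), len(strong[0])

    def density(y, x):
        # Number of strong edge pixels in the (2*radius+1)^2 neighborhood.
        return sum(1
                   for j in range(max(0, y - radius), min(rows, y + radius + 1))
                   for i in range(max(0, x - radius), min(cols, x + radius + 1))
                   if strong[j][i] > 0)

    # Labeling: TEXT if the local density reaches the assumed cut-off.
    text = [[strong[y][x] > 0 and density(y, x) >= min_count
             for x in range(cols)] for y in range(rows)]

    out = [[0] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            if not text[y][x]:
                continue  # NON_TEXT pixels are never copied, so they are suppressed
            out[y][x] = edge_map[y][x]
            # Recovery: bring back weaker neighbors removed by thresholding.
            for j in range(max(0, y - 1), min(rows, y + 2)):
                for i in range(max(0, x - 1), min(cols, x + 2)):
                    if edge_map[j][i] > 0 and strong[j][i] == 0:
                        out[j][i] = edge_map[j][i]
    return out
```

In this sketch an isolated strong pixel is dropped, while a weak pixel next to a dense run of strong pixels is restored, which is one way to raise the density of text areas.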

Coarse-to-fine Text localization
Projection-based top-down localization, to handle complex text layout.
[Flowchart:]
– Initialization: the whole edge map is the only region in the processing array.
– If the array is empty, terminate. Otherwise, pop the first region from the processing array.
– Horizontal projection: is the region divisible? If yes (Y), add each sub-region to the processing array.
– Vertical projection: is the region divisible? If yes (Y), add each sub-region to the processing array.
– Indivisible regions: check the aspect ratio. If it passes (Y), add the region to the resulting text regions; if not (N), discard false regions.
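The processing-array loop can be sketched as below. This is a simplified reading of the flowchart, not the authors' implementation: the order of the two projections, the rule that a region is divisible when a projection splits or trims it, the `col_gap` tolerance for gaps between characters, and the `min_aspect` test are all assumptions. Regions are (top, bottom, left, right) with exclusive bottom and right.

```python
from collections import deque


def runs(profile, min_gap=1):
    """Return [start, end) runs of non-zero entries, merging gaps below min_gap."""
    raw, start = [], None
    for i, v in enumerate(profile):
        if v > 0 and start is None:
            start = i
        elif v == 0 and start is not None:
            raw.append((start, i))
            start = None
    if start is not None:
        raw.append((start, len(profile)))
    merged = []
    for a, b in raw:
        if merged and a - merged[-1][1] < min_gap:
            merged[-1] = (merged[-1][0], b)
        else:
            merged.append((a, b))
    return merged


def localize(edge_map, col_gap=2, min_aspect=1.0):
    """Split the edge map top-down by projections until regions are indivisible."""
    queue = deque([(0, len(edge_map), 0, len(edge_map[0]))])
    results = []
    while queue:  # terminate when the processing array is empty
        top, bottom, left, right = region = queue.popleft()
        # Horizontal projection: number of edge pixels in each row.
        hproj = [sum(1 for x in range(left, right) if edge_map[y][x] > 0)
                 for y in range(top, bottom)]
        bands = [(top + a, top + b, left, right) for a, b in runs(hproj)]
        if bands != [region]:
            queue.extend(bands)  # divisible (an empty region simply vanishes)
            continue
        # Vertical projection: number of edge pixels in each column.
        vproj = [sum(1 for y in range(top, bottom) if edge_map[y][x] > 0)
                 for x in range(left, right)]
        parts = [(top, bottom, left + a, left + b) for a, b in runs(vproj, col_gap)]
        if parts != [region]:
            queue.extend(parts)
            continue
        # Indivisible region: keep it only if it is wide enough for a text string.
        if (right - left) / (bottom - top) >= min_aspect:
            results.append(region)
    return results
```

Every sub-region pushed back is strictly smaller than its parent, so the loop always terminates. Re-queuing sub-regions rather than accepting them directly is what lets the method handle layouts where a column split exposes new row splits.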

Localization steps
[Figure: panels (1) to (4)]

Experimental results

Performance statistics
Statistics of 10 news videos:
– Processing time per frame: 0.25 s (PIII 1G CPU)
– Detection rate = 93.6%
– Detection accuracy = 87.2%
– Localization accuracy > 90%
