
1 Caption Detection, Localization and Type Recognition in Arabic News Video
Presenter: Ibrahim A. Zedan

2 Agenda
Introduction
Types of Text in Videos
Text Detection Methods
Proposed System
Caption Detection & Localization
Caption Types Identification
Experimental Results
Conclusion and Future Work

3 Introduction

4 News Video Importance The rapid growth of video data creates a pressing demand for efficient video content analysis, indexing, and retrieval systems. News video brings us live pictures of everyday events, plays an essential role in information transmission, and contains a set of semantically independent stories.

5 Video Indexing Systems
Utilize one or a combination of the image, audio, and textual information in the videos. Text present in the video, in particular, can provide important information for indexing and retrieval.

6 Text Extraction Phases

7 Text Detection Challenges
Complex background. Low resolution. The text in a video may be in different languages, and text characteristics vary from language to language.

8 Video Text Types

9 Types of Text in Videos

10 Scene Text Occurs naturally in videos as part of the frame background. Can be of various alignments, sizes, and styles. Influenced by lighting conditions and by distortions caused by the camera's point of view.

11 Graphic Text Artificially embedded into the video during the editing stage. Inserted over the original image background.

12 Caption Text Artificially embedded into the video during the editing stage. Inserted over a more simplified background.

13 Caption Text Importance
Compared with scene text, captions provide highly concise information about the contents of the video. In news video, a caption can often be seen at the beginning of each story unit, so it can serve as an indicator for segmenting the news video into stories.

14 Caption Properties The caption is positioned within the bottom third of the screen. The caption background color is eye-catching. Text character colors are usually clearly distinguishable from the background color.

15 Caption Properties cont.
The caption shape is typically a rectangle. The caption stays on the screen for at least a few seconds. Text lies horizontally.

16 Text Detection Methods

17 Text Detection Methods

18 Color-Based Methods Assume that text pixels in the same region have similar color intensities. Detect text by extracting monotone-color connected components that comply with certain constraints. Have difficulties when text is inserted into a complex background.

19 Edge-Based Methods Based on the observation that text regions are rich in edges because of the high contrast between text and its background. Generate false alarms in texture-rich images. Fail to detect text areas when the text is large or blurred.

20 Texture-Based Methods
Consider text as a special texture and extract it using conventional texture classification methods. Require extensive training. Sensitive to font styles and sizes.

21 Correlation-Based Methods
Utilize a correlation method to decide whether a pixel belongs to a text area.

22 Combining Several Methods
Combines the advantages of each method and can eliminate the post-processing step. In the sequential processing strategy, processing is hierarchical and faster. In the parallel processing strategy, all methods must process the complete images, and the results are then merged using weighting coefficients.

23 Multiple Frame Integration (MFI)
Widely used in caption detection. Frame backgrounds usually contain movement, while the caption position is steady across frames, so MFI handles the background-complexity problem. Applying MFI before the text detection procedure reduces the influence of the image background.

24 Proposed System

25 Caption Nature
(a) Caption Part. (b) Histogram of the Caption Part.

26 Caption Appearance Patterns

27 Flowchart of the Proposed Method

28 Caption Detection & Localization
Based on two observations: captions introduce horizontal lines that represent the caption boundaries, and these horizontal lines are fixed in location over several frames, reflecting the caption duration, which is always long. The system is therefore decomposed into two steps: horizontal lines detection and frames clustering.

29 Horizontal Lines Detection

30 Horizontal Lines Detection cont.
Convert the RGB video frame to gray scale using the HSI color model, and work on the intensity (luminance) image.
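In the HSI model the intensity component is just the unweighted mean of the R, G, and B channels. A minimal NumPy sketch of this conversion step (the function name and test frame are ours, not from the paper):

```python
import numpy as np

def rgb_to_intensity(frame):
    """HSI intensity: the unweighted mean of R, G and B per pixel."""
    return frame.astype(np.float64).mean(axis=2)

# a 1x2 RGB frame: pure red and mid gray
frame = np.array([[[255, 0, 0], [128, 128, 128]]], dtype=np.uint8)
print(rgb_to_intensity(frame))  # [[ 85. 128.]]
```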

31 Horizontal Lines Detection cont.
Consider only the bottom third of the image and compute the Canny edge map of this part.

32 Horizontal Lines Detection cont.
Scan the edge map row by row and count the number of edge points in each row, denoted N_i.

33 Horizontal Lines Detection cont.
A row i is detected as a horizontal line if its edge-point count N_i is a peak satisfying the following two conditions:
N_i > T1 × frame width (1)
minimum peak separation ≥ T2 × frame height (2)
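The row counting and peak test above can be sketched with NumPy on a binary edge map of the bottom-third crop. The peak-selection details (greedy by count, taking the full frame height as three times the crop height) are our assumptions where the slides are silent:

```python
import numpy as np

def detect_horizontal_lines(edge_map, T1=0.33, T2=0.05):
    """edge_map: binary (H, W) array for the bottom third of a frame.
    Returns rows whose edge-point count N_i is a local peak with
    (1) N_i > T1 * frame_width and (2) peaks at least
    T2 * frame_height apart (frame height taken as 3x the crop)."""
    h, w = edge_map.shape
    counts = edge_map.sum(axis=1)                     # N_i per row
    min_sep = T2 * (3 * h)
    candidates = [i for i in range(h)
                  if counts[i] > T1 * w
                  and counts[i] == counts[max(0, i - 1):i + 2].max()]
    lines = []
    # keep strongest peaks first, enforcing the separation constraint
    for i in sorted(candidates, key=lambda i: -counts[i]):
        if all(abs(i - j) >= min_sep for j in lines):
            lines.append(i)
    return sorted(lines)

# synthetic crop: two solid horizontal lines at rows 5 and 40
crop = np.zeros((60, 100), dtype=np.uint8)
crop[5, :] = 1
crop[40, :] = 1
print(detect_horizontal_lines(crop))  # [5, 40]
```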

34 Frames Clustering
Detect the list of horizontal lines for all the video frames. The purpose is to group the frames that have the same number of horizontal lines at similar locations. After establishing all clusters, delete every cluster C_k that does not satisfy the following condition:
N(C_k) ≥ T3 × frame rate (3)
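A simplified sketch of this clustering step and the pruning rule of equation 3. We assume, as a simplification, that only consecutive frames with matching line signatures are grouped; `tol`, the position tolerance in rows, is a hypothetical parameter not given in the slides:

```python
def cluster_frames(lines_per_frame, frame_rate, T3=1, tol=2):
    """Group consecutive frames whose horizontal-line lists have the
    same length and nearby positions (within tol rows), then keep
    only clusters C_k with N(C_k) >= T3 * frame_rate (eq. 3)."""
    clusters = []
    for idx, lines in enumerate(lines_per_frame):
        if clusters:
            ref = clusters[-1]["lines"]
            if len(ref) == len(lines) and all(
                    abs(a - b) <= tol for a, b in zip(ref, lines)):
                clusters[-1]["frames"].append(idx)
                continue
        clusters.append({"lines": list(lines), "frames": [idx]})
    return [c for c in clusters if len(c["frames"]) >= T3 * frame_rate]

# 5 frames share lines (10, 50); one stray frame interrupts, then 2 more
seq = [[10, 50]] * 5 + [[30]] + [[10, 50]] * 2
kept = cluster_frames(seq, frame_rate=3)
print([c["frames"] for c in kept])  # [[0, 1, 2, 3, 4]]
```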

35 Caption Types Identification
Observe the normalized inter-frame edge map difference:
diff(i) = ( Σ_{x=l_i}^{l_j} Σ_{y=1}^{frame width} |E_i(x,y) − E_{i−1}(x,y)| ) / ( Σ_{x=l_i}^{l_j} Σ_{y=1}^{frame width} E_i(x,y) ) (4)
E_i and E_{i−1} denote the Canny edge maps of frame i and frame i−1, respectively. l_i and l_j denote the upper and lower bounds of the caption of interest.
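Equation 4 can be computed directly on binary edge maps. A sketch with NumPy; the band indices and test frames are illustrative, not from the paper:

```python
import numpy as np

def edge_diff(E_prev, E_curr, li, lj):
    """Normalized inter-frame edge-map difference (eq. 4) over the
    caption band rows li..lj (inclusive); edge maps are binary {0,1}."""
    band_curr = E_curr[li:lj + 1].astype(np.int64)
    band_prev = E_prev[li:lj + 1].astype(np.int64)
    num = np.abs(band_curr - band_prev).sum()
    den = band_curr.sum()
    return num / den if den else 0.0

E1 = np.zeros((8, 10), dtype=np.uint8); E1[3, :] = 1   # edge row at x=3
E2 = np.zeros((8, 10), dtype=np.uint8); E2[4, :] = 1   # shifted down one row
print(edge_diff(E1, E2, 2, 6))  # 2.0: every edge pixel changed position
```

A static caption yields diff(i) near 0, while moving text keeps the difference high, which is what the identification algorithm exploits.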

36 Caption Types Identification cont.
Algorithm input: a caption frame cluster and the upper and lower bounds of the caption to be tested. For each frame in the cluster, calculate the normalized inter-frame edge difference using equation 4, then calculate the average edge difference for the cluster. If the average edge difference > T4, the caption type is a horizontal scrolling caption.

37 Caption Types Identification cont.
Otherwise, check each frame difference and create a flag f(i):
f(i) = 1 if diff(i) ≤ T4, 0 if diff(i) > T4 (5)
Move a window of length N over the frames. If all flags in the window are f(i)=1, mark the frame state as static; otherwise mark it as transition. Consecutive frames with the same state are grouped. A transition group surrounded by static groups is either a vertical transition group or an appearance/disappearance group.
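A sketch of the flag-and-window logic of equation 5 and the state grouping, using a short window N=3 for illustration; the exact window semantics are our reading of the slide:

```python
from itertools import groupby

def frame_states(diffs, T4=0.4, N=3):
    """Eq. 5: f(i)=1 when diff(i) <= T4. A window of N consecutive
    flags that are all 1 marks the frame 'static', else 'transition'."""
    f = [1 if d <= T4 else 0 for d in diffs]
    return ["static" if all(f[i:i + N]) else "transition"
            for i in range(len(f) - N + 1)]

diffs = [0.1, 0.1, 0.1, 0.9, 0.8, 0.1, 0.1, 0.1]
states = frame_states(diffs)
# group consecutive equal states: a transition group between static groups
groups = [(s, len(list(g))) for s, g in groupby(states)]
print(groups)  # [('static', 1), ('transition', 4), ('static', 1)]
```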

38 Experimental Results

39 Similar Data Set
AcTiV: designed to assess the performance of different Arabic video-OCR systems.
AcTiV-D: a sub-dataset of non-redundant frames collected from AcTiV-DB, used to measure the performance of single-frame detection/localization methods.

40 Similar Data Set cont. Zayene, O., S. M. Touj, J. Hennebert, R. Ingold, and N. E. Ben Amara, "Semi-automatic news video annotation framework for Arabic text", International Conference on Image Processing Theory, Tools and Applications (IPTA), pp. 1-6, Oct 2014. Zayene, O., J. Hennebert, S. M. Touj, R. Ingold, and N. E. Ben Amara, "A Dataset for Arabic Text Detection, Tracking and Recognition in News Videos - AcTiV", International Conference on Document Analysis and Recognition (ICDAR), August 2015.

41 Manual Caption Marker Tool

42 Our Data Set vs AcTiV-D

43 Our Data Set cont.
Source | Dimensions (width×height) | No. of frames | Available caption types
Egyptian channel 1 | 600×480 | 4402 | Vertical scrolling and static
Nile News | 534×480 | 6564 | Horizontal and vertical scrolling
Al Jazeera | 316×240 | 7889 | Horizontal scrolling and static

44 Caption Detection / Localization Results
The parameter values are empirically determined: T1=0.33, T2=0.05, T3=1, T4=0.4, N=10. The proposed system achieves an accuracy of 0.975 (97.5%), an insertion error of 0.007, and a deletion error of 0.018.
Accuracy = N_ST / N_G (6)
insertion error = N_SF / N_S (7)
deletion error = N_SM / N_S (8)
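The slides do not spell out the counts in equations 6-8; a plausible reading, which is our assumption, is N_G = ground-truth captions, N_S = detected captions, N_ST = correct detections, N_SF = false detections, and N_SM = missed captions. Under that reading the metrics are simple ratios:

```python
def accuracy(n_st, n_g):
    """Eq. 6: correctly detected captions over ground-truth captions."""
    return n_st / n_g

def insertion_error(n_sf, n_s):
    """Eq. 7: falsely detected captions over detected captions."""
    return n_sf / n_s

def deletion_error(n_sm, n_s):
    """Eq. 8: missed captions over detected captions."""
    return n_sm / n_s

# hypothetical counts, not the paper's data
print(accuracy(39, 40))         # 0.975
print(insertion_error(2, 200))  # 0.01
print(deletion_error(3, 200))   # 0.015
```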

45 Caption Types Identification Static Caption

46 Caption Types Identification Horizontal Scrolling Caption

47 Caption Types Identification Vertical Scrolling Caption

48 Conclusion and Future Work

49 Conclusion
Addressing the problem of translucent background captions. Dealing with news videos with multiple captions. Dealing with different patterns of appearance and disappearance of captions in news video. Detecting the caption types.

50 Future Work Integrating this work into a news video indexing/summarization system. Better handling the problem of text baselines being detected as horizontal lines of the caption boundaries. Adding a caption verification component. Adding a caption localization component in the x-coordinate.

51 Cite this paper as Zedan, I.A., Elsayed, K.M., Emary, E.: Caption detection, localization and type recognition in Arabic news video. In: Proceedings of the 10th International Conference on Informatics and Systems (INFOS 2016), pp. 114–120, Cairo, Egypt, 9–11 May 2016

52 Questions?

