1. Problem Many archived two-sided manuscript documents suffer from bleed-through; bleed-through can be effectively removed offline using image-processing algorithms.


1 1. Problem
Many archived two-sided manuscript documents suffer from bleed-through.
Bleed-through can be effectively removed offline using image-processing algorithms.
A remotely located researcher may want to access both the original and the corrected versions of a document.
We want to avoid sending the document twice, since the two versions are very similar.
(Figure panels: Recto, Verso)

2 3. Algorithm Details: Registration
We assume that the continuous recto and verso image coordinate frames are related by a six-parameter affine transformation.
We search for the parameter vector that gives the best match, in the least-squares sense, between the recto and the transformed flipped verso.
We take the flipped verso, transformed with this best-fitting parameter vector, as the registered verso image.
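The slide states the criterion but not the equations. In our own notation (hypothetical, not necessarily the authors'), write the affine map as x' = A x + b, where the four entries of A and the two entries of b form the six-parameter vector p, and choose p to minimise E(p) = sum over x of [r(x) - v_f(A x + b)]^2, with r the recto and v_f the flipped verso. The NumPy sketch below evaluates that cost with bilinear resampling. To keep it short, the search is only a brute-force grid over integer translations with A held at the identity; a full implementation would refine all six parameters, for example with a nonlinear least-squares optimiser. All function names are hypothetical.

```python
import numpy as np

def flip_verso(verso: np.ndarray) -> np.ndarray:
    """Mirror the verso scan left-right so it lies in the recto's orientation."""
    return verso[:, ::-1]

def warp_affine(img: np.ndarray, A: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Sample img at A @ (row, col) + b for every output pixel.

    Bilinear interpolation; coordinates outside the image are clamped to the edge.
    """
    h, w = img.shape
    rows, cols = np.mgrid[0:h, 0:w].astype(float)
    src_r = np.clip(A[0, 0] * rows + A[0, 1] * cols + b[0], 0, h - 1)
    src_c = np.clip(A[1, 0] * rows + A[1, 1] * cols + b[1], 0, w - 1)
    r0 = np.floor(src_r).astype(int)
    c0 = np.floor(src_c).astype(int)
    r1 = np.minimum(r0 + 1, h - 1)
    c1 = np.minimum(c0 + 1, w - 1)
    fr, fc = src_r - r0, src_c - c0
    top = (1 - fc) * img[r0, c0] + fc * img[r0, c1]
    bottom = (1 - fc) * img[r1, c0] + fc * img[r1, c1]
    return (1 - fr) * top + fr * bottom

def ssd_cost(params, recto: np.ndarray, verso_flipped: np.ndarray) -> float:
    """Least-squares mismatch between the recto and the warped flipped verso.

    params = (a11, a12, a21, a22, b1, b2): the six affine parameters.
    """
    A = np.array([[params[0], params[1]], [params[2], params[3]]])
    b = np.array([params[4], params[5]])
    diff = recto - warp_affine(verso_flipped, A, b)
    return float(np.sum(diff ** 2))

def register_translation_grid(recto, verso_flipped, max_shift=3):
    """Simplified search: A fixed to the identity, integer translations only."""
    best_cost, best_params = None, None
    for dr in range(-max_shift, max_shift + 1):
        for dc in range(-max_shift, max_shift + 1):
            params = (1.0, 0.0, 0.0, 1.0, float(dr), float(dc))
            cost = ssd_cost(params, recto, verso_flipped)
            if best_cost is None or cost < best_cost:
                best_cost, best_params = cost, params
    return best_params
```

With a synthetic "verso" that is just a shifted and mirrored copy of the recto, the grid search recovers the shift exactly; on real scans the two sides differ, so only the ink that bleeds through drives the match.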

3 4. Joint Compression
Based on existing standards:
The original, uncorrected image is compressed with a standard, efficient compression scheme such as JPEG or JPEG 2000.
The segmentation map is compressed with an efficient bilevel compression scheme such as JBIG or JBIG2.
Additional information for inpainting is transmitted as side information.
(Figure labels: 4.6 Mbit; 131 kbit)
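The point of this scheme is that a receiver holding the decoded original needs only the segmentation map and the inpainting side information to rebuild the corrected version. The sketch below shows that receiver-side step under simplifying assumptions: the map is reduced to a single bilevel R2 mask and packed raw at 1 bit per pixel with `np.packbits`, as a stand-in for real JBIG or JBIG2 coding, and the side information is one fixed fill value. Function names are hypothetical; no JPEG 2000 or JBIG2 library API is implied.

```python
import numpy as np

def pack_map(r2_mask: np.ndarray):
    """Pack the bilevel R2 map at 1 bit/pixel (stand-in for JBIG/JBIG2 coding)."""
    return np.packbits(r2_mask.astype(np.uint8), axis=None), r2_mask.shape

def unpack_map(packed: np.ndarray, shape) -> np.ndarray:
    """Recover the boolean R2 mask from its packed bits."""
    n = shape[0] * shape[1]
    return np.unpackbits(packed, count=n).reshape(shape).astype(bool)

def reconstruct_corrected(decoded_original, packed, shape, fill_value):
    """Rebuild the corrected image from the original plus side information.

    Pixels flagged as bleed-through are overwritten with fill_value;
    every other pixel is taken unchanged from the decoded original.
    """
    mask = unpack_map(packed, shape)
    corrected = decoded_original.astype(float).copy()
    corrected[mask] = fill_value
    return corrected
```

Because the corrected image differs from the original only at R2 pixels, the extra payload scales with the size of the compressed map rather than with the size of a second full image.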

4 2. Bleed-through Removal: Model
We assume the existence of underlying recto and verso images without bleed-through. Each consists of the background with the writing superimposed.
These ideal recto and verso images are combined in some way to produce the observed recto and verso images, which are corrupted with bleed-through.
In general, the scanned recto and verso images (with bleed-through) will not be aligned.
(Figure: recto and flipped verso images superimposed)
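The slide deliberately leaves the combination "in some way" unspecified. Purely as a hypothetical example, useful for generating test data, one could assume grey levels in [0, 1] with 1 as white paper, and let each side's ink show through the other at a strength `alpha`, with the darker value winning. This model is our assumption, not the presentation's, and it ignores the misalignment between the two scans noted above.

```python
import numpy as np

def flip(img: np.ndarray) -> np.ndarray:
    """Mirror left-right: the other side's ink appears reversed."""
    return img[:, ::-1]

def simulate_bleed_through(ideal_recto, ideal_verso, alpha=0.3):
    """Hypothetical bleed-through model (intensities in [0, 1], 1 = white).

    Ink on one side darkens the opposite side by a fraction alpha; where a
    side has its own, darker writing, that writing is kept.
    """
    ghost_on_recto = 1.0 - alpha * (1.0 - flip(ideal_verso))
    ghost_on_verso = 1.0 - alpha * (1.0 - flip(ideal_recto))
    observed_recto = np.minimum(ideal_recto, ghost_on_recto)
    observed_verso = np.minimum(ideal_verso, ghost_on_verso)
    return observed_recto, observed_verso
```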

5 Segmentation
We segment each side of the document into the four regions R1 to R4. Of these, it is most important to identify region R2, ‘bleed-through only’, correctly. If we miss parts of R2, bleed-through will remain. If the label R2 is wrongly assigned to parts of R1, ‘foreground only’, or R4, ‘foreground and bleed-through’, then parts of the desired writing will be erased.
1. We first identify points that can be considered definitely background (R3), because they are lighter than a certain threshold.
2. We then identify points that can be considered foreground (R1), because they are darker than the corresponding points on the other side.
3. Of the remaining points, those whose correlation between the two sides exceeds a correlation threshold are deemed to be bleed-through (R2). The rest are assigned to R4.
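The three labelling steps can be sketched in NumPy as follows, assuming grey levels in [0, 1] with 1 as white paper and a verso that has already been flipped and registered to the recto. The slide does not say how the correlation between the two sides is measured; here it is a Pearson correlation over a sliding (2k+1) x (2k+1) window. The thresholds `bg_thresh` and `corr_thresh`, the window half-width `k`, and all function names are hypothetical placeholders.

```python
import numpy as np

# Region labels: foreground only, bleed-through only, background, overlap.
R1, R2, R3, R4 = 1, 2, 3, 4

def box_mean(img: np.ndarray, k: int) -> np.ndarray:
    """Mean over a (2k+1)x(2k+1) window, edge-padded, via an integral image."""
    p = np.pad(img, k, mode="edge")
    s = np.pad(np.cumsum(np.cumsum(p, axis=0), axis=1), ((1, 0), (1, 0)))
    n = 2 * k + 1
    total = s[n:, n:] - s[:-n, n:] - s[n:, :-n] + s[:-n, :-n]
    return total / (n * n)

def local_correlation(a, b, k=2, eps=1e-9):
    """Windowed Pearson correlation between two aligned images."""
    ma, mb = box_mean(a, k), box_mean(b, k)
    cov = box_mean(a * b, k) - ma * mb
    va = np.maximum(box_mean(a * a, k) - ma ** 2, 0.0)
    vb = np.maximum(box_mean(b * b, k) - mb ** 2, 0.0)
    return cov / np.sqrt(va * vb + eps)  # eps keeps flat windows near 0

def segment(recto, verso_reg, bg_thresh=0.8, corr_thresh=0.5, k=2):
    """Label each recto pixel R1..R4 following the three steps on the slide."""
    labels = np.full(recto.shape, R4, dtype=np.uint8)
    undecided = np.ones(recto.shape, dtype=bool)
    # Step 1: lighter than the threshold -> definitely background.
    bg = recto > bg_thresh
    labels[bg] = R3
    undecided &= ~bg
    # Step 2: darker than the corresponding verso point -> foreground.
    fg = undecided & (recto < verso_reg)
    labels[fg] = R1
    undecided &= ~fg
    # Step 3: strongly correlated with the other side -> bleed-through only.
    corr = local_correlation(recto, verso_reg, k)
    labels[undecided & (corr > corr_thresh)] = R2
    # Anything still undecided keeps the default label R4.
    return labels
```

Since erasing R2 is the irreversible step, a conservative (higher) `corr_thresh` trades some leftover bleed-through for a lower risk of erasing genuine writing, matching the priority stated on the slide.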

6 (Figure panels: original with bleed-through; with bleed-through removal)

7 Algorithm
Registration: alignment of the recto and flipped verso.
Segmentation: four regions
1. R1: foreground only
2. R2: bleed-through only
3. R3: background
4. R4: foreground and bleed-through overlap
Inpainting: region R2 is filled in with an estimate of the background.
(Figures: recto and flipped verso images, superimposed after registration; illustration of the four types of regions; inpainting applied to circled region)

8 Inpainting
Points labelled R2, ‘bleed-through’, are replaced by suitable nearby points from the background region R3. In the initial work, R2 points were filled with a fixed value.
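A minimal sketch of both variants, assuming a label image coded 2 for R2 and 3 for R3 and grey levels with 1 as white. "Suitable nearby points" is interpreted here, as an assumption, as the mean of the R3 pixels in a square window of half-width `radius`, falling back to the global background mean when the window contains none; passing `fixed_value` reproduces the fixed-value fill. The function name and parameters are hypothetical.

```python
import numpy as np

R2, R3 = 2, 3  # bleed-through only, background

def inpaint_bleed_through(img, labels, radius=3, fixed_value=None):
    """Replace R2 pixels with a local background estimate (or a fixed value).

    Estimates are read from the input image, never from already-filled
    pixels, so the result does not depend on the order of processing.
    """
    out = img.astype(float).copy()
    h, w = img.shape
    bg = labels == R3
    global_bg = img[bg].mean() if bg.any() else 1.0  # assume white paper
    for r, c in zip(*np.nonzero(labels == R2)):
        if fixed_value is not None:
            out[r, c] = fixed_value
            continue
        r0, r1 = max(r - radius, 0), min(r + radius + 1, h)
        c0, c1 = max(c - radius, 0), min(c + radius + 1, w)
        window_bg = bg[r0:r1, c0:c1]
        if window_bg.any():
            out[r, c] = img[r0:r1, c0:c1][window_bg].mean()
        else:
            out[r, c] = global_bg
    return out
```

A local mean follows slow variations in paper tone (staining, uneven illumination) that a single fixed value cannot, which is one reason to prefer nearby background points.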

9 5. Conclusion
Bleed-through can be effectively removed by jointly processing the recto and verso sides of a document.
More complex bleed-through removal algorithms can be run on the server side, with the result transmitted to the remote user.
It is not necessary to transmit the original and corrected versions separately to a user who wishes to see both.
All elements can be incorporated into JPEG 2000.
More work is needed on the segmentation and inpainting aspects of the algorithm.
