Published by Travis Croxton. Modified over 10 years ago.
1
We Make Imaging Work. Everywhere. Copyright 2004, LizardTech, Inc. All rights reserved
Introduction to DjVu for document compression, archival and web delivery
Luc Vincent
lvincent@lizardtech.com
2
About LizardTech
50+ employees in Seattle
Offices in the UK, Japan, Spain
Subsidiary of Celartem Technology, Inc.
–Public company on NASDAQ Japan (Hercules)
–3 recent U.S. acquisitions: Extensis, Diamond Soft, LizardTech
–Solid financials: $200+M market cap, $30+M revenue, $38+M cash
1992 LizardTech spins off from Los Alamos National Labs
1995 Ships first commercial wavelet compressor, based on MrSID technology
2000 Acquires DjVu technology from AT&T
2002 Releases ExpressServer
2003 LizardTech acquired by Celartem
3
Genesis of DjVu Technology
A realization that more than 95% of the world’s knowledge still exists only on paper
The desire to efficiently publish legacy academic papers (and more) on the web
The need for compression: raw bitonal scans are ~1MB per page, color scans are 25-100MB
–Disk is cheap, but large file sizes are a barrier to web access
The inefficiency of existing technologies and file formats (TIFF, JPEG, PDF, etc.) in dealing with scanned documents, especially scanned color documents
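The raw sizes quoted above can be reproduced with a short calculation. This is only a sketch: the slide does not state a page size or scan resolution, so a US Letter page (8.5 x 11 inches) scanned at 300 and 600 dpi is assumed here, and `raw_scan_bytes` is a hypothetical helper name.

```python
# Assumption: US Letter page; the slide does not give a page size.
PAGE_WIDTH_IN = 8.5
PAGE_HEIGHT_IN = 11.0


def raw_scan_bytes(dpi: int, bits_per_pixel: int) -> int:
    """Uncompressed size in bytes of one scanned page."""
    width_px = round(PAGE_WIDTH_IN * dpi)
    height_px = round(PAGE_HEIGHT_IN * dpi)
    return width_px * height_px * bits_per_pixel // 8


# Assumed resolutions: 300 dpi and 600 dpi.
sizes = {
    "bitonal, 300 dpi": raw_scan_bytes(300, 1),   # 1 bit per pixel
    "color, 300 dpi": raw_scan_bytes(300, 24),    # 24-bit RGB
    "color, 600 dpi": raw_scan_bytes(600, 24),
}

for label, size in sizes.items():
    print(f"{label}: {size / 1e6:.1f} MB")
```

Under these assumptions the bitonal page comes to about 1.1 MB and the color pages to about 25 MB and 101 MB, which is consistent with the ~1MB and 25-100MB figures on the slide.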
4
JPEG for Color Document Compression?
Page size still between 400KB and 2MB
–Impractical for use over narrowband connections
Unpleasant ringing artifacts around text
Text not normally separated from the image
–Cannot be indexed or searched
Viewing requires vast amounts of RAM
–Impractical for use on wireless phones and PDAs
Cumbersome page navigation in web browsers
–Slow and inconvenient zooming and panning
No multi-page document support
–Encapsulating in a container format like PDF only adds layers of inefficiency
JPEG2000 is an improvement, however:
–Only a specification. Implementations vary in quality
–Limited viewing support
–Slow decoding and rendering
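To put the narrowband point in numbers, the sketch below estimates how long one JPEG page of 400KB to 2MB takes to download. The link speed is an assumption (a 56 kbit/s dial-up modem, not mentioned on the slide), protocol overhead is ignored, and `transfer_seconds` is a hypothetical helper.

```python
# Assumption: 56 kbit/s dial-up link; the slide only says "narrowband".
MODEM_BITS_PER_SECOND = 56_000


def transfer_seconds(size_bytes: int, link_bits_per_second: int) -> float:
    """Ideal transfer time for a file, ignoring protocol overhead and latency."""
    return size_bytes * 8 / link_bits_per_second


# Page sizes from the slide, taking 1 KB = 1000 bytes.
for size_bytes in (400_000, 2_000_000):
    seconds = transfer_seconds(size_bytes, MODEM_BITS_PER_SECOND)
    print(f"{size_bytes // 1000} KB page: about {seconds:.0f} s")
```

On that assumed link a single page takes roughly one to five minutes, which is why per-page sizes in this range are a problem for web delivery.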
5
Layered Representations (MRC)
Foreground (text, graphics, etc.) + Background (tint, images, paper texture, etc.) = Full Page
Paradigm: segment into layers and use appropriate compression/resolution for each
6
DjVu Approach to Compression
Highly accurate FG/BG segmentation
Background: IW44 wavelet compression
–Compress in full color
–Low resolution (~100dpi) usually sufficient
Foreground: JB2 symbol-based compression
–Keep at high resolution
–Palettize colors, or keep color information at very low resolution
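The layered model can be illustrated by the way a viewer might recombine the layers. The following is a minimal sketch, not the DjVu decoder: it assumes a full-resolution bitonal mask, a low-resolution foreground color layer, and a low-resolution background, all stored as nested lists, with nearest-neighbour upsampling. The function name `compose_page` and the scale parameters are hypothetical.

```python
from typing import List, Tuple

Color = Tuple[int, int, int]

WHITE: Color = (255, 255, 255)
GREY: Color = (200, 200, 200)
BLACK: Color = (0, 0, 0)


def compose_page(
    mask: List[List[int]],
    fg_colors: List[List[Color]],
    fg_scale: int,
    background: List[List[Color]],
    bg_scale: int,
) -> List[List[Color]]:
    """Rebuild a full-resolution page from a mask and two low-resolution layers.

    Where the mask is 1 the pixel takes the foreground color, otherwise the
    background color. Both color layers are upsampled by integer division
    (nearest neighbour), which is an assumption made for brevity.
    """
    page = []
    for y, mask_row in enumerate(mask):
        row = []
        for x, is_foreground in enumerate(mask_row):
            if is_foreground:
                row.append(fg_colors[y // fg_scale][x // fg_scale])
            else:
                row.append(background[y // bg_scale][x // bg_scale])
        page.append(row)
    return page


# 4x4 mask at full resolution, 2x2 background, 1x1 foreground color layer.
mask = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
    [1, 0, 0, 1],
]
background = [[WHITE, GREY], [GREY, WHITE]]
fg_colors = [[BLACK]]

page = compose_page(mask, fg_colors, 4, background, 2)
for row in page:
    print(row)
```

Only the mask needs full resolution in this arrangement; the two color layers carry far fewer samples, which is where the size savings of the layered approach come from.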
7
What is DjVu (“Déjà Vu”)?
An open file format for managing, storing and exchanging color documents, scanned or electronic, without compromise:
–Dramatically smaller file size: up to 1,000:1 compression
–Superior image quality and legibility: full color, full text resolution
–Enables extremely reliable OCR from any document
–True portability: no fonts, no problems
–Highly optimized Web viewing
–Established open standard
A set of advanced algorithms working together for highly efficient document compression and delivery
A technology platform, DocumentExpress, for creating these optimized DjVu files from scans or electronic originals
8
Smallest Scanned Color Documents
Typical file sizes for a 400dpi 24-bit color magazine page
Up to 1,000x smaller than TIFF
5x to 100x smaller than JPEGs or PDFs
Typical full color high-res DjVu page: 50KB
–Same as typical web page
–Same as TIFF G4 black-and-white scan
Benefits:
Scan, store & distribute complex color documents without ever worrying about file size or quality
When in doubt, scan in color:
–Cut scanning costs
–Boost efficiency
–Increase customer satisfaction
9
DjVu Foreground Compression
Principles:
Leverage repeating shapes (characters) in and across pages
Compress using
–Shape dictionaries
–Lists of positions
–Shape cross-coding and other advanced techniques
State-of-the-art JB2 token-based compression
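The idea of a shape dictionary plus a list of positions can be sketched in a few lines. This is an illustration of the principle only, not JB2 itself: it assumes glyphs have already been segmented, matches shapes only when their bitmaps are exactly identical, and skips cross-coding and entropy coding entirely. The names `encode_symbols` and `decode_symbols` are hypothetical.

```python
from typing import Dict, List, Tuple

Bitmap = Tuple[str, ...]           # rows of '0'/'1' characters
Glyph = Tuple[Bitmap, int, int]    # bitmap plus its (x, y) position on the page
Placement = Tuple[int, int, int]   # dictionary index plus (x, y)


def encode_symbols(glyphs: List[Glyph]) -> Tuple[List[Bitmap], List[Placement]]:
    """Store each distinct shape once and record where every instance goes."""
    dictionary: List[Bitmap] = []
    index_of: Dict[Bitmap, int] = {}
    placements: List[Placement] = []
    for bitmap, x, y in glyphs:
        if bitmap not in index_of:          # exact match only (assumption)
            index_of[bitmap] = len(dictionary)
            dictionary.append(bitmap)
        placements.append((index_of[bitmap], x, y))
    return dictionary, placements


def decode_symbols(dictionary: List[Bitmap], placements: List[Placement]) -> List[Glyph]:
    """Expand the placements back into positioned bitmaps."""
    return [(dictionary[index], x, y) for index, x, y in placements]


# Two tiny made-up shapes standing in for the letters "a" and "b".
SHAPE_A: Bitmap = ("010", "101", "111")
SHAPE_B: Bitmap = ("100", "110", "110")

glyphs = [(SHAPE_A, 0, 0), (SHAPE_B, 4, 0), (SHAPE_A, 8, 0)]
dictionary, placements = encode_symbols(glyphs)

print(len(dictionary), "shapes for", len(glyphs), "glyphs")
print(placements)
```

The more often a shape repeats, in a page or across pages, the more the dictionary pays off, because each repeat costs only an index and a position instead of a full bitmap.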
10
DjVu JB2 Bitonal Compression
JB2 is 3x to 10x smaller than TIFF G4
–On scanned books, up to 10x smaller
–Worst case: 1.5x to 2x smaller
Typical per-page size: 5-10 KB
Highest bitonal document compression available
Typical size for 400dpi bitonal scan or foreground layer
11
Segmentation Enables Highest OCR Accuracy from Color Scans
Benefits:
High quality indexing
Search & Retrieve
–Text on tint
–Inverted text
–Text in images or graphics
[Figure: two OCR results for the same scanned Corporate Card account statement: a clean, legible transcription and a heavily garbled one labeled ALTERNATIVE TECHNOLOGY.]
12
DjVu “Hidden Text” Layer
Novel paradigm:
–Keep scanned documents as images
–Use OCR for searching, indexing and copy/paste
–Keep OCR as “hidden DjVu layer”
Various free tools can extract and manipulate the hidden text layer as XML or TXT
Free DjVu IFilter integrates keyword search into Windows
Benefits:
DjVu files are keyword-searchable and can be indexed
Easy integration into Document Management Systems
Out-of-the-box integration with Windows Indexing Service (SharePoint, etc.)
13
What About Electronic Documents?
PDF to DjVu: 2x to 100x size reduction
Virtual Printer Driver technology can be used to create DjVu files from any application
Conversion can preserve text without the need for OCR
Benefits:
Create snapshots of web pages
Convert and distribute those huge PDFs!
Create truly portable files (no fonts required)
Single format for interchange of scanned & electronic documents
14
Highly Optimized DjVu Viewing & Browsing
Free browser plugins
–900KB download, auto-installation
Plugin-free viewing options available (server-based, Java, etc.)
Designed to minimize any viewing delays:
–Progressive page rendering
–Instant access to pages in any sequence
–Pages are pre-fetched and pre-decoded while you read
–Real-time zooming and panning
Benefits:
Instantly access your documents from any platform, over any network
Browse DjVu as easily as HTML
15
DjVu: a Web-Friendly Format
Hyperlink support
ActiveX-based plugin can be integrated into Windows applications
DjVu documents can be embedded in web pages (like JPEG images)
Numerous ways to customize how DjVu documents are viewed by end users
Benefits:
Enables seamless switching between HTML and DjVu browsing
Outstanding end-user experience
Highly customizable
16
DjVu: an Established Open Standard
Published format specification
Open-source tools for viewing and manipulating existing DjVu files
Already millions of users worldwide, ~300,000 monthly plugin downloads
Benefits:
No worry about longevity of the format: it has already reached critical mass and support
–DjVu is great for archival
Freely view and manipulate DjVu files on all common platforms
Already out-of-the-box DjVu support in 4 major Linux distributions
Download it at: http://djvu.sourceforge.net
DjVuLibre package available under GNU GPL
17
Companies/Products Supporting DjVu
18
Some Happy DjVu Users
USGS
–1M+ color and B&W documents/drawings
–Small storage requirements
–Improved web delivery
–Fast viewing
–“No Plug-in” viewing
Internet Archive
–Scan out-of-copyright books
–100k books in DjVu by end of 2004
–Books can be accessed and printed anywhere in the world with “bookmobile”
Garfield Court, OH
–95% paperless court
–Lawyers can access material securely from anywhere
–Tamper-proof files
–Records searchable
–Huge productivity gains by eliminating manual procedures
Sears
–5M service manuals for 12k technicians available via GPRS
–Technicians take laptop instead of truckload of manuals
–Improved productivity
–Improved service quality & speed
–Lower cost
19
Why Choose DjVu?
DjVu truly enables scan-to-web, for any document type
–No other solution comes close
Why choose DjVu over JPEG for color documents?
–JPEG files are 10x to 100x larger
–JPEG has no document structure: single page, no OCR
Why choose DjVu over PDF?
–PDF is best used for printing, not for scanning and web publishing
–PDF files are typically 5 to 100 times larger
–Scanned PDFs cannot match DjVu quality even with much larger files
–PDF viewers are sluggish
Why not stick to TIFF Group 4 for black-and-white documents?
–DjVu is up to 10x smaller
–TIFF plugins are slow and awkward
–TIFF does not support hidden text, hyperlinks, cgi-bin arguments, etc.
20
EXTRA SLIDES
21
DjVu Background Compression
State-of-the-art IW44 wavelet compressor
Surpasses JPEG and JPEG2000 in compression, speed, and features
Progressive compression/display, with in-place updates
–Browser plugin displays successive refinements (typically 3 to 5)
–Enables faster browsing over slow connections
Multiplication-free decoding
–3x to 10x faster decoding/viewing than JPEG2000
Maskable, so no bits are wasted encoding background regions covered by foreground pixels
–Typically reduces file size by ~30%
Decoder supports “half-decoding” into ~2MB sparse matrix without performance hit
–Enables viewing in memory-constrained environments (PDAs, etc.)
–Still enables highly efficient on-the-fly zooming and panning
* Features unique to IW44
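To show what multiplication-free wavelet decoding can look like, here is a minimal sketch of one level of an integer lifting transform that uses only additions, subtractions, and bit shifts. It does not reproduce IW44's actual filter, which the slide does not describe; it substitutes the simplest integer lifting step (often called the S-transform) as a stand-in, and the function names are hypothetical.

```python
from typing import List, Tuple


def forward_step(samples: List[int]) -> Tuple[List[int], List[int]]:
    """Split an even-length signal into coarse averages and details.

    Uses only subtraction, addition, and a right shift (no multiplication).
    """
    averages, details = [], []
    for i in range(0, len(samples), 2):
        a, b = samples[i], samples[i + 1]
        d = a - b              # detail coefficient
        s = b + (d >> 1)       # floor of the pair average
        averages.append(s)
        details.append(d)
    return averages, details


def inverse_step(averages: List[int], details: List[int]) -> List[int]:
    """Exactly undo forward_step, again without any multiplication."""
    samples = []
    for s, d in zip(averages, details):
        b = s - (d >> 1)
        a = d + b
        samples.extend((a, b))
    return samples


signal = [10, 12, 200, 100]
averages, details = forward_step(signal)
restored = inverse_step(averages, details)

print("averages:", averages)   # coarse version, usable as a first preview
print("details: ", details)    # refinement needed for exact reconstruction
print("restored:", restored)
```

The split into averages and details also hints at how progressive display works in general: a viewer can show the coarse values first and refine the picture as the detail coefficients arrive.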