21 October 2000 MathML & Math on the Web Illinois D-Lib Testbed: Technologies for Converting Legacy Mathematics for Display on the Web Timothy W. Cole.

Presentation transcript:

21 October 2000
MathML & Math on the Web
Illinois D-Lib Testbed: Technologies for Converting Legacy Mathematics for Display on the Web
Timothy W. Cole, Thomas G. Habing, William H. Mischo
Grainger Engineering Library Information Center
University of Illinois at Urbana-Champaign

Project Background & Objectives
Funded under DLI-I (NSF, DARPA, & NASA)
Continued under CNRI's D-Lib Test Suite
Objectives:
– Construct Large-Scale, Multipublisher, Markup-Based Full-Text Journal Testbed.
– Investigate Processing, Indexing, Normalization, Retrieval, Rendering and Linking.
– Study End-User Searching Behavior and Needs.
Testbed contains 60,000 Articles from 50 Journal Titles
– Received as SGML (various DTDs); converted to XML
– Content & support from AIP, APS, ASCE, IEE, ASM, ACM, Elsevier
– Additional support from IEEE, NRL, NTT Learning Systems

Project Background (cont.)
Accomplishments:
– Process & Retrieve from Multiple Publishers & Heterogeneous DTDs.
– SGML to XML Conversion.
– Metadata Extraction, Representation, Merging.
– Dynamic Linking: Forward/Backward, from/to A & I DBs.
Current Investigations:
– Mathematics Markup & Rendering Issues
– Metadata Harvesting: Replicative & Distributed
– E-Journal Archiving
– Local Resource Resolution
– Asynchronous Searching of Multiple Resources

Converting Legacy Markup to MathML
Goal: Convert publisher-specific XML math markup to standard presentation MathML
– Desired result: can then focus on a single rendering solution
Ground rules:
– Minimize need for human intervention
– Utilize standards-based techniques (e.g., XSLT, JavaScript, DOM)
– Embed MathML in full XML document
– Validate success of conversion based on quality of presentation
– Strive for consistency across MathML viewers
Scope:
– E.g., in 17,000 APS articles, more than 2.3 million instances of math (100,000 of them block-level)

Mathematics Markup Transformations
Identify & translate mathematical character references
Identify & tokenize mathematical content
Recognize & transform mathematical markup (e.g., embellishments, script & limit schemata, etc.)
[Figure: an ISO Math fragment (a, 2, i) shown beside its Presentational MathML equivalent (α, i, 2); the markup itself is missing in the source]
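The tokenization step can be sketched in Python. This is only an illustration under stated assumptions: the slides describe doing this in JavaScript called from XSLT, and that code is not in the transcript. The `tokenize` function and its regular expression are hypothetical, and the classification is deliberately crude (every letter becomes its own `mi`, so a name like "sin" would be split).

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical token pattern: a number, a single letter, or any other
# non-space character. A real converter would need an operator table.
TOKEN_RE = re.compile(r"\s*(?:(\d+(?:\.\d+)?)|([^\W\d_])|(\S))")


def tokenize(text):
    """Split character data into MathML token elements (mn, mi, mo)."""
    tokens = []
    for number, letter, other in TOKEN_RE.findall(text):
        if number:
            tag, value = "mn", number   # numeric literal
        elif letter:
            tag, value = "mi", letter   # identifier (one letter each)
        else:
            tag, value = "mo", other    # everything else is treated as an operator
        element = ET.Element(tag)
        element.text = value
        tokens.append(element)
    return tokens
```

Calling `tokenize("2x + 1")` yields four elements in document order, which can then be appended to a MathML container element.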

Approach & Algorithm
For each XML document:
Identify mathematical nodes (e.g., [element names missing in source])
Recursively apply templates to every child node within mathematical nodes:
– Look up entities & special characters, convert them to appropriate MathML characters & tokenize (JavaScript)
– Tokenize remaining #PCDATA (JavaScript)
– Convert postfix markup to MathML (e.g., [element names missing in source])
– Re-tag one-to-one transformations (e.g., [element names missing in source])
Transformed mathematical nodes ([element name missing in source]) replace original mathematical nodes in document
– Include default namespace attribute
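The steps above can be sketched as a small Python program. It swaps in Python's `xml.etree.ElementTree` for the XSLT and JavaScript the slides describe, so it shows the shape of the algorithm and not the project's implementation. The source element names (`formula`, `sup`, `inf`), the one-entry entity table, and all function names are hypothetical; the original element names are missing from this transcript.

```python
import re
import xml.etree.ElementTree as ET

MATHML_NS = "http://www.w3.org/1998/Math/MathML"
# Hypothetical publisher markup; real DTDs differ from publisher to publisher.
MATH_TAG = "formula"
SCRIPTS = {"sup": "msup", "inf": "msub"}
ENTITIES = {"alpha": "\u03b1"}  # stand-in for a full entity lookup table


def expand_entities(raw):
    """Replace known named entities before parsing; unknown ones are left alone."""
    return re.sub(r"&(\w+);", lambda m: ENTITIES.get(m.group(1), m.group(0)), raw)


def token(ch):
    """Wrap one character in mn, mi, or mo (a deliberately crude classifier)."""
    element = ET.Element("mn" if ch.isdigit() else "mi" if ch.isalpha() else "mo")
    element.text = ch
    return element


def group(items):
    """Return a lone item as is; wrap several (or none) in an mrow."""
    if len(items) == 1:
        return items[0]
    row = ET.Element("mrow")
    row.extend(items)
    return row


def convert(node):
    """Recursively turn the content of a math node into MathML elements."""
    out = [token(c) for c in (node.text or "") if not c.isspace()]
    for child in node:
        if child.tag in SCRIPTS:
            # Postfix markup: the script follows its base, so wrap the previous item.
            wrapper = ET.Element(SCRIPTS[child.tag])
            base = out.pop() if out else ET.Element("mrow")
            wrapper.extend([base, group(convert(child))])
            out.append(wrapper)
        else:
            out.extend(convert(child))
        out.extend(token(c) for c in (child.tail or "") if not c.isspace())
    return out


def transform(root):
    """Replace each math node below root with a math element in the MathML namespace."""
    for parent in list(root.iter()):
        for index, child in enumerate(list(parent)):
            if child.tag == MATH_TAG:
                math = ET.Element("math", {"xmlns": MATHML_NS})
                math.extend(convert(child))
                math.tail = child.tail  # keep the text that followed the formula
                parent[index] = math
    return root
```

A document string would first pass through `expand_entities`, then `ET.fromstring`, then `transform`. The sketch does not handle a document whose root element is itself the math node, and it nests a subscript around a superscript instead of producing `msubsup`.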

Approach & Algorithm (cont.)
Illustrative XSLT: ...
THERE ARE FOUR MORE CASES TO HANDLE!

Remaining Issues
JavaScript from within XSLT
– Rely on MS-specific mechanisms to invoke extension functions
Inconsistent Rendering by MathML Viewers
– Validating against techexplorer, Amaya, Mozilla, MS IE (w/ CSS)
– Incomplete MathML implementations
Ambiguity & Overuse of [element name missing in source]
– Limited impact on appearance
– Verbosity: 60% increase for inline, 15% increase for block
Character / glyph issues
– STIX project / Unicode update will provide some relief
Automated Checking for Errors / Problems
Rendering System Performance
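One form that automated checking could take is a structural test on the generated MathML. The slides do not say what checks the project had in mind, so the following Python sketch is an assumption: it only verifies that a few script, limit, and fraction elements have the number of children the MathML specification requires. The names `ARITY`, `local_name`, and `arity_errors` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Required child counts for some MathML presentation schemata.
ARITY = {
    "mfrac": 2, "mroot": 2,
    "msub": 2, "msup": 2, "msubsup": 3,
    "munder": 2, "mover": 2, "munderover": 3,
}


def local_name(tag):
    """Strip an ElementTree '{namespace}' prefix, if present."""
    return tag.rsplit("}", 1)[-1]


def arity_errors(math):
    """List every element under math whose child count violates ARITY."""
    errors = []
    for element in math.iter():
        name = local_name(element.tag)
        expected = ARITY.get(name)
        if expected is not None and len(element) != expected:
            errors.append(
                f"{name}: expected {expected} children, found {len(element)}"
            )
    return errors
```

A check like this catches malformed output from the conversion, but it says nothing about whether the rendering looks right, which the slides treat as the real measure of success.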

Status
Developing publisher-specific XSLT stylesheets
– See sample transformed issue of Physical Review Letters
XSLT allows us to generate standard MathML from publisher-dependent math markup (received as SGML, converted to XML)
– Moves customization to pre-processing stage
– Allows for single, common rendering solution
– MathML can be rendered in some browsers / tools without the need to style (Mozilla, techexplorer, Mathematica)