DFDL WG Session 3 Mike Beckerle Ascential Software Two note-takers please?

DFDL-WG Session 3
Data Model and Binary Primitives
(The old agenda, from the program. If the previous session's issues still need time, they will be worked on in this session.)
  Discussion of issues with the primitives set of W3C Schema
    identify important gaps (multidimensional arrays, for example)
    identify anomalous semantics
  Discussion of binary mappings
    list mappings and arguments
    expose issues
    propose naming policy/names for mappings
  Discussion of binary mapped types
    list types
    naming policy/names
  Discussion of text mappings
    list mappings and arguments
    expose issues
    propose naming policy/names for mappings
  Discussion of text mapped types
    list types
    naming policy/names

DFDL-WG Session 3 Agenda
  Current Working Issues (Continued from Session 2)
  Data Model and Mapping Primitives

Issues (From Session 2)
  Stored length, references in general
  Choice/unions
    Expression language for discrimination
  Layered translations
    compression, encryption
    IBM data streams (F, FB, VB, VBS)
  Modularity
    How to plug in new transformations?
    Composition properties

Layered Translations
Use case: matrix with dynamic size in a text file:
  Blank lines are ignored
  C-style comments are ignored (equivalent to whitespace)
  The first line contains xdim ydim (whitespace separated)
  Subsequent lines are rows of the 2-d matrix. There must be exactly ydim rows, each containing xdim numbers
  Within each row the values are whitespace separated
  The charset is UTF-8
Requires that we express preprocessing of the input data to handle the C-style comments and blank lines
The preprocessing is not part of the structure of the data

Layered Translations Matrix w/Dynamic Size Example /* obsv3 データ 2003 年 08 月 27 日 佐藤 */ /* gbxx2. 14:02:21 実行時間 8 秒 */ 3 2 /* データはこのラインの後に続く */ /**/ 4 5 /* 見積もり */ 6 /* データの終わり */

Layered Translations Matrix w/Dynamic Size Example

Layered Translations Matrix w/Dynamic Size Example
<element name="xdata" type="double" minOccurs="0" maxOccurs="unbounded"/>

Layered Translations Matrix w/Dynamic Size Example
Underlying transformations:
  Bits to bytes
  Bytes to characters (UTF-8 encoding)
  Removal of blank lines
  Removal of C-style comments
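The layering listed on this slide can be sketched as a chain of small functions, each consuming the output of the one below it. This is only an illustration, not DFDL syntax: the function names and the sample input are hypothetical, and the bits-to-bytes layer is assumed to be handled by the platform.

```python
import re

def bytes_to_chars(raw: bytes) -> str:
    """Layer: bytes -> characters, using the UTF-8 charset named in the use case."""
    return raw.decode("utf-8")

def strip_c_comments(text: str) -> str:
    """Layer: replace each /* ... */ comment with one blank (comments count as whitespace)."""
    return re.sub(r"/\*.*?\*/", " ", text, flags=re.DOTALL)

def drop_blank_lines(text: str) -> str:
    """Layer: drop lines that hold only whitespace."""
    kept = [line for line in text.split("\n") if line.strip()]
    return "\n".join(kept) + "\n" if kept else ""

def preprocess(raw: bytes) -> str:
    """Compose the layers; the matrix structure is parsed only after this step."""
    return drop_blank_lines(strip_c_comments(bytes_to_chars(raw)))

# Hypothetical input in the spirit of the slide's example (not the slide's exact data).
sample = "/* データ */\n3 2\n\n1 2 3 /* 見積もり */\n4 5 6\n".encode("utf-8")
print(repr(preprocess(sample)))
```

Keeping each layer as a separate function mirrors the point of the use case: the preprocessing is separate from, and not part of, the structure of the data.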

Layered Translations Matrix w/Dynamic Size Example
The data now looks like:
Let b = blank, n = newline. The data really is this string of characters:
  3b2n1b2b3bbn4b5bbb6n

References: Matrix w/Dynamic Size Example
DFDL wants to treat mistakes like these as invalid:
  (line structure doesn’t match dimensions)
  or: (too many rows)
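A minimal sketch of the check this slide asks for, assuming the comments and blank lines have already been removed: the first line supplies xdim and ydim, and the remaining lines are validated against them. The function name parse_matrix and its error messages are hypothetical.

```python
from typing import List

def parse_matrix(text: str) -> List[List[float]]:
    """Parse 'xdim ydim' followed by exactly ydim rows of xdim numbers each."""
    lines = text.splitlines()
    if not lines:
        raise ValueError("missing dimension line")
    header = lines[0].split()
    if len(header) != 2:
        raise ValueError("first line must hold exactly xdim and ydim")
    xdim, ydim = int(header[0]), int(header[1])
    rows = [[float(tok) for tok in line.split()] for line in lines[1:]]
    if len(rows) != ydim:
        raise ValueError(f"expected {ydim} rows, found {len(rows)}")
    for number, row in enumerate(rows, start=1):
        if len(row) != xdim:
            raise ValueError(f"row {number} has {len(row)} values, expected {xdim}")
    return rows

# The character string from the slides, with b = blank and n = newline.
text = "3b2n1b2b3bbn4b5bbb6n".replace("b", " ").replace("n", "\n")
print(parse_matrix(text))
```

Both mistakes named on the slide, a row whose length disagrees with xdim and a row count that disagrees with ydim, raise ValueError here; how DFDL itself would express the reference from the data back to the stored dimensions is the open issue under discussion.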

References: Matrix w/Dynamic Size Example
<element name="xdata" type="double" minOccurs="0" maxOccurs="unbounded">

References: Matrix w/Dynamic Size Example
…
<element name="xdata" type="double" minOccurs="0" maxOccurs="unbounded">

Layered Translations Matrix w/Dynamic Size Example
Now add in the layered transformations of the streams…
<rep charset="UTF-8" container="byteStream">

Modularity
Consider this example.
  It connects the definition of binaryInt all the way back to how bits are turned into bytes
  This over-specification limits reusability

Modularity
Issue: Why should binaryInt care about where the bytes come from?
  They could come from a binary file
  They could come from conversion of uuencoded text back into binary data
  They could come from decompression
DFDL-defined types want to be parameterized by where they get their underlying data
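One way to picture this parameterization, as a sketch rather than a DFDL proposal: the integer reader below accepts any byte source and never learns how the bytes were produced. The name read_binary_int, the 4-byte width, and the big-endian default are assumptions made for the illustration.

```python
import binascii
import io
import struct
import zlib
from typing import BinaryIO

def read_binary_int(source: BinaryIO, byte_order: str = ">") -> int:
    """Read one 4-byte signed integer from any byte source (width is an assumption)."""
    raw = source.read(4)
    if len(raw) != 4:
        raise ValueError("need 4 bytes for a binary int")
    return struct.unpack(byte_order + "i", raw)[0]

payload = struct.pack(">i", 42)

# Three different origins for the same bytes; the reader is identical in each case.
from_file = io.BytesIO(payload)                                     # stands in for a binary file
from_uuencoded = io.BytesIO(binascii.a2b_uu(binascii.b2a_uu(payload)))
from_compressed = io.BytesIO(zlib.decompress(zlib.compress(payload)))

for source in (from_file, from_uuencoded, from_compressed):
    print(read_binary_int(source))
```

The design point is that the layer supplying the bytes is passed in as an argument instead of being fixed inside the type definition.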

Data Model and Mapping Primitives
  XML/XSD Issues
  Mapping primitives
    Binary and Text

XML/XSD Issues
  Present type model from XSD
  Multi-dimensional arrays
  Missing types?
    Basic ones: extended precision float
    Standardized ones that could be built by users but need to be there: complex numbers?
  Escape sequences needed for XML-illegal char codes. E.g., no allowed.

XML/XSD – basic types
anySimpleType
  string, QName, NOTATION, float, double, decimal, boolean, base64Binary, hexBinary, anyURI
  normalizedString, token, language, Name, NMTOKEN, NMTOKENS, NCName, ID, IDREF, ENTITY, IDREFS, ENTITIES
  integer, long, nonPositiveInteger, nonNegativeInteger, negativeInteger, positiveInteger, unsignedLong, unsignedInt, unsignedShort, unsignedByte, int, short, byte
  date, time, dateTime, gYear, gYearMonth, gMonth, gMonthDay, gDay, duration

XSD Complex Types
  Sequence, All
  Choice
  Vectors
    Any element can have minOccurs and maxOccurs specified
    Multi-dimensions only via nested vectors

Multidimensional Arrays
Nested arrays make storage order explicit
  That is, it's always row-major order
    Last subscript changes first
    MxN matrix: A[i,j] is at (i*N+j)
What if the data is stored in column-major order?
    First subscript changes first
    MxN matrix: A[i,j] is at (i+M*j)
To solve this we need a matrix element type
  So we can put an arrayStorageOrder property on it!
Extend XML? Or not?
No-extension proposal:
<element name="mymatrix" minOccurs="0" maxOccurs="unbounded" type="…the element type…">
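The two offset formulas on this slide can be checked with a short sketch. The function names and the property values "rowMajor" and "columnMajor" are hypothetical; the slide only names the arrayStorageOrder property itself.

```python
def row_major_offset(i: int, j: int, m: int, n: int) -> int:
    """Offset of A[i,j] in an m x n matrix when the last subscript changes first."""
    return i * n + j

def column_major_offset(i: int, j: int, m: int, n: int) -> int:
    """Offset of A[i,j] in an m x n matrix when the first subscript changes first."""
    return i + m * j

def element(flat, i, j, m, n, storage_order="rowMajor"):
    """Fetch A[i,j] from a flat sequence according to a storage-order setting."""
    if storage_order == "rowMajor":
        return flat[row_major_offset(i, j, m, n)]
    if storage_order == "columnMajor":
        return flat[column_major_offset(i, j, m, n)]
    raise ValueError(f"unknown storage order: {storage_order}")

# The same 2 x 3 matrix [[1, 2, 3], [4, 5, 6]] stored both ways.
row_major_data = [1, 2, 3, 4, 5, 6]
column_major_data = [1, 4, 2, 5, 3, 6]

# A[1,0] resolves to the same value under either layout once the matching formula is used.
print(element(row_major_data, 1, 0, 2, 3, "rowMajor"))
print(element(column_major_data, 1, 0, 2, 3, "columnMajor"))
```

Nested vectors alone can only express the first layout, which is why the slide argues for a matrix element type that can carry the storage-order property.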

Mapping Primitives Discussion and Proposals??