Considerations for Future XML Development Methodologies Steve Margenau

1 Considerations for Future XML Development Methodologies Steve Margenau
Chair, PESC Technical Advisory Board Systems Analyst, Great Lakes Educational Loan Services

2 We can’t talk about where we’re going in the future without describing how we got where we are.

3 Early 2001 The Core Components Workgroup The Technology Workgroup
Core Components would sift through every report and file transmission in higher education and boil away duplicate and “like” data names to arrive at a single data dictionary. The Technology workgroup would determine a set of best practices for the higher education community. Since Core Components sounded like an extended trip to the dentist, I chose the latter group.

4 The Technology Workgroup
Weekly calls selecting Best Practices for the Higher Education community. Provided advisory and instructional services to the Core Components workgroup as well as to other PESC members.
Weekly Calls – We followed the guideline of keeping things simple by addressing the relatively high-level aspects of XML data design. We did not want to dictate unnecessary details to such a large community, and we were not looking to address low-level detail items unless it became necessary to assist the development work of one of our workgroups. The Technology Workgroup assisted the Core Components workgroup when they had questions. Or at least we tried! The Tech Workgroup also provided educational services to PESC member organizations who were looking to develop an XML-based vocabulary. Working with NCHELP is an example of one of those efforts, as well as Bruce Marton's work with the AACRAO/SPEEDE committee.

5 Early 2002
The Technology Workgroup: Technical Specifications. The Core Components Workgroup: Data Dictionary. And an Enthusiastic Membership.
The Technology workgroup had published the first version of the PESC XML Technical Specifications for Higher Education and was working on the next version. The Core Components workgroup was well on its way to creating a data dictionary for the higher education community. And we had an enthusiastic membership who thought "hey, this is gonna work!"

6 But what are we going to do?
No one was going to change all of their systems to match the PESC definitions. How are we going to support community-wide data items that have the same name, but which must have differences in their definitions, until the time all members migrate to the common definition? Members were agreeing on common names, even common definitions of minimum and maximum length, but could not move to a definition different from the one their system(s) were currently using. The majority of these differences were in the maximum lengths of a field as well as cardinality – is it required or optional? Is there a way we can have data elements which share the same name but have different length values, etc?
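As a sketch of the problem (the element shape and length values here are hypothetical, not actual PESC definitions), two members might agree on the name of a data element yet be stuck with incompatible constraints in their own systems:

```xml
<!-- Member A's definition (hypothetical values) -->
<xs:element name="LastName">
  <xs:simpleType>
    <xs:restriction base="xs:string">
      <xs:maxLength value="35"/>
    </xs:restriction>
  </xs:simpleType>
</xs:element>

<!-- Member B's definition: same name, but a different maximum length,
     and optional rather than required where it appears -->
<xs:element name="LastName">
  <xs:simpleType>
    <xs:restriction base="xs:string">
      <xs:maxLength value="60"/>
    </xs:restriction>
  </xs:simpleType>
</xs:element>
```

Since XML Schema allows only one global definition per name within a single namespace, the two definitions cannot coexist in one schema file.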

7 At a PESC meeting at the University of Miami in February of 2002
We referenced a whitepaper prepared for PESC in March of 2001, in which an architecture of multiple schemas and namespaces was described that could solve the very issue we were faced with. This architecture was tested, implemented, and is in use to this day. We could take a single data element that has two definitions and put each definition in a separate schema, with each schema having its own namespace. When creating a schema for a particular application, such as the College Transcript, we could choose which definition we wanted to use by importing that schema and its namespace into the application schema we were creating, then in turn reference the definition we wanted with a namespace reference.
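A minimal sketch of that mechanism (all namespace URIs, names, and length values here are hypothetical, not the actual PESC definitions): one sector schema defines a type in its own namespace, and an application schema imports that namespace and selects the definition by prefix:

```xml
<!-- sector1.xsd: one sector's definition, in its own namespace
     (hypothetical URI and length) -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="urn:example:sector1">
  <xs:simpleType name="LastNameType">
    <xs:restriction base="xs:string">
      <xs:maxLength value="35"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>

<!-- transcript.xsd: the application schema imports Sector 1's
     namespace and picks its definition via the s1: prefix -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:s1="urn:example:sector1"
           targetNamespace="urn:example:transcript"
           elementFormDefault="unqualified">
  <xs:import namespace="urn:example:sector1"
             schemaLocation="sector1.xsd"/>
  <xs:element name="CollegeTranscript">
    <xs:complexType>
      <xs:sequence>
        <!-- local element whose type comes from Sector 1 -->
        <xs:element name="LastName" type="s1:LastNameType"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

A second sector schema with a different LastNameType could be swapped in simply by importing its namespace and changing the prefix on the type reference.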

8 We developed a "Core" schema that contains element definitions that have no differences across PESC members, and "Sector" schemas that contain element definitions specific to a given sector of the higher education community. Application schemas choose which definition they use by importing the schema and namespace in which the definition resides, and using the namespace prefix when specifying the element definition. The Core schema and each Sector schema has its own namespace, and application schemas have their own namespaces as well. The instance documents themselves only need to specify a namespace qualifier on the root element start and end tags. All other namespace-related XML specifications remain at the schema level, hiding the relative complexity of the structure from someone viewing an instance document.
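For illustration (again with a hypothetical namespace URI and element names), an instance document then carries a namespace qualifier only on its root element; the imported schemas, their namespaces, and their prefixes stay out of sight at the schema level:

```xml
<!-- Only the root element is namespace-qualified; child elements are
     unqualified local elements, so the reader sees none of the
     Core/Sector namespace plumbing -->
<ct:CollegeTranscript xmlns:ct="urn:example:transcript">
  <LastName>Smith</LastName>
</ct:CollegeTranscript>
```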

9 This architecture is easy for people to understand, and works perfectly from an XML definitions perspective. But we now know that it's not the best approach from an applications and systems environment perspective. Here is why…

10 [Diagram: Core, Sector 1, and Sector 2 schemas, each containing definitions such as First Name, Middle Initial, and Last Name]
I want to build a schema that uses the Last Name from Core, the First Name from Sector 2, and the Middle Initial from Sector 1. Conceptually, for us humans visualizing the situation, this works. It also works for schema design tools. However, let's take the example of a software tool used for generating an XML instance document based on our schema. This tool creates internal memory representations not only for the element definitions for Last Name, First Name, and Middle Initial – it also creates representations for…

11 [Diagram: the same Core, Sector 1, and Sector 2 schemas and their definitions]
…each and every item in each of the imported schemas. At a minimum, this taxes system performance due to excessive memory usage. That's not good for software developers or vendors.

12 The Technical Advisory Board has been working on moving away from this structure for the past year.
How do we do this? By hand? This means managing the XML definitions for individual elements, both simple and complex, and putting them together to create schemas - and managing versions of all of them. Then send the schemas out for multiple reviews to be sure we’ve got it right. Will this work? I don’t think so. Our most experienced PESC Schema Author tried this as an experiment. It got real tedious real fast. Using a manual process by which volunteers maintain schemas comprised of many different component definitions is not suited to an organization this large.

13 Once this alternative was turned down, we thrashed for quite a while. What to do? At our January face-to-face meeting, a member of the Technical Advisory Board mentioned a tool his company was evaluating for managing their internal XML components. Our interest was piqued. Are there tools available to make managing components and building schemas less tedious and less prone to manual error? We sat through a long online demonstration of the product. We were quite impressed. At least – the tool sounded like it did most everything we wanted it to do.

14 With a good number of Technical Advisory Board members present in Washington, we began developing a set of evaluation criteria for what we have come to refer to as "repository management tools". PESC doesn't have staff dedicated to component and schema development and maintenance. A tool, if one exists, could provide automated creation and management of components as well as schemas. Perhaps it could provide for multiple active versions of components and schemas. And maybe it could even allow a schema to be put together by a workgroup, rather than volunteers dedicated to this task. Here are some of the criteria we came up with:

15 Must haves:
- Be sufficiently robust as to support the creation and maintenance of PESC schemas based on components contained in the tool's repository
- Have the ability to import existing schemas, both PESC schemas and schemas from other sources
- Be able to create schemas from repository-based components that are backward-compatible with existing PESC schema definitions

16 More must haves:
- Be able to store schemas in separate namespaces to accommodate existing PESC schema definitions, which provide like-named elements and types that exist in separate namespaces
- Be able to create a new schema file and namespace from repository-based components
- Provide the ability to move a component to a new namespace

17 More must haves:
- Provide the ability to create new components from subcomponents whose definitions exist in different namespaces
- Be able to conduct an Impact Analysis of a change to a definition contained in the repository and its effect on other components and schemas
- Provide the ability to see parent/child associations and relationship history across components and schemas

18 More must haves:
- Be able to support multiple versions of the same definition (concurrent versions as well as those in various development stages) for components and schemas
- Have the ability to publish/deploy definitions within the repository in multiple formats such as plain text and Comma Separated Values (CSV)

19 Nice to haves:
- Provide a means to publish contents of the repository to a Component Registry such as Federal Student Aid's XML Registry and Repository for Higher Education, the ebXML Registry and Repository, etc.
- Have the ability to identify elements and components that are not used within another component or schema definition
- Provide a means to track changes made to components and other artifacts (such as a sample instance document) stored in the repository

20 More nice to haves:
- Provide a means to store supporting information (sample instance documents, change history, documentation, etc.) and tie it back to the corresponding component
- Provide the ability to generate a report that details the results of an Impact Analysis; this report could serve as evidence of due diligence by a PESC workgroup that is adding or changing an element, component, or schema
- Be able to generate instance documents based on a schema definition residing in the repository

21 Next came the search for tools that might provide at least some of the capabilities enumerated on the previous slides. There aren't many. In fact, we have found one, but it is very promising: Schema Agent, a set of components that are part of a larger environment/software package.

22 We're excited about this tool, but we are proceeding cautiously in order to make the best decision for PESC and the community. Being a volunteer force also constrains the amount of time we can devote to our evaluation. We've seen the demos, and some of the features have been tested already. However, we want to run our tests with PESC schemas, to be sure it works for us in the way we want, and need, it to work.

23 We will deliver schemas that allow software tools to generate what they need to produce PESC XML – not the extra baggage that is created today. We want to get the word out that this issue is being addressed, and provide an update on our progress. We may be able to provide workgroups with the ability to create their own schemas from existing components, and components they have created.

24 Questions?

25 Thank you for attending and participating in today’s session.
Stay tuned for further updates!

26 The Technical Advisory Board is an EOEO (Equal Opportunity Egoless Organization). Suggestions, as well as new members, are always welcome! You may also contact Michael Sessa or Jennifer Kim, and they will get the information to us.

