1 Harmonizing Semantics in E-Government Presentation to the Ontolog-Forum (http://ontolog.cim3.net) Brand L. Niemann U.S. Environmental Protection Agency Enterprise Architecture Team CIO Council's Architecture and Infrastructure Committee (AIC) Co-Chair, Semantic Interoperability Community of Practice (SICoP) CIO Council's Best Practices Committee (Knowledge Management Working Group) April 22, 2004
2 A Little History Led a Team That Won the Special Award for Innovation with XML and VoiceXML Web Services from Mark Forman and the Quad Council at FOSE, March 2002. Led the CIO Council XML Web Services Working Group from August 2002-October 2003: –TopQuadrant led the Semantic Technologies for eGovernment Pilot. –TopQuadrant helped organize the very successful Semantic Technologies for eGovernment Conference at the White House Conference Center, September 8, 2003. The TopQuadrant Pilot and the CIO Council's Knowledge Management Working Group (Best Practices Committee) Helped Start the new Semantic Interoperability Community of Practice (SICoP). The XML Web Services Working Group Pilots Supported the Development of the: –Federal Enterprise Architecture's (FEA) Data and Information Reference Model and Its Data Management Strategy; and the –Government Enterprise Architecture Framework (GEAF) of the CIO Council's Architecture and Infrastructure Committee (AIC) Governance Subcommittee.
3 Organizational Relationships U.S. CIO Council Industry Advisory Council (IAC) OMB - FEAPMO Enterprise Architecture Special Interest Group Subcommittees: Governance Components Emerging Technologies Architecture & Infrastructure Committee IT Workforce Connections Best Practices Committee Chief Architects Forum Semantic Interoperability Community of Practice WGs and CoPs
4 Some Upcoming Events Collaboration Expedition Workshop #31, April 28th, National Science Foundation, Ballston, Virginia: –Joint Workshop with SICoP on Multiple Taxonomies: See http://ua-exp.gov Collaboration Expedition Workshop #32, May 11th, National Science Foundation, Ballston, Virginia: –Workshop on Emerging Technology Innovations in Software Component Development, Reuse, and Management – Applications to Government Enterprise Architecture (e.g. the new Chief Architects Forum CoP): See http://ua-exp.gov SICoP Monthly Meeting #2, May 19th, MITRE, McLean, Virginia: –Progress Reports on White Paper Modules (3), Collaboration Tools, Discussion of Common Upper Ontologies, etc. See http://web-services.gov and http://km.gov Fourth Quarterly Emerging Technology Components Conference, June 3rd, MITRE, McLean, Virginia: –Populating the Service Grid with Service Components: See http://Componettechnology.org
5 An Upcoming Event Joint Workshop with SICoP on Multiple Taxonomies, April 28th: –Welcome: Organizer: Michel Biezunski, Coolheads Consulting –The Semantic Web-What Is This Really About? Renee Lewis, Pensare Group –Increased Knowledge Sharing and Mission Success: Implementing Taxonomies for NASA: Jayne Dutra, Jet Propulsion Laboratory –Master and Relational Taxonomies: Kevin Hannon, Independent Consultant –Clustering of Search Results With and Without Taxonomies: Raul Valdez-Perez, Vivisimo, Inc.
6 An Upcoming Event Joint Workshop with SICoP on Multiple Taxonomies, April 28th (continued): –Semantics, Ontologies, and the Semantic Web: Leo Obrst, The MITRE Corporation –How to Create Many Taxonomies That Integrate Into a Single Enterprise-Wide Taxonomy: Denise Bedford, The World Bank –Ontology Overview: Adam Pease, Independent Consultant –Issues in Negotiating Multiple Semantic Models: LeeEllen Friedland, The MITRE Corporation –Accessibility, Usability, and Preservation of Government Information: Eliot Christian, USGS and Chair, Categorization of Government Information Working Group of the Interagency Committee on Government Information –Open Dialogue: Steven Newcomb, Coolheads Consulting
7 A Past Event SICoP Monthly Meeting #1, April 14th, Army CIO's Office, Crystal City, Virginia: –Part 1 Community Business: Old Business: –Minutes and Charter Emerging Products: –White Paper On Implementing the Semantic Web: »Module 1: Harnessing the Power of Information Semantics (Jie-Hong Morrison, State Department) »Module 2: Exploring the Business Value of Semantic Interoperability (Irene Polikoff, TopQuadrant) »Module 3: Roadmap for Operationalizing the Semantic Web (Michael Daconta, Smart Data Associates) (Slides 13-14) Army Knowledge Management Conference, August 31-September 2nd, Semantic Web Track (need speakers). Posted at http://web-services.gov, Past Meetings and Presentations, April 14th
8 A Past Event SICoP Monthly Meeting #1, April 14th, Army CIO's Office, Crystal City, Virginia (continued): –Part 2 Building Shared Understanding: Ontologies for Semantically Interoperable Systems, Leo Obrst, The MITRE Corporation (deferred to the next meeting) (Slides 9-12) A Data and Information Reference Model (DRM) Registry and Repository Pilot, Brand Niemann, US EPA (deferred to the next meeting) Common Upper Ontology for Cross-Domain Semantic Interoperability, Jim Schoening, The U.S. Army Communications Electronics Command –Part 3 Launching/Building the Supported Community of Practice: Proposed CoP Development Process, Rick Morris, US Army CIO Office Facilitated Discussion, Rick Morris and Brand Niemann, Co-Chairs
9 Tightness of Coupling & Semantic Explicitness
[Figure: technologies for Data and for Applications plotted against Looseness of Coupling and Semantics Explicitness, running from "Implicit, TIGHT" to "Explicit, Loose" and from Local to Far.]
Scope labels: 1 System: Small Set of Developers; Same Process Space; Same Address Space; Same CPU; Same OS; Same Programming Language; Same DBMS; Same Local Area Network; Systems of Systems; Enterprise; Community; Internet; Same Wide Area Network; Same Intranet.
Technology labels: Client-Server; Federated DBs; Data Warehouses; Data Marts; Workflow; Ontologies; Compiling; Linking; Agent Programming; Web Services: SOAP; Distributed Systems; OOP; Applets; Semantic Mappings; Semantic Brokers; XML, XML Schema; Conceptual Models; RDF/S, OWL; Web Services: UDDI, WSDL; OWL-S; Modal Policies; Middleware Web; Peer-to-peer; N-Tier Architecture; EAI.
From Synchronous Interaction to Asynchronous Communication. Performance = k / Integration_Flexibility
10 Dimensions of Interoperability & Integration
[Figure: an Interoperability Scale from 0% to 100%, showing 6 Levels of Interoperability and 3 Kinds of Integration. Labels: Enterprise, Object, Data, System, Application, Component, Community. Annotation: "Our interest lies here".]
11 Ontology Spectrum: One View
[Figure: a spectrum running from weak semantics to strong semantics.]
Taxonomy: Is Sub-Classification of.
Thesaurus: Has Narrower Meaning Than.
Conceptual Model: Is Subclass of.
Logical Theory: Is Disjoint Subclass of, with transitivity property.
Technology labels: DB Schemas, XML Schema; UML; Modal Logic; First Order Logic; Relational Model, XML; ER; Extended ER; Description Logic; DAML+OIL, OWL; RDF/S; XTM.
Interoperability bands: Syntactic Interoperability; Structural Interoperability; Semantic Interoperability.
12 Ontology Spectrum: One View
[Figure: the same spectrum from weak semantics to strong semantics (Taxonomy, Thesaurus, Conceptual Model, Logical Theory, with the same technology labels and interoperability bands), annotated with problem scope and semantic expressivity.]
Annotations: Problem: Very General, Semantic Expressivity: Very High; Problem: Local, Semantic Expressivity: Low; Problem: General, Semantic Expressivity: Medium; Problem: Local, Semantic Expressivity: High.
13 The Smart Data Enterprise Figure 2. Developer's Perspective on Data: To the application developer, the data evolution timeline is viewed through the correlation of programming paradigms with the relation of data and code. From: Designing the Smart-Data Enterprise, Get prepared for the 10 ways that semantic computing will impact enterprise IT, by Michael C. Daconta, Posted November 28, 2003, Enterprise Architect Magazine.
14 The Smart Data Enterprise Figure 3. The Smart Data Continuum: Data has progressed through four stages of increasing intelligence. (Reprinted with permission from The Semantic Web: A Guide to the Future of XML, Web Services, and Knowledge Management [John Wiley & Sons, 2003]. From: Designing the Smart-Data Enterprise, Get prepared for the 10 ways that semantic computing will impact enterprise IT, by Michael C. Daconta, Posted November 28, 2003, Enterprise Architect Magazine.
15 Abstract The history and broader context of this work. –See Section 1. The eGov Act of 2002 has two sections (207 & 212) which require more structure and interoperability for government data and information, and work has begun in several committees and communities of practice. –See Section 2 (just a few highlights). The new Semantic Web standards and technologies provide a way to accomplish the purposes of the eGov Act of 2002 and the FEA Data and Information Reference Model Data Management Strategy. –See Section 3 (will skip over for this group). The work on repurposing the Statistical Abstract of the United States, 2003, into a DRM Registry and Repository illustrates how a number of objectives can be accomplished at the same time, including the highest priority of the CIO Council's Architecture and Infrastructure Committee, namely intergovernmental exchange of data and information. –See Section 4 (just a few highlights). The additional pilots underway are outlined. –See Section 5.
16 Overview 1. Introduction (slides 17-19). 2. eGovernment Drivers: The eGov Act of 2002 and the FEA Data and Information Reference Model (DRM) (slides 20-32). 3. Semantic Technologies for eGovernment (slides 33-49). 4. Repurposing the Statistical Abstract of the United States, 2003, Into a DRM Registry and Repository (slides 50-72). 5. Additional Pilots (slides 73-74).
17 1. Introduction Repurposing of large documents with mixed content (text, tables, graphics, etc.) into XML content collections began with The Statistical Abstract of the United States (1999 Edition) as part of the FedStats.Net project to build a distributed network of statistical data and information using new XML standards and technology. –The Statistical Abstract of the United States was considered to be one of the best examples of "manual aggregation of government information" (from some 200 programs across about 70 agencies) that would benefit from a distributed XML-based content network that would leave the content in the hands of its originators and create a more "living document". This work was recognized by OMB Associate Director for Information Technology and E-Government, Mark Forman, and the Quad Council with a Special Award for Innovation in the 2002 CIO Showcase of Excellence for the use of XML in a distributed content network (renamed FedGov) and use of VoiceXML in providing universal access to emergency response information.
18 1. Introduction More recently, the eGov Act of 2002's provisions for an Intergovernmental Committee on Government Information (ICGI) and Data Integration Pilots, the Federal Enterprise Architecture's Data and Information Reference Model (DRM) and its Data Management Strategy, and the focus in the CIO Council's Architecture and Infrastructure Committee on Intergovernmental Data Exchange have all been tied together in a new pilot that simultaneously accomplishes multiple objectives (see next slide). This Smart Data Enterprise approach came from the Semantic Technologies for eGov Conference, September 8, 2003, at the White House Conference Center (in which the EPA CIO and her staff participated), and continues in the new CIO Council's Semantic Interoperability (Web Services) Community of Practice (SICoP) (see subsequent slides).
19 1. Introduction (1) Repurpose government data and information into structured documents using new XML-based standards and technologies that facilitate reuse and exchange. (2) Repurpose the data and information so that it can be readily decomposed into XML fragments (for text and tables) and RDF metadata (for graphics) that can be stored and referenced in a database and can in turn be repurposed into new documents that provide additional user-defined views of the data and information. (3) Organize and categorize the repurposed data and information using taxonomies and even ontologies in semantic registries and repositories. (4) Use "XML data islands", RDF, and OWL to add metadata, interoperability, and semantic meaning to data and information to be reused and exchanged. (5) Standardize the data element and XML tag names in a DRM registry and repository. (6) Share these results with others who are working on Semantic Web and Technology Applications for eGovernment.
20 2. eGovernment Drivers The eGov Act of 2002: –SEC. 207. ACCESSIBILITY, USABILITY, AND PRESERVATION OF GOVERNMENT INFORMATION. (a) PURPOSE. The purpose of this section is to improve the methods by which Government information, including information on the Internet, is organized, preserved, and made accessible to the public. (b) DEFINITIONS. In this section, the term (1) "Committee" means the Interagency Committee on Government Information established under subsection (c); and (2) "directory" means a taxonomy of subjects linked to websites that (A) organizes Government information on the Internet according to subject matter; and (B) may be created with the participation of human editors.
21 2. eGovernment Drivers The eGov Act of 2002 (continued): –SEC. 207. ACCESSIBILITY, USABILITY, AND PRESERVATION OF GOVERNMENT INFORMATION. (d) CATEGORIZING OF INFORMATION. (1) COMMITTEE FUNCTIONS. Not later than 2 years after the date of enactment of this Act, the Committee shall submit recommendations to the Director on (A) the adoption of standards, which are open to the maximum extent feasible, to enable the organization and categorization of Government information (i) in a way that is searchable electronically, including by searchable identifiers; and (ii) in ways that are interoperable across agencies; (B) the definition of categories of Government information which should be classified under the standards; and (C) determining priorities and developing schedules for the initial implementation of the standards by agencies. Note: Received the 2002 CIO Council Showcase of Excellence Special Innovation Award for XML Web Services (VoiceXML and the FedGov Content Network) in March 2002.
22 2. eGovernment Drivers The eGov Act of 2002 (continued): –SEC. 212. INTEGRATED REPORTING STUDY AND PILOT PROJECTS. (a) PURPOSES. The purposes of this section are to (1) enhance the interoperability of Federal information systems; (2) assist the public, including the regulated community, in electronically submitting information to agencies under Federal requirements, by reducing the burden of duplicate collection and ensuring the accuracy of submitted information; and (3) enable any person to integrate and obtain similar information held by 1 or more agencies under 1 or more Federal requirements without violating the privacy rights of an individual.
23 2. eGovernment Drivers The eGov Act of 2002 (continued): –SEC. 212. INTEGRATED REPORTING STUDY AND PILOT PROJECTS. –(d) PILOT PROJECTS TO ENCOURAGE INTEGRATED COLLECTION AND MANAGEMENT OF DATA AND INTEROPERABILITY OF FEDERAL INFORMATION SYSTEMS. –(1) IN GENERAL. In order to provide input to the study under subsection (c), the Director shall designate, in consultation with agencies, a series of no more than 5 pilot projects that integrate data elements. The Director shall consult with agencies, the regulated community, public interest organizations, and the public on the implementation of the pilot projects. –(2) GOALS OF PILOT PROJECTS. –(A) IN GENERAL. Each goal described under subparagraph (B) shall be addressed by at least 1 pilot project each. –(B) GOALS. The goals under this paragraph are to –(i) reduce information collection burdens by eliminating duplicative data elements within 2 or more reporting requirements; –(ii) create interoperability between or among public databases managed by 2 or more agencies using technologies and techniques that facilitate public access; and –(iii) develop, or enable the development of, software to reduce errors in electronically submitted information.
24 2. eGovernment Drivers The Federal Enterprise Architecture (FEA) Data and Information Reference Model (DRM): –Volume 1 – Bob Haycock, OMB Chief Architect, will soon release it with guidance to the agencies. The E-Government Act 2002, Section 207, Interagency Committee on Government Information, will use the top two layers of the DRM structure for categorization of government information (see next slide). The E-Government Act 2002, Section 212, calls for a series of no more than 5 pilot projects that integrate data elements to encourage integrated collection and management of data and interoperability of Federal information systems. –Data Management Strategy – In process, with a draft to be released soon. Have several critiques of ISO 11179 to improve the DRM Model, including the suggested use of the Meta Object Facility (MOF) from the Object Management Group (OMG) by MetaMatrix (see slide 16). –Volumes 2-4 – To Be Released by Fall 2004 (see slides 17-19). DRM business context, DRM information exchange, and DRM data elements.
25 The Current DRM Model A model for discovery of information: –Context and classification. –To determine available packages and elements. A model for exchange of information: –Information packages, built from common data elements. –Sharing mechanism. A model for representation of information: –Data elements defined in standard way. BUSINESS CONTEXT Subject Area Super Type BUSINESS DATA FLOW Information Exchange Package DATA ELEMENT Data Object Data Property Data Representation ISO 11179
26 Expanding the DRM Model MetaMatrix vision: –Generic classification to tag metadata with context: vs. 2-level context. –Packages built from complex datatypes and deployable for exchange or data access : vs. exchange-only packaging of ISO 11179 data elements. –Formal datatype model: vs. more conceptual ISO 11179 model. –Formal reference information to add semantic value to data definitions : vs. nothing. BUSINESS CONTEXT Subject Area Super Type BUSINESS DATA FLOW Info Exch Package DATA ELEMENT DRM ModelMetaMatrix Model ISO 11179 CLASSIFICATION Context PACKAGE Virtual Database Category REFERENCE Glossary Thesaurus Bibliography Exchange Package TYPE Complex Datatype Abstract Datatype Simple Datatype Schema/Association INSTANCE Transform Virtual Physical Data Property Data Representation Data Object
27 2. eGovernment Drivers VOLUME II: BUSINESS CONTEXT
Data Governance: Governance Structure, Policy & Procedures. Purpose: Define policy & procedures for use of Information Categories in OMB 300 reporting and government information indexing.
Data Architecture: Information Categories, Data Groups, and Security Profile. Purpose: Catalogue and Index Government Information consistent with the E-Gov Act.
Data Sharing Architecture: Information Categories, Data Groups, and Exchange Security Requirements. Purpose: Identify and define federated data classifications to discover commonalities and opportunities for re-use.
28 2. eGovernment Drivers VOLUME III: INFORMATION FLOW
Data Governance: Governance Structure, Policy & Procedures. Purpose: Define and enforce policy that governs the use and protection of information packages available in the DRM registry.
Data Architecture: Information Exchange Packages, Security Profile. Purpose: Define data groups (tables, records, messages, text) and attributes that reflect business process needs common to a Community of Practice.
Data Sharing Architecture: Information Maps, Exchange Security Requirements. Purpose: Define data transformation patterns and key attributes that determine data exchange processing requirements.
29 2. eGovernment Drivers VOLUME IV: DATA ELEMENT DESCRIPTION
Data Governance: Governance Structure, Policy & Procedures. Purpose: Define and enforce requirements for Data Standardization.
Data Architecture: Data Element Descriptions, Security Profile. Purpose: Define and maintain data structures that reflect business data entity attributes and relationships.
Data Sharing Architecture: Object Descriptions, XML Schemas, Exchange Security Requirements. Purpose: Define and maintain metadata required to provide or support a specific service pattern.
30 2. eGovernment Drivers The FEA DRM Data Management Strategy, Business Driver 4: Resolve Data Semantics Issues That Impede Community of Practice Work, Brand Niemann and Ken Gill: –Introduction to Data Semantics. –Domain Data Harmonization Strategy. –Data Harmonization Guiding Principles (10). –Global Justice Information Sharing Initiative (Global) Example. –Increased Collaboration by Means of and with "Smart Data" (Daconta's Declaration of Data Independence). –Recommendations. Note: See http://web-services.gov for details.
31 2. eGovernment Drivers The FEA DRM has been and currently is the object of a series of pilot projects and collaborative work within the Communities of Practice: –Open GIS Consortium (OGC): Information Communities and Semantics WG (ICS WG): –http://www.opengis.org/groups/?iid=50 –Sustainable Intergovernmental Network Exchange (Global-Justice, Environmental Information-EPA, and Health IT Sharing (Health)) (SINE): Collaborative Work Environment: –http://sine.cim3.net/ –Intelligence Community Metadata Working Group (IC MWG): http://www.xml.saic.com/icml/ –CIO Council's (Best Practices Committee) Knowledge Management Working Group (KM.GOV): Semantic (Web Services) Interoperability Community of Practice (SICoP): –See http://km.gov and http://web-services.gov/
32 2. eGovernment Drivers The FEA DRM has been and currently is the object of a series of pilot projects and collaborative work within the Communities of Practice (continued): –E-Gov SmartServices: To join the group send an email to eGov_SmartServices- firstname.lastname@example.org with empty Subject and Body. You will then receive an email with a web link where you can select the subscription option. –Open International Forum on Business Ontology: ONTOLOG - collaborative work environment: –http://ontolog.cim3.net/ (April 22nd presentation) –Semantic Technologies for E-Government, September 8, 2003, White House Conference Center: http://www.topquadrant.com/conferences/tq_proceedings.htm 2nd Semantic Technologies for E-Government, September 8, 2004 (tentative). –University of Maryland MINDLab (Professor Jim Hendler) and TopQuadrant (Ralph Hodgson): http://owl.mindswap.org/ and http://www.topquadrant.com/ –TopMIND Tutorials with Government Data Examples, March 22-25, 2004: –http://www.topquadrant.com/seminars/topmind.htm
33 3. Semantic Technologies for eGovernment Web-Enabled Government 2004 Conference and Exhibition, Session 2-4, February 4th, 2004 Understanding Semantic Web Technology by Professor Jim Hendler and Brand Niemann: –(1) Tree of Knowledge Technologies and The Semantic Technology Layer Cake –(2) Where We Are –(3) Emerging Vendors Landscape: Semantic Integration –(4) Semantic Technologies and Web Services –(5) The First Site on the Semantic Web –(6) Taxonomy –(7) Topic Maps –(8) RDF and Ontology Components –(9) RDF Syntax and Validator –(10) OWL Syntax and Functionality –(11) Some Educational Resources Note: Based on TopMIND Tutorials, November 3-4, and December 3-4, 2003
34 3. Semantic Technologies for eGovernment Jim Hendler is a Professor at the University of Maryland and the Director of Semantic Web and Agent Technology at the Maryland Information and Network Dynamics Laboratory. He holds joint appointments in the Department of Computer Science, the Institute for Advanced Computer Studies and the Institute for Systems Research, and he is also an affiliate of the Electrical Engineering Department. He has authored close to 150 technical papers in the areas of artificial intelligence, robotics, agent-based computing and high performance processing. Hendler was the recipient of a 1995 Fulbright Foundation Fellowship, is a member of the US Air Force Science Advisory Board, and is a Fellow of the American Association for Artificial Intelligence. As Chief Scientist and Program Manager at DARPA for the DAML program, he has been one of the major drivers in the creation of the Semantic Web, and continues to be a prominent player in the W3C's Semantic Web Activity.
35 (1) Tree of Knowledge Technologies AI Knowledge Representation Semantic Technology Languages Content Management Languages Process Knowledge Languages Software Modeling Languages
37 (2) Where We Are We Are Here Source: Tim Berners-Lee, Standards, Semantics and Survival, http://www.w3.org/2003/Talks/01-siia-tbl/Overview.html
38 (3) Emerging Vendors Landscape: Semantic Integration
[Figure: vendors plotted by Expressivity and Semantic Power (XML, RDF, OWL) and by Enterprise Support (Data and Schema Management, Validation, Run-time Engine, Integration and Orchestration). Vendors shown: Ontology Works, enLeague, Ontoprise, Network Inference, Unicorn, SchemaLogic, Contivo, Celcorp, Vitria, MetaMatrix, Modulant, IGS, Miosoft. Legend for Current Support / Primary Strength: S = Structured information, U = Unstructured information, S&U = Supports both.]
Source: Irene Polikoff, TopQuadrant, Positioning Semantic Technologies: The Emerging Vendor Landscape, September 8, 2003.
39 (4) Semantic Technologies and Web Services
[Figure: a grid with axes Static Resources / Dynamic Resources and Interoperable Syntax / Interoperable Semantics, positioning WWW, Web Services, Semantic Web, and Semantic Web Services, together with an Enterprise Ontology and Web Services Registry.]
Source: Derived in part from two separate presentations at the Web Services One Conference 2002 by Dieter Fensel and Dragan Sretenovic.
40 (5) The First Site on the Semantic Web http://owl.mindswap.org PhotoStuff: Image Annotation Tool with OWL
41 (6) Taxonomy From Tim Berners-Lee, ISWC 2003 Regardless of end goals, look to a future where taxonomies interoperate (domains connect) Expect new stakeholders to take an interest… … but have their own viewpoints Technology Recommendation: RDF(S) Goals for enterprise taxonomies
42 (6) What is a Taxonomy? A taxonomy is a model of knowledge organized as a hierarchical arrangement (tree structure) of concepts: –parent nodes denote more general ideas than their children.
[A] animal: horse (mare, stallion); sheep (ewe, ram)
OR
[B] animal: horse (dales pony, arabian horse); sheep (swaledale, cheviot)
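The parent-and-child rule on this slide can be sketched in a few lines of Python. This is only an illustration built on taxonomy [A] (animal, horse, sheep, mare, stallion, ewe, ram); the child-to-parent dictionary and the helper names `ancestors` and `is_narrower_than` are assumptions made for the example, not part of the presentation.

```python
# Taxonomy [A] from the slide, stored as child -> parent links.
# The representation and the function names are illustrative assumptions.
PARENT = {
    "horse": "animal",
    "sheep": "animal",
    "mare": "horse",
    "stallion": "horse",
    "ewe": "sheep",
    "ram": "sheep",
}


def ancestors(concept):
    """Return the chain of more general concepts, nearest parent first."""
    chain = []
    while concept in PARENT:
        concept = PARENT[concept]
        chain.append(concept)
    return chain


def is_narrower_than(child, parent):
    """True if `parent` sits somewhere above `child` in the tree."""
    return parent in ancestors(child)


if __name__ == "__main__":
    print(ancestors("mare"))                  # ['horse', 'animal']
    print(is_narrower_than("ewe", "animal"))  # True
```

Storing only the parent link keeps each concept in exactly one place in the tree, which is the property that distinguishes a single taxonomy from the intersecting hierarchies discussed on the next slide.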
43 (6) Types of Taxonomy A taxonomy can be: –A classification hierarchy, e.g.: Natural Taxonomy: Unique Beginner (plant) -> Life-Form (bush) -> Generic (rose) -> Specific (hybrid tea) -> Varietal (Peace) –A part hierarchy (Meronomy) –A category hierarchy Taxonomies can intersect – intersection means there are different relationships at work. [Figure: intersecting hierarchies under "building" and "holy place", with nodes cinema, office-block, synagogue, mosque, pub, church, shrine.] Reference: D.A. Cruse, Lexical Semantics, Cambridge University Press, 1986
44 (6) topSAIL/tdf – Taxonomy Development Framework: A five-step method for taxonomy development
1. Focus: What is the taxonomy for? What business challenges will it overcome? What results will it achieve? How to measure stakeholder benefit?
2. Analysis: What is the context for the taxonomy? What are the types & sources of knowledge? How does knowledge map to processes?
3. Design: What types of taxonomy concepts are needed? What to do first? What system capabilities are needed? What will be the impact? Is the taxonomy design correct, complete and consistent?
4. Construct: Have we enough content mapped? How to connect taxonomies to content? How to integrate with IT systems?
5. Deploy: How do we ensure there will be feedback for assessment? Have we accomplished set objectives? What should be done next?
45 (7) Topic Maps The TAO of Topic Maps. Topic: –The entry in a topic map that refers to a subject in the real world. –Topic Maps make a Plato-like distinction between Things in the Real World (Subjects) and Things in the Topic Map world (Topics). Association: –Linkages between Topics. –Tosca was written by Puccini. Occurrence: –Topics occur in resources. –Resources are indicated by, e.g., URLs. –Types of Occurrence: mention, illustration, article, etc. Note: See http://www.giuseppeverdi.it/verdi. Also see http://www.coolheads.com/egov for merging of topic maps.
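As a rough illustration of the three building blocks named on this slide (topics, associations, occurrences), here is a small Python sketch. It is not an implementation of any Topic Maps standard or library: the class names, the `written-by` association type, the role names `work` and `composer`, and the example.org URL are all assumptions chosen for the example. Only "Tosca was written by Puccini" comes from the slide.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    name: str
    # Occurrences: (occurrence type, resource locator) pairs.
    occurrences: list = field(default_factory=list)


@dataclass
class Association:
    kind: str
    roles: dict  # role name -> Topic


class TopicMap:
    def __init__(self):
        self.topics = {}
        self.associations = []

    def topic(self, name):
        """Return the topic for a subject, creating it on first use."""
        return self.topics.setdefault(name, Topic(name))

    def associate(self, kind, **roles):
        """Link topics together; each keyword is a role played by a topic."""
        assoc = Association(kind, {r: self.topic(n) for r, n in roles.items()})
        self.associations.append(assoc)
        return assoc

    def related(self, name):
        """Names of topics that share an association with `name`."""
        found = []
        for assoc in self.associations:
            names = [t.name for t in assoc.roles.values()]
            if name in names:
                found.extend(n for n in names if n != name)
        return found


tm = TopicMap()
# Association from the slide: Tosca was written by Puccini.
tm.associate("written-by", work="Tosca", composer="Puccini")
# The occurrence URL is a placeholder, not a resource cited in the talk.
tm.topic("Tosca").occurrences.append(("article", "http://example.org/tosca"))
print(tm.related("Puccini"))
```

Keeping topics in a dictionary keyed by subject name means two associations that mention the same subject end up pointing at one topic, which is the same idea that makes merging topic maps possible.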
46 (8) RDF and Ontology Components
RDF* Triple Components: Subject (a URI), Predicate (a property or association), Object (a URI or a literal). Example: "The company sells batteries", with subject "The company", predicate "sells", and object "batteries".
Key Ontology Components [figure labels]: Person (birthdate: date, Gender: char), Image, Leader, Organization, Resource; relations: leads, is-A, works for, published, depiction, knows.
Source: The Semantic Web: A Guide to the Future of XML, Web Services, and Knowledge Management, Wiley Technology Publishing, June 2003.
* Resource Description Framework
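The subject, predicate, and object structure of the slide's example sentence, "The company sells batteries", can be sketched with plain Python tuples. This is a minimal illustration rather than a real RDF library: the `http://example.org/` namespace, the extra `name` triple with a literal object, and the helper names `match` and `to_ntriples` are assumptions added for the example.

```python
# Hypothetical namespace; these URIs are illustrative only.
EX = "http://example.org/"

# Each triple is (subject, predicate, object).
triples = {
    # "The company sells batteries."
    (EX + "TheCompany", EX + "sells", EX + "Batteries"),
    # An added triple whose object is a literal rather than a URI.
    (EX + "TheCompany", EX + "name", '"The company"'),
}


def match(store, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return sorted(
        t for t in store
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


def to_ntriples(triple):
    """Write one triple in an N-Triples-like form (URIs in angle brackets)."""
    s, p, o = triple
    obj = o if o.startswith('"') else f"<{o}>"
    return f"<{s}> <{p}> {obj} ."


for t in match(triples, s=EX + "TheCompany"):
    print(to_ntriples(t))
```

Because every statement has the same three-part shape, a single pattern-matching function is enough to query the whole store, whatever the statements are about.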
47 (9) RDF Syntax and Validator
[Figure: the RDF Syntax for a sample record (Jen Golbeck, George Washington University, Adjunct Professor, July 2001) shown beside the Graph of the Data Model.]
http://www.w3.org/RDF/Validator
48 (10) OWL Syntax and Functionality Our Cool Employee Class Our Cool Civil Servant Class Our Cool Woman Class Our Cool Man Class ……. Applications for OWL: –Markup for web pages and other web-based media. –Raw Data Sharing. –Web Services. Media Markup: –Google and other keyword searches are excellent because they can work with text. Not likely to be much improved by the semantic web. –Image searches are much worse than text searches. No way to know what is happening in an image, what is in it, in what context it was taken, or who is doing what. –MP3 searches. I want that song that was in the Mitsubishi commercial… –Video search. Challenges: –Trust & Provenance. –Visualization.
49 (11) Some Educational Resources
Dieter Fensel, Wolfgang Wahlster, Henry Lieberman, James Hendler (Eds.): Spinning the Semantic Web: Bringing the World Wide Web to Its Full Potential, MIT Press, 2002
John Davies, Dieter Fensel & Frank van Harmelen: Towards the Semantic Web – Ontology Driven Knowledge Management, John Wiley, 2002
Johan Hjelm: Creating the Semantic Web with RDF, John Wiley, 2001
Dieter Fensel: Ontologies: A Silver Bullet for Knowledge Management and Electronic Commerce, Springer Verlag, 2001
Shelley Powers: Practical RDF, O'Reilly, 2003
Michael C. Daconta, Leo J. Obrst, Kevin T. Smith: The Semantic Web: A Guide to the Future of XML, Web Services, and Knowledge Management, John Wiley, 2003
Vladimir Geroimenko, Chaomei Chen (Eds.): Visualizing the Semantic Web, Springer-Verlag, 2003
M. Klein and B. Omelayenko (Eds.): Knowledge Transformation for the Semantic Web, Vol. 95, Frontiers in Artificial Intelligence and Applications, IOS Press, 2003
50 4. Repurposing the Statistical Abstract of the United States, 2003, Into a DRM Registry and Repository Overview Steps in Repurposing the Data Tables: –(1) Table in Adobe Reader 6.0. –(2) Define Basic XML Tags in XMLSPY 2004. –(3) Define XML Tags for Data Element Names in XMLSPY 2004. –(4) Markup the Table in XMLSPY 2004. –(5) Grid View in XMLSPY 2004. –(6) XML Table Database in Excel 2002. –(7) Create the HTML Interface. –(8) HTML Interface in Browser. –(9) XML Table Database in Browser. Some Features of the DRM Registry and Repository: –Note that it is embedded in the document itself, not separate!
51 4. Repurposing the Statistical Abstract of the United States, 2003, Into a DRM Registry and Repository Overview: –The methodology for repurposing the Statistical Abstract, 2003, documents (45 PDF files/14.2 MB) into a structured XML content collection was presented previously: See Past Meetings and Presentations at http://web-services.gov, November 18-19, 2003, Website Content Management for Government Conference, Invited Presentation on November 19th on "Repurposing Documents Into Semantic Web Services and Networks" (EPA Enterprise Integration Portal/Data Exchange Network Pilot), Doubletree Hotel, Arlington, VA. Also see Folio-to- XML Conversion and Webinar. –Current plans call for the completions of the repurposing of this document and continued work on state of the environment and national and community indicator reports.
52 Step 1. Table in Adobe Reader 6.0 Text Select Tool & Highlight Table, Edit & Copy, & Edit & Paste to XMLSPY 2004
53 Step 2. Define Basic XML Tags in XMLSPY 2004
54 Step 3. Define XML Tags for Data Element Names in XMLSPY 2004
Data element name -> XML tag name
Census Date (Year, Month & Day) -> CensusDateYearMonthDay
Resident Population (Number) -> ResidentPopulationNumber
Resident Population (Number Per Square Mile of Land Area) -> ResidentPopulationPerSquareMileofLandArea
Resident Population Increase Over Preceding Census (Number) -> ResidentPopulationIncreaseOverPrecedingCensusNumber
Resident Population Increase Over Preceding Census (Percent) -> ResidentPopulationIncreaseOverPrecedingCensusPercent
Area (Square Miles) Total -> AreaSquareMilesTotal
Area (Square Miles) Land -> AreaSquareMilesLand
Area (Square Miles) Water -> AreaSquareMilesWater
The heart of the DRM Registry and Repository for reuse!
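The naming pattern on this slide (spaces and punctuation dropped from the data element name to form the XML tag name) can be sketched in a few lines of Python. This is only an illustration of the pattern, not the procedure used in XMLSPY 2004; the function name to_tag_name is hypothetical. Tags chosen by hand may differ from the mechanical result: the slide's tag for the per-square-mile element omits "Number", for example.

```python
import re

def to_tag_name(data_element_name: str) -> str:
    """Collapse a data element name into a single XML tag name (illustrative rule)."""
    # Drop everything except letters and digits: spaces, parentheses, commas, "&".
    tag = re.sub(r"[^A-Za-z0-9]", "", data_element_name)
    # XML element names may not start with a digit, so prefix an underscore if needed.
    if tag and tag[0].isdigit():
        tag = "_" + tag
    return tag

# Data element names taken from the slide.
names = [
    "Census Date (Year, Month & Day)",
    "Resident Population (Number)",
    "Area (Square Miles) Total",
]
tags = {name: to_tag_name(name) for name in names}
```

Keeping the name-to-tag mapping in one place is what lets the same tags be reused across tables.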
55 Step 4. Markup the Table in XMLSPY 2004 Text View in XMLSPY 2004
56 Step 4. Markup the Table in XMLSPY 2004 (continued) Text View in XMLSPY 2004
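The screenshots for Step 4 are not part of this transcript, so the exact markup is not shown. As a rough sketch of what marking up a table means, the Python below wraps each table row in an element and each cell in the tag defined for its column. The Table and Row element names and the cell values are placeholders assumed for illustration; only the column tag names come from Step 3.

```python
import xml.etree.ElementTree as ET

# Column tag names from Step 3 (a subset, for brevity).
TAGS = ["CensusDateYearMonthDay", "ResidentPopulationNumber", "AreaSquareMilesLand"]

def mark_up_table(rows, root_name="Table", row_name="Row"):
    """Build an XML tree with one row element per table row (element names assumed)."""
    root = ET.Element(root_name)
    for row in rows:
        row_el = ET.SubElement(root, row_name)
        # Pair each cell with the tag defined for its column.
        for tag, value in zip(TAGS, row):
            ET.SubElement(row_el, tag).text = str(value)
    return root

# Placeholder values, not figures from the Statistical Abstract.
rows = [["1900-01-01", "100", "10"], ["1910-01-01", "120", "10"]]
xml_text = ET.tostring(mark_up_table(rows), encoding="unicode")
```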
57 Step 5. Grid View in XMLSPY 2004 (like a spreadsheet!) Highlight Grid Table, Edit & Copy as Structured Text, and Paste to Excel.
58 Step 6. XML Table Database in Excel 2002 Highlight Table, Format & Column & AutoFit Selection. Also spreadsheet-like data tables can be pasted into XMLSPY 2004.
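Steps 5 and 6 move the table from XML into a spreadsheet by copy and paste. The same hand-off can be sketched programmatically: the Python below flattens row elements into tab-delimited text, which a spreadsheet can open. It is an assumption-laden illustration, not a description of XMLSPY's "Copy as Structured Text" output; the Table and Row element names and the sample values are placeholders.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Placeholder document; the values are made up for illustration.
SAMPLE_XML = """<Table>
  <Row><CensusDateYearMonthDay>1900-01-01</CensusDateYearMonthDay><ResidentPopulationNumber>100</ResidentPopulationNumber></Row>
  <Row><CensusDateYearMonthDay>1910-01-01</CensusDateYearMonthDay><ResidentPopulationNumber>120</ResidentPopulationNumber></Row>
</Table>"""

def xml_to_tab_delimited(xml_text, row_name="Row"):
    """Flatten row elements into tab-delimited text with a header line."""
    root = ET.fromstring(xml_text)
    rows = root.findall(row_name)
    if not rows:
        return ""
    # Use the child tags of the first row as the column headers.
    header = [child.tag for child in rows[0]]
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    for row in rows:
        # A missing cell becomes an empty string.
        writer.writerow([(row.findtext(tag) or "") for tag in header])
    return out.getvalue()

table_text = xml_to_tab_delimited(SAMPLE_XML)
```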
59 Step 7. Create the HTML Interface Note two references to statabs2003no1.xml. Navigation Functionality (non-XML)
60 Step 7. Create the HTML Interface (continued) Data Element Names XML Tag Names Note this makes the XML table database independent of the HTML presentation.
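The HTML shown on the Step 7 slides is not in this transcript. To illustrate the point about independence, the Python sketch below keeps the display labels (data element names) and the XML tag names in a single mapping and renders an HTML table from it, so the presentation can change without touching the XML. The function name, the Table and Row element names, and the sample values are assumptions, and this is not the interface the author built.

```python
import html
import xml.etree.ElementTree as ET

# (display label, XML tag name) pairs; labels and tags follow Step 3.
COLUMNS = [
    ("Census Date (Year, Month & Day)", "CensusDateYearMonthDay"),
    ("Resident Population (Number)", "ResidentPopulationNumber"),
]

def render_html_table(xml_text, columns=COLUMNS, row_name="Row"):
    """Render rows of the XML table as an HTML table, using the label mapping."""
    root = ET.fromstring(xml_text)
    head = "".join(f"<th>{html.escape(label)}</th>" for label, _ in columns)
    body_rows = []
    for row in root.findall(row_name):
        cells = "".join(
            f"<td>{html.escape(row.findtext(tag) or '')}</td>" for _, tag in columns
        )
        body_rows.append(f"<tr>{cells}</tr>")
    return f"<table><tr>{head}</tr>{''.join(body_rows)}</table>"

# Placeholder data, made up for illustration.
sample = (
    "<Table><Row><CensusDateYearMonthDay>1900-01-01</CensusDateYearMonthDay>"
    "<ResidentPopulationNumber>100</ResidentPopulationNumber></Row></Table>"
)
page = render_html_table(sample)
```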
61 Step 8. HTML Interface in Browser Link to XML File Navigation Buttons Can easily browse through long tables.
62 Step 9. XML Table Database in Browser Can expand and collapse using + and -. The heart of the DRM Registry and Repository for interoperable exchange.
63 Some Features of the DRM Registry and Repository Taxonomy of Federal Statistical Data and Information!
64 Some Features of the DRM Registry and Repository Detailed Table of Contents for Entire Document.
65 Some Features of the DRM Registry and Repository Detailed Table of Contents for Each Section.
66 Some Features of the DRM Registry and Repository Graphics can have RDF metadata.
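The slide does not show the metadata itself. As one possible shape for it, the Python sketch below writes an RDF/XML description of a graphic using Dublin Core title and creator properties. The choice of Dublin Core, the function name, the image URL, and the title and creator values are all assumptions for illustration; only the RDF and Dublin Core namespace URIs are standard.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("dc", DC_NS)

def describe_graphic(about, title, creator):
    """Return an RDF/XML string describing one graphic (illustrative properties)."""
    rdf = ET.Element(f"{{{RDF_NS}}}RDF")
    # rdf:about identifies the resource being described, here the image URL.
    desc = ET.SubElement(rdf, f"{{{RDF_NS}}}Description", {f"{{{RDF_NS}}}about": about})
    ET.SubElement(desc, f"{{{DC_NS}}}title").text = title
    ET.SubElement(desc, f"{{{DC_NS}}}creator").text = creator
    return ET.tostring(rdf, encoding="unicode")

# Hypothetical URL and values.
rdf_xml = describe_graphic(
    "http://example.org/figure1.gif", "Example figure title", "Example publisher"
)
```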
67 Some Features of the DRM Registry and Repository Tables are structured data (copy to Excel) and available in XML
68 Some Features of the DRM Registry and Repository Table copied to Excel from Browser
69 Some Features of the DRM Registry and Repository Search within just one chapter of the entire document.
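Searching within one chapter is possible because the document is structured. The Python sketch below shows the idea on a toy document: the search is restricted to the subtree of a single chapter. The Document, Chapter, and Para element names, the title attribute, and the sample text are assumptions, not the structure of the actual repository.

```python
import xml.etree.ElementTree as ET

# Toy document with made-up content.
SAMPLE = """<Document>
  <Chapter title="Population"><Para>Resident population by census date.</Para></Chapter>
  <Chapter title="Geography"><Para>Land and water area in square miles.</Para></Chapter>
</Document>"""

def search_chapter(xml_text, chapter_title, term):
    """Return paragraphs containing term, looking only inside the named chapter."""
    root = ET.fromstring(xml_text)
    hits = []
    for chapter in root.findall("Chapter"):
        if chapter.get("title") != chapter_title:
            continue  # skip every other chapter
        for para in chapter.iter("Para"):
            text = "".join(para.itertext())
            # Case-insensitive substring match.
            if term.lower() in text.lower():
                hits.append(text)
    return hits

geography_hits = search_chapter(SAMPLE, "Geography", "area")
```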
70 Some Features of the DRM Registry and Repository Better search than from conventional Internet search engines.
71 Some Features of the DRM Registry and Repository Appendix III on Limitations of the Data (Data Quality) for Major Databases!
72 Some Features of the DRM Registry and Repository Harmonization/Standardization of Data Element and XML Tag Names
73 5. Additional Pilots "Where does the FEA go next?", Bob Haycock, Chief Architect, OMB, at the Chief Architects Forum, April 5, 2004: –Complete the DRM. –Conduct DRM Community of Practice Pilots. –Continue to develop and implement further DRM volumes and FEA Data Management Strategy. –Etc.
74 5. Additional Pilots Census Bureau/FedStats (Statistical Abstract of the US): –Lead original Line of Business (Data and Statistics), which was exempted, so it became a logical selection for a best practice pilot! National Indicator System and the Community Statistical System: –GAO, CEQ, Community Indicator Consortium, etc. Sustainable Intergovernmental Network Exchange (SINE): –Global Justice, EPA, Health, etc. Intelligence Community Metadata Working Group (IC MWG): –XML Enablement Strategy and Tool Evaluation. Componenttechnology.Org: –Proposals from participants in this Community of Practice to Populate the Service Grid with Service Components. Categorization of Government Information Working Group of the Interagency Committee on Government Information: –GSA Office of Intergovernmental Solutions (Susan Turnbull) Outreach to Involve State and Local Governments. University of Maryland MINDLab (Professor Jim Hendler) and TopQuadrant (Ralph Hodgson): –Semantic Markup and Tools for Government Content (getting content ready for them!).