Classification of Business Documents DITA BusDocs Subcommittee Meeting 21 January 2008 Presentation with Notes from the Focus Group Meeting of 14 Jan 2008.


1 Classification of Business Documents DITA BusDocs Subcommittee Meeting 21 January 2008 Presentation with Notes from the Focus Group Meeting of 14 Jan 2008

2 Meeting Summary – Classification focus group members include Howard Schwartz, Eric Severson, Amber Swope, and Michael Boses. Howard was not able to attend the meeting due to travel. – Michael presented the enclosed PowerPoint as a starting point for the discussion. – Discussion was captured and incorporated into the PowerPoint under the heading “Notes”. Next steps: – Eric will work on a preliminary mapping of a limited number of document types that illustrate the mapping. – The focus group will present a summary of what we have discussed to the full subcommittee during the January 21 meeting.

3 Introduction - 1 The need for a classification system for business documents arises from: – The desire to identify the specific document set that is being addressed by the subcommittee, as well as the rationale behind that selection – The ability to further analyze the document set using a refinement of the same characteristics used to classify them

4 Introduction - 2 What types of characteristics are important? – Documents can be classified in many ways. The most common way is a semantic classification based upon the textual content of the document – The subcommittee approach is different: we want to classify documents based upon their structural characteristics, since it is the structure of business documents that will need to be harmonized with DITA

5 Potential Structural Characteristics to Consider when Classifying: Is it a narrative? Narrative complexity; document length; tree depth; tree balance; table frequency; table complexity; graphic frequency; XML vocabularies; transclusions. Notes: – Eric feels that repetitive structures will be an important characteristic. – Amber suggests that whether a document references external system data might be important as well. – Howard: understanding the business purpose might be important as a characteristic. – Eric: it could be interesting, but maybe not the driver. We will capture the information as part of the analysis. – Ann: it is possibly a different level of classification. – Josef: translation should not in itself change the structure, but perhaps what we want to look at is documents with variants in them. – Howard: business documents will have different challenges than technical publications.

6 Higher level model – Structures that are not linked to semantics and that can then be correlated to documents for different usages – The end-game is to say: where does DITA fit in? – A “semantic neutral” way of classifying – Apply the general to specific usages later – Eric: concept, task, and reference were specializations to begin with. Are they even meaningful for business documents? – Howard: informational vs. persuasive? Intent or purpose: does it correlate to structure, does it dictate structure, does it matter for reuse?

7 First-level Classification [Diagram labels: Business Documents; Tabular Reports; Forms; Narrative Documents] Notes: while the concept is good, none of us is happy with the terminology. In particular, we need to come up with an alternative for Forms. The purpose of this slide is to say that there are business documents that are out-of-scope. This is our first level?

8 Form-Narrative Scale [Diagram labels: Forms; Narrative Documents; Subject Document] Metric: – Ratio of total elements to total words – Notes: Eric: What is a form? How do we keep from excluding documents with structures that we need to address because we called them a “form”? We need something to describe “form” that isn’t based upon its implementation. “XML blurs the distinction between documents and data.” – A: Elements are “structural” in nature. We need to define what type of elements we will use to arrive at the ratio.
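
The element-to-word ratio on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the notes leave open which element types should count, so the sketch counts every element (including the root) and treats any whitespace-separated token as a word. The function name and the sample documents are hypothetical.

```python
import xml.etree.ElementTree as ET


def form_narrative_ratio(xml_text: str) -> float:
    """Ratio of total elements to total words (higher suggests more form-like)."""
    root = ET.fromstring(xml_text)
    # Assumption: every element counts, including the root.
    total_elements = sum(1 for _ in root.iter())
    # Assumption: a word is any whitespace-separated token in the text content.
    total_words = sum(len(text.split()) for text in root.itertext())
    if total_words == 0:
        return float("inf")  # all structure, no text
    return total_elements / total_words


# Hypothetical sample documents, not taken from the presentation.
form_like = "<form><name>Smith</name><date>2008-01-14</date></form>"
narrative = (
    "<doc><p>The focus group met to discuss classification "
    "of business documents</p></doc>"
)
print(form_narrative_ratio(form_like))  # 3 elements / 2 words = 1.5
print(form_narrative_ratio(narrative))  # 2 elements / 10 words = 0.2
```

Restricting the count to a chosen set of “structural” elements, as the notes suggest, would only change the line that computes `total_elements`.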

9 Most Significant Characteristic? [Diagram labels: Narrative Document; Narrative Density?; Tree Depth?; Document Length?] Once we have established that it is a narrative document, what is the next most significant characteristic to examine? – Notes: general agreement with the presentation that it would be the tree depth of the document.
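
Tree depth is not defined in the slides, so the following is a minimal sketch assuming it means the number of element levels on the longest path from the root to a leaf, with the root counted as level 1. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def tree_depth(elem: ET.Element) -> int:
    """Number of element levels on the longest root-to-leaf path."""
    # A leaf has depth 1; otherwise add one level to the deepest child.
    return 1 + max((tree_depth(child) for child in elem), default=0)


flat = ET.fromstring("<doc><p/><p/><p/></doc>")
nested = ET.fromstring("<doc><sec><sub><p/></sub></sec></doc>")
print(tree_depth(flat))    # 2
print(tree_depth(nested))  # 4
```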

10 Eric: DITA is trying to apply best practices to writing. Is this a fundamental thing about writing, or is it just tech pubs? Should there be a more generic task that could be specialized into a tech pubs task? Ann: what we have now is a specialization for tech docs, and so it fits. It is possible to start at a higher, more generalized level. Interesting that paragraphs have a “topic sentence.” The topic sentence may be an important bridge that allows us to introduce the concept of topic-based authoring to the business community. Business documents are maturing. Are tech docs more mature? Tech docs are most often not read for pleasure and are “random access” information. Writing for reuse has a significant impact on how content is written. Does it invalidate some of our common business document structures?

11 Types of reuse: – The ability to flow one person’s content into another person’s content and have it hold up contextually – The ability to have content presented as a result of a query or aggregation and have it hold its integrity as a single unit of information – Will the message change depending upon how someone arrived at it—either in the original context or by itself? All this ties back to the maturity model that will help organizations move to a “best practice” approach to authoring. This will give us something valuable for business and acceptable to the DITA community. Now our classification can also correlate to this issue.

12 The Need to Quantify Hierarchy [Diagram labels: Narrative Document; Flat Document; Highly Nested Document] The author of the highly nested document is using structure to communicate semantics. Hierarchical Scale – Ratio of total transitions in hierarchy to total elements. Notes: General agreement. No specific comments.
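
The slide does not say what counts as a “transition in hierarchy.” The sketch below takes one possible reading as an assumption: walking the elements in document order, a transition is any step where the nesting depth changes from one element to the next. The function names and sample documents are hypothetical.

```python
import xml.etree.ElementTree as ET


def element_depths(elem: ET.Element, depth: int = 0):
    """Yield the nesting depth of each element in document order."""
    yield depth
    for child in elem:
        yield from element_depths(child, depth + 1)


def hierarchical_scale(xml_text: str) -> float:
    """Ratio of depth transitions to total elements (higher suggests more nesting)."""
    depths = list(element_depths(ET.fromstring(xml_text)))
    # Assumption: a transition is a change in depth between consecutive elements.
    transitions = sum(1 for a, b in zip(depths, depths[1:]) if a != b)
    return transitions / len(depths)


flat = "<doc><p/><p/><p/><p/></doc>"
nested = "<doc><sec><sub><p/></sub></sec><sec><sub><p/></sub></sec></doc>"
print(hierarchical_scale(flat))    # 1 transition / 5 elements = 0.2
print(hierarchical_scale(nested))  # 6 transitions / 7 elements, about 0.857
```

Under this reading a flat document scores near zero as it grows longer, while a highly nested document stays close to one.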

13 Qualifying Narrative Density [Diagram labels: Flat Document (Light Narrative, Dense Narrative); Highly Nested Document (Light Narrative, Dense Narrative)] Narrative Density Scale – Average paragraph length for paragraphs > 100 characters – Notes: no specific comments
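
The narrative density scale can be sketched as follows, assuming paragraphs have already been extracted as plain strings and that length is measured in characters. The function name and the handling of documents with no qualifying paragraphs are assumptions for illustration.

```python
from typing import Optional, Sequence


def narrative_density(paragraphs: Sequence[str], min_chars: int = 100) -> Optional[float]:
    """Average length, in characters, of paragraphs longer than min_chars."""
    # Short paragraphs (headings, labels, list items) are excluded, per the slide.
    lengths = [len(p) for p in paragraphs if len(p) > min_chars]
    if not lengths:
        return None  # Assumption: no qualifying paragraphs means no score.
    return sum(lengths) / len(lengths)


# Hypothetical paragraphs of 5, 120, and 200 characters.
sample = ["Title", "a" * 120, "b" * 200]
print(narrative_density(sample))  # (120 + 200) / 2 = 160.0
```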

14 Recap of Characteristic Importance: Is it a narrative? Narrative complexity; document length; tree depth; tree balance; table frequency; table complexity; graphic frequency; XML vocabularies; transclusions. Notes: Eric: we need to address repetitive structures (i.e., topics) and constrained structures. What do repetitive structures and constrained structures mean to DITA? Michael: the number of paragraphs per section seems important, but what is a section?

15 Notes: Additional Discussion. Discussion of an SOP as it relates to repeating structures: – One approach to an SOP is for it to be very verbose, with only 4-5 “structures” – Another approach is for it to be very terse, with 20 structures that add semantics to the content. The goal of XML in general, when applied to narrative documents, is to imply more and more of the semantics through the document structure. “Document linearity with repeating structures” as a structural characteristic provides “random access” to the information in the document. Repetitive structures appear to be as important a characteristic as the tree depth, if not more. Do repetitive structures to a degree indicate whether the document is a reference or something intended to be read end-to-end? Repetitive structures cause a document to actually be a collection of mini-documents, each of which could stand alone.

