
1 Translation Quality Measurement
By Riccardo Schiaffino and Franco Zearo

2 Biographical Notes on the Authors
Riccardo Schiaffino
Riccardo Schiaffino worked as a translator, translation manager and special software translation project lead for a major software company. As a translation manager, Riccardo worked on the improvement of translation quality and on translation quality metrics and tools. He holds an MA degree in Translation, and has been working in translation for over 18 years, first in Italy and then in the U.S. Riccardo is ATA accredited. Contact:
Franco Pietro Zearo
Franco Pietro Zearo is a project manager with Lionbridge Technologies in Boulder, Colorado. He holds a degree in translation from the Advanced School of Modern Languages for Translators and Interpreters at the University of Trieste, Italy, and earned an MBA from the University of Phoenix. Before joining Lionbridge in 1996, he worked as a freelance technical translator in Italian, English, and Russian. At Lionbridge, he has held positions in translation, localization analysis, presales, and cultural and globalization consulting. He has been responsible for translation quality on numerous projects for many Fortune 500 clients. In his previous role as senior technical translator, he helped define best practices for the translation department. Contact:

3 Overview Technical translation and quality
Translation quality initiatives; Quality Control vs. Quality Assurance; Our proposal for quality assurance; Checklists; Sampling techniques; Conclusions; Importance of cost/benefit factors
We shall deal here exclusively with technical or commercial translation. Probably the first reaction of a translator is that translation quality cannot be measured, but that he or she can tell a good-quality translation from a middling or poor one. This is all to the good, but without quality measurement, your “good” translation could be my “poor” one, etc. We should distance ourselves from subjective quality criteria and strive for objective ones. Of course, all of this only concerns technical or business translation: translation quality metrics for literary translation would be as inappropriate as any attempt to judge an artistic endeavor on quantitative criteria.

4 Overview Measuring Quality Translation Quality Assessment
Quality Assurance Forms; Error Categories; Sampling; Translation Quality Index; Questions and Answers
(Franco) We’ll start with an overview of last year’s presentation, in particular in the field of translation quality assessment. We’ll talk about our definition of quality, why measurements are important, and what to measure. We’ll then talk about the work that we are doing to improve and control translation quality in our companies. Finally, we’ll mention the concept of the translation quality index, and describe the work we still need to do to get there. At the end, we’ll open the floor to questions (please save your questions for the end of our presentation). (Remind audience about the feedback form.)

5 Overview Why Is Quality Measurement Important?
How to Set Up a Quality Measurement System Demo of a Translation Quality Measurement Tool Prototype Practical Recommendations Questions & Answers Last year, we introduced the concept of the Translation Quality Index (TQI), an innovative approach to the measurement of quality in translation. This year, we will expand on this topic and present additional research in the field of translation quality assurance. In particular, we will show how to set up a quality measurement process and present the prototype of a tool for measuring translation errors in a more consistent way.

6 Our Definition of Quality
Functional approach to quality. Different views of translation lead to: → Different concepts of quality → Different assessments. Quality is defined as meeting the needs and expectations of the customer or user. Item 2: (House, 1997; as quoted in Schäffner, 1998). implicit or explicit (Newmark... I don’t have the exact quote).

7 Our Definition of Quality
Functional approach to quality Quality is defined as consistently meeting the needs and expectations of the customer or user Different views of translation lead to: Different concepts of quality Different assessments Quality is defined as consistently meeting the needs and expectations of the customer or user. “A predictable degree of uniformity and dependability at low cost and suited to the market” – Deming “Evaluating the quality of a translation presupposes a theory of translation. Thus different views of translation lead to different concepts of translation quality, and hence different ways of assessing it.” (House, 1997; as quoted in Schäffner, 1998). Any act of translation presupposes a theory of translation, implicit or explicit (Newmark... I don’t have the exact quote).

8 Correct Translation
A correct translation is a translation with no errors, or one whose total error points yield a Translation Quality Index above the desired threshold.

9 Customer-driven Considerations
Conformance to specifications: customer’s vs. one’s own
Fitness for use: how well the translation performs its intended purpose
Value (= quality & price): how well the translation performs its intended purpose at a price customers are willing to pay
Support: e.g., printing, testing
Psychological impressions: e.g., in-country translators; certification

10 Customer-driven Considerations
Price / Time / Quality

11 Importance of Quality
Quality as a Competitive Weapon. Good Quality → Higher Profits. Good quality of translation (product) and service (process) can pay off in higher profits. Improving on quality can reduce costs and speed up time-to-market.

12 Why is Quality Measurement Important?
You can’t manage what you can’t measure. It is difficult to improve something if you cannot measure it. Such measurement should be repeatable and objective. Different persons should arrive at a similar assessment for the same piece of translation.

13 Why is Quality Measurement Important?
It is difficult to improve something if you cannot measure it. Such measurement should be repeatable and objective. Different evaluators should arrive at a similar assessment for the same piece of translation.
Why is translation quality measurement important? You can’t manage what you can’t measure. It is difficult to improve something if you cannot measure it. Such measurement should be repeatable (i.e., consistent) and objective (i.e., different persons should arrive at a similar assessment for the same piece of translation). This means that it has to be devoid, as far as possible, of subjective bias.
“When you can measure what you are speaking about and express it in numbers, you know something about it; but when you cannot measure it, cannot express it in numbers, then your knowledge is of a meager and unsatisfactory kind” Lord Kelvin,

14 Why is Quality Measurement Important?
It is difficult to improve something if you cannot measure it. Metrics provide: a way to objectively quantify a process; a means to reduce the cost of poor quality; a means to increase customer satisfaction; an opportunity for benchmarking; competitive advantages.
“When you can measure what you are speaking about and express it in numbers, you know something about it; but when you cannot measure it, cannot express it in numbers, then your knowledge is of a meager and unsatisfactory kind” Lord Kelvin,

15 “You cannot measure quality”
This is not true. There are certain components of translation quality that will always remain subjective. However, there are other elements that can be objectively measured. By concentrating on these, we believe we can achieve a satisfactory measurement of translation quality.

16 Who Benefits from Reliable Translation Quality Measurement?
Professional Translators Translation Companies and In-House Translation Departments Translation Customers and Users Professional Translators “Professional translators need [translation quality assessment] because there are so many amateur translators who work for very little money that professional translators will only be able to sell their products if there is some proof of the superior quality of their work.” - Hönig (1998), p. 15 Translation Companies and Departments Provides a baseline for future improvements Translation Customers and Users “Users need [translation quality assessment] because they want to know whether they can trust the translators and rely on the quality of their products.”- Hönig (1998), p. 15

17 Why Do We Make Errors?
The reasons behind the errors are separate from the measurement of the errors: studying why errors happen is important, but it pertains more to quality control and improvement than to quality assurance. E.g., capitalization errors due to the "Autocorrect" (mis)feature of MS Word (e.g., HBsAg "corrected" to HbsAg).

18 Quality Assurance (QA)
QC vs QA Quality Control (QC) Quality verification over the whole text. Example: editing. Quality Assurance (QA) Sampling techniques, control of quality over a (statistically significant) sample of the whole text. Example: quality measurement.

19 QC vs QA Quality Control (QC) Quality Assurance (QA)
Quality verification over the whole text. Example: Editing. Quality Assurance (QA) Sampling techniques, control of quality over a (statistically significant) sample of the whole text. Appropriate use: Quality measurement. Translation Quality Initiatives ISO 9000 series EUATC Quality Standard DIN 2345 ASTM Standard for Language Translation F15.48 SAE J2450 UNI EN 10754 LISA QA Model Academic translation theories and studies Private sector methodologies Mostly 9002; ILE was ISO 9001 certified Quality Manual Quality control procedure Process-oriented Very expensive Perhaps good for the manufacturing industry EUATC (European Union of Associations of Translation Companies) Similar to ISO Process is certified (not the final result) Defines relationship between client and translation provider Selection of the best supplier Consumer-Focused Guide to Quality Language Translation Translation quality metric developed by a sub-committee of the Society of Automotive Engineers (SAE) Product-oriented Sectorial (Stated objective: “[to] establish a consistent standard against which the quality of automotive service information can be objectively measured regardless of the target language and regardless of how the translation is performed—that is, human translation or machine translation.” Eckersley, p.39 Intended for service manuals in the automotive industry only

20 QC v QA Quality Control (QC) Quality Assurance (QA)
Quality verification over the whole text. Example: Editing Quality Assurance (QA) Sampling techniques, control of quality over a (statistically significant) sample of the whole text. Appropriate use: Quality measurement
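
The QC/QA distinction above can be illustrated with a minimal sketch: QC would review every segment, while QA measures quality on a random sample only. The function name `draw_sample`, the 10% fraction, and the 200-segment document are assumptions made for illustration; the slides do not say how the sample is drawn or how large it must be to be statistically significant.

```python
import random


def draw_sample(segments, fraction=0.1, seed=None):
    """Return (position, segment) pairs for a simple random sample of the text."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be greater than 0 and at most 1")
    if not segments:
        return []
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    size = max(1, round(len(segments) * fraction))
    # Sample positions rather than text so reviewers can locate each
    # segment in the full document.
    positions = sorted(rng.sample(range(len(segments)), size))
    return [(pos, segments[pos]) for pos in positions]


# Hypothetical document of 200 segments: QC would edit all 200,
# QA measures quality on the sample only.
segments = [f"Segment {n}" for n in range(1, 201)]
sample = draw_sample(segments, fraction=0.1, seed=42)
print(len(sample), "of", len(segments), "segments selected for QA")
```

The sample fraction is a placeholder; in practice it would be chosen from the document size and the confidence level required.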

21 Translation Quality Factors
Accuracy: The TL text corresponds to the SL text with regard to terminology, etc.
Usability: The TL text can appropriately be used for the purpose for which it was intended (e.g., a manual can be used to instruct about the operation of some equipment or program, a contract can be used to specify exactly what the terms and conditions of an agreement are, etc.).

22 Inspection Points
Key Principle: Reject “defective material” at its lowest value.
[Diagram labels: SL Content Development (GIGO), Translation, Edit, Proof; $ Value of Service; Stages of Production]
GIGO = Garbage In, Garbage Out. The whole concept here is that the sooner you can introduce quality into the process, the less expensive it is to make corrections. The red circles indicate desirable check points, where quality is measured.

24 Cost/Benefit Analysis
Quality measurements are a tool to determine the optimal level of quality. They could help us identify a cut-off point. The enemy of “good” is “better”. Quality measurements give us the tools to determine precisely which level of translation quality we should aim at. There is probably a point at which any additional dollar (or additional hour) spent in improving quality buys less and less improvement.

25 Ideas from other disciplines
Software project management techniques; W. Edwards Deming and other quality assurance experts
(Riccardo) Contribution of other disciplines to the measurement of quality in translation
Ideas from software development project management techniques: There are some statistical methods used in the management of software development projects that we think are suitable to be adapted to the management of translation projects, certainly as regards the translation of software products but also other technical translation projects.
Defect counts: “At the most basic level, defect counts give you a quantitative handle on how much work the project team has to do before it can release the software. By comparing the number of new defects to the number of defects resolved each week, you can determine how close the project is to completion. Until the number of defects solved per time period exceeds the number of new defects discovered, the project is still very far from over. If the project’s quality level is under control and the project is making progress toward completion, the number of open defects should generally trend downward after the middle of the project and then remain low.” (McConnell, 1998)
Statistics on effort per defect: “The data on time required to fix defects categorized by type of defect will provide a basis for estimating remaining defect-correction work on [the current] and future projects. The data on phases in which defects are detected and corrected also gives you a measure of the efficiency of the [translation quality control] process. If 95% of the defects are detected in the same phase they were created, the process is very efficient; if 95% of the defects are detected one or more phases after the phase in which they were created, the project has a lot of room for improvement.” (ibid.)
Defect density prediction: “One [...] way to judge whether a [translation project] is ready to release is to measure its defect density [the number of defects per page, per 1000 words or per screen]. Suppose that release 1.0 of our documentation consists of 2,000,000 words, that the editing process detected and corrected 9,000 errors, and that a further 1,000 errors were discovered after the documentation was released: the overall defect count for this documentation release would be 10,000, with a density of 5 errors per 1000 words (about one error per page). (notes continue on next page)
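
The defect density arithmetic in the notes can be checked with a short sketch. The function name `defect_density` is hypothetical; the figures are the ones given for release 1.0.

```python
def defect_density(defects, words, per=1000):
    """Return the number of defects per `per` words (default: per 1000 words)."""
    if words <= 0:
        raise ValueError("words must be positive")
    return defects * per / words


# Figures from the notes for release 1.0 of the documentation.
found_in_editing = 9_000
found_after_release = 1_000
total_words = 2_000_000

density = defect_density(found_in_editing + found_after_release, total_words)
print(density)  # 5.0 errors per 1000 words
```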

26 When we study translation quality, we can focus on different things:
The translator The translation process (the “process”) Much emphasis has been placed on claiming that quality translations are achieved thanks to the adoption of a particular process (e.g. ISO, DIN). Another approach is highlighting that only native speakers (or professional translators with various combinations of experience, expertise, background) are used. However, the process and the people or players (manufacturers would say “tools”) are only indirect (or external) indicators of quality. We are interested in finding out what quality attributes are to be found (or lacking) in the translated text (the “product”). (direct measurement of intrinsic quality) The translated text (the “product”)

27 Product & Process Assessment
Translation quality assessment must apply to both: The translated text (the “product”) The translation process (the “process”) Does following to the letter a controlled process = quality, as the ISO 9000 certification seems to imply? (ISO 9000 example; example from “Dilbert” has been removed due to copyright concerns.) We do not think so: Not only must the process be accurately described and implemented, but it must be well thought out and appropriate. In this case, following the process may be an external indication that quality is being built in the process itself.

29 Translation Quality Initiatives
ATA and other translators’ certification initiatives The translator The translation process DIN 2345 ISO 900x UNI EN EUATC ASTM SAE J2450 LISA QA Translation Quality Initiatives ISO 9000 series EUATC Quality Standard DIN 2345 ASTM Standard for Language Translation F15.48 SAE J2450 UNI EN 10754 LISA QA Model Academic translation theories and studies (see Bibliography; also, the Geneve Translation school) Private sector methodologies (e.g. Microsoft) Mostly 9002; ILE was ISO 9001 certified Quality Manual Quality control procedure Process-oriented Very expensive Perhaps good for the manufacturing industry EUATC (European Union of Associations of Translation Companies) Similar to ISO Process is certified (not the final result) Defines relationship between client and translation provider Selection of the best supplier Consumer-Focused Guide to Quality Language Translation Translation quality metric developed by a sub-committee of the Society of Automotive Engineers (SAE) Product-oriented Sectorial (Stated objective: “[to] establish a consistent standard against which the quality of automotive service information can be objectively measured regardless of the target language and regardless of how the translation is performed—that is, human translation or machine translation.” Eckersley, p.39 Intended for service manuals in the automotive industry only Academic translation assessment theories and studies Private sector methodologies The translated text

30 Translation Quality Initiatives
ISO 9002; EUATC Quality Standard; DIN 2345; ASTM Standard for Language Translation; SAE J2450; LISA QA Model; Academic translation theories and studies; Private sector methodologies
ISO 9002: Quality control procedure; Quality Manual; Process oriented; Very expensive; Perhaps good for the manufacturing industry
EUATC (European Union of Associations of Translation Companies): Similar to ISO; Process is certified (not the final result)
DIN 2345: Defines relationship between client and translation provider; Selection of the best supplier
ASTM (American Society for Testing and Materials): Defines guidelines; Procurement Guidelines
SAE J2450 (Society of Automotive Engineers): Quality metric focused on identifying and recording translation errors; Numeric weights; Score sheet (weighted score); Regardless of SL or TL
LISA QA Model
Key approaches: Repeatability (one person, same work twice); Reproducibility (two people should achieve the same result)

31 Quality Measurement: Our Proposal
What Can Other Disciplines Teach Us? Use checklists to collect the data Identify types of errors, issues or problems Determine relative importance of issues (may be different for different languages; e.g., spelling errors in English, French or Italian) Use sampling techniques to assess your quality level Determine percent thresholds for various levels of quality Determine whether you have achieved your target quality or not Contribution of other disciplines to the measurement of quality in translation Ideas from software development project management techniques There are some statistical methods used in the management of software development projects that we think are suitable to be adapted to the management of translation projects, certainly as regards the translation of software products but also other technical translation projects as well.

32 Criteria for Successful Quality Measurements
Translation quality measurements should be: Repeatable (two assessments of the same sample yield similar results); Reproducible (different evaluators should arrive at a similar assessment for the same piece of translation); Objective (devoid of subjective bias).
Such measurement should be repeatable (i.e., consistent) and objective (i.e., different persons should arrive at a similar assessment for the same piece of translation). This means that it has to be devoid, as far as possible, of subjective bias. Note the emphasis on “similar”: ideally, it should be “same”. But, alas, translation evaluation is still a very subjective undertaking.
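
One way to make “similar” concrete is to set a tolerance and compare scores, as in the sketch below. This is an illustration under stated assumptions, not a method given in the slides: the function names, the evaluator labels, the scores (on a 0 to 100 scale) and the tolerance are all hypothetical.

```python
def spread(scores):
    """Difference between the highest and lowest score."""
    return max(scores) - min(scores)


def is_repeatable(scores_by_evaluator, tolerance):
    """True if each evaluator's repeated scores for the same sample agree within tolerance."""
    return all(spread(scores) <= tolerance for scores in scores_by_evaluator.values())


def is_reproducible(scores_by_evaluator, tolerance):
    """True if the average scores of different evaluators agree within tolerance."""
    means = [sum(scores) / len(scores) for scores in scores_by_evaluator.values()]
    return spread(means) <= tolerance


# Two evaluators each assess the same sample twice (hypothetical scores).
scores = {"evaluator_a": [67, 69], "evaluator_b": [64, 66]}
print(is_repeatable(scores, tolerance=5))    # True
print(is_reproducible(scores, tolerance=5))  # True
```

Tightening the tolerance to 2 would still pass the repeatability check here but fail reproducibility, since the two evaluators' averages differ by 3 points.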

33 Classification of Errors

34 Measurement through Circumstantial Evidence
Errors are circumstantial evidence of quality We believe that precise error measurement provides sufficient indication of good and bad translations A good translation is a translation with very few errors or none at all

35 Definition of Errors
Deal with errors only when they violate agreed-upon protocols of engagement, whether implicit or explicit. Examples of explicit and implicit criteria: non-compliance errors (e.g., not following instructions); violations of generally accepted language conventions. For generally accepted language conventions, you’d better specify authoritative sources (dictionaries, grammars, and style guides) to avoid falling into subjective evaluations.
Summary: Error Categorization. Select a (small) set of categories. CTQ: Critical-To-Quality categories. Provide clear definitions. Assign a weight: Critical, Major, Minor.
Concept: You should not measure too many things; keep it simple. Not all the elements in your checklist need to be measured (unless you are applying the first definition of sampling).
Quote: “When setting metrics, keep the number of measurements small.” “The key here is quality over quantity.” “When establishing a metric, you need to know why you’re measuring it, why it’s important, and what’s causing the results.” Source: Brue (2002), p. 48
Concept: When is a metric important? Look at the consequences, e.g., errors that could lead to liability, health, or safety issues. Critical errors may require the recall of the localized product from the market. Major errors may require a correction to the current release of the localized product. Minor errors may require a correction for the next release of the localized product.
Concept: Critical-to-quality (CTQ). Quote: “Elements of a process that significantly affect the output of that process. Identifying these elements is vital to figuring out how to make the improvements that can dramatically reduce costs and enhance quality.” Source: Brue (2002), p. 15
Concept: When defining error categories, avoid leaving room for subjective interpretation. When it comes to identifying an error or a mistranslation, nothing is black and white. So, you must appeal to a “higher authority” (e.g., L'Académie française), or to reference books that are generally accepted as being authoritative. E.g., for Italian:
Spelling in accordance with Zingarelli, N. Vocabolario della Lingua Italiana, 16ª ed., Zanichelli, 2002
Grammar in accordance with Dardano M., Trifone, P. La Lingua Italiana, 1ª ed., Zanichelli, 1985
Style in accordance with Lesina, R. Il Manuale di Stile, 1ª ed., Zanichelli, 1986
Provide real-life examples of errors.
Criteria: something that you define explicitly or that is implicit in something that you defined explicitly (e.g., obiettivo / obbiettivo, or spelling according to a certain dictionary).
Google as an ad hoc tool.
Try to limit as far as possible what is covered by subjective judgments.
Real-life examples: “Domestic” Spanish (good); French punctuation rules (bad).
Solution: client education about translation, etc. Otherwise client education happens during the process, in a more painful way.

36 Summary: Error Categorization
Select a (small) set of categories. CTQ: Critical-To-Quality categories. Provide clear definitions. Set tolerance limits: min / max number of errors per X words. Assign a weight: Critical, Major, Minor.
Concept: You should not measure too many things; keep it simple. Not all the elements in your checklist need to be measured (unless you are applying the first definition of sampling).
Quote: “When setting metrics, keep the number of measurements small.” “The key here is quality over quantity.” “When establishing a metric, you need to know why you’re measuring it, why it’s important, and what’s causing the results.” Source: Brue (2002), p. 48
Concept: When is a metric important? Look at the consequences, e.g., errors that could lead to liability, health, or safety issues. Critical errors may require the recall of the localized product from the market. Major errors may require a correction to the current release of the localized product. Minor errors may require a correction for the next release of the localized product.
Concept: Critical-to-quality (CTQ). Quote: “Elements of a process that significantly affect the output of that process. Identifying these elements is vital to figuring out how to make the improvements that can dramatically reduce costs and enhance quality.” Source: Brue (2002), p. 15
Concept: When defining error categories, avoid leaving room for subjective interpretation. When it comes to identifying an error or a mistranslation, nothing is black and white. So, you must appeal to a “higher authority” (e.g., L'Académie française), or to reference books that are generally accepted as being authoritative. E.g., for Italian:
Spelling in accordance with Zingarelli, N. Vocabolario della Lingua Italiana, 16ª ed., Zanichelli, 2002
Grammar in accordance with Dardano M., Trifone, P. La Lingua Italiana, 1ª ed., Zanichelli, 1985
Style in accordance with Lesina, R. Il Manuale di Stile, 1ª ed., Zanichelli, 1986
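
The steps on this slide (a small set of categories, a severity weight, a tolerance limit per X words) can be sketched as follows. The category names, the numeric weights for Critical, Major and Minor, and the tolerance value are assumptions chosen for illustration; the slides do not specify them.

```python
# Hypothetical weights: the slides name the three severities but not their values.
SEVERITY_WEIGHTS = {"critical": 10, "major": 5, "minor": 1}


def error_points(errors):
    """Sum the severity weights of (category, severity) error records."""
    return sum(SEVERITY_WEIGHTS[severity] for _category, severity in errors)


def points_per_1000_words(errors, words):
    """Normalize error points to a rate per 1000 words."""
    if words <= 0:
        raise ValueError("words must be positive")
    return error_points(errors) * 1000 / words


def within_tolerance(errors, words, max_points_per_1000):
    """True if the weighted error rate does not exceed the tolerance limit."""
    return points_per_1000_words(errors, words) <= max_points_per_1000


errors = [
    ("terminology", "major"),
    ("spelling", "minor"),
    ("spelling", "minor"),
    ("accuracy", "critical"),
]
print(error_points(errors))                                   # 17
print(points_per_1000_words(errors, 5000))                    # 3.4
print(within_tolerance(errors, 5000, max_points_per_1000=4))  # True
```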

37 Summary: Error Categorization
Select a (small) set of categories CTQ: Critical-To-Quality categories Provide clear definitions Assign a weight Critical, Major, Minor

38 Real Life Examples
Development of translation quality measurement at J.D. Edwards; Use of sampling techniques for quality assurance at Lionbridge
(Riccardo) Use Quality Assurance forms to collect the data. Identify types of errors, issues or problems. Determine relative importance of issues (may be different for different languages; e.g., spelling errors in English, French or Italian). Use sampling techniques to assess your quality level. Determine percent thresholds for various levels of quality. Determine whether you have achieved your target quality or not.
(continuation of note from previous page) Suppose now that the additional material translated for release 1.5 had a defect density of 7 errors per 1000 words. You now translate version 2.0, and are finding a defect density of 2 errors per 1000 words. Unless you have good reason to think that your translation process was dramatically improved, you would normally expect to find between 5 and 7 errors per 1,000 words, and finding only 2 per 1,000 words may suggest that there is still quite a bit of editing and QC work to do. The more good historical data you have, the better your forecasts will be about the amount of work necessary to identify and correct errors before the actual release of the translation project.
Defect pooling: A simple defect prediction technique is to separate defects into two pools
Defect seeding: Technique by which a known number of defects is deliberately seeded in a work as a means of estimating how many (unseeded) defects are left to find: by seeing how many of the known defects are found during a QC operation and how many “unseeded” ones are found, it is possible to estimate how many unseeded defects are left to find.
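
The defect seeding estimate described in the notes can be sketched with the usual proportional reasoning: if QC finds a given share of the seeded defects, assume it found the same share of the unseeded ones. The notes do not spell out a formula, so this proportion is an assumption; the function names and the example counts are hypothetical.

```python
def estimate_total_unseeded(seeded_planted, seeded_found, unseeded_found):
    """Estimate the total number of unseeded defects in the work.

    Assumes QC detects seeded and unseeded defects at the same rate.
    """
    if not 0 < seeded_found <= seeded_planted:
        raise ValueError("seeded_found must be between 1 and seeded_planted")
    return unseeded_found * seeded_planted / seeded_found


def estimate_unseeded_remaining(seeded_planted, seeded_found, unseeded_found):
    """Estimate how many unseeded defects are still left to find."""
    total = estimate_total_unseeded(seeded_planted, seeded_found, unseeded_found)
    return total - unseeded_found


# Hypothetical QC pass: 50 errors seeded, 40 of them found,
# together with 120 genuine (unseeded) errors.
print(estimate_total_unseeded(50, 40, 120))      # 150.0
print(estimate_unseeded_remaining(50, 40, 120))  # 30.0
```

Any seeded defects not found by QC must of course be removed before the translation is released.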

39 The J.D. Edwards’ QA Form Language Customization
Weighting the major categories. Work still in progress.
J.D. Edwards uses a QA form that is a modified version of the LISA Quality Assurance Form, changed so as to reflect the fact that different kinds of errors should be given a different weight for different languages. After deciding which are the major categories we want to control, the first step is to assign a different % weight to the major error categories.

40 The J.D. Edwards’ QA Form Language Customization
Weighting the items within the major categories The second step is to assign a weight to the various elements within the major categories (more detailed/legible view to follow)

41 The J.D. Edwards’ QA Form Language Customization
Weighting the items within the major categories (detail) (detail from previous slide)
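
The two weighting steps (a % weight per major category for each language, then a weight for each item within a category) might be combined as in the sketch below. All category names, item names and weight values are invented for illustration; the actual J.D. Edwards form is not reproduced in the transcript.

```python
# Step 1 (hypothetical values): % weight of each major category, per language.
CATEGORY_WEIGHTS = {
    "it": {"accuracy": 50, "language": 30, "style": 20},
    "fr": {"accuracy": 50, "language": 35, "style": 15},
}

# Step 2 (hypothetical values): % weight of each item within its category.
ITEM_WEIGHTS = {
    "accuracy": {"mistranslation": 70, "omission": 30},
    "language": {"spelling": 40, "grammar": 60},
    "style": {"register": 100},
}


def check_weights():
    """Raise ValueError unless every group of weights sums to 100%."""
    for language, categories in CATEGORY_WEIGHTS.items():
        if sum(categories.values()) != 100:
            raise ValueError(f"category weights for {language} must sum to 100")
    for category, items in ITEM_WEIGHTS.items():
        if sum(items.values()) != 100:
            raise ValueError(f"item weights for {category} must sum to 100")


def weighted_penalty(language, error_counts):
    """Combine both weight levels for counts keyed by (category, item)."""
    categories = CATEGORY_WEIGHTS[language]
    total = 0
    for (category, item), count in error_counts.items():
        total += categories[category] * ITEM_WEIGHTS[category][item] * count
    return total / 100


check_weights()
counts = {("language", "spelling"): 2, ("accuracy", "omission"): 1}
print(weighted_penalty("it", counts))  # 39.0
print(weighted_penalty("fr", counts))  # 43.0
```

The same error counts produce a higher penalty for the language whose form gives more weight to the "language" category, which is the point of customizing the form per language.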

42 How We Worked to Develop Our Spreadsheet
Determine type of errors, issues or problems Determine relative importance of issues (may be different for different languages; e.g., spelling errors in English, French or Italian) Determine which are the responsibility of translation Determine tolerance limits for various levels of quality

43 Translation Quality Measurement Tool
The Translation Quality Measurement tool helps to measure process quality It is NOT an editing tool, but it serves to measure whether a process is effective

44 Use of the Tool
Use the tool to measure the effectiveness of the quality control process. Analyze the results obtained through the tool (control charts). If the process is NOT in statistical control: discover special causes and deal with them appropriately (remove them if they are negative; incorporate them in the process if they are positive). Improve the process when it is in statistical control.
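
The slide mentions control charts without saying which kind. As one possible illustration, the sketch below uses a u-chart (defects per unit with three-sigma limits), a common choice for defect counts; this choice, the function name `u_chart`, and the sample data are assumptions, not the authors' stated method.

```python
import math


def u_chart(defects, units):
    """Compute u-chart points for samples of possibly different sizes.

    defects[i] is the number of errors found in sample i and units[i] its
    size (here in thousands of words). Returns the center line and, for each
    sample, a tuple (rate, lower_limit, upper_limit, in_control).
    """
    center = sum(defects) / sum(units)
    points = []
    for found, size in zip(defects, units):
        sigma = math.sqrt(center / size)
        lower = max(0.0, center - 3 * sigma)
        upper = center + 3 * sigma
        rate = found / size
        points.append((rate, lower, upper, lower <= rate <= upper))
    return center, points


# Hypothetical error counts from six samples of 1000 words each.
defects = [5, 6, 4, 7, 5, 20]
units = [1, 1, 1, 1, 1, 1]
center, points = u_chart(defects, units)
for number, (rate, lower, upper, ok) in enumerate(points, start=1):
    status = "in control" if ok else "out of control: look for a special cause"
    print(f"sample {number}: {rate:.1f} per 1000 words ({status})")
```

In this made-up data only the last sample falls outside the limits, which would be the signal to look for a special cause before trying to improve the process as a whole.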

45 A TQI Tool Prototype
This tool is only a prototype. It will need to be externalized so as to be usable on text other than MS Word files. Also, additional functionality is needed to make it more robust, useful and user-friendly. Nonetheless, it is a functioning tool and has already been used to help score translation tests for a translation organization.

46 ATA Implementation

47 ATA Implementation

48 SAE Implementation (Modified)

49 SAE Implementation (Modified)

50 TQI Log
Error | Category | EP | Remarks | Bookmark
Is | Formal | 2 | irregular capitalization; should be is, not Is | x2_is
aluminium | | 1 | British spelling; American English should be aluminum, not aluminium | x3_aluminium
food | Meaning | | The container is not made to cook food, it is made to brew a beverage. | x5_food
right for the gas cooker, the electric plate and the pyroceram | | | A better phrase might be: "acceptable for use on gas and electric stoves." | x6_right_for
pyroceram | | 4 | The word "pyroceram" is unknown to most English speakers. | x7_pyroceram
wash | | | In English, the word "wash" typically means water and soap. The instructions specify only using water, so a better word choice would be "rinse." | x8_wash
trow | | | The word "trow" is a misspelling of "throw." | x9_trow
Path: C:\Documents and Settings\RS \Desktop\Quality Measurement; File: CoffeMakerTest.doc; Grader: RS; Date: 11/1/2003
total 14; N. of words 42; TQI 67%
Error = The word or sentence containing the error
Category = The category for the error
EP = Error Points
Remarks = Detailed description of the error
Bookmark = The bookmark assigned in MS Word by the tool (used to easily return to a specific error)
Path = Path for the file tested
File = File name of the file tested
Grader = NT sign-on of grader (from system)
Date = Date the test was graded (from system)
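
The totals in the log (14 error points, 42 words, TQI 67%) are consistent with computing the index as one minus error points per word. This formula is inferred from the figures and is not stated in the transcript, so the sketch below, including the function name `tqi` and the floor at zero, should be read as an assumption.

```python
def tqi(error_points, words):
    """Translation Quality Index as a fraction between 0 and 1.

    Assumed formula: 1 - error_points / words, floored at 0.
    """
    if words <= 0:
        raise ValueError("words must be positive")
    return max(0.0, 1 - error_points / words)


# Totals from the log above.
print(f"{tqi(14, 42):.0%}")  # 67%
```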

51 Use of Checklists There are several quality assessment methodologies that rely on the use of checklists – among these the LISA methodology.

52 Use of Checklists There are several quality assessment methodologies that rely on the use of checklists – among these the LISA methodology. We would like, however, to advocate the use not of “universal” checklists, but of checklists specifically tailored to each language. Checklists for evaluating translation companies Checklists and tests for evaluating translators Checklists for evaluating translations Limitations of universal checklists Language-specific checklists (for example, different weight of spelling correctness for different languages)

53 Development of Translation Quality Measurement at J.D. Edwards
From the concept of checklists to a spreadsheet of measurements Checklists are appropriate to control whether a certain action has been performed or not (e.g., spell check done or not – as opposed to a measurement of how many spelling mistakes were found) Based on LISA model (www.lisa.org) Flexibility (different settings for different languages)

54 Use of Quality Assurance Forms
The LISA Quality Assurance Form J.D. Edwards loosely based its Quality Assurance Form on the LISA Quality Assurance Form Lionbridge developed its own Language Compliance Form It is important to remember that these tools are for assessing the performance achieved: They are not quality control instruments (as described earlier: QC = editing, proofreading, etc. on the whole text, as against QA work on a sample of the text). They may be used, instead, to verify how effective our quality control efforts are.

55 Purposes of sampling according to LISA
To determine whether something has been done or not. To accept / reject the batch of product at hand. To determine if the process that produced the product at hand was within acceptable limits. (Franco) Purposes of sampling according to LISA: Sampling to determine whether something has been done or not. Concept: Checklists, by themselves, are not tools suitable for QA (but they are for QC). (E.g.: Checked spelling? Yes/No. This does not tell how many spelling errors were made, or whether the number of spelling errors was within the tolerance limits.) Sampling to accept or reject the batch of product at hand (Lionbridge’s main purpose for sampling). Sampling to determine if the process that produced the product at hand was within acceptable limits (see Deming).

56 Guidelines for Sampling
Select a sample Selection criteria (e.g. random, systematic) Size considerations Cost considerations Evaluate the sample Repeatable, reproducible, objective Investigate the outcome / causes Correct / Improve Without proper guidance, a sample may be selected improperly, which would introduce bias and compromise the evaluation effort. Sample size: The greater the sample, the more accurate the assumptions that can be made. Concept: Economic value of the sample Quote: “The value of sample information and the cost of the sample both increase as sample size increases. The optimum sample size is that which balances the cost and value of the sample.” Source: see “How big should a sample be?” in Spurr and Bonini, p
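The two selection criteria named above can be sketched in a few lines. This is only an illustration, assuming the translation has already been split into a list of segments; the function names and the fixed seed are hypothetical and are there only to make a selection reproducible by a second reviewer.

```python
import random


def random_sample(segments, size, seed=None):
    """Simple random sample: every segment has the same chance of selection."""
    rng = random.Random(seed)
    return rng.sample(segments, size)


def systematic_sample(segments, size, seed=None):
    """Systematic sample: every k-th segment, starting from a random offset."""
    if not 0 < size <= len(segments):
        raise ValueError("size must be between 1 and the number of segments")
    step = len(segments) // size
    start = random.Random(seed).randrange(step)
    # Slicing may yield one segment too many when len(segments) is not a
    # multiple of size, so the result is trimmed to the requested size.
    return segments[start::step][:size]


segments = [f"segment {i}" for i in range(1, 101)]
print(random_sample(segments, 5, seed=1))
print(systematic_sample(segments, 5, seed=1))
```

A systematic sample spreads the reviewed segments evenly through the text, while a random sample avoids any periodic pattern in the source; which is preferable depends on the material.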

57 Statistical Methods
Defect Counts Statistics on Effort Per Defect Defect Density Prediction Defect Pooling Defect Seeding

58 Defect Counts Useful to obtain a quantitative measurement of how much QC work to do. Ratio of new defects to defects solved. Statistics on Effort Per Defect In order to estimate the scope of the defect correction work, it is necessary to have good data on the time necessary to fix the various types of defects. “At the most basic level, defect counts give you a quantitative handle on how much work the project team has to do before it can release the software. By comparing the number of new defects to the number of defects resolved each week, you can determine how close the project is to completion. Until the number of defects solved per time period exceeds the number of new defects discovered, the project is still very far from over. If the project’s quality level is under control and the project is making progress toward completion, the number of open defects should generally trend downward after the middle of the project and then remain low.” (McConnell, 1998) “The data on time required to fix defects categorized by type of defect will provide a basis for estimating remaining defect-correction work on [the current] and future projects. The data on phases in which defects are detected and corrected also gives you a measure of the efficiency of the [translation quality control] process. If 95% of the defects are detected in the same phase they were created, the process is very efficient: if 95% of the defects are detected one or more phases after the phase in which they were created, the project has a lot of room for improvement.” (ibid.)
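The weekly comparison described in the first quotation can be tracked with a running total. The sketch below is illustrative only: the weekly figures are invented, and the function names are hypothetical.

```python
from itertools import accumulate


def open_defects(new_per_week, resolved_per_week):
    """Running count of open defects at the end of each week."""
    net = [new - done for new, done in zip(new_per_week, resolved_per_week)]
    return list(accumulate(net))


def first_week_resolved_exceeds_new(new_per_week, resolved_per_week):
    """1-based number of the first week in which more defects were resolved
    than discovered, or None if that has not happened yet."""
    for week, (new, done) in enumerate(zip(new_per_week, resolved_per_week), 1):
        if done > new:
            return week
    return None


# Invented weekly counts for a five-week QC effort.
new = [30, 25, 20, 10, 5]
resolved = [10, 15, 25, 20, 12]
print(open_defects(new, resolved))
print(first_week_resolved_exceeds_new(new, resolved))
```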

59 Defect Density Prediction
One way to judge whether the QC work on a translation project is complete is to measure its defect density (the number of defects per page, per 1,000 words or per screen). Suppose that release 1.0 of our documentation consists of 2,000,000 words, that the editing process detected and corrected 9,000 errors, and that a further 1,000 errors were discovered after the documentation was released: The overall defect count for this documentation release would be 10,000, with a density of 5 errors per 1,000 words (about one error per page). Suppose now that the additional material translated for release 1.5 had a defect density of 7 errors per 1,000 words. You now translate version 2.0, and are finding a defect density of 2 errors per 1,000 words. Unless you have good reason to think that your translation process was dramatically improved, you would normally expect to find between 5 and 7 errors per 1,000 words, and finding only 2 per 1,000 words may suggest that there is still quite a bit of editing and QC work to do. The better historical data you have, the better your forecasts will be about the amount of work necessary to identify and correct errors before the actual release of the translation project.
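The example above can be turned into a rough forecast of the errors still to be found. The sketch below reproduces the release 1.0 density from the slide; the size of release 2.0 (100,000 words) is a hypothetical figure, since the slide does not give one, and the function names are illustrative.

```python
def defect_density(defects, words, per=1000):
    """Defects per `per` words."""
    return defects * per / words


def remaining_defects(words, found, low_density, high_density, per=1000):
    """Range of defects still expected, given historical densities."""
    low = max(0.0, low_density * words / per - found)
    high = max(0.0, high_density * words / per - found)
    return low, high


# Release 1.0 from the slide: 9,000 + 1,000 errors in 2,000,000 words.
print(defect_density(9_000 + 1_000, 2_000_000))

# Hypothetical release 2.0 of 100,000 words, with 2 errors per 1,000 found so far.
found_so_far = 2 * 100_000 // 1000
print(remaining_defects(100_000, found_so_far, low_density=5, high_density=7))
```

With these assumed figures, between 300 and 500 errors would still be expected, which is the kind of gap the slide describes as a sign that QC work is not yet complete.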

60 Defect Pooling Defect pooling is a simple defect prediction technique that separates the defects found in a translation sample into two pools. Depending on the number of defects found in either of the two pools (but not in both) it is then possible to estimate the defects that have not been found in the sample. This number can then be used to estimate the number of defects in the entire project. Assign the same sample of translated material to two QA persons. Each QA person works independently on the whole sample. Track which defects have been reported by QA person A, by QA person B, and by both A and B. The number of unique defects in the sample is given by the following formula: DefUnique=DefA+DefB-DefAB If the translation sample has 40 defects in pool A, 35 defects in pool B and 15 defects in both A and B, the number of unique defects would be 40+35-15=60 The number of total defects for the sample in question can be estimated using the following formula: DefTot=(DefA*DefB)/DefAB The approximate number of total defects for the sample would therefore be (40*35)/15=93 (rounded) This number could then be used to extrapolate the total number of defects in the complete project, and could therefore be used as a yardstick to measure the QC effort against.
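The two pooling formulas are easy to check in code. The sketch below uses the figures from the slide (40 defects in pool A, 35 in pool B, 15 in both); the function names are illustrative.

```python
def unique_defects(def_a, def_b, def_ab):
    """Distinct defects actually found: DefA + DefB - DefAB."""
    return def_a + def_b - def_ab


def estimated_total_defects(def_a, def_b, def_ab):
    """Estimated defects in the sample, found or not: (DefA * DefB) / DefAB."""
    if def_ab <= 0:
        raise ValueError("the estimate needs at least one defect found by both reviewers")
    return def_a * def_b / def_ab


found = unique_defects(40, 35, 15)
total = estimated_total_defects(40, 35, 15)
print(found, round(total), round(total) - found)
```

The difference between the estimated total and the unique defects found is the estimate of defects that neither reviewer caught.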

61 Defect Seeding Defect seeding is a statistical technique in which a sample of a population is extracted and used to estimate the total population. The technique works by deliberately inserting (“seeding”) defects in a complete translation that will be QCed. The ratio of the seeded defects found to the total number of defects seeded provides a rough estimate of the total number of translation defects yet to be found. A common problem with this type of technique is forgetting to remove the errors deliberately inserted. Say for example that you want to estimate the number of fish in a lake. You release into the lake a number (e.g., 100) of fish that had previously been tagged. You then catch fish from the lake. Depending on the ratio between tagged and untagged fish that you catch, you can then estimate the total number of fish in the lake. So, if you catch 50 fish, and 5 of them are tagged, you can estimate that the total fish population in the lake is 1000.
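The fish example and its translation counterpart follow the same proportion. The sketch below reproduces the fish figures from the slide; the seeding figures (20 seeded, 15 of them found, 45 genuine defects found) are hypothetical, and the formula used for defects is the usual proportional estimate, stated here as an assumption rather than as the authors' exact method.

```python
def estimate_population(tagged_released, caught, tagged_caught):
    """Estimated population size from a tag-and-recapture count."""
    if tagged_caught <= 0:
        raise ValueError("at least one tagged item must be recaptured")
    return tagged_released * caught / tagged_caught


def estimate_total_defects(seeded_total, seeded_found, real_found):
    """Estimated genuine defects, assuming reviewers find genuine and
    seeded defects at the same rate."""
    if seeded_found <= 0:
        raise ValueError("at least one seeded defect must be found")
    return real_found * seeded_total / seeded_found


# Fish example from the slide: 100 tagged fish released, 50 caught, 5 tagged.
print(estimate_population(100, 50, 5))

# Hypothetical QC pass: 20 defects seeded, 15 of them found, 45 genuine defects found.
total = estimate_total_defects(20, 15, 45)
print(total, total - 45)
```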

62 Calibration and Error Seeding
One of the things one can do to calibrate a translation quality measurement tool (or process) is to use error seeding: Not only to be able to estimate what percentage of errors is not discovered, but also in order to estimate how much variance there is in assessing the errors that do get discovered.

63 Suggested process: calibration of a (generic) translation quality measurement tool
Have the sample translations (a suitable number of them) scored "by hand" by expert translators, so as to obtain a suitable range of evaluated samples, from very good to very bad. Importance of tightly defining the pool of reviewers Importance of instructions for reviewers Have other expert translators score the same tests, but using the tool On the basis of the results of the previous two steps, adjust the weights, types of errors, etc. in the tool until you are satisfied it is going to help in assessing translation quality - that is, until you are confident that trained evaluators are going to obtain with the tool consistent and reliable scores In doing this remember to remove from the kind of errors that can be assessed those that are controversial, i.e., those that lead to differences of opinion whether they are errors or not Finally adjust the tool so that it produces the range of error scores that is useful for your organization (e.g., if you want "0" or 100% as your perfect score)

64 Translation Quality Index (TQI)
The TQI is a number—obtained by the rigorous application of a QA process—that indicates the quality of a given translated text The TQI shouldn’t pass judgment on whether a translation is a pass or fail, right or wrong, correct or incorrect, good or bad. The client and the translator need to reach this agreement. The TQI simply measures on a scale from 0 to 100.

65 The concept of a “Translation Quality Index”
Translation Quality Index (TQI) A number, obtained by the rigorous application of a QA form, that is indicative of the quality of a given translation Our own definition One previous attempt at assigning a translation quality index has been made by the SAE J2450 standard: The concept of translation quality score (TQS). See Woyde, R.: “Introduction to the SAE J2450 Translation Quality Metric” in Language International Vol. 13 No. 2, April 2001 Indication, not measurement: We know that there are subjective elements of translation quality that cannot be measured, but by focusing our attention on those that can be objectively measured, we believe we can obtain a useful indication of translation quality.

66 Delusions of Accuracy “Averages can be calculated to nineteen places of decimal with astonishing ease. When the job is done, it looks very accurate. It is an easy and fatal step to think that the accuracy of our arithmetic is equivalent to the accuracy of our knowledge about the problem in hand.” M.J. Moroney, Facts from Figures

67 Index / Indices Depending on one’s purpose, there may be more than a single TQI. E.g., a TQI may be developed for external purposes (to standardize the work obtained from outsourcing). Another TQI may be primarily for internal purposes (to measure the quality of a given special process).

68 An Example of a “Translation Quality Index” (1)
LISA QA Model ver. 1.0 (1995) 3,000 words 30 error points 30 error pts / 3,000 words = 1.0% 10,000 error pts out of 1 million words (DPMO = 10,000) 100% - 1.0% = 99.0% = TQI (Franco) DPMO = Defects per Million of Opportunities Six Sigma: DPMO = 3.4 (3.4 defects per 1 million opportunities) Opportunities [for defects / errors / nonconformities]: Words, or perhaps translation units, since translation deals with concepts, rather than single words. (But words are easier to calculate).

69 An Example of a “Translation Quality Index” (2)
Microsoft Quality Standards for Print ver. 1.0 (1998) 10,000 words 0 major errors 15 minor errors 15 errors / 10,000 words = 0.15% 1,500 errors out of 1 million words (DPMO = 1,500) 100% - 0.15% = 99.85% = TQI (Franco)

70 An Example of a “Translation Quality Index” (3)
2,000 words (8 × 250 words) 1 critical error 2 major errors 3 minor errors 6 errors / 2,000 words = 0.3% 3,000 errors out of 1 million words (DPMO = 3,000) 100% - 0.3% = 99.7% = TQI (Franco) Be careful when comparing TQIs from different companies. Unless the error categories and weights are the same, it makes no sense to compare different systems.
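The three examples share the same arithmetic: scale the error count to one million words, then express the remainder as a percentage. The sketch below is a minimal illustration that takes words as the "opportunities", as the notes suggest; the function names are hypothetical.

```python
def dpmo(errors, opportunities):
    """Defects per million opportunities (here: errors per million words)."""
    if opportunities <= 0:
        raise ValueError("opportunities must be positive")
    return errors * 1_000_000 / opportunities


def tqi_from_dpmo(value):
    """TQI as a percentage: 100% minus the error rate implied by the DPMO."""
    return 100.0 - value / 10_000


# (label, errors or error points, words) for the three examples in the slides.
examples = [
    ("LISA QA Model", 30, 3_000),
    ("Microsoft Quality Standards for Print", 15, 10_000),
    ("Third example", 6, 2_000),
]
for label, errors, words in examples:
    value = dpmo(errors, words)
    print(f"{label}: DPMO = {value:,.0f}, TQI = {tqi_from_dpmo(value):.2f}%")
```

As the slide warns, the resulting figures are only comparable when the error categories and weights behind the counts are the same.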

71 Let’s Calculate Two TQIs
LISA QA Model ver. 1.0 (1995): 3,000 words; 30 error points; 30 error pts / 3,000 words = 0.01; implicit TQI = 99.0%
ATA Framework for Standard Error Marking: 250 words (estimate); 17 error points; 17 error pts / 250 words = 0.068; implicit TQI = 93.2%
Important: these two TQIs are not directly comparable
The TQI is a number, obtained by the rigorous application of a QA form, that is indicative of the quality of a given translation DPMO = Defects per Million of Opportunities Six Sigma: DPMO = 3.4 (3.4 defects per 1 million opportunities) Opportunities [for defects / errors / nonconformities]: Words, or perhaps translation units, since translation deals with concepts, rather than single words. (But words are easier to calculate).

72 Control Charts Concept of “statistical control”
Concept: The process needs to be in statistical control before it makes sense to make improvements Statistical control: Said of a process whose average behavior can be foreseen.
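To make the idea of statistical control concrete, the sketch below computes control limits for a series of TQI values and flags the points that fall outside them. The slides do not say which type of control chart is meant; the individuals (XmR) chart used here, with limits at the mean ± 2.66 times the average moving range, is one common choice and is an assumption. The TQI values are invented.

```python
from statistics import mean


def xmr_limits(values):
    """Lower limit, center line and upper limit of an individuals (XmR) chart."""
    if len(values) < 2:
        raise ValueError("at least two measurements are needed")
    center = mean(values)
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    mr_bar = mean(moving_ranges)
    # 2.66 is the standard XmR constant (3 divided by d2 = 1.128).
    return center - 2.66 * mr_bar, center, center + 2.66 * mr_bar


def out_of_control(values):
    """Indexes of measurements outside the limits (possible special causes)."""
    lower, _, upper = xmr_limits(values)
    return [i for i, v in enumerate(values) if v < lower or v > upper]


# Invented TQI scores for ten consecutive translation jobs.
scores = [96, 97, 95, 96, 97, 96, 80, 96, 97, 95]
print(xmr_limits(scores))
print(out_of_control(scores))
```

A flagged point is only a signal to look for a special cause (a new translator, a rushed deadline, a bad source file); the chart itself does not say what the cause was.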

73 Process Flow Diagram Concept: The definition of insanity
Quote: “Insanity is doing the same thing and expecting different results” (Deming’s course – Dr. Deming was one of the pioneers of, and one of the foremost experts on, statistical quality assurance techniques) Obtaining a measurement such as the TQI makes it possible to verify the outcome of a process and see if it is in statistical control. Once it has been determined that a process is in statistical control (i.e., that no special causes are present), we can work to improve the process to reduce its variability and raise the quality level of the system. Special cause: A cause specific to some ephemeral event that can usually be discovered and removed. The performance of a system in statistical control can only be improved by improving the system or the process itself. Improving the process improves the quality.

74 Example of Process for Accepting or Rejecting a Translation Process
1) Determine and describe what your process actually is (NOT what you think it is or what the process should be)
2) Measure the quality you have now
3) Determine if you have special causes, and if so, eliminate them (what the special causes are can be seen through the use of control charts)
4) Confirm that the process is in statistical control (i.e., any quality variance is not due to special causes)
5) Change the process to improve quality
6) Measure the new level of quality to determine the effectiveness of the changes to the process (Franco)

75 Very Important Improvements made to the overall process should result in improvements to the product (the translation) Measurements of the product quality should indicate if there have been actual improvements to the process Therefore, means to measure product quality must be in place

76 How to Apply Statistical Methods for Quality Improvement
Define error categories and tolerances Create a QA form Obtain a TQI index Use the TQI index to improve the translation process

77 How to Set Up a Quality Measurement System – Stage 1, Preparation
Collect examples of good and bad translations Analyze the examples to separate controversial issues from agreed-upon errors Decide what to measure (error categorization) Define what to measure in as much detail as necessary (error definition)

78 How to Set Up a Quality Measurement System – Stage 2, Calibration
Assign a weight to various types of errors Determine critical errors (if necessary) Repeat steps 3, 4, 5, and 6 (error categorization, error definition, weighting, and critical errors) until the system works in an objective, repeatable, and reproducible way
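Weighting can be sketched as a lookup table of error points per error type. The categories and weights below are hypothetical placeholders; in practice they would be the outcome of the calibration rounds described above, and treating any critical error as a separate flag is likewise only one possible convention.

```python
# Hypothetical weights per error severity; real values come out of calibration.
WEIGHTS = {"minor": 1, "major": 5, "critical": 10}


def error_points(counts, weights=WEIGHTS):
    """Total error points for a dict of {error type: number of errors}."""
    unknown = set(counts) - set(weights)
    if unknown:
        raise KeyError(f"no weight defined for: {sorted(unknown)}")
    return sum(weights[kind] * n for kind, n in counts.items())


def has_critical_error(counts):
    """True if the sample contains at least one critical error."""
    return counts.get("critical", 0) > 0


sample = {"minor": 3, "major": 2, "critical": 1}
print(error_points(sample), has_critical_error(sample))
```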

79 Quality Assurance Forms and Tools
Create a QA form (or a tool) to help graders give objective scores

80 How to Set Up a Quality Measurement System – Stage 3, Sampling
Collect samples Selection criteria (e.g. random, systematic) Size considerations (the greater the sample, the more accurate the results) Select confidence intervals, margins of error Cost considerations (find the point of diminishing returns)
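The trade-off between sample size, confidence level and margin of error can be explored with the standard sample-size formula for a proportion, n = z² p (1 - p) / e². This formula is not given in the slides and is used here as an assumption; it treats each sampled unit (for example, a segment) as either acceptable or not. The function name and default values are illustrative.

```python
import math
from statistics import NormalDist


def sample_size(margin_of_error, confidence=0.95, proportion=0.5, population=None):
    """Units to sample so that an estimated proportion is within the given
    margin of error at the given confidence level."""
    if not 0 < margin_of_error < 1:
        raise ValueError("margin_of_error must be between 0 and 1")
    # Two-sided z value for the requested confidence level.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n = z * z * proportion * (1 - proportion) / margin_of_error ** 2
    if population is not None:
        # Finite population correction for small projects.
        n = n / (1 + (n - 1) / population)
    return math.ceil(n)


print(sample_size(0.05))                   # large project
print(sample_size(0.05, population=2000))  # project of 2,000 segments
print(sample_size(0.10))                   # wider margin, smaller sample
```

Halving the margin of error roughly quadruples the sample, which is where the point of diminishing returns mentioned above comes from.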

81 How to Set Up a Quality Measurement System – Stage 4, Measurement
Evaluation must be repeatable, reproducible, objective Use of independent auditors Calculation of a Translation Quality Index (TQI) Measure the quality you have now

82 How to Set Up a Quality Measurement System – Stage 5, Statistical Analysis
Investigate the Outcome At this stage there shouldn’t be any special causes (use of control charts) Statistical Control: Deming, Six Sigma Concept: The process needs to be in statistical control before it makes sense to make improvements Statistical control: Said of a process whose average behavior can be foreseen. Use of control charts to verify that a process is in statistical control A (medical) study found that actuarial-statistical tests are usually more accurate than "experts" (i.e., more subjective tests) [Trent & Bishop] 3) Determine if you have special causes, and if so, eliminate them (what the special causes are can be seen through the use of control charts) 4) Once the process is in statistical control (i.e., any quality variance is not due to special causes)

83 How to Set Up a Quality Measurement System – Stage 6, Process Improvement
Take corrective actions (process improvement) Compare the TQI values before and after a process change to check for actual process improvement Determine and describe what your process actually is (NOT what you think it is or what the process should be) 5) Change the process to improve quality 6) Measure the new level of quality to determine the effectiveness of the changes to the process

84 How to Set Up a Quality Measurement System – Summary
Preparation Calibration Sampling Measurement Statistical Analysis Process Improvement

85 Practical Recommendations
Importance of Glossaries (for terminology) Style Guides (for syntax) Translation Instructions (for special cases) Protocols of Engagement (regulating the treatment of errors/defects and defining the acceptance/rejection criteria) Translation Guide for Customers (including a detailed customer checklist to specify what is important and what is not)

86 Conclusions Desirability of common standards (see GAAP - Generally Accepted Accounting Principles) It is not possible to directly compare different quality initiatives A common standard would still permit assigning different weights to different categories but in a much more transparent and comparable way Different quality initiatives (e.g., ATA, LISA, SAE, etc.) cannot be compared The differences are probably due to the different aims of the different organizations.

87 Translation Quality Scale
Quality Continuum The TQI is a number—obtained by the rigorous application of a QA form—that is indicative of the quality of a given translation Note: E instead of F The TQI shouldn’t pass judgment on whether a translation is a pass or fail, right or wrong, correct or incorrect, good or bad. The client and the translator need to reach this agreement. The TQI simply measures on a scale from 0 to 100.

88 Translation Quality Scale
Quality Grades (figure): grades E, D, C, B, A marked along a TQI axis with ticks at 50, 60, 70, 80, 90 and 100

89 Select Bibliography
•Brue, G.: Six Sigma for Managers, New York, McGraw Hill, 2000
•Deming, W. Edwards: Out of the Crisis, Cambridge (Mass), MIT Press, 2000
•Eckersley, H.: “Systems for Evaluating Translation Quality”, in Multilingual Computing & Technology, #47 Volume 13 Issue 3, April/May 2002
•Grove, A.: High Output Management, 2nd ed., New York, Vintage Press, 1995
•Hönig, H.: “Positions, Power and Practice: Functionalist Approaches and Translation Quality Assessment”, in Schäffner, C. (ed.) Translation and Quality. Clevedon, Multilingual Matters, 1998
•Language International: “Engineering Language Quality – A word with quality-standards consultant John Gagliardi”, in Language International Vol. 12 No. 3, June 2000
•Lauscher S.: “Concepts of Translation Quality and Quality Assessment”, in Proceedings of the 39th Annual Conference of the American Translators Association, 1998
•Ling Koo, S., and Kinds, H.: “A Quality-Assurance Model for Large Projects”, in Sprung, R. (ed.) Translating into Success. Cutting-edge strategies for going multilingual in a global age. Amsterdam/Philadelphia, John Benjamins Publishing Company, 2000
•LISA: “Microsoft Quality Standards”, in Case Studies and Client Requirements, 1998
•McConnell, S.: Software Project Survival Guide, Redmond, Microsoft Press, 1998
•Moroney, M.J.: Facts from Figures, Harmondsworth, Penguin, 1951, 1956 (3rd ed.)
•Reiss, K.: Translation Criticism – The Potentials & Limitations. Categories and Criteria for Translation Quality Assessment. Translated by Erroll F. Rhodes. Manchester, UK, St. Jerome Publishing, 2000
•Schäffner, C. (ed.): Translation and Quality, Clevedon, Multilingual Matters, 1998
•Shewhart, W.: Statistical Method from the Viewpoint of Quality Control, New York, Dover Publications. Reprint. (Originally published: Washington, D.C., Graduate School of the Department of Agriculture, 1939.)
•Spurr W., and Bonini C.: Statistical Analysis for Business Decisions, Homewood, IL, Richard D. Irwin, Inc., 1967
•Sturz, W.: “DIN 2345 Hits the Language Industry”, in Language International Vol. 10 No. 5, May 1998
•Vogel, S.; Nießen, S.; Hermann, N.: “Automatic Extrapolation of Human Assessment of Translation Quality”, 2000
•Woyde, R.: “Introduction to the SAE J2450 Translation Quality Metric”, in Language International Vol. 13 No. 2, April 2001
Other sources:
•American Society for Testing and Materials (ASTM), Subcommittee F15.48 on Language Translation: Consumer-Focused Guide to Quality Language Translation [DRAFT], November 1999
•American Translators Association: “Accreditation Forum: Grading Standards – A Glimpse Behind the Scenes”, in ATA Chronicle, October 2002
•Chase, R., Aquilano N., and Jacobs, F.: Production and Operations Management: Manufacturing and Services, 8th ed., Boston, Irwin McGraw Hill, 1998
•Kovács, P.: “Stopping the Standards? DIN 2345 – From translation standards to perfect competition”, in Language International Vol. 14 No. 5, October 2002
•Newmark, P.: “Categorizing and Evaluating Translation Errors: a Task for Examiners and Book Reviewers”, in More Paragraphs on Translation. Clevedon, Multilingual Matters, 1998
•Silvestrini, M., and Squarcina, L.: Translation Quality: the UNI and ISO Standards – The Federcentri Approach. [Presentation]. 43rd Annual Conference of the American Translators Association, Atlanta, GA, 2002
•Sprung, R.: “Regulating Language”, in Language International Vol. 11 No. 4, August 1999
•Translation Company Division of the American Translators Association: ATA Translation Company Division Quality Standards. (Unapproved working document). Seventh draft, June Unpublished.
•Woyde, R.: Introduction to SAE J2450 Translation Quality. [Tape]. 42nd Annual Conference of the American Translators Association, Los Angeles, CA, 2001

