SADC Course in Statistics
The Use of Optical Character Recognition Technology in National Statistical Offices
What is Optical Character Recognition?
Optical Character Recognition (OCR) is a technology that recognises alphanumeric characters in scanned images of documents and converts them into machine-readable data at high speed. It provides a complete solution for form processing and document capture. The abbreviation OCR is also used for the Optical Character Reader, the device that performs the recognition, and the closely related Intelligent Character Recognition (ICR) extends the technology to hand-printed characters.
Why do National Statistical Offices require OCR?
Many NSOs are moving away from the traditional approach of manual data entry by adopting OCR technology. Its use may offer the following benefits:
- It allows the NSO to process information more quickly, accurately and efficiently, so that data can be released and disseminated in a timely manner to support evidence-based decision making.
- It reduces data capture time and increases accuracy compared with manual keying by data entry operators.
- It allows validation rules to be built into the system so that the data can be checked and corrected.
- Errors can be highlighted in different colours, which makes review and correction easier.
- Scanned forms are stored as digital images, which removes the need to store paper questionnaires physically; the paper forms can be destroyed once scanning, recognition and repair are complete.
- The system stores the captured data in a database, which makes data analysis easier.
- It reduces the number of data entry personnel required.
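As an illustration of the validation rules mentioned above, the sketch below applies range, code-list and consistency checks to a captured record and returns the rules it breaks, so that the record can be flagged for review. It is a minimal sketch under stated assumptions: the `PersonRecord` layout, the code lists and the minimum marriage age of 12 are hypothetical and are not taken from any particular OCR product or census form.

```python
from dataclasses import dataclass


# Hypothetical record layout; field names and code lists are illustrative assumptions.
@dataclass
class PersonRecord:
    form_id: str
    age: int
    sex: int             # assumed codes: 1 = male, 2 = female
    marital_status: int  # assumed codes: 1 = never married, 2 = married, 3 = widowed, 4 = divorced


def validate(record: PersonRecord) -> list[str]:
    """Return the list of rule violations; an empty list means the record passes."""
    errors = []
    # Range check: age must be plausible.
    if not 0 <= record.age <= 120:
        errors.append("age out of range")
    # Code-list checks: only valid codes are accepted.
    if record.sex not in (1, 2):
        errors.append("invalid sex code")
    if record.marital_status not in (1, 2, 3, 4):
        errors.append("invalid marital status code")
    # Consistency check between fields (assumed minimum age of 12 for ever-married persons).
    if record.age < 12 and record.marital_status in (2, 3, 4):
        errors.append("marital status inconsistent with age")
    return errors


if __name__ == "__main__":
    records = [
        PersonRecord("F001", 34, 2, 2),
        PersonRecord("F002", 7, 1, 2),
        PersonRecord("F003", 150, 3, 1),
    ]
    for r in records:
        problems = validate(r)
        print(r.form_id, "OK" if not problems else "; ".join(problems))
```

In a real system the flagged records would be shown to an operator, for example highlighted in a different colour, rather than printed.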
What are the disadvantages of OCR?
- Data collection in the field can be much slower, because enumerators must take more care to write each character in the specified boxes of OCR/ICR forms.
- Recognition of human handwriting is a serious limitation: variation in enumerators' handwriting can cause problems in form processing and lower the character recognition rate.
- Errors made when filling in questionnaires lower the recognition rate.
- Print quality can cause problems if the printing is too dark or too light, which may also reduce the character recognition rate.
Factors to consider when implementing OCR
Although OCR can speed up data processing, analysis and, ultimately, the release of data, adopting it is an organisational decision. The following questions should be considered:
- Does the organisation have the capacity to use the technology? If not, can the skills be outsourced, can the outsourcing be funded, and can in-house capacity be built in the near future?
- How does the quality of data captured with OCR/ICR compare with that of data captured by human operators, particularly at the data entry stage?
- How do error rates differ between OCR/ICR and traditional keying by data entry personnel?
- What does the technology cost compared with human labour? In South Africa, the planned use of OCR technology in Census 2001 was expected to reduce costs by between 30 and 40 percent compared with the 1996 Census.
Taken together, these factors ask whether OCR is an appropriate technology for a given National Statistical Office.
- The roles and responsibilities of the District Office, Provincial Office and Head Office need to be clearly defined, including where manual editing of questionnaires, data entry, and final analysis and production of statistical information will take place.
- Questionnaires should be pilot tested to evaluate enumerator training, the way enumerators fill in the forms, and the OCR process itself, e.g. character recognition. Pilot testing requires funding, so the question to ask is: do National Statistical Offices have the funds to carry out these activities?
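One way to make the comparison of error rates concrete is to capture a verification sample with both methods and compare each against a reference agreed after independent checking of the paper forms. The sketch below assumes that approach; the `field_error_rate` function and every value in it are invented for illustration and say nothing about the real performance of either method.

```python
def field_error_rate(captured: dict[str, str], reference: dict[str, str]) -> float:
    """Share of reference fields whose captured value differs from the reference.

    A field missing from `captured` counts as an error.
    """
    if not reference:
        raise ValueError("reference sample is empty")
    errors = sum(1 for key, true_value in reference.items() if captured.get(key) != true_value)
    return errors / len(reference)


# Invented verification sample: keys are "form_id.field", values are the checked codes.
reference = {"F001.age": "34", "F001.sex": "2", "F002.age": "7", "F002.sex": "1", "F003.age": "61"}

# Invented captures: the OCR/ICR run misreads "7" as "1" and "61" as "51";
# the keyed run transposes "34" into "43".
ocr_capture = {"F001.age": "34", "F001.sex": "2", "F002.age": "1", "F002.sex": "1", "F003.age": "51"}
keyed_capture = {"F001.age": "43", "F001.sex": "2", "F002.age": "7", "F002.sex": "1", "F003.age": "61"}

if __name__ == "__main__":
    print(f"OCR/ICR field error rate: {field_error_rate(ocr_capture, reference):.0%}")
    print(f"Manual keying field error rate: {field_error_rate(keyed_capture, reference):.0%}")
```

Counting errors per field rather than per form keeps the measure comparable between methods that fail in different ways (misread characters versus keying transpositions).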
How to obtain good results from scanning
There are three requirements:
- quality of the form;
- appropriate preparation of field staff and their supplies;
- appropriate design of the quality control activities.
Quality of the form
The quality of the form can be improved in the following ways:
- Select adequate paper quality. Use paper heavier than 80 grams per square metre (80 g/m²) to avoid paper jams and to prevent the scanner from reading through to the other side of a page.
- Source a reliable printer.
- Select an appropriate drop-out colour, usually red, for the printed layout of the form, so that the layout is removed during scanning and the system picks up only the meaningful information entered on the form.
- Use marks or ticks as much as possible.
- Avoid open-ended questions.
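To show why a drop-out colour lets the system pick up only the entries, the sketch below removes strongly red pixels from a scanned image represented as rows of RGB triples, leaving dark pencil marks untouched. This is a simplified software illustration with assumed thresholds (`min_red`, `max_other`); many scanners perform colour drop-out themselves, and a production system would work on real image files rather than Python lists.

```python
# A pixel is an (R, G, B) triple with channel values from 0 to 255.
Pixel = tuple[int, int, int]

WHITE: Pixel = (255, 255, 255)


def is_dropout_red(pixel: Pixel, min_red: int = 150, max_other: int = 120) -> bool:
    """Treat strongly red pixels as printed form layout (thresholds are assumptions)."""
    r, g, b = pixel
    return r >= min_red and g <= max_other and b <= max_other


def drop_out(image: list[list[Pixel]]) -> list[list[Pixel]]:
    """Replace drop-out-colour pixels with white and keep everything else."""
    return [[WHITE if is_dropout_red(p) else p for p in row] for row in image]


if __name__ == "__main__":
    # One scan line: a red box border, a black pencil stroke, blank paper.
    scan_line = [(220, 40, 40), (20, 20, 20), (255, 255, 255)]
    print(drop_out([scan_line]))
```

The same reasoning explains the advice on black pencils: dark marks are far from the drop-out colour, so they survive while the printed boxes disappear.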
Preparation of field staff and their supplies
Emphasis should be placed on the following:
- Careful handling and filing of materials and documents. Enumerators should have appropriate supplies, such as a document bag, several black pencils, and correctors or erasers.
- Training of field staff should pay attention to how numeric and alphabetic characters are written, so as to achieve maximum character recognition. Time should be spent on handwriting suitable for scanning.
- Adequate instructions should state that each box must contain only one character; characters must not extend outside the designated boxes; unnecessary marks such as dots and stray strokes are prohibited; strokes should not end with extensions; all lines should be connected without breaks; and all lines and dots should be written with the same pressure.
- Ensure that all answers in the questionnaire are numeric codes.
- Instructions should explain the causes of OCR reading errors, e.g. forms in bad condition because they are dirty, folded or crumpled, or forms that are incompletely filled in.
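The box-writing rules above can also be checked after recognition, so that forms breaking them are routed to manual repair. The sketch below assumes, hypothetically, that the recogniser reports each box of a numeric field as a string ("" for an empty box); the function name `check_numeric_field` and this output format are illustrative and do not correspond to any specific OCR product.

```python
DIGITS = set("0123456789")


def check_numeric_field(field_name: str, boxes: list[str]) -> list[str]:
    """Return problems found in one numeric field's recognised boxes."""
    problems = []
    for position, content in enumerate(boxes, start=1):
        if len(content) > 1:
            # More than one character detected: writing probably crossed a box border.
            problems.append(f"{field_name} box {position}: more than one character {content!r}")
        elif content and content not in DIGITS:
            # Answers are expected to be numeric codes only.
            problems.append(f"{field_name} box {position}: non-numeric character {content!r}")
    return problems


if __name__ == "__main__":
    print(check_numeric_field("age", ["3", "4"]))
    print(check_numeric_field("age", ["S", "4"]))
    print(check_numeric_field("district", ["0", "12"]))
```

Keeping all answers as numeric codes makes this kind of check simple: any non-digit in a box is, by definition, an error to be repaired.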
Quality control process
A number of quality control processes have to be put in place to ensure that:
- all questionnaires have been scanned completely, with no omissions or duplicates;
- quality assurance tests are carried out on recognition, so that acceptable recognition rates are maintained.
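Both checks can be automated. The sketch below assumes that each questionnaire carries a unique identifier (e.g. a printed barcode) and that the recognition software reports how many characters needed manual repair; it compares the control list with the scanned forms and computes a batch recognition rate. The identifiers, the counts and the 95 percent threshold are hypothetical; each office would set its own acceptance level.

```python
from collections import Counter


def completeness_report(expected_ids: list[str], scanned_ids: list[str]) -> dict[str, list[str]]:
    """Compare the questionnaires registered for scanning with those actually scanned."""
    counts = Counter(scanned_ids)
    expected = set(expected_ids)
    scanned = set(counts)
    return {
        "missing": sorted(expected - scanned),                         # omissions
        "duplicated": sorted(i for i, n in counts.items() if n > 1),  # scanned more than once
        "unexpected": sorted(scanned - expected),                      # not on the control list
    }


def recognition_rate(total_chars: int, chars_needing_repair: int) -> float:
    """Share of characters accepted without manual repair."""
    if total_chars <= 0:
        raise ValueError("total_chars must be positive")
    return (total_chars - chars_needing_repair) / total_chars


def batch_passes(rate: float, threshold: float = 0.95) -> bool:
    """Hypothetical acceptance rule: the batch passes if the rate meets the threshold."""
    return rate >= threshold


if __name__ == "__main__":
    report = completeness_report(["F001", "F002", "F003", "F004"],
                                 ["F001", "F002", "F002", "F004", "F009"])
    print(report)
    rate = recognition_rate(10_000, 320)
    print(f"recognition rate {rate:.1%}, passes: {batch_passes(rate)}")
```

Reporting unexpected identifiers separately from omissions and duplicates helps catch forms that were scanned into the wrong batch.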