Agenda Factiva Intelligent Indexing Application of Factiva Intelligent Indexing Pros and Cons Quality Control
Factiva Intelligent Indexing Factiva Taxonomy 320,000 companies 760+ industries 450+ news subjects 370+ regions 22 languages
FII Structure One universal taxonomy Building blocks Inclusive hierarchy Polyarchy Synonyms and alias names Full descriptions Variable depth and breadth
Polyarchy Internet/Online services: E-commerce, Internet browsers, Internet portals, Internet search engines, Internet service providers, etc. Computers: Computer hardware, Computer services, Computer stores, Networking, Semiconductors, Software. Software: Applications software, GroupWare, Intelligent agents, Internet browsers, etc.
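A minimal sketch of the polyarchy idea: one term ("Internet browsers") sits under more than one parent. The labels come from the slide; the dictionary layout, the `ancestors` helper, and the assumption that Software sits under Computers are illustrative only and are not Factiva's actual data model.

```python
# Child term -> list of parent terms. A term with two parents is what makes
# the structure a polyarchy rather than a strict tree.
# Assumption: "Software" is a child of "Computers" (inferred from slide order).
PARENTS = {
    "E-commerce": ["Internet/Online services"],
    "Internet browsers": ["Internet/Online services", "Software"],
    "Semiconductors": ["Computers"],
    "Software": ["Computers"],
}


def ancestors(term, parents=PARENTS):
    """Return every broader term reachable from `term` through any parent."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


print(sorted(ancestors("Internet browsers")))
```

Under these assumptions, "Internet browsers" rolls up to both the Internet and the Computers branches, so it can be found from either direction.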
Factiva Intelligent Indexing Company Codes Industry Codes Subject Codes Region Codes Codes On documents Search
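A rough sketch of how codes on documents can drive search, assuming that an inclusive hierarchy means a search on a broad code also returns documents carrying any narrower code beneath it. The hierarchy, document ids, and function names are hypothetical.

```python
# Hypothetical hierarchy (parent -> children) and document codes.
CHILDREN = {
    "Computers": ["Computer hardware", "Software"],
    "Software": ["Applications software", "Internet browsers"],
}

DOCUMENTS = {
    "doc1": {"Internet browsers"},
    "doc2": {"Computer hardware"},
    "doc3": {"Banking"},
}


def expand(code, children=CHILDREN):
    """Return the code plus all narrower codes beneath it (assumes no cycles)."""
    result = {code}
    for child in children.get(code, []):
        result |= expand(child, children)
    return result


def search(code, documents=DOCUMENTS, children=CHILDREN):
    """Return ids of documents carrying the code or any of its descendants."""
    wanted = expand(code, children)
    return sorted(doc_id for doc_id, codes in documents.items() if codes & wanted)


print(search("Computers"))
```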
FII Application Code mapping Entity extraction Rule-based system Linguistic analysis software Manual review
Code Mapping Most information providers provide some form of metadata. This is matched to relevant Factiva indexing terms. Advantages: Easy and quick; efficient use of existing data. Disadvantages: Mismatches between coding schemes; different interpretations of the same concepts; variable quality (which sources do you trust?).
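Code mapping can be pictured as a lookup table from each provider's own metadata to indexing terms. The sketch below is illustrative: the provider names, provider codes, and `map_codes` helper are invented, and unmapped codes are collected separately because mismatches between schemes are the main weakness noted above.

```python
# (provider, provider's own code) -> indexing term. All values are hypothetical.
MAPPING = {
    ("provider_a", "TECH-SW"): "Software",
    ("provider_a", "FIN-BNK"): "Banking",
    ("provider_b", "software"): "Software",
}


def map_codes(provider, provider_codes, mapping=MAPPING):
    """Translate a provider's codes; return (mapped terms, unmapped codes)."""
    mapped, unmapped = [], []
    for code in provider_codes:
        target = mapping.get((provider, code))
        if target is None:
            unmapped.append(code)      # no agreed equivalent: needs review
        elif target not in mapped:
            mapped.append(target)      # avoid duplicate terms on one document
    return mapped, unmapped


print(map_codes("provider_a", ["TECH-SW", "XYZ"]))
```

Keying the table on the provider as well as the code reflects the point that the same label can mean different things in different schemes.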
Entity extraction This tool finds company names, which are then compared to our controlled vocabulary. Advantages: Consistent; precise. Disadvantages: Ambiguous names; high maintenance costs.
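A simplified sketch of matching names in text against a controlled vocabulary with aliases. It uses plain regular-expression lookup, not the actual extraction tool; the company codes, names, and function names are invented. Aliases shared by more than one company are reported as ambiguous rather than coded, which illustrates the "ambiguous names" drawback.

```python
import re

# Hypothetical controlled vocabulary: company code -> known names and aliases.
VOCABULARY = {
    "ACME01": ["Acme Corporation", "Acme Corp"],
    "APEX01": ["Apex Industries", "Apex"],
    "APEX02": ["Apex Bank", "Apex"],
}


def build_alias_index(vocabulary):
    """Map each lower-cased alias to the set of company codes that use it."""
    index = {}
    for code, aliases in vocabulary.items():
        for alias in aliases:
            index.setdefault(alias.lower(), set()).add(code)
    return index


def extract_companies(text, vocabulary=VOCABULARY):
    """Return (unambiguous codes, {ambiguous alias: candidate codes})."""
    index = build_alias_index(vocabulary)
    matched, ambiguous = set(), {}
    remaining = text.lower()
    # Longest aliases first, so "Apex Bank" is consumed before the bare "Apex".
    for alias in sorted(index, key=len, reverse=True):
        pattern = r"\b" + re.escape(alias) + r"\b"
        if re.search(pattern, remaining):
            codes = index[alias]
            if len(codes) == 1:
                matched |= codes
            else:
                ambiguous[alias] = sorted(codes)
            remaining = re.sub(pattern, " ", remaining)
    return matched, ambiguous


print(extract_companies("Acme Corp signed a deal with Apex Bank."))
print(extract_companies("Apex reported earnings."))
```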
Rule-based system Sets of IF-THEN statements established by editors, information architects, or subject-matter experts. Advantages: Good at highly formulaic content; precise. Disadvantages: Thousands of rules are needed for a complete system; maintenance of the rules themselves becomes VERY expensive; only captures explicit concepts.
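The IF-THEN idea can be sketched as a list of keyword rules: IF all required words are present and no excluded word is present, THEN assign the code. The rule format, codes, and keywords below are hypothetical, chosen only to show why such rules are precise but capture explicit wording only.

```python
import re

# Each hypothetical rule: IF all `all_of` words appear and no `none_of` word
# appears, THEN assign `code`.
RULES = [
    {"code": "EARNINGS", "all_of": ["quarterly", "profit"], "none_of": []},
    {"code": "MERGER", "all_of": ["acquire"], "none_of": ["rumour"]},
]


def apply_rules(text, rules=RULES):
    """Return the codes of every rule whose conditions the text satisfies."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    codes = []
    for rule in rules:
        required = all(word in words for word in rule["all_of"])
        excluded = any(word in words for word in rule["none_of"])
        if required and not excluded:
            codes.append(rule["code"])
    return codes


print(apply_rules("Quarterly profit rose sharply"))
print(apply_rules("The company bought its competitor"))
```

The second call returns no codes even though the story is about a takeover: the concept is implicit, and no rule names the words used.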
Linguistics-based categorization This tool is currently employed across all English-, French-, German- and Spanish-language publications. A combination of linguistic analysis and statistical algorithms allows new content to be compared to example data and coded appropriately. Advantages: Scales to millions of documents, thousands of categories and multiple languages; copes well with change; fits editorial workflow; good fine-tuning tools (editorial control); codes implicit as well as explicit concepts. Disadvantages: Training time and cost.
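To illustrate "comparing new content to example data", here is a deliberately small stand-in: bag-of-words cosine similarity between a new document and the pooled training examples for each code. This is not the linguistic software described above, whose algorithms the slide does not specify; the training texts, codes, and threshold are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def train(examples):
    """Pool the term counts of all example texts for each code."""
    profiles = {}
    for code, text in examples:
        profiles.setdefault(code, Counter()).update(tokenize(text))
    return profiles


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(text, profiles, threshold=0.2):
    """Return codes whose example profile is similar enough, best first."""
    doc = Counter(tokenize(text))
    scores = {code: cosine(doc, profile) for code, profile in profiles.items()}
    hits = [code for code, score in scores.items() if score >= threshold]
    return sorted(hits, key=lambda code: -scores[code])


EXAMPLES = [
    ("SOFTWARE", "software release update browser application"),
    ("BANKING", "bank loan interest rate deposit"),
]
profiles = train(EXAMPLES)
print(classify("New browser software update", profiles))
```

Because the decision rests on overall similarity to examples rather than on a single trigger word, adding or correcting examples changes the behaviour without anyone writing rules, which is where the training time and cost arise.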
Editorial Control Set relevance levels Maintain training set Stop words: handle spurious correlations and multiple meanings. Example: "Chechnya" was added as a stop word to the industries model, as it was triggering the freelance journalist code (because so many of them were dying there)
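The two editorial controls named here, relevance levels and per-model stop words, can be sketched with a toy scoring model. The term weights, threshold, and function name are invented; the point is only to show how a stop word removes a spurious correlation like the Chechnya example without losing genuinely relevant stories.

```python
# Hypothetical learned weights for one code in the industries model.
INDUSTRY_MODEL = {
    "Freelance journalists": {"freelance": 5, "journalist": 4, "chechnya": 3},
}


def code_document(tokens, model, stop_words=frozenset(), min_relevance=3):
    """Return {code: score} for codes scoring at or above the relevance level."""
    coded = {}
    for code, weights in model.items():
        score = sum(weights.get(t, 0) for t in tokens if t not in stop_words)
        if score >= min_relevance:
            coded[code] = score
    return coded


war_story = ["fighting", "continues", "in", "chechnya"]
press_story = ["freelance", "journalist", "killed", "in", "chechnya"]

print(code_document(war_story, INDUSTRY_MODEL))                            # spurious hit
print(code_document(war_story, INDUSTRY_MODEL, stop_words={"chechnya"}))   # suppressed
print(code_document(press_story, INDUSTRY_MODEL, stop_words={"chechnya"})) # still coded
```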
Manual coding About 200 editors spread across main time zones. Advantages: Humans easily grasp the gist of the story; cope well with exceptions; visible/controllable. Disadvantages: Very resource-intensive (expensive); slow; inconsistent (subjective and temporal); not scalable.
Review process Lists reviewed every three months, redefinition, new codes, expansion changes Market research/customer feedback and behavior Changes to parent schemes/standards Editorial/Quality control feedback Internal coding forum 45-day notice period
Quality control Sampling by editors Scoring for precision and recall Analysis by source, language, code, editor etc. Feedback to editors and systems Corrective action
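Scoring a sample for precision and recall, then breaking the result down by source or language, might look like the sketch below. Precision is the share of assigned codes that the reviewing editor agrees with; recall is the share of the editor's expected codes that were assigned. The sample records and field names are hypothetical, and returning 0.0 when a group has no codes is a convention chosen here.

```python
from collections import defaultdict


def score_by(samples, key):
    """Micro-averaged (precision, recall) for each value of `key`."""
    counts = defaultdict(lambda: {"tp": 0, "assigned": 0, "expected": 0})
    for sample in samples:
        assigned = set(sample["assigned"])
        expected = set(sample["expected"])
        bucket = counts[sample[key]]
        bucket["tp"] += len(assigned & expected)   # codes both sides agree on
        bucket["assigned"] += len(assigned)
        bucket["expected"] += len(expected)
    scores = {}
    for group, c in counts.items():
        precision = c["tp"] / c["assigned"] if c["assigned"] else 0.0
        recall = c["tp"] / c["expected"] if c["expected"] else 0.0
        scores[group] = (precision, recall)
    return scores


# Hypothetical reviewed sample: system-assigned codes vs. editor's expected codes.
SAMPLES = [
    {"source": "wire_a", "language": "en",
     "assigned": ["SOFTWARE", "EARNINGS"], "expected": ["SOFTWARE"]},
    {"source": "wire_a", "language": "fr",
     "assigned": ["BANKING"], "expected": ["BANKING", "MERGER"]},
    {"source": "paper_b", "language": "en",
     "assigned": ["MERGER"], "expected": ["MERGER"]},
]

print(score_by(SAMPLES, "source"))
print(score_by(SAMPLES, "language"))
```

The same function can be keyed on code or editor if those fields are recorded with each sample.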
Results Three million articles coded a month. All receive a level of autocoding. Seventy-nine percent automation: more than two million articles are auto-coded with no further manual review.
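As a quick check of the figures, assuming the seventy-nine percent applies to the full three million articles a month:

```python
monthly_articles = 3_000_000
automation_percent = 79

# Integer arithmetic keeps the result exact.
auto_only = monthly_articles * automation_percent // 100
with_manual_review = monthly_articles - auto_only

print(auto_only, with_manual_review)
```

That gives roughly 2.37 million articles coded with no manual review, consistent with "more than two million", and about 630,000 a month that still pass through editors.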
Recap Factiva's taxonomy is Factiva Intelligent Indexing Factiva uses a hybrid methodology for application Factiva has a coding team for governance and maintenance End result: Factiva Intelligent Indexing leverages our editorial strengths, combining human experience and expertise with the latest automation software to implement a completely flexible and granular indexing system across all of our content.