ROSIDS - Rapid Open Source Intelligence Deployment System. Mark P. Pfeiffer, SAIL LABS Technology AG. August 7, 2006.


Open source intelligence (OSINT) is intelligence gathered from publicly accessible sources (TV, radio, newspapers, Internet, ...). 85% of the intelligence used is open source intelligence, yet OSINT accounts for only a single-digit percentage of the intelligence budget.

Government - SAIL LABS Project: “A” Navy needed a reliable, robust, independent, maintenance-free, real-time and inexpensive open source intelligence (OSINT) tool for Arabic TV and radio … and they needed it fast.

1st Step: Needs Assessment
“Need”
- Closed caption insert
- Time shift: 60 seconds (10-20s speech engine*, 10s translation, 30s safety buffer)
“Have”
- SAIL LABS: reliable, real-time and robust ASR for Arabic
- Sakhr: fast, reliable Arabic translation engine
* This delay is due to the nature of the languages themselves; the engine itself requires only 2s.
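The time-shift budget on this slide can be checked with a few lines of arithmetic. This is a minimal sketch in Python; the constant and function names are illustrative assumptions, and only the numbers come from the slide.

```python
# Latency budget from the slide, in seconds. Names are hypothetical;
# the figures (10-20s, 10s, 30s, 60s) are the ones quoted on the slide.
SPEECH_ENGINE_S = (10, 20)   # best and worst case for the speech engine
TRANSLATION_S = 10           # translation step
SAFETY_BUFFER_S = 30         # safety buffer
TIME_SHIFT_S = 60            # total time shift applied to the broadcast


def budget_range():
    """Return (best_case, worst_case) total processing time in seconds."""
    fixed = TRANSLATION_S + SAFETY_BUFFER_S
    return SPEECH_ENGINE_S[0] + fixed, SPEECH_ENGINE_S[1] + fixed


def fits_time_shift():
    """True if even the worst case fits inside the time shift."""
    return budget_range()[1] <= TIME_SHIFT_S


best, worst = budget_range()
print(f"best case {best}s, worst case {worst}s, time shift {TIME_SHIFT_S}s")
print("fits" if fits_time_shift() else "does not fit")
```

With the slide's figures the worst case uses the whole 60-second shift, so any slack has to come out of the safety buffer.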

Result They said one of our competitors could deliver in 30 days at very little cost! We said: “Sorry, but we don’t want to disappoint a customer.”

2nd Step: 1 year later (366 days exactly) The same Navy still needed a reliable, working, robust, independent, maintenance-free, real-time and inexpensive open source intelligence (OSINT) tool for Arabic TV and radio ... fast (well, at least as quick as it works!)

Result We decided to build and offer ROSIDS (Rapid Open Source Intelligence Deployment System)

Building ROSIDS

Requires close work with
- Someone who knows time shifting
- Someone who knows ASR
- Someone who knows translation technologies
- Someone who knows how to put this all together

ROSIDS: Real-time Speech-to-Text (ASR) and Translation (MT), Arabic to English. Also to and from: Arabic, English, French, German, Greek, Polish, Spanish, …
Uses: Situational Awareness, International Crisis Management, Open Source Intelligence

Schematic Layout
[Figure: inputs (Satellite TV, Antenna, Cable, Radio); Sail Labs ROSIDS (Speech Recognition, Text Translation; real-time, 30s latency); Sail Labs Media Mining (store in archive).]
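The layout on this slide can be read as a small pipeline: recognize speech, translate the text, release the caption after a fixed delay, and archive the result. The sketch below is only an illustration of that flow, not the ROSIDS implementation. Every class, function, and field name is hypothetical, and the recognizer and translator are toy stand-ins for real engines.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Caption:
    start_s: float        # position of the audio segment in the broadcast
    source_text: str      # ASR output
    translated_text: str  # MT output


@dataclass
class Pipeline:
    recognize: Callable[[bytes], str]  # stand-in for a speech recognizer
    translate: Callable[[str], str]    # stand-in for a translation engine
    latency_s: float = 30.0            # latency quoted on the slide
    archive: List[Caption] = field(default_factory=list)

    def process(self, start_s: float, audio: bytes) -> Caption:
        """Run ASR then MT on one audio segment and archive the caption."""
        text = self.recognize(audio)
        caption = Caption(start_s, text, self.translate(text))
        self.archive.append(caption)
        return caption

    def display_time(self, caption: Caption) -> float:
        """Time at which the caption appears on the delayed output."""
        return caption.start_s + self.latency_s


# Toy stand-ins so the sketch runs without any real engine.
def toy_recognize(audio: bytes) -> str:
    return "marhaban"


def toy_translate(text: str) -> str:
    return {"marhaban": "hello"}.get(text, text)


pipeline = Pipeline(recognize=toy_recognize, translate=toy_translate)
caption = pipeline.process(12.0, b"\x00\x01")
print(caption.translated_text, "shown at", pipeline.display_time(caption), "s")
```

Passing the two engines in as plain callables keeps the sketch independent of any particular ASR or MT product.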

Accuracy Hits: How do you make this thing readable?
- ASR WER is 5-25% (depends on audio, domain, etc.)
- Translation error rate is 5-30% (depends on source text)
- Combined untreated error rate CAN GO ANYWHERE!
- Context is much more important than WER!
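For reference, word error rate (WER) is conventionally the word-level edit distance between a reference transcript and the recognizer output, divided by the number of reference words. The sketch below computes it, and also shows a naive way to combine the two error rates. The combination ASSUMES ASR and MT errors are independent, which the slide suggests does not hold in practice, so treat it as a rough baseline only. The function names and example sentences are illustrative.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def naive_combined_error(asr_error: float, mt_error: float) -> float:
    """Error rate after both stages, ASSUMING independent errors."""
    return 1.0 - (1.0 - asr_error) * (1.0 - mt_error)


print(word_error_rate("the navy needs a tool", "the navy need a tool"))
print(naive_combined_error(0.05, 0.05))   # both stages at their best
print(naive_combined_error(0.25, 0.30))   # both stages at their worst
```

Under the independence assumption the slide's ranges combine to roughly 10% at best and just under 50% at worst. Because a misrecognized word can derail the translation of a whole sentence, the real figure can be far outside that range.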

Machine Translation (MT): Traditional MT draws its sources from books. With MT + ASR, MT must assume unstructured, ungrammatical input with no syntax. New MT models were adapted to broadcast news.

[Figure: Impact of ASR and MT combined errors. Labels: meaning, context, relation, reference, no impact.]

[Figures: six example slides labeled BAD RESULT, VERY BAD RESULT (twice) and GOOD RESULT (three times), each with the labels "Remedies", ASR, domain, vocab, MT, source.]

Human vs Machine: It will always be necessary to get somebody who is familiar with the language, and even better with the cultural environment, to look at the relevant piece and decide what it means. ROSIDS just helps a non-linguist decide when to get (wake) the analyst and when it is better to let him sleep!

Mark P. Pfeiffer, SAIL LABS Technology AG US cell: (571)