THE BILINGUAL BRAIN [Why Machines Can't Translate for Toffee] Derek J. SMITH, CEng, CITP Cardiff School of Health Sciences University of Wales Institute, Cardiff email@example.com http://www.smithsrisca.co.uk
Specially written to support Monday 10th March 2008
A BRIEF PRELIMINARY..... WILL THE WELSH SPEAKERS IN THE AUDIENCE PLEASE WRITE YOUR TRANSLATION OF THE FOLLOWING ENGLISH PHRASES ON THE SLIP OF PAPER PROVIDED, AND PASS IT TO THE FRONT..... "Out of sight, out of mind" "It was just a pipe dream" "He was as pleased as punch" "He's a bit tight-fisted"
ALL ABOUT VOCABULARY AND GRAMMAR VOCABULARY means all the words you know. "I didn't know what an arquebusier was." Take a good look at this word - we'll be mentioning it again later..... GRAMMAR means the rules which affect how those words can be used. "didn't an was know arquebusier I what." A set of words means nothing until it obeys these rules.....
IN ENGLISH, WORD SEQUENCE IS THE MOST IMPORTANT ASPECT OF GRAMMAR. TRY PUTTING THIS SENTENCE IN THE RIGHT ORDER..... GRAMMAR THINGS WORDS SAME THE DIFFERENT SAY HELPS
EXAMPLES: "DAVID KISSES SARAH" "SARAH KISSES DAVID" GRAMMAR THINGS WORDS SAME THE DIFFERENT SAY HELPS
THE FIRST RELIABLE DISSECTIONS OF THE NERVOUS SYSTEM TOOK PLACE IN THE 16TH CENTURY..... These images are from Andreas Vesalius' De Humani Corporis Fabrica (1543)
Pay attention now! THIS ENCOURAGED PHILOSOPHERS TO GUESS AT HOW IT WORKED..... This image is from Rene Descartes' Treatise of Man (1662), and speculates on how biological information processing might be organised.
DOCTORS, TOO, HAD ALWAYS BEEN INTERESTED IN THE STRANGE THINGS THAT HAPPENED WHEN THEIR PATIENTS SUFFERED STROKES OR HEAD INJURIES.....
THEY FOUND OUT, FOR EXAMPLE, THAT EACH SIDE OF THE BRAIN CONTROLS THE OPPOSITE SIDE OF THE BODY!
AND THAT LANGUAGE SKILLS ARE CONCENTRATED IN THE LEFT HEMISPHERE Language skills here So although I can't use my left arm, I can still hold a decent conversation with you.
BY THE MID-19TH CENTURY, NEUROLOGISTS HAD STARTED TO LOCATE LANGUAGE SKILLS EVEN MORE PRECISELY..... Damage here causes problems with grammar and word finding, whereas damage here causes problems with understanding.
THEN CAME THE GREAT WAR (1914-1918)..... IT WAS VERY DANGEROUS
WARS GIVE US LOTS OF HEAD INJURIES TO STUDY Here are some sketches from cases reported by the German military surgeon Walter Poppelreuter in 1917. THESE MEN ALL HAD PROBLEMS SEEING. CAN YOU SPOT THE PATTERN TO THIS, PERHAPS?
WARS GIVE US LOTS OF HEAD INJURIES TO STUDY Here is a sketch of a case reported by the British neurologist Henry Head in 1926. Note the "cross-over" of the resulting impairment. I can't feel my left hand.
One simple test is to show someone an object and then ask them what it is called. I can see it and I know what it's for, but I just can't think of the name! WARS GIVE US LOTS OF HEAD INJURIES TO STUDY
I'm like Poppelreuter's cases - I just can't see it (even though there's nothing wrong with my eyes). (But I'd recognise it at once if you let me touch it.) WARS GIVE US LOTS OF HEAD INJURIES TO STUDY Different injuries give different answers!!
Hair... morning. WARS GIVE US LOTS OF HEAD INJURIES TO STUDY This patient knows what the comb is for but has lost the grammar needed to put together a full sentence. This symptom is known as "agrammatism" - a loss of grammar.
WARS GIVE US LOTS OF HEAD INJURIES TO STUDY So basically, if you shoot enough soldiers, sooner or later you'll have case data on the entire brain.....
WARS GIVE US LOTS OF HEAD INJURIES TO STUDY This "map" of brain function was published in 1934 by the German military surgeon Karl Kleist, who compiled it from the records of 1600 WW1 head injury cases.....
SO BY 1945 WE HAD A PRETTY GOOD IDEA WHAT "KNOWLEDGE" WAS AND WHERE IT WAS STORED …..
Well I'm obviously a Language Skill! And me! And me too, I suppose!.... AND OUR JOB THIS EVENING IS TO SORT THE LANGUAGE SKILLS FROM THE NON-LANGUAGE SKILLS. ANYBODY GOT ANY SUGGESTIONS?
NOW HERE'S SOMETHING INTERESTING. WHO CAN REMEMBER THE LONG WORD WE SAW A FEW MINUTES AGO? IT WAS ARQUEBUSIER. SO HOW IS IT THAT WE CAN LEARN WORDS BEFORE WE KNOW WHAT THEY MEAN?
THE RELATIONSHIP BETWEEN WORDS AND UNDERSTANDING IS NOT AT ALL STRAIGHTFORWARD.......... BECAUSE THE UNDERSTANDING IS STORED SEPARATELY FROM THE WORDS AND THE GRAMMAR. Consider this typical dictionary entry..... Cat: A four-legged furry mammal, often domesticated. Now watch closely..... The meaning of things is stored up here..... Whereas their names are stored down here.....
I can already read, hear, and say arquebusier, but if you show me one I'll understand it as well.
So I must be a Language Skill as well! THE MOST IMPORTANT LANGUAGE SKILL, IN OTHER WORDS, IS HAVING SOMETHING SENSIBLE TO SAY IN THE FIRST PLACE
BUT WE'RE ALREADY CONFUSED, SO LET'S TAKE A MORE LEISURELY LOOK AT THE PARTS OF THE LANGUAGE JIGSAW WITHOUT USING SO MUCH COMPLICATED VOCABULARY.....
Here's an encyclopaedia containing everything we know.....
This is where the information arriving at our ears ends up.....
This is where the information arriving at our eyes ends up.....
This is where the information arriving from the touch sensors on our skin ends up.....
IT'S ALL VERY LOGICAL REALLY - WHAT WE KNOW BY TOUCHING, HEARING, AND SEEING ALL CONTRIBUTES TOWARDS OUR GENERAL KNOWLEDGE
And this is where your speech apparatus (lips, tongue, breathing, etc.) is controlled from.....
This is what we know about speaking grammatically..... NOTE THAT YOU NEED ONE OF THESE "MENTAL TEXTBOOKS" FOR EVERY LANGUAGE YOU KNOW HOW TO SPEAK German GRAMMAR French GRAMMAR Latin GRAMMAR English and Welsh GRAMMAR
..... and these are the words we know how to use..... AGAIN YOU NEED ONE OF THESE "MENTAL VOCABULARIES" FOR EVERY LANGUAGE YOU KNOW HOW TO SPEAK English / Welsh GRAMMARS German VOCAB French VOCAB Latin VOCAB English and Welsh VOCAB
English / Welsh ORAL GRAMMARS English / Welsh ORAL VOCABS This gives us an important concentration of language OUTPUT skills. Notice how the grammar, the vocabulary, and the necessary muscle control all sit closely together. IN MULTILINGUAL SPEAKERS, THERE ARE ONLY MINOR DIFFERENCES IN WHERE THE VARIOUS OUTPUT SKILLS ARE LOCATED.
Finally, we need to add in the mental textbooks for the languages we understand when we hear them spoken. These are our language INPUT skills..... English / Welsh AURAL GRAMMARS English / Welsh AURAL VOCABS English / Welsh ORAL GRAMMARS English / Welsh ORAL VOCABS IN MULTILINGUAL SPEAKERS, THERE ARE AGAIN ONLY MINOR DIFFERENCES IN WHERE THE VARIOUS INPUT SKILLS ARE LOCATED.
SO THIS IS WHAT WE END UP WITH..... [A PERSONAL COPY WILL BE HANDED OUT AT THIS POINT]
English / Welsh GRAMMARS English / Welsh VOCABS English / Welsh AURAL GRAMMARS English / Welsh AURAL VOCABS
LET'S USE OUR DIAGRAMS TO SEE HOW ALL THESE LANGUAGE SKILLS ARE INVOLVED IN EVERYDAY COMMUNICATION.....
EXAMPLE #1 - ASKING SOMEONE A QUESTION Here is the flow of information within the two brains involved when I'm talking to someone.....
EXAMPLE #2 - REPLYING TO A QUESTION..... and this is what happens when that person replies to me.....
EXAMPLE #3 - HEARING AN UNFAMILIAR LANGUAGE It's nearly the same when a non-Welsh speaker listens to the news in Welsh. You only get the occasional word..... Eisteddfod Genedlaethol yr Urdd yw gŵyl ieuenctid fwyaf Ewrop. Fe'i trefnir gan.....
EXAMPLE #4 - GIVING A LECTURE Here is the flow of information within the many brains involved when I'm talking to a class.....
EXAMPLE #5 - WHAT IF YOU'RE DEAF?? If you're deaf, you can't rely on the sense of hearing and its AURAL vocabulary and grammar..... So you'll need to rely more on your SIGHT vocabulary and grammar instead..... English / Welsh GRAMMARS English / Welsh VOCABS English / Welsh AURAL GRAMMARS English / Welsh AURAL VOCABS English / Welsh VISUAL GRAMMARS English / Welsh VISUAL VOCABS
EXAMPLE #6 - THINKING THINGS OVER TO YOURSELF This is you, later tonight, when you're trying to remember what I was saying All you have to do is switch off the connection to your mouth, and listen to your own "inner speech" instead.
AND FINALLY, BILINGUALISM WHAT IF YOU DON'T SPEAK A LANGUAGE AND NEED A SIMULTANEOUS TRANSLATION?
THE BILINGUAL BRAIN THE PROBLEM OF IDIOM An idiom is a way of expressing a thought figuratively, perhaps as a metaphor or saying. E.g., "He was as pleased as punch" E.g., "Mae hi'n siarad fel melin bupur" Many idioms are particular to the language in question AND - AS WE SHALL SHORTLY BE SEEING - CANNOT BE LITERALLY TRANSLATED.
LET'S DO ANOTHER WORKED EXAMPLE..... TRANSLATE THIS ENGLISH IDIOM INTO WELSH "OUT OF SIGHT, OUT OF MIND"
YOU MIGHT USE THE PHRASE IN ENGLISH IF SOMEONE HAD TWO GIRLFRIENDS..... BOY RIVAL GIRLFRIEND
OUT OF SIGHT, OUT OF MIND
"OUT OF SIGHT, OUT OF MIND" translated word by word we get..... "ALLAN - O - GOLWG - ALLAN - O - MEDDWL" which mutates to..... "ALLAN O OLWG, ALLAN O FEDDWL"
BUT EVEN WHEN WE GET THE GRAMMAR RIGHT, A TRANSLATION IS ONLY ANY GOOD IF IT PRODUCES THE REQUIRED NUANCE OF UNDERSTANDING. SO ULTIMATELY WE JUST HAVE TO TEST OUR TRANSLATION ON SOME EXPERIENCED WELSH SPEAKERS.....
SO IS THE MEANING THE SAME IN WELSH? WHAT IDEA DOES HEARING THE PHRASE CONJURE UP? WHAT IDEA MIGHT PROMPT YOU TO USE THE PHRASE? ALLAN O OLWG, ALLAN O FEDDWL
WE'LL DO A COUPLE MORE EXAMPLES IN A FEW MINUTES.....
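The two steps in the worked example above (look each word up, then apply the soft mutation that follows "o") can be sketched in a few lines of Python. This is only an illustration: the four-word glossary, the function names, and the rule "mutate the word directly after o" are assumptions chosen to reproduce the slide, not a full account of Welsh grammar.

```python
# Hypothetical word-for-word glossary covering only the words in the example.
GLOSSARY = {"out": "allan", "of": "o", "sight": "golwg", "mind": "meddwl"}

# Initial digraphs that the soft mutation leaves alone.
NO_CHANGE = ("ch", "dd", "ff", "ng", "ph", "th")

# Soft mutation of initial consonants. The digraphs "ll" and "rh" are listed
# first so they are matched before any single letter.
SOFT_MUTATION = [
    ("ll", "l"), ("rh", "r"),
    ("p", "b"), ("t", "d"), ("c", "g"),
    ("b", "f"), ("d", "dd"), ("g", ""), ("m", "f"),
]


def soft_mutate(word):
    """Return the word with its initial consonant softly mutated."""
    if word.startswith(NO_CHANGE):
        return word
    for initial, mutated in SOFT_MUTATION:
        if word.startswith(initial):
            return mutated + word[len(initial):]
    return word


def translate_word_by_word(phrase):
    """Look up every word, then mutate any word that directly follows 'o'."""
    words = [GLOSSARY[w] for w in phrase.lower().replace(",", "").split()]
    result = []
    for i, word in enumerate(words):
        if i > 0 and words[i - 1] == "o":
            word = soft_mutate(word)
        result.append(word)
    return " ".join(result)


print(translate_word_by_word("Out of sight, out of mind"))
```

Notice that the sketch gets the mutation right and still tells us nothing about whether a Welsh speaker would understand the phrase the way an English speaker does.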
MACHINE TRANSLATION The story of machine translation begins in World War Two, in the winter of 1942 as the Nazi invasion of the Soviet Union ground to a halt at a city named Stalingrad.....
MACHINE TRANSLATION We've probably seen the Jude Law movie.....
MACHINE TRANSLATION By the Spring of 1943, the Russians had exhausted their attackers and began a long counter-offensive.....
MACHINE TRANSLATION In June 1944, the Allies mounted the D-Day landings to attack the Nazis from the West.....
MACHINE TRANSLATION Again we've seen the movies.....
MACHINE TRANSLATION But the Russians were moving faster from the East, and by Spring 1945 were fighting in Berlin itself.
MACHINE TRANSLATION So when the Nazis surrendered, the Russians ended up in charge of more of Europe than anyone had anticipated.....
MACHINE TRANSLATION As a result, the Americans and the Russians started squabbling, and Europe was soon divided into East and West by fences, walls, and minefields.....
MACHINE TRANSLATION They called the new border the "Iron Curtain", and the hostile stand-off the "Cold War".
MACHINE TRANSLATION The Iron Curtain stayed in place until 1989, when the Berlin Wall was pulled down; Germany was reunified the following year.....
MACHINE TRANSLATION There wasn't a lot of actual fighting, but rather the constant threat of nuclear annihilation.....
MACHINE TRANSLATION It was also a time for espionage and military intelligence.....
MACHINE TRANSLATION And therefore a lot more books and movies.....
MACHINE TRANSLATION UNFORTUNATELY, it was difficult spying on the Russians because they spoke a different language and used a different alphabet..... What did he say? "Ыоур плаце ор мине?", I think.
MACHINE TRANSLATION Here's some more of the Russian version of the "Cyrillic alphabet". Some of the letters are the same as ours, but others are different and misleading.....
MACHINE TRANSLATION So the Allied intelligence services always had a backlog of documents in Russian waiting to be translated.....
THE "COLOSSUS", 1943 Coincidentally, the first programmable electronic digital computer had been invented in the same year as the Soviet counter-attack. This is the British code-breaking Colossus.....
MACHINE TRANSLATION As the Cold War got ever more hostile, the Allies poured a fortune into experiments in getting computers to do the translation for them. Here's the sort of machine they would have used in the early 1950s.....
MACHINE TRANSLATION The idea of machine translation dates back to meetings on 20th June 1946 and 6th March 1947 between a British physicist named Andrew D. Booth and Warren Weaver of the American Rockefeller Foundation, and a letter dated 4th March 1947 from Weaver to the mathematician Norbert Wiener.
MACHINE TRANSLATION Booth had already helped build one of London University's first computers - the ARC. This had a drum storage system, similar to this one..... I'm the I-Pod's grandad!
MACHINE TRANSLATION Booth wondered whether this drum storage system might store some bilingual vocabulary, reading text in one language and writing it out in another! CAT = CATH, DOG = CI, BITE = BRATHU
MACHINE TRANSLATION Then he heard that a Cambridge geneticist named Richard H. Richens had been using a punched card system to store his experimental data on plant breeds, and had worked out how to organise large bodies of data.
MACHINE TRANSLATION Between them, Booth and Richens designed the first bilingual computer "database". For those of you interested in the technicalities, here is part of their design concept..... "The most obvious way of using a computing machine for translation is to code each letter of a proposed dictionary word in 5-binary-digit form and to store the translation in that memory location having the same digital value as the aggregate value of the dictionary word. Translation would then consist in coding the message word (a teletyper does this automatically) and then extracting the translation in the same or next lower (digitally valued) storage position, the first giving exact translation and the second the stem translation with a remainder. [.....]" (Richens and Booth, 1955, pp45-46)
MACHINE TRANSLATION THE BOSTON CONFERENCE, 1952 The first machine translation conference took place at the Massachusetts Institute of Technology, 17-20th June 1952. The computer scientists promised great things, but the professors of linguistic theory were not impressed. The conference was chaired by one of the sceptics, Yehoshua Bar-Hillel.....
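For readers who like to see such things run, here is a minimal Python sketch of the lookup the quotation describes. It is not Richens and Booth's actual design: the 12-letter field width, the letter codes (a=1 to z=26), the function names, and the three-word English-Welsh table are all assumptions made for illustration, and a sorted list stands in for the numbered storage positions on the drum.

```python
from bisect import bisect_right

WORD_WIDTH = 12  # assumption: each word sits left-aligned in a 12-letter field

# Hypothetical three-word English-to-Welsh vocabulary.
VOCAB = {"cat": "cath", "dog": "ci", "bite": "brathu"}


def encode(word):
    """Pack a word into one integer, using 5 binary digits per letter."""
    word = word.lower()
    if not word or len(word) > WORD_WIDTH or not all("a" <= ch <= "z" for ch in word):
        raise ValueError("expected 1 to %d letters a-z" % WORD_WIDTH)
    value = 0
    for i in range(WORD_WIDTH):
        # Letters are coded 1..26; unused positions are padded with 0.
        code = ord(word[i]) - ord("a") + 1 if i < len(word) else 0
        value = (value << 5) | code
    return value


# "Storage positions": the numeric value of each dictionary word is its address.
STORE = {encode(en): (en, cy) for en, cy in VOCAB.items()}
KEYS = sorted(STORE)


def lookup(word):
    """Return (translation, remainder), or None if no entry fits.

    An exact hit gives an empty remainder. Otherwise the next lower stored
    position is tried and accepted only if it holds a stem of the word.
    """
    word = word.lower()
    position = bisect_right(KEYS, encode(word))
    if position == 0:
        return None  # nothing is stored at or below this value
    stem, translation = STORE[KEYS[position - 1]]
    if word.startswith(stem):
        return translation, word[len(stem):]
    return None


print(lookup("dog"))   # exact entry
print(lookup("dogs"))  # stem entry with a remainder
print(lookup("cow"))   # next lower entry is "cat", which is not a stem
```

Because the words are left-aligned, a stem always has a smaller value than any longer word built on it, which is why the "next lower" position is the natural place to look. The sketch also shows the weakness: the remainder "s" is handed back untranslated, and irregular forms would never be found this way.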
MACHINE TRANSLATION THE BOSTON CONFERENCE, 1952 Bar-Hillel doubted that machine translation would be much use until computers got a lot bigger and a lot faster, and even then their eventual success couldn't be guaranteed.....
MACHINE TRANSLATION THE GEORGETOWN CONFERENCE, 1954 A public demonstration of machine translation followed on 7th January 1954 at IBM Headquarters, New York. The demonstration processed a 250-word Russian-English vocabulary, and made front page news the next morning [see the IBM Press Release]. The next screen reproduces the first paragraphs of that Press Release.....
MACHINE TRANSLATION THE GEORGETOWN CONFERENCE, 1954 New York, January 7..... Russian was translated into English by an electronic "brain" today for the first time. Brief statements about politics, law, mathematics, chemistry, metallurgy, communications and military affairs were submitted in Russian by linguists of the Georgetown University Institute of Languages and Linguistics to the famous 701 computer of the International Business Machines Corporation. And the giant computer, within a few seconds, turned the sentences into easily readable English. A girl who didn't understand a word of the language of the Soviets punched out the Russian messages on IBM cards. The "brain" dashed off its English translations on an automatic printer at the breakneck speed of two and a half lines per second. "Mi pyeryedayem mislyi posryedstvom ryechyi," the girl punched. And the 701 responded: "We transmit thoughts by means of speech."
MACHINE TRANSLATION Centers for machine translation research were soon set up at MIT (Victor Yngve), Washington (Erwin Reifler), and Berkeley (Sydney Lamb). Britain's effort was concentrated at the Cambridge Language Research Unit, under Margaret Masterman, where the researchers included Richens himself, Frederick Parker-Rhodes, Yorick Wilks, Michael Halliday, and Karen Spärck Jones.
MACHINE TRANSLATION But Bar-Hillel's reservations had been well-placed, because the Richens-Booth approach only works for the relatively simple words in a language, and starts to struggle once it encounters "irregular" words and complex grammatical structures. Just like English speakers learning Welsh, computers also have trouble learning grammar, and make total dogs' breakfasts of a language's idioms! A computer would have trouble with this phrase, for example
THESE PROBLEMS HAVE STILL NOT BEEN OVERCOME HALF A CENTURY LATER, AND THE 1950S EFFORTS WERE OFTEN AMUSING FAILURES. ONE OF BAR-HILLEL'S FAVOURITE STORIES CONCERNED HOW A COMPUTER WOULD GO ABOUT TRANSLATING OUR "OUT OF SIGHT, OUT OF MIND" SENTENCE INTO RUSSIAN.....
HERE IS THE INITIAL TRANSLATION..... OUT OF SIGHT = НЕВИДИМЫЙ; OUT OF MIND = БЕЗУМНЫЙ
SO FAR SO GOOD..... OUT OF SIGHT, OUT OF MIND НЕВИДИМЫЙ, БЕЗУМНЫЙ
BAR-HILLEL THEN POINTED OUT THAT IF YOUR COMPUTER HAD DONE ITS JOB PROPERLY YOU SHOULD NOW BE ABLE TO PUT THE RUSSIAN BACK THROUGH THE TRANSLATION SOFTWARE FROM THE OTHER DIRECTION AND GET BACK THE SENTENCE YOU HAD STARTED WITH.....
THIS IS WHAT WOULD HAPPEN..... OUT OF SIGHT, OUT OF MIND = НЕВИДИМЫЙ, БЕЗУМНЫЙ = INVISIBLE, INSANE The idiom had been translated literally, and lost its usefulness.
SO TO PUT IT SIMPLY, BRAINS DO LANGUAGE BETTER THAN THE EARLY COMPUTERS DID. BUT HAVE COMPUTERS GOT BETTER SINCE?
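Bar-Hillel's round-trip test is easy to mimic. The Python sketch below is a toy under stated assumptions: the two phrase tables contain only the entries shown on the slides, the function names are invented for illustration, and no real translation system is being reproduced.

```python
# Hypothetical phrase tables holding only the entries from the slides.
EN_TO_RU = {"out of sight": "невидимый", "out of mind": "безумный"}
RU_TO_EN = {"невидимый": "invisible", "безумный": "insane"}


def translate(text, table):
    """Translate a comma-separated phrase one chunk at a time."""
    chunks = [chunk.strip().lower() for chunk in text.split(",")]
    return ", ".join(table[chunk] for chunk in chunks)


def round_trip(text):
    """Translate into Russian and back, and report whether the text survived."""
    there = translate(text, EN_TO_RU)
    back = translate(there, RU_TO_EN)
    return there, back, back == text.lower()


print(round_trip("Out of sight, out of mind"))
```

The final value in the result is False: each step is a defensible dictionary lookup, yet the sentence that comes back is not the one that went in.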
SECTION 7 IS MACHINE TRANSLATION JUST A PIPE DREAM?
Modern computers have been doing effective spell-checking for over a decade now..... The author doesn't know enough Welsh to know whether this joke works in Welsh.
TV programmes regularly use instant captioning for the hard of hearing. But because the software has to work very quickly, errors and hesitations are commonplace..... Eisteddfod Genedlaethol yr Urdd yw gŵyl ieuenctid fwyaf Ewrop. Fe'i trefnir gan..... EISTEDDFOD CEN.. GENEDLAETHOL 'R URDD YW GWYL.... TYDFIL EWROP..... TREFNIR GAN
Text recognition and speech synthesis systems are even quite happy to read books out loud to the blind.
English / Welsh AURAL GRAMMARS English / Welsh AURAL VOCABS English / Welsh ORAL GRAMMARS English / Welsh ORAL VOCABS But the problem is that none of these systems understand a word of what they are saying..... Artificial Vocabulary 10/10 Artificial Grammar 5/10 Artificial Intelligence 1/10
MACHINE TRANSLATION Here's a typical translation package.....
..... and here's a typical free translation website.....
TAKE THE DELIBERATELY IDIOMATIC TITLE OF THIS TALK, FOR EXAMPLE..... WHY MACHINES CAN'T TRANSLATE FOR TOFFEE = PAM NA ALL PEIRIANNAU GYFIEITHU DROS EU CROGI. DROS EU CROGI = OVER / OF THEM / TO HANG, SO PERHAPS "WHY MACHINES CAN'T TRANSLATE TO SAVE THEIR LIVES"
AND WHAT OF THE FUTURE? WILL MACHINES EVER DO LANGUAGE LIKE WE DO? THE ANSWER IS THAT THEY MIGHT. BUT ONLY IF THEIR DESIGNERS CRACK THE PROBLEM OF HOW HUMAN BEINGS DO UNDERSTANDING
MACHINE TRANSLATION Specifically, software has to be designed to do what the brain does, and use an intervening code - the meaning code. English / Welsh AURAL GRAMMARS English / Welsh AURAL VOCABS English / Welsh ORAL GRAMMARS English / Welsh ORAL VOCABS
MACHINE TRANSLATION Such intervening codes have been standard programming practice since the 1960s, and are known as "interlinguas". So the brain uses several languages, but it reduces all of them to a single interlingua - its mental encyclopaedia.
IT WOULD WORK LIKE THIS..... WHY MACHINES CAN'T TRANSLATE FOR TOFFEE AS-YET-UNKNOWN INTERLINGUA CODE PAM NA ALL PEIRIANNAU GYFIEITHU DROS EU CROGI
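The flow above can be sketched as a toy program in which every word or idiom is first reduced to a meaning code and only then rendered in the other language. Everything here is an assumption made for illustration: the numeric codes are arbitrary, the tables hold only vocabulary from the slides, and nothing is claimed about how a real interlingua (or the brain) is organised.

```python
# Hypothetical meaning codes; the numbers themselves carry no significance.
CAT, DOG, VERY_BADLY = 1001, 1002, 4001

# Analysis tables: surface form -> meaning code, one table per language.
ANALYSIS = {
    "en": {"cat": CAT, "dog": DOG, "for toffee": VERY_BADLY},
    "cy": {"cath": CAT, "ci": DOG, "dros eu crogi": VERY_BADLY},
}

# Generation tables: meaning code -> preferred surface form in each language.
GENERATION = {
    "en": {CAT: "cat", DOG: "dog", VERY_BADLY: "for toffee"},
    "cy": {CAT: "cath", DOG: "ci", VERY_BADLY: "dros eu crogi"},
}


def to_interlingua(phrase, language):
    """Return the meaning code for a phrase, or None if it is not understood."""
    return ANALYSIS[language].get(phrase.lower())


def from_interlingua(code, language):
    """Return the phrase a language uses for a meaning code, or None."""
    return GENERATION[language].get(code)


def translate(phrase, source, target):
    """Translate by way of the meaning code, never word for word."""
    code = to_interlingua(phrase, source)
    if code is None:
        return None  # no understanding, so no translation
    return from_interlingua(code, target)


print(translate("for toffee", "en", "cy"))
print(translate("cath", "cy", "en"))
print(translate("arquebusier", "en", "cy"))
```

The idiom crosses over as a single unit of meaning, so "toffee" is never translated at all. The hard part, of course, is the bit the sketch simply assumes: knowing which meaning code a phrase deserves in the first place.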
THE AUTHOR IS CURRENTLY WORKING ON A SOFTWARE PROJECT CALLED KONRAD, WHICH USES A NUMERIC INTERLINGUA
INTRODUCING PROJECT KONRAD TECHNICAL SPECIFICATION The application platform is the CA-IDMS (Release 16) network database. As is normal for such databases, the data layout has to be carefully defined beforehand in what is known as a "data model". Vocabularies can be loaded into the database from any number of languages, but are all linked to a single set of knowledge records to give the translation its all-important meaning.
INTRODUCING PROJECT KONRAD THE MAIN UPDATE PROGRAM KONRAD Main Program CA-IDMS Network Database KONRAD Report Program COMMANDS FILE OUTPUT LOG FILE REPORT Here is the database defined by the Data Model. It contains bilingual English and Welsh vocabularies, both linked to a network-structured interlingua