Prolog for Linguists
Symbolic Systems 139P/239P
John Dowding
Week 4, October 29, 2001
Office Hours
We have reserved 4 workstations in the Unix Cluster in Meyer Library, fables 1-4.
Skipping 4:30-5:30 on Thursday this week.
Friday 3:30-4:30, after the NLP Reading Group, this week.
If neither time works for you, contact me and we can make other arrangements.
occurs_in_helper/3
% occurs_in_helper(+Index, -Var, +Term)
occurs_in_helper(Index, Var, Term) :-
    Index > 0,
    arg(Index, Term, Arg),
    occurs_in(Var, Arg).
occurs_in_helper(Index, Var, Term) :-
    Index > 0,
    NextIndex is Index - 1,
    occurs_in_helper(NextIndex, Var, Term).
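The helper calls occurs_in/2, which is not shown in this transcript. A minimal sketch of it, assuming occurs_in(Var, Term) should succeed when Var is identical (==) to Term or to one of its subterms; this definition is a reconstruction for illustration, and occurs_in_helper/3 is repeated from the slide so the block loads on its own:

```prolog
% occurs_in(+Var, +Term): Var is identical to Term or to some subterm.
% ASSUMPTION: reconstructed for illustration, not the course's own code.
occurs_in(Var, Term) :-
    Var == Term.
occurs_in(Var, Term) :-
    compound(Term),
    functor(Term, _Name, Arity),
    occurs_in_helper(Arity, Var, Term).

% occurs_in_helper/3 as on the slide: look in argument Index,
% or else in the arguments before it.
occurs_in_helper(Index, Var, Term) :-
    Index > 0,
    arg(Index, Term, Arg),
    occurs_in(Var, Arg).
occurs_in_helper(Index, Var, Term) :-
    Index > 0,
    NextIndex is Index - 1,
    occurs_in_helper(NextIndex, Var, Term).
```

Because the test is ==, occurs_in(X, f(Y)) fails for distinct variables X and Y instead of unifying them.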
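We could have written subterm/2 as:
% subterm(+SubTerm, +Term)
subterm(SubTerm, Term) :-
    replace_all(SubTerm, Term, _AnyThing, NewTerm),
    \+ Term == NewTerm.
But this would be slower: replace_all/4 has to traverse and rebuild all of Term, even when SubTerm occurs near the top, whereas a direct search can stop at the first match.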
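The definition above depends on replace_all/4 from an earlier week, which is not shown here. A minimal sketch, assuming replace_all(+SubTerm, +Term, +Replacement, -NewTerm) replaces every subterm identical (==) to SubTerm; replace_all_args/4 is a hypothetical helper name. Replacing with a fresh variable (_AnyThing) changes Term exactly when SubTerm occurs in it, which is what the \+ Term == NewTerm test detects:

```prolog
% replace_all(+SubTerm, +Term, +Replacement, -NewTerm)
% ASSUMPTION: reconstructed for illustration, not the course's own code.
replace_all(SubTerm, Term, Replacement, Replacement) :-
    SubTerm == Term,
    !.
replace_all(_SubTerm, Term, _Replacement, Term) :-
    \+ compound(Term),            % variable or atomic: nothing inside
    !.
replace_all(SubTerm, Term, Replacement, NewTerm) :-
    Term =.. [Functor|Args],      % rebuild the term argument by argument
    replace_all_args(Args, SubTerm, Replacement, NewArgs),
    NewTerm =.. [Functor|NewArgs].

% replace_all_args(+Args, +SubTerm, +Replacement, -NewArgs)
replace_all_args([], _SubTerm, _Replacement, []).
replace_all_args([Arg|Args], SubTerm, Replacement, [NewArg|NewArgs]) :-
    replace_all(SubTerm, Arg, Replacement, NewArg),
    replace_all_args(Args, SubTerm, Replacement, NewArgs).

% subterm/2 as on the slide.
subterm(SubTerm, Term) :-
    replace_all(SubTerm, Term, _AnyThing, NewTerm),
    \+ Term == NewTerm.
```

Note that replace_all/4 always copies the whole term, which is the source of the extra cost mentioned on the slide.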
Accumulators
Build up partial results to return at the end.
Without an accumulator:
list_length([], 0).
list_length([_Head|Tail], Result) :-
    list_length(Tail, N),
    Result is N + 1.
With an accumulator:
list_length(List, Result) :-
    list_length_helper(List, 0, Result).
list_length_helper([], Result, Result).
list_length_helper([_Head|Tail], Partial, Result) :-
    NextPartial is Partial + 1,
    list_length_helper(Tail, NextPartial, Result).
In the accumulator version the recursive call is the last goal of its clause, so no work remains to be done after it returns.
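Another accumulator example, as a sketch: reversing a list. The accumulator holds the elements seen so far in reverse order and is handed back as the result when the list runs out. The names list_reverse/2 and list_reverse_helper/3 are hypothetical, chosen to avoid clashing with a library reverse/2:

```prolog
% list_reverse(+List, -Reversed)
list_reverse(List, Reversed) :-
    list_reverse_helper(List, [], Reversed).

% list_reverse_helper(+List, +Partial, -Result):
% Partial holds the elements already visited, most recent first.
list_reverse_helper([], Result, Result).
list_reverse_helper([Head|Tail], Partial, Result) :-
    list_reverse_helper(Tail, [Head|Partial], Result).
```

Each element is consed onto the accumulator exactly once, so this runs in time linear in the length of the list.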
Difference Lists
Use two logical variables that point to different portions of the same list.
Compare stacks with queues:
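For comparison, a stack needs no difference list: an ordinary list works, with the top of the stack at the head. A minimal sketch; the names empty_stack/1, push/3 and pop/3 are hypothetical, chosen to parallel the queue predicates:

```prolog
% empty_stack(?Stack)
empty_stack([]).

% push(+Element, +Stack, -NewStack): put Element on top.
push(Element, Stack, [Element|Stack]).

% pop(+Stack, -Element, -NewStack): take the top Element off.
pop([Element|Stack], Element, Stack).
```

Pushing a and then b and popping twice yields b first, then a: last in, first out. Only the front of the list is ever touched, so a second pointer into the list is not needed.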
Queues
A queue is represented as a pair of lists, Front-Back, where Back is always an unbound variable: the open tail of Front. The elements of the queue are the ones in Front before Back.
% empty_queue(?Queue): true if the queue is empty
empty_queue(Queue-Queue).
% add_to_queue(+Element, +Queue, -NewQueue)
add_to_queue(Element, Front-[Element|Back], Front-Back).
% remove_from_queue(+Queue, -Element, -NewQueue)
remove_from_queue([Element|Front]-Back, Element, Front-Back).
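A small usage sketch of these queue predicates (repeated so the block loads on its own); queue_demo/2 is a hypothetical name. Adding a and b and then removing twice returns them in the order they went in, and each operation is a single head unification regardless of the queue's length:

```prolog
% Queue predicates as on the slide.
empty_queue(Queue-Queue).
add_to_queue(Element, Front-[Element|Back], Front-Back).
remove_from_queue([Element|Front]-Back, Element, Front-Back).

% queue_demo(-First, -Second): hypothetical demo of first-in, first-out.
queue_demo(First, Second) :-
    empty_queue(Q0),                      % Q0 = F-F
    add_to_queue(a, Q0, Q1),              % Q1 = [a|B1]-B1
    add_to_queue(b, Q1, Q2),              % Q2 = [a,b|B2]-B2
    remove_from_queue(Q2, First, Q3),     % First = a
    remove_from_queue(Q3, Second, _Q4).   % Second = b
```

One caution: calling empty_queue/1 on a non-empty queue such as [a|Back]-Back can succeed, because without the occurs check unifying [a|Back] with Back just builds a cyclic term.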
Generate-and-Test
A popular (and sometimes efficient) way to write a program:
Goal :-
    Generator,    % generates candidate solutions
    Tester.       % verifies correct answers
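A minimal generate-and-test sketch: find the divisors of N by generating every candidate from 1 to N and testing each one. The generator int_between/3 is a hypothetical helper written out so the example is self-contained (many systems also provide between/3):

```prolog
% int_between(+Low, +High, -X): enumerate Low, Low+1, ..., High
% on backtracking.
int_between(Low, High, Low) :-
    Low =< High.
int_between(Low, High, X) :-
    Low < High,
    Next is Low + 1,
    int_between(Next, High, X).

% divisor(+N, -D): D divides N.
divisor(N, D) :-
    int_between(1, N, D),     % generator: candidate solutions
    N mod D =:= 0.            % tester: keep only real divisors
```

For example, findall(D, divisor(12, D), Ds) collects Ds = [1, 2, 3, 4, 6, 12]. Each failed test backtracks into the generator for the next candidate.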
One more generate-and-test example: the N-Queens problem
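The slide's own N-Queens code is not preserved in this transcript. A generate-and-test sketch, assuming the common encoding where a solution is a list of column numbers, one per row: the generator produces permutations of 1..N (so no two queens share a row or column), and the tester rejects shared diagonals. All predicate names here are hypothetical:

```prolog
% queens(+N, -Queens)
queens(N, Queens) :-
    range(1, N, Columns),
    perm(Columns, Queens),     % generator: one queen per row and column
    safe(Queens).              % tester: no two queens on a diagonal

% range(+Low, +High, -List): List = [Low, Low+1, ..., High].
range(Low, High, []) :-
    Low > High.
range(Low, High, [Low|Rest]) :-
    Low =< High,
    Next is Low + 1,
    range(Next, High, Rest).

% perm(+List, -Perm): enumerate the permutations of List.
perm([], []).
perm(List, [X|Perm]) :-
    pick(X, List, Rest),
    perm(Rest, Perm).

% pick(?X, +List, -Rest): Rest is List with one occurrence of X removed.
pick(X, [X|Xs], Xs).
pick(X, [Y|Ys], [Y|Zs]) :-
    pick(X, Ys, Zs).

safe([]).
safe([Queen|Queens]) :-
    no_attack(Queen, Queens, 1),
    safe(Queens).

% no_attack(+Queen, +Others, +Distance): Queen shares no diagonal with
% the queens in Others; Distance is the row distance to the first one.
no_attack(_Queen, [], _Distance).
no_attack(Queen, [Other|Others], Distance) :-
    Other - Queen =\= Distance,
    Queen - Other =\= Distance,
    NextDistance is Distance + 1,
    no_attack(Queen, Others, NextDistance).
```

The first answer to queens(4, Qs) is Qs = [2, 4, 1, 3]. Since the generator may produce all N! permutations, this version becomes slow as N grows.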
Unification
Two terms unify iff there is a set of substitutions of variables with terms that makes the terms identical.
True unification disallows cyclic terms: X = f(X) ought to fail, because there is no finite term that can be substituted for X to make the two terms identical. This test is called the occurs check.
Prolog unification does not enforce the occurs check, and may create cyclic terms.
The occurs check is expensive:
Without the occurs check, unification is O(n), where n is the size of the smaller of the two terms.
With the occurs check, it is O(n+m), where n and m are the sizes of the two terms.
In Prolog, it is quite typical to unify a variable with a larger term, so the occurs check would add the cost of traversing that larger term.
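ISO Prolog provides unify_with_occurs_check/2 for the cases where sound unification is needed. A minimal sketch contrasting it with the cyclic case; occurs_check_demo/0 is a hypothetical name. The demo deliberately avoids calling X = f(X) directly, since what that does varies between systems and settings:

```prolog
% occurs_check_demo: X = f(X) is rejected by the occurs check, while an
% ordinary unification such as Y = f(Z) still succeeds.
occurs_check_demo :-
    (   unify_with_occurs_check(X, f(X))
    ->  write('unexpected: X = f(X) unified'), nl
    ;   write('with the occurs check, X = f(X) fails'), nl
    ),
    unify_with_occurs_check(Y, f(Z)),
    Y == f(Z).
```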
More about cut!
It is common to distinguish between red cuts and green cuts:
Red cuts change the solutions of a predicate.
Green cuts do not change the solutions, but affect the efficiency.
Most of the cuts we have used so far are red cuts:
% delete_all(+Element, +List, -NewList)
delete_all(_Element, [], []).
delete_all(Element, [Element|List], NewList) :-
    !,
    delete_all(Element, List, NewList).
delete_all(Element, [Head|List], [Head|NewList]) :-
    delete_all(Element, List, NewList).
Without the cut, backtracking into the third clause would produce extra, wrong solutions in which Element is kept.
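To see that this cut is red, compare delete_all/3 with a copy that has the cut removed; delete_all_no_cut/3 is a hypothetical name used only for this comparison. Collecting all answers with findall/3 shows the difference in solutions:

```prolog
% delete_all/3 as on the slide (red cut).
delete_all(_Element, [], []).
delete_all(Element, [Element|List], NewList) :-
    !,
    delete_all(Element, List, NewList).
delete_all(Element, [Head|List], [Head|NewList]) :-
    delete_all(Element, List, NewList).

% Same clauses without the cut: when Head = Element, the third clause
% is also tried on backtracking and keeps the element.
delete_all_no_cut(_Element, [], []).
delete_all_no_cut(Element, [Element|List], NewList) :-
    delete_all_no_cut(Element, List, NewList).
delete_all_no_cut(Element, [Head|List], [Head|NewList]) :-
    delete_all_no_cut(Element, List, NewList).
```

findall(L, delete_all(a, [a, b], L), Ls) gives Ls = [[b]], while the version without the cut gives [[b], [a, b]].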
Green cuts
Green cuts can be used to avoid unproductive backtracking:
% identical(?Term1, ?Term2)
identical(Var1, Var2) :-
    var(Var1), var(Var2), !,
    Var1 == Var2.
identical(Atomic1, Atomic2) :-
    atomic(Atomic1), atomic(Atomic2), !,
    Atomic1 == Atomic2.
identical(Term1, Term2) :-
    compound(Term1), compound(Term2),
    functor(Term1, Functor, Arity),
    functor(Term2, Functor, Arity),
    identical_helper(Arity, Term1, Term2).
These cuts are green: once both arguments pass the type tests of one clause, the type tests of the later clauses would fail anyway.
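The slide does not show identical_helper/3. A minimal sketch, assuming it walks the arguments from Arity down to 1 in the same style as occurs_in_helper/3; identical/2 is repeated from the slide so the block loads on its own:

```prolog
% identical/2 as on the slide.
identical(Var1, Var2) :-
    var(Var1), var(Var2), !,
    Var1 == Var2.
identical(Atomic1, Atomic2) :-
    atomic(Atomic1), atomic(Atomic2), !,
    Atomic1 == Atomic2.
identical(Term1, Term2) :-
    compound(Term1), compound(Term2),
    functor(Term1, Functor, Arity),
    functor(Term2, Functor, Arity),
    identical_helper(Arity, Term1, Term2).

% identical_helper(+Index, +Term1, +Term2): arguments 1..Index of
% Term1 and Term2 are pairwise identical.
% ASSUMPTION: reconstructed for illustration, not the course's own code.
identical_helper(0, _Term1, _Term2).
identical_helper(Index, Term1, Term2) :-
    Index > 0,
    arg(Index, Term1, Arg1),
    arg(Index, Term2, Arg2),
    identical(Arg1, Arg2),
    NextIndex is Index - 1,
    identical_helper(NextIndex, Term1, Term2).
```

With this helper, identical/2 behaves like the built-in ==/2: identical(f(X, a), f(X, a)) succeeds, while identical(f(X, a), f(Y, a)) fails for distinct variables X and Y.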
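Last Call Optimization
A generalization of Tail-Recursion Optimization.
Turns recursion into iteration by reusing the stack frame.
When about to execute the last Goal in a clause,
if there are no more choice points for the predicate,
and no choice points from earlier Goals in the clause,
then the current stack frame can be reused for that last Goal.
delete_all(_Element, [], []).
delete_all(Element, [Element|List], NewList) :-
    !,
    delete_all(Element, List, NewList).
delete_all(Element, [Head|List], [Head|NewList]) :-
    delete_all(Element, List, NewList).
Here the cut in the second clause removes the choice point for the third clause, so in both recursive clauses the recursive call is a last call with no pending choice points.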
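A sketch of a loop written so that last call optimization can apply: the recursive call is the last goal, and the if-then-else commits to one branch, so no choice point is left behind when the call is reached. In typical implementations this runs in constant stack space even for large N. The names sum_to/2 and sum_to_helper/3 are hypothetical:

```prolog
% sum_to(+N, -Sum): Sum = 1 + 2 + ... + N, for N >= 0.
sum_to(N, Sum) :-
    N >= 0,
    sum_to_helper(N, 0, Sum).

% sum_to_helper(+N, +Partial, -Sum): Partial is the sum accumulated so
% far; the recursive call is the last goal of the else branch.
sum_to_helper(N, Partial, Sum) :-
    (   N =:= 0
    ->  Sum = Partial
    ;   NextPartial is Partial + N,
        NextN is N - 1,
        sum_to_helper(NextN, NextPartial, Sum)
    ).
```

For example, sum_to(100, S) gives S = 5050.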
Advice on cuts
Cuts are dangerous and easy to misuse.
Rules of thumb:
Use them sparingly.
Use them with as narrow a scope as possible.
Know which choice points you are removing.
Green cuts may be unnecessary: sometimes the compiler can figure out on its own that no alternatives remain, for example through first-argument indexing.
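Input/Output of Terms
Input and output in Prolog take place on streams.
By default, input comes from the keyboard, and output goes to the screen.
Three special streams: user_input, user_output, user_error
read(-Term): read the next term, terminated by a full stop, from the current input
write(+Term): write Term to the current output
nl: write a newline to the current output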
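Example: Input/Output
repeat/0 is a built-in predicate that always succeeds again on backtracking.
% classifying terms
classify_term :-
    repeat,
    write('What term should I classify? '),
    nl,
    read(Term),
    process_term(Term),
    Term == end_of_file.
Until read/1 returns end_of_file at the end of the input, the final test fails and execution backtracks into repeat for the next term.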
I/O Example (cont)
process_term(Atomic) :-
    atomic(Atomic), !,
    write(Atomic), write(' is atomic.'), nl.
process_term(Variable) :-
    var(Variable), !,
    write(Variable), write(' is a variable.'), nl.
process_term(Term) :-
    compound(Term),
    write(Term), write(' is a compound term.'), nl.
Streams
You can create streams with open/3:
open(+FileName, +Mode, -Stream)
Mode is one of read, write, or append.
When you have finished reading from or writing to a stream, close it with close(+Stream).
There are stream versions of the other Input/Output predicates:
read(+Stream, -Term)
write(+Stream, +Term)
nl(+Stream)
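A round-trip sketch using open/3, the stream versions of the I/O predicates, and close/1: write a list of terms to a file, then read them back until end_of_file. The names write_terms/2 and read_terms/2 are hypothetical. It uses writeq/2 rather than write/2 so that atoms such as 'Hello world' are quoted and read back unchanged:

```prolog
% write_terms(+FileName, +Terms): one term per line, each ending in '.'.
write_terms(FileName, Terms) :-
    open(FileName, write, Stream),
    write_each(Terms, Stream),
    close(Stream).

write_each([], _Stream).
write_each([Term|Terms], Stream) :-
    writeq(Stream, Term),      % quote atoms where needed
    write(Stream, '.'),        % full stop so read/2 sees the term's end
    nl(Stream),
    write_each(Terms, Stream).

% read_terms(+FileName, -Terms): read every term in the file.
read_terms(FileName, Terms) :-
    open(FileName, read, Stream),
    read(Stream, Term),
    read_rest(Term, Stream, Terms),
    close(Stream).

read_rest(end_of_file, _Stream, []) :- !.
read_rest(Term, Stream, [Term|Terms]) :-
    read(Stream, Next),
    read_rest(Next, Stream, Terms).
```

This sketch does not close the stream if an error is raised between open/3 and close/1; a more careful version would guard against that.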
Characters and character I/O
Prolog represents characters in two ways:
Single-character atoms: 'a', 'b', 'c'
Character codes: numbers that represent the character in some character encoding scheme (like ASCII)
By default, the character encoding scheme is ASCII, but others are possible for handling international character sets.
Input and Output predicates for characters follow a naming convention:
If the predicate deals with single-character atoms, its name ends in _char.
If the predicate deals with character codes, its name ends in _code.
Characters are character codes in traditional "Edinburgh" Prolog; single-character atoms were introduced in the ISO Prolog Standard.
Special Syntax I
Prolog has a special syntax for typing character codes: 0'a is an expression that denotes the character code that represents the character a in the current character encoding scheme.
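A small sketch of why the 0'c notation is handy: character tests can be written without memorizing code numbers. lower_case_code/1 is a hypothetical name, and it assumes an ASCII-compatible encoding, in which a to z have consecutive codes:

```prolog
% lower_case_code(+Code): Code is the code of a letter from a to z.
% 0'a and 0'z stand for the codes 97 and 122 in ASCII.
lower_case_code(Code) :-
    Code >= 0'a,
    Code =< 0'z.
```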
Special Syntax II
A sequence of characters enclosed in double quotes is shorthand for a list containing those character codes:
"abc" = [97, 98, 99]
It is possible to change this default behavior to one that uses single-character atoms instead of character codes, but we won't do that here.
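The behavior of double quotes is controlled by the ISO double_quotes flag. The sketch below sets it to codes explicitly, which is the traditional default assumed in these slides; this matters in practice, since SWI-Prolog 7 and later read "abc" as a string object unless the flag is set. codes_demo/1 is a hypothetical name:

```prolog
% Make "..." denote a list of character codes in the rest of this file.
:- set_prolog_flag(double_quotes, codes).

% codes_demo(-Codes): Codes is the code list written as "abc".
codes_demo(Codes) :-
    Codes = "abc".
```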
Built-in Predicates:
atom_chars(Atom, CharacterCodes)
    Converts an Atom to its corresponding list of character codes, or converts a list of CharacterCodes to an Atom.
    (In ISO Prolog, atom_chars/2 uses single-character atoms; the character-code version is atom_codes/2.)
put_code(Code) and put_code(Stream, Code)
    Write the character represented by Code.
get_code(Code) and get_code(Stream, Code)
    Read a character and return its corresponding Code.
Checking the status of a Stream:
    at_end_of_file(Stream)
    at_end_of_line(Stream)
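In ISO Prolog the two conversions follow the _char/_code naming convention: atom_codes(abc, L) gives L = [97, 98, 99], while atom_chars(abc, L) gives L = [a, b, c]. A small sketch that converts an atom to codes, changes the first code, and converts back; capitalize/2 is a hypothetical name, and the arithmetic assumes ASCII letter codes:

```prolog
% capitalize(+Atom, -Capitalized): upper-case the first letter, if it
% is a lower-case ASCII letter; otherwise leave the atom unchanged.
capitalize(Atom, Capitalized) :-
    atom_codes(Atom, [First|Rest]),
    (   First >= 0'a, First =< 0'z
    ->  Upper is First - 0'a + 0'A     % shift into the A-Z range
    ;   Upper = First
    ),
    atom_codes(Capitalized, [Upper|Rest]).
```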
Tokenizer
A token is a sequence of characters that constitutes a single unit.
What counts as a token varies: a token for a programming language may be different from a token for, say, English.
We will start to write a tokenizer for English, and build on it in later classes.
Tokenizer for English
Most tokens are sequences of consecutive alphabetic characters, separated by white space.
The exception is a set of characters that always form a single token on their own: . ' ! ? -
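A minimal sketch of such a tokenizer over a list of character codes, producing a list of atoms. It is one possible starting point, not the course's code: the predicate names are hypothetical, only ASCII letters count as alphabetic, and any character that is not a letter, white space, or one of . ' ! ? - is simply skipped:

```prolog
% tokenize(+Codes, -Tokens)
tokenize([], []).
tokenize([Code|Codes], Tokens) :-
    white_space(Code), !,                  % white space separates tokens
    tokenize(Codes, Tokens).
tokenize([Code|Codes], [Token|Tokens]) :-
    single_char_token(Code), !,            % always a token on its own
    atom_codes(Token, [Code]),
    tokenize(Codes, Tokens).
tokenize([Code|Codes], [Token|Tokens]) :-
    letter(Code), !,                       % a run of letters is one token
    letters(Codes, Letters, Rest),
    atom_codes(Token, [Code|Letters]),
    tokenize(Rest, Tokens).
tokenize([_Other|Codes], Tokens) :-        % skip anything else
    tokenize(Codes, Tokens).

% letters(+Codes, -Letters, -Rest): Letters is the longest prefix of
% Codes made of letters; Rest is what follows it.
letters([Code|Codes], [Code|Letters], Rest) :-
    letter(Code), !,
    letters(Codes, Letters, Rest).
letters(Codes, [], Codes).

letter(Code) :- Code >= 0'a, Code =< 0'z.
letter(Code) :- Code >= 0'A, Code =< 0'Z.

white_space(32).          % space
white_space(9).           % tab
white_space(10).          % line feed
white_space(13).          % carriage return

single_char_token(46).    % .
single_char_token(39).    % '
single_char_token(33).    % !
single_char_token(63).    % ?
single_char_token(45).    % -
```

For example, the codes of I can't go! come out as ['I', can, '''', t, go, '!']: the contraction is split into three tokens, which is what the homework asks you to improve.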
Homework
Read the section on Input/Output in the SICStus Prolog manual.
This material corresponds to Ch. 5 in Clocksin and Mellish, but the Prolog manual is more up to date and consistent with the ISO Prolog Standard.
Improve the tokenizer by adding support for contractions:
can't, won't, haven't, etc.
would've, should've
I'll, she'll, he'll
he's, she's (contracted is, contracted has, and the possessive)
Don't hand this in, but hold on to it; you'll need it later.