Alfred V. Aho aho@cs.columbia.edu The Evolution of AWK: Computational Thinking in Programming Language Design COMS W4995 Design - Bjarne Stroustrup April.


1 Alfred V. Aho aho@cs.columbia.edu
The Evolution of AWK: Computational Thinking in Programming Language Design COMS W4995 Design - Bjarne Stroustrup April 24, 2015

2 Computational Thinking
Computational thinking is a fundamental skill for everyone, not just for computer scientists. To reading, writing, and arithmetic, we should add computational thinking to every child’s analytical ability. Just as the printing press facilitated the spread of the three Rs, what is appropriately incestuous about this vision is that computing and computers facilitate the spread of computational thinking. Jeannette M. Wing Computational Thinking CACM, vol. 49, no. 3, pp , 2006

3 What is Computational Thinking?
The thought processes involved in formulating problems so their solutions can be represented as computation steps and algorithms. Alfred V. Aho Computation and Computational Thinking The Computer Journal, vol. 55, no. 7, pp , 2012

4 Software in Our World Today
How much software does the world use today?
  Guesstimate: over one trillion lines of source code
What is the sunk cost of the legacy base?
  $100 per line of finished, tested source code
How many bugs are there in the legacy base?
  10 to 10,000 defects per million lines of source code
A. V. Aho, Software and the Future of Programming Languages, Science, February 27, 2004, pp

5 Programming Languages Today
Today there are thousands of programming languages. The website has programs in over 1,500 different programming languages and variations to generate the lyrics to the song “99 Bottles of Beer.”

6 “99 Bottles of Beer”
99 bottles of beer on the wall, 99 bottles of beer.
Take one down and pass it around, 98 bottles of beer on the wall.
98 bottles of beer on the wall, 98 bottles of beer.
Take one down and pass it around, 97 bottles of beer on the wall.
. . .
2 bottles of beer on the wall, 2 bottles of beer.
Take one down and pass it around, 1 bottle of beer on the wall.
1 bottle of beer on the wall, 1 bottle of beer.
Take one down and pass it around, no more bottles of beer on the wall.
No more bottles of beer on the wall, no more bottles of beer.
Go to the store and buy some more, 99 bottles of beer on the wall.
[Traditional]

7 “99 Bottles of Beer” in C++
#include <iostream>
using namespace std;

int main()
{
    int bottles = 99;
    // Print one verse per bottle, counting down to zero.
    while ( bottles > 0 )
    {
        cout << bottles << " bottle(s) of beer on the wall," << endl;
        cout << bottles << " bottle(s) of beer." << endl;
        cout << "Take one down, pass it around," << endl;
        cout << --bottles << " bottle(s) of beer on the wall." << endl;
    }
    return 0;
}
[Tim Robinson,

8 Stroustrup’s Version
#include <iostream>
using namespace std;

int main()
{
    for (int bottles = 99; bottles > 0; --bottles)
        cout << bottles << " bottle(s) of beer on the wall,\n"
             << bottles << " bottle(s) of beer.\n"
             << "Take one down, pass it around,\n"
             << bottles - 1 << " bottle(s) of beer on the wall.\n\n";
}
[Bjarne Stroustrup, personal communication, April 17, 2015]

9 “99 Bottles of Beer” in the Whitespace language
[Andrew Kemp,

10 Why Are There So Many Languages?
One language cannot serve all application areas well
  e.g., programming web pages (JavaScript)
  e.g., electronic design automation (VHDL)
  e.g., parser generation (YACC)
Programmers often have strongly held opinions about
  what makes a good language
  how programming should be done
There is no universally accepted metric for a good language!

11 Evolution of Programming Languages
1970: Fortran, Lisp, Cobol, Algol 60, APL, Snobol 4, Simula 67, Basic, PL/1, Pascal
2015 (TIOBE Index, April 2015): Java, C, C++, Objective-C, C#, JavaScript, PHP, Python, Visual Basic, Visual Basic .NET
2015 (PYPL Index, April 2015): Java, PHP, Python, C#, C++, C, JavaScript, Objective-C, MATLAB, R

12 Evolutionary Forces on Languages
Increasing diversity of applications
Stress on increasing programmer productivity and shortening time to market
Need to improve software security, reliability and maintainability
Emphasis on mobility and distribution
Support for parallelism and concurrency
New mechanisms for modularity
Trend toward multi-paradigm programming

13 Models of Computation in Languages
Early programming languages usually had only one model of computation:
  Fortran (1957): Procedural
  Lisp (1958): Functional
  Simula (1967): Object oriented
  Prolog (1972): Logic
  SQL (1974): Relational algebra

14 Models of Computation in Languages
New programming languages are often designed around several models of computation And legacy languages are incorporating additional models of computation to support multiple programming paradigms

15 Example: Elm Elm is a functional programming language for declaratively creating web browser based graphical user interfaces. It uses functional reactive programming and purely functional graphical layout to build user interfaces without any destructive updates. Elm was designed in 2012 by Evan Czaplicki. The key features in Elm are signals, immutability, static types, and interoperability with HTML, CSS, and JavaScript. elm-lang.org

16 Example: Rust Rust is a general-purpose, multi-paradigm, compiled programming language. It is designed to be a safe, concurrent, practical language. First pre-alpha release of the Rust compiler was in 2012. It supports pure-functional, concurrent-actor, imperative-procedural, and object-oriented programming styles. Rust was originally designed by Graydon Hoare and is supported by Mozilla Research. It advertises itself as “a systems programming language that runs blazingly fast, prevents almost all crashes, and eliminates data races.”

17 Example: Swift
Swift is Apple’s new programming language for iOS and OS X whose code is designed to work with Objective-C.
It was designed with code safety and performance in mind.
Some of the features of Swift include
  named parameters
  inferred types
  modules
  automatic memory management
  closures with unified function pointers
  functional programming patterns like map and filter

18 The Birth of AWK
AWK is a scripting language for routine data-processing tasks designed by Al Aho, Brian Kernighan, and Peter Weinberger at Bell Labs around 1977
Each of the co-designers had slightly different motivations
  Aho wanted a generalized grep
  Kernighan wanted a programmable editor
  Weinberger wanted a database query tool
All co-designers wanted a simple, easy-to-use language

19 Kleene Regular Expressions
Expression   Matches
c            the character c itself, except when c is (, ), or *
r1 | r2      r1 or r2
r1 r2        r1 followed by r2
r *          zero or more instances of r
( r )        r
‘|’ has lowest precedence, then concatenation, then *
‘|’ and concatenation are left associative, * is right associative
For example, a | b*c = (a) | ((b*) c)
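
The grouping in the example can be checked mechanically. The sketch below uses Python's `re` module, which gives `|`, concatenation, and `*` the same relative precedence; the helper `same_language` and the sample strings are illustrative assumptions, not part of the talk.

```python
import re

def same_language(pattern_a, pattern_b, samples):
    """Return True if both patterns accept exactly the same sample strings."""
    return all(
        bool(re.fullmatch(pattern_a, s)) == bool(re.fullmatch(pattern_b, s))
        for s in samples
    )

# A handful of test strings over {a, b, c}; not an exhaustive check.
SAMPLES = ["", "a", "b", "c", "bc", "bbbc", "ac", "ab", "abc"]

implicit = r"a|b*c"        # relies on the precedence rules
explicit = r"(a)|((b*)c)"  # the fully parenthesized reading from the slide
wrong = r"(a|b)*c"         # a different grouping, for contrast
```

On these samples `implicit` and `explicit` agree, while `wrong` differs on strings such as `a` and `ac`.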

20 Kleene Regular Expressions and Finite Automata are Equivalent
The set of strings denoted by any Kleene regular expression can be recognized by a deterministic finite automaton. The set of strings recognized by any finite automaton can be denoted by a regular expression.

21 Grep Regular Expressions
Expression   Matches
c            the character c itself, except when c is . [ ] ^ $ * \
r1r2         r1 followed by r2
r *          zero or more instances of r
.            any character
^            beginning of line when ^ is first character in regexp
$            end of line when $ is last character in regexp
[abc]        an a, b, or c
[a-z]        any lower-case letter
[^abc]       any character except an a, b, or c
[^0-9]       any character that is not a digit
\c           c unless c is ( ) or a digit
\( r \)      tagged regular expression that matches r
             the matched strings are available as \1, \2, etc.

22 Back Referencing in Grep Regular Expressions
Grep regular expressions can match non-regular languages:
  ^\([ab]*\)\1$ matches strings of the form xx where x is any string of a’s and b’s
Back referencing makes the string pattern matching problem NP-complete:
  Theorem: Let r be a grep regular expression with back referencing and s an input string. Determining whether s contains a substring matched by r is NP-complete.
  Proof. Reduction from vertex-cover.
See A. V. Aho, “Algorithms for finding patterns in strings”, Handbook of Theoretical Computer Science, MIT Press 1990, pp
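
The xx example can be tried directly. This is a minimal sketch in Python, whose `re` module also supports back references but writes the grouping parentheses without backslashes; the function name `is_square` is an assumption made for illustration.

```python
import re

# Python spelling of the grep pattern ^\([ab]*\)\1$ :
# group 1 captures x, and \1 must then repeat exactly the same text.
XX = re.compile(r"^([ab]*)\1$")

def is_square(s):
    """Return True if s has the form xx for some string x of a's and b's."""
    return XX.match(s) is not None
```

For instance, `is_square("abab")` holds with x = ab, while `is_square("abba")` does not, because no split of the string into two identical halves exists.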

23 Egrep Regular Expressions
Started with grep regular expressions except for back referencing Added ‘|’ for union as in Kleene regular expressions Added parentheses for grouping as in Kleene regular expressions Current egrep uses POSIX regular expressions.

24 Egrep Regular Expression Pattern-Matching Algorithm
Constructs the transitions for a deterministic finite automaton on demand from the regular expression using an LR(0)-like algorithm
Uses a fixed size cache to store the transitions of the DFA
Adds a transition in a given state on a given input character only when it is needed
When the cache becomes full, it flushes it and adds transitions to the empty cache as needed
Observed time complexity given a regular expression r and an input string s is O(|r| + |s|).
It is an open question if this can be achieved in the worst case.
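
The caching idea can be sketched in a few lines. The code below is not egrep's algorithm: it swaps the LR(0)-like construction for a plain lazy subset construction over a hand-written NFA for (a|b)*abb, and it tests whole strings rather than searching for substrings. The class name `LazyDFA`, the `cache_limit` parameter, and the example automaton are all assumptions made for illustration.

```python
# Hand-written NFA for (a|b)*abb with no epsilon moves:
# state -> {input character -> set of next states}
NFA = {
    0: {"a": {0, 1}, "b": {0}},
    1: {"b": {2}},
    2: {"b": {3}},
    3: {},
}
START = frozenset({0})
ACCEPTING = {3}


class LazyDFA:
    """Builds DFA transitions on demand and keeps them in a bounded cache."""

    def __init__(self, nfa, start, accepting, cache_limit=8):
        self.nfa = nfa
        self.start = start
        self.accepting = accepting
        self.cache_limit = cache_limit
        self.cache = {}    # (DFA state, character) -> DFA state
        self.flushes = 0   # how many times the full cache was thrown away

    def step(self, state, ch):
        key = (state, ch)
        if key not in self.cache:
            # Transition not computed yet: flush a full cache, then add it.
            if len(self.cache) >= self.cache_limit:
                self.cache.clear()
                self.flushes += 1
            nxt = set()
            for s in state:
                nxt |= self.nfa[s].get(ch, set())
            self.cache[key] = frozenset(nxt)
        return self.cache[key]

    def matches(self, text):
        """Return True if the whole text is accepted."""
        state = self.start
        for ch in text:
            state = self.step(state, ch)
        return bool(state & self.accepting)
```

With the default limit, `LazyDFA(NFA, START, ACCEPTING).matches("ababb")` is true and only the transitions actually visited are ever built; with `cache_limit=2` the answer is the same, but the cache is flushed along the way.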

25 The Evolution of AWK Prototypical use cases
selection: “print all lines containing the word AWK in the first field”
    $1 ~ /AWK/
transformation: “print the second and first field of every line”
    { print $2, $1 }
report generation: “sum the values in the first field of every line and then print the sum and average”
    { sum += $1 }
    END { print "sum = " sum, "avg = " sum/NR }

26 Structure and Invocation of an AWK Program
An AWK program is a sequence of pattern-action statements
    pattern { action }
    pattern { action }
    . . .
Each pattern is a boolean combination of regular, numeric, and string expressions
An action is a C-like program
If there is no { action }, the default is to print the line
Invocation
    awk 'program' [file1 file2 . . .]
    awk -f progfile [file1 file2 . . .]

27 AWK’s Model of Computation: Pattern-Action Programming
for each file
    for each line of the current file
        for each pattern in the AWK program
            if the pattern matches the input line then
                execute the associated action
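
The three nested loops translate almost word for word into a general-purpose language. The following is a rough Python sketch, not how any AWK implementation is written: the names `run`, `field`, and `PROGRAM` are hypothetical, files are passed as in-memory strings, and patterns and actions are plain Python callables.

```python
import re

def field(fields, i):
    """Return field i, or the empty string if the line has fewer fields."""
    return fields[i] if i < len(fields) else ""

def run(program, files):
    """Apply every (pattern, action) pair to every line of every file.

    fields[0] plays the role of $0, and fields[1], fields[2], ... of $1, $2, ...
    """
    output = []
    for text in files:                        # for each file
        for line in text.splitlines():        # for each line of the current file
            fields = [line] + line.split()
            for pattern, action in program:   # for each pattern in the program
                if pattern(fields):           # if the pattern matches the line
                    output.append(action(fields))
    return output

# Rough counterparts of   $1 ~ /AWK/   and   { print $2, $1 }
PROGRAM = [
    (lambda f: re.search("AWK", field(f, 1)) is not None, lambda f: f[0]),
    (lambda f: True, lambda f: field(f, 2) + " " + field(f, 1)),
]
```

Calling `run(PROGRAM, ["AWK book\nsed manual\n"])` collects the output lines in the order AWK would print them: every pattern is tried on the first line before the second line is read.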

28 AWK in a Nutshell - I
Input is read automatically across multiple files
  lines are split into fields $1, $2, . . ., $NF
  whole line is $0
Variables are dynamic and can contain string or numeric values or both
  no declarations: types determined by context and use
  initialized to 0 and empty string
  built-in variables for frequently used values
Operators work on strings or numbers
  coerce type/value according to context
  what does $1 == $2 mean?
Associative arrays take arbitrary subscripts
Regular expressions as in egrep

29 AWK in a Nutshell - II
Control-flow operators similar to C
  if-else, while, for, do
Built-in functions for arithmetic, string processing, regular expressions, editing text, . . .
Supports user-defined functions
printf for formatted output
getline for input from files or processes

30 Some Useful AWK “One-liners”
Print the total number of input lines
    END { print NR }
Print the last field of every line
    { print $NF }
Print each line preceded by its line number
    { print NR, $0 }
Print all non-empty lines
    NF > 0
What does this AWK program do?
    !x[$0]++
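
For readers who want to check their answer to the last question: `x[$0]` counts how often the current line has been seen, and the pattern is true only while that count is still zero, so each distinct line is printed once, at its first occurrence. A small Python sketch of the same logic follows; the function name `first_occurrences` is an assumption for illustration.

```python
def first_occurrences(lines):
    """Keep each distinct line once, in order of first appearance."""
    seen = {}   # plays the role of the associative array x
    out = []
    for line in lines:
        if not seen.get(line, 0):               # !x[$0], tested before the increment
            out.append(line)                    # default action: print the line
        seen[line] = seen.get(line, 0) + 1      # x[$0]++
    return out
```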

31 AWK Summary
AWK is a scripting language designed for routine data-processing tasks on strings and numbers
E.g.: given a list of name-value pairs, print the total value associated with each name.
An AWK program is a sequence of pattern-action statements
Input:
    alice 10
    eve 20
    bob 15
    alice 30
Program:
    { total[$1] += $2 }
    END { for (x in total) print x, total[x] }
Output:
    eve 20
    bob 15
    alice 40
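
For comparison, here is a sketch of the same computation in Python, with the associative array replaced by a dictionary. The function name `totals_by_name` is hypothetical, and the values are read as floats, roughly as AWK treats numeric strings.

```python
def totals_by_name(lines):
    """Sum the second field of each line, grouped by the first field."""
    total = {}
    for line in lines:
        fields = line.split()
        name, value = fields[0], float(fields[1])
        total[name] = total.get(name, 0) + value   # total[$1] += $2
    return total
```

The AWK version needs no explicit splitting, conversion, or initialization, which is the point of the slide.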

32 Comparison: Regular Expression Pattern Matching in Perl, Python, Ruby vs. AWK
Time to check whether a?^n a^n matches a^n
[Figure: matching time versus regular expression and text size n]
Russ Cox, Regular expression matching can be simple and fast (but is slow in Java, Perl, PHP, Python, Ruby, ...) [ 2007]
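
The test case in the comparison is easy to build for any n: n optional a's followed by n required a's, matched against a string of n a's. Below is a small sketch using Python's backtracking `re` module; the function names are assumptions, and n is kept small on purpose, because the running time of a backtracking matcher grows very quickly on this input.

```python
import re

def pathological_pair(n):
    """Build the pattern a?^n a^n and the text a^n for a given n."""
    pattern = "a?" * n + "a" * n
    text = "a" * n
    return pattern, text

def matches(n):
    """Check that the pattern for size n accepts the text for size n."""
    pattern, text = pathological_pair(n)
    return re.fullmatch(pattern, text) is not None
```

The match always succeeds, but only when every `a?` matches the empty string, which a backtracking matcher typically discovers last.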

33 “99 Bottles of Beer” in AWK (bottled version)
BEGIN{
split( \
"no mo"\
"rexxN"\
"o mor"\
"exsxx"\
"Take "\
"one dow"\
"n and pas"\
"s it around"\
", xGo to the "\
"store and buy s"\
"ome more, x bot"\
"tlex of beerx o"\
"n the wall" , s,\
"x"); for( i=99 ;\
i>=0; i--){ s[0]=\
s[2] = i ; print \
s[2 + !(i) ] s[8]\
s[4+ !(i-1)] s[9]\
s[10]", " s[!(i)]\
s[8] s[4+ !(i-1)]\
s[9]".";i?s[0]--:\
s[0] = 99; print \
s[6+!i]s[!(s[0])]\
s[8] s[4 +!(i-2)]\
s[9]s[10] ".\n";}}
[Wilhem Weske,

