Hoogλe Finding Functions from Types Neil Mitchell haskell.org/hoogle community.haskell.org/~ndm/ [α]  [α]

Slides:



Advertisements
Similar presentations
Functional Programming Lecture 10 - type checking.
Advertisements

Tom Schrijvers K.U.Leuven, Belgium with Manuel Chakravarty, Martin Sulzmann and Simon Peyton Jones.
Haskell Chapter 7. Topics  Defining data types  Exporting types  Type parameters  Derived instances  Type synonyms  Either  Type classes  Not.
Kathleen Fisher cs242 Reading: “A history of Haskell: Being lazy with class”,A history of Haskell: Being lazy with class Section 6.4 and Section 7 “Monads.
Problem Solving 5 Using Java API for Searching and Sorting Applications ICS-201 Introduction to Computing II Semester 071.
Haskell Chapter 5, Part I. Topics  Higher Order Functions  map, filter  Infinite lists Get out a piece of paper… we’ll be doing lots of tracing.
Bar Ilan University And Georgia Tech Artistic Consultant: Aviya Amir.
S EMINAL : Searching for ML Type-Error Messages Benjamin Lerner, Dan Grossman, Craig Chambers University of Washington.
Arrays.
©The McGraw-Hill Companies, Inc. Permission required for reproduction or display. ETC - 1 What comes next? Recursion (Chapter 15) Recursive Data Structures.
Hash Tables1 Part E Hash Tables  
Hash Tables1 Part E Hash Tables  
Data Structures Introduction Phil Tayco Slide version 1.0 Jan 26, 2015.
Type Classes with Functional Dependencies Mark P Jones, Oregon Graduate Institute The theory of relational databases meets.
0 PROGRAMMING IN HASKELL Typeclasses and higher order functions Based on lecture notes by Graham Hutton The book “Learn You a Haskell for Great Good” (and.
CSC 107 – Programming For Science. Today’s Goal  Learn how arrays normally used in real programs  Why a function returning an array causes bugs  How.
CMPSC 16 Problem Solving with Computers I Spring 2014 Instructor: Lucas Bang Lecture 15: Linked data structures.
INFO 344 Web Tools And Development CK Wang University of Washington Spring 2014.
Uniform Boilerplate and List Processing Or: Scrap Your Scary Types Neil Mitchell and Colin Runciman, Haskell Workshop, 2007 Simple generics (Haskell ’98)
1 Evaluating top-k Queries over Web-Accessible Databases Paper By: Amelie Marian, Nicolas Bruno, Luis Gravano Presented By Bhushan Chaudhari University.
Lecture 10: Class Review Dr John Levine Algorithms and Complexity March 13th 2006.
Week 6 - Wednesday.  What did we talk about last time?  Exam 1 post-mortem  Recursive running time.
Data Structures Chapter 1- Introduction Mohamed Mustaq.A.
PowerPoint Presentation of Essential Concepts PowerPoint Presentation of Essential Concepts Chalice Tillis LEM 511.
Arrays Tonga Institute of Higher Education. Introduction An array is a data structure Definitions  Cell/Element – A box in which you can enter a piece.
CS321 Functional Programming 2 © JAS Type Checking Polymorphism in Haskell is implicit. ie the system can derive the types of all objects. This.
Some Other Collections: Bags, Sets, Queues and Maps COMP T2 Lecture 4 School of Engineering and Computer Science, Victoria University of Wellington.
Advanced Functional Programming 2009 Ulf Norell (lecture by Jean-Philippe Bernardy)
CS 206 Introduction to Computer Science II 09 / 10 / 2009 Instructor: Michael Eckmann.
Major objective of this course is: Design and analysis of modern algorithms Different variants Accuracy Efficiency Comparing efficiencies Motivation thinking.
Create Your Own Webpage. Today’s Class Internet Safety & Privacy Tables Embedding music and video Frames.
240-Current Research Easily Extensible Systems, Octave, Input Formats, SOA.
FIX Eye FIX Eye Getting started: The guide EPAM Systems B2BITS.
Data Structures & Algorithms
Curry A Tasty dish? Haskell Curry!. Curried Functions Currying is a functional programming technique that takes a function of N arguments and produces.
Lecture 10 Page 1 CS 111 Summer 2013 File Systems Control Structures A file is a named collection of information Primary roles of file system: – To store.
Composition When one class contains an instance variable whose type is another class, this is called composition. Instead of inheritance, which is based.
Hoogle Neil Mitchell. Haskell Types 101 isURI :: String -> Bool (||) :: Bool -> Bool -> Bool or :: [Bool] -> Bool id :: a ->
PHY 107 – Programming For Science. Today’s Goal  Learn how arrays normally used in real programs  Why a function returning an array causes bugs  How.
1 Testing in Haskell: using HUnit Notes, thanks to Mark P Jones Portland State University.
Python 1 SIGCS 1 Intro to Python March 7, 2012 Presented by Pamela A Moore & Zenia C Bahorski 1.
 SEO Terms A few additional terms Search site: This Web site lets you search through some kind of index or directory of Web sites, or perhaps both an.
Fusion Design Overview Object Interaction Graph Visibility Graph Class Descriptions Inheritance Graphs Fusion: Design The overall goal of Design is to.
7 1 Database Systems: Design, Implementation, & Management, 7 th Edition, Rob & Coronel 7.6 Advanced Select Queries SQL provides useful functions that.
Data Structures and Algorithm Analysis Dr. Ken Cosh Linked Lists.
0 PROGRAMMING IN HASKELL Typeclasses and higher order functions Based on lecture notes by Graham Hutton The book “Learn You a Haskell for Great Good” (and.
Advanced Functional Programming 2010
Polymorphic Functions
Haskell Chapter 7.
Types CSCE 314 Spring 2016.
Haskell Chapter 2.
smallest number of inserts/deletes to turn arg#1 into arg#2
String Processing.
.NET and .NET Core 5.2 Type Operations Pan Wuming 2016.
Design by Contract Fall 2016 Version.
Object Oriented Programming (OOP) LAB # 8
Objective of This Course
Data Structures and Algorithms
CMSC 202 Lesson 22 Templates I.
Type & Typeclass Syntax in function
Database Design and Programming
topics mutable data structures
Records and Type Classes
Templates I CMSC 202.
CSEP505: Programming Languages Lecture 9: Haskell Typeclasses and Monads; Dan Grossman Autumn 2016.
Arrays.
PROGRAMMING IN HASKELL
PROGRAMMING IN HASKELL
Assignment resource Working with Excel Tables, PivotTables, and Pivot Charts Fairhurst pp The commands on these slides work with the Week 2 Excel.
Records and Type Classes
Presentation transcript:

Hoogλe Finding Functions from Types Neil Mitchell haskell.org/hoogle community.haskell.org/~ndm/ [α]  [α]

Hoogle Synopsis Hoogle is a Haskell API search engine, which allows you to search many standard Haskell libraries by either function name, or by approximate type signature. “ ” Or, Google for Haskell libraries

Solving the Jigsaw static typing is … putting pieces into a jigsaw puzzle “ ” Real World Haskell Find a function to go here

Which function do we want? [Int]  String Ord a  [a]  [a] Char  Bool (a  b)  [a]  [b] a  [(a,b)]  b Set a  a  Bool

The Problem Given a type signature, rank a set of functions with types by appropriateness Order types by closeness, efficiently Heuristics/Psychic powers Algorithms

String: Ordering by closeness Equality, perhaps case insensitive Prefix/Suffix/Substring matching Levenshtein/edit distance Tries, KMP, FSA, Baeza-Yates… search :: [(String,φ)]  (String  [φ])

String: Edit Distance How many “steps” –Insertion or deletion –Substitution (just a cheap insert and delete?) Hello  Hell Hell  Sell O(nm), result is bounded by max(n,m)

Type: Ordering by closeness Ignoring performance, we can write: How “close” are two Type values? (May not be commutative) match :: Type  Type  Maybe Closeness

Brainstorm match :: Type  Type  Maybe Closeness What is Closeness? How is it calculated?

Ideas Alpha equality (Hoogle 1) Isomorphism (Rittri, Runciman ’s) Textual type searching (Hayoo!) Unification (Hoogle 2) Edit distance (Hoogle 3) Full edit distance (Hoogle 3.5) Structural edit distance (Hoogle 4) Result indexed edit distance (Hoogle 5)

Alpha equality Take a type signature, and “normalise” it Rename variables to be sequential Then do an exact text match k  v  Map k v a  b  Map a b No psychic powers

Isomorphism Only match types which are isomorphic –Long before type classes Ismorphism is about equal structure –a  b  c  (a, b)  c uncurry :: (a  b  c)  (a, b)  c :: (a  b  c)  a  b  c Less useful for modern code

Textual Type Searching Alpha normalise + strength reduced alpha normalisation k  v  Map k v a  b  Map a b & a  b  c a b Plus substring searching A neat hack, build on text search

Unification Unify against each result, like a compiler The lookup problem: –a  [(a,b)]  b  a  [(a,b)]  Maybe b Works OK, but not great, in practice –More general is fine, what about less general? –a  everything? –is undefined really the answer? Not what humans want

Edit Distance What changes do I need to make to equalise these types Each change has a cost a  [(a,b)]  b a  [(a,b)]  Maybe b Eq a  a  [(a,b)]  Maybe b box context A nice start, lots of details left

Ideas Compared Generality My Type Alpha equality Unification Edit distance Textual search = superset of (alpha equality + more general unification) Unification (?) All but Textual search can have argument reordering added

Edit Distance Costs Alias following (String  [Char]) Instances (Ord a  a  a) Subtyping (Num a  a  Int) Boxing (a  m a, a  [a]) Free variable duplication ((a,b)  (a,a)) Restriction ([a]  m a, Bool  a) Argument deletion (a  b  c  b  c) Argument reordering

Edit Distance Examples [Int]  String Show a  a  String [Int]  [Char] (a  b)  [a]  [b] [a]  [Char] [a]  [b] Int  String a  String alias restrict context unbox restrict dead arg subtype

A note on “subtype” Num a  a  a Double  Double a  a Given instance Num Double: Double  (Num a  a)  a

A note on “boxing” Eq a  a  [a]  Int Eq a  a  [a]  Maybe Int Eq a  a  [a]  [Int] Most boxes add a little info: Maybe - this might fail/optional arg List - may be multiple results IO - you need to be in the IO monad

Edit Distances Which types of edits should be used? –Lots of scope for experimentation Can the edits be implemented efficiently? What environment do we need? –Aliases? Instances?

Ordering Closeness type Closeness = [Edit] compare :: Closeness  Closeness  Ordering compare x y = score x > score y score :: Closeness  Double score = sum. map rank rank :: Edit  Double Throw away choices

Ranking Edits Initial attempt: Make up numbers manually –Did not scale at all, hard to get right, like solving a large constraint problem in your head Solution: Constraint solver!

Ranking Examples Keep a list of example searches, with ordered results When someone complains, add their complaint to this list Generate a set of constraints, then solve –I use the ECLiPSe constraint solver

Performance Target: As-you-type searches against all current versions of all Haskell libraries

Naive Edit Distance [x| (t, x)  database, Just c  [match t user], order by c] let n = length database –  (n) to search all items (ignoring sort) –  (n) to find the best result n = 27,396 today (target of 296,871 )

Decomposing Edit Distance Functor f  (a  b)  f a  f b subtyping/contextdifferent variables same variablesswap arguments

Interactive Lists data Barrier o α = Value o α | Barrier o bsort :: Ord o  [Barrier o α]  [α] Given (Barrier o 1 :xs),  Value o 2 x  xs, o 1 < o 2

Per Argument Searching The idea: Search for each argument separately, combine the results a  b  c combine $ search arguments a `merge` search arguments b `merge` search results c Use interactive lists for search/combine

Implementing Search Have type graphs, annotated with costs –Dijkstra’s graph search algorithm String a Char [Char]

Implementing Combine Combine is fiddly Needs to apply costs such as instances, variable renaming, argument deletion Check all arguments are present Ensure no duplicate answers Fast to search for the best matches

The Problem Finds the first result quickly Graphs may be really big But a particular search may match many results in many ways –Finding all results can take some time –5000 functions, ~5 seconds Need to be more restrictive with matching

Structure Matching We can decompose any type into a structure and a list of terms Either (Maybe a) (b,c)  ? (? ?) (? ? ?) + Either Maybe a (,) b c Can now find types quickly –22 distinct argument structures in base library –Very amenable to hashing/interning –Not as powerful as edit distance

Structure + Aliases String  [Char] ? + String  ? ? + [] Char Solution: Expand out all aliases –Penalise for all mismatched aliases used –i.e. left uses String, but right doesn’t –Imprecise heuristic

Structure + Boxing Maybe a  a ? ? + Maybe a  ? + a Solution: Only allow top-level boxes –Maybe [a]  Maybe a –Now have at most 3 structure lookups Boxing is 3x expensive

Step 1: Restrict Search Use structure for type search Many fewer answers –5,000 types, ~0.5 seconds Target: 300,000 types, ~0.1 seconds

Step 2: Restrict Combine Start by looking at the result first Map Structure (Map Int [(Type,[(Structure,Type)],[φ])]) box/unbox alias argument count argument reorder Not yet finished implementation

The Hoogle Tool Over 6 years old 4 major versions (each a complete rewrite) –Version 1 in Javascript, 2-4 in Haskell Web version Firefox plugin, iPhone version, command line tool, custom web server

Hoogle Statistics 1.7 million searches up until 1 st Jan 2011 Between 1000 to 2500 a day

Academia + Real World Academia –Theory of type searching Real World –Generating databases of type signatures –Web server, AJAX interface, interactivity –Lots of user feedback, including logs –1/6 of searches are type based

Fixing User Searches double to integer Did you mean: Double  Integer where keyword where

Conclusions I now use Hoogle every day –Name search lets you look up types/docs –Type search lets you look up names –Both let you find new functions Edit distance works for type search Having an online search engine is handy! haskell.org/hoogle

Funny Searches eastenders california public schools portable classes Bondage diem chuan truong dai hoc su pham ha noi 2008 Messenger freak ebay consistency version Simon Peyton Jones Genius free erotic storeis videos pornos gratis gia savores de BARILOCHE name of Peanuts carton bird Colin Runciman