Programming with Data (PWD 2019) Revision Class

Tuesday, 7 May 2019 Stelios Sotiriadis Prof. Alessandro Provetti

Agenda for today
Part 1: Exam clarifications and material to be covered
Part 2: Revision and example questions for exam

Part 1 Clarifications for exam and material to be covered

Preparing for exam
All material seen in class is available for download in:
Moodle PWD
Stelios' web site for FT students:
Alessandro's web site for PT students:
Key chapters from the book are available for download
Code samples:

How is PWD going to be assessed?
The written examination is on Monday 10th of June 2019, 1:30-3:30pm. Please double check!
There are FIVE questions in the exam paper.
Answer only FOUR of the FIVE questions.
Each question carries 25 marks.

What is expected?
To be able to critically analyse concepts taught in class
To explain a concept
To write (short) code fragments

How to approach questions?
Be logical.
Explain the concepts clearly in your own words.
No need to memorize definitions, but be able to explain a concept simply.
Answers will be evaluated on their logic and critical understanding, whether they are arguments or code.

Answering coding questions
Python indentation might be confusing for markers.
Provide clear code, and make sure you don't overthink it.
Think about indentation to make sure your program could work:

    i = 0
    for i in range(10):
        print(i)
        if i == 8:
            break
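To illustrate why indentation matters to a marker, the sketch below runs the same loop with two indentation choices. The function names `stop_at_eight` and `check_after_loop` are made up for this example and are not from the course material.

```python
def stop_at_eight():
    """The `if` is indented inside the loop, so the loop stops early at 8."""
    seen = []
    for i in range(10):
        seen.append(i)
        if i == 8:
            break
    return seen


def check_after_loop():
    """The `if` is dedented out of the loop, so it runs once, after the loop."""
    seen = []
    for i in range(10):
        seen.append(i)
    if i == 8:  # i is 9 when the loop has finished, so this never triggers
        seen.append("stopped")
    return seen


print(stop_at_eight())     # values 0 through 8
print(check_after_loop())  # values 0 through 9
```

The two functions differ only in the indentation of the `if`, yet they return different lists.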

In practice
What is the role of Python?
Python will be used to answer a question; it is not examined per se.
Ethics of Computing and Data Mining/Science?
Ethics material is not in the final exam.
Lecture 4 and Lecture 5 will not be examined.
SQLite and Pandas will not be examined.

Topics to be examined
Past exam papers
Computational problems, cost estimates, timing of a function
Random numbers and their application in algorithms
Probabilities and how to estimate events
Gradient descent
Informal database specification (E-R models)
SQL:
  Create tables, adding constraints and primary, foreign keys
  Update a table
  Select statements (SELECT/FROM/WHERE)
Graphs and matrices
Greedy and dynamic programming algorithms
Complexity classes, intractable problems and approximation
The topics are split into GROUP 1, GROUP 2 and GROUP 3. There will always be a question from the first group and one from the second group.
$\binom{8}{5} = 56$ possibilities
$\binom{6}{3} = 20$ possibilities

Part 2 Revision and example questions for exam

Class 1 Big O & sorting and searching

Lecture 1 Big O Computational costs
[Figure: number of operations against number of elements for O(1), O(log n), O(n), O(n log n) and O(n!)]
O(n^2): bubble sort
O(n): linear search
O(log n): binary search
O(n log n): merge sort
Complete question 1 of the revision class quiz

Lecture 1 Sorting and searching
Example question: What are the differences between the following complexities: O(n^2), O(n) and O(log n)? Give an example of an algorithm for each complexity. [5 marks]
Sorting
  Merge sort
  Insertion sort
  Tim sort: sorts small pieces using insertion sort, then merges the pieces using merge sort
Searching
  Linear search / naive search: O(n) operations
  Advanced search (e.g. binary search): O(log n) operations (divide and conquer)
Complete question 2 of the revision class quiz
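The Tim sort idea described above (insertion sort on small pieces, then merging) can be sketched as follows. This is a simplified illustration, not Python's actual built-in Timsort; the name `hybrid_sort` and the fixed `piece_size` of 4 are assumptions made for the example.

```python
def insertion_sort(items):
    """Sort a small list in place; O(n^2) comparisons in the worst case."""
    for i in range(1, len(items)):
        key = items[i]
        j = i - 1
        while j >= 0 and items[j] > key:
            items[j + 1] = items[j]
            j -= 1
        items[j + 1] = key
    return items


def merge(left, right):
    """Merge two sorted lists into one sorted list in O(n) steps."""
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


def hybrid_sort(items, piece_size=4):
    """Insertion-sort fixed-size pieces, then merge the pieces pairwise."""
    pieces = [insertion_sort(items[k:k + piece_size])
              for k in range(0, len(items), piece_size)]
    while len(pieces) > 1:
        next_round = []
        for k in range(0, len(pieces), 2):
            if k + 1 < len(pieces):
                next_round.append(merge(pieces[k], pieces[k + 1]))
            else:
                next_round.append(pieces[k])  # odd piece carries over
        pieces = next_round
    return pieces[0] if pieces else []


print(hybrid_sort([5, 2, 9, 1, 7, 3, 8, 6, 4, 0]))
```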

Class 2 Linear vs. Binary search & code benchmarking

Lecture 2 Recursive algorithms
A recursive algorithm is an algorithm that calls itself.
Linear search
Binary search
Interpolation search
Example question: Explain briefly the binary and interpolation search algorithms [5 marks]
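As a revision aid, here is a minimal sketch of the two search algorithms named in the example question. Both are written iteratively for brevity (a recursive version would call itself on the remaining half instead of looping), and both assume the input list is already sorted and holds numbers. The function names are chosen for this example.

```python
def binary_search(sorted_items, target):
    """Return the index of target, or -1; halves the range at each step."""
    low, high = 0, len(sorted_items) - 1
    while low <= high:
        mid = (low + high) // 2
        if sorted_items[mid] == target:
            return mid
        if sorted_items[mid] < target:
            low = mid + 1
        else:
            high = mid - 1
    return -1


def interpolation_search(sorted_items, target):
    """Like binary search, but guesses the position from the values."""
    low, high = 0, len(sorted_items) - 1
    while low <= high and sorted_items[low] <= target <= sorted_items[high]:
        if sorted_items[low] == sorted_items[high]:
            return low if sorted_items[low] == target else -1
        # Estimate where target sits between the two end values.
        pos = low + (target - sorted_items[low]) * (high - low) // (
            sorted_items[high] - sorted_items[low])
        if sorted_items[pos] == target:
            return pos
        if sorted_items[pos] < target:
            low = pos + 1
        else:
            high = pos - 1
    return -1


data = [2, 5, 8, 12, 16, 23, 38, 56, 72, 91]
print(binary_search(data, 23), interpolation_search(data, 23))
```

Interpolation search tends to do well when the values are spread roughly evenly; binary search gives the O(log n) bound regardless of how the values are distributed.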

Lab 2 Benchmarking
To measure the time cost of an algorithm we use the computer's clock to obtain an actual run time. The program below times an algorithm that counts up to a given number.

    import time

    def some_algorithm(n):
        start = time.time()          # clock reading before the work
        for i in range(n):
            pass                     # your actual algorithm goes here
        elapsed = time.time() - start
        print(elapsed)               # run time in seconds

    some_algorithm(10)

Example question: Give an example of how to measure the running time of an algorithm in Python [5 marks]
Complete question 3 of the revision class quiz

Class 3 Secretary problem & probabilities

Lecture 3 Optimal stopping strategy and the “Secretary problem”
Read the following:
Probabilities
What is the probability of tossing a fair coin 3 times and getting a head each time?
We have 2^3 = 8 equally likely outcomes: HHH HHT HTH HTT THH THT TTH TTT
Let X be the number of heads.
Value for X              Probability
X=0 (no heads!)          1/8
X=1 (I have 1 head)      3/8
X=2 (I have 2 heads)     3/8
X=3 (all heads!)         1/8
So the probability of a head each time is P(X=3) = 1/8.
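The table can be checked by listing all eight outcomes and counting heads. This is a small sketch using only the standard library; the variable names are chosen for this example.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 2^3 = 8 equally likely outcomes of three fair coin tosses.
outcomes = list(product("HT", repeat=3))

# How many outcomes contain exactly k heads, for k = 0..3.
heads_count = Counter(outcome.count("H") for outcome in outcomes)
distribution = {k: Fraction(heads_count[k], len(outcomes)) for k in range(4)}

for k, probability in distribution.items():
    print(f"P(X={k}) = {probability}")
```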

Lab 3
Consult Joel Grus’ Data Science from Scratch for the ‘probability of having two baby girls’ example.
What is the probability that a couple has two girls, conditional on the event “the first child is a girl”? 1/2
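The 1/2 answer can be reproduced by enumerating the four possible two-child families. The sketch assumes each child is equally likely to be a girl or a boy, independently of the other; the helper `conditional_probability` is written for this example and is not taken from the book.

```python
from fractions import Fraction
from itertools import product

# (older child, younger child), each "G" or "B", all four equally likely.
families = list(product("GB", repeat=2))


def conditional_probability(event, given):
    """P(event | given), counted over the equally likely families."""
    matching_given = [f for f in families if given(f)]
    matching_both = [f for f in matching_given if event(f)]
    return Fraction(len(matching_both), len(matching_given))


def both_girls(family):
    return family == ("G", "G")


def first_is_girl(family):
    return family[0] == "G"


def at_least_one_girl(family):
    return "G" in family


print(conditional_probability(both_girls, first_is_girl))      # 1/2
print(conditional_probability(both_girls, at_least_one_girl))  # 1/3
```

Conditioning on "at least one girl" instead of "the first child is a girl" changes the answer, which is why the wording of the condition matters.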

Class 4 More probabilities

Lab 4 Probabilities continue…
Probability definitions:
Unconditional events: Pr[E,F] = Pr[E] * Pr[F] when events E and F are independent.
Conditional events: Pr[E,F] = Pr[F] * Pr[E|F] when events E and F are dependent.
Example question: We expect 5 yellow and 4 black cars to pass by. What is the probability that the second car to pass by is black, conditional on the event that the first car that passed by was yellow? [1 mark]
4 out of 8 = 0.5
Example question: 2 blue and 3 red balls are in a bag. What are the chances of getting a blue ball? [1 mark]
2 in 5 = 0.4
Complete question 4 of the revision class quiz
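The car question can be checked by enumerating every ordered pair of distinct cars, which also confirms the rule for dependent events, Pr[E,F] = Pr[F] * Pr[E|F]. The sketch assumes every order in which the cars pass is equally likely; here F is "the first car is yellow" and E is "the second car is black".

```python
from fractions import Fraction
from itertools import permutations

cars = ["yellow"] * 5 + ["black"] * 4

# Every ordered (first car, second car) pair of distinct cars.
pairs = list(permutations(range(len(cars)), 2))

first_yellow = [p for p in pairs if cars[p[0]] == "yellow"]
yellow_then_black = [p for p in first_yellow if cars[p[1]] == "black"]

p_f = Fraction(len(first_yellow), len(pairs))                      # Pr[F]
p_e_given_f = Fraction(len(yellow_then_black), len(first_yellow))  # Pr[E|F]
p_joint = Fraction(len(yellow_then_black), len(pairs))             # Pr[E,F]

print(p_e_given_f)                 # 1/2, i.e. 4 out of 8
print(p_joint == p_f * p_e_given_f)
```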

Class 5 Knapsack fractional vs. Knapsack 0-1

Lab 5 Greedy algorithms
Make the local choice that maximizes a local (easy to check) criterion, in the hope that the solution generated this way will maximise the global (costly to check) criterion.
Knapsack fractional
n objects (n=7)
m is the size of the bag (m=15)
Objects: O
Profits: P
Weight: W
The question is: how do we fill the bag such that the profit is maximized? (Solution is in notes of Lab 5)
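A greedy solution to the fractional knapsack can be sketched as follows: take objects in decreasing order of profit per unit of weight, and take a fraction of the last one if it does not fit whole. The three-object instance below is a made-up illustration, not the n=7, m=15 instance from the Lab 5 notes.

```python
def fractional_knapsack(profits, weights, capacity):
    """Greedy: fill the bag by best profit/weight ratio first."""
    order = sorted(range(len(profits)),
                   key=lambda i: profits[i] / weights[i],
                   reverse=True)
    total_profit = 0.0
    remaining = capacity
    for i in order:
        if remaining <= 0:
            break
        take = min(weights[i], remaining)       # whole object or a fraction
        total_profit += profits[i] * take / weights[i]
        remaining -= take
    return total_profit


# Hypothetical instance: ratios are 6, 5 and 4 profit per unit weight.
profits = [60, 100, 120]
weights = [10, 20, 30]
print(fractional_knapsack(profits, weights, capacity=50))  # 240.0
```

Here the first two objects fit whole (profit 160) and two thirds of the third object fill the rest of the bag (profit 80).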

Lab 5 Knapsack fractional vs Knapsack 0-1
Knapsack 0-1 (study the Lab 5 notes and the following):
Read the following:
Example question: Describe the Knapsack 0-1 problem [10 marks]
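In the 0-1 variant each object is either taken whole or left out, so the greedy ratio rule no longer guarantees the best answer. One common way to solve it is dynamic programming over the bag capacity, sketched below under the assumption of integer weights. The instance is a made-up illustration.

```python
def knapsack_01(profits, weights, capacity):
    """Best total profit when each object is taken whole or not at all."""
    # best[c] = best profit achievable with a bag of capacity c so far.
    best = [0] * (capacity + 1)
    for profit, weight in zip(profits, weights):
        # Go from large to small capacities so each object is used once.
        for c in range(capacity, weight - 1, -1):
            best[c] = max(best[c], best[c - weight] + profit)
    return best[capacity]


# Hypothetical instance: the best choice is the second and third objects.
profits = [60, 100, 120]
weights = [10, 20, 30]
print(knapsack_01(profits, weights, capacity=50))  # 220
```

On this instance, picking by best ratio first would take the first two objects and then have no room for the third (profit 160), while the optimal 0-1 choice reaches 220.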

Class 6 Database design and SQL, Gradient descent

Lecture 6: E-R diagram
Example question: Create an E-R diagram for a company to handle customer orders for products. A customer can place orders for one or more items. [10 marks]
Complete question 5 of the revision class quiz

Lecture 6: Example of SQL statements
Examples
CREATE TABLE cities (city_id integer primary key, name varchar(24) not null, in_country varchar(2));
DELETE FROM cities WHERE in_country = 'GB';
UPDATE actors SET birth_year=1974 WHERE actor_id=4;
SELECT film_id,title,release_year FROM films WHERE runtime_minutes >= 100;
Complete question 6 of the revision class quiz
Example question: Give an SQL statement to delete all cities from table 'cities' where in_country is US [1 mark]
DELETE FROM cities WHERE in_country = 'US';

Lab 6: Gradient descent
A method to optimize a function; in our example, we minimize the error (mean squared error, MSE) to find the best-fit line.
Example question: Describe the gradient descent algorithm [15 marks]

Lab 6: Gradient descent
Example question: Explain the differences between batch, mini-batch and stochastic gradient descent [5 marks]
Batch gradient descent: uses all the data. In Class6-grad_descent(ax+b).py we have 5 points, but what happens if we have 1000 or 1 billion points? The algorithm becomes very slow!
Mini-batch gradient descent: instead of going over all examples, it sums over a smaller number of examples, set by the batch size.
Stochastic gradient descent: shuffles the training data and uses a single randomly picked training example per update.
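The three variants differ only in how many points feed each update, so one function with a `batch_size` parameter can illustrate all of them. This is a minimal sketch for fitting a line y = m*x + b by minimizing the mean squared error; it is not the course file, and the data points, learning rate, epoch count and seed are assumptions chosen for the example.

```python
import random


def gradient_descent(points, batch_size, learning_rate=0.01,
                     epochs=2000, seed=0):
    """Fit y = m*x + b by gradient descent on the mean squared error.

    batch_size == len(points): batch gradient descent
    1 < batch_size < len(points): mini-batch gradient descent
    batch_size == 1: stochastic gradient descent
    """
    rng = random.Random(seed)
    data = list(points)
    m, b = 0.0, 0.0
    for _ in range(epochs):
        rng.shuffle(data)
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            # Gradients of the MSE over this batch with respect to m and b.
            grad_m = sum(-2 * x * (y - (m * x + b)) for x, y in batch) / len(batch)
            grad_b = sum(-2 * (y - (m * x + b)) for x, y in batch) / len(batch)
            m -= learning_rate * grad_m
            b -= learning_rate * grad_b
    return m, b


# Hypothetical noise-free data lying on the line y = 2x + 1.
points = [(1, 3), (2, 5), (3, 7), (4, 9), (5, 11)]

for size in (len(points), 2, 1):
    m, b = gradient_descent(points, batch_size=size)
    print(f"batch_size={size}: m={m:.3f}, b={b:.3f}")
```

With more data, a smaller batch makes each update cheaper at the price of a noisier gradient estimate.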

Class 7 More SQL and gradient descent

Lecture 7: SQL examples Cont.
Two SQL statement styles were used in class:
MySQL (lecture notes)
Oracle (lab exercises)
The main differences are in the CREATE and INSERT statements. Feel free to use the style you prefer.

Lecture 7: SQL examples Cont.
Create a table with foreign keys (example of lecture 7)
Select data from two tables:
SELECT * FROM actors,cities WHERE actors.birth_place = cities.city_id;
Select country and sum of populations from cities, grouping by country:
SELECT in_country, SUM(population) FROM cities GROUP BY in_country;
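To try the two queries above without a database server, they can be run from Python against an in-memory database. This is only a convenience sketch: the `population` and actor `name` columns, the foreign-key declaration and all rows are made-up placeholder data, and the lecture 7 table definitions may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforce foreign keys in this session
conn.executescript("""
CREATE TABLE cities (
    city_id    INTEGER PRIMARY KEY,
    name       VARCHAR(24) NOT NULL,
    in_country VARCHAR(2),
    population INTEGER
);
CREATE TABLE actors (
    actor_id    INTEGER PRIMARY KEY,
    name        VARCHAR(40) NOT NULL,
    birth_place INTEGER,
    FOREIGN KEY (birth_place) REFERENCES cities (city_id)
);
INSERT INTO cities VALUES (1, 'London', 'GB', 100);
INSERT INTO cities VALUES (2, 'Leeds', 'GB', 50);
INSERT INTO cities VALUES (3, 'Boston', 'US', 70);
INSERT INTO actors VALUES (1, 'Actor A', 1);
INSERT INTO actors VALUES (2, 'Actor B', 3);
""")

# Select data from two tables, matching each actor to a birth city.
joined = conn.execute(
    "SELECT actors.name, cities.name FROM actors, cities "
    "WHERE actors.birth_place = cities.city_id "
    "ORDER BY actors.actor_id"
).fetchall()

# Sum of populations per country.
totals = conn.execute(
    "SELECT in_country, SUM(population) FROM cities "
    "GROUP BY in_country ORDER BY in_country"
).fetchall()

print(joined)
print(totals)
```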

Lab 7: Gradient descent: Class6-grad_descent(mx+b).py

Class 8 Dynamic programming and Dijkstra algorithm, more SQL

Lecture 8 Dynamic Programming
Dynamic programming is a method for solving a complex problem by breaking it down into a collection of simpler subproblems, solving each of those subproblems just once, and storing their solutions using a memory-based data structure (e.g. an array).
Dijkstra algorithm
Single source shortest path problem
Comparison of divide and conquer, greedy algorithms and dynamic programming algorithms
Example question: Explain the Dijkstra algorithm [10 marks]
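For the single source shortest path problem, Dijkstra's algorithm can be sketched with a priority queue as below. The sketch assumes non-negative edge weights and a graph stored as a dictionary of dictionaries; the small graph is a made-up example.

```python
import heapq


def dijkstra(graph, source):
    """Shortest distance from source to every node (weights must be >= 0)."""
    distances = {node: float("inf") for node in graph}
    distances[source] = 0
    queue = [(0, source)]  # (distance found so far, node)
    while queue:
        dist, node = heapq.heappop(queue)
        if dist > distances[node]:
            continue  # stale entry: a shorter path was already found
        for neighbour, weight in graph[node].items():
            candidate = dist + weight
            if candidate < distances[neighbour]:
                distances[neighbour] = candidate
                heapq.heappush(queue, (candidate, neighbour))
    return distances


# Hypothetical directed graph: graph[u][v] is the weight of edge u -> v.
graph = {
    "A": {"B": 4, "C": 1},
    "B": {"D": 1},
    "C": {"B": 2, "D": 5},
    "D": {},
}
print(dijkstra(graph, "A"))
```

The shortest route from A to B goes through C (cost 3) rather than using the direct edge (cost 4), and the stored distance to B is then reused when reaching D.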

Lab 8: SQL
Find the monthly salary of the employees named White.
SELECT Salary / 12 AS MonthlySalary FROM Employee WHERE Surname = 'White';
Find the maximum salary among the employees who work in a department based in London.
SELECT MAX(Salary) FROM Employee, Department WHERE Employee.Dept = Department.DeptName AND Department.City = 'London';
Example question: Find the sum of salaries of all the employees of the same department. [1 mark]
Answer: SELECT Dept, SUM(Salary) FROM Employee GROUP BY Dept;

Class 9 P vs. NP & Transactional systems

Lecture 9: P vs. NP problems
We want to find algorithms faster than the existing ones. E.g. for sorting, we went from O(n^2) (insertion sort) to O(n log n) (merge sort).
For problems whose known methods need exponential time, we look for methods that solve them in polynomial time.
P is: problems solved by a polynomial time deterministic algorithm.
Deterministic: a set of clear steps to follow, no randomness involved!
NP is: problems solved by a non-deterministic polynomial time algorithm.
Non-deterministic: when you don't have a solution, assume that one exists; the behaviour can differ between runs.
NP is the class of all problems for which checking a putative solution costs poly-time.
Example: the Travelling Salesperson Problem (mentioned in FoC): is there a closed tour of n cities of at most W km/mi? Finding such a tour takes exponential time, on the order of O(2^n).
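The contrast between checking and finding a solution can be illustrated on the Travelling Salesperson example. In the sketch below, `verify_tour` checks a proposed tour in polynomial time, while `brute_force_best` tries every ordering of the cities. The function names and the 4-city distance matrix are made up for this example.

```python
from itertools import permutations


def tour_length(tour, distance):
    """Length of the closed tour, including the leg back to the start."""
    return sum(distance[tour[i]][tour[(i + 1) % len(tour)]]
               for i in range(len(tour)))


def verify_tour(tour, distance, max_length):
    """Poly-time check: every city exactly once and total length <= W."""
    visits_each_city_once = sorted(tour) == list(range(len(distance)))
    return visits_each_city_once and tour_length(tour, distance) <= max_length


def brute_force_best(distance):
    """Try all (n-1)! orderings starting from city 0: infeasible for large n."""
    cities = range(1, len(distance))
    return min(tour_length((0,) + rest, distance)
               for rest in permutations(cities))


# Hypothetical symmetric distances between 4 cities.
distance = [
    [0, 2, 9, 10],
    [2, 0, 6, 4],
    [9, 6, 0, 3],
    [10, 4, 3, 0],
]
print(brute_force_best(distance))               # 18
print(verify_tour([0, 1, 3, 2], distance, 18))  # True
print(verify_tour([0, 1, 2, 3], distance, 18))  # False: this tour has length 21
```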

Lab 9: Transactional systems
SQL would rather generate errors than let you spoil the data; a rollback mechanism brings the DB back to its previous, consistent state.
ACID (Atomicity, Consistency, Isolation, Durability) is a set of properties of database transactions intended to guarantee validity even in the event of errors or power failures.
A: Atomicity (all-or-nothing behaviour). Atomicity guarantees that each transaction is treated as a single "unit", which either succeeds completely or fails completely.
C: Consistency (of the data). Consistency ensures that a transaction can only bring the database from one valid state to another (any data written to the database must be valid according to all defined rules).
I: Isolation (from concurrent transactions). Isolation ensures that concurrent execution of transactions leaves the database in the same state that would have been obtained if the transactions were executed sequentially.
D: Durability (only transactions change data). Durability guarantees that once a transaction has been committed, it will remain committed even in the case of a system failure (e.g., power outage or crash).
Example question: Explain the ACID properties (Atomicity, Consistency, Isolation, Durability) [5 marks]
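Atomicity and rollback can be observed with a small in-memory database driven from Python. In this sketch a transfer that would make a balance negative violates a CHECK constraint, so the whole transaction is rolled back, including the update that had already succeeded. The `accounts` table, the `transfer` helper and the amounts are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.execute("INSERT INTO accounts VALUES ('alice', 100)")
conn.execute("INSERT INTO accounts VALUES ('bob', 50)")
conn.commit()


def transfer(connection, sender, receiver, amount):
    """Move money as one transaction: both updates happen, or neither."""
    try:
        with connection:  # commits on success, rolls back on an exception
            connection.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, receiver))
            connection.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, sender))
        return True
    except sqlite3.IntegrityError:
        return False


def balances(connection):
    return dict(connection.execute("SELECT name, balance FROM accounts"))


print(transfer(conn, "alice", "bob", 500), balances(conn))  # rolled back
print(transfer(conn, "alice", "bob", 30), balances(conn))   # committed
```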

Class 10 NP vs NP Hard

Lecture 10: NP vs. NP Hard
What is NP Hard?
Probabilistic algorithms
Build algorithms using a ‘random’ element so as to gain improved performance. For some cases the improvement is dramatic, moving a problem from intractable to tractable.
Approximation algorithms
An approximation algorithm is a way of dealing with NP-completeness for optimization problems. This technique does not guarantee the best solution. The goal of an approximation algorithm is to come as close as possible to the optimum value in a reasonable amount of time, which is at most polynomial time.
Example question: Explain the differences between probabilistic and approximation algorithms [5 marks]
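As one concrete approximation algorithm, the sketch below uses the vertex cover problem, which is not named in the slides and is chosen here only as a well-known illustration. The goal is a small set of nodes touching every edge; the simple rule of taking both ends of any still-uncovered edge runs in polynomial time and never returns more than twice the optimum number of nodes. The edge list is made up.

```python
def approx_vertex_cover(edges):
    """Polynomial-time cover with at most twice the optimal number of nodes."""
    cover = set()
    for u, v in edges:
        # If this edge is not yet covered, take both of its endpoints.
        if u not in cover and v not in cover:
            cover.update((u, v))
    return cover


def is_cover(cover, edges):
    """Check that every edge has at least one endpoint in the cover."""
    return all(u in cover or v in cover for u, v in edges)


# Hypothetical graph: a triangle 1-2-3 with a tail 3-4-5.
edges = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5)]
cover = approx_vertex_cover(edges)
print(sorted(cover), is_cover(cover, edges))
```

For this graph a best cover has 3 nodes (for example 1, 3 and 4), and the sketch returns 4 nodes: close to the optimum, not guaranteed to be the optimum.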

Quote of the day “Do or do not. There is no try.” Thank you and good luck!
