1
Programming with Data Lab 6
Wednesday, 28 Nov. 2018 Stelios Sotiriadis Prof. Alessandro Provetti
2
Optimization
3
General format
Instance: a collection.
Solution: (often) a choice from the collection, subject to some constraints.
Measure: a goal, i.e., a cost function to be minimized or a utility function to be maximized.
For this class of problems a mathematical assessment should precede any coding effort: subtle changes in the specification can bring huge changes in the computational cost.
4
Typical strategies for solving optimization problems
Greedy
Randomized methods, e.g., gradient descent
Dynamic programming
Approximation
5
The Greedy principle
Make the local choice that maximizes a local (easy-to-check) criterion, in the hope that the solution generated this way will maximize the global (costly-to-check) criterion.
Local: take as much as possible of the most precious (ounce-by-ounce) bar/bullion available.
Global: take the combination of bars that gives the maximum aggregate value under the weight limit W.
A sketch of this rule for the fractional case follows.
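For contrast with the 0-1 case discussed next, here is a minimal sketch of the greedy rule applied to the fractional variant of knapsack, where items may be split; the item list and capacity are made-up numbers, not the lab data.

# Greedy for the FRACTIONAL knapsack: rank by value per unit of weight,
# then take as much as possible of the most precious item first.
items = [("gold", 10, 600), ("silver", 30, 90), ("platinum", 5, 400)]  # (name, weight, value), made up
W = 20  # capacity

def greedy_fractional(items, capacity):
    ranked = sorted(items, key=lambda it: it[2] / it[1], reverse=True)  # local criterion: value/weight
    total, remaining = 0.0, capacity
    for name, weight, value in ranked:
        take = min(weight, remaining)          # take as much of this item as fits
        total += value * take / weight
        remaining -= take
        if remaining == 0:
            break
    return total

print(greedy_fractional(items, W))             # 1015.0 for these made-up numbers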
6
Does it always work? Greedy does not work on KNAPSACK 0-1.
Underlying principle: Greedy works when local minimization/maximization does not prevent us from later reaching the global optimum.
(Slide table: the price/weight ratio p/w of each item in the example.)
7
Does it always work? In the example, the choice of Item 1 excludes the actually optimal solution from consideration.
Only some sufficient conditions are known for the applicability of Greedy.
Approximation and randomization are the methods of choice for KNAPSACK 0-1.
8
A look at the solution: Class5-knapsack-list of pairs
elements = ['Platinum', 'Gold', 'Silver', 'Palladium']
instance = [[20, 711], [15, 960], [2000, 12], [130, 735]]
Problem: sorting instance breaks the positional connection between 'Platinum' and [20, 711].
Possible solution: Python's powerful zip operation (see the sketch below).
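A minimal sketch of the zip idea, assuming we want to sort the metals by the second number in each pair while keeping names and data aligned; this is an illustration, not necessarily the Class5 solution itself.

elements = ['Platinum', 'Gold', 'Silver', 'Palladium']
instance = [[20, 711], [15, 960], [2000, 12], [130, 735]]

# Pair each name with its data, then sort the pairs together,
# so the positional connection can never break.
paired = list(zip(elements, instance))
paired.sort(key=lambda pair: pair[1][1], reverse=True)   # sort by the second value in each pair

for name, data in paired:
    print(name, data)

# zip(*paired) "unzips" back into two aligned tuples if needed.
sorted_elements, sorted_instance = zip(*paired)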
9
Gradient Descent
10
Glance at machine learning…
In linear algebra we have: y = 2x + 3, with x = [1, 2, 3, 4, …] and y = [5, 7, 9, 11, …].
In machine learning we have data: departments = [1, 2, 3, 4, …] and sales = [5, 7, 9, 11, …], and we are looking for the equation y = mx + b (e.g. y = 2x + 3).
In other words, we are looking for the best-fit line through the (departments, sales) points.
11
Which line is the best fit line?
Draw a (random) line and calculate the error between each point and the line. The mean square error (MSE),
mse = \frac{1}{n} \sum_{i=1}^{n} e_i^2 = \frac{1}{n} \sum_{i=1}^{n} (y_i - y'_i)^2,
is our cost function!
(Slide figure: sales vs. years, with the per-point errors e_1, …, e_4 between each observed y_i and the predicted y'_i on the line.)
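A tiny sketch of this cost function in code, using made-up numbers in the spirit of the sales example and an arbitrary candidate line:

def mse(y, y_pred):
    # mean of the squared point-by-point errors e_i = y_i - y'_i
    return sum((yi - ypi) ** 2 for yi, ypi in zip(y, y_pred)) / len(y)

years = [1, 2, 3, 4]
sales = [5, 7, 9, 11]
m, b = 1.5, 4.0                        # arbitrary candidate line y' = m*x + b
y_pred = [m * x + b for x in years]
print(mse(sales, y_pred))              # the value gradient descent will drive down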
12
What is gradient descent?
A method to optimize a function; in our example, to minimize the error (MSE) and find the best-fit line!
13
Another example: f(x) = x^2 - 2x + 2 = (x - 1)^2 + 1
When x = 1, f(1) = 1: this is our minimum! I know from calculus: minimize where the derivative of f(x) equals 0:
\frac{\partial y}{\partial x} = 2x - 2 = 0, so x = 1. Min!
14
With gradient descent
Step 1: Take a random point, e.g. x_0 = 3.
Step 2: Take the derivative at this point: \frac{\partial y}{\partial x_0} = 2(3) - 2 = 4. 4 is a positive number, so the function gets larger there. If instead we took x_0 = -1, then \frac{\partial y}{\partial x_0} = 2(-1) - 2 = -4, so the function gets smaller.
Step 3: Next guess (on the x_0 = 3 example): x_{i+1} = x_i - a \frac{\partial y}{\partial x_i}, where a is a small step, e.g. a = 0.2. So x_1 = x_0 - a \frac{\partial y}{\partial x_0} = 3 - 0.2 \cdot 4 = 2.2; then \frac{\partial y}{\partial x_1} = 2(2.2) - 2 = 2.4 and x_2 = 2.2 - 0.2 \cdot 2.4 = 1.72. We moved closer!
Step 4: Repeat, again and again… We need software to calculate this!
(The slide figure marks the iterates x = 3 and x = 2.2 on the curve; see the sketch below.)
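A minimal sketch of exactly this loop for f(x) = x^2 - 2x + 2; the step size a = 0.2 is taken from the slide, while the iteration count is an arbitrary choice.

def f(x):
    return x * x - 2 * x + 2

def derivative(x):
    return 2 * x - 2

x = 3.0                               # Step 1: a (here fixed) starting point
a = 0.2                               # the small step / learning rate
for i in range(20):                   # Step 4: repeat again and again
    x = x - a * derivative(x)         # Steps 2-3: move against the derivative
    print(i, round(x, 4), round(f(x), 4))
# x approaches 1 and f(x) approaches 1, matching the minimum found analytically.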
15
Gradient descent for the best fit example
You take small steps to minimize the error, e.g. an arbitrary step of 0.5… If the step is too big, we overshoot and lose the minimum! (The slide plots mse against b.)
16
Gradient descent You take small steps to minimize the error
The step gets smaller as we go… We need to find the slopes: this is where derivatives come in. (The slide again plots mse against b.)
17
Gradient descent You take small steps to minimize the error
The step gets smaller as we go… We need to find the slopes, so we calculate partial derivatives:
mse = \frac{1}{n} \sum_{i=1}^{n} (y_i - ypred_i)^2, where ypred_i = m x_i + b
mse = \frac{1}{n} \sum_{i=1}^{n} (y_i - (m x_i + b))^2
\frac{\partial mse}{\partial m} = -\frac{2}{n} \sum_{i=1}^{n} x_i (y_i - (m x_i + b))
\frac{\partial mse}{\partial b} = -\frac{2}{n} \sum_{i=1}^{n} (y_i - (m x_i + b))
Update rule (shown for b on the slide): b_{new} = b_{old} - \text{learning rate} \cdot \frac{\partial mse}{\partial b}, and similarly for m. See the sketch below.
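A self-contained sketch of these update rules; the data, learning rate, and iteration count are illustrative choices, and the lab's Class6-grad_descent(ax+b).py may differ in its details.

# Fit y = m*x + b by (batch) gradient descent on the mse above.
xs = [1, 2, 3, 4, 5]
ys = [5, 7, 9, 11, 13]            # generated from y = 2x + 3, so we expect m ~ 2, b ~ 3

m, b = 0.0, 0.0
lr = 0.02                         # learning rate
n = len(xs)

for step in range(5000):
    preds = [m * x + b for x in xs]
    grad_m = -(2 / n) * sum(x * (y - p) for x, y, p in zip(xs, ys, preds))
    grad_b = -(2 / n) * sum(y - p for y, p in zip(ys, preds))
    m -= lr * grad_m              # m_new = m_old - learning rate * dmse/dm
    b -= lr * grad_b              # b_new = b_old - learning rate * dmse/db

print(m, b)                       # close to 2 and 3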
18
Gradient descent in Python!
Let's see: Class6-grad_descent(ax+b).py
And: Class6-gradient_descent_visualize.py
19
Study Chapter 8! In Chapter 8 of his book, Grus introduces minimization and gradient descent. The intended application is error minimization. But let’s see the details… Book chapter is online: gradient_descent.pdf
20
Understanding *args and **kwargs
# *args for a variable number of positional arguments
def myFun(*args):
    for arg in args:
        print(arg)

myFun('Hi!', 'I', 'pass', 'many', 'args!')
Output:
Hi!
I
pass
many
args!

# **kwargs for a variable number of keyword arguments
def myFun(**kwargs):
    for key, value in kwargs.items():
        print("%s == %s" % (key, value))

myFun(first='Key', mid='value', last='pair')
Output:
first == Key
mid == value
last == pair
21
Functionals: negate a function

def negate(f):
    """Return a function that for any input x returns -f(x)."""
    return lambda *args, **kwargs: -f(*args, **kwargs)

Example:

def myincrementor(n):
    return n + 1

g = negate(myincrementor)   # g is a new function
print(g(6))                 # prints -7
22
List comprehensions and Mappings
unit_prices = [711, 960, 12, 735]
print([int(price * 1.10) for price in unit_prices])

OR:

def myinflator(n):
    return int(n * 1.10)

new_unit_prices = map(myinflator, unit_prices)
print([i for i in new_unit_prices])

Both print the same!
23
Objectives of Chapter 8
Grus forgot his maths and now would like to find the minimum value of the function x^2 for values around 0:
\operatorname{argmin}_{x \in [-1, 1]} f(x)

def square(x):
    return x * x

def derivative(x):
    return 2 * x

Let's see how it works using Class6-gradient_descent_visualize.py.
24
Further reading: Learn Gradient Descent
Try gradient_descent.py, with its companion module linear_algebra, on functions of your choice.
Try f(x) = x^3 + 3x^2 - 2x + 1 over [-4, 2]. Hint: the derivative is 3x^2 + 6x - 2. Hint: over this interval the global minimum is at x = -4.
Code: Class6-gradient_descent-Gruss_code.py (see also the sketch below).
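A possible sketch for this exercise; clamping the iterate to [-4, 2] is one choice for handling the interval boundary, not necessarily how the lab code does it.

def f(x):
    return x**3 + 3*x**2 - 2*x + 1

def df(x):
    return 3*x**2 + 6*x - 2

def clamp(x, lo=-4.0, hi=2.0):
    return max(lo, min(hi, x))

x = -3.0                              # starting point inside [-4, 2]
lr = 0.05
for _ in range(200):
    x = clamp(x - lr * df(x))         # gradient step, kept inside the interval

print(x, f(x))
# Starting left of the local maximum near x ~ -2.29, the iterate is pushed to the
# boundary x = -4 (the interval minimum, f(-4) = -7); starting to its right, it
# settles at the interior local minimum near x ~ 0.29.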
25
More on gradient descent…
Batch gradient descent: use all the data. In Class6-grad_descent(ax+b).py we have 5 points, but what happens if we have, say, 1 billion points? The algorithm becomes very slow!
Mini-batch gradient descent: instead of going over all examples, sum over a smaller number of examples given by the batch size.
Stochastic gradient descent: shuffle the training data and use a single randomly picked training example per update. (See the sketch below.)
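A sketch of the stochastic/mini-batch idea on the same toy line-fitting data as before; the batch size, learning rate, and epoch count are arbitrary illustrative choices.

import random

xs = [1, 2, 3, 4, 5]
ys = [5, 7, 9, 11, 13]

m, b = 0.0, 0.0
lr = 0.01
batch_size = 2                        # batch_size = 1 gives plain stochastic gradient descent

data = list(zip(xs, ys))
for epoch in range(2000):
    random.shuffle(data)              # shuffle the training data each pass
    for start in range(0, len(data), batch_size):
        batch = data[start:start + batch_size]
        k = len(batch)
        grad_m = -(2 / k) * sum(x * (y - (m * x + b)) for x, y in batch)
        grad_b = -(2 / k) * sum(y - (m * x + b) for x, y in batch)
        m -= lr * grad_m
        b -= lr * grad_b

print(m, b)                           # noisier than batch descent, but close to 2 and 3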
26
When comparing:
Batch gradient descent: much slower, more accurate.
Stochastic gradient descent: much faster, slightly off (noisy data).
27
Resources: try an online function visualization tool.