Presentation is loading. Please wait.

Presentation is loading. Please wait.

Introduction to Python Session 2: Beginning Numerical Python and Visualization Jeremy Chen.

Similar presentations


Presentation on theme: "Introduction to Python Session 2: Beginning Numerical Python and Visualization Jeremy Chen."— Presentation transcript:

1 Introduction to Python Session 2: Beginning Numerical Python and Visualization Jeremy Chen

2 新加坡国立大学商学院 This Session’s Agenda NumPy Arrays Random Number Generation Numerical Linear Algebra Visualization An OLS Example 2

3 新加坡国立大学商学院 Preliminaries By now you should have a Python distribution… – Run IPython or Spyder – If not, start downloading one and work with IDLE 3

4 新加坡国立大学商学院 NumPy Arrays The ndarray is a typed N-dimensional array object: >>> import numpy as np >>> A = np.ndarray((2,3))# Kinda the same as np.empty((2,3)); Uninitialized data array([[ 8.48798326e-314, 2.12199579e-314, 0.00000000e+000], [ 0.00000000e+000, 1.05290307e-253, 1.47310613e-319]]) >>> A[0,0] = 2; A[0,1] = 3; A[1,1] = 4; A[1,2] = 5; A array([[ 2., 3., 0.], [ 0., 4., 5.]]) >>> 10 * A + 1# Simple vector operations array([[ 21., 31., 1.], [ 1., 41., 51.]]) >>> np.exp(A)# And various other vector operations array([[ 7.3890561, 20.08553692, 1. ], [ 1., 54.59815003, 148.4131591 ]]) >>> [A.shape, A.ndim]# And don’t forget ndarrays have a "shape" and "dimension" [(2, 3), 2] >>> A.dtype#... and a data type. dtype('float64') 4

5 新加坡国立大学商学院 NumPy Arrays: Creating Common ways of creating a ndarray >>> np.array(np.arange(5))# arange is like range but returns a ndarray array([0, 1, 2, 3, 4]) >>> np.array(np.arange(5)).dtype# numpy selects an appropriate datatype dtype('int32') >>> np.reshape([3,2,1,3,1,2,6,7], (2,4)) array([[3, 2, 1, 3], [1, 2, 6, 7]]) >>> np.ones((2,3,4)) array([[[ 1., 1., 1., 1.], [ 1., 1., 1., 1.], [ 1., 1., 1., 1.]], [[ 1., 1., 1., 1.], [ 1., 1., 1., 1.], [ 1., 1., 1., 1.]]]) >>> np.zeros((5)) array([ 0., 0., 0., 0., 0.]) 5

6 新加坡国立大学商学院 NumPy Arrays: Creating An amusing way to create a ndarray >>> M = np.mat('[1,2,3;4,5,6]'); M# Like MATLAB matrix([[1, 2, 3], [4, 5, 6]]) >>> A = np.array(M); A#... not really necessary array([[1, 2, 3], [4, 5, 6]]) >>> # Here's the obligatory Matrix Multiplication digression >>> M * M.T# Matrix multiplication (M.T is the transpose of M) matrix([[14, 32], [32, 77]]) >>> A * A.T# Not matrix multiplication Traceback (most recent call last): File " ", line 1, in ValueError: operands could not be broadcast together with shapes (2,3) (3,2) >>> A.dot(A.T)# Matrix multiplication for 2 dimensional ndarrays array([[14, 32], [32, 77]]) >>> np.dot(A,A.T)#... same here. (Dot product for 1 dimensional ndarrays) array([[14, 32], [32, 77]]) 6

7 新加坡国立大学商学院 NumPy Arrays: Accessing Simple slicing creates views (not copies) >>> A = np.reshape(np.arange(2*4)+1, (2,4)); A array([[1, 2, 3, 4], [5, 6, 7, 8]]) >>> B = A[:,1:3]; B array([[2, 3], [6, 7]]) >>> B[0,0] = 100; B; A array([[100, 3], [ 6, 7]]) array([[ 1, 100, 3, 4], [ 5, 6, 7, 8]]) “Fancy Indexing” creates copies >>> C = A[:, [1,3]]; C# Using lists/arrays for indexing array([[100, 4], [ 6, 8]]) >>> C[0,0] = 200; C; A array([[200, 4], [ 6, 8]]) array([[ 1, 100, 3, 4], [ 5, 6, 7, 8]]) 7

8 新加坡国立大学商学院 NumPy Arrays: A Little More Some standard operations >>> A = np.reshape(np.arange(2*4)+1, (2,4)); A array([[1, 2, 3, 4], [5, 6, 7, 8]]) >>> A.sum() 36 >>> A.sum(axis=0) array([ 6, 8, 10, 12]) >>> A.sum(axis=1) array([10, 26]) >>> A.cumsum(axis=1) array([[ 1, 3, 6, 10], [ 5, 11, 18, 26]]) >>> (A.mean(axis=1), A.std(axis=1), A.var(axis=1), A.min(axis=1), A.argmax(axis=1)) (array([ 2.5, 6.5]), array([ 1.11803399, 1.11803399]), array([ 1.25, 1.25]), array([1, 5]), array([3, 3])) >>> arr = np.array([1,2,4,9,1,4]); arr.sort(); arr # Sorting array([1, 1, 2, 4, 4, 9]) >>> np.unique(arr)# Keep only unique entries array([1, 2, 4, 9]) >>> np.unique(np.array(['A', 'B', 'CC', 'c', 'A', 'CC']))# Applies to strings too array(['A', 'B', 'CC', 'c'], dtype='|S2') 8

9 新加坡国立大学商学院 Random Number Generation The Random Seed >>> np.random.seed(10) Various Distributions >>> np.random.SOME_DISTRIBUTION(params, size=(dimensions)) >>> # rand ~ Uniform; randn ~ Normal; exponential; beta; >>> # gamma; binomial; randint(=>Discrete)... even triangular 9

10 新加坡国立大学商学院 Numerical Linear Algebra Matrix Decompositions >>> A = np.ones((5,5)) + np.eye(5) >>> L = np.linalg.cholesky(A)# Cholesky decomposition. >>> np.allclose(np.dot(L,L.T), A)# Check... True >>> Q, R = np.linalg.qr(A)# QR decomposition >>> np.allclose(np.dot(Q,R), A) True >>> # Eigenvalue decomposition (Use eigh when symmetric/Hermetian) >>> e, V = np.linalg.eig(A) >>> np.allclose(np.dot(A,V), np.dot(V,np.diag(e)))# V may not be unitary True >>> # Singular Value Decomposition >>> U, s, V = np.linalg.svd(A, full_matrices = True) >>> np.allclose(np.dot(np.dot(U, np.diag(s)),V), A) True 10

11 新加坡国立大学商学院 Numerical Linear Algebra Solving Linear Systems >>> np.linalg.solve(A, np.ones((5,1))).T# Solve Linear System array([[ 0.16666667, 0.16666667, 0.16666667, 0.16666667, 0.16666667]]) >>> np.linalg.inv(A)# Matrix Inverse array([[ 0.83333333, -0.16666667, -0.16666667, -0.16666667, -0.16666667], [-0.16666667, 0.83333333, -0.16666667, -0.16666667, -0.16666667], [-0.16666667, -0.16666667, 0.83333333, -0.16666667, -0.16666667], [-0.16666667, -0.16666667, -0.16666667, 0.83333333, -0.16666667], [-0.16666667, -0.16666667, -0.16666667, -0.16666667, 0.83333333]]) Other Stuff >>> np.linalg.norm(A)# Matrix norm 6.324555320336759 >>> np.linalg.det(A)# Determinant (should be 6...) 5.9999999999999982 >>> np.trace(A)# Matrix trace 10 >>> np.linalg.matrix_rank(A)# Matrix rank 5 11

12 新加坡国立大学商学院 Visualization (By Example) By now you should have a Python distribution. Run IPython or Spyder In [1]: %pylab Using matplotlib backend: module://IPython.kernel.zmq.pylab.backend_inline Populating the interactive namespace from numpy and matplotlib In [2]: plot( arange(20) ) Out[2]: [ ] 12

13 新加坡国立大学商学院 Visualization (By Example) A Standard Brownian Motion In [3]: dx = 0.01# Hit CTRL-Enter for multi-line input...: walk = (np.random.randn(1000) * np.sqrt(dx)).cumsum()...: plot(np.arange(0,1000)*dx, walk - walk[0]) Out[3]: [ ] 13

14 新加坡国立大学商学院 Visualization (By Example) A 2D Intensity Plot In [4]: xy_extent = 20...: xy_range = np.arange(-xy_extent,xy_extent,0.1)...: X_m, Y_m = np.meshgrid(xy_range, xy_range)...: f = ((X_m-5) - 2 * (Y_m+5)) ** 2 - (1.5*(X_m+3) + 1 * (Y_m-5)) ** 2...: imshow(f, cmap = matplotlib.cm.bone, \...: extent=[-xy_extent, xy_extent, -xy_extent, xy_extent])...: colorbar() Out[4]: 14

15 新加坡国立大学商学院 Visualization (By Example) A 2D Contour Plot In [5]: xy_extent = 20...: xy_range = np.arange(-xy_extent,xy_extent,0.1)...: X_m, Y_m = np.meshgrid(xy_range, xy_range)...: f = ((X_m-5) - 2 * (Y_m+5)) ** 2 - (1.5*(X_m+3) + 1 * (Y_m-5)) ** 2...: contourf(X_m, Y_m, f, 20)...: colorbar() Out[5]: 15

16 新加坡国立大学商学院 Visualization (By Example) 16 Our Objective…

17 新加坡国立大学商学院 Visualization (By Example) Some data for plotting In [6]: XY = np.arange(1,1+100)/100.0...: XZ = np.arange(1,1+1000)/1000.0...: Y = np.random.randn(100)...: Z = np.random.randn(1000)...: Z.sort() Setting up the figure and subplots In [7]: n_row = 2...: n_col = 2...: fig1 = plt.figure(figsize=(12,8))...: axes = [];# Collect axis references in a list...: for k in range(n_row * n_col):...: axes.append( fig1.add_subplot(n_row,n_col, k+1) ) 17

18 新加坡国立大学商学院 Visualization (By Example) The simple line plot In [8]: axes[0].plot( XZ - 0.5, np.cos(1.5 * (XZ - 0.5) * \...: 2 * 3.14159), color='k')...: axes[0].set_xlim([-0.6, 0.6])...: axes[0].set_title(r'$\cos\ 2\pi x$')...: axes[0].set_xlabel('x')...: axes[0].set_ylabel('y')...: fig1 Out[8]: 18

19 新加坡国立大学商学院 Visualization (By Example) The histogram of normal samples In [9]: axes[1].hist( Z, bins=20, color='b')...: axes[1].set_title('Some Histogram of 1000 samples from $N(0,1)$')...: fig1 Out[9]: 19

20 新加坡国立大学商学院 Visualization (By Example) The scatter plots (with legend) In [10]: axes[2].scatter( XY[10:60], Y[:50] + XY[10:60]*0.05, marker='o', \...: c=(0.5, 0.5, 0.5), label='Population 1')...: axes[2].scatter( XY[40:90], Y[50:] + XY[40:90]*0.05, marker='x', \...: c='#FF00A0', label='Population 2')...: axes[2].legend(loc = 'best')...: fig1 Out[10]: 20

21 新加坡国立大学商学院 Visualization (By Example) The big mess of normal CDFs In [11]: axes[3].set_title(r'A Colourful Mess of $N(\mu_k, 1)$ CDFs')...: col = ['r', 'g', 'b', 'c', 'm', 'y', 'k'] # w for white...: line = ['solid', 'dashed', 'dashdot', 'dotted']...: linestyle = [(c,l) for c in col for l in line]...: for k in range(40):...: ls_idx = k % len(linestyle)...: axes[3].plot( Z+(k-20)*0.5, XZ, color = linestyle[ls_idx][0], \...: linestyle=linestyle[ls_idx][1])...: axes[3].text(-10, 0.45, 'Nothing to See Here.', fontsize=20)...: fig1 Out[11]: 21

22 新加坡国立大学商学院 Visualization (By Example) 22

23 新加坡国立大学商学院 Example: Ordinary Least Squares 23

24 新加坡国立大学商学院 Example: Ordinary Least Squares In [13]: if X.shape[0] != Y.shape[0]:# Always good to check for errors...: raise ValueError("Data dimension mismatch.")...: num_pred = X.shape[1]...: num_samples = X.shape[0] In [14]: # Descriptive Statistics...: plt_side = math.ceil(math.sqrt(num_pred + 1))...: fig = plt.figure(figsize = (12,12));...: for k in range(num_pred + 1):...: ax = fig.add_subplot(plt_side, plt_side, k+1)...: if k == 0:...: ax.hist(Y, bins=20)...: ax.set_title( "Dependent Variable ($\\mu={mean:.3}$, $\\sigma={std:.3}$)".format( \...: mean = np.mean(Y), std = np.std(Y)) )...: else:...: ax.hist(X[:, k-1], bins=20)...: ax.set_title( "Predictor {pred} ($\\mu={mean:.3}$, $\\sigma={std:.3}$)".format( \...: pred = k, mean = np.mean(X[:, k-1]), std = np.std(X[:, k-1])) ) 24

25 新加坡国立大学商学院 Example: Ordinary Least Squares 25

26 新加坡国立大学商学院 Example: Ordinary Least Squares In [15]: # OLS...: XtX = np.dot(X.T, X)...: XtY = np.dot(X.T, Y)...: beta_est = np.linalg.solve(XtX, XtY)...: s_sq = np.var(Y - np.dot(X, beta_est))...:...: print "True Betas", true_beta.T...: print "Estimated Betas", beta_est.T...: print...:...: print "True Error Variance: ", true_error_std ** 2...: print "Estimated Error Variance: ", s_sq True Betas [[ 1. 1. 1. 1. 1. 0.]] Estimated Betas [[ 1.2389584 0.98711812 0.90258976 0.86002269 0.99770273 0.01079312]] True Error Variance: 4 Estimated Error Variance: 3.85523406528 26

27 新加坡国立大学商学院 Example: Ordinary Least Squares In [16]: # Hypothesis Testing...: standard_errors = np.reshape(np.sqrt(np.diag( \...: s_sq * np.linalg.solve(XtX, np.eye(num_pred)) )), (num_pred,1))...: print "Standard Errors: ", standard_errors.T...: t_statistics = beta_est / standard_errors...: print "t-Statistics: ", t_statistics.T...: df = num_samples - num_pred...: p_values = 2 * scipy.stats.t.cdf(-abs(t_statistics), df).T...: print "p-values (2-sided): ", p_values Standard Errors: [[ 0.26819354 0.05023788 0.16891535 0.09578419 0.09744182 0.06468454]] t-Statistics: [[ 4.61964296 19.6488801 5.34344442 8.97875377 10.23895845 0.16685784]] p-values (2-sided): [[ 4.91038992e-06 6.11554987e-64 1.39498275e-07 5.75446104e-18 1.92610049e-22 8.67550186e-01]] 27 So let’s do it again!

28 新加坡国立大学商学院 Example: Ordinary Least Squares In [15]: # OLS Again...: X = X[:,:-1]; num_pred = X.shape[1]; num_samples = X.shape[0]...: XtX = np.dot(X.T, X)...: XtY = np.dot(X.T, Y)...: beta_est = np.linalg.solve(XtX, XtY)...: s_sq = np.var(Y - np.dot(X, beta_est))...: print "Estimated Betas", beta_est.T...: standard_errors = np.reshape(np.sqrt(np.diag( \...: s_sq * np.linalg.solve(XtX, np.eye(num_pred)) )), (num_pred,1))...: print "Standard Errors: ", standard_errors.T...: t_statistics = beta_est / standard_errors...: print "t-Statistics: ", t_statistics.T...: df = num_samples - num_pred...: p_values = 2 * scipy.stats.t.cdf(-abs(t_statistics), df).T...: print "p-values (2-sided): ", p_values Estimated Betas [[ 1.25027031 0.98744205 0.90124032 0.86084415 0.9975555 ]] Standard Errors: [[ 0.25949091 0.05020176 0.16872633 0.09566025 0.09744054]] t-Statistics: [[ 4.81816615 19.6694721 5.34143267 8.99897405 10.23758223]] p-values (2-sided): [[ 1.92950811e-06 4.54116320e-64 1.40853414e-07 4.88551777e-18 1.93166121e-22]] 28

29 新加坡国立大学商学院 ☺ 29


Download ppt "Introduction to Python Session 2: Beginning Numerical Python and Visualization Jeremy Chen."

Similar presentations


Ads by Google