Introducing the Python programming language to the SMEDG community. By Rupert Osborn, H&S Consultants.



- Pages on a blue background are additions to the presentation for those who didn’t attend the talk
- They summarise the points made in the talk where necessary
- The second part of the talk was presented using the IPython Notebook, which should also be available for download from the SMEDG website
- The last slide gives advice on how to download Python: use a distribution like Anaconda!

- Python is a general-purpose, high-level, object-oriented programming language
- Free and open-source
- Readable and intuitive

- Python is an easy-to-use programming language
- Many geologists feel intimidated by programming languages
- Programming languages can help the exploration geologist by automating repetitive tasks and enabling quicker, deeper data analysis

- Started in 1989 by Guido van Rossum
- Named after Monty Python’s Flying Circus

- Python 1.0 released in 1994
- Python 2.0 released in 2000
- Python 3.0 released in 2008
- Python 2.7 is currently the recommended version (?)
- Python 3 is the future of the language
- Micromine uses Python 3.3 (so that’s what I use)

- Fast to write
- Easy to read
- Continually developing, with a thriving ecosystem of third-party libraries
- Huge number of simple and advanced functions
- Loads of examples and help online
- Great IDEs (integrated development environments)
- Multi-platform (Windows, Mac, Linux, etc.)

- Reads like English
- Third-party libraries (e.g. NumPy) add huge functionality, and their number is growing
- Python used to be seen as the glue between other languages but is quickly becoming the Swiss army knife
- Basically, IDEs are the interface that you type into. The ones available free for Python are great. They mark up mistakes (syntax errors) and include tab completion, which is a bit like predictive text

Source: Python Charmers presentation
- Different language for each task?
- Python, Fortran, Java, Matlab, C, VB.net, IDL, C++, R, Perl, C#, others

- Python is developing to the point where it can do the tasks otherwise spread across a large selection of specialist languages
- Python has improved massively over the last five years: more and more people enjoy its ease of use, so more people are putting effort into producing functions and third-party libraries

Year | Approximate cost per GFLOPS (US$) | Platform providing the lowest cost per GFLOPS
1961 | $1,100,000,000,000 | About 17 million IBM 1620 units costing $64,000 each
1984 | $18,750,000 | Cray X-MP/
[year missing] | $30,000 | Beowulf clusters with Pentium Pro microprocessors
2000 | $1,000 | Bunyip Beowulf cluster
2003 | $82 | KASY0
2007 | $48 | Microwulf
2011 | $1.80 | HPU4Science
2013 | $0.22 | Sony PlayStation
[year missing] | $0.08 | Celeron G1830 R9 295x2 System

- GFLOPS means giga (billion) floating-point operations per second
- Fortran was designed at a time when a programmer’s time was cheaper than processing time
- Writing code in Fortran or C can take roughly 10x longer than in Python (and that does not include the ready-made functions available in Python’s third-party libraries)
- The computer takes longer to process pure Python scripts, but optimised libraries like NumPy make array operations much quicker (approaching the speed of C++)

- United Space Alliance (NASA contractor)
- Google: Maps, Gmail, Groups, News
- YouTube, Reddit, BitTorrent
- Civilization IV
- Financial analysis
- Research: universities worldwide, across a variety of disciplines

- Because Python is easy and quick to read and write, and because it is so versatile, people are using it more and more for a variety of applications

- So far I have tried to explain why Python is a good choice of language to learn
- But how will a programming language benefit geologists?
- Python has a load of third-party libraries, some of which are shown on the slide
- Specialist geological libraries are lacking, but there are a few geophysical and hydrogeological libraries available or in development

- NumPy is the fundamental package for scientific computing with Python. It adds a fast and sophisticated array facility to the Python language

- The NumPy library forms the basis of Python’s scientific computing ability
- It includes routines for mathematical and logical operations, shape manipulation, sorting, selecting, discrete Fourier transforms, basic linear algebra, basic statistical operations, random simulation and much more
- Plain Python uses loops to iterate over data. This is slow because the interpreter has to process and execute each operation one by one
- NumPy is built to operate efficiently on arrays of numbers by sending batches of data to optimized C and Fortran code
- The first part of the IPython Notebook that accompanies this presentation explains a little more about why NumPy is so useful
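The difference between looping in plain Python and handing a whole array to NumPy can be sketched as follows. This is a minimal illustration, not taken from the accompanying notebook: the `grades` array is synthetic, and the function names `mean_log_loop` and `mean_log_numpy` are made up for this example. Actual timings depend on the machine, so none are quoted here.

```python
import math
import time

import numpy as np

# Synthetic assay grades (illustrative only, not real data).
rng = np.random.default_rng(seed=0)
grades = rng.lognormal(mean=0.0, sigma=1.0, size=200_000)


def mean_log_loop(values):
    """Mean of log10(values) using a plain Python loop."""
    total = 0.0
    for v in values:
        total += math.log10(v)
    return total / len(values)


def mean_log_numpy(values):
    """Same calculation, but the whole array is processed in one call."""
    return np.log10(values).mean()


start = time.perf_counter()
loop_result = mean_log_loop(grades)
loop_seconds = time.perf_counter() - start

start = time.perf_counter()
numpy_result = mean_log_numpy(grades)
numpy_seconds = time.perf_counter() - start

print(f"loop:  {loop_result:.6f} in {loop_seconds:.4f} s")
print(f"numpy: {numpy_result:.6f} in {numpy_seconds:.4f} s")
```

Both functions return the same answer (up to floating-point rounding); the NumPy version simply avoids executing the loop body once per sample in the interpreter.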

- DataFrame object for data manipulation with integrated indexing
- Label-based slicing, fancy indexing, and subsetting of large data sets
- Tools for reading and writing data to different file formats
- Basic but handy quick plotting
- Group-by engine allowing split-apply-combine operations on data sets
- Reshaping and pivoting of data sets
- Time-series functionality, merging and joining, integrated handling of missing data, data structure column insertion and deletion, hierarchical axis indexing to work with high-dimensional data in a lower-dimensional data structure

- Pandas uses a DataFrame object, which can be thought of as a table of data (although it can be more complicated)
- It was originally developed in the finance sector to aid with data manipulation and data analysis
- It has loads of brilliant functions to really dig into your data
- It has useful functions for reading and writing file types such as CSV and Excel
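As a rough sketch of the DataFrame features listed above (reading a file, label-based selection, group-by and missing-data handling), the example below reads a small CSV held in a string. The column names (`hole_id`, `rock_type`, `depth_m`, `cu_ppm`) and all values are invented for illustration; with real data you would pass a file path to `pd.read_csv` instead.

```python
import io

import pandas as pd

# Made-up drill-hole assays; the last granite sample has no Cu value.
csv_text = """hole_id,rock_type,depth_m,cu_ppm
DH001,basalt,10,120
DH001,granite,20,35
DH002,basalt,15,140
DH002,granite,25,
DH003,basalt,12,100
"""

# Read the table into a DataFrame (the empty field becomes NaN).
df = pd.read_csv(io.StringIO(csv_text))

# Label-based selection: basalt rows, two columns only.
basalt = df.loc[df["rock_type"] == "basalt", ["hole_id", "cu_ppm"]]

# Split-apply-combine: sample count and mean Cu per rock type.
# Missing values are skipped automatically.
summary = df.groupby("rock_type")["cu_ppm"].agg(["count", "mean"])

print(basalt)
print(summary)
```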

- Plotting library for graphs
- Designed to closely resemble MATLAB
- From simple to publication-quality graphs
- Great gallery with examples
- Plenty of tutorials and help (Stack Overflow)

- Matplotlib is now the recommended plotting library for making graphs etc.
- Matplotlib figures can be easily customised and produce publication-quality plots
- Using the Matplotlib, NumPy and Pandas libraries together makes data analysis much easier and more reproducible than in Excel
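A minimal sketch of a customised Matplotlib figure is shown below. The Cu and Au values are randomly generated purely for illustration, and the figure is written to an in-memory buffer so the script leaves no files behind; in practice you would pass a file name such as `"cu_vs_au.png"` (a hypothetical name) to `savefig`.

```python
import io

import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np

# Synthetic, loosely correlated Cu and Au values (not real data).
rng = np.random.default_rng(seed=1)
cu_ppm = rng.lognormal(mean=4.0, sigma=0.5, size=200)
au_gpt = 0.01 * cu_ppm * rng.lognormal(mean=0.0, sigma=0.3, size=200)

fig, ax = plt.subplots(figsize=(6, 4))
ax.scatter(cu_ppm, au_gpt, s=12, alpha=0.7)
ax.set_xscale("log")
ax.set_yscale("log")
ax.set_xlabel("Cu (ppm)")
ax.set_ylabel("Au (g/t)")
ax.set_title("Synthetic Cu vs Au")
fig.tight_layout()

# Save at a print-friendly resolution into memory.
buffer = io.BytesIO()
fig.savefig(buffer, format="png", dpi=300)
png_bytes = buffer.getvalue()
plt.close(fig)
```

Because the whole figure is defined in code, re-running the script on updated data reproduces exactly the same plot style.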

- Gathers a variety of high-level science and engineering modules together:
- stats: statistical functions
- spatial: KD-trees, nearest neighbors, distance functions
- interpolate: interpolation tools e.g. radial basis functions (RBF), griddata
- ndimage: various functions for multi-dimensional image processing
- optimize: optimization algorithms including linear programming
- constants: physical constants and conversion factors
- fftpack: Discrete Fourier Transform algorithms
- integrate: numerical integration routines
- linalg: linear algebra routines
- misc: miscellaneous utilities (e.g. image reading/writing)
- signal: signal processing tools
- special: special functions

- The SciPy library has loads of more advanced functions, so that, for example, high-level statistical functions can be called using a single line of code
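As one example of a statistical routine called in a single line, the sketch below fits a straight line with `scipy.stats.linregress`. The depth and grade values are invented and deliberately lie exactly on a line so the result is easy to check by eye.

```python
import numpy as np
from scipy import stats

# Made-up data lying exactly on the line grade = 2 * depth + 1.
depth_m = np.array([0.0, 10.0, 20.0, 30.0, 40.0])
grade = 2.0 * depth_m + 1.0

# One call returns slope, intercept, correlation coefficient and more.
result = stats.linregress(depth_m, grade)

print(f"slope={result.slope:.3f}, intercept={result.intercept:.3f}, r={result.rvalue:.3f}")
```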

- scikit-learn: various classification, regression and clustering algorithms including support vector machines, random forests, gradient boosting, k-means and DBSCAN
- scikit-image: algorithms for segmentation, geometric transformations, colour space manipulation, analysis, filtering, morphology, feature detection, and more

- Scikits offer a few more specialised libraries, e.g. scikit-learn, which provides a load of machine learning capabilities. Supervised and unsupervised classification processes are available
- The accompanying IPython Notebook shows how easy it is to conduct dimensionality reduction (Factor Analysis) and unsupervised classification (k-means clustering) on ICP data to help differentiate rock types
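The workflow described above (factor analysis followed by k-means) might look roughly like the sketch below. It is not the code from the accompanying notebook: the "assays" are synthetic numbers generated for two hypothetical rock types with five elements each, and the standardisation step and all parameter values are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import FactorAnalysis
from sklearn.preprocessing import StandardScaler

# Synthetic "ICP" data: 50 samples of each of two made-up rock types,
# five elements per sample (values are arbitrary, not real assays).
rng = np.random.default_rng(seed=42)
mafic = rng.normal(loc=[50, 40, 30, 20, 10], scale=2.0, size=(50, 5))
felsic = rng.normal(loc=[10, 15, 5, 60, 40], scale=2.0, size=(50, 5))
assays = np.vstack([mafic, felsic])

# Put every element on the same scale before analysis.
scaled = StandardScaler().fit_transform(assays)

# Dimensionality reduction: summarise five elements with two factors.
factors = FactorAnalysis(n_components=2, random_state=0).fit_transform(scaled)

# Unsupervised classification: group the samples into two clusters.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(factors)

print("cluster of first mafic sample: ", labels[0])
print("cluster of first felsic sample:", labels[50])
```

With real data the groups are rarely this cleanly separated, so the number of factors and clusters would need to be chosen by inspecting the results.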

- IDEs are basically the interface into which you type the code
- Great packages like Spyder, PyCharm and the IPython Notebook
- Auto-complete and tab-complete options
- Syntax and spelling errors are highlighted automatically

- There are a couple of Python distributions that are easy to download and install
- These include the most popular Python libraries
- Anaconda is recommended for science, math, engineering and data analysis
- The link will only work if viewing this presentation in Slide Show. Alternatively, just search for “Download Anaconda”

- For those at the presentation, we’re going to go through the IPython Notebook part of the presentation
- A version of this will be available for download, but unfortunately I won’t be able to provide the data with it
- Install Anaconda, then you can try it with your own data