Introduction to Python

Slides:



Advertisements
Similar presentations
Python for Informatics: Exploring Information
Advertisements

Course A201: Introduction to Programming 10/28/2010.
For loops Genome 559: Introduction to Statistical and Computational Genomics Prof. James H. Thomas.
Overview of programming in C C is a fast, efficient, flexible programming language Paradigm: C is procedural (like Fortran, Pascal), not object oriented.
Container Types in Python
15. Python - Modules A module allows you to logically organize your Python code. Grouping related code into a module makes the code easier to understand.
Setting the PYTHONPATH PYTHONPATH is where Python looks for modules it is told to import List of paths Add new path to the end with: setenv PYTHONPATH.
Programming for Linguists
Perkovic, Chapter 7 Functions revisited Python Namespaces
Python 3 March 15, NLTK import nltk nltk.download()
Introduction to Python Week 15. Try It Out! Download Python from Any version will do for this class – By and large they are all mutually.
COP 2800 Lake Sumter State College Mark Wilson, Instructor.
Chapter 6 Lists and Dictionaries CSC1310 Fall 2009.
A Crash Course Python. Python? Isn’t that a snake? Yes, but it is also a...
String and Lists Dr. Benito Mendoza. 2 Outline What is a string String operations Traversing strings String slices What is a list Traversing a list List.
Tcl and Otcl Tutorial Part I Internet Computing KUT Youn-Hee Han.
What is a scripting language? What is Python?
Concepts when Python retrieves a variable’s value Namespaces – Namespaces store information about identifiers and the values to which they are bound –
BIL101, Introduction to Computers and Information Systems Chapter 12 A Portable Scientific Visualization Program: GnuPlot Prepared by Metin Demiralp Istanbul.
1 Python Chapter 2 © Samuel Marateck, After you install the compiler, an icon labeled IDLE (Python GUI) will appear on the screen. If you click.
Introduction to Computing Using Python Dictionary container class + modules  Container Class dict  Modules, revisited.
Lilian Blot CORE ELEMENTS COLLECTIONS & REPETITION Lecture 4 Autumn 2014 TPOP 1.
CSC 110 Numeric data types [Reading: chapter 3] CSC 110 D 1.
Python  By: Ben Blake, Andrew Dzambo, Paul Flanagan.
Introduction to Python Lecture 1. CS 484 – Artificial Intelligence2 Big Picture Language Features Python is interpreted Not compiled Object-oriented language.
January 24, 2006And Now For Something Completely Different 1 “And Now For Something Completely Different” A Quick Tutorial of the Python Programming Language.
“Everything Else”. Find all substrings We’ve learned how to find the first location of a string in another string with find. What about finding all matches?
Computer Science 111 Fundamentals of Programming I Basic Program Elements.
H3D API Training  Part 3.1: Python – Quick overview.
Modules and Decomposition UW CSE 190p Summer 2012 download examples from the calendar.
Introduction to Programming David Goldschmidt, Ph.D. Computer Science The College of Saint Rose Java Fundamentals (Comments, Variables, etc.)
Built-in Data Structures in Python An Introduction.
Getting Started with Python: Constructs and Pitfalls Sean Deitz Advanced Programming Seminar September 13, 2013.
Introduction to Python
Copyright © 2012 Pearson Education, Inc. Publishing as Pearson Addison-Wesley C H A P T E R 8 Lists and Tuples.
Unit 6 File processing Special thanks to Roy McElmurry, John Kurkowski, Scott Shawcroft, Ryan Tucker, Paul Beck for their work. Except where otherwise.
Department of Electrical and Computer Engineering Introduction to C++: Primitive Data Types, Libraries and Operations By Hector M Lugo-Cordero August 27,
(A Very Short) Introduction to Shell Scripts CSCI N321 – System and Network Administration Copyright © 2000, 2003 by Scott Orr and the Trustees of Indiana.
Python I Some material adapted from Upenn cmpe391 slides and other sources.
CS105 Computer Programming PYTHON (based on CS 11 Python track: lecture 1, CALTECH)
GE3M25: Computer Programming for Biologists Python, Class 5
LISTS and TUPLES. Topics Sequences Introduction to Lists List Slicing Finding Items in Lists with the in Operator List Methods and Useful Built-in Functions.
Python Files and Lists. Files  Chapter 9 actually introduces you to opening up files for reading  Chapter 14 has more on file I/O  Python can read.
Introduction to Programming Oliver Hawkins. BACKGROUND TO PROGRAMMING LANGUAGES Introduction to Programming.
Introduction to Python Developed by Dutch programmer Guido van Rossum Named after Monty Python Open source development project Simple, readable language.
Winter 2016CISC101 - Prof. McLeod1 CISC101 Reminders Quiz 3 this week – last section on Friday. Assignment 4 is posted. Data mining: –Designing functions.
String and Lists Dr. José M. Reyes Álamo. 2 Outline What is a string String operations Traversing strings String slices What is a list Traversing a list.
PYTHON PROGRAMMING. WHAT IS PYTHON?  Python is a high-level language.  Interpreted  Object oriented (use of classes and objects)  Standard library.
FILES AND EXCEPTIONS Topics Introduction to File Input and Output Using Loops to Process Files Processing Records Exceptions.
Indentations makes the scope/block Function definition def print_message (): print “hello” Function usages print_message () hubo.move ()// hubo is a class.
Introduction to Python
CSc 120 Introduction to Computer Programing II Adapted from slides by
G. Pullaiah College of Engineering and Technology
Introduction to Python
Containers and Lists CIS 40 – Introduction to Programming in Python
CS 100: Roadmap to Computing
String Processing Upsorn Praphamontripong CS 1110
CSc 120 Introduction to Computer Programing II
Topics Introduction to File Input and Output
Data types Numeric types Sequence types float int bool list str
CISC101 Reminders Quiz 2 graded. Assn 2 sample solution is posted.
Python Tutorial for C Programmer Boontee Kruatrachue Kritawan Siriboon
CHAPTER 3: String And Numeric Data In Python
CS 1111 Introduction to Programming Spring 2019
Topics Introduction to File Input and Output
Terminal-Based Programs
Winter 2019 CISC101 4/29/2019 CISC101 Reminders
Lecture 7: Python’s Built-in Types and Basic Statements
“Everything Else”.
Topics Introduction to File Input and Output
Presentation transcript:

Introduction to Python Chen Lin clin@brandeis.edu COSI 134a Volen 110 Office Hour: Thurs. 3-5

For More Information? http://python.org/ - documentation, tutorials, beginners guide, core distribution, ... Books include: Learning Python by Mark Lutz Python Essential Reference by David Beazley Python Cookbook, ed. by Martelli, Ravenscroft and Ascher (online at http://code.activestate.com/recipes/langs/python/) http://wiki.python.org/moin/PythonBooks

Python Videos http://showmedo.com/videotutorials/python “5 Minute Overview (What Does Python Look Like?)” “Introducing the PyDev IDE for Eclipse” “Linear Algebra with Numpy” And many more

4 Major Versions of Python “Python” or “CPython” is written in C/C++ - Version 2.7 came out in mid-2010 - Version 3.1.2 came out in early 2010 “Jython” is written in Java for the JVM “IronPython” is written in C# for the .Net environment Go To Website

Development Environments what IDE to use. http://stackoverflow 1. PyDev with Eclipse 2. Komodo 3. Emacs 4. Vim 5. TextMate 6. Gedit 7. Idle 8. PIDA (Linux)(VIM Based) 9. NotePad++ (Windows) 10.BlueFish (Linux)

Pydev with Eclipse

Python Interactive Shell Python 2.6.1 (r261:67515, Feb 11 2010, 00:51:29) [GCC 4.2.1 (Apple Inc. build 5646)] on darwin Type "help", "copyright", "credits" or "license" for more information. >>> You can type things directly into a running Python session >>> 2+3*4 14 >>> name = "Andrew" >>> name 'Andrew' >>> print "Hello", name Hello Andrew

Background Data Types/Structure Control flow File I/O Modules Class NLTK

List A compound data type: [0] [2.3, 4.5] [5, "Hello", "there", 9.8] [] Use len() to get the length of a list >>> names = [“Ben", “Chen", “Yaqin"] >>> len(names) 3

Use [ ] to index items in the list >>> names[0] ‘Ben' >>> names[1] ‘Chen' >>> names[2] ‘Yaqin' >>> names[3] Traceback (most recent call last): File "<stdin>", line 1, in <module> IndexError: list index out of range >>> names[-1] >>> names[-2] >>> names[-3] [0] is the first item. [1] is the second item ... Out of range values raise an exception Negative values go backwards from the last element.

Strings share many features with lists >>> smiles = "C(=N)(N)N.C(=O)(O)O" >>> smiles[0] 'C' >>> smiles[1] '(' >>> smiles[-1] 'O' >>> smiles[1:5] '(=N)' >>> smiles[10:-4] 'C(=O)' Use “slice” notation to get a substring

String Methods: find, split smiles = "C(=N)(N)N.C(=O)(O)O" >>> smiles.find("(O)") 15 >>> smiles.find(".") 9 >>> smiles.find(".", 10) -1 >>> smiles.split(".") ['C(=N)(N)N', 'C(=O)(O)O'] >>> Use “find” to find the start of a substring. Start looking at position 10. Find returns -1 if it couldn’t find a match. Split the string into parts with “.” as the delimiter

String operators: in, not in if "Br" in “Brother”: print "contains brother“ email_address = “clin” if "@" not in email_address: email_address += "@brandeis.edu“

String Method: “strip”, “rstrip”, “lstrip” are ways to remove whitespace or selected characters >>> line = " # This is a comment line \n" >>> line.strip() '# This is a comment line' >>> line.rstrip() ' # This is a comment line' >>> line.rstrip("\n") ' # This is a comment line ' >>>

More String methods email.startswith(“c") endswith(“u”) True/False >>> "%s@brandeis.edu" % "clin" 'clin@brandeis.edu' >>> names = [“Ben", “Chen", “Yaqin"] >>> ", ".join(names) ‘Ben, Chen, Yaqin‘ >>> “chen".upper() ‘CHEN'

Unexpected things about strings >>> s = "andrew" >>> s[0] = "A" Traceback (most recent call last): File "<stdin>", line 1, in <module> TypeError: 'str' object does not support item assignment >>> s = "A" + s[1:] >>> s 'Andrew‘ Strings are read only

“\” is for special characters \n -> newline \t -> tab \\ -> backslash ... But Windows uses backslash for directories! filename = "M:\nickel_project\reactive.smi" # DANGER! filename = "M:\\nickel_project\\reactive.smi" # Better! filename = "M:/nickel_project/reactive.smi" # Usually works

Lists are mutable - some useful methods append an element remove an element sort by default order reverse the elements in a list insert an element at some specified position. (Slower than .append()) >>> ids = ["9pti", "2plv", "1crn"] >>> ids.append("1alm") >>> ids ['9pti', '2plv', '1crn', '1alm'] >>>ids.extend(L) Extend the list by appending all the items in the given list; equivalent to a[len(a):] = L. >>> del ids[0] ['2plv', '1crn', '1alm'] >>> ids.sort() ['1alm', '1crn', '2plv'] >>> ids.reverse() >>> ids.insert(0, "9pti")

Tuples: sort of an immutable list >>> yellow = (255, 255, 0) # r, g, b >>> one = (1,) >>> yellow[0] >>> yellow[1:] (255, 0) >>> yellow[0] = 0 Traceback (most recent call last): File "<stdin>", line 1, in <module> TypeError: 'tuple' object does not support item assignment Very common in string interpolation: >>> "%s lives in %s at latitude %.1f" % ("Andrew", "Sweden", 57.7056) 'Andrew lives in Sweden at latitude 57.7'

zipping lists together >>> names ['ben', 'chen', 'yaqin'] >>> gender = [0, 0, 1] >>> zip(names, gender) [('ben', 0), ('chen', 0), ('yaqin', 1)]

Dictionaries Duplicate keys are not allowed Dictionaries are lookup tables. They map from a “key” to a “value”. symbol_to_name = { "H": "hydrogen", "He": "helium", "Li": "lithium", "C": "carbon", "O": "oxygen", "N": "nitrogen" } Duplicate keys are not allowed Duplicate values are just fine

Keys can be any immutable value numbers, strings, tuples, frozenset, not list, dictionary, set, ... atomic_number_to_name = { 1: "hydrogen" 6: "carbon", 7: "nitrogen" 8: "oxygen", } nobel_prize_winners = { (1979, "physics"): ["Glashow", "Salam", "Weinberg"], (1962, "chemistry"): ["Hodgkin"], (1984, "biology"): ["McClintock"], A set is an unordered collection with no duplicate elements.

Dictionary Get the value for a given key Test if the key exists (“in” only checks the keys, not the values.) [] lookup failures raise an exception. Use “.get()” if you want to return a default value. >>> symbol_to_name["C"] 'carbon' >>> "O" in symbol_to_name, "U" in symbol_to_name (True, False) >>> "oxygen" in symbol_to_name False >>> symbol_to_name["P"] Traceback (most recent call last): File "<stdin>", line 1, in <module> KeyError: 'P' >>> symbol_to_name.get("P", "unknown") 'unknown' >>> symbol_to_name.get("C", "unknown")

Some useful dictionary methods >>> symbol_to_name.keys() ['C', 'H', 'O', 'N', 'Li', 'He'] >>> symbol_to_name.values() ['carbon', 'hydrogen', 'oxygen', 'nitrogen', 'lithium', 'helium'] >>> symbol_to_name.update( {"P": "phosphorous", "S": "sulfur"} ) >>> symbol_to_name.items() [('C', 'carbon'), ('H', 'hydrogen'), ('O', 'oxygen'), ('N', 'nitrogen'), ('P', 'phosphorous'), ('S', 'sulfur'), ('Li', 'lithium'), ('He', 'helium')] >>> del symbol_to_name['C'] >>> symbol_to_name {'H': 'hydrogen', 'O': 'oxygen', 'N': 'nitrogen', 'Li': 'lithium', 'He': 'helium'}

Background Data Types/Structure list, string, tuple, dictionary Control flow File I/O Modules Class NLTK

Control Flow Things that are False The boolean value False The numbers 0 (integer), 0.0 (float) and 0j (complex). The empty string "". The empty list [], empty dictionary {} and empty set set(). Things that are True The boolean value True All non-zero numbers. Any string containing at least one character. A non-empty data structure.

If >>> smiles = "BrC1=CC=C(C=C1)NN.Cl" >>> bool(smiles) True >>> not bool(smiles) False >>> if not smiles: ... print "The SMILES string is empty" ... The “else” case is always optional

Use “elif” to chain subsequent tests >>> mode = "absolute" >>> if mode == "canonical": ... smiles = "canonical" ... elif mode == "isomeric": ... smiles = "isomeric” ... elif mode == "absolute": ... smiles = "absolute" ... else: ... raise TypeError("unknown mode") ... >>> smiles ' absolute ' >>> “raise” is the Python way to raise exceptions

Boolean logic Python expressions can have “and”s and “or”s: if (ben <= 5 and chen >= 10 or chen == 500 and ben != 5): print “Ben and Chen“

Range Test if (3 <= Time <= 5): print “Office Hour"

For >>> names = [“Ben", “Chen", “Yaqin"] >>> for name in names: ... print smiles ... Ben Chen Yaqin

Tuple assignment in for loops data = [ ("C20H20O3", 308.371), ("C22H20O2", 316.393), ("C24H40N4O2", 416.6), ("C14H25N5O3", 311.38), ("C15H20O2", 232.3181)] for (formula, mw) in data: print "The molecular weight of %s is %s" % (formula, mw) The molecular weight of C20H20O3 is 308.371 The molecular weight of C22H20O2 is 316.393 The molecular weight of C24H40N4O2 is 416.6 The molecular weight of C14H25N5O3 is 311.38 The molecular weight of C15H20O2 is 232.3181

Break, continue >>> for value in [3, 1, 4, 1, 5, 9, 2]: Checking 3 The square is 9 Checking 1 Ignoring Checking 4 The square is 16 Checking 5 The square is 25 Checking 9 Exiting for loop >>> >>> for value in [3, 1, 4, 1, 5, 9, 2]: ... print "Checking", value ... if value > 8: ... print "Exiting for loop" ... break ... elif value < 3: ... print "Ignoring" ... continue ... print "The square is", value**2 ... Use “break” to stop the for loop Use “continue” to stop processing the current item

Range() How to get every second element in a list? “range” creates a list of numbers in a specified range range([start,] stop[, step]) -> list of integers When step is given, it specifies the increment (or decrement). >>> range(5) [0, 1, 2, 3, 4] >>> range(5, 10) [5, 6, 7, 8, 9] >>> range(0, 10, 2) [0, 2, 4, 6, 8] How to get every second element in a list? for i in range(0, len(data), 2): print data[i]

Background Data Types/Structure Control flow File I/O Modules Class NLTK

Reading files >>> f = open(“names.txt") >>> f.readline() 'Yaqin\n'

Quick Way Ignore the header? >>> lst= [ x for x in open("text.txt","r").readlines() ] >>> lst ['Chen Lin\n', 'clin@brandeis.edu\n', 'Volen 110\n', 'Office Hour: Thurs. 3-5\n', '\n', 'Yaqin Yang\n', 'yaqin@brandeis.edu\n', 'Volen 110\n', 'Offiche Hour: Tues. 3-5\n'] Ignore the header? for (i,line) in enumerate(open(‘text.txt’,"r").readlines()): if i == 0: continue print line

Using dictionaries to count occurrences >>> for line in open('names.txt'): ... name = line.strip() ... name_count[name] = name_count.get(name,0)+ 1 ... >>> for (name, count) in name_count.items(): ... print name, count Chen 3 Ben 3 Yaqin 3

File Output input_file = open(“in.txt") output_file = open(“out.txt", "w") for line in input_file: output_file.write(line) “w” = “write mode” “a” = “append mode” “wb” = “write in binary” “r” = “read mode” (default) “rb” = “read in binary” “U” = “read files with Unix or Windows line endings”

Background Data Types/Structure Control flow File I/O Modules Class NLTK

Modules When a Python program starts it only has access to a basic functions and classes. (“int”, “dict”, “len”, “sum”, “range”, ...) “Modules” contain additional functionality. Use “import” to tell Python to load a module. >>> import math >>> import nltk

import the math module >>> import math >>> math.pi 3.1415926535897931 >>> math.cos(0) 1.0 >>> math.cos(math.pi) -1.0 >>> dir(math) ['__doc__', '__file__', '__name__', '__package__', 'acos', 'acosh', 'asin', 'asinh', 'atan', 'atan2', 'atanh', 'ceil', 'copysign', 'cos', 'cosh', 'degrees', 'e', 'exp', 'fabs', 'factorial', 'floor', 'fmod', 'frexp', 'fsum', 'hypot', 'isinf', 'isnan', 'ldexp', 'log', 'log10', 'log1p', 'modf', 'pi', 'pow', 'radians', 'sin', 'sinh', 'sqrt', 'tan', 'tanh', 'trunc'] >>> help(math) >>> help(math.cos)

“import” and “from ... import ...” >>> import math math.cos >>> from math import cos, pi cos >>> from math import *

Background Data Types/Structure Control flow File I/O Modules Class NLTK

Classes class ClassName(object): <statement-1> . . . <statement-N> class MyClass(object): """A simple example class""" i = 12345 def f(self): return self.i class DerivedClassName(BaseClassName):

Background Data Types/Structure Control flow File I/O Modules Class NLTK

http://www.nltk.org/book NLTK is on berry patch machines! >>>from nltk.book import * >>> text1 <Text: Moby Dick by Herman Melville 1851> >>> text1.name 'Moby Dick by Herman Melville 1851' >>> text1.concordance("monstrous") >>> dir(text1) >>> text1.tokens >>> text1.index("my") 4647 >>> sent2 ['The', 'family', 'of', 'Dashwood', 'had', 'long', 'been', 'settled', 'in', 'Sussex', '.']

Classify Text >>> def gender_features(word): ... return {'last_letter': word[-1]} >>> gender_features('Shrek') {'last_letter': 'k'} >>> from nltk.corpus import names >>> import random >>> names = ([(name, 'male') for name in names.words('male.txt')] + ... [(name, 'female') for name in names.words('female.txt')]) >>> random.shuffle(names)

Featurize, train, test, predict >>> featuresets = [(gender_features(n), g) for (n,g) in names] >>> train_set, test_set = featuresets[500:], featuresets[:500] >>> classifier = nltk.NaiveBayesClassifier.train(train_set) >>> print nltk.classify.accuracy(classifier, test_set) 0.726 >>> classifier.classify(gender_features('Neo')) 'male'

from nltk.corpus import reuters Reuters Corpus:10,788 news 1.3 million words. Been classified into 90 topics Grouped into 2 sets, "training" and "test“ Categories overlap with each other http://nltk.googlecode.com/svn/trunk/doc/book/ch02.html

Reuters >>> from nltk.corpus import reuters >>> reuters.fileids() ['test/14826', 'test/14828', 'test/14829', 'test/14832', ...] >>> reuters.categories() ['acq', 'alum', 'barley', 'bop', 'carcass', 'castor-oil', 'cocoa', 'coconut', 'coconut-oil', 'coffee', 'copper', 'copra-cake', 'corn', 'cotton', 'cotton-oil', 'cpi', 'cpu', 'crude', 'dfl', 'dlr', ...]