Python basics#

Getting help#

There are many places to turn to for a Python reference and basic help. The quickest way to get help on a function is to google “python” followed by what you’re looking for. Typically, Google will refer you to http://docs.python.org/. For example, try googling python randomize. Google is good. Below are some additional references:

StackOverflow : For example, try searching for “python randomize” on StackOverflow: https://stackoverflow.com/search?q=python+randomize

Software Carpentry : Software Carpentry also has lectures and tutorials on Linux, Scientific Computing, and many other topics: https://software-carpentry.org/lessons/

Python Visualizer : The Python Visualizer may be helpful if you are having trouble conceptualizing how Python executes some bit of code.

NumPy for Matlab users : If you’re a Matlab user transitioning to Python, this page may be helpful.

Quick references#

Python mini tutorials and tips#

Get help on a module#

To get help on the functions contained in some module, for instance, the module ‘string’, type: help('string')

help('string')
Help on module string:

NAME
    string - A collection of string operations (most are no longer used).

FILE
    /usr/local/Cellar/python/2.7.14_1/Frameworks/Python.framework/Versions/2.7/lib/python2.7/string.py

DESCRIPTION
    Warning: most of the code you see here isn't normally used nowadays.
    Beginning with Python 1.6, many of these functions are implemented as
    methods on the standard string object. They used to be implemented by
    a built-in module called strop, but strop is now obsolete itself.
    
    Public module variables:
    
    whitespace -- a string containing all characters considered whitespace
    lowercase -- a string containing all characters considered lowercase letters
    uppercase -- a string containing all characters considered uppercase letters
    letters -- a string containing all characters considered letters
    digits -- a string containing all characters considered decimal digits
    hexdigits -- a string containing all characters considered hexadecimal digits
    octdigits -- a string containing all characters considered octal digits
    punctuation -- a string containing all characters considered punctuation
    printable -- a string containing all characters considered printable

CLASSES
    __builtin__.object
        Formatter
        Template
    
    class Formatter(__builtin__.object)
     |  Methods defined here:
     |  
     |  check_unused_args(self, used_args, args, kwargs)
     |  
     |  convert_field(self, value, conversion)
     |  
     |  format(*args, **kwargs)
     |  
     |  format_field(self, value, format_spec)
     |  
     |  get_field(self, field_name, args, kwargs)
     |      # given a field_name, find the object it references.
     |      #  field_name:   the field being looked up, e.g. "0.name"
     |      #                 or "lookup[3]"
     |      #  used_args:    a set of which args have been used
     |      #  args, kwargs: as passed in to vformat
     |  
     |  get_value(self, key, args, kwargs)
     |  
     |  parse(self, format_string)
     |      # returns an iterable that contains tuples of the form:
     |      # (literal_text, field_name, format_spec, conversion)
     |      # literal_text can be zero length
     |      # field_name can be None, in which case there's no
     |      #  object to format and output
     |      # if field_name is not None, it is looked up, formatted
     |      #  with format_spec and conversion and then used
     |  
     |  vformat(self, format_string, args, kwargs)
     |  
     |  ----------------------------------------------------------------------
     |  Data descriptors defined here:
     |  
     |  __dict__
     |      dictionary for instance variables (if defined)
     |  
     |  __weakref__
     |      list of weak references to the object (if defined)
    
    class Template(__builtin__.object)
     |  A string class for supporting $-substitutions.
     |  
     |  Methods defined here:
     |  
     |  __init__(self, template)
     |  
     |  safe_substitute(*args, **kws)
     |  
     |  substitute(*args, **kws)
     |  
     |  ----------------------------------------------------------------------
     |  Data descriptors defined here:
     |  
     |  __dict__
     |      dictionary for instance variables (if defined)
     |  
     |  __weakref__
     |      list of weak references to the object (if defined)
     |  
     |  ----------------------------------------------------------------------
     |  Data and other attributes defined here:
     |  
     |  __metaclass__ = <class 'string._TemplateMetaclass'>
     |  
     |  
     |  delimiter = '$'
     |  
     |  idpattern = '[_a-z][_a-z0-9]*'
     |  
     |  pattern = <_sre.SRE_Pattern object>

FUNCTIONS
    atof(s)
        atof(s) -> float
        
        Return the floating point number represented by the string s.
    
    atoi(s, base=10)
        atoi(s [,base]) -> int
        
        Return the integer represented by the string s in the given
        base, which defaults to 10.  The string s must consist of one
        or more digits, possibly preceded by a sign.  If base is 0, it
        is chosen from the leading characters of s, 0 for octal, 0x or
        0X for hexadecimal.  If base is 16, a preceding 0x or 0X is
        accepted.
    
    atol(s, base=10)
        atol(s [,base]) -> long
        
        Return the long integer represented by the string s in the
        given base, which defaults to 10.  The string s must consist
        of one or more digits, possibly preceded by a sign.  If base
        is 0, it is chosen from the leading characters of s, 0 for
        octal, 0x or 0X for hexadecimal.  If base is 16, a preceding
        0x or 0X is accepted.  A trailing L or l is not accepted,
        unless base is 0.
    
    capitalize(s)
        capitalize(s) -> string
        
        Return a copy of the string s with only its first character
        capitalized.
    
    capwords(s, sep=None)
        capwords(s [,sep]) -> string
        
        Split the argument into words using split, capitalize each
        word using capitalize, and join the capitalized words using
        join.  If the optional second argument sep is absent or None,
        runs of whitespace characters are replaced by a single space
        and leading and trailing whitespace are removed, otherwise
        sep is used to split and join the words.
    
    center(s, width, *args)
        center(s, width[, fillchar]) -> string
        
        Return a center version of s, in a field of the specified
        width. padded with spaces as needed.  The string is never
        truncated.  If specified the fillchar is used instead of spaces.
    
    count(s, *args)
        count(s, sub[, start[,end]]) -> int
        
        Return the number of occurrences of substring sub in string
        s[start:end].  Optional arguments start and end are
        interpreted as in slice notation.
    
    expandtabs(s, tabsize=8)
        expandtabs(s [,tabsize]) -> string
        
        Return a copy of the string s with all tab characters replaced
        by the appropriate number of spaces, depending on the current
        column, and the tabsize (default 8).
    
    find(s, *args)
        find(s, sub [,start [,end]]) -> in
        
        Return the lowest index in s where substring sub is found,
        such that sub is contained within s[start,end].  Optional
        arguments start and end are interpreted as in slice notation.
        
        Return -1 on failure.
    
    index(s, *args)
        index(s, sub [,start [,end]]) -> int
        
        Like find but raises ValueError when the substring is not found.
    
    join(words, sep=' ')
        join(list [,sep]) -> string
        
        Return a string composed of the words in list, with
        intervening occurrences of sep.  The default separator is a
        single space.
        
        (joinfields and join are synonymous)
    
    joinfields = join(words, sep=' ')
        join(list [,sep]) -> string
        
        Return a string composed of the words in list, with
        intervening occurrences of sep.  The default separator is a
        single space.
        
        (joinfields and join are synonymous)
    
    ljust(s, width, *args)
        ljust(s, width[, fillchar]) -> string
        
        Return a left-justified version of s, in a field of the
        specified width, padded with spaces as needed.  The string is
        never truncated.  If specified the fillchar is used instead of spaces.
    
    lower(s)
        lower(s) -> string
        
        Return a copy of the string s converted to lowercase.
    
    lstrip(s, chars=None)
        lstrip(s [,chars]) -> string
        
        Return a copy of the string s with leading whitespace removed.
        If chars is given and not None, remove characters in chars instead.
    
    maketrans(...)
        maketrans(frm, to) -> string
        
        Return a translation table (a string of 256 bytes long)
        suitable for use in string.translate.  The strings frm and to
        must be of the same length.
    
    replace(s, old, new, maxreplace=-1)
        replace (str, old, new[, maxreplace]) -> string
        
        Return a copy of string str with all occurrences of substring
        old replaced by new. If the optional argument maxreplace is
        given, only the first maxreplace occurrences are replaced.
    
    rfind(s, *args)
        rfind(s, sub [,start [,end]]) -> int
        
        Return the highest index in s where substring sub is found,
        such that sub is contained within s[start,end].  Optional
        arguments start and end are interpreted as in slice notation.
        
        Return -1 on failure.
    
    rindex(s, *args)
        rindex(s, sub [,start [,end]]) -> int
        
        Like rfind but raises ValueError when the substring is not found.
    
    rjust(s, width, *args)
        rjust(s, width[, fillchar]) -> string
        
        Return a right-justified version of s, in a field of the
        specified width, padded with spaces as needed.  The string is
        never truncated.  If specified the fillchar is used instead of spaces.
    
    rsplit(s, sep=None, maxsplit=-1)
        rsplit(s [,sep [,maxsplit]]) -> list of strings
        
        Return a list of the words in the string s, using sep as the
        delimiter string, starting at the end of the string and working
        to the front.  If maxsplit is given, at most maxsplit splits are
        done. If sep is not specified or is None, any whitespace string
        is a separator.
    
    rstrip(s, chars=None)
        rstrip(s [,chars]) -> string
        
        Return a copy of the string s with trailing whitespace removed.
        If chars is given and not None, remove characters in chars instead.
    
    split(s, sep=None, maxsplit=-1)
        split(s [,sep [,maxsplit]]) -> list of strings
        
        Return a list of the words in the string s, using sep as the
        delimiter string.  If maxsplit is given, splits at no more than
        maxsplit places (resulting in at most maxsplit+1 words).  If sep
        is not specified or is None, any whitespace string is a separator.
        
        (split and splitfields are synonymous)
    
    splitfields = split(s, sep=None, maxsplit=-1)
        split(s [,sep [,maxsplit]]) -> list of strings
        
        Return a list of the words in the string s, using sep as the
        delimiter string.  If maxsplit is given, splits at no more than
        maxsplit places (resulting in at most maxsplit+1 words).  If sep
        is not specified or is None, any whitespace string is a separator.
        
        (split and splitfields are synonymous)
    
    strip(s, chars=None)
        strip(s [,chars]) -> string
        
        Return a copy of the string s with leading and trailing
        whitespace removed.
        If chars is given and not None, remove characters in chars instead.
        If chars is unicode, S will be converted to unicode before stripping.
    
    swapcase(s)
        swapcase(s) -> string
        
        Return a copy of the string s with upper case characters
        converted to lowercase and vice versa.
    
    translate(s, table, deletions='')
        translate(s,table [,deletions]) -> string
        
        Return a copy of the string s, where all characters occurring
        in the optional argument deletions are removed, and the
        remaining characters have been mapped through the given
        translation table, which must be a string of length 256.  The
        deletions argument is not allowed for Unicode strings.
    
    upper(s)
        upper(s) -> string
        
        Return a copy of the string s converted to uppercase.
    
    zfill(x, width)
        zfill(x, width) -> string
        
        Pad a numeric string x with zeros on the left, to fill a field
        of the specified width.  The string x is never truncated.

DATA
    ascii_letters = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
    ascii_lowercase = 'abcdefghijklmnopqrstuvwxyz'
    ascii_uppercase = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
    digits = '0123456789'
    hexdigits = '0123456789abcdefABCDEF'
    letters = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz'
    lowercase = 'abcdefghijklmnopqrstuvwxyz'
    octdigits = '01234567'
    printable = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTU...
    punctuation = '!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'
    uppercase = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
    whitespace = '\t\n\x0b\x0c\r '

Oo, look at that, learn something every time:

import string
string.ascii_letters
'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
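
The listing above was generated under Python 2.7 (note the path under FILE). Under Python 3 the string module is much smaller, because most of those functions now live on the string object itself. Here is a small sketch of two quick alternatives to reading a full help page: dir() to list what a module contains, and a docstring lookup for a single method. The variable names public_names and shouted are made up for this illustration.

```python
import string

# Public names in the string module; dir() is a quick alternative to help()
public_names = [name for name in dir(string) if not name.startswith('_')]
print(public_names)

# Docstring of a single string method, the core of what help(str.split) displays
print(str.split.__doc__)

# What used to be string.upper(s) is now a method on the string itself
shouted = 'spam'.upper()
print(shouted)
```

In an interactive session, help(str.split) gives the same information in a friendlier layout.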

Notes on importing libraries and functions#

Python provides a somewhat confusing variety of ways of importing functions and libraries.

import X
import X as Y
from X import *
from X import a,b,c
X = __import__('X')

The differences and pros and cons are discussed in this excellent article: http://effbot.org/zone/import-confusion.htm
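
To make the variants concrete, here is a minimal sketch using the standard math module in place of X. The names m, mod, same_module, and root are chosen only for this example, and importlib.import_module is shown as the commonly recommended alternative to calling __import__ directly.

```python
import math                    # bind the module name; call functions as math.sqrt(...)
import math as m               # the same module under a shorter alias
from math import sqrt, pi      # bind selected names directly into your namespace
import importlib

# Import a module whose name is held in a string
mod = importlib.import_module('math')

# All three names refer to the very same module object
same_module = (m is math) and (mod is math)
print(same_module)

root = sqrt(16)                # no "math." prefix needed after "from math import sqrt"
print(root, pi)
```

from X import * is left out on purpose: it makes it hard to tell where a name came from.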

To find out the version of the library you’ve imported:#

import nltk
nltk.__version__
'3.7'

To find out the location of the source files that are being loaded when you import a library:#

import nltk
nltk.__file__
'/Users/glupyan/anaconda3/envs/jupyter/lib/python3.9/site-packages/nltk/__init__.py'

Finding something in lists and strings#

Suppose you have a list called shoppingList:

shoppingList =  ['apples', 'oranges', 'screwdriver']

And you want to determine if this list contains some item, say, ‘apples’. The easiest way to do it is to use in.

if 'apples' in shoppingList:
    print('yep')
yep

Now, suppose your shopping list is stored in a string called shoppingString and you want to determine whether it contains the word ‘apples’.

shoppingString =  'apples, oranges, screwdriver'

Turns out in works here as well:

if 'apples' in shoppingString:
    print('yep')
yep

The in operator works here because it is defined for all sequences (lists, tuples, strings, etc.). Note, however, that in this case there is an ambiguity. In shoppingList, ‘apples’ is a standalone element. In shoppingString, Python doesn’t know where one element starts and the next stops; it simply looks for a matching substring. Therefore, both of these statements will be true for shoppingString:

'apple' in shoppingString
True
'apples' in shoppingString
True

Tip

If you want to search a string more flexibly, you can use str.find and regular expressions, which we’ll cover later in the term.

For shoppingList, by contrast, only the exact element matches:

'apple' in shoppingList
False
'apples' in shoppingList
True

Just as you can use in to check if an element is contained in a sequence, you can use not in to check if it’s not in the sequence.
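
Here is a short sketch of not in, together with one way around the substring ambiguity described above: split the string into a list of items first. The names items, has_apple, has_apples, and no_bananas are invented for this example.

```python
shoppingString = 'apples, oranges, screwdriver'

# Split on commas and strip the surrounding spaces to recover standalone items
items = [item.strip() for item in shoppingString.split(',')]
print(items)

has_apple = 'apple' in items            # False: 'apple' is not a whole item
has_apples = 'apples' in items          # True
no_bananas = 'bananas' not in items     # True: nothing in the list matches
print(has_apple, has_apples, no_bananas)
```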

Use Exceptions#

See the Python doc on exceptions here: http://docs.python.org/tutorial/errors.html. The ‘pythonic’ way of doing things is to try it and catch the exception rather than check first.

For example, rather than doing this:

import os
import sys

if os.path.exists('name.txt'):
    f = open('name.txt', 'r')
else:
    print('file does not exist')
    sys.exit()

do this:

import sys

try:
    f = open('name.txt', 'r')
except IOError:
    print('file not found!')
    sys.exit()

There are many cases where you have to use exceptions to keep your program from crashing, for example, division by 0.
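
Division by zero can be handled in the same try-it-and-catch style. A minimal sketch; safe_ratio is a hypothetical helper written for this example, and returning None is just one possible way to signal the failure.

```python
def safe_ratio(numerator, denominator):
    """Return numerator / denominator, or None if the denominator is zero."""
    try:
        return numerator / denominator
    except ZeroDivisionError:
        # Reached only when denominator == 0; the program keeps running
        return None

print(safe_ratio(1, 2))
print(safe_ratio(1, 0))
```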

Using list comprehension#

This

print([letter for letter in 'abracadabra'])

is better than this

for letter in 'abracadabra':
    print(letter)

Here’s another example. Say you have a list of names and you want to split them into first and last names

names = ['Ed Sullivan', 'Salvador Dali']
firstNames = [name.split(' ')[0] for name in names]
lastNames =  [name.split(' ')[1] for name in names]

Another example: generate 10 random numbers in the range 1-5:

import random

[random.randint(1,5) for i in range(10)]
[5, 5, 2, 4, 2, 2, 4, 3, 4, 1]

Or generate 10 random letters:

import string
[random.choice(list(string.ascii_lowercase)) for i in range(10)]
['b', 'n', 'a', 'v', 'h', 'o', 'p', 'f', 'l', 'l']

And yet another example, this one restricting the output using a conditional. Generate numbers from 0-7, but omitting 2 and 5:

[location for location in range(8) if location not in [2,5]]
[0, 1, 3, 4, 6, 7]

List comprehension! All the cool kids do it.

On the other hand, think twice before obfuscating your code.

For example, the repetition function from Exercise 4 (trial generation) can be rewritten as a one-liner:

def repetition(letters, numberBeforeSwitch, numRepetitions):
    print('\n'.join([item for sublist in [[i] * numberBeforeSwitch for i in letters] for item in sublist] * numRepetitions))


repetition(['a','b','c'], 2, 2)
a
a
b
b
c
c
a
a
b
b
c
c

It is fast and compact, but certainly not very clear.
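
For comparison, here is a sketch of a more readable version that produces the same sequence. The name repetition_lines is made up for this example, and returning the list (rather than printing inside the function) is a design choice of this sketch, not something taken from the exercise.

```python
def repetition_lines(letters, numberBeforeSwitch, numRepetitions):
    """Build the trial sequence one step at a time."""
    block = []
    for letter in letters:
        # Repeat each letter numberBeforeSwitch times before moving on
        block.extend([letter] * numberBeforeSwitch)
    # Repeat the whole block numRepetitions times
    return block * numRepetitions

lines = repetition_lines(['a', 'b', 'c'], 2, 2)
print('\n'.join(lines))
```

It takes a few more lines, but each step can be read (and debugged) on its own.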

How to flatten a list#

Say you’ve got a list like this:

list1 = [['A','B'],['A','B'],['C','D'],['C','D']]

But what you want is this:

list2 = ['A','B','A','B','C','D','C','D']

You can turn list1 into list2 (i.e., flatten list1), like so:

list2 = [item for sublist in list1 for item in sublist]
list2
['A', 'B', 'A', 'B', 'C', 'D', 'C', 'D']

The above method will only work for flattening lists of depth-1, see here for more information.
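
For lists nested more deeply than that, one option is a small recursive function. This is a sketch under the assumption that only list objects count as nesting (strings and tuples are treated as plain items); flatten_deep is a hypothetical name.

```python
def flatten_deep(nested):
    """Flatten a list that may contain lists nested to any depth."""
    flat = []
    for item in nested:
        if isinstance(item, list):
            # Recurse into sublists and add their flattened contents
            flat.extend(flatten_deep(item))
        else:
            flat.append(item)
    return flat

nested = ['A', ['B', ['C', 'D']], [['E']]]
flat = flatten_deep(nested)
print(flat)
```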

An alternative way of flattening a list is to use NumPy.

import numpy
list1 = numpy.array(list1) # convert it to a numpy array
list1 = list1.flatten()    # flatten it
list1 = list1.tolist()     # convert it back to a Python list of plain strings, if you want.
list1
['A', 'B', 'A', 'B', 'C', 'D', 'C', 'D']

We can, of course, do it all in one line:

list1 = numpy.array(list1).flatten().tolist()
list1
['A', 'B', 'A', 'B', 'C', 'D', 'C', 'D']

(In cases like this, you can continue to work with the NumPy Array, which lets you do all sorts of neat things).

Detect sequences#

Say you have a list and you want to know whether it has sequential elements (e.g., 3,4). Why would you care? Suppose you want to intersperse catch trials throughout your experiment, but you don’t want to have two catch trials in a row. How to ensure this?

import random

def has_sequential(lst):
    lst = sorted(lst)
    for elt in zip(lst, lst[1:]):
        if elt[1] - elt[0] == 1:
            return True
    return False

repeatTrials = random.sample(range(180),20)
while has_sequential(repeatTrials):
    repeatTrials = random.sample(range(180),20)
print(sorted(repeatTrials))
[2, 35, 49, 55, 57, 59, 68, 74, 81, 83, 95, 98, 119, 139, 147, 150, 152, 155, 160, 165]

Shuffle a list slice in place#

To shuffle a list in place, we can use random.shuffle(lst). But what if you want to shuffle only a part of the list? random.shuffle(lst) will shuffle the whole list (and unhelpfully return None).

One option is to use a modified Knuth (a.k.a. Fisher-Yates) shuffle.

import random
def shuffle_slice(a, start, stop):
    index1 = start
    while (index1 < stop-1):
        index2 = random.randrange(index1, stop)
        a[index1], a[index2] = a[index2], a[index1]
        index1 += 1

a = list(range(10))  # a real list; a range object cannot be modified in place
print(a)
shuffle_slice(a,0,4)
print(a)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[2, 0, 1, 3, 4, 5, 6, 7, 8, 9]
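
Another option, sketched here as an alternative rather than a replacement, is to copy the slice out, shuffle the copy with random.shuffle, and assign it back. The function name shuffle_slice_simple is made up for this example.

```python
import random

def shuffle_slice_simple(a, start, stop):
    """Shuffle a[start:stop] in place, leaving the rest of the list untouched."""
    part = a[start:stop]      # slicing makes a copy of that part of the list
    random.shuffle(part)      # shuffle the copy
    a[start:stop] = part      # slice assignment writes it back into the original

a = list(range(10))
shuffle_slice_simple(a, 0, 4)
print(a)
```

This uses a little extra memory for the copy, but leans entirely on built-in behavior.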

Use sets#

Don’t reinvent the wheel. Operations like computing intersections, unions, and unique elements are well defined for sets and are built into Python. See here. Some examples of sets:

Get the intersection (the elements in common)

set('abc').intersection('cde')
{'c'}

Get the union (all the elements)

set('abc').union('cdef')
{'a', 'b', 'c', 'd', 'e', 'f'}

Note that because a set, by definition, can only contain unique elements, sets are a good way to get all the distinct elements in a list.

spam = ['s','s','s','p','p','a','m']
set(spam)
{'a', 'm', 'p', 's'}

Caveat: sets are, by definition, not ordered, hence we are not guaranteed to get ‘s’,’p’,’a’,’m’.
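
If you need the distinct elements and their original order, one common idiom is dict.fromkeys, which relies on dictionaries preserving insertion order (guaranteed since Python 3.7). A minimal sketch; unique_in_order is a name invented for this example.

```python
spam = ['s', 's', 's', 'p', 'p', 'a', 'm']

# Dictionary keys are unique and keep the order in which they were first seen
unique_in_order = list(dict.fromkeys(spam))
print(unique_in_order)
```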

Let’s see what spam and ham have in common.

set('spam').intersection('ham')
{'a', 'm'}

And what spam has that ham doesn’t:

set('spam').difference('ham')
{'p', 's'}

Arithmetic and floating point notation#

Python uses dynamic typing. This means that you don’t declare the type of a variable; the type is determined automatically from the value you assign to it.

For example

spam = "can be fried"

assigns the string can be fried to the variable spam. Python knows it’s a string because it’s in quotation marks.

spam = 3

assigns the integer 3 to spam, which is not the same as

spam2 = '3'
spam == spam2
False

Tip

If you’re not sure what type something is, use the type() function to check.
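
For instance, here is a small sketch that checks the types of the two variables from above and converts between them; the names as_int and as_str are made up for this example.

```python
spam = 3
spam2 = '3'

print(type(spam))     # the integer type
print(type(spam2))    # the string type

# Convert explicitly when you need the two to compare equal
as_int = int(spam2)   # string -> integer
as_str = str(spam)    # integer -> string
print(as_int == spam, as_str == spam2)
```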

Reference, mutability, and copying#

Have a look at this:

egg = 'green'
ham = egg
ham
'green'
egg = 'yellow'
ham
'green'

Easy enough. Now have a look here:

egg = ['green']
ham = egg
ham
['green']
egg[0] = 'yellow'
ham
['yellow']

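What do you think is happening here? That’s right, ham points to the egg list, not to the content inside. ham and egg are two names for the same list, so when you change the content of the list through egg, the change shows up through ham as well. In the first example, by contrast, egg = 'yellow' made egg point to a new string and left ham alone.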
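
If you want an independent list instead of a second name for the same one, make a copy. A minimal sketch: list(...) makes a shallow copy, and copy.deepcopy also copies any lists nested inside. The variable names spam, nested, shallow, and deep are chosen only for this example.

```python
import copy

egg = ['green']
ham = egg            # second name for the same list
spam = list(egg)     # shallow copy: a new list with the same items

egg[0] = 'yellow'
print(ham)           # follows the change made through egg
print(spam)          # unaffected

# With nested lists, a shallow copy still shares the inner lists
nested = [['green'], ['blue']]
shallow = list(nested)
deep = copy.deepcopy(nested)

nested[0][0] = 'yellow'
print(shallow[0][0])  # changed: the inner list is shared
print(deep[0][0])     # unchanged: the inner list was copied too
```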

Writing to a file, safely#

import os
fileHandle = open('dataFile.txt','w')
fileHandle.write(line) # the line you are writing.
fileHandle.flush()     # mostly unnecessary 
os.fsync(fileHandle)   # ditto; it helps if you have several processes writing to the file

At the end of your experiment:

fileHandle.close()
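
An alternative worth knowing is the with statement, which closes the file for you when the block ends, even if an error occurs partway through. This is a sketch, not the only way to do it: the temporary directory and the two example lines are assumptions made so that the snippet runs on its own without touching your files.

```python
import os
import tempfile

# Hypothetical location for the data file: a fresh temporary directory
dataPath = os.path.join(tempfile.mkdtemp(), 'dataFile.txt')
lines = ['trial 1\n', 'trial 2\n']

with open(dataPath, 'w') as fileHandle:
    for line in lines:
        fileHandle.write(line)
        fileHandle.flush()                # push Python's buffer to the operating system
        os.fsync(fileHandle.fileno())     # ask the operating system to write to disk
# The file is closed here automatically

with open(dataPath) as fileHandle:
    written = fileHandle.read()
print(written)
```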

Copy a file#

To copy a file use shutil.copyfile(src, dst). src is the path and name of the original file. dst is the path and name where src will be copied.

import shutil 
shutil.copyfile(src,dst)

Examples

shutil.copyfile('1.dat', '3.dat')

This copies 1.dat into a new file named 3.dat.

shutil.copyfile('1.dat', 'directory\\3.dat')

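This copies 1.dat into the specified directory as 3.dat. Notice that the backslash in the path is written twice: the first backslash is the escape character for the second. (The backslash as a path separator is specific to Windows.)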
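
To avoid writing path separators by hand, you can build the path with os.path.join, which picks the right separator for the operating system. A sketch under a few assumptions: it works inside a temporary directory and creates its own 1.dat so that it can run anywhere; the names workDir and subDir are made up.

```python
import os
import shutil
import tempfile

workDir = tempfile.mkdtemp()                 # scratch directory for this example
src = os.path.join(workDir, '1.dat')
with open(src, 'w') as f:
    f.write('some data\n')                   # create a small file to copy

subDir = os.path.join(workDir, 'directory')
os.makedirs(subDir)                          # the destination directory must exist
dst = os.path.join(subDir, '3.dat')          # no hand-written slashes or backslashes

shutil.copyfile(src, dst)
print(os.path.exists(dst))
```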

Create a new directory#

import os
os.makedirs(newDirectoryName)
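
os.makedirs raises an error if the directory already exists. If you would rather have it do nothing in that case, pass exist_ok=True. A small sketch; the path below is a made-up example placed in a temporary directory.

```python
import os
import tempfile

# Hypothetical nested directory; intermediate folders are created as needed
newDirectoryName = os.path.join(tempfile.mkdtemp(), 'data', 'subject01')

os.makedirs(newDirectoryName, exist_ok=True)
os.makedirs(newDirectoryName, exist_ok=True)   # second call is harmless
print(os.path.isdir(newDirectoryName))
```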

Some simple generator functions#

Here’s a generator function that produces an endless sequence of odd numbers.

def oddNum(start):
    while True:
        if start % 2 ==0:
            start+=1
        yield start
        start+=1

Here’s one way to use it:
Get 30 odd numbers starting at 1

someOddNums = oddNum(1) #start it at 1
print(someOddNums)
for i in range(30):
    print(next(someOddNums))
<generator object oddNum at 0x115550740>
1
3
5
7
9
11
13
15
17
19
21
23
25
27
29
31
33
35
37
39
41
43
45
47
49
51
53
55
57
59

Here’s another way using list comprehension:

moreOddNums = oddNum(1) #start it at 1
[next(moreOddNums) for i in range(30)]
[1,
 3,
 5,
 7,
 9,
 11,
 13,
 15,
 17,
 19,
 21,
 23,
 25,
 27,
 29,
 31,
 33,
 35,
 37,
 39,
 41,
 43,
 45,
 47,
 49,
 51,
 53,
 55,
 57,
 59]
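
The standard library's itertools module offers ready-made pieces for this kind of thing. As a sketch of an alternative (not a replacement for writing your own generator), itertools.count produces an endless arithmetic sequence and itertools.islice takes the first n items from it. The names oddNums and firstThirty are invented here.

```python
import itertools

oddNums = itertools.count(1, 2)                     # 1, 3, 5, ... without end
firstThirty = list(itertools.islice(oddNums, 30))   # take the first 30 values
print(firstThirty)

# The generator remembers where it stopped
print(next(oddNums))
```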

Here’s a generator function for implementing a circular list. If you pass in a number, it will create a list of integers of that length, i.e., circularList(5) will create a circular list from [0,1,2,3,4]. If you pass in a list, it will make a circular list out of what you pass in, e.g., circularList(['a','b','c']) will create a circular list from ['a','b','c'].

def circularList(lst):
    if isinstance(lst, int):
        lst = range(lst)
    i = 0
    while True:
        yield lst[i]
        i = (i + 1)%len(lst) #try this out to understand the logic

To use it, create a new generator by assigning it to a variable:

myGenerator = circularList(lst)

where lst is the list you’d like to iterate through continuously. Notice the conditional in the first line of the circularList function. This allows the function to take in either a list or an integer. In the latter case, the function constructs a new list of that length, e.g., circularList(3) will iterate through the list [0,1,2] ad infinitum:

myGenerator = circularList([0,1,2])
next(myGenerator)
0
next(myGenerator)
1
next(myGenerator)
2
next(myGenerator)
0

See what happens if you make a generator using a character string, e.g., myGenerator = circularList('spam').
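
For the plain, unshuffled case, the standard library already has this built in as itertools.cycle. A minimal sketch for comparison; myCycle and firstSeven are names made up for this example.

```python
import itertools

myCycle = itertools.cycle(['a', 'b', 'c'])     # repeats a, b, c without end

firstSeven = [next(myCycle) for _ in range(7)]
print(firstSeven)
```

Writing your own generator is still worthwhile when you need extra behavior, such as reshuffling after each pass.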

Here’s a slightly more complex version of the circularList generator. The basic version above iterates through the list always in the same order. It is more likely that you’ll want to iterate through it in a less ordered way. The variant below shuffles the list after each complete passthrough. Moreover, the shuffling is controlled by a seed so that each time you run it with the same seed, you’ll get the same sequence of randomizations.

import random

def randomizingCircularList(lst,seed):
    if not isinstance(lst,list):
        lst = list(range(lst))  # random.shuffle needs a real list, not a range object
    i = 0
    random.seed(seed)
    while True:
        yield lst[i]
        if (i+1) % len(lst) ==0:
            random.shuffle(lst)
        i = (i + 1)%len(lst)

newCircle = randomizingCircularList(['a','b','c'], 10)

for i in range(10):
    print(next(newCircle))
a
b
c
b
a
c
b
c
a
c

Simple classes#

Here is a simple counter class:

class Counter:
    """A simple counting class"""
    def __init__(self,start=0):
        """Initialize a counter to zero or start if supplied."""
        self.count= start
    def __call__(self):
        """Return the current count."""
        return self.count
    def increment(self, amount):
        """Increment the counter."""
        self.count+= amount
    def reset(self):
        """Reset the counter to zero."""
        self.count= 0

Here’s another simple class:

class BankAccount():
    def __init__(self, initial_balance=0):
        self.balance = initial_balance
    def deposit(self, amount):
        self.balance += amount
    def withdraw(self, amount):
        self.balance -= amount
    def overdrawn(self):
        return self.balance < 0

Creating an instance of the BankAccount class and manipulating the balance is as simple as:

my_account = BankAccount(15)
my_account.withdraw(5)
print(my_account.balance)
10

For most experiments you’ll be creating, it’s probably not necessary to use object oriented programming (OOP). When might you want to use it? Consider a dynamic experiment such as the bouncing ball (Exercise 11). Suppose you want to have multiple bouncing balls at the same time. This is cumbersome without OOP, but becomes very simple with OOP: just create a bouncing ball class and then create a new bouncingBall instance for each ball you want to appear. Remember: each class instance you create (e.g., greenBall = bouncingBall(color="green")) is completely independent of the other instances you create.
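
To give a feel for it, here is a rough sketch of what such a class could look like. Everything in it is an assumption made for illustration: the class is written as BouncingBall, the ball moves along a single vertical axis between floor and ceiling, and update() stands in for whatever per-frame logic Exercise 11 actually uses.

```python
class BouncingBall:
    """A ball moving up and down between a floor and a ceiling."""
    def __init__(self, color, y=0.0, speed=1.0, floor=0.0, ceiling=10.0):
        self.color = color
        self.y = y
        self.speed = speed
        self.floor = floor
        self.ceiling = ceiling

    def update(self):
        """Advance one step; reverse direction on hitting an edge."""
        self.y += self.speed
        if self.y > self.ceiling:
            self.y = self.ceiling
            self.speed = -self.speed
        elif self.y < self.floor:
            self.y = self.floor
            self.speed = -self.speed

# Two independent instances, each with its own position and speed
greenBall = BouncingBall(color='green', y=0.0, speed=1.0)
redBall = BouncingBall(color='red', y=5.0, speed=-2.0)

for _ in range(3):
    greenBall.update()
    redBall.update()

print(greenBall.color, greenBall.y, greenBall.speed)
print(redBall.color, redBall.y, redBall.speed)
```

Adding a third ball is one more line; none of the existing balls need to know about it.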