Python basics
Getting help
There are many places to turn to for a Python reference and basic help. The quickest way to get help on a function is to google “python what you’re looking for”. Typically, Google will refer you to http://docs.python.org/. For example, try googling “python randomize”. Google is good. Below are some additional references:
StackOverflow : For example, try searching for “python randomize” on StackOverflow: https://stackoverflow.com/search?q=python+randomize
Software Carpentry : Software Carpentry also has lectures and tutorials on Linux, Scientific Computing, and many other topics: https://software-carpentry.org/lessons/
Python Visualizer : The Python Visualizer may be helpful if you are having trouble conceptualizing how Python executes some bit of code.
NumPy for Matlab users : If you’re a Matlab user transitioning to Python, this page may be helpful.
Quick references
Lists and list comprehension: http://docs.python.org/tutorial/datastructures.html#more-on-lists
Useful functions for Python dictionaries: https://docs.python.org/3.7/library/stdtypes.html
Writing/reading files: http://docs.python.org/tutorial/inputoutput.html#reading-and-writing-files
Sorting lists and dictionaries - nice tips for how to sort by a particular key: http://wiki.python.org/moin/HowTo/Sorting
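Python mini tutorials and tips
Get help on a module
To get help on the functions contained in some module, for instance, the module ‘string’, type the command below. The output shown was produced by Python 2.7. In Python 3, most of the module-level functions listed under FUNCTIONS (atoi, lower, split, and so on) no longer exist, and you use the equivalent string methods instead, e.g. 'spam'.upper().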
help('string')
Help on module string:
NAME
string - A collection of string operations (most are no longer used).
FILE
/usr/local/Cellar/python/2.7.14_1/Frameworks/Python.framework/Versions/2.7/lib/python2.7/string.py
DESCRIPTION
Warning: most of the code you see here isn't normally used nowadays.
Beginning with Python 1.6, many of these functions are implemented as
methods on the standard string object. They used to be implemented by
a built-in module called strop, but strop is now obsolete itself.
Public module variables:
whitespace -- a string containing all characters considered whitespace
lowercase -- a string containing all characters considered lowercase letters
uppercase -- a string containing all characters considered uppercase letters
letters -- a string containing all characters considered letters
digits -- a string containing all characters considered decimal digits
hexdigits -- a string containing all characters considered hexadecimal digits
octdigits -- a string containing all characters considered octal digits
punctuation -- a string containing all characters considered punctuation
printable -- a string containing all characters considered printable
CLASSES
__builtin__.object
Formatter
Template
class Formatter(__builtin__.object)
| Methods defined here:
|
| check_unused_args(self, used_args, args, kwargs)
|
| convert_field(self, value, conversion)
|
| format(*args, **kwargs)
|
| format_field(self, value, format_spec)
|
| get_field(self, field_name, args, kwargs)
| # given a field_name, find the object it references.
| # field_name: the field being looked up, e.g. "0.name"
| # or "lookup[3]"
| # used_args: a set of which args have been used
| # args, kwargs: as passed in to vformat
|
| get_value(self, key, args, kwargs)
|
| parse(self, format_string)
| # returns an iterable that contains tuples of the form:
| # (literal_text, field_name, format_spec, conversion)
| # literal_text can be zero length
| # field_name can be None, in which case there's no
| # object to format and output
| # if field_name is not None, it is looked up, formatted
| # with format_spec and conversion and then used
|
| vformat(self, format_string, args, kwargs)
|
| ----------------------------------------------------------------------
| Data descriptors defined here:
|
| __dict__
| dictionary for instance variables (if defined)
|
| __weakref__
| list of weak references to the object (if defined)
class Template(__builtin__.object)
| A string class for supporting $-substitutions.
|
| Methods defined here:
|
| __init__(self, template)
|
| safe_substitute(*args, **kws)
|
| substitute(*args, **kws)
|
| ----------------------------------------------------------------------
| Data descriptors defined here:
|
| __dict__
| dictionary for instance variables (if defined)
|
| __weakref__
| list of weak references to the object (if defined)
|
| ----------------------------------------------------------------------
| Data and other attributes defined here:
|
| __metaclass__ = <class 'string._TemplateMetaclass'>
|
|
| delimiter = '$'
|
| idpattern = '[_a-z][_a-z0-9]*'
|
| pattern = <_sre.SRE_Pattern object>
FUNCTIONS
atof(s)
atof(s) -> float
Return the floating point number represented by the string s.
atoi(s, base=10)
atoi(s [,base]) -> int
Return the integer represented by the string s in the given
base, which defaults to 10. The string s must consist of one
or more digits, possibly preceded by a sign. If base is 0, it
is chosen from the leading characters of s, 0 for octal, 0x or
0X for hexadecimal. If base is 16, a preceding 0x or 0X is
accepted.
atol(s, base=10)
atol(s [,base]) -> long
Return the long integer represented by the string s in the
given base, which defaults to 10. The string s must consist
of one or more digits, possibly preceded by a sign. If base
is 0, it is chosen from the leading characters of s, 0 for
octal, 0x or 0X for hexadecimal. If base is 16, a preceding
0x or 0X is accepted. A trailing L or l is not accepted,
unless base is 0.
capitalize(s)
capitalize(s) -> string
Return a copy of the string s with only its first character
capitalized.
capwords(s, sep=None)
capwords(s [,sep]) -> string
Split the argument into words using split, capitalize each
word using capitalize, and join the capitalized words using
join. If the optional second argument sep is absent or None,
runs of whitespace characters are replaced by a single space
and leading and trailing whitespace are removed, otherwise
sep is used to split and join the words.
center(s, width, *args)
center(s, width[, fillchar]) -> string
Return a center version of s, in a field of the specified
width. padded with spaces as needed. The string is never
truncated. If specified the fillchar is used instead of spaces.
count(s, *args)
count(s, sub[, start[,end]]) -> int
Return the number of occurrences of substring sub in string
s[start:end]. Optional arguments start and end are
interpreted as in slice notation.
expandtabs(s, tabsize=8)
expandtabs(s [,tabsize]) -> string
Return a copy of the string s with all tab characters replaced
by the appropriate number of spaces, depending on the current
column, and the tabsize (default 8).
find(s, *args)
find(s, sub [,start [,end]]) -> in
Return the lowest index in s where substring sub is found,
such that sub is contained within s[start,end]. Optional
arguments start and end are interpreted as in slice notation.
Return -1 on failure.
index(s, *args)
index(s, sub [,start [,end]]) -> int
Like find but raises ValueError when the substring is not found.
join(words, sep=' ')
join(list [,sep]) -> string
Return a string composed of the words in list, with
intervening occurrences of sep. The default separator is a
single space.
(joinfields and join are synonymous)
joinfields = join(words, sep=' ')
join(list [,sep]) -> string
Return a string composed of the words in list, with
intervening occurrences of sep. The default separator is a
single space.
(joinfields and join are synonymous)
ljust(s, width, *args)
ljust(s, width[, fillchar]) -> string
Return a left-justified version of s, in a field of the
specified width, padded with spaces as needed. The string is
never truncated. If specified the fillchar is used instead of spaces.
lower(s)
lower(s) -> string
Return a copy of the string s converted to lowercase.
lstrip(s, chars=None)
lstrip(s [,chars]) -> string
Return a copy of the string s with leading whitespace removed.
If chars is given and not None, remove characters in chars instead.
maketrans(...)
maketrans(frm, to) -> string
Return a translation table (a string of 256 bytes long)
suitable for use in string.translate. The strings frm and to
must be of the same length.
replace(s, old, new, maxreplace=-1)
replace (str, old, new[, maxreplace]) -> string
Return a copy of string str with all occurrences of substring
old replaced by new. If the optional argument maxreplace is
given, only the first maxreplace occurrences are replaced.
rfind(s, *args)
rfind(s, sub [,start [,end]]) -> int
Return the highest index in s where substring sub is found,
such that sub is contained within s[start,end]. Optional
arguments start and end are interpreted as in slice notation.
Return -1 on failure.
rindex(s, *args)
rindex(s, sub [,start [,end]]) -> int
Like rfind but raises ValueError when the substring is not found.
rjust(s, width, *args)
rjust(s, width[, fillchar]) -> string
Return a right-justified version of s, in a field of the
specified width, padded with spaces as needed. The string is
never truncated. If specified the fillchar is used instead of spaces.
rsplit(s, sep=None, maxsplit=-1)
rsplit(s [,sep [,maxsplit]]) -> list of strings
Return a list of the words in the string s, using sep as the
delimiter string, starting at the end of the string and working
to the front. If maxsplit is given, at most maxsplit splits are
done. If sep is not specified or is None, any whitespace string
is a separator.
rstrip(s, chars=None)
rstrip(s [,chars]) -> string
Return a copy of the string s with trailing whitespace removed.
If chars is given and not None, remove characters in chars instead.
split(s, sep=None, maxsplit=-1)
split(s [,sep [,maxsplit]]) -> list of strings
Return a list of the words in the string s, using sep as the
delimiter string. If maxsplit is given, splits at no more than
maxsplit places (resulting in at most maxsplit+1 words). If sep
is not specified or is None, any whitespace string is a separator.
(split and splitfields are synonymous)
splitfields = split(s, sep=None, maxsplit=-1)
split(s [,sep [,maxsplit]]) -> list of strings
Return a list of the words in the string s, using sep as the
delimiter string. If maxsplit is given, splits at no more than
maxsplit places (resulting in at most maxsplit+1 words). If sep
is not specified or is None, any whitespace string is a separator.
(split and splitfields are synonymous)
strip(s, chars=None)
strip(s [,chars]) -> string
Return a copy of the string s with leading and trailing
whitespace removed.
If chars is given and not None, remove characters in chars instead.
If chars is unicode, S will be converted to unicode before stripping.
swapcase(s)
swapcase(s) -> string
Return a copy of the string s with upper case characters
converted to lowercase and vice versa.
translate(s, table, deletions='')
translate(s,table [,deletions]) -> string
Return a copy of the string s, where all characters occurring
in the optional argument deletions are removed, and the
remaining characters have been mapped through the given
translation table, which must be a string of length 256. The
deletions argument is not allowed for Unicode strings.
upper(s)
upper(s) -> string
Return a copy of the string s converted to uppercase.
zfill(x, width)
zfill(x, width) -> string
Pad a numeric string x with zeros on the left, to fill a field
of the specified width. The string x is never truncated.
DATA
ascii_letters = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
ascii_lowercase = 'abcdefghijklmnopqrstuvwxyz'
ascii_uppercase = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
digits = '0123456789'
hexdigits = '0123456789abcdefABCDEF'
letters = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz'
lowercase = 'abcdefghijklmnopqrstuvwxyz'
octdigits = '01234567'
printable = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTU...
punctuation = '!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'
uppercase = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
whitespace = '\t\n\x0b\x0c\r '
Ooh, look at that, you learn something every time:
import string
string.ascii_letters
'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
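The help output above comes from Python 2.7. Here is a sketch of how some of those operations look in Python 3, where they are methods on the string itself. The example strings are our own. Only capwords, the Formatter and Template classes, and constants such as ascii_letters remain in Python 3's string module:

```python
import string

s = '  spam, eggs, ham  '
print(s.strip())                  # spam, eggs, ham   (was string.strip(s))
print(s.strip().split(', '))      # ['spam', 'eggs', 'ham']   (was string.split)
print('-'.join(['a', 'b', 'c']))  # a-b-c   (was string.join; note the separator now comes first)
print('spam'.upper())             # SPAM   (was string.upper)
print(int('42') + float('1.5'))   # 43.5   (was string.atoi / string.atof)
print(string.capwords('the life of brian'))  # The Life Of Brian   (still in the module)
```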
Notes on importing libraries and functions
Python provides a somewhat confusing variety of ways of importing functions and libraries.
import X
import X as Y
from X import *
from X import a,b,c
X = __import__('X')
The differences and pros and cons are discussed in this excellent article: http://effbot.org/zone/import-confusion.htm
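Here is a quick sketch of the forms above, using the standard-library math module as the example. The choice of module is ours, not the original's. from X import * is left out on purpose: it copies every public name into your namespace and can silently overwrite your own variables.

```python
import math                      # import X: refer to names with a prefix, e.g. math.sqrt
print(math.sqrt(16))             # 4.0

import math as m                 # import X as Y: the same module under a shorter alias
print(m.pi == math.pi)           # True

from math import sqrt, pi        # from X import a, b: these names need no prefix
print(sqrt(16))                  # 4.0

math_again = __import__('math')  # X = __import__('X'): import by name at runtime (rarely needed)
print(math_again is math)        # True: Python loads a module once and then reuses it
```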
To find out the version of the library you’ve imported:
import nltk
nltk.__version__
'3.7'
To find out the location of the source files that are being loaded when you import a library:
import nltk
nltk.__file__
'/Users/glupyan/anaconda3/envs/jupyter/lib/python3.9/site-packages/nltk/__init__.py'
Finding something in lists and strings
Suppose you have a list called shoppingList:
shoppingList = ['apples', 'oranges', 'screwdriver']
And you want to determine whether this list contains some item, say, ‘apples’. The easiest way to do it is to use the in operator:
if 'apples' in shoppingList:
    print('yep')
yep
Now suppose your shopping list is in a string called shoppingString, and you want to determine whether it contains the word ‘apples’.
shoppingString = 'apples, oranges, screwdriver'
It turns out in works here as well:
if 'apples' in shoppingString:
    print('yep')
yep
The in operator works here because it is defined for all sequences (lists, tuples, strings, etc.). Note, however, that in this case there is an ambiguity. In the shoppingList list, ‘apples’ is a standalone element. In the shoppingString string, Python doesn’t know where one element starts and the next stops: for strings, in checks whether the text appears anywhere as a substring. Therefore, both of these statements are true for shoppingString:
'apple' in shoppingString
True
'apples' in shoppingString
True
Tip
If you want to search a string more flexibly, you can use the str.find method and regular expressions, which we’ll cover later in the term.
But only the second statement is true for shoppingList:
'apple' in shoppingList
False
'apples' in shoppingList
True
Just as you can use in to check whether an element is contained in a sequence, you can use not in to check that it is not.
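If you need whole-item matching on a string like shoppingString, one option is to split it into a list first. This sketch assumes the items are separated by commas, as in the example above:

```python
shoppingString = 'apples, oranges, screwdriver'

# Split on commas and strip the surrounding spaces, so each item becomes a list element.
shoppingItems = [item.strip() for item in shoppingString.split(',')]
print(shoppingItems)                 # ['apples', 'oranges', 'screwdriver']

print('apple' in shoppingItems)      # False: no longer matches part of 'apples'
print('apples' in shoppingItems)     # True
print('pears' not in shoppingItems)  # True
```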
Use Exceptions
See the Python documentation on exceptions: http://docs.python.org/tutorial/errors.html. The ‘pythonic’ way of doing things is to try it and catch the exception rather than check first.
For example, rather than doing this:
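import os
import sys

if os.path.exists('name.txt'):
    f = open('name.txt', 'r')
else:
    print('file does not exist')
    sys.exit()
do this:
import sys

try:
    f = open('name.txt', 'r')
except FileNotFoundError:  # Python 3 name for a missing file; Python 2 raised IOError
    print('file not found!')
    sys.exit()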
There are many cases where you have to use exceptions to keep your program from crashing, for example, division by 0.
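For instance, here is a minimal sketch for division. safe_ratio is a hypothetical helper name, and returning None is just one possible way to signal the problem:

```python
def safe_ratio(numerator, denominator):
    """Return numerator / denominator, or None when the denominator is 0."""
    try:
        return numerator / denominator
    except ZeroDivisionError:
        # Without this handler, dividing by 0 would stop the whole program.
        return None

print(safe_ratio(10, 4))  # 2.5
print(safe_ratio(10, 0))  # None
```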
Using list comprehension
This
print([letter for letter in 'abracadabra'])
is better than this:
letters = []
for letter in 'abracadabra':
    letters.append(letter)
print(letters)
Here’s another example. Say you have a list of names and you want to split them into first and last names:
names = ['Ed Sullivan', 'Salvador Dali']
firstNames = [name.split(' ')[0] for name in names]
lastNames = [name.split(' ')[1] for name in names]
Another example: generate 10 random numbers in the range 1-5:
import random
[random.randint(1,5) for i in range(10)]
[5, 5, 2, 4, 2, 2, 4, 3, 4, 1]
Or generate 10 random letters:
import string
[random.choice(list(string.ascii_lowercase)) for i in range(10)]
['b', 'n', 'a', 'v', 'h', 'o', 'p', 'f', 'l', 'l']
And yet another example, this one restricting the output using a conditional. Generate numbers from 0-7, but omitting 2 and 5:
[location for location in range(8) if location not in [2,5]]
[0, 1, 3, 4, 6, 7]
List comprehension! All the cool kids do it.
On the other hand… think twice before obfuscating your code.
For example, the repetition function from Exercise 4 (trial generation) can be rewritten as a one-liner:
def repetition(letters, numberBeforeSwitch, numRepetitions):
    print('\n'.join([item for sublist in [[i] * numberBeforeSwitch for i in letters] for item in sublist] * numRepetitions))
repetition(['a','b','c'], 2, 2)
a
a
b
b
c
c
a
a
b
b
c
c
It is fast and compact, but certainly not very clear.
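For comparison, here is a sketch of a more readable version that should print the same thing. repetition_list is a hypothetical helper that returns the sequence instead of printing it, which also makes it easier to reuse:

```python
def repetition_list(letters, numberBeforeSwitch, numRepetitions):
    """Return the trial sequence as a list instead of printing it."""
    trials = []
    for _ in range(numRepetitions):      # repeat the whole block
        for letter in letters:           # go through the letters in order
            trials.extend([letter] * numberBeforeSwitch)  # each one several times in a row
    return trials

def repetition(letters, numberBeforeSwitch, numRepetitions):
    print('\n'.join(repetition_list(letters, numberBeforeSwitch, numRepetitions)))

repetition(['a', 'b', 'c'], 2, 2)
```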
How to flatten a list
Say you’ve got a list like this:
list1 = [['A','B'],['A','B'],['C','D'],['C','D']]
But what you want is this:
list2 = ['A','B','A','B','C','D','C','D']
You can turn list1 into list2 (i.e., flatten list1), like so:
list2 = [item for sublist in list1 for item in sublist]
list2
['A', 'B', 'A', 'B', 'C', 'D', 'C', 'D']
The method above only flattens one level of nesting (a list of lists); see here for more information.
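One common approach for deeper nesting is a recursive function. flatten here is our own hypothetical helper, not a built-in. It only descends into lists, so strings and tuples are kept as single items:

```python
def flatten(nested):
    """Return a flat list of all non-list items in an arbitrarily nested list."""
    flat = []
    for item in nested:
        if isinstance(item, list):
            flat.extend(flatten(item))  # recurse into sublists of any depth
        else:
            flat.append(item)
    return flat

print(flatten([['A', ['B']], 'C', [['D', ['E']]]]))  # ['A', 'B', 'C', 'D', 'E']
```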
An alternative way of flattening a list is to use NumPy.
import numpy
list1 = numpy.array(list1) # convert it to a numpy array
list1 = list1.flatten() # flatten it
list1 = list(list1) # convert it back to a Python list, if you want.
list1
['A', 'B', 'A', 'B', 'C', 'D', 'C', 'D']
We can, of course, do it all in one line:
list1 = list(numpy.array(list1).flatten())
list1
['A', 'B', 'A', 'B', 'C', 'D', 'C', 'D']
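(In cases like this, you can also keep working with the NumPy array, which lets you do all sorts of neat things. Note that this approach assumes every sublist has the same length, because NumPy needs a rectangular array to flatten.)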
Detect sequences
Say you have a list and you want to know whether it has sequential elements (e.g., 3,4). Why would you care? Suppose you want to intersperse catch trials throughout your experiment, but you don’t want to have two catch trials in a row. How to ensure this?
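import random

def has_sequential(lst):
    """Return True if lst contains two consecutive integers (e.g., 3 and 4)."""
    lst = sorted(lst)
    for elt in zip(lst, lst[1:]):  # pairs of neighboring values
        if elt[1] - elt[0] == 1:
            return True
    return False

# Keep drawing 20 trial positions out of 180 until no two are adjacent.
repeatTrials = random.sample(range(180), 20)
while has_sequential(repeatTrials):
    repeatTrials = random.sample(range(180), 20)
print(sorted(repeatTrials))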
[2, 35, 49, 55, 57, 59, 68, 74, 81, 83, 95, 98, 119, 139, 147, 150, 152, 155, 160, 165]
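The loop above is rejection sampling: it keeps redrawing until it gets a valid set, which can take many tries when the catch trials are dense. Here is a different technique (not the one above) that avoids the retries, sketched with a hypothetical helper name. It draws from a shortened range and then spreads the values apart so that no two can be adjacent. Every valid arrangement remains equally likely:

```python
import random

def sample_non_adjacent(n, k):
    """Return k sorted positions from range(n), no two of them consecutive."""
    # Draw k distinct values from the shorter range(n - k + 1) ...
    picks = sorted(random.sample(range(n - k + 1), k))
    # ... then add i to the i-th smallest value. Neighboring picks differ by at
    # least 1, so after spreading they differ by at least 2.
    return [value + i for i, value in enumerate(picks)]

repeatTrials = sample_non_adjacent(180, 20)
print(repeatTrials)
```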
Shuffle a list slice in place
To shuffle a list in place, we can use random.shuffle(lst). But what if you want to shuffle only part of the list? random.shuffle(lst) will shuffle the whole list (and unhelpfully return None).
One option is to use a modified Knuth (a.k.a. Fisher-Yates) shuffle.
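import random

def shuffle_slice(a, start, stop):
    """Shuffle a[start:stop] in place, leaving the rest of the list untouched."""
    index1 = start
    while index1 < stop - 1:
        index2 = random.randrange(index1, stop)  # pick from the not-yet-shuffled part of the slice
        a[index1], a[index2] = a[index2], a[index1]
        index1 += 1

a = list(range(10))  # in Python 3, range() is not a list and can't be modified in place
print(a)
shuffle_slice(a, 0, 4)
print(a)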
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[2, 0, 1, 3, 4, 5, 6, 7, 8, 9]
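A shorter alternative builds a shuffled copy of the slice with random.sample and assigns it back to the same slice. It uses a small temporary list rather than swapping elements one at a time:

```python
import random

a = list(range(10))
start, stop = 0, 4
# random.sample returns a new, shuffled list; slice assignment writes it back in place.
a[start:stop] = random.sample(a[start:stop], stop - start)
print(a)
```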
Use sets
Don’t reinvent the wheel. Operations like computing intersections, unions, and finding unique elements are all well-defined set operations, and sets are built into Python. See here. Some examples of sets:
Get the intersection (the elements in common)
set('abc').intersection('cde')
{'c'}
Get the union (all the elements)
set('abc').union('cdef')
{'a', 'b', 'c', 'd', 'e', 'f'}
Because a set, by definition, can contain only unique elements, sets are a good way to get all the distinct elements in a list.
spam = ['s','s','s','p','p','a','m']
set(spam)
{'a', 'm', 'p', 's'}
Caveat: sets are, by definition, unordered, so we are not guaranteed to get ‘s’, ‘p’, ‘a’, ‘m’ in that order.
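If you need the distinct elements in their original order, one option relies on the fact that dictionaries preserve insertion order (guaranteed since Python 3.7):

```python
spam = ['s', 's', 's', 'p', 'p', 'a', 'm']
# Dictionary keys are unique and keep insertion order, so this drops duplicates
# while keeping the first occurrence of each element.
print(list(dict.fromkeys(spam)))  # ['s', 'p', 'a', 'm']
```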
Let’s see what spam and ham have in common.
set('spam').intersection('ham')
{'a', 'm'}
And what spam has that ham doesn’t:
set('spam').difference('ham')
{'p', 's'}
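Sets also support operators as shorthand for these methods. The operators require set objects on both sides, whereas the methods accept any iterable, such as a string. The symmetric difference gives the elements that appear in exactly one of the two, which is closer to “what they don’t have in common”:

```python
spam, ham = set('spam'), set('ham')
print(spam & ham)         # intersection: {'a', 'm'} (printed order may vary)
print(spam | ham)         # union
print(spam - ham)         # difference: {'p', 's'}
print(spam ^ ham)         # symmetric difference: {'h', 'p', 's'}
print(set('am') <= spam)  # subset test: True
```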
Arithmetic and floating point notation
Python uses dynamic typing. This means you don’t declare a variable’s type: Python works out the type from the value you assign, and the same variable can later hold a value of a different type.
For example,
spam = "can be fried"
assigns the string “can be fried” to the variable spam. Python knows it’s a string because it’s in quotation marks.
spam = 3
assigns the integer 3 to spam, which is not the same as the string '3':
spam2 = '3'
spam == spam2
False
Tip
If you’re not sure what type something is, use the type() function to check.
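The heading also mentions arithmetic and floating point notation. Here is a short sketch of type() together with a few Python 3 behaviors that often surprise newcomers:

```python
import math

print(type(3), type('3'), type(3.0))  # <class 'int'> <class 'str'> <class 'float'>

# In Python 3, / always gives a float; // is floor division and % the remainder.
print(7 / 2, 7 // 2, 7 % 2)  # 3.5 3 1

# Floating point notation: e means "times ten to the power of".
print(2.5e3, 1e-3)  # 2500.0 0.001

# Floats are stored in binary, so some decimals are not exact:
print(0.1 + 0.2 == 0.3)              # False
print(math.isclose(0.1 + 0.2, 0.3))  # True: compare floats with a tolerance
```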
Reference, mutability, and copying
Have a look at this:
egg = 'green'
ham = egg
ham
'green'
egg = 'yellow'
ham
'green'
Easy enough. Now have a look here:
egg = ['green']
ham = egg
ham
['green']
egg[0] = 'yellow'
ham
['yellow']
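What do you think is happening here? That’s right: ham = egg doesn’t copy the list. It makes ham refer to the same list object as egg. When you change the contents of that list through egg, the change shows up through ham as well. The string example behaves differently because strings can’t be modified in place (they are immutable): egg = 'yellow' makes egg refer to a new string and leaves ham alone.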
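To get an independent list, make a copy. Here is a sketch of the common options. A shallow copy creates a new outer list but still shares any lists nested inside it, which is why the standard copy module also provides deepcopy:

```python
import copy

egg = ['green']
ham = list(egg)            # a new list with the same elements (egg[:] or egg.copy() also work)
egg[0] = 'yellow'
print(ham)                 # ['green']: ham is unaffected

nested = [['green'], ['red']]
shallow = copy.copy(nested)     # new outer list, but the inner lists are shared
deep = copy.deepcopy(nested)    # copies the inner lists as well
nested[0][0] = 'yellow'
print(shallow[0])          # ['yellow']
print(deep[0])             # ['green']
```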
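Writing to a file, safely
import os
fileHandle = open('dataFile.txt', 'w')
line = 'subject01,1,correct\n'  # hypothetical example line; write() does not add the newline for you
fileHandle.write(line)  # the line you are writing
fileHandle.flush()  # often unnecessary; pushes Python's buffer to the operating system right away
os.fsync(fileHandle.fileno())  # also asks the OS to write to disk, so the data survives a crash or power loss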
At the end of your experiment:
fileHandle.close()
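A with block is a common way to make sure the file is closed even if your program stops with an error. This sketch wraps the write/flush/fsync steps in a hypothetical write_trial helper. The comma-separated format and the values are just examples:

```python
import os

def write_trial(fileHandle, values):
    """Write one comma-separated line and push it all the way to disk."""
    fileHandle.write(','.join(str(value) for value in values) + '\n')
    fileHandle.flush()             # Python's buffer -> operating system
    os.fsync(fileHandle.fileno())  # operating system -> disk

# The with block closes the file automatically, even if an error occurs inside it.
with open('dataFile.txt', 'w') as fileHandle:
    write_trial(fileHandle, ['subject01', 1, 'correct'])
    write_trial(fileHandle, ['subject01', 2, 'incorrect'])
```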
Copy a file
To copy a file, use shutil.copyfile(src, dst), where src is the path and name of the original file and dst is the path and name of the copy.
import shutil
shutil.copyfile(src,dst)
Examples
shutil.copyfile('1.dat', '3.dat')
This copies 1.dat into a new file named 3.dat.
shutil.copyfile('1.dat', 'directory\\3.dat')
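This copies 1.dat into the existing folder called directory, naming the copy 3.dat. Notice the doubled backslash: a single backslash starts an escape sequence in a Python string, so it has to be escaped itself.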
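To avoid backslashes altogether, you can let os.path.join build the path with the right separator for the current operating system. The sketch below creates its own example file and folder so it runs on its own. shutil.copyfile does not create missing folders, so the destination folder has to exist first:

```python
import os
import shutil

# Create a small example file and a destination folder so this runs on its own.
with open('1.dat', 'w') as f:
    f.write('example data\n')
os.makedirs('directory', exist_ok=True)

dst = os.path.join('directory', '3.dat')  # no separator to type, so nothing to escape
shutil.copyfile('1.dat', dst)
print(dst)  # directory/3.dat on macOS and Linux, directory\3.dat on Windows
```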
Create a new directory
import os
os.makedirs(newDirectoryName)
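Here is a slightly fuller sketch. The folder name is hypothetical. makedirs also creates any missing parent folders, and by default it raises FileExistsError if the folder is already there. The exist_ok=True argument turns that into a no-op, which is handy when the same script runs for several participants:

```python
import os

newDirectoryName = os.path.join('data', 'subject01')  # hypothetical folder for one participant

os.makedirs(newDirectoryName, exist_ok=True)  # also creates 'data' if needed
os.makedirs(newDirectoryName, exist_ok=True)  # safe to run again: no error
print(os.path.isdir(newDirectoryName))        # True
```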
Some simple generator functions
Here’s a generator function that produces an infinite sequence of odd numbers.
def oddNum(start):
    while True:
        if start % 2 == 0:  # skip even numbers
            start += 1
        yield start
        start += 1
Here’s one way to use it, getting 30 odd numbers starting at 1:
someOddNums = oddNum(1)  # start it at 1
print(someOddNums)  # prints the generator object itself, not its values
for i in range(30):
    print(next(someOddNums))
<generator object oddNum at 0x115550740>
1
3
5
7
9
11
13
15
17
19
21
23
25
27
29
31
33
35
37
39
41
43
45
47
49
51
53
55
57
59
Here’s another way using list comprehension:
moreOddNums = oddNum(1) #start it at 1
[next(moreOddNums) for i in range(30)]
[1,
3,
5,
7,
9,
11,
13,
15,
17,
19,
21,
23,
25,
27,
29,
31,
33,
35,
37,
39,
41,
43,
45,
47,
49,
51,
53,
55,
57,
59]
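The standard library can produce the same infinite sequence of odd numbers. itertools.count(start, step) counts forever, and itertools.islice takes a fixed number of items from any iterator, including an infinite generator like oddNum:

```python
import itertools

oddNumbers = itertools.count(1, 2)                # 1, 3, 5, ... forever
first30 = list(itertools.islice(oddNumbers, 30))  # take only the first 30
print(first30[:5], first30[-1])                   # [1, 3, 5, 7, 9] 59
```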
Here’s a generator function for implementing a circular list. If you pass in a number, it will create a list of integers of that length, i.e., circularList(5) will create a circular list from [0,1,2,3,4]. If you pass in a list, it will make a circular list out of what you pass in, e.g., circularList(['a','b','c']) will create a circular list from ['a','b','c'].
def circularList(lst):
    if isinstance(lst, int):  # an integer n becomes the sequence 0, 1, ..., n-1
        lst = range(lst)
    i = 0
    while True:
        yield lst[i]
        i = (i + 1) % len(lst)  # wraps back to 0 after the last element; try this out to understand the logic
To use it, create a new generator by assigning it to a variable:
myGenerator = circularList(lst)
where lst is the list you’d like to iterate through continuously.
Notice the conditional in the first line of the circularList function. It allows the function to take either a list or an integer. In the latter case, the function uses range() to build a sequence of that length, e.g., circularList(3) will iterate through [0,1,2] ad infinitum:
myGenerator = circularList([0,1,2])
next(myGenerator)
0
next(myGenerator)
1
next(myGenerator)
2
next(myGenerator)
0
See what happens if you make a generator using a character string, e.g., myGenerator = circularList('spam').
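For the basic, fixed-order case, the standard library's itertools.cycle does the same job for any iterable. A sketch using it might look like this:

```python
import itertools

myGenerator = itertools.cycle(range(3))  # like circularList(3)
print([next(myGenerator) for _ in range(5)])  # [0, 1, 2, 0, 1]

letterGenerator = itertools.cycle(['a', 'b', 'c'])  # like circularList(['a','b','c'])
print([next(letterGenerator) for _ in range(4)])  # ['a', 'b', 'c', 'a']
```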
Here’s a slightly more complex version of the circularList generator. The basic version above always iterates through the list in the same order, but you’ll often want a less predictable order. The variant below shuffles the list after each complete pass. Moreover, the shuffling is controlled by a seed, so each time you run it with the same seed, you’ll get the same sequence of randomizations.
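import random

def randomizingCircularList(lst, seed):
    if isinstance(lst, int):
        lst = range(lst)
    lst = list(lst)  # work on a copy: a range can't be shuffled, and shuffling would otherwise change the caller's list
    rng = random.Random(seed)  # private generator, so the seed doesn't reset the global random module
    i = 0
    while True:
        yield lst[i]
        if (i + 1) % len(lst) == 0:  # a complete pass just finished
            rng.shuffle(lst)
        i = (i + 1) % len(lst)

newCircle = randomizingCircularList(['a', 'b', 'c'], 10)
for i in range(10):
    print(next(newCircle))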
a
b
c
b
a
c
b
c
a
c
Simple classes
Here is a simple counter class:
class Counter:
    """A simple counting class"""
    def __init__(self, start=0):
        """Initialize a counter to zero or start if supplied."""
        self.count = start
    def __call__(self):
        """Return the current count."""
        return self.count
    def increment(self, amount):
        """Increment the counter."""
        self.count += amount
    def reset(self):
        """Reset the counter to zero."""
        self.count = 0
Here’s another simple class:
class BankAccount:
    def __init__(self, initial_balance=0):
        self.balance = initial_balance
    def deposit(self, amount):
        self.balance += amount
    def withdraw(self, amount):
        self.balance -= amount
    def overdrawn(self):
        return self.balance < 0
Creating an instance of the BankAccount class and manipulating the balance is as simple as:
my_account = BankAccount(15)
my_account.withdraw(5)
print(my_account.balance)
10
For most experiments you’ll be creating, it’s probably not necessary to use object-oriented programming (OOP). When might you want to use it? Consider a dynamic experiment such as the bouncing ball (Exercise 11). Suppose you want to have multiple bouncing balls at the same time. This is cumbersome without OOP, but becomes very simple with OOP: just create a bouncingBall class and then create a new instance of it for each ball you want to appear. Remember: each class instance you create (e.g., greenBall = bouncingBall(color="green")) has its own attributes, independent of the other instances you create.
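The drawing code from Exercise 11 isn't shown here, so this is only a graphics-free sketch of the idea. The hypothetical bouncingBall class tracks a position along one axis and bounces between two walls. The two instances move independently because each one keeps its own attributes:

```python
class bouncingBall:
    """Hypothetical, graphics-free ball that moves back and forth along one axis."""

    def __init__(self, color, position=0.0, velocity=1.0, width=10.0):
        self.color = color
        self.position = position
        self.velocity = velocity
        self.width = width  # the walls are at 0 and width

    def update(self):
        """Move one step and reverse direction after reaching a wall."""
        self.position += self.velocity
        if self.position <= 0 or self.position >= self.width:
            self.velocity = -self.velocity

greenBall = bouncingBall(color="green", velocity=2.0)
redBall = bouncingBall(color="red", position=5.0, velocity=-1.0)
for frame in range(3):
    greenBall.update()
    redBall.update()
print(greenBall.position, redBall.position)  # 6.0 2.0: each ball keeps its own state
```

In a real experiment, update would also redraw the ball's stimulus on each frame.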