Python Notes
Links
Quick style guide
# 79 column rule
# module_name, package_name, ClassName, method_name,
# ExceptionName, function_name, GLOBAL_CONSTANT_NAME,
# global_var_name, instance_var_name, function_parameter_name,
# local_var_name
Doc string
Basics
print(__name__)
if __name__ == '__main__':
print 'match'
One of the use of this functionality to write various kind of unit tests within the same module. For example, you can just import many different modules you have written and call specific functions.
Scoping
global
statement can be used to indicate that particular variables live in the global scope and should be rebound there.nonlocal
statement indicates that particular variables live in an enclosing scope and should be rebound there.
def scope_test():
def do_local():
spam = "local spam"
def do_nonlocal():
nonlocal spam
spam = "nonlocal spam"
def do_global():
global spam
spam = "global spam"
spam = "test spam"
do_local()
print("After local assignment:", spam)
do_nonlocal()
print("After nonlocal assignment:", spam)
do_global()
print("After global assignment:", spam)
scope_test()
print("In global scope:", spam)
# After local assignment: test spam
# After nonlocal assignment: nonlocal spam
# After global assignment: nonlocal spam
# In global scope: global spam
Sequences
There are three sequence types:
List
These are mutable sequences that are used to store collections of homogeneous items.
Deque
Supports fewer operations but faster for adding/removing from the head/tail of the queue - O(1)
. See the docs.
Tuple
These are immutable sequences and assignment to its members aren't possible however they may contain mutable objects like lists.
Named tuple
Range
The range
type is similar to Python 2.7 xrange
where it has constant space cost - small memory usage like generators.
Misc
# a if condition else b
x = 100 if True else 0
# lists
l = [1, 2] + [2, 3] # l += list(range(10))
l[-1] # last element
# slicing
# sublist will include the element at the lower bound
# but exclude the element at the upper bound
sublist = list(range(1, 11))[3: -1]
# *args, **kwargs
def foo(parg, *args, **kwargs):
print("Positional argument: {0}".format(parg))
for i, a in enumerate(args):
print("Arg {0}: {1}".format(i, a))
for k, v in kwargs.items():
print("{0}: {1}".format(k, v))
# now call the function and do cool things with unpacking
list_args = ['arg2', 'arg3', 'arg4']
dict_args = {'karg1': 1, 'karg2': 2, 'karg3': 3}
foo('pos1 arg', *list_args, **dict_args)
# print formatting
print("Float with 2 decimal places: {0:.2f}".format(10.111))
print("Decimal padded (left justified): {0:10d}".format(100))
print("Decimal padded (right justified): {0:>10d}".format(100))
print("String padded: {0:20}, end of padding.".format("Short String"))
is
vs ==
is
vs ==
is
checks that 2 arguments refer to the same object.==
checks that 2 arguments have the same value.
1 == True # True
1 is True # False
# dir(1) == dir(True)
# these return the namespace list from both instances
# both lists contain the same values
# BUT they are not the same instance
# hence why the *is* will fail
# another example
list(range(1) == list(range(1)) # True
list(range(1) is list(range(1)) # False
iterables and iterators
See the Stack Overflow post for more explanations.
An iterable is:
anything that can be looped over (i.e. you can loop over a string or file) or
anything that can appear on the right-side of a for-loop:
for x in iterable:
anything you can call with
iter()
that will return an iterator e.g.iter(obj)
an object that defines
__iter__
that returns a fresh iterator, or it may have a__getitem__
method suitable for indexed lookup.no state!
An iterator is an object:
with state that remembers where it is during iteration
with a
__next__
method that:returns the next value in the iteration
updates the state to point at the next value
signals when it is done by raising
StopIteration
is self-iterable (meaning that it has an
__iter__
method that returnsself
).next(iterator)
calls__next__
on the iterator.
In a for
loop:
iterable_obj = 'abcde' # immutable, no state
iterator_obj = iter(iterable_obj) # this calls iterable_obj.__iter__()
# now emulate the for loop
while True:
try:
next(iterator_obj) # calls iterator_obj.__next__()
except StopIteration:
break
View objects
They provide a dynamic view on the dictionary's entries, which means that when the dictionary changes, the view reflects these changes.
See the docs for more info.
LEGB-rule
Local -> Enclosed -> Global -> Built-in
Where the arrows should denote the direction of the namespace-hierarchy search order.
Local can be inside a function or class method, for example.
Enclosed can be its enclosing function, e.g., if a function is wrapped inside another function.
Global refers to the uppermost level of the executing script itself
For loops
range(0, 10)
returns an immutablerange
type - iterator, so you need to cast it aslist(range(10))
.Use of other built-ins like
reversed()
,sorted()
,list.sort()
.
num_list1 = list(range(10, 20)) # Python 3
num_list2 = list(range(20, 30))
# get index and value
for i, v in enumerate(num_list1):
print('index: {0}, value: {1}'.format(i, v))
# iterate over both lists at the same time
for x, y in zip(num_list1, num_list2):
print('item1: {0}, item2: {1}'.format(x, y))
Copy
Assignment statements in Python do not copy objects, they create bindings between a target and an object. For collections that are mutable or contain mutable items, a copy is sometimes needed so one can change one copy without changing the other.
Basically:
A shallow copy constructs a new compound object and then (to the extent possible) inserts references into it to the objects found in the original. This is is okay for flat mutable objects.
A deep copy constructs a new compound object and then, recursively, inserts copies into it of the objects found in the original. Use this for nested lists.
import copy as c
# for flat immutable objects
l1 = list(range(10))
l2 = c.copy(l1) # create a shallow copy
l3 = l1[:] # create a shallow copy
# for nested
l1 = [list(range(10)), 10, 11, 12]
l2 = c.copy(l1) # shallow copy
l2[0][0] = 111111 # this also changes l1
print(l1) # [[111111, 1, 2, 3, 4, 5, 6, 7, 8, 9], 10, 11, 12]
# solution
l1 = [list(range(10)), 10, 11, 12]
l3 = c.deepcopy(l1) # deepcopy recursively copies objects
l3[0][0] = 111111
print(l1) # [[0, 1, 2, 3, 4, 5, 6, 7, 8, 9], 10, 11, 12]
Lists, generators and other interesting built-ins
For Python 3.x:
Removed reduce(). Use functools.reduce() if you really need it; however, 99 percent of the time an explicit for loop is more readable.
# built-in functions
q, r = divmod(9, 4) # 9//4 == 2, 9-(4*2) == 1
# returns the concatenation of the strings in the sequence seq.
dash_delimted_str = '-'.join(['a', 'b', 'c'])
number_list = list(range(10))
# basic list comprehension
# some_list = [ <output value> for <element> in <list> <optional criteria> ]
#
squared_list = [x**x for x in number_list if x < 5]
filtered_list = [x if x < 10 else -1 for x in number_list]
# using built-in filter
def less_than_five(x):
return x < 5
filtered_list = list(filter(less_than_five, number_list))
# using lambda
filtered_list = list(filter(lambda x: x < 5, number_list))
map, filter, reduce
# filter(function, sequence)
# applies the function to each element of the sequence
def three_five(x):
return x % 3 == 0 or x % 5 == 0
filter(three_five, range(100))
filter(lambda x: x % 3 == 0 or x % 5 == 0, range(100))
# map(function, sequence)
# calls function(element) of the sequence and returns the result as a list
def expd(x):
return x**x
map(expd, range(10))
# functools.reduce(function, sequence)
# returns a single value constructed by calling the binary function function
# on the first two items of the sequence, then on the result and
# the next item, and so on
def sum(seq):
def add(x, y):
return x + y
return reduce(add, seq)
# sum is a built-in, but this is how it's implemented
sum(range(1, 11))
def total(x, y):
return x + y
reduce(total, range(1, 11))
Dictionaries
Thank god Python 3 reduced the number of functions for iterating over a dictionary!
dict.items()
dict.keys()
dict.values()
# Return an **view** over the dictionary’s (k, v) pair, keys and values.
# Remember, these are dynamic views of the dictionary itself!
# other
dict.pop(key)
dict.clear()
Time
import datetime
import time
# getting current date time with milliseconds
# remember: datetime isn't time locale aware!
curr_local_datetime = datetime.datetime.now() # local time
curr_utc_datetime = datetime.datetime.utcnow() # utc time
# getting current epoch time
time.time() # seconds by default (float)
cur_utc_epoch = datetime.utcnow().timestamp() # converting from datetime
# converting epoch to datetime
curr_utc_datetime = datetime.utcfromtimestamp(time.time())
# strip the milliseconds from the time and convert to string
curr_local_str = datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')
curr_utc_str = time.strftime('%Y-%m-%d %H:%M:%S', time.gmtime(time.time()))
# getting the minutes/seconds etc.
curr_local_datetime.minute
# adding minutes/seconds to a datetime
# far easier to convert to epoch and add seconds
curr_local_datetime += datetime.timedelta(minutes=1, seconds=1)
curr_utc_epoch = time.time() + 61
Bit manipulation
"cd".encode("hex") # '6364'
hex(10) # 0xa
Logging
import logging
logging.basicConfig(filename='logfile.log',
filemode='a',
format='%(asctime)s,%(msecs)d %(name)s '
'%(levelname)s %(message)s',
datefmt='%H:%M:%S',
level=logging.DEBUG)
File
File open modes
'r'
open for reading (default)
'w'
open for writing, truncating the file first
'x'
open for exclusive creation, failing if the file already exists
'a'
open for writing, appending to the end of the file if it exists
'b'
binary mode
Return contents as bytes objects without any decoding
't'
text mode (default)
Contents of the file are returned as str, the bytes having been first decoded using a platform-dependent encoding or using the specified encoding if given
'+'
open a disk file for updating (reading and writing)
'U'
universal newlines mode (deprecated)
Time complexity
Some of the optimized data structures Python for better performance summarized from the Python wiki.
For additional information for time complexity, see my other page
list
Copy
O(n)
O(n)
Append[1]
O(1)
O(1)
Pop last
O(1)
O(1)
Pop intermediate
O(k)
O(k)
Insert
O(n)
O(n)
Get Item
O(1)
O(1)
Set Item
O(1)
O(1)
Delete Item
O(n)
O(n)
Iteration
O(n)
O(n)
Get Slice
O(k)
O(k)
Del Slice
O(n)
O(n)
Set Slice
O(k+n)
O(k+n)
Extend[1]
O(k)
O(k)
Sort
O(n log n)
O(n log n)
Multiply
O(nk)
O(nk)
x in s
O(n)
min(s), max(s)
O(n)
Get Length
O(1)
O(1)
collections.deque
copy
O(n)
O(n)
append
O(1)
O(1)
appendleft
O(1)
O(1)
pop
O(1)
O(1)
popleft
O(1)
O(1)
extend
O(k)
O(k)
extendleft
O(k)
O(k)
remove
O(n)
O(n)
set
x in s
O(1)
O(n)
Union s|t
O(len(s)+len(t))
Intersection s&t
O(min(len(s), len(t))
O(len(s) * len(t))
replace "min" with "max" if t is not a set
Multiple intersection s1&s2&..&sn
(n-1)*O(l) where l is max(len(s1),..,len(sn))
Difference s-t
O(len(s))
dict
Copy[2]
O(n)
O(n)
Get Item
O(1)
O(n)
Set Item[1]
O(1)
O(n)
Delete Item
O(1)
O(n)
Iteration[2]
O(n)
O(n)
[1] = These operations rely on the "Amortized" part of "Amortized Worst Case". Individual actions may take surprisingly long, depending on the history of the container.
[2] = For these operations, the worst case n
is the maximum size the container ever achieved, rather than just the current size. For example, if N
objects are added to a dictionary, then N-1
are deleted, the dictionary will still be sized for N
objects (at least) until another insertion is made.
Classes
Terms:
data attributes: instance variables (don't need to be declared)
method attributes
When a class definition is entered, a new namespace is created, and used as the local scope — thus, all assignments to local variables go into this new namespace. In particular, function definitions bind the name of the new function here.
When a class definition is left normally (via the end), a class object is created. This is basically a wrapper around the contents of the namespace created by the class definition. The original local scope (the one in effect just before the class definition was entered) is reinstated, and the class object is bound here to the class name given in the class definition header.
Also to note, when changing class members, these are remembered when creating new instances of that class e.g.
class Foo():
static_var = 0 # i realise this should be called class member...
print(Foo.static_var) # 0
f = Foo()
print(f.static_var) # 0
Foo.static_var = 100
print(f.static_var) # 100
f.static_var = 99 # instance variable change
print(f.static_var) # 99
ff = Foo()
print(ff.static_var) # 100
Inheritance
# if an attribute is not found in the derived class namespace
# it is searched in the baseClassName and so on recursively
class DerivedClassName(modname.BaseClassName):
class_var = 0 # class attribute
def __init__(self):
self.instance_var = 0
# multiple levels of inheritance
# attributes will be search DFS, left-right (simplified)
class DerivedClassName(Base1, Base2, Base3):
# remember this will override the Base1 constructors!
def __init__(self):
# super can be used to access parent class members
suer
isinstance(child, parent)
: if the child type is the same or some class derived from parent.issubclass(child, parent)
: if child instance is a subclass (inheritance) of parent. e.g.bool
is subclass ofint
.
Private variables
Python doesn't support private variables (class-private members) but has some implementation detail and limited support for such mechanism called name mangling.
_var
: are by convention used for semiprivate variables for e.g. non-public api.something minor, when doing
from foo import *
,_names
are not included.
__var
: are by convention to indicate private variables.they cannot be accessed normally e.g.
instance.__var
- exceptionthey are replaced with
_classname__var
to prevent accidental access.
Misc
you can make an instance callable - not class - by implementing the
__call__
method.
Decorators
Obvious ones
@classmethod
: function belongs to the class that it is in.@staticmethod
: function is its own function - used for grouping/style e.g. like nested functions. It is useful for not polluting a module's namespace with free functions for example.
class A(object):
def foo(self,x):
print("executing foo({0},{1})".format(self,x))
@classmethod
def class_foo(cls,x):
print("executing class_foo({0},{1})".format(cls,x))
@staticmethod
def static_foo(x):
print("executing static_foo({0})".format(x))
a = A()
a.foo(10) # executing foo(<__main__.A object at 0x0000021ACE738F60>,10)
print(a.foo) # <bound method A.foo of <__main__.A object at 0x0000021ACE738F60>>
a.class_foo(100) # executing class_foo(<class '__main__.A'>,100)
print(a.class_foo) # <bound method A.class_foo of <class '__main__.A'>>
a.static_foo(1000) # executing static_foo(1000)
print(a.static_foo) # <function A.static_foo at 0x0000021ACE73B9D8>
Creating your own
Remember that decorators return the wrapped function i.e. a modified version of the function that was passed in as an argument. This can be resolved with functools.wraps
. There are a few ways to use them:
as classes
you must be aware that the
help()
may not work correctlythe workaround for decorators as wrappers is
functools.update_wrapper()
. This sets the attribute on the instance buthelp()
looks at the class as it is always called ashelp(instance)
. See more here on stackoverflow.
as nested functions
with/without arguments
Each method has a slightly different syntax that you should be aware of. Useful links:
Decorators with classes
import functools
class decorator_without_arguments(object):
def __init__(self, f):
"""If there are *no* decorator arguments, the function
to be decorated is passed to the constructor.
"""
print("Inside __init__()")
self.f = f
functools.update_wrapper(self, f) # update __name__ etc. attributes
def __call__(self, *args, **kwargs):
"""The __call__ method is not called until the
decorated function is called.
"""
print("Inside __call__()")
self.f(*args, **kwargs)
print("After self.f(*args, **kwargs)")
@decorator_without_arguments
def sayHello(a1, a2, a3, a4):
print('sayHello arguments:', a1, a2, a3, a4)
from functools import wraps
# notice the __init__ doesn't contain func
# notice there are 2 decorator args vs 3 function args
class decorator_with_arguments(object):
def __init__(self, arg1, arg2):
"""If there are decorator arguments, the function
to be decorated is *not* passed to the constructor!
"""
print("Inside __init__()")
self.arg1 = arg1
self.arg2 = arg2
def __call__(self, f):
"""If there are decorator arguments, __call__() is only called
once, as part of the decoration process! You can only give
it a single argument, which is the function object.
"""
print("Inside __call__()")
@wraps(f) # updates __name__, __doc__ attributes etc.
def wrapped_f(*args, **kwargs):
print("Inside wrapped_f()")
print("Decorator arguments:", self.arg1, self.arg2)
f(*args, **kwargs)
print("After f(*args, **kwargs)")
return wrapped_f
@decorator_with_arguments("hello", "world")
def sayHello(a1, a2, a3):
print('sayHello arguments:', a1, a2, a3)
sayHello("say", "hello", "argument")
sayHello("a", "different", "set of args")
# Inside __init__()
# Inside __call__() # after decoration
# Inside wrapped_f() # sayHello is called here
# Decorator arguments: hello world
# sayHello arguments: say hello argument list
# After f(*args) # after first sayHello() call
# Inside wrapped_f() # sayHello is called here
# Decorator arguments: hello world 42
# sayHello arguments: a different set of args
# After f(*args) # after second sayHello() call
Decorators with Functions
def decorator_without_arguments(func):
@wraps(func)
def wrapped_func(*args, **kwargs):
print('Inside wrapped_func - wrapping:', func.__name__)
func(*args, **kwargs)
print('Finished func(*args, **kwargs)')
return wrapped_func
def decorator_with_arguments(arg1, arg2):
def wrapper(func):
@wraps(func)
def wrapped_func(*args, **kwargs):
print('Inside wrapped_func - wrapping:', func.__name__)
func(*args, **kwargs)
print('Finished func(*args, **kwargs)')
return wrapped_func
return wrapper
Threading
Python 3.6 now provides some awesome simplification for threading. See concurrent.futures.
import concurrent.futures
# We can use a with statement to ensure threads are cleaned up promptly
# optional chunksize=1 can split a large iterable in map for speedup
with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
execute.map(func, iterable)
pending = []
pending.append(executor.submit(func, *args)) # or use r.result()
# useful if delegating multiple tasks to threads
for future in concurrent.futures.as_completed(pending):
try:
r = future.result(timeout=10)
except concurrent.futures.TimeoutError:
print('Thread timeout of task')
Regex
a, X, 9
Ordinary characters just match themselves exactly. The meta-characters which do not match themselves because they have special meanings are: .
^
$
*
+
?
{
[
]
\
|
(
)
.
A period matches any single character except newline
\w
Matches a "word" character: a letter or digit or underbar [a-zA-Z0-9_]
. Note that although "word" is the mnemonic for this, it only matches a single word char, not a whole word.
\W
Matches any non-word character.
\b
Boundary between word and non-word
\s
Matches a single whitespace character - space, newline, return, tab, form [ \n\r\t\f]
.
\S
Matches any non-whitespace character.
\t, \n,
Tab, newline, return.
\d
Decimal digit [0-9]
(some older regex utilities do not support but \d
, but they all support \w
and \s
).
^, $
Match the start or end of the string respectively.
\
Inhibit the specialness of a character.
So, for example, use \.
to match a period or \\
to match a slash. If you are unsure if a character has special meaning, such as @
, you can put a slash in front of it, \@
, to make sure it is treated just as a character.
match, search, findall
match
: only finds the string on the beginning i.e. ignore space delimited strings apart from the first one!search
: similar tomatch
but it will return the first match of the string.findall
: will return all matches as a list and we can also use group extraction quite nicely!
Tips
{1-4}
will match 104 times!
Email
str = 'purple alice-b@google.com monkey dishwasher'
match = re.search(r'\w+@\w+', str)
if match:
print(match.group()) ## 'b@google'
Square brackets
match = re.search(r'[\w.-]+@[\w.-]+', str)
if match:
print(match.group()) ## 'alice-b@google.com'
Group extraction
str = 'purple alice-b@google.com monkey dishwasher'
match = re.search('([\w.-]+)@([\w.-]+)', str)
if match:
print(match.group()) ## 'alice-b@google.com' (the whole match)
print(match.group(1)) ## 'alice-b' (the username, group 1)
print(match.group(2)) ## 'google.com' (the host, group 2)
Find all
## Suppose we have a text with many email addresses
str = 'purple alice@google.com, blah monkey bob@abc.com blah dishwasher'
## Here re.findall() returns a list of all the found email strings
## ['alice@google.com', 'bob@abc.com']
emails = re.findall(r'[\w\.-]+@[\w\.-]+', str)
for email in emails:
# do something with each found email string
print(email)
# Open file
f = open('test.txt', 'r')
# Feed the file text into findall(); it returns a list of all the found strings
strings = re.findall(r'some pattern', f.read())
str = 'purple alice@google.com, blah monkey bob@abc.com blah dishwasher'
tuples = re.findall(r'([\w\.-]+)@([\w\.-]+)', str)
print(tuples) ## [('alice', 'google.com'), ('bob', 'abc.com')]
for tuple in tuples:
print(tuple[0]) ## username
print(tuple[1]) ## host
Options
IGNORECASE
: Ignore upper/lowercase differences for matching, so 'a' matches both 'a' and 'A'.DOTALL
: Allow dot (.) to match newline -- normally it matches anything but newline. This can trip you up - you think.*
matches everything, but by default it does not go past the end of a line. Note that\s
(whitespace) includes newlines, so if you want to match a run of whitespace that may include a newline, you can just use\s*
MULTILINE
:Within a string made of many lines, allow^
and$
to match the start and end of each line. Normally^/$
would just match the start and end of the whole string.
# for search()` or `findall()` etc.
re.search(pattern, str, re.IGNORECASE)
Operator precedence
**
Exponentiation (raise to the power)
~
+
-
Complement, unary plus and minus (method names for the last two are +@ and -@)
*
/
%
//
Multiply, divide, modulo and floor division
+
-
Addition and subtraction
>>
<<
Right and left bitwise shift
&
Bitwise AND
^
|
Bitwise exclusive 'OR' and regular 'OR'
<=
<
>
>=
Comparison operators
<>
==
!=
Equality operators
=
%=
/=
//=
-=
+=
*=
**=
Assignment operators
is
is not
Identity operators
in
not in
Membership operators
not
or
and
Logical operators
Last updated