Generators: x = yield 42

And other applications

David Stuebe

www.swipely.com

Get the presentation

On Github IO: http://dstuebe.github.io/generators/

On NbViewer: http://nbviewer.ipython.org/github/dstuebe/generators/blob/gh-pages/presentation.ipynb

March 2, 2015

Copyright(C) 2015, David Stuebe

What is a generator?


    A kind of function that can return an intermediate result ("the next
    value") to its caller, but maintaining the function's local state so
    that the function can be resumed again right where it left off.

PEP 255 introduced the generator object and the yield statement in version 2.2 of Python.

In [138]:
def fib():
    a, b = 0, 1
    while True:
        yield b
        a, b = b, a+b
In [139]:
function_result = fib()
print function_result
<generator object fib at 0x112da18c0>

The result of calling the generator function is a generator object. The generator object can be used as an iterator.

The generator is an iterator

In [140]:
function_result.__iter__() is function_result
Out[140]:
True
In [141]:
zip(xrange(10),function_result)
Out[141]:
[(0, 1),
 (1, 1),
 (2, 2),
 (3, 3),
 (4, 5),
 (5, 8),
 (6, 13),
 (7, 21),
 (8, 34),
 (9, 55)]
In [142]:
(10, function_result.next())
Out[142]:
(10, 89)
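Since the generator is its own iterator, it plugs into any iterator-consuming API. A minimal sketch using itertools.islice to take a finite slice of the infinite fib stream:

```python
import itertools

def fib():
    # Same Fibonacci generator as defined above
    a, b = 0, 1
    while True:
        yield b
        a, b = b, a + b

# islice lazily pulls the first ten values; iteration calls next() for us
first_ten = list(itertools.islice(fib(), 10))
```

This reproduces the ten pairs from the zip above without ever materializing the infinite sequence.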

Inside a generator


Inside the generator we can see that execution is paused after the yield, and state is maintained between calls to next.

In [117]:
def noisy_generator():
    print '  Initializing'
    print '  first yield'
    yield "A"
    print '  generator running...'
    print '  second yield'
    yield "B"
    print '  generator running...'
    print '  now what?'
In [118]:
g = noisy_generator()
print 'Calling next...'
print 'Result 1: %s' % g.next()
print 'Result 2: %s' % g.next()
print 'Result 3: %s' % g.next()
Calling next...
  Initializing
  first yield
Result 1: A
  generator running...
  second yield
Result 2: B
  generator running...
  now what?
---------------------------------------------------------------------------
StopIteration                             Traceback (most recent call last)
<ipython-input-118-a4dd887bc460> in <module>()
      3 print 'Result 1: %s' % g.next()
      4 print 'Result 2: %s' % g.next()
----> 5 print 'Result 3: %s' % g.next()

StopIteration: 
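The StopIteration above is not an error condition: the for statement and constructors like list() call next() internally and treat StopIteration as the normal end-of-sequence signal. A small sketch with a trimmed-down noisy_generator:

```python
def quiet_generator():
    # A trimmed-down noisy_generator: two yields, then falls off the end
    yield "A"
    yield "B"

# Iteration calls next() for us and swallows the final StopIteration
results = [value for value in quiet_generator()]
```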

Generator interface


In [181]:
help(g)
Help on generator object:

class generator(object)
 |  Methods defined here:
 |  
 |  __getattribute__(...)
 |      x.__getattribute__('name') <==> x.name
 |  
 |  __iter__(...)
 |      x.__iter__() <==> iter(x)
 |  
 |  __repr__(...)
 |      x.__repr__() <==> repr(x)
 |  
 |  close(...)
 |      close() -> raise GeneratorExit inside generator.
 |  
 |  next(...)
 |      x.next() -> the next value, or raise StopIteration
 |  
 |  send(...)
 |      send(arg) -> send 'arg' into generator,
 |      return next yielded value or raise StopIteration.
 |  
 |  throw(...)
 |      throw(typ[,val[,tb]]) -> raise exception in generator,
 |      return next yielded value or raise StopIteration.
 |  
 |  ----------------------------------------------------------------------
 |  Data descriptors defined here:
 |  
 |  gi_code
 |  
 |  gi_frame
 |  
 |  gi_running

Generator interface: throw


In [182]:
help(g.throw)
Help on built-in function throw:

throw(...)
    throw(typ[,val[,tb]]) -> raise exception in generator,
    return next yielded value or raise StopIteration.

Looks fun, let's try it...

In [183]:
def accepting_generator():
    print 'Initializing'
    try:
        yield 'Hello'
        print 'generator running...'
    except StandardError as e:
        print "Error Message: %s" % e
        yield 'World'

g = accepting_generator()
g.next()
Initializing
Out[183]:
'Hello'
In [184]:
g.throw(StandardError, 'Foo bar baz')
Error Message: Foo bar baz
Out[184]:
'World'
In [185]:
g.next()
---------------------------------------------------------------------------
StopIteration                             Traceback (most recent call last)
<ipython-input-185-d7e53364a9a7> in <module>()
----> 1 g.next()

StopIteration: 

Generator interface: close


In [447]:
help(g.close)
Help on built-in function close:

close(...)
    close() -> raise GeneratorExit inside generator.

In [448]:
def closeable():
    try:
        yield 1
        yield 2
    except GeneratorExit:
        print 'closing'
g = closeable()
g.next()
Out[448]:
1
In [449]:
g.close()
closing
In [450]:
g.next()
---------------------------------------------------------------------------
StopIteration                             Traceback (most recent call last)
<ipython-input-450-d7e53364a9a7> in <module>()
----> 1 g.next()

StopIteration: 
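close() pairs naturally with try/finally as well: PEP 342 made it legal to yield inside the try of a try/finally block, so a generator can guarantee cleanup. A hedged sketch (closeable_with_finally is a hypothetical name, written with the builtin next() so it runs under Python 2 and 3):

```python
cleaned_up = []

def closeable_with_finally():
    try:
        yield 1
        yield 2
    finally:
        # Runs whether the generator is exhausted, closed, or garbage collected
        cleaned_up.append('cleanup ran')

g = closeable_with_finally()
next(g)    # advance to the first yield
g.close()  # raises GeneratorExit at the paused yield; the finally block runs
```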

Generator interface: send


In [451]:
help(g.send)
Help on built-in function send:

send(...)
    send(arg) -> send 'arg' into generator,
    return next yielded value or raise StopIteration.

In [452]:
g = fib()
g.send('foo')
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-452-73713f4894f7> in <module>()
      1 g = fib()
----> 2 g.send('foo')

TypeError: can't send non-None value to a just-started generator

So next() is really just send(None).

Let's try that again...


In [453]:
g = fib()
g.send(None)
Out[453]:
1
In [454]:
g.send(None)
Out[454]:
1

What is send('foo')?

In [455]:
g.send('foo')
Out[455]:
2

Where did it go?

What is generator.send?


    Coroutines are a natural way of expressing many algorithms, such as
    simulations, games, asynchronous I/O, and other forms of event-
    driven programming or co-operative multitasking.  Python's generator
    functions are almost coroutines -- but not quite -- in that they
    allow pausing execution to produce a value, but do not provide for
    values or exceptions to be passed in when execution resumes.  They
    also do not allow execution to be paused within the "try" portion of
    try/finally blocks, and therefore make it difficult for an aborted
    coroutine to clean up after itself.

PEP 342 added close, send, and throw to the generator in version 2.5 of Python and made yield an expression rather than a statement.

In [143]:
def adder(val):
    x = 0
    while True:
        x = yield val+x
        
g = adder(5)
g.send(None)
Out[143]:
5
In [144]:
g.send(2)
Out[144]:
7
In [145]:
g.send(6)
        
Out[145]:
11
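The same yield-expression pattern supports stateful accumulators. A sketch of a hypothetical running_average coroutine, primed with send(None) exactly as above:

```python
def running_average():
    # Each send() returns the mean of everything received so far
    total, count = 0.0, 0
    average = None
    while True:
        value = yield average
        total += value
        count += 1
        average = total / count

avg = running_average()
avg.send(None)         # prime: run up to the first yield
first = avg.send(10)   # mean of [10]
second = avg.send(20)  # mean of [10, 20]
```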

Inside a generator part 2

In [146]:
def noisy_coroutine():
    print '  Initializing'
    print '  first yield'
    received = yield "A"
    print '  generator running after receiving: %s' % received
    print '  second yield'
    received = yield "B"
    print '  generator running after receiving: %s' % received
    print '  now what?'
In [147]:
g = noisy_coroutine()
print 'Calling send...'
g.send(None)
Calling send...
  Initializing
  first yield
Out[147]:
'A'
In [148]:
g.send(1)
  generator running after receiving: 1
  second yield
Out[148]:
'B'
In [149]:
g.send(2)
  generator running after receiving: 2
  now what?
---------------------------------------------------------------------------
StopIteration                             Traceback (most recent call last)
<ipython-input-149-7a89fd35d27c> in <module>()
----> 1 g.send(2)

StopIteration: 
In [150]:
def foo():
    x = yield
    y = yield
    z = yield
    yield x, y, z
    
g = foo()
g.next()
g.send(1)
g.send(2)
print g.send(3)
(1, 2, 3)

Calling next to start the generator is a pain


Let's fix that with a decorator...

In [81]:
def consumer(func):
    def wrapper(*args,**kw):
        gen = func(*args, **kw)
        gen.next()
        return gen
    wrapper.__name__ = func.__name__
    wrapper.__doc__  = func.__doc__
    return wrapper
In [82]:
g = consumer(adder)(4)
print g.send(11)
print g.send(2)
15
6
In [83]:
@consumer
def subtractor(val):
    '''A generator that subtracts numbers from a value'''
    x = 0
    while True:
        x = yield x - val

g = subtractor(8)
g.send(15)
Out[83]:
7
In [84]:
help(subtractor)
Help on function subtractor in module __main__:

subtractor(*args, **kw)
    A generator that subtracts numbers from a value
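Copying __name__ and __doc__ by hand is exactly what functools.wraps automates. An equivalent sketch of the consumer decorator (echo is a hypothetical example coroutine; next() is the builtin, so this runs under Python 2 and 3):

```python
import functools

def consumer(func):
    @functools.wraps(func)  # copies __name__, __doc__, __module__, ...
    def wrapper(*args, **kw):
        gen = func(*args, **kw)
        next(gen)  # advance past the first yield so send() works immediately
        return gen
    return wrapper

@consumer
def echo():
    """Yields back whatever is sent in."""
    received = None
    while True:
        received = yield received

g = echo()
```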

Let's try to do something interesting


So far we have looked at simple examples, but we can use generators for:

  • Iteration
  • Data flow
  • Concurrency
  • Reformulate control flow

Iteration (a la David Beazley)


An example of setting up a processing pipeline with generators

In [418]:
# From http://www.dabeaz.com/coroutines/pipeline.py
def grep(pattern,lines):
    for line in lines:
        if pattern in line:
             yield line

import time
def follow(thefile):
    thefile.seek(0,2)      # Go to the end of the file
    while True:
         line = thefile.readline()
         if not line:
             time.sleep(0.1)    # Sleep briefly
             continue
         yield line

# Set up a processing pipe : tail -f | grep python
with open("access-log") as logfile:
    loglines = follow(logfile)
    pylines  = grep("python",loglines)

    # Pull results out of the processing pipeline
    for line in pylines:
        print line,
217.237.150.206 - - [02/Apr/2015:00:47:42 -0600] "GET /python/tutorial/beazley_advanced_python/Slides/SLIDE060.HTM HTTP/1.0" 200 1686
129.106.32.126 - - [02/Apr/2015:00:47:43 -0600] "GET /python/tutorial/beazley_advanced_python/Slides/SLIDE006.HTM HTTP/1.0" 200 1254
66.91.239.214 - - [02/Apr/2015:00:47:44 -0600] "GET /python/tutorial/beazley_advanced_python/Slides/SLIDE014.HTM HTTP/1.0" 200 1232
217.219.18.80 - - [02/Apr/2015:00:47:46 -0600] "GET /python/tutorial/beazley_advanced_python/Slides/SLIDE090.HTM HTTP/1.0" 200 2001
---------------------------------------------------------------------------
KeyboardInterrupt                         Traceback (most recent call last)
<ipython-input-418-8786720a03f2> in <module>()
     21 
     22     # Pull results out of the processing pipeline
---> 23     for line in pylines:
     24         print line,

<ipython-input-418-8786720a03f2> in grep(pattern, lines)
      1 # From http://www.dabeaz.com/coroutines/pipeline.py
      2 def grep(pattern,lines):
----> 3     for line in lines:
      4         if pattern in line:
      5              yield line

<ipython-input-418-8786720a03f2> in follow(thefile)
     11          line = thefile.readline()
     12          if not line:
---> 13              time.sleep(0.1)    # Sleep briefly
     14              continue
     15          yield line

KeyboardInterrupt: 
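The same pull pipeline works over any iterable, not just a live file. A self-contained sketch over a hypothetical in-memory log, so it terminates instead of tailing forever:

```python
def grep(pattern, lines):
    # Pull-style filter, identical in shape to the one above
    for line in lines:
        if pattern in line:
            yield line

# Hypothetical log lines standing in for the live "access-log" file
log_lines = [
    'GET /index.html HTTP/1.0 200',
    'GET /python/tutorial.html HTTP/1.0 200',
    'GET /favicon.ico HTTP/1.0 404',
    'GET /python/slides.html HTTP/1.0 200',
]

matches = list(grep('python', log_lines))
```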

Data Flow (a la David Beazley)


An example of setting up a similar pipeline with coroutines

In [423]:
import time
def follow(thefile, target):
    thefile.seek(0,2)      # Go to the end of the file
    while True:
         line = thefile.readline()
         if not line:
             time.sleep(0.1)    # Sleep briefly
             continue
         target.send(line)

@consumer
def grep(pattern, target):
    print "Looking for %s" % pattern
    while True:
        line = (yield)
        if pattern in line:
            target.send(line),

@consumer
def printer():
    while True:
         line = (yield)
         print line,

with open("access-log") as logfile:
    follow(logfile, grep('python',printer() ) )
Looking for python
66.249.65.37 - - [02/Apr/2015:00:51:57 -0600] "GET /papers/python.html HTTP/1.1" 404 133
128.135.125.245 - - [02/Apr/2015:00:51:58 -0600] "GET /python/tutorial/beazley_intro_python/intropy.pdf HTTP/1.0" 304 -
194.105.57.11 - - [02/Apr/2015:00:52:01 -0600] "GET /python/tutorial/beazley_advanced_python/Slides/SLIDE002.HTM HTTP/1.0" 200 1352
189.141.19.88 - - [02/Apr/2015:00:52:03 -0600] "GET /python/tutorial/beazley_advanced_python/Slides/SLIDE096.HTM HTTP/1.0" 200 1671
123.190.193.8 - - [02/Apr/2015:00:52:03 -0600] "GET /python/tutorial/beazley_advanced_python/Slides/SLIDE059.HTM HTTP/1.0" 200 1694
---------------------------------------------------------------------------
KeyboardInterrupt                         Traceback (most recent call last)
<ipython-input-423-0ade070e4e58> in <module>()
     24 
     25 with open("access-log") as logfile:
---> 26     follow(logfile,grep('python',printer() ) )

<ipython-input-423-0ade070e4e58> in follow(thefile, target)
      5          line = thefile.readline()
      6          if not line:
----> 7              time.sleep(0.1)    # Sleep briefly
      8              continue
      9          target.send(line)

KeyboardInterrupt: 
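For comparison, here is the push-based flow driven from a plain list instead of a live file, with a hypothetical collector coroutine standing in for printer so the results can be inspected:

```python
def consumer(func):
    # Same priming decorator as defined earlier in the talk
    def wrapper(*args, **kw):
        gen = func(*args, **kw)
        next(gen)
        return gen
    return wrapper

@consumer
def grep(pattern, target):
    # Push-style filter: lines arrive via send(), matches are forwarded
    while True:
        line = (yield)
        if pattern in line:
            target.send(line)

@consumer
def collector(sink):
    # Terminal coroutine: appends everything it receives to sink
    while True:
        sink.append((yield))

received = []
pipeline = grep('python', collector(received))
for line in ['a python line', 'no match here', 'more python']:
    pipeline.send(line)
```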

Are generators fast for data flow?


Let's start with Beazley's benchmark example

In [5]:
# An object
class GrepHandler(object):
    def __init__(self,pattern, target):
        self.pattern = pattern
        self.target = target
    def send(self,line):
        if self.pattern in line:
            self.target.send(line)

# A coroutine
@consumer
def grep(pattern,target):
    while True:
        line = (yield)
        if pattern in line:
            target.send(line)

# A null-sink to send data
@consumer
def null(): 
    while True: item = (yield)

# A benchmark
line = 'python is nice'
p1   = grep('python',null())          # Coroutine
p2   = GrepHandler('python',null())   # Object

from timeit import timeit

print "coroutine:", timeit("p1.send(line)", "from __main__ import line, p1")

print "object:", timeit("p2.send(line)", "from __main__ import line, p2")
coroutine: 0.326890945435
object: 0.381680011749
In [114]:
%timeit p1.send(line)
The slowest run took 9.04 times longer than the fastest. This could mean that an intermediate result is being cached 
1000000 loops, best of 3: 317 ns per loop

Beware: the IPython magic %timeit does not play nice with generator send calls!

Let's try a problem that calls send a few more times...


The idea for this benchmark comes from the Stack Overflow question a-faster-nested-tuple-to-list-and-back:

I'm trying to perform tuple to list and list to tuple conversions on nested sequences of unknown depth and shape. The calls are being made hundreds of thousands of times, which is why I'm trying to squeeze out as much speed as possible.

First, let's define a function to make some test data...

In [153]:
def make_test(m, n):
  return [[range(m), make_test(m, n-1)] for i in range(n)]
make_test(2,3)
Out[153]:
[[[0, 1], [[[0, 1], [[[0, 1], []]]], [[0, 1], [[[0, 1], []]]]]],
 [[0, 1], [[[0, 1], [[[0, 1], []]]], [[0, 1], [[[0, 1], []]]]]],
 [[0, 1], [[[0, 1], [[[0, 1], []]]], [[0, 1], [[[0, 1], []]]]]]]
In [154]:
def list2tuple(a):
  return tuple((list2tuple(x) if isinstance(x, list) else x for x in a))

def tuple2list(a):
  return list((tuple2list(x) if isinstance(x, tuple) else x for x in a))

list2tuple(make_test(2,3))
Out[154]:
(((0, 1), (((0, 1), (((0, 1), ()),)), ((0, 1), (((0, 1), ()),)))),
 ((0, 1), (((0, 1), (((0, 1), ()),)), ((0, 1), (((0, 1), ()),)))),
 ((0, 1), (((0, 1), (((0, 1), ()),)), ((0, 1), (((0, 1), ()),)))))
In [155]:
make_test(2,3) == tuple2list(list2tuple(make_test(2,3)))
Out[155]:
True

Now let's try a solution using coroutines


In [156]:
@consumer
def colist2tuple():
  """Convertes lists to tuples"""
  result = []
  while True:
    lst, l2t = yield result
    result = tuple((l2t.send((x, l2t)) if isinstance(x,list) else x for x in lst) )
    
l2t = colist2tuple()
l2t.send((make_test(2,3),l2t))
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-156-3030118d1c72> in <module>()
      8 
      9 l2t = colist2tuple()
---> 10 l2t.send((make_test(2,3),l2t))

<ipython-input-156-3030118d1c72> in colist2tuple()
      5   while True:
      6     lst, l2t = yield result
----> 7     result = tuple((l2t.send((x, l2t)) if isinstance(x,list) else x for x in lst) )
      8 
      9 l2t = colist2tuple()

<ipython-input-156-3030118d1c72> in <genexpr>((x,))
      5   while True:
      6     lst, l2t = yield result
----> 7     result = tuple((l2t.send((x, l2t)) if isinstance(x,list) else x for x in lst) )
      8 
      9 l2t = colist2tuple()

ValueError: generator already executing

Generators are not recursive...
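The ValueError above can be reproduced minimally: a generator may not be re-entered via send() while its frame is already running, which is what rules out direct recursion. A sketch:

```python
def reentrant():
    while True:
        # While this frame is running, calling g.send() on the same
        # generator object is illegal and raises ValueError
        g.send((yield))

g = reentrant()
next(g)  # advance to the yield
try:
    g.send('boom')
    error = None
except ValueError as e:
    error = str(e)
```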

Try a worker pool of generators


In [157]:
@consumer
def colist2tuple():
  """Convertes lists to tuples"""
  result = []
  while True:
    (lst, pool) = yield result
    result = tuple((pool[0].send((x, pool[1:])) if isinstance(x,list) else x for x in lst) )

@consumer
def cotuple2list():
  """converts tuples to lists"""
  result = None
  while True:
    (tup, pool) = (yield result)
    result = list((pool[0].send((x, pool[1:])) if isinstance(x, tuple) else x for x in tup))
    
class GenPool:
    def __init__(self, gen_func, depth):
        self.pool = [gen_func() for i in xrange(depth) ]
    def send(self,iterable):
        return self.pool[0].send((iterable, self.pool[1:]))
        
l2t_pool = GenPool(colist2tuple,5)
t2l_pool = GenPool(cotuple2list,5)

make_test(3,2) == t2l_pool.send(l2t_pool.send(make_test(3,2)))
Out[157]:
True

How about doing it in place for the lists?


In [158]:
def inplace_list2tuple(a):
  for (i,x) in enumerate(a):
    if isinstance(x,list):
      a[i] = inplace_list2tuple(x)
  return tuple(a)
  
def inplace_tuple2list(a):
  a = list(a) # can't really modify a tuple in place...
  for (i,x) in enumerate(a):
    if isinstance(x,tuple):
      a[i] = inplace_tuple2list(x)
  return a
    
make_test(2,3) ==  inplace_tuple2list(inplace_list2tuple(make_test(2,3)))
Out[158]:
True
In [166]:
@consumer
def inplace_colist2tuple():
  """Convertes lists to tuples"""
  result = None
  while True:
    (lst, co_pool) = (yield result)
    for (i,x) in enumerate(lst):
      if isinstance(x,list):
        lst[i] = co_pool[0].send((x, co_pool[1:]))
    result = tuple(lst)
    
@consumer
def inplace_cotuple2list():
  """converts tuples to lists"""
  result = None
  while True:
    (tup, co_pool) = (yield result)
    result = list(tup)
    for (i,x) in enumerate(result):
      if isinstance(x,tuple):
        result[i] = co_pool[0].send((x, co_pool[1:]))
        
l2t_pool = GenPool(inplace_colist2tuple,5)
t2l_pool = GenPool(inplace_cotuple2list,5)

make_test(3,2) == t2l_pool.send(l2t_pool.send(make_test(3,2)))
Out[166]:
True

Let's timeit!


In [168]:
import timeit
breadth,depth = 25,8
number,repeat = 2,3
l2t_pool = GenPool(colist2tuple,breadth)
t2l_pool = GenPool(cotuple2list,breadth)
inplace_l2t_pool = GenPool(inplace_colist2tuple,breadth)
inplace_t2l_pool = GenPool(inplace_cotuple2list,breadth) 
In [40]:
# Compare round trip operations
print "Generator %s" % ["%0.5f" % (v/number) for v in timeit.repeat('t2l_pool.send(l2t_pool.send(test_data))', setup='from __main__ import t2l_pool, l2t_pool, make_test, depth, breadth; test_data = make_test(breadth, depth)', number=number, repeat=repeat)]
print "Recursive %s" % ["%0.5f" % (v/number) for v in timeit.repeat('tuple2list(list2tuple(test_data))', setup='from __main__ import tuple2list, list2tuple, make_test, depth, breadth; test_data = make_test(breadth, depth)', number=number, repeat=repeat)]
Generator ['3.18855', '3.21685', '2.94067']
Recursive ['2.46753', '2.82640', '3.27728']

Let's timeit!


In [41]:
# Compare round trip operations using inplace for list to tuple
print "Generator %s" % ["%0.5f" % (v/number) for v in timeit.repeat('inplace_t2l_pool.send(inplace_l2t_pool.send(test_data))', setup='from __main__ import inplace_l2t_pool, inplace_t2l_pool, make_test, depth, breadth; test_data = make_test(breadth, depth)', number=number, repeat=repeat)]
print "Recursive %s" % ["%0.5f" % (v/number) for v in timeit.repeat('inplace_tuple2list(inplace_list2tuple(test_data))', setup='from __main__ import inplace_tuple2list, inplace_list2tuple, make_test, depth, breadth; test_data = make_test(breadth, depth)', number=number, repeat=repeat)]
Generator ['1.54234', '1.54228', '1.51895']
Recursive ['1.38003', '1.37990', '1.37279']
In [42]:
# Compare just list2tuple using inplace
print "Generator %s" % ["%0.5f" % (v/number) for v in timeit.repeat('inplace_l2t_pool.send(test_data)', setup='from __main__ import inplace_l2t_pool, make_test, depth, breadth; test_data = make_test(breadth, depth)', number=number, repeat=repeat)]
print "Recursive %s" % ["%0.5f" % (v/number) for v in timeit.repeat('inplace_list2tuple(test_data)', setup='from __main__ import inplace_list2tuple, make_test, depth, breadth; test_data = make_test(breadth, depth)', number=number, repeat=repeat)]
Generator ['0.51324', '0.51948', '0.50737']
Recursive ['0.44689', '0.45245', '0.44493']

Generators are not magic: a function call is still a function call, and the GIL still serializes the process.

In [146]:
test_data = make_test(breadth,depth)
# Coroutine
%timeit t2l_pool.send(l2t_pool.send(test_data))
# Recursive
%timeit tuple2list(list2tuple(test_data))
1 loops, best of 3: 305 ms per loop
1 loops, best of 3: 271 ms per loop
In [35]:
inplace_l2t_pool = GenPool(inplace_colist2tuple,breadth)
inplace_t2l_pool = GenPool(inplace_cotuple2list,breadth) 
test_data = make_test(breadth,depth)
%timeit inplace_t2l_pool.send(inplace_l2t_pool.send(test_data))
test_data = make_test(breadth,depth)
%timeit inplace_tuple2list(inplace_list2tuple(test_data))
100 loops, best of 3: 15.3 ms per loop
100 loops, best of 3: 13.5 ms per loop
In [156]:
# Generator
test_data = make_test(breadth,depth)
%timeit inplace_l2t_pool.send(test_data)
# Recursive
test_data = make_test(breadth,depth)
%timeit inplace_list2tuple(test_data)
The slowest run took 51639.38 times longer than the fastest. This could mean that an intermediate result is being cached 
100000 loops, best of 3: 2.63 µs per loop
The slowest run took 56308.31 times longer than the fastest. This could mean that an intermediate result is being cached 
100000 loops, best of 3: 2.17 µs per loop

Let's try HYRY's Cython


In [43]:
import cython
cython_list2tuple, cython_tuple2list = cython.inline(
    """
    @cython.profile(True)
    def cython_list2tuple(a):
        return tuple([cython_list2tuple(x) if type(x)==list else x for x in a])

    @cython.profile(True)
    def cython_tuple2list(a):
        return [cython_tuple2list(x) if type(x)==tuple else x for x in a]
    """
    ).values() # it returns a dict of named functions

make_test(3,2) == cython_tuple2list(cython_list2tuple(make_test(3,2)))
Out[43]:
True
In [44]:
print "Cython:  %s" % ["%0.5f" % (v/number) for v in timeit.repeat('cython_tuple2list(cython_list2tuple(t))', setup='from __main__ import cython_list2tuple, cython_tuple2list, make_test, depth, breadth; t = make_test(breadth, depth)', number=number, repeat=repeat)]
Cython:  ['0.40505', '0.38909', '0.40008']

As expected, Cython is blazing fast

Let's try cProfile


In [163]:
import cProfile
In [164]:
test_data = make_test(breadth, depth)
cProfile.run('tuple2list(list2tuple(test_data));')
         13590406 function calls (6137622 primitive calls) in 5.433 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
 328801/1    0.942    0.000    2.520    2.520 <ipython-input-154-19d1758e8e98>:1(list2tuple)
3397601/9    1.042    0.000    2.520    0.280 <ipython-input-154-19d1758e8e98>:2(<genexpr>)
 328801/1    1.159    0.000    2.853    2.853 <ipython-input-154-19d1758e8e98>:4(tuple2list)
3397601/9    1.121    0.000    2.853    0.317 <ipython-input-154-19d1758e8e98>:5(<genexpr>)
        1    0.060    0.060    5.433    5.433 <string>:1(<module>)
  6137600    1.109    0.000    1.109    0.000 {isinstance}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}


In [165]:
test_data = make_test(breadth, depth)
cProfile.run('inplace_tuple2list(inplace_list2tuple(test_data));')
         6795204 function calls (6137604 primitive calls) in 3.892 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
 328801/1    1.019    0.000    1.614    1.614 <ipython-input-158-935485dfecde>:1(inplace_list2tuple)
 328801/1    1.684    0.000    2.228    2.228 <ipython-input-158-935485dfecde>:7(inplace_tuple2list)
        1    0.050    0.050    3.892    3.892 <string>:1(<module>)
  6137600    1.140    0.000    1.140    0.000 {isinstance}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}


Let's try cProfile


In [49]:
test_data = make_test(breadth, depth)
cProfile.run('cython_tuple2list(cython_list2tuple(test_data));')
         657604 function calls (4 primitive calls) in 1.422 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.053    0.053    1.422    1.422 <string>:1(<module>)
 328801/1    0.689    0.000    0.689    0.689 _cython_inline_2b6dbefcfcbecd965f1c99dd441375e0.pyx:12(cython_tuple2list)
 328801/1    0.680    0.000    0.680    0.680 _cython_inline_2b6dbefcfcbecd965f1c99dd441375e0.pyx:8(cython_list2tuple)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}


Let's try cProfile


In [169]:
test_data = make_test(breadth, depth)
cProfile.run('t2l_pool.send(l2t_pool.send(test_data))')
         14248010 function calls (6137626 primitive calls) in 5.771 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
 328801/1    0.804    0.000    2.583    2.583 <ipython-input-157-d591fe3bcabe>:1(colist2tuple)
3397601/9    1.090    0.000    3.188    0.354 <ipython-input-157-d591fe3bcabe>:15(<genexpr>)
        2    0.000    0.000    5.771    2.886 <ipython-input-157-d591fe3bcabe>:20(send)
3397601/9    1.147    0.000    2.583    0.287 <ipython-input-157-d591fe3bcabe>:7(<genexpr>)
 328801/1    1.505    0.000    3.188    3.188 <ipython-input-157-d591fe3bcabe>:9(cotuple2list)
        1    0.000    0.000    5.771    5.771 <string>:1(<module>)
  6137600    1.098    0.000    1.098    0.000 {isinstance}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
 657602/2    0.128    0.000    5.771    2.886 {method 'send' of 'generator' objects}


In [171]:
test_data = make_test(breadth, depth)
cProfile.run('inplace_t2l_pool.send(inplace_l2t_pool.send(test_data))')
         7452808 function calls (6137608 primitive calls) in 3.864 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        2    0.000    0.000    3.864    1.932 <ipython-input-157-d591fe3bcabe>:20(send)
 328801/1    1.060    0.000    1.665    1.665 <ipython-input-166-ce8d713e0c92>:1(inplace_colist2tuple)
 328801/1    1.594    0.000    2.199    2.199 <ipython-input-166-ce8d713e0c92>:12(inplace_cotuple2list)
        1    0.000    0.000    3.864    3.864 <string>:1(<module>)
  6137600    1.102    0.000    1.102    0.000 {isinstance}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
 657602/2    0.108    0.000    3.864    1.932 {method 'send' of 'generator' objects}


Let's play with dot product


In [174]:
import numpy
import itertools
# make some data
n = 1000000
a = numpy.random.randn(n)
b = numpy.random.randn(n)
number,repeat = 2,3
In [175]:
print "Numpy Dot Product" 
print "Value %s" % numpy.dot(a,b)
print "Time %s" % ["%0.5f" % (v/number) for v in timeit.repeat('numpy.dot(a,b)', setup='from __main__ import numpy, a, b', number=number, repeat=repeat)]
Numpy Dot Product
Value 731.284297837
Time ['0.00087', '0.00082', '0.00081']
In [177]:
def naive_loop(it):
    result = numpy.float64(0.0)
    for (a_val,b_val) in it:
        result += a_val*b_val
    return result
print "Naive Dot Product"
print "Value: %s" % naive_loop(numpy.nditer([a, b]))
print "Time: %s" % ["%0.5f" % (v/number) for v in timeit.repeat('naive_loop(it)', setup='from __main__ import numpy, naive_loop, a, b; it = numpy.nditer([a, b])', number=number, repeat=repeat)]
Naive Dot Product
Value: 731.284297837
Time: ['0.32970', '0.33121', '0.33032']

Coroutine Dot Product


In [176]:
@consumer
def mult(target=None):
    result = None
    while True:
        (a, b) = (yield result)
        result = target.send(a*b)
        
@consumer
def add():
    result = numpy.float64(0.0)
    while True:
        m = (yield result)
        result += m
        
dot_product_process = mult(add())
 
def gen_loop(it):
    dot_product = None
    for a_val,b_val in it: 
        dot_product = dot_product_process.send((a_val, b_val))
    return dot_product

print "Coroutinte Dot Product"
print "Value: %s" % gen_loop(numpy.nditer([a, b]))
print "Time %s" % ["%0.5f" % (v/number) for v in timeit.repeat('gen_loop(it)', setup='from __main__ import numpy, gen_loop, a, b; it = numpy.nditer([a, b])', number=number, repeat=repeat)]
Coroutine Dot Product
Value: 731.284297837
Time ['0.49758', '0.49486', '0.50892']
In [187]:
# Coroutine processes without using return...
@consumer
def mult_allocated_result(target):
    while True:
        (a, b) = (yield)
        result = target.send(a*b)
                
@consumer
def add_allocated_result(result):
    while True:
        m = (yield)
        result[0] += m
 
result = numpy.zeros((1,))
dot_product_process_noresult = mult_allocated_result(add_allocated_result(result))
  
def allocated_result_loop(it):
    dot_product = None
    for a_val,b_val in it: 
        dot_product_process_noresult.send((a_val, b_val))

print "Coroutine Dot Product:" 
allocated_result_loop(numpy.nditer([a, b]))
print "Value: %s" % result
print "Time %s" % ["%0.5f" % (v/number) for v in timeit.repeat('allocated_result_loop(it)', setup='from __main__ import numpy, allocated_result_loop, a, b; it = numpy.nditer([a, b])', number=number, repeat=repeat)]
Coroutine Dot Product:
Value: [ 731.28429784]
Time ['0.59082', '0.57764', '0.61219']

Try cProfile


In [77]:
n = 100000
a = numpy.random.randn(n)
b = numpy.random.randn(n)
In [78]:
cProfile.run('gen_loop(numpy.nditer([a, b]))')
         400003 function calls (300003 primitive calls) in 0.205 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   100000    0.091    0.000    0.138    0.000 <ipython-input-65-4e20301a8ce3>:1(mult)
        1    0.046    0.046    0.205    0.205 <ipython-input-65-4e20301a8ce3>:17(gen_loop)
   100000    0.026    0.000    0.026    0.000 <ipython-input-65-4e20301a8ce3>:8(add)
        1    0.000    0.000    0.205    0.205 <string>:1(<module>)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
200000/100000    0.042    0.000    0.159    0.000 {method 'send' of 'generator' objects}


In [79]:
cProfile.run('naive_loop(numpy.nditer([a, b]))')
         3 function calls in 19.887 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1   19.887   19.887   19.887   19.887 <ipython-input-66-21b9c16a1a7f>:1(naive_loop)
        1    0.000    0.000   19.887   19.887 <string>:1(<module>)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}


Unfortunately, we don't get a lot of insight into why the naive loop is so slow.

More Data Flow


In [95]:
TARGETS = 'targets'
class AnalysisWindowComplete(Exception):
    """Used in coroutine control flow, this is not an error condition."""
 
@consumer
def myprinter(name):
    while True:      
        p = (yield) 
        print "  PrinterName: %s; says: %s" % (name, p)
        
In [104]:
@consumer
def average(targets):
    try:
        while True:
            cnt, result = 0, 0.0   
            try:
                
                while True:      
                    result += (yield) 
                    cnt += 1
        
            except AnalysisWindowComplete as wc:
                print '  In complete with:', wc
                for target in targets:
                    target.send(result/cnt)
            
    except (ValueError, IndexError) as e:
        raise
        

Take the average of a few values


In [105]:
avg_co = average([myprinter('#1'),] )
 
avg_co.send(5.)
avg_co.send(5.)
avg_co.send(5.)
avg_co.send(4.)
print "Call Complete"
avg_co.throw(AnalysisWindowComplete,'foobar')
 
Call Complete
  In complete with: foobar
  printer #1 says: 4.75
In [106]:
def get_targets(co):
    '''Get the targets from the generators frame locals'''
    try:
        return co.gi_frame.f_locals[TARGETS]
    except KeyError as ke:
        raise KeyError('No target key found?')
                  
def set_targets(co, targets):
    """Set new targets after the generator has started!"""
    t = get_targets(co)
    while len(t) > 0:
        t.pop()
    
    for target in targets:
        t.append(target)
         
In [108]:
            
# Now continue using the same coroutine workflow for next analysis
avg_co.send(6)
avg_co.send(6)
avg_co.send(7)
 
# change where things go...
set_targets(avg_co,(myprinter('#2'),myprinter('#3')) )
 
avg_co.throw(AnalysisWindowComplete,'Print twice!')
  In complete with: Print twice!
  printer #2 says: 6.33333333333
  printer #3 says: 6.33333333333

Data Processing Chains


Generators can be used to build some really powerful data processing flows:

  • The lines before the first yield are valuable
  • Reflection and reasoning on the processing chain is possible
  • Pass by reference and allocated numpy types are your friend
  • Combine with MPI4Py to make parallel applications
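The first bullet deserves a demonstration: everything before the first yield runs at priming time, which makes it a natural home for setup work. A small sketch with a hypothetical stage coroutine:

```python
events = []

def stage(name):
    # Runs once, at priming time: a natural place to open files,
    # allocate buffers, or log that the stage is ready
    events.append('%s initialized' % name)
    while True:
        item = yield
        events.append('%s got %s' % (name, item))

s = stage('parser')
next(s)          # priming executes the setup lines above the first yield
s.send('data')   # subsequent sends run only the loop body
```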

Pitfalls

  • Code is generally not pretty
  • Beware of tight loops
  • Count function calls and object initialization

Further reading: David M. Beazley

With lots more pictures than this talk...

Generator Tricks for Systems Programmers

http://www.dabeaz.com/generators-uk/

A Curious Course on Coroutines and Concurrency

http://www.dabeaz.com/coroutines/

Generators: The Final Frontier

http://www.dabeaz.com/finalgenerator/

Combining Context Managers, Decorators and Generators in Python 3.0 for control flow of inline thread execution.

In [ ]: