Generators: the Final Frontier

All tutorials in the series can be found here:

  1. http://www.dabeaz.com/generators
  2. http://www.dabeaz.com/coroutines
  3. http://www.dabeaz.com/finalgenerator

The bulk of the generator functionality has been around since Python 2.5:

  • close()
  • send()
  • next()

Since Python 3.3:

  • yield and return (with a value) within the same function are both allowed. Prior to this, including both within the same function was considered a syntax error.
  • Using return within a generator has "interesting" behaviour. It raises a StopIteration exception carrying the value of the return. I.e. one can retrieve the return value from a generator as such:

    def my_generator():
        yield 'foo'
        yield 'bar'
        return 0

    g = my_generator()
    next(g)  # 'foo'
    next(g)  # 'bar'
    try:
        next(g)
    except StopIteration as e:
        e.value  # 0

yield from [generator] is available in Python 3.3 (PEP 380) and it does the iteration for you (and a whole lot more)
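A minimal sketch of that delegation: yield from iterates the sub-generator for you and also captures its return value.

```python
# 'yield from' delegates to a sub-generator (PEP 380, Python 3.3+)
def inner():
    yield 1
    yield 2
    return 'done'

def outer():
    result = yield from inner()   # iterates inner(), captures its return value
    yield result

print(list(outer()))   # [1, 2, 'done']
```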

"Poutine code" == "It seemed like a good idea at that time"

John Perry Barlow keynote

Grateful Dead lyricist. Contrasted the concept of physical property with IP. Adam Smith got the scarcity/value relationship right w.r.t. physical items. However, songs become more valuable the more people know them. The best song in the world inside my head is worthless. "Had an aversion to authority... LSD tends to do that to you". Talks almost every day w/ Edward Snowden, but has also consulted with the NSA and CIA for 20+ years. When asked whether he had given up on regulation: "If I didn't believe in the power of regulation, I wouldn't be sending out armies of EFF lawyers every day".

Was challenged as being "contradictory" by a woman who claimed he wanted "less government". Responded by saying that he believed in government for universal health care ("if you lose a leg and have to go to the ER...ated"), but that he believed local government was more direct and responsive. "...hoping that there will be a City-State Renaissance that is as big as The Renaissance".

Character Encoding

  • Know your encoding
  • Use the unicode sandwich
  • Test your application with different character encodings

chardet

mixed encoding / mojibake

python-ftfy package fixes things.

Error-handler behaviour for str.encode():

  • .encode('cp1252', 'strict') raises an exception
  • .encode('cp1252', 'ignore') # drops missing characters
  • .encode('cp1252', 'replace') # replaces them (duh!)
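A quick sketch of the three handlers; the snowman here is just an arbitrary character that cp1252 cannot encode.

```python
s = 'caf\xe9 \u2603'   # 'café ☃' -- U+2603 SNOWMAN is not encodable in cp1252

try:
    s.encode('cp1252', 'strict')       # raises UnicodeEncodeError
except UnicodeEncodeError:
    pass

s.encode('cp1252', 'ignore')     # b'caf\xe9 '  -- snowman silently dropped
s.encode('cp1252', 'replace')    # b'caf\xe9 ?' -- snowman replaced with b'?'
```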

Byte order marks: e.g. UTF-16 puts a marker at the beginning of the byte stream to signal big- vs. little-endian.
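A small round-trip illustration of the BOM:

```python
data = 'hi'.encode('utf-16')
# The first two bytes are the BOM, marking the byte order:
# b'\xff\xfe' (little-endian) or b'\xfe\xff' (big-endian)
bom = data[:2]
# Decoding with 'utf-16' consumes the BOM and picks the right byte order
text = data.decode('utf-16')
```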

Decorators

http://bit.ly/dec-pycon-2014

Uses

  • setup/teardown
  • diagnostics

Graham will be talking later today... first track, last talk

Good decorators are versatile and can be applied to any function:

  • the inner function signature should use *args, **kwargs
  • make sure to fix func.__name__ ... inner.__name__ = wrapped.__name__, likewise __doc__ etc.
  • the arg spec... though, that's much more complicated... see Graham Dumpleton's "wrapt" package

http://bit.ly/decorators2014

decorators with arguments

@unittest.skipIf(), simplified (the slide was truncated; the body here is completed for illustration — the real skipIf raises unittest.SkipTest with the message):

def skipIf(conditional, message):
    def dec(wrapped):
        def inner(*args, **kwargs):
            if not conditional:
                return wrapped(*args, **kwargs)
            # else: skip the test
        return inner
    return dec

More examples:

  • counting how many times a function has been called
  • use start time and end time to time a function call
  • combined: time elapsed
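A sketch combining all three examples (the names here are mine, not from the talk):

```python
import functools
import time

def timed(wrapped):
    """Count calls and accumulate elapsed time on the wrapper itself."""
    @functools.wraps(wrapped)
    def inner(*args, **kwargs):
        inner.calls += 1
        start = time.perf_counter()
        try:
            return wrapped(*args, **kwargs)
        finally:
            inner.elapsed += time.perf_counter() - start
    inner.calls = 0
    inner.elapsed = 0.0
    return inner

@timed
def work():
    return sum(range(1000))

work(); work()
# work.calls == 2; work.elapsed holds the total time spent inside work()
```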

Porting your Application to Python3

  1. Straddling Python 2/3 in single codebase
  2. Choosing target Python versions
    1. Add test coverage to reduce risk
  3. Cover C extension
  4. Porting hygiene

Based on porting a 180 kLOC Python, ~25 kLOC C codebase

  • Can't really port once and abandon... not feasible
  • Use 2to3 as a useful starting point... but it's painfully slow on large codebases
  • Straddling is best

    • use "compatible subset" of python syntax
    • conditional imports mask stdlib changes
    • six module can help (but you might not need it, and it's too heavy for the speaker)
  • Syntax changes < 2.6 are hard

    • no b'' literal
    • No except Exception as e: before 2.6; in Python 3 there is no except Exception, e
  • Much more cruft / pain
  • 2.4 / 2.5 are long past EOL
  • Incompatibilities make Python3 < 3.2 hard
    • PEP 3333 fixes WSGI in Py3k
    • callable() restored in 3.2
  • 3.3 restores u'' literals
  • 3.2 is "system Python3" on some LTS systems
    • Ubuntu/Debian have just jumped to 3.4 for their LTS

Summary: 2.6, 2.7, 3.2+, maybe drop 2.6
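The "conditional imports" trick from above, sketched with one of the renamed stdlib modules:

```python
# Straddling Python 2/3: a conditional import masks the stdlib rename
try:
    import configparser                   # Python 3 name
except ImportError:
    import ConfigParser as configparser   # Python 2 name

# The rest of the codebase uses the one name regardless of version
parser = configparser.ConfigParser()
```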

Risks

  • Bug injection
  • Fear of breaking working software
  • Fight fear with confidence; improve testing regime
    • improve testing
    • modernized idioms in Python2
    • use with statement
    • exception naming
    • bytes vs. text for all literals, do not use bare literals, ever
    • print is now function like, rather than a statement
    • don't use StringIO.StringIO... use io.BytesIO instead
    • clarity in text vs. bytes (have to do this)

Bottom-Up Porting

  • Port packages with no dependencies first (bottom of the dependency tree)
  • Then port packages with already-ported deps
    • Note Python version supported by deps!
    • May have to write a pull request for deps
  • Lather, rinse, repeat
  • Finally! Port the application

  • See the book: "Common subset" idioms

  • python2.7 -3 option can point out problem areas

Testing Avoids Hair-Loss

  • Untested code is where the bugs go to hide
  • 100% coverage is ideal before porting
    • unit tests preferable for libraries
    • functional tests best for applications
  • Measure coverage with http://pypi.python.org/pypi/coverage
  • Work to improve assertions as well as coverage
    • assert contracts, not implementation details
  • doctests make straddling really hard
  • replace doctests with unit/functional tests
  • if at all feasible, convert doctests to Sphinx examples
  • automate running tests
  • tox helps ensure that tests pass under all supported versions (including pypy)
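A minimal tox sketch for the version matrix above (env names and test runner are assumptions, not from the talk):

```ini
[tox]
envlist = py26, py27, py32, py33, py34, pypy

[testenv]
deps = pytest
commands = pytest
```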

Considerations for C Extensions

  • http://python3porting.com/cextensions.html
  • Testing C is harder! (duh)
  • have a reference implementation in Python
    • easier to test
    • supports PyPy
    • might even run faster under PyPy than it would in C anyways

The Birth and Death of JavaScript

JavaScript: The Good Parts. asm.js: compile as closely as possible to native code. A Python interpreter written in JavaScript. A game: 250 kLOC of C code compiled to asm.js, running in a browser at half native speed.

INTEGERS

Compile a program to native javascript and run it in the browser

GIMP running in Chrome

  • virtual memory
  • functions
  • system calls
    • push registers
    • fire interrupt
    • trap
    • switch to ring 0
    • switch VM table
    • jump

Metal

  • 25-33% performance cost for hardware isolation

Kernel + asm.js + DOM at full native speed

The VM provides protection. asm.js/VM loss: -20%; Metal gain: +20%.

Net: 1.2 × 0.8 = 0.96, i.e. roughly break-even (a ~4% difference).

Hard to sell in 2014. Most programmers don't understand the fundamentals; VMs were very high level.

Portability - no CPU portability

  • Binaries are dead
  • C infrastructure is dead
  • Execution is faster
  • JavaScript is dead
  • JavaScript had to be bad and popular (to get where we were)
  • JavaScript lost, but programming won

Gary Bernhardt - http://destroyallsoftware.com - Screencasts for Serious Developers

Puppet Modules: Apps for Ops

https://github.com/jbronn https://speakerdeck.com/jbronn

Why do we need it: Complexity

Don't Repeat Yourself - do not configure manually

Why Puppet?

It's Ruby. Yuck. SaltStack? crypt.py has been a source of problems. Ansible? Licensing.

Puppet: wide OS support. Explicit. A bunch of other things.

Puppet Language

A Ruby DSL in .pp files, compiled into a directed acyclic graph.

package { 'openssh-server':
  ensure => installed,
}

package { 'Django':
  ensure   => '1.6.2',
  provider => 'pip',
}

Declarative: cannot assign a name again

String interpolation. Flow control: if, case, etc.

Facts

$ facter | less

quickly get info about the system

Resource Ordering

How to set up relationships between resources; first you have to know how to reference them.
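A sketch of one ordering form, the require metaparameter; resource references use the capitalized type name plus title. (The service name 'ssh' is an assumption; it varies by OS.)

```puppet
package { 'openssh-server':
  ensure => installed,
}

service { 'ssh':
  ensure  => running,
  require => Package['openssh-server'],  # reference: capitalized type + title
}
```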

Modules

Containers for Puppet code. Analogous to Python packages.

  • Modulefile - analogous to setup.py
  • manifests/init.pp - analogous to __init__.py
  • "class" - a bag of resources... don't think about it in an OO way
  • templates - similar to Django templates
  • files - will not be interpreted or evaluated
  • lib - beyond scope; custom extensions to Puppet itself

Mix and match: have small units of abstraction and build them up into something bigger.

  • pip provider: install from git sources w/ the existing pip provider
  • pipx provider: can add --index-url
  • venv { '/srv/pyapp': }
  • venv_package type

Class Parameters

Rather than hard-coding versions, can add parens between the class name and its body to parameterize it dynamically; treat the module as another type of resource.

ERB

Ruby Templates

Running Puppet

  • Standalone
    • Customize
    • Evaluate - lazy, put it in a file
  • Centralize
    • The Puppetmaster
    • Node + Master
    • Can centralize configuration and secrets in one place and then can track resources across everything
    • pypuppetboard

Module Development

Use Vagrant. A must have for devops

vagrant up ... done.

What is Async, How Does It Work, and When Should I Use It?

Subs

Analogy of a sub shop: client and sandwich maker; a CPU-bound web service. Can scale out by making the sandwich maker faster, or by adding more sandwich makers, but concurrency doesn't make sense. Throughput is bound by computation. No need for async here. Do not use async here.

Pizza

Client and pizza cook: slices are made but are put in the oven for 2 minutes. Both sides are idling until the slice is hot. Scaling? Here you don't want the cook tied up for the full interaction. The cook should be async and respond to events as they occur: clients arrive, slices get hot, the cook responds to those events. Problem? With lots of concurrent clients and slices, a mnemonic is required to keep track of everything. Analogous to a server w/ a big backend (database, OAuth service, etc.). Throughput bound by memory.

Omakase

Clients show up. The waiter tells the kitchen that there are lots of people there. The chef makes lots of stuff. A small number of waiters are event-driven and can multi-task. Like a websocket application: clients connect to the server and wait for server events. A connection remains idle until new e-mail arrives on the server. Long-lived, with lots of them open simultaneously. Minimize resources for idle connections; the main resource you need to conserve is memory.

Async must store per-connection state as efficiently as possible.

C10K ("ten thousand concurrent connections") is an influential paper on this matter.

Orders of magnitude may have changed since then, but if you still use 1 thread per socket, you will exhaust your resources.

So... why is multithreading still the conventional approach? Answer: async is hard to code. I.e. when you're waiting for something to happen, you have to remember what you were going to do when the event occurs. Especially true since the wait can be indefinitely long... you don't know when that event will happen (e.g. an e-mail arriving).

Memory per connection is large for threads, but programming is (supposedly) easier. Callbacks are (supposedly) harder to program, but require much, much less memory per connection.

  • Threads
  • Callbacks

and middle ground implementations include

  • Coroutines
  • Greenlets

etc.

So what is async?

  • Single-threaded
  • I/O concurrency
  • Non-blocking sockets
  • epoll / kqueue
  • Event loop

So, let's switch to a specific implementation, asyncio.

asyncio/"Tulip"

  • It is now in the Python 3.4 standard library
  • Implementation of PEP 3156
  • Standard event loop
  • Coroutines
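A tiny coroutine sketch. Note the talk predates async/await — PEP 3156-era code used @asyncio.coroutine and yield from — so this is shown in today's syntax:

```python
import asyncio

# A coroutine can suspend itself and let the event loop run other work
async def greet():
    await asyncio.sleep(0)   # yield control back to the event loop
    return 'hello'

result = asyncio.run(greet())
# result == 'hello'
```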

Example: chat application

Two browser windows connect to the same server via websocket. When one window posts, both windows get the message.

Layers:

  • example.py / Application
  • autobahn websockets / Protocol
  • asyncio / {Transport | Event Loop}
  • / Selectors

asyncio is relatively low level. It gives you Transport and the Event Loop, but won't give you Protocol or Application

Selectors is a new module in the Python 3.4 stdlib; it abstracts away the differences between epoll/kqueue and lets you generally ask for event notifications regardless of the OS on which you are working (yay!)

sock.setblocking(False)

makes it so that any operation on the socket either succeeds or fails instantly. It will never block (though it may raise an exception).
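The pieces above — non-blocking sockets, a selector, callbacks — fit together roughly like this minimal sketch (illustrative only, not asyncio's actual implementation):

```python
import selectors
import socket

# Register non-blocking sockets with a selector, attaching a callback as
# data, then run the callbacks for whatever the selector reports as ready.
sel = selectors.DefaultSelector()
accepted = []

def on_accept(sock):
    conn, addr = sock.accept()   # won't block: the selector said it's ready
    accepted.append(addr)
    conn.close()

server = socket.socket()
server.setblocking(False)        # operations now succeed or fail instantly
server.bind(('127.0.0.1', 0))
server.listen()
sel.register(server, selectors.EVENT_READ, on_accept)

# Simulate a client connecting, then run one iteration of the loop.
client = socket.create_connection(server.getsockname())
for key, _mask in sel.select(timeout=1):
    key.data(key.fileobj)        # run the callback registered for this socket
client.close()
server.close()
sel.close()
```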

A callback can do anything, including adding new callbacks to the ready queue. We don't want to call those now... any secondary actions have to wait until the next iteration through the loop. This prevents starvation, among other things.

Review

  • asyncio uses non-blocking sockets
  • event loop tracks sockets, and the callbacks waiting for them.
  • selectors wait for network events
  • event loop runs callbacks

Should I use it? Yes:

  • slow backend
  • websockets
  • many connections

No:

  • CPU-bound
  • no async driver
  • no async expertise

One other thing people get wrong: using an async framework with a blocking driver (e.g. Tornado + a blocking MySQL driver).

The final issue you have to deal with: what if you don't actually know async very well? You need to be judicious about when you have time to learn it properly, and you need to decide when that moment is. It may not be easy.

Doesn't have to be all or nothing, of course.

TODO: read PEP 3156

Advanced Decorator Usage

Graham Dumpleton

http://blog.dscpl.com.au/search/label/decorators http://github.com/GrahamDumpleton/wrapt/tree/master/blog

function wrapper

using a class with a __call__() method

class function_wrapper(object):
    def __init__(self, wrapped):
        self.wrapped = wrapped
    def __call__(self, *args, **kwargs):
        return self.wrapped(*args, **kwargs)

class wrappers are not popular

introspecting wrapped functions doesn't necessarily maintain the original metadata

class instances do not have names

when using a closure, copy the attributes (name, doc, module, ...) But... it's laborious and error-prone
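The stdlib shortcut for that copying is functools.wraps:

```python
import functools

def my_decorator(wrapped):
    @functools.wraps(wrapped)   # copies __name__, __doc__, __module__; updates __dict__
    def inner(*args, **kwargs):
        return wrapped(*args, **kwargs)
    return inner

@my_decorator
def greet():
    """Say hello."""
    return 'hello'

print(greet.__name__)   # 'greet', not 'inner'
print(greet.__doc__)    # 'Say hello.'
```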

A helper for this is available in the standard library (functools.wraps, in the functools module) for some versions of Python

but... doesn't work with the class wrapper

@classmethod is also available

issues so far:

  • preservation of function name and doc
  • others...

What are descriptors? Something with binding behaviour, e.g. __get__, __set__, __delete__

A bound method is a temporary object that is the result of accessing the function as an attribute; the binding happens via __get__
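A minimal demonstration of that binding machinery — functions are descriptors, and attribute access goes through their __get__:

```python
def f(self):
    return self

class C(object):
    method = f

c = C()
# c.method is really f.__get__(c, C): a temporary bound-method object
bound = f.__get__(c, C)
print(bound() is c)      # True -- 'self' was bound to c
print(c.method() is c)   # True -- attribute access did the same binding
```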

So... a wrapper also needs to implement __get__ and do the binding if it's wrapping something in a class

But... this will fail if wrapping a decorator

It potentially does a lot of copying: __wrapped__, __dict__ (which could hold a lot of attributes). This will be slow. The solution is a transparent object proxy. Complicated, so the speaker glossed over a lot of details.

So... queries for __name__ and __doc__ just pass through to the wrapped objects.

New decorator factory

Remembering the instance that gets passed through helps us maintain cls for instance methods that are decorated

But, we broke class methods

Because then the first argument, cls, gets set as the instance, which is wrong.

So, check using isinstance() instead.

But the instance is still None

We can use instance = getattr(self.wrapped, '__self__', None) to detect that.

One more problem... decorating a class

instance is still None

Universal decorator: one that can determine what it is wrapping (class, function, instance method, static method, etc)

This doesn't allow arguments yet, though.

The magic decorator that does everything:

  • preserves __name__, __doc__
  • signature from inspect.getargspec()
  • inspect.getsource() works
  • aware of context

Use the wrapt package

http://wrapt.readthedocs.org https://github.com/GrahamDumpleton/wrapt https://pypi.python.org/pypi/wrapt




Published

11 April 2014

Category

work

Tags