PyCon: Days 0-1
Generators: the Final Frontier
All tutorials in the series can be found here:
- http://www.dabeaz.com/generators
- http://www.dabeaz.com/coroutines
- http://www.dabeaz.com/finalgenerator
The bulk of the generator functionality has been around since Python 2.5:
- close()
- send()
- next()
Since Python 3.3, yield and a value-bearing return are both allowed within the same function. Prior to this, combining them was considered a syntax error. Using return within a generator has "interesting" behaviour: it raises a StopIteration exception with the return value attached. I.e. one can catch the value from a generator as such:

def my_generator():
    yield 'foo'
    yield 'bar'
    return 0

g = my_generator()
next(g)  # 'foo'
next(g)  # 'bar'
try:
    next(g)
except StopIteration as e:
    e.value  # 0
yield from [generator]
is available since Python 3.3 (PEP 380) and it does the iteration for you (and a whole lot more)
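A minimal sketch of the delegation (function names here are illustrative, just to show how yield from surfaces the sub-generator's return value):

```python
def inner():
    yield 'a'
    yield 'b'
    return 'done'          # surfaces as StopIteration.value

def outer():
    # yield from runs the sub-generator to exhaustion and
    # hands us its return value
    result = yield from inner()
    yield result

list(outer())              # ['a', 'b', 'done']
```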
"Poutine code" == "It seemed like a good idea at that time"
John Perry Barlow keynote
Grateful Dead lyricist. Contrasted the concept of physical property and IP. Adam Smith got the scarcity/value relationship right wrt physical items. However, songs become more valuable the more people who know them. The best song in the world inside my head is worthless. "Had an aversion to authority... LSD tends to do that to you". Talks almost every day w/ Edward Snowden, but has also consulted with the NSA and CIA for 20+ years. When asked whether he had given up on regulation: "If I didn't believe in the power of regulation, I wouldn't be sending out armies of EFF lawyers every day".
Was challenged as being "contradictory" by a woman, since he claimed to want "less government". Responded by saying that he believed in government for universal health care ("if you lose a leg and have to go to the ER..."), but that he believed local government was more direct and responsive. "...hoping that there will be a City-State Renaissance that is as big as The Renaissance".
Character Encoding
- Know your encoding
- Use the unicode sandwich
- Test your application with different character encodings
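A rough sketch of the unicode sandwich idea (the function and encoding choice are illustrative, not from the talk): decode bytes at the input boundary, work only with text in the middle, encode back to bytes at the output boundary.

```python
def process(raw_bytes, encoding='utf-8'):
    text = raw_bytes.decode(encoding)   # boundary: bytes -> text
    text = text.upper()                 # middle: text-only processing
    return text.encode(encoding)        # boundary: text -> bytes

process(b'caf\xc3\xa9')                 # b'CAF\xc3\x89'
```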
chardet can guess an unknown encoding
mixed encodings produce mojibake
the python-ftfy package fixes mojibake
Error-handler behaviour for .encode():
    .encode('cp1252', 'strict')   # raises an exception (the default)
    .encode('cp1252', 'ignore')   # drops unencodable characters
    .encode('cp1252', 'replace')  # substitutes a replacement character
Byte order marks: e.g. UTF-16 puts a mark at the beginning of the stream to signal big/little endian.
Decorators
http://bit.ly/dec-pycon-2014
Uses
- setup/teardown
- diagnostics
Graham will be talking later today... first track, last talk
Good decorators are versatile and can be applied to any function
e.g. the inner function signature should use *args, **kwargs
make sure to fix func.__name__:
    inner.__name__ = wrapped.__name__
also __doc__
etc.
arg spec... though, that's much more complicated... see Graham Dumpleton's "wrapt" package...
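functools.wraps handles the __name__/__doc__ fix-up for closures; a small sketch (the "logged" decorator is a made-up example):

```python
import functools

def logged(func):
    @functools.wraps(func)       # copies __name__, __doc__, __module__, ...
    def inner(*args, **kwargs):
        return func(*args, **kwargs)
    return inner

@logged
def greet():
    """Say hello."""
    return 'hello'

greet.__name__                   # 'greet', not 'inner'
```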
http://bit.ly/decorators2014
decorators with arguments
@unittest.skipIf()
simplified
def skipIf(conditional, message):
    def dec(wrapped):
        def inner(*args, **kwargs):
            if not conditional:
                return wrapped(*args, **kwargs)
            # otherwise skip the test, reporting message
        return inner
    return dec
More examples:
- counting how many times a function has been called
- use start time and end time to time a function call
- combined: time elapsed
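The call-counting and timing examples might be combined like this (a sketch; the names are mine, not from the talk):

```python
import functools
import time

def timed(func):
    @functools.wraps(func)
    def inner(*args, **kwargs):
        inner.calls += 1                  # count how many times it's been called
        start = time.perf_counter()       # start time
        try:
            return func(*args, **kwargs)
        finally:                          # end time -> time elapsed, even on error
            inner.elapsed = time.perf_counter() - start
    inner.calls = 0
    inner.elapsed = 0.0
    return inner

@timed
def work():
    return sum(range(1000))

work()            # returns 499500; work.calls is now 1
```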
Porting your Application to Python3
- Straddling Python 2/3 in single codebase
- Choosing target Python versions
- Add test coverage to reduce risk
- Cover C extensions
- Porting hygiene
Based on porting a ~180 kLOC Python, ~25 kLOC C codebase
- Can't really port once and abandon... not feasible
- Use 2to3 as a useful starting point... but it's painfully slow on large codebases
Straddling is best
- use "compatible subset" of python syntax
- conditional imports masks stdlib changes
the six module can help (but you might not need it, and it's too heavy for the speaker)
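A conditional import masking a stdlib move might look like this (urlparse is just one example of a relocated module):

```python
try:
    from urllib.parse import urlparse   # Python 3 location
except ImportError:
    from urlparse import urlparse       # Python 2 location

urlparse('http://example.com/a').netloc  # 'example.com'
```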
Syntax changes make < 2.6 hard
- no b'' literal
- no "except Exception as e:" (and Python 3 drops the old "except Exception, e" form)
- much more cruft / pain
- 2.4 / 2.5 are long past EOL
- Incompatibilities make Python3 < 3.2 hard
- PEP 3333 fixes WSGI in Py3k
- callable() restored in 3.2
- 3.3 restores u'' literals
- 3.2 is "system Python3" on some LTS systems
- Ubuntu/debian have just jumped to 3.4 for their LTS
Summary: 2.6, 2.7, 3.2+, maybe drop 2.6
Risks
- Bug injection
- Fear of breaking working software
- Fight fear with confidence; improve testing regime
- improve testing
- modernized idioms in Python2
- use with statement
- exception naming
- bytes vs. text for all literals, do not use bare literals, ever
- print is now function like, rather than a statement
- don't use StringIO.StringIO... use io.BytesIO (for bytes) or io.StringIO (for text) instead
- clarity in text vs. bytes (have to do this)
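In practice that means prefixing every literal and using the io module; a sketch of the idiom (the variable names are mine):

```python
from __future__ import unicode_literals  # bare literals become text on 2.x

import io

payload = b'\x00\x01'         # explicitly bytes on both 2 and 3
label = u'r\u00e9sum\u00e9'   # explicitly text on both 2 and 3

buf = io.BytesIO()            # io.BytesIO, not StringIO.StringIO
buf.write(payload)
buf.getvalue()                # b'\x00\x01'
```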
Bottom-Up Porting
- Port packages with no dependencies first (bottom of the dependency tree)
- Then port packages with already-ported deps
- Note the Python versions supported by deps!
- May have to write a pull request for deps
- Lather, rinse, repeat
Finally! Port the application
See the book: "Common subset" idioms
The python2.7 -3 flag can point out problem areas
Testing Avoids Hair-Loss
- Untested code is where the bugs go to hide
- 100% coverage is ideal before porting
- unit tests preferable for libraries
- functional tests best for applications
- Measure coverage with http://pypi.python.org/pypi/coverage
- Work to improve assertions as well as coverage
- assert contracts, not implementation details
- doctests make straddling really hard
- replace doctests with unit/functional tests
- if at all feasible, convert doctests to Sphinx examples
- automate running tests
tox helps ensure that tests pass under all supported versions (including PyPy)
Considerations for C Extensions
- http://python3porting.com/cextensions.html
- Testing C is harder! (duh)
- have a reference implementation in Python
- easier to test
- supports PyPy
- might even run faster under PyPy than it would in C anyway
The Birth and Death of JavaScript
JavaScript: The Good Parts
asm.js - compiles as closely as possible to native code
a Python interpreter written in JavaScript
a game... 250 kLOC of C code compiled to asm.js running in a browser at half native speed
INTEGERS
Compile a program to native javascript and run it in the browser
GIMP running in Chrome
- virtual memory
- functions
- system calls
- push registers
- fire interrupt
- trap
- switch to ring 0
- switch VM table
- jump
Metal
- 25-33% performance cost for hardware isolation
Kernel + asm.js + DOM at full native speed
VM provides protection; asm/VM loss: -20%; metal gain: +25-33%
Net: a ~4% performance gain (e.g. 1.3 * 0.8 = 1.04)
Hard to sell in 2014. Most programmers don't understand the fundamentals; VMs used to be very high level.
Portability - no CPU portability
Binaries are dead
C infrastructure is dead
Execution is faster
JavaScript is dead
JavaScript had to be bad and popular (to get where we were)
JavaScript Lost but Programming Won
Gary Bernhardt http://destroyallsoftware.com Screencasts for Serious Developers
Puppet Modules: Apps for Ops
https://github.com/jbronn https://speakerdeck.com/jbronn
Why do we need it: Complexity
Don't Repeat Yourself - do not configure manually
Why Puppet?
It's Ruby. Yuck. SaltStack? crypt.py has been a source of problems. Ansible? Licensing.
Puppet: wide OS support, explicit, a bunch of other things.
Puppet Language
A Ruby-based DSL; .pp files are compiled into a directed acyclic graph
package {'openssh-server':
ensure => installed,
}
package {'Django':
ensure => '1.6.2',
provider => 'pip',
}
Declarative: cannot assign a name again
string interpolation Flow control: if, case, etc.
Facts
$ facter | less
quickly get info about the system
Resource Ordering
How to set up relationships between resources: first you have to know how to reference them
Modules
containers for Puppet code, analogous to Python packages
Modulefile - analogous to setup.py
manifests/init.pp - analogous to __init__.py
"class" - a bag of resources... don't think about it in an OO way
templates - similar to Django templates
files - will not be interpreted or evaluated
lib - beyond scope; custom extensions to Puppet itself
Mix and match: have small units of abstraction and build them up into something bigger.
pip provider: install from git sources w/ the existing pip provider
pipx provider can add --index-url
venv { '/srv/pyapp': }
venv_package type
Class Parameters
rather than hard-coding versions, add parens between the class name and body to parameterize it; treat the module as another type of resource
ERB
Ruby Templates
Running Puppet
- Standalone
- Customize
- Evaluate - lazy, put it in a file
- Centralize
- The Puppetmaster
- Node + Master
- Can centralize configuration and secrets in one place and then can track resources across everything
- pypuppetboard
Module Development
Use Vagrant. A must have for devops
vagrant up
done
What is Async, How Does It Work, and When Should I Use It?
Subs
Analogy of a sub shop: client and sandwich maker; a CPU-bound web service. Can scale by making the sandwich maker faster, or by adding more sandwich makers, but concurrency doesn't make sense. Throughput is bound by computation. No need for async here. Do not use async here.
Pizza
Client and pizza cook: slices are made but then sit in the oven for 2 minutes. Both sides are idle until the slice is hot. Scaling? Here you don't want the cook tied up for the full interaction. The cook should be async and respond to events as they occur: clients arrive, slices get hot. The cook should respond to those events. Problem? With lots of concurrent clients and slices, bookkeeping is required to keep track of everything. Analogous to a server w/ a big backend (database, OAuth service, etc.). Throughput bound by memory.
Omakase
Clients show up. The waiter tells the kitchen that there are lots of people there. The chef makes lots of stuff. A small number of waiters are event-driven and can multi-task. Like a websocket application: clients connect to the server and wait for server events. A connection remains idle until new e-mail arrives on the server. Long-lived, and lots of them are open simultaneously. Minimize resources for idle connections; the main resource to conserve is memory.
Async must store per-connection state as efficiently as possible.
C10K ("ten thousand concurrent connections") is the influential paper on this matter. The orders of magnitude may have changed since then, but if you use 1 thread per socket you will still exhaust your resources.
So... why is multithreading still the conventional approach? Answer: async is hard to code. I.e. when you're waiting for something to happen, you have to remember what you were going to do when the event occurs. Especially true since the wait is indefinitely long... you don't know when that event will happen (e.g. an e-mail arriving).
Memory per connection is large for threads but programming is (supposedly) easier. Callbacks is (supposedly) harder to program, but requires much much less memory per connection.
- Threads
- Callbacks
and middle ground implementations include
- Coroutines
- Greenlets
etc.
So what is async?
- Single-threaded
- I/O concurrency
- Non-blocking sockets
- epoll / kqueue
- Event loop
So, let's switch to a specific implementation: asyncio.
asyncio/"Tulip"
- It is now in the Python 3.4 standard library
- Implementation of PEP 3156
- Standard event loop
- Coroutines
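The talk's 3.4-era examples would use @asyncio.coroutine with yield from; the same idea in the later async/await spelling (a minimal sketch, names are mine): the single-threaded event loop interleaves both waits.

```python
import asyncio

async def handler(name, delay):
    await asyncio.sleep(delay)    # yields to the event loop instead of blocking
    return name + ' done'

async def main():
    # both "connections" are serviced concurrently on one thread
    return await asyncio.gather(handler('a', 0.02), handler('b', 0.01))

results = asyncio.run(main())     # ['a done', 'b done']
```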
Example: chat application
Two browser windows connect to the same server via websocket. When one window posts, both windows get the message.
Layers:
- example.py / Application
- autobahn websockets / Protocol
- asyncio / {Transport | Event Loop}
- / Selectors
asyncio is relatively low level. It gives you Transport and the Event Loop, but won't give you Protocol or Application
selectors is a new module in the Python 3.4 stdlib; it abstracts away the differences between epoll/kqueue and lets you ask for event notifications regardless of the OS you are working on (yay!)
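A small sketch of selectors in use (socketpair stands in for a real client connection; the attached data string is arbitrary):

```python
import selectors
import socket

sel = selectors.DefaultSelector()     # picks epoll/kqueue/... for this OS

server, client = socket.socketpair()
server.setblocking(False)
client.setblocking(False)

# register interest in readability, attaching arbitrary data (e.g. a callback)
sel.register(server, selectors.EVENT_READ, data='server readable')

client.send(b'ping')                  # makes the server end readable
events = sel.select(timeout=1)        # list of (SelectorKey, mask)
ready = [key.data for key, mask in events]
sel.close()
```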
sock.setblocking(False)
makes it so that any operation on the socket either succeeds or fails instantly. It will never block (though it may raise an exception).
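For example (socketpair is used here so recv has a valid peer; in Python 3 the instant failure shows up as BlockingIOError):

```python
import socket

a, b = socket.socketpair()
a.setblocking(False)        # operations now succeed or fail instantly

try:
    a.recv(1024)            # nothing has been sent yet...
    blocked = False
except BlockingIOError:     # ...so this raises instead of blocking
    blocked = True

b.send(b'hi')
data = a.recv(1024)         # data is waiting, so this succeeds immediately
```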
callback can do anything including adding new callback to the ready queue. We don't want to call those now... any secondary actions have to wait until the next iteration through the loop. This prevents starvation, among other things.
Review
- asyncio uses non-blocking sockets
- event loop tracks sockets, and the callbacks waiting for them.
- selectors waits for network events
- event loop runs callbacks
Should I use it? Yes: * slow backend * websockets * many connections
No: * cpu-bound * no async driver * no async expertise
One other thing people get wrong: using an async framework with a blocking driver (e.g. Tornado + MySQL)
Final issue: what if you don't actually know async very well? Be judicious about when you have time to learn it correctly, and decide when that moment is. It may not be easy.
Doesn't have to be all or nothing, of course.
TODO: read PEP 3156
Advanced Decorator Usage
Graham Dumpleton
http://blog.dscpl.com.au/search/label/decorators http://github.com/GrahamDumpleton/wrapt/tree/master/blog
function wrapper
using a class that implements __call__()
class function_wrapper(object):
    def __init__(self, wrapped):
        self.wrapped = wrapped
    def __call__(self, *args, **kwargs):
        return self.wrapped(*args, **kwargs)
class wrappers are not popular:
- introspecting the wrapper doesn't return the wrapped function's details
- class instances do not have a __name__
when using a closure, copy the attributes (name, doc, module, ...) But... it's laborious and error-prone
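The laborious manual version looks something like this (a sketch; the decorator and target names are mine):

```python
def function_wrapper(wrapped):
    def inner(*args, **kwargs):
        return wrapped(*args, **kwargs)
    # copy the attributes by hand: easy to miss one
    inner.__name__ = wrapped.__name__
    inner.__doc__ = wrapped.__doc__
    inner.__module__ = wrapped.__module__
    inner.__dict__.update(wrapped.__dict__)
    return inner

@function_wrapper
def target():
    """Docstring."""
    return 42

target.__name__                 # 'target', not 'inner'
```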
@functools.wraps is available in the standard library (the functools module, since Python 2.5) and automates the attribute copying
but... it doesn't work with the class wrapper
@classmethod is also available
issues so far:
- preservation of function name and doc
- others...
What are descriptors? Something with binding behaviour, i.e. __get__, __set__, __delete__.
A bound method is a temporary object that results from accessing the function as an attribute; the function's __get__ does the binding.
So... a wrapper also needs to implement __get__ and do the binding if it's wrapping a method of a class
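A sketch of a wrapper that implements __get__ so instance methods still bind (greatly simplified relative to wrapt; the class names are mine):

```python
class bound_function_wrapper(object):
    # the temporary object produced by binding
    def __init__(self, wrapped, instance):
        self.wrapped = wrapped
        self.instance = instance
    def __call__(self, *args, **kwargs):
        return self.wrapped(self.instance, *args, **kwargs)

class function_wrapper(object):
    def __init__(self, wrapped):
        self.wrapped = wrapped
    def __get__(self, instance, owner):
        # descriptor protocol: attribute access triggers binding
        return bound_function_wrapper(self.wrapped, instance)
    def __call__(self, *args, **kwargs):
        return self.wrapped(*args, **kwargs)

class Example(object):
    @function_wrapper
    def method(self):
        return 'called'

Example().method()      # 'called'
```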
But... this will fail if wrapping a decorator
It potentially does a lot of copying: __wrapped__, __dict__ (which could hold a lot of attributes). This will be slow. The solution is a transparent object proxy. Complicated, so the speaker glossed over a lot of details.
So... queries for name and doc just pass through to the wrapped objects.
New decorator factory
Remembering the instance that gets passed through, helps us maintain cls for instance methods that are decorated
But, we broke class methods
Because then the first argument, cls, gets set as the instance, which is wrong.
So, check using isinstance() instead.
But the instance is still None
We can use instance = getattr(self.wrapped, '__self__', None)
to detect that.
One more problem... decorating a class
instance is still None
Universal decorator: one that can determine what it is wrapping (class, function, instance method, static method, etc)
This doesn't allow arguments yet, though.
The magic decorator that does everything:
- preserves __name__ and __doc__
- signature from inspect.getargspec()
- inspect.getsource() works
- aware of context
Use the wrapt package
http://wrapt.readthedocs.org https://github.com/GrahamDumpleton/wrapt https://pypi.python.org/pypi/wrapt