PyCon 2019: Day 0
Morning Keynote
Lots of history of Perth, Australia (the speaker’s home) as metaphors for how to lead the community into the future.
What are Python’s “Black Swans” (things you don’t know exist or will happen, but that are obvious in retrospect)? How could the community be upended in ten years?
- The ubiquity of phones (that don’t run Python)
- Not needing to run Python on the server (you can do that with JavaScript)
Calls to action
- Research and Development
- Donate (or have your employer donate) to OSS
Practical decorators
Couldn’t get a seat. Fire code violation. Watched the video afterwards, and it was a solid talk.
Terrain, Art, Python and LIDAR
Andrew Godwin
Django contributor; Principal Engineer at Eventbrite (“I always need more lasers”).
- SRTM
- Shuttle RADAR Topography Mission
- 30m accuracy
- Works well at wide scales: mountains and valleys show up well, but it can’t see around valleys or mountains. A good start, but we have progressed.
- LIDAR
- Accurate to cm or better.
- Usually on an aircraft, but can be on a car or something else.
- Many major US cities have LIDAR flights flown regularly
Laser-cut profiles
How do you get data?
DEM: Digital Elevation Model. A big bitmap that contains elevations.
Laser cutting, need a “cut path”. Construct a series of SVG profiles.
Loads a DEM into CSV (“because I’m lazy”)
Picks one in N rows.
Draws a contour using svgwrite.
But you need to exaggerate in the z dimension by 2.5x.
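The pipeline above can be sketched in plain Python (the function name, sampling stride, and hand-built SVG markup are my invention; the talk used the svgwrite library, but writing the polyline directly keeps this self-contained):

```python
import csv

def dem_csv_to_svg(csv_path, n=8, z_scale=2.5, x_step=2.0):
    """Read a DEM exported as CSV (one row of elevations per line),
    keep one row in N, and return an SVG string with one polyline
    profile per kept row, exaggerating elevation by z_scale."""
    with open(csv_path) as f:
        rows = [list(map(float, r)) for r in csv.reader(f) if r]
    polylines = []
    for i, row in enumerate(rows[::n]):  # pick one in N rows
        base_y = i * 100.0               # stack the profiles vertically
        pts = " ".join(
            f"{x * x_step:.1f},{base_y - z * z_scale:.1f}"
            for x, z in enumerate(row)
        )
        polylines.append(f'<polyline points="{pts}" fill="none" stroke="black"/>')
    return ('<svg xmlns="http://www.w3.org/2000/svg">'
            + "".join(polylines) + "</svg>")
```

Each polyline is one laser-cut profile; `z_scale` is the 2.5x vertical exaggeration mentioned above.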
Works well for small items, but not for anything bigger. E.g. Scotland looks really flat.
3d printed cities
Cities are very precise. Accurate to half a meter. You can see individual trees, cars, railroad lines.
Fabrication is not the problem. The problem is the data.
LIDAR data comes in “point clouds”. Here’s the raw data from flying the city. Need to turn it into a solid surface.
Point cloud -> DEM
python-pcl
lastools
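python-pcl and lastools do the real work; the core gridding idea — bin every (x, y, z) return into a raster cell and keep the highest z as the top surface — can be sketched as (the function name and cell size are mine):

```python
def points_to_dem(points, cell_size=1.0):
    """Grid a LIDAR point cloud into a simple DEM: for each (x, y)
    cell, keep the maximum z (the top surface, e.g. roofs and
    treetops rather than the ground beneath them).
    Returns a dict mapping (col, row) -> elevation."""
    dem = {}
    for x, y, z in points:
        cell = (int(x // cell_size), int(y // cell_size))
        if cell not in dem or z > dem[cell]:
            dem[cell] = z
    return dem
```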
- Top surface -> Fully sealed 3D model (with the tile base) to give the 3D printer. TIN?
- Loads the DEM, drops obvious outliers (e.g. gets rid of a 50m pit in the middle of London).
- Clamps height (top and bottom).
- Smooths rough features (array pass in Python).
- Writes out an STL file (a pretty common 3D model format… TIN).
- How do you write STL?
- Python’s struct.pack()
- Should I have used NumPy? Yes.
- Did I? No.
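A binary STL file is an 80-byte header, a uint32 triangle count, then 50 bytes per triangle: a float normal, three float vertices, and a 2-byte attribute field. A minimal struct.pack() writer (function name is mine, not from the talk):

```python
import struct

def write_binary_stl(path, triangles):
    """Write a binary STL file. triangles is a list of
    (normal, (v1, v2, v3)) tuples, each vector an (x, y, z) of floats."""
    with open(path, "wb") as f:
        f.write(b"\0" * 80)                         # 80-byte header
        f.write(struct.pack("<I", len(triangles)))  # uint32 triangle count
        for normal, verts in triangles:
            f.write(struct.pack("<3f", *normal))    # facet normal
            for v in verts:
                f.write(struct.pack("<3f", *v))     # three vertices
            f.write(struct.pack("<H", 0))           # attribute byte count
```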
- https://github.com/andrewgodwin/lidartile
CNC-milled National Park
You need a milling machine. Subtract from metal.
The US National DEM
Get the outline of the National Park
Use QGIS to cut out a park-only DEM
Toolbox > GDAL > Clip By Extent
Irregular shapes, not square.
8 hours of milling for each tiny little park
Map Projections
- Things I won’t work with:
- unicode
- names
- timezones
- currencies
- networks
- …
- map projections (congratulations!)
Future
More US National Parks
I do each one as I visit it. There are… 59 (actually 61).
Easier Milling
8 hours per piece. Really.
Tiny milling bits, I have broken many.
G-Code can pass instructions directly to the milling machine.
Better STL optimisation
Millions of polygons isn’t great. Even a totally flat lake bed has many, many polygons.
Blender… 3d modelling program.
Personal LiDAR. Thanks autonomous vehicles.
https://github.com/andrewgodwin/gis_tools
From days to minutes, from minutes to milliseconds with SQLAlchemy
Leonardo Rochael Almeida
Not an expert in any of these things, but want to pass on my lessons learned
Works at Geru, a Brazilian fintech.
Backend stack: Python, Pyramid, SQLAlchemy, Postgres (also, Celery, MongoDB, Java)
SQLAlchemy: 2 aspects
The expression language (a Python DSL) + the ORM (classes mapping to tables and records mapping to instances of the class). You can use one without the other.
“Frameworks still require you to make decisions about how to use them…” Martin Fowler.
The ORM Trap
The Database is an external system, so you do have to think about it. If you don’t, sensible Python code -> Bad SQL access patterns.
The ORM does not help you write good, performant SQL.
Worse, this is unnoticeable at low data volumes, like during development. And Early Production. THEN when you’re getting success on your system, things start slowing to a crawl.
SQLAlchemy is so good that the metaphor rarely leaks.
The Fix: Let the DB do its job. Be especially mindful of the number of round trips to the database, especially when you are looking at relationships.
Be aware of implicit queries, especially from relationships
Aim for O(1) queries per request/job/activity
Avoid looping through model instances. It’s tempting, but let the DB do it for you.
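The “let the DB do it for you” point, illustrated with plain sqlite3 (table and column names are invented for the sketch): instead of pulling every row into Python, ask the database for the aggregate in a single round trip:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payment (borrower_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO payment VALUES (?, ?)",
    [(1, 100.0), (1, 50.0), (2, 75.0)],
)

# Anti-pattern: O(N) rows cross the wire and Python does the summing
# (with an ORM this is often also N+1 queries over relationships).
total_py = sum(row[0] for row in conn.execute("SELECT amount FROM payment"))

# Better: one query, one round trip -- the DB does the aggregation.
(total_sql,) = conn.execute("SELECT SUM(amount) FROM payment").fetchone()

assert total_py == total_sql == 225.0
```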
Geru Case 1: the 24+ hour reports
It now takes minutes
The Geru Funding Model
Debenture holders buy debentures from Geru; Geru grants loans to individuals. Every month the borrowers pay back money, and Geru pays back the debenture holders.
We run a report at the beginning of every month collecting everything the borrowers paid and reconcile it against everything we’ve ever paid to debenture holders. Eliminate rounding errors, etc.
So I replaced a loop over a huge series of attributes with sum()ing and coalescing in SQL. Push the calculations into the database.
Refactored the methods so they all accepted a start and end period. Put that into a method that assembles filters.
INSERT .from_select(). Ask PostgreSQL to insert data from its own SELECT, which means you don’t have to send data across the wire. Brought time from 4 hours to 15 minutes.
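The same “keep the data inside the database” trick, shown here with raw SQL on sqlite3 (the talk used SQLAlchemy’s insert(...).from_select(...) against PostgreSQL; the tables below are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payment (borrower_id INTEGER, amount REAL)")
conn.execute("CREATE TABLE monthly_total (borrower_id INTEGER, total REAL)")
conn.executemany(
    "INSERT INTO payment VALUES (?, ?)",
    [(1, 100.0), (1, 50.0), (2, 75.0)],
)

# INSERT ... SELECT: rows flow from one table into the other entirely
# inside the database -- nothing crosses the wire to Python.
conn.execute(
    "INSERT INTO monthly_total (borrower_id, total) "
    "SELECT borrower_id, SUM(amount) FROM payment GROUP BY borrower_id"
)

totals = dict(conn.execute("SELECT borrower_id, total FROM monthly_total"))
assert totals == {1: 150.0, 2: 75.0}
```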
Conclusions
- You need to understand SQL even if you are using the SQLAlchemy DSL and ORM.
- “Rule of leaky abstractions”
- Read the SELECT documentation for your database
- GROUP BY X and aggregation functions
- Understand window functions: calculating aggregates without reducing dimensionality
- THEN study SQLAlchemy
- Be aware of the underlying queries
- Push work to the DB
- As much as possible
- But not too much
Only select the amount of information that you need
If you are bound to a loop… do the complex query before you enter the loop, not inside it. Be mindful of the number of queries that you’re making.
Everything at Once: Python’s Many Concurrency Models
Jess Shapiro
Intro
- Doing multiple things “at once”
- Isn’t just “on” or “off”
- Many available options in Python
- Asyncio coroutines
- Python threads
- GIL-released threads
- Multiprocessing
- Distributed tasks
Parallelism
- Do things actually happen simultaneously?
- How does performance scale when you add more CPUs?
- Good diagram to illustrate this
- Even models w/o “true” parallelism are useful
Minimum Schedulable Unit
- Code is made up of semantic chunks
- How big are the chunks that can be run independently
Data Sharing and Isolation
- How isolated is data between tasks?
- How long does data stay the same for?
- E.g. banking info
- What tools can be used to share data?
- E.g. share data by default: locks, flags (reduce concurrency)
Asyncio coroutines
- One coroutine runs at a time
- MSU: “Awaitable block”
- Global state is shared within an awaitable block
- Event loop
- Awaitable block is between def and await, and between awaits
- Example of a well-engineered block vs. one where a synchronous block runs for too long
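A tiny sketch of the “awaitable block” idea: everything between awaits runs without interruption, and the event loop only switches coroutines at an await point:

```python
import asyncio

order = []

async def worker(name):
    # Everything between awaits runs atomically: no other coroutine
    # can interleave until we hit the next await.
    order.append(f"{name}:start")
    await asyncio.sleep(0)  # yield control back to the event loop
    order.append(f"{name}:end")

async def main():
    await asyncio.gather(worker("a"), worker("b"))

asyncio.run(main())
print(order)  # ['a:start', 'b:start', 'a:end', 'b:end']
```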
Python Threads
- One thread runs (GIL)
- MSU: “Bytecode”
- Global state is shared but consistent only for single-bytecode ops*
- Combined scheduling
- GIL-released threads
- Multiple threads run simultaneously
- MSU: Host processor instruction (x86, etc)
- Global state is shared but unreliable
- OS-scheduled
- You would only want to do this if you’re implementing a core algorithm in C, Rust, etc. and provide a robust API to it.
- Don’t touch the Python interpreter while you do that
Multiprocessing
- Multi processes run simultaneously
- MSU: Host processor instruction (x86, etc)
- Global state starts the same as parent, but evolves independently
- OS-scheduled
fork() system call
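A sketch of “global state starts the same as the parent, but evolves independently”, assuming a POSIX system where the fork start method is available:

```python
import multiprocessing as mp

counter = 100  # global state inherited by the child at fork time

ctx = mp.get_context("fork")  # fork: child starts as a copy of the parent

def child(q):
    global counter
    counter += 1       # mutates the child's copy only
    q.put(counter)

q = ctx.Queue()
p = ctx.Process(target=child, args=(q,))
p.start()
result = q.get()
p.join()
print(result)   # 101: the child saw the parent's value, then diverged
print(counter)  # 100: the parent's copy is unchanged
```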
Distributed tasks
- Multiple tasks run simultaneously
- MSU: varies: often the entire application for some subset of data
- Global state totally independent: often “process-like”
- Central orchestrator
- E.g. 1 TB of data split by orchestrator and sent independently to workers
- Can be a lot of overhead using this concurrency model
- Want to use this only with lots of data …. 100s of GB to 100s of TB to 100s of EB
When to use each
- Asyncio
- Great when performance is IO-bound
- Network requests
- Disk read/write
- Doesn’t allow us to gain CPU performance
- Great for starting a new code base from scratch
- Legacy code may often have had threading
- Threads
- When you need preemptive multitasking (tasks need to interrupt each other)
- Integrate synchronous code
- Need fine-grained concurrency
- Python “glue” for GIL-unlocked C/Rust
- Processes
- Don’t need substantial inter-task communication
- Share data with pipes, memory-mapped regions
- Full parallelism required for Python code
- If you can easily split the data apart, do this
- Distributed tasks
- Highly-segmentable and distributable workload
- Need for shared state minimal
- Large enough to use an orchestrator
Thanks to Mazarine on Market for avocado toast and espresso
@haikuginger on GitHub
Questions
- I switched from threads to asyncio by switching jobs. How do you actually migrate?
- We wanted to share so much state that there wasn’t enough RAM
- If you have a large enough dataset, move to distributed.
- Go out to the filesystem
- Will you be posting your slides?
- Eventually. I don’t know where.
Understanding Python’s Debugging Internals
Liran Haimovitch from Rookout
- Co-founder and CTO of Rookout
- Advocate of modern software methodologies
- Passion is to understand how software actually works
- Rookout
- platform for live-data collection and delivery
- on demand within seconds
- set non-breaking breakpoints, no restarts, extra coding or redeployment required
Debuggers
- pdb
- ipdb
- PyDev debugger (PyCharm, Eclipse)
- Rookout
- IDLE
What do they have in common? Based on sys.settrace()
sys.settrace()
- register a callback for the Python interpreter
- invoked on interpreter events
- function call
- line exec
- function return
- exception raise
- E.g.
def simple_tracer(frame, event, arg):
    print(frame.f_code.co_name, event)
    return simple_tracer  # returning a trace function enables local (per-line) tracing
There exists global_trace and local_trace. Local tracing has a significant performance impact. So we can customize which functions we want to trace and which we want to ignore.
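A sketch of that selective tracing: the global tracer returns a local tracer only for the functions we care about, so the expensive per-line callbacks are skipped everywhere else (function names are mine):

```python
import sys

events = []

def local_trace(frame, event, arg):
    # Per-line tracing: this is the expensive part.
    events.append((frame.f_code.co_name, event, frame.f_lineno))
    return local_trace

def global_trace(frame, event, arg):
    # Called once per function call; only opt in to local tracing
    # for the functions we actually want to watch.
    if frame.f_code.co_name == "interesting":
        return local_trace
    return None

def interesting():
    x = 1
    return x

def boring():
    return 2

sys.settrace(global_trace)
interesting()
boring()
sys.settrace(None)

print(events)  # only events from interesting(); boring() was never traced
```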
Multithreading?
threading.settrace()
- Must be called as early as possible - or you’ll miss threads
- Doesn’t cover the underlying ‘thread’ module and other low-level implementations
- gevent/eventlet
- Global tracing function will be shared among greenlets
How to build a debugger?
Inherit from Bdb
from bdb import Bdb

class Debugger(Bdb):
    def __init__(self):
        Bdb.__init__(self)
        self.breakpoints = dict()
        self.set_trace()
- Add set_breakpoint
def set_breakpoint(self, filename, lineno, method):
    self.set_break(filename, lineno)
    try:
        self.breakpoints[(filename, lineno)].add(method)
    except KeyError:
        self.breakpoints[(filename, lineno)] = {method}

def user_line(self, frame):
    if not self.break_here(frame):
        return
    (filename, lineno, _, _, _) = inspect.getframeinfo(frame)
    methods = self.breakpoints[(filename, lineno)]
    for method in methods:
        method(frame)
Performance Testing
What will be the performance impact?
Benchmarks
- Multiple scenarios with each implementation
- test w/o debugger
- test w/ debugger but no breakpoints
- test with a breakpoint in a different file
- test with a breakpoint in the same file
def empty_method():
pass
def simple_method():
pass
Performance optimization
- Avoid local tracing
- Optimize “call” events
- Optimize “line” events
We forked bdb and optimized it. Much better performance, but still a big performance hit with a breakpoint in the same file.
Used Cython and gained a further performance boost, to 2.5x.
I was getting desperate. Stopped and thought about what I was doing.
Insights
- bdb is naive
- Performance may be improved by a significant margin
- becomes gradually harder to improve
- What happens if we set an empty tracer?
- 4x slower
- Turning on tracing, sets up CPython for extra work
- Some of this is in Python
- Some comes from C: maybe_call_line_trace (eval.c:4384)
- So what did we do?
- Give up.
Python Bytecode
Rookout way
- Go to the byte code
- Find the line of code
- Insert our breakpoint
- The interpreter doesn’t care about our breakpoint
- Defer to the other debuggers when our bytecode gets called
Tools for Bytecode manipulation
- Python Standard Library (read-only)
- inspect
- dis (short for “disassemble”)
- There is no way to write bytecode in memory
- Google’s cloud-debug-python
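A read-only peek with the stdlib tools above: dis.findlinestarts() maps bytecode offsets back to source lines, which is exactly the information a breakpoint-in-the-bytecode approach needs:

```python
import dis

def sample():
    x = 1
    y = x + 2
    return y

# (bytecode offset, source line) pairs: where each source line's
# instructions begin inside the compiled code object.
starts = list(dis.findlinestarts(sample.__code__))
lines = sorted({ln for _, ln in starts if ln is not None})
print(lines)  # the source lines covered by sample()'s bytecode
```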
After you break?
>>> inspect.getframeinfo(inspect.currentframe())
def test_vars():
    mystr = "mystr"
    mydict = {'foo': 'bar'}
    mylist = [1, 2, 3]
    print(inspect.currentframe().f_locals)
Use cases
- Show off your Python skills :)
- Get source information (logging module)
- Walk up the stack (for profiling)
- Build a debugger
- https://github.com/Rookout/pycon-debugging-internals
- https://www.rookout.com/