Morning Keynotes

No notes per se, aside from the fact that these two speakers told a powerful story from two perspectives.

Just check out the videos when they become available on pyvideo (they are not, as of this writing).

Releasing the World’s Largest Python Site Every 7 Minutes

Shuhong Wong, Production Engineering Manager at Instagram (IG)/FB

State of server release

Release server code 70-100 times daily, every 7 minutes at peak

Inspiration for your continuous deployment system

Build package -> (Run Test) -> (Run Canary) -> (Take lock) -> (Notify Authors) -> (Track Deploy Start) -> (Parallel Deploy) -> (Track Deploy End) -> (Release Lock)
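As a rough illustration, the stages above could be sketched like this (placeholder functions only; this is not IG's actual tooling):

import contextlib

def build_package(rev):      return f"pkg-{rev}"
def run_tests(build):        return True
def run_canary(build):       return True
def notify_authors(rev):     print(f"deploying {rev}")
def parallel_deploy(build):  print(f"rolling {build} to the fleet")

@contextlib.contextmanager
def deploy_lock():
    yield                                   # take lock ... release lock

def deploy(rev):
    build = build_package(rev)
    if not (run_tests(build) and run_canary(build)):
        return "ABORT"                      # safe default: abort rather than prompt a human
    with deploy_lock():
        notify_authors(rev)
        parallel_deploy(build)              # deploy-start/deploy-end tracking omitted
    return "OK"

deploy("abc123")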

  • Deployment script matured over time
  • Lots of features - we kept building
  • Deploying IG for many years
  • Laid the foundation to how we do deployment at IG

Push script

  • Improvements to the script came organically
    • Anyone can deploy code
    • encourage authors to take ownership
    • Human is the weak link
    • Can push doesn’t mean will push
    • inconsistent human response to options
    • “Tests failed, do you want to continue? (y/N) y”

Deploy automation

  • Do you want to run tests before pushing? (Y)
  • “test fails, do you wish to continue pushing?” (N)
  • No human input needed when everything is working with safe defaults (ABORT/ABORT)
  • Post commit hook… deploy script (auto defaults) = Continuous Deployment!
  • Deploy every single commit consistently and as soon as it lands
  • CD became a service
    • fewer people know how to deploy and revert a change to the site!
    • back to deployment team to unblock when the build fails
  • DO NOT BREAK TRUNK
    • The people who notice the breakage are not the people who caused it
    • Land blocking
    • Immediately deploy after

Land blocking

  • Ensure commit is production worthy before allowing to land
    • Commit author owning the change
    • fewer incidents of a broken trunk
    • everyone moves faster
  • Commit pushed to prod within 1 hour of the commit landing
    • engineers can be expected to be around to support the change
  • Before landing, commit lock
  • After landing, deploy lock
  • What happens when an error slips past test and canary
  • Tests and canary are not bulletproof
    • Need volume

Deploy in phases

  • Canary was our last line of defense; it has its limits
  • Add the c1 tier
  • C1, C2, C3, running in parallel, pipelined.
  • Deploy script is very complex
    • Massive coordination needed
  • Still have a post commit hook
    • pushes to db
    • controller makes a decision and promotes a version to a tier
    • c1, c2 runners
    • broadcast the instructions to roll() and deploy to all the machines
      • code knows how to pull the package, build it, and deploy
  • Now have pipelined deployment system
    • c1 became a second line of defense

Deploy as fast as we can

  • Why?
    • Engineers are around to support their changes
    • Our engineers don’t slow down for us
    • Better productivity and safety
  • 1 commit in queue
    • deploy
  • 2
    • still can happen in an hour
  • 20 commits?
    • Can deploy 8 / hour, need to deploy 3 at once
  • Can only deploy at the speed of allowed capacity loss
    • Also deploy at peak traffic periods
    • can only use idle compute power to reload the web server
  • uWSGI
    • Fork a new master
    • Shutdown idle worker
    • Spawn worker on the new master
    • eventually move all workers to new master
    • Shutdown old master
  • servers serve traffic the whole time during a reload (see the sketch after this list)
  • We can’t limit the volume of commits
    • how fast we deploy affects… how many deployments we can do in 1 hour…
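The reload trick above can be sketched as a toy model (hypothetical classes, not uWSGI's real API): workers migrate one at a time from the old master to the new one, so serving capacity only ever drops by the idle headroom being spent.

class Master:
    def __init__(self, build):
        self.build = build
        self.workers = []

    def spawn_worker(self):
        self.workers.append(object())

    def shutdown(self):
        self.workers.clear()

def rolling_reload(old_master, new_build):
    new_master = Master(new_build)           # fork a new master with the new code
    while old_master.workers:
        old_master.workers.pop()             # shut down one idle worker on the old master...
        new_master.spawn_worker()            # ...and respawn it on the new master
    old_master.shutdown()                    # only then retire the old master
    return new_master                        # traffic was served the whole time

old = Master("build-1")
for _ in range(4):
    old.spawn_worker()
new = rolling_reload(old, "build-2")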

Fully autonomous fleet

  • Used to be 100% homogeneous
  • Sometimes py2/py3
  • C1/C2
  • Different config at parts of the fleet
  • Different runner for each type of config
  • replace c1/c2 with cache?

The “North Stars”

  • Do simple things first
  • Do what is enough for us to scale
  • Push as fast as you can
    • smaller batches, turns out to be more stable for production
  • Build a culture around testing
    • There is no CD without QC and testing
    • Type checking
    • 100% test coverage
    • Continuously asking why something wasn’t tested

Questions

  • Does every problem show up within 6 minutes? Aren’t there some things that take say 6 hours?
    • At IG scale, everything shows up within a minute. The Canary is the safeguard. Catch 1-2 bad commits per day. If we missed it in the pipeline, we iterate and improve.
  • Alpha, beta, production staging?
    • No. The best signal comes from actual code hitting production. When it hits every user, the sooner we get that signal back, the better. I wouldn’t reject that thought, but this model works very well with us.
  • How many services?
    • This is instagram.com which is a monolith. Everything. There are other microservices, but they have their own release cycles.
  • How much do engineers depend on local tests, vs. canary and pushing to commit?
    • No data. Have a hunch…
  • Do engineers also use feature flagging?
    • Not introduced… want to expand our current system to
  • How do you deploy your deployment system?
    • Continuously. It breaks, but we like to know as soon as it breaks. It’s under control and we can roll it back fast. 20 minutes downtime, we can move on.
  • How the uWSGI master process works.
    • Watch the talk from PyCon Australia

Dependency hell: a library author’s guide

Yanhui (Angela) Li, Brian Quinlan

Package, distribution, PyPI, Internal

Introduction/Motivation

Welcome to hell.

These are all real examples

$ pip install apache_beam tensorflow

tensorboard 1.11.2 has requirement protobuf>=3.4.0, but you'll have protobuf 3.3.0 which is incompatible.

As a user, what do I do? I didn’t ask for this and I don’t care about it. So I can assume that someone at apache is a liar. Or I can just quit and not use tensorflow.

Incompatibilities in the top PyPI packages: 7 of 100

Most of these packages are foundational and don’t have any dependencies! This is untenable!

The big problem is that this is one of those situations where a user encounters an error and can’t do anything about it.

Technical Details

Diamond dependency

My App -> Apache Beam (protobuf<=3.3.0) -> protobuf
My App -> TensorFlow  (protobuf>=3.6.1) -> protobuf

The dependency constraints come from the setup.py of each dependency.

Dependency Issues

Common things that lead to diamond dependencies.

Never release a breaking change in a minor release; that’s the most common source of diamond dependency problems.

E.g. oauth2client removed API between v4.1.1 -> v4.1.2

  • Version number isn’t communicating changes accurately.
  • Hard for the users to depend on the latest version of oauth2client
  • Hard for users to specify dependency ranges

Breaking changes in a minor release force the user to use version pinning in their requirements. But later, that pinned version can make their package incompatible with any other package that requires a different version (see the sketch after the list below).

  • Version range too narrow (e.g. > 1.3.0, < 1.4.0)
  • Use too broad a range and there are too many versions to support (> 1.2.0, < 1.0.9)
  • Not supporting the latest version
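As an illustration, a library’s setup.py constraints might look like this (the package name and pins are made up):

from setuptools import setup

setup(
    name="mylib",
    version="1.2.3",
    install_requires=[
        # too narrow: forces a diamond conflict with anything needing protobuf >= 3.4
        # "protobuf>=3.3.0,<3.4",
        # better: allow everything up to the next breaking (major) release
        "protobuf>=3.4.0,<4",
        "six>=1.10",   # six has effectively never shipped a breaking change
    ],
)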

Use outdated dependencies

  • If your package can’t work with the latest version, then it will be widely incompatible
  • Missing bug/security fixes

Best Practices

  • Everything isn’t hopeless
  • What to do?!
    • Use semantic versioning
    • Best thing about this? You don’t have to coordinate with anyone. You just choose a version and do it.
    • Advertise to your users that you use semver
    • Avoid API churn
      • Use tensorflow -> tensorflow makes breaking change -> code breaks -> refactor code -> use tensorflow
    • Make reasonable constraints
      • E.g. ‘six>=1.10’ (they have basically never released a breaking change)
      • If six had used semver you could say ‘six>=1.10,<2’
    • Once you stop pinning versions, you have more testing to do
@nox.session(python=['3.4', '3.5', '3.6', '3.7'])
@nox.parametrize(
    'min_version', ['Jinja2==2.9.0',
                    'Pillow==5.0.0',
                    ...])
def compatibility_test(session, min_version):
    session.install(min_version)
  • Support new dependency versions
    • new packages will use the latest version
    • important to pick up security fixes

Make users happy!

  • Use semver
  • Avoid API churn
  • Support as large a range as possible
  • Support the latest versions of your dependencies

Questions?

  • How do automated dependency tools fit into this
    • $ pip check will check your virtualenv
  • Is the impossible conflict common b/c of how Python does dependency management
    • sdists. In general, you don’t know a package’s transitive dependencies until you download it and run the setup script
    • For a Python-only wheel, the PyPI API will tell you what the dependencies are. Then you could have more sophisticated tools for resolution. I was an advocate of that until I talked to other people who hate wheels.

Advanced asyncio: Solving real-world production problems

Lynn Root

SRE at Spotify @roguelynn

Builds infrastructure for people who write machine learning models for signal processing. FOSS advocate at Spotify. PyLadies.

Agenda

  • Initial setup of Mayhem Mandrill
  • Dev best practices
  • Testing, debugging, profiling

https://rogue.ly/adv-aio

Intro

Simple illustrations are not very helpful. Basically souped up hello world examples

Some help you get up and running, but then you realize that you’re doing it wrong

I’m not building “web crawlers” at spotify

I’m building services that make a lot of HTTP requests that have to be non-blocking

Pub/sub, handle errors, service-level metrics. Need to work with non-asyncio-compatible dependencies.

Example

Service that does periodic hard restarts

Chaos monkey -> Mayhem Mandrill

Listen for pub/sub message and restart the host based on that message

Initial setup

Not using await here; creating a task instead. create_task returns the task, but it’s used as fire-and-forget.

Consumer

Concurrent work

Store the message in a DB for later replaying

Restart and save don’t depend on one another, but maybe you do want them to happen serially, e.g. only restart hosts that have an uptime > 7 days. Serial code with dependencies doesn’t mean that it can’t be async.

Block when needed; put logic in a separate coroutine
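Roughly, the consumer/handler split described above might look like this sketch (hypothetical names, not the talk’s actual Mayhem Mandrill code):

import asyncio

async def restart_host(msg):
    await asyncio.sleep(0.5)          # pretend to restart the host
    print(f"restarted {msg}")

async def save(msg):
    await asyncio.sleep(0.5)          # pretend to persist the message
    print(f"saved {msg}")

async def handle_message(msg):
    if msg.endswith("-old"):
        # serial because of a dependency (e.g. only restart long-uptime hosts),
        # but still non-blocking for everything else that is running
        await save(msg)
        await restart_host(msg)
    else:
        # restart and save don't depend on each other: run them concurrently
        await asyncio.gather(restart_host(msg), save(msg))

async def consume(queue):
    while True:
        msg = await queue.get()
        # fire-and-forget: don't block the consumer while a message is handled
        asyncio.create_task(handle_message(msg))

async def main():
    queue = asyncio.Queue()
    asyncio.create_task(consume(queue))
    for host in ("host-1", "host-2-old"):
        await queue.put(host)
    await asyncio.sleep(2)            # let the toy example finish

asyncio.run(main())

The consumer never awaits handle_message directly, so a slow restart never blocks pulling the next message off the queue.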

Finalization tasks

cleanup, ack message. handle message

unblocking the finalization tasks

  • async != concurrent
  • serial != blocking

It is a mental paradigm shift. Think about what you can farm out and what you cannot.

Graceful shutdowns

Clean up database connections, finish current request, while not accepting new. Respond to signals

Responding to signals

Attach a signal handler to the loop.

Which signals to care about?

Mmmmmm, no standard. All of them.

not-so-graceful asyncio.shield

  • try/except/finally isn’t enough
  • define desired shutdown behaviour
  • use signal handlers (see the sketch after this list)
  • listen for appropriate signals
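A minimal sketch of that pattern, assuming Python 3.7+ (the shutdown behaviour here — cancel everything, wait, then stop — is just one possible policy):

import asyncio
import signal

async def shutdown(sig, loop):
    # one possible shutdown policy: cancel outstanding tasks, wait, then stop
    tasks = [t for t in asyncio.all_tasks() if t is not asyncio.current_task()]
    for task in tasks:
        task.cancel()
    await asyncio.gather(*tasks, return_exceptions=True)
    loop.stop()

def install_signal_handlers(loop):
    # listen for the signals you care about ("all of them")
    for sig in (signal.SIGHUP, signal.SIGTERM, signal.SIGINT):
        loop.add_signal_handler(
            sig, lambda sig=sig: asyncio.create_task(shutdown(sig, loop)))

# usage: loop = asyncio.get_event_loop(); install_signal_handlers(loop); loop.run_forever()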

Exception handling

We haven’t done that yet.

We can use a global exception handler and attach it to the loop (see the sketch after the list below).

Specific handlers?

  • return_exceptions=True is super imperative
  • asyncio.gather with return_exceptions=True has deterministic ordering
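A small sketch combining both ideas, with an illustrative failing task (the global handler is installed as a safety net; gather with return_exceptions=True hands failures back as ordinary values, in input order):

import asyncio
import logging

def global_exception_handler(loop, context):
    # context["exception"] may be missing; fall back to the message
    exc = context.get("exception", context["message"])
    logging.error("Caught unhandled exception: %s", exc)

async def might_fail(i):
    if i == 2:
        raise ValueError(f"task {i} blew up")
    return i

async def main():
    loop = asyncio.get_running_loop()
    loop.set_exception_handler(global_exception_handler)   # global safety net

    # return_exceptions=True: failures come back as values, in input order
    results = await asyncio.gather(*(might_fail(i) for i in range(4)),
                                   return_exceptions=True)
    for result in results:
        if isinstance(result, Exception):
            logging.error("handling %r", result)

asyncio.run(main())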

Threads

Sometimes you have to work with them, and I’m sorry if you do (a sketch follows the list below).

  • running coroutines from other threads
  • ThreadPoolExecutor: calling threaded code from the main event loop
  • asyncio.run_coroutine_threadsafe: running a coroutine on the main event loop from another thread
    • I was deadlocking myself in production before I realized this
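A short sketch of both directions (blocking_io is a stand-in for whatever non-asyncio-friendly library you’re stuck with):

import asyncio
import concurrent.futures
import time

def blocking_io():
    # stand-in for code from a non-asyncio-friendly library
    time.sleep(1)
    return "done"

async def main():
    loop = asyncio.get_running_loop()
    # ThreadPoolExecutor: call blocking, threaded code from the event loop without stalling it
    with concurrent.futures.ThreadPoolExecutor() as pool:
        result = await loop.run_in_executor(pool, blocking_io)
    print(result)

    # Going the other way -- scheduling a coroutine on this loop from another thread --
    # is what asyncio.run_coroutine_threadsafe is for:
    #   future = asyncio.run_coroutine_threadsafe(some_coroutine(), loop)
    #   future.result(timeout=5)

asyncio.run(main())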

Testing

asyncio.run in py37

pytest.mark.asyncio will do the hard work for you in <py3.7

mocking coroutines

E.g. save() calls another coroutine or it might call a database. You don’t want to wait for that when you’re running tests, right?

unittest.mock and pytest-mock don’t support async mocks (a workaround is sketched at the end of this section).

testing create_task()

Create some mock queue and use the mock_queue fixture

testing the event loop

100% test coverage… how do we get there with main

pytest-asyncio + mocked coroutines
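A sketch of one way to fake an async mock before Python 3.8’s AsyncMock, assuming pytest-asyncio is installed (handle_message is a hypothetical coroutine under test, not the talk’s code):

import pytest
from unittest import mock

async def handle_message(msg, save):          # hypothetical coroutine under test
    await save(msg)

def coroutine_mock():
    m = mock.Mock()

    async def coro(*args, **kwargs):
        m(*args, **kwargs)                    # record the call on the plain Mock
    return coro, m

@pytest.mark.asyncio
async def test_handle_message():
    save_coro, save_mock = coroutine_mock()
    await handle_message("host-1", save_coro)
    save_mock.assert_called_once_with("host-1")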

Debugging

One small thing. Use print_stack(), you’ll see the stack for every running task, and you can increase the number of frames that are printed.

PYTHONASYNCIODEBUG=1

It’s able to tell you if you’re threadsafe!

Acts as a tiny profiler that flags calls slower than 100ms (configurable), highlighting any unnecessarily blocking tasks (see the sketch after this list).

  • In production
    • aiodebug that will log callbacks for you
    • Can report delayed calls to statsd
    • Super lightweight
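Debug mode can also be switched on from code; the 100ms threshold is the loop’s slow_callback_duration. This tiny example deliberately blocks so the warning fires:

import asyncio
import time

async def main():
    # deliberately blocking inside a coroutine; debug mode logs
    # "Executing <Task ...> took 0.20x seconds"
    time.sleep(0.2)

loop = asyncio.get_event_loop()
loop.set_debug(True)                  # same effect as PYTHONASYNCIODEBUG=1 or python -X dev
loop.slow_callback_duration = 0.05    # the 100ms threshold is configurable (default 0.1s)
loop.run_until_complete(main())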

Profiling

Event loop can already track coroutines that take too long. Hard to differentiate a pattern from abnormal behaviour.

cProfile. Nothing stands out except the main event loop.

kcachegrind can be used with python

pyprof2calltree --kcachegrind -i mayhem.prof

Visualization groups modules together by color.

line_profiler package, where we can home in on pieces of our code that are suspicious.

aiologger: allows for non-blocking logging

Live profiling (you don’t want to have to stop the service to look at the results). You can’t attach to a running process, but when you launch with it, you get a text-based UI, and you can save performance data and view it later. There’s a server you can connect to from elsewhere.

Not much difference between profiling async code vs. regular

Does remote work really work?

Lauren Schaefer

Why I work remotely

2008

My boyfriend proposed. I said yes. Life was great.

I was a computer scientist, and he was a nuclear engineer. We did get along. We had our internships.

I interviewed with IBM. Three offers at IBM. Fantastic problem. Nuclear recruiting didn’t start until the next spring. You have to live where the nuclear power plants are.

Awkward situation. I want to live with my husband.

I called these hiring managers up. I can work with you in the office for a year and then I don’t know.

Will you let me work from home? Yes, maybe, no. Easy decision.

My husband went to Maryland. We split physically (not romantically).

Went to cubeland for a year. THE WORST.

After a year of that I was excited to work remotely.

Working remotely can be THE WORST. So many life adjustments can be very challenging.

I went from having lots of local friends to one single set of couple friends. My life changed in a major way and I wasn’t happy about that.

I’d talk to my team once per week for an hour. It was a struggle.

I was in charge of build verification tests. Get on at 6:30 to make sure that the build was ready for 9:00. I had to keep checking in all day long when the test didn’t pass. Found myself very unpassionate about the work.

Working remotely can be THE BEST. Learned how to switch teams. Learned how to speak at conferences. Started a BOF session for remote employees and then started a remote support group for 11k IBM employees.

Worked for SugarCRM for a year and then have been at MongoDB. Don’t ever want to go back to an office.

How to go from the worst to the best.

Why do employees want it

  • Unable to relocate due to spouse’s job or kid’s school
  • Lengthy commutes
  • Availability for children and aging parents
  • Distracting office environments
  • Travel the world

Why do employers want it

  • Attract and retain top talent–no matter where they live
  • Increase employee morale
  • Save $$
    • Smaller/no office space
    • No relocation costs for new employees
  • Increase employee productivity
    • Fewer sick days
    • Shorter breaks
    • Fewer distractions

Research

2019 Stack Overflow Developer survey

40% of devs want to work outside the office

How often do you work remotely? 43% never.

60% more coding experience for those who work remotely full time.

Greatest challenge to productivity

  1. distracting work environment (42%)
  2. meetings (37%)
  3. time spent commuting

2014 scientific study

16k employees were asked who would like to work remotely. The yeses were split into two groups (treatment and control). 13% performance increase in work-from-home employees.

  • more minutes per shift
    • remote employees took fewer breaks and fewer sick days
  • more calls per minute
    • quieter work environment that was more conducive to getting their work done
  • statistically significant work satisfaction
  • 50% reduction in attrition among work-from-home employees
    • huge!
  • Downside: 50% reduction in promotion rate conditional on performance
    • out of sight, out of mind. You’re not going to be considered for promotion
    • work from home, couldn’t develop their interpersonal skills
    • work from home employees didn’t want to go back to the office and wouldn’t put in for it
  • 22% performance increase after allowing employees to choose where to work

2009 Cisco Teleworker Survey

  • productivity and work-life flexibility increased
  • $277 million savings in productivity
  • 47k metric tons of GHGs not released in 1 year due to teleworking

Lit review, meta study (46 studies over 20 years)

  • Clear upside
  • “…no straightforward damaging effects on quality of workplace relationships or perceived career prospects”

Downsides

  • “Professional isolation negatively impacts job performance”
  • Having a remote manager may negatively impact you
  • Remote employees fear stalling careers, isolation, distractions, and blurred lines between work and home life
  • Remote work encourages employees to “overwork and to allow their work to infringe on their family role”
    • remote employees are often paranoid about appearing to have been slacking off

How to convince your boss

“Remote office not required”

Propose an experiment. Sick mother-in-law? Want to travel the world? Be honest. What are we doing and how do we evaluate it? Propose the experiment for all team members.

“But what about collaboration and water-cooler conversations?”

“Water cooler” channel in Slack.

Some managers just have a “gut feeling” that it’s not going to work.

How many wildly innovative ideas can you implement at once? One? Two?

When all else fails, talk about the bottom line. Talk about the reduced attrition rates (not in a threatening way).

Steps

In reverse priority order.

  1. Join the right team (fully distributed)
    1. Know everyone’s communication style, and be comfortable.
    2. Schedule 1-on-1s with each team member
    3. Share a bit about yourself. Be personal.
  2. Be productive.
    1. Do your job.
    2. Set daily goals.
    3. I don’t have anyone else holding me accountable; I have to do that myself
    4. Create a workspace you love. Don’t work on your couch.
  3. Communicate with your team
    1. Be present. Be present when you say you are.
    2. Be a great PR agent for yourself. Be careful of the words you use to describe yourself.
  4. Travel
    1. You have done this to get here today so good job.
    2. If you have a chance to meet your teammates, do it.
    3. Hack the system (arrange a client visit with your teammates, conference presentation).
  5. Actively prevent burnout
    1. Take a lunch break (I ignored my husband who wanted me to skip lunch to finish 30 minutes earlier)
    2. Stretch before meetings (other people don’t show up on time)
    3. Turn off your computer after work (and notifications on your phone). You have to take a step away

Slides are on twitter @Lauren_Schaefer

At the end of the deck there is an appendix of references.

What I wish people had told me about Python’s multiprocessing

Complex multiprocessing example

  • Status subprocesses
  • Observation subprocess
  • Send queue -> send subprocess -> Logging servers
  • Reply queue -> listen subprocesses -> event queue

IoT HVAC Process

  • Main process
  • System status sub-process
  • HVAC Observation Sub-process
  • Send sub-process
  • listen sub-process

Tips

  1. Don’t share data, pass messages
    • when you share data, you have to manage the locks, use messages
    • use multiprocessing queues
      • Great thing… it ships! It’s part of the stdlib
      • If you need to scale, you can swap it out
      • Down side: uses pipes, every message is pickled
    • each queue handles one type of message (with one exception)
    • each process should read from at most one queue
    • refactor later to use other queueing systems
import multiprocessing

send_q = multiprocessing.Queue()
event_q = multiprocessing.Queue()
event_q.put("FOO")

# in another subprocess
event = event_q.get(block=True, timeout=0.05)  # timeout value is illustrative
  2. Always clean up after yourself
    • notify processes to shutdown using “END” messages and a single shutdown_event
      • every one of these, when you’re reading out of a queue, is a loop: get the next job, get the next job
    • All processes: notice shutdown and then clean up after themselves
    • Main process: cleans up subprocesses and queues
while not shutdown_event.is_set():
    try:
        item = work_queue.get(block=True, timeout=0.05)  # timeout value is illustrative
    except queue.Empty:
        continue


def stop_procs(self):
    self.shutdown_event.set()

    end_time = time.time() + self.STOP_WAIT_SECS
    for proc in self.procs:
        join_secs = max(0.0, min(end_time - time.time(), self.STOP_WAIT_SECS))
        proc.join(join_secs)

    for proc in self.procs:
        if proc.is_alive():
            # If it's still alive, it has problems -- log that, then terminate
            proc.terminate()

Always join your threads. Removes a spurious error if you are killing your master and the…

  3. Obey all signals
    • Every process needs to handle both TERM (kill) and INT (Ctrl-C) (and other) signals (see the sketch after this list)
    • Set the shutdown_event the first two times
      • That way during debugging, I can test failure modes
      • If you can shutdown cleanly, you’ve probably thought about and handled all of the race conditions that are possible
    • Third time raise
    • Maybe change system settings
  4. Don’t ever wait forever
    • No process should get stuck
    • Loops must terminate
    • Blocking calls need timeout
    • Timeouts based on how long you can wait
    • With queues, you can tell it to block and give it a timeout
    • Sockets, you can set the timeout when you startup
    • If there’s no timeout, write it yourself
  5. Report, and log all the things
    • Use the Python logging module
    • Use single time relative to application start
    • Include Name of process
    • Must log: Errors, Exception Tracebacks, and Warnings
    • Should log: Start, Stop, Events
    • In DEBUG mode: log a lot, Yeah, log even more.
    • How do I log?
      • start_time = time.monotonic()
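A hedged sketch of the signal-handling tips above (hypothetical names, not the speaker’s actual code): set the shutdown event on the first couple of signals, and only give up and raise on the third.

import multiprocessing
import signal

class SignalObject:
    MAX_TERMINATE_CALLED = 3

    def __init__(self, shutdown_event):
        self.terminate_called = 0
        self.shutdown_event = shutdown_event

def handle_signal(signal_object, exception_class, signal_num, current_stack_frame):
    signal_object.terminate_called += 1
    signal_object.shutdown_event.set()           # first signals: request a clean shutdown
    if signal_object.terminate_called >= signal_object.MAX_TERMINATE_CALLED:
        raise exception_class()                  # third signal: stop being polite

def init_signals(shutdown_event):
    signal_object = SignalObject(shutdown_event)
    signal.signal(signal.SIGINT,
                  lambda num, frame: handle_signal(signal_object, KeyboardInterrupt, num, frame))
    signal.signal(signal.SIGTERM,
                  lambda num, frame: handle_signal(signal_object, SystemExit, num, frame))
    return signal_object

if __name__ == "__main__":
    shutdown_event = multiprocessing.Event()
    init_signals(shutdown_event)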

Conclusion

  1. Don’t share, pass messages
  2. Always clean up after yourself
  3. Handle TERM and INT signals
  4. Don’t ever wait forever
  5. Report, and log all the things

As part of this talk, I wrote a blog post for my company.

pamela@cloudcity.io

@pmcanulty01

CUDA in your Python: Effective Parallel Programming on the GPU

William Horton

@hortonhearsafoo

Moore’s Law is dead

Number of transistors on an integrated circuit will double every two years

Based on data from the 50s - 70s. Maintained until 2016.

Then physics happened. As you get down to smaller levels, nanometer scale, you run into problems with power and heat dissipation.

Gordon Moore himself said in 2015 that Moore’s Law is dead.

GPUs

Developed for gaming. Designed to be good at matrix operations.

Typical workloads…

NVidia, 4352 CUDA cores across 68 streaming multiprocessors. 1.35 GHz Base clock

The GPU devotes more transistors to data processing and less to cache and control

GPU is mostly arithmetic units (ALU).

GPUs are truly general purpose parallel processors.

Different models: CUDA (Nvidia), APP (AMD), OpenCL (open standard maintained by Khronos Group)

About Me

Senior Software Engineer on the Data team at Compass

Real Estate platform.

We work with: PySpark, Kafka, Airflow.

Hobbies include deep learning, fast.ai, Kaggle, PyTorch

“Horton’s Law: Your AWS bill will double every month with your interest in deep learning”

Data Pipelines: Uber uses AresDB for real-time analytics.

How do I start?


import numpy as np

x = np.random.randn(1000000000000000000).astype(np.float32)
y = np.random.randn(1000000000000000000).astype(np.float32)

z = x + y

In CUDA:


import cupy as cp

x = cp.random.randn(1000000000000000000).astype(cp.float32)
y = cp.random.randn(1000000000000000000).astype(cp.float32)

z = x + y

Different approaches

  1. Drop-in replacement
  2. Compiling CUDA strings in Python
  3. C/C++ extension

Increasing complexity, but greater control.

CuPy

A drop in replacement for NumPy

Developed for the Chainer deep learning framework

API differences:

  • Data types: strings and objects
  • numpy.array([some, list]) doesn’t work in cupy
  • reduction methods return arrays and not scalars

More drop ins

  • cuDF
  • cuML

Compiling CUDA strings into your program

CUDA lets you look at your data/threads in multiple dimensions as well. Threads are in a block, so that you can map your processing to your data.

Blocks and Grids

Blocks are groups of threads. Grids are groups of blocks.

Each thread gets assigned to an ALU on the processor

Host and Device: the CPU (host) runs most general-purpose logic and the GPU (device) runs the big parallel operations. You need to specify what code runs on the device vs. the host (see the PyCUDA sketch below).

PyCUDA

Built by researcher Andreas Klöckner at UIUC

Benefits?

  • Auto memory management
    • objects get cleaned up when their lifetimes are over
  • Data transfer: in, out, and inout
    • wrappers around your numpy array to transfer data to and from the GPU
  • Error checking
  • Metaprogramming
    • PyCUDA will benchmark at runtime and optimize at runtime
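Putting the pieces together, a minimal PyCUDA example in the style of its documentation: compile a CUDA kernel from a string and launch it with an explicit block/grid shape.

import numpy as np
import pycuda.autoinit               # noqa: F401  (creates a context on the GPU)
import pycuda.driver as drv
from pycuda.compiler import SourceModule

mod = SourceModule("""
__global__ void add(float *dest, float *a, float *b)
{
    // each thread handles one element; threadIdx/blockIdx map threads to data
    int i = threadIdx.x + blockIdx.x * blockDim.x;
    dest[i] = a[i] + b[i];
}
""")
add = mod.get_function("add")

a = np.random.randn(512).astype(np.float32)
b = np.random.randn(512).astype(np.float32)
dest = np.zeros_like(a)

# drv.In/drv.Out handle the host<->device transfers mentioned above
add(drv.Out(dest), drv.In(a), drv.In(b), block=(128, 1, 1), grid=(4, 1))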

https://github.com/rmcgibbo/npcuda-example

Uses Cython to generate C++

Manual Memory Management

How to start?

Access a GPU

Google Colab

Kaggle Kernels

Cloud GPU Instances (but remember Horton’s law!)

Where to go next?

Applying CUDA to your workflow

Parallel programming algorithms

Other kinds of devices (xPUs like TPUs, FPGAs through PYNQ)


