PyCon 2018 - Day 1
Code Reviews using Art Critique Principles
Code Reviews / Art Critiques
Produce the best results possible and teach students how to learn and to take criticism
Enter the Code Review Rage Quit
People love to flip desks. “We are not doing code reviews anymore”. We want to believe that we’re the best possible developers that we can be.
We don’t realize that when we stare at the same stuff over and over, we miss things.
I can fix this. It’s not a trap! Why do they always think it’s a trap.
I want to look at this with a unique perspective of my art background.
What is this methodology about? Art critiques have been around for hundreds of years. If you can’t talk about your work, or if you can’t talk about other people’s work, you don’t understand what’s going on.
If you can’t talk about your code, it’s highly likely that you have just gone to StackOverflow, cut and pasted some gibberish without understanding what you’re doing. That’s bad.
We don’t make mistakes, we just have happy accidents [picture of Bob Ross].
Goals?
- Happier devs
- Improved morale
- Don’t want to deal with processes that are abusive
- Better code
- Better code reviews
- Less stress
Rules
Process! Without rules, there is chaos. If you know what is expected, you know what to expect.
- Respect
- Be prepared (don’t want to waste someone’s time). Code review is not time for mentorship. Code review is time to be productive.
- Stay neutral where being neutral is required. Leave negativity out of it.
- Constructive criticism only.
- Stay on point and focused. You can’t read code if you’re doing other things.
- Mandatory participation. We all just want to talk about our own projects, but you have to contribute to everyone’s reviews. The person doing the review has to read the code… and actually understand it. A “thumbs up” is not a code review. It doesn’t give you any proof that the code was reviewed, and it doesn’t carry any shared responsibility.
The only time you need to go over the rules again is when you on-board someone.
Pre-Review
- Identify your work.
- Run ALL relevant tests.
- Know your code!
- Check that the code meets all requirements.
- Do not waste the time of others with unprepared code.
Code Crit
Code Critique Stages
- Description
- Identify
- Inventory (proof that we have read it!)
- What is it? What format do I make it? Where does it go? You just want to make me do busy work.
- Write it however will work best for your company.
- What do I do with it? If there is something wrong with the code, send it back to the person who wrote it.
- Neutral
- Analysis
- New Code + Old Code
- Facts needed for critical interpretation (write down issues right now)
- Interpretation (Description + Analysis)
- What do all of these observations mean
- In a constructive manner, make suggestions
- Leave the ego at the door
- Acknowledge the negative
- Contribute positive solutions
- Not a compliment sandwich.
- Be clear
- Provide solutions to fix the problem
- Use the notes from your inventory
- Contribute to a better environment
- No malice!
- When a person walks away from a code review beaten down, it lowers morale
- Judgement
- Can this code ship?
- Our code is important to us. It’s what we do. We should be paying attention to what we do.
- Can this code ship?
Strategies to Edit Production Data
Julie Qiu, Engineering Lead for Spring (fashion ecommerce marketplace)
At some point, we all need to edit data in production. There are many ways to do it, some better than others. These are processes that evolved at the company where I work.
Why we edit production data
We’re all told that it’s bad practice. And then we go and do it anyway. Why? It turns out, there are legitimate reasons.
Internal tools are not available. I would have used another method, but it didn’t exist.
Edge cases exist. I tried to use the tool, but it didn’t work.
Time-sensitive changes are needed. We edit data in production because, it’s usually the fastest and easiest (albeit not the safest) method available.
I want to run this query:
UPDATE products SET name='julies-product' WHERE id=1
It’s Friday night. Marketing needs me to make this change. They promised that something was going to be done. All of the other engineers have left. I could Slack someone and have them check out my SQL, but… I don’t want to bother someone on Friday night. What’s the worst thing that can happen?
What I actually ran:
UPDATE products SET name='julies-product'
“My God. What have I done!”
We have real-life disaster stories just like this from all over the valley. They happen. And they happen not because we are bad engineers.
Safer strategies
- Raw SQL spreadsheet
- Local scripts
- Existing server
- Task runner
- Script runner service
How it works. What’s great. What’s not great.
Develop a review process for manual edits
Just a process to make sure we get spot checks.
What we did was simply maintain a Google spreadsheet to record and review queries before running them
- Add a record to the spreadsheet
- name, date, description, reviewer
- Reviewer approves or requests changes
- Run the query
- Use a transaction
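Roughly what the “use a transaction” step might look like from Python — this is my sketch, not from the talk; psycopg2, the connection string, and the row-count check are all assumptions:

import psycopg2

conn = psycopg2.connect("dbname=shop")  # assumed connection string
try:
    with conn.cursor() as cur:
        cur.execute(
            "UPDATE products SET name = %s WHERE id = %s",
            ("julies-product", 1),
        )
        # A forgotten WHERE clause shows up here, before anything is committed.
        if cur.rowcount != 1:
            raise RuntimeError(f"expected 1 row, matched {cur.rowcount}")
    conn.commit()
except Exception:
    conn.rollback()
    raise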
What’s great?
- Easy to implement
- Audit trail
- Teaches people the right thing to do
What’s not great?
- Easy to make mistakes
- Audit trail is voluntary (it’s only as complete as people’s discipline to fill in the spreadsheet)
- Difficult to run long and complex logic
Local scripts
How it works?
- Write a script
- Run the script (but do a dry-run)
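A rough sketch of the dry-run pattern — everything here (names, query, flag) is illustrative, not from the talk:

import argparse
import psycopg2  # assumed driver

def fix_product_names(conn, dry_run=True):
    with conn.cursor() as cur:
        cur.execute("SELECT count(*) FROM products WHERE name IS NULL")
        count = cur.fetchone()[0]
        print(f"{count} products would be updated")
        if dry_run:
            return  # report the impact, change nothing
        cur.execute("UPDATE products SET name = 'unknown' WHERE name IS NULL")
    conn.commit()

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--execute", action="store_true",
                        help="actually apply the change (default is a dry run)")
    args = parser.parse_args()
    conn = psycopg2.connect("dbname=shop")  # assumed connection string
    fix_product_names(conn, dry_run=not args.execute)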
What’s great?
- Reusable (with different arguments)
- Easy to manipulate outputs
- Access to common code (easy to import functions, reuse logic, etc)
What’s not great?
- Easy to make mistakes
- Logs are only available locally
- Network disconnections
- What do we do if the scripts take a really long time to run?
- We have 50 million products in our database… what if we want to do something to all of them?
- I used to come into work really early and start my script and wait all day
- Would it be nice if we had a computer on all the time?! (Duh!)
Existing server
How it works?
- Write a script
- Get the script onto the server
- ssh and run inside a session
What’s great?
- Ability to run long scripts
- Reliable network connectivity (generally)
- Infrastructure already exists
What’s not great?
- Scripts can affect resources on that server
- Not user friendly (deployment process can be a pain)
- No persistent audit trail
Task runner
How it works?
- Write a script
- Code review and run tests
- Input arguments and run (from the Jenkins interface)
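Roughly what such a script might look like so the Jenkins form maps cleanly onto arguments — names and parameters here are assumptions, not the talk’s code:

import argparse

def main():
    parser = argparse.ArgumentParser(description="Rename a product")
    parser.add_argument("--database-url", required=True,
                        help="filled in from the Jenkins job form")
    parser.add_argument("--product-id", type=int, required=True)
    parser.add_argument("--name", required=True)
    args = parser.parse_args()
    print(f"Updating product {args.product_id} to {args.name!r} on {args.database_url}")
    # ... connect to args.database_url and run the update ...

if __name__ == "__main__":
    main()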
What’s great?
- All the repetitive things are done once. Logging built in.
- Persistent audit logs
- Can watch the task run from the console
- Code review and automated tests
- User interface
What’s not great?
- Hard to manage credentials
- Environments are not clearly separated
- Could easily accidentally use the production db URL but input a bunch of parameters from dev
- Inputs are not verified
Script runner service
How it works?
User Interface -> Script Runner (dev)  -> Database (dev)
              \-> Script Runner (prod) -> Database (prod)
User just had to select an environment rather than input every argument into the script
What’s great?
- Centralized config management
- Separation of environments
- User friendly interface
More that we can do
- Parallelize and scale
- Preview results
- … up to you to customize
What’s not great?
- Nothing. We’ve done it all
Which strategy is right for you?
Depends on your willingness to invest in infrastructure. Depends on your time available.
Make something usable, because engineers are also people!
Invest the effort, it’s worth the cost.
WHAT IS THIS MESS? (Writing tests for pre-existing code bases)
I work for spinifexgroup. We write code that powers crazy contraptions.
What is Legacy Code?
Old? Not necessarily the case. Old != Legacy. Legacy code is:
- Lacking tests
- Lacking documentation (of every kind)
- Difficult to read or reason about
If your code matches 1+ of these things, it is legacy.
Why shouldn’t we test code
- It will be replaced eventually
- NO IT WON’T… maybe it will, but not right now.
- It was written to fill a critical business need. Otherwise they wouldn’t have spent the time to write it.
- It has been working just fine
- HAS IT?
- Legacy code usually has poor logging or error handling, so there could be things going on that you don’t know about.
- I do not have time
- It takes more time and money to debug issues in the wild
- I’m lazy
- LAZY means being too lazy to fight fires, not being too lazy to write tests
- Why do I have to be the one to start?
- Because you’re a good person.
Let’s do this
- My tests will never cover this code base
- One rule: test what you touch
- Write a test that asserts current behaviour BEFORE YOU TOUCH CODE (see the sketch after this list)
- Write a test that asserts new/modified behaviour and LET IT FAIL
- Do code stuffs
- Test passes
- The first test is the hardest test
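A minimal sketch of that first “pin down current behaviour” test — module and function names are made up for illustration:

from legacy import crazy_sauce  # hypothetical function under test

def test_crazy_sauce_current_behaviour():
    # Assert whatever the code does *today*, even if it looks wrong,
    # so that any change you make later shows up as a failing test.
    assert crazy_sauce([1, 2, 3]) == [1, 2, 3, "sauce"]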
Choose Thy Weapons
- Find a good test runner (pytest)
- Use pre-existing tools for your framework (pytest-django)
- pytest-socket to disable socket JUST IN CASE
- pytest-cov to measure coverage
- factory_boy to avoid boilerplate
Requirements file: .in (unpinned)
pytest.ini: django_settings, etc.
Assignment 1 (feature request)
Update “Quark” mode to conditionally determine the value of the “magic” property.
Everyone who knew what “quark” and “magic” meant has left the company.
Where did we go wrong? Nothing is wrong with unit tests, but we should test the FUNCTIONALITY of legacy applications. We may need to do integration testing.
Don’t go nuclear. When dealing with legacy code, there’s always a reason, even if that reason no longer exists.
Assignment 2: Dealing with code that touches the world
Get back a new array with crazy sauce on top of it.
Let’s read the code, but not touch it.
Where to start? Just try to get the class into a test harness. pytest-socket blocked e-mail connection. We can’t even instantiate this thing.
Breaking dependencies… what is this tangled web of madness
Subclass it. Override the smtp object and mock it. But we want to test the view. And it fails.
Mocking helps us get past the SMTP problem.
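Something along these lines — the class and method names are invented; the point is patching smtplib so the legacy code can even be exercised under pytest-socket:

from unittest import mock
from legacy.reports import ReportSender  # hypothetical legacy class

def test_send_report_without_a_real_smtp_server():
    # Patch target may need to be "legacy.reports.SMTP" instead, if the
    # legacy module does `from smtplib import SMTP`.
    with mock.patch("smtplib.SMTP") as fake_smtp:
        sender = ReportSender()       # would normally open an SMTP connection
        sender.send_report("weekly")  # the logic we actually want to exercise
        fake_smtp.return_value.sendmail.assert_called_once()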
Making the world a better place. Adding docstrings and comments is the safest change you can ever make.
Hold up now… that thing shouldn’t be mutating data. But other code may already rely on that mutated argument, so fixing it could break things. We should just fix the broken thing. The catch: we cannot change code without tests, and we cannot write tests without changing code.
Sometimes it makes sense to change code to make it more testable, but ONLY if we can do that without breaking the interface.
E.g. let’s pass in the smtp object.
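A sketch of that seam, keeping the interface intact for existing callers — same invented class as above, not the speaker’s code:

import smtplib

class ReportSender:
    def __init__(self, smtp=None):
        # Existing callers still get the real SMTP client by default;
        # tests can pass in a fake and never touch the network.
        self.smtp = smtp or smtplib.SMTP("localhost")

    def send_report(self, body):
        self.smtp.sendmail("reports@example.com", "ops@example.com", body)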
Assignment 3
Adding a new feature to an unruly process. Add SMS reporting.
Create a legacy (the good kind)
CI/CD
Pushing code should trigger a build that runs the tests. Builds should fail if tests fail. Builds should fail if test coverage drops.
Keep testing! Start testing things that you don’t “need” to test. Test sprints. Brain refreshment.
Now go forth and test things!
Adapting from Spark to Dask: what to expect
Irina Truong
Backend Engineer at Parse.ly
To Dask or not to Dask? You may be considering a journey…
- Is it possible?
- Is it worth it?
- What about performance?
78 more lines of Java stack trace from exceeding memory limit
Job aborted due to stage failure
Driver stacktrace: (and then nothing)
No more Java heap space.
I am here to help you set realistic expectations.
What is Spark?
- General data processing engine
- Java, Scala, Python, R
- Batch processing, stream processing, SQL engine
What is Dask?
- Parallel computing library for analytic computing
- Python only
- Written in Python, on top of C and FORTRAN libraries (NumPy, Pandas)
- Can run on Hadoop YARN, EC2, Mesos, or a local cluster
So I had this Spark application…
Input: Parquet records with customer, url, referrer, session_id, timestamp
Output: aggregated JSON records (a JSON dict with a special ID to index into ElasticSearch) showing how many people came from which referrers
I wanted to use a real example b/c most Dask examples are abstract and dry.
Input data structure is partitioned into columns
Aggregate and transform
Round timestamps down to an hour
# Group by customer, url and rounded time period
COUNT_VALUES(referrer) AS referrers # a user-defined aggregation function
I had to write it in Scala, build a jar, give it to the Spark runner, register it with Python, etc., etc.
Aggregate with Dask
Dask uses the same syntax as Pandas
df['ts'] = df['ts'].dt.floor('1H')
Dask needs two custom aggregations
import dask.dataframe as dd

counter = dd.Aggregation(
    'counter',
    lambda s: s.apply(counter_chunk),   # chunk step, applied per partition
    lambda s: s.apply(counter_agg),     # combine step, merges the chunk results
)
Two levels of hierarchy in the multilevel columns
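To make the snippet above hang together, my guess at the missing pieces — the chunk/combine helpers and the groupby call (not the speaker’s actual code; it builds on `df` and `counter` from above):

from collections import Counter

def counter_chunk(values):
    # values: one group's referrers within a single partition
    return Counter(values)

def counter_agg(chunks):
    # chunks: that group's per-partition Counters; merge them
    result = Counter()
    for c in chunks:
        result.update(c)
    return result

agg = df.groupby(["customer", "url", "ts"]).agg({"referrer": counter})
# agg ends up with two-level columns such as ("referrer", "counter"),
# which is where the multilevel columns come from.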
Transformation:
Performance?
Benchmark on local laptop and YARN cluster (Amazon EMR)
Cluster deployment with Spark (AWS)
github.com/rq/rq and a UI to schedule jobs remotely
Cluster deployment with Dask (Kubernetes). Amazon Elastic Container Service for Kubernetes (EKS) is in beta.
Promises
- No more Java heap space
- Or containers killed by YARN
- Meaningful stack traces
- Real-time visual feedback
- No separate profiling step needed
Can see in real time how workers are doing and what their resource usage is.
The bad:
- Cluster deployments are WIP (dask-ec2, knit, skein, kubernetes)
- Does not support SQL
- Only simple aggregations (no collect_list or nunique)
- Less docs and examples
- Can’t write JSON
- More bugs
The full code: < 250 LOCs
TODO
I wanted to go to the following talks this day, but had a scheduling conflict:
- Taking Django Async
- Trio: Async concurrency for mere mortals
- Love your bugs
- All in the timing: How side channel attacks work
- Bowerbirds of Technology: Architecture and Teams at Less-than-Google Scale