PyCon 2018 - Day 1
Code Reviews using Art Critique Principles
Code Reviews / Art Critiques
Produce the best results possible and teach students how to learn and to take criticism
Enter the Code Review Rage Quit
People love to flip desks. “We are not doing code reviews anymore”. We want to believe that we’re the best possible developers that we can be.
We don’t realize that when we stare at the same stuff over and over, we miss things.
I can fix this. It’s not a trap! Why do they always think it’s a trap.
I want to look at this with a unique perspective of my art background.
What is this methodology about? Art critiques have been around for hundreds of years. If you can’t talk about your work, or if you can’t talk about other people’s work, you don’t understand what’s going on.
If you can’t talk about your code, it’s highly likely that you have just gone to StackOverflow, cut and pasted some gibberish without understanding what you’re doing. That’s bad.
We don’t make mistakes, we just have happy accidents [picture of Bob Ross].
Goals?
- Happier devs
- Improved morale
- Don’t want to deal with processes that are abusive
- Better code
- Better code reviews
- Less stress
Rules
Process! Without rules, there is chaos. If you know what is expected, you know what to expect.
- Respect
- Be prepared (don’t want to waste someone’s time). Code review is not time for mentorship. Code review is time to be productive.
- Stay neutral where being neutral is required. Leave negativity out of it.
- Constructive criticism only.
- Stay on point and focused. You can’t read code if you’re doing other things.
- Mandatory participation. We all just want to talk about our own projects, but you have to contribute to everyone’s reviews. The person doing the review has to read the code… and actually understand it. A “thumbs up” is not a code review. It doesn’t give you any proof that the code was reviewed, and it doesn’t carry any shared responsibility.
The only time you need to go over the rules again is when you on-board someone.
Pre-Review
- Identify your work.
- Run ALL relevant tests.
- Know your code!
- Check that the code meets all requirements.
- Do not waste the time of others with unprepared code.
Code Crit
Code Critique Stages
- Description
- Identify
- Inventory (proof that we have read it!)
- What is it? What format do I make it? Where does it go? You just want to make me do busy work.
- Write it however will work best for your company.
- What do I do with it? If there is something wrong with the code, send it back to the person who wrote it.
- Neutral
- Analysis
- New Code + Old Code
- Facts needed for critical interpretation (write down issues right now)
- Interpretation (Description + Analysis)
- What do all of these observations mean
- In a constructive manner, make suggestions
- Leave the ego at the door
- Acknowledge the negative
- Contribute positive solutions
- Not a compliment sandwich.
- Be clear
- Provide solutions to fix the problem
- Use the notes from your inventory
- Contribute to a better environment
- No malice!
- When a person walks away from a code review beaten down, it lowers morale
- Judgement
- Can this code ship?
- Our code is important to us. It’s what we do. We should be paying attention to what we do.
- Can this code ship?
Strategies to Edit Production Data
Julie Qiu, Engineering Lead for Spring (fashion ecommerce marketplace)
At some point, we all need to edit data in production. There are many ways to do it, some better than others. These are processes that evolved at the company where I work.
Why we edit production data
We’re all told that it’s bad practice. And then we go and do it anyway. Why? It turns out, there are legitimate reasons.
Internal tools are not available. I would have used another method, but it didn’t exist.
Edge cases exist. I tried to use the tool, but it didn’t work.
Time-sensitive changes are needed. We edit data in production because, it’s usually the fastest and easiest (albeit not the safest) method available.
I want to run this query:
UPDATE products SET name='julies-product' WHERE id=1
It’s Friday night. Marketing needs me to make this change. They promised that something was going to be done. All of the other engineers have left. I could Slack someone and have them check out my SQL, but… I don’t want to bother someone on Friday night. What’s the worst thing that can happen?
What I actually ran:
UPDATE products SET name='julies-product'
“My God. What have I done!”
We have real-life disaster stories just like this from all over the valley. They happen. And they happen not because we are bad engineers.
Safer strategies
- Raw SQL spreadsheet
- Local scripts
- Existing server
- Task runner
- Script runner service
How it works. What’s great. What’s not great.
Develop a review process for manual edits
Just a process to make sure we get spot checks.
What we did was simply maintain a Google spreadsheet to record and review queries before running them
- Add a record to the spreadsheet
- name, date, description, reviewer
- Reviewer approves or requests changes
- Run the query
- Use a transaction
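Roughly what the “use a transaction” step might look like from Python — this is my sketch, not from the talk; psycopg2, the connection string, and the row-count check are all assumptions:

import psycopg2

conn = psycopg2.connect("dbname=shop")  # assumed connection string
try:
    with conn.cursor() as cur:
        cur.execute(
            "UPDATE products SET name = %s WHERE id = %s",
            ("julies-product", 1),
        )
        # A forgotten WHERE clause shows up here, before anything is committed.
        if cur.rowcount != 1:
            raise RuntimeError(f"expected 1 row, matched {cur.rowcount}")
    conn.commit()
except Exception:
    conn.rollback()
    raise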
What’s great?
- Easy to implement
- Audit trail
- Teaches people the right thing to do
What’s not great?
- Easy to make mistakes
- Audit trail is voluntary (it’s only as complete as people’s discipline to fill in the spreadsheet)
- Difficult to run long and complex logic
Local scripts
How it works?
- Write a script
- Run the script (but do a dry-run)
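A rough sketch of the dry-run pattern — everything here (names, query, flag) is illustrative, not from the talk:

import argparse
import psycopg2  # assumed driver

def fix_product_names(conn, dry_run=True):
    with conn.cursor() as cur:
        cur.execute("SELECT count(*) FROM products WHERE name IS NULL")
        count = cur.fetchone()[0]
        print(f"{count} products would be updated")
        if dry_run:
            return  # report the impact, change nothing
        cur.execute("UPDATE products SET name = 'unknown' WHERE name IS NULL")
    conn.commit()

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--execute", action="store_true",
                        help="actually apply the change (default is a dry run)")
    args = parser.parse_args()
    conn = psycopg2.connect("dbname=shop")  # assumed connection string
    fix_product_names(conn, dry_run=not args.execute)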
What’s great?
- Reusable (with different arguments)
- Easy to manipulate outputs
- Access to common code (easy to import functions, reuse logic, etc)
What’s not great?
- Easy to make mistakes
- Logs are only available locally
- Network disconnections
- What do we do if the scripts take a really long time to run?
- We have 50 million products in our database… what if we want to do something to all of them?
- I used to come into work really early and start my script and wait all day
- Would it be nice if we had a computer on all the time?! (Duh!)
Existing server
How it works?
- Write a script
- Get the script onto the server
- ssh and run inside a session
What’s great?
- Ability to run long scripts
- Reliable network connectivity (generally)
- Infrastructure already exists
What’s not great?
- Scripts can affect resources on that server
- Not user friendly (deployment process can be a pain)
- No persistent audit trail
Task runner
How it works?
- Write a script
- Code review and run tests
- Input arguments and run (from the Jenkins interface)
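Roughly what such a script might look like so the Jenkins form maps cleanly onto arguments — names and parameters here are assumptions, not the talk’s code:

import argparse

def main():
    parser = argparse.ArgumentParser(description="Rename a product")
    parser.add_argument("--database-url", required=True,
                        help="filled in from the Jenkins job form")
    parser.add_argument("--product-id", type=int, required=True)
    parser.add_argument("--name", required=True)
    args = parser.parse_args()
    print(f"Updating product {args.product_id} to {args.name!r} on {args.database_url}")
    # ... connect to args.database_url and run the update ...

if __name__ == "__main__":
    main()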
What’s great?
- All the repetitive things are done once. Logging built in.
- Persistent audit logs
- Can watch the task run from the console
- Code review and automated tests
- User interface
What’s not great?
- Hard to manage credentials
- Environments are not clearly separated
- Could easily accidentally use the production db URL but input a bunch of parameters from dev
- Inputs are not verified
Script runner service
How it works?
User Interface -> Script Runner (dev)  -> Database (dev)
              \-> Script Runner (prod) -> Database (prod)
User just had to select an environment rather than input every argument into the script
What’s great?
- Centralized config management
- Separation of environments
- User friendly interface
More that we can do
- Parallelize and scale
- Preview results
- … up to you to customize
What’s not great?
- Nothing. We’ve done it all
Which strategy is right for you?
Depends on your willingness to invest in infrastructure. Depends on your time available.
Make something usable, because engineers are also people!
Invest the effort, it’s worth the cost.
WHAT IS THIS MESS? (Writing tests for pre-existing code bases)
I work for spinifexgroup. We write code that powers crazy contraptions.
What is Legacy Code?
Old? Not necessarily the case. Old != Legacy. Legacy code is:
- Lacking tests
- Lacking documentation (of every kind)
- Difficult to read or reason about
If your code matches 1+ of these things, it is legacy.
Why shouldn’t we test code
- It will be replaced eventually
- NO IT WON’T… maybe it will, but not right now.
- It was written to fill a critical business need. Otherwise they wouldn’t have spent the time to write it.
- It has been working just fine
- HAS IT?
- Legacy code usually has poor logging or error handling, so there could be things going on that you don’t know about.
- I do not have time
- It takes more time and money to debug issues in the wild
- I’m lazy
- LAZY means being too lazy to fight fires, not being too lazy to write tests
- Why do I have to be the one to start?
- Because you’re a good person.
Let’s do this
- My tests will never cover this code base
- One rule: test what you touch
- Write a test that asserts current behaviour BEFORE YOU TOUCH CODE (see the sketch after this list)
- Write a test that asserts new/modified behaviour and LET IT FAIL
- Do code stuffs
- Test passes
- The first test is the hardest test
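A minimal sketch of that first “pin down current behaviour” test — module and function names are made up for illustration:

from legacy import crazy_sauce  # hypothetical function under test

def test_crazy_sauce_current_behaviour():
    # Assert whatever the code does *today*, even if it looks wrong,
    # so that any change you make later shows up as a failing test.
    assert crazy_sauce([1, 2, 3]) == [1, 2, 3, "sauce"]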
Choose Thy Weapons
- Find a good test runner (pytest)
- Use pre-existing tools for your framework (pytest-django)
- pytest-socket to disable socket JUST IN CASE
- pytest-cov to measure coverage
- factory_boy to avoid boilerplate
Requirements file: .in (unpinned)
pytest.ini: django_settings, etc.
Assignment 1 (feature request)
Update “Quark” mode to conditionally determine the value of the “magic” property.
Everyone who knew what “quark” and “magic” meant has left the company.
Where did we go wrong? Nothing is wrong with unit tests, but we should test the FUNCTIONALITY of legacy applications. We may need to do integration testing.
Don’t go nuclear. When dealing with legacy code, there’s always a reason, even if that reason no longer exists.
Assignment 2: Dealing with code that touches the world
Get back a new array with crazy sauce on top of it.
Let’s read the code, but not touch it.
Where to start? Just try to get the class into a test harness. pytest-socket blocked e-mail connection. We can’t even instantiate this thing.
Breaking dependencies… what is this tangled web of madness
Subclass it. Override the smtp object and mock it. But we want to test the view. And it fails.
Mocking helps us get past the SMTP problem.
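Something along these lines — the class and method names are invented; the point is patching smtplib so the legacy code can even be exercised under pytest-socket:

from unittest import mock
from legacy.reports import ReportSender  # hypothetical legacy class

def test_send_report_without_a_real_smtp_server():
    # Patch target may need to be "legacy.reports.SMTP" instead, if the
    # legacy module does `from smtplib import SMTP`.
    with mock.patch("smtplib.SMTP") as fake_smtp:
        sender = ReportSender()       # would normally open an SMTP connection
        sender.send_report("weekly")  # the logic we actually want to exercise
        fake_smtp.return_value.sendmail.assert_called_once()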
Making the world a better place. Adding docstrings and comments is the safest change you can ever make.
Hold up now… that thing shouldn’t be mutating data. But other code may already rely on that mutated argument, so fixing it could break things. We should just fix the broken thing. The catch: we cannot change code without tests, and we cannot write tests without changing code.
Sometimes it makes sense to change code to make it more testable, but ONLY if we can do that without breaking the interface.
E.g. let’s pass in the smtp object.
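A sketch of that seam, keeping the interface intact for existing callers — same invented class as above, not the speaker’s code:

import smtplib

class ReportSender:
    def __init__(self, smtp=None):
        # Existing callers still get the real SMTP client by default;
        # tests can pass in a fake and never touch the network.
        self.smtp = smtp or smtplib.SMTP("localhost")

    def send_report(self, body):
        self.smtp.sendmail("reports@example.com", "ops@example.com", body)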
Assignment 3
Adding a new feature to an unruly process. Add SMS reporting.
Create a legacy (the good kind)
CI/CD
Pushing code should trigger a build that runs the tests. Builds should fail if tests fail. Builds should fail if test coverage drops.
Keep testing! Start testing things that you don’t “need” to test. Test sprints. Brain refreshment.
Now go forth and test things!
Adapting from Spark to Dask: what to expect
Irina Truong
Backend Engineer at Parse.ly
To Dask or not to Dask? You may be considering a journey…
- Is it possible?
- Is it worth it?
- What about performance?
78 more lines of Java stack trace from exceeding memory limit
Job aborted due to stage failure
Driver stacktrace: (and then nothing)
No more Java heap space.
I am here to help you set realistic expectations.
What is Spark?
- General data processing engine
- Java, Scala, Python, R
- Batch processing, stream processing, SQL engine
What is Dask?
- Parallel computing library for analytic computing
- Python only
- Written in Python, on top of C and FORTRAN libraries (NumPy, Pandas)
- Can run on Hadoop YARN, EC2, Mesos, or a local cluster
So I had this Spark application…
Input: Parquet records with customer, url, referrer, session_id, timestamp
Output: aggregated JSON records (a JSON dict with a special ID to index into ElasticSearch) showing how many people came from which referrers
I wanted to use a real example b/c most Dask examples are abstract and dry.
Input data structure is partitioned into columns
Aggregate and transform
Round timestamps down to an hour
# Group by customer, url and rounded time period
COUNT_VALUES(referrer) AS referrers # a user-defined aggregation function
I had to write it in Scala, build a jar, give it to the Spark runner, register it with Python, etc., etc.
Aggregate with Dask
Dask uses the same syntax as Pandas
df['ts'] = df['ts'].dt.floor('1H')
Dask needs two custom aggregations
import dask.dataframe as dd

counter = dd.Aggregation(
    'counter',
    lambda s: s.apply(counter_chunk),   # chunk step, applied per partition
    lambda s: s.apply(counter_agg),     # combine step, merges the chunk results
)
Two levels of hierarchy in the multilevel columns
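To make the snippet above hang together, my guess at the missing pieces — the chunk/combine helpers and the groupby call (not the speaker’s actual code; it builds on `df` and `counter` from above):

from collections import Counter

def counter_chunk(values):
    # values: one group's referrers within a single partition
    return Counter(values)

def counter_agg(chunks):
    # chunks: that group's per-partition Counters; merge them
    result = Counter()
    for c in chunks:
        result.update(c)
    return result

agg = df.groupby(["customer", "url", "ts"]).agg({"referrer": counter})
# agg ends up with two-level columns such as ("referrer", "counter"),
# which is where the multilevel columns come from.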
Transformation:
Performance?
Benchmark on local laptop and YARN cluster (Amazon EMR)
Cluster deployment with Spark (AWS)
github.com/rq/rq and a UI to schedule jobs remotely
Cluster deployment with Dask (Kubernetes). Amazon Elastic Container Service for Kubernetes (EKS) is in beta.
Promises
- No more Java heap space
- Or containers killed by YARN
- Meaningful stack traces
- Real-time visual feedback
- No separate profiling step needed
Can see in real time how workers are doing and what their resource usage is.
The bad:
- Cluster deployments are WIP (dask-ec2, knit, skein, kubernetes)
- Does not support SQL
- Only simple aggregations (no collect_list or nunique)
- Less docs and examples
- Can’t write JSON
- More bugs
The full code: < 250 LOCs
TODO
I wanted to go to the following talks this day, but had a scheduling conflict:
- Taking Django Async
- Trio: Async concurrency for mere mortals
- Love your bugs
- All in the timing: How side channel attacks work
- Bowerbirds of Technology: Architecture and Teams at Less-than-Google Scale