PyCon Day 2: Afternoon Sessions

Usable Ops

Kate Heddleston and Joyce Jang

Intro

Technical on-boarding. Most of the problems that people ran into were not necessarily on-boarding problems, but problems understanding the systems that existed.
Scaling problems were not about bringing in as many engineers as possible.
Start all over for a bug fix (but quickly!) / Rollback if there’s an issue
Develop -> Review -> Run Automated Tests -> Deploy to Staging -> Deploy to Production
Deploying is so fragile that many companies have dedicated DevOps teams, and developers throw code over a wall to them.
DevOps engineers are abstracted away from the problem the code is meant to solve
The wall creates barriers to on-boarding and makes it hard to get code to a testable state
We tend to think about and manage web infrastructure as a set of technical problems
However most of the problems are actually human problems and human error
Human problems arise when we interact with technology
Focus on building abstractions that allow us to do what we do best and allow computers to do what they do best

What is usability?

How we interact with the man-made things around us
E.g. how do we turn on a light?
Book: “The Design of Everyday Things”
- Author loved asking someone to dim the lights when he gave a talk
- Lights are not inherently easy
E.g. how do you set your shower temperature?
E.g. how do you open a door?
- Have you ever used a door incorrectly?
- The canonical example of usability?
Key vocabulary: affordances
- A property of an object that tells you how to interact with it
- A teapot’s handle affords holding
- Mugs do not say “hold here” on the handle
- Can discover how to use objects just by trial and error
- Give people visual and tactile clues as to how to use them
Consequences of bad usability
- Annoying doors; get frustrated, embarrassed
- Injury or death
- e.g. Cars
  - Merge lane where there are lots of accidents
  - Road infrastructure is unusable in those spots
  - We’re dependent on the usability to keep us safe from harm
  - Increased cars == increased accidents in 1940s
  - “Botts’ dots” were brought in and reduced the number of accidents.
  - Botts’ dots didn’t create new tech
  - Botts’ dots increased road usability
Roads are analogous to web infrastructure
- the usage has increased dramatically
- building and using web infrastructure needs to be relatively understandable and accessible

Why is it important to web infrastructure?

Software is a man-made object
But it’s abstract, so it’s hard to apply usability to it
E.g. file editing
- solved problem since 1967 when the first screen editor was created
- first version control wasn’t written until the 1980s
- GitHub solves the problem of teams of people editing collections of files
- Levels of abstraction (file editing, version control, productivity platform)
CS is about abstraction
- binary -> programming language
Usability is
Consequences
- Errors
  - If tools are hard to use then devs will either avoid using them or use them incorrectly
  - E.g. an engineer at Trulia deleted the entire user database
- Scalability
  - Poor automation means that you can’t scale your engineering team
  - If the system is too complex, you can’t hire and train people in a reasonable amount of time
  - E.g. a company had to freeze hiring for 9 months on two different occasions, because each new hire decreased productivity.
  - “If your system is too complex for your entire team to use safely, it is too complex. Period.”
- Friction
  - Success is tied to the projects that we work on
  - Dependent on the web infrastructure on which it runs
  - At the mercy of the DevOps engineers
  - Makes the proverbial wall palpable
  - Creates power dynamics
  - Creates the opportunity to block productivity of their peers
  - Not the fault of either team… fault of the organization
  - Separation of responsibilities can be the right way to go, but must have the right processes in place
  - Usability is different than security
    - Just because everyone can use the system, doesn’t mean everyone has to have access to it
    - Separate concerns

How do we build usable web infrastructure?

How do you change system installations?
- A separate process from writing code
- Separate tool that requires specialized training
- Solution: use a container.
  - Containers let system installations change the same way that code changes
  - Edit the code, commit the change, and use the same tools and workflows in both spaces
  - Links system installations to code changes
  - Humans spend a lot of time figuring out why the servers are running the wrong thing
  - Greatly reduces the amount of information that a person needs to know to get their job done
  - Reduces human errors
  - Reduces the amount of specialized training, which is a huge blocker for human scaling
How do you deploy code?
- Can be the source of a lot of stress and problems
- One-click deploy system is the biggest way to improve productivity
- Everyone knows how to use a button
- Abstract the pieces that require human attention away from those that don’t
- Things that require human intent should have a button
- Good abstractions are all about creating human usable entry points
- E.g. Hearsay Social PR Bot
  - Red/Green PR buttons
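A minimal sketch of the "button" idea from the notes above: one entry point that a human triggers, with the ordered steps and the rollback handled by the machine. The function name, the step names, and the rollback callable are all hypothetical; the talk did not show an implementation.

```python
from typing import Callable, List, Tuple

# A step is a (name, callable) pair; the callable returns True on success.
Step = Tuple[str, Callable[[], bool]]


def one_click_deploy(steps: List[Step], rollback: Callable[[], None]) -> List[str]:
    """Run each step in order; on the first failure, roll back and stop."""
    log = []
    for name, run in steps:
        if run():
            log.append(f"ok: {name}")
        else:
            log.append(f"failed: {name}")
            rollback()
            log.append("rolled back")
            break
    return log


if __name__ == "__main__":
    # Stand-in steps; real ones would run tests, push to staging, and so on.
    demo_steps = [
        ("run automated tests", lambda: True),
        ("deploy to staging", lambda: True),
        ("deploy to production", lambda: True),
    ]
    for line in one_click_deploy(demo_steps, rollback=lambda: None):
        print(line)
```

The person pressing the button only supplies intent (deploy now); everything that does not need human attention lives inside the function.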
How do you know where you are in the system?
- Non-trivial problem when there are more services than engineers
- Companies have internal tools that show all of the services
  - What services are there?
  - Which services talk to others?
  - Needs to be able to update itself in real time
  - Needs to be interactive
  - Needs to show where the code is running
- 10 usability heuristics (Jakob Nielsen)
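The questions a service map has to answer (what services exist, which talk to which, where the code is running) can be sketched with a small in-memory registry. This is only an illustration under assumed names: `ServiceMap` and its methods are hypothetical, and a real tool would update itself from live data rather than from manual `register` calls.

```python
from typing import Dict, Iterable, List, Set


class ServiceMap:
    """Toy registry answering: what exists, who calls whom, where it runs."""

    def __init__(self) -> None:
        self._hosts: Dict[str, str] = {}
        self._calls: Dict[str, Set[str]] = {}

    def register(self, service: str, host: str, calls: Iterable[str] = ()) -> None:
        # Record where the service runs and which services it talks to.
        self._hosts[service] = host
        self._calls[service] = set(calls)

    def services(self) -> List[str]:
        return sorted(self._hosts)

    def calls(self, service: str) -> List[str]:
        return sorted(self._calls.get(service, set()))

    def callers_of(self, service: str) -> List[str]:
        return sorted(s for s, targets in self._calls.items() if service in targets)

    def host_of(self, service: str) -> str:
        return self._hosts[service]


if __name__ == "__main__":
    service_map = ServiceMap()
    service_map.register("web", "host-a", calls=["auth", "billing"])
    service_map.register("auth", "host-b")
    service_map.register("billing", "host-c", calls=["auth"])
    print(service_map.services())
    print(service_map.callers_of("auth"))
```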

The cobbler’s children have no shoes, or building better tools for ourselves

Alex Gaynor: US Digital Service

https://speakerdeck.com/alex

Premise: we like writing fancy tools, but we don’t write tools for ourselves

A short history of tools

$ git init

Everything had version control
Issue trackers were common, but you couldn’t necessarily assume that they existed
CI was not universal and now it’s extremely common
Code review tools have come into vogue, but that wasn’t always the case
Deployment automation is basically expected, but that wasn’t always the case
- Fabric, Chef, or Heroku
Most healthy projects have these things
Not quite universal

Emerging trends

CI for Pull Requests
- Ability to run all of your tests on proposed changes is an incredible advancement
- Far more common in open source (largely because of TravisCI)
Linting
- pep8
- flake8
- bandit (bad security practices in Python)
- Anything that tries to assess your code w/o actually running it
- Other communities are moving away from style checks to actually fixing it for you
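To make "assess your code w/o actually running it" concrete, here is a minimal sketch of a static check built on Python's standard `ast` module. It is not how pep8, flake8, or bandit are implemented; the function name and the list of flagged calls are assumptions chosen for illustration.

```python
import ast
from typing import List

# Calls this toy linter flags; an illustrative choice, not bandit's rule set.
RISKY_CALLS = {"eval", "exec"}


def find_risky_calls(source: str) -> List[str]:
    """Parse the source (never execute it) and report calls to risky builtins."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id in RISKY_CALLS
        ):
            findings.append(f"line {node.lineno}: call to {node.func.id}()")
    return findings


if __name__ == "__main__":
    sample = "x = 1\ny = eval('x + 1')\n"
    for finding in find_risky_calls(sample):
        print(finding)
```

Because the source is only parsed into a syntax tree, the check is safe to run on untrusted or broken-at-runtime code.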
Coverage Tracking
- This is way more automated than it used to be
- It used to be that someone would run it when they got around to it
livegrep.com
- Imagine you’re at a large company and don’t necessarily know all of the projects across it
https://github.com/facebook/mention-bot
- Suggests reviewers based on the changes that you’re making

Build more tailored tools

As developers we have the ability to write software
Too often, our processes are a hodgepodge of by-hand stuff
Automation > Process
- Automation scales better
- If you encode your process into a tool, when you want to change it, that is a Pull Request
- Functionally, it is possible to see what the expectations are
- It’s easier to discuss the merits of a change and to experiment with that
- You always know what the correct behavior is
- Human processes deviate from what has been documented and documentation bit rots
- When your processes are encoded in tools, you avoid this problem
APIs!
- These examples will all use GitHub’s API
- Publicly accessible API
- Issues
  - Create an issue
  - Add/remove labels
  - Add a comment
  - Assign to someone
- PR
  - Send a PR
  - Assign a PR
  - Add/remove labels
  - Leave a code review
  - Add a commit status (Say whether something is passing/failing)
- $ pip install github3.py
- Create a bot user/password w/ minimal permissions
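As a rough sketch of two of the operations listed above (create an issue, add labels), the following builds the HTTP requests for GitHub's REST API using only the standard library. The talk used github3.py; this swaps in raw requests so the example stays self-contained. The helper names, the owner and repository names, and the token are placeholders, and nothing is sent until a request is passed to `urllib.request.urlopen`.

```python
import json
import urllib.request

API_ROOT = "https://api.github.com"


def build_request(path: str, payload: dict, token: str) -> urllib.request.Request:
    """Build (but do not send) an authenticated JSON POST to the GitHub API."""
    return urllib.request.Request(
        API_ROOT + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"token {token}",
            "Accept": "application/vnd.github+json",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def create_issue_request(owner: str, repo: str, title: str, body: str, token: str):
    # POST /repos/{owner}/{repo}/issues
    return build_request(f"/repos/{owner}/{repo}/issues", {"title": title, "body": body}, token)


def add_labels_request(owner: str, repo: str, number: int, labels, token: str):
    # POST /repos/{owner}/{repo}/issues/{issue_number}/labels
    return build_request(
        f"/repos/{owner}/{repo}/issues/{number}/labels", {"labels": list(labels)}, token
    )


if __name__ == "__main__":
    req = add_labels_request("example-org", "example-repo", 1, ["security"], "PLACEHOLDER")
    print(req.get_method(), req.full_url)
```

Pull requests share the issue number space for labels and comments, so the same labels request applies to a PR.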
Examples
- HTTPS certificate expiration
  - Common, people forget, don’t want the ugly red lock sign
  - Track this in our issue tracker
- Auto-labelling
  - Created a security label to help people prioritize
  - Create a bot that will automatically apply the security label
  - Any time we touch the cryptography.py file
  - Use web hooks
    - GitHub will make a request back to us anytime something happens
- Other ideas
  - requirements.txt bumper
    - a bot that goes through all of our projects and creates a pull request to upgrade requirements
  - UI change reviewer
    - painful
    - hard to notice
    - no automated way to test/check for it
    - TravisCI captures screenshots
    - Send screenshots to a service we control
    - That service can leave a comment on GitHub asking whether the change was actually correct
    - This adds a human element to code review
  - Approval process commit status
    - Imagine a bot that knows people’s roles (e.g. front-end/back-end both co-approve)
- Often these are very small tools
  - 10 lines or less
  - Help us to not forget something
  - Small processes have made us much more productive
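The HTTPS certificate expiration example from the list above fits the "very small tool" description. A minimal sketch using only the standard library follows; the function names and the 30-day warning window are assumptions, and filing the issue in the tracker is left out.

```python
import socket
import ssl


def fetch_not_after(host: str, port: int = 443) -> str:
    """Return the certificate's notAfter string, e.g. 'Jan  5 09:34:43 2018 GMT'."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=10) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def days_until_expiry(not_after: str, now: float) -> float:
    """Days between `now` (seconds since the epoch) and the certificate expiry."""
    return (ssl.cert_time_to_seconds(not_after) - now) / 86400


def needs_issue(not_after: str, now: float, warn_days: int = 30) -> bool:
    """True when the certificate expires within the warning window."""
    return days_until_expiry(not_after, now) <= warn_days
```

A scheduled job could call `needs_issue(fetch_not_after(host), time.time())` for each hostname and open an issue whenever it returns True.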
Questions
- Q: What can’t you make a tool for? A: If something is intentionally invisible, tools that try to make it visible fail.

Published

30 May 2016
