PyCon Day 2: Afternoon Sessions
Usable Ops
Kate Heddleston and Joyce Jang
Intro
- Technical on-boarding. Most of the problems that people ran into were not necessarily on-boarding problems, but problems understanding the systems that existed.
- Scaling problems were not about bringing in as many engineers as possible.
- Start all over for a bug fix (but quickly!) / Rollback if there’s an issue
- Develop -> Review -> Run Automated Tests -> Deploy to Staging -> Deploy to Production
- Deploying is so fragile that many teams have specific DevOps teams where developers throw code over a wall.
- DevOps Engineers are abstracted away from the problem to use
- Wall creates barriers to on-boarding, hard to get code to a testable problem
- We tend to think about and manage web infrastructure as a set of technical problems
- However, most of the problems are actually human problems and human error
- Human problems arise when we interact with technology
- Focus on building abstractions that allow us to do what we do best and allow computers to do what they do best
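The review, test, staging and production flow listed above, with its "roll back if there's an issue" rule, could be sketched roughly like this. The stage names come from the notes; `run_pipeline`, the stage callables and the simulated failure are hypothetical and only illustrate the idea of letting the computer do the sequencing.

```python
from typing import Callable, List, Tuple

# A stage is a name plus a callable that returns True on success.
Stage = Tuple[str, Callable[[], bool]]


def run_pipeline(stages: List[Stage], rollback: Callable[[], None]) -> List[str]:
    """Run stages in order; on the first failure, roll back and stop."""
    log = []
    for name, step in stages:
        if step():
            log.append(f"ok: {name}")
        else:
            log.append(f"failed: {name}")
            rollback()
            log.append("rolled back")
            break
    return log


# Hypothetical stages; the staging deploy is made to fail on purpose.
stages = [
    ("review", lambda: True),
    ("automated tests", lambda: True),
    ("deploy to staging", lambda: False),
    ("deploy to production", lambda: True),
]
rolled_back = []
log = run_pipeline(stages, rollback=lambda: rolled_back.append(True))
```

The human only decides to start the pipeline; the ordering and the rollback are handled by the tool.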
What is usability?
- How we interact with the man-made things around us
- E.g. how do we turn on a light?
- Book: “The Design of Everyday Things”
- Author loved asking someone to dim the lights when he gave a talk
- Lights are not inherently easy
- E.g. how do you set your shower temperature?
- E.g. how do you open a door?
- Have you ever used a door incorrectly?
- The canonical example of usability?
- Key vocabulary: affordances
- An object that tells you how to interact with it
- A teapot’s handle affords holding
- Mugs do not say “hold here” on the handle
- Can discover how to use objects just by trial and error
- Give people visual and tactile clues as to how to use them
- Consequences of bad usability
- Annoying doors; get frustrated, embarrassed
- Injury or death
- e.g. Cars
- Merge lane where there are lots of accidents
- Road infrastructure is unusable in those spots
- We’re dependent on the usability to keep us safe from harm
- Increased cars == increased accidents in 1940s
- “Botts’ dots” were brought in and reduced the number of accidents.
- Botts’ dots didn’t create new tech
- Botts’ dots increased road usability
- Roads are analogous to web infrastructure
- the usage has increased dramatically
- building and using web infrastructure needs to be relatively understandable and accessible
Why is it important to web infrastructure?
- Software is a man-made object
- But it’s abstract, so it’s hard to apply usability to it
- E.g. file editing
- solved problem since 1967 when the first screen editor was created
- first version control wasn’t written until 1980s
- GitHub solves the problem of teams of people editing collections of files
- Levels of abstraction (file editing, version control, productivity platform)
- CS is about abstraction
- binary -> programming language
- Usability is
- Consequences
- Errors
- If tools are hard to use then devs will either avoid using them or use them incorrectly
- E.g. an engineer at Trulia deleted the entire user database
- Scalability
- Poor automation means that you can’t scale your engineering team
- If the system is too complex, you can’t hire and train people in a reasonable amount of time
- E.g. a company had to freeze hiring for 9 months on two different occasions, because each new hire decreased productivity.
- “If your system is too complex for your entire team to use safely, it is too complex. Period.”
- Friction
- Success is tied to the projects that we work on
- Dependent on the web infrastructure on which it runs
- At the mercy of the DevOps engineers
- Makes the proverbial wall palpable
- Creates power dynamics
- Creates the opportunity to block productivity of their peers
- Not the fault of either team… fault of the organization
- Separation of responsibilities can be the right way to go, but must have the right processes in place
- Usability is different than security
- Just because everyone can use the system, doesn’t mean everyone has to have access to it
- Separate concerns
- Errors
How do we build usable web infrastructure?
- How do you change system installations?
- A separate process from writing code
- Separate tool that requires specialized training
- Solution: use a container.
- Containers solve this: system installations can be changed the same way that code is developed
- Edit, change and commit: you use the same tools and workflows in both spaces
- Links system installations to code changes
- Humans spend a lot of time figuring out why the servers are running the wrong thing
- Greatly reduces the amount of information that a person needs to know to get their job done
- Reduces human errors
- Reduces the amount of specialized training, which is a huge blocker for human scaling
- How do you deploy code?
- Can be the source of a lot of stress and problems
- One-click deploy system is the biggest way to improve productivity
- Everyone knows how to use a button
- Abstract the pieces that require human attention away from those that don’t
- Things that require human intent should have a button
- Good abstractions are all about creating human usable entry points
- E.g. Hearsay Social PR bot
- Red/Green PR buttons
- How do you know where you are in the system?
- Non-trivial problem when there are more services than engineers
- Companies have internal tools that show all of the services
- What services are there?
- Which services talk to others?
- Needs to be able to update itself in real time
- Needs to be interactive
- Needs to show where the code is running
- 10 usability heuristics (Jakob Nielsen)
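The "which services talk to others?" question above is essentially a graph query. A minimal sketch, assuming the call relationships are already known: the service names in `CALLS` and both helper functions are invented for illustration, and a real tool would need to discover this data and keep it current.

```python
from collections import defaultdict

# Hypothetical map: each service and the services it calls.
CALLS = {
    "web": ["auth", "search"],
    "search": ["index"],
    "auth": ["users-db"],
    "index": [],
    "users-db": [],
}


def callers_of(calls):
    """Invert the map: for each service, which services call it?"""
    callers = defaultdict(set)
    for src, targets in calls.items():
        for dst in targets:
            callers[dst].add(src)
    return callers


def affected_by(service, calls):
    """Every service that directly or indirectly depends on `service`."""
    callers = callers_of(calls)
    seen, stack = set(), [service]
    while stack:
        for caller in callers.get(stack.pop(), ()):
            if caller not in seen:
                seen.add(caller)
                stack.append(caller)
    return seen
```

Asking `affected_by("users-db", CALLS)` answers "who is hurt if this goes down?" without anyone having to hold the whole system in their head.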
The cobbler’s children have no shoes, or building better tools for ourselves
Alex Gaynor: US Digital Service
Premise: we like writing fancy tools, but we don’t write tools for ourselves
A short history of tools
$ git init
- Everything had version control
- Issue trackers were common, but you couldn’t necessarily assume that they existed
- CI was not universal and now it’s extremely common
- Code review tools have become en vogue, but that wasn’t always the case
- Deployment automation is basically expected, but that wasn’t always the case
- Fabric, Chef or Heroku
- Most healthy projects have these things
- Not quite universal
Emerging trends
- CI for Pull Requests
- Ability to run all of your tests on proposed changes is an incredible advancement
- Far more common in open source (largely because of TravisCI)
- Linting
- pep8
- flake8
- bandit (bad security practices in Python)
- Anything that tries to assess your code w/o actually running it
- Other communities are moving away from style checks to actually fixing it for you
- Coverage Tracking
- This is way more automated than it used to be
- It used to be that someone would run it when they got around to it
- livegrep.com
- Imagine you’re a large company that doesn’t necessarily know all of the projects across your company
- https://github.com/facebook/mention-bot
- Suggests reviewers based on the changes that you’re making
Build more tailored tools
- As developers we have the ability to write software
- Too often, our processes are a hodgepodge of by-hand stuff
- Automation > Process
- Automation scales better
- If you encode your process into a tool, when you want to change it, that is a Pull Request
- Functionally, it is possible to see what the expectations are
- It’s easier to discuss the merits of a change and to experiment with that
- You always know what the correct behavior is
- Human processes deviate from what has been documented and documentation bit rots
- When your processes are encoded in tools, you avoid this problem
- APIs!
- These examples will all use GitHub’s API
- Publicly accessible API
- Issues
- Create an issue
- Add/remove labels
- Add a comment
- Assign to someone
- PR
- Send a PR
- Assign a PR
- Add/remove labels
- Leave a code review
- Add a commit status (Say whether something is passing/failing)
- $ pip install github3.py
- Create a bot user/password w/ minimal permissions
- Examples
- HTTPS certificate expiration
- Common, people forget, don’t want the ugly red lock sign
- Track this in our issue tracker
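A rough sketch of the certificate check using only the Python standard library. The function names and the 30-day warning threshold are assumptions, not from the talk; opening the actual issue would be a separate call to the issue tracker's API and is left out here.

```python
import socket
import ssl
from datetime import datetime, timezone


def fetch_not_after(host, port=443):
    """Return the certificate's notAfter string for host (makes a network call)."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=10) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def days_until_expiry(not_after, now=None):
    """Whole days between `now` (default: current UTC time) and the expiry date."""
    expires = datetime.fromtimestamp(ssl.cert_time_to_seconds(not_after), tz=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return (expires - now).days


def needs_issue(not_after, now=None, warn_days=30):
    """True when the certificate expires within `warn_days` (assumed threshold)."""
    return days_until_expiry(not_after, now) <= warn_days
```

Run on a schedule, something like `needs_issue(fetch_not_after("example.org"))` would tell the bot when to file a reminder.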
- Auto-labelling
- Created a security label to help people prioritize
- Create a bot that will automatically apply the security label
- Any time we touch the cryptography.py file
- Use web hooks
- GitHub will make a request back to us anytime something happens
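The decision the auto-labelling bot has to make can be sketched as below. This assumes a push webhook payload, whose commits list added, modified and removed paths; for pull request events the file list would have to be fetched separately. `SECURITY_PATHS` and both function names are hypothetical, and the call that actually adds the label through the GitHub API is omitted.

```python
# File named in the notes; extend the tuple to watch more files.
SECURITY_PATHS = ("cryptography.py",)


def changed_files(push_payload):
    """Collect every path added, modified or removed in a push event payload."""
    files = set()
    for commit in push_payload.get("commits", []):
        for key in ("added", "modified", "removed"):
            files.update(commit.get(key, []))
    return files


def needs_security_label(files, watched=SECURITY_PATHS):
    """True if any changed path's file name is on the watch list."""
    return any(path.rsplit("/", 1)[-1] in watched for path in files)
```

A webhook handler would parse the request body as JSON, pass it to `changed_files`, and add the label only when `needs_security_label` returns true.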
- Other ideas
- requirements.txt bumper
- a bot that goes through all of our projects and creates a pull request to upgrade requirements
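The core of such a bumper is a small text rewrite. A minimal sketch, assuming simple `name==version` pins and that the latest versions have already been looked up elsewhere; the `bump_requirements` name and the `latest` mapping are hypothetical, and creating the pull request is left out.

```python
import re

# Matches simple pins such as "Django==1.8.0  # comment".
PIN = re.compile(r"^([A-Za-z0-9_.\-]+)==([^\s#;]+)(.*)$")


def bump_requirements(text, latest):
    """Rewrite `name==version` pins to the versions in `latest`.

    Returns (new_text, changes); lines that are not simple pins are kept as-is.
    `latest` maps lower-cased package names to version strings.
    """
    out, changes = [], []
    for line in text.splitlines():
        match = PIN.match(line.strip())
        if match:
            name, old, rest = match.groups()
            new = latest.get(name.lower(), old)
            if new != old:
                changes.append((name, old, new))
                line = f"{name}=={new}{rest}"
        out.append(line)
    return "\n".join(out) + "\n", changes
```

The `changes` list doubles as the body of the pull request description, so a reviewer can see what moved at a glance.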
- UI change reviewer
- painful
- hard to notice
- no automated way to test/check for it
- TravisCI captures screenshots
- Send screenshots to a service we control
- That service can leave a comment on GitHub asking whether the change was actually correct
- This adds a human element to code review
- Approval process commit status
- Imagine a bot that knows people’s roles (e.g. front-end/back-end both co-approve)
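The role-aware approval idea could look something like the sketch below. The two roles come from the example in the notes; the usernames, the `roles` mapping and `approval_status` are invented for illustration. The returned state uses GitHub's commit status vocabulary ("pending", "success"), and posting it to the API is not shown.

```python
# Roles that must each sign off, taken from the example in the notes.
REQUIRED_ROLES = {"front-end", "back-end"}


def approval_status(approvers, roles, required=REQUIRED_ROLES):
    """Return a (state, description) pair suitable for a commit status.

    `approvers` is the set of usernames who approved; `roles` maps a
    username to the role that person can sign off for.
    """
    covered = {roles[user] for user in approvers if user in roles}
    missing = sorted(set(required) - covered)
    if missing:
        return "pending", "waiting for approval from: " + ", ".join(missing)
    return "success", "all required roles have approved"
```

Because the rule lives in code, changing who must approve becomes a reviewed change rather than a wiki edit.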
- requirements.txt bumper
- Often these are very small tools
- 10 lines or less
- Help us to not forget something
- Small processes have made us much more productive
- HTTPS certificate expiration
- Questions
- Q: What can’t you make a tool for? A: If something is intentionally invisible, tools that try to make it visible fail.