Usable Ops

Kate Heddleston and Joyce Jang

Intro

  • Technical on-boarding. Most of the problems that people ran into were not necessarily on-boarding problems, but problems understanding the systems that existed.
  • Scaling problems were not about bringing in as many engineers as possible.
  • Start all over for a bug fix (but quickly!) / Rollback if there’s an issue
  • Develop -> Review -> Run Automated Tests -> Deploy to Staging -> Deploy to Production
  • Deploying is so fragile that many teams have specific DevOps teams where developers throw code over a wall.
  • DevOps engineers are removed from the problem the code is trying to solve
  • The wall creates barriers to on-boarding and makes it hard to get code to a testable state
  • We tend to think about and manage web infrastructure as a set of technical problems
  • However most of the problems are actually human problems and human error
  • Human problems arise when we interact with technology
  • Focus on building abstractions that allow us to do what we do best and allow computers to do what they do best

What is usability?

  • How we interact with the man-made things around us
  • E.g. how do we turn on a light?
  • Book: “The Design of Everyday Things”
    • Author loved asking someone to dim the lights when he gave a talk
    • Lights are not inherently easy
  • E.g. how do you set your shower temperature?
  • E.g. how do you open a door?
    • Have you ever used a door incorrectly?
    • The canonical example of usability?
  • Key vocabulary: affordances
    • A property of an object that tells you how to interact with it
    • A teapot’s handle affords holding
    • Mugs do not say “hold here” on the handle
    • Can discover how to use objects just by trial and error
    • Give people visual and tactile clues as to how to use them
  • Consequences of bad usability
    • Annoying doors; get frustrated, embarrassed
    • Injury or death
    • e.g. Cars
      • Merge lane where there are lots of accidents
      • Road infrastructure is unusable in those spots
      • We’re dependent on the usability to keep us safe from harm
      • More cars meant more accidents in the 1940s
      • “Botts’ dots” (raised pavement markers) were brought in and reduced the number of accidents
      • Botts’ dots didn’t create new tech
      • Botts’ dots increased road usability
  • Roads are analogous to web infrastructure
    • the usage has increased dramatically
    • building and using web infrastructure needs to be relatively understandable and accessible

Why is it important to web infrastructure?

  • Software is a man-made object
  • But it’s abstract, so it’s hard to apply usability to it
  • E.g. file editing
    • solved problem since 1967 when the first screen editor was created
    • first version control wasn’t written until the 1980s
    • GitHub solves the problem of teams of people editing collections of files
    • Levels of abstraction (file editing, version control, productivity platform)
  • CS is about abstraction
    • binary -> programming language
  • Usability is
  • Consequences
    • Errors
      • If tools are hard to use then devs will either avoid using them or use them incorrectly
      • E.g. an engineer at Trulia deleted the entire user database
    • Scalability
      • Poor automation means that you can’t scale your engineering team
      • If the system is too complex, you can’t hire and train people in a reasonable amount of time
      • E.g. a company had to freeze hiring for 9 months on two different occasions, because each new hire decreased productivity.
      • “If your system is too complex for your entire team to use safely, it is too complex. Period.”
    • Friction
      • Success is tied to the projects that we work on
      • Dependent on the web infrastructure on which it runs
      • At the mercy of the DevOps engineers
      • Makes the proverbial wall palpable
      • Creates power dynamics
      • Creates the opportunity to block productivity of their peers
      • Not the fault of either team… fault of the organization
      • Separation of responsibilities can be the right way to go, but must have the right processes in place
      • Usability is different than security
        • Just because everyone can use the system, doesn’t mean everyone has to have access to it
        • Separate concerns

How do we build usable web infrastructure?

  • How do you change system installations?
    • A separate process from writing code
    • Separate tool that requires specialized training
    • Solution: use a container.
      • Containers let system installations change the same way that code changes
      • You edit, change, and commit, using the same tools and workflows in both spaces
      • Links system installations to code changes
      • Humans spend a lot of time figuring out why the servers are running the wrong thing
      • Greatly reduces the amount of information that a person needs to know to get their job done
      • Reduces human errors
      • Reduces the amount of specialized training, which is a huge blocker for human scaling
  • How do you deploy code?
    • Can be the source of a lot of stress and problems
    • One-click deploy system is the biggest way to improve productivity
    • Everyone knows how to use a button
    • Abstract the pieces that require human attention away from those that don’t
    • Things that require human intent should have a button
    • Good abstractions are all about creating human usable entry points
    • E.g. Hearsay Social PR bot
      • Red/Green PR buttons
  • How do you know where you are in the system?
    • Non-trivial problem when there are more services than engineers
    • Companies have internal tools that show all of the services
      • What services are there?
      • Which services talk to others?
      • Needs to be able to update itself in real time
      • Needs to be interactive
      • Needs to show where the code is running
    • 10 usability heuristics (Jakob Nielsen)
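
The internal service map described above (what services exist, which ones talk to each other, where the code is running) can be sketched as a small directed graph. This is only an illustration of the idea, not a tool from the talk: the `ServiceMap` class and every service and host name below are hypothetical, and a real tool would update itself from live data rather than from hand-written registrations.

```python
from collections import defaultdict


class ServiceMap:
    """Toy directory of services, their hosts, and who calls whom."""

    def __init__(self):
        self.hosts = {}                # service name -> list of hosts
        self.calls = defaultdict(set)  # caller -> set of callees

    def register(self, name, hosts):
        """Record which hosts a service is running on."""
        self.hosts[name] = list(hosts)

    def add_call(self, caller, callee):
        """Record that `caller` talks to `callee`."""
        self.calls[caller].add(callee)

    def dependents_of(self, name):
        """Services that call `name` directly, in sorted order."""
        return sorted(c for c, callees in self.calls.items() if name in callees)

    def where_is(self, name):
        """Hosts running `name`, or an empty list if it is unknown."""
        return self.hosts.get(name, [])


# Hypothetical services and hosts, for illustration only.
service_map = ServiceMap()
service_map.register("web", ["web-1", "web-2"])
service_map.register("auth", ["auth-1"])
service_map.register("billing", ["billing-1"])
service_map.add_call("web", "auth")
service_map.add_call("billing", "auth")

print(service_map.dependents_of("auth"))  # which services depend on auth?
print(service_map.where_is("web"))        # where is the code running?
```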

The cobbler’s children have no shoes, or building better tools for ourselves

Alex Gaynor: US Digital Service

https://speakerdeck.com/alex

Premise: we like writing fancy tools, but we don’t write tools for ourselves

A short history of tools

$ git init
  • Everything had version control
  • Issue trackers were common, but you couldn’t necessarily assume that they existed
  • CI was not universal and now it’s extremely common
  • Code review tools have come into vogue, but that wasn’t always the case
  • Deployment automation is basically expected, but that wasn’t always the case
    • fabric, chef or heroku
  • Most healthy projects have these things
  • Not quite universal
  • CI for Pull Requests
    • Ability to run all of your tests on proposed changes is an incredible advancement
    • Far more common in open source (largely because of TravisCI)
  • Linting
    • pep8
    • flake8
    • bandit (bad security practices in Python)
    • Anything that tries to assess your code w/o actually running it
    • Other communities are moving away from style checks to actually fixing it for you
  • Coverage Tracking
    • This is way more automated than it used to be
    • It used to be that someone would run it when they got around to it
  • livegrep.com
    • Imagine you’re a large company that doesn’t necessarily know all of the projects across your company
  • https://github.com/facebook/mention-bot
    • Suggests reviewers based on the changes that you’re making

Build more tailored tools

  • As developers we have the ability to write software
  • Too often, our processes are a hodgepodge of by-hand stuff
  • Automation > Process
    • Automation scales better
    • If you encode your process into a tool, when you want to change it, that is a Pull Request
    • Functionally, it is possible to see what the expectations are
    • It’s easier to discuss the merits of a change and to experiment with that
    • You always know what the correct behavior is
    • Human processes deviate from what has been documented and documentation bit rots
    • When your processes are encoded in tools, you avoid this problem
  • APIs!
    • These examples will all use GitHub’s API
    • Publicly accessible API
    • Issues
      • Create an issue
      • Add/remove labels
      • Add a comment
      • Assign to someone
    • PR
      • Send a PR
      • Assign a PR
      • Add/remove labels
      • Leave a code review
      • Add a commit status (Say whether something is passing/failing)
    • $ pip install github3.py
    • Create a bot user/password w/ minimal permissions
  • Examples
    • HTTPS certificate expiration
      • Common, people forget, don’t want the ugly red lock sign
      • Track this in our issue tracker
    • Auto-labelling
      • Created a security label to help people prioritize
      • Create a bot that will automatically apply the security label
      • Any time we touch the cryptography.py file
      • Use web hooks
        • GitHub will make a request back to us anytime something happens
    • Other ideas
      • requirements.txt bumper
        • a bot that goes through all of our projects and creates a pull request to upgrade requirements
      • UI change reviewer
        • painful
        • hard to notice
        • no automated way to test/check for it
        • TravisCI captures screenshots
        • Send screenshots to a service we control
        • That service can leave a comment on GitHub asking whether the change was actually correct
        • This adds a human element to code review
      • Approval process commit status
        • Imagine a bot that knows people’s roles (e.g. front-end/back-end both co-approve)
    • Often these are very small tools
      • 10 lines or less
      • Help us to not forget something
      • Small processes have made us much more productive
  • Questions
    • Q: What can’t you make a tool for? A: If something is intentionally invisible, tools that try to make it visible, fail.
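
The auto-labelling idea above can be sketched as a small webhook handler. This is a hedged illustration rather than the bot from the talk: `LABEL_RULES`, `labels_for`, and `handle_pull_request_event` are hypothetical names, and `list_changed_files` and `add_labels` stand in for whatever GitHub client the bot uses (for example github3.py), so no real API calls appear here. The only assumption about GitHub is that a pull request webhook payload carries top-level `action` and `number` fields.

```python
import fnmatch

# Hypothetical rules: glob pattern for changed paths -> label to apply.
LABEL_RULES = {
    "*cryptography.py": "security",
}


def labels_for(changed_files, rules=LABEL_RULES):
    """Return the sorted labels whose pattern matches any changed file."""
    labels = set()
    for path in changed_files:
        for pattern, label in rules.items():
            if fnmatch.fnmatch(path, pattern):
                labels.add(label)
    return sorted(labels)


def handle_pull_request_event(payload, list_changed_files, add_labels):
    """React to one pull request webhook delivery.

    `list_changed_files(number)` and `add_labels(number, labels)` are
    hypothetical callables wrapping the GitHub client of your choice.
    """
    if payload.get("action") not in ("opened", "synchronize"):
        return []  # ignore events that do not change the diff
    number = payload["number"]
    labels = labels_for(list_changed_files(number))
    if labels:
        add_labels(number, labels)
    return labels


# Usage with stand-in callables instead of a real GitHub client.
applied = []
labels = handle_pull_request_event(
    {"action": "opened", "number": 7},
    list_changed_files=lambda number: ["app/cryptography.py", "README.md"],
    add_labels=lambda number, names: applied.append((number, names)),
)
print(labels)
```

Keeping the rule table separate from the handler means that changing the labelling process is itself a small pull request.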

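The HTTPS certificate expiration example above comes down to a date comparison. The sketch below uses the standard library's `ssl.cert_time_to_seconds` to decide whether a certificate is close enough to expiry to deserve an issue. The 30-day threshold and the function names are assumptions, and actually opening the issue (via the GitHub API) is left out. In practice the `notAfter` string would come from `ssl.SSLSocket.getpeercert()` on a live connection.

```python
import ssl
import time

WARN_DAYS = 30  # assumed threshold; pick whatever lead time suits the team


def days_until_expiry(not_after, now=None):
    """Days left before a certificate's notAfter timestamp.

    `not_after` uses the format returned by getpeercert(),
    e.g. "Jun  1 12:00:00 2016 GMT". `now` is seconds since the epoch.
    """
    expires = ssl.cert_time_to_seconds(not_after)
    now = time.time() if now is None else now
    return (expires - now) / 86400.0


def needs_issue(not_after, now=None, warn_days=WARN_DAYS):
    """True when the certificate expires within `warn_days` days."""
    return days_until_expiry(not_after, now) <= warn_days


# Example with a fixed "current time" so the result is reproducible.
fixed_now = ssl.cert_time_to_seconds("May  1 12:00:00 2016 GMT")
print(needs_issue("May 15 12:00:00 2016 GMT", now=fixed_now))
```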


Published

30 May 2016

Category

work

Tags