Python Governance

PyCon and Python have two different governance structures. Both have been led by Guido in the past, but not anymore, so it’s been confusing.

PSF is a member-elected board (not a self-perpetuating board).

There are basic members and voting members.

Voting members:

  • Contributing
  • Supporting
  • Managing
  • Fellow

Until last year, Guido was the BDFL of Python the language.

What does it mean to be the BDFL? In theory it means you’re in charge of everything, but in practice it takes a lot of people. Generally, he stepped in when there were high-level design decisions to make, and he usually made the decisions on PEPs.

Python Enhancement Proposal

Can be submitted or sponsored by Python core team members. Can be decided on by the BDFL or delegate. Usually little-to-no conflict regarding who took the delegate role.

Changes

PEP 572. Heated discussion, very controversial. Walrus operator. After that Guido decided that he didn’t want to be BDFL anymore.
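
For context, the walrus operator from PEP 572 is the `:=` assignment expression, available from Python 3.8. A minimal sketch of what it changes (the regex and variable names are only illustrative, not from the talk):

```python
import re

# Without the walrus operator: assign first, then test.
match = re.search(r"\d+", "PEP 572")
if match:
    number = int(match.group())

# With PEP 572 (Python 3.8+): assign and test in a single expression.
if (m := re.search(r"\d+", "PEP 572")) is not None:
    walrus_number = int(m.group())
```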

Who’s in charge now? No one had really had that experience. Errr, what’s our governance model? Errr, how do we decide on our governance model?

How do we decide our governance model?

PEP-8001

Meeting in Redmond in September 2018. Decided that core devs could submit PEPs and vote on them. Core devs who self-selected as “active” could vote.

How will the vote work? Ranked choice, administered by PSF.

What is the governance model

PEP-8002

Researched the governance of many other languages and communities.

Actual proposals: PEP 8010-8016

  • Points of debate
    • distributed vs. hierarchical power
    • number of people in leadership roles
    • importance of being a core dev
    • formalization of process/power vs leaving things implicit
    • who decides on PEPs?

Winner: PEP-8016, the “steering council model”.

Who’s in charge?

Elections in January, administered by the PSF

Documented in PEP-8100

Core Team

  • responsible for project infrastructure (GitHub org and repos, issue trackers, mailing lists, IRC channel, etc.)
  • can declare “no confidence” in an individual or the full membership of the steering council
  • can vote to change PEP-13 to change the current governance (with a 2/3 majority)

Steering Council: has the power to accept/reject PEPs and enforce the code of conduct.

Going forward

  • What’s changing?
  • How can you get involved?

Lessons Learned

  • Make your governance structure explicit
    • most projects are BDFL-style by default, implicitly
    • no way to handle resignation, death, abuse
  • worried about bureaucracy? just specify the “first level” of governance
    • how the decision will happen
    • who gets to make the decision
  • when changing/defining governance, consider the resources that already exist

Shauna Gordon-McKeon

Getting Started Testing in Data Science

Jes Ford

Data Scientist at Recursion in Salt Lake City

  • Originally from Alaska, have followed the snow all around the western US/Canada
  • PhD in Astrophysics at UBC and postdoc in Data Science at UW, Seattle
  • No formal training in software best practices

motivate()
testing_with_pytest()
data_science_workflows()
data_science_example()
wrap_up()

Why not test

  • It takes time
  • As a data scientist I’m balancing
    • getting results as quickly as possible
    • being confident in the results that I provide
  • Won’t insist that you always test
  • But will describe scenarios that you may find yourself in
  • Disclaimer
    • I’m not a testing expert
  • How do you know if your code is correct
    • manual sanity checks
    • defensive programming (next level up)
    • tests (the best)

Defensive programming

  • adding assertions in your code
  • e.g. check that there are things in a list before you try to print them
  • assertions are a data scientist’s best friend
    • this is a practical middle ground to ensure that things are working
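
A minimal sketch of the defensive style described above, assuming a hypothetical `summarize` helper (the function and its messages are illustrative, not from the talk). The assertions check that there is something in the list before the code tries to use it:

```python
def summarize(values):
    """Return simple summary statistics for a list of numbers."""
    # Fail loudly on an empty list instead of hitting a confusing error later.
    assert len(values) > 0, "expected a non-empty list"
    # Missing values should have been handled before this point.
    assert all(v is not None for v in values), "unexpected missing value"
    return {"n": len(values), "min": min(values), "max": max(values)}


summary = summarize([3, 1, 2])
```

The assertions cost one line each and document what the function expects, which is the middle ground between manual sanity checks and a full test suite.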

Testing

  • Use pytest
    • less boilerplate
    • demo
      • this is great, but these examples are too simple
  • Problems
    • TDD isn’t really a workflow in data science
    • We don’t use native objects… data frames, pandas, etc.
    • We read/write databases
    • We are using probabilistic models that don’t have deterministic results
      • think about ranges
  • Workflows
    • “one-off” analyses
    • exploratory
    • well defined problem
  • One-off
    • don’t write tests
    • clear documentation
    • maybe if I come back to it, we’ll consider more
  • Exploratory work
    • Kind of impossible to test?
    • BUT! I spend a ton of time in this phase, and there’s almost always some piece of code written during exploration that is useful down the road.
  • Well defined problem
    • Doesn’t come up often
    • When it does, I try to use TDD
  • Legacy code
    • inherited
    • something that you wrote yourself during exploratory work
    • when I modify that code, I add tests. See: Testing legacy codebases from PyCon 2018
  • Pandas dataframes
    • Useful things that will help you test
    • Check for missing values (notnull(), isnull())
    • Check for duplicated data (df.duplicated())
    • pandas.util.testing.assert_frame_equal… with the check_like, check_dtype, and check_less_precise parameters
    • Also handles NaN or None comparison “as expected”… e.g. NaNs are in the same place.
  • Databases
    • Let’s say I have a function “load_data” which takes a condition and returns some data frame. And then I transform.
    • Could test load_data() end to end, but we don’t want to be querying our database as part of our tests.
      • One solution: pytest-mock
      • patch the query to return some input instead of hitting the database.
  • Hardcoding input/output dataframes is extremely verbose
    • too much code/too much time
    • Use the library hypothesis!
    • Auto data generation for property based testing
  • Hypothesis + pandas
    • can generate full dataframes
    • Hypothesis will try to push the limits
      • empty data frame… values at the ends of the range
  • Testing the properties of data
    • Example with customer loyalty
    • Say things that should be true about the output, rather than exactly what the output is
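
The dataframe and database notes above can be sketched together in one test. This is a sketch under stated assumptions, not the speaker's code: `run_query`, the column names, and the transform inside `load_data` are hypothetical. The standard library's `unittest.mock` stands in for pytest-mock (whose `mocker` fixture wraps the same machinery), and `assert_frame_equal` is imported from `pandas.testing`, its current public location.

```python
from unittest import mock

import pandas as pd
from pandas.testing import assert_frame_equal


def run_query(condition):
    """Hypothetical database call; tests should never reach it."""
    raise RuntimeError("the real database is not available in tests")


def load_data(condition):
    """Load rows matching `condition`, then apply a small transform."""
    df = run_query(condition)
    df = df.drop_duplicates().reset_index(drop=True)
    df["total"] = df["price"] * df["quantity"]
    return df


def test_load_data_without_database():
    fake_rows = pd.DataFrame({"price": [2.0, 2.0, 5.0], "quantity": [3, 3, 1]})
    fake_query = mock.Mock(return_value=fake_rows)
    # Swap the query for a canned frame so the test never touches the database.
    # In a multi-file project this would be mocker.patch("module.run_query", ...).
    with mock.patch.dict(globals(), {"run_query": fake_query}):
        result = load_data("region = 'west'")

    # Exact comparison against a hardcoded expected frame.
    expected = pd.DataFrame(
        {"price": [2.0, 5.0], "quantity": [3, 1], "total": [6.0, 5.0]}
    )
    assert_frame_equal(result, expected)

    # Property-style checks: things that should hold for any valid output.
    assert result.notnull().all().all()
    assert not result.duplicated().any()
    assert (result["total"] >= 0).all()
```

The last three assertions say what should be true about the output without spelling out the output itself, so they stay valid when the input frame is generated (for example by Hypothesis) rather than hardcoded.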

Wrap up

  • Data scientists should not always write tests, but should always practice defensive programming
  • Any reused or shared piece of code should be tested
  • Strive for a balance between speed and confidence

https://github.com/jesford/testing-in-data-science

5 steps to build Python native GUI widgets for BeeWare

Dan Yeaw

Designs safety into autonomous vehicles for Ford

Very frustrated with GUI widgets for Python. One app for all devices. Felt like I was working against the grain.

Cross-platform, native, app development.

Toga, BeeWare’s GUI Toolkit.

Hello world

What is a widget?

  • Controls and logic that a user interacts with when using a GUI
  • We’ll use a Canvas widget as an example
  • Not all widgets have been written in BeeWare

Toga

Layers:

GUI -> Toga TK -> Bridge or Transpiler -> Platform

Within the Toga TK:

Toga_core -> Toga_impl

More terms: toga_core -

toga_impl factory pattern (to improve testability). Tests should not need to know how to recreate the environmenta…
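
A rough sketch of how a factory can separate a core widget from its platform implementation so that tests do not need a real GUI environment. All class and method names here are hypothetical illustrations of the pattern, not Toga's actual API:

```python
class DummyCanvasImpl:
    """Hypothetical test backend: records calls instead of drawing."""

    def __init__(self):
        self.calls = []

    def line_to(self, x, y):
        self.calls.append(("line_to", x, y))


class DummyFactory:
    """Hypothetical factory: maps widget names to implementation classes."""

    Canvas = DummyCanvasImpl


class Canvas:
    """Core-layer widget: platform-independent API that delegates to an implementation."""

    def __init__(self, factory):
        # The factory decides which platform implementation backs this widget.
        self._impl = factory.Canvas()

    def line_to(self, x, y):
        self._impl.line_to(x, y)


# A test can pass in the dummy factory and inspect what was requested.
canvas = Canvas(factory=DummyFactory)
canvas.line_to(10, 20)
```

Because the core widget only talks to whatever the factory hands it, swapping in a recording backend is enough to test the core layer on any machine.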

The five steps

  • Dev platform
  • Research your widget
    • abstraction requires knowledge of specific examples
    • create use cases or user stories
    • get feedback
  • Use Cases
  • Write the docs



Published

05 May 2019

Category

work

Tags