PyCon 2019: Day 2

Python Governance

PyCon and Python have two different governance structures. Both were led by Guido in the past, but not anymore, so it has been confusing.

PSF is a member-elected board (not a self-perpetuating board).

There are basic members and voting members.

Voting members:

Contributing
Supporting
Managing
Fellow

Until last year, Guido was the BDFL of Python the language.

What does it mean to be the BDFL? In theory it means you’re in charge of everything, but in practice it takes a lot of people. Generally, he stepped in when there were high-level design decisions to make, and he usually made the decisions on PEPs.

Python Enhancement Proposal

Can be submitted or sponsored by Python core team members. Can be decided on by the BDFL or delegate. Usually little-to-no conflict regarding who took the delegate role.

Changes

PEP 572. Heated discussion, very controversial. Walrus operator. After that Guido decided that he didn’t want to be BDFL anymore.
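
For context, the "walrus operator" from PEP 572 is the assignment expression `:=`, which binds a name inside an expression. A minimal sketch, assuming Python 3.8 or later (the sample string and variable names are made up for illustration):

```python
import re

line = "PEP 572. Heated discussion, very controversial."

# Before PEP 572: assign first, then test the result.
match = re.search(r"PEP (\d+)", line)
if match is not None:
    number_old_style = int(match.group(1))

# With PEP 572: assign and test in a single expression.
if (match := re.search(r"PEP (\d+)", line)) is not None:
    number_walrus = int(match.group(1))
```

Both branches extract the same number; the second just avoids the separate assignment line.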

Who’s in charge now? No one really had that experience. Errr, what’s our governance model? Errr, how do we decide what our governance model is?

How do we decide our governance model?

PEP-8001

Meeting in Redmond in September 2018. Decided that people could submit PEPs, so core devs got to submit and vote. Core devs who self-selected as “active” could vote.

How will the vote work? Ranked choice, administered by PSF.

What is the governance model

PEP-8002

Researched the governance of many other languages and communities.

Actual proposals: PEP 8010-8016

Points of debate
- distributed vs. hierarchical power
- number of people in leadership roles
- importance of being a core dev
- formalization of process/power vs leaving things implicit
- who decides on PEPs?
- …

Winner: PEP-8016, the “steering council model”.

Who’s in charge?

Elections in January, administered by the PSF

Documented in PEP-8100

Core Team
- responsible for project infrastructure (GitHub org and repos, issue trackers, mailing lists, IRC channel, etc.)
- can declare “no confidence” in an individual member or the full membership of the steering council
- can vote to change PEP-13, and with it the current governance (with a 2/3 majority)

Steering Council: has the power to accept/reject PEPs and enforce the code of conduct.

Going forward

What’s changing?
How can you get involved?

Lessons Learned

Make your governance structure explicit
- most projects are BDFL-style by default, implicitly
- no way to handle resignation, death, abuse
Worried about bureaucracy? Just specify that “first level” of governance
- how the decision will happen
- who gets to make the decision
When changing/defining governance, consider the resources that already exist

Shauna Gordon-McKeon

Getting Started Testing in Data Science

Jes Ford

Data Scientist at Recursion in Salt Lake City

Originally from Alaska, have followed the snow all around the western US/Canada
PhD in Astrophysics at UBC and Postdoc in Data Science at UW, Seattle
No formal training in software best practices

motivate()
testing_with_pytest()
data_science_workflows()
data_science_example()
wrap_up()

Why not test

It takes time
As a data scientist I’m balancing
- getting results as quickly as possible
- being confident in the results that I provide
Won’t insist that you always test
But will describe scenarios that you may find yourself in
Disclaimer
- I’m not a testing expert
How do you know if your code is correct
- manual sanity checks
- defensive programming (next level up)
- tests (the best)

Defensive programming

adding assertions in your code
e.g. check that there are things in a list before you try to print them
assertions are a data scientist’s best friend
- this is a practical middle ground to ensure that things are working
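
A minimal sketch of what such assertions might look like. The function `summarize_scores` and its valid range are hypothetical, chosen only to illustrate checking inputs before using them:

```python
def summarize_scores(scores):
    """Hypothetical helper: print and return the mean of a list of scores."""
    # Fail loudly, with a readable message, before doing any work.
    assert len(scores) > 0, "scores is empty; nothing to summarize"
    assert all(0.0 <= s <= 1.0 for s in scores), "scores must lie in [0, 1]"

    mean = sum(scores) / len(scores)
    print(f"{len(scores)} scores, mean = {mean:.3f}")
    return mean
```

Note that Python skips `assert` statements when run with the `-O` flag, so they are a development-time safety net rather than input validation for production code.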

Testing

Use pytest
- less boilerplate
- demo
  - this is great, but these examples are too simple
Problems
- TDD isn’t really a workflow in data science
- We don’t use native objects… data frames, pandas, etc.
- We read/write databases
- We are using probabilistic models that don’t have deterministic results
  - think about ranges
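
One way to read "think about ranges": instead of asserting an exact value from a random process, assert that the result lands in a plausible interval. A pytest-style sketch using only the standard library; `simulate_conversion_rate` and the numbers are hypothetical stand-ins for a real probabilistic model:

```python
import random


def simulate_conversion_rate(n_visitors, true_rate, rng):
    """Hypothetical stand-in for a model with non-deterministic output."""
    conversions = sum(rng.random() < true_rate for _ in range(n_visitors))
    return conversions / n_visitors


def test_conversion_rate_is_in_plausible_range():
    rng = random.Random(0)  # seeding keeps the test repeatable
    rate = simulate_conversion_rate(10_000, 0.3, rng)
    # The standard error here is about 0.005, so +/- 0.03 is a very wide band.
    assert 0.27 <= rate <= 0.33
```

The band is deliberately generous so the test only fails when something is actually broken, not because of ordinary sampling noise.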
Workflows
- “one-off” analyses
- exploratory
- well defined problem
One-off
- don’t write tests
- clear documentation
- maybe if I come back to it, we’ll consider more
Exploratory work
- Kind of impossible to test?
- BUT! I spend a ton of time in this phase, and there’s almost always some piece of code that I write during exploration that is useful down the road.
Well defined problem
- Doesn’t come up often
- When it does, I try to use TDD
Legacy code
- inherited
- something that you did yourself during exploratory
- when I modify that code, I add tests. See: Testing legacy codebases from PyCon 2018
Pandas dataframes
- Useful things that will help you test
- Check for missing values (notnull(), isnull())
- Check for duplicated data (df.duplicated())
- pandas.util.testing.assert_frame_equal… with the check_like, check_dtype, and check_less_precise parameters
- Also handles NaN or None comparison “as expected”… e.g. NaNs are in the same place.
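
A short sketch of these checks together. It assumes a hypothetical `clean` transform, and it imports `assert_frame_equal` from `pandas.testing`, which is the public import path in current pandas releases (the notes above use the older `pandas.util.testing` path):

```python
import numpy as np
import pandas as pd
from pandas.testing import assert_frame_equal


def clean(df):
    """Hypothetical transform: drop duplicate rows and rows missing 'value'."""
    return df.drop_duplicates().dropna(subset=["value"]).reset_index(drop=True)


raw = pd.DataFrame({"id": [1, 1, 2, 3], "value": [10.0, 10.0, np.nan, 30.0]})
cleaned = clean(raw)

# Property-style checks on the result.
assert cleaned["value"].notnull().all()
assert not cleaned.duplicated().any()

# Exact comparison, ignoring column order (check_like) and int vs float (check_dtype).
expected = pd.DataFrame({"value": [10, 30], "id": [1, 3]})
assert_frame_equal(cleaned, expected, check_like=True, check_dtype=False)

# NaNs in the same positions compare as equal.
with_nan = pd.DataFrame({"x": [1.0, np.nan]})
assert_frame_equal(with_nan, with_nan.copy())
```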
Databases
- Let’s say I have a function “load_data” which takes a condition and returns some data frame. And then I transform.
- Could test load_data() end to end, but we don’t want to be querying our database as part of our tests.
  - One solution: pytest-mock
  - patch the query to return some input instead of hitting the database.
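
A sketch of the patching idea. To stay dependency-free it uses the standard library's `unittest.mock` rather than pytest-mock (whose `mocker` fixture wraps the same machinery), and plain lists of dicts rather than data frames. `run_query` and the transform are hypothetical:

```python
from unittest import mock


def run_query(condition):
    """Hypothetical database call; tests should never reach it."""
    raise RuntimeError("tests should not hit the real database")


def load_data(condition):
    """Fetch rows matching the condition, then apply a simple transform."""
    rows = run_query(condition)
    return [{**row, "value_doubled": row["value"] * 2} for row in rows]


def test_load_data_without_database():
    fake_rows = [{"id": 1, "value": 10}, {"id": 2, "value": 20}]
    fake_query = mock.Mock(return_value=fake_rows)

    # Swap the module-level name for the duration of the block.
    # With pytest-mock this would typically be something like
    # mocker.patch("my_module.run_query", return_value=fake_rows).
    with mock.patch.dict(globals(), {"run_query": fake_query}):
        result = load_data("value > 5")

    fake_query.assert_called_once_with("value > 5")
    assert [row["value_doubled"] for row in result] == [20, 40]
```

The test exercises `load_data` end to end, but the query itself is replaced by canned input, so no database is needed.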
Hardcoding input/output dataframes is extremely verbose
- too much code/too much time
- Use the library hypothesis!
- Auto data generation for property based testing
Hypothesis + pandas
- can generate full dataframes
- Hypothesis will try to push the limits
  - empty data frame… values at the ends of the range
Testing the properties of data
- Example with customer loyalty
- Say things that should be true about the output, rather than exactly what the output is
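
A sketch of property-based testing with Hypothesis and its pandas extra. The `total_spend_per_customer` function and the column names are hypothetical and are not taken from the talk's customer loyalty example:

```python
from hypothesis import given, strategies as st
from hypothesis.extra.pandas import column, data_frames


def total_spend_per_customer(purchases):
    """Hypothetical function under test: sum purchase amounts per customer."""
    return purchases.groupby("customer_id")["amount"].sum()


# Strategy that generates whole data frames, including empty ones.
purchase_frames = data_frames(
    columns=[
        column("customer_id", dtype=int,
               elements=st.integers(min_value=1, max_value=5)),
        column("amount", dtype=float,
               elements=st.floats(min_value=0, max_value=1000)),
    ]
)


@given(purchase_frames)
def test_total_spend_properties(purchases):
    totals = total_spend_per_customer(purchases)
    # Properties that should hold for any input, rather than exact outputs.
    assert len(totals) == purchases["customer_id"].nunique()
    assert totals.index.is_unique
    assert (totals >= 0).all()
```

Nothing here hardcodes an expected output frame; the test only states what must be true of any output, and Hypothesis supplies the inputs.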

Wrap up

Data scientists should not always write tests, but should always practice defensive programming
Any reused or shared piece of code should be tested
Strive for a balance between speed and confidence

https://github.com/jesford/testing-in-data-science

5 steps to build Python native GUI widgets for BeeWare

Dan Yeaw

Designs safety into autonomous vehicles for Ford

Very frustrated with GUI widgets for Python. One app for all devices. Felt like I was working against the grain.

Cross-platform, native, app development.

Toga, BeeWare’s GUI Toolkit.

Hello world

Controls and logic that a user interacts with when using a GUI
We’ll use a Canvas widget as an example
Not all widgets have been written in BeeWare

Toga

Layers:

GUI -> Toga TK -> Bridge or Transpiler -> Platform

Within the Toga TK:

Toga_core -> Toga_impl

More terms: toga_core -

toga_impl factory pattern (to improve testability). Tests should not need to know how to recreate the environmenta…
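
A simplified sketch of the factory idea, not Toga's actual code. All class and method names below are hypothetical; the point is only that the core widget asks a factory for its platform implementation, so tests can pass in a dummy factory instead of a real GUI backend:

```python
class DummyCanvasImpl:
    """Hypothetical test backend: records calls instead of drawing."""

    def __init__(self, interface):
        self.interface = interface
        self.actions = []

    def draw_line(self, x1, y1, x2, y2):
        self.actions.append(("line", x1, y1, x2, y2))


class DummyFactory:
    """Hypothetical factory mapping widget names to implementations."""

    Canvas = DummyCanvasImpl


class Canvas:
    """Hypothetical core widget: platform-independent API only."""

    def __init__(self, factory):
        # The core layer never imports a platform; it asks the factory.
        self._impl = factory.Canvas(interface=self)

    def line(self, x1, y1, x2, y2):
        self._impl.draw_line(x1, y1, x2, y2)


canvas = Canvas(factory=DummyFactory)
canvas.line(0, 0, 10, 10)
```

A test can then inspect `canvas._impl.actions` to confirm that the core layer forwarded the right call, without any platform toolkit installed.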

The five steps

Dev platform
Research your widget
- abstraction requires knowledge of specific examples
- create use cases or user stories
- get feedback S
Use Cases
Write the docs


Published

05 May 2019