One of the most powerful things about coding for the sciences is that it costs nothing to re-use code we’ve written in the past, allowing us to build on past work rather than starting over every project or paper.
However, one practice we see again and again is copying and pasting code from one project into another. Sometimes it will just be a function, other times (coughMATLABcough) it’s files.
Assembling
code in packages makes it really easy to re-use old code: all the scripts and functions end up in a central location and can be called and imported from anywhere on the computer -
just like the famous packages numpy
or matplotlib
.
In this tutorial, we’ll walk through the creation of a simple package from some code that’s just lying around. You can view and clone the demo package. For an unpackaged state, checkout the unpackaged
tag.
The most basic directory structure for a Python package looks like this:
project
|
|__ setup.py
|
|__ myPackage
|
|_ somePython.py
|_ __init__.py
But at the moment, we’ve just got some flat files.
project
|
|__ norms.py
|__ metrics.py
So, the first step is to move files around. First comes the hardest part: choosing a package name. I’ll call mine measure
. Create a directory with that name, and move the python files in there.
project
|
|__ measure
|__ norms.py
|__ metrics.py
There is one more crucial file: __init__.py
lets the Python interpreter know that there are importable modules in this directory. This is the script that gets run when you execute import measure
. For more about what you can do with modules, you can see the (Python docs)[https://docs.python.org/3/tutorial/modules.html]. After adding __init__.py
, the project directory should be
project
|
|__ measure
|__ __init__.py
|__ norms.py
|__ metrics.py
At this point, the library can be imported if we’re in the same directory, but it isn’t a package. To let setuptools and pip
know how to handle it, we need to add the setup.py
file.
A very basic version of setup.py
is
from setuptools import setup
setup(
# Whatever arguments you need/want
)
If you were to run that file, you’d get a whole bunch of warnings, and nothing would actually get packaged. As a bare minimum, here is our setup.py
for it to work.
from setuptools import setup
setup(
# Needed to silence warnings (and to be a worthwhile package)
name='Measurements',
url='https://github.com/jladan/package_demo',
author='John Ladan',
author_email='jladan@uwaterloo.ca',
# Needed to actually package something
packages=['measure'],
# Needed for dependencies
install_requires=['numpy'],
# *strongly* suggested for sharing
version='0.1',
# The license can be anything you like
license='MIT',
description='An example of a python package from pre-existing code',
# We will also need a readme eventually (there will be a warning)
# long_description=open('README.txt').read(),
)
Technically, you can install the module by running python setup.py install
, however it’s much more convenient to wrap it all up and distribute online.
This is my preferred way of sharing/storing/installing packages. Simply create a git repo of the project directory, and put it somewhere. Then, use pip
to install it from that repo directly
pip install git+https://github.com/jladan/package_demo.git#egg=measurements
An advantage of this is that you can also install to particular branches, tags, and commits. A common practice is add a tag for the version release git tag v0.1
. Users of the library can then lock into that particular version,
pip install git+https://github.com/jladan/package_demo.git@v0.1#egg=measurements
In my opinion, distributing on PyPI is more complicated than with git, and not much more useful.
You’ll have to maintain the package in two places if you intend on collaborating on development on GitHub and distributing through PyPI.
So why distribute on PyPI? If you’re proud of your software and want others to use it, this is the best place. It’s more visible, easier for your users, and more tightly controlled. The stricter version control (not just whatever commit to whatever repo like git) makes the software more robust and easier to support as a developing team.
For those interested, to upload to PyPI, you’ll need
python setup.py sdist
This creates a dist
directory and adds an archive file (a.k.a. a distribution) of your package.
To get set up on PyPI for the first time with a
new package, make a new account on PyPI (or for trying it out, the testing PyPI site), and
then back in your package
directory do
python setup.py register
to claim your package name - they must be unique. Then, upload your package; you’ll do this every time you want to release a new version of your code:
python setup.py sdist upload
Now, you and anyone else in the world can install your package using the following:
pip install your-package-name
We’ve already begun the documentation by adding a short description, author/maintainer name, and version… but there’s a lot more to add. At a minimum, a README
file should be included, but so should a license text, changes between versions, and actual software documentation.
A readme file summarizes the software. For Python packages, this can named README
, README.rst
, or README.txt
. The recommendation is to use [reST]
(http://docutils.sourceforge.net/rst.html), as this is the standard on PyPI
===================
Measurements Module
===================
**A demo for how to create a python package from existing code**
The actual software provides L1, L2, Supremum, and Hamming metrics and norms for numpy arrays.
Now is a good time to uncomment that long_description
argument in setup.py
.
The typical thing is to put the license you choose in LICENSE.txt
. There are a few choices for licenses, I usually end up using MIT or simple BSD. If you’re using github, you can add most standard open source licenses through the website.
a demo of adding the license file here
Changes between versions are usually tracked in CHANGES.txt
. This is more of a concern if you’re planning to distribute the software.
A good package should also include full documentation and testing. I won’t cover this here, but unit tests can be performed with the unittest
library, with the tests stored in the directory tests
. Many people use Sphinx for documentation, and that can be stored in the doc
directory.
There’s a further step required for distributing. If you want to include these files in the distribution (the archive file in dist
), you’ll need a MANIFEST.in
file.
# MANIFEST.in
include *.txt
recursive-include docs *.rst
Suppose you want to include a script that can be executed from the command line. Create a directory scripts
, and put it there. Next, add the scripts
argument to setup.py
setup(
#...
scripts = ['scripts/hello.py']
# ...
)
So, at this point, we’ve got a pretty good project skeleton, and most of the basics are covered. Your package should look something like this:
project
|
|__ setup.py
|__ MANIFEST.in
|
|__ measure
| |__ __init__.py
| |__ norms.py
| |__ metrics.py
|
|__ tests
| |__ test_suite.py
|
|__ doc
| |__ docs.rst
|
|__ scripts
| |__ hello.py
|
|__ README.txt
|__ CHANGES.txt
|__ LICENSE.txt
If you have software that covers all these points, then congratulations! You’re well on your way to a well-maintained software package.
To get a detailed look at packaging, check out the official Python docs; they’re complete, accessible, and generally more up to date.