One of the most powerful things about coding for the sciences is that it costs nothing to re-use code we’ve written in the past, allowing us to build on past work rather than starting over every project or paper.
However, one practice we see again and again is copying and pasting code from one project into another. Sometimes it will just be a function, other times (coughMATLABcough) it’s files.
code in packages makes it really easy to re-use old code: all the scripts and functions end up in a central location and can be called and imported from anywhere on the computer -
just like the famous packages
In this tutorial, we’ll walk through the creation of a simple package from some code that’s just lying around. You can view and clone the demo package. For an unpackaged state, checkout the
The most basic directory structure for a Python package looks like this:
project | |__ setup.py | |__ myPackage | |_ somePython.py |_ __init__.py
But at the moment, we’ve just got some flat files.
project | |__ norms.py |__ metrics.py
So, the first step is to move files around. First comes the hardest part: choosing a package name. I’ll call mine
measure. Create a directory with that name, and move the python files in there.
project | |__ measure |__ norms.py |__ metrics.py
There is one more crucial file:
__init__.py lets the Python interpreter know that there are importable modules in this directory. This is the script that gets run when you execute
import measure. For more about what you can do with modules, you can see the (Python docs)[https://docs.python.org/3/tutorial/modules.html]. After adding
__init__.py, the project directory should be
project | |__ measure |__ __init__.py |__ norms.py |__ metrics.py
At this point, the library can be imported if we’re in the same directory, but it isn’t a package. To let setuptools and
pip know how to handle it, we need to add the
A very basic version of
from setuptools import setup setup( # Whatever arguments you need/want )
If you were to run that file, you’d get a whole bunch of warnings, and nothing would actually get packaged. As a bare minimum, here is our
setup.py for it to work.
from setuptools import setup setup( # Needed to silence warnings (and to be a worthwhile package) name='Measurements', url='https://github.com/jladan/package_demo', author='John Ladan', firstname.lastname@example.org', # Needed to actually package something packages=['measure'], # Needed for dependencies install_requires=['numpy'], # *strongly* suggested for sharing version='0.1', # The license can be anything you like license='MIT', description='An example of a python package from pre-existing code', # We will also need a readme eventually (there will be a warning) # long_description=open('README.txt').read(), )
Technically, you can install the module by running
python setup.py install, however it’s much more convenient to wrap it all up and distribute online.
This is my preferred way of sharing/storing/installing packages. Simply create a git repo of the project directory, and put it somewhere. Then, use
pip to install it from that repo directly
pip install git+https://github.com/jladan/package_demo.git#egg=measurements
An advantage of this is that you can also install to particular branches, tags, and commits. A common practice is add a tag for the version release
git tag v0.1. Users of the library can then lock into that particular version,
pip install git+https://email@example.com#egg=measurements
In my opinion, distributing on PyPI is more complicated than with git, and not much more useful.
You’ll have to maintain the package in two places if you intend on collaborating on development on GitHub and distributing through PyPI.
So why distribute on PyPI? If you’re proud of your software and want others to use it, this is the best place. It’s more visible, easier for your users, and more tightly controlled. The stricter version control (not just whatever commit to whatever repo like git) makes the software more robust and easier to support as a developing team.
For those interested, to upload to PyPI, you’ll need
python setup.py sdist
This creates a
dist directory and adds an archive file (a.k.a. a distribution) of your package.
python setup.py register
to claim your package name - they must be unique. Then, upload your package; you’ll do this every time you want to release a new version of your code:
python setup.py sdist upload
Now, you and anyone else in the world can install your package using the following:
pip install your-package-name
We’ve already begun the documentation by adding a short description, author/maintainer name, and version… but there’s a lot more to add. At a minimum, a
README file should be included, but so should a license text, changes between versions, and actual software documentation.
A readme file summarizes the software. For Python packages, this can named
README.txt. The recommendation is to use [reST]
(http://docutils.sourceforge.net/rst.html), as this is the standard on PyPI
=================== Measurements Module =================== **A demo for how to create a python package from existing code** The actual software provides L1, L2, Supremum, and Hamming metrics and norms for numpy arrays.
Now is a good time to uncomment that
long_description argument in
The typical thing is to put the license you choose in
LICENSE.txt. There are a few choices for licenses, I usually end up using MIT or simple BSD. If you’re using github, you can add most standard open source licenses through the website.
a demo of adding the license file here
Changes between versions are usually tracked in
CHANGES.txt. This is more of a concern if you’re planning to distribute the software.
A good package should also include full documentation and testing. I won’t cover this here, but unit tests can be performed with the
unittest library, with the tests stored in the directory
tests. Many people use Sphinx for documentation, and that can be stored in the
There’s a further step required for distributing. If you want to include these files in the distribution (the archive file in
dist), you’ll need a
# MANIFEST.in include *.txt recursive-include docs *.rst
Suppose you want to include a script that can be executed from the command line. Create a directory
scripts, and put it there. Next, add the
scripts argument to
setup( #... scripts = ['scripts/hello.py'] # ... )
So, at this point, we’ve got a pretty good project skeleton, and most of the basics are covered. Your package should look something like this:
project | |__ setup.py |__ MANIFEST.in | |__ measure | |__ __init__.py | |__ norms.py | |__ metrics.py | |__ tests | |__ test_suite.py | |__ doc | |__ docs.rst | |__ scripts | |__ hello.py | |__ README.txt |__ CHANGES.txt |__ LICENSE.txt
If you have software that covers all these points, then congratulations! You’re well on your way to a well-maintained software package.
To get a detailed look at packaging, check out the official Python docs; they’re complete, accessible, and generally more up to date.