Publishing Python Modules

There are problems that require specific solutions, and problems that require generalizable solutions. This article is aimed at the latter, using Python as an example. When building out large systems, it’s important to keep in mind the DRY principle: Don’t Repeat Yourself! Repetition in a codebase can turn a simple change into a tangled, error-prone mess (hence, spaghetti code). One way to reduce repetition within a single codebase is to modularize functionality: once repetitious code is decomposed into modular functions, an update to a single function replaces the arduous task of updating code in distant corners of the codebase. But what if you’re building something bigger, and shared code exists beyond a single codebase?

Enter packages. Every major programming language has some mechanism through which code can be shared and used by different people: Ruby has Gems, Node has npm, and Python has PyPI. Packages help reduce the effort required to compose software by modularizing pre-built functionality that can be imported and used in any codebase. If you wanted to write a Python program that can graph a dataset, you certainly could write your own graphing library, but this might not be the best use of your time. Graphing is a common problem, and common problems often already have solutions. The path of least resistance would be to import an existing library (such as matplotlib) and use that for your graphing needs (unless, of course, your needs require a level of complexity and sophistication that existing packages can’t provide, in which case a more custom solution is required).

Writing Custom Modules

There are many reasons to write custom modules: you might want to package some functionality that would be useful across your organization’s codebases, you might want to open source a solution that worked well for a particular problem you solved, or you might just be interested in how programmers share code. Whatever the case, mature languages will almost certainly have the facilities to share code. In fact, I’d argue that any moderately useful language should have some type of package management solution. It’s impossible to write code efficiently if every time you want to do something as common as print to stdout, you have to write the supporting functionality yourself. Of course, this is coupled with the question of how extensive a language's standard library should be. That's a discussion beyond the scope of this article, but I point the curious reader to Guy Steele's fantastic talk on growing a language.

Exploring Package Management in Python

Note: I use the terms package and module interchangeably. Strictly speaking, a module is a single file, whereas a package is a collection of modules. For the sake of simplicity, my examples use a single module, but the same applies to packages in general.

With sufficient motivation for shared packages, we can now look into the tools within Python that make this magic possible. pip is the utility that allows you to install packages in Python. There's no consensus over what the name actually stands for ("pip installs Python"? "pip installs packages"? or some other uninspired variation). Regardless of what it stands for, usage is incredibly simple. For our matplotlib example above, installing the package is as simple as running pip install matplotlib.

Note: Some Python installations might not come with pip out of the box. From the pip documentation:

pip is already installed if you are using Python 2 >=2.7.9 or Python 3 >=3.4 downloaded from python.org or if you are working in a Virtual Environment created by virtualenv or pyvenv. Just make sure to upgrade pip.

Now we know how to install packages, but where do those packages come from?

PyPI: Python Package Index

The Python Package Index (PyPI) is the most popular public index of Python packages. Anybody can publish to or install from PyPI. If you have shared code that is sensitive (for example, shared code to handle auth within an organization), there are excellent private options, such as JFrog Artifactory or AWS CodeArtifact.

pip defaults to using the PyPI index (https://pypi.org/simple) when installing packages.

If you’re using a private option like Artifactory, you would need to create a pip configuration file that tells pip to use Artifactory when resolving packages.
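As a rough sketch of what such a configuration might contain: pip reads an INI-style file (pip.conf, or pip.ini on Windows) whose [global] section can set index-url. The snippet below renders that file's text with Python's configparser. The URL is a made-up placeholder, not a real Artifactory endpoint, so substitute whatever your own instance provides.

```python
import configparser
import io

# Hypothetical index URL, used only for illustration.
PRIVATE_INDEX_URL = "https://artifactory.example.com/artifactory/api/pypi/pypi-virtual/simple"


def render_pip_conf(index_url):
    """Return the text of a pip configuration file that points pip at index_url."""
    config = configparser.ConfigParser(interpolation=None)
    config["global"] = {"index-url": index_url}
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


pip_conf_text = render_pip_conf(PRIVATE_INDEX_URL)
print(pip_conf_text)
```

Writing the printed text to your per-user pip configuration file should make pip resolve packages against that index instead of PyPI.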

Writing a Python Module

Writing a module in Python to be shared across files in a single codebase is as simple as defining a function and importing it into other files. If I define the following function in my_cool_function.py:
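A minimal stand-in for that file might look like the following. The greeting behaviour is a hypothetical placeholder chosen purely for illustration; any function would do.

```python
# my_cool_function.py
def my_cool_function(name="world"):
    """Return a friendly greeting (placeholder behaviour for illustration)."""
    return f"Hello, {name}!"
```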

I can import this function in another file as easily as:
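The line that matters is the from ... import statement. The rest of this sketch only exists to make it runnable on its own: it writes a hypothetical my_cool_function.py (with placeholder greeting behaviour) into a temporary directory and puts that directory on the import path. In a real project the two files would simply sit side by side.

```python
import sys
import tempfile
from pathlib import Path

# Hypothetical module contents, written to disk so the import below has a target.
MODULE_SOURCE = '''
def my_cool_function(name="world"):
    return f"Hello, {name}!"
'''

with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "my_cool_function.py").write_text(MODULE_SOURCE)
    sys.path.insert(0, tmp)  # make the temporary directory importable
    try:
        from my_cool_function import my_cool_function
    finally:
        sys.path.remove(tmp)

greeting = my_cool_function("PyPI")
print(greeting)
```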

But perhaps my_cool_function is so cool and useful that you feel the need to share it with the world: you want to publish your module to PyPI. Writing the module is the difficult part; publishing it is seamless once you have the proper infrastructure in place.

Publishing a Python Module

The base infrastructure for writing a module is the actual module code, and a few additional files. The basic module structure should look something like the below:
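One way to picture the layout is to scaffold it. The sketch below creates an empty skeleton using the file names this article discusses (src/, the two requirements files, README.md, MANIFEST.in, and setup.py). The project directory name and the helper itself are assumptions made for illustration.

```python
import tempfile
from pathlib import Path

# File names taken from the layout described in this article; contents are left empty.
MODULE_LAYOUT = [
    "src/__init__.py",
    "src/my_cool_function.py",
    "MANIFEST.in",
    "README.md",
    "requirements.txt",
    "requirements-dev.txt",
    "setup.py",
]


def scaffold(root):
    """Create an empty module skeleton under root and return the files created."""
    root = Path(root)
    for relative in MODULE_LAYOUT:
        target = root / relative
        target.parent.mkdir(parents=True, exist_ok=True)
        target.touch()
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file())


with tempfile.TemporaryDirectory() as tmp:
    created = scaffold(Path(tmp) / "my_cool_function")
    for path in created:
        print(path)
```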

The actual source code lives in src/. Along with the file we have for our code, we also have an __init__.py, which lets Python know that the directory is a package. This is a contrived example; in reality, your module may have many more files in the src directory that support your core module. (Note: the src/ directory can actually be called whatever you like. All that matters is that you properly identify the root directory of your source files in your setup.py, which we’ll get to in a moment.)

The requirements.txt and requirements-dev.txt files are standard within Python projects. They contain all of your project’s dependencies (the split between a regular and a dev file is purely up to the programmer, but there are always benefits to extra decomposition). Think of these files as a list of libraries that any end user would need to install in order to properly use your library. As long as setup.py declares them as install requirements, pip downloads these dependencies automatically when the module is installed.

The README.md is standard across most codebases (not just Python). It should contain a description of your module and whatever other information you want to include.

Finally, the files that actually provide the backbone of your module are MANIFEST.in and setup.py. MANIFEST.in specifies what additional files to include as part of your module (other than your source files), and setup.py is the actual module configuration. The setup.py file contains all of the metadata and information required to properly package your module for whatever index you want to upload to. An example for us might look something like this:
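Here is a minimal sketch of such a setup.py, assuming setuptools. The version, description, and Python requirement are placeholder values, and the two helper functions are hypothetical conveniences for reading README.md and the requirements files. The keyword arguments themselves (name, packages, install_requires, extras_require, long_description) are standard setuptools options.

```python
# setup.py (sketch; metadata values are placeholders)
import sys
from pathlib import Path


def read_requirements(path):
    """Return the non-empty, non-comment lines of a requirements file ([] if absent)."""
    path = Path(path)
    if not path.exists():
        return []
    lines = (line.strip() for line in path.read_text().splitlines())
    return [line for line in lines if line and not line.startswith("#")]


def build_metadata(root="."):
    """Collect the keyword arguments that get passed to setuptools.setup()."""
    root = Path(root)
    readme = root / "README.md"
    return {
        "name": "my_cool_function",  # the name users pass to pip install
        "version": "0.1.0",  # placeholder
        "description": "A very cool function",  # placeholder
        "long_description": readme.read_text() if readme.exists() else "",
        "long_description_content_type": "text/markdown",
        "packages": ["src"],  # location of the source files
        "install_requires": read_requirements(root / "requirements.txt"),
        "extras_require": {"dev": read_requirements(root / "requirements-dev.txt")},
        "python_requires": ">=3.6",  # placeholder
    }


# Only hand off to setuptools when a command (for example sdist) was supplied.
if __name__ == "__main__" and len(sys.argv) > 1:
    from setuptools import setup

    setup(**build_metadata())
```

Keeping the metadata in a plain function makes it easy to inspect what will be packaged without triggering a build.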

The above example is configured to include the README.md file, along with the regular and dev dependencies, in the package definition. Most of this should be self-explanatory, but of note is the name argument to the setup function, which is the actual name of the package (i.e., you would install this module by running pip install my_cool_function). The packages argument specifies the location of the source files, in this case src. The rest of the options are metadata for the package; for an extended explanation of each, check out the docs.

With the proper infrastructure in place, the final step is to actually upload and publish the package. To do so with PyPI, you need to create an account at https://pypi.org/. Once you have an account, you’ll need an API token, which authenticates your uploads.

To actually upload your package, first generate the distribution archives, then upload them to the index.
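Those two steps can be sketched as follows, assuming the build and twine tools are installed (pip install build twine). Which tools the original workflow used is an assumption on my part, and the helper names are hypothetical. The script defaults to a dry run that only prints the commands.

```python
import glob
import subprocess
import sys


def build_command():
    """Command that writes a source archive and a wheel into dist/."""
    return [sys.executable, "-m", "build"]


def upload_command(dist_dir="dist", repository=None):
    """Command that uploads every archive found in dist_dir."""
    command = [sys.executable, "-m", "twine", "upload"]
    if repository is not None:
        # For example, repository="testpypi" targets TestPyPI instead of PyPI.
        command += ["--repository", repository]
    return command + sorted(glob.glob(f"{dist_dir}/*"))


def publish(repository=None, dry_run=True):
    """Build, then upload. With dry_run=True the commands are only printed."""
    for make_command in (build_command, lambda: upload_command(repository=repository)):
        command = make_command()  # built lazily so the upload sees fresh archives
        print(" ".join(command))
        if not dry_run:
            subprocess.run(command, check=True)


publish(repository="testpypi", dry_run=True)
```

When run for real, twine prompts for credentials; with an API token, the username is __token__ and the password is the token itself.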

You can check that your package uploads properly using TestPyPI, a separate index meant for testing module uploads. The Python packaging documentation has an excellent tutorial on this exact process: https://packaging.python.org/tutorials/packaging-projects/.

Now that your package is published, you can easily pip install it: pip install my_cool_function

Maintaining a Module

When maintaining a Python module (or any code, for that matter), you should automate as much of the work as possible. While we can’t really automate writing a module (at least not entirely), we can automate its integration and deployment. Even if you aren’t practicing rigorous Test Driven Development (TDD), the long-term correctness of your code depends on automated testing. Discussion of effective testing is beyond the scope of this article, but there are lots of ways to test your code. My recommended solution is pytest.
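To give a flavour of what that looks like, here is a minimal pytest-style test file. The function under test is a hypothetical stand-in with placeholder greeting behaviour, defined inline so the snippet is self-contained; in a real project you would import it from your module instead.

```python
# test_my_cool_function.py
def my_cool_function(name="world"):
    """Hypothetical stand-in for the module under test."""
    return f"Hello, {name}!"


def test_default_greeting():
    assert my_cool_function() == "Hello, world!"


def test_custom_name():
    assert my_cool_function("PyPI") == "Hello, PyPI!"
```

Running pytest in the project directory discovers files and functions prefixed with test_ and reports any failing assertions.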

In the spirit of developing shared code, I have a starter repo called module_starter_cli that comes preconfigured with all the necessary infrastructure to automate testing via pytest, versioning and publishing through python-semantic-release, and deployment through CircleCI. Forking this repo for downstream module projects gives you automated tests on every branch and every pull request, and it automatically versions your code and handles the hassle of deployment, creating a seamless, low-friction development cycle. I can go into detail on this setup and the philosophy behind shared infrastructure code in a future article. Till then, I hope this article helps your solutions become a little more generalizable.
