Welcome to the Developer Guide for contributing to BiocPy and/or adding new packages. This repository provides guidance on the developer tools we employ to ensure code quality and consistency within and across all the packages.
Maintaining Consistency: When developing a package, it's essential to maintain consistency in documentation and code style. Across packages, these guidelines establish a reliable framework for testing and publishing, ensuring a seamless development process.
We use pyscaffold to streamline the process of creating a new package. Optionally, you can enhance your workflow by installing the markdown (unless you really want to use restructuredtext) and pre-commit extensions:
pip install -U pyscaffold pyscaffoldext-markdown pre-commit
Creating a new package is as simple as running the following command:
putup <NEW_PACKAGE_NAME> --markdown --pre-commit
This command sets up the basic package structure.
Embracing Google's Python Style Guide: We highly recommend adhering to Google's Python style guide for consistency. It provides detailed information on naming conventions, type annotations, and more.
- Classes should use PascalCase and should follow Bioconductor's class names.
- Methods should use snake_case and should take the form of <verb>[_<details>]. For example, get_start(), set_names() and so on.
- Method arguments should also use snake_case.
For each Bioconductor class, we aim to provide the same user experience in Python.
In most cases, this is done by just directly re-implementing the class and its associated methods in Python.
Occasionally, the Bioconductor implementation has some historical baggage (e.g., the storage of rowData in a RangedSummarizedExperiment, MultiAssayExperiment harmonization);
developers should use their own discretion to decide whether that really needs to be replicated in Python.
The existence of mutable types in Python means that it can be dangerous to modify complex objects. If a mutable object has a user-visible reference and is also a member of a larger Bioconductor object, a user-specified modification to that object may violate the constraints of the parent object.
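As a minimal sketch of this hazard, consider a hypothetical container that requires one row name per row. The Table class and its fields are invented for illustration and are not part of any BiocPy package.

```python
class Table:
    """Hypothetical container: row_names must have one entry per row."""

    def __init__(self, rows: list, row_names: list):
        if len(rows) != len(row_names):
            raise ValueError("'row_names' must have one entry per row")
        self._rows = rows
        self._row_names = row_names  # stores the caller's list without copying

    def is_valid(self) -> bool:
        return len(self._rows) == len(self._row_names)


names = ["a", "b"]
table = Table([[1, 2], [3, 4]], names)
valid_before = table.is_valid()  # the constraint holds at construction

# The user still holds a reference to the same list, so mutating it
# silently changes the state of 'table' as well.
names.append("c")
valid_after = table.is_valid()  # the constraint is now violated
```

The constructor validated its inputs, yet the object ended up in an invalid state without any of its methods being called.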
To mitigate these issues, we enforce a functional programming discipline in all class methods. By default, all methods should avoid side effects that mutate the object. This simplifies reasoning around the effect of methods in large complex objects.
The most obvious application of this philosophy is in setter methods.
Rather than mutating the object directly, they should return a new copy of the object with the desired modification.
The "depth" of the copy depends on the nature of the field being set; the aim should be to avoid any modification of the contents in self.
Implementations may offer an in_place= option to apply the modification to the original object, but this should be False by default.
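A setter following this convention might look like the sketch below. The Region class is hypothetical and only illustrates the pattern; a shallow copy is assumed to be sufficient here because the only field being set is replaced wholesale.

```python
from copy import copy


class Region:
    """Hypothetical class used only to illustrate the setter convention."""

    def __init__(self, names: list):
        self._names = list(names)

    def get_names(self) -> list:
        # Returned without copying; callers should treat it as read-only.
        return self._names

    def set_names(self, names: list, in_place: bool = False) -> "Region":
        # Work on a shallow copy unless the caller explicitly opts in to mutation.
        output = self if in_place else copy(self)
        output._names = list(names)
        return output


original = Region(["chr1", "chr2"])
modified = original.set_names(["chrX", "chrY"])  # 'original' is left untouched
```

Returning the object in both cases lets callers chain setters regardless of whether in_place is used.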
To avoid performance issues, getter methods may return mutable objects without copying. This assumes that their return values are treated as read-only and will not be directly mutated. (Setter methods that operate via a copy are allowed.) In some cases, the return value of a getter method may be directly mutated, e.g., because a copy was already created in the getter; this should be clearly stated in the documentation but should not be treated as the default.
Direct access to class members (via properties or @property) should generally be avoided,
as it is too easy to perform modifications via one-liners with class.property on the left-hand side of an assignment.
The default assumption is that property-based setters will mutate the object in place.
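The following sketch shows why property-based setters are assumed to mutate in place: an assignment statement cannot return a new object. The Sample class is hypothetical and exists only for this illustration.

```python
class Sample:
    """Hypothetical class exposing a field through a property."""

    def __init__(self, names: list):
        self._names = list(names)

    @property
    def names(self) -> list:
        return self._names

    @names.setter
    def names(self, names: list):
        # An assignment cannot hand back a copy, so this can only mutate 'self'.
        self._names = list(names)


sample = Sample(["chr1"])
alias = sample  # e.g., the same object held inside a larger container
sample.names = ["chrX"]  # a one-liner that also changes what 'alias' sees
```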
pyscaffold uses tox to create isolated environments for testing, documentation, and publishing packages. Familiarize yourself with available tox commands. You'll rarely need to modify the default tox.ini file.
We recommend a line length of 120 as it seems to work well in most scenarios, but feel free to modify this to your preference. We highly recommend using ruff for linting.
To use this configuration, add the following to the end of your pyproject.toml file:
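[tool.ruff]
line-length = 120
src = ["src"]
exclude = ["tests"]
# F821: undefined name
extend-ignore = ["F821"]

[tool.ruff.pydocstyle]
# Check docstrings against the Google style
convention = "google"

[tool.ruff.per-file-ignores]
# E402: import not at top of file; F401: unused import
"__init__.py" = ["E402", "F401"]

[tool.black]
force-exclude = "__init__.py"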
An additional step might be needed if you are using pre-commit. Our recommended pre-commit configuration is included in this repository.
We have also enabled the pre-commit bot across all BiocPy packages to automatically fix code and documentation issues.
We use the furo theme across all packages for a unified look. Add furo to docs/requirements.txt and update the HTML theme to use furo (in docs/conf.py).
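As a sketch, the theme change in docs/conf.py amounts to a single setting. This assumes that furo is already installed in the documentation environment and that the rest of the generated conf.py is left unchanged.

```python
# docs/conf.py (excerpt)
# Replaces the theme set by the generated configuration.
html_theme = "furo"
```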
In addition, we use sphinx-autodoc-typehints for cleaner API documentation. Include this package in docs/requirements.txt and add it as an extension in docs/conf.py:
extensions = [
    "sphinx.ext.autodoc",
    "sphinx.ext.intersphinx",
    "sphinx.ext.todo",
    "sphinx.ext.autosummary",
    "sphinx.ext.viewcode",
    "sphinx.ext.coverage",
    "sphinx.ext.doctest",
    "sphinx.ext.ifconfig",
    "sphinx.ext.mathjax",
    "sphinx.ext.napoleon",
    "sphinx_autodoc_typehints",
]
Something on our todo list is to explore quartodoc and/or Jupyter Book for documentation and tutorials (vignette-like reproducible tutorials).
Utilize intersphinx to link objects from other packages. For instance, to link to a pandas DataFrame object, one might simply specify :py:class:`~pandas.DataFrame` within their docstrings. While this is sometimes annoying, it helps developers and users recognize the underlying data model or objects the documentation refers to.
Make sure to add the external packages you link to in the "intersphinx_mapping" in docs/conf.py.
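A mapping along these lines is a reasonable starting point. The two entries and their URLs are assumptions chosen for illustration; list whichever packages your docstrings actually reference.

```python
# docs/conf.py (excerpt)
# Each entry maps a package name to the base URL of its documentation.
# 'None' tells Sphinx to fetch the objects.inv inventory from that URL.
intersphinx_mapping = {
    "python": ("https://docs.python.org/3", None),
    "pandas": ("https://pandas.pydata.org/pandas-docs/stable", None),
}
```

With the pandas entry in place, a :py:class:`~pandas.DataFrame` reference in a docstring resolves to a link into the pandas documentation.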
Note that dunder methods of classes aren't automatically documented. Modify these defaults as needed in your docs/conf.py:
autodoc_default_options = {
    'special-members': True,
    'undoc-members': False,
    'exclude-members': '__weakref__, __dict__, __str__, __module__, __init__'
}
autosummary_generate = True
autosummary_imported_members = True
As the term suggests, type hints are only "hints", used to enhance the developer experience; they should not dictate how we write our code. For this reason, we prefer simple types in these hints, usually corresponding to base Python types with minimal nesting. For example, if a function is expected to operate on any arbitrary list, the basic list type hint should suffice.
def find_element(arr: list, query: int):
    pass
If your function expects a list of strings,
from typing import List

def find_element(arr: List[str], query: str):
    pass
If your function accepts multiple types as inputs,
from typing import List, Union

def find_element(arr: List[str], query: Union[int, str, slice]):
    pass
There is no need to waste time constructing the most perfectly descriptive type for your arguments or return values; just use a simple hint with minimal nesting and put the details in the docstring instead.
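One way to apply this advice is sketched below: the hints stay simple and the Google-style docstring carries the details. The behavior given to find_element here is an assumption made for illustration, since the stubs above do not define one.

```python
from typing import List, Union


def find_element(arr: List[str], query: Union[int, str]) -> int:
    """Find the position of an element in a list of strings.

    Args:
        arr: List of strings to search.
        query: Either an integer position, which is checked against the
            length of ``arr`` and returned as-is, or a string to look up.

    Returns:
        Zero-based position of the element in ``arr``.

    Raises:
        IndexError: If an integer ``query`` is out of range.
        ValueError: If a string ``query`` is not present in ``arr``.
    """
    if isinstance(query, int):
        if not 0 <= query < len(arr):
            raise IndexError("'query' is out of range")
        return query
    return arr.index(query)
```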
For most packages, the included GitHub workflows should suffice as long as you follow the instructions in this document. You might need to set up twine to publish packages to PyPI.
If you are developing packages interfacing with C/C++ libraries and require building multiple wheels, refer to our GitHub workflows in scranpy. We currently use cibuildwheel, which is very slow, and we are trying to speed up this workflow.
Your contributions and packages are valuable to BiocPy, and we hope this guide helps set guidelines and standards. Thank you for being a part of our developer community and, more importantly, have fun!