Use Postgres to manage Python workloads asynchronously, powered by asyncpg. asyncpg-queue is a simple library whose features include:
- Ability to run both synchronous and asynchronous Python callables
- At-least-once execution of queued tasks
- Scales with the number of database connections available
- Dependency free apart from asyncpg
- Uses Postgres notification channels to avoid thrashing the database with unnecessary polling
To get started using asyncpg-queue, initialize the Postgres objects that it relies on:
import asyncpg
from asyncpg_queue import bootstrap
db = await asyncpg.connect("postgresql://[email protected]:5432/postgres")
await bootstrap(db)
Now tasks can be enqueued for future processing. The queue.put method is naive and should, in most cases, be used within a transaction, as in the following contrived example:
import asyncpg
from asyncpg_queue import queue

db = await asyncpg.connect("postgresql://[email protected]:5432/postgres")

async with db.transaction():
    # Create the user and enqueue the welcome email in the same transaction.
    await db.execute(
        "INSERT INTO users (name, email) VALUES ($1, $2)",
        "Someone Like a User",
        "[email protected]",
    )
    await queue.put(
        db,
        "send-welcome-email",
        data={
            "email": "[email protected]",
            "name": "Someone Like a User",
            "stuff": "more of it",
        },
    )
Calling put within a transaction is useful because tasks meant to be processed asynchronously should often be enqueued only if the process that generates them succeeds. The example above treats a successful commit of the database transaction as a strong indicator that the user was created and should therefore receive a welcome email. If the transaction rolls back, the task is never enqueued. However, there is no requirement that put be called within a transaction.
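The rollback behavior described above can be tried without a Postgres server. The following is a minimal sketch, not asyncpg-queue's implementation: it uses the standard library's sqlite3 module as a stand-in for Postgres, and the users and tasks tables and the create_user helper are hypothetical names chosen for illustration.

```python
import sqlite3

# Stand-in schema: sqlite3 replaces Postgres, and both tables are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, email TEXT UNIQUE)")
conn.execute("CREATE TABLE tasks (queue TEXT, data TEXT)")
conn.commit()


def create_user(name: str, email: str) -> bool:
    """Insert a user and enqueue a welcome email in one transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            # Enqueue first to show that a rollback also discards the task.
            conn.execute(
                "INSERT INTO tasks (queue, data) VALUES (?, ?)",
                ("send-welcome-email", email),
            )
            conn.execute(
                "INSERT INTO users (name, email) VALUES (?, ?)",
                (name, email),
            )
        return True
    except sqlite3.IntegrityError:
        return False


first = create_user("Someone Like a User", "user@example.com")
second = create_user("Duplicate User", "user@example.com")  # violates UNIQUE
task_count = conn.execute("SELECT COUNT(*) FROM tasks").fetchone()[0]
```

Only the first call commits, so exactly one task row survives. The task enqueued during the failed second call disappears together with the failed insert.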
Processing tasks entails creating and running a worker process.
from asyncpg_queue import Worker

# Handler for tasks on the "send-welcome-email" queue.
def send_welcome_email(email, name, **kwargs):
    print("sending a welcome email!")

worker = Worker(
    "postgresql://[email protected]:5432/postgres",
    tasks={
        "send-welcome-email": send_welcome_email,
    },
)
await worker.run()
Notice the tasks parameter passed as part of Worker's initialization. This map instructs the worker to process the "send-welcome-email" queue of tasks with the specified function.
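To illustrate how a map from queue names to callables can serve both synchronous and asynchronous handlers, here is a minimal sketch. It is not asyncpg-queue's actual dispatch code: the TASKS map, the dispatch function, and the send_receipt handler are hypothetical, and it assumes task data is passed to the handler as keyword arguments.

```python
import asyncio
import inspect
from typing import Any, Callable

sent: list[str] = []  # records what the handlers did, for demonstration


def send_welcome_email(email: str, name: str, **kwargs: Any) -> None:
    # Synchronous handler; unused keys land in kwargs.
    sent.append(f"welcome:{email}")


async def send_receipt(email: str, **kwargs: Any) -> None:
    # Asynchronous handler (hypothetical second queue).
    await asyncio.sleep(0)
    sent.append(f"receipt:{email}")


TASKS: dict[str, Callable[..., Any]] = {
    "send-welcome-email": send_welcome_email,
    "send-receipt": send_receipt,
}


async def dispatch(queue_name: str, data: dict[str, Any]) -> None:
    """Look up the handler for a queue and call it with the task data."""
    handler = TASKS[queue_name]
    if inspect.iscoroutinefunction(handler):
        await handler(**data)
    else:
        # Run blocking callables in a thread so the event loop stays free.
        await asyncio.to_thread(handler, **data)


async def main() -> None:
    await dispatch(
        "send-welcome-email",
        {"email": "user@example.com", "name": "A User", "stuff": "more of it"},
    )
    await dispatch("send-receipt", {"email": "user@example.com"})


asyncio.run(main())
```

Running a synchronous handler in a thread is one possible design choice; whether the library does the same is not stated here.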
asyncpg-queue uses Poetry to manage its dependencies, development tooling, and build.
poetry install
A Docker container running Postgres is used during testing. Assuming that docker is available on your system path at the time of running tests, the appropriate image(s) will be pulled.
Tests are invoked by Pytest:
poetry run pytest test/
Alternatively, if you have a running Postgres instance and do not want to rely on Docker, pass its DSN to Pytest so that it is used during testing:
poetry run pytest --postgres-dsn=postgresql://postgres@localhost:5433/postgres test/
Code formatting is enforced by Black:
poetry run black .
Linting (and auto-fixing where possible) is done by Ruff:
poetry run ruff check --fix .
Types are checked with Mypy:
poetry run mypy --install-types ./asyncpg_queue/ ./test/
Unused code is checked by Vulture:
poetry run vulture asyncpg_queue/ test/
Keep your project simple for as long as possible! While simplicity is in the eye of the beholder, the definition used here amounts to this: refrain from adding tools until they are necessary.
Many projects begin simply with a server and a data store. Eventually, as the project gains users and gathers complexity, there may be a need to do something in a separate process so as not to impede the main line. This something could be sending emails through some email SaaS provider's API or calculating the total number of new users of a particular feature at the end of the day. asyncpg-queue is meant for this moment in an application's history.
asyncpg-queue and similar implementations have been successfully used to postpone or forestall implementing queues and background workers with Redis, Celery, or a variety of other tools. While many of these tools are not difficult to operate and PaaS vendors often offer a managed version, introducing a new tool always carries an additional complexity cost. asyncpg-queue should keep your toolset consistent since it relies only on Python and Postgres.
The primary caveat of this library is that if the database is the bottleneck in an application deployment, then using this tool will only add to the pressure on Postgres. There will be more connections opened, more queries, and some additional data stored. If any of those areas is already a problem, it will almost certainly get worse with the introduction of asyncpg-queue.
While fast enough, asyncpg-queue has little ability to scale the performance of producers (i.e., adding to the queue) or consumers (i.e., popping from the queue) because of its reliance on Postgres. Only so much data can be written or read given a network configuration and the server instance running Postgres. To play around with this idea, consult example/benchmark/producer.py and example/benchmark/consumer.py, which should provide estimates of the maximum read and write throughput of your setup.
asyncpg-queue is well suited for workloads that are mostly I/O. An example would be calculating an end-of-day rollup table in Postgres that takes a long time to run. However, asyncpg-queue is ill-suited for running many CPU-intensive tasks, like training a neural network or performing the same end-of-day rollup in memory. In these cases it's necessary to pay attention to the concurrency parameter of Worker.
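As a rough illustration of what a concurrency limit does for I/O-bound tasks, the sketch below caps the number of tasks in flight with an asyncio.Semaphore. It is a generic asyncio pattern under stated assumptions, not a description of how Worker implements its concurrency parameter; the handle and run_all functions are hypothetical.

```python
import asyncio

in_flight = 0  # tasks currently being processed
peak = 0       # highest number of tasks observed in flight at once


async def handle(task_id: int, limit: asyncio.Semaphore) -> None:
    """Process one task while holding a slot from the semaphore."""
    global in_flight, peak
    async with limit:
        in_flight += 1
        peak = max(peak, in_flight)
        await asyncio.sleep(0.01)  # stands in for I/O such as a long query
        in_flight -= 1


async def run_all(concurrency: int, total: int) -> None:
    # At most `concurrency` tasks may hold the semaphore at the same time.
    limit = asyncio.Semaphore(concurrency)
    await asyncio.gather(*(handle(i, limit) for i in range(total)))


asyncio.run(run_all(concurrency=3, total=10))
```

A limit like this helps when tasks spend their time waiting on I/O. A CPU-intensive coroutine does not yield to the event loop while it computes, so raising the limit would not make such tasks run in parallel.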