Documentation work... clarifying things, adding MQTT cases.
petersilva committed Feb 22, 2022
1 parent 9b0777a commit 8bace1a
Showing 5 changed files with 130 additions and 62 deletions.
2 changes: 1 addition & 1 deletion debian/control
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,7 @@ Source: metpx-sr3
Section: python
Priority: optional
Maintainer: Shared Services Canada Supercomputing <[email protected]>
Build-Depends: debhelper (>= 9), python3, python3-all, python3-setuptools, python3-docutils, dh-python, python3-amqp, python3-appdirs, python3-watchdog, python3-paramiko, python3-netifaces, python3-humanize, python3-jsonpickle, python3-psutil
Build-Depends: debhelper (>= 9), python3, python3-all, python3-setuptools, python3-docutils, dh-python, python3-amqp, python3-appdirs, python3-watchdog, python3-paramiko, python3-netifaces, python3-humanize, python3-jsonpickle, python3-psutil, python3-dateparser
Standards-Version: 3.9.5
X-Python3-Version: >= 3.6
Vcs-Git: https://github.com/MetPX/sarracenia
Expand Down
26 changes: 26 additions & 0 deletions docs/Contribution/Development.rst
Original file line number Diff line number Diff line change
Expand Up @@ -1385,6 +1385,32 @@ Automated Build
* for Sarrac, follow the procedure `here <https://github.com/MetPX/sarrac#release-process>`_
* The built packages will be available in the `metpx ppa <https://launchpad.net/~ssc-hpc-chp-spc/+archive/ubuntu/metpx>`_

Ubuntu 18.04
++++++++++++

For Ubuntu 18.04 (bionic), there are a few wrinkles. The recipe is called *metpx-sr3-daily-bionic*, and it
takes source from a different branch: *v03_launchpad*. For every release, this branch needs to be rebased from
*v03_wip*:

* git checkout v03_launchpad
* git rebase v03_wip
* git push
* import source
* Request build from *metpx-sr3-daily-bionic* Recipe.

What is different about this *v03_launchpad* branch? It:

* removes the dependency on python3-paho-mqtt, as the version in the repositories is too old.
* removes the dependency on python3-dateparser, as that package is not available in the repository.
* overrides the testing target in debian/rules, because testing without the dependencies fails::

    override_dh_auto_test:
    	echo "disable on 18.04... some deps must come from pip"

The missing dependencies should be installed with pip3.



Building a Windows Installer
++++++++++++++++++++++++++++

Expand Down
147 changes: 95 additions & 52 deletions docs/Explanation/Concepts.rst
Original file line number Diff line number Diff line change
Expand Up @@ -11,7 +11,7 @@ but it seems this information should be available *somewhere*.
Introduction
------------

Sarracenia pumps form a network. The network uses amqp brokers as a transfer
Sarracenia pumps form a network. The network uses MQP brokers as a transfer
manager which sends advertisements in one direction and report messages in the
opposite direction. Administrators configure the paths that data flows through
at each pump, as each broker acts independently, managing transfers from
Expand Down Expand Up @@ -44,7 +44,7 @@ The Flow Algorithm
~~~~~~~~~~~~~~~~~~

All of the components (post, subscribe, sarra, sender, shovel, watch, winnow)
share substantial code and differ only in default settings. The Flow
share substantial code and differ only in default settings. The Flow
algorithm is:

* Gather a list of messages
Expand Down Expand Up @@ -101,40 +101,61 @@ The components just have different default settings:
+------------------------+--------------------------+
| Component | Use of the algorithm |
+------------------------+--------------------------+
+------------------------+--------------------------+
| *sr_subscribe* | Gather=gather.message |
| | |
| Download file from a | Filter |
| pump. If the local | |
| host is a pump, | Do=Download |
| post the downloaded | |
| file. | Outlet=optional |
| pump. | |
| | Work=Download |
| default mirror=False | |
| All others it is True| post=optional |
+------------------------+--------------------------+
| *sr_sarra* | Gather=gather.message |
| | |
| Used on pumps. | |
| Download file from a | Filter |
| pump to another pump | |
| Post the file from | |
| the new pump so that | |
| subscribers to | Work=Download |
| this pump can | |
| download in turn. | post=post |
| | |
+------------------------+--------------------------+
| *sr_poll* | Gather |
| | if has_vip: poll |
| | |
| Find files on other | Filter |
| servers to post to | |
| a pump. | if has_vip: |
| | Do=nil |
| | Work=nil |
| has_vip* | |
| | Outlet=yes |
| | Post=yes |
| | Message?, File? |
+------------------------+--------------------------+
| *sr_shovel/sr_winnow* | Gather=gather.message |
| *sr_shovel* | Gather=gather.message |
| | |
| Move posts or | Filter (shovel cache=off)|
| reports around. | |
| | Do=nil |
| | Work=nil |
| | |
| | Post=yes |
+------------------------+--------------------------+
| *sr_winnow* | Gather=gather.message |
| | |
| | Outlet=yes |
| Move posts or | Filter (shovel cache=off)|
| reports around. | |
| | Work=nil |
| suppress duplicates | |
| | Post=yes |
+------------------------+--------------------------+
| *sr_post/watch* | Gather=gather.file |
| | |
| Find file on a | Filter |
| local server to | |
| post | Do=nil |
| post | Work=nil |
| | |
| | Outlet=yes |
| | Post=yes |
| | Message?, File? |
+------------------------+--------------------------+
| *sr_sender* | Gather=gather.message |
Expand Down Expand Up @@ -176,7 +197,7 @@ do not have the vip the following algorithmic loop will continue:

* gather
* filter
* after_accept
* after_accept
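
As a rough illustration of the loop above, the following Python sketch shows one simplified pass of a flow in which gathering and filtering always run, while work and post are skipped when the instance does not hold the vip. Every name here (``flow_pass``, ``gather``, ``accept``, ``work``, ``post``, ``has_vip``) is hypothetical and chosen for illustration; this is not Sarracenia's actual implementation or API.

```python
def flow_pass(gather, accept, work, post, has_vip=True):
    """One hypothetical pass: gather, filter, then work and post if active."""
    messages = gather()                              # gather a list of messages
    accepted = [m for m in messages if accept(m)]    # filter them
    if not has_vip:
        # Passive mode: stop after filtering, nothing is worked on or posted.
        return accepted
    done = []
    for m in accepted:
        work(m)      # e.g. download or send; a no-op for some components
        post(m)      # announce the result
        done.append(m)
    return done


# Example with stand-in callables (assumed inputs, not real messages).
posted = []
result = flow_pass(
    gather=lambda: ["a.txt", "b.tmp", "c.txt"],
    accept=lambda m: m.endswith(".txt"),
    work=lambda m: None,
    post=posted.append,
)

passive_posted = []
passive_result = flow_pass(
    gather=lambda: ["a.txt", "b.tmp"],
    accept=lambda m: m.endswith(".txt"),
    work=lambda m: None,
    post=passive_posted.append,
    has_vip=False,
)
```

In the passive case the accepted messages are still returned, which is what would let a caller keep a cache up to date without posting anything.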

The poll's gather and filter remain active even in passive mode, which
allows it to subscribe to the exchange it is posting to and update its cache
Expand Down Expand Up @@ -208,20 +229,19 @@ be rabbitmq specific, but management functions differ between implementations.
*Queues* are usually taken care of transparently, but you need to know
- A consumer/subscriber creates a queue to receive messages.
- Consumer queues are *bound* to exchanges (AMQP-speak)
- MQTT equivalent: *client-id*

An *exchange* is a matchmaker between *publisher* and *consumer* queues.
An *exchange* is a matchmaker between *publisher* and *consumer queues*.
- A message arrives from a publisher.
- message goes to the exchange, is anyone interested in this message?
- in a *topic based exchange*, the message topic provides the *exchange key*.
- interested: compare message key to the bindings of *consumer queues*.
- message is routed to interested *consumer queues*, or dropped if there aren't any.
- concept does not exist in MQTT, used as root of the topic hierarchy.
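
To make the matchmaking step concrete, here is a minimal Python sketch of how a topic exchange compares a message key against a binding, using the AMQP conventions (``.`` separator, ``*`` for exactly one word, ``#`` for zero or more words). The function name ``topic_matches`` is hypothetical; a real broker does this matching internally, so this is only a model of the behaviour.

```python
def topic_matches(binding_key, routing_key):
    """Return True if an AMQP-style binding key matches a message's topic key."""
    pattern = binding_key.split(".")
    words = routing_key.split(".")

    def match(p, w):
        if p == len(pattern):
            # Pattern exhausted: match only if all words were consumed too.
            return w == len(words)
        if pattern[p] == "#":
            # '#' may absorb zero or more of the remaining words.
            return any(match(p + 1, k) for k in range(w, len(words) + 1))
        if w < len(words) and pattern[p] in ("*", words[w]):
            # '*' or a literal word consumes exactly one word.
            return match(p + 1, w + 1)
        return False

    return match(0, 0)
```

For example, a binding of ``v03.observations.#`` would match the key ``v03.observations.swob.today``, while ``v03.*`` would not, because ``*`` covers only a single word.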

Multiple processes can share a *queue*, they just take turns removing messages from it.
- This is used heavily for sr_sarra and sr_subscribe multiple instances.

*Queues* can be *durable*, so even if your subscription process dies,
if you come back in a reasonable time and you use the same queue,
you will not have missed any messages.
- Same concept is available as *shared subscriptions* in MQTT.

How to Decide if Someone is Interested.
- For Sarracenia, we use (AMQP standard) *topic based exchanges*.
Expand All @@ -231,43 +251,66 @@ How to Decide if Someone is Interested.
- Resolution & syntax of server filtering is set by AMQP. (. separator, # and * wildcards)
- Server side filtering is coarse, messages can be further filtered after download using regexp on the actual paths (the reject/accept directives.)
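
The client-side step can be pictured with a short Python sketch: after the broker's coarse topic filtering, an ordered list of accept/reject regular expressions is applied to the actual path. The sketch assumes that the first matching rule decides and that unmatched paths are rejected by default; both the function ``is_wanted`` and the rule representation are hypothetical, not Sarracenia's configuration parser.

```python
import re


def is_wanted(path, rules, default=False):
    """Apply ordered (verdict, regex) rules to a path; the first match decides."""
    for verdict, pattern in rules:
        if re.search(pattern, path):
            return verdict == "accept"
    # No rule matched: fall back to the assumed default.
    return default


# Hypothetical rules: drop temporary files, keep anything under observations.
rules = [
    ("reject", r"\.tmp$"),
    ("accept", r"/observations/"),
]
```

With these rules, ``/data/observations/swob/x.xml`` is kept, while ``/data/observations/x.tmp`` is dropped because the reject rule comes first.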

topic prefix? We start the topic tree with fixed fields
- v02 the version/format of sarracenia messages.
- post ... the message type, this is an announcement
of a file (or part of a file) being available.


Sarracenia is a MQP Application
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
AMQP v09 (Rabbitmq) Settings
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

MetPX-Sarracenia is only a light wrapper/coating around Message Queueing Protocols.

- A MetPX-Sarracenia data pump is a python AMQP application that uses a (rabbitmq)
broker to co-ordinate SFTP and HTTP client data transfers, and accompanies a
web server (apache) and sftp server (openssh), often on the same user-facing address.

- A MetPX-Sarracenia data pump can also work with rabbitmq replaced by an MQTT broker
such as mosquitto.org (but some administrivia must be handled manually.)

- Wherever reasonable, we use their terminology and syntax.
If someone knows AMQP, they understand. If not, they can research.

- Users configure a *broker*, instead of a pump.
- by convention, the default vhost '/' is always used (did not feel the need to use other vhosts yet)
- users can explicitly pick their *queue* names (this is a client-id in MQTT.)
- users set *subtopic*,
- topics with dot separator are minimally transformed, rather than encoded.
- queue *durable*.
- we use *message headers* (AMQP-speak for key-value pairs) rather than encoding in JSON or some other payload format.

- reduce complexity through conventions.
- use only one type of exchanges (Topic), take care of bindings.
- naming conventions for exchanges and queues.
- exchanges start with x.
- xs_Weather - the exchange for the source (amqp user) named Weather to post messages
- xpublic -- exchange used for most subscribers.
- queues start with q\_

For those who are familiar with the underlying protocols, these are the mappings:

- A MetPX-Sarracenia data pump is a python AMQP application that uses a (rabbitmq)
broker to co-ordinate SFTP and HTTP client data transfers, and accompanies a
web server (apache) and sftp server (openssh), often on the same user-facing address.

- A MetPX-Sarracenia data pump can also work with rabbitmq replaced by an MQTT broker
such as mosquitto.org (but some administrivia must be handled manually.)

- Wherever reasonable, we use their terminology and syntax.
If someone knows AMQP, they understand. If not, they can research.

- Users configure a *broker*, instead of a pump.
- by convention, the default vhost '/' is always used (did not feel the need to use other vhosts yet)
- users can explicitly pick their *queue* names (this is a client-id in MQTT.)
- users set *subtopic*,
- topics with dot separator are minimally transformed, rather than encoded.
- queue is set to *durable* so that messages are not lost across broker restarts.
- we use *message headers* (AMQP-speak for key-value pairs) rather than encoding in JSON or some other payload format.
- *expire*: how long to keep an idle queue or exchange around.

- reduce complexity through conventions.
- use only one type of exchanges (Topic), take care of bindings.
- naming conventions for exchanges and queues.
- exchanges start with x.
- xs_Weather - the exchange for the source (MQP user) named Weather to post messages
- xpublic -- exchange used for most subscribers.
- queues start with q\_

MQTT (version =5) Settings
~~~~~~~~~~~~~~~~~~~~~~~~~~

MQTT is actually a better match to Sarracenia than AMQP, as it is entirely
based on hierarchical topics, while topics are only one among a variety of
choices for routing methods in AMQP.

- in MQTT, topic separator is / instead of .
- the MQTT topic wildcard *#* is the same as in AMQP (match rest of topic)
- the MQTT topic wildcard *+* is the same as the AMQP *\** (match exactly one topic level.)
- an AMQP "Exchange" is mapped to the root of the MQTT topic tree,
- an AMQP "queue" is represented in MQTT by *client-id* and a *shared subscription*
Note: Shared subscriptions are only present in MQTTv5. So Sarracenia can only easily

* AMQP: A queue named *queuename* is bound to an exchange xpublic with key: v03.observations ...
* MQTT subscription: topic $share/*queuename*/xpublic/v03/observations ...

- connections are clean_session=0 normally, to recover messages when a connection is broken.
- MQTT QoS 1 is used so that messages are delivered at least once, avoiding the overhead
of guaranteeing exactly-once delivery.
- AMQP *prefetch* mapped to MQTT *receiveMaximum*
- *expire* has same meaning in MQTT as in AMQP.

MQTT v3 lacks shared subscriptions, and the recovery logic is quite different.
Sarracenia only supports v5.
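
The mappings in this section can be summarised in a few lines of Python. This is a sketch under stated assumptions, not Sarracenia's code: the function name ``amqp_to_mqtt_subscription`` is hypothetical, and the shared-subscription prefix is written ``$share``, as spelled in the MQTT v5 specification.

```python
def amqp_to_mqtt_subscription(exchange, queue, binding_key):
    """Translate an AMQP (exchange, queue, binding key) into an MQTT v5 topic filter."""
    # '.' separators become '/', '*' becomes '+', and '#' stays as it is.
    levels = ["+" if word == "*" else word for word in binding_key.split(".")]
    # The exchange becomes the root of the topic tree; the queue name
    # becomes the share name of a shared subscription.
    return "/".join(["$share", queue, exchange] + levels)


topic = amqp_to_mqtt_subscription("xpublic", "queuename", "v03.observations.#")
```

Here ``topic`` is the filter a subscriber would use in place of binding *queuename* to the exchange *xpublic*.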


Flow Through Pumps
Expand Down
11 changes: 6 additions & 5 deletions docs/Explanation/Glossary.rst
Original file line number Diff line number Diff line change
Expand Up @@ -27,23 +27,24 @@ Subscribers
Post, Notice, Notification, Advertisement, Announcement
These are AMQP messages built by sr_post, sr_poll, or sr_watch to let users
know that a particular file is ready. The format of these AMQP messages is
described by the `sr_post(7) <../Reference/sr3.1.rst#post>`_ manual page. All of these
described by the `sr_post(7) <../Reference/sr_post.7.rst>`_ manual page. All of these
words are used interchangeably. Advertisements at each step preserve the
original source of the posting, so that report messages can be routed back
to the source.


Report messages
These are AMQP messages (in `sr_post(7) <../Reference/sr3.1.rst#post>`_ format, with _report_
These are AMQP messages (in `sr_post(7) <../Reference/sr_post.7.rst>`_ format, with _report_
field included) built by consumers of messages, to indicate what a given pump
or subscriber decided to do with a message. They conceptually flow in the
opposite direction of notifications in a network, to get back to the source.


Pump or broker
A pump is a host running Sarracenia, a rabbitmq AMQP server (called a *broker*
in AMQP parlance) The pump has administrative users and manage the AMQP broker
as a dedicated resource. Some sort of transport engine, like an apache
A pump is a host running Sarracenia and either a rabbitmq AMQP server or an MQTT
one such as mosquitto. The message queueing middleware is called a *broker.*
The pump has administrative users and manages the MQP broker
as a dedicated resource. Some sort of transport engine, like an apache
server, or an openssh server, is used to support file transfers. SFTP, and
HTTP/HTTPS are the protocols which are fully supported by sarracenia. Pumps
copy files from somewhere, and write them locally. They then re-advertise the
Expand Down
6 changes: 2 additions & 4 deletions docs/Explanation/SarraPluginDev.rst
Original file line number Diff line number Diff line change
Expand Up @@ -3,8 +3,6 @@
Sarracenia Programming Guide
=============================

[ `version française <fr/Prog.rst>`_ ]

---------------------
Working with Plugins
---------------------
Expand Down Expand Up @@ -53,10 +51,10 @@ Introduction
------------

A Sarracenia data pump is a web server with notifications for subscribers to
know, quickly, when new data has arrived. To find out what data is already
know, quickly, when new data has arrived. To find out what data is already
available on a pump, view the tree with a web browser. For simple immediate
needs, one can download data using the browser itself or through a standard tool
such as wget. The usual intent is for sr_subscribe to automatically download
such as wget. The usual intent is for sr_subscribe to automatically download
the data wanted to a directory on a subscriber machine where other software
can process it.

Expand Down
