V03 issue721 - fix so filetype detection is now an optional extra #734

Closed
wants to merge 11 commits into from
6 changes: 6 additions & 0 deletions debian/changelog
@@ -1,3 +1,9 @@
metpx-sr3 (3.00.42) UNRELEASED; urgency=medium

* move away from previous release.

-- Peter Silva <peter@blacklab> Tue, 27 Jun 2023 14:26:23 -0400

metpx-sr3 (3.00.41) unstable; urgency=medium

* issue #700 nodupe_redis driver (experimental for now)
14 changes: 8 additions & 6 deletions docs/source/Contribution/Development.rst
@@ -137,16 +137,18 @@ Local Installation

There are many different ways to install python packages on a computer. Different developers
will prefer different methods, and all the methods need to be tested prior to each release.
Sarracenia can work with either mqtt or amqp (most mature and stable) message passing libraries.
Install one of those first. In these examples, we use amqp.

* **Wheel** when people are running different operating systems (non-ubuntu, non-debian) people will be installing wheels, typically that have been uploaded to pypi.python.org. On the other hand, it is a bit of a pain/noise to upload every development version, so we only upload releases, so testing of wheels is done by building local wheels. Need to build a new wheel every time a change is made.

* **pip install (not -e)** would pull a wheel down from pypi.python.org. Generally not used during development of Sarracenia itself.
* **pip install metpx-sr3[amqp]** would pull a wheel down from pypi.python.org. Generally not used during development of Sarracenia itself.
One could also pull in all possible dependencies with **pip install metpx-sr3[all]**.
* **pip install -e .[amqp]** ... lets you edit the source code of the installed package, ideal for debugging problems, because it allows live changes to the application without having to go through building and installing a new package.

* **pip install -e** ... lets you edit the source code of the installed package, ideal for debugging problems, because it allows live changes to the application without having to go through building and installing a new package.
* **apt install metpx-sr3** install debian package from repositories, similarly to pip install (not -e), normally dev snapshots are not uploaded to repositories, so while this would be the normal way for users of ubuntu servers, it is not available during development of the package itself. Also need **apt install python3-amqp**

* **apt install** install debian package from repositories, similarly to pip install (not -e), normally dev snapshots are not uploaded to repositories, so while this would be the normal way for users of ubuntu servers, it is not available during development of the package itself.

* **dpkg -i** builds a debian package for local installation. This is how packages are tested prior to upload to repositories. It can also be used to support development (have to run dpkg -i for each package change.)
* **dpkg -i** builds a debian package for local installation. This is how packages are tested prior to upload to repositories. It can also be used to support development (have to run dpkg -i for each package change.) also need **apt install python3-amqp**

The sr_insects tests invokes the version of metpx-sarracenia that is installed on the system,
and not what is in the development tree. It is necessary to install the package on
@@ -998,7 +1000,7 @@ to identify more issues. sample run to 100,000 entries::
maximum of the shovels is: 100008


While it is runnig one can run flow_check.sh at any time::
While it is running one can run flow_check.sh at any time::

NB retries for sr_subscribe t_f30 0
NB retries for sr_sender 18
6 changes: 6 additions & 0 deletions docs/source/Tutorials/Install.rst
@@ -51,6 +51,7 @@ On Ubuntu 22.04 and derivatives::
sudo add-apt-repository ppa:ssc-hpc-chp-spc/metpx
sudo apt update
sudo apt install metpx-sr3 # main python package.
sudo apt install python3-magic # optionally support file type detection (content-type message headers).
sudo apt install metpx-sr3c # optional C client.
sudo apt install python3-amqp # optionally support rabbitmq brokers
sudo apt install python3-paho-mqtt # optionally support MQTT brokers
@@ -139,6 +140,7 @@ For example, on fedora 28 mandatories::
Optional ones::

$ sudo dnf install python3-amqp # optionally support rabbitmq brokers
$ sudo dnf install python3-magic # optionally support content-type header in messages.
$ sudo dnf install python3-netifaces # optionally support vip directive for HA.
$ sudo dnf install python3-paho-mqtt # optionally support mqtt brokers

@@ -174,6 +176,10 @@ one could also add the extras::

$ pip install metpx-sr3[amqp,mqtt,vip]

for all the extras, there is a shortcut::

$ pip install metpx-sr3[all]

and to upgrade after the initial installation::

$ pip install metpx-sr3
7 changes: 7 additions & 0 deletions docs/source/fr/Contribution/Développement.rst
@@ -133,28 +133,35 @@ Installation locale

Il existe de nombreuses façons d’installer des paquets python sur un ordinateur. Différents développeurs
préféreront différentes méthodes, et toutes les méthodes doivent être testées avant chaque version.
Avant d’installer le paquet, il faut généralement une bibliothèque pour communiquer avec le courtier
de messages (généralement rabbitmq/AMQP, mais ça peut être MQTT également).

* **Wheel** Lorsque les gens utilisent différents systèmes d’exploitation (non-Ubuntu, non-Debian),
les gens installent des wheel, généralement qui ont été téléchargées sur pypi.python.org. D’un
autre côté, c’est un peu pénible / bruyant de télécharger chaque version de développement, donc
nous ne téléchargeons que des versions, donc les tests de wheel se font en construisant des roues
locales. Besoin de construire une nouvelle wheel chaque fois qu’un changement est apporté.
*pip install amqp* sera également nécessaire pour le support rabbitmq.

* **pip install (pas -e)** tirerait une wheel vers le bas de pypi.python.org. Généralement pas utilisé
pendant le développement de Sarracenia lui-même.
*pip install amqp* sera également nécessaire pour le support rabbitmq.

* **pip install -e** ... vous permet de modifier le code source du package installé, idéal pour les
problèmes de débogage, car il permet des modifications en direct de l’application sans avoir à passer
par la construction et l’installation d’un nouveau package.
*pip install amqp* sera également nécessaire pour le support rabbitmq.

* **apt install** installer le paquet Debian à partir de dépôts, de la même manière que pip install (pas -e),
normalement les instantanés de développement ne sont pas téléchargés vers des dépôts, donc bien que ce soit
la manière normale pour les utilisateurs de serveurs Ubuntu, il n’est pas disponible pendant le développement
du paquet lui-même.
*apt install python3-amqp* sera également nécessaire pour le support rabbitmq.

* **dpkg -i** construit un paquet Debian pour l’installation locale. C’est ainsi que les packages sont testés
avant d’être téléchargés vers des référentiels. Il peut également être utilisé pour soutenir le développement
(il faut exécuter dpkg -i pour chaque changement de paquet).
*apt install python3-amqp* sera également nécessaire pour le support rabbitmq.

Le test sr_insects appelle la version de metpx-sarracenia installée sur le système,
et non ce qui est dans l’arbre de développement. Il est nécessaire d’installer le paquet sur
18 changes: 12 additions & 6 deletions docs/source/fr/Tutoriel/Installer.rst
@@ -48,12 +48,13 @@ Sur Ubuntu 22.04 et dérivés du même::

sudo add-apt-repository ppa:ssc-hpc-chp-spc/metpx
sudo apt update
sudo apt install metpx-sr3 # main python package.
sudo apt install metpx-sr3c # optional C client.
sudo apt install python3-amqp # optionally support rabbitmq brokers
sudo apt install python3-paho-mqtt # optionally support MQTT brokers
sudo apt install python3-netifaces # optionally support the vip directive (HA failover.)
sudo apt install python3-dateparser python3-pytz # optionally support ftp polling.
sudo apt install metpx-sr3 # paquet principal.
sudo apt install metpx-sr3c # client binaire (en C) .
sudo apt install python3-amqp # support optionnel pour les courtiers AMQP (rabbitmq)
sudo apt install python3-magic # support optionnel pour les entêtes "content-type" dans les messages
sudo apt install python3-paho-mqtt # support optionnel pour les courtiers MQTT
sudo apt install python3-netifaces # support optionnel pour les vip (haute disponibilité)
sudo apt install python3-dateparser python3-pytz # support optionnel pour les sondages ftp.

Si les paquets ne sont pas disponibles, on peut les remplacer en utilisant python install package (pip)
Actuellement, seuls les paquets Debian incluent des pages de manuel. Les guides sont seulement
@@ -127,6 +128,7 @@ Par exemple, sur fedora 28 obligatoirement::
Facultatifs::

$ sudo dnf install python3-amqp # optionally support rabbitmq brokers
$ sudo dnf install python3-magic # optionally support content-type headers in messages.
$ sudo dnf install python3-netifaces # optionally support vip directive for HA.
$ sudo dnf install python3-paho-mqtt # optionally support mqtt brokers

@@ -161,6 +163,10 @@ on pourrait aussi ajouter les extras::

$ pip install metpx-sr3[amqp,mqtt,vip]

Si on veut avoir tous les extras::

$ pip install metpx-sr3[all]

et à mettre à niveau après l’installation initiale::

$ pip install metpx-sr3
28 changes: 20 additions & 8 deletions sarracenia/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -31,10 +31,10 @@
import datetime
import importlib.util
import logging
import magic
import os
import os.path
import paramiko
import random
import re
import sarracenia.filemetadata
import stat as os_stat
@@ -261,6 +261,9 @@ def durationToSeconds(str_value, default=None) -> float:

if type(str_value) in [int, float]:
return str_value

if type(str_value) is not str:
return 0

if str_value.lower() in [ 'none', 'off', 'false' ]:
return 0
@@ -337,7 +340,7 @@ def __computeIdentity(msg, path, o):
methods = [
'random', 'md5', 'md5name', 'sha512', 'cod,md5', 'cod,sha512'
]
calc_method = choice(methods)
calc_method = random.choice(methods)
elif 'identity' in xattr.x and 'mtime' in xattr.x:
if xattr.get('mtime') >= msg['mtime']:
logger.debug("mtime remembered by xattr")
@@ -471,11 +474,15 @@ def fromFileData(path, o, lstat=None):
if lstat :
if os_stat.S_ISREG(lstat.st_mode):
m.__computeIdentity(path, o)
try:
t = magic.from_file(path,mime=True)
m['contentType'] = t
except Exception as ex:
logging.info("trying to determine mime-type. Exception details:", exc_info=True)
if extras['filetypes']['present']:
try:
t = magic.from_file(path,mime=True)
m['contentType'] = t
except Exception as ex:
logging.info("trying to determine mime-type. Exception details:", exc_info=True)
#else:
# m['contentType'] = 'application/octet-stream' # https://www.rfc-editor.org/rfc/rfc2046.txt (default when clueless)
# I think setting a bad value is worse than none, so just omitting.
elif os_stat.S_ISDIR(lstat.st_mode):
m['contentType'] = 'text/directory' # source: https://www.w3.org/2002/12/cal/rfc2425.html
elif os_stat.S_ISLNK(lstat.st_mode):
@@ -572,7 +579,7 @@ def fromFileInfo(path, o, lstat=None):
'value': o.identity_method[4:]
}
elif o.identity_method in ['random']:
algo = sarracenia.identity.Indentiy.factory(o.identity_method)
algo = sarracenia.identity.Identity.factory(o.identity_method)
algo.set_path(post_relPath)
msg['identity'] = {
'method': o.identity_method,
@@ -800,6 +807,7 @@ def getContent(msg):

amqp - ability to communicate with AMQP (rabbitmq) brokers
mqtt - ability to communicate with MQTT brokers
filetypes - ability to set the contentType header of messages (file type detection)
ftppoll - ability to poll FTP servers
vip - enable vip (Virtual IP) settings to implement singleton processing
for high availability support.
@@ -812,6 +820,7 @@ def getContent(msg):
'ftppoll' : { 'modules_needed': ['dateparser', 'pytz'], 'present': False, 'lament' : 'will not be able to poll with ftp' },
'humanize' : { 'modules_needed': ['humanize' ], 'present': False, 'lament': 'humans will have to read larger, uglier numbers' },
'mqtt' : { 'modules_needed': ['paho.mqtt.client'], 'present': False, 'lament': 'will not be able to connect to mqtt brokers' },
'filetypes' : { 'modules_needed': ['magic'], 'present': False, 'lament': 'will not be able to set content headers' },
'vip' : { 'modules_needed': ['netifaces'] , 'present': False, 'lament': 'will not be able to use the vip option for high availability clustering' },
'watch' : { 'modules_needed': ['watchdog'] , 'present': False, 'lament': 'cannot watch directories' }
}
@@ -833,6 +842,9 @@ def getContent(msg):

# Some sort of graceful fallback, or good messaging for when dependencies are missing.

if extras['filetypes']['present']:
import magic

if extras['mqtt']['present']:
import paho.mqtt.client
if not hasattr( paho.mqtt.client, 'MQTTv5' ):
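
The hunks above rely on `extras[...]['present']` being filled in before the guarded imports run; the code that does so lies outside the lines shown in this diff. A minimal sketch of one way such a probe could work, assuming `importlib.util.find_spec` is used. The helper names `module_available` and `probe_extras`, and the two demo entries, are hypothetical and not taken from the PR:

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if `name` can be located (the module itself is not imported)."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # For dotted names (e.g. paho.mqtt.client) find_spec imports the parent
        # package first, which raises when the parent is absent.
        return False


def probe_extras(extras: dict) -> dict:
    """Mark each extra as present only if every module it needs is available."""
    for info in extras.values():
        info['present'] = all(module_available(m) for m in info['modules_needed'])
    return extras


# Hypothetical table in the same shape as the extras dict shown in the diff.
demo_extras = probe_extras({
    'stdlib_demo': {'modules_needed': ['json'], 'present': False, 'lament': 'n/a'},
    'missing_demo': {'modules_needed': ['no_such_module_for_demo'], 'present': False,
                     'lament': 'feature disabled'},
})

for name, info in demo_extras.items():
    print(name, 'present' if info['present'] else 'missing: ' + info['lament'])
```

Probing with `find_spec` rather than a bare `import` keeps the check cheap, and the `lament` text gives a ready-made message to log when a feature is unavailable.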
2 changes: 1 addition & 1 deletion sarracenia/_version.py
Original file line number Diff line number Diff line change
@@ -1 +1 @@
__version__ = "3.00.41"
__version__ = "3.00.42"
3 changes: 2 additions & 1 deletion sarracenia/config.py
Original file line number Diff line number Diff line change
Expand Up @@ -1569,7 +1569,8 @@ def parse_file(self, cfg, component=None):
setattr(self, k, v)
else:
#FIXME: with _options lists for all types and addition of declare, this is probably now dead code.
logger.debug('possibly undeclared option: %s' % line )
if k not in self.undeclared:
logger.debug('possibly undeclared option: %s' % line )
v = ' '.join(line[1:])
if hasattr(self, k):
if type(getattr(self, k)) is float:
2 changes: 1 addition & 1 deletion sarracenia/examples/flow/amserver.conf
Original file line number Diff line number Diff line change
Expand Up @@ -9,5 +9,5 @@ sum sha512
AllowIPs 127.0.0.1
AllowIPs 199.212.17.131/24

destination am://0.0.0.0:5003
sendTo am://0.0.0.0:5003
debug on
28 changes: 17 additions & 11 deletions sarracenia/flow/__init__.py
Original file line number Diff line number Diff line change
@@ -1,7 +1,6 @@
import copy
import importlib
import logging
import magic
import os
import re

@@ -58,6 +57,9 @@
'vip': None
}

if sarracenia.extras['filetypes']['present']:
import magic

if sarracenia.extras['vip']['present']:
import netifaces

@@ -1835,13 +1837,6 @@ def download(self, msg, options) -> bool:
if not self.o.dry_run:
if accelerated:
self.proto[self.scheme].update_file(new_inflight_path)
if (new_inflight_path != new_file):
if os.path.isfile(new_file):
os.remove(new_file)
os.rename(new_inflight_path, new_file)
# older versions don't include the contentType, so patch it here.
if 'contentType' not in msg:
msg['contentType'] = magic.from_file(new_file,mime=True)
elif len_written < 0:
logger.error("failed to download %s" % new_file)
return False
@@ -1865,8 +1860,18 @@ def download(self, msg, options) -> bool:
'incomplete download only %d of expected %d bytes for %s'
% (len_written, block_length, new_inflight_path))
return False

msg['size'] = len_written
# when len_written is different than block_length
msg['size'] = len_written

# if we haven't returned False by this point, assuming download was successful
if (new_inflight_path != new_file):
if os.path.isfile(new_file):
os.remove(new_file)
os.rename(new_inflight_path, new_file)

# older versions don't include the contentType, so patch it here.
if sarracenia.extras['filetypes']['present'] and 'contentType' not in msg:
msg['contentType'] = magic.from_file(new_file,mime=True)

self.metrics['flow']['transferRxBytes'] += len_written
self.metrics['flow']['transferRxFiles'] += 1
@@ -1954,7 +1959,8 @@ def send(self, msg, options):
local_path = '/' + msg['relPath']

# older versions don't include the contentType, so patch it here.
if 'contentType' not in msg and not 'fileOp' in msg:
if sarracenia.extras['filetypes']['present'] and \
('contentType' not in msg) and ('fileOp' not in msg):
msg['contentType'] = magic.from_file(local_path,mime=True)

local_dir = os.path.dirname(local_path).replace('\\', '/')
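
The same guard now appears in both `download` and `send`: only call `magic.from_file` when the optional module is available and the message lacks a `contentType`. A standalone sketch of that pattern follows. It uses a plain `try`/`except ImportError` instead of `sarracenia.extras` so that it runs without Sarracenia installed; `HAVE_MAGIC` and `patch_content_type` are hypothetical names, not part of the PR:

```python
import logging
import os
import tempfile

logger = logging.getLogger(__name__)

try:
    import magic  # optional; provided by the python-magic package
    HAVE_MAGIC = True
except ImportError:
    magic = None
    HAVE_MAGIC = False


def patch_content_type(msg: dict, path: str) -> dict:
    """Add msg['contentType'] when possible; otherwise leave msg unchanged."""
    if HAVE_MAGIC and 'contentType' not in msg and 'fileOp' not in msg:
        try:
            msg['contentType'] = magic.from_file(path, mime=True)
        except Exception:
            # Omit the header rather than guess a possibly wrong value.
            logger.info("could not determine mime-type of %s", path, exc_info=True)
    return msg


# Demo on a small temporary text file.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as f:
    f.write('hello\n')
    demo_path = f.name

demo_msg = patch_content_type({'relPath': demo_path.lstrip('/')}, demo_path)
kept_msg = patch_content_type({'contentType': 'text/plain'}, demo_path)
os.remove(demo_path)

print(demo_msg.get('contentType', '(no contentType: python-magic not available)'))
```

Leaving the header out when detection is unavailable matches the comment in `fromFileData`: a missing value is treated as preferable to a wrong one.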
6 changes: 5 additions & 1 deletion setup.py
Original file line number Diff line number Diff line change
Expand Up @@ -83,13 +83,17 @@ def read(*parts):
'Topic :: System :: Logging',
],
install_requires=[
"appdirs", "humanfriendly", "humanize", "jsonpickle", "python-magic", "paramiko",
"appdirs", "humanfriendly", "humanize", "jsonpickle", "paramiko",
"psutil>=5.3.0", "watchdog"
],
extras_require = {
'amqp' : [ "amqp" ],
'filetypes': [ "python-magic" ],
'ftppoll' : ['dateparser' ],
'mqtt': [ 'paho.mqtt>=1.5.1' ],
'vip': [ 'netifaces' ],
'redis': [ 'redis' ]
})
extras_require['all'] = list(itertools.chain.from_iterable(extras_require.values()))
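
The `all` shortcut depends on `itertools` being imported in `setup.py`; that import is not visible in this hunk. A standalone sketch of the same flattening, with the import made explicit and the extras copied from the hunk above:

```python
import itertools

extras_require = {
    'amqp': ['amqp'],
    'filetypes': ['python-magic'],
    'ftppoll': ['dateparser'],
    'mqtt': ['paho.mqtt>=1.5.1'],
    'vip': ['netifaces'],
    'redis': ['redis'],
}

# The right-hand side is evaluated before the 'all' key is created,
# so the flattened list never contains itself.
extras_require['all'] = list(itertools.chain.from_iterable(extras_require.values()))

print(extras_require['all'])
```

If two extras ever share a requirement, the list would carry duplicates; wrapping the chain in `sorted(set(...))` is one possible way to avoid that.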


27 changes: 26 additions & 1 deletion tests/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -55,12 +55,15 @@ If you want to run this in VSCode, and have it do all the things nicely, you'll
- [GitLens — Git supercharged](https://marketplace.visualstudio.com/items?itemName=eamodio.gitlens)
Not strictly required, but *very* strongly recommended as it makes VS Code's git features fully functional

Beyond that, changing a few options in your settings file will make it all work; thusly:
Beyond that, changing a few things in your VS Code configs will make it all work.

In `settings.json`, to get all the reports and coverage when running tests, and allow you to run individual tests even if they have dependencies:
```json
{
"python.testing.pytestArgs": [
"tests", "-v",
"--cov-config=tests/.coveragerc", "--cov=sarracenia", "--cov-report=xml", "--cov-report=html",
"--html=tests/report.html", "--self-contained-html",
"--failed-dependency-action=run", "--missing-dependency-action=run"
],
"python.testing.unittestEnabled": false,
@@ -69,6 +72,28 @@ Beyond that, changing a few options in your settings file will make it all work;
}
```


In `launch.json` (per [documentation](https://code.visualstudio.com/docs/python/testing#_debug-tests)), to enable full debugging support in your tests:
```json
{
"version": "0.2.0",
"configurations": [
{
"name": "Python: Debug Tests",
"type": "python",
"request": "launch",
"program": "${file}",
"purpose": ["debug-test"],
"console": "integratedTerminal",
"justMyCode": false,
"env": {"PYTEST_ADDOPTS": "--no-cov"}
}
]
}
```

**NOTE:** Don't just squash whatever you have in `settings.json`, or `launch.json`, but use some common sense to merge what's above into your existing files.

## Docker
You can also run the exact same tests from within a Docker container if you want to avoid having to (re)-provision clean installs.

1 change: 1 addition & 0 deletions tests/requirements.txt
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,7 @@ pytest-cov>=4.0
pytest-bug>=1.2
pytest-depends>=1.0
pytest-html>=3.2
pytest-mock>=3.11

python-redis-lock>=4
fakeredis>=2.11