Issue1029 wmo00 accumulated file format support #1030
config settings:
sample run: UPDATE: module names changed. The attachment is a run from before the change was made, so the modules have the old names.
FWIW, I call it WMO-00 format because I heard that somewhere (possibly in WMO meetings). I think it refers to bytes 9 and 10 in the header, which are 0's (aka nulls). There is another type described on the same page, where the type specifier is 01, but that file format is deprecated according to the source and our French colleagues.
questions:
I'm scarred by amtcpwriter ;-) where nobody knows what _read and _write are referring to. Perhaps different names for the modules? Suggestions?
Would names like that be clearer?
I went with wmo00_accumulate and wmo00_split, which seem a bit more descriptive.
The other thing missing is how this will be tested: there is no integration in flow_tests or unit tests.
sarracenia/flowcb/log.py
Outdated
s+= f"a file with baseUrl: {msg['baseUrl']} "
s+= f"a file "

if 'baseURl' in msg:
There's a typo here:
# v
if 'baseURl' in msg:
good eye!
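For anyone reading along, here is a small sketch of why that typo matters. Dictionary keys are case-sensitive, so a lookup for 'baseURl' quietly takes the fallback branch instead of raising an error. The describe_file helper and the sample message are hypothetical, made up for illustration; they are not the code in log.py.

```python
# Hypothetical helper, for illustration only; not the actual code in log.py.
def describe_file(msg: dict, key: str) -> str:
    """Build the log fragment, looking the base URL up under the given key."""
    if key in msg:
        return f"a file with baseUrl: {msg[key]} "
    # Fallback branch: reached whenever the key is absent, including when
    # the key name is merely misspelled.
    return "a file "


# Hypothetical message with the correctly spelled key.
msg = {"baseUrl": "https://example.com/", "relPath": "a/b.txt"}

typo_result = describe_file(msg, "baseURl")   # misspelled: capital R
fixed_result = describe_file(msg, "baseUrl")  # correct spelling
```

With the misspelled key the base URL silently disappears from the log line, which is easy to miss in testing.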
WMO-00 is a format for data exchange with WMO GTS nodes.
One needs to put a header and possibly a trailer on the content of each inbound individual file. Once the headers are added, the data for multiple messages are concatenated together. The resulting accumulation file is then named CCCC00000000.a, where CCCC is the WMO origin code, the 8 digits are a number that makes the file name unique, and the a suffix is a type specifier.
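To make the idea concrete, here is a simplified sketch of accumulating and splitting. It is not the plugin code and not the exact WMO-00 byte layout: it assumes a 10-byte header made of an 8-digit ASCII length followed by a two-character type specifier written as ASCII "00", and it leaves out any trailer or other envelope characters the manual requires. All function names, and the example origin code in the usage note, are hypothetical.

```python
import random

# Assumption: the type specifier is written as two ASCII '0' characters.
TYPE_SPECIFIER = b"00"
HEADER_LEN = 10  # 8 length digits + 2 type-specifier bytes (assumed layout)


def frame(payload: bytes) -> bytes:
    """Prefix one message with its length and the type specifier."""
    return f"{len(payload):08d}".encode("ascii") + TYPE_SPECIFIER + payload


def accumulate(payloads) -> bytes:
    """Concatenate framed messages into one accumulation blob."""
    return b"".join(frame(p) for p in payloads)


def split(blob: bytes) -> list:
    """Recover the individual messages from an accumulation blob."""
    messages = []
    pos = 0
    while pos < len(blob):
        length = int(blob[pos:pos + 8])  # 8 ASCII digits
        start = pos + HEADER_LEN
        messages.append(blob[start:start + length])
        pos = start + length
    return messages


def accumulation_name(cccc: str) -> str:
    """Build a CCCC00000000.a style name with a random 8-digit sequence."""
    return f"{cccc}{random.randrange(10**8):08d}.a"
```

For example, split(accumulate([b"one", b"two"])) gives back the two original messages, and accumulation_name("CWAO") yields a 14-character name ending in .a (CWAO is used here only as a sample origin code).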
There are two plugins introduced to support:
Reference: https://library.wmo.int/viewer/35800/download?file=386_en_1.pdf&type=pdf&navigator=1
(Page 137: the accumulated message format.)
Describing the patches:
So
You don't have to calculate the file name and open it. Just call getContent(); it will look up the baseDir setting
and check whether such a file exists locally.
How it was tested:
set up a subscription to WMO-BULLETINS on a test machine, then have the two modules called
This provides a round-trip test. During development, ran with messageCountMax 50 to get short runs. Looking at the logs, one could determine whether the files were coming out of the grouped file intact. Also, since the file
names cannot match (they use randomized sequence numbers), applied md5 checksums to make
correlation of input and output easier.
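A minimal sketch of that kind of checksum correlation, assuming the input and output file contents are available as bytes. The helper names are hypothetical, and this is not the exact procedure used during testing, just one way to compare the two sets when file names cannot be matched.

```python
import hashlib
from collections import Counter


def md5_of(data: bytes) -> str:
    """Hex md5 digest of one file's content."""
    return hashlib.md5(data).hexdigest()


def compare_round_trip(inputs, outputs):
    """Return (missing, unexpected) checksum counts.

    missing: checksums seen on input but not recovered on output.
    unexpected: checksums seen on output with no matching input.
    """
    sent = Counter(md5_of(d) for d in inputs)
    received = Counter(md5_of(d) for d in outputs)
    return sent - received, received - sent
```

If both returned counters are empty, every input file came back out with identical content, regardless of order or naming.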
Once it was working, ran it for a few hours on a full international WMO GTS feed,
looking for errors. So the two are at least consistent with each other.
Used international WMO bulletins present in the subscription as examples to better understand the formats.
Next step would be interop tests, but it probably needs to be merged first.
EDIT: changed names of modules from wmo00_write -> wmo00_accumulate, wmo00_read -> wmo00_split.