AM protocol driver #379

Open
petersilva opened this issue Jul 22, 2021 · 23 comments
Assignees
Labels
enhancement New feature or request

Comments

@petersilva
Contributor

To help decommission Sundew without losing interop with MM, we could make a sender/receiver for AM sockets...

@petersilva petersilva added the enhancement New feature or request label Jul 22, 2021
@petersilva
Contributor Author

@habilinour suggests we could do an on_file plugin to write to AM.
That requires Sarra on all the client systems, versus still having to run MM on the data hub and feeding it that way, but the same on_file approach would work in both cases.

@petersilva
Contributor Author

unable to get python-paramiko on ubuntu 20.04
MetPX/Sundew#16

@petersilva
Contributor Author

@kurt2x implemented data hub with combo of Sundew and Sarra... he thinks having an AM Sender of some kind would be good.

@petersilva
Contributor Author

@kurt2x likely wants the receiver to act like an amtcpserver connection, where you have a master that forks individual connections using a single port defined for all connections (different from Sundew, where 1 port was assigned to each configuration.)

(this might be too much for an initial attempt.)... perhaps just start with a script that accepts a connection and processes bulletins received (separate one for a sender.)

@andreleblanc11 andreleblanc11 self-assigned this Sep 21, 2022
@andreleblanc11
Member

The AM sender is completed and ready to test. One problem though: Sundew is giving me trouble when setting up the test receiver on my Azure VM, which has Ubuntu 20.04 installed. There seem to be quite a lot of missing Python 2 dependencies, especially paramiko. This has already been addressed in another issue.

@petersilva
Contributor Author

petersilva commented Oct 3, 2022

I spun up a vm to try things out... the following might work:


sudo apt update
sudo apt upgrade
sudo apt install python
curl https://bootstrap.pypa.io/pip/2.7/get-pip.py --output get-pip.py
sudo python get-pip.py
sudo pip install paramiko

@petersilva
Contributor Author

General rule: do apt search python-x to find python library x. If there isn't one, then install with sudo pip install x.
Ubuntu 20 is the last version where even this recipe will work.

@petersilva
Contributor Author

petersilva commented Oct 26, 2022

OK so I sat with @andreleblanc11 yesterday and we pair programmed for a bit. He now has a flowcb plugin that works for accepting messages from a single connection, writing them to a file in a configured directory, and posting a message for the resultant local file. We had to comment out the poll initialization in order to get it to work for our example, because:

  • If we add that plugin into a poll, a poll wants to have a transfer protocol (that it connects to in order to list files on the remote resource), so since there is no transfer protocol to connect to, the init fails.

  • If we add it as a plugin into a shovel (or subscribe, or sarra, or winnow.) it wants a broker to connect to for upstream messages... so the connection to the upstream broker fails on startup.

  • If we add it as a plugin for watch... hey that might work... we just have a watch with a plugin that writes to the watch directory... but again... it would be a watch with no watch...

  • we could try to use the parent class (sarracenia.flow) without any sub-class... but the whole cli is built around the concept of "component/config" (which is already something being called into question in: what to do with config files outside of the proper tree? #575, and hierarchical configurations (getting rid of "components") #494 )

  • we could add special logic for "flow" to be an implicit component, and call them flows.

  • to avoid having to deal with "no-component", we could invent a kind of "vanilla" flow... a flow class with no default plugins. That would start up, and run... but currently we have no way of using something that isn't a "component", so we need to invent a new "vanilla" component? "plain?"

  • We could just make a sub-class for Andre's thing, so that am becomes a new component: sr3 start amserver.

dunno... lots of options... not sure which way to go... so far the most attractive thing to me is to futz with "flow" as an implicit component...

@petersilva
Contributor Author

I'm moving the above discussion to #575 ... as the resolution might be the same...

@petersilva
Contributor Author

petersilva commented Oct 28, 2022

So @kurt2x has a long-standing wish to have an amserver, like on MetManager, where anyone just connects to a socket and starts sending data. With Sundew, we never had that; one would just pick a port, and a single process (pxReceiver) would service a single connection on the defined port.

What @andreleblanc11 has implemented right now is like pxReceiver, it accepts a single connection and processes it. To get amserver like behaviour, we need to follow the traditional (if obscure) C daemon service pattern (like in http://www.microhowto.info/howto/listen_for_and_accept_tcp_connections_in_c.html#idp58640 )

to recap... what is needed for the amserver "forking" issue:

  • in the on_start it does a bind. ONCE! then it loops:
    • it then does an accept.
    • increment an instance id for the child's use...
    • then you pid=fork().
    • the parent (pid!=0) then needs to:
      • close the accepted socket. and
      • loop around again waiting on the next accept... (not bind.)
    • The child (pid==0) then needs to:
      • change to use the given instance id... which means:
      • logging to a file with the correct instance id.
      • writing the pid file for the instance, so that sr status|stop|sanity will find it.
      • then exit the loop and start processing messages...
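
The recap above could be sketched roughly as follows. This is only a minimal illustration of the bind-once / accept / fork pattern using the Python standard library, not the actual plugin code: `bind_once`, `serve_forking`, `handle_connection` and `max_connections` are hypothetical names, and the per-instance log and pid file setup is left to the handler.

```python
import os
import socket


def bind_once(host, port, backlog=5):
    """Create the listening socket. Called a single time, before the accept loop."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind((host, port))
    listener.listen(backlog)
    return listener


def serve_forking(listener, handle_connection, max_connections=None):
    """Accept connections on an already-bound socket, forking one child per connection.

    handle_connection(conn, instance) is a hypothetical callback standing in for
    "set up the instance's log and pid file, then process messages".
    Returns a list of (instance, pid) pairs for the children that were started.
    max_connections only exists so the sketch can be exercised without looping forever.
    """
    children = []
    instance = 0
    while max_connections is None or instance < max_connections:
        conn, _addr = listener.accept()      # accept, never bind again
        instance += 1                        # instance id for the child's use
        pid = os.fork()
        if pid == 0:
            # Child: it does not need the listening socket, only the accepted one.
            listener.close()
            status = 1
            try:
                handle_connection(conn, instance)
                status = 0
            finally:
                conn.close()
                os._exit(status)             # never fall back into the parent's loop
        # Parent: close its copy of the accepted socket and wait for the next accept.
        conn.close()
        children.append((instance, pid))
    return children
```

A real daemon would also need to reap exited children (for example with `os.waitpid`), which this sketch leaves to the caller.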

so... all the instance and log stuff is set up in instance.py ... which calls flow.. so it isn't immediately obvious how to call the instance stuff from a flow... sigh... food for thought.
perhaps @reidsunderland has some ideas.

the log setup logic is kind of painful, it's in sarracenia.instance.start() ... I guess we can copy/paste, or might need to re-factor a bit...

@petersilva
Contributor Author

petersilva commented Oct 28, 2022

I think copy/paste from instance.start() is probably fine...
call:

pidfilename = sarracenia.config.get_pid_filename( ... )

with the right instance number (last argument.) and then write it. a few lines.
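
For illustration, the "write it" part might look something like the sketch below. It assumes the file name has already been obtained (for example from the call above, whose arguments are not shown here); `write_pid_file` is a hypothetical helper, and the exact file contents that sr status|stop|sanity expect are an assumption.

```python
import os


def write_pid_file(pidfilename, pid=None):
    """Write the given pid (default: this process) to pidfilename and return it.

    pidfilename is assumed to come from the caller; this helper does not
    know anything about how Sarracenia builds the name.
    """
    pid = os.getpid() if pid is None else pid
    directory = os.path.dirname(pidfilename)
    if directory:
        os.makedirs(directory, exist_ok=True)   # make sure the state directory exists
    with open(pidfilename, "w") as f:
        f.write("%d" % pid)
    return pid
```

In the forked child, this would be called once right after the fork, before starting to process messages.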

The logging stuff requires more thought... maybe move it to config? and invoke from there?

@petersilva
Contributor Author

I think we can just fix the pids first, and let them all share the log for now... that will be tangible progress.

@petersilva
Contributor Author

at midnight, it will be hilarious when they all try to rotate the log at once...

@andreleblanc11
Member

andreleblanc11 commented Oct 28, 2022

I've been working on the sender problem that @petersilva and I worked on a couple of weeks ago. I still can't figure out why the Sundew receiver won't ingest the bulletins sent by my code. No error messages are returned either; the receiver just shuts down. I've been verifying the encoding format, and everything seems to be fine (iso-8859-1 is the standard used in my source code and is the same one used on Sundew).

@andreleblanc11
Member

Been talking with @kurt2x this afternoon and here are a couple of key things to point out from our discussion.

  • In order to write to the same port on the AM sarracenia receiver, there would need to be another amtcp writer set up on another VM connected through MM. Ideally, the bulletins received would come from a different source, so that we're able to test all the use cases for the AM protocol. Kurt will set that up on a sarra dev server when he gets the time.
  • At the moment, the messages containing the bulletins are sometimes received in bulk. Sometimes it'll even max out the buffer when the queue is long enough. We'll need to find a way to parse the bulletins out of that bulk, and to make sure that bulletins don't get divided whenever the buffer maxes out. The Sundew source code already does this somehow.
  • @kurt2x figured out that the Sundew bulletins aren't being written to files correctly on the AM sarra receiver. Usually, every bulletin would get its separate file like such .../SACN31_CWAO_281500__CYTR_27717... where the last five digits originate from a random number preceded by part of the bulletin header. When parsing the bulletins, we'll also need to figure out how to write the files correctly since the format varies from different types of bulletins. Again, more clues will be found in the Sundew source code.

@petersilva
Contributor Author

The stuff being put in the AM header is the ONLY way to separate the bulletins. The AM header gives the length of the message in bytes, and you have to read that many bytes and then stop. Any later bytes are the start of the header of the next message. If you don't read the right number of bytes... it is hopeless: you will miss the header, and I don't think there is any way to re-sync, as the AM protocol doesn't have any elements that would allow one. You need to parse the header correctly and assign the bytes correctly.
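
To make the idea concrete, here is a small sketch of length-prefixed framing on a byte stream. The header layout used here (a 4-byte tag followed by a 4-byte big-endian length) is made up for the example and is NOT the real AM header; `FrameBuffer` and `wrap` are hypothetical names. The point is only that received chunks get buffered, and a message is cut out once the header and the full number of bytes it announces have arrived, however the socket happened to split or merge them.

```python
import struct

# Hypothetical header for illustration only (not the AM layout):
# a 4-byte tag followed by the payload length as a 4-byte big-endian integer.
HEADER = struct.Struct(">4sI")


def wrap(payload):
    """Prefix payload with the illustrative header."""
    return HEADER.pack(b"DEMO", len(payload)) + payload


class FrameBuffer:
    """Accumulate raw socket reads and return only complete messages."""

    def __init__(self):
        self._buf = bytearray()

    def feed(self, chunk):
        """Add newly received bytes; return the list of messages now complete."""
        self._buf.extend(chunk)
        messages = []
        while len(self._buf) >= HEADER.size:
            _tag, length = HEADER.unpack_from(self._buf)
            end = HEADER.size + length
            if len(self._buf) < end:
                break                      # payload not fully received yet; keep waiting
            messages.append(bytes(self._buf[HEADER.size:end]))
            del self._buf[:end]            # leftover bytes start the next header
        return messages
```

Feeding two wrapped payloads, either in one big read or in small pieces, should give back exactly the two original payloads, which is the same check as the two-canned-files test.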

@petersilva
Contributor Author

In your test case... you need two canned files instead of just the one you have now. you then send the two files and look at what the am server writes out... if the file it writes out is longer ... then it likely is not paying attention to the length specified in the header. If it's shorter, then it's getting the length wrong.

@andreleblanc11
Member

Sender and receiver seem to work well together. Bulletins are being stored correctly inside the files. A couple of bugs have been fixed:

  • Receiver
    • Exiting while forking the processes
    • Connecting to random outbound connections
    • Bulletin parsing
    • File writing of bulletins
  • Sender
    • Sarra integration

@petersilva
Contributor Author

looked over the branch. looking good! so far... would be good to add:

  • can you copy your test configs into sarracenia/examples/flow and sender/ so people have some sample sender configs to work with.
  • the doc string at the start of each plugin should document its use. Have a look at sarracenia/flowcb/mdelaylatest.py for an idea of the content...
  • that docstring should include description of options.
  • that should probably include description of usage of directory/ because this is a bit unusual...
  • The code currently refers to "Sundew AM" protocol... Sundew was just the third or fourth implementation of it. Sundew implemented the protocol to talk to other software. The protocol is best referred to as ECCC's proprietary Alpha Manager(AM) socket protocol.
  • the documentation should indicate that AM protocol cannot be stopped without data loss, and therefore care should be taken with maintenance interventions.
  • There is a comment that says "Add signal handler" followed by... you guessed it, a line that sets a signal handler... except what it actually does is reset the signal handler to default... It might be better to have a comment that indicates why we are futzing with signals... something like: "override outer signal handler with a default one to get AM to exit. AM protocol means this can never be clean"
  • connection is often misspelled (connexion), please fix.
  • please change the name of establishconn ... It's an odd name for what the function does... waitForConnection ? waitForRemoteServerConnection ? waitRemoteConnection ... establishconn is unhelpfully vague, could equally apply to someone connecting to a remote server... so it does not give a great idea of what the routine does.
  • in unwrapmsg() ... logger.info("Gather successful") is not useful to anyone once the plugin is debugged... likely will spam the log, should be deleted, or at the very least demoted to debug.
  • in the sender... establichConn should be changed to connect or connectToRemoteUrl or something like that.
  • in the sender, if a connection ever fails, it just dies and never attempts to re-establish a connection. In Sarracenia, we always have civilized recovery... so that the connection will be re-established without human intervention, and without hammering the other end continuously (unlike MetManager, from what I've seen of your tests...) we use exponential back off ( https://en.wikipedia.org/wiki/Exponential_backoff ... ebo) ...

Basically, if the socket dies, after the close you want to set ebo=1 and sleep(ebo); if the reconnect fails, then ebo=ebo*2, up to 64 seconds... that way it will only try to re-connect slowly, without spamming the other end. It also means that rather than a try, you need a loop of some kind.

You can test by starting the server and sender up, sending a message, then stopping the server, and sending another message... the current code will just give up and die... actually worse... it will consume everything in the queue saying "Message not sent" essentially dropping everything on the ground...

We really want to be stubborn about sending each message, and not go on to the next if we fail to send one.
That's the appropriate thing for AM.
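
As a rough sketch of what that could look like (not the sender's actual code): `connect_with_backoff` and `send_stubbornly` are hypothetical names, and the `connect` and `sleep` parameters exist only so the loop can be tried out without a real network.

```python
import socket
import time


def connect_with_backoff(address, max_delay=64,
                         connect=socket.create_connection, sleep=time.sleep):
    """Keep trying to connect, sleeping 1, 2, 4, ... up to max_delay seconds between tries."""
    ebo = 1
    while True:
        sleep(ebo)                          # wait before each (re)connection attempt
        try:
            return connect(address)
        except OSError:
            ebo = min(ebo * 2, max_delay)   # back off, capped at max_delay


def send_stubbornly(sock, payload, address, **backoff_options):
    """Send payload, reconnecting as often as needed; never skip to the next message.

    Returns the socket that finally carried the payload, which may be a new one.
    """
    while True:
        try:
            sock.sendall(payload)
            return sock
        except OSError:
            sock.close()
            sock = connect_with_backoff(address, **backoff_options)
```

The caller keeps using the returned socket for the next message, so a message is only considered done once `sendall` has succeeded for it.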

Once that is done, a PR (vs. v03_wip) would be great!

@petersilva
Contributor Author

Also in the sender: in the except socket.error handler, it does the close and then the routine just exits, but the routine is supposed to return True or False.

@andreleblanc11
Member

andreleblanc11 commented Mar 19, 2024

The migration of the NCP servers is a big milestone for AM. It should be completed within the next month.

The amserver got modified quite a lot to accommodate generalized bulletin delivery.

2 participants