Discussion: prefix design and matching #42

Open
sneakers-the-rat opened this issue Aug 28, 2024 · 7 comments

Comments

@sneakers-the-rat
Collaborator

sneakers-the-rat commented Aug 28, 2024

I think we've talked about this in a few issues and in Slack, but I don't think we have a single issue for it, so let's chat about what we want for recognizing boundaries between frames.

Detecting frames currently relies on an exact match against a prefix (preamble) value.
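For anyone new to this part of the code, exact prefix matching boils down to something like the sketch below. It is not the actual miniscope-io implementation: it assumes the 0x12345678 preamble sits byte-aligned and big-endian in the stream, and `find_preambles_exact` is a hypothetical name. The second call shows why exact matching is fragile: a single flipped bit hides a whole preamble.

```python
PREAMBLE = bytes.fromhex("12345678")  # assumption: big-endian, byte-aligned


def find_preambles_exact(stream: bytes, preamble: bytes = PREAMBLE) -> list[int]:
    """Return every byte offset at which `preamble` appears verbatim in `stream`."""
    positions = []
    start = stream.find(preamble)
    while start != -1:
        positions.append(start)
        # Resume one byte later so that every occurrence is reported.
        start = stream.find(preamble, start + 1)
    return positions


data = b"\x00" * 3 + PREAMBLE + b"frame-a" + PREAMBLE + b"frame-b"
print(find_preambles_exact(data))  # [3, 14]

corrupted = bytearray(data)
corrupted[4] ^= 0x01  # flip a single bit inside the first preamble
print(find_preambles_exact(bytes(corrupted)))  # [14]
```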

Error characterization:

The current problem I know of is that some known rate of bit corruption makes the prefix not a perfect match, but please list any others. Knowing exactly what kind of corruption we need to be robust to would probably go a long way: e.g. if it's just bit flips we could do a convolution, but if there are insertions/deletions that wouldn't work so well.
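If the corruption really is limited to bit flips, the "convolution" amounts to sliding the preamble along the bitstream and counting mismatched bits (the Hamming distance) at every bit offset. A minimal sketch under those assumptions (32-bit preamble sent most significant bit first, frames allowed to start at any bit offset); `hamming_scan` and `bits_of` are hypothetical helpers, and a real version would want something faster than string slicing.

```python
PREAMBLE_VALUE = 0x12345678  # assumption: 32-bit preamble, MSB first on the wire
PREAMBLE_BITS = 32


def bits_of(data: bytes) -> str:
    """Render bytes as a '0'/'1' string, most significant bit of each byte first."""
    return "".join(f"{b:08b}" for b in data)


def hamming_scan(stream: bytes, preamble: int = PREAMBLE_VALUE, width: int = PREAMBLE_BITS) -> list[int]:
    """Hamming distance between the preamble and the stream at every bit offset."""
    bits = bits_of(stream)
    distances = []
    for offset in range(len(bits) - width + 1):
        window = int(bits[offset:offset + width], 2)
        # XOR is 1 wherever the bits disagree, so its popcount is the distance.
        distances.append(bin(window ^ preamble).count("1"))
    return distances


clean = bytes(2) + PREAMBLE_VALUE.to_bytes(4, "big") + bytes(2)
noisy = bytearray(clean)
noisy[3] ^= 0b0000_0100  # flip one bit inside the preamble
noisy[5] ^= 0b1000_0000  # and another one
distances = hamming_scan(bytes(noisy))
best = distances.index(min(distances))
print(best, distances[best])  # 16 2
```

An inserted or deleted bit shifts every later bit, so the preamble no longer lines up with itself at any single offset; that case needs an edit-distance approach instead.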

Constraints:

  • small: want to not spend all day on the preamble
  • specific: no false positives in the data
  • sensitive: tolerant to mild corruption
  • ?
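The "specific" and "sensitive" constraints pull against each other once matching is fuzzy. As a rough baseline, assuming the data bits are independent and uniformly random (real sensor data is not, which is part of the question below), the chance that an arbitrary 32-bit window lies within Hamming distance k of the preamble is sum over i = 0..k of C(32, i), divided by 2^32. The helper names here are hypothetical, and overlapping windows are treated as independent, which they are not, so this is only a back-of-the-envelope estimate.

```python
from math import comb


def false_match_probability(k: int, n_bits: int = 32) -> float:
    """P(a uniformly random n_bits-bit window is within Hamming distance k of the preamble)."""
    return sum(comb(n_bits, i) for i in range(k + 1)) / 2**n_bits


def expected_false_matches(k: int, n_offsets: int, n_bits: int = 32) -> float:
    """Expected spurious matches over n_offsets candidate positions (windows treated as independent)."""
    return n_offsets * false_match_probability(k, n_bits)


for k in range(5):
    print(k, f"{false_match_probability(k):.2e}", f"{expected_false_matches(k, 10**6):.3g}")
```

For example, k = 2 gives 529 / 2^32 per window, or about 0.12 expected false matches per million candidate offsets, so a longer or repeated preamble buys room for more tolerance.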

Strategies:

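  • Fuzzy matching: accept a candidate when the number of set bits in (candidate XOR preamble), i.e. the Hamming distance, is below some threshold? Or pull in a Levenshtein distance?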
  • Range guessing: currently we just search for any match in the string, but we could allow for even more fuzziness and make the search faster by seeding searches in the place we expect the next preamble to be in the bitstream (assuming 'packets' are uniform length?)
  • Repeating preamble: the preamble is currently 0x12345678. Is there some preamble that is empirically worse or better than another, or is every preamble equally likely to show up in the data? I also wonder if we can use a repeating preamble, e.g. 10101..., so that we can scale it up and down depending on the quality of the signal, as well as set a threshold in miniscope-io. So something like "treat any sequence that ends with 4 copies of the preamble as the start of a buffer; if we detect too many, increase that threshold, and if we detect too few, decrease it".
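For the Levenshtein option, the standard way to find a pattern inside a long stream while allowing insertions, deletions and substitutions is Sellers' semi-global variant of the edit-distance dynamic program, where a match may start at any offset at no cost. A byte-level sketch; `approximate_matches` and `max_edits` are hypothetical names, and because it works on whole bytes, a single inserted bit (which shifts every later byte) would still need a bit-level version.

```python
PREAMBLE = bytes.fromhex("12345678")


def approximate_matches(stream: bytes, pattern: bytes, max_edits: int) -> list[tuple[int, int]]:
    """Find where `pattern` occurs in `stream` with at most `max_edits` edits.

    Returns (end, edits) pairs, where `end` is the exclusive end offset of the
    best-matching substring and `edits` is its Levenshtein distance to `pattern`.
    """
    m = len(pattern)
    # column[i]: edit distance between pattern[:i] and the best substring of the
    # stream ending at the current position.
    column = list(range(m + 1))
    hits = []
    for end, byte in enumerate(stream, start=1):
        diag = column[0]  # previous column's value one row up
        column[0] = 0     # a match may start at any offset for free
        for i in range(1, m + 1):
            prev = column[i]  # previous column, same row
            column[i] = min(
                prev + 1,                         # extra byte inserted in the stream
                column[i - 1] + 1,                # pattern byte dropped from the stream
                diag + (pattern[i - 1] != byte),  # match or substitution
            )
            diag = prev
        if column[m] <= max_edits:
            hits.append((end, column[m]))
    return hits


noisy = b"\xaa\xbb" + b"\x12\x34\x99\x56\x78" + b"\xcc"  # 0x99 inserted mid-preamble
print(approximate_matches(noisy, PREAMBLE, max_edits=1))  # [(7, 1)]
```

The cost is proportional to len(stream) * len(pattern), so in practice it would probably be combined with range guessing rather than run over the whole stream.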

There is also probably some encoding magic I'm not aware of, like the Manchester-encoding dummy value that @t-sasatani just pulled into the SAMD framework, so please fill in the gaps here and I can edit them into this top comment until we come up with a plan.

Also @MarcelMB @phildong

@t-sasatani
Collaborator

t-sasatani commented Aug 28, 2024

I think the error situation will change quite a bit with the dummy word thing, so we'd want to collect data with this update and observe how much fuzziness we need. It might also be better to stay strict with the preamble at first and start with fuzzy matching on the headers, because the header data are less coupled to the firmware.

  • I need to leave my bench for today, but @MarcelMB and I will have to collect data anyway, so we can upload binaries recorded with the update while doing that.
  • We don't know how broken preambles look yet because they aren't extracted in the current code. We might want to add a debug feature for this.
  • Some preambles will be significantly worse than 0x12345678, but there shouldn't be significantly better ones.
  • If we want to do the repeat thing, we can start by repeating 0x12345678 as well. It's not very bandwidth-efficient, but adding ten more 32-bit words in front of a buffer shouldn't really affect anything. If you want to make it shorter, I think [0, 255, 0, 255, ...] would be a decent choice, because it wouldn't show up in sensor data and also wouldn't cause weird problems with clock recovery (it'll take a while to explain this, so let me do that some other time).
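A repeated preamble with a copy-count threshold, as suggested in the top comment, could be detected roughly like this. It is a sketch assuming each copy must match exactly and that the payload starts right after the run; `find_repeated_preambles` and `min_copies` are hypothetical names, and the [0, 255] unit is just the example from this comment.

```python
def find_repeated_preambles(stream: bytes, unit: bytes, min_copies: int) -> list[int]:
    """Return the offset just past every run of >= min_copies back-to-back copies of `unit`.

    Each returned offset is where the buffer payload would begin.
    """
    step = len(unit)
    starts = []
    i = 0
    while i <= len(stream) - step:
        if stream[i:i + step] != unit:
            i += 1
            continue
        # Count consecutive copies beginning at i.
        copies = 0
        while stream[i + copies * step:i + (copies + 1) * step] == unit:
            copies += 1
        if copies >= min_copies:
            starts.append(i + copies * step)
        # Skip the whole run so its tail is not reported as a second, shorter run.
        i += copies * step
    return starts


UNIT = bytes([0, 255])  # the [0, 255, 0, 255, ...] idea from this comment
stream = b"junk" + UNIT * 4 + b"payload-1" + UNIT * 2 + b"payload-2" + UNIT * 4 + b"payload-3"
print(find_repeated_preambles(stream, UNIT, min_copies=4))  # [12, 42]
print(find_repeated_preambles(stream, UNIT, min_copies=2))  # [12, 25, 42]
```

Adjusting `min_copies` up or down depending on how many boundaries are detected, as proposed in the top comment, would sit on top of this.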

@MarcelMB
Contributor

I have some files from the most recent commit 443200d

Drive link

Some buffers will be corrupted because I updated the firmware several times while recording, via the ATMEL ICE connected to the daughterboard. But that could also be nice to test, since we will do that anyway during in vivo tests to update the firmware for excitation light changes, ROI shifts, etc.

@sneakers-the-rat
Collaborator Author

Side note: we should definitely make it so you don't need to modify the firmware to set those values and can instead set them via miniscope-io. I'll make a separate issue for that.

@MarcelMB
Contributor

MarcelMB commented Oct 9, 2024

I'm down to work on this and implement a combination of the strategies @sneakers-the-rat proposed.
Let me know when you want to start, and I'll join.

  1. We could easily repeat the preamble a few times.
  2. Use XOR/Hamming distance to add some tolerance when identifying the preamble.
  3. Detect errors in a preamble with something like a simple checksum and move on to the next one.
  4. Predict the location of the next preamble with range guessing (should be easy since our buffer sizes are more or less fixed).

I would still like to work further on error correction for the whole preamble (and/or data) later on, but the preamble could be a start.
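Items 2 and 4 combine naturally: look for the next preamble only near where the previous one plus a nominal buffer length says it should be, accept a few flipped bits there, and fall back to a strict full search if nothing plausible turns up. A sketch assuming byte-aligned, roughly fixed-length buffers; `find_next_preamble`, `slack` and `max_bit_errors` are hypothetical names and defaults, not values from the firmware.

```python
from typing import Optional

PREAMBLE = bytes.fromhex("12345678")


def hamming(a: bytes, b: bytes) -> int:
    """Number of differing bits between two equal-length byte strings."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def find_next_preamble(
    stream: bytes,
    last_pos: int,
    buffer_len: int,
    slack: int = 8,
    max_bit_errors: int = 2,
    preamble: bytes = PREAMBLE,
) -> Optional[int]:
    """Guess where the next preamble starts, given the previous one and a nominal buffer length."""
    n = len(preamble)
    expected = last_pos + buffer_len
    # Only look within `slack` bytes of where the next preamble should be.
    lo = max(last_pos + 1, expected - slack)
    hi = min(len(stream) - n, expected + slack)
    candidates = [
        (hamming(stream[pos:pos + n], preamble), abs(pos - expected), pos)
        for pos in range(lo, hi + 1)
    ]
    plausible = [c for c in candidates if c[0] <= max_bit_errors]
    if plausible:
        # Fewest bit errors wins; ties go to the position closest to `expected`.
        return min(plausible)[2]
    # Nothing close enough in the window: resynchronise with a strict full search.
    found = stream.find(preamble, last_pos + 1)
    return found if found != -1 else None


frame = PREAMBLE + bytes(16)  # 20-byte buffers in this toy example
noisy = bytearray(frame + frame)
noisy[21] ^= 0x01  # one flipped bit in the second preamble
print(find_next_preamble(bytes(noisy), last_pos=0, buffer_len=20))  # 20
late = PREAMBLE + bytes(18) + frame  # second preamble arrives 2 bytes late
print(find_next_preamble(late, last_pos=0, buffer_len=20))  # 22
```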

@sneakers-the-rat
Collaborator Author

I think the first thing we should do here is clean up the code a bit: split up each of the acquisition methods that bundle together pulling from a queue, running in a subprocess, etc. into separate pure functions. Then it should be a lot easier to write and test functionality that only affects one stage (i.e. the initial identification of the start of a buffer in a continuous bytestream) without the others. As-is, it's a little challenging to access that code in tests without mocking up the full streaming situation, so ideally we get to a place where we can just pass a bytestring and get back buffer(s).

This will require a bit more statefulness than a pure function, because we want to take advantage of being able to 'go backwards' and remember positions in the recent past, but either way we should split that out of the StreamDaq class because it's getting a bit overloaded.
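A minimal sketch of what "pass a bytestring, get back buffer(s)" could look like as a small stateful object, separate from StreamDaq and from any queue or subprocess. `BufferSplitter` and `feed` are hypothetical names, not part of miniscope-io; the sketch assumes exact, byte-aligned preambles and treats a buffer as everything between one preamble and the next. Its only state is the not-yet-complete tail, which is what lets a preamble that straddles two chunks still be found; the fuzzy matching and look-back discussed above would slot into the search step.

```python
PREAMBLE = bytes.fromhex("12345678")  # assumption: exact, byte-aligned preamble


class BufferSplitter:
    """Turn a continuous bytestream into preamble-delimited buffers.

    The only state is the tail of data not yet assigned to a complete buffer, so
    preambles and buffers that straddle chunk boundaries are still handled.
    Bytes before the first preamble are discarded.
    """

    def __init__(self, preamble: bytes = PREAMBLE) -> None:
        self.preamble = preamble
        self._pending = b""

    def feed(self, chunk: bytes) -> list[bytes]:
        """Consume a chunk and return every buffer completed by it (preamble stripped)."""
        data = self._pending + chunk
        n = len(self.preamble)
        starts = []
        pos = data.find(self.preamble)
        while pos != -1:
            starts.append(pos)
            pos = data.find(self.preamble, pos + n)
        if not starts:
            # Not yet synchronised: keep just enough bytes to catch a split preamble.
            self._pending = data[max(0, len(data) - (n - 1)):]
            return []
        # A buffer is complete once the following preamble has been seen.
        buffers = [data[a + n:b] for a, b in zip(starts, starts[1:])]
        self._pending = data[starts[-1]:]
        return buffers


stream = b"junk" + PREAMBLE + b"frame-1" + PREAMBLE + b"frame-2" + PREAMBLE
splitter = BufferSplitter()
out = []
for i in range(0, len(stream), 5):  # deliver the stream in awkward 5-byte chunks
    out.extend(splitter.feed(stream[i:i + 5]))
print(out)  # [b'frame-1', b'frame-2']
```

Because it takes plain bytes and returns plain bytes, it can be tested directly, and the queue/subprocess wrapper becomes a thin loop around `feed`.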

@MarcelMB
Contributor

MarcelMB commented Oct 9, 2024

split up each of the acquisition methods that bundle together pulling from a queue and running in a subprocess and etc. into separate pure functions

Not sure I'm understanding it all correctly. So you suggest cleaning up StreamDaq and putting certain processes into functions that live outside it?

@sneakers-the-rat
Collaborator Author

Yeah yeah, because we'll need to make that more flexible as we add more devices anyway, e.g. the ephys stuff should be able to stream, but it won't be reassembled into images. Currently, the design of those methods being locked into running as a separate process that pulls from a queue makes them pretty hard to reuse.

Shouldn't be too hard. I'll get on it.
