ReproNim webinar talk #1

Draft
yarikoptic wants to merge 8 commits into master from enh-repronim-webinar

@yarikoptic
Member

No description provided.

New presentation: ReproFlow & YODA: Structure your studies, observable
and reproducible they become

- Title slide and metadata updated for Feb 6, 2026 ReproNim webinar
- Abstract emphasizes observability and reproducibility themes
- QR code generated for slides URL
- Planning materials organized in YODA-compliant structure:
  - notes/act2-refinement-notes.md: Research on BEP028, BABS, Nipoppy,
    BIDS-flux, FAIRly big framework, and SciOps principles
  - planning/proposed-structure.md: 5-act narrative structure proposal
  - README.md: Overview and entry point for all materials

Theme: YODA principles + BIDS composition + ReproFlow/reprostim tooling
enable observable and reproducible neuroimaging workflows from
acquisition to publication. Emphasis on provenance (BEP028),
dashboard separation pattern, and AI as amplifier of structured data.

Elaborate on how modular composition creates "condensed frontiers" -
transformed, summarized, or extracted forms that are more appropriate
for downstream use while maintaining exact version-controlled links to
source materials.

Key insight: Each module/subdataset serves as all of the following:
- A stopping point ("stopping the bleeding" of data/complexity)
- A usable interface for next level (condensed/transformed)
- A versioned link back to full source (reproducible)

Examples across domains:
- Neuroscience: TB of ephys recordings → spike trains (1000x smaller)
- Software: Source code → compiled binaries (platform-appropriate)
- BIDS: Multi-stage cascade (DICOM → BIDS → derivatives → paper)
- Data analysis: Individual measurements → summary statistics
- AI/ML: Full corpora → embeddings/indices
- Meetings: Video recordings → minutes
- Genomics: Full genomes → variant calls (1000x smaller)
- Dashboards: .tsv data → interactive visualizations

Pattern enables:
- Cognitive load reduction (work at appropriate level)
- Performance (smaller, transformed data)
- Reproducibility (exact source association via git hexsha)
- Flexibility (multiple frontiers from same source)
- Evolvability (regenerate as methods improve)

Anti-pattern: Orphaned frontiers without source links
Best practice: Version control both source and frontier as modules
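
One way to avoid an orphaned frontier is to store the source commit next to the condensed output. The following is a minimal sketch, not the presentation's actual tooling: the `source` and `frontier` directory names, the `measurements.txt` and `summary.tsv` files, and the demo identity passed to `git commit` are all hypothetical. It assumes only that `git` and `awk` are available.

```shell
#!/bin/sh
set -eu

# Hypothetical layout: "source" holds raw data, "frontier" holds the condensed form.
work=$(mktemp -d)
git init -q "$work/source"
cd "$work/source"
printf '1\n2\n3\n' > measurements.txt
git add measurements.txt
git -c user.name=demo -c user.email=demo@example.com commit -q -m "Add raw measurements"

# Condense: sum the measurements and record the exact source commit (hexsha)
# in the same file, so the summary can always be traced back to its source.
src_sha=$(git rev-parse HEAD)
total=$(awk '{ s += $1 } END { print s }' measurements.txt)
mkdir "$work/frontier"
printf 'total\tsource_commit\n%s\t%s\n' "$total" "$src_sha" > "$work/frontier/summary.tsv"
cat "$work/frontier/summary.tsv"
```

When the source is registered as a git submodule (which is how DataLad implements subdatasets), git records this commit link itself; the explicit column here just makes the idea visible.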

Visual metaphor: "Surface you create, depth you preserve"

This concept is integrated throughout Acts II-IV of the presentation.

Software section:
- Add NeuroDebian as example of source → package transformation
- Add reproducible-builds.org for bit-identical binaries
- Add snapshot.debian.org (~20PB) as non-git archival approach
- Emphasize pattern is universal, not DataLad/git-specific

Literature section (complete rewrite):
- Replace generic example with DANDI Archive citation workflow
- Detail dandi-bib: metadata → BibTeX/RIS/Zotero (daily automation)
- Detail citations-collector: DOI → citation discovery (WiP)
- 8 citation types (Publication, Preprint, Protocol, etc.)
- 11 relationship types (Cites, Uses, IsDocumentedBy, etc.)
- Show multi-layer frontier condensation in action
- Zotero as "dashboard" - regenerable view of version-controlled data

Meetings section:
- Add real-world practice of maintaining local Zoom archive
- Emphasize reusable resource (decisions, training, quotes)
- Storage cheap, context priceless

Universal pattern section:
- New section: "The Pattern is Universal, Not Tool-Specific"
- Compare git/DataLad, snapshot.debian.org, container registries,
  data repositories, academic citations
- Emphasize principles over tools: explicit linking, retrievability,
  versioning, automation, modularity
- Message: "Pattern is ancient, tools evolve—embrace principles"

All examples are now concrete, traceable projects with URLs.

Document how to weave frontier condensation throughout all 5 acts:

- Act I: Introduce concept with Principle 3 (modular composition)
- Act II: Show in practice (ReproFlow, tools, dashboards)
- Act III: BIDS as 4-stage frontier cascade
- Act IV: AI as frontier generator (structured vs unstructured)
- Act V: Universal pattern across domains

Key reframings:
- BIDS pipeline = cascade of frontiers (DICOM → derivatives → paper)
- Dashboards = visualization frontiers (consume, don't own data)
- AI summaries = version-controlled frontiers with source links
- Tools comparison = different condensation strategies

Visual motif: Two-layer diagrams (frontier ⇅ source)
Terminology: Surface/depth, frontier/source, condensation/link

New slides proposed:
- Frontier condensation pattern intro
- Software example: NeuroDebian + reproducible-builds + snapshot
- Literature example: dandi-bib workflow
- Meeting archives as resource
- Universal pattern comparison table

Narrative thread: 'Surface you create, depth you preserve'

Questions for refinement discussion tomorrow morning.

Comprehensive overview of current status and tomorrow's agenda:

Status:
- Presentation header updated and committed
- Materials organized in YODA-compliant structure
- Research complete on BEP028, BABS, Nipoppy, BIDS-flux, etc.
- Frontier condensation concept developed and documented

Key breakthrough:
- Frontier condensation = hierarchical transformation pattern
- Each module: stopping point, transformation, usable interface, linked source
- Tagline: 'Surface you create, depth you preserve'
- Unifies YODA, BIDS, ReproFlow, dashboards, AI under one framework

Tomorrow's agenda:
1. Review/refine frontier condensation concept
2. Decide presentation structure (explicit theme vs. woven throughout)
3. Prioritize new slides (high/medium/low)
4. Content decisions (keep/reduce/enhance)
5. Visual design (two-layer diagrams)
6. Time allocation (~45 min webinar)

Questions to resolve:
- Terminology: 'frontier condensation' or alternative?
- Emphasis: DataLad-specific vs. universal pattern?
- Depth: Tool details vs. conceptual overview?
- Personal anecdotes: Include Zoom archive practice?
- Slide count: Realistic for Feb 6 deadline?

Resources ready: 4 planning docs, all references documented
Timeline: 4 days to Feb 6 (realistic but tight)

Differentiator: YODA as transformation framework, not just organization

@asmacdo
Member

asmacdo commented Feb 2, 2026

Perhaps some data was not added? I ran `datalad get -n -r .`, and also `npm install` and `npm start`, but many of the images still did not render.

datalad get error (probably not related)
[INFO   ] Ensuring presence of Dataset(/home/austin/devel/talks) to get /home/austin/devel/talks 
install(error): /home/austin/devel/talks/2022-nih-compcore (dataset) [Failed to clone from any candidate source URL. Encountered errors per each url were:
- git@github.com:con/talks.git/2022-nih-compcore
  CommandError: 'git -c diff.ignoreSubmodules=none -c core.quotepath=false clone --progress git@github.com:con/talks.git/2022-nih-compcore /home/austin/devel/talks/2022-nih-compcore' failed with exitcode 128 [err: 'Cloning into '/home/austin/devel/talks/2022-nih-compcore'...
fatal: remote error:
 con/talks.git/2022-nih-compcore is not a valid repository name
Visit https://support.github.com/ for help
CommandError: 'ssh -o ControlPath=/home/austin/.cache/datalad/sockets/63d7c056 -o SendEnv=GIT_PROTOCOL git@github.com 'git-upload-pack '"'"'con/talks.git/2022-nih-compcore'"'"''' failed with exitcode 1']]

Failed images (screenshots): Screenshot From 2026-02-02 14-09-28, Screenshot From 2026-02-02 14-09-18, Screenshot From 2026-02-02 14-09-12

@yarikoptic
Member Author

yarikoptic commented Feb 2, 2026

Tip of the trade: when doing a collapsed `<details>`, leave an empty line after the HTML element so it renders properly.

You need to add the https://datasets.datalad.org/centerforopenneuroscience/talks/.git remote, which should have the annexed content; here we have none.
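
A minimal sketch of that step follows. The remote name `datalad-datasets` and the `path/to/talk/pics` placeholder are assumptions, and a scratch repository stands in for an existing clone so the snippet runs without network access. Only `git remote add` is executed; the fetch and `datalad get` lines are left as comments because they need network access and an installed DataLad.

```shell
#!/bin/sh
set -eu

# Stand-in clone so the sketch runs anywhere; in practice, run the
# `git remote add` line from inside your existing clone of the talks repository.
clone=$(mktemp -d)
git init -q "$clone"
cd "$clone"

# The remote name "datalad-datasets" is an arbitrary, hypothetical choice.
git remote add datalad-datasets https://datasets.datalad.org/centerforopenneuroscience/talks/.git
git remote get-url datalad-datasets

# With network access and DataLad installed, the annexed files could then be
# retrieved along these lines (not executed here; the path is a placeholder):
#   git fetch datalad-datasets
#   datalad get path/to/talk/pics
```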

@asmacdo
Member

asmacdo commented Feb 2, 2026

> Tip of the trade: when doing a collapsed `<details>`, leave an empty line after the HTML element so it renders properly.

Oops, a frequent error of mine; fixed now.

> You need to add the https://datasets.datalad.org/centerforopenneuroscience/talks/.git remote, which should have the annexed content; here we have none.

Can you add that to the README? (That did work.)


<div style="position: relative; width: 100%; height: 90vh;">

<img src="pics/bids-nipoppy.png" class="" width="80%" style="position: absolute; top: 3%; left: 0%" />
Member

The Nipoppy BIDS study layout has been updated. I suggest using a screenshot from nipoppy/nipoppy#687.

Member

@asmacdo asmacdo left a comment

Overall, I think there's too much info. Especially in the images, many contain more text than is readable/digestible during a live presentation, which can make the key point easy to miss (e.g., the tree structure for OpenNeuroDerivatives, or the container dataset composition for repronim-containers).

Comment on lines +175 to +177
- `git reset --hard && git clean -dfx` -- no evil was done
- `git reset --hard HEAD^` -- forget we did it
- `git reset --hard COMMITISH` -- get-the-hell-out-of-here
Member

I like the humor, but the intended audience of this may not get the joke.

Suggested change
- `git reset --hard && git clean -dfx` -- discard all uncommitted changes and untracked files
- `git reset --hard HEAD^` -- undo the last commit (and its changes)
- `git reset --hard COMMITISH` -- reset to a specific earlier state
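
To make the three descriptions concrete, here is a small sketch that runs each command in a throwaway repository. The repository, the file name `data.txt`, the commit messages, and the `commit` helper are made up for illustration. All three commands permanently discard work, so they are best tried on scratch data only.

```shell
#!/bin/sh
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q .
# Hypothetical helper: commit with a demo identity so no global git config is needed.
commit() { git -c user.name=demo -c user.email=demo@example.com commit -q -m "$1"; }

echo one > data.txt;   git add data.txt; commit "first"
first=$(git rev-parse HEAD)
echo two > data.txt;   git add data.txt; commit "second"
echo three > data.txt; git add data.txt; commit "third"

# 1. Discard all uncommitted changes and untracked (including ignored) files.
echo scratch > data.txt
touch untracked.tmp
git reset -q --hard && git clean -q -dfx
state_after_clean=$(cat data.txt)

# 2. Undo the last commit together with its changes.
git reset -q --hard HEAD^
state_after_undo=$(cat data.txt)

# 3. Reset to a specific earlier state (here: the first commit).
git reset -q --hard "$first"
state_after_jump=$(cat data.txt)

echo "$state_after_clean $state_after_undo $state_after_jump"   # prints: three two one
```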

Member Author

"Not sure"...


----

### DataLad runs in the wild: [datalad-usage-registry](http://github.com/datalad/datalad-usage-dashboard)
Member

It's not obvious that this is a link.

14.33%
```

- and if we know the command on how to `get` the file ...
Member

I don't get where you're going with this.


### [datalad containers-run](https://docs.datalad.org/projects/container/en/stable/generated/man/datalad-containers-run.html#datalad-containers-run) it now!

![](pics/containers-run.svg)
Member

"t and use"? (in the image)


<small>

N.B. Talk to Alex Waite about their work on guaranteeing reproducibility and network encapsulation of containers.
Member

Should this be in the slide? Maybe link to his GitHub?

Member Author

The slides are currently still just a copy from the distribits talk, where Alex was in the audience, so it made sense there. For this one it indeed does not apply directly; I will remove it.

Comment on lines +572 to +599
----

### We already deal with "global" layouts

#### defined "globally" while relying on "packages" to adhere to the specifications.

- Operating Systems layouts, e.g.
- [Filesystem Hierarchy Standard (FHS)](https://en.wikipedia.org/wiki/Filesystem_Hierarchy_Standard)
- [XDG (Cross-Desktop Group)](https://specifications.freedesktop.org/basedir-spec/latest/)

<small>

N.B. YODA skill: Use Joey's [etckeeper](https://etckeeper.branchable.com) to keep your `/etc` under git

</small>

----

### We already have "project" layouts

#### defined locally per project

- Programming language/platform specific
- e.g. think of a typical Python project
- Typically not nested (unless "vendoring")


----
Member

I think we could lead into layouts needed for YODA in 1 slide, no need to get into the weeds.

@yarikoptic
Member Author

The Mary Poppins bag is just wonderful for this indeed!

Comment on lines +537 to +542
<small>

N.B. Talk to Alex Waite about their work on guaranteeing reproducibility and network encapsulation of containers.

</small>

Member Author

Suggested change
<small>
N.B. Talk to Alex Waite about their work on guaranteeing reproducibility and network encapsulation of containers.
</small>

Co-authored-by: Austin Macdonald <[email protected]>