QA dataset generation and rise experiment #175

shackmann · 2025-07-23T17:50:00Z

notebook to generate a new HF QA dataset from raw text (potentially private)
new private dataset in storage
experiment based on the new dataset

alex-dr · 2025-07-28T16:19:26Z

syftr/storage.py

+            supporting_facts=[],
+            difficulty="default",
+            qtype="default",
+            gold_evidence=[],


The dataset generation script seems to populate this field - we should include it here.
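One way to read this suggestion, as a minimal sketch: fall back to the defaults from the diff only when a generated row does not supply a value. The `QAPair` dataclass, the `qa_pair_from_row` helper, and the `question`/`answer` keys are hypothetical stand-ins, not the actual code in `syftr/storage.py`; only the four optional field names and their defaults are taken from the diff.

```python
from dataclasses import dataclass, field
from typing import Any, Mapping


@dataclass
class QAPair:
    # Hypothetical container; only the four optional field names
    # and their defaults come from the diff above.
    question: str
    answer: str
    supporting_facts: list = field(default_factory=list)
    difficulty: str = "default"
    qtype: str = "default"
    gold_evidence: list = field(default_factory=list)


def qa_pair_from_row(row: Mapping[str, Any]) -> QAPair:
    """Build a QAPair, preferring values present in the row over defaults."""
    return QAPair(
        question=row["question"],
        answer=row["answer"],
        # `or` also covers keys that exist but hold None or an empty value.
        supporting_facts=list(row.get("supporting_facts") or []),
        difficulty=row.get("difficulty") or "default",
        qtype=row.get("qtype") or "default",
        gold_evidence=list(row.get("gold_evidence") or []),
    )
```

With this shape, rows from an older dataset that lack the optional columns still load, while rows from the generation notebook keep whatever they populated.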

alex-dr

Mike had some logic for reviewing and filtering generated QA pairs after generation. From what I've seen, we can end up with partially generated answers and stuff, so it'd be good if we can look up what he was doing and incorporate it into the notebook.
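Until that logic is located, a rough sketch of a post-generation filter could look like the following. The heuristics and the `looks_complete` / `filter_qa_pairs` names are assumptions made for illustration, not Mike's actual implementation, and the `question`/`answer` keys are likewise assumed.

```python
from typing import Iterable

# Endings that often indicate a generation was cut off mid-sentence (assumed heuristic).
_DANGLING_ENDINGS = ("...", "…", ",", ":", ";", "-")


def looks_complete(text: str, min_chars: int = 3) -> bool:
    """Return True if the text is long enough and does not end mid-sentence."""
    stripped = text.strip()
    if len(stripped) < min_chars:
        return False
    return not stripped.endswith(_DANGLING_ENDINGS)


def filter_qa_pairs(pairs: Iterable[dict]) -> list:
    """Keep only pairs whose question and answer both look fully generated."""
    kept = []
    for pair in pairs:
        question = pair.get("question") or ""
        answer = pair.get("answer") or ""
        question_ok = looks_complete(question) and question.strip().endswith("?")
        # Short answers such as a single digit are allowed, empty ones are not.
        answer_ok = looks_complete(answer, min_chars=1)
        if question_ok and answer_ok:
            kept.append(pair)
    return kept
```

A filter like this only catches obvious truncation; it would not replace a manual or model-based review of answer quality.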

alex-dr

Looks okay, but we need to rebase this

shackmann added 2 commits July 23, 2025 11:34

adding QA dataset generation

3c86c3d

first rise experiment

de21fee

shackmann linked an issue Jul 23, 2025 that may be closed by this pull request

QA dataset generation #174

Open

shackmann requested review from alex-dr and mhauskn-dr and removed request for alex-dr and mhauskn-dr July 23, 2025 17:50

shackmann added 2 commits July 24, 2025 13:06

using all chunks for grounding data

14d3404

minor fix

dc0612d

shackmann requested review from alex-dr and mhauskn-dr July 24, 2025 13:19

shackmann self-assigned this Jul 24, 2025

shackmann and others added 7 commits July 24, 2025 15:03

minor fixes for the insights module

239a6d0

adding custom titles to plots

0d30e49

local models as a default

5cea2a0

Added notebook which creates dataset on HF using qa pairs generated by Gemini.

696d69a

Minor update.

660eb64

Added unit test and fix caught from running unit test.

0150207

rise6

7c0577f

alex-dr reviewed Jul 28, 2025


shackmann added 2 commits August 4, 2025 08:06

backup

120040c

using qtype and gold_evidence if provided

52fd477

shackmann requested a review from alex-dr August 4, 2025 15:01

alex-dr reviewed Sep 11, 2025
