
Mixing of .in and .interaction samples #263

Open
Matistjati opened this issue Aug 17, 2024 · 10 comments · Fixed by #291

Matistjati commented Aug 17, 2024

Would it be reasonable for a problem being interactive to mean that you are only allowed to have .interaction samples (no .in files)?

If not, should you ever be allowed to mix .in and .interaction? Has this ever occurred?

niemela (Member) commented Aug 17, 2024

@austrin @jsannemo @simonlindholm @ghamerly Thoughts?

niemela (Member) commented Aug 18, 2024

> If not, should you ever be allowed to mix .in and .interaction? Has this ever occurred?

To clarify, the question is whether there can be a problem where some samples have a .interaction while others do not and only have a .in. Not whether a single sample can have both a .interaction and a .in, which is clearly true.

This is in fact specified in the "Interactive Problems" section, but it took way too long to find and figure that out. The relevant sections could clearly use some clarification.

(Or, we might be too tired?)

RagnarGrootKoerkamp (Collaborator) commented Aug 18, 2024

Some more comments. (See also #265)

Which files exist:

  • data/sample/1.in
  • data/sample/1.ans
  • problem_statement/sample/1.in
  • problem_statement/sample/1.ans (or call this out?)
  • problem_statement/sample/1.interaction

For brevity I'll refer to these as data/1.* and statement/1.*.

statement/* files may or may not correspond directly to files with the same basename in data/*.

Restrictions (a checking sketch follows this list):

  • data/1.{in,ans} must come in pairs.
  • If data/1.{in,ans} does not exist, statement/1.{in,ans} must come in pairs.
  • statement/1.interaction and statement/1.in are only allowed for interactive problems.
  • statement/1.ans is always allowed.
  • Interactive problems are required to provide a statement/1.{in,ans} pair, and may additionally have a statement/1.interaction. It is not allowed to have data/*.in testcases that do not correspond to a statement/* testcase.
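
A minimal sketch of these checks as hypothetical tooling (the paths, the function name, and the `interactive` flag are assumptions, not anything the spec mandates):

```python
from pathlib import Path

def check_sample_case(case: str, data: Path, statement: Path, interactive: bool) -> None:
    """Hypothetical checker encoding the pairing rules listed above."""
    d_in  = (data / f"{case}.in").exists()
    d_ans = (data / f"{case}.ans").exists()
    s_in  = (statement / f"{case}.in").exists()
    s_ans = (statement / f"{case}.ans").exists()
    s_int = (statement / f"{case}.interaction").exists()

    # data/1.{in,ans} must come in pairs.
    assert d_in == d_ans, "data .in/.ans must come in pairs"
    # If data/1.{in,ans} does not exist, statement/1.{in,ans} must come in pairs.
    if not d_in:
        assert s_in == s_ans, "statement .in/.ans must come in pairs"
    if interactive:
        # Interactive problems must provide a statement/1.{in,ans} pair.
        assert s_in and s_ans, "interactive problems need a statement .in/.ans pair"
    else:
        # statement/1.interaction and statement/1.in are interactive-only.
        assert not s_int and not s_in, "only allowed for interactive problems"
```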

What is shown in the statement

For each sample test case (defined by either a data/1.in, a statement/1.in, or a statement/1.interaction):

  1. If there is a statement/1.interaction, show that.
  2. Otherwise, show a .in/.ans pair, where the default data/sample/1.{in,ans} can optionally be overridden by statement/*.{in,ans}.

We should probably restrict this to require consistency (checked in the sketch after this list), so that:

  • either all or no testcases have a statement/1.in
  • either all or no testcases have a statement/1.ans
  • either all or no testcases have a statement/1.interaction
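
As a sketch, the display rule and the proposed consistency requirement could look as follows (helper names and the file-set representation are invented for illustration):

```python
def files_shown(case: str, statement: set[str]) -> tuple[str, ...]:
    """Which files back the sample shown in the statement for one test case."""
    if f"{case}.interaction" in statement:
        # 1. If there is a statement/1.interaction, show that.
        return (f"statement/{case}.interaction",)
    # 2. Otherwise show a .in/.ans pair; statement/* overrides data/sample/*.
    def pick(ext: str) -> str:
        name = f"{case}.{ext}"
        return f"statement/{name}" if name in statement else f"data/{name}"
    return (pick("in"), pick("ans"))

def check_consistency(cases: list[str], statement: set[str]) -> None:
    """Either all or no testcases may have each kind of statement file."""
    for ext in ("in", "ans", "interaction"):
        have = {c for c in cases if f"{c}.{ext}" in statement}
        assert not have or have == set(cases), f"inconsistent statement .{ext} files"
```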

Custom output validation problems may override the default data/1.ans, but do not have to. The data/1.ans is used as input to the output validator and may or may not be a valid answer in itself. If a statement/1.ans is provided, tooling can verify that it is indeed a valid answer. This can be used, e.g., to have a high-precision data/1.ans but a lower-precision statement/1.ans to show to teams.
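
For example, under the (invented) assumption of an output validator that accepts a relative error of 1e-2:

```python
def validator_accepts(expected: float, given: float, rel_tol: float = 1e-2) -> bool:
    """Stand-in for a float-tolerance output validator check."""
    return abs(given - expected) <= rel_tol * max(1.0, abs(expected))

high_precision = 3.14159265358979  # contents of data/sample/1.ans
shown_to_teams = 3.14              # contents of problem_statement/sample/1.ans

# Tooling can verify that the lower-precision displayed answer would itself
# be judged correct against the high-precision one:
assert validator_accepts(high_precision, shown_to_teams)
```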

Interactive problems are allowed (but not required) to have a statement/1.interaction for every test case. Alternatively, they may have a statement/1.{in,ans} pair that is shown instead. This is useful for 'fake interactive' problems, in particular problems where (random) input data is generated on the fly and passed to the team submission as if it were a classic input-output problem.

What is available to contestants as a download

  • default & custom validation: give data/1.in, plus statement/1.ans if present, otherwise data/1.ans.
  • interactive: give statement/1.{in,ans} (a selection sketch follows the list below).

  • When statement/1.interaction is not present, statement/1.{in,ans} are the generated input and corresponding answer.
  • When statement/1.interaction files are also present, these statement/1.{in,ans} are the input files to the testing tool.
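
A sketch of that selection (hypothetical helper; the `validation` values are illustrative, not spec keywords):

```python
def download_files(case: str, validation: str, statement: set[str]) -> list[str]:
    """Which files a contestant gets for one sample, per the rules above."""
    if validation in ("default", "custom"):
        ans = (f"statement/{case}.ans" if f"{case}.ans" in statement
               else f"data/{case}.ans")
        return [f"data/{case}.in", ans]
    # Interactive: give the statement .in/.ans pair.
    return [f"statement/{case}.in", f"statement/{case}.ans"]
```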

TODO: for interactive problems with statement/1.interaction files, do we require or allow statement/1.{in,ans} files?
If we require them, that means every shown interaction must actually have a corresponding testcase for download, which I think is good. If we allow them, then two things could happen:

  • the corresponding data/1.{in,ans}

Fake interactive problems / generated input problems

Just to repeat: it's possible to have problems with on-the-fly generated input by specifying them as an interactive-type problem but not providing statement/1.interaction files. Instead, statement/1.{in,ans} can be provided for the generated .in and corresponding .ans, while data/1.{in,ans} are the instructions to the interactor, which takes on the role of both input generator and output validator.
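
A minimal sketch of such a combined interactor (the task, summing a list of numbers, is invented for illustration; the 42/43 exit codes are the spec's accepted/wrong-answer convention for output validators):

```python
#!/usr/bin/env python3
# Hypothetical 'fake interactive' interactor: generates random input on the
# fly, streams it to the submission, then validates the submission's reply.
import random
import sys

def main() -> None:
    rng = random.Random()  # fresh randomness on every (re)submission
    n = rng.randint(1, 1000)
    values = [rng.randint(1, 10**9) for _ in range(n)]

    # Input-generator role: send the generated testcase to the submission.
    print(n, flush=True)
    print(" ".join(map(str, values)), flush=True)

    # Output-validator role: read and check the submission's answer.
    answer = int(sys.stdin.readline())
    sys.exit(42 if answer == sum(values) else 43)  # 42 = AC, 43 = WA

if __name__ == "__main__":
    main()
```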

Interaction with run_samples: false?

We added run_samples: false to avoid running submissions on samples, e.g. in cases where the samples do not follow the spec because they use n=10 instead of n=1000 (which would be guaranteed for secret data).

Instead of providing data/sample/*.{in,ans} files, this could now be implemented by leaving data/sample empty and only providing these files as problem_statement/sample/*.{in,ans}. That may be preferable, and then we could drop run_samples?
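
The resulting package layout would then look something like this (file names illustrative):

```
data/
  sample/          # empty: no samples are judged
  secret/
    001.in
    001.ans
problem_statement/
  sample/
    1.in           # shown in the statement and downloadable,
    1.ans          # but never run against submissions
```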

Summary

  • Files in data/sample/* are the testcases that are judged as samples.
  • Files in problem_statement/sample/* control/override what is shown in the statement and available for download.

Question: I think it makes sense to require that data/sample/* and problem_statement/sample/* either contain the same set of cases, or else one of them must be empty, just to ensure consistency.

simonlindholm (Member) commented

> Fake interactive problems / generated input problems

Do we have examples of this actually being used in practice? There's never a case where you strictly need this, right? I can say I've never felt the need, and it feels like it just risks worse UX (e.g. judges showing confusing UI, and not being able to show failing test case input/output in the same way as for non-interactive problems) while adding a lot of bug potential around stdin/stdout fd closures, EOF checking, and other termination behavior.

> To clarify, the question is whether there can be a problem where some samples have a .interaction while others do not and only have a .in.

FWIW, https://github.com/zehnsechs/egoi-2024-testdata/tree/main/day1/gardendecorations/data/sample had this, because we wanted a sample test case that would be run by Kattis while also splitting the .interaction file into three, since it was a multi-run problem. I'm not sure how we're envisioning samples to work for multi-run problems.
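
For concreteness, a hypothetical shape of such a split, for an invented guessing task (the file names are made up, and we read the spec's interaction protocol as `>` marking what the submission writes and `<` what it reads; treat both as assumptions):

```
--- 1-run1.interaction ---
<10
>guess 4
<higher
>guess 7
<correct

--- 1-run2.interaction ---
<10
>guess 2
<correct

--- 1-run3.interaction ---
<10
>guess 9
<correct
```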

niemela (Member) commented Aug 18, 2024

> I'm not sure how we're envisioning samples to work for multi-run problems.

Multi-pass problems will use .interaction files (https://www.kattis.com/problem-package-format/spec/2023-07-draft.html#multi-pass-validation).

(Was that the answer you were looking for?)

simonlindholm (Member) commented

Ah, thanks, that makes sense.

RagnarGrootKoerkamp (Collaborator) commented

> Do we have examples of this actually being used in practice?

Yes, we had multiple such 'generated input' problems for BAPC, in particular ones where we guarantee that the input is random and hence regenerate it on each resubmission.

RagnarGrootKoerkamp (Collaborator) commented

Reopening, since there are still some unresolved discussions in #291.

I think one thing that also isn't really specified is whether, for custom output validation and interactive problems, we require that the same set of files is present for each test case across data/sample and problem_statement/sample, rather than each test case having an independently valid set of files.

simonlindholm (Member) commented

> Yes, we had multiple such 'generated input' problems for BAPC, in particular ones where we guarantee that the input is random and hence regenerate it on each resubmission.

Interesting. Is the problem package available for any of them? I do feel like that kind of setup is inadvisable, and it's better to keep the test data static while still giving a guarantee that it was generated at random.

RagnarGrootKoerkamp (Collaborator) commented Aug 20, 2024

See problem L here: https://2022.bapc.eu/bapc/problems.pdf
(You can download the sources via the homepage.)

With static random test data, there is always a chance that some specific solution hits an annoying edge case; regenerating the data on each submission avoids this.
