
Make program (validators etc.) environment more strictly defined. #330

Open
niemela opened this issue Aug 20, 2024 · 9 comments

Comments

niemela (Member) commented Aug 20, 2024

Comment from @gkreitz:

It is super annoying that validators do not seem to have any documentation on what they need or what language they are written in. Supplying a run and/or build script without any guarantees on the environment other than the existence of cc and cpp just feels bad. As someone building a backend, I want to know what language the validator is written in, and what it needs for its runtime. As someone building a validator, I don't see how I would ever use run and build to safely do anything other than run C/C++. I'm very unsure what problem was solved by adding run and build instead of re-using the same system as is in place for submissions.

@eldering I think you have mostly been pushing for the necessity of this? @Tagl, you as well?

Tagl (Contributor) commented Aug 20, 2024

My main use case for this has been linking a C++ output validator against a library, which saves a massive amount of time and ensures the quality of the validator. The most recent case was using the Eigen library for https://open.kattis.com/problems/flaedasmidi, because our original validator was numerically unstable and we needed something more robust.

That being said, I encounter this sort of thing very rarely; but when I do need it, the only other ways to make the problem work would be to implement the library myself or to try to retrofit it into the problem package, both of which would take an unreasonable amount of time.
I am fine with limiting this to specific commands, but it should not be too restrictive.
Allowing Python 3, C, and C++ as the only general-purpose languages for validators is also something I support, with a possible addition of other high-performance languages such as Rust if demand becomes high.

In Python 3 you can simply add the library as a directory within your validator and import it, but in C and C++ it's much nicer to be able to build the library as needed and link to it.

I do think this may be best handled by allowing the things we know we need; if we encounter more needs in the future, we can add those to the spec, with great hesitance, to avoid too many required dependencies.

niemela (Member, Author) commented Aug 20, 2024

with a possible addition of other high performance languages such as Rust if that becomes high in demand.

There is always the possibility of adding more languages in the future.

eldering (Collaborator) commented:

I think @Tagl's case is already an important one: you might need to include a library (Eigen, libgmp, matplotlib, etc.) that is not, and should not be, available as part of a contest language, and it doesn't make sense to add a contest language just to be able to build your validator (or other type of program).

Another example I've seen is custom validators that consist of multiple programs chained together with a shell script. For example, first a checktestdata program validates the syntax, and then another program validates the semantics. In the past we've also combined two separate validators into a single script.
The bottom line is that supporting scripts gives a lot of flexibility.
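
To illustrate (a hypothetical sketch, not taken from any specific package), such a chaining run script could look roughly like this, assuming the build script produced ./syntax_check and ./semantic_check next to it, and assuming the usual output validator interface (arguments: input, judge answer, feedback directory; team output on stdin; exit code 42 for accept, 43 for reject):

```sh
#!/bin/sh
# Hypothetical run script chaining two checks. The names syntax_check and
# semantic_check are made up; the build script is assumed to have produced
# them in the same directory as this script.

# The team output arrives on stdin; save it so both checks can read it.
team_output=$(mktemp)
trap 'rm -f "$team_output"' EXIT
cat > "$team_output"

# First validate the syntax (exit code 42 means "accepted").
./syntax_check "$@" < "$team_output"
if [ $? -ne 42 ]; then
    exit 43
fi

# Syntax is fine; the semantic check decides the final verdict.
./semantic_check "$@" < "$team_output"
exit $?
```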

OTOH, I don't think that the submission compilation environment is (much) better defined. If you specify C++ as the language for your program, you still don't know which compiler, version, and flags will be used, so it might still fail to compile.

gkreitz commented Aug 22, 2024

I think @Tagl's case is already an important one: you might need to include a library (Eigen, libgmp, matplotlib, etc.) that is not, and should not be, available as part of a contest language, and it doesn't make sense to add a contest language just to be able to build your validator (or other type of program).

I feel like I'm missing something here. How does the build script ensure/know that the library it needs is available within the sandbox we run it in?

Another example I've seen is custom validators that consist of multiple programs chained together with a shell script. For example, first a checktestdata program validates the syntax, and then another program validates the semantics. In the past we've also combined two separate validators into a single script. The bottom line is that supporting scripts gives a lot of flexibility.

I agree that it gives flexibility, but I feel it's so underspecified that it's very unclear to me how one would actually (safely) make use of that flexibility when writing validators. For instance, in your example with checktestdata, did the validator include the source code and compile it, or did it assume the binary was present somewhere on the PATH? In the example with two separate validators, were those two C/C++ validators, or did the script, for instance, assume the existence of Python?

I think it would be beneficial if someone with example use cases could contribute one to https://github.com/Kattis/problem-package-format/tree/master/examples. That would be a valuable starting point for problem authors, and when building Kattis support I would sure love to have an example that gives at least some hint of what people are, in practice, going to assume about the environment.

OTOH, I don't think that the submission compilation environment is (much) better defined. If you specify C++ as the language for your program, you still don't know which compiler, version, and flags will be used, so it might still fail to compile.

I see it as way less problematic to just assume that judge systems will have a somewhat sane C++ compiler with somewhat sane settings than to try to build portable shell scripts where my only guarantee is that cc and cpp exist.

As someone building a judge system, I want to choose how my system compiles C++ submissions, but I don't see how I can support arbitrary shell scripts in a good way (I'm guessing that if I provide a sandbox literally only giving what's guaranteed by the standard, most non-trivial build and run scripts simply would not work, as they will make assumptions about the environment beyond that).

RagnarGrootKoerkamp (Collaborator) commented:

I think the entire point of build and run scripts is that they allow customization outside what the spec defines.

For output validators:

  • Sometimes we have two independent output validators. They can be compiled as usual in the build script, and the run script calls both of them and either ensures they give the same answer, or that one passes before the other is called.
  • Once we had to pass -lpthread to the g++ invocation, because otherwise it's too slow.
  • Once we had to pass -lpthread to be able to write output and read input in parallel (for an interactive problem), to keep the pipes from filling up.
  • Once we had to pass -std=gnu++2a to g++.

While all these can be solved in other ways, it's much more convenient to just fully control the build step.
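
As an illustration only (not something the spec currently defines), a build script covering the flag examples above could be as small as this, assuming the validator source is a single validator.cpp next to the script and that the build script is expected to produce an executable named run:

```sh
#!/bin/sh
# Hypothetical build script: the file name validator.cpp and the output
# name "run" are assumptions, as is the availability of g++ with pthread
# support in the sandbox.
set -e
g++ -O2 -std=gnu++2a -o run validator.cpp -lpthread
```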

For visualizers (which are only run locally during development, not on the judge):

  • For some input visualizers, the build script just calls g++ as usual, and the run script first runs the resulting binary as usual and then does timeout 10 convert out.svg out.png to convert the generated SVG to PNG (since SVG apparently isn't always supported); see the sketch after this list.
  • Some of our input visualizers parse the generated .interaction file, which is piped in by a small wrapper shell script.
  • Also, some of our visualizers are written in Asymptote. While that won't work on the judge, it's still useful for running locally.
  • Sometimes a mix of languages is used: C++ or Python to parse the data, and Python or Asymptote to produce the final image.
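
A rough sketch of the first bullet's run script, with ./draw as a made-up name for the binary the build script produced, and assuming ImageMagick's convert and the timeout utility are available locally (this only runs during development, not on the judge):

```sh
#!/bin/sh
# Hypothetical input visualizer run script. "draw" is a placeholder for
# whatever the build script compiled; it is expected to write out.svg.
set -e
./draw "$@"
# Convert the SVG to PNG, giving up after 10 seconds.
timeout 10 convert out.svg out.png
```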

niemela (Member, Author) commented Aug 24, 2024

I think the entire point of build and run scripts is that they allow customization outside what the spec defines.

It's a balancing act though. If we allow a too-generic "whatever", then the usage will never be portable, and the format becomes useless. OTOH, if we allow too little, too many cases will be unreasonably cumbersome, and people will work around the format... and it becomes useless.

The completely generic build and run scripts seem a little bit on the powerful side. What is the least powerful thing we could allow that would cover most use cases (or maybe even all known use cases)?

For visualizers (which are only run locally during development, not on the judge):

That's true for input visualizers, not necessarily output visualizers.

It seems to me that all the examples you listed only need "POSIX", python, and g++ (or equivalent). Can we define exactly what should be available, and is it only that?

gkreitz commented Aug 25, 2024

It seems to me that all the examples you listed only need "POSIX", python, and g++ (or equivalent).

I'm not sure how that solves the use case above where one needs to link libraries (e.g., Eigen), but I haven't seen any responses to my question "How does the build script ensure/know that the library it needs is available within the sandbox we run it in?", so maybe I'm missing something.

eldering (Collaborator) commented Aug 30, 2024

It seems to me that all the examples you listed only need "POSIX", python, and g++ (or equivalent).

I'm not sure how that solves the use case above where one needs to link libraries (e.g., Eigen), but I haven't seen any responses to my question "How does the build script ensure/know that the library it needs is available within the sandbox we run it in?", so maybe I'm missing something.

Sorry for the slow reply.

I think build and run scripts don't really fix issues with knowing what libraries and versions are available/required. However, I don't think this is really worse than reusing the submission language definitions. There's a significant difference between the submission language specification and a specification for other programs. Submission languages just tell competitors what they can expect, and that's what they have to comply with. With other (typically jury-provided) programs it's not that simple, and I think there are two cases:

  1. If a problem was specifically developed for a contest (or other event), then it's easy to ensure the program's requirements are met through direct communication with the jury, and to simply install the necessary dependencies in the sandbox.
  2. If it is a problem from an archive, then I think there's no clear way to guarantee compatibility with submission languages either. What if the script declares Python as its language, but is from 15 years ago, so it needs Python 2? Or it requires some specific GNU extensions of GCC? And what if it declares an obscure submission language that you don't support?

So, bottom line, I think we can't really treat these programs' sandbox requirements the same way as those for normal submissions, and I don't think we lose much by using build and run scripts, but we gain a lot of flexibility.

That said, I do agree that it makes sense to specify a base set of dependencies that such a program should be able to expect, and I think a C/C++ compiler, Python, and a POSIX shell should be among them. Maybe we should recommend that authors of such programs document any special requirements in a README or similar?

Tagl (Contributor) commented Aug 30, 2024

"How does the build script ensure/know that the library it needs is available within the sandbox we run it in?,"

In our case with Eigen, we provided the code for the library, built it separately, and then linked against it.
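
For concreteness, a hedged sketch of that approach: the library sources ship inside the validator directory, so the sandbox only needs a C++ compiler and nothing has to be pre-installed. All file and directory names below (third_party/somelib, validator.cpp, the output name run) are made up for illustration:

```sh
#!/bin/sh
# Hypothetical build script for a validator that vendors a library.
set -e

# Compile the vendored library sources into a static archive.
mkdir -p build
for src in third_party/somelib/src/*.cpp; do
    g++ -O2 -Ithird_party/somelib/include -c "$src" \
        -o "build/$(basename "$src" .cpp).o"
done
ar rcs build/libsomelib.a build/*.o

# Compile the validator and link against the archive.
g++ -O2 -Ithird_party/somelib/include -o run validator.cpp build/libsomelib.a
```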
