Request programming language support here #1029

Open
rien opened this issue Jan 25, 2023 · 23 comments
Comments

@rien
Member

rien commented Jan 25, 2023

If you want to use a programming language that Dolos does not support yet, please ask here! This helps us with prioritizing which programming languages we should focus on first.

We currently ship Dolos with the following programming languages:

  • Bash
  • C
  • C++
  • C#
  • Elm
  • Java
  • JavaScript
  • Python
  • TypeScript
  • TSX

If your programming language is not in the list of languages supported out of the box, there is a good chance that a tree-sitter parser already exists for it. If that is the case, it should be easy to add support for your language.

In any case, let us know which languages you want to use with Dolos!

@rien rien pinned this issue Jan 25, 2023
@BTWS2

BTWS2 commented Jan 25, 2023

Adding support for HTML would be great, because the HTML judge allows for exercises with different solutions (e.g. add a title (doesn't matter which text), add at least 3 items to a list, ...).

@rien
Member Author

rien commented Jan 26, 2023

@BTWS2 there is a parser for HTML, but it wouldn't work well for plagiarism detection because the parser ignores tag names, attribute names, the exact content itself, and so on.

This is how tree-sitter converts an example HTML page:
[image: tree-sitter parse tree of an example HTML page]

Especially if the underlying structure of the analyzed HTML files is expected to be very similar, using this parser would result in very high similarities. Using this parser, Dolos reports that the homepage of GitHub and the homepage of Dodona have a similarity of 88%.

I would prefer to stick to languages that work well with Dolos. However, if you want, you can try it out yourself by installing tree-sitter-html using npm or yarn. Dolos is able to automatically detect and use this parser if it is available.
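
To illustrate the point above, here is a minimal TypeScript sketch. It is not Dolos's actual implementation: the token streams are hand-written approximations of what a parser might emit, and `stripNames`, `kgrams` and `jaccard` are hypothetical helpers. It compares two small pages with different tags and text, first keeping the tag names and then dropping them, as a structure-only parser would.

```typescript
// Hand-written, approximate token streams (an assumption, not real parser output).
// Page A: <h1>Welcome</h1><p>Hello</p>
// Page B: <div>Shop</div><span>Cart</span>
const namedA: string[] = [
  "element", "start_tag:h1", "text", "end_tag:h1",
  "element", "start_tag:p", "text", "end_tag:p",
];
const namedB: string[] = [
  "element", "start_tag:div", "text", "end_tag:div",
  "element", "start_tag:span", "text", "end_tag:span",
];

// Drop the ":name" suffix to mimic a parser that keeps only node types.
function stripNames(tokens: string[]): string[] {
  return tokens.map((t) => t.replace(/:.*$/, ""));
}

// Collect every run of k consecutive tokens as a single string.
function kgrams(tokens: string[], k: number): Set<string> {
  const grams = new Set<string>();
  for (let i = 0; i + k <= tokens.length; i++) {
    grams.add(tokens.slice(i, i + k).join(" "));
  }
  return grams;
}

// Jaccard similarity: shared k-grams divided by all distinct k-grams.
function jaccard(a: Set<string>, b: Set<string>): number {
  let shared = 0;
  a.forEach((gram) => {
    if (b.has(gram)) shared++;
  });
  const union = a.size + b.size - shared;
  return union === 0 ? 0 : shared / union;
}

const withNames = jaccard(kgrams(namedA, 3), kgrams(namedB, 3));
const structureOnly = jaccard(
  kgrams(stripNames(namedA), 3),
  kgrams(stripNames(namedB), 3),
);

console.log(withNames);     // 0: no 3-gram is shared while tag names are kept
console.log(structureOnly); // 1: the pages look identical once names are gone
```

Real pages are larger and less regular than this toy input, but the effect is the same in kind: the less information each token carries, the more unrelated documents overlap.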

@BTWS2

BTWS2 commented Jan 27, 2023

@rien No problem, thank you for the insight.

@anilgulecha

anilgulecha commented Feb 14, 2023

@rien can there be support for simple text?

Regular assignments (like essays) are the use case I'm thinking of. Could the tokenizer be as simple as splitting on spaces, or splitting sentences on "."?

@rien
Member Author

rien commented Feb 14, 2023

@anilgulecha Dolos is specifically made for plagiarism detection on source code. There are tools that should perform better on just text than Dolos.

That said, we do indeed have a tokenizer that does split on spaces, which you can use by passing --language char. However, in that case you might be better off (and faster) using the diff command or another tool that does string matching.
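
As a rough illustration of the two tokenization ideas raised in the question, here is a short TypeScript sketch. It does not show how the Dolos `char` tokenizer is implemented; `splitWords` and `splitSentences` are hypothetical helpers written only for this example.

```typescript
// Split a text into word tokens on any run of whitespace.
function splitWords(text: string): string[] {
  return text.trim().split(/\s+/).filter((w) => w.length > 0);
}

// Split a text into sentences on "." and drop empty fragments.
function splitSentences(text: string): string[] {
  return text
    .split(".")
    .map((s) => s.trim())
    .filter((s) => s.length > 0);
}

const essay = "Plagiarism is copying.  Detection compares texts.";

console.log(splitWords(essay));
// [ 'Plagiarism', 'is', 'copying.', 'Detection', 'compares', 'texts.' ]

console.log(splitSentences(essay));
// [ 'Plagiarism is copying', 'Detection compares texts' ]
```

Note that the word split keeps punctuation attached ("copying."), and the sentence split would also break on abbreviations and decimals, which is part of why dedicated text-plagiarism tools tend to do better on essays.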

@anilgulecha

Thanks for the char recommendation. Will try it out.

@yafuerst

yafuerst commented Mar 23, 2023

I need support for the language Modelica. It is not supported by tree-sitter, but there are two parsers on GitHub:
https://github.com/OpenModelica/tree-sitter-modelica
https://github.com/mtiller/modelica-tree-sitter

I managed to get them running using tree-sitter directly, but I have had no luck adding them to Dolos yet. Do you have any tips? I am on Linux, if that's important.

@rien
Member Author

rien commented Apr 3, 2023

@yafuerst Dolos will try to find the parser with a fitting name (tree-sitter-${name}) if you add the language with the -l option. It will look in the node_modules accessible to Dolos (local, per user, global).

If you've managed to get them working with tree-sitter but Dolos still cannot find them, you can try "installing" the parser for your user or globally with npm link or npm link --global. For the modelica-tree-sitter parser to be detected by Dolos, you will have to change its name to tree-sitter-modelica.

Let me know if it doesn't work and we'll figure it out.
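
The naming rule described above can be sketched as follows. This is a simplified illustration, not Dolos's code: `parserPackageName` and `findParser` are hypothetical helpers, and the list of installed packages is passed in by hand instead of being read from node_modules.

```typescript
// Build the package name described above: tree-sitter-${name}.
function parserPackageName(language: string): string {
  return `tree-sitter-${language}`;
}

// Return the matching package name, or undefined when no package
// with exactly that name is in the list.
function findParser(language: string, installed: string[]): string | undefined {
  const wanted = parserPackageName(language);
  return installed.find((pkg) => pkg === wanted);
}

// Hypothetical contents of node_modules before renaming.
const beforeRename = ["tree-sitter", "modelica-tree-sitter"];
console.log(findParser("modelica", beforeRename)); // undefined: the name does not match

// Hypothetical contents after renaming the package to tree-sitter-modelica.
const afterRename = ["tree-sitter", "tree-sitter-modelica"];
console.log(findParser("modelica", afterRename)); // tree-sitter-modelica
```

This is why the modelica-tree-sitter package needs to be renamed: a lookup by exact name never sees it, even though the parser itself works.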

@alexey-sh

What about Vue and React?

@rien
Member Author

rien commented Aug 21, 2023

@alexey-sh since those frameworks mix multiple languages (template syntax, CSS, HTML, ...), tree-sitter does not handle them out of the box, so some additional work is required for them to work.

In addition, since HTML and CSS often have a lot of common code fragments between submissions, Dolos isn't very good at detecting plagiarism in them (you get a lot of false positives).

However, we do plan on changing Dolos under the hood to support these kinds of languages in the future!

@DhruvDh

DhruvDh commented Sep 19, 2023

Hi, I am running Dolos with the following versions:

Dolos v2.3.0
Node v18.16.0
Tree-sitter v0.20.1

npm only has an older tree-sitter-java release, and it seems Dolos cannot find it because of this. Is there a workaround? I have tried installing it locally and globally with pnpm and npm.

@rien
Member Author

rien commented Sep 20, 2023

@DhruvDh with the way we currently integrate tree-sitter languages, we will have to wait for tree-sitter-java to publish a new release. Someone recently opened an issue asking the maintainers of that parser to create a new release, so let's hope they publish it soon: tree-sitter/tree-sitter-java#163

As an alternative, you can try cloning this repository and updating the base tree-sitter version manually. However, that can be cumbersome.

We already have some ideas on how to avoid this problem with Dolos in the future (see #1028), but we have not started implementing this solution yet.

@DhruvDh

DhruvDh commented Sep 20, 2023

I was able to solve this with the following commands. I am not confident I understand it correctly, so I won't attempt an explanation.

# a fork with package.json version set to 0.20.1
pnpm install DhruvDh/tree-sitter-java 
pnpm rebuild
pnpm install @dodona/dolos
pnpm exec dolos run info.csv

@nachiket

nachiket commented Apr 3, 2024

Would you support Verilog? Thanks.

@rien
Member Author

rien commented Apr 4, 2024

Hi @nachiket, thanks for your suggestion! There is an official Verilog parser available, so this is definitely possible.

We'll put it in our schedule and will let you know when support for Verilog has landed.

@rhz1949

rhz1949 commented Oct 12, 2024

Would you support Go and Rust? Thanks

@rien
Member Author

rien commented Oct 24, 2024

@rhz1949 working on that right now. If everything goes well, support for Go and Rust will be added in the new release I will make today.

@kudhru

kudhru commented Nov 29, 2024

Is it possible to support OCaml as well?

@rien
Member Author

rien commented Nov 29, 2024

@kudhru yes, there seems to be an official OCaml parser (https://github.com/tree-sitter/tree-sitter-ocaml).

It seems they provide three parsers: for implementations (.ml), interfaces (.mli), and types. Would it work for you if we only support the implementations, since Dolos currently supports single-file submissions anyway?

@kudhru

kudhru commented Nov 29, 2024

Yes, in the programming exam I am conducting, I will have only one .ml file submission per student.

But how much time would it take? Or could you tell me whether I can do it locally on my machine? I need to conduct an end-of-semester exam in 10 days.

@benhadid

benhadid commented Dec 4, 2024

Hello,
is it possible to support MIPS / RISC-V assembly languages?
Thank you

@rien
Member Author

rien commented Dec 5, 2024

@benhadid I cannot find a specific parser for MIPS / RISC-V, but there seems to be a generic tree-sitter-asm parser. Can you check whether this would fit your use case? You can do so by checking their test cases: https://github.com/RubixDev/tree-sitter-asm/tree/main/test/corpus

If not, is there any other parser from tree-sitter's list of parsers that would fit?

@benhadid

benhadid commented Dec 5, 2024

@rien I checked the links you gave me and unfortunately the tree-sitter-asm parser is for the x86 architecture (Intel/AMD processors).

I have googled a bit and found the tree-sitter-riscvasm parser for RISC-V. Can it be used with Dolos?

Thank you
