
SarahGrand

SarahGrand commented Jul 10, 2025

GH-5327
Design Doc for RDF 1.2 implementation

SarahGrand changed the title from "GH-5327 Design Doc for RDF 1.2" to "GH-5327 Design Doc for RDF 1.2 RFC" on Jul 10, 2025
@odysa

odysa commented Jul 10, 2025

RFC: RDF 1.2 Design
Our team has reviewed the RDF 1.2 design (Turtle & MemoryStore) internally and would greatly appreciate feedback from the RDF4J community.
Your insights and comments are very welcome!
cc @hmottestad @kenwenzel @nguyenm100

@kenwenzel
Contributor

@SarahGrand Thank you for the good compilation of main changes in RDF 1.2 and implementation aspects in RDF4J.
As far as I can see, the MemoryStore needs only minimal changes and you already have a good plan to extend the Turtle parser and writer.
I would propose storing the language direction in a separate field within Literal, exposed through a method with a signature similar to Optional<String> getLanguageDirection(). You could even introduce an enumeration enum LanguageDirection { LTR, RTL } and use it instead of a generic String datatype.
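A minimal sketch of the separate-field proposal, for illustration only. The names LanguageDirection, DirLangLiteral and getLanguageDirection() are hypothetical and this is not RDF4J's actual Literal API. Because RDF 1.2 syntaxes write the direction in lowercase (ltr / rtl), the sketch assumes the enum needs a small parse step and a way back to its token.

```java
import java.util.Locale;
import java.util.Optional;

public class DirLangLiteralDemo {

    // Hypothetical enum for the RDF 1.2 base direction; its serialized tokens are "ltr" and "rtl".
    enum LanguageDirection {
        LTR, RTL;

        // Parses a direction token (accepted case-insensitively in this sketch).
        static LanguageDirection parse(String token) {
            switch (token.toLowerCase(Locale.ROOT)) {
                case "ltr":
                    return LTR;
                case "rtl":
                    return RTL;
                default:
                    throw new IllegalArgumentException("Not a base direction: " + token);
            }
        }

        // Returns the lowercase token used when serializing.
        String token() {
            return name().toLowerCase(Locale.ROOT);
        }
    }

    // Hypothetical literal type: language tag and base direction live in separate fields.
    record DirLangLiteral(String label, String language, LanguageDirection direction) {

        Optional<String> getLanguage() {
            return Optional.ofNullable(language);
        }

        Optional<LanguageDirection> getLanguageDirection() {
            return Optional.ofNullable(direction);
        }
    }

    public static void main(String[] args) {
        DirLangLiteral lit = new DirLangLiteral("hello", "ar", LanguageDirection.parse("rtl"));
        System.out.println(lit.getLanguage().orElse("none"));         // ar
        System.out.println(lit.getLanguageDirection().orElseThrow()); // RTL
    }
}
```

With an enum, a typo such as "rtI" fails at parse time instead of silently producing an unknown direction string.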
@hmottestad Would it be fine to integrate everything related to RDF 1.2 directly into the existing packages? Should we drop and/or rename the RDF-star related classes?

@nguyenm100
Contributor

Thanks for the feedback, @kenwenzel. Quick (logistical) note: Sarah's internship only runs until 8/15, so any decisions or feedback would be appreciated well in advance of that date (i.e. ASAP). She's hoping to submit some PRs shortly.

@SarahGrand
Author

SarahGrand commented Jul 23, 2025

@kenwenzel We had discussed whether or not to store the base direction as a separate field within Literal, and thought that it would be more efficient to store it together with the language tag.

In most cases, a Literal would be retrieved and converted to string as a whole, so storing the language and direction as separate fields requires concatenating them together each time. If they are both stored in the language field, then converting to string is unchanged from the current implementation and saves the extra overhead of concatenating the base direction to the language tag.
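To make the trade-off concrete, here is a simplified sketch of both storage options producing a Turtle-style string. The helper names are hypothetical and label escaping is omitted. With the combined field, the stored value is emitted as-is; with separate fields, the direction is appended only when present.

```java
public class LiteralToStringDemo {

    // Option 1 (design doc): direction stored inside the language field, e.g. "en--ltr".
    static String combinedToString(String label, String languageAndDirection) {
        return "\"" + label + "\"@" + languageAndDirection;
    }

    // Option 2: separate fields; "--" plus the direction is appended only if a direction exists.
    static String separateToString(String label, String language, String direction) {
        StringBuilder sb = new StringBuilder()
                .append('"').append(label).append("\"@").append(language);
        if (direction != null) {
            sb.append("--").append(direction);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(combinedToString("hello", "en--ltr"));   // "hello"@en--ltr
        System.out.println(separateToString("hello", "en", "ltr")); // "hello"@en--ltr
    }
}
```

Both produce the same output; in this sketch the separate-field version costs two extra appends, and only for literals that actually carry a direction.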

The only time the base direction needs to be retrieved separately from the language tag is when the SPARQL functions LANGDIR or hasLANGDIR are called, which is much rarer than the cases where the language tag and base direction need to be retrieved and concatenated together.

What are your thoughts on why storing the direction as a separate field would be preferable?

@kenwenzel
Contributor

@SarahGrand Thank you for the clarification. Based on the RDF 1.2 specification a literal's language and base direction are two separate attributes. If base direction is encoded within the language attribute those semantics would be broken.

@hmottestad
Contributor

I'm still on vacation. I'll try to take a look at this on Wednesday.

@odysa

odysa commented Jul 24, 2025

@hmottestad @kenwenzel Since this introduces breaking changes, do you think we should release it as a major version, version 6, and create a new branch for it, separate from develop?

@nguyenm100
Contributor

hey folks, can we get a decision on which branch the PR should be submitted to? @kenwenzel @hmottestad

@kenwenzel
Contributor

@nguyenm100 @SarahGrand You can use the rdf12 branch, and we'll decide later whether it should be part of RDF4J 5 or an upcoming version 6.

@hmottestad
Contributor

@kenwenzel is that branch up to date with the develop branch?

@kenwenzel
Contributor

@hmottestad Yes - at least up to the state of yesterday ;-)

@SarahGrand
Author

SarahGrand commented Jul 29, 2025

> @SarahGrand Thank you for the clarification. Based on the RDF 1.2 specification a literal's language and base direction are two separate attributes. If base direction is encoded within the language attribute those semantics would be broken.

@kenwenzel I'm not sure I fully understand what you mean when you say it will break the semantics. Even if the language and direction are stored in the same string, they can be deterministically separated by splitting on "--", which is used only to separate the language from the direction in the serialization formats. Can you clarify what you mean, and whether it would be acceptable to store the two attributes in one string to save time on concatenation?
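A sketch of the split Sarah describes, assuming the combined value looks like "en-US--rtl". Well-formed BCP 47 tags separate non-empty subtags with single hyphens, so the first "--" can only be the language/direction separator. The helper names lang and langDir are hypothetical; they are the kind of primitives a LANGDIR/hasLANGDIR implementation over a combined field would need on every call.

```java
import java.util.Optional;

public class LangDirSplitDemo {

    // Returns the direction that follows "--" in a combined value, if any.
    static Optional<String> langDir(String languageAndDirection) {
        int sep = languageAndDirection.indexOf("--");
        return sep < 0 ? Optional.empty() : Optional.of(languageAndDirection.substring(sep + 2));
    }

    // Returns only the language-tag part of a combined value.
    static String lang(String languageAndDirection) {
        int sep = languageAndDirection.indexOf("--");
        return sep < 0 ? languageAndDirection : languageAndDirection.substring(0, sep);
    }

    public static void main(String[] args) {
        System.out.println(lang("en-US--rtl"));           // en-US
        System.out.println(langDir("en-US--rtl").get());  // rtl
        System.out.println(langDir("en-US").isPresent()); // false
    }
}
```

The split is unambiguous, which supports Sarah's point. The cost kenwenzel raises is a different one: every LANG or LANGDIR evaluation performs an indexOf and allocates a substring.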

I will make a PR soon for the implementation proposed in my design, but want to make sure we are on the same page first about how the language direction should be implemented.

@kenwenzel
Contributor

@SarahGrand I would propose introducing a separate method getBaseDirection(). I understand that concatenation might introduce a slight performance cost (although most writers are probably stream-based), but in the end, splitting (especially when evaluating SPARQL) could be even more costly.
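To illustrate the "stream-based writers" remark, here is a sketch of a writer that emits the language tag and direction directly to the output. The method name writeLiteral is hypothetical and label escaping is omitted. With separate fields, no combined "lang--dir" String is ever built; each part is written in turn.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;

public class StreamingLiteralWriterDemo {

    // Writes a simplified Turtle-style language-tagged literal straight to the output.
    static void writeLiteral(Writer out, String label, String language, String direction) throws IOException {
        out.write('"');
        out.write(label); // a real writer must escape quotes, backslashes, newlines, etc.
        out.write("\"@");
        out.write(language);
        if (direction != null) {
            // The separator and direction are streamed; no intermediate String is concatenated.
            out.write("--");
            out.write(direction);
        }
    }

    public static void main(String[] args) throws IOException {
        StringWriter out = new StringWriter();
        writeLiteral(out, "hello", "en", "rtl");
        System.out.println(out); // "hello"@en--rtl
    }
}
```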

@hmottestad
Contributor

Finally had some time to look through this. Looks good!

Moving the Triple interface will probably cause some fairly large repercussions in the rest of the code base. We'll see how that works out.

SarahGrand force-pushed the GH-5327-rdf12-design-rfc branch from a4b817a to 50191f3 on August 14, 2025 14:16
5 participants