Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow valid XML characters outside of BMP when storing property #163

Closed
wants to merge 5 commits into from

Conversation

niklasnatter
Copy link
Contributor

@niklasnatter niklasnatter commented Nov 3, 2020

This PR adjusts the validation of valid XML characters in the Client to allow UTF8 characters that are not in the BMP. At the moment, treating these characters as invalid makes it impossible to store emojis such as 🤓 or 🤯 (see #85 (comment)).

Per specification, the characters in the range U+10000–U+10FFFF are valid inside of XML documents: https://en.wikipedia.org/wiki/Valid_characters_in_XML#XML_1.0

It looks like the stackoverflow answer that was used as a reference when implementing the RegEx does not include this range, but the range is included in multiple other answers:

@dbu
Copy link
Member

dbu commented Nov 3, 2020

stackoverflow-driven development :-D thanks for looking into this one!

can you please replace the test Jackalope\Transport\Jackrabbit\ClientTest::testOutOfRangeCharacterOccurrence with something that validates that we can actually successfully store utf-8 to jackrabbit and read it again? does utf-8 work for both content and node/property names?

@alexander-schranz
Copy link
Contributor

alexander-schranz commented Nov 3, 2020

The tests are failing and it seems this regex was changed in #65.

Did test some jackrabbit versions and at current state I have the following state not sure how we should handle this over an option? Or is there a version constraint/detection currently so we can build to logic on top of that?:

Version Support Emoticons (filesystem storage) Support Emoticons (mysql storage)
2.8 No Not tested
2.18.5 Yes Yes
2.20.1 Yes Yes

/cc @dbu @lsmith77

@alexander-schranz
Copy link
Contributor

There exists a version parameter can we maybe use that and use based on that another regular expression?

if (! version_compare(self::VERSION, $this->descriptors['jcr.repository.version'], '<=')) {

@dbu do you know if $this->descriptors is always available? I'm not sure if this is always set when storing properties.

If its always set we could maybe go with something like:

$regex = '/[^\x{9}\x{a}\x{d}\x{20}-\x{D7FF}\x{E000}-\x{FFFD}]+/u';
if (version_compare($this->descriptors['jcr.repository.version'], '2.18.0', '>=')) {
    // add support for storage of emoticons
    $regex = '/[^\x{9}\x{a}\x{d}\x{20}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]+/u';
}

@dbu
Copy link
Member

dbu commented Nov 4, 2020

we could say that if the version descriptor exists and the version is high enough, we use the new regex?

@niklasnatter
Copy link
Contributor Author

Just tested this. Looks like $this->descriptors is not set at the time of writing the property in our case 😕
I guess i could call the getRepositoryDescriptors() method to get the version descriptor, but this would send an additional request, right? How would you solve this?

@alexander-schranz
Copy link
Contributor

@nnatter That is really sad, I think we should not call getRepositoryDescriptors, as triggering a additional Request I think is bad and should be avoided. So as we have no version I see the only way by adding a additonal config parameter.

Something like addition_utf8_support: true or we go with something like doctrine by setting the version version: 2.18.5.

@dbu What do you think? Do you see another solution then a parameter?

@niklasnatter niklasnatter force-pushed the patch-1 branch 2 times, most recently from c3dea89 to 8143fde Compare November 6, 2020 10:25
@niklasnatter
Copy link
Contributor Author

Added a jackalope.jackrabbit_version parameter and a test case 🙂

Copy link
Member

@dbu dbu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks like a clean solution, great job!

and i agree that we should prefer configuring the version. i added some nitpicks.

i was wondering if we should initialize the version from the descriptors if it is not set, if we happen to fetch descriptors anyways. but that would lead to super weird behaviour when something works or does not work depending on whether one did something else before that...

what we should do however is compare the configured version to the actual version around line

if (! version_compare(self::VERSION, $this->descriptors['jcr.repository.version'], '<=')) {
and emit a notice with trigger_error. imho we should only emit the notice if major or minor version do not match, but ignore the patch version.

src/Jackalope/Transport/Jackrabbit/Client.php Outdated Show resolved Hide resolved
src/Jackalope/RepositoryFactoryJackrabbit.php Outdated Show resolved Hide resolved
src/Jackalope/RepositoryFactoryJackrabbit.php Outdated Show resolved Hide resolved
src/Jackalope/Transport/Jackrabbit/Client.php Show resolved Hide resolved
@@ -1328,6 +1333,11 @@ protected function isStringValid($string)
{
$regex = '/[^\x{9}\x{a}\x{d}\x{20}-\x{D7FF}\x{E000}-\x{FFFD}]+/u';

if ($this->version && version_compare($this->version, '2.18.0', '>=')) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please extract this version into a constant declared next to the VERSION constant. lets call it UTF8_SUPPORT_MINIMAL_VERSION or something like that.

@dbu
Copy link
Member

dbu commented Nov 11, 2020

ah, and please add a note to the changelog.

do we need a PR on the phpcr-odm bundle as well to allow configuring this, or is that already generic enough that it just works? if it "just works" we should document the new parameter at https://symfony.com/doc/current/cmf/bundles/phpcr_odm/configuration.html#phpcr-session-with-jackalope-jackrabbit

@niklasnatter
Copy link
Contributor Author

niklasnatter commented Nov 11, 2020

@dbu Thanks for your review!

I implemented the suggestions and created a PR for the documentation here: symfony-cmf/symfony-cmf-docs#866
The parameter works without any changes on the phpcr-odm bundle 🙂

Copy link
Member

@dbu dbu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i did some cleanup and merge this in #164

thanks a lot for your effort, this is good now!

@dbu dbu closed this Nov 11, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants