Fix and enhance `StringConverter()` to handle more special characters #361

mizady · 2024-12-09T02:19:18Z

Description

Fix and Enhance StringConverter to handle more special characters.
Added test cases for StringConverter to validate special character handling during serialization and deserialization.

Motivation and Context

When serializing and/or de-serializing a file location such as I:\somefile.txt, the StringConverter was not correctly handling the case of a backslash.
Fixes nanoFramework.Json Serialize/Deserialize does not handle \ correctly Home#1562

How Has This Been Tested?

Added unit tests that verified the backslash, tab, backspace, and FeedForm special characters were not being handled correctly as escaped characters by the StringConverter. All current tests in the nanoFramework.Json.Test project including the added tests are now passing.

Types of changes

Improvement (non-breaking change that improves a feature, code or algorithm)
Bug fix (non-breaking change which fixes an issue with code or algorithm)
New feature (non-breaking change which adds functionality to code)
Breaking change (fix or feature that would cause existing functionality to change)
Config and build (change in the configuration and build system, has no impact on code or features)
Dependencies (update dependencies and changes associated, has no impact on code or features)
Unit Tests (add new Unit Test(s) or improved existing one(s), has no impact on code or features)
Documentation (changes or updates in the documentation, has no impact on code or features)

Checklist:

My code follows the code style of this project (only if there are changes in source code).
My changes require an update to the documentation (there are changes that require the docs website to be updated).
I have updated the documentation accordingly (the changes require an update on the docs in this repo).
I have read the CONTRIBUTING document.
I have tested everything locally and all new and existing tests passed (only if there are changes in source code).
I have added new tests to cover my changes.

Added unit tests to verify correct handling of special characters

Added special character handling for FormFeed, Backspace, Backslash, and Tab

coderabbitai · 2024-12-09T02:19:25Z

Important

Review skipped

Review was skipped due to path filters

⛔ Files ignored due to path filters (4)

nanoFramework.Json.Test/Converters/StringConverterTests.cs is excluded by none and included by none
nanoFramework.Json.Test/JsonSerializerOptionsTests.cs is excluded by none and included by none
nanoFramework.Json/Converters/StringConverter.cs is excluded by none and included by none
nanoFramework.Json/JsonConvert.cs is excluded by none and included by none

CodeRabbit blocks several paths by default. You can override this behavior by explicitly including those paths in the path filters. For example, including **/dist/** will override the default block on the dist directory, by removing the pattern from both the lists.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

Review comments: Directly reply to a review comment made by CodeRabbit. Example:
- I pushed a fix in commit <commit_id>, please review it.
- Generate unit testing code for this file.
- Open a follow-up GitHub issue for this discussion.
Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
- @coderabbitai generate unit testing code for this file.
- @coderabbitai modularize this function.
PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
- @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
- @coderabbitai read src/utils.ts and generate unit testing code.
- @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
- @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

@coderabbitai pause to pause the reviews on a PR.
@coderabbitai resume to resume the paused reviews.
@coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
@coderabbitai full review to do a full review from scratch and review all the files again.
@coderabbitai summary to regenerate the summary of the PR.
@coderabbitai generate docstrings to generate docstrings for this PR. (Beta)
@coderabbitai resolve resolve all the CodeRabbit review comments.
@coderabbitai configuration to show the current CodeRabbit configuration for the repository.
@coderabbitai help to get help.

Other keywords and placeholders

Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (`.coderabbit.yaml`)

You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
Please see the configuration documentation for more information.
If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

Visit our Documentation for detailed information on how to use CodeRabbit.
Join our Discord Community to get help, request features, and share feedback.
Follow us on X/Twitter for updates and announcements.

Ellerbach · 2024-12-09T08:16:18Z

So, this is not properly working as explained in Discord. Here is the investigation: https://github.com/nanoframework/nanoFramework.Json/tree/pr361

And here are the elements:

This will fail: [DataRow("I:\nano\rpath\to", "I:\nano\rpath\to")] // More complex with combination of special characters and possible new line
The string converter with the current logic only works for a json that is a single string
In case the string is part of an existing obejct, the current logic will break when it's a string wontaining potential characters like in the previous example. Also, it will fail in case of a string of length 1

To solve this problem:

As in the branch pr361 pointed above, do not use the escaped logic for objects, just comment out the code like in that branch
For single string, then, the logic should be used

Ellerbach · 2024-12-09T09:28:11Z

And the case already exists:

https://github.com/nanoframework/nanoFramework.Json/blob/f31157a19724bec6bbde21ae1e1fb2408a7a445b/nanoFramework.Json/JsonConvert.cs?plain=1

So, in short, it's about using the function I've commented for that case and calling it in the if. Otherwise, we should not use it.

Added a bypass to return a string that as already been deserialized when it lands at StringConverter.ToType() Removed portion of string escape character processing in JsonConvert as it was implemented in StringConverter (unicode escaped character processing still remains in JsonConvert) Added Tests for multiple escaped characters in a string and in an object containing a string property. Added test case for string converter test for serializing and deserializing strings with multiple escaped characters

mizady · 2024-12-15T14:36:37Z

I noticed that the discord thread was not actually linked here. Discord thread

RFC8259 - https://datatracker.ietf.org/doc/html/rfc8259
ECMA-404 - https://ecma-international.org/wp-content/uploads/ECMA-404_2nd_edition_december_2017.pdf

Seems like we are handling all the escaped characters correctly now from what I can tell based on testing.

Ellerbach · 2024-12-16T10:52:51Z

Are the \uXXXXstill properly handled with the new implementation?

mizady · 2024-12-16T14:00:28Z

Are the \uXXXXstill properly handled with the new implementation?

The original unicode processing part of he code is still used for unicode characters. The existing tests that test unicode properly serializing and deserializing have all passed.

nanoFramework.Json/JsonConvert.cs

Added unit tests to verify correct handling of special characters

Added special character handling for FormFeed, Backspace, Backslash, and Tab

Added a bypass to return a string that as already been deserialized when it lands at StringConverter.ToType() Removed portion of string escape character processing in JsonConvert as it was implemented in StringConverter (unicode escaped character processing still remains in JsonConvert) Added Tests for multiple escaped characters in a string and in an object containing a string property. Added test case for string converter test for serializing and deserializing strings with multiple escaped characters

Added test cases in StringConverterTests to further test serialization (StringConverter.ToJson) Added test cases in StringConverterTests to further test StringConverter.ToType and corrected the existing one (did not have removal of quotes which should happen when deserializing) Added test cases to JsonSerializerOptionsTests for multiple escaped characters and combinations such as escaped character being at the end of a string for both simple string objects and complex objects containing string properties. Corrected test CanDeserializeInvocationReceiveMessage_05 which had an expected value including quotes for a string that did not have quotes originally in it as "" becomes " and would have been removed during deserialization. In StringConverter I have made it so that it returns the string without quotes (as expected) when there are no escaped characters present as before if there were no escaped characters there would be an extra set of quotes left around the returned value. I also inverted the logic on the loop in order to ensure that the loop can exit without appending the letter portion of the escaped character if it is the last character in the string (such as a newline on the end would replace the escaped value as expected but would then attach the 'n' character to the end. I have replaced the deserialization of escaped characters to correctly replace them when deserializing an object with a string component. I also added the missing escaped characters that were added to the StringConverter in previous commit.

During de-serialization of strings that are part of complex objects there is a set of quotes removed that shouldn't be when escaped. This conditional block replaces those quotes.

Ellerbach · 2024-12-19T10:13:05Z

/azp run

azure-pipelines · 2024-12-19T10:13:25Z

Azure Pipelines successfully started running 1 pipeline(s).

Ellerbach

Before approving, can you add a test with a complex object, something that is already existing with a string that contains escaped chars so we are sure we cover properly that case as well?

For the rest, looks all goos! Thanks a lot for the efforts getting there.

mizady · 2024-12-19T13:56:18Z

The test labeled "Can_serialize_and_deserialize_object_containing_string_with_escaped_characters" located in the JsonSerializerOptionsTests has a data row for the case of escaped quotes at the beginning and end. Also in the same test class is "Can_serialize_and_deserialize_string_with_escaped_characters" which has all the same data rows for a string that is not part of a complex object. So unless I missed something (which of course is possible) then there should be tests handling all of these cases.

josesimoes · 2024-12-19T14:12:57Z

@mizady from my understanding of @Ellerbach request: you can grab Can_serialize_and_deserialize_complex_object, duplicate it and sprinkle a couple of escaped strings there.

Add 18 DataRows to the two tests for serialization and deserialization of simple strings and complex objects containing strings.

mizady · 2024-12-19T19:40:45Z

I've added 18 test cases to each of the two tests for simple string and complex object containing a string serialization and deserialization. this is now the list of tested combinations for both. Totaling 68 tests between the two.

[DataRow("a")] // Single character
[DataRow("1")] // Single numeric character
[DataRow("\t")] // Single Tab character
[DataRow("Testing / solidus")] // Forward slash in string
[DataRow("Testing solidus")] // Double space in string
[DataRow("Some string with " that needs escaping")] // String containing a quote
[DataRow("Quotes in a "string".")] // String with escaped quotes
[DataRow("Escaped last character \n")] // Newline as the last character
[DataRow("I:\Nano\rApp\app.pe")] // Backslash in string
[DataRow("Tab \t in a string \t")] // Tab character in multiple places
[DataRow("Newline \n in a string \n")] // Newline character in multiple places
[DataRow("LineFeed \f in a string \f")] // Line feed character in multiple places
[DataRow("CarriageReturn \r in a string \r")] // Carriage return character in multiple places
[DataRow("Backspace \b in a string \b")] // Backspace character in multiple places
[DataRow("TestString")] // Simple string with no special characters
[DataRow(""TestString"")] // String wrapped in quotes
[DataRow("\u0041")] // Unicode character (A)
[DataRow("\u2764")] // Unicode character (❤)
[DataRow("\x1B")] // Escape character (ASCII 27)
[DataRow("\x7F")] // Delete character (ASCII 127)
[DataRow("\0")] // Null character
[DataRow("")] // Empty string
[DataRow("Line 1\nLine 2\nLine 3")] // Multi-line string
[DataRow("Curly braces: { }")] // JSON-like curly braces
[DataRow("Square brackets: [ ]")] // JSON-like square brackets
[DataRow("Colon and comma: : ,")] // Colon and comma
[DataRow("Special symbols: @#$%^&*()_+~")] // Special symbols
[DataRow("English 中文 Español العربية हिंदी")] // Mixed language text
[DataRow("{"key": "value"}")] // JSON-like string
[DataRow(""[{"inner":"value"}]"")] // Serialized JSON-like string
[DataRow("{"name":"John","age":30}")] // Serialized JSON
[DataRow("Invalid escape: \q")] // Invalid escape sequence
[DataRow("https://example.com/api?query=escaped%20characters")] // URL
[DataRow("Unicode \u2764, Newline \n, Tab \t, Backslash \")] // Combination of cases

sonarqubecloud · 2024-12-19T19:46:15Z

Quality Gate passed

Issues
4 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

Ellerbach

Great, thanks a lot for the fix!

Ellerbach · 2024-12-20T10:29:57Z

I've added 18 test cases to each of the two tests for simple string and complex object containing a string serialization and deserialization

Amazing! Thanks a lot!

Ellerbach · 2024-12-20T10:30:20Z

@josesimoes we do have 2 of the actions not running, any idea why before I force the merge?

josesimoes · 2024-12-20T10:43:45Z

@josesimoes we do have 2 of the actions not running, any idea why before I force the merge?

They were waiting for a maintainer approval to run. (github is getting heavier and heavier on security 😉 )

nfbot · 2024-12-20T10:44:03Z

@mizady thank you again for your contribution! 🙏😄

.NET nanoFramework is all about community involvement, and no contribution is too small.
We would like to invite you to join the project's Contributors list.

Please edit it and add an entry with your GitHub username in the appropriate location (names are sorted alphabetically):

  <tr>
    <td><img src="https://github.com/mizady.png?size=50" height="50" width="50" ></td>
    <td><a href="https://github.com/mizady">Lance Cain</a></td>
  </tr>

(Feel free to adjust your name if it's not correct)

josesimoes

Well done! Thank you for this new improvement on this library.

mizady added 5 commits December 8, 2024 20:24

Added special character unit tests for StringConverter

52daab5

Added unit tests to verify correct handling of special characters

Added special character handling to StringConverter

1b80381

Added special character handling for FormFeed, Backspace, Backslash, and Tab

Merge branch 'nanoframework:main' into main

a4eb022

Removed null char and octal null tests

640ebc5

Merge branch 'main' of https://github.com/mizady/nanoFramework.Json

96921cc

nfbot added Type: Bug Something isn't working Type: Unit Tests labels Dec 9, 2024

mizady added 2 commits December 13, 2024 16:30

Merge branch 'main' into main

d508707

Merge branch 'main' into main

13a4cb3

josesimoes reviewed Dec 16, 2024

View reviewed changes

nanoFramework.Json/JsonConvert.cs Outdated Show resolved Hide resolved

mizady added 4 commits December 17, 2024 16:28

Added special character unit tests for StringConverter

ee1e8e1

Added unit tests to verify correct handling of special characters

Added special character handling to StringConverter

c6c45ec

Added special character handling for FormFeed, Backspace, Backslash, and Tab

Removed null char and octal null tests

aabcafe

josesimoes force-pushed the main branch from 13a4cb3 to 482d788 Compare December 17, 2024 16:28

mizady added 3 commits December 17, 2024 14:37

Merge branch 'main' of https://github.com/mizady/nanoFramework.Json

6944c3c

Added condition to replace quotes

537d1ee

During de-serialization of strings that are part of complex objects there is a set of quotes removed that shouldn't be when escaped. This conditional block replaces those quotes.

Ellerbach reviewed Dec 19, 2024

View reviewed changes

Merge branch 'main' into main

84edef9

Added further test combinations for serialization and deserialization

fd999fc

Add 18 DataRows to the two tests for serialization and deserialization of simple strings and complex objects containing strings.

Ellerbach approved these changes Dec 20, 2024

View reviewed changes

josesimoes enabled auto-merge (squash) December 20, 2024 10:42

josesimoes merged commit eb4c259 into nanoframework:main Dec 20, 2024
7 checks passed

josesimoes approved these changes Dec 20, 2024

View reviewed changes

josesimoes changed the title ~~Fix and Enhance StringConverter to handle more special characters~~ Fix and enhance StringConverter() to handle more special characters Dec 20, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix and enhance `StringConverter()` to handle more special characters #361

Fix and enhance `StringConverter()` to handle more special characters #361

mizady commented Dec 9, 2024 •

edited by josesimoes

Loading

coderabbitai bot commented Dec 9, 2024 •

edited

Loading

Review skipped

Chat

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

CodeRabbit Configuration File (`.coderabbit.yaml`)

Documentation and Community

Ellerbach commented Dec 9, 2024

Ellerbach commented Dec 9, 2024

mizady commented Dec 15, 2024

Ellerbach commented Dec 16, 2024

mizady commented Dec 16, 2024

Ellerbach commented Dec 19, 2024

azure-pipelines bot commented Dec 19, 2024

Ellerbach left a comment

mizady commented Dec 19, 2024

josesimoes commented Dec 19, 2024

mizady commented Dec 19, 2024

sonarqubecloud bot commented Dec 19, 2024

Ellerbach left a comment

Ellerbach commented Dec 20, 2024

Ellerbach commented Dec 20, 2024

josesimoes commented Dec 20, 2024

nfbot commented Dec 20, 2024

josesimoes left a comment

Fix and enhance StringConverter() to handle more special characters #361

Fix and enhance StringConverter() to handle more special characters #361

Conversation

mizady commented Dec 9, 2024 • edited by josesimoes Loading

Description

Motivation and Context

How Has This Been Tested?

Types of changes

Checklist:

coderabbitai bot commented Dec 9, 2024 • edited Loading

Review skipped

Chat

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

CodeRabbit Configuration File (.coderabbit.yaml)

Documentation and Community

Ellerbach commented Dec 9, 2024

Ellerbach commented Dec 9, 2024

mizady commented Dec 15, 2024

Ellerbach commented Dec 16, 2024

mizady commented Dec 16, 2024

Ellerbach commented Dec 19, 2024

azure-pipelines bot commented Dec 19, 2024

Ellerbach left a comment

Choose a reason for hiding this comment

mizady commented Dec 19, 2024

josesimoes commented Dec 19, 2024

mizady commented Dec 19, 2024

sonarqubecloud bot commented Dec 19, 2024

Quality Gate passed

Ellerbach left a comment

Choose a reason for hiding this comment

Ellerbach commented Dec 20, 2024

Ellerbach commented Dec 20, 2024

josesimoes commented Dec 20, 2024

nfbot commented Dec 20, 2024

josesimoes left a comment

Choose a reason for hiding this comment

Fix and enhance `StringConverter()` to handle more special characters #361

Fix and enhance `StringConverter()` to handle more special characters #361

mizady commented Dec 9, 2024 •

edited by josesimoes

Loading

coderabbitai bot commented Dec 9, 2024 •

edited

Loading

CodeRabbit Configuration File (`.coderabbit.yaml`)