Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix and enhance StringConverter() to handle more special characters #361

Merged
merged 17 commits into from
Dec 20, 2024

Conversation

mizady
Copy link
Contributor

@mizady mizady commented Dec 9, 2024

Description

  • Fix and Enhance StringConverter to handle more special characters.
  • Added test cases for StringConverter to validate special character handling during serialization and deserialization.

Motivation and Context

How Has This Been Tested?

  • Added unit tests that verified the backslash, tab, backspace, and FeedForm special characters were not being handled correctly as escaped characters by the StringConverter. All current tests in the nanoFramework.Json.Test project including the added tests are now passing.

Types of changes

  • Improvement (non-breaking change that improves a feature, code or algorithm)
  • Bug fix (non-breaking change which fixes an issue with code or algorithm)
  • New feature (non-breaking change which adds functionality to code)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Config and build (change in the configuration and build system, has no impact on code or features)
  • Dependencies (update dependencies and changes associated, has no impact on code or features)
  • Unit Tests (add new Unit Test(s) or improved existing one(s), has no impact on code or features)
  • Documentation (changes or updates in the documentation, has no impact on code or features)

Checklist:

  • My code follows the code style of this project (only if there are changes in source code).
  • My changes require an update to the documentation (there are changes that require the docs website to be updated).
  • I have updated the documentation accordingly (the changes require an update on the docs in this repo).
  • I have read the CONTRIBUTING document.
  • I have tested everything locally and all new and existing tests passed (only if there are changes in source code).
  • I have added new tests to cover my changes.

@nfbot nfbot added Type: Bug Something isn't working Type: Unit Tests labels Dec 9, 2024
Copy link

coderabbitai bot commented Dec 9, 2024

Important

Review skipped

Review was skipped due to path filters

⛔ Files ignored due to path filters (4)
  • nanoFramework.Json.Test/Converters/StringConverterTests.cs is excluded by none and included by none
  • nanoFramework.Json.Test/JsonSerializerOptionsTests.cs is excluded by none and included by none
  • nanoFramework.Json/Converters/StringConverter.cs is excluded by none and included by none
  • nanoFramework.Json/JsonConvert.cs is excluded by none and included by none

CodeRabbit blocks several paths by default. You can override this behavior by explicitly including those paths in the path filters. For example, including **/dist/** will override the default block on the dist directory, by removing the pattern from both the lists.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR. (Beta)
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@Ellerbach
Copy link
Member

So, this is not properly working as explained in Discord. Here is the investigation: https://github.com/nanoframework/nanoFramework.Json/tree/pr361

And here are the elements:

  • This will fail: [DataRow("I:\nano\rpath\to", "I:\nano\rpath\to")] // More complex with combination of special characters and possible new line
  • The string converter with the current logic only works for a json that is a single string
  • In case the string is part of an existing obejct, the current logic will break when it's a string wontaining potential characters like in the previous example. Also, it will fail in case of a string of length 1

To solve this problem:

  • As in the branch pr361 pointed above, do not use the escaped logic for objects, just comment out the code like in that branch
  • For single string, then, the logic should be used

@Ellerbach
Copy link
Member

And the case already exists:

So, in short, it's about using the function I've commented for that case and calling it in the if. Otherwise, we should not use it.

Added a bypass to return a string that as already been deserialized when it lands at StringConverter.ToType()

Removed portion of string escape character processing in JsonConvert as it was implemented in StringConverter (unicode escaped character processing still remains in JsonConvert)

Added Tests for multiple escaped characters in a string and in an object containing a string property.

Added test case for string converter test for serializing and deserializing strings with multiple escaped characters
@mizady
Copy link
Contributor Author

mizady commented Dec 15, 2024

I noticed that the discord thread was not actually linked here. Discord thread

RFC8259 - https://datatracker.ietf.org/doc/html/rfc8259
ECMA-404 - https://ecma-international.org/wp-content/uploads/ECMA-404_2nd_edition_december_2017.pdf

Seems like we are handling all the escaped characters correctly now from what I can tell based on testing.

@Ellerbach
Copy link
Member

Are the \uXXXXstill properly handled with the new implementation?

@mizady
Copy link
Contributor Author

mizady commented Dec 16, 2024

Are the \uXXXXstill properly handled with the new implementation?

The original unicode processing part of he code is still used for unicode characters. The existing tests that test unicode properly serializing and deserializing have all passed.

Added unit tests to verify correct handling of special characters
Added special character handling for FormFeed, Backspace, Backslash, and Tab
Added a bypass to return a string that as already been deserialized when it lands at StringConverter.ToType()

Removed portion of string escape character processing in JsonConvert as it was implemented in StringConverter (unicode escaped character processing still remains in JsonConvert)

Added Tests for multiple escaped characters in a string and in an object containing a string property.

Added test case for string converter test for serializing and deserializing strings with multiple escaped characters
Added test cases in StringConverterTests to further test serialization (StringConverter.ToJson)

Added test cases in StringConverterTests to further test StringConverter.ToType and corrected the existing one (did not have removal of quotes which should happen when deserializing)

Added test cases to JsonSerializerOptionsTests for multiple escaped characters and combinations such as escaped character being at the end of a string for both simple string objects and complex objects containing string properties.

Corrected test CanDeserializeInvocationReceiveMessage_05 which had an expected value including quotes for a string that did not have quotes originally in it as "" becomes " and would have been removed during deserialization.

In StringConverter I have made it so that it returns the string without quotes (as expected) when there are no escaped characters present as before if there were no escaped characters there would be an extra set of quotes left around the returned value.  I also inverted the logic on the loop in order to ensure that the loop can exit without appending the letter portion of the escaped character if it is the last character in the string (such as a newline on the end would replace the escaped value as expected but would then attach the 'n' character to the end.

I have replaced the deserialization of escaped characters to correctly replace them when deserializing an object with a string component.  I also added the missing escaped characters that were added to the StringConverter in previous commit.
During de-serialization of strings that are part of complex objects there is a set of quotes removed that shouldn't be when escaped. This conditional block replaces those quotes.
@Ellerbach
Copy link
Member

/azp run

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

Copy link
Member

@Ellerbach Ellerbach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before approving, can you add a test with a complex object, something that is already existing with a string that contains escaped chars so we are sure we cover properly that case as well?

For the rest, looks all goos! Thanks a lot for the efforts getting there.

@mizady
Copy link
Contributor Author

mizady commented Dec 19, 2024

The test labeled "Can_serialize_and_deserialize_object_containing_string_with_escaped_characters" located in the JsonSerializerOptionsTests has a data row for the case of escaped quotes at the beginning and end. Also in the same test class is "Can_serialize_and_deserialize_string_with_escaped_characters" which has all the same data rows for a string that is not part of a complex object. So unless I missed something (which of course is possible) then there should be tests handling all of these cases.

@josesimoes
Copy link
Member

@mizady from my understanding of @Ellerbach request: you can grab Can_serialize_and_deserialize_complex_object, duplicate it and sprinkle a couple of escaped strings there.

Add 18 DataRows to the two tests for serialization and deserialization of simple strings and complex objects containing strings.
@mizady
Copy link
Contributor Author

mizady commented Dec 19, 2024

I've added 18 test cases to each of the two tests for simple string and complex object containing a string serialization and deserialization. this is now the list of tested combinations for both. Totaling 68 tests between the two.

[DataRow("a")] // Single character
[DataRow("1")] // Single numeric character
[DataRow("\t")] // Single Tab character
[DataRow("Testing / solidus")] // Forward slash in string
[DataRow("Testing solidus")] // Double space in string
[DataRow("Some string with " that needs escaping")] // String containing a quote
[DataRow("Quotes in a "string".")] // String with escaped quotes
[DataRow("Escaped last character \n")] // Newline as the last character
[DataRow("I:\Nano\rApp\app.pe")] // Backslash in string
[DataRow("Tab \t in a string \t")] // Tab character in multiple places
[DataRow("Newline \n in a string \n")] // Newline character in multiple places
[DataRow("LineFeed \f in a string \f")] // Line feed character in multiple places
[DataRow("CarriageReturn \r in a string \r")] // Carriage return character in multiple places
[DataRow("Backspace \b in a string \b")] // Backspace character in multiple places
[DataRow("TestString")] // Simple string with no special characters
[DataRow(""TestString"")] // String wrapped in quotes
[DataRow("\u0041")] // Unicode character (A)
[DataRow("\u2764")] // Unicode character (❤)
[DataRow("\x1B")] // Escape character (ASCII 27)
[DataRow("\x7F")] // Delete character (ASCII 127)
[DataRow("\0")] // Null character
[DataRow("")] // Empty string
[DataRow("Line 1\nLine 2\nLine 3")] // Multi-line string
[DataRow("Curly braces: { }")] // JSON-like curly braces
[DataRow("Square brackets: [ ]")] // JSON-like square brackets
[DataRow("Colon and comma: : ,")] // Colon and comma
[DataRow("Special symbols: @#$%^&*()_+~")] // Special symbols
[DataRow("English 中文 Español العربية हिंदी")] // Mixed language text
[DataRow("{"key": "value"}")] // JSON-like string
[DataRow(""[{"inner":"value"}]"")] // Serialized JSON-like string
[DataRow("{"name":"John","age":30}")] // Serialized JSON
[DataRow("Invalid escape: \q")] // Invalid escape sequence
[DataRow("https://example.com/api?query=escaped%20characters")] // URL
[DataRow("Unicode \u2764, Newline \n, Tab \t, Backslash \")] // Combination of cases

Copy link
Member

@Ellerbach Ellerbach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, thanks a lot for the fix!

@Ellerbach
Copy link
Member

I've added 18 test cases to each of the two tests for simple string and complex object containing a string serialization and deserialization

Amazing! Thanks a lot!

@Ellerbach
Copy link
Member

@josesimoes we do have 2 of the actions not running, any idea why before I force the merge?

@josesimoes josesimoes enabled auto-merge (squash) December 20, 2024 10:42
@josesimoes
Copy link
Member

@josesimoes we do have 2 of the actions not running, any idea why before I force the merge?

They were waiting for a maintainer approval to run. (github is getting heavier and heavier on security 😉 )

@josesimoes josesimoes merged commit eb4c259 into nanoframework:main Dec 20, 2024
7 checks passed
@nfbot
Copy link
Member

nfbot commented Dec 20, 2024

@mizady thank you again for your contribution! 🙏😄

.NET nanoFramework is all about community involvement, and no contribution is too small.
We would like to invite you to join the project's Contributors list.

Please edit it and add an entry with your GitHub username in the appropriate location (names are sorted alphabetically):

  <tr>
    <td><img src="https://github.com/mizady.png?size=50" height="50" width="50" ></td>
    <td><a href="https://github.com/mizady">Lance Cain</a></td>
  </tr>

(Feel free to adjust your name if it's not correct)

Copy link
Member

@josesimoes josesimoes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well done! Thank you for this new improvement on this library.

@josesimoes josesimoes changed the title Fix and Enhance StringConverter to handle more special characters Fix and enhance StringConverter() to handle more special characters Dec 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Type: Bug Something isn't working Type: Unit Tests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

nanoFramework.Json Serialize/Deserialize does not handle \ correctly
4 participants