[XML] properly handle normalizedString & token #451

Open
jkowalleck opened this issue Jun 24, 2024 · 2 comments
Labels
bug Something isn't working good first issue Good for newcomers help wanted Extra attention is needed

Comments

@jkowalleck
Member

jkowalleck commented Jun 24, 2024

CycloneDX uses http://www.w3.org/2001/XMLSchema - which defines normalizedString as follows:

<xs:simpleType name="normalizedString" id="normalizedString">
  <xs:annotation>
    <xs:documentation source="http://www.w3.org/TR/xmlschema-2/#normalizedString"/>
  </xs:annotation>
  <xs:restriction base="xs:string">
    <xs:whiteSpace value="replace" id="normalizedString.whiteSpace"/>
  </xs:restriction>
</xs:simpleType>

normalizedString represents white space normalized strings. The ·value space· of normalizedString is the set of strings that do not contain the carriage return (#xD), line feed (#xA) nor tab (#x9) characters. The ·lexical space· of normalizedString is the set of strings that do not contain the carriage return (#xD), line feed (#xA) nor tab (#x9) characters. The ·base type· of normalizedString is string.


CycloneDX uses http://www.w3.org/2001/XMLSchema - which defines token as follows:

<xs:simpleType name="token" id="token">
  <xs:annotation>
    <xs:documentation source="http://www.w3.org/TR/xmlschema-2/#token"/>
  </xs:annotation>
  <xs:restriction base="xs:normalizedString">
    <xs:whiteSpace value="collapse" id="token.whiteSpace"/>
  </xs:restriction>
</xs:simpleType>

token represents tokenized strings. The ·value space· of token is the set of strings that do not contain the carriage return (#xD), line feed (#xA) nor tab (#x9) characters, that have no leading or trailing spaces (#x20) and that have no internal sequences of two or more spaces. The ·lexical space· of token is the set of strings that do not contain the carriage return (#xD), line feed (#xA) nor tab (#x9) characters, that have no leading or trailing spaces (#x20) and that have no internal sequences of two or more spaces. The ·base type· of token is normalizedString.


Therefore, on XML normalization of a normalizedString, each of the following characters must be replaced by a space (#x20):

  • carriage return: \r (#xD)
  • line feed: \n (#xA)
  • tab: \t (#x9)
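
The replacement rule above can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name `normalized_string` is hypothetical and is not taken from any CycloneDX library. Note that each character is replaced one-for-one, so a `\r\n` pair becomes two spaces.

```python
# Hypothetical helper: XSD whiteSpace="replace" as used by xs:normalizedString.
# Each CR, LF and TAB is mapped to exactly one space; nothing else is touched.
_WS_REPLACE = str.maketrans({"\r": " ", "\n": " ", "\t": " "})


def normalized_string(value: str) -> str:
    """Return value with #xD, #xA and #x9 each replaced by #x20."""
    return value.translate(_WS_REPLACE)


if __name__ == "__main__":
    print(repr(normalized_string("foo\r\nbar\tbaz")))
```

Leading, trailing and repeated spaces are deliberately kept, since collapsing is only part of the token rules.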

Therefore, on XML normalization of a token, the following must apply:

  • all of the above
  • consecutive spaces are collapsed to one space
  • leading and trailing spaces are removed
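
A minimal Python sketch of the token rules, again under assumptions: the function name `token` is hypothetical. It first applies the replace step and then collapses. Only the space character (#x20) is collapsed and stripped, so other Unicode whitespace such as a non-breaking space is left alone.

```python
import re

# Hypothetical helper: XSD whiteSpace="collapse" as used by xs:token.
_REPLACE = re.compile(r"[\t\n\r]")   # step 1: same replacement as normalizedString
_COLLAPSE = re.compile(r" {2,}")     # step 2: runs of #x20 only


def token(value: str) -> str:
    """Replace CR/LF/TAB by space, collapse runs of spaces, strip both ends."""
    replaced = _REPLACE.sub(" ", value)
    collapsed = _COLLAPSE.sub(" ", replaced)
    return collapsed.strip(" ")


if __name__ == "__main__":
    print(repr(token("  foo \r\n bar\t\tbaz  ")))
```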

Only fields that are defined as normalizedString or token in the XML schema are affected!
Other fields MUST NOT be affected!

@jkowalleck jkowalleck transferred this issue from CycloneDX/cyclonedx-php-composer Jun 24, 2024
@jkowalleck jkowalleck added the bug Something isn't working label Jun 24, 2024
@jkowalleck
Member Author

jkowalleck commented Jun 24, 2024

possible solution:
modify the XML normalizers where needed, and call a to-be-written helper function that does the normalization.
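
One way such a helper could be wired into an XML normalizer is sketched below in Python. Everything here is an assumption for illustration: `xsd_normalize`, `make_text_element` and the tag names are hypothetical and do not reflect the actual normalizer classes of any CycloneDX library. The point is that the caller passes the type declared in the schema, and values of any other type pass through unchanged.

```python
import re
import xml.etree.ElementTree as ET


def xsd_normalize(value: str, xsd_type: str) -> str:
    """Hypothetical helper: normalize only for normalizedString and token."""
    if xsd_type not in ("normalizedString", "token"):
        return value  # other field types MUST NOT be affected
    value = re.sub(r"[\t\n\r]", " ", value)
    if xsd_type == "token":
        value = re.sub(r" {2,}", " ", value).strip(" ")
    return value


def make_text_element(tag: str, value: str, xsd_type: str = "string") -> ET.Element:
    """Hypothetical normalizer step: build an element with normalized text."""
    element = ET.Element(tag)
    element.text = xsd_normalize(value, xsd_type)
    return element


if __name__ == "__main__":
    # Tag names and their types are made up for this example.
    print(ET.tostring(make_text_element("name", "foo\tbar", "normalizedString")))
    print(ET.tostring(make_text_element("description", "foo\tbar")))
```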

@jkowalleck jkowalleck added good first issue Good for newcomers help wanted Extra attention is needed labels Jun 24, 2024
@jkowalleck jkowalleck changed the title from "[XML] properly handle normalizedString" to "[XML] properly handle normalizedString & token" Jun 24, 2024
@jkowalleck
Member Author

solution as done in TS/JS: CycloneDX/cyclonedx-javascript-library#1116
