Skip to content

fix(QTDI-679): schema collision with Cyrillic names #1051

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

ozhelezniak-talend
Copy link
Contributor

Requirements

  • Any code change adding any logic MUST be tested through a unit test executed with the default build
  • Any API addition MUST be done with a documentation update if relevant

Why this PR is needed?

What does this PR adds (design/code thoughts)?

Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR addresses schema collisions by replacing direct calls to Schema.sanitizeConnectionName and Schema.avoidCollision with their counterparts in SchemaCompanionUtil, streamlining the sanitization and collision resolution process. Key changes include:

  • Updating imports and method calls across production and test code to use SchemaCompanionUtil.
  • Refactoring test helper methods (e.g., making newEntry/newSchema static) for consistency.
  • Removing deprecated implementations in favor of the new utility methods.

Reviewed Changes

Copilot reviewed 18 out of 19 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
component-runtime-di/src/main/java/org/talend/sdk/component/runtime/di/record/DiRowStructVisitor.java Updated sanitize method import.
component-runtime-impl/src/test/java/org/talend/sdk/component/runtime/record/SchemaImplTest.java Changed static reference in collision tests.
component-runtime-impl/src/test/java/org/talend/sdk/component/runtime/record/SchemaCompanionUtilTest.java Introduced new tests for SchemaCompanionUtil.
component-runtime-impl/src/test/java/org/talend/sdk/component/runtime/record/RecordBuilderImplTest.java Adjusted collision test for record building.
component-runtime-impl/src/test/java/org/talend/sdk/component/runtime/record/RecordBuilderFactoryImplTest.java Added tests for sanitized names in presence of Cyrillic characters.
component-runtime-impl/src/main/java/org/talend/sdk/component/runtime/record/SchemaImpl.java Updated collision resolution and sanitization calls.
component-runtime-impl/src/main/java/org/talend/sdk/component/runtime/record/RecordImpl.java Modified collision handling by adding an overwriteEntry helper.
component-runtime-beam/* Updated multiple Beam-related classes to use SchemaCompanionUtil.
component-api/* Removed outdated tests and deprecated implementations in favor of SchemaCompanionUtil.
component-api/src/main/java/org/talend/sdk/component/api/record/SchemaCompanionUtil.java Newly added file providing robust sanitization and collision functions.
Files not reviewed (1)
  • pom.xml: Language not supported
Comments suppressed due to low confidence (1)

component-api/src/main/java/org/talend/sdk/component/api/record/Schema.java:491

  • [nitpick] Since the method is now deprecated and simply delegates to SchemaCompanionUtil, consider updating the deprecation message to clearly indicate the preferred alternative and usage.
static String sanitizeConnectionName(final String name) {

@ozhelezniak-talend ozhelezniak-talend force-pushed the ozhelezniak/QTDI-679_schema_sanitize_collision branch from 6f7cdab to 8197b19 Compare May 29, 2025 07:19

This comment has been minimized.

This comment has been minimized.

1 similar comment

This comment has been minimized.

* fix when the names of the same length the values were mixed
* fix when non-raw name is first that will be collided with others
* fix order property
* fix schemaimpl
* refactor sanitization logic
* add tests
@ozhelezniak-talend ozhelezniak-talend force-pushed the ozhelezniak/QTDI-679_schema_sanitize_collision branch from ca29e4c to 21c85f4 Compare May 29, 2025 15:14
@ozhelezniak-talend ozhelezniak-talend changed the title fix(QTDI-679): schema collision fix(QTDI-679): schema collision with Cyrillic names May 29, 2025
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Refactor sanitization and collision handling by introducing SchemaCompanionUtil, updating implementation and tests to use the new utility, and adding coverage for Cyrillic name collisions.

  • Introduce SchemaCompanionUtil for centralizing sanitizeConnectionName and avoidCollision logic.
  • Replace all deprecated Schema.sanitizeConnectionName and Schema.avoidCollision calls with the new utility across runtime and Beam modules.
  • Add and update unit tests to cover collision scenarios, including equal-length Cyrillic names.

Reviewed Changes

Copilot reviewed 34 out of 34 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
component-runtime-impl/src/test/java/org/talend/sdk/component/runtime/record/SchemaImplTest.java Refactor helpers to static methods and add antiCollisionRealNameFirst test case
component-runtime-impl/src/test/java/org/talend/sdk/component/runtime/record/SchemaCompanionUtilTest.java New tests for SchemaCompanionUtil, including Cyrillic collision scenarios
component-runtime-impl/src/test/java/org/talend/sdk/component/runtime/record/RecordBuilderImplTest.java Update tests to use SchemaCompanionUtil.sanitizeConnectionName and rename collision tests
component-runtime-impl/src/test/java/org/talend/sdk/component/runtime/record/RecordBuilderFactoryImplTest.java Add sanitizeCyrillicEqualLength test for RecordBuilderFactory
component-runtime-impl/src/main/java/org/talend/sdk/component/runtime/record/SchemaImpl.java Replace deprecated sanitization/collision calls with SchemaCompanionUtil and adjust replacement logic
component-runtime-impl/src/main/java/org/talend/sdk/component/runtime/record/RecordImpl.java Use SchemaCompanionUtil.avoidCollision and add replaceEntryAndItsValue helper
component-runtime-beam/src/test/java/org/talend/sdk/component/runtime/beam/spi/record/AvroRecordTest.java Update collision tests to use SchemaCompanionUtil and adjust assertions
component-runtime-beam/src/main/java/org/talend/sdk/component/runtime/beam/transform/ViewsMappingTransform.java Switch static import to SchemaCompanionUtil.sanitizeConnectionName
component-runtime-beam/src/main/java/org/talend/sdk/component/runtime/beam/transform/RecordBranchUnwrapper.java Switch static import to SchemaCompanionUtil.sanitizeConnectionName
component-runtime-beam/src/main/java/org/talend/sdk/component/runtime/beam/transform/RecordBranchMapper.java Switch static import to SchemaCompanionUtil.sanitizeConnectionName
component-runtime-beam/src/main/java/org/talend/sdk/component/runtime/beam/transform/RecordBranchFilter.java Switch static import to SchemaCompanionUtil.sanitizeConnectionName
component-runtime-beam/src/main/java/org/talend/sdk/component/runtime/beam/spi/record/AvroSchemaBuilder.java Replace collision logic with SchemaCompanionUtil
component-runtime-beam/src/main/java/org/talend/sdk/component/runtime/beam/spi/record/AvroRecord.java Update static import to SchemaCompanionUtil.sanitizeConnectionName
component-runtime-beam/src/main/java/org/talend/sdk/component/runtime/beam/BaseProcessorFn.java Switch static import to SchemaCompanionUtil.sanitizeConnectionName
component-api/src/test/java/org/talend/sdk/component/api/test/MockEntry.java Add full-featured MockEntry builder for companion tests
component-api/src/test/java/org/talend/sdk/component/api/service/record/RecordBuilderFactoryTest.java Simplify imports and replace deprecated builders with MockEntry
component-api/src/test/java/org/talend/sdk/component/api/record/SchemaTest.java Remove deprecated sanitization tests
component-api/src/test/java/org/talend/sdk/component/api/record/SchemaCompanionUtilTest.java New parameterized tests covering sanitization and collision edge cases
component-api/src/main/java/org/talend/sdk/component/api/record/SchemaCompanionUtil.java New utility class for sanitization and collision avoidance
component-api/src/main/java/org/talend/sdk/component/api/record/Schema.java Deprecate old methods and forward to SchemaCompanionUtil
Comments suppressed due to low confidence (2)

component-runtime-impl/src/test/java/org/talend/sdk/component/runtime/record/RecordBuilderImplTest.java:852

  • [nitpick] The test method name abilityToOverwrite is inconsistent with the other test naming conventions; consider renaming it to testAbilityToOverwrite for clarity.
void abilityToOverwrite() {

component-api/src/main/java/org/talend/sdk/component/api/record/Schema.java:491

  • Consider adding a @since tag to this deprecated method to guide consumers to SchemaCompanionUtil.
@Deprecated
    static String sanitizeConnectionName(final String name) {

This comment has been minimized.

This comment has been minimized.

@ozhelezniak-talend ozhelezniak-talend requested a review from undx June 2, 2025 09:54
Copy link

sonar-eks bot commented Jun 3, 2025

Passed

Analysis Details

1 Issue

  • Bug 0 Bugs
  • Vulnerability 0 Vulnerabilities
  • Code Smell 1 Code Smell

Coverage and Duplications

  • Coverage 96.30% Coverage (56.50% Estimated after merge)
  • Duplications 0.00% Duplicated Code (1.30% Estimated after merge)

Project ID: org.talend.sdk.component:component-runtime

View in SonarQube

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants