[Good first issue | Feature] Synthesize specific types of IDs #120

Open
Z712023 opened this issue Jan 19, 2024 · 1 comment
Labels: difficulty-hard, enhancement (New feature or request)

Comments

Z712023 (Collaborator) commented Jan 19, 2024

Problem

Tabular data often contains special ID fields, for example a fixed string "AXBSAX" followed by a variable string X. The fixed string carries a static physical meaning, while X is an incrementing counter such as "0001", "0002", and so on.
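The ID structure described above can be illustrated with a minimal Python sketch. It assumes the simplest form of the pattern, an alphabetic prefix followed by a zero-padded decimal counter; the names `ID_PATTERN`, `ParsedID`, `parse_id` and `format_id` are hypothetical and not part of any existing API.

```python
import re
from typing import NamedTuple

# Hypothetical pattern: an alphabetic prefix followed by a decimal counter.
ID_PATTERN = re.compile(r"(?P<prefix>[A-Za-z]+)(?P<counter>[0-9]+)")


class ParsedID(NamedTuple):
    prefix: str   # static part with a fixed meaning, e.g. "AXBSAX"
    counter: int  # incrementing part as an integer, e.g. 1
    width: int    # zero-padded width of the counter, e.g. 4


def parse_id(value: str) -> ParsedID:
    """Split an ID such as "AXBSAX0001" into its fixed prefix and its counter."""
    match = ID_PATTERN.fullmatch(value)
    if match is None:
        raise ValueError(f"ID {value!r} does not match the prefix + counter format")
    digits = match.group("counter")
    return ParsedID(match.group("prefix"), int(digits), len(digits))


def format_id(parsed: ParsedID) -> str:
    """Reassemble an ID, restoring the counter's zero padding."""
    return f"{parsed.prefix}{parsed.counter:0{parsed.width}d}"


# Usage: the prefix is kept as-is, only the counter changes.
first = parse_id("AXBSAX0001")  # ParsedID(prefix='AXBSAX', counter=1, width=4)
next_id = format_id(first._replace(counter=first.counter + 1))  # "AXBSAX0002"
```

Keeping the counter width alongside the integer value is what lets `format_id` round-trip "0001" rather than emitting "1".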

Proposed Solution

Use regular expressions to analyze the ID format, then synthesize its meaningful segments separately while preserving the static meaning of the original ID field.
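The regular-expression analysis proposed above could look roughly like the following Python sketch. It is one possible interpretation, not the project's design: each ID is split into runs of letters or digits, segments that are identical across the whole column are treated as the static part, and varying fixed-width digit segments are treated as counters to synthesize. `tokenize`, `infer_template` and `synthesize` are hypothetical names.

```python
import re

# Maximal runs of ASCII letters or digits, e.g. "AXBSAX0001" -> ["AXBSAX", "0001"].
TOKEN = re.compile(r"[A-Za-z]+|[0-9]+")


def tokenize(value):
    tokens = TOKEN.findall(value)
    if "".join(tokens) != value:
        raise ValueError(f"ID {value!r} contains characters other than letters and digits")
    return tokens


def infer_template(ids):
    """Describe each segment as ("fixed", literal) or ("counter", width)."""
    rows = [tokenize(v) for v in ids]
    if not rows or len({len(r) for r in rows}) != 1:
        raise ValueError("IDs must be non-empty and share the same number of segments")
    template = []
    for column in zip(*rows):
        if len(set(column)) == 1:
            # Identical in every row: keep the static meaning verbatim.
            template.append(("fixed", column[0]))
        elif all(t.isdigit() for t in column) and len({len(t) for t in column}) == 1:
            # Varying zero-padded digits of one width: treat as an incrementing counter.
            template.append(("counter", len(column[0])))
        else:
            raise ValueError(f"segment {column!r} is neither constant nor a fixed-width counter")
    return template


def synthesize(template, n, start=1):
    """Emit n IDs; every counter segment receives the same running index (a simplification)."""
    last = start + n - 1
    for kind, value in template:
        if kind == "counter" and last >= 10 ** value:
            raise ValueError(f"a {value}-digit counter cannot reach {last}")
    return [
        "".join(value if kind == "fixed" else f"{i:0{value}d}" for kind, value in template)
        for i in range(start, start + n)
    ]


template = infer_template(["AXBSAX0001", "AXBSAX0002", "AXBSAX0017"])
# template == [("fixed", "AXBSAX"), ("counter", 4)]
new_ids = synthesize(template, 3)
# new_ids == ["AXBSAX0001", "AXBSAX0002", "AXBSAX0003"]
```

Note that a column with a single distinct ID yields only fixed segments, so this sketch would simply repeat that ID; real data would need a fallback for that case.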

We can distinguish two cases and handle them separately:

  1. The field has no semantic meaning:
    (1) Determine the number of unique values.
    (2) Generate values with Faker (note: Faker-generated values may not preserve the semantic meaning of the original ID field).
  2. The field is associated with other attributes and is worth simulating, so the original field's semantics must be preserved:
    (1) If the ID field carries additional information, extract it into a new column.
    (2) Use the data fitted by the model to guide Faker's generation.
    (3) Exclude in post-processing.
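The two cases above could be sketched in Python as follows, under several assumptions: rows are plain dicts, the ID column is named `id`, the standard-library `random` module stands in for Faker in case 1, and step (3) is read as dropping the helper columns once the ID has been rebuilt. Step (2), fitting a model on the split columns, is out of scope here; its synthetic rows are assumed to come back with the helper columns filled in. All function and column names are hypothetical.

```python
import random
import re
import string

ID_PARTS = re.compile(r"(?P<prefix>[A-Za-z]+)(?P<seq>[0-9]+)")
HELPER_COLUMNS = ("id_prefix", "id_seq", "id_seq_width")  # hypothetical helper column names


def synthesize_opaque_ids(original, n_rows, seed=None):
    """Case 1: keep only the share of unique values and the ID length (stdlib stand-in for Faker)."""
    if not original or n_rows < 1:
        raise ValueError("need at least one original ID and one row to generate")
    n_unique = max(1, round(len(set(original)) / len(original) * n_rows))
    length = max(len(v) for v in original)
    alphabet = string.ascii_uppercase + string.digits
    if n_unique > len(alphabet) ** length:
        raise ValueError("ID length too short for the required number of unique values")
    rng = random.Random(seed)
    pool = set()
    while len(pool) < n_unique:
        pool.add("".join(rng.choices(alphabet, k=length)))
    pool = sorted(pool)
    # Each unique value appears at least once; the remaining rows reuse values at random.
    values = pool + rng.choices(pool, k=n_rows - n_unique)
    rng.shuffle(values)
    return values


def split_id_column(rows, column="id"):
    """Case 2, step (1): move the information carried by the ID into helper columns."""
    out = []
    for row in rows:
        match = ID_PARTS.fullmatch(row[column])
        if match is None:
            raise ValueError(f"unrecognised ID format: {row[column]!r}")
        new_row = {k: v for k, v in row.items() if k != column}
        new_row.update(
            id_prefix=match["prefix"],
            id_seq=int(match["seq"]),
            id_seq_width=len(match["seq"]),
        )
        out.append(new_row)
    return out


def rebuild_id_column(rows, column="id"):
    """Case 2, step (3) as assumed here: rebuild the ID from the helper columns, then drop them."""
    out = []
    for row in rows:
        new_row = {k: v for k, v in row.items() if k not in HELPER_COLUMNS}
        width = int(row["id_seq_width"])
        new_row[column] = f"{row['id_prefix']}{int(row['id_seq']):0{width}d}"
        out.append(new_row)
    return out


opaque = synthesize_opaque_ids(["A1", "A1", "B2", "C3"], n_rows=8, seed=0)
# 8 values, 6 of them distinct (the original 3/4 unique share), each 2 characters long

rows = [{"id": "AXBSAX0001", "amount": 5}, {"id": "AXBSAX0002", "amount": 7}]
split = split_id_column(rows)
# split[0] == {"amount": 5, "id_prefix": "AXBSAX", "id_seq": 1, "id_seq_width": 4}
restored = rebuild_id_column(split)
# restored == rows
```

Keeping the prefix and counter as ordinary columns in case 2 lets a model learn how they relate to other attributes, whereas case 1 deliberately discards any such relationship and preserves only cardinality and length.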

Additional context

Z712023 added the enhancement (New feature or request) and difficulty-hard labels Jan 19, 2024
MooooCat changed the title from "Synthesize specific types of IDs." to "[Good first issue | Feature] Synthesize specific types of IDs" Feb 26, 2024
Devansh-Kushwaha commented:

I can help add this feature. Assign me.

Projects
None yet
Development

No branches or pull requests

2 participants