Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change filename column name to file_name #449

Merged

Conversation

praateekmahajan
Copy link
Collaborator

Description

Resolves #427

Usage

# Add snippet demonstrating usage

Checklist

  • I am familiar with the Contributing Guide.
  • New or Existing tests cover these changes.
  • The documentation is up to date with these changes.

Signed-off-by: Praateek <[email protected]>
@praateekmahajan praateekmahajan changed the title Change filename to file_name Change filename column name to file_name Dec 20, 2024
Signed-off-by: Praateek <[email protected]>
@praateekmahajan praateekmahajan added the gpuci Run GPU CI/CD on PR label Dec 20, 2024
@@ -64,7 +64,7 @@ def read_json(
input_files: The path of the input file(s).
backend: The backend to use for reading the data.
files_per_partition: The number of files to read per partition.
add_filename: Whether to add a "filename" column to the DataFrame.
add_filename: Whether to add a "file_name" column to the DataFrame.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about changing add_filename to add_file_name, keep_filename_column to keep_file_name_column, write_to_filename to write_to_file_name, etc.?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aah I did think about it but I personally felt the cons of renaming arguments in function signature >> pros of having alignment between the actual column name, specifically where backward compatibility, changing across scripts / tutorials / user behavior.
If you think the that the consistency between functions argument name and the df column name matters, I'll be happy to change it

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay yeah, I don't have a very strong preference for it so feel free to keep as is, unless others have different opinions.

In this case, my only other comment is if you could make sure there aren't any lingering scripts or examples/tutorials that try to access a "filename" column when it should now be "file_name"?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took a quick look and couldn't find anything, found one remaining comment line that I've added 1126461

Copy link
Collaborator

@VibhuJawa VibhuJawa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for fixing it. I think file_name is innocuous enough for the benefit we are getting.

Signed-off-by: Praateek <[email protected]>
@praateekmahajan praateekmahajan merged commit 4fb7f54 into NVIDIA:main Dec 24, 2024
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
gpuci Run GPU CI/CD on PR
Projects
None yet
Development

Successfully merging this pull request may close these issues.

filename column inaccessible with pandas backend and parquet
3 participants