Early wrangling #1
Draft
bobular wants to merge 70 commits into main from first-draft
- Added type detection for variables
- Introduced warnings for duplicate column names
- Integrated `type_convert` for robust type inference

- Added `data_type` auto-detection, including a new `id` type for primary keys.
  - Dates are explicitly excluded from being detected as `id`.
  - Detection of `id` applies only to primary keys (parent IDs won't work with current simple logic).
- Added `data_shape` inference logic:
  - Variables with `number`, `integer`, or `date` types default to `continuous`.
  - All other types default to `categorical`.
- Updated test fixture data to include an additional row, ensuring non-date variables have non-unique values.

This implementation improves the metadata generation for variables, aligning with EDA requirements.

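The inference rules in this commit message can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the helper names `infer_data_type()` and `infer_data_shape()`, the `is_primary_key` argument, the uniqueness check for `id`, and the `"string"` fallback type are all assumptions made for the example.

```r
# Hypothetical helpers; names and the exact rules are assumptions based on
# the commit message, not the package's actual implementation.
infer_data_type <- function(x, is_primary_key = FALSE) {
  # Dates are checked first so they can never be reported as `id`.
  if (inherits(x, "Date")) return("date")
  # Assumed rule: only a primary-key column with unique, non-missing values is an `id`.
  if (is_primary_key && !anyNA(x) && anyDuplicated(x) == 0) return("id")
  if (is.integer(x)) return("integer")
  if (is.numeric(x)) return("number")
  "string"
}

infer_data_shape <- function(data_type) {
  # number, integer and date default to continuous; everything else to categorical.
  if (data_type %in% c("number", "integer", "date")) "continuous" else "categorical"
}

infer_data_shape(infer_data_type(c(1.5, 2.5, 2.5)))
infer_data_shape(infer_data_type(c("a", "b", "b")))
```

Checking for dates before the primary-key test is one simple way to guarantee the exclusion described above; the PR may enforce it differently.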
…and tests
- Introduced the `preprocess_fn` argument for user-defined data cleanup before type inference. This allows handling edge cases like correcting invalid dates (e.g., changing '2021-02-29' to '2021-03-01').
- Enhanced type inference warnings:
  - Invalid date warnings from `type_convert` are now intercepted and embellished with a note about using `preprocess_fn`.
  - Suppressed propagation of handled warnings to avoid duplicates.
- Added test for handling invalid leap year dates (e.g., '2021-02-29').
  - Invalid dates added using `preprocess_fn`
  - Invalid dates are converted to `NA` as per `type_convert` behavior, with appropriate warnings and user-guidance issued.

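The intercept-and-embellish pattern described here can be illustrated in base R with `withCallingHandlers()` and the `muffleWarning` restart. The sketch below is an assumption-laden stand-in: `convert_dates()` is a hypothetical substitute for the `type_convert` step, and `convert_with_guidance()` and `fix_leap_day()` are invented names, so only the warning-handling technique should be read as relevant to the PR.

```r
# Hypothetical stand-in for the conversion step: invalid dates become NA
# and a warning is raised (the real code relies on `type_convert`).
convert_dates <- function(x) {
  out <- as.Date(x, format = "%Y-%m-%d")
  bad <- is.na(out) & !is.na(x)
  if (any(bad)) {
    warning("invalid date(s): ", paste(x[bad], collapse = ", "), call. = FALSE)
  }
  out
}

# Hypothetical wrapper: run the user's cleanup first, then convert, adding
# guidance to any warning and muffling the original to avoid duplicates.
convert_with_guidance <- function(x, preprocess_fn = identity) {
  x <- preprocess_fn(x)
  withCallingHandlers(
    convert_dates(x),
    warning = function(w) {
      warning(conditionMessage(w),
              " (consider correcting the input with `preprocess_fn`)",
              call. = FALSE)
      invokeRestart("muffleWarning")
    }
  )
}

# Example cleanup function that moves the invalid leap day to 1 March.
fix_leap_day <- function(x) sub("^2021-02-29$", "2021-03-01", x)

convert_with_guidance(c("2021-01-15", "2021-02-29"), preprocess_fn = fix_leap_day)
```

Without `preprocess_fn`, the same call would return `NA` for '2021-02-29' and emit a single embellished warning rather than two.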
- Added functionality to display metadata for ID and variable columns separately:
  - Prints a concise summary of ID columns.
  - Includes detailed metadata for variable columns (`data_type`, `data_shape`).
- Integrated `skimr::skim()` for summarizing variable data.
  - Excludes ID columns from the summary.
- Placeholder note for future entity-level metadata summary.
- Provides an intuitive way to inspect Entity objects, including column metadata and data summaries.

- Introduced an S4 method `inspect_variable()` to inspect a single variable in detail:
  - Validates the presence of the variable in the Entity's metadata.
  - Displays metadata for the specified variable in a vertical format using `pivot_longer()`.
  - Summarizes the variable's data using `skim()`, pivoted for readability.
  - Ensures robust handling of mixed types in skim output by converting all values to character before pivoting.
- Complements the `inspect()` method for Entity-wide inspection by focusing on individual variables.

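A rough sketch of what such a method could look like is given below. The `Entity` class layout (slots `data` and `variables`, with a `variable` column in the metadata) is an assumption made for the example, and the `skim()` summary step is left out to keep the sketch small. It shows only the validation and the character-then-pivot approach mentioned in the commit message.

```r
library(methods)
library(tidyr)

# Hypothetical, simplified Entity class; the real class in the PR may differ.
setClass("Entity", representation(data = "data.frame", variables = "data.frame"))

setGeneric("inspect_variable", function(entity, variable_name) {
  standardGeneric("inspect_variable")
})

setMethod("inspect_variable", "Entity", function(entity, variable_name) {
  meta <- entity@variables[entity@variables$variable == variable_name, , drop = FALSE]
  # Validate that the variable is present in the metadata.
  if (nrow(meta) == 0) {
    stop("Variable '", variable_name, "' not found in metadata", call. = FALSE)
  }
  # Convert every column to character so mixed types can share one value column.
  meta[] <- lapply(meta, as.character)
  long <- pivot_longer(meta, cols = everything(),
                       names_to = "field", values_to = "value")
  print(long)
  invisible(long)
})

# Small made-up entity for illustration.
households <- new(
  "Entity",
  data = data.frame(household_id = c("h1", "h2"), n_people = c(3L, 5L)),
  variables = data.frame(
    variable = c("household_id", "n_people"),
    data_type = c("id", "integer"),
    data_shape = c("categorical", "continuous")
  )
)

inspect_variable(households, "n_people")
```

Converting to character before pivoting matters because `pivot_longer()` needs a single common type for the value column.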
…d display_name_plural
…them in a few places
- fix tests that were broken by using the kable simple format
No description provided.