Early wrangling #1

bobular · 2024-11-27T16:30:18Z

No description provided.

- Added type detection for variables
- Introduced warnings for duplicate column names
- Integrated `type_convert` for robust type inference

- Added `data_type` auto-detection, including a new `id` type for primary keys.
- Dates are explicitly excluded from being detected as `id`.
- Detection of `id` applies only to primary keys (parent IDs won't work with current simple logic).
- Added `data_shape` inference logic:
  - Variables with `number`, `integer`, or `date` types default to `continuous`.
  - All other types default to `categorical`.
- Updated test fixture data to include an additional row, ensuring non-date variables have non-unique values.

This implementation improves the metadata generation for variables, aligning with EDA requirements.
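The defaulting rules in this commit message can be sketched in a few lines of base R. This is a minimal illustration, not the code from this PR: the helper names `looks_like_id()` and `infer_data_shape()` are hypothetical, the `"string"` type is only an example of "all other types", and the uniqueness check is an assumption about what the "simple logic" does, suggested by the note about non-unique fixture values.

```r
# Hypothetical helper: treat a column as an `id` when it is not a date and
# every value is present and unique (an assumption, not the PR's code).
looks_like_id <- function(x) {
  !inherits(x, "Date") && !anyNA(x) && anyDuplicated(x) == 0L
}

# Hypothetical helper: map a detected data_type to its default data_shape.
infer_data_shape <- function(data_type) {
  ifelse(data_type %in% c("number", "integer", "date"), "continuous", "categorical")
}

infer_data_shape(c("integer", "date", "string"))
# "continuous" "continuous" "categorical"

looks_like_id(c("p1", "p2", "p3"))                       # TRUE
looks_like_id(as.Date(c("2021-01-01", "2021-01-02")))    # FALSE: dates are excluded
```

Under this sketch a non-date column whose values all happen to be unique would also be flagged as an `id`, which may be why the fixture gained a row with repeated values.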

…and tests

- Introduced the `preprocess_fn` argument for user-defined data cleanup before type inference. This allows handling edge cases like correcting invalid dates (e.g., changing '2021-02-29' to '2021-03-01').
- Enhanced type inference warnings:
  - Invalid date warnings from `type_convert` are now intercepted and embellished with a note about using `preprocess_fn`.
  - Suppressed propagation of handled warnings to avoid duplicates.
- Added test for handling invalid leap year dates (e.g., '2021-02-29').
  - Invalid dates added using `preprocess_fn`
  - Invalid dates are converted to `NA` as per `type_convert` behavior, with appropriate warnings and user-guidance issued.
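The intercept-and-embellish pattern described above can be sketched with base R condition handling. This is an illustration under stated assumptions, not the PR's implementation: `parse_dates_strictly()` is a hypothetical stand-in for the `type_convert` step (it only mimics "invalid date becomes `NA` with a warning"), and `infer_dates()` and `fix_leap_day()` are made-up names.

```r
# Hypothetical stand-in for the type-inference step: unparseable dates
# become NA and a warning is raised.
parse_dates_strictly <- function(x) {
  parsed <- as.Date(x, format = "%Y-%m-%d")
  bad <- !is.na(x) & is.na(parsed)
  if (any(bad)) {
    warning("invalid date(s): ", paste(x[bad], collapse = ", "), call. = FALSE)
  }
  parsed
}

# Run the user's cleanup first, then intercept warnings from the parser.
infer_dates <- function(x, preprocess_fn = identity) {
  x <- preprocess_fn(x)
  withCallingHandlers(
    parse_dates_strictly(x),
    warning = function(w) {
      # Re-issue the warning with guidance attached...
      warning(conditionMessage(w),
              "; consider correcting the values with `preprocess_fn`",
              call. = FALSE)
      # ...then muffle the original so it is not reported twice.
      invokeRestart("muffleWarning")
    }
  )
}

# Example cleanup function, using the correction mentioned in the commit message.
fix_leap_day <- function(x) sub("2021-02-29", "2021-03-01", x, fixed = TRUE)

# With cleanup: parses without any warning.
infer_dates(c("2021-02-28", "2021-02-29"), preprocess_fn = fix_leap_day)

# Without cleanup: capture the embellished warning message instead of emitting it.
tryCatch(infer_dates("2021-02-29"), warning = function(w) conditionMessage(w))
```

While a calling handler runs, R disables that handler, so the re-issued warning is not caught by it again and only one message reaches the user.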

- Added functionality to display metadata for ID and variable columns separately:
  - Prints a concise summary of ID columns.
  - Includes detailed metadata for variable columns (`data_type`, `data_shape`).
- Integrated `skimr::skim()` for summarizing variable data.
  - Excludes ID columns from the summary.
- Placeholder note for future entity-level metadata summary.
- Provides an intuitive way to inspect Entity objects, including column metadata and data summaries.

- Introduced an S4 method `inspect_variable()` to inspect a single variable in detail:
  - Validates the presence of the variable in the Entity's metadata.
  - Displays metadata for the specified variable in a vertical format using `pivot_longer()`.
  - Summarizes the variable's data using `skim()`, pivoted for readability.
  - Ensures robust handling of mixed types in skim output by converting all values to character before pivoting.
- Complements the `inspect()` method for Entity-wide inspection by focusing on individual variables.
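The reason for converting everything to character before pivoting is that a single `value` column cannot hold numbers and strings at the same time. A base R sketch of the idea follows. It is a stand-in for the `pivot_longer()` step, not the PR's code; `to_vertical()` and the columns of `summary_row` are hypothetical.

```r
# Hypothetical stand-in for the pivot step: turn one row of mixed-type
# columns into a two-column field/value table.
to_vertical <- function(one_row) {
  stopifnot(is.data.frame(one_row), nrow(one_row) == 1L)
  data.frame(
    field = names(one_row),
    # Coerce every value to character so that numbers, integers and strings
    # can share a single `value` column.
    value = unname(vapply(one_row, function(col) as.character(col[[1]]), character(1))),
    stringsAsFactors = FALSE
  )
}

# Made-up summary row; the column names are illustrative only.
summary_row <- data.frame(data_type = "integer", n_missing = 0L, mean = 1.5,
                          stringsAsFactors = FALSE)
to_vertical(summary_row)
```

The result has one row per original column, with `field` holding the column name and `value` holding its content as text.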

bobular added 30 commits November 27, 2024 16:28

changes to improve dev workflow

71a078d

first class, function and tests

89874e5

Initial implementation of entity_from_file

72c3afd

oopsie with expected test outputs

c1c852f

removed test.R boilerplate

bb951bb

added roxygen2 docs for entity_from_file; updated README and dependencies

75762c1

add skimr dependency

05fe81d

treat categoricals as factors, fix skim behaviour with factors

f65a09d

add test for inspect(entity) and long categorical values

7bb6dc8

improve docs on inspect

b35cb4a

add basic test for inspect_variable()

a05cb27

made a start on validate(entity) - WIP

419b196

added quiet option to validate()

0f430df

added basic tests for validate()

24c519f

tiny tweak to port number in examples

1c06e0a

add setwd() advice in README

6f53f3c

improve preprocess_fn documentation

cbedf09

major refactor to separate concerns and add all metadata columns

e5b7f5a

more of the refactor

ded4179

tidied up handling of factors and vocabularies

382c52a

improve inspect() output

a683b70

tidy up entity metadata slots

8922aa9

big rename of Entity metadata slot to 'variables'

730edc2

document Entity class; add entity metadata handling in constructor

6c4fceb

added set_entity_metadata and tests and fallbacks for display_name and display_name_plural

2a7955b

refined set_entity_metadata and tests

a0424c1

bobular added 30 commits December 9, 2024 11:21

tidy some base R

3fb6939

categorical factor column validation and make data-variables mismatch errors fatal

19b3627

remove as a variables metadata column

e9d0d54

improve factor validation and testing

f9993ef

add support for large vocabs/factors

17b9984

added extensive variable metadata validation

f245db7

added more fixtures - not checked yet

ad7ce09

added convenience redirect from inspect() to inspect_variable()

bc56ef5

add basic participants test

e3aa05a

change guidance for inspecting variables

8b2e71d

add set_parents() and tests and fct_mutate()

5b016ea

started tests for fct_mutate()

3cc88cb

added essential fct_mutate tests

78a2cdf

fixed buggy fixture data and provided parsing error reporting

91499d8

allow mutation to NA for factors

e8b4ceb

big changes but mostly redo_type_detection_as_variables_only()

9b14e32

tidy inspect() guidance

0845232

better naming/tagging of built docker image

d5e31ec

avoid error when there are no variables

b396239

complete move of fixture data

086925f

add basic vignette and figure out most of skim/kable issues

f6e51bc

vignette/build-related tweaks

d6fd203

improve comments

4dad26e

inspect(entity) now produces consistent all-text report in console and notebook

1bdc73b

inspect_variable() now console/notebook friendly, continue with vignette

98ca02a

add get_variable_metadata and get_id_column_metadata helpers and use them in a few places

9c452c2

- fix tests that were broken with using kable simple format

added validation and test for no ID column

54cf647

cleanup and vignette work

ac9a1e7

add set_data() convenience and adapt tests and vignette, make vignette wider

39d79ec

add tests for participants and observations loading

8768a3a
