Staging/main/0.10.8 #1081

Merged
merged 5 commits into from
Jan 11, 2024

Conversation

taylorfturner
Contributor

@taylorfturner taylorfturner commented Jan 11, 2024

  • Version bump to 0.10.8 from 0.10.7 in version.py
  • Feature upgrades

menglinw and others added 5 commits December 12, 2023 08:59
* parquet sampling function developed in data_utils.py; Added sample_nrows argument in ParquetData class; Added test_len_sampled_data in test_parquet_data.py

* resolved conflict with dev, added more tests

* fixed sample empty column bug

* fixed comments in data_utils.py, including:
1. added type of return in sample_parquet function;
2. changed variable names in sample_parquet function to more descriptive names (select -> sample_index, out -> sample_df);
3. created convert_unicode_col_to_utf8 function to reduce repeating code in sample_parquet and read_parquet_df functions

* 1. renamed variables in convert_unicode_col_to_utf8 function (data_utils.py) to be more descriptive (types -> input_column_types, col -> iter_column), other parts unchanged

2. test_parquet_data.py, move import statement to the top of file

3. test_parquet_data.py, merged all tests about parquet sample feature to their original tests

* checked the datatype and input file path before and after reload with sampling option enabled

* test

* delete test edit in avro_data.py, updated fastavro version in requirements.txt

* remove fastavro.reader type

* change fastavro version back to original

* 1. sample_parquet function description
2. test_len_data method keep one sample length test
3. remove sampling test in test_specifying_data_type
4. remove sampling test in test_reload_data
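
The PR page does not show the body of `sample_parquet`, so the following is only a rough sketch of the row-sampling idea described in the commit message above, not the DataProfiler implementation. It uses plain Python lists in place of parquet files and DataFrames. The function name `sample_rows` and the `seed` parameter are hypothetical; `sample_nrows`, `sample_index` and `sample_df` simply echo the names mentioned in the commit message.

```python
import random
from typing import Any, Dict, List, Optional


def sample_rows(
    rows: List[Dict[str, Any]],
    sample_nrows: Optional[int] = None,
    seed: Optional[int] = None,
) -> List[Dict[str, Any]]:
    """Return up to sample_nrows rows, drawn without replacement.

    Hypothetical helper for illustration only. The sampled rows keep
    their original relative order.
    """
    if sample_nrows is not None and sample_nrows < 0:
        raise ValueError("sample_nrows must be non-negative")

    # No sampling requested, or the request covers the whole input.
    if sample_nrows is None or sample_nrows >= len(rows):
        return list(rows)

    rng = random.Random(seed)
    # Pick distinct row positions, then sort them to preserve row order.
    sample_index = sorted(rng.sample(range(len(rows)), sample_nrows))
    sample_df = [rows[i] for i in sample_index]
    return sample_df
```

Under these assumptions, `sample_rows(rows, 10)` returns ten rows, while passing `None` or a value at least as large as the input returns every row unchanged.
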
* bump tag matplotlib

* bump to most recent

* 3.9.0 update
Bumps [actions/setup-python](https://github.com/actions/setup-python) from 4 to 5.
- [Release notes](https://github.com/actions/setup-python/releases)
- [Commits](actions/setup-python@v4...v5)

---
updated-dependencies:
- dependency-name: actions/setup-python
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Taylor Turner <[email protected]>
@taylorfturner taylorfturner added Version Upgrade Release / version change PR 0.10.8 labels Jan 11, 2024
@taylorfturner taylorfturner self-assigned this Jan 11, 2024
@taylorfturner taylorfturner requested a review from a team as a code owner January 11, 2024 15:57
@taylorfturner taylorfturner added the Work In Progress Solution is being developed label Jan 11, 2024
@@ -67,7 +67,7 @@ def _load_data_from_file(self, input_file_path: str) -> List:
 # even when the option encoding='utf-8' is added. It may come from
 # some special compression codec, e.g., snappy. Then, binary mode
 # reading is currently used to get the dict-formatted lines.
-df_reader: fastavro.reader = fastavro.reader(input_file)
+df_reader = fastavro.reader(input_file)
Contributor

was this unnecessary?

Contributor Author

it was actually throwing a mypy error IIRC @menglinw

@micdavis micdavis merged commit a92ab1e into main Jan 11, 2024
6 checks passed