Releases: capitalone/DataProfiler

v0.7.6

04 Feb 15:41
836607e

Bug fixes

  • Fixes missing requests package #438

Other Changes

  • Documentation / Github Pages updated #439

v0.7.5

28 Jan 20:22
3f3fa87

Profiler

  • Allow user to specify structured labeler to return indexes instead of labels #435

Bug fixes

  • Bug fix for the header no longer being detected when it contains None values #432
  • Bug fix for removal of PAD conf in structured labeler #437

Other Changes

v0.7.4

19 Nov 15:45
60da2c4

Bug fixes

  • Bug fix to handle final word in word level argmax #426
  • Bug fix for numpy removing trailing nulls from strings #427, #428, #429

Other Changes

  • Documentation / Github Pages updated #428

v0.7.3

28 Oct 21:24
05ea6eb

Profiler

  • Add ability to fully install the dataprofiler with one command #424

Other Changes

  • Documentation / Github Pages updated #422
  • Updated examples #425

v0.7.2

18 Oct 20:19
11a50a9

Profiler

  • Add median to numeric stats #389
  • Chi-square tests added to profiler #392
  • Chi square/homogeneity, median, mode, MAD differences #398, #400
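
To make the chi-square homogeneity entry above more concrete, here is a minimal sketch of the textbook statistic for two categorical columns. It is not the DataProfiler implementation; the function name `chi2_homogeneity` and the sample data are hypothetical, and both inputs are assumed to be non-empty.

```python
from collections import Counter


def chi2_homogeneity(counts_a, counts_b):
    """Return (statistic, degrees_of_freedom) for a 2 x k homogeneity test.

    Hypothetical helper: assumes both count mappings are non-empty.
    """
    categories = sorted(set(counts_a) | set(counts_b))
    total_a = sum(counts_a.values())
    total_b = sum(counts_b.values())
    grand_total = total_a + total_b

    statistic = 0.0
    for category in categories:
        observed_a = counts_a.get(category, 0)
        observed_b = counts_b.get(category, 0)
        column_total = observed_a + observed_b
        # Expected count = row total * column total / grand total.
        expected_a = total_a * column_total / grand_total
        expected_b = total_b * column_total / grand_total
        statistic += (observed_a - expected_a) ** 2 / expected_a
        statistic += (observed_b - expected_b) ** 2 / expected_b

    # Two rows, k columns: (2 - 1) * (k - 1) degrees of freedom.
    return statistic, len(categories) - 1


col_a = Counter(["red", "red", "blue", "green"])
col_b = Counter(["red", "blue", "blue", "green"])
stat, dof = chi2_homogeneity(col_a, col_b)
```

A p-value would then come from the chi-square distribution with `dof` degrees of freedom, for example through scipy.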

Readers

Graphs

  • Add missing values matrix #403
  • Update histogram to use column indexes #404
  • Add warning to user when reqs not installed #407

Bug fixes

  • Fix bug in mode when disabled #388
  • Update exception text for ssl_verify error #395
  • ssl verify misnaming fix and consecutive spaces in csv fix #405
  • fix cnn confidences not slicing data correctly #419

Other Changes

v0.7.1

09 Aug 17:01
db52376

Profiler

  • Validate min_true_samples in update_profile #377
  • Add mode to numeric stats #382

Readers

  • Readers now accept a URL to a file for reading #375
  • Allow text to determine encoding automatically #378

Graphs

  • Graphs: Create function which accepts a profiler and creates histogram bar charts #367

Bug fixes

  • Fixes bug in _get_quantiles when median case occurs #383
  • Catch Divide by 0 bug for unique row ratio #384
  • Make clean data function static again due to multiprocessing and model issue #385
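
The divide-by-zero fix for the unique row ratio can be pictured with a small guard like the one below. This is only an illustrative sketch, not the library's code; the function name and the choice to return 0 for an empty dataset are assumptions.

```python
def unique_row_ratio(rows):
    """Ratio of distinct rows to total rows (hypothetical helper).

    Rows must be hashable, e.g. tuples. Returning 0 for an empty
    dataset is an assumption made for this sketch.
    """
    total = len(rows)
    if total == 0:
        # Guard against dividing by zero when no rows were profiled.
        return 0
    return len(set(rows)) / total


empty_ratio = unique_row_ratio([])
sample_ratio = unique_row_ratio([(1, "a"), (1, "a"), (2, "b")])
```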

Other Changes

v0.7.0

30 Jul 15:12
d130b1d

Profiler

Readers

Runtime Changes

  • Abstract NumericalStatsMixin profile for columns #337
  • Added profiler min true samples error checking #365

Bug fixes

  • Allow users to send in non-string value for structured labeling #343
  • Profiler samples no longer change visual representation when passed as a list #363

Other Changes

  • requirements.txt updated to add scipy #369
  • Update throughput testing changes #356
  • Version updated #370
  • Github Pages updated #345, #362, #372, #373

v0.6.1

16 Jul 20:42
1303abe

Profiler

  • Options added to allow setting 'k' concerning the top k highest counts of categorical #325
  • Improved CSV data streaming to accept StringIO/BytesIO #327
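
As a rough illustration of what "top k highest counts" means for a categorical column, the sketch below uses the standard library's `Counter`. It does not use the DataProfiler options API; the variable names and sample values are hypothetical.

```python
from collections import Counter

# Hypothetical categorical column values.
values = ["a", "b", "a", "c", "a", "b"]
k = 2  # number of highest-count categories to keep

# most_common(k) returns the k categories with the largest counts.
top_k = Counter(values).most_common(k)
```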

Runtime

  • Text in Unstructured profiler now keeps a count of words #321

Bug Fixes

  • Fixed unalikeability bug that caused errors on datasets with only one sample #341
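
For context on why a single-sample dataset can be a problem: a common definition of unalikeability divides the number of unalike ordered pairs by n² − n, which is zero when n = 1. The sketch below follows that definition under stated assumptions; the function name and the choice to return 0.0 for fewer than two samples are hypothetical, and the library's actual handling may differ.

```python
from collections import Counter


def unalikeability(values):
    """Share of ordered pairs (i != j) whose values differ.

    Hypothetical helper; returning 0.0 for fewer than two samples
    is an assumption made for this sketch.
    """
    n = len(values)
    if n <= 1:
        # n * n - n == 0 here, so the ratio is undefined.
        return 0.0
    counts = Counter(values)
    # Each of the c items in a category is unalike to the other n - c items.
    unalike_pairs = sum(c * (n - c) for c in counts.values())
    return unalike_pairs / (n * n - n)


single = unalikeability(["a"])
mixed = unalikeability(["a", "a", "b", "b"])
```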

Other Changes

  • Standardized throughput for structured testing #298

v0.6.0

14 Jul 15:10
a56f9e3

Profiler

  • Structured Profiler can now take in duplicate columns #315
    • This is an API change to how the data in the report is accessed: data_stats is now a list
  • Categorical Profile now includes top 5 counts #299
  • Add new categorical statistics: gini impurity and unalikeability #308, #320
  • Unstructured Data Labeler profile now includes entity percentages #305
  • Add Pearson's correlation to the Structured Profiler #281, #307, #317
  • Unstructured Profiler Text vocab now outputs the top k highest vocab counts #304, #314
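
For readers unfamiliar with the gini impurity statistic mentioned in the list above, a minimal sketch of the standard formula (the sum of p·(1 − p) over category proportions) follows. The function name and the choice to return None for empty input are assumptions, not the library's API.

```python
from collections import Counter


def gini_impurity(values):
    """Sum of p * (1 - p) over the category proportions p.

    Hypothetical helper; returning None for empty input is an
    assumption made for this sketch.
    """
    n = len(values)
    if n == 0:
        return None
    counts = Counter(values)
    return sum((c / n) * (1 - c / n) for c in counts.values())


balanced = gini_impurity(["a", "a", "b", "b"])
pure = gini_impurity(["a", "a", "a"])
```

A column with a single category has impurity 0, and the value grows as the categories become more evenly spread.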

Runtime Changes

  • Categorical Profiler keeps an internal count of categories #296
  • Text in Unstructured profiler now keeps a count of vocab #304
  • Data Reader's `is_match` function can now take in StringIO/BytesIO #292, #306, #326

Bug fixes

  • Bug fix to make sure samples being stored by UnstructuredProfiler save #313

Other Changes

v0.5.3

28 Jun 13:53
0bf4c77

Bug fixes

  • remove unused import causing profiler error #290