Updated null stats tests to include data initialized in setUpClass #898

jacob-buehler · 2023-06-22T15:29:37Z

Updated null stats tests to include data initialized in setUpClass function
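For context, `setUpClass` is unittest's class-level fixture hook. It runs once before the first test method of a class, so expensive data can be built once and shared by every test. Below is a minimal sketch of that pattern. The class name, the `setup_calls` counter, the `run_tests` helper, and the small DataFrame are all hypothetical stand-ins, not the repository's actual fixture:

```python
import io
import unittest

import pandas as pd


class TestNullStatsShared(unittest.TestCase):  # hypothetical class name
    setup_calls = 0  # counts how often setUpClass runs (for illustration)

    @classmethod
    def setUpClass(cls):
        # Runs once per class, before the first test method; every test
        # method then reads the same cls.data instead of rebuilding it.
        cls.setup_calls += 1
        cls.data = pd.DataFrame(
            {"NAME": ["test1", None, "test3"], "VALUE": [1.0, None, 3.0]}
        )

    def test_row_count(self):
        self.assertEqual(3, len(self.data))

    def test_value_null_count(self):
        self.assertEqual(1, int(self.data["VALUE"].isnull().sum()))


def run_tests(case):
    """Run one TestCase class quietly and return the unittest result."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(case)
    return unittest.TextTestRunner(stream=io.StringIO()).run(suite)
```

Because the frame is shared, a test that modifies `self.data` in place would leak that change into later tests, so shared fixtures work best when tests treat them as read-only.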


CLAassistant · 2023-06-22T15:29:43Z

All committers have signed the CLA.


taylorfturner · 2023-06-22T15:37:54Z

dataprofiler/tests/profilers/test_profile_builder.py

-from . import utils as test_utils
+from dataprofiler.tests.profilers import utils as test_utils


why did this need to change?

The original line didn't import utils properly on my local machine, but the new line imports it consistently.

That might be because the test was run from its own directory as opposed to the base of the repo. I think we use the relative import here to keep test imports separate from library imports, but I'm open to discussing it if we need a change here.
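One way a relative import like `from . import utils` fails is when the file is executed as a plain script, because Python then has no parent package to resolve `.` against. The sketch below builds a throwaway package (`pkg/tests/` with `utils.py` and `test_mod.py`, all hypothetical names) and runs the same module two ways. It illustrates the mechanism only; it is not a reproduction of the reporter's local setup:

```python
import os
import pathlib
import subprocess
import sys
import tempfile


def run_relative_import_demo():
    """Build a throwaway package and run one module two ways."""
    with tempfile.TemporaryDirectory() as tmp:
        root = pathlib.Path(tmp)
        tests_dir = root / "pkg" / "tests"  # hypothetical package layout
        tests_dir.mkdir(parents=True)
        (root / "pkg" / "__init__.py").write_text("")
        (tests_dir / "__init__.py").write_text("")
        (tests_dir / "utils.py").write_text("VALUE = 42\n")
        (tests_dir / "test_mod.py").write_text(
            "from . import utils\nprint(utils.VALUE)\n"
        )
        env = {**os.environ, "PYTHONPATH": str(root)}

        # 1) As a module from the package root: the parent package is known.
        as_module = subprocess.run(
            [sys.executable, "-m", "pkg.tests.test_mod"],
            cwd=root, env=env, capture_output=True, text=True,
        )
        # 2) As a plain script from inside tests/: there is no parent
        #    package, so the relative import raises ImportError.
        as_script = subprocess.run(
            [sys.executable, "test_mod.py"],
            cwd=tests_dir, env=env, capture_output=True, text=True,
        )
    return as_module, as_script
```

An absolute import such as `from dataprofiler.tests.profilers import utils` works in both modes as long as the repo root (or an installed package) is importable, which may explain why it behaved consistently; the relative form keeps working when tests are run from the repo root.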

taylorfturner · 2023-06-22T15:40:33Z

dataprofiler/tests/profilers/test_profile_builder.py

+        # file_path = os.path.join(test_root_path, "data", "csv/empty_rows.txt")
+        # data = pd.read_csv(file_path)


is this needed?

taylorfturner · 2023-06-22T15:41:12Z

dataprofiler/tests/profilers/test_profile_builder.py

+        # I commented out these lines of code, because they are a second
+        # test of the functions tested in the last four lines of code.
+        # Since we intend to use only the setUpClass data, there is no
+        # reason to keep these in, or test those functions a second time.
+
+        # file_path = os.path.join(test_root_path, "data", "csv/iris-with-null-rows.csv")
+        # data = pd.read_csv(file_path)
+        # data = self.data
+
+        # profile = dp.StructuredProfiler(data, options=profiler_options)
+        # self.assertEqual(13, profile.row_has_null_count)
+        # self.assertEqual(13 / 24, profile._get_row_has_null_ratio())
+        # self.assertEqual(3, profile.row_is_null_count)
+        # self.assertEqual(3 / 24, profile._get_row_is_null_ratio())


I'd recommend either deleting this code or keeping it, but not leaving commented-out code in the file.

taylorfturner · 2023-06-22T15:41:27Z

dataprofiler/tests/profilers/test_profile_builder.py

@@ -3646,35 +3665,39 @@ def test_null_in_file(self):
                "row_statistics.is_enabled": True,
            }
        )
-        data = dp.Data(filename_null_in_file)
+        # data = dp.Data(filename_null_in_file)


Do we need to keep this commented out?

taylorfturner · 2023-06-22T15:42:18Z

dataprofiler/tests/profilers/test_profile_builder.py

+        names_idx = report["global_stats"]["profile_schema"]["names"][0]
+        numbers_idx = report["global_stats"]["profile_schema"]["numbers"][0]


Changing because of the new dataset.
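Looking a column's position up by name, as the two new lines do, keeps the assertions valid when a different dataset orders its columns differently. A small sketch with hand-built report dictionaries; `column_index`, `report`, and `reordered` are illustrative names, and the real report contains many more keys:

```python
def column_index(report, column_name):
    """Look up a column's position from the report's profile_schema."""
    # Assumption: profile_schema maps each column name to a list of column
    # positions (a list presumably because names can repeat); take the first.
    return report["global_stats"]["profile_schema"][column_name][0]


# Hypothetical, trimmed-down reports containing only the keys used above.
report = {"global_stats": {"profile_schema": {"names": [0], "numbers": [1]}}}
reordered = {"global_stats": {"profile_schema": {"numbers": [0], "names": [1]}}}

names_idx = column_index(report, "names")
numbers_idx = column_index(report, "numbers")
```

With `reordered`, the same call for `"numbers"` returns position 0, so a hard-coded index would silently point at the wrong column.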

taylorfturner · 2023-06-22T15:42:33Z

dataprofiler/tests/profilers/test_profile_builder.py

+        # data = [
+        #     ["test1", 1.0],
+        #     ["test2", 2.0],
+        #     ["test3", 3.0],
+        #     [None, None],
+        #     ["test5", 5.0],
+        #     ["test6", 6.0],
+        #     [None, None],
+        #     ["test7", 7.0],
+        # ]
+        # data = pd.DataFrame(data, columns=["NAME", "VALUE"])


Same as above regarding commented-out code.
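The tests in this PR assert `row_has_null_count` and `row_is_null_count` together with their ratios. Assuming "has null" means at least one null cell in the row and "is null" means every cell in the row is null (our reading of the names, not confirmed in this thread), the expected values for the eight-row dataset above can be cross-checked with plain pandas. `row_null_stats` is a hypothetical helper, and DataProfiler's own null detection (for example, null-like strings) may differ:

```python
import pandas as pd


def row_null_stats(df):
    """Count rows with any null cell and rows where every cell is null."""
    # Assumption: mirrors the meaning of row_has_null / row_is_null;
    # only pandas-recognized nulls (None, NaN) are counted here.
    has_null = df.isnull().any(axis=1)
    is_null = df.isnull().all(axis=1)
    n_rows = len(df)
    return {
        "row_has_null_count": int(has_null.sum()),
        "row_has_null_ratio": float(has_null.sum()) / n_rows,
        "row_is_null_count": int(is_null.sum()),
        "row_is_null_ratio": float(is_null.sum()) / n_rows,
    }


data = pd.DataFrame(
    [
        ["test1", 1.0], ["test2", 2.0], ["test3", 3.0], [None, None],
        ["test5", 5.0], ["test6", 6.0], [None, None], ["test7", 7.0],
    ],
    columns=["NAME", "VALUE"],
)
stats = row_null_stats(data)
```

In this dataset every row that has a null is entirely null, so the "has null" and "is null" counts coincide; a dataset with partially null rows is needed to tell the two statistics apart.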

taylorfturner · 2023-06-22T15:42:47Z

dataprofiler/tests/profilers/test_profile_builder.py

+        # data = pd.DataFrame(
+        #     {
+        #         "full": [1, 2, 3, 4, 5, 6, 7, 8, 9],
+        #         "sparse": [1, None, 3, None, 5, None, 7, None, 9],
+        #     }
+        # )


same as above

taylorfturner · 2023-06-22T15:42:52Z

dataprofiler/tests/profilers/test_profile_builder.py

+        # data2 = pd.DataFrame(
+        #     {
+        #         "sparse": [1, None, 3, None, 5, None, 7, None],
+        #         "sparser": [1, None, None, None, None, None, None, 8],
+        #     }
+        # )
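The two frames above (`data` with `full`/`sparse`, `data2` with `sparse`/`sparser`) have deliberately different null densities. Assuming they are profiled separately and then merged (an assumption; the surrounding test body is not shown here), the per-column null counts should add up on the shared `sparse` column. A pandas sketch of the expected numbers; `column_null_counts` and `combined` are hypothetical names:

```python
import pandas as pd

data = pd.DataFrame(
    {
        "full": [1, 2, 3, 4, 5, 6, 7, 8, 9],
        "sparse": [1, None, 3, None, 5, None, 7, None, 9],
    }
)
data2 = pd.DataFrame(
    {
        "sparse": [1, None, 3, None, 5, None, 7, None],
        "sparser": [1, None, None, None, None, None, None, 8],
    }
)


def column_null_counts(df):
    """Map each column name to its number of null cells."""
    return {col: int(count) for col, count in df.isnull().sum().items()}


# If both frames were profiled and then merged (an assumption), the shared
# "sparse" column should report the sum of the per-frame null counts.
combined = pd.concat([data, data2], ignore_index=True)
```

This variety (no nulls, half nulls, mostly nulls) is the kind of structure that a single shared dataset may not cover.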


taylorfturner · 2023-06-22T15:43:28Z

dataprofiler/tests/profilers/test_profile_builder.py

+        # self.assertSetEqual({}, profile._profile[0].null_types_index)
+        # self.assertSetEqual({}, profile._profile[1].null_types_index)
+        self.assertEqual({}, profile._profile[0].null_types_index)
+        self.assertEqual({}, profile._profile[1].null_types_index)


Why is this changing from assertSetEqual to plain assertEqual?

To fix an error: 'dict' object has no attribute 'difference'.
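The error comes from the literal `{}`: it is an empty dict, not an empty set (that is `set()`), and `assertSetEqual` calls `.difference()` on its arguments. A quick check using unittest directly; the `check` helper is ours, for illustration only:

```python
import unittest

# TestCase can be instantiated without a test method name (Python 3.2+),
# which is handy for trying assertions outside a test run.
tc = unittest.TestCase()


def check(assertion, first, second):
    """Return 'passed' or the assertion's failure message."""
    try:
        assertion(first, second)
        return "passed"
    except AssertionError as err:
        return f"failed: {err}"


print(type({}).__name__)  # dict: {} is an empty dict, not an empty set
print(check(tc.assertSetEqual, {}, {}))  # fails: dict has no .difference()
print(check(tc.assertEqual, {}, {}))  # passed
print(check(tc.assertSetEqual, set(), set()))  # passed
```

So if `null_types_index` is a dict, `assertEqual(..., {})` is the appropriate assertion; if it were meant to be a set, `assertSetEqual(set(), ...)` would be.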

JGSweets · 2023-06-22T16:49:52Z

dataprofiler/tests/profilers/test_profile_builder.py

-        file_path = os.path.join(test_root_path, "data", "csv/empty_rows.txt")
-        data = pd.read_csv(file_path)
+        data = self.data
+


I wonder if there was intent to profile an empty csv. I'm concerned about changing the data.
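Whether a file like `csv/empty_rows.txt` yields null rows depends on how its "empty" rows are written, because `pd.read_csv` skips truly blank lines by default (`skip_blank_lines=True`) but keeps rows of empty fields. The CSV text below is hypothetical, since the file's contents are not shown in this thread, and `row_counts` is an illustrative helper:

```python
import io

import pandas as pd

# Hypothetical CSV text: one truly blank line and one row of empty fields.
CSV_TEXT = "a,b\n1,2\n\n3,4\n,"


def row_counts(csv_text, **read_csv_kwargs):
    """Return (number of rows read, number of all-null rows)."""
    df = pd.read_csv(io.StringIO(csv_text), **read_csv_kwargs)
    return len(df), int(df.isnull().all(axis=1).sum())


default_counts = row_counts(CSV_TEXT)  # blank line skipped; "," row kept
kept_counts = row_counts(CSV_TEXT, skip_blank_lines=False)  # blank -> NaN row
```

Swapping such a file for in-memory data can therefore change which null-row edge cases the test exercises.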

JGSweets

I think we may need to take a step back here and identify the need for this PR.
Identify which tests in here are slow and work out what we need to do to speed those up, as opposed to changing everything. I'm concerned that by unifying to a single data source we are losing data variability: the structure of each dataset is intentional and tests different profiling cases.
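A first step toward finding the slow tests is simply timing each one. Below is a minimal sketch using a custom `unittest.TextTestResult` subclass; `TimingResult`, `slowest_tests`, and `ExampleTests` are hypothetical names. If the suite is run with pytest, its built-in `--durations=N` option reports similar information:

```python
import io
import time
import unittest


class TimingResult(unittest.TextTestResult):
    """Record wall-clock time per test id."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.timings = {}
        self._started = {}

    def startTest(self, test):
        self._started[test.id()] = time.perf_counter()
        super().startTest(test)

    def stopTest(self, test):
        super().stopTest(test)
        elapsed = time.perf_counter() - self._started.pop(test.id())
        self.timings[test.id()] = elapsed


def slowest_tests(case, top=5):
    """Run a TestCase class and return the `top` slowest (id, seconds) pairs."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(case)
    runner = unittest.TextTestRunner(stream=io.StringIO(), resultclass=TimingResult)
    result = runner.run(suite)
    return sorted(result.timings.items(), key=lambda kv: kv[1], reverse=True)[:top]


class ExampleTests(unittest.TestCase):  # hypothetical tests
    def test_fast(self):
        self.assertTrue(True)

    def test_slow(self):
        time.sleep(0.05)
        self.assertTrue(True)
```

Note that time spent in `setUpClass` runs outside `startTest`/`stopTest`, so it is not attributed to any single test here.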

taylorfturner · 2023-06-28T17:20:44Z

Related to issue #866

Updated null stats tests to include data initialized in setUpClass function

0e244b7

jacob-buehler requested review from JGSweets, ksneab7, taylorfturner, micdavis and tyfarnan as code owners June 22, 2023 15:29

taylorfturner enabled auto-merge (squash) June 22, 2023 15:37

jacob-buehler commented Jun 22, 2023


taylorfturner reviewed Jun 22, 2023


11:52am revision of null stats tests

83f3688

taylorfturner added the Work In Progress Solution is being developed label Jun 22, 2023

auto-merge was automatically disabled June 22, 2023 15:55
Head branch was pushed to by a user without write access

taylorfturner assigned jacob-buehler Jun 22, 2023

JGSweets reviewed Jun 22, 2023


JGSweets suggested changes Jun 22, 2023


jacob-buehler closed this Jun 22, 2023

jacob-buehler mentioned this pull request Aug 1, 2023

Modify TestStructuredProfilerRowStatistics: Update null stats tests #866

Open
