String Length Check #14

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Sign up for GitHub

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Jump to bottom

Merged

dougb merged 7 commits into master from StringLengthCheck

Jun 27, 2019

Collaborator

samratmitra-0812 commented Jun 24, 2019

EEDD-1020: String Length Check.
Unit Test results attached in https://jira.target.com/browse/EEDD-1020


          String Length Check

4f7648a

samratmitra-0812 requested a review from dougb

June 24, 2019 17:30

colindean requested changes

View reviewed changes

Collaborator

colindean left a comment

A few changes to the readme that I'd want to see.

Great work on the thorough tests. An improvement opportunity I see would reduce duplication inside of many of the tests. Move commonly used things into a test environment trait or collection of them and extend that environment in the test, e.g.

//old
it("minValue") {
  val dict = new VarSubstitution
  //… the rest
}

//new
it("minValue") in new VarSubstutitionSetup {
  //… the rest
}

trait VarSubstitutionSetup {
  val dict = new VarSubstitution
}

This is a really shallow example, but there are some other duplicate things or near duplicates that you could move into this trait or other traits, e.g. VarSubstitutionSetup with YamlDecoder and that latter trait is defined as

trait YamlDecoder { 
  def decodeYaml(yaml: String) = io.circe.yaml.parser.parse(yaml).right.getOrElse(Json.Null)
}

This "cake building" would let you reduce your tests in many cases to only the assertions, maybe with one additional line that calls methods provided by the test environment traits.

README.md Outdated Show resolved Hide resolved

README.md Outdated Show resolved Hide resolved

README.md Outdated Show resolved Hide resolved

dougb reviewed

View reviewed changes

Collaborator

dougb left a comment •

edited

Loading

@samratmitra-0812 This is great!

The tests are pretty consistent with the existing code, but @colindean comments are good. I think we should have a separate pr to cleanup the test like what Colin suggests. I tried to add some handy test functions in #13 but something like what colin suggested might be better.

I would be fine with just making the suggested changes to the README.md and getting this merged, and fixing the tests in a future pr after #13 is merged.

samratmitra-0812 and others added 3 commits

June 25, 2019 12:26


          Update README.md

d360227

Co-Authored-By: Colin Dean <[email protected]>


          Update README.md

5087f20

Co-Authored-By: Colin Dean <[email protected]>


          Update README.md

d34bc09

Co-Authored-By: Colin Dean <[email protected]>

Collaborator Author

samratmitra-0812 commented Jun 25, 2019

@dougb I completely agree with you. We should really take up on colin's suggestions and do the refactoring. Let us make sure we have a JIRA ticket for this.

I have commited the suggestions to README.md.

dougb approved these changes

View reviewed changes

Collaborator

dougb left a comment •

edited

Loading

👍
I might have found a problem testing this on BR, need to dig a little more. Don't merge it yet.

phpisciuneri approved these changes

View reviewed changes

Contributor

phpisciuneri left a comment

@samratmitra-0812 This is really nice, thank you for contributing. I don't like the ordering assumption we make on the asserts on the tests. I am curious to hear what @dougb thinks. I am OK with addressing this in the test refactoring PR.

src/main/scala/com/target/data_validator/validator/StringLengthCheck.scala Outdated

+              import org.apache.spark.sql.DataFrame
+              import org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute
+              import org.apache.spark.sql.catalyst.expressions._
+              import org.apache.spark.sql.types.{DataType, IntegerType, StringType, StructType}

Contributor

phpisciuneri Jun 25, 2019

@samratmitra-0812 DataType is an unused import

src/test/scala/com/target/data_validator/validator/StringLengthCheckSpec.scala Outdated

+              import com.target.TestingSparkSession
+              import com.target.data_validator._
+              import com.target.data_validator.validator.ValidatorBase.D0

Contributor

phpisciuneri Jun 25, 2019

unused import

src/test/scala/com/target/data_validator/validator/StringLengthCheckSpec.scala Outdated

+                  describe("configCheck") {
+                    it("error if min and max are not defined") {
+                      val dict = new VarSubstitution

Contributor

phpisciuneri Jun 25, 2019

dict is unused

src/test/scala/com/target/data_validator/validator/StringLengthCheckSpec.scala Outdated

+                    }
+                    it("error if min and max are different types") {
+                      val dict = new VarSubstitution

Contributor

phpisciuneri Jun 25, 2019

dict is unused

src/test/scala/com/target/data_validator/validator/StringLengthCheckSpec.scala Outdated

+                    }
+                    it("error if column is not found in df") {
+                      val dict = new VarSubstitution

Contributor

phpisciuneri Jun 25, 2019

dict is unused

src/test/scala/com/target/data_validator/validator/StringLengthCheckSpec.scala Outdated

+                    }
+                    it("error if col type is not String") {
+                      val dict = new VarSubstitution

Contributor

phpisciuneri Jun 25, 2019

dict is unused

src/test/scala/com/target/data_validator/validator/StringLengthCheckSpec.scala Outdated

+                    }
+                    it("minValue less than maxValue fails configCheck") {
+                      val dict = new VarSubstitution

Contributor

phpisciuneri Jun 25, 2019

dict is unused

src/test/scala/com/target/data_validator/validator/StringLengthCheckSpec.scala Outdated

+                      assert(sut.failed)
+                      assert(sut.getEvents contains
+                        ValidatorCheckEvent(failure = true, "StringLengthCheck on column 'item'", 4, 2))
+                      assert(sut.getEvents contains

Contributor

phpisciuneri Jun 25, 2019

@samratmitra-0812 @dougb I think we need to be careful here, and in other tests below.

This assert is based on the original ordering of defData. There is no guarantee that when converted to a dataframe, partitioned, and operated on that we will get results back in the same order. Since there are two potential modes of failure here this result could very well be the other one (empty string).

https://issues.apache.org/jira/browse/SPARK-16207?focusedCommentId=15349509&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-15349509

SamratMitra added 2 commits

June 26, 2019 13:59


          Removed unused imports and variables

534734d


          Changes to README

7e8ab17

Collaborator Author

samratmitra-0812 commented Jun 26, 2019 •

edited

Loading

@phpisciuneri Thanks for your comments. I have removed the unused imports and variables.

Nice catch on the assert ! I have made modifications by checking that exactly one of the 2 errors is present in the event list.

The issue does not exist for other tests because for all of them, the number of invalid rows is equal to the specified value of numErrorsToReport. So we can expect errors for all the invalid rows to be present in the list.


          Unit Test Fix:Check that exactly one of the 2 errors are present

50a6416

colindean approved these changes

View reviewed changes

Collaborator

colindean left a comment •

edited

Loading

I'll clear my RC if you'll refactor the tests in a future PR 😉

dougb merged commit 1743579 into master

dougb deleted the StringLengthCheck branch

June 27, 2019 01:16

dougb mentioned this pull request

Refactor tests using traits #16

Open

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet