Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Python enable markdown acceptance tests #64

Open
wants to merge 18 commits into
base: main
Choose a base branch
from

Conversation

temyers
Copy link
Contributor

@temyers temyers commented Nov 28, 2022

🤔 What's changed?

My version of #37

⚡️ What's your motivation?

Fix acceptance tests to support markdown in python

🏷️ What kind of change is this?

  • 🏦 Refactoring/debt/DX (improvement to code design, tooling, documentation etc. without changing behaviour)
  • 🐛 Bug fix (non-breaking change which fixes a defect)
  • ⚡ New feature (non-breaking change which adds new behaviour)
  • 💥 Breaking change (incompatible changes to the API)

♻️ Anything particular you want feedback on?

This: https://github.com/cucumber/gherkin/pull/64/files#r1033154806
Would appreciate a second pair of eyes on the approval tests.
These should be consistent across python and Javascript implementations (at least) (cc @aslakhellesoy)?

📋 Checklist:

  • I agree to respect and uphold the Cucumber Community Code of Conduct
  • I've changed the behaviour of the code
    • I have added/updated tests to cover my changes.
  • My change requires a change to the documentation.
    • I have updated the documentation accordingly.
  • Users should know about my change
    • I have added an entry to the "Unreleased" section of the CHANGELOG, linking to this pull request.

This text was originally generated from a template, then edited by hand. You can modify the template here.

python/gherkin/parser.py Outdated Show resolved Hide resolved
@mpkorstanje mpkorstanje changed the base branch from python-enable-markdown-acceptance-tests to main November 28, 2022 11:18
@mpkorstanje mpkorstanje marked this pull request as ready for review November 28, 2022 11:21
@mpkorstanje
Copy link
Contributor

Weird. I would have expected CI to run when I changed the target branch. I guess you'll have to push something for the build to trigger

@mpkorstanje mpkorstanje marked this pull request as draft November 28, 2022 11:22
@temyers temyers marked this pull request as ready for review December 27, 2022 09:13
@temyers
Copy link
Contributor Author

temyers commented Dec 27, 2022

@mpkorstanje
I think this is now ready for review.

I have acceptance tests passing locally (I think)

Will review any CI failures.

it('matches bullet Step', () => {
const line = new GherkinLine(' * Given I have 3 cukes', location.line)
const token = new Token(line, location)
assert(tm.match_StepLine(token))
assert.strictEqual(token.matchedType, TokenType.StepLine)
assert.strictEqual(token.matchedKeyword, 'Given ')
assert.strictEqual(token.matchedKeywordType, 'Context')
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added checks here since I had a bug when generating pickles.

Is this an appropriate place to be checking matchedKeywordType

assert.strictEqual(token.matchedText, 'I have 3 cukes')
assert.strictEqual(token.location.column, 6)
})

it('matches a when Step', () => {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added this test as I had a bug in my Python parser.

Is this an appropriate location to be testing matchedKeywordType

If these are too low level (white-box / implementation detail), then the tests should be moved/removed

@@ -186,4 +209,21 @@ describe('GherkinInMarkdownTokenMatcher', function () {
]
assert.deepStrictEqual(t.matchedItems, expectedItems)
})

it('matches arbitrary text as Empty after the FeatureLine has already been matched', () => {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is testing implementation detail, but I find it useful documentation for the token matcher - there is inherent state management here that I found confusing to begin with (resulting in a bug in my Python translation)

This behaviour is captured under the acceptance tests, but was useful for me for understanding where differences between implementations were.

Thoughts - is this useful to keep, or preferable to remove?

parser = Parser(TokenFormatterBuilder())
for file in files:
scanner = TokenScanner(file)
print(parser.parse(scanner))

if(file.endswith('.md')):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This logic is now spread across a few files (see source_events.py)

Is there a better way to encapsulate this behaviour?

Caveat from CONTRIBUTING.md:

TL;DR anyone who only knows one of the supported programming languages should be
able to fix a bug or add a feature in all the other implementations. - By virtue of
finding their way around a consistently organised codebase.

@@ -28,7 +29,10 @@ def enum(self, source_event):
source = source_event['source']['data']

try:
gherkin_document = self.parser.parse(source)
matcher=None
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comment above - DRY

}
}
return event


def _media_type(path):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comment above - DRY

Comment on lines 27 to +28
self._set_token_matched(token,None)
return False
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note based on CONTRIBUTING.md:

TL;DR anyone who only knows one of the supported programming languages should be
able to fix a bug or add a feature in all the other implementations. - By virtue of
finding their way around a consistently organised codebase.

There is a difference in behaviour for _set_token_matched between the JavaScript and Python implementations, one (javascript) returns true/false based on whether the token was matched (properly?), the other (python) returns None

This was a cause for confusion for me, and a source of some of the issues here.

@@ -6,7 +6,16 @@
from gherkin.gherkin_line import GherkinLine
location = { 'line': 1, 'column': 1 }

def test_it_matches_FeatureLine():
def test_it_matches_FeatureLineH1():
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Outside the scope of this PR, but I recently found pytest-describe that would make it easier to match the describe style of the javascript variant

I really prefer it - it enables me to structure my specifications more logically.

One to think about...

@@ -5,6 +5,7 @@ GHERKIN_PARSER = src/Parser.ts
GHERKIN_RAZOR = gherkin-javascript.razor
SOURCE_FILES = $(shell find . -name "*.js" | grep -v $(GHERKIN_PARSER))

# HELP - is this correnct now?
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mpkorstanje is this still correct with the package changes?

Looking at the changelog, it looks to have been moved here

That is different from the python implementation and now potentially causes a dependency issue? I think there's a conflict between the javascript and python implementations with minor differences between the ASTs - which I'm not sure how to resolve.

Comment on lines +135 to +138
token.matchedType = TokenType.Comment
token.matchedText = undefined
token.matchedIndent=0
result = true
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added these in an attempt to make the Python and Javascript versions consistent. But the Javascript acceptance tests currently fail because of my comment above regarding @cucumber/gherkin-streams

If I try to remove the matchedType from the Python implementation, it fails - but I'm not sure how to resolve.

Guidance here would be appreciated.

@@ -308,5 +309,128 @@ description

assert.strictEqual(pickle.steps[0].argument.docString.content, '```what')
})

it("parses Markdown data tables with headers", () => {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a re-implementation of the datatables.md acceptance test - once issues have been resolved then this test can probably be removed.

"column": 3,
"line": 7
},
"text": undefined
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This attribute is required, but conflicts with the Python implementation (see below)

]
},
"id": "2",
docString: undefined,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this attribute is required in the Javascript implementation, but not in the Python implementation.

}
assert ast == expected

def test_it_parses_markdown_data_tables_with_headers():
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-implementation of the datatables.md acceptance test.

Note comments above regarding Javascript implementation.

This test can probably be deleted once issues are resolved.

"column": 3,
"line": 7
},
"text": None
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note discrepancy with Javascript version and causes issues when executing acceptance tests.

To resolve:

  • Either add the text of the table header line OR
  • Remove the entry entirely - I don't know how to do this in the Python implementation (would need pointing in the right direction).

]
},
"id": "2",
# "docString": None,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note this is not included in this implementation - discrepancy with the Javascript implementation

@@ -1 +1 @@
{"gherkinDocument":{"comments":[],"feature":{"children":[{"scenario":{"description":"","examples":[],"id":"3","keyword":"Scenario","location":{"column":5,"line":3},"name":"minimalistic","steps":[{"dataTable":{"location":{"column":3,"line":6},"rows":[{"cells":[{"location":{"column":5,"line":6},"value":"foo"},{"location":{"column":11,"line":6},"value":"bar"}],"id":"0","location":{"column":3,"line":6}},{"cells":[{"location":{"column":5,"line":8},"value":"boz"},{"location":{"column":11,"line":8},"value":"boo"}],"id":"1","location":{"column":3,"line":8}}]},"id":"2","keyword":"Given ","keywordType":"Context","location":{"column":3,"line":5},"text":"a simple data table"}],"tags":[]}}],"description":"","keyword":"Feature","language":"en","location":{"column":4,"line":1},"name":"DataTables","tags":[]},"uri":"../testdata/good/datatables.feature.md"}}
{"gherkinDocument":{"comments":[{"location":{"column":3,"line":7},"text":null}],"feature":{"children":[{"scenario":{"description":"","examples":[],"id":"3","keyword":"Scenario","location":{"column":5,"line":3},"name":"minimalistic","steps":[{"dataTable":{"location":{"column":3,"line":6},"rows":[{"cells":[{"location":{"column":5,"line":6},"value":"foo"},{"location":{"column":11,"line":6},"value":"bar"}],"id":"0","location":{"column":3,"line":6}},{"cells":[{"location":{"column":5,"line":8},"value":"boz"},{"location":{"column":11,"line":8},"value":"boo"}],"id":"1","location":{"column":3,"line":8}}]},"id":"2","keyword":"Given ","keywordType":"Context","location":{"column":3,"line":5},"text":"a simple data table"}],"tags":[]}}],"description":"","keyword":"Feature","language":"en","location":{"column":4,"line":1},"name":"DataTables","tags":[]},"uri":"../testdata/good/datatables.feature.md"}}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note this updated version is using the python AST generator.

Using the Javascript implementation:
npx @cucumber/gherkin-streams --no-pickles --predictable-ids --no-source ../testdata/good/datatables.feature.md | jq --sort-keys --compact-output "."

{"gherkinDocument":{"comments":[],"feature":{"children":[{"scenario":{"description":"","examples":[],"id":"3","keyword":"Scenario","location":{"column":5,"line":3},"name":"minimalistic","steps":[{"dataTable":{"location":{"column":3,"line":6},"rows":[{"cells":[{"location":{"column":5,"line":6},"value":"foo"},{"location":{"column":11,"line":6},"value":"bar"}],"id":"0","location":{"column":3,"line":6}},{"cells":[{"location":{"column":5,"line":8},"value":"boz"},{"location":{"column":11,"line":8},"value":"boo"}],"id":"1","location":{"column":3,"line":8}}]},"id":"2","keyword":"Given ","keywordType":"Context","location":{"column":3,"line":5},"text":"a simple data table"}],"tags":[]}}],"description":"","keyword":"Feature","language":"en","location":{"column":4,"line":1},"name":"DataTables","tags":[]},"uri":"../testdata/good/datatables.feature.md"}}

But as per my previous comment, I don't think this takes into account my changes to javascript/src/GherkinInMarkdownTokenMatcher.ts

A quick grok suggests that the previous version is the Javascript implementation.

I'd like to remove the comment from this output - but not sure how to achieve it safely - my attempts so far have been unsuccessful.

@@ -1 +1 @@
{"gherkinDocument":{"comments":[],"feature":{"children":[{"scenario":{"description":"","examples":[],"id":"4","keyword":"Scenario","location":{"column":4,"line":7},"name":"minimalistic","steps":[{"id":"0","keyword":"Given ","keywordType":"Context","location":{"column":3,"line":8},"text":"the minimalism"}],"tags":[{"id":"1","location":{"column":2,"line":5},"name":"@scenario_tag1"},{"id":"2","location":{"column":19,"line":5},"name":"@scenario_tag2"},{"id":"3","location":{"column":4,"line":6},"name":"@scenario_tag3"}]}},{"scenario":{"description":"","examples":[{"description":"","id":"11","keyword":"Examples","location":{"column":5,"line":17},"name":"","tableBody":[{"cells":[{"location":{"column":5,"line":20},"value":"minimalism"}],"id":"7","location":{"column":3,"line":20}}],"tableHeader":{"cells":[{"location":{"column":5,"line":18},"value":"what"}],"id":"6","location":{"column":3,"line":18}},"tags":[{"id":"8","location":{"column":2,"line":15},"name":"@ex_tag1"},{"id":"9","location":{"column":13,"line":15},"name":"@ex_tag2"},{"id":"10","location":{"column":4,"line":16},"name":"@ex_tag3"}]},{"description":"","id":"17","keyword":"Examples","location":{"column":5,"line":24},"name":"","tableBody":[{"cells":[{"location":{"column":5,"line":27},"value":"more minimalism"}],"id":"13","location":{"column":3,"line":27}}],"tableHeader":{"cells":[{"location":{"column":5,"line":25},"value":"what"}],"id":"12","location":{"column":3,"line":25}},"tags":[{"id":"14","location":{"column":2,"line":22},"name":"@ex_tag4"},{"id":"15","location":{"column":13,"line":22},"name":"@ex_tag5"},{"id":"16","location":{"column":4,"line":23},"name":"@ex_tag6"}]}],"id":"21","keyword":"Scenario Outline","location":{"column":4,"line":12},"name":"minimalistic outline","steps":[{"id":"5","keyword":"Given ","keywordType":"Context","location":{"column":3,"line":13},"text":"the <what>"}],"tags":[{"id":"18","location":{"column":2,"line":10},"name":"@so_tag1"},{"id":"19","location":{"column":14,"line":10},"name":"@so_tag2"},{"id":"20","location":{"column":4,"line":11},"name":"@so_tag3"}]}},{"scenario":{"description":"","examples":[],"id":"23","keyword":"Scenario","location":{"column":4,"line":30},"name":"comments","steps":[],"tags":[{"id":"22","location":{"column":2,"line":29},"name":"@comment_tag1"}]}},{"scenario":{"description":"","examples":[],"id":"25","keyword":"Scenario","location":{"column":4,"line":34},"name":"hash in tags","steps":[],"tags":[{"id":"24","location":{"column":2,"line":33},"name":"@comment_tag#2"}]}},{"rule":{"children":[{"scenario":{"description":"","examples":[],"id":"28","keyword":"Scenario","location":{"column":5,"line":41},"name":"joined tags","steps":[],"tags":[{"id":"26","location":{"column":2,"line":40},"name":"@joined_tag3"},{"id":"27","location":{"column":16,"line":40},"name":"@joined_tag4"}]}}],"description":"","id":"30","keyword":"Rule","location":{"column":4,"line":38},"name":"","tags":[{"id":"29","location":{"column":2,"line":37},"name":"@rule_tag"}]}}],"description":"","keyword":"Feature","language":"en","location":{"column":3,"line":3},"name":"Minimal Scenario Outline","tags":[{"id":"31","location":{"column":2,"line":1},"name":"@feature_tag1"},{"id":"32","location":{"column":18,"line":1},"name":"@feature_tag2"},{"id":"33","location":{"column":4,"line":2},"name":"@feature_tag3"}]},"uri":"../testdata/good/tags.feature.md"}}
{"gherkinDocument":{"comments":[{"location":{"column":3,"line":19},"text":null},{"location":{"column":3,"line":26},"text":null}],"feature":{"children":[{"scenario":{"description":"","examples":[],"id":"4","keyword":"Scenario","location":{"column":4,"line":7},"name":"minimalistic","steps":[{"id":"0","keyword":"Given ","keywordType":"Context","location":{"column":3,"line":8},"text":"the minimalism"}],"tags":[{"id":"1","location":{"column":2,"line":5},"name":"@scenario_tag1"},{"id":"2","location":{"column":19,"line":5},"name":"@scenario_tag2"},{"id":"3","location":{"column":4,"line":6},"name":"@scenario_tag3"}]}},{"scenario":{"description":"","examples":[{"description":"","id":"11","keyword":"Examples","location":{"column":5,"line":17},"name":"","tableBody":[{"cells":[{"location":{"column":5,"line":20},"value":"minimalism"}],"id":"7","location":{"column":3,"line":20}}],"tableHeader":{"cells":[{"location":{"column":5,"line":18},"value":"what"}],"id":"6","location":{"column":3,"line":18}},"tags":[{"id":"8","location":{"column":2,"line":15},"name":"@ex_tag1"},{"id":"9","location":{"column":13,"line":15},"name":"@ex_tag2"},{"id":"10","location":{"column":4,"line":16},"name":"@ex_tag3"}]},{"description":"","id":"17","keyword":"Examples","location":{"column":5,"line":24},"name":"","tableBody":[{"cells":[{"location":{"column":5,"line":27},"value":"more minimalism"}],"id":"13","location":{"column":3,"line":27}}],"tableHeader":{"cells":[{"location":{"column":5,"line":25},"value":"what"}],"id":"12","location":{"column":3,"line":25}},"tags":[{"id":"14","location":{"column":2,"line":22},"name":"@ex_tag4"},{"id":"15","location":{"column":13,"line":22},"name":"@ex_tag5"},{"id":"16","location":{"column":4,"line":23},"name":"@ex_tag6"}]}],"id":"21","keyword":"Scenario Outline","location":{"column":4,"line":12},"name":"minimalistic outline","steps":[{"id":"5","keyword":"Given ","keywordType":"Context","location":{"column":3,"line":13},"text":"the <what>"}],"tags":[{"id":"18","location":{"column":2,"line":10},"name":"@so_tag1"},{"id":"19","location":{"column":14,"line":10},"name":"@so_tag2"},{"id":"20","location":{"column":4,"line":11},"name":"@so_tag3"}]}},{"scenario":{"description":"","examples":[],"id":"23","keyword":"Scenario","location":{"column":4,"line":30},"name":"comments","steps":[],"tags":[{"id":"22","location":{"column":2,"line":29},"name":"@comment_tag1"}]}},{"scenario":{"description":"","examples":[],"id":"25","keyword":"Scenario","location":{"column":4,"line":34},"name":"hash in tags","steps":[],"tags":[{"id":"24","location":{"column":2,"line":33},"name":"@comment_tag#2"}]}},{"rule":{"children":[{"scenario":{"description":"","examples":[],"id":"28","keyword":"Scenario","location":{"column":5,"line":41},"name":"joined tags","steps":[],"tags":[{"id":"26","location":{"column":2,"line":40},"name":"@joined_tag3"},{"id":"27","location":{"column":16,"line":40},"name":"@joined_tag4"}]}}],"description":"","id":"30","keyword":"Rule","location":{"column":4,"line":38},"name":"","tags":[{"id":"29","location":{"column":2,"line":37},"name":"@rule_tag"}]}}],"description":"","keyword":"Feature","language":"en","location":{"column":3,"line":3},"name":"Minimal Scenario Outline","tags":[{"id":"31","location":{"column":2,"line":1},"name":"@feature_tag1"},{"id":"32","location":{"column":18,"line":1},"name":"@feature_tag2"},{"id":"33","location":{"column":4,"line":2},"name":"@feature_tag3"}]},"uri":"../testdata/good/tags.feature.md"}}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note the differences for comments (likely another problem in waiting...)

@temyers
Copy link
Contributor Author

temyers commented Dec 27, 2022

@mpkorstanje Ok, I think I'm blocked on this now.

TL;DR

  • There's a conflict between the python and Javascript implementations for the AST
  • The Javascript acceptance tests are dependent on an external module meaning I can't update the implementation without a release (chicken and egg situation)
  • I'm not sure how to make the Python implementation match the Javascript version - it's creating tokens I don't want, but not sure how to stop it
  • I'm not sure which implementation is more 'correct' between Javascript and Python variants.

@davidjgoss
Copy link
Contributor

@temyers if you merge main into this branch now you should at least be past the JavaScript circular dep issue.

@mpkorstanje
Copy link
Contributor

Looks like this fell of the wagon at some point. I've tried to merge main into this PR but I'm seeing more conflicts than I can solve.

Is it worth salvaging this PR?

@kieran-ryan
Copy link
Member

Looks like this fell of the wagon at some point. I've tried to merge main into this PR but I'm seeing more conflicts than I can solve.

Is it worth salvaging this PR?

@mpkorstanje, haven't evaluated the scope of changes as such just yet, though for sure would be great to retain the contributions and review if something to take forward.

I was able to merge and resolve the conflicts from main locally - majorly conflicts with removal of duplicate code (#205), formatting (#286), type hints (#283) and migration of bin/ to scripts/ (#290); though intend to repeat and look more thoroughly when have more time before pushing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants