Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes 414 request-uri too long for import_articles_by_title by using a POST request .2 #6210

Conversation

empty-codes
Copy link
Contributor

What this PR does

Fixes 414 request-uri too long for import_articles_by_title by using a POST request to send the query parameters in the request.body instead of a GET request:

  • Adds kn.wikipedia.org to list of stubbed wiki validations
  • Adds 3 API post methods, api_post, do_post and too_many_requests for creating and sending the post requests
  • Refactors import_articles_by_title since response is now a hash
  • Adds a test that confirms 414 request-uri too long no longer occurs and a new vcr-cassette some_article_titles.yml

Adds TYPICAL_ERRORS array and safe navigation operator &. to results = response&.dig('query', 'pages') to prevent:

  • NameError: uninitialized constant ArticleImporter::TYPICAL_ERRORS when ApiErrorHandling sets the error level
  • NoMethodError: undefined method dig for nil:NilClass results = response.dig('query', 'pages') when response is nil ( for example when methods that call import_articles_by_title are mocked, the api_post response becomes nil)

The idea of api_post and do_post was gotten from a similar implementation in replica.rb and I tried to follow the flow of wiki_api.rb too.

Screenshots

Before:
image

After:
image
image

…ing a POST request to send the query parameters in the `request.body` instead of a GET request:

- Adds `kn.wikipedia.org` to list of stubbed wiki validations
- Adds 3 API post methods, `api_post`, `do_post` and `too_many_requests` for creating and sending the post requests
- Refactors `import_articles_by_title` since response is now a hash
- Adds a test that confirms `414 request-uri too long` no longer occurs
….` to `results = response&.`:

- Prevents:  'NameError:
       uninitialized constant ArticleImporter::TYPICAL_ERRORS' when `ApiErrorHandling` sets the error level
- Prevents: 'NoMethodError:
       undefined method `dig' for nil:NilClass
             results = response.dig('query', 'pages')
                               ^^^^' when response is nil ( for example when methods that call `import_articles_by_title` are mocked, the api_post response becomes `nil`)
@@ -60,4 +66,44 @@ def import_articles_from_title_query(results)
end
Article.import articles, on_duplicate_key_update: [:title, :namespace]
end

###############
# API methods #
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is all about interacting with the MediaWiki API, independently of ArticleImporter, so it probably should not be part of this class. It should be either in WikiApi or in a new WikiPostApi.


def do_post(api_url, params)
uri = URI(api_url)
https = Net::HTTP.new(uri.host, uri.port)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you test whether you can do a post request with the mediawiki_api gem?

It is not well-documented, but it looks like it will support it as a parameter: https://github.com/wikimedia/mediawiki-ruby-api/blob/master/lib/mediawiki_api/client.rb#L157

I think that means you can add http_method: :post as a query parameter, and it will make it as a POST request instead of a GET. If that works, maybe just adding the extra parameter will make it work without re-implementing any API methods.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was the first way I tried and it did not work so I used this way instead. However, your comment prompted me to retry and turns out I simply implemented it wrongly the first time. I was able to get it to work by simply adding the extra parameter (although I had to create a separate meta method in wiki_api since it kept error-ing). I created a new PR to that effect: #6212

@@ -67,5 +68,50 @@
expect(Article.find_by(title: 'Istanbul').mw_page_id).to eq(3391396)
end
end

it 'returns the correct number of titles without raising error 414' do
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a great test. It should include a comment explaining in a little more detail what the test is doing and why the list of titles is what it is.

@empty-codes empty-codes changed the title Fixes 414 request-uri too long for import_articles_by_title by using a POST request Fixes 414 request-uri too long for import_articles_by_title by using a POST request .2 Feb 25, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants