I am using the GNews get_full_article() function to extract the top_image from an Article. However, when I run this on my production server, it throws the error below:
ERROR: Article download() failed with HTTPSConnectionPool(host='indianexpress.com', port=443): Max retries exceeded with url: /article/idea-exchange/gautam-gambhir-idea-exchange-first-challenge-mcd-polls-change-narrative-bjp-doesnt-do-anything-8158944/ (Caused by ProxyError('Cannot connect to proxy.', OSError('Tunnel connection failed: 403 Forbidden'))) on URL https://indianexpress.com/article/idea-exchange/gautam-gambhir-idea-exchange-first-challenge-mcd-polls-change-narrative-bjp-doesnt-do-anything-8158944/
I searched through Google and ended up with this solution:
from newspaper import Article
from newspaper import Config
user_agent = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:78.0) Gecko/20100101 Firefox/78.0'
config = Config()
config.browser_user_agent = user_agent
url = "https://www.chicagotribune.com/nation-world/ct-florida-school-shooter-nikolas-cruz-20180217-story.html"
page = Article(url, config=config)
page.download()
page.parse()
print(page.text)
As the code above shows, I need to set a user agent and assign it to config.browser_user_agent so that my server's requests are not blocked. However, gnews.get_full_article() does not let me pass a config parameter. Is there any way to specify it? Am I missing something?
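For reference, here is a minimal sketch of the workaround I am considering: skip get_full_article() and call newspaper directly with the article URL, so the config can be passed in. It assumes get_full_article() really does not accept a config. The helper name fetch_article and the timeout default are my own, not part of GNews or newspaper.

```python
# Same user agent string as in the snippet above.
DEFAULT_USER_AGENT = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:78.0) "
    "Gecko/20100101 Firefox/78.0"
)


def fetch_article(url, user_agent=DEFAULT_USER_AGENT, timeout=10):
    """Download and parse `url` with newspaper, using a custom user agent.

    Hypothetical helper: not part of GNews or newspaper.
    """
    # Imported here so the module still loads where newspaper is not installed.
    from newspaper import Article, Config

    config = Config()
    config.browser_user_agent = user_agent
    config.request_timeout = timeout  # seconds

    article = Article(url, config=config)
    article.download()
    article.parse()
    return article
```

With this, fetch_article(url).top_image should give the image URL, where url is the article link taken from the GNews results (I am assuming each result exposes it). I have not confirmed that this fixes the ProxyError on the production server.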