Hello,
I'm using the newspaper3k package to parse the following article: https://spectrum.ieee.org/3d-printed-meat
I debugged it until I reached the `ContentExtractor.nodes_to_check` method, where I saw that it executes the following: `items = self.parser.getElementsByTag(doc, tag=tag)`
When `tag = 'p'`, I get 75 elements, and they do not include the article text. By comparison, when I use BeautifulSoup with `soup.find_all('p')`, I get 76 elements with the right text.
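As a parser-independent baseline for the two counts, a minimal sketch along these lines could list the `<p>` elements found in the raw markup using only the standard library. `ParagraphCounter` and `extract_paragraphs` are hypothetical names for illustration, not part of newspaper3k or BeautifulSoup, and the `sample` string merely stands in for the downloaded page source.

```python
from html.parser import HTMLParser


class ParagraphCounter(HTMLParser):
    """Collect the text of every <p> element seen in the raw markup.

    Hypothetical helper for illustration only.
    """

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.paragraphs = []   # text of each <p>, in document order
        self._current = None   # index of the <p> currently open, if any

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            # A new <p> implicitly closes any <p> that is still open.
            self.paragraphs.append("")
            self._current = len(self.paragraphs) - 1

    def handle_endtag(self, tag):
        if tag == "p":
            self._current = None

    def handle_data(self, data):
        # Text inside inline children (e.g. <b>) is credited to the open <p>.
        if self._current is not None:
            self.paragraphs[self._current] += data


def extract_paragraphs(html):
    """Return the text of each <p> in the given HTML string."""
    parser = ParagraphCounter()
    parser.feed(html)
    parser.close()
    return parser.paragraphs


# Stand-in for the real page source (assumption: replace with the saved HTML).
sample = "<html><body><p>First</p><div><p>Second <b>bold</b></p></div></body></html>"
paragraphs = extract_paragraphs(sample)
print(len(paragraphs))
```

Running this on a saved copy of the article and comparing the result with the 75 and 76 element lists should show which `<p>` one parser sees and the other does not.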
Can you please help me understand the problem?
Thank you.