Webscraping Code Outputted Nothing to Shell #21

wesleyhedrick · 2018-06-16T15:13:02Z

I am really excited about the potential of webscraping! But when I ran the webscraping code from chapter 20, it output nothing to the interactive shell. When I pressed run, the interactive shell window came to the foreground, but there was nothing in it.

In case I had mistyped the code, I decided to copy and paste right from tinyurl. Still nothing.

Please help.

calthoff · 2018-06-22T02:50:40Z

Hello! I rechecked the web scraper code and it is working just fine on my end. Please post a question (and make sure to include your actual code) in the Self-Taught Programmers Facebook group: https://www.facebook.com/groups/selftaughtprogrammers/.

jsteve427 · 2018-07-22T21:08:01Z

I'm having the same problem. The code runs, but it doesn't print anything, despite copy/pasting the code from here. I even posted a question to the FB group and the responses unfortunately didn't help. I first tried running the code on Win. 10 then Linux Mint 18 (currently don't have access to a Mac) to see if that would change anything, but it didn't.

John-m555 · 2018-07-28T05:17:43Z

I have the same problem, so I added one line to print each "url", as below.

    for tag in sp.find_all("a"):
        url = tag.get("href")
        if url is None:
            continue
        # added: print every href for debugging (after the None check,
        # because "\n" + None would raise a TypeError)
        print("\n" + url)
        if "html" in url:
            print("\n" + url)

With this addition, you will get a list of every "url". Copy the output into a text editor and try to find any URL that is actually shown on https://news.google.com. You will see that none of the news article links on the page appear in the list of "url" values.

It means none of the "href" values stored in "url" match the actual URLs on the Google News home page. Did we get the right "sp" from BeautifulSoup?
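The check described above can be sketched without touching the network. This is only a minimal illustration, not the book's scraper: it swaps BeautifulSoup for the standard library's html.parser, and the LinkCollector class, the summarize_links helper, and the SAMPLE markup are all made up for the example. It collects every "href" and counts how many contain "html".

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # attrs is a list of (name, value) pairs; .get returns None
            # when the tag has no href, just like tag.get("href") in bs4.
            self.hrefs.append(dict(attrs).get("href"))


def summarize_links(page_html):
    """Return (all hrefs, hrefs containing "html") for a page's HTML."""
    collector = LinkCollector()
    collector.feed(page_html)
    hrefs = [h for h in collector.hrefs if h is not None]
    matching = [h for h in hrefs if "html" in h]
    return hrefs, matching


# Invented sample markup standing in for a downloaded page.
SAMPLE = """
<a href="./articles/abc123?hl=en-US">Story</a>
<a name="top">No href</a>
<a href="https://example.com/story.html">Direct</a>
"""

hrefs, matching = summarize_links(SAMPLE)
print(len(hrefs), "links found,", len(matching), "contain 'html'")
```

If the second number is zero for a real page, the scraper's filter is the reason nothing gets printed, not the download or the parsing.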

John-m555 · 2018-07-28T05:42:22Z

Guys,
Now I found what's wrong. Our program is not wrong; the HTML of https://news.google.com must have changed after the book was written. As I stated above, the current page's links don't contain "html", because the list of "url" values doesn't include any of the article URLs.

Then, try changing the URL as below. You will get a complete list of the links on the Yahoo home page, the same as in the book (this works only as of 28 Jul 2018).

    news = "https://www.yahoo.com/"
    Scraper(news).scrape()

sui74 · 2018-12-25T02:46:52Z

    news = "https://www.yahoo.com/"
    Scraper(news).scrape()

The code worked correctly, thank you.

EvanKardos42 · 2019-01-25T01:50:10Z

This is a simple fix, and I believe it is a good way to practice your problem-solving skills, as it just requires you to think about what is happening in the algorithm and what it is looking at.

SPOILERS below:

Here is the thing about the web scraper: it looks for links that contain "html", but Google uses a link to a link (it's weird). For example, if you go to the website and copy the first link (1/24/2019), you get this:

"https://news.google.com/articles/CAIiEFK2EuxWltO3z6pUctxA2HwqFwgEKg8IACoHCAowjuuKAzCWrzww9oEY?hl=en-US&gl=US&ceid=US%3Aen"

That is not the link to the actual article itself. This is:
"https://www.nytimes.com/2019/01/24/us/politics/senate-vote-fails-shutdown.html"

It's a link to a link, and I don't know why Google does this.
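To make the difference concrete, here is a small sketch that runs the scraper's "html" test on the two links quoted above. The looks_like_article helper is hypothetical, and treating "/articles/" in the path as the sign of a Google News link is only an assumption based on that one example URL; Google may change it at any time.

```python
from urllib.parse import urlparse

google_link = (
    "https://news.google.com/articles/"
    "CAIiEFK2EuxWltO3z6pUctxA2HwqFwgEKg8IACoHCAowjuuKAzCWrzww9oEY"
    "?hl=en-US&gl=US&ceid=US%3Aen"
)
article_link = (
    "https://www.nytimes.com/2019/01/24/us/politics/"
    "senate-vote-fails-shutdown.html"
)

# The scraper's original test: only the direct article link passes.
print("html" in google_link)
print("html" in article_link)


def looks_like_article(url):
    """Hypothetical filter: accept direct .html links and, as an
    assumption, Google News links whose path contains /articles/."""
    return "html" in url or "/articles/" in urlparse(url).path


print(looks_like_article(google_link))
print(looks_like_article(article_link))
```

Swapping a check like this in for the "html" test would make the scraper print the Google redirect links, but they would still need to be followed to reach the real article.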
