Webscraping Code Outputted Nothing to Shell #21

Open
wesleyhedrick opened this issue Jun 16, 2018 · 6 comments
Comments

@wesleyhedrick

I am really excited about the potential of webscraping! But when I ran the webscraping code from chapter 20, it output nothing to the interactive shell. When I pressed run, the interactive shell window came to the foreground, but there was nothing in it.

In case I had mistyped the code, I decided to copy and paste right from tinyurl. Still nothing.

Please help.

@calthoff
Owner

Hello! I rechecked the web scraper code and it is working just fine on my end. Please post a question (and make sure to include your actual code) in the Self-Taught Programmers Facebook group: https://www.facebook.com/groups/selftaughtprogrammers/.

@jsteve427

I'm having the same problem. The code runs, but it doesn't print anything, despite copy/pasting the code from here. I even posted a question to the FB group and the responses unfortunately didn't help. I first tried running the code on Win. 10 then Linux Mint 18 (currently don't have access to a Mac) to see if that would change anything, but it didn't.

@John-m555

John-m555 commented Jul 28, 2018

I have the same problem, so I added one line to print each "url", as shown below.

    for tag in sp.find_all("a"):
        url = tag.get("href")
        print(url)  # added: list every href (may be None)
        if url is None:
            continue
        if "html" in url:
            print("\n" + url)

With this addition, you will get a list of every "url". Copy the output into a text editor and try to find a URL that is actually shown on https://news.google.com. You will see that none of the news article links on the page appear in the list of "url" values.

This means that none of the "href" values stored in "url" match the actual article URLs on the Google News home page. Did we get the right "sp" from BeautifulSoup?
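A minimal sketch of that check, in case it helps anyone reproduce it without a network connection. It swaps BeautifulSoup for the standard library's `html.parser` so it runs with no extra installs; the `HrefCollector` class, the `split_hrefs` helper, and the sample HTML are all made up for illustration and are not the book's code.

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collect the href of every <a> tag (None when the attribute is missing)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.hrefs.append(dict(attrs).get("href"))


def split_hrefs(page_source):
    """Return (all_hrefs, hrefs_containing_html) for a page's HTML source."""
    collector = HrefCollector()
    collector.feed(page_source)
    matching = [u for u in collector.hrefs if u is not None and "html" in u]
    return collector.hrefs, matching


# Hypothetical sample page: one relative link, one anchor without an href,
# and one link that contains "html".
sample = (
    '<a href="./articles/abc123?hl=en-US">story</a>'
    '<a name="top">no href</a>'
    '<a href="https://example.com/news/story.html">direct</a>'
)
all_hrefs, matching = split_hrefs(sample)
print(len(all_hrefs), "links found,", len(matching), "contain 'html'")
```

If the first number is large and the second is zero for a real page, the scraper is reading the page fine and it is only the "html" filter that rejects every link.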

@John-m555

John-m555 commented Jul 28, 2018

Guys,
I found what's wrong. Our program is fine, but the HTML of https://news.google.com must have changed after the book was written. As I noted above, none of the "url" values on the current page contain "html", so the scraper has nothing to print.

Try changing the URL as shown below. You will get a complete list of the links on the Yahoo home page, just as in the book. (This works only as of 28-Jul-2018.)

    news = "https://www.yahoo.com/"
    Scraper(news).scrape()

@sui74

sui74 commented Dec 25, 2018

    news = "https://www.yahoo.com/"
    Scraper(news).scrape()

The code worked correctly, thank you.

@EvanKardos42

This is a simple fix, and I believe it is a good way to practice your problem-solving skills, since it only requires you to think about what the algorithm is doing and what it is looking at.

SPOILERS below:

So here is the thing about the web scraper: it looks for links that contain "html", but Google News uses a link to a link (it's weird). For example, if you go to the site and copy the first link (as of 1/24/2019), you get this:

"https://news.google.com/articles/CAIiEFK2EuxWltO3z6pUctxA2HwqFwgEKg8IACoHCAowjuuKAzCWrzww9oEY?hl=en-US&gl=US&ceid=US%3Aen"

That is not the link to the actual article itself. This is:
"https://www.nytimes.com/2019/01/24/us/politics/senate-vote-fails-shutdown.html"

It's a link to a link, and I don't know why Google does this.
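To see why nothing prints, here is a small sketch that applies the same `"html" in url` test from the scraper loop to the two links quoted above. The `passes_filter` helper is just a name I made up to wrap that test; it is not part of the book's code.

```python
# The two links quoted above: Google's intermediate link and the real article.
google_link = (
    "https://news.google.com/articles/"
    "CAIiEFK2EuxWltO3z6pUctxA2HwqFwgEKg8IACoHCAowjuuKAzCWrzww9oEY"
    "?hl=en-US&gl=US&ceid=US%3Aen"
)
article_link = (
    "https://www.nytimes.com/2019/01/24/us/politics/"
    "senate-vote-fails-shutdown.html"
)


def passes_filter(url):
    """Mirror the scraper's check: skip missing hrefs, keep ones containing 'html'."""
    return url is not None and "html" in url


for link in (google_link, article_link):
    print(passes_filter(link), link)
```

The Google link fails the test and the nytimes.com link passes it, which matches the empty output people are seeing on the Google News page.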
