
Question - Twitter support #12

Open
FuchsiaSoft opened this issue May 10, 2016 · 8 comments
@FuchsiaSoft

I've just stumbled across this and love it... but I'd like to add support for tweeting out broken links automatically. Similar to the slack option currently just another platform for it I guess.

I've got experience working with Twitter's API and associated .Net libs for it so can't see it being particularly tricky, just wanted to see if it would be a welcome addition from your point of view before I go off and fork etc.

If the addition would be welcome let me know and I'll provide an outline of how I'd plan on doing it. 😃

@hmol
Owner

hmol commented May 10, 2016

Yeah, I guess if you see it as a useful feature then go right ahead and fork me :neckbeard:
One thing: if the crawler finds 1000 broken links, it will tweet 1000 tweets at the same time. Do you think this could be a problem?

@FuchsiaSoft
Author

Yes, it definitely would be a problem... Twitter's rate limit is 15 updates per window, which conveniently is 15 minutes long, and the API key can also get revoked for posting duplicate messages in quick succession. So there would need to be a message queue or similar.

The options I see are:

  1. Message queue that posts them in line with Twitter's rate limits. This is doable since the Twitter API returns a header in each response saying how many requests are "left" for the current key. I'd also need some sort of persistence for relatively recent tweets to make sure duplicates aren't posted in sequence, maybe a local DB to make it resilient across restarts etc.
  2. Aggregate results into a report and have the tweet just reference a link to that. This could then be linked to a twitter account which acts as a bot that people can tweet to and request a crawl of a website, with the bot answering them back on twitter directly when complete.
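Option 1 above is concrete enough to sketch. The following is a minimal illustration in Python (the project itself is C#, and `post_fn` is a placeholder for a real Twitter client call) of a queue that respects a per-window rate limit and skips recently seen duplicates:

```python
import time
from collections import deque

class TweetQueue:
    """Queue tweets, respecting a per-window rate limit and skipping
    recent duplicates. Sketch only: post_fn stands in for a real
    Twitter API call, and dedup state is in memory rather than a DB."""

    def __init__(self, post_fn, limit=15, window_seconds=15 * 60, dedup_size=100):
        self.post_fn = post_fn
        self.limit = limit                       # max posts per window
        self.window = window_seconds             # window length in seconds
        self.sent_times = deque()                # timestamps of recent posts
        self.recent = deque(maxlen=dedup_size)   # recently posted messages
        self.pending = deque()                   # messages waiting to go out

    def enqueue(self, message):
        # Skip anything already posted recently or already queued.
        if message not in self.recent and message not in self.pending:
            self.pending.append(message)

    def drain(self, now=None):
        """Post as many pending tweets as the rate limit allows right now."""
        now = time.time() if now is None else now
        # Forget timestamps that have fallen out of the current window.
        while self.sent_times and now - self.sent_times[0] >= self.window:
            self.sent_times.popleft()
        posted = []
        while self.pending and len(self.sent_times) < self.limit:
            msg = self.pending.popleft()
            self.post_fn(msg)
            self.sent_times.append(now)
            self.recent.append(msg)
            posted.append(msg)
        return posted
```

A scheduler would call `drain()` periodically; anything over the limit simply waits for the next window. Persisting `recent` and `pending` to a local DB would make it survive restarts, as described above.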

I'll have a detailed look at the setup of your current code base and see how I can go about fitting it in with minimal disruption.

Also, the Twitter thing I think would mainly be fun to do, but I'm also considering this for a real-world intranet, having it send the results via SMTP from an internal server. So I'll probably put that in too.

Thanks for being so welcoming by the way!... social coding for the win! 👍

@hmol
Owner

hmol commented May 10, 2016

Alternative 1 sounds difficult. Alternative 2 is a great idea. But where will you host the generated report?

Btw: SMTP as an output sounds very useful. You could gather all the broken links into a report and just send one mail (not one mail per broken link).
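The one-mail-per-report idea is straightforward. A small sketch in Python (the project is C#, and the addresses here are made up) that builds a single message from a list of broken links:

```python
from email.mime.text import MIMEText

def build_report_email(broken_links, sender, recipient):
    """Build one email summarising all broken links found in a crawl.
    broken_links is a list of (url, status) pairs; addresses are
    placeholders for illustration."""
    lines = [f"{url} -> {status}" for url, status in broken_links]
    body = "Broken links found:\n\n" + "\n".join(lines)
    msg = MIMEText(body)
    msg["Subject"] = f"LinkCrawler report: {len(broken_links)} broken links"
    msg["From"] = sender
    msg["To"] = recipient
    return msg

# Sending is then a single SMTP call at the end of the crawl, e.g.:
# with smtplib.SMTP("smtp.example.internal") as s:
#     s.send_message(msg)
```

The point being that the crawler aggregates first and talks to the mail server exactly once, instead of once per broken link.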

@FuchsiaSoft
Author

Agreed re: alternatives 1 and 2... actually, as I was typing it I realised the same.

For hosting the report I'd probably go with Pastebin, which has a handy API for just such use cases... or maybe GitHub gists (I've never even checked if they're publicly accessible without a GitHub account, but I assume they are).
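For the Pastebin route, creating a paste is a single POST. A sketch in Python that just builds the request body without sending it (parameter names are as I recall them from Pastebin's API docs; `api_key` is a placeholder you'd get from your Pastebin account):

```python
from urllib.parse import urlencode

# Pastebin's paste-creation endpoint, per their API documentation.
PASTEBIN_API = "https://pastebin.com/api/api_post.php"

def build_pastebin_request(report_text, api_key, title="LinkCrawler report"):
    """Build the URL and form-encoded POST body for creating a paste.
    Sketch only: api_key is a placeholder, and the actual HTTP POST
    is left to whatever client the caller prefers."""
    params = {
        "api_dev_key": api_key,        # your Pastebin developer key
        "api_option": "paste",         # operation: create a new paste
        "api_paste_code": report_text, # the report body itself
        "api_paste_name": title,
        "api_paste_private": "1",      # 1 = unlisted, viewable via link only
    }
    return PASTEBIN_API, urlencode(params)
```

On success Pastebin returns the paste URL in the response body, which is exactly what the bot would tweet back.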

And SMTP yes definitely just email the aggregate report, sorry if I didn't explain that point.

coolio... I'll crack on ASAP :)

p.s. I have a separate issue around proxy support, but I can raise that in a separate issue for tracking/clarity

@hmol
Owner

hmol commented May 10, 2016

I was not aware that you could use Pastebin for this; really great to avoid having to host the reports ourselves :bowtie:.
If I understand this correctly, there needs to be an instance of LinkCrawler running on a server someplace, receiving events from Twitter, finding broken links on the requested website, and then tweeting a response. Do you think I should continue to use an Azure WebJob for this, or did you have something else in mind? Should there be a limit on the number of crawled links?

PS: If you want to implement SMTP support, maybe you could create a separate issue for that as well?

@FuchsiaSoft
Author

Yes, it would need to run in an endless loop, essentially polling Twitter for mentions. There is a convenient endpoint for that exact purpose, so yeah, it's pretty much as simple as you describe.
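The polling loop itself is simple: remember the highest mention id seen and ask only for newer mentions on each pass. A sketch in Python (again, the project is C#; `fetch_mentions` stands in for the real mentions endpoint and is assumed to return mentions newest-first, as Twitter's timelines do):

```python
import time

def poll_mentions(fetch_mentions, handle_mention, interval=60, iterations=None):
    """Poll for new mentions, tracking since_id so each mention is
    handled exactly once. fetch_mentions(since_id) is a placeholder
    for the real API call; iterations=None means poll forever."""
    since_id = 0
    passes = 0
    while iterations is None or passes < iterations:
        mentions = fetch_mentions(since_id)   # only mentions newer than since_id
        for mention in reversed(mentions):    # handle oldest first
            handle_mention(mention)
            since_id = max(since_id, mention["id"])
        passes += 1
        if iterations is None or passes < iterations:
            time.sleep(interval)              # be polite to the rate limit
    return since_id
```

`handle_mention` is where the crawl gets kicked off and the report link tweeted back once it completes.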

For Twitter, the WebJob wouldn't be the way to go, but since we're going with the aggregated report option I'd suggest putting the Twitter side of things into its own application and moving the logic of LinkCrawler into a class library. The console program would still function as normal, but we could use LinkCrawler in other places too.

And I'd plan on running the Twitter side on a Raspberry Pi using Mono... but I'd need to make sure the code ports to Mono OK first. Would be a cool thing to have running though 😄

@hmol
Owner

hmol commented May 10, 2016

Cool! 👍 Let me know if you need anything, looking forward to your pull request 😄

@hmol
Owner

hmol commented May 12, 2016

Maybe it would be relevant to take a look at User streams https://dev.twitter.com/streaming/userstreams ?
