Serverless app for fetching public data from a company website URL
A single client-facing API endpoint accepts a "url". The site at that URL is scraped and its social links are collected. If the scraped data includes a Facebook link, the Facebook page’s “about” section is scraped as well for more data. The data is stored and then returned to the user in the same request. On a repeat request with the same URL, the site is not scraped again; the data stored from the previous request is returned instead.
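The flow described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the names `extract_social_links`, `collect`, and `SOCIAL_DOMAINS` are hypothetical, a plain dict stands in for the real data store, and the two scraper callables are supplied by the caller. The UUID version 1 id and the timestamp format are guesses based on the sample response.

```python
from datetime import datetime, timezone
from html.parser import HTMLParser
from urllib.parse import urlparse
import uuid

# Hypothetical mapping from response key to the domain that identifies it.
SOCIAL_DOMAINS = {
    "fb": "facebook.com",
    "linkedin": "linkedin.com",
    "twitter": "twitter.com",
    "instagram": "instagram.com",
    "youtube": "youtube.com",
    "pinterest": "pinterest.com",
}


class _LinkParser(HTMLParser):
    """Collects the href of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def extract_social_links(html):
    """Return the first link found per network, or "" when none is present."""
    parser = _LinkParser()
    parser.feed(html)
    links = {key: "" for key in SOCIAL_DOMAINS}
    for href in parser.hrefs:
        host = (urlparse(href).hostname or "").lower()
        for key, domain in SOCIAL_DOMAINS.items():
            if not links[key] and (host == domain or host.endswith("." + domain)):
                links[key] = href
    return links


def collect(url, store, fetch_html, scrape_facebook_about):
    """Return stored data for a known URL; otherwise scrape, store, and return."""
    if url in store:  # repeat request: skip scraping entirely
        return store[url]

    website = extract_social_links(fetch_html(url))
    # The Facebook "about" page is scraped only when a Facebook link was found.
    fb = scrape_facebook_about(website["fb"]) if website["fb"] else {}

    now = str(datetime.now(timezone.utc))
    record = {
        "id": str(uuid.uuid1()),  # assumption: id format guessed from the sample
        "url": url,
        "website": website,
        "fb": fb,
        "createdAt": now,
        "updatedAt": now,
    }
    store[url] = record
    return record
```

Passing the store and the scrapers in as arguments keeps the sketch free of network and database dependencies, so the caching behaviour can be checked in isolation.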
- Docker
- Docker Compose
$ docker-compose up --build
- Serverless for local development
- CloudFormation for maintaining the architecture as code.
- Lambda for implementing Function as a Service.
- DynamoDB for NoSQL storage of the scraped data.
URL: /dev/collector
Sample Request:
{
  "url": "https://mysite.com/"
}
Sample Response:
{
  "createdAt": "2020-10-09 03:05:46.949389+00:00",
  "website": {
    "fb": "https://www.facebook.com/mysite/",
    "linkedin": "https://www.linkedin.com/company/mysite/",
    "twitter": "https://twitter.com/mysite",
    "instagram": "https://www.instagram.com/mysite/",
    "youtube": "",
    "pinterest": ""
  },
  "id": "54894f08-09dc-11eb-ac4b-f39630d71423",
  "url": "https://mysite.com/",
  "fb": {
    "title": "My Site",
    "founded": "",
    "email": "[email protected]",
    "phone": "",
    "about": "Making My Site Simpler, Faster",
    "categories": "Financial Service",
    "likes": "35,338",
    "talking": "724",
    "awards": "Forbes FinTech 50 2020 American Banker's Best FinTechs to Work for 2020",
    "mission": "Focused on changing the way My Site works.",
    "products": "My Site"
  },
  "updatedAt": "2020-10-09 03:05:48.067641+00:00"
}
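A client could call the endpoint along these lines. This is a sketch under assumptions: the HTTP method (POST), the base URL `http://localhost:3000`, and the helper names `build_collector_request`, `found_links`, and `collect_links` are not stated in this README and are used only for illustration.

```python
import json
import urllib.request

# Assumption: the local stack listens here; replace with your own host and port.
BASE_URL = "http://localhost:3000"


def build_collector_request(site_url, base_url=BASE_URL):
    """Build a request for /dev/collector carrying the sample request body."""
    body = json.dumps({"url": site_url}).encode("utf-8")
    return urllib.request.Request(
        base_url + "/dev/collector",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",  # assumption: the README does not state the method
    )


def found_links(response_body):
    """Return only the social links that are non-empty in a response."""
    data = json.loads(response_body)
    return {name: link for name, link in data.get("website", {}).items() if link}


def collect_links(site_url, base_url=BASE_URL):
    """Send the request and return the non-empty social links (needs a running stack)."""
    request = build_collector_request(site_url, base_url)
    with urllib.request.urlopen(request) as response:
        return found_links(response.read())
```

With the stack running, `collect_links("https://mysite.com/")` would return the non-empty entries of the "website" object from the response.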
- Use a Task Queue to scrape data in the background
- Refactor to organize the code better
- Improve security and data validation
- Scrape the full text when "See More" appears on Facebook
MIT