Build an API that fetches the latest YouTube videos for a given tag or search query and returns them in a paginated response, sorted in reverse chronological order of publishing date-time.
- The server should call the YouTube API continuously in the background (async) at a fixed interval (say 15 seconds) to fetch the latest videos for a predefined search query, and store the video data (specifically these fields: video title, description, publishing datetime, thumbnail URLs, and any other fields you require) in a database with proper indexes.
- A GET API that returns the stored video data in a paginated response, sorted in descending order of published datetime.
- It should be scalable and optimised.
The project consists of 4 containers, one for each service (Redis, FastAPI, Celery worker, and Celery beat).
API tokens are kept in a list and rotated. When a token expires or runs out of quota, the next token is used.
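The rotation described above can be sketched as follows. This is a minimal illustration, not the project's actual code: `QuotaExceeded`, `call_with_key_rotation`, and the `request` callable are hypothetical names, and the real worker may detect an exhausted key differently.

```python
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


class QuotaExceeded(Exception):
    """Hypothetical error meaning the current key can no longer be used."""


def call_with_key_rotation(keys: Sequence[str], request: Callable[[str], T]) -> T:
    """Try each key in order and return the first successful result."""
    last_error: Optional[Exception] = None
    for key in keys:
        try:
            return request(key)
        except QuotaExceeded as err:
            # This key is exhausted, so fall through to the next one.
            last_error = err
    raise RuntimeError("all API keys are exhausted") from last_error
```

Here `request` stands in for whatever function performs the YouTube call with a given key; it only needs to raise `QuotaExceeded` when the key is rejected.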
To run this project, you will need to add the following environment variables to your .env file
MONGO_DB_CREDENTIALS=
MONGO_DB_NAME=
MONGO_COLLECTION_NAME=
LAST_UPDATED_TIME_COLLECTION=
YOUTUBE_API_KEY1=
YOUTUBE_API_KEY2=
YOUTUBE_API_KEY3=
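As a rough sketch of how the three key variables could be collected into the list that is rotated through, assuming the values are already present in the process environment (for example, loaded from the .env file by Docker Compose). The helper name `load_youtube_keys` is hypothetical.

```python
import os
from typing import List, Mapping

KEY_NAMES = ("YOUTUBE_API_KEY1", "YOUTUBE_API_KEY2", "YOUTUBE_API_KEY3")


def load_youtube_keys(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the configured YouTube API keys, skipping unset or empty ones."""
    keys = [env[name] for name in KEY_NAMES if env.get(name)]
    if not keys:
        raise RuntimeError("set at least one YOUTUBE_API_KEY variable")
    return keys
```

Passing the environment as a parameter keeps the function easy to exercise with a plain dictionary.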
The project has a docker-compose file which can start the server and the cron job at the same time.
- Redis
- Celery Worker
- Celery Beat
- FastAPI web application
docker-compose up --build
- Thumbnails were stored in 3 separate fields; this has now been rectified.
- Code quality was flagged because database.py is duplicated in both apps. This is necessary because both apps talk to the same database. The other files differ between the apps and contain only what each container needs to scale independently.
- Video retrieval time was not taken into account when requesting the next batch of videos. I have added a function that fetches only videos published after the last successful fetch, so that no videos are missed.
- Duplicate responses were being shown. I have added a function that returns only unique responses, in reverse chronological order of publishing date-time.
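The last two fixes above can be sketched like this. It is an illustration under assumptions, not the repository's code: the field names `video_id` and `published_at` and all function names are hypothetical. The request parameters (`part`, `type`, `order`, `q`, `publishedAfter`) are those of the YouTube Data API v3 `search.list` endpoint, though the project may build its request differently.

```python
from datetime import datetime, timezone
from typing import Dict, Iterable, List


def parse_timestamp(value: str) -> datetime:
    """Parse a timestamp such as '2024-01-02T03:04:05Z'."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def build_search_params(query: str, last_fetch: datetime) -> Dict[str, str]:
    """Ask only for videos published after the last successful fetch.

    `last_fetch` must be timezone-aware.
    """
    published_after = last_fetch.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return {
        "part": "snippet",
        "type": "video",
        "order": "date",
        "q": query,
        "publishedAfter": published_after,
    }


def unique_latest_first(videos: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    """Drop repeated video ids, then sort newest first."""
    unique: Dict[str, Dict[str, str]] = {}
    for video in videos:
        # Keep the first copy seen for each id.
        unique.setdefault(video["video_id"], video)
    return sorted(
        unique.values(),
        key=lambda v: parse_timestamp(v["published_at"]),
        reverse=True,
    )
```

Deduplicating on read, as here, is one option; a unique index on the video id in the database would prevent duplicates at write time instead.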
Clone the project
git clone https://github.com/mogiiee/Fampay_task.git
Go to the project directory
cd Fampay_task
Set up and activate a virtual environment for the project (the activate command shown is for Linux/macOS):
python3 -m venv virtualenv
source virtualenv/bin/activate
Go to the web-server directory
cd web-server
Install dependencies
pip3 install -r requirements.txt
Start the server
uvicorn app.main:app --reload
Go to the url
http://localhost:8000/docs or http://127.0.0.1:8000/docs
You will see 2 endpoints:
- The root endpoint simply greets you.
- The second endpoint returns the unique data from the database in a paginated response. It also shows the current page and the total number of pages the data fits in.
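The shape of the paginated response described above might look like the following sketch. The function name `paginate`, the response keys, and the default page size are assumptions for illustration; the actual endpoint may name or compute them differently.

```python
import math
from typing import Any, Dict, Sequence


def paginate(items: Sequence[Any], page: int, page_size: int = 10) -> Dict[str, Any]:
    """Return one page of items plus the current page and total page count."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be at least 1")
    # An empty collection still counts as one (empty) page.
    total_pages = max(1, math.ceil(len(items) / page_size))
    start = (page - 1) * page_size
    return {
        "page": page,
        "total_pages": total_pages,
        "results": list(items[start:start + page_size]),
    }
```

For a large collection it would usually be better to push the skip and limit into the database query rather than slicing in memory.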
- Celery beat runs every 10 seconds; the interval can be changed.
- The search query is set to "music"; it can be changed in worker/app/celery_config.
- Get your YouTube API credentials from here
- Use 3 tokens to minimise the risk of errors in case one of them runs out of quota.
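For orientation, the interval and search query mentioned in the notes above could be expressed as a Celery beat schedule along these lines. This is only a sketch: the task path `app.tasks.fetch_latest_videos` and the constant names are hypothetical, and the actual contents of worker/app/celery_config may differ.

```python
# Values taken from the notes above; both can be changed.
FETCH_INTERVAL_SECONDS = 10.0
SEARCH_QUERY = "music"

# A Celery beat schedule is a plain dictionary. It would be assigned to
# `app.conf.beat_schedule` on the project's Celery application object.
beat_schedule = {
    "fetch-latest-videos": {
        "task": "app.tasks.fetch_latest_videos",  # hypothetical task path
        "schedule": FETCH_INTERVAL_SECONDS,       # run every N seconds
        "args": (SEARCH_QUERY,),
    },
}
```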