Welcome to this project! Here, you will work in the role of a Data Engineer.
The goal of this project is to put into practice the knowledge required to develop and run an API.
Application Programming Interface
is an interface that allows two applications to communicate with each other, independently of the underlying infrastructure. APIs are versatile and fundamental tools for building, for example, data pipelines, because they let you move data and give simple access to the data you want to expose through the different endpoints (the API's access points).
For this we use FastAPI, a modern, high-performance web framework for building APIs with Python.
The project consists of ingesting data from various sources, applying the relevant transformations, and then making the clean data available for querying through an API. The API will run in a dockerized environment.
The data is provided in files of different formats, such as CSV and JSON. Cleaning includes correcting data types and handling null and duplicate values, among other tasks. The datasets must then be related so that their information can be accessed through API queries.
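As an illustration of the cleaning step, the sketch below reads one CSV source and one JSON source with Python's standard library, treats empty strings as nulls, drops exact duplicate rows, converts the year to an integer, and splits a duration such as "90 min" or "2 Seasons" into a number and a unit. The column names (`show_id`, `type`, `title`, `release_year`, `duration`), the sample rows, and the helper names `parse_duration` and `normalize` are hypothetical; the real files may use different fields and need further transformations.

```python
import csv
import io
import json

# Hypothetical sample inputs; the real project files and columns may differ.
RAW_CSV = """show_id,type,title,release_year,duration
s1,Movie,Alpha,2020,90 min
s1,Movie,Alpha,2020,90 min
s2,TV Show,Beta,2019,2 Seasons
s3,Movie,Gamma,,
"""

RAW_JSON = """[
  {"show_id": "s4", "type": "Movie", "title": "Delta", "release_year": 2021, "duration": "105 min"}
]"""


def parse_duration(text):
    """Split '90 min' into (90, 'min') and '2 Seasons' into (2, 'season').

    Assumes every non-null duration is '<number> <unit>'.
    """
    if text is None:
        return None, None
    value, unit = text.split(maxsplit=1)
    return int(value), ("season" if unit.lower().startswith("season") else "min")


def normalize(rows):
    """Treat empty strings as nulls, drop exact duplicates, and fix column types."""
    seen = set()
    clean = []
    for row in rows:
        # Strip text values; an empty string becomes None (null).
        row = {k: (v.strip() or None) if isinstance(v, str) else v for k, v in row.items()}
        key = tuple(sorted(row.items()))
        if key in seen:  # exact duplicate of an earlier row
            continue
        seen.add(key)
        year = row.get("release_year")
        amount, unit = parse_duration(row.get("duration"))
        clean.append({
            **row,
            "release_year": int(year) if year is not None else None,
            "duration_int": amount,
            "duration_type": unit,
        })
    return clean


csv_rows = list(csv.DictReader(io.StringIO(RAW_CSV)))
json_rows = json.loads(RAW_JSON)
data = normalize(csv_rows) + normalize(json_rows)
for record in data:
    print(record["title"], record["release_year"], record["duration_int"], record["duration_type"])
```

Splitting the duration into `duration_int` and `duration_type` up front keeps later queries simple, since a query can then filter on the unit (minutes or seasons) and compare plain integers.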
The required queries are:
- Maximum duration by content type (movie/series), platform, and year. The request should be: get_max_duration(year, platform, [min or season])
- Number of movies and series (counted separately) by platform. The request should be: get_count_platform(platform)
- The platform on which a given genre appears most often, and the number of times it appears there. The request should be: get_listedin('genre')
- Actor who appears most often for a given platform and year. The request should be: get_actor(platform, year)
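A minimal sketch of the four queries as plain Python functions over an already cleaned, combined table. The row layout (`platform`, `type`, `release_year`, `duration_int`, `duration_type`, `listed_in`, `cast`), the lowercase values `"movie"` and `"tv show"`, the platform names, the sample data, and the return shapes (tuples and a dict) are assumptions made for illustration; only the function names and parameters come from the query list above.

```python
from collections import Counter

# Hypothetical cleaned table: one row per title, already tagged with its platform.
TITLES = [
    {"platform": "netflix", "type": "movie", "title": "Alpha", "release_year": 2020,
     "duration_int": 90, "duration_type": "min", "listed_in": "Comedy, Drama", "cast": "Ann Lee, Bo Kim"},
    {"platform": "netflix", "type": "movie", "title": "Beta", "release_year": 2020,
     "duration_int": 130, "duration_type": "min", "listed_in": "Drama", "cast": "Ann Lee"},
    {"platform": "netflix", "type": "tv show", "title": "Gamma", "release_year": 2020,
     "duration_int": 3, "duration_type": "season", "listed_in": "Comedy", "cast": "Cy Ray"},
    {"platform": "hulu", "type": "movie", "title": "Delta", "release_year": 2021,
     "duration_int": 100, "duration_type": "min", "listed_in": "Comedy", "cast": None},
]


def split_list(text):
    """Turn 'a, b' into ['a', 'b']; a missing value becomes an empty list."""
    return [part.strip() for part in text.split(",")] if text else []


def get_max_duration(year, platform, duration_type):
    """Title with the longest duration for a year, platform, and unit ('min' or 'season')."""
    rows = [r for r in TITLES
            if r["release_year"] == year and r["platform"] == platform
            and r["duration_type"] == duration_type]
    if not rows:
        return None
    best = max(rows, key=lambda r: r["duration_int"])
    return best["title"], best["duration_int"]


def get_count_platform(platform):
    """Movies and series counted separately for one platform."""
    types = Counter(r["type"] for r in TITLES if r["platform"] == platform)
    return {"movies": types["movie"], "series": types["tv show"]}


def get_listedin(genre):
    """Platform where a genre appears most often, with that count (case-insensitive)."""
    counts = Counter(r["platform"] for r in TITLES
                     if genre.lower() in (g.lower() for g in split_list(r["listed_in"])))
    return counts.most_common(1)[0] if counts else None


def get_actor(platform, year):
    """Most frequent actor for a platform and year, with the number of appearances."""
    actors = Counter(actor for r in TITLES
                     if r["platform"] == platform and r["release_year"] == year
                     for actor in split_list(r["cast"]))
    return actors.most_common(1)[0] if actors else None


print(get_max_duration(2020, "netflix", "min"))  # ('Beta', 130)
print(get_count_platform("netflix"))             # {'movies': 2, 'series': 1}
print(get_listedin("comedy"))                    # ('netflix', 2)
print(get_actor("netflix", 2020))                # ('Ann Lee', 2)
```

Keeping the query logic separate from the web layer means the functions can be checked without starting a server; each API endpoint can then simply call the matching function and return its result.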
The work is organized in these steps:
- Data ingestion and normalization
- Relate the datasets and create the table needed to run the queries. It is recommended to first check which data each query requires, and then concatenate the 4 tables.
- Create the API in a Docker environment
- Run the requested queries
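The concatenation step can be sketched as follows, assuming each of the 4 source files has already been loaded as a list of rows. The platform names, the `show_id` field, and the scheme of prefixing each ID with the platform's first letter are hypothetical choices: they tag every row with its origin and keep IDs unique once the tables are stacked.

```python
# Hypothetical per-platform tables (one list of rows per source file).
TABLES = {
    "amazon": [{"show_id": "s1", "title": "Alpha"}],
    "disney": [{"show_id": "s1", "title": "Beta"}],
    "hulu": [{"show_id": "s1", "title": "Gamma"}, {"show_id": "s2", "title": "Delta"}],
    "netflix": [{"show_id": "s1", "title": "Epsilon"}],
}


def concatenate(tables):
    """Stack the platform tables into one, tagging each row with its platform.

    show_id values repeat across sources, so each one is prefixed with the
    platform's first letter to keep IDs unique in the combined table.
    """
    combined = []
    for platform, rows in tables.items():
        for row in rows:
            combined.append({**row, "id": platform[0] + row["show_id"], "platform": platform})
    ids = [r["id"] for r in combined]
    if len(ids) != len(set(ids)):
        # Two platforms share an initial (or a source has repeated IDs).
        raise ValueError("duplicate ids after prefixing; choose distinct prefixes")
    return combined


combined = concatenate(TABLES)
print(len(combined))                # 5
print([r["id"] for r in combined])  # ['as1', 'ds1', 'hs1', 'hs2', 'ns1']
```

The explicit uniqueness check matters because the one-letter prefix only works while every platform has a different initial.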
Docker
is a complete platform for building, distributing, and running containers.
- Container
is a software-layer abstraction that packages code together with its libraries and dependencies in a partially isolated environment.
- Image
is an executable Docker package that contains everything needed to run an application, including configuration files, environment and runtime variables, and libraries.
- Dockerfile
is a text file with the instructions for building an image. It can be seen as a way to automate image creation.
Docker image with Uvicorn/Gunicorn for high-performance web applications:
FastAPI documentation: