This is a simple script that scrapes manga pages from websites and saves them to a folder on an AWS EC2 instance.

jjeanjacques10/manga-scrapper-api

Manga Page Scrapper



Report Bug · Request Feature

This is a simple script that scrapes manga pages from websites and saves them to a folder on an AWS EC2 instance. It has two components: an API and a consumer. The API is a Flask app that takes a chapter of a manga and, if that chapter has not already been scraped, sends a message to the consumer's SQS queue so the consumer scrapes the pages.

SQS Queue

SQS Message

{
    "source": "manga_livre",
    "manga": "Naruto",
    "chapter": "692"
}
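
The "enqueue only if not already scraped" step can be sketched as follows. This is a minimal illustration, not the project's actual code: the function names, the `send` callback, and the `<root>/<source>/<manga>/<chapter>` folder layout are all assumptions made for the example.

```python
import json
from pathlib import Path
from typing import Callable

# Source names taken from the endpoint tables in this README.
SUPPORTED_SOURCES = {"manga_livre", "muito_manga"}


def build_scrape_message(source: str, manga: str, chapter: str) -> str:
    """Serialize a scrape request in the SQS message format shown above."""
    if source not in SUPPORTED_SOURCES:
        raise ValueError(f"unsupported source: {source!r}")
    return json.dumps({"source": source, "manga": manga, "chapter": str(chapter)})


def request_chapter(
    storage_root: Path,
    source: str,
    manga: str,
    chapter: str,
    send: Callable[[str], None],
) -> bool:
    """Enqueue a scrape request unless the chapter folder already exists.

    Returns True when a message was sent and False when the chapter was
    already on disk. The folder layout is hypothetical.
    """
    chapter_dir = storage_root / source / manga / str(chapter)
    if chapter_dir.is_dir():
        return False
    send(build_scrape_message(source, manga, chapter))
    return True
```

In a deployment, `send` could wrap a boto3 SQS client call such as `sqs.send_message(QueueUrl=..., MessageBody=body)`; passing it in as a callback keeps the sketch runnable without AWS credentials.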

Endpoints

  • Get a single chapter page

GET /page

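| Query Param | Type   | Description                          |
|-------------|--------|--------------------------------------|
| source      | string | Required. manga_livre or muito_manga |
| manga       | string | Required. manga name                 |
| number      | string | Required. chapter number             |
| page        | string | Required. page number                |
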
  • Save a single chapter page on EBS

POST /page

| Form   | Type   | Description                          |
|--------|--------|--------------------------------------|
| source | string | Required. manga_livre or muito_manga |
| manga  | string | Required. manga name                 |
| number | string | Required. chapter number             |
| page   | string | Required. number of pages            |
| image  | file   | Required. image file                 |

  • Get a chapter

GET /chapter

| Query Param | Type   | Description                          |
|-------------|--------|--------------------------------------|
| source      | string | Required. manga_livre or muito_manga |
| manga       | string | Required. manga name                 |
| number      | string | Required. chapter number             |
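
As a rough client-side sketch, the query strings for the two GET endpoints above can be built with the standard library. The base URL is an assumption (Flask's default development address), and the helper names are made up for this example; the response format is not documented here, so the sketch stops at building the URL.

```python
from urllib.parse import urlencode

# Assumption: the API is reachable at Flask's default development address.
BASE_URL = "http://localhost:5000"


def page_url(source: str, manga: str, number: str, page: str,
             base_url: str = BASE_URL) -> str:
    """Build the URL for GET /page with its four required query parameters."""
    query = urlencode({"source": source, "manga": manga, "number": number, "page": page})
    return f"{base_url}/page?{query}"


def chapter_url(source: str, manga: str, number: str,
                base_url: str = BASE_URL) -> str:
    """Build the URL for GET /chapter with its three required query parameters."""
    query = urlencode({"source": source, "manga": manga, "number": number})
    return f"{base_url}/chapter?{query}"
```

`urlencode` takes care of escaping manga names that contain spaces or other special characters, so `page_url("manga_livre", "One Piece", "1", "3")` yields a query with `manga=One+Piece`.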

Sites Supported

Architecture

GitHub Actions

  • Variables to be set in the repository secrets
AWS_ACCESS_KEY_ID=
AWS_SECRET_ACCESS_KEY=
AWS_DEFAULT_REGION=
AWS_SECURITY_GROUP=
SSH_PRIVATE_KEY=
HOSTNAME=
USERNAME=
  • Workflow to deploy to EC2 instance

.github/workflows/deploy.yml

  • Script to configure the EC2 instance, install Docker, update nginx, and run the container

app.sh
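
Before triggering a deployment, it can help to confirm that none of the variables listed above is empty. The check below is a hypothetical helper, not part of the repository; it only inspects a mapping of names to values, such as `os.environ`.

```python
import os
from typing import List, Mapping

# Names copied from the list of repository secrets above.
REQUIRED_VARIABLES = (
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_DEFAULT_REGION",
    "AWS_SECURITY_GROUP",
    "SSH_PRIVATE_KEY",
    "HOSTNAME",
    "USERNAME",
)


def missing_variables(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required names that are unset or blank in `env`."""
    return [name for name in REQUIRED_VARIABLES if not env.get(name, "").strip()]
```

An empty list means every variable has a value; it says nothing about whether the values are valid credentials.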

Run Locally

Use docker-compose to run both the api and the consumer

docker-compose up --build --scale manga_consumer=10 -d

--scale manga_consumer=10 will run 10 consumers in parallel
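
The idea behind scaling the consumer is that several workers pull from one shared queue, so chapters are scraped concurrently. The sketch below simulates this in a single process: `queue.Queue` stands in for SQS and threads stand in for the consumer containers. All names are illustrative, and the real consumer's logic may differ.

```python
import json
import queue
import threading
from typing import Any, Callable, Iterable, List, Tuple


def run_consumers(
    messages: Iterable[str],
    worker_count: int,
    handle: Callable[[dict], Any],
) -> List[Tuple[int, Any]]:
    """Drain `messages` with `worker_count` threads.

    Returns a list of (worker_id, result) pairs, one per message.
    """
    pending: "queue.Queue[str]" = queue.Queue()
    for body in messages:
        pending.put(body)

    results: List[Tuple[int, Any]] = []
    lock = threading.Lock()

    def worker(worker_id: int) -> None:
        while True:
            try:
                body = pending.get_nowait()  # each message goes to one worker only
            except queue.Empty:
                return  # queue drained, worker exits
            outcome = handle(json.loads(body))
            with lock:
                results.append((worker_id, outcome))

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(worker_count)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

Unlike this in-process queue, a standard SQS queue provides at-least-once delivery, so a real consumer should tolerate occasionally receiving the same chapter twice.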

License

MIT

⚠ Attention ⚠

This project is for study purposes only; I do not encourage piracy. If you like the manga, buy it. If you want to read it for free, go to the official website. I am not responsible for any misuse of this project.


Developed by Jean Jacques Barros
