Python Libraries

Package Manager

  • pip (installs packages from PyPI, the Python Package Index)
  • poetry
  • pdm
  • uv

HTTP & Networking

  • requests
  • aiohttp

Parser

  • beautifulsoup4
  • m3u8

Browser Automation

  • selenium
  • playwright

Crypto

  • pycrypto (unmaintained; pycryptodome is its maintained, largely drop-in replacement)
  • pycryptodome

OCR

  • Tesseract (⭐️58.1k): Tesseract is an open-source OCR engine written in C++ and one of the most popular OCR tools used from Python, through wrappers such as pytesseract. It supports over 100 languages and can extract text from various image formats.
  • pytesseract (⭐5.5k): pytesseract is a Python wrapper around the Tesseract OCR engine. It provides a simple interface for extracting text from images, but the Tesseract binary must be installed separately.
  • OpenCV (⭐️75.5k): OpenCV is a computer vision library that can be used for OCR tasks. It provides functions for image preprocessing, text detection, and character recognition.
  • EasyOCR (⭐22k): EasyOCR is a PyTorch-based OCR library for Python. It supports more than 80 languages and ships pre-trained models for detecting and recognizing text in images, so basic text extraction takes only a few lines of code.
  • ddddocr (⭐8.3k)
  • docTR (⭐3.1k)
  • Keras-OCR (⭐1.3k)
  • GOCR: GOCR is an OCR engine developed in C. It can be used in Python using the PyOCR library.
  • //OCRopus: OCRopus is a collection of document analysis and OCR tools. It includes the Tesseract OCR engine and provides additional features for document layout analysis and text extraction.
  • //PyOCR: PyOCR is a Python wrapper for several OCR engines, including Tesseract, CuneiForm, and GOCR.

OCR APIs

  • Google Cloud Vision API: Google Cloud Vision API is a cloud-based OCR service provided by Google. It offers advanced features like image classification, object detection, and handwriting recognition.
  • Microsoft Azure Computer Vision API: Microsoft Azure Computer Vision API is another cloud-based OCR service. It provides OCR capabilities along with other computer vision features such as image tagging and face detection.
  • Amazon Textract: Amazon Textract is a machine learning-based OCR service provided by Amazon Web Services. It can extract text and data from scanned documents, invoices, forms, and tables.

Date and Time

  • arrow

Data Access

Connection Pool

  • dbutils
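To show what a connection pool does, here is a minimal standard-library sketch. `SimplePool`, `make_conn`, and the fixed-size FIFO design are hypothetical illustrations, not the DBUtils API, and `sqlite3` stands in for a MySQL driver such as PyMySQL.

```python
import queue
import sqlite3
from contextlib import contextmanager
from typing import Callable, Iterator


class SimplePool:
    """Hypothetical fixed-size pool; illustrates the idea only, not DBUtils' API."""

    def __init__(self, creator: Callable[[], sqlite3.Connection], size: int = 2) -> None:
        self._idle: "queue.Queue[sqlite3.Connection]" = queue.Queue(maxsize=size)
        # Open every connection up front so callers never pay the connect cost.
        for _ in range(size):
            self._idle.put(creator())

    @contextmanager
    def connection(self, timeout: float = 5.0) -> Iterator[sqlite3.Connection]:
        # Block until a connection is free (raises queue.Empty after `timeout` seconds).
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            # Return the connection to the pool instead of closing it, so it is reused.
            self._idle.put(conn)

    def close_all(self) -> None:
        # Close every idle connection (connections currently checked out are not touched).
        while not self._idle.empty():
            self._idle.get_nowait().close()


def make_conn() -> sqlite3.Connection:
    # check_same_thread=False lets a pooled connection be handed to other threads.
    return sqlite3.connect(":memory:", check_same_thread=False)


pool = SimplePool(make_conn, size=1)
with pool.connection() as conn:
    result = conn.execute("SELECT 1 + 1").fetchone()[0]
    first = conn
with pool.connection() as conn:
    reused = conn is first  # the same connection object comes back from the pool
pool.close_all()
print(result, reused)  # prints: 2 True
```

Because connections go back into the queue rather than being closed, the second `with` block receives the same connection object. A production pool also has to detect and replace broken connections, which this sketch does not attempt.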

MySQL

  • PyMySQL
  • mysqlclient
  • aiomysql

ORM

  • peewee
  • Django ORM
  • SQLAlchemy
  • PonyORM
  • SQLObject
  • Tortoise ORM

Elasticsearch

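  • elasticsearch: the official Python client for Elasticsearch.
  • elasticsearch7: the official client published under a major-version-specific package name, so that several major versions can be installed side by side.
  • elasticsearch8: the same, for the 8.x client.
  • elasticsearch-dsl: a high-level library that helps with writing and running queries against Elasticsearch. It is built on top of the official low-level client (elasticsearch-py).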
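With only the low-level client, each query is a nested dict that mirrors Elasticsearch's JSON Query DSL; elasticsearch-dsl generates such bodies from chained Python calls like `Search().query("match", title="python")`. The sketch below builds a raw body by hand using a hypothetical helper, `build_search_body`. It only constructs the dict and does not contact a cluster, because the exact keyword arguments of the client's search call differ between major versions.

```python
import json
from typing import Any, Dict


def build_search_body(field: str, text: str, size: int = 10) -> Dict[str, Any]:
    """Hypothetical helper: a raw request body as a low-level client call would send it."""
    return {
        # Full-text "match" query on a single field.
        "query": {"match": {field: text}},
        # Maximum number of hits to return.
        "size": size,
    }


body = build_search_body("title", "python", size=5)
print(json.dumps(body, sort_keys=True))
# prints: {"query": {"match": {"title": "python"}}, "size": 5}
```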

Scientific Computing

  • NumPy

Data Science

  • Polars
  • Pandas

Artificial Intelligence

  • PyTorch