Improve Traffic Management with Machine Learning

A modular Python pipeline for analyzing, predicting, and optimizing urban traffic using machine learning and graph algorithms. This project demonstrates real-time traffic prediction, incident detection, congestion hotspot analysis, route optimization, and event impact analysis for smart city and transportation applications.


Features

  • Real-time Traffic Prediction: Predicts vehicle speed and volume using Random Forest regression.
  • Incident Detection: Identifies traffic anomalies (potential incidents) using Isolation Forest.
  • Congestion Hotspot Detection: Detects congestion clusters with DBSCAN clustering.
  • Route Optimization: Finds the fastest route between nodes using a weighted directed graph (see the sketch after this list).
  • Event Impact Analysis: Quantifies the effect of special events (e.g., concerts) on traffic speed.
  • Traffic Signal Optimization (Stub): Placeholder for RL/simulation-based signal optimization (SUMO, CityFlow, etc.).
  • Extensible: Easily add new features, data sources, or advanced models.
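
The route optimization feature builds on a weighted directed graph. A minimal sketch of that idea with networkx, using hypothetical intersection names and travel-time weights (the script's actual graph construction may differ):

import networkx as nx

# Build a small directed road graph; edge weights are assumed travel times in minutes.
G = nx.DiGraph()
G.add_weighted_edges_from([
    ("A", "B", 4.0),
    ("B", "C", 2.5),
    ("A", "C", 8.0),
    ("C", "D", 3.0),
])

# Dijkstra via networkx: fastest route and its total travel time.
route = nx.shortest_path(G, source="A", target="D", weight="weight")
travel_time = nx.shortest_path_length(G, source="A", target="D", weight="weight")
print(route, travel_time)  # ['A', 'B', 'C', 'D'] 9.5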

Tech Stack

Python, pandas, NumPy, scikit-learn (Random Forest, Isolation Forest, DBSCAN), NetworkX

Quick Start

1. Install Dependencies

pip install pandas numpy scikit-learn networkx

2. Prepare Data

  • Place your traffic data as traffic_data.csv in the working directory.
  • The CSV should have columns: timestamp, location_id, speed, volume, incident, weather, event.
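
With data in that shape, the speed-prediction step reduces to fitting a regressor on tabular features. A minimal sketch, not the project's exact feature engineering (it assumes location_id is numeric and derives a simple hour-of-day feature):

import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

df = pd.read_csv("traffic_data.csv", parse_dates=["timestamp"])
df["hour"] = df["timestamp"].dt.hour  # simple time-of-day feature

X = df[["location_id", "hour", "volume"]]  # assumed numeric predictors
y = df["speed"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = RandomForestRegressor(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
print("R^2 on held-out data:", model.score(X_test, y_test))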

3. Run the Script

python improve_traffic_management.py

Example Output

  • Predicted speed and volume for new data points.
  • Detected incidents (anomalies) in the traffic data.
  • Congestion hotspots and their cluster assignments.
  • Fastest route between two nodes in the road network.
  • Impact of events (e.g., concerts) on average speed.

Extending the Pipeline

  • Integrate real-time data feeds (IoT, GPS, sensors).
  • Add deep learning models (LSTM, GNN) for spatio-temporal forecasting.
  • Connect to traffic simulation environments (SUMO, CityFlow).
  • Visualize results using folium, plotly, or matplotlib.
  • Deploy as an API for real-time traffic management.

Project Structure

improve_traffic_management.py
traffic_data.csv

License

MIT License


Contributions and feedback are welcome!
For questions or suggestions, please open an issue or submit a pull request.


Creator Analytics

A Python tool for analyzing and visualizing YouTube channel statistics. This project enables you to fetch subscriber counts, total views, and video counts for multiple channels, and compare them visually using bar plots.


Features

  • Fetch Channel Stats: Scrapes subscriber, view, and video counts from YouTube channel pages.
  • Batch Analysis: Analyze multiple channels at once.
  • Visualization: Compare channels with clear bar plots for subscribers, views, and video counts.
  • Extensible: Easily add more channels or extend to other platforms.
  • Simple Interface: Just provide a list of YouTube channel URLs.

Tech Stack

Python, requests, BeautifulSoup4, pandas, matplotlib, seaborn

Quick Start

1. Install Dependencies

pip install requests beautifulsoup4 pandas matplotlib seaborn

2. Run the Script

Edit the channel_list in the __main__ block to include your desired YouTube channels.

python Creator_Analytics.py

Example Output

  • Prints a table of channel statistics (subscribers, views, videos).
  • Displays bar plots comparing each channel’s subscribers, views, and video counts.

Notes

  • Web Scraping Warning: This script scrapes YouTube’s public channel pages. YouTube’s HTML structure may change, which can break the scraper. For production use, consider the YouTube Data API.
  • Data Accuracy: Scraped numbers may be formatted (e.g., "1.2M subscribers") and are converted to numeric values for plotting (a conversion sketch follows this list).
  • Extending: You can add more features, such as fetching channel descriptions, recent video stats, or integrating with the YouTube API.
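
The numeric conversion mentioned in the Data Accuracy note is straightforward; a minimal sketch of a hypothetical helper (the script's actual parsing may differ):

def parse_count(text: str) -> float:
    """Turn strings like '1.2M subscribers' or '35,412 views' into plain numbers."""
    token = text.split()[0].upper()
    multipliers = {"K": 1e3, "M": 1e6, "B": 1e9}
    if token[-1] in multipliers:
        return float(token[:-1]) * multipliers[token[-1]]
    return float(token.replace(",", ""))

print(parse_count("1.2M subscribers"))  # 1200000.0
print(parse_count("35,412 views"))      # 35412.0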

Project Structure

Creator_Analytics.py

License

MIT License


Contributions and feedback are welcome!
For questions or suggestions, please open an issue or submit a pull request.


Copyright Infringement Detection

A deep learning pipeline for detecting copyright infringement in video and audio content descriptions. This project uses an LSTM neural network with GloVe word embeddings to classify whether a given text is likely to indicate copyright infringement.


Features

  • Text Preprocessing: Tokenizes and pads input text for model compatibility.
  • GloVe Embeddings: Utilizes pre-trained GloVe vectors for rich word representations.
  • LSTM Model: Employs a neural network for binary classification (infringement vs. not infringement).
  • Evaluation: Prints accuracy and a detailed classification report.
  • Prediction: Predicts infringement status for new, unseen text samples.
  • Extensible: Easily adapt for other copyright or compliance-related NLP tasks.

Tech Stack

Python, TensorFlow/Keras, NumPy, scikit-learn, GloVe word embeddings

Quick Start

1. Install Dependencies

pip install tensorflow numpy scikit-learn

2. Download GloVe Embeddings

Download GloVe 6B 100d, unzip, and place glove.6B.100d.txt in your working directory.
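
For orientation, a minimal sketch of how GloVe vectors are typically loaded into a Keras embedding layer feeding an LSTM classifier. The toy texts, vocabulary size, and layer sizes are illustrative, not the script's actual data or hyperparameters:

import numpy as np
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense

texts = ["full movie uploaded without permission", "my own original vlog footage"]  # toy examples
labels = np.array([1, 0])  # 1 = infringement, 0 = not

MAX_WORDS, MAX_LEN, EMBED_DIM = 10000, 100, 100
tokenizer = Tokenizer(num_words=MAX_WORDS)
tokenizer.fit_on_texts(texts)
X = pad_sequences(tokenizer.texts_to_sequences(texts), maxlen=MAX_LEN)

# Build an embedding matrix from the pre-trained GloVe vectors.
glove = {}
with open("glove.6B.100d.txt", encoding="utf-8") as f:
    for line in f:
        parts = line.split()
        glove[parts[0]] = np.asarray(parts[1:], dtype="float32")

embedding_matrix = np.zeros((MAX_WORDS, EMBED_DIM))
for word, i in tokenizer.word_index.items():
    if i < MAX_WORDS and word in glove:
        embedding_matrix[i] = glove[word]

embedding = Embedding(MAX_WORDS, EMBED_DIM, trainable=False)
embedding.build((1,))                      # build the layer so the GloVe weights can be loaded
embedding.set_weights([embedding_matrix])

model = Sequential([
    embedding,
    LSTM(64),
    Dense(1, activation="sigmoid"),        # binary: infringement vs. not
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, labels, epochs=2, verbose=0)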

3. Run the Script

python Copyright_Infringement_Detection.py

Example Output

  • Test Accuracy: Prints the model’s accuracy on the test set.
  • Classification Report: Shows precision, recall, and F1-score for each class.
  • Predictions on New Texts: Outputs predicted labels for new sample descriptions.

Extending the Pipeline

  • Add more training data for improved accuracy and generalization.
  • Fine-tune model architecture (e.g., add more LSTM layers, try GRU or Transformer).
  • Integrate with video/audio fingerprinting for multi-modal copyright detection.
  • Deploy as an API for real-time content moderation.

Project Structure

Copyright_Infringement_Detection.py
glove.6B.100d.txt

License

MIT License


Contributions and feedback are welcome!
For questions or suggestions, please open an issue or submit a pull request.




Cloud AI: Multi-Modal Machine Learning & Document Intelligence Pipeline

A modular Python pipeline for leveraging cloud-scale AI models and open-source tools to process, analyze, and extract insights from text, images, audio, video, and documents. This project demonstrates how to combine state-of-the-art models for conversational AI, document processing, computer vision, speech-to-text, translation, summarization, and sentiment analysis.


Features

  • Conversational AI: Generate responses using pre-trained conversational models (e.g., DialoGPT).
  • Document Processing: Extract text from images (OCR), PDFs, and plain text files.
  • Image Analysis: Perform object detection on images using deep learning models.
  • Video Analysis: Scene clustering via frame extraction and K-Means clustering.
  • Text Summarization: Summarize long documents using transformer models.
  • Translation: Translate text between languages using NMT models.
  • Sentiment Analysis: Analyze sentiment of text using pre-trained models.
  • Speech-to-Text: Transcribe audio files using models like Whisper.
  • Generic Model Deployment: Easily deploy and use any Hugging Face pipeline for custom tasks.
  • Robust Logging: Structured logging for all major steps and error handling.
  • Extensible: Easily add new models or processing steps.
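
Several of the tasks above map directly onto Hugging Face pipelines, as the Generic Model Deployment feature suggests. A minimal sketch with library-default checkpoints (the script's MODEL_CONFIG may pin different models):

from transformers import pipeline

summarizer = pipeline("summarization")          # text summarization
sentiment = pipeline("sentiment-analysis")      # sentiment analysis
translator = pipeline("translation_en_to_fr")   # English-to-French translation

text = "Cloud-scale AI pipelines combine many models to turn raw documents into insights."
print(summarizer(text, max_length=30, min_length=5)[0]["summary_text"])
print(sentiment(text)[0])                        # {'label': ..., 'score': ...}
print(translator(text)[0]["translation_text"])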

Tech Stack

Python, PyTorch, Hugging Face Transformers, Pillow, OpenCV, pytesseract, PyPDF2, librosa, scikit-learn, python-dotenv

Quick Start

1. Install Dependencies

pip install torch transformers pillow opencv-python pytesseract PyPDF2 librosa scikit-learn python-dotenv

2. Prepare Example Files

  • Place sample files in your working directory:
    • example_document.png (image for OCR)
    • example_image.jpg (image for object detection)
    • example_document.pdf (PDF for text extraction)
    • example_text_doc.txt (plain text)
    • example_video.mp4 (video for scene clustering)
    • example_audio.wav (audio for speech-to-text)

The script will attempt to create some dummy files if they do not exist.

3. Run the Script

python Cloud_AI.py

Example Output

  • Conversational AI: Prints chatbot responses to user input.
  • Document Processing: Extracts and prints text from images, PDFs, and TXT files.
  • Image Analysis: Prints detected objects in images.
  • Video Analysis: Prints scene cluster labels for video frames.
  • Text Summarization: Prints summaries of extracted text.
  • Translation: Prints translated text.
  • Sentiment Analysis: Prints sentiment results.
  • Speech-to-Text: Prints transcribed audio.

Extending the Pipeline

  • Add new models: Update MODEL_CONFIG and add new functions for additional tasks.
  • Integrate with cloud storage: Adapt file I/O for S3, GCS, or Azure Blob.
  • Deploy as an API: Wrap functions in FastAPI or Flask for web service deployment.
  • Batch processing: Extend to process directories or streams of files.

Project Structure

Cloud_AI.py
example_document.png
example_image.jpg
example_document.pdf
example_text_doc.txt
example_video.mp4
example_audio.wav
.env

License

MIT License


Contributions and feedback are welcome!
For questions or suggestions, please open an issue or submit a pull request.


Automated Captioning Quality and Translation

A Python pipeline for automated speech recognition (ASR), caption quality assessment, and neural machine translation (NMT). This project enables you to transcribe audio, evaluate the quality of generated captions, and translate captions into multiple languages with quality metrics.


Features

  • Automatic Speech Recognition (ASR): Transcribe audio files to text using state-of-the-art models (Whisper).
  • Language Detection: Automatically detect the language of the transcript.
  • Caption Quality Assessment: Evaluate captions using readability, grammar/spelling checks, confidence scores, and Word Error Rate (WER).
  • Neural Machine Translation (NMT): Translate captions into multiple target languages using pre-trained models.
  • Translation Quality (BLEU): Compute BLEU scores for translation quality assessment.
  • Logging: Structured logging for all major steps.

Tech Stack

Python, PyTorch, Hugging Face Transformers, jiwer, textstat, language-tool-python, langdetect, NLTK

Quick Start

1. Install Dependencies

pip install torch transformers jiwer textstat language-tool-python langdetect nltk

2. Prepare Data

  • Place your audio file (e.g., sample_audio.wav) in the working directory.

3. Run the Script

python Automated_Captioning_Quality_Translation.py

Example Output

  • Quality Assessment: Prints readability, WER, confidence, grammar errors, and a composite quality score.
  • Translations: Prints translations of the transcript into French, Spanish, and German.
  • BLEU Score: Prints the BLEU score for the French translation (if reference is available).
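
The WER and BLEU figures above are reference-based metrics; a minimal sketch of how they might be computed with jiwer and NLTK, using placeholder reference strings rather than real human captions:

from jiwer import wer
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference_caption = "the quick brown fox jumps over the lazy dog"
asr_hypothesis = "the quick brown fox jumped over a lazy dog"
print("WER:", wer(reference_caption, asr_hypothesis))

reference_fr = "le renard brun rapide saute par-dessus le chien paresseux".split()
candidate_fr = "le renard brun saute par-dessus le chien paresseux".split()
print("BLEU:", sentence_bleu([reference_fr], candidate_fr,
                             smoothing_function=SmoothingFunction().method1))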

Extending the Pipeline

  • Add more target languages by editing the target_langs parameter.
  • Integrate with video/audio pipelines for batch processing.
  • Use your own reference captions for more accurate WER/BLEU evaluation.
  • Deploy as a web service using FastAPI or Flask for real-time captioning and translation.

Project Structure

Automated_Captioning_Quality_Translation.py
sample_audio.wav

License

MIT License


Contributions and feedback are welcome!
For questions or suggestions, please open an issue or submit a pull request.


Analyzing Satellite Images

A modular Python pipeline for advanced analysis of satellite imagery using deep learning and computer vision. This project demonstrates classification, segmentation, object detection, cloud masking, and change detection on multi-spectral satellite data (e.g., Sentinel-2).


Features

  • Land Cover Classification: Classifies satellite images into categories such as water, forest, urban, agriculture, and barren.
  • Semantic Segmentation: Pixel-wise land cover mapping using a U-Net style deep learning model.
  • Object Detection: Detects objects (e.g., buildings, roads) in satellite RGB images using Faster R-CNN.
  • Cloud Masking: Identifies and masks cloud pixels using NDSI or brightness thresholding.
  • Change Detection: Detects changes between two satellite images (e.g., before/after events).
  • Multi-Spectral Support: Handles multi-band imagery (e.g., Sentinel-2, Landsat).

Tech Stack

Python, PyTorch, torchvision, Pillow, rasterio, NumPy

Quick Start

1. Install Dependencies

pip install torch torchvision pillow rasterio numpy

2. Prepare Data

  • Place your multi-spectral satellite images (e.g., Sentinel-2 .tif files) in the working directory.
  • Place an RGB satellite image (e.g., .jpg or .png) for object detection.
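
With two co-registered scenes prepared, cloud masking and change detection reduce to simple array operations. A minimal sketch with rasterio and NumPy; the brightness threshold and the assumption that reflectance is scaled to 0-1 are illustrative, not values taken from the script:

import numpy as np
import rasterio

with rasterio.open("sentinel2_image.tif") as src:
    before = src.read().astype("float32")        # shape: (bands, height, width)
with rasterio.open("sentinel2_image_later.tif") as src:
    after = src.read().astype("float32")

# Cloud mask: flag pixels whose mean reflectance across bands exceeds a brightness threshold.
cloud_mask = before.mean(axis=0) > 0.3
print("cloud mask shape:", cloud_mask.shape)

# Change map: mean absolute per-pixel difference, ignoring cloudy pixels.
change_map = np.abs(after - before).mean(axis=0)
change_map[cloud_mask] = np.nan
print("change map shape:", change_map.shape)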

3. Run the Script

python Analyzing_Satellite_Images.py

Example Output

  • Classification: Prints the predicted land cover class for the input image.
  • Segmentation: Prints the shape of the segmentation mask.
  • Object Detection: Prints detected objects and their bounding boxes.
  • Cloud Masking: Prints the shape of the cloud mask.
  • Change Detection: Prints the shape of the change map between two images.

Extending the Pipeline

  • Train with your own data: Replace or fine-tune the models with your labeled satellite datasets.
  • Add more classes: Expand CLASS_NAMES and retrain models for additional land cover types.
  • Integrate with GIS tools: Export results as GeoTIFF or visualize with QGIS/ArcGIS.
  • Deploy as a service: Wrap the pipeline in a FastAPI or Flask app for web-based inference.

Project Structure

Analyzing_Satellite_Images.py
sentinel2_image.tif
sentinel2_image_later.tif
sample_satellite.jpg

License

MIT License


Contributions and feedback are welcome!
For questions or suggestions, please open an issue or submit a pull request.


Ad Targeting Content Moderation Pipeline

A modular Python pipeline for multi-modal content moderation in ad targeting and digital marketing. This project provides conceptual and practical tools for moderating text, images, video, and audio content to ensure brand safety and compliance in advertising platforms.


Features

  • Text Moderation: Uses a pre-trained transformer model (e.g., unitary/toxic-bert) to flag toxic, obscene, or hateful text.
  • Image Moderation (Conceptual): Simulates image moderation; can be extended to use real NSFW or cloud vision models.
  • Video Moderation: Extracts frames at intervals and applies image moderation to each frame.
  • Audio Moderation: Transcribes audio (placeholder STT) and applies text moderation to the transcript.
  • Multi-Modal Moderation: Conceptual discussion and hooks for fusing text, image, and audio signals.
  • Logging: Structured logging for all moderation actions and errors.
  • Example Usage: Demonstrates moderation for all modalities with dummy data.

Tech Stack

Python, Hugging Face Transformers, PyTorch, OpenCV, Pillow, requests

Quick Start

1. Install Dependencies

pip install transformers torch opencv-python pillow requests

2. Run the Example

python Ad_Targeting.py

This will run moderation examples for text, images, video, and audio using dummy/generated data.


Usage

Text Moderation

result = moderate_text_content("This is a wonderful and insightful presentation. Thank you!")
print(result)
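
Under the hood, this kind of text moderation is typically a single transformers pipeline call against the unitary/toxic-bert checkpoint mentioned above. A minimal sketch; the score threshold is illustrative, not the script's exact logic:

from transformers import pipeline

toxicity = pipeline("text-classification", model="unitary/toxic-bert")

def is_flagged(text: str, threshold: float = 0.5) -> bool:
    result = toxicity(text)[0]  # e.g. {'label': 'toxic', 'score': 0.0008} for benign text
    return result["score"] >= threshold

print(is_flagged("This is a wonderful and insightful presentation. Thank you!"))  # False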

Image Moderation (Conceptual)

result = moderate_image_content_conceptual("path/to/image.png")
print(result)

Video Moderation

result = moderate_video_content_frame_based("path/to/video.mp4", frame_interval_seconds=5)
print(result)
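
Frame-based video moderation boils down to sampling frames at a fixed interval and reusing the image-moderation path on each one. A minimal sketch of the sampling step with OpenCV (the real function's internals may differ):

import cv2

def sample_frames(video_path: str, interval_seconds: int = 5):
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 30.0     # fall back if FPS metadata is missing
    step = max(int(fps * interval_seconds), 1)
    frames, index = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % step == 0:
            frames.append(frame)                # each sampled frame goes to image moderation
        index += 1
    cap.release()
    return frames

print(len(sample_frames("path/to/video.mp4", interval_seconds=5)))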

Audio Moderation (Conceptual)

result = moderate_audio_content("path/to/audio.wav")
print(result)

Extending the Pipeline

  • Replace conceptual image moderation with a real NSFW or cloud vision API/model.
  • Integrate a real Speech-to-Text (STT) engine for audio moderation (e.g., Whisper, Google Speech-to-Text).
  • Implement multi-modal fusion for advanced moderation scenarios.
  • Connect to ad serving or creative review pipelines for automated compliance.

Project Structure

Ad_Targeting.py

License

MIT License


Contributions and feedback are welcome!
For questions or suggestions, please open an issue or submit a pull request.
