This repository contains a set of hands-on notebooks used in the Applied AI part of PSIML.
Each notebook is an independent “tour” through a key AI area:
- Google Colab basics
- Vision
- NLP
- Voice / Audio
All notebooks are designed to be run on Google Colab.
```
psiml-applied-ai/
│
├── notebooks/
│   ├── Psiml_Tour_Collab.ipynb
│   ├── PSIML_Tour_Vision.ipynb
│   ├── PSIML_Tour_NLP.ipynb
│   └── PSIML_Tour_Voice.ipynb
│
└── README.md
```
File: notebooks/Psiml_Tour_Collab.ipynb
This notebook provides a quick introduction to Google Colab, an online environment for running Python and Jupyter notebooks with many scientific and machine-learning libraries preinstalled. It demonstrates how to execute Python code and install additional packages directly within Colab.
This is the recommended first stop before exploring other notebooks.
File: notebooks/PSIML_Tour_Vision.ipynb
This notebook demonstrates how modern vision models can locate, segment, and even modify objects in images using natural-language prompts. It combines three powerful tools: Grounding DINO for zero-shot object detection, Segment Anything (SAM) for generating high-quality masks, and diffusers pipelines for text-to-image generation and inpainting.
- How zero-shot object detection works with Grounding DINO
- How to turn detected boxes into segmentation masks using SAM
- How to use inpainting models to replace or modify objects in the image
- How to run complete, practical workflows for:
- Finding objects using text prompts
- Visualizing detections and masks
- Editing images by removing or altering selected regions
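The detect → segment → inpaint workflow above can be sketched in code. This is a minimal sketch, not the notebook's actual implementation: the model IDs (`IDEA-Research/grounding-dino-tiny`, `facebook/sam-vit-base`, `stabilityai/stable-diffusion-2-inpainting`) and thresholds are assumptions that may differ from the notebook's choices.

```python
def boxes_to_sam_input(boxes):
    """Wrap [x0, y0, x1, y1] pixel boxes in the nested list format
    SamProcessor expects for `input_boxes` (batch -> image -> boxes)."""
    return [[[float(v) for v in box] for box in boxes]]

def edit_image(image, find_prompt, inpaint_prompt):
    """Find objects matching `find_prompt`, mask them, and inpaint
    them according to `inpaint_prompt`. Sketch only; untested as a whole."""
    # Heavy imports are kept inside the function so the helper above stays light.
    import torch
    from PIL import Image
    from transformers import (AutoProcessor, AutoModelForZeroShotObjectDetection,
                              SamModel, SamProcessor)
    from diffusers import StableDiffusionInpaintPipeline

    device = "cuda" if torch.cuda.is_available() else "cpu"

    # 1) Zero-shot detection with Grounding DINO.
    #    Prompts should be lower-case and end with a period, e.g. "a cat."
    det_proc = AutoProcessor.from_pretrained("IDEA-Research/grounding-dino-tiny")
    det_model = AutoModelForZeroShotObjectDetection.from_pretrained(
        "IDEA-Research/grounding-dino-tiny").to(device)
    inputs = det_proc(images=image, text=find_prompt, return_tensors="pt").to(device)
    with torch.no_grad():
        out = det_model(**inputs)
    # Thresholds and target sizes passed positionally for compatibility
    # across transformers versions.
    results = det_proc.post_process_grounded_object_detection(
        out, inputs.input_ids, 0.3, 0.3, [image.size[::-1]])[0]
    boxes = results["boxes"].tolist()

    # 2) Turn the detected boxes into masks with SAM.
    sam_proc = SamProcessor.from_pretrained("facebook/sam-vit-base")
    sam = SamModel.from_pretrained("facebook/sam-vit-base").to(device)
    sam_inputs = sam_proc(image, input_boxes=boxes_to_sam_input(boxes),
                          return_tensors="pt").to(device)
    with torch.no_grad():
        sam_out = sam(**sam_inputs)
    masks = sam_proc.image_processor.post_process_masks(
        sam_out.pred_masks.cpu(), sam_inputs["original_sizes"].cpu(),
        sam_inputs["reshaped_input_sizes"].cpu())[0]

    # 3) Inpaint the union of the masks with a diffusion model.
    union = masks.any(dim=0).any(dim=0).numpy()  # collapse box/multimask dims
    mask_img = Image.fromarray((union * 255).astype("uint8"))
    inpaint = StableDiffusionInpaintPipeline.from_pretrained(
        "stabilityai/stable-diffusion-2-inpainting",
        torch_dtype=torch.float16).to(device)
    return inpaint(prompt=inpaint_prompt, image=image,
                   mask_image=mask_img).images[0]
```

Keeping the three stages as separate steps (rather than one fused model) is exactly what makes the notebook's workflow composable: each stage can be swapped or inspected on its own.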
File: notebooks/PSIML_Tour_NLP.ipynb
This notebook is a compact tour of modern language models: it starts with using LLM chat APIs (with system prompts, multi-turn conversations, and sampling parameters), then shows how to run a small language model (SLM) directly in Colab, and finally introduces a vision-language model (VLM) for image captioning.
- How to call chat-style LLM APIs from code and structure system/user/assistant messages
- How parameters like temperature and top-p affect model outputs
- How to steer behavior with system prompts (e.g. for translation and “tricky” examples)
- How to load and run a small open-source language model with `AutoModelForCausalLM` in Colab
- How to use a vision-language model (`AutoModelForVision2Seq`) to generate natural-language descriptions from images
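The message structure and sampling parameters above can be sketched as follows. This is a minimal sketch, not the notebook's code: the model ID (`Qwen/Qwen2.5-0.5B-Instruct`) and default parameter values are assumptions for illustration.

```python
def build_chat(system_prompt, turns):
    """Build the system/user/assistant message list that chat-style LLM APIs
    and Hugging Face chat templates expect. `turns` alternates
    user, assistant, user, ..."""
    messages = [{"role": "system", "content": system_prompt}]
    roles = ["user", "assistant"]
    for i, text in enumerate(turns):
        messages.append({"role": roles[i % 2], "content": text})
    return messages

def generate_with_slm(messages, model_id="Qwen/Qwen2.5-0.5B-Instruct",
                      temperature=0.7, top_p=0.9, max_new_tokens=128):
    """Run a small language model locally with AutoModelForCausalLM.
    Sketch only; downloads the model on first call."""
    # Heavy imports inside the function so build_chat stays importable.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype="auto", device_map="auto")
    # The chat template turns the message list into the model's prompt format.
    input_ids = tok.apply_chat_template(messages, add_generation_prompt=True,
                                        return_tensors="pt").to(model.device)
    # temperature flattens/sharpens the distribution; top_p truncates its tail.
    out = model.generate(input_ids, do_sample=True, temperature=temperature,
                         top_p=top_p, max_new_tokens=max_new_tokens)
    return tok.decode(out[0, input_ids.shape[1]:], skip_special_tokens=True)
```

The same `build_chat` message list works both for hosted chat APIs and for local models via `apply_chat_template`, which is what makes it easy to move between the two in the notebook.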
File: notebooks/PSIML_Tour_Voice.ipynb
This notebook walks through a full speech-to-speech translation (S2ST) pipeline using a cascaded approach: ASR → MT → TTS. It uses a Whisper-style speech recognition model (AutoModelForSpeechSeq2Seq + AutoProcessor) to transcribe audio, a machine translation component (via OPUS-MT), and an XTTS-based TTS model to generate speech in the target language, with examples built on the Common Voice dataset.
- What speech-to-speech translation is and why cascaded ASR → MT → TTS is a practical solution
- How to load and run an ASR model with Hugging Face Transformers and pipelines
- How to plug in a machine translation model (OPUS-MT) between ASR and TTS
- How to use an XTTS text-to-speech model to synthesize translated speech
- How to combine all components into a simple end-to-end speech-to-speech translation pipeline
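The cascaded pipeline above can be sketched end to end. This is a minimal sketch under stated assumptions, not the notebook's code: the ASR model (`openai/whisper-small`), the `Helsinki-NLP/opus-mt-{src}-{tgt}` Hub naming pattern, and the coqui `TTS` package's XTTS v2 model ID are assumptions that may differ from what the notebook uses.

```python
def opus_mt_model_id(src_lang, tgt_lang):
    """Hub ID pattern used by Helsinki-NLP's OPUS-MT translation models."""
    return f"Helsinki-NLP/opus-mt-{src_lang}-{tgt_lang}"

def speech_to_speech(audio_path, out_path, speaker_wav,
                     src_lang="en", tgt_lang="fr"):
    """Cascaded S2ST: ASR -> MT -> TTS. Sketch only; downloads models
    on first call and needs a GPU to run at a reasonable speed."""
    # Heavy imports inside the function so the helper above stays light.
    from transformers import pipeline

    # 1) ASR: transcribe the source-language audio with a Whisper model.
    asr = pipeline("automatic-speech-recognition", model="openai/whisper-small")
    text = asr(audio_path)["text"]

    # 2) MT: translate the transcript with an OPUS-MT model.
    mt = pipeline("translation", model=opus_mt_model_id(src_lang, tgt_lang))
    translated = mt(text)[0]["translation_text"]

    # 3) TTS: synthesize target-language speech with XTTS v2 (coqui TTS);
    #    XTTS clones the voice from a short reference clip (`speaker_wav`).
    from TTS.api import TTS
    tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2")
    tts.tts_to_file(text=translated, speaker_wav=speaker_wav,
                    language=tgt_lang, file_path=out_path)
    return translated
```

Because each stage only exchanges plain text with its neighbors, any component can be replaced independently, which is the practical advantage of the cascaded design over an end-to-end S2ST model.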
All notebooks are intended to be executed on Google Colab.
- Open any notebook on GitHub:
  - `notebooks/Psiml_Tour_Collab.ipynb`
  - `notebooks/PSIML_Tour_Vision.ipynb`
  - `notebooks/PSIML_Tour_NLP.ipynb`
  - `notebooks/PSIML_Tour_Voice.ipynb`
- If the “Open in Colab” button is available, click it.
- If not, copy the GitHub URL and open it via:
Colab → File → Open notebook → GitHub
- Download the `.ipynb` file from GitHub
- Open https://colab.research.google.com
- Choose Upload, select the notebook, and run it
- Execute cells from top to bottom
  (install commands like `pip install ...` should be run first)
To ensure the notebooks run quickly and smoothly, set Colab to use a T4 GPU:
- Open the notebook in Colab
- Go to Runtime → Change runtime type
- Under Hardware accelerator, choose GPU
- Under GPU type, select T4 (if available)
- Click Save
If you want to run these notebooks on your own machine or server,
please contact the PSIML team for guidance on the environment setup.
You can reach us via:
- Discord
- Direct message (DM)
or any other communication channel where PSIML provides support.
PSIML (Practical Seminar on Machine Learning) is a hands-on educational initiative focused on modern AI methods, practical projects, and accessible machine learning resources.
This repository is part of the Applied AI materials used in PSIML workshops and sessions.