This repository contains a set of hands-on notebooks used in the Applied AI part of PSIML.
Each notebook is an independent “tour” through a key AI area:
- Google Colab basics
- Vision
- NLP
- Voice / Audio
All notebooks are designed to be run on Google Colab.
```
psiml-applied-ai/
│
├── notebooks/
│   ├── Psiml_Tour_Collab.ipynb
│   ├── PSIML_Tour_Vision.ipynb
│   ├── PSIML_Tour_NLP.ipynb
│   └── PSIML_Tour_Voice.ipynb
│
└── README.md
```
File: notebooks/Psiml_Tour_Collab.ipynb
This notebook provides a quick introduction to Google Colab, an online environment for running Python and Jupyter notebooks with many scientific and machine-learning libraries preinstalled. It demonstrates how to execute Python code and install additional packages directly within Colab.
This is the recommended first stop before exploring other notebooks.
File: notebooks/PSIML_Tour_Vision.ipynb
This notebook demonstrates how modern vision models can locate, segment, and even modify objects in images using natural-language prompts. It combines three powerful tools: Grounding DINO for zero-shot object detection, Segment Anything (SAM) for generating high-quality masks, and diffusers pipelines for text-to-image generation and inpainting.
- How zero-shot object detection works with Grounding DINO
- How to turn detected boxes into segmentation masks using SAM
- How to use inpainting models to replace or modify objects in the image
- How to run complete, practical workflows for:
- Finding objects using text prompts
- Visualizing detections and masks
- Editing images by removing or altering selected regions
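The detect → segment → inpaint workflow above can be sketched in code. This is a minimal sketch, not the notebook's actual implementation: the model IDs (`IDEA-Research/grounding-dino-tiny`, `facebook/sam-vit-base`, `stabilityai/stable-diffusion-2-inpainting`) and thresholds are assumptions that may differ from the notebook's choices.

```python
def boxes_to_sam_input(boxes):
    """Wrap [x0, y0, x1, y1] pixel boxes in the nested list format
    SamProcessor expects for `input_boxes` (batch -> image -> boxes)."""
    return [[[float(v) for v in box] for box in boxes]]

def edit_image(image, find_prompt, inpaint_prompt):
    """Find objects matching `find_prompt`, mask them, and inpaint
    them according to `inpaint_prompt`. Sketch only; untested as a whole."""
    # Heavy imports are kept inside the function so the helper above stays light.
    import torch
    from PIL import Image
    from transformers import (AutoProcessor, AutoModelForZeroShotObjectDetection,
                              SamModel, SamProcessor)
    from diffusers import StableDiffusionInpaintPipeline

    device = "cuda" if torch.cuda.is_available() else "cpu"

    # 1) Zero-shot detection with Grounding DINO.
    #    Prompts should be lower-case and end with a period, e.g. "a cat."
    det_proc = AutoProcessor.from_pretrained("IDEA-Research/grounding-dino-tiny")
    det_model = AutoModelForZeroShotObjectDetection.from_pretrained(
        "IDEA-Research/grounding-dino-tiny").to(device)
    inputs = det_proc(images=image, text=find_prompt, return_tensors="pt").to(device)
    with torch.no_grad():
        out = det_model(**inputs)
    # Thresholds and target sizes passed positionally for compatibility
    # across transformers versions.
    results = det_proc.post_process_grounded_object_detection(
        out, inputs.input_ids, 0.3, 0.3, [image.size[::-1]])[0]
    boxes = results["boxes"].tolist()

    # 2) Turn the detected boxes into masks with SAM.
    sam_proc = SamProcessor.from_pretrained("facebook/sam-vit-base")
    sam = SamModel.from_pretrained("facebook/sam-vit-base").to(device)
    sam_inputs = sam_proc(image, input_boxes=boxes_to_sam_input(boxes),
                          return_tensors="pt").to(device)
    with torch.no_grad():
        sam_out = sam(**sam_inputs)
    masks = sam_proc.image_processor.post_process_masks(
        sam_out.pred_masks.cpu(), sam_inputs["original_sizes"].cpu(),
        sam_inputs["reshaped_input_sizes"].cpu())[0]

    # 3) Inpaint the union of the masks with a diffusion model.
    union = masks.any(dim=0).any(dim=0).numpy()  # collapse box/multimask dims
    mask_img = Image.fromarray((union * 255).astype("uint8"))
    inpaint = StableDiffusionInpaintPipeline.from_pretrained(
        "stabilityai/stable-diffusion-2-inpainting",
        torch_dtype=torch.float16).to(device)
    return inpaint(prompt=inpaint_prompt, image=image,
                   mask_image=mask_img).images[0]
```

Keeping the three stages as separate steps (rather than one fused model) is exactly what makes the notebook's workflow composable: each stage can be swapped or inspected on its own.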
File: notebooks/PSIML_Tour_NLP.ipynb
This notebook is a compact tour of modern language models: it starts with using LLM chat APIs (with system prompts, multi-turn conversations, and sampling parameters), then shows how to run a small language model (SLM) directly in Colab, and finally introduces a vision-language model (VLM) for image captioning.
- How to call chat-style LLM APIs from code and structure system/user/assistant messages
- How parameters like temperature and top-p affect model outputs
- How to steer behavior with system prompts (e.g. for translation and “tricky” examples)
- How to load and run a small open-source language model with `AutoModelForCausalLM` in Colab
- How to use a vision-language model (`AutoModelForVision2Seq`) to generate natural-language descriptions from images
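The message structure and sampling parameters above can be sketched as follows. This is a minimal sketch, not the notebook's code: the model ID (`Qwen/Qwen2.5-0.5B-Instruct`) and default parameter values are assumptions for illustration.

```python
def build_chat(system_prompt, turns):
    """Build the system/user/assistant message list that chat-style LLM APIs
    and Hugging Face chat templates expect. `turns` alternates
    user, assistant, user, ..."""
    messages = [{"role": "system", "content": system_prompt}]
    roles = ["user", "assistant"]
    for i, text in enumerate(turns):
        messages.append({"role": roles[i % 2], "content": text})
    return messages

def generate_with_slm(messages, model_id="Qwen/Qwen2.5-0.5B-Instruct",
                      temperature=0.7, top_p=0.9, max_new_tokens=128):
    """Run a small language model locally with AutoModelForCausalLM.
    Sketch only; downloads the model on first call."""
    # Heavy imports inside the function so build_chat stays importable.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype="auto", device_map="auto")
    # The chat template turns the message list into the model's prompt format.
    input_ids = tok.apply_chat_template(messages, add_generation_prompt=True,
                                        return_tensors="pt").to(model.device)
    # temperature flattens/sharpens the distribution; top_p truncates its tail.
    out = model.generate(input_ids, do_sample=True, temperature=temperature,
                         top_p=top_p, max_new_tokens=max_new_tokens)
    return tok.decode(out[0, input_ids.shape[1]:], skip_special_tokens=True)
```

The same `build_chat` message list works both for hosted chat APIs and for local models via `apply_chat_template`, which is what makes it easy to move between the two in the notebook.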
File: notebooks/PSIML_Tour_Voice.ipynb
This notebook walks through a full speech-to-speech translation (S2ST) pipeline using a cascaded approach: ASR → MT → TTS. It uses a Whisper-style speech recognition model (AutoModelForSpeechSeq2Seq + AutoProcessor) to transcribe audio, a machine translation component (via OPUS-MT), and an XTTS-based TTS model to generate speech in the target language, with examples built on the Common Voice dataset.
- What speech-to-speech translation is and why cascaded ASR → MT → TTS is a practical solution
- How to load and run an ASR model with Hugging Face Transformers and pipelines
- How to plug in a machine translation model (OPUS-MT) between ASR and TTS
- How to use an XTTS text-to-speech model to synthesize translated speech
- How to combine all components into a simple end-to-end speech-to-speech translation pipeline
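The cascaded pipeline above can be sketched end to end. This is a minimal sketch under stated assumptions, not the notebook's code: the ASR model (`openai/whisper-small`), the `Helsinki-NLP/opus-mt-{src}-{tgt}` Hub naming pattern, and the coqui `TTS` package's XTTS v2 model ID are assumptions that may differ from what the notebook uses.

```python
def opus_mt_model_id(src_lang, tgt_lang):
    """Hub ID pattern used by Helsinki-NLP's OPUS-MT translation models."""
    return f"Helsinki-NLP/opus-mt-{src_lang}-{tgt_lang}"

def speech_to_speech(audio_path, out_path, speaker_wav,
                     src_lang="en", tgt_lang="fr"):
    """Cascaded S2ST: ASR -> MT -> TTS. Sketch only; downloads models
    on first call and needs a GPU to run at a reasonable speed."""
    # Heavy imports inside the function so the helper above stays light.
    from transformers import pipeline

    # 1) ASR: transcribe the source-language audio with a Whisper model.
    asr = pipeline("automatic-speech-recognition", model="openai/whisper-small")
    text = asr(audio_path)["text"]

    # 2) MT: translate the transcript with an OPUS-MT model.
    mt = pipeline("translation", model=opus_mt_model_id(src_lang, tgt_lang))
    translated = mt(text)[0]["translation_text"]

    # 3) TTS: synthesize target-language speech with XTTS v2 (coqui TTS);
    #    XTTS clones the voice from a short reference clip (`speaker_wav`).
    from TTS.api import TTS
    tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2")
    tts.tts_to_file(text=translated, speaker_wav=speaker_wav,
                    language=tgt_lang, file_path=out_path)
    return translated
```

Because each stage only exchanges plain text with its neighbors, any component can be replaced independently, which is the practical advantage of the cascaded design over an end-to-end S2ST model.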
All notebooks are intended to be executed on Google Colab.
- Open any notebook on GitHub:
  - `notebooks/Psiml_Tour_Collab.ipynb`
  - `notebooks/PSIML_Tour_Vision.ipynb`
  - `notebooks/PSIML_Tour_NLP.ipynb`
  - `notebooks/PSIML_Tour_Voice.ipynb`
- If the “Open in Colab” button is available, click it.
- If not, copy the GitHub URL and open it via:
Colab → File → Open notebook → GitHub
- Download the `.ipynb` file from GitHub
- Open https://colab.research.google.com
- Choose Upload, select the notebook, and run it
- Execute cells from top to bottom
  (install commands like `pip install ...` should be run first)
To ensure the notebooks run quickly and smoothly, set Colab to use a T4 GPU:
- Open the notebook in Colab
- Go to Runtime → Change runtime type
- Under Hardware accelerator, choose GPU
- Under GPU type, select T4 (if available)
- Click Save
If you want to run these notebooks on your own machine or server,
please contact the PSIML team for guidance on the environment setup.
You can reach us via:
- Discord
- Direct message (DM)
or any other communication channel where PSIML provides support.
PSIML (Practical Seminar on Machine Learning) is a hands-on educational initiative focused on modern AI methods, practical projects, and accessible machine learning resources.
This repository is part of the Applied AI materials used in PSIML workshops and sessions.