diff --git a/docs/examples.md b/docs/examples.md
index 8f39e15..fc4b3ba 100644
--- a/docs/examples.md
+++ b/docs/examples.md
@@ -14,10 +14,15 @@
  - [Google's Gemini Models](#googles-gemini-models)
  - [Cohere's Command Models](#coheres-command-models)
  - [Mistral's Mistral Models](#mistrals-mistral-models)
-  - [OctoAI's Models](#octoais-models)
+  - [Fireworks](#fireworks)
+  - [Together](#together)
+  - [Groq](#groq)
  - [Llama.cpp](#llamacpp)
+  - [Ollama](#ollama)
- [Transcription Options](#transcription-options)
  - [Whisper.cpp](#whispercpp)
+  - [Whisper Python](#whisper-python)
+  - [Whisper Diarization](#whisper-diarization)
  - [Deepgram](#deepgram)
  - [Assembly](#assembly)
- [Prompt Options](#prompt-options)
@@ -25,10 +30,7 @@
- [Docker Compose](#docker-compose)
- [Deno](#deno)
- [Bun](#bun)
-- [Makeshift Test Suite](#makeshift-test-suite)
-  - [Full Test Suite](#full-test-suite)
-  - [Partial Test Command for Local Services](#partial-test-command-for-local-services)
-- [Create Single Markdown File with Entire Project](#create-single-markdown-file-with-entire-project)
+- [Test Suite](#test-suite)

## Content and Feed Inputs

### Process Single Video Files

Run on a single YouTube video.

```bash
-npm run as -- --video "https://www.youtube.com/watch?v=MORMZXEaONk"
+npm run as -- \
+  --video "https://www.youtube.com/watch?v=MORMZXEaONk"
```

### Process Multiple Videos in YouTube Playlist

Run on multiple YouTube videos in a playlist.

```bash
-npm run as -- --playlist "https://www.youtube.com/playlist?list=PLCVnrVv4KhXPz0SoAVu8Rc1emAdGPbSbr"
+npm run as -- \
+  --playlist "https://www.youtube.com/playlist?list=PLCVnrVv4KhXPz0SoAVu8Rc1emAdGPbSbr"
```

Run on a playlist URL and generate a JSON info file with markdown metadata of each video in the playlist:

```bash
-npm run as -- --playlist "https://www.youtube.com/playlist?list=PLCVnrVv4KhXPz0SoAVu8Rc1emAdGPbSbr" --info
+npm run as -- \
+  --playlist "https://www.youtube.com/playlist?list=PLCVnrVv4KhXPz0SoAVu8Rc1emAdGPbSbr" \
+  --info
```

### Process Multiple Videos Specified in a URLs File

Run on an arbitrary list of URLs in `example-urls.md`.

```bash
-npm run as -- --urls "content/example-urls.md"
+npm run as -- \
+  --urls "content/example-urls.md"
```

Run on a URLs file and generate a JSON info file with markdown metadata of each video:

```bash
-npm run as -- --urls "content/example-urls.md" --info
+npm run as -- \
+  --urls "content/example-urls.md" \
+  --info
```
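+
+These input handlers also have shortcut scripts in `package.json` (shown later in this diff); for example, `npm run u` wraps the URLs handler with the `large-v3-turbo` Whisper model preselected:
+
+```bash
+# assumes the `u` script from package.json, which expands to:
+# tsx --env-file=.env --no-warnings src/autoshow.ts --whisper large-v3-turbo --urls
+npm run u -- "content/example-urls.md"
+```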

### Process Single Audio or Video File

Run on `audio.mp3` in the `content` directory:

```bash
-npm run as -- --file "content/audio.mp3"
+npm run as -- \
+  --file "content/audio.mp3"
```

### Process Podcast RSS Feed

Process RSS feed from newest to oldest (default behavior):

```bash
-npm run as -- --rss "https://ajcwebdev.substack.com/feed"
+npm run as -- \
+  --rss "https://ajcwebdev.substack.com/feed"
```

Process a certain number of the most recent items, for example the last three episodes released on the feed:

@@ -114,15 +125,16 @@ Process a single specific episode from a podcast RSS feed by providing the episo
npm run as -- \
  --rss "https://ajcwebdev.substack.com/feed" \
  --item "https://api.substack.com/feed/podcast/36236609/fd1f1532d9842fe1178de1c920442541.mp3" \
-  --whisper tiny \
-  --llama \
+  --ollama \
  --prompt titles summary longChapters takeaways questions
```

Run on a podcast RSS feed and generate a JSON info file with markdown metadata of each item:

```bash
-npm run as -- --rss "https://ajcwebdev.substack.com/feed" --info
+npm run as -- \
+  --rss "https://ajcwebdev.substack.com/feed" \
+  --info
```

## Language Model (LLM) Options

@@ -145,23 +157,33 @@ For each model available for each provider, I have collected the following detai

### OpenAI's ChatGPT Models

```bash
-npm run as -- --video "https://www.youtube.com/watch?v=MORMZXEaONk" --chatgpt
+npm run as -- \
+  --video "https://www.youtube.com/watch?v=MORMZXEaONk" \
+  --chatgpt
```

Select ChatGPT model:

```bash
# Select GPT-4o mini model - https://platform.openai.com/docs/models/gpt-4o-mini
-npm run as -- --video "https://www.youtube.com/watch?v=MORMZXEaONk" --chatgpt GPT_4o_MINI
+npm run as -- \
+  --video "https://www.youtube.com/watch?v=MORMZXEaONk" \
+  --chatgpt GPT_4o_MINI

# Select GPT-4o model - https://platform.openai.com/docs/models/gpt-4o
-npm run as -- --video "https://www.youtube.com/watch?v=MORMZXEaONk" --chatgpt GPT_4o
+npm run as -- \
+  --video "https://www.youtube.com/watch?v=MORMZXEaONk" \
+  --chatgpt GPT_4o

# Select GPT-4 Turbo model - https://platform.openai.com/docs/models/gpt-4-turbo-and-gpt-4
-npm run as -- --video "https://www.youtube.com/watch?v=MORMZXEaONk" --chatgpt GPT_4_TURBO
+npm run as -- \
+  --video "https://www.youtube.com/watch?v=MORMZXEaONk" \
+  --chatgpt GPT_4_TURBO

# Select GPT-4 model - https://platform.openai.com/docs/models/gpt-4-turbo-and-gpt-4
-npm run as -- --video "https://www.youtube.com/watch?v=MORMZXEaONk" --chatgpt GPT_4
+npm run as -- \
+  --video "https://www.youtube.com/watch?v=MORMZXEaONk" \
+  --chatgpt GPT_4
```

| Model | Context Window | Max Output | Input Tokens | Output Tokens | Batch Input | Batch Output |
@@ -174,159 +196,273 @@
### Anthropic's Claude Models

```bash
-npm run as -- --video "https://www.youtube.com/watch?v=MORMZXEaONk" --claude
+npm run as -- \
+  --video "https://www.youtube.com/watch?v=MORMZXEaONk" \
+  --claude
```

Select Claude model:

```bash
-npm run as -- --video "https://www.youtube.com/watch?v=MORMZXEaONk" --claude
CLAUDE_3_5_SONNET -npm run as -- --video "https://www.youtube.com/watch?v=MORMZXEaONk" --claude CLAUDE_3_OPUS -npm run as -- --video "https://www.youtube.com/watch?v=MORMZXEaONk" --claude CLAUDE_3_SONNET -npm run as -- --video "https://www.youtube.com/watch?v=MORMZXEaONk" --claude CLAUDE_3_HAIKU +npm run as -- \ + --video "https://www.youtube.com/watch?v=MORMZXEaONk" \ + --claude CLAUDE_3_5_SONNET + +npm run as -- \ + --video "https://www.youtube.com/watch?v=MORMZXEaONk" \ + --claude CLAUDE_3_OPUS + +npm run as -- \ + --video "https://www.youtube.com/watch?v=MORMZXEaONk" \ + --claude CLAUDE_3_SONNET + +npm run as -- \ + --video "https://www.youtube.com/watch?v=MORMZXEaONk" \ + --claude CLAUDE_3_HAIKU ``` ### Google's Gemini Models ```bash -npm run as -- --video "https://www.youtube.com/watch?v=MORMZXEaONk" --gemini +npm run as -- \ + --video "https://www.youtube.com/watch?v=MORMZXEaONk" \ + --gemini ``` Select Gemini model: ```bash -npm run as -- --video "https://www.youtube.com/watch?v=MORMZXEaONk" --gemini GEMINI_1_5_FLASH -npm run as -- --video "https://www.youtube.com/watch?v=MORMZXEaONk" --gemini GEMINI_1_5_PRO +npm run as -- \ + --video "https://www.youtube.com/watch?v=MORMZXEaONk" \ + --gemini GEMINI_1_5_FLASH + +npm run as -- \ + --video "https://www.youtube.com/watch?v=MORMZXEaONk" \ + --gemini GEMINI_1_5_PRO ``` ### Cohere's Command Models ```bash -npm run as -- --video "https://www.youtube.com/watch?v=MORMZXEaONk" --cohere +npm run as -- \ + --video "https://www.youtube.com/watch?v=MORMZXEaONk" \ + --cohere ``` Select Cohere model: ```bash -npm run as -- --video "https://www.youtube.com/watch?v=MORMZXEaONk" --cohere COMMAND_R -npm run as -- --video "https://www.youtube.com/watch?v=MORMZXEaONk" --cohere COMMAND_R_PLUS +npm run as -- \ + --video "https://www.youtube.com/watch?v=MORMZXEaONk" \ + --cohere COMMAND_R + +npm run as -- \ + --video "https://www.youtube.com/watch?v=MORMZXEaONk" \ + --cohere COMMAND_R_PLUS ``` ### Mistral's Mistral Models ```bash -npm run as -- --video "https://www.youtube.com/watch?v=MORMZXEaONk" --mistral +npm run as -- \ + --video "https://www.youtube.com/watch?v=MORMZXEaONk" \ + --mistral ``` Select Mistral model: ```bash -npm run as -- --video "https://www.youtube.com/watch?v=MORMZXEaONk" --mistral MIXTRAL_8x7b -npm run as -- --video "https://www.youtube.com/watch?v=MORMZXEaONk" --mistral MIXTRAL_8x22b -npm run as -- --video "https://www.youtube.com/watch?v=MORMZXEaONk" --mistral MISTRAL_LARGE -npm run as -- --video "https://www.youtube.com/watch?v=MORMZXEaONk" --mistral MISTRAL_NEMO -``` - -### OctoAI's Models +npm run as -- \ + --video "https://www.youtube.com/watch?v=MORMZXEaONk" \ + --mistral MIXTRAL_8x7b -```bash -npm run as -- --video "https://www.youtube.com/watch?v=MORMZXEaONk" --octo -``` +npm run as -- \ + --video "https://www.youtube.com/watch?v=MORMZXEaONk" \ + --mistral MIXTRAL_8x22b -Select Octo model: +npm run as -- \ + --video "https://www.youtube.com/watch?v=MORMZXEaONk" \ + --mistral MISTRAL_LARGE -```bash -npm run as -- --video "https://www.youtube.com/watch?v=MORMZXEaONk" --octo LLAMA_3_1_8B -npm run as -- --video "https://www.youtube.com/watch?v=MORMZXEaONk" --octo LLAMA_3_1_70B -npm run as -- --video "https://www.youtube.com/watch?v=MORMZXEaONk" --octo LLAMA_3_1_405B -npm run as -- --video "https://www.youtube.com/watch?v=MORMZXEaONk" --octo MISTRAL_7B -npm run as -- --video "https://www.youtube.com/watch?v=MORMZXEaONk" --octo MIXTRAL_8X_7B -npm run as -- --video "https://www.youtube.com/watch?v=MORMZXEaONk" --octo 
NOUS_HERMES_MIXTRAL_8X_7B -npm run as -- --video "https://www.youtube.com/watch?v=MORMZXEaONk" --octo WIZARD_2_8X_22B +npm run as -- \ + --video "https://www.youtube.com/watch?v=MORMZXEaONk" \ + --mistral MISTRAL_NEMO ``` ### Fireworks ```bash -npm run as -- --video "https://www.youtube.com/watch?v=MORMZXEaONk" --fireworks +npm run as -- \ + --video "https://www.youtube.com/watch?v=MORMZXEaONk" \ + --fireworks ``` Select Fireworks model: ```bash -npm run as -- --video "https://www.youtube.com/watch?v=MORMZXEaONk" --fireworks LLAMA_3_1_405B -npm run as -- --video "https://www.youtube.com/watch?v=MORMZXEaONk" --fireworks LLAMA_3_1_70B -npm run as -- --video "https://www.youtube.com/watch?v=MORMZXEaONk" --fireworks LLAMA_3_1_8B -npm run as -- --video "https://www.youtube.com/watch?v=MORMZXEaONk" --fireworks LLAMA_3_2_3B -npm run as -- --video "https://www.youtube.com/watch?v=MORMZXEaONk" --fireworks LLAMA_3_2_1B -npm run as -- --video "https://www.youtube.com/watch?v=MORMZXEaONk" --fireworks QWEN_2_5_72B +npm run as -- \ + --video "https://www.youtube.com/watch?v=MORMZXEaONk" \ + --fireworks LLAMA_3_1_405B + +npm run as -- \ + --video "https://www.youtube.com/watch?v=MORMZXEaONk" \ + --fireworks LLAMA_3_1_70B + +npm run as -- \ + --video "https://www.youtube.com/watch?v=MORMZXEaONk" \ + --fireworks LLAMA_3_1_8B + +npm run as -- \ + --video "https://www.youtube.com/watch?v=MORMZXEaONk" \ + --fireworks LLAMA_3_2_3B + +npm run as -- \ + --video "https://www.youtube.com/watch?v=MORMZXEaONk" \ + --fireworks LLAMA_3_2_1B + +npm run as -- \ + --video "https://www.youtube.com/watch?v=MORMZXEaONk" \ + --fireworks QWEN_2_5_72B ``` ### Together ```bash -npm run as -- --video "https://www.youtube.com/watch?v=MORMZXEaONk" --together +npm run as -- \ + --video "https://www.youtube.com/watch?v=MORMZXEaONk" \ + --together ``` Select Together model: ```bash -npm run as -- --video "https://www.youtube.com/watch?v=MORMZXEaONk" --together LLAMA_3_2_3B -npm run as -- --video "https://www.youtube.com/watch?v=MORMZXEaONk" --together LLAMA_3_1_405B -npm run as -- --video "https://www.youtube.com/watch?v=MORMZXEaONk" --together LLAMA_3_1_70B -npm run as -- --video "https://www.youtube.com/watch?v=MORMZXEaONk" --together LLAMA_3_1_8B -npm run as -- --video "https://www.youtube.com/watch?v=MORMZXEaONk" --together GEMMA_2_27B -npm run as -- --video "https://www.youtube.com/watch?v=MORMZXEaONk" --together GEMMA_2_9B -npm run as -- --video "https://www.youtube.com/watch?v=MORMZXEaONk" --together QWEN_2_5_72B -npm run as -- --video "https://www.youtube.com/watch?v=MORMZXEaONk" --together QWEN_2_5_7B +npm run as -- \ + --video "https://www.youtube.com/watch?v=MORMZXEaONk" \ + --together LLAMA_3_2_3B + +npm run as -- \ + --video "https://www.youtube.com/watch?v=MORMZXEaONk" \ + --together LLAMA_3_1_405B + +npm run as -- \ + --video "https://www.youtube.com/watch?v=MORMZXEaONk" \ + --together LLAMA_3_1_70B + +npm run as -- \ + --video "https://www.youtube.com/watch?v=MORMZXEaONk" \ + --together LLAMA_3_1_8B + +npm run as -- \ + --video "https://www.youtube.com/watch?v=MORMZXEaONk" \ + --together GEMMA_2_27B + +npm run as -- \ + --video "https://www.youtube.com/watch?v=MORMZXEaONk" \ + --together GEMMA_2_9B + +npm run as -- \ + --video "https://www.youtube.com/watch?v=MORMZXEaONk" \ + --together QWEN_2_5_72B + +npm run as -- \ + --video "https://www.youtube.com/watch?v=MORMZXEaONk" \ + --together QWEN_2_5_7B ``` ### Groq ```bash -npm run as -- --video "https://www.youtube.com/watch?v=MORMZXEaONk" --groq +npm run as -- \ + 
--video "https://www.youtube.com/watch?v=MORMZXEaONk" \ + --groq ``` Select Groq model: ```bash -npm run as -- --video "https://www.youtube.com/watch?v=MORMZXEaONk" --groq LLAMA_3_1_70B_VERSATILE -npm run as -- --video "https://www.youtube.com/watch?v=MORMZXEaONk" --groq LLAMA_3_1_8B_INSTANT -npm run as -- --video "https://www.youtube.com/watch?v=MORMZXEaONk" --groq LLAMA_3_2_1B_PREVIEW -npm run as -- --video "https://www.youtube.com/watch?v=MORMZXEaONk" --groq LLAMA_3_2_3B_PREVIEW -npm run as -- --video "https://www.youtube.com/watch?v=MORMZXEaONk" --groq MIXTRAL_8X7B_32768 +npm run as -- \ + --video "https://www.youtube.com/watch?v=MORMZXEaONk" \ + --groq LLAMA_3_1_70B_VERSATILE + +npm run as -- \ + --video "https://www.youtube.com/watch?v=MORMZXEaONk" \ + --groq LLAMA_3_1_8B_INSTANT + +npm run as -- \ + --video "https://www.youtube.com/watch?v=MORMZXEaONk" \ + --groq LLAMA_3_2_1B_PREVIEW + +npm run as -- \ + --video "https://www.youtube.com/watch?v=MORMZXEaONk" \ + --groq LLAMA_3_2_3B_PREVIEW + +npm run as -- \ + --video "https://www.youtube.com/watch?v=MORMZXEaONk" \ + --groq MIXTRAL_8X7B_32768 ``` ### Llama.cpp ```bash -npm run as -- --video "https://www.youtube.com/watch?v=MORMZXEaONk" --llama +npm run as -- \ + --video "https://www.youtube.com/watch?v=MORMZXEaONk" \ + --llama ``` Select Llama model: ```bash -npm run as -- --video "https://www.youtube.com/watch?v=MORMZXEaONk" --llama GEMMA_2_2B -npm run as -- --video "https://www.youtube.com/watch?v=MORMZXEaONk" --llama LLAMA_3_2_1B -npm run as -- --video "https://www.youtube.com/watch?v=MORMZXEaONk" --llama PHI_3_5 -npm run as -- --video "https://www.youtube.com/watch?v=MORMZXEaONk" --llama QWEN_2_5_3B +npm run as -- \ + --video "https://www.youtube.com/watch?v=MORMZXEaONk" \ + --llama GEMMA_2_2B + +npm run as -- \ + --video "https://www.youtube.com/watch?v=MORMZXEaONk" \ + --llama LLAMA_3_2_1B + +npm run as -- \ + --video "https://www.youtube.com/watch?v=MORMZXEaONk" \ + --llama PHI_3_5 + +npm run as -- \ + --video "https://www.youtube.com/watch?v=MORMZXEaONk" \ + --llama QWEN_2_5_3B ``` ### Ollama ```bash -npm run as -- --video "https://www.youtube.com/watch?v=MORMZXEaONk" --ollama +npm run as -- \ + --video "https://www.youtube.com/watch?v=MORMZXEaONk" \ + --ollama ``` Select Ollama model: ```bash -npm run as -- --video "https://www.youtube.com/watch?v=MORMZXEaONk" --ollama LLAMA_3_2_1B -npm run as -- --video "https://www.youtube.com/watch?v=MORMZXEaONk" --ollama LLAMA_3_2_3B -npm run as -- --video "https://www.youtube.com/watch?v=MORMZXEaONk" --ollama GEMMA_2_2B -npm run as -- --video "https://www.youtube.com/watch?v=MORMZXEaONk" --ollama PHI_3_5 -npm run as -- --video "https://www.youtube.com/watch?v=MORMZXEaONk" --ollama QWEN_2_5_1B -npm run as -- --video "https://www.youtube.com/watch?v=MORMZXEaONk" --ollama QWEN_2_5_3B +npm run as -- \ + --video "https://www.youtube.com/watch?v=MORMZXEaONk" \ + --ollama LLAMA_3_2_1B + +npm run as -- \ + --video "https://www.youtube.com/watch?v=MORMZXEaONk" \ + --ollama LLAMA_3_2_3B + +npm run as -- \ + --video "https://www.youtube.com/watch?v=MORMZXEaONk" \ + --ollama GEMMA_2_2B + +npm run as -- \ + --video "https://www.youtube.com/watch?v=MORMZXEaONk" \ + --ollama PHI_3_5 + +npm run as -- \ + --video "https://www.youtube.com/watch?v=MORMZXEaONk" \ + --ollama QWEN_2_5_1B + +npm run as -- \ + --video "https://www.youtube.com/watch?v=MORMZXEaONk" \ + --ollama QWEN_2_5_3B ``` ## Transcription Options @@ -337,28 +473,42 @@ If neither the `--deepgram` or `--assembly` option is included for 
transcription ```bash # tiny model -npm run as -- --video "https://www.youtube.com/watch?v=MORMZXEaONk" --whisper tiny +npm run as -- \ + --video "https://www.youtube.com/watch?v=MORMZXEaONk" \ + --whisper tiny # base model -npm run as -- --video "https://www.youtube.com/watch?v=MORMZXEaONk" --whisper base +npm run as -- \ + --video "https://www.youtube.com/watch?v=MORMZXEaONk" \ + --whisper base # small model -npm run as -- --video "https://www.youtube.com/watch?v=MORMZXEaONk" --whisper small +npm run as -- \ + --video "https://www.youtube.com/watch?v=MORMZXEaONk" \ + --whisper small # medium model -npm run as -- --video "https://www.youtube.com/watch?v=MORMZXEaONk" --whisper medium +npm run as -- \ + --video "https://www.youtube.com/watch?v=MORMZXEaONk" \ + --whisper medium # large-v2 model -npm run as -- --video "https://www.youtube.com/watch?v=MORMZXEaONk" --whisper large-v2 +npm run as -- \ + --video "https://www.youtube.com/watch?v=MORMZXEaONk" \ + --whisper large-v2 # large-v3-turbo model -npm run as -- --video "https://www.youtube.com/watch?v=MORMZXEaONk" --whisper large-v3-turbo +npm run as -- \ + --video "https://www.youtube.com/watch?v=MORMZXEaONk" \ + --whisper large-v3-turbo ``` Run `whisper.cpp` in a Docker container with `--whisperDocker`: ```bash -npm run as -- --video "https://www.youtube.com/watch?v=MORMZXEaONk" --whisperDocker base +npm run as -- \ + --video "https://www.youtube.com/watch?v=MORMZXEaONk" \ + --whisperDocker base ``` ### Whisper Python @@ -366,7 +516,9 @@ npm run as -- --video "https://www.youtube.com/watch?v=MORMZXEaONk" --whisperDoc Use the original [`openai/whisper`](https://github.com/openai/whisper) Python library with the newly released [`turbo`](https://github.com/openai/whisper/discussions/2363) model: ```bash -npm run as -- --file "content/audio.mp3" --whisperPython turbo +npm run as -- \ + --file "content/audio.mp3" \ + --whisperPython turbo ``` ### Whisper Diarization @@ -374,7 +526,9 @@ npm run as -- --file "content/audio.mp3" --whisperPython turbo Use [`whisper-diarization`](https://github.com/MahmoudAshraf97/whisper-diarization) to provide speaker labels: ```bash -npm run as -- --file "content/audio.mp3" --whisperDiarization tiny +npm run as -- \ + --file "content/audio.mp3" \ + --whisperDiarization tiny ``` ### Deepgram @@ -382,7 +536,9 @@ npm run as -- --file "content/audio.mp3" --whisperDiarization tiny Create a `.env` file and set API key as demonstrated in `.env.example` for `DEEPGRAM_API_KEY`. ```bash -npm run as -- --video "https://www.youtube.com/watch?v=MORMZXEaONk" --deepgram +npm run as -- \ + --video "https://www.youtube.com/watch?v=MORMZXEaONk" \ + --deepgram ``` ### Assembly @@ -390,13 +546,18 @@ npm run as -- --video "https://www.youtube.com/watch?v=MORMZXEaONk" --deepgram Create a `.env` file and set API key as demonstrated in `.env.example` for `ASSEMBLY_API_KEY`. 
```bash -npm run as -- --video "https://www.youtube.com/watch?v=MORMZXEaONk" --assembly +npm run as -- \ + --video "https://www.youtube.com/watch?v=MORMZXEaONk" \ + --assembly ``` Include speaker labels and number of speakers: ```bash -npm run as -- --video "https://ajc.pics/audio/fsjam-short.mp3" --assembly --speakerLabels +npm run as -- \ + --video "https://ajc.pics/audio/fsjam-short.mp3" \ + --assembly \ + --speakerLabels ``` ## Prompt Options @@ -404,55 +565,73 @@ npm run as -- --video "https://ajc.pics/audio/fsjam-short.mp3" --assembly --spea Default includes summary and long chapters, equivalent to running this: ```bash -npm run as -- --video "https://www.youtube.com/watch?v=MORMZXEaONk" --prompt summary longChapters +npm run as -- \ + --video "https://www.youtube.com/watch?v=MORMZXEaONk" \ + --prompt summary longChapters ``` Create five title ideas: ```bash -npm run as -- --video "https://www.youtube.com/watch?v=MORMZXEaONk" --prompt titles +npm run as -- \ + --video "https://www.youtube.com/watch?v=MORMZXEaONk" \ + --prompt titles ``` Create a one sentence and one paragraph summary: ```bash -npm run as -- --video "https://www.youtube.com/watch?v=MORMZXEaONk" --prompt summary +npm run as -- \ + --video "https://www.youtube.com/watch?v=MORMZXEaONk" \ + --prompt summary ``` Create a short, one sentence description for each chapter that's 25 words or shorter. ```bash -npm run as -- --video "https://www.youtube.com/watch?v=MORMZXEaONk" --prompt shortChapters +npm run as -- \ + --video "https://www.youtube.com/watch?v=MORMZXEaONk" \ + --prompt shortChapters ``` Create a one paragraph description for each chapter that's around 50 words. ```bash -npm run as -- --video "https://www.youtube.com/watch?v=MORMZXEaONk" --prompt mediumChapters +npm run as -- \ + --video "https://www.youtube.com/watch?v=MORMZXEaONk" \ + --prompt mediumChapters ``` Create a two paragraph description for each chapter that's over 75 words. ```bash -npm run as -- --video "https://www.youtube.com/watch?v=MORMZXEaONk" --prompt longChapters +npm run as -- \ + --video "https://www.youtube.com/watch?v=MORMZXEaONk" \ + --prompt longChapters ``` Create three key takeaways about the content: ```bash -npm run as -- --video "https://www.youtube.com/watch?v=MORMZXEaONk" --prompt takeaways +npm run as -- \ + --video "https://www.youtube.com/watch?v=MORMZXEaONk" \ + --prompt takeaways ``` Create ten questions about the content to check for comprehension: ```bash -npm run as -- --video "https://www.youtube.com/watch?v=MORMZXEaONk" --prompt questions +npm run as -- \ + --video "https://www.youtube.com/watch?v=MORMZXEaONk" \ + --prompt questions ``` Include all prompt options: ```bash -npm run as -- --video "https://www.youtube.com/watch?v=MORMZXEaONk" --prompt titles summary longChapters takeaways questions +npm run as -- \ + --video "https://www.youtube.com/watch?v=MORMZXEaONk" \ + --prompt titles summary longChapters takeaways questions ``` ## Alternative Runtimes @@ -468,14 +647,21 @@ npm run docker-up Replace `as` with `docker` to run most other commands explained in this document. Does not support all options at this time, notably `--llama`, `--whisperPython`, and `--whisperDiarization`. 
```bash
-npm run docker -- --video "https://www.youtube.com/watch?v=MORMZXEaONk"
-npm run docker -- --video "https://www.youtube.com/watch?v=MORMZXEaONk" --whisperDocker tiny
+npm run docker -- \
+  --video "https://www.youtube.com/watch?v=MORMZXEaONk"
+
+npm run docker -- \
+  --video "https://www.youtube.com/watch?v=MORMZXEaONk" \
+  --whisperDocker tiny
```

The Docker setup currently supports Ollama's official Docker image, so the entire project can be encapsulated in one local Docker Compose file:

```bash
-npm run docker -- --video "https://www.youtube.com/watch?v=MORMZXEaONk" --whisperDocker tiny --ollama
+npm run docker -- \
+  --video "https://www.youtube.com/watch?v=MORMZXEaONk" \
+  --whisperDocker tiny \
+  --ollama
```

To reset your Docker images and containers, run:

@@ -487,20 +673,20 @@
npm run prune
```

### Bun

```bash
-bun bun-as -- --video "https://www.youtube.com/watch?v=MORMZXEaONk"
+bun bun-as -- \
+  --video "https://www.youtube.com/watch?v=MORMZXEaONk"
```

### Deno

```bash
-deno task deno-as --video "https://www.youtube.com/watch?v=MORMZXEaONk"
+deno task deno-as \
+  --video "https://www.youtube.com/watch?v=MORMZXEaONk"
```

-## Makeshift Test Suite
+## Test Suite

-Creating a robust and flexible test suite for this project is complex because of the range of network requests, file system operations, build steps, and 3rd party APIs involved. A more thought out test suite will be created at some point, but in the mean time these are hacky but functional ways to test the majority of the project in a single go.
-
-### Full Test Suite
+An integration test that runs example commands against all of the supported services.

- You'll need API keys for all services to make it through this entire command.
- Mostly uses transcripts of videos around one minute long and cheaper models when possible, so the total cost of running this for any given service should be at most only a few cents.

@@ -509,52 +695,20 @@
npm run test-all
```

-### Partial Test Command for Local Services
-
-This version of the test suite only uses Whisper for transcription and Llama.cpp for LLM operations.
+A local services test that only uses Whisper for transcription and Ollama for LLM operations.

```bash
npm run test-local
```
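+
+`package.json` (shown below in this diff) also defines `t` as a shorthand for this local test:
+
+```bash
+# assumes the `t` script from package.json: "t": "npm run test-local"
+npm run t
+```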

-## Create Single Markdown File with Entire Project
-
-This can be a useful way of creating a single markdown file of the entire project for giving to an LLM as context to develop new features or debug code. I'll usually start a conversation by including this along with a prompt that explains what I want changed or added.
-
-```bash
-export MD="LLM.md" && export COMMANDS="src/commands" && export UTILS="src/utils" && \
-  export LLMS="src/llms" && export TRANSCRIPT="src/transcription" && \
-  export OPEN="\n\n\`\`\`js" && export CLOSE="\n\`\`\`\n\n" && cat README.md >> $MD && \
-  echo '\n\n### Directory and File Structure\n\n```' >> $MD && tree >> $MD && \
-  echo '```\n\n## Example CLI Commands Test Suite'$OPEN'' >> $MD && cat test/all.test.js >> $MD && \
-  echo ''$CLOSE'## JSDoc Types'$OPEN'' >> $MD && cat src/types.js >> $MD && \
-  echo ''$CLOSE'## AutoShow CLI Entry Point'$OPEN'' >> $MD && cat src/autoshow.js >> $MD && \
-  echo ''$CLOSE'## Utility Functions\n\n### Generate Markdown'$OPEN'' >> $MD && cat $UTILS/generateMarkdown.js >> $MD && \
-  echo ''$CLOSE'### Download Audio'$OPEN'' >> $MD && cat $UTILS/downloadAudio.js >> $MD && \
-  echo ''$CLOSE'### Run Transcription'$OPEN'' >> $MD && cat $UTILS/runTranscription.js >> $MD && \
-  echo ''$CLOSE'### Run LLM'$OPEN'' >> $MD && cat $UTILS/runLLM.js >> $MD && \
-  echo ''$CLOSE'### Clean Up Files'$OPEN'' >> $MD && cat $UTILS/cleanUpFiles.js >> $MD && \
-  echo ''$CLOSE'## Process Commands\n\n### Process Video'$OPEN'' >> $MD && cat $COMMANDS/processVideo.js >> $MD && \
-  echo ''$CLOSE'### Process Playlist'$OPEN'' >> $MD && cat $COMMANDS/processPlaylist.js >> $MD && \
-  echo ''$CLOSE'### Process URLs'$OPEN'' >> $MD && cat $COMMANDS/processURLs.js >> $MD && \
-  echo ''$CLOSE'### Process RSS'$OPEN'' >> $MD && cat $COMMANDS/processRSS.js >> $MD && \
-  echo ''$CLOSE'### Process File'$OPEN'' >> $MD && cat $COMMANDS/processFile.js >> $MD && \
-  echo ''$CLOSE'## Transcription Functions\n\n### Call Whisper'$OPEN'' >> $MD && cat $TRANSCRIPT/whisper.js >> $MD && \
-  echo ''$CLOSE'### Call Deepgram'$OPEN'' >> $MD && cat $TRANSCRIPT/deepgram.js >> $MD && \
-  echo ''$CLOSE'### Call Assembly'$OPEN'' >> $MD && cat $TRANSCRIPT/assembly.js >> $MD && \
-  echo ''$CLOSE'## LLM Functions\n\n### Prompt Function'$OPEN'' >> $MD && cat $LLMS/prompt.js >> $MD && \
-  echo ''$CLOSE'### Call ChatGPT'$OPEN'' >> $MD && cat $LLMS/chatgpt.js >> $MD && \
-  echo ''$CLOSE'### Call Claude'$OPEN'' >> $MD && cat $LLMS/claude.js >> $MD && \
-  echo ''$CLOSE'### Call Cohere'$OPEN'' >> $MD && cat $LLMS/cohere.js >> $MD && \
-  echo ''$CLOSE'### Call Gemini'$OPEN'' >> $MD && cat $LLMS/gemini.js >> $MD && \
-  echo ''$CLOSE'### Call Llama.cpp'$OPEN'' >> $MD && cat $LLMS/llama.js >> $MD && \
-  echo ''$CLOSE'### Call Ollama'$OPEN'' >> $MD && cat $LLMS/ollama.js >> $MD && \
-  echo ''$CLOSE'### Call Mistral'$OPEN'' >> $MD && cat $LLMS/mistral.js >> $MD && \
-  echo ''$CLOSE'### Call Octo'$OPEN'' >> $MD && cat $LLMS/octo.js >> $MD && \
-  echo ''$CLOSE'## Docker Files\n\n```Dockerfile' >> $MD && cat .github/whisper.Dockerfile >> $MD && \
-  echo ''$CLOSE'```Dockerfile' >> $MD && cat .github/llama.Dockerfile >> $MD && \
-  echo ''$CLOSE'```Dockerfile' >> $MD && cat Dockerfile >> $MD && \
-  echo ''$CLOSE'```yml' >> $MD && cat docker-compose.yml >> $MD && \
-  echo ''$CLOSE'```bash' >> $MD && cat docker-entrypoint.sh >> $MD && \
-  echo '\n```\n' >> $MD
+A Docker test that also uses Whisper for transcription and Ollama for LLM operations, but runs them inside Docker containers.
+
+```bash
+npm run test-docker
+```
+
+A benchmark test that compares different model sizes for `whisper.cpp`, `openai-whisper`, and `whisper-diarization`.
+ +```bash +npm run test-bench ``` \ No newline at end of file diff --git a/package.json b/package.json index a5f1c47..e14dd55 100644 --- a/package.json +++ b/package.json @@ -26,16 +26,19 @@ "docker-up": "docker compose up --build -d --remove-orphans --no-start", "ds": "docker compose images && docker compose ls", "prune": "docker system prune -af --volumes && docker image prune -af && docker container prune -f && docker volume prune -af", - "v": "tsx --env-file=.env --no-warnings src/autoshow.ts --whisper large-v2 --video", - "u": "tsx --env-file=.env --no-warnings src/autoshow.ts --whisper large-v2 --urls", - "p": "tsx --env-file=.env --no-warnings src/autoshow.ts --whisper large-v2 --playlist", - "f": "tsx --env-file=.env --no-warnings src/autoshow.ts --whisper large-v2 --file", - "r": "tsx --env-file=.env --no-warnings src/autoshow.ts --whisper large-v2 --rss", - "last3": "tsx --env-file=.env --no-warnings src/autoshow.ts --whisper large-v2 --last 3 --rss", + "v": "tsx --env-file=.env --no-warnings src/autoshow.ts --whisper large-v3-turbo --video", + "u": "tsx --env-file=.env --no-warnings src/autoshow.ts --whisper large-v3-turbo --urls", + "p": "tsx --env-file=.env --no-warnings src/autoshow.ts --whisper large-v3-turbo --playlist", + "f": "tsx --env-file=.env --no-warnings src/autoshow.ts --whisper large-v3-turbo --file", + "r": "tsx --env-file=.env --no-warnings src/autoshow.ts --whisper large-v3-turbo --rss", + "last2": "tsx --env-file=.env --no-warnings src/autoshow.ts --whisper large-v3-turbo --last 2 --rss", + "last3": "tsx --env-file=.env --no-warnings src/autoshow.ts --whisper large-v3-turbo --last 3 --rss", "serve": "tsx --env-file=.env --no-warnings --watch packages/server/index.ts", "fetch-local": "tsx --env-file=.env --no-warnings packages/server/tests/fetch-local.ts", "fetch-all": "tsx --env-file=.env --no-warnings packages/server/tests/fetch-all.ts", "t": "npm run test-local", + "bench": "tsx --test test/bench.test.ts", + "test-bench": "tsx --test test/bench.test.ts", "test-local": "tsx --test test/local.test.ts", "test-docker": "tsx --test test/docker.test.ts", "test-integrations": "tsx --test test/integrations.test.ts", @@ -44,8 +47,8 @@ "deno-as": "deno run --allow-sys --allow-read --allow-run --allow-write --allow-env src/autoshow.ts" }, "dependencies": { - "@anthropic-ai/sdk": "0.29.0", - "@deepgram/sdk": "3.8.1", + "@anthropic-ai/sdk": "0.30.1", + "@deepgram/sdk": "3.9.0", "@fastify/cors": "10.0.1", "@google/generative-ai": "0.21.0", "@mistralai/mistralai": "1.1.0", @@ -56,17 +59,17 @@ "commander": "12.1.0", "fast-xml-parser": "4.5.0", "fastify": "5.0.0", - "file-type": "19.5.0", - "inquirer": "12.0.0", + "file-type": "19.6.0", + "inquirer": "12.0.1", "node-llama-cpp": "3.1.1", "ollama": "0.5.9", - "openai": "4.67.3" + "openai": "4.68.4" }, "devDependencies": { "@types/inquirer": "9.0.7", - "@types/node": "22.7.5", + "@types/node": "22.8.1", "tsx": "4.19.1", - "typedoc": "^0.26.10", + "typedoc": "0.26.10", "typescript": "5.6.3" } } diff --git a/src/autoshow.ts b/src/autoshow.ts index 986862d..2ed1701 100644 --- a/src/autoshow.ts +++ b/src/autoshow.ts @@ -22,28 +22,31 @@ import { argv, exit } from 'node:process' import { log, opts, final, ACTION_OPTIONS, LLM_OPTIONS, TRANSCRIPT_OPTIONS } from './models.js' import type { ProcessingOptions, HandlerFunction, LLMServices, TranscriptServices } from './types.js' -// Initialize the command-line interface +// Initialize the command-line interface using Commander.js const program = new Command() /** * Defines the 
command-line interface options and descriptions.
+ * Sets up all available commands and their respective flags.
 */
program
  .name('autoshow')
  .version('0.0.1')
  .description('Automate processing of audio and video content from various sources.')
  .usage('[options]')
-  .option('--prompt <sections...>', 'Specify prompt sections to include')
+  // Input source options
  .option('-v, --video <url>', 'Process a single YouTube video')
  .option('-p, --playlist <playlistUrl>', 'Process all videos in a YouTube playlist')
  .option('-u, --urls <filePath>', 'Process YouTube videos from a list of URLs in a file')
  .option('-f, --file <filePath>', 'Process a local audio or video file')
  .option('-r, --rss <rssURL>', 'Process a podcast RSS feed')
+  // RSS feed specific options
  .option('--item <itemUrls...>', 'Process specific items in the RSS feed by providing their audio URLs')
  .option('--order <order>', 'Specify the order for RSS feed processing (newest or oldest)')
  .option('--skip <number>', 'Number of items to skip when processing RSS feed', parseInt)
  .option('--last <number>', 'Number of most recent items to process (overrides --order and --skip)', parseInt)
  .option('--info', 'Generate JSON file with RSS feed information instead of processing items')
+  // Transcription service options
  .option('--whisper [model]', 'Use Whisper.cpp for transcription with optional model specification')
  .option('--whisperDocker [model]', 'Use Whisper.cpp in Docker for transcription with optional model specification')
  .option('--whisperPython [model]', 'Use openai-whisper for transcription with optional model specification')
@@ -51,6 +54,7 @@
  .option('--deepgram', 'Use Deepgram for transcription')
  .option('--assembly', 'Use AssemblyAI for transcription')
  .option('--speakerLabels', 'Use speaker labels for AssemblyAI transcription')
+  // LLM service options
  .option('--chatgpt [model]', 'Use ChatGPT for processing with optional model specification')
  .option('--claude [model]', 'Use Claude for processing with optional model specification')
  .option('--cohere [model]', 'Use Cohere for processing with optional model specification')
@@ -62,6 +66,8 @@
  .option('--llama [model]', 'Use Node Llama for processing with optional model specification')
  .option('--ollama [model]', 'Use Ollama for processing with optional model specification')
  .option('--gemini [model]', 'Use Gemini for processing with optional model specification')
+  // Utility options
+  .option('--prompt <sections...>', 'Specify prompt sections to include')
  .option('--noCleanUp', 'Do not delete intermediary files after processing')
  .option('-i, --interactive', 'Run in interactive mode')
  .addHelpText(
    'after',
    `
Report Issues: https://github.com/ajcwebdev/autoshow/issues
@@ -80,6 +86,8 @@

/**
 * Helper function to validate that only one option from a list is provided.
+ * Prevents users from specifying multiple conflicting options simultaneously.
+ *
 * @param optionKeys - The list of option keys to check.
 * @param options - The options object.
 * @param errorMessage - The prefix of the error message.
@@ -90,7 +98,10 @@ function getSingleOption(
  options: ProcessingOptions,
  errorMessage: string
): string | undefined {
+  // Filter out which options from the provided list are actually set
  const selectedOptions = optionKeys.filter((opt) => options[opt as keyof ProcessingOptions])
+
+  // If more than one option is selected, throw an error
  if (selectedOptions.length > 1) {
    console.error(`Error: Multiple ${errorMessage} provided (${selectedOptions.join(', ')}). Please specify only one.`)
    exit(1)
  }
@@ -100,13 +111,17 @@
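+// Note: getSingleOption treats conflicting flags as fatal (for example, passing
+// both `--chatgpt` and `--claude` exits with an error), while passing no LLM
+// flag at all yields undefined, so transcription-only runs still work.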

/**
 * Main action for the program.
+ * Handles the processing of options and executes the appropriate command handler.
+ *
 * @param options - The command-line options provided by the user.
 */
program.action(async (options: ProcessingOptions) => {
+  // Log received options for debugging purposes
  log(opts(`Options received at beginning of command:\n`))
  log(options)
  log(``)

+  // Define mapping of action types to their handler functions
  const PROCESS_HANDLERS: Record<string, HandlerFunction> = {
    video: processVideo,
    playlist: processPlaylist,
    urls: processURLs,
    file: processFile,
    rss: processRSS,
  }

@@ -115,61 +130,65 @@ program.action(async (options: ProcessingOptions) => {
+  // Extract interactive mode flag
  const { interactive } = options
+
+  // Check if no action option was provided
  const noActionProvided = ACTION_OPTIONS.every((opt) => !options[opt as keyof ProcessingOptions])

+  // If in interactive mode or no action provided, prompt user for input
  if (interactive || noActionProvided) {
    options = await handleInteractivePrompt(options)
  }

-  // Ensure options.item is an array if provided via command line
+  // Ensure options.item is always an array if provided via command line
  if (options.item && !Array.isArray(options.item)) {
    options.item = [options.item]
  }

-  // Validate and retrieve single action option
+  // Validate and get single options for action, LLM, and transcription
  const action = getSingleOption(ACTION_OPTIONS, options, 'input option')
-
-  // Validate and retrieve single LLM option
  const llmKey = getSingleOption(LLM_OPTIONS, options, 'LLM option')
  const llmServices = llmKey as LLMServices | undefined
-
-  // Validate and retrieve single transcription option
  const transcriptKey = getSingleOption(TRANSCRIPT_OPTIONS, options, 'transcription option')
  const transcriptServices: TranscriptServices | undefined = transcriptKey as TranscriptServices | undefined

-  // Set default transcription service if not provided
+  // Set default transcription service to whisper if none provided
  const finalTranscriptServices: TranscriptServices = transcriptServices || 'whisper'

-  // Set default Whisper model if not provided
+  // Set default Whisper model to 'large-v3-turbo' if whisper is selected but no model specified
  if (finalTranscriptServices === 'whisper' && !options.whisper) {
-    options.whisper = 'base'
+    options.whisper = 'large-v3-turbo'
  }

+  // Execute the appropriate handler if an action was specified
  if (action) {
    try {
+      // Process the content using the selected handler
      await PROCESS_HANDLERS[action](
        options,
        options[action as keyof ProcessingOptions] as string,
        llmServices,
        finalTranscriptServices
      )
+      // Log success message
      log(final(`\n================================================================================================`))
      log(final(`  ${action} Processing Completed Successfully.`))
      log(final(`================================================================================================\n`))
      exit(0)
    } catch (error) {
+      // Log error and exit if processing fails
      console.error(`Error processing ${action}:`, (error as Error).message)
      exit(1)
    }
  }
})

-// Handle unknown commands
+// Set up error handling for unknown commands
program.on('command:*', function () {
  console.error(`Error: Invalid command '${program.args.join(' ')}'. Use --help to see available commands.`)
  exit(1)
})

-// Parse the command-line arguments
+// Parse the command-line arguments and execute the program
program.parse(argv)
\ No newline at end of file
diff --git a/src/commands/processFile.ts b/src/commands/processFile.ts
index 6dc4c77..a9b3c5d 100644
--- a/src/commands/processFile.ts
+++ b/src/commands/processFile.ts
@@ -1,5 +1,10 @@
// src/commands/processFile.ts

+/**
+ * @file Process a local audio or video file for transcription and analysis.
+ * @packageDocumentation
+ */
+
import { generateMarkdown } from '../utils/generateMarkdown.js'
import { downloadAudio } from '../utils/downloadAudio.js'
import { runTranscription } from '../utils/runTranscription.js'
@@ -9,12 +14,22 @@ import { log, opts, wait } from '../models.js'
import type { LLMServices, TranscriptServices, ProcessingOptions } from '../types.js'

/**
- * Main function to process a local audio or video file.
- * @param {string} filePath - The path to the local file to process.
- * @param {LLMServices} [llmServices] - The selected Language Model option.
- * @param {TranscriptServices} [transcriptServices] - The transcription service to use.
- * @param {ProcessingOptions} options - Additional options for processing.
- * @returns {Promise<void>}
+ * Processes a local audio or video file through a series of operations:
+ * 1. Generates markdown with file metadata
+ * 2. Converts the file to the required audio format
+ * 3. Transcribes the audio content
+ * 4. Processes the transcript with a language model (if specified)
+ * 5. Cleans up temporary files (unless disabled)
+ *
+ * Unlike processVideo, this function handles local files and doesn't need
+ * to check for external dependencies like yt-dlp.
+ *
+ * @param options - Configuration options for processing
+ * @param filePath - Path to the local audio or video file to process
+ * @param llmServices - Optional language model service to use for processing the transcript
+ * @param transcriptServices - Optional transcription service to use for converting audio to text
+ * @throws Will terminate the process with exit code 1 if any processing step fails
+ * @returns Promise that resolves when all processing is complete
 */
export async function processFile(
  options: ProcessingOptions,
@@ -22,18 +37,30 @@ export async function processFile(
  llmServices?: LLMServices,
  transcriptServices?: TranscriptServices
): Promise<void> {
+  // Log the processing parameters for debugging purposes
  log(opts('Parameters passed to processFile:\n'))
  log(wait(`  - llmServices: ${llmServices}\n  - transcriptServices: ${transcriptServices}\n`))
+
  try {
-    const { frontMatter, finalPath, filename } = await generateMarkdown(options, filePath) // Generate markdown for the file
-    await downloadAudio(options, filePath, filename) // Convert the audio or video file to the required format
-    await runTranscription(options, finalPath, frontMatter, transcriptServices) // Run transcription on the file
-    await runLLM(options, finalPath, frontMatter, llmServices) // Process the transcript with the selected Language Model
-    if (!options.noCleanUp) { // Clean up temporary files if the noCleanUp option is not set
+    // Generate markdown file with file metadata and get file paths
+    const { frontMatter, finalPath, filename } = await generateMarkdown(options, filePath)
+
+    // Convert the input file to the required audio format for processing
+    await downloadAudio(options, filePath, filename)
+
+    // Convert the audio to text using the specified transcription service
+    await
runTranscription(options, finalPath, frontMatter, transcriptServices) + + // Process the transcript with a language model if one was specified + await runLLM(options, finalPath, frontMatter, llmServices) + + // Remove temporary files unless the noCleanUp option is set + if (!options.noCleanUp) { await cleanUpFiles(finalPath) } } catch (error) { + // Log the error and terminate the process with error code console.error(`Error processing file: ${(error as Error).message}`) - process.exit(1) // Exit with an error code + process.exit(1) } } \ No newline at end of file diff --git a/src/commands/processPlaylist.ts b/src/commands/processPlaylist.ts index 755c530..1d99623 100644 --- a/src/commands/processPlaylist.ts +++ b/src/commands/processPlaylist.ts @@ -1,5 +1,10 @@ // src/commands/processPlaylist.ts +/** + * @file Process all videos from a YouTube playlist, handling metadata extraction and individual video processing. + * @packageDocumentation + */ + import { writeFile } from 'node:fs/promises' import { execFile } from 'node:child_process' import { promisify } from 'node:util' @@ -9,14 +14,26 @@ import { checkDependencies } from '../utils/checkDependencies.js' import { log, opts, success, wait } from '../models.js' import type { LLMServices, TranscriptServices, ProcessingOptions } from '../types.js' +// Convert execFile to use promises instead of callbacks const execFilePromise = promisify(execFile) /** - * Main function to process a YouTube playlist. - * @param playlistUrl - The URL of the YouTube playlist to process. - * @param llmServices - The selected Language Model option. - * @param transcriptServices - The transcription service to use. - * @param options - Additional options for processing. + * Processes an entire YouTube playlist by: + * 1. Validating system dependencies + * 2. Fetching all video URLs from the playlist using yt-dlp + * 3. Extracting metadata for each video + * 4. Either: + * a. Generating a JSON file with playlist information (if --info option is used) + * b. Processing each video sequentially with error handling + * + * The function continues processing remaining videos even if individual videos fail. 
+ *
+ * @param options - Configuration options for processing
+ * @param playlistUrl - URL of the YouTube playlist to process
+ * @param llmServices - Optional language model service for transcript processing
+ * @param transcriptServices - Optional transcription service for audio conversion
+ * @throws Will terminate the process with exit code 1 if the playlist itself cannot be processed
+ * @returns Promise that resolves when all videos have been processed or JSON info has been saved
 */
export async function processPlaylist(
  options: ProcessingOptions,
@@ -24,39 +41,36 @@ export async function processPlaylist(
  llmServices?: LLMServices,
  transcriptServices?: TranscriptServices
): Promise<void> {
+  // Log the processing parameters for debugging purposes
  log(opts('Parameters passed to processPlaylist:\n'))
  log(wait(`  - llmServices: ${llmServices}\n  - transcriptServices: ${transcriptServices}`))
  try {
-    // Check for required dependencies
+    // Verify that yt-dlp is installed and available
    await checkDependencies(['yt-dlp'])
-
-    // Fetch video URLs from the playlist
+    // Extract all video URLs from the playlist using yt-dlp
    const { stdout, stderr } = await execFilePromise('yt-dlp', [
      '--flat-playlist',
      '--print', 'url',
      '--no-warnings',
      playlistUrl
    ])
-
+    // Log any warnings from yt-dlp
    if (stderr) {
      console.error(`yt-dlp warnings: ${stderr}`)
    }
-
-    // Split the stdout into an array of video URLs
+    // Convert stdout into array of video URLs, removing empty entries
    const urls = stdout.trim().split('\n').filter(Boolean)

+    // Exit if no videos were found in the playlist
    if (urls.length === 0) {
      console.error('Error: No videos found in the playlist.')
-      process.exit(1) // Exit with an error code
+      process.exit(1)
    }
-
    log(opts(`\nFound ${urls.length} videos in the playlist...`))
-
-    // Extract metadata for all videos
+    // Collect metadata for all videos in parallel
    const metadataPromises = urls.map(extractVideoMetadata)
    const metadataList = await Promise.all(metadataPromises)
    const validMetadata = metadataList.filter(Boolean)
-
-    // Generate JSON file with playlist information if --info option is used
+    // Handle --info option: save metadata to JSON and exit
    if (options.info) {
      const jsonContent = JSON.stringify(validMetadata, null, 2)
      const jsonFilePath = 'content/playlist_info.json'
@@ -64,21 +78,22 @@
      log(success(`Playlist information saved to: ${jsonFilePath}`))
      return
    }
-
-    // Process each video in the playlist
+    // Process each video sequentially, with error handling for individual videos
    for (const [index, url] of urls.entries()) {
+      // Visual separator for each video in the console
      log(opts(`\n================================================================================================`))
      log(opts(`  Processing video ${index + 1}/${urls.length}: ${url}`))
      log(opts(`================================================================================================\n`))
      try {
        await processVideo(options, url, llmServices, transcriptServices)
      } catch (error) {
+        // Log error but continue processing remaining videos
        console.error(`Error processing video ${url}: ${(error as Error).message}`)
-        // Continue processing the next video
      }
    }
  } catch (error) {
+    // Handle fatal errors that prevent playlist processing
    console.error(`Error processing playlist: ${(error as Error).message}`)
-    process.exit(1) // Exit with an error code
+    process.exit(1)
  }
}
\ No newline at end of file
diff --git a/src/commands/processRSS.ts b/src/commands/processRSS.ts
index d73d6d8..34c74aa
100644 --- a/src/commands/processRSS.ts +++ b/src/commands/processRSS.ts @@ -1,5 +1,10 @@ // src/commands/processRSS.ts +/** + * @file Process podcast episodes and other media content from RSS feeds with robust error handling and filtering options. + * @packageDocumentation + */ + import { writeFile } from 'node:fs/promises' import { XMLParser } from 'fast-xml-parser' import { generateMarkdown } from '../utils/generateMarkdown.js' @@ -7,11 +12,13 @@ import { downloadAudio } from '../utils/downloadAudio.js' import { runTranscription } from '../utils/runTranscription.js' import { runLLM } from '../utils/runLLM.js' import { cleanUpFiles } from '../utils/cleanUpFiles.js' -import { log, final, wait, opts } from '../models.js' - +import { log, wait, opts } from '../models.js' import type { LLMServices, TranscriptServices, ProcessingOptions, RSSItem } from '../types.js' -// Initialize XML parser with specific options +/** + * Configure XML parser for RSS feed processing + * Handles attributes without prefixes and allows boolean values + */ const parser = new XMLParser({ ignoreAttributes: false, attributeNamePrefix: '', @@ -19,12 +26,183 @@ const parser = new XMLParser({ }) /** - * Process a single item from the RSS feed. - * @param {RSSItem} item - The item to process. - * @param {TranscriptServices} [transcriptServices] - The transcription service to use. - * @param {LLMServices} [llmServices] - The selected Language Model option. - * @param {ProcessingOptions} options - Additional options for processing. - * @returns {Promise} + * Validates RSS processing options for consistency and correct values. + * + * @param options - Configuration options to validate + * @throws Will exit process if validation fails + */ +function validateOptions(options: ProcessingOptions): void { + if (options.last !== undefined) { + if (!Number.isInteger(options.last) || options.last < 1) { + console.error('Error: The --last option must be a positive integer.') + process.exit(1) + } + if (options.skip !== undefined || options.order !== undefined) { + console.error('Error: The --last option cannot be used with --skip or --order.') + process.exit(1) + } + } + + if (options.skip !== undefined && (!Number.isInteger(options.skip) || options.skip < 0)) { + console.error('Error: The --skip option must be a non-negative integer.') + process.exit(1) + } + + if (options.order !== undefined && !['newest', 'oldest'].includes(options.order)) { + console.error("Error: The --order option must be either 'newest' or 'oldest'.") + process.exit(1) + } +} + +/** + * Logs the current processing action based on provided options. + * + * @param options - Configuration options determining what to process + */ +function logProcessingAction(options: ProcessingOptions): void { + if (options.item && options.item.length > 0) { + log(wait('\nProcessing specific items:')) + options.item.forEach((url) => log(wait(` - ${url}`))) + } else if (options.last) { + log(wait(`\nProcessing the last ${options.last} items`)) + } else if (options.skip) { + log(wait(` - Skipping first ${options.skip || 0} items`)) + } +} + +/** + * Fetches and parses an RSS feed with timeout handling. 
+ *
+ * @param rssUrl - URL of the RSS feed to fetch
+ * @returns The parsed RSS feed object
+ * @throws Will exit process on network or parsing errors
+ */
+async function fetchRSSFeed(rssUrl: string) {
+  const controller = new AbortController()
+  const timeout = setTimeout(() => controller.abort(), 10000) // 10 seconds timeout
+
+  try {
+    const response = await fetch(rssUrl, {
+      method: 'GET',
+      headers: { 'Accept': 'application/rss+xml' },
+      signal: controller.signal,
+    })
+    clearTimeout(timeout)
+
+    if (!response.ok) {
+      console.error(`HTTP error! status: ${response.status}`)
+      process.exit(1)
+    }
+
+    const text = await response.text()
+    return parser.parse(text)
+  } catch (error) {
+    if ((error as Error).name === 'AbortError') {
+      console.error('Error: Fetch request timed out.')
+    } else {
+      console.error(`Error fetching RSS feed: ${(error as Error).message}`)
+    }
+    process.exit(1)
+  }
+}
+
+/**
+ * Extracts and normalizes items from a parsed RSS feed.
+ *
+ * @param feed - Parsed RSS feed object
+ * @returns Array of normalized RSS items
+ * @throws Will exit process if no valid items are found
+ */
+function extractFeedItems(feed: any): RSSItem[] {
+  const { title: channelTitle, link: channelLink, image: channelImageObject, item: feedItems } = feed.rss.channel
+  const channelImage = channelImageObject?.url || ''
+  const feedItemsArray = Array.isArray(feedItems) ? feedItems : [feedItems]
+
+  const items: RSSItem[] = feedItemsArray
+    .filter((item) => {
+      if (!item.enclosure || !item.enclosure.type) return false
+      const audioVideoTypes = ['audio/', 'video/']
+      return audioVideoTypes.some((type) => item.enclosure.type.startsWith(type))
+    })
+    .map((item) => ({
+      showLink: item.enclosure.url,
+      channel: channelTitle,
+      channelURL: channelLink,
+      title: item.title,
+      description: '',
+      publishDate: new Date(item.pubDate).toISOString().split('T')[0],
+      coverImage: item['itunes:image']?.href || channelImage || '',
+    }))
+
+  if (items.length === 0) {
+    console.error('Error: No audio/video items found in the RSS feed.')
+    process.exit(1)
+  }
+
+  return items
+}
+
+/**
+ * Saves feed information to a JSON file.
+ *
+ * @param items - Array of RSS items to save
+ */
+async function saveFeedInfo(items: RSSItem[]): Promise<void> {
+  const jsonContent = JSON.stringify(items, null, 2)
+  const jsonFilePath = 'content/rss_info.json'
+  await writeFile(jsonFilePath, jsonContent)
+  log(wait(`RSS feed information saved to: ${jsonFilePath}`))
+}
+
+/**
+ * Selects which items to process based on provided options.
+ *
+ * @param items - All available RSS items
+ * @param options - Configuration options for filtering
+ * @returns Array of items to process
+ * @throws Will exit process if no matching items are found
+ */
+function selectItemsToProcess(items: RSSItem[], options: ProcessingOptions): RSSItem[] {
+  if (options.item && options.item.length > 0) {
+    const matchedItems = items.filter((item) => options.item!.includes(item.showLink))
+    if (matchedItems.length === 0) {
+      console.error('Error: No matching items found for the provided URLs.')
+      process.exit(1)
+    }
+    return matchedItems
+  }
+
+  if (options.last) {
+    return items.slice(0, options.last)
+  }
+
+  const sortedItems = options.order === 'oldest' ? items.slice().reverse() : items
+  return sortedItems.slice(options.skip || 0)
+}
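+
+// For example, { last: 3 } keeps only the three newest items, while
+// { order: 'oldest', skip: 2 } reverses the feed and then skips its first two items.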
+
+/**
+ * Logs the processing status and item counts.
+ */
+function logProcessingStatus(total: number, processing: number, options: ProcessingOptions): void {
+  if (options.item && options.item.length > 0) {
+    log(wait(`\n  - Found ${total} items in the RSS feed.`))
+    log(wait(`  - Processing ${processing} specified items.`))
+  } else if (options.last) {
+    log(wait(`\n  - Found ${total} items in the RSS feed.`))
+    log(wait(`  - Processing the last ${options.last} items.`))
+  } else {
+    log(wait(`\n  - Found ${total} item(s) in the RSS feed.`))
+    log(wait(`  - Processing ${processing} item(s) after skipping ${options.skip || 0}.\n`))
+  }
+}
+
+/**
+ * Processes a single RSS feed item.
+ *
+ * @param options - Configuration options for processing
+ * @param item - RSS item to process
+ * @param llmServices - Optional language model service
+ * @param transcriptServices - Optional transcription service
 */
async function processItem(
  options: ProcessingOptions,
@@ -34,27 +212,46 @@ async function processItem(
): Promise<void> {
  log(opts('Parameters passed to processItem:\n'))
  log(wait(`  - llmServices: ${llmServices}\n  - transcriptServices: ${transcriptServices}\n`))
+
  try {
-    const { frontMatter, finalPath, filename } = await generateMarkdown(options, item) // Generate markdown for the item
-    await downloadAudio(options, item.showLink, filename) // Download audio
-    await runTranscription(options, finalPath, frontMatter, transcriptServices) // Run transcription
-    await runLLM(options, finalPath, frontMatter, llmServices) // Process with Language Model
-    if (!options.noCleanUp) { // Clean up temporary files if necessary
+    const { frontMatter, finalPath, filename } = await generateMarkdown(options, item)
+    await downloadAudio(options, item.showLink, filename)
+    await runTranscription(options, finalPath, frontMatter, transcriptServices)
+    await runLLM(options, finalPath, frontMatter, llmServices)
+
+    if (!options.noCleanUp) {
      await cleanUpFiles(finalPath)
    }
  } catch (error) {
    console.error(`Error processing item ${item.title}: ${(error as Error).message}`)
-    // Continue processing the next item
+  }
+}
+
+/**
+ * Processes a batch of RSS items.
+ */
+async function processItems(
+  items: RSSItem[],
+  options: ProcessingOptions,
+  llmServices?: LLMServices,
+  transcriptServices?: TranscriptServices
+): Promise<void> {
+  for (const [index, item] of items.entries()) {
+    log(opts(`\n========================================================================================`))
+    log(opts(`  Item ${index + 1}/${items.length} processing: ${item.title}`))
+    log(opts(`========================================================================================\n`))
+
+    await processItem(options, item, llmServices, transcriptServices)
+
+    log(opts(`\n========================================================================================`))
+    log(opts(`  ${index + 1}/${items.length} item processing completed successfully`))
+    log(opts(`========================================================================================\n`))
+  }
+}

/**
 * Main function to process an RSS feed.
- * @param {string} rssUrl - The URL of the RSS feed to process.
- * @param {LLMServices} [llmServices] - The selected Language Model option.
- * @param {TranscriptServices} [transcriptServices] - The transcription service to use.
- * @param {ProcessingOptions} options - Additional options for processing.
- * @returns {Promise<void>}
+ * Validates the provided options, fetches and parses the feed, and then either
+ * saves the feed metadata (when --info is set) or processes the selected items.
+ *
+ * @param options - Configuration options for processing
+ * @param rssUrl - The URL of the RSS feed to process
+ * @param llmServices - Optional language model service
+ * @param transcriptServices - Optional transcription service
*/ export async function processRSS( options: ProcessingOptions, @@ -63,163 +260,25 @@ export async function processRSS( transcriptServices?: TranscriptServices ): Promise { log(opts('Parameters passed to processRSS:\n')) - log(` - llmServices: ${llmServices}\n - transcriptServices: ${transcriptServices}`) - try { - // Validate that --last is a positive integer if provided - if (options.last !== undefined) { - if (!Number.isInteger(options.last) || options.last < 1) { - console.error('Error: The --last option must be a positive integer.') - process.exit(1) - } - // Ensure --last is not used with --skip or --order - if (options.skip !== undefined || options.order !== undefined) { - console.error('Error: The --last option cannot be used with --skip or --order.') - process.exit(1) - } - } - - // Validate that --skip is a non-negative integer if provided - if (options.skip !== undefined) { - if (!Number.isInteger(options.skip) || options.skip < 0) { - console.error('Error: The --skip option must be a non-negative integer.') - process.exit(1) - } - } - - // Validate that --order is either 'newest' or 'oldest' if provided - if (options.order !== undefined) { - if (!['newest', 'oldest'].includes(options.order)) { - console.error("Error: The --order option must be either 'newest' or 'oldest'.") - process.exit(1) - } - } + log(wait(` - llmServices: ${llmServices}\n - transcriptServices: ${transcriptServices}`)) - // Log the processing action - if (options.item && options.item.length > 0) { - // If specific items are provided, list them - log(wait('\nProcessing specific items:')) - options.item.forEach((url) => log(wait(` - ${url}`))) - } else if (options.last) { - log(wait(`\nProcessing the last ${options.last} items`)) - } else if (options.skip) { - log(wait(` - Skipping first ${options.skip || 0} items`)) - } - - // Fetch the RSS feed with a timeout - const controller = new AbortController() - const timeout = setTimeout(() => { - controller.abort() - }, 10000) // 10 seconds timeout - - let response: Response - try { - response = await fetch(rssUrl, { - method: 'GET', - headers: { - 'Accept': 'application/rss+xml', - }, - signal: controller.signal, - }) - clearTimeout(timeout) - } catch (error) { - if ((error as Error).name === 'AbortError') { - console.error('Error: Fetch request timed out.') - } else { - console.error(`Error fetching RSS feed: ${(error as Error).message}`) - } - process.exit(1) // Exit with an error code - } + try { + validateOptions(options) + logProcessingAction(options) - // Check if the response is successful - if (!response.ok) { - console.error(`HTTP error! status: ${response.status}`) - process.exit(1) // Exit with an error code - } + const feed = await fetchRSSFeed(rssUrl) + const items = extractFeedItems(feed) - // Parse the RSS feed content - const text = await response.text() - const feed = parser.parse(text) - - // Extract channel and item information - const { - title: channelTitle, link: channelLink, image: channelImageObject, item: feedItems, - } = feed.rss.channel - - // Extract channel image URL safely - const channelImage = channelImageObject?.url || '' - - // Ensure feedItems is an array - const feedItemsArray = Array.isArray(feedItems) ? 
feedItems : [feedItems] - - // Filter and map feed items - const items: RSSItem[] = feedItemsArray - .filter((item) => { - // Ensure the item has an enclosure with a valid type - if (!item.enclosure || !item.enclosure.type) return false - const audioVideoTypes = ['audio/', 'video/'] - // Include only audio or video items - return audioVideoTypes.some((type) => item.enclosure.type.startsWith(type)) - }) - .map((item) => ({ - showLink: item.enclosure.url, - channel: channelTitle, - channelURL: channelLink, - title: item.title, - description: '', - publishDate: new Date(item.pubDate).toISOString().split('T')[0], - coverImage: item['itunes:image']?.href || channelImage || '', - })) - - if (items.length === 0) { - console.error('Error: No audio/video items found in the RSS feed.') - process.exit(1) // Exit with an error code - } - - // Generate JSON file with RSS feed information if --info option is used if (options.info) { - const jsonContent = JSON.stringify(items, null, 2) - const jsonFilePath = 'content/rss_info.json' - await writeFile(jsonFilePath, jsonContent) - log(wait(`RSS feed information saved to: ${jsonFilePath}`)) + await saveFeedInfo(items) return } - let itemsToProcess: RSSItem[] = [] - if (options.item && options.item.length > 0) { - // Find the items matching the provided audio URLs - const matchedItems = items.filter((item) => options.item!.includes(item.showLink)) - if (matchedItems.length === 0) { - console.error('Error: No matching items found for the provided URLs.') - process.exit(1) // Exit with an error code - } - itemsToProcess = matchedItems - log(wait(`\n - Found ${items.length} items in the RSS feed.`)) - log(wait(` - Processing ${itemsToProcess.length} specified items.`)) - } else if (options.last) { - // Process the most recent N items - itemsToProcess = items.slice(0, options.last) - log(wait(`\n - Found ${items.length} items in the RSS feed.`)) - log(wait(` - Processing the last ${options.last} items.`)) - } else { - // Sort items based on the specified order and apply skip - const sortedItems = options.order === 'oldest' ? 
items.slice().reverse() : items - itemsToProcess = sortedItems.slice(options.skip || 0) - log(wait(`\n - Found ${items.length} item(s) in the RSS feed.`)) - log(wait(` - Processing ${itemsToProcess.length} item(s) after skipping ${options.skip || 0}.\n`)) - } - - // Process each item in the feed - for (const [index, item] of itemsToProcess.entries()) { - log(opts(`\n========================================================================================`)) - log(opts(` Item ${index + 1}/${itemsToProcess.length} processing: ${item.title}`)) - log(opts(`========================================================================================\n`)) - await processItem(options, item, llmServices, transcriptServices) - log(opts(`\n========================================================================================`)) - log(opts(` ${index + 1}/${itemsToProcess.length} item processing completed successfully`)) - log(opts(`========================================================================================\n`)) - } + const itemsToProcess = selectItemsToProcess(items, options) + logProcessingStatus(items.length, itemsToProcess.length, options) + await processItems(itemsToProcess, options, llmServices, transcriptServices) } catch (error) { console.error(`Error processing RSS feed: ${(error as Error).message}`) - process.exit(1) // Exit with an error code + process.exit(1) } } \ No newline at end of file diff --git a/src/commands/processURLs.ts b/src/commands/processURLs.ts index a4898d9..d69e428 100644 --- a/src/commands/processURLs.ts +++ b/src/commands/processURLs.ts @@ -1,5 +1,10 @@ // src/commands/processURLs.ts +/** + * @file Process multiple YouTube videos from a list of URLs stored in a file. + * @packageDocumentation + */ + import { readFile, writeFile } from 'node:fs/promises' import { processVideo } from './processVideo.js' import { extractVideoMetadata } from '../utils/extractVideoMetadata.js' @@ -8,11 +13,24 @@ import { log, wait, opts } from '../models.js' import type { LLMServices, TranscriptServices, ProcessingOptions } from '../types.js' /** - * Main function to process URLs from a file. - * @param filePath - The path to the file containing URLs. - * @param llmServices - The selected Language Model option. - * @param transcriptServices - The transcription service to use. - * @param options - Additional options for processing. + * Processes multiple YouTube videos from a file containing URLs by: + * 1. Validating system dependencies + * 2. Reading and parsing URLs from the input file + * - Skips empty lines and comments (lines starting with #) + * 3. Extracting metadata for all videos + * 4. Either: + * a. Generating a JSON file with video information (if --info option is used) + * b. Processing each video sequentially with error handling + * + * Similar to processPlaylist, this function continues processing + * remaining URLs even if individual videos fail. 
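+ *
+ * @example
+ * // Illustrative contents of a URLs file (one URL per line):
+ * // # Lines starting with # and blank lines are skipped
+ * // https://www.youtube.com/watch?v=...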
+ * + * @param options - Configuration options for processing + * @param filePath - Path to the file containing video URLs (one per line) + * @param llmServices - Optional language model service for transcript processing + * @param transcriptServices - Optional transcription service for audio conversion + * @throws Will terminate the process with exit code 1 if the file cannot be read or contains no valid URLs + * @returns Promise that resolves when all videos have been processed or JSON info has been saved */ export async function processURLs( options: ProcessingOptions, @@ -20,30 +38,29 @@ export async function processURLs( llmServices?: LLMServices, transcriptServices?: TranscriptServices ): Promise { + // Log the processing parameters for debugging purposes log(opts('Parameters passed to processURLs:\n')) log(wait(` - llmServices: ${llmServices}\n - transcriptServices: ${transcriptServices}\n`)) + try { - // Check for required dependencies + // Verify that yt-dlp is installed and available await checkDependencies(['yt-dlp']) - - // Read and parse the content of the file into an array of URLs + // Read the file and extract valid URLs const content = await readFile(filePath, 'utf8') const urls = content.split('\n') .map(line => line.trim()) .filter(line => line && !line.startsWith('#')) - + // Exit if no valid URLs were found in the file if (urls.length === 0) { console.error('Error: No URLs found in the file.') process.exit(1) } log(opts(`\nFound ${urls.length} URLs in the file...`)) - - // Extract metadata for all videos + // Collect metadata for all videos in parallel const metadataPromises = urls.map(extractVideoMetadata) const metadataList = await Promise.all(metadataPromises) const validMetadata = metadataList.filter(Boolean) - - // Generate JSON file with video information if --info option is used + // Handle --info option: save metadata to JSON and exit if (options.info) { const jsonContent = JSON.stringify(validMetadata, null, 2) const jsonFilePath = 'content/urls_info.json' @@ -51,21 +68,22 @@ export async function processURLs( log(wait(`Video information saved to: ${jsonFilePath}`)) return } - - // Process each URL + // Process each URL sequentially, with error handling for individual videos for (const [index, url] of urls.entries()) { + // Visual separator for each video in the console log(opts(`\n================================================================================================`)) log(opts(` Processing URL ${index + 1}/${urls.length}: ${url}`)) log(opts(`================================================================================================\n`)) try { await processVideo(options, url, llmServices, transcriptServices) } catch (error) { + // Log error but continue processing remaining URLs console.error(`Error processing URL ${url}: ${(error as Error).message}`) - // Continue processing the next URL } } } catch (error) { + // Handle fatal errors that prevent file processing console.error(`Error reading or processing file ${filePath}: ${(error as Error).message}`) - process.exit(1) // Exit with an error code + process.exit(1) } } \ No newline at end of file diff --git a/src/commands/processVideo.ts b/src/commands/processVideo.ts index f087e31..7c45fb4 100644 --- a/src/commands/processVideo.ts +++ b/src/commands/processVideo.ts @@ -1,5 +1,10 @@ // src/commands/processVideo.ts +/** + * @file Process a single video from YouTube or other supported platforms. 
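+ * Orchestrates dependency checks, markdown generation, audio download,
+ * transcription, optional LLM processing, and cleanup for a single video.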
+ * @packageDocumentation + */ + import { checkDependencies } from '../utils/checkDependencies.js' import { generateMarkdown } from '../utils/generateMarkdown.js' import { downloadAudio } from '../utils/downloadAudio.js' @@ -10,12 +15,20 @@ import { log, opts, wait } from '../models.js' import type { LLMServices, TranscriptServices, ProcessingOptions } from '../types.js' /** - * Main function to process a single video. - * @param url - The URL of the video to process. - * @param llmServices - The selected Language Model option. - * @param transcriptServices - The transcription service to use. - * @param options - Additional options for processing. - * @returns A promise that resolves when processing is complete. + * Processes a single video by executing a series of operations: + * 1. Validates required system dependencies + * 2. Generates markdown with video metadata + * 3. Downloads and extracts audio + * 4. Transcribes the audio content + * 5. Processes the transcript with a language model (if specified) + * 6. Cleans up temporary files (unless disabled) + * + * @param options - Configuration options for processing + * @param url - The URL of the video to process + * @param llmServices - Optional language model service to use for processing the transcript + * @param transcriptServices - Optional transcription service to use for converting audio to text + * @throws Will throw an error if any processing step fails + * @returns Promise that resolves when all processing is complete */ export async function processVideo( options: ProcessingOptions, @@ -23,19 +36,33 @@ export async function processVideo( llmServices?: LLMServices, transcriptServices?: TranscriptServices ): Promise { + // Log the processing parameters for debugging purposes log(opts('Parameters passed to processVideo:\n')) log(wait(` - llmServices: ${llmServices}\n - transcriptServices: ${transcriptServices}\n`)) + try { - await checkDependencies(['yt-dlp']) // Check for required dependencies. - const { frontMatter, finalPath, filename } = await generateMarkdown(options, url) // Generate markdown with video metadata. - await downloadAudio(options, url, filename) // Download audio from the video. - await runTranscription(options, finalPath, frontMatter, transcriptServices) // Run transcription on the audio. - await runLLM(options, finalPath, frontMatter, llmServices) // If llmServices is set, process with LLM. If llmServices is undefined, bypass LLM processing. - if (!options.noCleanUp) { // Clean up temporary files if the noCleanUp option is not set. 
+ // Verify that required system dependencies (yt-dlp) are installed + await checkDependencies(['yt-dlp']) + + // Generate markdown file with video metadata and get file paths + const { frontMatter, finalPath, filename } = await generateMarkdown(options, url) + + // Extract and download the audio from the video source + await downloadAudio(options, url, filename) + + // Convert the audio to text using the specified transcription service + await runTranscription(options, finalPath, frontMatter, transcriptServices) + + // Process the transcript with a language model if one was specified + await runLLM(options, finalPath, frontMatter, llmServices) + + // Remove temporary files unless the noCleanUp option is set + if (!options.noCleanUp) { await cleanUpFiles(finalPath) } } catch (error) { - console.error('Error processing video:', (error as Error).message) // Log any errors that occur during video processing - throw error // Re-throw to be handled by caller + // Log the error details and re-throw for upstream handling + console.error('Error processing video:', (error as Error).message) + throw error } } \ No newline at end of file diff --git a/src/interactive.ts b/src/interactive.ts index bdecec5..d7c3e10 100644 --- a/src/interactive.ts +++ b/src/interactive.ts @@ -6,13 +6,18 @@ import { log } from './models.js' /** * Prompts the user for input if interactive mode is selected. + * Handles the collection and processing of user choices through a series of + * interactive prompts using inquirer. + * * @param options - The initial command-line options. * @returns The updated options after user input. */ export async function handleInteractivePrompt( options: ProcessingOptions ): Promise { + // Define all interactive prompts using inquirer const answers: InquirerAnswers = await inquirer.prompt([ + // Content source selection prompt { type: 'list', name: 'action', @@ -25,6 +30,7 @@ export async function handleInteractivePrompt( { name: 'Podcast RSS Feed', value: 'rss' }, ], }, + // Input prompts for different content sources { type: 'input', name: 'video', @@ -53,6 +59,7 @@ export async function handleInteractivePrompt( when: (answers: InquirerAnswers) => answers.action === 'file', validate: (input: string) => (input ? true : 'Please enter a valid file path.'), }, + // RSS feed specific prompts { type: 'input', name: 'rss', @@ -77,8 +84,7 @@ export async function handleInteractivePrompt( { type: 'confirm', name: 'info', - message: - 'Do you want to generate JSON file with RSS feed information instead of processing items?', + message: 'Do you want to generate JSON file with RSS feed information instead of processing items?', when: (answers: InquirerAnswers) => answers.action === 'rss', default: false, }, @@ -98,8 +104,7 @@ export async function handleInteractivePrompt( name: 'skip', message: 'Number of items to skip when processing RSS feed:', when: (answers: InquirerAnswers) => answers.action === 'rss' && !answers.info, - validate: (input: string) => - !isNaN(Number(input)) ? true : 'Please enter a valid number.', + validate: (input: string) => !isNaN(Number(input)) ? true : 'Please enter a valid number.', filter: (input: string) => Number(input), }, { @@ -107,10 +112,10 @@ export async function handleInteractivePrompt( name: 'last', message: 'Number of most recent items to process (overrides order and skip):', when: (answers: InquirerAnswers) => answers.action === 'rss' && !answers.info, - validate: (input: string) => - !isNaN(Number(input)) ? 
true : 'Please enter a valid number.', + validate: (input: string) => !isNaN(Number(input)) ? true : 'Please enter a valid number.', filter: (input: string) => Number(input), }, + // Language Model (LLM) selection and configuration { type: 'list', name: 'llmServices', @@ -130,11 +135,13 @@ export async function handleInteractivePrompt( { name: 'Groq', value: 'groq' }, ], }, + // Model selection based on chosen LLM service { type: 'list', name: 'llmModel', message: 'Select the model you want to use:', choices: (answers: InquirerAnswers) => { + // Return appropriate model choices based on selected LLM service switch (answers.llmServices) { case 'llama': return [ @@ -240,6 +247,7 @@ export async function handleInteractivePrompt( 'gemini', ].includes(answers.llmServices as string), }, + // Transcription service configuration { type: 'list', name: 'transcriptServices', @@ -253,6 +261,7 @@ export async function handleInteractivePrompt( { name: 'AssemblyAI', value: 'assembly' }, ], }, + // Whisper model configuration { type: 'list', name: 'whisperModel', @@ -276,6 +285,7 @@ export async function handleInteractivePrompt( ), default: 'large-v2', }, + // Additional configuration options { type: 'confirm', name: 'speakerLabels', @@ -311,35 +321,31 @@ export async function handleInteractivePrompt( default: true, }, ]) - - // If user cancels the action + // Handle user cancellation if (!answers.confirmAction) { log('Operation cancelled.') process.exit(0) } - - // Merge answers into options + // Merge user answers with existing options options = { ...options, ...answers, } as ProcessingOptions - - // Handle transcription options + // Configure transcription service options based on user selection if (answers.transcriptServices) { if ( ['whisper', 'whisperDocker', 'whisperPython', 'whisperDiarization'].includes( answers.transcriptServices ) ) { - // Assign the Whisper model + // Set selected Whisper model (options as any)[answers.transcriptServices] = answers.whisperModel as WhisperModelType } else if (answers.transcriptServices === 'deepgram' || answers.transcriptServices === 'assembly') { - // Assign boolean true for these services + // Enable selected service (options as any)[answers.transcriptServices] = true } } - - // Handle LLM options + // Configure LLM options based on user selection if (answers.llmServices) { if (answers.llmModel) { (options as any)[answers.llmServices] = answers.llmModel @@ -347,15 +353,12 @@ export async function handleInteractivePrompt( (options as any)[answers.llmServices] = true } } - - // Handle 'item' for RSS feed + // Process RSS feed item URLs if provided if (typeof answers.item === 'string') { options.item = answers.item.split(',').map((item) => item.trim()) } - - // Remove unnecessary properties + // Clean up temporary properties used during prompt flow const keysToRemove = ['action', 'specifyItem', 'confirmAction', 'llmModel', 'whisperModel'] keysToRemove.forEach((key) => delete options[key as keyof typeof options]) - return options } \ No newline at end of file diff --git a/src/llms/ollama.ts b/src/llms/ollama.ts index f735c1b..9fd289e 100644 --- a/src/llms/ollama.ts +++ b/src/llms/ollama.ts @@ -104,7 +104,7 @@ export const callOllama: LLMFunction = async (promptAndTranscript: string, tempP try { const response = JSON.parse(line) if (response.status === 'success') { - log(wait(` - Model ${ollamaModelName} has been pulled successfully.`)) + log(wait(` - Model ${ollamaModelName} has been pulled successfully...\n`)) break } } catch (parseError) { @@ -113,7 +113,7 @@ 
export const callOllama: LLMFunction = async (promptAndTranscript: string, tempP } } } else { - log(wait(`\n Model ${ollamaModelName} is already available...`)) + log(wait(`\n Model ${ollamaModelName} is already available...\n`)) } } catch (error) { console.error(`Error checking/pulling model: ${error instanceof Error ? error.message : String(error)}`) diff --git a/src/models.ts b/src/models.ts index 943fc45..054d072 100644 --- a/src/models.ts +++ b/src/models.ts @@ -1,27 +1,79 @@ // src/models.ts +/** + * @file Defines constants, model mappings, and utility functions used throughout the application. + * @packageDocumentation + */ + import chalk from 'chalk' import type { ChalkInstance } from 'chalk' import type { WhisperModelType, ChatGPTModelType, ClaudeModelType, CohereModelType, GeminiModelType, MistralModelType, OctoModelType, LlamaModelType, OllamaModelType, TogetherModelType, FireworksModelType, GroqModelType } from './types.js' +/** + * Chalk styling for step indicators in the CLI + * @type {ChalkInstance} + */ export const step: ChalkInstance = chalk.bold.underline + +/** + * Chalk styling for dimmed text + * @type {ChalkInstance} + */ export const dim: ChalkInstance = chalk.dim + +/** + * Chalk styling for success messages + * @type {ChalkInstance} + */ export const success: ChalkInstance = chalk.bold.blue + +/** + * Chalk styling for options display + * @type {ChalkInstance} + */ export const opts: ChalkInstance = chalk.magentaBright.bold + +/** + * Chalk styling for wait/processing messages + * @type {ChalkInstance} + */ export const wait: ChalkInstance = chalk.bold.cyan + +/** + * Chalk styling for final messages + * @type {ChalkInstance} + */ export const final: ChalkInstance = chalk.bold.italic +/** + * Convenience export for console.log + * @type {typeof console.log} + */ export const log: typeof console.log = console.log +/** + * Available action options for content processing + * @type {string[]} + */ export const ACTION_OPTIONS = ['video', 'playlist', 'urls', 'file', 'rss'] + +/** + * Available LLM service options + * @type {string[]} + */ export const LLM_OPTIONS = ['chatgpt', 'claude', 'cohere', 'mistral', 'octo', 'llama', 'ollama', 'gemini', 'fireworks', 'together', 'groq'] + +/** + * Available transcription service options + * @type {string[]} + */ export const TRANSCRIPT_OPTIONS = ['whisper', 'whisperDocker', 'whisperPython', 'whisperDiarization', 'deepgram', 'assembly'] /** - * Define available Whisper models for whisper.cpp + * Mapping of Whisper model types to their corresponding binary filenames for whisper.cpp. * @type {Record} */ -export const WHISPER_MODELS: Record = { +export const WHISPER_MODELS: Record = { 'tiny': 'ggml-tiny.bin', 'tiny.en': 'ggml-tiny.en.bin', 'base': 'ggml-base.bin', @@ -33,10 +85,11 @@ export const WHISPER_MODELS: Record = { 'large-v1': 'ggml-large-v1.bin', 'large-v2': 'ggml-large-v2.bin', 'large-v3-turbo': 'ggml-large-v3-turbo.bin', + 'turbo': 'ggml-large-v3-turbo.bin' } /** - * Define available Whisper models for openai-whisper + * Mapping of Whisper model types to their corresponding names for openai-whisper. * @type {Record} */ export const WHISPER_PYTHON_MODELS: Record = { @@ -55,7 +108,7 @@ export const WHISPER_PYTHON_MODELS: Record = { } /** - * Map of ChatGPT model identifiers to their API names + * Mapping of ChatGPT model identifiers to their API names. 
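+ * Keys are the internal identifiers used on the command line (e.g., GPT_4o_MINI);
+ * values are the model names sent to the OpenAI API.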
* @type {Record} */ export const GPT_MODELS: Record = { @@ -66,7 +119,7 @@ export const GPT_MODELS: Record = { } /** - * Map of Claude model identifiers to their API names + * Mapping of Claude model identifiers to their API names. * @type {Record} */ export const CLAUDE_MODELS: Record = { @@ -77,7 +130,7 @@ export const CLAUDE_MODELS: Record = { } /** - * Map of Cohere model identifiers to their API names + * Mapping of Cohere model identifiers to their API names. * @type {Record} */ export const COHERE_MODELS: Record = { @@ -86,7 +139,7 @@ export const COHERE_MODELS: Record = { } /** - * Map of Gemini model identifiers to their API names + * Mapping of Gemini model identifiers to their API names. * @type {Record} */ export const GEMINI_MODELS: Record = { @@ -96,7 +149,7 @@ export const GEMINI_MODELS: Record = { } /** - * Map of Mistral model identifiers to their API names + * Mapping of Mistral model identifiers to their API names. * @type {Record} */ export const MISTRAL_MODELS: Record = { @@ -107,7 +160,7 @@ export const MISTRAL_MODELS: Record = { } /** - * Map of OctoAI model identifiers to their API names + * Mapping of OctoAI model identifiers to their API names. * @type {Record} */ export const OCTO_MODELS: Record = { @@ -121,7 +174,7 @@ export const OCTO_MODELS: Record = { } /** - * Map of Fireworks model identifiers to their API names + * Mapping of Fireworks model identifiers to their API names. * @type {Record} */ export const FIREWORKS_MODELS: Record = { @@ -134,7 +187,7 @@ export const FIREWORKS_MODELS: Record = { } /** - * Map of Together model identifiers to their API names + * Mapping of Together model identifiers to their API names. * @type {Record} */ export const TOGETHER_MODELS: Record = { @@ -149,7 +202,7 @@ export const TOGETHER_MODELS: Record = { } /** - * Map of Groq model identifiers to their API names. + * Mapping of Groq model identifiers to their API names. * @type {Record} */ export const GROQ_MODELS: Record = { @@ -161,7 +214,7 @@ export const GROQ_MODELS: Record = { } /** - * Map of local model identifiers to their filenames and URLs + * Mapping of local model identifiers to their filenames and download URLs. * @type {Record} */ export const LLAMA_MODELS: Record = { @@ -188,7 +241,7 @@ export const LLAMA_MODELS: Record} */ export const OLLAMA_MODELS: Record = { diff --git a/src/types.ts b/src/types.ts index f2b258a..6eef56a 100644 --- a/src/types.ts +++ b/src/types.ts @@ -1,12 +1,13 @@ // src/types.ts /** - * @file This file contains all the custom type definitions used across the Autoshow project. + * @file Custom type definitions used across the Autoshow project. * @packageDocumentation */ +// Core Processing Types /** - * Represents the processing options passed through command-line arguments or interactive prompts. + * Processing options passed through command-line arguments or interactive prompts. */ export type ProcessingOptions = { /** URL of the YouTube video to process. */ @@ -77,8 +78,9 @@ export type ProcessingOptions = { interactive?: boolean } +// Interactive CLI Types /** - * Represents the answers received from inquirer prompts in interactive mode. + * Answers received from inquirer prompts in interactive mode. */ export type InquirerAnswers = { /** The action selected by the user (e.g., 'video', 'playlist'). */ @@ -124,7 +126,7 @@ export type InquirerAnswers = { } /** - * Represents the structure of the inquirer prompt questions. + * Structure of the inquirer prompt questions. 
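+ * Mirrors the subset of inquirer question fields used by this CLI.
+ *
+ * @example
+ * // A minimal sketch of a single prompt entry (values illustrative):
+ * // { type: 'confirm', name: 'info', message: 'Generate JSON info file?', default: false }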
*/ export type InquirerQuestions = Array<{ /** The type of the prompt (e.g., 'input', 'list', 'confirm', 'checkbox'). */ @@ -143,26 +145,25 @@ export type InquirerQuestions = Array<{ default?: any }> +// Handler and Processing Types /** - * Represents a handler function for processing different actions (e.g., video, playlist). - * @param options - The options containing various inputs. - * @param input - The specific input (URL or file path). - * @param llmServices - The selected LLM service (optional). - * @param transcriptServices - The selected transcription service (optional). + * Handler function for processing different actions (e.g., video, playlist). + * + * @param options - The options containing various inputs + * @param input - The specific input (URL or file path) + * @param llmServices - The selected LLM service (optional) + * @param transcriptServices - The selected transcription service (optional) */ export type HandlerFunction = ( - // The options containing various inputs options: ProcessingOptions, - // The specific input (URL or file path) input: string, - // Allow llmServices to be optional or undefined llmServices?: LLMServices, - // Allow transcriptServices to be optional or undefined transcriptServices?: TranscriptServices ) => Promise +// Content Types /** - * Represents the data structure for markdown generation. + * Data structure for markdown generation. */ export type MarkdownData = { /** The front matter content for the markdown file. */ @@ -174,7 +175,7 @@ export type MarkdownData = { } /** - * Represents the metadata extracted from a YouTube video. + * Metadata extracted from a YouTube video. */ export type VideoMetadata = { /** The URL to the video's webpage. */ @@ -185,7 +186,7 @@ export type VideoMetadata = { channelURL: string /** The title of the video. */ title: string - /** The description of the video (empty string in this case). */ + /** The description of the video. */ description: string /** The upload date in 'YYYY-MM-DD' format. */ publishDate: string @@ -193,8 +194,9 @@ export type VideoMetadata = { coverImage: string } +// RSS Feed Types /** - * Represents an item in an RSS feed. + * Item in an RSS feed. */ export type RSSItem = { /** The publication date of the RSS item (e.g., '2024-09-24'). */ @@ -216,7 +218,7 @@ export type RSSItem = { } /** - * Represents the options for RSS feed processing. + * Options for RSS feed processing. */ export type RSSOptions = { /** The order to process items ('newest' or 'oldest'). */ @@ -225,8 +227,9 @@ export type RSSOptions = { skip?: number } +// Audio Processing Types /** - * Represents the options for downloading audio files. + * Options for downloading audio files. */ export type DownloadAudioOptions = { /** The desired output audio format (e.g., 'wav'). */ @@ -238,39 +241,24 @@ export type DownloadAudioOptions = { } /** - * Represents the supported file types for audio and video processing. + * Supported file types for audio and video processing. */ export type SupportedFileType = 'wav' | 'mp3' | 'm4a' | 'aac' | 'ogg' | 'flac' | 'mp4' | 'mkv' | 'avi' | 'mov' | 'webm' +// Transcription Service Types /** - * Represents the transcription services that can be used in the application. - * - * - whisper: Use Whisper.cpp for transcription. - * - whisperDocker: Use Whisper.cpp in a Docker container. - * - deepgram: Use Deepgram's transcription service. - * - assembly: Use AssemblyAI's transcription service. + * Transcription services that can be used in the application. 
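+ * Covers the local Whisper variants (whisper, whisperDocker, whisperPython,
+ * whisperDiarization) plus the hosted Deepgram and AssemblyAI services.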
*/ export type TranscriptServices = 'whisper' | 'whisperDocker' | 'whisperPython' | 'whisperDiarization' | 'deepgram' | 'assembly' /** - * Represents the available Whisper model types. - * - * - tiny: Smallest multilingual model. - * - tiny.en: Smallest English-only model. - * - base: Base multilingual model. - * - base.en: Base English-only model. - * - small: Small multilingual model. - * - small.en: Small English-only model. - * - medium: Medium multilingual model. - * - medium.en: Medium English-only model. - * - large-v1: Large multilingual model version 1. - * - large-v2: Large multilingual model version 2. - * - large-v3-turbo: Large multilingual model version 3 with new turbo model. + * Available Whisper model types with varying sizes and capabilities. */ export type WhisperModelType = 'tiny' | 'tiny.en' | 'base' | 'base.en' | 'small' | 'small.en' | 'medium' | 'medium.en' | 'large-v1' | 'large-v2' | 'large-v3-turbo' | 'turbo' +// LLM Types /** - * Represents the object containing the different prompts, their instructions to the LLM, and their expected example output. + * Object containing different prompts, their instructions to the LLM, and expected example output. */ export type PromptSection = { /** The instructions for the section. */ @@ -280,21 +268,12 @@ export type PromptSection = { } /** - * Represents the options for Language Models (LLMs) that can be used in the application. - * - * - chatgpt: Use OpenAI's ChatGPT models. - * - claude: Use Anthropic's Claude models. - * - cohere: Use Cohere's language models. - * - mistral: Use Mistral AI's language models. - * - octo: Use OctoAI's language models. - * - llama: Use Llama models for local inference. - * - ollama: Use Ollama for processing. - * - gemini: Use Google's Gemini models. + * Options for Language Models (LLMs) that can be used in the application. */ export type LLMServices = 'chatgpt' | 'claude' | 'cohere' | 'mistral' | 'octo' | 'llama' | 'ollama' | 'gemini' | 'fireworks' | 'together' | 'groq' /** - * Represents the options for LLM processing. + * Options for LLM processing. */ export type LLMOptions = { /** The sections to include in the prompt (e.g., ['titles', 'summary']). */ @@ -308,10 +287,11 @@ export type LLMOptions = { } /** - * Represents a function that calls an LLM for processing. - * @param promptAndTranscript - The combined prompt and transcript. - * @param tempPath - The temporary file path. - * @param llmModel - The specific LLM model to use (optional). + * Function that calls an LLM for processing. + * + * @param promptAndTranscript - The combined prompt and transcript + * @param tempPath - The temporary file path + * @param llmModel - The specific LLM model to use (optional) */ export type LLMFunction = ( promptAndTranscript: string, @@ -320,199 +300,364 @@ export type LLMFunction = ( ) => Promise /** - * Represents a mapping of LLM option keys to their corresponding functions. - * - * This ensures that only valid `LLMServices` values can be used as keys in the `llmFunctions` object. + * Mapping of LLM option keys to their corresponding functions. */ export type LLMFunctions = { [K in LLMServices]: LLMFunction } +// LLM Model Types /** - * Define all available LLM models. + * Available GPT models. */ -/** Define available GPT models. */ export type ChatGPTModelType = 'GPT_4o_MINI' | 'GPT_4o' | 'GPT_4_TURBO' | 'GPT_4' -/** Define available Claude models. */ + +/** + * Available Claude models. 
+ */ export type ClaudeModelType = 'CLAUDE_3_5_SONNET' | 'CLAUDE_3_OPUS' | 'CLAUDE_3_SONNET' | 'CLAUDE_3_HAIKU' -/** Define available Cohere models. */ + +/** + * Available Cohere models. + */ export type CohereModelType = 'COMMAND_R' | 'COMMAND_R_PLUS' -/** Define available Gemini models. */ + +/** + * Available Gemini models. + */ export type GeminiModelType = 'GEMINI_1_5_FLASH' | 'GEMINI_1_5_PRO' -/** Define available Mistral AI models. */ + +/** + * Available Mistral AI models. + */ export type MistralModelType = 'MIXTRAL_8x7b' | 'MIXTRAL_8x22b' | 'MISTRAL_LARGE' | 'MISTRAL_NEMO' -/** Define available OctoAI models. */ + +/** + * Available OctoAI models. + */ export type OctoModelType = 'LLAMA_3_1_8B' | 'LLAMA_3_1_70B' | 'LLAMA_3_1_405B' | 'MISTRAL_7B' | 'MIXTRAL_8X_7B' | 'NOUS_HERMES_MIXTRAL_8X_7B' | 'WIZARD_2_8X_22B' -/** Define available Fireworks models. */ + +/** + * Available Fireworks models. + */ export type FireworksModelType = 'LLAMA_3_1_405B' | 'LLAMA_3_1_70B' | 'LLAMA_3_1_8B' | 'LLAMA_3_2_3B' | 'LLAMA_3_2_1B' | 'QWEN_2_5_72B' -/** Define available Together models. */ + +/** + * Available Together models. + */ export type TogetherModelType = 'LLAMA_3_2_3B' | 'LLAMA_3_1_405B' | 'LLAMA_3_1_70B' | 'LLAMA_3_1_8B' | 'GEMMA_2_27B' | 'GEMMA_2_9B' | 'QWEN_2_5_72B' | 'QWEN_2_5_7B' -/** Define available Groq models. */ + +/** + * Available Groq models. + */ export type GroqModelType = 'LLAMA_3_1_70B_VERSATILE' | 'LLAMA_3_1_8B_INSTANT' | 'LLAMA_3_2_1B_PREVIEW' | 'LLAMA_3_2_3B_PREVIEW' | 'MIXTRAL_8X7B_32768' -/** Define local model configurations. */ + +/** + * Local model configurations. + */ export type LlamaModelType = 'QWEN_2_5_1B' | 'QWEN_2_5_3B' | 'PHI_3_5' | 'LLAMA_3_2_1B' | 'GEMMA_2_2B' -/** Define local model with Ollama. */ + +/** + * Local model with Ollama. + */ export type OllamaModelType = 'LLAMA_3_2_1B' | 'LLAMA_3_2_3B' | 'GEMMA_2_2B' | 'PHI_3_5' | 'QWEN_2_5_1B' | 'QWEN_2_5_3B' +// API Response Types +/** + * Response structure from Fireworks AI API. + */ export type FireworksResponse = { + /** Unique identifier for the response */ id: string + /** Type of object */ object: string + /** Timestamp of creation */ created: number + /** Model used for generation */ model: string + /** Input prompts */ prompt: any[] + /** Array of completion choices */ choices: { + /** Reason for completion finish */ finish_reason: string + /** Index of the choice */ index: number + /** Message content and metadata */ message: { + /** Role of the message author */ role: string + /** Generated content */ content: string + /** Tool calls made during generation */ tool_calls: { + /** Tool call identifier */ id: string + /** Type of tool call */ type: string + /** Function call details */ function: { + /** Name of the function called */ name: string + /** Arguments passed to the function */ arguments: string } }[] } }[] + /** Token usage statistics */ usage: { + /** Number of tokens in the prompt */ prompt_tokens: number + /** Number of tokens in the completion */ completion_tokens: number + /** Total tokens used */ total_tokens: number } } +/** + * Response structure from Together AI API. 
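+ * Follows the OpenAI-style chat completion shape, with Together-specific
+ * additions such as the seed and per-token logprobs fields.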
+ */ export type TogetherResponse = { + /** Unique identifier for the response */ id: string + /** Type of object */ object: string + /** Timestamp of creation */ created: number + /** Model used for generation */ model: string + /** Input prompts */ prompt: any[] + /** Array of completion choices */ choices: { + /** Generated text */ text: string + /** Reason for completion finish */ finish_reason: string + /** Random seed used */ seed: number + /** Choice index */ index: number + /** Message content and metadata */ message: { + /** Role of the message author */ role: string + /** Generated content */ content: string + /** Tool calls made during generation */ tool_calls: { + /** Index of the tool call */ index: number + /** Tool call identifier */ id: string + /** Type of tool call */ type: string + /** Function call details */ function: { + /** Name of the function called */ name: string + /** Arguments passed to the function */ arguments: string } }[] } + /** Log probability information */ logprobs: { + /** Array of token IDs */ token_ids: number[] + /** Array of tokens */ tokens: string[] + /** Log probabilities for tokens */ token_logprobs: number[] } }[] + /** Token usage statistics */ usage: { + /** Number of tokens in the prompt */ prompt_tokens: number + /** Number of tokens in the completion */ completion_tokens: number + /** Total tokens used */ total_tokens: number } } +/** + * Response structure from Groq Chat Completion API. + */ export type GroqChatCompletionResponse = { + /** Unique identifier for the response */ id: string + /** Type of object */ object: string - created: number // UNIX timestamp - model: string // e.g., "mixtral-8x7b-32768" - system_fingerprint: string | null // Nullable field + /** Timestamp of creation */ + created: number + /** Model used for generation */ + model: string + /** System fingerprint */ + system_fingerprint: string | null + /** Array of completion choices */ choices: { + /** Choice index */ index: number + /** Message content and metadata */ message: { - role: 'assistant' | 'user' | 'system' // Role of the message author - content: string // The actual text of the message + /** Role of the message author */ + role: 'assistant' | 'user' | 'system' + /** Generated content */ + content: string } - finish_reason: string // Reason why the completion stopped, e.g., "stop" + /** Reason for completion finish */ + finish_reason: string + /** Optional log probability information */ logprobs?: { - tokens: string[] // Tokens generated by the model - token_logprobs: number[] // Log probabilities for each token - top_logprobs: Record[] // Top logprobs for the tokens - text_offset: number[] // Text offsets for the tokens - } | null // Optional logprobs object + /** Array of tokens */ + tokens: string[] + /** Log probabilities for tokens */ + token_logprobs: number[] + /** Top log probabilities */ + top_logprobs: Record[] + /** Text offsets for tokens */ + text_offset: number[] + } | null }[] + /** Optional usage statistics */ usage?: { - prompt_tokens: number // Tokens used in the prompt - completion_tokens: number // Tokens used in the generated completion - total_tokens: number // Total tokens used - prompt_time?: number // Optional timing for the prompt - completion_time?: number // Optional timing for the completion - total_time?: number // Optional total time for both prompt and completion + /** Number of tokens in the prompt */ + prompt_tokens: number + /** Number of tokens in the completion */ + completion_tokens: number + /** Total tokens used */ 
+ total_tokens: number + /** Optional timing for prompt processing */ + prompt_time?: number + /** Optional timing for completion generation */ + completion_time?: number + /** Optional total processing time */ + total_time?: number } } -// Define the expected structure of the response from Ollama API +/** + * Response structure from Ollama API. + */ export type OllamaResponse = { + /** Model used for generation */ model: string + /** Timestamp of creation */ created_at: string + /** Message content and metadata */ message: { + /** Role of the message author */ role: string + /** Generated content */ content: string } + /** Reason for completion */ done_reason: string + /** Whether generation is complete */ done: boolean + /** Total processing duration */ total_duration: number + /** Model loading duration */ load_duration: number + /** Number of prompt evaluations */ prompt_eval_count: number + /** Duration of prompt evaluation */ prompt_eval_duration: number + /** Number of evaluations */ eval_count: number + /** Duration of evaluation */ eval_duration: number } +/** + * Response structure for Ollama model tags. + */ export type OllamaTagsResponse = { + /** Array of available models */ models: Array<{ + /** Model name */ name: string + /** Base model identifier */ model: string + /** Last modification timestamp */ modified_at: string + /** Model size in bytes */ size: number + /** Model digest */ digest: string + /** Model details */ details: { + /** Parent model identifier */ parent_model: string + /** Model format */ format: string + /** Model family */ family: string + /** Array of model families */ families: string[] + /** Model parameter size */ parameter_size: string + /** Quantization level */ quantization_level: string } }> } -// Define types for Deepgram API response +/** + * Response structure from Deepgram API. + */ export type DeepgramResponse = { + /** Metadata about the transcription */ metadata: { + /** Transaction key */ transaction_key: string + /** Request identifier */ request_id: string + /** SHA256 hash */ sha256: string + /** Creation timestamp */ created: string + /** Audio duration */ duration: number + /** Number of audio channels */ channels: number + /** Array of models used */ models: string[] + /** Information about models used */ model_info: { [key: string]: { + /** Model name */ name: string + /** Model version */ version: string + /** Model architecture */ arch: string } } } + /** Transcription results */ results: { + /** Array of channel results */ channels: Array<{ + /** Array of alternative transcriptions */ alternatives: Array<{ + /** Transcribed text */ transcript: string + /** Confidence score */ confidence: number + /** Array of word-level details */ words: Array<{ + /** Individual word */ word: string + /** Start time */ start: number + /** End time */ end: number + /** Word-level confidence */ confidence: number }> }> @@ -521,7 +666,8 @@ export type DeepgramResponse = { } /** - * Represents the function signature for cleaning up temporary files. - * @param id - The unique identifier for the temporary files. + * Function signature for cleaning up temporary files. 
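+ * Implemented by cleanUpFiles in src/utils/cleanUpFiles.ts.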
+ *
+ * @param id - The unique identifier for the temporary files
  */
 export type CleanUpFunction = (id: string) => Promise<void>
\ No newline at end of file
diff --git a/src/utils/checkDependencies.ts b/src/utils/checkDependencies.ts
index 0b23862..a9d01dc 100644
--- a/src/utils/checkDependencies.ts
+++ b/src/utils/checkDependencies.ts
@@ -1,20 +1,51 @@
 // src/utils/checkDependencies.ts
 
+/**
+ * @file Utility for verifying required command-line dependencies.
+ * Checks if necessary external tools are installed and accessible in the system PATH.
+ * @packageDocumentation
+ */
+
 import { execFile } from 'node:child_process'
 import { promisify } from 'node:util'
 
+// Promisify execFile for async/await usage
 const execFilePromise = promisify(execFile)
 
 /**
- * Check if required dependencies are installed.
- * @param dependencies - List of command-line tools to check.
- * @returns A promise that resolves when all dependencies are checked.
+ * Verifies that required command-line dependencies are installed and accessible.
+ * Attempts to execute each dependency with the --version flag to confirm availability.
+ *
+ * Common dependencies checked include:
+ * - yt-dlp: For downloading online content
+ * - ffmpeg: For audio processing
+ *
+ * @param {string[]} dependencies - Array of command names to verify.
+ *                                  Each command should support the --version flag.
+ *
+ * @returns {Promise<void>} Resolves when all dependencies are verified.
+ *
+ * @throws {Error} If any dependency is:
+ * - Not installed
+ * - Not found in system PATH
+ * - Not executable
+ * - Returns non-zero exit code
+ *
+ * @example
+ * try {
+ *   await checkDependencies(['yt-dlp', 'ffmpeg'])
+ *   console.log('All dependencies are available')
+ * } catch (error) {
+ *   console.error('Missing dependency:', error.message)
+ * }
  */
 export async function checkDependencies(dependencies: string[]): Promise<void> {
   for (const command of dependencies) {
     try {
+      // Attempt to execute command with --version flag
       await execFilePromise(command, ['--version'])
     } catch (error) {
+      // Throw descriptive error if command check fails
       throw new Error(
         `Dependency '${command}' is not installed or not found in PATH. Please install it to proceed.`
       )
diff --git a/src/utils/cleanUpFiles.ts b/src/utils/cleanUpFiles.ts
index 35effe1..3259dc0 100644
--- a/src/utils/cleanUpFiles.ts
+++ b/src/utils/cleanUpFiles.ts
@@ -1,29 +1,72 @@
 // src/utils/cleanUpFiles.ts
 
+/**
+ * @file Utility for cleaning up temporary files generated during processing.
+ * Handles removal of intermediate files with specific extensions.
+ * @packageDocumentation
+ */
+
 import { unlink } from 'node:fs/promises'
 import { log, step, success } from '../models.js'
 
 /**
- * Asynchronous function to clean up temporary files.
- * @param {string} id - The base filename (without extension) for the files to be cleaned up.
- * @returns {Promise<void>}
- * @throws {Error} - If an error occurs while deleting files.
+ * Removes temporary files generated during content processing.
+ * Attempts to delete files with specific extensions and logs the results.
+ * Silently ignores attempts to delete non-existent files.
+ *
+ * Files cleaned up include:
+ * - .wav: Audio files
+ * - .txt: Transcription text
+ * - .md: Markdown content
+ * - .lrc: Lyrics/subtitles
+ *
+ * @param {string} id - Base filename (without extension) used to identify related files.
+ *                      All files matching pattern `${id}${extension}` will be deleted.
+ *
+ * @returns {Promise<void>} Resolves when cleanup is complete.
+ * + * @throws {Error} If deletion fails for reasons other than file not existing: + * - Permission denied + * - File is locked/in use + * - I/O errors + * + * @example + * try { + * await cleanUpFiles('content/my-video-2024-03-21') + * // Will attempt to delete: + * // - content/my-video-2024-03-21.wav + * // - content/my-video-2024-03-21.txt + * // - content/my-video-2024-03-21.md + * // - content/my-video-2024-03-21.lrc + * } catch (error) { + * console.error('Cleanup failed:', error) + * } */ export async function cleanUpFiles(id: string): Promise { log(step('\nStep 5 - Cleaning up temporary files...\n')) - // Array of file extensions to delete - const extensions = ['.wav', '.txt', '.md', '.lrc'] + + // Define extensions of temporary files to be cleaned up + const extensions = [ + '.wav', // Audio files + '.txt', // Transcription text + '.md', // Markdown content + '.lrc' // Lyrics/subtitles + ] log(success(` Temporary files deleted:`)) + + // Attempt to delete each file type for (const ext of extensions) { try { + // Delete file and log success await unlink(`${id}${ext}`) log(success(` - ${id}${ext}`)) } catch (error) { + // Only log errors that aren't "file not found" (ENOENT) if (error instanceof Error && (error as Error).message !== 'ENOENT') { console.error(`Error deleting file ${id}${ext}: ${(error as Error).message}`) } - // If the file does not exist, silently continue + // Silently continue if file doesn't exist } } } \ No newline at end of file diff --git a/src/utils/downloadAudio.ts b/src/utils/downloadAudio.ts index ac2f2df..3ef8b64 100644 --- a/src/utils/downloadAudio.ts +++ b/src/utils/downloadAudio.ts @@ -1,5 +1,12 @@ // src/utils/downloadAudio.ts +/** + * @file Utility for downloading and processing audio from various sources. + * Handles both online content (via yt-dlp) and local files (via ffmpeg), + * converting them to a standardized WAV format suitable for transcription. + * @packageDocumentation + */ + import { exec, execFile } from 'node:child_process' import { promisify } from 'node:util' import { readFile, access } from 'node:fs/promises' @@ -8,83 +15,146 @@ import { checkDependencies } from './checkDependencies.js' import { log, step, success, wait } from '../models.js' import type { SupportedFileType, ProcessingOptions } from '../types.js' +// Promisify node:child_process functions for async/await usage const execFilePromise = promisify(execFile) const execPromise = promisify(exec) /** - * Function to download or process audio based on the input type. - * @param {ProcessingOptions} options - The processing options specifying the type of content to generate. - * @param {string} input - The URL of the video or path to the local file. - * @param {string} filename - The base filename to save the audio as. - * @returns {Promise} - Returns the path to the downloaded or processed WAV file. - * @throws {Error} - If there is an error during the download or processing. + * Downloads or processes audio content from various sources and converts it to a standardized WAV format. + * + * The function handles two main scenarios: + * 1. Online content (YouTube, RSS feeds) - Downloads using yt-dlp + * 2. 
Local files - Converts using ffmpeg + * + * In both cases, the output is converted to: + * - WAV format + * - 16kHz sample rate + * - Mono channel + * - 16-bit PCM encoding + * + * @param {ProcessingOptions} options - Processing configuration containing: + * - video: Flag for YouTube video processing + * - playlist: Flag for YouTube playlist processing + * - urls: Flag for processing from URL list + * - rss: Flag for RSS feed processing + * - file: Flag for local file processing + * + * @param {string} input - The source to process: + * - For online content: URL of the content + * - For local files: File path on the system + * + * @param {string} filename - Base filename for the output WAV file + * (without extension, will be saved in content/ directory) + * + * @returns {Promise} Path to the processed WAV file + * + * @throws {Error} If: + * - Required dependencies (yt-dlp, ffmpeg) are missing + * - File access fails + * - File type is unsupported + * - Conversion process fails + * - Invalid options are provided + * + * Supported file formats include: + * - Audio: wav, mp3, m4a, aac, ogg, flac + * - Video: mp4, mkv, avi, mov, webm + * + * @example + * // Download from YouTube + * const wavPath = await downloadAudio( + * { video: true }, + * 'https://www.youtube.com/watch?v=...', + * 'my-video' + * ) + * + * @example + * // Process local file + * const wavPath = await downloadAudio( + * { file: true }, + * '/path/to/audio.mp3', + * 'my-audio' + * ) */ -export async function downloadAudio(options: ProcessingOptions, input: string, filename: string): Promise { +export async function downloadAudio( + options: ProcessingOptions, + input: string, + filename: string +): Promise { + // Define output paths using the provided filename const finalPath = `content/${filename}` const outputPath = `${finalPath}.wav` + // Handle online content (YouTube, RSS feeds, etc.) if (options.video || options.playlist || options.urls || options.rss) { log(step('\nStep 2 - Downloading URL audio...\n')) try { - // Check for required dependencies + // Verify yt-dlp is available await checkDependencies(['yt-dlp']) - - // Execute yt-dlp to download the audio + // Download and convert audio using yt-dlp const { stderr } = await execFilePromise('yt-dlp', [ - '--no-warnings', - '--restrict-filenames', - '--extract-audio', - '--audio-format', 'wav', - '--postprocessor-args', 'ffmpeg:-ar 16000 -ac 1', - '--no-playlist', - '-o', outputPath, + '--no-warnings', // Suppress warning messages + '--restrict-filenames', // Use safe filenames + '--extract-audio', // Extract audio stream + '--audio-format', 'wav', // Convert to WAV + '--postprocessor-args', 'ffmpeg:-ar 16000 -ac 1', // 16kHz mono + '--no-playlist', // Don't expand playlists + '-o', outputPath, // Output path input, ]) - - // Log any errors from yt-dlp + // Log any non-fatal warnings from yt-dlp if (stderr) { console.error(`yt-dlp warnings: ${stderr}`) } - log(success(` Audio downloaded successfully:\n - ${outputPath}`)) } catch (error) { - console.error(`Error downloading audio: ${error instanceof Error ? (error as Error).message : String(error)}`) + console.error( + `Error downloading audio: ${ + error instanceof Error ? 
(error as Error).message : String(error) + }` + ) throw error } - } else if (options.file) { + } + // Handle local file processing + else if (options.file) { log(step('\nStep 2 - Processing file audio...\n')) - // Define supported audio and video formats + // Define supported media formats const supportedFormats: Set = new Set([ - 'wav', 'mp3', 'm4a', 'aac', 'ogg', 'flac', 'mp4', 'mkv', 'avi', 'mov', 'webm', + // Audio formats + 'wav', 'mp3', 'm4a', 'aac', 'ogg', 'flac', + // Video formats + 'mp4', 'mkv', 'avi', 'mov', 'webm', ]) try { - // Check if the file exists + // Verify file exists and is accessible await access(input) - - // Read the file into a buffer + // Read file and determine its type const buffer = await readFile(input) - - // Determine the file type const fileType = await fileTypeFromBuffer(buffer) + // Validate file type is supported if (!fileType || !supportedFormats.has(fileType.ext as SupportedFileType)) { throw new Error( fileType ? `Unsupported file type: ${fileType.ext}` : 'Unable to determine file type' ) } log(wait(` File type detected as ${fileType.ext}, converting to WAV...\n`)) - - // Convert the file to WAV format + // Convert to standardized WAV format using ffmpeg await execPromise( `ffmpeg -i "${input}" -ar 16000 -ac 1 -c:a pcm_s16le "${outputPath}"` ) log(success(` File converted to WAV format successfully:\n - ${outputPath}`)) } catch (error) { - console.error(`Error processing local file: ${error instanceof Error ? (error as Error).message : String(error)}`) + console.error( + `Error processing local file: ${ + error instanceof Error ? (error as Error).message : String(error) + }` + ) throw error } - } else { + } + // Handle invalid options + else { throw new Error('Invalid option provided for audio download/processing.') } - return outputPath } \ No newline at end of file diff --git a/src/utils/extractVideoMetadata.ts b/src/utils/extractVideoMetadata.ts index eeea3e6..e57986c 100644 --- a/src/utils/extractVideoMetadata.ts +++ b/src/utils/extractVideoMetadata.ts @@ -1,52 +1,96 @@ -// src/utils/extractVideoMetadata.ts +/** + * @file Utility for extracting metadata from YouTube videos using yt-dlp. + * Provides functionality to retrieve essential video information such as title, + * channel, publish date, and thumbnail URL. + * @packageDocumentation + */ import { execFile } from 'node:child_process' import { promisify } from 'node:util' import { checkDependencies } from './checkDependencies.js' - import type { VideoMetadata } from '../types.js' +// Promisify execFile for async/await usage with yt-dlp const execFilePromise = promisify(execFile) /** - * Extract metadata for a single video URL. - * @param url - The URL of the video. - * @returns The video metadata. + * Extracts metadata for a single video URL using yt-dlp. + * + * This function performs the following steps: + * 1. Verifies yt-dlp is installed + * 2. Executes yt-dlp with specific format strings to extract metadata + * 3. Parses the output into structured video metadata + * 4. Validates that all required metadata fields are present + * + * @param {string} url - The URL of the video to extract metadata from. + * Supports YouTube and other platforms compatible with yt-dlp. 
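+ *
+ * The underlying call is roughly equivalent to running (flags abbreviated):
+ * // yt-dlp --restrict-filenames --print '%(webpage_url)s' --print '%(channel)s' ... <url>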
diff --git a/src/utils/extractVideoMetadata.ts b/src/utils/extractVideoMetadata.ts
index eeea3e6..e57986c 100644
--- a/src/utils/extractVideoMetadata.ts
+++ b/src/utils/extractVideoMetadata.ts
@@ -1,52 +1,96 @@
-// src/utils/extractVideoMetadata.ts
+/**
+ * @file Utility for extracting metadata from YouTube videos using yt-dlp.
+ * Provides functionality to retrieve essential video information such as title,
+ * channel, publish date, and thumbnail URL.
+ * @packageDocumentation
+ */
 
 import { execFile } from 'node:child_process'
 import { promisify } from 'node:util'
 import { checkDependencies } from './checkDependencies.js'
-
 import type { VideoMetadata } from '../types.js'
 
+// Promisify execFile for async/await usage with yt-dlp
 const execFilePromise = promisify(execFile)
 
 /**
- * Extract metadata for a single video URL.
- * @param url - The URL of the video.
- * @returns The video metadata.
+ * Extracts metadata for a single video URL using yt-dlp.
+ *
+ * This function performs the following steps:
+ * 1. Verifies yt-dlp is installed
+ * 2. Executes yt-dlp with specific format strings to extract metadata
+ * 3. Parses the output into structured video metadata
+ * 4. Validates that all required metadata fields are present
+ *
+ * @param {string} url - The URL of the video to extract metadata from.
+ *   Supports YouTube and other platforms compatible with yt-dlp.
+ *
+ * @returns {Promise<VideoMetadata>} A promise that resolves to an object containing:
+ *   - showLink: Direct URL to the video
+ *   - channel: Name of the channel that published the video
+ *   - channelURL: URL to the channel's page
+ *   - title: Title of the video
+ *   - description: Video description (currently returned empty)
+ *   - publishDate: Publication date in YYYY-MM-DD format
+ *   - coverImage: URL to the video's thumbnail
+ *
+ * @throws {Error} If:
+ *   - yt-dlp is not installed
+ *   - The video URL is invalid
+ *   - Any required metadata field is missing
+ *   - The yt-dlp command fails
+ *
+ * @example
+ * try {
+ *   const metadata = await extractVideoMetadata('https://www.youtube.com/watch?v=...')
+ *   console.log(metadata.title)       // Video title
+ *   console.log(metadata.publishDate) // YYYY-MM-DD
+ * } catch (error) {
+ *   console.error('Failed to extract video metadata:', error)
+ * }
  */
 export async function extractVideoMetadata(url: string): Promise<VideoMetadata> {
   try {
-    // Check for required dependencies
+    // Verify yt-dlp is available
     await checkDependencies(['yt-dlp'])
 
+    // Execute yt-dlp with format strings to extract specific metadata fields
     const { stdout } = await execFilePromise('yt-dlp', [
-      '--restrict-filenames',
-      '--print', '%(webpage_url)s',
-      '--print', '%(channel)s',
-      '--print', '%(uploader_url)s',
-      '--print', '%(title)s',
-      '--print', '%(upload_date>%Y-%m-%d)s',
-      '--print', '%(thumbnail)s',
+      '--restrict-filenames',                // Ensure safe filenames
+      '--print', '%(webpage_url)s',          // Direct link to video
+      '--print', '%(channel)s',              // Channel name
+      '--print', '%(uploader_url)s',         // Channel URL
+      '--print', '%(title)s',                // Video title
+      '--print', '%(upload_date>%Y-%m-%d)s', // Formatted upload date
+      '--print', '%(thumbnail)s',            // Thumbnail URL
       url,
     ])
 
+    // Split stdout into individual metadata fields
     const [showLink, channel, channelURL, title, publishDate, coverImage] = stdout.trim().split('\n')
 
-    // Ensure all metadata is present
+    // Validate that all required metadata fields are present
     if (!showLink || !channel || !channelURL || !title || !publishDate || !coverImage) {
       throw new Error('Incomplete metadata received from yt-dlp.')
     }
 
+    // Return structured video metadata
     return {
-      showLink,
-      channel,
-      channelURL,
-      title,
-      description: '',
-      publishDate,
-      coverImage,
+      showLink,        // Direct URL to the video
+      channel,         // Channel name
+      channelURL,      // Channel page URL
+      title,           // Video title
+      description: '', // Empty description to fill in with LLM output
+      publishDate,     // Publication date (YYYY-MM-DD)
+      coverImage,      // Thumbnail URL
     }
   } catch (error) {
-    console.error(`Error extracting metadata for ${url}: ${error instanceof Error ? (error as Error).message : String(error)}`)
-    throw error
+    // Enhanced error handling with type checking
+    console.error(
+      `Error extracting metadata for ${url}: ${
+        error instanceof Error ? (error as Error).message : String(error)
+      }`
+    )
+    throw error // Re-throw to allow handling by caller
  }
 }
\ No newline at end of file
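
Since the six `--print` fields come back as six positional stdout lines, the parsing contract is easy to state in isolation. A small sketch of that contract under the same field order; `parsePrintOutput` is a hypothetical helper, not part of the diff:

```ts
// Field names in the exact order the --print flags emit them.
const fields = ['showLink', 'channel', 'channelURL', 'title', 'publishDate', 'coverImage'] as const

type PrintFields = Record<(typeof fields)[number], string>

// Hypothetical helper: pair each stdout line with its field name,
// failing loudly if yt-dlp returned the wrong number of lines.
function parsePrintOutput(stdout: string): PrintFields {
  const lines = stdout.trim().split('\n')
  if (lines.length !== fields.length) {
    throw new Error(`Expected ${fields.length} metadata lines, got ${lines.length}`)
  }
  return Object.fromEntries(fields.map((name, i) => [name, lines[i]])) as PrintFields
}
```
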
diff --git a/src/utils/generateMarkdown.ts b/src/utils/generateMarkdown.ts
index ed53d67..3586a21 100644
--- a/src/utils/generateMarkdown.ts
+++ b/src/utils/generateMarkdown.ts
@@ -1,5 +1,11 @@
 // src/utils/generateMarkdown.ts
 
+/**
+ * @file Utility for generating markdown files with front matter for different content types.
+ * Supports YouTube videos, playlists, local files, and RSS feed items.
+ * @packageDocumentation
+ */
+
 import { execFile } from 'node:child_process'
 import { promisify } from 'node:util'
 import { writeFile } from 'node:fs/promises'
@@ -12,23 +18,64 @@ import type { MarkdownData, ProcessingOptions, RSSItem } from '../types.js'
 const execFilePromise = promisify(execFile)
 
 /**
- * Generates markdown content based on the provided options and input.
+ * Generates markdown content with front matter based on the provided options and input.
+ * Handles different content types including YouTube videos, playlists, local files, and RSS items.
+ *
+ * The function performs the following steps:
+ * 1. Sanitizes input titles for safe filename creation
+ * 2. Extracts metadata based on content type
+ * 3. Generates appropriate front matter
+ * 4. Creates and saves the markdown file
  *
  * @param {ProcessingOptions} options - The processing options specifying the type of content to generate.
-  * @param {string | RSSItem} input - The input data, either a string (for video URL or file path) or an RSSItem object.
- * @returns {Promise<MarkdownData>} A promise that resolves to an object containing the generated markdown data.
+ *   Valid options include: video, playlist, urls, file, and rss.
+ * @param {string | RSSItem} input - The input data to process:
+ *   - For video/playlist/urls: A URL string
+ *   - For file: A file path string
+ *   - For RSS: An RSSItem object containing feed item details
+ * @returns {Promise<MarkdownData>} A promise that resolves to an object containing:
+ *   - frontMatter: The generated front matter content
+ *   - finalPath: The path where the markdown file is saved
+ *   - filename: The sanitized filename
  * @throws {Error} If invalid options are provided or if metadata extraction fails.
+ *
+ * @example
+ * // For a YouTube video
+ * const result = await generateMarkdown(
+ *   { video: true },
+ *   'https://www.youtube.com/watch?v=...'
+ * )
+ *
+ * @example
+ * // For an RSS item
+ * const result = await generateMarkdown(
+ *   { rss: true },
+ *   {
+ *     publishDate: '2024-03-21',
+ *     title: 'Episode Title',
+ *     coverImage: 'https://...',
+ *     showLink: 'https://...',
+ *     channel: 'Podcast Name',
+ *     channelURL: 'https://...'
+ *   }
+ * )
  */
 export async function generateMarkdown(
   options: ProcessingOptions,
   input: string | RSSItem
 ): Promise<MarkdownData> {
-  // log(`  - input: ${input}\n`)
   /**
-   * Sanitizes a title string for use in filenames.
+   * Sanitizes a title string for use in filenames by:
+   * - Removing special characters except spaces and hyphens
+   * - Converting spaces and underscores to hyphens
+   * - Converting to lowercase
+   * - Limiting length to 200 characters
    *
    * @param {string} title - The title to sanitize.
-   * @returns {string} The sanitized title.
+   * @returns {string} The sanitized title safe for use in filenames.
+   *
+   * @example
+   * sanitizeTitle('My Video Title! (2024)') // returns 'my-video-title-2024'
    */
   function sanitizeTitle(title: string): string {
     return title
@@ -45,36 +92,36 @@ export async function generateMarkdown(
   let finalPath: string
   let filename: string
 
-  // Use a switch statement to handle different content types
+  // Handle different content types using a switch statement
   switch (true) {
     case !!options.video:
     case !!options.playlist:
     case !!options.urls:
-      // Check if yt-dlp is installed
+      // Verify yt-dlp is available for video processing
       await checkDependencies(['yt-dlp'])
 
-      // Execute yt-dlp to extract video metadata
+      // Extract video metadata using yt-dlp
       const { stdout } = await execFilePromise('yt-dlp', [
         '--restrict-filenames',
-        '--print', '%(upload_date>%Y-%m-%d)s',
+        '--print', '%(upload_date>%Y-%m-%d)s', // Format: YYYY-MM-DD
         '--print', '%(title)s',
         '--print', '%(thumbnail)s',
         '--print', '%(webpage_url)s',
         '--print', '%(channel)s',
         '--print', '%(uploader_url)s',
-        input as string, // Assert input as string for video URL
+        input as string,
       ])
 
-      // Parse the output from yt-dlp
+      // Parse the metadata output into individual fields
       const [
         formattedDate, videoTitle, thumbnail, webpage_url, videoChannel, uploader_url
       ] = stdout.trim().split('\n')
 
-      // Generate filename and path
+      // Generate filename using date and sanitized title
      filename = `${formattedDate}-${sanitizeTitle(videoTitle)}`
      finalPath = `content/${filename}`
 
-      // Create front matter for video content
+      // Create video-specific front matter
      frontMatter = [
        '---',
        `showLink: "${webpage_url}"`,
@@ -89,15 +136,15 @@
       break
 
     case !!options.file:
-      // Extract filename from the input path
+      // Extract and process local file information
       const originalFilename = basename(input as string)
       const filenameWithoutExt = originalFilename.replace(extname(originalFilename), '')
 
-      // Generate sanitized filename and path
+      // Generate sanitized filename
       filename = sanitizeTitle(filenameWithoutExt)
       finalPath = `content/${filename}`
 
-      // Create front matter for file content
+      // Create file-specific front matter with minimal metadata
       frontMatter = [
         '---',
         `showLink: "${originalFilename}"`,
@@ -112,15 +159,15 @@
       break
 
     case !!options.rss:
-      // Assert input as RSSItem and destructure its properties
+      // Process RSS feed item
       const item = input as RSSItem
       const { publishDate, title: rssTitle, coverImage, showLink, channel: rssChannel, channelURL } = item
 
-      // Generate filename and path
+      // Generate filename using date and sanitized title
       filename = `${publishDate}-${sanitizeTitle(rssTitle)}`
       finalPath = `content/${filename}`
 
-      // Create front matter for RSS content
+      // Create RSS-specific front matter
       frontMatter = [
         '---',
         `showLink: "${showLink}"`,
@@ -135,21 +182,20 @@
       break
 
     default:
-      // Throw an error if an invalid option is provided
       throw new Error('Invalid option provided for markdown generation.')
   }
 
-  // Join the front matter array into a single string
+  // Join front matter array into a single string
   const frontMatterContent = frontMatter.join('\n')
 
-  // Write the front matter content to a file
+  // Write the front matter content to a markdown file
   await writeFile(`${finalPath}.md`, frontMatterContent)
 
-  // Log the generated front matter and success message
+  // Log the generated content and success message
   log(dim(frontMatterContent))
   log(step('\nStep 1 - Generating markdown...\n'))
   log(success(`  Front matter successfully created and saved:\n    - ${finalPath}.md`))
 
-  // Return the generated markdown data
+  // Return the generated markdown data for further processing
   return { frontMatter: frontMatterContent, finalPath, filename }
 }
\ No newline at end of file
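
The sanitization rules are now spelled out in the doc comment, and they are easy to verify in isolation. A standalone sketch of `sanitizeTitle` written from the documented rules; the regexes are my reading of those rules, since the hunk elides the actual body:

```ts
// Sketch of the documented sanitizeTitle behavior: strip special characters,
// collapse spaces/underscores into hyphens, lowercase, cap at 200 characters.
function sanitizeTitle(title: string): string {
  return title
    .replace(/[^\w\s-]/g, '') // remove everything except word chars, spaces, hyphens
    .trim()
    .replace(/[\s_]+/g, '-')  // spaces and underscores become hyphens
    .replace(/-+/g, '-')      // collapse runs of hyphens
    .toLowerCase()
    .slice(0, 200)            // enforce the 200-character limit
}

// Matches the doc comment's example:
console.log(sanitizeTitle('My Video Title! (2024)')) // 'my-video-title-2024'
```
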
diff --git a/src/utils/runLLM.ts b/src/utils/runLLM.ts
index d7d05ef..13d3697 100644
--- a/src/utils/runLLM.ts
+++ b/src/utils/runLLM.ts
@@ -1,5 +1,11 @@
 // src/utils/runLLM.ts
 
+/**
+ * @file Orchestrator for running Language Model (LLM) processing on transcripts.
+ * Handles prompt generation, LLM processing, and file management for multiple LLM services.
+ * @packageDocumentation
+ */
+
 import { readFile, writeFile, unlink } from 'node:fs/promises'
 import { callLlama } from '../llms/llama.js'
 import { callOllama } from '../llms/ollama.js'
@@ -17,13 +23,66 @@ import { log, step, success, wait } from '../models.js'
 import type { LLMServices, ProcessingOptions, LLMFunction, LLMFunctions } from '../types.js'
 
 /**
- * Main function to run the selected Language Model.
- * @param {string} finalPath - The base path for the files.
- * @param {string} frontMatter - The front matter content for the markdown file.
- * @param {LLMServices} llmServices - The selected Language Model option.
- * @param {ProcessingOptions} options - Additional options for processing.
- * @returns {Promise<void>}
- * @throws {Error} - If the LLM processing fails or an error occurs during execution.
+ * Processes a transcript using a specified Language Model service.
+ * Handles the complete workflow from reading the transcript to generating
+ * and saving the final markdown output.
+ *
+ * The function performs these steps:
+ * 1. Reads the transcript file
+ * 2. Generates a prompt based on provided options
+ * 3. Processes the content with the selected LLM
+ * 4. Saves the results with front matter and original transcript
+ *
+ * If no LLM is selected, it saves the prompt and transcript without processing.
+ *
+ * @param {ProcessingOptions} options - Configuration options including:
+ *   - prompt: Array of prompt sections to include
+ *   - LLM-specific options (e.g., chatgpt, claude, etc.)
+ *
+ * @param {string} finalPath - Base path for input/output files:
+ *   - Input transcript: `${finalPath}.txt`
+ *   - Temporary file: `${finalPath}-${llmServices}-temp.md`
+ *   - Final output: `${finalPath}-${llmServices}-shownotes.md`
+ *
+ * @param {string} frontMatter - YAML front matter content to include in the output
+ *
+ * @param {LLMServices} [llmServices] - The LLM service to use:
+ *   - llama: Node Llama for local inference
+ *   - ollama: Ollama for local inference
+ *   - chatgpt: OpenAI's ChatGPT
+ *   - claude: Anthropic's Claude
+ *   - gemini: Google's Gemini
+ *   - cohere: Cohere
+ *   - mistral: Mistral AI
+ *   - octo: OctoAI
+ *   - fireworks: Fireworks AI
+ *   - together: Together AI
+ *   - groq: Groq
+ *
+ * @returns {Promise<void>} Resolves when processing is complete
+ *
+ * @throws {Error} If:
+ *   - Transcript file is missing or unreadable
+ *   - Invalid LLM service is specified
+ *   - LLM processing fails
+ *   - File operations fail
+ *
+ * @example
+ * // Process with ChatGPT
+ * await runLLM(
+ *   { prompt: ['summary', 'highlights'], chatgpt: 'GPT_4' },
+ *   'content/my-video',
+ *   '---\ntitle: My Video\n---',
+ *   'chatgpt'
+ * )
+ *
+ * @example
+ * // Save prompt and transcript without LLM processing
+ * await runLLM(
+ *   { prompt: ['summary'] },
+ *   'content/my-video',
+ *   '---\ntitle: My Video\n---'
+ * )
  */
 export async function runLLM(
   options: ProcessingOptions,
@@ -32,48 +91,58 @@ export async function runLLM(
   llmServices?: LLMServices
 ): Promise<void> {
   log(step(`\nStep 4 - Running LLM processing on transcript...\n`))
+
+  // Map of available LLM service handlers
   const LLM_FUNCTIONS: LLMFunctions = {
-    llama: callLlama,
-    ollama: callOllama,
-    chatgpt: callChatGPT,
-    claude: callClaude,
-    gemini: callGemini,
-    cohere: callCohere,
-    mistral: callMistral,
-    octo: callOcto,
-    fireworks: callFireworks,
-    together: callTogether,
-    groq: callGroq,
+    llama: callLlama,         // Local inference with Node Llama
+    ollama: callOllama,       // Local inference with Ollama
+    chatgpt: callChatGPT,     // OpenAI's ChatGPT
+    claude: callClaude,       // Anthropic's Claude
+    gemini: callGemini,       // Google's Gemini
+    cohere: callCohere,       // Cohere
+    mistral: callMistral,     // Mistral AI
+    octo: callOcto,           // OctoAI
+    fireworks: callFireworks, // Fireworks AI
+    together: callTogether,   // Together AI
+    groq: callGroq,           // Groq
   }
 
   try {
-    // Read the transcript file
+    // Read and format the transcript
     const tempTranscript = await readFile(`${finalPath}.txt`, 'utf8')
     const transcript = `## Transcript\n\n${tempTranscript}`
 
-    // Generate the prompt
+    // Generate and combine prompt with transcript
     const prompt = generatePrompt(options.prompt)
     const promptAndTranscript = `${prompt}${transcript}`
 
     if (llmServices) {
       log(wait(`  Processing with ${llmServices} Language Model...\n`))
+
+      // Get the appropriate LLM handler function
       const llmFunction: LLMFunction = LLM_FUNCTIONS[llmServices]
       if (!llmFunction) {
         throw new Error(`Invalid LLM option: ${llmServices}`)
       }
-      // Set up a temporary file path and call the LLM function
+
+      // Process content with selected LLM
       const tempPath = `${finalPath}-${llmServices}-temp.md`
       await llmFunction(promptAndTranscript, tempPath, options[llmServices])
       log(wait(`\n  Transcript saved to temporary file:\n    - ${tempPath}`))
-      // Read generated content and write front matter, show notes, and transcript to final markdown file
+
+      // Combine results with front matter and original transcript
      const showNotes = await readFile(tempPath, 'utf8')
-      await writeFile(`${finalPath}-${llmServices}-shownotes.md`, `${frontMatter}\n${showNotes}\n${transcript}`)
-      // Remove the temporary file
+      await writeFile(
+        `${finalPath}-${llmServices}-shownotes.md`,
+        `${frontMatter}\n${showNotes}\n\n${transcript}`
+      )
+
+      // Clean up temporary file
       await unlink(tempPath)
       log(success(`\n  Generated show notes saved to markdown file:\n    - ${finalPath}-${llmServices}-shownotes.md`))
     } else {
+      // Handle case when no LLM is selected
       log(wait('  No LLM selected, skipping processing...'))
-      // If no LLM is selected, just write the prompt and transcript
       await writeFile(`${finalPath}-prompt.md`, `${frontMatter}\n${promptAndTranscript}`)
       log(success(`\n  Prompt and transcript saved to markdown file:\n    - ${finalPath}-prompt.md`))
     }
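
The handler map above is the core of runLLM's routing, so here is the pattern distilled; the types are simplified stand-ins for the real `LLMFunction` and `LLMServices` definitions in `types.ts`:

```ts
// Simplified stand-ins for the real types in src/types.ts.
type Service = 'chatgpt' | 'claude' | 'ollama'
type Handler = (promptAndTranscript: string, tempPath: string, model?: string) => Promise<void>

// One handler per service; adding a provider is a single new entry.
const HANDLERS: Record<Service, Handler> = {
  chatgpt: async (_text, _path) => { /* call OpenAI, write result to _path */ },
  claude:  async (_text, _path) => { /* call Anthropic, write result to _path */ },
  ollama:  async (_text, _path) => { /* call a local Ollama server, write result to _path */ },
}

async function dispatch(service: Service, text: string, outPath: string): Promise<void> {
  const handler = HANDLERS[service]
  if (!handler) throw new Error(`Invalid LLM option: ${service}`) // mirrors the runtime guard above
  await handler(text, outPath)
}
```
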
diff --git a/src/utils/runTranscription.ts b/src/utils/runTranscription.ts
index 21099de..5001073 100644
--- a/src/utils/runTranscription.ts
+++ b/src/utils/runTranscription.ts
@@ -1,5 +1,12 @@
 // src/utils/runTranscription.ts
 
+/**
+ * @file Orchestrator for running transcription services on audio files.
+ * Manages the routing and execution of various transcription services,
+ * both local and cloud-based.
+ * @packageDocumentation
+ */
+
 import { callWhisper } from '../transcription/whisper.js'
 import { callWhisperPython } from '../transcription/whisperPython.js'
 import { callWhisperDocker } from '../transcription/whisperDocker.js'
@@ -10,11 +17,69 @@ import { log, step } from '../models.js'
 import { TranscriptServices, ProcessingOptions } from '../types.js'
 
 /**
- * Manages the transcription process based on the selected service.
- * @param {ProcessingOptions} options - The processing options.
- * * @param {TranscriptServices} transcriptServices - The transcription service to use.
- * @param {string} finalPath - The base path for the files.
- * @returns {Promise<void>}
+ * Orchestrates the transcription process using the specified service.
+ * Routes the transcription request to the appropriate service handler
+ * and manages the execution process.
+ *
+ * Available transcription services:
+ * Local Services:
+ * - whisper: Default Whisper.cpp implementation
+ * - whisperDocker: Whisper.cpp running in Docker
+ * - whisperPython: OpenAI's Python Whisper implementation
+ * - whisperDiarization: Whisper with speaker diarization
+ *
+ * Cloud Services:
+ * - deepgram: Deepgram's API service
+ * - assembly: AssemblyAI's API service
+ *
+ * @param {ProcessingOptions} options - Configuration options including:
+ *   - whisper: Whisper model specification
+ *   - whisperDocker: Docker-based Whisper model
+ *   - whisperPython: Python-based Whisper model
+ *   - whisperDiarization: Diarization model
+ *   - speakerLabels: Enable speaker detection (Assembly)
+ *   - Additional service-specific options
+ *
+ * @param {string} finalPath - Base path for input/output files:
+ *   - Input audio: `${finalPath}.wav`
+ *   - Output transcript: `${finalPath}.txt`
+ *
+ * @param {string} frontMatter - YAML front matter content for the transcript
+ *   (Reserved for future use with metadata)
+ *
+ * @param {TranscriptServices} [transcriptServices] - The transcription service to use:
+ *   - 'whisper': Local Whisper.cpp
+ *   - 'whisperDocker': Containerized Whisper
+ *   - 'whisperPython': Python Whisper
+ *   - 'whisperDiarization': Whisper with speaker detection
+ *   - 'deepgram': Deepgram API
+ *   - 'assembly': AssemblyAI API
+ *
+ * @returns {Promise<void>} Resolves when transcription is complete
+ *
+ * @throws {Error} If:
+ *   - Unknown transcription service is specified
+ *   - Service-specific initialization fails
+ *   - Transcription process fails
+ *   - File operations fail
+ *
+ * @example
+ * // Using local Whisper
+ * await runTranscription(
+ *   { whisper: 'base' },
+ *   'content/my-video',
+ *   '---\ntitle: My Video\n---',
+ *   'whisper'
+ * )
+ *
+ * @example
+ * // Using AssemblyAI with speaker labels
+ * await runTranscription(
+ *   { speakerLabels: true },
+ *   'content/my-video',
+ *   '---\ntitle: My Video\n---',
+ *   'assembly'
+ * )
  */
 export async function runTranscription(
   options: ProcessingOptions,
@@ -24,26 +89,38 @@
 ): Promise<void> {
   log(step(`\nStep 3 - Running transcription on audio file using ${transcriptServices}...`))
 
-  // Choose the transcription service based on the provided option
+  // Route to appropriate transcription service
   switch (transcriptServices) {
     case 'deepgram':
+      // Cloud-based service with advanced features
       await callDeepgram(options, finalPath)
       break
+
     case 'assembly':
+      // Cloud-based service with speaker diarization
       await callAssembly(options, finalPath)
       break
+
     case 'whisper':
+      // Local Whisper.cpp implementation
       await callWhisper(options, finalPath)
       break
+
     case 'whisperDocker':
+      // Containerized Whisper.cpp
       await callWhisperDocker(options, finalPath)
       break
+
     case 'whisperPython':
+      // Original Python implementation
       await callWhisperPython(options, finalPath)
       break
+
     case 'whisperDiarization':
+      // Whisper with speaker detection
       await callWhisperDiarization(options, finalPath)
       break
+
     default:
       throw new Error(`Unknown transcription service: ${transcriptServices}`)
   }
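
Review note, take it or leave it: this switch and runLLM's handler map do the same job in two different shapes, and routing through a map here too would keep the two orchestrators symmetrical. A sketch against simplified stand-in types; the stub stands in for the real `callWhisper`, `callDeepgram`, and friends:

```ts
// Simplified stand-ins; the real handlers live in src/transcription/.
type TranscriptService = 'whisper' | 'whisperDocker' | 'whisperPython' | 'whisperDiarization' | 'deepgram' | 'assembly'
type TranscriptHandler = (options: unknown, finalPath: string) => Promise<void>

// Stub handler: the real ones transcribe `${finalPath}.wav` into `${finalPath}.txt`.
const stub: TranscriptHandler = async (_options, _finalPath) => {}

const TRANSCRIPT_FUNCTIONS: Record<TranscriptService, TranscriptHandler> = {
  whisper: stub,
  whisperDocker: stub,
  whisperPython: stub,
  whisperDiarization: stub,
  deepgram: stub,
  assembly: stub,
}

async function route(service: TranscriptService, options: unknown, finalPath: string): Promise<void> {
  const handler = TRANSCRIPT_FUNCTIONS[service]
  if (!handler) throw new Error(`Unknown transcription service: ${service}`)
  await handler(options, finalPath)
}
```
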
diff --git a/test/bench.test.ts b/test/bench.test.ts
new file mode 100644
index 0000000..f78e8c8
--- /dev/null
+++ b/test/bench.test.ts
@@ -0,0 +1,134 @@
+// test/bench.test.ts
+
+import test from 'node:test'
+import { strictEqual } from 'node:assert/strict'
+import { execSync } from 'node:child_process'
+import { existsSync, renameSync } from 'node:fs'
+import { join } from 'node:path'
+
+type Command = {
+  cmd: string
+  expectedFile: string
+  newName: string
+}
+
+const commands: Command[] = [
+  {
+    // Process a single YouTube video with the whisper.cpp tiny model.
+    cmd: 'npm run as -- --video "https://www.youtube.com/watch?v=MORMZXEaONk" --whisper tiny',
+    expectedFile: '2024-09-24-ep0-fsjam-podcast-prompt.md',
+    newName: '01_TINY_WHISPERCPP.md'
+  },
+  {
+    // Process a single YouTube video with the whisper.cpp base model.
+    cmd: 'npm run as -- --video "https://www.youtube.com/watch?v=MORMZXEaONk" --whisper base',
+    expectedFile: '2024-09-24-ep0-fsjam-podcast-prompt.md',
+    newName: '02_BASE_WHISPERCPP.md'
+  },
+  {
+    // Process a single YouTube video with the whisper.cpp medium model.
+    cmd: 'npm run as -- --video "https://www.youtube.com/watch?v=MORMZXEaONk" --whisper medium',
+    expectedFile: '2024-09-24-ep0-fsjam-podcast-prompt.md',
+    newName: '03_MEDIUM_WHISPERCPP.md'
+  },
+  {
+    // Process a single YouTube video with the whisper.cpp large-v1 model.
+    cmd: 'npm run as -- --video "https://www.youtube.com/watch?v=MORMZXEaONk" --whisper large-v1',
+    expectedFile: '2024-09-24-ep0-fsjam-podcast-prompt.md',
+    newName: '04_LARGE_V1_WHISPERCPP.md'
+  },
+  {
+    // Process a single YouTube video with the whisper.cpp large-v2 model.
+    cmd: 'npm run as -- --video "https://www.youtube.com/watch?v=MORMZXEaONk" --whisper large-v2',
+    expectedFile: '2024-09-24-ep0-fsjam-podcast-prompt.md',
+    newName: '05_LARGE_V2_WHISPERCPP.md'
+  },
+  {
+    // Process a single YouTube video with the whisper.cpp large-v3-turbo model.
+    cmd: 'npm run as -- --video "https://www.youtube.com/watch?v=MORMZXEaONk" --whisper large-v3-turbo',
+    expectedFile: '2024-09-24-ep0-fsjam-podcast-prompt.md',
+    newName: '06_LARGE_V3_WHISPERCPP.md'
+  },
+  {
+    // Process a single YouTube video with the whisper-diarization tiny model.
+    cmd: 'npm run as -- --video "https://www.youtube.com/watch?v=MORMZXEaONk" --whisperDiarization tiny',
+    expectedFile: '2024-09-24-ep0-fsjam-podcast-prompt.md',
+    newName: '07_TINY_DIARIZATION.md'
+  },
+  {
+    // Process a single YouTube video with the whisper-diarization base model.
+    cmd: 'npm run as -- --video "https://www.youtube.com/watch?v=MORMZXEaONk" --whisperDiarization base',
+    expectedFile: '2024-09-24-ep0-fsjam-podcast-prompt.md',
+    newName: '08_BASE_DIARIZATION.md'
+  },
+  {
+    // Process a single YouTube video with the whisper-diarization medium model.
+    cmd: 'npm run as -- --video "https://www.youtube.com/watch?v=MORMZXEaONk" --whisperDiarization medium',
+    expectedFile: '2024-09-24-ep0-fsjam-podcast-prompt.md',
+    newName: '09_MEDIUM_DIARIZATION.md'
+  },
+  {
+    // Process a single YouTube video with the whisper-diarization large-v1 model.
+    cmd: 'npm run as -- --video "https://www.youtube.com/watch?v=MORMZXEaONk" --whisperDiarization large-v1',
+    expectedFile: '2024-09-24-ep0-fsjam-podcast-prompt.md',
+    newName: '10_LARGE_V1_DIARIZATION.md'
+  },
+  {
+    // Process a single YouTube video with the whisper-diarization large-v2 model.
+    cmd: 'npm run as -- --video "https://www.youtube.com/watch?v=MORMZXEaONk" --whisperDiarization large-v2',
+    expectedFile: '2024-09-24-ep0-fsjam-podcast-prompt.md',
+    newName: '11_LARGE_V2_DIARIZATION.md'
+  },
+  {
+    // Process a single YouTube video with the Python Whisper tiny model.
+    cmd: 'npm run as -- --video "https://www.youtube.com/watch?v=MORMZXEaONk" --whisperPython tiny',
+    expectedFile: '2024-09-24-ep0-fsjam-podcast-prompt.md',
+    newName: '12_TINY_PYTHON.md'
+  },
+  {
+    // Process a single YouTube video with the Python Whisper base model.
+    cmd: 'npm run as -- --video "https://www.youtube.com/watch?v=MORMZXEaONk" --whisperPython base',
+    expectedFile: '2024-09-24-ep0-fsjam-podcast-prompt.md',
+    newName: '13_BASE_PYTHON.md'
+  },
+  {
+    // Process a single YouTube video with the Python Whisper medium model.
+    cmd: 'npm run as -- --video "https://www.youtube.com/watch?v=MORMZXEaONk" --whisperPython medium',
+    expectedFile: '2024-09-24-ep0-fsjam-podcast-prompt.md',
+    newName: '14_MEDIUM_PYTHON.md'
+  },
+  {
+    // Process a single YouTube video with the Python Whisper large-v1 model.
+    cmd: 'npm run as -- --video "https://www.youtube.com/watch?v=MORMZXEaONk" --whisperPython large-v1',
+    expectedFile: '2024-09-24-ep0-fsjam-podcast-prompt.md',
+    newName: '15_LARGE_V1_PYTHON.md'
+  },
+  {
+    // Process a single YouTube video with the Python Whisper large-v2 model.
+    cmd: 'npm run as -- --video "https://www.youtube.com/watch?v=MORMZXEaONk" --whisperPython large-v2',
+    expectedFile: '2024-09-24-ep0-fsjam-podcast-prompt.md',
+    newName: '16_LARGE_V2_PYTHON.md'
+  },
+  {
+    // Process a single YouTube video with the Python Whisper large-v3-turbo model.
+    cmd: 'npm run as -- --video "https://www.youtube.com/watch?v=MORMZXEaONk" --whisperPython large-v3-turbo',
+    expectedFile: '2024-09-24-ep0-fsjam-podcast-prompt.md',
+    newName: '17_LARGE_V3_TURBO_PYTHON.md'
+  }
+]
+
+test('Autoshow Command Tests', async (t) => {
+  for (const [index, command] of commands.entries()) {
+    await t.test(`should run command ${index + 1} successfully`, async () => {
+      // Run the command
+      execSync(command.cmd, { stdio: 'inherit' })
+
+      const filePath = join('content', command.expectedFile)
+      strictEqual(existsSync(filePath), true, `Expected file ${command.expectedFile} was not created`)
+
+      const newPath = join('content', command.newName)
+      renameSync(filePath, newPath)
+      strictEqual(existsSync(newPath), true, `File was not renamed to ${command.newName}`)
+    })
+  }
+})
\ No newline at end of file
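
Last thought on the new test file: the seventeen entries differ only in flag, model, and output name, so the table could be generated instead of hand-written. A hedged sketch reusing the same `Command` shape and URL; note it would name entry 06 `06_LARGE_V3_TURBO_WHISPERCPP.md`, slightly normalizing the hand-written `06_LARGE_V3_WHISPERCPP.md`:

```ts
type Command = {
  cmd: string
  expectedFile: string
  newName: string
}

const VIDEO = 'https://www.youtube.com/watch?v=MORMZXEaONk'
const EXPECTED = '2024-09-24-ep0-fsjam-podcast-prompt.md'

// Build one benchmark row per (flag, model) pair instead of copy-pasting objects.
function benchRows(flag: string, suffix: string, models: string[], startIndex: number): Command[] {
  return models.map((model, i) => ({
    cmd: `npm run as -- --video "${VIDEO}" --${flag} ${model}`,
    expectedFile: EXPECTED,
    newName: `${String(startIndex + i).padStart(2, '0')}_${model.toUpperCase().replace(/-/g, '_')}_${suffix}.md`,
  }))
}

const commands: Command[] = [
  ...benchRows('whisper', 'WHISPERCPP', ['tiny', 'base', 'medium', 'large-v1', 'large-v2', 'large-v3-turbo'], 1),
  ...benchRows('whisperDiarization', 'DIARIZATION', ['tiny', 'base', 'medium', 'large-v1', 'large-v2'], 7),
  ...benchRows('whisperPython', 'PYTHON', ['tiny', 'base', 'medium', 'large-v1', 'large-v2', 'large-v3-turbo'], 12),
]

// Quick sanity check of the generated names:
console.log(commands.map((c) => c.newName).join('\n'))
```
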