
Calliope is an experimental agentic framework that brings modern AI tools like generative AI (large language models and image generation models), computer vision, and vector databases to bear in the creation of interactive art works that dynamically produce images, video, text, and sound.


chrisimmel/calliope


Calliope

(Image: a Calliope self-portrait)

For Calliope examples, see: https://chrisimmel.com/collection/calliope

See also Calliope's DeepWiki documentation: Ask DeepWiki

Or, listen to the podcast (courtesy NotebookLM).

In Greek mythology, Calliope (/kəˈlaɪ.əpi/ kə-LY-ə-pee; Ancient Greek: Καλλιόπη, romanized: Kalliópē, lit. 'beautiful-voiced') is the Muse who presides over eloquence and epic poetry; so called from the ecstatic harmony of her voice. Hesiod and Ovid called her the "Chief of all Muses".

Calliope is an experimental agentic framework that brings modern AI tools like generative AI (large language models and image generation models), computer vision, and vector databases to bear in the creation of interactive art works that dynamically produce images, video, text, and sound. The core system is a flexible framework, service, and API that enables an artist to build repeatable interaction strategies. The API accepts inputs such as images, text, and voice, then processes these through an artist-defined pipeline of AI models to generate its multimedia output.
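As a rough illustration of what an "artist-defined pipeline" could look like, here is a minimal Python sketch. It is not Calliope's actual code: the `Frame` class, the `Stage` type, and the functions `describe_image`, `write_narrative`, and `run_pipeline` are all hypothetical names invented for this example, and the two stages are simple stand-ins for real calls to vision and language models.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Frame:
    """Hypothetical container for one request as it moves through the pipeline."""
    input_text: Optional[str] = None
    input_image: Optional[bytes] = None
    notes: List[str] = field(default_factory=list)  # intermediate descriptions
    output_text: Optional[str] = None


# A stage is any callable that takes a Frame and returns a Frame.
Stage = Callable[[Frame], Frame]


def describe_image(frame: Frame) -> Frame:
    # Stand-in for a vision model call: only records the size of the image.
    if frame.input_image is not None:
        frame.notes.append(f"an image of {len(frame.input_image)} bytes")
    return frame


def write_narrative(frame: Frame) -> Frame:
    # Stand-in for a language model call: joins the inputs into one sentence.
    seeds: List[str] = []
    if frame.input_text:
        seeds.append(frame.input_text)
    seeds.extend(frame.notes)
    if seeds:
        frame.output_text = "A story inspired by " + " and ".join(seeds)
    else:
        frame.output_text = "A story from thin air"
    return frame


def run_pipeline(stages: List[Stage], frame: Frame) -> Frame:
    # Apply each stage in order, passing the frame along.
    for stage in stages:
        frame = stage(frame)
    return frame


result = run_pipeline(
    [describe_image, write_narrative],
    Frame(input_text="a red door", input_image=b"\x00\x01\x02"),
)
print(result.output_text)
```

Because every stage shares one signature, an artist could reorder, drop, or replace stages without touching the rest of the pipeline.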

The focus is on enabling the creation of works that are "aware" of the environment in which they are installed or running, in the sense that they can see, hear, and react to things, people, and sounds around them.

  • Processing is driven by pluggable modules called story strategies (or "storytellers" in Clio parlance), meant to be experimented with and extended by the artist-engineers who make use of the framework.

  • AI models can be any commercial or open-source models accessible by API. Example providers include OpenAI, Anthropic, HuggingFace, Stability, Replicate, Runway, and Azure.

  • Images are interpreted by a combination of a multimodal LLM (e.g. GPT-4o, Claude, Gemini) and the Azure computer vision API to generate a rich text description, lists of recognized objects and text, and metadata that can be passed to other components as input.

  • Large language models are configured into sequences that process the inputs and generate narrative output that can then be illustrated with images or video generated by other models (Flux, Stable Diffusion, Runway, GPT-Image-1).

  • A semantic search facility is provided using the Pinecone vector database, with a scheduled ETL pipeline to index generated media.

There is a strong emphasis on narrative structure. Calliope invents and recites stories, which it can deliver through any client of its story API. The two existing clients are:

  • An ESP32-Sparrow -- one of a family of bespoke hardware devices with a screen and optional input sensors such as camera and microphone.

  • Clio -- a small TypeScript client included in this repo, runnable in any browser on desktop or mobile devices. Clio optionally takes image input from any accessible webcam, or audio input, and passes this with its request for a story continuation. Calliope uses this input to condition its continuation of the story.
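To make the client side concrete, the sketch below shows how a client might package optional image or audio input with a request for a story continuation. The field names (`story_id`, `strategy`, `input_image`, `input_audio`) and the function `build_story_request` are assumptions made for illustration, not Calliope's documented API, and the example only builds a JSON body without sending it anywhere.

```python
import base64
import json
from typing import Dict, Optional


def build_story_request(
    story_id: str,
    image: Optional[bytes] = None,
    audio: Optional[bytes] = None,
    strategy: Optional[str] = None,
) -> str:
    """Build a JSON body for a hypothetical story-continuation request."""
    payload: Dict[str, str] = {"story_id": story_id}
    if strategy is not None:
        payload["strategy"] = strategy
    # Binary inputs are base64-encoded so they can travel inside JSON.
    if image is not None:
        payload["input_image"] = base64.b64encode(image).decode("ascii")
    if audio is not None:
        payload["input_audio"] = base64.b64encode(audio).decode("ascii")
    return json.dumps(payload, sort_keys=True)


# A request with no sensor input simply asks the storyteller to continue.
print(build_story_request("story-1"))
# A request carrying a (fake) camera frame.
print(build_story_request("story-1", image=b"hi"))
```

Leaving out absent inputs, rather than sending empty values, keeps the "continue along the current train of thought" case as small as possible.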

Try it Out!

You can try Calliope and Clio at https://calliope.chrisimmel.com/clio/.


Hints:

  • Clio works with "storytellers" in Calliope to construct a story for you, one page at a time. You request a new page by tapping any of the buttons at the bottom of the screen.

    • Click the plus icon to let the storyteller simply continue along its current train of thought.
    • Click the microphone icon and speak a few words to give the storyteller an idea or inspiration.
    • Click the camera icon to take a photo and send it to the storyteller for inspiration.
  • After you do this, Calliope will work for several seconds and give you a new page.

  • You can review previous pages by swiping to the right. (Clicking the arrows also works.)

  • Input images and sounds are not kept on the Calliope server, so you don't need to worry about it hoarding a cache of your photos and sound clips!

  • You can start a new story at any time from the menu. The "Create New Story" option lets you start a new story using the storyteller of your choice, from either a photo, a spoken sound clip, or "thin air".

  • You can browse stories you've created in the past and select them to either review or update them.

  • Coming soon:

    • Bookmark stories and share them with friends.

