Release v0.5.0 · zhudotexe/kani

New Feature: Message Parts API

The Message Parts API is intended to provide a foundation for future multimodal LLMs and other engines that require engine-specific input without compromising kani's model-agnostic design. This is accomplished by allowing ChatMessage.content to be a list of MessagePart objects, in addition to a string.

This change is fully backwards-compatible and will not affect existing code.

When writing code with compatibility in mind, we recommend using ChatMessage.text (always a string or None) and ChatMessage.parts (always a list of message parts) instead of ChatMessage.content. These properties are generated dynamically from the underlying content, so it is safe to mix messages with different content types in a single Kani.
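As a rough illustration of the text/parts contract described above, here is a minimal, self-contained sketch. SketchMessage is a hypothetical stand-in written for this example and is not kani's ChatMessage; in particular, joining parts by their string form to produce text is an assumption, so check the Message Parts documentation for the actual behavior.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class SketchMessage:
    """Hypothetical stand-in for ChatMessage; not kani's actual implementation."""

    role: str
    content: Union[str, list, None]

    @property
    def text(self) -> Optional[str]:
        # Always a string or None, whatever shape `content` has.
        if self.content is None:
            return None
        if isinstance(self.content, str):
            return self.content
        # Assumption: a list of parts is flattened by joining string forms.
        return "".join(str(part) for part in self.content)

    @property
    def parts(self) -> list:
        # Always a list, whatever shape `content` has.
        if self.content is None:
            return []
        if isinstance(self.content, str):
            return [self.content]
        return list(self.content)


# Messages with different content types can sit in the same history.
history = [
    SketchMessage("user", "Hello!"),
    SketchMessage("assistant", ["Hi ", "there."]),
    SketchMessage("assistant", None),
]
texts = [m.text for m in history]
part_counts = [len(m.parts) for m in history]
```

Code that reads only text and parts never needs to branch on the type of content, which is the reason for preferring them.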

Generally, message part classes are defined by an engine and consumed by the developer. Message parts can be used in a message of any role. For example, you might use a message part in an assistant message to separate a chain of thought from the reply shown to the user, or in a user message to supply an image to a multimodal model.
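The two use cases mentioned above could look roughly like the following sketch. ThoughtPart, ImagePart, and visible_text are hypothetical names invented for this example; they use plain dataclasses and do not subclass kani's real message part base class, and the string forms chosen here are assumptions.

```python
from dataclasses import dataclass


@dataclass
class ThoughtPart:
    """Hypothetical engine-defined part holding chain-of-thought text."""

    text: str

    def __str__(self) -> str:
        # Assumption: hidden reasoning adds nothing to the plain-text view.
        return ""


@dataclass
class ImagePart:
    """Hypothetical part pointing at an image for a multimodal model."""

    path: str

    def __str__(self) -> str:
        return f"[image: {self.path}]"


def visible_text(parts: list) -> str:
    """Join the string form of every part (plain strings pass through)."""
    return "".join(str(p) for p in parts)


# Assistant message: reasoning kept apart from the reply shown to the user.
assistant_content = [ThoughtPart("The user wants a greeting."), "Hello!"]
# User message: text plus an image reference (the file name is made up).
user_content = ["What is in this picture? ", ImagePart("cat.png")]

reply = visible_text(assistant_content)
prompt = visible_text(user_content)
thoughts = [p.text for p in assistant_content if isinstance(p, ThoughtPart)]
```

An engine that understands a given part type can handle it specially, while code that only needs plain text can fall back to the string form.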

For more information, see the Message Parts documentation.

Up next: we're adding support for multimodal vision-language models like LLaVA and GPT-Vision through a kani extension!

Improvements

LLaMA 2: Improved the prompting in non-strict mode to group consecutive user/system messages into a single [INST] wrapper. See the tests for how kani translates consecutive message types into the LLaMA prompt.
Other documentation and minor improvements
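To make the LLaMA 2 grouping idea concrete, here is a simplified sketch of merging consecutive user/system messages into one [INST] wrapper. The build_prompt helper and the (role, text) tuples are hypothetical, the newline join is an assumption, and the sketch omits the system-prompt markers and special tokens, so it does not reproduce the exact prompt kani emits; the tests in the repository remain the reference.

```python
from itertools import groupby


def build_prompt(messages):
    """Build a simplified LLaMA-2-style prompt from (role, text) tuples.

    Runs of consecutive non-assistant messages share a single [INST] wrapper.
    """
    chunks = []
    # groupby yields one group per run of consecutive messages with equal key.
    for is_assistant, run in groupby(messages, key=lambda m: m[0] == "assistant"):
        texts = [text for _, text in run]
        if is_assistant:
            chunks.extend(f" {t} " for t in texts)
        else:
            # Assumption: grouped messages are joined with newlines.
            joined = "\n".join(texts)
            chunks.append(f"[INST] {joined} [/INST]")
    return "".join(chunks)


messages = [
    ("system", "Be brief."),
    ("user", "Hi"),
    ("user", "Are you there?"),
    ("assistant", "Yes."),
    ("user", "Thanks"),
]
prompt = build_prompt(messages)
```

Here the first three messages share one wrapper, so the prompt contains two [INST] blocks rather than four.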
