embabel/code-index

Embabel Issue Manager Demo

Issue manager agent demonstrating the Embabel agent framework. It uses Neo4j as the vector database, although this could be replaced by other choices such as Postgres with pgvector.

We recommend using a database that has vector features rather than a pure vector database, as this allows you to store your data in a more structured way, and use the vector features for semantic search.

Shows:

  • Integration with GitHub
  • Domain modeling
  • Neo4j OGM for graph-based data management and vector searching
  • Simple agent flow integrating different data sources
  • Local embedding models with Ollama

Features

IssueManagerAgent

The IssueManagerAgent is a GitHub issue management system that demonstrates Embabel's capabilities:

  • Intelligent Issue Matching: Transforms natural language queries into vector embeddings and uses Neo4j's vector search to find semantically relevant GitHub issues
  • Automated Issue Research: Conducts web searches via the integrated BraveWebSearchService to gather comprehensive background information on issues
  • Developer Assignment: Intelligently matches issues to developers based on skills and expertise using LLM-powered reasoning
  • Parallel Processing: Efficiently processes multiple issues concurrently with configurable concurrency limits
  • Multi-model Orchestration: Uses different LLMs for different tasks (query generation vs. issue research) based on their strengths

Neo4j OGM Integration

Embabel leverages Neo4j's Object Graph Mapping (OGM) for powerful graph-based data management:

  • Vector Search Capabilities: Utilizes Neo4j's vector index for high-performance semantic similarity matching
  • Graph Domain Modeling: Maps domain objects like Issue, Repository, and Organization to Neo4j nodes with proper relationships
  • Transactional Operations: Ensures data consistency with Spring-managed transactions
  • Cypher Query Language: Performs advanced vector similarity searches using Cypher queries
// Example of vector search in Neo4j from IssueIndexer.kt
val query = """
    CALL db.index.vector.queryNodes('issue_embeddings', $k, $queryVector) 
    YIELD node AS m, score 
    WHERE ($state = 'ALL' OR m.state = $state) AND score > $similarityThreshold
    RETURN m AS match, score
    ORDER BY score DESC
"""
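
One detail matters when adapting the query above. In a Kotlin raw string, `$name` is a string template, so Cypher parameters have to be escaped as `${'$'}name` (or the query kept outside Kotlin source). The following is a minimal sketch of that pattern, not the code from IssueIndexer.kt. The names `vectorSearchQuery` and `searchParams` and the default values are hypothetical.

```kotlin
// Hypothetical sketch: keep Cypher parameters literal inside a Kotlin raw string
// and pass the actual values separately as a parameter map.
val vectorSearchQuery = """
    CALL db.index.vector.queryNodes('issue_embeddings', ${'$'}k, ${'$'}queryVector)
    YIELD node AS m, score
    WHERE (${'$'}state = 'ALL' OR m.state = ${'$'}state) AND score > ${'$'}similarityThreshold
    RETURN m AS match, score
    ORDER BY score DESC
""".trimIndent()

// Builds the parameter map; the defaults are placeholders, not project settings.
fun searchParams(
    queryVector: List<Double>,
    k: Int = 10,
    state: String = "ALL",
    similarityThreshold: Double = 0.7,
): Map<String, Any> = mapOf(
    "k" to k,
    "queryVector" to queryVector,
    "state" to state,
    "similarityThreshold" to similarityThreshold,
)

fun main() {
    val params = searchParams(queryVector = listOf(0.1, 0.2, 0.3), k = 5, state = "OPEN")
    println(vectorSearchQuery)
    println(params)
}
```

The query text and the map would then be handed to whatever client the application uses, for example Neo4j OGM's `Session.query(cypher, parameters)`.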

Vector Embeddings for Semantic Search

  • LLM-Powered Embeddings: Issues are embedded into vector space using embedding models
  • Semantic Similarity: Searches are performed using cosine similarity in Neo4j's vector index
  • Configurable Parameters: Adjustable similarity threshold, result count, and issue state filtering
  • Persistent Storage: Embeddings are stored directly with issue data in Neo4j for efficient retrieval
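
To make the similarity threshold and result count concrete, here is a small in-memory sketch of the ranking that a vector index performs: score each candidate by cosine similarity, drop anything at or below the threshold, and keep the top `k`. It is an illustration only. The names `cosineSimilarity`, `Scored`, and `topMatches` are hypothetical, and Neo4j's index may report a normalized score, so thresholds are not necessarily interchangeable with the raw cosine values computed here.

```kotlin
import kotlin.math.sqrt

// Cosine similarity between two vectors of the same dimension.
fun cosineSimilarity(a: DoubleArray, b: DoubleArray): Double {
    require(a.size == b.size) { "Vectors must have the same dimension" }
    var dot = 0.0
    var normA = 0.0
    var normB = 0.0
    for (i in a.indices) {
        dot += a[i] * b[i]
        normA += a[i] * a[i]
        normB += b[i] * b[i]
    }
    // A zero vector has no direction; treat it as dissimilar to everything.
    if (normA == 0.0 || normB == 0.0) return 0.0
    return dot / (sqrt(normA) * sqrt(normB))
}

data class Scored<T>(val item: T, val score: Double)

// Scores every candidate, keeps those above the threshold, returns the best k.
fun <T> topMatches(
    query: DoubleArray,
    candidates: Map<T, DoubleArray>,
    k: Int,
    similarityThreshold: Double,
): List<Scored<T>> =
    candidates
        .map { (item, vector) -> Scored(item, cosineSimilarity(query, vector)) }
        .filter { it.score > similarityThreshold }
        .sortedByDescending { it.score }
        .take(k)

fun main() {
    // Toy two-dimensional "embeddings"; real embeddings have hundreds of dimensions.
    val issues = mapOf(
        "Login fails" to doubleArrayOf(1.0, 0.0),
        "Auth token expired" to doubleArrayOf(1.0, 1.0),
        "Typo in docs" to doubleArrayOf(0.0, 1.0),
    )
    val matches = topMatches(doubleArrayOf(1.0, 0.0), issues, k = 2, similarityThreshold = 0.5)
    matches.forEach { println("${it.item}: ${it.score}") }
}
```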
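// Example from Issue.kt showing Neo4j OGM entity with vector embedding
// (imports for the Embedding and Embedded types are omitted here)
import org.kohsuke.github.GHIssueState
import org.neo4j.ogm.annotation.Id
import org.neo4j.ogm.annotation.NodeEntity
import org.neo4j.ogm.annotation.Relationship

@NodeEntity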
data class Issue(
    @Id
    val id: Long,
    val title: String,
    val body: String?,
    val number: Int,
    override val embedding: Embedding? = null,
    @Relationship(type = "BELONGS_TO", direction = Relationship.Direction.OUTGOING)
    val repository: Repository?,
    val state: GHIssueState,
) : Embedded

Getting Started

Environment variables:

  • OPENAI_API_KEY: Your OpenAI API key.
  • GITHUB_PERSONAL_ACCESS_TOKEN: This is optional, but you will likely be rate limited without it. Without this set, the application may hang as GitHub refuses to respond to the request in a timely manner.
  • You may wish to customize the NEO4J_URI, NEO4J_USERNAME, NEO4J_PASSWORD environment variables, although the defaults in application.properties will work with the Docker file.
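
The variables above follow the usual pattern of "read from the environment, fall back to a default". A rough sketch of that pattern is shown here. It is not how the application itself resolves configuration (that is done through application.properties). `Neo4jSettings`, `loadNeo4jSettings`, and `missingRequired` are hypothetical names, and the fallback values are illustrative assumptions, not the project's actual defaults.

```kotlin
data class Neo4jSettings(val uri: String, val username: String, val password: String)

// Reads Neo4j settings from the environment. The fallbacks are assumed
// placeholder values for illustration, not the values in application.properties.
fun loadNeo4jSettings(env: (String) -> String? = { System.getenv(it) }): Neo4jSettings =
    Neo4jSettings(
        uri = env("NEO4J_URI") ?: "bolt://localhost:7687",
        username = env("NEO4J_USERNAME") ?: "neo4j",
        password = env("NEO4J_PASSWORD") ?: "changeme",
    )

// Returns the names of required variables that are unset or blank.
fun missingRequired(env: (String) -> String? = { System.getenv(it) }): List<String> =
    listOf("OPENAI_API_KEY").filter { env(it).isNullOrBlank() }

fun main() {
    val missing = missingRequired()
    if (missing.isNotEmpty()) {
        println("Missing required environment variables: $missing")
    }
    if (System.getenv("GITHUB_PERSONAL_ACCESS_TOKEN").isNullOrBlank()) {
        println("GITHUB_PERSONAL_ACCESS_TOKEN is not set; GitHub requests may be rate limited")
    }
    println("Neo4j URI: ${loadNeo4jSettings().uri}")
}
```

Passing the lookup as a function keeps the sketch easy to exercise with a plain map instead of the real environment.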

Create an .mcp.env file to contain secrets for the Docker MCP gateway. Use the mcp.env.example file as a template.

Run docker compose up to run the Neo4j database. You can also change your Neo4j configuration to use your own Neo4j database if you prefer.

Run the Cypher in scripts/db.cypher to create the vector index for issues.

Install Ollama and download the necessary local models:

ollama pull all-minilm:l6-v2
ollama pull gemma:2b

Web search uses the Model Context Protocol (MCP) to access tools and services.

The default source is the Docker Desktop MCP server, which is installed with Docker Desktop.

To ensure tools are available and startup doesn't time out, first log in to Docker and start the gateway once so that it can pull what it needs:

docker login
docker mcp gateway run

When the gateway has come up you can kill it and start the Embabel server.


Run the shell script to start Embabel under Spring Shell:

```bash
cd scripts
./shell.sh
```

IssueManagerAgent

Manages GitHub issues with intelligent search, research, and developer assignment.

// Example of how IssueManagerAgent processes issues
@Action
fun processIssues(issues: ResearchedIssues, context: OperationContext): IssueActionSummary {
    val actionList = issues.issues.filter { it.difficultyOfSolving < 5 }
    val summary = context.promptRunner(LlmOptions.fromModel(config.summaryModel))
        .generateText(
            """
            Summarize the following issues that are easy to solve:
            ${actionList.joinToString("\n") { "${it.issue.title} (${it.issue.number}) - Difficulty: ${it.difficultyOfSolving}" }}
        """.trimIndent()
        )
    val allDevs = developerRepository.findAll()
    val assignedIssues = context.parallelMap(actionList, maxConcurrency = 5) { issue ->
        val developerChoice = findAppropriateDeveloper(
            issue.issue,
            context.promptRunner(LlmOptions.fromModel(config.summaryModel)),
            allDevs
        )
        if (developerChoice != null) {
            issue.assignToDeveloper(developerChoice.developer)
        }
        AssignedIssue(issue, developerChoice)
    }

    return IssueActionSummary(
        text = summary,
        issuesToAction = assignedIssues,
    )
}
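
The `parallelMap(actionList, maxConcurrency = 5)` call above caps how many issues are processed at the same time. As a rough mental model only, and not Embabel's implementation, the same behavior can be sketched with a fixed-size thread pool from the JDK. The function name `boundedParallelMap` is hypothetical.

```kotlin
import java.util.concurrent.Callable
import java.util.concurrent.Executors

// Applies transform to every item using at most maxConcurrency threads.
// Results are returned in the same order as the input list.
fun <T, R> boundedParallelMap(
    items: List<T>,
    maxConcurrency: Int,
    transform: (T) -> R,
): List<R> {
    require(maxConcurrency > 0) { "maxConcurrency must be positive" }
    if (items.isEmpty()) return emptyList()
    val pool = Executors.newFixedThreadPool(minOf(maxConcurrency, items.size))
    try {
        // invokeAll blocks until every task has finished and preserves input order.
        val futures = pool.invokeAll(items.map { item -> Callable { transform(item) } })
        // get() rethrows a task failure wrapped in an ExecutionException.
        return futures.map { it.get() }
    } finally {
        pool.shutdown()
    }
}

fun main() {
    val lengths = boundedParallelMap(listOf("a", "bb", "ccc"), maxConcurrency = 2) { it.length }
    println(lengths) // prints [1, 2, 3]
}
```

A cap like this is useful when each item triggers an LLM or web request, since it limits the number of calls in flight at once.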

Run these commands in the shell to interact with the agents:

  • developers: Create all sample developers. Do this once only for each database.
  • index-repo: Index issues from a GitHub repository into Neo4j.
  • search-issues: Search for issues using natural language queries. Put queries in quotes if need be. This currently relies entirely on vector search; full-text search may be added once the requirements are better understood.
  • cluster-issues: Find similar issues.
  • process-issues: Invoke an agent. Based on a query, find issues, research them, and assign them to developers.

Simple Example Agent

WriteAndReviewAgent (for reference)

Uses one LLM with a high temperature and creative persona to write a story based on your input, then another LLM with a low temperature and different persona to review the story.

x "Tell me a story about...[your topic]"

Architecture

Embabel integrates with Neo4j using Spring Data Neo4j and Neo4j OGM, providing:

  • Transactional Operations: Ensures data consistency across complex operations
  • Vector Indexing: Enables high-performance semantic search capabilities
  • Graph-based Relationship Management: Maintains complex relationships between entities
  • Efficient Query Execution: Leverages Cypher for optimized graph queries
  • Agent Orchestration: Coordinates multiple specialized agents with different capabilities

Moving This Codebase Forward

  • Although written in Kotlin, this codebase can be used in Java applications and new functionality can be added in Java.
  • The use of Spring Shell is convenient for getting started, but it's also easy to use the agents in any other UI or in a headless application. Please see Embabel examples for use of MCP clients such as Claude Desktop, which is particularly important.
  • Issue research is hard-coded to use the Embabel Agent Framework repository, but this is easy to change.
  • Developers are hard-coded, but this is easy to change.

What is Embabel?

Embabel is an advanced AI agent orchestration framework for Kotlin and Java applications that:

  • Simplifies the creation and management of AI agents with a declarative approach
  • Provides tools for integrating multiple LLM models with different strengths
  • Offers built-in support for vector databases and semantic search
  • Enables parallel processing of complex workflows
  • Integrates seamlessly with Spring Boot and other JVM frameworks
