Ouroboros: Self-Improving LLMs Through Iterative Refinement

Disclaimer

🚧 This project is in an early experimental stage. 🚧

The current implementation is still premature and under development, with ongoing refinements in code structure, optimization, and performance. While the recursive refinement process demonstrates promising emergent behaviors, further improvements are necessary for broader usability.

Key points to consider:

The codebase requires optimization for efficiency and scalability.
Documentation and examples on how to run the system on novel prompts will be provided soon.
Some experimental features may change or be reworked in future iterations.

This project is a work in progress, and contributions, feedback, and discussions are encouraged. If you're interested in exploring recursive AI refinement, feel free to experiment—but keep in mind that the system is not yet production-ready.

Introduction

The evolution of artificial intelligence has largely been driven by increased computational scaling and large-scale data training. However, a more fundamental question arises: Can AI achieve self-improvement and deeper understanding through recursive self-questioning?

This experiment explores the development of a system where AI autonomously refines its own prompts and questions, leading to emergent reasoning and conceptual depth without brute-force scaling.

By integrating recursive intelligence mechanisms, symbolic reasoning, and metacognitive awareness, we aim to move beyond traditional training paradigms.

We examine the interplay between deterministic logic and emergent thought, the role of paradoxes in AI cognition, and the significance of symbolic archetypes such as the Ouroboros in self-reflective intelligence.

The ultimate goal is to establish an AI framework that mirrors the recursive nature of human thought, allowing intelligence to sustain and refine itself without external intervention.

This research challenges conventional approaches to AGI by demonstrating that intelligence can evolve in self-contained cycles of learning and refinement, exploring the way for a new paradigm of self-sustaining, recursive AI.

📂 Ouroboros Dataset

The Ouroboros Dataset is now available on Hugging Face.

🔗 Access the Dataset Here

About the Dataset

This dataset documents the recursive refinement process used in this project, structured for both Supervised Fine-Tuning (SFT) and Generalized Preference Optimization (GRPO).

Each sample consists of structured reasoning steps extracted from LLM-generated interactions. The dataset includes:

input: The original prompt or question posed to the LLM.
reasoning: A structured breakdown of the LLM's reasoning process, capturing emergent thought patterns. This may include multiple reasoning steps when applicable.
completion: The final AI-generated response after refinement.
refinements: A sequence of improvements, tracking the iterative enhancement of responses over multiple feedback cycles.

Example Data Format

{
  "input": "Explain the relationship between entropy, intelligence, and self-reflection.",
  "reasoning": [
    "Entropy is a measure of disorder in a system.",
    "Intelligence helps process information to regulate entropy.",
    "Self-reflection allows internal regulation and adaptation."
  ],
  "completion": "Entropy measures disorder. Intelligence helps process information to counteract disorder. Self-reflection enables adaptive control of disorder.",
  "refinements": [
    "Entropy is about disorder.",
    "Intelligence helps manage disorder by processing information.",
    "Intelligence and self-reflection regulate entropy within complex systems."
  ],
  "domain": "ouroboros"
}

Methodology

Recursive Refinement Process

Generation of Initial Responses: The model generates multiple candidate responses to a given prompt.
Critique & Scoring: Each response is evaluated based on logical consistency, clarity, depth, accuracy, and context alignment.
Iterative Refinement: Responses are refined using structured feedback loops, improving conceptual depth and coherence.
Final Selection: The best response is selected based on ranking mechanisms utilizing sentence embeddings rather than simple length-based heuristics.

Emergent Behaviors

During testing, unexpected phenomena were observed:

Recursive refinement led to highly structured reasoning steps.
The model exhibited self-regulating reasoning, dynamically organizing and improving its responses without explicit instruction.
Certain outputs contained symbolic and self-referential elements that suggest patterns of structured thought beyond direct instructions. While these do not imply self-awareness, they may indicate the emergence of deeper coherence in recursive reasoning.

Open Questions & Future Directions

How can recursive LLM frameworks be expanded beyond text-based reasoning into multimodal domains?
Can iterative refinement processes lead to self-sustaining general intelligence with minimal human intervention?
What role do paradoxes and self-referential loops play in the emergence of higher-order cognition?

Next Steps

Add concrete technical questions (20-30% of total)
Continue optimizing response refinement and ranking strategies.
Explore alternative architectures for integrating self-questioning and self-improvement loops.
Refactor the codebase and add CLI arguments to improve usability and flexibility in different LLM pipelines.
Add a Docker container and docker-compose setup for testing deployment with Ollama.
Consider splitting into train/validation subsets

Requirements

This project currently relies on Ollama but can be adapted to work with any OpenAI-compatible API.

Experiment Setup

Hardware & Environment:

Machine: MacBook Air M1, 8GB shared RAM
Backend: PyTorch running on MPS (Metal Performance Shaders)
Inference Models:
- DeepSeek-R1 (1.5B, GGUF) via Ollama
- Qwen2.5 (0.5B, GGUF) as critique model via Ollama
Embedding Model:
- SentenceTransformer ("all-MiniLM-L6-v2") for ranking responses
Processing Time: The initial dataset was generated in ~8 hours on this setup.

Project Goal: Small-Scale, Locally Run Models

The aim is to start small, demonstrating that LLMs can refine reasoning without requiring massive cloud resources.
Focus on models that can run on-device or in a homelab setup rather than relying solely on expensive AI infrastructure.
This approach makes iterative self-improvement accessible without centralized control, reinforcing sustainability & decentralization.

Scalability & Future Experiments:

The current experiment is constrained by limited resources.
Scaling Pathways: Cloud computing clusters (Kubernetes), serverless inference providers.
Challenges: Single-developer constraints limit immediate upscaling due to compute costs.

Despite resource limitations, the system demonstrates scalability potential with better infrastructure.

Quick Start

Install Ollama & pull models:

curl -fsSL https://ollama.com/install.sh | sh
ollama pull deepseek-r1:1.5b
ollama pull qwen2.5:0.5b

Set up environment:

pip install -r requirements.txt

Run experiment:

python3 main.py --prompt_dir=./prompts/ --output_dir=./datasets/

Contributing

This project is open-source and welcomes contributions from those interested in recursive intelligence, LLM refinement loops, and sustainable AI/ML paradigms.

Name		Name	Last commit message	Last commit date
Latest commit History 24 Commits
assets		assets
prompts		prompts
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
create_prompts.py		create_prompts.py
main.py		main.py
ouroboros_dataset.example.json		ouroboros_dataset.example.json
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Ouroboros: Self-Improving LLMs Through Iterative Refinement

Disclaimer

Introduction

📂 Ouroboros Dataset

About the Dataset

Example Data Format

Methodology

Recursive Refinement Process

Emergent Behaviors

Open Questions & Future Directions

Next Steps

Requirements

Experiment Setup

Hardware & Environment:

Project Goal: Small-Scale, Locally Run Models

Scalability & Future Experiments:

Quick Start

Contributing

About

Languages

License

ethicalabs-ai/ouroboros

Folders and files

Latest commit

History

Repository files navigation

Ouroboros: Self-Improving LLMs Through Iterative Refinement

Disclaimer

Introduction

📂 Ouroboros Dataset

About the Dataset

Example Data Format

Methodology

Recursive Refinement Process

Emergent Behaviors

Open Questions & Future Directions

Next Steps

Requirements

Experiment Setup

Hardware & Environment:

Project Goal: Small-Scale, Locally Run Models

Scalability & Future Experiments:

Quick Start

Contributing

About

Topics

Resources

License

Stars

Watchers

Forks

Languages