
FinAI: LLM-based Equity Research Engine

[Screenshot: Tool UI]


Introduction 📄

FinAI is a Retrieval-Augmented Generation (RAG) based equity news analyzer that simplifies information retrieval for investors, analysts, and financial researchers. Built using LangChain, OpenAI, Gemini, FAISS, and local LLM backends, it allows users to input article URLs and query them in natural language.


Problem Statement 🚫

  • Equity research is manual, fragmented, and time-consuming.
  • Analysts must browse multiple sources by hand and interpret the insights themselves.
  • LLMs like ChatGPT alone cannot efficiently handle multiple sources, long documents, or real-time querying.
  • A tool is needed that can ingest articles, process them intelligently, and provide accurate, real-time answers.

Working (Pipeline Stages) ⚙️

  1. Data Ingestion

    • News article URLs are fetched using SeleniumURLLoader.
  2. Text Splitting

    • Articles are chunked using LangChain's RecursiveCharacterTextSplitter to fit LLM token limits.
  3. Embeddings & Vector Store

    • Embeddings created via OpenAI or HuggingFace models (like all-MiniLM-L6-v2).
    • FAISS stores and retrieves similar content based on queries.
  4. Querying via LLMs

    • User queries are answered by OpenAI, Gemini, GPT4All, or LLaMA-2 models via RetrievalQAWithSourcesChain.
    • Local models (llama-cpp-python, gpt4all) enable fully offline use.
  5. Answer + Source Display

    • Source-linked responses shown via Streamlit.
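The stages above can be illustrated with a framework-free sketch: a character-window splitter stands in for RecursiveCharacterTextSplitter, a bag-of-words vector stands in for all-MiniLM-L6-v2 embeddings, and brute-force cosine search stands in for FAISS. All function names, URLs, and data below are illustrative, not FinAI's actual code.

```python
# Framework-free sketch of stages 2-5 of the pipeline. Toy stand-ins only:
# bag-of-words "embeddings" instead of all-MiniLM-L6-v2, brute-force cosine
# search instead of FAISS. Names like `retrieve` are hypothetical.
import math
from collections import Counter

def split_text(text: str, chunk_size: int = 40, overlap: int = 10) -> list[str]:
    """Character-window chunking, the idea behind RecursiveCharacterTextSplitter."""
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks

def embed(text: str) -> Counter:
    """Toy 'embedding': a bag-of-words term-frequency vector."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, store: list[tuple[Counter, str, str]], k: int = 1):
    """Return the k most similar chunks with their source URLs (FAISS's job)."""
    scored = sorted(store, key=lambda e: cosine(embed(query), e[0]), reverse=True)
    return [(text, source) for _, text, source in scored[:k]]

# Index two tiny "articles", keeping the source URL per chunk so answers
# can be source-linked in stage 5.
articles = {
    "https://example.com/a": "Tesla reported record quarterly earnings this year",
    "https://example.com/b": "The central bank raised interest rates again",
}
store = []
for url, body in articles.items():
    for chunk in split_text(body):
        store.append((embed(chunk), chunk, url))

print(retrieve("quarterly earnings", store))
```

In the real pipeline, LangChain's FAISS.from_documents replaces the indexing loop and RetrievalQAWithSourcesChain replaces `retrieve`, with an LLM composing the final answer from the retrieved chunks.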

Results 🔄

  • The tool shows real-time progress updates for each pipeline stage.

  • The tool returns exact, accurate answers to straightforward (direct) queries.

  • The same pipeline was customized to run on both online APIs (OpenAI, Gemini) and offline models (LLaMA, GPT4All).

  • Benchmark table:

    | Backend                | Avg Latency | QA Relevance | Token Cost     | Use-Case Fit                             |
    |------------------------|-------------|--------------|----------------|------------------------------------------|
    | OpenAI (gpt-3.5-turbo) | ~2.1 s      | 96.4%        | High (paid)    | Best for fast, high-quality responses    |
    | Gemini Pro             | ~2.8 s      | 92.1%        | Free (limited) | Good fallback; prone to hallucination    |
    | Local LLaMA (7B)       | ~5.3 s      | 93.2%        | None           | Reliable offline QA; requires setup      |
    | GPT4All (q4_0)         | ~7.2 s      | 86.5%        | None           | Works offline; lower accuracy in deep QA |

Files & Structure 📁

  • app_versions/: Contains different Streamlit app versions based on LLMs — OpenAI, Gemini, GPT4All, and LLaMA.
  • data_files/: Includes sample article text files and URL lists used during experimentation.
  • notebooks/: Jupyter notebooks demonstrating individual components of the RAG pipeline (e.g., vector store testing, embeddings, chunking).
  • test/: Debugging and testing scripts for Gemini and LLaMA-based app flows.
  • .env: Stores environment variables like API keys for OpenAI and Gemini.
  • faiss-store-hf.pkl: Vector store generated using HuggingFace embeddings.
  • faiss-store-openai.pkl: Vector store generated using OpenAI embeddings.
  • vector-index.pkl: Sample vector index created using notebook for FAISS validation.
  • main.py: Primary file containing final UI code after experimentation.
  • requirements.txt: Python dependencies required for running the project.
  • README.md: Documentation and usage guide for the project.
  • models/: 🔐 Not uploaded; place downloaded local LLMs here (refer to Installation).

Installation 🚧

  1. Clone the repository

     git clone https://github.com/your-username/FinAI.git
     cd FinAI

  2. Install dependencies

     pip install -r requirements.txt

  3. Set up API keys: create a .env file (modeled on the reference file provided) and add:

     OPENAI_API_KEY=your-key-here
     GOOGLE_API_KEY=your-google-studio-api-key

  4. For local LLM usage:

     • Download .gguf models from LLaMA HF or GPT4All.
     • Create a models/ directory and place them inside.
     • Update model_path in the corresponding app files (e.g. app_local_llama.py).

  5. Run the tool

     streamlit run main.py
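The .env file from step 3 is plain KEY=value text. As a minimal, dependency-free sketch of how such a file can be loaded at startup (the app versions may use python-dotenv instead; `load_env` is a hypothetical helper, not part of FinAI's codebase):

```python
# Tiny stand-in for python-dotenv: parse KEY=value lines from a .env file
# into os.environ so the app can read the OpenAI/Gemini API keys.
# `load_env` is a hypothetical helper for illustration.
import os
from pathlib import Path

def load_env(path: str = ".env") -> dict[str, str]:
    values: dict[str, str] = {}
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments, and malformed lines
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
        # keep keys already exported in the shell, don't overwrite them
        os.environ.setdefault(key.strip(), value.strip())
    return values
```

Exporting the keys directly in the shell works just as well; the setdefault call above makes shell-exported values take precedence over the file.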

Tech Stack 🚀

Python LangChain Streamlit OpenAI Google Gemini FAISS HuggingFace LLaMA GPT4All

  • LangChain: Orchestration of RAG pipeline
  • Streamlit: Interactive web interface
  • OpenAI & Gemini APIs: Cloud-based LLMs
  • LLaMA / GPT4All: Local LLMs
  • HuggingFace Embeddings: SentenceTransformers (all-MiniLM-L6-v2)
  • FAISS: Vector similarity search and store
  • Python + Selenium: Document scraping + automation

Future Scope 🔮

  • Real-time financial API integration (e.g. stock prices, reports)
  • LLM-based summarization for multi-source insights
  • Domain-tuned custom LLMs for financial jargon
  • Globalization support via multi-language ingestion

Contributing 🤝

We welcome contributions! Feel free to:

  • Fork the repo
  • Create a new branch
  • Submit a PR with your changes or improvements

Thanks for Visiting 😊!

We hope FinAI helps you gain actionable insights with less effort. If you like it, give the repo a ⭐ and feel free to reach out for suggestions or ideas!

