DeRAG

Project Overview

This project is a submission to the "Filecoin IPC Data Economy Hackathon".

Slides can be found at https://docs.google.com/presentation/d/1AVLqkqH05mcDIStsDQljMHI_xg8kGiZ06iIAb6t2Bsg/edit?usp=sharing

Demo

Architecture

[Architecture diagram]

Indexer

Crawler

  • For YouTube transcript crawling
    • We use a Crawl Frontier design in which the videoIds to crawl are seeded from keyword search queries or playlist IDs, for example:
      {
        type: 'search',
        keyword: 'Ang Mo Kio property',
      }
      {
        type: 'playlist',
        playlistId: 'PLYIHyr0q2nW8W1Hr0PyMyztqYFt9ZoLbs',
      }

    • The videoIds are written to a database table on Tableland.
    • crawl() then loads the latest videoIds for transcript processing.
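
The crawl frontier described above can be pictured with a minimal in-memory sketch. Only the two seed shapes come from this README; `CrawlFrontier`, `SeedResolver`, and `seedFrontier` are hypothetical names, and the real project stores videoIds in a Tableland table rather than in memory.

```typescript
// Seed shapes as shown in this README; everything else is illustrative.
type Seed =
  | { type: 'search'; keyword: string }
  | { type: 'playlist'; playlistId: string };

// Hypothetical: resolves a seed to video IDs. Injected so the sketch
// runs without network access or a YouTube client.
type SeedResolver = (seed: Seed) => Promise<string[]>;

class CrawlFrontier {
  private readonly seen = new Set<string>();
  private readonly queue: string[] = [];

  // Enqueue video IDs, skipping any that were already seen.
  // Returns how many IDs were newly added.
  add(videoIds: string[]): number {
    let added = 0;
    for (const id of videoIds) {
      if (!this.seen.has(id)) {
        this.seen.add(id);
        this.queue.push(id);
        added++;
      }
    }
    return added;
  }

  // Take up to `limit` pending video IDs in FIFO order.
  next(limit: number): string[] {
    return this.queue.splice(0, limit);
  }

  // Number of video IDs still waiting to be crawled.
  pending(): number {
    return this.queue.length;
  }
}

// Expand every seed into video IDs and push them onto the frontier.
async function seedFrontier(
  frontier: CrawlFrontier,
  seeds: Seed[],
  resolve: SeedResolver,
): Promise<void> {
  for (const seed of seeds) {
    frontier.add(await resolve(seed));
  }
}
```

A crawl step would then call `frontier.next(n)` to pick the next batch of videoIds for transcript processing. Keeping a `seen` set means a video that appears in both a search result and a playlist is only crawled once.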

Indexer

  • ASR (Automatic Speech Recognition)
    • Whisper from OpenAI for transcription and translation
  • Break the transcript into chunks and save them as an index for LLM usage
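
As a rough illustration of the chunking step, the sketch below splits a transcript into overlapping word windows. The function name, the `TranscriptChunk` shape, and the default sizes are assumptions for illustration; the project's actual chunking strategy is not described here.

```typescript
// Hypothetical chunk shape; not taken from the project source.
interface TranscriptChunk {
  index: number;
  text: string;
}

// Split a transcript into windows of `chunkSize` words, where consecutive
// windows share `overlap` words so context is not lost at the boundaries.
function chunkTranscript(
  transcript: string,
  chunkSize = 200,
  overlap = 40,
): TranscriptChunk[] {
  if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
    throw new Error('require chunkSize > 0 and 0 <= overlap < chunkSize');
  }
  const words = transcript.split(/\s+/).filter((w) => w.length > 0);
  const chunks: TranscriptChunk[] = [];
  const step = chunkSize - overlap;
  for (let start = 0; start < words.length; start += step) {
    const slice = words.slice(start, start + chunkSize);
    chunks.push({ index: chunks.length, text: slice.join(' ') });
    // Stop once a window has reached the end of the transcript.
    if (start + chunkSize >= words.length) break;
  }
  return chunks;
}
```

Each chunk could then be embedded or stored as an index entry for retrieval by the LLM. The overlap is a common design choice so that a sentence cut at a boundary still appears whole in at least one chunk.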

LLM

  • LangChain / OpenAI for the LLM
  • LangSmith for LLM observability

Technology Usage in API

  • Filecoin
    • (via Lighthouse SDK) for file uploads
      • apps/api/src/adapters/lighthouse.ts
    • (via Lighthouse contract) for PoDSI check
      • apps/contracts/src/DeRag.sol L34
  • Tableland
    • apps/api/src/adapters/tableland.ts
    • apps/api/src/crawl.service.ts L123 (inserts into the table for crawl requests)
    • apps/api/src/index.service.ts L121 (reads the table for index CIDs)
  • Nestjs
  • For YouTube
    • youtubei for metadata loading
    • youtube-transcript for transcripts (testing when Whisper is not in use)

Tech Stack of Validator

  • We created a Validator that verifies the indices created by the indexer
  • Verification is trustless

Sample Check approach

  • validate() in validator.service.ts performs the comparison
  • The comparison metric is the BLEU score
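
To make the metric concrete, here is a minimal sketch of an unsmoothed, single-reference sentence-level BLEU score (modified n-gram precision up to 4-grams with a brevity penalty). It is not the code in validator.service.ts; the function names and the whitespace tokenizer are assumptions.

```typescript
// Count n-grams of size `n` in a token list.
function ngramCounts(tokens: string[], n: number): Map<string, number> {
  const counts = new Map<string, number>();
  for (let i = 0; i + n <= tokens.length; i++) {
    const gram = tokens.slice(i, i + n).join(' ');
    counts.set(gram, (counts.get(gram) || 0) + 1);
  }
  return counts;
}

// Unsmoothed BLEU against a single reference. Returns a value in [0, 1].
function bleu(candidate: string, reference: string, maxN = 4): number {
  // Assumption: lowercase whitespace tokenization is good enough for a sketch.
  const tokenize = (s: string): string[] =>
    s.toLowerCase().split(/\s+/).filter((t) => t.length > 0);
  const cand = tokenize(candidate);
  const ref = tokenize(reference);
  if (cand.length === 0) return 0;

  let logPrecisionSum = 0;
  for (let n = 1; n <= maxN; n++) {
    const candCounts = ngramCounts(cand, n);
    const refCounts = ngramCounts(ref, n);
    let matched = 0;
    let total = 0;
    candCounts.forEach((count, gram) => {
      // Clip each candidate n-gram count by its count in the reference.
      matched += Math.min(count, refCounts.get(gram) || 0);
      total += count;
    });
    // Without smoothing, any zero precision makes the whole score zero.
    if (matched === 0) return 0;
    logPrecisionSum += Math.log(matched / total) / maxN;
  }
  // Brevity penalty: penalize candidates shorter than the reference.
  const bp = cand.length > ref.length ? 1 : Math.exp(1 - ref.length / cand.length);
  return bp * Math.exp(logPrecisionSum);
}
```

A sample check could, for example, recompute the transcript for a randomly chosen chunk and accept the indexer's output when `bleu(indexed, recomputed)` exceeds some threshold. The threshold and the sampling policy are assumptions that would need tuning.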

ZKML approach

Tech Stack of Smart Contract

Building Env

  • We use Turborepo as the monorepo CLI
  • It is recommended to pair it with env-cmd to populate the environment from .env
  • Refer to env.sample for the required values

Typical commands:

# start server
env-cmd pnpm dev
# eslint
env-cmd pnpm lint
# format(prettier)
env-cmd pnpm format

Set Up Whisper Locally

Testing

# unit test (.spec.ts) / integration test (.int.spec.ts)
env-cmd pnpm test