From 1c7ad4efc066e3e6e524898b06614e7d869c840e Mon Sep 17 00:00:00 2001
From: Adithya S K
Date: Thu, 4 Apr 2024 11:41:54 +0530
Subject: [PATCH] updated readme

- Adithya S K
---
 README.md | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/README.md b/README.md
index fa6a4bf..c38382f 100644
--- a/README.md
+++ b/README.md
@@ -1,16 +1,16 @@
-# IndicEval
+# Indic Eval
 
 A lightweight evaluation suite tailored specifically for assessing Indic LLMs across a diverse range of tasks, aiding in performance assessment and comparison within the Indian language context
 
-## Context
+### Context
 
 Indic Eval is a lightweight LLM evaluation suite built on top of [LightEval](https://github.com/huggingface/lighteval), which Hugging Face has been using internally. We at [cognitivelab.in](https://cognitivelab.in) utilize it for internal testing of LLMs on various tasks. While early in development, it offers a collaborative space for community-driven advancement in Indic language modeling. Please note that stability is a work in progress. Feel free to contribute or raise issues!
 
-## What does it Offer on top of Light eval
+### What does it offer on top of LightEval
 
 - ✅ Intergration with [Indic LLM leaderboard](https://huggingface.co/spaces/Cognitive-Lab/indic_llm_leaderboard)
-- ✅ Support of the following tranlsted dataset ARC, Hellaswag , Boolq, MMLU , Winogrande to indian languages
+- ✅ Support for the following datasets translated into Indian languages: ARC, Hellaswag, Boolq, MMLU, Translate[IND 22]
 - ✅ [Skypilot](https://skypilot.readthedocs.io/) Integration to run Evals on 15+ CLoud providers with simple configuration
 - ✅ Support for Language Base evaluation rather than task based
 
@@ -158,11 +158,11 @@ Please refer to the [documentation](https://skypilot.readthedocs.io/en/latest/do
 ## To-DO
 
 - [x] Proper Intergration with [Indic_LLM_Leaderboard](https://huggingface.co/spaces/Cognitive-Lab/indic_llm_leaderboard)
-- [x] Make Intergration with [Indic_LLM_Leaderboard](https://huggingface.co/spaces/Cognitive-Lab/indic_llm_leaderboard) more secure
 - [x] Test out ARC-Easy for all indic Languages and see consistency
 - [x] Test out ARC-Challenge for all indic Languages and see consistency
-- [ ] Test out Hellaswag for all indic Languages and see consistency
+- [x] Test out Hellaswag for all indic Languages and see consistency
 - [ ] Test out Boolq for all indic Languages and see consistency
+- [ ] Make Intergration with [Indic_LLM_Leaderboard](https://huggingface.co/spaces/Cognitive-Lab/indic_llm_leaderboard) more secure
 - [ ] Test out MMLU for all indic Languages and see consistency
 - [ ] Test out Translate for all indic Languages and see consistency
 - [ ] Integrate VLLM for faster evaluation
@@ -462,6 +462,7 @@ python3 -m build .
 - To run the evaluations, you can use `run_indic_evals_accelerate`,`run_evals_accelerate.py` or `run_evals_nanotron.py`.
 - [src/indic_eval](https://github.com/adithya-s-k/indic_eval/tree/main/src/indic_eval) contains the core of the lib itself
   - [indic_eval](https://github.com/adithya-s-k/indic_eval/tree/main/src/indic_eval) contains the core of the library, divided in the following section
+    - [indic_accelerate.py](https://github.com/adithya-s-k/indic_eval/blob/main/src/indic_eval/main_accelerate.py) is the entry point to run Indic language benchmarks
     - [main_accelerate.py](https://github.com/adithya-s-k/indic_eval/blob/main/src/indic_eval/main_accelerate.py) and [main_nanotron.py](https://github.com/adithya-s-k/indic_eval/blob/main/src/indic_eval/main_nanotron.py) are our entry points to run evaluation
     - [logging](https://github.com/adithya-s-k/indic_eval/tree/main/src/indic_eval/logging): Our loggers, to display experiment information and push it to the hub after a run
     - [metrics](https://github.com/adithya-s-k/indic_eval/tree/main/src/indic_eval/metrics): All the available metrics you can use. They are described in metrics, and divided between sample metrics (applied at the sample level, such as a prediction accuracy) and corpus metrics (applied over the whole corpus). You'll also find available normalisation functions.
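Since the last hunk points readers at the evaluation entry points, a rough sketch of how a run might be launched is shown below. It is not taken from the patch: the flags mirror LightEval's `--model_args` / `--tasks` / `--output_dir` conventions, the model id and task string are placeholders, and the script location is assumed, so consult the script's `--help` output for the actual interface.

```bash
# Sketch only: a LightEval-style invocation. Flag names, the task spec format,
# and the script path are assumptions, not confirmed by the patch above.
git clone https://github.com/adithya-s-k/indic_eval.git
cd indic_eval
pip install -e .

# <hf-model-id> and the task string are placeholders; list the real tasks and
# flags with: python run_indic_evals_accelerate.py --help
accelerate launch run_indic_evals_accelerate.py \
    --model_args "pretrained=<hf-model-id>" \
    --tasks "<suite|task|num_fewshot|truncate>" \
    --output_dir ./eval_results
```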