
SEMIKONG - The Open Source Foundation Model for Semiconductor Manufacturing Process

🤗 Hugging Face Dataset • 🤖 Hugging Face Model

👩‍🚀 Ask questions or discuss ideas on GitHub

📝 Check out SEMIKONG Tech Report


📕 Table of Contents

What is SEMIKONG?

Introduction

  • 🤖 SEMIKONG is an open-source, industry-specific large language model (LLM) tailored to the semiconductor domain. It aims to address the unique challenges faced by the semiconductor industry, such as the physics and chemistry of semiconductor devices and processes, by incorporating domain-specific knowledge into the model.

  • 🙌 Targeted as a bilingual language model and trained on a 3T-token multilingual corpus, the SEMIKONG series models are among the strongest LLMs worldwide, showing promise in language understanding, commonsense reasoning, reading comprehension, and more. For example:

    • SEMIKONG-8B / 70B-Instruct models

    • SEMIKONG-8B / 70B base models

    • 🙏 (Credits to Llama) Thanks to the Transformers and Llama open-source communities, which reduce the effort required to build from scratch and enable the use of the same tools within the AI ecosystem.

[ Back to top ⬆️ ]

News

[ Back to top ⬆️ ]

Key Features

  • First industry-specific LLM for the semiconductor domain
  • Trained on a comprehensive semiconductor-related text corpus
  • Novel pre-training approach leveraging domain-specific knowledge
  • Superior performance compared to general-purpose LLMs on industry-relevant benchmarks
  • Serves as a valuable foundation for companies to build proprietary models tailored to their needs

Models

SEMIKONG models come in multiple sizes and cater to different use cases. You can also fine-tune SEMIKONG models to meet your specific requirements.

If you want to deploy SEMIKONG models, make sure you meet the software and hardware requirements.

Instruct models

| Model                 | Download        |
|-----------------------|-----------------|
| SEMIKONG-70B-Instruct | 🤗 Hugging Face |
| SEMIKONG-8B-Instruct  | 🤗 Hugging Face |

Base models

| Model        | Download        |
|--------------|-----------------|
| SEMIKONG-70B | 🤗 Hugging Face |
| SEMIKONG-8B  | 🤗 Hugging Face |

Model info

  • For chat and base models
| Model | Intro | Default context window | Pretrained tokens |
|-------------------|-------|------------------------|-------------------|
| 70B series models | A powerful version of SEMIKONG, suitable for more complex tasks | 48K | 25T |
| 8B series models  | An economical version of SEMIKONG, able to handle general instructions and chat about the semiconductor manufacturing process | 48K | 25T |
  • For chat models

    For chat model limitations, see the explanations below. ⬇️

      The released chat model has been trained exclusively with Supervised Fine-Tuning (SFT). Compared to other standard chat models, our model produces more diverse responses, making it suitable for various downstream tasks, such as creative scenarios. Furthermore, this diversity is expected to increase the likelihood of generating higher-quality responses, which will be advantageous for subsequent Reinforcement Learning (RL) training.


      However, this higher diversity might amplify certain existing issues, including:

    • Hallucination: This refers to the model generating factually incorrect or nonsensical information. Because the model's responses are more varied, there is a higher chance of hallucinations that are not based on accurate data or logical reasoning.
    • Non-determinism in re-generation: When attempting to regenerate or sample responses, inconsistencies in the outcomes may occur. The increased diversity can lead to varying results even under similar input conditions.
    • Cumulative error: This occurs when errors in the model's responses compound over time. As the model generates more diverse responses, the likelihood of small inaccuracies building up into larger errors increases, especially in complex tasks such as extended reasoning and mathematical problem-solving.
    • To achieve more coherent and consistent responses, it is advisable to adjust generation parameters such as temperature, top_p, or top_k. These adjustments help balance creativity and coherence in the model's outputs (see the sketch after this list).
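As a rough illustration, here is a minimal sketch of where those parameters plug into `generate`, mirroring the quick-start code later in this README; the prompt and the parameter values are illustrative starting points, not tuned recommendations.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = '<your-model-path>'

tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=False)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    device_map="auto",
    torch_dtype="auto",
).eval()

messages = [{"role": "user", "content": "Explain photolithography in one paragraph."}]
input_ids = tokenizer.apply_chat_template(
    conversation=messages, tokenize=True, add_generation_prompt=True, return_tensors='pt'
)

output_ids = model.generate(
    input_ids.to(model.device),
    do_sample=True,
    temperature=0.3,   # lower temperature gives more deterministic token choices
    top_p=0.8,         # nucleus sampling: keep the smallest token set covering 80% of the mass
    top_k=40,          # consider only the 40 most likely tokens at each step
    max_new_tokens=256,
)
print(tokenizer.decode(output_ids[0][input_ids.shape[1]:], skip_special_tokens=True))
```

Lowering temperature (or disabling sampling entirely with do_sample=False) trades diversity for consistency; raising it does the opposite.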

[ Back to top ⬆️ ]

How to use SEMIKONG?

Quick start

Getting up and running with SEMIKONG models is simple, with multiple options available.

Choose your path

Select one of the following paths to begin your journey with SEMIKONG!

🎯 Deploy SEMIKONG locally

If you prefer to deploy SEMIKONG models locally,

  • 🙋‍♀️ and you have sufficient resources (for example, an NVIDIA A100 40GB), you can choose one of the following methods:

🎯 Use SEMIKONG without deploying locally

If you prefer not to deploy SEMIKONG models locally, you can explore SEMIKONG's capabilities using any of the following options.

🙋‍♀️ Chat with SEMIKONG

If you want to chat with SEMIKONG, you can use one of these online services, which offer a similar user experience:

[ Back to top ⬆️ ]

Quick start - pip

This tutorial guides you through every step of running SEMIKONG-8B-Instruct locally on an A100 (40G) and then performing inference.

Step 0: Prerequisites

Step 1: Prepare your environment

To set up the environment and install the required packages, execute the following command.

git clone https://github.com/aitomatic/semikong.git
cd semikong
pip install -r requirements.txt

Step 2: Download the SEMIKONG model

You can download the weights and tokenizer of SEMIKONG models from the following sources:
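For example, one way to fetch a checkpoint ahead of time is with the `huggingface_hub` library; a minimal sketch, where the local directory is an arbitrary example path:

```python
# Sketch: download a SEMIKONG checkpoint from Hugging Face ahead of time.
# Requires `pip install huggingface_hub`; local_dir is an example path.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="pentagoniac/SEMIKONG-8B-Instruct",
    local_dir="./SEMIKONG-8B-Instruct",
)
```

The downloaded directory can then be used as `<your-model-path>` in the inference scripts below.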

Step 3: Perform inference

You can perform inference with SEMIKONG chat or base models as below.

Perform inference with SEMIKONG chat model
  1. Create a file named quick_start.py and copy the following content to it.

    from transformers import AutoModelForCausalLM, AutoTokenizer
    
    model_path = '<your-model-path>'
    
    tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=False)
    
    # Since transformers 4.35.0, GPTQ/AWQ models can be loaded using AutoModelForCausalLM.
    model = AutoModelForCausalLM.from_pretrained(
        model_path,
        device_map="auto",
        torch_dtype='auto'
    ).eval()
    
    # Prompt content: "hi"
    messages = [
        {"role": "user", "content": "hi"}
    ]
    
    input_ids = tokenizer.apply_chat_template(conversation=messages, tokenize=True, add_generation_prompt=True, return_tensors='pt')
    output_ids = model.generate(input_ids.to('cuda'))
    response = tokenizer.decode(output_ids[0][input_ids.shape[1]:], skip_special_tokens=True)
    
    # Model response: "Hello! How can I assist you today?"
    print(response)
  2. Run quick_start.py.

    python quick_start.py

    Then you can see an output similar to the one below. 🥳

    Hello! How can I assist you today?
Perform inference with SEMIKONG base model
  • SEMIKONG-8B

    Input

    from transformers import AutoModelForCausalLM, AutoTokenizer
    
    MODEL_DIR = "pentagoniac/SEMIKONG-8B"
    model = AutoModelForCausalLM.from_pretrained(MODEL_DIR, torch_dtype="auto")
    tokenizer = AutoTokenizer.from_pretrained(MODEL_DIR, use_fast=False)
    
    input_text = "What is a semiconductor?"
    inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_length=256)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))

    Output

    Semiconductor is a ....

[ Back to top ⬆️ ]

Quick start - Docker

TBA

[ Back to top ⬆️ ]

Web demo

You can build a web UI demo for SEMIKONG chat models (note that SEMIKONG base models are not supported in this scenario).

Step 1: Prepare your environment.

Step 2: Download the SEMIKONG model.

Step 3: To start a web service locally, run the following command.

python demo/web_demo.py -c <your-model-path>

You can access the web UI by entering the address provided in the console into your browser.

[ Back to top ⬆️ ]

Fine-tuning

Fine-tuning code for SEMIKONG-8B and SEMIKONG-70B

Hardware Setup

For the SEMIKONG-8B model, a node with 1 GPU with more than 16 GB of memory is recommended.

For the SEMIKONG-70B model, because the zero-offload technique consumes a lot of CPU memory, please be careful to limit the number of GPUs used in the 70B finetune training. Use CUDA_VISIBLE_DEVICES to limit the number of GPUs (as shown in scripts/run_sft_Yi_34b.sh).

A typical hardware setup for finetuning the 70B model is a node with 8 GPUs (limited to 4 at run time via CUDA_VISIBLE_DEVICES=0,1,2,3), each with more than 80 GB of GPU memory, and more than 900 GB of total CPU memory.
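If you launch the fine-tuning from Python rather than a shell script, the same restriction can be applied through the environment before any CUDA-initializing library is imported; a minimal sketch:

```python
# Sketch: restrict training to 4 of a node's 8 GPUs. This must run before
# torch (or any other CUDA-initializing library) is imported, and is
# equivalent to prefixing the launch command with CUDA_VISIBLE_DEVICES=0,1,2,3.
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1,2,3"

import torch
print(torch.cuda.device_count())  # prints 4 on an 8-GPU node
```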

Quick Start

Deployment

If you want to deploy SEMIKONG models, make sure you meet the software and hardware requirements.

Software requirements

Before using SEMIKONG quantized models, make sure you've installed the correct software listed below.

| Model                           | Software      |
|---------------------------------|---------------|
| SEMIKONG 4-bit quantized models | AWQ and CUDA  |
| SEMIKONG 8-bit quantized models | GPTQ and CUDA |
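As the quick-start comment notes, transformers 4.35.0 and later load GPTQ/AWQ checkpoints through the standard `AutoModelForCausalLM` API, so deployment code looks the same as for full-precision models. A minimal sketch, where the model path is a placeholder for a quantized SEMIKONG checkpoint (the matching AWQ or GPTQ runtime package must also be installed):

```python
# Sketch: loading a quantized checkpoint with transformers >= 4.35.0.
# '<your-quantized-model-path>' is a placeholder: substitute an AWQ or
# GPTQ SEMIKONG checkpoint. The quantization config is read from the
# checkpoint itself, so no extra arguments are needed here.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = '<your-quantized-model-path>'
tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=False)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    device_map="auto",
).eval()
```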

Hardware requirements

Before deploying SEMIKONG in your environment, make sure your hardware meets the following requirements.

Instruct models

| Model                 | Minimum VRAM | Recommended GPU Example                      |
|-----------------------|--------------|----------------------------------------------|
| SEMIKONG-70B-Instruct | 170 GB       | 3 x A100 (80 GB)<br>5 x A100 (40 GB)         |
| SEMIKONG-8B-Instruct  | 16 GB        | 1 x RTX 3060 (12 GB)<br>1 x RTX 4060 (8 GB)  |

Base models

| Model        | Minimum VRAM | Recommended GPU Example                                                             |
|--------------|--------------|-------------------------------------------------------------------------------------|
| SEMIKONG-8B  | 15 GB        | 1 x RTX 3090 (24 GB)<br>1 x RTX 4090 (24 GB)<br>1 x A10 (24 GB)<br>1 x A30 (24 GB)  |
| SEMIKONG-70B | 200 GB       | 4 x A800 (80 GB)                                                                    |

[ Back to top ⬆️ ]

Why SEMIKONG?

Ecosystem

SEMIKONG has a comprehensive ecosystem, offering a range of tools, services, and models to enrich your experiences and maximize productivity.

Upstream

The SEMIKONG series models follow the same model architecture as Llama. By choosing SEMIKONG, you can leverage existing tools, libraries, and resources within the Llama ecosystem, eliminating the need to create new tools and enhancing development efficiency.

For example, the SEMIKONG series models are saved in the format of the Llama model. You can directly use LlamaForCausalLM and LlamaTokenizer to load the model. For more information, see Use the chat model.

from transformers import AutoModelForCausalLM, AutoTokenizer

# SEMIKONG checkpoints are stored in the Llama format, so the Auto classes
# resolve to the Llama implementations under the hood.
tokenizer = AutoTokenizer.from_pretrained("pentagoniac/SEMIKONG-8B-Instruct", use_fast=False)

model = AutoModelForCausalLM.from_pretrained("pentagoniac/SEMIKONG-8B-Instruct", device_map="auto")
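Equivalently, since the checkpoints are in the Llama format, the Llama-specific classes should load them directly; a minimal sketch, assuming the checkpoint ships a tokenizer compatible with LlamaTokenizer (otherwise fall back to AutoTokenizer as above):

```python
# Sketch: loading SEMIKONG via the Llama-specific classes instead of the
# Auto* classes, relying on the Llama-format checkpoint. If the repository
# ships only a fast tokenizer, use AutoTokenizer instead.
from transformers import LlamaForCausalLM, LlamaTokenizer

tokenizer = LlamaTokenizer.from_pretrained("pentagoniac/SEMIKONG-8B-Instruct")
model = LlamaForCausalLM.from_pretrained(
    "pentagoniac/SEMIKONG-8B-Instruct",
    device_map="auto",
)
```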

[ Back to top ⬆️ ]

Downstream

💡 Tip

  • Feel free to create a PR and share the fantastic work you've built using the SEMIKONG series models.

  • To help others quickly understand your work, it is recommended to use the format of <model-name>: <model-intro> + <model-highlights>.

Serving

If you want to get up and running with SEMIKONG in a few minutes, you can use the following services built upon SEMIKONG.

[ Back to top ⬆️ ]

Tech report

For detailed capabilities of the SEMIKONG series model, see SEMIKONG: Technical Report.

Citation

@article{semikong2024,
  title={SemiKong: Curating, Training, and Evaluating A Semiconductor Industry-Specific Large Language Model},
  author={Christopher Nguyen et al.},
  journal={arXiv preprint arXiv:2024.xxxxx},
  year={2024}
}

Benchmarks

Chat model performance

The SEMIKONG-70B-Chat model demonstrates exceptional performance, ranking first among existing open-source models on benchmarks including MMLU, CMMLU, BBH, GSM8k, and more.


Evaluation methods and challenges. ⬇️

Base model performance

SEMIKONG-8B

[ Back to top ⬆️ ]

Who can use SEMIKONG?

Everyone! 🙌 ✅

The code and weights of the SEMIKONG series models are distributed under the Apache 2.0 license, which means the SEMIKONG series models are free for personal usage, academic purposes, and commercial use.

[ Back to top ⬆️ ]

Misc.

Contributions

This project is the result of a collaborative effort involving multiple companies and individuals:

We would like to express our gratitude to the AI Alliance (https://thealliance.ai) for providing the impetus, resources, and platform for this work, and for its collaboration in open science. We also extend our thanks to the member organizations of the AI Alliance and their researchers and engineers for their valuable contributions to this study, including:

  • Noritaka Yokomori (Tokyo Electron)
  • Anthony Annunziata (IBM Research)
  • Sean Hughes (ServiceNow)
  • Phong Nguyen (FPT Software, AI Center)

Their expertise, insights, and collaborative spirit have been instrumental in advancing our research.

[ Back to top ⬆️ ]

Disclaimer

We use data compliance checking algorithms during the training process to ensure the compliance of the trained model to the best of our ability. Due to the complexity of the data and the diversity of language model usage scenarios, we cannot guarantee that the model will generate correct and reasonable output in all scenarios. Please be aware that there is still a risk of the model producing problematic outputs. We will not be responsible for any risks and issues resulting from misuse, misguidance, illegal usage, and related misinformation, as well as any associated data security concerns.

[ Back to top ⬆️ ]

License

The code and weights of the SEMIKONG series models are distributed under the Apache 2.0 license.

If you create derivative works based on this model, please include the following attribution in your derivative works:

This work is a derivative of [The SEMIKONG Series Model You Base On] by AI Alliance, used under the Apache 2.0 License.

[ Back to top ⬆️ ]