cuBLAS can run now but max token size is greatly reduced #231
Replies: 4 comments
-
There are many parameters to set - what batch size are you using? Is f16 enabled?
-
@mudler I'm only setting mirostat = 2, temp = 0.3, ngl = 43, t = 1, ctx = 1920, n = 1920. These are the prompt parameters I use with llama.cpp, and they work there. How do I disable F16Mem?
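For reference, this is roughly how those flags would map onto the go-llama side (a minimal sketch based on the functional-options API shown in the go-skynet/go-llama.cpp README; option names can differ between versions, and the model path is hypothetical). F16 memory is an option passed to `llama.New`, so leaving it out should disable it:

```go
package main

import (
	"fmt"

	llama "github.com/go-skynet/go-llama.cpp"
)

func main() {
	// Model options roughly matching the flags above:
	// ctx = 1920 -> llama.SetContext, ngl = 43 -> llama.SetGPULayers.
	// F16 memory is opt-in here: leave out llama.EnableF16Memory to disable it.
	l, err := llama.New(
		"./models/7B/ggml-model-q4_0.bin", // hypothetical model path
		llama.SetContext(1920),
		llama.SetGPULayers(43),
		// llama.EnableF16Memory, // omit this option to run without f16 memory
	)
	if err != nil {
		panic(err)
	}
	defer l.Free()

	// Prediction options: n = 1920 -> llama.SetTokens, t = 1 -> llama.SetThreads.
	// Mirostat/temperature setters are left out; check PredictOptions in your
	// version of the package for the exact names.
	out, err := l.Predict("Hello", llama.SetTokens(1920), llama.SetThreads(1))
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```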
-
@mudler any help getting go-llama to run with the llama.cpp settings mentioned above?
-
Bringing this discussion up to the latest question.
-
It can run, but I can't generate the same number of tokens as I can without Go. Why?
With an RTX 4060, I can do 1920 max tokens using pure llama.cpp with 100% CUDA offload.
With go-llama, I can only go up to a ctx size of around 650 before hitting OOM.
@mudler do you know why? How do I fix this?
Same settings as llama.cpp, just in Go...
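One thing worth ruling out is whether the two runs are really like-for-like on the options that reserve VRAM at load time. A rough sketch of those knobs in go-llama follows (again assuming the go-skynet/go-llama.cpp option names; `llama.SetNBatch` is a guess at a batch-size setter and may not exist in every version; the model path is hypothetical). If I remember right, llama.cpp's `main` defaults to an f16 KV cache and a batch size of 512, so matching those on the Go side is the closest comparison:

```go
package main

import llama "github.com/go-skynet/go-llama.cpp"

func main() {
	// Load-time options are what reserve VRAM up front: the KV cache is
	// allocated for the full context, and an f16 cache needs half the memory
	// of an f32 one.
	l, err := llama.New(
		"./models/7B/ggml-model-q4_0.bin", // hypothetical model path
		llama.SetContext(1920),
		llama.SetGPULayers(43), // same as -ngl 43 / 100% offload
		llama.EnableF16Memory,  // f16 KV cache, matching llama.cpp's default
		// llama.SetNBatch(512), // hypothetical setter to match llama.cpp's default batch
	)
	if err != nil {
		panic(err)
	}
	defer l.Free()
}
```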