Showing 2 changed files with 82 additions and 85 deletions.
@@ -1,83 +1,80 @@

LLAMAFILE-QUANTIZE(1)       General Commands Manual      LLAMAFILE-QUANTIZE(1)

NAME
     llamafile-quantize — large language model quantizer

SYNOPSIS
     llamafile-quantize [flags...] model-f32.gguf [model-quant.gguf] type
          [nthreads]

DESCRIPTION
     llamafile-quantize converts large language model weights from the
     float32 or float16 formats into smaller data types from 2 to 8 bits
     in size.

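     For instance, a minimal sketch of an invocation (the file names are
     illustrative, not mandated by this manual):

           # quantize float16/float32 weights to the recommended Q6_K format
           llamafile-quantize model-f32.gguf model-q6k.gguf Q6_K
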
OPTIONS
     The following flags are available:

     --allow-requantize
             Allows requantizing tensors that have already been quantized.
             Warning: this can severely reduce quality compared to
             quantizing from 16-bit or 32-bit weights.

     --leave-output-tensor
             Leaves output.weight un(re)quantized. This increases model
             size, but may also increase quality, especially when
             requantizing.

     --pure  Disables k-quant mixtures and quantizes all tensors to the
             same type.

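     As a sketch of how these flags combine (file names assumed for
     illustration), a requantization run that preserves the output tensor
     might look like:

           llamafile-quantize --allow-requantize --leave-output-tensor \
               model-q8.gguf model-q4km.gguf Q4_K_M
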
[1mARGUMENTS[0m | ||
The following positional arguments are accepted: | ||
|
||
[4mmodel-f32.gguf[0m | ||
Is the input file, which contains the unquantized model weights | ||
in either the float32 or float16 format. | ||
|
||
[4mmodel-quant.gguf[0m | ||
Is the output file, which will contain quantized weights in the | ||
desired format. If this path isn't specified, it'll default to | ||
[inp path]/ggml-model-[ftype].gguf. | ||
|
||
[4mtype[24m Is the desired quantization format, which may be the integer id | ||
of a supported quantization type, or its name. See the quanti‐ | ||
zation types section below for acceptable formats. | ||
|
||
[4mnthreads[0m | ||
Number of threads to use during computation (default: nproc/2) | ||
|
||
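     Putting the arguments together (file names illustrative), omitting
     the output path relies on the documented default, while the trailing
     number sets the thread count explicitly:

           # with no output path, the result lands at
           # [inp path]/ggml-model-[ftype].gguf as described above
           llamafile-quantize model-f32.gguf Q5_K_M 4
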
QUANTIZATION TYPES
     The following quantization types are available. This table shows the
     ID of each quantization format, its name, the file size of 7B model
     weights that use it, and finally the perplexity increase (quality
     loss) it introduces, as measured by the llamafile-perplexity tool
     averaged over 128 chunks with the TinyLLaMA 1.1B v1.0 Chat model.
     Rows are ordered by how recommended the quantization format is for
     general usage.

     -   18  Q6_K    5.6gb  +0.0446 ppl  (q6 kawrakow)
     -    7  Q8_0    7.2gb  +0.0022 ppl  (q8 gerganov)
     -    1  F16      14gb  +0.0000 ppl  (best but biggest)
     -    8  Q5_0    4.7gb  +0.0817 ppl  (q5 gerganov zero)
     -   17  Q5_K_M  4.8gb  +0.0836 ppl  (q5 kawrakow medium)
     -   16  Q5_K_S  4.7gb  +0.1049 ppl  (q5 kawrakow small)
     -   15  Q4_K_M  4.1gb  +0.3132 ppl  (q4 kawrakow medium)
     -   14  Q4_K_S  3.9gb  +0.3408 ppl  (q4 kawrakow small)
     -   13  Q3_K_L  3.6gb  +0.5736 ppl  (q3 kawrakow large)
     -   12  Q3_K_M  3.3gb  +0.7612 ppl  (q3 kawrakow medium)
     -   11  Q3_K_S  3.0gb  +1.3834 ppl  (q3 kawrakow small)
     -   10  Q2_K    2.6gb  +4.2359 ppl  (tiniest hallucinates most)
     -   32  BF16     14gb  +0.0000 ppl  (canonical but cpu/cuda only)
     -    0  F32      27gb   9.0952 ppl  (reference point)
     -    2  Q4_0    3.9gb  +0.3339 ppl  (legacy)
     -    3  Q4_1    4.3gb  +0.4163 ppl  (legacy)
     -    9  Q5_1    5.1gb  +0.1091 ppl  (legacy)
     -   12  Q3_K    alias for Q3_K_M
     -   15  Q4_K    alias for Q4_K_M
     -   17  Q5_K    alias for Q5_K_M
     -  COPY         Only copies tensors; no quantizing.

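     Because the type argument accepts either the integer id or the name
     from the table above, these two hypothetical invocations request the
     same format:

           llamafile-quantize model-f32.gguf model-q6k.gguf 18
           llamafile-quantize model-f32.gguf model-q6k.gguf Q6_K
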
SEE ALSO
     llamafile(1), llamafile-imatrix(1), llamafile-perplexity(1),
     llava-quantize(1), zipalign(1), unzip(1)

Llamafile Manual               December 5, 2023          LLAMAFILE-QUANTIZE(1)

EMBEDFILE(1)                General Commands Manual               EMBEDFILE(1)

NAME
     embedfile - large language model quantizer

SYNOPSIS
     embedfile [flags...] model-f32.gguf [model-quant.gguf] type [nthreads]

DESCRIPTION
     embedfile converts large language model weights from the float32 or
     float16 formats into smaller data types from 2 to 8 bits in size.

OPTIONS
     The following flags are available:

     --allow-requantize
             Allows requantizing tensors that have already been quantized.
             Warning: this can severely reduce quality compared to
             quantizing from 16-bit or 32-bit weights.

     --leave-output-tensor
             Leaves output.weight un(re)quantized. This increases model
             size, but may also increase quality, especially when
             requantizing.

     --pure  Disables k-quant mixtures and quantizes all tensors to the
             same type.

ARGUMENTS
     The following positional arguments are accepted:

     model-f32.gguf
             The input file, which contains the unquantized model weights
             in either the float32 or float16 format.

     model-quant.gguf
             The output file, which will contain the quantized weights in
             the desired format. If this path isn't specified, it defaults
             to [inp path]/ggml-model-[ftype].gguf.

     type    The desired quantization format, which may be the integer id
             of a supported quantization type, or its name. See the
             QUANTIZATION TYPES section below for acceptable formats.

     nthreads
             The number of threads to use during computation (default:
             nproc/2; see the sketch below).

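     A hypothetical way to pass the documented nproc/2 default explicitly
     from a shell (assuming a Linux system where nproc is available):

           embedfile model-f32.gguf model-q5km.gguf Q5_K_M $(($(nproc) / 2))
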
QUANTIZATION TYPES
     The following quantization types are available. This table shows the
     ID of each quantization format, its name, the file size of 7B model
     weights that use it, and finally the perplexity increase (quality
     loss) it introduces, as measured by the llamafile-perplexity tool
     averaged over 128 chunks with the TinyLLaMA 1.1B v1.0 Chat model.
     Rows are ordered by how recommended the quantization format is for
     general usage.

     -   18  Q6_K    5.6gb  +0.0446 ppl  (q6 kawrakow)
     -    7  Q8_0    7.2gb  +0.0022 ppl  (q8 gerganov)
     -    1  F16      14gb  +0.0000 ppl  (best but biggest)
     -    8  Q5_0    4.7gb  +0.0817 ppl  (q5 gerganov zero)
     -   17  Q5_K_M  4.8gb  +0.0836 ppl  (q5 kawrakow medium)
     -   16  Q5_K_S  4.7gb  +0.1049 ppl  (q5 kawrakow small)
     -   15  Q4_K_M  4.1gb  +0.3132 ppl  (q4 kawrakow medium)
     -   14  Q4_K_S  3.9gb  +0.3408 ppl  (q4 kawrakow small)
     -   13  Q3_K_L  3.6gb  +0.5736 ppl  (q3 kawrakow large)
     -   12  Q3_K_M  3.3gb  +0.7612 ppl  (q3 kawrakow medium)
     -   11  Q3_K_S  3.0gb  +1.3834 ppl  (q3 kawrakow small)
     -   10  Q2_K    2.6gb  +4.2359 ppl  (tiniest hallucinates most)
     -   32  BF16     14gb  +0.0000 ppl  (canonical but cpu/cuda only)
     -    0  F32      27gb   9.0952 ppl  (reference point)
     -    2  Q4_0    3.9gb  +0.3339 ppl  (legacy)
     -    3  Q4_1    4.3gb  +0.4163 ppl  (legacy)
     -    9  Q5_1    5.1gb  +0.1091 ppl  (legacy)
     -   12  Q3_K    alias for Q3_K_M
     -   15  Q4_K    alias for Q4_K_M
     -   17  Q5_K    alias for Q5_K_M
     -  COPY         Only copies tensors; no quantizing.

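     As a sketch, the COPY type listed above rewrites the tensors without
     quantizing them (the output name here is an assumption):

           embedfile model-f32.gguf model-copy.gguf COPY
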
SEE ALSO
     llamafile(1), llamafile-imatrix(1), llamafile-perplexity(1),
     llava-quantize(1), zipalign(1), unzip(1)

Llamafile Manual               December 5, 2023                   EMBEDFILE(1)