feat: Add Microsoft Phi-3 to Cactus #240

harshaljanjani · 2025-11-29T11:45:10Z

Re-raised with custom tokenizer and chat-template support after repo mishaps over the past few days.

This PR adds support for Microsoft’s Phi-3 model ++ improves the robustness of the test suite for models that do not support embeddings. Phi-3-Mini-4K-Instruct is a 3.8B parameter model and comes in 4K and 128K context-length variants. Turns out FP16 works well for Phi-3, but performance degrades sharply with INT8; results are shared below.

Phi-3 Integration -- Support for Phi-3's architecture. Updated convert_hf.py to handle these weight configurations during conversion.
Test Suite -- Fixed the embeddings test, stopping segfaults when testing incompatible larger model types.

Testing Environment (For Repro):

GCP VM: t2a-standard-16 (16 vCPU ARM64, 64GB RAM), Debian 12 ARM, Ampere Altra.

Results

Metric	INT8	FP16	INT8
TTFT	0.47s	4.70s	10x faster
Prefill	59.1 tok/s	6.0 tok/s	10x faster
Decode	16.4 tok/s	1.7 tok/s	10x faster
RAM	3.6 GB	7.3 GB	50% less
Quality	Inconsistent	Reliable	-

Phi-3-mini-4k-instruct (INT8)

╔════════════════════════════════╗
║    Running Engine Tests        ║
╚════════════════════════════════╝
╔════════════════════════════════╗
║   STREAMING & FOLLOW-UP TEST   ║
╚════════════════════════════════╝
[Turn 1]
User: My name is Henry Ndubuaku, how are you?
Assistant: Hello Henry, it's lovely to assist you. May I know how can I help?
[Results - Turn 1]
├─ TTFT: 0.47 sec
├─ Prefill: 59.1 toks/sec
├─ Decode: 16.4 toks/sec
└─ RAM: 3628.1 MB
[Turn 2]
User: What is my name?
Assistant: Your name is Ndubuaku, but some people might call me Microsoft Assistant for assistance purposes. How can I assist you further?
[Results - Turn 2]
├─ TTFT: 0.72 sec
├─ Prefill: 78.8 toks/sec
├─ Decode: 13.8 toks/sec
└─ RAM: 3710.9 MB
✓ PASS │ streaming
╔════════════════════════════════╗
║       100 CONTEXT TEST         ║
╚════════════════════════════════╝
Response: This table appears to be presenting a series of numerical values which appear random without any discernible pattern or formula that relates one value with another directly through mathematical operations (such as addition/multiplication). Thus:   | x      | y                              ||-------|-------------------------||0   43       217           1986        25    3      4          ------! In order for these numbers to make sense, they should follow some relationship between 'x [→ stopped]
[Results]
├─ TTFT: 2.25 sec
├─ Prefill: 105.0 toks/sec
├─ Decode: 10.6 toks/sec
└─ RAM: 3975.8 MB
└─ Status: PASSED ✓
✓ PASS │ 100_context
╔════════════════════════════════╗
║      1K CONTEXT TEST           ║
╚════════════════════════════════╝
Response: From examining Dataset C's daily total precipitation measurements in inches over a month period starting from the first day labeled "Data 0.5," it is evident an increasing trend where each subsequent increment results not only by whole multiples of ten but also shows fluctcy within individual increases and overall growth rates that are slightly outpacing what one would expect purely on linearity progression due solely to multiplicative escalation. Initial increments suggest relatively stable clim [→ stopped]
[Results]
├─ TTFT: 16.11 sec
├─ Prefill: 75.3 toks/sec
├─ Decode: 7.9 toks/sec
└─ RAM: 5596.1 MB
└─ Status: PASSED ✓
✓ PASS │ 1k_context

Phi-3-mini-4k-instruct (FP16)

╔════════════════════════════════╗
║    Running Engine Tests        ║
╚════════════════════════════════╝
╔════════════════════════════════╗
║   STREAMING & FOLLOW-UP TEST   ║
╚════════════════════════════════╝
[Turn 1]
User: My name is Henry Ndubuaku, how are you?
Assistant: I'm doing well. Thank you for asking, Henry Ndubuaku! How are the circumstances surrounding your question?
[Results - Turn 1]
├─ TTFT: 4.70 sec
├─ Prefill: 6.0 toks/sec
├─ Decode: 1.7 toks/sec
└─ RAM: 7267.1 MB
[Turn 2]
User: What is my name?
Assistant: Your name is Henry Ndubuaku. Well-identified once again!
[Results - Turn 2]
├─ TTFT: 9.70 sec
├─ Prefill: 6.4 toks/sec
├─ Decode: 1.7 toks/sec
└─ RAM: 7283.9 MB
✓ PASS │ streaming
╔════════════════════════════════╗
║       100 CONTEXT TEST         ║
╚════════════════════════════════╝
Response: The given data represents an increasing sequence of floating-point values that correspond to 1/4π intervals from Data 0 (wherein each value corresponds to a quarter interval along a unit circle), starting at zero and approaching one without ever actually reaching it due its representation as limited decimal places up until four digits:<0x0A><0x0A>Data 0 = 'zero'. This suggests no angle measurement has occurred yet - think about initial stage or reference point in trigonometry terms within our circular context i.e [→ stopped]
[Results]
├─ TTFT: 35.78 sec
├─ Prefill: 6.6 toks/sec
├─ Decode: 1.6 toks/sec
└─ RAM: 7584.1 MB
└─ Status: PASSED ✓
✓ PASS │ 100_context

cc: @HenryNdubuaku

Signed-off-by: harshaljanjani <[email protected]>

harshaljanjani added 5 commits November 29, 2025 17:03

init: Add type support in config and conversion script

3e0c208

Signed-off-by: harshaljanjani <[email protected]>

feat: Make Phi-3-specific changes in SPTok

6c8c044

Signed-off-by: harshaljanjani <[email protected]>

compat: Phi-3-specific changes to tokenizer decoding

2d3030c

Signed-off-by: harshaljanjani <[email protected]>

feat: Add Microsoft Phi-3

4460152

Signed-off-by: harshaljanjani <[email protected]>

chore: Fix test segfault

17ebf8d

Signed-off-by: harshaljanjani <[email protected]>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat: Add Microsoft Phi-3 to Cactus #240

feat: Add Microsoft Phi-3 to Cactus #240

Uh oh!

harshaljanjani commented Nov 29, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

feat: Add Microsoft Phi-3 to Cactus #240

Are you sure you want to change the base?

feat: Add Microsoft Phi-3 to Cactus #240

Uh oh!

Conversation

harshaljanjani commented Nov 29, 2025

Results

Phi-3-mini-4k-instruct (INT8)

Phi-3-mini-4k-instruct (FP16)

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant