[Bug]: chunked-prefill fallback does not synchronize llm_args.enable_chunked_prefill #15463

### System Info

* CPU architecture: N/A (code inspection issue)
* GPU: N/A (issue identified through source analysis)
* TensorRT-LLM branch: main
* TensorRT-LLM commit: current main branch at time of investigation
* OS: N/A

Additional information:

* This issue was identified through source-code inspection and call-chain analysis.
* No specific hardware is required to observe the behavior.
* The report concerns state synchronization in the PyTorch executor initialization path.


### Who can help?

_No response_

### Information

- [x] The official example scripts
- [ ] My own modified scripts

### Tasks

- [x] An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- [ ] My own task or dataset (give details below)

### Reproduction

## Summary

While reviewing `tensorrt_llm/_torch/pyexecutor/py_executor_creator.py`, I noticed that chunked prefill can be disabled through fallback logic, but `llm_args.enable_chunked_prefill` is never updated to reflect the effective runtime state.

The relevant flow is:

```python
enable_chunked_context = llm_args.enable_chunked_prefill
```

Later, chunked prefill may be disabled through:

* `FLASHINFER_STAR_ATTENTION` fallback
* MLA unsupported-SM fallback

In those paths, the local flag is cleared:

```python
enable_chunked_context = False
```

and the attention runtime feature is switched off:

```python
model_engine.attn_runtime_features.chunked_prefill = False
```

However:

```python
llm_args.enable_chunked_prefill
```

is never updated.

As a result, the runtime state and user-facing configuration can diverge.

## Steps to reproduce the behavior

1. Enable chunked prefill (`enable_chunked_prefill=True`).
2. Trigger a fallback path that disables chunked prefill at runtime (for example, an unsupported MLA SM configuration or `FLASHINFER_STAR_ATTENTION`).
3. Observe that:

   * runtime chunked prefill is disabled
   * `llm_args.enable_chunked_prefill` remains `True`
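
To check step 3 mechanically, a small diagnostic can compare the two flags after initialization. This is a hypothetical helper, not part of TensorRT-LLM: only the attribute names `enable_chunked_prefill` and `attn_runtime_features.chunked_prefill` come from the code quoted above, and `SimpleNamespace` objects stand in for the real `llm_args` and `model_engine`.

```python
from types import SimpleNamespace


def chunked_prefill_mismatch(llm_args, model_engine):
    """Return a description of the mismatch, or None if the flags agree.

    Hypothetical diagnostic; it only reads the two attributes quoted
    from py_executor_creator.py.
    """
    configured = llm_args.enable_chunked_prefill
    effective = model_engine.attn_runtime_features.chunked_prefill
    if configured == effective:
        return None
    return (f"llm_args.enable_chunked_prefill={configured} but "
            f"attn_runtime_features.chunked_prefill={effective}")


# Stand-ins for the state observed after a fallback (step 3).
llm_args = SimpleNamespace(enable_chunked_prefill=True)
model_engine = SimpleNamespace(
    attn_runtime_features=SimpleNamespace(chunked_prefill=False))

print(chunked_prefill_mismatch(llm_args, model_engine))
# llm_args.enable_chunked_prefill=True but attn_runtime_features.chunked_prefill=False
```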

## Minimal example

Relevant pattern:

```python
enable_chunked_context = llm_args.enable_chunked_prefill

...

# Fallback path (FLASHINFER_STAR_ATTENTION, or MLA on an unsupported SM):
enable_chunked_context = False
model_engine.attn_runtime_features.chunked_prefill = False

# llm_args.enable_chunked_prefill is never updated and keeps its original value
```
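
The same pattern can be modeled in a self-contained script that runs without TensorRT-LLM. The dataclasses and `create_executor_current` below are hypothetical stand-ins that mirror only the assignments quoted above; they are not the real classes or the real initialization function.

```python
from dataclasses import dataclass, field


@dataclass
class LlmArgs:
    # Stand-in for the user-facing config; only this field matters here.
    enable_chunked_prefill: bool = True


@dataclass
class AttnRuntimeFeatures:
    chunked_prefill: bool = True


@dataclass
class ModelEngine:
    attn_runtime_features: AttnRuntimeFeatures = field(
        default_factory=AttnRuntimeFeatures)


def create_executor_current(llm_args, model_engine, fallback_triggered):
    """Hypothetical model of the current flow; returns the local flag."""
    enable_chunked_context = llm_args.enable_chunked_prefill
    if fallback_triggered:
        # FLASHINFER_STAR_ATTENTION or MLA unsupported-SM path
        enable_chunked_context = False
        model_engine.attn_runtime_features.chunked_prefill = False
        # llm_args.enable_chunked_prefill is left untouched
    return enable_chunked_context


llm_args = LlmArgs(enable_chunked_prefill=True)
engine = ModelEngine()
effective = create_executor_current(llm_args, engine, fallback_triggered=True)
print(effective, engine.attn_runtime_features.chunked_prefill,
      llm_args.enable_chunked_prefill)
# False False True
```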


### Expected behavior

When chunked prefill is disabled through fallback logic, the effective runtime state and `llm_args.enable_chunked_prefill` should remain synchronized.

After initialization:

```python
llm_args.enable_chunked_prefill
```

should accurately reflect whether chunked prefill is actually enabled.
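
One possible shape for a fix, sketched under the assumption that both fallback paths can call a shared helper: write the effective value back to `llm_args` at the same point where the runtime flag is cleared. `disable_chunked_prefill` is a hypothetical name, and `model_engine` is treated as optional because it is not confirmed that the `FLASHINFER_STAR_ATTENTION` path has an engine in scope when it runs.

```python
import logging
from types import SimpleNamespace

logger = logging.getLogger(__name__)


def disable_chunked_prefill(llm_args, model_engine=None, reason=""):
    """Hypothetical helper: turn chunked prefill off in every place at once.

    Returns False so the caller can also update its local flag:
        enable_chunked_context = disable_chunked_prefill(...)
    """
    logger.warning("Disabling chunked prefill: %s", reason)
    if model_engine is not None:
        model_engine.attn_runtime_features.chunked_prefill = False
    # The step missing today: keep the user-facing config in sync.
    llm_args.enable_chunked_prefill = False
    return False


# Usage with stand-in objects.
llm_args = SimpleNamespace(enable_chunked_prefill=True)
model_engine = SimpleNamespace(
    attn_runtime_features=SimpleNamespace(chunked_prefill=True))
enable_chunked_context = disable_chunked_prefill(
    llm_args, model_engine, reason="MLA on unsupported SM")
print(enable_chunked_context, llm_args.enable_chunked_prefill,
      model_engine.attn_runtime_features.chunked_prefill)
# False False False
```

Routing both fallbacks through one helper means a future fallback path cannot clear the runtime flag while leaving `llm_args` stale.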


### Actual behavior

The runtime disables chunked prefill through fallback logic, but:

```python
llm_args.enable_chunked_prefill
```

remains `True`.

This creates a state mismatch where:

* runtime chunked prefill is disabled
* user-facing configuration still reports chunked prefill as enabled

Downstream validation and feature-status reporting may therefore observe stale state.
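
To illustrate why stale state matters, consider a hypothetical admission check that trusts `llm_args`. It assumes, for illustration only, a consumer that admits prompts longer than the per-iteration token budget when chunked prefill is enabled (chunking is what allows such a prompt to be split across iterations). The function and parameter names are invented; this is not a description of any specific TensorRT-LLM validator.

```python
from types import SimpleNamespace


def accepts_prompt(llm_args, prompt_len, max_num_tokens):
    """Hypothetical admission check that trusts llm_args.

    With chunked prefill, a prompt longer than the per-iteration token
    budget can be split across iterations; without it, it cannot.
    """
    if prompt_len <= max_num_tokens:
        return True
    return llm_args.enable_chunked_prefill


llm_args = SimpleNamespace(enable_chunked_prefill=True)  # stale value
runtime_chunked_prefill = False                          # effective value

# The check admits a prompt that the runtime can no longer chunk.
print(accepts_prompt(llm_args, prompt_len=16384, max_num_tokens=8192))  # True
print(runtime_chunked_prefill)  # False
```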


### Additional notes

This appears to be a synchronization issue rather than an intentional design choice.

Notably, `create_py_executor()` is already expected to mutate portions of `llm_args`, and the nearby `kv_cache_config.enable_block_reuse` logic keeps runtime state and configuration synchronized.

I was unable to find an existing issue or PR tracking this specific `enable_chunked_prefill` synchronization problem.

Potential areas to investigate:

* `FLASHINFER_STAR_ATTENTION` fallback path
* MLA unsupported-SM fallback path
* downstream validation that reads `llm_args.enable_chunked_prefill`
* feature-status reporting based on `llm_args`


### Before submitting a new issue...

- [x] Make sure you already searched for relevant issues, and checked the [documentation](https://nvidia.github.io/TensorRT-LLM/) and [examples](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples) for answers to frequently asked questions.
