janhq / cortex.tensorrt-llm Public

forked from NVIDIA/TensorRT-LLM

Fork 2
Star 31



Issues: janhq/cortex.tensorrt-llm


11 Open 7 Closed


Issues list

feat: use batch-manager instead of gpt-runtime

#51 opened Jul 1, 2024 by vansangpfiev

bug: templating issue with mistral v0.3

#50 opened Jul 1, 2024 by vansangpfiev

feat: support llama3

#49 opened Jul 1, 2024 by vansangpfiev

feat: Load multiple models [P1: important]

#33 opened Mar 21, 2024 by tikikun

feat: Unload the model [P1: important]

#32 opened Mar 21, 2024 by tikikun

feat: Stop inferencing [P1: important]

#31 opened Mar 21, 2024 by tikikun

feat: Enable the usage of InferenceRequest and stop_words_list [P1: important]

#30 opened Mar 21, 2024 by tikikun

feat: Enable inflight batching in nitro-tensorrt-llm [P1: important]

#29 opened Mar 21, 2024 by tikikun

Github CI windows for tensorrt_llm engine

#28 opened Mar 20, 2024 by hiro-v

bug: TensorRT - Switching between models causes error "satisfyProfile Runtime dimension does not satisfy any optimization profile"

#27 opened Mar 18, 2024 by Van-QA

feat: Utilize free_gpu_memory_fraction to control max VRAM consumption [type: feature request]

#25 opened Mar 16, 2024 by hiro-v
