How to use an online LLM API instead of a locally loaded vLLM #1307

Closed · Answered by cpfiffer
devillaws asked this question in Q&A

Please review this doc. vLLM is OpenAI-compatible, meaning you can just use the openai Python library and point base_url at whatever your inference server is.

import openai
from pydantic import BaseModel


class Testing(BaseModel):
    """
    A class representing a testing schema.
    """
    name: str
    age: int

# Point the client at any OpenAI-compatible endpoint instead of api.openai.com
openai_client = openai.OpenAI(
    base_url="http://0.0.0.0:1234/v1",
    api_key="dopeness"
)

# Make a request to the local LM Studio server
response = openai_client.beta.chat.completions.parse(
    model="hugging-quants/Llama-3.2-1B-Instruct-Q8_0-GGUF",
    messages=[
        {"role": "system", "content": "You are like so good at whatever you do."},
        {"role": "user", "content": "My name is Cameron and …

Answer selected by cpfiffer