How do I implement streaming output for the RAG response? #13

Open
huangrs494 opened this issue Oct 15, 2024 · 4 comments
@huangrs494

part_2_plus.py
# Generate the reply
res = chat_llm_chain.predict(
    # Format the chat history into a single string
    chat_history="\n".join([f"{entry['role']}: {entry['content']}" for entry in self.memory]),
    # chat_history = "\n".join(self.memory),
    context=context,
    question=query
)
How can I change this code to stream the output? I tried rewriting it with `async for chunk in chat_llm_chain.astream()`, but that failed. Does anyone know which method to call to get streaming output?

@stay-leave
Owner

First, the model itself has to have streaming enabled. Then:

```python
import requests

def stream_large_model_api():
    url = "https://api.example.com/large_model"  # replace with the actual large-model API endpoint
    headers = {
        "Authorization": "Bearer YOUR_API_KEY",  # replace with your actual API key
        "Content-Type": "application/json"
    }
    data = {
        "prompt": "Hello, what is the weather like today?",
        "max_tokens": 100,
        "stream": True  # request streaming output
    }

    # send the streaming request
    response = requests.post(url, headers=headers, json=data, stream=True)

    # check the request status
    if response.status_code != 200:
        print(f"Request failed, status code: {response.status_code}")
        return

    # read the response line by line
    for line in response.iter_lines(decode_unicode=True):
        if line:  # skip empty keep-alive lines
            # most APIs return lines of the form `data: <JSON string>`
            if line.startswith("data:"):
                # extract the payload and print it
                message = line[len("data:"):].strip()
                if message == "[DONE]":
                    break
                print(message)

if __name__ == "__main__":
    stream_large_model_api()
```
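If the endpoint returns OpenAI-style streaming chunks (an assumption; check your provider's docs), each `data:` payload is itself a JSON object and the incremental text usually sits under `choices[0].delta.content`. A minimal sketch of a hypothetical helper that could replace the `print(message)` call above:

```python
import json

def print_stream_delta(message: str) -> None:
    """Parse one `data:` payload and print only the incremental text.

    Assumes an OpenAI-style chunk layout; adjust the keys for other providers.
    """
    payload = json.loads(message)
    delta = payload.get("choices", [{}])[0].get("delta", {}).get("content", "")
    if delta:
        print(delta, end="", flush=True)
```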

@huangrs494
Author

Thanks for the reply. Do you remember the code below from enhance_llm/qa_bot/part_2_plus.py? I want to switch the LangChain RAG prediction to streaming output, i.e. make the code below stream. Do I need to replace the chat_llm_chain.predict call, and if so how? Do I need to rebuild the chat_llm_chain chain?
def chat(self, query):
    retriever = self.retrieve()
    logger.info("Loaded the vector store and retriever successfully!")
    context = self.re_write_rank(query, retriever, score_threshold=0)
    logger.info(f"Query rewriting and reranking succeeded! First 50 characters of the context: {context[:50]}")

    PROMPT_TEMPLATE = """Here is the chat history:
    {chat_history}
    ---
    Here is the reference information:
    {context}
    ---
    Here is my current question:
    {question}
    ---
    Please answer my current question based on the chat history and reference information above. The chat history and reference information may or may not be useful; pick the parts of the reference information most relevant to my question to support your answer. Stay faithful to the source text, be concise without dropping information, and do not make things up. Please give your answer."""
    prompt = ChatPromptTemplate.from_template(PROMPT_TEMPLATE)

    chat_llm_chain = LLMChain(
        llm=self.llm,
        prompt=prompt,
        verbose=True,
    )

    # Generate the reply
    res = chat_llm_chain.predict(
        # Format the chat history into a single string
        chat_history="\n".join([f"{entry['role']}: {entry['content']}" for entry in self.memory]),
        context=context,
        question=query
    )
    # res = chat_llm_chain.invoke({"context": context, "question": query})
    logger.info(f"Model reply: {res}")

    # Update the chat history
    self.memory.append({'role': 'user', 'content': query})
    self.memory.append({'role': 'assistant', 'content': res})
    print(self.memory)

    return res

@stay-leave
Owner

LangChain should have this built in. Take a look at how the official docs do it.
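As one illustration of what that might look like applied to the `chat()` method above, here is a minimal sketch, assuming `self.llm` supports streaming and using the LCEL `prompt | llm | parser` composition instead of `LLMChain.predict`. The name `chat_stream` and the overall wiring are hypothetical, not code from this repo:

```python
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate

async def chat_stream(self, query):
    """Hypothetical streaming variant of chat(); yields the reply chunk by chunk."""
    retriever = self.retrieve()
    context = self.re_write_rank(query, retriever, score_threshold=0)

    # reuse the same PROMPT_TEMPLATE as in chat() above
    prompt = ChatPromptTemplate.from_template(PROMPT_TEMPLATE)
    chain = prompt | self.llm | StrOutputParser()

    answer = ""
    async for chunk in chain.astream({
        "chat_history": "\n".join(f"{m['role']}: {m['content']}" for m in self.memory),
        "context": context,
        "question": query,
    }):
        answer += chunk
        yield chunk  # the caller can print/forward each piece as it arrives

    # update the chat history once the full answer has been assembled
    self.memory.append({'role': 'user', 'content': query})
    self.memory.append({'role': 'assistant', 'content': answer})
```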

@huangrs494
Author

huangrs494 commented Oct 17, 2024

Thanks for the pointer. From the official docs, you need to deploy the model and access it through an OpenAI-compatible interface before LCEL streaming works. The approach is below, for anyone who needs it.

import asyncio

from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate

# `model` is the chat model instance (e.g. a deployed, OpenAI-compatible chat model)
prompt = ChatPromptTemplate.from_template("Tell me a joke about {topic}")
output_parser = StrOutputParser()
chain = prompt | model | output_parser

async def main():
    async for chunk in chain.astream("ice cream"):
        print(chunk, end="|", flush=True)

asyncio.run(main())
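For reference, a minimal sketch of how the `model` above might be constructed against a self-deployed, OpenAI-compatible endpoint via `langchain_openai.ChatOpenAI`; the URL, API key, and model name below are placeholders, not values from this repo:

```python
from langchain_openai import ChatOpenAI

# placeholder values; point base_url at your deployed OpenAI-compatible server
model = ChatOpenAI(
    base_url="http://localhost:8000/v1",  # e.g. a vLLM- or FastChat-style endpoint
    api_key="EMPTY",                      # many local servers accept any non-empty key
    model="my-deployed-model",            # placeholder model name
    streaming=True,
)
```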
