Version: DEV

Python API

A complete reference for RAGFlow's Python APIs. Before proceeding, make sure you have your RAGFlow API key ready for authentication.

NOTE

Run the following command to install the Python SDK:

pip install ragflow-sdk

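Error Codes

Code   Message                  Description
400    Bad Request              Invalid request parameters
401    Unauthorized             Unauthorized access
403    Forbidden                Access denied
404    Not Found                Resource not found
500    Internal Server Error    Server internal error
1001   Invalid Chunk ID         Invalid chunk ID
1002   Chunk Update Failed      Chunk update failed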
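The error codes above can be kept in a small lookup table on the client side, for example to turn a numeric code into a readable log line. The following is a minimal sketch: `ERROR_CODES` and `describe_error` are hypothetical names, not part of the SDK, and how the SDK surfaces these codes inside a raised Exception is not specified on this page.

```python
# Codes, messages, and descriptions transcribed from the error-code table above.
ERROR_CODES = {
    400: ("Bad Request", "Invalid request parameters"),
    401: ("Unauthorized", "Unauthorized access"),
    403: ("Forbidden", "Access denied"),
    404: ("Not Found", "Resource not found"),
    500: ("Internal Server Error", "Server internal error"),
    1001: ("Invalid Chunk ID", "Invalid chunk ID"),
    1002: ("Chunk Update Failed", "Chunk update failed"),
}


def describe_error(code):
    """Return a readable line for a code, with a fallback for unknown codes."""
    message, description = ERROR_CODES.get(code, ("Unknown", "Unrecognized error code"))
    return f"{code} {message}: {description}"


print(describe_error(404))
```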

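OpenAI-Compatible API


Create chat completion

Creates a model response for the given historical chat conversation via OpenAI's API.

Parameters

model: str, Required

The model used to generate the response. The server parses this automatically, so you can set it to any value for now.

messages: list[object], Required

A list of historical chat messages used to generate the response. It must contain at least one message with the user role.

stream: boolean

Whether to receive the response as a stream. Set this explicitly to false if you prefer to receive the entire response in one go instead of as a stream.

Returns

  • Success: A response message similar to OpenAI's
  • Failure: Exception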

Examples

from openai import OpenAI

model = "model"
client = OpenAI(api_key="ragflow-api-key", base_url="http://ragflow_address/api/v1/chats_openai/<chat_id>")

stream = True
reference = True

completion = client.chat.completions.create(
    model=model,
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Who are you?"},
        {"role": "assistant", "content": "I am an AI assistant named..."},
        {"role": "user", "content": "Can you tell me how to install neovim"},
    ],
    stream=stream,
    extra_body={"reference": reference}
)

if stream:
    # Streaming: each chunk carries an incremental delta.
    for chunk in completion:
        print(chunk)
        if reference and chunk.choices[0].finish_reason == "stop":
            print(f"Reference:\n{chunk.choices[0].delta.reference}")
            print(f"Final content:\n{chunk.choices[0].delta.final_content}")
else:
    # Non-streaming: the whole response arrives at once.
    print(completion.choices[0].message.content)
    if reference:
        print(completion.choices[0].message.reference)
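Because the messages list must contain at least one message with the user role, it can help to check this before sending a request. The sketch below is a client-side convenience only; `validate_messages` is a hypothetical helper, and the server may apply further checks that are not described here.

```python
def validate_messages(messages):
    """Raise ValueError unless at least one message has the "user" role."""
    if not any(message.get("role") == "user" for message in messages):
        raise ValueError('messages must contain at least one "user" message')
    return messages


history = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Who are you?"},
]
validate_messages(history)  # passes silently and returns the same list
```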

Dataset Management


Create dataset

RAGFlow.create_dataset(
    name: str,
    avatar: Optional[str] = None,
    description: Optional[str] = None,
    embedding_model: Optional[str] = "BAAI/bge-large-zh-v1.5@BAAI",
    permission: str = "me",
    chunk_method: str = "naive",
    parser_config: DataSet.ParserConfig = None
) -> DataSet

Creates a dataset.

Parameters

name: str, Required

The unique name of the dataset to create. It must adhere to the following requirements:

  • Maximum 128 characters.
  • Case-insensitive.

avatar: str

Base64 encoding of the avatar. Defaults to None

description: str

A brief description of the dataset to create. Defaults to None

permission

Specifies who can access the dataset to create. Available options:

  • "me": (Default) Only you can manage the dataset.
  • "team": All team members can manage the dataset.

chunk_method: str

The chunking method of the dataset to create. Available options:

  • "naive": General (default)
  • "manual": Manual
  • "qa": Q&A
  • "table": Table
  • "paper": Paper
  • "book": Book
  • "laws": Laws
  • "presentation": Presentation
  • "picture": Picture
  • "one": One
  • "email": Email

parser_config

The parser configuration of the dataset. The attributes of a ParserConfig object vary based on the selected chunk_method:

  • chunk_method="naive":
    {"chunk_token_num":512,"delimiter":"\\n","html4excel":False,"layout_recognize":True,"raptor":{"use_raptor":False}}
  • chunk_method="qa":
    {"raptor": {"use_raptor": False}}
  • chunk_method="manual":
    {"raptor": {"use_raptor": False}}
  • chunk_method="table":
    None
  • chunk_method="paper":
    {"raptor": {"use_raptor": False}}
  • chunk_method="book":
    {"raptor": {"use_raptor": False}}
  • chunk_method="laws":
    {"raptor": {"use_raptor": False}}
  • chunk_method="picture":
    None
  • chunk_method="presentation":
    {"raptor": {"use_raptor": False}}
  • chunk_method="one":
    None
  • chunk_method="knowledge-graph":
    {"chunk_token_num":128,"delimiter":"\\n","entity_types":["organization","person","location","event","time"]}
  • chunk_method="email":
    None

Returns

  • Success: A dataset object.
  • Failure: Exception

Examples

from ragflow_sdk import RAGFlow

rag_object = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
dataset = rag_object.create_dataset(name="kb_1")
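The per-method parser_config defaults listed in this section can be captured in a lookup table, which makes it easy to start from the documented default and override a single field. This is a sketch only: `DEFAULT_PARSER_CONFIGS` and `default_parser_config` are hypothetical names, the values are transcribed from the list above, and whether the server fills in missing keys on its own is not stated here.

```python
import copy

# Defaults transcribed from the parser_config list in this section.
_RAPTOR_OFF = {"raptor": {"use_raptor": False}}
DEFAULT_PARSER_CONFIGS = {
    "naive": {
        "chunk_token_num": 512,
        "delimiter": "\\n",
        "html4excel": False,
        "layout_recognize": True,
        "raptor": {"use_raptor": False},
    },
    "qa": _RAPTOR_OFF,
    "manual": _RAPTOR_OFF,
    "paper": _RAPTOR_OFF,
    "book": _RAPTOR_OFF,
    "laws": _RAPTOR_OFF,
    "presentation": _RAPTOR_OFF,
    "table": None,
    "picture": None,
    "one": None,
    "email": None,
}


def default_parser_config(chunk_method):
    """Return an independent copy of the documented default for a chunk method."""
    if chunk_method not in DEFAULT_PARSER_CONFIGS:
        raise ValueError(f"unknown chunk_method: {chunk_method!r}")
    return copy.deepcopy(DEFAULT_PARSER_CONFIGS[chunk_method])


config = default_parser_config("naive")
config["chunk_token_num"] = 256  # override one field; the table stays untouched
```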

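Delete datasets

RAGFlow.delete_datasets(ids: list[str] | None = None)

Deletes datasets by ID.

Parameters

ids: list[str] or None, Required

The IDs of the datasets to delete. Defaults to None

  • If None, all datasets will be deleted.
  • If an array of IDs, only the specified datasets will be deleted.
  • If an empty array, no datasets will be deleted.

Returns

  • Success: No value is returned.
  • Failure: Exception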

Examples

rag_object.delete_datasets(ids=["d94a8dc02c9711f0930f7fbc369eab6d","e94a8dc02c9711f0930f7fbc369eab6e"])
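Since passing None deletes every dataset while an empty list deletes nothing, a thin wrapper can make the destructive case impossible to trigger by accident. The following is a minimal sketch; `delete_selected_datasets` and `FakeClient` are hypothetical, and the stand-in client only records calls so the example runs without a server.

```python
def delete_selected_datasets(client, ids):
    """Delete only the listed datasets; refuse None so nothing is wiped by accident."""
    if ids is None:
        raise ValueError("ids=None would delete ALL datasets; pass an explicit list")
    ids = list(ids)
    if not ids:
        return 0  # an empty list deletes nothing, so skip the call entirely
    client.delete_datasets(ids=ids)
    return len(ids)


class FakeClient:
    """Stand-in for a RAGFlow client that records what would be deleted."""

    def __init__(self):
        self.deleted = []

    def delete_datasets(self, ids=None):
        self.deleted.append(ids)


client = FakeClient()
delete_selected_datasets(client, ["id_1", "id_2"])
```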

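List datasets

RAGFlow.list_datasets(
    page: int = 1,
    page_size: int = 30,
    orderby: str = "create_time",
    desc: bool = True,
    id: str = None,
    name: str = None
) -> list[DataSet]

Lists datasets.

Parameters

page: int

Specifies the page on which the datasets will be displayed. Defaults to 1

page_size: int

The number of datasets on each page. Defaults to 30

orderby: str

The field by which datasets should be sorted. Available options:

  • "create_time" (default)
  • "update_time"

desc: bool

Indicates whether the retrieved datasets should be sorted in descending order. Defaults to True

id: str

The ID of the dataset to retrieve. Defaults to None

name: str

The name of the dataset to retrieve. Defaults to None

Returns

  • Success: A list of DataSet objects.
  • Failure: Exception

Examples

# List all datasets
for dataset in rag_object.list_datasets():
    print(dataset)
# Retrieve a dataset by ID
dataset = rag_object.list_datasets(id = "id_1")
print(dataset[0])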
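The page and page_size parameters return one page at a time, so walking through every item needs a loop. The sketch below assumes that pages are numbered from 1 and that a page shorter than page_size marks the end; `iter_all` is a hypothetical helper, and the `fake_list` function stands in for a call such as rag_object.list_datasets.

```python
def iter_all(list_page, page_size=30, **filters):
    """Yield every item by requesting successive pages until a short page appears."""
    page = 1
    while True:
        batch = list_page(page=page, page_size=page_size, **filters)
        yield from batch
        if len(batch) < page_size:
            break
        page += 1


# Stand-in for a paginated listing call, backed by an in-memory list.
ITEMS = [f"dataset_{i}" for i in range(7)]


def fake_list(page, page_size):
    start = (page - 1) * page_size
    return ITEMS[start:start + page_size]


all_items = list(iter_all(fake_list, page_size=3))
```

With a real client, the same helper would be called as iter_all(rag_object.list_datasets, page_size=30).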

Update dataset

DataSet.update(update_message: dict)

Updates the configuration of the current dataset.

Parameters

update_message: dict[str, str|int], Required

A dictionary representing the attributes to update, with the following keys:

  • "name": str The revised name of the dataset.
    • Basic Multilingual Plane (BMP) only
    • Maximum 128 characters
    • Case-insensitive
  • "avatar": str
    The updated base64 encoding of the avatar.
    • Maximum 65535 characters
  • "embedding_model": str
    The updated embedding model name.
    • Ensure that "chunk_count" is 0 before updating "embedding_model".
    • Maximum 255 characters
    • Must follow model_name@model_factory format
  • "permission": str
    The updated dataset permission. Available options:
    • "me": (Default) Only you can manage the dataset.
    • "team": All team members can manage the dataset.
  • "pagerank": int
    The page rank of the dataset. Refer to Set page rank.
    • Default: 0
    • Minimum: 0
    • Maximum: 100
  • "chunk_method": str
    The chunking method for the dataset. Available options:
    • "naive": General (default)
    • "book": Book
    • "email": Email
    • "laws": Laws
    • "manual": Manual
    • "one": One
    • "paper": Paper
    • "picture": Picture
    • "presentation": Presentation
    • "qa": Q&A
    • "table": Table
    • "tag": Tag

Returns

  • Success: No value is returned.
  • Failure: Exception

Examples

from ragflow_sdk import RAGFlow

rag_object = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
dataset = rag_object.list_datasets(name="kb_name")
dataset = dataset[0]
# "embedding_model" must follow the model_name@model_factory format.
dataset.update({"embedding_model":"BAAI/bge-large-zh-v1.5@BAAI", "chunk_method":"manual"})
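The constraints on "embedding_model" (model_name@model_factory format, at most 255 characters) and "pagerank" (an integer from 0 to 100) can be checked locally before calling update. This is a sketch under those stated constraints only; `split_embedding_model` and `check_pagerank` are hypothetical helpers, and the server remains the authority on what it accepts.

```python
def split_embedding_model(value):
    """Split "model_name@model_factory" into its two parts, or raise ValueError."""
    if len(value) > 255:
        raise ValueError("embedding_model must be at most 255 characters")
    # Split on the last "@" so that model names containing "/" or "-" stay intact.
    name, separator, factory = value.rpartition("@")
    if not separator or not name or not factory:
        raise ValueError("embedding_model must look like model_name@model_factory")
    return name, factory


def check_pagerank(value):
    """Return value if it is an int in the documented 0..100 range."""
    if not isinstance(value, int) or not 0 <= value <= 100:
        raise ValueError("pagerank must be an integer between 0 and 100")
    return value


print(split_embedding_model("BAAI/bge-large-zh-v1.5@BAAI"))
```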

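File Management within Dataset


Upload documents

DataSet.upload_documents(document_list: list[dict])

Uploads documents to the current dataset.

Parameters

document_list: list[dict], Required

A list of dictionaries representing the documents to upload, each containing the following keys:

  • "display_name": (Optional) The file name to display in the dataset.
  • "blob": (Optional) The binary content of the file to upload.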

Returns

  • Success: No value is returned.
  • Failure: Exception

Examples

dataset = rag_object.create_dataset(name="kb_name")
dataset.upload_documents([{"display_name": "1.txt", "blob": "<BINARY_CONTENT_OF_THE_DOC>"}, {"display_name": "2.pdf", "blob": "<BINARY_CONTENT_OF_THE_DOC>"}])
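The "blob" value is the file's binary content, so a document_list is usually built by reading files from disk. The following sketch shows one way to do that with pathlib; `build_document_list` is a hypothetical helper, and using the bare file name as "display_name" is an assumption made for illustration.

```python
from pathlib import Path


def build_document_list(paths):
    """Read each file and return dictionaries shaped like document_list entries."""
    documents = []
    for raw_path in paths:
        # expanduser() resolves a leading "~", which open() would not do by itself.
        path = Path(raw_path).expanduser()
        documents.append({"display_name": path.name, "blob": path.read_bytes()})
    return documents
```

The result can then be passed directly, for example dataset.upload_documents(build_document_list(["~/ragflow.txt"])).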

Update document

Document.update(update_message: dict)

Updates the configuration of the current document.

Parameters

update_message: dict[str, str|dict[]], Required

A dictionary representing the attributes to update, with the following keys:

  • "display_name": str The name of the document to update.
  • "meta_fields": dict[str, Any] The meta fields of the document.
  • "chunk_method": str The parsing method to apply to the document.
    • "naive": General
    • "manual": Manual
    • "qa": Q&A
    • "table": Table
    • "paper": Paper
    • "book": Book
    • "laws": Laws
    • "presentation": Presentation
    • "picture": Picture
    • "one": One
    • "email": Email
  • "parser_config": dict[str, Any] The parsing configuration for the document. Its attributes vary based on the selected "chunk_method":
    • chunk_method="naive":
      {"chunk_token_num":128,"delimiter":"\\n","html4excel":False,"layout_recognize":True,"raptor":{"use_raptor":False}}
    • chunk_method="qa":
      {"raptor": {"use_raptor": False}}
    • chunk_method="manual":
      {"raptor": {"use_raptor": False}}
    • chunk_method="table":
      None
    • chunk_method="paper":
      {"raptor": {"use_raptor": False}}
    • chunk_method="book":
      {"raptor": {"use_raptor": False}}
    • chunk_method="laws":
      {"raptor": {"use_raptor": False}}
    • chunk_method="presentation":
      {"raptor": {"use_raptor": False}}
    • chunk_method="picture":
      None
    • chunk_method="one":
      None
    • chunk_method="knowledge-graph":
      {"chunk_token_num":128,"delimiter":"\\n","entity_types":["organization","person","location","event","time"]}
    • chunk_method="email":
      None

Returns

  • Success: No value is returned.
  • Failure: Exception

Examples

from ragflow_sdk import RAGFlow

rag_object = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
dataset = rag_object.list_datasets(id='id')
dataset = dataset[0]
doc = dataset.list_documents(id="wdfxb5t547d")
doc = doc[0]
# update() takes a single dictionary, so both attributes go into one dict.
doc.update({"parser_config": {"chunk_token_num": 256}, "chunk_method": "manual"})

Download document

Document.download() -> bytes

Downloads the current document.

Returns

The downloaded document in bytes.

Examples

import os

from ragflow_sdk import RAGFlow

rag_object = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
dataset = rag_object.list_datasets(id="id")
dataset = dataset[0]
doc = dataset.list_documents(id="wdfxb5t547d")
doc = doc[0]
# open() does not expand "~", so resolve the home directory explicitly.
with open(os.path.expanduser("~/ragflow.txt"), "wb") as f:
    f.write(doc.download())
print(doc)

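List documents

DataSet.list_documents(
    id: str = None,
    keywords: str = None,
    page: int = 1,
    page_size: int = 30,
    order_by: str = "create_time",
    desc: bool = True,
    create_time_from: int = 0,
    create_time_to: int = 0
) -> list[Document]

Lists documents in the current dataset.

Parameters

id: str

The ID of the document to retrieve. Defaults to None

keywords: str

The keywords used to match document titles. Defaults to None

page: int

Specifies the page on which the documents will be displayed. Defaults to 1

page_size: int

The maximum number of documents on each page. Defaults to 30

orderby: str

The field by which documents should be sorted. Available options:

  • "create_time" (default)
  • "update_time"

desc: bool

Indicates whether the retrieved documents should be sorted in descending order. Defaults to True

create_time_from: int

Unix timestamp for filtering documents created after this time. 0 means no filter. Defaults to 0.

create_time_to: int

Unix timestamp for filtering documents created before this time. 0 means no filter. Defaults to 0.

Returns

  • Success: A list of Document objects.
  • Failure: Exception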

A Document object contains the following attributes:

  • id: The document ID. Defaults to ""
  • name: The document name. Defaults to ""
  • thumbnail: The thumbnail image of the document. Defaults to None
  • dataset_id: The dataset ID associated with the document. Defaults to None
  • chunk_method: The chunking method name. Defaults to "naive"
  • source_type: The source type of the document. Defaults to "local"
  • type: The type or category of the document. Defaults to "". Reserved for future use.
  • created_by: str The creator of the document. Defaults to ""
  • size: int The document size in bytes. Defaults to 0
  • token_count: int The number of tokens in the document. Defaults to 0
  • chunk_count: int The number of chunks in the document. Defaults to 0
  • progress: float The current processing progress as a percentage. Defaults to 0.0
  • progress_msg: str A message indicating the current progress status. Defaults to ""
  • process_begin_at: datetime The start time of document processing. Defaults to None
  • process_duration: float The duration of the processing in seconds. Defaults to 0.0
  • run: str The document's processing status:
    • "UNSTART" (default)
    • "RUNNING"
    • "CANCEL"
    • "DONE"
    • "FAIL"
  • status: str Reserved for future use.
  • parser_config: ParserConfig Configuration object for the parser. Its attributes vary based on the selected chunk_method:
    • chunk_method="naive":
      {"chunk_token_num":128,"delimiter":"\\n","html4excel":False,"layout_recognize":True,"raptor":{"use_raptor":False}}
    • chunk_method="qa":
      {"raptor": {"use_raptor": False}}
    • chunk_method="manual":
      {"raptor": {"use_raptor": False}}
    • chunk_method="table":
      None
    • chunk_method="paper":
      {"raptor": {"use_raptor": False}}
    • chunk_method="book":
      {"raptor": {"use_raptor": False}}
    • chunk_method="laws":
      {"raptor": {"use_raptor": False}}
    • chunk_method="presentation":
      {"raptor": {"use_raptor": False}}
    • chunk_method="picture":
      None
    • chunk_method="one":
      None
    • chunk_method="email":
      None

Examples

import os

from ragflow_sdk import RAGFlow

rag_object = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
dataset = rag_object.create_dataset(name="kb_1")

# open() does not expand "~", so resolve the home directory first.
filename1 = os.path.expanduser("~/ragflow.txt")
with open(filename1, "rb") as f:
    blob = f.read()
dataset.upload_documents([{"display_name": os.path.basename(filename1), "blob": blob}])
for doc in dataset.list_documents(keywords="rag", page=1, page_size=12):
    print(doc)
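The create_time_from and create_time_to filters take Unix timestamps, which are easy to get wrong when built by hand. The sketch below derives them from timezone-aware datetimes; it assumes the timestamps are expressed in seconds (the unit is not stated in this section), and `to_unix_seconds` is a hypothetical helper.

```python
from datetime import datetime, timezone


def to_unix_seconds(moment):
    """Convert a timezone-aware datetime to whole Unix seconds."""
    if moment.tzinfo is None:
        raise ValueError("use a timezone-aware datetime to avoid local-time surprises")
    return int(moment.timestamp())


# Documents created during January 2025 (UTC).
time_filters = {
    "create_time_from": to_unix_seconds(datetime(2025, 1, 1, tzinfo=timezone.utc)),
    "create_time_to": to_unix_seconds(datetime(2025, 2, 1, tzinfo=timezone.utc)),
}
print(time_filters)
```

The dictionary can then be unpacked into the call, for example dataset.list_documents(**time_filters).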

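Delete documents

DataSet.delete_documents(ids: list[str] = None)

Deletes documents by ID.

Parameters

ids: list[str]

The IDs of the documents to delete. Defaults to None. If it is not specified, all documents in the dataset will be deleted.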

Returns

  • Success: No value is returned.
  • Failure: Exception

Examples

from ragflow_sdk import RAGFlow

rag_object = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
dataset = rag_object.list_datasets(name="kb_1")
dataset = dataset[0]
dataset.delete_documents(ids=["id_1","id_2"])

Parse documents

DataSet.async_parse_documents(document_ids: list[str]) -> None

Parses documents in the current dataset.

Parameters

document_ids: list[str], Required

The IDs of the documents to parse.

Returns

  • Success: No value is returned.
  • Failure: Exception

Examples

from ragflow_sdk import RAGFlow

rag_object = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
dataset = rag_object.create_dataset(name="dataset_name")
documents = [
    {'display_name': 'test1.txt', 'blob': open('./test_data/test1.txt',"rb").read()},
    {'display_name': 'test2.txt', 'blob': open('./test_data/test2.txt',"rb").read()},
    {'display_name': 'test3.txt', 'blob': open('./test_data/test3.txt',"rb").read()}
]
dataset.upload_documents(documents)
# Collect the IDs of the uploaded documents, then start parsing them.
documents = dataset.list_documents(keywords="test")
ids = []
for document in documents:
    ids.append(document.id)
dataset.async_parse_documents(ids)
print("Async bulk parsing initiated.")
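Because parsing is started asynchronously, a caller that needs the results has to wait for it to finish. One possible approach is to poll the run attribute of each document until it reaches "DONE", "FAIL", or "CANCEL". The sketch below assumes that re-listing the documents returns a fresh run value; `wait_for_parsing` is a hypothetical helper, not an SDK function.

```python
import time

# Processing states after which a document will not change any further.
TERMINAL_STATES = {"DONE", "FAIL", "CANCEL"}


def wait_for_parsing(fetch_documents, interval=1.0, timeout=60.0, sleep=time.sleep):
    """Poll fetch_documents() until every document is in a terminal run state."""
    deadline = time.monotonic() + timeout
    while True:
        documents = fetch_documents()
        if all(document.run in TERMINAL_STATES for document in documents):
            return documents
        if time.monotonic() >= deadline:
            raise TimeoutError("parsing did not finish before the timeout")
        sleep(interval)
```

With a real dataset, fetch_documents could be a lambda such as lambda: dataset.list_documents(keywords="test").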

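Stop parsing documents

DataSet.async_cancel_parse_documents(document_ids: list[str]) -> None

Stops parsing the specified documents.

Parameters

document_ids: list[str], Required

The IDs of the documents for which parsing should be stopped.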

Returns

  • Success: No value is returned.
  • Failure: Exception

Examples

from ragflow_sdk import RAGFlow

rag_object = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
dataset = rag_object.create_dataset(name="dataset_name")
documents = [
    {'display_name': 'test1.txt', 'blob': open('./test_data/test1.txt',"rb").read()},
    {'display_name': 'test2.txt', 'blob': open('./test_data/test2.txt',"rb").read()},
    {'display_name': 'test3.txt', 'blob': open('./test_data/test3.txt',"rb").read()}
]
dataset.upload_documents(documents)
documents = dataset.list_documents(keywords="test")
ids = []
for document in documents:
    ids.append(document.id)
dataset.async_parse_documents(ids)
print("Async bulk parsing initiated.")
# Cancel the parsing tasks that were just started.
dataset.async_cancel_parse_documents(ids)
print("Async bulk parsing cancelled.")

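Chunk Management within Dataset


Add chunk

Document.add_chunk(content: str, important_keywords: list[str] = []) -> Chunk

Adds a chunk to the current document.

Parameters

content: str, Required

The text content of the chunk.

important_keywords: list[str]

The key terms or phrases to tag with the chunk.

Returns

  • Success: A Chunk object.
  • Failure: Exception

A Chunk object contains the following attributes:

  • id: str The chunk ID.
  • content: str The text content of the chunk.
  • important_keywords: list[str] A list of key terms or phrases tagged with the chunk.
  • create_time: str The time when the chunk was created (added to the document).
  • create_timestamp: float The timestamp of the chunk's creation, expressed in seconds since January 1, 1970.
  • dataset_id: str The ID of the associated dataset.
  • document_name: str The name of the associated document.
  • document_id: str The ID of the associated document.
  • available: bool The chunk's availability status in the dataset. Value options:
    • False: Unavailable
    • True: Available (default)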

Examples

from ragflow_sdk import RAGFlow

rag_object = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
datasets = rag_object.list_datasets(id="123")
dataset = datasets[0]
doc = dataset.list_documents(id="wdfxb5t547d")
doc = doc[0]
chunk = doc.add_chunk(content="xxxxxxx")

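List chunks

Document.list_chunks(keywords: str = None, page: int = 1, page_size: int = 30, id: str = None) -> list[Chunk]

Lists chunks in the current document.

Parameters

keywords: str

The keywords used to match chunk content. Defaults to None

page: int

Specifies the page on which the chunks will be displayed. Defaults to 1

page_size: int

The maximum number of chunks on each page. Defaults to 30

id: str

The ID of the chunk to retrieve. Defaults to None

Returns

  • Success: A list of Chunk objects.
  • Failure: Exception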

Examples

from ragflow_sdk import RAGFlow

rag_object = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
# Pass the ID by keyword; the first positional parameter of list_datasets is page.
dataset = rag_object.list_datasets(id="123")
dataset = dataset[0]
docs = dataset.list_documents(keywords="test", page=1, page_size=12)
for chunk in docs[0].list_chunks(keywords="rag", page=1, page_size=12):
    print(chunk)

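Delete chunks

Document.delete_chunks(chunk_ids: list[str])

Deletes chunks by ID.

Parameters

chunk_ids: list[str]

The IDs of the chunks to delete. Defaults to None. If it is not specified, all chunks of the current document will be deleted.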

Returns

  • Success: No value is returned.
  • Failure: Exception

Examples

from ragflow_sdk import RAGFlow

rag_object = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
dataset = rag_object.list_datasets(id="123")
dataset = dataset[0]
doc = dataset.list_documents(id="wdfxb5t547d")
doc = doc[0]
chunk = doc.add_chunk(content="xxxxxxx")
doc.delete_chunks(["id_1","id_2"])

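Update chunk

Chunk.update(update_message: dict)

Updates the content or configuration of the current chunk.

Parameters

update_message: dict[str, str|list[str]|int], Required

A dictionary representing the attributes to update, with the following keys:

  • "content": str The text content of the chunk.
  • "important_keywords": list[str] A list of key terms or phrases to tag with the chunk.
  • "available": bool The chunk's availability status in the dataset. Value options:
    • False: Unavailable
    • True: Available (default)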

Returns

  • Success: No value is returned.
  • Failure: Exception

Examples

from ragflow_sdk import RAGFlow

rag_object = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
dataset = rag_object.list_datasets(id="123")
dataset = dataset[0]
doc = dataset.list_documents(id="wdfxb5t547d")
doc = doc[0]
chunk = doc.add_chunk(content="xxxxxxx")
chunk.update({"content":"sdfx..."})

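Retrieve chunks

RAGFlow.retrieve(question: str = "", dataset_ids: list[str] = None, document_ids: list[str] = None, page: int = 1, page_size: int = 30, similarity_threshold: float = 0.2, vector_similarity_weight: float = 0.3, top_k: int = 1024, rerank_id: str = None, keyword: bool = False, cross_languages: list[str] = None, metadata_condition: dict = None) -> list[Chunk]

Retrieves chunks from the specified datasets.

Parameters

question: str, Required

The user query or query keywords. Defaults to ""

dataset_ids: list[str], Required

The IDs of the datasets to search. Defaults to None

document_ids: list[str]

The IDs of the documents to search. Defaults to None. You must ensure that all selected documents use the same embedding model. Otherwise, an error will occur.

page: int

The starting index for the documents to retrieve. Defaults to 1

page_size: int

The maximum number of chunks to retrieve. Defaults to 30

similarity_threshold: float

The minimum similarity score. Defaults to 0.2

vector_similarity_weight: float

The weight of vector cosine similarity. Defaults to 0.3. If x represents the weight of vector cosine similarity, then (1 - x) is the weight of term similarity.

top_k: int

The number of chunks engaged in vector cosine computation. Defaults to 1024

rerank_id: str

The ID of the rerank model. Defaults to None

keyword: bool

Indicates whether to enable keyword-based matching:

  • True: Enable keyword-based matching.
  • False: Disable keyword-based matching (default).

cross_languages: list[str]

The languages that the query should be translated into, in order to achieve keyword retrieval across different languages.

metadata_condition: dict

The filter condition for meta_fields.

Returns

  • Success: A list of Chunk objects representing the document chunks.
  • Failure: Exception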

Examples

from ragflow_sdk import RAGFlow

rag_object = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
dataset = rag_object.list_datasets(name="ragflow")
dataset = dataset[0]
name = 'ragflow_test.txt'
path = './test_data/ragflow_test.txt'
documents =[{"display_name":"test_retrieve_chunks.txt","blob":open(path, "rb").read()}]
docs = dataset.upload_documents(documents)
doc = docs[0]
doc.add_chunk(content="This is a chunk addition test")
for c in rag_object.retrieve(dataset_ids=[dataset.id],document_ids=[doc.id]):
    print(c)
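To make the interaction between vector_similarity_weight and similarity_threshold concrete, the sketch below combines a vector score and a term score as a weighted sum and drops results under the threshold. It illustrates the arithmetic described in the parameter list only; `hybrid_similarity` and `filter_by_threshold` are hypothetical, and the server's actual scoring (for example with a rerank model) may differ.

```python
def hybrid_similarity(vector_similarity, term_similarity, vector_similarity_weight=0.3):
    """Weighted sum: w * vector score + (1 - w) * term score."""
    if not 0.0 <= vector_similarity_weight <= 1.0:
        raise ValueError("vector_similarity_weight must be between 0 and 1")
    weight = vector_similarity_weight
    return weight * vector_similarity + (1.0 - weight) * term_similarity


def filter_by_threshold(scored_chunks, similarity_threshold=0.2):
    """Keep (name, score) pairs whose score reaches the threshold, best first."""
    kept = [pair for pair in scored_chunks if pair[1] >= similarity_threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


scores = [
    ("chunk_a", hybrid_similarity(0.8, 0.4)),    # 0.3 * 0.8 + 0.7 * 0.4
    ("chunk_b", hybrid_similarity(0.1, 0.05)),   # falls under the 0.2 threshold
]
print(filter_by_threshold(scores))
```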

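Chat Assistant Management


Create chat assistant

RAGFlow.create_chat(
    name: str,
    avatar: str = "",
    dataset_ids: list[str] = [],
    llm: Chat.LLM = None,
    prompt: Chat.Prompt = None
) -> Chat

Creates a chat assistant.

Parameters

name: str, Required

The name of the chat assistant.

avatar: str

Base64 encoding of the avatar. Defaults to ""

dataset_ids: list[str]

The IDs of the associated datasets. Defaults to [""]

llm: Chat.LLM

The LLM settings for the chat assistant to create. Defaults to None. When the value is None, a dictionary with the following values is generated as the default. An LLM object contains the following attributes:

  • model_name: str
    The chat model name. If it is None, the user's default chat model is used.
  • temperature: float
    Controls the randomness of the model's predictions. A lower temperature results in more conservative responses, while a higher temperature yields more creative and diverse responses. Defaults to 0.1
  • top_p: float
    Also known as "nucleus sampling", this parameter sets a threshold to select a smaller set of words to sample from. It focuses on the most likely words, cutting off the less probable ones. Defaults to 0.3
  • presence_penalty: float
    Discourages the model from repeating the same information by penalizing words that have already appeared in the conversation. Defaults to 0.2
  • frequency_penalty: float
    Similar to the presence penalty, this reduces the model's tendency to repeat the same words frequently. Defaults to 0.7

prompt: Chat.Prompt

Instructions for the LLM to follow. A Prompt object contains the following attributes:

  • similarity_threshold: float During retrieval, RAGFlow employs either a combination of weighted keyword similarity and weighted vector cosine similarity, or a combination of weighted keyword similarity and weighted rerank score. If a similarity score falls below this threshold, the corresponding chunk is excluded from the results. The default value is 0.2
  • keywords_similarity_weight: float Sets the weight of keyword similarity in the hybrid similarity score with vector cosine similarity or rerank model similarity. By adjusting this weight, you can control the influence of keyword similarity relative to other similarity measures. The default value is 0.7
  • top_n: int Specifies the number of top chunks with similarity scores above similarity_threshold that are fed to the LLM. The LLM accesses only these "top N" chunks. The default value is 8
  • variables: list[dict[]] Lists the variables used in the "System" field of the chat configuration. Note that:
    • knowledge is a reserved variable, which represents the retrieved chunks.
    • All variables in "System" should be enclosed in curly brackets.
    • The default value is [{"key": "knowledge", "optional": True}]
  • rerank_model: str If it is not specified, vector cosine similarity is used; otherwise, the rerank score is used. Defaults to ""
  • top_k: int Refers to the process of reordering or selecting the top-k items from a list or set based on a specific ranking criterion. Defaults to 1024.
  • empty_response: str If nothing is retrieved from the dataset for the user's question, this is used as the response. To allow the LLM to improvise when nothing is found, leave this blank. Defaults to None
  • opener: str The opening greeting for the user. Defaults to "Hi! I am your assistant, can I help you?"
  • show_quote: bool Indicates whether the source of the text should be displayed. Defaults to True
  • prompt: str The prompt content.

Returns

  • Success: A Chat object representing the chat assistant.
  • Failure: Exception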

Examples

from ragflow_sdk import RAGFlow

rag_object = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
datasets = rag_object.list_datasets(name="kb_1")
dataset_ids = []
for dataset in datasets:
    dataset_ids.append(dataset.id)
assistant = rag_object.create_chat("Miss R", dataset_ids=dataset_ids)
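Since every variable used in the prompt text has to be written in curly brackets and listed under variables, a quick local check can catch a placeholder that was never declared. The sketch below is an illustration only: `undeclared_variables` is a hypothetical helper, and it assumes simple {name} placeholders made of letters, digits, and underscores; how RAGFlow itself substitutes the values is not described in this section.

```python
import re


def undeclared_variables(prompt_text, variables):
    """Return placeholder names used in prompt_text but missing from variables."""
    declared = {variable["key"] for variable in variables}
    used = set(re.findall(r"\{(\w+)\}", prompt_text))
    return sorted(used - declared)


variables = [{"key": "knowledge", "optional": True}]
prompt_text = "Answer in a {style} tone using this context: {knowledge}"
print(undeclared_variables(prompt_text, variables))
```

Here the check reports "style", which would need its own entry in variables before the prompt is used.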

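Update chat assistant

Chat.update(update_message: dict)

Updates the configuration of the current chat assistant.

Parameters

update_message: dict[str, str|list[str]|dict[]], Required

A dictionary representing the attributes to update, with the following keys:

  • "name": str The revised name of the chat assistant.
  • "avatar": str Base64 encoding of the avatar. Defaults to ""
  • "dataset_ids": list[str] The datasets to update.
  • "llm": dict The LLM settings:
    • "model_name": str The chat model name.
    • "temperature": float Controls the randomness of the model's predictions. A lower temperature results in more conservative responses, while a higher temperature yields more creative and diverse responses.
    • "top_p": float Also known as "nucleus sampling", this parameter sets a threshold to select a smaller set of words to sample from.
    • "presence_penalty": float Discourages the model from repeating the same information by penalizing words that have appeared in the conversation.
    • "frequency_penalty": float Similar to the presence penalty, this reduces the model's tendency to repeat the same words.
  • "prompt": Instructions for the LLM to follow.
    • "similarity_threshold": float During retrieval, RAGFlow employs either a combination of weighted keyword similarity and weighted vector cosine similarity, or a combination of weighted keyword similarity and weighted rerank score. This parameter sets the threshold for similarities between the user query and chunks. If a similarity score falls below this threshold, the corresponding chunk is excluded from the results. The default value is 0.2
    • "keywords_similarity_weight": float Sets the weight of keyword similarity in the hybrid similarity score with vector cosine similarity or rerank model similarity. By adjusting this weight, you can control the influence of keyword similarity relative to other similarity measures. The default value is 0.7
    • "top_n": int Specifies the number of top chunks with similarity scores above similarity_threshold that are fed to the LLM. The LLM accesses only these "top N" chunks. The default value is 8
    • "variables": list[dict[]] Lists the variables used in the "System" field of the chat configuration. Note that:
      • knowledge is a reserved variable, which represents the retrieved chunks.
      • All variables in "System" should be enclosed in curly brackets.
      • The default value is [{"key": "knowledge", "optional": True}]
    • "rerank_model": str If it is not specified, vector cosine similarity is used; otherwise, the rerank score is used. Defaults to ""
    • "empty_response": str If nothing is retrieved from the dataset for the user's question, this is used as the response. To allow the LLM to improvise when nothing is retrieved, leave this blank. Defaults to None
    • "opener": str The opening greeting for the user. Defaults to "Hi! I am your assistant, can I help you?"
    • "show_quote": bool Indicates whether the source of the text should be displayed. Defaults to True
    • "prompt": str The prompt content.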

Returns

  • Success: No value is returned.
  • Failure: Exception

Examples

from ragflow_sdk import RAGFlow

rag_object = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
datasets = rag_object.list_datasets(name="kb_1")
dataset_id = datasets[0].id
assistant = rag_object.create_chat("Miss R", dataset_ids=[dataset_id])
assistant.update({"name": "Stefan", "llm": {"temperature": 0.8}, "prompt": {"top_n": 8}})

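Delete chat assistants

RAGFlow.delete_chats(ids: list[str] = None)

Deletes chat assistants by ID.

Parameters

ids: list[str]

The IDs of the chat assistants to delete. Defaults to None. If it is empty or not specified, all chat assistants in the system will be deleted.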

Returns

  • Success: No value is returned.
  • Failure: Exception

Examples

from ragflow_sdk import RAGFlow

rag_object = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
rag_object.delete_chats(ids=["id_1","id_2"])

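List chat assistants

RAGFlow.list_chats(
    page: int = 1,
    page_size: int = 30,
    orderby: str = "create_time",
    desc: bool = True,
    id: str = None,
    name: str = None
) -> list[Chat]

Lists chat assistants.

Parameters

page: int

Specifies the page on which the chat assistants will be displayed. Defaults to 1

page_size: int

The number of chat assistants on each page. Defaults to 30

orderby: str

The attribute by which the results are sorted. Available options:

  • "create_time" (default)
  • "update_time"

desc: bool

Indicates whether the retrieved chat assistants should be sorted in descending order. Defaults to True

id: str

The ID of the chat assistant to retrieve. Defaults to None

name: str

The name of the chat assistant to retrieve. Defaults to None

Returns

  • Success: A list of Chat objects.
  • Failure: Exception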

Examples

from ragflow_sdk import RAGFlow

rag_object = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
for assistant in rag_object.list_chats():
    print(assistant)

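Session Management


Create session with chat assistant

Chat.create_session(name: str = "New session") -> Session

Creates a session with the current chat assistant.

Parameters

name: str

The name of the chat session to create.

Returns

  • Success: A Session object containing the following attributes:
    • id: str The auto-generated unique identifier of the created session.
    • name: str The name of the created session.
    • message: list[Message] The opening message of the created session. Default: [{"role": "assistant", "content": "Hi! I am your assistant, can I help you?"}]
    • chat_id: str The ID of the associated chat assistant.
  • Failure: Exception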

Examples

from ragflow_sdk import RAGFlow

rag_object = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
assistant = rag_object.list_chats(name="Miss R")
assistant = assistant[0]
session = assistant.create_session()

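Update chat assistant's session

Session.update(update_message: dict)

Updates the current session of the current chat assistant.

Parameters

update_message: dict[str, Any], Required

A dictionary representing the attributes to update, with only one key:

  • "name": str The revised name of the session.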

Returns

  • Success: No value is returned.
  • Failure: Exception

Examples

from ragflow_sdk import RAGFlow

rag_object = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
assistant = rag_object.list_chats(name="Miss R")
assistant = assistant[0]
session = assistant.create_session("session_name")
session.update({"name": "updated_name"})

List chat assistant's sessions

Chat.list_sessions(
    page: int = 1,
    page_size: int = 30,
    orderby: str = "create_time",
    desc: bool = True,
    id: str = None,
    name: str = None
) -> list[Session]

Lists sessions associated with the current chat assistant.

Parameters

page: int

Specifies the page on which the sessions will be displayed. Defaults to 1.

page_size: int

The number of sessions on each page. Defaults to 30.

orderby: str

The field by which sessions should be sorted. Available options:

  • "create_time" (default)
  • "update_time"

desc: bool

Indicates whether the retrieved sessions should be sorted in descending order. Defaults to True.

id: str

The ID of the chat session to retrieve. Defaults to None.

name: str

The name of the chat session to retrieve. Defaults to None.

Returns

  • Success: A list of Session objects associated with the current chat assistant.
  • Failure: Exception.

Examples

from ragflow_sdk import RAGFlow

rag_object = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
assistant = rag_object.list_chats(name="Miss R")
assistant = assistant[0]
for session in assistant.list_sessions():
    print(session)

Delete chat assistant's sessions

Chat.delete_sessions(ids: list[str] = None)

Deletes sessions of the current chat assistant by ID.

Parameters

ids: list[str]

The IDs of the sessions to delete. Defaults to None. If it is not specified, all sessions associated with the current chat assistant will be deleted.

Returns

  • Success: No value is returned.
  • Failure: Exception

Examples

from ragflow_sdk import RAGFlow

rag_object = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
assistant = rag_object.list_chats(name="Miss R")
assistant = assistant[0]
assistant.delete_sessions(ids=["id_1","id_2"])

Converse with chat assistant

Session.ask(question: str = "", stream: bool = False, **kwargs) -> Optional[Message, iter[Message]]

Asks a specified chat assistant a question to start an AI-powered conversation.

NOTE

In streaming mode, not all responses include a reference, as this depends on the system's judgement.

Parameters

question: str, Required

The question to start an AI-powered conversation. Defaults to ""

stream: bool

Indicates whether to output responses in a streaming way:

  • True: Enable streaming (default).
  • False: Disable streaming.

**kwargs

The parameters in prompt(system).

Returns

  • A Message object containing the response to the question if stream is set to False.
  • An iterator containing multiple message objects (iter[Message]) if stream is set to True

The following shows the attributes of a Message object:

id: str

The auto-generated message ID.

content: str

The content of the message. Defaults to "Hi! I am your assistant, can I help you?".

reference: list[Chunk]

A list of Chunk objects representing references to the message, each containing the following attributes:

  • id str
    The chunk ID.
  • content str
    The content of the chunk.
  • img_id str
    The ID of the snapshot of the chunk. Applicable only when the source of the chunk is an image, PPT, PPTX, or PDF file.
  • document_id str
    The ID of the referenced document.
  • document_name str
    The name of the referenced document.
  • position list[str]
    The location information of the chunk within the referenced document.
  • dataset_id str
    The ID of the dataset to which the referenced document belongs.
  • similarity float
    A composite similarity score of the chunk ranging from 0 to 1, with a higher value indicating greater similarity. It is the weighted sum of vector_similarity and term_similarity.
  • vector_similarity float
    A vector similarity score of the chunk ranging from 0 to 1, with a higher value indicating greater similarity between vector embeddings.
  • term_similarity float
    A keyword similarity score of the chunk ranging from 0 to 1, with a higher value indicating greater similarity between keywords.

Examples

from ragflow_sdk import RAGFlow

rag_object = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
assistant = rag_object.list_chats(name="Miss R")
assistant = assistant[0]
session = assistant.create_session()

print("\n==================== Miss R =====================\n")
print("Hello. What can I do for you?")

while True:
    question = input("\n==================== User =====================\n> ")
    print("\n==================== Miss R =====================\n")

    # Each streamed message holds the full answer so far; print only the new part.
    cont = ""
    for ans in session.ask(question, stream=True):
        print(ans.content[len(cont):], end='', flush=True)
        cont = ans.content
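The loop above relies on each streamed message carrying the answer accumulated so far, which is why it prints only the slice after the text already shown. The sketch below isolates that step so it can be run without a server; `iter_deltas` is a hypothetical helper, and the cumulative behaviour is inferred from the slicing in the example rather than stated explicitly.

```python
from types import SimpleNamespace


def iter_deltas(messages):
    """Yield only the newly added text from messages with cumulative content."""
    seen = ""
    for message in messages:
        yield message.content[len(seen):]
        seen = message.content


# Stand-in for the iterator returned by session.ask(question, stream=True).
fake_stream = [SimpleNamespace(content=text) for text in ("Hel", "Hello", "Hello, world")]

for delta in iter_deltas(fake_stream):
    print(delta, end="", flush=True)
print()
```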

Create session with agent

Agent.create_session(**kwargs) -> Session

Creates a session with the current agent.

Parameters

**kwargs

The parameters in begin component.

Returns

  • Success: A Session object containing the following attributes:
    • id: str The auto-generated unique identifier of the created session.
    • message: list[Message] The messages of the created session assistant. Default: [{"role": "assistant", "content": "Hi! I am your assistant, can I help you?"}]
    • agent_id: str The ID of the associated agent.
  • Failure: Exception

Examples

from ragflow_sdk import RAGFlow, Agent

rag_object = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
agent_id = "AGENT_ID"
agent = rag_object.list_agents(id = agent_id)[0]
session = agent.create_session()

Converse with agent

Session.ask(question: str = "", stream: bool = False) -> Optional[Message, iter[Message]]

Asks a specified agent a question to start an AI-powered conversation.

NOTE

In streaming mode, not all responses include a reference, as this depends on the system's judgement.

Parameters

question: str

The question to start an AI-powered conversation. If the Begin component takes parameters, a question is not required.

stream: bool

Indicates whether to output responses in a streaming way:

  • True: Enable streaming (default).
  • False: Disable streaming.

Returns

  • A Message object containing the response to the question if stream is set to False
  • An iterator containing multiple message objects (iter[Message]) if stream is set to True

The following shows the attributes of a Message object:

id: str

The auto-generated message ID.

content: str

The content of the message. Defaults to "Hi! I am your assistant, can I help you?".

reference: list[Chunk]

A list of Chunk objects representing references to the message, each containing the following attributes:

  • id str
    The chunk ID.
  • content str
    The content of the chunk.
  • image_id str
    The ID of the snapshot of the chunk. Applicable only when the source of the chunk is an image, PPT, PPTX, or PDF file.
  • document_id str
    The ID of the referenced document.
  • document_name str
    The name of the referenced document.
  • position list[str]
    The location information of the chunk within the referenced document.
  • dataset_id str
    The ID of the dataset to which the referenced document belongs.
  • similarity float
    A composite similarity score of the chunk ranging from 0 to 1, with a higher value indicating greater similarity. It is the weighted sum of vector_similarity and term_similarity.
  • vector_similarity float
    A vector similarity score of the chunk ranging from 0 to 1, with a higher value indicating greater similarity between vector embeddings.
  • term_similarity float
    A keyword similarity score of the chunk ranging from 0 to 1, with a higher value indicating greater similarity between keywords.

Examples

from ragflow_sdk import RAGFlow, Agent

rag_object = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
AGENT_id = "AGENT_ID"
agent = rag_object.list_agents(id = AGENT_id)[0]
session = agent.create_session()

print("\n===== Miss R ====\n")
print("Hello. What can I do for you?")

while True:
    question = input("\n===== User ====\n> ")
    print("\n==== Miss R ====\n")

    # Each streamed message holds the full answer so far; print only the new part.
    cont = ""
    for ans in session.ask(question, stream=True):
        print(ans.content[len(cont):], end='', flush=True)
        cont = ans.content

List agent sessions

Agent.list_sessions(
    page: int = 1,
    page_size: int = 30,
    orderby: str = "update_time",
    desc: bool = True,
    id: str = None
) -> List[Session]

Lists sessions associated with the current agent.

Parameters

page: int

Specifies the page on which the sessions will be displayed. Defaults to 1.

page_size: int

The number of sessions on each page. Defaults to 30.

orderby: str

The field by which sessions should be sorted. Available options:

  • "create_time"
  • "update_time" (default)

desc: bool

Indicates whether the retrieved sessions should be sorted in descending order. Defaults to True.

id: str

The ID of the agent session to retrieve. Defaults to None.

Returns

  • Success: A list of Session objects associated with the current agent.
  • Failure: Exception.

Examples

from ragflow_sdk import RAGFlow

rag_object = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
AGENT_id = "AGENT_ID"
agent = rag_object.list_agents(id = AGENT_id)[0]
sessions = agent.list_sessions()
for session in sessions:
    print(session)

Delete agent's sessions

Agent.delete_sessions(ids: list[str] = None)

Deletes sessions of an agent by ID.

Parameters

ids: list[str]

The IDs of the sessions to delete. Defaults to None. If it is not specified, all sessions associated with the agent will be deleted.

Returns

  • Success: No value is returned.
  • Failure: Exception

Examples

from ragflow_sdk import RAGFlow

rag_object = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
AGENT_id = "AGENT_ID"
agent = rag_object.list_agents(id = AGENT_id)[0]
agent.delete_sessions(ids=["id_1","id_2"])

Agent Management


List agents

RAGFlow.list_agents(
    page: int = 1,
    page_size: int = 30,
    orderby: str = "create_time",
    desc: bool = True,
    id: str = None,
    title: str = None
) -> List[Agent]

Lists agents.

Parameters

page: int

Specifies the page on which the agents will be displayed. Defaults to 1.

page_size: int

The number of agents on each page. Defaults to 30.

orderby: str

The attribute by which the results are sorted. Available options:

  • "create_time" (default)
  • "update_time"

desc: bool

Indicates whether the retrieved agents should be sorted in descending order. Defaults to True.

id: str

The ID of the agent to retrieve. Defaults to None.

title: str

The title of the agent to retrieve. Defaults to None.

Returns

  • Success: A list of Agent objects.
  • Failure: Exception.

Examples

from ragflow_sdk import RAGFlow
rag_object = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
for agent in rag_object.list_agents():
    print(agent)

Create agent

RAGFlow.create_agent(
    title: str,
    dsl: dict,
    description: str | None = None
) -> None

Creates an agent.

Parameters

title: str

Specifies the title of the agent.

dsl: dict

Specifies the canvas DSL of the agent.

description: str

The description of the agent. Defaults to None.

Returns

  • Success: Nothing.
  • Failure: Exception.

Examples

from ragflow_sdk import RAGFlow
rag_object = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
rag_object.create_agent(
    title="Test Agent",
    description="A test agent",
    dsl={
        # ... canvas DSL here ...
    }
)

Update agent

RAGFlow.update_agent(
    agent_id: str,
    title: str | None = None,
    description: str | None = None,
    dsl: dict | None = None
) -> None

Updates an agent.

Parameters

agent_id: str

Specifies the id of the agent to be updated.

title: str

Specifies the new title of the agent. None if you do not want to update this.

dsl: dict

Specifies the new canvas DSL of the agent. None if you do not want to update this.

description: str

The new description of the agent. None if you do not want to update this.

Returns

  • Success: Nothing.
  • Failure: Exception.

Examples

from ragflow_sdk import RAGFlow
rag_object = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
rag_object.update_agent(
    agent_id="58af890a2a8911f0a71a11b922ed82d6",
    title="Test Agent",
    description="A test agent",
    dsl={
        # ... canvas DSL here ...
    }
)

Delete agent

RAGFlow.delete_agent(
    agent_id: str
) -> None

Deletes an agent.

Parameters

agent_id: str

Specifies the id of the agent to be deleted.

Returns

  • Success: Nothing.
  • Failure: Exception.

Examples

from ragflow_sdk import RAGFlow
rag_object = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
rag_object.delete_agent("58af890a2a8911f0a71a11b922ed82d6")