fix(embedding): generic fix for embedding_dimensions parameter validation, resolving the HTTP 400 errors returned by compatible APIs such as SiliconFlow #8807
Rat0323 wants to merge 3 commits into
Code Review
This pull request simplifies the _embedding_kwargs method in openai_embedding_source.py by removing a temporary workaround for the SiliconFlow provider and ensuring that the dimensions parameter is only set if its value is greater than zero. Feedback suggests checking for None or empty string values for embedding_dimensions before attempting to convert it to an integer, which prevents unnecessary warning logs from flooding the console when the field is left blank in the WebUI.
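The blank-value handling suggested in this review can be sketched as follows. This is a minimal illustration, not the project's `_embedding_kwargs`: the function name `build_embedding_kwargs`, its signature, and the choice to silently skip non-numeric input are all assumptions made for the example.

```python
from typing import Any


def build_embedding_kwargs(model: str, embedding_dimensions: Any) -> dict:
    """Hypothetical stand-in for `_embedding_kwargs`; names are illustrative."""
    kwargs: dict = {"model": model}

    # None or a blank string means "not configured": skip quietly, no warning.
    if embedding_dimensions is None or str(embedding_dimensions).strip() == "":
        return kwargs

    try:
        dim_val = int(embedding_dimensions)
    except (TypeError, ValueError):
        # Non-numeric input: omit the parameter instead of sending a bad value.
        return kwargs

    # Forward only a positive value; 0 or negative leaves `dimensions` out
    # so that providers rejecting the parameter never see it.
    if dim_val > 0:
        kwargs["dimensions"] = dim_val
    return kwargs
```

With this shape, a field left blank in the WebUI and an explicit `0` both produce a request without `dimensions`, while a value such as `"1024"` is forwarded as the integer `1024`.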
Hey - I've left some high level feedback:
- When `embedding_dimensions` is configured as `0` or a negative value it is now silently ignored; consider logging a low-level warning or info in that case so misconfigurations are easier to spot while still avoiding the SiliconFlow 400s.
- Since the provider-specific SiliconFlow branch is removed, it might be worth adding a short inline comment near the `dim_val > 0` check to clarify that omitting `dimensions` is intentional for OpenAI-compatible providers that reject this parameter entirely.
- Allow automatic dimension inference when configured_dim is 0 to prevent blocking new KB creation.
- Filter out dimensions parameter in OpenAI embedding requests when value is 0 to improve compatibility with providers like SiliconFlow.
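📝 PR Summary
This commit is a generic end-to-end fix for embedding dimension validation and parameter handling. It addresses two problems that occur when the user has not configured an embedding dimension (that is, `embedding_dimensions: 0`): the system sends an invalid parameter to the API (causing a 400 error), and knowledge base creation is blocked. The fix is generic and does not target any single provider.
🐛 Problem Context & Timeline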
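Stage 1: the API rejects the request
- When the dimension is configured as `0` (or left at the blank default), the system sends `"dimensions": 0` to the API.
- A dimension of `0` is not accepted, so the API returns an HTTP 400 error directly.
Stage 2: a deeper blocking check is discovered (the core of this fix)
- Once `0` is no longer sent (the API then successfully returns a `1024`-dimensional vector), a deeper problem is exposed: `knowledge_base_service.py` contains a strict check, `if len(vec) != provider.get_dim(): raise ValueError(...)`.
- The configured dimension is `0` while the API returns the real `1024` dimensions. Since `1024 != 0`, the check raises: `测试嵌入模型失败: 嵌入向量维度不匹配,实际是 1024,然而配置是 0` ("Embedding model test failed: embedding vector dimension mismatch, actual is 1024 but configured is 0").
🛠️ Generic Complete Fix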
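To make the default `0` dimension configuration work out of the box, with no effect on providers that support `dimensions` (such as the official OpenAI API) and with the parameter automatically left out for providers that reject it (such as SiliconFlow and vLLM), this commit applies a two-layer fix:
1. Business layer: introduce "smart dimension detection" (`knowledge_base_service.py`)
- The check becomes `if configured_dim != 0 and actual_dim != configured_dim:`.
- When the configured value is `0` (left blank), the system treats it as "unknown" and adopts the actual dimension returned by the probing API call (for example 1024) for the subsequent database initialization. The hard block is applied only when the user has explicitly entered a wrong non-zero dimension, to protect the database.
2. Interface layer: generic parameter filtering (`openai_embedding_source.py`)
- When building `kwargs`, ensure: `if dim_val > 0: kwargs["dimensions"] = dim_val`.
- `0` is never sent to any backend. The hardcoded domain check (`api.siliconflow.cn`), previously written to avoid a warning, has been removed in favor of minimal generic logic, which improves the code's backward compatibility and tidiness.
📊 Empirical Evidence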
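A full regression run was performed against this end-to-end fix, using SiliconFlow's `BAAI/bge-m3` model. Observed behavior by configured `embedding_dimensions` value:
- `0` (or left at the blank default): the `dimensions` field is omitted; the real dimension obtained is `1024`.
- `1024` (filled in correctly): `dimensions: 1024` is sent; the request succeeds.
- `768` (filled in incorrectly): `dimensions: 768` is sent; the request receives a response …
✅ Conclusion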
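This commit makes the default `embedding_dimensions = 0` path work end to end. While keeping the database safety check as a baseline, it resolves, in a generic way, both the parameter compatibility problem at the underlying API level and the logical "deadlock" in the top-level knowledge base creation logic, making the system smarter and more stable.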
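The relaxed business-layer check described in the fix can be sketched as below. This is an illustration under assumptions, not the code in `knowledge_base_service.py`: the function name `resolve_embedding_dim`, its return value, and the English error text are hypothetical. Only the condition `configured_dim != 0 and actual_dim != configured_dim` is taken from the PR description.

```python
def resolve_embedding_dim(configured_dim: int, actual_dim: int) -> int:
    """Hypothetical helper: pick the dimension used to initialize the database.

    `configured_dim` is the user's setting (0 means "not configured");
    `actual_dim` is the length of the vector returned by a probe request.
    """
    # Block only when the user set an explicit non-zero value that disagrees
    # with what the provider actually returned.
    if configured_dim != 0 and actual_dim != configured_dim:
        raise ValueError(
            f"embedding dimension mismatch: actual {actual_dim}, "
            f"configured {configured_dim}"
        )
    # Either the values match, or the setting was 0 and the probed value wins.
    return actual_dim
```

Returning the probed dimension keeps a single source of truth for database initialization: a `0` setting adopts whatever the provider returned, and a matching non-zero setting passes through unchanged.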