
Conversation

@ZZUZSL1024

No description provided.

@gemini-code-assist
Contributor

Summary of Changes

Hello @ZZUZSL1024, I'm Gemini Code Assist[1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request adds comprehensive support for LoRA fine-tuning of the Paraformer model, providing a complete toolchain from training through evaluation. It also introduces a new Flask-based web service that brings advanced audio-processing capabilities such as voiceprint registration and speaker identification. In addition, the existing WebSocket client and server receive substantial enhancements focused on improved logging, latency measurement, and robust concurrency management via a thread pool and semaphore, yielding a more stable and scalable real-time speech-processing system.

Highlights

  • Paraformer LoRA fine-tuning support: Adds LoRA fine-tuning for the Paraformer model, including complete training, inference, and CER-evaluation scripts and configurations.
  • New Flask audio-processing service: Introduces a Flask web service providing advanced audio processing, including voiceprint registration, ASR with speaker identification, and voiceprint-similarity computation.
  • WebSocket client enhancements: Significantly improves the WebSocket client with structured logging, latency tracking, and hotword handling, and tightens its async programming practices for better stability and user experience.
  • WebSocket server concurrency optimization: Substantially refactors the WebSocket server for non-blocking concurrent inference via a thread pool and semaphore, integrates speaker verification, and adds saving of offline audio segments, greatly improving scalability and debuggability.
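The thread-pool-plus-semaphore pattern named in the last highlight can be sketched roughly as follows. This is an illustrative minimal example, not the PR's actual server code; the names `blocking_infer` and `handle_chunk` and the worker counts are assumptions.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

# Illustrative sketch: cap the number of in-flight blocking inference
# calls so the asyncio event loop (e.g. a WebSocket server) never stalls.
executor = ThreadPoolExecutor(max_workers=4)

def blocking_infer(chunk):
    # stand-in for a blocking model call such as model.generate(...)
    return f"text:{chunk}"

async def handle_chunk(semaphore, chunk):
    async with semaphore:  # at most 4 inferences in flight at once
        loop = asyncio.get_running_loop()
        # run the blocking call on the thread pool, off the event loop
        return await loop.run_in_executor(executor, blocking_infer, chunk)

async def main():
    semaphore = asyncio.Semaphore(4)
    return await asyncio.gather(*(handle_chunk(semaphore, i) for i in range(8)))

results = asyncio.run(main())
print(results)  # ['text:0', 'text:1', ..., 'text:7'] in submission order
```

`asyncio.gather` preserves submission order, so results line up with the input chunks even though up to four inferences run concurrently.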


Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

| Feature | Command | Description |
| --- | --- | --- |
| Code Review | `/gemini review` | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | `/gemini summary` | Provides a summary of the current pull request in its current state. |
| Comment | `@gemini-code-assist` | Responds in comments when explicitly tagged, both in pull request comments and review comments. |
| Help | `/gemini help` | Displays a list of available commands. |

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder at the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces training and inference scripts for LoRA fine-tuning of Paraformer and makes substantial improvements to the WebSocket client and server. The client gains logging and latency statistics, while the server's concurrency and stability are significantly improved by the introduction of a thread pool and semaphore. These changes are highly beneficial for the system's observability, performance, and robustness. A few areas could be further optimized or corrected, mainly around data-processing efficiency and configuration consistency.


```python
for name, db_emb in speaker_db.items():
    # compute cosine similarity
    data_list = json.loads(db_emb)
```

Severity: high

In process_cam_result_with_identify_speakers, json.loads(db_emb) is called repeatedly inside the loop. If speaker_db contains many entries, this repeated JSON parsing becomes a performance bottleneck. Consider parsing db_emb into a list once when speaker_db is loaded, or at least parsing it outside the loop, to improve efficiency.

Suggested change

```diff
-data_list = json.loads(db_emb)
+# compute cosine similarity
+# optimization: db_emb should be parsed into a list when speaker_db is
+# loaded, to avoid repeated parsing inside the loop
+arr = np.array(json.loads(db_emb), dtype=np.float32)
```
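As a sketch of the suggested fix, the stored JSON embeddings can be decoded once at load time, so the similarity loop only does numeric work. The data, the `raw_db` variable, and the `cosine` helper here are made up for illustration.

```python
import json
import numpy as np

# Hypothetical stored DB: each embedding is a JSON string, as in the PR.
raw_db = {"alice": json.dumps([0.1, 0.2, 0.3]), "bob": json.dumps([0.3, 0.2, 0.1])}

# Parse each embedding exactly once, at load time.
speaker_db = {name: np.array(json.loads(emb), dtype=np.float32)
              for name, emb in raw_db.items()}

def cosine(a, b):
    # cosine similarity between two 1-D vectors
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

query = np.array([0.1, 0.2, 0.3], dtype=np.float32)
best = max(speaker_db, key=lambda name: cosine(query, speaker_db[name]))
print(best)  # alice
```

The loop over speakers now touches only NumPy arrays; no JSON parsing happens per comparison.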

Comment on lines +201 to +212
```diff
 with open(args.hotword, encoding="utf-8") as f_scp:
     hot_lines = f_scp.readlines()

+hot_list = []
 for line in hot_lines:
-    words = line.strip().split(" ")
-    if len(words) < 2:
-        print("Please checkout format of hotwords")
+    words = line.strip().split()
+    if not words:
+        continue
-    try:
-        fst_dict[" ".join(words[:-1])] = int(words[-1])
-    except ValueError:
-        print("Please checkout format of hotwords")
-    hotword_msg = json.dumps(fst_dict)
+    # Python AutoModel: separate multiple hotwords with commas
+    hot_list.append(words[0])

+hotword_msg = ",".join(hot_list)
```

Severity: high

In record_from_scp, the hotword-handling logic has been changed to produce a comma-separated hotword string. However, the ws_serve function in funasr_wss_server.py (lines 418-421) assigns the hotwords field of the JSON message directly to websocket.status_dict_asr["hotword"] and websocket.status_dict_asr_online["hotword"]. If the server's AutoModel expects a dictionary (as the original client code and common hotword usage suggest), sending a comma-separated string may lead to incorrect hotword recognition. Please make sure the hotword_msg format the client sends matches what the server's AutoModel expects.

Suggested change

```diff
 with open(args.hotword, encoding="utf-8") as f_scp:
     hot_lines = f_scp.readlines()

-hot_list = []
+fst_dict = {}
 for line in hot_lines:
     words = line.strip().split()
     if not words:
         continue
-    # Python AutoModel: separate multiple hotwords with commas
-    hot_list.append(words[0])
-
-hotword_msg = ",".join(hot_list)
+    # Assuming the format is "hotword score"
+    if len(words) < 2:
+        print("Please checkout format of hotwords file (e.g., 'word score')")
+        continue
+    try:
+        fst_dict[" ".join(words[:-1])] = int(words[-1])
+    except ValueError:
+        print("Please checkout format of hotwords (score must be an integer)")
+        continue
+hotword_msg = json.dumps(fst_dict, ensure_ascii=False)
```
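Under the assumption that the server-side AutoModel wants a dict mapping hotword to integer score, the suggested client-side construction behaves as below. The sample lines (including the malformed ones) are made up for illustration.

```python
import json

# Hypothetical hotword file contents in "word score" format;
# "badline" and "not a number x" are malformed and should be skipped.
hot_lines = ["OpenAI 20", "魔搭 30", "badline", "not a number x"]

fst_dict = {}
for line in hot_lines:
    words = line.strip().split()
    if len(words) < 2:
        continue  # skip lines without a score
    try:
        fst_dict[" ".join(words[:-1])] = int(words[-1])
    except ValueError:
        continue  # skip lines whose last field is not an integer score

# ensure_ascii=False keeps non-ASCII hotwords readable in the JSON message
hotword_msg = json.dumps(fst_dict, ensure_ascii=False)
print(hotword_msg)
```

Multi-word hotwords are supported because everything before the final field is joined back into one key; only the trailing integer is treated as the score.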

Comment on lines +23 to +25
```python
if key is None and isinstance(record.get("source"), dict):
    key = record["source"].get("key")
keys.append(key or "")
```

Severity: medium

In load_jsonl, the key-extraction logic could be clearer and more robust. When no key is found, it currently defaults to an empty string, which makes it harder to trace records later. For consistency with the output logic (line 72), consider generating a unique identifier when the key is missing.

Suggested change

```diff
-if key is None and isinstance(record.get("source"), dict):
-    key = record["source"].get("key")
-keys.append(key or "")
+key = record.get("key")
+if key is None and isinstance(record.get("source"), dict):
+    key = record["source"].get("key")
+if key is None:
+    key = f"input_utt_{len(keys)}"
+keys.append(key)
```
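A quick run of the suggested fallback logic, with made-up records, shows how missing keys become unique identifiers instead of empty strings:

```python
# Hypothetical JSONL records in the three shapes the logic handles:
# a top-level key, a key nested under "source", and no key at all.
records = [
    {"key": "utt1", "source": {}},
    {"source": {"key": "utt2"}},
    {"text": "no key at all"},
]

keys = []
for record in records:
    key = record.get("key")
    if key is None and isinstance(record.get("source"), dict):
        key = record["source"].get("key")
    if key is None:
        # unique fallback based on the record's position
        key = f"input_utt_{len(keys)}"
    keys.append(key)

print(keys)  # ['utt1', 'utt2', 'input_utt_2']
```

Every record now carries a non-empty identifier, so downstream output can always be traced back to its input row.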

```python
from werkzeug.utils import secure_filename
import torch
import os
from openai import OpenAI
```

Severity: medium

The OpenAI module is imported but does not appear to be used anywhere in the provided code. Consider removing the unused import to keep the code clean.

Suggested change

```diff
-from openai import OpenAI
+# from openai import OpenAI
```

Comment on lines +238 to +239
```python
model="paraformer-zh",
model_revision="v2.0.4",
```

Severity: medium

model_asr is now hardcoded to "paraformer-zh" instead of using args.asr_model. This removes the ability to configure the offline ASR model from the command line; if args.asr_model is meant to be configurable, this change may be unintentional. Please confirm whether this change matches the intended configurability.

```python
model_asr = AutoModel(
    model=args.asr_model,  # revert to using the argument for flexibility
```
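One way to keep both behaviors, assuming the server already parses its options with argparse, is to default the flag to "paraformer-zh" so the PR's behavior is preserved while the model stays configurable. The flag names below are assumptions, not the server's actual argument list.

```python
import argparse

# Sketch: keep the offline ASR model configurable while defaulting to
# the model this PR hardcodes. Flag names here are assumptions.
parser = argparse.ArgumentParser()
parser.add_argument("--asr_model", type=str, default="paraformer-zh")
parser.add_argument("--asr_model_revision", type=str, default="v2.0.4")

args = parser.parse_args([])  # empty argv: demonstrate the defaults
print(args.asr_model, args.asr_model_revision)  # paraformer-zh v2.0.4
```

With this shape, `model=args.asr_model` behaves identically to the hardcoded string unless the operator explicitly passes a different model on the command line.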
