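Behavior Recognition and Evaluation System Based on Multimodal Large Models - Technology Transfer - Industry Services - Industrial Technology Research Institute (ITRI)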

Technology name: Behavior Recognition and Evaluation System Based on Multimodal Large Models

Abstract

Behavior recognition and evaluation technology based on multimodal large models is still under development and has not yet been deployed in real-world settings. This project proposes a clear system framework and concrete implementation strategies, and implements them on three working examples to verify the feasibility of the proposed framework. The three working examples are dental procedure workflow evaluation, copper foil processing workflow evaluation, and component soldering anomaly evaluation, which belong to the fields of medical training and industrial SOPs. The same technology could also be applied in the future to elderly care, animal and plant observation, and other fields.
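As an illustration of what a workflow evaluation might involve, the sketch below compares an observed sequence of recognized steps against a reference SOP and reports missing or unexpected steps. This is only an assumption about how such a check could work; the source does not describe the actual evaluation method. The function `compare_to_sop` and all step names are hypothetical, and the observed steps are assumed to come from an upstream recognition model.

```python
from difflib import SequenceMatcher


def compare_to_sop(reference_steps, observed_steps):
    """List deviations of an observed step sequence from a reference SOP."""
    matcher = SequenceMatcher(a=reference_steps, b=observed_steps, autojunk=False)
    deviations = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "delete":
            # Steps present in the SOP but absent from the observation.
            deviations.append(("missing", reference_steps[i1:i2]))
        elif tag == "insert":
            # Steps observed that the SOP does not call for at this point.
            deviations.append(("unexpected", observed_steps[j1:j2]))
        elif tag == "replace":
            # A replaced run is reported as both a missing and an unexpected run.
            deviations.append(("missing", reference_steps[i1:i2]))
            deviations.append(("unexpected", observed_steps[j1:j2]))
    return deviations


# Hypothetical step names for illustration only; not taken from the project.
reference = ["clean", "apply_flux", "solder", "inspect"]
observed = ["clean", "solder", "inspect"]
deviations = compare_to_sop(reference, observed)
print(deviations)
```

An empty result means the observed sequence matches the reference exactly. A real system would likely also need to consider step timing and quality, which this sketch ignores.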

Technical Specification

Recognition accuracy: > 90%
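The accuracy figure above can be read as the fraction of evaluated samples whose predicted label matches the ground truth. The sketch below assumes that definition, since the source does not state how the figure was measured. The helper names and the sample labels are hypothetical.

```python
def recognition_accuracy(predicted, expected):
    """Fraction of samples whose predicted label equals the ground-truth label."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    if not expected:
        raise ValueError("at least one sample is required")
    correct = sum(p == e for p, e in zip(predicted, expected))
    return correct / len(expected)


def meets_spec(accuracy, threshold=0.90):
    """True when accuracy is strictly above the threshold (the listing states > 90%)."""
    return accuracy > threshold


# Hypothetical labels for illustration only; not data from the project.
expected = ["normal"] * 18 + ["abnormal"] * 2
predicted = ["normal"] * 18 + ["abnormal", "normal"]
acc = recognition_accuracy(predicted, expected)
print(acc, meets_spec(acc))
```

With 19 of the 20 hypothetical samples correct, the accuracy is 0.95, which clears the stated threshold.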

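Technical Features

1. Reduced training-data requirements: pretrained models can be fine-tuned to handle new tasks.
2. Greater model generality: multiple different tasks can be handled at the same time.
3. Increased tolerance to image variation: the model adapts to new environments and situations, and still performs well with few samples and in unseen scenes.
4. Cross-modal capability: not limited to classifying images into the categories of the training set; images can be retrieved or classified according to a user's natural-language description.
5. Strong feature extraction: multi-level image features can be extracted, including color, shape, objects, and background.
6. Stronger reasoning and understanding: complex semantic reasoning can be performed over the input images and text.
7. Continuous improvement: the system's capability improves as the underlying large models improve.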
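The cross-modal capability in item 4 is commonly realized by embedding images and text descriptions into a shared vector space and picking the description closest to the image. The sketch below assumes that approach, since the source does not name the model or method used. The three-dimensional vectors are toy values standing in for the outputs of image and text encoders, and the descriptions are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def classify_by_description(image_embedding, text_embeddings):
    """Return the description most similar to the image, plus all scores."""
    scores = {
        description: cosine_similarity(image_embedding, embedding)
        for description, embedding in text_embeddings.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Toy vectors for illustration only; a real system would obtain these
# from the image and text encoders of a multimodal model.
text_embeddings = {
    "a solder joint with a bridging defect": [0.9, 0.1, 0.0],
    "a normal solder joint": [0.1, 0.9, 0.1],
}
image_embedding = [0.8, 0.2, 0.1]
label, scores = classify_by_description(image_embedding, text_embeddings)
print(label)
```

Because the candidate classes are given as text at query time, new categories can be added by writing new descriptions, without retraining on labeled images for each class.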

Applications

Behavior evaluation in medical training, industrial SOPs, elderly care, animal and plant observation, and other fields.

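Recommended prerequisites for technology recipients (equipment)

Optical imaging, computer vision, image processing, Python software, GPU

Recommended prerequisites for technology recipients (expertise)

Knowledge of multimodal large models, Python programming skills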

Technology category: Intelligent Vision System Technology

Contact Information

Contact: 謝靜婷, Intelligent Vision System Division

Phone: +886-3-5917801 or Email: manahsieh＠itri.org.tw

Customer service hotline: +886-800-45-8899

Fax: +886-3-5917531