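Zero-Shot Action Recognition Technology - Technology Transfer - Industry Services - Industrial Technology Research Institute (ITRI)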


Technology Name: Zero-Shot Action Recognition Technology

Technology Overview

This technology builds on a single-image (image-based) vision-language pretraining model (VLP, also referred to as a VLM) and combines it with a large language model (LLM) and temporal-sequence logical reasoning. It targets action recognition for Standard Operating Procedure (SOP) execution captured from a first-person (egocentric) viewpoint. Compared with the cube embeddings used by video-based VLMs in the literature, the image-based approach requires substantially less computation, which addresses the practical problem of prohibitively expensive computing resources.
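The overview describes scoring individual frames with an image-based model and then reasoning over time. The following is a minimal sketch of that general idea, not the actual system: it assumes each frame and each candidate action label has already been turned into an embedding vector (the toy vectors below are made up), picks the most similar label per frame by cosine similarity, and then applies a sliding-window majority vote as a simple stand-in for the temporal reasoning step, which the source does not detail. The LLM component is not shown, and the function names `classify_frame` and `smooth_labels` are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def classify_frame(frame_emb, label_embs):
    """Return the label whose embedding is most similar to the frame embedding."""
    return max(label_embs, key=lambda name: cosine(frame_emb, label_embs[name]))


def smooth_labels(labels, window=3):
    """Replace each frame label with the majority label in a centred window."""
    half = window // 2
    smoothed = []
    for i in range(len(labels)):
        lo, hi = max(0, i - half), min(len(labels), i + half + 1)
        smoothed.append(Counter(labels[lo:hi]).most_common(1)[0][0])
    return smoothed


# Assumption: toy embeddings standing in for real text/image encoder outputs.
label_embs = {
    "pick up screwdriver": [1.0, 0.0, 0.0],
    "tighten screw": [0.0, 1.0, 0.0],
}
frame_embs = [
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.1],
    [0.2, 0.9, 0.0],  # isolated misleading frame
    [0.9, 0.2, 0.0],
    [0.8, 0.1, 0.0],
    [0.1, 0.9, 0.1],
    [0.0, 1.0, 0.0],
    [0.1, 0.8, 0.0],
]

raw = [classify_frame(f, label_embs) for f in frame_embs]
smoothed = smooth_labels(raw, window=3)
```

In this toy run the third frame is labelled "tighten screw" on its own, but the majority vote over its neighbours relabels it as "pick up screwdriver". Because only one image is scored per frame, the per-frame cost does not depend on clip length, which is the intuition behind the reduced computation claimed above.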

Technical Specifications

• Zero-shot action recognition (first-person perspective): Top-5 accuracy > 90%
• Operation assessment module: reaction time < 3 seconds per action
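For readers unfamiliar with the top-k metric used in the specification, here is a small illustrative sketch of how top-k accuracy is commonly computed: a prediction counts as correct if the true label appears among the k highest-ranked candidates. The clips, labels, and the helper name `top_k_accuracy` are invented for illustration and are not the evaluation data or code behind the figures above.

```python
def top_k_accuracy(ranked_predictions, ground_truth, k):
    """Fraction of samples whose true label is among the top-k ranked predictions."""
    hits = sum(
        1
        for ranked, truth in zip(ranked_predictions, ground_truth)
        if truth in ranked[:k]
    )
    return hits / len(ground_truth)


# Assumption: hypothetical ranked outputs for four clips, best guess first.
ranked = [
    ["open panel", "close panel", "press button"],
    ["close panel", "open panel", "press button"],
    ["press button", "open panel", "close panel"],
    ["open panel", "press button", "close panel"],
]
truth = ["open panel", "open panel", "close panel", "close panel"]

top1 = top_k_accuracy(ranked, truth, k=1)
top2 = top_k_accuracy(ranked, truth, k=2)
top3 = top_k_accuracy(ranked, truth, k=3)
```

Top-k accuracy can only stay the same or rise as k grows, so a top-5 figure is always at least as high as the top-1 figure for the same system.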

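Technical Features

The system's top-1 accuracy averages 85% (ranging from 77% to 91%). In addition, because it uses an image-based VLM, its computational load is far lower than that of a video-based VLM, which makes it feasible in terms of computing cost.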

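Applications

Can be used for action assessment in a wide range of settings, such as education, training, maintenance, gaming, and advertising. It is particularly relevant to high-risk operations.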
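As one illustration of what action assessment against an SOP could look like, the sketch below collapses a stream of per-frame action labels into an ordered list of actions and compares it with the expected SOP steps. This is an assumption about how such a check might be structured, not a description of the actual assessment module; the step names and the helpers `collapse` and `check_sop` are hypothetical.

```python
from itertools import groupby


def collapse(frame_labels):
    """Merge runs of identical consecutive frame labels into single actions."""
    return [label for label, _ in groupby(frame_labels)]


def check_sop(observed, expected):
    """Compare observed actions with expected SOP steps; return (ok, message)."""
    for i, step in enumerate(expected):
        if i >= len(observed):
            return False, f"missing step {i + 1}: {step}"
        if observed[i] != step:
            return False, f"step {i + 1}: expected {step!r}, observed {observed[i]!r}"
    if len(observed) > len(expected):
        return False, f"unexpected extra step: {observed[len(expected)]}"
    return True, "all steps completed in order"


# Assumption: a made-up three-step procedure for a high-risk maintenance task.
expected_steps = ["put on gloves", "switch off power", "open panel"]

good_frames = ["put on gloves"] * 3 + ["switch off power"] * 2 + ["open panel"] * 4
bad_frames = ["put on gloves"] * 2 + ["open panel"] * 3  # power never switched off

good_result = check_sop(collapse(good_frames), expected_steps)
bad_result = check_sop(collapse(bad_frames), expected_steps)
```

Here the second sequence fails at step 2 because the panel is opened before the power is switched off, which is the kind of ordering error that matters most in high-risk work.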

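Recommended Prerequisites for Licensees (Equipment)

Personal computer

Recommended Prerequisites for Licensees (Expertise)

Software design skills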

Technology Category: Intelligent Vision System Technology

Contact Information

Contact: 林均蔓, Intelligent Vision Systems Division

Tel: +886-3-5916705 or Email: jmlin@itri.org.tw

Customer Service Hotline: +886-800-45-8899

Fax: +886-3-5917531