Cross-Modal Understanding

Published: 2024-11-21

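Our research centers on vision-text cross-modal understanding. For multi-view, multimodal visual content, we study understanding at multiple levels, including content-based question answering, content explanation, and commonsense reasoning. We approach the problem at two levels, perception and cognition. At the perception level, the challenges are content that is hard to localize, tangled associations between modalities, biased data, and heavy information redundancy. At the cognition level, the challenges are reasoning chains that are hard to construct, essential information that is easily overlooked, insufficient commonsense knowledge, and statistical bias. To address these challenges, we study theories and techniques such as efficient fine-tuning of large models, cross-modal alignment, causal debiasing, and knowledge fusion, and we have proposed representative methods including adaptive prompt learning and two-stage commonsense fusion. These methods achieve efficient cross-modal understanding with accurate answers, clear descriptions, and fluent narration. The work is expected to find application in areas such as smart education and industrial robot control.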



Representative Papers


[1] Bowen Yuan, Sisi You, Bing-Kun Bao*. Self-PT: Adaptive Self-Prompt Tuning for Low-Resource Visual Question Answering. ACM International Conference on Multimedia (ACM MM) 2023


[2] Mengqi Yuan, Gengyun Jia, Bing-Kun Bao*. GPT-based Knowledge Guiding Network for Commonsense Video Captioning. IEEE Transactions on Multimedia (TMM) 2023


[3] Pengju Li, Zhiyi Tan, Bing-Kun Bao*. Multiview Language Bias Reduction for Visual Question Answering. IEEE Multimedia 2023


[4] Mengqi Yuan, Bing-Kun Bao*, Zhiyi Tan, Changsheng Xu. Adaptive Text Denoising Network for Image Caption Editing. ACM Transactions on Multimedia Computing, Communications and Applications (ACM TOMM) 2022


[5] Jianyu Wang, Bing-Kun Bao*, Changsheng Xu. DualVGR: A Dual-Visual Graph Reasoning Unit for Video Question Answering. IEEE Transactions on Multimedia (TMM) 2022

Contact Us

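Address: Computer Science Building, Xianlin Campus, Nanjing University of Posts and Telecommunications, 9 Wenyuan Road, Xianlin University Town, Qixia District, Nanjing, Jiangsu Province, China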

Phone: 13813992640 (contact: Jia)

Email: bingkunbao@njupt.edu.cn (contact: Bao)