Multimodal, Knowledgeable and Controllable Agents in Dialogue Systems

CogAGENT provides 17 models and integrates a variety of datasets covering the features above. They are decoupled and modularized to make development and research more convenient.


CogAGENT is a toolkit for building multimodal, knowledgeable and controllable conversational agents

A conversational agent is a dialogue system that processes natural language and replies in human language. We provide 17 models and integrate a variety of datasets covering multimodal, knowledgeable and controllable dialogue. We decouple and modularize them so that they are convenient to use for development and research. First, we build a multimodal interaction module comprising perception, response, enhancement and embodiment to conduct multimodal conversations. Second, we leverage external knowledge to enhance internal dialogue contexts through a knowledgeable response module. Third, we implement controllable generation modules for empathetic replies and dialogue safety.



A multimodal, knowledgeable and controllable conversational framework

We propose a unified framework named CogAGENT, which incorporates a Multimodal Module, a Knowledgeable Module and a Controllable Module to conduct multimodal interaction, generate knowledgeable responses and produce controlled replies in real scenarios.
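As a rough illustration of how three decoupled modules could be chained into one agent, here is a minimal Python sketch. It is not the CogAGENT API: the names `Turn`, `multimodal_module`, `make_knowledgeable_module`, `make_controllable_module` and `run_pipeline` are hypothetical, and the toy keyword lookup and word filter only stand in for real perception, knowledge retrieval and safety models.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class Turn:
    """State passed between modules for one dialogue turn (hypothetical)."""
    text: str
    image_caption: str = ""  # stand-in for the output of a perception model
    knowledge: List[str] = field(default_factory=list)
    reply: str = ""


Module = Callable[[Turn], Turn]


def multimodal_module(turn: Turn) -> Turn:
    # Fold the perceived visual context into the textual context.
    if turn.image_caption:
        turn.text = f"{turn.text} [image: {turn.image_caption}]"
    return turn


def make_knowledgeable_module(kb: Dict[str, str]) -> Module:
    # Toy retrieval: keep every fact whose key occurs in the context.
    def module(turn: Turn) -> Turn:
        lowered = turn.text.lower()
        turn.knowledge = [fact for key, fact in kb.items() if key in lowered]
        return turn
    return module


def make_controllable_module(blocked: Set[str]) -> Module:
    # Toy control step: draft a reply, then replace it if it contains
    # a blocked word (a placeholder for a real safety classifier).
    def module(turn: Turn) -> Turn:
        draft = turn.knowledge[0] if turn.knowledge else "Could you tell me more?"
        if any(word in draft.lower() for word in blocked):
            draft = "I'd rather not comment on that."
        turn.reply = draft
        return turn
    return module


def run_pipeline(turn: Turn, modules: List[Module]) -> Turn:
    # Apply the modules in order; each one sees the output of the previous one.
    for module in modules:
        turn = module(turn)
    return turn
```

Because each module only reads and writes the shared `Turn` object, any one of them can be replaced or removed without touching the others, which is the kind of decoupling the framework description points to.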

Comprehensive conversational models, datasets and metrics

CogAGENT implements 17 conversational models covering task-oriented dialogue, open-domain dialogue and question-answering tasks. We also integrate some widely used conversational datasets and metrics to verify the performance of models.

Open-source and modularized conversational toolkit

We release CogAGENT as an open-source toolkit and modularize conversational agents to provide easy-to-use interfaces. Users can therefore modify the code for their own customized models or datasets.
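One common way a modular toolkit lets users plug in their own models is a name-based registry. The Python sketch below shows that general pattern only. `MODEL_REGISTRY`, `register_model`, `build_model` and `EchoModel` are hypothetical names for illustration and are not taken from the CogAGENT codebase.

```python
from typing import Callable, Dict

# Maps a model name to a zero-argument factory (hypothetical registry).
MODEL_REGISTRY: Dict[str, Callable[[], object]] = {}


def register_model(name: str):
    """Class decorator that records a model class under a lookup name."""
    def decorator(cls):
        if name in MODEL_REGISTRY:
            raise ValueError(f"model {name!r} is already registered")
        MODEL_REGISTRY[name] = cls
        return cls
    return decorator


@register_model("echo")
class EchoModel:
    """A trivial custom model used only to demonstrate registration."""

    def respond(self, utterance: str) -> str:
        return f"You said: {utterance}"


def build_model(name: str):
    # Look the model up by name and instantiate it.
    try:
        factory = MODEL_REGISTRY[name]
    except KeyError:
        available = sorted(MODEL_REGISTRY)
        raise KeyError(f"unknown model {name!r}; available: {available}") from None
    return factory()
```

With this pattern, adding a customized model means writing one decorated class. The code that builds and runs agents selects it by name and does not need to change.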

Online dialogue system

We release an online system in which conversational agents interact with users. We also provide a short video that illustrates how to use it.

Available datasets of CogAGENT

Dataset | Category | Reference
MultiWOZ 2.0 | Fundamental | MultiWOZ - A Large-Scale Multi-Domain Wizard-of-Oz Dataset for Task-Oriented Dialogue Modelling
MultiWOZ 2.1 | Fundamental | MultiWOZ 2.1: A Consolidated Multi-Domain Dialogue Dataset with State Corrections and State Tracking Baselines
Chinese chitchat Dataset | Fundamental | Chinese chitchat
MOD | Multimodal | DSTC10-Track1
MMConvQA | Multimodal | MMCoQA: Conversational Question Answering over Text, Tables, and Images
OK-VQA | Multimodal | Ok-vqa: A visual question answering benchmark requiring external knowledge
VQAv2 | Multimodal | Making the v in vqa matter: Elevating the role of image understanding in visual question answering
WAY | Multimodal | Where Are You? Localization from Embodied Dialog
Wizard of Wikipedia | Knowledgeable | Wizard of Wikipedia: Knowledge-Powered Conversational Agents
Holl-E | Knowledgeable | Towards Exploiting Background Knowledge for Building Conversation Systems
OpenDialKG | Knowledgeable | OpenDialKG: Explainable Conversational Reasoning with Attention-based Walks over Knowledge Graphs
DIASAFETY | Controllable | On the Safety of Conversational Models: Taxonomy, Dataset, and Benchmark
EmpatheticDialogues | Controllable | Towards Empathetic Open-domain Conversation Models: A New Benchmark and Dataset

Enhancer module details of CogAGENT

Model | Category | Reference
SUMBT | Fundamental | SUMBT: Slot-Utterance Matching for Universal and Scalable Belief Tracking
SC-LSTM | Fundamental | Semantically conditioned lstm-based natural language generation for spoken dialogue systems
BERTNLU | Fundamental | ConvLab-2: An Open-Source Toolkit for Building, Evaluating, and Diagnosing Dialogue Systems
MDRG | Fundamental | Towards end-to-end multi-domain dialogue modelling
UBAR | Fundamental | Towards fully end-to-end task-oriented dialog system with gpt-
GPT2 for Chinese chitchat | Fundamental | Chinese chitchat
TransResNet-Ret | Multimodal | Image-Chat: Engaging Grounded Conversations
MMBERT | Multimodal | Selecting Stickers in Open-Domain Dialogue through Multitask Learning
MAE | Multimodal | MMCoQA: Conversational Question Answering over Text, Tables, and Images
PICa | Multimodal | An empirical study of gpt-3 for few-shot knowledge-based vqa
LingUNet | Multimodal | Where Are You? Localization from Embodied Dialog
DiffKS | Knowledgeable | Difference-aware Knowledge Selection for Knowledge-grounded Conversation Generation
KE-Blender | Knowledgeable | Knowledge Enhanced Fine-Tuning for Better Handling Unseen Entities in Dialogue Generation
NPH | Knowledgeable | Neural Path Hunter: Reducing Hallucination in Dialogue Systems via Path Grounding
BERTQA | Knowledgeable | Dense Passage Retrieval for Open-Domain Question Answering
KEMP | Controllable | OpenDialKG: Explainable Conversational Reasoning with Attention-based Walks over Knowledge Graphs
RobertaClassifier | Controllable | On the Safety of Conversational Models: Taxonomy, Dataset, and Benchmark