CogAGENT
CogAGENT is a toolkit for building multimodal, knowledgeable and controllable conversational agents
A conversational agent is a dialogue system that processes natural language and replies in human language. CogAGENT provides 17 models and integrates a variety of datasets covering the features above. We decouple and modularize them flexibly so that users can develop and do research more conveniently. First, we build a multimodal interaction module comprising perception, response, enhancement and embodiment components to conduct multimodal conversations. Second, we leverage external knowledge to enrich the internal dialogue context via a knowledgeable response module. Third, we implement controllable generation modules for empathetic replies and dialogue safety.
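The three-module decomposition can be pictured as a simple pipeline. The sketch below is illustrative only: the class and method names are our assumptions for exposition, not CogAGENT's actual API. It shows how decoupled perception, knowledge enhancement and control stages compose into one agent.

```python
# Illustrative sketch only: these class and method names are assumptions
# for exposition, not CogAGENT's actual API.

class MultimodalModule:
    """Perceives (possibly multimodal) input and builds a dialogue context."""
    def perceive(self, utterance, image=None):
        context = {"text": utterance}
        if image is not None:
            context["image"] = image
        return context

class KnowledgeableModule:
    """Enhances the dialogue context with external knowledge."""
    def __init__(self, knowledge_base):
        self.kb = knowledge_base  # toy stand-in for a real knowledge source
    def enhance(self, context):
        facts = [fact for key, fact in self.kb.items()
                 if key in context["text"].lower()]
        return {**context, "knowledge": facts}

class ControllableModule:
    """Keeps the reply under control (e.g. dialogue safety filtering)."""
    BLOCKLIST = {"insult"}
    def control(self, reply):
        return "[filtered]" if any(w in reply for w in self.BLOCKLIST) else reply

class Agent:
    """Composes the three decoupled modules into one conversational agent."""
    def __init__(self, mm, km, cm):
        self.mm, self.km, self.cm = mm, km, cm
    def respond(self, utterance, image=None):
        ctx = self.km.enhance(self.mm.perceive(utterance, image))
        reply = ctx["knowledge"][0] if ctx["knowledge"] else "Tell me more."
        return self.cm.control(reply)

agent = Agent(
    MultimodalModule(),
    KnowledgeableModule({"paris": "Paris is the capital of France."}),
    ControllableModule(),
)
print(agent.respond("What do you know about Paris?"))
# -> Paris is the capital of France.
```

Because each stage only depends on the context dictionary passed between them, any single module can be swapped for a different model without touching the others, which is the point of the modular design.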
Contribution
A multimodal, knowledgeable and controllable conversational framework
We propose a unified framework named CogAGENT, incorporating a Multimodal Module, a Knowledgeable Module and a Controllable Module to conduct multimodal interaction, generate knowledgeable responses and keep replies under control in real-world scenarios.
Comprehensive conversational models, datasets and metrics
CogAGENT implements 17 conversational models covering task-oriented dialogue, open-domain dialogue and question-answering tasks. We also integrate some widely used conversational datasets and metrics to verify the performance of models.
Open-source and modularized conversational toolkit
We release CogAGENT as an open-source toolkit and modularize conversational agents to provide easy-to-use interfaces. Hence, users can modify the code for their own customized models or datasets.
Online dialogue system
We release an online system, which supports conversational agents to interact with users. We also provide a short video to illustrate how to use it.
Available datasets of CogAGENT
Dataset | Category | Reference |
---|---|---|
MultiWOZ 2.0 | Fundamental | MultiWOZ - A Large-Scale Multi-Domain Wizard-of-Oz Dataset for Task-Oriented Dialogue Modelling |
MultiWOZ 2.1 | Fundamental | MultiWOZ 2.1: A Consolidated Multi-Domain Dialogue Dataset with State Corrections and State Tracking Baselines |
Chinese chitchat Dataset | Fundamental | Chinese chitchat |
MOD | Multimodal | DSTC10-Track1 |
MMConvQA | Multimodal | MMCoQA: Conversational Question Answering over Text, Tables, and Images |
OK-VQA | Multimodal | OK-VQA: A Visual Question Answering Benchmark Requiring External Knowledge |
VQAv2 | Multimodal | Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering |
WAY | Multimodal | Where Are You? Localization from Embodied Dialog |
Wizard of Wikipedia | Knowledgeable | Wizard of Wikipedia: Knowledge-Powered Conversational Agents |
Holl-E | Knowledgeable | Towards Exploiting Background Knowledge for Building Conversation Systems |
OpenDialKG | Knowledgeable | OpenDialKG: Explainable Conversational Reasoning with Attention-based Walks over Knowledge Graphs |
DIASAFETY | Controllable | On the Safety of Conversational Models: Taxonomy, Dataset, and Benchmark |
EmpatheticDialogues | Controllable | Towards Empathetic Open-domain Conversation Models: A New Benchmark and Dataset |
Available models of CogAGENT
Model | Category | Reference |
---|---|---|
SUMBT | Fundamental | SUMBT: Slot-Utterance Matching for Universal and Scalable Belief Tracking |
SC-LSTM | Fundamental | Semantically Conditioned LSTM-based Natural Language Generation for Spoken Dialogue Systems |
BERTNLU | Fundamental | ConvLab-2: An Open-Source Toolkit for Building, Evaluating, and Diagnosing Dialogue Systems |
MDRG | Fundamental | Towards end-to-end multi-domain dialogue modelling |
UBAR | Fundamental | UBAR: Towards Fully End-to-End Task-Oriented Dialog System with GPT-2 |
GPT2 for Chinese chitchat | Fundamental | Chinese chitchat |
TransResNet-Ret | Multimodal | Image-Chat: Engaging Grounded Conversations |
MMBERT | Multimodal | Selecting Stickers in Open-Domain Dialogue through Multitask Learning |
MAE | Multimodal | MMCoQA: Conversational Question Answering over Text, Tables, and Images |
PICa | Multimodal | An Empirical Study of GPT-3 for Few-Shot Knowledge-Based VQA |
LingUNet | Multimodal | Where Are You? Localization from Embodied Dialog |
DiffKS | Knowledgeable | Difference-aware Knowledge Selection for Knowledge-grounded Conversation Generation |
KE-Blender | Knowledgeable | Knowledge Enhanced Fine-Tuning for Better Handling Unseen Entities in Dialogue Generation |
NPH | Knowledgeable | Neural Path Hunter: Reducing Hallucination in Dialogue Systems via Path Grounding |
BERTQA | Knowledgeable | Dense Passage Retrieval for Open-Domain Question Answering |
KEMP | Controllable | Knowledge Bridging for Empathetic Dialogue Generation |
RobertaClassifier | Controllable | On the Safety of Conversational Models: Taxonomy, Dataset, and Benchmark |