CogAGENT
CogAGENT is a toolkit for building multimodal, knowledgeable and controllable conversational agents
A conversational agent is a dialogue system that processes natural language and replies in human language. CogAGENT provides 17 models and integrates a variety of datasets covering the features above. We decouple and modularize them flexibly so that users can develop and do research more conveniently. First, we build a multimodal interaction module comprising perception, response, enhancement and embodiment components to conduct multimodal conversations. Second, we leverage external knowledge to enrich the internal dialogue context via a knowledgeable response module. Third, we implement controllable generation modules for empathetic replies and dialogue safety.
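The three-module decomposition can be pictured as a simple pipeline. The sketch below is illustrative only: the class and method names are our assumptions for exposition, not CogAGENT's actual API. It shows how decoupled perception, knowledge enhancement and control stages compose into one agent.

```python
# Illustrative sketch only: these class and method names are assumptions
# for exposition, not CogAGENT's actual API.

class MultimodalModule:
    """Perceives (possibly multimodal) input and builds a dialogue context."""
    def perceive(self, utterance, image=None):
        context = {"text": utterance}
        if image is not None:
            context["image"] = image
        return context

class KnowledgeableModule:
    """Enhances the dialogue context with external knowledge."""
    def __init__(self, knowledge_base):
        self.kb = knowledge_base  # toy stand-in for a real knowledge source
    def enhance(self, context):
        facts = [fact for key, fact in self.kb.items()
                 if key in context["text"].lower()]
        return {**context, "knowledge": facts}

class ControllableModule:
    """Keeps the reply under control (e.g. dialogue safety filtering)."""
    BLOCKLIST = {"insult"}
    def control(self, reply):
        return "[filtered]" if any(w in reply for w in self.BLOCKLIST) else reply

class Agent:
    """Composes the three decoupled modules into one conversational agent."""
    def __init__(self, mm, km, cm):
        self.mm, self.km, self.cm = mm, km, cm
    def respond(self, utterance, image=None):
        ctx = self.km.enhance(self.mm.perceive(utterance, image))
        reply = ctx["knowledge"][0] if ctx["knowledge"] else "Tell me more."
        return self.cm.control(reply)

agent = Agent(
    MultimodalModule(),
    KnowledgeableModule({"paris": "Paris is the capital of France."}),
    ControllableModule(),
)
print(agent.respond("What do you know about Paris?"))
# -> Paris is the capital of France.
```

Because each stage only depends on the context dictionary passed between them, any single module can be swapped for a different model without touching the others, which is the point of the modular design.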
Contribution
A multimodal, knowledgeable and controllable conversational framework
We propose a unified framework named CogAGENT, incorporating a Multimodal Module, a Knowledgeable Module and a Controllable Module to conduct multimodal interaction, generate knowledgeable responses and keep replies under control in real-world scenarios.
Comprehensive conversational models, datasets and metrics
CogAGENT implements 17 conversational models covering task-oriented dialogue, open-domain dialogue and question-answering tasks. We also integrate some widely used conversational datasets and metrics to verify the performance of models.
Open-source and modularized conversational toolkit
We release CogAGENT as an open-source toolkit and modularize conversational agents to provide easy-to-use interfaces. Hence, users can modify the code for their own customized models or datasets.
Online dialogue system
We release an online system, which supports conversational agents to interact with users. We also provide a short video to illustrate how to use it.
Available datasets of CogAGENT
Dataset | Category | Reference |
---|---|---|
MultiWOZ 2.0 | Fundamental | MultiWOZ - A Large-Scale Multi-Domain Wizard-of-Oz Dataset for Task-Oriented Dialogue Modelling |
MultiWOZ 2.1 | Fundamental | MultiWOZ 2.1: A Consolidated Multi-Domain Dialogue Dataset with State Corrections and State Tracking Baselines |
Chinese chitchat Dataset | Fundamental | Chinese chitchat |
MOD | Multimodal | DSTC10-Track1 |
MMConvQA | Multimodal | MMCoQA: Conversational Question Answering over Text, Tables, and Images |
OK-VQA | Multimodal | OK-VQA: A Visual Question Answering Benchmark Requiring External Knowledge |
VQAv2 | Multimodal | Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering |
WAY | Multimodal | Where Are You? Localization from Embodied Dialog |
Wizard of Wikipedia | Knowledgeable | Wizard of Wikipedia: Knowledge-Powered Conversational Agents |
Holl-E | Knowledgeable | Towards Exploiting Background Knowledge for Building Conversation Systems |
OpenDialKG | Knowledgeable | OpenDialKG: Explainable Conversational Reasoning with Attention-based Walks over Knowledge Graphs |
DIASAFETY | Controllable | On the Safety of Conversational Models: Taxonomy, Dataset, and Benchmark |
EmpatheticDialogues | Controllable | Towards Empathetic Open-domain Conversation Models: A New Benchmark and Dataset |
Available models of CogAGENT
Model | Category | Reference |
---|---|---|
SUMBT | Fundamental | SUMBT: Slot-Utterance Matching for Universal and Scalable Belief Tracking |
SC-LSTM | Fundamental | Semantically Conditioned LSTM-based Natural Language Generation for Spoken Dialogue Systems |
BERTNLU | Fundamental | ConvLab-2: An Open-Source Toolkit for Building, Evaluating, and Diagnosing Dialogue Systems |
MDRG | Fundamental | Towards end-to-end multi-domain dialogue modelling |
UBAR | Fundamental | UBAR: Towards Fully End-to-End Task-Oriented Dialog System with GPT-2 |
GPT2 for Chinese chitchat | Fundamental | Chinese chitchat |
TransResNet-Ret | Multimodal | Image-Chat: Engaging Grounded Conversations |
MMBERT | Multimodal | Selecting Stickers in Open-Domain Dialogue through Multitask Learning |
MAE | Multimodal | MMCoQA: Conversational Question Answering over Text, Tables, and Images |
PICa | Multimodal | An Empirical Study of GPT-3 for Few-Shot Knowledge-Based VQA |
LingUNet | Multimodal | Where Are You? Localization from Embodied Dialog |
DiffKS | Knowledgeable | Difference-aware Knowledge Selection for Knowledge-grounded Conversation Generation |
KE-Blender | Knowledgeable | Knowledge Enhanced Fine-Tuning for Better Handling Unseen Entities in Dialogue Generation |
NPH | Knowledgeable | Neural Path Hunter: Reducing Hallucination in Dialogue Systems via Path Grounding |
BERTQA | Knowledgeable | Dense Passage Retrieval for Open-Domain Question Answering |
KEMP | Controllable | Knowledge Bridging for Empathetic Dialogue Generation |
RobertaClassifier | Controllable | On the Safety of Conversational Models: Taxonomy, Dataset, and Benchmark |