MIntRec2.0

Features • Download • Dataset Description • Benchmark Framework • Quick start

MIntRec2.0 is a large-scale multimodal multi-party benchmark dataset for intent recognition and out-of-scope detection in conversations. We also provide a benchmark framework and evaluation code.

Example:

Updates 🔥 🔥 🔥

Date	Announcements
1/2024	🎆 🎆 The first large-scale multimodal intent dataset has been released. Refer to the directory MIntRec2.0 for the dataset and code. Read the paper -- MIntRec2.0: A Large-scale Benchmark Dataset for Multimodal Intent Recognition and Out-of-scope Detection in Conversations (Published in ICLR 2024).
10/2022	🎆 🎆 The first multimodal intent dataset has been published. Refer to the directory MIntRec for the dataset and code. Read the paper -- MIntRec: A New Dataset for Multimodal Intent Recognition (Published in ACM MM 2022).

Features

MIntRec2.0 has the following features:

Large in Scale: Compared with the first version of our multimodal intent recognition dataset (MIntRec), MIntRec2.0 increases the data scale from 2.2K to 15K, with 30 intent classes and 9.3K in-scope and 5.7K out-of-scope annotated utterances across text, video, and audio modalities.
Multi-turn & Multi-party Dialogues: It contains 1,245 dialogues with an average of 12 utterances per dialogue in continuous conversations. Every utterance in a dialogue has an intent label. Each dialogue has at least two different speakers, with an annotated speaker identity for each utterance.
Out-of-scope Detection: Because real-world dialogues take place in open-world scenarios, as suggested in TEXTOIR, we further include an OOS tag for utterances that do not belong to any of the existing intent classes. These utterances can be used for out-of-distribution detection and to improve system robustness.
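
To make the annotation scheme above more concrete, here is a minimal Python sketch of how one dialogue could be represented in memory. The class names, field names, and the `OOS_LABEL` value are assumptions made for illustration; they are not the file format or API used in this repository, and the demo utterances are made up rather than taken from the dataset.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple

OOS_LABEL = "OOS"  # hypothetical tag value for out-of-scope utterances


@dataclass
class Utterance:
    """One annotated utterance; all field names are illustrative assumptions."""
    speaker: str                      # annotated speaker identity
    text: str                         # transcript of the utterance
    intent: str                       # one of the 30 intent classes, or OOS_LABEL
    video_path: Optional[str] = None  # optional pointer to the video clip
    audio_path: Optional[str] = None  # optional pointer to the audio clip

    @property
    def is_in_scope(self) -> bool:
        return self.intent != OOS_LABEL


@dataclass
class Dialogue:
    dialogue_id: str
    utterances: List[Utterance]       # kept in chronological order

    def speakers(self) -> Set[str]:
        """Distinct speakers; each dialogue is expected to have at least two."""
        return {u.speaker for u in self.utterances}

    def split_by_scope(self) -> Tuple[List[Utterance], List[Utterance]]:
        """Return (in-scope, out-of-scope) utterances."""
        in_scope = [u for u in self.utterances if u.is_in_scope]
        out_of_scope = [u for u in self.utterances if not u.is_in_scope]
        return in_scope, out_of_scope


# Made-up utterances for demonstration only (not from the dataset).
demo = Dialogue(
    dialogue_id="demo-0",
    utterances=[
        Utterance("A", "Thanks for covering my shift.", "thank"),
        Utterance("B", "No problem.", "acknowledge"),
        Utterance("A", "It rained all night.", OOS_LABEL),
    ],
)
```

Keeping the OOS tag in the same `intent` field as the in-scope labels means a single pass over a dialogue can separate the two groups, which is convenient when they feed different loss terms during training.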

Download

Zenodo

The brief version of the dataset (text, video feature files, and audio feature files; 7G) can be downloaded from Zenodo.

Feature data

We provide video feature files, audio feature files, and text annotation files (9G), which can be downloaded from Google Drive.

Raw data

We also provide raw video data (13G), which can be downloaded from Google Drive.

Dataset Description

Data sources: The raw videos are collected from three TV series: Superstore, The Big Bang Theory, and Friends.
Dialogue division: We manually divide dialogues based on scenes and episodes.
Speaker information: We manually annotate 21, 7, and 6 main characters in Superstore, The Big Bang Theory, and Friends, respectively.
Intent classes
- Express emotions or attitudes (16): doubt, acknowledge, refuse, warn, emphasize, complain, praise, apologize, thank, criticize, care, agree, oppose, taunt, flaunt, joke
- Achieve goals (14): ask for opinions, confirm, explain, invite, plan, inform, advise, arrange, introduce, comfort, leave, prevent, greet, ask for help
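
The two-level taxonomy above can be written down as plain Python mappings. This is a sketch only: the label strings are copied from the list above, but the variable names, the ordering of the integer ids, and the choice to give OOS the id after the last in-scope class are assumptions, not the label mapping shipped with the dataset.

```python
# Fine-grained intents, grouped by their coarse-grained category.
EXPRESS_EMOTIONS_OR_ATTITUDES = [
    "doubt", "acknowledge", "refuse", "warn", "emphasize", "complain",
    "praise", "apologize", "thank", "criticize", "care", "agree",
    "oppose", "taunt", "flaunt", "joke",
]
ACHIEVE_GOALS = [
    "ask for opinions", "confirm", "explain", "invite", "plan", "inform",
    "advise", "arrange", "introduce", "comfort", "leave", "prevent",
    "greet", "ask for help",
]

# Map every fine-grained intent to its coarse-grained category.
FINE_TO_COARSE = {name: "Express emotions or attitudes"
                  for name in EXPRESS_EMOTIONS_OR_ATTITUDES}
FINE_TO_COARSE.update({name: "Achieve goals" for name in ACHIEVE_GOALS})

# Hypothetical integer ids: in-scope classes first, OOS as the extra class.
INTENT_TO_ID = {name: i for i, name in
                enumerate(EXPRESS_EMOTIONS_OR_ATTITUDES + ACHIEVE_GOALS)}
OOS_ID = len(INTENT_TO_ID)

print(len(INTENT_TO_ID), "in-scope classes; OOS id =", OOS_ID)
```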

Statistics

Item	Statistics
Number of coarse-grained intents	2
Number of fine-grained intents	30
Number of dialogues	1,245
Number of utterances	15,040
Number of words in utterances	118,477
Number of unique words in utterances	9,524
Average length of utterances	7.0
Maximum length of utterances	46
Average video clip duration	3.0 (s)
Maximum video clip duration	19.9 (s)
Video hours	12.3 (h)
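
As a quick sanity check, some of the headline figures in this README can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers that appear in this document (the rounded 9.3K and 5.7K counts come from the Features section); the variable names are illustrative.

```python
# Figures copied from this README; variable names are illustrative.
num_dialogues = 1245
num_utterances = 15040
in_scope_k, out_of_scope_k = 9.3, 5.7  # rounded counts, in thousands

# Average dialogue length, to compare with "an average of 12 utterances".
utterances_per_dialogue = num_utterances / num_dialogues

# Share of out-of-scope utterances, based on the rounded counts.
oos_share = out_of_scope_k / (in_scope_k + out_of_scope_k)

print(f"{utterances_per_dialogue:.1f} utterances per dialogue")
print(f"{oos_share:.0%} of utterances are out-of-scope")
```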

Data distribution of in-scope (IS) and out-of-scope (OOS) samples:

Intent distribution:

Benchmark Framework

We present a framework to benchmark multimodal intent understanding and out-of-scope detection in both single-turn and multi-turn conversational scenarios.

The overall framework:

The framework contains five main modules:

Data Organization: Single-turn dialogues use utterance-level samples as inputs. Multi-turn dialogues are arranged chronologically based on the order in which the speakers take their turn.
Multimodal Feature Extraction: Features are extracted from the text, video, and audio modalities. For multi-turn dialogues, we concatenate the context information with the current utterance and separate them with a special token.
Multimodal Fusion: Multimodal fusion methods (e.g., MAG-BERT, MulT) can be used for fusing different modalities.
Training: In-scope data uses a cross-entropy loss. Out-of-scope data uses an outlier exposure loss. Training may also include a multimodal fusion loss for capturing cross-modal interactions.
Inference: An open-set recognition method (e.g., DOC) can be used to identify the K known classes and detect one out-of-scope class.
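
The modules above can be illustrated with a small, dependency-free Python sketch covering context concatenation, the two training losses, and open-set inference. Several things here are assumptions rather than the repository's implementation: the function names are hypothetical, `[SEP]` stands in for whatever special token the tokenizer uses, the outlier exposure term is written in its common "cross-entropy to the uniform distribution" form, and the inference rule is a simplified DOC-style check with one fixed threshold instead of statistically fitted per-class thresholds.

```python
import math
from typing import List, Sequence

SEP = "[SEP]"  # assumed special token; the real one depends on the tokenizer


def build_multiturn_input(context: Sequence[str], current: str,
                          sep: str = SEP) -> str:
    """Join earlier utterances (chronological order) with the current one."""
    return f" {sep} ".join([*context, current])


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Loss for an in-scope sample whose class index is known."""
    return -math.log(softmax(logits)[target])


def outlier_exposure(logits: Sequence[float]) -> float:
    """Loss for an out-of-scope sample: cross-entropy to the uniform
    distribution, which pushes the prediction away from any known class."""
    probs = softmax(logits)
    return -sum(math.log(p) for p in probs) / len(probs)


def predict_open_set(logits: Sequence[float], threshold: float = 0.5) -> int:
    """Simplified DOC-style rule: score each of the K known classes with a
    sigmoid and return K (out-of-scope) when no score reaches the threshold."""
    scores = [1.0 / (1.0 + math.exp(-x)) for x in logits]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best if scores[best] >= threshold else len(scores)


# Toy usage with K = 3 known classes and made-up logits.
text = build_multiturn_input(["Hi.", "Hello."], "How are you?")
in_scope_loss = cross_entropy([2.0, -1.0, 0.5], target=0)
oos_loss = outlier_exposure([2.0, -1.0, 0.5])
confident = predict_open_set([2.0, -1.0, 0.5])    # some score passes
rejected = predict_open_set([-2.0, -1.0, -0.5])   # no score passes
```

The outlier exposure term is smallest when the logits are equal across classes, so minimizing it on out-of-scope samples flattens the predicted distribution, which in turn makes the thresholded rejection at inference time more likely to fire.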

Quick start

Use Anaconda to create a Python environment

conda create --name MIntRec python=3.9
conda activate MIntRec

Install PyTorch (CUDA version 11.3)

conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch

Clone the MIntRec2.0 repository.

git clone git@github.com:thuiar/MIntRec2.0.git
cd MIntRec2.0

Install the required dependencies
```
pip install -r requirements.txt
```
Run examples (taking MAG-BERT as an example; more scripts can be found in the examples directory)
```
sh examples/run_mag_bert_baselines.sh
```

Citations

If this work is helpful, or you want to use the code and results in this repo, please cite the following papers:

MIntRec2.0: A Large-scale Benchmark Dataset for Multimodal Intent Recognition and Out-of-scope Detection in Conversations
MIntRec: A New Dataset for Multimodal Intent Recognition

@inproceedings{
  zhang2024mintrec,
  title={{MI}ntRec2.0: A Large-scale Benchmark Dataset for Multimodal Intent Recognition and Out-of-scope Detection in Conversations},
  author={Hanlei Zhang and Xin Wang and Hua Xu and Qianrui Zhou and Kai Gao and Jianhua Su and Jinyue Zhao and Wenrui Li and Yanting Chen},
  booktitle={The Twelfth International Conference on Learning Representations},
  year={2024},
  url={https://openreview.net/forum?id=nY9nITZQjc}
}

@inproceedings{MIntRec,
   author = {Zhang, Hanlei and Xu, Hua and Wang, Xin and Zhou, Qianrui and Zhao, Shaojie and Teng, Jiayan},
   title = {MIntRec: A New Dataset for Multimodal Intent Recognition},
   year = {2022},
   booktitle = {Proceedings of the 30th ACM International Conference on Multimedia},
   pages = {1688--1697},
}

The dataset and the camera-ready version of the paper will be updated soon.
