The goal is to let intelligent agents interpret and learn high-level user intents that span multiple mobile apps, e.g., to plan a dinner one may need to use Yelp -> Maps -> SMS, etc.
There are several ways to train app embeddings. You can use doc2vec on app descriptions to project each app into a semantic space. Alternatively, you can collect streams of app invocations from people's smartphones, treat each stream as a sentence of words (one app per token), and apply word2vec.
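As a rough illustration, here is a minimal sketch of both approaches using gensim. The toy descriptions and the `app_sequences.txt` log file are hypothetical placeholders, not files shipped with this dataset.

```python
from gensim.models import Word2Vec
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

# Option 1: doc2vec over app descriptions -- one document per app.
app_descriptions = [
    ("Yelp", "find nearby restaurants and read reviews"),
    ("Maps", "get driving directions and traffic updates"),
    ("SMS",  "send and receive text messages"),
]
docs = [TaggedDocument(words=text.split(), tags=[app])
        for app, text in app_descriptions]
d2v = Doc2Vec(docs, vector_size=50, min_count=1, epochs=40)

# Option 2: word2vec over invocation streams -- each line of the
# (hypothetical) log file is one session, each token one app name.
sessions = [line.split() for line in open("app_sequences.txt")]
w2v = Word2Vec(sessions, vector_size=50, window=5, min_count=1)

# Apps invoked in similar contexts end up close in the embedding space.
print(w2v.wv.most_similar("Yelp", topn=3))
```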
In the `sequence_labeling` directory you will find the following:
- train, test, dev splits for app sequences (`train.apps.int`, `test.apps.int`, `dev.apps.int`). The numeric ids correspond to the apps listed in the `apps.csv` file.
- B/I/O tagging information for the app sequences (`train.labels.int`, `test.labels.int`, `dev.labels.int`). The numeric ids correspond to the labels listed in the `labels.csv` file.
- CRFSuite sequence labeling models for these sequences (a minimal training sketch using these files appears after this list).
- App invocation sequences collected from 19 users' Android phones (`R1.csv`).
- Clean app sequences (apps irrelevant to the intents removed) with user intents annotated by participants (`R2.csv`).
- Speech commands (both manual transcripts and Google ASR 1-best hypotheses) at the app level, re-enacting part of the intents in `R2.csv` (`R3.csv`).
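For reference, here is a minimal sketch of training a B/I/O tagger on these files with `python-crfsuite`. It assumes each `.int` file holds one space-separated integer sequence per line; the feature set (current app id plus its neighbors) is an illustrative choice, not necessarily what produced the released models.

```python
import pycrfsuite

def read_int_file(path):
    """Assumes one space-separated integer sequence per line."""
    with open(path) as f:
        return [line.split() for line in f if line.strip()]

def features(seq, i):
    # Unigram and neighbor features over app ids.
    return [
        "app=" + seq[i],
        "prev=" + (seq[i - 1] if i > 0 else "BOS"),
        "next=" + (seq[i + 1] if i < len(seq) - 1 else "EOS"),
    ]

X = [[features(seq, i) for i in range(len(seq))]
     for seq in read_int_file("sequence_labeling/train.apps.int")]
y = read_int_file("sequence_labeling/train.labels.int")

trainer = pycrfsuite.Trainer(verbose=False)
for xseq, yseq in zip(X, y):
    trainer.append(xseq, yseq)
trainer.train("bio_tagger.crfsuite")

tagger = pycrfsuite.Tagger()
tagger.open("bio_tagger.crfsuite")
print(tagger.tag(X[0]))  # predicted B/I/O label ids for the first sequence
```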
Please cite the following work if you use this dataset in your research.
@CONFERENCE {sunSLT2016,
author = "Ming Sun, Aasish Pappu, Yun-Nung Chen, Alexander I Rudnicky",
title = "Weakly Supervised User Intent Detection for Multi-Domain Dialogues",
booktitle = "IEEE Workshop on Spoken Language Technology",
year = "2016",
publisher = "IEEE"
}
You can find a video demo here: https://youtu.be/FvQto8pP1OU
Creative Commons License 1.0
For any questions/suggestions contact: [email protected], [email protected]