Commit efc1ec2

author
Jinho Choi
committed
Release 1.0.
0 parents  commit efc1ec2

5 files changed

+995363
-0
lines changed

LICENSE.txt

Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
Copyright 2017, Emory University

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

README.md

Lines changed: 91 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,91 @@
# Reading Comprehension

Reading Comprehension challenges a machine's ability to understand human conversations.
This is a part of the [Character Mining](../../../character-mining) project led by the [Emory NLP](http://nlp.mathcs.emory.edu) research group.
The following shows a multiparty dialogue in which all personal proper nouns are encoded by entity IDs (`@ent01`: Joey, `@ent02`: Rachel, `@ent03`: Ross, `@ent04`: Neuman, `@ent05`: Paul), followed by a plot summary describing the dialogue.

| Speaker | Utterance |
|:-------:|-----------|
| - | [Scene: Central Perk, `@ent01` and `@ent02` are there as `@ent03` enters.] |
| `@ent03` | Hey! Oh, I’m so glad you guys are here. I’ve been dying to tell someone what happened in the Paleontology department today. |
| `@ent01` | (To `@ent02`) Do you think he saw us or can we still sneak out? |
| `@ent03` | Professor `@ent04`, the head of the department, so ... |
| `@ent02` | They made you head of the department! |
| `@ent03` | No, I get to teach one of his advanced classes! Why didn’t I get head of the department? |
| `@ent01` | Oh! Hey `@ent02`, listen umm ... |
| `@ent02` | Yeah. |
| `@ent01` | I got a big date coming up, do you know a good restaurant? |
| `@ent02` | Uh, `@ent05`’s Cafe. They got great food and it’s really romantic. |
| `@ent01` | Ooh, great! Thanks! |
| `@ent02` | Yeah! Oh, and then afterwards you can take her to the Four Seasons for drinks. Or you go downtown and listen to some jazz. Or dancing - Oh! Take her dancing! |
| `@ent01` | You sure are naming a lot of ways to postpone xxx, I’ll tell ya ... |
| `@ent02` | Ooh, I miss dating. Gettin’ all dressed up and going to a fancy restaurant. I’m not gonna be able to do that for so long, and it’s so much fun! I mean not that sitting at home worrying about giving birth to a sixteen pound baby is not fun. |
| `@ent01` | Hey, y’know what? |
| `@ent02` | Huh? |
| `@ent01` | Why don’t I take you out? |
| `@ent02` | What?! `@ent01`, you don’t want to go on a date with a pregnant lady. |
| `@ent01` | Yes I do! And we’re gonna go out, we’re gonna have a good time, and take your mind off of childbirth and c-sections and-and giant baby heads stretching out ... |
| `@ent02` | (interrupting) Okay! I’ll go with ya! I’ll go! I’ll go with ya. |
| `@ent01` | It’ll be fun. |
| `@ent02` | All right? |

<p align="center"><code>@placeholder</code> misses dressing up for romantic dates so <code>@ent01</code> promises to take <code>@ent02</code> out.
</p>

Your task is to identify the `@placeholder` in the plot summary by inferring the meaning of the dialogue.
This task is challenging because it requires matching context between colloquial writing (the dialogue) and formal writing (the summary).

## Dataset

Personal proper nouns in both dialogues and plot summaries are encoded by their entity IDs.
Note that entity IDs are consistent within each dialogue but not across different dialogues.
Each query contains exactly one `@placeholder`, so a single plot summary can yield multiple queries.
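
As a rough illustration of how one summary can yield several queries, the sketch below masks one entity mention at a time. Whether the released queries were produced exactly this way is an assumption; `make_queries` and the `ENTITY` pattern are hypothetical names introduced here, and the sample summary is the example from the top of this README with `@ent02` filled in for the placeholder (our reading of the dialogue, not a value taken from the dataset).

```python
import re

# Assumption: entity IDs always look like "@ent" followed by digits.
ENTITY = re.compile(r"@ent\d+")


def make_queries(summary):
    """Return (query, answer) pairs, masking exactly one entity mention per query."""
    queries = []
    for match in ENTITY.finditer(summary):
        query = summary[:match.start()] + "@placeholder" + summary[match.end():]
        queries.append((query, match.group()))
    return queries


# Example summary with the placeholder filled in (an assumption, see above).
summary = ("@ent02 misses dressing up for romantic dates "
           "so @ent01 promises to take @ent02 out.")

for query, answer in make_queries(summary):
    print(answer, "|", query)
```

Each mention produces its own query, so a summary with three entity mentions gives three queries, each containing a single `@placeholder`.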

* Latest release: [v1.0](https://github.com/emorynlp/reading-comprehension/archive/reading-comprehension-1.0.tar.gz).
* [Release notes](https://github.com/emorynlp/reading-comprehension/releases).

## Statistics

Queries are randomly distributed across the training (TRN), development (DEV), and evaluation (TST) sets.

* U / Q: the average number of utterances per query.
* {E} / Q: the average number of entity types per query.
* [E] / Q: the average number of entities per query.
* {E} / U: the average number of entity types per utterance.
* [E] / U: the average number of entities per utterance.

| Dataset | Queries | U / Q | {E} / Q | [E] / Q | {E} / U | [E] / U |
|:-------:|--------:|------:|--------:|--------:|--------:|--------:|
| TRN | 10,785 | 16.71 | 2.57 | 3.99 | 5.67 | 26.64 |
| DEV | 1,349 | 16.77 | 2.56 | 3.94 | 5.68 | 26.73 |
| TST | 1,353 | 16.53 | 2.58 | 4.03 | 5.70 | 26.31 |
| Total | 13,487 | 16.70 | 2.57 | 3.99 | 5.68 | 26.62 |
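
The sketch below shows one way such averages might be computed from instances in the layout described under Annotation. It is not the script used to produce the table: `dataset_stats` is a hypothetical helper, and it assumes that "entity types" means distinct entity IDs, that "entities" means all mentions, and that the "/ U" columns are counted over all utterances belonging to a query. Whether speaker labels count as mentions is not stated here, so that is left as a flag.

```python
import re
from statistics import mean

# Assumption: entity IDs always look like "@ent" followed by digits.
ENTITY = re.compile(r"@ent\d+")


def dataset_stats(instances, include_speakers=False):
    """Average utterance and entity counts over a list of instances."""
    rows = []
    for inst in instances:
        in_query = ENTITY.findall(inst["query"])
        in_utts = []
        for utt in inst["utterances"]:
            in_utts += ENTITY.findall(utt["tokens"])
            if include_speakers:
                in_utts += ENTITY.findall(utt["speakers"])
        rows.append((
            len(inst["utterances"]),  # U / Q
            len(set(in_query)),       # {E} / Q: distinct IDs in the query
            len(in_query),            # [E] / Q: all mentions in the query
            len(set(in_utts)),        # {E} / U: distinct IDs in the utterances
            len(in_utts),             # [E] / U: all mentions in the utterances
        ))
    names = ["U / Q", "{E} / Q", "[E] / Q", "{E} / U", "[E] / U"]
    return {name: mean(col) for name, col in zip(names, zip(*rows))}


# Toy instance, made up for illustration only.
toy = [{
    "query": "@ent00 asks @placeholder about @ent00 .",
    "utterances": [
        {"speakers": "@ent00", "tokens": "Hi @ent01 ."},
        {"speakers": "@ent01", "tokens": "Hi @ent00 , where is @ent02 ?"},
    ],
}]
print(dataset_stats(toy))
```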

## Annotation

Every instance consists of a query and its answer along with the utterances that the query is based on.

```json
{
  "scene_id": "s01_e01_c10",
  "query": "After @ent02 leaves to go out on on a date with a woman whose name he has trouble remembering , @ent00 asks @placeholder ,",
  "answer": "@ent05",
  "utterances": [
    {
      "speakers": "@ent00",
      "tokens": "( scornful ) Grab a spoon . Do you know how long it 's been since I 've grabbed a spoon ? Do the words ' @ent01 , do n't be a hero ' mean anything to you ?"
    },
    {
      "speakers": "@ent02",
      "tokens": "Great story ! But , I uh , I got ta go , I got a date with @ent03 -- @ent04 -- @ent03 ... Oh man , ( looks to @ent05 )"
    },
    {
      "speakers": "@ent05",
      "tokens": "@ent04 's the screamer , @ent03 has cats ."
    },
    {
      "speakers": "@ent02",
      "tokens": "Right . Thanks . It 's June . I 'm outta here . ( Exits . )"
    },
    {
      "speakers": "@ent00",
      "tokens": "Y'know , here 's the thing . Even if I could get it together enough to - to ask a woman out , ... who am I gon na ask ? ( He gazes out of the window . )"
    },
    {
      "speakers": "",
      "tokens": "[ Cut to @ent06 staring out of her window . ]"
    }
  ]
}
```
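
A minimal sketch of working with one instance in Python follows. It parses an abbreviated copy of the instance above (two of the six utterances), collects the entity IDs that appear in the utterances as answer candidates, and fills the placeholder with a chosen entity. The helper names `candidates` and `fill` are hypothetical, and how instances are packaged in the release files is not specified here, so the example reads from an inline string rather than a file.

```python
import json
import re

# Assumption: entity IDs always look like "@ent" followed by digits.
ENTITY = re.compile(r"@ent\d+")

# Abbreviated copy of the instance shown above (only two utterances kept).
raw = '''
{
  "scene_id": "s01_e01_c10",
  "query": "After @ent02 leaves to go out on on a date with a woman whose name he has trouble remembering , @ent00 asks @placeholder ,",
  "answer": "@ent05",
  "utterances": [
    {"speakers": "@ent02", "tokens": "Great story ! But , I uh , I got ta go , I got a date with @ent03 -- @ent04 -- @ent03 ... Oh man , ( looks to @ent05 )"},
    {"speakers": "@ent05", "tokens": "@ent04 's the screamer , @ent03 has cats ."}
  ]
}
'''


def candidates(instance):
    """Sorted entity IDs mentioned by or in the utterances of an instance."""
    found = set()
    for utt in instance["utterances"]:
        found.update(ENTITY.findall(utt["speakers"]))
        found.update(ENTITY.findall(utt["tokens"]))
    return sorted(found)


def fill(instance, entity):
    """Return the query with @placeholder replaced by the given entity ID."""
    return instance["query"].replace("@placeholder", entity)


instance = json.loads(raw)
print(candidates(instance))  # ['@ent02', '@ent03', '@ent04', '@ent05']
print(fill(instance, instance["answer"]))
```

Restricting predictions to entity IDs seen in the utterances is a design choice of this sketch, not a rule stated by the dataset.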

## Citation

* [Challenging Reading Comprehension on Daily Conversation: Passage Completion on Multiparty Dialog](). Kaixin Ma, Tomasz Jurczyk, and Jinho D. Choi. In Proceedings of the 16th Annual Conference of the North American Chapter of the Association for Computational Linguistics, NAACL'18, 2018.

## Contact

* [Jinho D. Choi](http://www.mathcs.emory.edu/~choi).