🍎 This is the repository for the OntoEvent-Doc dataset.
💡 Note that MAVEN-ERE was proposed in a paper and released on GitHub, where its detailed data schema is described.
The structure of the data files is as follows (this requires unzipping MAVEN_ERE and OntoEvent-Doc first):
SPEECH
└── Datasets
    ├── MAVEN_ERE
    │   ├── train.jsonl    # for training
    │   ├── test.jsonl     # for testing
    │   └── valid.jsonl    # for validation
    ├── OntoEvent-Doc
    │   ├── event_dict_label_data.json    # containing all event type labels
    │   ├── event_dict_on_doc_train.json  # for training
    │   ├── event_dict_on_doc_test.json   # for testing
    │   └── event_dict_on_doc_valid.json  # for validation
    └── README.md

OntoEvent-Doc is a document-level dataset derived from OntoEvent, which is formatted at the sentence level.
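As a minimal sketch of reading the files listed in the tree above, the snippet below loads one split of either dataset. The helper names (`load_json`, `load_jsonl`, `load_split`) and the `data_dir` argument are hypothetical and not part of this repository; the sketch only assumes the file names shown above, with `data_dir` pointing at the `Datasets` folder.

```python
import json
from pathlib import Path


def load_json(path):
    """Read a single JSON file (the format of the OntoEvent-Doc splits)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def load_jsonl(path):
    """Read a JSON Lines file (the format of the MAVEN_ERE splits).

    Each non-empty line is parsed as one JSON object.
    """
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def load_split(data_dir, dataset, split):
    """Hypothetical helper: map (dataset, split) to a file in the tree above.

    `split` is one of "train", "test", "valid".
    """
    data_dir = Path(data_dir)
    if dataset == "MAVEN_ERE":
        return load_jsonl(data_dir / "MAVEN_ERE" / f"{split}.jsonl")
    if dataset == "OntoEvent-Doc":
        return load_json(data_dir / "OntoEvent-Doc" / f"event_dict_on_doc_{split}.json")
    raise ValueError(f"unknown dataset: {dataset}")
```

For example, `load_split("SPEECH/Datasets", "OntoEvent-Doc", "train")` would return the parsed content of `event_dict_on_doc_train.json`, provided the archive has been unzipped to that location.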
The statistics of MAVEN-ERE and OntoEvent-Doc are shown below.
| Dataset | #Document | #Mention | #Temporal | #Causal | #Subevent |
|---|---|---|---|---|---|
| MAVEN-ERE | 4,480 | 112,276 | 1,216,217 | 57,992 | 15,841 |
| OntoEvent-Doc | 4,115 | 60,546 | 5,914 | 14,155 | / |
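Counts of this kind can be approximated from a loaded OntoEvent-Doc split. The sketch below is only an illustration: `count_statistics` is a hypothetical helper, it assumes each split is a dictionary keyed by `doc_id` whose values hold an `"events"` list and a `"relations"` dictionary (the format described later in this README), and it simply sums the listed pairs. How the numbers in the table were actually counted is not stated here, so the output may not match them exactly (for instance, BEFORE and AFTER pairs may mirror each other).

```python
from collections import Counter

# Relation names used for OntoEvent-Doc in this README.
TEMPORAL = {"BEFORE", "AFTER", "EQUAL"}
CAUSAL = {"CAUSE", "CAUSEDBY"}


def count_statistics(docs):
    """Count documents, event mentions, and temporal/causal pairs.

    `docs` is assumed to be a dict: doc_id -> {"events": [...], "relations": {...}}.
    Relations outside TEMPORAL and CAUSAL are ignored.
    """
    stats = Counter()
    for doc in docs.values():
        stats["documents"] += 1
        stats["mentions"] += len(doc["events"])
        for relation, pairs in doc["relations"].items():
            if relation in TEMPORAL:
                stats["temporal"] += len(pairs)
            elif relation in CAUSAL:
                stats["causal"] += len(pairs)
    return stats
```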
For the data schema of MAVEN-ERE, refer to its GitHub repository. Experiments on MAVEN-ERE in our paper involve:
- 6 temporal relations: BEFORE, OVERLAP, CONTAINS, SIMULTANEOUS, BEGINS-ON, ENDS-ON
- 2 causal relations: CAUSE, PRECONDITION
- 1 subevent relation: subevent_relations
Experiments on OntoEvent-Doc in our paper involve:
- 3 temporal relations: BEFORE, AFTER, EQUAL
- 2 causal relations: CAUSE, CAUSEDBY
For both datasets, we also add an NA relation to indicate that an event mention pair has no relation.
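The relation inventories above, together with the extra NA label, can be turned into label-to-index maps along the following lines. This is a sketch under assumptions: the constant and function names are hypothetical, building one map per relation group (temporal, causal, subevent) is an illustrative choice, and placing NA at index 0 is a convention chosen here, not something this README specifies.

```python
# Relation names copied from the lists above.
MAVEN_ERE_RELATIONS = {
    "temporal": ["BEFORE", "OVERLAP", "CONTAINS", "SIMULTANEOUS", "BEGINS-ON", "ENDS-ON"],
    "causal": ["CAUSE", "PRECONDITION"],
    "subevent": ["subevent_relations"],
}
ONTOEVENT_DOC_RELATIONS = {
    "temporal": ["BEFORE", "AFTER", "EQUAL"],
    "causal": ["CAUSE", "CAUSEDBY"],
}


def build_label_maps(relations, na_label="NA"):
    """Build one label-to-index map per relation group.

    Assumption: NA (no relation) is prepended to every group and gets index 0.
    """
    return {
        group: {label: idx for idx, label in enumerate([na_label] + names)}
        for group, names in relations.items()
    }


onto_maps = build_label_maps(ONTOEVENT_DOC_RELATIONS)
maven_maps = build_label_maps(MAVEN_ERE_RELATIONS)
```

With this convention, `onto_maps["causal"]` maps NA, CAUSE, and CAUSEDBY to 0, 1, and 2 respectively.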
🍒 The OntoEvent-Doc dataset is stored in JSON format. Each document in OntoEvent-Doc (identified by a doc_id, e.g., 95dd35ce7dd6d377c963447eef47c66c) contains a list of "events" and a dictionary of "relations", in the following format:
[a doc_id]:
{
    "events": [
        {
            'doc_id': '...',
            'doc_title': 'XXX',
            'sent_id': ,
            'event_mention': '......',
            'event_mention_tokens': ['.', '.', '.', '.', '.', '.'],
            'trigger': '...',
            'trigger_pos': [, ],
            'event_type': ''
        },
        {
            'doc_id': '...',
            'doc_title': 'XXX',
            'sent_id': ,
            'event_mention': '......',
            'event_mention_tokens': ['.', '.', '.', '.', '.', '.'],
            'trigger': '...',
            'trigger_pos': [, ],
            'event_type': ''
        },
        ...
    ],
    "relations": { // each event-relation contains a list of 'sent_id' pairs.
        "COSUPER": [[,], [,], [,]],
        "SUBSUPER": [],
        "SUPERSUB": [],
        "CAUSE": [[,], [,]],
        "BEFORE": [[,], [,]],
        "AFTER": [[,], [,]],
        "CAUSEDBY": [[,], [,]],
        "EQUAL": [[,], [,]]
    }
}
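Given this format, the 'sent_id' pairs under "relations" can be resolved back to the corresponding event entries. The sketch below is illustrative only: `iter_relation_pairs` is a hypothetical helper, it assumes each event within a document has a distinct 'sent_id', and the toy document uses made-up values rather than real dataset content.

```python
def iter_relation_pairs(doc):
    """Yield (relation, first_event, second_event) for every pair in one document.

    Assumption: 'sent_id' values are unique within a document, so they can
    serve as lookup keys for the entries in doc["events"].
    """
    events_by_sent_id = {event["sent_id"]: event for event in doc["events"]}
    for relation, pairs in doc["relations"].items():
        for first_id, second_id in pairs:
            yield relation, events_by_sent_id[first_id], events_by_sent_id[second_id]


# Toy document with invented values (not taken from the dataset).
toy_doc = {
    "events": [
        {"doc_id": "toy", "sent_id": 0, "trigger": "attacked", "event_type": "Attack"},
        {"doc_id": "toy", "sent_id": 1, "trigger": "fled", "event_type": "Escape"},
    ],
    "relations": {
        "BEFORE": [[0, 1]],
        "AFTER": [[1, 0]],
        "CAUSE": [[0, 1]],
        "EQUAL": [],
    },
}

for relation, first, second in iter_relation_pairs(toy_doc):
    print(relation, first["trigger"], second["trigger"])
```

Event pairs of a document that appear under none of the relation keys would be the candidates for the NA label.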