Datasets

MAVEN-ERE & OntoEvent-Doc

🍎 This is a repository for the OntoEvent-Doc dataset.

💡 Note that MAVEN_ERE was proposed in a paper and released on GitHub, where its detailed data schema is described.

Data File Structure 🪜

The structure of the data files (after unzipping MAVEN_ERE and OntoEvent-Doc) is as follows:

SPEECH
└── Datasets
    ├── MAVEN_ERE   
    │   ├── train.jsonl     # for training
    │   ├── test.jsonl      # for testing
    │   └── valid.jsonl     # for validation
    ├── OntoEvent-Doc
    │   ├── event_dict_label_data.json      # containing all event type labels
    │   ├── event_dict_on_doc_train.json    # for training
    │   ├── event_dict_on_doc_test.json     # for testing
    │   └── event_dict_on_doc_valid.json    # for validation
    └── README.md
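
Once both archives are unzipped, the files can be read with the Python standard library. The following is a minimal sketch, assuming the layout above, that each line of a `.jsonl` file holds one JSON object, and that each OntoEvent-Doc split file is a single JSON object keyed by doc_id. The helper names `read_jsonl`, `read_json` and `load_splits` are hypothetical and not part of this repository.

```python
import json
from pathlib import Path

# Relative paths copied from the tree above; adjust them if you unzip elsewhere.
MAVEN_ERE_FILES = {
    "train": "MAVEN_ERE/train.jsonl",
    "valid": "MAVEN_ERE/valid.jsonl",
    "test": "MAVEN_ERE/test.jsonl",
}
ONTOEVENT_DOC_FILES = {
    "train": "OntoEvent-Doc/event_dict_on_doc_train.json",
    "valid": "OntoEvent-Doc/event_dict_on_doc_valid.json",
    "test": "OntoEvent-Doc/event_dict_on_doc_test.json",
}


def read_jsonl(path):
    """Read a .jsonl file, assuming one JSON object per non-empty line."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def read_json(path):
    """Read a file that contains a single JSON value."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def load_splits(root, files, reader):
    """Load every split whose file exists under root (hypothetical helper)."""
    root = Path(root)
    return {
        split: reader(root / rel)
        for split, rel in files.items()
        if (root / rel).exists()
    }
```

For example, `load_splits("SPEECH/Datasets", ONTOEVENT_DOC_FILES, read_json)` would return a dictionary with one entry per split found on disk.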

Brief Introduction 📣

OntoEvent-Doc, formatted at the document level, is derived from OntoEvent, which is formatted at the sentence level.

Statistics 📊

The statistics of MAVEN-ERE and OntoEvent-Doc are shown below.

Dataset	#Document	#Mention	#Temporal	#Causal	#Subevent
MAVEN-ERE	4,480	112,276	1,216,217	57,992	15,841
OntoEvent-Doc	4,115	60,546	5,914	14,155	/

Data Format 🔍

For the data schema of MAVEN-ERE, please refer to its GitHub repository. Experiments on MAVEN-ERE in our paper involve:

6 temporal relations: BEFORE, OVERLAP, CONTAINS, SIMULTANEOUS, BEGINS-ON, ENDS-ON
2 causal relations: CAUSE, PRECONDITION
1 subevent relation: subevent_relations

Experiments on OntoEvent-Doc in our paper involve:

3 temporal relations: BEFORE, AFTER, EQUAL
2 causal relations: CAUSE, CAUSEDBY

For both datasets, we also add an NA relation to signify that there is no relation between an event mention pair.
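
The label sets listed above can be written down as simple label-to-id maps. This is only an illustrative sketch: the relation names are copied from the lists above, while the grouping into dictionaries, the helper `build_label_map`, and the choice of giving NA the id 0 are assumptions, not necessarily the encoding used in the paper's code.

```python
NA = "NA"  # "no relation" label added for both datasets

# Relation names copied from the lists above.
MAVEN_ERE_RELATIONS = {
    "temporal": ["BEFORE", "OVERLAP", "CONTAINS", "SIMULTANEOUS", "BEGINS-ON", "ENDS-ON"],
    "causal": ["CAUSE", "PRECONDITION"],
    "subevent": ["subevent_relations"],
}
ONTOEVENT_DOC_RELATIONS = {
    "temporal": ["BEFORE", "AFTER", "EQUAL"],
    "causal": ["CAUSE", "CAUSEDBY"],
}


def build_label_map(labels):
    """Map NA to 0 and every relation label to a consecutive integer id.

    Placing NA at index 0 is an assumption made for this sketch.
    """
    return {label: i for i, label in enumerate([NA] + list(labels))}


temporal_map = build_label_map(ONTOEVENT_DOC_RELATIONS["temporal"])
```

With this layout, each task (temporal, causal, subevent) gets its own classifier label space, and NA is always available as the fallback class.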

🍒 The OntoEvent-Doc dataset is stored in JSON format. Each document in OntoEvent-Doc (identified by a doc_id, e.g., 95dd35ce7dd6d377c963447eef47c66c) contains a list of "events" and a dictionary of "relations", in the following format:

[a doc_id]:
{
    "events": [
    {
        'doc_id': '...', 
        'doc_title': 'XXX', 
        'sent_id': , 
        'event_mention': '......', 
        'event_mention_tokens': ['.', '.', '.', '.', '.', '.'], 
        'trigger': '...', 
        'trigger_pos': [, ], 
        'event_type': ''
    },
    {
        'doc_id': '...', 
        'doc_title': 'XXX', 
        'sent_id': , 
        'event_mention': '......', 
        'event_mention_tokens': ['.', '.', '.', '.', '.', '.'], 
        'trigger': '...', 
        'trigger_pos': [, ], 
        'event_type': ''
    },
    ... 
    ],
    "relations": { // each event-relation contains a list of 'sent_id' pairs.  
        "COSUPER": [[,], [,], [,]], 
        "SUBSUPER": [], 
        "SUPERSUB": [], 
        "CAUSE": [[,], [,]], 
        "BEFORE": [[,], [,]], 
        "AFTER": [[,], [,]], 
        "CAUSEDBY": [[,], [,]], 
        "EQUAL": [[,], [,]]
    }
}
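
To illustrate how this format can be consumed, the sketch below turns one document into labeled event pairs. It assumes, following the comment in the format above, that each relation lists pairs of 'sent_id' values, and it assigns NA to every ordered pair that is not listed. The function `label_pairs` is hypothetical, and `toy_doc` is invented purely for illustration; it is not taken from the dataset.

```python
from itertools import permutations


def label_pairs(doc, relation_types, na_label="NA"):
    """Return {(sent_id_a, sent_id_b): label} for one document.

    Pairs listed under one of relation_types get that relation as label;
    every other ordered pair of distinct sent_ids gets na_label.
    """
    labels = {}
    for rel in relation_types:
        for a, b in doc["relations"].get(rel, []):
            labels[(a, b)] = rel
    # Deduplicate sent_ids while keeping their order of appearance.
    sent_ids = list(dict.fromkeys(event["sent_id"] for event in doc["events"]))
    for pair in permutations(sent_ids, 2):
        labels.setdefault(pair, na_label)
    return labels


# Invented toy document that follows the schema above (not real data).
toy_doc = {
    "events": [
        {"sent_id": 0, "trigger": "attacked", "event_type": "Attack"},
        {"sent_id": 1, "trigger": "fled", "event_type": "Escaping"},
        {"sent_id": 2, "trigger": "arrested", "event_type": "Arrest"},
    ],
    "relations": {
        "BEFORE": [[0, 1]],
        "AFTER": [[1, 0]],
        "CAUSE": [[0, 2]],
        "EQUAL": [],
    },
}

temporal_labels = label_pairs(toy_doc, ["BEFORE", "AFTER", "EQUAL"])
```

Calling `label_pairs` once per task (temporal, then causal) keeps the two label spaces separate, so a pair that is causally related can still be NA for the temporal task.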
