orensul/analogies_mining

🎪 Life is a Circus and We are the Clowns 🤡: Automatically Finding Analogies between Situations and Processes

This repository contains the code for the paper: https://arxiv.org/abs/2210.12197.
Authors: Oren Sultan, Dafna Shahaf, The Hebrew University of Jerusalem, Israel.
Conference: The Conference on Empirical Methods in Natural Language Processing (EMNLP 2022).

Setup

The code is implemented in Python 3.8.12. To run it, install the dependencies listed in the requirements file:

pip install -r minimalrequirements.txt

Where to start?

Explore the paper_experiments_results folder to reproduce the results of the experiments (each inner folder contains a separate README file).
Run runner.py to run our algorithm on a specific example of pairs of texts.
Note that you don't need to run coreference and QA-SRL, as their output files already exist in the repo. Run them only if you use new input text files; the run_coref and run_qasrl flags of the analogous_matching_algorithm function control this (leave both set to False to reuse the existing outputs).
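
The decision described above (re-run coreference and QA-SRL only for new input texts) can be sketched as a small check. This is an illustrative sketch, not code from the repository: the helper name needs_preprocessing is hypothetical, and it assumes that a coreference output in data/coref_text_files has the same file name as its input in data/original_text_files, which the repository may not guarantee.

```python
from pathlib import Path


def needs_preprocessing(text_name: str, data_root: Path) -> bool:
    """Return True when no coreference output exists yet for text_name.

    Hypothetical helper. Assumption: the coreference output shares the
    file name of the original input text.
    """
    original = data_root / "original_text_files" / text_name
    resolved = data_root / "coref_text_files" / text_name
    if not original.is_file():
        raise FileNotFoundError(f"missing input text: {original}")
    # A missing coreference output suggests the text is new to the repo.
    return not resolved.is_file()
```

Under these assumptions, the result for a given text could be used to choose the values of run_coref and run_qasrl before calling the algorithm.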

Important folders

paper_experiments_results:
Contains the datasets, the annotators' labels, and the data used to generate the results in the figures and tables of the three experiments. Each inner folder contains a separate README file.

data:
Includes the following folders:
original_text_files -- all the original text files (including the stories and paragraphs from ProPara).
coref_text_files -- all the text files after coreference (including the stories and paragraphs from ProPara).
propara -- data files relevant to the ProPara dataset, output files of the ranking lists for the different models (see Section 4.1 in the paper), and some code files to read and print stats on ProPara and the methods.

s2e-coref:
Contains the implementation code for the coreference model that we used (see Section 3.1 in the paper).

qasrl-modeling:
Contains the implementation code for the QA-SRL model that we used (see Section 3.2 in the paper).

Important .py files

Algorithm's code files

runner.py -- runner of our analogous matching algorithm on given pairs.
find_mappings.py -- run the FMQ method on a given pair of texts (external callers use the generate_mappings function).
find_mappings_verbs.py -- run the FMV method on a given pair of texts (external callers use the generate_mappings function).
sentence_bert.py -- run SBERT on a given pair of texts.
coref.py -- run our coreference implementation on input files.
qa-srl.py -- run our QA-SRL implementation on text files (after coref).

Experiment's code files

run_propara_all_pairs_exp.py -- run experiment 1 (see Section 4.1 in the paper).
analogies_mining_exp_annotators_consistency.py -- compute the annotators' consistency confusion matrix (see Section 4.1 in the paper).
run_mappings_evaluation_exp.py -- run experiment 2 (see Section 4.2 in the paper).
run_robustness_to_paraphrases_exp.py -- run experiment 3 (see Section 4.3 in the paper).
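
For readers unfamiliar with an annotators' consistency confusion matrix, the following is a minimal sketch of the general idea: count how often each pair of labels is assigned by two annotators to the same item. It is not the code in analogies_mining_exp_annotators_consistency.py; the function names and the example labels are hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence, Tuple


def confusion_matrix(
    first: Sequence[Hashable], second: Sequence[Hashable]
) -> Dict[Tuple[Hashable, Hashable], int]:
    """Count (label by annotator 1, label by annotator 2) pairs per item."""
    if len(first) != len(second):
        raise ValueError("both annotators must label the same items")
    if not first:
        raise ValueError("at least one labeled item is required")
    return dict(Counter(zip(first, second)))


def agreement_rate(first: Sequence[Hashable], second: Sequence[Hashable]) -> float:
    """Fraction of items on which both annotators chose the same label."""
    matrix = confusion_matrix(first, second)
    total = sum(matrix.values())
    agreed = sum(count for (a, b), count in matrix.items() if a == b)
    return agreed / total


# Hypothetical labels, for illustration only.
annotator_1 = ["analogy", "analogy", "not_analogy", "not_analogy"]
annotator_2 = ["analogy", "not_analogy", "not_analogy", "not_analogy"]
matrix = confusion_matrix(annotator_1, annotator_2)
```

The diagonal entries (same label from both annotators) measure agreement, and the off-diagonal entries show which labels the annotators confuse.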

Cite

@article{sultan2022life,
title={Life is a Circus and We are the Clowns: Automatically Finding Analogies between Situations and Processes},
author={Sultan, Oren and Shahaf, Dafna},
journal={arXiv preprint arXiv:2210.12197},
year={2022}
}

Contact

For inquiries, please send an email to [email protected].
