PRISM

Overview

This project houses a dataset covering four entity types, each with a description and sample data.

Dataset Description

The dataset consists of four distinct entity types. For each type, the number of entities, the number of sentences, and a few samples are listed below:

Location (Small)
- Number of Entities: 1000
- Number of Sentences: 567
- Sample Entities: Chicopee, Erlanger
- Sample Sentences:
  - "I live in Chicopee."
  - "Erlanger and Minnetonka are gorgeous."

Location (Large)
- Number of Entities: 2818
- Number of Sentences: 1628
- Sample Entities: Albemarle, Saratoga
- Sample Sentences:
  - "The route passes through Albemarle and Saratoga."
  - "Merthyr Tydfil is gorgeous."

Person Names
- Number of Entities: 4348
- Number of Sentences: 4220
- Sample Entities: Beaufort, Zebadiah
- Sample Sentences:
  - "Zebadiah always makes sure to include others."
  - "Eberhard loves to skydive."

Drugs
- Number of Entities: 2874
- Number of Sentences: 2874
- Sample Entities: Aclovate, Primidone
- Sample Sentences:
  - "Is Primidone safe to use during pregnancy?"
  - "What is the dosage for Fluothane?"

Audio files can be downloaded from: https://drive.google.com/file/d/1utVz2jYshJmJfXDhY2PPqfiargEzt0yI/view?usp=share_link

Usage

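This dataset is intended for work on contextual biasing in automatic speech recognition (ASR), where a recognizer is given a list of entities, such as the place names, person names, and drug names above, and is expected to transcribe them correctly when they occur in speech.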
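One common way to judge contextual biasing is to check how many of the listed entities survive in the ASR output. The sketch below is only an illustration under stated assumptions: the function names `find_entities` and `entity_recall` are hypothetical, the reference sentences are taken from the samples above, and the hypothesis strings are invented stand-ins for ASR output, not results from any model. It does not reflect the file layout of this repository or the evaluation protocol of the cited paper.

```python
import re


def find_entities(sentence, entities):
    """Return the entities that occur in the sentence as whole words (case-insensitive)."""
    found = []
    for entity in entities:
        pattern = r"\b" + re.escape(entity) + r"\b"
        if re.search(pattern, sentence, flags=re.IGNORECASE):
            found.append(entity)
    return found


def entity_recall(pairs, entities):
    """Fraction of entities in the references that also appear in the hypotheses.

    pairs is an iterable of (reference_sentence, asr_hypothesis) tuples.
    """
    total = 0
    hits = 0
    for reference, hypothesis in pairs:
        expected = find_entities(reference, entities)
        recognized = set(find_entities(hypothesis, entities))
        total += len(expected)
        hits += sum(1 for entity in expected if entity in recognized)
    return hits / total if total else 0.0


# A tiny biasing list built from sample entities in this README.
entities = ["Chicopee", "Erlanger", "Primidone"]

# References come from the sample sentences; hypotheses are made up for illustration.
pairs = [
    ("I live in Chicopee.", "i live in chicopee"),
    ("Is Primidone safe to use during pregnancy?", "is primidone safe to use during pregnancy"),
    ("Erlanger and Minnetonka are gorgeous.", "air langer and minnetonka are gorgeous"),
]

recall = entity_recall(pairs, entities)
print(f"entity recall: {recall:.2f}")  # prints "entity recall: 0.67"
```

Whole-word, case-insensitive matching keeps the sketch simple; a real evaluation would likely also normalize punctuation and report word error rate alongside entity recall.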

License

This dataset is provided under the Creative Commons Attribution-NonCommercial 4.0 International License. You are free to:

- Share: Copy and redistribute the material in any medium or format.
- Adapt: Remix, transform, and build upon the material.

Under the following terms:

- Attribution: You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
- Non-Commercial: You may not use the material for commercial purposes.

How to Cite

If you use this dataset in your research or project, please cite it as follows:

@inproceedings{mittal2023speech,
      title={Speech-enriched Memory for Inference-time Adaptation of ASR Models to Word Dictionaries},
      author={Mittal, Ashish and Sarawagi, Sunita and Jyothi, Preethi and Saon, George and Kurata, Gakuto},
      booktitle={Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing},
      year={2023}
}
