This project hosts a dataset of four entity types. For each type, the entity and sentence counts, sample entities, and sample sentences are listed below:
- Location (Small)
  - Number of Entities: 1000
  - Number of Sentences: 567
  - Sample Entities: Chicopee, Erlanger
  - Sample Sentences:
    - "I live in Chicopee."
    - "Erlanger and Minnetonka are gorgeous."
- Location (Large)
  - Number of Entities: 2818
  - Number of Sentences: 1628
  - Sample Entities: Albemarle, Saratoga
  - Sample Sentences:
    - "The route passes through Albemarle and Saratoga."
    - "Merthyr Tydfil is gorgeous."
- Person Names
  - Number of Entities: 4348
  - Number of Sentences: 4220
  - Sample Entities: Beaufort, Zebadiah
  - Sample Sentences:
    - "Zebadiah always makes sure to include others."
    - "Eberhard loves to skydive."
- Drugs
  - Number of Entities: 2874
  - Number of Sentences: 2874
  - Sample Entities: Aclovate, Primidone
  - Sample Sentences:
    - "Is Primidone safe to use during pregnancy?"
    - "What is the dosage for Fluothane?"
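The counts above can be kept in a small structure for quick sanity checks after download. The following is a minimal sketch, not part of the released dataset: the `EntitySet` class and `entities_per_sentence` helper are hypothetical names introduced here, and only the numbers come from the list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EntitySet:
    """Hypothetical container for one entity type's published counts."""
    name: str
    num_entities: int
    num_sentences: int


# Counts copied from the list above.
ENTITY_SETS = [
    EntitySet("Location (Small)", 1000, 567),
    EntitySet("Location (Large)", 2818, 1628),
    EntitySet("Person Names", 4348, 4220),
    EntitySet("Drugs", 2874, 2874),
]


def entities_per_sentence(entity_set: EntitySet) -> float:
    """Ratio of listed entities to listed sentences for one entity type."""
    return entity_set.num_entities / entity_set.num_sentences


for entity_set in ENTITY_SETS:
    print(
        f"{entity_set.name:<18} "
        f"{entity_set.num_entities:>5} entities  "
        f"{entity_set.num_sentences:>5} sentences  "
        f"{entities_per_sentence(entity_set):.2f} entities/sentence"
    )
```

Comparing these figures against the downloaded files is one way to confirm that an archive was extracted completely.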
Audio files can be downloaded from: https://drive.google.com/file/d/1utVz2jYshJmJfXDhY2PPqfiargEzt0yI/view?usp=share_link
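This dataset can be used to evaluate contextual biasing in speech recognition, where an ASR model is steered toward a supplied list of rare words such as the place, person, and drug names above.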
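As an illustration of how the entity lists and sentences relate in a biasing setup, the sketch below finds which dictionary entities occur in each sentence. It is only an assumption about how one might pair the two. The on-disk format of the dataset is not described here, so the sample entities and sentences are hard-coded from the list above, and `find_entities` and `build_biasing_lists` are hypothetical helper names.

```python
import re


def find_entities(sentence: str, entities: list[str]) -> list[str]:
    """Return the entities that occur in `sentence` as whole words (case-insensitive)."""
    found = []
    for entity in entities:
        # re.escape keeps names with punctuation or spaces from being read as regex syntax.
        pattern = r"\b" + re.escape(entity) + r"\b"
        if re.search(pattern, sentence, flags=re.IGNORECASE):
            found.append(entity)
    return found


def build_biasing_lists(sentences: list[str], entities: list[str]) -> dict[str, list[str]]:
    """Map each sentence to the dictionary entities it mentions."""
    return {sentence: find_entities(sentence, entities) for sentence in sentences}


# Sample data taken from the Location (Small) entry above.
sample_entities = ["Chicopee", "Erlanger"]
sample_sentences = [
    "I live in Chicopee.",
    "Erlanger and Minnetonka are gorgeous.",
]

for sentence, matched in build_biasing_lists(sample_sentences, sample_entities).items():
    print(f"{sentence!r} -> {matched}")
```

Whole-word matching is used so that a short name does not match inside a longer word. A real pipeline would likely also need text normalization that matches the ASR model's output vocabulary.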
This dataset is provided under the Creative Commons Attribution-NonCommercial 4.0 International License. You are free to:
- Share: Copy and redistribute the material in any medium or format.
- Adapt: Remix, transform, and build upon the material.
Under the following terms:
- Attribution: You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
- NonCommercial: You may not use the material for commercial purposes.
If you use this dataset in your research or project, please cite it as follows:
@inproceedings{mittal2023speech,
  title={Speech-enriched Memory for Inference-time Adaptation of ASR Models to Word Dictionaries},
  author={Mittal, Ashish and Sarawagi, Sunita and Jyothi, Preethi and Saon, George and Kurata, Gakuto},
  booktitle={Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing},
  year={2023}
}