AshishMittal/PRISM

PRISM

Overview

This project houses a dataset covering four entity types, each with a short description and sample data.

Dataset Description

The dataset consists of four subsets, one per entity type. Each subset has its own entity list and its own set of sentences:

  1. Location (Small)

    • Number of Entities: 1000
    • Number of Sentences: 567
    • Sample Entities: Chicopee, Erlanger
    • Sample Sentences:
      • "I live in Chicopee."
      • "Erlanger and Minnetonka are gorgeous."
  2. Location (Large)

    • Number of Entities: 2818
    • Number of Sentences: 1628
    • Sample Entities: Albemarle, Saratoga
    • Sample Sentences:
      • "The route passes through Albemarle and Saratoga."
      • "Merthyr Tydfil is gorgeous."
  3. Person Names

    • Number of Entities: 4348
    • Number of Sentences: 4220
    • Sample Entities: Beaufort, Zebadiah
    • Sample Sentences:
      • "Zebadiah always makes sure to include others."
      • "Eberhard loves to skydive."
  4. Drugs

    • Number of Entities: 2874
    • Number of Sentences: 2874
    • Sample Entities: Aclovate, Primidone
    • Sample Sentences:
      • "Is Primidone safe to use during pregnancy?"
      • "What is the dosage for Fluothane?"
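
As a rough way to compare the four subsets, the counts listed above can be turned into an entities-per-sentence ratio. This is a minimal sketch: the numbers are copied from the list above, while the names `SUBSETS`, `entities_per_sentence`, and `summarize` are illustrative and not part of the dataset. The ratio is only an indicator, since it assumes nothing about how entities are actually distributed across sentences.

```python
# Counts copied from the dataset description above.
# The dictionary layout and function names are illustrative assumptions,
# not something shipped with the dataset.
SUBSETS = {
    "Location (Small)": {"entities": 1000, "sentences": 567},
    "Location (Large)": {"entities": 2818, "sentences": 1628},
    "Person Names": {"entities": 4348, "sentences": 4220},
    "Drugs": {"entities": 2874, "sentences": 2874},
}


def entities_per_sentence(subset):
    """Return the number of listed entities divided by the number of sentences."""
    counts = SUBSETS[subset]
    return counts["entities"] / counts["sentences"]


def summarize():
    """Return the ratio for every subset, rounded to two decimals."""
    return {name: round(entities_per_sentence(name), 2) for name in SUBSETS}


if __name__ == "__main__":
    for name, ratio in summarize().items():
        print(f"{name}: {ratio} entities per sentence")
```

A ratio above 1 for the location subsets is consistent with sample sentences that mention two entities at once, such as "Erlanger and Minnetonka are gorgeous."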

Audio files can be downloaded from: https://drive.google.com/file/d/1utVz2jYshJmJfXDhY2PPqfiargEzt0yI/view?usp=share_link

Usage

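This dataset can be used for contextual biasing in speech recognition, for example to check how well an ASR model recognizes the location, person, and drug names in each entity list.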
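
One common way to use such a dataset is to measure how many reference entities survive in the ASR output. The sketch below computes a simple entity recall over sentence pairs. It is not the evaluation used by the dataset authors: the function names are hypothetical, the hypotheses in the usage example are made-up ASR outputs, and matching is done on lowercased ASCII tokens, which is an assumption that may not suit every entity.

```python
import re


def normalize(text):
    """Lowercase, replace anything other than a-z, 0-9 and space, and tokenize.

    Assumption: ASCII-only matching is good enough for illustration.
    """
    return re.sub(r"[^a-z0-9 ]+", " ", text.lower()).split()


def contains_entity(tokens, entity):
    """Return True if the entity's tokens occur as a contiguous run in tokens."""
    ent = normalize(entity)
    n = len(ent)
    if n == 0:
        return False
    return any(tokens[i:i + n] == ent for i in range(len(tokens) - n + 1))


def entity_recall(references, hypotheses, entities):
    """Fraction of entity mentions in the references that also appear
    in the corresponding hypotheses."""
    total = 0
    hits = 0
    for ref, hyp in zip(references, hypotheses):
        ref_tokens = normalize(ref)
        hyp_tokens = normalize(hyp)
        for entity in entities:
            if contains_entity(ref_tokens, entity):
                total += 1
                if contains_entity(hyp_tokens, entity):
                    hits += 1
    return hits / total if total else 0.0


if __name__ == "__main__":
    # Reference sentences taken from the samples in this README.
    references = [
        "I live in Chicopee.",
        "Erlanger and Minnetonka are gorgeous.",
    ]
    # Hypothetical ASR outputs, invented for this example.
    hypotheses = [
        "i live in chicopee",
        "erlanger and minnesota are gorgeous",
    ]
    entities = ["Chicopee", "Erlanger", "Minnetonka"]
    print(f"Entity recall: {entity_recall(references, hypotheses, entities):.2f}")
```

Running the same function with and without contextual biasing enabled in the ASR system gives a quick before-and-after comparison, alongside the usual word error rate.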

License

This dataset is provided under the Creative Commons Attribution-NonCommercial 4.0 International License. You are free to:

  • Share: Copy and redistribute the material in any medium or format.
  • Adapt: Remix, transform, and build upon the material.

Under the following terms:

  • Attribution: You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
  • Non-Commercial: You may not use the material for commercial purposes.

How to Cite

If you use this dataset in your research or project, please cite it as follows:

@inproceedings{mittal2023speech,
      title={Speech-enriched Memory for Inference-time Adaptation of ASR Models to Word Dictionaries},
      author={Mittal, Ashish and Sarawagi, Sunita and Jyothi, Preethi and Saon, George and Kurata, Gakuto},
      booktitle={Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing},
      year={2023}
}
