This repository contains the source code for the research paper "Exploring Security Vulnerabilities in Multilingual Speech Translation Systems via Deceptive Inputs". Visit our project website to explore audio samples and learn more.
The experiments were conducted on an NVIDIA A6000 GPU, and the code is written in Python. We recommend using a virtual environment (e.g., conda) to manage dependencies.
- Clone the Seamless repository:

  ```bash
  cd our_repository
  git clone https://github.com/facebookresearch/seamless_communication.git
  cd seamless_communication
  git checkout a9f6fa2c98f93af0ff1a9d967424a85b8fd352f1
  conda create -n advst python=3.9.18
  conda activate advst
  pip install .
  ```
- Install dependencies:

  ```bash
  conda install -c conda-forge libsndfile==1.0.31  # Not available via pip
  ```
We evaluated the following Seamless models in our experiments:
- SeamlessM4T Large
- SeamlessM4T Medium
- SeamlessM4T v2
- Seamless Expressive
Details on these models and download instructions can be found in the official repository. Notably, we use the Seamless Expressive model to evaluate VSIM-E. Access to the pretrained Expressive model requires official authorization from Meta. For more information, refer to the official model repository.
- For batch attack:

  ```bash
  cp core-code/seamless/{Attack_seamless.py,psy.py,Attack_m4tlarge.sh,Attack_m4tmedium.sh,Attack_m4tv2.sh,Attack_expressive.sh} seamless_communication/src/
  cd seamless_communication/src/
  # conda env: advst
  bash Attack_m4tlarge.sh
  bash Attack_m4tmedium.sh
  bash Attack_m4tv2.sh
  bash Attack_expressive.sh
  ```
- Attack with Target Cycle Optimization:

  ```bash
  cp core-code/seamless/Attack_m4tlarge_tco.sh seamless_communication/src/
  cd seamless_communication/src/
  # conda env: advst
  bash Attack_m4tlarge_tco.sh
  ```
- For a more convenient attack test, run a single-sample attack:

  ```bash
  # conda env: advst
  python Attack_seamless.py --in audio_file \
      --target "You make me sick." \
      --out "Attack-m4tlarge-(eng,deu,fra,cmn)/${speaker}/${sentence_index}" \
      --lr 0.1 \
      --eps 0.5 \
      --bp 1 \
      --tgtl "eng,cmn,deu,fra"
  ```
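The `--lr` and `--eps` flags correspond to the step size and the L-infinity perturbation budget of a PGD-style optimization loop. A minimal pure-Python sketch of such a loop on a toy loss (the loss, gradient, and all names below are illustrative stand-ins, not the actual attack code, which optimizes the model's translation loss):

```python
# Toy stand-ins for the model loss and its gradient at input x.
def toy_loss(x):
    return (x - 3.0) ** 2

def toy_grad(x):
    return 2.0 * (x - 3.0)

def pgd_attack(x0, lr=0.1, eps=0.5, steps=100):
    """Signed-gradient descent with the perturbation projected onto [-eps, eps]."""
    delta = 0.0  # adversarial perturbation
    for _ in range(steps):
        g = toy_grad(x0 + delta)
        delta -= lr * (1.0 if g > 0 else -1.0)  # signed-gradient step (--lr)
        delta = max(-eps, min(eps, delta))      # project onto the L-inf ball (--eps)
    return delta

delta = pgd_attack(0.0, lr=0.1, eps=0.5)
```

The projection step is what keeps the audio perturbation imperceptible: no matter how many iterations run, each sample deviates by at most `eps`.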
- Set up the Seamless environment as described in the perturbation-based attack.
- Set up the MusTango environment:
- Clone the modified MusTango model code. We have modified the official MusTango code to remove the `no_grad` operation during the music generation process, ensuring that gradients flow through the adversarial music optimization.

  ```bash
  cp -r core-code/seamless/mustango seamless_communication/src/
  ```
- Install dependencies:

  ```bash
  # conda env: advst
  pip install -r core-code/music_req.txt
  cd seamless_communication/src/mustango/diffusers
  pip install .
  ```
- For batch attack:

  ```bash
  cp core-code/seamless/{Attack_seamless_music.py,Attack_m4tlarge_music.sh,Attack_m4tmedium_music.sh,Attack_m4tv2_music.sh,Attack_expressive_music.sh} seamless_communication/src/mustango/
  cd seamless_communication/src/mustango/
  # conda env: advst
  bash Attack_m4tlarge_music.sh
  bash Attack_m4tmedium_music.sh
  bash Attack_m4tv2_music.sh
  bash Attack_expressive_music.sh
  ```
- Attack with Target Cycle Optimization:

  ```bash
  cp core-code/seamless/Attack_m4tlarge_music_tco.sh seamless_communication/src/mustango/
  cd seamless_communication/src/mustango/
  bash Attack_m4tlarge_music_tco.sh
  ```
- For a more convenient attack test, run a single-sample attack:

  ```bash
  # conda env: advst
  python Attack_seamless_music.py --target "You make me sick." \
      --out "Attack-m4tlarge-(eng,deu,fra,cmn)-music/${speaker}/${sentence_index}" \
      --lr 0.1 \
      --tgtl "eng,cmn,deu,fra"
  ```
- Set up the Seamless and MusTango environments as described in the music-based attack.
- We incorporate the Aachen Impulse Response Database to simulate environmental reverberation.
  - Manually download the .wav file versions (AIR_wav_files.zip) from the Aachen Impulse Response Database website.
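Simulating reverberation with a room impulse response (RIR) amounts to linear convolution of the dry signal with the RIR. A minimal pure-Python sketch of that operation (real code would load sample arrays from the database's .wav files and use an FFT-based convolution such as `scipy.signal.fftconvolve` for speed; the function name here is illustrative):

```python
def apply_rir(signal, rir):
    """Full linear convolution of a dry signal with a room impulse response."""
    out = [0.0] * (len(signal) + len(rir) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(rir):
            out[i + j] += s * h  # each input sample excites the whole RIR tail
    return out

# A unit-impulse RIR leaves the signal unchanged (plus trailing zeros).
wet = apply_rir([1.0, 0.5, 0.25], [1.0, 0.0])
```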
- We use the LibriSpeech dataset to simulate real-world background voice, enhancing the adversarial music through adversarial augmentation during optimization.
  - LibriSpeech split: train-clean-100

  ```bash
  cd core-code
  wget https://openslr.elda.org/resources/12/train-clean-100.tar.gz
  tar -zxvf train-clean-100.tar.gz
  python move_flac_to_wav.py
  ```
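`move_flac_to_wav.py` ships in `core-code`; the sketch below shows what such a helper plausibly does: walk the extracted LibriSpeech tree, mirror each `.flac` path to a `.wav` path under a new root, and convert. The `ffmpeg` call and the `LibriSpeech_wav` output root are assumptions, not the actual script:

```python
import pathlib
import subprocess

def wav_path_for(flac_path, src_root, dst_root):
    """Map a .flac path under src_root to the mirrored .wav path under dst_root."""
    rel = pathlib.PurePosixPath(flac_path).relative_to(src_root)
    return str(pathlib.PurePosixPath(dst_root) / rel.with_suffix(".wav"))

def convert_tree(src_root, dst_root):
    """Convert every .flac under src_root to .wav, preserving the directory layout."""
    for flac in pathlib.Path(src_root).rglob("*.flac"):
        wav = wav_path_for(flac.as_posix(), src_root, dst_root)
        pathlib.Path(wav).parent.mkdir(parents=True, exist_ok=True)
        subprocess.run(["ffmpeg", "-y", "-i", str(flac), wav], check=True)
```

The mirrored layout matters because the attack scripts later reference the converted data by directory (e.g. `--speech_pth "../../../core-code/LibriSpeech_wav/"`).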
- For batch attack:

  ```bash
  cp core-code/seamless/{Attack_seamless_music_physical.py,dataset.py,Attack_m4tlarge_music_physical.sh} seamless_communication/src/mustango/
  cd seamless_communication/src/mustango/
  # conda env: advst
  bash Attack_m4tlarge_music_physical.sh
  ```
- For a more convenient attack test, run a single-sample attack:

  ```bash
  # conda env: advst
  python Attack_seamless_music_physical.py --target "You make me sick." \
      --out "Attack-m4tlarge-(eng,deu,fra,cmn)-music/${speaker}/${sentence_index}" \
      --lr 0.1 \
      --noise 0.98 \
      --tgtl "eng" \
      --speech_pth "../../../core-code/LibriSpeech_wav/"
  ```
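The `--noise` flag plausibly weights the adversarial audio against background speech sampled from `--speech_pth` during the augmented optimization; the exact semantics are defined in `Attack_seamless_music_physical.py`. A hypothetical linear-mixing sketch (the function name and mixing rule are assumptions):

```python
def mix(adv, background, noise=0.98):
    """Hypothetical weighted sum of adversarial audio and background speech samples."""
    n = min(len(adv), len(background))  # truncate to the shorter clip
    return [noise * adv[i] + (1.0 - noise) * background[i] for i in range(n)]

mixed = mix([1.0, -1.0], [0.5, 0.5], noise=0.98)
```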
Due to dependency conflicts between the music generation model and the language model used for evaluating semantic similarity, we create a separate, isolated virtual environment for evaluation.
```bash
cd seamless_communication
conda create -n adv-seamless-eval python=3.9.18
conda activate adv-seamless-eval
pip install .
pip install -r ../core-code/seamless/eval_req.txt
conda install -c conda-forge libsndfile==1.0.31
```

We use the Seamless Expressive model to evaluate VSIM-E. Notably, access to the pretrained Expressive model requires official authorization from Meta. For details, refer to the official model repository.
```bash
cp core-code/seamless/seamless_evaluate.py seamless_communication/src/
cd seamless_communication/src/
# conda env: adv-seamless-eval
python seamless_evaluate.py --tgtm "seamlessM4T_large" \
    --mode "music" \
    --in "../../demo_samples/seamless/mustango/Generated/Attack-m4tlarge-(eng-cmn-deu)-music/p285_126_mic1/3/cqm-adv-409.wav" \
    --target_lang "cmn" \
    --target_semantic "Are you insane?" \
    --gated_model_dir "./facebook/seamless-expressive"  # the downloaded Seamless Expressive model dir

python seamless_evaluate.py --tgtm "seamlessM4T_large" \
    --mode "perturbation" \
    --in "../../demo_samples/seamless/Generated/Attack-m4tlarge-eps-(0.1)-(eng-cmn-deu-fra)/p285_126_mic1/5/mrv-adv-2565.wav" \
    --speaker_lang "eng" \
    --target_lang "cmn" \
    --target_semantic "Don't waste my time anymore." \
    --original_audio "../../core-code/database/vctk_selected/p285/p285_126_mic1.wav" \
    --gated_model_dir "./facebook/seamless-expressive"  # the downloaded Seamless Expressive model dir
```
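Semantic-similarity scores of this kind are typically computed as the cosine similarity between sentence embeddings of the model output and the target semantic. The embedding model itself (installed via `eval_req.txt`) is omitted in this minimal sketch, which shows only the cosine computation over already-obtained vectors:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Identical embeddings score 1.0; orthogonal embeddings score 0.0.
sim = cosine_similarity([1.0, 0.0], [1.0, 0.0])
```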
- Clone the modified NeMo module. We have modified the official NeMo code to remove the `no_grad` operation, ensuring that gradients flow through the adversarial optimization.

  ```bash
  cp -r core-code/canary/NeMo-1.23.0 ./
  ```
- Set up a virtual environment. As Canary uses different base dependencies from Seamless, we need to build a new virtual Python environment to attack Canary.

  ```bash
  conda create -n advst-canary python=3.10.12
  conda activate advst-canary
  pip install git+https://github.com/NVIDIA/[email protected]#egg=nemo_toolkit[asr]
  conda install -c conda-forge libsndfile==1.0.31  # Not available via pip
  pip install transformers==4.41.2
  pip install datasets==2.20.0
  pip install huggingface-hub==0.23.4
  ```
We evaluated the canary-1b in our experiments. Model details can be found in their official repository.
- Unlike Seamless, Canary accepts only speech as input. Therefore, for each attack target semantic (in English), we need to generate a speech sample from which the corresponding text of the target semantic in the specified language is produced; these references are stored in "../core-code/canary/reference".
- We use Seamless to generate the target speech. The semantics used in the experiments have already been generated. If you need to perform attacks on additional target semantics, you can generate the corresponding speech using the following command:
```bash
# conda env: advst
# m4t_predict {target semantic (in eng)} --task t2st --tgt_lang "eng" --src_lang "eng" --output_path "../core-code/canary/reference/{target semantic (in eng)}"
m4t_predict "This is ridiculous." --task t2st --tgt_lang "eng" --src_lang "eng" --output_path "../core-code/canary/reference/This is ridiculous..wav"
```
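If you have many target semantics to generate, the command above can be scripted. The sketch below only builds the `m4t_predict` command lines, following the output-path convention shown above; run them in the `advst` environment. The helper name is hypothetical:

```python
import shlex
import subprocess  # only needed if you choose to execute the commands

def m4t_command(semantic, out_dir="../core-code/canary/reference"):
    """Build the m4t_predict invocation for one English target semantic."""
    return shlex.join([
        "m4t_predict", semantic,
        "--task", "t2st",
        "--tgt_lang", "eng",
        "--src_lang", "eng",
        "--output_path", f"{out_dir}/{semantic}.wav",
    ])

cmd = m4t_command("This is ridiculous.")
# subprocess.run(cmd, shell=True, check=True)  # uncomment to execute
```

`shlex.join` quotes semantics containing spaces so the generated line is safe to paste into a shell.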
- For batch attack:

  ```bash
  cp core-code/canary/{Attack_canary.py,psy.py,Attack_canary.sh} NeMo-1.23.0/
  cd NeMo-1.23.0/
  # conda env: advst-canary
  bash Attack_canary.sh
  ```
- For a more convenient attack test, run a single-sample attack:

  ```bash
  # conda env: advst-canary
  python Attack_canary.py --in audio_file \
      --target "You make me sick." \
      --out "Generated/Attack-canary-eps-(0.5)-(eng,fra,deu,spa)/${speaker}/${sentence_index}" \
      --lr 0.1 \
      --eps 0.5 \
      --bp 1 \
      --src_lang "eng" \
      --tgtl "eng,fra,deu,spa"
  ```
- Set up the Canary environment as described in the perturbation-based attack.
- Set up the MusTango environment:
- Clone the modified MusTango model code. We have modified the official MusTango code to remove the `no_grad` operation during the music generation process, ensuring that gradients flow through the adversarial music optimization.

  ```bash
  cp -r core-code/mustango NeMo-1.23.0/
  ```
- Install dependencies:

  ```bash
  # conda env: advst-canary
  pip install -r core-code/music_req.txt
  cd NeMo-1.23.0/mustango/diffusers
  pip install .
  pip install torchaudio
  ```
- For batch attack:

  ```bash
  cp core-code/canary/{Attack_canary_music.py,Attack_canary_music.sh} NeMo-1.23.0/mustango/
  cd NeMo-1.23.0/mustango/
  # conda env: advst-canary
  bash Attack_canary_music.sh
  ```
- For a more convenient attack test, run a single-sample attack:

  ```bash
  # conda env: advst-canary
  python Attack_canary_music.py --target "You make me sick." \
      --out "Generated/Attack-canary-(eng,fra,deu,spa)-music/${speaker}/${sentence_index}" \
      --lr 0.1 \
      --tgtl "eng,fra,deu,spa"
  ```
- Set up the Canary and MusTango environments as described in the music-based attack.
- Set up the datasets as outlined in the physical attack on Seamless.
- For batch attack:

  ```bash
  cp core-code/canary/{Attack_canary_music_physical.py,Attack_canary_music_physical.sh,dataset.py} NeMo-1.23.0/mustango/
  cd NeMo-1.23.0/mustango/
  # conda env: advst-canary
  bash Attack_canary_music_physical.sh
  ```
- For a more convenient attack test, run a single-sample attack:

  ```bash
  # conda env: advst-canary
  python Attack_canary_music_physical.py --target "You make me sick." \
      --out "Attack-m4tlarge-(eng,deu,fra,cmn)-music/${speaker}/${sentence_index}" \
      --lr 0.1 \
      --noise 0.98 \
      --tgtl "eng"
  ```
Due to dependency conflicts between the music generation model and the language model used for evaluating semantic similarity, we create a separate, isolated virtual environment for evaluation.
- Install the Seamless environment for VSIM-E calculation:

  ```bash
  cd seamless_communication
  conda create -n adv-canary-eval python=3.10.12
  conda activate adv-canary-eval
  pip install .
  ```
- Install the NeMo environment:

  ```bash
  cd ../NeMo-1.23.0/
  pip install git+https://github.com/NVIDIA/[email protected]#egg=nemo_toolkit[asr]  # For the BERT model used in evaluation
  conda install -c conda-forge libsndfile==1.0.31  # Not available via pip
  pip install -r ../core-code/canary/eval_req.txt
  ```
```bash
cp core-code/canary/canary_evaluate.py NeMo-1.23.0/
cd NeMo-1.23.0/
# conda env: adv-canary-eval
python canary_evaluate.py --mode "music" \
    --in "../demo_samples/canary/mustango/Generated/Attack-canary-(eng-fra-deu)-music/p285_126_mic1/3/zws-adv-580.wav" \
    --target_lang "fra" \
    --target_semantic "Are you insane?" \
    --gated_model_dir "./facebook/seamless-expressive"  # the downloaded Seamless Expressive model dir

python canary_evaluate.py --mode "perturbation" \
    --in "../demo_samples/canary/Generated/Attack-canary-eps-(0.1)-(eng-fra-deu)/p285_126_mic1/4/bcb-adv-959.wav" \
    --speaker_lang "eng" \
    --target_lang "spa" \
    --target_semantic "Who do you think you're talking to?" \
    --original_audio "../core-code/database/vctk_selected/p285/p285_126_mic1.wav" \
    --gated_model_dir "./facebook/seamless-expressive"  # the downloaded Seamless Expressive model dir
```

We sincerely thank the authors and developers of the open-source projects and datasets used in this research; their high-quality resources and dedication to open collaboration were instrumental in making this work possible.
We deeply appreciate the open-source community’s commitment to sharing knowledge and resources, which continues to drive innovation in machine learning and security.