Exploring Security Vulnerabilities in Multilingual Speech Translation Systems via Deceptive Inputs

This repository contains the source code for the research paper "Exploring Security Vulnerabilities in Multilingual Speech Translation Systems via Deceptive Inputs". Visit our project website to explore audio samples and learn more.

Getting Started

The experiments were conducted on an NVIDIA A6000 GPU, and the code is written in Python. We recommend using a virtual environment (e.g., conda) to manage dependencies.

Attack Seamless

Perturbation-based Attack

Environment Setup

  1. Clone the Seamless Repository:

       cd our_repository
       git clone https://github.com/facebookresearch/seamless_communication.git
       cd seamless_communication
       git checkout a9f6fa2c98f93af0ff1a9d967424a85b8fd352f1
       conda create -n advst python=3.9.18
       conda activate advst
       pip install .
  2. Install Dependencies:

      conda install -c conda-forge libsndfile==1.0.31  # Not available via pip

Pretrained Models

We evaluated the following Seamless models in our experiments:

  • SeamlessM4T Large
  • SeamlessM4T Medium
  • SeamlessM4T v2
  • Seamless Expressive

Details on these models and download instructions can be found in the official repository. Notably, we use the Seamless Expressive model to evaluate VSIM-E. Access to the pretrained Expressive model requires official authorization from Meta. For more information, refer to the official model repository.

Run Attack

  • For batch attack:

       cp core-code/seamless/{Attack_seamless.py,psy.py,Attack_m4tlarge.sh,Attack_m4tmedium.sh,Attack_m4tv2.sh,Attack_expressive.sh} seamless_communication/src/
    
       cd seamless_communication/src/
       # conda env: advst
       bash Attack_m4tlarge.sh
       bash Attack_m4tmedium.sh
       bash Attack_m4tv2.sh
       bash Attack_expressive.sh
  • Attack with Target Cycle Optimization:

       cp core-code/seamless/{Attack_m4tlarge_tco.sh} seamless_communication/src/
       
       cd seamless_communication/src/
       # conda env: advst
       bash Attack_m4tlarge_tco.sh
  • For a quicker test, attack a single sample:

       # conda env: advst
       python Attack_seamless.py  --in audio_file \
                                  --target "You make me sick." \
                                  --out "Attack-m4tlarge-(eng,deu,fra,cmn)/${speaker}/${sentence_index}" \
                                  --lr 0.1 \
                                  --eps 0.5 \
                                  --bp 1 \
                                  --tgtl "eng,cmn,deu,fra"
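Conceptually, each attack iteration takes a signed-gradient step on the waveform (step size --lr) and projects the perturbation back into an L-infinity ball of radius --eps. A minimal NumPy sketch of one such step; the function name and exact update rule are illustrative, not the repository's implementation:

```python
import numpy as np

def pgd_step(original, delta, grad, lr=0.1, eps=0.5):
    # Signed-gradient step on the perturbation (descending the attack loss),
    # then project back into the L-infinity ball of radius eps.
    delta = delta - lr * np.sign(grad)
    delta = np.clip(delta, -eps, eps)
    # Keep the perturbed waveform in the valid [-1, 1] audio range.
    adv = np.clip(original + delta, -1.0, 1.0)
    return adv - original

original = np.zeros(16000, dtype=np.float32)  # 1 s of "audio" at 16 kHz
delta = np.zeros_like(original)
grad = np.ones_like(original)                 # dummy gradient for illustration
delta = pgd_step(original, delta, grad)
print(float(np.abs(delta).max()))             # stays within eps = 0.5
```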

Music-based Attack

Environment Setup

  1. Set up the Seamless environment as described in the perturbation-based attack section.
  2. Set up the MusTango environment:
    • Copy the modified MusTango code. We modified the official MusTango code to remove the no_grad operation during music generation, ensuring that gradients flow through the adversarial music optimization.

         cp -r core-code/seamless/mustango seamless_communication/src/
    • Install Dependencies

         # conda env: advst
         pip install -r core-code/music_req.txt
      
         cd seamless_communication/src/mustango/diffusers
         pip install .
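The effect of removing no_grad can be seen in a toy example: wrapping generation in torch.no_grad() cuts the computation graph, while the unwrapped call lets gradients flow back to the inputs. The generate function below is a stand-in, not MusTango's actual code:

```python
import torch

def generate(latent, step=0.1):
    # Stand-in for one generation step (illustrative only).
    return latent * 0.9 + step

latent = torch.ones(4, requires_grad=True)

# Generation wrapped in no_grad (as in the official code) cuts the graph.
with torch.no_grad():
    frozen = generate(latent)
print(frozen.requires_grad)     # False: no gradient path back to the input

# The same call outside no_grad keeps the graph intact.
out = generate(latent)
out.sum().backward()
print(latent.grad is not None)  # True: gradients reach the adversarial input
```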

Run Attack

  • For batch attack:

       cp core-code/seamless/{Attack_seamless_music.py,Attack_m4tlarge_music.sh,Attack_m4tmedium_music.sh,Attack_m4tv2_music.sh,Attack_expressive_music.sh} seamless_communication/src/mustango/
    
       cd seamless_communication/src/mustango/
       # conda env: advst
       bash Attack_m4tlarge_music.sh
       bash Attack_m4tmedium_music.sh
       bash Attack_m4tv2_music.sh
       bash Attack_expressive_music.sh
  • Attack with Target Cycle Optimization:

       cp core-code/seamless/{Attack_m4tlarge_music_tco.sh} seamless_communication/src/mustango/
       
       cd seamless_communication/src/mustango/
       bash Attack_m4tlarge_music_tco.sh
  • For a quicker test, attack a single sample:

       # conda env: advst
       python Attack_seamless_music.py     --target "You make me sick." \
                                           --out "Attack-m4tlarge-(eng,deu,fra,cmn)-music/${speaker}/${sentence_index}" \
                                           --lr 0.1 \
                                           --tgtl "eng,cmn,deu,fra"

Physical Attack

Environment Setup

  • Set up the Seamless and MusTango environments as described in the music-based attack section.

Datasets

  1. We incorporate the Aachen Impulse Response Database to simulate environmental reverberation.

  2. We use the LibriSpeech dataset to simulate real-world background voices, strengthening the adversarial music through adversarial augmentation during optimization.

    • LibriSpeech
      • Splits: train-clean-100
           cd core-code
           wget https://openslr.elda.org/resources/12/train-clean-100.tar.gz
           tar -zxvf train-clean-100.tar.gz
           python move_flac_to_wav.py
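During optimization, the physical attack simulates playback conditions roughly as follows: convolve the adversarial audio with a room impulse response (from the Aachen database) and mix in background speech (from LibriSpeech). A toy NumPy sketch; the function name, mixing rule, and the interpretation of noise are our illustrative assumptions, not the repository's exact pipeline:

```python
import numpy as np

def simulate_physical(adv, rir, background, noise=0.98):
    # Reverberation: convolve with a room impulse response,
    # truncated to the original length.
    reverbed = np.convolve(adv, rir)[: len(adv)]
    # Mix in background speech at a fixed ratio (loosely mirroring
    # the --noise flag; the exact semantics here are assumed).
    mixed = noise * reverbed + (1.0 - noise) * background[: len(adv)]
    return np.clip(mixed, -1.0, 1.0)

rng = np.random.default_rng(0)
adv = rng.uniform(-0.1, 0.1, 16000)   # 1 s of adversarial audio
rir = np.array([1.0, 0.6, 0.3])       # toy 3-tap impulse response
bg = np.zeros(16000)                  # silent background for the demo
out = simulate_physical(adv, rir, bg)
print(out.shape)                      # (16000,)
```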

Run Attack

  • For batch attack:

       cp core-code/seamless/{Attack_seamless_music_physical.py,dataset.py,Attack_m4tlarge_music_physical.sh} seamless_communication/src/mustango/
    
       cd seamless_communication/src/mustango/
       # conda env: advst
       bash Attack_m4tlarge_music_physical.sh
  • For a quicker test, attack a single sample:

       # conda env: advst
       python Attack_seamless_music_physical.py     --target "You make me sick." \
                                                    --out "Attack-m4tlarge-(eng,deu,fra,cmn)-music/${speaker}/${sentence_index}" \
                                                    --lr 0.1 \
                                                    --noise 0.98 \
                                                    --tgtl "eng" \
                                                    --speech_pth "../../../core-code/LibriSpeech_wav/"

Evaluation

Environment Setup

Due to dependency conflicts between the music generation model and the language model used to measure semantic similarity, evaluation requires a separate, isolated virtual environment.

   cd seamless_communication
   conda create -n adv-seamless-eval python=3.9.18
   conda activate adv-seamless-eval
   pip install .

   pip install -r ../core-code/seamless/eval_req.txt

   conda install -c conda-forge libsndfile==1.0.31

Pretrained Models

We use the Seamless Expressive model to evaluate VSIM-E. Notably, access to the pretrained Expressive model requires official authorization from Meta. For details, refer to the official model repository.

Run Evaluation

   cp core-code/seamless/{seamless_evaluate.py} seamless_communication/src/

   cd seamless_communication/src/
   # conda env: adv-seamless-eval
   python seamless_evaluate.py      --tgtm "seamlessM4T_large" \
                                    --mode "music" \
                                    --in "../../demo_samples/seamless/mustango/Generated/Attack-m4tlarge-(eng-cmn-deu)-music/p285_126_mic1/3/cqm-adv-409.wav"  \
                                    --target_lang "cmn"  \
                                    --target_semantic "Are you insane?" \
                                    --gated_model_dir "./facebook/seamless-expressive" # the downloaded Seamless Expressive model dir

   
   python seamless_evaluate.py      --tgtm "seamlessM4T_large" \
                                    --mode "perturbation" \
                                    --in "../../demo_samples/seamless/Generated/Attack-m4tlarge-eps-(0.1)-(eng-cmn-deu-fra)/p285_126_mic1/5/mrv-adv-2565.wav"  \
                                    --speaker_lang "eng"  \
                                    --target_lang "cmn"  \
                                    --target_semantic "Don't waste my time anymore." \
                                    --original_audio "../../core-code/database/vctk_selected/p285/p285_126_mic1.wav" \
                                    --gated_model_dir "./facebook/seamless-expressive" # the downloaded Seamless Expressive model dir

Attack Canary

Perturbation-based Attack

Environment Setup

  1. Copy the Modified NeMo Module: We modified the official NeMo code to remove the no_grad operation, ensuring that gradients flow during adversarial optimization.

       cp -r core-code/canary/NeMo-1.23.0 ./
  2. Setup Virtual Environment: Because Canary uses different base dependencies from Seamless, we build a separate Python environment for attacking Canary.

       conda create -n advst-canary python=3.10.12
       conda activate advst-canary
       pip install git+https://github.com/NVIDIA/[email protected]#egg=nemo_toolkit[asr]
       conda install -c conda-forge libsndfile==1.0.31  # Not available via pip
       pip install transformers==4.41.2
       pip install datasets==2.20.0
       pip install huggingface-hub==0.23.4

Pretrained Models

We evaluated canary-1b in our experiments. Model details can be found in its official repository.

Generate reference audio

  • Unlike Seamless, Canary accepts only speech as input. We therefore generate a reference speech sample for each attack target semantic (in English); these references are used to produce the target text in the specified language and are stored in "../core-code/canary/reference".
  • We use Seamless to generate the target speech. References for the semantics used in our experiments are already provided. To attack additional target semantics, generate the corresponding speech with the following command:
    # conda env: advst
    # m4t_predict {target semantic (in eng)} --task t2st --tgt_lang "eng" --src_lang "eng" --output_path "../core-code/canary/reference/{target semantic (in eng)}"
    m4t_predict "This is ridiculous." --task t2st --tgt_lang "eng" --src_lang "eng" --output_path "../core-code/canary/reference/This is ridiculous..wav"

Run Attack

  • For batch attack:

       cp core-code/canary/{Attack_canary.py,psy.py,Attack_canary.sh} NeMo-1.23.0/
    
       cd NeMo-1.23.0/
       # conda env: advst-canary
       bash Attack_canary.sh
  • For a quicker test, attack a single sample:

       # conda env: advst-canary
       python Attack_canary.py  --in audio_file \
                                  --target "You make me sick." \
                                  --out "Generated/Attack-canary-eps-(0.5)-(eng,fra,deu,spa)/${speaker}/${sentence_index}" \
                                  --lr 0.1 \
                                  --eps 0.5 \
                                  --bp 1 \
                                  --src_lang "eng" \
                                  --tgtl "eng,fra,deu,spa"

Music-based Attack

Environment Setup

  1. Set up the Canary environment as described in the perturbation-based attack section.
  2. Set up the MusTango environment:
    • Copy the modified MusTango code. We modified the official MusTango code to remove the no_grad operation during music generation, ensuring that gradients flow through the adversarial music optimization.

         cp -r core-code/mustango NeMo-1.23.0/
    • Install Dependencies

         # conda env: advst-canary
         pip install -r core-code/music_req.txt
      
         cd NeMo-1.23.0/mustango/diffusers
         pip install .
         pip install torchaudio

Run Attack

  • For batch attack:

       cp core-code/canary/{Attack_canary_music.py,Attack_canary_music.sh} NeMo-1.23.0/
    
       cd NeMo-1.23.0/mustango/
       # conda env: advst-canary
       bash Attack_canary_music.sh
  • For a quicker test, attack a single sample:

       # conda env: advst-canary
       python Attack_canary_music.py       --target "You make me sick." \
                                           --out "Generated/Attack-canary-(eng,fra,deu,spa)-music/${speaker}/${sentence_index}" \
                                           --lr 0.1 \
                                           --tgtl "eng,fra,deu,spa"

Physical Attack

Environment Setup

  • Set up the Canary and MusTango environments as described in the music-based attack section.

Datasets

  • Set up the datasets as outlined in the Physical Attack on Seamless.

Run Attack

  • For batch attack:

       cp core-code/canary/{Attack_canary_music_physical.py,Attack_canary_music_physical.sh,dataset.py} NeMo-1.23.0/mustango/
    
       cd NeMo-1.23.0/mustango/
       # conda env: advst-canary
       bash Attack_canary_music_physical.sh
  • For a quicker test, attack a single sample:

       # conda env: advst-canary
       python Attack_canary_music_physical.py      --target "You make me sick." \
                                                    --out "Attack-canary-(eng,deu,fra,cmn)-music/${speaker}/${sentence_index}" \
                                                    --lr 0.1 \
                                                    --noise 0.98 \
                                                    --tgtl "eng"

Evaluation

Environment Setup

Due to dependency conflicts between the music generation model and the language model used to measure semantic similarity, evaluation requires a separate, isolated virtual environment.

  1. Install the Seamless environment for VSIM-E calculation:

       cd seamless_communication
       conda create -n adv-canary-eval python=3.10.12
       conda activate adv-canary-eval
       pip install .
  2. Install NeMo env:

       cd ../NeMo-1.23.0/
       pip install git+https://github.com/NVIDIA/[email protected]#egg=nemo_toolkit[asr]
    
    # For the BERT model used in evaluation
       conda install -c conda-forge libsndfile==1.0.31  # Not available via pip
       pip install -r ../core-code/canary/eval_req.txt

Run Evaluation

   cp core-code/canary/{canary_evaluate.py} NeMo-1.23.0/

   cd NeMo-1.23.0/
   # conda env: adv-canary-eval
   python canary_evaluate.py        --mode "music" \
                                    --in "../demo_samples/canary/mustango/Generated/Attack-canary-(eng-fra-deu)-music/p285_126_mic1/3/zws-adv-580.wav"  \
                                    --target_lang "fra"  \
                                    --target_semantic "Are you insane?" \
                                    --gated_model_dir "./facebook/seamless-expressive" # the downloaded Seamless Expressive model dir


   python canary_evaluate.py        --mode "perturbation" \
                                    --in "../demo_samples/canary/Generated/Attack-canary-eps-(0.1)-(eng-fra-deu)/p285_126_mic1/4/bcb-adv-959.wav"  \
                                    --speaker_lang "eng"  \
                                    --target_lang "spa"  \
                                    --target_semantic "Who do you think you're talking to?" \
                                    --original_audio "../core-code/database/vctk_selected/p285/p285_126_mic1.wav" \
                                    --gated_model_dir "./facebook/seamless-expressive" # the downloaded Seamless Expressive model dir

Acknowledgements

We sincerely thank the authors and developers of the open-source projects and datasets used in this research. Their dedication to open collaboration and to providing high-quality resources made this work possible.

We deeply appreciate the open-source community’s commitment to sharing knowledge and resources, which continues to drive innovation in machine learning and security.
