<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <atom:link href="https://anwarvic.github.io/feed.xml" rel="self" type="application/rss+xml" />
    <title>Anwarvic's Blog</title>
    <link>https://anwarvic.github.io/</link>
    <description>My blog!</description>
    <language>en-us</language>
    <pubDate>Thu, 01 Aug 2024 07:59:45 +0000</pubDate>
    <lastBuildDate>Thu, 01 Aug 2024 07:59:45 +0000</lastBuildDate>
    <generator>Jekyll v3.9.5</generator>
    <image>
      <title>Anwarvic's Blog</title>
      <link>https://anwarvic.github.io/</link>
      <url>https://anwarvic.github.io/images/avatar.png</url>
      <width>70</width>
      <height>70</height>
    </image>
    
      
      <item>
        <title>True Bilingual Neural Machine Translation</title>
<description>Bilingual machine translation trains a single model that translates
monolingual sentences from one language to another. However, a model is not
truly bilingual unless it can translate back and forth in both language
directions it was trained on, as well as translate code-switched sentences into
either language. We propose a truly bilingual model trained on the WMT14
English-French (En-Fr) dataset. To make better use of the parallel data, we
generated synthetic code-switched (CSW) data and added an alignment loss on the
encoder to align representations across languages. Our model strongly
outperforms bilingual baselines on CSW translation while maintaining quality on
non-code-switched data.

</description>
        <pubDate>Fri, 08 Apr 2022 00:00:00 +0000</pubDate>
        <link>https://anwarvic.github.io/about-me/true_bilingual_nmt</link>
        <guid isPermaLink="true">https://anwarvic.github.io/about-me/true_bilingual_nmt</guid>
        
        
      </item>
      
      <item>
        <title>The Effect of Alignment Objectives on Code-Switching Translation</title>
<description>One thing that needs to change in machine translation is the
models’ ability to translate code-switched content, especially with the
rise of social media and user-generated content. In this paper, we
propose a way of training a single machine translation model that can
translate monolingual sentences from one language to another, as well as
translate code-switched sentences into either language. Such a model can be
considered bilingual in the human sense. To make better use of the parallel
data, we generated synthetic code-switched (CSW) data and added an alignment
loss on the encoder to align representations across languages. Using the WMT14
English-French (En-Fr) dataset, the trained model strongly outperforms
bidirectional baselines on code-switched translation while maintaining quality
on non-code-switched (monolingual) data.

</description>
        <pubDate>Thu, 30 Jun 2022 00:00:00 +0000</pubDate>
        <link>https://anwarvic.github.io/about-me/effect_of_alignmet_on_csw</link>
        <guid isPermaLink="true">https://anwarvic.github.io/about-me/effect_of_alignmet_on_csw</guid>
        
        
      </item>
      
    
      
      <item>
        <title>mBERT: Multilingual BERT</title>
<description>mBERT is a multilingual BERT pre-trained on 104 languages, released by the
authors of the original paper on Google Research’s official GitHub repository: google-research/bert
in November 2018. mBERT follows the same architecture as BERT; the only
difference is that mBERT is pre-trained on concatenated Wikipedia data for 104
languages. It does surprisingly well compared to cross-lingual word embeddings
on zero-shot cross-lingual transfer on the XNLI dataset.

</description>
        <pubDate>Sun, 04 Nov 2018 00:00:00 +0000</pubDate>
        <link>https://anwarvic.github.io/cross-lingual-lm/mBERT</link>
        <guid isPermaLink="true">https://anwarvic.github.io/cross-lingual-lm/mBERT</guid>
        
        
      </item>
      
      <item>
        <title>XLM</title>
<description>XLM stands for “Cross-lingual Language Modeling”, a model
created by FacebookAI in 2019 and published in the paper:
“Cross-lingual Language Model Pretraining”.
XLM is a 12-layer Transformer encoder with 1024
hidden units, 16 attention heads, and GELU activations.

</description>
        <pubDate>Tue, 22 Jan 2019 00:00:00 +0000</pubDate>
        <link>https://anwarvic.github.io/cross-lingual-lm/XLM</link>
        <guid isPermaLink="true">https://anwarvic.github.io/cross-lingual-lm/XLM</guid>
        
        
      </item>
      
    
      
      <item>
        <title>RNN: Recurrent Neural Networks</title>
<description>The neural n-gram language model we saw earlier was trained on a
fixed-size window of the previous tokens. This falls short on long
sentences, where contextual dependencies extend beyond the window size.
We need a model that can capture dependencies outside the window; in
other words, a system with some kind of memory to hold these long-range
dependencies.

</description>
        <pubDate>Thu, 19 Sep 1985 00:00:00 +0000</pubDate>
        <link>https://anwarvic.github.io/language-modeling/RNN</link>
        <guid isPermaLink="true">https://anwarvic.github.io/language-modeling/RNN</guid>
        
        
      </item>
      
      <item>
        <title>Neural N-gram Language Model</title>
<description>As we discussed before, the n-gram language model has a few problems,
such as data sparsity and large storage requirements. These problems
were first tackled by Bengio et al. in 2003 in the paper “A Neural
Probabilistic Language Model”,
which introduced the first large-scale deep learning model for natural
language processing. This model learns a distributed
representation of words, along with a probability function for word
sequences expressed in terms of these representations. The idea behind
this architecture is to treat the language modeling task as a
classification problem where:

</description>
        <pubDate>Sun, 09 Feb 2003 00:00:00 +0000</pubDate>
        <link>https://anwarvic.github.io/language-modeling/Neural_N-gram</link>
        <guid isPermaLink="true">https://anwarvic.github.io/language-modeling/Neural_N-gram</guid>
        
        
      </item>
      
    
      
      <item>
        <title>Attention Mechanism</title>
<description>A potential issue with the Seq2Seq approach is that the neural network
needs to compress all the necessary information of a source
sentence into a fixed-length vector (the context vector). This can make it
difficult for the network to cope with long sentences, especially
those longer than the sentences in the training corpus. The
paper “On the Properties of Neural Machine Translation:
Encoder–Decoder Approaches”
showed that the performance of a basic encoder–decoder indeed
deteriorates rapidly as the length of the input sentence increases.

</description>
        <pubDate>Mon, 01 Sep 2014 00:00:00 +0000</pubDate>
        <link>https://anwarvic.github.io/machine-translation/Attention</link>
        <guid isPermaLink="true">https://anwarvic.github.io/machine-translation/Attention</guid>
        
        
      </item>
      
      <item>
        <title>Seq2Seq</title>
<description>Sequence-to-sequence (seq2seq) models, also known as the encoder-decoder
architecture, were created by Ilya Sutskever et al. and published in
their 2014 paper: Sequence to Sequence Learning with Neural Networks.
They have enjoyed great success in machine translation,
speech recognition, and text summarization.

</description>
        <pubDate>Wed, 10 Sep 2014 00:00:00 +0000</pubDate>
        <link>https://anwarvic.github.io/machine-translation/Seq2Seq</link>
        <guid isPermaLink="true">https://anwarvic.github.io/machine-translation/Seq2Seq</guid>
        
        
      </item>
      
    
      
    
      
      <item>
        <title>Multilingual Google's NMT</title>
<description>GNMT stands for “Google Neural Machine Translation”, a
bilingual machine translation architecture that was
discussed before in this post:
GNMT. Here, we
are going to discuss how the bilingual GNMT model was extended to be
multilingual. The Multilingual GNMT architecture, as seen in
the following figure, was proposed in 2016 by the Google Research team
and published in the paper: Google’s Multilingual Neural Machine
Translation System: Enabling Zero-Shot
Translation. The official code
for this paper can be found in TensorFlow’s official GitHub repository:
TensorFlow/GNMT.

</description>
        <pubDate>Mon, 14 Nov 2016 00:00:00 +0000</pubDate>
        <link>https://anwarvic.github.io/multilingual-nmt/Multilingual_GNMT</link>
        <guid isPermaLink="true">https://anwarvic.github.io/multilingual-nmt/Multilingual_GNMT</guid>
        
        
      </item>
      
      <item>
        <title>Massively MNMT</title>
        <description>Massively MNMT is a multilingual many-to-many NMT model proposed by
Google Research in 2019 and published in their paper: Massively
Multilingual Neural Machine
Translation. Massively MNMT is a
standard Base-Transformer with 6 layers in both the encoder and the
decoder. To enable many-to-many translation, the authors added a
target-language prefix token to each source sentence.

</description>
        <pubDate>Tue, 02 Jul 2019 00:00:00 +0000</pubDate>
        <link>https://anwarvic.github.io/multilingual-nmt/Massively_MNMT</link>
        <guid isPermaLink="true">https://anwarvic.github.io/multilingual-nmt/Massively_MNMT</guid>
        
        
      </item>
      
    
      
    
      
      <item>
        <title>CTC</title>
<description>Datasets for speech recognition usually consist of audio clips
and their corresponding transcripts. The main issue with these datasets is
that we don’t know how the characters in the transcript align with the
audio. Without this alignment, it would be very hard to train a speech
recognition model, since people’s rates of speech vary. CTC provides a
solution to this problem.

</description>
        <pubDate>Sun, 25 Jun 2006 00:00:00 +0000</pubDate>
        <link>https://anwarvic.github.io/speech-recognition/CTC</link>
        <guid isPermaLink="true">https://anwarvic.github.io/speech-recognition/CTC</guid>
        
        
      </item>
      
      <item>
        <title>RNN-T: RNN Transducer</title>
<description>RNN-T stands for “Recurrent Neural Network Transducer”, a promising
architecture for general-purpose sequence transduction, such as audio
transcription, built using RNNs. RNN-T was
proposed by Alex Graves at the University of Toronto back in 2012 and
published under the name: Sequence Transduction with Recurrent Neural
Networks. The paper introduces an
end-to-end, probabilistic sequence transduction system, based entirely
on RNNs, that is in principle able to transform any input sequence into
any finite, discrete output sequence.

</description>
        <pubDate>Wed, 14 Nov 2012 00:00:00 +0000</pubDate>
        <link>https://anwarvic.github.io/speech-recognition/RNN-T</link>
        <guid isPermaLink="true">https://anwarvic.github.io/speech-recognition/RNN-T</guid>
        
        
      </item>
      
    
      
      <item>
        <title>WaveNet</title>
<description>WaveNet is a generative deep neural network for generating raw audio
waveforms, based on the PixelCNN architecture. WaveNet was proposed by
DeepMind in 2016 and published in the paper: WaveNet: A Generative Model
for Raw Audio. Official audio
samples from Google’s trained WaveNet are provided on
this
website.
An unofficial TensorFlow implementation of WaveNet can be found in
this GitHub repository:
tensorflow-wavenet.

</description>
        <pubDate>Mon, 12 Sep 2016 00:00:00 +0000</pubDate>
        <link>https://anwarvic.github.io/speech-synthesis/WaveNet</link>
        <guid isPermaLink="true">https://anwarvic.github.io/speech-synthesis/WaveNet</guid>
        
        
      </item>
      
      <item>
        <title>Tacotron</title>
<description>Tacotron is a two-stage generative text-to-speech (TTS) model that
synthesizes speech directly from characters. Given (text, audio) pairs,
Tacotron can be trained completely from scratch with random
initialization to output spectrograms without any phoneme-level
alignment. A vocoder model is then used to convert the audio
spectrograms to waveforms. Tacotron was proposed by Google in 2017 and
published in the paper of the same name: Tacotron: Towards End-to-End
Speech Synthesis. Official audio
samples from Google’s trained Tacotron are provided on
this website. An unofficial
TensorFlow implementation of Tacotron can be found in this GitHub
repository: tacotron.

</description>
        <pubDate>Wed, 29 Mar 2017 00:00:00 +0000</pubDate>
        <link>https://anwarvic.github.io/speech-synthesis/Tacotron</link>
        <guid isPermaLink="true">https://anwarvic.github.io/speech-synthesis/Tacotron</guid>
        
        
      </item>
      
    
      
      <item>
        <title>Dual-decoder Transformer</title>
<description>Dual-decoder Transformer is a
Transformer
architecture that consists of two decoders: one responsible for
Automatic Speech Recognition (ASR) and the other for
Speech Translation (ST). The model was proposed by FAIR and Grenoble
Alpes University in 2020 and published in the paper: Dual-decoder
Transformer for Joint Automatic Speech Recognition and Multilingual
Speech Translation. The official
code for this paper can be found in the following GitHub repository:
speech-translation.

</description>
        <pubDate>Mon, 02 Nov 2020 00:00:00 +0000</pubDate>
        <link>https://anwarvic.github.io/speech-translation/Dual-decoder_Transformer</link>
        <guid isPermaLink="true">https://anwarvic.github.io/speech-translation/Dual-decoder_Transformer</guid>
        
        
      </item>
      
      <item>
        <title>SpeechT5</title>
<description>SpeechT5, which stands for “Speech Text-to-Text Transfer Transformer”, is a
unified framework for speech and text that leverages large-scale
unlabeled speech and text data to improve the modeling capability
for both modalities. The name is inspired by the
T5 framework by
Google, which did the same for the textual modality. SpeechT5 was proposed
by Microsoft in 2021 and published in the paper: SpeechT5:
Unified-Modal Encoder-Decoder Pre-Training for Spoken Language
Processing. The official code for
this framework can be found on Microsoft’s official GitHub repository:
Microsoft/SpeechT5.

</description>
        <pubDate>Thu, 14 Oct 2021 00:00:00 +0000</pubDate>
        <link>https://anwarvic.github.io/speech-translation/SpeechT5</link>
        <guid isPermaLink="true">https://anwarvic.github.io/speech-translation/SpeechT5</guid>
        
        
      </item>
      
    
      
      <item>
        <title>t-SNE</title>
<description>One popular thing to do with word embeddings is to take this
N-dimensional data and embed it in a two-dimensional space so that we
can visualize it. The most common algorithm for doing this is
t-SNE, created by Laurens van der Maaten and Geoffrey
Hinton in 2008 and published in the paper: Visualizing Data Using
t-SNE.

</description>
        <pubDate>Tue, 25 Nov 2008 00:00:00 +0000</pubDate>
        <link>https://anwarvic.github.io/word-embedding/t-SNE</link>
        <guid isPermaLink="true">https://anwarvic.github.io/word-embedding/t-SNE</guid>
        
        
      </item>
      
      <item>
        <title>Word2Vec</title>
<description>Word2Vec, which stands for “word-to-vector”, is a model architecture created by
Tomáš Mikolov from Google in 2013 and published in the paper: Efficient
Estimation of Word Representations in Vector
Space. The model aims at
computing continuous vector representations of words from very large
datasets.

</description>
        <pubDate>Sat, 07 Sep 2013 00:00:00 +0000</pubDate>
        <link>https://anwarvic.github.io/word-embedding/word2vec</link>
        <guid isPermaLink="true">https://anwarvic.github.io/word-embedding/word2vec</guid>
        
        
      </item>
      
    
  </channel>
</rss>
