<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <atom:link href="https://anwarvic.github.io/feed.xml" rel="self" type="application/rss+xml" />
    <title>Anwarvic's Blog</title>
    <link>https://anwarvic.github.io/</link>
    <description>My blog!</description>
    <language>en-us</language>
    <pubDate>Thu, 01 Aug 2024 07:59:45 +0000</pubDate>
    <lastBuildDate>Thu, 01 Aug 2024 07:59:45 +0000</lastBuildDate>
    <generator>Jekyll v3.9.5</generator>
    <image>
      <title>Anwarvic's Blog</title>
      <link>https://anwarvic.github.io/</link>
      <url>https://anwarvic.github.io/images/avatar.png</url>
      <width>70</width>
      <height>70</height>
    </image>
    
      
      <item>
        <title>True Bilingual Neural Machine Translation</title>
<description>Bilingual machine translation trains a single model that translates
monolingual sentences from one language to another. However, a model is not
truly bilingual unless it can translate back and forth in both language
directions it was trained on, as well as translate code-switched sentences into
either language. We propose a truly bilingual model trained on the WMT14
English-French (En-Fr) dataset. To make better use of the parallel data, we
generated synthetic code-switched (CSW) data and added an alignment loss on the
encoder to align representations across languages. Our model strongly
outperforms bilingual baselines on CSW translation while maintaining quality on
non-code-switched data.

</description>
        <pubDate>Fri, 08 Apr 2022 00:00:00 +0000</pubDate>
        <link>https://anwarvic.github.io/about-me/true_bilingual_nmt</link>
        <guid isPermaLink="true">https://anwarvic.github.io/about-me/true_bilingual_nmt</guid>
        
        
      </item>
      
      <item>
        <title>The Effect of Alignment Objectives on Code-Switching Translation</title>
<description>One thing that needs to change in machine translation is the
models’ ability to translate code-switched content, especially with the
rise of social media and user-generated content. In this paper, we
propose a way of training a single machine translation model that can
translate monolingual sentences from one language to another, as well as
translate code-switched sentences into either language. Such a model can be
considered bilingual in the human sense. To make better use of the parallel
data, we generated synthetic code-switched (CSW) data and added an alignment
loss on the encoder to align representations across languages. Using the WMT14
English-French (En-Fr) dataset, the trained model strongly outperforms
bidirectional baselines on code-switched translation while maintaining quality
on non-code-switched (monolingual) data.

</description>
        <pubDate>Thu, 30 Jun 2022 00:00:00 +0000</pubDate>
        <link>https://anwarvic.github.io/about-me/effect_of_alignmet_on_csw</link>
        <guid isPermaLink="true">https://anwarvic.github.io/about-me/effect_of_alignmet_on_csw</guid>
        
        
      </item>
      
    
      
      <item>
        <title>mBERT: Multilingual BERT</title>
<description>mBERT is a multilingual BERT pre-trained on 104 languages, released by the
authors of the original paper on Google Research’s official GitHub repository: google-research/bert
in November 2018. mBERT follows the same architecture as BERT; the only
difference is that mBERT is pre-trained on concatenated Wikipedia data for 104
languages. It does surprisingly well compared to cross-lingual word embeddings
on zero-shot cross-lingual transfer on the XNLI dataset.

</description>
        <pubDate>Sun, 04 Nov 2018 00:00:00 +0000</pubDate>
        <link>https://anwarvic.github.io/cross-lingual-lm/mBERT</link>
        <guid isPermaLink="true">https://anwarvic.github.io/cross-lingual-lm/mBERT</guid>
        
        
      </item>
      
      <item>
        <title>XLM</title>
<description>XLM stands for “Cross-lingual Language Modeling”, a model
created by FacebookAI in 2019 and published in the paper:
“Cross-lingual Language Model Pretraining”.
XLM is a 12-layer Transformer encoder with 1024
hidden units, 16 attention heads, and GELU activations.

</description>
        <pubDate>Tue, 22 Jan 2019 00:00:00 +0000</pubDate>
        <link>https://anwarvic.github.io/cross-lingual-lm/XLM</link>
        <guid isPermaLink="true">https://anwarvic.github.io/cross-lingual-lm/XLM</guid>
        
        
      </item>
      
    
      
      <item>
        <title>RNN: Recurrent Neural Networks</title>
<description>The neural n-gram language model we saw earlier was trained on a
fixed-size window of the previous tokens. This falls short on long
sentences, where contextual dependencies extend beyond the window size.
We need a model that can capture dependencies outside the window; in
other words, a system with some kind of memory to hold these long-range
dependencies.

</description>
        <pubDate>Thu, 19 Sep 1985 00:00:00 +0000</pubDate>
        <link>https://anwarvic.github.io/language-modeling/RNN</link>
        <guid isPermaLink="true">https://anwarvic.github.io/language-modeling/RNN</guid>
        
        
      </item>
      
      <item>
        <title>Neural N-gram Language Model</title>
<description>As we discussed before, the n-gram language model has a few problems,
such as data sparsity and large storage requirements. These problems
were first tackled by Bengio et al. in 2003 in the paper “A Neural
Probabilistic Language Model”,
which introduced the first large-scale deep learning model for natural
language processing. This model learns a distributed
representation of words, along with a probability function for word
sequences expressed in terms of these representations. The idea behind
this architecture is to treat the language modeling task as a
classification problem where:

</description>
        <pubDate>Sun, 09 Feb 2003 00:00:00 +0000</pubDate>
        <link>https://anwarvic.github.io/language-modeling/Neural_N-gram</link>
        <guid isPermaLink="true">https://anwarvic.github.io/language-modeling/Neural_N-gram</guid>
        
        
      </item>
      
    
      
      <item>
        <title>Attention Mechanism</title>
<description>A potential issue with the Seq2Seq approach is that the neural network
needs to compress all the necessary information of a source
sentence into a fixed-length vector (the context vector). This can make it
difficult for the network to cope with long sentences, especially
those longer than the sentences in the training corpus. The
paper “On the Properties of Neural Machine Translation:
Encoder–Decoder Approaches”
showed that the performance of a basic encoder–decoder indeed
deteriorates rapidly as the length of the input sentence increases.

</description>
        <pubDate>Mon, 01 Sep 2014 00:00:00 +0000</pubDate>
        <link>https://anwarvic.github.io/machine-translation/Attention</link>
        <guid isPermaLink="true">https://anwarvic.github.io/machine-translation/Attention</guid>
        
        
      </item>
      
      <item>
        <title>Seq2Seq</title>
<description>Sequence-to-sequence (seq2seq) models, also known as the encoder-decoder
architecture, were created by Ilya Sutskever et al. and published in
their 2014 paper: Sequence to Sequence Learning with Neural Networks.
They have enjoyed great success in machine translation,
speech recognition, and text summarization.

</description>
        <pubDate>Wed, 10 Sep 2014 00:00:00 +0000</pubDate>
        <link>https://anwarvic.github.io/machine-translation/Seq2Seq</link>
        <guid isPermaLink="true">https://anwarvic.github.io/machine-translation/Seq2Seq</guid>
        
        
      </item>
      
    
      
    
      
      <item>
        <title>Multilingual Google's NMT</title>
<description>GNMT stands for “Google Neural Machine Translation”, a
bilingual machine translation architecture that was
discussed before in this post:
GNMT. Here, we
are going to discuss how the bilingual GNMT model was extended to be
multilingual. The Multilingual GNMT architecture, as seen in
the following figure, was proposed in 2016 by the Google Research team
and published in the paper: Google’s Multilingual Neural Machine
Translation System: Enabling Zero-Shot
Translation. The official code
for this paper can be found in TensorFlow’s official GitHub repository:
TensorFlow/GNMT.

</description>
        <pubDate>Mon, 14 Nov 2016 00:00:00 +0000</pubDate>
        <link>https://anwarvic.github.io/multilingual-nmt/Multilingual_GNMT</link>
        <guid isPermaLink="true">https://anwarvic.github.io/multilingual-nmt/Multilingual_GNMT</guid>
        
        
      </item>
      
      <item>
        <title>Massively MNMT</title>
        <description>Massively MNMT is a multilingual many-to-many NMT model proposed by
Google Research in 2019 and published in their paper: Massively
Multilingual Neural Machine
Translation. Massively MNMT is a
standard Base-Transformer with 6 layers in both the encoder and the
decoder. To enable many-to-many translation, the authors added a
target-language prefix token to each source sentence.

</description>
        <pubDate>Tue, 02 Jul 2019 00:00:00 +0000</pubDate>
        <link>https://anwarvic.github.io/multilingual-nmt/Massively_MNMT</link>
        <guid isPermaLink="true">https://anwarvic.github.io/multilingual-nmt/Massively_MNMT</guid>
        
        
      </item>
      
    
      
    
      
      <item>
        <title>CTC</title>
<description>Datasets for speech recognition usually consist of audio clips
and their corresponding transcripts. The main issue with these datasets is
that we don’t know how the characters in the transcript align with the
audio. Without this alignment, it would be very hard to train a speech
recognition model, since people’s rates of speech vary. CTC provides a
solution to this problem.

</description>
        <pubDate>Sun, 25 Jun 2006 00:00:00 +0000</pubDate>
        <link>https://anwarvic.github.io/speech-recognition/CTC</link>
        <guid isPermaLink="true">https://anwarvic.github.io/speech-recognition/CTC</guid>
        
        
      </item>
      
      <item>
        <title>RNN-T: RNN Transducer</title>
<description>RNN-T stands for “Recurrent Neural Network Transducer”, a promising
architecture for general-purpose sequence transduction, such as audio
transcription, built using RNNs. RNN-T was
proposed by Alex Graves at the University of Toronto back in 2012 and
published under the name: Sequence Transduction with Recurrent Neural
Networks. The paper introduces an
end-to-end, probabilistic sequence transduction system, based entirely
on RNNs, that is in principle able to transform any input sequence into
any finite, discrete output sequence.

</description>
        <pubDate>Wed, 14 Nov 2012 00:00:00 +0000</pubDate>
        <link>https://anwarvic.github.io/speech-recognition/RNN-T</link>
        <guid isPermaLink="true">https://anwarvic.github.io/speech-recognition/RNN-T</guid>
        
        
      </item>
      
    
      
      <item>
        <title>WaveNet</title>
<description>WaveNet is a generative deep neural network for generating raw audio
waveforms, based on the PixelCNN architecture. WaveNet was proposed by
DeepMind in 2016 and published in the paper: WaveNet: A Generative Model
for Raw Audio. Official audio
samples from Google’s trained WaveNet are provided on
this
website.
An unofficial TensorFlow implementation of WaveNet can be found in
this GitHub repository:
tensorflow-wavenet.

</description>
        <pubDate>Mon, 12 Sep 2016 00:00:00 +0000</pubDate>
        <link>https://anwarvic.github.io/speech-synthesis/WaveNet</link>
        <guid isPermaLink="true">https://anwarvic.github.io/speech-synthesis/WaveNet</guid>
        
        
      </item>
      
      <item>
        <title>Tacotron</title>
<description>Tacotron is a two-stage generative text-to-speech (TTS) model that
synthesizes speech directly from characters. Given (text, audio) pairs,
Tacotron can be trained completely from scratch with random
initialization to output spectrograms without any phoneme-level
alignment. A vocoder model is then used to convert the audio
spectrograms to waveforms. Tacotron was proposed by Google in 2017 and
published in the paper of the same name: Tacotron: Towards End-to-End
Speech Synthesis. Official audio
samples from Google’s trained Tacotron are provided on
this website. An unofficial
TensorFlow implementation of Tacotron can be found in this GitHub
repository: tacotron.

</description>
        <pubDate>Wed, 29 Mar 2017 00:00:00 +0000</pubDate>
        <link>https://anwarvic.github.io/speech-synthesis/Tacotron</link>
        <guid isPermaLink="true">https://anwarvic.github.io/speech-synthesis/Tacotron</guid>
        
        
      </item>
      
    
      
      <item>
        <title>Dual-decoder Transformer</title>
<description>Dual-decoder Transformer is a
Transformer
architecture that consists of two decoders: one responsible for
Automatic Speech Recognition (ASR) and the other for
Speech Translation (ST). The model was proposed by FAIR and Grenoble
Alpes University in 2020 and published in the paper: Dual-decoder
Transformer for Joint Automatic Speech Recognition and Multilingual
Speech Translation. The official
code for this paper can be found in the following GitHub repository:
speech-translation.

</description>
        <pubDate>Mon, 02 Nov 2020 00:00:00 +0000</pubDate>
        <link>https://anwarvic.github.io/speech-translation/Dual-decoder_Transformer</link>
        <guid isPermaLink="true">https://anwarvic.github.io/speech-translation/Dual-decoder_Transformer</guid>
        
        
      </item>
      
      <item>
        <title>SpeechT5</title>
<description>SpeechT5, which stands for “Speech Text-to-Text Transfer Transformer”, is a
unified framework for speech and text that leverages large-scale
unlabeled speech and text data to improve the modeling capability
for both modalities. The name is inspired by the
T5 framework by
Google, which did the same for the textual modality. SpeechT5 was proposed
by Microsoft in 2021 and published in the paper: SpeechT5:
Unified-Modal Encoder-Decoder Pre-Training for Spoken Language
Processing. The official code for
this framework can be found on Microsoft’s official GitHub repository:
Microsoft/SpeechT5.

</description>
        <pubDate>Thu, 14 Oct 2021 00:00:00 +0000</pubDate>
        <link>https://anwarvic.github.io/speech-translation/SpeechT5</link>
        <guid isPermaLink="true">https://anwarvic.github.io/speech-translation/SpeechT5</guid>
        
        
      </item>
      
    
      
      <item>
        <title>t-SNE</title>
<description>One popular thing to do with word embeddings is to take this
N-dimensional data and embed it in a two-dimensional space so that we
can visualize it. The most common algorithm for doing this is
t-SNE, created by Laurens van der Maaten and Geoffrey
Hinton in 2008 and published in the paper: Visualizing Data Using
t-SNE.

</description>
        <pubDate>Tue, 25 Nov 2008 00:00:00 +0000</pubDate>
        <link>https://anwarvic.github.io/word-embedding/t-SNE</link>
        <guid isPermaLink="true">https://anwarvic.github.io/word-embedding/t-SNE</guid>
        
        
      </item>
      
      <item>
        <title>Word2Vec</title>
<description>Word2Vec, which stands for “word-to-vector”, is a model architecture created by
Tomáš Mikolov from Google in 2013 and published in the paper: Efficient
Estimation of Word Representations in Vector
Space. The model aims at
computing continuous vector representations of words from very large
datasets.

</description>
        <pubDate>Sat, 07 Sep 2013 00:00:00 +0000</pubDate>
        <link>https://anwarvic.github.io/word-embedding/word2vec</link>
        <guid isPermaLink="true">https://anwarvic.github.io/word-embedding/word2vec</guid>
        
        
      </item>
      
    
  </channel>
</rss>
