--Jean Arreola-- R tag 2018-07-13T22:40:35+00:00 Variational Gaussian Mixtures for Face Detection <h2 id="mixture-model">Mixture model</h2> <p>A Gaussian mixture model is a probabilistic way of representing subpopulations within an overall population. We observe only the data, not the subpopulation to which each observation belongs.</p> <p>We observe $N$ random variables, each distributed according to a mixture of $K$ Gaussian components. Each Gaussian has its own parameters, and because the component that generated each observation is a latent variable, we can estimate both the parameters and the component assignments with Expectation Maximization (EM).</p> <p>In a Bayesian setting, the parameters of each Gaussian and the mixture weights are themselves random variables. To approximate their posterior distributions we use Variational Inference, which can be seen as a generalization of the EM algorithm. Be sure to check <a href="https://www.springer.com/us/book/9780387310732">this book</a> to learn the theory behind Gaussian mixtures and variational inference.</p> <p>Here is my implementation of a Variational Gaussian Mixture Model.</p> <div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">#Variational Gaussian Mixture Model</span><span class="w"> </span><span class="c1">#Constant for Dirichlet Distribution</span><span class="w"> </span><span class="n">dirConstant</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="k">function</span><span class="p">(</span><span class="n">alpha</span><span class="p">){</span><span class="w"> </span><span class="n">res</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="k">for</span><span class="p">(</span><span class="n">i</span><span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="m">1</span><span class="o">:</span><span 
class="nf">length</span><span class="p">(</span><span class="n">alpha</span><span class="p">)){</span><span class="w"> </span><span class="n">res</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">res</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="nf">gamma</span><span class="p">(</span><span class="n">alpha</span><span class="p">[</span><span class="n">i</span><span class="p">])</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="nf">return</span><span class="p">(</span><span class="nf">gamma</span><span class="p">(</span><span class="nf">sum</span><span class="p">(</span><span class="n">alpha</span><span class="p">))</span><span class="o">/</span><span class="n">res</span><span class="p">)</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="n">BWishart</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="k">function</span><span class="p">(</span><span class="n">W</span><span class="p">,</span><span class="w"> </span><span class="n">v</span><span class="p">){</span><span class="w"> </span><span class="n">D</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">ncol</span><span class="p">(</span><span class="n">W</span><span class="p">)</span><span class="w"> </span><span class="n">elem1</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="p">(</span><span class="n">det</span><span class="p">(</span><span class="n">W</span><span class="p">))</span><span class="o">^</span><span class="p">(</span><span class="o">-</span><span class="n">v</span><span class="o">/</span><span class="m">2</span><span class="p">)</span><span class="w"> </span><span class="n">elem2</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="p">(</span><span 
class="m">2</span><span class="o">^</span><span class="p">(</span><span class="n">v</span><span class="o">*</span><span class="n">D</span><span class="o">/</span><span class="m">2</span><span class="p">))</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="p">(</span><span class="nb">pi</span><span class="o">^</span><span class="p">(</span><span class="n">D</span><span class="o">*</span><span class="p">(</span><span class="n">D</span><span class="m">-1</span><span class="p">)</span><span class="o">/</span><span class="m">4</span><span class="p">))</span><span class="w"> </span><span class="n">elem3</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="k">for</span><span class="p">(</span><span class="n">i</span><span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="m">1</span><span class="o">:</span><span class="n">D</span><span class="p">){</span><span class="w"> </span><span class="n">elem3</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">elem3</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="nf">gamma</span><span class="p">((</span><span class="n">v</span><span class="m">+1</span><span class="o">-</span><span class="n">i</span><span class="p">)</span><span class="o">/</span><span class="m">2</span><span class="p">)</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="nf">return</span><span class="p">(</span><span class="n">elem1</span><span class="w"> </span><span class="o">/</span><span class="w"> </span><span class="p">(</span><span class="n">elem2</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">elem3</span><span class="p">))</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="c1">#Log 
precision expected value</span><span class="w"> </span><span class="n">espLnPres</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="k">function</span><span class="p">(</span><span class="n">W</span><span class="p">,</span><span class="w"> </span><span class="n">v</span><span class="p">){</span><span class="w"> </span><span class="n">res</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="m">0</span><span class="w"> </span><span class="n">D</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">ncol</span><span class="p">(</span><span class="n">W</span><span class="p">)</span><span class="w"> </span><span class="k">for</span><span class="p">(</span><span class="n">i</span><span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="m">1</span><span class="o">:</span><span class="n">D</span><span class="p">){</span><span class="w"> </span><span class="n">res</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">res</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nf">digamma</span><span class="p">((</span><span class="n">v</span><span class="m">+1</span><span class="o">-</span><span class="n">i</span><span class="p">)</span><span class="o">/</span><span class="m">2</span><span class="p">)</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="n">res</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">res</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">D</span><span class="o">*</span><span class="nf">log</span><span class="p">(</span><span class="m">2</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nf">log</span><span 
class="p">(</span><span class="n">det</span><span class="p">(</span><span class="n">W</span><span class="p">))</span><span class="w"> </span><span class="nf">return</span><span class="p">(</span><span class="n">res</span><span class="p">)</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="c1">#Wishart distribution entropy</span><span class="w"> </span><span class="n">entropyWishart</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="k">function</span><span class="p">(</span><span class="n">W</span><span class="p">,</span><span class="w"> </span><span class="n">v</span><span class="p">){</span><span class="w"> </span><span class="n">D</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">ncol</span><span class="p">(</span><span class="n">W</span><span class="p">)</span><span class="w"> </span><span class="nf">return</span><span class="p">(</span><span class="o">-</span><span class="nf">log</span><span class="p">(</span><span class="n">BWishart</span><span class="p">(</span><span class="n">W</span><span class="p">,</span><span class="n">v</span><span class="p">))</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="p">((</span><span class="n">v</span><span class="o">-</span><span class="n">D</span><span class="m">-1</span><span class="p">)</span><span class="o">/</span><span class="m">2</span><span class="p">)</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">espLnPres</span><span class="p">(</span><span class="n">W</span><span class="p">,</span><span class="n">v</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="p">(</span><span class="n">v</span><span class="o">*</span><span class="n">D</span><span class="p">)</span><span class="o">/</span><span class="m">2</span><span class="p">)</span><span 
class="w"> </span><span class="p">}</span><span class="w"> </span><span class="c1"># Estimating mixture parameters</span><span class="w"> </span><span class="n">vgmm</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="k">function</span><span class="p">(</span><span class="n">X</span><span class="p">,</span><span class="w"> </span><span class="n">K</span><span class="p">,</span><span class="w"> </span><span class="n">iter</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">100</span><span class="p">,</span><span class="w"> </span><span class="n">eps</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">0.001</span><span class="p">){</span><span class="w"> </span><span class="n">D</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">ncol</span><span class="p">(</span><span class="n">X</span><span class="p">)</span><span class="w"> </span><span class="n">N</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">nrow</span><span class="p">(</span><span class="n">X</span><span class="p">)</span><span class="w"> </span><span class="c1">#Hyperparameters initialization</span><span class="w"> </span><span class="n">m</span><span class="m">0</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="nf">rep</span><span class="p">(</span><span class="m">0</span><span class="p">,</span><span class="w"> </span><span class="n">D</span><span class="p">)</span><span class="w"> </span><span class="c1"># mean</span><span class="w"> </span><span class="n">W</span><span class="m">0</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">diag</span><span class="p">(</span><span class="n">D</span><span class="p">)</span><span class="w"> </span><span class="c1"># precision</span><span class="w"> 
</span><span class="n">v</span><span class="m">0</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">D</span><span class="w"> </span><span class="c1"># degrees of freedom: must be &gt; D-1</span><span class="w"> </span><span class="n">alpha0</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="m">1</span><span class="o">/</span><span class="n">K</span><span class="w"> </span><span class="c1"># Dirichlet parameter</span><span class="w"> </span><span class="n">beta0</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="c1"># Precision scaling for the prior on the mean</span><span class="w"> </span><span class="c1">#For each category</span><span class="w"> </span><span class="c1">#Initialize the means with centroids from k-means</span><span class="w"> </span><span class="n">mk</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">kmeans</span><span class="p">(</span><span class="n">X</span><span class="p">,</span><span class="n">K</span><span class="p">)</span><span class="o">$</span><span class="n">centers</span><span class="w"> </span><span class="c1">#Initialize precisions with diagonal matrix</span><span class="w"> </span><span class="n">Wk</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">array</span><span class="p">(</span><span class="m">0</span><span class="p">,</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="n">D</span><span class="p">,</span><span class="w"> </span><span class="n">D</span><span class="p">,</span><span class="w"> </span><span class="n">K</span><span class="p">))</span><span class="w"> </span><span class="k">for</span><span class="p">(</span><span class="n">i</span><span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="m">1</span><span 
class="o">:</span><span class="n">K</span><span class="p">)</span><span class="w"> </span><span class="n">Wk</span><span class="p">[,,</span><span class="n">i</span><span class="p">]</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">W</span><span class="m">0</span><span class="w"> </span><span class="n">vk</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="nf">rep</span><span class="p">(</span><span class="n">v</span><span class="m">0</span><span class="p">,</span><span class="w"> </span><span class="n">K</span><span class="p">)</span><span class="w"> </span><span class="c1">#Initialize hyperparameters</span><span class="w"> </span><span class="n">betak</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="nf">rep</span><span class="p">(</span><span class="n">beta0</span><span class="p">,</span><span class="w"> </span><span class="n">K</span><span class="p">)</span><span class="w"> </span><span class="n">alphak</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="nf">rep</span><span class="p">(</span><span class="n">alpha0</span><span class="p">,</span><span class="n">K</span><span class="p">)</span><span class="w"> </span><span class="c1"># Terms needed to calculate the responsibilities</span><span class="w"> </span><span class="n">ln_pres</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="nf">rep</span><span class="p">(</span><span class="m">0</span><span class="p">,</span><span class="n">K</span><span class="p">)</span><span class="w"> </span><span class="n">ln_pi</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="nf">rep</span><span class="p">(</span><span class="m">0</span><span class="p">,</span><span class="n">K</span><span class="p">)</span><span class="w"> </span><span 
class="n">E_mu_pres</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">matrix</span><span class="p">(</span><span class="m">0</span><span class="p">,</span><span class="w"> </span><span class="n">N</span><span class="p">,</span><span class="w"> </span><span class="n">K</span><span class="p">)</span><span class="w"> </span><span class="c1"># Iterate</span><span class="w"> </span><span class="k">for</span><span class="p">(</span><span class="n">it</span><span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="m">1</span><span class="o">:</span><span class="n">iter</span><span class="p">){</span><span class="w"> </span><span class="c1">#Responsibilities (unnormalized, in log space)</span><span class="w"> </span><span class="n">r</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">matrix</span><span class="p">(</span><span class="m">0</span><span class="p">,</span><span class="n">N</span><span class="p">,</span><span class="w"> </span><span class="n">K</span><span class="p">)</span><span class="w"> </span><span class="c1">##################### Variational E-Step ##########################</span><span class="w"> </span><span class="k">for</span><span class="p">(</span><span class="n">i</span><span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="m">1</span><span class="o">:</span><span class="n">K</span><span class="p">){</span><span class="w"> </span><span class="c1">#Log precision</span><span class="w"> </span><span class="n">ln_pres</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="m">0</span><span class="w"> </span><span class="k">for</span><span class="p">(</span><span class="n">j</span><span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="m">1</span><span class="o">:</span><span 
class="n">D</span><span class="p">){</span><span class="w"> </span><span class="n">ln_pres</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">ln_pres</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nf">digamma</span><span class="p">((</span><span class="n">vk</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">j</span><span class="p">)</span><span class="w"> </span><span class="o">/</span><span class="m">2</span><span class="p">)</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="n">ln_pres</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">ln_pres</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">D</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="nf">log</span><span class="p">(</span><span class="m">2</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nf">log</span><span class="p">(</span><span class="n">det</span><span class="p">(</span><span class="n">Wk</span><span class="p">[,,</span><span class="n">i</span><span class="p">]))</span><span class="w"> </span><span class="n">alpha</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="nf">sum</span><span class="p">(</span><span class="n">alphak</span><span 
class="p">)</span><span class="w"> </span><span class="n">ln_pi</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="nf">digamma</span><span class="p">(</span><span class="n">alphak</span><span class="p">[</span><span class="n">i</span><span class="p">])</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="nf">digamma</span><span class="p">(</span><span class="n">alpha</span><span class="p">)</span><span class="w"> </span><span class="c1">#E[mu,pres] (expected value of joint distribution of mu and pres)</span><span class="w"> </span><span class="k">for</span><span class="p">(</span><span class="n">k</span><span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="m">1</span><span class="o">:</span><span class="n">N</span><span class="p">){</span><span class="w"> </span><span class="n">E_mu_pres</span><span class="p">[</span><span class="n">k</span><span class="p">,</span><span class="n">i</span><span class="p">]</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="p">(</span><span class="n">D</span><span class="w"> </span><span class="o">/</span><span class="w"> </span><span class="n">betak</span><span class="p">[</span><span class="n">i</span><span class="p">])</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">vk</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">t</span><span class="p">(</span><span class="n">X</span><span class="p">[</span><span class="n">k</span><span class="p">,]</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">mk</span><span class="p">[</span><span class="n">i</span><span class="p">,])</span><span class="w"> </span><span 
class="o">%*%</span><span class="w"> </span><span class="n">Wk</span><span class="p">[,,</span><span class="n">i</span><span class="p">]</span><span class="w"> </span><span class="o">%*%</span><span class="w"> </span><span class="p">(</span><span class="n">X</span><span class="p">[</span><span class="n">k</span><span class="p">,]</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">mk</span><span class="p">[</span><span class="n">i</span><span class="p">,])</span><span class="w"> </span><span class="c1">#10.64</span><span class="w"> </span><span class="n">r</span><span class="p">[</span><span class="n">k</span><span class="p">,</span><span class="n">i</span><span class="p">]</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">ln_pi</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="m">0.5</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">ln_pres</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="p">(</span><span class="n">D</span><span class="o">/</span><span class="m">2</span><span class="p">)</span><span class="w"> </span><span class="o">*</span><span class="nf">log</span><span class="p">(</span><span class="m">2</span><span class="o">*</span><span class="nb">pi</span><span class="p">)</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="m">0.5</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">E_mu_pres</span><span class="p">[</span><span class="n">k</span><span class="p">,</span><span class="n">i</span><span class="p">]</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="p">}</span><span 
class="w"> </span><span class="c1"># Log-sum-exp trick: subtract the row maximum before exponentiating, for numerical stability</span><span class="w"> </span><span class="n">rho</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">apply</span><span class="p">(</span><span class="n">r</span><span class="p">,</span><span class="w"> </span><span class="m">1</span><span class="p">,</span><span class="w"> </span><span class="k">function</span><span class="p">(</span><span class="n">x</span><span class="p">){</span><span class="w"> </span><span class="n">offset</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="nf">max</span><span class="p">(</span><span class="n">x</span><span class="p">)</span><span class="w"> </span><span class="n">y</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">x</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">offset</span><span class="w"> </span><span class="nf">return</span><span class="p">(</span><span class="nf">exp</span><span class="p">(</span><span class="n">y</span><span class="p">)</span><span class="o">/</span><span class="nf">sum</span><span class="p">(</span><span class="nf">exp</span><span class="p">(</span><span class="n">y</span><span class="p">)))</span><span class="w"> </span><span class="p">})</span><span class="w"> </span><span class="n">rho</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">t</span><span class="p">(</span><span class="n">rho</span><span class="p">)</span><span class="w"> </span><span class="c1">########################### Variational M-Step ##################################</span><span class="w"> </span><span class="c1"># Auxiliary statistics</span><span class="w"> </span><span class="n">Nk</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">apply</span><span class="p">(</span><span 
class="n">rho</span><span class="p">,</span><span class="w"> </span><span class="m">2</span><span class="p">,</span><span class="w"> </span><span class="n">sum</span><span class="p">)</span><span class="w"> </span><span class="c1"># Update means</span><span class="w"> </span><span class="n">xBark</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">matrix</span><span class="p">(</span><span class="m">0</span><span class="p">,</span><span class="w"> </span><span class="n">K</span><span class="p">,</span><span class="w"> </span><span class="n">D</span><span class="p">)</span><span class="w"> </span><span class="k">for</span><span class="p">(</span><span class="n">i</span><span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="m">1</span><span class="o">:</span><span class="n">K</span><span class="p">){</span><span class="w"> </span><span class="n">xBark</span><span class="p">[</span><span class="n">i</span><span class="p">,]</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">colSums</span><span class="p">(</span><span class="n">rho</span><span class="p">[,</span><span class="n">i</span><span class="p">]</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">X</span><span class="p">)</span><span class="w"> </span><span class="o">/</span><span class="w"> </span><span class="n">Nk</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="c1"># Update covariances</span><span class="w"> </span><span class="n">Sk</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">array</span><span class="p">(</span><span class="m">0</span><span class="p">,</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="n">D</span><span 
class="p">,</span><span class="n">D</span><span class="p">,</span><span class="n">K</span><span class="p">))</span><span class="w"> </span><span class="k">for</span><span class="p">(</span><span class="n">i</span><span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="m">1</span><span class="o">:</span><span class="n">K</span><span class="p">){</span><span class="w"> </span><span class="n">sum_sk</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="m">0</span><span class="w"> </span><span class="k">for</span><span class="p">(</span><span class="n">j</span><span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="m">1</span><span class="o">:</span><span class="n">N</span><span class="p">){</span><span class="w"> </span><span class="n">sum_sk</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">sum_sk</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">rho</span><span class="p">[</span><span class="n">j</span><span class="p">,</span><span class="n">i</span><span class="p">]</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="p">(</span><span class="n">X</span><span class="p">[</span><span class="n">j</span><span class="p">,]</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">xBark</span><span class="p">[</span><span class="n">i</span><span class="p">,])</span><span class="w"> </span><span class="o">%*%</span><span class="w"> </span><span class="n">t</span><span class="p">(</span><span class="n">X</span><span class="p">[</span><span class="n">j</span><span class="p">,]</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">xBark</span><span class="p">[</span><span class="n">i</span><span class="p">,])</span><span class="w"> </span><span class="p">}</span><span 
class="w"> </span><span class="n">Sk</span><span class="p">[,,</span><span class="n">i</span><span class="p">]</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">sum_sk</span><span class="w"> </span><span class="o">/</span><span class="w"> </span><span class="n">Nk</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="c1"># Update hyperparameters</span><span class="w"> </span><span class="k">for</span><span class="p">(</span><span class="n">i</span><span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="m">1</span><span class="o">:</span><span class="n">K</span><span class="p">){</span><span class="w"> </span><span class="n">betak</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">beta0</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">Nk</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="w"> </span><span class="n">mk</span><span class="p">[</span><span class="n">i</span><span class="p">,]</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="p">(</span><span class="m">1</span><span class="o">/</span><span class="n">betak</span><span class="p">[</span><span class="n">i</span><span class="p">])</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="p">(</span><span class="n">beta0</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">m</span><span class="m">0</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">Nk</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="w"> </span><span 
class="o">*</span><span class="w"> </span><span class="n">xBark</span><span class="p">[</span><span class="n">i</span><span class="p">,])</span><span class="w"> </span><span class="n">Wk</span><span class="p">[,,</span><span class="n">i</span><span class="p">]</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">solve</span><span class="p">(</span><span class="n">solve</span><span class="p">(</span><span class="n">W</span><span class="m">0</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">Nk</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">Sk</span><span class="p">[,,</span><span class="n">i</span><span class="p">]</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="p">((</span><span class="n">beta0</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">Nk</span><span class="p">[</span><span class="n">i</span><span class="p">])</span><span class="w"> </span><span class="o">/</span><span class="w"> </span><span class="p">(</span><span class="n">beta0</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">Nk</span><span class="p">[</span><span class="n">i</span><span class="p">]))</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="p">(</span><span class="n">xBark</span><span class="p">[</span><span class="n">i</span><span class="p">,]</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">m</span><span class="m">0</span><span class="p">)</span><span class="w"> </span><span class="o">%*%</span><span class="w"> </span><span class="n">t</span><span class="p">(</span><span class="n">xBark</span><span class="p">[</span><span class="n">i</span><span 
class="p">,]</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">m</span><span class="m">0</span><span class="p">))</span><span class="w"> </span><span class="n">vk</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">v</span><span class="m">0</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">Nk</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="c1">#ELBO (Evidence Lower Bound)</span><span class="w"> </span><span class="c1"># ELBO is a sum of seven terms</span><span class="w"> </span><span class="n">term1</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="m">0</span><span class="w"> </span><span class="c1">#10.71</span><span class="w"> </span><span class="k">for</span><span class="p">(</span><span class="n">i</span><span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="m">1</span><span class="o">:</span><span class="n">K</span><span class="p">){</span><span class="w"> </span><span class="n">term1</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">term1</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">Nk</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="p">(</span><span class="n">ln_pres</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="p">(</span><span class="n">D</span><span class="w"> </span><span class="o">/</span><span class="w"> </span><span 
class="n">betak</span><span class="p">[</span><span class="n">i</span><span class="p">])</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">vk</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="nf">sum</span><span class="p">(</span><span class="n">diag</span><span class="p">(</span><span class="n">Sk</span><span class="p">[,,</span><span class="n">i</span><span class="p">]</span><span class="w"> </span><span class="o">%*%</span><span class="w"> </span><span class="n">Wk</span><span class="p">[,,</span><span class="n">i</span><span class="p">]))</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">vk</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="p">(</span><span class="w"> </span><span class="n">t</span><span class="p">(</span><span class="n">xBark</span><span class="p">[</span><span class="n">i</span><span class="p">,]</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">mk</span><span class="p">[</span><span class="n">i</span><span class="p">,])</span><span class="w"> </span><span class="o">%*%</span><span class="w"> </span><span class="n">Wk</span><span class="p">[,,</span><span class="n">i</span><span class="p">]</span><span class="w"> </span><span class="o">%*%</span><span class="w"> </span><span class="p">(</span><span class="n">xBark</span><span class="p">[</span><span class="n">i</span><span class="p">,]</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">mk</span><span class="p">[</span><span class="n">i</span><span class="p">,]))</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">D</span><span class="w"> </span><span 
class="o">*</span><span class="w"> </span><span class="nf">log</span><span class="p">(</span><span class="m">2</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="nb">pi</span><span class="p">))</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="n">term1</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="m">0.5</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">term1</span><span class="w"> </span><span class="n">term2</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="m">0</span><span class="w"> </span><span class="c1">#10.72</span><span class="w"> </span><span class="k">for</span><span class="p">(</span><span class="n">i</span><span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="m">1</span><span class="o">:</span><span class="n">N</span><span class="p">){</span><span class="w"> </span><span class="k">for</span><span class="p">(</span><span class="n">j</span><span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="m">1</span><span class="o">:</span><span class="n">K</span><span class="p">){</span><span class="w"> </span><span class="n">term2</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">term2</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="p">(</span><span class="n">rho</span><span class="p">[</span><span class="n">i</span><span class="p">,</span><span class="n">j</span><span class="p">]</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">ln_pi</span><span class="p">[</span><span class="n">j</span><span class="p">])</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span 
class="n">term3</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="m">0</span><span class="w"> </span><span class="c1">#10.73</span><span class="w"> </span><span class="k">for</span><span class="p">(</span><span class="n">i</span><span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="m">1</span><span class="o">:</span><span class="n">K</span><span class="p">){</span><span class="w"> </span><span class="n">term3</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">term3</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">ln_pi</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="n">term3</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">term3</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="p">(</span><span class="n">alpha0</span><span class="w"> </span><span class="m">-1</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nf">log</span><span class="p">(</span><span class="n">dirConstant</span><span class="p">(</span><span class="n">alpha0</span><span class="p">))</span><span class="w"> </span><span class="n">term4</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="m">0</span><span class="w"> </span><span class="c1">#10.74</span><span class="w"> </span><span class="n">sub</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="m">0</span><span class="w"> </span><span class="k">for</span><span class="p">(</span><span class="n">i</span><span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="m">1</span><span class="o">:</span><span 
class="n">K</span><span class="p">){</span><span class="w"> </span><span class="n">term4</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">term4</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">D</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="nf">log</span><span class="p">(</span><span class="n">beta0</span><span class="w"> </span><span class="o">/</span><span class="w"> </span><span class="p">(</span><span class="m">2</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="nb">pi</span><span class="p">))</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">ln_pres</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="o">-</span><span class="w"> </span><span class="p">((</span><span class="n">D</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">beta0</span><span class="p">)</span><span class="o">/</span><span class="n">betak</span><span class="p">[</span><span class="n">i</span><span class="p">])</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">beta0</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">vk</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">t</span><span class="p">(</span><span class="n">mk</span><span class="p">[</span><span class="n">i</span><span class="p">,]</span><span class="o">-</span><span class="n">m</span><span class="m">0</span><span class="p">)</span><span class="w"> </span><span class="o">%*%</span><span class="w"> </span><span class="n">Wk</span><span class="p">[,,</span><span class="n">i</span><span class="p">]</span><span class="w"> </span><span 
class="o">%*%</span><span class="w"> </span><span class="p">(</span><span class="n">mk</span><span class="p">[</span><span class="n">i</span><span class="p">,]</span><span class="o">-</span><span class="n">m</span><span class="m">0</span><span class="p">)</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="n">term4</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="m">0.5</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">term4</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">K</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="nf">log</span><span class="p">(</span><span class="n">BWishart</span><span class="p">(</span><span class="n">W</span><span class="m">0</span><span class="p">,</span><span class="n">v</span><span class="m">0</span><span class="p">))</span><span class="w"> </span><span class="k">for</span><span class="p">(</span><span class="n">i</span><span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="m">1</span><span class="o">:</span><span class="n">K</span><span class="p">){</span><span class="w"> </span><span class="n">sub</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">sub</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">vk</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="nf">sum</span><span class="p">(</span><span class="n">diag</span><span class="p">(</span><span class="n">solve</span><span class="p">(</span><span class="n">W</span><span class="m">0</span><span class="p">)</span><span class="w"> </span><span class="o">%*%</span><span class="w"> </span><span class="n">Wk</span><span 
class="p">[,,</span><span class="n">i</span><span class="p">]))</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="n">term4</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">term4</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nf">sum</span><span class="p">(</span><span class="n">ln_pres</span><span class="p">)</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="p">((</span><span class="n">v</span><span class="m">0</span><span class="o">-</span><span class="n">D</span><span class="m">-1</span><span class="p">)</span><span class="o">/</span><span class="m">2</span><span class="p">)</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="m">0.5</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">sub</span><span class="w"> </span><span class="n">term5</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="m">0</span><span class="w"> </span><span class="c1">#10.75</span><span class="w"> </span><span class="k">for</span><span class="p">(</span><span class="n">i</span><span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="m">1</span><span class="o">:</span><span class="n">N</span><span class="p">){</span><span class="w"> </span><span class="k">for</span><span class="p">(</span><span class="n">j</span><span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="m">1</span><span class="o">:</span><span class="n">K</span><span class="p">){</span><span class="w"> </span><span class="n">stand</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">rho</span><span class="p">[</span><span class="n">i</span><span class="p">,</span><span class="n">j</span><span class="p">]</span><span class="w"> 
</span><span class="o">*</span><span class="w"> </span><span class="nf">log</span><span class="p">(</span><span class="n">rho</span><span class="p">[</span><span class="n">i</span><span class="p">,</span><span class="n">j</span><span class="p">])</span><span class="w"> </span><span class="k">if</span><span class="p">(</span><span class="o">!</span><span class="nf">is.finite</span><span class="p">(</span><span class="n">stand</span><span class="p">))</span><span class="w"> </span><span class="n">stand</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="m">0</span><span class="w"> </span><span class="n">term5</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">term5</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">stand</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="n">term6</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="m">0</span><span class="w"> </span><span class="c1">#10.76</span><span class="w"> </span><span class="k">for</span><span class="p">(</span><span class="n">i</span><span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="m">1</span><span class="o">:</span><span class="n">K</span><span class="p">){</span><span class="w"> </span><span class="n">term6</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">term6</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="p">(</span><span class="n">alphak</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="m">-1</span><span class="p">)</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">ln_pi</span><span class="p">[</span><span class="n">i</span><span 
class="p">]</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="n">term6</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">term6</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nf">log</span><span class="p">(</span><span class="n">dirConstant</span><span class="p">(</span><span class="n">alphak</span><span class="p">))</span><span class="w"> </span><span class="n">term7</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="m">0</span><span class="w"> </span><span class="c1">#10.77</span><span class="w"> </span><span class="k">for</span><span class="p">(</span><span class="n">i</span><span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="m">1</span><span class="o">:</span><span class="n">K</span><span class="p">){</span><span class="w"> </span><span class="n">term7</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">term7</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="m">0.5</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">ln_pres</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="p">(</span><span class="n">D</span><span class="o">/</span><span class="m">2</span><span class="p">)</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="nf">log</span><span class="p">(</span><span class="n">betak</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="o">/</span><span class="p">(</span><span class="m">2</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="nb">pi</span><span class="p">))</span><span class="w"> </span><span 
class="o">-</span><span class="w"> </span><span class="p">(</span><span class="n">D</span><span class="o">/</span><span class="m">2</span><span class="p">)</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">entropyWishart</span><span class="p">(</span><span class="n">Wk</span><span class="p">[,,</span><span class="n">i</span><span class="p">],</span><span class="w"> </span><span class="n">vk</span><span class="p">[</span><span class="n">i</span><span class="p">])</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">if</span><span class="p">(</span><span class="n">it</span><span class="w"> </span><span class="o">&gt;</span><span class="w"> </span><span class="m">1</span><span class="p">){</span><span class="w"> </span><span class="n">prevELBO</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">ELBO</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="n">ELBO</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">term1</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">term2</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">term3</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">term4</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">term5</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">term6</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">term7</span><span class="w"> </span><span class="c1"># Convergence criteria</span><span class="w"> </span><span class="k">if</span><span class="p">(</span><span class="n">it</span><span class="w"> </span><span class="o">&gt;</span><span class="w"> </span><span 
class="m">1</span><span class="w"> </span><span class="o">&amp;&amp;</span><span class="w"> </span><span class="nf">is.finite</span><span class="p">(</span><span class="n">ELBO</span><span class="p">)){</span><span class="w"> </span><span class="k">if</span><span class="p">(</span><span class="nf">abs</span><span class="p">(</span><span class="n">ELBO</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">prevELBO</span><span class="p">)</span><span class="w"> </span><span class="o">&lt;</span><span class="w"> </span><span class="n">eps</span><span class="p">){</span><span class="w"> </span><span class="k">break</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="c1"># Return responsibilities, ELBO, Wishart scale matrices (Wk) and means (mk)</span><span class="w"> </span><span class="c1"># (You can add whatever parameters (or hyperparameters) you need)</span><span class="w"> </span><span class="n">lista</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="nf">list</span><span class="p">(</span><span class="s2">"rho"</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">rho</span><span class="p">,</span><span class="w"> </span><span class="s2">"ELBO"</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">ELBO</span><span class="p">,</span><span class="w"> </span><span class="s2">"Wk"</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Wk</span><span class="p">,</span><span class="w"> </span><span class="s2">"mk"</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">mk</span><span class="p">)</span><span class="w"> </span><span class="nf">return</span><span class="p">(</span><span class="n">lista</span><span 
class="p">)</span><span class="w"> </span><span class="p">}</span><span class="w"> </span></code></pre></div></div> <h2 id="applications">Applications</h2> <p>Gaussian mixture models can be seen as a form of clustering in which each observation is assigned a probability of belonging to every cluster, since we estimate the probability of belonging to each Gaussian component. This is called “soft clustering”, as opposed to algorithms like k-means, which is a “hard clustering” technique (each observation belongs to exactly one cluster). As a matter of fact, k-means can be recovered as a limiting case of a Gaussian mixture in which all components share the same isotropic covariance (equal variances and no covariances, so all the clusters are spherical) and that variance shrinks to zero.</p> <p>A consequence of this is that Gaussian mixtures are more flexible than k-means, because the clusters can have an “elliptical form”. In particular, Gaussian mixtures are a popular choice in image segmentation. For example, in image matting (segmenting an image into background and foreground pixels), GMMs are a natural choice because each pixel gets a probability of belonging to the foreground and to the background.</p> <h2 id="eigenfaces">Eigenfaces</h2> <p>In this post, we will use a variational GMM to do face detection. We will use the <a href="https://cswww.essex.ac.uk/mv/allfaces/faces94.html">faces94 dataset</a>, and choose the most probable category for each face.</p> <p>The representation I chose for the images is eigenfaces, which are the eigenvectors of the covariance matrix of the face images. The images are stored in a matrix X (each column is an image, and each row holds the values of one pixel across all images). 
It’s important to note that the images have to be centered first (subtract the mean face).</p> <p>To reduce dimensionality, we will work with the eigenvectors of the matrix X’X, which is only N x N (N being the number of images), instead of the much larger pixel-by-pixel matrix XX’.</p> <h2 id="results">Results</h2> <p>The first five eigenfaces:</p> <div class="image-gallery"> <img src="https://raw.githubusercontent.com/jean9208/jean9208.github.io/master/assets/img/posts/eigenFaces/eigF1.jpg" /> <img src="https://raw.githubusercontent.com/jean9208/jean9208.github.io/master/assets/img/posts/eigenFaces/eigF2.jpg" /> <img src="https://raw.githubusercontent.com/jean9208/jean9208.github.io/master/assets/img/posts/eigenFaces/eigF3.jpg" /> <img src="https://raw.githubusercontent.com/jean9208/jean9208.github.io/master/assets/img/posts/eigenFaces/eigF4.jpg" /> <img src="https://raw.githubusercontent.com/jean9208/jean9208.github.io/master/assets/img/posts/eigenFaces/eigF5.jpg" /> <div class="clear"></div> </div> <p>Now the results of the classification:</p> <iframe frameborder="0" src="/html_wg/vgmm_fd/facesPlot.html" height="500"></iframe> <p><a href="/html_wg/vgmm_fd/facesPlot.html" target="_blank">open</a></p> <p>We can see that the algorithm misclassified only one point. Notice that the groups are almost linearly separable, so eigenfaces were an extremely helpful representation.</p> <h1 id="final-thoughts">Final thoughts</h1> <p>A Gaussian mixture model is a powerful technique for unsupervised learning. With variational inference, we can extend the mixture further, for example to handle missing values or to add more levels to the hierarchical model. 
GMMs are also a foundation for learning more advanced models such as Hidden Markov Models.</p> /vgmm_fd/ /vgmm_fd 2018-07-13T00:00:00+00:00 Correspondence Analysis of Mexican Discourses <h2 id="correspondence-analysis">Correspondence Analysis</h2> <p>Correspondence analysis is a multivariate statistical technique that summarizes a set of categorical data in a two-dimensional form. 
It is the analogue of Principal Component Analysis for categorical data.</p> <p>Correspondence analysis is usually applied to contingency tables. In this post, we will apply it to a frequency matrix (the term document matrix from a bag of words representation).</p> <p>The analysis can be done by row or by column. Below is an implementation of correspondence analysis, where the row and column analyses are done at the same time.</p> <div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">correspondence</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="k">function</span><span class="p">(</span><span class="n">ct</span><span class="p">,</span><span class="w"> </span><span class="n">ind</span><span class="p">){</span><span class="w"> </span><span class="c1">#Parameters</span><span class="w"> </span><span class="c1">#ct : contingency table (or frequency table)</span><span class="w"> </span><span class="c1">#ind: which eigenvectors (first eigenvector is omitted)</span><span class="w"> </span><span class="n">n</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="nf">sum</span><span class="p">(</span><span class="n">ct</span><span class="p">)</span><span class="w"> </span><span class="n">rows</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">nrow</span><span class="p">(</span><span class="n">ct</span><span class="p">)</span><span class="w"> </span><span class="n">cols</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">ncol</span><span class="p">(</span><span class="n">ct</span><span class="p">)</span><span class="w"> </span><span class="c1">#Correspondence Matrix</span><span class="w"> </span><span class="n">F_fisher</span><span class="o">&lt;-</span><span class="p">(</span><span class="n">ct</span><span class="p">)</span><span 
class="o">/</span><span class="n">n</span><span class="w"> </span><span class="c1">#Relative frequencies</span><span class="w"> </span><span class="n">rtot</span><span class="o">&lt;-</span><span class="p">(</span><span class="n">apply</span><span class="p">(</span><span class="n">ct</span><span class="p">,</span><span class="m">1</span><span class="p">,</span><span class="n">sum</span><span class="p">))</span><span class="o">/</span><span class="n">n</span><span class="w"> </span><span class="n">ctot</span><span class="o">&lt;-</span><span class="n">apply</span><span class="p">(</span><span class="n">ct</span><span class="p">,</span><span class="m">2</span><span class="p">,</span><span class="n">sum</span><span class="p">)</span><span class="o">/</span><span class="n">n</span><span class="w"> </span><span class="n">Dr</span><span class="o">&lt;-</span><span class="n">diag</span><span class="p">(</span><span class="n">rtot</span><span class="p">)</span><span class="w"> </span><span class="n">Dc</span><span class="o">&lt;-</span><span class="n">diag</span><span class="p">(</span><span class="n">ctot</span><span class="p">)</span><span class="w"> </span><span class="n">Z</span><span class="o">&lt;-</span><span class="p">(</span><span class="nf">sqrt</span><span class="p">(</span><span class="n">solve</span><span class="p">(</span><span class="n">Dr</span><span class="p">)))</span><span class="o">%*%</span><span class="n">F_fisher</span><span class="o">%*%</span><span class="p">(</span><span class="nf">sqrt</span><span class="p">(</span><span class="n">solve</span><span class="p">(</span><span class="n">Dc</span><span class="p">)))</span><span class="w"> </span><span class="c1">#Eigenvalues and eigenvector are obtained with SVD</span><span class="w"> </span><span class="n">dvalsing</span><span class="o">&lt;-</span><span class="n">svd</span><span class="p">(</span><span class="n">Z</span><span class="p">)</span><span class="w"> </span><span class="c1">#Two dimensional 
representation</span><span class="w"> </span><span class="c1">#Row analysis</span><span class="w"> </span><span class="n">Cr</span><span class="o">&lt;-</span><span class="p">(</span><span class="nf">sqrt</span><span class="p">(</span><span class="n">solve</span><span class="p">(</span><span class="n">Dr</span><span class="p">)))</span><span class="o">%*%</span><span class="n">Z</span><span class="o">%*%</span><span class="n">dvalsing</span><span class="o">$</span><span class="n">v</span><span class="p">[,</span><span class="n">ind</span><span class="p">]</span><span class="w"> </span><span class="c1">#Column analysis</span><span class="w"> </span><span class="n">Cc</span><span class="o">&lt;-</span><span class="p">(</span><span class="nf">sqrt</span><span class="p">(</span><span class="n">solve</span><span class="p">(</span><span class="n">Dc</span><span class="p">)))</span><span class="o">%*%</span><span class="n">t</span><span class="p">(</span><span class="n">Z</span><span class="p">)</span><span class="o">%*%</span><span class="n">dvalsing</span><span class="o">$</span><span class="n">u</span><span class="p">[,</span><span class="n">ind</span><span class="p">]</span><span class="w"> </span><span class="nf">return</span><span class="p">(</span><span class="nf">list</span><span class="p">(</span><span class="s2">"Cr"</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Cr</span><span class="p">,</span><span class="w"> </span><span class="s2">"Cc"</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Cc</span><span class="p">))</span><span class="w"> </span><span class="p">}</span><span class="w"> </span></code></pre></div></div> <h2 id="mexican-discourses">Mexican discourses</h2> <p>In this post we will analyze the discourses (speeches) of Mexican politicians, in particular candidates for the presidency of Mexico. 
We have 11 discourses in total:</p> <ul> <li>Roberto Madrazo Pintado (PRI 2006)</li> <li>Andres Manuel Lopez Obrador (PRD 2006) (PRD 2012) (MORENA 2018)</li> <li>Enrique Peña Nieto (PRI 2012 before and after being elected)</li> <li>Josefina Vazquez Mota (PAN 2012)</li> <li>Felipe Calderon (PAN 2006)</li> <li>Ricardo Anaya Cortes (PAN 2018)</li> <li>Jose Antonio Meade Kuribreña (PRI 2018)</li> <li>Margarita Ester Zavala Gomez del Campo (Independiente 2018)</li> </ul> <p>Our objective is to find patterns in the two-dimensional representation of the discourses that reflect the current political context in Mexico.</p> <h2 id="putting-it-all-together">Putting it all together</h2> <p>We will use the bag of words representation for the discourses. The 500 most frequent words will be chosen for the analysis, so our final term document matrix will be an 11 x 500 matrix.</p> <p>Next, we see the results of the correspondence analysis applied to our term document matrix:</p> <iframe frameborder="0" src="/html_wg/correspondence_analysis/anCorrPol.html" height="500"></iframe> <p><a href="/html_wg/correspondence_analysis/anCorrPol.html" target="_blank">open</a></p> <h2 id="insights">Insights</h2> <p>We can see that Ricardo Anaya and Roberto Madrazo are the furthest from the rest. In this context, that means they use words in their discourses that the other candidates don’t use frequently.</p> <p>The three discourses from Andres Manuel are close to each other, as expected. Margarita Zavala is also close to Josefina Vazquez Mota. That makes sense, as their campaigns are based on the idea of a woman in the presidency, so it’s logical that they use similar words in their discourses.</p> <p>Another interesting insight is the closeness between Felipe Calderon and Margarita Zavala. It turns out that the team that helped Zavala in her campaign was made up of former collaborators of Felipe Calderon, so she may have been advised in the same way as Calderon. 
<a href="https://www.animalpolitico.com/2018/04/excolaboradores-de-felipe-calderon-la-base-del-equipo-de-campana-de-margarita-zavala/">Check the news here.</a></p> <p>The final insight is the closeness between Margarita Zavala and Jose Antonio Meade. Zavala recently withdrew her candidacy and, surprisingly, Jorge Camacho (former chief of Zavala’s campaign) has announced that he intends to vote for Meade. Perhaps he intends to vote for the candidate with the most similar ideas, which would explain the closeness in our analysis. <a href="https://www.elmanana.com/ex-jefe-campana-zavala-votara-meade-jose-antonio-meade-margarita-zavala-pri-twitter/4426818">Check the news here.</a></p> <h2 id="final-thoughts">Final thoughts</h2> <p>Correspondence analysis has proven useful for finding patterns in frequency matrices. We saw how some political news can be reflected in a discourse analysis. For future work, we can apply MDS to the term frequency matrix to obtain “data points” and train a classifier! But correspondence analysis is good for an initial representation.</p> <h2 id="discourses">Discourses</h2> <p>Discourses obtained from <a href="https://www.animalpolitico.com/2018/03/discursos-candidatos-presidenciales/">animalpolitico.com</a></p> /ca_mexdis/ /ca_mexdis 2018-06-24T00:00:00+00:00 Postgresql + R Sandbox <h2 id="elephantsql">ElephantSQL</h2> <p><a href="https://www.elephantsql.com/">ElephantSQL</a> offers a free instance of PostgreSQL, with a limit of 20 MB and 5 concurrent connections. 
For example, you can deploy a Shiny application that depends on data stored in ElephantSQL.</p> <p>You only need to register on the site, and you automatically get access to your free instance.</p> <p>In this post we will see how to take advantage of this cloud database.</p> <h2 id="getting-the-data">Getting the data</h2> <p>For this example I will use the open air quality data available on the site of SEDEMA (the Environment Secretariat of Mexico City).</p> <p>The data is organized as one CSV file per year and is available from 1992 onward.</p> <div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">#Auxiliary function to download the files</span><span class="w"> </span><span class="n">load_sedema</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="k">function</span><span class="p">(</span><span class="n">year</span><span class="p">){</span><span class="w"> </span><span class="c1">#URL to the file</span><span class="w"> </span><span class="c1">#from 1992</span><span class="w"> </span><span class="n">link</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">paste0</span><span class="p">(</span><span class="s2">"http://148.243.232.112:8080/opendata/IndiceCalidadAire/indice_"</span><span class="p">,</span><span class="n">year</span><span class="p">,</span><span class="s2">".csv"</span><span class="p">)</span><span class="w"> </span><span class="c1">#Column classes</span><span class="w"> </span><span class="n">types</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="s2">"character"</span><span class="p">,</span><span class="w"> </span><span class="nf">rep</span><span class="p">(</span><span class="s2">"numeric"</span><span class="p">,</span><span class="m">26</span><span class="p">))</span><span class="w"> </span><span class="c1">#Download the 
file</span><span class="w"> </span><span class="n">air_data</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">read.csv</span><span class="p">(</span><span class="n">link</span><span class="p">,</span><span class="n">skip</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">9</span><span class="p">,</span><span class="w"> </span><span class="n">stringsAsFactors</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">F</span><span class="p">,</span><span class="w"> </span><span class="n">encoding</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"latin1"</span><span class="p">,</span><span class="w"> </span><span class="n">header</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">F</span><span class="p">,</span><span class="w"> </span><span class="n">colClasses</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">types</span><span class="p">,</span><span class="w"> </span><span class="n">na.string</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"NA"</span><span class="p">)</span><span class="w"> </span><span class="c1">#Remove missing data</span><span class="w"> </span><span class="n">air_data</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">air_data</span><span class="p">[</span><span class="o">!</span><span class="n">air_data</span><span class="p">[,</span><span class="m">1</span><span class="p">]</span><span class="o">==</span><span class="s2">""</span><span class="p">,</span><span class="m">1</span><span class="o">:</span><span class="m">27</span><span class="p">]</span><span class="w"> </span><span class="c1">#Fix time variable</span><span class="w"> </span><span class="n">air_data</span><span 
class="o">$</span><span class="n">V</span><span class="m">1</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">paste0</span><span class="p">(</span><span class="n">substring</span><span class="p">(</span><span class="n">air_data</span><span class="o">$</span><span class="n">V</span><span class="m">1</span><span class="p">,</span><span class="w"> </span><span class="m">1</span><span class="p">,</span><span class="w"> </span><span class="m">6</span><span class="p">),</span><span class="w"> </span><span class="n">year</span><span class="p">)</span><span class="w"> </span><span class="c1">#We need to ensure that all dates are from the specified year</span><span class="w"> </span><span class="nf">return</span><span class="p">(</span><span class="n">air_data</span><span class="p">)</span><span class="w"> </span><span class="p">}</span><span class="w"> </span></code></pre></div></div> <p>The next step is to create the table in PostgreSQL, now that we know the structure of the CSV.</p> <div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">library</span><span class="p">(</span><span class="n">RPostgreSQL</span><span class="p">)</span><span class="w"> </span><span class="c1"># SQL query to create the main table if it does not exist</span><span class="w"> </span><span class="s2">" CREATE TABLE IF NOT EXISTS air_quality ( FECHA date, HORA integer, NO_OZONO integer, NO_AZUFRE integer, NO_NITROGENO integer, NO_CARBONO integer, NO_PM10 integer, NE_OZONO integer, NE_AZUFRE integer, NE_NITROGENO integer, NE_CARBONO integer, NE_PM10 integer, CE_OZONO integer, CE_AZUFRE integer, CE_NITROGENO integer, CE_CARBONO integer, CE_PM10 integer, SO_OZONO integer, SO_AZUFRE integer, SO_NITROGENO integer, SO_CARBONO integer, SO_PM10 integer, SU_OZONO integer, SU_AZUFRE integer, SU_NITROGENO integer, SU_CARBONO integer, SU_PM10 integer, ID serial, PRIMARY KEY (ID) ) "</span><span class="w"> </span><span 
class="o">-&gt;</span><span class="w"> </span><span class="n">query</span><span class="w"> </span><span class="c1">#Be sure to change your credentials! You can check them in the Details window of your ElephantSQL instance!</span><span class="w"> </span><span class="c1">#dbname is the user &amp; default database</span><span class="w"> </span><span class="c1">#host is the server</span><span class="w"> </span><span class="c1">#you can get the port from the URL</span><span class="w"> </span><span class="c1"># Connect to database</span><span class="w"> </span><span class="n">drv</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">dbDriver</span><span class="p">(</span><span class="s2">"PostgreSQL"</span><span class="p">)</span><span class="w"> </span><span class="n">con</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">dbConnect</span><span class="p">(</span><span class="n">drv</span><span class="p">,</span><span class="w"> </span><span class="n">dbname</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">user</span><span class="p">,</span><span class="w"> </span><span class="n">host</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">db_url</span><span class="p">,</span><span class="w"> </span><span class="n">port</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">5432</span><span class="p">,</span><span class="w"> </span><span class="n">user</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">user</span><span class="p">,</span><span class="w"> </span><span class="n">password</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">pwd</span><span class="p">)</span><span class="w"> </span><span class="c1"># Create table</span><span class="w"> </span><span 
class="n">dbGetQuery</span><span class="p">(</span><span class="n">con</span><span class="p">,</span><span class="w"> </span><span class="n">query</span><span class="p">)</span><span class="w"> </span></code></pre></div></div> <p>Next, we upload the data for one year.</p> <div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">data</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">load_sedema</span><span class="p">(</span><span class="m">2017</span><span class="p">)</span><span class="w"> </span><span class="c1">#Correct format for date</span><span class="w"> </span><span class="n">data</span><span class="o">$</span><span class="n">V</span><span class="m">1</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">strptime</span><span class="p">(</span><span class="n">data</span><span class="o">$</span><span class="n">V</span><span class="m">1</span><span class="p">,</span><span class="w"> </span><span class="s2">"%d/%m/%Y"</span><span class="p">)</span><span class="w"> </span><span class="n">data</span><span class="o">$</span><span class="n">V</span><span class="m">1</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">gsub</span><span class="p">(</span><span class="s2">"/"</span><span class="p">,</span><span class="s2">"-"</span><span class="p">,</span><span class="n">data</span><span class="o">$</span><span class="n">V</span><span class="m">1</span><span class="p">)</span><span class="w"> </span><span class="c1">#Set ID (ind is the first ID to assign; it is not defined in this snippet)</span><span class="w"> </span><span class="n">data</span><span class="o">$</span><span class="n">id</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">seq</span><span class="p">(</span><span class="n">ind</span><span class="p">,</span><span class="n">nrow</span><span class="p">(</span><span class="n">data</span><span 
class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">ind</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="m">1</span><span class="p">)</span><span class="w"> </span><span class="c1">#Upload data</span><span class="w"> </span><span class="n">dbWriteTable</span><span class="p">(</span><span class="n">conn</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">con</span><span class="p">,</span><span class="w"> </span><span class="n">name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"air_quality"</span><span class="p">,</span><span class="n">value</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">data</span><span class="p">,</span><span class="w"> </span><span class="n">append</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">T</span><span class="p">,</span><span class="w"> </span><span class="n">row.names</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">F</span><span class="p">)</span><span class="w"> </span></code></pre></div></div> <p>Now you can upload all of the years! 
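</p> <p>A minimal sketch of such a loop follows. It is not the author’s bulk import script: it assumes the <code>load_sedema</code> function and the <code>con</code> connection defined above, and the helper name <code>upload_years</code>, its arguments, and the 1992 to 2017 range in the example call are illustrative assumptions.</p> <div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code>library(RPostgreSQL)

# Hypothetical helper: appends one year at a time to air_quality.
# `loader` is expected to behave like load_sedema() above.
upload_years &lt;- function(con, years, loader, first_id = 1) {
  next_id &lt;- first_id
  for (year in years) {
    air &lt;- loader(year)
    # Turn "dd/mm/yyyy" text into ISO "yyyy-mm-dd" text for the date column
    air$V1 &lt;- format(as.Date(air$V1, format = "%d/%m/%Y"))
    # Consecutive IDs that continue from one year to the next
    air$id &lt;- seq(next_id, length.out = nrow(air))
    dbWriteTable(conn = con, name = "air_quality", value = air,
                 append = TRUE, row.names = FALSE)
    next_id &lt;- next_id + nrow(air)
  }
  # Return the next unused ID so a later call can continue the sequence
  invisible(next_id)
}

# Example call (assumed range): upload_years(con, 1992:2017, load_sedema)
</code></pre></div></div> <p>Keep the 20 MB limit of the free instance in mind when choosing how many years to upload.</p> <p>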
Be sure to check the <a href="https://github.com/jean9208/Mexico-City-Air-Quality/blob/master/bulk_import.R">full script</a>.</p> <p><img src="http://jean9208.github.io/assets/img/posts/elephantsql.png" title="elephantsql" alt="elephantsql" style="display: block; margin: auto;" /></p> <p>We can query the data now.</p> <div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">query</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="s1">' SELECT * FROM "public"."air_quality" LIMIT 100 '</span><span class="w"> </span><span class="n">last100</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">dbGetQuery</span><span class="p">(</span><span class="n">con</span><span class="p">,</span><span class="w"> </span><span class="n">query</span><span class="p">)</span><span class="w"> </span><span class="n">head</span><span class="p">(</span><span class="n">last100</span><span class="p">)</span><span class="w"> </span><span class="c1"># Close the connection</span><span class="w"> </span><span class="n">dbDisconnect</span><span class="p">(</span><span class="n">con</span><span class="p">)</span><span class="w"> </span></code></pre></div></div> <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>## Loading required package: methods </code></pre></div></div> <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>## Loading required package: DBI </code></pre></div></div> <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>##        fecha hora no_ozono no_azufre no_nitrogeno no_carbono no_pm10
## 1 1992-04-01    7       55        34           10         43      NA
## 2 1992-04-01    8       72        39           15         46      NA
## 3 1992-04-01    9       80        44           25         52      NA
## 4 1992-04-01   10       84        48           31         62      NA
## 5 1992-04-01   11      161        43           45         73      NA
## 6 1992-04-01   12      250        41           42         82      NA
##   ne_ozono ne_azufre ne_nitrogeno ne_carbono ne_pm10 ce_ozono ce_azufre
## 1       70        24           19         43      NA       56        39
## 2       68        25           21         43      NA       56        37
## 3       62        35           30         46      NA       68        41
## 4       47        40           33         47      NA       85        43
## 5       81        37           28         47      NA      123        45
## 6       89        32           19         47      NA      185        38
##   ce_nitrogeno ce_carbono ce_pm10 so_ozono so_azufre so_nitrogeno
## 1           20         46      NA       34        26            9
## 2           23         45      NA       46        29           10
## 3           36         48      NA       54        32           15
## 4           64         55      NA       62        34           26
## 5           50         59      NA       81        35           19
## 6           38         62      NA      124        35           16
##   so_carbono so_pm10 su_ozono su_azufre su_nitrogeno su_carbono su_pm10 id
## 1         27      NA       25        18           16         64      NA  1
## 2         31      NA       31        20           18         65      NA  2
## 3         38      NA       32        24           21         65      NA  3
## 4         45      NA       42        26           36         65      NA  4
## 5         47      NA       69        24           40         66      NA  5
## 6         49      NA       55        22           27         67      NA  6
</code></pre></div></div> <p>I hope this little example helps you try PostgreSQL even if you don’t have it installed on your computer or don’t have a server.</p> /postgresqlR_Sandbox/ /postgresqlR_Sandbox 2017-09-24T00:00:00+00:00 Gradient Descent <h2 id="trying-gradient-descent-for-linear-regression">Trying gradient descent for linear regression</h2> <p>The best way to learn an algorithm is to code it. 
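</p> <p>For reference, the update rule that the code in this post follows can be written down first. The notation here is an assumption chosen to match the code: $m$ is the slope, $b$ the intercept, $\alpha$ the learning rate, and $n$ the number of observations $(x_i, y_i)$. The mean squared error is</p> <p>$$E(m, b) = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - (m x_i + b) \right)^2$$</p> <p>and its partial derivatives are</p> <p>$$\frac{\partial E}{\partial b} = -\frac{2}{n} \sum_{i=1}^{n} \left( y_i - (m x_i + b) \right), \qquad \frac{\partial E}{\partial m} = -\frac{2}{n} \sum_{i=1}^{n} x_i \left( y_i - (m x_i + b) \right)$$</p> <p>Each iteration then moves both parameters against the gradient:</p> <p>$$b \leftarrow b - \alpha \frac{\partial E}{\partial b}, \qquad m \leftarrow m - \alpha \frac{\partial E}{\partial m}$$</p> <p>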
So here it is: my take on the gradient descent algorithm for simple linear regression.</p> <p>First, we fit a simple linear model with lm to compare against the gradient descent values.</p> <div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">#Load libraries</span><span class="w"> </span><span class="n">library</span><span class="p">(</span><span class="n">dplyr</span><span class="p">)</span><span class="w"> </span><span class="n">library</span><span class="p">(</span><span class="n">highcharter</span><span class="p">)</span><span class="w"> </span><span class="c1">#Scaling length variables from iris dataset.</span><span class="w"> </span><span class="n">iris_demo</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">iris</span><span class="p">[,</span><span class="nf">c</span><span class="p">(</span><span class="s2">"Sepal.Length"</span><span class="p">,</span><span class="s2">"Petal.Length"</span><span class="p">)]</span><span class="w"> </span><span class="o">%&gt;%</span><span class="w"> </span><span class="n">mutate</span><span class="p">(</span><span class="n">sepal_length</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">as.numeric</span><span class="p">(</span><span class="n">scale</span><span class="p">(</span><span class="n">Sepal.Length</span><span class="p">)),</span><span class="w"> </span><span class="n">petal_length</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">as.numeric</span><span class="p">(</span><span class="n">scale</span><span class="p">(</span><span class="n">Petal.Length</span><span class="p">)))</span><span class="w"> </span><span class="o">%&gt;%</span><span class="w"> </span><span class="n">select</span><span class="p">(</span><span class="n">sepal_length</span><span class="p">,</span><span class="n">petal_length</span><span class="p">)</span><span 
class="w"> </span><span class="c1">#Fit a simple linear model to compare coefficients.</span><span class="w"> </span><span class="n">regression</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">lm</span><span class="p">(</span><span class="n">iris_demo</span><span class="o">$</span><span class="n">petal_length</span><span class="o">~</span><span class="n">iris_demo</span><span class="o">$</span><span class="n">sepal_length</span><span class="p">)</span><span class="w"> </span><span class="n">coef</span><span class="p">(</span><span class="n">regression</span><span class="p">)</span><span class="w"> </span></code></pre></div></div> <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>## (Intercept) iris_demo$sepal_length ## 4.643867e-16 8.717538e-01 </code></pre></div></div> <div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">iris_demo_reg</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">iris_demo</span><span class="w"> </span><span class="n">iris_demo_reg</span><span class="o">$</span><span class="n">reg</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">predict</span><span class="p">(</span><span class="n">regression</span><span class="p">,</span><span class="n">iris_demo</span><span class="p">)</span><span class="w"> </span><span class="c1">#Plot the model with highcharter</span><span class="w"> </span><span class="n">highchart</span><span class="p">()</span><span class="w"> </span><span class="o">%&gt;%</span><span class="w"> </span><span class="n">hc_add_series</span><span class="p">(</span><span class="n">data</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">iris_demo_reg</span><span class="p">,</span><span class="w"> </span><span class="n">type</span><span class="w"> </span><span 
class="o">=</span><span class="w"> </span><span class="s2">"scatter"</span><span class="p">,</span><span class="w"> </span><span class="n">hcaes</span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">sepal_length</span><span class="p">,</span><span class="w"> </span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">petal_length</span><span class="p">),</span><span class="w"> </span><span class="n">name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Sepal Length VS Petal Length"</span><span class="p">)</span><span class="w"> </span><span class="o">%&gt;%</span><span class="w"> </span><span class="n">hc_add_series</span><span class="p">(</span><span class="n">data</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">iris_demo_reg</span><span class="p">,</span><span class="w"> </span><span class="n">type</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"line"</span><span class="p">,</span><span class="w"> </span><span class="n">hcaes</span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">sepal_length</span><span class="p">,</span><span class="w"> </span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">reg</span><span class="p">),</span><span class="w"> </span><span class="n">name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Linear Regression"</span><span class="p">)</span><span class="w"> </span><span class="o">%&gt;%</span><span class="w"> </span><span class="n">hc_title</span><span class="p">(</span><span class="n">text</span><span class="w"> </span><span class="o">=</span><span class="w"> 
</span><span class="s2">"Linear Regression"</span><span class="p">)</span><span class="w"> </span></code></pre></div></div> <iframe src="/html_wg/gradient_descent/linear_regression.html" height="500"></iframe> <p><a href="/html_wg/gradient_descent/linear_regression.html" target="_blank">open</a></p> <p>We will try to obtain the same coefficients, this time using gradient descent.</p> <div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">library</span><span class="p">(</span><span class="n">tidyr</span><span class="p">)</span><span class="w"> </span><span class="n">set.seed</span><span class="p">(</span><span class="m">135</span><span class="p">)</span><span class="w"> </span><span class="c1">#To reproduce results</span><span class="w"> </span><span class="c1">#Auxiliary function</span><span class="w"> </span><span class="c1"># y = mx + b</span><span class="w"> </span><span class="n">reg</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="k">function</span><span class="p">(</span><span class="n">m</span><span class="p">,</span><span class="n">b</span><span class="p">,</span><span class="n">x</span><span class="p">)</span><span class="w"> </span><span class="nf">return</span><span class="p">(</span><span class="n">m</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">x</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">b</span><span class="p">)</span><span class="w"> </span><span class="c1">#Starting point</span><span class="w"> </span><span class="n">b</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">runif</span><span class="p">(</span><span class="m">1</span><span class="p">)</span><span class="w"> </span><span class="n">m</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span 
class="n">runif</span><span class="p">(</span><span class="m">1</span><span class="p">)</span><span class="w"> </span><span class="c1">#Gradient descent function</span><span class="w"> </span><span class="n">gradient_desc</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="k">function</span><span class="p">(</span><span class="n">b</span><span class="p">,</span><span class="w"> </span><span class="n">m</span><span class="p">,</span><span class="w"> </span><span class="n">data</span><span class="p">,</span><span class="w"> </span><span class="n">learning_rate</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">0.01</span><span class="p">){</span><span class="w"> </span><span class="c1"># Small steps</span><span class="w"> </span><span class="c1"># Column names = Code easier to understand</span><span class="w"> </span><span class="n">colnames</span><span class="p">(</span><span class="n">data</span><span class="p">)</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="s2">"x"</span><span class="p">,</span><span class="s2">"y"</span><span class="p">)</span><span class="w"> </span><span class="c1">#Values for first iteration</span><span class="w"> </span><span class="n">b_iter</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="m">0</span><span class="w"> </span><span class="n">m_iter</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="m">0</span><span class="w"> </span><span class="n">n</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">nrow</span><span class="p">(</span><span class="n">data</span><span class="p">)</span><span class="w"> </span><span class="c1"># Compute the gradient for Mean Squared Error function</span><span class="w"> </span><span 
class="k">for</span><span class="p">(</span><span class="n">i</span><span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="m">1</span><span class="o">:</span><span class="n">n</span><span class="p">){</span><span class="w"> </span><span class="c1"># Partial derivative for b</span><span class="w"> </span><span class="n">b_iter</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">b_iter</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="p">(</span><span class="m">-2</span><span class="o">/</span><span class="n">n</span><span class="p">)</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="p">(</span><span class="n">data</span><span class="o">$</span><span class="n">y</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="p">((</span><span class="n">m</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">data</span><span class="o">$</span><span class="n">x</span><span class="p">[</span><span class="n">i</span><span class="p">])</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">b</span><span class="p">))</span><span class="w"> </span><span class="c1"># Partial derivative for m</span><span class="w"> </span><span class="n">m_iter</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">m_iter</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="p">(</span><span class="m">-2</span><span class="o">/</span><span class="n">n</span><span class="p">)</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">data</span><span class="o">$</span><span class="n">x</span><span class="p">[</span><span class="n">i</span><span 
class="p">]</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="p">(</span><span class="n">data</span><span class="o">$</span><span class="n">y</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="p">((</span><span class="n">m</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">data</span><span class="o">$</span><span class="n">x</span><span class="p">[</span><span class="n">i</span><span class="p">])</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">b</span><span class="p">))</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="c1"># Move to the OPPOSITE direction of the derivative</span><span class="w"> </span><span class="n">new_b</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">b</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="p">(</span><span class="n">learning_rate</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">b_iter</span><span class="p">)</span><span class="w"> </span><span class="n">new_m</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">m</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="p">(</span><span class="n">learning_rate</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">m_iter</span><span class="p">)</span><span class="w"> </span><span class="c1"># Replace values and return</span><span class="w"> </span><span class="n">new</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="nf">list</span><span class="p">(</span><span class="n">new_b</span><span class="p">,</span><span 
class="n">new_m</span><span class="p">)</span><span class="w"> </span><span class="nf">return</span><span class="p">(</span><span class="n">new</span><span class="p">)</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="c1"># I need to store some values to make the motion plot</span><span class="w"> </span><span class="n">vect_m</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">m</span><span class="w"> </span><span class="n">vect_b</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">b</span><span class="w"> </span><span class="c1"># Iterate to obtain better parameters</span><span class="w"> </span><span class="k">for</span><span class="p">(</span><span class="n">i</span><span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="m">1</span><span class="o">:</span><span class="m">1000</span><span class="p">){</span><span class="w"> </span><span class="c1"># Update first, so the values stored at iteration i include that update</span><span class="w"> </span><span class="n">x</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">gradient_desc</span><span class="p">(</span><span class="n">b</span><span class="p">,</span><span class="n">m</span><span class="p">,</span><span class="n">iris_demo</span><span class="p">)</span><span class="w"> </span><span class="n">b</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">x</span><span class="p">[[</span><span class="m">1</span><span class="p">]]</span><span class="w"> </span><span class="n">m</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">x</span><span class="p">[[</span><span class="m">2</span><span class="p">]]</span><span class="w"> </span><span class="k">if</span><span class="p">(</span><span class="n">i</span><span class="w"> </span><span class="o">%in%</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="m">1</span><span class="p">,</span><span class="m">100</span><span class="p">,</span><span class="m">250</span><span class="p">,</span><span class="m">500</span><span class="p">)){</span><span class="w"> </span><span class="c1"># I keep some values in the iteration for the plot</span><span class="w"> </span><span class="n">vect_m</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="n">vect_m</span><span class="p">,</span><span class="n">m</span><span class="p">)</span><span class="w"> </span><span class="n">vect_b</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="n">vect_b</span><span class="p">,</span><span class="n">b</span><span class="p">)</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="n">print</span><span class="p">(</span><span class="n">paste0</span><span class="p">(</span><span class="s2">"m = "</span><span class="p">,</span><span class="w"> </span><span class="n">m</span><span class="p">))</span><span class="w"> </span></code></pre></div></div> <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>## [1] "m = 0.871753774273602" </code></pre></div></div> <div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">print</span><span class="p">(</span><span class="n">paste0</span><span class="p">(</span><span class="s2">"b = "</span><span class="p">,</span><span class="w"> </span><span class="n">b</span><span class="p">))</span><span class="w"> </span></code></pre></div></div> <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>## [1] "b = 5.52239677041512e-10" </code></pre></div></div> <p>The difference from the lm coefficients is minimal: the slopes agree to the precision printed by lm, and both intercepts are practically zero.</p> <p>We can see how the iterations work 
in the next plot:</p> <div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">#Compute new values</span><span class="w"> </span><span class="n">iris_demo</span><span class="o">$</span><span class="n">preit</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">reg</span><span class="p">(</span><span class="n">vect_m</span><span class="p">[</span><span class="m">1</span><span class="p">],</span><span class="n">vect_b</span><span class="p">[</span><span class="m">1</span><span class="p">],</span><span class="n">iris_demo</span><span class="o">$</span><span class="n">sepal_length</span><span class="p">)</span><span class="w"> </span><span class="n">iris_demo</span><span class="o">$</span><span class="n">it1</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">reg</span><span class="p">(</span><span class="n">vect_m</span><span class="p">[</span><span class="m">2</span><span class="p">],</span><span class="n">vect_b</span><span class="p">[</span><span class="m">2</span><span class="p">],</span><span class="n">iris_demo</span><span class="o">$</span><span class="n">sepal_length</span><span class="p">)</span><span class="w"> </span><span class="n">iris_demo</span><span class="o">$</span><span class="n">it100</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">reg</span><span class="p">(</span><span class="n">vect_m</span><span class="p">[</span><span class="m">3</span><span class="p">],</span><span class="n">vect_b</span><span class="p">[</span><span class="m">3</span><span class="p">],</span><span class="n">iris_demo</span><span class="o">$</span><span class="n">sepal_length</span><span class="p">)</span><span class="w"> </span><span class="n">iris_demo</span><span class="o">$</span><span class="n">it250</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> 
</span><span class="n">reg</span><span class="p">(</span><span class="n">vect_m</span><span class="p">[</span><span class="m">4</span><span class="p">],</span><span class="n">vect_b</span><span class="p">[</span><span class="m">4</span><span class="p">],</span><span class="n">iris_demo</span><span class="o">$</span><span class="n">sepal_length</span><span class="p">)</span><span class="w"> </span><span class="n">iris_demo</span><span class="o">$</span><span class="n">it500</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">reg</span><span class="p">(</span><span class="n">vect_m</span><span class="p">[</span><span class="m">5</span><span class="p">],</span><span class="n">vect_b</span><span class="p">[</span><span class="m">5</span><span class="p">],</span><span class="n">iris_demo</span><span class="o">$</span><span class="n">sepal_length</span><span class="p">)</span><span class="w"> </span><span class="n">iris_demo</span><span class="o">$</span><span class="n">finalit</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">reg</span><span class="p">(</span><span class="n">m</span><span class="p">,</span><span class="n">b</span><span class="p">,</span><span class="n">iris_demo</span><span class="o">$</span><span class="n">sepal_length</span><span class="p">)</span><span class="w"> </span><span class="n">iris_gathered</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">iris_demo</span><span class="w"> </span><span class="o">%&gt;%</span><span class="w"> </span><span class="n">gather</span><span class="p">(</span><span class="n">key</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">gr</span><span class="p">,</span><span class="w"> </span><span class="n">value</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">val</span><span 
class="p">,</span><span class="w"> </span><span class="n">preit</span><span class="o">:</span><span class="n">finalit</span><span class="p">)</span><span class="w"> </span><span class="o">%&gt;%</span><span class="w"> </span><span class="n">select</span><span class="p">(</span><span class="o">-</span><span class="n">petal_length</span><span class="p">)</span><span class="w"> </span><span class="o">%&gt;%</span><span class="w"> </span><span class="n">distinct</span><span class="p">()</span><span class="w"> </span><span class="n">iris_start</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">iris_gathered</span><span class="w"> </span><span class="o">%&gt;%</span><span class="w"> </span><span class="n">filter</span><span class="p">(</span><span class="n">gr</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s2">"preit"</span><span class="p">)</span><span class="w"> </span><span class="n">iris_seq</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">iris_gathered</span><span class="w"> </span><span class="o">%&gt;%</span><span class="w"> </span><span class="n">group_by</span><span class="p">(</span><span class="n">sepal_length</span><span class="p">)</span><span class="w"> </span><span class="o">%&gt;%</span><span class="w"> </span><span class="n">do</span><span class="p">(</span><span class="n">sequence</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">list_parse</span><span class="p">(</span><span class="n">select</span><span class="p">(</span><span class="n">.</span><span class="p">,</span><span class="w"> </span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">val</span><span class="p">)))</span><span class="w"> </span><span class="n">iris_data</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> 
</span><span class="n">left_join</span><span class="p">(</span><span class="n">iris_start</span><span class="p">,</span><span class="w"> </span><span class="n">iris_seq</span><span class="p">)</span><span class="w"> </span><span class="c1">#Motion Plot</span><span class="w"> </span><span class="n">irhc2</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">highchart</span><span class="p">()</span><span class="w"> </span><span class="o">%&gt;%</span><span class="w"> </span><span class="n">hc_add_series</span><span class="p">(</span><span class="n">data</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">iris_data</span><span class="p">,</span><span class="w"> </span><span class="n">type</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"line"</span><span class="p">,</span><span class="w"> </span><span class="n">hcaes</span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">sepal_length</span><span class="p">,</span><span class="w"> </span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">val</span><span class="p">),</span><span class="w"> </span><span class="n">name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Gradient Descent"</span><span class="p">)</span><span class="w"> </span><span class="o">%&gt;%</span><span class="w"> </span><span class="n">hc_motion</span><span class="p">(</span><span class="n">enabled</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">TRUE</span><span class="p">,</span><span class="w"> </span><span class="n">series</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">0</span><span class="p">,</span><span class="w"> </span><span 
class="n">startIndex</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">0</span><span class="p">,</span><span class="w"> </span><span class="n">labels</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="s2">"Iteration 1"</span><span class="p">,</span><span class="s2">"Iteration 100"</span><span class="p">,</span><span class="s2">"Iteration 250"</span><span class="p">,</span><span class="s2">"Iteration 500"</span><span class="p">,</span><span class="s2">"Final Iteration"</span><span class="p">))</span><span class="w"> </span><span class="o">%&gt;%</span><span class="w"> </span><span class="n">hc_add_series</span><span class="p">(</span><span class="n">data</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">iris_demo_reg</span><span class="p">,</span><span class="w"> </span><span class="n">type</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"scatter"</span><span class="p">,</span><span class="w"> </span><span class="n">hcaes</span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">sepal_length</span><span class="p">,</span><span class="w"> </span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">petal_length</span><span class="p">),</span><span class="w"> </span><span class="n">name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Sepal Length VS Petal Length"</span><span class="p">)</span><span class="w"> </span><span class="o">%&gt;%</span><span class="w"> </span><span class="n">hc_title</span><span class="p">(</span><span class="n">text</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Gradient Descent 
Iterations"</span><span class="p">)</span><span class="w"> </span><span class="n">irhc2</span><span class="w"> </span></code></pre></div></div> <iframe src="/html_wg/gradient_descent/gradient_descent_iterations.html" height="500"></iframe> <p><a href="/html_wg/gradient_descent/gradient_descent_iterations.html" target="_blank">open</a></p> <p>Maybe in a future post we can try a multivariate regression model!</p> /gradient_descent/ /gradient_descent 2017-03-29T00:00:00+00:00 Building a pokemon graph database <h2 id="what-happens-when-you-combine-pokemon-with-neo4j">What happens when you combine Pokemon with Neo4j?</h2> <p>I’m a huge Pokemon fan. So, when I found out about <a href="http://jkunst.com/r/pokemon-visualize-em-all/">this awesome post</a> from <em>Joshua Kunst</em>, I just couldn’t wait to throw all that data into Neo4j.</p> <p>It also happens to be a great way to learn how to build a graph database from scratch. The objective of this exercise is to build a graph database where the nodes are the pokemon and the types, and the relationships describe how effective the pokemon are against each other, based only on their types.</p> <h2 id="getting-the-data">Getting the data</h2> <p>First of all, be sure to check Joshua’s post to learn how to import all that pokemon data. We will assume that the data is in a data frame called <em>df</em>.</p> <p>Then, we need to get the relationships between types.
The easiest way to accomplish that is to scrape the table from <a href="http://pokemondb.net/type">pokemondb.net</a>.</p> <div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">library</span><span class="p">(</span><span class="n">RNeo4j</span><span class="p">)</span><span class="w"> </span><span class="n">library</span><span class="p">(</span><span class="n">rvest</span><span class="p">)</span><span class="w"> </span><span class="n">library</span><span class="p">(</span><span class="n">methods</span><span class="p">)</span><span class="w"> </span><span class="n">library</span><span class="p">(</span><span class="n">dplyr</span><span class="p">)</span><span class="w"> </span><span class="n">link</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="s2">"http://pokemondb.net/type"</span><span class="w"> </span><span class="n">link_html</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">read_html</span><span class="p">(</span><span class="n">link</span><span class="p">)</span><span class="w"> </span><span class="n">types</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">link_html</span><span class="w"> </span><span class="o">%&gt;%</span><span class="w"> </span><span class="n">html_nodes</span><span class="p">(</span><span class="s2">"table"</span><span class="p">)</span><span class="w"> </span><span class="o">%&gt;%</span><span class="w"> </span><span class="n">.</span><span class="p">[[</span><span class="m">1</span><span class="p">]]</span><span class="w"> </span><span class="o">%&gt;%</span><span class="w"> </span><span class="n">html_table</span><span class="p">()</span><span class="w"> </span><span class="c1">#Give format</span><span class="w"> </span><span class="nf">names</span><span class="p">(</span><span class="n">types</span><span class="p">)[</span><span
class="m">1</span><span class="p">]</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="s2">"Type"</span><span class="w"> </span><span class="n">types</span><span class="o">$</span><span class="n">Type</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">tolower</span><span class="p">(</span><span class="n">types</span><span class="o">$</span><span class="n">Type</span><span class="p">)</span><span class="w"> </span><span class="nf">names</span><span class="p">(</span><span class="n">types</span><span class="p">)[</span><span class="m">2</span><span class="o">:</span><span class="n">ncol</span><span class="p">(</span><span class="n">types</span><span class="p">)]</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">types</span><span class="o">$</span><span class="n">Type</span><span class="w"> </span><span class="n">types</span><span class="p">[</span><span class="nf">is.na</span><span class="p">(</span><span class="n">types</span><span class="p">)]</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="n">types</span><span class="p">[</span><span class="n">types</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s2">""</span><span class="p">]</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="n">types</span><span class="p">[</span><span class="n">types</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s2">"½"</span><span class="p">]</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="m">0.5</span><span class="w"> </span><span class="n">knitr</span><span class="o">::</span><span class="n">kable</span><span class="p">(</span><span 
class="n">types</span><span class="p">,</span><span class="w"> </span><span class="n">format</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"html"</span><span class="p">)</span><span class="w"> </span></code></pre></div></div> <table> <thead> <tr> <th style="text-align:left;"> Type </th> <th style="text-align:right;"> normal </th> <th style="text-align:left;"> fire </th> <th style="text-align:left;"> water </th> <th style="text-align:left;"> electric </th> <th style="text-align:left;"> grass </th> <th style="text-align:left;"> ice </th> <th style="text-align:left;"> fighting </th> <th style="text-align:left;"> poison </th> <th style="text-align:left;"> ground </th> <th style="text-align:left;"> flying </th> <th style="text-align:left;"> psychic </th> <th style="text-align:left;"> bug </th> <th style="text-align:left;"> rock </th> <th style="text-align:left;"> ghost </th> <th style="text-align:left;"> dragon </th> <th style="text-align:left;"> dark </th> <th style="text-align:left;"> steel </th> <th style="text-align:left;"> fairy </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> normal </td> <td style="text-align:right;"> 1 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 0.5 </td> <td style="text-align:left;"> 0 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 0.5 </td> <td style="text-align:left;"> 1 </td> </tr> <tr> <td style="text-align:left;"> fire </td> <td style="text-align:right;"> 1 </td> <td style="text-align:left;"> 0.5 </td> <td 
style="text-align:left;"> 0.5 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 2 </td> <td style="text-align:left;"> 2 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 2 </td> <td style="text-align:left;"> 0.5 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 0.5 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 2 </td> <td style="text-align:left;"> 1 </td> </tr> <tr> <td style="text-align:left;"> water </td> <td style="text-align:right;"> 1 </td> <td style="text-align:left;"> 2 </td> <td style="text-align:left;"> 0.5 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 0.5 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 2 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 2 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 0.5 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 1 </td> </tr> <tr> <td style="text-align:left;"> electric </td> <td style="text-align:right;"> 1 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 2 </td> <td style="text-align:left;"> 0.5 </td> <td style="text-align:left;"> 0.5 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 0 </td> <td style="text-align:left;"> 2 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 0.5 </td> <td 
style="text-align:left;"> 1 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 1 </td> </tr> <tr> <td style="text-align:left;"> grass </td> <td style="text-align:right;"> 1 </td> <td style="text-align:left;"> 0.5 </td> <td style="text-align:left;"> 2 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 0.5 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 0.5 </td> <td style="text-align:left;"> 2 </td> <td style="text-align:left;"> 0.5 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 0.5 </td> <td style="text-align:left;"> 2 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 0.5 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 0.5 </td> <td style="text-align:left;"> 1 </td> </tr> <tr> <td style="text-align:left;"> ice </td> <td style="text-align:right;"> 1 </td> <td style="text-align:left;"> 0.5 </td> <td style="text-align:left;"> 0.5 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 2 </td> <td style="text-align:left;"> 0.5 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 2 </td> <td style="text-align:left;"> 2 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 2 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 0.5 </td> <td style="text-align:left;"> 1 </td> </tr> <tr> <td style="text-align:left;"> fighting </td> <td style="text-align:right;"> 2 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 2 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 0.5 </td> <td style="text-align:left;"> 
1 </td> <td style="text-align:left;"> 0.5 </td> <td style="text-align:left;"> 0.5 </td> <td style="text-align:left;"> 0.5 </td> <td style="text-align:left;"> 2 </td> <td style="text-align:left;"> 0 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 2 </td> <td style="text-align:left;"> 2 </td> <td style="text-align:left;"> 0.5 </td> </tr> <tr> <td style="text-align:left;"> poison </td> <td style="text-align:right;"> 1 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 2 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 0.5 </td> <td style="text-align:left;"> 0.5 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 0.5 </td> <td style="text-align:left;"> 0.5 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 0 </td> <td style="text-align:left;"> 2 </td> </tr> <tr> <td style="text-align:left;"> ground </td> <td style="text-align:right;"> 1 </td> <td style="text-align:left;"> 2 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 2 </td> <td style="text-align:left;"> 0.5 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 2 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 0 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 0.5 </td> <td style="text-align:left;"> 2 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 2 </td> <td style="text-align:left;"> 1 </td> </tr> <tr> <td style="text-align:left;"> flying </td> <td style="text-align:right;"> 1 </td> <td style="text-align:left;"> 1 </td> <td 
style="text-align:left;"> 1 </td> <td style="text-align:left;"> 0.5 </td> <td style="text-align:left;"> 2 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 2 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 2 </td> <td style="text-align:left;"> 0.5 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 0.5 </td> <td style="text-align:left;"> 1 </td> </tr> <tr> <td style="text-align:left;"> psychic </td> <td style="text-align:right;"> 1 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 2 </td> <td style="text-align:left;"> 2 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 0.5 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 0 </td> <td style="text-align:left;"> 0.5 </td> <td style="text-align:left;"> 1 </td> </tr> <tr> <td style="text-align:left;"> bug </td> <td style="text-align:right;"> 1 </td> <td style="text-align:left;"> 0.5 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 2 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 0.5 </td> <td style="text-align:left;"> 0.5 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 0.5 </td> <td style="text-align:left;"> 2 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 0.5 </td> <td style="text-align:left;"> 1 </td> <td 
style="text-align:left;"> 2 </td> <td style="text-align:left;"> 0.5 </td> <td style="text-align:left;"> 0.5 </td> </tr> <tr> <td style="text-align:left;"> rock </td> <td style="text-align:right;"> 1 </td> <td style="text-align:left;"> 2 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 2 </td> <td style="text-align:left;"> 0.5 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 0.5 </td> <td style="text-align:left;"> 2 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 2 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 0.5 </td> <td style="text-align:left;"> 1 </td> </tr> <tr> <td style="text-align:left;"> ghost </td> <td style="text-align:right;"> 0 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 2 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 2 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 0.5 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 1 </td> </tr> <tr> <td style="text-align:left;"> dragon </td> <td style="text-align:right;"> 1 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 1 </td> <td 
style="text-align:left;"> 1 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 2 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 0.5 </td> <td style="text-align:left;"> 0 </td> </tr> <tr> <td style="text-align:left;"> dark </td> <td style="text-align:right;"> 1 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 0.5 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 2 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 2 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 0.5 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 0.5 </td> </tr> <tr> <td style="text-align:left;"> steel </td> <td style="text-align:right;"> 1 </td> <td style="text-align:left;"> 0.5 </td> <td style="text-align:left;"> 0.5 </td> <td style="text-align:left;"> 0.5 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 2 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 2 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 0.5 </td> <td style="text-align:left;"> 2 </td> </tr> <tr> <td style="text-align:left;"> fairy </td> <td style="text-align:right;"> 1 </td> <td style="text-align:left;"> 0.5 </td> <td style="text-align:left;"> 1 </td> 
<td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 2 </td> <td style="text-align:left;"> 0.5 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 2 </td> <td style="text-align:left;"> 2 </td> <td style="text-align:left;"> 0.5 </td> <td style="text-align:left;"> 1 </td> </tr> </tbody> </table> <p>Then we need to separate the types of the pokemon.</p> <div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">df</span><span class="w"> </span><span class="o">%&gt;%</span><span class="w"> </span><span class="n">select</span><span class="p">(</span><span class="n">id</span><span class="p">,</span><span class="w"> </span><span class="n">type</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">type_1</span><span class="p">)</span><span class="w"> </span><span class="o">-&gt;</span><span class="w"> </span><span class="n">t</span><span class="m">1</span><span class="w"> </span><span class="n">df</span><span class="w"> </span><span class="o">%&gt;%</span><span class="w"> </span><span class="n">select</span><span class="p">(</span><span class="n">id</span><span class="p">,</span><span class="w"> </span><span class="n">type</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">type_2</span><span class="p">)</span><span class="w"> </span><span class="o">-&gt;</span><span class="w"> </span><span class="n">t</span><span class="m">2</span><span class="w"> </span><span class="n">rbind</span><span class="p">(</span><span class="n">t</span><span class="m">1</span><span class="p">,</span><span class="n">t</span><span class="m">2</span><span 
class="p">)</span><span class="w"> </span><span class="o">-&gt;</span><span class="w"> </span><span class="n">tf</span><span class="w"> </span><span class="n">poke_df</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">df</span><span class="w"> </span><span class="o">%&gt;%</span><span class="w"> </span><span class="n">select</span><span class="p">(</span><span class="o">-</span><span class="n">type_1</span><span class="p">,</span><span class="w"> </span><span class="o">-</span><span class="n">type_2</span><span class="p">)</span><span class="w"> </span><span class="o">%&gt;%</span><span class="w"> </span><span class="n">left_join</span><span class="p">(</span><span class="n">tf</span><span class="p">,</span><span class="w"> </span><span class="n">by</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"id"</span><span class="p">)</span><span class="w"> </span><span class="o">%&gt;%</span><span class="w"> </span><span class="n">filter</span><span class="p">(</span><span class="o">!</span><span class="nf">is.na</span><span class="p">(</span><span class="n">type</span><span class="p">))</span><span class="w"> </span></code></pre></div></div> <p>We are ready to import the data into Neo4j, so we need to set up the connection. The code assumes that <em>url</em>, <em>username</em> and <em>password</em> already hold the details of your own Neo4j instance.</p> <p>Then, we create the pokenodes and the type nodes.
We set a relationship for the typing.</p> <div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">#Connect to Graph</span><span class="w"> </span><span class="n">graph</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">startGraph</span><span class="p">(</span><span class="n">url</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">url</span><span class="p">,</span><span class="w"> </span><span class="n">username</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">username</span><span class="p">,</span><span class="w"> </span><span class="n">password</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">password</span><span class="p">)</span><span class="w"> </span><span class="c1">#Constraints</span><span class="w"> </span><span class="n">addConstraint</span><span class="p">(</span><span class="n">graph</span><span class="p">,</span><span class="w"> </span><span class="s2">"Pokemon"</span><span class="p">,</span><span class="w"> </span><span class="s2">"id"</span><span class="p">)</span><span class="w"> </span><span class="n">addConstraint</span><span class="p">(</span><span class="n">graph</span><span class="p">,</span><span class="w"> </span><span class="s2">"Type"</span><span class="p">,</span><span class="w"> </span><span class="s2">"type"</span><span class="p">)</span><span class="w"> </span><span class="c1">#Create nodes and relationships within the same function</span><span class="w"> </span><span class="n">pokenodes</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="k">function</span><span class="p">(</span><span class="n">x</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="n">pokemon</span><span class="w"> </span><span 
class="o">&lt;-</span><span class="w"> </span><span class="n">getOrCreateNode</span><span class="p">(</span><span class="n">graph</span><span class="p">,</span><span class="w"> </span><span class="s2">"Pokemon"</span><span class="p">,</span><span class="w"> </span><span class="n">id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">x</span><span class="p">[</span><span class="s2">"id"</span><span class="p">],</span><span class="w"> </span><span class="n">name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">x</span><span class="p">[</span><span class="s2">"pokemon"</span><span class="p">],</span><span class="w"> </span><span class="n">height</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">x</span><span class="p">[</span><span class="s2">"height"</span><span class="p">],</span><span class="w"> </span><span class="n">weight</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">x</span><span class="p">[</span><span class="s2">"weight"</span><span class="p">],</span><span class="w"> </span><span class="n">attack</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">x</span><span class="p">[</span><span class="s2">"attack"</span><span class="p">],</span><span class="w"> </span><span class="n">defense</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">x</span><span class="p">[</span><span class="s2">"defense"</span><span class="p">],</span><span class="w"> </span><span class="n">hp</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">x</span><span class="p">[</span><span class="s2">"hp"</span><span class="p">],</span><span class="w"> </span><span class="n">special_attack</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">x</span><span 
class="p">[</span><span class="s2">"special_attack"</span><span class="p">],</span><span class="w"> </span><span class="n">special_defense</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">x</span><span class="p">[</span><span class="s2">"special_defense"</span><span class="p">],</span><span class="w"> </span><span class="n">speed</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">x</span><span class="p">[</span><span class="s2">"speed"</span><span class="p">],</span><span class="w"> </span><span class="n">url_image</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">x</span><span class="p">[</span><span class="s2">"url_image"</span><span class="p">],</span><span class="w"> </span><span class="n">url_icon</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">x</span><span class="p">[</span><span class="s2">"url_icon"</span><span class="p">])</span><span class="w"> </span><span class="n">type</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">getOrCreateNode</span><span class="p">(</span><span class="n">graph</span><span class="p">,</span><span class="w"> </span><span class="s2">"Type"</span><span class="p">,</span><span class="w"> </span><span class="n">type</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">x</span><span class="p">[</span><span class="s2">"type"</span><span class="p">])</span><span class="w"> </span><span class="n">createRel</span><span class="p">(</span><span class="n">pokemon</span><span class="p">,</span><span class="s2">"TYPE"</span><span class="p">,</span><span class="n">type</span><span class="p">)</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="c1">#Apply to every row</span><span class="w"> </span><span class="n">apply</span><span 
class="p">(</span><span class="n">poke_df</span><span class="p">[</span><span class="m">1</span><span class="o">:</span><span class="n">nrow</span><span class="p">(</span><span class="n">poke_df</span><span class="p">),],</span><span class="m">1</span><span class="p">,</span><span class="n">pokenodes</span><span class="p">)</span><span class="w"> </span></code></pre></div></div> <p>We define the desired relationship (effectiveness) using the scraped table, keeping only the pairs whose multiplier differs from 1.</p> <div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">types</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">types</span><span class="w"> </span><span class="o">%&gt;%</span><span class="w"> </span><span class="n">gather</span><span class="p">(</span><span class="n">Type</span><span class="p">)</span><span class="w"> </span><span class="nf">names</span><span class="p">(</span><span class="n">types</span><span class="p">)[</span><span class="m">2</span><span class="p">]</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="s2">"Type_Rel"</span><span class="w"> </span><span class="n">effectiveness</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">types</span><span class="w"> </span><span class="o">%&gt;%</span><span class="w"> </span><span class="n">filter</span><span class="p">(</span><span class="n">value</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="m">1</span><span class="p">)</span><span class="w"> </span></code></pre></div></div> <p>Now we are ready to upload the effectiveness relationships, this time using a transaction. 
Thanks to Nicole White for this <a href="https://nicolewhite.github.io/2014/09/30/upload-last-fm-rneo4j-transaction.html">useful post</a>.</p> <div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">#Query for creating effectiveness relationships between the type nodes</span><span class="w"> </span><span class="n">query</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">" MERGE (n:Type {type:{type_1}}) MERGE (m:Type {type:{type_2}}) CREATE (n)-[r:EFECTIVENESS]-&gt;(m) SET r.value = {value} "</span><span class="w"> </span><span class="c1">#Transaction endpoint</span><span class="w"> </span><span class="n">t</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">newTransaction</span><span class="p">(</span><span class="n">graph</span><span class="p">)</span><span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="n">i</span><span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="m">1</span><span class="o">:</span><span class="n">nrow</span><span class="p">(</span><span class="n">effectiveness</span><span class="p">))</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="n">type_1</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">effectiveness</span><span class="p">[</span><span class="n">i</span><span class="p">,</span><span class="w"> </span><span class="p">]</span><span class="o">$</span><span class="n">Type</span><span class="w"> </span><span class="n">type_2</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">effectiveness</span><span class="p">[</span><span class="n">i</span><span class="p">,</span><span class="w"> </span><span class="p">]</span><span class="o">$</span><span class="n">Type_Rel</span><span class="w"> </span><span 
class="n">value</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">effectiveness</span><span class="p">[</span><span class="n">i</span><span class="p">,</span><span class="w"> </span><span class="p">]</span><span class="o">$</span><span class="n">value</span><span class="w"> </span><span class="n">appendCypher</span><span class="p">(</span><span class="n">t</span><span class="p">,</span><span class="w"> </span><span class="n">query</span><span class="p">,</span><span class="w"> </span><span class="n">type_1</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">type_1</span><span class="p">,</span><span class="w"> </span><span class="n">type_2</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">type_2</span><span class="p">,</span><span class="w"> </span><span class="n">value</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">value</span><span class="p">)</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="n">commit</span><span class="p">(</span><span class="n">t</span><span class="p">)</span><span class="w"> </span></code></pre></div></div> <p>It’s time to query our database!!! 
Let’s check all the pokemon that Salamence is double effective against:</p> <div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">library</span><span class="p">(</span><span class="n">visNetwork</span><span class="p">)</span><span class="w"> </span><span class="c1">#Query to check for effectiveness for Salamence</span><span class="w"> </span><span class="n">final_query</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="s2">" match (n:Pokemon)-[t:TYPE]-&gt;(l:Type)-[e:EFECTIVENESS]-&gt;(s:Type)&lt;-[j:TYPE]-(z:Pokemon) where n.name = 'salamence' return n.name as poke1, e.value as value, z.name as poke2, n.url_icon as icon1, z.url_icon as icon2, n.url_image as image1, z.url_image as image2"</span><span class="w"> </span><span class="c1">#Execute the query</span><span class="w"> </span><span class="n">poke_cypher</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">cypher</span><span class="p">(</span><span class="n">graph</span><span class="p">,</span><span class="w"> </span><span class="n">final_query</span><span class="p">)</span><span class="w"> </span><span class="c1">#Get data for VisNetwork</span><span class="w"> </span><span class="n">poke_cypher</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">poke_cypher</span><span class="w"> </span><span class="o">%&gt;%</span><span class="w"> </span><span class="n">mutate</span><span class="p">(</span><span class="n">value</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">as.numeric</span><span class="p">(</span><span class="n">value</span><span class="p">))</span><span class="w"> </span><span class="o">%&gt;%</span><span class="w"> </span><span class="n">group_by</span><span class="p">(</span><span class="n">poke1</span><span class="p">,</span><span class="w"> </span><span 
class="n">poke2</span><span class="p">,</span><span class="w"> </span><span class="n">image1</span><span class="p">,</span><span class="w"> </span><span class="n">image2</span><span class="p">,</span><span class="w"> </span><span class="n">icon1</span><span class="p">,</span><span class="w"> </span><span class="n">icon2</span><span class="p">)</span><span class="w"> </span><span class="o">%&gt;%</span><span class="w"> </span><span class="n">summarise</span><span class="p">(</span><span class="n">value</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">prod</span><span class="p">(</span><span class="n">value</span><span class="p">))</span><span class="w"> </span><span class="o">%&gt;%</span><span class="w"> </span><span class="n">ungroup</span><span class="p">()</span><span class="w"> </span><span class="c1">#Filter by double effective</span><span class="w"> </span><span class="n">poke_sp_eft</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">poke_cypher</span><span class="w"> </span><span class="o">%&gt;%</span><span class="w"> </span><span class="n">filter</span><span class="p">(</span><span class="n">value</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="m">2</span><span class="p">)</span><span class="w"> </span><span class="c1">#More data for VisNetwork</span><span class="w"> </span><span class="n">poke</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">unique</span><span class="p">(</span><span class="nf">c</span><span class="p">(</span><span class="n">poke_sp_eft</span><span class="o">$</span><span class="n">poke1</span><span class="p">,</span><span class="w"> </span><span class="n">poke_sp_eft</span><span class="o">$</span><span class="n">poke2</span><span class="p">))</span><span class="w"> </span><span class="n">img</span><span class="w"> </span><span class="o">&lt;-</span><span 
class="w"> </span><span class="n">unique</span><span class="p">(</span><span class="nf">c</span><span class="p">(</span><span class="n">poke_sp_eft</span><span class="o">$</span><span class="n">icon1</span><span class="p">,</span><span class="w"> </span><span class="n">poke_sp_eft</span><span class="o">$</span><span class="n">icon2</span><span class="p">))</span><span class="w"> </span><span class="n">nodes</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">data.frame</span><span class="p">(</span><span class="n">id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">poke</span><span class="p">,</span><span class="w"> </span><span class="n">label</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">poke</span><span class="p">,</span><span class="w"> </span><span class="n">image</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">img</span><span class="p">,</span><span class="w"> </span><span class="n">shape</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"image"</span><span class="p">)</span><span class="w"> </span><span class="n">edges</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">poke_sp_eft</span><span class="w"> </span><span class="o">%&gt;%</span><span class="w"> </span><span class="n">select</span><span class="p">(</span><span class="n">from</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">poke1</span><span class="p">,</span><span class="w"> </span><span class="n">to</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">poke2</span><span class="p">)</span><span class="w"> </span><span class="c1">#The VISUALIZATION</span><span class="w"> </span><span class="n">visNetwork</span><span class="p">(</span><span 
class="n">nodes</span><span class="p">,</span><span class="w"> </span><span class="n">edges</span><span class="p">,</span><span class="w"> </span><span class="n">width</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"100%"</span><span class="p">)</span><span class="w"> </span></code></pre></div></div> <p><img src="img/pokegraphunnamed-chunk-8-1.png" alt="Network graph of the pokemon that Salamence is double effective against" /></p> <p>And that’s how you do it! With RNeo4j, it’s easy to set up a graph. In the future, it could be expanded into a recommender system or something similar.</p> <p>Check out the <a href="https://jean-arreola.shinyapps.io/Pokemon_Effectiveness/">shiny app</a> for the pokemon database!</p> /pokemon_graph/ /pokemon_graph 2017-02-13T00:00:00+00:00