<?xml version="1.0" encoding="utf-8" standalone="yes" ?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Lasse Hansen</title>
    <link>https://lassehansen.me/</link>
      <atom:link href="https://lassehansen.me/index.xml" rel="self" type="application/rss+xml" />
    <description>Lasse Hansen</description>
    <generator>Source Themes Academic (https://sourcethemes.com/academic/)</generator><language>en-us</language><lastBuildDate>Thu, 01 Dec 2022 15:00:00 +0000</lastBuildDate>
    <image>
      <url>https://lassehansen.me/images/icon_hu0b7a4cb9992c9ac0e91bd28ffd38dd00_9727_512x512_fill_lanczos_center_2.png</url>
      <title>Lasse Hansen</title>
      <link>https://lassehansen.me/</link>
    </image>
    
    <item>
      <title>Inferring Neuropsychiatric Conditions from Language - How Specific are Transformers and Traditional ML Pipelines?</title>
      <link>https://lassehansen.me/talk/speech-biomarker-talk/</link>
      <pubDate>Thu, 01 Dec 2022 15:00:00 +0000</pubDate>
      <guid>https://lassehansen.me/talk/speech-biomarker-talk/</guid>
      <description>&lt;div class=&#34;alert alert-note&#34;&gt;
  &lt;div&gt;
    Click on the &lt;strong&gt;Slides&lt;/strong&gt; button above to view the built-in slides feature.
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Invited talk at the Harvard-MIT Speech and Language Biomarker Interest Group on work related to my master&amp;rsquo;s thesis on a multi-class approach to inferring neuropsychiatric conditions from voice (preprint out Jan 2023). Joint work with 
&lt;a href=&#34;https://rbroc.github.io/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Roberta Rocca&lt;/a&gt; and Riccardo Fusaroli, and presented together with Roberta Rocca.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>PSYCOP</title>
      <link>https://lassehansen.me/project/psycop_project/</link>
      <pubDate>Wed, 01 Jun 2022 00:00:00 +0000</pubDate>
      <guid>https://lassehansen.me/project/psycop_project/</guid>
      <description>&lt;h1 id=&#34;psycop&#34;&gt;PSYCOP&lt;/h1&gt;
&lt;p&gt;The PSYCOP project aims to use the wealth of information in Electronic Health Records to improve patient care and treatment of those with mental illness.&lt;/p&gt;
&lt;p&gt;To do this, we develop machine learning models with a specific focus on evaluation and clinical applicability.&lt;/p&gt;
&lt;p&gt;We are currently developing tools for thoroughly validating our data and models and for quickly iterating on classical machine learning models. Check out our GitHub repositories for 
&lt;a href=&#34;https://github.com/Aarhus-Psychiatry-Research/psycop-ml-utils&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;feature generation and data validation&lt;/a&gt; and 
&lt;a href=&#34;https://github.com/Aarhus-Psychiatry-Research/psycop-t2d&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;model training and evaluation&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Ongoing and planned projects include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Representation learning on the multi-modal EHR data&lt;/li&gt;
&lt;li&gt;Deep sequential models for EHR time-series&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For more information, see 
&lt;a href=&#34;https://www.cambridge.org/core/journals/acta-neuropsychiatrica/article/psychiatric-clinical-outcome-prediction-psycop-cohort-leveraging-the-potential-of-electronic-health-records-in-the-treatment-of-mental-disorders/73CDCC5B36FF1347E6419EC7B80DEC48&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;our paper which outlines the cohort and research directions&lt;/a&gt;, or reach out!&lt;/p&gt;
&lt;h2 id=&#34;members&#34;&gt;Members&lt;/h2&gt;
&lt;p&gt;Faculty:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;a href=&#34;https://pure.au.dk/portal/da/persons/soeren-dinesen-oestergaard%2896ff5c6c-20cf-4531-aaee-d10e4efd6292%29.html&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Professor Søren Dinesen Østergaard (PI)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;a href=&#34;https://pure.au.dk/portal/da/persons/andreas-aalkjaer-danielsen%287cd087d3-aaea-4d54-93b8-04601ed13c0a%29.html&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Andreas Aalkjær Danielsen, MD&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;a href=&#34;https://knielbo.github.io&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Associate professor Kristoffer Laigaard Nielbo&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;PhD students:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;a href=&#34;https://lassehansen.me&#34;&gt;Lasse Hansen&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;a href=&#34;https://kennethenevoldsen.com&#34;&gt;Kenneth Enevoldsen&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;a href=&#34;https://www.linkedin.com/in/martin-bernstorff-03226a124&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Martin Bernstorff&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;a href=&#34;https://www.linkedin.com/in/erik-perfalk/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Erik Perfalk&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;a href=&#34;https://www.linkedin.com/in/frida-h%c3%a6strup-7716741a1&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Frida Hæstrup&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
</description>
    </item>
    
    <item>
      <title>Inferring Mental Illness from Voice</title>
      <link>https://lassehansen.me/talk/turing_talk/</link>
      <pubDate>Thu, 21 Apr 2022 15:00:00 +0000</pubDate>
      <guid>https://lassehansen.me/talk/turing_talk/</guid>
      <description>&lt;div class=&#34;alert alert-note&#34;&gt;
  &lt;div&gt;
    Click on the &lt;strong&gt;Slides&lt;/strong&gt; button above to view the built-in slides feature.
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Invited talk at the Data Science for Mental Health group at the Alan Turing Institute. The talk was based on the paper &amp;ldquo;A generalizable speech emotion recognition model reveals depression and remission&amp;rdquo; and covered the pitfalls and prospects of classifying mental illness from voice.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Gjallarhorn</title>
      <link>https://lassehansen.me/project/danish_wav2vec/</link>
      <pubDate>Wed, 02 Feb 2022 00:00:00 +0000</pubDate>
      <guid>https://lassehansen.me/project/danish_wav2vec/</guid>
      <description>&lt;p&gt;The Gjallarhorn project seeks to democratize Danish speech technology by open sourcing models and resources. We recently released a version of XLS-R-300m pretrained on 140.000 hours of Danish radio along with a model finetuned for Danish automatic speech recognition (ASR). The model outperformed the previous state-of-the-art ASR model by 20%.&lt;/p&gt;
&lt;p&gt;The project recently received a grant from the Danish e-infrastructure Cooperation (DeiC) for computational resources to continue this line of work. During fall 2022, we will continue training and releasing new models in collaboration with 
&lt;a href=&#34;https://alvenir.ai&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Alvenir&lt;/a&gt; and the 
&lt;a href=&#34;https://alexandra.dk&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Alexandra Institute&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Check out our releases on the Huggingface Hub!&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;a href=&#34;https://huggingface.co/chcaa/xls-r-300m-danish&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Pretrained model&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;a href=&#34;https://huggingface.co/chcaa/alvenir-wav2vec2-base-da-nst-cv9&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Finetuned for ASR&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;team&#34;&gt;Team&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Lasse Hansen (PI)&lt;/li&gt;
&lt;li&gt;Rasmus Arpe Fogh Jensen (
&lt;a href=&#34;https://alvenir.ai&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Alvenir&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;Martin Carsten Nielsen (
&lt;a href=&#34;https://alvenir.ai&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Alvenir&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;Søren Winkel Holm (
&lt;a href=&#34;https://alvenir.ai&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Alvenir&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;Anders Pedersen (
&lt;a href=&#34;https://alexandra.dk&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Alexandra Institute&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;
</description>
    </item>
    
    <item>
      <title>Danish Foundation Models</title>
      <link>https://lassehansen.me/project/danish_foundation_models/</link>
      <pubDate>Tue, 01 Feb 2022 00:00:00 +0000</pubDate>
      <guid>https://lassehansen.me/project/danish_foundation_models/</guid>
      <description></description>
    </item>
    
    <item>
      <title>A generalizable speech emotion recognition model reveals depression and remission</title>
      <link>https://lassehansen.me/publication/emodep/</link>
      <pubDate>Tue, 30 Nov 2021 00:00:00 +0000</pubDate>
      <guid>https://lassehansen.me/publication/emodep/</guid>
      <description></description>
    </item>
    
    <item>
      <title>DaCy: A unified framework for Danish NLP</title>
      <link>https://lassehansen.me/publication/dacy/</link>
      <pubDate>Wed, 17 Nov 2021 00:00:00 +0000</pubDate>
      <guid>https://lassehansen.me/publication/dacy/</guid>
      <description></description>
    </item>
    
    <item>
      <title>The PSYchiatric clinical outcome prediction (PSYCOP) cohort: leveraging the potential of electronic health records in the treatment of mental disorders</title>
      <link>https://lassehansen.me/publication/psycop/</link>
      <pubDate>Mon, 09 Aug 2021 00:00:00 +0000</pubDate>
      <guid>https://lassehansen.me/publication/psycop/</guid>
      <description></description>
    </item>
    
    <item>
      <title>Neural Networks - step by step</title>
      <link>https://lassehansen.me/post/neural-networks-step-by-step/</link>
      <pubDate>Tue, 27 Apr 2021 00:00:00 +0000</pubDate>
      <guid>https://lassehansen.me/post/neural-networks-step-by-step/</guid>
      <description>
&lt;script src=&#34;https://lassehansen.me/post/neural-networks-step-by-step/index.en_files/header-attrs/header-attrs.js&#34;&gt;&lt;/script&gt;


&lt;blockquote&gt;
&lt;p&gt;This was created as part of TA’ing the Data Science course for Cognitive Science students. It ended up being fairly extensive, so I thought I’d share it here as others might find it useful. The document is a step-by-step walkthrough of a single training example for a simple feedforward neural network with 1 hidden layer. Linear algebra is kept out, and emphasis is placed on what happens at the individual nodes to develop an intuition for how neural networks actually learn.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;This was originally created using the Tufte-style, but I failed to make the HTML version render nicely here. If you prefer, you can find a &lt;a href=&#34;https://lassehansen.me/post/nn/nn_pdf.pdf&#34;&gt;Tufte-style PDF version here&lt;/a&gt;.&lt;/p&gt;
&lt;div id=&#34;introduction&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Introduction&lt;/h2&gt;
&lt;p&gt;Neural networks (NNs) are all the rage right now and can seem kind of magical at times. In reality, most NNs are essentially just a bunch of logistic regressions&lt;a href=&#34;#fn1&#34; class=&#34;footnote-ref&#34; id=&#34;fnref1&#34;&gt;&lt;sup&gt;1&lt;/sup&gt;&lt;/a&gt; stacked on top of each other + a clever way of distributing blame for the predictions and updating the weights accordingly. In this handout, we will break NNs down step-by-step to hopefully de-mystify them. There are exercises along the way, but I suggest reading through the whole document before doing them to have a better understanding of the big picture.&lt;/p&gt;
&lt;p&gt;I have tried to minimize the math and keep linear algebra out of the way. Implementations of neural networks in libraries such as &lt;code&gt;torch&lt;/code&gt;, &lt;code&gt;keras&lt;/code&gt;/&lt;code&gt;tensorflow&lt;/code&gt; etc. use matrix multiplications and other linear algebra instead of what I’ll show you here&lt;a href=&#34;#fn2&#34; class=&#34;footnote-ref&#34; id=&#34;fnref2&#34;&gt;&lt;sup&gt;2&lt;/sup&gt;&lt;/a&gt;. The reason for doing it this way is that, at least for me, it is far more intuitive to see what happens at each individual node. Once you understand this, understanding how it’s done with linear algebra is not as daunting.&lt;/p&gt;
&lt;p&gt;Let’s begin our journey by understanding &lt;strong&gt;activation functions&lt;/strong&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;activation-functions&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Activation functions&lt;/h2&gt;
&lt;p&gt;Consider a standard feedforward neural network (aka a multilayer perceptron).&lt;/p&gt;
&lt;div class=&#34;figure&#34;&gt;&lt;span id=&#34;fig:mlp&#34;&gt;&lt;/span&gt;
&lt;img src=&#34;nn_imgs/nn_bare.png&#34; alt=&#34;Figure 1: Standard MLP&#34; width=&#34;200%&#34; /&gt;
&lt;p class=&#34;caption&#34;&gt;
Figure 1: Standard MLP
&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;This network takes 3 inputs (a row of 3 variables/columns in your data), has 4 hidden nodes, and a single output node which means it will output a single number. For regression and binary classification you use a single output node, and for multiclass classification you use an output node per group.&lt;/p&gt;
&lt;p&gt;Upon creation of the network, all weights between the nodes (i.e. the links/lines in the image) are initialized with a random value&lt;a href=&#34;#fn3&#34; class=&#34;footnote-ref&#34; id=&#34;fnref3&#34;&gt;&lt;sup&gt;3&lt;/sup&gt;&lt;/a&gt;. When the network gets a training example as input, the inputs are multiplied by the weights and fed to the next layer of nodes. In our case, each node in the hidden layer receives an input which is a vector of 3 elements: the input times the weight from each input node to the hidden node. Before the hidden node can propagate the signal forward, it has to aggregate these values into a single number and pass it through a &lt;em&gt;non-linear&lt;/em&gt; function.&lt;a href=&#34;#fn4&#34; class=&#34;footnote-ref&#34; id=&#34;fnref4&#34;&gt;&lt;sup&gt;4&lt;/sup&gt;&lt;/a&gt; This is where the activation function comes in.&lt;/p&gt;
&lt;p&gt;Nowadays, ReLU (and its variants) is the most common activation function, but sigmoid is the OG. The equations are as follows:&lt;/p&gt;
&lt;p&gt;&lt;span class=&#34;math display&#34;&gt;\[ReLU = max(0, input)\]&lt;/span&gt;
&lt;span class=&#34;math display&#34;&gt;\[sigmoid = \frac{1}{1+e^{-input}}\]&lt;/span&gt;
The input in the above equations refers to the &lt;em&gt;net input&lt;/em&gt; to the nodes, i.e. the weighted sum of all inputs. All the activation function does is apply &lt;em&gt;some function&lt;/em&gt; to the net input. This becomes the node’s &lt;em&gt;activation&lt;/em&gt; and is what is fed to the next layer of nodes in the network in exactly the same manner as from the input to the first hidden layer.&lt;/p&gt;
&lt;p&gt;Let’s see a quick example of how to calculate activation for a single hidden node with 3 input nodes. Let’s assume this is the top hidden node in Figure 1, &lt;span class=&#34;math inline&#34;&gt;\(h_1\)&lt;/span&gt;.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Assume we get an input of (1, 3, 5) in
# x1, x2, x3 respectively
input_nodes &amp;lt;-  c(1, 3, 5)

# these are weights coming into h1
# first element from x1, second from x2 etc
weights_i_h1 &amp;lt;-  c(0.5, 1.2, -0.3)

(weighted_input &amp;lt;-  input_nodes * weights_i_h1)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## [1]  0.5  3.6 -1.5&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;(net_input &amp;lt;- sum(weighted_input))&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## [1] 2.6&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Define our activation functions 
sigmoid &amp;lt;- function(x) 1 / (1+ exp(-x))
relu &amp;lt;- function(x) max(0, x)

# Calculate activations
# Note: each node only has 1 activation function. 
# I showed both here to illustrate the difference between them
(h1_activation &amp;lt;- sigmoid(net_input))&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## [1] 0.9308616&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;relu(net_input)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## [1] 2.6&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Visually, this is what’s happening.&lt;/p&gt;
&lt;div class=&#34;figure&#34;&gt;&lt;span id=&#34;fig:unnamed-chunk-2&#34;&gt;&lt;/span&gt;
&lt;img src=&#34;nn_imgs/nn_1.png&#34; alt=&#34;Figure 2: Activation function&#34;  /&gt;
&lt;p class=&#34;caption&#34;&gt;
Figure 2: Activation function
&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Repeat the process for all the nodes in the network and voilà - you just implemented the forward pass of a neural network!&lt;a href=&#34;#fn5&#34; class=&#34;footnote-ref&#34; id=&#34;fnref5&#34;&gt;&lt;sup&gt;5&lt;/sup&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Notice how similar this is to logistic regression: you have a bunch of input nodes/variables, each with an associated weight (the betas), whose weighted sum is passed through the sigmoid (aka logistic) function.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Note: each node also has a &lt;em&gt;bias&lt;/em&gt; which is an extra trainable parameter (essentially an intercept). For simplicity we assume it to be zero here.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;details&gt;
&lt;summary&gt;
&lt;span style=&#34;font-size:1.6rem; font-weight:600&#34;&gt; Exercise &lt;/span&gt;
&lt;/summary&gt;
&lt;ul&gt;
&lt;li&gt;Implement the entire forward pass for the neural network in the image. (3 input nodes, 4 hidden nodes, 1 output node). Randomly initialize the weights and use [1, 3, 5] as the input nodes. Feel free to use either Python or R.&lt;/li&gt;
&lt;/ul&gt;
&lt;/details&gt;
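&lt;p&gt;For reference, here is a minimal sketch of what such a forward pass could look like. It is only a sketch: the weights are drawn at random, every node uses the sigmoid, all biases are assumed to be zero, and the variable names (&lt;code&gt;weights_i_h&lt;/code&gt;, &lt;code&gt;weights_h_o&lt;/code&gt;) are purely illustrative.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Sketch of a forward pass for the network in Figure 1
# (3 inputs, 4 hidden nodes, 1 output node).
# Assumptions: sigmoid everywhere, zero biases, random weights.
set.seed(1)
input_nodes &amp;lt;- c(1, 3, 5)

# one weight vector per hidden node (weights from the 3 inputs)
weights_i_h &amp;lt;- replicate(4, runif(3, -1, 1), simplify = FALSE)
# one weight per hidden node going to the output
weights_h_o &amp;lt;- runif(4, -1, 1)

sigmoid &amp;lt;- function(x) 1 / (1 + exp(-x))

# activation of each hidden node: sigmoid of its net input
hidden_activations &amp;lt;- sapply(
  weights_i_h,
  function(w) sigmoid(sum(input_nodes * w))
)

# activation of the (single) output node
(output_activation &amp;lt;- sigmoid(sum(hidden_activations * weights_h_o)))&lt;/code&gt;&lt;/pre&gt;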
&lt;/div&gt;
&lt;div id=&#34;gradient-descent&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Gradient descent&lt;/h2&gt;
&lt;p&gt;Hurray, the forward pass is done! The network is just making random predictions for now though, so we need to give it a way to learn. This is where &lt;em&gt;backpropagation&lt;/em&gt; and &lt;em&gt;gradient descent&lt;/em&gt; come in. Now, as with any regression model, we need to define some &lt;em&gt;loss function&lt;/em&gt;, i.e. how we calculate the “goodness” of the model. The loss function to use depends on the problem at hand, but a common one for regression is the &lt;em&gt;sum of squared errors&lt;/em&gt;. The equation is very straightforward and looks like this:&lt;/p&gt;
&lt;p&gt;&lt;span class=&#34;math display&#34;&gt;\[SSE = \sum^n_{i=1}(y-\hat{y})^2\]&lt;/span&gt;
Where &lt;span class=&#34;math inline&#34;&gt;\(y\)&lt;/span&gt; is the label and &lt;span class=&#34;math inline&#34;&gt;\(\hat{y}\)&lt;/span&gt; your prediction. The greater the distance your prediction is from the target, the larger the SSE is. Therefore, we want to minimize this.&lt;/p&gt;
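&lt;p&gt;As a quick illustration with made-up numbers, computing the SSE for a handful of predictions could look like this:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# toy example: SSE for three made-up predictions
y    &amp;lt;- c(1, 0, 1)        # labels
yhat &amp;lt;- c(0.8, 0.3, 0.6)  # predictions
sum((y - yhat)^2)           # 0.04 + 0.09 + 0.16 = 0.29&lt;/code&gt;&lt;/pre&gt;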
&lt;p&gt;From high school calculus, you might recall that the derivative of a function is simply the slope of the function. &lt;a href=&#34;#fn6&#34; class=&#34;footnote-ref&#34; id=&#34;fnref6&#34;&gt;&lt;sup&gt;6&lt;/sup&gt;&lt;/a&gt;&lt;/p&gt;
&lt;div class=&#34;figure&#34;&gt;
&lt;img src=&#34;img/grad_desc.png&#34; alt=&#34;Figure 3: Loss landscape&#34; width=&#34;100%&#34; /&gt;
&lt;p class=&#34;caption&#34;&gt;
Figure 3: Loss landscape
&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;The plot above is an illustration of the loss (e.g. the SSE) given different combinations of &lt;em&gt;weights&lt;/em&gt; between nodes.&lt;a href=&#34;#fn7&#34; class=&#34;footnote-ref&#34; id=&#34;fnref7&#34;&gt;&lt;sup&gt;7&lt;/sup&gt;&lt;/a&gt; As the image illustrates, if we are unlucky we can get stuck in a &lt;em&gt;local minimum&lt;/em&gt;, i.e. an area where the gradient is zero but which is not the &lt;em&gt;global minimum&lt;/em&gt;, i.e. the combination of weights with the lowest possible loss. This is just a fact of life for neural networks - you are in no way guaranteed to find the optimal solution. Different weight initializations will lead you to find different minima, which can vary substantially.&lt;/p&gt;
&lt;p&gt;Anyhow, the way we minimize the loss is to calculate the gradient of the loss function in our current state and make changes to our weights based on this. The sign (+ or -) of the gradient tells us which direction to go, and the magnitude tells us how large steps to take. In essence, we start at some random point on the graph above, and slowly make our way down, until we hopefully end at the green dot.&lt;/p&gt;
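&lt;p&gt;As a toy illustration of this &amp;ldquo;walking downhill&amp;rdquo; (separate from our network), here is gradient descent on a simple one-dimensional function, &lt;span class=&#34;math inline&#34;&gt;\(f(w) = (w - 3)^2\)&lt;/span&gt;, whose derivative is &lt;span class=&#34;math inline&#34;&gt;\(2(w - 3)\)&lt;/span&gt;. The starting point and learning rate are arbitrary.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# toy 1-d gradient descent on f(w) = (w - 3)^2
f_grad &amp;lt;- function(w) 2 * (w - 3)  # derivative (slope) of the loss
w &amp;lt;- 10       # arbitrary starting point
alpha &amp;lt;- 0.1  # learning rate
for (i in 1:50) {
  w &amp;lt;- w - alpha * f_grad(w)  # step against the gradient
}
w  # close to the minimum at w = 3&lt;/code&gt;&lt;/pre&gt;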
&lt;p&gt;The procedure differs slightly if the node is an output or hidden node. Let’s go through it for output nodes first.&lt;/p&gt;
&lt;p&gt;In our case, we had a single output node, so let’s assume we’re doing binary classification (is it a 0 or a 1?).&lt;a href=&#34;#fn8&#34; class=&#34;footnote-ref&#34; id=&#34;fnref8&#34;&gt;&lt;sup&gt;8&lt;/sup&gt;&lt;/a&gt; For binary classification, it makes sense to use the sigmoid as activation function for our output, as it squishes the values to a range between 0 and 1. To calculate the derivative of the loss function with respect to the weights (and biases), &lt;span class=&#34;math inline&#34;&gt;\(\delta\)&lt;/span&gt;, we need to use the chain rule which eventually gives us this&lt;a href=&#34;#fn9&#34; class=&#34;footnote-ref&#34; id=&#34;fnref9&#34;&gt;&lt;sup&gt;9&lt;/sup&gt;&lt;/a&gt;:&lt;/p&gt;
&lt;p&gt;&lt;span class=&#34;math display&#34;&gt;\[
\begin{aligned}
\delta &amp;amp;= \textrm{the derivative of the loss function} \cdot  \\&amp;amp; \textrm{the derivative of the activation function}
\end{aligned}
\]&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;Using SSE and sigmoid activation function we get this:&lt;/p&gt;
&lt;p&gt;&lt;span class=&#34;math display&#34;&gt;\[\delta = 2(y-\hat{y}) \cdot a(1-a)\]&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;Where &lt;span class=&#34;math inline&#34;&gt;\(a\)&lt;/span&gt; is the node’s activation, i.e. the value we get after using the activation function (sigmoid) on the sum of the weighted input.&lt;/p&gt;
&lt;p&gt;Let’s see an example calculation.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# assume the true label of the target is 1
label &amp;lt;-  1

# assume that the activation of the output
# node (after the entire forward pass) is 0.6
output_node_activation &amp;lt;- 0.6

# Derivative of the sigmoid function:
sigmoid_derivative &amp;lt;- function(x) x*(1-x)

(delta_o &amp;lt;- 2*(label - output_node_activation) * 
    sigmoid_derivative(output_node_activation))&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## [1] 0.192&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Visually, this is what’s happening.&lt;/p&gt;
&lt;div class=&#34;figure&#34;&gt;&lt;span id=&#34;fig:unnamed-chunk-4&#34;&gt;&lt;/span&gt;
&lt;img src=&#34;nn_imgs/nn_2.png&#34; alt=&#34;Figure 4: Calculation of $\delta$ of the output node&#34;  /&gt;
&lt;p class=&#34;caption&#34;&gt;
Figure 4: Calculation of &lt;span class=&#34;math inline&#34;&gt;\(\delta\)&lt;/span&gt; of the output node
&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Easy! Notice that the value of &lt;span class=&#34;math inline&#34;&gt;\(\delta\)&lt;/span&gt; is positive. This is because our prediction (0.6) was below the label (1) and we should therefore increase the weights to get closer to the correct prediction. Had the label been 0 instead, &lt;span class=&#34;math inline&#34;&gt;\(\delta\)&lt;/span&gt; would have been a negative number. Essentially, &lt;span class=&#34;math inline&#34;&gt;\(\delta\)&lt;/span&gt; tells us how we should make changes to our weights to get closer to the optimal state as defined by the loss function.&lt;a href=&#34;#fn10&#34; class=&#34;footnote-ref&#34; id=&#34;fnref10&#34;&gt;&lt;sup&gt;10&lt;/sup&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;The clever thing about &lt;strong&gt;backpropagation&lt;/strong&gt; is that weight updates are scaled by the activations feeding into them. That is, if the error is large, weights attached to large activations will change more than weights attached to small activations, as they “contribute” more to the prediction than the smaller ones. As the name implies, the errors are &lt;em&gt;propagated back&lt;/em&gt; into the network (what is known as the &lt;em&gt;backward pass&lt;/em&gt;). Calculating &lt;span class=&#34;math inline&#34;&gt;\(\delta\)&lt;/span&gt; for the hidden layer is the first step in this process.&lt;/p&gt;
&lt;details&gt;
&lt;summary&gt;
&lt;span style=&#34;font-size:1.6rem; font-weight:600&#34;&gt; Exercise &lt;/span&gt;
&lt;/summary&gt;
&lt;ul&gt;
&lt;li&gt;Change the value of the &lt;code&gt;output_node_activation&lt;/code&gt; and see how the delta changes. What do you expect happens as it gets closer to the label?&lt;/li&gt;
&lt;li&gt;What do you think the derivative of the ReLU function is? Plot the ReLu function, implement its derivative function, and try using it instead of the sigmoid.&lt;/li&gt;
&lt;li&gt;Use the value for the &lt;code&gt;output_node_activation&lt;/code&gt; that you calculated in the first exercise (the forward pass).&lt;/li&gt;
&lt;/ul&gt;
&lt;/details&gt;
&lt;p&gt;Now, to distribute blame for the prediction.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;backpropagation&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Backpropagation&lt;/h2&gt;
&lt;p&gt;As mentioned, backpropagation works a bit differently depending on whether you’re calculating weights coming in to an output or a hidden node. For weights coming in to the output nodes, the weight changes are proportional to the learning rate &lt;span class=&#34;math inline&#34;&gt;\(\alpha\)&lt;/span&gt;, the activation of the predecessor node, and the &lt;span class=&#34;math inline&#34;&gt;\(\delta\)&lt;/span&gt; we just calculated. To continue with our example from Figure 1, let’s calculate the weight change for the weight going from the top hidden node &lt;span class=&#34;math inline&#34;&gt;\(h_1\)&lt;/span&gt; to the output &lt;span class=&#34;math inline&#34;&gt;\(\hat{y}_1\)&lt;/span&gt;.&lt;a href=&#34;#fn11&#34; class=&#34;footnote-ref&#34; id=&#34;fnref11&#34;&gt;&lt;sup&gt;11&lt;/sup&gt;&lt;/a&gt;&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Assume a random weight
# from h1 to yhat (calling yhat &amp;#39;o&amp;#39; for output)
weight_h1_o &amp;lt;- c(0.8)


# To calculate the weight change we need:
# 1) the activation of the input unit
#   which in this case is h1
# 2) the delta of the output unit
# 3) a learning rate (alpha)

# Learning rates differ a lot
# but let&amp;#39;s go with 0.01
alpha &amp;lt;- 0.01

# plugging numbers into the equation
(weight_change &amp;lt;- alpha * delta_o * h1_activation) &lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## [1] 0.001787254&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# update weight
(weight_h1_o &amp;lt;- weight_h1_o + weight_change)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## [1] 0.8017873&lt;/code&gt;&lt;/pre&gt;
&lt;div class=&#34;figure&#34;&gt;&lt;span id=&#34;fig:unnamed-chunk-6&#34;&gt;&lt;/span&gt;
&lt;img src=&#34;nn_imgs/nn_3.png&#34; alt=&#34;Figure 5: Weight updating&#34;  /&gt;
&lt;p class=&#34;caption&#34;&gt;
Figure 5: Weight updating
&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Pretty easy, right? To calculate the weight change for weights going into hidden units, we simply multiply the learning rate &lt;span class=&#34;math inline&#34;&gt;\(\alpha\)&lt;/span&gt; with the &lt;span class=&#34;math inline&#34;&gt;\(\delta\)&lt;/span&gt; (the derivative of the loss function with respect to the weights) and the activation at the preceding node.&lt;/p&gt;
&lt;p&gt;&lt;span class=&#34;math display&#34;&gt;\[\text{weight change} = \alpha \cdot \delta \cdot activation\]&lt;/span&gt;&lt;/p&gt;
&lt;details&gt;
&lt;summary&gt;
&lt;span style=&#34;font-size:1.6rem; font-weight:600&#34;&gt; Exercise &lt;/span&gt;
&lt;/summary&gt;
&lt;ul&gt;
&lt;li&gt;How does the learning rate impact the weight change?&lt;/li&gt;
&lt;li&gt;Calculate the weight changes for all the weights going from the hidden nodes to the output node. Use the weights &lt;code&gt;c(0.8, -1.2, 0.5, -0.3)&lt;/code&gt; for &lt;span class=&#34;math inline&#34;&gt;\(h_1, h_2, h_3, h_4\)&lt;/span&gt; respectively.&lt;/li&gt;
&lt;/ul&gt;
&lt;/details&gt;
&lt;p&gt;We’re getting &lt;em&gt;really&lt;/em&gt; close to having trained our little NN on a single example. All that’s left is calculating the weight changes for the weights from the input nodes to the hidden nodes. Fortunately, the procedure is largely the same as for the output units.&lt;/p&gt;
&lt;p&gt;First, we calculate &lt;span class=&#34;math inline&#34;&gt;\(\delta\)&lt;/span&gt; for the specific weight and node again. However, since we are now in the hidden layer, we don’t have access to the true label anymore. Instead, we use the &lt;span class=&#34;math inline&#34;&gt;\(\delta\)&lt;/span&gt; from the succeeding layer (the output node) multiplied by the weight from our hidden node to the output!&lt;/p&gt;
&lt;p&gt;To calculate &lt;span class=&#34;math inline&#34;&gt;\(\delta\)&lt;/span&gt; for the hidden node, &lt;span class=&#34;math inline&#34;&gt;\(\delta_{h_1}\)&lt;/span&gt;, we take the &lt;span class=&#34;math inline&#34;&gt;\(\delta\)&lt;/span&gt; of each node in the succeeding layer multiplied by the weight from &lt;span class=&#34;math inline&#34;&gt;\(h_1\)&lt;/span&gt; to that node, sum these terms, and multiply by the derivative of the activation of &lt;span class=&#34;math inline&#34;&gt;\(h_1\)&lt;/span&gt;. Since our network only has a single output node, the sum reduces to one term: &lt;code&gt;delta_o * weight_h1_o&lt;/code&gt;. Let’s see an example.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Using the weight we calculated before
(delta_o_times_weight_h1 &amp;lt;- delta_o * weight_h1_o)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## [1] 0.1539432&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;(delta_h1 &amp;lt;- delta_o_times_weight_h1 * 
  sigmoid_derivative(h1_activation))&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## [1] 0.009907519&lt;/code&gt;&lt;/pre&gt;
&lt;div class=&#34;figure&#34;&gt;&lt;span id=&#34;fig:unnamed-chunk-8&#34;&gt;&lt;/span&gt;
&lt;img src=&#34;nn_imgs/nn_4.png&#34; alt=&#34;Figure 6: Calculation of $\delta_{h_1}$&#34;  /&gt;
&lt;p class=&#34;caption&#34;&gt;
Figure 6: Calculation of &lt;span class=&#34;math inline&#34;&gt;\(\delta_{h_1}\)&lt;/span&gt;
&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Notice that the value of &lt;span class=&#34;math inline&#34;&gt;\(\delta\)&lt;/span&gt; is substantially smaller than what it was at the output nodes. This means that the weight changes from the input nodes to the hidden nodes will be even smaller. Deep networks can run into the problem of &lt;em&gt;vanishing gradients&lt;/em&gt;, i.e. &lt;span class=&#34;math inline&#34;&gt;\(\delta\)&lt;/span&gt; becomes so small that weight changes are negligible. ReLU is far more robust to the problem of vanishing gradients than the sigmoid function, which is one of the reasons for its success.&lt;/p&gt;
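&lt;p&gt;To get a feel for why this happens, recall that the sigmoid derivative &lt;span class=&#34;math inline&#34;&gt;\(a(1-a)\)&lt;/span&gt; is at most 0.25. A rough back-of-the-envelope calculation shows how quickly repeated multiplication by such factors shrinks &lt;span class=&#34;math inline&#34;&gt;\(\delta\)&lt;/span&gt; as it travels back through many layers:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# the sigmoid derivative a * (1 - a) peaks at 0.25 (when a = 0.5);
# multiplying ten such factors together already gives a tiny number
0.25^10  # roughly 9.5e-07&lt;/code&gt;&lt;/pre&gt;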
&lt;p&gt;Alright, we’re getting &lt;em&gt;reeally&lt;/em&gt; close now. The last step to update the weights coming in to the hidden nodes is exactly the same as for the weights coming in to the output node: &lt;span class=&#34;math inline&#34;&gt;\(\alpha \cdot \delta \cdot activation\)&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;We already defined our 3 input nodes and their weights going to &lt;span class=&#34;math inline&#34;&gt;\(h_1\)&lt;/span&gt;. Let’s update them.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;(weight_change &amp;lt;- alpha * delta_h1 * input_nodes)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## [1] 9.907519e-05 2.972256e-04 4.953760e-04&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;(weights_i_h1 &amp;lt;- weights_i_h1 + weight_change)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## [1]  0.5000991  1.2002972 -0.2995046&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;And we’re done! Rinse and repeat for the rest of the hidden nodes and you just trained one step of a neural network! For a standard feedforward neural network, this is all that is going on - just repeated a lot of times.&lt;a href=&#34;#fn12&#34; class=&#34;footnote-ref&#34; id=&#34;fnref12&#34;&gt;&lt;sup&gt;12&lt;/sup&gt;&lt;/a&gt; When this process has been conducted for each training example, the neural network has gone through one &lt;em&gt;epoch&lt;/em&gt;. You usually stop training after a certain number of epochs, or once you reach a stopping criterion.&lt;a href=&#34;#fn13&#34; class=&#34;footnote-ref&#34; id=&#34;fnref13&#34;&gt;&lt;sup&gt;13&lt;/sup&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Doesn’t seem so magical anymore, right?&lt;/p&gt;
&lt;details&gt;
&lt;summary&gt;
&lt;span style=&#34;font-size:1.6rem; font-weight:600&#34;&gt; Exercise &lt;/span&gt;
&lt;/summary&gt;
&lt;ul&gt;
&lt;li&gt;Calculate the weight changes for all the weights going from the input nodes to the hidden nodes. You decide what the remaining weights should be.&lt;/li&gt;
&lt;li&gt;Congratulations! You have now created and trained one step of a simple neural network! All that’s left is looping over this for all the training examples, and repeat until the network converges. Go have a look through some of the code inspiration in the References and further reading section to get a sense of how this can be done.&lt;/li&gt;
&lt;li&gt;I encourage you to take a stab at implementing your own neural network (I suggest 1 or 2 hidden layers) that can take an arbitrary number of input/hidden/output layers. Feel free to follow either this or Nielsen’s way of going about it.&lt;/li&gt;
&lt;/ul&gt;
&lt;/details&gt;
&lt;p&gt;To sum up, here are the steps (a compact code sketch follows the list):&lt;/p&gt;
&lt;ol style=&#34;list-style-type: decimal&#34;&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Initialize the network&lt;/strong&gt;:
Randomly initialize all the weights.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Forward pass&lt;/strong&gt;:
Pass an input to the neural network and propagate the values forward. To calculate activation of the nodes, take the weighted sum of their
input and use an activation function such as the sigmoid or ReLU.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Backward pass&lt;/strong&gt;:
Calculate the loss for your current training example. Calculate &lt;span class=&#34;math inline&#34;&gt;\(\delta\)&lt;/span&gt; for the output node(s) and update the weights coming in to the output node(s) by multiplying &lt;span class=&#34;math inline&#34;&gt;\(\delta\)&lt;/span&gt; with the learning rate and the activation of the hidden node feeding into the output node. Continue this process by propagating &lt;span class=&#34;math inline&#34;&gt;\(\delta\)&lt;/span&gt; back into the hidden layers and continually updating the weights.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Repeat&lt;/strong&gt;:
Repeat for a specific number of &lt;em&gt;epochs&lt;/em&gt; or until some stopping criterion is reached.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
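&lt;p&gt;Putting these steps together for the simplest possible case, a single sigmoid output node and no hidden layer, the whole loop could look like the sketch below. The data, labels, learning rate and epoch count are made up for illustration; the sketch follows the update rule from this document and is not a reference implementation.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;sigmoid &amp;lt;- function(x) 1 / (1 + exp(-x))

# made-up data: 3 training examples with 3 inputs each
X &amp;lt;- matrix(c(1, 3, 5,
               2, 0, 1,
               4, 4, 0), nrow = 3, byrow = TRUE)
y &amp;lt;- c(1, 0, 1)  # made-up labels

weights &amp;lt;- runif(3, -1, 1)  # 1. initialize the network
alpha   &amp;lt;- 0.1

for (epoch in 1:100) {                          # 4. repeat for some epochs
  for (i in 1:nrow(X)) {
    a &amp;lt;- sigmoid(sum(X[i, ] * weights))         # 2. forward pass
    delta &amp;lt;- 2 * (y[i] - a) * a * (1 - a)       # 3. backward pass: delta
    weights &amp;lt;- weights + alpha * delta * X[i, ] #    weight update
  }
}
round(weights, 3)&lt;/code&gt;&lt;/pre&gt;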
&lt;/div&gt;
&lt;div id=&#34;references-and-further-reading&#34; class=&#34;section level1&#34;&gt;
&lt;h1&gt;References and further reading&lt;/h1&gt;
&lt;div id=&#34;code-inspiration&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Code inspiration&lt;/h2&gt;
&lt;p&gt;I took an elective in Neural Networks a couple of years ago, where part of the exam was to implement a NN from scratch. You can see my code &lt;a href=&#34;https://github.com/HLasse/COSC420/blob/847869534ba9fab809f2e09b363490a8d837bd37/MLP.py&#34;&gt;here&lt;/a&gt;. It’s implemented in much the same style as this document, i.e. no linear algebra, but lots of for loops. After working through the exercises in here, it will likely seem quite straightforward to you.&lt;/p&gt;
&lt;p&gt;Kenneth wrote an implementation based on Nielsen’s to do classification on the MNIST digits dataset. You can find it on Blackboard.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;online-courses&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Online Courses&lt;/h2&gt;
&lt;p&gt;&lt;a href=&#34;https://course.fast.ai&#34;&gt;Fast.ai: Practical Deep Learning for Coders&lt;/a&gt;
- A great and comprehensive course on neural networks. You will learn to implement them from scratch using pytorch and pick up tons of useful knowledge along the way. In particular, check out the part on SGD in &lt;a href=&#34;https://colab.research.google.com/github/fastai/fastbook/blob/master/04_mnist_basics.ipynb#scrollTo=GlKtKAI_VXT4&#34;&gt;chapter 4&lt;/a&gt; for a great introduction.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://www.coursera.org/specializations/deep-learning&#34;&gt;Deep Learning Specialization&lt;/a&gt;
- A true classic, updated spring 2021 with Transformer models and other goodies.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;books&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Books&lt;/h2&gt;
&lt;p&gt;Kriesel, D. A Brief Introduction to Neural Networks. &lt;a href=&#34;http://www.dkriesel.com/en/science/neural_networks&#34; class=&#34;uri&#34;&gt;http://www.dkriesel.com/en/science/neural_networks&lt;/a&gt;.
- A quite nice book on the fundamentals of neural networks. A bit old by now, but the foundations are the same.&lt;/p&gt;
&lt;p&gt;Nielsen, M. Neural Networks and Deep Learning. &lt;a href=&#34;http://neuralnetworksanddeeplearning.com&#34; class=&#34;uri&#34;&gt;http://neuralnetworksanddeeplearning.com&lt;/a&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class=&#34;footnotes footnotes-end-of-document&#34;&gt;
&lt;hr /&gt;
&lt;ol&gt;
&lt;li id=&#34;fn1&#34;&gt;&lt;p&gt;Depending on activation function&lt;a href=&#34;#fnref1&#34; class=&#34;footnote-back&#34;&gt;↩︎&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li id=&#34;fn2&#34;&gt;&lt;p&gt;Along with a whole bunch of other optimization.&lt;a href=&#34;#fnref2&#34; class=&#34;footnote-back&#34;&gt;↩︎&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li id=&#34;fn3&#34;&gt;&lt;p&gt;Not always the case, some of the more advanced NNs require sophisticated initialization schemes.&lt;a href=&#34;#fnref3&#34; class=&#34;footnote-back&#34;&gt;↩︎&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li id=&#34;fn4&#34;&gt;&lt;p&gt;This is crucial and what makes NN &lt;em&gt;universal function approximators&lt;/em&gt;.&lt;a href=&#34;#fnref4&#34; class=&#34;footnote-back&#34;&gt;↩︎&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li id=&#34;fn5&#34;&gt;&lt;p&gt;Different nodes can have different activation functions. For instance, input nodes always use the &lt;em&gt;identity&lt;/em&gt; function (i.e., no transformation is done). Hidden nodes are usually some variant of ReLU. The activation function of the output node(s) depends on the task. Doing regression? The identity function would make sense. Doing binary classification? A sigmoid would be a good choice (as the values are squished to the range 0-1).&lt;a href=&#34;#fnref5&#34; class=&#34;footnote-back&#34;&gt;↩︎&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li id=&#34;fn6&#34;&gt;&lt;p&gt;Figure taken from &lt;a href=&#34;https://towardsdatascience.com/how-to-build-your-own-neural-network-from-scratch-in-python-68998a08e4f6&#34; class=&#34;uri&#34;&gt;https://towardsdatascience.com/how-to-build-your-own-neural-network-from-scratch-in-python-68998a08e4f6&lt;/a&gt;&lt;a href=&#34;#fnref6&#34; class=&#34;footnote-back&#34;&gt;↩︎&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li id=&#34;fn7&#34;&gt;&lt;p&gt;This is a simplification, as in reality the weight space is highly multidimensional. To develop an intuition however, this is a useful way to look at it.&lt;a href=&#34;#fnref7&#34; class=&#34;footnote-back&#34;&gt;↩︎&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li id=&#34;fn8&#34;&gt;&lt;p&gt;You would also use 1 output node for regression, but use e.g. the identity function as activation function for the output node instead. Normally, you would use a different loss function for binary classification, but let’s stick to SSE for simplicity.&lt;a href=&#34;#fnref8&#34; class=&#34;footnote-back&#34;&gt;↩︎&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li id=&#34;fn9&#34;&gt;&lt;p&gt;I skipped a lot of math here. You don’t need to understand exactly how this is derived to understand neural networks, but feel free to read up on it if so inclined.&lt;a href=&#34;#fnref9&#34; class=&#34;footnote-back&#34;&gt;↩︎&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li id=&#34;fn10&#34;&gt;&lt;p&gt;See Figure 3.&lt;a href=&#34;#fnref10&#34; class=&#34;footnote-back&#34;&gt;↩︎&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li id=&#34;fn11&#34;&gt;&lt;p&gt;Learning rates are a pretty big deal. Many different optimizers exist (you’ve probably heard of ADAM or RMSProp), which use clever ways of adapting the learning rate (among other things).&lt;a href=&#34;#fnref11&#34; class=&#34;footnote-back&#34;&gt;↩︎&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li id=&#34;fn12&#34;&gt;&lt;p&gt;This has gone through the most basic/original formulation of a neural network. Of course, many tricks have been added, such as &lt;em&gt;momentum&lt;/em&gt; (weight changes are also proportional to the magnitude of the previous weight change), and perhaps more importantly &lt;strong&gt;stochastic gradient descent&lt;/strong&gt;. In essence, stochastic gradient descent simply works in batches of multiple inputs instead of a single one. That is, weight changes are not calculated for every single training row, but are instead accumulated over e.g. 48 rows before the weights are changed. See the Further Reading section for more.&lt;a href=&#34;#fnref12&#34; class=&#34;footnote-back&#34;&gt;↩︎&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li id=&#34;fn13&#34;&gt;&lt;p&gt;Could be a certain validation error threshold or once the network &lt;em&gt;converges&lt;/em&gt;, i.e. reaches a stable state.&lt;a href=&#34;#fnref13&#34; class=&#34;footnote-back&#34;&gt;↩︎&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;
</description>
    </item>
    
    <item>
      <title>TextDescriptives</title>
      <link>https://lassehansen.me/project/textdescriptives/</link>
      <pubDate>Mon, 01 Feb 2021 00:00:00 +0000</pubDate>
      <guid>https://lassehansen.me/project/textdescriptives/</guid>
      <description></description>
    </item>
    
    <item>
      <title>Predicting Presence and Severity of Depression from Speech Using Emotional Transfer Learning</title>
      <link>https://lassehansen.me/talk/ds_hour/</link>
      <pubDate>Wed, 16 Dec 2020 14:00:00 +0000</pubDate>
      <guid>https://lassehansen.me/talk/ds_hour/</guid>
      <description>&lt;div class=&#34;alert alert-note&#34;&gt;
  &lt;div&gt;
    Click on the &lt;strong&gt;Slides&lt;/strong&gt; button above to view the slides.
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;A presentation of my internship project at Hoffmann-La Roche on using transfer learning from emotional speech to detect depression. We trained a Mixture of Experts consisting of gradient-boosted decision tree classifiers to classify happiness and sadness in datasets of acted emotional speech in English and German. The model was applied to a dataset of interviews with Danish-speaking patients with first-episode depression and matched healthy controls. We observed significant separation between the two groups, and found patients in remission to speak similarly to controls. Further, we conducted experiments on the effect of removing background noise and speaker diarization, which showed consistent levels of background noise to be crucial for consistent inferences.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>gglearn2</title>
      <link>https://lassehansen.me/project/gglearn2/</link>
      <pubDate>Mon, 01 Jun 2020 00:00:00 +0000</pubDate>
      <guid>https://lassehansen.me/project/gglearn2/</guid>
      <description></description>
    </item>
    
    <item>
      <title>HOPE Dashboard</title>
      <link>https://lassehansen.me/project/hope/</link>
      <pubDate>Wed, 01 Apr 2020 00:00:00 +0000</pubDate>
      <guid>https://lassehansen.me/project/hope/</guid>
      <description></description>
    </item>
    
    <item>
      <title>Garmin Run Stats</title>
      <link>https://lassehansen.me/project/garmin/</link>
      <pubDate>Wed, 01 Jan 2020 00:00:00 +0000</pubDate>
      <guid>https://lassehansen.me/project/garmin/</guid>
      <description></description>
    </item>
    
    <item>
      <title>Slides</title>
      <link>https://lassehansen.me/slides/example/</link>
      <pubDate>Tue, 05 Feb 2019 00:00:00 +0000</pubDate>
      <guid>https://lassehansen.me/slides/example/</guid>
      <description>&lt;h1 id=&#34;create-slides-in-markdown-with-academic&#34;&gt;Create slides in Markdown with Academic&lt;/h1&gt;
&lt;p&gt;
&lt;a href=&#34;https://sourcethemes.com/academic/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Academic&lt;/a&gt; | 
&lt;a href=&#34;https://sourcethemes.com/academic/docs/managing-content/#create-slides&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Documentation&lt;/a&gt;&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id=&#34;features&#34;&gt;Features&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Efficiently write slides in Markdown&lt;/li&gt;
&lt;li&gt;3-in-1: Create, Present, and Publish your slides&lt;/li&gt;
&lt;li&gt;Supports speaker notes&lt;/li&gt;
&lt;li&gt;Mobile friendly slides&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id=&#34;controls&#34;&gt;Controls&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Next: &lt;code&gt;Right Arrow&lt;/code&gt; or &lt;code&gt;Space&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Previous: &lt;code&gt;Left Arrow&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Start: &lt;code&gt;Home&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Finish: &lt;code&gt;End&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Overview: &lt;code&gt;Esc&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Speaker notes: &lt;code&gt;S&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Fullscreen: &lt;code&gt;F&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Zoom: &lt;code&gt;Alt + Click&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;
&lt;a href=&#34;https://github.com/hakimel/reveal.js#pdf-export&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;PDF Export&lt;/a&gt;: &lt;code&gt;E&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id=&#34;code-highlighting&#34;&gt;Code Highlighting&lt;/h2&gt;
&lt;p&gt;Inline code: &lt;code&gt;variable&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;Code block:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-python&#34;&gt;porridge = &amp;quot;blueberry&amp;quot;
if porridge == &amp;quot;blueberry&amp;quot;:
    print(&amp;quot;Eating...&amp;quot;)
&lt;/code&gt;&lt;/pre&gt;
&lt;hr&gt;
&lt;h2 id=&#34;math&#34;&gt;Math&lt;/h2&gt;
&lt;p&gt;In-line math: $x + y = z$&lt;/p&gt;
&lt;p&gt;Block math:&lt;/p&gt;
&lt;p&gt;$$
f\left( x \right) = \frac{{2\left( {x + 4} \right)\left( {x - 4} \right)}}{{\left( {x + 4} \right)\left( {x + 1} \right)}}
$$&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id=&#34;fragments&#34;&gt;Fragments&lt;/h2&gt;
&lt;p&gt;Make content appear incrementally&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;{{% fragment %}} One {{% /fragment %}}
{{% fragment %}} **Two** {{% /fragment %}}
{{% fragment %}} Three {{% /fragment %}}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Press &lt;code&gt;Space&lt;/code&gt; to play!&lt;/p&gt;
&lt;p&gt;&lt;span class=&#34;fragment &#34; &gt;
One
&lt;/span&gt;
&lt;span class=&#34;fragment &#34; &gt;
&lt;strong&gt;Two&lt;/strong&gt;
&lt;/span&gt;
&lt;span class=&#34;fragment &#34; &gt;
Three
&lt;/span&gt;&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;A fragment can accept two optional parameters:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;class&lt;/code&gt;: use a custom style (requires definition in custom CSS)&lt;/li&gt;
&lt;li&gt;&lt;code&gt;weight&lt;/code&gt;: sets the order in which a fragment appears&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id=&#34;speaker-notes&#34;&gt;Speaker Notes&lt;/h2&gt;
&lt;p&gt;Add speaker notes to your presentation&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-markdown&#34;&gt;{{% speaker_note %}}
- Only the speaker can read these notes
- Press `S` key to view
{{% /speaker_note %}}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Press the &lt;code&gt;S&lt;/code&gt; key to view the speaker notes!&lt;/p&gt;
&lt;aside class=&#34;notes&#34;&gt;
  &lt;ul&gt;
&lt;li&gt;Only the speaker can read these notes&lt;/li&gt;
&lt;li&gt;Press &lt;code&gt;S&lt;/code&gt; key to view&lt;/li&gt;
&lt;/ul&gt;

&lt;/aside&gt;
&lt;hr&gt;
&lt;h2 id=&#34;themes&#34;&gt;Themes&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;black: Black background, white text, blue links (default)&lt;/li&gt;
&lt;li&gt;white: White background, black text, blue links&lt;/li&gt;
&lt;li&gt;league: Gray background, white text, blue links&lt;/li&gt;
&lt;li&gt;beige: Beige background, dark text, brown links&lt;/li&gt;
&lt;li&gt;sky: Blue background, thin dark text, blue links&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;ul&gt;
&lt;li&gt;night: Black background, thick white text, orange links&lt;/li&gt;
&lt;li&gt;serif: Cappuccino background, gray text, brown links&lt;/li&gt;
&lt;li&gt;simple: White background, black text, blue links&lt;/li&gt;
&lt;li&gt;solarized: Cream-colored background, dark green text, blue links&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;

&lt;section data-noprocess data-shortcode-slide
  
      
      data-background-image=&#34;/img/boards.jpg&#34;
  &gt;

&lt;h2 id=&#34;custom-slide&#34;&gt;Custom Slide&lt;/h2&gt;
&lt;p&gt;Customize the slide style and background&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-markdown&#34;&gt;{{&amp;lt; slide background-image=&amp;quot;/img/boards.jpg&amp;quot; &amp;gt;}}
{{&amp;lt; slide background-color=&amp;quot;#0000FF&amp;quot; &amp;gt;}}
{{&amp;lt; slide class=&amp;quot;my-style&amp;quot; &amp;gt;}}
&lt;/code&gt;&lt;/pre&gt;
&lt;hr&gt;
&lt;h2 id=&#34;custom-css-example&#34;&gt;Custom CSS Example&lt;/h2&gt;
&lt;p&gt;Let&amp;rsquo;s make headers navy colored.&lt;/p&gt;
&lt;p&gt;Create &lt;code&gt;assets/css/reveal_custom.css&lt;/code&gt; with:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-css&#34;&gt;.reveal section h1,
.reveal section h2,
.reveal section h3 {
  color: navy;
}
&lt;/code&gt;&lt;/pre&gt;
&lt;hr&gt;
&lt;h1 id=&#34;questions&#34;&gt;Questions?&lt;/h1&gt;
&lt;p&gt;
&lt;a href=&#34;https://spectrum.chat/academic&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Ask&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;
&lt;a href=&#34;https://sourcethemes.com/academic/docs/managing-content/#create-slides&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Documentation&lt;/a&gt;&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Analysing Political Biases in Danish Newspapers Using Sentiment Analysis</title>
      <link>https://lassehansen.me/publication/political_biases/</link>
      <pubDate>Fri, 07 Jul 2017 00:00:00 +0000</pubDate>
      <guid>https://lassehansen.me/publication/political_biases/</guid>
      <description></description>
    </item>
    
  </channel>
</rss>
