<![CDATA[kyrcha.info]]>http://kyrcha.infoGatsbyJSTue, 21 Dec 2021 19:26:46 GMT<![CDATA[2019 in review]]>http://kyrcha.info/2020/03/23/2019-in-reviewhttp://kyrcha.info2020/03/23/2019-in-reviewMon, 23 Mar 2020 16:26:00 GMT<p>Finally! I had this post in draft mode for about three months, but better late than never. This post also signifies the start of blogging in 2020!</p> <p>2019 is (well) over and through this post I will try to summarize the important aspects of my life that can be quantified and reflect on what went well, what didn&#39;t and what I&#39;ve learned this year, the <a href="https://jamesclear.com/2018-annual-review">James Clear way</a>. As in <a href="http://kyrcha.info/2019/01/23/2018-in-review">my 2018 review</a>, the first part is pretty quantitative, while the second is more qualitative.</p> <p>In 2019, after failing a couple of tenure applications, I switched from academia to industry and also left Cyclopt, the spin-off company I co-founded. In fact I went through two industry positions back-to-back, because the first company I signed up for (a mature start-up) ran out of funding just one month after I started working. In my job-hunting adventure, which started early in the summer of 2019, I applied to around 80 positions to get a couple of offers. I will leave the details and the stats for another post.</p> <p>In general, I don&#39;t consider 2019 a good or lucky year for me: <a href="http://kyrcha.info/2019/12/31/thanks-dad">my father passed away</a> and I had to look for a job one month after starting one. But as Nietzsche said: <em>&quot;What does not kill me makes me stronger&quot;</em>.</p> <h2 id="quantitative">Quantitative</h2> <h3 id="books">Books</h3> <p>Even though I started a lot of books (both non-fiction and tech ones), I only finished:</p> <ul> <li><a href="https://amzn.to/2GlbtWn">Atomic Habits</a> by <a href="https://jamesclear.com/">James Clear</a>. 
Has all the theory, the systems and the processes you need to start new (better) habits, but as with many other personal improvement guides the problem I have is with execution. I will definitely re-read it, since I read it during my summer vacations and that is not a good time to start building new habits.</li> </ul> <h3 id="launches">&quot;Launches&quot;</h3> <p>I launched a couple of websites and apps this year, either alone or with teams:</p> <ul> <li><a href="https://github.com/AuthEceSoftEng/cenote">Cenote</a></li> <li>Cyclopt bot (from my startup, now down and discontinued)</li> <li><a href="http://se-ml-interviews.kyrcha.info/">The interview questions knowledge base</a> in <a href="http://raneto.com/">Raneto</a></li> <li><a href="http://kyrcha.info/2019/05/22/launching-the-new-kyrcha-info-using-gatsby-bulma-contentful-and-surge">This website</a>: kyrcha.info</li> <li><a href="http://npm-miner.com/">npm-miner</a>, an infrastructure that performs static code analysis of the npm registry</li> </ul> <p>and some open source software:</p> <ul> <li><a href="https://github.com/kyrcha/gh-downloader">gh-downloader</a> for downloading files from GitHub given search criteria</li> <li><a href="https://github.com/kyrcha/character-position">character-position</a>, a VS Code extension for revealing the current character position</li> <li><a href="https://github.com/figify/eslint-config">eslint-config</a>, a shareable config for web application development using Node.js and React</li> </ul> <h3 id="talks">Talks</h3> <p>I gave two talks:</p> <ul> <li>My talk at ECEIG: <a href="http://kyrcha.info/2019/05/16/simple-rules-for-building-robust-machine-learning-models">Simple rules for building robust machine learning models</a></li> <li>My talk at ECESCON: <a href="http://kyrcha.info/2019/04/23/advices-and-strategies-i-learned-from-my-first-business-attempt">Advices and strategies I learned from my first business attempt</a></li> </ul> <h3 id="courses">Courses</h3> <p>I taught two 
courses at the university:</p> <ul> <li>Big Data Analysis to graduate students</li> <li>Software Engineering to undergraduates</li> </ul> <h3 id="research-proposals">Research Proposals</h3> <p>I submitted one proposal for research funding as a principal investigator to <a href="http://www.elidek.gr/en/homepage/">ELIDEK</a> (early in 2020 I learned that it was unsuccessful).</p> <h3 id="blog-posts">Blog posts</h3> <p>I wrote 12 blog posts, much, much better than in previous years:</p> <ol> <li><a href="http://kyrcha.info/2019/01/23/2018-in-review">2018 in review</a></li> <li><a href="http://kyrcha.info/2019/01/29/make-your-environment-variables-more-robust-by-making-them-more-fragile">Making env vars more robust by making them more fragile</a></li> <li><a href="http://kyrcha.info/2019/03/22/on-collinearity-and-feature-selection/">On collinearity and feature selection</a></li> <li><a href="http://kyrcha.info/2019/04/05/calculating-the-running-average-and-variance-of-streaming-data-using-redis">Calculating the running average and variance of streaming data using Redis</a></li> <li><a href="http://kyrcha.info/2019/04/23/advices-and-strategies-i-learned-from-my-first-business-attempt">Advices and strategies I learned from my first business attempt</a></li> <li><a href="http://kyrcha.info/2019/05/16/simple-rules-for-building-robust-machine-learning-models">Simple rules for building robust machine learning models</a></li> <li><a href="http://kyrcha.info/2019/05/22/launching-the-new-kyrcha-info-using-gatsby-bulma-contentful-and-surge">Launching the new kyrcha.info using Gatsby, Bulma, Contentful and Surge</a></li> <li><a href="http://kyrcha.info/2019/10/15/sending-graphql-queries-using-http-client-in-go/">Sending GraphQL queries using http.Client in Go</a></li> <li><a href="http://kyrcha.info/2019/10/25/fitting-modified-gompertz-baranyi-equations-bacterial-growth-r/">Fitting modified Gompertz and Baranyi equations for bacterial growth in R</a></li> <li><a 
href="http://kyrcha.info/2019/11/07/generating-plausible-paper-titles-with-recurrent-neural-networks">Generating plausible paper titles with Recurrent Neural Networks</a></li> <li><a href="http://kyrcha.info/2019/11/07/what-is-a-startup-mastermind-group">What is a (startup) mastermind group?</a></li> <li><a href="http://kyrcha.info/2019/11/26/data-outlier-detection-using-the-chebyshev-theorem-paper-review-and-online-adaptation">Data Outlier Detection using the Chebyshev Theorem - Paper review and online adaptation</a></li> </ol> <p>The website had 2,705 users visiting vs. 5,189 in 2018. I am not sure about the drop. It probably also had to do with the switch in technologies from Wordpress (with SEO plugins etc.) to GatsbyJS.</p> <h3 id="publications">Publications</h3> <p>Published 4 papers out of 8 submissions (50%). The <a href="http://kyrcha.info/publications">published papers</a> were:</p> <ul> <li><em>&quot;npm-miner: An Infrastructure for Measuring the Quality of the npm Registry&quot;</em> in MSR 2018</li> <li><em>&quot;Predicting hyperparameters from meta-features in binary classification problems&quot;</em> in AutoML 2018</li> <li><em>&quot;A Natural Language Driven Approach for Automated Web API Development&quot;</em> in WS-REST 2018</li> <li>and <em>&quot;Deep Reinforcement Learning for Doom using Unsupervised Auxiliary Tasks&quot;</em> on arXiv</li> </ul> <h3 id="competitions">Competitions</h3> <ul> <li>Worked hard on the <a href="https://www.kaggle.com/c/humpback-whale-identification">Kaggle Humpback Whale Identification Competition</a> and learned new stuff, even though the approach did not generalize well. It was the first image recognition pipeline I ever wrote, so I learned a lot.</li> <li>Worked on the <a href="https://github.com/KTH/codrep-2019">CodRep 2019</a> competition, achieving 2nd place. 
The write-up of the competition can be found <a href="https://gist.github.com/kyrcha/4d4ebf960051ccfc8764d1f0f7ca6a05">here</a>.</li> </ul> <h2 id="qualitative">Qualitative</h2> <h3 id="things-that-went-well">Things that went well</h3> <ul> <li>Output in terms of blog posts, papers submitted, competitions participated in, software produced and launched, job applications submitted, and interviews conducted.</li> </ul> <h3 id="things-that-didnt-go-well">Things that didn&#39;t go well</h3> <ul> <li>Unfortunately we didn&#39;t get any VC funding in 2019 for Cyclopt, after applying to and following the processes of 3 VCs, so I decided to leave the start-up and focus on other aspects of my career.</li> <li>A recurring theme: my weight, and in general the fact that I didn&#39;t lift weights as much as I wanted (less than 2 times per week).</li> </ul> <h2 id="what-ive-learned">What I&#39;ve learned</h2> <p>The revelation, after re-reading <a href="https://amzn.to/2xmeRPv">&quot;the subtle art of not giving a f*ck&quot;</a>, that <strong>to be happy, solve problems you enjoy solving</strong>. Life is suffering. You will suffer. So at least for the problems you can pick, pick the ones you enjoy solving. I also created a slide on this process for one of my talks:</p> <p><img src="//images.ctfassets.net/c5lel8y1n83c/23DKGNgQiUbrxDwregf4kZ/d40f6d0154e994fd1e21e5464fc0b56c/Screenshot_2020-03-23_18.00.15.png" alt="General advice"></p> <p><strong>If you want to pursue a career in Academia</strong>:</p> <ol> <li>Don&#39;t do work you would do in a software company. 
Take on projects that will have research outcomes and not just build applications.</li> <li>Follow the publish or perish rule.</li> </ol> <p><strong>I learned about <a href="https://twitter.com/naval">Naval</a> Ravikant</strong> and devoured a lot of <a href="https://theangelphilosopher.com/">content</a> from him, especially the viral tweetstorm <a href="https://twitter.com/naval/status/1002103360646823936">&quot;How to Get Rich (without getting lucky)&quot;</a> and its <a href="https://podcasts.apple.com/us/podcast/how-to-get-rich-every-episode/id1454097755?i=1000440401437">related podcast</a>.</p> <p>I don&#39;t remember where I picked this up, but <strong>when you press submit to deploy an application you shouldn&#39;t pack your suitcases just yet</strong>.</p> <h2 id="2020">2020</h2> <p>No goals for 2020. I am setting up processes and systems, building <a href="https://amzn.to/2QGrFax">atomic habits</a>, <a href="https://seths.blog/2019/12/only-the-hits/">shipping early and often</a>.</p> <h2 id="previous-reviews">Previous reviews</h2> <ul> <li><a href="http://kyrcha.info/2019/01/23/2018-in-review">2018</a></li> </ul> <p>Photo from <a href="https://pixabay.com/el/users/mohamed_hassan-5229782/?utm_source=link-attribution&amp;utm_medium=referral&amp;utm_campaign=image&amp;utm_content=3623461">mohamed Hassan</a> from <a href="https://pixabay.com/el/?utm_source=link-attribution&amp;utm_medium=referral&amp;utm_campaign=image&amp;utm_content=3623461">Pixabay</a></p> <![CDATA[Thanks dad]]>http://kyrcha.info//2019/12/31/thanks-dadhttp://kyrcha.info/2019/12/31/thanks-dadTue, 31 Dec 2019 16:00:00 GMT<p>In memory of Christodoulos K. Chatzidimitriou (1944-2019). 
Thanks for everything dad!</p> <p><img src="//images.ctfassets.net/c5lel8y1n83c/Ui40oEHNkUNLJau5hZ3Ru/9b1826f114ed6e6fb92f4af9feea6628/Dad.jpg" alt="Dad"></p> <![CDATA[Data Outlier Detection using the Chebyshev Theorem - Paper review and online adaptation]]>http://kyrcha.info/2019/11/26/data-outlier-detection-using-the-chebyshev-theorem-paper-review-and-online-adaptationhttp://kyrcha.info2019/11/26/data-outlier-detection-using-the-chebyshev-theorem-paper-review-and-online-adaptationTue, 26 Nov 2019 09:58:00 GMT<p>This is the first paper review, in a series of paper reviews and implementations I would like to do, on <em>online (or streaming) outlier detection algorithms</em>. I believe that with the advent of the Internet of Things, and all the sensor data that will be produced and consumed, this will be an important subject to research and study. All the knowledge from this series will be gathered in the <a href="https://github.com/kyrcha/awesome-streaming-outlier-detection">awesome-streaming-outlier-detection</a> repository.</p> <p>In this first post in the series I am going to present the 2005 paper: <a href="https://www.researchgate.net/publication/224624985_Data_outlier_detection_using_the_Chebyshev_theorem">Data Outlier Detection using the Chebyshev Theorem</a> by Brett G. Amidan, Thomas A. Ferryman, and Scott K. Cooley.</p> <p>The paper uses the <a href="https://en.wikipedia.org/wiki/Chebyshev%27s_inequality">Chebyshev inequality</a> in order to calculate upper and lower outlier detection limits. These thresholds bound the percentage of data that fall outside <em>k</em> standard deviations from the mean, while at the same time the calculations make no assumptions about the distribution of the data. This is important, as often the distribution is not known and we don&#39;t want to make any assumption about it. 
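</p> <p>As a quick sanity check (mine, not from the paper), the distribution-free guarantee that at least $1 - \frac{1}{k^2}$ of the data falls within <em>k</em> standard deviations of the mean can be verified numerically on a deliberately non-Gaussian sample:</p>

```python
import numpy as np

# Hypothetical sanity check, not part of the paper: draw a skewed,
# clearly non-Gaussian (exponential) sample and check Chebyshev's
# bound for k = 2, i.e. at least 75% within 2 standard deviations.
rng = np.random.default_rng(42)
x = rng.exponential(scale=1.0, size=100_000)

k = 2
within = np.mean(np.abs(x - x.mean()) <= k * x.std())
print(within)  # for this sample around 0.95, comfortably above the 0.75 bound
```

<p>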
The only assumptions the method makes are that the data are independent measurements and that the data contain only a small percentage of outliers. </p> <p>With an unknown distribution, the Chebyshev inequality is:</p> <p>$$P(|X - \mu| \leq k \sigma) \geq (1 - \frac{1}{k^2})$$ <strong>(1)</strong></p> <p>and indicates that if <em>k=2</em> at least <em>75%</em> of the data would fall within <em>2</em> standard deviations from the mean (lower bound). The equation above can also be rearranged as:</p> <p>$$P(|X - \mu| \geq k \sigma) \leq \frac{1}{k^2}$$ <strong>(2)</strong></p> <p>to indicate that at most <em>25%</em> of the data is outside <em>2</em> standard deviations from the mean (upper bound).</p> <p>There is also a special case: if we assume that the data is unimodal (data with only one peak, which can be examined for example by plotting the data), we can use the unimodal Chebyshev inequality. But since we are talking about streaming data that arrive one after the other, I will assume that nothing is known in advance and continue with the standard case.</p> <p>From Chebyshev&#39;s inequality an Outlier Detection Value (ODV) is calculated. Any data value that is more extreme than the ODV is considered to be an outlier.</p> <h2 id="the-algorithm">The algorithm</h2> <p>The algorithm follows a two-stage process.</p> <h3 id="stage-1">Stage 1</h3> <p>The first stage is responsible for trimming the data from values that are possibly outliers.</p> <ol> <li><p>We decide on a value of $p_1$, which can be considered as the expected probability of seeing an outlier. We can use values like 0.1, 0.05 or even 0.01.</p> </li> <li><p>Solving equation <strong>(2)</strong> for <em>k</em> we have equation <strong>(3)</strong>. Anything more extreme than <em>k</em> standard deviations is considered a stage-1 outlier. 
So if $p_1=0.05$ then $k=4.472$ and thus everything more extreme than 4.472 standard deviations will be considered a stage-1 outlier.</p> </li> </ol> <p>$$k=\frac{1}{\sqrt{p_1}}$$ <strong>(3)</strong></p> <ol start="3"> <li>Then we calculate ODVs (upper and lower bounds) for stage-1, where $\mu$ and $\sigma$ are the sample mean and the sample standard deviation derived from the data:</li> </ol> <p>$$ ODV_{1U} = \mu + k \sigma$$ <strong>(4)</strong></p> <p>$$ ODV_{1L} = \mu - k \sigma$$ <strong>(5)</strong></p> <p>Data that are more extreme than the ODVs of stage-1 are removed from the data for the second phase of the algorithm. The truncated dataset (i.e. without the outliers) is used to calculate the mean and standard deviation needed for Chebyshev&#39;s inequality.</p> <h3 id="stage-2">Stage 2</h3> <p>The second stage derives the final ODVs.</p> <ol> <li>Select a value for $p_2$, the expected probability of seeing an outlier; it is usually smaller than $p_1$ and is used to actually determine the outliers. Reasonable values are 0.01, 0.001 or 0.0001.</li> <li>Solve equation <strong>(2)</strong> for <em>k</em> and get equation <strong>(6)</strong>. 
</li> </ol> <p>$$k=\frac{1}{\sqrt{p_2}}$$ <strong>(6)</strong></p> <ol start="3"> <li><p>Calculate stage-2 ODVs using equations <strong>(4)</strong> and <strong>(5)</strong>, where $\mu$ and $\sigma$ are the sample mean and the sample standard deviation derived from the <strong>truncated</strong> data.</p> </li> <li><p><strong>All data (from the complete dataset) that are more extreme than the stage-2 ODVs are considered to be outliers.</strong></p> </li> </ol> <h2 id="streaming-version">Streaming version</h2> <p>Even though the algorithm is not made for streaming data, I will convert it into a streaming algorithm by calculating <a href="http://kyrcha.info/2019/04/05/calculating-the-running-average-and-variance-of-streaming-data-using-redis">the running average and variance of the data</a> using <a href="https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Welford&#39;s_online_algorithm">Welford&#39;s online algorithm</a> and benchmark it using the <em>Numenta Anomaly Benchmark (NAB)</em>.</p> <p>The <a href="https://github.com/kyrcha/NAB/blob/master/nab/detectors/chebyshev/chebyshev_detector.py">code</a> for the algorithm can be found in my <a href="https://github.com/kyrcha/NAB">personal fork</a> of the <a href="https://github.com/numenta/NAB">NAB GitHub repository</a>. It achieves:</p> <ul> <li><strong>18.44</strong> in the Standard Profile </li> <li><strong>13.18</strong> Reward Low FP </li> <li><strong>23.21</strong> Reward Low FN</li> </ul> <p>with <strong>100.00</strong> being the perfect score in all three categories. 
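</p> <p>For reference, the two-stage batch procedure described above can be sketched as follows (the function name and the toy data are my own, not from the paper):</p>

```python
import numpy as np

def chebyshev_outliers(data, p1=0.05, p2=0.001):
    data = np.asarray(data, dtype=float)
    # Stage 1: trim everything more extreme than k1 standard deviations
    k1 = 1.0 / np.sqrt(p1)
    mu, sigma = data.mean(), data.std(ddof=1)
    trimmed = data[(data >= mu - k1 * sigma) & (data <= mu + k1 * sigma)]
    # Stage 2: final ODVs from the truncated (stage-1 outlier-free) sample
    k2 = 1.0 / np.sqrt(p2)
    mu2, sigma2 = trimmed.mean(), trimmed.std(ddof=1)
    odv_low, odv_high = mu2 - k2 * sigma2, mu2 + k2 * sigma2
    # Flag outliers in the complete dataset against the stage-2 ODVs
    return (data < odv_low) | (data > odv_high)

# 200 well-behaved values plus one injected extreme value
rng = np.random.default_rng(0)
values = np.append(rng.normal(10.0, 0.5, 200), 55.0)
print(np.where(chebyshev_outliers(values))[0])  # flags only index 200, the injected 55.0
```

<p>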
Even though the score may be low, every record is handled in constant time <em>O(1)</em> and no assumptions whatsoever are made regarding the distribution of the incoming data.</p> <p>Below is an example of outliers detected in a time series of AWS EC2 CPU utilization:</p> <p><img src="//images.ctfassets.net/c5lel8y1n83c/75jCE9hAKNoeluZ76E2NGs/eaa1a0ea7ff1918bec7177a4fbc77a1c/nab-chebyshev-ec2.png" alt="nab-chebyshev-ec2"></p> <p>Red circles mark the points labeled as outliers, while blue crosses mark the predicted ones. As one can see, the algorithm is capable of identifying individual outliers rather than the periods of &quot;outlierish&quot; behavior that NAB expects.</p> <p>The code for the NAB implementation can be found below:</p> <pre><code class="language-python">from nab.detectors.base import AnomalyDetector
import math


class ChebyshevDetector(AnomalyDetector):
    &quot;&quot;&quot;A streaming version of the algorithm found in the paper:
    &quot;Data Outlier Detection using the Chebyshev Theorem&quot;
    using Welford&#39;s online algorithm to calculate mean and
    standard deviation
    &quot;&quot;&quot;

    def __init__(self, *args, **kwargs):
        super(ChebyshevDetector, self).__init__(*args, **kwargs)
        self.p1 = 0.1  # Stage 1 probability
        self.p2 = 0.001  # Stage 2 probability
        self.k1 = 1/math.sqrt(self.p1)
        self.k2 = 1/math.sqrt(self.p2)
        self.n1 = 0
        self.m1 = 0
        self.m1_2 = 0
        self.std1 = 1
        self.n2 = 0
        self.m2 = 0
        self.m2_2 = 0
        self.std2 = 1

    def handleRecord(self, inputData):
        &quot;&quot;&quot;Returns a tuple (anomalyScore). The input value is considered
        an outlier if it resides outside the Outlier Detection Values
        (upper or lower). The anomalyScore is calculated based on the
        normalized distance the input value has from the upper or lower
        ODVs, if the input value is considered an outlier, otherwise it
        is 0.0. The probabilities p1 and p2 have been tuned a bit to give
        good performance on NAB.
        &quot;&quot;&quot;
        anomalyScore = 0.0
        inputValue = inputData[&quot;value&quot;]
        # stage 1 statistics (Welford&#39;s online algorithm)
        self.n1 += 1
        delta = inputValue - self.m1
        self.m1 += delta/self.n1
        self.m1_2 += delta * (inputValue - self.m1)
        self.std1 = math.sqrt(self.m1_2/(self.n1-1)) if self.n1-1 &gt; 0 else 0.000001
        odv1_high = self.m1 + self.k1 * self.std1
        odv1_low = self.m1 - self.k1 * self.std1
        if inputValue &lt;= odv1_high and inputValue &gt;= odv1_low:
            # Passed the first test, let&#39;s calculate the second stage statistics
            self.n2 += 1
            delta = inputValue - self.m2
            self.m2 += delta/self.n2
            self.m2_2 += delta * (inputValue - self.m2)
            self.std2 = math.sqrt(self.m2_2/(self.n2-1)) if self.n2-1 &gt; 0 else 0.000001
        odv2_high = self.m2 + self.k2 * self.std2
        odv2_low = self.m2 - self.k2 * self.std2
        if inputValue &gt; odv2_high:
            ratio = (inputValue - odv2_high)/inputValue
            anomalyScore = ratio
        elif inputValue &lt; odv2_low:
            ratio = abs((odv2_low - inputValue)/odv2_low)
            anomalyScore = ratio
        return (anomalyScore, )</code></pre> <![CDATA[What is a (startup) mastermind group?]]>http://kyrcha.info/2019/11/07/what-is-a-startup-mastermind-grouphttp://kyrcha.info2019/11/07/what-is-a-startup-mastermind-groupThu, 07 Nov 2019 11:25:00 GMT<p>I&#39;ve been listening to podcasts from <a href="https://www.startupsfortherestofus.com/">Startups for the Rest of Us</a> for some time now and what captured my attention was the idea of having a mastermind group. 
The notion of a startup mastermind group was mainly discussed in episodes:</p> <ul> <li><a href="https://www.startupsfortherestofus.com/episodes/episode-167">167</a></li> <li><a href="https://www.startupsfortherestofus.com/episodes/episode-277-five-ways-to-structure-your-startup-mastermind">277</a></li> </ul> <p>Originally mentioned in Napoleon Hill&#39;s book <em>&quot;Think and Grow Rich&quot;</em> (I haven&#39;t read it, but it was mentioned in the podcast), a mastermind group is a (small) group of people who are in a similar &quot;boat&quot;, encounter similar problems and have a similar type of business. Such a group can give suggestions to allow you to make better decisions for your business and grow your accountability towards yourself and your business. It can also offer you support and feedback. Especially for micropreneurs and solo founders, it is a way of getting a group of supportive people without being isolated. Family, friends and/or employees will probably never understand your problems at the level you need them to.</p> <p>Some heuristics are:</p> <ul> <li>3-5 people is around the optimal; 3 is a very good number to have</li> <li>Duration around 2 hours</li> <li>Around 30&#39; each, talking about your product/problems</li> <li>Meet every other week</li> <li>Have an opt-out period</li> <li>Have an expectation of confidentiality, because you will be discussing monetary and legal stuff among other things</li> <li>It is probably best to have met in person before</li> </ul> <p>For accountability, planning and history you can use a collaborative document editor like Google Docs, with bullet points of:</p> <ul> <li>previous commitments</li> <li>accomplished work</li> <li>work to be done</li> </ul> <p>Five approaches to structure your mastermind group:</p> <ol> <li>Round table: each person speaks an equal amount of time.</li> <li>Time segments: for example 5&#39; talk, 1&#39; questions, 1&#39; transition to the next person and start over</li> <li>Short 
hot seat: 1 person gets extra time, for example 1h, 15&#39;, 15&#39;</li> <li>Dedicated hot seat: each session one person talks the whole time</li> <li>Use a moderator</li> </ol> <p>As you can see there is no single correct way to structure your mastermind group. The most important thing is for it to be meaningful and helpful for everyone participating.</p> <![CDATA[Generating plausible paper titles with Recurrent Neural Networks]]>http://kyrcha.info/2019/11/07/generating-plausible-paper-titles-with-recurrent-neural-networkshttp://kyrcha.info2019/11/07/generating-plausible-paper-titles-with-recurrent-neural-networksThu, 07 Nov 2019 09:48:00 GMT<p>This is a fun project that occurred to me while reading, month after month, the email with the table of contents from the <em><a href="https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=5962385">IEEE Transactions on Neural Networks and Learning Systems</a></em> journal. It seemed to me that the titles followed a pattern that consisted of some conjunctions, prepositions and particles, intermingled with a lot of keywords specific to the field. So I thought it should be easy to learn to generate titles with a recurrent neural network and a small corpus. 
Let&#39;s see what I got.</p> <h2 id="data">Data</h2> <p>I exported the emails (see below) into a text file and with some text processing produced the final <a href="https://github.com/kyrcha/deep-learning-pipelines/blob/master/data/ieee-tnnls-titles.txt">txt file</a>, which contains all the titles (one title per line) from March 2016 till November 2019.</p> <p><img src="//images.ctfassets.net/c5lel8y1n83c/5AWJFi1nXmihLUxk5X0ijq/6bfedceda9db574d422f07e9e838f70a/Screenshot_2019-11-07_11.25.08.png" alt="Screenshot 2019-11-07 11.25.08"></p> <h2 id="pipeline">Pipeline</h2> <p>The <a href="https://github.com/kyrcha/deep-learning-pipelines/blob/master/generating_paper_titles.ipynb">whole pipeline</a> can be found in my <a href="https://github.com/kyrcha/deep-learning-pipelines">deep-learning-pipelines repository</a> as an IPython notebook.</p> <p>I start with the imports and then download the NLTK model data (you need to do this once):</p> <pre><code class="language-python">import csv
import itertools
import operator
import numpy as np
import nltk
import sys
from datetime import datetime
import matplotlib.pyplot as plt
%matplotlib inline

%%capture
# Download NLTK model data (you need to do this once)
nltk.download(&quot;book&quot;)</code></pre> <p>I read the file with the titles and now I am ready to check the data:</p> <pre><code class="language-python">with open(&#39;ieee-tnnls-titles.txt&#39;, &#39;r&#39;) as f:
    text = f.read()</code></pre> <h3 id="data-exploration">Data exploration</h3> <p>Let&#39;s explore the dataset a bit. 
</p> <pre><code class="language-python">print(&#39;Dataset Stats&#39;)
print(&#39;Roughly the number of unique words: {}&#39;.format(len({word: None for word in text.split()})))

titles = text.splitlines()
print(&#39;Number of titles: {}&#39;.format(len(titles)))

word_count_sentence = [len(title.split()) for title in titles]
print(&#39;Average number of words in each title: {}&#39;.format(np.average(word_count_sentence)))</code></pre> <p>We have:</p> <ul> <li><strong>1207</strong> titles,</li> <li>around <strong>2705</strong> unique words, while</li> <li>the average number of words in each title is <strong>10.2</strong></li> </ul> <p>So the ratio of titles to unique words is roughly <em>0.5</em>, which probably means we should prune the vocabulary to far fewer words, so that there are enough samples to learn their interactions.</p> <h3 id="pre-processing">Pre-processing</h3> <p>My next step was to preprocess the titles: add <code>START</code> and <code>END</code> tokens at the beginning and end of each title, tokenize the titles into words and remove non-alphabetical tokens.</p> <p>First I declared three tokens to be used for a) unknown words, b) the start and c) the end of a title:</p> <pre><code class="language-python">unknown_token = &quot;UNKNOWN_TOKEN&quot;
title_start_token = &quot;TITLE_START&quot;
title_end_token = &quot;TITLE_END&quot;</code></pre> <p>Then I proceed to sentence and word tokenization, along with adding the start/end tokens:</p> <pre><code class="language-python">from nltk.tokenize import sent_tokenize, word_tokenize

sentences = itertools.chain(*[nltk.sent_tokenize(x.lower()) for x in titles])
tokenized_titles = [&quot;%s %s %s&quot; % (title_start_token, x, title_end_token) for x in sentences]
tokenized_titles = [nltk.word_tokenize(title) for title in tokenized_titles]

final_title = []
for title in tokenized_titles:
    final_title.append([token for token in title if token.isalpha() or token == title_start_token or token == title_end_token])
tokenized_titles = final_title</code></pre> <p>An example of a tokenized title would be:</p> <pre><code>[&#39;TITLE_START&#39;, &#39;object&#39;, &#39;detection&#39;, &#39;with&#39;, &#39;deep&#39;, &#39;learning&#39;, &#39;a&#39;, &#39;review&#39;, &#39;TITLE_END&#39;]</code></pre><p>During this pre-processing step <strong>2073</strong> unique word tokens were found. Since the corpus is not very large, I will try to learn the connections between only the most popular words, in order to have enough samples to learn meaningful interconnections. Thus I chose a vocabulary size of <em>250</em>. So the next steps are: to find these frequent words, replace the rest with the <code>UNKNOWN</code> token and build <code>index_to_word</code> (a mapping from an integer to a word) and <code>word_to_index</code> (vice-versa) mappings:</p> <pre><code class="language-python">vocabulary_size = 250

word_freq = nltk.FreqDist(itertools.chain(*tokenized_titles))  # count the word frequencies
vocab = word_freq.most_common(vocabulary_size-1)
index_to_word = [x[0] for x in vocab]
index_to_word.append(unknown_token)
word_to_index = dict([(w,i) for i,w in enumerate(index_to_word)])

print(&quot;Using vocabulary size %d.&quot; % vocabulary_size)
print(&quot;The least frequent word in our vocabulary is &#39;%s&#39; and appeared %d times.&quot; % (vocab[-1][0], vocab[-1][1]))</code></pre> <p>The least frequent word in our dictionary of <em>250</em> words appeared to be <em>&quot;stable&quot;</em> (7 appearances) and the most frequent <em>&quot;for&quot;</em> (553 appearances).</p> <p>As a next step I replaced all words not in our vocabulary with the <code>UNKNOWN</code> token:</p> <pre><code class="language-python">for i, sent in enumerate(tokenized_titles):
    tokenized_titles[i] = [w if w in word_to_index else unknown_token for w in sent]</code></pre> <p>So the title <em>&quot;Plume Tracing via Model-Free Reinforcement Learning Method&quot;</em> would look as follows after pre-processing: <code>[&#39;TITLE_START&#39;, &#39;UNKNOWN_TOKEN&#39;, &#39;UNKNOWN_TOKEN&#39;, &#39;via&#39;, &#39;reinforcement&#39;, 
&#39;learning&#39;, &#39;method&#39;, &#39;TITLE_END&#39;]</code></p> <h3 id="training">Training</h3> <p>Let&#39;s create the training data. I used the <code>KerasBatchGenerator</code> from <a href="https://adventuresinmachinelearning.com/keras-lstm-tutorial/">this blog post</a> to generate the batches to be fed into the LSTMs:</p> <pre><code>class KerasBatchGenerator(object):

    def __init__(self, data, num_steps, batch_size, vocabulary, skip_step=5):
        self.data = data
        self.num_steps = num_steps
        self.batch_size = batch_size
        self.vocabulary = vocabulary
        # this will track the progress of the batches sequentially through the
        # data set - once the data reaches the end of the data set it will reset
        # back to zero
        self.current_idx = 0
        # skip_step is the number of words which will be skipped before the next
        # batch is skimmed from the data set
        self.skip_step = skip_step

    def generate(self):
        x = np.zeros((self.batch_size, self.num_steps))
        y = np.zeros((self.batch_size, self.num_steps, self.vocabulary))
        while True:
            i = 0
            while i &lt; self.batch_size:
                # I don&#39;t want to see in x a title end token to predict y
                if self.current_idx &lt; len(self.data) and self.data[self.current_idx] == word_to_index[title_end_token]:
                    self.current_idx += self.skip_step
                if self.current_idx + self.num_steps &gt;= len(self.data):
                    # reset the index back to the start of the data set
                    self.current_idx = 0
                x[i, :] = self.data[self.current_idx:self.current_idx + self.num_steps]
                temp_y = self.data[self.current_idx + 1:self.current_idx + self.num_steps + 1]
                # convert all of temp_y into a one hot representation
                y[i, :, :] = to_categorical(temp_y, num_classes=self.vocabulary)
                self.current_idx += self.skip_step
                i += 1
            yield x, y</code></pre><p>Through the generator, batches of <em>10</em> tokens that predict the next token (in one hot encoding form) are generated. Each batch contains <em>2</em> arrays that contain <em>10</em> tokens each. 
The first array has <em>10</em> integers, while the second array has 10 one hot encoding vectors that represent the equivalent next tokens of the first array. For example:</p> <p>The 2 arrays are of the form:</p> <pre><code>[[0.],[122.],[249.],[29.],[3.],[187.],[11.],[0.],[40.],[3.]]</code></pre><p>and</p> <pre><code>[[[0., 0., 0., ..., 0., 0., 0.]], [[0., 0., 0., ..., 0., 0., 1.]], ...</code></pre><ol> <li>The <code>START</code> token in the first array, which is 0, to predict the one-hot encoded version of 122 (which is the next token after 0)</li> <li>The 122 token to predict the one-hot encoded version of 249</li> <li>The 249 token to predict the one-hot encoded version of 29</li> <li>and so on and so forth...</li> </ol> <p>The first <em>10K</em> tokens are employed for generating training batches, while the rest <em>3846</em> for validation. As a note, we never have a sample that uses the <code>END</code> token to predict the next token. Let&#39;s create the batch generators:</p> <pre><code class="language-python">num_steps = 1 skip_step = 1 batch_size = 10 # set seeds for reproducibility from numpy.random import seed seed(123) from tensorflow import set_random_seed set_random_seed(234) # Create the training data # A concatenation of all tokens as integers (indices) X = list(itertools.chain(*np.asarray([[word_to_index[w] for w in sent] for sent in tokenized_titles]))) # Create 2 batch generators out of the concatenation train_data_generator = KerasBatchGenerator(X[:10000], num_steps, batch_size, vocabulary_size, skip_step) valid_data_generator = KerasBatchGenerator(X[10001:], num_steps, batch_size, vocabulary_size, skip_step)</code></pre> <p>Next I create the model:</p> <pre><code class="language-python">from keras.models import Sequential from keras.layers import Dense, Activation, Embedding, Dropout, TimeDistributed from keras.layers import LSTM from keras.optimizers import Adam from keras.utils import to_categorical from keras.callbacks import 
ModelCheckpoint hidden_size = 250 model = Sequential() model.add(Embedding(vocabulary_size, hidden_size, input_length=num_steps)) model.add(LSTM(hidden_size, return_sequences=True)) model.add(LSTM(hidden_size, return_sequences=True)) model.add(Dropout(rate=0.5)) model.add(TimeDistributed(Dense(vocabulary_size))) model.add(Activation(&#39;softmax&#39;))</code></pre> <p>compile the model:</p> <pre><code>model.compile(loss=&#39;categorical_crossentropy&#39;, optimizer=&#39;adam&#39;, metrics=[&#39;categorical_accuracy&#39;])</code></pre><p>and train the model for 10 epochs:</p> <pre><code>num_epochs = 10 model.fit_generator(train_data_generator.generate(), len(X[:10000])//(batch_size*num_steps), num_epochs, validation_data=valid_data_generator.generate(), validation_steps=len(X[10001:])//(batch_size*num_steps))</code></pre><p>After training we got a validation categorical accuracy of <strong>0.3625</strong>, which is of course much better than a random guess among the roughly <em>250</em> tokens of the vocabulary.</p> <h3 id="generating">Generating</h3> <p>Now it is time to check the model. We start by feeding the model a <code>START</code> token and keep sampling until there is an <code>END</code> token.
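<p>The core of that loop is the sampling step: one token index is drawn from the model&#39;s softmax output via a multinomial draw. In isolation it looks like this (a minimal sketch; the three-element probability vector is made up for illustration, and the NumPy <code>Generator</code> API is used instead of the post&#39;s <code>np.random.multinomial</code>):</p>

```python
import numpy as np

rng = np.random.default_rng(42)
probs = np.array([0.1, 0.6, 0.3])    # hypothetical softmax output over a 3-token vocabulary
one_hot = rng.multinomial(1, probs)  # a single draw from the distribution -> a one-hot vector
token = int(np.argmax(one_hot))      # index of the sampled token
```

<p>In the full generation loop this draw is repeated, appending each sampled token, until an <code>END</code> token appears.</p>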
We resample if the sampling generates the <code>UNKNOWN</code> token:</p> <pre><code>def generate_title(model): # We start the sentence with the start token new_title = [word_to_index[title_start_token]] # Repeat until we get an end token while not new_title[-1] == word_to_index[title_end_token]: x = np.zeros((1,1)) x[0, :] = new_title[-1] next_word_probs = model.predict(x)[0][0] sampled_word = word_to_index[unknown_token] # We don&#39;t want to sample unknown words while sampled_word == word_to_index[unknown_token]: samples = np.random.multinomial(1, next_word_probs) sampled_word = np.argmax(samples) new_title.append(sampled_word) title_str = [index_to_word[x] for x in new_title[1:-1]] return title_str num_sentences = 30 senten_min_length = 7 senten_max_length = 15 for i in range(num_sentences): sent = [] # We want long sentences, not sentences with one or two words while len(sent) &lt; senten_min_length or len(sent) &gt; senten_max_length: sent = generate_title(model) print(&quot; &quot;.join(sent))</code></pre><p>We generated <em>30</em> sentences between <em>7</em> and <em>15</em> tokens:</p> <pre><code>a new active systems under control of boolean network for deep noise framework and processes multiview metric clustering and neural networks approach for heterogeneous systems and noise learning structure of nonlinear multiagent systems and unknown systems deep neural networks with adaptive delays of regression adaptive stochastic models using active learning processes stability analysis for mimo neural networks with delays on state estimation of a new iterative learning a class of online model for a novel recurrent neural network a controller for feature analysis of neural networks and noise collaborative quality of and the neural networks a unified sparse representation of delayed neural network representation with the feature selection based on a application to stochastic delays via regularization unified analysis for a deep transfer learning a network of 
coupled uncertain delay and application to semisupervised classification a deep convolutional neural networks with communication constraints and its switched linear multiagent systems multimodal data for nonlinear systems with adaptive complex networks optimal delays control of multiple learning for clustering linear data design of delayed jump neural network for linear systems memristive generalized efficient estimation for feature selection for modeling for nonlinear systems a deep convolutional neural dynamic systems with hierarchical a constrained iterative learning with multiple least the classification sequential metric learning with a supervised systems with learning exponential synchronization of communication processes and its switched systems using neural networks application to mixture of gaussian heterogeneous and time delays a new control for generalized domain adaptation robust concept and local method for heterogeneous dynamic programming by a novel adaptive control of graph analysis for nonlinear kernel convolutional neural networks with delays optimal control of time regression and its application to features semisupervised feature optimization and probabilistic matrix learning markov for and an multiobjective framework with dynamical delay</code></pre><p>Even though the network didn&#39;t learn any grammar rules, some plausible titles were generated. 
For example (even though I wouldn&#39;t know what it would be about):</p> <pre><code>adaptive stochastic models using active learning processes</code></pre><p>and my favorite:</p> <pre><code>a novel adaptive control of graph analysis for nonlinear kernel convolutional neural networks with delays</code></pre><p>what a mouthful!</p> <h2 id="references">References</h2> <p>At this point I should mention that I re-used some code from:</p> <ul> <li><a href="https://adventuresinmachinelearning.com/keras-lstm-tutorial/">https://adventuresinmachinelearning.com/keras-lstm-tutorial/</a> (mainly the <code>KerasBatchGenerator</code>)</li> <li><a href="https://github.com/dennybritz/rnn-tutorial-rnnlm/blob/master/RNNLM.ipynb">https://github.com/dennybritz/rnn-tutorial-rnnlm/blob/master/RNNLM.ipynb</a> (Pre-processing and generating text snippets)</li> </ul> <![CDATA[Fitting modified Gompertz and Baranyi equations for bacterial growth in R]]>http://kyrcha.info/2019/10/25/fitting-modified-gompertz-baranyi-equations-bacterial-growth-rhttp://kyrcha.info2019/10/25/fitting-modified-gompertz-baranyi-equations-bacterial-growth-rSat, 26 Oct 2019 10:40:00 GMT<p>The modified Gompertz and Baranyi equations are two of the most famous equations for modelling bacterial growth. <a href="https://en.wikipedia.org/wiki/Bacterial_growth">Bacterial growth</a> is modelled in four different phases:</p> <ul> <li>The lag phase</li> <li>The log or exponential phase</li> <li>The stationary phase</li> <li>The death phase</li> </ul> <p>Researchers in the food engineering industry are interested in the first two (or three) phases: to maintain low bacterial populations, one wants to prolong the lag phase and inhibit the growth of the population. In the first three phases the growth curves resemble what is called a sigmoid curve.
As in a previous <a href="http://kyrcha.info/2012/07/08/tutorials-fitting-a-sigmoid-function-in-r">blog post on fitting sigmoid curves using R</a>, I will use the non-linear least-squares method in R to fit these specific curves to the data.</p> <p>Both the data and the equations are taken from the edited book of <a href="https://www.crcpress.com/Modeling-Microbial-Responses-in-Food/McKellar-Lu/p/book/9780367394653">McKellar and Lu, 2004: Modeling Microbial Responses in Food</a>.</p> <h2 id="modified-gompertz">Modified Gompertz</h2> <p>The modified Gompertz equation is equation (2.2) from the book and is given as:</p> <p>$$log(x_t) = A + C \cdot e^{-e^{(-B \cdot (t - M))}}$$</p> <p>where $x_t$ is the number of cells at time $t$, $A$ the asymptotic count, $C$ the difference in value of the upper and lower asymptote, $B$ the relative growth rate at $M$, and $M$ the time at which the absolute growth rate is maximum.</p> <p>Some data from the book for Listeria monocytogenes at 5 degrees Celsius are:</p> <pre><code># time in days d = c(0, 6, 24, 30, 48, 54, 72, 78, 99, 126, 144, 150, 168, 174, 191, 198, 216, 239, 266, 291, 316, 336, 342, 360, 384) # log cfu ml^-1 y = c(4.8, 4.7, 4.7, 4.7, 4.9, 5.1, 5.3, 5.4, 5.9, 6.3, 6.9, 6.9, 7.2, 7.3, 7.7, 7.8, 8.3, 8.8, 9.1, 9.2, 9.3, 9.7, 9.7, 9.7, 9.5)</code></pre><p>Let&#39;s start by defining the Gompertz equation in R:</p> <pre><code>gombertz_mod = function(params, x) { params[1] + (params[3] * exp(-exp(-params[2] * (x - params[4])))) }</code></pre><p>Next I fit the model using non-linear least squares:</p> <pre><code>fitmodel &lt;- nls(y ~ A + C * exp(-exp(-B * (d - M))), start=list(A=3, B=0.01, C=10, M=10))</code></pre><p>Extract the parameters and apply the model to new data:</p> <pre><code>gomb_params=coef(fitmodel) print(gomb_params)</code></pre><pre><code>## A B C M ## 4.65920718 0.01163221 5.40581821 138.91015307</code></pre><pre><code>d2 &lt;- 0:400 y2 &lt;- gombertz_mod(gomb_params, d2) y_pred_gomb &lt;-
gombertz_mod(gomb_params, d)</code></pre><p>Let&#39;s plot the fitted curve and the data points:</p> <pre><code>plot(d2, y2, type=&quot;l&quot;, xlab=&quot;time (days)&quot;, ylab=&quot;logx&quot;, main=&quot;Growth for Listeria monocytogenes (Gompertz)&quot;) points(d, y)</code></pre><p><img src="//images.ctfassets.net/c5lel8y1n83c/6KRu6BF6TJmhTuEtue0dor/8bc1a2a69d064f299fd2adf728b5ad78/unnamed-chunk-5-1.png" alt="unnamed-chunk-5-1"></p> <p>and calculate the RMSE:</p> <pre><code class="language-r">rmse &lt;- function(real, pred) { sqrt(mean((real-pred)^2)) } paste(&quot;RMSE modified Gombertz: &quot;, rmse(y, y_pred_gomb))</code></pre> <pre><code>## [1] &quot;RMSE modified Gombertz: 0.112256060122191&quot;</code></pre><h2 id="baranyi">Baranyi</h2> <p>The <strong>Baranyi</strong> model, equations (2.9) and (2.10) from the book, is:</p> <p>$$y(t) = y_0 + \mu_{max} \cdot A(t) - ln(1 + \frac{e^{\mu_{max} \cdot A(t)} - 1}{e^{y_{max}-y_0}})$$</p> <p>and</p> <p>$$A(t) = t + \frac{1}{\mu_{max}} \cdot ln(e^{-\mu_{max} \cdot t} + e^{-\mu_{max} \cdot \lambda} - e^{[-\mu_{max} \cdot (t + \lambda)]})$$</p> <p>where $y(t)=lnx(t)$, $y_0=lnx_0$, $\mu_{max}$ is the maximum specific growth rate and $\lambda$ is the lag-phase duration.
<em>Note: the equation for $A(t)$ is derived after substituting $q_0$ with $\frac{1}{e^{\mu_{max} \cdot \lambda} - 1}$ in the original equation from the book.</em></p> <p>Thus in R, we fit the Baranyi model with non-linear least squares and define the corresponding function:</p> <pre><code class="language-r">fitmodel &lt;- nls(y ~ y0 + mmax * (d + (1/mmax) * log(exp(-mmax*d) + exp(-mmax * lambda) - exp(-mmax * (d + lambda)))) - log(1 + ((exp(mmax * (d + (1/mmax) * log(exp(-mmax*d) + exp(-mmax * lambda) - exp(-mmax * (d + lambda)))))-1)/(exp(ymax-y0)))), start=list(y0=2.5, mmax=0.1, lambda=10, ymax=10))</code></pre> <pre><code class="language-r">baranyi &lt;- function(params, x) { params[1] + params[2] * (x + (1/params[2]) * log(exp(-params[2]*x) + exp(-params[2] * params[3]) - exp(-params[2] * (x + params[3])))) - log(1 + ((exp(params[2] * (x + (1/params[2]) * log(exp(-params[2]*x) + exp(-params[2] * params[3]) - exp(-params[2] * (x + params[3])))))-1)/ (exp(params[4]-params[1])))) } baranyi_params &lt;- coef(fitmodel) print(baranyi_params)</code></pre> <pre><code>## y0 mmax lambda ymax ## 4.63245864 0.02577884 65.17090445 9.69155419</code></pre><pre><code class="language-r">d3 &lt;- 0:400 y3 &lt;- baranyi(baranyi_params, d3) y_pred_baranyi &lt;- baranyi(baranyi_params, d)</code></pre> <pre><code class="language-r">plot(d3, y3, type=&quot;l&quot;, xlab=&quot;time (days)&quot;, ylab=&quot;logN&quot;, main=&quot;Growth for Listeria monocytogenes (Baranyi)&quot;) points(d, y)</code></pre> <p><img src="//images.ctfassets.net/c5lel8y1n83c/2BZI2arVHfzRe95Bbsn9yl/11dcaf10b3c699f0fb873f0eb1e3045f/unnamed-chunk-9-1.png" alt="unnamed-chunk-9-1"></p> <pre><code class="language-r">paste(&quot;RMSE Baranyi: &quot;, rmse(y, y_pred_baranyi))</code></pre> <pre><code>## [1] &quot;RMSE Baranyi: 0.102942076757872&quot;</code></pre><p>As expected from the literature, the Baranyi equations have a smaller error, basically due to the better fit in the steady state of the bacterial growth.</p> <p>A rendered edition of the <a 
href="https://github.com/kyrcha/ml-rants/blob/master/gompertz-baranyi-example.Rmd">R markdown notebook</a> can be found in <a href="http://rpubs.com/kyrcha/gompertz-baranyi-fit">Rpubs</a>.</p> <![CDATA[Sending graphql queries using http.Client in Go]]>http://kyrcha.info/2019/10/15/sending-graphql-queries-using-http-client-in-gohttp://kyrcha.info2019/10/15/sending-graphql-queries-using-http-client-in-goTue, 15 Oct 2019 05:30:00 GMT<p><strong>Background</strong>: I wanted to quickly test different, complex GraphQL queries against the GitHub v4 API, especially with respect to the error messages produced. At the same time I didn&#39;t want to create complex Go types to match the GitHub schema using a library like <a href="https://github.com/shurcooL/githubv4">shurcooL/githubv4</a>; it seemed like a lot of hassle for my purpose, especially since I didn&#39;t want to decode the response and use it.</p> <p><strong>Prerequisites</strong>: Create a personal access token with the scopes related to the queries you want to do and put it in the environment variable <code>GITHUB_TOKEN</code>.</p> <p>According to the <a href="https://developer.github.com/v4/guides/forming-calls/">GitHub documentation on forming calls</a>:</p> <blockquote> <p>The string value of <code>&quot;query&quot;</code> must escape newline characters or the schema will not parse it correctly. For the <code>POST</code> body, use outer double quotes and escaped inner double quotes.</p> </blockquote> <p>So in order not to do the encoding myself, I will use the <a href="https://golang.org/pkg/encoding/json/">Go json library</a> to take care of the JSON encoding/marshaling.
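<p>The escaping requirement quoted above is exactly what any JSON encoder does for free; a quick illustration (shown in Python rather than Go, purely for brevity, with a made-up query):</p>

```python
import json

query = """query {
  viewer { login }
}"""
body = json.dumps({"query": query})
# the literal newlines inside the query string are escaped as \n in the encoded body,
# so the result is a single-line JSON string safe to send as a POST body
assert "\\n" in body and "\n" not in body
```

<p>The Go <code>json.Marshal</code> call used below behaves the same way.</p>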
</p> <p>In addition, one must take care to add extra braces, as explained in this <a href="https://stackoverflow.com/a/58131007/869151">StackOverflow answer</a>.</p> <p>So let&#39;s start by creating an OAuth2 client since <a href="https://developer.github.com/v4/guides/forming-calls/#authenticating-with-graphql">you cannot have non-authenticated queries to the v4 API</a>:</p> <pre><code>client := oauth2.NewClient( context.TODO(), oauth2.StaticTokenSource( &amp;oauth2.Token{AccessToken: os.Getenv(&quot;GITHUB_TOKEN&quot;)}, ))</code></pre><p>then define a query:</p> <pre><code> query := `query { repository(owner:&quot;octocat&quot;, name:&quot;Hello-World&quot;) { issues(last:20, states:CLOSED) { edges { node { title url labels(first:5) { edges { node { name } } } } } } } }`</code></pre><p>then marshal (or encode) the request struct into JSON:</p> <p><code>gqlMarshalled, err := json.Marshal(graphQLRequest{Query: query})</code></p> <p>and finally POST:</p> <p><code>resp, err := client.Post(&quot;https://api.github.com/graphql&quot;, &quot;application/json&quot;, strings.NewReader(string(gqlMarshalled)))</code></p> <p>and dump the response:</p> <pre><code>b, _ := httputil.DumpResponse(resp, true) fmt.Println(string(b))</code></pre><p>The complete gist that includes a query with variables can be found below:</p> <p><code>gist:kyrcha/76fdcabfbdb4c746fdc8d20761262212#graphqlclient.go</code></p> <p>Execute it with:</p> <p><code>GITHUB_TOKEN=&lt;your token&gt; go run graphqlclient.go</code></p> <![CDATA[Launching the new kyrcha.info using Gatsby, Bulma, Contentful and Surge]]>http://kyrcha.info/2019/05/22/launching-the-new-kyrcha-info-using-gatsby-bulma-contentful-and-surgehttp://kyrcha.info2019/05/22/launching-the-new-kyrcha-info-using-gatsby-bulma-contentful-and-surgeWed, 22 May 2019 15:49:00 GMT<p>Finally!
Since the first commit on GitHub on the 26th of April 2018, that is, after almost a year, I am in a position to announce the official launch of <a href="http://kyrcha.info">kyrcha.info</a>.</p> <p>As I state in the header of my home page, I want <a href="http://kyrcha.info">kyrcha.info</a> to be the main point of entry to my digital self: to serve as a medium to communicate with the world, to serve as an archive, to serve as a long-term memory, to serve as a marketing tool.</p> <p>I have used many technologies before to build it: Wordpress (numerous attempts), dokuwiki, docpad, plain old html and more, but eventually I believe I found the combination that satisfies my requirements:</p> <ul> <li>A static site generator, with whatever that means in terms of performance and security vs. dynamic website platforms.</li> <li>Be able to own my content.</li> <li>Be able to extend the functionality myself programmatically.</li> <li>Use technologies I also use in other projects.</li> <li>Have pride in that I&#39;ve stitched it up myself.</li> </ul> <p>So I am writing this post to present to you kyrcha.info, my personal website that uses <a href="https://www.gatsbyjs.org/">GatsbyJS</a> as the static site generator, <a href="https://bulma.io/">Bulma</a> as the CSS framework, <a href="https://www.contentful.com/">Contentful</a> for managing content and <a href="https://surge.sh/">surge</a> for publishing.</p> <p>Features I wanted and have implemented in this website are:</p> <ul> <li>Google analytics through the <a href="https://www.gatsbyjs.org/packages/gatsby-plugin-google-analytics/">google-analytics Gatsby plugin</a></li> <li>RSS feed with email subscription. This required the <a href="https://www.gatsbyjs.org/packages/gatsby-plugin-feed/">feed Gatsby plugin</a> and feedburner with email support and <a href="https://github.com/kyrcha/kyrcha.info/blob/master/gatsby-config.js#L73">some code</a>.</li> <li>Math equations in blog posts.
For this I used the <a href="https://github.com/hanai/gatsby-remark-mathjax">Gatsby plugin remark-mathjax</a> and some code I found on <a href="https://github.com/hanai/gatsby-remark-mathjax/issues/1#issuecomment-443436362">GitHub</a>. So now I write equations like this: <code>$\frac{a}{b}$</code> in Contentful and they are transformed into math: $\frac{a}{b}$.</li> <li>Be able to write my own code if I want to for anything.</li> <li>Be able to draft something in Contentful and preview it without committing it to GitHub or doing other hacks.</li> <li>Be able to write my posts in (simple) markdown and not in html or in rich-format editors that are often a pain.</li> </ul> <p>I have also added:</p> <ul> <li>Commenting using Disqus through a React plugin. Unfortunately, I still cannot make the old comments show up on the new website, despite the migrations I have made in the Disqus platform.</li> </ul> <p>If you like the technologies, the features and the layout...the code is on <a href="https://github.com/kyrcha/kyrcha.info">GitHub</a>.</p> <p>Some remaining tasks are:</p> <ul> <li><del>I am missing pagination in the blog page</del> <strong>Update 2019-10-26</strong>: <a href="https://github.com/kyrcha/kyrcha.info/commit/59ff61f6b5b591a9b967bc3e4a513ab126193077">done</a></li> <li>I am missing tag pages to contain collections of posts with the same tag</li> <li>Optimizations for speed</li> <li>More content :)</li> </ul> <![CDATA[Simple rules for building robust machine learning models]]>http://kyrcha.info/2019/05/16/simple-rules-for-building-robust-machine-learning-modelshttp://kyrcha.info2019/05/16/simple-rules-for-building-robust-machine-learning-modelsThu, 16 May 2019 09:29:00 GMT<p>This is the title of my invited talk in the Ask Me Anything (AMA) call of the <a href="https://www.rd-alliance.org/groups/early-career-and-engagement-ig">Research Data Alliance (RDA) Early Career and Engagement Interest Group</a>.
The minutes of the call will be posted <a href="https://github.com/fpsom/rda-eceig">here</a>.</p> <p>The rules are summarized as follows:</p> <ol> <li>Always have 3 sets:<ul> <li>training</li> <li>validation</li> <li>test</li> </ul> </li> <li>Validation and test sets should reflect the data you expect to see in the future</li> <li>Follow dataset size heuristics</li> <li>Choose one metric to iterate faster and have more focus</li> <li>Always do your exploratory data analysis <ul> <li>density plots</li> <li>correlation plots</li> <li>box plots</li> </ul> </li> <li>When preprocessing use statistics based only on the training set</li> <li>Increase the number of times you do 10-fold CV to get even more accurate estimates of performance</li> <li>Use the Wilcoxon statistical test to choose between two models</li> <li>Time is money (Person-Months and Cloud Computing), so start with a small dataset, debug and then increase the size</li> <li>If you don&#39;t have enough data, find or create more data</li> <li>Decide if you strive for performance or interpretability</li> <li>Learn the strong points of each ML model</li> <li>Become a knowledgeable trader of bias-variance</li> <li>Finish off with an ensemble</li> <li>Tune hyperparameters ...
but up to a point</li> <li>Start with a simple waterfall-like process:<ul> <li>Study the problem</li> <li>EDA</li> <li>Define optimization strategy</li> <li>Do feature engineering</li> <li>Modelling</li> <li>Ensembling</li> </ul> </li> </ol> <p>Enjoy!</p> <p><a href="https://speakerdeck.com/kyrcha/simple-rules-for-building-robust-machine-learning-models">https://speakerdeck.com/kyrcha/simple-rules-for-building-robust-machine-learning-models</a></p> <![CDATA[Advices and strategies I learned from my first business attempt]]>http://kyrcha.info/2019/04/23/advices-and-strategies-i-learned-from-my-first-business-attempthttp://kyrcha.info2019/04/23/advices-and-strategies-i-learned-from-my-first-business-attemptTue, 23 Apr 2019 08:46:00 GMT<p>This is the title of my invited talk in <a href="http://www.sfhmmy.gr/en/home">ECESCON 2019 (Electrical and Computer Engineering Student Conference)</a>. Even though I do not consider myself an experienced (or successful) entrepreneur, I took up the challenge to come up with a talk that I believe will help others in their first attempt. The slides are below:</p> <p><a href="https://speakerdeck.com/kyrcha/advices-and-strategies-i-learned-from-my-first-business-attempt">https://speakerdeck.com/kyrcha/advices-and-strategies-i-learned-from-my-first-business-attempt</a></p> <![CDATA[Calculating the running average and variance of streaming data using redis]]>http://kyrcha.info/2019/04/05/calculating-the-running-average-and-variance-of-streaming-data-using-redishttp://kyrcha.info2019/04/05/calculating-the-running-average-and-variance-of-streaming-data-using-redisThu, 04 Apr 2019 21:33:00 GMT<p>In our Big Data Management System, <a href="https://github.com/AuthEceSoftEng/cenote">cenote</a>, we wanted to calculate the running average and the running variance of numeric JSON properties from streaming data processed by Storm bolts.
Before the bolts store the data in the database, we wanted to update these statistics for each numeric property in order to perform online outlier detection. The values of each numeric property can be processed by a different bolt and all these bolts have to update the same running statistic concurrently. Thus each update should be either an atomic operation or formed as a transaction, while at the same time being fast enough to achieve near-real-time processing end-to-end. </p> <p>A system that supports quick writes and reads is redis, an in-memory, key-value store that has both atomic operations and <a href="https://redis.io/topics/transactions">transactions</a>. So when a bolt receives a JSON document it should update the triplet, <code>{n, m, m2}</code>, according to <a href="https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Welford&#39;s_online_algorithm">Welford&#39;s algorithm</a>, with <code>n</code> being the number of samples, <code>m</code> the mean and <code>m2</code> the sum of squared distances from the mean.</p> <p>Below is example code (or <a href="https://gist.github.com/kyrcha/974f662d988906023d20cadf22a8a88e">here as a gist</a>) that instantiates a pool of 100 threads, with each one processing a number, connecting to redis and updating the number of samples, their running average and their variance using transactions (or <a href="https://pypi.org/project/redis/">pipelines in the Python-redis language</a>).
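<p>Stripped of redis, the triplet update itself is only a few lines; a minimal plain-Python sketch of Welford&#39;s algorithm with the same <code>{n, m, m2}</code> state (the sample numbers are made up):</p>

```python
def welford_update(n, m, m2, x):
    """One Welford step: return the updated (n, m, m2) after observing x."""
    n += 1
    delta = x - m
    m += delta / n
    m2 += delta * (x - m)  # note: this uses the *updated* mean
    return n, m, m2

n, m, m2 = 0, 0.0, 0.0
for x in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    n, m, m2 = welford_update(n, m, m2, x)

mean = m           # = 5.0 for this data
variance = m2 / n  # population variance, = 4.0 here; use m2 / (n - 1) for the sample variance
```

<p>The example below performs this exact update inside redis, so that concurrent bolts cannot interleave their read-modify-write cycles.</p>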
It uses a Lua script and the EVAL redis command.</p> <pre><code class="language-python">from multiprocessing import Pool import redis import math import json from random import seed from random import gauss # Atomic operations def sum(x): r = redis.Redis(host=&#39;localhost&#39;, port=6379, db=0) r.incrbyfloat(&#39;sum&#39;, x) # transactional operations using EVAL def welford(x): r = redis.Redis(host=&#39;localhost&#39;, port=6379, db=0) pipe = r.pipeline() running(keys=[&#39;aggregate&#39;], args=[x], client=pipe) pipe.execute() if __name__ == &#39;__main__&#39;: # create a sequence of numbers following a normal distribution # of mean 0 and 1 standard deviation seed(1) sequence = [gauss(0,1) for i in range(1000)] # connect to redis and initialize rmain = redis.Redis(host=&#39;localhost&#39;, port=6379, db=0) rmain.set(&#39;sum&#39;, 0) rmain.set(&#39;aggregate&#39;, &#39;{ &quot;n&quot;: &quot;0&quot;, &quot;m&quot;: &quot;0&quot;, &quot;m2&quot;: &quot;0&quot; }&#39;) # Welford&#39;s online algorithm # https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Welford&#39;s_online_algorithm lua_script = &quot;&quot;&quot; local aggregate = redis.call(&#39;get&#39;,KEYS[1]) local decode = cjson.decode(aggregate) local n = decode[&#39;n&#39;] n = n + 1 local m = decode[&#39;m&#39;] local m2 = decode[&#39;m2&#39;] local delta = ARGV[1] - m m = m + delta/n m2 = m2 + delta * (ARGV[1] - m) decode[&#39;n&#39;] = n decode[&#39;m&#39;] = m decode[&#39;m2&#39;] = m2 local encoded = cjson.encode(decode) redis.call(&#39;set&#39;, KEYS[1], encoded) &quot;&quot;&quot; running = rmain.register_script(lua_script) p = Pool(100) # create a pool of 100 threads p.map(sum, sequence) # calculate the sum p.map(welford, sequence) # calculate running mean and variance or standard deviation print(&#39;sum: &#39; + str(rmain.get(&#39;sum&#39;))) result = json.loads(rmain.get(&#39;aggregate&#39;)) print(&#39;count: &#39; + str(result[&#39;n&#39;])) print(&#39;mean: &#39; + 
str(result[&#39;m&#39;])) print(&#39;std: &#39; + str(math.sqrt(result[&#39;m2&#39;]/result[&#39;n&#39;])))</code></pre> <![CDATA[On collinearity and feature selection]]>http://kyrcha.info/2019/03/22/on-collinearity-and-feature-selectionhttp://kyrcha.info2019/03/22/on-collinearity-and-feature-selectionFri, 22 Mar 2019 15:00:00 GMT<p>I am writing this post in response to Kent C. Dodds&#39; <a href="https://kentcdodds.com/blog/intentional-career-building/">Call for Action in the area of intentional career building</a>. In that post Kent C. Dodds discusses (possibly reproducible) ideas on how he built his career (by creating and communicating value). One of the proposed actions was <em>&quot;Answer your co-worker&#39;s question in a public space (YouTube, gist, etc.) and share it&quot;</em>. </p> <p>In this blog post I am going ahead and answering a student&#39;s question in a public space. In particular, I got asked by a student <em>whether one should eliminate collinearity, using Variance Inflation Factor (VIF) for example, before using a feature selection algorithm</em>. I&#39;ll do my best to provide an insightful answer and to do that I will be fusing my knowledge, experimentation and different resources I found on the Internet. </p> <p>More posts like this will follow. It is a way of finding ideas and writing posts that help you become a better communicator of ideas and concepts, creating content and value and helping others along the way.
But I am stalling, so let&#39;s start.</p> <h2 id="resources">Resources</h2> <p>I read and used the following resources on the subject:</p> <ul> <li>This <a href="https://stats.stackexchange.com/q/168622/57185">StackOverflow (SO)</a> question and its answers: <a href="https://stats.stackexchange.com/a/168631/57185">answer 1</a>, <a href="https://stats.stackexchange.com/a/168703/57185">answer 2</a>, <a href="https://stats.stackexchange.com/a/208156/57185">answer 3</a>, <a href="https://stats.stackexchange.com/a/168703/57185">answer 4</a></li> <li>This <a href="https://stats.stackexchange.com/q/30486/57185">SO question</a> and its answers: <a href="https://stats.stackexchange.com/a/112938/57185">answer 1</a></li> <li>This <a href="https://stats.stackexchange.com/q/25611/57185">SO question</a> and its answers: <a href="https://stats.stackexchange.com/a/26051/57185">answer 1</a></li> <li>This <a href="http://www.sthda.com/english/articles/39-regression-model-diagnostics/160-multicollinearity-essentials-and-vif-in-r/">blog post</a></li> <li>The textbook <a href="https://amzn.to/2K4iaOB">Applied Linear Statistical Models, 5th edition</a>.</li> <li><a href="https://en.wikipedia.org/wiki/Collinearity#Usage_in_statistics_and_econometrics">Wikipedia entry</a></li> </ul> <h2 id="collinearity">Collinearity</h2> <p>Intercorrelation or multi-collinearity is the existence of predictor variables that are (highly) correlated among themselves. For example, family income, family savings, and age of head of household are correlated among themselves when we try to predict family food expenditures. The older you are, the more money and savings you probably have, and vice-versa.</p> <p>In the presence of perfect collinearity, i.e. feature $X_1 = \alpha + \beta * X_2$ (1), many different sets of coefficient values would predict the response variable $Y$ equally well after performing Ordinary Least Squares (OLS).
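<p>A tiny numerical sketch makes this concrete (Python, with fabricated data in which $X_2 = 2 X_1$ exactly); two different coefficient vectors produce identical predictions:</p>

```python
import numpy as np

rng = np.random.default_rng(1)
x1 = rng.normal(size=100)
x2 = 2 * x1                      # perfect collinearity: X2 = 2 * X1
X = np.column_stack([x1, x2])

beta_a = np.array([3.0, 0.0])    # fitted values 3*X1
beta_b = np.array([0.0, 1.5])    # fitted values 1.5*X2, i.e. 3*X1 as well

# the two coefficient vectors are indistinguishable to least squares
assert np.allclose(X @ beta_a, X @ beta_b)
```

<p>Any combination with $\beta_1 + 2 \beta_2 = 3$ gives the same fitted values, so least squares has no basis to prefer one.</p>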
So given all these solutions, one would not be able to say anything regarding the effect $X_1$ and $X_2$ have on $Y$. In addition, if a new sample arrives that we want to predict and it does not follow equation (1), the prediction error will probably be very large. Finally, the regression coefficients of any multicollinear predictor variables can be very different in the presence of non-correlated variables. In practice though, this is rarely the case since there is also an error component to the relationship.</p> <p>In plain words: If there is a &quot;nice&quot; relationship between $X_1$ and $X_2$, when new disturbed data arrive, don&#39;t expect to have good predictions. On the other hand, if the relationship between $X_1$ and $X_2$ is fuzzy and it continues to be fuzzy, then you won&#39;t have a problem. In general collinearity causes problems for the interpretability of the model. Prediction is not hurt as long as the new samples that arrive follow the same (multi)collinear pattern. </p> <h2 id="variance-inflation-factor-vif">Variance Inflation Factor (VIF)</h2> <p>The formal method for detecting the presence of multicollinearity is the Variance Inflation Factor (VIF). VIF measures how much the variances of the estimated regression coefficients are inflated as compared to when the predictor variables are not linearly related. VIF is 1 when $X_n$ is not linearly related to the other predictors and greater than 1 in the presence of intercorrelations with other features. A VIF of more than 10 is a heuristic indication that collinearity is influencing the regression.</p> <p>One can drop one or more collinear variables from the model, but 1) you get no insight on whether the dropped variables would have helped or hurt the prediction, and 2) the coefficients of the remaining variables will change. One can also do Principal Components Analysis, which will provide new uncorrelated variables.
Of course the new variables will not have any physical meaning whatsoever, again hurting interpretability. A remedial measure against serious collinearity is <a href="https://en.wikipedia.org/wiki/Tikhonov_regularization">Ridge Regression</a>, which through regularization gives preference to one solution over the others.</p> <p>On the other hand, in Machine Learning we most often care about robust predictions and models that can generalize well, rather than issues regarding the interpretability of the models. <em>(Sidenote: I believe this will change because we would often like to know why a machine learning model provided this or that prediction...think of bank credit scoring systems or autonomous cars using deep neural networks that must adhere to certain laws as well.)</em> The balance of how much regularization is needed (a form of a bias-variance trade-off, with examples: the $\lambda$ factor in Ridge Regression, the number of variables sampled in Random Forests or the regularization parameter C in Support Vector Machines) is usually found through cross-validation. Of course it is always good to know about the existence of collinearity, since in principle when the collinearity equation changes, we can have large prediction errors.</p> <p>To conclude, if you are interested in the effects of the predictor variables on the response and the interpretability of the model, do care about collinearity. If you are interested in the predictive abilities of a model, then you can skip it and follow regular machine learning flows. But it is always good to check.</p> <p>Below one can find experiments on various cases, along with comments, that I&#39;ve done using R to support the arguments above.
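<p>As a quick aside for readers outside R: VIF follows directly from its definition, $VIF_j = 1/(1 - R_j^2)$, where $R_j^2$ comes from regressing predictor $j$ on the remaining predictors. A self-contained Python sketch with synthetic, nearly-collinear data:</p>

```python
import numpy as np

def vif(X):
    """VIF of each column of X: VIF_j = 1 / (1 - R^2_j), where R^2_j is from
    the OLS regression of column j on the other columns (plus an intercept)."""
    n, p = X.shape
    factors = []
    for j in range(p):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])   # intercept + remaining predictors
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        r2 = 1 - ((y - A @ beta) ** 2).sum() / ((y - y.mean()) ** 2).sum()
        factors.append(1.0 / (1.0 - r2))
    return factors

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = 2 * x1 + rng.normal(scale=0.1, size=200)  # nearly collinear with x1
x3 = rng.normal(size=200)                      # independent of both
vifs = vif(np.column_stack([x1, x2, x3]))      # x1 and x2 blow past 10; x3 stays near 1
```

<p>With the heuristic threshold of 10 mentioned above, <code>x1</code> and <code>x2</code> are flagged while <code>x3</code> is not.</p>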
The Rmd document can be found <a href="https://github.com/kyrcha/ml-rants/blob/master/CollinearityAndFeatureSelection.Rmd">here</a> and the rendered html document <a href="https://kyrcha.github.io/ml-rants/CollinearityAndFeatureSelection.html">here</a>.</p> <hr> <h1 id="rmd-document-for-collinearity-and-feature-selection">Rmd document for Collinearity and Feature Selection</h1> <h2 id="intro">Intro</h2> <p>This notebook is an online appendix of my blog post: <a href="http://kyrcha.info/2019/03/22/on-collinearity-and-feature-selection">On Collinearity and Feature Selection</a>, where I play with the concepts using R code.</p> <h2 id="the-dataset">The dataset</h2> <p>We will use the <a href="https://archive.ics.uci.edu/ml/datasets/auto+mpg">auto-mpg dataset</a>, where we will try to predict the miles per gallon (mpg) consumption given some car-related features like horsepower, weight etc.</p> <pre><code>set.seed(1234) fileURL &lt;- &quot;https://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data&quot; download.file(fileURL, destfile=&quot;auto-mpg.data&quot;, method=&quot;curl&quot;) data &lt;- read.table(&quot;auto-mpg.data&quot;, na.strings = &quot;?&quot;, quote=&#39;&quot;&#39;, dec=&quot;.&quot;, header=F) # remove instances with missing values and the name of the car data &lt;- data[complete.cases(data),-9] summary(data) ## V1 V2 V3 V4 ## Min. : 9.00 Min. :3.000 Min. : 68.0 Min. : 46.0 ## 1st Qu.:17.00 1st Qu.:4.000 1st Qu.:105.0 1st Qu.: 75.0 ## Median :22.75 Median :4.000 Median :151.0 Median : 93.5 ## Mean :23.45 Mean :5.472 Mean :194.4 Mean :104.5 ## 3rd Qu.:29.00 3rd Qu.:8.000 3rd Qu.:275.8 3rd Qu.:126.0 ## Max. :46.60 Max. :8.000 Max. :455.0 Max. :230.0 ## V5 V6 V7 V8 ## Min. :1613 Min. : 8.00 Min. :70.00 Min.
:1.000 ## 1st Qu.:2225 1st Qu.:13.78 1st Qu.:73.00 1st Qu.:1.000 ## Median :2804 Median :15.50 Median :76.00 Median :1.000 ## Mean :2978 Mean :15.54 Mean :75.98 Mean :1.577 ## 3rd Qu.:3615 3rd Qu.:17.02 3rd Qu.:79.00 3rd Qu.:2.000 ## Max. :5140 Max. :24.80 Max. :82.00 Max. :3.000</code></pre><h2 id="preprocessing">Preprocessing</h2> <p>Let&#39;s normalize the dataset (values to be in the interval [0,1]), an operation that will maintain the correlation between the variables, and split it into training and test sets.</p> <pre><code>normalize &lt;- function(x) { (x - min(x, na.rm=TRUE))/(max(x,na.rm=TRUE) - min(x, na.rm=TRUE)) } normData &lt;- cbind(data[,1], as.data.frame(lapply(data[,-1], normalize))) # name variables names(normData) &lt;- c(&quot;mpg&quot;, &quot;cylinders&quot;, &quot;displacement&quot;, &quot;horsepower&quot;, &quot;weight&quot;, &quot;acceleration&quot;, &quot;model_year&quot;, &quot;origin&quot;) # check correlation cat(&quot;Cor between disp. and weight before norm.:&quot;, cor(data$V3, data$V5), &quot;\n&quot;) ## Cor between disp. and weight before norm.: 0.9329944 cat(&quot;After norm.:&quot;, cor(normData$displacement, normData$weight)) ## After norm.: 0.9329944 # Train/Test split library(tidyverse) library(caret) training.samples &lt;- normData$mpg %&gt;% createDataPartition(p = 0.8, list = FALSE) train.data &lt;- normData[training.samples, ] test.data &lt;- normData[-training.samples, ]</code></pre><h2 id="modelling">Modelling</h2> <h3 id="linear-regression">Linear Regression</h3> <pre><code>linearModel &lt;- lm(mpg ~., data = train.data) summary(linearModel) ## ## Call: ## lm(formula = mpg ~ ., data = train.data) ## ## Residuals: ## Min 1Q Median 3Q Max ## -9.4147 -2.1845 -0.1875 1.7702 12.8931 ## ## Coefficients: ## Estimate Std.
Error t value Pr(&gt;|t|) ## (Intercept) 25.4017 1.3012 19.521 &lt; 2e-16 *** ## cylinders -2.5022 1.8711 -1.337 0.1821 ## displacement 7.7778 3.2280 2.409 0.0166 * ## horsepower -2.8404 2.8077 -1.012 0.3125 ## weight -22.9293 2.5394 -9.029 &lt; 2e-16 *** ## acceleration 2.0678 1.8409 1.123 0.2622 ## model_year 9.3472 0.7023 13.309 &lt; 2e-16 *** ## origin 2.9604 0.6383 4.638 5.21e-06 *** ## --- ## Signif. codes: 0 &#39;***&#39; 0.001 &#39;**&#39; 0.01 &#39;*&#39; 0.05 &#39;.&#39; 0.1 &#39; &#39; 1 ## ## Residual standard error: 3.389 on 307 degrees of freedom ## Multiple R-squared: 0.8175, Adjusted R-squared: 0.8134 ## F-statistic: 196.5 on 7 and 307 DF, p-value: &lt; 2.2e-16</code></pre><p>One can see that weight is a very important factor, both in terms of the coefficient value and in terms of statistical significance (it is unlikely to observe a relationship between weight and mpg due to chance). Notice the negative coefficient (more weight, fewer miles per gallon), which can be explained by the laws of physics. But also notice that even though weight and displacement have a correlation of 0.93 (almost collinear), their coefficients have different signs. Based on common knowledge though, they should have had the same sign. Collinearity is bad when you try to explain the outputs of models. Let&#39;s examine the VIF values:</p> <pre><code>library(car) vif(linearModel) ## cylinders displacement horsepower weight acceleration ## 10.797995 19.995685 8.952301 10.191612 2.424953 ## model_year origin ## 1.223735 1.794710</code></pre><p>We observe that 3 predictors have a value of more than 10, which is a concern for the existence of collinearity (or multicollinearity in this case). So let&#39;s drop displacement and create a second model:</p> <pre><code>linearModelMinusDisp &lt;- lm(mpg ~.-displacement, data = train.data) summary(linearModelMinusDisp) ## ## Call: ## lm(formula = mpg ~ .
- displacement, data = train.data) ## ## Residuals: ## Min 1Q Median 3Q Max ## -9.4776 -2.2073 -0.1473 1.7625 12.9679 ## ## Coefficients: ## Estimate Std. Error t value Pr(&gt;|t|) ## (Intercept) 25.4241 1.3113 19.388 &lt; 2e-16 *** ## cylinders 0.5072 1.4040 0.361 0.718 ## horsepower -1.1352 2.7382 -0.415 0.679 ## weight -20.6134 2.3688 -8.702 &lt; 2e-16 *** ## acceleration 1.5673 1.8434 0.850 0.396 ## model_year 9.2684 0.7070 13.110 &lt; 2e-16 *** ## origin 2.4812 0.6112 4.060 6.24e-05 *** ## --- ## Signif. codes: 0 &#39;***&#39; 0.001 &#39;**&#39; 0.01 &#39;*&#39; 0.05 &#39;.&#39; 0.1 &#39; &#39; 1 ## ## Residual standard error: 3.415 on 308 degrees of freedom ## Multiple R-squared: 0.8141, Adjusted R-squared: 0.8104 ## F-statistic: 224.7 on 6 and 308 DF, p-value: &lt; 2.2e-16 vif(linearModelMinusDisp) ## cylinders horsepower weight acceleration model_year ## 5.986398 8.383554 8.731646 2.394082 1.221086 ## origin ## 1.620491</code></pre><p>and let&#39;s check the predictive ability of the two:</p> <pre><code>predLM &lt;- linearModel %&gt;% predict(test.data) predLMMD &lt;- linearModelMinusDisp %&gt;% predict(test.data) cat(&quot;Full model:&quot;, RMSE(predLM, test.data$mpg), &quot;\n&quot;) ## Full model: 3.098026 cat(&quot;Minus disp. model:&quot;, RMSE(predLMMD, test.data$mpg)) ## Minus disp. model: 3.125825</code></pre><p>As one can see, I now have a more &quot;understandable&quot; model, with a somewhat &quot;worse&quot; predictive ability (slightly higher error).</p> <p>Now let&#39;s remove all the variables that had a VIF value of more than 10:</p> <pre><code>linearModelSimpler &lt;- lm(mpg ~.-displacement-cylinders-weight, data = train.data) summary(linearModelSimpler) ## ## Call: ## lm(formula = mpg ~ . - displacement - cylinders - weight, data = train.data) ## ## Residuals: ## Min 1Q Median 3Q Max ## -9.832 -2.415 -0.533 1.930 12.794 ## ## Coefficients: ## Estimate Std.
Error t value Pr(&gt;|t|) ## (Intercept) 28.7618 1.4362 20.027 &lt; 2e-16 *** ## horsepower -24.3357 1.6960 -14.349 &lt; 2e-16 *** ## acceleration -6.9758 1.8700 -3.730 0.000227 *** ## model_year 8.0722 0.8034 10.048 &lt; 2e-16 *** ## origin 4.9410 0.6324 7.813 8.76e-14 *** ## --- ## Signif. codes: 0 &#39;***&#39; 0.001 &#39;**&#39; 0.01 &#39;*&#39; 0.05 &#39;.&#39; 0.1 &#39; &#39; 1 ## ## Residual standard error: 3.938 on 310 degrees of freedom ## Multiple R-squared: 0.7511, Adjusted R-squared: 0.7479 ## F-statistic: 233.9 on 4 and 310 DF, p-value: &lt; 2.2e-16 vif(linearModelSimpler) ## horsepower acceleration model_year origin ## 2.418236 1.852449 1.185461 1.304459 predLMS &lt;- linearModelSimpler %&gt;% predict(test.data) cat(&quot;Even simpler model - RMSE:&quot;, RMSE(predLMS, test.data$mpg), &quot;\n&quot;) ## Even simpler model - RMSE: 3.575336</code></pre><p>Now the model is simpler, more explainable and without any collinearities (low VIF values), but not as good in terms of RMSE as the previous ones.</p> <h2 id="feature-selection">Feature Selection</h2> <p>To also check a feature selection method, we apply stepwise feature selection using the Akaike Information Criterion (AIC):</p> <pre><code>require(leaps) require(MASS) step.model &lt;- stepAIC(linearModel, direction = &quot;both&quot;, trace = FALSE) summary(step.model) ## ## Call: ## lm(formula = mpg ~ displacement + weight + acceleration + model_year + ## origin, data = train.data) ## ## Residuals: ## Min 1Q Median 3Q Max ## -9.1253 -2.1573 -0.1547 1.8907 12.8220 ## ## Coefficients: ## Estimate Std. Error t value Pr(&gt;|t|) ## (Intercept) 24.4535 1.0334 23.663 &lt; 2e-16 *** ## displacement 4.3108 2.3178 1.860 0.0639 . ## weight -24.4212 2.2367 -10.918 &lt; 2e-16 *** ## acceleration 3.1711 1.4748 2.150 0.0323 * ## model_year 9.5092 0.6811 13.963 &lt; 2e-16 *** ## origin 2.7723 0.6225 4.453 1.18e-05 *** ## --- ## Signif.
codes: 0 &#39;***&#39; 0.001 &#39;**&#39; 0.01 &#39;*&#39; 0.05 &#39;.&#39; 0.1 &#39; &#39; 1 ## ## Residual standard error: 3.392 on 309 degrees of freedom ## Multiple R-squared: 0.816, Adjusted R-squared: 0.813 ## F-statistic: 274 on 5 and 309 DF, p-value: &lt; 2.2e-16 linearModelAIC &lt;- lm(as.formula(step.model), data = train.data) vif(linearModelAIC) ## displacement weight acceleration model_year origin ## 10.288515 7.890980 1.553344 1.148540 1.703980 predLMAIC &lt;- linearModelAIC %&gt;% predict(test.data) cat(&quot;AIC model:&quot;, RMSE(predLMAIC, test.data$mpg), &quot;\n&quot;) ## AIC model: 3.114763</code></pre><p>Through this example we can see that even though we have collinearities involved (a VIF value of 10+), we obtain a low RMSE of 3.11. Or, using another feature selection package:</p> <pre><code># Set up repeated k-fold cross-validation train.control &lt;- trainControl(method = &quot;cv&quot;, number = 10) # Train the model step.model2 &lt;- train(mpg ~., data = train.data, method = &quot;leapBackward&quot;, tuneGrid = data.frame(nvmax = 1:7), trControl = train.control ) step.model2$results ## nvmax RMSE Rsquared MAE RMSESD RsquaredSD MAESD ## 1 1 4.387909 0.6950849 3.378332 0.9854225 0.09849010 0.6697571 ## 2 2 3.407376 0.8137023 2.661535 0.9123314 0.06485389 0.5518240 ## 3 3 3.333900 0.8224665 2.549990 0.8779426 0.06157152 0.5311275 ## 4 4 3.371991 0.8188230 2.592064 0.8832272 0.06214344 0.5410169 ## 5 5 3.395429 0.8170493 2.618763 0.8325787 0.05722386 0.4970754 ## 6 6 3.404866 0.8163436 2.633234 0.7979961 0.05357786 0.4526265 ## 7 7 3.384003 0.8180447 2.618266 0.8100225 0.05401179 0.4619242 summary(step.model2$finalModel) ## Subset selection object ## 7 Variables (and intercept) ## Forced in Forced out ## cylinders FALSE FALSE ## displacement FALSE FALSE ## horsepower FALSE FALSE ## weight FALSE FALSE ## acceleration FALSE FALSE ## model_year FALSE FALSE ## origin FALSE FALSE ## 1 subsets of each size up to 3 ## Selection Algorithm: backward ##
cylinders displacement horsepower weight acceleration model_year ## 1 ( 1 ) &quot; &quot; &quot; &quot; &quot; &quot; &quot;*&quot; &quot; &quot; &quot; &quot; ## 2 ( 1 ) &quot; &quot; &quot; &quot; &quot; &quot; &quot;*&quot; &quot; &quot; &quot;*&quot; ## 3 ( 1 ) &quot; &quot; &quot; &quot; &quot; &quot; &quot;*&quot; &quot; &quot; &quot;*&quot; ## origin ## 1 ( 1 ) &quot; &quot; ## 2 ( 1 ) &quot; &quot; ## 3 ( 1 ) &quot;*&quot; coef(step.model2$finalModel, 3) ## (Intercept) weight model_year origin ## 26.190907 -21.244932 9.522434 2.370324</code></pre><p>The best model has 3 predictors: <code>weight</code>, <code>model_year</code> and <code>origin</code>. So, making one more, final linear regression model and predicting the <code>mpg</code> in the test set, we have:</p> <pre><code>linearModelBest &lt;- lm(mpg ~weight+model_year+origin, data = train.data) summary(linearModelBest) ## ## Call: ## lm(formula = mpg ~ weight + model_year + origin, data = train.data) ## ## Residuals: ## Min 1Q Median 3Q Max ## -9.8638 -2.1894 -0.0388 1.7413 13.0971 ## ## Coefficients: ## Estimate Std. Error t value Pr(&gt;|t|) ## (Intercept) 26.1909 0.6744 38.834 &lt; 2e-16 *** ## weight -21.2449 1.0134 -20.965 &lt; 2e-16 *** ## model_year 9.5224 0.6649 14.321 &lt; 2e-16 *** ## origin 2.3703 0.5942 3.989 8.28e-05 *** ## --- ## Signif. codes: 0 &#39;***&#39; 0.001 &#39;**&#39; 0.01 &#39;*&#39; 0.05 &#39;.&#39; 0.1 &#39; &#39; 1 ## ## Residual standard error: 3.412 on 311 degrees of freedom ## Multiple R-squared: 0.8126, Adjusted R-squared: 0.8108 ## F-statistic: 449.6 on 3 and 311 DF, p-value: &lt; 2.2e-16 vif(linearModelBest) ## weight model_year origin ## 1.601095 1.082269 1.534712 predLMBest &lt;- linearModelBest %&gt;% predict(test.data) cat(&quot;Best model:&quot;, RMSE(predLMBest, test.data$mpg), &quot;\n&quot;) ## Best model: 3.096694</code></pre><p><em>3.09</em>!!! The lowest error, with only 3 predictors and no collinearities involved.
In this case feature selection went along with a final model that is interpretable as well.</p> <h2 id="ridge-regression">Ridge Regression</h2> <p>One solution to the collinearity problem (without doing any feature selection) is to apply Ridge Regression and try to &quot;constrain&quot; the many possible solutions for the <code>beta</code> coefficients into a single one. In this case we have the hyperparameter lambda to optimize, for which we will apply 10-fold cross-validation on the training set to find the best value, and then use that value to train the model and predict mpg in the testing dataset.</p> <pre><code>library(glmnet) y &lt;- train.data$mpg x &lt;- train.data %&gt;% dplyr::select(-starts_with(&quot;mpg&quot;)) %&gt;% data.matrix() lambdas &lt;- 10^seq(3, -2, by = -.1) fit &lt;- glmnet(x, y, alpha = 0, lambda = lambdas) cv_fit &lt;- cv.glmnet(x, y, alpha = 0, lambda = lambdas, nfolds = 10) # uncomment the plot to see how lambda changes the error. #plot(cv_fit) opt_lambda &lt;- cv_fit$lambda.min x_test &lt;- test.data %&gt;% dplyr::select(-starts_with(&quot;mpg&quot;)) %&gt;% data.matrix() y_predicted &lt;- predict(fit, s = opt_lambda, newx = x_test) cat(&quot;Ridge RMSE:&quot;, RMSE(y_predicted, test.data$mpg)) ## Ridge RMSE: 3.096991</code></pre><p>The Ridge Regression produced one of the lowest errors, and without dropping any of the coefficients. And as for the coefficients&#39; values:</p> <pre><code>coef(cv_fit) ## 8 x 1 sparse Matrix of class &quot;dgCMatrix&quot; ## 1 ## (Intercept) 26.7194742 ## cylinders -2.1671592 ## displacement -1.6690043 ## horsepower -5.5116891 ## weight -11.7117393 ## acceleration -0.3212087 ## model_year 7.8818066 ## origin 2.6205643</code></pre><p>which, as we can see, gave a much more reasonable and physically explainable model.
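For reference, the objective minimized here is the standard ridge criterion, with $\lambda$ being the <code>opt_lambda</code> selected by the cross-validation above:

```latex
% Standard ridge regression criterion: least squares plus an L2 penalty.
% The penalty term breaks the tie between the many near-equivalent
% least-squares solutions that collinearity allows, preferring the one
% with small coefficient norm.
\hat{\beta}^{\text{ridge}} = \arg\min_{\beta}
  \left\{ \sum_{i=1}^{n} \Big( y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j \Big)^{2}
  + \lambda \sum_{j=1}^{p} \beta_j^{2} \right\}
```

This is why, instead of one coefficient exploding positively and a collinear one exploding negatively, all the coefficients get shrunk toward sensible values.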
The more cylinders, displacement, horsepower, weight and acceleration...the fewer miles per gallon you can drive, while the more recent the model, and the more it originated from origin 2 (Europe) or 3 (Japan), the more miles per gallon you can go. As for the RMSE, it is close both to the full model and to the optimized model using feature selection.</p> <h2 id="discussion">Discussion</h2> <p>Collinearity is important if you need to have an understandable model. If you don&#39;t, and you just care about predictive ability, you can be more brute-force about it and just care about the numbers.</p> <![CDATA[Make your environment variables more robust by making them more fragile]]>http://kyrcha.info/2019/01/29/make-your-environment-variables-more-robust-by-making-them-more-fragilehttp://kyrcha.info2019/01/29/make-your-environment-variables-more-robust-by-making-them-more-fragileTue, 29 Jan 2019 08:39:00 GMT<p>At the <a href="https://devitconf.org/2016/">Devit 2016</a> conference in Thessaloniki, in one of the keynotes, <a href="https://www.yegor256.com/">Yegor Bugayenko</a> explained why you need to make your software more fragile, in order to make it more robust. It was a really good talk, a talk that I still remember. At the end of this post I have embedded the recording from the conference.</p> <p>In this talk, Yegor Bugayenko explained why you need to make software fail fast in development in order to make it more robust in production. According to that strategy, every time there is a potential source of a bug, or in general something that is not on the happy path, we should make it more visible and &quot;bigger&quot;, in order to catch it by failing fast, fix it and deploy again.
Some of the examples in the talk say <strong>&quot;do this&quot;</strong>:</p> <pre><code>@Override void save() { throw new Exception( &quot;not implemented yet&quot; ); }</code></pre><p><strong>&quot;instead of this&quot;</strong>:</p> <pre><code>@Override void save() { // not implemented yet }</code></pre><p>So if the method is not implemented, the program will fail fast with the first strategy, but at least you will not think you called <code>save()</code> and it did something when it actually didn&#39;t, like in the second example. Or, <strong>&quot;do this&quot;</strong>:</p> <pre><code>if(!file.delete()) { throw new Exception( &quot;failed to delete a file&quot; ); }</code></pre><p><strong>&quot;instead of this&quot;</strong>:</p> <pre><code>file.delete();</code></pre><p>In the second case we ignore the output of the method. If for some reason the system fails to delete the file, we are left hanging with the idea that the file was deleted. But if we throw the exception, we can start writing exception-handling code on how to approach a failed file deletion (file not found? file in use? etc.).</p> <p>Now to our case: according to the <a href="https://12factor.net/config">third factor of the twelve-factor-app</a> methodology, we should use environment variables in order to configure the code between deploys (i.e.
environments like staging, production, development etc.).</p> <p>In a lot of node.js examples that use environment variables there is often the pattern:</p> <pre><code>const env = process.env.NODE_ENV || &quot;development&quot;;</code></pre><p>or </p> <pre><code>const mongo_uri = process.env.MONGODB_URI || &quot;mongodb://localhost:27017/test&quot;;</code></pre><p>Based on the <em>&quot;making the software more fragile in order to make it more robust&quot;</em> strategy, in my projects I use the following pattern:</p> <pre><code>function throwErr(msg) { throw new Error(msg); } const env = process.env.NODE_ENV || throwErr(&#39;NODE_ENV is unset&#39;); const mongo_uri = process.env.MONGODB_URI || throwErr(&#39;MONGODB_URI is unset&#39;);</code></pre><p>So now, every unset environment variable will cause an error to be thrown, and this will force me (or other members of the team) to set it before continuing with the deployment to an environment, even my own development environment. And how will I know what to do? It will be shown in the logs.</p> <p>Below is the talk that inspired me to use this pattern:</p> <iframe width="560" height="315" src="https://www.youtube.com/embed/WOy9zhzyMOE" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe> <p><em>Personal note: This is a post I wanted to write for a long time (a couple of years), but after reading <a href="https://kentcdodds.com/">Kent C.
Dodds</a> newsletter yesterday on &quot;Intentional Career Building&quot;, I did it first thing in the morning as one of the call-to-action bullet points he suggested: &quot;Write the blog post you wish existed last week when you were learning something new&quot;</em></p> <![CDATA[2018 in review]]>http://kyrcha.info/2019/01/23/2018-in-reviewhttp://kyrcha.info2019/01/23/2018-in-reviewWed, 23 Jan 2019 09:31:00 GMT<p>2018 is over and through this post I will try to summarize personal and professional accomplishments and some of my 2019 goals. I will also try to reflect on what went well, what didn&#39;t and what I&#39;ve learned this year, the <a href="https://jamesclear.com/2018-annual-review">James Clear way</a>. I always wanted to share my annual reviews and goals in a blog post, so let&#39;s make this year one! I know I am a bit late for such posts, but better late than never. The first part is pretty quantitative, while the second is more qualitative.</p> <h2 id="the-list">The list</h2> <h3 id="cyclopt">Cyclopt</h3> <p>(For those wondering what Cyclopt is: <a href="http://cyclopt.com/">Cyclopt</a> is my first startup company.)</p> <ul> <li>We got our first clients, performing software quality assessment on their web applications.</li> <li>We launched our first MVP, the <a href="https://qaas.cyclopt.com/">Quality as a Service</a> web application.</li> <li>We got the 5th place out of 361 business plans in the NBG Business Seeds Competition.</li> </ul> <h3 id="publications">Publications</h3> <ul> <li>I co-authored and self-published my first book ever: <a href="https://leanpub.com/practical-machine-learning-r">Practical Machine Learning with R</a>, which at the moment of writing this post has around 500 readers.</li> <li>Published 4 papers:<ul> <li><em>&quot;npm-miner: An Infrastructure for Measuring the Quality of the npm Registry&quot;</em> in MSR 2018</li> <li><em>&quot;Predicting hyperparameters from meta-features in binary classification problems&quot;</em> in AutoML
2018</li> <li><em>&quot;A Natural Language Driven Approach for Automated Web API Development&quot;</em> in WS-REST 2018</li> <li>and <em>&quot;Deep Reinforcement Learning for Doom using Unsupervised Auxiliary Tasks&quot;</em> on arXiv</li> </ul> </li> </ul> <h3 id="books-ive-read">Books I&#39;ve read</h3> <p>(not as many as I planned)</p> <ul> <li><em>&quot;The Barefoot Investor&quot;</em> by Scott Pape</li> <li><em>&quot;It Doesn&#39;t Have to Be Crazy at Work&quot;</em> by Jason Fried and David Heinemeier Hansson</li> </ul> <h3 id="conferences-i-went-to">Conferences I went to</h3> <ul> <li>Devit in Thessaloniki</li> <li>AutoML@ICML in Stockholm</li> <li>Voxxed days in Thessaloniki</li> </ul> <h3 id="weightlifting">Weightlifting</h3> <p>The personal records I set this year:</p> <ul> <li>Deadlift: 170kg x 1</li> <li>Squat: 165kg x 1</li> <li>Overhead Press: 72.5kg x 1</li> </ul> <h3 id="projects">Projects</h3> <p>The projects I mainly worked on are the following:</p> <ul> <li>Already running:<ul> <li>Completed the automated continuous integration and deployment pipeline for <a href="https://app.equadcapital.com">https://app.equadcapital.com</a></li> <li>Project management for the project: <em>&quot;Continuous Implicit Authentication on Mobile Devices and Kiosks through gestures&quot;</em>. Through this project we launched the mobile application <a href="http://brainrun.issel.ee.auth.gr">Brain Run</a>, which reached place 344 in the Play Store in the Games category.</li> <li>Mobile-Age H2020 project</li> </ul> </li> <li>New:<ul> <li>eeRIS: electric energy Residential Informational System</li> <li>VITAL: Versatile Internet of Things for AgricuLture</li> </ul> </li> </ul> <p>For these two new projects we started building an open source Big Data Management System (BDMS), like <a href="https://keen.io/">keen.io</a>, for handling and analyzing real-time event streams.
We named it <a href="https://github.com/AuthEceSoftEng/cenote">cenote</a>.</p> <h3 id="proposals">Proposals</h3> <p>Zero out of four (0/4) proposals for funding were accepted. On the positive side, we have enough previously funded proposals.</p> <h3 id="diploma-theses">Diploma Theses</h3> <p>Some diploma theses I co-supervised that were completed in 2018:</p> <ul> <li>Anastasios Kakouris: &quot;Continuous User Authentication in Web Applications through Behavioral Biometrics&quot;</li> <li>Napoleon-Christos Economou: &quot;Call by Meaning: Calling Software Components Based on Their Meaning&quot;</li> <li>Giorgos Konstantopoulos: &quot;Decentralized Metering and Billing of energy on Ethereum with respect to scalability and security&quot;</li> </ul> <h3 id="life-in-general">Life in general</h3> <ul> <li>Moved to a new apartment.</li> <li>Completed my academic CV (39 pages) and applied for three tenure-track positions in Greek universities.</li> <li>I also think I nailed down what I want to do R&amp;D on and what to be good at: <em>&quot;Autonomously improve the quality of software systems (what is called autonomic computing), either in the automatic Find Bugs-Fix-Verify (Fi-Fi-Verify) sense for software systems or in the life-long-learning sense for machine learning based systems (Software 2.0).&quot;</em></li> </ul> <h3 id="software">Software</h3> <p>Started working on some open source software projects:</p> <ul> <li><a href="https://github.com/cyclopt/jssa">jssa - javascript static analyzer</a>: JS static analyzer (jssa): an aggregation of javascript source code static analysis tools</li> <li><a href="https://github.com/cyclopt/js-starter-kit">js-starter-kit</a>: JS web application starter kit for the MERN stack, along with a software development lifecycle proposal</li> <li><a href="https://github.com/kyrcha/github-project-story-points">github-project-story-points</a>: forked and adapted</li> </ul> <h2 id="what-ive-learned">What I&#39;ve learned</h2> <p>I will quote
<a href="https://jamesclear.com/2018-annual-review">James Clear&#39;s</a> lesson learnt, which is exactly what I would want to write: <strong>&quot;Entrepreneurship is never as sexy on the inside as it appears on the outside.&quot;</strong></p> <p><strong>I cannot lose weight unless I burn more calories than I eat.</strong> I knew it, I&#39;ve read about it over and over again, my wife and friends tell me, but I don&#39;t want to believe it, or most probably don&#39;t want to do it, and think I can escape it by going to the gym more, doing keto, intermittent fasting and what not. Energy balance is the foundation of the <a href="https://muscleandstrengthpyramids.com/">nutrition pyramid</a>. Period.</p> <p><strong>The <a href="https://jamesclear.com/four-burners-theory">four burners theory</a> is a valid theory.</strong> You cannot do everything, i.e. health, work, family, friends, at your top performance, especially in my case, where work is split between academia and the start-up, and the family is a wife and two small kids.</p> <p><strong>Another valid theory is the willpower muscle</strong>, especially in combination with the four burners theory. I cannot have many burners on and at the same time expect myself to have the willpower to accomplish stuff in all or most of them.
What usually pays the bill is the health burner, both in terms of being overweight and in terms of stress.</p> <h2 id="2019">2019</h2> <p>The major professional and health goals for 2019 are:</p> <ul> <li>To go below 90 kgs and 20% body fat.</li> <li>To launch the Cyclopt chatbot and GitHub Application and reach a decent number of installations.</li> <li>Research and development on the JavaScript (node.js) ecosystem, with blog posts, open source software, papers and more.</li> <li>Write more blog posts than the previous year and read more books than the previous years (non-fiction and technological/scientific).</li> <li>Co-author more academic papers than the previous year: 4+</li> </ul> <p>Finally, some other annual reviews I read were:</p> <ul> <li><a href="https://buttondown.email/kentcdodds/archive/ca78f624-8ed7-463f-addd-a0d039d4dc3b">Kent C Dodds</a></li> </ul> <![CDATA[Coarse-to-Fine Decoding for Neural Semantic Parsing]]>http://kyrcha.info/2018/11/29/coarse-to-finehttp://kyrcha.info2018/11/29/coarse-to-fineThu, 29 Nov 2018 09:54:00 GMT<p><em><strong>Preamble</strong></em></p> <p><em>I decided to have a machine learning on software (MLSW) paper reading group on my own :) and to start writing short summaries (bits) on papers I read on the subject. I want them to serve as long-term memory for me and to make me write better summaries and reviews. I am not sure if they will be of use to anyone else, but in any case I make them public. I will start by reading all the papers from the <a href="https://github.com/src-d/awesome-machine-learning-on-source-code">awesome machine learning on source code</a> repo.</em></p> <h1 id="summary">Summary</h1> <p>The coarse2fine method learns semantic parsers from instances of natural language expressions paired with structured meaning representations that are machine interpretable. More specifically, the structured meaning representations are logical forms (λ-calculus), django (python) expressions and SQL queries.
As an example, the goal is to transform:</p> <pre><code>What record company did conductor Mikhail Snitko record for after 1996?</code></pre><p>into</p> <pre><code>SELECT Record Company WHERE (Year of Recording &gt; 1996) AND (Conductor = Mikhail Snitko)</code></pre><p>To do that, coarse2fine transforms the input <em>x</em> into a meaning sketch <em>a</em> and then into the final meaning representation <em>y</em>. Bi-directional LSTMs are used for encoding the input <em>x</em>, and RNNs with an attention mechanism are used to decode the encoded input into the abstract sketch <em>a</em>. A similar encoding-decoding scheme is used for transforming <em>a</em> to <em>y</em>. Of course, certain fine-tunings are added to account for differences among tasks.</p> <p>The experimental results show that coarse2fine does a pretty good job and is worth a closer look.</p> <p>The paper can be found <a href="https://arxiv.org/pdf/1805.04793.pdf">here</a> and the code is provided <a href="https://github.com/donglixp/coarse2fine">here</a>.</p> <![CDATA[Devit 2018 takeaways and notes]]>http://kyrcha.info/2018/06/13/devit-2018-takeaways-and-noteshttp://kyrcha.info2018/06/13/devit-2018-takeaways-and-notesWed, 13 Jun 2018 11:10:00 GMT<p>Some notes and takeaways from the Devit 2018 conference, which took place in Thessaloniki on Monday, June 11th, 2018, and which will also help my long-term memory.</p> <h2 id="from-david-platt">From <a href="http://www.whysoftwaresucks.com/">David Platt</a></h2> <p>Lively presentation!</p> <p>Reading:</p> <ul> <li><a href="https://www.joyofux.com/">The joy of ux</a></li> </ul> <h2 id="from-pawel-dudek">From <a href="https://twitter.com/eldudi">Pawel Dudek</a>:</h2> <p>&quot;There is no such thing as untestable behavior&quot;. If you cannot test your code, then you haven&#39;t architected it correctly.
&quot;Tests drive the architecture of the app&quot;.</p> <p>To checkout:</p> <iframe src="https://player.vimeo.com/video/12350535" width="640" height="360" frameborder="0" webkitallowfullscreen mozallowfullscreen allowfullscreen></iframe> <p><a href="https://vimeo.com/12350535">2009 - Sandi Metz - SOLID Object-Oriented Design</a> from <a href="https://vimeo.com/goruco">Gotham Ruby Conference</a> on <a href="https://vimeo.com">Vimeo</a>.</p> <h2 id="from-cheryl-platz">From <a href="http://www.cherylplatz.com/">Cheryl Platz</a>:</h2> <p>Ask difficult questions when you design and build apps in the era of AI and conversational bots, like:</p> <ul> <li>How will this make the world better or worse?</li> <li>If we are successful, how will customers be harmed?</li> <li>How can customers abuse our product?</li> <li>What is the worst-case impact our product could have?</li> </ul> <p>To checkout:</p> <ul> <li><a href="https://www.artefactgroup.com/the-tarot-cards-of-tech/">The tarot cards of tech</a> for humanity-centered design</li> <li><a href="https://medium.com/mule-design/on-surveys-5a73dda5e9a0">On Surveys</a> for UI/UX</li> <li><a href="http://www.practicalethnography.com/">The practical ethnography book</a></li> <li><a href="https://www.microsoft.com/en-us/design/inclusive">Microsoft&#39;s Inclusive Design</a> thinking</li> <li><a href="https://dscout.com/">dscout</a></li> </ul> <p>Small note: <em>Device-agnostic architecture</em> - move customer data and telemetry to the cloud, don&#39;t leave them on the device.
This way the user has a seamless flow when moving between devices.</p> <p>In general, the theme from both David Platt&#39;s and Cheryl Platz&#39;s talks was similar to <a href="http://momtestbook.com/">the mom test</a> book:</p> <ul> <li>&quot;Don’t build products your customers don’t need&quot;</li> <li>&quot;Give customers what they want for a price they are willing to pay&quot;</li> <li>Don&#39;t write code before showing mockups to users/customers (writing code first leads to expensive, lost time)</li> </ul> <h2 id="from-the-panel-on-privacy">From the panel on privacy</h2> <p>I learned that according to <a href="https://www.researchgate.net/publication/281007197_The_cost_of_reading_privacy_policies">research</a> a person would have to spend 76 work days per year to read all the privacy policies they encounter on the internet.</p> <h2 id="from-ingrid-epure">From <a href="https://twitter.com/ingridepure">Ingrid Epure</a></h2> <p>Notes:</p> <ul> <li>Add thresholds in dashboards so the user does not have to think so hard about a metric</li> </ul> <p>Check out:</p> <ul> <li><a href="https://www.researchgate.net/publication/228797158_How_complex_systems_fail">How complex systems fail</a> paper</li> <li><a href="http://opentracing.io/">http://opentracing.io/</a></li> <li><a href="https://www.honeycomb.io/">https://www.honeycomb.io/</a></li> <li>Tom Wilkie&#39;s RED method on the <a href="https://www.slideshare.net/weaveworks/monitoring-weave-cloud-with-prometheus/10?src=clipshare">metrics you need to monitor</a></li> </ul> <h2 id="from-julien-simon">From <a href="https://medium.com/@julsimon">Julien Simon</a></h2> <p>Fun demo with a <a href="https://twitter.com/callmejohnnypi">small robot</a> on <a href="https://medium.com/@julsimon/johnny-pi-i-am-your-father-part-8-reading-translating-and-more-c22f7b8275cc">how one can use AWS</a> and move all computations to the cloud with various AWS services.</p> <p>I am thinking that from now on it doesn&#39;t really make sense to build and deploy your own
models that, for example, do face recognition or text translation. It isn&#39;t worth the time and effort.</p> <p>Also check out:</p> <ul> <li><a href="https://www.computer.org/csdl/mags/ic/2017/03/mic2017030012.html">Two Decades of Recommender Systems at Amazon.com</a> paper</li> </ul> <![CDATA[Machine learning tutorials mini-site]]>http://kyrcha.info/2016/11/10/machine-learning-tutorials-mini-sitehttp://kyrcha.info2016/11/10/machine-learning-tutorials-mini-siteThu, 10 Nov 2016 15:19:00 GMT<p><img src="//images.contentful.com/c5lel8y1n83c/4CRbkIFds4OIqqi2EYagSq/69e91959cbfb0394b044e0072b597273/rmarkdown.png" alt="Rmarkdown"></p> <p>It’s been ages since I wrote my last post. I am planning to be more active from now on (I hope).</p> <p>I’ve been wanting to make a mini-site with machine learning tutorials for years and finally here it is!</p> <p>The mini-site is <a href="http://ml-tutorials.kyrcha.info">ml-tutorials.kyrcha.info</a> and its GitHub repo: <a href="https://github.com/kyrcha/ml-tutorials">https://github.com/kyrcha/ml-tutorials</a></p> <p>The main reason for finally getting through it was that I started teaching two data mining courses in two postgraduate programs (one in the fall and one in the spring semester, with different audiences) and I wanted to have some notes to give to students, with R implementations of the algorithms I teach in theory in the classroom. The mini-site also includes introductory material to R to help you get familiar with it.</p> <p>At the moment I only discuss the R specifics of the algorithms, but my plan is to add some theory to each algorithm as well in order to make the tutorials more standalone.</p> <p>For creating the site I used <a href="http://rmarkdown.rstudio.com/rmarkdown_websites.html">R Markdown Websites</a> and <a href="https://www.rstudio.com/">RStudio</a>.
A great resource is this <a href="https://www.rstudio.com/wp-content/uploads/2016/03/rmarkdown-cheatsheet-2.0.pdf">cheatsheet</a>.</p> <h3 id="jupyter-vs-r-markdown">Jupyter vs. R Markdown</h3> <p>I started this effort by working with <a href="http://jupyter.org/">Jupyter</a> notebooks with an R kernel, but the reasons that made me switch to R Markdown and RStudio were that:</p> <ol> <li>You can create mini-sites like this out of the box.</li> <li>I ran into problems when I tried to render the Jupyter notebooks to PDF to hand out to students.</li> <li>R Markdown is in Markdown and not in JSON, so it is easier to edit with a text editor.</li> <li>It works well with GitHub and GitHub Pages project sites.</li> </ol> <h3 id="deployment">Deployment</h3> <p>I wanted GitHub to serve the rendered HTML pages via the GitHub Pages project site functionality, using a custom domain: the subdomain <strong>ml-tutorials.kyrcha.info</strong>. After searching a bit on the internet, I set it up as follows:</p> <p><strong>Step 1:</strong> Configured the site rendering tool to put the generated HTML files in a docs folder.</p> <p><strong>Step 2:</strong> Added a footer with a new Google Analytics property to check out the traffic.</p> <p><strong>Step 3:</strong> In the repo settings on GitHub I added:</p> <p><img src="//images.ctfassets.net/c5lel8y1n83c/6cXYM7UGUEEycUwWQ2YasE/a65ccaead13cd8c4fbba74150713f63a/github-pages.png" alt="github-pages"></p> <p>The above will add a CNAME file in the docs folder.
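</p> <p>For reference, a minimal <code>_site.yml</code> sketch for this kind of setup (the <code>name</code> value here is a placeholder; <code>output_dir</code> corresponds to Step 1):</p> <pre><code>name: "ml-tutorials"
output_dir: "docs"
include: ["CNAME"]</code></pre> <p>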
Since the docs folder is deleted and re-created when rendering the site, I keep the CNAME file in the root folder of the project and, in the <code>_site.yml</code> configuration file, added <code>include: [&quot;CNAME&quot;]</code> so that it is copied into the docs folder every time the site is rendered.</p> <p><strong>Step 4:</strong> Finally, I created a CNAME record at my DNS provider, with <code>name: ml-tutorials</code> and <code>value: kyrcha.github.io</code>.</p> <p><img src="//images.ctfassets.net/c5lel8y1n83c/5XkGzV87GosEScawyO466g/0bfbe0d0d83106e9dc799d3695f27b9f/custom-dns.png" alt="custom-dns"></p> <p>Now <a href="http://ml-tutorials.kyrcha.info/">http://ml-tutorials.kyrcha.info/</a> shows whatever is served from GitHub Pages at <a href="https://kyrcha.github.io/ml-tutorials">https://kyrcha.github.io/ml-tutorials</a>, and <a href="https://kyrcha.github.io/ml-tutorials">https://kyrcha.github.io/ml-tutorials</a> redirects to <a href="http://ml-tutorials.kyrcha.info/">http://ml-tutorials.kyrcha.info/</a>.</p> <p>Whenever I want to add a new tutorial or update an older one I:</p> <ol> <li>Make the changes in my Rmd files.</li> <li>Render the site: <code>rmarkdown::render_site()</code></li> <li>Do a git add and a git commit in the local repository and push both the source and the rendered HTML pages to GitHub.</li> <li>If I want to render a specific page to PDF I enter: <code>rmarkdown::render(&quot;knn.Rmd&quot;, output_format=&quot;pdf_document&quot;)</code></li> </ol> <![CDATA[The S-CASE concept]]>http://kyrcha.info/2014/10/24/the-s-case-concepthttp://kyrcha.info2014/10/24/the-s-case-conceptFri, 24 Oct 2014 10:02:00 GMT<p><em>This is <a href="http://www.scasefp7.eu/2014/10/24/s-case-blog-scase-concept/">a post I wrote for the S-CASE project blog</a>. <a href="http://www.scasefp7.eu/">S-CASE or Scaffolding Scalable Software Services</a> is an EU-funded FP7 project I am currently working on as a technical coordinator.
The post below describes what the project is about.</em></p> <p>The <span class="highlight">S-CASE</span> project is about semi-automatically creating RESTful web services from multi-modal requirements, using a Model Driven Engineering methodology. The world of web services is moving towards REST and <span class="highlight">S-CASE</span> aims at helping developers implement such web services by focusing mainly on requirements engineering. The figure below depicts the basic components and the basic flow of events/data in <span class="highlight">S-CASE</span>.</p> <p><img src="//images.contentful.com/c5lel8y1n83c/1fTEOSSUnGo20CoI0yck8q/025e89aef6478cc54072204ef40948fd/S-CASE-workflow.png" alt="S-CASE workflow"></p> <div><span style="font-size: 18px;"><strong>Typical use case scenario</strong></span></div> Through the <span class="highlight">S-CASE</span> IDE the user imports or creates multi-modal requirements for his/her envisioned application. The requirements may be: <ul> <li>Textual requirements in the form “The user/system must be able to …”,</li> <li>UML activity and use case diagrams created in the platform or imported as images,</li> <li>Storyboards for flow charting, and</li> <li>Analysis class diagrams to improve the accuracy of the system in identifying entities, their properties and their relationships.</li> </ul> The requirements are then processed with natural language processing and image analysis techniques in order to extract relevant software engineering concepts: mainly RESTful resources, their properties and relations, and Create-Read-Update-Delete (CRUD) actions on those resources. All these concepts are stored in the <span class="highlight">S-CASE</span> ontology.
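<p>As a toy illustration of that extraction step (this is <em>not</em> the actual S-CASE pipeline, which relies on proper NLP and image analysis; the template, verb map and helper below are made up for the example), requirements following the “The user must be able to …” form can be mapped to action-resource tuples:</p>

```javascript
// Toy sketch: map templated requirements to CRUD action-resource tuples.
// A naive regular expression stands in for the real NLP machinery.
const CRUD = {
  create: 'CREATE', add: 'CREATE',
  get: 'READ', read: 'READ',
  update: 'UPDATE', edit: 'UPDATE',
  delete: 'DELETE', remove: 'DELETE'
};

function extractTuples(requirements) {
  const pattern = /must be able to (\w+) (?:a |an |the )?([\w ]+)/i;
  return requirements
    .map(function (req) {
      const m = pattern.exec(req);
      if (!m) return null;                       // not in the template form
      const action = CRUD[m[1].toLowerCase()];   // verb -> CRUD action
      if (!action) return null;                  // unknown verb
      return { action: action, resource: m[2].trim() };
    })
    .filter(Boolean);
}

console.log(extractTuples([
  'The user must be able to create a bookmark',
  'The user must be able to delete a bookmark'
]));
// [ { action: 'CREATE', resource: 'bookmark' },
//   { action: 'DELETE', resource: 'bookmark' } ]
```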
<p>The above procedure also identifies action-resource tuples that can be created automatically by the system, like the action-resource “create bookmark” (automatically built), or others that need more elaborate processes, like “get the weather given geolocation coordinates” (semi-automatically built or composed). The latter are sent to the Web Services Synthesis and Composition module.</p> <p>The Web Services Synthesis and Composition module tries to synthesize elaborate processes by composing 3rd party web services into a single <span class="highlight">S-CASE</span> composite web service. To perform such a computation, <span class="highlight">S-CASE</span> provides a methodology for semantically annotating 3rd party web services using <span class="highlight">S-CASE</span> domain ontologies, so that they can later be matched to the requirements of the composite service. The composite service is deployed to the YouREST deployment environment and registered in the directory of <span class="highlight">S-CASE</span> web services for future reference and re-use.</p> <p>Upon completing the stages above, the model driven engineering procedure initiates. The first step is to create the Computation Independent Model (CIM) out of the <span class="highlight">S-CASE</span> ontology. The CIM contains the bare minimum information needed to scaffold a REST service that adheres to the requirements imposed by the user, i.e. it includes all the problem’s domain concepts. After that, model transformations turn the CIM into a Platform Independent Model (PIM), which incorporates design constraints but remains platform independent, and then into a Platform Specific Model (PSM), which adds support for implementing the PIM with a specific suite of software tools (Java, JAX-RS, Hibernate, JSON, JAXB, PostgreSQL, etc.). The final step is to automatically generate the code of the web service. Calls to composite services are wrapped inside the generated code.
The code is built and deployed to YouREST for others to use.</p> <p>In order to support software re-use, every software artifact created by this procedure is stored in the <span class="highlight">S-CASE</span> repository for future retrieval.</p> <p>Through <span class="highlight">S-CASE</span> we plan to develop an ecosystem of services, along with the appropriate tools for service providers to develop quality software for SMEs with an affordable budget.</p> <![CDATA[Searchable, scrollable bootstrap dropdown with angularjs]]>http://kyrcha.info/2014/10/23/searchable-scrollable-dropdown-button-using-angularjs-and-bootstraphttp://kyrcha.info2014/10/23/searchable-scrollable-dropdown-button-using-angularjs-and-bootstrapThu, 23 Oct 2014 02:54:00 GMT<p>So you are working in AngularJS, you are using the Bootstrap framework, and the requirement is to create a <a href="http://getbootstrap.com/components/#btn-dropdowns">dropdown button</a> which will include several (list) items and that a) is scrollable and b) is searchable, because the menu items are many.</p> <p>The following code presents a solution to the above problem.</p> <iframe src="https://jsfiddle.net/kyrcha/ULSy3/6/embedded/result,html,js,css,resources" width="100%" height="300" frameborder="0" allowfullscreen="allowfullscreen"></iframe> <p>We created a dropdown button with menu items coming from the Angular controller. At the top of the menu an <code>input</code> element is added as a list item and bound to the scope variable <code>query</code>. This will act as the filter in the <code>ng-repeat</code> directive. The problem is that at this point clicking inside the input element will instantly close the dropdown, since the event is propagated up the DOM tree.
Thus the jQuery <a href="http://api.jquery.com/event.stoppropagation/">stopPropagation</a> method is used to stop the event from bubbling up.</p> <![CDATA[Book Review: eCommerce in the Cloud by Kelly Goetsch - O'Reilly]]>http://kyrcha.info/2014/10/22/reviews-book-review-ecommerce-cloud-kelly-goetsch-oreillyhttp://kyrcha.info2014/10/22/reviews-book-review-ecommerce-cloud-kelly-goetsch-oreillyMon, 20 Oct 2014 10:56:00 GMT<iframe style="width: 120px; height: 240px;" src="//ws-na.amazon-adsystem.com/widgets/q?ServiceVersion=20070822&amp;OneJS=1&amp;Operation=GetAdHtml&amp;MarketPlace=US&amp;source=ac&amp;ref=qf_sp_asin_til&amp;ad_type=product_link&amp;tracking_id=stemfull-20&amp;marketplace=amazon&amp;region=US&amp;placement=1491946636&amp;asins=1491946636&amp;linkId=RII46BLBRSYAUGFR&amp;show_border=false&amp;link_opens_in_new_window=true" width="300" height="150" frameborder="0" marginwidth="0" marginheight="0" scrolling="no"> </iframe> <p>Author Kelly Goetsch, a product manager focusing on large-scale eCommerce solutions, aims at educating eCommerce stakeholders on whether, why and how they could move their IT infrastructure to the cloud.</p> <p>The book is quite easy to read, mainly because the presentation of the technologies and techniques is kept at a high level.</p> <p>Topics of the book include:</p> <ul> <li>Cloud computing related terminology</li> <li>Cloud architectures</li> <li>Availability: how to avoid outages</li> <li>Performance: performing transactions in a reasonable amount of time</li> <li>Automation: reducing errors</li> <li>Elasticity: scaling up and down</li> <li>Security</li> </ul> I would say that the book is suitable for owners and managers of medium to large eCommerce businesses, and for novices in cloud technologies and distributed computing, who would like to learn the terminology and better communicate with their IT personnel about cloud solutions.
<p><a href="https://www.oreilly.com/reviews/"><img src="https://cdn.oreillystatic.com/bloggers/blogger-review-badge-125.png" alt="I review for the O'Reilly Reader Review Program" width="125" height="125" border="0" /></a></p> <p><strong>Update 2015-09-18:</strong> This review was part of the <a href="http://www.oreilly.com/reviews/">O&#39;Reilly Reader Review Program</a>, which is no longer available.</p> <![CDATA[SSL/HTTPS server with Node.js and Express.js]]>http://kyrcha.info/2014/10/14/sslhttps-server-nodejs-expressjshttp://kyrcha.info2014/10/14/sslhttps-server-nodejs-expressjsTue, 14 Oct 2014 09:56:00 GMT<p>So let’s assume the requirement is to create an HTTPS server that redirects traffic to https whenever a request reaches the server over plain http. I created this little guide by bundling together a couple of links related to the subject.</p> <p>We will begin by quickly creating a project using the <a href="http://expressjs.com/guide.html">express-generator</a>:</p> <pre><code>$ express https-server
$ cd https-server &amp;&amp; npm install
$ npm start</code></pre><p>The server should be running at <code>http://localhost:3000/</code>. Now let’s create the certificates (<a href="http://heyrod.com/snippet/s/node-https-ssl.html">Reference</a>):</p> <pre><code>$ openssl genrsa 1024 &gt; file.pem
$ openssl req -new -key file.pem -out csr.pem
$ openssl x509 -req -days 365 -in csr.pem -signkey file.pem -out file.crt</code></pre><p>We assumed no passphrase was used. We can then read the certificates in the starting point file <code>www</code>:</p> <pre><code>var fs = require(&#39;fs&#39;);

var config = {
  key: fs.readFileSync(&#39;file.pem&#39;),
  cert: fs.readFileSync(&#39;file.crt&#39;)
};</code></pre><p>The next step is to create two servers, one to listen on http and port 3000 and one on https and port 8000 (<a href="http://expressjs.com/4x/api.html#app.listen">Reference</a>).
The <code>www</code> file becomes:</p> <pre><code>#!/usr/bin/env node
var debug = require(&#39;debug&#39;)(&#39;https-server&#39;);
var app = require(&#39;../app&#39;);
var https = require(&#39;https&#39;);
var http = require(&#39;http&#39;);
var fs = require(&#39;fs&#39;);

var config = {
  key: fs.readFileSync(&#39;file.pem&#39;),
  cert: fs.readFileSync(&#39;file.crt&#39;)
};

http.createServer(app).listen(3000);
https.createServer(config, app).listen(8000);</code></pre><p>Now one can navigate both to <code>http://localhost:3000</code> and <code>https://localhost:8000</code> and get the same response; in the latter case with the usual “proceed with caution” notice, since the certificate is not signed by a trusted authority.</p> <p>The last step is to redirect traffic that comes in over http to https, by using a middleware for all routes (<a href="http://stackoverflow.com/a/24015460/869151">Reference</a>):</p> <pre><code>function ensureSecure(req, res, next) {
  if (req.secure) { return next(); }
  res.redirect(&#39;https://&#39; + req.host + &#39;:&#39; + 8000 + req.url);
}

app.all(&#39;*&#39;, ensureSecure);
app.use(&#39;/&#39;, routes);
app.use(&#39;/users&#39;, users);</code></pre><p>So <code>http://localhost:3000</code> and <code>http://localhost:3000/users</code> redirect to <code>https://localhost:8000</code> and <code>https://localhost:8000/users</code> respectively.</p> <p>The complete code can be found on <a href="https://github.com/kyrcha/blog-code/tree/master/https-server">GitHub</a>.</p> <p>Last but not least, in production you can redirect traffic to the standard http and https ports like in this <a href="http://stackoverflow.com/a/7458587/869151">reference</a>.</p> <![CDATA[Introductory post: Going MEAN]]>http://kyrcha.info/2014/10/10/introductory-post-going-meanhttp://kyrcha.info2014/10/10/introductory-post-going-meanFri, 10 Oct 2014 10:43:00 GMT<p><em>This was the first post I wrote for the <a href="http://meanstack.info">meanstack.info</a> blog I had created for all things MEAN,
now merged with <a href="http://kyrcha.info">kyrcha.info</a>, the site you are at.</em></p> <p dir="ltr">Dear visitor,</p> <p dir="ltr">Hi! My name is Kyriakos Chatzidimitriou. If you would like, you can find out more on <a title="about.me page for Kyriakos Chatzidimitriou" href="http://about.me/kyrcha">about.me</a>. I like to consider myself an intelligent systems, data and software engineer, and this is my blog about the MEAN stack, i.e. MongoDB, ExpressJS, AngularJS and NodeJS, and of course about JavaScript and JavaScript libraries in general.</p> <p dir="ltr">At a certain point during the last couple of years, after reading some inspiring books and along with the rise of cloud computing, the software-as-a-service paradigm and start-ups, I wanted to start building things and creating real products that provide real value to real customers.</p> <p dir="ltr">Being a polyglot has many merits, since for example you can learn a lot by studying other programming languages and give yourself a fresh perspective on your current dev stack (see <a href="http://euruko2013.org/speakers/#matz">Matz’s talk</a> on being a language designer at Euruko 2013), and it is something I am actively pursuing. Still, I also found fascinating the idea that you could have “<em>one language to rule them all</em>”: a lingua franca for building SaaS applications, from the database, to the server side, to the client side. In that respect, MongoDB, NodeJS and ExpressJS were no-brainers to pick for my main dev stack. The last thing was to decide which client-side JS framework to pick up: BackboneJS, EmberJS, AngularJS, CanJS, other? Again, after some digging around, I decided to go for AngularJS and complete the puzzle.
I’d like to devote a couple of lines to the posts of other developers that got me started with the MEAN stack and helped me decide:</p> <ul> <li>The <a href="http://blog.mongodb.org/post/49262866911/the-mean-stack-mongodb-expressjs-angularjs-and">MEAN stack post</a> on MongoDB’s blog</li> <li>A <a href="http://sporto.github.io/blog/2013/04/12/comparison-angular-backbone-can-ember/">comparison post</a> on the client-side JS frameworks</li> <li><a href="http://briantford.com/blog/angular-express.html">A way</a> to integrate NodeJS, ExpressJS and AngularJS</li> </ul> By no means do I consider myself at this point to be an expert on the MEAN stack. I started using the MEAN stack in September 2013, I’ll always be learning, and along the way I am making this process public. If others can benefit from it, all the better. My familiarity with the JavaScript language and its frameworks is just getting started, so bear with me if you spot any mistakes in my use of JavaScript. I promise I’ll get better. <p dir="ltr">I am starting this blog so that it can:</p> <ul> <li>give other developers the help I got from blogs like the ones above,</li> <li>make me a better MEAN stack developer, by forcing me to organize my thoughts in order to write posts open to public criticism,</li> <li>create a link to the MEAN stack community and bring in feedback,</li> <li>act as long-term memory storage for practices and techniques I am working on, and</li> <li>serve as a reference for future coworkers that are starting with the MEAN stack.</li> </ul> These are my adventures in the world of the MEAN stack … <p>Best,</p> <p>– Kyriakos Chatzidimitriou</p> <p>PS 1. Some links are affiliate links, which, if you use them, will make it easier for me to maintain the site and get even more books, to learn more stuff and write even better posts.</p> <p>PS 2.
Occasionally, M will mean MySQL, since a) some problems suit document databases and others relational ones, and b) I really like the <a href="http://sequelizejs.com/">Sequelize</a> framework.</p> <![CDATA[Calculating the fractal dimension of the Greek coastline (1.25)]]>http://kyrcha.info/2013/04/19/calculating-the-fractal-dimension-of-the-greek-coastline-1-25http://kyrcha.info2013/04/19/calculating-the-fractal-dimension-of-the-greek-coastline-1-25Fri, 19 Apr 2013 00:54:00 GMT<p><a href="https://commons.wikimedia.org/wiki/File:Great_Britain_Box.svg#/media/File:Great_Britain_Box.svg"><img src="https://upload.wikimedia.org/wikipedia/commons/2/28/Great_Britain_Box.svg" alt="Great Britain Box.svg" width="640" height="355" /></a> &quot;<a href="https://commons.wikimedia.org/wiki/File:Great_Britain_Box.svg#/media/File:Great_Britain_Box.svg">Great Britain Box</a>&quot; by <a title="User:Prokofiev" href="//commons.wikimedia.org/wiki/User:Prokofiev">Prokofiev</a> - <span class="int-own-work" lang="en">Own work</span>.
Licensed under <a title="Creative Commons Attribution-Share Alike 3.0" href="http://creativecommons.org/licenses/by-sa/3.0">CC BY-SA 3.0</a> via <a href="//commons.wikimedia.org/wiki/">Wikimedia Commons</a>.</p> <p>Inspired by the <a href="http://www.complexityexplorer.org/">Introduction to Complexity</a> course and the unit on <em>Fractals</em>, I thought it would be fun to make a rough calculation of the fractal dimension of the Greek coastline using the <a href="http://en.wikipedia.org/wiki/Minkowski%E2%80%93Bouligand_dimension">box counting method</a>.</p> <p>The box counting method goes as follows:</p> <ol> <li>Split the 2D map that depicts the coastline into squares (boxes) of a certain side size (<em>r</em>) and count the number of boxes (<em>n</em>) that include a piece of the coastline.</li> <li>Decrease the size of the boxes and go to step 1.</li> <li>When finished for a series of box sizes, do a linear regression of log(n) on log(1/r).</li> <li>The slope of the line fitting the points on the plot is the fractal dimension of the object, since n ∝ (1/r)<sup>D</sup> implies log(n) = D log(1/r) + c.</li> </ol> For a map of Greece, I used the one from <a href="http://www.ginkgomaps.com/maps_greece.html">Ginkgo maps</a>, licensed under the Creative Commons Attribution 3.0. Via an image editor, I removed the frame with the infobox and the geolocation axes, plus the borders that are not coastline, to facilitate further image processing. The retouched image was cropped to 1600×1600 pixels. Both images are shown below.
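<p>As a quick sanity check of steps 3 and 4: if the box counts follow the scaling law exactly, a least-squares fit of log(n) against log(1/r) recovers the dimension as the slope. A toy sketch (the constants and helper below are made up for the example; the box sides are the ones used in the R script of this post):</p>

```javascript
// Toy check: generate counts n = C * (1/r)^D for a known D,
// then recover D as the slope of the log-log regression.
function slope(xs, ys) {
  const n = xs.length;
  const mx = xs.reduce(function (a, b) { return a + b; }, 0) / n;
  const my = ys.reduce(function (a, b) { return a + b; }, 0) / n;
  let num = 0;
  let den = 0;
  for (let i = 0; i < n; i++) {
    num += (xs[i] - mx) * (ys[i] - my);
    den += (xs[i] - mx) * (xs[i] - mx);
  }
  return num / den; // least-squares slope of ys on xs
}

const D = 1.25;                                       // dimension to recover
const sides = [50, 40, 32, 25, 20, 16, 10, 8, 5, 4];  // box sides in pixels
const xs = sides.map(function (r) { return Math.log(1 / r); });
const ys = xs.map(function (x) { return D * x + Math.log(100); }); // log(n)

console.log(slope(xs, ys)); // ~1.25
```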
<h2 style="text-align: center;">Original map</h2> <p><img src="//images.ctfassets.net/c5lel8y1n83c/3tuG98i600mSE8m4iOGyk6/fd28382897d252ce368a76a038648744/rl3c_gr_greece_map_plaindcw_ja_hres.jpg" alt="rl3c gr greece map plaindcw ja hres"></p> <h2 style="text-align: center;">Retouched map</h2> <p><img src="//images.ctfassets.net/c5lel8y1n83c/4Q8fD1fBWgi4ok2sque40k/0017b1f432d33dd4b4bdc2db0f8e3700/rl3c_gr_greece_map_plaindcw_ja_hres_retouched.jpg" alt="rl3c gr greece map plaindcw ja hres retouched"></p> <p style="text-align: left;">The R script below implements the box counting method on the coastline JPEG picture (it makes all values &gt; 0.5 white) using boxes with sides that are divisors of 1600.</p> <h2 style="text-align: center;">The coastline map</h2> <p><img src="//images.ctfassets.net/c5lel8y1n83c/2BfGR9L9G8syqw6Iykagc4/830bf6a98496aaaba473708d7c296c20/coastline.jpeg" alt="coastline"></p> <h2 style="text-align: center;">The R script</h2> <pre class="prettyprint"><code class="language-r">library(jpeg)

rm(list=ls())

img = readJPEG("coastline.jpeg")

# filter out mainland
img[img &gt; 0.5] = 1

# divisors of 1600:
# 1,2,4,5,8,10,16,20,25,32,40,50,64,80,100,160,200,320,400,800,1600
boxSizes = c(50, 40, 32, 25, 20, 16, 10, 8, 5, 4)

h = img[,,1]
data = data.frame()
for(i in 1:length(boxSizes)) {
  b = boxSizes[i]
  x = dim(img[,,1])[1]
  ratio = x/b
  # https://stat.ethz.ch/pipermail/r-help/2012-February/303163.html
  blocks = kronecker(matrix(1:(ratio^2), ratio, byrow = TRUE), matrix(1,b,b))
  g = lapply(split(h,blocks), matrix, nr = b)
  counter = 0
  for(j in 1:length(g)) {
    counter = counter + any(g[[j]] &lt; 0.999)
  }
  data = rbind(data, c(log(counter), log(1/b)))
}
names(data) = c("Y", "X")
model = lm(Y~., data=data)
cat(coef(model), "\n")</code></pre> <h2 style="text-align: center;">The plot</h2> <p><img src="//images.ctfassets.net/c5lel8y1n83c/2S77qITbWMC0guKMiAecai/140fed5c8f6f6f86c985af67f503954f/plot.jpg" alt="plot"></p> <p style="text-align: left;">With this rough approximation, the
calculation yielded that <strong>the fractal dimension of the Greek coastline is 1.25</strong>. Great Britain’s was measured to be 1.25 and Norway’s 1.52 [<a href="http://en.wikipedia.org/wiki/List_of_fractals_by_Hausdorff_dimension">source</a>].</p> <![CDATA[2013 and beyond, todo list]]>http://kyrcha.info/2012/12/23/2013-and-beyond-todo-listhttp://kyrcha.info2012/12/23/2013-and-beyond-todo-listSat, 22 Dec 2012 22:00:00 GMT<p>This time my new year&#8217;s resolutions are here to stay. For life. I hope sometime soon to form them into my <em>personal constitution</em>, relating to the pro-activeness habit, one of the <a href="http://amzn.to/Ve4otk">seven habits of highly effective people</a>. In addition, I plan to have <a href="http://chrisguillebeau.com/3x5/how-to-conduct-your-own-annual-review/">an annual review</a> for keeping up with more specific roles and goals for 2013. To cut things short, my todo list is:</p> <ol> <li>To live a life true to myself</li> <li>To not work so hard</li> <li>To have the courage to express my feelings</li> <li>To stay in touch with my friends</li> <li>To be happier</li> <li>To aim high</li> <li>To be modest (&#8220;You don&#8217;t know what you don&#8217;t know&#8221;)</li> <li>To have passion</li> <li>To build my character, taking into account values and virtues I admire in others</li> <li>To believe in myself</li> <li>To work with people I like and have fun with</li> <li>To be surrounded by people with positive energy</li> <li>To be patient and not give up</li> <li>To admit my mistakes</li> <li>To be lucky</li> </ol> <p>OK, I know the last one is not up to me, but I interpret it as &#8220;Don&#8217;t run for trains&#8221;.
<em>Note: you must have read the <a href="http://amzn.to/YBKnjo">Black Swan book</a> in order to understand this one.</em></p> <p>The first five are taken from the blog post &#8220;<a href="http://www.inspirationandchai.com/Regrets-of-the-Dying.html">Regrets of the Dying</a>&#8221;, while the next ten are from <a href="http://youtu.be/lxdA0ey3Rss?t=50m53s">a talk by Nikos Stathopoulos</a> (in Greek) about the habits of highly effective people.</p><![CDATA[Budapest trip]]>http://kyrcha.info/2012/08/18/budapest-triphttp://kyrcha.info2012/08/18/budapest-tripSat, 18 Aug 2012 04:35:00 GMT<p>The last time I visited Budapest was the summer of 2002, during my IAESTE internship at Elcoteq in Pécs. This is a log of our trip during the summer of 2012.</p> <h3>Day 1</h3> <p>First time flying Ryanair &#8211; The flight was delayed a bit so we didn&#8217;t get to hear their jingle &#8211; Taxi booked online (20€) prior to departure &#8211; When we arrived there was an incident with an unattended bag in the parking lot but everything turned out to be OK &#8211; After checking in at the hotel, we went out for a walk to locate 0-24 hour shops nearby and walked in Vaci utca and by the Danube.</p> <p><a data-flickr-embed="true" href="https://www.flickr.com/photos/kyrcha/7789047922/in/album-72157631082268964/" title="Royal palace"><img src="https://farm9.staticflickr.com/8283/7789047922_180054f3a2.jpg" width="500" height="281" alt="Royal palace"></a><script async src="//embedr.flickr.com/assets/client-code.js" charset="utf-8"></script></p> <h3>Day 2</h3> <p>Breakfast at Cafe Gerbaud &#8211; Got the 72-hour transportation tickets (3850 Forints Per Person &#8211; FPP) for the metro, tram and buses &#8211; Started the walk: Vienna gate =&gt; Fisherman&#8217;s bastion =&gt; Royal palace =&gt; Tram to Gellert hill =&gt; <a href="http://en.wikipedia.org/wiki/Citadella">Citadella</a> =&gt; Liberty bridge =&gt; Small stop at the market =&gt; Raday utca for lunch and then Cafe Central for dessert
&#8211; In the evening we went for the standard walk on Vaci, the Danube and the Chain bridge &#8211; Later we visited Fashion street, where they sell stove cakes (<a href="http://en.wikipedia.org/wiki/K%C3%BCrt%C5%91skal%C3%A1cs">kurtoskalacs</a>) and <a href="http://en.wikipedia.org/wiki/Langos">langos</a> bread. As for the stove cakes, I liked the one with cinnamon more than the one with cocoa.</p> <p><a data-flickr-embed="true" href="https://www.flickr.com/photos/kyrcha/7789031736/in/album-72157631082268964/" title="Fisherman&#x27;s bastion"><img src="https://farm8.staticflickr.com/7253/7789031736_988f56d823.jpg" width="500" height="334" alt="Fisherman&#x27;s bastion"></a><script async src="//embedr.flickr.com/assets/client-code.js" charset="utf-8"></script></p> <h3>Day 3</h3> <p>These kurtoskalacs make for a great breakfast &#8211; Today&#8217;s tour started from Szent István Bazilika &#8211; Took the lift up to the dome for a panoramic view of the city &#8211; Then to Szabadság tér, a plaza with a pressure-aware fountain, and the Parliament &#8211; This was our first attempt to enter (tours sold out early since it was Sunday) &#8211; Margaret Island &#8211; Walked to the end and took the bus back &#8211; Great Synagogue (largest in Europe and second largest in the world, but largest in capacity) &#8211; English tour included in the ticket &#8211; The &#8220;For Sale&#8221; pub for goulash soup &#8211; Again the standard walk on Vaci, the Danube and the Chain bridge.</p> <p><a data-flickr-embed="true" href="https://www.flickr.com/photos/kyrcha/7789050830/in/album-72157631082268964/" title="Tram lines"><img src="https://farm8.staticflickr.com/7253/7789050830_9ef62d9b81.jpg" width="500" height="281" alt="Tram lines"></a><script async src="//embedr.flickr.com/assets/client-code.js" charset="utf-8"></script></p> <h3>Day 4</h3> <p>Woke up early &#8211; waited in line to enter the Parliament (see my tip on 4sq and why you must book online first) &#8211; then off to the market &#8211; tried fried
langos &#8211; the upper level of the market was pretty packed and seemed to me more of a touristy place than a traditional Hungarian spot &#8211; visited the renowned New York cafe.</p> <p><a data-flickr-embed="true" href="https://www.flickr.com/photos/kyrcha/7789043548/in/album-72157631082268964/" title="New York cafe"><img src="https://farm9.staticflickr.com/8289/7789043548_1fd08b1c52.jpg" width="500" height="334" alt="New York cafe"></a><script async src="//embedr.flickr.com/assets/client-code.js" charset="utf-8"></script></p> <h3>Day 5</h3> <p>Szechenyi baths &#8211; Heroes square &#8211; Andrassy utca &#8211; Terror museum</p> <p><a data-flickr-embed="true" href="https://www.flickr.com/photos/kyrcha/7789044828/in/album-72157631082268964/" title="Szechenyi baths"><img src="https://farm9.staticflickr.com/8290/7789044828_c47e6bbafc.jpg" width="500" height="375" alt="Szechenyi baths"></a><script async src="//embedr.flickr.com/assets/client-code.js" charset="utf-8"></script></p> <h3>Day 6</h3> <p>Checked out and headed to the airport &#8211; This time the Ryanair jingle played upon landing</p> <p><a data-flickr-embed="true" href="https://www.flickr.com/photos/kyrcha/7789049832/in/album-72157631082268964/" title="The Danube"><img src="https://farm9.staticflickr.com/8425/7789049832_0a1d2e840f.jpg" width="500" height="281" alt="The Danube"></a><script async src="//embedr.flickr.com/assets/client-code.js" charset="utf-8"></script></p> <h3>Afterthoughts and observations</h3> <p>Even though there are no very famous museums to visit or monuments that stand out, I think Budapest&#8217;s beauty is in its location: the river, the banks and the bridges. We went to the terror museum since we found it to be different from others we had visited in the past. The museum actually turned out to be quite atmospheric.
Also a must-do in Budapest is to visit one of the baths.</p> <p>Prices are inversely proportional to the distance from Vaci utca, from supermarket prices to restaurants and currency-exchange shops. For example, prices in Raday utca, a nice street full of places to eat and drink, are much lower than in Vaci and of the same, if not better, quality.</p> <p>Tip for the supermarkets: blue cap for sparkling and pink cap for non-sparkling water.</p> <p>A tip of 12-15% is included in the bill in around half the places we ate. In the others you can calculate it yourself.</p> <p>There were no big metro signs at the metro entrances, so you had to look for them. Also, there are ticket inspectors at every metro station we visited. I guess they had a big problem with missing revenue from free-riders and resorted to this measure.</p> <p>Budapest had plenty of tourists from all over the world, but not as many as I saw, for example, in Barcelona last year.</p> <h3>More resources</h3> <p>I compiled a list of places I visited and some interesting tips in the <a href="https://foursquare.com/kyrcha/list/budapest-trip">foursquare Budapest trip list</a>.</p> <p>Also a small collection of photos can be found in the <a href="http://www.flickr.com/photos/kyrcha/sets/72157631082268964/">Budapest 2012 flickr set</a>.</p> <p style="text-align: left;"><iframe src="https://maps.google.com/maps/ms?msa=0&amp;msid=205719516427590235446.0004c3f50c15a4d814f05&amp;ie=UTF8&amp;t=m&amp;ll=47.50491,19.058533&amp;spn=0.081173,0.145912&amp;z=12&amp;output=embed" frameborder="0" marginwidth="0" marginheight="0" scrolling="no" width="425" height="350"></iframe><br /> <small>View <a href="https://maps.google.com/maps/ms?msa=0&amp;msid=205719516427590235446.0004c3f50c15a4d814f05&amp;ie=UTF8&amp;t=m&amp;ll=47.50491,19.058533&amp;spn=0.081173,0.145912&amp;z=12&amp;source=embed" style="color: #0000ff; text-align: left;">Budapest</a> in a larger map</small></p><![CDATA[Fitting a sigmoid curve in
R]]>http://kyrcha.info/2012/07/08/tutorials-fitting-a-sigmoid-function-in-rhttp://kyrcha.info2012/07/08/tutorials-fitting-a-sigmoid-function-in-rSun, 08 Jul 2012 08:54:00 GMT<p>This is a short tutorial on how to fit data points that look like a sigmoid curve using the <em>nls</em> function in R. Let’s assume you have a vector of points that you think fit a sigmoid curve, like the ones in the figure below.</p> <p><img src="//images.ctfassets.net/c5lel8y1n83c/2nroJdV29uWGWU6IqGae0o/a7e619cd5d04efd95ba9852fa5b6a075/points.jpg" alt="points"></p> <p>The <a href="http://en.wikipedia.org/wiki/Generalised_logistic_function">general form of the logistic or sigmoid function</a> is defined as:</p> <p style="text-align: center;"><img class="latex" title="y(x) = A + \frac{K-A}{(1+Qe^{-B(t-M)})^{1/\nu}}" src="//s0.wp.com/latex.php?latex=y%28x%29+%3D+A+%2B+%5Cfrac%7BK-A%7D%7B%281%2BQe%5E%7B-B%28t-M%29%7D%29%5E%7B1%2F%5Cnu%7D%7D&amp;bg=ffffff&amp;fg=000&amp;s=0" alt="y(x) = A + \frac{K-A}{(1+Qe^{-B(t-M)})^{1/\nu}}" /></p> Let’s assume a simpler form in which only three of the parameters, K, B and M, are used. Those are the upper asymptote, the growth rate and the time of maximum growth, respectively.
<p style="text-align: center;"><img class="latex" title="y(x) = \frac{K}{1+e^{-B(t-M)}}" src="//s0.wp.com/latex.php?latex=y%28x%29+%3D+%5Cfrac%7BK%7D%7B1%2Be%5E%7B-B%28t-M%29%7D%7D&amp;bg=ffffff&amp;fg=000&amp;s=0" alt="y(x) = \frac{K}{1+e^{-B(t-M)}}" /></p> The following R code estimates the parameters, where <em>y</em> is a vector of data points: <pre><code class="language-R"># function needed for visualization purposes
sigmoid = function(params, x) {
  params[1] / (1 + exp(-params[2] * (x - params[3])))
}

x = 1:53
y = c(0,0,0,0,0,0,0,0,0,0,0,0,0,0.1,0.18,0.18,0.18,0.33,0.33,0.33,0.33,0.41,
      0.41,0.41,0.41,0.41,0.41,0.5,0.5,0.5,0.5,0.68,0.58,0.58,0.68,0.83,0.83,0.83,
      0.74,0.74,0.74,0.83,0.83,0.9,0.9,0.9,1,1,1,1,1,1,1)

# fitting code
fitmodel &lt;- nls(y~a/(1 + exp(-b * (x-c))), start=list(a=1,b=.5,c=25))

# visualization code
# get the coefficients using the coef function
params=coef(fitmodel)
y2 &lt;- sigmoid(params,x)
plot(y2,type=&quot;l&quot;)
points(y)</code></pre> <p>Now the data points along with the sigmoid curve look like this, with a = 1.0395204, b = 0.1253769, and c = 29.1724838.</p> <p><img src="//images.ctfassets.net/c5lel8y1n83c/1ngtCEuDWUuqgoas4AaYUu/7463c456ba7838174ef36b94f37a110d/prediction.jpg" alt="prediction"></p> <![CDATA[Translation of Echo State Networks in Greek]]>http://kyrcha.info/2012/06/30/translation-of-echo-state-networks-in-greekhttp://kyrcha.info2012/06/30/translation-of-echo-state-networks-in-greekSat, 30 Jun 2012 02:55:00 GMT<blockquote class="twitter-tweet"><p>Eureka! For my dissertation I translated Echo State Networks into Δίκτυα Ηχωικών (ή Ηχοϊκών) Καταστάσεων (ΔΗΚ). Liking it.</p> <p>&mdash; Kyr. Chatzidimitriou (@kyrcha) <a href="https://twitter.com/kyrcha/status/219053009395650560" >June 30, 2012</a></p></blockquote> <p><script async src="//platform.twitter.com/widgets.js" charset="utf-8"></script></p> <p>I think I am going with Ηχωικών due to Echo = Ηχώ.
Thus Echo State Networks (ESN) = Δίκτυα Ηχωικών Καταστάσεων (ΔΗΚ) in Greek. The idea basically came from thinking about the <a href="http://en.wikipedia.org/wiki/Anechoic_chamber" >anechoic chamber</a> = ανηχωικός θάλαμος.</p><![CDATA[MsAriadne at Pac-Man vs Ghost Competition - CEC 2011]]>http://kyrcha.info/2011/06/07/msariadne-at-pac-man-vs-ghost-competition-cec-2011http://kyrcha.info2011/06/07/msariadne-at-pac-man-vs-ghost-competition-cec-2011Thu, 07 Jul 2011 07:04:00 GMT<p>We took third place with the MsAriadne bot in the <a href="http://cseepr2.essex.ac.uk/~competition/">Ms Pac-Man vs Ghosts competition</a>, organized by the University of Essex and held during the 2011 <a href="http://www.cec2011.org/">Congress on Evolutionary Computation</a> (CEC 2011). The bot is part of George Matzoulas&#8217; diploma thesis project.</p> <h3>Videos of the MsAriadne bot versus the Legacy ghost team</h3> <p><iframe width="560" height="315" src="https://www.youtube.com/embed/bDuptphXnbA?rel=0" frameborder="0" allow="autoplay; encrypted-media" allowfullscreen></iframe></p> <p><iframe width="560" height="315" src="https://www.youtube.com/embed/KKOfrhSn1nk?rel=0" frameborder="0" allow="autoplay; encrypted-media" allowfullscreen></iframe></p><![CDATA[Finding related work and keeping up to date]]>http://kyrcha.info/2011/03/11/finding-related-work-and-keeping-up-to-datehttp://kyrcha.info2011/03/11/finding-related-work-and-keeping-up-to-dateThu, 10 Mar 2011 23:26:00 GMT<p><a href="https://commons.wikimedia.org/wiki/File:G%C3%B6ttingen-SUB-old.books.JPG#/media/File:G%C3%B6ttingen-SUB-old.books.JPG"><img src="https://upload.wikimedia.org/wikipedia/commons/1/1b/G%C3%B6ttingen-SUB-old.books.JPG" alt="Göttingen-SUB-old.books.JPG" height="480" width="640"></a><br>"<a href="https://commons.wikimedia.org/wiki/File:G%C3%B6ttingen-SUB-old.books.JPG#/media/File:G%C3%B6ttingen-SUB-old.books.JPG">Göttingen-SUB-old.books</a>".
Licensed under Public Domain via <a href="//commons.wikimedia.org/wiki/">Wikimedia Commons</a>.</p> <p>One important task as a researcher is to keep up with all the recent related work in your domain. The items below are what I personally do to follow the state of the art. They involve both push notifications in the form of email alerts, and to-dos that I put on my calendar every two or three months.</p> <p>1. I have subscribed to the RSS feeds of all the well-known publishers (Elsevier, IEEE, Springer etc.) for the journals I am interested in. Then, using an RSS reader (I personally use Google Reader, so that my feeds are available and synced on all my machines and my mobile phone), I get all the articles in press. Every couple of months I spend a couple of hours checking the titles and abstracts. If an article seems interesting or is related work, I download it and read it further.</p> <p>2. In my browser I have a bookmark folder for the websites of the research groups and researchers working in my area. Again, every couple of months I do an &#8220;Open All in Tabs&#8221; action and browse through the tabs for newly added material. I have put a recurring reminder in my calendar app to check them every 3 months.</p> <p>3. Through Google Alerts I have created a few queries that return newly added search engine results via email. Once per week new alerts arrive in my mailbox and I do a quick scan. For example, a query could be &#8220;Echo State Network&#8221; in quotes, in order to match the whole phrase. Alerts can also be created with Google Scholar, where one can save searches as email alerts. I have a few of those too.</p> <p>4. As in item 2 with research groups and researchers, one can keep another bookmarks folder for conferences and workshops taking place every year. Some of them publish their proceedings online, so by visiting their websites two or three times a year you can step into the published work.</p> <p>5. Last but not least, I have subscribed to a number of mailing lists in the areas of my interest. For example, some of the lists I am subscribed to are:</p> <ul> <li>rl-list (Reinforcement Learning)</li> <li>ML-News (Machine Learning News)</li> <li>reservoir-computing (Reservoir Computing)</li> <li>cig (Computational Intelligence in Games)</li> </ul> <p>among others. Besides CFPs and job openings, authors sometimes use them to advertise and provide links to their most recent publications.</p> <p>6. Currently I have started experimenting with social networking sites related to research, like Mendeley and ResearchGATE. I&#8217;ll see how that goes.</p> <p>What do you do?</p><![CDATA[Diploma Theses @ ISSEL on Computational Intelligence in Games]]>http://kyrcha.info/2010/10/27/diploma-theses-issel-on-computational-intelligence-in-gameshttp://kyrcha.info2010/10/27/diploma-theses-issel-on-computational-intelligence-in-gamesWed, 27 Oct 2010 00:24:00 GMT<p>This is a video I made by gathering clips of AI agents/bots/controllers, or whatever you want to call them, developed by researchers, students and aficionados, mainly for competitions at the IEEE CIG conferences, with a couple of them being diploma thesis projects of students at the <a href="http://issel.ee.auth.gr">Intelligent Systems and Software Engineering Labgroup</a> (ISSEL).
The goal is to demonstrate existing test-beds to whoever is looking to develop autonomous agents as a diploma thesis project with ISSEL in the field of CIG.</p> <p><a href="https://www.youtube.com/watch?v=erKHbw0NdTo">https://www.youtube.com/watch?v=erKHbw0NdTo</a></p> <h3 id="the-testbeds-and-related-links">The testbeds and related links</h3> <h4 id="torcs">TORCS</h4> <p>Car racing, car setup <a href="http://cig.ws.dei.polimi.it/">http://cig.ws.dei.polimi.it/</a></p> <h4 id="ortsrl">ORTS+RL</h4> <p><a href="http://2008.rl-competition.org/content/view/20/36/">http://2008.rl-competition.org/content/view/20/36/</a></p> <h4 id="starcraft">Starcraft</h4> <p>Micromanagement, Small scale battle, Tech limited and Full game <a href="http://eis.ucsc.edu/StarCraftAICompetition">http://eis.ucsc.edu/StarCraftAICompetition</a> <a href="http://ls11-www.cs.tu-dortmund.de/rts-competition/starcraft-cig2010">http://ls11-www.cs.tu-dortmund.de/rts-competition/starcraft-cig2010</a></p> <h4 id="poker-texas-holdem">Poker Texas Hold’em</h4> <p>Limit heads up, No limit heads up, Ring <a href="webdocs.cs.ualberta.ca/~games/poker">webdocs.cs.ualberta.ca/~games/poker</a> <a href="http://www.computerpokercompetition.org/">http://www.computerpokercompetition.org/</a> <a href="http://www.poker-academy.com/">http://www.poker-academy.com/</a></p> <h4 id="pac-man">Pac-Man</h4> <p><a href="http://cswww.essex.ac.uk/staff/sml/pacman/PacManContest.html">http://cswww.essex.ac.uk/staff/sml/pacman/PacManContest.html</a></p> <h4 id="mario">Mario</h4> <p>Gameplay, learning, level generation <a href="http://www.marioai.org/">http://www.marioai.org/</a></p> <h4 id="defcon">DEFCON</h4> <p><a href="http://www.introversion.co.uk/defcon/">http://www.introversion.co.uk/defcon/</a> <a href="http://www.doc.ic.ac.uk/~rb1006/projects:api">http://www.doc.ic.ac.uk/~rb1006/projects:api</a></p> <h4 id="keepaway">Keepaway</h4> <p><a
href="http://www.cs.utexas.edu/~AustinVilla/sim/keepaway/">http://www.cs.utexas.edu/~AustinVilla/sim/keepaway/</a> <a href="http://gridsoccer.codeplex.com/">http://gridsoccer.codeplex.com/</a> (grid soccer environment)</p> <h4 id="unreal-tournament">Unreal Tournament</h4> <p><a href="http://www.botprize.org/">http://www.botprize.org/</a></p> <h3 id="video-making-technical-information">Video making, technical information</h3> <p>In case you are interested, clips were downloaded from the YouTube channels mentioned in the video or from websites attributed in the video, or captured with screen-capture programs, in the following formats: flv, wmv, avi and swf. The swf video was converted to flv using a swf2flv converter, and all of them to raw DV with the help of the Kdenlive and Kino open source programs under Ubuntu Linux. iMovie was used for editing, rendering and uploading.</p> <![CDATA[Rome]]>http://kyrcha.info/2010/10/02/romehttp://kyrcha.info2010/10/02/romeSat, 02 Oct 2010 09:38:00 GMT<p>Along with Paris, <strong>Rome</strong> is one of my favorite cities so far. This was my second visit to Rome. The previous one was a short one, during an InterRail excursion a decade ago. My wife and I decided to go there for our honeymoon and this is just a journal of our time in the <em>Eternal City</em>. I&#8217;ve also tried to visit all the &#8220;Angels &amp; Demons&#8221; sights and see them first-hand. So there will be an A&amp;D post for sure in the near future.
Our time of visit was end of July &#8211; beginning of August 2010.</p> <h3>Day 1</h3> <p>Early in the morning we caught our flight from SKG to FCO with Alitalia (they offered a drink and a snack) &#8211; Got the Leonardo Express (14 EPP) and in 30 minutes we reached Roma Termini &#8211; Metro line B was out, causing a little chaos in the bus terminals around &#8211; Checked in &#8211; Visited the Castel Sant&#8217;Angelo and in particular: il Passetto (luckily, since it was only open 10:30 to 11:30 am), walked through the castle and its museum halls (unfortunately no English translations available at the exhibits), terrace (nice view) &#8211; Short walk to St. Peter&#8217;s plaza &#8211; Then headed east to Piazza del Popolo, Porta Del Popolo (Bernini) and Santa Maria del Popolo [1]: highlights there are the Cappella Chigi (Raphael&#8217;s &amp; Bernini&#8217;s work) with the kneeling skeleton and the two Caravaggios &#8211; Evening walk @ Via del Corso, Via Condotti, Piazza di Spagna and Scalinata, Fontana di Trevi and Piazza Barberini with Bernini&#8217;s Triton fountain.</p> <p style="text-align: center;"><a href="https://www.flickr.com/photos/kyrcha/5044397307/" ><img src="https://farm5.static.flickr.com/4104/5044397307_cd807889d7.jpg" alt="DSC_0098" width="500" height="334" /></a></p> <h3>Day 2</h3> <p>Colosseum (the 1.5 EPP online booking fee saved us a lot of queue waiting time, so I would recommend it) &#8211; Palatino: Museums, Stadio, Casa di Livia, Casa di Augusto, Roman Huts (not as exciting as the Colosseum) &#8211; Roman Forum walk through the Via Sacra and Via dei Fori Imperiali (one must have A LOT of imagination) &#8211; il Vittoriano: took the elevator to the top (kind of expensive at 7 EPP, but the view is really nice; personally I prefer it even to the one from St. Peter&#8217;s Dome, since it is more in the middle of Rome rather than off to the side) &#8211; Piazza Venezia and Chiesa di St.
Marco [2] (worth the &#8220;offerte&#8221; for lighting the golden mosaic) &#8211; Capitolium: Piazza and Musei Capitolini &#8211; Walk towards Theatre of Marcellus, Santa Maria in Cosmedin with its Bocca della Verità (took a picture from the side, since it was not open at that time), the Broken Bridge and Isola Tiberina &#8211; At that time the clouds started gathering so we turned back to Circo Massimo, where we took the metro back to the hotel after grabbing some take-away food for dinner &#8211; The most tiresome day, one that put our legs to the test, but at least it was worth it.</p> <p style="text-align: center;"><a href="https://www.flickr.com/photos/kyrcha/4923442129/" ><img src="https://farm5.static.flickr.com/4080/4923442129_6ea0e0cd65.jpg" alt="DSC_0232" width="500" height="334" /></a></p> <h3>Day 3</h3> <p>Musei Vaticani (as at the Colosseum, the extra 4 EPP for reserving the tickets online is worth it): we just looked for the main attractions in the Pinacoteca, Museo Pio Clementino, Museo Gregoriano Egizio, Galleria degli Arazzi, Galleria delle Carte Geografiche, Stanze di Raffaello and Cappella Sistina &#8211; From Piazza San Pietro we entered St. Peter&#8217;s Basilica [3], then the Dome (&#8220;Cupola&#8221; in Italian; the elevator costs 7 EPP) and finally the Vatican Grottoes (for free) &#8211; The nice thing about going alone instead of being in a group is that you can take your time.
We actually spent two hours in the Basilica alone &#8211; Back to the hotel&#8230; &#8211; In the evening a small trip to the Scalinata and Fontana di Trevi.</p> <p style="text-align: center;"><a href="https://www.flickr.com/photos/kyrcha/4924042438/" ><img src="https://farm5.static.flickr.com/4099/4924042438_dea0082f3a.jpg" alt="Burn to the end of time" width="334" height="500" /></a></p> <h3>Day 4</h3> <p>Piazza Barberini, Santa Maria della Vittoria [4] with its Saint Teresa in Ecstasy sculpture by Bernini, Via Veneto, Santa Maria della Concezione [5] and the Cripta dei Cappuccini (kind of creepy; their motto: &#8220;What you are we used to be. What we are you will be&#8221;) &#8211; In Via Veneto the coffee was expensive (5 EPP) and we experienced some bad attitude from the waiters, unfitting for &#8220;want to stay famous&#8221; cafes &#8211; Walked in the park of Villa Borghese &#8211; Caught the nice view towards Piazza del Popolo &#8211; Sat in Caffe Rosati for iced coffee, as proposed in AD (I bet Dan Brown has not visited Greece), for 7 EPP &#8211; Headed towards Piazza della Rotonda, the Pantheon, Piazza della Minerva with the Elefantino obelisk, Santa Maria sopra Minerva [6] &#8211; Looked for the Caravaggios @ Sant&#8217;Agostino [7] and San Luigi dei Francesi [8] that were advertised in public spots all over Rome &#8211; Later, Piazza Navona and Agnes in Agony [9] &#8211; Got some rest and went to eat &#8211; Grabbed an ice cream from the Old Bridge &#8211; Admired St.
Peter&#8217;s piazza and Basilica at night until 11:00 pm, when they close it &#8211; Returned to the hotel after a quick walk at Bernini&#8217;s bridge in front of Castel Sant&#8217;Angelo.</p> <p style="text-align: center;"><a href="https://www.flickr.com/photos/kyrcha/4923448121/" ><img src="https://farm5.static.flickr.com/4143/4923448121_147e5c4449.jpg" alt="The Oculus" width="500" height="334" /></a></p> <h3>Day 5</h3> <p>Walking and shopping in Rome&#8217;s center &#8211; In the afternoon, our attempt to locate Trastevere without a guide ended in disaster, since we wound up at Trastevere train station, which has nothing to do with the &#8220;cool&#8221; place in Rome &#8211; In between we saw the Pyramid and another view of Rome, not so &#8220;historic&#8221; but rather &#8220;urban&#8221; &#8211; Grabbed something to eat and returned to the hotel &#8211; After getting a good rest from the afternoon&#8217;s mistake, which was painful to the legs, we located Trastevere properly this time and went there on foot along the Tiber (pass four bridges heading south after Ponte Sant&#8217;Angelo and you will find it before Isola Tiberina, on the west side of the &#8220;Tiberis&#8221;) &#8211; @Trastevere: Santa Maria in Trastevere [10], Piazza Trastevere, sat down to eat, searched for two of Lonely Planet&#8217;s proposed gelaterias but both were closed, since it was kind of late &#8211; Went back on foot again, which we kind of regretted, since at one point we felt somewhat threatened. At least we got some nice night pictures with long exposure times.</p> <p style="text-align: center;"><a href="https://www.flickr.com/photos/kyrcha/4924046212/" ><img src="https://farm5.static.flickr.com/4100/4924046212_f1088de089.jpg" alt="Tiber@night" width="500" height="334" /></a></p> <h3>Day 6</h3> <p>Checked out and left our baggage with the hotel &#8211; Got our small presents for family and friends, mainly @ Via dei Rienzo &#8211; Made a final walk through Castel Sant&#8217;Angelo, Piazza Navona, the Pantheon, St.
Ignazio di Loyola [11], Fontana di Trevi, St. Peter&#8217;s Basilica (Rome in general, and St. Peter&#8217;s piazza in particular, were by that time packed with ROMA 2010 CIM attendees) &#8211; Finally: hotel, metro, train to FCO, FCO to SKG, Thessaloniki, ate gyros and said home sweet home&#8230;</p> <p style="text-align: center;"><a href="https://www.flickr.com/photos/kyrcha/4924045000/" ><img src="https://farm5.static.flickr.com/4094/4924045000_e8cc0762c7.jpg" alt="Nereids" width="500" height="334" /></a></p> <h3>Photos</h3> <p>A small collection of my photos from <a href="http://www.flickr.com/photos/kyrcha/sets/72157624673984075/" target="_blank">Rome and the Vatican City</a>.</p> <h3>Budget, Eating and Drinking</h3> <p>An expensive city (food, drink, museums). On our budget, we had to put some effort into thinking and searching where to eat or drink while maintaining a good quality-to-price ratio. But I guess this is difficult everywhere nowadays. The coffee is not as cheap as people in Greece often believe, especially in the cafes of the historic center. It is kind of difficult, though, for the &#8220;average&#8221; tourist to see the attractions and at the same time keep in mind where to eat &#8220;smart&#8221;: cheap enough and good enough. Personally, I prefer the pizzas and the freddo cappuccino as made in Greece. Filled pasta was good (same for the lasagna) but the other kinds of pasta were too &#8220;al dente&#8221; for our tastes. Gelato was just great!!! Our personal favorite was &#8220;Old Bridge&#8221;, where we went three times. The pistachio there is just great.
But beware: there are places where one can pay 5 EPP (for example, we spotted one such place in the Fontana di Trevi area) for an ice cream smaller than the one you get elsewhere for 1.5 EPP.</p> <h3>In numbers</h3> <p>Photos taken: 1368<br /> Churches (&#8220;Chiesas&#8221;) entered: 11 (including St Peter&#8217;s Basilica)</p> <h3>Acronyms</h3> <p>EPP: Euro(s) Per Person<br /> AD: Angels &amp; Demons</p><![CDATA[IEEE ICDM 2010 Contest]]>http://kyrcha.info/2010/09/08/ieee-icdm-2010-contesthttp://kyrcha.info2010/09/08/ieee-icdm-2010-contestWed, 08 Sep 2010 09:25:00 GMT<p>Just for fun, I participated in the <a href="http://tunedit.org/challenge/IEEE-ICDM-2010">IEEE ICDM 2010 Contest</a> - <a href="http://tunedit.org/challenge/IEEE-ICDM-2010/traffic">Traffic track</a>, with a couple of R scripts, at first using linear regression and later neural networks. Mainly due to summer vacations limiting the available time, the approach was nothing too fancy, and I ended up in 17th place out of 101 active participants.</p> <p>The task was to predict the traffic in 10 road segments, 2 ways each, for 1000 60-minute-long windows, between the 41st and the 50th minute, knowing only the first 30 minutes. Historical data were provided in the form of 100 10-hour windows (60000 rows) with 20 values per row, corresponding to the traffic observed in one minute in one of the 10 road segments x 2 ways.</p> <p>My best result in the competition was obtained using the following procedure:</p> <p>a. <strong>Preprocessing</strong>: Transform the training and test datasets so that rows correspond to 10-minute intervals rather than 1-minute intervals. Normalize all values to [0,1]. b. <strong>Modelling</strong>: Turn the task into a supervised learning problem. I used 60 attributes, 20 for times t+1 to t+10, 20 for t+11 to t+20 and 20 for t+21 to t+30, to predict one of the 20 traffic values at times t+41 to t+50. Thus 20 such datasets were created, one for each road segment and way. c.
<strong>Training</strong>: 20 Feed-Forward Neural Nets (FFNNs) were trained, one for each of the above 20 datasets, and 20 more were trained in the same way, using a reduced dataset with 15 attributes instead of 60. This was achieved by using the ReliefF feature selection algorithm in WEKA and keeping the top 15 attributes. Each of the 40 FFNNs had its weights randomly initialized. The former 20 FFNNs had 15 hidden units, while the latter had 30. Weight decay was also used. d. <strong>Predicting</strong>: Predictions were made for each of the 20 target values using all 40 FFNNs. The final prediction was the mean of the 40 predictions.</p> <![CDATA[Academic Fun]]>http://kyrcha.info/2010/09/01/academic-funhttp://kyrcha.info2010/09/01/academic-funTue, 31 Aug 2010 23:51:00 GMT<p>How to make a publication:</p> <ul> <li>If you have a crisp algorithm, make it fuzzy.</li> <li>If you have a problem, solve it using a GA.</li> <li>If you have an algorithm, program it in CUDA.</li> </ul> <![CDATA[A NEAT Way for Evolving Echo State Networks]]>http://kyrcha.info/2010/04/29/a-neat-way-for-evolving-echo-state-networkshttp://kyrcha.info2010/04/29/a-neat-way-for-evolving-echo-state-networksThu, 29 Apr 2010 03:56:00 GMT<p>My ECAI 2010 submission entitled &quot;A NEAT Way for Evolving Echo State Networks&quot; was accepted for publication as a full paper. I&#39;ll keep updating the post with information about the paper.</p> <p><strong>Abstract</strong>: The Reinforcement Learning (RL) paradigm is an appropriate formulation for agent, goal-directed, sequential decision making. In order though for RL methods to perform well in difficult, complex, real-world tasks, the choice and the architecture of an appropriate function approximator is of crucial importance. This work presents a method for automatically discovering such function approximators, based on a synergy of ideas and techniques that are proven to be working on their own.
Using Echo State Networks (ESNs), as our function approximators of choice, we try to adapt them, by combining evolution and learning for developing the appropriate ad-hoc architectures to solve the problem at hand. The choice of ESNs was made for their ability to handle both non-linear and non-Markovian tasks, while also being capable of learning on-line, through simple gradient descent, temporal difference learning. For creating networks that enable efficient learning, a neuroevolution procedure was applied. Appropriate topologies and weights were acquired by applying the NeuroEvolution of Augmented Topologies (NEAT) method as a meta-search algorithm and by adapting ideas like historical markings, complexification and speciation, to the specifics of ESNs. Our methodology is tested on both supervised and reinforcement learning testbeds with promising results.</p> <h3 id="presentation">Presentation</h3> <p><a href="http://www.slideshare.net/kyrcha/a-neat-way-for-evolving-echo-state-networks">http://www.slideshare.net/kyrcha/a-neat-way-for-evolving-echo-state-networks</a></p>
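<p>For readers unfamiliar with Echo State Networks, here is a minimal illustrative ESN sketch in Python/NumPy. It is <em>not</em> the NEAT-evolved variant of the paper, just my assumption of a typical plain setup: a fixed random reservoir, rescaled so the echo state property holds, with only a linear readout trained by ridge regression on a toy next-step sine prediction task. All sizes and constants are made up for the example.</p>

```python
import numpy as np

rng = np.random.default_rng(42)

# Illustrative sizes, chosen for this sketch only.
n_in, n_res, washout = 1, 100, 50

# Fixed random input and reservoir weights; the reservoir matrix is rescaled
# to spectral radius 0.9 so the echo state (fading memory) property holds.
W_in = rng.uniform(-0.5, 0.5, (n_res, n_in))
W = rng.uniform(-0.5, 0.5, (n_res, n_res))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))

def run_reservoir(inputs):
    """Drive the reservoir with an input sequence and collect its states."""
    x = np.zeros(n_res)
    states = []
    for u in inputs:
        x = np.tanh(W_in @ u + W @ x)
        states.append(x)
    return np.array(states)

# Toy task: one-step-ahead prediction of a sine wave.
signal = np.sin(0.1 * np.arange(300)).reshape(-1, 1)
X = run_reservoir(signal[:-1])[washout:]  # states, initial transient discarded
Y = signal[1:][washout:]                  # next-step targets

# Only the linear readout is trained, here with ridge regression.
ridge = 1e-8
W_out = np.linalg.solve(X.T @ X + ridge * np.eye(n_res), X.T @ Y)
mse = float(np.mean((X @ W_out - Y) ** 2))
print(mse)
```

<p>Note that only the readout <code>W_out</code> is learned; everything that stays fixed here (reservoir topology and weights) is exactly the part that the neuroevolution procedure in the paper searches over instead.</p>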