<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.8.7">Jekyll</generator><link href="https://nextoptdev.github.io/feed.xml" rel="self" type="application/atom+xml" /><link href="https://nextoptdev.github.io/" rel="alternate" type="text/html" /><updated>2020-06-01T01:47:24+00:00</updated><id>https://nextoptdev.github.io/feed.xml</id><title type="html">NextOpt</title><subtitle>QTell은 넥스트옵트에서 제공하는 시계열 기반 예측 서비스입니다</subtitle><entry><title type="html">GSx Active Learning</title><link href="https://nextoptdev.github.io/regression/optimization/2020/04/20/GSx-active-learning/" rel="alternate" type="text/html" title="GSx Active Learning" /><published>2020-04-20T11:43:00+00:00</published><updated>2020-04-20T11:43:00+00:00</updated><id>https://nextoptdev.github.io/regression/optimization/2020/04/20/GSx%20active%20learning</id><content type="html" xml:base="https://nextoptdev.github.io/regression/optimization/2020/04/20/GSx-active-learning/">&lt;h1 id=&quot;gsx-active-learning&quot;&gt;GSx Active Learning&lt;/h1&gt;

&lt;p&gt;A problem I had with datasets containing a small number of samples (for the sake of simplicity, I set a criterion of &amp;lt;30) was that it’s hard to get reasonably accurate prediction results.&lt;/p&gt;

&lt;p&gt;Apparently, one of the methods used in such scenarios is &lt;em&gt;Active Learning&lt;/em&gt;. Active learning is a method that, given a set of data points, chooses only up to &lt;script type=&quot;math/tex&quot;&gt;k&lt;/script&gt; of them, either by manual selection or by a set criterion. It’s often used in scenarios where data is so abundant that not all of it can be labeled.&lt;/p&gt;

&lt;p&gt;However, as I’m primarily dealing with time series, I wasn’t sure if it was acceptable to selectively omit some data points.&lt;/p&gt;

&lt;p&gt;I found a paper named &lt;a href=&quot;https://arxiv.org/abs/1808.04245&quot;&gt;Active Learning for Regression Using Greedy Sampling&lt;/a&gt; by D. Wu et al., which uses simple greedy techniques to select data points for training.&lt;/p&gt;

&lt;p&gt;The first method the paper mentions, named &lt;em&gt;“Greedy Sampling on the Inputs (GSx)”&lt;/em&gt;, selects the data point closest to the centroid of all the samples. This is iterated &lt;script type=&quot;math/tex&quot;&gt;k&lt;/script&gt; times, resulting in the training set.&lt;/p&gt;

&lt;p&gt;I tested the method with “Total number of phone calls to pizza delivery services from the Seoul area in September 2019” data. The data consists of the daily number of phone calls from September 1st to the 30th, separated by age group (10s, 20s, 30s, …) and district. Since I was only interested in the total number of calls, I deemed age group and district irrelevant to the results. So my formatted data consists of 30 data points, one for each day.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;GSx&lt;/em&gt; was implemented using the absolute distance from the mean of the standardized values, which is simply the absolute distance from 0. The data was divided into 24 training points and 6 test points. Additionally, the data showed a strong weekly seasonality.&lt;/p&gt;
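&lt;p&gt;As a hedged sketch (toy values, not the actual pizza-call data), the selection step described above amounts to sorting by absolute distance from 0 after standardization and keeping the closest points:&lt;/p&gt;

```python
import numpy as np

def gsx_select(y, k):
    """Pick the k points closest to the centroid of the standardized values."""
    z = (y - y.mean()) / y.std()
    # distance from the mean of standardized values == distance from 0
    order = np.argsort(np.abs(z))
    return np.sort(order[:k])  # indices of the selected training points

y = np.array([3.0, 10.0, 4.0, 5.0, 9.0, 4.5, 6.0])
idx = gsx_select(y, 3)
```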

&lt;p&gt;&lt;img src=&quot;https://nextoptdev.github.io/images/blog/2020-04-20/data_plot.png&quot; alt=&quot;data plot, standardized&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;data plot, standardized&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;A standard linear Ridge model was used, with a piecewise trend and a weekly Fourier feature added. I started with an initial value of &lt;script type=&quot;math/tex&quot;&gt;k = 10&lt;/script&gt; and increased it up to the length of the training set, 24.&lt;/p&gt;
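&lt;p&gt;A minimal sketch of this kind of model on synthetic data (the hinge knot at day 15 and the Fourier order below are my own assumptions, not the exact features used):&lt;/p&gt;

```python
import numpy as np
from sklearn.linear_model import Ridge

def weekly_fourier(t, order=2):
    """Sine/cosine features with a 7-day period."""
    cols = []
    for j in range(1, order + 1):
        cols.append(np.sin(2 * np.pi * j * t / 7))
        cols.append(np.cos(2 * np.pi * j * t / 7))
    return np.column_stack(cols)

t = np.arange(30)  # one point per day of September
# piecewise (hinge) trend with an assumed knot at day 15, plus weekly Fourier terms
X = np.column_stack([t, np.maximum(t - 15, 0), weekly_fourier(t)])
y = 0.05 * t + np.sin(2 * np.pi * t / 7)  # toy series with trend and weekly pattern

model = Ridge(alpha=1.0).fit(X[:24], y[:24])
rmse = np.sqrt(np.mean((model.predict(X[24:]) - y[24:]) ** 2))
```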

&lt;p&gt;The results are shown in the image below (the number next to each k is the RMSE):&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;https://nextoptdev.github.io/images/blog/2020-04-20/result_subplot.png&quot; alt=&quot;GSx result plot&quot; /&gt;&lt;/p&gt;

&lt;p&gt;As you can see, the RMSE reaches its minimum at &lt;script type=&quot;math/tex&quot;&gt;k = 21&lt;/script&gt;; interestingly enough, adding more data points beyond that actually increased the error.&lt;/p&gt;

&lt;p&gt;Even though &lt;em&gt;GSx&lt;/em&gt; is a simple technique, it was able to increase the accuracy in a test environment. Additionally, other methods, such as &lt;em&gt;GSy&lt;/em&gt; in the aforementioned paper, or more advanced ones, could possibly bring even better results.&lt;/p&gt;</content><author><name>Shin Young Kim</name></author><summary type="html">GSx Active Learning</summary></entry><entry><title type="html">AF Ratio Optimization</title><link href="https://nextoptdev.github.io/bayesian/model-optimization/2020/04/07/AF-Ratio-Optimization/" rel="alternate" type="text/html" title="AF Ratio Optimization" /><published>2020-04-07T02:34:00+00:00</published><updated>2020-04-07T02:34:00+00:00</updated><id>https://nextoptdev.github.io/bayesian/model-optimization/2020/04/07/AF%20Ratio%20Optimization</id><content type="html" xml:base="https://nextoptdev.github.io/bayesian/model-optimization/2020/04/07/AF-Ratio-Optimization/">&lt;h1 id=&quot;af-ratio-optimization&quot;&gt;AF Ratio Optimization&lt;/h1&gt;

&lt;p&gt;&lt;img src=&quot;https://nextoptdev.github.io/images/blog/af-ratio-plot.png&quot; alt=&quot;AF ratio histogram&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Histogram of AF ratios&lt;/em&gt;&lt;/p&gt;

&lt;h2 id=&quot;optimizing-the-distribution-of-af-ratio-for-the-maximum-profit-by-using-historical-data-inform-about-the-best-decision-for-order-quantity&quot;&gt;Optimizing the distribution of the A/F ratio for maximum profit using historical data, to inform the best decision for order quantity&lt;/h2&gt;
&lt;p&gt;A solution for optimizing this A/F data, which is right-skewed and limited to positive values. Different models are yielded depending on the allowed range and the amount of data.&lt;/p&gt;

&lt;h3 id=&quot;challenge&quot;&gt;Challenge&lt;/h3&gt;
&lt;p&gt;How to fit the A/F ratio distribution given only partial demand and forecast information&lt;/p&gt;

&lt;h3 id=&quot;solution&quot;&gt;Solution&lt;/h3&gt;
&lt;p&gt;Through entropy maximization, find optimal inventory levels under an uncertain demand distribution.&lt;/p&gt;
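&lt;p&gt;As one hedged illustration (a sketch, not the production model): if only the mean of the positive-valued A/F ratio is known, the maximum-entropy distribution on the positive half-line is exponential with that mean, and a newsvendor-style critical fractile then yields an order quantity. The price, cost, and data below are all hypothetical.&lt;/p&gt;

```python
import numpy as np

rng = np.random.default_rng(0)
af = rng.lognormal(mean=-0.1, sigma=0.5, size=200)  # hypothetical right-skewed A/F history

# The maximum-entropy distribution on the positive half-line with a fixed mean
# is the exponential distribution with that mean.
mean_af = af.mean()

# Newsvendor critical fractile with hypothetical unit price p and unit cost c:
# order so that demand is covered with probability (p - c) / p.
p, c = 10.0, 6.0
q = (p - c) / p

# Exponential quantile: F^{-1}(q) = -mean * ln(1 - q)
forecast = 1000.0
order_qty = forecast * (-mean_af * np.log(1.0 - q))
```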

&lt;h3 id=&quot;why&quot;&gt;Why?&lt;/h3&gt;
&lt;p&gt;The balance between the risk and the benefit of ordering is important. Focusing on the proportion, rather than on exact prediction of the actual quantity, casts light on the optimal order quantity and has the advantage of being normalized.&lt;/p&gt;

&lt;h3 id=&quot;how&quot;&gt;How?&lt;/h3&gt;
&lt;p&gt;Estimate the optimal distribution by calculating and comparing the resulting profits among various competing distributions.&lt;/p&gt;</content><author><name>Hyun Ji Moon</name></author><summary type="html">AF Ratio Optimization</summary></entry><entry><title type="html">Feature Engineering: Using Weather Data to Predict Meal Consumption in a Military Mess</title><link href="https://nextoptdev.github.io/bayesian/feature-engineering/2020/04/06/Adding-Weather-data-for-Predicting-Meal-Consumption/" rel="alternate" type="text/html" title="Feature Engineering: Using Weather Data to Predict Meal Consumption in a Military Mess" /><published>2020-04-06T08:50:00+00:00</published><updated>2020-04-06T08:50:00+00:00</updated><id>https://nextoptdev.github.io/bayesian/feature-engineering/2020/04/06/Adding%20Weather%20data%20for%20Predicting%20Meal%20Consumption</id><content type="html" xml:base="https://nextoptdev.github.io/bayesian/feature-engineering/2020/04/06/Adding-Weather-data-for-Predicting-Meal-Consumption/">&lt;p&gt;I was tasked with adding weather data (precipitation, temperature) to enhance forecast results for meal consumption quantity in a military mess hall near Daejeon, Korea.&lt;/p&gt;

&lt;p&gt;The data I received was daily meal consumption quantity from January 1st, 2019 to August 30th of the same year. Additionally, some dates were missing from the data, believed to be days on which the mess hall didn’t open, such as holidays or leaves.&lt;/p&gt;

&lt;p&gt;The plot below shows the consumption quantity plotted against dates, with selected peaks and floors marked as red and blue dots:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;https://nextoptdev.github.io/images/blog/meal-date-plot.png&quot; alt=&quot;data plot&quot; /&gt;&lt;/p&gt;

&lt;p&gt;At first glance, some sort of seasonality can be seen, though it is not exact; we have to remember that some dates are missing, meaning the time axis is not uniform.&lt;/p&gt;

&lt;p&gt;I plotted autocorrelation at various lags to try to identify the seasonality intervals:&lt;/p&gt;
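&lt;p&gt;For reference, a sample ACF can be computed directly; the sketch below uses a synthetic series with a 7-day period, since the actual mess-hall data isn’t shown here:&lt;/p&gt;

```python
import numpy as np

def acf(x, nlags):
    """Sample autocorrelation for lags 1..nlags."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    denom = np.dot(x, x)
    return np.array([np.dot(x[:-lag], x[lag:]) / denom for lag in range(1, nlags + 1)])

rng = np.random.default_rng(0)
t = np.arange(200)
y = np.sin(2 * np.pi * t / 7) + 0.2 * rng.normal(size=200)  # weekly toy signal

r = acf(y, 30)
peak_lag = int(np.argmax(r)) + 1  # lag with the strongest positive autocorrelation
```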

&lt;p&gt;&lt;img src=&quot;https://nextoptdev.github.io/images/blog/meal-date-acf-150.png&quot; alt=&quot;data plot&quot; /&gt;&lt;/p&gt;

&lt;p&gt;As you can see, it’s hard to determine the seasonality intervals from an ACF plot alone.&lt;/p&gt;

&lt;p&gt;For the sake of analysis, I decided to use seasonality values of 5, 7, 15, and 31.&lt;/p&gt;

&lt;p&gt;My goal was to see whether adding standardized precipitation and average temperature data would make a meaningful impact on prediction accuracy.&lt;/p&gt;
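&lt;p&gt;The feature preparation itself is straightforward; here is a sketch with hypothetical weather series standing in for the real ones:&lt;/p&gt;

```python
import numpy as np

def standardize(x):
    """Zero-mean, unit-variance scaling."""
    return (x - x.mean()) / x.std()

rng = np.random.default_rng(0)
n_days = 242  # January 1st to August 30th, 2019
precip = rng.gamma(shape=0.5, scale=8.0, size=n_days)  # hypothetical rainfall (mm)
temp = 15 + 10 * np.sin(2 * np.pi * np.arange(n_days) / 365) + rng.normal(size=n_days)

weather = np.column_stack([standardize(precip), standardize(temp)])
# these two columns are then appended to the existing design matrix
```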

&lt;p&gt;As a result, regression without weather data added as a feature resulted in an RMSE of 1.18 (on the standardized scale).&lt;/p&gt;

&lt;p&gt;Unfortunately, adding weather data barely improved the result, lowering the RMSE only slightly to 1.14.&lt;/p&gt;

&lt;p&gt;Adding raw weather data was not effective on this specific dataset. Additional preprocessing would be required to incorporate the data effectively. I’m currently thinking of applying a logistic function to precipitation, because one of my hypotheses is that people are less likely to be active the heavier the rainfall.&lt;/p&gt;</content><author><name>Shin Young Kim</name></author><summary type="html">I was tasked with adding weather data (precipitation, temperature) to enhance forecast results for meal consumption quantity in a military mess hall near Daejeon, Korea.</summary></entry><entry><title type="html">Model Composition: Constructing a Hierarchical Spline Time Series Model</title><link href="https://nextoptdev.github.io/bayesian/model/2020/04/06/Constructing-a-Hierarchical-Spline-Time-Series-Model/" rel="alternate" type="text/html" title="Model Composition: Constructing a Hierarchical Spline Time Series Model" /><published>2020-04-06T08:24:06+00:00</published><updated>2020-04-06T08:24:06+00:00</updated><id>https://nextoptdev.github.io/bayesian/model/2020/04/06/Constructing%20a%20Hierarchical%20Spline%20Time%20Series%20Model</id><content type="html" xml:base="https://nextoptdev.github.io/bayesian/model/2020/04/06/Constructing-a-Hierarchical-Spline-Time-Series-Model/">&lt;p&gt;A hierarchical spline time series model is being experimented with for forecasting failure rates.&lt;/p&gt;

&lt;p&gt;The following properties were observed while configuring the model:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;The sparsity of the data is unbalanced&lt;/li&gt;
  &lt;li&gt;Some portions of the data are missing&lt;/li&gt;
  &lt;li&gt;Each layer within the model shares similar properties&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;img src=&quot;https://nextoptdev.github.io/images/blog/sparsity_data.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Sparsity data&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Data sparsity could be overcome by estimating B-Spline parameters, \(\beta, w\) for the overall period.&lt;/p&gt;
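&lt;p&gt;A minimal sketch of that estimation on synthetic data (the knot count and spline degree below are my own assumptions, and plain least squares stands in for the hierarchical estimation):&lt;/p&gt;

```python
import numpy as np
from scipy.interpolate import BSpline

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 40, endpoint=False)
y = np.sin(2 * np.pi * x) + 0.1 * rng.normal(size=x.size)

k = 3  # cubic B-spline
knots = np.linspace(0.0, 1.0, 7)  # equally spaced knots over the whole period
t = np.r_[[0.0] * k, knots, [1.0] * k]  # clamped knot vector

n_basis = len(t) - k - 1
# evaluate each basis function at x to build the design matrix
B = np.column_stack([BSpline(t, np.eye(n_basis)[i], k)(x) for i in range(n_basis)])

beta, *_ = np.linalg.lstsq(B, y, rcond=None)  # spline coefficients for the whole period
fitted = B @ beta
```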

&lt;p&gt;A hierarchical model can be constructed, and its hyperpriors estimated using Markov Chain Monte-Carlo sampling to handle missing data.&lt;/p&gt;

&lt;p&gt;This method calculated the B-Spline basis at equally spaced knots. A more reliable model could be built by placing the knots at quantiles of the data instead, so that each section’s basis reflects the amount of data it contains.&lt;/p&gt;</content><author><name>Jhin Woo Choi</name></author><summary type="html">A hierarchical spline time series model is being experimented with for forecasting failure rates.</summary></entry><entry><title type="html">Correlation Analysis for values D, S, C with flow</title><link href="https://nextoptdev.github.io/bayesian/feature-engineering/2020/03/28/%EC%83%81%EA%B4%80%EA%B4%80%EA%B3%84%EB%B6%84%EC%84%9D-D,-S-C-%ED%9D%90%EB%A6%84%EC%9D%B4-%EC%9E%88%EC%9D%84%EB%95%8C/" rel="alternate" type="text/html" title="Correlation Analysis for values D, S, C with flow" /><published>2020-03-28T07:37:11+00:00</published><updated>2020-03-28T07:37:11+00:00</updated><id>https://nextoptdev.github.io/bayesian/feature-engineering/2020/03/28/%EC%83%81%EA%B4%80%EA%B4%80%EA%B3%84%EB%B6%84%EC%84%9D:%20D,%20S%20C%20%ED%9D%90%EB%A6%84%EC%9D%B4%20%EC%9E%88%EC%9D%84%EB%95%8C</id><content type="html" xml:base="https://nextoptdev.github.io/bayesian/feature-engineering/2020/03/28/%EC%83%81%EA%B4%80%EA%B4%80%EA%B3%84%EB%B6%84%EC%84%9D-D,-S-C-%ED%9D%90%EB%A6%84%EC%9D%B4-%EC%9E%88%EC%9D%84%EB%95%8C/">&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;find i for
 D ~ [D's trend, D's season, y_S_shift(i)]
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;n&quot;&gt;df&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;epsilon&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;df&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;y&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;m&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;predict&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;df&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'yhat'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;For this, we suggest the following:&lt;/p&gt;
&lt;ol&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sort corr(df.epsilon, y_S_shift(i))&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;select a number of i who have high correlation.&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;feature candidate : {y_S_shift(i)}&lt;/code&gt;&lt;/li&gt;
&lt;/ol&gt;
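&lt;p&gt;A sketch of these three steps on synthetic series, where the residual of the D model secretly depends on y_S shifted by 3 (the lag and the coefficients are hypothetical):&lt;/p&gt;

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 200
y_S = pd.Series(rng.normal(size=n))
# residual of the D model after removing its trend and seasonality;
# here it follows y_S shifted by 3 steps, plus noise
epsilon = 0.8 * y_S.shift(3).fillna(0.0) + pd.Series(rng.normal(size=n))

# 1. correlation of the residual with each candidate shift
corrs = {i: epsilon.corr(y_S.shift(i)) for i in range(1, 15)}
# 2. keep the shifts with the highest absolute correlation
top = sorted(corrs, key=lambda i: abs(corrs[i]), reverse=True)[:3]
# 3. feature candidates: {y_S.shift(i) for i in top}
```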

&lt;p&gt;The reason behind comparing the error (\( y - \widehat{y} \)) and y_S, instead of y_D and y_S, is to eliminate the effect of inflated correlation resulting from shared seasonality components.&lt;/p&gt;</content><author><name>Hyun Ji Moon</name></author><summary type="html">find i for D ~ [D's trend, D's season, y_S_shift(i)] df.epsilon = df.y - m.predict(df)['yhat']</summary></entry><entry><title type="html">Welcome to Jekyll!</title><link href="https://nextoptdev.github.io/jekyll/update/2020/03/26/welcome-to-jekyll/" rel="alternate" type="text/html" title="Welcome to Jekyll!" /><published>2020-03-26T10:41:11+00:00</published><updated>2020-03-26T10:41:11+00:00</updated><id>https://nextoptdev.github.io/jekyll/update/2020/03/26/welcome-to-jekyll</id><content type="html" xml:base="https://nextoptdev.github.io/jekyll/update/2020/03/26/welcome-to-jekyll/">&lt;p&gt;You’ll find this post in your &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;_posts&lt;/code&gt; directory. Go ahead and edit it and re-build the site to see your changes. You can rebuild the site in many different ways, but the most common way is to run &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;jekyll serve&lt;/code&gt;, which launches a web server and auto-regenerates your site when a file is updated.&lt;/p&gt;

&lt;p&gt;To add new posts, simply add a file in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;_posts&lt;/code&gt; directory that follows the convention &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;YYYY-MM-DD-name-of-post.ext&lt;/code&gt; and includes the necessary front matter. Take a look at the source for this post to get an idea about how it works.&lt;/p&gt;

&lt;p&gt;Jekyll also offers powerful support for code snippets:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-ruby&quot; data-lang=&quot;ruby&quot;&gt;&lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;print_hi&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
  &lt;span class=&quot;nb&quot;&gt;puts&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;Hi, &lt;/span&gt;&lt;span class=&quot;si&quot;&gt;#{&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;end&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;print_hi&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;'Tom'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;#=&amp;gt; prints 'Hi, Tom' to STDOUT.&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;Check out the &lt;a href=&quot;https://jekyllrb.com/docs/home&quot;&gt;Jekyll docs&lt;/a&gt; for more info on how to get the most out of Jekyll. File all bugs/feature requests at &lt;a href=&quot;https://github.com/jekyll/jekyll&quot;&gt;Jekyll’s GitHub repo&lt;/a&gt;. If you have questions, you can ask them on &lt;a href=&quot;https://talk.jekyllrb.com/&quot;&gt;Jekyll Talk&lt;/a&gt;.&lt;/p&gt;</content><author><name></name></author><summary type="html">You’ll find this post in your _posts directory. Go ahead and edit it and re-build the site to see your changes. You can rebuild the site in many different ways, but the most common way is to run jekyll serve, which launches a web server and auto-regenerates your site when a file is updated.</summary></entry></feed>