Poker Gym is a free app, available through Telegram, where users can practice basic poker skills. For now, it only supports practicing poker combinations; more features are planned.
Learning the basics of poker is hard; you normally learn through trial and error. What if, instead of playing a couple of hundred games to get good, you could drill those skills in an app?
Poker Gym lets you practice basics such as learning combinations and rankings.
If this sounds interesting to you, I’d love to hear your thoughts:
Feel free to reach out with feedback at https://forms.gle/E2UMLnJqQxX1JnTx9
The current database cannot handle the volume of incoming write requests. During peak times, too many write/update requests arrive per second, so requests take longer to execute. You can buffer requests, but if the database cannot fulfill them faster than they arrive, the buffer will overflow. Let’s say you work at a bank and cannot afford to drop any requests.
To scale writes, you can either use a bigger database (scale vertically) or use multiple databases (scale horizontally). Let’s say you want to scale horizontally. So you add an extra database. Now, you need to figure out how to forward requests to multiple DBs.
You can do it round-robin style. The issue is that user X’s requests get persisted across different DBs, which makes querying all of X’s records slower (we need to query every DB to get the results) and makes enforcing table constraints harder (cross-database referential integrity has to be handled outside the database engine). Since round-robin loses referential integrity this way, we instead need to route requests so that user X always goes to, say, Database 1. Then all of user X’s data lives on Database 1, and the database engine can perform referential-integrity checks for that user.
One way to achieve this is with the modulo operator. We take the user_id, convert it to an integer, and take it modulo the number of databases; the result determines which database the user maps to.
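A minimal sketch of what this routing looks like; the database names here are placeholders, not from any real deployment.

```python
# Modulo-based routing sketch: map a user's UUID to one of N databases.
import uuid

DATABASES = ["db0", "db1", "db2", "db3", "db4"]

def route(user_id: uuid.UUID) -> str:
    # A UUID exposes its 128-bit value as an int; modulo maps it to a shard index.
    return DATABASES[user_id.int % len(DATABASES)]
```

The same user_id always lands on the same database, so all of user X’s rows stay together and intra-user constraints can be checked by a single engine.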

This works nicely as long as your IDs are evenly distributed; then each database receives roughly the same number of users. Let’s check whether our UUIDs are evenly distributed.

They are pretty much evenly distributed. There seems to be some noise around the 2nd bucket, but it will smooth out as the sample size grows.
Problems with this approach start when we want to re-scale the database. If we go from 5 databases to 6, we compute modulo 6 instead of modulo 5, and many records that previously mapped to one database now map to another:
5 % 5 -> 0
6 % 5 -> 1
7 % 5 -> 2
5 % 6 -> 1
6 % 6 -> 0
7 % 6 -> 1
Modulo hashing is simple and works well for static systems, but it becomes problematic when you need to scale: in some cases, the number of records that need to move can be as high as 93%. There are different ways to solve this; one is consistent hashing, another is a lookup table.
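To make the consistent-hashing idea concrete, here is a minimal ring sketch (illustrative, not production code; node and key names are made up). Each node is placed at many virtual points on a hash ring, and a key is owned by the first node point clockwise from the key’s hash, so adding a node only steals a small slice of keys from its neighbors.

```python
# Minimal consistent-hashing ring: adding a node moves only ~1/N of the keys,
# unlike modulo hashing, which remaps most of them.
import bisect
import hashlib

def _hash(key: str) -> int:
    # Stable hash so placement doesn't change between runs (unlike built-in hash()).
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class ConsistentHashRing:
    def __init__(self, nodes, replicas=100):
        self.replicas = replicas
        self._ring = []  # sorted list of (point, node) pairs
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        # Place several virtual points per node for a more even spread.
        for i in range(self.replicas):
            bisect.insort(self._ring, (_hash(f"{node}:{i}"), node))

    def get_node(self, key):
        # Walk clockwise to the first point at or after the key's hash.
        idx = bisect.bisect_left(self._ring, (_hash(key), ""))
        if idx == len(self._ring):
            idx = 0  # wrap around the ring
        return self._ring[idx][1]

# Compare 5 nodes vs 6 nodes: only a small fraction of keys should move.
keys = [f"user-{i}" for i in range(1000)]
ring5 = ConsistentHashRing([f"db{i}" for i in range(5)])
ring6 = ConsistentHashRing([f"db{i}" for i in range(6)])
moved = sum(ring5.get_node(k) != ring6.get_node(k) for k in keys)
```

With modulo, going from 5 to 6 shards would remap the large majority of these keys; on the ring, roughly 1/6 of them move, which is the minimum required to give the new node its share.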
We can see that gold’s price movement roughly follows inflation. However, it also follows gold buying by other countries.

For example, China has been increasing its gold reserves, and it’s widely assumed that it does so to reduce its dependence on the US dollar.
In order to release this project, I had to figure out how to host R inside a container and serve it with AWS Lambda. I’d already done something similar in Running R on AWS Lambda, so I could reuse what I learned there and build on top of it.
There were a couple of challenges that I encountered when working on this project:
Not all R libraries were available for the AWS Lambda image, so I had to compile a couple of them from source. During compilation, so many intermediate artifacts were created that the final image went over 10GB (Docker images hosted on AWS Lambda have a 10GB limit [1]).
I reduced the size of the Lambda container by using a multi-stage Docker build and copying only the compiled binaries into the final AWS Lambda image. That took the image from 11GB down to around 4GB, and I could run the R container with all libraries on AWS Lambda (yay).
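A sketch of the multi-stage pattern described above; the base image tags, paths, and package names here are illustrative assumptions, not the project’s actual Dockerfile.

```dockerfile
# Stage 1: heavy build environment; compiles R libraries from source,
# producing large intermediate artifacts that we do NOT want in the final image.
FROM public.ecr.aws/lambda/provided:al2 AS builder
RUN yum install -y gcc gcc-gfortran make tar gzip
# ... install R and compile the required libraries from source here ...

# Stage 2: clean Lambda base image; copy only the compiled artifacts over.
# Everything created in the builder stage but not copied is discarded.
FROM public.ecr.aws/lambda/provided:al2
COPY --from=builder /opt/R /opt/R
CMD ["handler"]
```

Only what the second stage copies survives into the final image, which is what drops the size from ~11GB to a few GB.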
The second challenge was the frontend, since I’d never done frontend work before. Luckily, ChatGPT helped me set up a React template that I could then modify and shape.
CloudFront was also a bit tricky to configure, specifically routing to the Lambda function and making sure the SPA could talk to Lambda and work across Firefox and Chrome.
After the individual parts were configured, I did a couple of rounds of integration testing and fixing. Once I verified that the skeleton and parts worked together, I did a mini release on LinkedIn to see what people would say and whether real traffic would surface any errors.
Overall, it was a fun learning experience, and now I have deployment templates that I can leverage for future projects as well as knowledge about how website hosting on AWS is done.
It’s a compute environment managed by AWS. You can think of it as a service running a while-true loop that waits for incoming requests. When a request comes in, Lambda calls your code and passes the request to the appropriate function.

Ok, not so fast. We cannot upload R code to Lambda directly, because Lambda does not support an R runtime. Here’s the list of supported runtimes. There’s a way to patch it, but you’ll keep running into issues when installing dependencies, and you’d need to do your own maintenance from time to time. We don’t want that.
That’s why we are going to use a 🐳 Docker container to host the R environment; when a request comes to Lambda, it will pass it to a running container.

Lambda pulls the image from AWS ECR (AWS’s registry for Docker images) and runs it when a request comes in.
Check out the example repo: https://github.com/Bobrinik/r_on_lambda_example


Taken from https://www.youtube.com/watch?v=rpL77KDN92Q
Here I’m keeping track of a collection of useful functions for analyzing time series with pandas.
Python for Finance, 2nd Edition
rets.corr()
#           .SPX      .VIX
# .SPX  1.000000 -0.804382
# .VIX -0.804382  1.000000

ax = rets['.SPX'].rolling(window=252).corr(rets['.VIX']).plot(figsize=(10, 6))
ax.axhline(rets.corr().iloc[0, 1], c='r');
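The snippet above depends on the book’s `rets` DataFrame. Here is a self-contained variant with synthetic data standing in for the .SPX/.VIX returns (the negative correlation is built in by construction), showing the same rolling-correlation-vs-overall-correlation comparison:

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the book's rets DataFrame of daily returns.
rng = np.random.default_rng(0)
n = 600
spx = rng.normal(0.0005, 0.01, n)
vix = -0.8 * spx + rng.normal(0.0, 0.006, n)  # constructed to correlate negatively
rets = pd.DataFrame({'.SPX': spx, '.VIX': vix},
                    index=pd.date_range('2020-01-01', periods=n, freq='B'))

overall_corr = rets.corr().iloc[0, 1]  # one number for the whole sample
# 252-day rolling correlation: shows how the relationship drifts over time.
rolling_corr = rets['.SPX'].rolling(window=252).corr(rets['.VIX'])
```

The rolling series is NaN for the first 251 observations (the window is not yet full), which is why the plotted line starts about a year in.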

I wanted to see if, instead of using sectors pre-defined by some other organization, I could partition tickers based on their risk profile. For that, I could use the knowledge compressed into an OpenAI LLM.
So the idea is to use OpenAI embeddings of each ticker’s risks to cluster Toronto Stock Exchange tickers, and to use those clusters in place of sectors. If successful, it would allow diversifying across risks instead of across sectors, or across volatility and expected return.
Unfortunately, it didn’t work; I think the prompt, or the way I was merging the risk embeddings, was not ideal. Anyway, if someone wants to continue, the code is on GitHub.
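For anyone picking this up, the clustering step itself is straightforward. This sketch uses random vectors as placeholders for the OpenAI risk embeddings, and the ticker names are made up; only the k-means part reflects the actual approach.

```python
import numpy as np
from sklearn.cluster import KMeans

# Placeholder: in the real pipeline, these rows would be OpenAI embeddings of
# per-ticker risk descriptions; here they are random vectors for illustration.
rng = np.random.default_rng(42)
tickers = ["AAA.TO", "BBB.TO", "CCC.TO", "DDD.TO", "EEE.TO", "FFF.TO"]
embeddings = rng.normal(size=(len(tickers), 16))

# Cluster tickers into "risk groups" that would replace sector labels.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(embeddings)
clusters = dict(zip(tickers, kmeans.labels_))
```

With real embeddings, each cluster would (ideally) group tickers exposed to similar risks, giving an alternative axis for diversification.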
Grease Monkey is a popular browser extension that allows users to customize the functionality and appearance of websites they visit. It works with various web browsers, including Google Chrome, Mozilla Firefox, and others. Grease Monkey uses user scripts, which are small JavaScript programs, to modify the behavior of web pages. Grease Monkey works by injecting user scripts into web pages as they are loaded in your browser. - ChatGPT
The idea is to inject a script into a webpage to add functionality the page lacks. The script gathers the necessary data from the loaded page, puts it into a CSV, and adds a download button to the page so the user can download it.
That’s how it looks.
// ==UserScript==
// @name     jQuery Example
// @require  https://cdnjs.cloudflare.com/ajax/libs/jquery/3.7.1/jquery.min.js
// ==/UserScript==

function getFormattedDate() {
    var dateObj = new Date();
    var year = dateObj.getFullYear();
    var month = ("0" + (dateObj.getMonth() + 1)).slice(-2); // getMonth() is zero-based
    var day = ("0" + dateObj.getDate()).slice(-2);
    return `${year}-${month}-${day}`;
}

window.onload = function () {
    setTimeout(function () {
        jQuery(document).ready(function ($) {
            let downloadButton = document.createElement("button");
            downloadButton.innerHTML = "Download CSV";
            downloadButton.id = "csvButton";
            downloadButton.style.padding = "20px";
            document.body.insertBefore(downloadButton, document.body.firstChild);

            function generateCSV() {
                let separator = ",";
                let csvContent = [];
                let header = ['Security', 'Name', 'Total_Value', 'Quantity', 'All_Time_Return', 'Per_All_time_Return', 'Today_Price', 'Per_Today_Price'];
                csvContent.push(header.join(separator));
                $("tbody tr").each(function () {
                    let row = [];
                    $(this).find("td").each(function () {
                        $(this).find("p").each(function () {
                            row.push($(this).text());
                        });
                    });
                    if (row.length == 9) {
                        row = row.slice(1);
                    }
                    console.log(row);
                    csvContent.push(row.join(separator));
                });
                return csvContent.join("\n");
            }

            document.getElementById("csvButton").addEventListener("click", function () {
                let accountName = $(".knseRw > div:nth-child(1)").text();
                let csvContent = generateCSV();
                var hiddenElement = document.createElement('a');
                hiddenElement.href = 'data:text/csv;charset=utf-8,' + encodeURI(csvContent);
                hiddenElement.target = '_blank';
                hiddenElement.download = accountName + '_portfolio_' + getFormattedDate() + '.csv';
                hiddenElement.click();
            });
        });
    }, 5000);
};
You can read more and follow instructions here.
Prerequisite: the gcloud tool configured locally. Finnhub provides tick-level data for the TSX covering a couple of years; you can bulk download everything from 2021 up to last month.

You can download each archive separately or use the script below to get everything:
#!/bin/bash
TOKEN="YOUR_TOKEN"
DIR_NAME="./finnhub_data/"

for year in {2021..2023}; do
    for month in {1..12}; do
        # Get the redirect URL (the . and ? must be escaped in the grep pattern)
        REDIRECT_URL=$(curl -s "https://finnhub.io/api/v1/bulk-download?exchange=to&dataType=trade&year=$year&month=$month&token=$TOKEN" | grep -oE 'href="proxy\.php\?url=[^"]+"' | cut -d'"' -f2)
        mkdir -p "$DIR_NAME"
        # Follow the redirect if a URL was found
        if [[ ! -z "$REDIRECT_URL" ]]; then
            curl -o "to_trade_$year-$month.tar" "$REDIRECT_URL"
            mv "to_trade_$year-$month.tar" "$DIR_NAME"
        fi
        sleep 1
    done
done
# Copy-paste the code into a file, say fetch_finnhub_archive.sh
chmod +x fetch_finnhub_archive.sh
./fetch_finnhub_archive.sh
Once it finishes, you will end up with 94GB of files. Now let’s say you want to convert this to 1-minute OHLC data. You can do the processing with pandas, or you can use Google BigQuery.
#!/bin/bash
for file in $1/*.tar; do
    # Extract each tar file into its own directory
    dir_name="./uncompressed/${file##*/}"
    echo "Extracting $file to $dir_name..."
    mkdir -p "$dir_name"
    tar -xf "$file" -C "$dir_name"
done
# Copy and paste into a script called uncompress_finnhub_archive.sh
chmod +x uncompress_finnhub_archive.sh
./uncompress_finnhub_archive.sh ./finnhub_data
After you run this script, cd into uncompressed/to_trade_2021-1 and run ls -hl. You will see something like this:
total 2.5M
drwx------ 2 user user 124K Jan 5 2021 2021-01-04
drwx------ 2 user user 120K Jan 5 2021 2021-01-05
drwx------ 2 user user 124K Jan 6 2021 2021-01-06
drwx------ 2 user user 116K Jan 7 2021 2021-01-07
drwx------ 2 user user 128K Jan 8 2021 2021-01-08
drwx------ 2 user user 128K Jan 12 2021 2021-01-11
drwx------ 2 user user 124K Jan 13 2021 2021-01-12
drwx------ 2 user user 124K Jan 14 2021 2021-01-13
drwx------ 2 user user 124K Jan 15 2021 2021-01-14
drwx------ 2 user user 124K Jan 15 2021 2021-01-15
drwx------ 2 user user 120K Jan 19 2021 2021-01-18
drwx------ 2 user user 120K Jan 19 2021 2021-01-19
drwx------ 2 user user 124K Jan 20 2021 2021-01-20
drwx------ 2 user user 120K Jan 21 2021 2021-01-21
drwx------ 2 user user 124K Jan 23 2021 2021-01-22
drwx------ 2 user user 128K Jan 26 2021 2021-01-25
drwx------ 2 user user 124K Jan 27 2021 2021-01-26
drwx------ 2 user user 124K Jan 27 2021 2021-01-27
drwx------ 2 user user 124K Jan 28 2021 2021-01-28
drwx------ 2 user user 124K Jan 31 2021 2021-01-29
How many files are there in total and what’s their average size?
find "uncompressed" -type f | wc -l
2490838
find "uncompressed" -type f -exec du -k {} + | awk '{sum += $1} END {print sum}'
12081404
12081404 KB / 2490838 files ≈ 4.85 KB per file on average.
Next, let’s merge each day’s per-asset files into single .csv files. To do this, use the script below. Note: you need to install the pandas and tqdm libraries.
import os
import pandas as pd
from tqdm import tqdm

for dir in tqdm(os.listdir("./uncompressed"), desc="Processing months"):
    try:
        for file in tqdm(os.listdir(f"./uncompressed/{dir}"), desc="Processing days"):
            tables = []
            file_name = f"./transformed/transformed_{dir}_{file}.csv"
            if os.path.exists(file_name):
                continue  # skip days that have already been processed
            for asset in os.listdir(f"./uncompressed/{dir}/{file}"):
                symbol = asset.split(".csv.gz")[0]
                df = pd.read_csv(f"./uncompressed/{dir}/{file}/{asset}", compression='gzip')
                df["symbol"] = symbol
                tables.append(df)
            df = pd.concat(tables)
            os.makedirs("./transformed", exist_ok=True)
            df.to_csv(file_name)
    except Exception as e:
        print(e)
        print("Skipping")
So how many files do we have now?
find "transformed" -type f | wc -l
749
As you can see, we now have far fewer, much bigger files. That makes it more manageable to load everything into a Google Cloud Storage bucket and process it with BigQuery.
At this point, upload the files to a bucket from your local machine with the following:
gsutil -m cp -r transformed gs://your-bucket-datalake/finnhub_transformed
Depending on your upload speed, this might take some time. You can also do all of the above steps on Google Compute Engine, where upload speed to the bucket will not be an issue.
In the BigQuery UI, create a table from the bucket path (my-bucket-names/finnhub_transformed/*) with Schema Auto Detect enabled.
Now that our data is in a BigQuery table, we can use BigQuery SQL to compute the OHLC bars.
CREATE TABLE trade_data.one_minute_ohcl AS
WITH MinuteRounded AS (
    -- Truncate each trade's timestamp down to the minute
    SELECT
        TIMESTAMP_TRUNC(TIMESTAMP_MILLIS(timestamp), MINUTE) AS minute_timestamp,
        symbol,
        price,
        volume,
        timestamp -- keep the raw timestamp for ordering within the minute
    FROM
        trade_data.tick_data
),
AggregatedData AS (
    SELECT
        minute_timestamp,
        symbol,
        FIRST_VALUE(price) OVER w AS open,
        MAX(price) OVER w AS high,
        MIN(price) OVER w AS low,
        LAST_VALUE(price) OVER w AS close,
        SUM(volume) OVER w AS volume
    FROM
        MinuteRounded
    WINDOW w AS (
        PARTITION BY symbol, minute_timestamp
        ORDER BY timestamp
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    )
)
SELECT
    minute_timestamp,
    symbol,
    open,
    high,
    low,
    close,
    volume
FROM
    AggregatedData
GROUP BY
    minute_timestamp, symbol, open, high, low, close, volume
ORDER BY
    symbol, minute_timestamp;
Once the above command runs, you will have another table called one_minute_ohcl that you can export to a bucket in the UI. Note that you might get an error saying the export must go to a bucket in the same region as the data you read from; the error message will also tell you which region your bucket needs to be in. To resolve this, create a new bucket in the correct region.
The Kentucky Derby is one of the most prestigious horse racing events in the world, attracting millions of viewers and bettors alike. With so much money on the line, can data analysis give us an advantage over the average bettor?
In this analysis, we’ll explore historical data, track conditions, horse statistics, and other factors that might influence race outcomes.