23/06/2025
The meaning of this proverb is: On the second line, six stones in a row die, but eight stones in a row live. If there are seven stones, then it depends on sente.
Even with sente, White is narrowed down to a straight-three eye space, after which move 4 by Black leaves White simply dead.

This shows the intermediate case where
there are seven stones in a row on the second line. White lives or dies depending on whose turn is
next.
1. If Black plays first, with Black 5, White is dead.
2. If White plays first, White 1 and 3 produce the living formation
of four eyes in a row. Marked points are miai.
No matter who has sente, White ends up with a four-space eye shape, which is enough to live.
29/06/2025
If the stones on the second line have a base in the corner, then four will die, but six lives. If there are five stones, then it depends on sente.
The White stones die even if White plays first: after Black 2, White is dead. The importance of the corner can be seen here. While more than four stones may be enough to live in the corner, more than six are needed on the side to ensure life.
Even if Black plays first, White will live. Marked points are miai.
I've always wanted to implement a custom comment box in my Jekyll blog without relying on third-party services like Disqus or the GitHub API. Since I don't get many visitors, security isn't a major concern, and I don't plan to switch from static pages anytime soon, I decided to build my own solution. For this, I'll be using Supabase for the database and Cloudflare Workers for the serverless logic, keeping everything simple and cost-free.
In this blog post, I'll walk you through the process of implementing the comment box, handling database interactions with Supabase, and using Cloudflare Workers to manage the server-side logic. You can choose other databases as well; it's just that I like Supabase. Its free-tier projects do get paused after a week or two of inactivity, but overall, I find it reliable.
Note that I assume you're already familiar with the basics of Jekyll, as this post is quite abstract. If you'd like the full code, feel free to ask in the comments.
So, the initial idea I came up with involved exposing keys, but yeah, we can do better than that, and I doubt it's worth discussing anyway. For this one, the steps are pretty much the same as you'd think of if asked to create a chat box, haha:
Here are some alternatives, though:
| Service | Best For | Free Plan Limits |
|---|---|---|
| Render | Full backend (Node.js, Python) | 750 hours/month |
| Vercel | Fast serverless functions | 1M / billing cycle |
| Railway | Full backend + Free database | 500 hours/month |
| Cloudflare Workers | Superfast serverless API | 100,000 requests/day |
| Netlify | Frontend + Serverless functions | Similar to Vercel I guess |
So, the first thing we need to do is create a database. Like I mentioned earlier, I'm using Supabase. Go to supabase.io and create an account. After that, create a new project. Once the project is created, you'll be taken to the dashboard. From there, you can create a table using the table editor or the SQL editor. For now, I'll use the table editor.
Click on the table editor, and you'll see an option to "Create a new table." Just give your table a name and add some columns. Here's what mine will look like:
Comments:
- id (auto)
- author (text, not nullable)
- created_at (auto, timestamp with timezone)
- content (text)
- post_slug (text)

Ah, also, you'll need to turn off RLS (Row-Level Security). I didn't research this much, but with RLS enabled and no policies defined, requests made with the anonymous key are rejected, so reads and writes fail.
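If you prefer the SQL editor instead, a roughly equivalent table definition might look like this (a sketch only; double-check against the types the table editor generates for you):

```sql
-- Sketch of the comments table for the SQL editor
create table comments (
  id bigint generated by default as identity primary key,
  author text not null,
  created_at timestamptz not null default now(),
  content text,
  post_slug text
);
```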
The steps are pretty simple. Visit the Cloudflare page and create an account. After signing in, you'll see an option called "Workers & Pages." Click on "Create" and start from a template. You can use the "Hello World" template. After that, it should take you to the editor; if not, there will be an option to open it somewhere nearby.
There are a few things we need to worry about when creating the API. One of these is the CORS policy: we need to allow our domain to access the API. This can be done by adding the following code to the editor:
const corsHeaders = {
  "Access-Control-Allow-Origin": "*", // Allow all domains (change if needed)
  "Access-Control-Allow-Methods": "GET, POST, OPTIONS",
  "Access-Control-Allow-Headers": "Content-Type, Authorization"
};
Another thing to consider is filtering out explicit words. Also, it's better to enforce restrictions on the username and message in the backend rather than the frontend, since client-side checks are trivial to bypass (you know what I mean, right? hehe). With all this in mind, we can now add the following code to the Worker editor:
const SUPABASE_URL = "XXX";
const SUPABASE_ANON_KEY = "XXX-XXX";

// CORS headers (fixes the CORS issue)
const corsHeaders = {
  "Access-Control-Allow-Origin": "*",
  "Access-Control-Allow-Methods": "GET, POST, OPTIONS",
  "Access-Control-Allow-Headers": "Content-Type, Authorization"
};

// Bad words list (extended)
const badWordsList = [
  "word1", "word2", "word3", ...
];

// Normalize text to prevent bypassing (removes spaces & symbols)
function containsBadWords(text) {
  const normalizedText = text.toLowerCase().replace(/[^a-zA-Z0-9]/g, "");
  return badWordsList.some(word => normalizedText.includes(word));
}

// Handles all incoming requests
async function handleRequest(request) {
  if (request.method === "OPTIONS") {
    return new Response(null, { headers: corsHeaders });
  }
  if (request.method === "GET") {
    return fetchComments(request);
  } else if (request.method === "POST") {
    return submitComment(request);
  } else {
    return new Response("Method Not Allowed", { status: 405, headers: corsHeaders });
  }
}

// Fetch comments for a specific post
async function fetchComments(request) {
  const url = new URL(request.url);
  const post_slug = url.searchParams.get("post_slug");
  if (!post_slug) {
    return new Response(JSON.stringify({ error: "Missing post_slug" }), {
      status: 400,
      headers: { "Content-Type": "application/json", ...corsHeaders }
    });
  }
  // Encode the slug so special characters can't break the query
  const response = await fetch(`${SUPABASE_URL}/rest/v1/comments?post_slug=eq.${encodeURIComponent(post_slug)}&select=*`, {
    headers: {
      "apikey": SUPABASE_ANON_KEY,
      "Authorization": `Bearer ${SUPABASE_ANON_KEY}`,
      "Content-Type": "application/json"
    }
  });
  const data = await response.json();
  return new Response(JSON.stringify(data), {
    status: 200,
    headers: { "Content-Type": "application/json", ...corsHeaders }
  });
}

// Submit a new comment
async function submitComment(request) {
  try {
    const { author, comment, post_slug } = await request.json();

    // Validate input
    if (!author || !comment || !post_slug) {
      return new Response(JSON.stringify({ error: "Missing fields." }), {
        status: 400,
        headers: { "Content-Type": "application/json", ...corsHeaders }
      });
    }
    if (author.length > 15 || !/^[A-Za-z0-9]+$/.test(author)) {
      return new Response(JSON.stringify({ error: "Invalid name. Must be 1-15 alphanumeric characters." }), {
        status: 400,
        headers: { "Content-Type": "application/json", ...corsHeaders }
      });
    }
    if (comment.length < 5 || comment.length > 500) {
      return new Response(JSON.stringify({ error: "Comment must be between 5-500 characters." }), {
        status: 400,
        headers: { "Content-Type": "application/json", ...corsHeaders }
      });
    }

    // Block bad words
    if (containsBadWords(comment)) {
      return new Response(JSON.stringify({ error: "Inappropriate language detected." }), {
        status: 400,
        headers: { "Content-Type": "application/json", ...corsHeaders }
      });
    }

    // Save the comment to Supabase
    const response = await fetch(`${SUPABASE_URL}/rest/v1/comments`, {
      method: "POST",
      headers: {
        "apikey": SUPABASE_ANON_KEY,
        "Authorization": `Bearer ${SUPABASE_ANON_KEY}`,
        "Content-Type": "application/json"
      },
      body: JSON.stringify({ author, content: comment, post_slug })
    });
    return new Response(JSON.stringify({ message: "Comment added" }), {
      status: response.status,
      headers: { "Content-Type": "application/json", ...corsHeaders }
    });
  } catch (error) {
    return new Response(JSON.stringify({ error: "Internal Server Error", details: error.toString() }), {
      status: 500,
      headers: { "Content-Type": "application/json", ...corsHeaders }
    });
  }
}

addEventListener("fetch", event => {
  event.respondWith(handleRequest(event.request));
});
This is a basic implementation of the API. You can add more features like rate limiting, spam protection, avatars, and even replying to comments. But for now, this is enough.
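By the way, the normalization step in the word filter is worth sanity-checking in isolation, since it's what stops commenters from bypassing the list with punctuation or spacing. A minimal sketch (the word list here is a harmless placeholder, not a real one):

```javascript
// Placeholder word list for illustration only
const badWordsList = ["darn"];

// Same normalization as the worker: lowercase, then strip everything that
// isn't a letter or digit, so "d.a.r.n" or "d a r n" still gets caught
function containsBadWords(text) {
  const normalizedText = text.toLowerCase().replace(/[^a-zA-Z0-9]/g, "");
  return badWordsList.some(word => normalizedText.includes(word));
}

console.log(containsBadWords("d.a.r.n!")); // true
console.log(containsBadWords("hello"));    // false
```

The trade-off of this normalization is false positives: because all separators are stripped, a banned word spanning two innocent words will also trigger the filter.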
Now, on the frontend, we have two tasks:
1. Fetch comments
We can do this by using the following code:
const API_URL = "cloudflare-worker-url"; // Replace with the actual Cloudflare Worker URL
const POST_SLUG = "{{ page.slug }}"; // Replace with the actual post slug

async function fetchComments() {
  try {
    const response = await fetch(`${API_URL}?post_slug=${POST_SLUG}`);
    if (!response.ok) {
      throw new Error("Failed to fetch comments");
    }
    const data = await response.json();
    console.log(data); // Logs the JSON response
  } catch (error) {
    console.error("Error fetching comments:", error);
  }
}
For the post slug, you can utilize the front matter of the post. To do that, add a `slug` to the front matter of the blog, like this for example:
---
layout: post
title: "Title"
date: 2025-03-22
last_updated: 2025-03-22
category: Jekyll
tags: [jekyll, comment-box, cloudflare]
slug: jekyll-comment-box
---
Then, to access it in the comment box, you can simply use `{{ page.slug }}` (inside a post, the current post's front matter is exposed through the `page` variable).
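Once the JSON arrives, you'll want to render it instead of just logging it. Here's a minimal sketch of turning the worker's response into HTML; the `renderComments` and `escapeHtml` helpers and the `comment` CSS class are my own naming, not part of the worker:

```javascript
// Escape user-supplied text before injecting it into the page
function escapeHtml(s) {
  return String(s)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Build an HTML string from the array returned by the worker's GET endpoint
function renderComments(comments) {
  return comments
    .map(c => `<div class="comment"><strong>${escapeHtml(c.author)}</strong><p>${escapeHtml(c.content)}</p></div>`)
    .join("");
}

// Usage idea: document.getElementById("comments").innerHTML = renderComments(data);
```

Escaping matters here even though the backend validates input, since comment content may contain `<` and `>` freely.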
2. Submit comments
To do this, you can do something similar to:
async function submitComment(author, comment) {
  try {
    const response = await fetch(API_URL, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ author, comment, post_slug: POST_SLUG })
    });
    const result = await response.json();
    if (!response.ok) {
      throw new Error(result.error || "Error submitting comment");
    }
    console.log("Comment submitted successfully");
    fetchComments(); // Refresh comments after submission
  } catch (error) {
    console.error("Error submitting comment:", error);
  }
}
And that's it! You now have a custom comment box for your Jekyll blog. You can further customize it by adding features like markdown, replies, and even user authentication. The possibilities are endless, and you can make it your own ✌(-‿-)✌
After completing my Bachelor's, I suddenly decided, without any clear reason, to prepare for GATE instead of going
abroad. I had always dreamed of pursuing higher studies abroad, and honestly, I wasn’t all that interested in campus
placements. But one day, I just made the decision to start preparing for GATE—looking back now, it was one of those
random, all-of-a-sudden choices that somehow worked out.
When I started, I had pretty much forgotten most of what I had learned in college (except for math and programming,
thankfully). I was literally starting from scratch, but once I got into the rhythm of things, it didn’t feel as
difficult as I had imagined. However, the theory-heavy subjects, like Operating Systems, Computer Networks (except
for the Data Link Layer, which for some reason, I actually enjoyed), and Compilers, were a real struggle for me. To
be honest, I just skipped most of the compilers, because I couldn't bring myself to study them.
Throughout my college life, I always enjoyed computer science and did fairly well. I wasn’t a topper, but I was
definitely above average. So, when I began my GATE journey, I didn't feel like I was totally out of my depth.
Fast forward to GATE 2025: I’m proud to share that I secured an All India Rank of 189 and have been selected
for an MTech program at IIT Bombay. It feels surreal, but looking back, I’m glad I took the leap to prepare for GATE
instead of sticking to my original plan of going abroad.
I started my preparation around the end of May, but by then I was already good at discrete math (especially combinatorics), algorithms, data structures, and linear algebra. Initially, I had planned to prepare on my own without any coaching. Honestly, had I stuck to that, it would've been a disaster given the amount of syllabus we have and the equal amount of procrastination I had. Luckily, a friend mentioned that GoClasses was conducting a scholarship test in a week or two. I casually checked out a few of their videos, ended up really liking them, and decided to give the test a shot. The good part is I got the 90% scholarship! Since the test was math-based, I found it quite easy. And a 90% discount is a massive deal, so I just went for it; that's how my journey with GoClasses began.
Now, one thing about GoClasses is that their content is huge (not soooo huge, but huge). At first glance, it feels overwhelming, and that's not so pleasant to look at. But once I got into it, I realized how good the content actually is. The only subject I didn't enjoy was compilers; everything else was pretty smooth for me. For the subject order, I simply followed the schedule provided by GoClasses, and I was able to complete the syllabus by mid-November. I did skip about half of the compilers module, which ended up costing me later during the exam 😅. But it wasn't too big of a deal since my main goal was to get into an MS program, and I was confident I could score enough for that.
When I first started studying, I could barely manage 2 hours a day, and that really frustrated me. I then started
using the Pomodoro technique and gradually increased my daily study time to around 4–5 hours. But even that isn't
enough, especially if you're aiming for a rank under 500. Eventually, I figured out the real problem — my
phone. I installed Yeolpumta, which basically locks your phone, and if you try to use it, your study progress
resets. That feature alone helped me stay serious. I didn't even want to pause it and break the flow. Slowly, my
daily average went up to 7 hours, which I feel is quite decent, considering I sleep 8 hours.
My daily routine was pretty strict after June. I'd wake up at 6:30 AM, go visit my grandparents for a quick morning walk, take a bath, and then start studying by 7:45 AM. I'd go on till 12:30 PM with small 5-minute breaks in between. After lunch, I'd chill a bit by watching some series till around 2 PM, then get back to studying till 4 or 5 PM. After that, I'd go for swimming, come back, and study again till 7 PM. Post dinner, I'd watch something again till 9 PM and finally squeeze in one last study session till 10:30 PM before going to bed. This routine was so strict to a point I felt I was some kind of robot.
I really liked the content from GoClasses. The videos are well-structured, and the explanations are clear.
Sometimes, I would skip certain topics — like Myhill-Nerode, the pumping lemma, or anything that felt a bit dry —
but overall, except for compilers, I found everything quite good. If you start just six months before the exam, the
syllabus can feel overwhelming. So, I'd recommend starting at least nine months in advance to make your preparation
more manageable.
I haven't watched any other courses, so I can't (or won't) really compare GoClasses with others, but honestly, I found Go to be really good. Before buying, check out their free videos on YouTube and their website. You'll most likely enjoy the content and teaching style. If not, that's totally fine; in the end, go with whatever suits your learning style.
Coaching definitely helps, but your own effort matters the most. If you browse Reddit, you might find some people saying GoClasses isn't good, it's huge, the content is bad, this, that, blah blah, but I think that's mostly personal preference. There are plenty of students from GoClasses who have secured good ranks. What you should understand is that no coaching institute is perfect, and there's a clear limit to how much they can help you. And most importantly, you and I are not in school anymore to be spoon-fed; it's important to take responsibility for your own learning. If something isn't clear, take the initiative to look it up, ask questions in online communities, or watch other lectures on YouTube. The resources are out there; it's up to you to make the most of them.
I took both the GoClasses test series and the Zeal test series. To be honest, I didn't do many topic-wise tests
until October, which I really regret now. Both Zeal and Go are good — I didn't like Zeal much for topic-wise tests,
but their full-length tests are excellent and really help with time management. They're slightly tougher than the
actual GATE exam. On the other hand, GoClasses tests are closer to the actual GATE level. GateOverflow's full-length
test series is also good, though quite challenging. I haven't taken any other test series apart from these.
The most important thing with test series is that after taking a certain number of mocks, you should start focusing
more on time management. My personal strategy was to first solve the questions I found easy, then move on to
medium-level ones, and finally recheck my answers. If I had time left, I would attempt the harder questions. This
approach might work for some people but not for others, so it's important to find a strategy that works best for
you.
There are plenty of Telegram groups available. You can even make a small group with friends to discuss and ask doubts. GoClasses also has a public Telegram group where you can ask questions — and usually, someone responds pretty quickly. You should also check out GateOverflow (the holy website for GATE) website and Telegram group. Regarding math, you can always ask your doubts in the famous Math Discord server, or Math Stack Exchange.
I regret not doing the topic-wise tests before October. I should've started them earlier. I also didn't revise the
subjects properly, so I ended up wasting a lot of time after November. The thing with topic-wise and subject-wise
tests is that they help you identify weak areas and revise them effectively. So please don't skip them! And also
start your studies as early as possible.
I also regret not taking care of my health. During the exam week, I had a high fever and headache because I had neglected my health. I had to rest for the entire week and take heavy medications right before the exam. So please take care of your health.
I have applied for the following programs:
I didn't apply to IIT Kanpur MS because I wasn't interested in traveling such a long distance, and I preferred MTech
over MS there. As for IIT Delhi, I applied mainly because SCAI offered multiple exam centers and online interviews.
Other than that, I wasn't particularly interested in any other colleges.
Also, I have received offers from IIT Kanpur and IIT Kharagpur for MTech.
The interview was on the 16th of April. In the morning, we went through verifications and then had a written test. These are the questions they asked:
The questions were relatively easy. After that, we had the interview. I guess most of the time we spent chatting about general stuff and solving a few questions. We talked a bit about my background and the CDS program. Then, the interviewer started asking questions on linear algebra and algorithms, where I had to explain the logic and write pseudocode on the board. One question was about using dynamic programming (DP) for matrix chain multiplication. Overall, the interview went well, although I wasn't able to write the entire pseudocode. Anyway, I made it to the provisional list, even though I didn't make it to the final list.
I attended the CDS interview but skipped CSA because by then, I had already been selected for IIT Madras, which was my first preference. In the CDS interview, you first write a test, and then the interview. The written test is so easy that I don't even understand why they conduct it in the first place. The interview was quite long and stretched until 8 pm, and I was already tired at that point. The questions were mostly from linear algebra, basic ML, and some coding. Since I didn't want to face Bangalore traffic, I decided to skip CSA the next day.
IIT Madras releases a shortlist, and I think for the general category, a score of 600-650 is a safe zone. We need to travel to Madras for the interview. We had the written test on the 5th of May. The test was relatively easy and based on the GATE syllabus. I won't list the questions here since I don't remember them exactly, but it was an easy GATE-level paper. I have a feeling that the cutoffs are high; probably 25/30 is required to get shortlisted (no official information on the cutoff, just a guess that's most probably correct). Around 50 people out of ~350-400 were shortlisted. Some had their interviews on the same day, and others had them the next day. I had both my interviews the next day. So, for the rest of the day, my friends and I just roamed around and studied a bit of linear algebra, since our preference was Intelligent Systems.
My theory interview was first. For the theory interview, you have the freedom to choose the subject. I chose discrete math, and they asked questions on combinatorics, functions, and relations. I answered almost all the questions, and very few with some hints. Overall, the interviewers were very friendly and didn't expect me to answer everything but focused more on how I approached the problems.
Then, I had my interview for Intelligent Systems. It started with an introduction and a question about which subject I would prefer. I chose linear algebra. For the Intelligent Systems track, there's a coding question first. Many candidates get easy questions such as reversing an array, and mine was about counting one-child nodes in a binary tree. I solved it easily, wrote the code on the board, and explained it. After that, they started asking questions from linear algebra. I answered most of them except for the last one, where I had to take a hint. It was about finding eigenvalues and eigenvectors without a pen and paper, using properties. I solved most of the eigenvalues and two eigenvectors, but the last one was tricky for me. Still, I managed to answer it with a small hint from the interviewer. Overall, the interview went well, but I wasn't very confident about the Intelligent Systems part, although I felt confident about the theory panel. They released the provisional list a week later, and I was on both panels.
From Madras, I flew to Bombay the next day, and we had the written test for MS based on the panel chosen earlier. I chose Intelligent Systems. The paper was a little harder than GATE and required knowledge of additional subjects. But I think if you know the GATE math well, it is easier to clear.
The next day, we had coding tests for MTech and MS RA. It was easy for me, and I believe you will get shortlisted for an interview if you can solve 3 out of 5 questions. The questions mostly covered arrays and recursion:
I messed up this one. But I still solved questions related to Linear Algebra, like explaining why we take $|A - \lambda I| = 0$ when finding eigenvalues. One question was very similar to: "Prove that the Binomial distribution converges to the Poisson distribution under the condition that the number of trials $n \to \infty$, the probability of success $p \to 0$, and the product $\lambda = np$ remains constant." There was also a question on best-fit regression.
These interviews were based on the project you choose. I chose a generative AI project for both MTech and MS RA, and a CS-101 course for MTech RA. The interviews were okay and didn't go too badly. If you choose projects like this, there will be an additional Python test. But the interviews were not that bad overall. However, they expect you to know at least something about the project you choose.
The written test will be tough, believe me! But like the other interview tests, if you know the basics, you will at least get shortlisted. I was shortlisted for the interview. My interview started with an introduction, followed by questions on probability and linear algebra, which were okay. Then, they asked me about computer science topics like the difference between a Trie and a Tree, which I couldn't answer. The second part of the interview with CS questions was a bit terrible.
To be honest, I wasn't interested in going to IIT Delhi (due to personal reasons), but I applied to the ScAI department because it seemed cool. I had my exams at IIT Bombay when I was there for the IITB interview. You can choose from multiple centers, including Mumbai, Bangalore, Delhi, Hyderabad, and Kolkata. The exam was okay, with questions on aptitude, math, algorithms, and ML. I felt that the math section was a bit harder than the others.
These are some resources that you can check out if you're interested in GATE preparation or interviews:
I would like to sincerely thank my parents, friends, and teachers for their constant support throughout this
journey.
A special thanks to Deepak Sir, Sachin Sir, and Arjun Sir from Go Classes for their invaluable guidance and
scholarship support.
I am also grateful to my friends from the GATEFATE group for their encouragement, discussions, and companionship
along the way. I have accepted the offer from IIT Bombay MTech RA, because it perfectly aligns with what I want to
do in the future.
Also, note that the labs from the DS-AI department at IIT Madras are no longer available to CS students. So, if you
are planning to join any of those labs or are interested in professors from the DS-AI department, please write the
GATE DA exam for MTech. For MS, you can apply with a CS score.
By the way, I haven't included everything in this blog — so if you have any doubts, feel free to ask in the comments or via email/LinkedIn.
Go, also known as Baduk in Korea, has a long history spanning over 2500 years and is loved by many in East Asia. In this game, players take turns placing black and white stones on a grid, aiming to control territory and capture their opponent's stones. The game gets more complicated as the board size increases, but typically, players use 9x9, 13x13, or 19x19 boards.
In 2016, DeepMind, now Google DeepMind, made waves when AlphaGo defeated the legendary Lee Sedol. What made AlphaGo special was its use of advanced technology like deep neural networks and reinforcement learning, along with smart algorithms like Monte Carlo Tree Search (MCTS). By training on lots of Go data and playing millions of games against itself, AlphaGo learned to evaluate board positions and make strategic decisions really well.
The environment for this is heavily inspired by the repository Deep Learning and the Game of Go, which accompanies the book "Deep Learning and the Game of Go" by Max Pumperla and Kevin Ferguson. Some changes have been made to the original code to make it more readable and understandable; regardless, the original code contains more information and is much more detailed. In this article, we are only focusing on the bare minimum to train an AlphaGo on a 9x9 board. The code for the environment can be found in the alphago/env folder. If you are planning to code along, create a folder alphago and copy the env folder from the repository into it.
Initially I thought of going directly to the AlphaGo code, but later I decided it would be better to cover details like MCTS and neural networks separately; that way it is easier for a beginner to understand the code. I would highly recommend going through the book Deep Learning and the Game of Go. It's a great book with a lot of information for beginners, while this blog assumes you are already familiar with the basics of Go, PyTorch, and deep learning.
alphago/
└── env/
├── generate_zobrist.py
├── go_board.py
├── gotypes.py
├── scoring.py
├── utils.py
└── zobrist.py
generate_zobrist.py: This file is used to generate the Zobrist hashes for the board. Run this file and copy the output to zobrist.py, or run python3 generate_zobrist.py > zobrist.py from the alphago/env directory.
go_board.py: This file contains the GoBoard class, which is our environment.
zobrist.py: This file contains the Zobrist hashes for the board, generated using generate_zobrist.py.
Use the following snippet to create a 9x9 board.
>>> from alphago.env import go_board
>>> from alphago.env.utils import print_board, print_move
>>> board_size = 9
>>> game = go_board.GameState.new_game(board_size=board_size)
>>> print_board(game.board)
9 . . . . . . . . .
8 . . . . . . . . .
7 . . . . . . . . .
6 . . . . . . . . .
5 . . . . . . . . .
4 . . . . . . . . .
3 . . . . . . . . .
2 . . . . . . . . .
1 . . . . . . . . .
A B C D E F G H I
Moves are made using the alphago.env.go_board.Move object. There are 3 types of moves available: play(point), pass_turn(), and resign() (pass is a reserved word in Python, hence pass_turn). To apply a move, use the apply_move method of the game object.
>>> from alphago.env.go_board import Move
>>> from alphago.env.gotypes import Point
>>> point = Point(row=3, col=4)
>>> move = Move.play(point)
>>> # You can also do Move.pass_turn() or Move.resign()
>>> print_move(game.next_player, move)
Player.black D3
>>> game = game.apply_move(move)
>>> print_board(game.board)
9 . . . . . . . . .
8 . . . . . . . . .
7 . . . . . . . . .
6 . . . . . . . . .
5 . . . . . . . . .
4 . . . . . . . . .
3 . . . x . . . . .
2 . . . . . . . . .
1 . . . . . . . . .
A B C D E F G H I
We've figured out how to handle tic-tac-toe or a 5x5 Go board with our computer, which only has a few hundred thousand possible situations. But what about games like Go or chess? They've got more situations than all the atoms on Earth! So, we're using something called Monte Carlo Tree Search (MCTS) to figure out the game state without any fancy strategies. The idea's pretty simple: we're just randomly looking around to see how good a situation is. We build a tree with all the possible moves we can make from a given situation. But then again, we can't make a tree with every single situation because there are just way too many of them for our systems to handle.
So, MCTS basically has four parts:
Let's break down every step in this.
Starting from the root node, we select the child node with the highest UCT (Upper Confidence Bound 1 applied to trees) value. The UCT value is calculated as follows:
$$UCT = \frac{Q}{N} + c \sqrt{\frac{\log{P}}{N}}$$

Where:
- $Q$ is the number of wins recorded at the child node,
- $N$ is the number of times the child node has been visited,
- $P$ is the number of times the parent node has been visited,
- $c$ is the exploration parameter.
The first part of the equation is the exploitation term, and the second part is the exploration term. The exploration
parameter is a hyperparameter that controls the balance between exploration and exploitation. A higher value of $c$
will lead to more exploration, and a lower value will lead to more exploitation. The reason why we need to explore
is simple:
The agent has to exploit what it has already experienced in order to obtain a reward, but it also has
to explore in order to make better action selections in the future.(Reinforcement Learning - An Introduction by Richard S. Sutton and Andrew G., Chapter 1)
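To make the formula concrete, here is a small numeric sketch with made-up statistics for three children of one node (the numbers and the value of $c$ are purely illustrative):

```python
import math

def uct(wins, visits, parent_visits, c=1.4):
    """Exploitation term Q/N plus exploration term c * sqrt(log(P)/N)."""
    return wins / visits + c * math.sqrt(math.log(parent_visits) / visits)

# Hypothetical (wins, visits) statistics for three children of one node;
# here the parent's visit count is just the sum of the children's visits
children = [(12, 20), (4, 6), (1, 4)]
parent_visits = sum(n for _, n in children)

for wins, visits in children:
    print(f"{wins}/{visits}: {uct(wins, visits, parent_visits):.3f}")
```

With these numbers, the 4/6 child scores highest: the exploration term boosts moves that have been visited less often, even when a heavily-visited sibling has a comparable win rate.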
The implementation is as follows:
def select_child(self, node):
    """
    Selection
    """
    total_rollouts = sum(child.num_rollouts for child in node.children)
    log_total_rollouts = math.log(total_rollouts)

    best_score = -1
    best_child = None
    for child in node.children:
        win_pct = child.winning_pct(node.game_state.next_player)
        exploration_factor = math.sqrt(
            log_total_rollouts / child.num_rollouts
        )
        uct_score = win_pct + self.temperature * exploration_factor
        if uct_score > best_score:
            best_score = uct_score
            best_child = child
    return best_child
Once we have selected a node, we need to expand it: we add a new child node to the tree by picking one of its unvisited moves at random.
Once we have expanded the node, we will simulate the game from that node until the end and see who wins. It's to evaluate the current state of the game.
Once we have the result of the simulation, we will update the nodes on the path from the root to the selected node. We will update the number of wins and the number of visits to each node.
```python
class MCTSNode:
    def __init__(self, game_state, parent=None, move=None):
        self.game_state = game_state
        self.parent = parent
        self.move = move
        self.children = []
        self.num_rollouts = 0
        self.unvisited_moves = game_state.get_legal_moves()
        self.win_counts = {
            Player.black: 0,
            Player.white: 0,
        }

    def add_random_child(self):
        # Expansion: create a child for a random unvisited move
        index = random.randint(0, len(self.unvisited_moves) - 1)
        new_move = self.unvisited_moves.pop(index)
        next_state = self.game_state.apply_move(new_move)
        child = MCTSNode(next_state, self, new_move)
        self.children.append(child)
        return child

    def fully_expanded(self):
        return len(self.unvisited_moves) == 0

    def is_terminal(self):
        return self.game_state.is_over()

    def winning_pct(self, player):
        return float(self.win_counts[player]) / float(self.num_rollouts)
```
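To see all four steps working together, here is a self-contained toy sketch I wrote for illustration (the game, names, and constants are mine, not from the article's codebase): MCTS choosing moves in a miniature Nim, where players alternately take 1-3 stones and whoever takes the last stone wins.

```python
import math
import random

class Node:
    """One node of the search tree for a take-1-to-3 Nim game."""
    def __init__(self, pile, player, parent=None, move=None):
        self.pile = pile        # stones left on the table
        self.player = player    # player to move at this node (0 or 1)
        self.parent = parent
        self.move = move        # the move that led here
        self.children = []
        self.untried = [m for m in (1, 2, 3) if m <= pile]
        self.visits = 0
        self.wins = {0: 0, 1: 0}

    def uct_child(self, c=1.4):
        # 1. Selection: pick the child with the highest UCT score
        log_n = math.log(self.visits)
        return max(
            self.children,
            key=lambda ch: ch.wins[self.player] / ch.visits
            + c * math.sqrt(log_n / ch.visits),
        )

def rollout(pile, player):
    # 3. Simulation: random playout; whoever takes the last stone wins
    while True:
        pile -= random.choice([m for m in (1, 2, 3) if m <= pile])
        if pile == 0:
            return player
        player = 1 - player

def mcts_move(pile, simulations=200):
    root = Node(pile, player=0)
    for _ in range(simulations):
        node = root
        # 1. Selection: walk down through fully expanded nodes
        while not node.untried and node.children:
            node = node.uct_child()
        # 2. Expansion: add one random unvisited child
        if node.untried:
            m = node.untried.pop(random.randrange(len(node.untried)))
            node.children.append(Node(node.pile - m, 1 - node.player, node, m))
            node = node.children[-1]
        # 3. Simulation from the new node (or read off the terminal result)
        if node.pile == 0:
            winner = node.parent.player  # the parent took the last stone
        else:
            winner = rollout(node.pile, node.player)
        # 4. Backpropagation: update stats on the path back to the root
        while node is not None:
            node.visits += 1
            node.wins[winner] += 1
            node = node.parent
    # play the move whose child has the best win rate for us
    best = max(root.children, key=lambda ch: ch.wins[root.player] / ch.visits)
    return best.move
```

From a pile of 2, taking both stones wins immediately, so every rollout through that child is a win and the search selects it. The same skeleton maps onto the Go agent by swapping the Nim pile for a real game state.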
Next, we define a new agent that uses this data structure and the MCTS algorithm to play Go. To implement its select_move, the agent needs the selection, expansion, simulation, and backpropagation methods described above. I have already discussed the first one, and the rest are relatively easy once you understand how MCTS works. The complete implementation of the MCTS agent can be found in the alphago/agents/mcts_agent.py file.
In the original paper, four different neural networks are trained:

1. a fast rollout policy network
2. a supervised learning (SL) policy network
3. a reinforcement learning (RL) policy network
4. a value network

First, the fast rollout policy is a smaller network than the rest, and it is used during tree search for speed. In our case, since the board is 9x9, I won't be training a separate rollout policy network, but you are free to train one; for that purpose, you can use the GoNet in alphago/networks. Next, we have the supervised learning (SL) policy network, which is trained on records of real games. Then, we improve this network by having it play against itself using reinforcement learning. Lastly, we have a value network that predicts the outcome of a position for the sake of strategic decision-making. Now let's start from zero and train an AlphaGo.
Here we will implement a simple one-plane encoder. We will be using a 1x9x9 matrix to represent the board. In this
matrix, 0 represents empty points, 1 represents the points occupied by the current player, and -1 represents the
points occupied by the opponent. I won't be going into any detailed explanation of the implementation since it is
pretty straightforward. The encoder is implemented in alphago/encoders/oneplane.py file. The following
functions are relatively important in the encoder.
```python
def encode(self, game_state):
    board_matrix = np.zeros(self.shape())
    next_player = game_state.next_player
    for r in range(self.board_height):
        for c in range(self.board_width):
            p = Point(row=r + 1, col=c + 1)
            go_string = game_state.board.get_go_string(p)
            if go_string is None:
                continue
            if go_string.color == next_player:
                board_matrix[0, r, c] = 1
            else:
                board_matrix[0, r, c] = -1
    return board_matrix

def encode_point(self, point):
    return self.board_width * (point.row - 1) + (point.col - 1)

def decode_point_index(self, index):
    row = index // self.board_width
    col = index % self.board_width
    return Point(row=row + 1, col=col + 1)
```
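The point-index mapping can be sanity-checked in isolation. Here's a minimal, self-contained sketch (the namedtuple `Point` and the hard-coded board width are stand-ins for the real classes) showing that encode_point and decode_point_index are inverses:

```python
from collections import namedtuple

Point = namedtuple("Point", ["row", "col"])
BOARD_WIDTH = 9  # 9x9 board, as in the article

def encode_point(point):
    # flatten a 1-indexed (row, col) into an index in [0, 80]
    return BOARD_WIDTH * (point.row - 1) + (point.col - 1)

def decode_point_index(index):
    # inverse of encode_point
    return Point(row=index // BOARD_WIDTH + 1, col=index % BOARD_WIDTH + 1)

assert encode_point(Point(1, 1)) == 0
assert encode_point(Point(9, 9)) == 80
# round trip over the whole board
assert all(decode_point_index(encode_point(Point(r, c))) == Point(r, c)
           for r in range(1, 10) for c in range(1, 10))
```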
Even though there are other encoders we could try, I won't discuss them here because the current one works well for 9x9 boards. In case you want to try creating one, please refer to the Extra Notes.
Now that we have our encoder, we can start generating data for our neural network. We will use the MCTS agent: we play games with it and store the encoded board state along with the move the agent made, then use this data to train our neural network. You can find the implementation of the data generation in the archives/generate_mcts_game.py file. Additionally, I have rewritten the code to use multiprocessing so it can play multiple games at the same time; you can find that in the archives/generate_mcts_game_mp.py file.
But this approach is troublesome: it takes a lot of time to generate enough data, and we can obtain games of equal or better quality from real-world Go records. With that in mind, the best option we have right now is to download datasets that were already created by other people; most of them are records of top amateur games played in competitions or on servers like OGS. Here are some of the links to the datasets:
All of these datasets are in SGF format, and we need to convert them to tensors in order to train the neural networks. I have implemented a dataset compatible with the PyTorch DataLoader in the alphago/data folder. The idea behind this implementation is simple: it loads X games (i.e., X SGF files) and encodes them. [Note]: If you have your own dataset, paste the folder containing the SGF files into your dataset folder and pass the name of that folder to the dataloader.
This is how you use the GoDataSet.
from alphago.data.dataset import GoDataSet
BOARD_SIZE = 9
print(GoDataSet.list_all_datasets()) #prints all available datasets
training_dataset = GoDataSet(
encoder = "oneplane", # encoder used
game = "go9", # Dataset name,
no_of_games = 8000, # number of games
dataset_dir="dataset", # directory
seed = 100,
redownload = False,
avoid = [] # files to avoid
)
test_dataset = GoDataSet(
encoder="oneplane", no_of_games=100, avoid=training_dataset.games
)
train_loader = torch.utils.data.DataLoader(
training_dataset, batch_size=64, shuffle=True
)
Let's start by defining the neural network which trains on the Go9 dataset.
```python
class AlphaGoNet(nn.Module):
    def __init__(self, input_shape, num_filters=192, dropout=0.3):
        super(AlphaGoNet, self).__init__()
        self.input_shape, self.num_filters, self.dropout = input_shape, num_filters, dropout

        # two shape-preserving (padded) convolutions, then two unpadded ones
        self.conv1 = nn.Conv2d(input_shape[0], num_filters, 3, 1, 1)
        self.conv2 = nn.Conv2d(num_filters, num_filters, 3, 1, 1)
        self.conv3 = nn.Conv2d(num_filters, num_filters, 3, 1)
        self.conv4 = nn.Conv2d(num_filters, num_filters, 3, 1)
        self.bn1, self.bn2, self.bn3, self.bn4 = [
            nn.BatchNorm2d(num_filters) for _ in range(4)
        ]

        self.fc1 = nn.Linear(
            num_filters * ((input_shape[1] - 4) * (input_shape[2] - 4)), 1024
        )
        self.fc_bn1 = nn.BatchNorm1d(1024)
        self.fc2 = nn.Linear(1024, 512)
        self.fc_bn2 = nn.BatchNorm1d(512)
        self.fc3 = nn.Linear(512, input_shape[1] * input_shape[2])  # policy head
        self.fc4 = nn.Linear(512, 1)                                # value head

    def forward(self, s):
        s = F.relu(self.bn1(self.conv1(s.view(-1, 1, self.input_shape[1], self.input_shape[2]))))
        s = F.relu(self.bn2(self.conv2(s)))
        s = F.relu(self.bn3(self.conv3(s)))
        s = F.relu(self.bn4(self.conv4(s)))
        s = s.view(-1, self.num_filters * ((self.input_shape[1] - 4) * (self.input_shape[2] - 4)))
        s = F.dropout(F.relu(self.fc_bn1(self.fc1(s))), p=self.dropout, training=self.training)
        s = F.dropout(F.relu(self.fc_bn2(self.fc2(s))), p=self.dropout, training=self.training)

        pi, v = self.fc3(s), self.fc4(s)
        if self.training:
            return F.log_softmax(pi, dim=1), torch.tanh(v)
        return F.softmax(pi, dim=1), torch.tanh(v)
```
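As a sanity check on the fc1 input size: conv1 and conv2 use padding 1, so they preserve the 9x9 spatial size, while the unpadded conv3 and conv4 each shrink it by 2, giving (9 - 4) x (9 - 4) = 5 x 5. A quick pure-Python sketch of the standard conv-size arithmetic (using the layer parameters above):

```python
def conv2d_out(size, kernel=3, stride=1, padding=0):
    # standard formula: floor((n + 2p - k) / s) + 1
    return (size + 2 * padding - kernel) // stride + 1

h = 9                          # board height (same for width)
h = conv2d_out(h, padding=1)   # conv1: 9 -> 9
h = conv2d_out(h, padding=1)   # conv2: 9 -> 9
h = conv2d_out(h)              # conv3: 9 -> 7
h = conv2d_out(h)              # conv4: 7 -> 5
print(h)  # 5, matching input_shape[1] - 4 in fc1
```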
Now that we have a model, we can train it as we would any other deep learning model. Since forward returns log-probabilities during training, you can use torch.nn.functional.nll_loss with integer class labels. If you prefer torch.nn.CrossEntropyLoss, note that it expects raw logits (it applies log-softmax internally), so you would return pi without the log_softmax in that case; either way, the labels should be class indices, not one-hot vectors. The code for the training can be found in alphago_sl.py.
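For intuition, nll_loss with mean reduction simply averages the negative log-probability assigned to the correct class. A pure-Python sketch of what it computes (the numbers here are toy values of my own):

```python
import math

def nll_loss(log_probs, targets):
    # mean negative log-likelihood of the target class, as in F.nll_loss
    return -sum(lp[t] for lp, t in zip(log_probs, targets)) / len(targets)

# two samples over three classes, probabilities (0.7, 0.2, 0.1) and (0.1, 0.8, 0.1)
log_probs = [[math.log(p) for p in row]
             for row in ([0.7, 0.2, 0.1], [0.1, 0.8, 0.1])]
loss = nll_loss(log_probs, targets=[0, 1])
print(round(loss, 4))  # -(log 0.7 + log 0.8) / 2 = 0.2899
```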
I have also trained this with the 19x19 KGS dataset, but the results weren't great. Of course, it's better than a random agent, but that doesn't mean it's good enough to win a real-world game, let alone a beginner-level one. Training on more data and for more epochs might help, but it won't be enough to defeat an average Go player. The only reason our 9x9 model performs well at this point is its lower complexity compared to larger boards; even then, I wouldn't go as far as to say our model could defeat an intermediate Go player.

Why? It seems like something is missing, doesn't it? My answer is experience. When we think about how we humans differ from these deep learning (or any machine learning) models, we might come up with many answers, but the most relevant and important one is our ability to learn from experience. And how do we gain experience? The answer is simple: we spend a lot of time practicing, training, and improving on what we know. Likewise, we introduce the concept of learning from experience, the concept of practice, to computers by using reinforcement learning. The next section is about how we can use reinforcement learning to improve our model through practice, also known as self-play.
[Note]: We are supposed to train the fast rollout policy network, but I won't be doing that and instead will be using the supervised model as the fast rollout policy. However, as I explained before, you can feel free to train the fast rollout if you want to.
I won't be explaining much about reinforcement learning in this article since it won't fit within this limited space, and our main objective here is to teach an agent to play Go. However, I will provide a brief overview of reinforcement learning. Reinforcement learning is very similar to how we humans learn. We are rewarded when we perform well and punished when we make mistakes. In reinforcement learning, the focus is on learning optimal behavior to maximize rewards.
If you're interested in learning more about reinforcement learning, you can explore the following resources:
These will provide you with more comprehensive insights into the topic.
Back to our topic: we know that experience is very important for taking action. Animals have brains to store these experiences, but our agents don't come with a brain built in. To address this, we create a small data structure that stores the relevant information; in our case, we store the state of the board, the action we took, and the reward we received. Now that we have a brain, don't we need to fill it with some relevant data? For that, we do something called self-play, where an agent plays against another.
Collecting experience by playing against a random agent isn't a great idea, since the gathered data would be poor. We could use MCTS, but that would take a huge amount of time, since we need records of a lot of games, probably around 10,000 or more. Agent vs. human would take an eternity, lol. So we play the supervised learning model against itself: essentially, self-play between two instances of the deep learning agent we trained earlier. By the end of this section, we will have covered the training of the last two networks we need for AlphaGo: the policy network and the value network.
In the original paper, the supervised model is improved using self-play, and that is what we will do. The idea is simple: we take the supervised model, run a lot of self-play (around 10,000 games would be fine, but remember that more is better), and store the data. Since I won't be explaining the complete code, let me describe what the data looks like. Say you won a game in self-play; in that case, for all the states you encountered and all the actions you took, we store a +1 reward, and for the states and actions of games you lost, we store a -1 reward. You can then train the policy network just as you would a normal neural network. The code for the self-play and training can be found in alphago_rl.py.
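The labeling above can be sketched with a minimal helper (my own toy function and names, not the repository's code): every move made by the winner gets +1, every move made by the loser gets -1.

```python
def label_experience(states, actions, players, winner):
    """Assign +1 to every move the winner made and -1 to the loser's moves.

    states/actions/players are parallel lists from one self-play game;
    players[i] is the player who chose actions[i] in states[i].
    """
    rewards = [1 if p == winner else -1 for p in players]
    return list(zip(states, actions, rewards))

# toy game record: black moves at s0 and s2, white at s1; black wins
data = label_experience(
    states=["s0", "s1", "s2"],
    actions=["a0", "a1", "a2"],
    players=["black", "white", "black"],
    winner="black",
)
print(data)  # [('s0', 'a0', 1), ('s1', 'a1', -1), ('s2', 'a2', 1)]
```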
Now we need a value network to evaluate board positions. We can use the policy model we trained in the last section to generate data for the value network. Previously, we considered both the state and the action to get the target y value, which is the reward. In this case, since we are only evaluating how good a position is, we only need the state: if a state in self-play resulted in a win, we use +1 as its target label; otherwise, we use 0. The code for the training can be found in alphago_value.py.
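A sketch of that labeling (again a minimal helper of my own, labeling each state from the perspective of the player to move):

```python
def value_targets(players_to_move, winner):
    # +1 if the player to move at that state eventually won the game, else 0
    return [1 if player == winner else 0 for player in players_to_move]

# toy record of a game black won: black to move at s0 and s2, white at s1
print(value_targets(["black", "white", "black"], winner="black"))  # [1, 0, 1]
```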
Now let's bring together all that we have. In the beginning, we trained two policy networks, one of which is small and the other one is comparatively large. The smaller, faster policy network is used in tree search rollouts, while the larger one is optimized for accuracy. After supervised learning, we engage in self-play to improve this network and make it stronger. In our case, we won't be training a different fast rollout network; instead, we'll use the supervised model. With the data from self-play by the policy network, we train a value network, which is an essential part of AlphaGo.
AlphaGo uses a more complex tree search than the one we used before, but the 4 parts of classic MCTS are still relevant to AlphaGo's MCTS. The only difference is that we are using a deep learning network to evaluate positions and nodes. Let's start with the rollout.
```python
def policy_rollout(self, game_state):
    for step in range(self.rollout_limit):
        if game_state.is_over():
            break
        move_probabilities = self.rollout_policy.predict(game_state)[0]
        encoder = self.rollout_policy.encoder
        # greedily play the most probable legal move
        for idx in np.argsort(move_probabilities)[::-1]:
            max_point = encoder.decode_point_index(idx)
            greedy_move = Move(max_point)
            if greedy_move in game_state.legal_moves():
                game_state = game_state.apply_move(greedy_move)
                break

    next_player = game_state.next_player
    winner = game_state.winner()
    if winner is not None:
        return 1 if winner == next_player else -1
    else:
        return 0
```
This is pretty much the same as our rollout in classic MCTS. However, as you can see, there is a rollout_limit in place to avoid the rollout taking up a huge amount of time. Then, we have a policy_probabilities function to compute the policy values for the legal moves available to us.
```python
def policy_probabilities(self, game_state):
    encoder = self.policy.encoder
    outputs = self.policy.predict(game_state)[0]
    legal_moves = game_state.legal_moves()
    if len(legal_moves) == 2:
        return legal_moves, [1, 0]
    encoded_points = [
        encoder.encode_point(move.point) for move in legal_moves if move.point
    ]
    legal_outputs = outputs[encoded_points]
    normalized_outputs = legal_outputs / np.sum(legal_outputs)
    return legal_moves, normalized_outputs
```
Next we have our most important function, select_move.
```python
def select_move(self, game_state):
    for sim in range(self.num_simulations):
        current_state = game_state
        node = self.root
        for depth in range(self.depth):
            if not node.children:
                if current_state.is_over():
                    break
                moves, probabilities = self.policy_probabilities(current_state)
                node.expand_children(moves, probabilities)
            move, node = node.select_child()
            current_state = current_state.apply_move(move)

        # combine the value-network evaluation with a fast-policy rollout
        value = self.value.predict(current_state)
        rollout = self.policy_rollout(current_state)
        weighted_value = (
            1 - self.lambda_value
        ) * value + self.lambda_value * rollout
        node.update_values(weighted_value)

    # take the most-visited move that does not fill one of our own eyes
    moves = sorted(
        self.root.children,
        key=lambda move: self.root.children.get(move).visit_count,
        reverse=True,
    )
    for i in moves:
        if not is_point_an_eye(
            game_state.board, i.point, game_state.next_player
        ):
            move = i
            break
    else:
        move = Move.pass_turn()

    # reuse the chosen subtree for the next search if possible
    if move in self.root.children:
        self.root = self.root.children[move]
        self.root.parent = None
    else:
        self.root = AlphaGoNode()
    return move
```

Note: the original code reset `self.root` to a fresh AlphaGoNode *before* checking whether the chosen move had a child, so the subtree was never actually reused; the check is reordered above.
The idea of select_move is simple: we run a number of simulations. We restrict the game length using depth, so each simulation plays until the specified depth is reached. If the current node has no children, we expand it using the move probabilities from the strong policy network. Note that the policy network returns all the legal moves and their associated probabilities. We store this information in the AlphaGo node using the following function of the Node.
```python
def expand_children(self, moves, probabilities):
    for move, prob in zip(moves, probabilities):
        if move not in self.children:
            self.children[move] = AlphaGoNode(parent=self, probability=prob)
```
If we have children, then we select one from them and play the move. We select using the following function in the AlphaGoNode:
```python
def select_child(self):
    # pick the child that maximizes Q + u
    return max(
        self.children.items(),
        key=lambda child: child[1].q_value + child[1].u_value
    )
```
After each simulation, we evaluate the leaf position with the value network and with a rollout by the fast policy, then combine the two using the equation:

$$V = (1 - \lambda)\, v_{\text{net}} + \lambda\, z_{\text{rollout}}$$

where $\lambda$ is the lambda_value in the code.
After combining, we update the value of AlphaGoNode using the following function that we have in the node:
```python
def update_values(self, leaf_value):
    if self.parent is not None:
        self.parent.update_values(leaf_value)

    self.visit_count += 1
    self.q_value += leaf_value / self.visit_count

    if self.parent is not None:
        c_u = 5
        self.u_value = (
            c_u
            * np.sqrt(self.parent.visit_count)
            * self.prior_value
            / (1 + self.visit_count)
        )
```
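Written out, the u-value assignment in update_values corresponds to (notation mine):

$$u = c_u \, P \, \frac{\sqrt{N_{\text{parent}}}}{1 + N}$$

where $P$ is the prior probability from the policy network (prior_value), $N$ is this node's visit count, $N_{\text{parent}}$ is the parent's visit count, and $c_u = 5$ controls how strongly the prior steers exploration.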
The rest is simple: we take the child that was visited the most, but we skip moves that would fill one of our own eyes. This is partly because our agent doesn't have the ability to pass until it has no other option.
AlphaGo played as white in all matches, with no inherent advantage. It won every game except when it faced itself, where black won. Note that this version of AlphaGo used only 10 simulations, a depth of 30, and a rollout limit of 40 to keep computation manageable. Even though we trade away some accuracy by doing this, it still manages to win about 60 out of 100 games against the supervised agent and against the MCTS agent given more rounds.
I'm thrilled to share that I've been selected for Google Summer of Code (GSoC) at Ml4SCI. I'll be working on developing equivariant neural networks for dark matter substructure with strong lensing.
I'll be mentored by:
Throughout the summer, I'll be documenting my work and sharing the things I learn. I hope you enjoy reading about my experiences and progress.
Strong gravitational lensing is a promising probe of the substructure of dark matter and can help us better understand its underlying nature. However, accurately identifying images containing substructure and differentiating between various dark matter models, such as WIMP particle dark matter versus other well-motivated models including vortex substructure of dark matter condensates and superfluids, is challenging. Deep learning methods, particularly equivariant neural networks, provide a promising approach to addressing these challenges. This project focuses on the further development of the DeepLense pipeline, which combines state-of-the-art deep learning models with strong lensing simulations based on lenstronomy, using equivariant neural networks for the classification and regression of dark matter particle candidates (e.g., CDM, WDM, axions, SIDM).
Gravitational lensing is a phenomenon where the gravity of massive objects, like clusters of galaxies or individual stars, distorts and magnifies the light of more distant objects behind them, allowing us to study the details of early galaxies too far away to be seen with current telescopes. Hubble observations have greatly increased the number of Einstein rings and helped create maps of dark matter in galaxy clusters. The lensed images of crosses, rings, arcs, and more not only provide intriguing visuals but also enable astronomers to probe the distribution of matter in galaxies and clusters of galaxies, as well as observe the distant universe.
One promising method for studying the nature of dark matter is through strong gravitational lensing. By analyzing the perturbations in lensed images that cannot be explained by a smooth lens model, such as those caused by subhalos or line-of-sight halos, researchers can gain insights into the distribution and properties of dark matter.
Equivariant neural networks are a type of neural network that can preserve the symmetries of input data, particularly data with group symmetries. They achieve this through the use of a group representation, which describes how a group acts on a vector space. The convolution operation, which is a key building block of many neural networks used in image and signal processing, is defined based on this group representation.
Compared to standard convolutional neural networks, where filters are learned independently of the input data, in equivariant neural networks, the filters are learned as a function of the group representation. This ensures that the filters are consistent with the symmetries of the input data, making the learning process more efficient and allowing for better generalization.
Different types of group representations, such as rotation, translation, or permutation representations, can be used in equivariant neural networks depending on the type of data being processed and the symmetry properties of the data.
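As a concrete toy illustration of rotational equivariance (a pure-Python sketch of my own, not from any equivariance library): a "valid" convolution whose kernel is itself invariant under 90° rotations commutes with rotating the input, which is exactly the property discussed above.

```python
def rot90(m):
    """Rotate a square 2D list 90 degrees counter-clockwise."""
    n = len(m)
    return [[m[j][n - 1 - i] for j in range(n)] for i in range(n)]

def conv_valid(img, kernel):
    """Plain 'valid' cross-correlation of a square image with a square kernel."""
    n, k = len(img), len(kernel)
    out = n - k + 1
    return [[sum(kernel[a][b] * img[i + a][j + b]
                 for a in range(k) for b in range(k))
             for j in range(out)] for i in range(out)]

# a kernel invariant under 90-degree rotations (a crude isotropic blur)
kernel = [[0, 1, 0],
          [1, 2, 1],
          [0, 1, 0]]
assert rot90(kernel) == kernel

img = [[(i * 5 + j) % 7 for j in range(5)] for i in range(5)]

# equivariance: convolving the rotated image == rotating the convolved image
assert conv_valid(rot90(img), kernel) == rot90(conv_valid(img, kernel))
```

With a generic (non-symmetric) kernel this equality breaks, which is precisely the gap that equivariant networks close by constructing filters from the group representation.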
E(2)-steerable Convolutional Neural Networks (CNNs) are neural networks that exhibit rotational and reflectional equivariance, meaning that their output remains consistent under changes in the orientation and reflection of the input image. This property can be demonstrated by feeding a randomly initialized E(2)-steerable CNN with rotated images and visualizing the network's feature space after a few layers. The feature space consists of a scalar field and a vector field, color-coded and represented by arrows, respectively. The visualization shows that the feature space transforms equivariantly under rotations, and the output remains stable as the input image's orientation changes. To further illustrate this stability, the feature space can be transformed into a comoving reference frame by rotating the response fields back, resulting in a stabilized view of the output.
The invariance of the features in the comoving frame validates the rotational equivariance of E(2)-steerable CNNs empirically. Note that the fluctuations of responses are discretization artifacts due to the sampling of the image on a pixel grid, which does not allow for exact continuous rotations.
Conventional CNNs are not equivariant under rotations, leading to random variations in the response with changes in image orientation. This limits the ability of CNNs to generalize learned patterns between different reference frames. Equivariant neural networks, such as E(2)-steerable CNNs, address this limitation by ensuring that the feature space of the network undergoes a specified transformation behavior under input transformations. As a result, these networks effectively capture symmetries in the data, making them useful for tasks such as studying substructures in strong gravitational lensing images.
In summary, there are four major reasons to favor an equivariant neural network:
The answer to this question is explained in the paper "Equivariance versus Augmentation for Spherical Images" by Jan E Gerken et al. There, they analyze the role of rotational equivariance in convolutional neural networks applied to spherical images. They demonstrated that non-equivariant classification models require significant data augmentation to reach the performance of smaller equivariant networks. They also showed that the performance of non-equivariant semantic segmentation models saturates well below that of equivariant models as the amount of data augmentation is increased. Additionally, they found that the total training time for an equivariant model is shorter compared to a non-equivariant model with matched performance.
Group equivariant Convolutional Neural Networks (G-CNNs) are a natural generalization of convolutional neural networks that reduce sample complexity by exploiting symmetries. The feature maps of a G-CNN are functions over the elements of the group. A naive implementation of group convolution requires computing and storing a response for each group element; for this reason, the G-CNN framework is not particularly convenient for implementing networks equivariant to groups with infinitely many elements.
Steerable CNNs are a more general framework that solves this issue. The key idea is that, instead of storing the value of a feature map on each group element, the model stores the Fourier transform of this feature map, up to a finite number of frequencies.
Steerable CNNs are a neural network architecture equivariant to 2D and 3D isometries (libraries implementing them, such as escnn, also provide equivariant MLPs). Equivariant neural networks guarantee a specified transformation behavior of their feature spaces under transformations of their input. For example, conventional CNNs are designed to be equivariant to translations of their input, meaning that a translation of an image results in a corresponding translation of the network's feature maps. E(n)-equivariant models, including steerable CNNs, are guaranteed to generalize over a broader range of transformations, including translations, rotations, and reflections, and are thus more data-efficient than conventional CNNs.
The feature spaces of E(n)-equivariant steerable CNNs are defined as spaces of feature fields characterized by their transformation law under rotations and reflections. Examples of such feature fields include scalar fields (such as grayscale images or temperature fields) and vector fields (such as optical flow or electromagnetic fields).
Equivariant Transformer (ET) layers are image-to-image mappings that incorporate prior knowledge on invariances with respect to continuous transformation groups. ET layers can be used to normalize the appearance of images before classification (or other operations) by a convolutional neural network.
Harmonic Networks or H-Nets are a type of convolutional neural network (CNN) that exhibits equivariance to patch-wise translation and 360-rotation, which is not the case for regular CNNs, where global rotation equivariance is typically sought through data augmentation. They achieve this by using circular harmonics instead of regular CNN filters, which return a maximal response and orientation for every receptive field patch. H-Nets use a rich, parameter-efficient, and low computational complexity representation, and deep feature maps within the network encode complicated rotational invariants.
There are a few advantages to using Harmonic nets:
The e2wrn (Equivariant Wide ResNet) is a technique to attain equivariance in ResNet. It utilizes the codebase available at Wide ResNet as its foundation and can be constructed using escnn/e2cnn.
Geo Jolly Cheeramvelil*, Sergei V Gleyzer, Michael W Toomey, "Equivariant Neural Network for Signatures of Dark Matter Morphology in Strong Lensing Data", Machine Learning for Physical Sciences 2023