Geo Jolly · Software Engineer, AI/ML enthusiast sharing insights on technology, programming, and research.
(Feed: https://kingjuno.github.io/feed.xml)

Go Proverbs
2025-06-23 · https://kingjuno.github.io/blog/2025/06/23/goproverbs

A while ago, I was watching Dwyrin's proverb series on YouTube, and to be honest, it was quite fun to watch. Since then, I have always wanted to dig into more proverbs but didn't have the time or energy. Lately, I thought it would be fun to study a proverb every week and document it here. So I will be doing that from now on, and I hope you will enjoy it too!

1. Six Die, Eight Live

23/06/2025

The meaning of this proverb is: On the second line, six stones in a row die, but eight stones in a row live. If there are seven stones, then it depends on sente.

[Diagram: Six Die]

Even with sente, White is narrowed down to a straight-three eye space, after which move 4 by Black leaves White simply dead.


[Diagrams: Seven is Unsettled (Black first / White first)]

This shows the intermediate case where there are seven stones in a row on the second line. White lives or dies depending on whose turn is next.
1. If Black plays first, with Black 5, White is dead.
2. If White plays first, White 1 and 3 produce the living formation of four eyes in a row. Marked points are miai.


[Diagram: Eight Live]

No matter who has the sente, White will end up with a four-eye space, which is enough to live. Marked points are miai.

2. Four Die, Six Live

29/06/2025

If the stones on the second line have a base in the corner, then four will die, but six lives. If there are five stones, then it depends on sente.

[Diagram: Four Die]

The White stones will die even if White plays first; with Black 2, White is dead. The importance of the corner can be seen here: more than four stones can be enough to live in the corner, whereas more than six are needed out on the side.


[Diagram: Six Live]

Even if Black plays first, White will live. Marked points are miai.


A New Recipe for Jekyll Comments
2025-03-22 · https://kingjuno.github.io/blog/2025/03/22/jekyll-comment-box

I've always wanted to implement a custom comment box in my Jekyll blog without relying on third-party services like Disqus or the GitHub API. Since I don't get many visitors, security isn't a major concern, and I don't plan to switch from static pages anytime soon, I decided to build my own solution. For this, I'll be using Supabase for the database and Cloudflare Workers for the serverless logic, keeping everything simple and cost-free.

In this blog post, I'll walk you through the process of implementing the comment box, handling database interactions with Supabase, and using Cloudflare Workers to manage the server-side logic. You can choose other databases as well; I just happen to like Supabase. Its free-tier projects do pause after a week or two of inactivity, but overall, I find it reliable.

Note that I assume you're already familiar with the basics of Jekyll, as this post is quite abstract. If you'd like the full code, feel free to ask in the comments.

1. Intro

So, the initial idea I came up with involved exposing keys, but yeah, we can do better than that, and I doubt it's worth discussing anyway. For this one, the steps are pretty much the same as you'd think of if asked to create a chat box, haha:

  • Create a simple backend with Python or Node.js and host it somewhere.
  • Use Supabase for the database because of its simplicity and features.
  • Choose a hosting service. I went with Cloudflare Workers for its free tier. And most importantly 100k requests per day is more than enough. I doubt my blog would use even 0.01% of that.

Here are some alternatives, though:

Service            | Best For                        | Free Plan Limits
-------------------|---------------------------------|---------------------------
Render             | Full backend (Node.js, Python)  | 750 hours/month
Vercel             | Fast serverless functions       | 1M / billing cycle
Railway            | Full backend + free database    | 500 hours/month
Cloudflare Workers | Superfast serverless API        | 100,000 requests/day
Netlify            | Frontend + serverless functions | Similar to Vercel, I guess
2. Implementation

Create Database

So, the first thing we need to do is create a database. Like I mentioned earlier, I'm using Supabase. Go to supabase.io and create an account. After that, create a new project. Once the project is created, you'll be taken to the dashboard. From there, you can create a table using the table editor or the SQL editor. For now, I'll use the table editor.

Click on the table editor, and you'll see an option to "Create a new table." Just give your table a name and add some columns. Here's what mine will look like:

Comments:
  • id (auto)
  • author (text, not nullable)
  • created_at (auto, timestamp with timezone)
  • content (text)
  • post_slug (text)

Ah, also, you'll need to turn off RLS (Row-Level Security) on the table. With RLS enabled and no access policies defined, requests made with the anon key are rejected, so either disable it or write policies that allow reading and inserting comments.

Create API

Steps are pretty simple. Visit the Cloudflare dashboard and create an account. After signing in, you'll see a section called "Workers & Pages." Click on "Create" and start from a template; the "Hello World" template works fine. After that, it should take you to the online editor. If not, there will be an option to open it somewhere nearby.

There are a few things we need to worry about when creating the API. One of these is the CORS policy: we need to allow our domain to access the API. This can be done by adding the following code in the editor:

const corsHeaders = {
    "Access-Control-Allow-Origin": "*",  // Allow all domains (Change if needed)
    "Access-Control-Allow-Methods": "GET, POST, OPTIONS",
    "Access-Control-Allow-Headers": "Content-Type, Authorization"
};

Another thing to consider is filtering out explicit words. Also, it's better to enforce restrictions on the username and message in the backend rather than the frontend, to avoid many issues (you know what I mean, right? hehe). With all this in mind, we can now add the following code to the worker editor:

Expand for full code
const SUPABASE_URL = "XXX";
const SUPABASE_ANON_KEY = "XXX-XXX";

// CORS headers (Fixes CORS issue)
const corsHeaders = {
    "Access-Control-Allow-Origin": "*",
    "Access-Control-Allow-Methods": "GET, POST, OPTIONS",
    "Access-Control-Allow-Headers": "Content-Type, Authorization"
};

// Bad Words List (Extended)
const badWordsList = [
    "word1", "word2", "word3", ...
];

// Normalize text to prevent bypassing (removes spaces & symbols)
function containsBadWords(text) {
    const normalizedText = text.toLowerCase().replace(/[^a-zA-Z0-9]/g, "");
    return badWordsList.some(word => normalizedText.includes(word));
}

// Handles all incoming requests
async function handleRequest(request) {
    if (request.method === "OPTIONS") {
    return new Response(null, { headers: corsHeaders });
    }

    if (request.method === "GET") {
    return fetchComments(request);
    } else if (request.method === "POST") {
    return submitComment(request);
    } else {
    return new Response("Method Not Allowed", { status: 405, headers: corsHeaders });
    }
}

// Fetch comments for a specific post
async function fetchComments(request) {
    const url = new URL(request.url);
    const post_slug = url.searchParams.get("post_slug");

    if (!post_slug) {
    return new Response(JSON.stringify({ error: "Missing post_slug" }), {
        status: 400,
        headers: { "Content-Type": "application/json", ...corsHeaders }
    });
    }

    // Encode the slug so special characters can't break or inject into the query string
    const response = await fetch(`${SUPABASE_URL}/rest/v1/comments?post_slug=eq.${encodeURIComponent(post_slug)}&select=*`, {
    headers: {
        "apikey": SUPABASE_ANON_KEY,
        "Authorization": `Bearer ${SUPABASE_ANON_KEY}`,
        "Content-Type": "application/json"
    }
    });

    const data = await response.json();
    return new Response(JSON.stringify(data), { status: 200, headers: { "Content-Type": "application/json", ...corsHeaders } });
}

// Submit a new comment
async function submitComment(request) {
    try {
    const { author, comment, post_slug } = await request.json();

    // Validate input
    if (!author || !comment || !post_slug) {
        return new Response(JSON.stringify({ error: "Missing fields." }), {
        status: 400,
        headers: { "Content-Type": "application/json", ...corsHeaders }
        });
    }

    if (author.length > 15 || !/^[A-Za-z0-9]+$/.test(author)) {
        return new Response(JSON.stringify({ error: "Invalid name. Must be 1-15 alphanumeric characters." }), {
        status: 400,
        headers: { "Content-Type": "application/json", ...corsHeaders }
        });
    }

    if (comment.length < 5 || comment.length > 500) {
        return new Response(JSON.stringify({ error: "Comment must be between 5-500 characters." }), {
        status: 400,
        headers: { "Content-Type": "application/json", ...corsHeaders }
        });
    }

    // Block bad words
    if (containsBadWords(comment)) {
        return new Response(JSON.stringify({ error: "Inappropriate language detected." }), {
        status: 400,
        headers: { "Content-Type": "application/json", ...corsHeaders }
        });
    }

    // Save comment to Supabase
    const response = await fetch(`${SUPABASE_URL}/rest/v1/comments`, {
        method: "POST",
        headers: {
        "apikey": SUPABASE_ANON_KEY,
        "Authorization": `Bearer ${SUPABASE_ANON_KEY}`,
        "Content-Type": "application/json"
        },
        body: JSON.stringify({ author, content: comment, post_slug })
    });

    return new Response(JSON.stringify({ message: "Comment added" }), {
        status: response.status,
        headers: { "Content-Type": "application/json", ...corsHeaders }
    });

    } catch (error) {
    return new Response(JSON.stringify({ error: "Internal Server Error", details: error.toString() }), {
        status: 500,
        headers: { "Content-Type": "application/json", ...corsHeaders }
    });
    }
}

addEventListener("fetch", event => {
    event.respondWith(handleRequest(event.request));
});

This is a basic implementation of the API. You can add more features like rate limiting, spam protection, avatars, and even replying to comments. But for now, this is enough.

Now, on the frontend, we have two tasks:

  1. Fetch comments

We can do this by using the following code:

const API_URL = "cloudflare-worker-url"; // Replace with the actual Cloudflare Worker URL
const POST_SLUG = "{{post.slug}}"; // Replace with the actual post slug

async function fetchComments() {
    try {
        const response = await fetch(`${API_URL}?post_slug=${POST_SLUG}`);
        if (!response.ok) {
            throw new Error("Failed to fetch comments");
        }
        const data = await response.json();
        console.log(data); // Logs the JSON response
    } catch (error) {
        console.error("Error fetching comments:", error);
    }
}

For the post slug, you can utilize the front matter of the post. To do that, add a `slug` to the front matter of the blog, like this for example:

---
layout: post
title: "Title"
date: 2025-03-22
last_updated: 2025-03-22
category: Jekyll
tags: [jekyll, comment-box, cloudflare]
slug: jekyll-comment-box
---

Then, to access it in the comment box, you can simply use {{post.slug}}.

2. Submit comments

To do this, you can do something similar to:

async function submitComment(author, comment) {
    try {
        const response = await fetch(API_URL, {
            method: "POST",
            headers: { "Content-Type": "application/json" },
            body: JSON.stringify({ author, comment, post_slug: POST_SLUG })
        });
        
        const result = await response.json();
        if (!response.ok) {
            throw new Error(result.error || "Error submitting comment");
        }
        
        console.log("Comment submitted successfully");
        fetchComments(); // Refresh comments after submission
    } catch (error) {
        console.error("Error submitting comment:", error);
    }
}

3. Conclusion

And that's it! You now have a custom comment box for your Jekyll blog. You can further customize it by adding features like markdown, replies, and even user authentication. The possibilities are endless, and you can make it your own ✌(-‿-)✌

My GATE Journey
2025-03-22 · https://kingjuno.github.io/blog/2025/03/22/gatejourney

[Photo: IIT Bombay, by Zoshua Colah]

After completing my Bachelor's, I suddenly decided, without any clear reason, to prepare for GATE instead of going abroad. I had always dreamed of pursuing higher studies abroad, and honestly, I wasn’t all that interested in campus placements. But one day, I just made the decision to start preparing for GATE—looking back now, it was one of those random, all-of-a-sudden choices that somehow worked out.
When I started, I had pretty much forgotten most of what I had learned in college (except for math and programming, thankfully). I was literally starting from scratch, but once I got into the rhythm of things, it didn’t feel as difficult as I had imagined. However, the theory-heavy subjects, like Operating Systems, Computer Networks (except for the Data Link Layer, which for some reason, I actually enjoyed), and Compilers, were a real struggle for me. To be honest, I just skipped most of the compilers, because I couldn't bring myself to study them.
Throughout my college life, I always enjoyed computer science and did fairly well. I wasn’t a topper, but I was definitely above average. So, when I began my GATE journey, I didn't feel like I was totally out of my depth. Fast forward to GATE 2025: I’m proud to share that I secured an All India Rank of 189 and have been selected for an MTech program at IIT Bombay. It feels surreal, but looking back, I’m glad I took the leap to prepare for GATE instead of sticking to my original plan of going abroad.

1. Preparation

I started my preparation around the end of May, but by then I was already good at discrete math (especially combinatorics), algorithms, data structures, and linear algebra. Initially, I had planned to prepare on my own without any coaching. Honestly, had I stuck to that, it would've been a disaster, given the amount of syllabus we have and the equal amount of procrastination I had. Luckily, a friend mentioned that GoClasses was conducting a scholarship test in a week or two. I casually checked out a few of their videos, ended up really liking them, and decided to give the test a shot. The good part is I got the 90% scholarship! Since the test was math-based, I found it quite easy. And a 90% discount is a massive deal, so I just went for it — and that's how my journey with GoClasses began.

Now, one thing about GoClasses is that their content is huge (not soooo huge, but huge). At first glance, it feels overwhelming, and that's not so pleasant to look at. But once I got into it, I realized how good the content actually is. The only subject I didn't enjoy was compilers; everything else was pretty smooth for me.

For the subject order, I simply followed the schedule provided by GoClasses, and I was able to complete the syllabus by mid-November. I did skip about half of the compilers module, which ended up costing me later during the exam 😅. But it wasn't too big of a deal since my main goal was to get into an MS program, and I was confident I could score enough for that.
When I first started studying, I could barely manage 2 hours a day, and that really frustrated me. I then started using the Pomodoro technique and gradually increased my daily study time to around 4–5 hours. But even that isn't enough, especially if you're aiming for a rank under 500. Eventually, I figured out the real problem — my phone. I installed Yeolpumta, which basically locks your phone, and if you try to use it, your study progress resets. That feature alone helped me stay serious. I didn't even want to pause it and break the flow. Slowly, my daily average went up to 7 hours, which I feel is quite decent, considering I sleep 8 hours.

1.1 Study Plan

My daily routine was pretty strict after June. I'd wake up at 6:30 AM, go visit my grandparents for a quick morning walk, take a bath, and then start studying by 7:45 AM. I'd go on till 12:30 PM with small 5-minute breaks in between. After lunch, I'd chill a bit by watching some series till around 2 PM, then get back to studying till 4 or 5 PM. After that, I'd go for swimming, come back, and study again till 7 PM. Post dinner, I'd watch something again till 9 PM and finally squeeze in one last study session till 10:30 PM before going to bed. This routine was so strict that I felt like some kind of robot.

1.2 Go Classes

I really liked the content from GoClasses. The videos are well-structured, and the explanations are clear. Sometimes, I would skip certain topics — like Myhill-Nerode, the pumping lemma, or anything that felt a bit dry — but overall, except for compilers, I found everything quite good. If you start just six months before the exam, the syllabus can feel overwhelming. So, I'd recommend starting at least nine months in advance to make your preparation more manageable.

I haven't watched any other courses, so I can't or won't really compare GoClasses with others — but honestly, I found Go to be really good. Before buying, check out their free videos on YouTube and their website. You'll most likely enjoy the content and teaching style. If not, that's totally fine — in the end, go with whatever suits your learning style.

Coaching definitely helps, but your own effort matters the most. If you browse Reddit, you might find some people saying GoClasses isn't good, it's huge, the content is bad, this, that, blah blah, but I think that's mostly personal preference. There are plenty of students from GoClasses who have secured good ranks. What you should understand is that no coaching institute is perfect, and there's a clear limit to how much they can help you. And most importantly, you and I are not in school anymore to be spoon-fed — it's important to take responsibility for your learning. If something isn't clear, take the initiative to look it up, ask questions in online communities, or watch other lectures on YouTube. The resources are out there — it's up to you to make the most of them.

1.3 Test Series

I took both the GoClasses test series and the Zeal test series. To be honest, I didn't do many topic-wise tests until October, which I really regret now. Both Zeal and Go are good — I didn't like Zeal much for topic-wise tests, but their full-length tests are excellent and really help with time management. They're slightly tougher than the actual GATE exam. On the other hand, GoClasses tests are closer to the actual GATE level. GateOverflow's full-length test series is also good, though quite challenging. I haven't taken any other test series apart from these.

The most important thing with test series is that after taking a certain number of mocks, you should start focusing more on time management. My personal strategy was to first solve the questions I found easy, then move on to medium-level ones, and finally recheck my answers. If I had time left, I would attempt the harder questions. This approach might work for some people but not for others, so it's important to find a strategy that works best for you.

1.4 Communities

There are plenty of Telegram groups available. You can even make a small group with friends to discuss and ask doubts. GoClasses also has a public Telegram group where you can ask questions — and usually, someone responds pretty quickly. You should also check out the GateOverflow website (the holy website for GATE) and its Telegram group. For math, you can always ask your doubts on the famous Math Discord server or Math Stack Exchange.

1.5 Regrets

I regret not doing the topic-wise tests before October. I should've started them earlier. I also didn't revise the subjects properly, so I ended up wasting a lot of time after November. The thing with topic-wise and subject-wise tests is that they help you identify weak areas and revise them effectively. So please don't skip them! And also start your studies as early as possible.
I also regret not taking care of my health. During the exam week, I had a high fever and headache because I neglected my health. I had to rest for the entire week and had to take heavy medications right before the exam. So please take care of your health.

2. Interview Experiences

I have applied for the following programs:

  • IISC CDS MTech (made it to the provisional list)
  • IIT Madras CS (Got Offer)
  • IIT Bombay CS MS RA and TA (Got Offer)
  • IIT Bombay CS MTech RA (Got Offer)
  • IIT Bombay CMINDS MS RA (Made it into provisional/waiting)
  • IIT Delhi Minds MTech (Got offer)
  • IIT Delhi MINDS MS RA (Cleared written; didn't do interview)

I didn't apply to IIT Kanpur MS because I wasn't interested in traveling such a long distance, and I preferred MTech over MS there. As for IIT Delhi, I applied mainly because SCAI offered multiple exam centers and online interviews. Other than that, I wasn't particularly interested in any other colleges.
Also, I have received offers from IIT Kanpur and IIT Kharagpur for MTech.

2.1 IISC

CDS MTech

The interview was on the 16th of April. In the morning, we went through verifications and then had a written test. These are the questions they asked:

  • How many ways to reach H7 if you start from A1 in chess (only up and right movements allowed)?
  • Find the graph of \( f(2x) \) given \( f(x) \).
  • Find the graph of \( x^2 + \sin(x) \).
  • \( P(X + Y > 1 \mid Y \geq X) \) where \( X \) and \( Y \) are uniform in \([0,1]\).
  • Dimension of the subspace of \( \{(x, y, z) \mid x - 2y = 0\} \).
  • Degree of the characteristic equation of \( A_{n \times n} \) is what?
  • You are given \( P(A \cup B) \), \( P(A \cap B) \), find \( P(A) \) when \( P(A) > P(B) \) and \( A \) and \( B \) are independent.
  • Arrange the word 'Indian' such that no two vowels are together.
  • Right rotate a 2D array (coding).
  • Divide numbers into bins of size 10 (coding).
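Two of these counting questions are easy to sanity-check with a few lines of Python. This is just a quick sketch, reading the chess question as needing 7 right moves (A to H) and 6 up moves (1 to 7):

```python
from itertools import permutations
from math import comb

# Q1: lattice paths from A1 to H7 with only right/up moves:
# choose where the 6 up-steps go among 13 total steps.
paths = comb(7 + 6, 6)
print(paths)  # 1716

# Q8: arrangements of 'INDIAN' with no two vowels (I, I, A) adjacent,
# checked by brute force over the 180 distinct permutations.
vowels = set("AI")
count = sum(
    all(not (a in vowels and b in vowels) for a, b in zip(word, word[1:]))
    for word in set(permutations("INDIAN"))
)
print(count)  # 36
```

The closed forms agree: \( \binom{13}{6} = 1716 \) for the paths, and 3 consonant orders × \( \binom{4}{3} \) vowel slots × 3 vowel orders = 36 for 'INDIAN'.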

The questions were relatively easy. After that, we had the interview. I guess most of the time we spent chatting about general stuff and solving a few questions. We talked a bit about my background and the CDS program. Then, the interviewer started asking questions on linear algebra and algorithms, where I had to explain the logic and write pseudocode on the board. One question was about using dynamic programming (DP) for matrix chain multiplication. Overall, the interview went well, although I wasn't able to write the entire pseudocode. Anyway, I made it to the provisional list, even though I didn't make it to the final list.

CSA and CDS Research

I attended the CDS interview but skipped CSA because by then, I had already been selected for IIT Madras, which was my first preference. In the CDS interview, you first write a test, and then the interview. The written test is so easy that I don't even understand why they conduct it in the first place. The interview was quite long and stretched until 8 pm, and I was already tired at that point. The questions were mostly from linear algebra, basic ML, and some coding. Since I didn't want to face Bangalore traffic, I decided to skip CSA the next day.

2.2 IIT Madras

IIT Madras releases a shortlist, and I think for the general category, a score of 600-650 is a safe zone. We need to travel to Madras for the interview. We had the written test on the 5th of May. The test was relatively easy and based on the GATE syllabus. I won't list the questions here since I don't remember them exactly, but it was an easy GATE-level paper. I have a feeling that the cutoffs are high; probably 25/30 is required to get shortlisted. There's no official information on the cutoff, just a guess that's most probably correct. Around 50 people out of ~350-400 were shortlisted. Some had their interviews on the same day, and others had them the next day. I had both my interviews the next day. So, for the rest of the day, my friends and I just roamed around and studied a bit of linear algebra, since our preference was Intelligent Systems.

My theory interview was first. For the theory interview, you have the freedom to choose the subject. I chose discrete math, and they asked questions on combinatorics, functions, and relations. I answered almost all the questions, and very few with some hints. Overall, the interviewers were very friendly and didn't expect me to answer everything but focused more on how I approached the problems.

Then, I had my interview for Intelligent Systems. It started with an introduction and a question about which subject I would prefer. I chose linear algebra. For the Intelligent Systems track, there's a coding question first. Many candidates get easy questions such as reversing an array, and mine was about counting one-child nodes in a binary tree. I solved it easily, wrote the code on the board, and explained it. After that, they started asking questions from linear algebra. I answered most of them except for the last one, where I had to take a hint. It was about finding eigenvalues and eigenvectors without a pen and paper, using properties. I solved most of the eigenvalues and two eigenvectors, but the last one was tricky for me. Still, I managed to answer it with a small hint from the interviewer. Overall, the interview went well, but I wasn't very confident about the Intelligent Systems part, although I felt confident about the theory panel. They released the provisional list a week later, and I was on both panels.

2.3 IIT Bombay

From Madras, I flew to Bombay the next day, and we had the written test for MS based on the panel I chose earlier. I chose Intelligent Systems. The paper was a little harder than GATE and required knowledge of additional subjects. But I think if you know the GATE math well, it is easier to clear.

The next day, we had coding tests for MTech and MS RA. It was easy for me, and I believe you will get shortlisted for an interview if you can solve 3 out of 5 questions. The questions mostly covered arrays and recursion:

  • Check if a given number is an Armstrong number.
  • Count the number of each distinct alphabet that occurs consecutively in a string and print the character followed by the count.
  • Find the number of points in a matrix where the value is strictly smaller than all adjacent values (left, right, up, down).
  • Determine if the second matrix is a submatrix of the first matrix. If found, output the position of the leftmost element of the first row of the submatrix in the original matrix; otherwise, return -1.
  • Write a recursive function to count the number of ways to arrange K 1's in N spaces such that no two 1's are adjacent.
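The last question has a neat two-case recursion. Here is a sketch (the function name is mine), under the usual reading that we count length-N binary strings with exactly K non-adjacent 1's:

```python
def arrangements(n, k):
    """Ways to place k 1's in n slots with no two 1's adjacent."""
    if k == 0:
        return 1          # nothing left to place: one way (all zeros)
    if n < 2 * k - 1:
        return 0          # not enough room even when packed tightly
    # Case 1: first slot is 0 -> place k 1's in the remaining n-1 slots.
    # Case 2: first slot is 1 -> slot 2 must be 0; place k-1 1's in n-2 slots.
    return arrangements(n - 1, k) + arrangements(n - 2, k - 1)

print(arrangements(4, 2))  # 3, namely positions {1,3}, {1,4}, {2,4}
```

The closed form is \( \binom{n-k+1}{k} \), which is handy for checking: arrangements(10, 3) should equal \( \binom{8}{3} = 56 \).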

MS TA

I messed up this one. But I still solved questions related to linear algebra, like explaining why we take \( |A - \lambda I| = 0 \) when finding eigenvalues. One question was very similar to: "Prove that the Binomial distribution converges to the Poisson distribution as the number of trials \( n \to \infty \), the probability of success \( p \to 0 \), and the product \( \lambda = np \) remains constant." There was also a question on best-fit regression.
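For reference, here is a sketch of the limit that the Binomial-to-Poisson question is after, writing \( p = \lambda/n \):

```latex
P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}
         = \underbrace{\frac{n(n-1)\cdots(n-k+1)}{n^k}}_{\to\,1}
           \cdot \frac{\lambda^k}{k!}
           \cdot \underbrace{\Bigl(1-\frac{\lambda}{n}\Bigr)^{n}}_{\to\,e^{-\lambda}}
           \cdot \underbrace{\Bigl(1-\frac{\lambda}{n}\Bigr)^{-k}}_{\to\,1}
         \;\longrightarrow\; \frac{\lambda^k e^{-\lambda}}{k!}.
```

Each bracketed factor converges separately as \( n \to \infty \) with \( k \) fixed, giving the Poisson pmf.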

MTech RA, MS RA

These interviews were based on the project you choose. I chose a generative AI project for both MTech RA and MS RA, and a CS-101 course for MTech RA. If you choose projects like these, there will be an additional Python test. The interviews themselves went okay, though they do expect you to know at least something about the project you choose.

CMINDS

The written test will be tough, believe me! But like the other interview tests, if you know the basics, you will at least get shortlisted. I was shortlisted for the interview. My interview started with an introduction, followed by questions on probability and linear algebra, which were okay. Then, they asked me about computer science topics like the difference between a Trie and a Tree, which I couldn't answer. The second part of the interview with CS questions was a bit terrible.

2.4 IIT D

To be honest, I wasn't interested in going to IIT Delhi (due to personal reasons), but I applied to the ScAI department because it seemed cool. I wrote their exam at the IIT Bombay center while I was there for the IITB interviews. You can choose from multiple centers, including Mumbai, Bangalore, Delhi, Hyderabad, and Kolkata. The exam was okay, with questions on aptitude, math, algorithms, and ML. I felt that the math section was a bit harder than the others.

3. Resources

These are some resources that you can check out if you're interested in GATE preparation or interviews:

4. End Notes

I would like to sincerely thank my parents, friends, and teachers for their constant support throughout this journey. A special thanks to Deepak Sir, Sachin Sir, and Arjun Sir from Go Classes for their invaluable guidance and scholarship support. I am also grateful to my friends from the GATEFATE group for their encouragement, discussions, and companionship along the way. I have accepted the offer from IIT Bombay MTech RA, because it perfectly aligns with what I want to do in the future.
Also, note that the labs from the DS-AI department at IIT Madras are no longer available to CS students. So, if you are planning to join any of those labs or are interested in professors from the DS-AI department, please write the GATE DA exam for MTech. For MS, you can apply with a CS score.

[Screenshots: IITM selection / COAP selection]

By the way, I haven't included everything in this blog — so if you have any doubts, feel free to ask in the comments or via email/LinkedIn.

Replicating AlphaGo
2024-02-29 · https://kingjuno.github.io/blog/2024/02/29/replicating-alphago

Info: You can find the code for this article on GitHub.

1. A Little Bit of History

Go, also known as Baduk in Korea, has a long history spanning over 2500 years and is loved by many in East Asia. In this game, players take turns placing black and white stones on a grid, aiming to control territory and capture their opponent's stones. The game gets more complicated as the board size increases, but typically, players use 9x9, 13x13, or 19x19 boards.

In 2016, DeepMind, now Google DeepMind, made waves when AlphaGo defeated the legendary Lee Sedol. What made AlphaGo special was its use of advanced technology like deep neural networks and reinforcement learning, along with smart algorithms like Monte Carlo Tree Search (MCTS). By training on lots of Go data and playing millions of games against itself, AlphaGo learned to evaluate board positions and make strategic decisions really well.

2. Creating an Environment

The environment for this is heavily inspired by the repository Deep Learning and the Game of Go, which accompanies the book of the same name by Max Pumperla and Kevin Ferguson. Some changes have been made to the original code to make it more readable and understandable; regardless, the original code contains more information and is much more detailed. In this article, we focus only on the bare minimum needed to train an AlphaGo on a 9x9 board. The code for the environment can be found in the alphago/env folder. If you are planning to code along, create a folder alphago and copy the env folder from the repository into it.

Initially, I thought of going directly to the AlphaGo code, but I later decided it would be better to cover details like MCTS and neural networks separately, so the code is easier for a beginner to follow. I would highly recommend going through the book Deep Learning and the Game of Go; it's a great book with a lot of information for beginners, while this blog assumes you are already familiar with the basics of Go, PyTorch, and deep learning.

Quick overview of environment

  1. Files in env folder
      alphago/
      └── env/
          ├── generate_zobrist.py
          ├── go_board.py
          ├── gotypes.py
          ├── scoring.py
          ├── utils.py
          └── zobrist.py
                          
    1. generate_zobrist.py This file is used to generate Zobrist hash for the board. Run this file and copy output to zobrist.py, or python3 generate_zobrist.py > zobrist.py from the alphago/env directory.

    2. go_board.py This file contains the GoBoard class which is our environment.

    3. zobrist.py This file contains the Zobrist hash for the board which is generated using generate_zobrist.py.

  2. Creating game state

    Use the following snippet to create a 9x9 board.

    >>> from alphago.env import go_board
    >>> from alphago.env.utils import print_board, print_move
    
    >>> board_size = 9
    >>> game = go_board.GameState.new_game(board_size=board_size)
  3. Rendering board
    >>> print_board(game.board)
    9  .  .  .  .  .  .  .  .  . 
    8  .  .  .  .  .  .  .  .  . 
    7  .  .  .  .  .  .  .  .  . 
    6  .  .  .  .  .  .  .  .  . 
    5  .  .  .  .  .  .  .  .  . 
    4  .  .  .  .  .  .  .  .  . 
    3  .  .  .  x  .  .  .  .  . 
    2  .  .  .  .  .  .  .  .  . 
    1  .  .  .  .  .  .  .  .  . 
       A  B  C  D  E  F  G  H  I
  4. Taking actions

    Movements are achieved using the alphago.env.go_board.Move object. There are 3 types of moves available: play(point), pass_turn(), and resign() (pass is a reserved keyword in Python, hence pass_turn). To apply a move, use the apply_move method on the game object.

    >>> from alphago.env.go_board import Move
    >>> from alphago.env.gotypes import Point
    >>> point = Point(row=3, col=4)
    >>> move = Move.play(point)
    >>> # You can also do Move.pass_turn() or Move.resign()
    >>> print_move(game.next_player, move)
    Player.black D3
    >>> game = game.apply_move(move)
    >>> print_board(game.board)
      9  .  .  .  .  .  .  .  .  . 
      8  .  .  .  .  .  .  .  .  . 
      7  .  .  .  .  .  .  .  .  . 
      6  .  .  .  .  .  .  .  .  . 
      5  .  .  .  .  .  .  .  .  . 
      4  .  .  .  .  .  .  .  .  . 
      3  .  .  .  x  .  .  .  .  . 
      2  .  .  .  .  .  .  .  .  . 
      1  .  .  .  .  .  .  .  .  . 
         A  B  C  D  E  F  G  H  I
3.

We've figured out how to handle tic-tac-toe or a 5x5 Go board by brute force, since those games only have a few hundred thousand possible positions. But what about games like Go or chess? They have more positions than there are atoms on Earth! So we use something called Monte Carlo Tree Search (MCTS) to evaluate a game state without any handcrafted strategy. The idea is pretty simple: we estimate how good a position is by randomly sampling continuations. We build a tree of the possible moves from a given position, but we can't include every continuation, because there are just way too many of them for our systems to handle.

So, MCTS basically has four parts:

  • Selection
  • Expansion
  • Simulation
  • Backpropagation
MCTS
Figure 2: The above figure illustrates the steps of the Monte Carlo Tree Search (MCTS) algorithm. Image credit: Robert Moss, CC BY-SA 4.0, Link.

Let's break down every step in this.

Selection

Starting from the root node, we select the child node with the highest UCT (Upper Confidence Bound 1 applied to trees) value. The UCT value is calculated as follows:

$$UCT = \frac{Q}{N} + c \sqrt{\frac{\log{P}}{N}}$$

Where:

  • \(Q\) is the number of wins after the move.
  • \(N\) is the number of simulations after the move.
  • \(P\) is the number of simulations after the parent node.
  • \(c\) is the exploration parameter.

The first part of the equation is the exploitation term, and the second part is the exploration term. The exploration parameter \(c\) is a hyperparameter that controls the balance between exploration and exploitation: a higher value of \(c\) leads to more exploration, and a lower value leads to more exploitation. The reason we need to explore is simple:
The agent has to exploit what it has already experienced in order to obtain a reward, but it also has to explore in order to make better action selections in the future.
(Reinforcement Learning: An Introduction by Richard S. Sutton and Andrew G. Barto, Chapter 1)
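Before looking at the implementation, here is a standalone sketch of the UCT formula with made-up statistics (the helper name uct and the numbers are purely illustrative, not part of the repository):

```python
import math

def uct(wins, visits, parent_visits, c=1.4):
    # Exploitation (win rate) plus exploration (uncertainty bonus).
    return wins / visits + c * math.sqrt(math.log(parent_visits) / visits)

# A well-explored strong move vs. a barely explored one (100 parent rollouts).
print(round(uct(60, 80, 100), 3))  # 1.086: visited often, modest bonus
print(round(uct(1, 2, 100), 3))    # 2.624: rarely visited, big bonus
```

Note how the barely explored move scores higher despite its worse win rate; that is exactly the behavior the exploration term is there to produce.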

The implementation is as follows:

def select_child(self, node):
    """Selection: pick the child with the highest UCT score."""
    total_rollouts = sum(child.num_rollouts for child in node.children)
    log_total_rollouts = math.log(total_rollouts)

    best_score = -1
    best_child = None
    for child in node.children:
        # Exploitation term: win rate from the current player's perspective.
        win_pct = child.winning_pct(node.game_state.next_player)
        # Exploration term: favors rarely visited children.
        exploration_factor = math.sqrt(
            log_total_rollouts / child.num_rollouts
        )
        uct_score = win_pct + self.temperature * exploration_factor
        if uct_score > best_score:
            best_score = uct_score
            best_child = child
    return best_child

Expansion

Once we have selected a node, we need to expand it: we add one of its unvisited moves to the tree as a new child node, chosen at random.

Simulation

Once we have expanded the node, we simulate the game from that node to the end with random moves and see who wins. This random playout is also called a rollout, and it evaluates the current state of the game.

Backpropagation

Once we have the result of the simulation, we will update the nodes on the path from the root to the selected node. We will update the number of wins and the number of visits to each node.

MCTS Implementation

class MCTSNode:
  def __init__(self, game_state, parent=None, move=None):
      self.game_state = game_state
      self.parent = parent
      self.move = move
      self.children = []
      self.num_rollouts = 0
      self.unvisited_moves = game_state.legal_moves()
      self.win_counts = {
          Player.black: 0,
          Player.white: 0,
      }

  def add_random_child(self):
      # Expansion: pick one unvisited move at random and add it to the tree.
      index = random.randint(0, len(self.unvisited_moves) - 1)
      new_move = self.unvisited_moves.pop(index)
      next_state = self.game_state.apply_move(new_move)
      child = MCTSNode(next_state, self, new_move)
      self.children.append(child)
      return child

  def record_win(self, winner):
      # Backpropagation: update this node's statistics with a rollout result.
      self.win_counts[winner] += 1
      self.num_rollouts += 1

  def fully_expanded(self):
      return len(self.unvisited_moves) == 0

  def is_terminal(self):
      return self.game_state.is_over()

  def winning_pct(self, player):
      return float(self.win_counts[player]) / float(self.num_rollouts)

Next, we define a new Agent which uses this data structure and the MCTS algorithm to play Go. To implement select_move, the agent needs all four steps:

  • Selection
  • Expansion
  • Simulation
  • Backpropagation

I have already discussed the first one, and the rest are relatively easy once you understand how MCTS works. The complete implementation of the MCTS agent can be found in the alphago/agents/mcts_agent.py file.
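To see all four steps working together outside of Go, here is a self-contained toy MCTS for the game of Nim (take 1-3 stones each turn; whoever takes the last stone wins). Everything here, NimState, Node, and mcts, is illustrative scaffolding for this post, not code from the repository:

```python
import math
import random

class NimState:
    """Normal-play Nim: take 1-3 stones; whoever takes the last stone wins."""
    def __init__(self, stones, player=1):
        self.stones, self.player = stones, player

    def legal_moves(self):
        return list(range(1, min(3, self.stones) + 1))

    def apply(self, take):
        return NimState(self.stones - take, -self.player)

    def is_over(self):
        return self.stones == 0

    def winner(self):
        # The player who just moved took the last stone.
        return -self.player

class Node:
    def __init__(self, state, parent=None, move=None):
        self.state, self.parent, self.move = state, parent, move
        self.children, self.wins, self.visits = [], 0, 0
        self.untried = state.legal_moves()

    def uct_child(self, c=1.4):
        # Selection rule: exploitation (win rate) + exploration bonus.
        return max(self.children, key=lambda ch: ch.wins / ch.visits
                   + c * math.sqrt(math.log(self.visits) / ch.visits))

def mcts(root_state, iterations=500, seed=0):
    random.seed(seed)
    root = Node(root_state)
    for _ in range(iterations):
        node = root
        # 1. Selection: walk down fully expanded nodes via UCT.
        while not node.untried and node.children:
            node = node.uct_child()
        # 2. Expansion: try one random unvisited move.
        if node.untried:
            move = node.untried.pop(random.randrange(len(node.untried)))
            child = Node(node.state.apply(move), node, move)
            node.children.append(child)
            node = child
        # 3. Simulation: random playout to the end of the game.
        state = node.state
        while not state.is_over():
            state = state.apply(random.choice(state.legal_moves()))
        winner = state.winner()
        # 4. Backpropagation: credit each node from the perspective of
        #    the player who made the move leading into it.
        while node is not None:
            node.visits += 1
            if node.parent is not None and winner == node.parent.state.player:
                node.wins += 1
            node = node.parent
    # Final choice: the most-visited move.
    return max(root.children, key=lambda ch: ch.visits).move

print(mcts(NimState(3)))  # 3 -- taking all three stones wins immediately
```

With three stones on the table, the move "take 3" ends the game with a win, so its rollouts always succeed and its visit count quickly dominates; the same machinery scales up to Go once the state and moves come from the Go environment instead.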

    4.

    Bringing Neural Networks

    MCTS
    Figure 2: Neural network training pipeline and architecture. From "Mastering the game of Go with deep neural networks and tree search" (DeepMind) .

    In the original paper, we train 4 different neural networks:

    1. Rollout Policy, \(p_\pi\)
    2. SL Policy Network, \(p_\sigma\)
    3. RL Policy Network, \(p_\rho\)
    4. Value Network, \(v_\theta\)

    First, we have a fast rollout policy, a smaller network than the rest, used during tree search for speed. In our case, since the board is 9x9, I won't be training a separate rollout policy network, but you are free to train one; for that purpose you can use the GoNet in alphago/networks. Next we have the supervised learning (SL) policy network, which is trained on records of real games. Then we improve this network through self-play using reinforcement learning. Lastly, we have a value network that predicts the outcome of a position for the sake of strategic decision-making. Now let's start from zero to train an AlphaGo.

      Encoder

      Here we will implement a simple one-plane encoder. We will be using a 1x9x9 matrix to represent the board. In this matrix, 0 represents empty points, 1 represents the points occupied by the current player, and -1 represents the points occupied by the opponent. I won't be going into any detailed explanation of the implementation since it is pretty straightforward. The encoder is implemented in alphago/encoders/oneplane.py file. The following functions are relatively important in the encoder.

      Encoding the board

      def encode(self, game_state):
          board_matrix = np.zeros(self.shape())
          next_player = game_state.next_player
          for r in range(self.board_height):
              for c in range(self.board_width):
                  p = Point(row=r + 1, col=c + 1)
                  go_string = game_state.board.get_go_string(p)
                  if go_string is None:
                      continue
                  if go_string.color == next_player:
                      board_matrix[0, r, c] = 1
                  else:
                      board_matrix[0, r, c] = -1
          return board_matrix

      Encoding the point

      def encode_point(self, point):
          return self.board_width * (point.row - 1) + (point.col - 1)

      Decoding the point index

      def decode_point_index(self, index):
          row = index // self.board_width
          col = index % self.board_width
          return Point(row=row + 1, col=col + 1)
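The point index math above is plain row-major flattening. A self-contained round trip, using a stand-in Point and a hardcoded 9-wide board rather than the repository's classes, looks like this:

```python
from collections import namedtuple

Point = namedtuple("Point", ["row", "col"])
BOARD_WIDTH = 9  # assuming a 9x9 board

def encode_point(point):
    # Board coordinates are 1-indexed; the flat index is 0-indexed.
    return BOARD_WIDTH * (point.row - 1) + (point.col - 1)

def decode_point_index(index):
    return Point(row=index // BOARD_WIDTH + 1, col=index % BOARD_WIDTH + 1)

idx = encode_point(Point(row=3, col=4))
print(idx)                       # 21
print(decode_point_index(idx))   # Point(row=3, col=4)
```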

      Even though there are different encoders we can try, I won't be discussing them because the current one we have works perfectly for 9x9 boards. In case you want to try creating one, please refer to the Extra Notes.

      Generating Data

      Now that we have our encoder, we can start generating data for our neural network. We will use the MCTS agent to generate data for us. We will play a game with the MCTS agent and store the encoded board state and the move made by the MCTS agent. We will then use this data to train our neural network. You can find the implementation of the data generation in archives/generate_mcts_game.py file. Additionally, I have rewritten the code, which now uses multiprocessing to play multiple games at the same time. You can find that in archives/generate_mcts_game_mp.py file.
      But this approach is troublesome: generating enough data this way takes a lot of time, and we can obtain games of equal or better quality from real-world Go records. Considering that, the best option we have right now is to download datasets that other people have already created; most of them are game records of top amateur games played in competitions or on servers like OGS. Here are some links to datasets:

      1. K Go Server amateur 19x19
      2. K Go Server 4Dan 19x19
      3. cwi.nl contains a huge collection of go games
      4. Juno contains some datasets hosted by me.
      All of these datasets are in SGF format, and we need to convert them to tensors to train the neural networks. I have implemented a dataset compatible with the PyTorch DataLoader in the alphago/data folder. The idea behind this implementation is simple:

        1. Download the dataset, if not present.
        2. Sample X games (i.e. X sgf files).
        3. Save each board state and move to the dataset (using multiprocessing).

        [Note]: If you have your own dataset, paste the folder containing the sgf files into your dataset folder and pass the name of the folder to the dataloader.

          This is how you use the GoDataSet.

          from alphago.data.dataset import GoDataSet
          
          BOARD_SIZE = 9
          print(GoDataSet.list_all_datasets()) #prints all available datasets
          training_dataset = GoDataSet(
            encoder = "oneplane", # encoder used
            game = "go9", # Dataset name, 
            no_of_games = 8000, # number of games
            dataset_dir="dataset", # directory
            seed = 100,
            redownload = False,
            avoid = [] # files to avoid
          )
          test_dataset = GoDataSet(
            encoder="oneplane", no_of_games=100, avoid=training_dataset.games
          )
          train_loader = torch.utils.data.DataLoader(
              training_dataset, batch_size=64, shuffle=True
          )

          Simple Supervised Learning

          Let's start by defining the neural network which trains on the Go9 dataset.

          class AlphaGoNet(nn.Module):
              def __init__(self, input_shape, num_filters=192, dropout=0.3):
                  super(AlphaGoNet, self).__init__()
                  self.input_shape, self.num_filters, self.dropout = input_shape, num_filters, dropout
                  self.conv1 = nn.Conv2d(input_shape[0], num_filters, 3, 1, 1)
                  self.conv2 = nn.Conv2d(num_filters, num_filters, 3, 1, 1)
                  # conv3 and conv4 use no padding, shrinking each spatial dim by 2.
                  self.conv3 = nn.Conv2d(num_filters, num_filters, 3, 1)
                  self.conv4 = nn.Conv2d(num_filters, num_filters, 3, 1)
                  self.bn1, self.bn2, self.bn3, self.bn4 = [
                      nn.BatchNorm2d(num_filters) for _ in range(4)
                  ]
                  self.fc1 = nn.Linear(num_filters * (
                      (input_shape[1] - 4) * (input_shape[2] - 4)), 1024)
                  self.fc_bn1 = nn.BatchNorm1d(1024)
                  self.fc2 = nn.Linear(1024, 512)
                  self.fc_bn2 = nn.BatchNorm1d(512)
                  self.fc3 = nn.Linear(512, input_shape[1] * input_shape[2])  # policy head
                  self.fc4 = nn.Linear(512, 1)                                # value head

              def forward(self, s):
                  s = s.view(-1, self.input_shape[0], self.input_shape[1], self.input_shape[2])
                  s = F.relu(self.bn1(self.conv1(s)))
                  s = F.relu(self.bn2(self.conv2(s)))
                  s = F.relu(self.bn3(self.conv3(s)))
                  s = F.relu(self.bn4(self.conv4(s)))
                  s = s.view(-1, self.num_filters * (self.input_shape[1] - 4) * (self.input_shape[2] - 4))
                  s = F.dropout(F.relu(self.fc_bn1(self.fc1(s))), p=self.dropout, training=self.training)
                  s = F.dropout(F.relu(self.fc_bn2(self.fc2(s))), p=self.dropout, training=self.training)
                  pi, v = self.fc3(s), self.fc4(s)
                  if self.training:
                      return F.log_softmax(pi, dim=1), torch.tanh(v)
                  return F.softmax(pi, dim=1), torch.tanh(v)
          

          Now that we have a model, we can train it just as you would a normal deep learning model. You can use torch.nn.functional.nll_loss, which pairs with the log_softmax output in training mode; if you would rather use torch.nn.CrossEntropyLoss, remove the log_softmax from the forward pass, since CrossEntropyLoss expects raw logits and integer class labels. Code for the training can be found in alphago_sl.py.
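A single supervised training step can be sketched as follows. To keep the example self-contained, the tiny linear "model" here is a stand-in, not AlphaGoNet, and the random boards and labels are made up:

```python
import torch
import torch.nn.functional as F

# One supervised step on a 9x9 board: 81 move classes.
torch.manual_seed(0)
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(81, 81))
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

boards = torch.randn(64, 1, 9, 9)     # batch of one-plane encoded boards
labels = torch.randint(0, 81, (64,))  # target move indices (not one-hot)

log_probs = F.log_softmax(model(boards), dim=1)
loss = F.nll_loss(log_probs, labels)  # nll_loss expects log-probabilities
optimizer.zero_grad()
loss.backward()
optimizer.step()
print(loss.item())  # roughly ln(81) ≈ 4.39 for an untrained model
```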

          I have also trained this with the 19x19 KGS dataset, but the results weren't that great. Of course, it's better than a random agent, but that doesn't mean it's good enough to win a real-world game, let alone a beginner-level one. Training on more data for more epochs might help, but it won't be enough to defeat an average Go player. The only reason our 9x9 model performs well at this point is its lower complexity compared to larger boards. Even then, I wouldn't go as far as to say our model can defeat an intermediate Go player. Why? It seems like something is missing, doesn't it?

          My answer is experience. When we think about how humans differ from these deep learning models, we might have a lot of answers, but the most relevant one is our ability to learn from experience. And how do we gain experience? Simple: we spend a lot of time practicing, training, and improving on our knowledge. Likewise, we introduce the concept of learning from experience, the concept of practice, to computers by using reinforcement learning. The next section covers how we can use reinforcement learning to improve our model through practice, also known as self-play.

          [Note]: We are supposed to train the fast rollout policy network, but I won't be doing that and instead will be using the supervised model as the fast rollout policy. However, as I explained before, you can feel free to train the fast rollout if you want to.

          5.

          Reinforcement Learning

          I won't be explaining much about reinforcement learning in this article since it won't fit within this limited space, and our main objective here is to teach an agent to play Go. However, I will provide a brief overview of reinforcement learning. Reinforcement learning is very similar to how we humans learn. We are rewarded when we perform well and punished when we make mistakes. In reinforcement learning, the focus is on learning optimal behavior to maximize rewards.

          If you're interested in learning more about reinforcement learning, you can explore the following resources:

          1. Reinforcement Learning: An Introduction by Richard S. Sutton and Andrew G. Barto
          2. UCL Course on RL by David Silver

          These will provide you with more comprehensive insights into the topic.

            Back to our topic: we know that experience is very important for taking good actions. Animals have brains to store these experiences, but our agents don't come with a brain built in. To address this, we create a small data structure which stores the relevant information; in our case, the state of the board, the action we took, and the reward we received. Now that we have a brain, we need to fill it with relevant data, so we do something called self-play, where one agent plays against another. Collecting experience by playing against a random agent isn't a great idea, since the gathered data isn't great either. We could use MCTS, but it would take a huge amount of time, since we need records of a lot of games, probably around 10,000 or more. We could also do agent vs. human, but that would take an eternity, lol. So, we play the supervised learning model against itself; essentially, self-play between two instances of the deep learning agent we trained earlier. By the end, we will have covered the training of the last two networks we need for AlphaGo: the policy network and the value network.

            Training the Policy Network

            In the original paper, they took the supervised model and improved it using self-play, and that is what we will do. The idea is simple: take the supervised model, do a lot of self-play (around 10,000 games is fine, but remember more is better), and store the data. Since I won't be explaining the complete code, I will explain what the data looks like. Say you won a game in self-play: for all the states we encountered and the actions we took, we store a +1 reward; for the states and actions of a lost game, we store a -1 reward. Then you can train the policy network just as you would a normal neural network. Code for the self-play and training can be found in alphago_rl.py.
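A tiny sketch of how one finished self-play game becomes training data; all the names and move indices below are made up for illustration:

```python
def label_game(moves_by_player, winner):
    """Attach +1 rewards to the winner's (state, action) pairs and -1 to the loser's."""
    experience = []
    for player, pairs in moves_by_player.items():
        reward = 1 if player == winner else -1
        for state, action in pairs:
            experience.append((state, action, reward))
    return experience

# One imaginary game: black played moves 12 and 40, white played 33; black won.
game = {"black": [("state0", 12), ("state2", 40)], "white": [("state1", 33)]}
for sample in label_game(game, winner="black"):
    print(sample)
```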

            Training the Value Network

            Now we need a value network to evaluate the board position. We can use the policy model we trained in the last section to generate training data for it. In the last case, we considered both the state and the action to get the target value, the reward. In this case, since we are only evaluating how good a position is, we only need the state: if a state in self-play resulted in a win, its target label is +1; otherwise it is 0. Code for the training can be found in alphago_value.py.

            6.

            AlphaGo

            Now let's bring together all that we have. In the beginning, we trained two policy networks, one of which is small and the other one is comparatively large. The smaller, faster policy network is used in tree search rollouts, while the larger one is optimized for accuracy. After supervised learning, we engage in self-play to improve this network and make it stronger. In our case, we won't be training a different fast rollout network; instead, we'll use the supervised model. With the data from self-play by the policy network, we train a value network, which is an essential part of AlphaGo.

            Neural Networks in MCTS

            MCTS
            Figure 3: MCTS in AlphaGo. Monte Carlo tree search in AlphaGo. a, Each simulation traverses the tree by selecting the edge with maximum action value \(Q\), plus a bonus \(u(P)\) that depends on a stored prior probability \(P\) for that edge. b, The leaf node may be expanded; the new node is processed once by the policy network \(p_\sigma\) and the output probabilities are stored as prior probabilities \(P\) for each action. c, At the end of a simulation, the leaf node is evaluated in two ways: using the value network \(v_\theta\); and by running a rollout to the end of the game with the fast rollout policy \(p_\pi\), then computing the winner with function \(r\). d, Action values \(Q\) are updated to track the mean value of all evaluations \(r(·)\) and \(v_\theta(·)\) in the subtree below that action.

            AlphaGo uses a more complex tree search than the one we used before, but the 4 parts of classic MCTS are still relevant to AlphaGo's MCTS. The only difference is that we are using a deep learning network to evaluate positions and nodes. Let's start with the rollout.

            def policy_rollout(self, game_state):
              for step in range(self.rollout_limit):
                  if game_state.is_over():
                      break
                  move_probabilities = self.rollout_policy.predict(game_state)[0]
                  encoder = self.rollout_policy.encoder
                  for idx in np.argsort(move_probabilities)[::-1]:
                      max_point = encoder.decode_point_index(idx)
                      greedy_move = Move(max_point)
                      if greedy_move in game_state.legal_moves():
                          game_state = game_state.apply_move(greedy_move)
                          break
            
              next_player = game_state.next_player
              winner = game_state.winner()
            
              if winner is not None:
                  return 1 if winner == next_player else -1
              else:
                  return 0

            This is pretty much the same as our rollout in classic MCTS. However, as you can see, there is a rollout_limit in place to avoid the rollout taking up a huge amount of time. Then, we have a policy_probabilities function to compute the policy values for the legal moves available to us.

            def policy_probabilities(self, game_state):
              encoder = self.policy.encoder
              outputs = self.policy.predict(game_state)[0]
              legal_moves = game_state.legal_moves()
              if len(legal_moves) == 2:
                  return legal_moves, [1, 0]
              encoded_points = [
                  encoder.encode_point(move.point) for move in legal_moves if move.point
              ]
              legal_outputs = outputs[encoded_points]
              normalized_outputs = legal_outputs / np.sum(legal_outputs)
              return legal_moves, normalized_outputs
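The masking-and-renormalizing step at the heart of policy_probabilities can be illustrated in isolation; the network output and legal indices below are made up, and a 4-point board is used for brevity:

```python
import numpy as np

outputs = np.array([0.1, 0.4, 0.2, 0.3])  # pretend policy output for a 4-point board
legal_indices = [0, 2]                    # only these points are legal right now
legal = outputs[legal_indices]            # numpy integer-array indexing
probs = legal / legal.sum()               # renormalize over legal moves only
print(probs)  # [0.33333333 0.66666667]
```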

            Next we have our most important function, select_move.

            def select_move(self, game_state):
              for sim in range(self.num_simulations):
                  current_state = game_state
                  node = self.root
                  for depth in range(self.depth):
                      if not node.children:
                          if current_state.is_over():
                              break
                          moves, probabilities = self.policy_probabilities(current_state)
                          node.expand_children(moves, probabilities)
                      move, node = node.select_child()
                      current_state = current_state.apply_move(move)
                  value = self.value.predict(current_state)
                  rollout = self.policy_rollout(current_state)
            
                  weighted_value = (
                      1 - self.lambda_value
                  ) * value + self.lambda_value * rollout
            
                  node.update_values(weighted_value)
            
              moves = sorted(
                  self.root.children,
                  key=lambda move: self.root.children.get(move).visit_count,
                  reverse=True,
              )
            
              for i in moves:
                  if not is_point_an_eye(
                    game_state.board, i.point, game_state.next_player
                  ):
                      move = i
                      break
              else:
                  move = Move.pass_turn()
              # Reuse the chosen move's subtree as the new root if it
              # exists; otherwise start fresh for the next search.
              if move in self.root.children:
                  self.root = self.root.children[move]
                  self.root.parent = None
              else:
                  self.root = AlphaGoNode()
              return move

            The idea of select_move is simple: we play a number of simulations. We restrict the game length using depth, so we play until the specified depth is reached. If we don't have any children, we expand them using the probabilities of moves from the strong policy network. Note that the policy network returns all the moves and their associated probabilities. We update this information to the AlphaGo Node using the following function in the Node.

            def expand_children(self, moves, probabilities):
                for move, prob in zip(moves, probabilities):
                    if move not in self.children:
                        self.children[move] = AlphaGoNode(parent=self, probability=prob)

            If we have children, then we select one from them and play the move. We select using the following function in the AlphaGoNode:

            def select_child(self):
              return max(
                  self.children.items(), 
                  key=lambda child: child[1].q_value + child[1].u_value
              )

            After each simulation, we find the value of the value network and rollout by the fast policy, and then combine them using the equation:

            $$ \text{weighted\_value} = (1 - \lambda) \cdot \text{value} + \lambda \cdot \text{rollout\_value} $$
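Numerically, with \(\lambda = 0.5\), a lost rollout can pull down an optimistic value-network estimate; the numbers here are a hypothetical example:

```python
def combined_value(value_estimate, rollout_result, lam=0.5):
    # Mix the value network's estimate with the rollout outcome.
    return (1 - lam) * value_estimate + lam * rollout_result

print(combined_value(0.6, -1.0))  # -0.2: the lost rollout outweighs the optimism
```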

            After combining, we update the value of AlphaGoNode using the following function that we have in the node:

            def update_values(self, leaf_value):
              if self.parent is not None:
                  self.parent.update_values(leaf_value)
            
              self.visit_count += 1
              self.q_value += leaf_value / self.visit_count
              if self.parent is not None:
                  c_u = 5
                  self.u_value = (
                      c_u
                      * np.sqrt(self.parent.visit_count)
                      * self.prior_value
                      / (1 + self.visit_count)
                  )

            The rest is simple; we just take the child which is visited the most, and to save time, we avoid choosing positions like an eye. This is also partially because our agent doesn't have the ability to pass until it has no other option.

            In my test matches, AlphaGo played as White, which carries no inherent advantage. It won every game except when AlphaGo faced itself, where Black won. Note that this version of AlphaGo used only 10 simulations, a depth of 30, and a rollout limit of 40 for the sake of computation. Even though we trade away some accuracy by doing this, it still manages to win an average of 60 games out of 100 against the supervised agent and against the MCTS agent with more rounds.

            7.

            Extra Notes

            • AlphaGo uses a much more complex encoder than our simple one, with 48 feature planes. I didn't implement it because our encoder works well enough on a small board. If you want to try it out, you can create a new encoder in alphago/encoders.
            • This is already mentioned, but it's worth noting once more: we aren't training a different rollout model; instead, we're using the same model for self-play and rollout.
            • KGS pretty much has the entire or more data on which AlphaGo was trained.
            • If you want to use custom data, create a folder inside the dataset folder and pass the name to the dataloader.
            • The current version doesn't have the ability to pass a turn unless it has no legal move left to choose. The implementation is straightforward, though: you only need to tweak the encoders slightly and increase the output shape of the networks other than the policy by 1 for passing the turn.
              8.

              References

              1. David Silver et al. "Mastering the game of Go with deep neural networks and tree search." Nature, vol. 529, 2016, pp. 484-503. [Link].
              2. Jeffrey Barratt et al. "Playing Go without Game Tree Search Using Convolutional Neural Networks." arXiv preprint arXiv:1907.04658, 2019. [PDF].
              3. Kevin Ferguson and Max Pumperla. "Deep Learning and the Game of Go" [Link].
              4. Richard S. Sutton and Andrew G. Barto. "Reinforcement Learning: An Introduction." MIT Press, 2018.
              Google Summer of Code’23 - ML4SCI

              2023-09-30

              Go-game-cat
              Figure 1: Generated using Stability AI's DreamStudio

              I'm thrilled to share that I've been selected for Google Summer of Code (GSoC) at ML4SCI. I'll be working on developing equivariant neural networks for studying dark matter substructure with strong lensing.

              I'll be mentored by:

              Throughout the summer, I'll be documenting my work and sharing the things I learn. I hope you enjoy reading about my experiences and progress.

              1.

              Project Abstract

              Strong gravitational lensing is a promising probe of the substructure of dark matter to better understand its underlying nature. Deep learning methods have the potential to accurately identify images containing substructure and differentiate WIMP particle dark matter from other well-motivated models, including vortex substructure of dark matter condensates and superfluids. However, accurately identifying images containing substructure and differentiating between various dark matter models can be challenging. Deep learning methods, particularly equivariant neural networks, provide a promising approach to addressing these challenges. This project will focus on the further development of the DeepLense pipeline that combines state-of-the-art deep learning models with strong lensing simulations based on lenstronomy. The focus of this project is using equivariant neural networks for the classification and regression of dark matter particle candidates (e.g. CDM, WDM, axions, SIDM).

              2.

              Gravitational Lensing

              Gravitational lensing
              Figure 2: An example of strong gravitational lensing observed by Hubble. Image credit: ESA/Hubble & NASA

              Gravitational lensing is a phenomenon where the gravity of massive objects, like clusters of galaxies or individual stars, distorts and magnifies the light of more distant objects behind them, allowing us to study the details of early galaxies too far away to be seen with current telescopes. Hubble observations have greatly increased the number of Einstein rings and helped create maps of dark matter in galaxy clusters. The lensed images of crosses, rings, arcs, and more not only provide intriguing visuals but also enable astronomers to probe the distribution of matter in galaxies and clusters of galaxies, as well as observe the distant universe.

              One promising method for studying the nature of dark matter is through strong gravitational lensing. By analyzing the perturbations in lensed images that cannot be explained by a smooth lens model, such as those caused by subhalos or line-of-sight halos, researchers can gain insights into the distribution and properties of dark matter.

              3.

              Equivariant Neural Network

              Equivariant neural networks are a type of neural network that can preserve the symmetries of input data, particularly data with group symmetries. They achieve this through the use of a group representation, which describes how a group acts on a vector space. The convolution operation, which is a key building block of many neural networks used in image and signal processing, is defined based on this group representation.

              Compared to standard convolutional neural networks, where filters are learned independently of the input data, in equivariant neural networks, the filters are learned as a function of the group representation. This ensures that the filters are consistent with the symmetries of the input data, making the learning process more efficient and allowing for better generalization.

              Different types of group representations, such as rotation, translation, or permutation representations, can be used in equivariant neural networks depending on the type of data being processed and the symmetry properties of the data.
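              To make the group-convolution idea concrete, here is a minimal NumPy sketch (my own illustration, not part of the DeepLense code) of a C4-equivariant "lifting" convolution: one filter is correlated with the image in all four 90° rotations, and rotating the input rotates each response map while cyclically permuting the group channels.

```python
import numpy as np

def corr2d(img, filt):
    """Valid-mode 2D cross-correlation."""
    H, W = img.shape
    k = filt.shape[0]
    out = np.zeros((H - k + 1, W - k + 1))
    for i in range(H - k + 1):
        for j in range(W - k + 1):
            out[i, j] = np.sum(img[i:i + k, j:j + k] * filt)
    return out

def lifting_conv_c4(img, filt):
    """Correlate with all four 90-degree rotations of one filter:
    one output channel per element of the group C4."""
    return np.stack([corr2d(img, np.rot90(filt, r)) for r in range(4)])

rng = np.random.default_rng(0)
img = rng.standard_normal((8, 8))
filt = rng.standard_normal((3, 3))

y = lifting_conv_c4(img, filt)
y_rot = lifting_conv_c4(np.rot90(img), filt)

# Equivariance: rotating the input rotates each response map spatially
# and cyclically shifts the group dimension.
assert np.allclose(y_rot, np.rot90(np.roll(y, 1, axis=0), axes=(1, 2)))
```

              Max-pooling over the first axis of `y` would then give a feature that is invariant to 90° rotations of the input.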

              4.

              Why Equivariant Neural Network?

              E(2)-steerable Convolutional Neural Networks (CNNs) are neural networks that exhibit rotational and reflectional equivariance, meaning that their output transforms predictably under rotation and reflection of the input image. This property can be demonstrated by feeding a randomly initialized E(2)-steerable CNN with rotated images and visualizing the feature space of the network after a few layers. The feature space consists of a scalar field and a vector field, color-coded and represented by arrows, respectively. The visualization shows that the feature space undergoes equivariant transformations under rotations, and the output is stable under changes in the input image orientation. To further illustrate this stability, the feature space can be transformed into a comoving reference frame by rotating the response fields back, resulting in a stabilized view of the output.

              e2cnn rotate
              Figure 3: The visualization demonstrates the equivariance claim by feeding rotated images into a randomly initialized E(2)-steerable CNN (left). The middle plot shows the equivariant transformation of a feature space, consisting of one scalar field (color-coded) and one vector field (arrows), after a few layers. In the right plot, we transform the feature space into a comoving reference frame by rotating the response fields back (stabilized view). Code for the visualization is taken from e2cnn.

              The invariance of the features in the comoving frame validates the rotational equivariance of E(2)-steerable CNNs empirically. Note that the fluctuations of responses are discretization artifacts due to the sampling of the image on a pixel grid, which does not allow for exact continuous rotations.

              Conventional CNNs are not equivariant under rotations, leading to random variations in the response with changes in image orientation. This limits the ability of CNNs to generalize learned patterns between different reference frames. Equivariant neural networks, such as E(2)-steerable CNNs, address this limitation by ensuring that the feature space of the network undergoes a specified transformation behavior under input transformations. As a result, these networks effectively capture symmetries in the data, making them useful for tasks such as studying substructures in strong gravitational lensing images.

              In summary, there are four major reasons to favor an equivariant neural network:

              • Data Efficiency
              • Equivariance in all layers
              • Better generalizability
              • Fewer parameters

              5.

              Why not Augmentation?

              The answer to this question is explained in the paper "Equivariance versus Augmentation for Spherical Images" by Jan E. Gerken et al., which analyzes the role of rotational equivariance in convolutional neural networks applied to spherical images. The authors demonstrate that non-equivariant classification models require significant data augmentation to reach the performance of smaller equivariant networks, and that the performance of non-equivariant semantic segmentation models saturates well below that of equivariant models as the amount of data augmentation is increased. Additionally, they find that the total training time for an equivariant model is shorter than for a non-equivariant model with matched performance.
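              For comparison, the augmentation baseline is easy to sketch. The helper below is hypothetical (it does not appear in the paper): it expands a dataset with all four 90° rotations of each image, which is exactly the extra training cost that a rotation-equivariant model avoids.

```python
import numpy as np

def augment_rotations(images, labels):
    """Expand a dataset with all four 90-degree rotations of each image."""
    aug_imgs, aug_labels = [], []
    for img, lab in zip(images, labels):
        for r in range(4):
            aug_imgs.append(np.rot90(img, r))
            aug_labels.append(lab)
    return np.stack(aug_imgs), np.array(aug_labels)

imgs = np.zeros((10, 28, 28))
labs = np.arange(10)
aug_imgs, aug_labs = augment_rotations(imgs, labs)

# The dataset (and hence training time per epoch) grows by a factor of 4.
assert aug_imgs.shape == (40, 28, 28) and aug_labs.shape == (40,)
```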

              Figure 4: The above figure shows an important result that the paper demonstrated: Left: For classification of spherical MNIST, the non-equivariant models reach the test accuracy of the equivariant models for very large amounts of data augmentation. Right: For semantic segmentation of one-digit spherical MNIST as in Figure 2, the non-background IoU of the non-equivariant models saturates well below the performance of the equivariant model even for moderately high amounts of data augmentation.

              6.

              Models

              E(2)-steerable CNN

              Group equivariant Convolutional Neural Networks (G-CNNs) are a natural generalization of convolutional neural networks that reduces sample complexity by exploiting symmetries. The feature maps of a G-CNN are functions over the elements of the group. A naive implementation of group convolution requires computing and storing a response for each group element, so the G-CNN framework is not convenient for implementing networks equivariant to groups with infinitely many elements.

              Steerable CNNs are a more general framework that solves this issue. The key idea is that, instead of storing the value of a feature map on each group element, the model stores the Fourier transform of this feature map, up to a finite number of frequencies.
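              As a toy illustration of that idea (my own sketch, with SO(2) discretized as the cyclic group C_N), a feature map over the group can be stored as its Fourier coefficients, and a rotation then acts as a simple per-frequency phase shift on those coefficients:

```python
import numpy as np

N = 8                       # discretize SO(2) as the cyclic group C_N
rng = np.random.default_rng(1)
f = rng.standard_normal(N)  # a feature map over the group elements
F = np.fft.fft(f)           # store its Fourier coefficients instead

g = 3                       # rotate by g steps: (L_g f)(h) = f(h - g)
f_rot = np.roll(f, g)

# In the Fourier domain the same rotation is a per-frequency phase shift,
# so only the coefficients (up to a chosen max frequency) need be stored.
k = np.arange(N)
F_rot = F * np.exp(-2j * np.pi * k * g / N)

assert np.allclose(np.fft.ifft(F_rot).real, f_rot)
```

              Truncating to a finite number of frequencies is what lets steerable CNNs handle continuous groups without storing a response per group element.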

              Steerable CNNs are a neural network architecture that is equivariant to 2D and 3D isometries. Equivariant neural networks guarantee a specified transformation behavior of their feature spaces under transformations of their input. For example, conventional CNNs are designed to be equivariant to translations of their input, meaning that a translation of the image results in a corresponding translation of the network's feature maps. E(n)-equivariant models, including steerable CNNs, are guaranteed to generalize over a broader range of transformations, including translation, rotation, and reflection, and are thus more data-efficient than conventional CNNs.

              The feature spaces of E(n)-equivariant steerable CNNs are defined as spaces of feature fields characterized by their transformation law under rotations and reflections. Examples of such feature fields include scalar fields (such as grayscale images or temperature fields) and vector fields (such as optical flow or electromagnetic fields).

              Equivariant Transformer Networks

              Equivariant Transformer (ET) layers are image-to-image mappings that incorporate prior knowledge on invariances with respect to continuous transformation groups. ET layers can be used to normalize the appearance of images before classification (or other operations) by a convolutional neural network.

              Harmonic Net

              Figure 5: An example of a 2-hidden-layer H-Net with m = 0 output, input-output left-to-right. Each horizontal stream represents a series of feature maps (circles) of constant rotation order. The edges represent cross-correlations and are numbered with the rotation order of the corresponding filter. The sum of rotation orders along any path of consecutive edges through the network must equal m = 0, to maintain the disentanglement of rotation orders.

              Harmonic Networks, or H-Nets, are a type of convolutional neural network (CNN) that exhibits equivariance to patch-wise translation and continuous 360° rotation, unlike regular CNNs, where global rotation equivariance is typically sought through data augmentation. They achieve this by replacing regular CNN filters with circular harmonics, which return a maximal response and orientation for every receptive field patch. H-Nets use a rich, parameter-efficient, low-computational-complexity representation, and deep feature maps within the network encode complicated rotational invariants.

              There are a few advantages to using Harmonic nets:

              • Good generalization for insufficient data
              • Few parameters
              • Interpretable features for rotation
              • No rotational data augmentation

              Equivariant Wide ResNet

              The e2wrn (Equivariant Wide ResNet) is an equivariant variant of the Wide ResNet architecture. It uses the codebase available at Wide ResNet as its foundation and can be constructed using escnn/e2cnn.

              7.

              Publications

              Geo Jolly Cheeramvelil*, Sergei V Gleyzer, Michael W Toomey, "Equivariant Neural Network for Signatures of Dark Matter Morphology in Strong Lensing Data", Machine Learning for Physical Sciences 2023

              8.

              References

              1. Jan E. Gerken et al. "Equivariance versus Augmentation for Spherical Images." arXiv preprint arXiv:2202.03990, 2022. [Link].
              2. Maurice Weiler and Gabriele Cesa. "General E(2)-Equivariant Steerable CNNs." arXiv preprint arXiv:1911.08251, 2019. [Link].
              3. Kai Sheng Tai, Peter Bailis, and Gregory Valiant. "Equivariant Transformer Networks." arXiv preprint arXiv:1901.11399, 2019. [Link].
              4. Daniel E. Worrall et al. "Harmonic Networks: Deep Translation and Rotation Equivariance." arXiv preprint arXiv:1612.04642, 2016. [Link].
              5. Wide ResNet GitHub Repository. [GitHub].