Studytrails https://studytrails.com Follow the latest in AI Sat, 31 Jan 2026 04:38:20 +0000 en-US hourly 1 https://wordpress.org/?v=6.9 The Active Brain: How Google’s “ATLAS” Rewires Itself to Master Infinite Context https://studytrails.com/2026/01/31/the-active-brain-how-googles-atlas-rewires-itself-to-master-infinite-context/ https://studytrails.com/2026/01/31/the-active-brain-how-googles-atlas-rewires-itself-to-master-infinite-context/#respond Sat, 31 Jan 2026 04:30:53 +0000 https://studytrails.com/?p=16276 Read more]]> In my last article, we explored how DeepSeek’s Engram effectively gave AI a hippocampus—offloading static facts into a massive, efficient lookup table. It was a breakthrough in separating memory from reasoning.

But what if the model didn’t just “look up” memories? What if it actually rewired its own brain while it was reading, optimizing its understanding in real-time?

That is the premise behind ATLAS (“Learning to Optimally Memorize the Context at Test Time”), a revolutionary architecture from Google Research. While Engram solves the storage problem, ATLAS solves the context problem, allowing models to process a staggering 10 million tokens with near-perfect recall.

What is the ATLAS Module?

If Engram is like a student with access to a massive library (external memory), ATLAS is a student who actively takes notes and reorganizes their thoughts while listening to a lecture.

Standard Transformers suffer from a “static” nature during inference—their weights are frozen. They can only attend to what is in their context window, which grows quadratically expensive. ATLAS changes the rules of the game by treating memory as an optimizable component at test time.

It introduces a Long-Term Memory Module that doesn’t just store tokens; it learns them. Using a mechanism called the Omega Rule, the model actively updates its memory weights based on the text it is currently reading, effectively “training” itself on the fly to remember the specific context it is in.

The “Muon” Spark: Optimizing in Real-Time

The secret sauce of ATLAS is how it manages these updates. Traditional optimization (like SGD) is too slow and clumsy for real-time inference.

ATLAS employs a Muon Optimizer—a second-order optimization method that allows the memory module to converge on the “perfect” representation of the context almost instantly.

  • Standard RNNs: Update memory based only on the last token seen (myopic).
  • ATLAS: Updates memory by looking back at a sliding window of tokens, ensuring it captures the gestalt of the sequence, not just the most recent word.
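To make the contrast concrete, here is a toy sketch of the idea — illustrative only, not the actual Omega Rule or Muon update from the paper: a small memory matrix is optimized at test time against a sliding window of recent tokens rather than only the last one.

```python
import numpy as np

def update_memory(M, keys, vals, window=8, lr=0.1, steps=3):
    """Toy sliding-window test-time update: the memory matrix M is
    optimized to map the last `window` keys to their values, instead of
    only the single most recent (key, value) pair."""
    K = keys[-window:]                   # (w, d) recent context keys
    V = vals[-window:]                   # (w, d) recent context values
    for _ in range(steps):
        err = K @ M - V                  # reconstruction error over the window
        M -= lr * K.T @ err / len(K)     # gradient step on the squared error
    return M

rng = np.random.default_rng(0)
d = 4
M = np.zeros((d, d))                     # memory starts empty
keys = rng.normal(size=(20, d))
vals = keys @ np.eye(d)                  # target mapping: identity, for illustration
M = update_memory(M, keys, vals)         # memory "learns" the context on the fly
```

The point of the sketch is only the shape of the mechanism: each batch of context tokens triggers a few optimization steps on the memory weights, so the memory converges toward the gestalt of the window.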

Key Stats: The 10-Million Token Milestone

When pitted against other long-context architectures, ATLAS didn’t just win; it changed the benchmark.

  • Context Length: Successfully modeled sequences of 10 Million Tokens.
  • BABILong Benchmark: Achieved 80% accuracy at 10M context.
    • Comparison: GPT-4’s accuracy drops significantly beyond 128k tokens; recurrent-memory architectures like Titans hovered around 70%.
  • Efficiency: Because it compresses context into an optimized memory state rather than keeping a massive KV cache, it performs inference significantly faster than Transformer++ baselines.

The Paradigm Shift: Test-Time Training

The most significant contribution of the ATLAS paper is the validation of Test-Time Training (TTT).

For years, we assumed that “learning” stops once the model is trained. ATLAS proves that “inference” and “training” are not binary opposites. By allowing a small part of the model (the memory module) to remain plastic and learn during the conversation, we get a model that adapts to the user’s specific context without the massive cost of fine-tuning.

Why This Matters for AGI

If Engram mimics the Hippocampus (storage), ATLAS mimics Synaptic Plasticity (adaptation).

  1. Infinite Context Agents: An ATLAS-powered agent could read an entire codebase, legal discovery, or genetic sequence and “learn” the structure of that specific data instantly, answering questions with perfect recall.
  2. The End of the “Lost in the Middle” Phenomenon: Standard LLMs often forget information buried in the middle of a long prompt. ATLAS actively optimizes to retain difficult-to-remember sections.
  3. Hardware Efficiency: Like Engram, ATLAS reduces the need for massive VRAM clusters, as it doesn’t need to store a KV cache for millions of tokens—just the optimized memory weights.

Conclusion: The Hybrid Future?

We are seeing a fascinating divergence in AI architecture in 2026. DeepSeek’s Engram pushes for extreme sparsity and lookup-based memory, while Google’s ATLAS pushes for continuous, active optimization.

The ultimate AGI architecture will likely be a hybrid: A model with an Engram-style library for static world knowledge, and an ATLAS-style active memory for understanding the long, complex context of the task at hand.

Is Test-Time Training the future of LLMs, or is it too computationally risky? Let me know your thoughts in the comments!


]]>
https://studytrails.com/2026/01/31/the-active-brain-how-googles-atlas-rewires-itself-to-master-infinite-context/feed/ 0
The Future of AI Architecture: How DeepSeek’s “Engram” Module Mimics Human Memory to Supercharge LLMs https://studytrails.com/2026/01/17/the-future-of-ai-architecture-how-deepseeks-engram-module-mimics-human-memory-to-supercharge-llms/ https://studytrails.com/2026/01/17/the-future-of-ai-architecture-how-deepseeks-engram-module-mimics-human-memory-to-supercharge-llms/#respond Sat, 17 Jan 2026 08:03:03 +0000 https://studytrails.com/?p=16262 Read more]]> The race for Artificial General Intelligence (AGI) has hit a bottleneck: efficiency. Standard Large Language Models (LLMs) are “computationally heavy” because they don’t know how to separate thinking from remembering.

A groundbreaking research paper from Peking University and DeepSeek-AI, “Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models” (arXiv:2601.07372), is changing the paradigm. By introducing a “Conditional Memory” system called Engram, they have effectively given AI its own version of the human hippocampus.


What is the Engram Module in AI?

In neuroscience, an “engram” is a unit of cognitive information—a memory trace. In this new AI architecture, the Engram module acts as a massive, scalable lookup table.

Currently, models like GPT-4 or Claude use “Conditional Computation” (Mixture-of-Experts) to handle both logic and facts. DeepSeek’s innovation introduces a second axis of sparsity: Conditional Memory.

Instead of forcing a neural network to “calculate” a fact, the model uses an O(1) lookup to retrieve it instantly. This mimics Dual-Process Theory in humans:

  • System 1 (Fast): Retrieval of known patterns via Engram.
  • System 2 (Slow): Deep reasoning via the Transformer layers.
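To see the principle, here is a toy illustration — not DeepSeek’s actual implementation, which uses learned embedding tables — contrasting an O(1) lookup path with an expensive compute path:

```python
# Hypothetical conditional-memory table: query -> stored answer.
engram = {}

def slow_reasoning(query):
    # Stand-in for running the full Transformer stack (System 2).
    return query.upper()

def answer(query):
    if query in engram:             # System 1: fast O(1) retrieval
        return engram[query]
    result = slow_reasoning(query)  # System 2: slow computation
    engram[query] = result          # memorize, so next time is a lookup
    return result

print(answer("capital of france"))  # -> CAPITAL OF FRANCE (computed, then cached)
print(answer("capital of france"))  # -> CAPITAL OF FRANCE (pure lookup this time)
```

The second call never touches the reasoning path, which is the whole efficiency argument: retrieval cost stays constant no matter how large the table grows.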

Key Stats: Why Engram Outperforms Standard MoE

When the researchers scaled the Engram module to 27 billion parameters, the performance gains were undeniable. By freeing the “reasoning neurons” from the burden of memorization, the model achieved massive jumps in benchmarks:

Benchmark Performance Gains:

  • General Reasoning (BBH): +5.0 points
  • Factual Knowledge (MMLU): +3.4 points
  • Scientific Problem Solving (ARC-C): +3.7 points
  • Coding (HumanEval): +3.0 points
  • Mathematics (MATH): +2.4 points

The U-Shaped Scaling Law: The “Goldilocks Zone” of AI

One of the paper’s most significant contributions is the discovery of the U-Shaped Scaling Law. The researchers found that you can’t just add infinite memory or infinite computation. There is an optimal balance.

The study suggests that allocating roughly 20-25% of a model’s sparse parameter budget to “Conditional Memory” (Engram) provides the maximum possible intelligence per FLOP.


Why This Matters for the Future of AGI

The most exciting part of this research isn’t just the accuracy—it’s the efficiency.

  1. Lower GPU Costs: Because Engram lookups are deterministic, they can be stored in standard CPU RAM rather than expensive GPU VRAM.
  2. Ultra-Long Context: By delegating local patterns to memory, the model’s attention mechanism can focus on global context. This improved Needle In A Haystack (NIAH) scores from 84.2 to 97.0.
  3. Scalability: We can now build models with trillions of “knowledge parameters” without needing exponentially more power.

Conclusion: A New Era of Sparse Intelligence

DeepSeek’s Engram paper proves that the path to AGI isn’t just about making models bigger—it’s about making them smarter by mimicking the modularity of the human brain. By separating Conditional Memory from Conditional Computation, we are finally allowing AI to “know” things as effortlessly as we do.

Is the Engram module the missing piece of the AGI puzzle? Share your thoughts in the comments below!

]]>
https://studytrails.com/2026/01/17/the-future-of-ai-architecture-how-deepseeks-engram-module-mimics-human-memory-to-supercharge-llms/feed/ 0
An easy introduction to LLM agents – structure and components https://studytrails.com/2025/02/16/an-easy-introduction-to-llm-agents-structure-and-components/ https://studytrails.com/2025/02/16/an-easy-introduction-to-llm-agents-structure-and-components/#respond Sun, 16 Feb 2025 22:13:59 +0000 https://studytrails.com/?p=16253 Read more]]> LLM based autonomous agents can use Generative AI to automate processes without needing human intervention. There is a subtle distinction between autonomous agents and workflows, as explained in this Anthropic blog. If you know the series of steps needed to automate a process, then you can use a workflow. However, if you need the system to figure out the series of steps required to automate the process, then you need an autonomous agent. In this blog we will look at the different parts of autonomous agents and a mental model for how to start thinking about agents when you have a business problem.

Why LLM based autonomous agents?

‘I have heard about LLM based autonomous agents and want to start using this cool thing…  ‘

‘I have a business goal of improving our NPS/CSAT/CES scores by 20% by end of this year, can Generative AI help me?’

If you think about the two questions above, it becomes apparent that the reason for using LLM based autonomous agents should be more than a technical one. If you are on your first Generative AI project, make sure you think about the why before you think about the how.

Components of LLM based autonomous agents

Once you have defined your business problem and are ready to build your autonomous agent, there are four components that you might have to start thinking about. This blog will only cover the concepts, and in the next blog we will look at the implementation.

Give your agent a character profile

If you think of a business workflow, there are multiple personas that come together to accomplish a task. For example, if you are building a mortgage processing system, you might have one person looking at the customer’s documents, another who checks the credit score, someone else who appraises the property, and so on. Each person has a set of tools they can use to accomplish their task. While building an agentic system, you will likewise need to create agents that each have a profile and access to specific tools.

We will talk about tools later, but let’s talk about the profile first. To give the agent a profile, write a prompt that provides information such as demography (age, gender, etc.), expertise (credit score specialist), social information (the relationship of this agent with other agents), character (“you are an outgoing person”), and tone (“use professional but friendly language”). You can either handcraft the profile manually, or you can use an LLM to do that for you. Here’s a paper that introduces a simulator called RecAgent to build an agent that simulates behaviours.

An image from the RecAgent paper that introduces how to build agents that simulate a profile.

A technique to align the agent profile closer to a human profile would be to use profiles from demographic databases such as the census or American National Election Studies.

Give your agent memory

There is nothing more frustrating for a customer than repeating what they have already told an agent. If you build LLM based agents that can’t remember what customers told them a few seconds or a few days ago, then you won’t get anywhere near your target NPS score. A thing to understand, though, is that LLMs by default cannot remember conversations. Each call to an LLM is stateless: the LLM has no way to recall previous conversations unless you replay the whole conversation every time. There has been some research around building LLMs with inherent memory, but that’s still not available. How do you give agents memory, then?

Before we talk about the how, let’s understand the types of memory the agent would need: both short-term and long-term. For example, an agent’s profile is part of its long-term memory, but the current conversation sits in its short-term memory.

The quickest way to implement both short-term and long-term memory is to pass all previous conversations via the prompt. This is not a bad idea for many use cases: if the conversations are only a few sentences long, we can store the whole conversation (temporarily or permanently) and play it back in each call to the LLM. There are, however, certain disadvantages to this approach. As the conversation history grows, the calls to the LLM become more expensive (the more tokens you pass in, the more you pay) and slower. There may also be limits on the number of tokens your LLM can accept as input. And as the context length grows, the LLM may struggle to figure out which part of the past conversation is most meaningful in the current context.
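A minimal sketch of this replay approach, with a hypothetical `call_llm` standing in for your model API:

```python
# Every prior turn is prepended to each call, because the LLM itself is
# stateless. `call_llm` is a hypothetical stand-in for a real model API.
history = []

def call_llm(prompt):
    return f"(reply to {len(prompt)} chars of prompt)"

def chat(user_message):
    history.append(("user", user_message))
    prompt = "\n".join(f"{role}: {text}" for role, text in history)
    reply = call_llm(prompt)              # cost grows with history length
    history.append(("assistant", reply))
    return reply

chat("My case number is 4411.")
chat("What was my case number?")          # context survives only via replay
```

Note how the prompt length, and therefore the cost, grows with every turn — exactly the disadvantage described above.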

To overcome the shortcomings of short-term memory, you can ‘attach’ a long-term memory module to the agentic system. For example, you can store previous conversations in an external system (such as a vector store or a database) and retrieve the parts that are relevant to the current conversation. If a customer is talking to an agentic bot, the bot can retrieve the previous conversations relating to the current customer case and ignore all conversations that refer to a different case. In addition to retrieving parts of previous conversations, long-term memory can also store summaries. For example, you can record the customer’s sentiment during previous conversations (irrespective of the case number, or of whether the customer used the bot or sent an email) and retrieve that sentiment from long-term memory. This allows you to adapt the agent’s responses to customer expectations. To make retrieval from long-term memory more effective, you can use a RAG system to retrieve the relevant information.
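Here is a hedged sketch of that retrieval idea. A real system would use embeddings and a vector store; simple word overlap stands in for similarity here, and the stored memories are invented examples.

```python
# Past turns live in an external store; only the most relevant are replayed.
long_term = [
    "case 4411: customer reported a billing error",
    "case 9902: customer asked about delivery times",
    "customer sentiment last month: frustrated",
]

def relevance(query, memory):
    # Crude stand-in for embedding similarity: fraction of shared words.
    q, m = set(query.lower().split()), set(memory.lower().split())
    return len(q & m) / max(len(q), 1)

def retrieve(query, k=1):
    return sorted(long_term, key=lambda m: relevance(query, m), reverse=True)[:k]

print(retrieve("billing error on case 4411"))
```

Swapping `relevance` for a real embedding-distance function turns this into the RAG pattern described above without changing the surrounding logic.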

There are different ways to store memory, and each has their pros and cons:

  1. Store as natural language text in its original form.
  2. Store as embedded vectors.
  3. Store as a summary of previous conversations.
  4. Store memory in a hierarchical structure or temporal structure.

The different ways are not exclusive. It’s also possible to have a combination of these.

When retrieving memory during a conversation, it’s important to extract meaningful information using criteria such as recency, relevance, and importance. When writing memory, it’s important to deduplicate, and to remove or compress information once memory reaches a certain size. It’s also important to periodically reflect on existing memory to (1) summarize it and (2) extract abstract information from it (for example, sentiment).
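A toy scoring function combining those retrieval criteria might look like this; the weights and decay rate are illustrative, not taken from any particular paper.

```python
import math, time

def score(memory, query_relevance, now=None, w=(0.3, 0.5, 0.2)):
    """Rank a memory by a weighted mix of recency, relevance, importance."""
    now = now or time.time()
    hours_old = (now - memory["created"]) / 3600
    recency = math.exp(-0.1 * hours_old)   # exponential decay with age
    return w[0] * recency + w[1] * query_relevance + w[2] * memory["importance"]

now = time.time()
old = {"created": now - 48 * 3600, "importance": 0.9}   # important but stale
new = {"created": now, "importance": 0.2}               # minor but fresh
print(score(new, 0.8, now) > score(old, 0.8, now))      # fresh one wins here
```

With these weights a fresh memory can outrank an important-but-stale one; tuning the weights is how you trade recency against importance for your use case.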

Help your agents plan better

When we need agents to decide what series of steps to take, we want them to be able to plan. We will look at different planning methods for a single agent; in a multi-agent scenario, there is usually a single ‘orchestrator’ agent that does most of the planning.

  1. Chain of Thought – In this method, we pass reasoning steps within the prompt itself. The LLM then uses the steps as an example to create its own steps when it receives user input. We can also tell the LLM to ‘think step by step’ without providing any specific examples; this is called zero-shot CoT.
  2. ReWOO – Reasoning Without Observation separates the reasoning from the observations. Normally, the LLM would wait for the response (observation) from each step before planning the next one; ReWOO instead drafts the full plan up front and fills in the observations afterwards, reducing token usage and latency.
  3. SwiftSage – If you have read Daniel Kahneman’s book ‘Thinking, Fast and Slow’, then you know exactly what SwiftSage does. It is composed of two modules: (a) Swift, which replicates the fast, intuitive thinking process, and (b) Sage, which uses reasoning LLMs for planning and grounding.
  4. Self-consistent CoT (CoT-SC) – Here the LLM samples a diverse set of reasoning paths. Instead of jumping straight to the first solution that comes to mind, it explores multiple ways of solving the problem, then picks the final answer that best matches across all of these approaches. Think of it like getting opinions from several experts and choosing the answer most of them agree on, rather than going with the first expert’s suggestion.
  5. Tree of Thoughts (ToT) – ToT is an improved version of Chain of Thought that lets LLMs consider multiple possible paths to solve a problem, similar to how humans might explore different solutions on a decision tree. Unlike the simpler Chain of Thought, which follows a single linear path, ToT can evaluate different options at each step, look ahead to future consequences, and even backtrack if it realizes a different path might work better.
  6. ReAct – Instead of treating an LLM’s ability to reason and its ability to take actions as separate skills, this approach combines them, letting the LLM think and act at the same time. This creates a more natural flow where the LLM can adjust its plans based on what it learns from taking actions, and can use its reasoning to decide what actions to take next, similar to how humans naturally blend thinking and doing when solving problems.
  7. Voyager, Ghost, and LLM-Planner are a few other techniques that use feedback mechanisms.
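As an example of one of these methods, here is a minimal sketch of self-consistent CoT. A simulated model stands in for sampled LLM reasoning paths (the 70% success rate and the answers are invented for illustration).

```python
from collections import Counter
import random

def sample_path(question, rng):
    # Stand-in for an LLM sampled at temperature > 0: it reaches the
    # right answer ("42") on roughly 70% of reasoning paths.
    return "42" if rng.random() < 0.7 else "wrong"

def self_consistent_answer(question, n_paths=15, seed=0):
    rng = random.Random(seed)
    answers = [sample_path(question, rng) for _ in range(n_paths)]
    return Counter(answers).most_common(1)[0][0]   # majority vote

print(self_consistent_answer("What is 6 x 7?"))
```

Individual paths are wrong a third of the time, but the majority vote over fifteen paths is far more reliable — that is the whole trick behind CoT-SC.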

Time to take action

Once the agents have planned their execution, it’s time to take action. Actions can take the form of calls to external systems such as the web or APIs. An action has three dimensions:

  • Goal: The intended outcome of the action. This could be
    • Task completion: accomplish specific tasks such as completing a function in software development.
    • Communication: The action communicates with other agents or with humans, e.g. in a multi-agent collaboration system where agents coordinate tasks
  • Action Production: The agent can perform the actual action using
    • Memory recollection: An action performed by recalling short-term or long-term memory.
    • Predefined plans: An action performed via a predefined plan, or via goal decomposition into sub-goals
    • External tools: An action performed via external tools such as APIs, databases and knowledge bases (e.g. SQL to query databases), and external models
  • Action impact: The consequences of an agent’s action can be to
    • Change the environment by calling external tools, e.g. moving a chess piece
    • Change internal state, e.g. updating memory, creating new plans, etc.
    • Trigger new action
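The dimensions above can be sketched in a few lines. The tool registry and the credit-score API are hypothetical names invented for illustration.

```python
def get_credit_score(customer_id):
    # Stand-in for a real external API call.
    return {"customer": customer_id, "score": 715}

TOOLS = {"get_credit_score": get_credit_score}   # hypothetical tool registry

memory = []

def execute(action):
    tool = TOOLS[action["tool"]]
    result = tool(**action["args"])   # impact 1: touch the environment
    memory.append(result)             # impact 2: change internal state
    return result                     # result may trigger a follow-up action

execute({"tool": "get_credit_score", "args": {"customer_id": "c-42"}})
print(memory)
```

The same dispatch shape works whether the tool is an API, a database query, or another agent — only the registry changes.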

Conclusion

Building LLM-based autonomous agents requires careful consideration of four key components: character profiles, memory systems, planning capabilities, and action mechanisms.

Think of these components as building blocks:
– Character profiles give your agents purpose and personality
– Memory systems enable contextual understanding and continuity
– Planning capabilities help agents navigate complex decisions
– Action mechanisms allow agents to effect real change

As we move forward in the age of AI, autonomous agents will increasingly become part of our business workflows. However, their effectiveness will depend not just on the sophistication of the underlying LLMs, but on how well we design these fundamental components to work together in solving real business problems.



]]>
https://studytrails.com/2025/02/16/an-easy-introduction-to-llm-agents-structure-and-components/feed/ 0
Enforce tagging for SageMaker training job https://studytrails.com/2023/11/27/enforce-tagging-for-sagemaker-training-job/ https://studytrails.com/2023/11/27/enforce-tagging-for-sagemaker-training-job/#respond Mon, 27 Nov 2023 03:35:41 +0000 http://studytrails.com/?p=16248 Read more]]> When a data scientist starts a training job, it is useful to enforce tagging so that cost can be allocated to the training job. We need to enforce both the tag key and the value. For example, a tag key can be ‘project’ and its value could be ‘projectA’. The user should not be able to launch a training job without the ‘project’ key and that key should only be able to take a value of ‘projectA’ so that the cost can be allocated to the right project. Here’s the SageMaker policy for accomplishing that

sagemaker_policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "cloudwatch:PutMetricData",
                    "logs:CreateLogStream",
                    "logs:PutLogEvents",
                    "logs:CreateLogGroup",
                    "logs:DescribeLogStreams",
                    "logs:GetLogEvents",
                    "s3:GetObject",
                    "s3:PutObject",
                    "s3:ListBucket",
                    "ecr:GetAuthorizationToken",
                    "ecr:BatchCheckLayerAvailability",
                    "ecr:GetDownloadUrlForLayer",
                    "ecr:BatchGetImage",
                    "iam:PassRole",
                    "sagemaker:DescribeTrainingJob",
                    "sagemaker:AddTags"
                ],
                "Resource": "*"
                
            },
            {
                "Effect": "Allow",
                "Action": [
                      "sagemaker:CreateTrainingJob"
                ],
                "Resource": "*",
                "Condition": {"StringEquals": {
                "aws:RequestTag/project": [
                    "projectA"
                    ]
                }
            }
            }
        ]
    }
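Under this policy, a training-job request must carry the tag. Here is a sketch of what the client-side call looks like — the role ARN is a placeholder, and the other parameters `create_training_job` requires (algorithm spec, channels, and so on) are omitted for brevity.

```python
import json

# The request is only allowed because Tags carries project=projectA,
# matching the aws:RequestTag/project condition in the policy above.
training_job_request = {
    "TrainingJobName": "demo-job",
    "RoleArn": "arn:aws:iam::123456789012:role/SageMakerRole",  # placeholder
    "Tags": [{"Key": "project", "Value": "projectA"}],          # required by policy
}

# import boto3
# boto3.client("sagemaker").create_training_job(
#     **training_job_request,
#     # ...AlgorithmSpecification, InputDataConfig, etc. go here
# )
print(json.dumps(training_job_request["Tags"]))
```

If the tag key is missing or the value is anything other than `projectA`, the condition fails and SageMaker rejects the request with an access-denied error.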

]]>
https://studytrails.com/2023/11/27/enforce-tagging-for-sagemaker-training-job/feed/ 0
Introduction to scaling Large Model training and inference using DeepSpeed https://studytrails.com/2023/04/26/introduction-to-scaling-large-model-training-and-inference-using-deepspeed/ https://studytrails.com/2023/04/26/introduction-to-scaling-large-model-training-and-inference-using-deepspeed/#respond Wed, 26 Apr 2023 01:30:08 +0000 http://studytrails.com/?p=16214 What is DeepSpeed for Generative AI?

DeepSpeed is an open-source (Apache 2.0 license) library that optimizes training and inference for foundation models. It is a lightweight wrapper around PyTorch and optimizes for both speed and scale.

Training optimization using DeepSpeed

DeepSpeed optimizes training by managing distributed training, mixed precision, gradient accumulation, and checkpoints. Some of its features are:

  • It can train models of up to 13 billion parameters on a single GPU.
  • It implements a feature called Zero Redundancy Optimizer (ZeRO) which essentially reduces redundancies in memory in distributed training.
  • It supports combinations of data, model and pipeline parallelism, which it calls 3D parallelism.
  • It increases communication efficiency using 1-bit Adam (Adam with 1-bit compression), 0/1 Adam, and 1-bit LAMB.
  • It includes a library called Data Efficiency, which increases training efficiency and model quality by making better use of data.
  • It supports long sequence lengths using sparse attention kernels.
  • It improves training efficiency by using large-batch optimizers (LAMB) for deep training.
  • It enables distributed training with mixed precision.
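A hedged sketch of how some of these features are switched on in practice: the config values below are illustrative, not tuned recommendations, and the `deepspeed.initialize` call is shown commented out since it needs a real PyTorch model and a GPU environment.

```python
# DeepSpeed takes its settings from a JSON-style config dict.
ds_config = {
    "train_batch_size": 32,
    "gradient_accumulation_steps": 4,
    "fp16": {"enabled": True},            # mixed-precision training
    "zero_optimization": {"stage": 2},    # ZeRO: partition optimizer state + grads
}

# import deepspeed
# engine, optimizer, _, _ = deepspeed.initialize(
#     model=model, model_parameters=model.parameters(), config=ds_config)
print(ds_config["zero_optimization"]["stage"])
```

The returned engine wraps the model, so the training loop calls `engine.backward(loss)` and `engine.step()` in place of the usual PyTorch calls.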

Inference Optimization using DeepSpeed

There are two main challenges to inference – latency and cost. DeepSpeed has the following features to optimize inference:

  • Splitting inference across multiple GPUs and choosing the best parallelism strategy for multi-GPU inference.
  • Increase the efficiency per GPU using
    • deep fusion – combine multiple operations into a single kernel.
    • novel kernel scheduling – small batch sizes inflate the relative cost of kernel invocation, and general matrix multiplication (GEMM) libraries are not tuned for small sizes; DeepSpeed addresses both challenges.
  • DeepSpeed Quantization toolkit reduces inference cost and contains
    • Different quantization schemes for parameters and activations.
    • Specialised INT8 inference kernels.

Compression Features

DeepSpeed contains a component known as the compression composer. This offers multiple compression methods such as quantization, head/row/channel pruning, knowledge distillation, and layer reduction, and it provides an API to combine these methods in various combinations.

]]>
https://studytrails.com/2023/04/26/introduction-to-scaling-large-model-training-and-inference-using-deepspeed/feed/ 0
Understand CLIP (Contrastive Language-Image Pre-Training) — Visual Models from NLP https://studytrails.com/2022/09/28/understand-clip-contrastive-language-image-pre-training-visual-models-from-nlp/ https://studytrails.com/2022/09/28/understand-clip-contrastive-language-image-pre-training-visual-models-from-nlp/#comments Wed, 28 Sep 2022 07:56:11 +0000 http://studytrails.com/?p=16193 Read more]]> CLIP introduces a model that enables zero shot learning for a new dataset (in addition to a new example) by using NLP to supervise pre-training. i.e., To identify an object, you can provide the name or description of a new object that the model has not seen before.

Traditionally, a computer vision model was trained with just images, which means that to classify an object as a zebra, the model had to be trained with lots of zebras. But what if you train a model using not just an image but also its associated text (e.g. its caption)? If you then train the model on hundreds of animals (excluding the zebra) and test it with an image of a zebra plus a description of what a zebra looks like (like a horse, but with black and white stripes), the model may be able to classify the zebra without having seen one in training. This is also called zero-shot learning.

Natural Language Supervision for Visual Models

The idea is to learn more about the image using supervision from Natural Language Processing. However, it’s hard to find large, high-quality datasets of crowd-labeled images with text. The paper introduces a new dataset of 400 million (image, text) pairs collected from the internet.

One way to train the model is to jointly train the image CNN and text transformer from scratch; however, that does not scale very well. Secondly, training a model to predict the exact words of the text accompanying an image is hard, and therefore a contrastive representation might be easier.

What is contrastive representation learning?

Contrastive representation learning captures information that is shared by multiple sources (images, text); the idea is to maximize mutual information. Predictive learning might use an encoder+decoder setup to predict one source from the other. Contrastive learning, on the other hand, learns an embedding that separates (contrasts) samples from two different sources.

Training CLIP

Given N (image, text) pairs, CLIP is trained to predict which of the N x N possible pairings actually occurred. To do this, CLIP learns a multimodal embedding space by jointly training an image encoder and a text encoder, maximizing the similarity of matching (image, text) pairs and penalizing incorrect pairings.
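This objective can be sketched in numpy, adapted from the pseudocode in the CLIP paper: normalized image and text embeddings are compared pairwise, and a symmetric cross-entropy pulls matching pairs together and pushes mismatched pairs apart.

```python
import numpy as np

def clip_loss(img_emb, txt_emb, temperature=0.07):
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature      # (N, N) cosine-similarity matrix
    labels = np.arange(len(img))            # the correct pair is the diagonal

    def xent(l):                            # cross-entropy over rows
        p = np.exp(l - l.max(axis=1, keepdims=True))
        p /= p.sum(axis=1, keepdims=True)
        return -np.log(p[labels, labels]).mean()

    # Symmetric: classify text given image, and image given text.
    return (xent(logits) + xent(logits.T)) / 2

rng = np.random.default_rng(0)
emb = rng.normal(size=(8, 32))
print(clip_loss(emb, emb))   # perfectly aligned pairs -> near-zero loss
```

Passing the same embeddings for both modalities gives a near-zero loss, while unrelated embeddings score around log N — exactly the gradient signal that shapes the shared embedding space.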

For the image encoder the paper uses two models:

  1. ResNet-50 as base architecture with modifications.
  2. Vision Transformer (ViT)

For the text encoder they used a transformer based on this paper.

CLIP Evaluation

We learnt that zero-shot learning is about predicting a new class, but the paper treats zero-shot learning as learning a new task. For example, another paper showed that a language model trained to generate Wikipedia articles could reliably transliterate names between languages.

To perform zero-shot classification, CLIP uses the names of all the classes in the dataset as text pairings and predicts the most probable (text, image) pair.

For the input dataset, since the class labels sometimes consist of only a single word (‘dog’), they replaced the label with a prompt such as ‘A photo of a dog’, sometimes providing more context: ‘A photo of a dog, a type of pet’. This is similar to prompt engineering.

The performance of CLIP on various tasks/datasets is summarized in this figure

CLIP comparison with human performance

Humans showed an increase in average performance from 54% to 76% between zero-shot and one-shot (trained with one example) learning. The few-shot results in the paper were not as strong, suggesting that there is still scope to improve the algorithm. One reason is that the algorithm’s few-shot learning does not make use of prior knowledge, whereas humans’ does. However, out-of-distribution images are hard for both humans and the algorithm.

Limitations

The following limitations are listed in the paper

  1. CLIP’s zero-shot performance is weak on certain fine-grained classification tasks, such as differentiating models of cars, species of flowers, and variants of aircraft, when compared with task-specific models.
  2. CLIP struggles with abstract tasks such as counting the number of objects.
  3. For novel tasks such as classifying the distance from nearest car in photo, the performance is random.
  4. For out-of-distribution tasks the performance is poor as well (OCR trained on typed documents works well, but fails on handwritten documents).

Conclusion

CLIP presents a method to perform zero-shot learning on completely new tasks using a computer vision model pre-trained with supervision from NLP. It has shown promising results on multiple datasets, but still needs work for complex tasks.

]]>
https://studytrails.com/2022/09/28/understand-clip-contrastive-language-image-pre-training-visual-models-from-nlp/feed/ 1
Amazon S3 lifecycle policies https://studytrails.com/2021/06/26/amazon-s3-lifecycle-policies/ https://studytrails.com/2021/06/26/amazon-s3-lifecycle-policies/#respond Sat, 26 Jun 2021 02:07:41 +0000 http://studytrails.com/?p=16184 Read more]]>

Why S3 lifecycle policies

S3 lifecycle policies allow you to do two things:

  1. Reduce cost by deleting data that is no longer required.
  2. Implement your security policies by :
    • Retaining data for the required duration, and reducing cost by moving it to low-cost storage.
    • Deleting data that you are not allowed to retain beyond a specified period.

S3 lifecycle policy transition types

There are two kinds of transition policies within an S3 bucket.

  1. The first kind of policy allows you to expire an object, and
  2. the second kind allows you to transition an object to a lower-cost storage tier. For example, you can transition an object from standard storage to infrequent-access storage and then on to Glacier storage.

In the transition policy you can specify the number of days after which you want to transition an object. For example, one possible transition policy is to move an object from standard storage to infrequent-access storage after 30 days, then to Glacier storage after 90 days, and then possibly to Glacier Deep Archive after 180 days. You can then choose to retain the object for a year or two and eventually retire it.
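That schedule can be expressed as an S3 lifecycle configuration; here is a sketch using boto3. The bucket name is a placeholder, and the API call is left commented out so the configuration itself is the focus.

```python
# The 30/90/180-day schedule from the example above, as lifecycle rules.
lifecycle = {
    "Rules": [{
        "ID": "archive-then-expire",
        "Status": "Enabled",
        "Filter": {"Prefix": ""},            # apply to all objects in the bucket
        "Transitions": [
            {"Days": 30,  "StorageClass": "STANDARD_IA"},
            {"Days": 90,  "StorageClass": "GLACIER"},
            {"Days": 180, "StorageClass": "DEEP_ARCHIVE"},
        ],
        "Expiration": {"Days": 730},         # retire after roughly two years
    }]
}

# import boto3
# boto3.client("s3").put_bucket_lifecycle_configuration(
#     Bucket="my-bucket", LifecycleConfiguration=lifecycle)
print(len(lifecycle["Rules"][0]["Transitions"]))
```

Each transition only needs a day count and a target storage class; S3 evaluates the rules daily against every object the filter matches.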

S3 lifecycle policies

S3 lifecycle policy retire vs delete

One source of confusion is that the object is retired (expired) rather than deleted immediately. Retired essentially means that the object is marked for deletion, and S3 removes it asynchronously some time later. You are not charged for the object after it is retired.

]]>
https://studytrails.com/2021/06/26/amazon-s3-lifecycle-policies/feed/ 0
Host a static website on Amazon S3 https://studytrails.com/2021/06/19/host-a-static-website-on-amazon-s3/ https://studytrails.com/2021/06/19/host-a-static-website-on-amazon-s3/#respond Sat, 19 Jun 2021 22:53:06 +0000 http://studytrails.com/?p=16180 Read more]]> In this blog I will show you how to deploy a static website on Amazon S3, along with an HTTPS endpoint and content delivered by CloudFront CDN. Here’s a video of the steps

Deploy static website on Amazon S3

Six steps to deploying a static website on Amazon S3

Step 1. Create a Bucket

As the first step, I create a bucket in S3. Note that at this point I already have the static website's HTML/CSS/JavaScript pages built and tested. The bucket should not be public.

Step 2. Upload files to the bucket

Next, I uploaded the HTML, CSS and JavaScript files to the bucket. You can upload using the AWS console or the aws s3 cp command.

Step 3. Create an Origin Access identity

I built the website so that it can be accessed through CloudFront instead of directly through S3. The advantage is that your traffic is served through the CDN, and your website is secured with HTTPS since CloudFront can terminate your SSL connection. There are two ways in which CloudFront can access the S3 bucket: the first is to make the S3 bucket public, and the second is to keep the bucket private and create an Origin Access Identity (OAI). The OAI is used by CloudFront to access files in the S3 bucket. I just modified the bucket policy in S3 to allow access via the OAI. Actually, that's not completely accurate — while I created the OAI, I just clicked a checkbox in the UI and it updated the bucket policy on my behalf. Nifty!
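For reference, the bucket policy that the console generates on your behalf looks roughly like this (a sketch — the OAI ID shown is hypothetical):

```json
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::cloudfront:user/CloudFront Origin Access Identity EXXXXXXXXXXXXX"
            },
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::studytrails-s3-course/*"
        }
    ]
}
```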

Step 4. Create a certificate in ACM

I then created a certificate in ACM. My site was called ‘site.studytrails.com’, so I created a certificate for that name. Since my website's DNS is in Route 53, I had to click another button in ACM to validate the certificate, and it made the required entries in Route 53 on my behalf. I had to wait until the validation was done before moving on to the next step.

Step 5. Create the CloudFront distribution

Once the certificate was ready, I created a new CloudFront distribution. While creating the distribution, I selected the REST API endpoint for the bucket and used the OAI that I had created in step 3. I also gave the distribution the additional CNAME of ‘site.studytrails.com’, since that is how I want to access my website, and chose the certificate for that site that I had created in step 4. I also set index.html as the root object. After I clicked ‘create distribution’, I had to wait a while before the distribution was ready.

Step 6. Create the DNS entry

The last step is to create the DNS entry in Route 53 (or your own DNS provider). I created a Route53 entry for my site and used the CloudFront distribution as the alias. It might take a while for the DNS to propagate.

That’s it! Six easy steps to host your static website on AWS. Your site is secure, scalable and ready for the world!

]]>
https://studytrails.com/2021/06/19/host-a-static-website-on-amazon-s3/feed/ 0
Amazon S3 encryption https://studytrails.com/2021/05/27/amazon-s3-encryption/ https://studytrails.com/2021/05/27/amazon-s3-encryption/#respond Thu, 27 May 2021 05:45:59 +0000 http://studytrails.com/?p=16166 Read more]]> Encryption allows you to store objects in such a way that only an entity that has the encryption key, or access to the encryption key can access that object.

S3 Encryption types

There are five ways in which you can encrypt objects that you write to S3.

S3 Encryption Options

S3 Server Side Encryption using Amazon S3 Managed Keys

You can encrypt data in S3 using keys that are managed entirely by S3. This uses AES-256 encryption, and there are no additional charges. Use this option if your company does not have a policy around encryption keys and you are comfortable with AWS managing the keys. To request this encryption via code, set the x-amz-server-side-encryption header. You can also write a bucket policy that denies upload requests that do not contain this header.
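Such a bucket policy might look like the following (a sketch reusing the bucket name from later examples; the Sid is hypothetical). A PutObject request without the header fails the StringNotEquals condition and is denied:

```json
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyUnencryptedUploads",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:PutObject",
            "Resource": "arn:aws:s3:::studytrails-s3-course/*",
            "Condition": {
                "StringNotEquals": {
                    "s3:x-amz-server-side-encryption": "AES256"
                }
            }
        }
    ]
}
```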

S3 Server Side Encryption using keys stored in AWS KMS (AWS Or Customer managed)

AWS KMS allows you to create encryption keys, and these keys can be used to encrypt objects in S3. There are two ways to create an encryption key (also known as a Customer Master Key, or CMK). The first way is to let AWS create a key for you. When you use this option, AWS creates a key called aws/s3.

S3 key created by AWS KMS

The other way is to create a key yourself in KMS using your own key material. The advantage of this method is that you can manage (rotate) the keys yourself and you own the key material.

For keys stored in AWS KMS there is an overhead as well: you are charged for using KMS as specified here. You can reduce the charge by configuring S3 Bucket Keys. To specify SSE using KMS, pass the header x-amz-server-side-encryption with a value of aws:kms. You can also pass the header x-amz-server-side-encryption-aws-kms-key-id to specify which key to use; if you omit it, the default aws/s3 key is used.
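With boto3, these headers map to the ServerSideEncryption and SSEKMSKeyId parameters of put_object. Here is a minimal sketch — the key ID is hypothetical:

```python
# boto3 parameter equivalents of the SSE-KMS request headers:
#   x-amz-server-side-encryption                -> ServerSideEncryption
#   x-amz-server-side-encryption-aws-kms-key-id -> SSEKMSKeyId
kms_put_args = {
    "Bucket": "studytrails-s3-course",
    "Key": "kmsEncryptedObject",
    "Body": b"a,b,c",
    "ServerSideEncryption": "aws:kms",
    # Hypothetical KMS key ID; omit this to use the default aws/s3 key.
    "SSEKMSKeyId": "1234abcd-12ab-34cd-56ef-1234567890ab",
}

# To upload (requires AWS credentials; shown for illustration only):
# import boto3
# boto3.client("s3").put_object(**kms_put_args)
```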

S3 Server Side Encryption using Customer keys

In the first case the keys were managed by S3, and then we saw keys managed by KMS (generated either by AWS or by the customer). In the third case the keys stay entirely with the client; AWS does not manage them at all. You pass in the key when you upload an object, and pass in the same key again to download it.

Here’s an example in python to do that

import boto3
import os

BUCKET = 'studytrails-s3-course'
# A random 256-bit key. Keep this key safe: S3 does not store it,
# and the object cannot be decrypted without it.
KEY = os.urandom(32)
s3 = boto3.client('s3')

# Upload an object encrypted with the customer-provided key (SSE-C)
s3.put_object(Bucket=BUCKET,
              Key='encryptedObject',
              Body=b'a,b,c',
              SSECustomerKey=KEY,
              SSECustomerAlgorithm='AES256')

# Upload an unencrypted object for comparison
s3.put_object(Bucket=BUCKET,
              Key='unencryptedObject',
              Body=b'a,b,c')

# Download the encrypted object. Note that we must pass the same KEY
# again; a get_object call without it will fail.
print("Getting S3 object...")
response = s3.get_object(Bucket=BUCKET,
                         Key='encryptedObject',
                         SSECustomerKey=KEY,
                         SSECustomerAlgorithm='AES256')
print("Done, response body:")
print(response['Body'].read())

S3 Client side encryption

You can also encrypt a file in your own application before uploading it, using either your own key or keys managed in KMS. At the time of writing there is no client-side encryption support in the Python SDK.
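As a rough illustration of the idea, you can encrypt locally with any library and upload only the ciphertext, so S3 never sees the plaintext. The sketch below uses the third-party cryptography package (not an AWS library); the bucket and object names are hypothetical:

```python
from cryptography.fernet import Fernet

# Client-side encryption: S3 only ever stores ciphertext.
key = Fernet.generate_key()   # store this key safely yourself
f = Fernet(key)

ciphertext = f.encrypt(b'a,b,c')

# Upload the ciphertext (requires AWS credentials; shown for illustration):
# import boto3
# boto3.client('s3').put_object(Bucket='studytrails-s3-course',
#                               Key='clientEncryptedObject', Body=ciphertext)

# After downloading, decrypt with the same key:
plaintext = f.decrypt(ciphertext)
```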

]]>
https://studytrails.com/2021/05/27/amazon-s3-encryption/feed/ 0
Amazon S3 access control and permissions https://studytrails.com/2021/05/24/amazon-s3-access-control-and-permissions/ https://studytrails.com/2021/05/24/amazon-s3-access-control-and-permissions/#respond Mon, 24 May 2021 08:53:03 +0000 http://studytrails.com/?p=16163 Read more]]>

There are three ways to control access to an S3 bucket and its objects:

  1. Bucket policies
  2. Bucket Access Control Lists (ACLs)
  3. User policies

ACLs are used only in cases where objects are not owned by the bucket owner, i.e. the objects were uploaded by another account. The object owner (the other account that uploaded them) can write an object ACL to manage them. A bucket ACL is used only to grant the Amazon S3 Log Delivery group permission to write access logs to your bucket.

You can put users in a group and then write group based policy as well.

Bucket policies have a limit of 20KB.

User policy to allow S3 bucket listing and getting, putting and deleting objects

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowListingAllBuckets",
            "Effect": "Allow",
            "Action": [
                "s3:GetBucketLocation",
                "s3:ListAllMyBuckets"
            ],
            "Resource": "*"
        },
        {
            "Sid": "AllowListingObjectsInABucket",
            "Effect": "Allow",
            "Action": [
                "s3:ListBucket"
            ],
            "Resource": "arn:aws:s3:::studytrails-s3-course"
        },
        {
            "Sid": "AllowDownloadUploadDelete",
            "Effect": "Allow",
            "Action": [
                "s3:PutObject",
                "s3:GetObject",
                "s3:DeleteObject"
            ],
            "Resource": "arn:aws:s3:::studytrails-s3-course/*"
        }
    ]
}

The user policy above has three parts: the first allows the user to list all buckets, the second allows the user to list all objects in a specific bucket, and the third allows the user to put, get and delete objects in that bucket.

Bucket policy to deny access to a bucket except to specific users

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:*",
            "Resource": [
                "arn:aws:s3:::studytrails-s3-course",
                "arn:aws:s3:::studytrails-s3-course/*"
            ],
            "Condition": {
                "StringNotLike": {
                    "aws:userId": [
                        "3812xxx91xxx",
                        "AIDAVxxxxxxxD3BS47ZLR"
                    ]
                }
            }
        }
    ]
}

This policy denies access to all users except the root account (identified by the account number) and another user (identified by the user ID). To obtain a user's ID, use this command:

aws iam get-user --user-name studytrails

Bucket policy to allow cross account access

{
    "Version": "2012-10-17",
    "Statement": [
        
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::43157xxxxxxx:root"
            },
            "Action": [
                "s3:GetBucketLocation",
                "s3:ListBucket"
            ],
            "Resource": "arn:aws:s3:::studytrails-s3-course"
        }
    ]
}

The policy above gives the administrator of another account (43157xxxxxxx) access to the bucket owned by account 3812xxx91xxx. The administrator can then delegate this access to any user in that account using user policies like the one in the first section of this blog.

]]>
https://studytrails.com/2021/05/24/amazon-s3-access-control-and-permissions/feed/ 0