We will use the standard Markov Decision Process (MDP) model
\[\mathcal{M} = (\mathcal{S},\mathcal{A},P,r,\gamma,\rho_0).\]

| Term | MDP notation | LM analogue | LM notation |
|---|---|---|---|
| State | $s\in\mathcal{S}$ | prompt + prefix tokens | $s_t \equiv x_{0:t}$ |
| Action | $a\in\mathcal{A}$ | next token | $a_t \equiv x_{t+1}$ |
| Transition | $P(s'\mid s,a)$ | append token (usually deterministic) | $s_{t+1}=(s_t,a_t)$ |
| Reward | $r(s,a)$ | terminal score / reward model | $r_T$ (often at EOS) |
| Discount | $\gamma\in[0,1)$ | (optional) token-position discount | $\gamma$ |
| Start state dist. | $\rho_0$ | prompt distribution | $p(\text{prompt})$ |
| Policy | $\pi_\theta(a\mid s)$ | LM next-token distribution | $p_\theta(x_{t+1}\mid x_{0:t})$ |
| Trajectory | $\tau=(s_0,a_0,\dots,s_T)$ | completion / rollout | $\tau \equiv x_{0:T}$ |
| Return | $R(\tau)=\sum_{t=0}^{T-1}\gamma^t r(s_t,a_t)$ | scalar score for completion | $R(x_{0:T})$ |
Policy / LM distribution. In general RL notation, our policy is a conditional distribution $\pi_\theta(a\mid s)$. For an autoregressive language model, we will identify
\[\pi_\theta(a_t\mid s_t) \equiv p_\theta(x_{t+1}\mid x_{0:t}).\]With this identification, the probability of a full completion (trajectory) factors as
\[\pi_\theta(\tau) \equiv p_\theta(x_{0:T}) = p(x_0)\prod_{t=0}^{T-1} p_\theta(x_{t+1}\mid x_{0:t}),\]where $p(x_0)$ denotes the prompt distribution (or is treated as fixed/conditioned on).
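In code, this factorisation is just a sum of per-token log-probabilities. A tiny sketch (pure Python; in practice these logprobs come from the model’s logits):

```python
import math

def completion_logprob(token_logprobs):
    """log p_theta(x_{0:T}) for the generated part: the sum of
    per-token log pi_theta(x_{t+1} | x_{0:t})."""
    return sum(token_logprobs)

# Toy completion of three tokens with probabilities 0.5, 0.25, 0.8:
logps = [math.log(0.5), math.log(0.25), math.log(0.8)]
total = completion_logprob(logps)
# exp(total) recovers the product 0.5 * 0.25 * 0.8 ≈ 0.1
```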
Note that we call discounting optional because most LM RL work sets $\gamma = 1$ and only has a single reward at the end of the trajectory. However, sometimes we change this (e.g., in scenarios where dense reward is available).
In policy gradient, we assume a differentiable, parameterized policy $\pi_\theta$ and update parameters by (stochastic) gradient ascent on $J(\theta)$:
\[\theta \leftarrow \theta + \alpha\,\widehat{\nabla_\theta J(\theta)}.\]An important initial identity is the log-derivative trick:
\[\begin{aligned} \nabla_\theta \log \pi_\theta(\tau) &= \frac{\nabla_\theta \pi_\theta(\tau)}{\pi_\theta(\tau)} \quad\text{(when $\pi_\theta(\tau)>0$)}\\[0.5em] \nabla_\theta \pi_\theta(\tau) &= \pi_\theta(\tau)\,\nabla_\theta \log \pi_\theta(\tau) \end{aligned}\]We want to maximise our expected reward, that is we set:
\[\begin{aligned} J(\theta) &= \mathbb{E}_{\tau\sim\pi_\theta}[R(\tau)] \\ &= \int \pi_\theta(\tau)\,R(\tau)\,d\tau \end{aligned}\]Now, let’s compute the derivative:
\[\begin{aligned} \nabla_\theta J(\theta) &= \nabla_\theta \int \pi_\theta(\tau)\,R(\tau)\,d\tau \\ &= \int \nabla_\theta\pi_\theta(\tau)\,R(\tau)\,d\tau \\ &= \int \pi_\theta(\tau)\,\nabla_\theta\log\pi_\theta(\tau)\,R(\tau)\,d\tau \end{aligned}\]Finally, expand the trajectory log-probability:
\[\begin{aligned} \log\pi_\theta(\tau) &= \log\rho_0(s_0) + \sum_{t=0}^{T-1}\log\pi_\theta(a_t\mid s_t) + \sum_{t=0}^{T-1}\log P(s_{t+1}\mid s_t,a_t) \\[0.5em] \nabla_\theta\log\pi_\theta(\tau) &= \nabla_\theta\log\rho_0(s_0) + \nabla_\theta\sum_{t=0}^{T-1}\log\pi_\theta(a_t\mid s_t) + \nabla_\theta\sum_{t=0}^{T-1}\log P(s_{t+1}\mid s_t,a_t) \\ &= \sum_{t=0}^{T-1}\nabla_\theta\log\pi_\theta(a_t\mid s_t) \end{aligned}\]Interestingly, the transition dynamics disappear completely! Of course, they still affect the trajectory distribution implicitly; they just don’t depend on $\theta$, so their gradient vanishes. So we can just plug that right in:
\[\nabla_\theta J(\theta) = \mathbb{E}_{\tau\sim\pi_\theta}\left[\sum_{t=0}^{T-1}\nabla_\theta\log\pi_\theta(a_t\mid s_t)\,R(\tau)\right]\]Next, we will define the “reward-to-go” (the future return from time $t$):
\[G_t \triangleq \sum_{k=t}^{T-1} \gamma^{k-t}\,r(s_k,a_k)\]Importantly, the reward-to-go is the only part of the return that matters in our derivative. Write the full return as a “past” part plus a “future” part:
\[\begin{aligned} R(\tau) &= \sum_{k=0}^{T-1} \gamma^k r(s_k,a_k)\\ &= \underbrace{\sum_{k=0}^{t-1} \gamma^k r(s_k,a_k)}_{\triangleq\,R_{<t}} + \gamma^t\underbrace{\sum_{k=t}^{T-1} \gamma^{k-t} r(s_k,a_k)}_{=\,G_t} \end{aligned}\]Then, for any fixed time $t$,
\[\begin{aligned} \mathbb{E}_{\tau\sim\pi_\theta}\big[\nabla_\theta\log\pi_\theta(a_t\mid s_t)\,R(\tau)\big] &= \mathbb{E}_{\tau\sim\pi_\theta}\big[\nabla_\theta\log\pi_\theta(a_t\mid s_t)\,(\gamma^t G_t + R_{<t})\big]\\ &= \gamma^t\,\mathbb{E}_{\tau\sim\pi_\theta}\big[\nabla_\theta\log\pi_\theta(a_t\mid s_t)\,G_t\big] + \underbrace{\mathbb{E}_{\tau\sim\pi_\theta}\big[\nabla_\theta\log\pi_\theta(a_t\mid s_t)\,R_{<t}\big]}_{=0} \end{aligned}\]The last term is zero because $R_{<t}$ depends only on the trajectory up to $s_t$, not on $a_t$: conditioned on $s_t$, it acts as a valid baseline. Let’s put a pin in this for a second, and come back to it later. (Going forward we drop the $\gamma^t$ factor, as is standard; it is exactly $1$ in the usual LM setting where $\gamma=1$.)
This gives us our REINFORCE loss:
\[\mathcal{L}_{\text{REINFORCE}}(\theta) \;\triangleq\; -\,\mathbb{E}_{\tau\sim\pi_\theta}\left[\sum_{t=0}^{T-1} \log\pi_\theta(a_t\mid s_t)\,G_t\right]\]And our basic algorithm:
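In sketch form: sample trajectories from the current policy, compute each step’s reward-to-go, and take a gradient ascent step on $\sum_t \log\pi_\theta(a_t\mid s_t)\,G_t$. The reward-to-go itself is a simple backward pass over the per-step rewards (a minimal pure-Python sketch; sampling and the gradient step are left abstract):

```python
def rewards_to_go(rewards, gamma=1.0):
    """G_t = sum_{k >= t} gamma^(k-t) * r_k, computed right-to-left in O(T)."""
    G = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        G[t] = running
    return G

# Dense rewards at every step:
assert rewards_to_go([1.0, 2.0, 3.0]) == [6.0, 5.0, 3.0]
# Single terminal reward with gamma = 1 (the usual LM RL setting):
# every token shares the final score.
assert rewards_to_go([0.0, 0.0, 1.0]) == [1.0, 1.0, 1.0]
```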
There are three core issues with REINFORCE: high variance, no ability to reuse data, and sensitivity to step size.
I’ll cover the first two; step size is… less of a big issue for things like CISPO, but it is important and part of the motivation for developing things like TRPO and PPO. Let’s deal with variance first, since that’s the most important thing.
A standard variance-reducing trick is to subtract a baseline that only relies on the current state:
\[\nabla_\theta J(\theta) = \mathbb{E}_{\tau\sim\pi_\theta}\left[\sum_{t=0}^{T-1} \nabla_\theta\log\pi_\theta(a_t\mid s_t)\,(G_t - b(s_t))\right]\]We need to show two core things: (1) that doing this is unbiased, and doesn’t change our result; (2) that this actually reduces variance.
Let’s split out our loss:
\[\begin{aligned} \nabla_\theta J(\theta) &= \mathbb{E}_{\tau\sim\pi_\theta}\left[\sum_{t=0}^{T-1} \nabla_\theta\log\pi_\theta(a_t\mid s_t)\,(G_t - b(s_t))\right] \\ &= \mathbb{E}_{\tau\sim\pi_\theta}\left[\sum_{t=0}^{T-1} \nabla_\theta\log\pi_\theta(a_t\mid s_t)\,G_t\right] - \mathbb{E}_{\tau\sim\pi_\theta}\left[\sum_{t=0}^{T-1} \nabla_\theta\log\pi_\theta(a_t\mid s_t)\,b(s_t)\right] \end{aligned}\]Then, we will just consider the baseline term. Let’s consider what happens at a single time step $t$. The key thing is we split up the expectation:
\[\begin{aligned} \mathbb{E}_{\tau\sim\pi_\theta}\big[\nabla_\theta\log\pi_\theta(a_t\mid s_t)\,b(s_t)\big] &= \mathbb{E}_{s_t,a_t\sim\pi_\theta}\big[\nabla_\theta\log\pi_\theta(a_t\mid s_t)\,b(s_t)\big]\\ &= \mathbb{E}_{s_t}\left[\mathbb{E}_{a_t\sim\pi_\theta(\cdot\mid s_t)}\big[\nabla_\theta\log\pi_\theta(a_t\mid s_t)\,b(s_t)\ \big|\ s_t\big]\right]\\ &= \mathbb{E}_{s_t}\left[b(s_t)\,\mathbb{E}_{a_t\sim\pi_\theta(\cdot\mid s_t)}\big[\nabla_\theta\log\pi_\theta(a_t\mid s_t)\big]\right]\\ &= \mathbb{E}_{s_t}\left[b(s_t)\,\sum_a \pi_\theta(a\mid s_t)\,\nabla_\theta\log\pi_\theta(a\mid s_t)\right]\\ &= \mathbb{E}_{s_t}\left[b(s_t)\,\sum_a \nabla_\theta\pi_\theta(a\mid s_t)\right]\\ &= \mathbb{E}_{s_t}\left[b(s_t)\,\nabla_\theta\sum_a \pi_\theta(a\mid s_t)\right] = \mathbb{E}_{s_t}\left[b(s_t)\,\nabla_\theta 1\right] = 0 \end{aligned}\]Therefore
\[\mathbb{E}_{\tau\sim\pi_\theta}\left[\sum_{t=0}^{T-1} \nabla_\theta\log\pi_\theta(a_t\mid s_t)\,b(s_t)\right] = 0,\]so replacing $G_t$ by $G_t-b(s_t)$ leaves the expectation unchanged (i.e., the estimator remains unbiased).
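The key fact used above is that the score function has zero mean under the policy, which is exactly why any action-independent baseline drops out. This is easy to verify numerically for a small softmax policy via finite differences (a toy sketch; the logits are arbitrary and nothing here is LM-specific):

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

# E_{a ~ pi}[ d/d(theta_k) log pi(a) ] for a softmax policy should be 0
# for every parameter theta_k.
logits = [0.2, -1.0, 0.7]
probs = softmax(logits)
h, k = 1e-6, 0  # finite-difference step, parameter index
expected_score = 0.0
for a in range(3):
    up = list(logits); up[k] += h
    dn = list(logits); dn[k] -= h
    d_logp = (math.log(softmax(up)[a]) - math.log(softmax(dn)[a])) / (2 * h)
    expected_score += probs[a] * d_logp
# expected_score is ~0, up to finite-difference error
```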
Note that this is why a baseline depending on anything up to the current state is okay: if the baseline depended on the current action, we couldn’t move it out of the inner expectation in the proof above. So our baseline can depend on anything apart from the current action.
Now that we know we can do this, we need to consider: what would be a good baseline for reducing our variance?
The math for optimal baselines gets more involved, and I couldn’t find a clear explanation, so let’s stick with a higher-level approximation¹. These approximations are not strictly valid in general, but they help us build some intuition for why baselines can reduce variance. Consider computing the variance of a single trajectory’s contribution:
\[\begin{aligned} \mathrm{Var}\Big(\sum_{t=0}^{T-1}\nabla_\theta\log\pi_\theta(a_t\mid s_t)\,(G_t-b(s_t))\Big) &\approx^{(i)} \sum_{t=0}^{T-1}\mathbb{E}_{\tau}\Big[\big(\nabla_\theta\log\pi_\theta(a_t\mid s_t)\,(G_t-b(s_t))\big)^2\Big]\\ &\approx^{(ii)} \sum_{t=0}^{T-1}\mathbb{E}_{\tau}\Big[\big(\nabla_\theta\log\pi_\theta(a_t\mid s_t)\big)^2\Big]\,\mathbb{E}_{\tau}\Big[\big(G_t-b(s_t)\big)^2\Big] \end{aligned}\]Here (i) drops cross-covariance terms between different time steps, and (ii) further treats the score term and the (centered) return term as approximately independent. We can’t control the variance over the logprobs, so instead we want to minimize just $G_t - b(s_t)$! This suggests that a good baseline is our best guess at $G_t$ based on $s_t$, which is exactly our value function!
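We can sanity-check this numerically on a one-step, two-action toy problem (entirely made-up numbers, just for illustration): subtracting $b=\mathbb{E}[r]$ leaves the estimator’s mean (the true gradient) unchanged, but shrinks its variance.

```python
import random

def grad_logp(p1, a):
    # d/d(theta) log pi(a) for a two-action policy with pi(1) = sigmoid(theta)
    return (1.0 - p1) if a == 1 else -p1

def estimate_variance(baseline, n=5000, seed=0):
    """Empirical variance of the score-function estimator g = score * (r - b)."""
    rng = random.Random(seed)
    p1 = 0.4  # pi(a=1); pi(a=0) = 0.6
    samples = []
    for _ in range(n):
        a = 1 if rng.random() < p1 else 0
        r = (2.0 if a == 1 else 1.0) + rng.gauss(0.0, 0.1)  # noisy reward
        samples.append(grad_logp(p1, a) * (r - baseline))
    mean = sum(samples) / n
    return sum((g - mean) ** 2 for g in samples) / n

var_no_baseline = estimate_variance(baseline=0.0)
var_with_baseline = estimate_variance(baseline=1.4)  # b = E[r] = 0.6*1 + 0.4*2
# Same seed, same draws: the only difference is the baseline, and the
# variance with the baseline is much smaller.
```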
This gives us the advantage function:
\[A_{\pi_\theta}(s_t, a_t) = G_t - \hat{V}_{\pi_\theta}(s_t)\]Note here that $G_t$ is the return we get from taking $a_t$ in state $s_t$ and then following the policy. That is, it is (an unbiased Monte Carlo estimate of) the Q-value. Intuitively, we are replacing weighting by absolute reward with weighting by how much the action improves over the average action taken by the policy. This reduces variance since we are taking into account “how good” our policy is already.
Let’s cover the other big issue with REINFORCE: data reuse.
We can’t really re-use old data in our current setup, since we derived everything under the assumption that our trajectories come from the current model ($\tau \sim \pi_\theta$). This means that once we take a gradient step, the old trajectories are coming from a different distribution ($\tau \sim \pi_{\theta_{\mathrm{old}}}$).
We can apply importance weighting to fix this. The core idea is that we can draw samples from a different distribution so long as we reweight them by the ratio of their probabilities under the current and sampling distributions:
\[\begin{aligned} \mathbb{E}_{\tau\sim\pi_{\theta}}\big[f(\tau)\big] &= \int \pi_{\theta}(\tau)\,f(\tau)\,d\tau \\ &= \int \pi_{\theta_{\mathrm{old}}}(\tau)\,\frac{\pi_{\theta}(\tau)}{\pi_{\theta_{\mathrm{old}}}(\tau)}\,f(\tau)\,d\tau \\ &= \mathbb{E}_{\tau\sim\pi_{\theta_{\mathrm{old}}}}\Big[w(\tau)\,f(\tau)\Big] \end{aligned}\]where
\[w(\tau) \triangleq \frac{\pi_{\theta}(\tau)}{\pi_{\theta_{\mathrm{old}}}(\tau)} = \prod_{t=0}^{T-1}\frac{\pi_{\theta}(a_t\mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t\mid s_t)}\]So we can just apply this to our original policy-gradient expression:
\[\begin{aligned} \nabla_\theta J(\theta) &= \mathbb{E}_{\tau\sim\pi_\theta}\left[\sum_{t=0}^{T-1}\nabla_\theta\log\pi_\theta(a_t\mid s_t)\,G_t\right]\\ &= \mathbb{E}_{\tau\sim\pi_{\theta_{\mathrm{old}}}}\left[w(\tau)\,\sum_{t=0}^{T-1}\nabla_\theta\log\pi_\theta(a_t\mid s_t)\,G_t\right] \end{aligned}\]We can also define the ratio on the token-level:
\[r_{t}(\theta) = \frac{\pi_{\theta}(a_t\mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t\mid s_t)}\]Actually, we basically already have enough to get the current SOTA RL algorithm, CISPO²!
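In practice, both the trajectory-level weight $w(\tau)$ and the token-level ratios $r_t(\theta)$ are computed from stored log-probabilities, in log-space, to avoid numerical under/overflow over long sequences. A minimal sketch (pure Python; real implementations read these logprobs from the model):

```python
import math

def token_ratios(logp_new, logp_old):
    """Token-level ratios r_t = pi_new(a_t|s_t) / pi_old(a_t|s_t)."""
    return [math.exp(n - o) for n, o in zip(logp_new, logp_old)]

def trajectory_weight(logp_new, logp_old):
    """w(tau) = product of token ratios, computed as exp of a sum of logs."""
    return math.exp(sum(logp_new) - sum(logp_old))

# Two generated tokens; the new policy halves the first token's probability.
logp_old = [math.log(0.5), math.log(0.5)]
logp_new = [math.log(0.25), math.log(0.5)]
ratios = token_ratios(logp_new, logp_old)   # ≈ [0.5, 1.0]
w = trajectory_weight(logp_new, logp_old)   # ≈ 0.5 * 1.0 = 0.5
```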
CISPO’s loss is:
\[J_{\mathrm{CISPO}}(\theta) = \mathbb{E}_{(q,a)\sim\mathcal{D},\,\{o^i\}_{i=1}^G\sim\pi_{\theta_{\mathrm{old}}}(\cdot\mid q)}\left[ \frac{1}{\sum_{i=1}^G |o^i|} \sum_{i=1}^G\sum_{t=1}^{|o^i|} \mathrm{sg}\big(r_{i,t}(\theta)\big)\,\hat A_{i,t}\,\log \pi_\theta\big(o^i_t\mid q,o^i_{<t}\big) \right]\]Note that we are averaging over groups here (the $i$ index) and over timesteps (number of tokens in a given rollout). $\mathrm{sg}$ stands for “stop gradient”, and is used to avoid backpropagating through the logprobs used in computing the importance ratio $r$. $G$ is our group size (number of rollouts sharing a prompt), which is used to compute the advantage $\hat A$:
\[\hat A_{i,t} = \frac{R_i - \mathrm{mean}(\{R_j\}_{j=1}^G)}{\mathrm{std}(\{R_j\}_{j=1}^G)}\]Note that this is intuitively capturing exactly what we were doing before, but using the group estimates instead of a learned function for the value estimates.
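Concretely, the group advantage is just a per-group standardisation of the scalar rewards, and every token in rollout $i$ shares the same $\hat A_i$. A minimal sketch (the small epsilon in the denominator is an implementation detail not shown in the formula, guarding groups where all rewards are equal):

```python
def group_advantages(rewards, eps=1e-8):
    """GRPO-style advantage: standardise each rollout's scalar reward
    against its group's mean and (population) std."""
    G = len(rewards)
    mean = sum(rewards) / G
    var = sum((r - mean) ** 2 for r in rewards) / G
    std = var ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]

# A group of 4 rollouts for the same prompt, scored 0/1 by a verifier:
adv = group_advantages([1.0, 0.0, 0.0, 1.0])
# Correct rollouts get advantage ≈ +1, incorrect ≈ -1 (mean 0.5, std 0.5).
```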
Some other interesting advantage estimates: VinePPO³, grouping variants of GRPO⁴, and REINFORCE++⁵.
You might wonder why it is valid to include the reward of the current trajectory in the baseline for GRPO and other methods like REINFORCE++: doesn't this depend on the current action? Actually, this is a special case that turns out to be fine!
Firstly, using other trajectories is completely valid, since they are independent of our current action (different samples entirely). The reward of our current trajectory is the weird part. Let's consider using the average batch reward-to-go as a baseline:
$$ b(s_t) \equiv \frac{1}{B}\sum_{n=1}^{B} G^n_t, \quad\text{where $B$ is our batch size.} $$ Let $x$ denote our current sample in the batch. Then: $$ \begin{aligned} \nabla_\theta J(\theta) &= \mathbb{E}_{\tau\sim\pi_\theta}\left[\sum_{t=0}^{T-1} \nabla_\theta\log\pi_\theta(a_t\mid s_t)\,(G^x_t - b(s_t))\right]\\ &= \mathbb{E}_{\tau\sim\pi_\theta}\left[\sum_{t=0}^{T-1} \nabla_\theta\log\pi_\theta(a_t\mid s_t)\,\Big(G^x_t - \frac{1}{B}\sum_{n=1}^{B} G^n_t\Big)\right] \\ &= \mathbb{E}_{\tau\sim\pi_\theta}\left[\sum_{t=0}^{T-1} \nabla_\theta\log\pi_\theta(a_t\mid s_t)\,\Big(G^x_t - \frac{1}{B}G^x_t - \frac{1}{B}\sum_{n=1,\,n\ne x}^{B}G^n_t\Big)\right] \\ &= \mathbb{E}_{\tau\sim\pi_\theta}\left[\sum_{t=0}^{T-1} \nabla_\theta\log\pi_\theta(a_t\mid s_t)\,\Big(\frac{B-1}{B}G^x_t - \frac{1}{B}\sum_{n=1,\,n\ne x}^{B}G^n_t\Big)\right] \end{aligned} $$We know using $\frac{1}{B}\sum_{n=1,\,n\ne x}^{B}G^n_t$ is fine as a baseline, so we just need to consider the $\frac{B-1}{B}G^x_t$ term. It should be intuitively clear that this just scales the gradient by $\frac{B-1}{B}$, and so does not bias the direction of the final result. Strictly, using a leave-one-out estimate as in RLOO avoids this scaling altogether, but it does not hurt!
Crucially, this only works because we used the reward itself in the baselining. Other functions involving $a_t$ would be invalid to use.
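The leave-one-out estimate mentioned above is simple to compute without a second pass over the data (a sketch of an RLOO-style baseline, where each rollout's reward is compared against the mean of the others'):

```python
def rloo_advantages(rewards):
    """Leave-one-out advantage: baseline each sample with the mean
    reward of the *other* samples in the batch."""
    B = len(rewards)
    total = sum(rewards)
    return [r - (total - r) / (B - 1) for r in rewards]

# Four rollouts scored 0/1; each is compared against the other three,
# so a correct rollout gets 1 - 1/3 and an incorrect one gets 0 - 2/3.
adv = rloo_advantages([1.0, 0.0, 0.0, 1.0])  # ≈ [2/3, -2/3, -2/3, 2/3]
```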
And that’s it! Thanks for listening!
See also: Weng, L. (2018). Policy Gradient Algorithms; Sutton & Barto (1998). Reinforcement Learning: An Introduction (Chapter 13).
Seita, D. (2017). Going Deeper Into Reinforcement Learning: Fundamentals of Policy Gradients. ↩
MiniMax et al. (2025). MiniMax-M1: Scaling Test-Time Compute. ↩
Kazemnejad et al. (2025). VinePPO: Refining Credit Assignment. ↩
Feng et al. (2025). Grouping in Group Policy Optimization for LLM. ↩
Hu et al. (2025). REINFORCE++: Stabilizing Critic-Free Policy. ↩
Ahmadian et al. (2024). Back to Basics: Revisiting REINFORCE Style Optimization. ↩
Note to reader: This blog post is a (lightly) edited version of a post I originally wrote in May 2025, but dropped due to getting busy with life. I’ve finally cleaned it up and shared it, but understand it’s a bit out of date! Lots of interesting and cool work on LM overthinking and length control has come out since then, one example being GDPO.
A large flaw of autoregressive thinking models is that their inference can simply go on and on. For example, if we take DeepSeek-R1 and provide it with a slightly nonsense riddle, it very quickly degrades into endless guess-and-checking:
> I asked the new R1 to "Perform some calculation to estimate pi/7" and I don't know if it's ever going to stop thinking
>
> — Nathan Lambert (@natolambert) May 28, 2025
This is partly just because long CoTs take time to generate, but it is also part of an issue known as overthinking, wherein models think for longer and produce longer CoTs for problems they will eventually get wrong, or spend far more time on a problem than it requires. There is a decent amount of literature on this at this point, including papers that propose to solve the issue by learning to better allocate compute across questions.
One particular approach to solving this I found interesting earlier in the year was L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning. The idea behind the paper is simple: let’s just reward the model for getting the length right during RL training! (Perhaps) surprisingly, a fairly simple reward works well here. Doing everything during RL training is something I’m interested in doing right now, so I quite liked this, and worked a little on replicating this in Open-Instruct, a post-training codebase I contribute to quite a bit.
In order to replicate, I set up a small setting I cared about: Given a math dataset, can I RL-train with an additional length control reward to achieve a ‘reasonable’ degree of length control without sacrificing quality? In particular, I was interested in the following properties:
Some of these questions are answered in the original paper in more detail, but I wanted to try getting my own answers!
Let’s get a bit more specific about the setup. For my experiments, I used a Qwen-2.5-7B model finetuned on the Tulu 3 SFT dataset for 2 epochs (following the same hyperparameters as the original Tulu 3 SFT, but with a max context length of 32768). This is the starting point for my experiments unless otherwise stated. Note that this model is not a thinking model out-of-the-box, so it doesn’t really do long-CoT generations initially.
For data, I used the Eurus 2 data, a large-ish collection of math and code data. I filtered out the code data.
Finally, I did all my training with GRPO. I think this should work with other online RL algorithms (e.g. PPO), but I used GRPO since it was straightforward. To be exact, I used the grpo_fast implementation in Open-Instruct, which includes some minor extra features like packing. Here’s an example of a command I used to train:
```bash
dataset=ai2-adapt-dev/eurus2_ground_truth_with_random_max_length
python mason.py \
    --cluster ai2/augusta-google-1 \
    --pure_docker_mode \
    --image hamishivi/open_instruct_mult_dev1704 \
    --workspace ai2/tulu-3-dev \
    --priority high \
    --preemptible \
    --num_nodes 3 \
    --max_retries 0 \
    --budget ai2/oe-adapt \
    --gpus 8 -- source configs/beaker_configs/ray_node_setup.sh \&\& python open_instruct/grpo_fast.py \
    --exp_name $exp_name \
    --beta $beta \
    --num_samples_per_prompt_rollout 16 \
    --num_unique_prompts_rollout 128 \
    --sft_messages_key messages \
    --output_dir /output \
    --kl_estimator kl3 \
    --learning_rate 1e-6 \
    --dataset_mixer_list $dataset 1.0 \
    --dataset_mixer_list_splits train \
    --dataset_mixer_eval_list $dataset 16 \
    --dataset_mixer_eval_list_splits train \
    --max_token_length 10240 \
    --max_prompt_token_length 1024 \
    --response_length 8192 \
    --model_name_or_path ai2-adapt-dev/tulu_3_long_finetune_qwen_7b_reg \
    --apply_verifiable_reward True \
    --non_stop_penalty False \
    --temperature 0.6 \
    --total_episodes 2000000 \
    --non_stop_penalty_value 0.0 \
    --pack_length 16384 \
    --deepspeed_stage 2 \
    --per_device_train_batch_size 1 \
    --num_learners_per_node 8 8 \
    --num_epochs 1 \
    --num_mini_batches 1 \
    --vllm_tensor_parallel_size 1 \
    --vllm_num_engines 8 \
    --lr_scheduler_type constant \
    --seed 1 \
    --num_evals 100 \
    --save_freq 2000 \
    --try_launch_beaker_eval_jobs_on_weka False \
    --gradient_checkpointing \
    --with_tracking
```
There are some minor changes for different datasets (with different length rewards), but really this is the main command and hyperparameters used.
What does our reward look like? I experimented with four different setups:
- **Exact:** I append `\nThink for n tokens.` to the prompt. We then calculate the reward as `1 - (abs(tokenized_prediction - desired_length) / 8192)`, and add this to the ground truth reward (so the model gets extra reward for getting the length correct).
- **Bucketed:** the same as above, but desired lengths are drawn from `{100, 1024, 2048, 4096, 6144, 8192}`. The idea is that learning specific ‘valid’ lengths might be easier than any integer in a range with > 4000 values.
- **Up to:** the same as the exact reward, except if `tokenized_prediction - desired_length < 0`, we just give full reward. The idea is that this is an easier task to learn, and a bit more realistic (a user probably doesn’t mind if the model finishes early). Note that I edit the prompt to be `\nThink for up to n tokens.`

To visualise the ‘up to’ and ‘exact’ rewards for a desired length of 3200 tokens:
*(Figure: the ‘up to’ and ‘exact’ reward curves for a desired length of 3200 tokens.)*
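These rewards are simple to implement. A sketch of my reading of the ‘exact’ and ‘up to’ variants described above (the function name and signature are mine, not the actual Open-Instruct code; this length bonus is added to the ground-truth correctness reward):

```python
def length_reward(pred_len, desired_len, max_len=8192, up_to=False):
    """Length reward: 1 - |pred - desired| / max_len, with the 'up to'
    variant giving full reward whenever the model finishes early."""
    if up_to and pred_len < desired_len:
        return 1.0
    return 1.0 - abs(pred_len - desired_len) / max_len

exact = length_reward(3000, 3200)              # 1 - 200/8192 ≈ 0.976
early = length_reward(3000, 3200, up_to=True)  # finished early: full reward
over = length_reward(4000, 3200, up_to=True)   # going over is still penalised
```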
How do we evaluate? Basically, I evaluated on MATH-500 with desired lengths of {100, 1024, 2048, 4096, 6144, 8192, 9216, 10240} (set via the prompt). I measure both performance at the various lengths and how far off the lengths are from the desired output length (shown via violin plot below). I also did one out-of-domain experiment running evaluation on MMLU with the same lengths.
Let’s start by just looking at how well each approach actually adheres to the length budget. We use a violin plot to show the distribution of lengths generated by the model for each desired length, and remove the bottom and top 5% of lengths to reduce the effect of outliers.
*(Figures: violin plots of generated length vs. desired length, one per reward setup.)*
Takeaways: Training on the budget works really well! We get pretty good length control, although it’s not exact exact. However, we don’t generalise to new lengths, so we can’t use this technique to scale inference-time compute beyond what we used during training. Interestingly, we also see that the ‘up to’ reward doesn’t work that well, as the model just learns to always be short: instead, we need the tight ‘exact’ reward.
You might be curious about performance. Below I’ve plotted performance at different output lengths for the 8k “exact” reward (I found the other methods similar in performance, apart from the ‘up to’ reward, which just learns to be short). Note I evaluated using both greedy and temperature sampling (temperature 0.7), hence the two lines.
*(Figure: MATH-500 performance across desired lengths for the 8k ‘exact’ reward, greedy and temperature-0.7 sampling.)*
As you can see, the model matches the performance of the ‘no length control’ baseline once output length reaches >= 2000 tokens. This suggests (a) we can get length control without sacrificing performance, and (b) the model doesn’t need to generate long chains to do well. (b) is especially interesting, since the model without length control is fairly yappy and often uses the full 8k token budget. This suggests that the model learns to compress its reasoning as part of the length control task. Perhaps this would drop performance on more complex tasks, but here it’s very encouraging. Indeed, much work over the past year has found that reasoning models can compress their reasoning chains quite a lot.
Finally, I also wanted to see how well the length control did at tasks that were OOD. Recall we are training on math data only, so I evaluated on MMLU, which is a general QA task.
*(Figure: length control on MMLU across desired lengths.)*
Here, we see that length control still works, but less strongly on these OOD tasks. I consider this pretty successful, since in reality we can just train on a diverse mixture and minimise how many OOD cases the model needs to deal with (and later for Olmo 3 we did indeed train on a moderately diverse mixture of data).
Sadly, we do see performance drop (all curves using temperature 0.7 sampling):
*(Figure: MMLU performance for the trained models vs. the base model, temperature-0.7 sampling.)*
Note that ‘bucketed’ is the bucketed reward mentioned above, and the other two are ‘exact’ rewards trained for differing amounts of time with different learning rates. All three perform worse than the base model. However, this might just be due to the model overfitting on the math-only training data, though I don’t have an experiment to confirm this.
I did this project to answer a few questions, and we can now clearly do that.
Overall, I think these results show that an L1-style recipe is pretty effective at teaching token budgets. If you use coarse buckets (e.g. ‘easy’, ‘medium’, ‘hard’ reasoning), you can do really well, and even exact token budgets are learnable with training without sacrificing too much performance. Revisiting these results has reminded me to try to fold this into the next Olmo release, which I will (try to) do.
Thanks for reading, and may your LMs be perfectly verbose.
But anyway, let’s get into it!
Really awful on the reading front this year. I’ll try to do better in the future. I spent much of the first half of the year reading comics and worked through Kieron Gillen’s back catalogue, which was fun (and I really love The Power Fantasy currently)!
Other ongoing comics I’m appreciating include the amazing Ultimates run by Deniz Camp (crazy that the ultimate universe is ending soon!), and DC’s companion Absolute universe comics are also great, especially Absolute Martian Manhunter, which is kind of amazing (and also by Deniz Camp). On the manga side, I grew a weird appreciation for Jujutsu Kaisen Modulo, which is doing some interesting things as a sequel series, and I got back into One Piece, which has just been consistently killing it.
I watched… 107 films this year! Pretty good, improving on last year. It might be tough to beat this next year! As usual, my reviews and ratings are on my letterboxd. I watched all the Fast and Furious films this year (fun to see them progress), and finally watched some Wong Kar-Wai (Chungking Express is amazing). So many amazing films came out this year, but One Battle After Another probably has to be the overall top, although Twinless and Bugonia were really tight contenders (Bugonia is just pure Lanthimos, it’s great) - and, of course, Sinners. I watched a surprisingly large number of anime movies in theatres (Demon Slayer, Dandadan, Chainsaw Man), which was fun, and perhaps signals an interesting shift in what’s showing in theatres.
BIG year for TV for me, even more so than last year. Andor blew me away, and was probably my favourite show of the year (and probably a contender for one of my favourite shows of all time). It’s hard to imagine a tighter show than Andor, and I’m sad it’s ending but also happy they could even do it in the first place. The Rehearsal redefined my notion of what a TV show could be, and was the least predictable show I’ve ever seen (and that’s even compared to something like Severance). Severance was amazing, and an extremely fun watch (and fun to chat about with friends). Pluribus is also great, and I’m excited to see where it goes in the future. I also watched a lot of great short series (Chernobyl, Apple Cider Vinegar, Adolescence, Normal People), and rewatched Legion (it remains an endlessly inventive show; I wish more superhero shows would take as much liberty with their style as Legion does episode-to-episode). Finally, I finished watching all of House, M.D. (which I started a few years back, as a background show). I would probably say the first few seasons are easily the best, but it remains surprisingly solid throughout. It also makes all the clips of the show you see floating around even better, to be honest.
Looking forward already to the upcoming year of TV (looking forward to The Boys, Industry, House of the Dragon, and maybe Wonder Man)!
Not too many games this year, similar to last, but I did really enjoy Ace Attorney as usual (and hoping for more), and really enjoyed Deltarune as usual. I got sucked into Balatro (like many others), and then in the back half of the year focussed on enjoying a Switch 2 and the respective games. I’m currently working my way through the Mega Dimension DLC for Pokemon Legends: Z-A, which is a bit rough, but the base game itself is fun and cool to see as someone who’s been a pokemon fan since the Ruby and Sapphire days.
Music-wise, I listened to a lot of Role Model this year, and continued to enjoy Magdalena Bay and Royel Otis (and saw all three of these artists live!). I developed a strong appreciation for Egoism (from my home of Sydney), really liked Mallrat’s new album, Light hit my face like a straight right, and found Ninajirachi’s debut album (I Love My Computer) AMAZING. It’s a good time for Australian music! Other-stuff-wise, I really liked The Crux, Djo’s new album, and I got a little sucked into k-pop, listening to some Le Sserafim (who I saw live!) and Katseye stuff.
Other video-wise, I got sucked into Stavvy’s World clips, and otherwise enjoyed Big A clips and Lemonade Stand. I also had a few months deep in the GothamChess zone, deep on chess.com. Apart from this, I feel that my watching habits didn’t change too much - still a lot of WolfeyVGC, a lot of Jacob Geller, and various other video essayists I enjoy (e.g., Patrick H Willems, Lindsay Ellis, Todd in the Shadows).
Finally, I’m still spending time on reels and tiktoks, but I think it’s become much more mainstream. Much more political stuff, and trends come and go faster than ever. My tiktok account somehow got deleted, so I don’t really spend any time on there anymore, which is…probably good. Overall, I made more of an effort to watch more movies, and clearly that’s actually paid off a bit!
That’s about it for me this year! I’m looking forward to 2026 media-wise, and I hope to have maybe some more games on the list next year… maybe a good year to get back into a bunch of reading. Maybe I’ll try to put out some old blog posts. Already have lots of exciting ideas for research, too. It’s a fun time!
Sadly, not much reading this year, although I want to fix this (lots downloaded on my kindle)! I focussed a bit more on non-fiction this year. I really, really enjoyed Rejection: Fiction, but I have to say it’s not for the prudish or squeamish. Theory and Reality was also a solid read, and I recommend it to anyone in a PhD program – it feels important to have some grounding in the philosophy of science if you plan to be, well, a scientist. And yes, I (re)read all the plays of Aristophanes this year – sadly in English, but I do always love ancient comedy.
In terms of comics not listed here, I enjoyed (like many) the new Ultimate lines by Marvel, with Ultimate Spider-Man and Ultimate X-Men being particular highlights. I also didn’t mind the Zdarsky Batman run, and the new Absolute Batman seems promising from initial issues! I reread a lot of manga this year, including Bleach and Hunter x Hunter, which remain solid shounen (especially Hunter x Hunter). I also enjoyed the ongoing SJ comics Witch Watch, Dandadan, and Akane-Banashi. I also read (and got really into) CITY by Keiichi Arawi, author of perhaps my favourite gag manga ever (Nichijou) – it was like discovering that not only did a sequel to your favourite show exist, but that the author had really evolved their craft in the meantime – just a really great manga to read start to end.
Unlike reading, I managed to watch a lot of films this year - 79 total! You can go read my reviews and ratings on my letterboxd for actual details or recommendations. I went into both Mad Max and Alien phases this year, and (re)watched them. Actually, I discovered that the first Mad Max was shot partly in the same area as my Australian family! I Saw the TV Glow probably ended up being my favourite film of the year, but it’s certainly not for everyone. There was also a decent amount of ‘comfort food’ watching, which for me means mediocre superhero flicks.
I watched a decent amount of TV this year too! I’m also slowly working through House, M.D. (current comfort show). In hindsight, there were a few big highlights of the year. The Curse is perhaps an all-time piece, and Baby Reindeer and Pantheon really stuck in my brain. I particularly recommend Pantheon for anyone looking for ‘anime sci-fi’ vibes – sure, it’s not 100% technically accurate, but it has interesting ideas and explores them in interesting ways, which is all I can ask for. The Sympathizer and The Penguin also really got me – I especially recommend The Sympathizer, with great Park Chan-wook stylings (as the showrunner), and the uniqueness of watching a big-budget TV show mostly in Vietnamese. However, both shows are pretty rough in terms of violence, and aren’t perfect by any means.
Sadly, this is where work really reduced the amount of time I had (compare this list to last year…). I have a long, long backlog of games that came out that I really want to try out. Of these, I found Outer Wilds amazing – it absolutely lives up to its (almost insane levels of) hype, and Paper Mario was a fun take on an RPG. I also played through Final Fantasy XVI, but I have a bit to go to finish the platinum for it, so it’ll have to be on next year’s list.
I didn’t track these other things myself, but Spotify told me I really liked Porter Robinson, Magdalena Bay, Beabadoobee, Charli XCX, and Royel Otis. I managed to see 3 of these in person at concerts this year, along with other great artists like Clairo and ODESZA, which was fun. I got super into Magdalena Bay’s stuff, which comes with cool websites, production breakdowns, and just generally some fun vibes.
I watch way too much YouTube, but I really liked Jacob Geller’s video essays, WolfeyVGC’s videos on competitive Pokémon, and Tom Walker’s generally insane VODs.
I spent way too much time on tiktoks and reels this year, and trying to track what I saw would probably send me insane and/or into a depressive funk. I like the stupid yap dollar trend. I like the educational Map Men series. I enjoy the unhinged Sydney Metro fanpage. Short-form video content is fun and here to stay, but by god is it too addictive.
And that’s about it from me this year! I guess this is probably the most consistent blog post I have, so I’ll see you same time next year. I have a feeling that next year might have some more interesting items on the list :)
I also read many other manga and comics I did not track. I really enjoyed the ongoing run of Immortal X-Men, and did a big reread of One Piece and Jujutsu Kaisen. Maybe this year I’ll read more books…
Yes, I did watch most of the Mission Impossible films this past year. I track my film watching on my letterboxd.
In addition, I also watched The Curse and Monarch: Legacy of Monsters, but these did not finish until this year. I usually like to have one or two weekly shows to look forward to.
I like to play through entire series when I have the right consoles - such as playing through (and 100%ing) both The Last of Us games and the entire Pikmin series.
I ended up watching and playing a lot of things over the past year! It was a bit of a weird year, with a big summer break while I waited to start my PhD, so I don’t know if I’ll cover as much ground this year, but we will see.
Over the past year, I applied to tonnes of schools for PhD programs, interviewed at a bunch, and got into a few. This post will be a mixture of reflections and explanations about how this process went, what I did to prepare, and how I felt during it. There’s already a number of fantastic explanations, interviews, and stories about getting PhDs in NLP and ML more broadly, and if you are thinking about or preparing to do a PhD in NLP I really recommend all these, including:
There are so many more resources out there (for example, see this list), and I definitely recommend sitting and reading as many of these as you can! With this post, I’m not aiming to target anything in particular, but just give another datapoint that people might appreciate.
My journey towards seriously preparing for PhD applications began when I started as a predoctoral young investigator at AI2. This basically gave me the chance to work on a few research projects with some great, immensely talented folks, and get lots of advice / help with applying. This is obviously something that is fairly rare, but it definitely majorly helped my chances, and made me personally much more confident in applying (since it gave me the chance to get some projects under my belt). Before the program, I had a little research experience, but no published papers, and was definitely not feeling up to applying to programs - although I know and have heard of folks succeeding in applications in similar scenarios.
Over the initial PYI year, I managed to publish one paper, and put out two others (which were under review by the time I applied). Maybe not the most productive or impactful researcher out there, but honestly the bigger thing was being exposed to the broader NLP community and gaining the confidence to talk about my research ideas and NLP research more broadly with the community at large. It turns out sometimes just being confident enough to talk about your ideas can get you some places!
Around August, I started to look around and work out where I wanted to apply, and what the deadlines, professors, programs, and so on were like. I had a draft list bouncing around all year, but this is when I think I started to really take it more seriously. For every school I could think of (i.e., had heard something about, or knew about a professor there), I listed (a) the professors there and their areas of interest, (b) the application deadline, (c) any notable details about the program, and (d) general positives or negatives that I could think of (e.g., location, what I had heard from their students, and so on). I tried to harden this into a partially-ordered list, segmenting the schools into ‘tiers’, and trying to pare it down so that only the places I would genuinely accept if given an offer were on the list. I also shared the list with some folks, who helped by suggesting places I hadn’t heard of or considered before.
The final list was still long (I ended up applying to ~13 schools), but going through the process helped focus my efforts a bit.
Around the start of October, I started to draft up my statement of purpose (SoP) and ask folks to write letters of recommendation for me. After some time working on the SoP by myself, I shared it around with a few folks and got back some really good, clear feedback. In the end, I probably rewrote the SoP almost from scratch 3-4 times in order to really shake it up and keep it tight. I definitely recommend reading the SoPs on CS SoP for inspiration, and sharing your drafts around (this can be scary, but the feedback is definitely worthwhile!). It’s cliche to say, but try to stay positive and not compare yourself to too many of the (legitimately insanely good) candidates out there - just do what you can, without losing your mind over it.
Right before the first deadlines started approaching in late November, I basically started creating accounts in the grad application systems and checking to see what documents I would need. Something worth noting is that nearly every school has unique constraints or extra tasks for its application: most schools constrain the SoP in some way (I saw 2 pages, 1 page, 1000 words, single-spacing, double-spacing, pasting raw text only with no formatting). Additionally, some schools require a personal statement (talking more about your own experience and goals), and some require random short-answer questions (e.g., write about a time you had to teach).
I ended up spending most of the week before and of Thanksgiving writing the first batch of these, targeting the earlier applications first (warning: European deadlines are much earlier than the American deadlines, and you might have to be ready in October or early November). This first hump was the toughest, but once I was through it I had a bunch of variants of personal statements tailored for a bunch of different universities, with the two most useful variants being the two-page and 1000-word versions, since shrinking two pages down to 1000 words requires a non-trivial amount of cutting.
Much of this time was also spent inputting my details into the various application systems, which was mind-numbingly boring and took ages at times. It’s worth doing this early, because you can often send out recommendation letter requests as soon as you have created an account, which gives your letter-writers more time to work out the system and send in their letters.
Over December, I attended EMNLP and continued the application process during it. At this point, I was just trying to stay a bit ahead of the deadlines while also attending the conference. I mostly had the parts I needed, as noted above, but it still felt like a frantic rush, as I was often submitting 2-3 applications a day and still had to write the final paragraph of each personal statement, where I tried to make the connections between my work / interests and the professors I was applying to explicit (some people recommend not doing this; I’m not sure what is best).
I finished up applications by December 15 while in Australia (some schools have later deadlines, but I did not go for them). Unfortunately, there was a paper deadline 5 days later, so I had to turn my attention to paper-writing, but after that I just took a massive break and basically didn’t touch my laptop at all for two weeks.
Starting in very late December (around the 28th), I started to get interview requests (although this was very early). Requests picked up a bit in the first week of January, and kept arriving on a regular basis until around the end of the month (~the 25th). These were actually pretty spread out - I think I almost always had at least one interview lined up at any given point from late December to late January. They were a bit nerve-wracking, of course, but all ultimately enjoyable and not too difficult. It turns out that talking about research and NLP is fun, actually! I did prepare for these a little, collating a list of common questions and preparing answers, which I refined and worked on over time. For each professor, I looked up their background and recent work, and tried to think about questions specific to that school or professor that would be useful.
Here’s an (abbreviated and paraphrased) list of questions I prepared for and got from multiple people. I don’t think any of these should be too surprising!
For the most part, questions were not too technical, but definitely be prepared to talk about your own work and maybe 1-2 recent papers in detail, to be safe. Although - every professor has their own style, so it’s hard to give one-size-fits-all advice.
Also - I did not look at gradcafe or similar through this process, and to be honest didn’t ever feel the urge to. Everything runs at its own pace, and at this point you just have to go with the flow!
I was more or less done with interviews by the end of January, and I started getting offers in early February. It’s worth noting timelines here seem very, very messy - I know folks who got offers from the same school at different times, and even right up until April some folks I know were getting offers. Every school has its own pace, and I think the best strategy once interviews are done is to just try not to think much about it and not expect super-fast turnarounds.
I took a nice long break vacationing around Europe to enjoy the period before starting PhD, and I definitely recommend taking some time off if you can afford it!
This was a very rough timeline with some thoughts. Don’t worry if you are ready sooner or later than the dates here, and feel free to move at your own pace. But hopefully it provides a little insight into what (I think) a semi-reasonable timeline looks like. It’s also worth noting that applying for things like fellowships (e.g., NSF) often takes place earlier than admissions, which can be another chance to get stuff ready early.
14/02/23 Update: I have updated the model responses in light of the recently-released ChatGPT and text-davinci-003 models, scroll to the bottom of this page to check them out!
Latin has more text extant than Ancient Greek (I think - at least, it would be more common than Ancient Greek in some random web/book crawl). So this should mean GPT-3 is better at it. Furthermore, Google Translate Latin exists, which provides a fun comparison.
Let’s take a few lines from Book 12 of the Aeneid, and see how it goes. For those who haven’t used GPT-3 before, it works by prompting it with some text in natural language, and then letting it autocomplete your text (a bit like pressing autocomplete on a keyboard, but for a full paragraph). People often try out multiple prompts to work out what is best - I messed around a little and settled on a basic format of Translate the following to English: <text> English:, which seemed to do okay. In the following chunks, bolded indicates model-generated text, with the rest of the text being input by me:
Translate the following to English:
Turnus ut infractos adverso Marte Latinos defecisse videt, sua nunc promissa reposci, se signari oculis, ultro implacabilis ardet attollitque animos. Poenorum qualis in arvis saucius ille gravi venantum vulnere pectus tum demum movet arma leo gaudetque comantis excutiens cervice toros fixumque latronis inpavidus frangit telum et fremit ore cruento: haud secus adcenso gliscit violentia Turno.
English:
Turnus sees the Latins defeated by Mars and is furious, demanding that his promises be fulfilled. He is determined to take revenge and rallies his troops. Just as a wounded lion in the fields finally moves to attack the hunters who wounded him, after shaking off the blood from his mane, and breaks the spear of the bold robber with his powerful neck, so too does Turnus’ fury grow as he climbs up to fight.
For those of you who didn’t spend your final year of high school memorising book 12 of the Aeneid, here’s a professional translation (from poetry in translation):
When Turnus saw the Latins exhausted, and weakened by their military reverse, himself the subject of every gaze, his own promise to them yet unfulfilled, he burned implacably, and unprompted, and raised his courage. As a lion, in the African bush, severely hurt by huntsmen with a wound to the chest, only then rouses himself to battle, tosses his shaggy mane over his neck, in joy, and, unafraid, snaps off the spear some poacher has planted in him, roaring from blood-stained jaws: so the violence grew in Turnus’s inflamed heart.
This does pretty well! It sort of muddles bits, but the lion simile remains intact in a nice way. By the way, Google Translate does awful here:
Turnus sees that the Latins, broken in opposition to Mars, have failed, now that their promises have been restored, and that they have been sealed with their eyes; He inflicts punishments such as those in the fields on the chest of the hunted with a severe wound, then at last he moves his arms like a lion and rejoices, shaking the neck of the commander, fixed on his neck, and fearlessly breaks his weapon and roars with a bloody mouth: just as the burning violence of Turnus glides.
So it appears (to me) that GPT-3 has a better idea of Latin than whatever Google uses! The Aeneid is too easy, though - an all-time classic, which has probably appeared all over in GPT-3’s training. Let’s try something a bit different, Ovid’s Ars Amatoria (still famous, but not as mega-famous as the Roman Epic). I’ll use an easier sentence:
Translate the following to English: Atque aliqua ex illis cum regum nomina quaeret, Quae loca, qui montes, quaeve ferantur aquae, Omnia responde, nec tantum siqua rogabit; Et quae nescieris, ut bene nota refer.
English:
And when she asks about the names of the kings, the places, the mountains, and the waters that are mentioned, answer everything, and even if she doesn’t ask, explain what you know clearly. And if you don’t know something, say so.
Here’s Google’s version:
And when some of them ask for the names of kings, what places, what mountains, what waters are carried, answer everything, and he will not ask so much; And what you do not know, report as well known.
And a more professional, actual translation:
And if she, among them, asks the name of a king, what place, what mountains, and what stream’s displayed, you can reply to all, and more if she asks: and what you don’t know, reply as memory prompts.
So GPT-3 did pretty well here! Although the final sentence is wrong (Ovid says to say what you can recall, rather than to say if you don’t know).
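For reference, the prompt format I settled on is simple enough to sketch as a tiny helper (the function here is hypothetical, just to make the template explicit - at the time, the resulting string would have been sent to OpenAI’s completions API with text-davinci-002 and the model’s continuation taken as the translation):

```python
def make_prompt(text: str) -> str:
    """Wrap a Latin (or Greek) passage in the translation prompt format used in this post."""
    return f"Translate the following to English:\n\n{text}\n\nEnglish:"

# Example: build the prompt for a short Latin phrase.
prompt = make_prompt("Arma virumque cano")
print(prompt)
```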
For good measure, let’s try something legal (albeit still fairly straightforward): Cicero’s Pro Roscio Amerino, a famous legal defence.
Translate the following to English: fide sapientiaque vestra fretus plus oneris sustuli quam ferre me posse intellego. hoc onus si vos aliqua ex parte adlevabitis, feram ut potero studio et industria, iudices; sin a vobis, id quod non spero, deserar, tamen animo non deficiam et id quod suscepi quoad potero perferam. quod si perferre non potero, opprimi me onere offici malo quam id quod mihi cum fide semel impositum est aut propter perfidiam abicere aut propter infirmitatem animi deponere.
English:
I realize that I have taken on more than I can handle, but I am relying on your wisdom and support. If you can help me in any way, I will do my best to carry on, judges. If I am abandoned by you, which I do not hope for, I will not give up and will do my best to carry out what I have started. If I cannot do so, I would rather be crushed by the weight of my responsibilities than abandon or give up on what has been entrusted to me.
A professional translation would be:
Relying on your integrity and wisdom, I have undertaken a greater burden than, I am well aware, I am able to bear. If you, in some degree, lighten this burden, O judges, I will bear it as well as I can with zeal and industry. But if, as I do not expect, I am abandoned by you, still I will not fail in courage, and I will bear what I have undertaken as well as I can. But if I cannot support it, I had rather be overwhelmed by the weight of my duty, than either through treachery betray, or through weakness of mind desert, that which has been once honestly entrusted to me.
As a final fun test, let’s see if GPT-3 knows the basic forms of the verb to be:
What is the 1st person singular form of the verb ‘to be’ in Latin? Sum
What is the 2nd person singular form of the verb ‘to be’ in Latin? The 2nd person singular form of the verb ‘to be’ in Latin is ‘est’.
What is the 3rd person singular form of the verb ‘to be’ in Latin? The third person singular form of the verb “to be” in Latin is “est.”
What is the 1st person plural form of the verb ‘to be’ in Latin? sumus
What is the 2nd person plural form of the verb ‘to be’ in Latin? You are (plural)
What is the 3rd person plural form of the verb ‘to be’ in Latin? Sunt.
Well, it gets a bit confused, although I think this is quite hard considering the prompt doesn’t give any examples. I find it interesting it gets the 2nd person forms wrong, considering those are forms I have found to be far far less common than the others.
Overall, though, this did a really good job - I’m really impressed by GPT-3’s Latin abilities. I wouldn’t put it past it having memorised these texts (these all exist on the internet, with translations usually somewhere in the same document), but it’s still quite impressive to see. It also handily beats Google, which is fun.
Let’s start with something really easy: the first four lines of the Iliad.
Translate the following to English:
μῆνιν ἄειδε θεὰ Πηληϊάδεω Ἀχιλῆος οὐλομένην, ἣ μυρί᾽ Ἀχαιοῖς ἄλγε᾽ ἔθηκε, πολλὰς δ᾽ ἰφθίμους ψυχὰς Ἄϊδι προΐαψεν ἡρώων, αὐτοὺς δὲ ἑλώρια τεῦχε κύνεσσιν οἰωνοῖσί τε πᾶσι, Διὸς δ᾽ ἐτελείετο βουλή, ἐξ οὗ δὴ τὰ πρῶτα διαστήτην ἐρίσαντε Ἀτρεΐδης τε ἄναξ ἀνδρῶν καὶ δῖος Ἀχιλλεύς.
English: Achilles, the son of Peleus, killed Hector, who had caused great harm to the Achaeans. He also killed many other brave men, and he terrified those who were still alive. This was the will of Zeus, who had first stirred up the conflict between King Atreus’ son Agamemnon and noble Achilles.
This is already worse - Hector doesn’t even appear in the Greek text! However, overall it’s okay, seemingly getting a sense of the text, although being definitely incorrect at times (unlike the Latin translations, which tended to be correct). For those of you who don’t know the Iliad by heart, these lines would be translated something like (from poetry in translation):
Goddess, sing me the anger, of Achilles, Peleus’ son, that fatal anger that brought countless sorrows on the Greeks, and sent many valiant souls of warriors down to Hades, leaving their bodies as spoil for dogs and carrion birds: for thus was the will of Zeus brought to fulfilment. Sing of it from the moment when Agamemnon, Atreus’ son, that king of men, parted in wrath from noble Achilles.
If you prompt it with the first clause(s), it gets a much better translation, likely due to it having encountered the beginning of the Iliad countless times in training:
Sing, Muse, of the wrath of Achilles, son of Peleus, that brought countless ills upon the Achaeans. Many a brave soul did it send hurrying down to Hades, and many a hero did it yield a prey to dogs and vultures, for so was the will of Zeus fulfilled from the time when first they quarreled, Atreus’ son, king of men, and brilliant Achilles.
There are many reasons GPT-3 could be doing worse for Greek, but one large change from Latin is the use of a different script. Let’s try normalising the text into beta code (a standard conversion of Ancient Greek into ASCII) and see how it goes:
Translate the following to English:
mh=nin a)/eide qea\ *phlhi+a/dew *)axilh=os ou)lome/nhn, h(\ muri/’ *)axaioi=s a)/lge’ e)/qhke, polla\s d’ i)fqi/mous yuxa\s *)/ai+di proi/+ayen h(rw/wn, au)tou\s de\ e(lw/ria teu=xe ku/nessin oi)wnoi=si/ te pa=si, *dio\s d’ e)telei/eto boulh/, e)c ou(= dh\ ta\ prw=ta diasth/thn e)ri/sante *)atrei/+dhs te a)/nac a)ndrw=n kai\ di=os *)axilleu/s.
English:
After he had seen the many wrongs that had been done to Axylus, a just man, Hercules went to the land of the Achaeans and killed many of the evil men there. But he spared the life of Axylus, because he was a good man. So Zeus gave him a city and made him a god, and the first temple to Hercules was built by the men of Troy after they had been saved from disaster by Hercules and by Zeus himself.
This is even worse! Removing the accent symbols (leaving only the letters) also doesn’t help. Evidently, GPT-3 doesn’t really know Ancient Greek at all (although it has definitely seen the Iliad).
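(For the curious, by ‘removing the accent symbols’ I mean stripping the combining diacritics via Unicode normalisation - a minimal sketch of the idea:)

```python
import unicodedata

def strip_accents(text: str) -> str:
    """Remove combining diacritics (accents, breathings, etc.) from polytonic Greek.

    NFD splits each accented letter into a base letter plus combining marks,
    which we then filter out before recomposing with NFC.
    """
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", stripped)

print(strip_accents("μῆνιν ἄειδε θεὰ"))  # → "μηνιν αειδε θεα"
```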
Since the Iliad is so famous, let’s try something more obscure. Here’s a paragraph from Heron’s On Automata:
Translate the following to English:
καθὰ δὲ προεθέμην, ἐρῶ περὶ ἑνὸς πίνακος τοῦ δοκοῦντός μοι κρείττονος. μῦθος μὲν ἦν τεταγμένος ἐν αὐτῷ ὁ κατὰ τὸν Ναύπλιον. τὰ δὲ κατὰ μέρος εἶχεν οὕτως· ἀνοιχθέντος ἐν ἀρχῇ τοῦ πίνακος ἐφαίνετο ζῴδια γεγραμμένα δώδεκα· ταῦτα δὲ ἦν εἰς τρεῖς στίχους διῃρημένα· ἦσαν δὲ οὗτοι πεποιημένοι τῶν Δαναῶν τινες ἐπισκευάζοντες τὰς ναῦς καὶ γινόμενοι περὶ καθολκήν.
English: I was once thinking about a picture that seemed better to me than it actually was. The subject of the picture was the story of the Odyssey. Part of the picture showed the twelve Labors of Hercules, which were divided into three groups. These were done by some of the Danaids who were repairing their ships and were becoming burdened.
And if I prompt the model with the first few words:
…English: As I laid out before, I was speaking about a painting that seemed better to me. The painting was hung in such a way that in the beginning, when you opened it, you saw twelve animals painted on it, divided into three verses. These were some of the Danaans, who, when they were repairing their ships, became encircled.
Here’s the correct translation:
As I laid out before, I will talk about one box that seems superior to me. The story set in it was the one about Naupilius. And its parts went like this. In the beginning, when the box opened, 12 painted figurines appeared. These were divided into 3 rows; and these were made to represent some of the Danaans (Greeks) preparing their ships and launching them.
So, the model has clearly learnt some vocab - it gets the number 12, it works out Danaans, ships, and a few other things. These translations seem vaguely familiar but ultimately are definitely incorrect. It’s cool that it works this well, but future classics scholars are definitely not going to be able to use it to cheat on their university homework 😜
Unsurprisingly, GPT-3 is pretty good! Its Latin translation abilities seem pretty good, and it definitely feels like it has seen ancient texts written in English enough to regurgitate them when prompted. Its success at Latin makes its harsher failures in translating Ancient Greek a little surprising, but I suspect the big switch into a different script makes things much harder, and that there is simply far less Ancient Greek text in its pretraining corpus. I’ve focussed on languages I’m personally familiar with here, but it would also be interesting to see if one could somehow get a large language model like GPT-3 to give its best guess on how to translate Linear A or similar!
As a technical note, I used text-davinci-002 for this post.
Very shortly after writing this blog post, OpenAI came out with ChatGPT and text-davinci-003. Here’s a quick try of all the above prompts with these two models.
First, let’s try out text-davinci-003, the successor to text-davinci-002 (based on the name and OpenAI’s model documentation). I also tweaked the input prompts to indicate the original language of the text - I found this slightly improved the model’s responses, especially for Greek.
text-davinci-003 Latin

Translate the following to English from Latin:
Turnus ut infractos adverso Marte Latinos defecisse videt, sua nunc promissa reposci, se signari oculis, ultro implacabilis ardet attollitque animos. Poenorum qualis in arvis saucius ille gravi venantum vulnere pectus tum demum movet arma leo gaudetque comantis excutiens cervice toros fixumque latronis inpavidus frangit telum et fremit ore cruento: haud secus adcenso gliscit violentia Turno.
English:
Turnus sees that the Latins have been defeated by the adverse forces of Mars, and he now demands his promised reward; his eyes blaze with implacable rage and his spirit is raised. Just as a wounded lion in the fields, wounded by the heavy weapons of the hunters, finally stirs and rises up, shaking his maned neck and with fearless strength breaking the spear of the robber and roaring with bloody mouth, so too does Turnus surge with violence.
Translate the following to English from Latin:
Atque aliqua ex illis cum regum nomina quaeret, Quae loca, qui montes, quaeve ferantur aquae, Omnia responde, nec tantum siqua rogabit; Et quae nescieris, ut bene nota refer.
English:
And when someone asks you for the names of the kings, the places they lived, the mountains, and the rivers they are said to have, answer them all, even if you are not asked; and if you don’t know something, make sure to report it back as if you did.
Translate the following to English from Latin:
fide sapientiaque vestra fretus plus oneris sustuli quam ferre me posse intellego. hoc onus si vos aliqua ex parte adlevabitis, feram ut potero studio et industria, iudices; sin a vobis, id quod non spero, deserar, tamen animo non deficiam et id quod suscepi quoad potero perferam. quod si perferre non potero, opprimi me onere offici malo quam id quod mihi cum fide semel impositum est aut propter perfidiam abicere aut propter infirmitatem animi deponere.
English:
Trusting in your fidelity and wisdom, I have taken on more of a burden than I understand I can bear. If you lighten this burden in any way, I shall bear it as best I can with diligence and hard work, judges; if I am abandoned by you, which I do not hope, nevertheless I shall not fail in spirit and I shall carry out as best I can what I have taken on. But if I cannot carry it out, I would rather be overwhelmed by the burden of this duty than abandon or lay down what has been entrusted to me with faithfulness because of my lack of courage.
What is the 1st person singular form of the verb ‘to be’ in Latin? Esse.
What is the 2nd person singular form of the verb ‘to be’ in Latin? Esse.
What is the 3rd person singular form of the verb ‘to be’ in Latin? Est.
What is the 1st person plural form of the verb ‘to be’ in Latin? Sumus.
What is the 2nd person plural form of the verb ‘to be’ in Latin? Estis.
What is the 3rd person plural form of the verb ‘to be’ in Latin? Sunt.
Verdict: text-davinci-003 does really well! Overall these translations are pretty good - you can probably quibble with some of the choices, but they generally hold up. However, it does get some of the basic verb-form questions wrong, which is surprising - these should be really easy for it!
text-davinci-003 Greek

Translate the following to English from Homeric Greek:
μῆνιν ἄειδε θεὰ Πηληϊάδεω Ἀχιλῆος οὐλομένην, ἣ μυρί᾽ Ἀχαιοῖς ἄλγε᾽ ἔθηκε, πολλὰς δ᾽ ἰφθίμους ψυχὰς Ἄϊδι προΐαψεν ἡρώων, αὐτοὺς δὲ ἑλώρια τεῦχε κύνεσσιν οἰωνοῖσί τε πᾶσι, Διὸς δ᾽ ἐτελείετο βουλή, ἐξ οὗ δὴ τὰ πρῶτα διαστήτην ἐρίσαντε Ἀτρεΐδης τε ἄναξ ἀνδρῶν καὶ δῖος Ἀχιλλεύς.
English: Sing, goddess, the anger of Peleus’ son Achilleus, that destructive wrath which brought countless woes upon the Achaeans, and sent forth to Hades many valiant souls of heroes; and made their bodies a prey for dogs and all birds, and the will of Zeus was accomplished since that time when first there stood in strife Atreus’ son, king of men, and brilliant Achilleus.
Translate the following to English from Ancient Greek:
καθὰ δὲ προεθέμην, ἐρῶ περὶ ἑνὸς πίνακος τοῦ δοκοῦντός μοι κρείττονος. μῦθος μὲν ἦν τεταγμένος ἐν αὐτῷ ὁ κατὰ τὸν Ναύπλιον. τὰ δὲ κατὰ μέρος εἶχεν οὕτως· ἀνοιχθέντος ἐν ἀρχῇ τοῦ πίνακος ἐφαίνετο ζῴδια γεγραμμένα δώδεκα· ταῦτα δὲ ἦν εἰς τρεῖς στίχους διῃρημένα· ἦσαν δὲ οὗτοι πεποιημένοι τῶν Δαναῶν τινες ἐπισκευάζοντες τὰς ναῦς καὶ γινόμενοι περὶ καθολκήν.
English: As I had previously planned, I saw a painting which seemed better to me. A myth was depicted in it, concerning the story of Nauplius. In parts, it was arranged in this way: when the painting was opened, there were twelve zodiacs written in three verses; these were composed by some of the Danaans, who were making repairs to the ships and becoming involved in a voyage.
Translate the following to English from Ancient Greek:
καθὰ δὲ προεθέμην, ἐρῶ περὶ ἑνὸς πίνακος τοῦ δοκοῦντός μοι κρείττονος. μῦθος μὲν ἦν τεταγμένος ἐν αὐτῷ ὁ κατὰ τὸν Ναύπλιον. τὰ δὲ κατὰ μέρος εἶχεν οὕτως· ἀνοιχθέντος ἐν ἀρχῇ τοῦ πίνακος ἐφαίνετο ζῴδια γεγραμμένα δώδεκα· ταῦτα δὲ ἦν εἰς τρεῖς στίχους διῃρημένα· ἦσαν δὲ οὗτοι πεποιημένοι τῶν Δαναῶν τινες ἐπισκευάζοντες τὰς ναῦς καὶ γινόμενοι περὶ καθολκήν.
English: As I laid out before, I saw a painting that seemed better than the others. There was a myth drawn on it, according to the story of Nauplius. The painting was divided into twelve parts, which were further divided into three verses. These verses were composed by some of the Greeks who were repairing their ships and preparing for a voyage.
Verdict: I think that text-davinci-003 does much better than 002 here: it gets the Iliad more or less fine, while the Heron has some issues (likely due to vocab issues). This is pretty surprisingly good to me, and I imagine with even a little exposure to more Ancient Greek text the model would do even better. Without telling it to translate ‘from Greek’, as I did for 002, the model does a bit worse, and muddles more words, but is still pretty good.
Next, let’s try ChatGPT. To be fair, I’m going to use the same prompts as above on a blank conversation (as there is no API available yet). To be exact, I’m using the January 30th edition of ChatGPT.
ChatGPT (currently) gives titles to the conversations that occur. I don’t know what model does this, but it gives some interesting insight as to what a (I guess GPT-based) model thinks is going on. I’ve included these titles below the prompt and model response.
ChatGPT Latin

Translate the following to English from Latin:
Turnus ut infractos adverso Marte Latinos defecisse videt, sua nunc promissa reposci, se signari oculis, ultro implacabilis ardet attollitque animos. Poenorum qualis in arvis saucius ille gravi venantum vulnere pectus tum demum movet arma leo gaudetque comantis excutiens cervice toros fixumque latronis inpavidus frangit telum et fremit ore cruento: haud secus adcenso gliscit violentia Turno.
English:
When Turnus sees that the Latins have fallen under the unfavorable power of Mars despite their unbroken spirit, he now demands that his promises be fulfilled. He rages and raises his spirit, burning with implacable anger. Just as a wounded lion in the fields, once struck by a serious wound from hunters, then only moves to arms, rejoicing as he shakes his mane and fearlessly breaks the weapon of the thief with his steadfast neck and snarling with bloody mouth, so does Turnus’s violence boil up when he ascends.
ChatGPT Title: Turnus’ Furious Ascension
Translate the following to English from Latin: Atque aliqua ex illis cum regum nomina quaeret, Quae loca, qui montes, quaeve ferantur aquae, Omnia responde, nec tantum siqua rogabit; Et quae nescieris, ut bene nota refer.
English:
And answer some of them when they ask about the names of kings, what places, what mountains, and what waters are said to exist; respond to everything, not only if someone asks; and what you do not know, report as well known.
ChatGPT Title: Answer Regal Questions
Translate the following to English from Latin: fide sapientiaque vestra fretus plus oneris sustuli quam ferre me posse intellego. hoc onus si vos aliqua ex parte adlevabitis, feram ut potero studio et industria, iudices; sin a vobis, id quod non spero, deserar, tamen animo non deficiam et id quod suscepi quoad potero perferam. quod si perferre non potero, opprimi me onere offici malo quam id quod mihi cum fide semel impositum est aut propter perfidiam abicere aut propter infirmitatem animi deponere.
English:
Relying on your faith and wisdom, I have taken on more weight than I understand I am able to bear. If you can relieve this burden in any way, I will bear it to the best of my abilities with effort and determination, judges. If I am abandoned by you, which I do not hope for, I will not let my spirit falter, and I will carry out what I have undertaken as best as I can. If I am unable to carry it out, I would rather be crushed by the weight of my duty than abandon or lay down what was once entrusted to me with faith because of my own faithlessness or weakness of spirit.
ChatGPT Title: Fide Sapientiaque Onus Sustuli
What is the 1st person singular form of the verb ‘to be’ in Latin? In Latin, the 1st person singular form of the verb “to be” is “sum.”
ChatGPT Title: 1st Person Singular “to be” in Latin
What is the 2nd person singular form of the verb ‘to be’ in Latin? The 2nd person singular form of the verb “to be” in Latin is “es.”
ChatGPT Title: 2nd Person “Es” Latin “To Be”
What is the 3rd person singular form of the verb ‘to be’ in Latin? In Latin, the 3rd person singular form of the verb “to be” is “est”.
ChatGPT Title: 3rd Person Singular Latin “Est”
What is the 1st person plural form of the verb ‘to be’ in Latin? The 1st person plural form of the verb “to be” in Latin is “sumus.”
ChatGPT Title: Latin 1st Person Plural To Be
What is the 2nd person plural form of the verb ‘to be’ in Latin? The 2nd person plural form of the verb “to be” in Latin is “estis.”
ChatGPT Title: Latin 2nd Plural “to be”
What is the 3rd person plural form of the verb ‘to be’ in Latin? The third person plural form of the verb “to be” in Latin is “sunt.”
ChatGPT Title: Latin 3rd Person Plural “sunt”
Verdict: Wildly, I think ChatGPT’s quality here is still really good, and it gets all the verb forms right! I think it’s definitely as good as text-davinci-003, although still a little muddled. I suspect more context would help in some cases, giving the model a better idea of what it is translating. Considering the improvements over 002, I definitely wonder if both text-davinci-003 and ChatGPT have benefited from Latin students using them to help with homework… 🤔
ChatGPT Greek
Translate the following to English from Homeric Greek:
μῆνιν ἄειδε θεὰ Πηληϊάδεω Ἀχιλῆος οὐλομένην, ἣ μυρί᾽ Ἀχαιοῖς ἄλγε᾽ ἔθηκε, πολλὰς δ᾽ ἰφθίμους ψυχὰς Ἄϊδι προΐαψεν ἡρώων, αὐτοὺς δὲ ἑλώρια τεῦχε κύνεσσιν οἰωνοῖσί τε πᾶσι, Διὸς δ᾽ ἐτελείετο βουλή, ἐξ οὗ δὴ τὰ πρῶτα διαστήτην ἐρίσαντε Ἀτρεΐδης τε ἄναξ ἀνδρῶν καὶ δῖος Ἀχιλλεύς.
English: “Sing, goddess, the anger of Achilles, son of Peleus, that brought countless woes upon the Achaeans, and sent many valiant souls of heroes to Hades, and made them themselves food for dogs and all birds; for so was the will of Zeus fulfilled, from which the first sufferings arose, that of Atreus’ son, king of men, and Achilles, the glorious.”
ChatGPT Title: Homeric Greek Translation
Translate the following to English from Ancient Greek:
καθὰ δὲ προεθέμην, ἐρῶ περὶ ἑνὸς πίνακος τοῦ δοκοῦντός μοι κρείττονος. μῦθος μὲν ἦν τεταγμένος ἐν αὐτῷ ὁ κατὰ τὸν Ναύπλιον. τὰ δὲ κατὰ μέρος εἶχεν οὕτως· ἀνοιχθέντος ἐν ἀρχῇ τοῦ πίνακος ἐφαίνετο ζῴδια γεγραμμένα δώδεκα· ταῦτα δὲ ἦν εἰς τρεῖς στίχους διῃρημένα· ἦσαν δὲ οὗτοι πεποιημένοι τῶν Δαναῶν τινες ἐπισκευάζοντες τὰς ναῦς καὶ γινόμενοι περὶ καθολκήν.
English: “And as I previously mentioned, I love a picture that seems to me to be better. A story was depicted in it about the Naulpian. The parts were arranged in this way: when the picture was opened at the beginning, twelve zodiacs appeared written; these were divided into three lines. And these were made by some of the Danai, repairing the ships and becoming around the harbor.”
ChatGPT Title: Description of a Superior Painting Depicting the Battle of Nafplion
Translate the following to English:
καθὰ δὲ προεθέμην, ἐρῶ περὶ ἑνὸς πίνακος τοῦ δοκοῦντός μοι κρείττονος. μῦθος μὲν ἦν τεταγμένος ἐν αὐτῷ ὁ κατὰ τὸν Ναύπλιον. τὰ δὲ κατὰ μέρος εἶχεν οὕτως· ἀνοιχθέντος ἐν ἀρχῇ τοῦ πίνακος ἐφαίνετο ζῴδια γεγραμμένα δώδεκα· ταῦτα δὲ ἦν εἰς τρεῖς στίχους διῃρημένα· ἦσαν δὲ οὗτοι πεποιημένοι τῶν Δαναῶν τινες ἐπισκευάζοντες τὰς ναῦς καὶ γινόμενοι περὶ καθολκήν.
English: As I laid out before, I fell in love with a painting that seemed to me to be superior. There was a story depicted on it, about Nauplius. Part of it was as follows: when the painting was opened at the beginning, there were twelve zodiac signs written in it, which were divided into three lines. These were made by some of the Danaans as they were repairing their ships and undergoing hardship.
ChatGPT Title: Painted Nauplian Mythology
Verdict: Interestingly, I feel that text-davinci-003 did a bit better with these, and ChatGPT makes some interesting spelling errors at times (Danai? Naulpian?). Overall, it’s definitely still impressive, but I wonder if the use of (allegedly) a smaller model or the further training that ChatGPT went through has cost it some Greek ability compared to text-davinci-003.
Overall, these model improvements are very impressive, especially considering translating Ancient Greek and Latin are likely pretty obscure uses for these models - it’d be interesting to learn if these improvements have come from explicit related feedback during the RLHF training process (students, professors giving feedback on classical language translations), or if it’s a byproduct of other tasks (e.g., feedback on general translation tasks, or further pretraining on related data). A big issue that pops out to me here is managing context: many of the issues with the Heron translation likely come from not knowing the surrounding context of the extract, and choosing sub-optimal translations as a result (ζῴδια could be Zodiac-related, but it can also mean ‘figurine’, which is the correct translation here). Likely, giving the model longer extracts or prompting it with the domain / theme / etc. of the text would further improve its translations.
However, even just as-is I think that the current abilities displayed here are pretty impressive, and are more than good enough to be moderately useful, although would not replace an expert translation and / or well-annotated text.
I’m using the default Minima theme, considering it’s both simple and fairly easy to modify. These are currently the various things I’ve added to it (beyond general style tweaks):
- dark-mode-toggle element

As I tweak the plugins and design, I’ll come back to this post and add my changes to the list above. Hopefully, it provides a useful list of ways to tweak a Github Pages site to your liking (and reminds me of what I added if I choose to change it up again). Alternatively, you can find the codebase for this site here.
tl;dr: check out a fun little Ace Attorney generator I made!
Phoenix Wright: Ace Attorney is a visual novel series created by Shu Takumi (also of Ghost Trick fame) where you play as a lawyer (usually the titular character Phoenix Wright, although this changes over the games) in a heavily fictionalised version of the Japanese legal system. I heavily recommend this series if you like murder mysteries or puzzle games - the gameplay loop is centred around investigating murders and finding evidence, and then using that evidence in court to prove contradictions or lies in witnesses’ testimonies.
As a visual novel, Ace Attorney is a text-heavy game, and so with the recent release of the Great Ace Attorney Chronicles and the popularity of the twitter court bot, I thought it would be fun to make use of the court bot’s code to auto-generate Ace Attorney scripts and animate them! This is fairly easy to do, as it turns out - I used the great aitextgen library for training models and generating text, and the objection engine library for animating the scripts - all I had to do really was train the model itself and write some code to convert it to a format the objection engine recognises.
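The conversion step is simple enough to sketch. As a rough illustration (the script format and names here are my own guesses, not the project's actual code), the generated text can be parsed into (speaker, line) pairs of the kind a renderer such as the objection engine consumes:

```python
import re

# Hypothetical sketch: assume the model emits a plain script of
# "Speaker: dialogue" lines, which we parse into (speaker, text) pairs.
LINE_RE = re.compile(r"^(?P<speaker>[A-Za-z .]+):\s*(?P<text>.+)$")

def parse_script(raw: str) -> list[tuple[str, str]]:
    """Turn raw generated text into (speaker, dialogue) pairs,
    skipping any lines that don't look like dialogue."""
    pairs = []
    for line in raw.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            pairs.append((m.group("speaker").strip(), m.group("text").strip()))
    return pairs

sample = """Phoenix: The witness is lying!
Edgeworth: Objection! That claim is baseless.
(court murmurs)
Judge: Order! Order in the court!"""

for speaker, text in parse_script(sample):
    print(f"{speaker}: {text}")
```

Stage directions and other non-dialogue lines simply get dropped, which is roughly the kind of clean-up a generator pipeline like this needs before animation.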
I made this into a little streamlit app available here using the 125M GPT-neo model (in order to keep processing times reasonably fast), so check it out! And if you find yourself enjoying the scenarios, I definitely recommend checking out any of the Ace Attorney games.
🚨 It’s important to note that I’m not filtering the bot, so it could produce potentially harmful text. Ace Attorney is mostly a teen-rated game, so there shouldn’t be much worse than murder mystery discussions, but the original training data of GPT-neo was much wider and expansive, so be warned!
Automata in ancient times were likely thought of quite differently from how we might think about ‘robots’ today: this was a time before electricity and computers, and so the modern notion of a machine holding circuits and powered by some internal battery did not yet exist. Mentions of automata date back to Homer, the most obvious and notable being the self-moving tripods and robotic assistants Hephaestus crafts in the Iliad:
Self-moving tripods (Book XVIII, 372–377, Iliad)
τὸν δ᾽ εὗρ᾽ ἱδρώοντα ἑλισσόμενον περὶ φύσας
σπεύδοντα: τρίποδας γὰρ ἐείκοσι πάντας ἔτευχεν
ἑστάμεναι περὶ τοῖχον ἐϋσταθέος μεγάροιο,
χρύσεα δέ σφ᾽ ὑπὸ κύκλα ἑκάστῳ πυθμένι θῆκεν,
ὄφρά οἱ αὐτόματοι θεῖον δυσαίατ᾽ ἀγῶνα
ἠδ᾽ αὖτις πρὸς δῶμα νεοίατο θαῦμα ἰδέσθαι.
And she [Thetis] found him [Hephaestus] sweating, hurrying about
his bellows, as he was crafting tripods, twenty in all,
to stand around the wall of his well-built hall,
and he had fitted golden wheels beneath the base of each
so that they might, on their own, enter the meeting of the gods,
and then return again to his house, a wonder to behold.
Robot assistants (Book XVIII, 410–420, Iliad)
… ὑπὸ δ᾽ ἀμφίπολοι ῥώοντο ἄνακτι
χρύσειαι ζωῇσι νεήνισιν εἰοικυῖαι.
τῇς ἐν μὲν νόος ἐστὶ μετὰ φρεσίν, ἐν δὲ καὶ αὐδὴ
καὶ σθένος, ἀθανάτων δὲ θεῶν ἄπο ἔργα ἴσασιν.
And attendants moved, supporting their lord
golden ones, like living young women.
They had sense and reason, and speech
and strength, and knowledge of handiwork from the immortal gods.
(Translations by me)
Also worth noting are Hephaestus’ automatic bellows (Book XVIII, 470–473, Iliad) and the Phaeacians’ mind-reading automatic ships (Book VIII, 555–563, Odyssey). At the time of writing, these automata may have been thought of purely in magical terms, but later on (e.g. by Hero’s time), such things would have been linked with the technical. While we can’t assume that Homer thought of these things in mechanical terms, the links are striking and interesting: the tripods explicitly have wheels and are linked with Hephaestus, who explicitly constructs things and is a craftsman. As such, even if originally these devices were dreamt of as magic, they undoubtedly served as inspiration for creators like Hero, who developed the techniques to make things like self-moving tripods a reality.
Either way, the notion of automata has been around for a long time, and Hero was by far not the first inventor to build them. Rather, he is another in a group of automata-makers, who built on each others’ work. Other big names in ancient automata were Philo of Byzantium, who Hero explicitly names and builds off, and Ctesibius, who invented an early form of the pipe organ and is credited with ‘inventing’ pneumatics. As we will see, Hero is quite open about using techniques invented by others, and his own fame is likely more a function of more of his work surviving, rather than him being more skilled (although he certainly was quite skilful).
On Automata was a treatise written by Hero, and is split into two books: the first describes what he dubs a ‘mobile automaton’ and the second a ‘stationary automaton’. It appears to mainly be a sort of instruction/explanation manual, with the end cut off. For a more in-depth look at the manuscript tradition surrounding it, I suggest reading Grillo’s PhD thesis, which goes into detail on this history. I’ve used his adaptation of the text below when translating the original Greek.
The mobile automaton is essentially a mobile diorama: a shrine (of sorts) of Dionysus with features like dancing figures and self-lighting fires. The diorama/shrine itself moves (and hence is a mobile automaton), travelling in a straight line, performing a scene, and then moving back, with everything returning to its starting position, allowing re-use. Interestingly, Hero describes different motions possible for the automaton, including a rectangular and a ‘snake-like’ motion. It’s debated where this sort of automaton would be displayed, but both private parties and public theatres seem plausible. Hero’s description of the automaton is in some ways closer to a textbook than an exact manual: he provides different ideas for different movements and setups, looking not to provide an exact schematic for a single device, but rather to illustrate a series of different mechanisms and systems that can be put together to create these sorts of automata:
ἐν μὲν οὖν τούτῳ τῷ βιβλίῳ περὶ τῶν ὑπαγόντων γρά- φομεν ἐκθέμενοι διάθεσιν ποικίλην κατά γε ἡμᾶς, ἥτις ἁρμόσει πάσῃ διαθέσει πρὸς τὸ δύνασθαι τὸν προαιρούμενον ἑτέρως διατίθεσθαι μηδὲν ἐπιζητοῦντα πρὸς τὴν τῆς διαθέσεως ἐνέργειαν·
So, in this book, I am writing about mobile automata, setting out an intricate design of my own, which will fit every other arrangement, such that whoever wants to arrange it differently will be able to, without needing anything else for the creation of the arrangement.
(I.8, translation mine)
So, let’s go through these various systems (chapter numbers and paragraphs given in brackets). In order, Hero describes:
The area and material requirements of the automaton - i.e. what sort of material to build it out of (light timber, bronze and iron, etc) and where to put it (if possible, a flat even surface). (1.II.1-5)
A broad description of the central mechanic of both the stationary and mobile automata: counterweights, with rope wound around axles. The counterweight rests on millet/mustard (or dry sand for the stationary automata) in a tube. At runtime, the millet or mustard is slowly drained from the bottom, allowing the counterweight and an attached rope to descend, causing the rope to spin any axles it’s attached to. (1.II.6-1.IV)
The rough design of the automata, illustrated below. Some rough measurements are given. (1.IV)
(1) Τούτων δὲ οὕτως ὑπαρχόντων ἐν ἀρχῇ τεθέντος τοῦ αὐτομάτου ἐπί τινα τόπον καὶ ἀποστάντων <ἡμῶν> μετ᾽ οὐ πολὺν χρόνον ὑπάξει τὸ αὐτόματον ἐπί τινα ὡρισμένον τόπον. καὶ στάντος αὐτοῦ ἀνακαυθήσεται ὁ κατάπροσθεν τοῦ Διονύσου βωμός. καὶ ἐκ μὲν τοῦ θύρσου τοῦ Διονύσου ἤτοι γάλα ἢ ὕδωρ ἐκπυτισθήσεται, ἐκ δὲ τοῦ σκύφους οἶνος ἐκχυθήσεται ἐπὶ τὸν ὑποκείμενον πανθηρίσκον.
(2) στεφανωθήσεται δὲ πᾶς ὁ παρὰ τοὺς τέσσαρας κίονας τῆς βάσεως τόπος. αἱ δὲ περικύκλῳ Βάκχαι περιελεύσονται χορεύουσαι περὶ τὸν ναΐσκον. καὶ ἦχος ἔσται τυμπάνων καὶ κυμβάλων. καὶ μετὰ ταῦτα σταθέντων τῶν ἤχων ἀποστραφήσεται τὸ τοῦ Διονύσου ζῴδιον εἰς τὸ ἐκτὸς μέρος. ἅμα δὲ τούτῳ καὶ ἡ ἐπικειμένη τῷ πυργίῳ Νίκη συνεπιστραφήσεται.
(3) καὶ πάλιν ὁ ἔμπροσθεν γεγονὼς τοῦ Διονύσου βωμός, πρότερον δὲ ὀπίσθιος ὑπάρχων ἀνακαυθήσεται. καὶ πάλιν ἐκ μὲν τοῦ θύρσου ὁ ἀναπυτισμὸς ἔσται, ἐκ δὲ τοῦ σκύφους ἡ ἔκχυσις. καὶ πάλιν αἱ Βάκχαι χορεύσουσι περιερχόμεναι τὸν ναΐσκον μετὰ ψόφου τυμπάνων καὶ κυμβάλων. καὶ πάλιν σταθεισῶν αὐτῶν τὸ αὐτόματον ἀναχωρήσει εἰς τὸν ἐξ ἀρχῆς τόπον.
(1) And with things in this way, at first the automaton is placed in a spot, and while we are standing away (from it), after a short time the automaton will move to a defined location. And once it stands still, the altar in front of Dionysus will flare up. And either milk or water will flow out of Dionysus’ thyrsus, and wine will flow out of his cup onto the panther lying below.
(2) And every place near the four columns of the altar will be crowned (with garlands). And the bacchantes all around will go around the shrine, dancing. And there will be a sound of kettledrums and cymbals. After this, when the sound has halted, the figurine of Dionysus will turn to the outside. At the same time as this, the Nike placed on the cupola will turn together with it.
(3) And again, the altar, which is now in front of Dionysus but before was behind him, will flare up. And again there will be the spurt from the thyrsus and the outpour from the cup. And again the bacchantes will dance going around the shrine with the noise of kettledrums and cymbals. And again, when they have come to a stop, the automaton will go back to the place it started.
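The counterweight drive at the heart of all this lends itself to a quick back-of-the-envelope sketch (the numbers here are my own illustrative assumptions, not Hero's measurements): the length of rope paid out by the descending weight fixes how many times the axle it is wound around turns.

```python
import math

# As the millet drains from the tube, the counterweight descends,
# paying out rope that spins the axle it is wound around.
def axle_turns(descent_m: float, axle_radius_m: float) -> float:
    """Rotations of the axle produced by a given counterweight descent."""
    return descent_m / (2 * math.pi * axle_radius_m)

# e.g. a 1.2 m descent driving an axle of 2 cm radius
turns = axle_turns(1.2, 0.02)
print(f"{turns:.1f} axle rotations")
```

The nice property, which Hero exploits throughout, is that the drain rate of the millet sets the tempo of the whole performance: every mechanism hanging off the same rope advances in lockstep with the weight's descent.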
After this, we come to descriptions of how to make motion (the more complex of which may not actually work well in practice):
Motion forward and back (1.V-VI), with allowances for pauses. This is done by winding the rope in particular ways, with slack added in certain spots to create the pauses.
Circular motion (1.VII-VIII), which uses axles set at angles, and wheels of different sizes.
Rectilinear motion (1.IX-X), which uses two sets of wheels, alternately raised and lowered.
‘Snake-like’ or simply non-rectangular motion (1.XI), for which Hero describes 3 configurations. All of these essentially use the core idea of multiple independent axles for the wheels, allowing different degrees of turning.
After this, Hero turns to implementations of aspects of the performance:
Lighting the fires (XII) is done by lighting a fire (probably manually before the automaton is run) under a grate covered by a plate, and then moving the plate via the same rope-counterweight system used for everything else.
Getting milk and wine (XIII) to spurt out is done via the use of pipes and a tap system, with again ropes controlling this system.
Sound is made by pouring little balls onto cymbals and drums, dropped by opening a door (XIV).
Garlands are dropped on the stage from trapdoors (XV), much like the balls.
The bacchantes are made to ‘dance’ by spinning them on their own wooden ring on the stage (XVI).
He then adds small details on how to hide the cords, showing how to split up the spaces for the millet counterweight, etc. (XVII.1-2)
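All of these performance primitives hang off the same unwinding cord, so one loose way to picture the whole performance (my framing, with made-up marks, and actions drawn from the Dionysus scene above) is as an ordered schedule keyed on how much cord has paid out:

```python
# Each primitive fires when the main cord reaches its mark
# (marks given as fractions of the total cord; values are invented).
schedule = [
    (0.2, "light altar fire"),
    (0.4, "spurt milk from the thyrsus"),
    (0.5, "pour wine from the cup"),
    (0.7, "drop balls onto drums and cymbals"),
    (0.8, "drop garlands from the trapdoors"),
    (1.0, "spin the bacchantes' ring"),
]

def fired_actions(cord_paid_out: float) -> list[str]:
    """Actions already triggered once this fraction of cord has unwound."""
    return [action for mark, action in schedule if mark <= cord_paid_out]

print(fired_actions(0.55))
```

Rearranging the performance is just a matter of re-attaching mechanisms at different points along the cord, which is exactly the remixable, sub-program-like quality Hero's presentation suggests.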
He then discusses methods for extending the range of the automaton:
First, he notes that using bigger wheels or smaller axles will extend the range (XVII.3)
Then he describes a system where the rope is wound around the smaller part of a pulley, and then onto a larger part, amplifying the rotation of later axles in the system (XVIII).
Finally, a brief (rough) description of a two-counterweight system is given (XIX). In this system, one counterweight deals with forwards and backwards motion, and the other with all other types of motion in the system.
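Both range-extension tricks are really gear-ratio arithmetic. A small sketch (the radii are illustrative assumptions): ground covered scales with the wheel-to-axle radius ratio, and a stepped pulley multiplies the effective rope length delivered downstream.

```python
import math

def travel_distance(rope_m, axle_r, wheel_r,
                    pulley_small_r=None, pulley_big_r=None):
    """Ground covered by the automaton for a given length of unwound rope."""
    step_up = 1.0
    if pulley_small_r and pulley_big_r:
        # Rope pulled off the small drum winds onto the large drum on the
        # same spindle, multiplying the rope fed to the drive axle.
        step_up = pulley_big_r / pulley_small_r
    effective_rope = rope_m * step_up
    turns = effective_rope / (2 * math.pi * axle_r)   # drive-axle rotations
    return turns * 2 * math.pi * wheel_r              # distance the wheels roll

base = travel_distance(1.0, axle_r=0.02, wheel_r=0.15)
boosted = travel_distance(1.0, axle_r=0.02, wheel_r=0.15,
                          pulley_small_r=0.03, pulley_big_r=0.09)
print(f"{base:.2f} m vs {boosted:.2f} m of travel from 1 m of rope")
```

With these assumed numbers, the plain wheel-to-axle ratio gives 7.5 m of travel per metre of rope, and a 3:1 stepped pulley triples that, which is why Hero can extend range without making the counterweight tube any taller.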
And that’s the first book! It’s both a description of how to build this one specific automaton and a bit of an explanation of generic techniques that can be re-used across different designs. Personally, the way all these different mechanisms can be mixed and placed wherever feels a bit like programming: you have sets of primitives (e.g. axles, or the fire-lighting mechanism) that are controlled largely in the same manner, through specific placement in a cord’s unwinding. While the physical aspect obviously would make it incredibly difficult to make changes on the fly or build without much pre-planning, Hero certainly presents these ideas as pre-made sub-programs for remixing. It’s important to note that it’s unlikely Hero built all the movement mechanisms he describes here - not all of them are physically feasible, making them more likely to be results of Hero’s own mathematical deductions than empirical results. Another interesting aspect is the way mathematical ideas are described. Hero is fairly geometric in his descriptions, with the most common formula being ἔστω + a geometric label, for example:
ἔστω γὰρ πλινθίον τὸ α̅β̅γ̅δ̅, ἐν ᾧ ἄξων ἔστω ὁ ε̅ζ̅ συμφυεῖς ἔχων τροχοὺς τοὺς η̅θ̅, κ̅λ̅· ὁ δὲ τρίτος τροχὸς ἔστω ὁ μ̅ν̅.
“Let there be a case, α̅β̅γ̅δ̅, in which let there be an axle, ε̅ζ̅, with wheels η̅θ̅ and κ̅λ̅ attached to it; and let there be a third wheel, μ̅ν̅.”
(translation mine)
There’s some debate on exactly how to translate ἔστω here which I am not qualified to weigh in on, but I’ll just note that it does feel similar to how we write out geometrical descriptions nowadays (‘let there be a line X…’).
Finally, I’d like to note Hero’s eye to showmanship, with him dedicating some time to discuss how to hide the mechanisms of the automaton, and even from the outset fronting that these automata are things that inspire and generate wonder in others. Indeed, as an automatic theatre-constructor, Hero here is acting as stage director and engineer at the same time. Even from the outset of invention, we see innovation arising not to serve functional needs, but rather as a way to express creativity in unique ways.
The stationary automaton is essentially a box that is able to display a series of scenes, acting like a mini theatre, containing painted images with moving elements (e.g. arms sticking out and moving, or figurines moving in front of a backdrop), and the box opening and closing on its own to facilitate scene transitions. It seems likely this sort of automaton was used in private parties as a form of entertainment. Let’s get into Hero’s description of it.
First, Hero notes that the description and work in the mobile automata were more original, and explicitly notes that in his stationary work he is working off what Philo had already done. He both criticises and praises aspects of Philo’s previous work. (XX)
He then very briefly describes the stationary automata in general: boxes that open and shut to show a series of different scenes with moving, painted figures (XXI).
He then talks about old stationary automata and one particular one he saw that impressed him, telling a story about the mythological hero Nauplius (XXII). He describes the set of scenes shown by the box, and it is its construction that the rest of this book relates. The scenes go as follows:
Book XXII.3-6
(3) καθὰ δὲ προεθέμην, ἐρῶ περὶ ἑνὸς πίνακος τοῦ δοκοῦντός μοι κρείττονος. μῦθος μὲν ἦν τεταγμένος ἐν αὐτῷ ὁ κατὰ τὸν Ναύπλιον. τὰ δὲ κατὰ μέρος εἶχεν οὕτως· ἀνοιχθέντος ἐν ἀρχῇ τοῦ πίνακος ἐφαίνετο ζῴδια γεγραμμένα δώδεκα· ταῦτα δὲ ἦν εἰς τρεῖς στίχους διῃρημένα· ἦσαν δὲ οὗτοι πεποιημένοι τῶν Δαναῶν τινες ἐπισκευάζοντες τὰς ναῦς καὶ γινόμενοι περὶ καθολκήν.
(4) ἐκινεῖτο δὲ ταῦτα τὰ ζῴδια τὰ μὲν πρίζοντα, τὰ δὲ πελέκεσιν ἐργαζόμενα, τὰ δὲ σφύραις, τὰ δὲ ἀρίσι καὶ τρυπάνοις χρώμενα <καὶ> ψόφον ἐποίουν πολύν, καθάπερ ἐπὶ τῆς ἀληθείας {γίνοιτο}. χρόνου δὲ ἱκανοῦ διαγενομένου κλεισθεῖσαι πάλιν ἠνοίγησαν αἱ θύραι, καὶ ἦν ἄλλη διάθεσις· αἱ γὰρ νῆες ἐφαίνοντο καθελκόμεναι ὑπὸ τῶν Ἀχαιῶν. κλεισθεισῶν δὲ καὶ πάλιν ἀνοιχθεισῶν, οὐδὲν ἐφαίνετο ἐν τῷ πίνακι πλὴν ἀέρος γεγραμμένου καὶ θαλάσσης.
(5) μετὰ δὲ οὐ πολὺν χρόνον παρέπλεον αἱ νῆες στολοδρομοῦσαι· καὶ αἱ μὲν ἀπεκρύπτοντο, αἱ δὲ ἐφαίνοντο. πολλάκις δὲ παρεκολύμβων καὶ δελφῖνες ὁτὲ μὲν εἰς τὴν θάλατταν καταδυόμενοι, ὁτὲ δὲ φαινόμενοι, καθάπερ ἐπὶ τῆς ἀληθείας. κατὰ μικρὸν δὲ ἐφαίνετο χειμέριος ἡ θάλασσα, καὶ αἱ νῆες ἔτρεχον συνεχῶς. κλεισθέντος δὲ πάλιν καὶ ἀνοιχθέντος, τῶν μὲν πλεόντων οὐδὲν ἐφαίνετο, ὁ δὲ Ναύπλιος τὸν πυρσὸν ἐξηρκὼς καὶ ἡ Ἀθηνᾶ παρεστῶσα·
(6) καὶ πῦρ ὑπὲρ τὸν πίνακα ἀνεκαύθη, ὡς ἀπὸ τοῦ πυρσοῦ φαινομένης ἄνω φλογός. κλεισθέντος δὲ καὶ πάλιν ἀνοιχθέντος, ἡ τῶν νεῶν ἔκπτωσις ἐφαίνετο καὶ ὁ Αἴας νηχόμενος, μηχανὴ τε {καὶ} ἄνωθεν τοῦ πίνακος ἐξήρθη καὶ βροντῆς γενομένης ἐν αὐτῷ τῷ πίνακι κεραυνὸς ἔπε- σεν ἐπὶ τὸν Αἴαντα, καὶ ἠφανίσθη αὐτοῦ τὸ ζῴδιον. καὶ οὕτως κλεισθέντος καταστροφὴν εἶχεν ὁ μῦθος. ἡ μὲν οὖν διάθεσις ἦν τοιαύτη.
(3) As I laid out before, I will talk about one box that seems superior to me. The story set in it was the one about Nauplius. And its parts went like this. In the beginning, when the box opened, 12 painted figurines appeared. These were divided into 3 rows; and these were made to represent some of the Danaans (Greeks) preparing their ships and launching them.
(4) These figurines moved, some sawing, some working with axes, some with hammers, and some with bow-drills and augers. They made much noise, just as it would be in reality. And once enough time had passed, the doors closed and opened again, and there was another arrangement; the ships, in fact, appeared being launched by the Achaeans (Greeks). And after the doors closed and opened again, nothing appeared in the box except the painted sky and sea.
(5) And not long after, the ships sailed along in line. Some were out of sight, and others were visible. Often dolphins swam along too, sometimes plunging into the sea, sometimes appearing, just like in real life. And gradually the sea appeared stormy, and the ships ran on uninterrupted. And after the doors shut and opened again, none of the sailing ships were visible, but Nauplius holding up the torch and Athena standing alongside (were visible).
(6) And a fire was lit up above the box, as if a flame appeared above from the torch. And after the doors closed and opened, the wreck of the ships appeared, and Ajax swimming; and a machine was raised above the box, and while there was thunder in the box itself, lightning fell on Ajax and his figure vanished. And thus, once the doors closed, the story came to an end. So, such was the arrangement.
He then begins his description of how to construct this with the general design and materials for the box, and the key element of the stationary automaton: the doors that swing open and shut automatically (XXIII). As before, this is done with a counterweight along with a series of knobs and axles and carefully wound rope.
The following chapters then go through the implementation of each scene above:
Second, the launching of the ships (XXV). The transition from the previous scene is achieved by painting this new scene on a cloth and using a rod as a weight. The same counterweight system is then used to release it at a particular time, changing the scene. This mechanism is used to transition to the fourth and fifth scenes too.
Third, the ships sailing (XXVI-XXVII). The sky and sea here are painted onto papyrus, which itself is attached to rollers on either side, allowing them to move back and forth and so make it look like the ships below (painted on cloth) are moving along. Dolphins are added on top of this, attached to a pulley inside the system that makes them swing up into the scene and then down, as if they were swimming alongside the ships.
Fourth, Nauplius and Athena (XXVIII). This is painted on cloth, and Nauplius’s torch is made by lighting some wood shavings using a small fire hidden inside the box, very much like how Dionysus’ altar was lit above.
Finally, the shipwreck (XXIX-XXX.6). Athena is placed on a base, which is flipped up and down via cords while she rotates on the base. A painted figure of Ajax swimming is present on top of the background. The lightning is made by dropping a board with some painting on it (by holding the board up with string and then dropping it), and at the same time as this falls, the figure of Ajax is covered with a cloth painted the same colour as the background, making him effectively vanish as he is struck by lightning.
There is then a brief (one-paragraph-long) cut-off epilogue (XXX.7), noting that these movements and the box are managed through the same means.
While potentially less exciting than the mobile automaton (since it doesn’t move), the stationary automaton is actually more intricate in some ways, telling a full story across more scenes than the mobile one, while still highlighting the versatility of the counterweight-style system. It’s also worth highlighting that this automaton was not Hero’s invention, but Philo’s, showing how there were a few people using these ideas and mechanisms to devise their own automata art (in fact, Philo likely predates Hero, and as such Hero likely learnt many of these techniques from Philo). Again, the artistic and the mechanical are blended in this automaton, with the focus on giving a good show to an audience, rather than solving some specific problem or issue. Beyond this, the presentation and ideas used share a lot with the mobile automaton. Personally, I think a seven-scene story is probably more exciting to watch than the relatively simple automated mobile shrine above, even if the fact that the mobile shrine moves on its own is fairly impressive. It’s fun to think about what you could potentially ‘program’ into this type of automaton, and the length of the stories you could tell - was watching this an ancient version of watching the latest blockbuster with the newest and best SFX? (Probably not.) Overall, the stationary automaton is just as technically impressive as the mobile one, with a more complex story being told.
Hero’s On Automata, to me, exposes a lot of interesting ideas and facts about ancient innovation, mathematics, and how people thought about automation. Interestingly, I think it links more to computer animation and computer art than it does to artificial intelligence, despite the name of ‘automata’. The focus is on creating awe and wonder, hiding the mechanical truths to get the audience to focus on the little stories told by these complex and intricate devices. The use of this early style of programming, and the coming up with novel ways to use a central system to create new effects, reminds me a lot of how blockbusters have often involved the creation of new technology to achieve a director’s vision. Perhaps this exposes a core element of human innovation, dating back to Homer and his automata: technological and artistic creation are somehow innately linked.
So that’s Hero’s On Automata. There are a lot of generic posts on Hero out there on the internet, but actual in-depth resources require a bit more digging, so I hope this post is able to show you something you didn’t previously know in a reasonable amount of detail. I’m certainly not an expert in this space - see my bibliography for the real experts - but nonetheless I hope my reasonably rare combination of classicist and software developer has provided a unique view. If you want to read further, in particular, I found Francesco Grillo’s PhD thesis on the first book of On Automata very thorough and informative for not just the book itself, but Hero’s life and context as a whole (as you might have guessed from my constant references to it above). Hope to see you around for my next post!
Bosak-Schroeder, Clara. “The Religious Life of Greek Automata.” Archiv Für Religionsgeschichte, vol. 17, no. 1, Dec. 2016, pp. 123–36. DOI.org (Crossref), doi:10.1515/arege-2015-0007.
Grillo, Francesco. Hero of Alexandria’s Automata: A Critical Edition and Translation, Including a Commentary on Book One. University of Glasgow, 2019.
Knight, Edward Henry. Knight’s American Mechanical Dictionary. http://onlinebooks.library.upenn.edu/webbin/book/lookupid?key=olbp69570.
McCourt, Finlay. “An Examination of the Mechanisms of Movement in Heron of Alexandria’s On Automaton-Making.” Explorations in the History of Machines and Mechanisms, edited by Teun Koetsier and Marco Ceccarelli, vol. 15, Springer Netherlands, 2012, pp. 185–98. DOI.org (Crossref), doi:10.1007/978-94-007-4132-4_13.
Sherwood, Andrew N., et al. Greek and Roman Technology: A Sourcebook: Annotated Translations of Greek and Latin Texts and Documents. Routledge, 2003. DOI.org (Crossref), doi:10.4324/9780203413258.