Here are the three lists separated by genre and ordered by your rating (highest to lowest).
Novels and short story collections
American novels (and non-novels): “4 3 2 1”, “American Pastoral”, “Stoner”, “Demon Copperhead”, “Patrimony”; re-discovery of David Mitchell; some classics.
Fewer than the last few years.
Other bits and pieces. “Apple in China” was extremely interesting.
Restricted to things published in 2024 - I got to see very few movies in a theater.
Movies I’d like to catch up on: The Substance, Anora, Juror #2, The Zone of Interest, Challengers, Hit Man, Civil War.
This was a great example of how chatGPT works nicely not only to write code, but also to guide you step-by-step in debugging your code, setting up multiple AWS services (a Lambda and an API Gateway), integrating them, fixing permissions, testing your solution end to end, etc. In my opinion this is one of the use cases where the tool really shines thanks to the possibility of interacting with it - it’s like a (stupid, but very knowledgeable) live support chat.
I used this to quickly format this post, copying straight from Goodreads - a menial task I would otherwise have done manually.
Here's the chat.

My prompt: “Hi. This is a (badly formatted) list of books with title, author, and other information. Could you format this list this way in markdown:”

ChatGPT's reply started with: “Certainly! Here is the reformatted list in Markdown with the author and title in bold:”
cover title author rating date read date added Down arrow
The City and the Stars
The City and the Stars
Clarke, Arthur C.
1 of 5 stars2 of 5 stars3 of 5 stars[ 4 of 5 stars ]5 of 5 stars
Jul 30, 2022 [edit]
Jul 19, 2022
edit
view »
Remove from my books
The October Country
The October Country
Bradbury, Ray
1 of 5 stars2 of 5 stars3 of 5 stars[ 4 of 5 stars ]5 of 5 stars
Jul 19, 2022 [edit]
Jul 17, 2022
edit
view »
Remove from my books
The Sundial
The Sundial
Jackson, Shirley
1 of 5 stars2 of 5 stars3 of 5 stars4 of 5 stars[ 5 of 5 stars ]
Jul 09, 2022 [edit]
Jul 09, 2022
edit
view »
Remove from my books
[...]
1. **The City and the Stars**, _Arthur C. Clarke_
1. **The October Country**, _Ray Bradbury_
1. **The Sundial**, _Shirley Jackson_
[...]
It was able to parse the crazily formatted list, extract the correct entities, and even rewrite the authors as “Name Surname” instead of “Surname, Name” (I didn’t request it explicitly, but it was a nice touch). To be fair, it started spouting nonsense (books that were not in my list 🙄) after item 16. So I split the input list in two, and it worked flawlessly.
This works quite nicely. Of course it’s only a skeleton of an article, and the related works are mostly bullshit (for the most part, they are not real articles).
Good for a quick survey on some datasets I was interested in.
One of the killer applications. The prose is a bit bland and sometimes uselessly verbose, but it’s generally more than acceptable, and often better than a non-native English speaker would do unless they put really honest effort into it.
Comics
Non-fiction
Restricted to things published in 2022 - very few movies.
A few others that are not worth including in a “best of” list - Sam Raimi’s Doctor Strange (quite fun), Secrets of Dumbledore (meh), Thor: Love and Thunder (🤮). Movies I’d like to catch up on... if I ever have the time: White Noise; The Northman; Triangle of Sadness; Crimes of the Future; Licorice Pizza; Nope; Everything Everywhere All at Once; The Fabelmans; Bones and All; Aftersun?; Vortex?; The Menu?; Top Gun?; The Pale Blue Eye?; Glass Onion?.
Obi-Wan Kenobi was terrible. A couple of others, such as the new seasons of The Boys and Stranger Things, were OK. Shows I might want to catch up on: Severance; Slow Horses?
Wordle uses a list of 2315 words that can be a valid solution (plus another 10657 valid input words). A simple idea is to just pick the input word that will maximally reduce the space of valid solutions, at least on average, across all possible targets.
As an example, consider a word like SMILE. Depending on the target words, the outcome can vary significantly:
SMILE seems to be a decent, but not spectacular, initial choice: from the full solution space (2315), you can expect to reduce it on average to around 114 valid words.
We can do better than that: applying the same logic, the word that maximally reduces the solution space on average across all possible target words is ROATE. To be honest, I’m not even sure what that word means. Picking ROATE as the first guess reduces the number of valid solutions to ~60, on average! On the other hand, with the worst-ranked word in the vocabulary, IMMIX (?), you wouldn’t cover much ground: you’d still be left, on average, with more than 1300 valid solutions.
Of course, the process doesn’t stop at the choice of the first input. We can iterate until we get to the end, using our 1-step lookahead policy:
Note: this can be done in such a naive way because the list of valid words is pretty small, otherwise it would blow up quite quickly.
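The 1-step lookahead scoring can be sketched in a few lines of Python (my reconstruction, not the code used for these results; `feedback` reproduces Wordle's hint rules, including repeated letters):

```python
from collections import Counter

def feedback(guess, target):
    """Wordle hint for a guess: 2 = green, 1 = yellow, 0 = grey."""
    hint = [0] * len(guess)
    unmatched = Counter()
    for i, (g, t) in enumerate(zip(guess, target)):
        if g == t:
            hint[i] = 2
        else:
            unmatched[t] += 1
    # second pass: yellows, consuming unmatched target letters
    for i, g in enumerate(guess):
        if hint[i] == 0 and unmatched[g] > 0:
            hint[i] = 1
            unmatched[g] -= 1
    return tuple(hint)

def expected_remaining(guess, solutions):
    """Average number of candidates left after playing `guess`,
    assuming the target is uniform over `solutions`."""
    buckets = Counter(feedback(guess, target) for target in solutions)
    # a target falling in a hint-bucket of size n leaves n candidates,
    # so the expectation is sum(n * n/N) = sum(n^2) / N
    return sum(n * n for n in buckets.values()) / len(solutions)

# toy vocabulary; the real solver scores all 2315 + 10657 admissible words
words = ["wound", "bound", "found", "mound", "pound", "smile"]
best = min(words, key=lambda w: expected_remaining(w, words))
```

Scoring the full vocabulary is then just this same `min` over all admissible words, repeated after each hint on the reduced solution set.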
From my simulation, it looks like we can always solve the puzzle in under 6 moves (leftmost plot). The average is slightly below 3.5, which is not bad. You win in ≤3 attempts more than 50% of the time if you use the entire vocabulary of admissible words (2315 + 10657).
If you only choose inputs from the vocabulary of possible solutions (2315), results are slightly worse (middle plot). Still, you never need 6 attempts, and only very rarely 5.
It’s fun to compare this strategy with a totally random choice of words from the valid solutions (rightmost plot): you’d still be able to win (at most 6 attempts) about 57% of the time. (Here I capped the simulation at 10 attempts.)

Things do get harder if you play in “hard mode”, that is, when all your input words must be consistent with the hints given so far. In this case, you sometimes need up to 8 attempts¹. Conversely, having more constraints greatly helps the “random” strategy: 98% of the time you’d win in 6 attempts or fewer, and the average number of attempts would be around 4.1.

An example of an unlucky scenario for the lookahead policy in “hard mode” (solutions only) is the word WOUND:
RAISE ➝ ⬛⬛⬛⬛⬛ The solution space is restricted to 168 valid solutions.
COULD ➝ ⬛🟩⬛🟩🟩 The solution space is restricted to 6 valid solutions.
BOUND ➝ ⬛🟩🟩🟩🟩 The solution space is restricted to 5 valid solutions.
FOUND ➝ ⬛🟩🟩🟩🟩 The solution space is restricted to 4 valid solutions.
HOUND ➝ ⬛🟩🟩🟩🟩 The solution space is restricted to 3 valid solutions.
MOUND ➝ ⬛🟩🟩🟩🟩 The solution space is restricted to 2 valid solutions.
POUND ➝ ⬛🟩🟩🟩🟩 The solution space is restricted to 1 valid solution.
WOUND ➝ 🟩🟩🟩🟩🟩

You could try slightly different criteria to rank the input words: I used the average (expected value) over the distribution of possible outcomes, but one could use the worst case (picking the input word that guarantees the largest reduction even in the unluckiest scenario) or even get fancier and include some measure of the variance of the outcomes.
How far is this from the optimal policy? Good question. A conceptually trivial extension would be a longer horizon, for example a 2-step lookahead, but it would be quite expensive from a computational point of view.
If, instead of the average over the solution space of all possible outcomes, you use the median, you get even better results. The best starting word, in this case, is REIST.

You can’t use words that you know are not valid, but would potentially give you a lot of useful information. ↩
In computing, procedural generation is a method of creating data algorithmically as opposed to manually, typically through a combination of human-generated assets and algorithms coupled with computer-generated randomness and processing power.
Procedural generation is common in computer graphics and game design to generate textures, terrain, or even for level/map design. Generative art is also all the rage now, thanks to all the NFT hype (I won’t comment further here..).
There are so many methods for procedural generation, and so many people doing great stuff online. As an example, a few days ago I stumbled upon this delightful infinitely-scrolling Chinese landscape generator (here):

While this Chinese landscape is generated using an ad-hoc algorithm, several popular and more general techniques are used in the procedural generation community, such as the so-called Wave Function Collapse (WFC) algorithm by ExUtumno. WFC generates images that are locally similar to an input bitmap. The algorithm splits the input image into small tiles (say, 2x2) and tries to infer simple constraints for each unique tile (for example, a tile with a road exiting on the right can only be adjacent to a tile with a road entering from the left). Then, it builds a new image by generating tiles (with randomness) and propagating constraints, so that a feasible solution is found with high probability.

I wanted to try something like that myself, but, being extremely lazy, I didn’t feel like implementing a constraint propagation algorithm from scratch (though that sounds fun!).
So I decided to try an extremely dumb technique that required almost zero effort on my part: why not formulate the problem as an Integer Program?
Let’s define the output image as a graph $(V,E)$ where each node is a tile and edges connect adjacent tiles.
Say we have 4 tile patterns: T = {sea, coast, land, mountain}. We set a desired output distribution (say, 40% of the cells must be sea, 20% must be land, etc..).
Then it’s enough to formulate a model as:

\[\begin{align} \min \quad & \sum_{t \in T} \left| \sum_{v \in V} x_{vt} - D_t \right| & \\ \text{s.t.} \quad & \sum_{t \in T} x_{vt} = 1 \qquad&\forall v \in V\\ & x_{vt} \in \{0,1\} \qquad&\forall v \in V, t \in T\\ \end{align}\]

where $x_{vt}$ is 1 if the pattern $t$ is assigned to the cell $v$, $D_t$ is the desired number of cells with pattern $t$, and we minimize the distance from the desired distribution. (The absolute value in the objective function can easily be removed by adding a few auxiliary variables and constraints.)
If there are two patterns $(t_1,t_2)$ that can’t be adjacent, such as sea and mountain, we add:
\[\begin{align} & x_{ut_1} + x_{vt_2} \leq 1 \qquad&\forall (u,v) \in E\\ \end{align}\]

And if we want a pattern $t_1$ to have at least one adjacent cell with pattern $t_2$ (say, coast must be next to sea):
\[\begin{align} & \sum_{u \in adj(v)} x_{ut_2} \geq x_{vt_1} \qquad&\forall v \in V\\ \end{align}\]

(Here's the Julia code to define and solve the model with JuMP and SCIP.)
using JuMP, SCIP, Random

I, J = 1:20, 1:20  # grid dimensions (any size works; larger grids solve slower)
V = [(i,j) for i in I for j in J]
adj = (i,j) -> [tuple(([i,j] + k)...) for k in [[-1,0],[+1,0],[0,-1],[0,1]] if all(([i,j] + k) .∈ (I,J))]
E = [(u,tuple(v...)) for u in V for v in adj(u...)]
T = [:land, :sea, :coast, :mountain];
N = length(V)
Desired = Dict(:land => round(0.4*N), :sea => round(0.4*N), :coast => round(0.1*N)) # mountain: what's left
Desired[:mountain] = N - sum(values(Desired))
model = Model(SCIP.Optimizer);
@variable(model, x[V,T], Bin);
@variable(model, η[T] >= 0)
@constraint(model, assignment[v in V], sum(x[v,t] for t in T) == 1);
@constraint(model, sealand[(u,v) in E], x[u,:land] + x[v,:sea] <= 1);
@constraint(model, seamountain[(u,v) in E], x[u,:mountain] + x[v,:sea] <= 1);
@constraint(model, coastmountain[(u,v) in E], x[u,:mountain] + x[v,:coast] <= 1);
@constraint(model, landcoast[v in V], sum(x[u,:land] for u in adj(v...)) >= x[v,:coast]);
@constraint(model, seacoast[v in V], sum(x[u,:sea] for u in adj(v...)) >= x[v,:coast]);
@constraint(model, mountainchain[v in V], sum(x[u,:mountain] for u in adj(v...)) >= x[v,:mountain]);
@constraint(model, major1[t in T], η[t] >= sum(x[v,t] for v in V) - Desired[t]);
@constraint(model, major2[t in T], η[t] >= - sum(x[v,t] for v in V) + Desired[t]);
@objective(model, Min, sum(η[t] for t in T));
Random.seed!(42)
# Fix one random cell per pattern to seed the layout
for t in T
    v = Random.rand(V)
    println("Fixing $v to $t")
    fix(x[v,t], 1.0, force=true)
end
optimize!(model)
println("Optimal value: ", objective_value(model))
xval = value.(x) .> 1 - 1e-5  # binary values ≈ 1
Color = Dict(:land => :green, :sea => :blue, :coast => :yellow, :mountain => :light_black);
for i in I
    for j in J
        for t in T
            if xval[(i,j),t]
                printstyled(" ", color=Color[t], reverse=true)
            end
        end
    end
    print("\n")
end
Solving to optimality for large-ish images is super slow - my model is as naive as it gets, and finding an optimal solution is really pointless anyway: we’d rather quickly generate multiple feasible solutions. Here’s a result for the simple example in the code above:

Not too bad: it does look like a big map with recognizable features such as lakes, mountain chains, and islands. This is not a recommended approach (it’s very dumb) – still, I suspect that with some tuning and clever modeling tricks one could 1) generate very interesting/complex patterns and 2) greatly reduce the solving time. After all, somebody already thought of ways to generate art with mathematical optimization and even wrote a book about it.
Started strong, but I didn’t keep it up in the second half of the year - I wasn’t in a reading mood. Roughly in order:
All in all, a bit of a disappointing year, no great discovery.
Comics
Non-fiction
I was back in a theater! But still watched very few new movies (Freaks Out, The Green Knight, È stata la mano di Dio). Mare of Easttown was a great TV show. Jonathan Strange and Mr. Norrell was a nice surprise. And a re-watch of The Office happened.
See UDA, MixMatch, FixMatch, and Meta Pseudo Labels from the last 2-3 years.
The underlying assumptions of most semi-supervised learning work are minimal: pretty much nothing is known about the unlabeled examples, other than their having the same distribution as the labeled data. But is that realistic? In the real world, we often know a lot about the problem at hand! That’s why we can (and should!) often CHEAT a little bit.
Surprisingly often, we can find “target-consistent” groups in our unlabeled data, where the label is consistent within the group, although unknown. Similar ideas have also been exploited in self-supervised learning approaches (e.g., Time-Contrastive Networks, Geography-Aware Self-supervised Learning).
A few practical examples are:

So here is a simple idea that I called “Domain-aware Semi-supervised learning” (DSSL). Let’s use domain knowledge to identify groups of unlabeled data that are target-consistent. Then, for each batch, we take $N$ labeled examples and $M$ target-consistent groups of unlabeled examples and compute a loss as:
\[\mathcal{L} = \mathcal{L}_s + \lambda_u\mathcal{L}_u\]

where we sum the standard supervised loss with a domain-aware unsupervised loss term (consistency term), computed for each group $G^j$ as in the following figure:

In summary, we want model predictions to be consistent across each group. To do so:
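The consistency term for a single group can be sketched as follows (a minimal NumPy version using hard pseudo-labels; `group_consistency_loss` is an illustrative name, and the exact loss in the paper may differ):

```python
import numpy as np

def group_consistency_loss(group_probs):
    """Consistency loss for one target-consistent group.

    group_probs: array of shape (M, C) with the model's predicted class
    probabilities for the M unlabeled examples of the group.
    """
    # pseudo-label: one-hot of the group's average prediction
    mean_pred = group_probs.mean(axis=0)
    pseudo = np.zeros_like(mean_pred)
    pseudo[mean_pred.argmax()] = 1.0
    # cross-entropy of every group member against the shared pseudo-label
    return -np.mean(np.sum(pseudo * np.log(group_probs + 1e-12), axis=1))

# a group whose members already agree incurs a much smaller loss
agreeing = np.array([[0.90, 0.10], [0.95, 0.05], [0.85, 0.15]])
disagreeing = np.array([[0.90, 0.10], [0.20, 0.80], [0.60, 0.40]])
```

Minimizing this term pushes every member of the group toward the group's consensus prediction, which is exactly the consistency we are after.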
Let’s see 3 real-world cases where I applied this simple technique to get a big performance improvement.
Consider the task of classifying the weather in road scenes. We have a few hundred labeled images and thousands of unlabeled short videos from the BDD100k dataset.
We know that weather doesn’t change instantaneously: frames from the same video are “target-consistent” (left: sunny; right: rainy).

DSSL can easily be applied here: we compute a pseudo-label as the average prediction of the current model over multiple frames of the same (unlabeled) video.
This works great compared to just training in a supervised way on the available data. I also compared DSSL with a semi-supervised baseline (essentially, FixMatch with no bells and whistles), showing that DSSL outperforms that, too.
A task that comes up surprisingly often in practical computer vision on road scenes is the segmentation of the ego-vehicle, in videos where the camera pose, vehicle type, etc. are all unknown.


For each vehicle, we do know that the camera position doesn’t change over time: images from the same device are “target-consistent”.
Having access to millions of unlabeled images from a private dataset, DSSL can easily be applied: a pseudo-label (or pseudo-mask) can be obtained by averaging the predicted segmentation masks over multiple images from the same camera. This gives, on average, more than a +5 IoU improvement “for free” over both supervised and semi-supervised learning baselines.
Finally, another practical task. Assume we have access to streams of IMU + GPS data from connected vehicles, and we want to know what kind of vehicle generated the data. For a subset of vehicles, we might know make + model. Interestingly, in this case it’s not even possible to manually label more data - unless you know somebody who can interpret IMU data!
But in this case too we know valuable information: a vehicle’s type doesn’t change (duh!). So data collected from the same device over time are “target-consistent”.

DSSL works well in this case too. Additional bonus: this is a somewhat less standard task - there’s no standard recipe for data augmentations on IMU+GPS, and most self- or semi-supervised methods are designed and tuned for image-related tasks. Indeed, I couldn’t get the FixMatch baseline to work at all for this problem, even after throwing at it all the IMU+GPS augmentations I had. DSSL, meanwhile, just worked out of the box.
I briefly showed here a simple yet effective method to exploit domain knowledge in a semi-supervised framework. Some more details can be found in the paper that I presented at an ICCV workshop this year.
The main takeaway should be: if you can cheat and exploit some knowledge of your problem or the process that generates your data, by all means do!
Olivier Chapelle, Bernhard Schölkopf, and Alexander Zien. “Semi-Supervised Learning”. MIT Press, 2006 ↩
Quite a good run this year, also helped by the move from Anobii to Goodreads, which is just nicer and where people are more active. Roughly in order:
Loved the first 6/7 books, greatly enjoyed the first dozen. About the rest, I was really disappointed by Banks and Barth. And unfortunately I can’t really find anything new by Egan that I really love. His 90s short stories were 💣.
Comics:
Non-fiction but not work related:
Townscaper is a minimal city builder, as simple as it gets: you can only left-click to add an element, or right-click to delete one. Which element actually gets built (a road? a garden? a roof? a stairway?) depends entirely on the local configuration of the surrounding elements. In the end, you are really exploring the enormous combinatorial space of possible 3D configurations. Delightful.
Outer Wilds is awesome. You’re a tiny space explorer in a fragile, inadequate wooden spaceship, stuck in a 20 minute-time loop. You need to gather enough clues and to git good at moving around your galaxy so you can solve the riddle that can save you before it’s too late.
What works great: the feeling of mystery and the wonder at each discovery; the wonderful, mysterious, scary galaxy you need to explore; the space physics! Flying is so fun! Landing on a planet, seeing it fill the horizon, is incredible! Making progress is rewarding!
Unfortunately, it has its flaws. Man, watching the supernova go boom is awesome. Hearing the ominous 2-minute music: perfectly Pavlovian. But sometimes the gameplay feels a bit tedious. The countdown is unforgiving: repeatedly being interrupted during your explorations can get rather annoying. I know, that’s kind of the point – but still.
Persona 5 Royal. Guilty pleasure if ever there was one. But I admit I really enjoyed the whole Japanese-ness of it all. And catchy music. Man, I wasted a lot of time on this! But I’m getting too old for J-RPGs.
The Last of Us Part 2 is a tough one. Sometimes I think it’s a masterpiece, sometimes I’m much more cynical and dismiss it. Perhaps the truth is in the middle. The first one was much more coherent. The environments are wonderful, but for the majority of the game it really feels like déjà vu. The gameplay was hit or miss: the level design is top notch, but they could have trimmed a few missions, especially Ellie’s. Some of the plot verged on the nonsensical – it’s all so dreadful that it comes sooo close to tipping over into the ridiculous. And yet, for me it didn’t: on the contrary, it packed a big punch, surprisingly. At the very end I was emotionally drained: when it was my turn to hit, I left the controller sitting, hoping I wouldn’t have to… but choice is an illusion.
The Pathless. You wander an open world full of empty ruins and the remains of a violent past. Environmental enigmas and a bunch of big boss fights: impossible not to think of “Shadow of the Colossus”. But it has a personality of its own (and less of a challenge). Exploring and running/gliding around is extremely satisfying. Made with love and care.
Superliminal. A well-balanced puzzle game that exploits quirky mechanics that could only work in a game. Somewhere between “Portal” and “The Stanley Parable”.
God of War. Not my kind of game, but it’s well done.
I’m not sure I watched a single 2020 movie - certainly not in a theater. A few TV shows:
Same old: The Lowe Post and The Mismatch on NBA; Exponent for tech news commentary.
New entries (in Italian): Joypad on videogames; N by Nicolò Melli (Italian basketball player) on the NBA bubble.
Still, writing correct and efficient pandas code can be tricky - I see that a lot in colleagues who are new to the library. Here’s a short note I wrote with the main ideas to keep in mind in order to write fast pandas code:
A number of recipes can be found in the Cookbook on the official website. Other good pages in the documentation are those on indexing, grouping, text data (especially the magic .str accessor), and time series/dates. And for the interested reader I’d recommend Tom Augspurger’s great series of blog posts on “modern” pandas here – it’s 4 years old but still contains a lot of great stuff.
Also, I won’t comment here on common pitfalls around the correctness of pandas code. That would deserve its own post.
Calling apply row by row, with df.apply(f, axis=1), is no faster than looping over the rows - so it’s rather slow.
True, it’s highly flexible, because the argument passed to f is the entire row, and you can access multiple fields at the same time.
Conversely, df.apply(f) applies f column-wise to each column (it works with any function that can be applied to a column/vector), which is faster but clearly less flexible.
Example:
import numpy as np

def slow_apply(dat):
    return dat.apply(lambda row: np.sqrt(row["a"]), axis=1)

def fast_apply(dat):
    return dat["a"].apply(np.sqrt)

def faster_apply(dat):
    return np.sqrt(dat["a"])
> len(df)
14378
> %time slow_apply(df)
CPU times: user 484 ms, sys: 18.7 ms, total: 503 ms
Wall time: 503 ms
> %time fast_apply(df)
CPU times: user 994 µs, sys: 87 µs, total: 1.08 ms
Wall time: 532 µs
> %time faster_apply(df)
CPU times: user 653 µs, sys: 32 µs, total: 0.685 ms
Wall time: 386 µs
A 1000x difference, and even slightly better if we completely avoid the “apply”*.
(*A nice but tricky thing about pandas: applying a numpy function to a pandas object returns a pandas object. In this case, the return type of np.sqrt(col) is again a pd.Series, not an np.array! Personal preference: I’d rather take the (slightly) slower version and be more explicit with apply(np.sqrt) - all readers will understand that the output is going to be a pd.Series.)
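A quick illustration of this behavior:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 4.0, 9.0], index=["a", "b", "c"])
out = np.sqrt(s)

# the result is still a Series, not an np.ndarray,
# and the original index is preserved
print(type(out))
print(out.index)
```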
Iterating over the groups of a dataframe can be OK for quick data analysis or development; however, for heavier workloads (say, feature extraction on a big dataset, or something that needs to run in production), it can be extremely slow if not done right.
One should rather use groupby followed by .agg(), .transform(), or .apply(). In particular, transform is useful when you want to group and then broadcast the (same) result of a computation back to all elements of each group. Note, however, that transform functions are applied to single columns.
Example:
def slow(data):
    """For each object in a group, which might have multiple classes,
    assign the class with max total confidence. Example:

        index  id  class   confidence
        0      0   apple   0.90
        1      0   banana  0.35
        2      0   apple   0.99
        3      0   banana  0.43
        4      0   banana  0.30

    Here the object with id = 0 would be assigned the class 'apple'.
    """
    for i in data["id"].unique():
        idx = data["id"] == i
        weights = data.loc[idx, ["class", "confidence"]].groupby("class").sum()
        data.loc[idx, "class"] = weights.idxmax()[0]

def fast(data):
    data["weights"] = data.groupby(["id", "class"])["confidence"].transform('sum')
    data["class"] = data.groupby("id")["weights"].transform(lambda g: data.loc[g.idxmax(), "class"])
    del data["weights"]
> len(df)
14378
> %time slow(df)
CPU times: user 3.42 s, sys: 45.8 ms, total: 3.46 s
Wall time: 3.46 s
> %time fast(df)
CPU times: user 760 ms, sys: 18.6 ms, total: 779 ms
Wall time: 204 ms
We obtained quite a big improvement: a 20x speedup. There might be ways to speed things up even further.
The “fast” version relies on the fact that the index of each group in a groupby is the same as in the original df (each group is actually a view on that dataframe), so the index of the max element can be used to index the original dataframe.
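A tiny example of this index alignment (toy data):

```python
import pandas as pd

df = pd.DataFrame({
    "id":         [0, 0, 1, 1, 1],
    "confidence": [0.9, 0.4, 0.2, 0.8, 0.5],
})

for key, group in df.groupby("id"):
    # each group keeps the row labels of the original dataframe,
    # so idxmax() can be used to index back into df
    print(key, group.index.tolist(), group["confidence"].idxmax())
```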
Another example, perhaps even simpler:
def slow(data, step=12):
    """Compute the relative change over time of quantities a and b for each element in a group."""
    for group_id in data["id"].unique():
        pos = (data["id"] == group_id)
        data.loc[pos, "a_change"] = data.loc[pos, "a"].diff(step) / data.loc[pos, "a"].shift(step)
        data.loc[pos, "b_change"] = data.loc[pos, "b"].diff(step) / data.loc[pos, "b"].shift(step)

def fast(data, step=12):
    data[["a_change", "b_change"]] = data.groupby("id")[["a", "b"]].transform(
        lambda g: g.diff(step) / g.shift(step))
> len(df)
14378
> %time slow(df)
CPU times: user 2.11 s, sys: 6.13 ms, total: 2.12 s
Wall time: 2.13 s
> %time fast(df)
CPU times: user 1.03 s, sys: 15.3 ms, total: 1.05 s
Wall time: 1.04 s
Here the speedup is 2x, and the code is in my opinion also easier to read…
…but still pretty slow. It can be made even faster, when you find out that there is a df method, pct_change(), that does exactly what we’re trying to do here! \o/
def faster(data, step=12):
    data[["a_change", "b_change"]] = data.groupby("id")[["a","b"]].pct_change(periods=step)
> %time faster(df)
CPU times: user 9.43 ms, sys: 52 µs, total: 9.48 ms
Wall time: 10.9 ms
And this is much faster, 350x speedup.
Finally, when more flexibility is needed (e.g., for operations between columns), a general apply() can be used:
def slow(data):
    motions = []
    for _, group in data.groupby('id'):
        group = group.copy()
        group['going_left'] = group['right'].shift() > group['right']
        group['going_right'] = group['left'].shift() < group['left']
        group['is_shrinking'] = (group['right'] - group['left']) < (group['right'].shift() - group['left'].shift())
        motions.append(group[['id', 'timestamp', 'going_left', 'going_right', 'is_shrinking']])
    return pd.concat(motions, ignore_index=True)

def fast(data):
    return data.groupby('id')[["right", "left", "timestamp"]].apply(
        lambda g: pd.DataFrame(
            {
                "going_left": g["right"].shift() > g["right"],
                "going_right": g["left"].shift() < g["left"],
                "is_shrinking": (g["right"] - g["left"]) < (g["right"] - g["left"]).shift(),
                "timestamp": g["timestamp"],
            })
    ).reset_index(0)  # remove the id index added by groupby
> %time slow(df)
CPU times: user 1.53 s, sys: 58 µs, total: 1.53 s
Wall time: 1.53 s
> %time fast(df)
CPU times: user 945 ms, sys: 7.99 ms, total: 953 ms
Wall time: 949 ms
The benefit of using groupby().apply() is small in terms of efficiency, but still noticeable; and the code is in my opinion more terse and readable.
Still, not something you can call fast. For that, a bit more effort is required :-)