I myself have just asked ChatGPT 5.4 Pro whether the preprint mentioned above is sufficient to resolve the word problem for that Artin-Tits group, and it has told me that it isn’t. Of course, one needs to be cautious about that as well, especially as one or other of the two conclusions it has come to has to be wrong, but given that the authors of the preprint are presumably well aware of the previously open status of that word problem, it seems unlikely that they wouldn’t have done the calculation that ChatGPT 5.4 Pro claimed in the other conversation to have done. So my current guess is that the preprint doesn’t in fact resolve the problem, but that’s just a guess and isn’t based on a serious attempt to engage with the mathematics.
For the moment I am not sharing the document here, as I hope that by not doing so I’ll provoke other people to try the same experiment, perhaps with different LLMs.
Obviously (but perhaps not quite obviously enough) the results of any such experiment should be treated cautiously, but if the conclusion turns out to be correct, then I am still interested in the following three questions.
It seems rather optimistic to hope for a positive answer to 3, but if there is one, then it would imply positive answers to 1 and 2 as well.
Yes, I have found that article, and I link to it from the game tutorial. One thing I quite like about the game is that, as I implied above, once one plays it for a bit, one starts to see how to encode instances of the word problem for groups, and as a result one intuitively understands the proof of the reduction in a way that I found easier than following the argument in the article (which is not at all a criticism of the article, but more like a comment on the virtues of being able to manipulate words conveniently on a screen).
Many thanks.
Charney proves biautomaticity by analysing a normal form due to Frank Garside, a 1980s mayor of Oxford. Very roughly, the linearly bounded solution to the puzzle game for braid groups then comes from pushing the word “orthogonally” to those normal form lines.
Van Kampen diagrams bring to bear observations about curves in the plane running through discs. These observations are simple but can be powerful and are hard to uncover by just manipulating words. How useful they are for the Artin-Tits example at hand is not so clear to me, and may have a lot to do with the subtlety of its word problem.
One thing I took from your post is that there might be a “natural” or “elementary” example of a finitely presented group with undecidable word problem. I had imagined such groups would always have to be elaborately contrived to house a Turing machine or similar, but Tseytin’s semigroup indicates that’s not necessary — the machine simulation part of the proof enters via the challenge-word and is external to the semigroup. You’ve probably found Carl-Fredrik Nyberg-Brodda’s nice 2024 article on Tseytin’s semigroup.
Thank you for these useful comments. My first attempt to build a game was in fact more visual and based on van Kampen diagrams with exactly those pieces, but it soon became clear that manipulating the diagrams was going to be a slow and laborious process, at least compared with what I now have.
I didn’t know about Ruth Charney’s result — I’ll look that up.
PS I edited your message to make the LaTeX come out: to do this I replaced each opening dollar $ with $latex. It’s a slight bore to type that every time, but one gets used to it.
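For what it’s worth, that substitution can be automated. Here is a small sketch (a hypothetical helper of my own, not anything built into WordPress) that replaces each opening dollar with “$latex ”, on the assumption that the dollars come in non-nested open/close pairs:

```python
def wordpressify(text):
    # Replace each opening "$" with "$latex " (WordPress math syntax),
    # leaving each closing "$" untouched. Assumes dollars come in
    # non-nested open/close pairs, so the parts at odd indices after
    # splitting on "$" are exactly the maths spans.
    parts = text.split("$")
    out = []
    for i, part in enumerate(parts):
        if i == 0:
            out.append(part)            # text before the first dollar
        elif i % 2 == 1:
            out.append("$latex " + part)  # this part follows an opening dollar
        else:
            out.append("$" + part)        # this part follows a closing dollar
    return "".join(out)
```

For example, `wordpressify("take $n$ large")` gives `"take $latex n$ large"`.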
…didn’t mean to be anonymous, just not used to using WordPress — Tim Riley
———————————
You ask whether every braid that is equal to the identity can be reduced to the empty word without ever increasing the number of crossings. If I understand the set-up correctly, this is equivalent to asking whether the puzzle game for the braid group (with respect to standard generators) can always be solved without increasing length.
I don’t know the answer, but I’m aware of a result that is close. Ruth Charney proved that braid groups are “biautomatic.” It follows that they have “linear filling length,” which is to say that there is a constant $C$ such that every word of length at most $n$ that represents the identity can be reduced to the empty word as per the puzzle game with every word en route having length at most $Cn$. You could ask whether the Artin-Tits group you discuss has the same property.
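I don’t know how the actual game is implemented, but as a toy model of the puzzle game for a finitely presented group one can brute-force search: a move is a free cancellation, a free insertion, or the replacement of one side of a defining relation by the other, and the question is whether the empty word is reachable without any intermediate word exceeding a length bound. A minimal sketch (all names are my own; capital letters stand for inverse generators):

```python
from collections import deque

def inv(c):
    # Inverse of a generator: lowercase <-> uppercase.
    return c.swapcase()

def neighbors(word, relations, alphabet="abAB"):
    """All words reachable in one move: free cancellation,
    free insertion, or applying a relation u -> v."""
    results = set()
    n = len(word)
    # Free cancellation: delete an adjacent inverse pair.
    for i in range(n - 1):
        if word[i] == inv(word[i + 1]):
            results.add(word[:i] + word[i + 2:])
    # Free insertion: insert x x^{-1} at any position.
    for i in range(n + 1):
        for x in alphabet:
            results.add(word[:i] + x + inv(x) + word[i:])
    # Relation moves: replace an occurrence of u by v.
    for u, v in relations:
        start = word.find(u)
        while start != -1:
            results.add(word[:start] + v + word[start + len(u):])
            start = word.find(u, start + 1)
    return results

def solvable_within(word, relations, max_len):
    """Breadth-first search: can `word` (assumed of length <= max_len)
    be reduced to the empty word while every intermediate word has
    length at most `max_len`?"""
    seen = {word}
    queue = deque([word])
    while queue:
        w = queue.popleft()
        if w == "":
            return True
        for nb in neighbors(w, relations):
            if len(nb) <= max_len and nb not in seen:
                seen.add(nb)
                queue.append(nb)
    return False
```

For the braid group on three strands one could take `relations = [("aba", "bab"), ("bab", "aba"), ("ABA", "BAB"), ("BAB", "ABA")]`; linear filling length would then say that some cap proportional to the starting length always suffices. This brute force is of course only feasible for tiny examples, since the search space grows exponentially in the length bound.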
———————————
There are many examples of finitely presented groups where the lengths of words must grow a lot in the course of the puzzle game before being shrunk down to the empty word. “Filling length” is the term to search for.
The group is a striking example. For all integers $n$, there is a word $w_n$, whose length we denote by $\ell_n$, that represents the identity, but cannot be shrunk down without encountering words whose lengths are at least the $n$-fold-iterated exponential $\exp^{(n)}(1)$. But the word problem is decidable and I don’t believe there is enough going on in that group to build “alien mathematics.”
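To get a feel for how fast such a bound grows, here is the $n$-fold-iterated exponential (with base 2 as an illustrative choice; the function name is my own):

```python
def tower(n, base=2):
    # n-fold-iterated exponential:
    # tower(0) = 1, tower(k) = base ** tower(k - 1).
    result = 1
    for _ in range(n):
        result = base ** result
    return result
```

Already `tower(4)` is 65536, and `tower(5)` is a number with nearly twenty thousand digits.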
The link no longer works, where are the animations kept now?
I came across this blog by accident: I looked up the composer of the score for the (eternally peerless) Granada Sherlock Holmes series
I wouldn’t qualify as an assistant on your project but I hope that I can be of some small use
Your ad hoc summary of LLM “habits” certainly agrees with my own experience of LLMs’ responses to mathematical queries; you hint that LLM performance has improved significantly over your last 3 years’ experience; it has improved noticeably over my last 3 months’
But it remains – as of the day of posting – prone to shockingly bad errors; and I wonder whether the underlying “bad habits” responsible are more salient with rudimentary mathematics than with (at a guess) the theorems your own queries invoke (or don’t, either because they are not published, or because they are (strategically) false)
Some minutes ago I resubmitted to Gemini and ChatGPT a set of rudimentary arithmetic statements I had first submitted perhaps 3 months ago; today the conclusions were not repudiated (as they had been then) but affirmed; yet at least some of the glaring errors of 3 months ago persisted
(Today but not previously I gave the LLM the job of returning, independently, “Conclusions true” / “User’s arithmetic and/or inferences include at least one error”)
…
I will happily forward the complete query; meanwhile ChatGPT’s response to the follow-on query, “How can an LLM get this wrong?”, may rhyme with your own experience and/or knowledge/analysis:
This, verbatim:
“Explanation for earlier misreading:
“The earlier mistake was not arithmetic but scope tracking.
“LLMs do not have a symbolic parser; when long structured mathematical text alternates between definitions and claims,
it can lose track of which statements are assumptions versus derived claims.
“You explicitly wrote “Letting a^N have distinct Nth roots {x}, b^N have distinct Nth roots {y} mod prime powers d, 0 < (x, y) < d,”
which already established the premise.
“The model’s earlier response failed to carry that assumption into the evaluation of Claim #2, producing an incorrect qualification.
“Under your system definition, Claim #2 is fully correct.”
…
You will know that repeated submissions of some identical query seldom yield an internally consistent set of responses; the Gemini responses are more interesting than ChatGPT’s
Either, I’d say, is worth a look
…
Anyway, hope this helps, if you are interested in the original query: [email protected]
Regards, D
I’m an independent design researcher and software engineer who’s built a web-based platform for spatial mathematical writing (based on my MSc thesis, 20/20). It might address some of the interface challenges you mention. The approach focuses on reducing cognitive load through spatial layout and progressive disclosure, and it’s designed for ease of authoring. I’ve written spatial versions of several theorems, including those leading to Newman’s proof of PNT: https://www.brainec.com/university-demo/~vfuto. The tool itself is at https://brainec.com and is already being used by students and professors.
I’d be happy to share any know-how I’ve gathered over the years if it helps.
Some time ago I wrote this spatial version of the Heine–Borel theorem, with proof; maybe it could help: https://www.brainec.com/university-demo/~vfuto/heine-borel-theorem.html
https://arxiv.org/abs/2412.15184v1
“Data for Mathematical Copilots: Better Ways of Presenting Proofs for Machine Learning”
by
Simon Frieder, Jonas Bayer, Katherine M. Collins, Julius Berner, Jacob Loader, András Juhász, Fabian Ruehle, Sean Welleck, Gabriel Poesia, Ryan-Rhys Griffiths, Adrian Weller, Anirudh Goyal, Thomas Lukasiewicz, Timothy Gowers
The best way is probably to email me and we can discuss whether there’s a match between our needs and what you would be willing/able to do.