So, nearly all (or perhaps all) C-like languages require us to use curly brackets { and } when writing a do-while loop of more than one statement inside of them. Why is that? The while acts almost like EndLoop in this case, doesn't it? Why can't the parser simply look for the next while token to end the loop?
- What is your definition of C-like languages? MATLAB does not utilise curly brackets for its while loop. – CroCo, Sep 14, 2025 at 19:00
- @CroCo I must admit I've never heard of anybody referring to MATLAB as a C-like language. Some people refer to Rust as a C-like language, though. – FlatAssembler, Sep 14, 2025 at 19:35
- This question about adding do-while to Lua arrives at the same issue from the opposite side. – Michael Homer ♦, Sep 14, 2025 at 21:46
- The design decisions for the syntax of a language do not revolve around what a compiler can or cannot get away with. They revolve around what makes the developer's life easier. Having some blocks of code arbitrarily require curly brackets (for-loops, if-statements, while-loops, function bodies, etc.) but not others, does not make the developer's life easier. – AxiomaticNexus, Sep 16, 2025 at 10:22
4 Answers
It becomes ambiguous because you can also have a while loop nested inside the do-while loop. So does the next while end the do-while or does it start a nested while?
It's not enough to check whether while (<condition>) is immediately followed by ;, because that's also a while loop with an empty body.
It might be possible to disambiguate by finding all the do and while tokens, since there may be only one valid set of nestings. But C was designed to be parsed in a single pass, and this would make parsing more complicated.
In addition, this makes do-while consistent with all other control structures. In all cases, the body is either a single statement or a block delimited with {}.
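To make the ambiguity concrete, here is a minimal sketch in standard C. The struct `counts` and the function `run_nested` are hypothetical names introduced only for this illustration. It shows that a brace-less do-while whose body is itself an empty-bodied while loop is already valid C, so a "next while token ends the loop" rule would have to guess which while is which.

```c
/* Hypothetical helper type for illustration: how often each condition ran. */
struct counts { int inner; int outer; };

struct counts run_nested(int x, int y) {
    struct counts c = {0, 0};
    /* Valid C without braces: the first while (with an empty body) is the
       single statement forming the do body; the second while closes the do. */
    do
        while (c.inner++, x-- > 0);
    while (c.outer++, y-- > 0);
    return c;
}
```

Calling `run_nested(2, 1)` evaluates the inner condition four times and the outer condition twice, which confirms that the compiler treats the first while as a nested loop and only the second as the terminator.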
- It would also have been possible to solve this problem by using different tokens for the end of a "do-while" and the start of a "while". This is a good demonstration of why overloading keywords is a bad idea. Most BASICs actually solved this problem by using "do" and "loop" as the keywords for a post-check loop. – Graham, Sep 14, 2025 at 9:31
- @Graham BASIC does not have fixed delimiters around any control structures, it has unique start and end keywords for each one. That's nothing to do with overloading keywords, it's a design choice of how blocks work. – IMSoP, Sep 14, 2025 at 12:57
- @Graham Or in the case of BBC BASIC, the condition-at-the-top loop is WHILE <condition>…ENDWHILE, while the condition-at-the-bottom loop is REPEAT…UNTIL <condition>. (I didn't think any 8-bit BASICs had DO…LOOP as such, since it sounded like FORTRAN to me, though I find that it's in CBM BASIC for the Commodore 64.) – gidds, Sep 14, 2025 at 15:14
- right, so you can do: do while(x); while(y); and it's a valid do-while loop. Nngh. – ilkkachu, Sep 15, 2025 at 10:12
- @Graham ...why is this even a “problem” to begin with? Who needs it “solved”? How does it make a “good demonstration” of this, when it doesn’t seem to matter at all? – KRyan, Sep 15, 2025 at 16:20
Regardless of parser ambiguity, I would flip this around: what would the advantage be to allowing this control structure, unlike all others, to have a block with no delimiters?
Programming languages are written, first and foremost, for the convenience of humans. To this end, consistency is a high-value feature of any programming language: it gives people learning the language the ability to transfer understanding from one feature to another.
Accordingly, most programming languages pick a particular style to delimit blocks, and apply it to all control structures:
- paired keywords for each structure (for...next, do...loop, if...fi, etc.)
- fixed keywords for start and end of block (begin...end)
- fixed punctuation for start and end ({...})
It's quite common for languages to make these optional when the body is a single statement (or, as Eric Lippert points out, consider the delimited block as a single statement); but it's also quite common for projects to enforce a Coding Style which treats delimiters as mandatory even in those cases.
Having the do-while loop work differently from other, more commonly used control structures would be one more thing for developers to learn; and if the undelimited form were merely optional, coding standards would probably mandate the delimited form anyway.
There are languages which include features to avoid typing a few delimiters, but they're generally controversial - JavaScript's "Automatic Semicolon Insertion" comes to mind.
- Yes! Parsable is not the target; parsable is just table stakes. – Chris Bouchard, Sep 14, 2025 at 15:12
- I had to implement auto semis in JScript and ensure that any time we added a new feature to JS, that auto semis continued to work. I personally wouldn't say that it was controversial. I'd say that it was dumb. Netscape for the most part did a good job of the design of JS but that feature made huge amounts of work for the dev team, created a whole new class of hard-to-spot bugs for users, and the benefit was minuscule. It literally saved a dozen keystrokes per program. Remember, JS was designed for 10-ish-line scripts. – Eric Lippert, Sep 14, 2025 at 15:20
- Several languages have optional semicolon delimiters: not just JavaScript, but awk, PHP, Lua, Kotlin, and Swift too. (I think the level of controversy depends upon how it's implemented in each case.) – gidds, Sep 14, 2025 at 15:20
- @gidds That seems an odd list of examples. In awk, semicolons are an alternative to newlines for separating statements within an "action" (block); they're not required between blocks, but that's true in C itself. In PHP, the only place I can think they're optional is when explicitly switching to output with ?>; even at end-of-file you need to terminate the last statement with a semicolon. At a glance it seems Kotlin and Swift copied JS's terrible idea of inference, which has lowered my opinion of both languages. – IMSoP, Sep 14, 2025 at 16:07
- @Seggan Unless Kotlin's rule is "never inside a single statement", the fundamental problem remains: machines are happy with long lists of grammar rules, humans need easily learned generalisations. I would much rather unthinkingly press a key on my keyboard at the end of every statement, than pause every time to think whether the parser could get by without one. – IMSoP, Sep 14, 2025 at 21:35
The other answers are good but I'll give you the frame challenge answer as well:
So, nearly all (or perhaps all) C-like languages require us to use curly brackets { and } when writing a do-while loop of more than one statement inside of them.
They do not do any such thing, and so the question is predicated upon a falsehood. A do-while loop in those languages always requires that the body be a single statement.
A block statement is a single statement. The fact that a block statement contains zero or more statements does not change the fact that a block statement is itself a single statement, same way that an if-else statement containing two statements is itself a single statement.
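As a small illustration of that point, the following sketch (function names are hypothetical, chosen only for this example) writes the same loop twice in standard C: once with a plain expression statement as the body and once with a block statement as the body. In both cases the do-while has exactly one statement between do and while.

```c
/* Body is a single expression statement; no braces are needed. */
int sum_single_statement(int n) {
    int i = 0, s = 0;
    do
        s += ++i;
    while (i < n);
    return s;
}

/* Body is a single block statement that happens to contain two statements. */
int sum_block(int n) {
    int i = 0, s = 0;
    do {
        i++;
        s += i;
    } while (i < n);
    return s;
}
```

Both functions add 1 through n (and, being post-check loops, run the body once even when n is 0), so the braces change how many statements fit inside the body, not what kind of thing the body is.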
- I think the point of the question was that do-while doesn't need to be limited to a single statement, because it has a built-in end delimiter of its own. Everything else needs to be a single statement or block because there's no other way to know where it ends. – Barmar, Sep 14, 2025 at 15:59
- @Barmar: I get that was the point of the question. The point of the answer is that questions that indicate that the poster has a false belief are opportunities to disabuse them of that false belief. I certainly thought about language design more clearly after I realized that block statements were statements. – Eric Lippert, Sep 14, 2025 at 16:01
- This may be the "traditional" interpretation, and a common way of defining it in a parser, but it is not quite universal. In Perl, for instance, curly braces delimit a "block", which has explicit semantics (such as being able to break out with last); control keywords must be followed by a block, not a statement. I think Rust takes a similar approach. There are almost certainly other languages where the delimiters are part of the control syntax (not a general "block") but still mandatory. – IMSoP, Sep 14, 2025 at 16:27
- An interesting example from a different language family is Pascal: begin...end creates a compound statement, and is mostly used in the same way as {...} in C; but repeat...until does not require a single statement, it delimits any number of statements, exactly as proposed in the question. So even if you do define "{" statement_list "}" as a subset of statement, it doesn't automatically lead to all loops requiring such a compound statement, if you don't want them to. – IMSoP, Sep 14, 2025 at 16:40
- (+1) "so the question is predicated upon a falsehood." kill it before it even proliferates. Not to mention the lack of clarity about what exactly a C-like language is. – CroCo, Sep 14, 2025 at 19:03
Along with other points people have raised, pointing out that the assumptions in the question are incorrect, I'll add one more (mostly historical) reason.
I'm going to start with a bit of a digression. When C was young, the compiler ran on a PDP-11 computer that could only address 64K of memory. That held not only the compiler, but also the code as it was being generated. The compiler was arranged in memory in roughly the same order as the code ran: the preprocessor first, then the lexer, then the parser, and finally the code generator (followed by empty space up to the top of the memory).
When the compiler started generating code, it put that code in the empty area past the end of the compiler. But it frequently ran out of empty space at the top of memory, and as the pointers were incremented they'd wrap around to the bottom of memory, so generated code started to overwrite the preprocessor, then the lexer, then the parser.
That digression is mostly intended to emphasize the fact that memory was extremely tight. So even if it would have been easy to parse a do/while loop with a multi-statement body, they probably wouldn't have done so. They were scrimping and saving on memory almost anywhere they could. Even if it would have been fairly simple, adding code to parse a multiple statement block ending with a while would still have taken more memory than using existing code to parse a block delimited with curly braces, which was already used in many places.
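The reuse argument can be sketched with a toy recursive-descent parser. This is purely illustrative and is not the historical PDP-11 compiler: the one-character token alphabet and the function names `parse_statement` and `parses_as_statement` are assumptions made up for this example. The point it tries to show is that the do case needs almost no code of its own, because it simply calls the existing statement parser and then expects a while.

```c
#include <stddef.h>

/* Toy token alphabet (an assumption for this sketch, not real C tokens):
   'd' = do, 'w' = while (cond), 's' = expression statement with its ';',
   ';' = empty statement, '{' and '}' = block delimiters. */

/* statement := 's' | ';' | '{' statement* '}' | 'w' statement
              | 'd' statement 'w' ';'
   Returns a pointer just past the parsed statement, or NULL on error. */
static const char *parse_statement(const char *p) {
    if (p == NULL) return NULL;
    switch (*p) {
    case 's':
    case ';':
        return p + 1;
    case '{':
        p++;
        while (p != NULL && *p != '}' && *p != '\0')
            p = parse_statement(p);
        return (p != NULL && *p == '}') ? p + 1 : NULL;
    case 'w':
        return parse_statement(p + 1);   /* while body: one statement */
    case 'd':
        p = parse_statement(p + 1);      /* do body: reuse the same routine */
        if (p == NULL || p[0] != 'w' || p[1] != ';') return NULL;
        return p + 2;
    default:
        return NULL;
    }
}

/* Returns 1 if the whole token string is exactly one valid statement. */
int parses_as_statement(const char *tokens) {
    const char *end = parse_statement(tokens);
    return end != NULL && *end == '\0';
}
```

Under these assumptions, "d{ss}w;" and "dsw;" are accepted, "dssw;" is rejected because the body must be one statement, and "dw;w;" is accepted with the first while consumed as the body. Supporting an unbraced multi-statement body would need extra logic in the 'd' case to decide whether each 'w' starts a nested loop or ends the do.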
- Kids today with their 64 bit process isolated flat virtual memory spaces should GET OFF MY LAWN. But seriously, the problem you point out was around for a long time. I fondly remember writing code for the early NetWare OS where I could tell if I'd blown past the end of the stack because it would start writing to memory mapped to the screen. Everything ran in the zero ring; user code had full access to OS memory structures. – Eric Lippert, Sep 15, 2025 at 16:20
- @EricLippert: Interestingly, at the time it wasn't really even seen as a problem. Overwriting code that was no longer being used just made sense. Even in the early 1990's when Dennis posted about it on Usenet, quite a few people didn't really see it as a problem. Most figured it would be a problem if there was an input sequence from which the code generator would overwrite itself. Although he never stated it directly, if memory serves Dennis hinted that the answer was probably yes. There was general agreement that that would be a real problem... :-) – Jerry Coffin, Sep 15, 2025 at 16:28
- @JerryCoffin: "Even in the early 1990's when Dennis posted about it on Usenet, quite a few people didn't really see it as a problem." Back in the old days, Microsoft happily signed the WinRing0 driver. Attitudes have (rightly!) changed. – Brian, Sep 15, 2025 at 21:37
- C was originally developed while Unix was being designed for the PDP-7, which only had 4K words (9K bytes) of memory. – Barmar, Sep 16, 2025 at 15:06
- I'm not so sure about that, @Barmar. According to DMR, it was the introduction of a PDP-11 into the Bell Labs computing environment that spurred the development of C from B, and rewriting the (B) compiler to generate PDP-11 machine instructions was among the very first steps, while he was still calling the result "new B". UNIX did start out on the PDP-7, but C started on the PDP-11, as part of the effort to port early UNIX. – John Bollinger, Sep 16, 2025 at 18:52