<HTML>
<HEAD>
<!-- Created by texi2html 1.56k from tl-manual.texinfo on 12 October 2001 -->
<TITLE>ThinLisp Rant and Notes</TITLE>
</HEAD>
<BODY>
<H1>ThinLisp Manual</H1>
<H2>Version 1.0.1</H2>
<H2>12 October 2001</H2>
<ADDRESS>The ThinLisp Group:</ADDRESS>
<ADDRESS>Jim Allard</ADDRESS>
<ADDRESS>Ben Hyde</ADDRESS>
<P>
<P><HR><P>
<H1><A NAME="SEC1" HREF="tl-manual_toc.html#TOC1">Acknowledgements</A></H1>
<P>
ThinLisp was begun in 1995 as a late night project born of the frustration of
attempting to ship commercial quality software products written in Lisp. Four
years later I'm still obsessed by the issues that drove me to it in the first
place -- practical use of high level languages to implement complex
applications while achieving the performance that seems easy when using
languages that are semantically "close to the silicon". Hopefully this
implementation will lead to some useful insights towards that goal.
<P>
Thanks to Ben Hyde for convincing Gensym and myself that this system should be
taken off the shelf, polished up, and released as open source software. Mike
Colena and Glen Iba wrote significant parts of the implementation. Nick Caruso,
Joe Devlin, Rick Harris, and John Hodgkinson also contributed their time,
sage opinions, and code. Thanks to Lowell Hawkinson, Jim Pepe, and Dave Riddell
for their support of this work at Gensym Corporation. Bill Brody, Kim Barrett,
and Rick Harris wrote much of the Chestnut Lisp to C translator that was the
philosophical precursor to this work. Thanks to Kim Barrett and David Sotkowitz
of IS Robotics, Inc. (now iRobot Corporation) for contributions of
multiple-inheritance code and opinions, and to Rod Brooks and Dave Barrett for
their ideas. Finally, thanks to my wife, Gerry Zipser, for putting up with the
bleary-eyed aftermath of coding sessions, and for helping to focus efforts
during this past summer's release push.
<P>
<BR>Jim Allard<BR>Newton, Massachusetts<BR>October 6, 1999 and again May 9, 2001
<H1><A NAME="SEC2" HREF="tl-manual_toc.html#TOC2">Preface</A></H1>
<P>
ThinLisp is an open source Lisp to C translator for delivering commercial
quality, Lisp-based applications. It implements a subset of Common Lisp with
extensions. ThinLisp itself is written in Common Lisp, and so must run on top
of an underlying Common Lisp implementation such as Allegro, MCL, or CMU Lisp.
The C code resulting from a translation can then be independently compiled to
produce a small, efficient executable image.
<P>
Originally designed for real-time control applications, ThinLisp is not a
traditional Lisp environment. It purposefully does not contain a garbage
collector. ThinLisp produces compile time warnings for uses of inherently slow
Lisp operations, for consing operations, and for code that cannot be optimized
due to a lack of sufficient type declarations. These warnings can be suppressed
by improving the code, or through use of lexical declarations acknowledging that
the code is only of prototype quality. Code meeting the stringent requirements
imposed by ThinLisp cannot be sped up by rewriting it in C.
<P>
The ThinLisp home is <A HREF="http://www.thinlisp.org/">http://www.thinlisp.org/</A>. The newest source
distributions of TL and this manual can be found there. Bugs should be
reported there as well.
<H1><A NAME="SEC3" HREF="tl-manual_toc.html#TOC3">Rant</A></H1>
<P>
<A NAME="IDX1"></A>
<H2><A NAME="SEC4" HREF="tl-manual_toc.html#TOC4">My Excuse for Ranting</A></H2>
<P>
I've imagined the reactions people will have when they see that there is yet
another Lisp to C translator implementation being released. Mostly I've
expected that people would think I'm stark raving mad to engage in a large work,
to serve an almost non-existent user base of programmers in a language that is
largely discredited while being actively antagonistic towards many of the design
principles this language has employed for decades. Since I'm expecting that
people will think I'm mad (and half believing it myself), I then feel entitled
(even obligated) to engage in a rant to explain the motivations of such a
curious endeavor. Please excuse this indulgence and look charitably upon it as
a form of self-therapy.
<H2><A NAME="SEC5" HREF="tl-manual_toc.html#TOC5">Lisp Background</A></H2>
<P>
More than fifteen years ago I was in a recitation session of an Introduction to
Artificial Intelligence course. The lecturer, Patrick Winston, had come in for a
session on automated planning. At one point I asked a question about how a
certain situation might be handled in the system we were studying, and he
answered that it couldn't be handled in that system and that solving that
problem was a research topic. The fact that problems as simple (so I thought at
the time) as what we were describing were research, and that as a sophomore in
college I could grasp them seemed amazing to me then, as it still does now. On
the spot I could imagine six different ways to attack that problem, and none of
them felt any less plausible than some of the established techniques that we
were studying. I was hooked.
<P>
At the time, this sort of optimism and arrogance about the immediacy and
inevitability of technical success in AI was rampant. The convergence of
breakthroughs in techniques, tools, and venture capital money that flowed like
wine was intoxicating for everyone.
<P>
No one thought this work would be trivial, so the best software development
tools available would be required. And of course the best tools from the
perspective of the AI community included the language that this community had
been developing since the late 1950's, Lisp.
<P>
Originally, Lisp was developed as an experiment in providing a direct means of
implementing lambda calculus, where operations were first class objects that
could be manipulated in the same manner as data. The presumption behind many of
the Lisp design choices was that the most important goal was to maximize a
programmer's productivity in developing new functionality by enabling the purest
form possible of coded solutions, and so to hide as many of the platform
specific details as possible. In the ongoing evolution of Lisp, considerations
of the efficiency of processing speed and memory use came in a distant second to
programmer convenience and to the attempt to protect the programmer from bugs
caused by mathematically abhorrent behaviors such as integer wraparound.
<P>
Perhaps because of this devotion to developer productivity; perhaps because of
the synergy between language syntax, parsing, and data; perhaps because of the
invention of the Lisp macro facility; or perhaps because it was invented down
the hall, Lisp became the language of choice for research and experimentation in
new programming constructs (at least at MIT). Many of the different possible
approaches to object oriented programming were first tried out in Lisp.
Examples are multiple inheritance, multi-methods, prototype-based object systems,
and automatic method combination via whoppers, wrappers, before, and after
methods. Other innovations experimented with in Lisp are error handling
systems, multi-platform file naming approaches, and guaranteed execution of
clean-up code (i.e. unwind-protect).
<P>
In 1984 a group formed to attempt to unify as many of the best features of
different Lisp dialects as possible into one central standard. From this effort
came the excellent books <CITE>Common Lisp, The Language</CITE> by Guy Steele and its
second edition, which incorporated many of the changes that would eventually be
codified into ANSI Common Lisp. This is the dialect that dominates Lisp work
today, at least in the U.S.
<H2><A NAME="SEC6" HREF="tl-manual_toc.html#TOC6">Lisp in Action</A></H2>
<P>
In any case, throughout the 1980's the bulk of the research labs and companies
that attempted to commercialize AI technologies used Common Lisp to implement
their products. The most broadly deployed AI-based technique was expert
systems, also called rule-based systems. <A NAME="DOCF1" HREF="tl-manual_foot.html#FOOT1">(1)</A> Early results were encouraging. In the lab, sometimes on specialized
hardware, the software engineers were able to demonstrate the beginnings of the
functionality that they were promising.
<P>
However, when it came to deploying these systems, things didn't turn out so
smoothly. Most software products of that era were coded in languages such as
Fortran, Cobol, Pascal, C, and assembly language. It turned out that software
developers who could breathe fire in Lisp were clumsy, inefficient, or flatly
unwilling to port their systems to these traditional languages.
<P>
Customers were loath to buy expensive, special purpose hardware for these
applications while cheap, ever more powerful PC computers were rolling out.
During this period the workstation class computer was also coming on, with
initial excellent products coming from Sun and Apollo. Stories about the
difficulties in rolling out applications are rife throughout the industry, but
I'll speak to just the cases I personally was involved in. While the presence
of Lisp as the basis of these software systems cannot be given full credit or
blame for their success or failure, in all cases it was a controversial and
high-visibility part
of the system.
<P>
The Mentor system for preventative maintenance was a success story. A group at
Honeywell made a forward-chaining, rule-based system that helped novice
technicians perform preventative maintenance on centrifugal chillers,
essentially large air-conditioning units found in office buildings. Honeywell
had a significant business in maintenance services for traditional
compressor-based chillers, but it lacked trained personnel to go after this new
and lucrative service market. A corporate research lab made a Lisp-based
implementation of a run-time environment for Mentor that ran on a ruggedized
Grid laptop computer. While the system was useful, it ran sluggishly and the
Lisp-based implementation left little room on the laptop for the system to grow
and improve. A technology transfer project was begun which resulted in a
software engineer within the service division rewriting the application in C,
greatly improving its speed and reducing its size, allowing the system to be
expanded into a commercially viable tool. As far as I know, this system is
still in use.
<P>
In this system, Lisp served a very useful purpose as a prototyping language, but
a critical element in the success of the project was to abandon the original
implementation and rewrite it in C. This was necessary for the commercial
division that deployed this product, first to gain acceptance, and second to
enable that division to maintain the system with the personnel they had working
there. The rewrite also sped up the application and greatly reduced its
run-time size.
<P>
The Cooker system was a real-time supervisory level monitoring system also built
at Honeywell which was intended to be used in process control plants. It was
implemented on a Symbolics 3640 which was deployed to a paper mill in
Pennsylvania. Though care had been taken to tailor memory use such that a
generational garbage collector would efficiently reclaim the vast majority of
discarded memory, every 6 weeks or so a hemisphere flip would occur (i.e. a big
garbage collection), during which the system could not maintain adequate
real-time performance for several hours. After the third time that this
happened, the plant crew turned off the system and never restarted it again.
<P>
In this case, the system was gaining acceptance by the pilot users, but as a
direct consequence of the performance problems caused by garbage collection this
pilot project was closed out, and this particular deployment approach was
dropped. A different
<P>
The G2 system is a real-time supervisory level control system built by Gensym
Corporation. Its source code was written in Lisp, and for approximately 3
years was released as a world-saved image made from Lucid Lisp environments on
UNIX workstation class machines. In 1990, Gensym ported its system to Ibuki
Common Lisp, which was a Lisp-to-C translator descended from Kyoto Common Lisp.
By translating to C and discarding those parts of Common Lisp that we didn't
use, we were able to decrease the size of our shipped product by 25%, a fact
that was greatly appreciated by a customer base tired of constantly having to
buy more memory for computers. In 1993 Gensym changed again to a different Lisp
to C translator made by Chestnut Software. This produced an additional 40%
decrease in size, and a significant improvement in performance. The ThinLisp
translator is a philosophical descendent of the Chestnut translator, though
ThinLisp was written from scratch. One of the factors cited by the head of
sales of G2 was that once we used a Lisp to C translator, he could say that the
product was "deployed in C", thereby side-stepping the Lisp issue, which had
been a significant downside to our system in the eyes of many potential
customers.
<P>
Here Lisp was a competitive advantage for development, but became a sales
liability due to the poor reputation of Lisp-based applications in the
marketplace. Once the organization switched to a Lisp-to-C translator, then
development could continue to see the advantages of Lisp coding, but marketing
could bury that detail, avoid a defensive technical argument that was proving a
barrier to closing sales, and continue to focus on the product's positive
benefits.
<P>
The Bore Rat is an autonomous robotic device that travels through the piping
installed in oil wells. Built by IS Robotics, it is capable of traveling 2
miles straight down into a well, performing maintenance and data logging tasks,
and then returning to the surface, all under its own power and control. The software
for this system was written in L, a Lisp subset made for embedded processors, in
this case a Motorola 68376. The first prototypes of the Bore Rat became
operational in 1999 and are currently being marketed to the oil field services
industry by IIC. In this case Lisp was not a significant factor, either in a
positive or a negative sense.
<P>
During this time, Lisp has waned in its influence. Java has replaced Lisp in
many places, the number of Lisp vendors has dropped, and Lisp is being taught in
fewer college courses. The future of traditional Common Lisp is in doubt,
though it is still strong in a few niches, and deep within a few specific
applications, such as Yahoo Store.
<P>
As a result of these experiences, and as an outcropping of my experience using
Lisp to implement real-time control systems, I've wanted to have a Lisp
environment that provided the features that I liked, such as macros for control
structure abstractions, that had outstanding performance characteristics given
proper type declarations, and which remained small and lean. ThinLisp is an
embodiment of that environment.
<H2><A NAME="SEC7" HREF="tl-manual_toc.html#TOC7">Scientists, Mathematicians, and Engineers (oh my!)</A></H2>
<P>
I have been fascinated (and occasionally amused) by the nearly religious
arguments about the merits of various computer languages. Beyond the pure
merits of the systems being discussed there seemed to be lurking a difference in
philosophy about how to go about programming that colored these discussions.
This section is an attempt to expose a theory about that underpinning. In the
end, this discussion also helps me explain why I've made the design choices that
I have in ThinLisp.
<P>
From the beginning Lisp was a mathematician's language. To understand what I
mean by this, I must explain one view of the differences between the goals of
scientists, mathematicians, and engineers. <A NAME="DOCF2" HREF="tl-manual_foot.html#FOOT2">(2)</A>
<P>
The goal of scientists is to learn truths about the world around them, using
observation, analysis, hypothesis, and experimentation to advance and buttress
their claims of discovery. Pure science aspires to be unconcerned with the
utility or applicability of the information they seek. The ultimate goal is to
discover and understand what <EM>is</EM>.
<P>
Mathematicians aspire to an even higher goal -- understanding what truths there
are or that there might be, regardless of or even entirely divorced from the
limits of the physical world. The studies of mathematicians are the stuff of
pure thought and imagination, yet bound by the strictest rules of proof. To
build robust castles in the air based on proofs combining into a single
monolithic whole requires that all pieces must act perfectly in concert.
Therefore the mathematician must constantly be vigilant against contradictions
between the different portions of his whole proof, where one technique carries
with it constraints which must be respected within all other parts of the proof.
With this rigorous technique in hand, the ultimate goal is to discover what is
<EM>true</EM>.
<P>
On the other hand the goal of the engineer, in the abstract, is to bend the
behavior of an uncooperative world to his will. Whether it is the stereotypical
engineer operating a train, or a civil, acoustic, electrical, mechanical,
software, or other specialty of engineer, their work is to use what is known
from science and mathematics, exploiting practicality and expediency, to operate
or design a physical entity so that it performs a specific task, usually with
the motivation of the almighty dollar. Within the constraints of the physical
world, the ultimate goal is to accomplish what is <EM>practical</EM>.
<P>
Within the computer science community, examples of personalities suited to each
of these different disciplines abound. Within the small community of
computer language designers, the mathematician and engineer distinguish
themselves through two very different approaches to language design.
<A NAME="DOCF3" HREF="tl-manual_foot.html#FOOT3">(3)</A>
<H2><A NAME="SEC8" HREF="tl-manual_toc.html#TOC8">Denotational and Operational Semantics</A></H2>
<P>
The mathematician espouses <EM>denotational semantics</EM>, where the behavior of
an operation is determined by what the operation <EM>means</EM>. In this style,
the goal is to make a complete unified system where each element of the language
must be intricately coordinated with each other part, so that each element can
retain its meaning when used in conjunction with any other part. The semantics
of each operation are primary, and the consequences of the definition of one
operation can strongly affect the implementation of many other operations.
While the behaviors required of any individual operation may become complicated,
the programmer using such a language can work confident in the knowledge that
his code may be intermingled in complex ways with other's code without fear of
each violating the other's assumptions about eventual program behavior. When
finished, systems defined along these lines make powerful and elegant
environments.
<P>
The engineer espouses <EM>operational semantics</EM>, where what each operation
means is determined by what it <EM>does</EM>. In this style, the goal is to
determine the elements of a toolbox and to specify the behavior of each, largely
independently of any other operation in the language. The definitions of
operations stand alone and are designed specifically to minimize the interaction
between operations. The semantics of each operation are thin, but this allows
the programmer to work confident of the performance and size of the program
being generated. Here one is seldom surprised by the behavior of a line of
code, and moderately experienced programmers can accurately predict the behavior
down to enumerating the machine instructions that are likely to be executed.
<P>
Most computer languages in use today are exemplars of the engineering style,
defined with operational semantics in mind. Building primarily from the
operations that could be implemented on the computing devices that could be made
at the time, and secondarily on the specific computing needs of a user base at
hand, languages were designed to allow people to command computers to do things.
This quote from the preface to <CITE>K&R C</CITE><A NAME="DOCF4" HREF="tl-manual_foot.html#FOOT4">(4)</A>, the original book describing C,
explains this style well.
<BLOCKQUOTE>
<P>
C is a general-purpose programming language which features economy of
expression, modern control flow and data structures, and a rich set of
operators. C is not a "very high level" language, nor a "big" one, and is not
specialized to any particular area of application. But its absence of
restrictions and its generality make it more convenient and effective for many
tasks than supposedly more powerful languages.
</BLOCKQUOTE>
<P>
Typically the operations in these languages are semantically thin (or shallow, if
you prefer), to the extent that a programmer of medium experience could almost
predict the number of machine instructions required to carry out any given
command in the language. Fortran, Pascal, Basic, and C are all examples of
languages in this category, with C (and its derivative, C++) now being the
default choice for people building big systems.
<P>
Lisp and other languages (such as CLU, Smalltalk, Eiffel, and recently Java)
are exemplars of the mathematical style, emphasizing denotational
semantics. The operations in these languages were defined with semantics that
are deep (or fat if you prefer), focusing not on what an operation <EM>did</EM>
but on what it <EM>meant</EM>. Of course there had to be implementations of these
operations, but the actual machine instructions carried out were allowed (even
expected) to vary widely depending on the manner and situation in which they
were used.
<P>
These philosophical differences lead to vastly different languages, with the
consequences of each style evoking surprise, disbelief, or even indignant
disgust in ardent followers of the opposing camp. Denotational fans can't
believe how much of the underlying limitations of systems are exposed, and
operational fans can't believe how disassociated operations are from the
machine. One sees too much exposed, and the other sees too much being hidden.
<H2><A NAME="SEC9" HREF="tl-manual_toc.html#TOC9">GOTO Right Now, or GOTO When You're Ready</A></H2>
<P>
An example of how this difference in design perspective plays out can be found
in the behavior of the <CODE>goto</CODE> statement in C and the <CODE>go</CODE> special
operator in Common Lisp. Consider the following two code fragments.
<P>
In C:
<PRE>
int find_big_int(int *set, int start, int end) {
  <I>Iterate through the set</I> {
    if (<I>test</I>)
      goto bad;
    ...
  }
 bad:
  ...
}
</PRE>
<P>
In Lisp:
<PRE>
(defun find-big-int (set start end)
  (tagbody
     (<I>Iterate through the set</I>
       (if <I>test</I>
           (go bad))
       ...)
   bad
     ...))
</PRE>
<P>
In the C code above, regardless of the surrounding code and environments, a user
can predict the machine's behavior at the <CODE>goto</CODE> command and what the
likely cost is. In particular, a conditional branch instruction will be
generated based on the result of the test expression, and it will immediately
begin executing the code following the label <CODE>bad</CODE>.
<P>
In the Lisp code, though the situation appears similar, the resulting behavior
may be very different. If the code that implements the "Iterate through the
set" element above is a simple <CODE>do</CODE>, then the behavior will be very much
like what happens in C. However, if the iteration is performed using a mapping
function, and the <CODE>(go bad)</CODE> form is contained within an embedded
<CODE>lambda</CODE> form defining a different function, then the situation is very
different. The fact that <CODE>go</CODE> tags have lexical scope, even across function
boundaries, requires that the <CODE>go</CODE> be able to exit the function it is in
and re-enter the outer function at the appropriate point. In this case,
entering the <CODE>tagbody</CODE> statement would require establishing a restart point
similar to a <CODE>setjmp</CODE>, and the <CODE>go</CODE> would become something similar to
a <CODE>longjmp</CODE>. If the iteration contains an <CODE>unwind-protect</CODE>, then
cleanup code will be executed after the <CODE>go</CODE> executes, but before the code
at the <CODE>bad</CODE> statement. If the <CODE>unwind-protect</CODE> clean-up forms
contain a <CODE>go</CODE> to a further surrounding <CODE>tagbody</CODE>, then the code after the
<CODE>bad</CODE> tag will never be executed.
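<P>
The mapping-function case above can be sketched in plain C using
<CODE>setjmp</CODE> and <CODE>longjmp</CODE>. This is only an illustration of the
mechanism a translator must fall back on, not ThinLisp's actual generated code;
all names here are invented for the example.

```c
#include <setjmp.h>

/* Sketch only: a tagbody whose "bad" tag can be reached from inside a
   mapped-over lambda must save a restart point on entry; the (go bad)
   inside the inner function then becomes a longjmp. */
static jmp_buf bad_tag;

/* Stands in for the lambda passed to the mapping function. */
static void visit(int x) {
    if (x > 100)                /* the (go bad) case */
        longjmp(bad_tag, 1);    /* exit this function, re-enter the caller */
}

/* Returns 1 when the non-local exit fired, -1 on a normal fall-through. */
static int find_big_int(const int *set, int n) {
    if (setjmp(bad_tag) == 0) { /* entering the tagbody: save restart point */
        for (int i = 0; i < n; i++)
            visit(set[i]);      /* stands in for the mapping function */
        return -1;              /* normal exit of the iteration */
    }
    return 1;                   /* the code following the bad: tag */
}
```

Note that every entry to the <CODE>tagbody</CODE> pays for the <CODE>setjmp</CODE>
even when the <CODE>go</CODE> is never taken, which is exactly the hidden cost
that surprises the operational programmer.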
<P>
In this example, operational programmers are floored by how complicated a simple
<CODE>goto</CODE> statement can become. Denotational programmers are in disbelief
that the execution of clean-up code cannot be enforced.
<H2><A NAME="SEC10" HREF="tl-manual_toc.html#TOC10">ThinLisp's Compromises</A></H2>
<P>
While C++ and Java can be viewed as attempts to move C more towards a
denotational footing, ThinLisp is an attempt to move Common Lisp more towards an
operational footing. I've attempted to lessen the impact of Lisp's "gotchas"
that frustrate programmers who look at languages from an operational semantics
point of view. Attempting to allow the benefits of deep denotational semantics
while warning of performance and behavior surprises is the underlying goal for
ThinLisp.
<P>
One of the goals of ThinLisp is to implement the deep semantics of Lisp, while
offering compile time warnings whenever these semantics force behavior that
deviates from expectations consistent with operational (or "thin") semantics.
If the programmer really intends to use the full, deep semantics, then a
lexically apparent declaration can be used to suppress the warnings and accept
the more expensive behavior required by the deep semantics.
<P>
There are three main areas where ThinLisp warnings are used to help prevent
performance gotchas.
<UL>
<LI>
consing,
<LI>
run-time dispatched operations, and
<LI>
non-local exits.
</UL>
<P>
Within ThinLisp there is no garbage collector. While many fine garbage
collection systems exist, and some are truly approaching the goal of real-time
operation, they all impose significant overhead to maintain a root set of
objects that form the base of the reference tree that is walked during a garbage
collect. ThinLisp avoids both the cost of garbage collection and the cost of
maintaining the data that makes garbage collection possible.
<P>
This means that Lisp programmers must adopt the discipline of the C programmer
in knowing where memory is allocated and where it is reclaimed. Because this is
such a foreign concept to Lisp programming style, automated help is required to
protect Lisp programmers from careless consing. It is a well known maxim that
Lisp programmers looking to improve performance should avoid consing, but very
few Lisp programmers are good at it. Within ThinLisp there is a concept of
both lexical and dynamic scopes that supply consing contexts. At compile time
any consing operation must have a <CODE>consing-area</CODE> declaration lexically
surrounding its scope, or else a compile time warning is given. The
<CODE>consing-area</CODE> can either be <CODE>permanent</CODE> or <CODE>temporary</CODE>.
<A NAME="DOCF5" HREF="tl-manual_foot.html#FOOT5">(5)</A>
A <CODE>permanent</CODE> area means consing proceeds as normal, with all created
objects having indefinite extent. However, whenever consing happens within a
<CODE>temporary</CODE> scope, the objects created have dynamic extent only, lasting
until the exit point of the <CODE>temporary</CODE> <CODE>consing-area</CODE>. The
<CODE>consing-area</CODE> declaration is similar to the <CODE>dynamic-extent</CODE>
declaration, but it is pervasive instead of applying to a particular variable
binding.
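<P>
An analogy in C may make the <CODE>temporary</CODE> area concrete: it behaves much
like an arena allocator whose high-water mark is saved on entering the scope and
restored on leaving it, reclaiming every object consed inside it in one step.
This is an illustrative sketch under that assumption, not ThinLisp's actual
runtime interface.

```c
#include <stddef.h>

/* Sketch of an arena standing in for a consing area.  In a permanent
   area, allocations simply accumulate; a temporary area records a mark
   on entry and resets to it on exit, freeing everything consed since. */
typedef struct {
    char  *base;   /* start of the area's storage */
    size_t used;   /* bytes consed so far */
    size_t cap;    /* total bytes available */
} arena;

static void *arena_alloc(arena *a, size_t n) {
    if (a->used + n > a->cap)
        return NULL;            /* area exhausted */
    void *p = a->base + a->used;
    a->used += n;
    return p;
}

/* Entering a temporary consing-area: remember the high-water mark. */
static size_t arena_mark(const arena *a) { return a->used; }

/* Leaving it: everything consed since the mark is reclaimed at once. */
static void arena_release(arena *a, size_t mark) { a->used = mark; }
```

Reclaiming by resetting a mark is constant time, which is why no root set or
reference walk needs to be maintained.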
<P>
ThinLisp warns about run-time dispatched operations which could be made more
efficient with type declarations. In some cases, the run-time polymorphism of
an operation is exactly what is desired, and then there are declarations that
can be used to suppress these warnings. It is well known that type declarations
can improve the performance of some Lisp operations; however, it is very
difficult in practice to achieve these optimizations without some warnings and
assistance from the compiler. The assumption in building the ThinLisp
translator is that you are attempting to make production quality code, and
therefore want advice about code that could be hiding a performance gotcha.
<P>
Lastly, ThinLisp can issue compile-time warnings about non-local exits. The
<CODE>unwind-protect</CODE> form can guarantee that clean-up code is executed on exit
from a scope, but it is a very expensive operation, due to the <CODE>setjmp</CODE>
calls that establish a reentry point into the function and the need to declare
variables <CODE>volatile</CODE> if they are modified within that scope. There are some macro
forms where it is acceptable if an error exits a scope without a clean-up, but a
normal execution of the body form should not leave the scope with a
<CODE>return-from</CODE> form. In order to allow programmers to write macros that
have clean-up code without resorting to an <CODE>unwind-protect</CODE> form, ThinLisp
provides a <CODE>require-local-exit</CODE> declaration that will cause compile time
warnings if a scope is left prematurely. The other kind of non-local exit that
ThinLisp prevents is <CODE>go</CODE> or <CODE>return-from</CODE> forms that exit a lexical
closure and re-enter a surrounding function. These patterns of code are
tantamount to <CODE>catch</CODE> and <CODE>throw</CODE> forms, which are very expensive. By
declining to implement this form of non-local exit, we can prevent this kind of
performance pitfall.
<H1><A NAME="SEC11" HREF="tl-manual_toc.html#TOC11">Original Design Notes</A></H1>
<P>
<I>The following were the original design notes about what should be built into
GL, which became ThinLisp. -jallard 10/12/99</I>
<H2><A NAME="SEC12" HREF="tl-manual_toc.html#TOC12">Def-System</A></H2>
<P>
In order to get the flexibility that we want with this development environment,
a def-system implementation will be built into GL. This is necessary to get
the desired layering of libraries and to enable identification of call-trees,
module interdependencies, and dead-code.
<P>
The base system will be GL. It will contain only the smallest set of
primitives possible, though this is still a substantial set. Support for all
defining forms and special forms must be built here. There will be several
systems that implement portions of Common Lisp that we think may not be
necessary for all products we intend to build in Lisp. These should include
many of the utilities that we currently have in G2 to replace consing versions
of the same from Common Lisp. For example, the gensym-pathnames and
gensym-streams code should be made into a separate file-io system that may or may not
be included into products (G2 and TW need it, GSI does not).
<P>
Within the Def-system for a group of files there should be a flag indicating
whether or not this set of files is to be distributed as a library or an
executable. If it is a library, the def-system should include (or there should
be a separate form for declaring) those functions that are called by users of
the library, and the def-system should also include (or there should be a
separate form for declaring) those call-back functions provided by users of the
library that the internals of the library will call (this is the GSI call out
case). Foreign functions should be handled as they are now using the
def-gensym-c-function form.
<H2><A NAME="SEC13" HREF="tl-manual_toc.html#TOC13">Overview of Translations</A></H2>
<P>
There are four different modes in which GL (Gensym Lisp, oops, er, Language)
source code will need to be processed, two Lisp compilation modes and two
translation modes. The first is development compilation in an underlying Lisp
environment. The second is a macro definition compilation to define macros in
preparation for a translation. These are otherwise unremarkable uses of the
underlying Lisp implementation's compile-file and load functions, distinguished
only by what we do with the :development and :distribution compiler switches.
The third is a translation pre-process step in which the names and data types
of variables and functions are determined. The fourth and final is a
translation mode in which C files and H files are generated.
<P>
The translator from GL to C will be written to make as few dependencies as
possible on the implementation it is layered on top of. (As this is being
written Lucid has gone bankrupt and been acquired by Harlequin and Apple is
dumping off MCL to a third party to support.) In order to keep things portable
we will implement our own macro-expander to avoid the problem of differences in
macro-expansion environment representations. Those Lisp functions whose
argument lists are more complex than positional and optional arguments will be
implemented as macros. These macros will expand into functions with fixed and
optional arguments. We will write all of our own macro implementations of all
non-function Lisp primitives so that we can control, in a portable way, the base
set of functions and special forms that these macro-expansions will bottom out
in. All of these base functions that the translator natively
supports will be implemented in the Lisp development environment so that the
development environment execution will be as similar as possible to the final C
execution, but those facilities that are required to come directly from the
underlying Lisp during development (e.g. special forms) will be had through
conditional macro-expansion that uses the underlying Lisp. For example, a call
to gl:list with 5 arguments will always macro-expand into calls to the
gli::list-5 function. However, calls to gl:catch will expand to gli::catch
while translating and will expand into lisp:catch during development. The
special-form gl:the will expand such that it performs type checking at
development run-time, and type-declared local variables will at least have their
initial values type-checked. All reasonable effort will be made to avoid
redundant run-time checks in development.
<P>
A Lisp function fragment illustrating the outermost loop of a translation is as
follows. The actual implementation will need further complications, such as
recompile arguments.
<PRE>
(defun translate-system (system)
  (loop for used-system in (system-used-systems system) do
    (load-system used-system :binaries t :gl-preprocess-info t
                 :gl-trans-info t))
  (loop for file in (system-files system) do
    (when (binary-out-of-date-p file)
      (compile-file file))
    (unless (binary-loaded-p file)
      (load-binary file))
    (when (gl-preprocessed-date-out-of-date-p file)
      (gl-preprocess file))
    (unless (gl-preprocessed-data-loaded-p file)
      (load-gl-preprocessed-data file)))
  (loop for file in (system-files system) do
    (when (or (gl-trans-data-out-of-date-p file)
              (gl-c-file-out-of-date-p file)
              (gl-h-file-out-of-date-p file)
              (progn (load-gl-trans-data file)
                     (gl-trans-data-indicates-retranslate-p file)))
      (gl-translate-file file)))
  (gl-dump-makefile-information system))
</PRE>
<H2><A NAME="SEC14" HREF="tl-manual_toc.html#TOC14">Compilation Passes</A></H2>
<P>
The development compilation pass can be detected by the :development switch in
*features*. The macro compilation pass has no significant *features*
included. The pre-process translation pass and the translation pass can be
detected with the :translator and :no-macro switches in *features*. Note that
the pre-process pass may be interleaved with the macro pass (i.e. we may macro
compile then pre-process each Lisp file in turn), but these features can still
be reliably used to detect the difference. Also note that we will likely want
to perform development translations which will produce images with full runtime
type checking, which could be detected by having both :translator and
:development available at the same time. This may be difficult, since the
:development switch has often been confounded with dependencies on Lisp
development environment (i.e. :lucid and formerly :lispm) switches. An
alternative but inferior approach would be to use a different switch to control
translator type-safety checks.
<H2><A NAME="SEC15" HREF="tl-manual_toc.html#TOC15">Pre-Process Compilation Pass</A></H2>
<P>
The pre-process pass is used to gather information about the following:
function name, argument types, and return types needed to translate calls to
this function; and variable data types needed to compile fetch and set
references. This information must be gathered before translation begins on any
files, since Lisp generally allows forward references to as-yet undefined
functions and variables. This pass over each file must occur after that file
has been compiled (if necessary) and loaded during the macro compilation pass.
(Consider collecting this information during the macro-expansion of gl:defun,
gl:defvar, etc. macros.) During this pass all top-level forms in a file are
read and macro expansion proceeds on each form only far enough so that top
level defining forms can be examined and their types determined. Normally this
would only be far enough to examine declare statements on functions and global
declaim statements. In practice, we may also attempt to perform limited
macro-expansion on the bodies of functions that do not contain argument and
return value type specifications in order to determine stricter typing
information, but I'm expecting that this may be too time-consuming to do
without some limitations, for example stopping after 100 calls to
gl-macroexpand-1.
<P>
The gathered information will be stored in properties of symbols and will also
be written out into a .glp file (GL pre-process, a Lisp syntax file suitable
for processing with load, written into the macro directory next to the binary)
for each .lisp file pre-processed. This enables us to reload the .glp file
during subsequent pre-processing in which the source Lisp file has not changed.
<H2><A NAME="SEC16" HREF="tl-manual_toc.html#TOC16">Translation Compilation Pass</A></H2>
<P>
The translate pass is finally used to read source Lisp files and translate
their contents into C and H source files. Besides the translated functions
and variables, each C file will also contain an init function, which will
allocate code constants needed by the translated functions and perform the
operations called for by the top-level forms of the translated file. Some
constants will be globally shared, such as compiled-function objects and
symbols. At the end of each translation, a .glt file (GL translation data)
will also be written out containing information about which globally optimized
constants are used, which of those are initialized within a translated file,
and which functions and variables are referenced with what type signatures.
This is needed for later incremental translations. During these later
translations it may be necessary to retranslate due to changes in prior
initialization of globally shared constants, such as when a symbol is no longer
initialized in a prior file, and so must be initialized in this file.
Similarly, when translating calls to functions, information about the type
signature of the called function is used to translate the call site. If the
type signature of a function changes, that will force a retranslate of files
containing calls. Lastly, we have always been concerned about changes to Lisp
macros being propagated to all callers. If during translations we keep track
of which macros are called and keep a checksum of the compilation of its
macro-function, we may be able to compare the checksums of later translations
and automatically determine when retranslates must be performed.
<H2><A NAME="SEC17" HREF="tl-manual_toc.html#TOC17">Packages</A></H2>
<P>
Packages should be designed as follows. The AB package should now use the GL
package to get all of its underlying Lisp implementation symbols. The GL
package will not use any other package and it will export all GL symbols that
we use. The translator functions will be available at translation time through
the GLT package, and the translator itself will be written within the GLI (GL
Internals) package. During development, some GL symbols will have been
imported from the Lisp package, but at runtime there will be no Lisp package,
only GL.
<P>
During runtime, the AB and GL packages will be available, but the GLT and GLI
packages will be absent. Note that it is currently my intent to have fully
reclaimable package and symbol structures so that new packages using the AB
package can be created at runtime and symbols removed from packages using the
unintern function will in fact be reclaimed as they are uninterned.
<P>
There is support in the symbol data structures for importing, but that will not
be used at this time.
<H2><A NAME="SEC18" HREF="tl-manual_toc.html#TOC18">Stacks</A></H2>
<P>
A global Obj_stack and Obj_stack_top will be maintained for holding shadowed
values of global variables and setjmp environments for catch and unwind_protect
scopes. The stack needs to be structured so that it can be searched from the
top down towards the bottom finding catch tags for throws, unwind_protect
scopes to execute, values to reinstate back into global variables, and
multiple_value sets to reinstate back into the multiple_values_buffer. Each
function that pushes values onto the stack must reset the Obj_stack_top back to
its original value before exiting. Catch and unwind_protect locations must
cache the Obj_stack_top value available on entry to their scope and be prepared
to restore it. When throws are performed, stack unwinding will determine
appropriate catch locations for throws, restore global variable values whose
binding scopes are exited, and execute unwind-protect cleanup forms.
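<P>
The top-down search described above can be sketched as a tagged stack. This is a simplified illustration only; the entry layout and the names StackEntry and find_catcher are assumptions, not the actual Obj_stack representation.

```c
#include <stddef.h>

/* Sketch of a tagged control stack like the Obj_stack.  Each entry is
   marked with its kind so the unwinder can walk from the top downward
   looking for catch tags, unwind-protect scopes, and shadowed bindings. */
typedef enum { ENTRY_CATCH, ENTRY_UNWIND_PROTECT, ENTRY_GLOBAL_BINDING } EntryKind;

typedef struct {
    EntryKind kind;
    long      tag;   /* catch tag; unused for the other entry kinds */
} StackEntry;

#define STACK_MAX 64
static StackEntry obj_stack[STACK_MAX];
static int obj_stack_top = 0;   /* index one past the topmost entry */

static void push_entry(EntryKind kind, long tag) {
    obj_stack[obj_stack_top].kind = kind;
    obj_stack[obj_stack_top].tag = tag;
    obj_stack_top++;
}

/* Search downward from the top for a catch entry with the given tag,
   returning its index, or -1 if no such catcher is established. */
static int find_catcher(long tag) {
    for (int i = obj_stack_top - 1; i >= 0; i--)
        if (obj_stack[i].kind == ENTRY_CATCH && obj_stack[i].tag == tag)
            return i;
    return -1;
}
```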
<H2><A NAME="SEC19" HREF="tl-manual_toc.html#TOC19">Global Variable Binding</A></H2>
<P>
Bindings of global variables will be handled as follows. The current value of
the global will be stored in a local lexical variable of the binding function
and also will be pushed onto the Obj_stack. Next a pointer to the value cell
of the bound variable is pushed onto the stack, and finally the keyword
:global-binding is pushed. The global variable is then set to its new bound
value. If a normal lexical exit is made from the scope of the binding, then
the global variable's old value is restored from the location in the local
lexical variable of the binding function and the three values are popped off
the stack by decrementing Obj_stack_top by 3 words. If a non-local exit is
made from this scope, then unwind_obj_stack will recognize the keyword
:global-binding and restore the value into the symbol-value cell pointed to on
the stack.
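<P>
As a rough illustration of this three-word protocol (the marker constant and function names are hypothetical stand-ins for the translator's output):

```c
/* Sketch of the global-binding protocol: the shadowed value, a pointer
   to the value cell, and a marker word are pushed onto an object stack;
   a non-local unwinder could then restore the cell from these words. */
typedef void *Obj;
#define GLOBAL_BINDING_MARKER ((Obj)0x1)   /* stands in for :global-binding */

#define STACK_WORDS 32
static Obj obj_stack[STACK_WORDS];
static int obj_stack_top = 0;

static void bind_global(Obj *cell, Obj new_value) {
    obj_stack[obj_stack_top++] = *cell;       /* shadowed old value */
    obj_stack[obj_stack_top++] = (Obj)cell;   /* where it lives */
    obj_stack[obj_stack_top++] = GLOBAL_BINDING_MARKER;
    *cell = new_value;
}

/* Normal lexical exit: restore the old value and pop the 3 words. */
static void unbind_global(void) {
    Obj *cell = (Obj *)obj_stack[obj_stack_top - 2];
    *cell = obj_stack[obj_stack_top - 3];
    obj_stack_top -= 3;
}
```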
<H2><A NAME="SEC20" HREF="tl-manual_toc.html#TOC20">Unwind-Protect Scopes</A></H2>
<P>
On entry to an unwind protect scope, a call to setjmp is made with a jmp_buf
environment in order to save this location within the C calling stack. This
jmp_buf (which is a pointer, see "C, A Reference Manual, 3rd Edition" by
Harbison and Steele, p. 352) is pushed onto the Obj_stack, then the keyword
:unwind-protect. Any unwind scopes will then perform a longjmp to this jmp_buf
as the stack is unwound. After performing the cleanup forms, unwinding should
then continue in this case by performing another longjmp to the next unwind
location on the stack, or to the target catch tag if no further cleanups are
needed. On a normal exit from this scope, Obj_stack_top may merely be decremented,
abandoning this cleanup on the stack. Note that all local lexical variables in
scope around the unwind-protect must be declared volatile in the translation.
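<P>
A minimal sketch of this setjmp-based cleanup discipline, assuming a single protected scope (a real translation would push the jmp_buf onto the Obj_stack and chain longjmps to further unwind locations):

```c
#include <setjmp.h>

/* Sketch of an unwind-protect: the protected body runs under a setjmp
   re-entry point; on a non-local exit the unwinder longjmps back here
   so the cleanup code runs before unwinding continues. */
static jmp_buf unwind_point;
static int cleanup_runs = 0;

static void do_throw(void) {
    longjmp(unwind_point, 1);   /* simulates a throw past this scope */
}

static int body_with_cleanup(int should_throw) {
    volatile int result = 0;    /* locals live across setjmp are volatile */
    if (setjmp(unwind_point) == 0) {
        if (should_throw)
            do_throw();         /* abnormal exit: control re-enters above */
        result = 42;            /* normal completion of the body */
    }
    cleanup_runs++;             /* cleanup executes on either exit path */
    return result;
}
```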
<H2><A NAME="SEC21" HREF="tl-manual_toc.html#TOC21">Catch and Throw</A></H2>
<P>
On entry to a catch form, a setjmp is performed with a jmp_buf environment.
This jmp_buf, the catch value, and the keyword :catch are then pushed onto the
stack. If a throw is performed, then a longjmp will be executed with the
appropriate values held in the Multiple_values_buffer. The receiver of values
from the catch form then may or may not unpack values from that buffer. Note
that all local lexical variables in scope on entry to the catch form should be
declared volatile in the translation.
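<P>
The catch/throw transfer can be sketched in the same style; here a single global jmp_buf and value slot stand in for the stack-allocated catch entries, an assumption made for brevity:

```c
#include <setjmp.h>

/* Sketch of catch/throw: the thrown value travels through a global slot
   (standing in for the Multiple_values_buffer), and longjmp transfers
   control to the catcher's setjmp re-entry point. */
typedef void *Obj;
static jmp_buf catch_env;
static Obj thrown_value;

static void throw_value(Obj value) {
    thrown_value = value;
    longjmp(catch_env, 1);      /* unwind to the matching catch */
}

static Obj catch_body(int should_throw) {
    if (setjmp(catch_env) != 0)
        return thrown_value;    /* arrived here via a throw */
    if (should_throw)
        throw_value((Obj)7);
    return (Obj)3;              /* normal completion value */
}
```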
<H2><A NAME="SEC22" HREF="tl-manual_toc.html#TOC22">Multiple Values</A></H2>
<P>
Within locations where the source and receiver of values are mutually visible,
all single-valued expressions will be handled as C expressions, and all
multiple-valued expressions will be handled with a series of setqs into
gensymed variables. However, in translations of locations where the source of
values is a function of unknown type signature, or where multiple values are
returned to an unknown receiving location, a general protocol of multiple
values will be used. In these cases, multiple values will be handled with a
global Multiple_values_buffer and Multiple_values_count. The first Lisp value
will always be returned as a normal C expression or function value, with any
secondary values being placed into the Multiple_values_buffer. If no values
are returned, then the values count will be set to 0 and NIL is returned, which
is the default value when receiving more values than are supplied. Sites
receiving only the first value may use the value of the C expression. Sites
receiving more than one value must check the C values count, retrieving values from
the Multiple_values_buffer, or NIL if more values are desired than are
supplied. In multiple-value-prog1, catch, or unwind-protect cases, then a
protocol will be supplied which enables all values being returned to be cached
onto the global Obj stack. After all cleanup forms in the
multiple-value-prog1, catch, or unwind-protect have been executed, then the
cached values should be popped off the stack and back into the
Multiple_values_buffer.
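<P>
The first-value-in-C, rest-in-buffer protocol might be sketched as follows. The function names are illustrative; the actual translator would emit equivalent code inline at call and receive sites.

```c
/* Sketch of the multiple-values protocol: the first value is the normal
   C return value; extra values go through a global buffer with a count,
   and absent values default to NIL at the receiver. */
typedef void *Obj;
#define NIL ((Obj)0)
#define MV_MAX 16

static Obj Multiple_values_buffer[MV_MAX];
static int Multiple_values_count = 0;

/* A producer returning three values: returns the first, buffers the rest. */
static Obj three_values(void) {
    Multiple_values_count = 3;
    Multiple_values_buffer[0] = (Obj)2;  /* second value */
    Multiple_values_buffer[1] = (Obj)3;  /* third value */
    return (Obj)1;                       /* first value, a plain C return */
}

/* A receiver asking for value n (0-based): NIL when too few are supplied. */
static Obj nth_value_received(Obj first, int n) {
    if (n == 0) return first;
    if (n < Multiple_values_count) return Multiple_values_buffer[n - 1];
    return NIL;
}
```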
<H2><A NAME="SEC23" HREF="tl-manual_toc.html#TOC23">Constants</A></H2>
<P>
Constant values of type char, sint32, fixnum, and double will be translated
into inline C constants. The sint32 and fixnum constants should be able to be
emitted without a trailing L (as opposed to current practice) since we will
not be using the AlphaOSF long type, and so we expect to not be subject to the
preprocessor type conversion bug that led us to adopt the L integer suffix
rule.
<P>
Constants of type string, symbol, and compiled-function will be implemented
using initialized C struct variables, and these will be constant folded across
all translated C files. Translated Lisp references to these constants will
actually be references to the address of these C variables. For string
constants, typedefs will be emitted into the C files for string structures
containing differing lengths of string. Where the initial value of a string
constant is longer than 80 characters, the initial characters of the string
will be initialized using the array variable initialization format instead of
the normal string constant approach. This eliminates the long lines that break
some compilers (e.g. VAX C). Variables for strings should be declared as
"const". Symbol and compiled-function variables cannot be declared const,
since they must be modified during initialization to intern symbols, install
compiled-functions in symbols, and install pointers to C functions in
compiled-functions.
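<P>
A sketch of how such an initialized struct variable might look for a short string constant; the typedef name and layout here are assumptions based on the description above, not the translator's actual output.

```c
#include <string.h>

/* Sketch of a translated string constant: a per-length struct typedef
   whose instance is emitted as an initialized C variable, so Lisp
   references to the constant become the address of this variable. */
typedef struct {
    unsigned length;
    char     body[6];          /* room for "hello" plus the NUL */
} Str_6;

/* Short strings can use the plain string-literal initializer and may be
   declared const.  A string longer than 80 characters would instead use
   the array-element form, e.g. { 'h', 'e', ... }, to keep lines short. */
static const Str_6 str_const_hello = { 5, "hello" };
```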
<P>
The number of external symbols needed within individual C files is a concern,
since we have seen this be a link time limitation on some systems. If this is
the case, then we can still achieve global constant folding within a translated
system by using an approach where the pointers to constants are installed into
a single large constant vector, with indices into this vector used to fetch the