<HTML>
<HEAD>
<!-- Created by texi2html 1.56k from tl-manual.texinfo on 12 October 2001 -->
<TITLE>ThinLisp Rant and Notes</TITLE>
</HEAD>
<BODY>
<H1>ThinLisp Manual</H1>
<H2>Version 1.0.1</H2>
<H2>12 October 2001</H2>
<ADDRESS>The ThinLisp Group:</ADDRESS>
<ADDRESS>Jim Allard</ADDRESS>
<ADDRESS>Ben Hyde</ADDRESS>
<P>
<P><HR><P>
<H1><A NAME="SEC1" HREF="tl-manual_toc.html#TOC1">Acknowledgements</A></H1>
<P>
ThinLisp was begun in 1995 as a late night project born of the frustration of
attempting to ship commercial quality software products written in Lisp. Four
years later I'm still obsessed by the issues that drove me to it in the first
place -- practical use of high level languages to implement complex
applications while achieving the performance that seems easy when using
languages that are semantically "close to the silicon". Hopefully this
implementation will lead to some useful insights towards that goal.
<P>
Thanks to Ben Hyde for convincing Gensym and myself that this system should be
taken off the shelf, polished up, and released as open source software. Mike
Colena and Glen Iba wrote significant parts of the implementation. Nick Caruso,
Joe Devlin, Rick Harris, and John Hodgkinson also contributed their time,
sage opinions, and code. Thanks to Lowell Hawkinson, Jim Pepe, and Dave Riddell
for their support of this work at Gensym Corporation. Bill Brody, Kim Barrett,
and Rick Harris wrote much of the Chestnut Lisp to C translator that was the
philosophical precursor to this work. Thanks to Kim Barrett and David Sotkowitz
of IS Robotics, Inc. (now iRobot Corporation) for contributions of
multiple-inheritance code and opinions, and to Rod Brooks and Dave Barrett for
their ideas. Finally, thanks to my wife, Gerry Zipser, for putting up with the
bleary-eyed aftermath of coding sessions, and for helping to focus efforts
during this past summer's release push.
<P>
<BR>Jim Allard<BR>Newton, Massachusetts<BR>October 6, 1999 and again May 9, 2001
<H1><A NAME="SEC2" HREF="tl-manual_toc.html#TOC2">Preface</A></H1>
<P>
ThinLisp is an open source Lisp to C translator for delivering commercial
quality, Lisp-based applications. It implements a subset of Common Lisp with
extensions. ThinLisp itself is written in Common Lisp, and so must run on top
of an underlying Common Lisp implementation such as Allegro, MCL, or CMU Lisp.
The C code resulting from a translation can then be independently compiled to
produce a small, efficient executable image.
<P>
Originally designed for real-time control applications, ThinLisp is not a
traditional Lisp environment. It purposefully does not contain a garbage
collector. ThinLisp produces compile time warnings for uses of inherently slow
Lisp operations, for consing operations, and for code that cannot be optimized
due to a lack of sufficient type declarations. These warnings can be suppressed
by improving the code, or through use of lexical declarations acknowledging that
the code is only of prototype quality. Code meeting the stringent requirements
imposed by ThinLisp cannot be sped up by rewriting it in C.
<P>
The ThinLisp home is <A HREF="http://www.thinlisp.org/">http://www.thinlisp.org/</A>. The newest source
distributions of TL and this manual can be found there. Bugs should be
reported there as well.
<H1><A NAME="SEC3" HREF="tl-manual_toc.html#TOC3">Rant</A></H1>
<P>
<A NAME="IDX1"></A>
<H2><A NAME="SEC4" HREF="tl-manual_toc.html#TOC4">My Excuse for Ranting</A></H2>
<P>
I've imagined the reactions people will have when they see that there is yet
another Lisp to C translator implementation being released. Mostly I've
expected that people would think I'm stark raving mad to engage in a large work,
to serve an almost non-existent user base of programmers in a language that is
largely discredited while being actively antagonistic towards many of the design
principles this language has employed for decades. Since I'm expecting that
people will think I'm mad (and half believing it myself), I then feel entitled
(even obligated) to engage in a rant to explain the motivations of such a
curious endeavor. Please excuse this indulgence and look charitably upon it as
a form of self-therapy.
<H2><A NAME="SEC5" HREF="tl-manual_toc.html#TOC5">Lisp Background</A></H2>
<P>
More than fifteen years ago I was in a recitation session of an Introduction to
Artificial Intelligence course. The lecturer, Patrick Winston, had come in for a
session on automated planning. At one point I asked a question about how a
certain situation might be handled in the system we were studying, and he
answered that it couldn't be handled in that system and that solving that
problem was a research topic. The fact that problems as simple (so I thought at
the time) as what we were describing were research, and that as a sophomore in
college I could grasp them seemed amazing to me then, as it still does now. On
the spot I could imagine six different ways to attack that problem, and none of
them felt any less plausible than some of the established techniques that we
were studying. I was hooked.
<P>
At the time, this sort of optimism and arrogance about the immediacy and
inevitability of technical success in AI was rampant. The convergence of
breakthroughs in techniques, tools, and venture capital money that flowed like
wine was intoxicating for everyone.
<P>
No one thought this work would be trivial, so the best software development
tools available would be required. And of course the best tools from the
perspective of the AI community included the language that this community had
been developing since the late 1950's, Lisp.
<P>
Originally, Lisp was developed as an experiment in providing a direct means of
implementing lambda calculus, where operations were first class objects that
could be manipulated in the same manner as data. The presumption behind many of
the Lisp design choices was that the most important goal was to maximize a
programmer's productivity in developing new functionality by enabling the purest
form possible of coded solutions, and so to hide as many of the platform
specific details as possible. In the ongoing evolution of Lisp, considerations
of the efficiency of processing speed and memory use came in a distant second to
programmer convenience and to the attempt to protect the programmer from bugs
caused by mathematically abhorrent behaviors such as integer wraparound.
<P>
Perhaps because of this devotion to developer productivity; perhaps because of
the synergy between language syntax, parsing, and data; perhaps because of the
invention of the Lisp macro facility; or perhaps because it was invented down
the hall, Lisp became the language of choice for research and experimentation in
new programming constructs (at least at MIT). Many of the different possible
approaches to object oriented programming were first tried out in Lisp.
Examples are multiple inheritance, multi-methods, prototype-based object systems,
and automatic method combination via whoppers, wrappers, before, and after
methods. Other innovations experimented with in Lisp are error handling
systems, multi-platform file naming approaches, and guaranteed execution of
clean-up code (i.e. unwind-protect).
<P>
In 1984 a group formed to attempt to unify as many of the best features of
different Lisp dialects as possible into one central standard. From this effort
came the excellent books <CITE>Common Lisp, The Language</CITE> by Guy Steele and its
second edition, which incorporated many of the changes that would eventually be
codified into ANSI Common Lisp. This is the dialect that dominates Lisp work
today, at least in the U.S.
<H2><A NAME="SEC6" HREF="tl-manual_toc.html#TOC6">Lisp in Action</A></H2>
<P>
In any case, throughout the 1980's the bulk of the research labs and companies
that attempted to commercialize AI technologies used Common Lisp to implement
their products. The most broadly deployed AI-based technique was expert
systems, also called rule-based systems. <A NAME="DOCF1" HREF="tl-manual_foot.html#FOOT1">(1)</A> Early results were encouraging. In the lab, sometimes on specialized
hardware, the software engineers were able to demonstrate the beginnings of the
functionality that they were promising.
<P>
However, when it came to deploying these systems, things didn't turn out so
smoothly. Most software products of that era were coded in languages such as
Fortran, Cobol, Pascal, C, and assembly language. It turned out that software
developers who could breathe fire in Lisp were clumsy, inefficient, or flatly
unwilling to port their systems to these traditional languages.
<P>
Customers were loath to buy expensive, special purpose hardware for these
applications while cheap, ever more powerful PC computers were rolling out.
During this period the workstation class computer was also coming on, with
initial excellent products coming from Sun and Apollo. Stories about the
difficulties in rolling out applications are rife throughout the industry, but
I'll speak to just the cases I personally was involved in. While the presence
of Lisp as the basis of these software systems cannot be given full credit or
blame for their success or failure, in all cases it was a controversial and
high-visibility part
of the system.
<P>
The Mentor system for preventative maintenance was a success story. A group at
Honeywell made a forward-chaining, rule-based system that helped novice
technicians perform preventative maintenance on centrifugal chillers,
essentially large air-conditioning units found in office buildings. Honeywell
had a significant business in maintenance services for traditional
compressor-based chillers, but it lacked trained personnel to go after this new
and lucrative service market. A corporate research lab made a Lisp-based
implementation of a run-time environment for Mentor that ran on a ruggedized
Grid laptop computer. While the system was useful, it ran sluggishly and the
Lisp-based implementation left little room on the laptop for the system to grow
and improve. A technology transfer project was begun which resulted in a
software engineer within the service division rewriting the application in C,
greatly improving its speed and reducing its size, allowing the system to be
expanded into a commercially viable tool. As far as I know, this system is
still in use.
<P>
In this system, Lisp served a very useful purpose as a prototyping language, but
a critical element in the success of the project was to abandon the original
implementation and rewrite it in C. This was necessary for the commercial
division that deployed this product, first to gain acceptance, and second to
enable that division to maintain the system with the personnel they had working
there. The rewrite also sped up the application and greatly reduced its
run-time size.
<P>
The Cooker system was a real-time supervisory level monitoring system also built
at Honeywell which was intended to be used in process control plants. It was
implemented on a Symbolics 3640 which was deployed to a paper mill in
Pennsylvania. Though care had been taken to tailor memory use such that a
generational garbage collector would efficiently reclaim the vast majority of
discarded memory, every 6 weeks or so a hemisphere flip would occur (i.e. a big
garbage collection), during which the system could not maintain adequate
real-time performance for several hours. After the third time that this
happened, the plant crew turned off the system and never restarted it again.
<P>
In this case, the system was gaining acceptance by the pilot users, but as a
direct consequence of the performance problems caused by garbage collection this
pilot project was closed out, and this particular deployment approach was
dropped. A different
<P>
The G2 system is a real-time supervisory level control system built by Gensym
Corporation. Its source code was written in Lisp, and for approximately 3
years was released as a world-saved image made from Lucid Lisp environments on
UNIX workstation class machines. In 1990, Gensym ported its system to Ibuki
Common Lisp, which was a Lisp-to-C translator descended from Kyoto Common Lisp.
By translating to C and discarding those parts of Common Lisp that we didn't
use, we were able to decrease the size of our shipped product by 25%, a fact
that was greatly appreciated by a customer base tired of constantly having to
buy more memory for computers. In 1993 Gensym changed again to a different Lisp
to C translator made by Chestnut Software. This produced an additional 40%
decrease in size, and a significant improvement in performance. The ThinLisp
translator is a philosophical descendent of the Chestnut translator, though
ThinLisp was written from scratch. One of the factors cited by the head of
sales of G2 was that once we used a Lisp to C translator, he could say that the
product was "deployed in C", thereby side-stepping the Lisp issue, which had
been a significant downside to our system in the eyes of many potential
customers.
<P>
Here Lisp was a competitive advantage for development, but became a sales
liability due to the poor reputation of Lisp-based applications in the
marketplace. Once the organization switched to a Lisp-to-C translator, then
development could continue to see the advantages of Lisp coding, but marketing
could bury that detail, avoid a defensive technical argument that was proving a
barrier to closing sales, and continue to focus on the product's positive
benefits.
<P>
The Bore Rat is an autonomous robotic device that travels through the piping
installed in oil wells. Built by IS Robotics, it is capable of traveling 2
miles straight down into a well, performing maintenance and data logging tasks,
and then returning to the surface, all under its own power and control. The software
for this system was written in L, a Lisp subset made for embedded processors, in
this case a Motorola 68376. The first prototypes of the Bore Rat became
operational in 1999 and are currently being marketed to the oil field services
industry by IIC. In this case Lisp was not a significant factor, either in a
positive or a negative sense.
<P>
During this time, Lisp has waned in its influence. Java has replaced Lisp in
many places, the number of Lisp vendors has dropped, and Lisp is being taught in
fewer college courses. The future of traditional Common Lisp is in doubt,
though it is still strong in a few niches, and deep within a few specific
applications, such as Yahoo Store.
<P>
As a result of these experiences, and as an outcropping of my experience using
Lisp to implement real-time control systems, I've wanted to have a Lisp
environment that provided the features that I liked, such as macros for control
structure abstractions, that had outstanding performance characteristics given
proper type declarations, and which remained small and lean. ThinLisp is an
embodiment of that environment.
<H2><A NAME="SEC7" HREF="tl-manual_toc.html#TOC7">Scientists, Mathematicians, and Engineers (oh my!)</A></H2>
<P>
I have been fascinated (and occasionally amused) by the nearly religious
arguments about the merits of various computer languages. Beyond the pure
merits of the systems being discussed there seemed to be lurking a difference in
philosophy about how to go about programming that colored these discussions.
This section is an attempt to expose a theory about that underpinning. In the
end, this discussion also helps me explain why I've made the design choices that
I have in ThinLisp.
<P>
From the beginning Lisp was a mathematician's language. To understand what I
mean by this, I must explain one view of the differences between the goals of
scientists, mathematicians, and engineers. <A NAME="DOCF2" HREF="tl-manual_foot.html#FOOT2">(2)</A>
<P>
The goal of scientists is to learn truths about the world around them, using
observation, analysis, hypothesis, and experimentation to advance and buttress
their claims of discovery. Pure science aspires to be unconcerned with the
utility or applicability of the information they seek. The ultimate goal is to
discover and understand what <EM>is</EM>.
<P>
Mathematicians aspire to an even higher goal -- understanding what truths there
are or that there might be, regardless of or even entirely divorced from the
limits of the physical world. The studies of mathematicians are the stuff of
pure thought and imagination, yet bound by the strictest rules of proof. To
build robust castles in the air based on proofs combining into a single
monolithic whole requires that all pieces must act perfectly in concert.
Therefore the mathematician must constantly be vigilant against contradictions
between the different portions of his whole proof, where one technique carries
with it constraints which must be respected within all other parts of the proof.
With this rigorous technique in hand, the ultimate goal is to discover what is
<EM>true</EM>.
<P>
On the other hand the goal of the engineer, in the abstract, is to bend the
behavior of an uncooperative world to his will. Whether it is the stereotypical
engineer operating a train, or a civil, acoustic, electrical, mechanical,
software, or other specialty of engineer, their work is to use what is known
from science and mathematics, exploiting practicality and expediency, to operate
or design a physical entity so that it performs a specific task, usually with
the motivation of the almighty dollar. Within the constraints of the physical
world, the ultimate goal is to accomplish what is <EM>practical</EM>.
<P>
Within the computer science community, examples of personalities suited to each
of these different disciplines abound. Within the small community of
computer language designers, the mathematician and engineer distinguish
themselves through two very different approaches to language design.
<A NAME="DOCF3" HREF="tl-manual_foot.html#FOOT3">(3)</A>
<H2><A NAME="SEC8" HREF="tl-manual_toc.html#TOC8">Denotational and Operational Semantics</A></H2>
<P>
The mathematician espouses <EM>denotational semantics</EM>, where the behavior of
an operation is determined by what the operation <EM>means</EM>. In this style,
the goal is to make a complete unified system where each element of the language
must be intricately coordinated with each other part, so that each element can
retain its meaning when used in conjunction with any other part. The semantics
of each operation are primary, and the consequences of the definition of one
operation can strongly affect the implementation of many other operations.
While the behaviors required of any individual operation may become complicated,
the programmer using such a language can work confident in the knowledge that
his code may be intermingled in complex ways with other's code without fear of
each violating the other's assumptions about eventual program behavior. When
finished, systems defined along these lines make powerful and elegant
environments.
<P>
The engineer espouses <EM>operational semantics</EM>, where what each operation
means is determined by what it <EM>does</EM>. In this style, the goal is to
determine the elements of a toolbox and to specify the behavior of each, largely
independently of any other operation in the language. The definitions of
operations stand alone and are designed specifically to minimize the interaction
between operations. The semantics of each operation are thin, but this allows
the programmer to work confident of the performance and size of the program
being generated. Here one is seldom surprised by the behavior of a line of
code, and moderately experienced programmers can accurately predict the behavior
down to enumerating the machine instructions that are likely to be executed.
<P>
Most computer languages in use today are exemplars of the engineering style,
defined with operational semantics in mind. Building primarily from the
operations that could be implemented on the computing devices that could be made
at the time, and secondarily on the specific computing needs of a user base at
hand, languages were designed to allow people to command computers to do things.
This quote from the preface to <CITE>K&R C</CITE><A NAME="DOCF4" HREF="tl-manual_foot.html#FOOT4">(4)</A>, the original book describing C,
explains this style well.
<BLOCKQUOTE>
<P>
C is a general-purpose programming language which features economy of
expression, modern control flow and data structures, and a rich set of
operators. C is not a "very high level" language, nor a "big" one, and is not
specialized to any particular area of application. But its absence of
restrictions and its generality make it more convenient and effective for many
tasks than supposedly more powerful languages.
</BLOCKQUOTE>
<P>
Typically the operations in these languages are semantically thin (or shallow, if
you prefer), to the extent that a programmer of medium experience could almost
predict the number of machine instructions required to carry out any given
command in the language. Fortran, Pascal, Basic, and C are all examples of
languages in this category, with C (and its derivative, C++) now being the
default choice for people building big systems.
<P>
Lisp and other languages (such as CLU, Smalltalk, Eiffel, and recently Java)
are exemplars of the mathematical style, emphasizing denotational
semantics. The operations in these languages were defined with semantics that
are deep (or fat if you prefer), focusing not on what an operation <EM>did</EM>
but on what it <EM>meant</EM>. Of course there had to be implementations of these
operations, but the actual machine instructions carried out were allowed (even
expected) to vary widely depending on the manner and situation in which they
were used.
<P>
These philosophical differences lead to vastly different languages, with the
consequences of each style evoking surprise, disbelief, or even indignant
disgust in ardent followers of the opposing camp. Denotational fans can't
believe how much of the underlying limitations of systems are exposed, and
operational fans can't believe how disassociated operations are from the
machine. One sees too much exposed, and the other sees too much being hidden.
<H2><A NAME="SEC9" HREF="tl-manual_toc.html#TOC9">GOTO Right Now, or GOTO When You're Ready</A></H2>
<P>
An example of how this difference in design perspective plays out can be found
in the behavior of the <CODE>goto</CODE> statement in C and the <CODE>go</CODE> special
operator in Common Lisp. Consider the following two code fragments.
<P>
In C:
<PRE>
int find_big_int(int *set, int start, int end) {
  <I>Iterate through the set</I> {
    if (<I>test</I>)
      goto bad;
    ...
  }
 bad:
  ...
}
</PRE>
<P>
In Lisp:
<PRE>
(defun find-big-int (set start end)
  (tagbody
     (<I>Iterate through the set</I>
       (if <I>test</I>
           (go bad))
       ...)
   bad
     ...))
</PRE>
<P>
In the C code above, regardless of the surrounding code and environments, a user
can predict the machine's behavior at the <CODE>goto</CODE> command and what the
likely cost is. In particular, a conditional branch instruction will be
generated based on the result of the test expression, and it will immediately
begin executing the code following the label <CODE>bad</CODE>.
<P>
In the Lisp code, though the situation appears similar, the resulting behavior
may be very different. If the code that implements the "Iterate through the
set" element above is a simple <CODE>do</CODE>, then the behavior will be very much
like what happens in C. However, if the iteration is performed using a mapping
function, and the <CODE>(go bad)</CODE> form is contained within an embedded
<CODE>lambda</CODE> form defining a different function, then the situation is very
different. The fact that <CODE>go</CODE> tags have lexical scope, even across function
boundaries, requires that the <CODE>go</CODE> be able to exit the function it is in
and re-enter the outer function at the appropriate point. In this case,
entering the <CODE>tagbody</CODE> statement would require establishing a restart point
similar to a <CODE>setjmp</CODE>, and the <CODE>go</CODE> would become something similar to
a <CODE>longjmp</CODE>. If the iteration contains an <CODE>unwind-protect</CODE>, then
cleanup code will be executed after the <CODE>go</CODE> executes, but before the code
at the <CODE>bad</CODE> statement. If the <CODE>unwind-protect</CODE> clean-up forms
contain a <CODE>go</CODE> to a further surrounding <CODE>tagbody</CODE>, then the code after the
<CODE>bad</CODE> tag will never be executed.
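<P>
The mapping-function case above can be sketched in plain C using
<CODE>setjmp</CODE> and <CODE>longjmp</CODE>. This is only an illustration of the
mechanism a translator must fall back on, not ThinLisp's actual generated code;
all names here are invented for the example.

```c
#include <setjmp.h>

/* Sketch only: a tagbody whose "bad" tag can be reached from inside a
   mapped-over lambda must save a restart point on entry; the (go bad)
   inside the inner function then becomes a longjmp. */
static jmp_buf bad_tag;

/* Stands in for the lambda passed to the mapping function. */
static void visit(int x) {
    if (x > 100)                /* the (go bad) case */
        longjmp(bad_tag, 1);    /* exit this function, re-enter the caller */
}

/* Returns 1 when the non-local exit fired, -1 on a normal fall-through. */
static int find_big_int(const int *set, int n) {
    if (setjmp(bad_tag) == 0) { /* entering the tagbody: save restart point */
        for (int i = 0; i < n; i++)
            visit(set[i]);      /* stands in for the mapping function */
        return -1;              /* normal exit of the iteration */
    }
    return 1;                   /* the code following the bad: tag */
}
```

Note that every entry to the <CODE>tagbody</CODE> pays for the <CODE>setjmp</CODE>
even when the <CODE>go</CODE> is never taken, which is exactly the hidden cost
that surprises the operational programmer.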
<P>
In this example, operational programmers are floored by how complicated a simple
<CODE>goto</CODE> statement can become. Denotational programmers are in disbelief
that the execution of clean-up code cannot be enforced.
<H2><A NAME="SEC10" HREF="tl-manual_toc.html#TOC10">ThinLisp's Compromises</A></H2>
<P>
While C++ and Java can be viewed as attempts to move C more towards a
denotational footing, ThinLisp is an attempt to move Common Lisp more towards an
operational footing. I've attempted to lessen the impact of Lisp's "gotchas"
that frustrate programmers who look at languages from an operational semantics
point of view. Attempting to allow the benefits of deep denotational semantics
while warning of performance and behavior surprises is the underlying goal for
ThinLisp.
<P>
One of the goals of ThinLisp is to implement the deep semantics of Lisp, while
offering compile time warnings whenever these semantics force behavior that
deviates from expectations consistent with operational (or "thin") semantics.
If the programmer really intends to use the full, deep semantics, then a
lexically apparent declaration can be used to suppress the warnings and accept
the more expensive behavior required by the deep semantics.
<P>
There are three main areas where ThinLisp warnings are used to help prevent
performance gotchas.
<UL>
<LI>
consing,
<LI>
run-time dispatched operations, and
<LI>
non-local exits.
</UL>
<P>
Within ThinLisp there is no garbage collector. While many fine garbage
collection systems exist, and some are truly approaching the goal of real-time
operation, they all impose significant overhead to maintain a root set of
objects that form the base of the reference tree that is walked during a garbage
collect. ThinLisp avoids both the cost of garbage collection and the cost of
maintaining the data that makes garbage collection possible.
<P>
This means that Lisp programmers must adopt the discipline of the C programmer
in knowing where memory is allocated and where it is reclaimed. Because this is
such a foreign concept to Lisp programming style, automated help is required to
protect Lisp programmers from careless consing. It is a well known maxim that
Lisp programmers looking to improve performance should avoid consing, but very
few Lisp programmers are good at it. Within ThinLisp there is a concept of
both lexical and dynamic scopes that supply consing contexts. At compile time
any consing operation must have a <CODE>consing-area</CODE> declaration lexically
surrounding its scope, or else a compile time warning is given. The
<CODE>consing-area</CODE> can either be <CODE>permanent</CODE> or <CODE>temporary</CODE>.
<A NAME="DOCF5" HREF="tl-manual_foot.html#FOOT5">(5)</A>
A <CODE>permanent</CODE> area means consing proceeds as normal, with all created
objects having indefinite extent. However, whenever consing happens within a
<CODE>temporary</CODE> scope, the objects created have dynamic extent only, lasting
until the exit point of the <CODE>temporary</CODE> <CODE>consing-area</CODE>. The
<CODE>consing-area</CODE> declaration is similar to the <CODE>dynamic-extent</CODE>
declaration, but it is pervasive instead of applying to a particular variable
binding.
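<P>
An analogy in C may make the <CODE>temporary</CODE> area concrete: it behaves much
like an arena allocator whose high-water mark is saved on entering the scope and
restored on leaving it, reclaiming every object consed inside it in one step.
This is an illustrative sketch under that assumption, not ThinLisp's actual
runtime interface.

```c
#include <stddef.h>

/* Sketch of an arena standing in for a consing area.  In a permanent
   area, allocations simply accumulate; a temporary area records a mark
   on entry and resets to it on exit, freeing everything consed since. */
typedef struct {
    char  *base;   /* start of the area's storage */
    size_t used;   /* bytes consed so far */
    size_t cap;    /* total bytes available */
} arena;

static void *arena_alloc(arena *a, size_t n) {
    if (a->used + n > a->cap)
        return NULL;            /* area exhausted */
    void *p = a->base + a->used;
    a->used += n;
    return p;
}

/* Entering a temporary consing-area: remember the high-water mark. */
static size_t arena_mark(const arena *a) { return a->used; }

/* Leaving it: everything consed since the mark is reclaimed at once. */
static void arena_release(arena *a, size_t mark) { a->used = mark; }
```

Reclaiming by resetting a mark is constant time, which is why no root set or
reference walk needs to be maintained.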
<P>
ThinLisp warns about run-time dispatched operations which could be made more
efficient with type declarations. In some cases, the run-time polymorphism of
an operation is exactly what is desired, and then there are declarations that
can be used to suppress these warnings. It is well known that type declarations
can improve the performance of some Lisp operations; however, it is very
difficult in practice to achieve these optimizations without some warnings and
assistance from the compiler. The assumption in building the ThinLisp
translator is that you are attempting to make production quality code, and
therefore want advice about code that could be hiding a performance gotcha.
<P>
Lastly, ThinLisp can issue compile-time warnings about non-local exits. The
<CODE>unwind-protect</CODE> form can guarantee that clean-up code is executed on exit
from a scope, but it is a very expensive operation, due to the <CODE>setjmp</CODE>
calls that establish a reentry point into the function and the need to declare
variables <CODE>volatile</CODE> if they are modified within that scope. There are some macro
forms where it is acceptable if an error exits a scope without a clean-up, but a
normal execution of the body form should not leave the scope with a
<CODE>return-from</CODE> form. In order to allow programmers to write macros that
have clean-up code without resorting to an <CODE>unwind-protect</CODE> form, ThinLisp
provides a <CODE>require-local-exit</CODE> declaration that will cause compile time
warnings if a scope is left prematurely. The other kind of non-local exit that
ThinLisp prevents is <CODE>go</CODE> or <CODE>return-from</CODE> forms that exit a lexical
closure and re-enter a surrounding function. These patterns of code are
tantamount to <CODE>catch</CODE> and <CODE>throw</CODE> forms, which are very expensive. By
declining to implement this form of non-local exit, we can prevent this kind of
performance pitfall.
<H1><A NAME="SEC11" HREF="tl-manual_toc.html#TOC11">Original Design Notes</A></H1>
<P>
<I>The following were the original design notes about what should be built into
GL, which became ThinLisp. -jallard 10/12/99</I>
<H2><A NAME="SEC12" HREF="tl-manual_toc.html#TOC12">Def-System</A></H2>
<P>
In order to get the flexibility that we want with this development environment,
a def-system implementation will be built into GL. This is necessary to get
the desired layering of libraries and to enable identification of call-trees,
module interdependencies, and dead-code.
<P>
The base system will be GL. It will contain only the smallest set of
primitives possible, though this is still a substantial set. Support for all
defining forms and special forms must be built here. There will be several
systems that implement portions of Common Lisp that we think may not be
necessary for all products we intend to build in Lisp. These should include
many of the utilities that we currently have in G2 to replace consing versions
of the same from Common Lisp. For example, the gensym-pathnames and
gensym-streams code should be made into a separate file-io system that may or may not
be included into products (G2 and TW need it, GSI does not).
<P>
Within the Def-system for a group of files there should be a flag indicating
whether or not this set of files is to be distributed as a library or an
executable. If it is a library, the def-system should include (or there should
be a separate form for declaring) those functions that are called by users of
the library, and the def-system should also include (or there should be a
separate form for declaring) those call-back functions provided by users of the
library that the internals of the library will call (this is the GSI call out
case). Foreign functions should be handled as they are now using the
def-gensym-c-function form.
<H2><A NAME="SEC13" HREF="tl-manual_toc.html#TOC13">Overview of Translations</A></H2>
<P>
There are four different modes in which GL (Gensym Lisp, oops, er, Language)
source code will need to be processed, two Lisp compilation modes and two
translation modes. The first is development compilation in an underlying Lisp
environment. The second is a macro definition compilation to define macros in
preparation for a translation. These are otherwise unremarkable uses of the
underlying Lisp implementation's compile-file and load functions, distinguished
only by what we do with the :development and :distribution compiler switches.
The third is a translation pre-process step in which the names and data types
of variables and functions are determined. The fourth and final is a
translation mode in which C files and H files are generated.
<P>
The translator from GL to C will be written to make as few dependencies as
possible on the implementation it is layered on top of. (As this is being
written Lucid has gone bankrupt and been acquired by Harlequin and Apple is
dumping off MCL to a third party to support.) In order to keep things portable
we will implement our own macro-expander to avoid the problem of differences in
macro-expansion environment representations. Those Lisp functions whose
argument lists are more complex than positional and optional arguments will be
implemented as macros. These macros will expand into functions with fixed and
optional arguments. We will write all of our own macro implementations of all
non-function Lisp primitives so that we can control, in a portable way, the base
set of functions and special forms that these macro-expansions will bottom out
in. All of these base functions that the translator natively
supports will be implemented in the Lisp development environment so that the
development environment execution will be as similar as possible to the final C
execution, but those facilities that are required to come directly from the
underlying Lisp during development (e.g. special forms) will be had through
conditional macro-expansion that uses the underlying Lisp. For example, a call
to gl:list with 5 arguments will always macro-expand into calls to the
gli::list-5 function. However, calls to gl:catch will expand to gli::catch
while translating and will expand into lisp:catch during development. The
special-form gl:the will expand such that it performs type checking at
development run-time, and type-declared local variables will at least have their
initial values type-checked. All reasonable effort will be made to avoid
redundant run-time checks in development.
<P>
A Lisp function fragment illustrating the outermost loop of a translation is as
follows. The actual implementation will need further complications, such as
recompile arguments.
<PRE>
(defun translate-system (system)
  (loop for used-system in (system-used-systems system) do
    (load-system used-system :binaries t :gl-preprocess-info t
                 :gl-trans-info t))
  (loop for file in (system-files system) do
    (when (binary-out-of-date-p file)
      (compile-file file))
    (unless (binary-loaded-p file)
      (load-binary file))
    (when (gl-preprocessed-date-out-of-date-p file)
      (gl-preprocess file))
    (unless (gl-preprocessed-data-loaded-p file)
      (load-gl-preprocessed-data file)))
  (loop for file in (system-files system) do
    (when (or (gl-trans-data-out-of-date-p file)
              (gl-c-file-out-of-date-p file)
              (gl-h-file-out-of-date-p file)
              (progn (load-gl-trans-data file)
                     (gl-trans-data-indicates-retranslate-p file)))
      (gl-translate-file file)))
  (gl-dump-makefile-information system))
</PRE>
<H2><A NAME="SEC14" HREF="tl-manual_toc.html#TOC14">Compilation Passes</A></H2>
<P>
The development compilation pass can be detected by the :development switch in
*features*. The macro compilation pass has no significant *features*
included. The pre-process translation pass and the translation pass can be
detected with the :translator and :no-macro switches in *features*. Note that
the pre-process pass may be interleaved with the macro pass (i.e. we may macro
compile then pre-process each Lisp file in turn), but these features can still
be reliably used to detect the difference. Also note that we will likely want
to perform development translations which will produce images with full runtime
type checking, which could be detected by having both :translator and
:development available at the same time. This may be difficult, since the
:development switch has often been confounded with dependencies on Lisp
development environment (i.e. :lucid and formerly :lispm) switches. An
alternative but inferior approach would be to use a different switch to control
translator type-safety checks.
<H2><A NAME="SEC15" HREF="tl-manual_toc.html#TOC15">Pre-Process Compilation Pass</A></H2>
<P>
The pre-process pass is used to gather information about the following:
function name, argument types, and return types needed to translate calls to
this function; and variable data types needed to compile fetch and set
references. This information must be gathered before translation begins on any
files, since Lisp generally allows forward references to as-yet undefined
functions and variables. This pass over each file must occur after that file
has been compiled (if necessary) and loaded during the macro compilation pass.
(Consider collecting this information during the macro-expansion of gl:defun,
gl:defvar, etc. macros.) During this pass all top-level forms in a file are
read and macro expansion proceeds on each form only far enough so that top
level defining forms can be examined and their types determined. Normally this
would only be far enough to examine declare statements on functions and global
declaim statements. In practice, we may also attempt to perform limited
macro-expansion on the bodies of functions that do not contain argument and
return value type specifications in order to determine stricter typing
information, but I'm expecting that this may be too time-consuming to do
without some limitations, for example stopping after 100 calls to
gl-macroexpand-1.
<P>
The gathered information will be stored in properties of symbols and will also
be written out into a .glp file (GL pre-process, a Lisp syntax file suitable
for processing with load, written into the macro directory next to the binary)
for each .lisp file pre-processed. This enables us to reload the .glp file
during subsequent pre-processing in which the source Lisp file has not changed.
<H2><A NAME="SEC16" HREF="tl-manual_toc.html#TOC16">Translation Compilation Pass</A></H2>
<P>
The translate pass is finally used to read source Lisp files and translate
their contents into C and H source files. Besides the translated functions
and variables, each C file will also contain an init function, which will
allocate code constants needed by the translated functions and perform the
operations called for by the top-level forms of the translated file. Some
constants will be globally shared, such as compiled-function objects and
symbols. At the end of each translation, a .glt file (GL translation data)
will also be written out containing information about which globally optimized
constants are used, which of those are initialized within a translated file,
and which functions and variables are referenced with what type signatures.
This is needed for later incremental translations. During these later
translations it may be necessary to retranslate due to changes in prior
initialization of globally shared constants, such as when a symbol is no longer
initialized in a prior file, and so must be initialized in this file.
Similarly, when translating calls to functions, information about the type
signature of the called function is used to translate the call site. If the
type signature of a function changes, that will force a retranslate of files
containing calls. Lastly, we have always been concerned about changes to Lisp
macros being propagated to all callers. If during translations we keep track
of which macros are called and keep a checksum of the compilation of its
macro-function, we may be able to compare the checksums of later translations
and automatically determine when retranslates must be performed.
<H2><A NAME="SEC17" HREF="tl-manual_toc.html#TOC17">Packages</A></H2>
<P>
Packages should be designed as follows. The AB package should now use the GL
package to get all of its underlying Lisp implementation symbols. The GL
package will not use any other package and it will export all GL symbols that
we use. The translator functions will be available at translation time through
the GLT package, and the translator itself will be written within the GLI (GL
Internals) package. During development, some GL symbols will have been
imported from the Lisp package, but at runtime there will be no Lisp package,
only GL.
<P>
During runtime, the AB and GL packages will be available, but the GLT and GLI
packages will be absent. Note that it is currently my intent to have fully
reclaimable package and symbol structures so that new packages using the AB
package can be created at runtime and symbols removed from packages using the
unintern function will in fact be reclaimed as they are uninterned.
<P>
There is support in the symbol data structures for importing, but that will not
be used at this time.
<H2><A NAME="SEC18" HREF="tl-manual_toc.html#TOC18">Stacks</A></H2>
<P>
A global Obj_stack and Obj_stack_top will be maintained for holding shadowed
values of global variables and setjmp environments for catch and unwind_protect
scopes. The stack needs to be structured so that it can be searched from the
top down towards the bottom finding catch tags for throws, unwind_protect
scopes to execute, values to reinstate back into global variables, and
multiple_value sets to reinstate back into the multiple_values_buffer. Each
function that pushes values onto the stack must reset the Obj_stack_top back to
its original value before exiting. Catch and unwind_protect locations must
cache the Obj_stack_top value available on entry to their scope and be prepared
to restore it. When throws are performed, stack unwinding will determine
appropriate catch locations for throws, restore global variable values whose
binding scopes are exited, and execute unwind-protect cleanup forms.
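<P>
The top-down search described above can be sketched as a tagged stack. This is a simplified illustration only; the entry layout and the names StackEntry and find_catcher are assumptions, not the actual Obj_stack representation.

```c
#include <stddef.h>

/* Sketch of a tagged control stack like the Obj_stack.  Each entry is
   marked with its kind so the unwinder can walk from the top downward
   looking for catch tags, unwind-protect scopes, and shadowed bindings. */
typedef enum { ENTRY_CATCH, ENTRY_UNWIND_PROTECT, ENTRY_GLOBAL_BINDING } EntryKind;

typedef struct {
    EntryKind kind;
    long      tag;   /* catch tag; unused for the other entry kinds */
} StackEntry;

#define STACK_MAX 64
static StackEntry obj_stack[STACK_MAX];
static int obj_stack_top = 0;   /* index one past the topmost entry */

static void push_entry(EntryKind kind, long tag) {
    obj_stack[obj_stack_top].kind = kind;
    obj_stack[obj_stack_top].tag = tag;
    obj_stack_top++;
}

/* Search downward from the top for a catch entry with the given tag,
   returning its index, or -1 if no such catcher is established. */
static int find_catcher(long tag) {
    for (int i = obj_stack_top - 1; i >= 0; i--)
        if (obj_stack[i].kind == ENTRY_CATCH && obj_stack[i].tag == tag)
            return i;
    return -1;
}
```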
<H2><A NAME="SEC19" HREF="tl-manual_toc.html#TOC19">Global Variable Binding</A></H2>
<P>
Bindings of global variables will be handled as follows. The current value of
the global will be stored in a local lexical variable of the binding function
and also will be pushed onto the Obj_stack. Next a pointer to the value cell
of the bound variable is pushed onto the stack, and finally the keyword
:global-binding is pushed. The global variable is then set to its new bound
value. If a normal lexical exit is made from the scope of the binding, then
the global variable's old value is restored from the location in the local
lexical variable of the binding function and the three values are popped off
the stack by decrementing Obj_stack_top by 3 words. If a non-local exit is
made from this scope, then unwind_obj_stack will recognize the keyword
:global-binding and restore the value into the symbol-value cell pointed to on
the stack.
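<P>
As a rough illustration of this three-word protocol (the marker constant and function names are hypothetical stand-ins for the translator's output):

```c
/* Sketch of the global-binding protocol: the shadowed value, a pointer
   to the value cell, and a marker word are pushed onto an object stack;
   a non-local unwinder could then restore the cell from these words. */
typedef void *Obj;
#define GLOBAL_BINDING_MARKER ((Obj)0x1)   /* stands in for :global-binding */

#define STACK_WORDS 32
static Obj obj_stack[STACK_WORDS];
static int obj_stack_top = 0;

static void bind_global(Obj *cell, Obj new_value) {
    obj_stack[obj_stack_top++] = *cell;       /* shadowed old value */
    obj_stack[obj_stack_top++] = (Obj)cell;   /* where it lives */
    obj_stack[obj_stack_top++] = GLOBAL_BINDING_MARKER;
    *cell = new_value;
}

/* Normal lexical exit: restore the old value and pop the 3 words. */
static void unbind_global(void) {
    Obj *cell = (Obj *)obj_stack[obj_stack_top - 2];
    *cell = obj_stack[obj_stack_top - 3];
    obj_stack_top -= 3;
}
```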
<H2><A NAME="SEC20" HREF="tl-manual_toc.html#TOC20">Unwind-Protect Scopes</A></H2>
<P>
On entry to an unwind protect scope, a call to setjmp is made with a jmp_buf
environment in order to save this location within the C calling stack. This
jmp_buf (which is a pointer, see "C, A Reference Manual, 3rd Edition" by
Harbison and Steele, p. 352) is pushed onto the Obj_stack, then the keyword
:unwind-protect. Any unwind scopes will then perform a longjmp to this jmp_buf
as the stack is unwound. After performing the cleanup forms, unwinding should
then continue in this case by performing another longjmp to the next unwind
location on the stack, or to the target catch tag if no further cleanups are
needed. On a normal exit from this scope, Obj_stack_top may merely be decremented,
abandoning this cleanup on the stack. Note that all local lexical variables in
scope around the unwind-protect must be declared volatile in the translation.
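<P>
A minimal sketch of this setjmp-based cleanup discipline, assuming a single protected scope (a real translation would push the jmp_buf onto the Obj_stack and chain longjmps to further unwind locations):

```c
#include <setjmp.h>

/* Sketch of an unwind-protect: the protected body runs under a setjmp
   re-entry point; on a non-local exit the unwinder longjmps back here
   so the cleanup code runs before unwinding continues. */
static jmp_buf unwind_point;
static int cleanup_runs = 0;

static void do_throw(void) {
    longjmp(unwind_point, 1);   /* simulates a throw past this scope */
}

static int body_with_cleanup(int should_throw) {
    volatile int result = 0;    /* locals live across setjmp are volatile */
    if (setjmp(unwind_point) == 0) {
        if (should_throw)
            do_throw();         /* abnormal exit: control re-enters above */
        result = 42;            /* normal completion of the body */
    }
    cleanup_runs++;             /* cleanup executes on either exit path */
    return result;
}
```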
<H2><A NAME="SEC21" HREF="tl-manual_toc.html#TOC21">Catch and Throw</A></H2>
<P>
On entry to a catch form, a setjmp is performed with a jmp_buf environment.
This jmp_buf, the catch value, and the keyword :catch are then pushed onto the
stack. If a throw is performed, then a longjmp will be executed with the
appropriate values held in the Multiple_values_buffer. The receiver of values
from the catch form then may or may not unpack values from that buffer. Note
that all local lexical variables in scope on entry to the catch form should be
declared volatile in the translation.
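<P>
The catch/throw transfer can be sketched in the same style; here a single global jmp_buf and value slot stand in for the stack-allocated catch entries, an assumption made for brevity:

```c
#include <setjmp.h>

/* Sketch of catch/throw: the thrown value travels through a global slot
   (standing in for the Multiple_values_buffer), and longjmp transfers
   control to the catcher's setjmp re-entry point. */
typedef void *Obj;
static jmp_buf catch_env;
static Obj thrown_value;

static void throw_value(Obj value) {
    thrown_value = value;
    longjmp(catch_env, 1);      /* unwind to the matching catch */
}

static Obj catch_body(int should_throw) {
    if (setjmp(catch_env) != 0)
        return thrown_value;    /* arrived here via a throw */
    if (should_throw)
        throw_value((Obj)7);
    return (Obj)3;              /* normal completion value */
}
```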
<H2><A NAME="SEC22" HREF="tl-manual_toc.html#TOC22">Multiple Values</A></H2>
<P>
Within locations where the source and receiver of values are mutually visible,
all single-valued expressions will be handled as C expressions, and all
multiple-valued expressions will be handled with a series of setqs into
gensymed variables. However, in translations of locations where the source of
values is a function of unknown type signature, or where multiple values are
returned to an unknown receiving location, a general protocol of multiple
values will be used. In these cases, multiple values will be handled with a
global Multiple_values_buffer and Multiple_values_count. The first Lisp value
will always be returned as a normal C expression or function value, with any
secondary values being placed into the Multiple_values_buffer. If no values
are returned, then the values count will be set to 0 and NIL is returned, which
is the default value when receiving more values than are supplied. Sites
receiving only the first value may use the value of the C expression. Sites
receiving more than one value must check the C values count, retrieving values from
the Multiple_values_buffer, or NIL if more values are desired than are
supplied. In multiple-value-prog1, catch, or unwind-protect cases, then a
protocol will be supplied which enables all values being returned to be cached
onto the global Obj stack. After all cleanup forms in the
multiple-value-prog1, catch, or unwind-protect have been executed, then the
cached values should be popped off the stack and back into the
Multiple_values_buffer.
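<P>
The first-value-in-C, rest-in-buffer protocol might be sketched as follows. The function names are illustrative; the actual translator would emit equivalent code inline at call and receive sites.

```c
/* Sketch of the multiple-values protocol: the first value is the normal
   C return value; extra values go through a global buffer with a count,
   and absent values default to NIL at the receiver. */
typedef void *Obj;
#define NIL ((Obj)0)
#define MV_MAX 16

static Obj Multiple_values_buffer[MV_MAX];
static int Multiple_values_count = 0;

/* A producer returning three values: returns the first, buffers the rest. */
static Obj three_values(void) {
    Multiple_values_count = 3;
    Multiple_values_buffer[0] = (Obj)2;  /* second value */
    Multiple_values_buffer[1] = (Obj)3;  /* third value */
    return (Obj)1;                       /* first value, a plain C return */
}

/* A receiver asking for value n (0-based): NIL when too few are supplied. */
static Obj nth_value_received(Obj first, int n) {
    if (n == 0) return first;
    if (n < Multiple_values_count) return Multiple_values_buffer[n - 1];
    return NIL;
}
```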
<H2><A NAME="SEC23" HREF="tl-manual_toc.html#TOC23">Constants</A></H2>
<P>
Constant values of type char, sint32, fixnum, and double will be translated
into inline C constants. The sint32 and fixnum constants should be able to be
emitted without a trailing L (as opposed to current practice) since we will
not be using the AlphaOSF long type, and so we expect to not be subject to the
preprocessor type conversion bug that led us to adopt the L integer suffix
rule.
<P>
Constants of type string, symbol, and compiled-function will be implemented
using initialized C struct variables, and these will be constant folded across
all translated C files. Translated Lisp references to these constants will
actually be references to the address of these C variables. For string
constants, typedefs will be emitted into the C files for string structures
containing differing lengths of string. Where the initial value of a string
constant is longer than 80 characters, the initial characters of the string
will be initialized using the array variable initialization format instead of
the normal string constant approach. This eliminates the long lines that break
some compilers (e.g. VAX C). Variables for strings should be declared as
"const". Symbol and compiled-function variables cannot be declared const,
since they must be modified during initialization to intern symbols, install
compiled-functions in symbols, and install pointers to C functions in
compiled-functions.
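<P>
A sketch of how such an initialized struct variable might look for a short string constant; the typedef name and layout here are assumptions based on the description above, not the translator's actual output.

```c
#include <string.h>

/* Sketch of a translated string constant: a per-length struct typedef
   whose instance is emitted as an initialized C variable, so Lisp
   references to the constant become the address of this variable. */
typedef struct {
    unsigned length;
    char     body[6];          /* room for "hello" plus the NUL */
} Str_6;

/* Short strings can use the plain string-literal initializer and may be
   declared const.  A string longer than 80 characters would instead use
   the array-element form, e.g. { 'h', 'e', ... }, to keep lines short. */
static const Str_6 str_const_hello = { 5, "hello" };
```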
<P>
The number of external symbols needed within individual C files is a concern,
since we have seen this be a link time limitation on some systems. If this is
the case, then we can still achieve global constant folding within a translated
system by using an approach where the pointers to constants are installed into
a single large constant vector, with indices into this vector used to fetch the