<!DOCTYPE html>

<html>

<head>

<title>ChatScript-Pattern-Redux</title>

<style>

html {

color: #1a1a1a;

background-color: #fdfdfd;

}

body {

margin: 0 auto;

max-width: 36em;

padding-left: 50px;

padding-right: 50px;

padding-top: 50px;

padding-bottom: 50px;

hyphens: auto;

overflow-wrap: break-word;

text-rendering: optimizeLegibility;

font-kerning: normal;

}

@media (max-width: 600px) {

body {

font-size: 0.9em;

padding: 12px;

}

h1 {

font-size: 1.8em;

}

}

@media print {

html {

background-color: white;

}

body {

background-color: transparent;

color: black;

font-size: 12pt;

}

p, h2, h3 {

orphans: 3;

widows: 3;

}

h2, h3, h4 {

page-break-after: avoid;

}

}

p {

margin: 1em 0;

}

a {

color: #1a1a1a;

}

a:visited {

color: #1a1a1a;

}

img {

max-width: 100%;

}

svg {

height: auto;

max-width: 100%;

}

h1, h2, h3, h4, h5, h6 {

margin-top: 1.4em;

}

h5, h6 {

font-size: 1em;

font-style: italic;

}

h6 {

font-weight: normal;

}

ol, ul {

padding-left: 1.7em;

margin-top: 1em;

}

li > ol, li > ul {

margin-top: 0;

}

blockquote {

margin: 1em 0 1em 1.7em;

padding-left: 1em;

border-left: 2px solid #e6e6e6;

color: #606060;

}

code {

font-family: Menlo, Monaco, Consolas, 'Lucida Console', monospace;

font-size: 85%;

margin: 0;

hyphens: manual;

}

pre {

margin: 1em 0;

overflow: auto;

}

pre code {

padding: 0;

overflow: visible;

overflow-wrap: normal;

}

.sourceCode {

background-color: transparent;

overflow: visible;

}

hr {

background-color: #1a1a1a;

border: none;

height: 1px;

margin: 1em 0;

}

table {

margin: 1em 0;

border-collapse: collapse;

width: 100%;

overflow-x: auto;

display: block;

font-variant-numeric: lining-nums tabular-nums;

}

table caption {

margin-bottom: 0.75em;

}

tbody {

margin-top: 0.5em;

border-top: 1px solid #1a1a1a;

border-bottom: 1px solid #1a1a1a;

}

th {

border-top: 1px solid #1a1a1a;

padding: 0.25em 0.5em 0.25em 0.5em;

}

td {

padding: 0.125em 0.5em 0.25em 0.5em;

}

header {

margin-bottom: 4em;

text-align: center;

}

#TOC li {

list-style: none;

}

#TOC ul {

padding-left: 1.3em;

}

#TOC > ul {

padding-left: 0;

}

#TOC a:not(:hover) {

text-decoration: none;

}

code{white-space: pre-wrap;}

span.smallcaps{font-variant: small-caps;}

div.columns{display: flex; gap: min(4vw, 1.5em);}

div.column{flex: auto; overflow-x: auto;}

div.hanging-indent{margin-left: 1.5em; text-indent: -1.5em;}

/* The extra [class] is a hack that increases specificity enough to

override a similar rule in reveal.js */

ul.task-list[class]{list-style: none;}

ul.task-list li input[type="checkbox"] {

font-size: inherit;

width: 0.8em;

margin: 0 0.8em 0.2em -1.6em;

vertical-align: middle;

}

.display.math{display: block; text-align: center; margin: 0.5rem auto;}

</style>


</head>

<body>

<h1 id="chatscript-pattern-redux">ChatScript Pattern Redux</h1>

Copyright Bruce Wilcox, mailto:[email protected]

www.brilligunderstanding.com

Revision 4/24/2022 cs12.1

Pattern matching information was introduced in the Beginner manual

and expanded in the <a

href="ChatScript-Advanced-User-Manual.html">Advanced User Manual</a>.

Since pattern matching is of such importance, this concise manual lists

everything about patterns in one place, including patterns not listed in

the Advanced manual.

NOTE: despite the extraordinary range of weird matching abilities,

almost all of my normal code is based on one of three patterns:

<pre><code># rule 1

u: (![plastic] << bag trick >>)

# rule 2

u: (I * love * you)

# rule 3

t: What fruit do you like?

a: (~why)

a: (orange)

a: (apple)

a: (~vegetables)</code></pre>

Rule 1 - searches for key words in any order. While there is a normal

order to questions, e.g., where do you live, one can ask

you live where? so handling arbitrary order is generally

valuable. Just have all the keywords you need to detect a meaning and

use <code>![...]</code> to get rid of interpretations you don’t

want.

Rule 2 - requires an order when both first person and second person

pronouns are involved, since order will matter.

Rule 3 - uses simple keywords or concept sets in rejoinders, since

the context of the gambit constrains the input so highly.

<h2 id="if-patterns">IF Patterns</h2>

Pattern matching can be done not just in a rule’s pattern component

but also in its output component, within an <code>if</code> statement,

e.g.:

<pre><code>if ( PATTERN _~number ) { print( _0) }</code></pre>

That is, if the first word in the test condition is the word

<code>PATTERN</code>, the rest is treated as a standard pattern of a

rule (not using <code>AND</code> <code>OR</code> etc). You can capture

data here or do anything a normal pattern does.
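As a sketch of how this might look inside a complete rule (the pizza wording is purely illustrative; only <code>PATTERN</code> and <code>~number</code> come from the text above):

<pre><code># the rule pattern finds the intent; the IF pattern then hunts for a quantity
u: ( order pizza )
    if ( PATTERN _~number ) { You want _0 pizzas. }
    else { How many pizzas would you like? }</code></pre>

The rule itself stays simple, and the optional detail is only captured when it is actually present in the input.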

<h2 id="pattern-position">Pattern Position</h2>

A pattern consists of tokens. By default, any normal word in

canonical form can match any form of the word, so he in a

pattern can match him, he, his. A pattern

aborts when a token fails to match unless allowed to not match.

The performance cost of a pattern generally is linear in the number

of tokens processed. That means these two rules take the same time to

match (other than the imperceptible time difference to read the longer

token).

<pre><code>u: (apple)

u: (~ten_thousand_names_of_fruits)</code></pre>

The system tracks current position in the sentence as it matches.

The first token of a pattern is allowed to match anywhere in the

sentence. After that normally tokens are matched against words in

consecutive order in the input. If a pattern starts to match and then

fails, the system is allowed to retry matching later in the sentence

once. It does this by freeing up the first matching word/concept token

and letting it rebind later.

Given this rule:

<pre><code>u: ( I like apple )</code></pre>

The input Do you know that I like oranges and I like apples

would match as follows.

The first pattern token <code>I</code> would match the first

I partway into the sentence (because it is allowed to match

anywhere).

The next pattern token <code>like</code> is required to match the

next input word in the sentence, which it does.

The third pattern word apple fails to match

oranges. We just failed. But we have one retry left.

So I is sought deeper in the sentence and matched.

<code>like</code> matches like and <code>apple</code> matches

apples. So we match.

Had that not matched, no more retries exist so the failure sticks.

There are tokens you can use that alter the rules/location around

current position.

<h2 id="pattern-constituents">Pattern Constituents</h2>

<h3 id="type-of-sentence-s">Type of Sentence <code>s:</code> <code>?:</code></h3>

A responder beginning with <code>s:</code> or <code>?:</code>
implicitly tests that the sentence is a statement or a question,
respectively. That test is built in even before the pattern. All other
rules are not immediately sensitive to the kind of sentence.
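A minimal sketch of the three responder kinds side by side, assuming a <code>~fruits</code> concept exists (the responses are invented for illustration):

<pre><code># only statements are tested against this pattern
s: ( I like ~fruits ) Fruit is good for you.
# only questions are tested against this pattern
?: ( you like ~fruits ) Yes, I like fruit.
# u: is tested against both statements and questions
u: ( ~fruits ) Fruit again.</code></pre>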

<h3 id="existence---word-concept-var-sysvar-_0-0-var">Existence - word
<code>~concept</code> <code>$var</code> <code>%sysvar</code> <code>_0</code> <code>@0</code> <code>^var</code></h3>

Basic pattern matching is against words or concepts. Does this word

or concept exist?

<pre><code>u: ( this ~animal )</code></pre>

matches this dog or this dogs but not this is

my dog

You can also ask if a user variable is defined just by naming it:

<pre><code>u: ( $myvar help )</code></pre>

this only matches if input has help and <code>$myvar</code>

is not null.

<h4 id="system-variables">System variables</h4>

One would not ask whether system variables are defined (they almost
always are) but would use them in a relation instead.

Similarly, <code>_0</code> by itself in a pattern means is it

defined, that is, not null.

<pre><code>u: ( _{apple orange} _0 )</code></pre>

matches only if apple or orange got matched. And <code>@0</code> by

itself means does this fact-set have any facts stored in it.

You can also reference an argument to a function call, and its value

will be used to decide what to do.

<h4 id="stand-alone">Stand-alone <code>?</code></h4>

A stand-alone <code>?</code> means is this sentence a question.

<h4 id="stand-alone-1">Stand-alone <code>!?</code></h4>

<code>!?</code> would test if it is not a question.

<h4 id="stand-alone-2">Stand-alone <code>~</code></h4>

A stand-alone <code>~</code> means the current topic is already on

the pending topic list (was recently considered an active topic).

<h3 id="grouping-pairs">Grouping Pairs <code>(</code> <code>)</code></h3>

<h4 id="parens">Parens <code>(</code> … <code>)</code></h4>

Parens mean the tokens within must be found “in sequence”. The

notation of a pattern starts with parens, but has the unusual property

of allowing the match to occur anywhere within the sentence, not just at

the start. Any nested parens do not have that property, and still

require in sequence.

<pre><code>u: ( this (is my) pattern)</code></pre>

matches this is my pattern and not this sometimes is my

pattern

<h4 id="brackets">Brackets <code>[</code> … <code>]</code></h4>

Brackets mean match one of contained tokens, in the order given. A

bracket list tries all its members in sequence, stopping when it finds a

match. For the input I go home for Christmas, a pattern such as
<code>( [ ~noun ~verb ] * home )</code> will not match,
because <code>~noun</code> will match to home and then
home cannot be found later. On a retry, <code>~noun</code> will
match to Christmas. Since <code>~noun</code> can match multiple
times, <code>~verb</code> never gets tried.

You can composite things like:

<pre><code>u: ( [ apple pear (favorite fruit) cherry ] )</code></pre>

to match I eat pear and my favorite fruit, but this

form is unlikely to be used in normal CS.

Note that <code>[ … ]</code> and <code>~concept</code> are similar

but different in important ways. Matching <code>~concept</code> is

faster than the corresponding list inside <code>[]</code> because naming

the concept only requires one token. But it takes more memory to store

the concept than it does to put the words inside the

<code>[]</code>.

The other fundamental difference is in position. Words in

<code>[]</code> are matched in the order given, possibly moving your

position mark deep into the sentence.

Words in a concept are all matched simultaneously,

so which one is found first in the sentence is what sets the position.

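For the input I like beer but not wine, the pattern

<pre><code>u: ( I like * ~drinks )</code></pre>

would match beer if beer and wine are in that concept in any
order, whereas a bracket list that names wine first, such as
<code>[wine beer]</code> in place of <code>~drinks</code>,
would match wine even though it is farther in the sentence.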

<h4 id="braces">Braces <code>{</code> … <code>}</code></h4>

Braces means match one of the contained tokens if you can, but don’t

fail if you don’t.

Using <code>{}</code> inside of angles is pointless (unless you put

an underscore in front to memorize something) because it makes no

difference to matching whether or not you had the <code>{}</code>

content.

Braces do not align position within a sentence. They are normally

used to assist in positional alignment by swallowing words.

<pre><code>u: ( I go to {the} market )</code></pre>

matches both I go to the market and I go to

market.

If you use underscore before braces to memorize the answer found,

then when no answer is found the match variable is set to

<code>null</code> (no content) but it is set.
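A sketch of testing that memorized optional match in the output, using the market example from above (the responses are illustrative). Naming <code>_0</code> alone in the <code>if</code> asks whether it is non-null:

<pre><code>u: ( I go to _{the a} market )
    # _0 is always set by the pattern, but it is null when no article was found
    if ( _0 ) { You said '_0 market. }
    else { You left out the article. }</code></pre>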

<h4 id="angles">Angles <code><<</code> … <code>>></code></h4>

Angles mean match all of the contained tokens in any order.

Putting <code>*</code> in this kind of pattern is illegal because it

has no meaning.

Position is not relevant anyway. Position is freely reset to the

start following this sequence so if you had the pattern:

<pre><code>u: ( I * like << really >> photos )</code></pre>

and input photos I really like then it would match because

it found I * like then found anywhere <code>really</code> and

then reset the position freely back to start and found

<code>photos</code> anywhere in the sentence.

<h3 id="wildcards-2-3--2-2b">Wildcards <code>*</code> <code>*~2</code> <code>*3</code> <code>*-2</code> <code>*~2b</code></h3>

Wildcards allow you to relax the positional requirements for
matching. The classic wildcard <code>*</code> allows you to have zero or
more words between other tokens in a pattern. For example,
<code>( I * you )</code>
matches I love chicken and hate you as well as I you
they.

You can limit the unlimited range by adding <code>~n</code> after it.

So <code>*~1</code> means 0 or 1 words may intervene.

<code>*~2</code> is what I commonly use to restrict a range. This
allows a determiner and an adjective to fit before a noun, for example,
but does not allow a pattern to match weirdly. For example,
<code>( I like *~2 cat )</code>
matches I like my cats or I like a yellow cat.

<code>*~2b</code> is similar to <code>*~2</code> except it tries to

match bidirectionally. First it tries to match behind it, and if that

fails, it tries forward (like <code>*~2</code>). You may not follow a bidirectional

wildcard with either <code>{</code> or <code>(</code>.

You can also request a match of a specific number of words in

succession using <code>*n</code>. <code>*1</code> means get the next

word. If you are already positionally on the end of the sentence, this

match fails.

If you aren’t sure how many words are left, you could do something

like this:

<pre><code>u: ( apple _[*4 *3 *2 *1] )</code></pre>

which will grab the next 4 or 3 or 2 or 1 words, depending on how
many are available. This is generally done with an underscore in front
to memorize the sequence.

<code>*-2</code> is like <code>*2</code>, only it matches backwards
instead of forwards. Valid through <code>*-9</code>.
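The forward wildcard forms can be compared in a short sketch (the inputs in the comments and the responses are illustrative, not taken from the original manual):

<pre><code># * allows any number of intervening words: "I really do not trust you"
u: ( I * you ) What about me?
# *~2 allows at most two intervening words: "I like a yellow cat"
u: ( I like *~2 cat ) Cats are great.
# _*1 memorizes exactly the next word; fails if the sentence ends after "is"
u: ( my name is _*1 ) Nice to meet you, '_0.</code></pre>

The tighter the wildcard, the less likely the rule is to fire on an unrelated sentence that merely contains the same keywords far apart.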

<h3 id="negation-and-and--">Negation <code>!</code> and <code>!!</code>

and <code>!-</code></h3>

<code>!x</code> means match only if x is not found anywhere in the
sentence later than where we are. When <code>!not</code> is the first
token of a pattern, the position is still at the start, so the word not
cannot occur anywhere in the sentence.

<code>!!x</code> means match only if x is not the next word.

<code>!-x</code> means match only if x is not any prior word. A pattern
using <code>!-not</code> says the word not cannot occur anywhere before
us in the sentence.
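A sketch of the three negation forms, assuming the behavior described above (the patterns and responses are illustrative):

<pre><code># ! is tested from the sentence start here, so "not" anywhere blocks the match
u: ( !not I love you ) I love you too.
# !! only looks at the very next word: blocks "I love not you"
u: ( I love !!not you ) How sweet.
# !- looks backward from the current position: blocks "I do not think I love you"
u: ( I love you !-not ) Good to hear.</code></pre>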

<h3 id="original-form">Original Form <code>'</code></h3>

While CS normally matches both original and canonical forms of words

when you give a pattern word in canonical form, you can require it only

match the original form by quoting it. For example, <code>'take</code>
does not match I am taking it.

Likewise in a relation where you use a match variable, quoting it

means use only its original value.

<pre><code>u: ( _~fruits '_0==apple )</code></pre>

matches I like apple but not I like apples

<h3 id="literal-next">Literal Next <code>\</code></h3>

You can tell CS that a token should be considered a token, not a

special form, by putting a <code>\</code> in front of it.

This applies to single characters like: <code>\[ \]</code> and it

also applies to relational tokens like <code>\tom=*</code> which means

do not treat this as a relational test, but instead as a token whose

name is wildcarded.

Note that the <code>\</code> does not suppress detecting the

<code>*</code> in a word and therefore allowing variant spelling.

<h3 id="composite-words-my-composite-word">Composite Words “my composite

word”</h3>

There are sequences of words that have a specific meaning and are

treated as a single word, e.g., batting cage.

In a dictionary these are often represented using an <code>_</code>

instead of a space, e.g., batting_cage.

When CS tokenizes your input, it automatically converts your
separated input words into ones with underscores in them when
appropriate.

They are no longer separate words, but instead a single composite word.
This would normally mean that

<pre><code>u: ( batting cage )</code></pre>

would not match. But the script compiler does the same tokenization

thing, so your actual internal pattern looks like:

<pre><code>u: ( batting_cage )</code></pre>

For clarity, it is recommended that when you know you are dealing

with a composite word, you use the underscore notation.

Sequences of words can also be designated using double quotes.

<pre><code>u: ( "batting cage" )</code></pre>

CS converts a quoted string into the same underscore notation. The

distinction between the two is generally one of documentation.

I use quoted strings for phrases to highlight the intention that they

are a phrase. I also use them for multiple word proper names like

Eiffel Tower.

It is particularly important to use the quoted notation when

punctuation is embedded in the name like John’s Apple Pie

because knowing where to put underscores when punctuation is involved is

tricky.

By using quotes, you tell the system to manage things appropriately

(<code>John_'s_Apple_Pie</code>)

When using the quoted notation, the system will actually try to match

original and canonical, just like with ordinary words. If all words in

phrase are canonical, the system will match any form of each word.

If one is not canonical, it can only match the original form.

<pre><code>u: ( "king of the jungle" )</code></pre>

cannot match kings of the jungle because the in

pattern is not canonical.

<pre><code>u: ( "king of a jungle" )</code></pre>

but the above rule can match kings of the jungle since all

words in the quote are canonical.

<h3 id="memorization-_">Memorization <code>_</code></h3>

Placing an underscore means to memorize what was matched onto a match

variable. Match variables are allocated in sequence in a pattern,

starting with <code>_0</code> and increasing to <code>_1</code> etc for

each memorized match.

The system memorizes the original word, the canonical word, and the

position in the sentence of the match.
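A sketch with two memorizations in one pattern, assuming a <code>~fruits</code> concept (the wording is illustrative). In output, <code>'_0</code> gives the original word the user typed, while <code>_0</code> gives the canonical form:

<pre><code># input: "I like apples more than pears"
# _0 is bound by the first underscore, _1 by the second
u: ( I like _~fruits more than _~fruits ) So '_0 beat '_1 for you.</code></pre>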

<h3 id="relations">Relations <code>></code> <code><</code></h3>

You can test relationships by conjoining a token with a relationship

operator and another token, with no spaces. E.g.,

<pre><code>u: ( I am _~number > _0>18 ) You are of legal age.</code></pre>

The relationship operators are:

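<table>
<thead>
<tr>
<th style="text-align: center;">operator</th>
<th>meaning</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: center;"><code>==</code></td>
<td>equal</td>
</tr>
<tr>
<td style="text-align: center;"><code>!=</code></td>
<td>not equal</td>
</tr>
<tr>
<td style="text-align: center;"><code><</code></td>
<td>less than</td>
</tr>
<tr>
<td style="text-align: center;"><code><=</code></td>
<td>less than or equal to</td>
</tr>
<tr>
<td style="text-align: center;"><code>></code></td>
<td>greater than</td>
</tr>
<tr>
<td style="text-align: center;"><code>>=</code></td>
<td>greater than or equal to</td>
</tr>
<tr>
<td style="text-align: center;"><code>&</code></td>
<td>bit anded results in non-zero</td>
</tr>
<tr>
<td style="text-align: center;"><code>?</code></td>
<td>is member of 2nd arg concept or topic or JSON array. If no argument
occurs after, means is value found in sentence</td>
</tr>
</tbody>
</table>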

Using a compare with two text strings (not numbers) will evaluate

based on case-independent alpha sorting.

For comparison against a number (< <= > >=) a null value

will be treated as the number 0.

The <code>?</code> operator has two forms. <code>xxx?~yyy</code> will

look for actual membership in the set whereas <code>_n?~yyy</code> will

only see if the location of match detection of <code>_n</code> is the

same as a corresponding match location for the concept. If the concept

has not been marked, then obviously no match is found.

Note: You can put <code>!</code> before the tokens instead of using

<code>!=</code> and <code>!?</code>. E.g.,

<pre><code>u: ( _~noun !_0?~fruit ) if the noun is not in fruit concept</code></pre>
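A few more relation tests as a sketch. <code>$age</code> is a hypothetical user variable set elsewhere, and the responses are illustrative:

<pre><code># numeric comparison on a memorized number
u: ( I am _~number _0>=65 ) You may qualify for a senior discount.
# membership: the memorized noun must also be marked as a member of ~fruit
u: ( I ate _~noun _0?~fruit ) Fruit is a healthy choice.
# a null $age is treated as 0 in a numeric comparison, so this fails when it is unset
u: ( $age>=18 beer ) Here is your beer.</code></pre>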

<h4 id="dynamic-matching">Dynamic matching</h4>

The stand-alone <code>?</code> is used with variables for dynamic

matching.

While you cannot do memorization in front of a comparison (because no

positional data is gained) you can in front of the <code>?</code>

operator since finding where in the sentence something is will return a

position for memorization.

<code>$var?</code> means: is the value of <code>$var</code> found
anywhere in the sentence?

Note that when <code>$var</code> is a normal word, that is simple for

CS to handle.

If <code>$var</code> is a phrase, then generally CS cannot match it.

This is because for phrases, CS needs to know in advance that a phrase

can be matched.

If you put take a seat as a keyword in a concept or topic or

pattern, that phrase is stored in the dictionary and marked as a pattern

phrase, meaning if the phrase is ever seen in a sentence, it should be

noticed and marked so it can be matched in a pattern.

But if it is merely in a variable, then the dictionary is unaware of

the phrase and so <code>$var?</code> will not work for it.

There is also a <code>?$var</code> form, which means see if the value

of the variable is findable. The value can be either a word or a concept

name.
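A sketch of dynamic matching with memorization, assuming <code>$favorite</code> is a hypothetical user variable that holds a single word such as apple:

<pre><code># first token: $favorite must be defined
# second token: its value must occur in the sentence; the underscore memorizes where
u: ( $favorite _$favorite? ) You mentioned '_0 again.</code></pre>

Because the value is a single word, no dictionary preparation is needed; a multi-word phrase stored only in the variable would not be found, as explained above.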

<h3 id="assignment-in-a-pattern">Assignment in a pattern</h3>

You can directly assign to any variable from any other value using
<code>:=</code>. You can even do arithmetic for these assignments
(<code>:+=</code> <code>:-=</code> <code>:*=</code> <code>:/=</code>
<code>:&=</code> and any of the other numeric assignment operators).

<pre><code> $value = 5

( _some_test $value:=5 $value1:=_0 $value2:='_0 $value3:=%time )

( _some_test $value:+=5 $value1:-=_0 )</code></pre>

If you want to do function call assignment, you can do this:

<pre><code> $value:=^"^function(foo d)"</code></pre>

You have to use an active string here because spaces normally break
apart tokens, and a pattern token involving a function needs to have
all of its arguments be part of the same token. Hence the assignment is
from an active string, where the double quotes around it prevent the
token from breaking apart.

<h3 id="escape">Escape <code>\</code></h3>

If you want to match a reserved punctuation symbol like

<code>[</code> or <code>(</code>, you must escape it by putting a

backslash in front. This is commonly done in matching out-of-band

information.

One also uses escape if you want to know if the sentence was
punctuated with an exclamation. A pattern that includes <code>\!</code>
means the user did something like I love you!.

You may use either <code>?</code> or <code>\?</code> when asking if

the sentence has a question in it. You would generally only do this in a

rejoinder.
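A sketch of both uses of escape. The out-of-band rule assumes the input arrives with its data wrapped in square brackets; the patterns and responses are illustrative:

<pre><code># escaped exclamation: matches input like "I love you!"
u: ( I love you \! ) Such enthusiasm.
# escaped brackets are matched as ordinary tokens, not as a choice list
u: ( \[ _* \] ) I received out-of-band data: '_0</code></pre>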

<h2 id="concept-intersection-keywords">Concept intersection

keywords</h2>

If you join a word (or a concept) and one or more concepts, that
represents the intersection of them, e.g., <code>~animals~tasty</code>
will reference all animals considered tasty.

Note, you cannot use word~1 (meaning specification) or word~n

(pos-tag specification) on your first word.

<h3 id="function-call---xxx...">Function Call - <code>^xxx(...)</code></h3>

You can call a function from within a pattern. If the function

returns a failure code of any kind, the match fails. If the function is

a predefined system function, you are allowed relation operators on the

result as well.

<pre><code>u: ( ^lastused(~ gambit)>5 )</code></pre>

NOTE:

User defined functions (<code>patternmacros</code>) do not allow

relational operators after them.

Patternmacros do not generate answers. They are treated as in-line

additional pattern tokens.

<pre><code>Patternmacro: ^testuse(^value) _~noun _0==^value

u: ( ~noun ^testuse(apple)) # matches "I like pear and apple"</code></pre>

A powerful use of function calling is to call

<code>^respond(~topicname)</code> in a pattern. The topic can match

something and set up a variable for further guidance. E.g.,

<pre><code>u: ( ^respond(~finddelay) $$delay ) Wait for $$delay.</code></pre>

<code>~finddelay</code> can hunt for time referred to in seconds,

minutes, hours, etc, or in words like next week or tomorrow or whatever

complex matching you want to do.

<h3 id="partially-spelled-words-ing-bottle-8bott">Partially Spelled

words: <code>*ing</code> <code>bottle*</code> <code>8bott*</code></h3>

You can request a match against a partial spelling of an original

word (not its canonical alternative) in various ways.

If you use <code>*</code> somewhere after an alpha, it matches any

number of characters.

matches many misspellings of sagittarius.

If you use <code>*</code> followed by an alpha, you get anything as a

prefix followed by what you request.

matches Martha.

If you put a number in front, it means the word must be exactly that

many characters long, matching your pattern.

matches sitter.

When using an <code>*</code> word, you can use <code>.</code> to

indicate exactly one character of any value.

matches situation.
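The three forms named in the heading of this section can be sketched as follows. The example words in the comments are illustrative and assume the behavior described above:

<pre><code># anything as a prefix, then "ing": running, sing
u: ( *ing ) That ends in ing.
# "bottle" followed by any number of characters: bottles, bottleneck
u: ( bottle* ) Something about bottles.
# exactly 8 characters, starting with "bott": bottling, bottomed
u: ( 8bott* ) An eight letter bott word.</code></pre>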

<h3 id="altering-position-_0-_0--_0">Altering Position <code><</code> <code>></code> <code>@_0+</code> <code>@_0-</code> <code>@_0</code></h3>

When you put <code><</code> in your pattern, it doesn’t actually
match anything. It means “reset position” to the start of the
sentence. For example, <code>( < I love )</code>
matches I love but not do I love.

When you put <code>></code> in your pattern, it does not alter
your position, but it tries to confirm you are on the last word of the
sentence. In a pattern that ends with <code>* ></code>, the
<code>></code> is redundant, since <code>*</code>
would match to the end of the sentence anyway.

You may also use <code>!></code> to ask that we NOT be at the end

of the sentence.

<code>@_1+</code> says to set the position to where the given match

variable (<code>_1</code>) matched. Positional sequencing will continue

normally increasing thereafter.

You can suffix the match variable with <code>-</code> instead, to

tell CS to begin matching in reverse order in the sentence, i.e.,

matching backwards to the start of the sentence.

When you use <code>+</code>, the position starts at the end of the

match. When you use <code>-</code>, the position starts at the start of

the match.

<pre><code>u: ( _home is @_0- pretty )</code></pre>

matches my pretty home is near here.

Note when you use <code>-</code> for reverse matching, the behavior

of <code><</code> and <code>></code> changes. <code>></code>

sets a position and <code><</code> confirms it instead of the way it

is for <code>+</code>.

When you omit both <code>+</code> and <code>-</code>, you create a
matchable anchor like <code>@_0</code>. It represents what was found at
that position, and during the pattern it must also match at that
location now. For example, a pattern containing
<code>@_0 is @_1</code> says that the word <code>is</code> must be
found precisely between the locations referenced by <code>@_0</code> and
<code>@_1</code>.
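A sketch of the two sentence-boundary tokens in forward matching (the inputs in the comments and the responses are illustrative):

<pre><code># < resets to the sentence start: matches "I love you", not "do I love you"
u: ( < I love ) Love is in the air.
# > confirms the sentence end: matches "tell me why", not "tell me why you left"
u: ( tell me why > ) Why what?
# both together: the whole input must be exactly "hello"
u: ( < hello > ) Hello to you.</code></pre>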

<h3 id="retrying-scan-retry">Retrying scan <code>@retry</code></h3>

Normally one matches a pattern, performs the output code, and if you
want to restart the pattern to find the next occurrence of a match, you
use <code>^retry(RULE)</code> or <code>^retry(TOPRULE)</code>. If your
pattern instead executes <code>@retry</code> as a token, it will retry
on its own without needing to execute any output code. This is useful
in conjunction with <code>^testpattern</code>.

<h2 id="debugging">Debugging</h2>

<h3 id="testpattern"><code>:testpattern</code></h3>

The system inputs the sentence and tests the pattern you provide

against it. It tells you whether it matched or failed.

<pre><code>:testpattern ( it died ) Do you know if it died?</code></pre>

Some patterns require variables to be set up certain ways. You can

perform assignments prior to the sentence.

<pre><code>:testpattern ($gender=male hit) $gender = male hit me</code></pre>

Typically you might use <code>:testpattern</code> to see if a subset

of your pattern that fails works, trying to identify what has gone

wrong. You can also name an existing rule, rather than supply a

pattern.

<pre><code>:testpattern ~aliens.ufo do you believe in aliens?</code></pre>

<h3 id="prepare"><code>:prepare</code></h3>

Since CS may revise your input for various reasons, to know why a
pattern fails you may need to know what CS actually saw.

Using <code>:prepare</code> will tell you what the final input words

were, and what concepts got marked.

<pre><code>:prepare This is a sentence.</code></pre>

<h3 id="verify"><code>:verify</code></h3>

In general all of your responders and rejoinders should have a sample

input comment above them.

<pre><code>#! Do you believe in dogs?

?: ( << you believe dog >>) I do.</code></pre>

This allows you to do

<pre><code>:verify ~mytopic pattern</code></pre>

and have the system test if your rule would match your input.

<h3 id="trace"><code>:trace</code></h3>

You can get a trace of various system functions.

<pre><code>:trace pattern</code></pre>

will show you pattern matching and match variable binding. Also

useful if done before <code>:testpattern</code>.
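A sketch of a short debugging session that combines the two commands. The pattern and sentence are illustrative, and <code>:trace none</code> is assumed here as the way to switch tracing back off:

<pre><code>:trace pattern
:testpattern ( I like *~2 cat ) I like a yellow cat
:trace none</code></pre>

With tracing on, the <code>:testpattern</code> run should show each token being tried, which makes it easier to see where a failing pattern stops.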

<h2 id="overrulingsupplementing-cs-matching">Overruling/Supplementing CS

Matching</h2>

Sometimes you want to supplement the marking of concepts done by CS by
adding your own marks. This is particularly useful for handling idioms
where no keyword exists. I set <code>$cs_prepass</code> to be a topic
which looks for idioms. A rule there that marks the work topic
will cause the work topic to react later as though one of its
keywords was given.
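A sketch of such a prepass rule, by analogy with the <code>^unmark</code> example in this section. The idiom and the <code>~work</code> topic name are illustrative assumptions:

<pre><code># in the prepass topic: treat the idiom as if a ~work keyword had been typed there
u: ( _"daily grind" ) ^mark(~work _0)</code></pre>

The match variable supplies the location, so the added mark sits on the words of the idiom itself.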

Likewise sometimes you want to disable some marking. For example,

chocolate is both a flavor and a color. To avoid going to the

color topic incorrectly I might do this:

<pre><code>u: ( << _chocolate [taste eat] >> ) ^unmark(~colorTopic _0)</code></pre>

If the above rule detects chocolate in the apparent context
of eating, it will unmark any reference to <code>~colorTopic</code>
found at the location of the word chocolate.

<h2 id="graduation-exercise">Graduation Exercise</h2>

The pattern matching system of ChatScript has esoteric abilities. I

was asked if I would implement an additional one that would look

something like this:

which he wanted to mean: find all those words, in any order, with

each word after the first within a gap range of 3 from the previous

word.

So it could recognize: green is the nose with my mucus or

my nose puts forth green mucus and not match while green is

my favorite color or I don’t want to see it in my nose

mucus.

I replied it could probably already be done in CS as it stood, and a

few minutes later had whipped up code to do that. Your advanced

challenge, if you care to think about it and really warp your mind, is

to think of a way to do it yourself. That will prove you really

understand what can be done in pattern matching. Answer is on the next

page.

<pre><code>patternmacro: ^nearbyword(^word)

[

(@_0+ *~3 _^word ^eval(_0 = _1))

(@_0- *~3 _^word ^eval(_0 = _1))

]</code></pre>

The macro contains two choices, a sequence that looks forward from

where you are and a sequence that looks backwards. Using a nested

<code>()</code> the system will effectively treat that as a single

match, which makes it a single token to be used in a <code>[]</code>

choice.

Whichever <code>()</code> finds the next word, it memorizes where it

is, then sets the current <code>_0</code> location to the new word and

the choice ends.

While you can’t do code execution directly in a pattern, and you

can’t call out to user-defined outputmacros, you can call out to system

functions, and <code>^eval</code> lets you do any amount of normal code

execution. So this allows us to assign the new match variable to the

old. And assigning match variables means assigning all of their

attributes, including original value, canonical value, and actual

position data.

<pre><code>topic: ~test()

u: ( _green ^nearbyword(nose) ^nearbyword(mucus)) You have a disgusting nose.</code></pre>

The test pattern, therefore, finds the first word and sets the current

<code>_0</code> location. Then it uses the macro to find the next word

and change location, and then the next word.

</body>

</html>
