-
Notifications
You must be signed in to change notification settings - Fork 13
Expand file tree
/
Copy pathsubstring-search.html
More file actions
384 lines (339 loc) · 127 KB
/
substring-search.html
File metadata and controls
384 lines (339 loc) · 127 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
<!DOCTYPE html>
<html lang="en-US">
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width,initial-scale=1">
<title>Algorithm - Substring Search</title>
<meta name="generator" content="VuePress 1.8.2">
<link rel="icon" type="image/png" sizes="32x32" href="/favicon-32x32.png">
<link rel="icon" type="image/png" sizes="16x16" href="/favicon-16x16.png">
<link rel="manifest" href="/site.webmanifest">
<link rel="apple-touch-icon" sizes="180x180" href="/apple-touch-icon.png">
<link rel="mask-icon" href="/safari-pinned-tab.svg" color="#5bbad5">
<meta name="description" content="Introduction To Knuth-Morris-Pratt (KMP) Algorithm, Introduction to Rabin-Karp Algorithm, KMP Algorithm in C, Python Implementation of KMP algorithm.">
<meta property="og:site_name" content="DevTut">
<meta property="og:title" content="Algorithm - Substring Search">
<meta property="og:description" content="Introduction To Knuth-Morris-Pratt (KMP) Algorithm, Introduction to Rabin-Karp Algorithm, KMP Algorithm in C, Python Implementation of KMP algorithm.">
<meta property="og:type" content="article">
<meta property="og:url" content="/algorithm/substring-search.html">
<meta property="og:image" content="/logo.png">
<meta name="twitter:title" content="Algorithm - Substring Search">
<meta name="twitter:description" content="Introduction To Knuth-Morris-Pratt (KMP) Algorithm, Introduction to Rabin-Karp Algorithm, KMP Algorithm in C, Python Implementation of KMP algorithm.">
<meta name="twitter:url" content="/algorithm/substring-search.html">
<meta name="twitter:card" content="summary_large_image">
<meta name="twitter:image" content="/logo.png">
<meta name="theme-color" content="#ffffff">
<meta name="apple-mobile-web-app-capable" content="yes">
<meta name="apple-mobile-web-app-status-bar-style" content="black">
<meta name="msapplication-TileImage" content="/mstile-150x150.png">
<meta name="msapplication-TileColor" content="#da532c">
<meta name="google-site-verification" content="76_rKXgwMVIjd-axJC_1zPV9OS4mEjvtgjYOWVkAdnQ">
<link rel="preload" href="/assets/css/0.styles.60619e34.css" as="style"><link rel="preload" href="/assets/js/app.1779e102.js" as="script"><link rel="preload" href="/assets/js/3.2cfa8016.js" as="script"><link rel="preload" href="/assets/js/74.4a6de632.js" as="script">
<link rel="stylesheet" href="/assets/css/0.styles.60619e34.css">
</head>
<body>
<div id="app" data-server-rendered="true"><div class="theme-container"><header class="navbar"><div class="sidebar-button"><svg xmlns="http://www.w3.org/2000/svg" aria-hidden="true" role="img" viewBox="0 0 448 512" class="icon"><path fill="currentColor" d="M436 124H12c-6.627 0-12-5.373-12-12V80c0-6.627 5.373-12 12-12h424c6.627 0 12 5.373 12 12v32c0 6.627-5.373 12-12 12zm0 160H12c-6.627 0-12-5.373-12-12v-32c0-6.627 5.373-12 12-12h424c6.627 0 12 5.373 12 12v32c0 6.627-5.373 12-12 12zm0 160H12c-6.627 0-12-5.373-12-12v-32c0-6.627 5.373-12 12-12h424c6.627 0 12 5.373 12 12v32c0 6.627-5.373 12-12 12z"></path></svg></div> <a href="/" class="home-link router-link-active"><!----> <span class="site-name">DevTut</span></a> <div class="links"><form id="search-form" role="search" class="algolia-search-wrapper search-box"><input id="algolia-search-input" class="search-query"></form> <nav class="nav-links can-hide"> <a href="https://github.com/devtut/generate" target="_blank" rel="noopener noreferrer" class="repo-link">
GitHub
<span><svg xmlns="http://www.w3.org/2000/svg" aria-hidden="true" focusable="false" x="0px" y="0px" viewBox="0 0 100 100" width="15" height="15" class="icon outbound"><path fill="currentColor" d="M18.8,85.1h56l0,0c2.2,0,4-1.8,4-4v-32h-8v28h-48v-48h28v-8h-32l0,0c-2.2,0-4,1.8-4,4v56C14.8,83.3,16.6,85.1,18.8,85.1z"></path> <polygon fill="currentColor" points="45.7,48.7 51.3,54.3 77.2,28.5 77.2,37.2 85.2,37.2 85.2,14.9 62.8,14.9 62.8,22.9 71.5,22.9"></polygon></svg> <span class="sr-only">(opens new window)</span></span></a></nav></div></header> <div class="sidebar-mask"></div> <aside class="sidebar"><nav class="nav-links"> <a href="https://github.com/devtut/generate" target="_blank" rel="noopener noreferrer" class="repo-link">
GitHub
<span><svg xmlns="http://www.w3.org/2000/svg" aria-hidden="true" focusable="false" x="0px" y="0px" viewBox="0 0 100 100" width="15" height="15" class="icon outbound"><path fill="currentColor" d="M18.8,85.1h56l0,0c2.2,0,4-1.8,4-4v-32h-8v28h-48v-48h28v-8h-32l0,0c-2.2,0-4,1.8-4,4v56C14.8,83.3,16.6,85.1,18.8,85.1z"></path> <polygon fill="currentColor" points="45.7,48.7 51.3,54.3 77.2,28.5 77.2,37.2 85.2,37.2 85.2,14.9 62.8,14.9 62.8,22.9 71.5,22.9"></polygon></svg> <span class="sr-only">(opens new window)</span></span></a></nav> <ul class="sidebar-links"><li><section class="sidebar-group depth-0"><p class="sidebar-heading open"><span>Algorithm</span> <!----></p> <ul class="sidebar-links sidebar-group-items"><li><a href="/algorithm/" aria-current="page" class="sidebar-link">Disclaimer</a></li><li><a href="/algorithm/getting-started-with-algorithm.html" class="sidebar-link">Getting started with algorithm</a></li><li><a href="/algorithm/algorithm-complexity.html" class="sidebar-link">Algorithm Complexity</a></li><li><a href="/algorithm/big-o-notation.html" class="sidebar-link">Big-O Notation</a></li><li><a href="/algorithm/trees.html" class="sidebar-link">Trees</a></li><li><a href="/algorithm/binary-search-trees.html" class="sidebar-link">Binary Search Trees</a></li><li><a href="/algorithm/check-if-a-tree-is-bst-or-not.html" class="sidebar-link">Check if a tree is BST or not</a></li><li><a href="/algorithm/binary-tree-traversals.html" class="sidebar-link">Binary Tree traversals</a></li><li><a href="/algorithm/lowest-common-ancestor-of-a-binary-tree.html" class="sidebar-link">Lowest common ancestor of a Binary Tree</a></li><li><a href="/algorithm/graph.html" class="sidebar-link">Graph</a></li><li><a href="/algorithm/graph-traversals.html" class="sidebar-link">Graph Traversals</a></li><li><a href="/algorithm/dijkstras-algorithm.html" class="sidebar-link">Dijkstra’s Algorithm</a></li><li><a href="/algorithm/a-pathfinding.html" class="sidebar-link">A* Pathfinding</a></li><li><a href="/algorithm/a-pathfinding-algorithm.html" class="sidebar-link">A* Pathfinding Algorithm</a></li><li><a href="/algorithm/dynamic-programming.html" class="sidebar-link">Dynamic Programming</a></li><li><a href="/algorithm/applications-of-dynamic-programming.html" class="sidebar-link">Applications of Dynamic Programming</a></li><li><a href="/algorithm/kruskal-s-algorithm.html" class="sidebar-link">Kruskal's Algorithm</a></li><li><a href="/algorithm/greedy-algorithms.html" class="sidebar-link">Greedy Algorithms</a></li><li><a href="/algorithm/applications-of-greedy-technique.html" class="sidebar-link">Applications of Greedy technique</a></li><li><a href="/algorithm/prim-s-algorithm.html" class="sidebar-link">Prim's Algorithm</a></li><li><a href="/algorithm/bellman-ford-algorithm.html" class="sidebar-link">Bellman–Ford Algorithm</a></li><li><a href="/algorithm/line-algorithm.html" class="sidebar-link">Line Algorithm</a></li><li><a href="/algorithm/floyd-warshall-algorithm.html" class="sidebar-link">Floyd-Warshall Algorithm</a></li><li><a href="/algorithm/catalan-number-algorithm.html" class="sidebar-link">Catalan Number Algorithm</a></li><li><a href="/algorithm/multithreaded-algorithms.html" class="sidebar-link">Multithreaded Algorithms</a></li><li><a href="/algorithm/knuth-morris-pratt-kmp-algorithm.html" class="sidebar-link">Knuth Morris Pratt (KMP) Algorithm</a></li><li><a href="/algorithm/edit-distance-dynamic-algorithm.html" class="sidebar-link">Edit Distance Dynamic Algorithm</a></li><li><a href="/algorithm/online-algorithms.html" class="sidebar-link">Online algorithms</a></li><li><a href="/algorithm/integer-partition-algorithm.html" class="sidebar-link">Integer Partition Algorithm</a></li><li><a href="/algorithm/maximum-path-sum-algorithm.html" class="sidebar-link">Maximum Path Sum Algorithm</a></li><li><a href="/algorithm/maximum-subarray-algorithm.html" class="sidebar-link">Maximum Subarray Algorithm</a></li><li><a href="/algorithm/sliding-window-algorithm.html" class="sidebar-link">Sliding Window Algorithm</a></li><li><a href="/algorithm/sorting.html" class="sidebar-link">Sorting</a></li><li><a href="/algorithm/bubble-sort.html" class="sidebar-link">Bubble Sort</a></li><li><a href="/algorithm/merge-sort.html" class="sidebar-link">Merge Sort</a></li><li><a href="/algorithm/insertion-sort.html" class="sidebar-link">Insertion Sort</a></li><li><a href="/algorithm/bucket-sort.html" class="sidebar-link">Bucket Sort</a></li><li><a href="/algorithm/quicksort.html" class="sidebar-link">Quicksort</a></li><li><a href="/algorithm/counting-sort.html" class="sidebar-link">Counting Sort</a></li><li><a href="/algorithm/heap-sort.html" class="sidebar-link">Heap Sort</a></li><li><a href="/algorithm/cycle-sort.html" class="sidebar-link">Cycle Sort</a></li><li><a href="/algorithm/odd-even-sort.html" class="sidebar-link">Odd-Even Sort</a></li><li><a href="/algorithm/selection-sort.html" class="sidebar-link">Selection Sort</a></li><li><a href="/algorithm/pigeonhole-sort.html" class="sidebar-link">Pigeonhole Sort</a></li><li><a href="/algorithm/radix-sort.html" class="sidebar-link">Radix Sort</a></li><li><a href="/algorithm/shell-sort.html" class="sidebar-link">Shell Sort</a></li><li><a href="/algorithm/pancake-sort.html" class="sidebar-link">Pancake Sort</a></li><li><a href="/algorithm/searching.html" class="sidebar-link">Searching</a></li><li><a href="/algorithm/substring-search.html" aria-current="page" class="active sidebar-link">Substring Search</a><ul class="sidebar-sub-headers"><li class="sidebar-sub-header"><a href="/algorithm/substring-search.html#introduction-to-knuth-morris-pratt-kmp-algorithm" class="sidebar-link">Introduction To Knuth-Morris-Pratt (KMP) Algorithm</a></li><li class="sidebar-sub-header"><a href="/algorithm/substring-search.html#introduction-to-rabin-karp-algorithm" class="sidebar-link">Introduction to Rabin-Karp Algorithm</a></li><li class="sidebar-sub-header"><a href="/algorithm/substring-search.html#kmp-algorithm-in-c" class="sidebar-link">KMP Algorithm in C</a></li><li class="sidebar-sub-header"><a href="/algorithm/substring-search.html#python-implementation-of-kmp-algorithm" class="sidebar-link">Python Implementation of KMP algorithm.</a></li></ul></li><li><a href="/algorithm/breadth-first-search.html" class="sidebar-link">Breadth-First Search</a></li><li><a href="/algorithm/depth-first-search.html" class="sidebar-link">Depth First Search</a></li><li><a href="/algorithm/hash-functions.html" class="sidebar-link">Hash Functions</a></li><li><a href="/algorithm/travelling-salesman.html" class="sidebar-link">Travelling Salesman</a></li><li><a href="/algorithm/shortest-common-supersequence-problem.html" class="sidebar-link">Shortest Common Supersequence Problem</a></li><li><a href="/algorithm/knapsack-problem.html" class="sidebar-link">Knapsack Problem</a></li><li><a href="/algorithm/equation-solving.html" class="sidebar-link">Equation Solving</a></li><li><a href="/algorithm/longest-common-subsequence.html" class="sidebar-link">Longest Common Subsequence</a></li><li><a href="/algorithm/longest-increasing-subsequence.html" class="sidebar-link">Longest Increasing Subsequence</a></li><li><a href="/algorithm/check-two-strings-are-anagrams.html" class="sidebar-link">Check two strings are anagrams</a></li><li><a href="/algorithm/pascal-s-triangle.html" class="sidebar-link">Pascal's Triangle</a></li><li><a href="/algorithm/algo-print-a-m-n-matrix-in-square-wise.html" class="sidebar-link">Algo:- Print a m*n matrix in square wise</a></li><li><a href="/algorithm/matrix-exponentiation.html" class="sidebar-link">Matrix Exponentiation</a></li><li><a href="/algorithm/polynomial-time-bounded-algorithm-for-minimum-vertex-cover.html" class="sidebar-link">polynomial-time bounded algorithm for Minimum Vertex Cover</a></li><li><a href="/algorithm/dynamic-time-warping.html" class="sidebar-link">Dynamic Time Warping</a></li><li><a href="/algorithm/fast-fourier-transform.html" class="sidebar-link">Fast Fourier Transform</a></li><li><a href="/algorithm/pseudocode.html" class="sidebar-link">Pseudocode</a></li><li><a href="/algorithm/contributors.html" class="sidebar-link">The Contributors</a></li></ul></section></li></ul> </aside> <main class="page"> <div class="theme-default-content content__default"><h1 id="substring-search"><a href="#substring-search" class="header-anchor">#</a> Substring Search</h1> <h2 id="introduction-to-knuth-morris-pratt-kmp-algorithm"><a href="#introduction-to-knuth-morris-pratt-kmp-algorithm" class="header-anchor">#</a> Introduction To Knuth-Morris-Pratt (KMP) Algorithm</h2> <p>Suppose that we have a <strong>text</strong> and a <strong>pattern</strong>. We need to determine if the pattern exists in the text or not. For example:</p> <div class="language-cpp extra-class"><pre class="language-cpp"><code><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">--</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span>
<span class="token operator">|</span> Index <span class="token operator">|</span> <span class="token number">0</span> <span class="token operator">|</span> <span class="token number">1</span> <span class="token operator">|</span> <span class="token number">2</span> <span class="token operator">|</span> <span class="token number">3</span> <span class="token operator">|</span> <span class="token number">4</span> <span class="token operator">|</span> <span class="token number">5</span> <span class="token operator">|</span> <span class="token number">6</span> <span class="token operator">|</span> <span class="token number">7</span> <span class="token operator">|</span>
<span class="token operator">+</span><span class="token operator">--</span><span class="token operator">--</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span>
<span class="token operator">|</span> Text <span class="token operator">|</span> a <span class="token operator">|</span> b <span class="token operator">|</span> c <span class="token operator">|</span> b <span class="token operator">|</span> c <span class="token operator">|</span> g <span class="token operator">|</span> l <span class="token operator">|</span> x <span class="token operator">|</span>
<span class="token operator">+</span><span class="token operator">--</span><span class="token operator">--</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span>
<span class="token operator">+</span><span class="token operator">--</span><span class="token operator">--</span><span class="token operator">--</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span>
<span class="token operator">|</span> Index <span class="token operator">|</span> <span class="token number">0</span> <span class="token operator">|</span> <span class="token number">1</span> <span class="token operator">|</span> <span class="token number">2</span> <span class="token operator">|</span> <span class="token number">3</span> <span class="token operator">|</span>
<span class="token operator">+</span><span class="token operator">--</span><span class="token operator">--</span><span class="token operator">--</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span>
<span class="token operator">|</span> Pattern <span class="token operator">|</span> b <span class="token operator">|</span> c <span class="token operator">|</span> g <span class="token operator">|</span> l <span class="token operator">|</span>
<span class="token operator">+</span><span class="token operator">--</span><span class="token operator">--</span><span class="token operator">--</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span>
</code></pre></div><p>This <strong>pattern</strong> does exist in the <strong>text</strong>. So our substring search should return <strong>3</strong>, the index of the position from which this <strong>pattern</strong> starts. So how does our brute force substring search procedure work?</p> <p>What we usually do is: we start from the <strong>0th</strong> index of the <strong>text</strong> and the <strong>0th</strong> index of our *pattern and we compare <strong>Text[0]</strong> with <strong>Pattern[0]</strong>. Since they are not a match, we go to the next index of our <strong>text</strong> and we compare <strong>Text[1]</strong> with <strong>Pattern[0]</strong>. Since this is a match, we increment the index of our <strong>pattern</strong> and the index of the <strong>Text</strong> also. We compare <strong>Text[2]</strong> with <strong>Pattern[1]</strong>. They are also a match. Following the same procedure stated before, we now compare <strong>Text[3]</strong> with <strong>Pattern[2]</strong>. As they do not match, we start from the next position where we started finding the match. That is index <strong>2</strong> of the <strong>Text</strong>. We compare <strong>Text[2]</strong> with <strong>Pattern[0]</strong>. They don't match. Then incrementing index of the <strong>Text</strong>, we compare <strong>Text[3]</strong> with <strong>Pattern[0]</strong>. They match. Again <strong>Text[4]</strong> and <strong>Pattern[1]</strong> match, <strong>Text[5]</strong> and <strong>Pattern[2]</strong> match and <strong>Text[6]</strong> and <strong>Pattern[3]</strong> match. Since we've reached the end of our <strong>Pattern</strong>, we now return the index from which our match started, that is <strong>3</strong>. If our <strong>pattern</strong> was: <code>bcgll</code>, that means if the <strong>pattern</strong> didn't exist in our <strong>text</strong>, our search should return exception or <strong>-1</strong> or any other predefined value. We can clearly see that, in the worst case, this algorithm would take <code>O(mn)</code> time where <strong>m</strong> is the length of the <strong>Text</strong> and <strong>n</strong> is the length of the <strong>Pattern</strong>. How do we reduce this time complexity? This is where KMP Substring Search Algorithm comes into the picture.</p> <p>The <a href="https://en.wikipedia.org/wiki/Knuth%E2%80%93Morris%E2%80%93Pratt_algorithm" target="_blank" rel="noopener noreferrer">Knuth-Morris-Pratt String Searching Algorithm<span><svg xmlns="http://www.w3.org/2000/svg" aria-hidden="true" focusable="false" x="0px" y="0px" viewBox="0 0 100 100" width="15" height="15" class="icon outbound"><path fill="currentColor" d="M18.8,85.1h56l0,0c2.2,0,4-1.8,4-4v-32h-8v28h-48v-48h28v-8h-32l0,0c-2.2,0-4,1.8-4,4v56C14.8,83.3,16.6,85.1,18.8,85.1z"></path> <polygon fill="currentColor" points="45.7,48.7 51.3,54.3 77.2,28.5 77.2,37.2 85.2,37.2 85.2,14.9 62.8,14.9 62.8,22.9 71.5,22.9"></polygon></svg> <span class="sr-only">(opens new window)</span></span></a> or KMP Algorithm searches for occurrences of a "Pattern" within a main "Text" by employing the observation that when a mismatch occurs, the word itself embodies sufficient information to determine where the next match could begin, thus bypassing re-examination of previously matched characters. The algorithm was conceived in 1970 by <a href="https://en.wikipedia.org/wiki/Donald_Knuth" target="_blank" rel="noopener noreferrer">Donuld Knuth<span><svg xmlns="http://www.w3.org/2000/svg" aria-hidden="true" focusable="false" x="0px" y="0px" viewBox="0 0 100 100" width="15" height="15" class="icon outbound"><path fill="currentColor" d="M18.8,85.1h56l0,0c2.2,0,4-1.8,4-4v-32h-8v28h-48v-48h28v-8h-32l0,0c-2.2,0-4,1.8-4,4v56C14.8,83.3,16.6,85.1,18.8,85.1z"></path> <polygon fill="currentColor" points="45.7,48.7 51.3,54.3 77.2,28.5 77.2,37.2 85.2,37.2 85.2,14.9 62.8,14.9 62.8,22.9 71.5,22.9"></polygon></svg> <span class="sr-only">(opens new window)</span></span></a> and <a href="https://en.wikipedia.org/wiki/Vaughan_Pratt" target="_blank" rel="noopener noreferrer">Vaughan Pratt<span><svg xmlns="http://www.w3.org/2000/svg" aria-hidden="true" focusable="false" x="0px" y="0px" viewBox="0 0 100 100" width="15" height="15" class="icon outbound"><path fill="currentColor" d="M18.8,85.1h56l0,0c2.2,0,4-1.8,4-4v-32h-8v28h-48v-48h28v-8h-32l0,0c-2.2,0-4,1.8-4,4v56C14.8,83.3,16.6,85.1,18.8,85.1z"></path> <polygon fill="currentColor" points="45.7,48.7 51.3,54.3 77.2,28.5 77.2,37.2 85.2,37.2 85.2,14.9 62.8,14.9 62.8,22.9 71.5,22.9"></polygon></svg> <span class="sr-only">(opens new window)</span></span></a> and independently by <a href="https://en.wikipedia.org/wiki/James_H._Morris" target="_blank" rel="noopener noreferrer">James H. Morris<span><svg xmlns="http://www.w3.org/2000/svg" aria-hidden="true" focusable="false" x="0px" y="0px" viewBox="0 0 100 100" width="15" height="15" class="icon outbound"><path fill="currentColor" d="M18.8,85.1h56l0,0c2.2,0,4-1.8,4-4v-32h-8v28h-48v-48h28v-8h-32l0,0c-2.2,0-4,1.8-4,4v56C14.8,83.3,16.6,85.1,18.8,85.1z"></path> <polygon fill="currentColor" points="45.7,48.7 51.3,54.3 77.2,28.5 77.2,37.2 85.2,37.2 85.2,14.9 62.8,14.9 62.8,22.9 71.5,22.9"></polygon></svg> <span class="sr-only">(opens new window)</span></span></a>. The trio published it jointly in 1977.</p> <p>Let's extend our example <strong>Text</strong> and <strong>Pattern</strong> for better understanding:</p> <div class="language-cpp extra-class"><pre class="language-cpp"><code><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">--</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">+</span>
<span class="token operator">|</span> Index <span class="token operator">|</span><span class="token number">0</span> <span class="token operator">|</span><span class="token number">1</span> <span class="token operator">|</span><span class="token number">2</span> <span class="token operator">|</span><span class="token number">3</span> <span class="token operator">|</span><span class="token number">4</span> <span class="token operator">|</span><span class="token number">5</span> <span class="token operator">|</span><span class="token number">6</span> <span class="token operator">|</span><span class="token number">7</span> <span class="token operator">|</span><span class="token number">8</span> <span class="token operator">|</span><span class="token number">9</span> <span class="token operator">|</span><span class="token number">10</span><span class="token operator">|</span><span class="token number">11</span><span class="token operator">|</span><span class="token number">12</span><span class="token operator">|</span><span class="token number">13</span><span class="token operator">|</span><span class="token number">14</span><span class="token operator">|</span><span class="token number">15</span><span class="token operator">|</span><span class="token number">16</span><span class="token operator">|</span><span class="token number">17</span><span class="token operator">|</span><span class="token number">18</span><span class="token operator">|</span><span class="token number">19</span><span class="token operator">|</span><span class="token number">20</span><span class="token operator">|</span><span class="token number">21</span><span class="token operator">|</span><span class="token number">22</span><span class="token operator">|</span>
<span class="token operator">+</span><span class="token operator">--</span><span class="token operator">--</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">+</span>
<span class="token operator">|</span> Text <span class="token operator">|</span>a <span class="token operator">|</span>b <span class="token operator">|</span>c <span class="token operator">|</span>x <span class="token operator">|</span>a <span class="token operator">|</span>b <span class="token operator">|</span>c <span class="token operator">|</span>d <span class="token operator">|</span>a <span class="token operator">|</span>b <span class="token operator">|</span>x <span class="token operator">|</span>a <span class="token operator">|</span>b <span class="token operator">|</span>c <span class="token operator">|</span>d <span class="token operator">|</span>a <span class="token operator">|</span>b <span class="token operator">|</span>c <span class="token operator">|</span>d <span class="token operator">|</span>a <span class="token operator">|</span>b <span class="token operator">|</span>c <span class="token operator">|</span>y <span class="token operator">|</span>
<span class="token operator">+</span><span class="token operator">--</span><span class="token operator">--</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">+</span>
<span class="token operator">+</span><span class="token operator">--</span><span class="token operator">--</span><span class="token operator">--</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span>
<span class="token operator">|</span> Index <span class="token operator">|</span> <span class="token number">0</span> <span class="token operator">|</span> <span class="token number">1</span> <span class="token operator">|</span> <span class="token number">2</span> <span class="token operator">|</span> <span class="token number">3</span> <span class="token operator">|</span> <span class="token number">4</span> <span class="token operator">|</span> <span class="token number">5</span> <span class="token operator">|</span> <span class="token number">6</span> <span class="token operator">|</span> <span class="token number">7</span> <span class="token operator">|</span>
<span class="token operator">+</span><span class="token operator">--</span><span class="token operator">--</span><span class="token operator">--</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span>
<span class="token operator">|</span> Pattern <span class="token operator">|</span> a <span class="token operator">|</span> b <span class="token operator">|</span> c <span class="token operator">|</span> d <span class="token operator">|</span> a <span class="token operator">|</span> b <span class="token operator">|</span> c <span class="token operator">|</span> y <span class="token operator">|</span>
<span class="token operator">+</span><span class="token operator">--</span><span class="token operator">--</span><span class="token operator">--</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span>
</code></pre></div><p>At first, our <strong>Text</strong> and <strong>Pattern</strong> matches till index <strong>2</strong>. <strong>Text[3]</strong> and <strong>Pattern[3]</strong> doesn't match. So our aim is to not go backwards in this <strong>Text</strong>, that is, in case of a mismatch, we don't want our matching to begin again from the position that we started matching with. To achieve that, we'll look for a <strong>suffix</strong> in our <strong>Pattern</strong> right before our mismatch occurred (substring <strong>abc</strong>), which is also a <strong>prefix</strong> of the substring of our <strong>Pattern</strong>. For our example, since all the characters are unique, there is no suffix, that is the prefix of our matched substring. So what that means is, our next comparison will start from index <strong>0</strong>. Hold on for a bit, you'll understand why we did this. Next, we compare <strong>Text[3]</strong> with <strong>Pattern[0]</strong> and it doesn't match. After that, for <strong>Text</strong> from index <strong>4</strong> to index <strong>9</strong> and for <strong>Pattern</strong> from index <strong>0</strong> to index <strong>5</strong>, we find a match. We find a mismatch in <strong>Text[10]</strong> and <strong>Pattern[6]</strong>. So we take the substring from <strong>Pattern</strong> right before the point where mismatch occurs (substring <strong>abcdabc</strong>), we check for a suffix, that is also a prefix of this substring. We can see here <strong>ab</strong> is both the suffix and prefix of this substring. What that means is, since we've matched until <strong>Text[10]</strong>, the characters right before the mismatch is <strong>ab</strong>. What we can infer from it is that since <strong>ab</strong> is also a prefix of the substring we took, we don't have to check <strong>ab</strong> again and the next check can start from <strong>Text[10]</strong> and <strong>Pattern[2]</strong>. We didn't have to look back to the whole <strong>Text</strong>, we can start directly from where our mismatch occurred. Now we check <strong>Text[10]</strong> and <strong>Pattern[2]</strong>, since it's a mismatch, and the substring before mismatch (<strong>abc</strong>) doesn't contain a suffix which is also a prefix, we check <strong>Text[10]</strong> and <strong>Pattern[0]</strong>, they don't match. After that for <strong>Text</strong> from index <strong>11</strong> to index <strong>17</strong> and for <strong>Pattern</strong> from index <strong>0</strong> to index <strong>6</strong>. We find a mismatch in <strong>Text[18]</strong> and <strong>Pattern[7]</strong>. So again we check the substring before mismatch (substring <strong>abcdabc</strong>) and find <strong>abc</strong> is both the suffix and the prefix. So since we matched till <strong>Pattern[7]</strong>, <strong>abc</strong> must be before <strong>Text[18]</strong>. That means, we don't need to compare until <strong>Text[17]</strong> and our comparison will start from <strong>Text[18]</strong> and <strong>Pattern[3]</strong>. Thus we will find a match and we'll return <strong>15</strong> which is our starting index of the match. This is how our KMP Substring Search works using suffix and prefix information.</p> <p>Now, how do we efficiently compute if suffix is same as prefix and at what point to start the check if there is a mismatch of character between <strong>Text</strong> and <strong>Pattern</strong>. Let's take a look at an example:</p> <div class="language-cpp extra-class"><pre class="language-cpp"><code><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">--</span><span class="token operator">--</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span>
<span class="token operator">|</span> Index <span class="token operator">|</span> <span class="token number">0</span> <span class="token operator">|</span> <span class="token number">1</span> <span class="token operator">|</span> <span class="token number">2</span> <span class="token operator">|</span> <span class="token number">3</span> <span class="token operator">|</span> <span class="token number">4</span> <span class="token operator">|</span> <span class="token number">5</span> <span class="token operator">|</span> <span class="token number">6</span> <span class="token operator">|</span> <span class="token number">7</span> <span class="token operator">|</span>
<span class="token operator">+</span><span class="token operator">--</span><span class="token operator">--</span><span class="token operator">--</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span>
<span class="token operator">|</span> Pattern <span class="token operator">|</span> a <span class="token operator">|</span> b <span class="token operator">|</span> c <span class="token operator">|</span> d <span class="token operator">|</span> a <span class="token operator">|</span> b <span class="token operator">|</span> c <span class="token operator">|</span> a <span class="token operator">|</span>
<span class="token operator">+</span><span class="token operator">--</span><span class="token operator">--</span><span class="token operator">--</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span>
</code></pre></div><p>We'll generate an array containing the required information. Let's call the array <strong>S</strong>. The size of the array will be same as the length of the pattern. Since the first letter of the <strong>Pattern</strong> can't be the suffix of any prefix, we'll put <strong>S[0]</strong> = <strong>0</strong>. We take <strong>i</strong> = <strong>1</strong> and <strong>j</strong> = <strong>0</strong> at first. At each step we compare <strong>Pattern[i]</strong> and <strong>Pattern[j]</strong> and increment <strong>i</strong>. If there is a match we put <strong>S[i]</strong> = <strong>j</strong> + <strong>1</strong> and increment <strong>j</strong>, if there is a mismatch, we check the previous value position of <strong>j</strong> (if available) and set <strong>j</strong> = <strong>S[j-1]</strong> (if <strong>j</strong> is not equal to <strong>0</strong>), we keep doing this until <strong>S[j]</strong> doesn't match with <strong>S[i]</strong> or <strong>j</strong> doesn't become <strong>0</strong>. For the later one, we put <strong>S[i]</strong> = <strong>0</strong>. For our example:</p> <div class="language-cpp extra-class"><pre class="language-cpp"><code> j i
<span class="token operator">+</span><span class="token operator">--</span><span class="token operator">--</span><span class="token operator">--</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span>
<span class="token operator">|</span> Index <span class="token operator">|</span> <span class="token number">0</span> <span class="token operator">|</span> <span class="token number">1</span> <span class="token operator">|</span> <span class="token number">2</span> <span class="token operator">|</span> <span class="token number">3</span> <span class="token operator">|</span> <span class="token number">4</span> <span class="token operator">|</span> <span class="token number">5</span> <span class="token operator">|</span> <span class="token number">6</span> <span class="token operator">|</span> <span class="token number">7</span> <span class="token operator">|</span>
<span class="token operator">+</span><span class="token operator">--</span><span class="token operator">--</span><span class="token operator">--</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span>
<span class="token operator">|</span> Pattern <span class="token operator">|</span> a <span class="token operator">|</span> b <span class="token operator">|</span> c <span class="token operator">|</span> d <span class="token operator">|</span> a <span class="token operator">|</span> b <span class="token operator">|</span> c <span class="token operator">|</span> a <span class="token operator">|</span>
<span class="token operator">+</span><span class="token operator">--</span><span class="token operator">--</span><span class="token operator">--</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span>
</code></pre></div><p><strong>Pattern[j]</strong> and <strong>Pattern[i]</strong> don't match, so we increment <strong>i</strong> and since <strong>j</strong> is <strong>0</strong>, we don't check the previous value and put <strong>Pattern[i]</strong> = <strong>0</strong>. If we keep incrementing <strong>i</strong>, for <strong>i</strong> = <strong>4</strong>, we'll get a match, so we put <strong>S[i]</strong> = <strong>S[4]</strong> = <strong>j</strong> + <strong>1</strong> = <strong>0</strong> + <strong>1</strong> = <strong>1</strong> and increment <strong>j</strong> and <strong>i</strong>. Our array will look like:</p> <div class="language-cpp extra-class"><pre class="language-cpp"><code> j i
<span class="token operator">+</span><span class="token operator">--</span><span class="token operator">--</span><span class="token operator">--</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span>
<span class="token operator">|</span> Index <span class="token operator">|</span> <span class="token number">0</span> <span class="token operator">|</span> <span class="token number">1</span> <span class="token operator">|</span> <span class="token number">2</span> <span class="token operator">|</span> <span class="token number">3</span> <span class="token operator">|</span> <span class="token number">4</span> <span class="token operator">|</span> <span class="token number">5</span> <span class="token operator">|</span> <span class="token number">6</span> <span class="token operator">|</span> <span class="token number">7</span> <span class="token operator">|</span>
<span class="token operator">+</span><span class="token operator">--</span><span class="token operator">--</span><span class="token operator">--</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span>
<span class="token operator">|</span> Pattern <span class="token operator">|</span> a <span class="token operator">|</span> b <span class="token operator">|</span> c <span class="token operator">|</span> d <span class="token operator">|</span> a <span class="token operator">|</span> b <span class="token operator">|</span> c <span class="token operator">|</span> a <span class="token operator">|</span>
<span class="token operator">+</span><span class="token operator">--</span><span class="token operator">--</span><span class="token operator">--</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span>
<span class="token operator">|</span> S <span class="token operator">|</span> <span class="token number">0</span> <span class="token operator">|</span> <span class="token number">0</span> <span class="token operator">|</span> <span class="token number">0</span> <span class="token operator">|</span> <span class="token number">0</span> <span class="token operator">|</span> <span class="token number">1</span> <span class="token operator">|</span> <span class="token operator">|</span> <span class="token operator">|</span> <span class="token operator">|</span>
<span class="token operator">+</span><span class="token operator">--</span><span class="token operator">--</span><span class="token operator">--</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span>
</code></pre></div><p>Since <strong>Pattern[1]</strong> and <strong>Pattern[5]</strong> is a match, we put <strong>S[i]</strong> = <strong>S[5]</strong> = <strong>j</strong> + <strong>1</strong> = <strong>1</strong> + <strong>1</strong> = <strong>2</strong>. If we continue, we'll find a mismatch for <strong>j</strong> = <strong>3</strong> and <strong>i</strong> = <strong>7</strong>. Since <strong>j</strong> is not equal to <strong>0</strong>, we put <strong>j</strong> = <strong>S[j-1]</strong>. And we'll compare the characters at <strong>i</strong> and <strong>j</strong> are same or not, since they are same, we'll put <strong>S[i]</strong> = <strong>j</strong> + 1. Our completed array will look like:</p> <div class="language-cpp extra-class"><pre class="language-cpp"><code><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">--</span><span class="token operator">--</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span>
<span class="token operator">|</span> S <span class="token operator">|</span> <span class="token number">0</span> <span class="token operator">|</span> <span class="token number">0</span> <span class="token operator">|</span> <span class="token number">0</span> <span class="token operator">|</span> <span class="token number">0</span> <span class="token operator">|</span> <span class="token number">1</span> <span class="token operator">|</span> <span class="token number">2</span> <span class="token operator">|</span> <span class="token number">3</span> <span class="token operator">|</span> <span class="token number">1</span> <span class="token operator">|</span>
<span class="token operator">+</span><span class="token operator">--</span><span class="token operator">--</span><span class="token operator">--</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span>
</code></pre></div><p>This is our required array. Here a nonzero-value of <strong>S[i]</strong> means there is a <strong>S[i]</strong> length suffix same as the prefix in that substring (substring from <strong>0</strong> to <strong>i</strong>) and the next comparison will start from <strong>S[i]</strong> + <strong>1</strong> position of the <strong>Pattern</strong>. Our algorithm to generate the array would look like:</p> <div class="language-cpp extra-class"><pre class="language-cpp"><code>Procedure <span class="token function">GenerateSuffixArray</span><span class="token punctuation">(</span>Pattern<span class="token punctuation">)</span><span class="token operator">:</span>
i <span class="token operator">:</span><span class="token operator">=</span> <span class="token number">1</span>
j <span class="token operator">:</span><span class="token operator">=</span> <span class="token number">0</span>
n <span class="token operator">:</span><span class="token operator">=</span> Pattern<span class="token punctuation">.</span>length
<span class="token keyword">while</span> i is less than n
<span class="token keyword">if</span> Pattern<span class="token punctuation">[</span>i<span class="token punctuation">]</span> is equal to Pattern<span class="token punctuation">[</span>j<span class="token punctuation">]</span>
S<span class="token punctuation">[</span>i<span class="token punctuation">]</span> <span class="token operator">:</span><span class="token operator">=</span> j <span class="token operator">+</span> <span class="token number">1</span>
j <span class="token operator">:</span><span class="token operator">=</span> j <span class="token operator">+</span> <span class="token number">1</span>
i <span class="token operator">:</span><span class="token operator">=</span> i <span class="token operator">+</span> <span class="token number">1</span>
<span class="token keyword">else</span>
<span class="token keyword">if</span> j is <span class="token operator">not</span> equal to <span class="token number">0</span>
j <span class="token operator">:</span><span class="token operator">=</span> S<span class="token punctuation">[</span>j<span class="token operator">-</span><span class="token number">1</span><span class="token punctuation">]</span>
<span class="token keyword">else</span>
S<span class="token punctuation">[</span>i<span class="token punctuation">]</span> <span class="token operator">:</span><span class="token operator">=</span> <span class="token number">0</span>
i <span class="token operator">:</span><span class="token operator">=</span> i <span class="token operator">+</span> <span class="token number">1</span>
end <span class="token keyword">if</span>
end <span class="token keyword">if</span>
end <span class="token keyword">while</span>
</code></pre></div><p>The time complexity to build this array is <code>O(n)</code> and the space complexity is also <code>O(n)</code>. To make sure if you have completely understood the algorithm, try to generate an array for pattern <code>aabaabaa</code> and check if the result matches with <a href="https://i.stack.imgur.com/4aqZk.jpg" target="_blank" rel="noopener noreferrer">this<span><svg xmlns="http://www.w3.org/2000/svg" aria-hidden="true" focusable="false" x="0px" y="0px" viewBox="0 0 100 100" width="15" height="15" class="icon outbound"><path fill="currentColor" d="M18.8,85.1h56l0,0c2.2,0,4-1.8,4-4v-32h-8v28h-48v-48h28v-8h-32l0,0c-2.2,0-4,1.8-4,4v56C14.8,83.3,16.6,85.1,18.8,85.1z"></path> <polygon fill="currentColor" points="45.7,48.7 51.3,54.3 77.2,28.5 77.2,37.2 85.2,37.2 85.2,14.9 62.8,14.9 62.8,22.9 71.5,22.9"></polygon></svg> <span class="sr-only">(opens new window)</span></span></a> one.</p> <p>Now let's do a substring search using the following example:</p> <div class="language-cpp extra-class"><pre class="language-cpp"><code><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">--</span><span class="token operator">--</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span>
<span class="token operator">|</span> Index <span class="token operator">|</span> <span class="token number">0</span> <span class="token operator">|</span> <span class="token number">1</span> <span class="token operator">|</span> <span class="token number">2</span> <span class="token operator">|</span> <span class="token number">3</span> <span class="token operator">|</span> <span class="token number">4</span> <span class="token operator">|</span> <span class="token number">5</span> <span class="token operator">|</span> <span class="token number">6</span> <span class="token operator">|</span> <span class="token number">7</span> <span class="token operator">|</span> <span class="token number">8</span> <span class="token operator">|</span> <span class="token number">9</span> <span class="token operator">|</span><span class="token number">10</span> <span class="token operator">|</span><span class="token number">11</span> <span class="token operator">|</span>
<span class="token operator">+</span><span class="token operator">--</span><span class="token operator">--</span><span class="token operator">--</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span>
<span class="token operator">|</span> Text <span class="token operator">|</span> a <span class="token operator">|</span> b <span class="token operator">|</span> x <span class="token operator">|</span> a <span class="token operator">|</span> b <span class="token operator">|</span> c <span class="token operator">|</span> a <span class="token operator">|</span> b <span class="token operator">|</span> c <span class="token operator">|</span> a <span class="token operator">|</span> b <span class="token operator">|</span> y <span class="token operator">|</span>
<span class="token operator">+</span><span class="token operator">--</span><span class="token operator">--</span><span class="token operator">--</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span>
<span class="token operator">+</span><span class="token operator">--</span><span class="token operator">--</span><span class="token operator">--</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span>
<span class="token operator">|</span> Index <span class="token operator">|</span> <span class="token number">0</span> <span class="token operator">|</span> <span class="token number">1</span> <span class="token operator">|</span> <span class="token number">2</span> <span class="token operator">|</span> <span class="token number">3</span> <span class="token operator">|</span> <span class="token number">4</span> <span class="token operator">|</span> <span class="token number">5</span> <span class="token operator">|</span>
<span class="token operator">+</span><span class="token operator">--</span><span class="token operator">--</span><span class="token operator">--</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span>
<span class="token operator">|</span> Pattern <span class="token operator">|</span> a <span class="token operator">|</span> b <span class="token operator">|</span> c <span class="token operator">|</span> a <span class="token operator">|</span> b <span class="token operator">|</span> y <span class="token operator">|</span>
<span class="token operator">+</span><span class="token operator">--</span><span class="token operator">--</span><span class="token operator">--</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span>
<span class="token operator">|</span> S <span class="token operator">|</span> <span class="token number">0</span> <span class="token operator">|</span> <span class="token number">0</span> <span class="token operator">|</span> <span class="token number">0</span> <span class="token operator">|</span> <span class="token number">1</span> <span class="token operator">|</span> <span class="token number">2</span> <span class="token operator">|</span> <span class="token number">0</span> <span class="token operator">|</span>
<span class="token operator">+</span><span class="token operator">--</span><span class="token operator">--</span><span class="token operator">--</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span><span class="token operator">--</span><span class="token operator">-</span><span class="token operator">+</span>
</code></pre></div><p>We have a <strong>Text</strong>, a <strong>Pattern</strong> and a pre-calculated array <strong>S</strong> using our logic defined before. We compare <strong>Text[0]</strong> and <strong>Pattern[0]</strong> and they are same. <strong>Text[1]</strong> and <strong>Pattern[1]</strong> are same. <strong>Text[2]</strong> and <strong>Pattern[2]</strong> are not same. We check the value at the position right before the mismatch. Since <strong>S[1]</strong> is <strong>0</strong>, there is no suffix that is same as the prefix in our substring and our comparison starts at position <strong>S[1]</strong>, which is <strong>0</strong>. So <strong>Pattern[0]</strong> is not same as <strong>Text[2]</strong>, so we move on. <strong>Text[3]</strong> is same as <strong>Pattern[0]</strong> and there is a match till <strong>Text[8]</strong> and <strong>Pattern[5]</strong>. We go one step back in the <strong>S</strong> array and find <strong>2</strong>. So this means there is a prefix of length <strong>2</strong> which is also the suffix of this substring (<strong>abcab)</strong> which is <strong>ab</strong>. That also means that there is an <strong>ab</strong> before <strong>Text[8]</strong>. So we can safely ignore <strong>Pattern[0]</strong> and <strong>Pattern[1]</strong> and start our next comparison from <strong>Pattern[2]</strong> and <strong>Text[8]</strong>. If we continue, we'll find the <strong>Pattern</strong> in the <strong>Text</strong>. Our procedure will look like:</p> <div class="language-cpp extra-class"><pre class="language-cpp"><code>Procedure <span class="token function">KMP</span><span class="token punctuation">(</span>Text<span class="token punctuation">,</span> Pattern<span class="token punctuation">)</span>
<span class="token function">GenerateSuffixArray</span><span class="token punctuation">(</span>Pattern<span class="token punctuation">)</span>
m <span class="token operator">:</span><span class="token operator">=</span> Text<span class="token punctuation">.</span>Length
n <span class="token operator">:</span><span class="token operator">=</span> Pattern<span class="token punctuation">.</span>Length
i <span class="token operator">:</span><span class="token operator">=</span> <span class="token number">0</span>
j <span class="token operator">:</span><span class="token operator">=</span> <span class="token number">0</span>
<span class="token keyword">while</span> i is less than m
<span class="token keyword">if</span> Pattern<span class="token punctuation">[</span>j<span class="token punctuation">]</span> is equal to Text<span class="token punctuation">[</span>i<span class="token punctuation">]</span>
j <span class="token operator">:</span><span class="token operator">=</span> j <span class="token operator">+</span> <span class="token number">1</span>
i <span class="token operator">:</span><span class="token operator">=</span> i <span class="token operator">+</span> <span class="token number">1</span>
<span class="token keyword">if</span> j is equal to n
<span class="token function">Return</span> <span class="token punctuation">(</span>j<span class="token operator">-</span>i<span class="token punctuation">)</span>
<span class="token keyword">else</span> <span class="token keyword">if</span> i <span class="token operator"><</span> m <span class="token operator">and</span> Pattern<span class="token punctuation">[</span>j<span class="token punctuation">]</span> is <span class="token operator">not</span> equal t Text<span class="token punctuation">[</span>i<span class="token punctuation">]</span>
<span class="token keyword">if</span> j is <span class="token operator">not</span> equal to <span class="token number">0</span>
j <span class="token operator">=</span> S<span class="token punctuation">[</span>j<span class="token operator">-</span><span class="token number">1</span><span class="token punctuation">]</span>
<span class="token keyword">else</span>
i <span class="token operator">:</span><span class="token operator">=</span> i <span class="token operator">+</span> <span class="token number">1</span>
end <span class="token keyword">if</span>
end <span class="token keyword">if</span>
end <span class="token keyword">while</span>
Return <span class="token operator">-</span><span class="token number">1</span>
</code></pre></div><p>The time complexity of this algorithm apart from the Suffix Array Calculation is <code>O(m)</code>. Since <strong>GenerateSuffixArray</strong> takes <code>O(n)</code>, the total time complexity of KMP Algorithm is: <code>O(m+n)</code>.</p> <p>PS: If you want to find multiple occurrences of <strong>Pattern</strong> in the <strong>Text</strong>, instead of returning the value, print it/store it and set <code>j := S[j-1]</code>. Also keep a <code>flag</code> to track whether you have found any occurrence or not and handle it accordingly.</p> <h2 id="introduction-to-rabin-karp-algorithm"><a href="#introduction-to-rabin-karp-algorithm" class="header-anchor">#</a> Introduction to Rabin-Karp Algorithm</h2> <p><a href="https://en.wikipedia.org/wiki/Rabin%E2%80%93Karp_algorithm" target="_blank" rel="noopener noreferrer">Rabin-Karp Algorithm<span><svg xmlns="http://www.w3.org/2000/svg" aria-hidden="true" focusable="false" x="0px" y="0px" viewBox="0 0 100 100" width="15" height="15" class="icon outbound"><path fill="currentColor" d="M18.8,85.1h56l0,0c2.2,0,4-1.8,4-4v-32h-8v28h-48v-48h28v-8h-32l0,0c-2.2,0-4,1.8-4,4v56C14.8,83.3,16.6,85.1,18.8,85.1z"></path> <polygon fill="currentColor" points="45.7,48.7 51.3,54.3 77.2,28.5 77.2,37.2 85.2,37.2 85.2,14.9 62.8,14.9 62.8,22.9 71.5,22.9"></polygon></svg> <span class="sr-only">(opens new window)</span></span></a> is a string searching algorithm created by <a href="https://en.wikipedia.org/wiki/Richard_M._Karp" target="_blank" rel="noopener noreferrer">Richard M. Karp<span><svg xmlns="http://www.w3.org/2000/svg" aria-hidden="true" focusable="false" x="0px" y="0px" viewBox="0 0 100 100" width="15" height="15" class="icon outbound"><path fill="currentColor" d="M18.8,85.1h56l0,0c2.2,0,4-1.8,4-4v-32h-8v28h-48v-48h28v-8h-32l0,0c-2.2,0-4,1.8-4,4v56C14.8,83.3,16.6,85.1,18.8,85.1z"></path> <polygon fill="currentColor" points="45.7,48.7 51.3,54.3 77.2,28.5 77.2,37.2 85.2,37.2 85.2,14.9 62.8,14.9 62.8,22.9 71.5,22.9"></polygon></svg> <span class="sr-only">(opens new window)</span></span></a> and <a href="https://en.wikipedia.org/wiki/Michael_O._Rabin" target="_blank" rel="noopener noreferrer">Michael O. Rabin<span><svg xmlns="http://www.w3.org/2000/svg" aria-hidden="true" focusable="false" x="0px" y="0px" viewBox="0 0 100 100" width="15" height="15" class="icon outbound"><path fill="currentColor" d="M18.8,85.1h56l0,0c2.2,0,4-1.8,4-4v-32h-8v28h-48v-48h28v-8h-32l0,0c-2.2,0-4,1.8-4,4v56C14.8,83.3,16.6,85.1,18.8,85.1z"></path> <polygon fill="currentColor" points="45.7,48.7 51.3,54.3 77.2,28.5 77.2,37.2 85.2,37.2 85.2,14.9 62.8,14.9 62.8,22.9 71.5,22.9"></polygon></svg> <span class="sr-only">(opens new window)</span></span></a> that uses hashing to find any one of a set of pattern strings in a text.</p> <p>A substring of a string is another string that occurs in. For example, <strong>ver</strong> is a substring of <strong>stackoverflow</strong>. Not to be confused with subsequence because <strong>cover</strong> is a subsequence of the same string. In other words, any subset of consecutive letters in a string is a substring of the given string.</p> <p>In Rabin-Karp algorithm, we'll generate a hash of our <strong>pattern</strong> that we are looking for & check if the rolling hash of our <strong>text</strong> matches the <strong>pattern</strong> or not. If it doesn't match, we can guarantee that the <strong>pattern</strong> <strong>doesn't exist</strong> in the <strong>text</strong>. However, if it does match, the <strong>pattern</strong> <strong>can</strong> be present in the <strong>text</strong>. Let's look at an example:</p> <p>Let's say we have a text: <strong>yeminsajid</strong> and we want to find out if the pattern <strong>nsa</strong> exists in the text. To calculate the hash and rolling hash, we'll need to use a prime number. This can be any prime number. Let's take <strong>prime</strong> = <strong>11</strong> for this example. We'll determine hash value using this formula:</p> <div class="language-cpp extra-class"><pre class="language-cpp"><code><span class="token punctuation">(</span><span class="token number">1</span>st letter<span class="token punctuation">)</span> <span class="token function">X</span> <span class="token punctuation">(</span>prime<span class="token punctuation">)</span> <span class="token operator">+</span> <span class="token punctuation">(</span><span class="token number">2</span>nd letter<span class="token punctuation">)</span> <span class="token function">X</span> <span class="token punctuation">(</span>prime<span class="token punctuation">)</span>¹ <span class="token operator">+</span> <span class="token punctuation">(</span><span class="token number">3</span>rd letter<span class="token punctuation">)</span> <span class="token function">X</span> <span class="token punctuation">(</span>prime<span class="token punctuation">)</span>² X <span class="token operator">+</span> <span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span>
</code></pre></div><p>We'll denote:</p> <div class="language-cpp extra-class"><pre class="language-cpp"><code>a <span class="token operator">-></span> <span class="token number">1</span> g <span class="token operator">-></span> <span class="token number">7</span> m <span class="token operator">-></span> <span class="token number">13</span> s <span class="token operator">-></span> <span class="token number">19</span> y <span class="token operator">-></span> <span class="token number">25</span>
b <span class="token operator">-></span> <span class="token number">2</span> h <span class="token operator">-></span> <span class="token number">8</span> n <span class="token operator">-></span> <span class="token number">14</span> t <span class="token operator">-></span> <span class="token number">20</span> z <span class="token operator">-></span> <span class="token number">26</span>
c <span class="token operator">-></span> <span class="token number">3</span> i <span class="token operator">-></span> <span class="token number">9</span> o <span class="token operator">-></span> <span class="token number">15</span> u <span class="token operator">-></span> <span class="token number">21</span>
d <span class="token operator">-></span> <span class="token number">4</span> j <span class="token operator">-></span> <span class="token number">10</span> p <span class="token operator">-></span> <span class="token number">16</span> v <span class="token operator">-></span> <span class="token number">22</span>
e <span class="token operator">-></span> <span class="token number">5</span> k <span class="token operator">-></span> <span class="token number">11</span> q <span class="token operator">-></span> <span class="token number">17</span> w <span class="token operator">-></span> <span class="token number">23</span>
f <span class="token operator">-></span> <span class="token number">6</span> l <span class="token operator">-></span> <span class="token number">12</span> r <span class="token operator">-></span> <span class="token number">18</span> x <span class="token operator">-></span> <span class="token number">24</span>
</code></pre></div><p>The hash value of <strong>nsa</strong> will be:</p> <div class="language- extra-class"><pre class="language-text"><code>
14 X 11⁰ + 19 X 11¹ + 1 X 11² = 344
</code></pre></div><p>Now we find the rolling-hash of our text. If the rolling hash matches with the hash value of our pattern, we'll check if the strings match or not. Since our pattern has <strong>3</strong> letters, we'll take 1st <strong>3</strong> letters <strong>yem</strong> from our text and calculate hash value. We get:</p> <div class="language-cpp extra-class"><pre class="language-cpp"><code><span class="token number">25</span> X <span class="token number">11</span>⁰ <span class="token operator">+</span> <span class="token number">5</span> X <span class="token number">11</span>¹ <span class="token operator">+</span> <span class="token number">13</span> X <span class="token number">11</span>² <span class="token operator">=</span> <span class="token number">1653</span>
</code></pre></div><p>This value doesn't match with our pattern's hash value. So the string doesn't exists here. Now we need to consider the next step. To calculate the hash value of our next string <strong>emi</strong>. We can calculate this using our formula. But that would be rather trivial and cost us more. Instead, we use another technique.</p> <ul><li>We subtract the value of the <strong>First Letter of Previous String</strong> from our current hash value. In this case, <strong>y</strong>. We get, <code>1653 - 25 = 1628</code>.</li> <li>We divide the difference with our <strong>prime</strong>, which is <strong>11</strong> for this example. We get, <code>1628 / 11 = 148</code>.</li> <li>We add <strong>new letter X (prime)ᵐ⁻¹</strong>, where <strong>m</strong> is the length of the pattern, with the quotient, which is <strong>i</strong> = <strong>9</strong>. We get, <code>148 + 9 X 11² = 1237</code>.</li></ul> <p>The new hash value is not equal to our patterns hash value. Moving on, for <strong>n</strong> we get:</p> <div class="language-cpp extra-class"><pre class="language-cpp"><code>Previous String<span class="token operator">:</span> emi
First Letter of Previous String<span class="token operator">:</span> <span class="token function">e</span><span class="token punctuation">(</span><span class="token number">5</span><span class="token punctuation">)</span>
New Letter<span class="token operator">:</span> <span class="token function">n</span><span class="token punctuation">(</span><span class="token number">14</span><span class="token punctuation">)</span>
New String<span class="token operator">:</span> <span class="token string">"min"</span>
<span class="token number">1237</span> <span class="token operator">-</span> <span class="token number">5</span> <span class="token operator">=</span> <span class="token number">1232</span>
<span class="token number">1232</span> <span class="token operator">/</span> <span class="token number">11</span> <span class="token operator">=</span> <span class="token number">112</span>
<span class="token number">112</span> <span class="token operator">+</span> <span class="token number">14</span> X <span class="token number">11</span>² <span class="token operator">=</span> <span class="token number">1806</span>
</code></pre></div><p>It doesn't match. After that, for <strong>s</strong>, we get:</p> <div class="language-cpp extra-class"><pre class="language-cpp"><code>Previous String<span class="token operator">:</span> min
First Letter of Previous String<span class="token operator">:</span> <span class="token function">m</span><span class="token punctuation">(</span><span class="token number">13</span><span class="token punctuation">)</span>
New Letter<span class="token operator">:</span> <span class="token function">s</span><span class="token punctuation">(</span><span class="token number">19</span><span class="token punctuation">)</span>
New String<span class="token operator">:</span> <span class="token string">"ins"</span>
<span class="token number">1806</span> <span class="token operator">-</span> <span class="token number">13</span> <span class="token operator">=</span> <span class="token number">1793</span>
<span class="token number">1793</span> <span class="token operator">/</span> <span class="token number">11</span> <span class="token operator">=</span> <span class="token number">163</span>
<span class="token number">163</span> <span class="token operator">+</span> <span class="token number">19</span> X <span class="token number">11</span>² <span class="token operator">=</span> <span class="token number">2462</span>
</code></pre></div><p>It doesn't match. Next, for <strong>a</strong>, we get:</p> <div class="language-cpp extra-class"><pre class="language-cpp"><code>Previous String<span class="token operator">:</span> ins
First Letter of Previous String<span class="token operator">:</span> <span class="token function">i</span><span class="token punctuation">(</span><span class="token number">9</span><span class="token punctuation">)</span>
New Letter<span class="token operator">:</span> <span class="token function">a</span><span class="token punctuation">(</span><span class="token number">1</span><span class="token punctuation">)</span>
New String<span class="token operator">:</span> <span class="token string">"nsa"</span>
<span class="token number">2462</span> <span class="token operator">-</span> <span class="token number">9</span> <span class="token operator">=</span> <span class="token number">2453</span>
<span class="token number">2453</span> <span class="token operator">/</span> <span class="token number">11</span> <span class="token operator">=</span> <span class="token number">223</span>
<span class="token number">223</span> <span class="token operator">+</span> <span class="token number">1</span> X <span class="token number">11</span>² <span class="token operator">=</span> <span class="token number">344</span>
</code></pre></div><p>It's a match! Now we compare our pattern with the current string. Since both the strings match, the substring exists in this string. And we return the starting position of our substring.</p> <p>The pseudo-code will be:</p> <p><strong>Hash Calculation:</strong></p> <div class="language-cpp extra-class"><pre class="language-cpp"><code>Procedure Calculate<span class="token operator">-</span><span class="token function">Hash</span><span class="token punctuation">(</span>String<span class="token punctuation">,</span> Prime<span class="token punctuation">,</span> x<span class="token punctuation">)</span><span class="token operator">:</span>
hash <span class="token operator">:</span><span class="token operator">=</span> <span class="token number">0</span> <span class="token comment">// Here x denotes the length to be considered</span>
<span class="token keyword">for</span> m from <span class="token number">1</span> to x <span class="token comment">// to find the hash value</span>
hash <span class="token operator">:</span><span class="token operator">=</span> hash <span class="token operator">+</span> <span class="token punctuation">(</span>Value of String<span class="token punctuation">[</span>m<span class="token punctuation">]</span><span class="token punctuation">)</span>ᵐ⁻¹
end <span class="token keyword">for</span>
Return hash
</code></pre></div><p><strong>Hash Recalculation:</strong></p> <div class="language-cpp extra-class"><pre class="language-cpp"><code>Procedure Recalculate<span class="token operator">-</span><span class="token function">Hash</span><span class="token punctuation">(</span>String<span class="token punctuation">,</span> Curr<span class="token punctuation">,</span> Prime<span class="token punctuation">,</span> Hash<span class="token punctuation">)</span><span class="token operator">:</span>
Hash <span class="token operator">:</span><span class="token operator">=</span> Hash <span class="token operator">-</span> Value of String<span class="token punctuation">[</span>Curr<span class="token punctuation">]</span> <span class="token comment">//here Curr denotes First Letter of Previous String</span>
Hash <span class="token operator">:</span><span class="token operator">=</span> Hash <span class="token operator">/</span> Prime
m <span class="token operator">:</span><span class="token operator">=</span> String<span class="token punctuation">.</span>length
New <span class="token operator">:</span><span class="token operator">=</span> Curr <span class="token operator">+</span> m <span class="token operator">-</span> <span class="token number">1</span>
Hash <span class="token operator">:</span><span class="token operator">=</span> Hash <span class="token operator">+</span> <span class="token punctuation">(</span>Value of String<span class="token punctuation">[</span>New<span class="token punctuation">]</span><span class="token punctuation">)</span>ᵐ⁻¹
Return Hash
</code></pre></div><p><strong>String Match:</strong></p> <div class="language-cpp extra-class"><pre class="language-cpp"><code>Procedure String<span class="token operator">-</span><span class="token function">Match</span><span class="token punctuation">(</span>Text<span class="token punctuation">,</span> Pattern<span class="token punctuation">,</span> m<span class="token punctuation">)</span><span class="token operator">:</span>
<span class="token keyword">for</span> i from m to Pattern<span class="token operator">-</span>length <span class="token operator">+</span> m <span class="token operator">-</span> <span class="token number">1</span>
<span class="token keyword">if</span> Text<span class="token punctuation">[</span>i<span class="token punctuation">]</span> is <span class="token operator">not</span> equal to Pattern<span class="token punctuation">[</span>i<span class="token punctuation">]</span>
Return <span class="token boolean">false</span>
end <span class="token keyword">if</span>
end <span class="token keyword">for</span>
Return <span class="token boolean">true</span>
</code></pre></div><p><strong>Rabin-Karp:</strong></p> <div class="language-cpp extra-class"><pre class="language-cpp"><code>Procedure Rabin<span class="token operator">-</span><span class="token function">Karp</span><span class="token punctuation">(</span>Text<span class="token punctuation">,</span> Pattern<span class="token punctuation">,</span> Prime<span class="token punctuation">)</span><span class="token operator">:</span>
m <span class="token operator">:</span><span class="token operator">=</span> Pattern<span class="token punctuation">.</span>Length
HashValue <span class="token operator">:</span><span class="token operator">=</span> Calculate<span class="token operator">-</span><span class="token function">Hash</span><span class="token punctuation">(</span>Pattern<span class="token punctuation">,</span> Prime<span class="token punctuation">,</span> m<span class="token punctuation">)</span>
CurrValue <span class="token operator">:</span><span class="token operator">=</span> Calculate<span class="token operator">-</span><span class="token function">Hash</span><span class="token punctuation">(</span>Pattern<span class="token punctuation">,</span> Prime<span class="token punctuation">,</span> m<span class="token punctuation">)</span>
<span class="token keyword">for</span> i from <span class="token number">1</span> to Text<span class="token punctuation">.</span>length <span class="token operator">-</span> m
<span class="token keyword">if</span> HashValue <span class="token operator">==</span> CurrValue <span class="token operator">and</span> String<span class="token operator">-</span><span class="token function">Match</span><span class="token punctuation">(</span>Text<span class="token punctuation">,</span> Pattern<span class="token punctuation">,</span> i<span class="token punctuation">)</span> is <span class="token boolean">true</span>
Return i
end <span class="token keyword">if</span>
CurrValue <span class="token operator">:</span><span class="token operator">=</span> Recalculate<span class="token operator">-</span><span class="token function">Hash</span><span class="token punctuation">(</span>String<span class="token punctuation">,</span> i<span class="token operator">+</span><span class="token number">1</span><span class="token punctuation">,</span> Prime<span class="token punctuation">,</span> CurrValue<span class="token punctuation">)</span>
end <span class="token keyword">for</span>
Return <span class="token operator">-</span><span class="token number">1</span>
</code></pre></div><p>If the algorithm doesn't find any match, it simply returns <strong>-1</strong>.</p> <p>This algorithm is used in detecting plagiarism. Given source material, the algorithm can rapidly search through a paper for instances of sentences from the source material, ignoring details such as case and punctuation. Because of the abundance of the sought strings, single-string searching algorithms are impractical here. Again, <strong>Knuth-Morris-Pratt algorithm</strong> or <strong>Boyer-Moore String Search algorithm</strong> is faster single pattern string searching algorithm, than <strong>Rabin-Karp</strong>. However, it is an algorithm of choice for multiple pattern search. If we want to find any of the large number, say k, fixed length patterns in a text, we can create a simple variant of the Rabin-Karp algorithm.</p> <p>For text of length <strong>n</strong> and <strong>p</strong> patterns of combined length <strong>m</strong>, its average and best case running time is <strong>O(n+m)</strong> in space <strong>O(p)</strong>, but its worst-case time is <strong>O(nm)</strong>.</p> <h2 id="kmp-algorithm-in-c"><a href="#kmp-algorithm-in-c" class="header-anchor">#</a> KMP Algorithm in C</h2> <p>Given a text <strong>txt</strong> and a pattern <strong>pat</strong>, the objective of this program will be to print all the occurance of <strong>pat</strong> in <strong>txt</strong>.</p> <p><strong>Examples:</strong></p> <p><strong>Input:</strong></p> <div class="language- extra-class"><pre class="language-text"><code>
txt[] = "THIS IS A TEST TEXT"
pat[] = "TEST"
</code></pre></div><p><strong>output:</strong></p> <div class="language-cpp extra-class"><pre class="language-cpp"><code>Pattern found at index <span class="token number">10</span>
</code></pre></div><p><strong>Input:</strong></p> <div class="language-cpp extra-class"><pre class="language-cpp"><code>txt<span class="token punctuation">[</span><span class="token punctuation">]</span> <span class="token operator">=</span> <span class="token string">"AABAACAADAABAAABAA"</span>
pat<span class="token punctuation">[</span><span class="token punctuation">]</span> <span class="token operator">=</span> <span class="token string">"AABA"</span>
</code></pre></div><p><strong>output:</strong></p> <div class="language-cpp extra-class"><pre class="language-cpp"><code>Pattern found at index <span class="token number">0</span>
Pattern found at index <span class="token number">9</span>
Pattern found at index <span class="token number">13</span>
</code></pre></div><p><strong>C Language Implementation:</strong></p> <div class="language-cpp extra-class"><pre class="language-cpp"><code><span class="token comment">// C program for implementation of KMP pattern searching </span>
<span class="token comment">// algorithm</span>
<span class="token macro property"><span class="token directive-hash">#</span><span class="token directive keyword">include</span><span class="token string"><stdio.h></span></span>
<span class="token macro property"><span class="token directive-hash">#</span><span class="token directive keyword">include</span><span class="token string"><string.h></span></span>
<span class="token macro property"><span class="token directive-hash">#</span><span class="token directive keyword">include</span><span class="token string"><stdlib.h></span></span>
<span class="token keyword">void</span> <span class="token function">computeLPSArray</span><span class="token punctuation">(</span><span class="token keyword">char</span> <span class="token operator">*</span>pat<span class="token punctuation">,</span> <span class="token keyword">int</span> M<span class="token punctuation">,</span> <span class="token keyword">int</span> <span class="token operator">*</span>lps<span class="token punctuation">)</span><span class="token punctuation">;</span>
<span class="token keyword">void</span> <span class="token function">KMPSearch</span><span class="token punctuation">(</span><span class="token keyword">char</span> <span class="token operator">*</span>pat<span class="token punctuation">,</span> <span class="token keyword">char</span> <span class="token operator">*</span>txt<span class="token punctuation">)</span>
<span class="token punctuation">{</span>
<span class="token keyword">int</span> M <span class="token operator">=</span> <span class="token function">strlen</span><span class="token punctuation">(</span>pat<span class="token punctuation">)</span><span class="token punctuation">;</span>
<span class="token keyword">int</span> N <span class="token operator">=</span> <span class="token function">strlen</span><span class="token punctuation">(</span>txt<span class="token punctuation">)</span><span class="token punctuation">;</span>
<span class="token comment">// create lps[] that will hold the longest prefix suffix</span>
<span class="token comment">// values for pattern</span>
<span class="token keyword">int</span> <span class="token operator">*</span>lps <span class="token operator">=</span> <span class="token punctuation">(</span><span class="token keyword">int</span> <span class="token operator">*</span><span class="token punctuation">)</span><span class="token function">malloc</span><span class="token punctuation">(</span><span class="token keyword">sizeof</span><span class="token punctuation">(</span><span class="token keyword">int</span><span class="token punctuation">)</span><span class="token operator">*</span>M<span class="token punctuation">)</span><span class="token punctuation">;</span>
<span class="token keyword">int</span> j <span class="token operator">=</span> <span class="token number">0</span><span class="token punctuation">;</span> <span class="token comment">// index for pat[]</span>
<span class="token comment">// Preprocess the pattern (calculate lps[] array)</span>
<span class="token function">computeLPSArray</span><span class="token punctuation">(</span>pat<span class="token punctuation">,</span> M<span class="token punctuation">,</span> lps<span class="token punctuation">)</span><span class="token punctuation">;</span>
<span class="token keyword">int</span> i <span class="token operator">=</span> <span class="token number">0</span><span class="token punctuation">;</span> <span class="token comment">// index for txt[]</span>
<span class="token keyword">while</span> <span class="token punctuation">(</span>i <span class="token operator"><</span> N<span class="token punctuation">)</span>
<span class="token punctuation">{</span>
<span class="token keyword">if</span> <span class="token punctuation">(</span>pat<span class="token punctuation">[</span>j<span class="token punctuation">]</span> <span class="token operator">==</span> txt<span class="token punctuation">[</span>i<span class="token punctuation">]</span><span class="token punctuation">)</span>
<span class="token punctuation">{</span>
j<span class="token operator">++</span><span class="token punctuation">;</span>
i<span class="token operator">++</span><span class="token punctuation">;</span>
<span class="token punctuation">}</span>
<span class="token keyword">if</span> <span class="token punctuation">(</span>j <span class="token operator">==</span> M<span class="token punctuation">)</span>
<span class="token punctuation">{</span>
<span class="token function">printf</span><span class="token punctuation">(</span><span class="token string">"Found pattern at index %d \n"</span><span class="token punctuation">,</span> i<span class="token operator">-</span>j<span class="token punctuation">)</span><span class="token punctuation">;</span>
j <span class="token operator">=</span> lps<span class="token punctuation">[</span>j<span class="token operator">-</span><span class="token number">1</span><span class="token punctuation">]</span><span class="token punctuation">;</span>
<span class="token punctuation">}</span>
<span class="token comment">// mismatch after j matches</span>
<span class="token keyword">else</span> <span class="token keyword">if</span> <span class="token punctuation">(</span>i <span class="token operator"><</span> N <span class="token operator">&&</span> pat<span class="token punctuation">[</span>j<span class="token punctuation">]</span> <span class="token operator">!=</span> txt<span class="token punctuation">[</span>i<span class="token punctuation">]</span><span class="token punctuation">)</span>
<span class="token punctuation">{</span>
<span class="token comment">// Do not match lps[0..lps[j-1]] characters,</span>
<span class="token comment">// they will match anyway</span>
<span class="token keyword">if</span> <span class="token punctuation">(</span>j <span class="token operator">!=</span> <span class="token number">0</span><span class="token punctuation">)</span>
j <span class="token operator">=</span> lps<span class="token punctuation">[</span>j<span class="token operator">-</span><span class="token number">1</span><span class="token punctuation">]</span><span class="token punctuation">;</span>
<span class="token keyword">else</span>
i <span class="token operator">=</span> i<span class="token operator">+</span><span class="token number">1</span><span class="token punctuation">;</span>
<span class="token punctuation">}</span>
<span class="token punctuation">}</span>
<span class="token function">free</span><span class="token punctuation">(</span>lps<span class="token punctuation">)</span><span class="token punctuation">;</span> <span class="token comment">// to avoid memory leak</span>
<span class="token punctuation">}</span>
<span class="token keyword">void</span> <span class="token function">computeLPSArray</span><span class="token punctuation">(</span><span class="token keyword">char</span> <span class="token operator">*</span>pat<span class="token punctuation">,</span> <span class="token keyword">int</span> M<span class="token punctuation">,</span> <span class="token keyword">int</span> <span class="token operator">*</span>lps<span class="token punctuation">)</span>
<span class="token punctuation">{</span>
<span class="token keyword">int</span> len <span class="token operator">=</span> <span class="token number">0</span><span class="token punctuation">;</span> <span class="token comment">// length of the previous longest prefix suffix</span>
<span class="token keyword">int</span> i<span class="token punctuation">;</span>
lps<span class="token punctuation">[</span><span class="token number">0</span><span class="token punctuation">]</span> <span class="token operator">=</span> <span class="token number">0</span><span class="token punctuation">;</span> <span class="token comment">// lps[0] is always 0</span>
i <span class="token operator">=</span> <span class="token number">1</span><span class="token punctuation">;</span>
<span class="token comment">// the loop calculates lps[i] for i = 1 to M-1</span>
<span class="token keyword">while</span> <span class="token punctuation">(</span>i <span class="token operator"><</span> M<span class="token punctuation">)</span>
<span class="token punctuation">{</span>
<span class="token keyword">if</span> <span class="token punctuation">(</span>pat<span class="token punctuation">[</span>i<span class="token punctuation">]</span> <span class="token operator">==</span> pat<span class="token punctuation">[</span>len<span class="token punctuation">]</span><span class="token punctuation">)</span>
<span class="token punctuation">{</span>
len<span class="token operator">++</span><span class="token punctuation">;</span>
lps<span class="token punctuation">[</span>i<span class="token punctuation">]</span> <span class="token operator">=</span> len<span class="token punctuation">;</span>
i<span class="token operator">++</span><span class="token punctuation">;</span>
<span class="token punctuation">}</span>
<span class="token keyword">else</span> <span class="token comment">// (pat[i] != pat[len])</span>
<span class="token punctuation">{</span>
<span class="token keyword">if</span> <span class="token punctuation">(</span>len <span class="token operator">!=</span> <span class="token number">0</span><span class="token punctuation">)</span>
<span class="token punctuation">{</span>
<span class="token comment">// This is tricky. Consider the example </span>
<span class="token comment">// AAACAAAA and i = 7.</span>
len <span class="token operator">=</span> lps<span class="token punctuation">[</span>len<span class="token operator">-</span><span class="token number">1</span><span class="token punctuation">]</span><span class="token punctuation">;</span>
<span class="token comment">// Also, note that we do not increment i here</span>
<span class="token punctuation">}</span>
<span class="token keyword">else</span> <span class="token comment">// if (len == 0)</span>
<span class="token punctuation">{</span>
lps<span class="token punctuation">[</span>i<span class="token punctuation">]</span> <span class="token operator">=</span> <span class="token number">0</span><span class="token punctuation">;</span>
i<span class="token operator">++</span><span class="token punctuation">;</span>
<span class="token punctuation">}</span>
<span class="token punctuation">}</span>
<span class="token punctuation">}</span>
<span class="token punctuation">}</span>
<span class="token comment">// Driver program to test above function</span>
<span class="token keyword">int</span> <span class="token function">main</span><span class="token punctuation">(</span><span class="token punctuation">)</span>
<span class="token punctuation">{</span>
<span class="token keyword">char</span> <span class="token operator">*</span>txt <span class="token operator">=</span> <span class="token string">"ABABDABACDABABCABAB"</span><span class="token punctuation">;</span>
<span class="token keyword">char</span> <span class="token operator">*</span>pat <span class="token operator">=</span> <span class="token string">"ABABCABAB"</span><span class="token punctuation">;</span>
<span class="token function">KMPSearch</span><span class="token punctuation">(</span>pat<span class="token punctuation">,</span> txt<span class="token punctuation">)</span><span class="token punctuation">;</span>
<span class="token keyword">return</span> <span class="token number">0</span><span class="token punctuation">;</span>
<span class="token punctuation">}</span>
</code></pre></div><p><strong>Output:</strong></p> <div class="language-cpp extra-class"><pre class="language-cpp"><code>Found pattern at index <span class="token number">10</span>
</code></pre></div><p><strong>Reference:</strong></p> <p><a href="http://www.geeksforgeeks.org/searching-for-patterns-set-2-kmp-algorithm/" target="_blank" rel="noopener noreferrer">http://www.geeksforgeeks.org/searching-for-patterns-set-2-kmp-algorithm/<span><svg xmlns="http://www.w3.org/2000/svg" aria-hidden="true" focusable="false" x="0px" y="0px" viewBox="0 0 100 100" width="15" height="15" class="icon outbound"><path fill="currentColor" d="M18.8,85.1h56l0,0c2.2,0,4-1.8,4-4v-32h-8v28h-48v-48h28v-8h-32l0,0c-2.2,0-4,1.8-4,4v56C14.8,83.3,16.6,85.1,18.8,85.1z"></path> <polygon fill="currentColor" points="45.7,48.7 51.3,54.3 77.2,28.5 77.2,37.2 85.2,37.2 85.2,14.9 62.8,14.9 62.8,22.9 71.5,22.9"></polygon></svg> <span class="sr-only">(opens new window)</span></span></a></p> <h2 id="python-implementation-of-kmp-algorithm"><a href="#python-implementation-of-kmp-algorithm" class="header-anchor">#</a> Python Implementation of KMP algorithm.</h2> <p><strong>Haystack</strong>: The string in which given pattern needs to be searched.<br> <strong>Needle</strong>: The pattern to be searched.</p> <p><strong>Time complexity</strong>: Search portion (strstr method) has the complexity O(n) where <code>n</code> is the length of haystack but as needle is also pre parsed for building prefix table O(m) is required for building prefix table where <code>m</code> is the length of the needle.<br>
Therefore, overall time complexity for KMP is <strong>O(n+m)</strong> <br> <strong>Space complexity</strong>: <strong>O(m)</strong> because of prefix table on needle.</p> <p>Note: Following implementation returns the start position of match in haystack (if there is a match) else returns -1, for edge cases like if needle/haystack is an empty string or needle is not found in haystack.</p> <div class="language-cpp extra-class"><pre class="language-cpp"><code>def <span class="token function">get_prefix_table</span><span class="token punctuation">(</span>needle<span class="token punctuation">)</span><span class="token operator">:</span>
prefix_set <span class="token operator">=</span> <span class="token function">set</span><span class="token punctuation">(</span><span class="token punctuation">)</span>
n <span class="token operator">=</span> <span class="token function">len</span><span class="token punctuation">(</span>needle<span class="token punctuation">)</span>
prefix_table <span class="token operator">=</span> <span class="token punctuation">[</span><span class="token number">0</span><span class="token punctuation">]</span><span class="token operator">*</span>n
delimeter <span class="token operator">=</span> <span class="token number">1</span>
<span class="token keyword">while</span><span class="token punctuation">(</span>delimeter<span class="token operator"><</span>n<span class="token punctuation">)</span><span class="token operator">:</span>
prefix_set<span class="token punctuation">.</span><span class="token function">add</span><span class="token punctuation">(</span>needle<span class="token punctuation">[</span><span class="token operator">:</span>delimeter<span class="token punctuation">]</span><span class="token punctuation">)</span>
j <span class="token operator">=</span> <span class="token number">1</span>
<span class="token keyword">while</span><span class="token punctuation">(</span>j<span class="token operator"><</span>delimeter<span class="token operator">+</span><span class="token number">1</span><span class="token punctuation">)</span><span class="token operator">:</span>
<span class="token keyword">if</span> needle<span class="token punctuation">[</span>j<span class="token operator">:</span>delimeter<span class="token operator">+</span><span class="token number">1</span><span class="token punctuation">]</span> in prefix_set<span class="token operator">:</span>
prefix_table<span class="token punctuation">[</span>delimeter<span class="token punctuation">]</span> <span class="token operator">=</span> delimeter <span class="token operator">-</span> j <span class="token operator">+</span> <span class="token number">1</span>
<span class="token keyword">break</span>
j <span class="token operator">+=</span> <span class="token number">1</span>
delimeter <span class="token operator">+=</span> <span class="token number">1</span>
<span class="token keyword">return</span> prefix_table
def <span class="token function">strstr</span><span class="token punctuation">(</span>haystack<span class="token punctuation">,</span> needle<span class="token punctuation">)</span><span class="token operator">:</span>
<span class="token macro property"><span class="token directive-hash">#</span> <span class="token directive keyword">m</span><span class="token expression"><span class="token operator">:</span> denoting the position within S where the prospective match <span class="token keyword">for</span> W begins</span></span>
<span class="token macro property"><span class="token directive-hash">#</span> <span class="token directive keyword">i</span><span class="token expression"><span class="token operator">:</span> denoting the index of the currently considered character in W<span class="token punctuation">.</span></span></span>
haystack_len <span class="token operator">=</span> <span class="token function">len</span><span class="token punctuation">(</span>haystack<span class="token punctuation">)</span>
needle_len <span class="token operator">=</span> <span class="token function">len</span><span class="token punctuation">(</span>needle<span class="token punctuation">)</span>
<span class="token keyword">if</span> <span class="token punctuation">(</span>needle_len <span class="token operator">></span> haystack_len<span class="token punctuation">)</span> <span class="token function">or</span> <span class="token punctuation">(</span><span class="token operator">not</span> haystack_len<span class="token punctuation">)</span> <span class="token function">or</span> <span class="token punctuation">(</span><span class="token operator">not</span> needle_len<span class="token punctuation">)</span><span class="token operator">:</span>
<span class="token keyword">return</span> <span class="token operator">-</span><span class="token number">1</span>
prefix_table <span class="token operator">=</span> <span class="token function">get_prefix_table</span><span class="token punctuation">(</span>needle<span class="token punctuation">)</span>
m <span class="token operator">=</span> i <span class="token operator">=</span> <span class="token number">0</span>
<span class="token keyword">while</span><span class="token punctuation">(</span><span class="token punctuation">(</span>i<span class="token operator"><</span>needle_len<span class="token punctuation">)</span> <span class="token function">and</span> <span class="token punctuation">(</span>m<span class="token operator"><</span>haystack_len<span class="token punctuation">)</span><span class="token punctuation">)</span><span class="token operator">:</span>
<span class="token keyword">if</span> haystack<span class="token punctuation">[</span>m<span class="token punctuation">]</span> <span class="token operator">==</span> needle<span class="token punctuation">[</span>i<span class="token punctuation">]</span><span class="token operator">:</span>
i <span class="token operator">+=</span> <span class="token number">1</span>
m <span class="token operator">+=</span> <span class="token number">1</span>
<span class="token keyword">else</span><span class="token operator">:</span>
<span class="token keyword">if</span> i <span class="token operator">!=</span> <span class="token number">0</span><span class="token operator">:</span>
i <span class="token operator">=</span> prefix_table<span class="token punctuation">[</span>i<span class="token operator">-</span><span class="token number">1</span><span class="token punctuation">]</span>
<span class="token keyword">else</span><span class="token operator">:</span>
m <span class="token operator">+=</span> <span class="token number">1</span>
<span class="token keyword">if</span> i<span class="token operator">==</span>needle_len <span class="token operator">and</span> haystack<span class="token punctuation">[</span>m<span class="token operator">-</span><span class="token number">1</span><span class="token punctuation">]</span> <span class="token operator">==</span> needle<span class="token punctuation">[</span>i<span class="token operator">-</span><span class="token number">1</span><span class="token punctuation">]</span><span class="token operator">:</span>
<span class="token keyword">return</span> m <span class="token operator">-</span> needle_len
<span class="token keyword">else</span><span class="token operator">:</span>
<span class="token keyword">return</span> <span class="token operator">-</span><span class="token number">1</span>
<span class="token keyword">if</span> __name__ <span class="token operator">==</span> <span class="token string">'__main__'</span><span class="token operator">:</span>
needle <span class="token operator">=</span> <span class="token string">'abcaby'</span>
haystack <span class="token operator">=</span> <span class="token string">'abxabcabcaby'</span>
print <span class="token function">strstr</span><span class="token punctuation">(</span>haystack<span class="token punctuation">,</span> needle<span class="token punctuation">)</span>
</code></pre></div></div> <footer class="page-edit"><div class="edit-link"><a href="https://github.com/devtut/generate/edit/master/docs/algorithm/substring-search.md" target="_blank" rel="noopener noreferrer">Edit this page on GitHub</a> <span><svg xmlns="http://www.w3.org/2000/svg" aria-hidden="true" focusable="false" x="0px" y="0px" viewBox="0 0 100 100" width="15" height="15" class="icon outbound"><path fill="currentColor" d="M18.8,85.1h56l0,0c2.2,0,4-1.8,4-4v-32h-8v28h-48v-48h28v-8h-32l0,0c-2.2,0-4,1.8-4,4v56C14.8,83.3,16.6,85.1,18.8,85.1z"></path> <polygon fill="currentColor" points="45.7,48.7 51.3,54.3 77.2,28.5 77.2,37.2 85.2,37.2 85.2,14.9 62.8,14.9 62.8,22.9 71.5,22.9"></polygon></svg> <span class="sr-only">(opens new window)</span></span></div> <!----></footer> <div class="page-nav"><p class="inner"><span class="prev">
←
<a href="/algorithm/searching.html" class="prev">
Searching
</a></span> <span class="next"><a href="/algorithm/breadth-first-search.html">
Breadth-First Search
</a>
→
</span></p></div> </main></div><div class="global-ui"><!----></div></div>
<script src="/assets/js/app.1779e102.js" defer></script><script src="/assets/js/3.2cfa8016.js" defer></script><script src="/assets/js/74.4a6de632.js" defer></script>
</body>
</html>