-
Notifications
You must be signed in to change notification settings - Fork 22
/
PatternMatching
415 lines (335 loc) · 13.3 KB
/
PatternMatching
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
********************************* I. Using Naive Algorithm *****************************************
/*
Question: Check whether a pattern string lies in the text string ?
VERY IMP NOTE:
There is difference between subsequence and substring
The difference between them, can cearly be made out from the following readings:
http://en.wikipedia.org/wiki/Subsequence
http://en.wikipedia.org/wiki/Substring
The ALGORITHMS for finding both are also DIFFERENT:
1. For Subsequnece:
We use the traditional LCS algorithm
2. For Substring:
We use,
1. KMP (Knuth–Morris–Pratt) algorithm
URL: http://www.quora.com/What-is-the-best-resource-to-learn-KMP-Algorithm
2. Rabin-Karp algorithm
URL:
In actual understanding, subsequence and substring are DIFFERENT.
However Java developers implemented these two methods to return the same output.
For Example:
(Example Source: http://mytactics.blogspot.com/2013/10/difference-between-subsequence-and.html)
class StringSeqSub{
public static void main(String args[])
{
String str = "Hello World.!!";
// String s1 = str.subSequence(0, 5); // this line will throw error
CharSequence s1 = str.subSequence(0, 5);
String s2 = str.substring(0, 5);
System.out.println(s1);
System.out.println(s2);
}
}
Output -
Hello
Hello
Question Source: http://www.careercup.com/question?id=6196366774632448
Algorithm: For i = 0 to (1 + text.length - pattern.length)
For j=0 to pattern.length
Check if (text[i+j]==pattern[])
*/
package PatternMatching;
import java.util.Scanner;
public class NaiveAlgorithm {
public static void main(String[] args) {
Scanner in = new Scanner(System.in);
try{
System.out.println("Enter the text");
String text = in.nextLine();
System.out.println("Enter the pattern");
String pattern = in.nextLine();
System.out.println("The pattern present in the text: "+usingNaiveAlgorithm(text,pattern));
/*
// Understand how substring and subsequence methods are implemented in String class
String substring = text.substring(0,3);
CharSequence sequence = text.subSequence(0, 3);
System.out.println(substring);
System.out.println(sequence);
*/
}
finally{
in.close();
}
}
private static boolean usingNaiveAlgorithm(String text, String pattern) {
int i=0;
int j=0;
for(i=0;i<(1+text.length()-pattern.length());i++){
for(j=0;j<pattern.length();j++){
if(text.charAt(i+j)!=pattern.charAt(j))
break;
}
if(j==pattern.length())
return true;
}
return false;
}
}
/*
Analysis:
Time Complexity = (n*m)
where n = Length of Text
m = Length of Pattern
Space Complexity = O(1)
*/
********************************* II. Using KMP Algorithm *****************************************
/*
Question: Check whether a pattern string lies in the text string ?
VERY IMP NOTE:
The difference between them, can cearly be made out from the following readings:
http://en.wikipedia.org/wiki/Subsequence
http://en.wikipedia.org/wiki/Substring
In actual understanding, subsequence and substring are DIFFERENT.
However Java developers implemented these two methods to return the same output.
For Example:
(Example Source: http://mytactics.blogspot.com/2013/10/difference-between-subsequence-and.html)
class StringSeqSub{
public static void main(String args[])
{
String str = "Hello World.!!";
// String s1 = str.subSequence(0, 5); // this line will throw error
CharSequence s1 = str.subSequence(0, 5);
String s2 = str.substring(0, 5);
System.out.println(s1);
System.out.println(s2);
}
}
Output -
Hello
Hello
GeeksForGeeks Links for the following String Matching algorithms:
1. Naive String Matching Algorithm: http://www.geeksforgeeks.org/searching-for-patterns-set-1-naive-pattern-searching/
2. Rabin-Karp String Matching Algorithm: http://www.geeksforgeeks.org/searching-for-patterns-set-3-rabin-karp-algorithm/
3. Knuth-Morris-Pratt String Matching Algorithm: http://www.geeksforgeeks.org/searching-for-patterns-set-2-kmp-algorithm/
The ALGORITHMS for finding both are also DIFFERENT:
1. For Subsequnece:
We use the traditional LCS algorithm
2. For Substring:
We use,
1. KMP (Knuth–Morris–Pratt) algorithm
URL: http://jakeboxer.com/blog/2009/12/13/the-knuth-morris-pratt-algorithm-in-my-own-words/
http://www.quora.com/What-is-the-best-resource-to-learn-KMP-Algorithm
2. Rabin-Karp algorithm
URL: http://community.topcoder.com/tc?module=Static&d1=tutorials&d2=stringSearching
Explanation of KMP Algorithm:
Source: http://jakeboxer.com/blog/2009/12/13/the-knuth-morris-pratt-algorithm-in-my-own-words/
Question Source: http://www.careercup.com/question?id=6196366774632448
Algorithm: Algorithm is very well explained in these two sources:
http://jakeboxer.com/blog/2009/12/13/the-knuth-morris-pratt-algorithm-in-my-own-words/
http://www.geeksforgeeks.org/searching-for-patterns-set-2-kmp-algorithm/
*/
package PatternMatching;
import java.util.Scanner;
public class KnuthMorrisPrattAlgorithm {
public static void main(String[] args) {
Scanner in = new Scanner(System.in);
try{
System.out.println("Enter the text");
String text = in.nextLine();
System.out.println("Enter the pattern");
String pattern = in.nextLine();
usingKMPAlgorithm(text,pattern); // Returns the index if pattern lies in text OR else returns NOTHING
/*
// Understand how substring and subsequence methods are implemented in String class
String substring = text.substring(0,3);
CharSequence sequence = text.subSequence(0, 3);
System.out.println(substring);
System.out.println(sequence);
*/
}
finally{
in.close();
}
}
private static void usingKMPAlgorithm(String text, String pattern) {
// create longestPrefixSuffixTable that will hold the longest prefix suffix values for pattern
int[] longestPrefixSuffixTable = new int[pattern.length()];
// populate the table
populateTable(pattern,longestPrefixSuffixTable);
int textIterator = 0; // iterator for text
int patternIterator=0; // iterator for pattern
while(textIterator<text.length()){
if(pattern.charAt(patternIterator)==text.charAt(textIterator)){ // if characters in pattern and text match
patternIterator++;
textIterator++;
}
if(patternIterator==pattern.length()){ // if pattern is fully visited
System.out.println("Found pattern at index: "+(textIterator-patternIterator));
patternIterator=longestPrefixSuffixTable[patternIterator-1];
}
// mismatch after patternIterator matches
else if(textIterator<text.length() && pattern.charAt(patternIterator)!=text.charAt(textIterator)){
// Do not match longestPrefixSuffixTable[0..longestPrefixSuffixTable[j-1]] characters,
// they will match anyway
if(patternIterator!=0)
patternIterator=longestPrefixSuffixTable[patternIterator-1];
else
textIterator=textIterator+1;
}
}
}
private static void populateTable(String pattern, int[] longestPrefixSuffixTable) {
int i=1; // index for iterating over the pattern string
longestPrefixSuffixTable[0] = 0; // longestPrefixSuffixTable[0] is always 0
int previousLength = 0; // length of the previous longest prefix suffix
while(i<pattern.length()){ // the loop calculates longestPrefixSuffixTable[i] for i = 1 to M-1
if(pattern.charAt(i)==pattern.charAt(previousLength)){
longestPrefixSuffixTable[i]=previousLength+1;
previousLength++; // increment the previous Length
i++; // increment the iterator
}
else{ // if(pattern.charAt(i)!=pattern.charAt(previousLength))
if(previousLength!=0)
previousLength = longestPrefixSuffixTable[previousLength-1];
else{ //(previousLength==0)
longestPrefixSuffixTable[i]=0;
i++;
}
}
}
}
}
/*
Analysis:
Time Complexity = O(n) where n = length of TEXT
Space Complexity = O(m) where m = length of PATTERN
*/
********************************* III. Using Karp-Rabin Algorithm *****************************************
/*
Question: Check whether a pattern string lies in the text string ?
VERY IMP NOTE:
The difference between them, can cearly be made out from the following readings:
http://en.wikipedia.org/wiki/Subsequence
http://en.wikipedia.org/wiki/Substring
In actual understanding, subsequence and substring are DIFFERENT.
However Java developers implemented these two methods to return the same output.
For Example:
(Example Source: http://mytactics.blogspot.com/2013/10/difference-between-subsequence-and.html)
class StringSeqSub{
public static void main(String args[])
{
String str = "Hello World.!!";
// String s1 = str.subSequence(0, 5); // this line will throw error
CharSequence s1 = str.subSequence(0, 5);
String s2 = str.substring(0, 5);
System.out.println(s1);
System.out.println(s2);
}
}
Output -
Hello
Hello
GeeksForGeeks Links for the following String Matching algorithms:
1. Naive String Matching Algorithm: http://www.geeksforgeeks.org/searching-for-patterns-set-1-naive-pattern-searching/
2. Rabin-Karp String Matching Algorithm: http://www.geeksforgeeks.org/searching-for-patterns-set-3-rabin-karp-algorithm/
3. Knuth-Morris-Pratt String Matching Algorithm: http://www.geeksforgeeks.org/searching-for-patterns-set-2-kmp-algorithm/
The ALGORITHMS for finding both are also DIFFERENT:
1. For Subsequnece:
We use the traditional LCS algorithm
2. For Substring:
We use,
1. KMP (Knuth–Morris–Pratt) algorithm
URL: http://jakeboxer.com/blog/2009/12/13/the-knuth-morris-pratt-algorithm-in-my-own-words/
http://www.quora.com/What-is-the-best-resource-to-learn-KMP-Algorithm
2. Rabin-Karp algorithm
URL: http://community.topcoder.com/tc?module=Static&d1=tutorials&d2=stringSearching
Explanation of Karp-Rabin Algorithm:
Source: http://www.geeksforgeeks.org/searching-for-patterns-set-3-rabin-karp-algorithm/
http://community.topcoder.com/tc?module=Static&d1=tutorials&d2=stringSearching
Question Source: http://www.careercup.com/question?id=6196366774632448
Algorithm: Algorithm is very well explained in these two sources:
http://www.geeksforgeeks.org/searching-for-patterns-set-3-rabin-karp-algorithm/
http://community.topcoder.com/tc?module=Static&d1=tutorials&d2=stringSearching
*/
package PatternMatching;
import java.util.Scanner;
public class KarpRabinAlgorithm {
public static void main(String[] args) {
Scanner in = new Scanner(System.in);
try{
System.out.println("Enter the text");
String text = in.nextLine();
System.out.println("Enter the pattern");
String pattern = in.nextLine();
usingKarpRabinAlgorithm(text,pattern); // Returns the index if pattern lies in text OR else returns NOTHING
/*
// Understand how substring and subsequence methods are implemented in String class
String substring = text.substring(0,3);
CharSequence sequence = text.subSequence(0, 3);
System.out.println(substring);
System.out.println(sequence);
*/
}
finally{
in.close();
}
}
private static void usingKarpRabinAlgorithm(String text, String pattern) {
int hashValueText = 0; // hash value for text
int hashValuePattern = 0; // hash value for pattern
int primeNumber = 101; // prime number for calculating hash value
int i = 0; // text string iterator
int j = 0; // pattern string iterator
int N = text.length();
int M = pattern.length();
// d is the number of characters in input alphabet
int ascii = 256; // ASCII Table characters
// The value of hash would be (hash*ascii)%primeNumber"
int hash=1;
for (i = 0; i < (M-1); i++)
hash = (hash*ascii)%primeNumber;
// Calculate the hash value of pattern and first window of text
for (i = 0; i < M; i++)
{
hashValuePattern = (ascii*hashValuePattern + pattern.charAt(i))%primeNumber;
hashValueText = (ascii*hashValueText + text.charAt(i))%primeNumber;
}
// Slide the pattern over text one by one
for (i = 0; i < (N - M + 1); i++)
{
// Check the hash values of current window of text and pattern
// If the hash values match then only check for characters on by one
if ( hashValuePattern == hashValueText )
{
/* Check for characters one by one */
for (j = 0; j < M; j++)
{
if (text.charAt(i+j) != pattern.charAt(j))
break;
}
if (j == M) // if hashValuePattern == hashValueText and pat[0...M-1] = text[i, i+1, ...i+M-1]
{
System.out.println("Pattern found at index: "+i);
}
}
// Calculate hash value for next window of text: Remove leading digit,
// add trailing digit
if ( i < N-M )
{
hashValueText = (ascii*(hashValueText - text.charAt(i)*hash) + text.charAt(i+M))%primeNumber;
// We might get negative value of t, converting it to positive
if(hashValueText < 0)
hashValueText = (hashValueText + primeNumber);
}
}
}
}
/*
Analysis:
Time Complexity:
Best and Average Case = O(n+m)
Worst Case = O(nm)
where n = length of TEXT
m = length of PATTERN
Space Complexity = O(1)
*/