@0xsatoshi99
Contributor

Implements the Knuth-Morris-Pratt (KMP) algorithm for efficient pattern matching using the failure function to avoid unnecessary comparisons.

Algorithm Features

  • FindAllOccurrences: Returns all pattern matches in text
  • FindFirst: Returns first occurrence index
  • Contains: Checks if pattern exists in text
  • CountOccurrences: Counts total matches
  • BuildFailureFunction: Computes LPS array (publicly accessible for educational purposes)
  • FindAllEndIndices: Returns ending positions of matches
  • StartsWith: Checks if text starts with pattern
  • EndsWith: Checks if text ends with pattern
  • Time Complexity: O(n+m) worst case, where n is the text length and m is the pattern length (the naive approach is O(n*m))
  • Space Complexity: O(m) for LPS array
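
For readers unfamiliar with the failure function, here is a minimal sketch of how BuildFailureFunction and FindAllOccurrences could work. It is not the code from this PR: the class name KmpSketch, the signatures, the return types, and the decision to reject an empty pattern are all assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;

// Illustrative sketch only; not the implementation submitted in this PR.
public static class KmpSketch
{
    // lps[i] is the length of the longest proper prefix of pattern[0..i]
    // that is also a suffix of pattern[0..i].
    public static int[] BuildFailureFunction(string pattern)
    {
        if (pattern is null) throw new ArgumentNullException(nameof(pattern));

        var lps = new int[pattern.Length];
        var len = 0; // length of the prefix matched so far

        for (var i = 1; i < pattern.Length; i++)
        {
            // On a mismatch, fall back to the next shorter prefix that is also a suffix.
            while (len > 0 && pattern[i] != pattern[len])
            {
                len = lps[len - 1];
            }

            if (pattern[i] == pattern[len])
            {
                len++;
            }

            lps[i] = len;
        }

        return lps;
    }

    public static List<int> FindAllOccurrences(string text, string pattern)
    {
        if (text is null) throw new ArgumentNullException(nameof(text));
        if (pattern is null) throw new ArgumentNullException(nameof(pattern));

        // Assumption: an empty pattern is rejected; the PR may handle it differently.
        if (pattern.Length == 0)
        {
            throw new ArgumentException("Pattern must not be empty.", nameof(pattern));
        }

        var result = new List<int>();
        var lps = BuildFailureFunction(pattern);
        var j = 0; // number of pattern characters currently matched

        // The text index i only moves forward; all backtracking happens in j.
        for (var i = 0; i < text.Length; i++)
        {
            while (j > 0 && text[i] != pattern[j])
            {
                j = lps[j - 1];
            }

            if (text[i] == pattern[j])
            {
                j++;
            }

            if (j == pattern.Length)
            {
                result.Add(i - j + 1);
                j = lps[j - 1]; // keep the matched border so overlapping matches are found
            }
        }

        return result;
    }
}
```

With this sketch, KmpSketch.FindAllOccurrences("AAAA", "AA") yields the start indices 0, 1, 2, because j falls back to lps[j - 1] after each match instead of resetting to 0. The extra memory is the lps array of length m, which is where the O(m) space figure comes from.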

Implementation Highlights

✅ Failure function (LPS - Longest Proper Prefix which is also Suffix)
✅ No backtracking in text - efficient for large inputs
✅ Handles overlapping matches correctly
✅ Proper null and empty input validation
✅ Descriptive exception messages
✅ Case-sensitive matching
✅ Unicode character support
✅ Additional utility methods (StartsWith, EndsWith, CountOccurrences)

Tests (34 test cases)

  • Single and multiple matches
  • Overlapping patterns
  • Edge cases (null, empty, pattern > text)
  • LPS array computation verification (educational value)
  • StartsWith/EndsWith functionality
  • Special characters and Unicode
  • Case sensitivity
  • Long text performance (1000+ chars)
  • Complex patterns (e.g., "AABAACAADAABAABA")
  • All exception scenarios

Code Quality

✅ Follows C# naming conventions (PascalCase)
✅ Comprehensive XML documentation
✅ StyleCop compliant
✅ No Codacy issues (no nested if statements)
✅ Near-100% test coverage (Codecov reports 98.97% patch coverage, with 1 partially covered line)
✅ Educational value: LPS array publicly accessible

Files Added

  • Algorithms/Strings/KnuthMorrisPratt.cs (213 lines)
  • Algorithms.Tests/Strings/KnuthMorrisPrattTests.cs (395 lines)

Total: 608 lines of production-quality code

Why KMP?

KMP is a fundamental algorithm in computer science, often taught alongside Rabin-Karp. While Rabin-Karp uses hashing, KMP is deterministic and relies on the failure function, which offers:

  • Guaranteed O(n+m) worst-case performance
  • No hash collisions to handle
  • Educational value (the LPS array concept)
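
To make the worst-case claim concrete, the following sketch counts text-versus-pattern character comparisons for a naive search and for a KMP search. It is a hypothetical illustration, not part of the PR: the class ComparisonCountSketch and both method names are invented here, and the pattern is assumed to be non-null and non-empty.

```csharp
using System;

// Illustrative sketch only; names are hypothetical and not taken from the PR.
public static class ComparisonCountSketch
{
    // Counts character comparisons made by the naive search,
    // which restarts the pattern at every text position.
    public static long CountNaive(string text, string pattern)
    {
        long comparisons = 0;

        for (var start = 0; start + pattern.Length <= text.Length; start++)
        {
            for (var k = 0; k < pattern.Length; k++)
            {
                comparisons++;
                if (text[start + k] != pattern[k])
                {
                    break;
                }
            }
        }

        return comparisons;
    }

    // Counts text-versus-pattern comparisons made by a KMP search.
    // Comparisons made while building the LPS array are not counted.
    public static long CountKmp(string text, string pattern)
    {
        var lps = new int[pattern.Length];
        for (int i = 1, len = 0; i < pattern.Length; i++)
        {
            while (len > 0 && pattern[i] != pattern[len])
            {
                len = lps[len - 1];
            }

            if (pattern[i] == pattern[len])
            {
                len++;
            }

            lps[i] = len;
        }

        long comparisons = 0;
        var j = 0; // number of pattern characters currently matched

        for (var i = 0; i < text.Length; i++)
        {
            while (true)
            {
                comparisons++;
                if (text[i] == pattern[j])
                {
                    j++;
                    break;
                }

                if (j == 0)
                {
                    break; // mismatch at the pattern start: move to the next text character
                }

                j = lps[j - 1]; // mismatch: fall back without re-reading the text
            }

            if (j == pattern.Length)
            {
                j = lps[j - 1];
            }
        }

        return comparisons;
    }
}
```

For a text of 1000 'a' characters and a pattern of 99 'a' characters followed by 'b', the naive count is 100 * 901 = 90,100 comparisons, while the KMP count stays at or below 2 * 1000, since every comparison either advances the text index or shortens the current match.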

Contribution by Gittensor, learn more at https://gittensor.io/

@0xsatoshi99 requested a review from siriak as a code owner on November 11, 2025 at 01:53
@codecov

codecov bot commented Nov 11, 2025

Codecov Report

❌ Patch coverage is 98.96907% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 96.90%. Comparing base (5bcbece) to head (9b7c642).

Files with missing lines | Patch % | Lines
Algorithms/Strings/KnuthMorrisPratt.cs | 98.96% | 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##           master     #566   +/-   ##
=======================================
  Coverage   96.89%   96.90%           
=======================================
  Files         291      292    +1     
  Lines       12035    12132   +97     
  Branches     1740     1755   +15     
=======================================
+ Hits        11661    11756   +95     
  Misses        237      237           
- Partials      137      139    +2     


@0xsatoshi99
Contributor Author

@siriak it seems all tests have passed for this PR. Please take a look, thanks.

@siriak
Member

siriak commented Nov 11, 2025

It's already implemented here

@siriak siriak closed this Nov 11, 2025