All Questions
Tagged with string-matching algorithms
24 questions
1
vote
1
answer
158
views
Data structure for grouping strings in a collection when they share common substrings [closed]
I am looking for a data structure and an algorithm to manage a dynamic collection of strings, but grouping strings that have a substring in common. I try to describe it through an example.
@Christophe:...
2
votes
2
answers
915
views
Are there typo-tolerance algorithms (as opposed to string similarity)? [closed]
I want to build a search with basic typo tolerance.
There are quite a few string similarity algorithms (and implementations for almost all languages I guess).
However, humans tend to make some typos ...
2
votes
0
answers
237
views
Algorithm to search very long blacklist in another very long data set
I have two data sets. The first data set has approx. 50.000 movie and song titles and the second one have 20.000 blacklist strings. I am looking for the best algorithm to detect movie/song title which ...
2
votes
3
answers
2k
views
Algorithm for optimizing text compression
I am looking for text compression algorithms (natural language compression, rather than compression of arbitrary binary data).
I have seen for example An Efficient Compression Code for Text ...
1
vote
0
answers
195
views
phonetic algorithms for words that aren't surnames?
I've been doing a little research into algorithms for matching spelling mistakes in names. From Soundex through to metaphone and Beider-Morse. All of these algorithms generally focus on last names ...
1
vote
1
answer
169
views
Find a string in list of strings
Background:
I am writing an application for a small embedded device. There is a static list of strings: currently about 500 strings and string length is 12 characters on average. The list might ...
6
votes
2
answers
4k
views
Detecting plagiarism – what algorithm?
I'm currently writing a program to read a body of text and compare it to search-engine results (from searching for substrings of the given text), with the goal of detecting plagiarism in, for example, ...
37
votes
7
answers
51k
views
What algorithm would you best use for string similarity?
I am designing a plugin to uniquely identify content on various web pages, based on addresses.
So I may have one address which looks like:
1 someawesome street, anytown, F100 211
later I may find ...
3
votes
3
answers
139
views
Replace strings based on substring match
I have N strings and M search-replace pairs. Each of the strings contains exactly one of the search pair and the whole string needs to be replaced by the replace pair.
Say you have returns,between,...
2
votes
3
answers
2k
views
Algorithm to find maximum matched units
I will try to explain my objective with an example which will be easier to understand.
Suppose I have a sentence like "A B C D E F G H".(Each word seperated using single space).
I have a Database ...
7
votes
2
answers
282
views
Finding and counting equal substrings in a set of strings
I'm thinking about a way of finding similar parts in Strings. I have a set of strings of varying length i.e:
The quick brown fox jumps
fox force five
the bunny is much quicker than the fox
is
First, i ...
24
votes
5
answers
4k
views
Is there any good search algorithm for a single character?
I know several basic string-matching algorithms such as KMP or Boyer-Moore, but all of those analyze the pattern before searching.However, if one has a single character, there is not much to analyze.
...
-1
votes
1
answer
1k
views
Find missing number in sequence in string [closed]
I have a string that contains numbers in sequence. There are no delimiters between numbers. I have to find missing number in that sequence. For example:
176517661768 is missing the number: 1767
...
3
votes
2
answers
1k
views
Burrows-Wheeler transform backward search: how to find suffix index?
BWT backward search algorithm is pretty straightforward if we only need the multiplicity of a pattern. However I also need to find the suffix indices (i.e. positions in the reference string where a ...
1
vote
0
answers
404
views
clustering of strings with variable-length prefixes
I've got bunch of strings with variable-length prefixes (or postfixes - I can always revert them) as follows:
0155555555
523455555555
755555555
...
87129999999999999
119999999999999
09119999999999999
...