Skip to main content

All Questions

1 vote
1 answer
158 views

Data structure for grouping strings in a collection when they share common substrings [closed]

I am looking for a data structure and an algorithm to manage a dynamic collection of strings, but grouping strings that have a substring in common. I try to describe it through an example. @Christophe:...
differentrain's user avatar
2 votes
2 answers
915 views

Are there typo-tolerance algorithms (as opposed to string similarity)? [closed]

I want to build a search with basic typo tolerance. There are quite a few string similarity algorithms (and implementations for almost all languages I guess). However, humans tend to make some typos ...
cis's user avatar
  • 255
2 votes
0 answers
237 views

Algorithm to search very long blacklist in another very long data set

I have two data sets. The first data set has approx. 50.000 movie and song titles and the second one have 20.000 blacklist strings. I am looking for the best algorithm to detect movie/song title which ...
Eray's user avatar
  • 336
2 votes
3 answers
2k views

Algorithm for optimizing text compression

I am looking for text compression algorithms (natural language compression, rather than compression of arbitrary binary data). I have seen for example An Efficient Compression Code for Text ...
Lance Pollard's user avatar
1 vote
0 answers
195 views

phonetic algorithms for words that aren't surnames?

I've been doing a little research into algorithms for matching spelling mistakes in names. From Soundex through to metaphone and Beider-Morse. All of these algorithms generally focus on last names ...
Jarede's user avatar
  • 323
1 vote
1 answer
169 views

Find a string in list of strings

Background: I am writing an application for a small embedded device. There is a static list of strings: currently about 500 strings and string length is 12 characters on average. The list might ...
psy's user avatar
  • 137
6 votes
2 answers
4k views

Detecting plagiarism – what algorithm?

I'm currently writing a program to read a body of text and compare it to search-engine results (from searching for substrings of the given text), with the goal of detecting plagiarism in, for example, ...
Vivian's user avatar
  • 189
37 votes
7 answers
51k views

What algorithm would you best use for string similarity?

I am designing a plugin to uniquely identify content on various web pages, based on addresses. So I may have one address which looks like: 1 someawesome street, anytown, F100 211 later I may find ...
Squiggs.'s user avatar
  • 511
3 votes
3 answers
139 views

Replace strings based on substring match

I have N strings and M search-replace pairs. Each of the strings contains exactly one of the search pair and the whole string needs to be replaced by the replace pair. Say you have returns,between,...
chx's user avatar
  • 373
2 votes
3 answers
2k views

Algorithm to find maximum matched units

I will try to explain my objective with an example which will be easier to understand. Suppose I have a sentence like "A B C D E F G H".(Each word seperated using single space). I have a Database ...
Nagaraju's user avatar
7 votes
2 answers
282 views

Finding and counting equal substrings in a set of strings

I'm thinking about a way of finding similar parts in Strings. I have a set of strings of varying length i.e: The quick brown fox jumps fox force five the bunny is much quicker than the fox is First, i ...
Chris's user avatar
  • 207
24 votes
5 answers
4k views

Is there any good search algorithm for a single character?

I know several basic string-matching algorithms such as KMP or Boyer-Moore, but all of those analyze the pattern before searching.However, if one has a single character, there is not much to analyze. ...
Christian's user avatar
  • 365
-1 votes
1 answer
1k views

Find missing number in sequence in string [closed]

I have a string that contains numbers in sequence. There are no delimiters between numbers. I have to find missing number in that sequence. For example: 176517661768 is missing the number: 1767 ...
Neo's user avatar
  • 31
3 votes
2 answers
1k views

Burrows-Wheeler transform backward search: how to find suffix index?

BWT backward search algorithm is pretty straightforward if we only need the multiplicity of a pattern. However I also need to find the suffix indices (i.e. positions in the reference string where a ...
user798275's user avatar
1 vote
0 answers
404 views

clustering of strings with variable-length prefixes

I've got bunch of strings with variable-length prefixes (or postfixes - I can always revert them) as follows: 0155555555 523455555555 755555555 ... 87129999999999999 119999999999999 09119999999999999 ...
god's user avatar
  • 232

15 30 50 per page