All Questions
Tagged with string-matching strings
17 questions
1 vote, 1 answer, 158 views
Data structure for grouping strings in a collection when they share common substrings [closed]
I am looking for a data structure and an algorithm to manage a dynamic collection of strings, grouping together strings that have a substring in common. I'll try to describe it with an example.
@Christophe:...
-1 votes, 3 answers, 353 views
Is there a text distance (or string similarity) algorithm which accounts for the distance between characters?
I'm interested in finding a text distance (or string similarity) algorithm which computes a greater distance (or lower similarity) when characters are further apart.
For example, I want the distance ...
2 votes, 3 answers, 2k views
Algorithm for optimizing text compression
I am looking for text compression algorithms (natural language compression, rather than compression of arbitrary binary data).
I have seen for example An Efficient Compression Code for Text ...
2 votes, 4 answers, 3k views
What is the optimal way to perform 5000 unique string replace functions in terms of performance?
I'm restructuring some code, and the way I built it up over time means it has portions that look something like this:
s.replace("ABW"," Aruba ");
s.replace("AFG"," Afghanistan ");
s.replace("AGO"," Angola ");
s....
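One common alternative to thousands of chained replace calls is a single pass over the text with one compiled alternation and a lookup table. The sketch below is in Python rather than the question's own language, and the three-entry `REPLACEMENTS` table and the `replace_all` helper are hypothetical names used only for illustration; the real table would hold all of the codes. Note that a single pass does not rescan already-replaced text, so its result can differ from sequential replacement when one replacement's output contains another key.

```python
import re

# Hypothetical subset of the code-to-name table (illustration only).
REPLACEMENTS = {
    "ABW": " Aruba ",
    "AFG": " Afghanistan ",
    "AGO": " Angola ",
}

# Build one alternation; longer keys go first so a longer key is preferred
# over a shorter key that happens to be its prefix.
_PATTERN = re.compile(
    "|".join(re.escape(key) for key in sorted(REPLACEMENTS, key=len, reverse=True))
)


def replace_all(text):
    """Replace every key found in text in a single left-to-right pass."""
    return _PATTERN.sub(lambda match: REPLACEMENTS[match.group(0)], text)
```

Whether this actually beats the chained calls depends on the runtime and the input size, so it is worth measuring on real data.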
2 votes, 1 answer, 4k views
Efficient multiple substrings search
I have many substrings (2-5 words each) which I would like to search for in a text about 40-50 words long. What is the most efficient way to flag matching substrings?
Currently I am simply using:
...
6 votes, 2 answers, 4k views
Detecting plagiarism – what algorithm?
I'm currently writing a program to read a body of text and compare it to search-engine results (from searching for substrings of the given text), with the goal of detecting plagiarism in, for example, ...
-6 votes, 2 answers, 341 views
Which piece of code is more efficient with respect to time and memory cost? [closed]
Code 1:
private static int myCompare(String a, String b) {
/* my version of the compareTo method from the String Java class */
int len1 = a.length();
int len2 = b.length();
if (...
7 votes, 2 answers, 282 views
Finding and counting equal substrings in a set of strings
I'm thinking about a way of finding similar parts in strings. I have a set of strings of varying length, e.g.:
The quick brown fox jumps
fox force five
the bunny is much quicker than the fox
is
First, I ...
0 votes, 1 answer, 2k views
Most Pythonic way to remove first match of potential leading strings?
This is a bit difficult to describe, but I'll do my best.
In Python, I can use string.startswith(tuple) to test for multiple matches. But startswith only returns a boolean answer, whether or not it ...
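A minimal sketch of one way to strip only the first matching prefix, assuming the candidates are tried in the order given and the string is returned unchanged when none matches. The helper name `strip_first_prefix` is hypothetical.

```python
def strip_first_prefix(text, prefixes):
    """Remove the first prefix in `prefixes` that `text` starts with."""
    for prefix in prefixes:
        if text.startswith(prefix):
            # Slice off exactly this prefix and stop; later prefixes are ignored.
            return text[len(prefix):]
    return text
```

For example, `strip_first_prefix("Mr. Smith", ("Mr. ", "Dr. "))` gives `"Smith"`. If one candidate is a prefix of another, the order of `prefixes` decides which one is removed.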
6 votes, 4 answers, 6k views
Is "use "abc".equals(myString) instead of myString.equals("abc") to avoid null pointer exception" already problematic in terms of business logic?
I have heard numerous times that when comparing Strings in Java, to avoid a null pointer exception, we should use "abc".equals(myString) instead of myString.equals("abc"), but my question is, is this idea ...
-1 votes, 1 answer, 1k views
Find missing number in sequence in string [closed]
I have a string that contains numbers in sequence. There are no delimiters between the numbers. I have to find the missing number in that sequence. For example:
176517661768 is missing the number: 1767
...
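One possible approach, sketched under stated assumptions: the numbers increase by exactly 1, exactly one number is missing, and it is neither the first nor the last. The width of the first number is unknown, so every width is tried in turn. The function name `find_missing` is hypothetical.

```python
def find_missing(digits):
    """Return the single missing number in a delimiter-free ascending run, or None."""
    # Try every possible width for the first number.
    for width in range(1, len(digits) // 2 + 1):
        current = int(digits[:width])
        pos = width
        missing = None
        consistent = True
        while pos < len(digits):
            expected = str(current + 1)
            skipped = str(current + 2)
            if digits.startswith(expected, pos):
                # The next number follows directly.
                current += 1
                pos += len(expected)
            elif missing is None and digits.startswith(skipped, pos):
                # Exactly one number was skipped; remember it.
                missing = current + 1
                current += 2
                pos += len(skipped)
            else:
                # This width does not explain the string.
                consistent = False
                break
        if consistent and missing is not None:
            return missing
    return None
```

Because each expected value is compared as a string, the scan also handles runs where the digit count grows, such as 99 followed by 101.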
3 votes, 2 answers, 1k views
Burrows-Wheeler transform backward search: how to find suffix index?
The BWT backward search algorithm is pretty straightforward if we only need the multiplicity of a pattern. However, I also need to find the suffix indices (i.e. positions in the reference string where a ...
4 votes, 2 answers, 2k views
Why does a regex, when using global search and the {0,} quantifier, match the end of the string?
I asked a question here about JS, regex, quantifiers and global search. I finally understand how this works, but let's take a concrete example and then I'll write my question.
Based on the ...
1 vote, 0 answers, 404 views
Clustering of strings with variable-length prefixes
I've got a bunch of strings with variable-length prefixes (or postfixes; I can always reverse them) as follows:
0155555555
523455555555
755555555
...
87129999999999999
119999999999999
09119999999999999
...
0 votes, 0 answers, 1k views
Compare names and the use of Levenshtein's algorithm
I need to cross-match names from two lists. What is the best way to compare the names? As you may expect, one list can have the complete name and the other just the first and last.
Besides that, ...
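For reference, a minimal sketch of the classic Levenshtein edit distance using two rolling rows of the dynamic-programming table. The function name `levenshtein` is chosen for illustration. Plain edit distance charges one edit per character of a missing middle name, so full names compared against first-and-last names will score as far apart unless the names are tokenised or normalised first; that preprocessing is not shown here.

```python
def levenshtein(a, b):
    """Return the minimum number of single-character edits turning a into b."""
    # Keep the shorter string as b so the rows stay small.
    if len(a) < len(b):
        a, b = b, a
    # previous[j] is the distance between the empty prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,         # delete char_a
                    current[j - 1] + 1,      # insert char_b
                    previous[j - 1] + cost,  # substitute or keep
                )
            )
        previous = current
    return previous[-1]
```

For example, `levenshtein("John Smith", "Jon Smith")` is 1, while `levenshtein("kitten", "sitting")` is 3.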