Count overlapping substrings

Question

I'm looking for feedback on the actual logic of this simple Python 2.7 script. The automated grader passed it, but I expect it could be make more 'pythonic'. It was written for homework, and some of the constraints included basic looping constructs only and no function declarations. This differs from a simple String.find() because it must count substrings with common characters. For example, "bobob" must match "bob" twice.

#! /usr/bin/python
s = 'azcbobobegghakl'  #2 bobs
s = 'auoeunbobnoeubobob,ntonethbobobo'  #5 bobs

seek = 'bob'
matches = 0

for i in range(0,len(s)):
    if s[i] == seek[0]:  #if any input char matches seek string
        if abs(i-len(s)) < len(seek):  #checks not closer to end of input than seek
            break
        match_maybe = True
        match = False
        seek_index = 0
        while (match_maybe):  #look ahead to see if subsequent chars match seek
            seek_index += 1
            if seek_index == len(seek):  #successful match each char of seek
                match = True
                break
            if s[i+seek_index] != seek[seek_index]: #does not match
                match_maybe = False
                break
        if match:
            matches += 1

print 'Number of times bob occurs is: ' + str(matches)

Peilonrayz · Accepted Answer · 2016-01-19 10:04:08Z

First things first, make this a function. If you then add a doctest to the function you can test if it works correctly, without having to change the code. This is good, as it also describes how to use it to your users.
Whilst this is a homework assignment, but it's better to get used to using these things.

At the moment you should have the following code. The import doctest is so that you don't have to use the command line to test the code. This is good if you don't know how to use the command line or if there are problems with Python and the system PATH.

def overlapping_count(s, seek):
    """
    >>> overlapping_count('azcbobobegghakl', 'bob')
    2
    >>> overlapping_count('auoeunbobnoeubobob,ntonethbobobo', 'bob')
    5
    """
    matches = 0
    # Rest of code...
    return matches

if __name__ == "__main__":
    import doctest
    doctest.testmod()

The biggest improvement in clarity that you can make with the code you have is to use slicing. Python allows you to index from a beginning to an end of a sliceable object. To use this is quite simple.

>>> s = 'Lorem ipsum dolor sit amet, consectetur adipiscing elit.'
>>> s[0]
'L'
>>> s[1]
'o'
>>> s[0:1]
'L'
>>> s[0:5]
'Lorem'

As you can probably guess, using this for your match check will greatly decrease the amount of lines, and clarity. Now the inside of your for-loop can be:

if s[i] == seek[0]:
    if s[i:i + len(seek)] == seek:
        matches += 1

If you use some (premature?) optimizations, you can make your code faster. It keeps roughly the same amount of complexity in the function. But mostly, you don't want to waste CPU cycles on things you can remember. Therefore you would want to store both len(seek) and seek[0] in variables. Finally enumerate can add a small speed boost to the function, and is also generally preferred in this scenario.

This solution is twice as fast as your solution:

def overlapping_count(string, seek):
    """
    >>> overlapping_count('azcbobobegghakl', 'bob')
    2
    >>> overlapping_count('auoeunbobnoeubobob,ntonethbobobo', 'bob')
    5
    """
    matches = 0
    seek_len = len(seek)
    seek_c = seek[0]
    for i, c in enumerate(string):
        if c == seek_c and string[i:i + seek_len] == seek:
            matches += 1
    return matches

The homework specifically forbid using functions, ie it was rejected by the automated grader because it had the def keyword, but overall thanks for including this info. — nexus_2006, Commented Jan 20, 2016 at 0:01

Caridorc · Accepted Answer · 2016-01-19 11:57:48Z

This makes use of implicit looping, and sum to count how many times a predicate is true to avoid manually incrementing a counter to focus better on the logic:

def overlapping_count(string, seek):
    """
    >>> overlapping_count('azcbobobegghakl', 'bob')
    2
    >>> overlapping_count('auoeunbobnoeubobob,ntonethbobobo', 'bob')
    5
    """
    seek_len = len(seek)
    return sum(c == seek[0] and string[i:i + seek_len] == seek
                  for i, c in enumerate(string))

Stack Exchange Network

Count overlapping substrings

2 Answers 2

Hot Network Questions

Count overlapping substrings

2 Answers 2

Related

Hot Network Questions