I see that there already is an accepted answer, before I got around to writing this answer. That is a little discouraging for me as a reviewer, but I've written a code alternative and have some thoughts regarding your code.
Code review
- Document your code – To guess what your code is doing by simply reading it is not very easy. It could very well have a few comments on what it tries to achieve.
- Better variable names – In general try to use variable names describing the content. Names like
x
, a
, exp
(which I would read as expression?), and so on, are not very informative.
- Regex are generally expensive – In general regex, whilst being a fantastic too, are somewhat overused. Do note that it takes time to build the patterns and match the patterns, especially compared to simple string matches.
Some new(?) constructs
I'm not sure if you know these already, but there are a few new constructs I would like to show you:
String repeat – Strings can be multiplied (aka duplicated) using the multiplication operator.
text = "abc"
print text * 3 # Will output "abcabcabc"
divmod
with multiple outputs – divmod(a, b)
will divide a
by b
and return the divisor and the rest. This can be stored directly into a tuple like in the following:
div, rest = divmod(13, 5) # Giving div == 2, rest = 3
- Slice and dice strings – As of Python 2.3 one can slice strings. The most common variants indicates how many characters you want from start like
text[:3]
or if you want the rest of the text like text[3:]
. This can be very useful...
A slightly fancier print
varant – Using .format in combination with print
can produce nicer output rather easily:
print 'i: {}, sub: {}, div: {}, mod: {}'.format(i, source[:i], div, rest)
This would output on the same line, something like:
i: 4, sub: abcd, div: 2, mod: 3
else
-block after for
?! – This will make sense later on, but if a for
loop completes normally, it'll not enter the optional else:
-block. In other words, if you break
out of a for
loop Python won't enter the else
block. This can be used to verify that the for
loop actually found/did something, and provide an alternative if it didn't.
for i in range(5):
print i
if i == 3:
break
else:
print "Nothing larger than 3..."
Code alternative
I haven't commented too much on your chosen algorithm, as I find it a little confusing, so I thought what is he trying to achieve and what is an alternative approach. I then came up with these demands for the code:
- You're looking for the maximum repeated substring completely filling out the original string. This means:
- The candidate substring length, must then divide the original string length without any leftover (or rest characters)
- The candidate substring can't be more than half the length of the original string, as it then can't be duplicated
- The first (and shortest) candidate substring will always give the most repeats, if it matches the other criteria
So one way to write this out is like this:
def repeatedSubstring(source):
"""Look for the shortest substring which when repeated equals
the source string, without any left over characters.
"""
length = len(source)
print '\nOriginal text: {}. Length: {}'.format(source, length)
# Check candidate strings
for i in range(1, length/2+1):
repeat_count, leftovers = divmod(length, i)
# print 'i: {}, sub: {}, div: {}, mod: {}'.format(i, source[:i], repeat_count, leftovers)
# print 'repeated: {}'.format(source[:i]*repeat_count)
# Check for no leftovers characters, and equality when repeated
if (leftovers == 0) and (source == source[:i]*repeat_count):
print 'Found repeated substring: {}, count: {}'.format(source[:i], repeat_count)
break
else:
print "Couldn't find any repeated substring"
repeatedSubstring("abcdeabcde")
repeatedSubstring("abcdeabcdef")
repeatedSubstring("aaaaa")
I've commented out some debug print
statements, and left it a little more verbose than the original code.
If you want, you can easily replace the remaining print
statements with return div
and return 1
. A stripped down version would then look like:
def repeatedSubstringCount(source):
"""Look for the shortest substring which when repeated equals
the source string, without any left over characters.
Return the maximum repeat count, 1 if none found.
"""
length = len(source)
# Check candidate strings
for i in range(1, length/2+1):
repeat_count, leftovers = divmod(length, i)
# Check for no leftovers characters, and equality when repeated
if (leftovers == 0) and (source == source[:i]*repeat_count):
return repeat_count
return 1
I still left a few comments in there, so that it possible to have some idea on what is happening. I've also changed the name to "repeatedSubstringCount" to indicate both what the function does, but also what it returns.
for i in range(1, length+1):
is indented just as the next lineif length % i == 0:
. \$\endgroup\$