2
\$\begingroup\$

Objective: Create a heap sort that returns a unsorted list to sorted from greatest to least.

Does this implementation look correct? Do you have any tips for optimizing this?

'''HeapSort. Best: n log(n) Average: n log(n) Worst: n log(n) '''

#Calls heapify and returns sorted list from greatest to least. 
def heapSort(list):
    result = []
    while list:
        result.append(heapify(list))
    return result

def heapify(struct):
    for x in range(len(struct),0,-1):
    #Base Case: Returns last integer when sequence reaches below 2
        if len(struct) < 2:
            return struct.pop(0)

        #Compares Tree Root Node vs Children. 
        if struct[0] < struct[1]:
            temp = struct[0]
            struct[0] = struct[1]
            struct[1] = temp
        else:
        #Parent: x/2. Child: x-1. Compares child nodes vs parent nodes in Tree.
            if struct[x-1] > struct[x/2]:
                temp = struct[x-1]
                struct[x-1] = struct[x/2]
                struct[x/2] = temp
    return struct.pop(0)
\$\endgroup\$
2
  • 1
    \$\begingroup\$ Does the code function correctly? If not, it isn't ready for review (see help center) and the question may be deleted. If you've tested it, I recommend that you edit to add a summary of the testing (ideally as reproducible unit-test code). \$\endgroup\$ Commented May 10, 2018 at 10:19
  • \$\begingroup\$ Code does work properly. Bug is for the what-if scenario of inserting a sorted list inside such as 1, 2, 3. Question asks specifically for only a unsorted list, but the tip can be used for optimizing. \$\endgroup\$
    – Johnathan
    Commented May 10, 2018 at 23:46

2 Answers 2

5
\$\begingroup\$

See §1.9 for the cause of the bug identified by janos.

1. Review

  1. The code only runs on Python 2.7, because it uses the / operator for floor division. But support for Python 2.7 is scheduled to end in 2020. It is long past the time when it was a good idea to start writing new projects in Python 2.7. Even if you really can't use Python 3, it is still a good idea to use the floor division operator // to make the code portable.

  2. There are no docstrings. What do these functions do? In the case of heapSort there is a comment that could be turned into a docstring.

  3. heapSort modifies its input. If I run:

    >>> a = [6, 7, 2, 4, 1, 0, 3, 9, 8, 5]
    >>> heapSort(a)
    [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
    

    Then a is now empty:

    >>> a
    []
    

    This is likely to be surprising to someone using this code. It would be better to either leave the input unchanged (taking a copy if necessary), like the built-in function sorted, or to modify the input in-place, like the method list.sort.

  4. The name struct could be improved: for experienced Python programmers this name suggests a fixed-size object of the kind that is manipulated by the struct module. A name like list or seq (for "sequence") would be clearer.

  5. The base case does not depend on x and so it should be outside the loop.

  6. Two values can be swapped without needing a temporary variable, if you use a tuple assignment:

    struct[x-1], struct[x//2] = struct[x//2], struct[x-1]
    
  7. The numbers x-1 and x//2 are computed three times. The repetition of x-1 can be avoided by changing the loop from:

    range(len(seq),0,-1)
    

    to:

    range(len(seq) - 1, -1, -1)
    

    which can be simplified to:

    reversed(range(len(seq)))
    

    and the repetition of x//2 can be avoided by assigning a local variable:

    parent = (x + 1) // 2
    

    (It need to be (x + 1) // 2 rather than x // 2 because of the change in the loop range.)

  8. It's conventional use names i, j and so on for index variables (leaving x, y and so on for coordinates).

  9. Normally the way heaps are implemented is that element \$i\$ is the parent of elements \$2i + 1\$ and \$2i + 2\$. However, in the code in the post, element \$i\$ is the parent of elements \$2i\$ and \$2i + 1\$. But this requires element 0 to be its own parent, which makes no sense! Presumably this is why you felt the need to add the special case:

    if struct[0] < struct[1]:
    

    But this special case led to the bug.

    If you changed the implementation so that it used the usual convention for the layout of the heap, then this special case would be unnecessary. Moreover, if you did this then the special case:

    if len(struct) < 2:
    

    would also be unnecessary.

  10. heapify currently does two things: it converts the input sequence into a max-heap, in-place, and it pops the first (maximum) item. It would be clearer, I think, to separate these two features, so that heapify only does the heap-conversion, leaving the popping to the caller.

2. Revised code

def heap_sorted(iterable):
    "Return new list of all items from iterable in descending order."
    result = []
    seq = list(iterable)
    while seq:
        heapify(seq)
        result.append(seq.pop(0))
    return result

def heapify(seq):
    "Transform sequence into a max-heap, in-place."
    for i in reversed(range(1, len(seq))):
        parent = (i - 1) // 2
        if seq[i] > seq[parent]:
            seq[i], seq[parent] = seq[parent], seq[i]

3. Performance

The docstring says:

HeapSort. Worst: \$n \log n\$

But it is easy to see that this is not the case. That's because heapify is called \$n\$ times, and on the \$i\$th call heapify loops over all the remaining \$n-i\$ elements of the sequence, and calls struct.pop(0) which also takes time proportional to the length of struct. So the overall runtime is proportional to $$ \sum_{0\le i< n} n - i = {n(n + 1)\over2} $$ which is \$Θ(n^2)\$.

To get the runtime down to \$O(n \log n)\$, you need to heapify just once, and then instead of extracting the maximum item by popping the list, swap the maximum item with the last item in the list, and then apply the "sift down" procedure to repair the heap condition in time \$O(\log n)\$.

\$\endgroup\$
2
  • \$\begingroup\$ Incredible answer, thanks! Would your implementation work in n log n time under a worst case scenario? \$\endgroup\$
    – Johnathan
    Commented May 10, 2018 at 23:48
  • \$\begingroup\$ @Johnathan: I'm not sure what you mean by "my implementation". The revised code in §2 has quadratic runtime, as explained in §3. \$\endgroup\$ Commented May 11, 2018 at 8:08
4
\$\begingroup\$

The implementation has a bug

For example it will not sort correctly 1, 2, 3.

Isn't it suspicious that in the main loop for decreasing indexes, first it compares struct[0] < struct[1], and only when that's false it compares struct[x-1] > struct[x/2]?

Yes it is. For a list of 1000 elements, preforming this comparison 999 times makes no sense. If it's true for the first time, it will swap the first two values and then it will be always false in the remaining iterations.

When something is suspicious like that, it's good to take a closer look. On closer look, notice that if that condition is true for the first time, then the last two elements of the list don't get compared. That's why the example I gave above doesn't get sorted correctly.

Extract constant conditions from loops

The check on length will have the same result for all iterations of the loop. It should be done before the loop.

Style

There are some style issues I would improve.

  • I find it a bit surprising that a method named heapify removes an element from the list. I would rename that to heapify_and_pop to avoid confusion.

  • The naming and spacing doesn't follow PEP8 conventions.

  • The code works only with Python 2. It would be trivial to make it work with Python 3 too. That would make it more usable.

  • The name struct is not great for a list.

\$\endgroup\$
2
  • \$\begingroup\$ Interesting. This bug pops up when the list is already sorted. It works on all unsorted combinations. Any sorted combination besides [1, 2, 3] like [1, 2, 3, 4] would also create the same bug I believe on the last two indexes. A simple fix is to check within the heap method if it the list is already sorted, but I understand your point. Thank you for pointing this out. \$\endgroup\$
    – Johnathan
    Commented May 10, 2018 at 23:36
  • \$\begingroup\$ @Johnathan the bug affects any list where A[0]< A[1] and A[-1] = max(A), not only sorted lists. And the "simple fix" you describe is an unacceptable dirty hack. \$\endgroup\$
    – janos
    Commented May 11, 2018 at 5:01

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.