14
\$\begingroup\$

There is a follow-up question available: shell-sort-insertion-sort-bubble-sort-selection-sort-algorithms-python.

Selection Sort

The selection sort algorithm sorts a list (array) by finding the minimum element from the right (unsorted part) of the list and putting it at the left (sorted part) of the list.

Bubble Sort

The Bubble Sort algorithm functions by repeatedly swapping the adjacent elements, if they aren't in correct order.

Optimized Bubble Sort

An optimized version of Bubble Sort algorithm is to break the loop, when there is no further swapping to be made, in one entire pass.

Insertion Sort

Insertion sort algorithm builds the final sorted array in a one item at a time manner. It is less efficient on large lists than more advanced algorithms, such as Quick Sort, Heap Sort or Merge Sort, yet it provides some advantages, such as implementation simplicity, efficiency for small datasets and sorting stability.

Shell Sort (Optimized Insertion Sort)

Shell Sort is just a variation of Insertion Sort, in which the elements are moved only one position ahead. When an element has to be moved far ahead, too many movements are involved, which is a drawback. In Shell Sort, we'd make the array "h-sorted" for a large value of h. We then keep reducing the value of h (sublist_increment) until it'd become 1.


I've been trying to implement the above algorithms in Python and modified it based on prior reviews, I'd appreciate it if you'd review it for any other changes/improvements.

Code

import random
from typing import List, TypeVar
from scipy import stats

T = TypeVar('T')


def selection_sort(input_list: List[T]) -> List[T]:
    """
    This method returns an ascending sorted integer list
    for an input integer/float list using Selection Sort Algorithm.

    Sorting:
    - In-Place (space complexity O(1))
    - Efficiency (Time Complexity => O(N^2))
    - Unstable Sort (Order of duplicate elements is not preserved)


    Iterates through the list and swaps the min from the right side
    to sorted elements from the left side of the list.
    """

    # Is the length of the list.
    length = len(input_list)

    # Iterates through the list to do the swapping.
    for element_index in range(length - 1):

        min_index = element_index

        # Iterates through the list to find the min index.
        for finder_index in range(element_index + 1, length):
            if input_list[min_index] > input_list[finder_index]:
                min_index = finder_index

        # Swaps the min value with the pointer value.
        if element_index is not min_index:
            input_list[element_index], input_list[min_index] = input_list[min_index], input_list[element_index]

    return input_list


def bubble_sort(input_list: List[T]) -> List[T]:
    """
    This method returns an ascending sorted integer list
    for an input integer/float list using regular Bubble Sort algorithm.

    Sorting:
    - In-Place (Space Complexity => O(1))
    - Efficiency (Time Complexity => O(N^2))
    - Stable Sort (Order of equal elements does not change)
    """

    length = len(input_list)
    for i in range(length - 1):
        for j in range(length - i - 1):
            if input_list[j] > input_list[j + 1]:
                _swap_elements(input_list, j, j + 1)

    return input_list


def optimized_bubble_sort(input_list: List[T]) -> List[T]:
    """
    This method returns an ascending sorted integer list
    for an input integer/float list using an Optimized Bubble Sort algorithm.

    For optimization, the Bubble Sort algorithm stops if in a pass there would be no further swaps
    between an element of the array and the next element.

    Sorting:
    - In-Place (Space Complexity => O(1))
    - Efficiency (Time Complexity => O(N^2))
    - Stable Sort (Order of equal elements does not change)
    """

    # Assigns the length of to be sorted array.
    length = len(input_list)

    for i in range(length - 1):
        number_of_swaps = 0
        for j in range(length - i - 1):
            if input_list[j] > input_list[j + 1]:
                _swap_elements(input_list, j, j + 1)
                number_of_swaps += 1

        # If there is no further swap in iteration i, the array is already sorted.
        if number_of_swaps == 0:
            break

    return input_list


def _swap_elements(input_list: List[T], current_index: int, next_index: int) -> None:
    """
    Swaps the adjacent elements.
    """
    input_list[current_index], input_list[next_index] = input_list[next_index], input_list[current_index]


def insertion_sort(input_list: List[T]) -> List[T]:
    """
    This method returns an ascending sorted integer list
    for an input integer/float list using a Insertion Sort algorithm.

    Sorting:
    - In-Place (space complexity O(1))
    - Efficiency (time complexity O(N^2) - Good if N is small - It has too many movements)
    - Stable Sort (Order of duplicate elements is preserved)
    """

    # Assigns the length of to be sorted array.
    length = len(input_list)

    # Picks the to-be-inserted element from the right side of the array, starting with index 1.
    for i in range(1, length):
        element_for_insertion = input_list[i]

        # Iterates through the left sorted-side of the array to find the correct position for the element to be inserted.
        j = i - 1
        while j >= 0 and input_list[j] > element_for_insertion:
            input_list[j + 1] = input_list[j]
            j -= 1

        # Inserts the element.
        input_list[j + 1] = element_for_insertion

    return input_list


def shell_sort(input_list: List[T], sublist_increment: int) -> List[T]:
    if sublist_increment // 2 == 0:
        print("Please select an odd number for sublist incrementation. ")
        return

    # Assigns the length of to be sorted array.
    length = len(input_list)

    while sublist_increment >= 1:

        for i in range(sublist_increment, length):
            element_for_insertion = input_list[i]

            # Iterates through the left sorted-side of the array to find the correct position for the element to be inserted.
            j = i - sublist_increment
            while j >= 0 and input_list[j] > element_for_insertion:
                input_list[j + sublist_increment] = input_list[j]
                j -= sublist_increment

            # Inserts the element.
            input_list[j + sublist_increment] = element_for_insertion

        # Narrows down the sublists by two increments.
        sublist_increment -= 2

    return input_list


if __name__ == "__main__":

    # Generates a random integer list
    TEST_LIST_INTEGER = random.sample(range(-1000, 1000), 15)

    # Generates a random float list
    TEST_LIST_FLOAT = stats.uniform(-10, 10).rvs(10)

    print(f"The unsorted integer input list is:\n{TEST_LIST_INTEGER}\n-----------------------------------\n")
    print(f"The unsorted float input list is:\n{TEST_LIST_FLOAT}\n-----------------------------------\n")

    # Tests the Selection Sort Algorithm:
    print("---------------------------------")
    print(f"Selection Sort (Integer): {selection_sort(TEST_LIST_INTEGER.copy())}")
    print(f"Selection Sort (Float): {selection_sort(TEST_LIST_FLOAT.copy())}")

    # Tests the Optimized Bubble Sort Algorithm:
    print("---------------------------------")
    print(f"Optimized Bubble Sort (Integer): {optimized_bubble_sort(TEST_LIST_INTEGER.copy())}")
    print(f"Optimized Bubble Sort (Float): {optimized_bubble_sort(TEST_LIST_INTEGER.copy())}")
    # Tests the Bubble Sort Algorithm:
    print("---------------------------------")
    print(f"Bubble Sort (Integer): {bubble_sort(TEST_LIST_INTEGER.copy())}")
    print(f"Bubble Sort (Float): {bubble_sort(TEST_LIST_INTEGER.copy())}")
    # Tests the Insertion Sort Algorithm:
    print("---------------------------------")
    print(f"Insertion Sort (Integer): {insertion_sort(TEST_LIST_INTEGER.copy())}")
    print(f"Insertion Sort (Float): {insertion_sort(TEST_LIST_INTEGER.copy())}")

    # Tests the Shell Sort Algorithm:
    print("---------------------------------")
    print(f"Shell Sort (Integer): {shell_sort(TEST_LIST_INTEGER.copy(), 5)}")
    print(f"Shell Sort (Float): {shell_sort(TEST_LIST_INTEGER.copy(), 5)}")

References

\$\endgroup\$
1
  • \$\begingroup\$ BTW, "optimized bubble sort" is kind of silly. Basically the only benefit of Bubble Sort is very small code size (in machine code) and simplicity. If you want it to run faster, use Insertion Sort. There's not much use case for "optimized bubble sort" because it's still quite bad when the list isn't already sorted. See Bubble Sort: An Archaeological Algorithmic Analysis for some history about why Bubble Sort is so widely known even though it doesn't perform well. \$\endgroup\$ Commented Sep 25, 2019 at 23:20

4 Answers 4

11
\$\begingroup\$

In-place sort

Your selection_sort is an in-place sort, so there's no need to return the same list you were given. In fact, returning the list is confusing, because it somewhat implies that you would be returning something different from what you were given. You can just drop the return, here and in similar functions.

Failure modes

if sublist_increment // 2 == 0:
    print("Please select an odd number for sublist incrementation. ")
    return

This has issues. You're printing - but what if the caller doesn't want you to print? You're returning None - but what if the caller wants to catch an exception and try with different input? You should be raiseing an exception here, not printing and returning None.

Don't repeat yourself

# Tests the Selection Sort Algorithm:
print("---------------------------------")
print(f"Selection Sort (Integer): {selection_sort(TEST_LIST_INTEGER.copy())}")
print(f"Selection Sort (Float): {selection_sort(TEST_LIST_FLOAT.copy())}")

# Tests the Optimized Bubble Sort Algorithm:
print("---------------------------------")
print(f"Optimized Bubble Sort (Integer): {optimized_bubble_sort(TEST_LIST_INTEGER.copy())}")
print(f"Optimized Bubble Sort (Float): {optimized_bubble_sort(TEST_LIST_INTEGER.copy())}")
# Tests the Bubble Sort Algorithm:
print("---------------------------------")
print(f"Bubble Sort (Integer): {bubble_sort(TEST_LIST_INTEGER.copy())}")
print(f"Bubble Sort (Float): {bubble_sort(TEST_LIST_INTEGER.copy())}")
# Tests the Insertion Sort Algorithm:
print("---------------------------------")
print(f"Insertion Sort (Integer): {insertion_sort(TEST_LIST_INTEGER.copy())}")
print(f"Insertion Sort (Float): {insertion_sort(TEST_LIST_INTEGER.copy())}")

# Tests the Shell Sort Algorithm:
print("---------------------------------")
print(f"Shell Sort (Integer): {shell_sort(TEST_LIST_INTEGER.copy(), 5)}")
print(f"Shell Sort (Float): {shell_sort(TEST_LIST_INTEGER.copy(), 5)}")

This should be a loop that executes five times. You can iterate over a tuple that contains entries for

  • the name of the sorting algorithm, and
  • a reference to a wrapper function that passes arguments in addition to TEST_LIST

Tests

It seems that there's either a bug or an unimplemented mechanism, because there is no difference between the "integer" and "float" tests. They're all integer tests.

Also, these are only tests in the sense that a developer has to use their eyeballs and verify the output manually. You should consider writing real automated tests: pass methods a known input (as you already do), and assert that the output is equal to expected output.

\$\endgroup\$
2
  • \$\begingroup\$ For testing, perhaps using Python's built-in sort function would be useful to generate the "expected" output, letting you randomly generate inputs to your test function. (And/or check every permutation of an input list to catch any corner cases.) \$\endgroup\$ Commented Sep 26, 2019 at 0:11
  • \$\begingroup\$ @PeterCordes I had suggested that in the review of the last version of this program. \$\endgroup\$
    – GZ0
    Commented Sep 26, 2019 at 0:15
10
\$\begingroup\$

Adding to @Reinderien's review, here are a few more points:

Testing

  • The test code has some repeated statements for every function. It would be better to put that into a for loop like this:

    sorting_algorithms = [
       ("Selection Sort", selection_sort),
       ...
       # Wrap shell_sort into a lambda to make it a single-argument function for testing
       ("Shell Sort", lambda s: shell_sort(s, 5))
    ]
    
    for description, func in sorting_algorithms:
        ...
        print(f"{description} (Integer): {func(TEST_LIST_INTEGER.copy())}")
        ...
    
  • Since callers of sorting functions are normally expected to supply only the list to be sorted, it would be better to make all other arguments optional:

    def shell_sort(input_list: List[T], sublist_increment: int = 5) -> List[T]:
    

    This sets a default value for the sublist_increment argument. With this change, the lambda wrapper for shell_sort in the code above is no longer needed (it is still needed if you want to test calling the function with non-default arguments).

  • random.sample performs sampling without replacement. So every input occurs only once and there are no duplicates in the output list. This is undesired for testing purpose since the functions are expected to work with duplicated elements. random.choice should be used instead.

  • It is a bit unusual to use two modules scipy.stats and random for the same task -- generating random numbers. The former is more powerful but in this case either of them is sufficient.

Coding style

  • Since you have defined the function _swap_elements, it would be better to use it everywhere when the functionality is needed. The selection_sort function has not used it yet.

  • The function _swap_elements does not need to know what the input indices mean for the caller. The function would work as long as the indices are valid. Therefore in this declaration

    def _swap_elements(input_list: List[T], current_index: int, next_index: int)
    

    the argument names current_index and next_index can be changed to more general names such as index1 and index2.

  • There are some overly long lines. Although it may not always be necessary to conform to the 79-char limit recommended by PEP 8, it would also be better not to make the lines too long. Long comments can be written on multiple lines. Statements like this

    print(f"The unsorted integer input list is:\n{TEST_LIST_INTEGER}\n-----------------------------------\n")
    

    can be written as this

    print("The unsorted integer input list is:",
          TEST_LIST_INTEGER,
          "-----------------------------------\n", sep='\n')
    

    or this (Python automatically joins adjacent string literals with no separators)

    print("The unsorted integer input list is:\n"
          f"{TEST_LIST_INTEGER}\n"
          "-----------------------------------\n")
    

    The shorter-line versions are also a bit more clear since each line of code corresponds to a line in the actual output.

\$\endgroup\$
2
  • 1
    \$\begingroup\$ Good review, thanks! You can also use "-" * 70 to create these ASCII lines. \$\endgroup\$
    – ggorlen
    Commented Sep 25, 2019 at 19:14
  • 1
    \$\begingroup\$ If that separator line is used multiple times it is better to define a variable sep_line = '-' * 70 and reuse it everywhere. Otherwise, I think it might be a bit clearer to just lay it out so that the alignment can be visually seen in the code. However this is purely subjective. \$\endgroup\$
    – GZ0
    Commented Sep 25, 2019 at 19:29
5
\$\begingroup\$

Rationale

Given that this question and your previous question, that I've seen, both mangled testing and implementation, I think you should properly setup your Python project environment.

  • Since you have tests you should use something like unittest or pytest.
  • Since I would setup a test directory and a source directory I can't just import se_229598, and so the simplest way to ensure I'm testing the correct code is to use tox or Nox.

    This comes with the added benefits that you'll; be testing your setup.py, can test against multiple Python versions and you can add other tools such as linters, hinters and documentation to your testing tool-chain.

I should note the code that I'm providing for the setup.py and tox.ini are MCVEs to keep the answer small and so don't follow best practices or have many cool features.

Python Project Environment

On Windows this would look something like:

$ mkdir src/se_229598
$ mkdir tests
$ python -m pip install virtualenv
$ python -m virtualenv venv
$ ./venv/Scripts/activate
(venv) $ vim setup.py
(venv) $ vim tox.ini
(venv) $ vim src/se_229598/__init__.py
(venv) $ vim tests/test_all.py
(venv) $ pip install .[dev]
(venv) $ tox

Where:

  • __init__.py is the code that you have in the post.
    Since you added a main guard it means that your old tests won't run. And so you can delete that if you want.

  • setup.py

    from setuptools import setup, find_packages
    
    setup(
        name='se_229598',
        packages=find_packages('src'),
        package_dir={'': 'src'},
        extras_require={
            'dev':  [
                'tox',
                'pytest',
                'scipy',
            ]
        },
    )
    
  • tox.ini

    [tox]
    envlist =
        unit-py36
        unit-py37
    
    [testenv]
    basepython =
        py36: python3.6
        py37: python3.7
    deps =
        .[dev]
    commands =
        unit: pytest
    
  • test_all.py. It should be obvious, but I've only tested one of your functions.

    import random
    
    import pytest
    import scipy.stats
    
    import se_229598
    
    TEST_LIST_INTEGER = random.sample(range(-1000, 1000), 15)
    TEST_LIST_FLOAT = list(scipy.stats.uniform(-10, 10).rvs(10))
    
    
    def test_selection_sort_int():
        assert (
            se_229598.selection_sort(TEST_LIST_INTEGER.copy())
            == sorted(TEST_LIST_INTEGER)
        )
    
    
    def test_selection_sort_float():
        assert (
            se_229598.selection_sort(TEST_LIST_FLOAT.copy())
            == sorted(TEST_LIST_FLOAT)
        )
    

Explanation

To test your code all you need to do is run tox in your virtual environment.

$ ./venv/Scripts/activate
(venv) $ tox
...
___________ summary ___________
  unit-py36: commands succeeded
  unit-py37: commands succeeded
  congratulations :)
$ 

This is as we setup tox to run pytest against Python 3.7 and 3.6 in the [testenv] section. If we don't specify the environment then it will default to running pytest on both 3.7 and 3.6, as we specified in the envlist.

Due to doing a standard pytest install we can just run pytest to test the code using its test auto discovery.

From here you can setup linters and hinters in your tox.ini and verify these raise no problems. You can also setup Sphinx to document your code. And even add test coverage. And all this runs simply from one command, tox.

Not only does this simplify local testing, tools like tox have integration with some CI software. Where I have used Jenkins CI and tox together to allow a basic CI workflows.

Further reading

\$\endgroup\$
2
\$\begingroup\$

As noted in another answer by @Reinderien, some of your functions modify the list in-place and some do not. This is already not so good, but it is exacerbated by the fact that all of your docstrings claim that the function returns a sorted list, indicating that it does not mutate any of the inputs.

If you fix this, e.g. by, as a crude hack, making a copy of the list first, you gain immediate improvements to the testability of your code. Suddenly it becomes very easy to e.g. produce a performance comparison of your algorithms:

enter image description here

For fairness' sake I added the line input_list = input_list[:] to all functions. I also gave sublist_increment a default value of 5 as suggested in the answer by @GZ0 and threw in the built-in sorted function (with a wrapper containing the input_list = input_list[:] line).

A few takeaway points from this:

  • It is hard to beat the built-in sorting function (especially with code written in Python and not C). It is between 3 and 400 times faster than the functions you wrote. For performance critical applications always prefer the built-in function unless you have some weird edge-case and a sorting function optimized for that specific case.
  • All of your functions seem not to be only slower in absolute terms, but also in relative. The asymptotic behavior looks like it has a different slope than that of sorted, which is \$\mathcal{O}(n\log n)\$. As mentioned in the comments by @GZ0 your algorithms are all \$\mathcal{O}(n^2)\$.
  • Note that I was limited to lists of length less than about a thousand because otherwise the runtimes would become too long.
  • The function you call "optimized" bubble sort does not seem to perform any better than the normal bubble sort.
  • In contrast, the shell sort (optimized insertion sort) does actually perform better than the normal insertion sort.
\$\endgroup\$
2
  • 1
    \$\begingroup\$ All the implemented algorithms in the post have \$O(n^2)\$ time complexity while a build-in sorting function in any language that has wide industrial applications would implement a \$O(nlogn)\$ algorithm. Hence the asymptotic behaviour. \$\endgroup\$
    – GZ0
    Commented Sep 26, 2019 at 15:16
  • 1
    \$\begingroup\$ @GZ0 The latter I was aware of, but for the former I would have had to study the algorithms more closely. Thanks for pointing it out. \$\endgroup\$
    – Graipher
    Commented Sep 26, 2019 at 15:30

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.