How, when, and why do developers use custom data structures in python?

Question

Python's lack of pointers makes data-structure construction intuitively more challenging, and has so much built in functionality that, as an amateur, I can't see when, why, or how you would create something beyond what's already there.

When have you built your own, and why did you do it? E.g. special types of trees, etc.

You create a custom data structure when you need it and Python doesn't provide it out of the box. en.wikipedia.org/wiki/List_of_data_structures — Robert Harvey, Commented Dec 11, 2012 at 19:39
"Python's lack of pointers makes data-structure construction intuitively more challenging" Uhm, no it doesn't. It's roughly the same as in C# or Java. — user16764, Commented Dec 11, 2012 at 19:47
The whole field of object-orientation is about creating custom data structures - in most object oriented languages they are called "classes". So you actually ask "shall anyone create a new class, there are enough classes already there". Hope this makes the absurdity of your question clear. — Doc Brown, Commented Dec 11, 2012 at 19:51
@DocBrown: Probably pedantic but: the whole field of programming is all about creating custom data structures (with algorithms to operate on them)... OOP just commonly uses objects as the primary mechanism to do that. I think that knowing this increases the clarity you mention. — Steven Evers, Commented Dec 11, 2012 at 20:31
It sounds like your hang-up is about pointing to things. Most languages without explicit pointers still support references (assigning any object = to another that isn't a value type). Generally you just create a containing data, and a reference to another node, or list of nodes, or two nodes, or however your structure is going to be represented. Or write it in C and link to it. — KChaloux, Commented Dec 14, 2012 at 15:34

mjgpy3 · Accepted Answer · 2012-12-11 20:29:25Z

Well, that kind of depends on what you are willing to call a "data structure." According to Wikipdia, a data structure is simply a "particular way of storing and organizing data in a computer so that it can be used efficiently." So, therefore, classes would be a form of data structure, and classes are very widely used in Python. But, for the sake of this answer, I will assume you are more interested in what you might learn about in a data structures and algorithms class (e.g. trees, linked lists, queues, hashes, etc...).

Now, because of Python's pseudo-code-like syntax it can be a very useful language for implementing data structures in. If for no other purpose than just to aid in understanding these basic concepts. For example, when I first learned about linked list I decided to implement them in Python:

class node:
    def __init__(self, data = None, is_header = False):
        self.data = data
        self.next = None
    def __str__(self):
        return "|{0}| --> {1}".format(str(self.data), str(self.next))


class linked_list:
    def __init__(self):
        self.header = node("!Header!", True)

    def add_node(self, add_me):
        current_node = self.header
        while 1:
            if current_node.next == None:
            current_node.next = add_me
            break
            else:
            current_node = current_node.next    

    def print(self):
        print (str(self.header))

Now, this isn't a perfect example, nor is it even a totally proper implementation of a linked list, but it goes to illustrate something:

Python's simple syntax can be helpful in understanding data structures

Another example would be a priority queue that I built back in the day:

class priority_queue:
    def __init__(self):
        self.element = []
        self.priority = []

    def enqueue_with_priority(self, me, priority):
        where = 0
        for i in range(len(self.element)):
            if not priority < self.priority[i]:
            where = i
            break

        self.element.insert(where, me)
        self.priority.insert(where, priority)

    def dequeue_highest(self):
        return self.element.pop(0)

    def print(self):
        for i in self.element:
            print i

Again, not a perfect implementation but it illustrates another benefit of coding a data structure in Python:

Python is useful in prototyping data structures to optimize them for lower-level programming languages

Looking back on this code I see flaws in my implementation. The Python code, however, tends to be short and sweet. So if I wanted to implement a data structures in a lower-level language (such as a c-style language) I could first generate a quick Python prototype and then optimize later.

Finally, I think Python can help in the development of data structures, because:

In Python, development is quick, mistakes are allowed and you can try things out.

Imagine you are building a hash-table-like data structure, in a strongly-typed, compiled language you would usually try things out in an IDE, then have to compile and run it. In Python, you can just pull up IDLE, iPython or the Python interpreter and just try things out! No need to recompile for each little change to the hash function you want to try -- just plug it into the interpreter.

So, in conclusion, I guess what I'm saying is that I agree with you: there's not a lot of practicality in building your own data structures (since most anything you may want has already been implemented and optimized). However, for me, there is a lot of educational benefit (because of Python's ease of syntax), as well as a lot of creative and developmental freedom (due to Python's low-constraint, EAFP design).

It is important to note that although python (through its wide-reaching library) provides many standard data structures, a "data structure" (by definition) can be almost anything. So, in Python as well as any other language we may use to solve non-trivial problems, we are going to need to define new data structures. Therefore, it is quite arguable that serious Python developers create custom data structures just as much as serious developers in other languages do.

Are you actually quoting someone or is this just for presentation? — phant0m, Commented Dec 11, 2012 at 20:14
It's a little confusing, yes. Also, there are plenty of custom data structures you'll need from time to time, a data structure is just structured data; the classic structures are indeed already covered with great implementations, but don't think you ever are done building more. — Martijn Pieters, Commented Dec 11, 2012 at 20:20

Winston Ewert · Accepted Answer · 2012-12-11 19:53:48Z

2

To make data structures in Python you'd use a class. References in Python (and other reference based languages) act like pointers and so there is no restriction to being able to implement a data structure.

Python does have many data structures builtin. But then so do various other languages. See the STL for C++ or Java's standard library. In both C++ and Java you also rarely need to implement your own data structure because most of the basic structures already exist.

But sometimes, there is a data structure you need that Python doesn't have builtin. For example, python doesn't have any builtin tree-based structures, or linked lists, etc. You don't need these all that often, but when you need them, you need them.

answered Dec 11, 2012 at 19:53

Winston Ewert

25k12 gold badges74 silver badges104 bronze badges

2

A more common (due to its generality) data structure not in the Python standard library is a graph. This is partially because (as python-ideas recently discussed) there are so many radically different uses of them that a single implementation can't possibly cover most use cases. I regularly create graphs, though they're rarely explicitly treated as such.
– user7043
Commented Dec 11, 2012 at 20:16
@delnan, that's true. We create graphs all the time even without thinking about it.
– Winston Ewert
Commented Dec 12, 2012 at 15:56

Add a comment |

QuantumChris · Accepted Answer · 2020-02-14 10:41:26Z

The other answers list useful datastructures written from scratch, but it's also useful to extend the abilities of existing datastructures.

I like to use namedtuples for simple immutable data and SimpleNamespaces for simple mutable data, but they are missing useful features. SimpleNamespaces for example don't allow iteration and don't have access via .keys(), .values() or .items(). So, I extend the functionality like so:

from types import SimpleNamespace


class CustomNamespace(SimpleNamespace):
    """
    Adds iteration and dict-style access of keys, values and items to SimpleNamespace.
    """
    def keys(self):
        for key in self.__dict__:
            yield key

    def values(self):
        for value in self.__dict__.values():
            yield value

    def items(self):
        for key, value in self.__dict__.items():
            yield key, value

    def __iter__(self):
        return self.items()

Now I have all the normal functionality of a SimpleNamespace (mutable, easy to create in-line, etc) and it also allows me to iterate over all entries and call .values() etc.

Daniel · Accepted Answer · 2021-01-17 00:37:42Z

a lot of people have answered this, but I actually completely share your view point. What most answers implement seem to be Abstract Data types, not data structures.

For example, priority queue, a stack, a deque. These aren't data structures. They're abstract data types.

Abstract data types can be implemented sometimes in various ways using different data structures. So for example I could implement a circular buffer using something akin to a linked list, or simply by using a struct and an inner array and some variables.

Queues, stacks, etc can all be implemented as linked lists. They normally ARE linked lists. What's different is the subroutines. They each have specific functions to provide a certain behavior, i.e. you won't bother writing a pop() function for a fifo, and won't write a 'prepend' or 'insert' or whatever function for a LIFO.

Abstract data types are about behavior and implementing an abstract idea. E.g. a 'list' is an abstract implementation (implementation is really the wrong word, I simply mean the concept is made specific) of the idea of a 'container'.

Data structures are about efficiency, memory layout, and implementing an absract data type.

In truth, Python's countless layers of abstraction, as with all high-level languages, makes data structures an ilusive idea. It's how you end up with ridiculous notions like trying to implement a linked list or an array in Python-- and you'll see online examples with a list embeeded in a class type to that end. When Python's List data type itself is actually implemented most likely as a dynamic array (if not linked list). So you use a data type- List - that's implemented using arrays - to implement an array. That's like cutting your plastic bottle in half so that you can turn it into a cup to drink out of.

Besides, linked lists are normally used because they can hold different data types (int, char etc), which arrays can't. And the fact that you can insert an item at the beginning or middle, which is highly inefficient in the case of arrays. Python's List data type can hold different Data types, and you can insert items at any point in the list. So why would you try to emulate a linked list? And it really is just emulation, because it emulates the superficial behavior of a linked list, turning it into something of an abstract data type. Because obviously, YOU'RE NOT using a linked list.

Just my take. Abstract data types are of course easy to implement using OOP otoh.

Stack Exchange Network

How, when, and why do developers use custom data structures in python?

4 Answers 4

Hot Network Questions

How, when, and why do developers use custom data structures in python?

4 Answers 4

Related

Hot Network Questions