Commit 0602af8

Some more on ADTs
1 parent 39c1762 commit 0602af8

3 files changed

+1913
-65
lines changed

‎doc/source/4_abstract_data_types.rst

+136-64
@@ -17,6 +17,8 @@ concrete realisations of a mathematical idea.
1717
An *abstract data type* is a purely mathematical :term:`type`,
1818
defined independently of its concrete realisation as code.
1919

20+
Abstract data types enable the programmer to reason about algorithms and their
21+
cost separately from the task of implementing them.
2022
That said, it will frequently be helpful in understanding abstract
2123
data types to refer to the ways in which they might be implemented.
2224

@@ -37,18 +39,24 @@ a :term:`LIFO (last in, first out)`, because the last object added to
3739
the stack is the first object retrieved (contrast :term:`FIFO <FIFO (first in, first out)>`).
3840

3941
Recall that a :term:`type` is defined by a set of possible values and
40-
a set of operations. A stack is an ordered sequence of objects (of any
41-
type) with the operations `push` to add a new object to the sequence,
42+
a set of operations. The value of a stack is an ordered sequence of objects of any
43+
type. The operations are `push` to add a new object to the sequence,
4244
and `pop` to return the most recently added object, and remove it from
43-
the sequence. It is also common to add an additional operation of
45+
the sequence. :numref:`stackdiag` shows these operations. It is also common to add an additional operation of
4446
`peek`, which returns the most recently added object without removing
4547
it from the stack.
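
The three operations can be tried out with a plain Python list standing in for the sequence. This is only a minimal sketch: the class name `ListStack` is hypothetical and is not taken from the text, and the values pushed are chosen purely for illustration.

```python
class ListStack:
    """Hypothetical minimal stack backed by a Python list."""

    def __init__(self):
        self.data = []

    def push(self, value):
        # Add a new object to the end of the sequence.
        self.data.append(value)

    def pop(self):
        # Return the most recently added object and remove it.
        return self.data.pop()

    def peek(self):
        # Return the most recently added object without removing it.
        return self.data[-1]


stack = ListStack()
for value in (24, 12, 57):
    stack.push(value)

popped = stack.pop()   # 57, the last value pushed
top = stack.peek()     # 12 is now the most recently added object
```

Because `pop` and `peek` only ever touch the end of the list, the last object in is always the first object out.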
4648

4749
.. note::
4850

49-
Put a diagram illustrating stack operations here.
51+
The stack operations in the diagram are wrong. The stack is back to front and
52+
the wrong value is popped.
53+
54+
.. _stackdiag:
5055

5156
.. blockdiag::
57+
:caption: Cartoon of a sequence of stack operations. First 24, 12, 57 are
58+
pushed, then 57 is popped.
59+
5260

5361
blockdiag stack{
5462
// setup info
@@ -261,9 +269,9 @@ Algorithmic complexity
261269
The second reason that understanding abstract data types is important
262270
is that a good implementation of a well designed abstract data type
263271
will have well-defined performance characteristics. In particular, the
264-
optimal algorithmic complexity, expressed in big 'O' notation, of
272+
optimal algorithmic complexity, expressed in big :math:`O` notation, of
265273
operations on abstract data types will be known. Recall the definition
266-
of big 'O':
274+
of big :math:`O`:
267275

268276
.. _bigO:
269277

@@ -327,15 +335,18 @@ data structure.
327335
def peek(self):
328336
return self.data[-1]
329337
330-
:numref:`bigO` is a particular case of the big `O` notation, which you
331-
may already have seen in numerical analysis. However, there the limit
332-
is taken as the independent variable approaches 0. This difference of
333-
context between computer science and numerical analysis is sometimes
334-
confusing, particularly since both disciplines conventionally leave
335-
out the limit. It's worth keeping in mind that the difference, because
336-
a numerical algorithm with :math:`O(h^4)` error is really rather good
337-
since `h` is small, but an algorithm with :math:`O(n^4)` cost is very
338-
expensive indeed!
338+
339+
.. note::
340+
341+
:numref:`Definition %s <bigO>` is a particular case of the big :math:`O` notation, which you may
342+
already have seen in numerical analysis. The distinction is that in
343+
analysing algorithmic complexity, the limit is taken as :math:`n` approaches
344+
infinity, while in numerical analysis the independent variable approaches 0.
345+
This difference between two closely related fields is often confusing,
346+
particularly since both disciplines conventionally leave out the limit. It's
347+
worth keeping the difference in mind, because a numerical algorithm
348+
with :math:`O(h^4)` error is really rather good since :math:`h` is small, but an
349+
algorithm with :math:`O(n^4)` cost is very expensive indeed!
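
A little arithmetic makes the contrast concrete. The values of `h` and `n` below are arbitrary illustrations, not figures from the text, and the variable names are hypothetical.

```python
h = 0.1    # a small step size, as in numerical analysis
n = 1000   # a moderately large problem size, as in algorithm analysis

error_scale = h**4   # about 1e-4: the bound shrinks quickly as h approaches 0
cost_scale = n**4    # 10**12: the bound grows quickly as n approaches infinity

# Halving h divides h**4 by 16, while doubling n multiplies n**4 by 16.
error_ratio = (h / 2)**4 / h**4
cost_ratio = (2 * n)**4 // n**4
```

The same fourth power is good news in one limit and bad news in the other, which is why it matters which limit is implied.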
339350

340351
Amortised complexity and worst case complexity
341352
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -359,14 +370,18 @@ full does a further append operation cause Python to allocate more
359370
memory. The amount of memory allocated is approximately proportional
360371
to the current length of the list. That is, if the current list length
361372
is :math:`n` then the new memory allocation will be of size
362-
approximately :math:`kn` for some :math:`k>1`.
363-
364-
.. note::
365-
366-
Need diagrams of how a dynamic array works here.
373+
approximately :math:`kn` for some :math:`k>1`. This concrete data structure is
374+
called a :term:`dynamic array`. :numref:`dynamicarray` illustrates its operation.
367375

376+
.. _dynamicarray:
377+
368378
.. graphviz::
369-
:align: center
379+
:caption: A dynamic array implementation of a :class:`list`. The existing
380+
memory buffer is full, so when 11 is appended to the list, a larger
381+
buffer is created and the whole list is copied into it. When 13 is
382+
subsequently appended to the list, there is still space in the buffer so
383+
it is not necessary to copy the whole list.
384+
:align: center
370385

371386
digraph dl {
372387
bgcolor="#ffffff00" # RGBA (with alpha)
@@ -384,7 +399,7 @@ approximately :math:`kn` for some :math:`k>1`.
384399
style="ellipse, dashed";
385400
bgcolor="#CD5C5C";
386401
"node0" [
387-
label = "<f0> 2 | 3| 5| 7 |e<f1>"
402+
label = "<f0> 2 | 3| 5| 7 |e <f1>"
388403
shape = "record"
389404
];
390405
}
@@ -394,27 +409,27 @@ approximately :math:`kn` for some :math:`k>1`.
394409
bgcolor="#2E8B57";
395410

396411
"node1" [
397-
label = "<f0> 2 | 3| 5| 7 | <f1>| | | <f2>"
412+
label = "<f0> 2 | 3| 5| 7 | <f1> 11| | | <f2>"
398413
shape = "record"
399414

400415
];
416+
}
417+
subgraph cluster_4 {
418+
style="ellipse, dashed";
419+
bgcolor="#2E8B57";
401420

402421
"node3" [
403422
label = "<f0> 2 | 3| 5| 7| <f1> 11| <f2> 13| | <f3>"
404423
shape = "record"
405424
];
406425
}
407-
408-
"node0":f0 -> "node1":f0 [
409-
id = 0
410-
];
411426

412-
"node1":f0 -> "node3":f1 [
427+
"node0":f0 -> "node1":f0 [
413428
id = 2
414429
label = "append 11"
415430
];
416431

417-
"node1":f0 -> "node3":f2 [
432+
"node1":f0 -> "node3":f0 [
418433
id = 2
419434
label = "append 13"
420435
];
@@ -451,43 +466,65 @@ contrast, the occasional list append operation is an example of the
451466
list has an amortised time complexity of :math:`O(1)` but a worst-case
452467
time complexity of :math:`O(n)`.
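
The gap between the amortised and the worst-case cost can be seen by counting element copies in a toy model of a dynamic array. This is a sketch under stated assumptions: the function `count_copies` is hypothetical, the buffer is assumed to start with capacity 1, and the growth factor is assumed to be :math:`k=2`. Python's actual allocation policy differs in detail.

```python
def count_copies(n, growth=2):
    """Count element copies made while appending n items to a toy dynamic array.

    Assumption: the buffer starts with capacity 1 and its capacity is
    multiplied by `growth` whenever it is full.
    """
    capacity = 1
    length = 0
    total_copies = 0
    worst_single_append = 0
    for _ in range(n):
        copies = 0
        if length == capacity:
            # Buffer full: allocate a larger one and copy every element.
            capacity *= growth
            copies = length
        length += 1
        total_copies += copies
        worst_single_append = max(worst_single_append, copies)
    return total_copies, worst_single_append


total, worst = count_copies(1000)
```

With these assumptions, 1000 appends cause 1023 copies in total, or about one copy per append on average, while the single most expensive append copies 512 elements.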
453468

454-
.. note::
469+
We can use Python's :term:`introspection` capabilities to illustrate how the
470+
dynamic allocation of space for a list works as the list is appended. The
471+
:func:`sys.getsizeof` function returns the amount of computer memory that an
472+
object consumes. The function in :numref:`byte_size` uses this to diagnose the memory
473+
consumption of progressively longer lists, and :numref:`byte_size_demo` demonstrates
474+
this.
455475

456-
Not sure if you want this? Shows the byte usage of the array. If so,
457-
I will update the IPython call numbers.
458-
459-
.. code-block:: python
476+
.. _byte_size:
460477

461-
import sys
478+
.. code-block:: python3
479+
:caption: Code to progressively lengthen a :class:`list` and observe the
480+
impact on its memory consumption. This function is available as
481+
:func:`example_code.linked_list.byte_size`.
482+
:linenos:
483+
484+
import sys
485+
486+
def byte_size(n):
487+
"""Print the size in bytes of lists up to length n."""
488+
data = []
489+
for i in range(n):
490+
a = len(data)
491+
b = sys.getsizeof(data)
492+
print(f"Length:{a}; Size in bytes:{b}")
493+
data.append(i)
494+
495+
.. _byte_size_demo:
462496

463-
def byteSize(n):
464-
data = []
465-
for i in range(n):
466-
a = len(data)
467-
b = sys.getsizeof(data)
468-
print(f"Length:{a}; Size of bytes:{b}")
469-
data.append(i)
470-
471497
.. code-block:: ipython3
472-
473-
In [1]: n = 10
474-
In [2]: byteSize(n)
475-
Length:0; Size of bytes:72
476-
Length:1; Size of bytes:104
477-
Length:2; Size of bytes:104
478-
Length:3; Size of bytes:104
479-
Length:4; Size of bytes:104
480-
Length:5; Size of bytes:136
481-
Length:6; Size of bytes:136
482-
Length:7; Size of bytes:136
483-
Length:8; Size of bytes:136
484-
Length:9; Size of bytes:200
485-
486-
Some more abstract data types
487-
-----------------------------
488-
489-
Queue and deque
490-
~~~~~~~~~~~~~~~
498+
:caption: The memory consumption of lists of length 0 to 19. We can infer
499+
that the list is reallocated at lengths 1, 5, 9, and 17.
500+
501+
In [1]: from example_code.linked_list import byte_size
502+
503+
In [2]: byte_size(20)
504+
Length:0; Size in bytes:56
505+
Length:1; Size in bytes:88
506+
Length:2; Size in bytes:88
507+
Length:3; Size in bytes:88
508+
Length:4; Size in bytes:88
509+
Length:5; Size in bytes:120
510+
Length:6; Size in bytes:120
511+
Length:7; Size in bytes:120
512+
Length:8; Size in bytes:120
513+
Length:9; Size in bytes:184
514+
Length:10; Size in bytes:184
515+
Length:11; Size in bytes:184
516+
Length:12; Size in bytes:184
517+
Length:13; Size in bytes:184
518+
Length:14; Size in bytes:184
519+
Length:15; Size in bytes:184
520+
Length:16; Size in bytes:184
521+
Length:17; Size in bytes:256
522+
Length:18; Size in bytes:256
523+
Length:19; Size in bytes:256
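
Rather than reading the reallocation points off the printed output by eye, one could collect them programmatically. The sketch below is a hypothetical variation on `byte_size`. The function name `reallocation_lengths` is not part of the example code, and it assumes that a change in the value reported by :func:`sys.getsizeof` signals a reallocation.

```python
import sys


def reallocation_lengths(n):
    """Return the list lengths at which the reported size in bytes changes."""
    data = []
    lengths = []
    previous = sys.getsizeof(data)
    for i in range(n):
        data.append(i)
        current = sys.getsizeof(data)
        if current != previous:
            # The size changed, so this append triggered a reallocation.
            lengths.append(len(data))
            previous = current
    return lengths


lengths = reallocation_lengths(20)
```

On an interpreter that behaves as in the listing above, this would be expected to report lengths 1, 5, 9 and 17. The exact values depend on the Python version and platform.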
524+
525+
526+
Queues and deques
527+
-----------------
491528

492529
A :term:`queue` is, like a :term:`stack`, an ordered sequence of
493530
objects. The difference is that the only accessible item in the
@@ -502,6 +539,41 @@ deque. Python's standard library contains the
502539
:class:`collections.deque` class, providing a simple and efficient
503540
implementation of a deque.
504541

542+
Ring buffers
543+
~~~~~~~~~~~~
544+
545+
How might one go about implementing a deque? A dynamic array allows values to be
546+
appended with :math:`O(1)` complexity, but doesn't offer an efficient mechanism
547+
for prepending values. One might think that the natural solution for this would
548+
be to create a double-ended dynamic array: a buffer with spare space at each
549+
end. Unfortunately this is not optimally efficient in the case where the deque
550+
is used to implement a queue of approximately constant length. In that case,
551+
values are consistently added at one end of the data structure and removed from
552+
the other. Even in the case of a double-ended dynamic array, the buffer space at
553+
the append end of the queue will constantly run out, necessitating an expensive
554+
copy operation. The solution is to use a dynamic array, but to logically join up
555+
its ends, so that the first position in the buffer follows on from the last.
556+
Only in the case where all positions in the buffer are full would the buffer be
557+
reallocated.
558+
559+
.. figure:: images/ring_buffer.*
560+
561+
An implementation of a deque in a ring buffer, with queue
562+
operations illustrating its operation.
563+
564+
Objects are added to the end of the
565+
buffer and removed from its start.
566+
567+
At step 7, the contents of the buffer
568+
wrap around: the queue at this stage contains `D, E, F`.
569+
570+
At step 9 there is
571+
insufficient space in the buffer to append `G`, so new space is allocated
572+
and the buffer's contents copied to the start of the new buffer.
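
The wrap-around and the reallocation can be sketched in a few lines of Python. This is an illustrative sketch, not the implementation used by :class:`collections.deque`. The class name `RingBufferQueue`, the initial capacity of 4 and the choice to double the buffer when it is full are all assumptions. Only the queue operations are shown, and the sequence of operations does not reproduce the numbered steps of the figure.

```python
class RingBufferQueue:
    """Hypothetical sketch of a queue stored in a ring buffer."""

    def __init__(self, capacity=4):
        self.buffer = [None] * capacity
        self.start = 0    # index of the oldest item
        self.length = 0   # number of items currently stored

    def append(self, value):
        if self.length == len(self.buffer):
            self._grow()
        # The modulo makes the position after the last wrap to the first.
        end = (self.start + self.length) % len(self.buffer)
        self.buffer[end] = value
        self.length += 1

    def popleft(self):
        if self.length == 0:
            raise IndexError("pop from an empty queue")
        value = self.buffer[self.start]
        self.buffer[self.start] = None
        self.start = (self.start + 1) % len(self.buffer)
        self.length -= 1
        return value

    def _grow(self):
        # Copy the contents, in queue order, to the start of a larger buffer.
        old = self.buffer
        self.buffer = [None] * (2 * len(old))
        for i in range(self.length):
            self.buffer[i] = old[(self.start + i) % len(old)]
        self.start = 0


queue = RingBufferQueue(capacity=4)
for item in "ABCD":
    queue.append(item)
queue.popleft()          # removes A
queue.popleft()          # removes B
queue.append("E")        # wraps around into the slot A occupied
queue.append("F")        # buffer is now full: E, F, C, D
wrapped = list(queue.buffer)
queue.append("G")        # no space left, so the buffer is reallocated
```

As long as items are removed about as fast as they are added, `append` keeps reusing the freed slots and `_grow` is never called, which is the property that the double-ended dynamic array lacks.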
573+
574+
575+
Some more abstract data types
576+
-----------------------------
505577

506578
Linked lists
507579
~~~~~~~~~~~~
@@ -747,10 +819,10 @@ to keep track of the iteration.
747819
def __init__(self, link):
748820
self.here = link
749821
750-
def __iter__():
822+
def __iter__(self):
751823
return self
752824
753-
def __next__(self):
825+
def __next__(self):
754826
if self.here:
755827
next = self.here
756828
self.here = self.here.next
