-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathindex.html
212 lines (193 loc) · 15.7 KB
/
index.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
<!DOCTYPE html>
<html lang="en_us">
<head>
<title>Pointer Soup</title>
<meta charset="utf-8" />
<meta name="generator" content="Pelican" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<link rel="stylesheet" href="/static/css/main.css" />
<link rel="stylesheet" href="/static/css/theme.css" />
<link rel="icon" href="/images/yazani/yazani_1_extracted_bg_big_eyes_cropped.png" type="image/png" />
<link rel="apple-touch-icon" href="/images/yazani/yazani_1_extracted_bg_big_eyes_cropped.png" type="image/png" />
<script src="/static/misc.js"></script>
<script src="/blog/banner_image.js"></script>
<meta name="tags" content="programming, c, lisp, language-design" />
<meta property="og:site_name" content="dragoncoder047’s blog" />
<meta property="og:title" content="Pointer Soup" />
<meta property="og:description" content="I never realized that the concept of a pointer could get so confusing. Over the last two months I came up with a number of different ideas that all involve pointers and got my brain all twisted up in knots when I got to the pointer part. When I left …" />
<meta property="og:image" content="/images/yazani/yazani_1_extracted_bg.png" />
<meta property="og:type" content="article" />
<meta property="og:url" content="https://dragoncoder047.github.io/blog/2024/pointer-soup" />
<meta property="og:locale" content="['']" />
<meta name="twitter:card" content="summary_large_image" />
<meta name="twitter:title" content="dragoncoder047’s blog - Pointer Soup" />
<meta name="twitter:description" content="I never realized that the concept of a pointer could get so confusing. Over the last two months I came up with a number of different ideas that all involve pointers and got my brain all twisted up in knots when I got to the pointer part. When I left …" />
<meta name="twitter:image" content="/images/yazani/yazani_1_extracted_bg.png" />
<!-- PrismJS -->
<script src="/static/prism.js" data-autoloader-path="https://cdn.jsdelivr.net/npm/prismjs@v1.x/components/"></script>
<script src="/static/prism-runbutton.js"></script>
<script src="/phoo/prism-phoo.js"></script> <!-- /PrismJS -->
<!-- Katex -->
<link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/katex@0.16.11/dist/katex.css" type="text/css" />
<script defer src="https://cdn.jsdelivr.net/npm/katex@0.16.11/dist/katex.js"></script>
<script defer src="https://cdn.jsdelivr.net/npm/katex@0.16.11/dist/contrib/mhchem.js"></script>
<script defer src="https://cdn.jsdelivr.net/npm/katex@0.16.11/dist/contrib/auto-render.js"></script>
<link href="https://cdn.jsdelivr.net/npm/katex@0.16.11/dist/contrib/copy-tex.css" rel="stylesheet" type="text/css" />
<script src="https://cdn.jsdelivr.net/npm/katex@0.16.11/dist/contrib/copy-tex.js"></script>
<script>
window.addEventListener("DOMContentLoaded", () => {
renderMathInElement(document.body,
{
delimiters: [
{ left: "$$", right: "$$", display: true },
{ left: "$", right: "$", display: false },
{ left: "\\begin{align}", right: "\\end{align}", display: true },
]
}
);
});
</script> <!-- /Katex -->
</head>
<body class="match-braces rainbow-braces">
<header>
<a href="https://dragoncoder047.github.io/blog" class="flex-row"><div class="flex-row"><img src="/images/yazani/yazani_1_extracted_bg.png" style="max-height:10em" id="banner-image" /><div id="sitename-text"><h1>dragoncoder047’s blog</h1><h2>random thoughts about nonrandom things</h2></div></div></a>
<nav>
<ul>
<li><a href="https://dragoncoder047.github.io/blog/">Home</a></li>
<li><a href="https://dragoncoder047.github.io/blog/archives">Archives</a>
</li>
<li><a href="https://dragoncoder047.github.io/blog/tags">By tag</a>
</li>
<li><a href="/">Site root</a>
</li>
<li><a href="#">Projects</a>
<ul>
<li><a href="https://dragoncoder047.github.io/thuepaste">Thuepaste</a>
</li>
<li><a href="https://dragoncoder047.github.io/armdroid">Armdroid</a>
</li>
<li><a href="https://dragoncoder047.github.io/langton-music">Langton's Ant Music</a>
</li>
<li><a href="https://dragoncoder047.github.io/schemascii">Schemascii</a>
</li>
<li><a href="https://dragoncoder047.github.io/parasite">Parasite</a>
</li>
</ul>
</li>
<li>
<form action="https://www.google.com/search" method="GET">
<input name="q" type="search" placeholder="Search"></input>
<input type="hidden" name="as_sitesearch" value="https://dragoncoder047.github.io/blog"></input>
<input type="submit" value="Search"></input>
</form>
</li>
</ul>
</nav>
</header>
<main>
<h1><a href="https://dragoncoder047.github.io/blog/2024/pointer-soup" rel="bookmark" title="Permalink to this page">Pointer Soup</a></h1>
<div class="flex-row">
<span style="flex: 1">← Previous:
<a href="https://dragoncoder047.github.io/blog/2024/i-hope-this-sticks">
I Hope This Sticks
</a>
</span>
<span>Next:
<a href="https://dragoncoder047.github.io/blog/2024/now-fully-two-dimensional">
Now Fully Two-Dimensional
</a> →
</span>
</div>
<div class="post-info">
Posted <time class="published" datetime="2024-08-06T00:00:00-04:00">Tue 06 August 2024</time>
<address>By
<a href="https://dragoncoder047.github.io/blog/">dragoncoder047</a>
</address>
<div class="tags">
Tags:
<a href="https://dragoncoder047.github.io/blog/tag/c">c</a>
<a href="https://dragoncoder047.github.io/blog/tag/language-design">language-design</a>
<a href="https://dragoncoder047.github.io/blog/tag/lisp">lisp</a>
<a href="https://dragoncoder047.github.io/blog/tag/programming">programming</a>
</div>
</div>
<p>I never realized that the concept of a pointer could get so confusing. Over the last two months I came up with a number of different ideas that all involve pointers and got my brain all twisted up in knots when I got to the pointer part.</p>
<p>When I left off in June I had spent the month discussing a whole bunch of my ideas for uLisp with David Johnson-Davies, uLisp’s maintainer. He put out a new release – version 4.6 – that incorporated a number of these changes, but not all of them. The reason why he had not implemented many of them was that they affected the way uLisp interprets pointers, and my suggestions were so different than the current method that David (rightly) felt apprehensive.</p>
<p>One of the suggestions that I sent to David was that the symbol representation might be able to be sped up.</p>
<p>Currently, uLisp has three different representations of symbols: built-in symbols, packed symbols, and long symbols. All of the symbol objects contain a number in the <code>cdr</code> pointer, but whether it represents a pointer or not depends on the kind of symbol:</p>
<ol>
<li>
<p>Built-in symbols are represented as a number that is the index of the symbol in the built-in symbol table. However, to allow them to be distinguished from other kinds of symbols the index is “twisted” into a slightly different format by first adding a constant that has the top two bits set (specifically, <code class="language-c highlight">0xF4240000</code> on 32-bit uLisp), and then performing a rightwards bit rotation by 2 bits. The effect is that the bottom two bits of a built-in symbol’s representation will always be 1 (mathematically, the resultant number will be congruent to 3 modulo 4).</p>
<p>As a pointer, this is almost universally invalid; processors typically choke on pointers that aren’t aligned to a multiple of the processor’s word size (which is either 4 or 8 bytes).</p>
</li>
<li>
<p>Packed symbols are a way to save memory for common symbol names by encoding them as a base-40 number instead of storing every character. If the name contains only valid characters (namely, letters, digits, <code>$</code>, <code>-</code>, and <code>*</code>) and is short enough, the name is encoded in base-40 and then the resultant number is “twisted” in the same manner as built-in indexes (just with a different constant) again to ensure that at least one of the bottom two bits is set to 1.</p>
</li>
<li>
<p>Long symbols are sort of the “fallback” option: if no other representation fits, the symbol is encoded in the same manner as a uLisp string, with the characters stored in a linked list of chunks inside the uLisp <code>Workspace</code>, and then the symbol object’s value is a pointer to the head of the list. Since it’s obviously a valid pointer, the bottom two bits of a long symbol are guaranteed to be both 0’s.</p>
</li>
</ol>
<p>When the symbol is printed, the determination as to which kind of symbol it is has a few steps. If the bottom two bits are 0, then it’s obviously a long symbol, and it is printed in the same manner as a string (just without quotes). If either of the bottom two bits are 1 (a “misaligned” pointer), then it is either a built-in symbol or a packed symbol. The number is un-rotated, and if it is greater than the built-in symbols’ scaling constant, then it must be a built-in, otherwise it’s a packed symbol.</p>
<p>Suffice to say that this pointer-twisting was twisting my brain up in knots. This representation is a very smart, well-thought out, and headache-inducing encoding method – but kudos to David for coming up with it, because it works very well.</p>
<p>In figuring out how exactly this pointer-twisting scheme worked I noticed that there’s one assumption that this encoding scheme makes: <em>all</em> values with the bottom two bits set to 0 (i.e. the ones that represent valid pointers) must be reserved for long symbols. In reality, it’s only the values that are valid pointers <em>into the block of allocated objects</em> – the <code>Workspace</code> – that are the values reserved as long symbols. Of course, depending on the platform and the compiler, this block of addresses may be different. But what’s wrong with actually using the addresses of the top and bottom of the <code>Workspace</code> in the encoding/decoding computations?</p>
<p>I proposed changing the representations of the built-in symbols to point directly to the table entry that they came from. Since it’s another pointer, the bottom two bits would now always be 0, instead of always being 1. Long symbols and packed symbols would be represented <em>exactly the same way</em>.</p>
<p>The process for determining which kind of symbol a pointer represents would still involve pointer-twisting, but only for the packed symbols. If the pointer is not misaligned, then it’s either a built-in symbol pointer, or a long symbol pointer. The determination as to which it is, is made by – you guessed it – checking whether the pointer points to valid memory within uLisp’s <code>Workspace</code>. If it is within the bounds of the <code>Workspace</code>, it’s a long symbol. If it points to memory outside of the <code>Workspace</code>, then it is assumed to point to a table entry of a built-in symbol. Easy as that.</p>
<p>The added benefit of having the built-in symbols point directly to their table entries is that looking up the built-in function is made a heck of a lot faster – now it is just one pointer dereference, versus having to untwist the number first to get the index, which is itself a handful of instructions, and then using that index to find the entry in the built-in symbol table, another few instructions. The built-in functions are used a <em>lot</em> (by definition), so making them faster is an important optimization.</p>
<p>Unfortunately, David didn’t implement that. Granted, I still haven’t, either, but for a valid reason: thinking about pointers is difficult. Especially if they are part of a C <code class="language-c highlight">union</code> and you have to tell then apart from other “kinds” of pointers!</p>
<hr />
<p><strong>Related Posts</strong></p>
<ul>
<li><a href="https://dragoncoder047.github.io/blog/2023/manual-memory-management-madness">Manual Memory Management Madness</a></li>
<li><a href="https://dragoncoder047.github.io/blog/2024/a-hash-mapped-mess">A Hash-Mapped Mess</a></li>
<li><a href="https://dragoncoder047.github.io/blog/2024/the-lesser-of-two-evils">The Lesser of Two Evils</a></li>
<li><a href="https://dragoncoder047.github.io/blog/2023/continuations-and-the-thunk-queue">Continuations and the thunk queue</a></li>
<li><a href="https://dragoncoder047.github.io/blog/2023/powerful-pickle-pattern-matching">Powerful PICKLE Pattern Matching</a></li>
</ul>
<script src="https://giscus.app/client.js"
data-repo="dragoncoder047/blog"
data-repo-id="R_kgDOHCL60w"
data-category="Post Comments"
data-category-id="DIC_kwDOHCL6084CRxCW"
data-mapping="og:title"
data-reactions-enabled="1"
data-input-position="top"
data-theme="dark"
data-lang="en"
crossorigin="anonymous"
async
></script>
<section id="extras">
<div class="blogroll">
<ul>
<li><a href="https://www.conwaylife.com/">Conwaylife.com Forums</a></li>
<li><a href="https://www.python.org/">Python</a></li>
<li><a href="http://www.ulisp.com/">uLisp</a></li>
</ul>
</div>
<div class="social">
<ul>
<li><a href="https://github.com/dragoncoder047">dragoncoder047 on GitHub</a></li>
<li><a href="https://youtube.com/@dragoncoder047">dragoncoder047 on YouTube</a></li>
<li><a href="https://instagram.com/dragoncoder047/">dragoncoder047 on Instagram</a></li>
</ul>
</div>
</section>
</main>
<footer>
<address>
Site built by <a href="https://getpelican.com/">Pelican</a>
</address>
<a href="#" onclick="window.scrollTo({top: 0, left: 0});">Back to top</a>
<!-- Google tag (gtag.js) -->
<script async src="https://www.googletagmanager.com/gtag/js?id=G-XR0F89CCGK"></script>
<script>
window.dataLayer = window.dataLayer || [];
function gtag() {
dataLayer.push(arguments);
}
gtag("js", new Date());
gtag("config", "G-XR0F89CCGK");
</script>
</footer>
</body>
</html>