I hope it's not poor form to answer my own question, but upon trying to implement a LogicProgramAnswersetIterator
, I ran into some problems that lead me to think either I would have to go to "heroic" measures to pretend that my objects are iterable, or else admit that iterator
is not a good match for this use case. I think that the reasons why that is the case illuminate differences between stream
s and iterator
s; the objects I'm working with are either "actually streams" or "something else", but they are not iterators.
tl;dr: Streams are different from iterators in one or both of two of the following aspects. First, an iterator to N results is post-advanced N times to consume all results, but a stream pointing to N results is pre-advanced (i.e. something is fetched) N+1 times to consume all results. Second, most iterators are ephemeral pointers to persistent state, and most streams are persistent pointers to ephemeral state.
The first problem I ran into is that there's nothing (no safe thing) to point at to return a real reference from operator *()
. That leaves me needing to return a proxy object that acts like a reference, a la std::vector<bool>
. This is not a showstopper, but it does mean that many of the limitations/surprises mentioned in articles about std::vector<bool>
necessarily apply here too.
The second problem I ran into is that my objects require a "force the first answer" operation before you can get the first valid value, and there's no good place to put this call. It's potentially expensive, and mutates either the iterator, or the thing the iterator points to, or both. If the "force an answer" operation could be confined to the pre-increment operator ++(int)
it could be justifiable, but unfortunately this must be done before returning from the dereference operator *()
. We are left with a few choices:
- Expect the user to call
++iter
first, then call *iter
after to get the 0th item.
- This is just not how iterators work.
- Force the first result in the
.begin()
method before constructing the iterator.
- Violates expectations because a call like "do_only_if(false_condition, src.begin(), src.end())" will actually execute the logic program at least once.
- (And what happens if you call
.begin()
twice?)
- Force the first result in the iterator constructor itself.
- Has all the problems of doing it in .begin().
- Force the first result in
operator *()
.
- Requires a needs_eval flag. This flag has to be a mutable member variable, because it needs to be updated in const versions of the member functions.
- The flag check logic needs to be duplicated in
operator ++(int)
in case the user didn't dereference the iterator.
- Worse / most astonishingly, this flag check and force result needs to be done in
operator ==()
! (Because although you can easily "fake a failure" to create a "not-successful" iterator return from the .end()
function, you can't know that an unevaluated iterator is going to fail (and thus, equals .end()
) until you force the first result.)
Wait a minute! Isn't the std::istream_iterator wrapper class subject to all of these same design constraints? Indeed, it is, and the caveats required to meet them in its implementation lead Eric Niebler in his blog post to advise: "Naked input iterators like std::istream_iterator that don’t “belong” to a range have serious problems." [Because converting the istream to an iterator eagerly fetches a value, and calling .begin() twice actually advances the underlying stream twice.]
I think the last sentence of Useless's answer, "If the only sane way to do this is wrapping it in an istream_iterator<MyComplexRecord>
anyway, you could just write the iterator," could be charitably taken as an answer in its inverse: "If the only sane way [to make this into an iterator] is wrapping it in an istream_iterator
, [then maybe it's really a stream]."
The nature of the incompatibility between streams and iterators is something I've decided to call the "fence-post mismatch problem", in analogy to the class of bugs called "fence-post errors". A "fence-post error" is a logic error committed when failing to account for the fact that a fence with N sections has N+1 posts. The "fence-post mismatch problem" is one of:
- The difficulty arising when trying to adapt an object for which N results is associated with N+1 advancements, onto an interface for which N results is associated N advancements. Graphically:
|--|--| onto |--|--
- The difficulty arising when trying to adapt an object with results-between-advancements onto an interface designed for advancements-between-results. Graphically:
|--|--| onto --|--|--
.
- A combination of these two things. Graphically:
|--|--| onto --|--|
The canonical model of an iterator range in C++ is an array where begin
points to the first element of the array and is dereferenceable, and end
points to the one past the end of the array and is not dereferenceable. Since begin starts out already pointing at a valid element (provided the array was not empty), the number of elements is N and the number of advancements to the end is also N.
Contrast this with the result set from a logic expression evaluation: the initial state of the result, before the first answer is forced, is not dereferencable. The result that marks the end, "failure", is also not dereferenceable. So the answer set is book-ended by advancement operations on both sides, like posts on a fence, therefore N results are associated with N+1 advancements.
Clearly this situation satisfies the first case of the fence-post mismatch problem.
The second case I defined for the fence-post mismatch problem is more subtle; but it is not just word-play - even if it sounds like a trivial difference of (English language) semantics, the question of whether the iterator is advancing between results or the result is moving between advancements has a concrete implementation in the program operational semantics.
Observe that in the case of iterators pointing into an array, the results are part of a larger persistent object and previous results still exist when the iterator is advanced. This is advancements-between-results
because the results make up a fixed environment and it is the advancement operation which is moving between results.
In contrast the case of "iterating" through the results of a logic program has no actualized collection in memory. The computational mechanism ticking and whirring out results is the only persistent object, and the result is a temporary read off of the current state. As we compute the next answer, the states of parts of this machine reconfigure into a new result - therefore it is the result which is moving between advancements. This is results-between-advancements
.
The promised simple explanation in operational semantics is this: for advancements-between-results
we can return a reference (or reference proxy) to the current result and then advance the iterator without affecting the previously returned result and without making a copy of the result. For results-between-advancements
we must violate at least one of the three conditions: either we have to return results by value (not reference), or we are returning a reference to an internal variable that is going to get overwritten by advancing, or else we are copying results somewhere.
This case of which-is-between-what differs from the case of "N vs N+1" because the latter concerns "when" and "how many times" the iterator is/must be advanced, and the former concerns "what happens when you dereference the iterator." Both, however, are amenable to the metaphor of fences and fence-posts.
While this answer as a whole may have been a roundabout and long-winded alternative to the observation from this Boost page: that standard iterator categories unfortunately conflate the two orthogonal traits of traversal and dereference, I want to close by talking about why I think that the fence-post problem pushes things across the line from "it's tricky to make an adapter" to "probably indicates these things should not be treated the same."
The cases of the fence-post mismatch problem involve two things that C++ is not inherently designed to allow code to be agnostic to: when (or how many times) something happens, and who owns the memory to which those things are happening. That is, it's not that you can't make it all happen automatically (by going to heroic measures), it's just that as a language C++ expects the user to have an intent in mind for these things, and that's inherently not compatible with shoehorning objects into an API which masks these details. Without garbage collection you can't just say "I don't care where the memory comes from or what happens to it later" and without non-strict evaluation with memoization you can't just say "I don't care when / how many times this code is evaluated".
In conclusion: iterators and streams are different things because they have different fence-post attributes. While it is possible to adapt one to the other, ignoring the differences can lead to astonishing results. In C++ in particular, one should consider carefully whether it is worthwhile to try to force one into the interface of the other, because hiding the differences requires making assumptions about things that C++ expects the user to make conscious choices about.
decltype(it)
tostd::iterator_traits
and a stream when you can<<
or>>
values into / out of it. I.e. they can be interchangeable