7

The Problem

Imagine you have a function in a public API that needs to return some kind of collection. Its type is something like

foo :: a -> b

where b is some kind of collection.

As far as I can see, there a three major kinds of collections -- at least those are the ones used most often:

  1. Sequences (order matters, there might be duplicates)
  2. Sets (order doesn't matter, no duplicates)
  3. Maps (elements are key-value pairs, order doesn't matter, no duplicate keys)

What should you use for b in these three cases?. I have considered three alternatives, but there might be more.


A. Just return a list

In this alternative, you just return lists for everything:

  1. foo :: a -> [c]
  2. foo :: a -> [c] (and document that there are no duplicates)
  3. foo :: a -> [(k, c)] (and document that there are no duplicate keys)

The advantage of this is that the client can construct any collection they want from the returned list.

The disadvantage for 2 and 3 is that

  • the type does not have the "no duplicates" guarantee and that the client has to do more work to get a set/map out of it
  • construction of the list in 2 and 3 might be less efficient, because something like a union for sets is obviously faster than for lists
  • you might construct an intermediate list (depends on the problem at hand).

B. Return a concrete type matching the guarantees

In this alternative, you return a type that precisely matches the guarantees you are giving:

  1. foo :: a -> [c] (or maybe foo :: a -> Data.Sequence.Seq c)
  2. foo :: a -> Data.Set.Set c (lazy or strict?)
  3. foo :: a -> Data.Map.Map k c (lazy or strict?)

The advantage is that the type accurately describes the guarantees you want to express and that the client doesn't have to do any extra work.

The disadvantage is that you're locked into a particular implementation. If you just want to switch to a different set implementation, you break the public API. That might be a good thing in case of a switch between lazy and strict, because those can matter for correctness of the calling code. But in general you want to be able to swap implementation without your clients noticing.

C. Let the client decide

In this alternative, you just constrain the return type with a type class and let the client decide on the concrete type:

1. `foo :: Sequence b c => a -> b`
2. `foo :: Set b c => a -> b`
3. `foo :: Map b c k => a -> b`

(The names I used are from collections-api, but it doesn't really matter.)

The advantage of this is that the client can choose which collection type to use.

The disadvantage is that you have no control over the collection you construct anymore. This means, e.g., that you cannot assume laziness or complexity of inserts. Another problem is that the client can get ambiguous types, e.g., when they immediately fold the returned collection:

Data.Foldable.fold (foo whatever)

Now b is ambiguous.


Are there any other alternatives? What would you recommend?

I don't want foo to be able to return a different kind of collection on every call. The three cases considered are completely separate.

In each case the question is "if I have a function that needs to return an X, what type should it have?" where X can be a sequence, set, or map (or any other kind of collection you can think of).

4
  • is there nothing like unfoldM ? Commented May 3, 2014 at 16:15
  • Does foo can return any type of collection in any calls? or some calls return some specific type? Commented May 3, 2014 at 16:37
  • @Jimmy Hoffa: that's alternative C. Commented May 3, 2014 at 18:30
  • @Simon: no, foo is just a placeholder here. I'm asking what to do if a function returns a sequence, what if it returns a set, etc. Commented May 3, 2014 at 18:31

2 Answers 2

5

My suggestion would be to declare an opaque newtype that hides the actual implementation and implement type classes that could be useful to your clients and that you can guarantee to be stable in the future. And you can also export functions that allow conversion of your data type to some standard ones.

This forces clients to use only functions or instances you provide and document, and also allows clients to use standard API such as Foldable:

module MyMod
    ( MyResult()
    , resultToSet   -- by hiding the internals and exporting the accessor
                    -- explicitly, you can always change the internal
                    -- representation and replace the accessor a function
                    -- with the same signature
    ) where

import Data.Monoid
import qualified Data.Foldable as F
import qualified Data.Set as S

newtype MyResult a = MyResult { resultToSet :: S.Set a }

-- For example, your clients could use `Foldable` and `Monoid` instances:

instance F.Foldable MyResult where
    foldMap f = F.foldMap f . resultToSet

instance (Ord a) => Monoid (MyResult a) where
    mempty = MyResult mempty
    mappend (MyResult x) (MyResult y) = MyResult (x `mappend` y)

Update: If you later decide to change the internal representation to HashSet, the module would look like this:

module MyMod
    ( MyResult()
    , resultToSet   -- by hiding the internals and exporting the accessor
                    -- explicitly, you can always change the internal
                    -- representation and replace the accessor a function
                    -- with the same signature
    ) where

import Data.Monoid
import qualified Data.Foldable as F
import Data.Hashable
import qualified Data.HashSet as HS
import qualified Data.Set as S

newtype MyResult a = MyResult { resultToHashSet :: HS.HashSet a }

-- For example, your clients could use `Foldable` and `Monoid` instances:

instance F.Foldable MyResult where
    foldMap f = F.foldMap f . resultToHashSet

instance (Eq a, Hashable a) => Monoid (MyResult a) where
    mempty = MyResult mempty
    mappend (MyResult x) (MyResult y) = MyResult (x `mappend` y)

-- Reimplementation of functions from the previous version

resultToSet :: (Ord a) => MyResult a -> S.Set a
resultToSet = S.fromList . HS.toList . resultToHashSet

The external interface is almost the same (the only change is that the Monoid instance now has slightly different constraints).


Another way that would give clients much flexibility is to let the collection be fully abstract. This can be nicely expressed using monoids. The output of your function will be a polymorphic monoid, we just need a way how to create singletons. For example, if we want to generalize enumFromTo, we could write

enumCol :: (Enum a, Monoid c) => a -> a -> ((a -> c) -> c)
enumCol f t singleton = mconcat . map singleton $ enumFromTo f t

It's then the client responsibility to use a monoid with effective mappend operation.

With RankNTypes the return type could be also encapsulated using type as

type OutCol e c = (Monoid c) => (e -> c) -> c

The additional benefit is that in some cases this can be very effective thanks to laziness. For example, the following returns immediately:

enumCol 0 (10^20) (First . Just)
5
  • Doesn't your first alternative still have the problem that I have to decide what type of set resultToSet returns? You're second alternative is basically C in my question. Do you know any packages that follow that approach? Commented May 4, 2014 at 18:40
  • @TobiasBrandt Yes, you need to decide it, but you're not bound by the implementation. If you later decide to change the internal representation from Set to HashSet, you can just replace resultToSet by a more sophisticated conversion. So clients won't be dependent on the internal representation.
    – Petr
    Commented May 4, 2014 at 19:54
  • @TobiasBrandt No, I'm not aware of any packages that follow the second approach. I agree, it's similar to C, with the small difference that by supplying the singleton function explicitly, we can avoid ambiguous types (at the cost of client having to supply it every time and perhaps also RankNTypes).
    – Petr
    Commented May 4, 2014 at 19:56
  • If I replace Set by HashSet, won't that require a recompilation by clients? Commented May 4, 2014 at 20:23
  • @TobiasBrandt I updated the example showing the changes when replacing Set with HashSet.
    – Petr
    Commented May 5, 2014 at 5:30
0

Why not just have three (or N) separate functions, one for each of the three (or N) separate types of collections?

2
  • Sorry, I think you misunderstood me. I'm asking how functions returning some particular kind of collections should be treated. The three cases are not the same function. Commented May 3, 2014 at 18:33
  • I dunno man. Sounds like this should be on cs.stackexchange.com =) Commented May 3, 2014 at 19:47

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.