A Month of Haskell, Day 5 - Applicative, Alternative, and Functor

Posted on May 9, 2017 by Chris Lumens in month-of-haskell.

And now I’m two days behind. If I can think of something sufficiently quick, I’ll have a bonus day where I do two posts. For now, you’re just going to have to enjoy this one single post about three complementary type classes: Applicative, Alternative, and Functor. The first two are in the Control.Applicative module, while the last is in the Data.Functor module. I’m going to ignore the jargon-heavy definitions of these type classes and skip right to showing how they can be used.

Both Control.Applicative and Data.Functor are part of the base system, so there’s nothing you need to install.

Prior to version 4.8.0 of the base system, a lot of the functions in Applicative were not re-exported as part of the prelude. To use them, you had to import them. If you care about supporting both older and newer versions of GHC, you can use the CPP language extension. I’m going to cover that in greater depth in a future post, but here’s the gist of it:

{-# LANGUAGE CPP #-}

#if !MIN_VERSION_base(4,8,0)
import Control.Applicative((<$>))
#endif

As you can see, the directives look just like the C preprocessor’s in C or C++.

I think it makes more sense to talk about this module in terms of what you can do with it, rather than just running through definitions. The useful functions will come up as we go along.

Functors

We’re going to use the <$> function quite a bit, which is part of the functor module but is re-exported by Applicative. These two type classes are all tangled up with each other, so it’s worth covering functors briefly first.

Type classes:

class Functor f where
    fmap :: (a -> b) -> f a -> f b
    (<$) :: a -> f b -> f a

class Functor f => Applicative f where
    pure :: a -> f a
    (<*>) :: f (a -> b) -> f a -> f b
    (*>) :: f a -> f b -> f b
    (<*) :: f a -> f b -> f a

As with a lot of things in Haskell, “functor” is a fancy word for a really simple concept. It’s just a type class that provides the fmap function. It includes some other things, but they’re less important. A functor is anything that can be mapped over, and the fmap function is how you do that.

Type signatures:

fmap :: (a -> b) -> f a -> f b

A list is the most basic and most obvious example of a functor:

ghci> fmap (+1) [1, 2, 3]
[2,3,4]

Plenty of other types can be mapped over, but in a more abstract way. I prefer to think of a functor as a type you can reach inside of and apply a function to. So, Maybe is a functor:

ghci> fmap (* 2) (Just 4)
Just 8
ghci> fmap (* 2) Nothing
Nothing
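To make the "reach inside" idea concrete, here is a minimal sketch of a Functor instance for a made-up container. The Pair type and the names doubled and shown are hypothetical, invented for illustration; only the Functor class, fmap, and <$> come from the prelude.

```haskell
-- A hypothetical container holding exactly two values of the same type.
data Pair a = Pair a a
    deriving (Eq, Show)

-- Mapping over a Pair applies the function to both slots
-- and leaves the shape of the container alone.
instance Functor Pair where
    fmap f (Pair x y) = Pair (f x) (f y)

-- Pair 6 8
doubled :: Pair Int
doubled = fmap (* 2) (Pair 3 4)

-- Pair "1" "2": the element type changes, the container does not.
shown :: Pair String
shown = show <$> Pair (1 :: Int) 2
```

Once the instance exists, anything written against Functor works on Pair without further changes.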

IO is a functor, too:

ghci> :m +System.Directory Data.Char
System.Directory Data.Char> fmap (map toUpper) getHomeDirectory
"/HOME/CLUMENS"

An awful lot of things are functors. And anywhere you use fmap, you can use <$> instead. I suggest only doing so when an operator would make code look more natural. That last example could be written like this instead:

ghci> :m +System.Directory Data.Char
System.Directory Data.Char> map toUpper <$> getHomeDirectory
"/HOME/CLUMENS"

Eliminating intermediate variables

Here’s one way you could get the time and add a colon between the hour and minutes:

import Data.Time.Clock(UTCTime(..), getCurrentTime)
import Data.Time.Format(defaultTimeLocale, formatTime)

format :: String -> UTCTime -> String
format fmt time = formatTime defaultTimeLocale fmt time

theTime :: IO String
theTime = do
    time <- getCurrentTime
    return $ format "%R" time

addColon :: String -> String
addColon [h1, h2, m1, m2] = [h1, h2, ':', m1, m2]
addColon s = s

main :: IO ()
main = do
    time <- theTime
    let time' = addColon time
    putStrLn time'

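Aside from the fact that time-related code is awful everywhere, there’s nothing really tricky here. addColon is pretty fragile, though. The "%R" format already produces HH:MM, so the five-character string never matches the first pattern and passes through the second clause unchanged; the first clause would only fire with a colon-free format such as "%H%M". And what if you change the format string and return a time with seconds? But aside from that, there’s no real problem. Running it gives you an answer you might expect: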

$ runhaskell time.hs 
20:39

There’s something I really hate about this code, though, and it happens in two spots. The theTime function has this intermediate time variable that we only need because of the IO monad. If we were in an imperative language we wouldn’t need it - we could just pass the result of getCurrentTime right into the formatting function. If we try that in Haskell, however, we get:

time.hs:9:26: error:
    • Couldn't match expected type ‘UTCTime’
                  with actual type ‘IO UTCTime’
    • In the second argument of ‘format’, namely ‘getCurrentTime’
      In the second argument of ‘($)’, namely
        ‘format "%R" getCurrentTime’
      In the expression: return $ format "%R" getCurrentTime

The same thing happens in the main function, too. To me, these intermediate steps obscure what is actually happening in the code. They make things needlessly wordy and make it seem like there’s some deficiency with functional programming.

How could we get rid of those? We use the <$> function. Instead of thinking about all that functor nonsense, I prefer to think about this function as being very similar to $, but for different types. Squint hard enough and those brackets go away and that’s what it is.
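Here is a small sketch of that comparison, putting $ and <$> side by side. The names plain, wrapped, and listed are made up for this example.

```haskell
-- The two types line up once you ignore the f:
--   ($)   ::              (a -> b) ->   a ->   b
--   (<$>) :: Functor f => (a -> b) -> f a -> f b

-- Plain application: the argument is an ordinary value.
plain :: Int
plain = (+ 1) $ 41

-- Functor application: the argument is wrapped in Maybe.
wrapped :: Maybe Int
wrapped = (+ 1) <$> Just 41

-- The same operator works for any functor, such as a list.
listed :: [Int]
listed = (+ 1) <$> [1, 2, 3]
```

Here plain is 42, wrapped is Just 42, and listed is [2,3,4].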

We can eliminate the intermediate variable in theTime in two different ways, depending on whether you like functions or operators more:

theTime2 :: IO String
theTime2 = fmap (format "%R") getCurrentTime

theTime3 :: IO String
theTime3 = format "%R" <$> getCurrentTime

Both result in the exact same time string, and both eliminate the intermediate variable. Of the two, I think I prefer the operator version this time. However, there are plenty of places for fmap.

Then there’s the main function. That can be shortened up like so:

main :: IO ()
main = do
    time <- addColon <$> theTime
    putStrLn time

Here, addColon is a function that doesn’t have anything to do with the IO monad or any other monad at all. Its type is String -> String, but it somehow just works in this case. That’s due to the type of <$> and the fact that IO is a functor. We are reaching inside of the result of theTime and running addColon on what’s inside. Maybe using fmap would be more obvious here, but I like the pipeline style.

In general, anywhere you do something of this form:

v <- someFunction
return (someFunction2 v)

you should think about using fmap or <$> to shorten things up. hlint will remind you if you forget. At the least, this can save you from having to think up a lot of crazy temporary variable names just to throw them away on the next line.
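A minimal sketch of that rewrite, with a stand-in action so it runs anywhere. fetchName is a hypothetical placeholder for a real action like getHomeDirectory, and the two shout functions are invented names.

```haskell
import Data.Char (toUpper)

-- Hypothetical stand-in for a real IO action.
fetchName :: IO String
fetchName = return "clumens"

-- With a temporary variable that exists only to be passed along.
shoutLong :: IO String
shoutLong = do
    name <- fetchName
    return (map toUpper name)

-- The same thing with <$>, no temporary needed.
shoutShort :: IO String
shoutShort = map toUpper <$> fetchName
```

Both actions produce "CLUMENS". The rewrite only applies when the second step is a pure function; if it is itself an IO action, >>= is the tool to reach for instead.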

Here’s a variation on that theme. Consider this code:

dlg <- new Dialog []
box <- dialogGetContentArea dlg
set box [ #spacing := 12 ]

let s = T.concat ["<b>Duplicate QSO detected</b>\n\n",
                  "A QSO made with ", asText qCall, " at ", T.pack . colonifyTime $ qTime, " ",
                  T.pack . dashifyDate $ qDate, " on ", showt qFreq,
                  " is a potential duplicate."]

lbl <- new Label [ #label := s, #useMarkup := True ]
containerAdd box lbl

void $ dialogAddButton dlg "Cancel" 1
void $ dialogAddButton dlg "Log it" 0

widgetShowAll dlg
ret <- dialogRun dlg
widgetHide dlg
return ret

This code uses a lot of advanced GTK stuff with haskell-gi, which I plan on going into in another post. However, it should be fairly easy to follow because it looks a lot like GTK code in C (or any other language, really). I’m leaving off the giant block of imports.

For this post, the most important thing is the last four lines. dialogRun blocks the screen while the dialog is displayed and returns the value associated with whatever button is pressed. This is pretty obnoxious code, though. We are using another temporary variable here, but it doesn’t fit the previous pattern because of having to run widgetHide in the middle.

Luckily, digging into the Applicative docs shows a function that takes care of this problem. The <* function (whose type signature was shown in the Applicative type class definition above) runs what’s on its left side, then runs what’s on its right side, and returns the value from the left side. Any value from the right side is discarded. Rewritten, the last four lines look like this:

widgetShowAll dlg
dialogRun dlg <* widgetHide dlg

There’s also a *> function that is similar to <* but drops the value of the left side and returns the value of the right side. I’ve not had much reason to use it, but it’s there if you need it.
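To see that both sides really do run, and in which order, here is a small sketch that logs each step to an IORef. The step helper and the keepLeft and keepRight names are hypothetical, written just for this demonstration; IORef comes from base.

```haskell
import Data.IORef (IORef, modifyIORef, newIORef, readIORef)

-- Hypothetical helper: record a label in the log, then return a value.
step :: IORef [String] -> String -> Int -> IO Int
step logRef label value = do
    modifyIORef logRef (++ [label])
    return value

-- <* runs left, then right, and keeps the left result.
keepLeft :: IO (Int, [String])
keepLeft = do
    logRef <- newIORef []
    result <- step logRef "left" 1 <* step logRef "right" 2
    entries <- readIORef logRef
    return (result, entries)

-- *> runs left, then right, and keeps the right result.
keepRight :: IO (Int, [String])
keepRight = do
    logRef <- newIORef []
    result <- step logRef "left" 1 *> step logRef "right" 2
    entries <- readIORef logRef
    return (result, entries)
```

keepLeft yields (1,["left","right"]) and keepRight yields (2,["left","right"]). The log is identical in both cases; only the value that survives differs.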

Alternatives

Sometimes, you want to try a couple different actions and take whichever one succeeds. That’s where the Alternative type class comes in. It’s an Applicative that adds two more basic functions, empty and the operator-like <|>. As you can see from the instances, not everything that is an Applicative is an Alternative, but at least lists, Maybe, and IO are. That covers a lot.

Type classes:

class Applicative f => Alternative f where
    empty :: f a
    (<|>) :: f a -> f a -> f a
    some :: f a -> f [a]
    many :: f a -> f [a]

To see why Alternative is useful, look at this contrived example:

let x = Nothing
let y = Just 2

if isJust x then x
else
    if isJust y then y
    else Just 3

Running this mess will give you Just 2. But imagine if there were a third or fourth or a whole list of possibilities to check. That would be a lot of stairsteps, and any time you see stairsteps you should start feeling like there’s a better way to do it.

That is exactly the point of <|>. The following would also return Just 2:

let x = Nothing
let y = Just 2

x <|> y <|> Just 3
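When the possibilities really do arrive as a list, asum from Data.Foldable folds the whole thing with <|>. The sketch below uses made-up names (candidates, byHand, and so on) and also shows that <|> means something different for lists.

```haskell
import Control.Applicative ((<|>))
import Data.Foldable (asum)

candidates :: [Maybe Int]
candidates = [Nothing, Just 2, Just 3]

-- Chaining by hand.
byHand :: Maybe Int
byHand = Nothing <|> Just 2 <|> Just 3

-- asum combines every element with <|>, ending in empty.
viaAsum :: Maybe Int
viaAsum = asum candidates

-- With no candidates at all the result is empty, which is Nothing for Maybe.
noChoice :: Maybe Int
noChoice = asum []

-- For lists, <|> is concatenation rather than "first success".
listChoice :: [Int]
listChoice = [1, 2] <|> [3]
```

byHand and viaAsum are both Just 2, noChoice is Nothing, and listChoice is [1,2,3].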

Because of Maybe’s Alternative definition, the first thing in the chain that returns a Just value will be the final result, and nothing after it in the chain is evaluated. This can be very handy in real world applications:

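import Control.Applicative((<|>))
import Control.Monad.Trans.Maybe(MaybeT(..))
import System.Directory(doesFileExist)

findConfigFile :: FilePath -> IO (Maybe FilePath)
findConfigFile fp = do
    ret <- doesFileExist fp
    if ret then return (Just fp) else return Nothing

main :: IO ()
main = do
    cfg <- runMaybeT $ MaybeT (findConfigFile "/home/clumens/.foorc") <|>
                       MaybeT (findConfigFile "/usr/local/etc/foorc") <|>
                       MaybeT (findConfigFile "/etc/foorc")
    print cfg

If none of these files exist, you’ll get Nothing. Otherwise, the first one that exists will be the value of cfg. I have implemented my own wrapper around doesFileExist to make this work, but it’s not doing anything special. It’s just there to return the right type so the Applicative style works out. The MaybeT wrapper, from the transformers package that ships with GHC, is what lets <|> look at the Maybe inside each IO action. Applied to the bare IO actions, <|> would use IO’s own Alternative instance, which only moves on when the left side throws an IOError, so a Nothing from the first check would end the chain right there.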
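For comparison, here is a small sketch of how <|> behaves when it is applied directly to IO actions. In IO, the right side is only tried when the left side throws an IOError, so an action that successfully returns Nothing counts as a success. The pick helper and the three example actions are hypothetical; empty and optional come from Control.Applicative.

```haskell
import Control.Applicative (empty, optional, (<|>))

-- Hypothetical helper: succeed with the value when the flag is set,
-- otherwise fail with IO's empty, which throws an IOError.
pick :: Bool -> a -> IO a
pick ok x = if ok then return x else empty

-- The first action fails, so <|> falls through to the second.
firstOk :: IO (Maybe String)
firstOk = optional (pick False "a" <|> pick True "b" <|> pick True "c")

-- Every action fails; optional turns the failure into Nothing.
noneOk :: IO (Maybe String)
noneOk = optional (pick False "a" <|> pick False "b")

-- Returning Nothing is not a failure as far as IO is concerned,
-- so the right side is never tried.
pitfall :: IO (Maybe Int)
pitfall = return Nothing <|> return (Just 1)
```

firstOk produces Just "b", noneOk produces Nothing, and pitfall produces Nothing even though a Just was on offer. Which instance of <|> you get depends entirely on the type of the things being combined.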

Applicative Records

There are plenty more things you can do with applicatives, but here’s just one more example for now. Let’s say we want to make a record that contains several things involving the IO monad:

import System.Directory(XdgDirectory(..), getHomeDirectory, getXdgDirectory)
import System.Posix.User(GroupEntry, getAllGroupEntries, getLoginName)

data User = User { homeDir :: FilePath,
                   xdgDir :: FilePath,
                   loginName :: String,
                   userGroups :: [GroupEntry],
                   ident :: Int }

main :: IO ()
main = do
    homeDir <- getHomeDirectory
    xdgDir <- getXdgDirectory XdgData ""
    loginName <- getLoginName
    userGroups <- getAllGroupEntries

    let u = User { homeDir=homeDir,
                   xdgDir=xdgDir,
                   loginName=loginName,
                   userGroups=userGroups,
                   ident=0 }

    return ()

We’re back to temporary variables here, and it’s still annoying. Because the IO monad is involved again, we have to make a temporary variable for each element of the record to run the action, just to put it into the record and throw the variable name away. You might be thinking there’s a better way to do it, and you’re right. With the <*> function from Applicative, we can condense it like this:

import System.Directory(XdgDirectory(..), getHomeDirectory, getXdgDirectory)
import System.Posix.User(GroupEntry, getAllGroupEntries, getLoginName)

data User = User { homeDir :: FilePath,
                   xdgDir :: FilePath,
                   loginName :: String,
                   userGroups :: [GroupEntry],
                   ident :: Int }

main :: IO ()
main = do
    u <- User <$> getHomeDirectory
              <*> getXdgDirectory XdgData ""
              <*> getLoginName
              <*> getAllGroupEntries
              <*> return 0

    return ()

Use <$> for the first element and <*> for all subsequent elements, keeping them in the same order as the fields in the record declaration. If you are interested in the details, the documentation is somewhat enlightening. Also, because every argument to <*> has to live in the same applicative, and the first four elements of the record are IO actions, the last one needs to be as well. That’s why you have to use return (or pure, its Applicative equivalent) to lift the integer value into IO.
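The same shape works in any applicative, not just IO. As a sketch, here is a record built from Maybe values, where the whole result is Nothing if any one field is missing. The Server type, its fields, and mkServer are hypothetical names invented for this example; lookup is from the prelude and readMaybe is from Text.Read.

```haskell
import Text.Read (readMaybe)

-- Hypothetical record, invented for this example.
data Server = Server { host :: String,
                       port :: Int,
                       retries :: Int }
    deriving (Eq, Show)

-- Each field comes from a Maybe; pure lifts the constant into Maybe too.
mkServer :: [(String, String)] -> Maybe Server
mkServer settings = Server <$> lookup "host" settings
                           <*> (readMaybe =<< lookup "port" settings)
                           <*> pure 3
```

Given [("host", "example.org"), ("port", "8080")], mkServer returns Just a Server with port 8080 and retries 3. Leave out the port, or make it something that isn’t a number, and the result is Nothing.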

So that’s about it for now. Functors come up all over the place. Applicatives come up in surprising places too. In the future, I hope to cover writing parsers in an applicative style.