Saturday, October 12, 2013

Haskell type graphs with Uniplate and Haskell-src-exts

Summary: Generating graphs from Haskell modules is easy, using Haskell-src-exts and Uniplate.

In my recent Uniplate talk I showed a graph of which Haskell-src-exts types contain Exp at any level. Several people in the audience asked for the code, so I've tidied it up and will walk through it in this post. Before we start, the graph we want to generate is:


Each node is a type, and an edge from a to b means at least one constructor of type a contains a value of type b. I only include types which eventually reach Exp. To calculate this graph, I used Haskell-src-exts and Uniplate. The main function is:

import Language.Haskell.Exts          -- haskell-src-exts
import Data.Generics.Uniplate.Data    -- uniplate
import Data.List

main :: IO ()
main = writeFile "graph.dot" . graph . interesting "Exp" . reach =<< getModule

We can read main from right to left:

  • We first call getModule to get the module we are interested in.
  • Next we call reach to compute which types are directly contained within which types.
  • We use interesting to pick out only those types which can reach Exp.
  • We convert to a DOT file using graph.
  • And finally we write the file out to Graph.dot.
  • Afterwards we can manually run dot -Tpng graph.dot > graph.png to generate the image above.

Now let's go through each of the functions in turn.

The getModule function

getModule :: IO Module
getModule = do
    src <- readFile "../../haskell-src-exts/src/Language/Haskell/Exts/Syntax.hs"
    return $ fromParseResult $ parseModule $ unlines $ filter (not . bad) $ lines src
    where bad x = head (words x ++ [""]) `elem` words "#if #else #endif deriving #ifdef"

We read the module from a known location on disk and parse it. The only complication is that the Haskell-src-exts AST module uses the C pre processor (CPP). By filtering out all lines beginning with CPP directives, and also deriving, we end up with a module with the same type structure, but which can be parsed with Haskell-src-exts. An alternative approach would be to use something like cpphs to properly interpret the CPP directives.

The reach function

reach :: Module -> [(String, [String])]
reach m = [ (prettyPrint name, nub [prettyPrint x | TyCon x <- universeBi ctors])
          | DataDecl _ _ _ name _ ctors _ <- universeBi m]

This function gets a list of type name and contained types for all data declarations in the module. We use the universeBi function from Uniplate to find everything of the relevant type, a list comprehension pattern to filter it down and grab the parts we want, and then prettyPrint to convert things to String. This function is the only one that uses Uniplate.

The interesting function

interesting :: String -> [(String, [String])] -> [(String, [String])]
interesting target xs = [(a,b `intersect` keep) | (a,b) <- xs, a `elem` keep]
    where keep = f [target] xs
          f want xs = if null new then want else f (map fst new ++ want) rest
              where (new,rest) = partition (not . null . intersect want . snd) xs

This function removes all types which don't at some point include the target, which is Exp in our case. We first compute keep, which is the interesting types, then restrict the input to only refer to things in keep. To compute keep we start by using the target as the thing we want. We then repeatedly find any types that include anything we want. Once there is nothing new that we want, we finish. This function is the only one that performs thoughtful computation, the others are just shuffling data between formats.

The graph function

graph :: [(String, [String])] -> String
graph xs = unlines $ ["digraph g {"] ++ [from ++ " -> " ++ t ++ ";" | (from,to) <- xs, t <- to] ++ ["}"]

This function converts the types into GraphViz format, which is a list of edges in a directed graph.

License

This code is throwaway code I wrote to generate one slide, and didn't even bother checking into a version control system. If it's useful to you, please use it for whatever purpose you want. Ignoring blank lines, optional type signatures and imports, it only takes 12 lines.

No comments: