Wednesday, November 26, 2014

Releasing haskell-names-0.5.0

As the new maintainer of haskell-names I am happy to release version 0.5.0. The biggest
changes are:

  • Class and instance declarations are now properly resolved
  • Unify the two types for type-level and value-level symbols
  • Annotate global symbol occurrences with their qualification
  • Remove all the different notions of names
  • Symbols do not contain package information anymore

All the names exported from a module are stored in a .names file in JSON format. For example a small excerpt from Prelude.names looks like:

[
  {
    "name": "map",
    "entity": "value",
    "module": "GHC.Base"
  },
  {
    "name": "IO",
    "entity": "newtype",
    "module": "GHC.Types"
  },
  ...

A symbol is uniquely identified by its name, the module it originates from and what entity it refers to. Some symbols carry additional information, for example constructors have an additional field for the type they belong to.

Let’s look at the new example from the haskell-names github page. The findHeads function finds all occurrences of the head symbol in a given AST of a Haskell module. It needs access to stored name information and therefore runs in ModuleT.

findHeads :: Module SrcSpanInfo -> ModuleT [Symbol] IO [SrcSpanInfo]
findHeads ast = do

First we get all symbols exported from Prelude with getModuleInfo.

  symbols <- fromMaybe (error "Prelude not found") <$>
    getModuleInfo "Prelude"

Then we filter those for the one with name "head".

  let
    headSymbol =
      fromMaybe (error "Prelude.head not found") (listToMaybe (do
        symbol <- symbols
        guard (symbolName symbol == UnAnn.Ident "head")
        return symbol))

We annotate the given ast.

  annotatedAst <-
    annotateModule
      Haskell2010 -- base language
      []          -- set of extensions
      ast

We get a list of all annotations from the annotated module.

  let
    annotations = Foldable.toList annotatedAst

A GlobalSymbol annotation means that the annotated name refers to a global symbol. It also contains the qualified name that corresponds to how it is referenced but that is not needed here.

    headUsages = nub (do
      Scoped (GlobalSymbol globalSymbol _) location <- annotations
      guard (globalSymbol == headSymbol)
      return location)

And finally we return all found usages.

 return headUsages

That concludes the brief example of how you could use haskell-names.

Big thanks to Roman for his awesome work.

Thursday, October 9, 2014

The Haskell image processing benchmark game

Since by now we have several ongoing image processing library projects in Haskell I thought it’s time to spice things up a little. I therefore announce the Haskell image processing benchmark game. Right now the five contestants are: friday, unm-hip, yarr, repa and opencv but the contest is open and participation is just one pull request away! The disciplines are: reading an image, binary thresholding and mean filtering. Of course more disciplines will be added.
Performance is not the only goal one should strive for. Ease of use and number of predefined algorithms are two other important dimensions.
Let’s compare the different libraries with code examples for specific use cases.

The image type

Currently all benchmarks run on grey images so the first thing we need is a type for those.

-- friday
import Vision.Image.Grey (Grey)
type Image = Grey

-- unm-hip
import Data.Image.Boxed (GrayImage)
type Image = GrayImage

-- yarr
import Data.Yarr (UArray,D,L,Dim2)
import Data.Word (Word8)
type Image = UArray D L Dim2 Word8

-- repa
import Data.Array.Repa (Array,D,DIM2)
import Data.Word (Word8)
type Image = Array D DIM2 Word8

-- opencv
import Foreign.ForeignPtr (ForeignPtr)
data IPLImage
type Image = ForeignPtr IPLImage

For friday and unm-hip the preferred type for grey images was easy to find. There is only one. For yarr and repa not so much. In opencv there is only one image type for any kind of image regardless of the pixel type. From experience I can tell that this is not a good thing.

Reading images from disk

Next we want to read in an image. Let’s compare:

-- friday
import Vision.Image.Storage (load)
import Vision.Image.Type (convert)

readPng :: FilePath -> IO Image
readPng filepath = do
    Right image <- load Nothing filepath
    return (convert image)

-- unm-hip
import Data.Image.Boxed (readImage)

readPgm :: FilePath -> IO Image
readPgm = readImage

-- yarr
import Data.Yarr (delay)
import Data.Yarr.IO.Image (readImage)
import qualified Data.Yarr.IO.Image as Yarr (Image(Grey))

readPng :: FilePath -> IO Image
readPng filepath = do
    Yarr.Grey image <- readImage filepath
    return (delay image)

-- repa
import Data.Array.Repa.IO.DevIL (readImage,runIL)
import qualified Data.Array.Repa.IO.DevIL as Repa (Image(Grey))
import Data.Array.Repa (delay)

readPng :: FilePath -> IO Image
readPng filepath = runIL (do
   Repa.Grey image <- readImage filepath
   return (delay image))

-- opencv
import Foreign.C.String (CString,withCString)
import Foreign.Ptr (Ptr,FunPtr)
import Foreign.ForeignPtr (ForeignPtr,newForeignPtr)

foreign import ccall "&finalizeImage" opencv_finalizeImage ::
    FunPtr (Ptr IPLImage -> IO ())

foreign import ccall "readPng" opencv_readPng ::
    CString -> IO (Ptr IPLImage)

addImageFinalizer :: Ptr IPLImage -> IO (ForeignPtr IPLImage)
addImageFinalizer = newForeignPtr opencv_finalizeImage

readPng :: FilePath -> IO Image
readPng filepath = do
    imagePtr <- withCString filepath opencv_readPng
    addImageFinalizer imagePtr

-- opencv C code
void finalizeImage(IplImage* image){
    cvReleaseImage(&image);
}

IplImage* readPng(char* filepath){
    return cvLoadImage(filepath,CV_LOAD_IMAGE_GRAYSCALE);
}

Reading in an image in unm-hip is straight forward. Except that it does not know how to read PNG and can only read PGM. For yarr, repa and friday I had to look a little more stuff up than I’d have liked to. And opencv being a foreign library we need to manually make every image we create garbage collected by adding a finalizer.

Forcing an image

For benchmarking we also need a way to make sure the resulting images are fully evaluated. The friday and opencv image types already guarantee that. For the others we define an extra force function:

-- unm-hip
import Data.Image.Internal (maxIntensity)

force :: Image -> IO Image
force image = do
    maxIntensity image `seq` return image

-- yarr
import Data.Yarr (UArray,F,L,Dim2)
import Data.Yarr.Eval (dComputeP)

force :: Image -> IO (UArray F L Dim2 Word8)
force = dComputeP

-- repa
import Data.Array.Repa (Array,DIM2,computeP,deepSeqArray)
import Data.Array.Repa.Repr.Unboxed (U)
import Data.Word (Word8)

force :: Image -> IO (Array U DIM2 Word8)
force image = do
    forcedImage <- computeP image
    forcedImage `deepSeqArray` return forcedImage

Wait, what kind of hack is that forcing an unm-hip image? I couldn’t find a better way because the constructor of the image type is hidden. Of course this means that the benchmark is pointless, but I decided to include it anyway.

Binary thresholding

Now let’s look at the actual image processing algorithms. The threshold function should accept a grey value image and yield an image where every pixel that has a value above 127 in the original image has value 255 and all others have value 0.

-- friday
import qualified Vision.Image.Threshold as Friday (
    threshold)
import Vision.Image.Threshold (
    ThresholdType(BinaryThreshold))

threshold :: Image -> Image
threshold = Friday.threshold (>127) (BinaryThreshold 0 255)

-- unm-hip
threshold :: Image -> Image
threshold = toBinaryImage (<127)

-- yarr
import Data.Yarr.Flow (dmap)

threshold :: Image -> Image
threshold = dmap (\value -> if value > 127 then 255 else 0)

-- repa
import qualified Data.Array.Repa as Repa (map)

threshold :: Image -> Image
threshold = Repa.map (\value -> if value > 127 then 255 else 0)

-- opencv
foreign import ccall "threshold" opencv_threshold :: Ptr IPLImage -> IO (Ptr IPLImage)

threshold :: Image -> IO Image
threshold = withImage opencv_threshold

withImage :: (Ptr IPLImage -> IO (Ptr IPLImage)) -> Image -> IO Image
withImage f image = withForeignPtr image (\imagePtr -> do
    f imagePtr >>= addImageFinalizer)

-- opencv C code
IplImage* threshold(IplImage* image){
IplImage* output = cvCreateImage(cvGetSize(image),IPL_DEPTH_8U,1);
cvThreshold(image,output,127,255,CV_THRESH_BINARY);
return output;
}

Binary thresholding is a simple map that is straight forward to implement in yarr and repa. The other three friday, unm-hipand opencv provide their own specialized functions for binary thresholding. unm-hips toBinaryImage chooses reasonable default values for you. friday provides its own mini language for creating filtering operations that I had to learn. opencv takes as its fifth parameter an integer that represents the thresholding algorithm to be used and depending on that gives different meaning to the other parameters. I personally find that very confusing.

Mean filter

Now the task is to apply a five by five mean filter:

-- friday
import Vision.Image.Filter (apply,blur,SeparableFilter)

mean :: Image -> Image
mean = apply (blur 2 :: SeparableFilter GreyPixel GreyPixel GreyPixel)

-- unm-hip
import Data.Image.Convolution (convolveRows,convolveCols)

mean :: Image -> Image
mean = fmap (/25) . convolveCols [1,1,1,1,1] . convolveRows [1,1,1,1,1]

-- yarr
import Data.Yarr.Convolution (
    dConvolveLinearDim2WithStaticStencil,
    dim2St,Dim2Stencil)
import Data.Yarr.Utils.FixedVector (N1,N5)

mean :: Image -> IO Image
mean image = do

    rowConvolvedImage <- dComputeP (
        dConvolveLinearDim2WithStaticStencil
            rowMeanStencil (dmap fromIntegral image))

    convolvedImage <- dComputeP (
        dConvolveLinearDim2WithStaticStencil
            colMeanStencil (rowConvolvedImage :: UArray F L Dim2 Int))

    return (dmap (fromIntegral . (`div` 25)) (
        convolvedImage :: UArray F L Dim2 Int))

rowMeanStencil :: Dim2Stencil N5 N1 Int (Int -> Int -> IO Int) Int
rowMeanStencil = [dim2St| 1 1 1 1 1 |]

colMeanStencil :: Dim2Stencil N1 N5 Int (Int -> Int -> IO Int) Int
colMeanStencil = [dim2St| 1
                          1
                          1
                          1
                          1 |]

-- repa
import Data.Array.Repa.Algorithms.Convolve (
    convolveOutP,outClamp)
import Data.Array.Repa (
    Array,DIM2,delay,computeP,Z(Z),(:.)((:.)))
import Data.Array.Repa.Repr.Unboxed (
    U,fromListUnboxed)

mean :: Image -> IO Image
mean image = do
    intImage <- computeP (Repa.map fromIntegral image)
    rowConvolvedImage <- convolveOutP outClamp rowMeanStencil intImage
    convolvedImage <- convolveOutP outClamp colMeanStencil rowConvolvedImage
    grayImage <- computeP (Repa.map (fromIntegral . (`div` 25)) convolvedImage)
    return (delay (grayImage :: Array U DIM2 Word8))

rowMeanStencil :: Array U DIM2 Int
rowMeanStencil = fromListUnboxed (Z:.1:.5) [1,1,1,1,1]

colMeanStencil :: Array U DIM2 Int
colMeanStencil = fromListUnboxed (Z:.5:.1) [1,1,1,1,1]

-- opencv
foreign import ccall "mean" opencv_mean ::
    Ptr IPLImage -> IO (Ptr IPLImage)

mean :: Image -> IO Image
mean = withImage opencv_mean

-- opencv C code
IplImage* mean(IplImage* image){
    IplImage* output = cvCreateImage(cvGetSize(image),IPL_DEPTH_8U,1);
    cvSmooth(image,output,CV_BLUR,5,5,0.0,0.0);
    return output;
}

friday provides it’s own set of functions for filtering. Using unm-hip feels like functional image processing should feel like. Using repa and yarr to implement a five by five mean filter required quite a bit of reading documentation and examples. Again the opencv function cvSmooth accepts as its third parameter an integer representing the smoothing algorithm to be used and changes the meaning of the other parameters or completely ignores them depending on that. This is why I think we need one or more Haskell image processing library, so let’s work together to make it happen.

Oh, right, there are benchmark results as well! Disclaimer: While I have tried my best, I might have used the libraries in the wrong way. Please do correct my mistakes.

EDIT: After using deepseq to force unm-hip images, rewriting the mean function in friday to pointful style and adjusting the yarr and repa functions these are the new results.

Sunday, October 5, 2014

Two sorting algorithms in Morte

So I have translated the non-recursive 1 insertion and selection sort to Morte. The idea was to check if they both reduce to the same highly optimized code. It turns out that they do not. I will reproduce the code here anyway because I might have made a mistake or it is useful to someone else.

-- Foreign imports
(  \(a : *)
-> \(min : a -> a -> a)
-> \(max : a -> a -> a)
->

-- Module definitions
(  \(L_a : * -> *)
-> \(Mu : (* -> *) -> *)
-> \(Nu : (* -> *) -> *)
-> \(fmapL :
        forall (x:*) ->
        forall (y:*) ->
        (x -> y) ->
        L_a x ->
        L_a y)
-> \(fold :
        forall (f : * -> *) ->
        forall (x:*) ->
        (f x -> x) ->
        Mu f -> x)
-> \(wrap :
        forall (f : * -> *) ->
        (forall (x:*) ->forall (y:*) -> (x -> y) -> f x -> f y) ->
        f (Mu f) ->
        Mu f)
-> \(unfold :
        forall (g: * -> *) ->
        forall (s:*) ->
        (s -> g s) ->
        s ->
        Nu g)
-> \(unwrap :
        forall (g : * -> *) ->
        (forall (x:*) -> forall (y:*) -> (x -> y) -> g x -> g y) ->
        Nu g ->
        g (Nu g))
-> \(swap :
        forall (x:*) ->
        L_a (L_a x) ->
        L_a (L_a x))
->
--  Insertion sort
    fold L_a (Nu L_a) (
        unfold L_a (L_a (Nu L_a)) (\(s : L_a (Nu L_a)) ->
            swap (Nu L_a) (
                fmapL (Nu L_a) (L_a (Nu L_a)) (unwrap L_a fmapL) s)))
--  Selection sort
--  unfold L_a (Mu L_a) (
--      fold L_a (L_a (Mu L_a)) (\(f_x : L_a (L_a (Mu L_a))) ->
--          fmapL (L_a (Mu L_a)) (Mu L_a) (wrap L_a fmapL) (
--              swap (Mu L_a) f_x)))
)

-- type L_a x = z -> (a -> x -> z) -> z
(\(x:*) -> forall (z:*) -> z -> (a -> x -> z) -> z)

-- data Mu f = Mu (forall x . (f x -> x) -> x)
(\(f : * -> *) -> forall (x:*) -> (f x -> x) -> x)

-- data Nu g = forall s . Nu (s -> g s) s
(\(g : * -> *) -> forall (y:*) -> (forall (s:*) -> (s -> g s) -> s -> y) -> y)

-- fmapL :: (x -> y) -> L_a x -> L_a y
-- fmapL f la_x empty cons = la_x empty (\va vx -> cons va (f vx))
(   \(x:*)
->  \(y:*)
->  \(f : x -> y)
->  \(la_x : forall (z:*) -> z -> (a -> x -> z) -> z)
->  \(z:*)
->  \(empty : z)
->  \(cons : a -> y -> z)
->  la_x z empty (\(va:a) -> \(vx:x) -> cons va (f vx))
)

-- fold :: (f x -> x) -> Mu f -> x
-- fold alg (Mu mu_f) = mu_f alg
(   \(f : * -> *)
->  \(x:*)
->  \(alg : f x -> x)
->  \(mu_f : forall (x:*) -> (f x -> x) -> x)
->  mu_f x alg
)

-- wrap :: f (Mu f) -> Mu f
-- wrap f_mu_f = Mu (\alg -> alg (fmap (fold alg) f_mu_f)))
(
(   \(Mu : (* -> *) -> *)
->  \(fold : forall (f : * -> *) -> forall (x:*) -> (f x -> x) -> Mu f -> x)
->  \(f : * -> *)
->  \(fmap : forall (x:*) -> forall (y:*) -> (x -> y) -> f x -> f y)
->  \(f_mu_f : f (Mu f))
->  \(y:*) ->  \(alg : f y -> y)
->  alg (fmap (Mu f) y (fold f y alg) f_mu_f)
)
-- Mu
(\(f : * -> *) -> forall (x:*) -> (f x -> x) -> x)
-- fold
(   \(f : * -> *)
->  \(x:*)
->  \(alg : f x -> x)
->  \(mu_f : forall (x:*) -> (f x -> x) -> x)
->  mu_f x alg
)
)

-- unfold :: (s -> g s) -> s -> Nu g
-- unfold coalg vs = Nu coalg vs
(   \(g : * -> *)
->  \(s : *)
->  \(coalg : s -> g s)
->  \(vs : s)
->  \(y : *)
->  \(caseNu : forall (s:*) -> (s -> g s) -> s -> y)
->  caseNu s coalg vs
)

-- unwrap :: Nu g -> g (Nu g)
-- unwrap (Nu coalg vs) = fmap (unfold coalg) (coalg vs)
(
(   \(Nu : (* -> *) -> *)
->  \(unfold : forall (g: * -> *) -> forall (s:*) -> (s -> g s) -> s -> Nu g)
->  \(g : * -> *)
->  \(fmap : forall (x:*) -> forall (y:*) -> (x -> y) -> g x -> g y)
->  \(nu_g : forall (y:*) -> (forall (s:*) -> (s -> g s) -> s -> y) -> y)
->  nu_g (g (Nu g)) (\(s:*) -> \(coalg : s -> g s) -> \(vs:s)
->  fmap s (Nu g) (unfold g s coalg) (coalg vs))
)
-- Nu
(\(g : * -> *) -> forall (y:*) -> (forall (s:*) -> (s -> g s) -> s -> y) -> y)
-- unfold
(   \(g : * -> *)
->  \(s : *)
->  \(coalg : s -> g s)
->  \(vs : s)
->  \(y : *)
->  \(caseNu : forall (s:*) -> (s -> g s) -> s -> y)
->  caseNu s coalg vs
)
)

-- swap :: L_a (L_a x) -> L_a (L_a x)
-- swap Empty = Empty
-- swap (Cons a Empty) = Cons a Empty
-- swap (Cons a1 (Cons a2 x)) = Cons (min a1 a2) (Cons (max a1 a2) x)
(
(   \(L_a : * -> *)
->  \(empty : forall (x:*) -> L_a x)
->  \(cons : forall (x:*) -> a -> x -> L_a x)
->  \(caseLa : forall (x:*) -> L_a x -> forall (z:*) -> z -> (a -> x -> z) -> z)
->  \(x:*)
->  \(outer : L_a (L_a x))
->  caseLa (L_a x) outer (L_a (L_a x))
        (empty (L_a x))
        (\(vaa : a) -> \(inner : L_a x) -> caseLa x inner (L_a (L_a x))
            (cons (L_a x) vaa (empty x))
            (\(vab : a) -> \(vx : x) ->
                (cons (L_a x) (min vaa vab) (cons x (max vaa vab) vx))))

)
-- L_a
(\(x:*) -> forall (z:*) -> z -> (a -> x -> z) -> z)
-- empty
(\(x:*) -> \(z:*) -> \(empty : z) -> \(cons : a -> x -> z) -> empty)
-- cons
(\(x:*) -> \(head : a) -> \(tail : x) ->
    \(z:*) -> \(empty : z) -> \(cons : a -> x -> z) ->
        cons head tail)
-- caseLa
(\(x:*) -> \(lax : forall (z:*) -> z -> (a -> x -> z) -> z) -> lax)
)

)

It would still be interesting to compare the reduced code to the Core that GHC produces or to see if one can encode tree sort and merge sort in Morte as well.


  1. What I mean with non-recursive is that they avoid general recursion and are defined using only primitive recursion.

Sunday, September 28, 2014

A non-recursive sorting algorithm

So I told my (non-Haskeller) friends about Morte and how it promises that programs with identical semantics have identical performance. They immediately replied: “How is that possible, if we have different sorting algorithms with the same semantics but different asymptotic performance”. I replied: “Either it is impossible to write a sorting algorithm in Morte, or they all have the same performance”. Looking back I now understand that identical semantics here means identical semantics as provable by equational reasoning and that there might be other notions of identical semantics where programs with identical semantics can have different performance.
Anyway, I challenged myself to write a sorting algorithm in Morte to see what’s going on. As a first step I had to find a non-recursive Haskell sorting algorithm. This is what the rest of this post is about. I then plan to translate it to Morte.

Let’s first introduce the standard F-Algebra and F-Coalgebra machinery. Why? Because Gabriel suggest to implement recursion with an algebra and corecursion with a coalgebra. A little generalization doesn’t hurt.

data Fix f = In {out :: f (Fix f)}

fold :: (Functor f) => (f x -> x) -> Fix f -> x
fold alg = alg . fmap (fold alg) . out

wrap :: f (Fix f) -> Fix f
wrap = In

unfold :: (Functor g) => (s -> g s) -> s -> Fix g
unfold coalg = In . fmap (unfold coalg) . coalg

unwrap :: Fix g -> g (Fix g)
unwrap = out

But these standard definitions for Fix, fold and unfold are recursive. This will not do.

The non-recursive version encodes the least fix point Mu as the ability to fold and the greatest fix point Nu as the ability to unfold. Mu uses universal quantification and Nu uses existential quantification. wrap folds the inner data and then applies the given algebra one more time. unwrap applies the coalgebra once and then proceeds to unfold with the new seed.

data Mu f = Mu (forall x . (f x -> x) -> x)

fold :: (f x -> x) -> Mu f -> x
fold alg (Mu mu) = mu alg

wrap :: (Functor f) => f (Mu f) -> Mu f
wrap layer = Mu (\alg -> alg (fmap (fold alg) layer))


data Nu g = forall s . Nu (s -> g s) s

unfold :: (s -> g s) -> s -> Nu g
unfold coalg s = Nu coalg s

unwrap :: (Functor g) => Nu g -> g (Nu g)
unwrap (Nu coalg s) = fmap (unfold coalg) (coalg s)

We then have different types for lists and streams that both share the same underlying Functor called L.

data L a x = Empty | Cons a x deriving (Functor)

type List a = Mu (L a)

type Stream a = Nu (L a)

The actual sorting algorithm is from this paper about sorting with folds and unfolds. It does a better job at explaining it than I could do here so I’ll just give you the definitions for insertion sort and selection sort.

insertionSort :: (Ord a) => List a -> Stream a
insertionSort = fold insert

insert :: (Ord a) => L a (Stream a) -> Stream a
insert = unfold (swap . fmap unwrap)

selectionSort :: (Ord a) => List a -> Stream a
selectionSort = unfold select

select :: (Ord a) => List a -> L a (List a)
select = fold (fmap wrap . swap)

swap :: (Ord a) => L a (L a x) -> L a (L a x)
swap Empty = Empty
swap (Cons a Empty) = Cons a Empty
swap (Cons a1 (Cons a2 x)) = if a1 <= a2
    then Cons a1 (Cons a2 x)
    else Cons a2 (Cons a1 x)

If we translate the definitions for insertion sort and for selection sort to Morte, will they be identical?