Problem

After becoming interested in various bits of functional programming provided by non-functional languages, I’ve recently decided to start learning Haskell. As I’m fairly experienced with conventional imperative programming languages, I decided to make my “hello world” something a little more complex – an implementation of gradient descent with 2 variables.

The idea with this code is that you fill in the training set for the algorithm in the code, and then run something similar to this:

```
descentFunc = singleDescend trainingSet 0.1
deltas = descend descentFunc ( 100, 1 ) 100000
```

Here, `0.1` is the learning rate (referred to as `lr` in the code), `100000` is the number of iterations for the loop, and `( 100, 1 )` is an initial guess for the coefficients.

The actual code that's running is below. As I'm entirely new to Haskell, I was wondering whether code such as this is acceptable, and whether there are any obvious idioms I'm missing or misusing, or any glaring style errors I've made.

Any comments on my implementation of the algorithm are welcome also.

```
import Data.List

trainingSet = [ (1.0,10.0),(2.0,20.0),(3.0,30.0),(4.0,40.0) ]

hypothesis (d1,d2) x = d1 + (d2 * x)

squareDiff d (x,y) = diff * diff
  where diff = ( hypothesis d x ) - y

costFunc ts d = scaleFactor * sum squares
  where scaleFactor = 1.0 / (2 * genericLength ts)
        squares = map (squareDiff d) ts

diff d (x,y) = (hypothesis d x) - y

descendTheta thetaFunc deltaFunc ts lr deltas =
    deltaFunc deltas - (scaleFactor * sum scaledDiffs)
  where scaleFactor = lr / genericLength ts
        diffs = map (diff deltas) ts
        scaledDiffs = map (\(x,y) -> x * thetaFunc y) $ zip diffs ts

descendThetaZero = descendTheta (\_ -> 1) fst
descendThetaOne = descendTheta fst snd

singleDescend ts lr deltas = (thetaZero,thetaOne)
  where thetaZero = descendThetaZero ts lr deltas
        thetaOne = descendThetaOne ts lr deltas

descend func deltas i
  | i == 0 = deltas
  | otherwise = descend func ( func deltas ) ( i - 1 )
```

Solution

Welcome to Haskell! Your code is quite good for a beginner; I have only peripheral recommendations on your style.

```
import Test.HUnit
import Data.List
import Control.Arrow
```

As @epsilon ‘εⳆ2’ halbe suggested below, write type signatures for your functions first; this will help you avoid obvious errors.

```
type DeltaPair = (Double, Double)
type PairFunc = DeltaPair -> Double
trainingSet = [(1.0,10.0),(2.0,20.0),(3.0,30.0),(4.0,40.0)]
```

The classes in `Control.*` are often very useful. These are just your functions, modified a little bit. This may seem like overkill, but as a beginner you will profit tremendously if you internalize the functions in the `Control.*` namespace. At the very least, understand the functions used here. (I would certainly recommend your definition of `diff` over this one, but the point here is demonstration.)

```
diff :: DeltaPair -> DeltaPair -> Double
diff d = uncurry (-) . first (flip hypothesis d)
  where hypothesis x = uncurry (+) . first (* x)
```
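If these combinators are unfamiliar, a few toy evaluations (my own illustrations, not part of the answer) show what `first`, `second`, and `uncurry` do to pairs:

```
import Control.Arrow (first, second)

-- first applies a function to the left component of a pair
ex1 :: (Int, Int)
ex1 = first (+ 1) (10, 20)    -- (11, 20)

-- second applies a function to the right component
ex2 :: (Int, Int)
ex2 = second (* 2) (10, 20)   -- (10, 40)

-- uncurry converts a two-argument function into one taking a pair
ex3 :: Int
ex3 = uncurry (+) (3, 4)      -- 7
```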

The `scaledDiffs` can be written in two ways: one by recognizing that it is actually a `zipWith`, and the other using the functions in `Control.Arrow` to manipulate the pair. The second gets you a point-free form. Use whichever you feel comfortable with, but understand both.

I have replaced `genericLength` with `fromIntegral . length` because it seems from your training set that the set is not very large. If that is not the case, change it back to `genericLength`.
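For small lists the two spellings agree; the difference is that `genericLength` returns a value at any `Num` type directly, while `length` returns an `Int` that must be converted (the names below are my own illustration):

```
import Data.List (genericLength)

xs :: [Int]
xs = [1, 2, 3]

-- genericLength produces the Double directly
lenA :: Double
lenA = genericLength xs

-- length produces an Int, converted with fromIntegral
lenB :: Double
lenB = fromIntegral (length xs)
```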

```
descendTheta :: PairFunc -> PairFunc -> [DeltaPair] -> Double -> DeltaPair -> Double
descendTheta thetaFunc deltaFunc ts lr deltas = deltaFunc deltas - sSum
  where scaleFactor = lr / fromIntegral (length ts)
        -- scaledDiffs ts = zipWith (\x y -> x * thetaFunc y) (diffs ts) ts
        scaledDiffs ts = map (uncurry (*) . second thetaFunc) $ zip (diffs ts) ts
        diffs = map (diff deltas)
        sSum = scaleFactor * sum (scaledDiffs ts)
```

It is often hard to decide which definitions go on the top level. A good rule of thumb is to see whether the function being defined can be reused elsewhere. If it can't, it is probably better off as a sub-definition under a `where` clause.

```
singleDescend :: [DeltaPair] -> Double -> DeltaPair -> DeltaPair
singleDescend ts lr deltas = (fn descendThetaZero, fn descendThetaOne)
  where fn f = f ts lr deltas
        descendThetaZero = descendTheta (const 1) fst
        descendThetaOne = descendTheta fst snd
```

We should always be on the lookout for general functions. These help us tremendously in refactoring the code. Here is one such function.

```
napply :: Int -> (c -> c) -> c -> c
napply n = ((!! n) .) . iterate

descend :: (DeltaPair -> DeltaPair) -> DeltaPair -> Int -> DeltaPair
descend = flip . flip napply

deltas n = descend descentFunc (100, 1) n
  where descentFunc = singleDescend trainingSet 0.1
```
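As a quick sanity check on `napply` (the doubling function is just an illustration): applying `(* 2)` three times to `1` walks the `iterate` list `[1, 2, 4, 8, …]` and picks index 3:

```
-- napply n f x applies f to x exactly n times
napply :: Int -> (c -> c) -> c -> c
napply n = ((!! n) .) . iterate

doubledThrice :: Int
doubledThrice = napply 3 (* 2) 1   -- 8
```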

A few unit tests:

```
tests = TestList [ TestLabel "1" test1
                 , TestLabel "2" test2
                 , TestLabel "3" test3
                 ]

test1 = TestCase $ assertEqual "A1" (100.0, 1.0) (deltas 0)
test2 = TestCase $ assertEqual "A1" (92.25,-17.25) (deltas 1)
test3 = TestCase $ assertEqual "A1" (89.8375,-19.875) (deltas 2)

tt = runTestTT tests
```

These two do not seem to be used anywhere:

```
{-
squareDiff :: Num a => (a, a) -> (a, a) -> a
squareDiff = ((^ 2) .) . diff

costFunc :: Fractional a => [(a, a)] -> (a, a) -> a
costFunc ts d = scaleFactor * sum squares
  where scaleFactor = 1.0 / (2 * fromIntegral (length ts))
        squares = map (squareDiff d) ts
-}
```

One obvious change is to add type signatures; I additionally introduced some type synonyms to distinguish between all those `Double`s. A bit more explanation of the algorithm would also help – I don't know what it actually wants to achieve. (I am no programmer, so this kind of non-understanding happens quite often.)

Next, it seems `costFunc` is never used – why is it there?

I've also replaced `genericLength` with `length` plus a conversion of the result – the library documentation says `length` is faster.

```
type TrainData = [(Double,Double)]
type LearnRate = Double
type DeltaPair = (Double,Double)

trainingSet :: TrainData
trainingSet = [ (1.0,10.0),(2.0,20.0),(3.0,30.0),(4.0,40.0) ]

hypothesis :: DeltaPair -> Double -> Double
hypothesis (d1,d2) x = d1 + (d2 * x)

costFunc :: TrainData -> DeltaPair -> Double
costFunc ts d = scaleFactor * sum squares
  where scaleFactor = 1.0 / (2 * fromIntegral (length ts))
        squares = map (square . diff d) ts
        square x = x * x

diff :: DeltaPair -> (Double, Double) -> Double
diff d (x,y) = hypothesis d x - y

descendTheta :: ((Double, Double) -> Double) -> (DeltaPair -> Double) -> TrainData -> LearnRate -> DeltaPair -> Double
descendTheta thetaFunc deltaFunc ts lr deltas =
    deltaFunc deltas - (scaleFactor * sum scaledDiffs)
  where scaleFactor = lr / fromIntegral (length ts)
        diffs = map (diff deltas) ts
        scaledDiffs = map (\(x,y) -> x * thetaFunc y) $ zip diffs ts

descendThetaZero :: TrainData -> LearnRate -> (Double, Double) -> Double
descendThetaZero = descendTheta (const 1) fst

descendThetaOne :: TrainData -> LearnRate -> (Double, Double) -> Double
descendThetaOne = descendTheta fst snd

singleDescend :: TrainData -> LearnRate -> DeltaPair -> DeltaPair
singleDescend ts lr deltas = (thetaZero, thetaOne)
  where thetaZero = descendThetaZero ts lr deltas
        thetaOne = descendThetaOne ts lr deltas

descend :: (DeltaPair -> DeltaPair) -> DeltaPair -> Int -> DeltaPair
descend f deltas i
  | i < 0     = error "no negative numbers"
  | i == 0    = deltas
  | otherwise = descend f (f deltas) (i-1)

main :: IO ()
main = print (descend descendFunc (100, 1) 10000)
  where descendFunc = singleDescend trainingSet 0.1
```

This was the simplest rewrite of your code that I could come up with:

```
{-# LANGUAGE UnicodeSyntax #-}
import Data.List

trainingSet = [(1, 10), (2, 20), (3, 30), (4, 40)]

hypothesis (δ1, δ2) x = δ1 + δ2 * x

diff δs (x, y) = hypothesis δs x - y

average xs = realToFrac (sum xs) / genericLength xs

descendθ θFunc ts lr δs = lr * average (map (\t -> diff δs t * θFunc t) ts)

next ts lr δs@(δ0, δ1) = (δ0', δ1')
  where δ0' = δ0 - descendθ (\(x, y) -> 1) ts lr δs
        δ1' = δ1 - descendθ (\(x, y) -> x) ts lr δs

descend f δs i = iterate f δs !! i
```

Some general comments:

You don't need `.0` suffixes on your numeric literals to indicate they are `Double`s, as you would in C. A type signature will suffice.
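For instance, with an explicit signature the bare literals are already read as `Double` (a minimal illustration):

```
-- The signature fixes the element type, so no .0 suffixes are needed
trainingSet :: [(Double, Double)]
trainingSet = [(1, 10), (2, 20), (3, 30), (4, 40)]
```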

I used the unicode syntax extension for greek variable names to make it easier on my eyes.

Your `descend` function is just `iterate` in disguise.
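The equivalence is easy to check side by side (the generic names below are my own choosing):

```
-- Explicit recursion, as in the original descend
descendRec :: (a -> a) -> a -> Int -> a
descendRec f x i
  | i == 0    = x
  | otherwise = descendRec f (f x) (i - 1)

-- The same function expressed with iterate
descendIter :: (a -> a) -> a -> Int -> a
descendIter f x i = iterate f x !! i
```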

Your variable naming in your `descendTheta` function was incredibly confusing, since you name your output variables theta even though they are basically the same as your input variables, which you named delta. I decided to name them both delta for consistency, leaving theta to refer only to the projection of the data set that you are descending on.

Your `scaledDiffs` function was an incredibly awkward and roundabout way to do what was essentially a single `map` and `average`. Unfortunately, the `average` function is not in the Haskell standard library, but the definition I included in the code is the most general version and will type-check for anything you would want to average.

Your training set and learning rate are constant over the entire run, so you could actually improve the above code even more by using the `Reader` monad to thread those variables through your code without explicit parameter-passing. If you don't know what the `Reader` monad is, just ask and I will update this post with a demonstration of it. It's very easy to use.
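As a rough sketch of the idea (the `Env` record and all names here are my own invention, not part of the answer, and it assumes the `mtl` package for `Control.Monad.Reader`):

```
import Control.Monad.Reader

-- Hypothetical environment bundling the run-wide constants
data Env = Env { envSet :: [(Double, Double)], envRate :: Double }

hypothesis :: (Double, Double) -> Double -> Double
hypothesis (d0, d1) x = d0 + d1 * x

-- One descent step; the training set and learning rate come from the
-- environment instead of being passed as explicit parameters
step :: (Double, Double) -> Reader Env (Double, Double)
step (d0, d1) = do
  Env ts lr <- ask
  let diffs = [hypothesis (d0, d1) x - y | (x, y) <- ts]
      avg f = lr * sum (zipWith (\d (x, _) -> d * f x) diffs ts)
                 / fromIntegral (length ts)
  return (d0 - avg (const 1), d1 - avg id)

demo :: (Double, Double)
demo = runReader (step (100, 1)) (Env [(1,10),(2,20),(3,30),(4,40)] 0.1)
```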

Also, I disagree that you must put type signatures on your code. For application code that only you will maintain, using type signatures is mostly a matter of style. For libraries, though, or code other people have to maintain, I would strongly recommend type signatures.