Read file only once

Problem

I have a function that returns the contents of a file.

Since reading files from disk is expensive, I’d like to avoid having to read the file again after the first read.

I’ve come up with the function getFileContents that caches the file content in memory during the first call and returns the cached contents when called again.

Here’s a short program including imports that demonstrates its behavior:

import qualified Data.ByteString               as BS
import           System.Directory               ( getCurrentDirectory )
import           System.FilePath
import           Control.Exception
import           Data.Typeable                  ( typeOf )
import           Text.Printf                    ( printf )
import           Data.IORef


main = do
  fileContentsRef        <- newIORef Nothing
  -- First time reading the file accesses the disk
  _                      <- getFileContents fileContentsRef
  fileContentsFromMemory <- getFileContents fileContentsRef
  print fileContentsFromMemory

getFileContents
  :: IORef (Maybe BS.ByteString) -> IO (Either IOException BS.ByteString)
getFileContents fileContentsRef = do
  refContents <- readIORef fileContentsRef
  case refContents of
    Just fileContents -> do
      putStrLn "Using cached file contents from memory"
      return $ Right fileContents
    Nothing -> readFileAndCacheContents fileContentsRef

readFileAndCacheContents
  :: IORef (Maybe BS.ByteString) -> IO (Either IOException BS.ByteString)
readFileAndCacheContents fileContentsRef = do
  putStrLn "Reading file from disk, then caching it"
  curDir <- getCurrentDirectory
  let filePath = curDir </> "aDir" </> "theFile"
  readResult <-
    (try $ BS.readFile filePath) :: IO (Either IOException BS.ByteString)
  case readResult of
    Left ex -> do
      logEx ex
      return readResult
    Right fileContents -> do
      -- Cache the file contents
      writeIORef fileContentsRef $ Just fileContents
      return readResult
 where
  logEx ex = printf "Exception of type %s: %s\n" (show (typeOf ex)) (show ex)

Was IORef the right choice in this case? Is there something to improve in the code?

Solution

This strategy will work for any IO action, and so should be generalized.

{-# LANGUAGE LambdaCase #-}

once :: IO a -> IO (IO a)
once ioa = do
  cache <- newIORef Nothing
  return $ readIORef cache >>= \case
    Nothing -> do
      a <- ioa
      writeIORef cache $ Just a
      return a
    Just a -> return a

main = do
  fileContentsGetter     <- once readFileContents
  -- First time reading the file accesses the disk
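  -- The annotations fix the exception type, which try cannot infer on its own
  _ <- try fileContentsGetter :: IO (Either IOException BS.ByteString)
  fileContentsFromMemory <-
    try fileContentsGetter :: IO (Either IOException BS.ByteString)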
  print fileContentsFromMemory
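
Because once accepts any IO action, its behavior can be checked without touching the disk. The following is a minimal sketch, not part of the original answer: the counter runs and the string "expensive result" are hypothetical stand-ins for the file read, and the definition of once is repeated only so that the example compiles on its own (it assumes the LambdaCase extension).

```haskell
{-# LANGUAGE LambdaCase #-}
import Data.IORef (modifyIORef', newIORef, readIORef, writeIORef)

-- Definition of once, repeated so this example is self-contained.
once :: IO a -> IO (IO a)
once ioa = do
  cache <- newIORef Nothing
  return $ readIORef cache >>= \case
    Nothing -> do
      a <- ioa
      writeIORef cache $ Just a
      return a
    Just a -> return a

main :: IO ()
main = do
  -- Hypothetical counter that records how often the wrapped action runs.
  runs   <- newIORef (0 :: Int)
  getter <- once $ do
    modifyIORef' runs (+ 1)
    return "expensive result"
  first  <- getter          -- runs the action and fills the cache
  second <- getter          -- served from the cache
  n      <- readIORef runs
  print (first, second, n)  -- prints ("expensive result","expensive result",1)
```

If the wrapped action throws, nothing is written to the cache, so the next call tries again. That matches getFileContents in the question, which only caches a successful read.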

Note that if two threads call the getter at the same time, they will both find the cache empty, and both read the file. System.IO.Memoize (from the io-memoize package) provides a once that isn’t vulnerable to this.
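
One way to close that race without an extra dependency is to keep the cache in an MVar, so that checking the cache and filling it happen under a single lock. This is only a sketch under that assumption: the name onceThreadSafe is hypothetical, and it is not meant to reflect how System.IO.Memoize is implemented.

```haskell
import Control.Concurrent      (forkIO)
import Control.Concurrent.MVar (modifyMVar, newEmptyMVar, newMVar, putMVar, takeMVar)
import Control.Monad           (forM_, replicateM)
import Data.IORef              (atomicModifyIORef', newIORef, readIORef)

-- Hypothetical thread-safe variant of once. modifyMVar takes the MVar for
-- the duration of the callback, so only one thread can run the action;
-- the others block until the cache has been filled.
onceThreadSafe :: IO a -> IO (IO a)
onceThreadSafe ioa = do
  cache <- newMVar Nothing
  return $ modifyMVar cache $ \cached -> case cached of
    Just a  -> return (cached, a)
    Nothing -> do
      a <- ioa
      return (Just a, a)

main :: IO ()
main = do
  runs   <- newIORef (0 :: Int)
  getter <- onceThreadSafe $ do
    atomicModifyIORef' runs (\n -> (n + 1, ()))
    return "contents"
  done <- newEmptyMVar
  -- Ten threads ask for the value concurrently.
  forM_ [1 .. 10 :: Int] $ \_ -> forkIO (getter >>= putMVar done)
  results <- replicateM 10 (takeMVar done)
  n       <- readIORef runs
  print (n, all (== "contents") results)  -- prints (1,True)
```

The trade-off is that every caller waits while the first one performs the action. If the action throws, modifyMVar restores the empty cache, so a later call retries.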

(Catching the exception inside readFileContents, logging it, and rethrowing it lets you keep the logEx logging from the question.)
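
The answer does not show readFileContents, so the following is only a guess at what it could look like: a sketch that reuses the path and the log format from the question, catches the IOException, logs it, and rethrows it so that the caller's try still sees the failure.

```haskell
import qualified Data.ByteString   as BS
import           Control.Exception (IOException, catch, throwIO, try)
import           Data.Typeable     (typeOf)
import           System.Directory  (getCurrentDirectory)
import           System.FilePath   ((</>))
import           Text.Printf       (printf)

-- Assumed shape of readFileContents; the original answer leaves it undefined.
readFileContents :: IO BS.ByteString
readFileContents = do
  curDir <- getCurrentDirectory
  let filePath = curDir </> "aDir" </> "theFile"
  BS.readFile filePath `catch` \ex -> do
    logEx ex    -- log first ...
    throwIO ex  -- ... then rethrow, so nothing gets cached by once
 where
  logEx :: IOException -> IO ()
  logEx ex = printf "Exception of type %s: %s\n" (show (typeOf ex)) (show ex)

main :: IO ()
main = do
  result <- try readFileContents :: IO (Either IOException BS.ByteString)
  putStrLn (either (const "read failed") (const "read succeeded") result)
```

Because the exception is rethrown rather than turned into a Left inside the action, once never stores a failed read, and the next call goes back to the disk.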
