C readline function

Posted on

Problem

I wrote this function for my next learning project (cfg parser). What do you think about it?

char readline(FILE *fp, char **line, size_t *size) {
    *size = 0;
    size_t cpos = ftell(fp);

    int c;
    while ((c = fgetc(fp)) != 'n' && c != EOF) (*size)++;
    if (*size == 0 && c == EOF) {
        *line = NULL;
        return 1;
    }

    *line = calloc(*size + 1, sizeof(char));
    if (*line == NULL)
        return -1;

    fseek(fp, cpos, SEEK_SET);
    if (fread(*line, 1, *size, fp) != *size)
        return -2;

    fgetc(fp); // Skip that newline

    return 0;
}

Can be used as follows:

FILE *fp = fopen("file.txt", "rb");
char *line = NULL;
size_t size = 0;
readline(fp, &line, &size);

Solution

  • sizeof(char) is guaranteed to be 1.

    If you think that in future you might want to use another representation of characters (say, wchar_t), you’d have to modify code in more than one place. It is much safer to infer size from the variable, rather than from type:

    *line = calloc(..., sizeof(**line));
    

    This way there is a single modification point.

    Similarly, you may want to use sizeof(**line) instead of literal 1 in fread.

  • fseek (and ftell)may fail with EBADF if the stream is not seekable. Since your function has no control over what kind of stream it is handled, it is advisable to test their return codes.

  • One could argue that reading 0 characters is in fact a success (a client can tell the difference by examining size), so differentiating return codes (0 and 1) is redundant.

    It is highly recommended to use named macros (instead of magic numbers -1 and -2) for failure codes.

  1. Profiling will tell, yet I doubt reading a file twice is faster than reading a file sequentially once and performing realloc() as needed.

  2. ftell(fp) returns long. Code should be long cpos. long is a signed integer whose positive value might not fit in a size_t. But more importantly, long may be insufficient for a file’s size. Consider int fgetpos(FILE * restrict stream, fpos_t * restrict pos); and fsetpos() instead.

  3. It is important to check the returns of ftell(), fseek() for errors.

  4. Rather than calloc(*size + 1, sizeof(char));, consider calloc(*size + 1, 1) or calloc(*size + 1, sizeof *(*line)).

  5. As the intention is to return a pointer to a string, *line needs termination. calloc() negates the need for (*line)[*size] = '', but that is subtle and likely deserves a comment that the string is properly terminated. I could see an update replacing calloc() with malloc() and not then explicitly terminating the string.

  6. Curious that code does not save the 'n'. With this code running in binary mode on various machines that use an alternate line ending like "r", maybe the entire file would then be 1 line as the is no 'n'. Or in Windows with "rn", each line ends with 'r'. Recommend amending this code to work with more line ending than 'n' or insure only text mode is used.

[edit]

  1. Nice use of fseek(fp, cpos, SEEK_SET) to go back to a position derived from ftell(). This is better than fseek(fp, cpos - *size, SEEK_SET); which is UB for text files.

  2. The return type char is weak. char may be unsigned and returning -1 could become 255. Suggest int.

  3. About the sample usage: This readline() allocates memory, and should demo proper usage including free():

    FILE *fp = fopen("file.txt", "rb");
    char *line = NULL;
    size_t size = 0;
    while (0 == readline(fp, &line, &size)) {
      puts(line);
      free(line);
    }
    

You appear to be unknowingly . This function is similar to the getline() function specified in POSIX 1003.1-2008 — yet incompatible. The order of the parameters is different, and your return value is less informative. If you are implementing this function for your own amusement, or because your system lacks POSIX 2008 support, then you should mimic the getline() interface.

It is misleading to use char as the return type for a status code that is conceptually a number. (getline() returns a ssize_t to indicate the length of the resulting string, or -1 for failure, with additional information in errno.)

Reading one character at a time with fgetc() is bad for performance. I recommend using fgets() instead. Then, your function would just act as a convenient memory reallocation wrapper for fgets(). You would also avoid the need to seek, which lets your function work with unseekable streams. It may take several tries to arrive at an overestimate of the required buffer size, but I think that it would still be better than reading the input twice.

Few things:

  1. Return int instead of char, it is easily extensible for future and you may be able to match with standard POSIX errno. Else have explicit enum values to indicate return status.

  2. Seems you are reading contents twice, once to determine size and then to actually read the entire line. Even though these are buffered fetches, measure and consider optimizing (also see 3.)

  3. Mimic scanf. Read strings assuming caller has allocated enough memory for the line. That way you only need to do “append” as you read characters. In general, a bad idea to have memory allocation and de-allocation in separate functions. It is easy to miss a free and cause memory leaks. API should be designed so as to minimize user errors.

  4. fscanf(fp,"%[^n]",buffer); is usually a good substitute to read a single line

Leave a Reply

Your email address will not be published. Required fields are marked *