Problem
I wrote this function for my next learning project (cfg parser). What do you think about it?
char readline(FILE *fp, char **line, size_t *size) {
*size = 0;
size_t cpos = ftell(fp);
int c;
while ((c = fgetc(fp)) != 'n' && c != EOF) (*size)++;
if (*size == 0 && c == EOF) {
*line = NULL;
return 1;
}
*line = calloc(*size + 1, sizeof(char));
if (*line == NULL)
return -1;
fseek(fp, cpos, SEEK_SET);
if (fread(*line, 1, *size, fp) != *size)
return -2;
fgetc(fp); // Skip that newline
return 0;
}
Can be used as follows:
FILE *fp = fopen("file.txt", "rb");
char *line = NULL;
size_t size = 0;
readline(fp, &line, &size);
Solution
-
sizeof(char)
is guaranteed to be 1.If you think that in future you might want to use another representation of characters (say,
wchar_t
), you’d have to modify code in more than one place. It is much safer to infer size from the variable, rather than from type:*line = calloc(..., sizeof(**line));
This way there is a single modification point.
Similarly, you may want to use
sizeof(**line)
instead of literal 1 infread
. -
fseek
(andftell
)may fail withEBADF
if the stream is not seekable. Since your function has no control over what kind of stream it is handled, it is advisable to test their return codes. -
One could argue that reading 0 characters is in fact a success (a client can tell the difference by examining
size
), so differentiating return codes (0 and 1) is redundant.It is highly recommended to use named macros (instead of magic numbers -1 and -2) for failure codes.
-
Profiling will tell, yet I doubt reading a file twice is faster than reading a file sequentially once and performing
realloc()
as needed. -
ftell(fp)
returnslong
. Code should belong cpos
.long
is a signed integer whose positive value might not fit in asize_t
. But more importantly,long
may be insufficient for a file’s size. Considerint fgetpos(FILE * restrict stream, fpos_t * restrict pos);
andfsetpos()
instead. -
It is important to check the returns of
ftell()
,fseek()
for errors. -
Rather than
calloc(*size + 1, sizeof(char));
, considercalloc(*size + 1, 1)
orcalloc(*size + 1, sizeof *(*line))
. -
As the intention is to return a pointer to a string,
*line
needs termination.calloc()
negates the need for(*line)[*size] = ' '
, but that is subtle and likely deserves a comment that the string is properly terminated. I could see an update replacingcalloc()
withmalloc()
and not then explicitly terminating the string. -
Curious that code does not save the
'n'
. With this code running in binary mode on various machines that use an alternate line ending like"r"
, maybe the entire file would then be 1 line as the is no'n'
. Or in Windows with"rn"
, each line ends with'r'
. Recommend amending this code to work with more line ending than'n'
or insure only text mode is used.
[edit]
-
Nice use of
fseek(fp, cpos, SEEK_SET)
to go back to a position derived fromftell()
. This is better thanfseek(fp, cpos - *size, SEEK_SET);
which is UB for text files. -
The return type
char
is weak.char
may be unsigned and returning-1
could become255
. Suggestint
. -
About the sample usage: This
readline()
allocates memory, and should demo proper usage includingfree()
:FILE *fp = fopen("file.txt", "rb"); char *line = NULL; size_t size = 0; while (0 == readline(fp, &line, &size)) { puts(line); free(line); }
You appear to be unknowingly reinventing-the-wheel. This function is similar to the getline()
function specified in POSIX 1003.1-2008 — yet incompatible. The order of the parameters is different, and your return value is less informative. If you are implementing this function for your own amusement, or because your system lacks POSIX 2008 support, then you should mimic the getline()
interface.
It is misleading to use char
as the return type for a status code that is conceptually a number. (getline()
returns a ssize_t
to indicate the length of the resulting string, or -1 for failure, with additional information in errno
.)
Reading one character at a time with fgetc()
is bad for performance. I recommend using fgets()
instead. Then, your function would just act as a convenient memory reallocation wrapper for fgets()
. You would also avoid the need to seek, which lets your function work with unseekable streams. It may take several tries to arrive at an overestimate of the required buffer size, but I think that it would still be better than reading the input twice.
Few things:
-
Return
int
instead ofchar
, it is easily extensible for future and you may be able to match with standard POSIXerrno
. Else have explicit enum values to indicate return status. -
Seems you are reading contents twice, once to determine size and then to actually read the entire line. Even though these are buffered fetches, measure and consider optimizing (also see 3.)
-
Mimic
scanf
. Read strings assuming caller has allocated enough memory for the line. That way you only need to do “append” as you read characters. In general, a bad idea to have memory allocation and de-allocation in separate functions. It is easy to miss afree
and cause memory leaks. API should be designed so as to minimize user errors. -
fscanf(fp,"%[^n]",buffer);
is usually a good substitute to read a single line