# Sequential Search – Recursion

Posted on

Problem

I have written this program to search for a value in an array using sequential search. And well, I’ve been trying to implement the same algorithm using recursive structure. is this a good implementation? or is there any other way to make it a little more…? (I’m not sure if…, to say if there is a better way to implement it.)

``````#include <iostream>
#include <cstdlib>

using std::cout;
using std::cin;
using std::endl;

/* function prototypes */
int
seq_search(const int [], size_t, int);

/* main function */
int
main()
{
int array[15] = {2, 6, 8, 23, 45, 43, 51, 62, 83, 78, 61, 18, 71, 34, 72};
int value = 0;
int location = 0;

for (int i = 0; i < 15; i++)
cout << array[i] << " ";
cout << endl;

cout << "Enter the search value: ";
cin >> value;

location = seq_search(array, 15, value);
if (location != -1)
cout << value << " found at position: " << location << endl;
else
cout << value << " not in array" << endl;
exit(EXIT_SUCCESS);
}

/* function definitions */
int
seq_search(const int array[],
size_t size, int search_value)
{
return (size == 1) ? -1 :
(array[size - 1] == search_value) ? size - 1 :
seq_search(array, size - 1, search_value);
}
``````

test1:

``````./main
2 6 8 23 45 43 51 62 83 78 61 18 71 34 72
Enter the search item: 61
61 found at position: 10
``````

test2:

``````./main
2 6 8 23 45 43 51 62 83 78 61 18 71 34 72
Enter the search item: 5
5 not in list
``````

This is what I wanted to implement:

``````int
seq_search(const int array[],
size_t size, int search_value)
{
int loc = 0;
bool found = false;

while (loc < listLength && !found) {

if (array[loc] == searchItem) {
found = true;
break;
}
else
loc++;
}
if (found)
return loc;
else
return -1;
}
``````

Solution

# Don’t use recursion

Although contrary to the code in your previous question, this time you do have tail recursion, which results in efficient code that doesn’t have the problem of potentially unlimited stack growth. However, I don’t see any value in trying to write a recursive implementation of something that is so much more obvious when implemented iteratively. An iterative version also can be written quite compactly:

``````int seq_search(const int array[], size_t size, int search_value) {
for (size_t i = 0; i < size; ++i)
if (array[i] == search_value)
return i;

return -1;
}
``````

# Your `seq_search()` can’t find the first element of the array

If `size == 1`, you return `-1`. Consider searching for something in an array of just one element, this will obviously return the wrong result. But due to the recursion, it happens for longer arrays as well when the recursion reaches the case where `size == 1`.

# Return a `size_t` instead of an `int`

An `int` might not be large enough to be able to represent all possible positions in the array. Especially on 64-bit machines, where an `int` is typically only 32 bits, but arrays can be larger than that. If you return a `size_t`, you don’t have that problem. You can use the largest possible value of `size_t` to represent “item not found”, since it will never be possible to have an array that large, as it would not leave memory for anything else, including the program itself.

A simple way to return the largest possible value is to just return `-1`, or declare a constant of type `size_t` with that value. See for example `std::string::npos`.

# Use `'n'` instead of `std::endl`

Prefer to use `'n'` instead of `std::endl`; the latter is equivalent to the former, but also forces the output to be flushed, which is usually not necessary and can negatively impact performance.

# Make it more generic

Again, your `seq_search()` function has limited use, since it only works on arrays of `int`s. In C++, it is possible to write a single function that can search for a given value in many different types of containers, see for example `std::find()`.

## There is a Bug in Your Implementation

It returns `-1` when you search for the first element in the array. Classic fencepost error.

If you had used a `std::array` with `begin()` and `end()`, this would not have happened.. You’ll have fewer security bugs in your code, too.

Frankly, `/* main function */` and `/* function definition */` are the kind of comments a student throws in only because a professor required there to be a comment before every function.

Here is how I would document this function:

``````/* Returns the index in an array a of length n of the LAST occurrence of
* search_value in a, or -1 if there is none.
*/
``````

This specifies several things about the behavior that are not immediately obvious from the name of the function and its parameters: which direction the function searches in, and what the behavior is when it does not find the value.

It also explains the meaning of every parameter, and specifies every return value, which is standard practice in literate programming.

You should also comment on decisions that might need explaining, which assumptions and invariants your code depends on, and in some cases, a brief explanation of why an algorithm is correct. Your code is pretty straightforward, though.

I also like to add a comment for which long, nested block a brace is supposed to be closing, and which things from a header file I actually used (so I know when it’s safe to remove the `#include`).

## Use Appropriate Types

An array index should be type `std::size_t`. On most platforms, `int` is a 32-bit signed integer that will overflow on an array with more than 2³¹ elements, which is possible to have these days.

## Use `constexpr` Where Appropriate

If you declare `seq_search` as `constexpr`, the compiler can fold calls to constant data, known at compile time, to constants, and eliminate the runtime cost of the call entirely.

I also threw `noexcept` on it, while I was at it.

## Input in a Loop

Right now, if you type in anything other than a valid `int`, your program will search for garbage. You’d be better off checking whether `cin >> value` succeeded and looping until it does.

One reason you might not want to do this is that you need the program to run in scripts and other batch-mode tools, not interactively.

## Factor out Helper Functions

I don’t follow this religiously, but good indications that something should be a helper function are that it has its own local variables that are only used within that one part of the algorithm, if you need to write the same code repeatedly, and if there’s a long and complicated block of code whose effect is to set a variable that never changes thereafter.

In particular, if you set `value` with an expression like `const auto value = prompt_for_value();`, you’ve helped the compiler to optimize your code, in several different ways.

First, most modern C and C++ compilers will attempt to transform code that sets a variable into a static single assignment with a phi function anyway, so you’ve expressed the code in the format the optimizer pass is designed to work with.

More importantly, you’ve now told the static analyzer that `value` is set once and never changes. This can help the loop optimizer tremendously in some cases, especially in a loop where you pass a pointer that might be an alias to `value` to a function outside the loop. I’ve especially run into this problem after I’ve passed `value` to any function that takes an `int` by reference. All the compiler knows is that there’s another pointer aliasing `value` that got passed to some function named `foo(int&)` and escaped its analysis. For all the compiler knows, this mysterious `foo(value)` might have saved its modifiable reference to `value` as a global variable that every other function it calls might have used to modify `value` behind the compiler’s back. And then it can no longer make any optimization that assumes `value` hasn’t changed. It must reload it from memory. And, frequently, the actual name of this function is `std::sscanf()`.

It also forces you to handle error conditions by throwing exceptions, not by returning error codes, which every programmer is supposed to always check for, but doesn’t.

Right now, you declare `int array[15]` and use the number `15` everywhere. Let’s say you need to add an element and change the size to 16. How sure are you that you would catch every single place in the program that checks the size to the correct value? Even the ones that a search would miss, because you wrote it as `14+1` for some reason? And that you changed only the instances of 15 that you should have? And that you never made a typo?

Because, if you made any of those very common mistakes, you wrote a bounds error and potential security bug that a compiler has no hope of catching, and would be nearly impossible for a human, either. This is one reason there are so many buffer-overrun vulnerabilities in C and C++.

Or here’s another tricky one. Let’s say you want to take our advice and change the return type from `int` to `std::size_t`. Do you still want to use `-1` as your error code, for an unsigned type? How sure are you that, in a big program, you would change all the instances of `-1` that are supposed to be a return value of `seq_search`, and no other uses of `-1` in the program?

So write something like:

``````constexpr int array[] = {2, 6, 8, 23, 45, 43, 51, 62, 83, 78, 61, 18, 71, 34, 72};
constexpr size_t array_extent = sizeof(array)/sizeof(array[0]);
constexpr size_t argmin = seq_search(array, array_extent, 2);
``````

## Indent Your Code for Clarity

I personally like to line up my nested ternary expressions like this:

``````return (size == 0)                       ? not_found :
(array[size - 1] == search_value) ? size - 1 :
seq_search(array, size - 1, search_value);
``````

This lines up the statements, the condition checks and the possible return values in columns. In my opinion, this is much easier to scan.

## Always Brace Your Loops and Control Structures

There was a famous bug in an Apple app that, conceptually, looked something like this:

``````if (flag)
doSomething();
oopsThisShouldNotHaveRunUnconditionally();
``````

Enough languages do use that kind of syntax that, if your editor auto-indents your code that way, it can be hard for a human to spot. However, many compilers will now warn you about this kind of misleading indentation. Your mileage will vary, but that style has definitely fallen out of fashion.

## Consider Using the musttail Extension

The great frustration of the computer scientist trying to write functional code in C is that compilers are not actually required to optimize tail calls. You sometimes hear older dodders than me gripe that, back in the day, compilers did break their recursive code, so they refactored it into iterative loops! After they carried their program to the computer on punch cards, in the snow, uphill both ways. And they liked it! Kids these days, with their optimizing compilers that make recursive code that’s easier to understand just as efficient as a loop, too! That’s not real C!

So, some compilers, currently Clang and ICX, have an attribute named `[[clang:musttail]]` or `__attribute((musttail))__`, which orders the compiler to optimize a tail call. You use this by writing in a header file

``````#if __clang__ || __INTEL_LLVM_COMPILER
#  define MUSTTAIL __attribute((musttail))__
#else
#  define MUSTTAIL /**/
#endif
``````

And then you can write

``````MUSTTAIL return seq_search( a, n-1, search_value );'
``````

Unfortunately, this happens not to work with ternary `return` statements. (You can only put the `musttail` attribute before a `return` statement followed by a tail-recursive function call. The compiler does not, as of 2022, accept a ternary expression that possibly returns a valid tail-call, nor can you put it on only one branch of the tree. You can only use it with an `if` block.)

## Some Minor Nits

You exit your program with `exit(EXIT_SUCCESS)`, but it’s more conventional to just `return EXIT_SUCCESS` or `EXIT_FAILURE` from `main`.

You seem to be importing the entire `namespace std` into the global namespace, which is a bad idea. You have no way of knowing what identifiers some implementation will add to `namespace std` in the future. (Although enough people just disabled the committee’s first effort to hide all identifiers behind a namespace, and not break old code that happened to use the same name, that it now hides most new identifiers behind two layers of namespace.)

## Putting it All Together

You get something like this:

``````#include <cassert>
#include <cstdlib> // for EXIT_SUCCESS
#include <iostream>
#include <stdexcept> // For runtime_error

using std::cin, std::cout, std::endl, std::size_t;

constexpr size_t not_found = (size_t)-1;

constexpr size_t seq_search( const int array[],
const size_t size,
const int search_value
) noexcept
/* Returns the index in an array a of length n of the LAST occurrence of
* search_value in a, or not_found if there is none.
*/
{
return (size == 0)                       ? not_found :
(array[size - 1] == search_value) ? size - 1 :
seq_search(array, size - 1, search_value);
}

int prompt_for_value()
/* Displays an input prompt until the user enters a valid search value.
*/
{
int value = 0;

cout << "Enter the search value: ";
cout.flush(); // This will normally happen anyway when reading from cin.

while (!(cin >> value)) {
cin.clear();
cin.ignore();
/* If we'd called cin.ignore the first time, the program would've halted
* and waited for a line of input.
*/
throw std::runtime_error("cin is borked");
}

cout << "Enter the search value: ";
cout.flush(); // This will normally happen anyway when reading from cin.
}

return value;
}

int main()
{
constexpr int array[] = {2, 6, 8, 23, 45, 43, 51, 62, 83, 78, 61, 18, 71, 34, 72};
constexpr size_t array_extent = sizeof(array)/sizeof(array[0]);
constexpr size_t argmin = seq_search(array, array_extent, 2);

static_assert( argmin == 0, "" );

cout << "{ ";
for ( size_t i = 0; i < array_extent; ++i )
{
cout << array[i] << " ";
}
cout << "}n";

const int value = prompt_for_value();

const size_t location = seq_search(array, array_extent, value);
if (location != not_found) {
cout << value << " found at position: " << location << endl;
} else {
cout << value << " not in array" << endl;
}

return EXIT_SUCCESS;
}
``````

The compiler does a good job with this, inlining the tail-recursive function and optimizing it to,

``````.LBB1_3:                                # =>This Inner Loop Header: Depth=1
sub     rbx, 1
jb      .LBB1_6
cmp     dword ptr [4*rbx + .L__const.main.array], eax
jne     .LBB1_3
``````

Whereas a slightly-modified version of your iterative code:

``````int
seq_search2(const int array[],
size_t size, int search_value)
{
int loc = 0;
bool found = false;

while (loc < size && !found) {

if (array[loc] == search_value) {
found = true;
break;
}
else
loc++;
}
if (found)
return loc;
else
return -1;
}
``````

compiles to:

``````.LBB2_2:                                # =>This Inner Loop Header: Depth=1
cmp     dword ptr [rdi + 4*rcx], edx
je      .LBB2_4
inc     rcx
cmp     rsi, rcx
jne     .LBB2_2
``````

(This was with clang++13 on Linux with `-std=c++20 -Os -march=x86-64-v4`.)

So, the recursive version is slightly faster, and an optimized iterative loop would compile to nearly-identical code.