Binary Search in Python and C

Earlier this year I was writing a Python script wherein I had to search a two dimensional grid of longitude and latitude values to retrieve the correct elevation at that point. I originally started with linear search, but it became evident when that grid became fine enough that linear search was wasting precious time. Instead, I read up on binary search (also, interpolation search, which is even faster given that the data are evenly distributed, but that's a topic for a future post).

Binary search will locate a point in a sorted array in \(\mathcal{O}(\log(N))\) time. If you're curious about the mathematics/reasoning for this conclusion, see this answer on SO; I'll paraphrase the logic here, because it's really quite interesting.

Binary search works by starting in the middle of the array, as opposed to either end. Next, we ask, is the entry we're looking for greater than, less than, or equal to the current one? If it's greater, we move to the middle of the upper-half of the array; if it's lesser, then we move to the lower-half of the array, and so on.

Hence, it is like asking, as the user on SO so nicely puts it, how many times should we divide some array of length \(N\) in half until we reach the one element we're looking for? Such is the motivation for the equation
\[1=\dfrac{N}{2^{x}}.\]
As in any elementary algebra course, we solve for \(x\):
\begin{align*}\newcommand{\lg}{\mathop{\rm lg}\nolimits}1&=\dfrac{N}{2^{x}}\\\implies 2^{x}&=N\\\implies x\lg(2)&=\lg(N)\\\implies x&=\lg(N).\end{align*}
Thus we should expect to, at most, divide an array about \(\lg(N)\) times.

Without further ado, a Python implementation of the one dimensional case.
#! /usr/bin/env python3.4
# -*- coding: utf-8 -*-


""" A quick implementation of binary seach """


from typing import Union
from array import array
from random import randint


def binarySearch(val: int, arr: array =array('I', [])) -> Union[int, None]:
    """ get the index at which a value exists in a sorted array """

    top = len(arr) - 1
    bottom = 0

    while top > bottom:

        index = (top + bottom) // 2
        current = arr[index]

        if (val == current):
            return index

        elif (val < current):
            # search lower half
            top = index - 1

        elif (val > current):
            # search upper half
            bottom = index + 1
    else:
        return None

def main(LENGTH: int =100, MOD: int =100) -> None:
    # build random array of length LENGTH
    X = array('I', [])

    for _ in range(LENGTH):
        X.append(randint(0, MOD))

    X = sorted(X)
    for _ in range(10):
        valueToFind = randint(0, MOD)
        loc = binarySearch(randint(0, MOD), X)

        if (type(loc) == int):
            print("** Value %d found" % valueToFind)

        else:
            print("** Value %d not found" % valueToFind)

if __name__ == '__main__': main(LENGTH=1000000, MOD=1000000)
For an array of size 1E6 it takes a little bit of time to construct and sort, but to search for various values it takes only milliseconds:
** Value 714968 found
** Value 826215 found
** Value 682153 found
** Value 950237 not found
** Value 830899 not found
** Value 632265 not found
** Value 150923 not found
** Value 54808 not found
** Value 398748 found
** Value 41400 not found
Performing as many as a million searches on these random data takes only about 20 seconds, which is quite fast.

Now I want to look at constructing and sorting random single dimensional arrays in C. To dynamically allocate space in the heap for our array we use malloc, as in the following functions.
/* allocate 1d array, fill it with random values and print those values */

#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <stdint.h>
#include <string.h>

#define SIZE 25

uint32_t *
fabricate1DArray(uint32_t N)
{
    uint32_t i;

    // allocate array space in the heap
    uint32_t * arr = (uint32_t *)malloc(sizeof(uint32_t) * N);

    // set all the entries in this array to 0
    memset(arr, 0, sizeof(uint32_t) * N);

    return arr;
}

void
destroy1DArray(uint32_t * arr)
{
    free(arr);
}

int
comp(const void * A, const void * B)
{
    // comparison function for `qsort`
    uint32_t X = *((uint32_t *)A);
    uint32_t Y = *((uint32_t *)B);
    return (X < Y) ? -1 : ((X > Y) ? 1 : 0);
}

int
main(void)
{
    uint32_t * arr;
    uint32_t i;
    srand(time(NULL));
    
    arr = fabricate1DArray(SIZE);

    // fill array with random values
    for (i = 0; i < SIZE; ++i)
        arr[i] = rand() % SIZE;

    qsort(arr, SIZE, sizeof(*arr), comp);

    // print values (although this could just be done in the loop above)
    for (i = 0; i < SIZE; ++i)
        printf("%u ", arr[i]);

    printf("\n");
    destroy1DArray(arr);

    return 0;
}
As you can see, I've also written functions that free and sort the array, which has been filled with random values. Binary search, echoing the Python function above, looks as follows.
int64_t
binarySearch(uint32_t * val, uint32_t * arr)
{
    uint32_t top = SIZE - 1;
    uint32_t bottom = 0;
    uint32_t index, current;
    
    while (top > bottom)
    {
        index = (top + bottom) / 2;
        current = arr[index];

        if (current == *val) {
            return index;
        } else if (*val < current) {
            top = index - 1;
        } else if (*val > current) {
            bottom = index + 1;
        }
    }

    return -1;
}
I've found that a million searches on an array one million entries in size results in the following:
0.672178s for 1000000 searches
A testament to C's speed. What's more, using the -O3 flag cuts this time down to nearly half a second!