### Binary Search in Python and C

Earlier this year I was writing a Python script wherein I had to search a two dimensional grid of longitude and latitude values to retrieve the correct elevation at that point. I originally started with linear search, but it became evident when that grid became fine enough that linear search was wasting precious time. Instead, I read up on binary search (also, interpolation search, which is even faster given that the data are evenly distributed, but that's a topic for a future post).

Binary search will locate a point in a sorted array in \(\mathcal{O}(\log(N))\) time. If you're curious about the mathematics/reasoning for this conclusion, see this answer on SO; I'll paraphrase the logic here, because it's really quite interesting.

Binary search works by starting in the middle of the array, as opposed to either end. Next, we ask, is the entry we're looking for greater than, less than, or equal to the current one? If it's greater, we move to the middle of the upper-half of the array; if it's lesser, then we move to the lower-half of the array, and so on.

Hence, it is like asking, as the user on SO so nicely puts it, how many times should we divide some array of length \(N\) in half until we reach the

Now I want to look at constructing and sorting random single dimensional arrays in C. To dynamically allocate space in the heap for our array we use

Binary search will locate a point in a sorted array in \(\mathcal{O}(\log(N))\) time. If you're curious about the mathematics/reasoning for this conclusion, see this answer on SO; I'll paraphrase the logic here, because it's really quite interesting.

Binary search works by starting in the middle of the array, as opposed to either end. Next, we ask, is the entry we're looking for greater than, less than, or equal to the current one? If it's greater, we move to the middle of the upper-half of the array; if it's lesser, then we move to the lower-half of the array, and so on.

Hence, it is like asking, as the user on SO so nicely puts it, how many times should we divide some array of length \(N\) in half until we reach the

*one*element we're looking for? Such is the motivation for the equation
\[1=\dfrac{N}{2^{x}}.\]

As in any elementary algebra course, we solve for \(x\):

\begin{align*}\newcommand{\lg}{\mathop{\rm lg}\nolimits}1&=\dfrac{N}{2^{x}}\\\implies 2^{x}&=N\\\implies x\lg(2)&=\lg(N)\\\implies x&=\lg(N).\end{align*}

Thus we should expect to, at most, divide an array

*about*\(\lg(N)\) times.
Without further ado, a Python implementation of the one dimensional case.

For an array of size 1E6 it takes a little bit of time to construct and sort, but to search for various values it takes only milliseconds:#! /usr/bin/env python3.4 # -*- coding: utf-8 -*- """ A quick implementation of binary seach """ from typing import Union from array import array from random import randint def binarySearch(val: int, arr: array =array('I', [])) -> Union[int, None]: """ get the index at which a value exists in a sorted array """ top = len(arr) - 1 bottom = 0 while top > bottom: index = (top + bottom) // 2 current = arr[index] if (val == current): return index elif (val < current): # search lower half top = index - 1 elif (val > current): # search upper half bottom = index + 1 else: return None def main(LENGTH: int =100, MOD: int =100) -> None: # build random array of length LENGTH X = array('I', []) for _ in range(LENGTH): X.append(randint(0, MOD)) X = sorted(X) for _ in range(10): valueToFind = randint(0, MOD) loc = binarySearch(randint(0, MOD), X) if (type(loc) == int): print("** Value %d found" % valueToFind) else: print("** Value %d not found" % valueToFind) if __name__ == '__main__': main(LENGTH=1000000, MOD=1000000)

Performing as many as a million searches on these random data takes only about 20 seconds, which is quite fast.** Value 714968 found ** Value 826215 found ** Value 682153 found ** Value 950237 not found ** Value 830899 not found ** Value 632265 not found ** Value 150923 not found ** Value 54808 not found ** Value 398748 found ** Value 41400 not found

Now I want to look at constructing and sorting random single dimensional arrays in C. To dynamically allocate space in the heap for our array we use

`malloc`

, as in the following functions.
As you can see, I've also written functions that free and sort the array, which has been filled with random values. Binary search, echoing the Python function above, looks as follows./* allocate 1d array, fill it with random values and print those values */ #include <stdio.h> #include <stdlib.h> #include <time.h> #include <stdint.h> #include <string.h> #define SIZE 25 uint32_t * fabricate1DArray(uint32_t N) { uint32_t i; // allocate array space in the heap uint32_t * arr = (uint32_t *)malloc(sizeof(uint32_t) * N); // set all the entries in this array to 0 memset(arr, 0, sizeof(uint32_t) * N); return arr; } void destroy1DArray(uint32_t * arr) { free(arr); } int comp(const void * A, const void * B) { // comparison function for `qsort` uint32_t X = *((uint32_t *)A); uint32_t Y = *((uint32_t *)B); return (X < Y) ? -1 : ((X > Y) ? 1 : 0); } int main(void) { uint32_t * arr; uint32_t i; srand(time(NULL)); arr = fabricate1DArray(SIZE); // fill array with random values for (i = 0; i < SIZE; ++i) arr[i] = rand() % SIZE; qsort(arr, SIZE, sizeof(*arr), comp); // print values (although this could just be done in the loop above) for (i = 0; i < SIZE; ++i) printf("%u ", arr[i]); printf("\n"); destroy1DArray(arr); return 0; }

I've found that a million searches on an array one million entries in size results in the following:int64_t binarySearch(uint32_t * val, uint32_t * arr) { uint32_t top = SIZE - 1; uint32_t bottom = 0; uint32_t index, current; while (top > bottom) { index = (top + bottom) / 2; current = arr[index]; if (current == *val) { return index; } else if (*val < current) { top = index - 1; } else if (*val > current) { bottom = index + 1; } } return -1; }

A testament to C's speed. What's more, using the0.672178s for 1000000 searches

`-O3`

flag cuts this time down to nearly half a second!