### A few takeaways from PyCon, 2018

There were a lot of interesting talks this year, and although I didn't attend nearly as many talks as I would have liked, I got a lot out of those that I did.


### Performance Python

First and foremost has to be Jake Vanderplas' talk, *Performance Python: Seven Strategies for Optimizing Your Numerical Code*. My first takeaway from his talk is `line_profiler`, a command-line tool for profiling the execution time of your Python functions line by line. For example, let's use code from my last post and apply the `convert` function to a single image. We use `line_profiler` in the terminal by adding the `@profile` decorator to the function head (no imports necessary).

```python
import numpy as np  # needed by the function itself; @profile is supplied by line_profiler


@profile
def convert(im: np.array, transform: np.array) -> np.array:
    """ Convert an image array to another colorspace """
    dimensions = len(im.shape)
    axes = im.shape[:dimensions-1]

    # Create a new array (respecting mutability)
    new_ = np.empty(im.shape)

    # Transform one pixel vector at a time
    for coordinate in np.ndindex(axes):
        pixel = im[coordinate]
        pixel_prime = transform @ pixel
        new_[coordinate] = pixel_prime

    return new_
```

I receive the following in my terminal.

```
Total time: 15.9443 s
File: convert_colorspace.py
Function: convert at line 53

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
    53                                           @profile
    54                                           def convert(im: np.array, transform: np.array) -> np.array:
    55                                               """ Convert an image array to another colorspace """
    56         1          3.0      3.0      0.0      dimensions = len(im.shape)
    57         1          2.0      2.0      0.0      axes = im.shape[:dimensions-1]
    58                                               #iters = reduce(mul, axes)
    59
    60                                               # Create a new array (respecting mutability)
    61         1         14.0     14.0      0.0      new_ = np.empty(im.shape)
    62
    63   1297921    1834286.0      1.4     11.5      for coordinate in np.ndindex(axes):
    64   1297920    4639324.0      3.6     29.1          pixel = im[coordinate]
    65   1297920    8069295.0      6.2     50.6          pixel_prime = transform @ pixel
    66   1297920    1401408.0      1.1      8.8          new_[coordinate] = pixel_prime
    67
    68         1          0.0      0.0      0.0      return new_
```

From this output, we see that essentially all of this function's execution time is spent (not surprisingly) in the four lines of the loop, namely

- iterating over the pixel coordinates with `np.ndindex` (11.5%); allocating the new array with `np.empty` is negligible by comparison,
- retrieving RGB vectors from the original array (29.1%),
- performing the matrix multiplication \(n\times m\) times (50.6%), and
- assigning the resultant vector to the corresponding location in the allocated space (8.8%).

We can replace the entire `for`-loop with the single line

```python
np.einsum('ij,...j', transform, im, optimize=True)
```

Or, alternatively,

```python
np.tensordot(im, transform, axes=(-1,1))
```

(Special thanks to Warren Weckesser for helping me simplify this statement on StackOverflow.) The former results in the following savings.

```
Total time: 0.048487 s
File: /home/brandon/Documents/convert_colorspace.py
Function: convert at line 53

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
    53                                           @profile
    54                                           def convert(im: np.array, transform: np.array) -> np.array:
    55                                               """ Convert an image array to another colorspace """
    56                                               #return im @ transform.T
    57         1      48487.0  48487.0    100.0      return np.einsum('ij,...j', transform, im, optimize=True)
```

In other words, we see an \(\approx 329\)-fold improvement in the absolute time needed to convert the image vectors from one color space to another. Replacing the `np.einsum` implementation with the latter, we see an \(\approx 314\)-fold improvement over the original `for`-loop.

```
Total time: 0.050796 s
File: convert_colorspace.py
Function: convert at line 53

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
    53                                           @profile
    54                                           def convert(im: np.array, transform: np.array) -> np.array:
    55                                               """ Convert an image array to another colorspace """
    56         1      50796.0  50796.0    100.0      return np.tensordot(im, transform, axes=(-1,1))
```

In short, there are a lot of possibilities in Python if you'd like to make your numerical code faster. Here I've basically replaced a pretty readable (but extremely slow) function body with an extremely dense single line that is a few hundred times faster. And you don't have to use only NumPy: there are a number of really great modules and packages, such as Numba and scikit-learn, that offer optimized implementations of many algorithms or tools to make existing code faster.
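A quick way to convince yourself that the dense one-liners really compute the same thing as the readable loop is to run all three on a small random array. The sketch below is only an illustration: the array shape, the seed, and the function names `convert_loop`, `convert_einsum`, and `convert_tensordot` are made up here, and it assumes a 3×3 transform applied to an image whose last axis holds the color channels.

```python
import numpy as np


def convert_loop(im: np.ndarray, transform: np.ndarray) -> np.ndarray:
    """Pixel-by-pixel version, mirroring the original loop body."""
    new_ = np.empty(im.shape)
    for coordinate in np.ndindex(im.shape[:-1]):
        new_[coordinate] = transform @ im[coordinate]
    return new_


def convert_einsum(im: np.ndarray, transform: np.ndarray) -> np.ndarray:
    """Sum over the shared channel index j for every pixel at once."""
    return np.einsum('ij,...j', transform, im, optimize=True)


def convert_tensordot(im: np.ndarray, transform: np.ndarray) -> np.ndarray:
    """Contract the last axis of im with axis 1 of transform."""
    return np.tensordot(im, transform, axes=(-1, 1))


# Hypothetical test data: a tiny 4x5 "image" with 3 channels.
rng = np.random.default_rng(0)
im = rng.random((4, 5, 3))
transform = rng.random((3, 3))

expected = convert_loop(im, transform)
same_einsum = np.allclose(expected, convert_einsum(im, transform))
same_tensordot = np.allclose(expected, convert_tensordot(im, transform))
print(same_einsum, same_tensordot)
```

Both comparisons should agree up to floating-point tolerance, which is why the check uses `np.allclose` rather than exact equality.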

### PyTorch

I would definitely say my understanding of neural networks and other deep learning algorithms is limited, but Stephanie Kim's talk about exploring PyTorch was pretty interesting to me, so I decided to check it out.

There are also many PyTorch examples on GitHub that you can run or borrow ideas from.


### Bayesian Statistics

Another talk I watched for a while on YouTube after the fact is Christopher Fonnesbeck's *Bayesian Non-parametric Models for Data Science using PyMC3*. Again, I've never used this module, nor do I have any relevant experience with this theory of statistics, but I do recall it being used extensively in analyzing certain astronomical datasets.