### A few takeaways from PyCon, 2018

There were a lot of interesting talks this year, and although I didn't attend nearly as many talks as I would have liked, I got a lot out of those that I did.

### Performance Python

First and foremost has to be Jake VanderPlas's talk, Performance Python: Seven Strategies for Optimizing Your Numerical Code. My first takeaway from his talk is line_profiler, a command-line tool that profiles the execution time of your Python functions line by line.

For example, let's take the code from my last post and apply the convert function to a single image. To use line_profiler from the terminal, we add the @profile decorator above the function definition. No import is necessary for the decorator, because kernprof injects it when the script is run with, for example, kernprof -l -v convert_colorspace.py.

    import numpy as np

    @profile
    def convert(im: np.ndarray, transform: np.ndarray) -> np.ndarray:
        """ Convert an image array to another colorspace """
        dimensions = len(im.shape)
        axes = im.shape[:dimensions-1]

        # Create a new array (respecting mutability)
        new_ = np.empty(im.shape)

        # Transform each pixel vector individually
        for coordinate in np.ndindex(axes):
            pixel            = im[coordinate]
            pixel_prime      = transform @ pixel
            new_[coordinate] = pixel_prime

        return new_
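
One practical wrinkle: since @profile only exists when the script runs under kernprof, running the same file with plain python raises a NameError. A common workaround, sketched below as my own assumption and not something from the original script, is to fall back to a do-nothing decorator. The function double is a hypothetical stand-in for convert.

```python
import builtins

# kernprof -l places `profile` in builtins; otherwise fall back to a no-op.
try:
    profile = builtins.profile
except AttributeError:
    def profile(func):
        """Stand-in decorator that returns the function unchanged."""
        return func


@profile
def double(x):
    # Hypothetical example function, not part of the original script
    return 2 * x


print(double(21))
```

With this guard in place the script runs unchanged both with and without the profiler.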

Running the profiler, I receive the following in my terminal:

    Total time: 15.9443 s
    File: convert_colorspace.py
    Function: convert at line 53

    Line #      Hits         Time  Per Hit   % Time  Line Contents
    ==============================================================
        53                                           @profile
        54                                           def convert(im: np.array, transform: np.array) -> np.array:
        55                                               """ Convert an image array to another colorspace """
        56         1          3.0      3.0      0.0      dimensions = len(im.shape)
        57         1          2.0      2.0      0.0      axes = im.shape[:dimensions-1]
        58                                               #iters = reduce(mul, axes)
        59
        60                                               # Create a new array (respecting mutability)
        61         1         14.0     14.0      0.0      new_ = np.empty(im.shape)
        62
        63   1297921    1834286.0      1.4     11.5      for coordinate in np.ndindex(axes):
        64   1297920    4639324.0      3.6     29.1          pixel          = im[coordinate]
        65   1297920    8069295.0      6.2     50.6          pixel_prime    = transform @ pixel
        66   1297920    1401408.0      1.1      8.8          new_[coordinate] = pixel_prime
        67
        68         1          0.0      0.0      0.0      return new_

From this output, we see that virtually all of this function's execution time is spent (not surprisingly) inside the for-loop, in four places, namely

1. iterating over the pixel coordinates with np.ndindex,
2. retrieving RGB vectors from the original array,
3. performing the matrix multiplication $$n\times m$$ times (once per pixel, 1,297,920 times here), and
4. assigning the resultant vector to the corresponding location in the allocated space.

Allocating the new array with np.empty, by contrast, takes a negligible share of the time. We can alleviate the loop overhead by replacing the for-loop with the single line

    np.einsum('ij,...j', transform, im, optimize=True)

Or, alternatively,

    np.tensordot(im, transform, axes=(-1,1))

(Special thanks to Warren Weckesser for helping me simplify this statement on Stack Overflow.) The former results in the following savings:

    Total time: 0.048487 s
    File: /home/brandon/Documents/convert_colorspace.py
    Function: convert at line 53

    Line #      Hits         Time  Per Hit   % Time  Line Contents
    ==============================================================
        53                                           @profile
        54                                           def convert(im: np.array, transform: np.array) -> np.array:
        55                                               """ Convert an image array to another colorspace """
        56                                               #return im @ transform.T
        57         1      48487.0  48487.0    100.0      return np.einsum('ij,...j', transform, im, optimize=True)

In other words, we see an $$\approx 329$$-fold improvement (15.9443 s versus 0.048487 s) in the absolute time needed to convert the image vectors from one color space to another. Replacing the np.einsum implementation with np.tensordot, we see an $$\approx 314$$-fold improvement (15.9443 s versus 0.050796 s) over the original for-loop:

    Total time: 0.050796 s
    File: convert_colorspace.py
    Function: convert at line 53

    Line #      Hits         Time  Per Hit   % Time  Line Contents
    ==============================================================
        53                                           @profile
        54                                           def convert(im: np.array, transform: np.array) -> np.array:
        55                                               """ Convert an image array to another colorspace """
        56         1      50796.0  50796.0    100.0      return np.tensordot(im, transform, axes=(-1,1))
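
Before trusting a one-line replacement like this, it is worth checking that it really computes the same thing as the loop. Here is a minimal sketch of such a check. The tiny random image and the random 3x3 matrix are made-up stand-ins for the real data, and the im @ transform.T variant is the one that appears commented out in the einsum listing.

```python
import numpy as np

rng = np.random.default_rng(0)
im = rng.random((4, 5, 3))        # small stand-in for an n x m RGB image
transform = rng.random((3, 3))    # arbitrary 3x3 matrix, not a real colorspace transform

# Reference result: the original per-pixel loop
looped = np.empty(im.shape)
for coordinate in np.ndindex(im.shape[:-1]):
    looped[coordinate] = transform @ im[coordinate]

# Vectorized alternatives discussed in the post
via_einsum = np.einsum('ij,...j', transform, im, optimize=True)
via_tensordot = np.tensordot(im, transform, axes=(-1, 1))
via_matmul = im @ transform.T

for name, result in [("einsum", via_einsum),
                     ("tensordot", via_tensordot),
                     ("matmul", via_matmul)]:
    print(name, np.allclose(looped, result))
```

All three contract the last axis of the image (the color channel) with the second axis of the transform, which is exactly what the loop does one pixel at a time.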

In short, there are a lot of possibilities in Python if you'd like to make your numerical code faster. Here I've replaced a fairly readable (but extremely slow) function body with a dense single line that runs roughly 300 times faster. And you aren't limited to NumPy: there are a number of really great modules and packages, such as Numba and scikit-learn, that offer optimized implementations of many algorithms or tools to make existing code faster.
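
As a rough illustration of the "make existing code faster" route, here is a sketch of how the explicit loop might be compiled with Numba instead of being rewritten. This is my own assumption about how one could do it, not something from the talk. The names _convert_pixels and convert_numba are hypothetical, the no-op fallback only exists so the sketch still runs without Numba installed, and I haven't timed it against the versions above.

```python
import numpy as np

try:
    from numba import njit          # optional dependency
except ImportError:
    def njit(func):                 # hypothetical no-op stand-in when Numba is missing
        return func


@njit
def _convert_pixels(pixels, transform, out):
    # pixels: (n_pixels, channels), transform: (channels_out, channels)
    for p in range(pixels.shape[0]):
        for i in range(transform.shape[0]):
            acc = 0.0
            for j in range(transform.shape[1]):
                acc += transform[i, j] * pixels[p, j]
            out[p, i] = acc


def convert_numba(im, transform):
    """Apply transform to every pixel vector, keeping the explicit loops."""
    transform = np.ascontiguousarray(transform, dtype=np.float64)
    # Flatten all leading axes so the compiled function sees a 2-D array
    pixels = np.ascontiguousarray(im, dtype=np.float64).reshape(-1, im.shape[-1])
    out = np.empty((pixels.shape[0], transform.shape[0]))
    _convert_pixels(pixels, transform, out)
    return out.reshape(im.shape[:-1] + (transform.shape[0],))


rng = np.random.default_rng(0)
im = rng.random((4, 5, 3))
transform = rng.random((3, 3))
print(np.allclose(convert_numba(im, transform), im @ transform.T))
```

The appeal of this approach is that the loop stays readable. The trade-off is an extra dependency and a compilation step on the first call.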

### PyTorch

I would definitely say my understanding of neural networks and other deep learning algorithms is limited, but Stephanie Kim's talk about exploring PyTorch was pretty interesting to me, so I decided to check it out.

There are also many PyTorch examples on GitHub that you can test or draw ideas from.

### Bayesian Statistics

Another talk, which I watched for a while on YouTube after the fact, is Christopher Fonnesbeck's Bayesian Non-parametric Models for Data Science using PyMC3. Again, I've never used this module, nor do I have any relevant experience with this approach to statistics, but I do recall it being used extensively in analyzing certain astronomical datasets.