DyND Callables: Speed and Flexibility

(This is a post I wrote for the Continuum Developer Blog. You can see the original here)

Introduction

We've been working hard to improve DyND in a wide variety of ways over the past few months. While there is still a lot of churn in our codebase, now is a good time to show a few basic examples of the great functionality that's already there. The library is available on GitHub at https://github.com/libdynd/libdynd.

Today I want to focus on DyND's callable objects. Much of the code in this post is still experimental and subject to change. Keep that in mind when considering where to use it.

All the examples here will be in C++14 unless otherwise noted. The build configuration should be set up to indicate that C++14, the DyND headers, and the DyND shared libraries should be used. Output from a line that prints something will be shown directly in the source files as comments. DyND also has a Python interface, so several examples in Python will also be included.

Getting Started

DyND's callables are, at the most basic level, functions that operate on arrays. At the very lowest level, a callable can access all of the data, type-defined metadata (e.g. stride information), and metadata for the arrays passed to it as arguments. This makes it possible to use callables for functionality like multiple dispatch, views based on stride manipulation, reductions, and broadcasting. The simplest case is using a callable to wrap a non-broadcasting non-dispatched function call so that it can be used on scalar arrays.

Here's an example of how to do that:

#include <iostream>
// Main header for DyND arrays:
#include <dynd/array.hpp>
// Main callable object header:
#include <dynd/callable.hpp>

using namespace dynd;

// Write a function to turn into a DyND callable.
double f(double a, double b) {
  return a * (a - b);
}

// Make the callable.
nd::callable f_callable = nd::callable(f);

// Main function to show how this works:
int main() {
  // Initialize two arrays containing scalar values.
  nd::array a = 1.;
  nd::array b = 2.;

  // Print the dynamic type signature of the callable f_callable.
  std::cout << f_callable << std::endl;
  // <callable <(float64, float64) -> float64> at 000001879424CF60>

  // Call the callable and print its output.
  std::cout << f_callable(a, b) << std::endl;
  //array(-1,
  //      type="float64")
}

The constructor for dynd::nd::callable does most of the work here. Using some interesting templating mechanisms internally, it is able to infer the argument types and return type for the function, select the corresponding DyND types, and form a DyND type that represents an analogous function call. The result is a callable object that wraps a pointer to the function f and knows all of the type information about the pointer it is wrapping. This callable can only be used with input arrays that have types that match the types for the original function's arguments.

The extra type information contained in this callable is "(float64, float64) -> float64", as can be seen when the callable is printed. The syntax here comes from the datashape data description system—the same type system used by Blaze, Odo, and several other libraries.

One key thing to notice here is that the callable created now does its type checking dynamically rather than at compile time. DyND has its own system of types that is used to represent data and the functions that operate on it at runtime. While this does have some runtime cost, dynamic type checking removes the requirement that a C++ compiler verify the types for every operation. The dynamic nature of the DyND type system makes it possible to write code that operates in a generic way on both builtin and user-defined types in both static and dynamic languages. I'll leave discussion of the finer details of the DyND type system for another day though.

Broadcasting

DyND has other functions that make it possible to add additional semantics to a callable. These are higher-order functions (functions that operate on other functions), and they are used on existing callables rather than function pointers. The types for these functions are patterns that can be matched against a variety of different argument types.

Things like array broadcasting, reductions, and multiple dispatch are all currently available. In the case of broadcasting and reductions, the new callable calls the wrapped function many times and handles the desired iteration structure over the arrays itself. In the case of multiple dispatch, different implementations of a function can be called based on the types of the inputs. DyND's multiple dispatch semantics are currently under revision, so I'll just show broadcasting and reductions here.

DyND provides broadcasting through the function dynd::nd::functional::elwise. It follows broadcasting semantics similar to those followed by NumPy's generalized universal functions—though it is, in many ways, more general. The following example shows how to use elwise to create a callable that follows broadcasting semantics:

// Include <cmath> to get std::exp.
#include <cmath>
#include <iostream>

#include <dynd/array.hpp>
#include <dynd/callable.hpp>
// Header containing dynd::nd::functional::elwise.
#include <dynd/func/elwise.hpp>

using namespace dynd;

double myfunc_core(double a, double b) {
  return a * (a - b);
}

nd::callable myfunc = nd::functional::elwise(nd::callable(myfunc_core));

int main() {
  // Initialize some arrays to demonstrate broadcasting semantics.
  // Use brace initialization from C++11.
  nd::array a{{1., 2.}, {3., 4.}};
  nd::array b{5., 6.};
  // Create an additional array with a ragged second dimension as well.
  nd::array c{{9., 10.}, {11.}};

  // Print the dynamic type signature of the callable.
  std::cout << myfunc << std::endl;
  // <callable <(Dims... * float64, Dims... * float64) -> Dims... * float64>
  //  at 000001C223FC5BE0>

  // Call the callable and print its output.
  // Broadcast along the rows of a.
  std::cout << myfunc(a, b) << std::endl;
  // array([[-4, -8], [-6, -8]],
  //       type="2 * 2 * float64")

  // Broadcast the second row of c across the second row of a.
  std::cout << myfunc(a, c) << std::endl;
  // array([[ -8, -16], [-24, -28]],
  //       type="2 * 2 * float64")

}

A similar function can be constructed in Python using DyND's Python bindings and Python 3's function type annotations. If Numba is installed, it is used to get JIT-compiled code that has performance relatively close to the speed of the code generated by the C++ compiler.

from dynd import nd, ndt

@nd.functional.elwise
def myfunc(a: ndt.float64, b: ndt.float64) -> ndt.float64:
    return a * (a - b)

Reductions

Reductions are formed from functions that take two inputs and produce a single output. Examples of reductions include taking the sum, max, min, and product of the items in an array. Here we'll work with a reduction that takes the maximum of the absolute values of the items in an array. In DyND this can be implemented by using nd::functional::reduction on a callable that takes two floating point inputs and returns the maximum of their absolute values. Here's an example:

// Include <algorithm> to get std::max.
#include <algorithm>
// Include <cmath> to get std::abs.
#include <cmath>
#include <iostream>

#include <dynd/array.hpp>
#include <dynd/callable.hpp>
// Header containing dynd::nd::functional::reduction.
#include <dynd/func/reduction.hpp>

using namespace dynd;

// Wrap the function as a callable.
// Then use dynd::nd::functional::reduction to make a reduction from it.
// This time just wrap a C++ lambda function rather than a pointer
// to a different function.
nd::callable inf_norm = nd::functional::reduction(nd::callable(
        [](double a, double b) { return std::max(std::abs(a), std::abs(b));}));

// Demonstrate the reduction working along both axes simultaneously.
int main() {
  nd::array a{{1., 2.}, {3., 4.}};

  // Take the maximum absolute value over the whole array.
  std::cout << inf_norm(a) << std::endl;
  // array(4,
  //       type="float64")
}

Again, in Python, it is relatively easy to create a similar callable.

from dynd import nd, ndt

@nd.functional.reduction
def inf_norm(a: ndt.float64, b: ndt.float64) -> ndt.float64:
    return max(abs(a), abs(b))

The type for the reduction callable inf_norm is a bit longer. It is (Dims... * float64, axes: ?Fixed * int32, identity: ?float64, keepdims: ?bool) -> Dims... * float64. This signature represents a callable that accepts a single input array and has several optional keyword arguments. In Python, passing keyword arguments to callables works the same as it would for any other function. Currently, in C++, initializer lists mapping strings to values are used since the names of the keyword arguments are not necessarily known at compile time.

Exporting Callables to Python

The fact that DyND callables are C++ objects with a single C++ type makes it easy to wrap them for use in Python. This is done using the wrappers for the callable class already built in to DyND's Python bindings. Using DyND in Cython merits a discussion of its own, so I'll only include a minimal example here.

This Cython code in particular is still using experimental interfaces. The import structure and function names here are very likely to change.

The first thing needed is a header that creates the desired callable. Since this will only be included once in a single Cython based module, additional guards to make sure the header is only applied once are not needed.

// inf_norm_reduction.hpp
#include <algorithm>
#include <cmath>

#include <dynd/array.hpp>
#include <dynd/callable.hpp>
#include <dynd/func/reduction.hpp>

static dynd::nd::callable inf_norm =
    dynd::nd::functional::reduction(dynd::nd::callable(
        [](double a, double b) { return std::max(std::abs(a), std::abs(b));}));

The callable can now be exposed to Python through Cython. Some work still needs to be done in DyND's Python bindings to simplify the system-specific configuration for linking extensions like this to the DyND libraries. For simplicity, I'll just show the commented Cython distutils directives that can be used to build this file on 64 bit Windows with a libdynd built and installed from source in the default location. Similar configurations can be put together for other systems.

# py_inf_norm.pyx
# distutils: include_dirs = "c:/Program Files/libdynd/include"
# distutils: library_dirs = "c:/Program Files/libdynd/lib"
# distutils: libraries = ["libdynd", "libdyndt"]

from dynd import nd, ndt

from dynd.cpp.callable cimport callable as cpp_callable
from dynd.nd.callable cimport dynd_nd_callable_from_cpp

cdef extern from "inf_norm_reduction.hpp" nogil:
    # Have Cython call this "cpp_inf_norm", but use "inf_norm" in
    # the generated C++ source.
    cpp_callable inf_norm

py_inf_norm = dynd_nd_callable_from_cpp(inf_norm)

To build the extension I used the following setup file and ran it with the command python setup.py build_ext --inplace.

# setup.py
# This is a fairly standard setup script for a minimal Cython module.
from distutils.core import setup
from Cython.Build import cythonize
setup(ext_modules=cythonize("py_inf_norm.pyx", language='c++'))

A Short Benchmark

elwise and reduction are not heavily optimized yet, but, using IPython's timeit magic, it's clear that DyND is already doing well:

In [1]: import numpy as np

In [2]: from py_inf_norm import py_inf_norm

In [3]: a = np.random.rand(10000, 10000)

In [4]: %timeit py_inf_norm(a)
1 loop, best of 3: 231 ms per loop

In [5]: %timeit np.linalg.norm(a.ravel(), ord=np.inf)
1 loop, best of 3: 393 ms per loop

These are just a few examples of the myriad of things you can do with DyND's callables. For more information take a look at our github repo as well as libdynd.org. We'll be adding a lot more functionality, documentation, and examples in the coming months.

GSOC Concluding Thoughts

After many long weeks of hunting bugs and reorganizing code, my GSOC has come to an end. My project turned out significantly different than I had expected, but much of my work still proved to be very valuable. I originally envisioned being able to use many of DyND's features within Cython in a transparent way. Much of the work that I did was in fixing bugs and laying the groundwork for making things like that possible. There are still several bugs to fix and features to add in Cython before that's really possible, but I'm still pleased with the progress I made during my GSOC. My bugfixes served to simplify some of the difficult problems that arise when wrapping and interfacing with nontrivial C++ code. My restructuring of the DyND Python bindings, as expected, made it much easier to transition between the C++-level DyND constructs and the Python classes used in its Python bindings.

Here's a minimum working example that works with the latest versions of libdynd and dynd-python (as of this post).

// myfunc.cpp
#include "dynd/func/callable.hpp"
#include "dynd/func/apply.hpp"
#include "dynd/func/elwise.hpp"

// Write a function to turn into a DyND callable.
double f(double a, double b) {
  return a * (a - b);
}

// Make the callable object.
dynd::nd::callable f_broadcast = dynd::nd::functional::elwise(dynd::nd::functional::apply(&f));
# custom_callable.pyx
# distutils: extra_compile_args = -std=c++11
# distutils: libraries = dynd

from dynd.cpp.func.callable cimport callable as cpp_callable
from dynd.nd.callable cimport callable
from dynd import nd

# Declare the C++ defined callable object.
cdef extern from "myfunc.cpp" nogil:
    cpp_callable f_broadcast

# Construct a Python object wrapping the DyND callable.
cdef callable py_f = callable()
py_f.v = f_broadcast

# A short demonstration.
def test():
    a = nd.array([1., 2., 3., 2.])
    b = nd.array([3., 2., 1., 0.])
    print py_f(a, b)
# setup.py
from distutils.core import setup
from Cython.Build import cythonize
setup(ext_modules=cythonize("*.pyx", language='c++'))

As you can see, it is now very easy for external modules to transition smoothly between using DyND in C++ and providing Python-level functionality. There's still a lot to be done, but I find the fact that this is now such a simple process really remarkable. This example uses the strengths of both Cython and Dynd to provide useful and significant functionality, and I'm thrilled it is finally working.

Cython Collaboration

I've spent the majority of these last few weeks working on adding new features to Cython. I just recently finished my work there, though the pull request involving exception handling for overloaded operators is still waiting to be merged. Overloading the assignment operator should now work in all cases for C++ classes. Once the exception handling pull request is merged, translating C++ exceptions into Python exceptions should also work perfectly well. It has been a relief to finally get these bugs taken care of. There were a variety of features I wanted to add to Cython, but I've run out of time to work on them now, so they will have to wait till later. The Cython developers were very helpful in getting these bugs fixed. They've helped find and take care of a few corner cases that I missed in the pull requests that have been merged thus far. I wasn't able to finish all the features I wanted, but I was able to make a lot of good progress and start building a consensus on what some features should look like further down the road. I got a lot of good things done, but I'm ready to move back to working on DyND.

One of the major features I added was allowing C++ classes to use an overloaded operator= within Cython. It is all working now and is on-schedule to be released as a part of 0.23. Handling cascaded assignment proved to be particularly tricky, since Cython mimics the Python semantics in the case of a = b = c rather than following the C++ behavior for C++ objects. With the help of the Cython developers, I was able to get it all working fine.

Making Cython respect the exception handling declarations for overloaded C++ operators used within Cython proved to be a lengthy bit of work. In order to make it work properly, the type analysis code that determines when an exception handler should be used had to be connected to the C++ generation code that, if an exception handler is used, wraps the given arithmetic expression in a try-catch statement at the C++ level. This was challenging, not so much because it was hard to see what needed to be changed in the output, but because the needed changes consisted of identifying and making many small changes in many places. I'm pleased with the result now. Respecting different exception declarations for different operators requires that Cython generate a separate try-catch block for each operator used. Forming a try-except statement for each call to an overloaded operator also requires that the result of each subexpression be stored in a temporary variable. In theory, good removal of temporaries and good branch prediction should make it so that all of these extra temporary variables and try-catch statements have no additional effect on the time needed to evaluate a given expression, but that ideal may be difficult for C++ compilers to reach. Either way, that was the only way to properly conform to the existing API for overloaded operators that is already documented and used extensively in Cython. At some future point it would be nice to add a way to let users consolidate different try-except blocks, but that is well outside the scope of my current project.

Thanks to the exception handling logic that I've added, things like the following raise an appropriate Python exception now rather than crash the Python interpreter:

cdef extern from "myheader.hpp" nogil:
    cppclass thing:
        thing()
        thing(int)
        operator+(thing &other) except +

def myfunc():
    # Initialize some "thing"s to some values
    cdef thing a = ...
    cdef thing b = ...
    cdef thing c
    # Now do an operation that throws a C++ exception.
    c = a + b

One of the key features I was hoping to work through with the Cython developers was finding an appropriate API for assignment to a multidimensional index. In C++ this is easily implemented as an assignment to an lvalue reference resulting from an overloaded or variadic function call. The Cython developers were opposed to allowing assignment to function calls, since an interface of that sort is fairly un-pythonic. That's a sensible opinion, but it leaves the problem of assigning to a multidimensional index unsolved. The only workable proposal that came from the discussion was one that would allow inline definitions of methods like __getitem__ and __setitem__ in the Cython extern declaration of the C++ class. Though that would provide a Python-like interface for users, I didn't opt to implement it that way for a variety of reasons. First, when working with a C++ library with Python bindings written in Cython, it forces users to keep track of three interfaces (the C++ interface, the Cython interface built on top of the C++ interface, and the Python interface) instead of two (the C++ and Python interfaces). Second, it makes library writers write larger amounts of oddly placed boilerplate code that is more difficult to understand and maintain. Third, it causes possible naming conflicts when working with C++ classes that overload both operator[] and operator() (this is primarily a theoretical concern, but it is still troubling). In order to allow C++-like control over the number of indices used (without using std::tuple and requiring C++11), these special functions would have to support C++ style function signature overloading as well, which requires that they have entirely different signatures than their corresponding Python variants. The maintainability of such a feature is also a concern, since it would require a wide variety of changes to the Cython code base, all of which are unlikely to be used frequently in many other projects. In spite of all these concerns, this is still a workable proposal, but I have avoided implementing it thus far in the hope that something better would surface. Given how much time I have left to work on this GSOC project, this feature may have to be left for later.

Another important feature I spent some time trying to add was support for non-type template parameters. I posted two different possible API's: one which specifies only the number of parameters and allows the C++ compiler to verify that all the parameters are the correct type, and another that requires that the types of the parameters be specified at the Cython level so that they can all be checked when Cython generates the C++ source file. After waiting several days and receiving no response, I started work on the former, primarily because it seemed like the simplest option available. Adding the additional type declarations to the template parameters would have required finding a way to apply the existing operator overloading logic to template parameters, and I was uncertain of whether or not that would be doable while still mirroring the C++ template semantics. After I had spent a pair of days implementing most of the more lenient syntax, one of the Cython developers chimed in to support the stricter type checking. The argument in favor of stricter type checking makes sense since it lets users avoid having to debug the generated C and C++ files, but it would have been nice to hear a second opinion before spending a few days implementing the other syntax. This is another feature that, given how much time I have left, I'll probably have to leave for later.

While discussing non-type template parameters, we also discussed adding support for variadic templates. The consensus on the syntax was to allow variadic template arguments to be used by simply adding an ellipsis where the C++ expression typenames ... would go. I also proposed allowing something like types ..., and that syntax may be adopted in the end anyway. I didn't get the time to add that feature, but at least the design ideas are there for someone to use later on. Currently, due to some incomplete type checking, variadic templated functions are already usable in Cython, but this is really a bug and not a feature. For example:

// variadic_print.hpp
#pragma once
#include <iostream>

template<typename T>
void cpp_print(T value){
  std::cout << value << std::endl;
}

template<typename T, typename... Args>
void cpp_print(T value, Args... args){
  std::cout << value << ", ";
  cpp_print(args...);
}
# cpp_print.pyx
# The extra compile arguments here assume gcc or clang is used.
# distutils: extra_compile_args = ['-std=c++11']

from libcpp.string cimport string

cdef extern from "variadic_print.hpp" nogil:
    void cpp_print(...)

def test():
    cdef:
        int thing1 = 1
        double thing2 = 2
        long long thing3 = 3
        float thing4 = 4
        string thing5 = "5"
    cpp_print(thing1, thing2, thing3, thing4, thing5)

This definitely isn't a feature though. It's just a cleverly exploited bug that hasn't been fixed yet and is certainly not usable in anything more than entertaining examples.

It would also be nice to see things like overloading in-place operations and overloading operator|| and operator&& in Cython at some point, but I haven't done that yet.

All things considered, even though I wasn't able to accomplish all I had hoped in Cython, I was still able to do a lot of useful things. The existing design philosophy in Cython didn't prove to be completely amenable to exposing a Cython-level API for a relatively advanced C++ library. The existing features are designed primarily to allow users to write Python-like code that manipulates C and C++ objects. The features that are focused toward interfacing with external C++ libraries are effective for constructing and exposing Python wrappers for C++ objects, but can often prove to be very unwieldy for exposing Cython-level APIs for C++ libraries. It's certainly still possible, but it definitely stretches the limits of the tools at hand.

Now, moving forward after my work on Cython, I've shifted toward contributing to libdynd and dynd-python again. I've worked with my mentors to isolate a mysterious segfault that we were seeing when using clang to build libdynd and dynd-python. I've also worked on several improvements to the build system. Everything should be building with MinGW again soon, but, for now, my linux environment is good enough to work on some new features too. I've also spent some time working on some minimal examples of using the Cython declarations in dynd-python outside the installed Python package.

To help with testing libdynd and dynd-python, I've also overhauled the Travis-CI build to make use of the new container-based infrastructure. Thus far the builds are running noticeably faster. These changes should be a great help to all of us in the future as we continue to develop both packages.

Over these last few weeks I expect to spend most of my time making it as easy as possible to use as much of libdynd as possible within Cython. The infrastructure is already in place for most of the core pieces that need to be taken care of, so I'm feeling optimistic.

Fixing Cython Bugs

Last week I went to the SciPy conference. It was great. I got to meet many of the major contributors to open source, including one of my mentors, Mark Wiebe. I also got to give a talk about some of my previous work on SciPy and see my wife's talk about our new applied math curriculum here at BYU. There were a lot of great presenters at the conference this year and it was good to see how many people are interested in and working on the software in the scientific Python stack. Above all, there were free t-shirts. Lots of free t-shirts.

Other than that, the past few weeks I've spent a lot of time fixing and following up on a series of Cython bugs. A lot of my time has been spent reading large chunks of the Cython codebase to properly understand what is going on. My understanding of what is going on is still very incomplete, but I've managed to get enough of a hold on things to be able to start fixing bugs. As I currently understand it, the Cython parser goes through a file and constructs a series of nodes representing the abstract syntax tree. Once the abstract syntax tree is constructed, there is a type resolution pass on all the existing nodes where some of the more generic nodes can specialize themselves as instances of more specific nodes. That helps localize parts of the code that deal with entirely different cases for the different types. This type resolution phase is also where all the static types are checked for correctness. Once types have all been checked and the proper nodes for code generation are in place, another pass is made to generate the correct code for each operation. Compiler directives are taken into account and different preambles are included conditionally as well. Most of the code for performing the type analysis pass and the code generation pass on the abstract syntax tree is in the different methods of the classes defined in Cython/Compiler/ExprNodes.py and Cython/Compiler/Nodes.py. That's a very rough overview of what is going on, but it has been enough for me to fix a few things.

The biggest bug I've been working on has been the fact that, until now, exception declarations for operators overloaded at the C++ level have not been respected in the generated C++ code. This means that, when an operation like a + b ought to throw a C++ error, catch the C++ error, then translate it into a Python error, the only thing that really happens is the whole program crashing due to an uncaught C++ error. Fixing this has been a little tricky since temporary variables must be allocated and used to store the result of each operation, and many of the overloadable operations are handled separately throughout the Cython code. In spite of the fact that this is a large and delicate bug to fix, I have managed to get everything working for arithmetic and unary operators. Indexing and function calls still need more work/testing.

I've also been discussing the best way to do multidimensional indexing with the Cython developers. In C++, multidimensional indexing of C++ objects can be written easily as "a(1, 2, 3)". That is all well and good, except that such an expression cannot be used properly as a left-value in assignment in Cython even though that is perfectly valid in C++. The primary reasoning as far as I can tell is that allowing that would be decidedly un-pythonic. My personal preference is very strongly toward allowing the C++ syntax when working with C++ objects, but we're still trying to settle on a reasonable solution.

I've also resurrected an existing patch to allow using an overloaded call operator for stack-allocated objects and put in some initial code toward allowing users to declare and use an overloaded operator= for a C++ class. There will be more patches coming soon on that.

Once I've fixed a few more of these Cython bugs, I expect to be able to continue my work in refactoring DyND's Python bindings to make the transition between Python, Cython/DyND, and C++/DyND much smoother.

Structure Plans and More Examples

Despite a few setbacks, these past two weeks have been ones of progress. I was able to work with my mentors, Mark and Irwin, to come up with a plan for how to restructure the C++ namespaces in dynd-python. We opted to separate the namespaces for the Python interoperability functions and the namespaces for the primary DyND C++ functionality. This will make it so that Cython wrapper types for specific C++ types can have the same name without causing any sort of naming conflict. Users that want to use both types within the same file can use the different namespaces to separate them. Within Cython this will be even easier since types can be aliased on import the same way Python objects can. I finished the name separation work shortly after my last blog post.

We also developed a plan with regards to how to structure the pxd files so that they can be easily used in third party extension modules. I'm working to separate the existing Cython pxd files in the project so that there is one for each header. That will make it so that the structure of the headers is clear and it will be easy to add new declarations. It will also make it so that projects that do not wish to include swathes of project headers will be able to avoid that when they only want to include specific things. On the other hand, for users that want to throw something together quickly, I'll also put together another set of pxd files that use Cython's relative cimports to mimic the namespace structure currently in the DyND library. I am in the process of that restructuring, and hope to be able to get it done soon.

With regards to dynamically loading C++ classes via Cython, for now, we're going to continue including the C++ project headers and linking against the original DyND library. I suspect that there is a way of dynamically loading C++ classes from other Python modules without having to link against the original module. However, I haven't figured it out yet. It appears to be a cool feature that could be added later, but brings relatively little added benefit other than a slightly simplified API. Cython certainly does not support anything like that currently and getting it to do that would require significant modifications.

On the other hand, in Cython, it is relatively easy to export functions and variables in a way that does not require client extension modules to link against the original library. For now I'm going to employ that approach so that, when all is said and done, as much of the existing library as possible can be loaded via Cython's cimport machinery.

I have spent some time putting together a basic example of working with DyND's C++ objects within Cython, but I have run across some unresolved issues with current project structure. I expect that we'll be able to get those resolved and I'll be able to post a working example soon.

During the past two weeks I also had the opportunity to learn how DyND's arrfuncs work. Arrfuncs operate on DyND arrays the same way that gufuncs operate on NumPy arrays. I found that arrfuncs are surprisingly easy to both construct and use. Here's a simple program that shows some arrfuncs in action:

#include <iostream>
#include <dynd/array.hpp>
#include <dynd/func/arrfunc.hpp>
#include <dynd/func/apply.hpp>
#include <dynd/func/elwise.hpp>

using namespace dynd;

namespace {
// Functions to use to create arrfuncs.
double f(double x) { return x * x; }
double g(double x, double y) { return x * x + y * y; }
// A print function that mimics Python's,
// but only accepts one argument.
template <typename T> void print(T thing) { std::cout << thing << std::endl; }
}

int main() {
  // First make versions of f and g that operate on
  // DyND array types rather than native C++ types.
  nd::arrfunc square_0 = nd::functional::apply(&f);
  nd::arrfunc norm2_0 = nd::functional::apply(&g);
  // Now make versions of f and g that operate on DyND arrays
  // and follow NumPy-like broadcasting semantics.
  nd::arrfunc square = nd::functional::elwise(square_0);
  nd::arrfunc norm2 = nd::functional::elwise(norm2_0);
  // Try applying these functions to DyND scalars.
  nd::array a = 2.;
  nd::array b = 3.;
  print(square_0(a));
  print(norm2_0(a, b));
  print(square(a));
  print(norm2(a, b));
  // Now apply these functions to DyND arrays.
  a = {1., 2., 3., 2.};
  b = {3., 2., 1., 0.};
  print(square(a));
  print(norm2(a, b));
  // Run a two-dimensional example.
  a = {{1., 2.}, {2., 3.}, {3., 4.}};
  b = {{2., 1.}, {3., 3.}, {2., 1.}};
  print(square(a));
  print(norm2(a, b));
  // Broadcast the function with two arguments along the rows of a.
  b = {1., 2.};
  print(norm2(a, b));
  // Broadcast a scalar with an array.
  a = {{1., 2., 3.}, {2., 3., 4.}};
  b = 2.;
  print(norm2(a, b));
  print(norm2(b, a));
  // Broadcast a row and column together.
  a = {{3.}, {2.}, {1.}};
  b = {{2., 1.}};
  print(norm2(a, b));
}

I compiled this C++ file on Windows with a recent MinGW-w64 g++ using the command

g++ -O3 -I"c:/Program Files (x86)/libdynd/include" -std=c++11 ufunc.cpp -L"c:/Program Files (x86)/libdynd/lib/static" -ldynd -o ufunc.exe

Again, I was very impressed with how naturally I was able to create a C++ arrfunc that followed all the expected broadcasting rules. In another few weeks I expect to be able to do this sort of thing just as naturally in Cython.

Basic Examples

The past two weeks have been an adventure. I've spent a great deal of time working to find a reasonable way to refactor the DyND Python bindings in a way that works well for exporting all the existing Python/C++ compatibility functions at both the Cython and C++ level. After some discussion with my mentors, Mark and Irwin, I was able to get to what seems to be a workable model. I'm in the process of doing all the needed refactoring to put those changes into place. I'll post more details once I've finished the main structural changes to the Python bindings.

For now, I'd like to show a few basic examples of how to use DyND. The examples here showcase some of the similarities with NumPy as well as the simplicity of the notation. I haven't done any sort of performance comparison. The implementations I've put together here using DyND work using array slicing and array arithmetic, so the cost of dispatching for types is still present in each array arithmetic operation. More efficient implementations could be created by coding a version of an algorithm that operates directly on the buffers of the arrays given. More efficient and general implementations could be constructed using DyND's arrfuncs, which are similar in purpose to NumPy's gufuncs, but I'm still figuring out how they work, so I'll have to include examples of how to do that later on.

For my examples here, I'll focus on three different algorithms for polynomial evaluation. For simplicity, I'll assume the coefficients are passed as an array and the parameter value at which to evaluate the polynomial(s) is a single double precision value. Each algorithm will be presented in Python (using NumPy), in C++ (using DyND), and in Fortran. Each file will either compile to an executable (C++ and Fortran) or be run-able as a script (Python). The NumPy and DyND implementations will be roughly equivalent and will both operate on arbitrary dimensional arrays, operating on the first axis of the array and broadcasting along all trailing axes. The NumPy and DyND versions will also work with relatively generic types (either floating point or complex). The Fortran versions will apply only to 1-dimensional arrays and will only operate on arrays of double precision real coefficients.

Regarding the notation, I found it very easy to use DyND's irange object with its overloaded comparison operators very easy to use. It is used to specify ranges of indices over which to slice an array. Since multiple arguments are not allowed for the indexing operator in C++, the call operator is used for array slicing.

For compiling these examples on Windows, I used gfortran and a recent development version (3.7.0 dev) of clang compiled with mingw-w64. The exact compiler calls replacing "(algorithm)" with the name of each particular algorithm were

clang++ -fPIC -O3 -I"c:/Program Files (x86)/libdynd/include" -std=c++11 (algorithm).cpp -L"c:/Program Files (x86)/libdynd/lib/static" -ldynd -o (algorithm).exe
gfortran -fPIC -O3 (algorithm).f90 -o (algorithm)_f.exe

The corresponding compiler calls on BSD, Linux, or other Unix-based operating systems are similar with the -I and -L directories being modified (or omitted) according to the location of the DyND headers and library. The same set of flags should work with recent versions of clang++ and g++.

Horner's Algorithm

Horner's algorithm is the standard algorithm for evaluating a power basis polynomial at a parameter value, given the coefficients for the polynomial. Here the coefficients are assumed to be given in order from highest to lowest degree.

# Python
from __future__ import print_function
import numpy as np

def horner(a, t):
    v = a[0]
    for i in xrange(1, a.shape[0]):
        v = t * v + a[i]
    return v

if __name__ == '__main__':
    a = np.array([1, -2, 3, 0], 'd')
    print(horner(a, .3333))
// C++
#include <iostream>
#include <dynd/array.hpp>

using namespace dynd;

nd::array horner(nd::array a, double t){
    nd::array v = a(0);
    size_t e = a.get_dim_size();
    for(size_t i=1; i < e; i++){
        v = t * v + a(i);}
    return v;}

int main(){
    nd::array a = {1., -2., 3., 0.};
    std::cout << horner(a, .3333) << std::endl;}
! Fortran
module hornermodule
implicit none
contains
double precision function horner(a, t)
    double precision, intent(in) :: a(:)
    double precision, intent(in) :: t
    double precision v
    integer i, n
    n = size(a)
    v = a(1)
    do i = 2, n
        v = t * v + a(i)
    end do
    horner = v
end function horner
end module hornermodule

program main
    use hornermodule
    double precision a(4)
    a = (/1, -2, 3, 0/)
    print *, horner(a, .3333D0)
end program main

Decasteljau's Algorithm

Decasteljau's Algorithm is used to evaluate polynomials in Bernstein form. It is also often used to evaluate Bézier curves.

# Python
from __future__ import print_function
import numpy as np

def decasteljau(a, t):
    for i in xrange(a.shape[0]-1):
        a = (1-t) * a[:-1] + t * a[1:]
    return a

if __name__ == '__main__':
    a = np.array([1, 2, 2, -1], 'd')
    print(decasteljau(a, .25))
// C++
#include <iostream>
#include <dynd/array.hpp>

using namespace dynd;

nd::array decasteljau(nd::array a, double t){
    size_t e = a.get_dim_size();
    for(size_t i=0; i < e-1; i++){
        a = (1.-t) * a(irange()<(e-i-1)) + t * a(0<irange());}
    return a;}

int main(){
    nd::array a = {1., 2., 2., -1.};
    std::cout << decasteljau(a, .25) << std::endl;}

In the Fortran version of this algorithm, dynamic allocation within the function is needed.

! Fortran
module decasteljaumodule
implicit none
contains
double precision function decasteljau(a, t)
    double precision, intent(in) :: a(:)
    double precision, intent(in) :: t
    double precision, allocatable :: b(:)
    integer i, n
    n = size(a)
    allocate(b(n-1))
    b = (1-t) * a(:n-1) + t * a(2:)
    do i = 2, n-1
        b(:n-i) = (1-t) * b(:n-i) + t * b(2:n-i+1)
    end do
    decasteljau = b(1)
end function decasteljau
end module decasteljaumodule

program main
    use decasteljaumodule
    double precision a(4)
    a = (/1, 2, 2, -1/)
    print *, decasteljau(a, .25D0)
end program main

Clenshaw's Algorithm (Chebyshev Polynomials)

Clenshaw's algorithm provides a simple recurrence for evaluating Chebyshev polynomials. There is a similar algorithm for evaluating Legendre polynomials, but we will not demonstrate it here. The code here is a simplified adaptation of the code

# Python
from __future__ import print_function
import numpy as np

def clenshaw(a, t):
    c0, c1 = a[-2], a[-1]
    for i in xrange(3, a.shape[0]+1):
        c0, c1 = a[-i] - c1, c0 + c1 * (2 * t)
    return c0 + c1 * t

if __name__ == '__main__':
    a = np.array([1, 2, -1, -4], 'd')
    print(clenshaw(a, -.5))
// C++
#include <iostream>
#include <dynd/array.hpp>

using namespace dynd;

nd::array clenshaw(nd::array a, double t){
    size_t e = a.get_dim_size();
    nd::array c0 = a(e-2), c1 = a(e-1), temp;
    for(size_t i=3; i<e+1; i++){
        temp = c0;
        c0 = a(e-i) - c1;
        c1 = temp + c1 * (2 * t);}
    return c0 + c1 * t;}

int main(){
    nd::array a = {1., 2., -1., -4.};
    std::cout << clenshaw(a, -.5) << std::endl;}
! Fortran
module clenshawmodule
implicit none
contains
double precision function clenshaw(a, t)
    double precision, intent(in) :: a(:)
    double precision, intent(in) :: t
    double precision c0, c1, temp
    integer i, n
    n = size(a)
    c0 = a(n-1)
    c1 = a(n)
    do i = 2, n-1
        temp = c0
        c0 = a(n-i) - c1
        c1 = temp + c1 * (2 * t)
    end do
    clenshaw = c0 + c1 * t
end function clenshaw
end module clenshawmodule

program main
    use clenshawmodule
    double precision a(4)
    a = (/1, 2, -1, -4/)
    print *, clenshaw(a, -.5D0)
end program main

First Impressions And Overcoming Compilation Errors

Most of the past month was spent in a mad dash to finish my master's thesis. Who knew that writing 100 pages of original research in math could take so long? Either way, I'm finished and quite pleased with the result. My thesis is written and I have been able to, mostly, move on to other endeavors, like starting the work on my GSOC project, the Cython API for DyND (see the brief description and full proposal).

I've been able to start familiarizing myself with the package a little more. My background is primarily in using NumPy and Cython, but it has been really nice to be able to use a NumPy-like data structure in C++ without having to use the Python C API to do everything. Stay tuned for a few nice snippets of code over the next few days.

Overall, I've been impressed by the amount of functionality that is already present. A few things like allowing the use of an ellipsis in indexing or providing routines for basic linear algebra operations have yet to be implemented, but, for the most part, the library is easy to use and provides a great deal of useful functionality at both the C++ and Python levels.

Before beginning detailed work on my project, I was first faced with the task of getting everything working on my main development machine. For a variety of reasons, it uses windows and needs to stay that way. Undaunted, I ventured forth, and attempted to build libdynd and its corresponding Python bindings from source with my compiler of choice, MinGW-w64.

MinGW-w64 is a relatively new version of MinGW that targets 64 bit windows. It is not as commonly used as MSVC, or gcc, so it is normal to have to fix a few compile errors when compiling large amounts of C/C++ code. Python's support for MinGW is shaky as well, but usually things come together nicely with a little effort.

After several long and ugly battles with a variety of compiler errors, I was able to get it all working, though there are still some test failures that still need to be revisited. The DyND source proved to be fairly easy to work with. The most difficult errors I came up against came from conflicts between the environment provided by the compiler and some of the type definitions in the Python headers. I'll document some of the useful information I picked up here.

For those that aren't familiar with it, the C preprocessor is an essential part of writing platform independent code. Among other things, it lets you write specialized sections of code for pieces that are specific to a given operating system, compiler, or architecture. For example, to write a section of code that applies only to Windows, you would write something like

// If the environment variable indicating that you're compiling on Windows is defined
#ifdef _WIN32
  // Platform-specific code goes here.
#endif

Similarly, if you want to differentiate between compilers,

#ifdef _MSC_VER
  // Code for MSVC compiler here.
#elseif __GNUCC__
  // Code for gcc/g++ here.
  // Compilers that implement a GCC-like environment
  // (e.g. ICC on Linux, or Clang on any platform)
  // will also use this case.
#else
  // Code for all other compilers.
#endif

Most of the work I did to get everything running involved modifying blocks that look like this to get the right combination. Frequently, code is defined to do one thing on windows (with the assumption that only MSVC will be used) and another on Linux, even though, for MinGW, many of the Linux configurations are needed for the compiler. In the past, I've found myself continually searching for what the right ways to detect a given compiler, operating system, or architecture, but it turns out there is a fairly complete list where the variables indicating most common compilers and operating systems are shown.

The build system CMake also provides ways of detecting things like that. A few useful examples:

if(WIN32)
    # You're compiling for Windows
endif()
if("${CMAKE_SIZEOF_VOID_P}" EQUAL "8")
    # You're compiling for a 64-bit machine.
endif()
if("${CMAKE_CXX_COMPILER_ID}" STREQUAL "GNU")
    # You're compiling for gcc.
endif()
if("${CMAKE_CXX_COMPILER_ID}" STREQUAL "MSVC")
    # You're compiling for MSVC.
endif()

Now, on to some of the tricker bugs. Hopefully including them here can help others resolve similar errors later on.

It is a less commonly known fact that the order of compiler flags matters if you are linking against different libraries. (An aside, for those that are new to the command line flags for gcc, there is a nice introduction here) As it turns out, the order of the flags for gcc also matters in some less than intuitive cases. After compiling libdynd, in order to compile a basic working example I had to do g++ -std=c++11 -fPIC declare_array.cpp -ldynd rather than g++ -std=c++11 -ldynd -fPIC declare_array.cpp. Running the latter would give a series of linker errors like

undefined reference to 'dynd::detail::memory_block_free(dynd::memory_block_data*)'

clang++ accepted both orderings of the flags without complaint.

I also ran into this bug in Python where there is an unfortunate name collision between a preprocessor macro defined in Python and a function in the c++11 standard library. The name collision results in the unusual error

error: '::hypot' has not been declared

coming from the c++11 standard library headers. It appears that a generally known solution is to include the standard library math header before Python.h. This wasn't a particularly favorable resolution to the issue since the Python headers are supposed to come first, so I opted to define the symbol _hypot back again to be hypot via a command line argument, i.e. adding the flag -D_hypot=hypot. It worked like a charm.

The last batch of errors came from my forgetting to define the macro MS_WIN64 on the command line when working with the Python headers. That macro is used internally to indicate that a Python extension module is being compiled for a 64 bit Python installation on Windows. In this case, forgetting to include that flag resulted in the error

undefined reference to '__imp_Py_InitModule4'

at link time.

In trying to fix the various errors from not defining MS_WIN64, I also came across errors where np_intp (a typedef of Py_intptr_t) was not the correct size, resulting in the error

error: cannot convert 'long long int*' to 'npy_intp* {aka int*}' in argument passing

The fundamental problem is that the logic in the Python headers fails to detect the proper size of a pointer without some sort of extra compile switch. Passing the command-line argument -DMS_WIN64 to the compiler when compiling Python extensions was sufficient to address the issue.

With most of the compilation-related issues behind me, I am now continuing my work in putting together several simple examples of how DyND can be used to perform NumPy-like operations at the C++ level. I'll be posting a few of them soon.

Update (6/8/2015)

In fixing a few more compilation errors, I found another useful table of C preprocessor macros that can be used to identify a given operating system. Among other things, it mentions that any BSD operating system can be detected using something like

#ifdef __unix__
#include <sys/param.h>
#ifdef BSD
// BSD stuff
#endif
#endif

The table in the GCC manual also lists which preprocessor macros are required for a compiler implementing the relevant language standards.