Commits


Yue authored and GitHub committed bbb610e61a0
GH-37753: [C++][Gandiva] Add external function registry support (#38116)

# Rationale for this change

This PR enhances Gandiva with support for an external function registry, so that developers can author third-party functions without modifying Gandiva's core codebase. See https://github.com/apache/arrow/issues/37753 for more details. In this PR, an external function must be compiled into LLVM IR before it can be integrated.

# What changes are included in this PR?

Two new APIs are added to `FunctionRegistry`:

```C++
/// \brief register a set of functions into the function registry from a given bitcode
/// file
arrow::Status Register(const std::vector<NativeFunction>& funcs,
                       const std::string& bitcode_path);

/// \brief register a set of functions into the function registry from a given bitcode
/// buffer
arrow::Status Register(const std::vector<NativeFunction>& funcs,
                       std::shared_ptr<arrow::Buffer> bitcode_buffer);
```

Developers can use these two APIs to register external functions. Typically, developers register the function metadata (`funcs`) for all functions in an LLVM bitcode file, giving either the path to the bitcode file or an `arrow::Buffer` containing the bitcode.

The overall flow looks like this:

[flow diagram]

# Are these changes tested?

Some unit tests are added to verify this enhancement.

# Are there any user-facing changes?

This PR adds some new ways to interface with the library:

* The `Configuration` class now accepts a customized function registry. Developers can register their own external functions in a registry and pass it to the configuration to be used as the function registry.
* The `FunctionRegistry` class has the two new APIs mentioned above.
* A newly instantiated `FunctionRegistry` no longer has any built-in functions registered in it. A new function, `GANDIVA_EXPORT std::shared_ptr<FunctionRegistry> default_function_registry();`, retrieves the default function registry, which contains all the Gandiva built-in functions.
* The behavior of libraries that depend on the Gandiva C++ library, such as the `Gandiva::FunctionRegistry` class in Gandiva's Ruby binding, changes accordingly.

# Notes

* Performance
  * Code generation time grows with the number of externally added function bitcodes (the more functions are added, the slower codegen becomes), even if an external function is not used in the given expression at all. This is not a new issue: it applies to built-in functions as well (the more built-in functions there are, the slower codegen becomes). In my limited testing, the cause is that `llvm::Linker::linkModule` takes a non-trivial amount of time and runs for every IR loaded, while `RemoveUnusedFunctions` runs only afterwards, so it does not reduce the time spent in `linkModule`. We may have to selectively load only the necessary IR (primarily by calling `linkModule` only for that IR), but more metadata may be needed to tell which functions can be found in which IR. This could be improved in a separate PR; please advise if anyone has ideas for improving it. Thanks.
* Integration with other programming languages via LLVM IR/bitcode
  * So far I have only added an external C++ function to the codebase, for unit testing purposes. A Rust-based function is possible, but when I tried it I found another issue (Rust has a standard library that needs to be processed with a different approach). I will explore other languages, such as Zig, later.
  * Functions that are not pre-compiled may require a different approach to obtain the function pointer; we can discuss and work on this in a separate PR later. Another issue, https://github.com/apache/arrow/issues/38589, was logged for this.
* The discussion thread on the dev mailing list: https://lists.apache.org/thread/lm4sbw61w9cl7fsmo7tz3gvkq0ox6rod
* I previously submitted another PR (https://github.com/apache/arrow/pull/37787) that introduced a JSON-based function registry. After discussion, I will close that PR and use this one instead.
* Closes: #37753

Lead-authored-by: Yue Ni <niyue.com@gmail.com>
Co-authored-by: Sutou Kouhei <kou@clear-code.com>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
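
The registry behavior described under the user-facing changes (a new registry starts empty, the default registry carries the built-ins, and external functions are registered together with the bitcode that contains them) can be modeled with a small standalone sketch. This is a toy model, not Gandiva code: `ToyFunction`, `ToyRegistry`, `toy_default_registry`, the function names, and the bitcode path are all hypothetical stand-ins chosen for illustration, and no LLVM bitcode is actually loaded.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <unordered_map>
#include <vector>

// Toy stand-in for function metadata (hypothetical, not gandiva::NativeFunction).
struct ToyFunction {
  std::string name;     // name as it would appear in an expression
  std::string pc_name;  // symbol name inside the precompiled bitcode
};

// Toy stand-in for a function registry (hypothetical, not gandiva::FunctionRegistry).
class ToyRegistry {
 public:
  // Record metadata for every function that lives in one bitcode file.
  // Returns false, and adds nothing, if any name is already registered.
  bool Register(const std::vector<ToyFunction>& funcs,
                const std::string& bitcode_path) {
    for (const auto& f : funcs) {
      if (functions_.count(f.name) > 0) return false;
    }
    for (const auto& f : funcs) {
      functions_[f.name] = f;
    }
    bitcode_paths_.push_back(bitcode_path);
    return true;
  }

  bool Contains(const std::string& name) const {
    return functions_.count(name) > 0;
  }

  // Number of bitcode files that a code generator would have to link.
  std::size_t bitcode_count() const { return bitcode_paths_.size(); }

 private:
  std::unordered_map<std::string, ToyFunction> functions_;
  std::vector<std::string> bitcode_paths_;
};

// Shared registry pre-populated with "built-in" functions, mirroring the role
// the PR description gives to default_function_registry().
std::shared_ptr<ToyRegistry> toy_default_registry() {
  static std::shared_ptr<ToyRegistry> registry = [] {
    auto r = std::make_shared<ToyRegistry>();
    r->Register({ToyFunction{"add", "add_int32_int32"}}, "<built-in>");
    return r;
  }();
  return registry;
}
```

In this model, a freshly constructed `ToyRegistry` does not contain `add`, while `toy_default_registry()` does. Registering a hypothetical `multiply_by_two` function with a path such as `/path/to/external.bc` makes it visible only in the registry it was registered in, which is why a customized registry has to be handed to the configuration explicitly.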