Cross-language integration in Java
Java is one of the most popular languages, but it is eclipsed in popularity by JavaScript and Python. This has the unfortunate effect that many useful open-source libraries are not available for Java even though similar libraries are well supported in JavaScript or Python. Java developers are thus left with the unenviable task of integrating non-Java libraries into Java apps.
There are several ways to get non-Java libraries to work in Java:
- Porting: Rewriting the library in Java yields the best results. Porting is cheaper than original development and LLMs can help, but porting every useful library to Java is still not feasible.
- GraalVM: GraalVM is a polyglot runtime and the best hope for cross-language integration in Java. It is so promising that it is probably sucking the life out of single-language runtimes: Rhino and Nashorn are deprecated, while Jython is stuck on Python 2. GraalVM's status, compatibility, and future are still crystallizing, so switching to it is non-trivial and risky, and it seems unsuitable for use in libraries at the moment.
- Lua: Lua is specifically designed for embedding in applications written in other languages, so there is hope that it might be easier to work with than Python or JavaScript. Unfortunately, it is not that well supported in practice: it is up to you to pick a fork of Luaj that works for you. And of course, to embed Lua, you first need a library implemented in Lua, and Lua is rarely the first choice of implementation language for library authors.
- WebAssembly: WebAssembly will likely take Lua's place one day, with much wider library support and much better performance, but we are not there yet. As of 2023, there is no practical way to run Python or JavaScript libraries inside WebAssembly from Java.
- JNI/JNA: Native library access is useful for integrating C, C++, and, more recently, Rust libraries, but it cannot be used directly to call into Python or JavaScript libraries, which are the most abundant ones. You can use it to integrate with the Python/C API or V8 (perhaps via Javet), but you are probably better off using GraalVM.
- CLI: Shelling out to Python or Node.js uses the language's primary runtime, which is always up to date. It is a universal solution that works with just about every language, and it also provides access to existing CLI tools. The catch lies in dependency management and in the cost of launching a separate process every time the library is used.
- Microservices: This is similar to the CLI approach in that it works for just about anything. The cost of launching a separate process is replaced with the much lower cost of an RPC call. The downside is that the process now has to stay in RAM all the time, and there are new security considerations. While a dedicated RPC API can be developed for every library, it might well be simpler to expose a generic REPL API, or even to run a local REPL process and talk to it over a pipe.
- Data: If the library is just a thin wrapper around a machine learning model or other data, it is often reasonably easy to rewrite the wrapper in Java. The underlying data can then be reused as is.
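The practical difference between the CLI and microservice approaches above comes down to process lifetime. The sketch below shows both using only the JDK's `ProcessBuilder`: `CliBridge.run` starts a fresh process for every call, while `PipeBridge` keeps one child process alive and exchanges one line per request over its stdin/stdout pipes. Both class names and the one-line-per-request protocol are hypothetical, chosen for illustration; they are not an existing library API.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.List;

/** CLI approach (hypothetical helper): a new process for every call. */
final class CliBridge {
    private CliBridge() {}

    /** Runs the command, writes stdin, and returns stdout+stderr; throws on a non-zero exit code. */
    static String run(List<String> command, String stdin) {
        try {
            Process process = new ProcessBuilder(command)
                    .redirectErrorStream(true) // merge stderr into stdout so one reader suffices
                    .start();
            // Writing all input before reading output is fine for small payloads;
            // large ones would need a separate thread to avoid filling the pipes.
            try (OutputStream toChild = process.getOutputStream()) {
                toChild.write(stdin.getBytes(StandardCharsets.UTF_8));
            }
            String output = new String(process.getInputStream().readAllBytes(), StandardCharsets.UTF_8);
            int exitCode = process.waitFor();
            if (exitCode != 0) {
                throw new IllegalStateException("exit code " + exitCode + ": " + output);
            }
            return output;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}

/** Microservice-style approach (hypothetical helper): one long-lived child, one line per request. */
final class PipeBridge implements AutoCloseable {
    private final Process process;
    private final BufferedWriter toChild;
    private final BufferedReader fromChild;

    PipeBridge(List<String> command) {
        this.process = start(command);
        this.toChild = new BufferedWriter(
                new OutputStreamWriter(process.getOutputStream(), StandardCharsets.UTF_8));
        this.fromChild = new BufferedReader(
                new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8));
    }

    private static Process start(List<String> command) {
        try {
            return new ProcessBuilder(command)
                    .redirectError(ProcessBuilder.Redirect.INHERIT) // keep stderr out of the reply channel
                    .start();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Sends one request line (must not contain newlines) and blocks until one reply line arrives. */
    synchronized String call(String request) {
        try {
            toChild.write(request);
            toChild.write('\n');
            toChild.flush(); // without flushing, the child may never see the request
            String reply = fromChild.readLine();
            if (reply == null) {
                throw new IllegalStateException("child process exited");
            }
            return reply;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public void close() {
        try {
            toChild.close(); // EOF on stdin lets a well-behaved child exit on its own
        } catch (IOException ignored) {
            // the child may already be gone
        }
        process.destroy();
    }
}
```

In practice the child of `PipeBridge` would be something like `List.of("python3", "-u", "bridge.py")`, where `bridge.py` is a hypothetical script that reads a request line, calls the library, and prints one reply line (`-u` stops Python from buffering the replies). The process launch cost is then paid once instead of on every call, at the price of keeping the child in memory.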
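For the data option in the list above, the Java side only has to reproduce the wrapper logic and read the data file unchanged. The sketch below assumes a hypothetical library whose only asset is a UTF-8 lexicon with one `word<TAB>score` entry per line and whose wrapper sums the scores of known words; the file format and the `LexiconScorer` name are illustrative assumptions. A wrapper around a real machine learning model would load the model file with a Java inference library instead, but the division of labor is the same.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/**
 * Hypothetical Java rewrite of a thin wrapper around a word-score lexicon.
 * Assumed data format (reused as is): UTF-8, one "word<TAB>score" entry per line.
 */
final class LexiconScorer {
    private final Map<String, Double> scores;

    private LexiconScorer(Map<String, Double> scores) {
        this.scores = scores;
    }

    /** Loads the data file shipped with the original library. */
    static LexiconScorer load(Path lexiconFile) {
        try {
            return fromLines(Files.readAllLines(lexiconFile, StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Parses lexicon lines; blank lines are skipped, malformed ones are rejected. */
    static LexiconScorer fromLines(List<String> lines) {
        Map<String, Double> scores = new HashMap<>();
        for (String line : lines) {
            if (line.isBlank()) {
                continue;
            }
            String[] parts = line.split("\t", 2);
            if (parts.length != 2) {
                throw new IllegalArgumentException("expected word<TAB>score: " + line);
            }
            scores.put(parts[0].strip().toLowerCase(Locale.ROOT), Double.parseDouble(parts[1].strip()));
        }
        return new LexiconScorer(scores);
    }

    /** Re-implemented wrapper logic: sum the scores of known words, ignoring unknown ones. */
    double score(String text) {
        double total = 0.0;
        for (String token : text.toLowerCase(Locale.ROOT).split("\\P{L}+")) { // split on non-letters
            total += scores.getOrDefault(token, 0.0);
        }
        return total;
    }
}
```

Because the lexicon file is read exactly as the original library ships it, updating the data does not require touching the Java port; only changes to the wrapper logic do.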
There is no hassle-free solution. You can generally choose between two paths:
- Pure Java: Your choices are limited to porting, Lua, WebAssembly, and data wrappers. Your code remains portable and easy to build, but you pay for that with a lot of development time.
- Containerization: Instead of shipping a Java app, you ship a container. Obviously, this only works for apps, not libraries. The container provides a controlled environment where you can install non-Java dependencies.
I think the root cause of these difficulties is that Maven lacks a standard set of plugins for managing non-Java dependencies. Attempts to bundle non-Java dependencies in library JARs tend to be only half-successful and cause issues downstream. Java has the means to invoke non-Java dependencies, but it cannot declare or install them. That effectively forces Java libraries to be pure Java, while apps have to add a containerization layer that ties them to a particular containerization platform.
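Since a Maven POM cannot declare something like `python3` or `node` as a dependency, a Java library that shells out can at best detect the runtime when it is first needed and fail with a clear message. A minimal sketch of such a check, assuming a Unix-style PATH lookup (Windows would additionally need `.exe` and `PATHEXT` handling, omitted here); `ExternalRuntime` and its methods are hypothetical names:

```java
import java.io.File;
import java.nio.file.Files;
import java.nio.file.InvalidPathException;
import java.nio.file.Path;
import java.util.Optional;
import java.util.regex.Pattern;

/** Hypothetical helper: locate a non-Java runtime that the build tool could not install. */
final class ExternalRuntime {
    private ExternalRuntime() {}

    /** Returns the first executable file with the given name in a PATH-style list of directories. */
    static Optional<Path> find(String executable, String pathVariable) {
        if (pathVariable == null) {
            return Optional.empty();
        }
        for (String dir : pathVariable.split(Pattern.quote(File.pathSeparator))) {
            if (dir.isEmpty()) {
                continue; // skip empty entries rather than guessing at the current directory
            }
            try {
                Path candidate = Path.of(dir, executable);
                if (Files.isRegularFile(candidate) && Files.isExecutable(candidate)) {
                    return Optional.of(candidate);
                }
            } catch (InvalidPathException ignored) {
                // malformed PATH entry; keep looking
            }
        }
        return Optional.empty();
    }

    /** Looks the executable up on the process PATH or fails with an actionable message. */
    static Path require(String executable) {
        return find(executable, System.getenv("PATH")).orElseThrow(() -> new IllegalStateException(
                executable + " was not found on PATH; install it before using this library"));
    }
}
```

A library could call `ExternalRuntime.require("python3")` once before its first `ProcessBuilder` call. This only turns a missing dependency into a clearer error; installing the runtime remains the job of the app or its container.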