diff --git a/LICENSE b/LICENSE
new file mode 100644
index 0000000..4d80992
--- /dev/null
+++ b/LICENSE
@@ -0,0 +1 @@
+ALL RIGHTS RESERVED.
diff --git a/README.md b/README.md
new file mode 100644
index 0000000..f90cb17
--- /dev/null
+++ b/README.md
@@ -0,0 +1,3 @@
+# kempinger.at blog
+
+All rights reserved.
\ No newline at end of file
diff --git a/public_html/index.html b/public_html/index.html
index 9348054..898f916 100644
--- a/public_html/index.html
+++ b/public_html/index.html
@@ -637,15 +637,15 @@ laboratory work.
The story of how I cross-compiled a giant Rust + C++ facial recognition project using Nix
+ Read full observation →
+
+ Over the past few months, I have been deep in the trenches
+ of cross-compiling.
+
+ This story takes place within Project Digidow, a facial
+ recognition sensor written in Rust, built with Nix, and
+ deployed on a Raspberry Pi.
+
+ Everything started when the version of TensorFlow Lite in
+ nixpkgs was deprecated and stopped building.
+ The ideal solution seemed simple: just update TensorFlow
+ Lite to a newer version. Unfortunately, that turned out to
+ be far from easy, since TensorFlow Lite can be built with
+ either Bazel or CMake. The existing Nix build used Bazel,
+ which did not want to update gracefully.
+
+ I decided to switch to building TensorFlow Lite from source
+ using CMake. This worked significantly better, but still
+ required some effort. I replaced all fetchers in the CMake
+ scripts with file:// URLs and used pre-built
+ binaries from nixpkgs whenever possible.
+ However, since the CMake build system insists on compiling
+ nearly everything from source, I had to manually provide
+ most dependencies.
+
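+ To give a flavour of what such a substitution looked like,
+ here is a minimal sketch. The dependency name, URL, and
+ local paths below are purely illustrative, not the actual
+ TensorFlow Lite module names.
+
+# Redirect one dependency fetch in the CMake scripts to a local,
+# pre-downloaded archive so the sandboxed build never hits the network.
+sed -i \
+  's|https://github.com/example/somedep/archive/v1.2.3.tar.gz|file:///build/deps/somedep-1.2.3.tar.gz|' \
+  cmake/modules/somedep.cmake
+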
+ At this point, I ran into linker issues with
+ tflite-rs. The Bazel build produces a “fat”
+ .so file that includes all transitive
+ dependencies, while the CMake build does not. This meant
+ that I had to manually link all dependencies.
+
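+ A quick way to see that difference is to inspect whatever
+ library the build emits (the file name below is just a
+ placeholder):
+
+# A "fat" library resolves almost everything internally; a thin one
+# leaves many NEEDED entries and undefined symbols for the linker.
+readelf -d libtflite.so | grep NEEDED
+nm -D --undefined-only libtflite.so | head
+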
+ My eventual solution was to provide all transitive
+ dependencies in the Nix build output directory as well. As a
+ result, tflite-rs now uses Bazel for local
+ building, but it also supports a provided binary, which I
+ now build with CMake.
+
+ Unfortunately, the sensor was far too large to compile
+ directly on the Raspberry Pi. The obvious next step was
+ cross-compiling. Rust supports cross-compilation out of the
+ box, and Nix has great support for it via the
+ cross toolchain. In theory, this should have
+ been simple. In practice, nothing worked. Or rather, some
+ parts worked, others did not, and I had no idea why.
+
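+ For reference, the happy path that should have worked looks
+ roughly like this; the package and target names below are
+ the standard examples, not this project:
+
+# Cross-build a trivial nixpkgs package for aarch64 via pkgsCross
+nix-build '<nixpkgs>' -A pkgsCross.aarch64-multiplatform.hello
+
+# Rust's built-in cross-compilation, assuming a suitable aarch64 linker
+rustup target add aarch64-unknown-linux-gnu
+cargo build --target aarch64-unknown-linux-gnu
+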
+ The biggest issue was the classic “tools for host vs. tools
+ for target” problem.
+
+ The solution was to use the appropriate tools for the host
+ and target platforms via Nix (thanks,
+ pkgsCross). During build time, I injected and
+ replaced tool paths in the CMake scripts. The worst
+ offenders were Protobuf and FlatBuffers, since both rely on
+ compiler executables that need to run on the host machine,
+ yet require version-specific shared libraries that belong to
+ the target platform.
+
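+ As an illustration of the kind of override involved: the
+ variable below comes from CMake’s stock FindProtobuf module,
+ the toolchain file path is a placeholder, and the
+ FlatBuffers equivalent is analogous but project-specific.
+
+# Run the build machine's protoc while everything else (headers,
+# shared libraries) still comes from the target platform.
+cmake .. \
+  -DCMAKE_TOOLCHAIN_FILE=/path/to/aarch64-toolchain.cmake \
+  -DProtobuf_PROTOC_EXECUTABLE="$(command -v protoc)"
+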
+ After a lot of manual labor, everything finally compiled
+ successfully.
+
+ To run the binary on the Raspberry Pi, I had to transfer it
+ there along with all its transitive dependencies. Of course,
+ those dependencies also had their own dependencies. I copied
+ them all over, only to find that the binary still refused to
+ run.
+
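+ For what it’s worth, Nix can at least enumerate that pile of
+ dependencies; the ./result symlink below assumes the sensor
+ was built as a normal Nix derivation.
+
+# List the full runtime closure of the build output, i.e. every
+# store path the binary (transitively) depends on.
+nix-store --query --requisites ./result
+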
+ Nix uses absolute library paths starting with
+ /nix/store, which normally ensures version
+ consistency and isolation. However, that system falls apart
+ when you move binaries to another machine.
+
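+ Those baked-in store paths are visible directly in the ELF
+ metadata of the binary:
+
+# Show the libraries the binary asks for and where the dynamic
+# linker is told to search; on a Nix build the search path points
+# into /nix/store.
+patchelf --print-needed ./binary
+patchelf --print-rpath ./binary
+readelf -d ./binary | grep -E 'NEEDED|RUNPATH'
+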
+ I tried statically linking everything into one binary to
+ solve this problem, but that did not work either: not all of
+ the dependencies support static linking yet.
+
+ So, I went back to manually editing the library paths in the
+ binary. At this point, patchelf became my new
+ best friend:
+
+patchelf --remove-needed libfoo.so --remove-needed libbar.so ./binary
+patchelf --add-needed libfoo.so --add-needed libbar.so ./binary
+
+ Later, I realized that I could have simply used
+ patchelf --set-rpath $ORIGIN/../lib
+ instead of spending hours on manual path edits.
+
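+ For completeness, here is what that shortcut looks like;
+ note that $ORIGIN has to be single-quoted so the shell does
+ not expand it before patchelf sees it.
+
+# Make the binary search for its libraries relative to its own
+# location instead of absolute /nix/store paths.
+patchelf --set-rpath '$ORIGIN/../lib' ./binary
+patchelf --print-rpath ./binary   # verify the new search path
+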
+ After fixing the paths, I could finally execute the binary + using: +
+LD_LIBRARY_PATH=/path/to/libs:$LD_LIBRARY_PATH ./binary
+
+ The binary started, printed some initial logs, and then
+ failed when attempting to import an OpenCV model. The error
+ message indicated that the model file could not be parsed.
+ This was confusing because the exact same model worked
+ before, and it still worked perfectly on x86_64 — just not
+ on aarch64.
+
+ Running the sensor under GDB confirmed that all the correct
+ libraries were being called. The relevant call chain looked
+ like this:
+
+Rust code → rust-opencv → OpenCV → Protobuf.
+
+ My first theory was that the protoc compiler
+ had generated code for the wrong architecture or endianness.
+
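+ That theory is quick to check against the ELF headers of the
+ artifacts involved, for example the binary itself:
+
+# Both should report AArch64 and little-endian data encoding.
+file ./binary
+readelf -h ./binary | grep -E 'Class|Data|Machine'
+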
+ The conclusion was clear: the import code itself worked. It
+ just didn’t work inside my main binary.
+
+ I switched from a bottom-up approach to a top-down one and
+ started reducing the binary until I found the smallest
+ version that still failed. The culprit quickly emerged:
+ simply importing tflite with
+ use tflite::Tflite was enough to break OpenCV.
+
+ Remember how Nix uses absolute library paths? And how I
+ mentioned that CMake wanted to build everything from source?
+ To get TensorFlow Lite building, I had given it the version
+ of Protobuf it wanted, and I also exported all its
+ transitive dependencies — including that version of
+ Protobuf.
+
+ This led to a situation where OpenCV loaded the wrong
+ Protobuf library. It wasn’t different enough to cause a
+ linker error, but it was just different enough to break
+ model parsing in subtle ways.
+
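+ The dynamic loader can show exactly which copy wins in a
+ situation like this:
+
+# Trace library resolution at startup and watch which libprotobuf
+# actually gets loaded into the process.
+LD_DEBUG=libs ./binary 2>&1 | grep -i protobuf
+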
+ The fix turned out to be surprisingly simple. I just needed
+ to make TensorFlow Lite use the same version of Protobuf
+ that was packaged with nixpkgs.
+
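+ A good sanity check afterwards is that the runtime closure
+ now contains exactly one Protobuf (again assuming a ./result
+ symlink from the Nix build):
+
+# Exactly one protobuf store path should remain in the closure.
+nix-store --query --requisites ./result | grep -i protobuf
+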
+ After months of debugging, rebuilding, and manually patching
+ binaries, everything finally works again. Until the next
+ nixpkgs update, of course.
+