How to run OpenCV on STM32 MCU

In this article I’ll describe how I got an OpenCV application running on the STM32F746G-Discovery and STM32F769I-Discovery boards. If you just want brief instructions to reproduce it, head to the corresponding wiki page on the Embox GitHub.

I’m one of the Embox developers. This RTOS lets you run some “heavy” Linux software (Qt, OpenGL, PJSIP, etc.) without the Linux kernel, so you need far fewer resources and get more direct control over peripherals. Running OpenCV on MCUs seems to be a fairly popular request, but it seems that nobody has done it yet: there are some videos with names like “OpenCV + STM32”, but as far as I can see they use the STM32 board as a camera while the actual image processing is done on a desktop. So I decided to port OpenCV to the STM32F7-Discovery board.

What’s the problem?

OpenCV has two major issues when you try to run it on an MCU:

Porting OpenCV to Embox

When you port something to a new platform, it’s a good idea to first build it from source in the normal way, i.e. compile it for a GNU/Linux system. With OpenCV that’s not a problem: the source code is available on GitHub, and it’s easy to build with cmake.

Good news: OpenCV can be statically linked, so it will be much easier to port to an MCU. Let’s build it with the default config and find out how much code it produces:

> size lib/*so --totals
text     data   bss    dec      hex     filename
1945822  15431  960    1962213  1df0e5  lib/libopencv_calib3d.so
17081885 170312 25640  17277837 107a38d lib/libopencv_core.so
10928229 137640 20192  11086061 a928ed  lib/libopencv_dnn.so
842311   25680  1968   869959   d4647   lib/libopencv_features2d.so
423660   8552   184    432396   6990c   lib/libopencv_flann.so
8034733  54872  1416   8091021  7b758d  lib/libopencv_gapi.so
90741    3452   304    94497    17121   lib/libopencv_highgui.so
6338414  53152  968    6392534  618ad6  lib/libopencv_imgcodecs.so
21323564 155912 652056 22131532 151b34c lib/libopencv_imgproc.so
724323   12176  376    736875   b3e6b   lib/libopencv_ml.so
429036   6864   464    436364   6a88c   lib/libopencv_objdetect.so
6866973  50176  1064   6918213  699045  lib/libopencv_photo.so
698531   13640  160    712331   ade8b   lib/libopencv_stitching.so
466295   6688   168    473151   7383f   lib/libopencv_video.so
315858   6972   11576  334406   51a46   lib/libopencv_videoio.so
76510375 721519 717496 77949390 4a569ce (TOTALS)

As the last line shows, the .data and .bss sections take less than 1 MiB each, while the code section is about 73 MiB. With static linking a particular application will pull in far less than that, but it’s still too much.

Now let’s try to make a minimal build. Run cmake .. -LA to list the available options and turn off as many of them as possible:

-DBUILD_opencv_java_bindings_generator=OFF \
   -DBUILD_opencv_stitching=OFF \
   -DWITH_PROTOBUF=OFF \
   -DWITH_PTHREADS_PF=OFF \
   -DWITH_QUIRC=OFF \
   -DWITH_TIFF=OFF \
   -DWITH_V4L=OFF \
   -DWITH_VTK=OFF \
   -DWITH_WEBP=OFF \
   <...>

Section sizes:

> size lib/libopencv_core.a --totals
text data bss dec hex filename
3317069 36425 17987 3371481 3371d9 (TOTALS)

Run OpenCV in QEMU

It’s a good idea to start with an emulator rather than actual hardware, so let’s use QEMU to run OpenCV on an emulated Integrator/CP board (it’s just an arbitrary ARM board that has video support in QEMU).

A minimal working example that uses OpenCV looks like this:

version.cpp:

#include <stdio.h>
#include <opencv2/core/utility.hpp>

int main() {
 printf("OpenCV: %s", cv::getBuildInformation().c_str());
 return 0;
}

This program prints some OpenCV info:

root@embox:/#opencv_version
OpenCV:
General configuration for OpenCV 4.0.1 =====================================
Version control: bd6927bdf-dirty

Platform:
Timestamp: 2019-06-21T10:02:18Z
Host: Linux 5.1.7-arch1-1-ARCH x86_64
Target: Generic arm-unknown-none
CMake: 3.14.5
CMake generator: Unix Makefiles
CMake build tool: /usr/bin/make
Configuration: Debug

CPU/HW features:
Baseline:
requested: DETECT
disabled: VFPV3 NEON

C/C++:
Built as dynamic libs?: NO
< other build info follows >

The next step is to run a basic example of actual image processing. There are several examples on the official website; I chose the Canny edge detector.

OpenCV supports Qt, GTK and Windows APIs for display, which are too heavy to run on MCUs, so I had to rewrite parts of this example to draw directly to the frame buffer. After some tinkering with OpenCV’s internal image formats, I got the following results:

Original image

Result of edge detection
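The direct drawing mentioned above can be sketched roughly as follows. This is only an illustration, not the code used in the port: the function name draw_gray_to_fb is hypothetical, the frame buffer is assumed to be 32-bit ARGB8888, and the input is assumed to be an 8-bit single-channel image such as the edge map produced by Canny (the raw pointer and row stride correspond to cv::Mat::data and cv::Mat::step).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Copy an 8-bit single-channel image into a 32-bit frame buffer.
//   src        - first pixel of the image (e.g. cv::Mat::data of a CV_8UC1 Mat)
//   src_stride - bytes per image row (e.g. cv::Mat::step)
//   fb         - first pixel of the frame buffer (assumption: ARGB8888)
//   fb_width, fb_height - frame buffer size in pixels
void draw_gray_to_fb(const std::uint8_t *src, int width, int height,
                     std::size_t src_stride,
                     std::uint32_t *fb, int fb_width, int fb_height) {
    assert(src != nullptr && fb != nullptr);

    // Clip to the visible area so a large image cannot overrun the buffer.
    const int w = width < fb_width ? width : fb_width;
    const int h = height < fb_height ? height : fb_height;

    for (int y = 0; y < h; y++) {
        const std::uint8_t *row = src + static_cast<std::size_t>(y) * src_stride;
        std::uint32_t *out = fb + static_cast<std::size_t>(y) * fb_width;
        for (int x = 0; x < w; x++) {
            const std::uint32_t v = row[x];
            // Replicate the gray level into R, G and B; alpha is fully opaque.
            out[x] = 0xFF000000u | (v << 16) | (v << 8) | v;
        }
    }
}
```

With a real cv::Mat called edges, the call would look like draw_gray_to_fb(edges.data, edges.cols, edges.rows, edges.step, fb, 480, 272), where fb is whatever pointer the platform gives you for the mapped frame buffer.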

Run OpenCV on STM32F7Discovery

The 32F746GDISCOVERY board has the following memory resources:

  1. 1 MiB of flash memory
  2. 340 KiB of RAM
  3. 128-Mbit Quad-SPI Flash memory
  4. 128-Mbit SDRAM (64 Mbits accessible)
  5. Connector for microSD card

A microSD card can be used to store images, but it doesn’t help with the large code section.

The display resolution is 480x272, so a frame buffer with 32-bit color takes 522,240 bytes, which doesn’t fit into RAM. However, it’s possible to use SDRAM for the heap and the frame buffer; the rest of RAM is used for other OS needs (stacks, process resources, etc.).
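The frame buffer arithmetic can be checked at compile time. This is a small sketch with made-up names (framebuffer_bytes, kInternalRam and so on); the sizes are the ones listed for the board above, and the pixel format is assumed to use a whole number of bytes per pixel.

```cpp
#include <cstddef>

// Bytes needed for a width x height frame buffer.
// Assumption: bits_per_pixel is a multiple of 8.
constexpr std::size_t framebuffer_bytes(std::size_t width, std::size_t height,
                                        std::size_t bits_per_pixel) {
    return width * height * (bits_per_pixel / 8);
}

constexpr std::size_t kInternalRam = 340 * 1024;            // 340 KiB on-chip RAM
constexpr std::size_t kSdram       = 64 * 1024 * 1024 / 8;  // 64 Mbit accessible
constexpr std::size_t kFbSize      = framebuffer_bytes(480, 272, 32);

static_assert(kFbSize == 522240, "480 * 272 * 4 bytes");
static_assert(kFbSize > kInternalRam, "frame buffer does not fit on-chip RAM");
static_assert(kFbSize < kSdram, "frame buffer fits SDRAM");
```

Even a 16-bit pixel format would need 261,120 bytes, which leaves little of the on-chip RAM for anything else.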

A minimal Embox config with OpenCV has the following section sizes:

text data bss dec hex filename
2876890 459208 312736 3648834 37ad42 build/base/bin/embox

A brief digression on sections: .text and .rodata contain instructions and constants (i.e. read-only data), .data contains mutable data, and .bss contains zero-initialized variables, which are not stored in the kernel image but still occupy memory at run time.

.data and .bss are OK, they will certainly fit into RAM/SDRAM, but .text is too large: code is placed in flash memory, which is only 1 MiB.

The only way to handle such a large code section is to use the QSPI flash memory. It has a memory-mapped mode that allows read-only access via the system bus, so it’s possible to place code there. However, there are a few problems with it:
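To make the budget explicit, here is a compile-time check of the section sizes from the size output above against the board’s memories. It is only a sketch: the names are made up, and it assumes that the image stored in flash holds .text plus the initial values of .data, and that .data and .bss live in SDRAM at run time.

```cpp
#include <cstddef>

constexpr std::size_t kMiB = 1024 * 1024;

// Section sizes reported by `size` for build/base/bin/embox.
constexpr std::size_t kText = 2876890;
constexpr std::size_t kData = 459208;
constexpr std::size_t kBss  = 312736;

// Memories of the 32F746GDISCOVERY board.
constexpr std::size_t kInternalFlash = 1 * kMiB;      // on-chip flash
constexpr std::size_t kQspiFlash     = 128 * kMiB / 8; // 128 Mbit = 16 MiB
constexpr std::size_t kSdram         = 64 * kMiB / 8;  // 64 Mbit accessible = 8 MiB

constexpr bool fits(std::size_t need, std::size_t have) { return need <= have; }

static_assert(!fits(kText, kInternalFlash), ".text (~2.7 MiB) exceeds 1 MiB flash");
static_assert(fits(kText + kData, kQspiFlash), "image fits QSPI flash");
static_assert(fits(kData + kBss, kSdram), ".data and .bss fit SDRAM");
```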

  1. QSPI is not accessible right after a reset, i.e. it’s necessary to perform some software initialization before executing code from this memory.
  2. You can’t flash it with OpenOCD and GDB as usual.

Eventually I decided to write a small bootloader that receives the image from the host computer via TFTP and writes it to QSPI flash using STM32Cube functions.
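The core loop of such a bootloader can be sketched as below. This is not the actual bootloader: FlashOps and ImageWriter are hypothetical names, the erase and write callbacks stand in for whatever STM32Cube QSPI routines are used on the board, and the erase block size is a parameter because it depends on the flash chip. The only protocol fact relied on is that TFTP delivers data blocks in order, 512 bytes each by default.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <utility>

// Hypothetical flash back end. On real hardware these callbacks would wrap
// the vendor QSPI erase and program routines.
struct FlashOps {
    std::size_t erase_block_size;  // assumption: erase works on fixed-size blocks
    std::function<bool(std::size_t addr)> erase_block;
    std::function<bool(std::size_t addr, const std::uint8_t *buf,
                       std::size_t len)> write;
};

// Writes sequentially received data blocks to flash at a running offset,
// erasing each flash block once, just before it is first written.
class ImageWriter {
public:
    explicit ImageWriter(FlashOps ops) : ops_(std::move(ops)) {}

    // Append one received block (e.g. the payload of a TFTP DATA packet).
    bool append(const std::uint8_t *buf, std::size_t len) {
        // Erase every flash block the new data touches that is not erased yet.
        while (erased_end_ < offset_ + len) {
            if (!ops_.erase_block(erased_end_)) return false;
            erased_end_ += ops_.erase_block_size;
        }
        if (!ops_.write(offset_, buf, len)) return false;
        offset_ += len;
        return true;
    }

    std::size_t bytes_written() const { return offset_; }

private:
    FlashOps ops_;
    std::size_t offset_ = 0;      // next flash offset to write
    std::size_t erased_end_ = 0;  // flash is erased in [0, erased_end_)
};
```

Erasing lazily keeps the writer independent of the image size, which TFTP does not announce up front unless the optional transfer size extension is used.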

Results

Finally, it works! However, it is slow: it takes 40 seconds to process and draw the image.

Then I tried to run the same program on the STM32F769I-Discovery board with a slightly different config (it uses different pins for UART and the like). This board has 2 MiB of flash memory, so with -O2 it works fine without the QSPI trick and processes the same image in 3 seconds.

I hope this article helps you run your own OpenCV-based projects. Feel free to create issues in the Embox repository or to email me if you need help.

