Matlab-like math library on top of CMSIS -> Thoughts?

valentin · ‎2016-12-11

Posted on December 11, 2016 at 10:16

Hi,

I recently started a hobby project which eventually ends up needing an unscented kalman filter. So I'm now facing the challenge to port my matlab code to the STM I know there is embedded target code generator etc but I want to build it myself to also learn something in the process and build my own C math toolbox.

After some research I found that the CMSIS-DSP library is quite powerful, handles matrizes and seems optimized for CortexM cpus. It's even able to make use of the M4 DSP instructions and FPU.

Therefore, I decided to use it as a foundation for my project.Alternatives like Eigen, Armadillo or gsl seemed 1. overkill and 2. difficult to implement on an embedde platform (let alone the potential lack of FPU/DSP instructions use and the bigger code size).

But some needed functionality in CMSIS is still missing, like diag(), ones(), chol() etc. So I set off to build my own matlab.c/.h library to act on top of CMSIS. The idea is to implement functions with the same name as the ones in matlab and to use CMSIS data types + functions and standard C style in the best (speed-wise and code-readability-wise) way possible.

I'd like to get an opinion about my approach:

- Am I re-inventing the wheel in vein? Should I rather stick to an established library?

- Functions like ones() pose the question about malloc. It would be way more straightofrward if I malloced a matrix instance in ones() and returned it instead of relying on working with pre-allocated memory. But I somewhat fear the additional memory management involved. What do you guys think?

- Any other thoughts? Should I maybe put this on github? Would someone else be interested in collaborating and adding own matlab functions to this?

Here is a snippet of the first code I've written today (actually the complete file so far) so you can get an idea of how it all works:

/*
 * matlab.c
 *
 * Created on: 2016
 * Author: nose
 *
 * Tool library to add matlab-like functionality on top of CMSIS math library.
 * The goal is to add all matlab functions over time with C-Style parameters.
 *
 * Requirements:
 * ARM Cortex-M CPU (M4/7 for SIMD instructions) and CMSIS-DSP library
 *
 * As this library uses 32bit floats, a FPU is highly recommended (M4/M7).
 * Maybe I'll do another one with Q31/Q15 fixed point math, too.
 * No 64bit math in here (no doubles) for speed reasons.
 *
 * Note: All arm_matrix_instance_f32 variables used as parameters MUST be initialized first.
 * Each function assumes allocated memory for the data array. NO dynamic memory allocation.
 */
/*
 * undefine MAT_EXTRA_CHECKS for speedup
 */
#define MAT_EXTRA_CHECKS
#include 'matlab.h'
/**
 * Fills the given Matrix with ones.
 * @param pMatrix The matrix to fill. Needs numRows and numCols set to final values first!
 * @return ARM_MATH_SUCCESS or ARM_MATH_SIZE_MISMATCH
 */
arm_status mat_ones(arm_matrix_instance_f32 *pMatrix) {
#ifdef MAT_EXTRA_CHECKS
 if (pMatrix->numCols < 1 || pMatrix->numRows < 1)
 return ARM_MATH_SIZE_MISMATCH;
 // pDataVector not initialized?
 if (&(pMatrix->pData) == NULL)
 return ARM_MATH_ARGUMENT_ERROR;
#endif
 for (int i = 0; i < pMatrix->numCols * pMatrix->numRows; i++) {
 pMatrix->pData[i] = 1.0f;
 }
 return ARM_MATH_SUCCESS;
}
/**
 * Takes an input vector (matrix with numCols OR numRows == 1) and creates a diagonal matrix
 * with pVecIn along the diagonal.
 * @param pVecIn Input Vector (length N)
 * @param pMatOut Output Matrix (has to be size NxN)
 * @return ARM_MATH_SUCCESS if OK, ARM_MATH_SIZE_MISMATCH if error
 */
arm_status mat_diag(arm_matrix_instance_f32 *pVecIn, arm_matrix_instance_f32 *pMatResult) {
#ifdef MAT_EXTRA_CHECKS
 // pDataVector not initialized?
 if (&(pVecIn->pData) == NULL)
 return ARM_MATH_ARGUMENT_ERROR;
 if (&(pMatResult->pData) == NULL)
 return ARM_MATH_ARGUMENT_ERROR;
 // only accept vectors!
 if (pVecIn->numCols != 1 && pVecIn->numRows != 1) {
 return ARM_MATH_SIZE_MISMATCH;
 }
 // result matrix size checks
 if (pMatResult->numCols != pMatResult->numRows)
 return ARM_MATH_SIZE_MISMATCH;
#endif
 uint16_t size = pVecIn->numCols > pVecIn->numRows ? pVecIn->numCols : pVecIn->numRows;
#ifdef MAT_EXTRA_CHECKS
 // result matrix size checks
 if (pMatResult->numCols != size)
 return ARM_MATH_SIZE_MISMATCH;
#endif
 memset(pMatResult->pData, 0, size * size);
 for (int i = 0; i < size; i++) {
 pMatResult->pData[i + i * size] = pVecIn->pData[i];
 }
 return ARM_MATH_SUCCESS;
}
/*
 #include <stdio.h>
 #include <stdlib.h>
 #include <math.h>
 double *cholesky(double *A, int n) {
 double *L = (double*)calloc(n * n, sizeof(double));
 if (L == NULL)
 exit(EXIT_FAILURE);
 for (int i = 0; i < n; i++)
 for (int j = 0; j < (i+1); j++) {
 double s = 0;
 for (int k = 0; k < j; k++)
 s += L[i * n + k] * L[j * n + k];
 L[i * n + j] = (i == j) ?
 sqrt(A[i * n + i] - s) :
 (1.0 / L[j * n + j] * (A[i * n + j] - s));
 }
 return L;
 }
 */
/**
 * Cholesky separation of pMatIn.
 * MUST initialize pMatResult before passing to this.
 * @param pMatIn NxN input matrix
 * @param pMatResult NxN output matrix
 * @return ARM_MATH_ARGUMENT_ERROR if matrizes uninitialized
 * ARM_MATH_SIZE_MISMATCH on matrix size mismatches
 * ARM_MATH_SUCCESS on success
 */
arm_status mat_chol(arm_matrix_instance_f32 *pMatIn, arm_matrix_instance_f32 *pMatResult) {
#ifdef MAT_EXTRA_CHECKS
 // pDataVector not initialized?
 if (&(pMatIn->pData) == NULL)
 return ARM_MATH_ARGUMENT_ERROR;
 if (&(pMatResult->pData) == NULL)
 return ARM_MATH_ARGUMENT_ERROR;
 if (pMatIn->numCols != pMatIn->numRows)
 return ARM_MATH_SIZE_MISMATCH;
 if (pMatResult->numCols != pMatIn->numCols)
 return ARM_MATH_SIZE_MISMATCH;
 if (pMatResult->numRows != pMatIn->numRows)
 return ARM_MATH_SIZE_MISMATCH;
#endif
 for (int i = 0; i < pMatIn->numCols; i++)
 for (int j = 0; j < (i + 1); j++) {
 float32_t s = 0;
 for (int k = 0; k < j; k++) {
 s += pMatResult->pData[i * pMatIn->numCols + k]
 * pMatResult->pData[j * pMatIn->numCols + k];
 }
 if (i == j) {
 arm_sqrt_f32(pMatIn->pData[i * pMatIn->numCols + i] - s,
 pMatResult->pData[i * pMatIn->numCols + j]);
// pMatResult->pData[i * pMatIn->numCols + j] = sqrt(pMatIn->pData[i * pMatIn->numCols + i] - s);
 } else {
 pMatResult->pData[i * pMatIn->numCols + j] = (1.0
 / pMatResult->pData[j * pMatIn->numCols + j]
 * (pMatIn->pData[i * pMatIn->numCols + j] - s));
 }
 }
 return ARM_MATH_SUCCESS;
}
�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?
/*
 * matlab.h
 *
 * Created on: 2016
 * Author: nose
 *
 * Tool library to add matlab-like functionality on top of CMSIS math library.
 * The goal is to add all matlab functions over time with C-Style parameters.
 *
 * Requirements:
 * ARM Cortex-M cpu and CMSIS-DSP library
 */
#ifndef MATLAB_H_
#define MATLAB_H_
#define __FPU_PRESENT 1U
#include 'arm_math.h'
#include 'stdint.h'

arm_status mat_ones(arm_matrix_instance_f32 *pMatrix);
arm_status mat_diag(arm_matrix_instance_f32 *pVecIn, arm_matrix_instance_f32 *pMatResult);
arm_status mat_chol(arm_matrix_instance_f32 *pMatIn, arm_matrix_instance_f32 *pMatResult);
#endif /* MATLAB_H_ */
�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?�?

(or would it be better to make the whole thing a C++ project? But then I could also use Eigen, I suppose ... did anybody get Eigen running on STM32?)

Franzi.Edo · ‎2017-01-05

Posted on January 05, 2017 at 14:30

Hi,

You should check the uKOS (

http://www.ukos.ch

) project

http://www.ukos.ch/sysquake-embedded-2/nn-separator.html

In the RTOS it is included an LME interpreter (Matlab like language) that allow you to mix C and LME code.

The project is open source and support the stm cpus (M3, M4, M7).

Cheers

Edo

AvaTar · ‎2017-01-05

Posted on January 05, 2017 at 17:46

Besides of performance and code space restraints, keep the limited accuracy of float (32-bit) in mind.

Iterative algorithms tend to falter quickly with single precision.

An integer-based algorithm often serves you better.

At least verify you code and error margins on a PC.

valentin · ‎2017-01-08

Posted on January 09, 2017 at 01:38

Thanks

Franzi.Edo

‌, I'll check that link. Looks like some on-top scripting interpreter? I wonder how it behaves performance-wise?

meyer.frank

‌: Yes, I'd prefer double. I will do some tests beforehand. Thanks for the heads-up. A general-purpose unscented kalman filter implementation would be quite difficult with integers-only, though. The model and sensor variances need floating point numbers, for example. Anyhow, I'll report back once I have some implementation running.

Right now it looks to me as if doing the whole thing in C++ using the Eigen library would be the easiest way to do it. Saves me the reinventing of the wheel...

AvaTar · ‎2017-01-09

Posted on January 09, 2017 at 20:07

'Double' is much better for precision and stability, less so for performance. The M4 FPU only supports only single precision, but I think that's no news for you.

Right now it looks to me as if doing the whole thing in C++ using the Eigen library would be the easiest way to do it.

While it is stated to depend only on the C++ standard libraries, that is no safeguard against portability issues.

In the end, it's a codesize and performance issue. If it fits for your application, it's fine.

S.Ma · ‎2017-01-09

Posted on January 10, 2017 at 05:16

For data crunching, there is STM32F769 with 64 bit double precision handling by the core: Could be a good candidate for experimenting with matlab from the higher end side.

http://www.st.com/en/microcontrollers/stm32f7x9.html?querycriteria=productId=LN1999

AvaTar · ‎2017-01-10

Posted on January 10, 2017 at 10:20

A board/platform Cortex A (e.g. a Cortex A5) or similar (MIPS, etc.) running Linux would be an alternative.

That would provide a much 'richer' development environment, with cost being competitive with a M7 solution.

Renaud BOUZEREAU · ‎2017-01-11

Posted on January 11, 2017 at 09:41

Cortex-A has clear benefits but would come with integration, power concumption and real time trade-off.

AvaTar · ‎2017-01-11

Posted on January 11, 2017 at 10:32

This does not come unexpected, since ST has no Cortex A portfolio ;)

But power consumption of typical Cortex A5 controllers is even below that of M7, and boards are on about the same level.

Besides of the much more comfortable development environment (feature and example-wise), such a Cortex A board usually comes with a Linux and all drivers in source code. And there are several real-time variants of Linux, and other real-time OSs with full POSIX - compatible API.

This is where Cortex M and Cortex A begin to intersect, and compete for customers. At the end, it depends mostly on personal preferences of the developer(s) and project manager(s).

Renaud BOUZEREAU · ‎2017-01-12

Posted on January 12, 2017 at 09:44

For sure, personnal preferences and skills are major decision criteria. An MCU cannot replace MPUs in many application for pure performance reasons. However, the MCUs are pushing the limits up. As for power consumption, while the STM32F7 shows comparable power figures in Run mode vs an A5 processor only, one should not forget to add the DDR power to the A5. The recent H7 series just cuts half of the dynamic power vs the F7. Then, A5+DDR should be less integrated than one single STM32F7, unless a POP approach is offered for the MPU. As for the ecosystem, Linux is just incomparable. However, the efforts currently done to enrich the MCU ecosystem are just impressive. An Alexa demo running on the STM32F7 Discovery kit was presented at the CES is generating a lot of interest as it provides a solution MCU developers willing to leverage Alexa technology. Again, personal preferences and skills will be king in the end decision for developers.