Section E) Run-time systems

This section describes the usage and design of:
  • PTCORE : A run-time system for pthread platforms (OSX/Linux, etc.)
  • WINCORE : A run-time system for Win32 platforms (WinXP/Win7/Win8, etc.)

E.1) PTCORE Usage, and options.

The PTCORE run-time uses the pthread library for execution on POSIX-based platforms, and has been adapted to and tested under OSX 10.9.4 and Ubuntu (Debian-based) Linux 12.04/14.04. It is likely to compile and run under similar distributions. Section E.2 further discusses the requirements imposed by the design of PTCORE.

PTCORE is distributed either as a tarball or from the archive, both having the same basic directory structure. The given structure can be altered at will and is used only for compilation (i.e., it has no meaning at run-time).

The structure is as follows:
Application/autogen.c  (the auto-generated C file is assumed to be here)
RTFM-RT/RTFM-PT.c      (the run-time implementation)
RTFM-RT/RTFM-PT.h      (the basic run-time API)
RTFM-RT/RTFM-RT.h      (the real-time extension to the run-time API)
RTFM-SRC/Ex1.core...   (a set of examples)

An executable is built by simply running
gcc RTFM-PT.c -lpthread
(RTFM-PT.c includes the necessary headers and autogen.c)

The PTCORE run-time system currently supports the following options, enabled at compile time with -D (options can be combined arbitrarily):

TRACE       to trace the interaction between application and run-time system
TRACE_OS    to trace the interaction between run-time system and OS
TRACE_TIME  to trace the current time for other traces
            (has no effect if both TRACE and TRACE_OS are disabled)
WARN_DL     run-time verification, reporting deadline over-run
ABORT_DL    run-time verification, aborting on deadline over-run

Tracing options provide detailed information on the behaviour of the -core program as such, and can also be used to get a deeper understanding of the run-time system design and implementation.
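As an illustration of how such a compile-time option can work, the sketch below shows one way a trace macro could be switched on with -DTRACE. DPT is the name used in the RTFM_pend listing in E.2b; the macro body and the trace_line helper are assumptions made for this example, not the PTCORE source.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical helper: format one trace line into buf, return its length. */
int trace_line(char *buf, size_t n, const char *fmt, ...) {
    va_list ap;
    va_start(ap, fmt);
    int len = vsnprintf(buf, n, fmt, ap);
    va_end(ap);
    return len;
}

#ifdef TRACE
/* With -DTRACE every DPT(...) prints one line to stderr. */
#define DPT(...) do { char b_[128];                              \
                      trace_line(b_, sizeof b_, __VA_ARGS__);    \
                      fprintf(stderr, "%s\n", b_); } while (0)
#else
/* Without -DTRACE the call compiles away to nothing. */
#define DPT(...) do { } while (0)
#endif
```

Because the macro expands to an empty statement when TRACE is not defined, trace calls can stay in the code without any run-time cost in a non-tracing build.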

The WARN_DL/ABORT_DL options give feedback on the timing properties of the -core program when run under best effort by the PTCORE run-time system.
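A deadline check of this kind could be sketched as follows. The type time_us and the functions now_us, missed_deadline and check_deadline are hypothetical names chosen for the example, and the messages are illustrative; only the option names WARN_DL and ABORT_DL come from PTCORE.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L /* for clock_gettime under strict -std=c99 */
#endif
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

typedef int64_t time_us; /* hypothetical stand-in for RTFM_time, in microseconds */

/* Current time from a monotonic clock, in microseconds. */
time_us now_us(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (time_us)ts.tv_sec * 1000000 + ts.tv_nsec / 1000;
}

/* Non-zero when the task finished after its absolute deadline. */
int missed_deadline(time_us finished, time_us dead_line) {
    return finished > dead_line;
}

/* React to an over-run according to the compile-time option in effect. */
void check_deadline(int id, time_us finished, time_us dead_line) {
    if (!missed_deadline(finished, dead_line))
        return;
#if defined(ABORT_DL)
    fprintf(stderr, "task %d missed its deadline, aborting\n", id);
    exit(EXIT_FAILURE);
#elif defined(WARN_DL)
    fprintf(stderr, "task %d missed its deadline\n", id);
#else
    (void)id; /* no option given: the over-run passes silently */
#endif
}
```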

It is possible to link with (any) other C code or library, but -core programs are compiled monolithically. However, -core programs may include other -core files as shown in the examples (section A).

Monolithic compilation facilitates code generation in the -core compiler (side-stepping the need to define an object-code format for -core). Moreover, it allows the C compiler (backend) to optimise -core code as a whole (circumventing the need for link time optimisation, at least for the -core parts).

For now, PTCORE has only been tested under gcc, but both CompCert C (ccomp) and LLVM (clang) compilation should be feasible with minor modifications to the code-generation and run-time system implementation.

E.2) PTCORE design.

PTCORE has been designed to fulfil the following purposes:
  • To provide an easy entry to -core programming. The compilation cycle from -core program to OSX/Linux executable is typically < 1s. This allows the programmer to experiment with the language without being concerned with the low-level details of an embedded platform. (Build cycles for embedded targets are typically more complex: downloading to/flashing the target, observing its standard output, often over multiple connections/terminals, etc.)
  • To provide an alternative to traditional thread-based libraries for concurrent programming in C. Using -core/PTCORE the programmer needs no knowledge of threads, and can focus directly on the problem to be solved. Moreover, the -core compiler gives feedback on potential deadlocks, derived directly from analysing the input program. A program that passes without warning will be free of deadlocks during execution by PTCORE. In comparison, thread libraries (pthreads, boost, etc.) leave deadlocks in the hands of the programmer. If the programmer cannot guarantee the program/model to be free of deadlocks, this has to be handled gracefully by the programmer (by implementing fallback code managing each potential deadlock).
  • As a basis for researching multi-core scheduling. Already today PTCORE provides a unique solution to multi-threaded programming. All tasks are instantiated at compile time. This allows all costly memory management (buffer allocation), thread creation, etc. to be performed before executing Reset. Hence a system that is able to start will never fail due to depletion of resources. (During run-time PTCORE requires 0 bytes of additional RAM!)

In the following we will outline the basic design of the PTCORE run-time.

E.2a) Outset from -core and the rtfm-core compiler

  1. Tasks equal threads: A -core program is fully specialised w.r.t. task instances and functions. This allows for a 1-1 mapping between threads and tasks in the system. Each task instance is associated with a (relative) deadline. The -core compiler assigns logical priorities to the task instances according to their deadlines (shorter deadline, higher priority).
  2. Resources equal (named) critical sections: For a -core program, the resource ceilings are statically computed, i.e., for each critical section S, the ceiling of S is equal to the highest priority of all tasks that access S.
  3. Under the assumption that tasks meet their deadlines (the system is schedulable) and that the inter-arrival time of each task is larger than its deadline, each message requires only a single buffer element. All buffers are generated by the -core compiler, and allocated as static storage by the C linker.
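Points 1 and 2 can be illustrated with a small model. This is a sketch only: the function names, the array layout and the handling of equal deadlines are assumptions for the example, not the algorithm used inside the -core compiler.

```c
#include <stddef.h>

/* Hypothetical sizes for a small example system. */
#define N_TASKS 3
#define N_RES   2

/* Assign logical priorities starting at 1: the shorter the deadline, the
   higher the priority. Tasks with equal deadlines get equal priority here
   (an assumption of this sketch). */
void assign_priorities(const long deadline[], int prio[], size_t n) {
    for (size_t i = 0; i < n; i++) {
        prio[i] = 1;
        for (size_t j = 0; j < n; j++)
            if (deadline[j] > deadline[i])
                prio[i]++; /* every task with a longer deadline ranks below i */
    }
}

/* Ceiling of resource r = highest priority among the tasks that access r.
   uses[t][r] is non-zero when task t enters critical section r. */
int ceiling(const int prio[], const int uses[][N_RES], size_t n_tasks, size_t r) {
    int c = 0; /* 0 means that no task uses the resource */
    for (size_t t = 0; t < n_tasks; t++)
        if (uses[t][r] && prio[t] > c)
            c = prio[t];
    return c;
}
```

For deadlines {100, 20, 50} the tasks get priorities {1, 3, 2}; a resource shared by the first and third task then has ceiling 2.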

E.2b) PTCORE implementation

The outsets for the run-time system implementation given in E.2a) allow for a minimalistic and efficient design of PTCORE.
  • Each task instance amounts to a thread created with according (static) priority.
PTCORE deploys pthread FIFO based scheduling. This allows higher priority tasks to execute preemptively to lower priority tasks and works over multiple CPUs. This ensures that the (non-blocked) threads/tasks with highest priority in the system are executed. (FIFO scheduling is non-time sliced, and is the closest to real-time you get under pthreads).
  • Each resource/critical section amounts to a mutex. Priority inversion is the case when
    • a low priority task A holds a resource preventing a
    • high priority task C from executing,
    • while a task B with prio A<B<C executes in favour of A.
Pthreads provides among the jungle of primitives and options a means to enforce ceiling protocols on the mutexes. In PTCORE we use this feature to assign a static ceiling to each mutex (resource/critical section).
Hence, as soon as a task enters a critical section, it will run at the priority of the resource ceiling, thus the problem of priority inversion is avoided.
  • Exploiting the assumption that the system is schedulable (E.2a-3), the message passing can be done without incurring any additional locks (mutexes) on the message buffers.
We perform run-time verification: ABORT_DL discards an invalid system during execution, while WARN_DL presents a warning. Without either option an over-run might pass without the user's attention. To this end, the run-time system is designed to provide a robust solution (discarding any message that would cause a buffer over-run). However, in case of overload, the current implementation protects only the timing information, not the actual message data, from races.
  • Thread life cycle.
Each thread is created before the system Reset is executed. Reset and Idle are executed as the main thread of the process. This allows interaction with badly written (legacy) libraries requiring main thread hooks to be set up in Reset and managed in Idle. In this way, PTCORE is complete w.r.t. POSIX: Reset and Idle may perform ANY system call. I.e., any application or even scheduler that can be written under POSIX will work in collaboration with PTCORE without the need for ANY adjustments of the PTCORE run-time. Each thread is given its task id as argument at creation, through which it accesses the following static structures:

generated by -core compiler
// id
// 0    reset task
// 1    idle 
// ...  application tasks
int entry_prio[] = {0, 0, ...}; 
ENTRY_FUNC entry_func[] = {user_reset, user_idle, ..., ENTRY_NR};

and internal in the PTCORE (ENTRY_NR known after including autogen.c)
RTFM_time       base_line[ENTRY_NR];
RTFM_time       dead_line[ENTRY_NR];
RTFM_time       new_base_line[ENTRY_NR];
RTFM_time       new_dead_line[ENTRY_NR];
pthread_mutex_t res_mutex[RES_NR];
sem_t*          pend_sem[ENTRY_NR];
pthread_mutex_t pend_count_mutex[ENTRY_NR];
int             pend_count[ENTRY_NR] = { 0 };
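The per-task thread set-up described in E.2b (one FIFO-scheduled thread per entry, with a static priority) could be prepared along the following lines. This is a sketch under assumptions: fifo_attr_init is a hypothetical helper, and mapping a logical priority onto the platform's FIFO range by adding the minimum priority is a choice made for the example, not necessarily what PTCORE does.

```c
#include <pthread.h>
#include <sched.h>

/* Prepare attributes for a thread with a fixed SCHED_FIFO priority.
   logical_prio is a 0-based priority; it is offset by the platform's
   minimum FIFO priority and clamped to the maximum (assumption). */
int fifo_attr_init(pthread_attr_t *attr, int logical_prio) {
    struct sched_param sp;
    int min = sched_get_priority_min(SCHED_FIFO);
    int max = sched_get_priority_max(SCHED_FIFO);
    int rc;

    sp.sched_priority = min + logical_prio;
    if (sp.sched_priority > max)
        sp.sched_priority = max;

    if ((rc = pthread_attr_init(attr)) != 0)
        return rc;
    /* Use the attributes set below instead of inheriting the creator's. */
    if ((rc = pthread_attr_setinheritsched(attr, PTHREAD_EXPLICIT_SCHED)) != 0)
        return rc;
    if ((rc = pthread_attr_setschedpolicy(attr, SCHED_FIFO)) != 0)
        return rc;
    return pthread_attr_setschedparam(attr, &sp);
}
```

The resulting attribute object would be passed to pthread_create. Note that creating SCHED_FIFO threads typically requires elevated privileges on Linux, so pthread_create may fail with EPERM for an unprivileged user.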

The thread/task handler is implemented as follows
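void *thread_handler(void *id_ptr) {
    int id = *((int *) id_ptr);
    while (1) {
        // consume the semaphore (decrement its value)
        sem_wait(pend_sem[id]);
        RTFM_time offset;
        pthread_mutex_lock(&pend_count_mutex[id]);
        {   // inside lock of the counter
            pend_count[id]--; // may not be atomic, hence needs a lock
            // for good measure we work on the timing information under the lock
            base_line[id] = new_base_line[id];
            dead_line[id] = new_dead_line[id];
            RTFM_time cur_time = time_get();
            offset = base_line[id] - cur_time;
        }
        pthread_mutex_unlock(&pend_count_mutex[id]);
        // offset > 0: the baseline lies in the future (handling not shown here)
        entry_func[id](id); // dispatch the task
#ifdef ABORT_DL
        if (over_run(id)) {
            fprintf(stderr, "Aborting on failing to meet deadline!\n");
            exit(EXIT_FAILURE);
        }
#elif defined(WARN_DL)
        if (over_run(id)) {
            fprintf(stderr, "Warning, failing to meet deadline!\n"); // message text illustrative
        }
#endif
    }
    return NULL;
}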

The actual execution of the task is performed by entry_func[id], which is
auto-generated C code for the task instance that passes on the arguments
to a (potentially inlined) task implementation, e.g.,

void entry_reset_t_0(int RTFM_id) {
    reset_t_0(RTFM_id, arg_reset_t_0.i); // (inlined) call to the task (function)
}

void reset_t_0(int RTFM_id, int i){ // function implementation for task:reset_t_0
    printf("task t %d\n", i);
}

The structs for holding the messages are auto-generated as follows:
typedef struct {int i;} ARG_reset_t_0; // type definition for argument
ARG_reset_t_0 arg_reset_t_0;           // instance for argument

(All data structures, functions, etc. are properly prototyped so as to pass compilation (-Wall, C99) without warnings.)

A corresponding async message (triggering this task) looks as follows (example corresponds to Ex1b.core)
arg_reset_t_0 = (ARG_reset_t_0){75*(2+1)}; 
RTFM_pend(0, max_int, RTFM_id, reset_t_0_nr);

Notice that passing around the RTFM_id is done only to allow for tracing (of the sender) and is omitted when generating code for the ultra-lightweight RTFM-Kernel. However, in the context of threaded hosts each call to the underlying OS/library will likely add overhead orders of magnitude larger than passing an extra parameter.

The implementation of RTFM_pend is as follows:
void RTFM_pend(RTFM_time a, RTFM_time b, int f, int t) {
    int lcount;
    pthread_mutex_lock(&pend_count_mutex[t]);
    {   // inside lock of the counter
        lcount = pend_count[t];
        if (lcount == 0) {
            DPT("RTFM_pend   :pend_count[%d]++", t);
            pend_count[t]++;
            new_base_line[t] = RT_time_add(base_line[f], a);
            new_dead_line[t] = new_base_line[t] + b;
        }
    }
    pthread_mutex_unlock(&pend_count_mutex[t]);

    if (lcount == 0) { // just a single outstanding semaphore
        sem_post(pend_sem[t]);
    } else { // discard message
        // ...
    }
}

The handling of resources/mutexes is straightforward. Each resource mutex is created with its ceiling priority. On entering a critical section, auto-generated code locks the mutex, and on exiting the critical section, auto-generated code unlocks it.
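Creating a mutex with a static ceiling can be sketched with the standard pthread mutex attributes. The helper name ceiling_mutex_init is hypothetical, and the sketch assumes a platform that supports PTHREAD_PRIO_PROTECT; it is not taken from the PTCORE source.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L /* exposes PTHREAD_PRIO_PROTECT under strict -std=c99 */
#endif
#include <pthread.h>
#include <sched.h>

/* Create a mutex that uses the priority-ceiling (priority protect) protocol.
   ceiling must lie within the platform's SCHED_FIFO priority range. */
int ceiling_mutex_init(pthread_mutex_t *m, int ceiling) {
    pthread_mutexattr_t ma;
    int rc;

    if ((rc = pthread_mutexattr_init(&ma)) != 0)
        return rc;
    rc = pthread_mutexattr_setprotocol(&ma, PTHREAD_PRIO_PROTECT);
    if (rc == 0)
        rc = pthread_mutexattr_setprioceiling(&ma, ceiling);
    if (rc == 0)
        rc = pthread_mutex_init(m, &ma);
    pthread_mutexattr_destroy(&ma); /* the mutex keeps its own copy of the settings */
    return rc;
}
```

A critical section is then an ordinary pthread_mutex_lock/pthread_mutex_unlock pair on such a mutex; while the lock is held, the thread runs at the ceiling priority.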

Under PTCORE, the run-time is implemented as potentially inlined functions (as we compile the run-time and application together inlining is possible without fragile link time optimisation). (The RTFM-Kernel API is defined completely as macros, and hence the RTFM-Kernel is fully inlined, i.e., during run-time there is no -Kernel, just the application!)

E.3) WINCORE status

The current implementation of WINCORE is compatible with the generated code of the -core compiler. Additionally baseline values are given to the frontend in the same time unit as in PTCORE (µs).

WINCORE and PTCORE will be fused together in order to ease development and comprehension. The attached diagram (see picture) represents the common structure for this. It's a mixed flow diagram starting with the most relevant functions in the runtimes. Blue arrows show a sequence of functions, while the red and the green arrow denote synchronization relations between async senders and task instances.


Future improvements.

  • Stack memory.
For now stack is allocated to default size, but this can be improved on by taking the maximum stack required for each task into consideration.

(gcc -fstack-usage provides this information on a per-function basis, the call graph of -core Funcs is known in the -core compiler, and gcc can report on the C call graph as long as function pointers are not used.)
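Once a per-task stack bound is known, it could be applied at thread creation through the thread attributes. The sketch below assumes a hypothetical helper stack_attr_init that takes the bound in bytes; how the bound is derived from the -fstack-usage output is not covered here.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L /* exposes PTHREAD_STACK_MIN under strict -std=c99 */
#endif
#include <limits.h>
#include <pthread.h>

/* Request a stack of `needed` bytes, but never less than the platform minimum. */
int stack_attr_init(pthread_attr_t *attr, size_t needed) {
    size_t min = (size_t)PTHREAD_STACK_MIN;
    size_t size = needed < min ? min : needed;
    int rc = pthread_attr_init(attr);
    if (rc != 0)
        return rc;
    return pthread_attr_setstacksize(attr, size);
}
```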
  • Robustness under overload.
Full protection from races on message data can be implemented by allowing the sender to reject the message if there is already an outstanding message. In this way, the mutex inside thread_handler and RTFM_pend can be completely omitted.
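One possible shape of such a sender-side rejection, sketched with C11 atomics, is shown below. The use of an atomic flag, the type slot_t and the functions try_send and take are assumptions for illustration; the source only states that the sender should reject and that the mutex can then be dropped.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* One slot per receiving task: pending is true while a message is outstanding. */
typedef struct {
    atomic_bool pending;
    int payload;
} slot_t;

/* Sender side: claim the slot, or reject when a message is still outstanding.
   In a full run-time the receiver would be woken (e.g. by a semaphore post)
   after the payload has been written. */
bool try_send(slot_t *s, int value) {
    bool expected = false;
    if (!atomic_compare_exchange_strong(&s->pending, &expected, true))
        return false;   /* rejected: the previous message was not consumed yet */
    s->payload = value; /* only the sender that won the exchange writes data */
    return true;
}

/* Receiver side: read the message first, then release the slot. */
int take(slot_t *s) {
    int v = s->payload;
    atomic_store(&s->pending, false);
    return v;
}
```

Because the slot is released only after the payload has been read, a later sender cannot overwrite data that the receiver is still using.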
  • Deadlock free execution under potential deadlocks.
Work in progress investigates the use of analysis to identify regions of potential deadlock, and to infer "super" locks such that only one task/thread may enter a danger zone. Already today danger zones are reported visually to the end user (as cycles in the dependency graph). Solving this problem will greatly facilitate concurrent programming in threaded environments, and may be exactly what we have all been waiting for!
  • Buffering of outstanding messages.
The timing semantics for -core currently restricts async messages to single buffered elements. This may pose a limitation whenever a periodic task emits messages with a baseline offset (after). In this case we should allocate a buffer of length after/period to accommodate for the outstanding messages.

Sufficient information to pursue required analysis is present in the -core model, but the required analysis, code generation and run-time implementation still remains.
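The buffer length after/period mentioned for the buffering improvement can be written out as a small calculation. Rounding up and using a minimum of one element are assumptions of this sketch (the text only gives the ratio), and buffer_len is a hypothetical name.

```c
/* Number of buffer elements for a periodic sender whose messages carry a
   baseline offset `after`; both values must use the same time unit and
   period must be non-zero. */
unsigned buffer_len(unsigned long after, unsigned long period) {
    unsigned long n = (after + period - 1) / period; /* after / period, rounded up */
    return n == 0 ? 1u : (unsigned)n;                /* always at least one element */
}
```

For example, with a period of 1000 µs and an offset of 2500 µs, up to three messages can be outstanding at once, so the sketch returns 3.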
  • Handling of externally caused exceptions.
While the first three improvements are low-hanging fruit, the fifth is challenging. Exception handling in most (if not all) languages and models requires raising and handling the exception from within the same context. (Typically this amounts to the same thread or task.) Handling of exceptions emitted externally (e.g., from the run-time system or some other task/thread) is, if at all possible, limited to supervisory actions, like killing the thread under POSIX.

We currently seek a generic representation for exceptions in the -core language, and ways to implement it in the run-time system(s). Under pthreads/POSIX, signal handling reflects the commonly limited exception handling (the operations allowed in signal handlers are severely restricted). On bare metal we are free to manipulate the stack, so bare metal will likely be the target for our first implementation attempts.

Word from the author:

PTCORE is simple in design and implementation in comparison to pthread based libraries/wrappers like boost. The implementation uses and relies on only a very limited subset of the flourishing and blunted pthread library. The simple design makes it possible to argue on correctness and efficiency.
Libraries like boost still require a lot of cooperation from the user: you need to understand boost (and ultimately pthreads themselves) to correctly use threads in C. Under -core/PTCORE this is NOT the case. The programmer can think in terms of tasks and critical sections, and can focus completely on solving the problem at hand, not on how to code for a solution (or even worse, set of solutions) provided by boost or native threads. We argue that the expressiveness of -core (and the PTCORE run-time) is sufficient to express most application-level functionality, and as Reset and Idle are implemented in the main thread of the process, we suffer no loss of generality compared to native pthreads (-core/PTCORE is complete w.r.t. POSIX programming).

The future improvements (stated above) will take PTCORE to a new level, where no existing library-based solution can compete. (You need language support for static deadlock protection, unless you rely on deep cooperation by the programmer, or on byte-code based solutions on which analysis can be performed.) Also, compared to library-based solutions, we strongly believe that our -core model will by nature expose much more concurrency (and hence potential parallelism) than library-based solutions (where the user must take the conscious decision to explicitly defer computations to tasks, and weigh that against the overhead of the library and the OS). In our case, it is up to the compiler to assess the pros and cons. Any async (given deadlock-free execution) can semantically be executed synchronously (as long as it does not transitively refer to the sender). Hence, these delicate decisions on whether or not to create a task will no longer be a burden put on the programmer. The ONLY thing the programmer needs to consider is whether he/she needs the result of an operation (function) now (sync) or not (async), knowing that a both correct and efficient solution will be derived by the compiler and run-time.

Assessing hard real-time requirements under a hosting thread-based OS is, to my belief, not possible and never will be! The sheer complexity of threads and their implementation on modern multi-core platforms goes far beyond any reasoning. However, this does not preclude the implementation of robust applications using threads as a vehicle for scheduling, given proper run-time verification. To this end, -core programs carry the timing information in a verifiable way. With an exception mechanism at hand that allows for external exceptions, novel methods for run-time verification may be developed to enforce robustness without requiring the programmer to manually consider cases of buffer over-runs, etc.

-core and PTCORE are still at early stages of development (3 months and ticking). However, results are already in your hands. Hope you will hop on the train and enjoy the ride.

/Per Lindgren, founder of RTFM-lang 2014

Last edited Sep 4, 2014 at 10:48 PM by andreaslindner, version 14