Overview
The Trusted-CPP project demonstrates an implementation of the concept of guarantees for safe software development in C++ at the language syntax level, while preserving backward compatibility with existing source code.
This makes it possible to change the approach to software safety from a point-by-point search and fixing of individual bugs and vulnerabilities to guaranteeing their absence in the program source code. In other words, if the code compiled correctly, then certain classes of errors are absent in it, and therefore the vulnerabilities associated with them are absent as well.
The Trusted-CPP project consists of two components. The first component is the header file trusted-cpp.h. It contains template classes and the main settings required for the safe code analyzer. The second component is a static C++ code analyzer implemented as a plugin for the clang compiler.
Moreover, the source code can be compiled by any other compiler without the plugin, because the plugin is needed only for static analysis and does not modify the executable file generated by the compiler in any way.
A simple example to get started
Download the header file, the compiler plugin, and an auxiliary launcher script that simplifies using the plugin.
The helper script trusted-cpp.sh automatically adds clang arguments to load the plugin and pass it command-line arguments, so you don’t have to write something like this every time:
$ clang++ -Xclang -load -Xclang ./trusted-cpp_clang.so -Xclang -add-plugin -Xclang trust -Xclang -plugin-arg-trust -Xclang verbose example.cpp
and can replace it with a simple call:
$ trusted-cpp.sh -trust verbose example.cpp
When the file invalidate.cpp, which contains the following code, is compiled without the plugin, no errors are reported:
std::vector vect(100000, 0);
auto x = vect.begin();
auto &y = vect[0];
vect = {};
std::sort(x, vect.end()); // Error
y += 1; // Error
With the plugin, however, the following warnings and errors about invalidated reference variables are printed:
$ ./trusted-cpp.sh -std=c++20 -fsyntax-only invalidate.cpp
../invalidate.cpp:26:5: warning: using main variable 'vect'
26 | vect = {};
| ^
../invalidate.cpp:31:15: error: Using the dependent variable 'x' after changing the main variable 'vect'!
31 | std::sort(x, vect.end()); // Error
| ^
../invalidate.cpp:36:5: error: Using the dependent variable 'y' after changing the main variable 'vect'!
36 | y += 1; // Error
| ^
3 warnings and 2 errors generated.
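One way to avoid both errors is to obtain dependent iterators and references only after the last modification of the main variable. The following is a minimal sketch in plain standard C++ that does not use the Trusted-CPP header; the function name reset_then_sort and the sample values are assumptions made for illustration and are not part of the project.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper: the vector is modified first, and only then are the
// iterator and the reference taken, so neither is invalidated before use.
std::vector<int> reset_then_sort() {
    std::vector<int> vect(100000, 0);
    vect = {3, 1, 2};                 // modify the main variable first

    auto x = vect.begin();            // dependent iterator taken after the change
    std::sort(x, vect.end());         // vect is now {1, 2, 3}
    assert(std::is_sorted(vect.begin(), vect.end()));

    auto &y = vect[0];                // dependent reference taken after the change
    y += 1;                           // vect is now {2, 2, 3}
    return vect;
}
```

Calling reset_then_sort() returns {2, 2, 3}. The order of operations is the only difference from invalidate.cpp: no dependent variable outlives a change of vect.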
Description of the safe C++ development concept
The project is based on the concept of safe memory management from the NewLang (trust-lang) language, which was ported to C++ as a separate memsafe library and later extended, including by implementing part of the STEELMAN requirements adapted for C++.
By the term “safe development in C++” we mean:
- safe computations and type conversions
- safe memory and resource management
- safe multithreaded programming
And since any solution for safe C++ must be economically viable, this means it must provide backward compatibility with existing C++ code, detect errors at the program compilation stage (i.e., as close as possible to the code writing stage), and use automatic control at the source level, similar to safe development facilities built directly into the language.
In other words, if the C++ source code compiled correctly, then it contains no errors, and therefore no vulnerabilities, caused by:
- incorrect computations or casts (conversions) of data types
- buffer overflows or resource management errors, including cyclic references
- multithreaded access errors leading to a “race condition”
Guarantees of safe software development at the language syntax level are implemented by the C++ compiler by automatically checking the program source text and imposing restrictions on the use of certain code fragments, which are syntactically correct, but may lead to runtime errors or vulnerabilities.
The analysis of Trusted-CPP source code is performed by the compiler plugin, but the source code itself remains an ordinary C++ program. It can be compiled by any compiler without the plugin, and linters and additional static analyzers can still be used with it.
Source code analysis in Trusted-CPP is based on marking elements using user-defined C++ attributes that appeared in C++11. This is very similar to the safety profiles proposals P3038 by Bjarne Stroustrup and P3081 by Herb Sutter, but does not require developing a new C++ standard.
At the moment, checking of the syntactic rules is activated automatically during compilation when the plugin is loaded, which is detected with the built-in function __has_attribute(trust). If the plugin is absent during compilation, the user-defined attributes are disabled with preprocessor macros, which suppresses warnings such as: warning: unknown attribute 'trust' ignored [-Wunknown-attributes].
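The detection scheme described above can be sketched as follows. This is not the content of trusted-cpp.h: the macro names DEMO_TRUST and DEMO_PLUGIN_ACTIVE are hypothetical, and the attribute spelling [[trust]] is an assumption. Only the use of __has_attribute(trust) comes from the description above.

```cpp
#include <cassert>

// Fallback for compilers that do not provide __has_attribute at all.
#ifndef __has_attribute
#define __has_attribute(x) 0
#endif

// DEMO_TRUST and DEMO_PLUGIN_ACTIVE are hypothetical names for this sketch.
// The spelling [[trust]] is an assumption and is used only if the compiler
// reports that it knows the attribute.
#if __has_attribute(trust)
#define DEMO_TRUST [[trust]]
#define DEMO_PLUGIN_ACTIVE 1
#else
#define DEMO_TRUST  // expands to nothing, so no unknown-attribute warning
#define DEMO_PLUGIN_ACTIVE 0
#endif

// The marked function compiles identically with and without the plugin.
DEMO_TRUST int answer() { return 42; }

// Reports whether the attribute was detected at compile time.
bool plugin_active() { return DEMO_PLUGIN_ACTIVE != 0; }
```

With an ordinary compiler the macro expands to nothing, so the generated code is unchanged and no warning is produced.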
A fundamental feature of Trusted-CPP is the ability to mark various elements using user-defined C++ attributes not only when defining a class or function, but also at an arbitrary place in the program source text (or even in an external configuration file). This feature makes it possible to mark important classes, functions, or their arguments after they have been defined and without having to change previously written code.
Nominal (named) typedef typing
In C/C++, a typedef declaration creates an alias for an existing type, but during type casting the type and its alias are not distinguished. Adding nominal typing for typedef makes it possible to prevent accidental type equivalence compared to structural typing and means that two variables are type-compatible if and only if their declarations contain the names of the same type.
A code example using nominal (named) typedef typing is given in the “Code examples” section.
Safe memory management
Safe memory management is fully compatible with C++ code and STL templates and is implemented by using the following kinds of addressing (address variables):
- A raw (Raw) address for direct addressing of data, which is required for iterators to work and for explicit or implicit address arithmetic. For raw addresses, the automatic invalidation mode works, which warns about an address change when the state of the object from which it was obtained is modified. Addresses of this type can be used only in local (automatic) variables with a limited lifetime until the end of the current scope. Raw addresses cannot be stored in object fields and should be passed as function arguments with caution.
- A strong reference (derived from std::shared_ptr) - the variable contains only a strong (owning) pointer to data. Copying a strong reference copies the pointer and increments the ownership counter. Cyclic and cross strong references are prohibited at the data type level (class definitions), which is controlled during program compilation.
- A weak reference (derived from std::weak_ptr) - the variable contains only a weak (non-owning) pointer to data, and copying a weak reference does not increment the ownership counter.
- Locker - a temporary object for accessing data for both kinds of smart references. It contains a dereferenced raw address, and with its help an RAII mechanism is implemented to limit the lifetime of owning the captured value to the current scope only, which guarantees automatic memory release after it is destroyed. Objects of this type can be used only in local variables with a limited lifetime until the end of the current scope.
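The reason cyclic strong references are prohibited can be shown with the standard smart pointers alone. The sketch below uses only std::shared_ptr and std::weak_ptr, not the Trusted-CPP classes; StrongNode, WeakNode, live_nodes and leaked_after_cycle are hypothetical names introduced for illustration.

```cpp
#include <cassert>
#include <memory>

// Demo-only counter of node objects that are currently alive.
static int live_nodes = 0;

struct StrongNode {
    std::shared_ptr<StrongNode> next;   // owning link: can form a cycle
    StrongNode() { ++live_nodes; }
    ~StrongNode() { --live_nodes; }
};

struct WeakNode {
    std::weak_ptr<WeakNode> next;       // non-owning link: cannot keep a cycle alive
    WeakNode() { ++live_nodes; }
    ~WeakNode() { --live_nodes; }
};

// Builds a two-node cycle and returns how many nodes are still alive after
// the local owners have gone out of scope.
template <typename Node>
int leaked_after_cycle() {
    live_nodes = 0;
    std::weak_ptr<Node> probe;          // observer used only for cleanup
    {
        auto a = std::make_shared<Node>();
        auto b = std::make_shared<Node>();
        a->next = b;
        b->next = a;                    // a -> b -> a
        probe = a;
        assert(live_nodes == 2);
    }                                   // local owners a and b are gone here
    const int leaked = live_nodes;
    if (auto still_alive = probe.lock()) {
        still_alive->next.reset();      // break the cycle so the demo itself does not leak
    }
    return leaked;
}
```

leaked_after_cycle<StrongNode>() returns 2 because the two nodes keep each other alive, while leaked_after_cycle<WeakNode>() returns 0. Rejecting the first class definition at compile time removes this kind of leak before the program runs.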
The main difference between strong and weak references and the corresponding standard templates is the way the object is accessed when dereferencing a reference: a temporary variable is created, and direct access to the data (object) is then obtained through it. A code example is given in the “Code examples” section.
All other variables that store data by value (variables by value) cannot create references. To create references, you must use a smart pointer (shared or weak reference).
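Access through a temporary variable has a close analog in the standard library: std::weak_ptr::lock() returns a temporary strong owner whose lifetime is limited to the current scope. The sketch below uses only standard classes and is not the Trusted-CPP API; read_value and demo_sum are hypothetical names.

```cpp
#include <cassert>
#include <memory>
#include <optional>

// Reads the value behind a weak reference if the object is still alive.
std::optional<int> read_value(const std::weak_ptr<int>& weak) {
    // lock() yields a temporary strong owner limited to this scope.
    if (std::shared_ptr<int> locked = weak.lock()) {
        return *locked;
    }
    return std::nullopt;  // the object was already destroyed
}

// Reads the same weak reference before and after the owner releases the object.
int demo_sum() {
    auto strong = std::make_shared<int>(40);
    std::weak_ptr<int> weak = strong;

    const int before = read_value(weak).value_or(-1);  // 40: object is alive
    strong.reset();                                     // last owner releases it
    const int after = read_value(weak).value_or(2);     // 2: fallback, object is gone

    assert(weak.expired());
    return before + after;
}
```

demo_sum() returns 42. The data is never reached through a stored raw address, only through an owner that exists for the duration of the access.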
Control of multithreaded access to data
Safe multithreaded programming is the automatic elimination of problems that lead to a “data race condition”.
To minimize logical errors when acquiring a synchronization object (if this is required for variables with controlled multithreaded access), the attempt to acquire access and the dereferencing of a reference are performed as a single operation.
The automatic data access variable is not only a temporary owner of a strong reference; it also owns the inter-thread synchronization object in the style of std::lock_guard. Its lifetime is limited to the current scope and is managed by the compiler automatically.
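The idea of combining lock acquisition and dereferencing into one operation can be sketched with the standard library as follows. Guarded and Guarded::Access are hypothetical names for this sketch and are not the Trusted-CPP classes; the real implementation may differ in every detail.

```cpp
#include <cassert>
#include <mutex>
#include <utility>

// Hypothetical wrapper: the value can be reached only through lock().
template <typename T>
class Guarded {
public:
    explicit Guarded(T value) : value_(std::move(value)) {}

    // Access holds the mutex for its whole lifetime, in the style of
    // std::lock_guard, and gives access to the protected value.
    class Access {
    public:
        Access(T& value, std::mutex& mutex) : lock_(mutex), value_(value) {}
        T& operator*() { return value_; }
    private:
        std::unique_lock<std::mutex> lock_;
        T& value_;
    };

    // Acquiring the lock and obtaining access is a single operation.
    Access lock() { return Access(value_, mutex_); }

private:
    T value_;
    std::mutex mutex_;
};

int demo_guarded() {
    Guarded<int> counter(1);
    {
        auto access = counter.lock();   // mutex acquired here
        *access += 41;
        assert(*access == 42);
    }                                   // mutex released at the end of the scope
    return *counter.lock();             // temporary Access lives until the value is copied
}
```

demo_guarded() returns 42. Because there is no way to read or write the value without an Access object, forgetting to take the lock is not expressible in this interface.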
Open and closed variable scopes
The implementation of safe multithreaded programming is based on the STEELMAN requirements using open and closed scopes for external variables. In fact, this is an implementation of item 5G from the STEELMAN requirements, only refined for C++ and OOP.
In open scopes there are no restrictions on the use of external variables, whereas in closed scopes non-local variables must be explicitly imported (listed). The scope and the list of imported external variables for nested scopes are inherited from higher-level scopes until explicitly overridden.
Closed scopes are essentially an inversion of OOP (OOP the other way around): it is not the object’s internal data that is hidden from the external environment, but the reverse. Within the current scope, access to variables from the external environment is restricted and becomes possible only after they are explicitly listed in the program source code. (Pure functions are built on the same basis: the external environment becomes unavailable from the function body.)
Creating a closed scope or redefining it is done using the macro TRUST_USING_EXTERNAL(""), which is applied to the definition of a function, class, class method, or an individual expression. An example of using closed scopes can be seen in the examples below.
For the purposes of safe multithreaded access in C++, the following scheme is implemented:
- Inter-thread safety is built on the use of two attributes (conventionally) THREAD and THREADSAFE.
- The THREAD attribute marks functions that run in separate threads.
- The THREADSAFE attribute is applied to variables that provide access synchronization in multithreaded programming, i.e. variables with THREADSAFE must implement access synchronization.
- The THREAD and THREADSAFE attributes mark the arguments of all functions that create execution threads, i.e. an argument passed into the function must be marked with the corresponding attribute.
Marking with attributes happens once and is inherited by derived classes. The compiler then automatically tracks correct usage: when an execution thread is created, its body must be marked with the THREAD attribute, and the arguments passed into the thread must be THREADSAFE.
In addition, the thread function becomes closed for access to external variables, and the analyzer (the compiler via the plugin) automatically checks that every external variable imported into a function with the THREAD attribute has the THREADSAFE attribute.
An example of controlling multithreaded access to data is given below.
Additional features
Macro restrictions in the exported interface of a C++ module
- Two macro processing modes are introduced in the compiler:
- Legacy mode - for all files except module files.
- New mode - only for module files.
- In the new mode, macro scopes (lifetimes) are added.
- In a C++ module file, restrictions are introduced on using macros in exported names of variables, functions, and classes.
The differences between the two modes are as follows. Legacy macro processing mode is used for all files except C++ module files (i.e., macro processing happens as before - only at the preprocessor level), whereas the new macro processing mode is intended exclusively for C++ modules, and macro processing itself is performed with AST awareness.
Manual and automatic stack overflow checking
Manual and automatic stack overflow checking is the only functionality that changes the generated code when compiling the program. For this reason it was implemented as a separate project, stack-check, which can be used both together with Trusted-CPP and without it.
Formal proof analysis of program correctness *
In fact, this is an implementation of static checking of dynamic AoRTE (“Absence of Run-Time Errors”) expressions, which does not produce false negatives, although false positives are possible. That is, if there are no compilation errors, you can be sure that there are no problems of the checked kind in the code, whereas a reported error does not always correspond to reality, and the tool may be wrong.
Formal analysis does not try to prove the correctness of the program as a whole. It is used only to prove user-defined assertions in different parts of the program and in function calls. Moreover, the correctness proof is performed only to the extent defined by the user, and it assumes that the assertions themselves correctly and fully describe and constrain the program implementation.
Formal proof analysis of program correctness is implemented following the gnatprove principle for the Ada language and uses three macros to define preconditions, postconditions, and assertions: TRUST_ASSERT_PRED(), TRUST_ASSERT_POST(), and TRUST_ASSERT() respectively. This is something between assert and static_assert, which is evaluated during program compilation, but the expression may use non-constant values (non-constant expressions must be computable at the data type level or described in pre- and postconditions).
*) - This functionality is planned for implementation, but is currently paused until the main part of the project is completed
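Since this functionality is not implemented yet, the following sketch only illustrates what preconditions and postconditions express, using run-time checks with the standard assert. DEMO_PRE, DEMO_POST and clamp_index are hypothetical names; they are not the Trusted-CPP macros, and unlike the planned feature they are evaluated at run time rather than during compilation.

```cpp
#include <cassert>

// Hypothetical stand-ins for this sketch only; both are checked at run time.
#define DEMO_PRE(cond)  assert((cond) && "precondition violated")
#define DEMO_POST(cond) assert((cond) && "postcondition violated")

// Clamps an index into the valid range [0, size - 1].
int clamp_index(int index, int size) {
    DEMO_PRE(size > 0);                        // the caller must pass a non-empty range

    int result = index;
    if (result < 0) {
        result = 0;
    } else if (result >= size) {
        result = size - 1;
    }

    DEMO_POST(result >= 0 && result < size);   // the function guarantees a valid index
    return result;
}
```

For example, clamp_index(-5, 10) returns 0 and clamp_index(15, 10) returns 9. A static prover would try to establish the postcondition from the precondition and the function body, so that a caller can rely on a valid index without any run-time check.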
Code examples
Code example using nominal (named) typedef typing
Creating a data type with nominal typing is done using an attribute, which is expanded when using the TRUST_NOMINAL macro or is listed in the TRUST_NOMINAL_TYPES(...) macro.
typedef int IntType;
int int_value = 0;
IntType IntType_value = 0;
int int_value_cast = IntType_value; // OK
IntType IntType_cast = int_value; // OK
TRUST_NOMINAL typedef int IntSubType; // Nominal typing at type definition time
IntSubType IntSubType_value = 0; // OK
int int_value_cast2 = IntSubType_value; // OK
IntSubType IntSubType_cast = int_value; // ERROR
IntType IntType_cast2 = IntSubType_value; // ERROR
TRUST_NOMINAL_TYPES("IntType"); // Nominal typing for an existing type
IntType IntType_value2 = 0;
IntType IntType_cast3 = int_value; // ERROR
IntType IntType_cast4 = IntSubType_value; // ERROR
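For comparison, a similar restriction can be approximated in standard C++ without the plugin by wrapping the value in a class with an explicit constructor. This is a sketch of a common idiom, not of how Trusted-CPP implements nominal typing; IntSub and widen are hypothetical names. Unlike the attribute-based approach, it requires a new type and changes to existing code, and initialization from a literal must be written as IntSub{0} rather than = 0.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical wrapper used only in this sketch.
struct IntSub {
    int value;
    explicit IntSub(int v) : value(v) {}       // no implicit int -> IntSub
    operator int() const { return value; }     // IntSub -> int stays implicit
};

// The conversion rules are verified at compile time.
static_assert(!std::is_convertible<int, IntSub>::value,
              "int must not convert to IntSub implicitly");
static_assert(std::is_convertible<IntSub, int>::value,
              "IntSub converts to int implicitly");

// Accepts only IntSub; passing a plain int does not compile.
int widen(IntSub s) { return s; }
```

Here widen(IntSub{7}) returns 7, while widen(7) is rejected by the compiler.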
Example of using closed scopes
To specify imported variables, you can use a mask for the variable name or the namespace:
- TRUST_USING_EXTERNAL("*") - Allow access to any variables - default behavior
- TRUST_USING_EXTERNAL("") - Forbid access to all external variables
- TRUST_USING_EXTERNAL("ns::*") - Allow access to any variables from the ns:: namespace
int global = 0;

int func_default() {
    // Open scopes by default
    return global;
}

TRUST_USING_EXTERNAL("") // Forbid access to all external variables
int func_closed() {
    // In fact, it is a pure function with no side effects.
    return global; // ERROR
}

TRUST_USING_EXTERNAL("global") // Access is allowed only to the variable global
int func_using_external() {
    return global; // OK
}
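Standard C++ already has a limited analog of an explicit import list: the capture list of a lambda. The sketch below is only an analogy and does not use Trusted-CPP; the function name sum_with_explicit_imports is hypothetical. Note that a capture list restricts only local variables of the enclosing function, whereas global variables remain accessible from any lambda, which is the gap that closed scopes are meant to cover.

```cpp
#include <cassert>

int sum_with_explicit_imports() {
    int total = 0;
    int step = 5;
    int unrelated = 100;

    // Only 'total' (by reference) and 'step' (by value) are imported.
    // Using 'unrelated' inside the lambda body would be a compile error.
    auto add_step = [&total, step]() { total += step; };

    add_step();
    add_step();

    assert(unrelated == 100);  // untouched: it was never imported
    return total;
}
```

sum_with_explicit_imports() returns 10.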
Applying safe multithreaded programming
One-time marking of functions and classes that create separate execution threads.
// Set the 'thread' attribute for the first constructor arguments for std::thread and std::jthread classes
TRUST_SET_ATTR_ARGS(thread, std::thread, 1);
TRUST_SET_ATTR_ARGS(thread, std::jthread, 1);
// Set the 'threadsafe' attribute for all constructor arguments of std::thread and std::jthread classes
TRUST_SET_ATTR_ARGS(threadsafe, std::thread::thread, 0);
TRUST_SET_ATTR_ARGS(threadsafe, std::jthread::jthread, 0);
// Mark pthread_create arguments with attributes:
// 'thread' for the third argument and 'threadsafe' for the fourth
TRUST_SET_ATTR_ARGS(thread, pthread_create, 3);
TRUST_SET_ATTR_ARGS(threadsafe, pthread_create, 4);
// Set the 'threadsafe' attribute for thread-safe templates
TRUST_SET_ATTR(threadsafe, std::atomic);
TRUST_SET_ATTR(threadsafe, trust::SyncTimedShared);
Code example that controls a thread function against a “race condition”
uint64_t notrust_count = 0; // Without setting the THREADSAFE attribute

void *thread_notrust(void *arg) { // Thread without the THREAD attribute
    ++notrust_count; // Race
    return nullptr;
}

std::atomic<uint64_t> trust_count = 0; // Automatic THREADSAFE marking for std::atomic

TRUST_THREAD void *thread_trust(void *arg) { // Thread function (marked with THREAD attribute)
    trust_count++;
    notrust_count++; // ERROR: Expected attribute 'threadsafe' for 'notrust_count'
    return nullptr;
}
Code example of creating threads with control of potential “data race condition” errors
pthread_attr_t attr;
pthread_attr_init(&attr);
pthread_t tid;
pthread_create(&tid, &attr, thread_notrust, nullptr); // ERROR
// error: Expected attribute: 'thread' for 3 argument
// pthread_create(&tid, &attr, thread_notrust, nullptr);
// ^
pthread_create(&tid, &attr, thread_trust, nullptr); // OK
{
    std::thread t_lambda([&]() {
        for (auto i = 0; i < 1'000'0000; ++i)
            ++notrust_count; // ERROR
        // error: Expected attribute 'threadsafe' for 'notrust_count'
        // ++notrust_count;
        // ^
    });
    std::thread t_notrust(thread_notrust, nullptr); // ERROR
    // error: Expected attribute: 'thread' for 1 argument
    // std::thread t_notrust(thread_notrust, nullptr);
    // ^
    std::thread t_trust(thread_trust, nullptr);
    t_lambda.join();
    t_notrust.join();
    t_trust.join();
}
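The errors reported for notrust_count disappear once the shared counter is a synchronized type. A minimal sketch of the race-free variant in plain standard C++ follows; it does not use the Trusted-CPP macros, and count_in_threads is a hypothetical name. On some toolchains the program must be linked with thread support (for example, the -pthread option).

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>
#include <vector>

// Increments one shared atomic counter from several threads and returns the total.
std::uint64_t count_in_threads(unsigned thread_count, std::uint64_t iterations) {
    std::atomic<std::uint64_t> counter{0};   // synchronized, unlike a plain uint64_t

    std::vector<std::thread> workers;
    workers.reserve(thread_count);
    for (unsigned t = 0; t < thread_count; ++t) {
        // The lambda imports only the atomic counter and a copy of the loop bound.
        workers.emplace_back([&counter, iterations]() {
            for (std::uint64_t i = 0; i < iterations; ++i) {
                counter.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    for (auto &worker : workers) {
        worker.join();                        // join makes all increments visible
    }

    assert(counter.load() == thread_count * iterations);
    return counter.load();
}
```

count_in_threads(4, 10000) returns 40000 on every run, whereas the same loop over a plain uint64_t may lose increments.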
Example of dereferencing a reference and acquiring an access lock for smart pointers
trust::Shared<int, trust::SyncTimedMutex> var_sync(1); // derived from std::shared_ptr
trust::Weak< trust::Shared<int, trust::SyncTimedMutex> > var_weak = var_sync.weak(); // derived from std::weak_ptr
TRUST_THREAD void func_thread() {
    try {
        // Cannot capture into a static variable
        // static auto static_fail1(var_sync.lock());
        auto sync = var_sync.lock(); // or *var_sync
        auto weak = var_weak.lock(); // or *var_weak
        *sync += *weak;
    } catch (...) {
        // Handle access lock or reference dereference error
    }
}
Conclusions and summary
This document gives a brief description of the project, with examples that solve simple, easy-to-follow safe programming tasks in C++.
The project is not production-ready yet. It most likely has flaws and omissions, since many other interesting and complex situations are not covered here, but suggestions for improving the project are welcome.