GNSS C++ solutions

Global Navigation Satellite Systems Software-Defined Receivers explained. Part 0: Hardware and system design

2022-06-20T00:00:00+00:00

In this series of articles I’d like to share my experience in GNSS receiver design over the last 10+ years. Writing things down is an excellent way to share knowledge, and structuring and preparing this kind of educational material also helps one find the missing pieces in one’s own understanding.

I’d like to start from a system design perspective, since most of the decisions made at this stage directly influence the type of processing that will be used later.

Part 0: Hardware and system design
Part 1: Digital frontend
Part 2: Jamming mitigation
Part 3: Acquisition
Part 4: Correlation and tracking
Part 5: Standalone positioning
Part 6: Advanced positioning methods

Overview

A general schematic of the GNSS receiver with its high-level blocks is presented below. Generally, the satellite signals are received by an antenna (or multiple antennas), filtered, amplified and then fed into the (multiband) radio-frequency (RF) frontend, which converts the signal to baseband.

The baseband signal is quantized and sampled in the ADC and then processed by the digital frontend, acquisition engine, correlation engine, tracking loops etc. The tracking loop results are used to generate the observables and with (optional) external data the receiver performs the positioning routines. On top of that, there is a user interaction like position, velocity and time (PVT) and observables data output, generation of the synchronized timescale pulses, various external events registration and so on.

To be honest, of all the receivers I’ve made over the last decade, system design was by far my favourite stage of development. It involves a lot of research and communication with potential customers, and it covers both software and hardware aspects: PCB, mechanical and environmental, not to mention RF and electrical.

Hardware

The system design starts with collecting all the requirements and limitations for the future receiver, which will result in some related decisions. For example:

We need to build an accurate heading solution → we’ll have to build a receiver with two antenna inputs → we need a multi-channel RF-frontend and a baseband processor with enough processing power
We are limited to a specific ASIC or FPGA family (this happens when you need to use an in-house solution) → depending on the expected use cases we should choose an appropriate RF frontend, inspect it with an evaluation kit and propose a frequency plan and a functional diagram of the receiver
We have to comply with a certain form factor, either for legacy reasons or because we want to offer a drop-in (hot-swap) replacement for a competitor’s product → we should carefully inspect all the tolerances and limits (mechanical, electrical and thermal) and check for undocumented behaviour of the reserved pins.

One of the interesting topics is the selection of an appropriate antenna. There are three kinds of antennas:

Passive antennas are often used in trackers, smartphones and other cost-sensitive solutions. There’s a great application note by u-blox (GNSS antennas. RF design considerations for u-blox GNSS receivers. Application note).
High-precision active antennas have built-in filters and amplifiers and, most importantly, guarantee a stable phase centre position. As you may know, all the positioning and signal processing is performed for the signals as received at the phase centre of the antenna. When the phase centre is stable, signals from satellites at various elevation angles are perceived at the same virtual position. An excellent overview of active antennas, with a major focus on cavity filters, is provided in a whitepaper by Topcon Positioning Systems (Topcon GNSS Reference Station with Cavity Filters TPS CR-G5-C & TPS PN-A5-C).
Antenna arrays with null- and beam-formers. This is a deep and interesting topic; check out the NovAtel GAJT-710 antenna. I’ll briefly explain the antenna array algorithms in the antijamming section.

Active antennas, due to the presence of an amplifier, require additional power to operate properly. This is solved (quite elegantly) by providing the DC via the same signal coaxial cable. The DC flows from the receiver (or the bias tee) to the antenna and the RF signal comes the other way around.

The signal from the antenna in the receiver is amplified with an onboard LNA to compensate for the signal losses in the cable. Then, after additional signal splitting and band-dependent filtering, the signal is processed in the RF frontend. The goals of the RF frontend are the following:

Additional signal amplification with automatic gain control (AGC)
Signal spectrum shift to the baseband frequency
Additional signal filtering
(Optional) Signal sampling with onboard ADCs

The first gain-controlled amplifier, which is often omitted in low-end frontends, is required when interference is present. As you may know, the mixer is a non-linear device that multiplies the input signal with a reference signal to produce a signal containing the sums and differences of the original frequencies. However, to operate correctly, the powers of both signals should be similar. If this condition is not met, major signal distortion will be observed.

There’s a way to overcome this issue with an approach usually found in radars, called bandpass sampling. It’s based on the properties of the sampled signal and allows the ADC to be used as a perfect mixer. There’s an article I’m happy to share or discuss where I demonstrate that, with sufficient amplification, this approach matches and even surpasses the traditional RF frontends in terms of positioning precision.

Nowadays it’s common to have all the functions of the frontend in a single RF integrated circuit. This approach makes it possible to reduce the footprint of the receiver, simplify the overall design and reduce both the cost of the receiver itself and the R&D expenses. There are two major players in COTS (commercial off-the-shelf) RF frontends: Maxim Integrated (now part of Analog Devices) with the MAX2769/MAX2771 chips and NTLab with the NT1065/NT1066/NT1062 device family. Let’s illustrate the hardware part of an L1/L2 receiver:

Frequency plan

When the hardware system design is more or less determined, it’s time to estimate the frequency plan, which defines the set of parameters responsible for the satellite signal processing. It includes:

Reference oscillator frequency $f_{ref}$
Sampling rate $f_s$
Local oscillator (LO) frequencies $f_{LO_i}$
Resampling factors in the digital frontend

The key point of frequency plan design is to allocate the signal spectrum so as to avoid spectrum overlapping, which is often referred to as aliasing. This should take into account all the frequency shifts in the RF and digital frontends, as well as resampling. The tricky part is that when it comes down to real hardware there are a lot of limitations, like integer PLLs (hence, a finite set of local oscillator frequencies), sampling rate and bandwidth limits, the presence or lack of I/Q processing in the baseband processor and so on. For example, here’s an illustration of the frequency plan for one of my first receivers:

It was built with two NT1065 RF frontends to cover the L1/L2/L3/L5 GNSS signal bands and had local oscillators at 1590, 1237.5 and 1190 MHz. The fourth local oscillator was set to 1330 MHz and was a source of the 66.5 MHz ADC clock.
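As a rough sanity check of such a plan, the intermediate frequencies can be computed directly from the numbers above. This is only a sketch: the carrier frequencies are the published nominal values, but which signal is served by which local oscillator, and the divide-by-20 that turns 1330 MHz into the 66.5 MHz ADC clock, are assumptions made for illustration.

```python
# Nominal carrier frequencies in MHz (published values).
CARRIERS_MHZ = {
    "GPS L1 C/A": 1575.42,
    "GLONASS L1OF": 1602.0,     # centre of the FDMA band (channel 0)
    "GPS L2C": 1227.60,
    "GLONASS L2OF": 1246.0,     # centre of the FDMA band (channel 0)
    "GPS L5": 1176.45,
    "GLONASS L3OC": 1202.025,
}

# Local oscillators from the text; the signal-to-LO mapping is an assumption.
LO_MHZ = {
    "GPS L1 C/A": 1590.0,
    "GLONASS L1OF": 1590.0,
    "GPS L2C": 1237.5,
    "GLONASS L2OF": 1237.5,
    "GPS L5": 1190.0,
    "GLONASS L3OC": 1190.0,
}

ADC_CLOCK_MHZ = 1330.0 / 20  # assumed divider; gives the 66.5 MHz quoted in the text


def intermediate_frequencies(carriers, oscillators):
    """Signed offset of every carrier from its local oscillator, in MHz."""
    return {name: carriers[name] - oscillators[name] for name in carriers}


plan = intermediate_frequencies(CARRIERS_MHZ, LO_MHZ)
for name, offset in plan.items():
    # The carrier itself must at least stay below the Nyquist frequency.
    fits = abs(offset) < ADC_CLOCK_MHZ / 2
    print(f"{name:14s} IF = {offset:+8.3f} MHz, below Nyquist: {fits}")
```

A real plan would also have to account for the bandwidth of every signal, the frontend filters and the images created by each frequency shift, so treat this as the first step of the check rather than the whole of it.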

One more important thing is that the frequency plan should not be treated as something engraved in stone: it may (and should) change to reflect the situation. One of the best things about the modern IC-based RF frontends is their reconfigurability: if the selected ADC clock or the LO frequency doesn’t fit you for some reason (unexpected spectrum harmonics from the self-interference is one of the possible problems) it’s easy to use another configuration with a different set of register values.

Summary

This introductory part is a very high-level overview of the system design process for GNSS receivers, but this is the kind of knowledge you get from experience and have to acquire piece by piece from multiple sources. As I’ve mentioned, it has always been my favourite design stage since it always represented pure research and, in a manner of speaking, art.

There’s also a big part of system design related to the mechanical specifications, external data and user interaction. I hope to add it once I revisit this text.

Global Navigation Satellite Systems Software-Defined Receivers explained. Part 1: Digital frontend

2022-06-20T00:00:00+00:00

Overview

In software-defined and software-oriented GNSS receivers, systems engineers can decouple the satellite signal processing into two separate stages: group and individual.

Group signal processing is performed on the whole subband of the signal spectrum without distinguishing the signals of the individual satellite vehicles (SV). This is implemented by the digital frontend block, where the number of channels is roughly the number of processed signals (GPS L1 C/A, GLONASS L1OF etc.).
Individual signal processing is the traditional digital channel explained in-depth by many authors [Kaplan, Springer]. This will be explained in the Part 4: Correlation article.

The digital frontend may be perceived as a signal conditioner, the purpose of it is to prepare the input signal for the subsequent processing: translate the signal spectrum, reduce the data rate, remove interference and pack the signal in a format, suitable for the correlator.

There are two major beneficial aspects of using the digital frontend:

As I’ve mentioned earlier (Frequency plan), ADCs usually operate on a fixed sampling rate, which often is not optimal for the specific signals. The resampling part of the digital frontend allows us to reduce the sampling rate individually for every group signal, hence reducing the computational load on the correlators.

For example, if the ADC is providing samples with 79.5 MHz and we want to develop a narrowband GPS L1 C/A timing receiver we can downsample the incoming signal by the factor of 20 (down to 3.975 MHz). In that case, we’ll perform one spectrum translation on the high frequency and resampling, but the following N (by the number of satellites) translations and correlations would take 20 times fewer operations for both mixer and correlator.
By separating the group and individual SV processing it is possible to separate the timing approach as well. For real-time receivers (implemented with FPGAs or ASICs), the group processing is often implemented synchronously with the ADC clock (~ tens of MHz), but the resulting signal can be saved to the internal memory and then processed in a co-processor manner, running as fast as the logic can be synthesized (FPGA) or clocked (ASIC). The benefit here is that the same hardware can be reused, which saves silicon area.

The second aspect is not so relevant for the PC-based receivers, since it’s virtually impossible to organize the sample-based streamed data, and those kinds of receivers are asynchronous by design.

Mixer

Mixers are devices used to translate the signal, which results in a shift in the frequency domain. Most digital mixers are complex and multiply the input signal by a complex exponential:

\[result(t) = input(t) * e^{-j(2\pi ft + \varphi)}\]

With the transition to the digital signal processing and samples, this can be re-written as:

\[result[i] = input[i] * e^{-j(2\pi i\frac{f}{f_s} + \varphi)}\]

For the demonstration, I’ve used a simple Jupyter Notebook so you can try it yourself.

First of all, we download and read the data buffer. For the sake of simplicity of the demonstration I’ll convert the data to the floating-point:

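import numpy as np
from urllib import request

# Recording parameters are encoded in the file name: fs = 38.192 MHz, IF = 9.55 MHz
file_link = 'https://www.dropbox.com/s/b2il4hnu30fup7c/GPSdata-DiscreteComponents-fs38_192-if9_55_partial.bin?dl=1'
filename = 'GPSdata-DiscreteComponents-fs38_192-if9_55.bin'
fs = 38.192e6
intermediate_frequency = 9.55e6
request.urlretrieve(file_link, filename)

# Samples are stored as signed 8-bit integers; convert to float for the demonstration
signal_data = np.fromfile(filename, dtype=np.int8).astype('float')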

To illustrate the data properties, we’ll plot the power spectrum via Welch’s method, the first 100 microseconds and the signal histogram:

As mentioned earlier, to translate the signal we have to multiply it by the complex exponential:

time = np.arange(signal_data.shape[0]) / fs
sine = np.exp(-1j * 2 * np.pi * time  * intermediate_frequency)
translated_signal = signal_data * sine

Here are the key points that can be observed in the results of the multiplication:

Since we’re multiplying the real signal with the complex exponent we can observe the spectrum repetition in the frequency domain plot. This is present because the real signal has a spectrum that’s symmetrical relative to the $\frac{f_s}{2}$. This won’t affect the following processing as long as we have a proper anti-aliasing filter in the downsampling block.
The peak at -9.55MHz is the DC part of the spectrum being translated to the negative intermediate frequency.
The multiplication operation is linear and doesn’t change the nature of the underlying signal.
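These observations are easy to reproduce without the recording. The sketch below swaps the dataset for a synthetic real tone placed exactly at the intermediate frequency (an assumption made purely for illustration) and checks that the mixer moves it to DC.

```python
import numpy as np

fs = 38.192e6                    # sampling rate of the recording
intermediate_frequency = 9.55e6  # intermediate frequency of the recording
n = 4096

t = np.arange(n) / fs
# Synthetic stand-in for the recording: a real tone at the intermediate frequency.
tone = np.cos(2 * np.pi * intermediate_frequency * t)
translated = tone * np.exp(-1j * 2 * np.pi * intermediate_frequency * t)

frequencies = np.fft.fftfreq(n, d=1 / fs)
spectrum = np.abs(np.fft.fft(translated))
strongest_frequency = frequencies[np.argmax(spectrum)]
print(f"strongest component at {strongest_frequency / 1e6:.3f} MHz")
```

The real tone consists of two complex exponentials at plus and minus the intermediate frequency. After the multiplication the first one lands at DC with half of the original amplitude, while the second one moves to twice the negative intermediate frequency, which is exactly the spectrum copy the downsampling filter has to remove.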

Verification of operations

It’s always good to check and test your algorithms and operations throughout the development process. In these articles, I’ll omit most of the tests for brevity, but I intend to keep the most visual and high-level test: the matched filter test.

The matched filter is an optimal filter, designed to maximize the signal-to-noise ratio. It has a complex frequency response conjugated to the complex frequency response of the signal it is designed to detect. According to the convolution theorem, we can substitute the circular convolution with the multiplication in the frequency domain. Therefore, the Python code for the matched filter is straightforward:

def MatchedFilter(signal_data, code_data):
  return np.fft.ifft(np.fft.fft(signal_data) * np.conj(np.fft.fft(code_data)))

Since we’re working on a well-known GNSS dataset, we can use external information about the signals present. I’ll use GPS SV21, but any visible satellite can be used. With a known Doppler offset, this test performs an additional spectrum translation, upsamples the PRN code and calculates the matched filter output, which is then normalized and plotted:
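The same test can be tried on synthetic data first. The following is a minimal sketch, not the notebook code: a random ±1 sequence stands in for the real PRN code, the delay and noise level are arbitrary, and the Doppler translation step is skipped.

```python
import numpy as np


def matched_filter(signal_data, code_data):
    """Circular cross-correlation computed in the frequency domain."""
    return np.fft.ifft(np.fft.fft(signal_data) * np.conj(np.fft.fft(code_data)))


rng = np.random.default_rng(0)
samples_per_chip = 4
code = rng.choice([-1.0, 1.0], size=1023)            # hypothetical stand-in for a PRN code
code_upsampled = np.repeat(code, samples_per_chip)   # sample-and-hold upsampling

true_delay = 1234                                     # in samples, arbitrary
noise = rng.normal(scale=2.0, size=code_upsampled.size)
received = np.roll(code_upsampled, true_delay) + noise

correlation = np.abs(matched_filter(received, code_upsampled))
correlation /= correlation.max()                      # normalize before plotting
estimated_delay = int(np.argmax(correlation))
print(f"estimated delay: {estimated_delay} samples")
```

If the peak of the normalized output sits at the delay that was injected, the chain of operations before the matched filter has preserved the signal.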

Hardware-friendly optimizations

It is worth pointing out that digital signal processing devices and GNSS receivers in particular rarely operate with floating-point values and usually work with integers.

Note: Interestingly, in my research on PC-based software-defined receivers, fp32 digital signal processing routines significantly outperformed the integer-based ones, while both approaches were implemented with the Intel® Integrated Performance Primitives library. I’m looking forward to evaluating the performance and accuracy of the lower-resolution floating-point values (like fp16, bfloat16 or even fp8) but without the optimized libraries and hardware support, this kind of research isn’t available yet.

The most hardware-friendly approach for the integer-sine is a numerically controlled oscillator (NCO) with the corresponding phase-to-value sine table.

NCO may be viewed as an arbitrary precision unsigned integer with the full-scale treated as a period of the target function. NCOs have two main input parameters: phase accumulator value (phase) and an adder (relative frequency). The bit depth of the NCO is directly related to the possible frequency precision it’s able to achieve. NCOs are widely used in various digital signal processing systems for both receivers and transceivers.

To create a sine with an NCO, a lookup table that converts the NCO phase into the amplitude of the target function has to be precalculated. Usually, the number of lookup table entries is much smaller than the resolution of the NCO. To address this and keep the frequency resolution of the NCO, the index into the lookup table is taken from the upper $M$ bits of the $N$-bit phase accumulator.
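A minimal sketch of such an NCO is shown below. The accumulator width, the table size and the amplitude are arbitrary choices for illustration, and the function names are hypothetical.

```python
import math

PHASE_BITS = 32   # N: width of the phase accumulator
TABLE_BITS = 6    # M: address width of the lookup table (64 entries)
AMPLITUDE = 127   # full scale of a signed 8-bit sine table

SINE_TABLE = [
    round(AMPLITUDE * math.sin(2 * math.pi * i / (1 << TABLE_BITS)))
    for i in range(1 << TABLE_BITS)
]


def frequency_word(frequency_hz, sample_rate_hz):
    """Phase increment per sample; the accumulator full scale is one period."""
    return round(frequency_hz / sample_rate_hz * (1 << PHASE_BITS)) % (1 << PHASE_BITS)


def nco_sine(word, count, phase=0):
    """Generate `count` table values, addressing the table with the upper M bits."""
    mask = (1 << PHASE_BITS) - 1
    samples = []
    for _ in range(count):
        samples.append(SINE_TABLE[phase >> (PHASE_BITS - TABLE_BITS)])
        phase = (phase + word) & mask   # wrap-around is the natural period rollover
    return samples


fs = 38.192e6
word = frequency_word(fs / 4, fs)
samples = nco_sine(word, 8)
print(samples)
print(f"frequency resolution: {fs / (1 << PHASE_BITS):.4f} Hz")
```

The frequency step is set by the accumulator width alone (the sampling rate divided by $2^N$), while the table size only affects the amplitude accuracy of the generated sine.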

Another thing worth pointing out: if your lookup table is filled with integers (scaled sine values) you may find that the bus width is not enough for several consecutive operations. For that matter, you might use a so-called normalizer block, which is a scale-down operation. To illustrate this: your input is 8-bit and the lookup table also uses 8 bits to represent the sine. When you multiply the two numbers, the worst case requires 16 bits to store the result.

The rule of thumb: when you’re adding two integer numbers, the required bit depth is one more than the larger of the two: result_bit_depth = 1 + max(bit_depth_first, bit_depth_second).

If you’re multiplying two integer numbers, the required bit depth is the sum of the two bit depths: result_bit_depth = bit_depth_first + bit_depth_second.
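The worst cases are easy to check by brute force. The sketch below assumes signed two’s-complement 8-bit operands; the helper name is hypothetical, and the 8-bit right shift is just one possible normalizer.

```python
def signed_bits(value):
    """Smallest two's-complement width that can hold `value`."""
    bits = 1
    while not -(1 << (bits - 1)) <= value <= (1 << (bits - 1)) - 1:
        bits += 1
    return bits


low, high = -128, 127   # range of a signed 8-bit operand

# The extremes of a sum or a product are always reached at the range limits.
sums = [a + b for a in (low, high) for b in (low, high)]
products = [a * b for a in (low, high) for b in (low, high)]

sum_bits = max(signed_bits(v) for v in sums)
product_bits = max(signed_bits(v) for v in products)

# Normalizer: drop the 8 least significant bits to return to an 8-bit bus.
normalized = [p >> 8 for p in products]
normalized_bits = max(signed_bits(v) for v in normalized)

print(sum_bits, product_bits, normalized_bits)
```

The widest product comes from multiplying the two most negative values, which is why the result needs the full sum of the operand widths.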

This is an interesting and very well-reviewed topic, mainly targeted at the microelectronics and FPGA engineers, but if you think this chapter would benefit from a more detailed review please do let me know.

Resampler

Changing the data rate of the digital signal is a complex task with numerous approaches, each with its set of pros and cons. The preferred type may vary based on the subsequent processing stages, a priori information about the signal and limitations of the hardware platform. The upsampling routine may be described as a two-step process: inserting $N-1$ zeros between the source samples, with (optional) follow-up low-pass filtering to suppress the spectrum copies. The filtering type is the main customization point of the upsampling methods. To name a few:

Lack of low-pass filtering. This will induce minimal signal distortion, but is only applicable if the subsequent stages are tolerant to the spectrum copies;
Digital low-pass filtering with a cutoff frequency of $\frac{f_{sOld}}{2}$, to suppress the spectrum copies. Filter design is always a trade-off between suppression level, passband ripple and hardware complexity. However, this method provides one of the best results and is used internally in the resample function in MATLAB;
Low-pass filtering in the frequency domain will achieve the best results in exchange for increased computational complexity. However, this would be a good fit for post-processing applications;
CIC-based filters. This design has been used quite a lot due to its very hardware-friendly architecture with no multiplications involved. However, its frequency response has a very specific roll-off, which is commonly compensated with a follow-up FIR filter, which negates the saving in hardware complexity.

The easiest and most vectorization-friendly way to insert zeros in Python and MATLAB environments is to reshape the input vector:

\[\begin{pmatrix} s_0 \\ s_1 \\ \vdots \\ s_{M-1} \end{pmatrix}\]

into the matrix with zero-filled columns

\[\begin{pmatrix} s_0 & 0 & \cdots & 0 \\ s_1 & 0 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ s_{M-1} & 0 & \cdots & 0 \end{pmatrix}\]

For example, we’ll upsample by the factor of four. After that, to get the vector we need, we’ll reshape the matrix into the flattened vector

\[\begin{pmatrix} s_0 & 0 & \cdots & 0 & s_1 & 0 & \cdots & 0 & \cdots & & s_{M-1} & 0 & \cdots & 0 \end{pmatrix}\]

# Append three zero columns to the column of samples, then flatten row by row
upsampling_factor = 4
upsampled_signal = np.c_[translated_signal, np.zeros((translated_signal.shape[0], upsampling_factor - 1))]
upsampled_signal = upsampled_signal.ravel()
PlotSignal(upsampled_signal, fs * upsampling_factor, 'Upsampled signal')

It will produce a signal with the following spectral and temporal characteristics:

There are two main points in this signal:

The spectrum is periodical and repeated. This is due to the properties of the digital signal
The histogram shows a lot of zeros because for every normally distributed sample we’ve inserted $N - 1$ zeros.

Integer resampling extends the previous task with a downsampling follow-up. Downsampling may be viewed as an inverted upsampling: low-pass filtering followed by signal decimation (taking every $M$th sample). The downsampling filter types are the same; the only difference is the cutoff frequency, $\frac{f_{sOld}}{2M}$, which is the Nyquist frequency of the decimated signal.
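The whole chain (zero insertion, one low-pass filter, decimation) fits in a few lines. This is a minimal sketch with a windowed-sinc filter; the tap count, the Hamming window and the function names are assumptions made for illustration, not a recommendation for a production design.

```python
import numpy as np


def lowpass_taps(num_taps, cutoff):
    """Windowed-sinc low-pass; `cutoff` is a fraction of the sampling rate (0..0.5)."""
    n = np.arange(num_taps) - (num_taps - 1) / 2
    taps = 2 * cutoff * np.sinc(2 * cutoff * n) * np.hamming(num_taps)
    return taps / taps.sum()   # unity gain at DC


def resample(x, up, down, num_taps=129):
    """Change the sampling rate by the rational factor up / down."""
    # 1) Insert up - 1 zeros after every sample.
    stuffed = np.zeros(x.size * up, dtype=x.dtype)
    stuffed[::up] = x
    # 2) One filter at the intermediate rate serves both steps: its cutoff is
    #    the lower of the two Nyquist frequencies. The gain of `up` restores
    #    the amplitude lost to zero insertion.
    cutoff = 0.5 / max(up, down)
    filtered = np.convolve(stuffed, up * lowpass_taps(num_taps, cutoff), mode="same")
    # 3) Keep every down-th sample.
    return filtered[::down]


fs = 38.192e6
tone_hz = 1.0e6
x = np.exp(2j * np.pi * tone_hz * np.arange(4096) / fs)

y = resample(x, up=3, down=4)
fs_new = fs * 3 / 4
peak_hz = np.fft.fftfreq(y.size, d=1 / fs_new)[np.argmax(np.abs(np.fft.fft(y)))]
print(f"{y.size} samples at {fs_new / 1e6:.3f} MHz, tone found at {peak_hz / 1e6:.3f} MHz")
```

The tone keeps its absolute frequency while the sampling rate changes, which is the property a resampler in the digital frontend has to preserve.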

Hardware-friendly optimizations

One of the hardware-efficient downsampling methods I’ve been using a lot in GNSS signal processing is accumulation, in which every output sample is the (normalized) sum of $N$ consecutive input samples.

For the sake of vectorization, interpreted languages can perform accumulation as a reshape followed by row-wise sums. Similar to the upsampling, the input vector

\[\begin{pmatrix} s_0 \\ s_1 \\ \vdots \\ s_{M-1} \end{pmatrix}\]

is reshaped into the $(M / N, N)$ matrix

\[\begin{pmatrix} s_0 & s_1 & \cdots & s_{N-1} \\ s_N & \vdots & \ddots & \vdots \\ \cdots & \cdots & \cdots & s_{M-1} \end{pmatrix}\]

and then summed row-wise

\[\begin{pmatrix} \sum_{i=0}^{N-1} s_i \\ \sum_{i=N}^{2N-1} s_i \\ \vdots \\ \end{pmatrix}\]

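decimation = 4
fs_down = fs / decimation
# Drop the tail so that the length is a multiple of the decimation factor
usable_length = translated_signal.shape[0] // decimation * decimation
time_down = np.arange(usable_length // decimation) / fs_down
reshaped = translated_signal[:usable_length].reshape((-1, decimation))
# Each output sample is the normalized sum of four consecutive input samples
accumulated = reshaped.sum(axis=1) / decimation
PlotSignal(accumulated, fs_down, 'Downsampled signal')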

The point of accumulation is its implicit low-pass filtering with a $\frac{\sin(x)}{x}$-shaped frequency response, which is relatively inefficient from the high-frequency suppression point of view, but it’s one of the easiest digital filters to implement that I’ve seen.
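To put numbers on that response, the accumulator can be treated as an FIR filter with equal taps and evaluated at a few frequencies. A minimal sketch, assuming a factor of four as in the example above; the function name is hypothetical.

```python
import numpy as np


def accumulator_response(decimation, frequency, sample_rate):
    """Magnitude response of a normalized sum of `decimation` consecutive samples."""
    taps = np.full(decimation, 1.0 / decimation)
    n = np.arange(decimation)
    return abs(np.sum(taps * np.exp(-2j * np.pi * frequency / sample_rate * n)))


fs = 38.192e6
for frequency in (0.0, fs / 8, fs / 4, 3 * fs / 8):
    gain = accumulator_response(4, frequency, fs)
    gain_db = 20 * np.log10(max(gain, 1e-12))   # floor avoids log of zero at the nulls
    print(f"{frequency / 1e6:7.3f} MHz: {gain_db:8.1f} dB")
```

The response has exact nulls at multiples of the output sampling rate, but between the nulls the suppression is modest: at three eighths of the input rate the gain is still about 0.27, so only around 11 dB of attenuation.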

Signal packer

The final operation in the digital frontend is to save the final group signal into the local memory in the format, suitable for the used correlator implementation. This is a hardware-dependent block and all the possibilities should be thoroughly benchmarked.

One of the things worth pointing out here is that since the digital frontend can be treated as a continuation of the traditional analog RF frontend, it’s possible to interpret the signal packer as some kind of ADC. For example, if there’s no interference present (or it has been mitigated, more on that in Part 2: Jamming mitigation), the resulting signal can be stored with low precision: 1 or 2 bits, like the ADCs widely used in GNSS frontends.

Another benefit of such low-precision signal storage is the possibility of implementing a correlator block that operates directly on the packed data. This increases the throughput of the block and reduces the required silicon area.
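As an illustration, here is one possible 2-bit packer. The four-level code, the threshold and the bit order within a byte are all assumptions; a real packer would follow whatever layout the correlator expects.

```python
import numpy as np

LEVELS = np.array([-3, -1, 1, 3], dtype=np.int8)   # value represented by each code


def quantize_2bit(samples, threshold):
    """Map real samples to codes 0..3, i.e. to the levels -3, -1, +1, +3."""
    codes = np.where(samples >= 0, 2, 1)
    codes = np.where(samples >= threshold, 3, codes)
    codes = np.where(samples < -threshold, 0, codes)
    return codes.astype(np.uint8)


def pack_2bit(codes):
    """Pack four codes per byte, first sample in the least significant bits."""
    quads = codes.reshape(-1, 4)   # the length must be a multiple of four
    packed = quads[:, 0] | (quads[:, 1] << 2) | (quads[:, 2] << 4) | (quads[:, 3] << 6)
    return packed.astype(np.uint8)


def unpack_2bit(packed):
    """Inverse of pack_2bit: one code per output element."""
    shifts = np.array([0, 2, 4, 6], dtype=np.uint8)
    return ((packed[:, None] >> shifts) & 0b11).reshape(-1)


samples = np.array([-2.5, -0.3, 0.2, 1.7, 0.0, 3.0, -1.0, -1.1])
codes = quantize_2bit(samples, threshold=1.0)
packed = pack_2bit(codes)
restored = LEVELS[unpack_2bit(packed)]
print(packed, restored)
```

Eight input samples end up in two bytes, and the unpacked levels are what a correlator working on this format would multiply with the local replica.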

Curiously, during my experiments with PC-based digital signal processing, the most performant versions turned out to be the float-based ones. This can be explained simply by the fact that with floats we’re only required to perform the operations we intend to: multiply, add, accumulate etc. On the other hand, if we’re dealing with integers (or even packed integers) the number of operations and memory accesses is increased because now it’s necessary to unpack the data, scale the result and pack it back.

Summary

This chapter has provided a brief overview and explanation of the digital frontend block for the software-defined GNSS receivers. It, by no means, is a necessary part of the receiver, but it provides enough flexibility for the system engineers to design the receiver tailored for the specific market and/or applications.

Using the digital frontend in the GNSS receiver design allows decoupling of the synchronous digital processing, performed at the ADC sampling rate, from the asynchronous correlation processing. This, in turn, helps to design a more flexible signal data flow with additional filtering and resampling.

Another essential part of the digital frontend is the jamming detection and mitigation block, but I’ve decided to dedicate a standalone chapter to it since the topic is very deep.

Global Navigation Satellite Systems Software-Defined Receivers explained. Part 2: Jamming mitigation

2022-06-20T00:00:00+00:00

GNSS jamming is something I’m very familiar with and have been involved in for most of my career. My specialist’s thesis (a 6-year degree, the Russian equivalent of bachelor’s and master’s combined) was about real-time narrowband interference mitigation for GNSS receivers with FIR filters. I wrote several papers about that and, eventually, became the scientific supervisor of a big antijamming and antispoofing project. In that project, my team and I investigated various approaches to spatial (digital CRPA) and non-spatial interference and spoofing mitigation. Then we implemented them as MATLAB models and built test benches with real hardware to test the models and algorithms with real-world data. We even designed and implemented antispoofing and CRPA antijamming receivers as working prototypes, operating in real time! As you’ve already guessed, this is a topic I’m very comfortable with and could talk about for hours, but I’ll try to keep it brief and focused.

Overview and effects of the interference

Jamming is the term used for radio-frequency interference with the GNSS signal. It can be unintentional (like the DME landing systems in the L3/L5 subband) or intentional and harmful (jammers and other “personal privacy devices”). The goal of intentional jamming is pretty simple: to disable the GNSS receivers in some area. We’ll omit the reasons for this and just get to the effects and countermeasures.

A receiver affected by jamming will provide a deteriorated solution or even no solution at all. The presence of jamming leads to a decrease in signal-to-noise ratio (SNR) up to the point where signal acquisition and, then, tracking become impossible. For simplicity, we’ll assume that before the jamming mitigation the signal path is linear and the jammer signal is not distorted. For real-world applications, you’d need to look at the 1 dB compression point parameter (P1dB) of the RF frontend and the number of bits of the ADC. For example, if the narrowband tone interference is distorted (in the frontend or by a 2-bit ADC) it’ll turn into a multitone. It’s still possible to mitigate, but at the cost of an additional SNR loss.

To illustrate this let’s add a tone interference to the signal we’re working on:

interference_offset = 1.023e6
interference_level = 25
signal_with_interference = np.ceil(signal_data + interference_level * np.cos(-2 * np.pi * time  * (intermediate_frequency + interference_offset)))

PlotSignal(signal_with_interference.astype(np.int8), fs, 'Signal with narrowband interference')

In the following figure, we can see the main interference tone and some harmonics due to the quantization error (ceil + cast to int8 for the following clipping demonstration). Another interesting point here is the histogram. Usually, in the absence of interference, the signal histogram is bell-shaped due to the normal distribution of the input signal, which is caused by the satellite signal being buried ~20 dB beneath the noise. However, if we add the narrowband interference, it’ll shift to the ramp shape, since harmonic signals “spend” most of their time near their highest and lowest values.

When the GNSS receiver uses low-bit ADCs, they clip the input signal. For example, let’s demonstrate the simplest case of the 1-bit ADC, which is effectively a comparator. When you clip a sine into two levels it becomes a square wave. As you may know, the spectrum of the square wave is full of harmonics with a $\frac{\sin(f)}{f}$ envelope. This is exactly what we see in the figure below:
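The effect can be reproduced with a clean tone and no noise at all. The sketch below models the 1-bit ADC as a sign function; the tone frequency is chosen to give a whole number of periods per buffer (an assumption that keeps the spectrum free of leakage).

```python
import numpy as np

n = 8192
cycles = 50   # whole number of periods in the buffer
phase = 2 * np.pi * cycles * np.arange(n) / n + 0.1   # offset avoids exact zero crossings
tone = np.cos(phase)

one_bit = np.sign(tone)   # 1-bit ADC modelled as a comparator

spectrum = np.abs(np.fft.rfft(one_bit)) / n
fundamental = spectrum[cycles]
second = spectrum[2 * cycles]
third = spectrum[3 * cycles]

print(f"3rd harmonic relative to the fundamental: {third / fundamental:.3f}")
print(f"2nd harmonic relative to the fundamental: {second / fundamental:.2e}")
```

Only the odd harmonics appear, and the third one is about a third of the fundamental (roughly 9.5 dB down), so a single strong tone at the input of a 1-bit ADC becomes a whole comb of interferers.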

Narrowband interference

Narrowband (single-tone and multitone) interference is a common thing to spot in GNSS receivers. Since the signals are weak, it’s easy to pick up spurs from both external and internal equipment. There are two approaches to interference mitigation, based on the domain in which the filtering is performed: temporal and frequency. The former is more suitable for hardware-oriented approaches or systems with very limited resources, while the latter is more software-friendly and provides better results.

Digital filters approach

There are two kinds of digital filters: finite and infinite impulse response (FIR and IIR respectively). The main difference between them is that the latter is recursive and can therefore be unstable. Another downside of IIR filters is that there’s no way to design a causal filter with an exactly linear phase. However, there’s a significant advantage: it’s possible to design an IIR filter with the same attenuation characteristics at a significantly lower order and, therefore, with less silicon area.

For GNSS jamming mitigation it’s reasonable to go with the FIR-filters because it’s possible to design a filter with a linear phase response. During my research, I’ve found a very elegant and computationally-efficient design method I’d like to share: the frequency response sampling method. To demonstrate this algorithm I’ll use two additional tones at -1.023 MHz and 2.046 MHz on top of the translated signal:

If we use such a signal as-is with the matched filter we won’t be able to distinguish the signal from the noise:

The idea and the algorithm are simple: create an ideal frequency response with zeros for frequencies to suppress and ones otherwise:

The second step is to apply the phase multiplier to create a linear phase response. The preliminary impulse response is obtained by the inverse Fourier transform, but such a system is susceptible to the Gibbs phenomenon, resulting in a massive suppression performance degradation. As a final step, to mitigate this effect, a window function is applied.
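The steps above can be sketched in a few lines of NumPy. This is not the notebook’s implementation, just a minimal illustration under stated assumptions: the function names, the notch width, the tap count and the choice of a Hamming window are all hypothetical.

```python
import numpy as np


def design_notch_fir(num_taps, sample_rate, notch_frequencies, notch_width):
    """Frequency-sampling design of a complex linear-phase multi-notch FIR."""
    if num_taps % 2 == 0:
        raise ValueError("use an odd number of taps so the delay is a whole number of samples")
    bin_frequencies = np.fft.fftfreq(num_taps, d=1 / sample_rate)
    # Step 1: ideal response, ones everywhere and zeros around each interferer.
    response = np.ones(num_taps, dtype=complex)
    for notch in notch_frequencies:
        response[np.abs(bin_frequencies - notch) <= notch_width / 2] = 0
    # Step 2: linear-phase term, i.e. a delay of (num_taps - 1) / 2 samples.
    delay = (num_taps - 1) // 2
    response *= np.exp(-2j * np.pi * np.arange(num_taps) * delay / num_taps)
    # Step 3: the inverse FFT gives the preliminary impulse response.
    impulse_response = np.fft.ifft(response)
    # Step 4: the window tames the Gibbs ripple.
    return impulse_response * np.hamming(num_taps)


def gain_db(taps, frequency, sample_rate):
    """Filter gain at a single frequency, in dB."""
    n = np.arange(taps.size)
    gain = abs(np.sum(taps * np.exp(-2j * np.pi * frequency / sample_rate * n)))
    return 20 * np.log10(gain + 1e-12)


fs = 38.192e6
taps = design_notch_fir(255, fs, notch_frequencies=[-1.023e6, 2.046e6], notch_width=1.0e6)
for frequency in (-1.023e6, 2.046e6, 8.0e6):
    print(f"{frequency / 1e6:+7.3f} MHz: {gain_db(taps, frequency, fs):7.1f} dB")
```

Because the notches are placed independently for positive and negative frequencies, the taps come out complex, and the window makes each notch somewhat wider and shallower than the ideal response suggests, so the notch width has to cover several frequency bins.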

All those steps are implemented in the GetIr function in the accompanying Jupyter Notebook. The resulting impulse response with a linear phase and nulls at the frequency locations of the interference looks like this:

Bear in mind, that since the input signal (and interference) is complex, we’d need a complex frequency response to mitigate it and, therefore, a complex impulse response. After the complex FIR processing, we can observe the mitigated interference:

The temporal processing suppresses the narrowband interference and makes it possible to locate the satellite signals:

Frequency domain approach

The frequency-domain approach is more demanding in terms of memory but provides much better results. The main idea behind it is to detect the interference, zero out the corresponding frequency components of the input signal’s Fourier transform and then perform an inverse transform to get a signal for the subsequent processing.
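A minimal sketch of that idea follows. The detector (a multiple of the median bin magnitude) and the synthetic test signal are assumptions made for illustration; a real receiver would process the stream block by block and would likely add windowing and overlap to deal with spectral leakage.

```python
import numpy as np


def excise_interference(samples, threshold_factor=5.0):
    """Zero the FFT bins whose magnitude exceeds a multiple of the median magnitude."""
    spectrum = np.fft.fft(samples)
    magnitude = np.abs(spectrum)
    mask = magnitude > threshold_factor * np.median(magnitude)
    spectrum[mask] = 0
    return np.fft.ifft(spectrum), mask


rng = np.random.default_rng(1)
n = 4096
noise = rng.normal(size=n) + 1j * rng.normal(size=n)
tone_bin = 300   # bin-centred tone keeps this demo free of leakage
tone = 25 * np.exp(2j * np.pi * tone_bin * np.arange(n) / n)

jammed = noise + tone
cleaned, mask = excise_interference(jammed)

power_before = np.mean(np.abs(jammed) ** 2)
power_after = np.mean(np.abs(cleaned) ** 2)
print(f"bins removed: {mask.sum()}, power before: {power_before:.1f}, after: {power_after:.2f}")
```

The median is used for the threshold because, unlike the mean, it barely moves when a few bins carry a very strong interferer.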

Another benefit of frequency-based processing is the lack of filter-induced group delay. This is just a constant for a traditional linear-phase FIR-filter, but it’s another thing to keep in mind and pass to the observable block and, therefore, a potential source of errors.

For the same signal and interference mix we’ve used in the temporal approach demonstration, we may observe the superiority of the frequency domain-based approach:

The matched filter will again prove that we’re suppressing the interference well and the signal can be found:

Wideband interference

Unlike narrowband interference, wideband interference covers the whole frequency band of the GNSS signals. One of the most common and easy-to-assemble jamming devices produces a chirp signal. A chirp, or sweep, is a signal whose frequency varies over time. The most common is linear frequency sweep interference, but there’s an interesting ‘tick’ swept-frequency signal worth investigating (more information in Impact Analysis of Standardized GNSS Receiver Testing against Real-World Interferences Detected at Live Monitoring Sites by NSL). There are also some debates going on in academia regarding pseudo-white noise interference, but I’ve never seen it implemented in the real world due to heavy practical limitations like the crest factor.

Usually, wideband jammer signals would look like this with an additional (not displayed here) shape of the RF filter in the antenna:

As you can see in the figure above, a regular Fourier transform will render all the efforts useless, since it’s impossible to separate the jammer signal from the satellite signal.

To get around this limitation we can use an algorithm called Short-Time Fourier Transform (STFT) and plot the resulting time-frequency dependency, called a spectrogram. It splits the input signal into overlapping segments and calculates the Fourier transform for each of them independently. The length of the segments is selected to match the required time resolution, or, simply put, the portion of the signal where the frequency of the interference is relatively constant.
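Here is a small sketch of the segment-wise transform on a synthetic chirp, assuming SciPy’s `signal.stft`. The sampling rate, sweep limits and segment length are made-up values picked so that the frequency changes only a little within one segment:

```python
import numpy as np
from scipy import signal

fs = 10e6                       # hypothetical sampling rate, Hz
samples_per_sweep = 200         # 20 us sweep period at 10 MHz
f_start, f_stop = -2e6, 2e6     # sweep limits at baseband, Hz

n = 5 * samples_per_sweep
tau = (np.arange(n) % samples_per_sweep) / fs          # time inside each sweep
sweep_rate = (f_stop - f_start) * fs / samples_per_sweep
# Phase is the integral of the instantaneous frequency within a sweep
chirp = np.exp(2j * np.pi * (f_start * tau + 0.5 * sweep_rate * tau ** 2))

# 32-sample segments (3.2 us) with 75 % overlap; complex input gives a two-sided result
frequencies, times, stft = signal.stft(chirp, fs=fs, nperseg=32, noverlap=24,
                                       return_onesided=False)
peak_frequency = frequencies[np.argmax(np.abs(stft), axis=0)]
print("peak frequency per segment, MHz:", np.round(peak_frequency[:12] / 1e6, 2))
```

The strongest bin moves from segment to segment following the sweep, which is exactly the information a single Fourier transform over the whole record throws away.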

This kind of jammer is very efficient, as you can see in the matched filter output:

Digital filters approach

The approach I’ve tested numerous times is described in the Chirp Mitigation for Wideband GNSS Signals with Filter Bank Pulse Blanking paper. It splits the input signal into the $N$ non-overlapping subbands with digital filters followed by the blanking device. The blanker compares the magnitude of the filtered signal with some threshold and, if it’s exceeding the threshold, zeros the signal. Afterwards, the subbands are summed to reconstruct the whole signal.

For example, to synthesize the filter bank for this approach we can use the following Python code. This is by no means optimal and should be reviewed and investigated much further for the production-grade receiver, but it works for demonstration purposes.

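from scipy import signal

# fs is the sampling rate of the recorded signal in Hz, defined earlier
bandwidth = 0.5e6
frequency_start = 7.5e6
frequency_stop = 11.5e6

# Split the band of interest into adjacent, non-overlapping subbands
frequency_bands = []
current_start = frequency_start
while current_start < frequency_stop:
  frequency_bands.append([current_start, current_start + bandwidth])
  current_start += bandwidth

# One 512-tap Hamming-windowed bandpass FIR filter per subband
# (the nyq argument is dropped: it was removed from recent SciPy releases)
filter_bank = []
for current_range in frequency_bands:
  filter_bank.append(signal.firwin(512, current_range, width=None, window='hamming', pass_zero='bandpass', scale=True, fs=fs))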

This code will generate the filters with the following frequency responses:

Modern chirp jammers have a sweep period of around 20 to 100μs and the described method effectively blanks each subband multiple times during each accumulation period.
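The blanking stage itself can be sketched as follows. This is my simplified reading of the method rather than the paper’s implementation: the sampling rate, the 256-tap filters, the synthetic chirp and the threshold rule (four times the noise level expected in one subband) are all assumptions made for the demonstration.

```python
import numpy as np
from scipy import signal

def filter_bank_pulse_blanking(samples, filter_bank, threshold):
    """Split into subbands, zero every subband sample above the threshold, sum up."""
    output = np.zeros(len(samples))
    for taps in filter_bank:
        subband = signal.lfilter(taps, 1.0, samples)
        subband[np.abs(subband) > threshold] = 0.0    # the blanker
        output += subband
    return output

fs = 32e6                                   # hypothetical sampling rate, Hz
bandwidth, f_start, f_stop = 0.5e6, 7.5e6, 11.5e6
edges = np.arange(f_start, f_stop + bandwidth / 2, bandwidth)
filter_bank = [signal.firwin(256, [low, high], pass_zero='bandpass', fs=fs)
               for low, high in zip(edges[:-1], edges[1:])]

# Synthetic input: unit-variance noise plus a chirp sweeping the band every 20 us
rng = np.random.default_rng(0)
samples_per_sweep = 640                     # 20 us at 32 MHz
n = 50 * samples_per_sweep
tau = (np.arange(n) % samples_per_sweep) / fs
sweep_rate = (f_stop - f_start) * fs / samples_per_sweep
chirp = 10 * np.cos(2 * np.pi * (f_start * tau + 0.5 * sweep_rate * tau ** 2))
jammed = rng.standard_normal(n) + chirp

# Noise in one subband keeps roughly bandwidth / (fs / 2) of the input noise power
threshold = 4 * np.sqrt(bandwidth / (fs / 2))
no_blanking = filter_bank_pulse_blanking(jammed, filter_bank, np.inf)
mitigated = filter_bank_pulse_blanking(jammed, filter_bank, threshold)
print("power without blanking:", np.mean(no_blanking ** 2))
print("power with blanking:   ", np.mean(mitigated ** 2))
```

Because the chirp occupies only one or two subbands at any instant, the remaining subbands keep passing the (noise-like) satellite signal while the jammed ones are zeroed.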

Using this approach we can achieve this kind of mitigation:

Even though the cleared spectrum doesn’t look very similar to the input one (before jamming), the satellite signal beneath the noise floor is still there, and can be acquired and tracked:

Frequency domain approach

As I’ve mentioned earlier, it’s nearly impossible to distinguish between the jammer and satellite signals with the regular Fourier transform, which is why we’re using the STFT for the signals with wideband interference. The segments of the STFT are divided in a manner to keep the jammer frequency relatively constant. With that in mind, the interference is mitigated the same way as in the narrowband case.

Like the temporal pulse blanking approach, the signal is detectable after the jamming mitigation:

Space-based approaches

The most advanced jamming mitigation algorithms use space-time processing via the special antennas called antenna arrays. It is possible to manipulate levels and phases from each individual antenna element to modify the equivalent antenna radiation shape to form nulls and focused beams in the combined signal.

Unfortunately, since this approach requires multiple antennas and multiple synchronized RF frontends, it drastically increases the device cost and is rarely used for the traditional (civil) GNSS receivers.

The main idea behind the space-time GNSS processing is to fix the signal from the reference antenna element and to manipulate the levels and phases of the other elements to minimize the output power. Since the GNSS signal is well below the noise floor, everything exceeding it is considered to be an interference and should be mitigated.

Since there are far fewer openly available datasets from antenna arrays, I’ll leave this chapter without examples, possibly to return and revise it in the future.

A note about spoofing

Whether to treat spoofing as interference or not is a rather controversial topic. In my personal opinion, since the spoofing detection and mitigation approaches are vastly different from the antijamming algorithms, they should be discussed separately.

There are two ways to counter the spoofing threat:

Antenna-based via space-time or space-polarization processing. The latter may require a special antenna and is being investigated by Septentrio.
Individual satellite signal processing with multi-decision acquisition, parallel tracking and decoding. With that approach, the receiver tracks all the copies of the signal simultaneously, followed by duplicate rejection and grouping or clustering. After that, the cluster with the highest confidence is selected as a PVT candidate.

Like spatial processing, this is a deep and interesting topic, and I intend to get back to it in the future.

Summary

GNSS signal jamming is a major threat in the current world. Thankfully, some algorithms and countermeasures allow for preserving the resilient PNT even in case of major interference.

We’ve inspected temporal and frequency-based approaches to mitigate narrowband and wideband interference, well suited for both traditional and software-defined receivers.

Global Navigation Satellite Systems Software-Defined Receivers explained. Part 3: Acquisition

2022-06-20T00:00:00+00:00

Signal acquisition is a process of a rough estimation of partial code delay and pseudo-Doppler frequency to provide a bootstrap for the tracking loops. In the modern software-defined receivers there’s a designated hardware/software block called the Fast Search Engine (FSE) that is used to speed up the initialization of the receiver in the case of a cold start.

Part 0: Hardware and system design
Part 1: Digital frontend
Part 2: Jamming mitigation
Part 3: Acquisition
Part 4: Correlation and tracking
Part 5: Standalone positioning
Part 6: Advanced positioning methods

Overview

Before we start with acquisition let’s revisit the model of the signal, simplified for one satellite in one frequency band without the multipath effect:

\[s(t) = AC(t - \tau)D(t - \tau)e^{j(2\pi (f + f_D) t + \phi_0)} + n(t)\]

In that model $A$ represents the signal amplitude, $\tau$ is the total code delay, $C(t - \tau)$ is the delayed ranging code, $D(t - \tau)$ is the delayed navigation data message, $f$ is the carrier frequency of the satellite, $f_D$ is the pseudo-Doppler frequency offset (more on the pseudo- part later), $\phi_0$ is the initial phase and $n(t)$ is the additive noise of the receiver.

The goals of the acquisition engine are:

Determine the partial code delay by estimating the ranging code $C(t - \tau)$ phase ($\tau \bmod T_{code}$, where $T_{code}$ is the code duration)
Evaluate the $f_D$ pseudo-Doppler frequency of the signal
(Optional) Estimate the $D(t - \tau)$ navigation message partial phase (bit switch event)

A note about the pseudo- in the Doppler frequency offset

As you’re aware, the Doppler effect is the change of the signal frequency proportional to the radial velocity between the emitter and the observer. The radial velocity occurs when those two objects either approach or move away from each other.

I’m sorry for an offtopic, but this explanation is just hilarious:

The pure (so to speak) Doppler effect is observed when the observer also produces the initial signal with the same reference oscillator, for example, a traditional active (monostatic) radar.

On the other hand, if the reference oscillators of the emitter and the observer aren’t the same, an additional frequency offset may be observed. To illustrate this, let’s imagine that a GPS satellite is in the zenith (directly above) and a (simplified) stationary receiver with a reference oscillator with a nominal frequency $f_{ref}$ of 10 MHz. However, due to the temperature and ageing, the frequency is slightly off, let’s say 9.99 MHz.

A simple single-conversion IQ RF frontend would use some kind of PLL to multiply the reference frequency $f_{ref}$ to become $f_{LO}$ of 1575.42 MHz. For the sake of simplicity, let’s assume a fractional-N PLL with a coefficient of 157.542. If we multiply the “real” $f_{ref}$ by that coefficient, we’ll get roughly 1573.84 MHz.

So simply because of the single heterodyne, we’ll get more than 1.57 MHz of an additional frequency shift, and we haven’t even started with the sampling effects of the ADC with the mismatched clock.

Thankfully, modern GNSS receivers use Temperature Compensated Crystal Oscillators (TCXO) as a standard option with good stability and accuracy for a reasonable price and the pseudo- part of the Doppler offset is kept within the tens and hundreds of Hz, but that’s something worth remembering.
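The arithmetic above is easy to reproduce. In the sketch below the multiplier is simply the ratio of the nominal LO and reference frequencies, and the ±0.5 ppm TCXO tolerance is a figure I picked for illustration, not a value from any datasheet:

```python
f_ref_nominal = 10e6        # nominal reference oscillator frequency, Hz
f_ref_actual = 9.99e6       # the "real" (detuned) reference, Hz
f_lo_nominal = 1575.42e6    # desired local oscillator frequency, Hz

pll_multiplier = f_lo_nominal / f_ref_nominal      # 157.542
f_lo_actual = f_ref_actual * pll_multiplier
lo_offset = f_lo_nominal - f_lo_actual

print(f"PLL multiplier: {pll_multiplier:.3f}")
print(f"actual LO:      {f_lo_actual / 1e6:.2f} MHz")
print(f"LO offset:      {lo_offset / 1e6:.3f} MHz")

# The same calculation for a hypothetical TCXO with a 0.5 ppm frequency error
tcxo_error_ppm = 0.5
tcxo_lo_offset = f_lo_nominal * tcxo_error_ppm * 1e-6
print(f"TCXO LO offset: {tcxo_lo_offset:.1f} Hz")
```

A relative error of the reference oscillator translates one-to-one into a relative error of the LO, so it is the ppm figure that matters, not the absolute frequency of the crystal.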

It’s also worth pointing out that there are devices called GNSSDOs (GNSS-disciplined oscillators), that estimate the offset of the reference oscillator by an additional step in the positioning phase called the speed equation: by using the same maths on pseudo-Doppler shifts instead of the pseudoranges it’s possible to estimate the speed of the receiver and the clock drift (with $\frac{seconds}{seconds}$ dimension) instead of the coordinates and a clock shift with the traditional navigation equation.
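A rough sketch of that speed equation on made-up data, assuming NumPy. The geometry, satellite velocities and the “true” receiver state are random or invented, and the model ignores satellite clock drift and atmospheric effects, so treat it as an illustration of the maths rather than a positioning routine:

```python
import numpy as np

C = 299_792_458.0        # speed of light, m/s
F_L1 = 1575.42e6         # GPS L1 carrier frequency, Hz

rng = np.random.default_rng(2)
n_sat = 6
# Hypothetical unit line-of-sight vectors (receiver -> satellite), upper hemisphere
los = rng.standard_normal((n_sat, 3))
los[:, 2] = np.abs(los[:, 2])
los /= np.linalg.norm(los, axis=1, keepdims=True)
sat_velocity = rng.uniform(-3000, 3000, size=(n_sat, 3))    # m/s, made up

true_velocity = np.array([10.0, -5.0, 0.5])     # receiver velocity, m/s
true_drift = 1e-7                               # receiver clock drift, s/s

# Forward model: pseudorange rate and the pseudo-Doppler shift it produces
range_rate = np.sum((sat_velocity - true_velocity) * los, axis=1) + C * true_drift
pseudo_doppler = -range_rate * F_L1 / C

# Inverse problem: the design matrix has the same shape as in the navigation equation
measured_rate = -pseudo_doppler * C / F_L1
design = np.hstack([-los, np.ones((n_sat, 1))])
rhs = measured_rate - np.sum(sat_velocity * los, axis=1)
solution, *_ = np.linalg.lstsq(design, rhs, rcond=None)
velocity, drift = solution[:3], solution[3] / C

print("estimated velocity, m/s:", velocity)
print("estimated clock drift, s/s:", drift)
```

The fourth unknown plays the role of the clock shift in the position solution, only here it comes out in metres per second and is divided by the speed of light to get the dimensionless drift.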

Timing receivers and time scales are a deep and interesting topic of their own; I highly recommend the Global synchronization and near-Earth movement control satellite systems book by Prof. Povalyaev (unfortunately, only available in Russian). It’s quite difficult to understand, but really worth it.

Fast Search Engine

As I’ve mentioned earlier, the acquisition engine is used to determine the partial delay and the pseudo-Doppler frequency offset. Therefore, the acquisition process may be treated as a 2-dimensional maximum search with a certain threshold. There are three customization points:

Search range: usually, the whole code period for the delay axis and ±5 kHz for the pseudo-Doppler offset. The frequency range should be selected for the assumed receiver dynamics and the accuracy of the onboard reference oscillator
Search step: determines the granularity and is defined by the initialization bandwidth of the (following) tracking loops on one hand and timing constraints on the other.
Accumulation quantity and type: how many milliseconds we accumulate and how exactly the accumulation is performed (coherent vs non-coherent)

The acquisition process in GNSS receivers is operating in the so-called soft real-time mode.

A quick detour to explain the real-time operations and the difference between the modes:

Real-time is all about constraints and, technically speaking, isn’t about fast software. Depending on the consequences of missing the processing deadline, three types of real-time modes can be distinguished:

Non-real-time: this is the most widespread type of operation, where no timeframe is specified.

Soft real-time provides a timeframe for a response, but if the system misses the deadline, the user observes a temporary degradation followed by recovery. A great example is a video player: if, for some reason, a video frame is missed, the player will produce output with some artefacts, but the movie will continue to play.

Hard real-time has a stricter timeframe: if you miss it, the whole system is rendered useless. Therefore, hard real-time systems preferably run either bare-metal or with the thinnest scheduler-type pseudo-OS.

The software-friendly acquisition engine saves the signal samples synchronously upon receiving an acquisition request, followed by an asynchronous process of 2-dimensional search. The first step is to reduce the data rate to lower the number of required calculations. It is reasonable to reduce the sampling to the minimal feasible ($f_S= 2f_{code}$) due to the usage of the code tracking loops.

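import numpy as np
from scipy import signal

# translated_signal (baseband samples) and fs (its sampling rate) come from the previous steps
fs_acquisition = 2.046e6  # two samples per chip of the 1.023 MHz code
samples_per_ms = int(fs_acquisition / 1e3)
ms_to_process = 4
acquisition_signal = signal.resample(translated_signal[0:int(ms_to_process * fs / 1e3)], int(ms_to_process * samples_per_ms))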

The next step is to slice the 2-dimensional search window into consecutive iterations with a fixed Doppler offset and run the matched filter calculations on the signal. This is followed by the accumulation, finding the maximum (peak value and location) and comparing it with a threshold.

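# time_acquisition holds the sample times in seconds and code is the ranging code
# resampled and repeated to the length of acquisition_signal
max_peak_value = 0.0
best_doppler, best_delay = None, None
code_spectrum_conj = np.conj(np.fft.fft(code))  # does not depend on the Doppler bin

for doppler in range(-7000, 7000 + 1, 50):
  # Wipe off the candidate Doppler offset
  carrier = np.exp(-1j * 2 * np.pi * doppler * time_acquisition)
  translated = acquisition_signal * carrier
  # Circular correlation with the code via the convolution theorem
  matched_output = np.fft.ifft(np.fft.fft(translated) * code_spectrum_conj)
  matched_output_ms = matched_output.reshape((ms_to_process, samples_per_ms))
  if use_coherent_acquisition:
    matched_output_ms = np.abs(matched_output_ms.sum(axis=0))
  else:
    matched_output_ms = np.abs(matched_output_ms).sum(axis=0)

  # Keep the strongest peak together with its location in both dimensions
  current_peak_value = np.max(matched_output_ms)
  if current_peak_value > max_peak_value:
    max_peak_value = current_peak_value
    best_doppler = doppler
    best_delay = int(np.argmax(matched_output_ms))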

The result of the acquisition of all the GPS satellites can be illustrated like this:

Accumulation in the acquisition

One of the important questions in the acquisition is accumulation. To increase the detection probability in case of the navigation message sign change or shadowing it’s common to operate on several consecutive milliseconds followed by accumulation. There are two approaches for accumulation: coherent and non-coherent.

In the acquisition routine we don’t care about the carrier phase, so the difference between the two types of accumulation is just the order of the magnitude and sum operations:

matched_output_ms = matched_output.reshape((ms_to_process, samples_per_ms))
if use_coherent_acquisition:
  matched_output_ms = np.abs(matched_output_ms.sum(axis=0))
else:
  matched_output_ms = np.abs(matched_output_ms).sum(axis=0)

The coherent accumulation usually results in a “sharper” 2-dimensional ambiguity body but has a major downside: it is prone to navigation message bit sign changes. For example, when working with 4 milliseconds where the first two are positive and the last two are negative, the contributions cancel each other out, resulting in a massively decreased signal-to-noise ratio.

This is also the case for the relatively high-speed data transfer signals, like the SBAS L1 or the signals with overlay codes like the BeiDou B1I.
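The sign-change problem is easy to see on an idealised, noise-free example. The four values below stand for the complex correlation peaks of four consecutive milliseconds; the common phase of 0.3 rad is arbitrary:

```python
import numpy as np

def coherent(peaks):
    """Sum the complex values first, then take the magnitude."""
    return np.abs(peaks.sum())

def non_coherent(peaks):
    """Take the magnitudes first, then sum them."""
    return np.abs(peaks).sum()

phase = np.exp(1j * 0.3)
peaks_without_flip = np.array([1, 1, 1, 1]) * phase
peaks_with_flip = np.array([1, 1, -1, -1]) * phase    # bit sign change after 2 ms

for name, peaks in (("no flip", peaks_without_flip), ("flip", peaks_with_flip)):
    print(f"{name:8s} coherent: {coherent(peaks):.3f}  non-coherent: {non_coherent(peaks):.3f}")
```

Without the flip both sums give 4, with the flip the coherent sum collapses to zero while the non-coherent one is unaffected. The price of the non-coherent sum is that noise magnitudes add up as well, which is why the coherent variant is preferred whenever the bit borders are known.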

Matrix-oriented acquisition

The sliced approach, described above, is often used in memory-limited devices like ASICs or FPGAs. However, for PC-based receivers (both CPU- and GPU-based) a matrix-based algorithm can be used. It benefits from the pre-existing highly optimized matrix multiplication libraries and massively parallel architectures at the cost of higher memory consumption.

To illustrate this with an example we’ll use the following data:

$N$ milliseconds of data at $f_S$ sampling rate, resulting in a $(N f_S \cdot 10^{-3}, 1)$ vector, which we’ll denote as an $(n, 1)$ vector
Vector of Doppler offset frequencies within the range of $\pm F_D$ Hz and a step of $\Delta f_D$ Hz, resulting in a $(1, \frac{2F_D}{\Delta f_D} + 1)$ vector, which we’ll denote as a $(1, p)$ vector

To match the dimensions we’ll need to prepare a translation matrix of size $(n, p)$ by calculating the complex exponent of frequencies over the $N$ milliseconds (see the translation chapter).

To translate the input signal we’ll perform an elementwise multiplication of the input signal and the translation matrix, resulting in a matrix of the same size $(n, p)$.

The next step is to take the Fourier transform of this matrix, but keep in mind that this is the traditional 1-dimensional DFT and not the 2-dimensional DFT. This means that we need only the column-wise transform without the follow-up row-wise calculations. The upside here is that it can be parallelized to utilize the multicore architecture.

According to the convolution theorem, to get the result we’ll elementwise multiply the transformed matrix by the complex-conjugated Fourier transform of the code, followed by the inverse Fourier transform. Accumulation and maximum search are performed as in the usual acquisition engine, but with fewer temporary variables and better memory locality.
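The steps above can be sketched with NumPy broadcasting. To keep the example self-contained I use a random ±1 sequence as a stand-in for a real ranging code, and the delay, Doppler offset and noise level are invented for the demonstration:

```python
import numpy as np

rng = np.random.default_rng(1)
fs = 2.046e6
samples_per_ms = 2046
n_ms = 2
n = n_ms * samples_per_ms

# Stand-in ranging code: 1023 random chips, 2 samples per chip, repeated for n_ms
chips = rng.choice([-1.0, 1.0], size=1023)
code = np.tile(np.repeat(chips, 2), n_ms)                         # (n,)

# Synthetic received signal with a known delay and Doppler offset
true_delay, true_doppler = 345, 1250.0
t = np.arange(n) / fs
received = np.roll(code, true_delay) * np.exp(2j * np.pi * true_doppler * t)
received = received + 0.5 * (rng.standard_normal(n) + 1j * rng.standard_normal(n))

dopplers = np.arange(-5000, 5000 + 250, 250.0)                    # (p,)
translation = np.exp(-2j * np.pi * np.outer(t, dopplers))         # (n, p)
translated = received[:, None] * translation                      # (n, p)

spectrum = np.fft.fft(translated, axis=0)                         # column-wise 1-D DFT
code_spectrum_conj = np.conj(np.fft.fft(code))[:, None]           # (n, 1)
correlation = np.fft.ifft(spectrum * code_spectrum_conj, axis=0)  # (n, p)

# Non-coherent accumulation over the milliseconds, then a 2-D maximum search
accumulated = np.abs(correlation.reshape(n_ms, samples_per_ms, -1)).sum(axis=0)
delay_idx, doppler_idx = np.unravel_index(np.argmax(accumulated), accumulated.shape)
print("delay, samples:", delay_idx, "Doppler, Hz:", dopplers[doppler_idx])
```

The translation matrix depends only on the search grid, so in a real receiver it can be computed once and reused for every satellite.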

Fine acquisition and bit border detection

One of the nice little acquisition tricks up my sleeve I’ve learned during my university years is the combined fine acquisition and bit border detection. Usually, when there’s an acquisition candidate, the first tracking step is the frequency locked loop (FLL) for some time followed by a phase-locked loop (PLL). However, this approach may take some time to obtain stable tracking, up to several seconds.

The idea is that for each successful acquisition we run the correlator as-is, without any tracking attempts, for 32 milliseconds to see something like this:

In the lower-left, you can see the correlator outputs and in the lower right the FFT output. As you can see, there are two low peaks in the spectrum of the correlator outputs. The fact that there is a finite number of bit shift combinations (I’ll leave it up to you to write them all down if you decide to implement this yourself) allows us to test multiple hypotheses about the bit shift location and reduce the Doppler frequency ambiguity to no more than 31.25 Hz ($\frac{1000Hz}{32}$).

For example, with one of the combinations we can turn the sign-changing sine to the smooth one, fixing the 200 Hz Doppler offset and finding the sign change at the 8th millisecond:
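One possible way to organise the hypothesis test is sketched below. It is my reconstruction under simplifying assumptions (20 ms bits as in GPS L1 C/A, synthetic correlator outputs, a residual Doppler of 187.5 Hz chosen to sit exactly on an FFT bin), and the name `find_bit_border` is made up:

```python
import numpy as np
from itertools import product

N_MS = 32        # number of 1 ms correlator outputs
BIT_MS = 20      # navigation bit duration in ms (GPS L1 C/A)
k = np.arange(N_MS)

def find_bit_border(outputs):
    """Try every border position and sign pattern, keep the sharpest FFT peak."""
    best_peak, best_border, best_doppler = 0.0, None, None
    for border, (s1, s2) in product(range(BIT_MS), product((1, -1), repeat=2)):
        # Bit index of each millisecond: 0 before the border, then 1, then 2
        bit_index = (k + BIT_MS - border) // BIT_MS
        pattern = np.array([1, s1, s2])[bit_index]
        spectrum = np.abs(np.fft.fft(outputs * pattern))
        if spectrum.max() > best_peak:
            best_peak = spectrum.max()
            best_border = border
            best_doppler = np.fft.fftfreq(N_MS, d=1e-3)[spectrum.argmax()]
    return best_border, best_doppler

# Synthetic correlator outputs: sign change at the 8th millisecond plus a little noise
rng = np.random.default_rng(3)
data_bits = np.where(k < 8, 1.0, -1.0)
outputs = data_bits * np.exp(2j * np.pi * 187.5 * k * 1e-3)
outputs = outputs + 0.1 * (rng.standard_normal(N_MS) + 1j * rng.standard_normal(N_MS))

border, doppler = find_bit_border(outputs)
print("bit border at ms:", border, "residual Doppler, Hz:", doppler)
```

Some hypotheses are duplicates (for instance, every border position with no sign change at all), which is harmless here but worth pruning in a real implementation.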

Summary

Signal acquisition is an essential part of every GNSS receiver. The basic algorithm is the same, but the details are system- and signal-dependent. The traditional approach for multi-band receivers is to acquire whichever signal’s easier for you and use this data to bootstrap the tracking in the other frequency bands.

The main challenge in the acquisition algorithms, in my opinion, is overcoming the memory and computational limits and implementing everything accurately because acquisition sensitivity is usually much lower than the tracking, therefore errors and mismatches here and there may lead to even further degradation.

FIR-based electric guitar cabinet simulation explained

2022-05-10T00:00:00+00:00

It is extremely useful to have a little vacation every once in a while, to distract yourself from your day-to-day routine. You may find yourself thinking about various tasks you were too busy to give some serious time to. One such task that intrigued me is guitar cabinet simulation via the impulse responses of a real cabinet-microphone pair.

Introduction

Let’s analyze the traditional electric guitar signal chain. We have a nice looking strat, some guitar cable, effects (missing here, as well as the effects loop), amplifier and the cabinet. Every part of this chain is important. The guitar (with your help, of course) produces the music, which is then transferred by the guitar cable to the amplifier.

Electric guitar amplifiers are different species than the HiFi ones. This is because the main purpose of the amplifier is not to make the signal louder, but to distort it in a way, that’s pleasant to the human ear. HiFi ones, on the contrary, are designed to be as transparent as possible. At the end of the signal chain, there is a cabinet, which is, mainly, a speaker in a box. This is what we’ll discuss further in this article.

One part of the signal chain that is overlooked more often than it should be is the guitar cable. An important characteristic of the cable is its capacitance. If the cable is long enough, it acts as a low-pass filter (RC-circuit) and cuts some of the high frequencies. A friend of mine once told me that he almost sold his guitar due to the “muddy” sound. What a relief it was for him when he swapped the cable for a good one. Bear in mind that for active pickups the influence of the cable is almost negligible.

Guitar cabinets

First of all, let’s grey out the parts of the scheme, that we’re not currently interested in, and only leave the cabinet highlighted. The cabinet is used for the:

Conversion of the electric signal to the sound waves (sound);
Low-pass filtering of the signal.

Amplified (and distorted) guitar signal has a lot of harmonics, especially in the high frequencies, which is harsh and doesn’t sound good. To solve this, most guitar speakers have a steep cutoff at around 6-8 kHz. Here’s an example of the frequency response of the cabinet (some Mesa Boogie if I recall correctly).

The task is to emulate this behaviour with digital signal processing techniques.

Guitar cabinet simulation

A guitar cabinet is a linear device and may be modelled as a digital FIR filter. FIR stands for finite impulse response and it’s the kind of digital filter where you don’t have any internal feedback.

The math behind FIR-filters is quite simple: for each output sample you multiply the last $N+1$ input samples (the current one and the $N$ previous ones) with the corresponding coefficients (often called weights) and accumulate them:

\[y[n] = b_0x[n] + b_1x[n-1] + ... + b_Nx[n-N] = \sum_{i=0}^Nb_ix[n-i]\]

This process is also known as convolution, becoming more famous with the rising popularity of the artificial convolutional neural networks.
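To make the formula concrete, here is a direct implementation next to NumPy’s `np.convolve`. The three coefficients are a toy smoothing filter, not a real cabinet impulse response (those are typically hundreds or thousands of samples long and loaded from a file):

```python
import numpy as np

def fir_filter(x, b):
    """Direct-form FIR: y[n] = sum over i of b[i] * x[n - i]."""
    y = np.zeros(len(x))
    for n in range(len(x)):
        for i in range(len(b)):
            if n - i >= 0:              # samples before the start are treated as zero
                y[n] += b[i] * x[n - i]
    return y

b = np.array([0.25, 0.5, 0.25])         # toy coefficients (weights)
x = np.array([1.0, 0.0, 0.0, 2.0, 0.0])

direct = fir_filter(x, b)
fast = np.convolve(x, b)[:len(x)]       # same result, trimmed to the input length
print(direct)                           # [0.25 0.5  0.25 0.5  1.  ]
print(np.allclose(direct, fast))        # True
```

For long impulse responses the double loop is far too slow, and an FFT-based convolution is the usual choice.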

Influence of the guitar cabinet

Let’s assume we have a nice little guitar part which has been directly recorded with a sound card or some DI-box:

Michael Klimenko · Dry

It sounds a bit dull, but that’s a start. According to our diagram above here, we have a guitar and a cable. The next step would be the amplifier (with some optional effects). For the sake of demonstration let’s assume that we want a gainy amplifier. High-gain amplifiers are used to get some nonlinear distortion to the input signal, which results in an enrichment of the original signal with harmonics. Be careful, the following sound isn’t what you’d call pleasant.

Michael Klimenko · Without cabinet

It has a lot of so-called “sand” and this isn’t something you want to hear on your recording or while jamming on your couch. To mitigate that the cabinet is used to get something like this:

Michael Klimenko · Full

The simplicity of this approach has resulted in a big number of products related to the cabinet simulation, which is great for musicians who gig a lot to get a reproducible tone as well as the recording guitarists to reamp and get the sound they’re looking for.

Automate your C library type-based overload resolutions with C++17

2021-08-17T00:00:00+00:00

Every time I work with a C library, I miss the power and capability of the type system C++ provides. That’s why I developed a simple C++17 header-only helper library to pack multiple type-dependent C-style functions into a single overload deduced at compile-time. No external libraries are required. Repo link: https://github.com/MKlimenko/plusifier. Currently, it’s just the header and a compile-time test file, CMake integration coming soon.

UPD: Some of the comments (for some reason I can’t see them now) suggested this lightning talk by Niel Waldren. It is indeed a slightly less bulky solution, but, in my opinion, it won’t trigger a warning on a type conversion mismatch (std::size_t vs plain int) and, due to the usage of std::function, it’s heavier to compile. On my local machine, with clang-10 via WSL2, it took twice as long to compile: 359 vs 183 ms.

Motivation
Usage and examples
- Function overloading
- Pointer automation
Under the hood

Motivation

Many programming languages can call libraries with the pure C interface. Libraries themselves may be written in various languages, however, it is a de-facto standard for them to have a C interface.

Due to the lack of function overloading in pure C, library maintainers are required to explicitly specify all of the available types for the function. For example, I’d like to list one of my favourite libraries out there, the Intel Integrated Performance Primitives, IPP:

IppStatus   ippsMulC_16s_I(Ipp16s val, Ipp16s* pSrcDst, int len);
IppStatus   ippsMulC_32f_I(Ipp32f val, Ipp32f* pSrcDst, int len);
IppStatus   ippsMulC_64f_I(Ipp64f val, Ipp64f* pSrcDst, int len);
IppStatus   ippsMulC_32fc_I(Ipp32fc val, Ipp32fc* pSrcDst, int len);
IppStatus   ippsMulC_64fc_I(Ipp64fc val, Ipp64fc* pSrcDst, int len);
// ... and so on

If you’re a C++ developer like myself, you may find this mildly irritating to look up and change the function every single time you decide to change the type. And it works poorly with generic (templated) code as well.

Usage and examples

Wrapper object is created in the constructor and then the correct overload is selected in the operator() call:

auto fn = plusifier::FunctionWrapper(/*function overloads*/);

auto dst = fn(/* function arguments... */);

Pointer wrapper object is used similarly:

auto ptr = plusifier::PointerWrapper<PointerType, DeleterFunction>(allocator_function, /* allocator function arguments... */);

Where allocator_function may be both the callable (function pointer, lambda, std::function) as well as the plusifier::FunctionWrapper.

Function overloading

For a more simplified example, suppose we have three functions with a slightly different signature:

int square_s8(const std::int8_t* val, int sz) {
    return 1;
}
int square_s32(const std::int32_t* val, int sz) {
    return 4;
}
int square_fp32(const float* val) {
    return 8;
}

With this library, they may be packed into a single object:

auto square = plusifier::FunctionWrapper(square_s8, square_s32, square_fp32);

auto dst_ch = square(arr_ch.data(), 0);     // <-- calls square_s8
auto dst_int = square(arr_int.data(), 0);   // <-- calls square_s32
auto dst_fp32 = square(arr_fp32.data());    // <-- calls square_fp32

It will check if the passed arguments are viable to be used as the arguments for the functions at the compile-time and select the most appropriate overload.

Pointer automation

RAII is the lifesaver in modern C++. However, it’s a bit tedious to mix it with the C-style allocations. One of the approaches would be to use the std::unique_ptr with a custom deleter, but it’s quite verbose, so I decided to expand this library a little bit more.

For example, we might have dedicated allocation functions for various types:

Ipp8u*      ippsMalloc_8u(int len);
Ipp16u*     ippsMalloc_16u(int len);
Ipp32u*     ippsMalloc_32u(int len);
Ipp8s*      ippsMalloc_8s(int len);
Ipp16s*     ippsMalloc_16s(int len);
Ipp32s*     ippsMalloc_32s(int len);
Ipp64s*     ippsMalloc_64s(int len);
Ipp32f*     ippsMalloc_32f(int len);
Ipp64f*     ippsMalloc_64f(int len);
// and so on...

We’ll wrap all of them into a single FunctionWrapper and pass it to the PointerWrapper:

auto ippsMalloc = plusifier::FunctionWrapper(ippsMalloc_8u, ippsMalloc_16u, ippsMalloc_32u, /* etc */);

auto ptr = plusifier::PointerWrapper<Ipp8u, ippsFree>(ippsMalloc, size);

Under the hood

Internals of the class

FunctionWrapper is a variadic template class with the types being the function pointers:

template <typename ... F>
class FunctionWrapper  final {
        static_assert(sizeof...(F) != 0, "FunctionWrapper should be not empty");
        std::tuple<F...> var;
        constexpr static inline std::size_t pack_size = sizeof...(F);
};

First static_assert is used to create a legit compile-time error when there are no functions passed. std::tuple is a heterogeneous container to store those function pointers, and a pack_size is a simple helper constant.

Since there are no references and move semantics in pure C, I’ve decided to omit the perfect forwarding and pass the parameter pack in the constructor as-is, so the constructor is extremely trivial:

FunctionWrapper(F ... functions) : var(functions...) {}

Then there is a function call operator (operator()), overload search and verification routines and small helper functions and classes.

`operator()`

Function call operator may be split into two parts: compile-time and run-time. First is used to select the correct overload or to indicate the lack of one, while the runtime calls the selected function.

template <typename ... Args>
auto operator()(Args ... args) const {
    // compile-time
    constexpr auto verification_result = VerifyOverload<0, Args...>();
    if constexpr (!verification_result)
        static_assert(NoOverloadFound<F...>(), "No suitable overload is found");

    // run-time
    return std::get<verification_result>(var)(args...);
}

Here the verification_result variable is an object of a simple helper struct with two fields and conversion operators. In the first place, I wanted to use a structured binding, but the compiler told me I’m not supposed to. This struct contains an index of the function inside the tuple and the fact that the correct overload has been found. This flag ended up there due to the recursive nature of the used template metaprogramming approach.

Verification starts at index 0 and iterates up to the end of the tuple.

Function verification

Every iteration, I get the function pointer signature from the tuple, as well as the std::function signature to ease the following metaprogramming. Then there’s an excellent type trait in the standard library, std::is_invocable_v, that allows me to check if the function pointer in the tuple may be called with the type pack passed to the operator(). If we’re good, we quit the search early; otherwise, we continue iterating until the very end of the tuple.

If there’s no suitable overload, a function with a failing static_assert is called for better error diagnostics.

Estimating the penalty of including Boost libraries

2021-06-29T00:00:00+00:00

TL;DR: I’ve made (yet another) simple repo and a table on its wiki to estimate the build penalty when including boost headers.

Boost libraries have a mixed reputation in the C++ community. There are a lot of exceptional quality libraries with algorithms and data structures missing in the standard library. One might even say, that the boost is kind of a playground to test something before it can get into the standard (smart pointers are one of the many examples). On the other hand, boost gets heavily criticized for overcomplication, custom build environment and a lot of cross-connections. Don’t forget the NIH syndrome, which once forced me to re-implement the static_vector class.

C++ is (in)famous for its compilation times, especially for template-heavy code. That got me thinking: how bad is the penalty for including the boost headers? So I came up with a simple CMake script, which creates a trivial source file with a single #include directive for each of the main boost headers I could reach:

foreach(header ${HEADERS_TO_PROCESS})
  string(REPLACE "." "-" filename_preliminary ${header})
  string(REPLACE "/" "-" header_name ${filename_preliminary})
  set(filename "${CMAKE_CURRENT_LIST_DIR}/${header_name}_main.cpp")
  file(WRITE ${filename} "#include <${header}>\n  int main() { return 0; }\n")
      
  set(executable_name "Check${header_name}")
  add_executable(${executable_name} ${filename})
  target_compile_options(${executable_name} PUBLIC -ftime-trace)
  target_link_libraries(${executable_name} pthread stdc++ stdc++fs)
endforeach()

Then I used the -ftime-trace clang (9.0+, IIRC) switch to generate JSON report on the compilation times. I decided to settle for the whole .cpp compilation time since it’s easier to drag it from the report.

Due to the fact, that neither Linux nor Windows are real-time operating systems, compilation times wouldn’t be constant and will have some distribution. To account for that, I ran the compilation process several times (10 to 20 looks fine to me) and averaged the results.

I wrote a simple program to read the clang reports, average the values and print them in a markdown-friendly way. I also decided that it would be interesting to estimate the relative slowdown to the plain int main() source file. The resulting table looks something like this:

Header	Time, ms	Relative slowdown
boost/accumulators/accumulators.hpp	3000.4	357.19
boost/algorithm/algorithm.hpp	693.667	82.5794
boost/align.hpp	495.733	59.0159
…	…	…
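The averaging step boils down to a couple of one-liners. Below is a minimal sketch, not the actual tool from the repo: MeanMs and RelativeSlowdown are hypothetical names, and parsing the clang JSON reports is left out entirely.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Mean of several compile-time measurements (in milliseconds) for one header.
inline double MeanMs(const std::vector<double>& samples_ms) {
    assert(!samples_ms.empty());
    return std::accumulate(samples_ms.begin(), samples_ms.end(), 0.0) /
           static_cast<double>(samples_ms.size());
}

// Slowdown of a header relative to the bare `int main()` baseline.
inline double RelativeSlowdown(double header_ms, double baseline_ms) {
    assert(baseline_ms > 0.0);
    return header_ms / baseline_ms;
}
```

Read the other way round, the table rows above (3000.4 ms at a relative slowdown of 357.19) imply a baseline of about 8.4 ms for the empty program.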

To make it more reproducible and trustworthy, I’ve added a Travis CI script that builds the targets, measures the time, then auto-generates the table and uploads it to the wiki. As a rant, I’d like to say that I much prefer the GitLab way of CI, which is much more intuitive to me.

An interesting fact: I conducted my first runs on my local PC (Ryzen 3700X, 32GB, WSL Ubuntu), and while the bare int main() took roughly the same time to compile locally and on CI (7.2 vs 8 ms), the time differed by a factor of three for the heaviest Boost headers (2100 vs 6600 ms for boost/geometry.hpp).

There’s a simple repo you may check out; the build dependencies are minimal (clang 9+ and Boost, though I encourage you to use Ninja for a speedup).

TEX-CUP GNSS dataset

2020-05-23T00:00:00+00:00

TL;DR: There’s an excellent new GNSS dataset which would allow you to become a better engineer: link. Beware, it would take some time to download it.

TEX-CUP
GNSS data overview
Conclusion

There is an opinion that an engineer nowadays should know MATLAB (at least to some degree), a data scientist should get their hands on the MNIST handwritten digit database, every FPGA developer should blink an LED and so on. In my opinion, any engineer involved in GNSS and signal processing should try to perform a signal acquisition to get all the details about the signals. It’s a relatively simple task and it’s one of the first we give to the interns in our team.

However, there’s a problem: due to the extremely high rate of the IF data (ADC samples at the intermediate frequency, which is about 60 MHz for modern high-precision receivers), any file worth investigating would be enormous, not to mention that it’s extremely difficult to collect such data in the first place.

There are several well-known signal records, but most of them are for basic use only: GPS-only/L1-only, narrowband low-pass filters in the RF front-end etc. They are good for getting your hands on GNSS, but they won’t get you far in terms of high-precision or modern receiver design, where multi-band multi-constellation signal processing is important, as well as sensor fusion with inertial sensors, cameras, radars and so on.

For example, there’s a well-known GPS L1-only dataset which you may find useful, and it’s a (relatively) lightweight signal record, only 1.8G. It’s called GPSdata-DiscreteComponents-fs38_192-if9_55; it is static (no movement of the receiver), narrowband (as you can see from the power spectrum plot below) and about 50 seconds long:

TEX-CUP

Let’s get to the point. About a month ago I stumbled upon an extremely well-conducted experiment by a team from the University of Texas at Austin Radionavigation Laboratory, which is described in detail in the following article.

This team developed a platform called Sensorium, which consists of multiple sensors that are sampled simultaneously and recorded independently. The dataset consists of two identical sub-datasets, recorded on two separate days. Each of the sub-datasets contains the following data:

Binary protocol log and the derivative RINEX files from the Septentrio AsteRx4 high-precision receiver;
Accelerometer and gyroscope data from the Bosch BMX055 IMU;
Stereo images from two Basler acA2040-35gm cameras with the Sony IMX265 CMOS sensor;
Accelerometer and gyroscope data from the LORD MicroStrain 3DM-GX5-25 AHRS;
Binary IF-data from the NTLab NT1065-based USB3-grabber (dual/triple band);
Binary IF-data from the RadioLynx GNSS RF front end;
uBlox EVK-M8T NMEA data.

According to the figure of the Sensorium, there are RADAR sensors as well, but I’ve been told that the lab wants to conduct some tests themselves before making this data public.

As you can see, this is an excellent opportunity for GNSS-related research. One may develop an SDR of various complexity (GPS-only, multi-constellation, multi-frequency etc), sensor fusion with IMU, sensor fusion with cameras and so on. There’s even RINEX data from a separate receiver if you’d like to develop a custom RTK engine!

Because the data is synchronized, you may develop inertial sensor fusion algorithms at different levels: loose, tight and ultra-tight coupling. As far as I know, this is the first dataset containing both IMU and IF GNSS data, which allows you to develop your own ultra-tight coupling algorithms.

GNSS data overview

Let’s take a look at the GNSS data. For this, I’ll be using my own WIP software-defined receiver, which I’ve been working on in my spare time for the last couple of months. It’s currently L1-only, GPS+GLONASS (combined).

Signal parameters

First, let’s take a look at the signal in the L1 band. The NT1065 frontend separates the signals into different sidebands (therefore, GPS L1 and GLONASS L1 are technically independent IF signals), but for the sake of simplicity, I’ve combined them using the Hilbert transform (the Hilbert transform of a real signal gives you a complex signal whose imaginary part is the real one shifted by 90 degrees).

One more shortcut I’ve taken: because I’m currently not processing BeiDou signals, I can downsample the signal by a factor of two:

In the upper subplot, you may see the spectrum with the distinctive hump of the NT1065 lowpass filter frequency response at around 1 MHz, as well as a slight elevation at -14.58 MHz, which is GPS L1 (1575.42 MHz minus the 1590 MHz mixer frequency). The mixer’s central frequency of 1590 MHz is typical for most NT1065-based designs.
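To make the Hilbert transform remark a bit more tangible, here is a minimal sketch of building such a complex (analytic) signal from real samples. It is not the code of my receiver: it uses a naive O(N^2) DFT to stay self-contained, the names Dft and AnalyticSignal are hypothetical, and combining the two sidebands is not shown.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using Complex = std::complex<double>;

// Naive O(N^2) DFT; sign = -1 for the forward, +1 for the (unscaled) inverse transform.
inline std::vector<Complex> Dft(const std::vector<Complex>& in, int sign) {
    const std::size_t n = in.size();
    const double pi = std::acos(-1.0);
    std::vector<Complex> out(n);
    for (std::size_t k = 0; k < n; ++k) {
        Complex sum = 0.0;
        for (std::size_t t = 0; t < n; ++t) {
            const double phase = sign * 2.0 * pi *
                                 static_cast<double>((k * t) % n) / static_cast<double>(n);
            sum += in[t] * std::polar(1.0, phase);
        }
        out[k] = sum;
    }
    return out;
}

// Analytic signal: the real part is the input, the imaginary part is its Hilbert transform.
inline std::vector<Complex> AnalyticSignal(const std::vector<double>& x) {
    assert(!x.empty());
    const std::size_t n = x.size();
    auto spectrum = Dft(std::vector<Complex>(x.begin(), x.end()), -1);
    // Keep DC (and Nyquist for even n), double the positive frequencies,
    // zero out the negative ones.
    for (std::size_t k = 1; k < n; ++k) {
        if (2 * k < n)
            spectrum[k] *= 2.0;
        else if (2 * k > n)
            spectrum[k] = 0.0;
    }
    auto z = Dft(spectrum, +1);
    for (auto& v : z)
        v /= static_cast<double>(n);
    return z;
}
```

For a cosine input the imaginary part comes out as the matching sine, which is exactly the 90-degree shift mentioned above. For real recordings you’d want an FFT library instead of the naive DFT.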

In the bottom left plot, I’ve plotted the signal itself, blue being the real and orange — the imaginary parts. To the right, there’s a histogram, which looks like a Gaussian distribution, which is what we’d expect from a GNSS signal record. One thing I’d like to point out is the bit depth increment due to the Hilbert transform (the original data was 2-bit).

Signal acquisition

The next step is the acquisition process. There are several approaches to acquisition, but they are beyond the scope of this post, so let me just provide the acquisition result plots.

Bear in mind that acquisition is a statistical process and heavily depends on the detection threshold you set. A good acquisition engine also provides additional steps to get more precise frequency and delay estimates of the signals, as well as the bit and/or overlay code synchronization.

Signal tracking

After the signals are acquired, the tracker kicks in. In my opinion, it is one of the most scientifically intensive parts of the receiver. Currently, I have primitive delay- and phase-locked loops, but they do their job, which is fine for me right now.

Tracking results look pretty similar for GLONASS (except for an additional square-wave modulation), so I won’t show them here.

Positioning

As a final step, the receiver decodes the ephemeris data from the tracker output and estimates its position. Currently, because the receiver takes a long time to process the data, I’m only processing 1 minute of it (out of about 90 minutes in total). For the first 10 minutes or so the car with the Sensorium on top of it just stands still, and then it begins its movement.

Conclusion

Honestly, this dataset is extremely useful for every GNSS-related engineer out there. It’s kind of like the Large Hadron Collider, where physicists conduct some experiments, record all the data they can, and research it for decades afterwards. I’m really glad I’ve found it and would like to thank the authors for sharing it with the community. It was a bit of a pain to download it (more than 500GB for a single sub-dataset), but it’s worth it. The other thing I’m glad I did just before I found it is a PC upgrade to an 8c/16t CPU, which allows me to process this data much faster than my old PC would.

CPU instruction set dispatcher

2020-03-09T00:00:00+00:00

Introduction
Benchmarking first
Creating a library
Delayed library loading
Generating multiple libraries
Detecting the processor architecture at runtime
Using the library

TL;DR: In this blog post we’ll generate multiple libraries from the same source code with various architecture flags. Later on, at runtime, the application selects the most appropriate library based on the available instruction set and gets a 3x performance gain on the simple function I’ve decided to implement.

Introduction

Modern processors are often much more capable than we think because the CPU vendors care about us, fellow programmers. There’s an amazing talk by Matt Godbolt, go check it out if you haven’t already. The popcnt example blows my mind to this day. Briefly: modern processors (Haswell and later) have a special instruction which counts the number of set bits.

Besides the additional instructions, SIMD (single instruction, multiple data) is a force to be reckoned with. Originally, it was used to perform simple operations (such as an addition) on “vectors” of data. In this case, “vectors” meant loading the data from the memory to the wide registers of the CPU, processing it with a single instruction and then repeating. Nowadays, AVX-512 offers both long registers (512-bit wide) and sophisticated operations, which are useful for neural network tasks, both inference and training.

If you’re running some scientific/research code on your local powerful computer, this is the place to stop reading: just build your code with the -march=native (/arch on MSVC) switch and enjoy all the benefits your hardware can provide. However, if you’re planning on distributing your software, there might be a little problem. By default, modern compilers don’t use any vector extensions beyond the baseline of the target (SSE2 on x86-64) to make the resulting program as portable as possible. This is a good approach, but owners of modern hardware won’t be as happy as they could’ve been. Today I’d like to discuss and implement a dispatcher pattern, which is used in the Intel IPP library.

The main idea is to put the most performance-critical code into a separate library, build several variants of it and dispatch the calls at runtime. In this post, I’ll only consider shared (dynamic) libraries, since they are easier to implement, but with several tweaks, you may get a static executable with the same functionality.
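As far as I recall from the talk, the example is a plain bit-counting loop along the lines of the sketch below, which the compiler may collapse into a single popcnt instruction when the target architecture allows it. CountSetBits and CheckCountSetBits are my own hypothetical names, and whether the optimization kicks in depends on the compiler and flags.

```cpp
#include <cassert>
#include <cstdint>

// Counts set bits by clearing the lowest set bit on every iteration.
inline int CountSetBits(std::uint64_t value) {
    int count = 0;
    while (value != 0) {
        value &= value - 1;  // clears the lowest set bit
        ++count;
    }
    return count;
}

// Small usage check.
inline void CheckCountSetBits() {
    assert(CountSetBits(0) == 0);
    assert(CountSetBits(0xB) == 3);  // 0xB is 1011 in binary
}
```

Comparing the generated assembly with and without an architecture flag in Compiler Explorer is a quick way to see the difference.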
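Stripped of the shared-library machinery, the dispatching itself is just a function pointer chosen once at runtime. Here is a minimal in-process sketch under that assumption: the names (AddFn, AddGeneric, AddWide, InstructionSet, SelectAdd) are hypothetical, and in a real build each variant would live in its own translation unit or library compiled with different architecture flags.

```cpp
#include <cassert>
#include <cstddef>

// Signature shared by every variant of the hot function.
using AddFn = void (*)(const double*, const double*, std::size_t, double*);

// Baseline variant.
inline void AddGeneric(const double* a, const double* b, std::size_t n, double* dst) {
    for (std::size_t i = 0; i < n; ++i)
        dst[i] = a[i] + b[i];
}

// Stand-in for a variant built with wider vector instructions enabled;
// the source is identical, only the compiler flags would differ.
inline void AddWide(const double* a, const double* b, std::size_t n, double* dst) {
    for (std::size_t i = 0; i < n; ++i)
        dst[i] = a[i] + b[i];
}

enum class InstructionSet { Generic, Wide };

// Picks the implementation once; callers keep using the returned pointer.
inline AddFn SelectAdd(InstructionSet detected) {
    switch (detected) {
        case InstructionSet::Wide:
            return &AddWide;
        case InstructionSet::Generic:
            return &AddGeneric;
    }
    assert(false && "unknown instruction set");
    return &AddGeneric;
}
```

The rest of the post does the same thing, except that the pointer comes from a dynamically loaded library instead of a switch statement.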

Benchmarking first

As in any optimization-related article, first of all, one must benchmark the various parts of an application and decide which functions should be extracted into the library. My toy example happened to have such a function:

extern "C" void Add(const double* a, const double* b, std::size_t length, double* dst) {
    // std::size_t index avoids the signed/unsigned comparison with length
    for (std::size_t i = 0; i < length; ++i)
        dst[i] = a[i] + b[i];
}

Yup, I know, raw pointers, but you don’t want to pass C++ objects in and out from the dynamic library, trust me.

So I measured this code on my machine with Google Benchmark and got these results for 256 elements:

Instruction set	Time, ns
Common	95
AVX	40
AVX2	30

I’d like to point out that I haven’t hand-optimized any of the code and haven’t used any intrinsic functions or whatever; I just recompiled with one changed flag: /arch. Tripling the performance is a target worth pursuing, which makes this function a perfect candidate for such an optimization, so let’s proceed to the next step.

Creating a library

Now that we’ve decided which functions to extract, the next step is to create a library. This is as straightforward as it gets: a pair of a header and a source file:

#pragma once

#include <cstddef>  // std::size_t

namespace lib {
    extern "C" void Add(const double * a, const double * b, std::size_t length, double * dst);
}

#include "lib.hpp"

extern "C" void lib::Add(const double* a, const double* b, std::size_t length, double* dst) {
    for (std::size_t i = 0; i < length; ++i)
        dst[i] = a[i] + b[i];
}

And we’ll use the basic CMake script to generate a library:

cmake_minimum_required(VERSION 3.10)
project(best_instruction_set)

set(SOURCES src/lib.cpp) 
set(HEADERS 
            src/lib.hpp 
            src/lib.def
)
            
add_library(best_instruction_set SHARED ${SOURCES} ${HEADERS})
set_property(TARGET best_instruction_set PROPERTY CXX_STANDARD 17)
set_property(TARGET best_instruction_set PROPERTY CXX_STANDARD_REQUIRED ON) 

Delayed library loading

The key to the dispatcher is delayed loading of the library. This means that the library is loaded at some point at runtime and the developer assigns function pointers to the functions exported by the library.

For this purpose I’ve written a simple cross-platform wrapper, which is used as-is for multiple projects:

#pragma once
#ifdef _WIN32
#include <windows.h>  // LoadLibraryExA, GetProcAddress, FreeLibrary
#else
#include <dlfcn.h>    // dlopen, dlsym, dlclose
#endif

namespace DllWrapper {
#ifdef _WIN32
    using InstanceType = HMODULE;
#else
    using InstanceType = void*;
#endif

    inline auto GetInstance(const char* path) {
#ifdef _WIN32
        return LoadLibraryExA(path, nullptr, 0);
#else
        return dlopen(path, RTLD_LAZY);
#endif
    }

    inline void FreeInstance(InstanceType instance) {
        if (!instance)
            return;
#ifdef _WIN32
        FreeLibrary(instance);
#else
        dlclose(instance);
#endif
    }

    inline auto GetAddress(InstanceType instance, const char* symbol_name) {
#ifdef _WIN32
        return GetProcAddress(instance, symbol_name);
#else
        return dlsym(instance, symbol_name);
#endif
    }
}

Based on this common wrapper, every library has to get a custom wrapper, such as the following:

#pragma once

#include "lib.hpp"
#include "wrapper_common.hpp"
#include "cpuinfo_x86.h"

#include <stdexcept>  // std::runtime_error
#include <string>

struct LibWrapper {
    void (*Add)(const double* a, const double* b, std::size_t length, double* dst) = nullptr;

    LibWrapper() {      
        auto path = std::string("best_instruction_set");

#ifdef _WIN32
        path += ".dll";
#elif __linux__
        path = "lib" + path + ".so";
#else
        throw std::runtime_error("Unexpected system");
#endif
        instance = DllWrapper::GetInstance(path.c_str());
        if (!instance)
            throw std::runtime_error("Unable to load library " + std::string(path));

        Assign("Add", Add);
    }

    ~LibWrapper() {
        DllWrapper::FreeInstance(instance);
    }

private:
    DllWrapper::InstanceType instance = nullptr;
    
    template <typename T>
    void Assign(const char* symbol_name, T& dst_pointer) {
        auto address = DllWrapper::GetAddress(instance, symbol_name);
        if (!address)
            throw std::runtime_error("Unable to find symbol: " + std::string(symbol_name));

        dst_pointer = reinterpret_cast<T>(address);
    }
};

This wrapper is a class with a function pointer and some bare-bones logic. In the constructor, the library is loaded and the pointer is assigned via the helper function. In the destructor, the binary resources are released.

Generating multiple libraries

There’s a simple extension to the provided CMake script, which will allow us to generate multiple libraries from the same source code at the same time:

set(ARCHITECTURE_OPTIONS "avx;avx2;avx512")
            
foreach (INSTRUCTION_SET ${ARCHITECTURE_OPTIONS})
    message(STATUS "Generating ${INSTRUCTION_SET} library")
    add_library(best_instruction_set_${INSTRUCTION_SET} SHARED ${SOURCES} ${HEADERS})
    if (WIN32)
        string(TOUPPER ${INSTRUCTION_SET} UPPERCASE_INSTRUCTION_SET)
        set(COMPILER_OPTION /arch:${UPPERCASE_INSTRUCTION_SET})
    elseif (UNIX)
        set(COMPILER_OPTION -m${INSTRUCTION_SET})
        if (${INSTRUCTION_SET} STREQUAL "avx512")
            set(COMPILER_OPTION -m${INSTRUCTION_SET}f)
        endif (${INSTRUCTION_SET} STREQUAL "avx512")
    endif(WIN32)

    target_compile_options(best_instruction_set_${INSTRUCTION_SET}
          PRIVATE ${COMPILER_OPTION}
    )
    set_property(TARGET best_instruction_set_${INSTRUCTION_SET} PROPERTY CXX_STANDARD 17)
    set_property(TARGET best_instruction_set_${INSTRUCTION_SET} PROPERTY CXX_STANDARD_REQUIRED ON)  
endforeach(INSTRUCTION_SET)

Detecting the processor architecture at runtime

For this, we’ll be using one of the Google side-projects, cpu_features. It will extend the library wrapper class in such a manner:

static auto GetSuffix() -> std::string {
    const auto features = cpu_features::GetX86Info().features;

    if (features.avx512f)
        return "avx512";
    else if (features.avx2)
        return "avx2";
    else if (features.avx)
        return "avx";

    return "";
}

Since we’re using suffixes to distinguish our libraries, this is good enough. So the constructor of the wrapper will be extended as well:

LibWrapper() {
    SwitchImplementation(GetSuffix());
}

void SwitchImplementation(std::string suffix) {
    DllWrapper::FreeInstance(instance);

    auto path = std::string("best_instruction_set") + (suffix.empty() ? "" : ("_" + suffix));

#ifdef _WIN32
    path += ".dll";
#elif __linux__
    path = "lib" + path + ".so";
#else
    throw std::runtime_error("Unexpected system");
#endif
    instance = DllWrapper::GetInstance(path.c_str());
    if (!instance)
        throw std::runtime_error("Unable to load library " + std::string(path));

    Assign("Add", Add);
}
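One possible refinement, which is not part of the wrapper above, is to fall back to a less demanding build when the preferred library file is missing instead of throwing right away. A minimal sketch of just the naming part, assuming the same suffix scheme; CandidateLibraries is a hypothetical helper, and the platform-specific prefix and extension are left to the caller:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Library base names to try, starting from the best supported instruction
// set and ending with the plain (no-suffix) build as the last resort.
inline std::vector<std::string> CandidateLibraries(const std::string& base,
                                                   const std::string& best_suffix) {
    assert(!base.empty());
    static const std::vector<std::string> order = {"avx512", "avx2", "avx"};

    std::vector<std::string> names;
    bool reached = false;
    for (const auto& suffix : order) {
        reached = reached || (suffix == best_suffix);
        if (reached)
            names.push_back(base + "_" + suffix);
    }
    names.push_back(base);
    return names;
}
```

The wrapper could then loop over the candidates, call DllWrapper::GetInstance for each of them and throw only if none of them loads.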

Using the library

And that’s pretty much it. One last thing is to use the library, which we’ll be doing through the wrapper we’ve just created:

#include "lib_wrapper.hpp"

#include <iostream>
#include <vector>

int main() {
    try {
        std::vector<double> a(64, 1);
        std::vector<double> b(64, 2);
        std::vector<double> dst(64);

        auto wrapper = LibWrapper();
        wrapper.Add(&a[0], &b[0], a.size(), &dst[0]);
        
//      wrapper.SwitchImplementation("avx512");
//      wrapper.Add(&a[0], &b[0], a.size(), &dst[0]);
    }
    catch (const std::exception& e) {
        std::cerr << e.what() << std::endl;
    }
    return 0;
}

This is a basic example, but you’re free to play with it in the repository. Looking forward to all the feedback and discussions about this approach and have a nice day.

GitLab CI for C++ projects

2020-02-02T00:00:00+00:00

In 2016, before moving to GitHub pages as a hosting platform for this blog, I wrote a little post about CI and automated builds for C++ projects as a synopsis of the week I spent on this task at work. Currently, we’re modernizing the technological stack for one of our paramount products (neural network middleware for the NeuroMatrix processors called NMDL) and one of the tasks is to configure and maintain the continuous integration system. The modernization also involved integration and renovation of tools, projects, architecture, VCS and various “best practices”; I hope to compile it into a talk for some C++ conference.

TL;DR: GitLab CI is a great instrument to build, test and deploy C++ projects and it takes less than an hour to set up. There’s a repo with some minimal build and test code you may start with.

Project structure

Throughout this whole post I’ll be referring to this repo, which is a simplified representation of the project structure we use in our projects:

src folder, which contains the source files of the project. We tend to keep the header files and various .ui stuff here as well. For our purposes it contains:
- lib.hpp: a header-only library we intend to use and test. It contains a simple Add function template which, as its name suggests, sums two numbers and returns the result
- main.cpp with simple usage of the library
test folder, which is introduced to separate the project code from the tests. In this folder we have:
- test.cpp: the test itself, written with the googletest library. I prefer it to the others (such as Catch2 or Boost.Test), but it’s up to you to choose which one you like
- CMakeLists.txt: a simple CMake script, which is responsible for building the tests
- CMakeLists.txt.in: a little CMake helper, designed to download the latest googletest library version and build it. It provides tight integration with the build process of the tests themselves and leads to more isolated (in a good way) builds.
Global CMakeLists.txt, designed to build and test the library
Default .gitignore as a sign of good manners
.gitlab-ci.yml: the most important file for this post; this is the script that defines the structure and order of the pipelines.
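To give an idea of how little code is actually under test, here is a hypothetical stand-in for the Add function template (the real lib.hpp lives in the repo and may differ), together with the kind of check a googletest case would perform, written with a plain assert to keep the snippet self-contained:

```cpp
#include <cassert>

namespace lib {
// Hypothetical stand-in for the Add function template from lib.hpp.
template <typename T>
constexpr T Add(T a, T b) {
    return a + b;
}
}  // namespace lib

// The same expectations a unit test would state.
inline void CheckAdd() {
    assert(lib::Add(2, 3) == 5);
    assert(lib::Add(1.5, 2.5) == 4.0);
}
```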

When the project is ready, it is time to set up the CI.

Step 1: set up the runner

To familiarize you with CI, I’ll provide a brief simplified explanation of what we’re about to achieve. Our ultimate goal is to constantly check whether the current version of our product (library, project etc) is good enough to be shipped. This can be verified by running a set of predefined tests for every commit pushed to the repository. The program used for this is called the runner. Every time there’s a new commit, GitLab (either the hosted or the self-hosted one, whichever you prefer) will notify the runner, so it can fetch the latest changes and perform the actions you’ve listed in the .gitlab-ci.yml file.

Installing the runner for Windows is extremely easy, just follow the instructions from GitLab. During the installation, the runner will ask you to provide some information and will register itself as a service.

I’ve had a couple of difficulties with the runner, which I’ll list to save you some time:

Make sure you have git.exe added to your PATH environment variable;
The runner registers itself as a system service, which may not be something you want. If you’re getting some weird access errors, find the gitlab-runner in the services (Start -> services) and change the login type to one of the administrator accounts you have on that server.
I’d like to state this as a separate point: if you want to use WSL (Windows Subsystem for Linux, a lightweight virtual machine in Windows 10, highly recommended!), be aware that WSL is currently installed per-user; therefore, to make it available to the runner, the runner’s service must log in with that account.

Step 2: come up with the CI scenario

After the runner is set up, you’re almost good to go. You have a server to run your tasks, optionally some specific hardware, and a repository. It’s obvious, but be sure to install the whole required development environment on the server you’ll build your project on.

This is where CMake shines. If you’ve done everything correctly, all you have to do is run these simple commands:

mkdir build && cd build
cmake ..
cmake --build .
ctest .

The thing I love the most about that approach is that it’s entirely cross-platform. CMake will generate Visual Studio solutions for Windows and Makefiles for Linux. You don’t have to write specific build scripts, just one common CMakeLists will do.

Of course, it is always better to separate the CI script into several stages, so if something fails, you won’t have to read the whole log to find the part where everything went wrong.
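A two-stage .gitlab-ci.yml could look roughly like the sketch below. Treat it as an assumption rather than the exact file from the repo: the job names are arbitrary, and the commands are simply the ones listed above, split between a build and a test stage.

```yaml
stages:
  - build
  - test

build:
  stage: build
  script:
    - mkdir build
    - cd build
    - cmake ..
    - cmake --build .
  artifacts:
    paths:
      - build/

test:
  stage: test
  script:
    - cd build
    - ctest --output-on-failure
```

The artifacts section passes the build directory from the first job to the second one, so the tests run against the binaries that were just built.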

There are numerous additional topics one may cover. Here I’ve provided the very basics for you to start integrating CI into your project and making it a little better every time.