MDFREADER
This module imports MDF files (Measured Data Format V3.x and V4.x), typically
recorded with INCA (ETAS), CANape or CANoe. MDF is widely used in the automotive
industry to record data from ECUs. The main module mdfreader.py inherits from two
module pairs (one per MDF version): the first reads the file's block structure
(mdfinfoX) and the second reads the raw data (mdfXreader). It can optionally run
multithreaded and was designed for efficient batch processing of large
endurance-evaluation files for data mining.
When Cython is available (strongly recommended), mdfreader uses several low-level optimisations:

- Fast CN/CC/SI/TX metadata reader (`read_cn_chain_fast` in `dataRead.pyx`): walks the entire MDF4 channel linked list in a single Cython function using POSIX `pread()` (no Python file-object dispatch, no GIL during I/O) and C packed-struct `memcpy` parsing. A fast `<TX>…</TX>` bytes scan replaces `lxml.objectify` for the common MD-block pattern (~95% of files). Result: 3–4× speedup on large files compared to the pure-Python path.
- `SymBufReader`: a Cython bidirectional-buffered wrapper around the raw file object. MDF4 metadata blocks are linked by backward-pointing pointers; `SymBufReader` keeps a 64 KB buffer centred on the current position so that most seeks are served from cache without a kernel `read()`.
- Vectorised data reading: sorted channel groups are read with a single `readinto()` call into a flat `uint8` buffer that is then reinterpreted as a structured record array, with zero copies and no per-chunk Python loop.
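The vectorised-reading idea can be illustrated with plain numpy. This is a minimal sketch, not mdfreader's Cython code: the record layout, the `read_records` helper and the little-endian, unpadded byte order are assumptions for illustration. One `readinto()` fills a preallocated `uint8` buffer, and `view()` reinterprets those bytes as a structured array without copying.

```python
import io

import numpy as np

# Hypothetical record layout of one sorted channel group: float64 time stamp,
# uint16 engine speed, float32 torque; packed (no padding), little-endian.
RECORD_DTYPE = np.dtype([("time", "<f8"), ("rpm", "<u2"), ("torque", "<f4")])


def read_records(fileobj, offset, n_records, dtype):
    """Read n_records fixed-size records with one readinto() call."""
    buf = np.empty(n_records * dtype.itemsize, dtype=np.uint8)  # flat byte buffer
    fileobj.seek(offset)
    n_read = fileobj.readinto(buf)  # bytes land directly in the numpy buffer
    if n_read != buf.nbytes:
        raise EOFError(f"expected {buf.nbytes} bytes, got {n_read}")
    return buf.view(dtype)  # reinterpret as structured records, no copy


# Usage: three records stored in memory behind a 16-byte dummy header.
source = np.zeros(3, dtype=RECORD_DTYPE)
source["time"] = [0.0, 0.1, 0.2]
source["rpm"] = [800, 850, 900]
source["torque"] = [10.5, 11.0, 11.5]
stream = io.BytesIO(b"\x00" * 16 + source.tobytes())

data = read_records(stream, offset=16, n_records=3, dtype=RECORD_DTYPE)
print(data["rpm"].tolist())  # [800, 850, 900]
```

The same call works on a file opened in `"rb"` mode; each field of the returned array (`data["rpm"]`, ...) is itself a strided view into the single buffer.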
Typical timings on a 184 MB / 36 000-channel MDF4 file:
| Scenario | Time |
|---|---|
| Pure Python path | ~1.9 s |
| v4.2 with Cython | ~1.9 s |
| v4.3 (this version) | ~0.6 s |
For each channel `mdf[channelName]` the following keys exist:

| Key | Description |
|---|---|
| `data` | numpy array of channel values |
| `unit` | unit string |
| `master` | name of the master (time/angle/…) channel |
| `masterType` | master channel type: 0=None, 1=Time, 2=Angle, 3=Distance, 4=Index |
| `description` | channel description string |
| `conversion` | present when `convert_after_read=False`; dict describing the raw→physical mapping |
mdf.masterChannelList is a dict mapping each master channel name to the list
of channels sampled at the same raster.
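Since each `mdf[channelName]` entry is a dict with the keys in the table above and `masterChannelList` groups channel names by raster, per-raster tables can be assembled with plain dictionary access. The sketch below uses a plain `dict` as a stand-in for an `Mdf` object; the `raster_tables` helper, the channel names and the data are hypothetical, and listing the master inside its own group is only how this stand-in is built.

```python
import numpy as np

# Plain-dict stand-in for an Mdf object (hypothetical channels and data):
# each entry uses channel keys from the table above.
mdf_like = {
    "time_1": {"data": np.array([0.0, 0.1, 0.2]), "unit": "s", "master": "time_1", "masterType": 1},
    "rpm": {"data": np.array([800, 850, 900]), "unit": "1/min", "master": "time_1", "masterType": 1},
    "time_2": {"data": np.array([0.0, 1.0]), "unit": "s", "master": "time_2", "masterType": 1},
    "T_oil": {"data": np.array([80.0, 81.5]), "unit": "degC", "master": "time_2", "masterType": 1},
}
# Same shape as mdf.masterChannelList: master name -> channels on that raster.
master_channel_list = {"time_1": ["time_1", "rpm"], "time_2": ["time_2", "T_oil"]}


def raster_tables(mdf, master_channel_list):
    """Group channel arrays by raster as {master: {channel: data}}."""
    tables = {}
    for master, channels in master_channel_list.items():
        n_samples = len(mdf[master]["data"])
        table = {}
        for name in channels:
            data = mdf[name]["data"]
            # channels sharing a master are expected to share its sample count
            if len(data) != n_samples:
                raise ValueError(f"{name} has {len(data)} samples, {master} has {n_samples}")
            table[name] = data
        tables[master] = table
    return tables


tables = raster_tables(mdf_like, master_channel_list)
for master, table in tables.items():
    print(master, [f"{name} [{mdf_like[name]['unit']}]" for name in table])
```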
The `Mdf` class can also:

- resample channels to one sampling frequency
- merge files
- plot one channel, several channels on one graph (list) or several channels on subplots (list of lists)
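Resampling to one sampling frequency means building a common time base and interpolating every channel onto it. How mdfreader's `resample` interpolates is not described here, so the sketch below assumes linear interpolation with `np.interp`; the `resample` function and its `{name: (time, values)}` input are hypothetical and not the library's signature.

```python
import numpy as np


def resample(channels, sampling):
    """Resample {name: (time, values)} onto one common time base.

    Linear interpolation with np.interp; outside a channel's own time range,
    np.interp holds that channel's first/last value.
    """
    start = min(t[0] for t, _ in channels.values())
    stop = max(t[-1] for t, _ in channels.values())
    # number of steps; the small epsilon absorbs float rounding such as 3.9999999
    n_samples = int(np.floor((stop - start) / sampling + 1e-9)) + 1
    new_time = start + sampling * np.arange(n_samples)
    resampled = {name: np.interp(new_time, t, v) for name, (t, v) in channels.items()}
    return new_time, resampled


channels = {
    "rpm": (np.array([0.0, 0.2, 0.4]), np.array([800.0, 900.0, 1000.0])),
    "T_oil": (np.array([0.0, 0.4]), np.array([80.0, 84.0])),
}
new_time, resampled = resample(channels, 0.1)
print(new_time)
print(resampled["rpm"], resampled["T_oil"])
```

Linear interpolation suits continuous signals; for state or enum channels a sample-and-hold scheme would usually be preferred.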
It is also possible to export mdf data into:

- CSV file (Excel dialect by default)
- NetCDF file for compatibility with Uniplot (needs `netcdf4`, `Scientific.IO`)
- HDF5 (needs `h5py`)
- Excel 95–2003 (needs `xlwt`; very slow for large files)
- Excel 2007/2010 (needs `openpyxl`; can also be slow with large files)
- Matlab `.mat` (needs `hdf5storage`)
- MDF file: allows creating, converting or modifying data, units and descriptions
- Pandas DataFrame(s), one DataFrame per raster (command line only, not in mdfconverter)
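Writing one raster to CSV with the Excel dialect needs only Python's standard `csv` module. This is a sketch, not mdfreader's `export_to_csv`: the `write_raster_csv` helper, its input layout and the `name [unit]` header format are assumptions.

```python
import csv
import io

import numpy as np


def write_raster_csv(stream, master_name, channels):
    """Write one raster as CSV using the csv module's Excel dialect.

    channels maps name -> (unit, values); master_name must be one of its keys
    and becomes the first column. All value arrays must have the same length.
    """
    names = [master_name] + [name for name in channels if name != master_name]
    writer = csv.writer(stream, dialect="excel")
    writer.writerow([f"{name} [{channels[name][0]}]" for name in names])
    columns = [channels[name][1].tolist() for name in names]  # numpy -> Python scalars
    writer.writerows(zip(*columns))  # one row per sample


buffer = io.StringIO()
write_raster_csv(
    buffer,
    "time",
    {"time": ("s", np.array([0.0, 0.1])), "rpm": ("1/min", np.array([800, 850]))},
)
print(buffer.getvalue())
```

For a real file, open it with `open(path, "w", newline="")` as the `csv` documentation recommends, so the dialect's `\r\n` line endings are not translated twice.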
Requires Python 3.9+; tested on Linux and Windows (x86-64).
Core dependencies: numpy, lxml, sympy
lxml is used for MDF4 metadata XML blocks. When Cython is compiled, the fast
path handles the common <TX>…</TX> pattern directly from bytes and only falls
back to lxml for complex XML (CDATA, namespaces).
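The fast-path/fallback split can be sketched in pure Python. This is not mdfreader's Cython scan: the `md_comment_text` helper is hypothetical, the standard-library `xml.etree.ElementTree` stands in for lxml as the fallback parser, and the zero-terminated UTF-8 payload is an assumption. Entity references such as `&amp;` are also routed to the parser, because a raw byte scan would not unescape them.

```python
import re
import xml.etree.ElementTree as ET

_TX_PATTERN = re.compile(rb"<TX>(.*?)</TX>", re.DOTALL)


def md_comment_text(block_data: bytes) -> str:
    """Return the <TX> text of an MD-block XML comment."""
    payload = block_data.rstrip(b"\x00")  # assumed zero-terminated UTF-8 XML
    # Fast path: no CDATA, namespaces or entity references -> plain bytes scan
    if not any(marker in payload for marker in (b"<![CDATA[", b"xmlns", b"&")):
        match = _TX_PATTERN.search(payload)
        if match is not None:
            return match.group(1).decode("utf-8")
    # Fallback: a full XML parser (stdlib ElementTree here, in place of lxml)
    root = ET.fromstring(payload)
    for element in root.iter():
        if element.tag.rsplit("}", 1)[-1] == "TX":  # strip any namespace prefix
            return element.text or ""
    return ""


print(md_comment_text(b"<CNcomment><TX>engine speed</TX></CNcomment>\x00"))
print(md_comment_text(
    b'<CNcomment xmlns="http://www.asam.net/mdf/v4"><TX><![CDATA[a < b]]></TX></CNcomment>'
))
```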
Reading channels defined by a formula requires sympy.
Cython is strongly advised. It compiles `dataRead.pyx`, which provides:

- fast metadata parsing via `pread()` + C packed structs
- the `SymBufReader` bidirectional file buffer
- bit-exact reading for non-byte-aligned or record-padded channels
- VLSD/VLSC string data reading helpers

If Cython compilation fails, bitarray is used as a fallback (slower, pure Python).
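Bit-exact reading of a non-byte-aligned channel comes down to assembling the bytes that hold the field, shifting by the bit offset and masking. The pure-numpy sketch below assumes unsigned little-endian (Intel byte order) fields with `bit_offset + bit_count <= 64`; `extract_bits` is hypothetical and is not the `dataRead.pyx` code.

```python
import numpy as np


def extract_bits(records, byte_offset, bit_offset, bit_count):
    """Extract an unsigned little-endian bit field from every record.

    records: 2-D uint8 array with one row per record. The field starts
    bit_offset bits (0-7) into byte byte_offset and is bit_count bits long.
    """
    if not 0 <= bit_offset < 8 or bit_count < 1 or bit_offset + bit_count > 64:
        raise ValueError("field does not fit in one 64-bit word")
    n_bytes = (bit_offset + bit_count + 7) // 8  # bytes touched by the field
    chunk = records[:, byte_offset:byte_offset + n_bytes].astype(np.uint64)
    # little-endian assembly: byte i contributes bits 8*i .. 8*i+7
    shifts = np.arange(n_bytes, dtype=np.uint64) * np.uint64(8)
    words = np.bitwise_or.reduce(chunk << shifts, axis=1)
    mask = np.uint64((1 << bit_count) - 1)  # Python int math avoids uint64 overflow
    return (words >> np.uint64(bit_offset)) & mask


# two 2-byte records; a 6-bit field starts at bit 3 of byte 0
records = np.array([[0xA8, 0x01], [0x3D, 0x10]], dtype=np.uint8)
print(extract_bits(records, byte_offset=0, bit_offset=3, bit_count=6).tolist())  # [53, 7]
```

All scalars are kept as `np.uint64` so numpy never promotes the mixed signed/unsigned shift to float.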
Export requirements (optional): scipy, h5py, hdf5storage, openpyxl, pandas, fastparquet
Data compression in memory (optional): blosc
Graphical converter: PyQt5
From PyPI:

```shell
pip install mdfreader
```

From source:

```shell
pip install cython numpy  # build prerequisites
python setup.py build_ext --inplace
python setup.py develop
```

mdfconverter is a PyQt5 GUI to convert batches of files. Launch it with:

```shell
mdfconverter
```

Right-click a channel in the list to plot it. Channels can be dragged between
columns. A `.lab` channel-list file can be imported. Multiple files can be
merged into one and resampled.
For large files or limited memory:

- Channel list only: pass `channel_list=['ch1', 'ch2']`; call `mdfreader.MdfInfo(file)` to get the full channel list without loading data.
- Raw data mode: pass `convert_after_read=False`; data stays as stored in the MDF file and is converted on the fly by `get_channel_data`, `plot`, `export_to_*`, etc.
- Blosc compression: pass `compression=True` (default level 9) to compress data in memory after reading.
- No-data skeleton: pass `no_data_loading=True` to build the channel metadata dict without reading any samples; data is fetched on demand via `get_channel_data`.
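Raw data mode saves memory because the stored integers (often 1 or 2 bytes per sample) are kept instead of float64 physical values, and the conversion runs only when data is requested. The sketch assumes a linear conversion, phys = offset + factor × raw; the `physical_values` helper and the layout of the `conversion` dict are hypothetical, since mdfreader's actual dict format is not described here.

```python
import numpy as np


def physical_values(channel):
    """Return physical values for a channel dict, converting raw data on access.

    Hypothetical layout: channel["data"] holds raw values and, in raw mode,
    channel["conversion"] is {"type": "linear", "offset": a, "factor": b}.
    """
    conversion = channel.get("conversion")
    if conversion is None:  # no conversion stored: data is already physical
        return channel["data"]
    if conversion["type"] == "linear":
        # phys = offset + factor * raw, computed now; the stored raw array is untouched
        return conversion["offset"] + conversion["factor"] * channel["data"].astype(np.float64)
    raise NotImplementedError(f"conversion type {conversion['type']!r} is not covered here")


raw_channel = {
    "data": np.array([0, 100, 200], dtype=np.uint16),  # 2 bytes per raw sample
    "conversion": {"type": "linear", "offset": -40.0, "factor": 0.5},
}
print(physical_values(raw_channel).tolist())  # [-40.0, 10.0, 60.0]
```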
For data visualisation, a dataPlugin for Veusz (≥ 1.16) is also available; follow the instructions in Veusz's documentation and the plugin file's header.
```python
import mdfreader
# load the whole MDF file content into the yop Mdf object
yop = mdfreader.Mdf('NameOfFile')
# in IPython, print the file content with a simple:
yop
# alternatively, for maximum speed and a smaller memory footprint, read only a few channels
yop = mdfreader.Mdf('NameOfFile', channel_list=['channel1', 'channel2'], convert_after_read=False)
# data can also be kept compressed in memory with the Blosc module for a small footprint
yop = mdfreader.Mdf('NameOfFile', compression=True)
# for interactive file exploration, read the file structure but not its data to save memory
yop = mdfreader.Mdf('NameOfFile', no_data_loading=True)  # channel data will be loaded from file if needed
# parsing XML metadata from MDF 4.x files with many channels can take longer than reading the data itself;
# the argument below reduces metadata reading to a minimum (no source information, attachments, etc.)
yop = mdfreader.Mdf('NameOfFile', metadata=2)  # 0: full, 2: minimal
# MDF 4.x only: search for the mdf keys of a channel name that may have been recorded by several sources
yop.get_channel_name4('channelName', 'source path or name')  # returns a list of mdf keys
# return one channel and keep its content in the mdf object
yop.get_channel('channelName')
# return one channel's numpy array
yop.get_channel_data('channelName')
# get the file's MDF version
yop.MDFVersionNumber
# to get the file structure or attachments, create an MdfInfo instance
info = mdfreader.MdfInfo()
info.list_channels('NameOfFile')  # returns only the list of channels
info.read_info('NameOfFile')  # complete file structure object
yop.info  # the same class is stored in the Mdf object
# list channel names after reading
yop.keys()
# list channel names grouped by raster: this dict attribute contains
# pairs (key=masterChannelName : value=listOfChannelNamesForThisMaster)
yop.masterChannelList
# quick plot or subplot (with lists) of channel(s)
yop.plot(['channel1', ['channel2', 'channel3']])
# file manipulations
yop.resample(0.1)
# or
yop.resample(master_channel='master3')
# keep only data between begin and end
yop.cut(begin=10, end=15)
# export to other file formats:
yop.export_to_csv(sampling=0.01)
yop.export_to_NetCDF()
yop.export_to_hdf5()
yop.export_to_matlab()
yop.export_to_xlsx()
yop.export_to_parquet()
# return a pandas dataframe for a master channel name
yop.return_pandas_dataframe('master_channel_name')
# convert data groups into pandas dataframes and keep them in the mdf object
yop.convert_to_pandas()
# drop all channels except the ones in the argument
yop.keep_channels({'channel1', 'channel2', 'channel3'})
# merge 2 files
yop2 = mdfreader.Mdf('NameOfFile_2')
yop.merge_mdf(yop2)
# write an mdf file after modifications or creation from scratch;
# write4 and write3 can also convert between file versions
yop.write('NewNameOfFile')  # write in the same version as the original file
yop.write4('NameOfFile', compression=True)  # write an MDF version 4.1 file with compressed data
yop.write3()  # write an MDF version 3 file
yop.attachments  # embedded attachments or paths to attached files
```
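What a `cut(begin=10, end=15)` call does to one raster can be sketched with a boolean mask on the master channel. This is not mdfreader's implementation: the `cut_raster` helper is hypothetical, and treating both bounds as inclusive is an assumption.

```python
import numpy as np


def cut_raster(time, channels, begin=None, end=None):
    """Keep samples with begin <= time <= end on one raster.

    Both bounds are treated as inclusive (an assumption of this sketch);
    None leaves that side open.
    """
    keep = np.ones(time.shape, dtype=bool)
    if begin is not None:
        keep &= time >= begin
    if end is not None:
        keep &= time <= end
    # the same mask applies to every channel sampled on this master
    return time[keep], {name: values[keep] for name, values in channels.items()}


time = np.arange(0.0, 20.0, 1.0)
cut_time, cut_channels = cut_raster(time, {"rpm": 10 * np.arange(20)}, begin=10, end=15)
print(cut_time.tolist())  # [10.0, 11.0, 12.0, 13.0, 14.0, 15.0]
```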