Jekyll2023-06-26T19:18:24+00:00https://integrated-earth.github.io//feed.xmlIntegrated Geodynamic Earth ModelsNSF funded project to generate integrated geodynamic Earth models.Remote Rendering Setup2023-06-01T10:00:00+00:002023-06-01T10:00:00+00:00https://integrated-earth.github.io//2023/06/01/remote<p>This post summarizes the steps needed for parallel remote rendering using ParaView. With this setup you can
visualize, on a laptop over the internet, very large simulation data that resides on a workstation.</p>
<h1 id="setup-server">Setup server</h1>
<p>After installing ParaView on the server, go into the bin/ folder and run (over an ssh terminal):</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>./mpiexec -np 16 ./pvserver -display :0
</code></pre></div></div>
<p>Note that you need to use the included <code class="language-plaintext highlighter-rouge">mpiexec</code>, which is likely different from the system MPI. This will
create 16 processes that take care of loading, filtering, and rendering the data.</p>
<h1 id="connectivity">Connectivity</h1>
<p>To connect to the server, you must be able to reach the machine (and port 11111) directly. This can be achieved
by connecting to the school VPN where the workstation sits, or alternatively by using <a href="https://tailscale.com">tailscale</a>.</p>
<h1 id="on-the-client">On the client</h1>
<p>You will need to install the same version of ParaView as the one running on the server. Simply open it up
and connect using “File”, “Connect”.</p>
<p>It is useful to enable debug output using “Edit”, “Settings”, “Render View”, “Show Annotation”. After loading
files it will look something like this:</p>
<p><img src="/images/remote-rendering.png" alt="" /></p>
<h1 id="references">References</h1>
<ul>
<li><a href="https://docs.paraview.org/en/latest/ReferenceManual/parallelDataVisualization.html">https://docs.paraview.org/en/latest/ReferenceManual/parallelDataVisualization.html</a></li>
</ul>
<p><em>(written by <a href="https://www.math.clemson.edu/~heister/">Timo Heister</a>)</em></p>A quick introduction to performance testing2023-02-03T15:00:00+00:002023-02-03T15:00:00+00:00https://integrated-earth.github.io//2023/02/03/perf<p>Today I would like to show how one can get a quick estimate of the performance impact of a specific code change using the Linux tool <code class="language-plaintext highlighter-rouge">perf</code> (see <a href="https://perf.wiki.kernel.org/index.php/Tutorial">perf tutorial</a> for an introduction).</p>
<h1 id="the-setup">The Setup</h1>
<p>I am considering the ASPECT pull request <a href="https://github.com/geodynamics/aspect/pull/5044">#5044</a> that removes
an unnecessary copy of a vector inside the linear solver. I was curious how much of a difference this makes
in practice.</p>
<p>First, we need to pick a suitable example prm file to run. The change
is inside the geometric multigrid solver, so we need to run a test
that uses it. We also want it to be large enough that we can easily
time it without too much noise. For this, we are going to pick
<a href="https://github.com/geodynamics/aspect/tree/main/benchmarks/nsinker">nsinker
benchmark</a>
and slightly modify the file (disable graphical output, disable
adaptive refinement, choose 6 global refinements). See
<a href="https://gist.github.com/tjhei/d895c50e2481d7f3b8013c69e9cf17a8">here</a>
for the file.</p>
<p>We can only get a good estimate for the performance difference if we
compare optimized versions. That’s why we need to compile both
versions of ASPECT (with and without the change) in <a href="https://aspect-documentation.readthedocs.io/en/latest/user/run-aspect/debug-mode.html?highlight=release">optimized
mode</a>. We
also use <a href="https://aspect-documentation.readthedocs.io/en/latest/user/methods/geometric-multigrid.html?highlight=candi#geometric-multigrid">native
optimizations</a>
as recommended for geometric multigrid.</p>
<h1 id="a-first-test">A first test</h1>
<p>With perf correctly configured, we can get a first idea about the program by running</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>perf stat ./aspect test.prm
</code></pre></div></div>
<p>and get something like the following output</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>perf stat ../aspect-new test.prm
-----------------------------------------------------------------------------
-- This is ASPECT, the Advanced Solver for Problems in Earth's ConvecTion.
-- . version 2.5.0-pre
-- . using deal.II 9.4.1 (dealii-9.4, 6a1115bbf6)
-- . with 32 bit indices and vectorization level 2 (256 bits)
-- . using Trilinos 13.2.0
-- . using p4est 2.3.2
-- . running in OPTIMIZED mode
-- . running with 1 MPI process
-----------------------------------------------------------------------------
Loading shared library <./libnsinker.so>
Vectorization over 4 doubles = 256 bits (AVX), VECTORIZATION_LEVEL=2
-----------------------------------------------------------------------------
-- For information on how to cite ASPECT, see:
-- https://aspect.geodynamics.org/citing.html?ver=2.5.0-pre&mf=1&sha=&src=code
-----------------------------------------------------------------------------
Number of active cells: 262,144 (on 7 levels)
Number of degrees of freedom: 8,861,381 (6,440,067+274,625+2,146,689)
*** Timestep 0: t=0 seconds, dt=0 seconds
Solving Stokes system...
GMG coarse size A: 81, coarse size S: 8
GMG n_levels: 7
Viscosity range: 0.01 - 100
GMG workload imbalance: 1
Stokes solver: 28+0 iterations.
Schur complement preconditioner: 29+0 iterations.
A block preconditioner: 29+0 iterations.
Relative nonlinear residual (Stokes system) after nonlinear iteration 1: 0.999967
Postprocessing:
System matrix memory consumption: 101.42 MB
Termination requested by criterion: end time
+---------------------------------------------+------------+------------+
| Total wallclock time elapsed since start | 117s | |
| | | |
| Section | no. calls | wall time | % of total |
+---------------------------------+-----------+------------+------------+
| Assemble Stokes system rhs | 1 | 15.9s | 14% |
| Build Stokes preconditioner | 1 | 5.96s | 5.1% |
| Initialization | 1 | 0.106s | 0% |
| Postprocessing | 1 | 0.00504s | 0% |
| Setup dof systems | 1 | 10s | 8.5% |
| Setup initial conditions | 1 | 8.58s | 7.3% |
| Setup matrices | 1 | 3.37s | 2.9% |
| Solve Stokes system | 1 | 68.1s | 58% |
+---------------------------------+-----------+------------+------------+
-- Total wallclock time elapsed including restarts: 117s
-----------------------------------------------------------------------------
-- For information on how to cite ASPECT, see:
-- https://aspect.geodynamics.org/citing.html?ver=2.5.0-pre&mf=1&sha=&src=code
-----------------------------------------------------------------------------
Performance counter stats for '../aspect-new test.prm':
117,495.66 msec task-clock # 0.987 CPUs utilized
1,134 context-switches # 9.651 /sec
44 cpu-migrations # 0.374 /sec
4,838,162 page-faults # 41.177 K/sec
412,324,642,261 cycles # 3.509 GHz
1,021,213,808,438 instructions # 2.48 insn per cycle
122,571,132,407 branches # 1.043 G/sec
338,189,797 branch-misses # 0.28% of all branches
119.093952900 seconds time elapsed
112.405895000 seconds user
5.087723000 seconds sys
</code></pre></div></div>
<p>As you can see, we are indeed running in optimized mode, with
vectorization enabled, and we are solving a 3d problem with 8.8
million degrees of freedom. It takes about 68 seconds to solve the
Stokes system with a single MPI rank.</p>
<h1 id="the-real-setup">The real setup</h1>
<p>For a more realistic test, we will run the same program with 4 MPI ranks (this way at least some of the cost of possible changes in communication is accounted for) by using <code class="language-plaintext highlighter-rouge">mpirun -n 4 ./aspect</code>. Finally,
<code class="language-plaintext highlighter-rouge">perf</code> supports running the program several times and averaging the stats. This turns out to be necessary,
as the change is otherwise too small to detect.</p>
<p>Our final command line is thus</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>perf stat -r 10 mpirun -n 4 ../aspect test.prm
</code></pre></div></div>
<h1 id="the-result">The result</h1>
<p>The output without the patch is</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Performance counter stats for 'mpirun -n 4 ../aspect-old test.prm' (10 runs):
182,419.23 msec task-clock # 4.010 CPUs utilized ( +- 0.44% )
1,042 context-switches # 5.934 /sec ( +- 7.12% )
137 cpu-migrations # 0.780 /sec ( +- 4.94% )
2,016,941 page-faults # 11.485 K/sec ( +- 0.36% )
536,241,394,539 cycles # 3.054 GHz ( +- 0.25% )
1,180,113,849,900 instructions # 2.25 insn per cycle ( +- 0.12% )
159,889,552,768 branches # 910.491 M/sec ( +- 0.24% )
446,788,836 branch-misses # 0.28% of all branches ( +- 0.30% )
45.494 +- 0.200 seconds time elapsed ( +- 0.44% )
</code></pre></div></div>
<p>while the new version gives</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>
Performance counter stats for 'mpirun -n 4 ../aspect-new test.prm' (10 runs):
174,309.09 msec task-clock # 3.880 CPUs utilized ( +- 0.21% )
1,350 context-switches # 7.787 /sec ( +- 4.85% )
102 cpu-migrations # 0.588 /sec ( +- 5.06% )
1,993,599 page-faults # 11.499 K/sec ( +- 0.38% )
522,637,629,676 cycles # 3.015 GHz ( +- 0.11% )
1,170,583,946,847 instructions # 2.24 insn per cycle ( +- 0.09% )
157,504,211,601 branches # 908.506 M/sec ( +- 0.17% )
448,626,534 branch-misses # 0.29% of all branches ( +- 0.23% )
44.9306 +- 0.0919 seconds time elapsed ( +- 0.20% )
</code></pre></div></div>
<h1 id="conclusion">Conclusion</h1>
<p>The new code executes about 1% fewer instructions, and the total runtime
decreases from 45.5 to 44.9 seconds (with some uncertainty, see
above). The Stokes solve takes around 26 seconds (not shown), which
means the patch improves the Stokes solve by about 2%.</p>
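<p>The arithmetic behind these percentages can be checked quickly. A sketch using the numbers quoted above (the 26-second Stokes time is the approximate value mentioned in the text, not read off a table):</p>

```python
# Total runtimes from the two averaged perf runs above (seconds):
old_total = 45.494    # without the patch
new_total = 44.9306   # with the patch
stokes = 26.0         # approximate Stokes solve time, as quoted in the text

saved = old_total - new_total
print(f"saved {saved:.2f} s, {saved / old_total:.1%} of total runtime")
print(f"relative to the Stokes solve: {saved / stokes:.1%}")
```

<p>Attributing the whole saving to the Stokes solve is reasonable here because the patch only touches the linear solver.</p>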
<p>What is not taken into account here is that the construction and
use of the vector also causes some MPI communication, which is
potentially more expensive when running large simulations on more than
a single node.</p>
<p><em>(written by <a href="https://www.math.clemson.edu/~heister/">Timo Heister</a>)</em></p>Videos and Interactive Visualization Tests2022-08-02T15:18:18+00:002022-08-02T15:18:18+00:00https://integrated-earth.github.io//2022/08/02/viz<p>This blog post is a quick progress report on experiments on video rendering and interactive visualizations. I am including instructions on how these were generated below.</p>
<h1 id="video-rendering">Video rendering</h1>
<p>After loading the single timestep into ParaView, we set up the following filter chain:</p>
<p>We then save the files as .png files using File->Save Animation… and render using ffmpeg:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ffmpeg -framerate 25 -i a.%04d.png -c:v libx264 -profile:v high -crf 20 -pix_fmt yuv420p output.mp4
</code></pre></div></div>
<h2 id="youtube-version">Youtube version</h2>
<iframe width="500" height="420" src="https://www.youtube.com/embed/LRs1Qdm-FSI" frameborder="0" allowfullscreen=""></iframe>
<p>See <a href="https://youtu.be/LRs1Qdm-FSI">https://youtu.be/LRs1Qdm-FSI</a></p>
<h2 id="self-hosted-and-directly-embedded">Self-hosted and directly embedded</h2>
<video width="500" height="420" muted="" autoplay="" controls="">
<source src="https://f.tjhei.info/files/fres-spherical-rotating-for-cover.mp4" type="video/mp4" />
</video>
<h1 id="interactive-visualizations">Interactive visualizations</h1>
<p>After experimenting with different technologies, we are quite happy with what <a href="https://kitware.github.io/glance/index.html">ParaView Glance</a> has to offer. It can open <code class="language-plaintext highlighter-rouge">vtkjs</code> files (these can be obtained by exporting directly from ParaView by clicking File -> Export Scene…) directly and allows the user to toggle visibility and rendering style for individual pieces of the visualization much like in ParaView itself.</p>
<p>You can generate state files by going to <a href="https://f.tjhei.info/glance">our Glance instance</a>, opening a local vtkjs file, and clicking “save state”.
To host the examples online, upload the state file to f.tjhei.info and provide the address of the file in a link of the form</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>https://f.tjhei.info/glance/?name=NAME&url=URL
</code></pre></div></div>
<p>where you replace NAME by a filename and URL by the full address on where the file can be downloaded from. Note that NAME has to have the same file extension as the file to load (in our examples it is <code class="language-plaintext highlighter-rouge">.glance</code>). Finally, the URL and the instance of Glance have to be on the same server (single origin browser policy), so files need to be served from f.tjhei.info.</p>
<p>Here are two examples (click the image to open glance):</p>
<p><a href="https://f.tjhei.info/glance/?name=faulting-demo.glance&url=https://f.tjhei.info/view-faulting-test/202278_17-26-18.glance"><img src="/images/glance-fault.png" alt="Faulting Demo" /></a>
<a href="https://f.tjhei.info/glance/?name=spherical-cover-v2.glance&url=https://f.tjhei.info/view-spherical-cover/spherical-cover-v2.glance"><img src="/images/glance-spherical.png" alt="Spherical Demo" /></a></p>
<h1 id="future-work">Future work</h1>
<ul>
<li>Test loading data files directly (VTI, etc.)</li>
<li>Test the animation support</li>
<li>Test problems with surface deformation</li>
<li>Using resampled spherical models</li>
</ul>
<h1 id="references">References</h1>
<ul>
<li><a href="https://kitware.github.io/glance/index.html">ParaView Glance</a></li>
</ul>Starting Earth Models2021-08-25T15:18:18+00:002021-08-25T15:18:18+00:00https://integrated-earth.github.io//2021/08/25/starting-earth-models<p>Mantle convection and the associated plate tectonics are some of the
most fundamental yet complex processes here on Earth. The complexity
arises from several physical processes governing the mantle
circulation at different temporal and spatial scales. With the recent
increase in the availability of computational resources and advanced
numerical techniques, global mantle flow models have become possible
to investigate the underlying physics of plate tectonics. Here, we
will discuss one such model developed as part of the NSF-funded
project,
<a href="https://integrated-earth.github.io/">Integrated Geodynamic Earth Models</a>.</p>
<p>We set up instantaneous mantle convection models using
<a href="https://aspect.geodynamics.org/">ASPECT</a>, an open-source code that
simulates problems in the Earth’s mantle, with the goal of reproducing
present-day GPS velocities and deformation patterns. The material
properties in our models are constrained using recent high-resolution
geophysical observations. The main components of our models and the
corresponding parameter values are described below:</p>
<p>1) Input global tomography model: Several global seismic tomography
models have emerged since the early 21st century, revealing detailed
heterogeneity throughout the mantle. The models differ in the input
travel times of a seismic phase and the frequency content of that
phase. Currently, we use the joint P- and S-wave tomography model,
LLNL-G3D-JPS, by Simmons et al. (2015), with a resolution of ~1 degree
in the upper mantle and ~2 degrees in the lower mantle.</p>
<p>2) Density model: Our models are driven by buoyancy forces, which are
calculated using a depth-dependent scaling of density anomalies with the
S-wave anomalies. We base crustal densities on the Crust1.0 model
(Laske et al., 2013), averaging over the upper-, middle-, and
lower-crust layers.</p>
<p>3) Temperature model: We compute temperature variations from a mantle
adiabat using a constant scaling factor, -4.2, applied to the S-wave
anomalies in the global tomography model (see Figure below). The
shorter-wavelength heterogeneity expected in the upper mantle is
often smoothed out in global tomography models. Therefore, we use a
high-resolution temperature model (TM1 in Tutu et al., 2018) in the
top 300 km that includes the variable ages of continental lithosphere,
the cooling of oceanic lithosphere, and cold slab structures.</p>
<p>4) Plate boundaries: We prescribe plate boundaries from the Global
Earthquake fault database in
<a href="https://github.com/GeodynamicWorldBuilder/WorldBuilder">WorldBuilder</a>,
an open-source code that can prescribe complex initial conditions in
geodynamic models.</p>
<p>5) Rheology computation: We use dislocation and diffusion creep with
different prefactors for different mineral phases. The average lateral
variations in viscosity are scaled to a reference viscosity profile
(Steinberger and Calderwood, 2006) that is consistent with the
observed geoid. Additionally, we weaken the plate boundaries to
localize deformation along them.</p>
<p>We include all these components in a modular fashion to test the
relative importance of each component in best matching the surface GPS
observations.</p>
<p><img src="/images/sem-fig1.png" alt="" /></p>
<p>To resolve the high deformation at the plate boundaries, we use
adaptive mesh refinement in ASPECT. Our current highest-resolution
models have a minimum cell size of ~10 km:</p>
<p><img src="/images/sem-fig2.png" alt="" /></p>Scaling parallel IO in ASPECT2021-08-15T15:18:18+00:002021-08-15T15:18:18+00:00https://integrated-earth.github.io//2021/08/15/aspect-io<p>The goal of this post is to evaluate the performance of generating
visualization output for large computations produced by ASPECT. ASPECT
uses deal.II to generate the visualization output. By default we
generate VTU files of the unstructured mesh. Instead of generating one
file per MPI rank, the output can be <em>grouped</em> to a specified number
of files (even a single one). These files are written using MPI I/O,
which should allow for fast performance.</p>
<h1 id="machine-setup-and-striping">Machine setup and striping</h1>
<ul>
<li>Computations done on Frontera, nsinker benchmark, adaptive refinement</li>
<li>Computations were done on 32 nodes (56 cores each)</li>
<li>/scratch1/ LFS filesystem has 16 OSTs (file servers), up to 60 GB/s</li>
<li>Striping can be enabled (-1 = maximum striping) by calling
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>lfs setstripe -c -1 <file or folder>
</code></pre></div> </div>
</li>
<li>default striping is 1 (disabled)</li>
</ul>
<h1 id="results">Results</h1>
<p><img src="/images/vtu-io-scaling.png" alt="" /></p>
<h1 id="conclusions">Conclusions</h1>
<p>This little experiment showed some interesting results:</p>
<ul>
<li>
<p>For now, grouping to 16 without striping gives the best performance. This is the default in ASPECT.</p>
</li>
<li>
<p>We can achieve up to 2 GB/s in performance. This is far from the theoretical maximum.</p>
</li>
<li>
<p>Overall, performance is good enough: The linear solver takes about 30 seconds for 800m DoFs (IO: 5 seconds).</p>
</li>
</ul>
<h1 id="future-work">Future work</h1>
<ul>
<li>
<p>Compare against HDF5 output.</p>
</li>
<li>
<p>Check why striping is slower than writing several files.</p>
</li>
<li>
<p>How about 32 files instead of 16?</p>
</li>
</ul>
<h1 id="references">References</h1>
<ul>
<li><a href="https://frontera-portal.tacc.utexas.edu/user-guide/files/">Frontera Guide</a></li>
</ul>Sampling AMR data, storing, and rendering it2020-08-15T15:18:18+00:002020-08-15T15:18:18+00:00https://integrated-earth.github.io//2020/08/15/structured-netcdf<p>The goal of this article is to compare storage formats and rendering of Finite
Element solutions produced with ASPECT. The computational mesh in ASPECT is a
collection of adaptively refined octrees in 3d, see an example image below.
For this experiment, we consider a stationary test benchmark in a unit cube
that is part of ASPECT, see
<a href="https://github.com/geodynamics/aspect/tree/master/benchmarks/nsinker">nsinker</a>.</p>
<p>As part of <a href="https://integrated-earth.github.io/">this NSF funded project</a>, we
are evaluating how sampling unstructured data to a structured mesh works, and
how storage and rendering of this structured data compares to the original
unstructured output produced by ASPECT and the underlying
<a href="https://dealii.org">deal.II library</a>.</p>
<h1 id="parallel-sampling-of-unstructured-amr-data">Parallel sampling of unstructured AMR data</h1>
<p>For this experiment, we wrote an ASPECT postprocessor that samples arbitrary
solution variables from an unstructured AMR grid onto a structured grid of
arbitrary resolution. The postprocessor can be run at different resolutions to
produce “multi-resolution” output.</p>
<p>The structured data is generated by looping over all cells, evaluating the
solution variables at each quadrature point for some quadrature rule (slower
to evaluate but more accurate with more quadrature points per cell). We then
use nearest neighbor interpolation (the values at the closest quadrature point
are used for each structured grid point). See the following image for an example:</p>
<p><img src="/images/draw-io-structured-sampling.png" alt="" /></p>
<p>Here, black is the unstructured mesh, red are the quadrature points in each
cell, blue circles are the points of the structured mesh, and the arrows
denote what data is used at each point. Notice that a more sophisticated
interpolation could be used, but this is certainly accurate enough for
graphical visualization.</p>
<p>Internally, the algorithm transforms each quadrature point location to index
space and then “splats” the solution onto nearby structured points. We keep
track of the real-world distance to the currently closest quadrature point at
each structured point (as an additional output variable). When “splatting”, we
only overwrite the current value if the new distance is smaller than the
stored one.</p>
<p>This also works in an MPI-parallel computation, because we can split the
structured mesh between processors and we know, based on a given index, who the
owner is. Each rank sends a list of indices with values and their distances to
the owner, which then performs the same “splat” operation as it does for
its own values.</p>
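<p>As a rough 1-d illustration of this splat-with-distance scheme (a simplified sketch, not the actual ASPECT code: each sample lands only on its single nearest grid point, and the real postprocessor works in 3d and exchanges the index/value/distance lists over MPI):</p>

```python
import math

def splat_nearest(points, values, n, lo, hi):
    """Nearest-neighbor "splat" of 1-d scattered samples onto a structured
    grid of n points spanning [lo, hi].  A per-point distance array decides
    whether an incoming sample overwrites the stored value."""
    h = (hi - lo) / (n - 1)
    grid = [0.0] * n
    dist = [math.inf] * n          # distance to the closest sample so far
    for x, v in zip(points, values):
        i = min(max(round((x - lo) / h), 0), n - 1)   # index space
        d = abs(x - (lo + i * h))  # real-world distance to the grid point
        if d < dist[i]:            # overwrite only if strictly closer
            dist[i] = d
            grid[i] = v
    return grid, dist

grid, dist = splat_nearest([0.1, 0.9, 1.2], [10.0, 20.0, 30.0], 3, 0.0, 2.0)
print(grid)   # 0.9 wins grid point 1 over 1.2, since it is closer
```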
<p>The structured mesh is then output in parallel using the netCDF library with
the HDF5 backend.</p>
<h1 id="structured-netcdf-vs-unstructured-compressed-vtu">Structured netCDF vs unstructured, compressed vtu</h1>
<p>We now compare the output of a structured netCDF file against the unstructured
VTU output (as done using deal.II). The example is done on a sequence of
adaptively refined meshes (based on the viscosity, visualized below). We
output 6 data values in double precision (this corresponds to x, y, and z
velocity, pressure, temperature, and viscosity here).</p>
<p>The unstructured solution in one of the intermediate steps looks like this (we
show surface rendering, volume rendering, and isosurface rendering with 3
contours):</p>
<p><img src="/images/result-amr.png" alt="" /></p>
<p>The unstructured data stores an unstructured list of cells (the leaves of the
octree) with vertex coordinates in a compressed VTU file format. This is done
by sending the binary representation of the data through zlib, followed by a
base64 encoding, to end up with a valid XML VTU file.</p>
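<p>The zlib-plus-base64 pipeline can be reproduced in a few lines (a sketch of the encoding idea only; the real VTU writer also prepends block-size headers that this example omits):</p>

```python
import base64
import struct
import zlib

# 1000 doubles of raw binary data, as a stand-in for a solution vector
data = struct.pack("<1000d", *([1.5] * 1000))

compressed = zlib.compress(data)          # binary -> smaller binary
encoded = base64.b64encode(compressed)    # binary -> ASCII, safe inside XML

print(len(data), len(compressed), len(encoded))

# the encoding round-trips exactly
assert zlib.decompress(base64.b64decode(encoded)) == data
```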
<p>The structured data at various resolutions looks like this:</p>
<p><img src="/images/result-structured.png" alt="" /></p>
<p>The netCDF file stores the data as binary using HDF5 directly and without
compression (also see below for compression).</p>
<p>Now to the results. We start with the structured data:</p>
<p><img src="/images/table-structured.png" alt="" /></p>
<p>The table shows the resolution, file size, memory consumption, and render
time inside ParaView for selected data points.</p>
<p>For comparison, this is the unstructured data:</p>
<p><img src="/images/table-amr.png" alt="" /></p>
<p>Here, the resolution refers to the maximum resolution (compared to the table
above). Notice that the file sizes are orders of magnitude smaller, while rendering
time and memory consumption inside ParaView are orders of magnitude larger.</p>
<p>To conclude:</p>
<p>First, contour rendering is very fast, regardless of method and resolution
(0.01s not shown in the table), while contour extraction is quite a lot slower
for an unstructured mesh.</p>
<p>Second, surface rendering is very fast for both methods (interactive
framerates even on a laptop without GPU rendering).</p>
<p>Third, file sizes for structured meshes quickly become very large but memory
consumption inside ParaView is very efficient.</p>
<h1 id="float-vs-double-netcdfhdf5">float vs double netCDF/hdf5</h1>
<p>An obvious question is if we can reduce the file size of the netCDF
files. First, let’s consider storing the data as floats instead of
doubles. The loss in accuracy is unlikely to be problematic for visualization,
but as expected, we save a factor of 2 in file size. Rendering performance and
memory consumption also improves by a similar factor:</p>
<p><img src="/images/table-float.png" alt="" /></p>
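<p>The factor of 2 is simply the ratio of element sizes (8-byte doubles vs. 4-byte floats); a quick stdlib check, independent of netCDF:</p>

```python
import array

values = [0.1 * i for i in range(100_000)]
as_double = array.array("d", values)   # 8 bytes per value
as_float = array.array("f", values)    # 4 bytes per value

print(as_double.itemsize * len(as_double))  # 800000 bytes
print(as_float.itemsize * len(as_float))    # 400000 bytes
```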
<h1 id="other-options">Other options</h1>
<p>netCDF (HDF5) supports compression of the data (using nc_def_var_deflate()),
but it turns out that this is not supported when writing data in parallel. This
makes it unusable for us during a computation. We could compress after the
fact using <code class="language-plaintext highlighter-rouge">nccopy</code>, as reading from compressed data should work in parallel.
See <a href="https://www.unidata.ucar.edu/blogs/developer/en/entry/netcdf_compression">this post on netCDF compression</a>
for more details.</p>
<h1 id="conclusions">Conclusions</h1>
<p>This little experiment showed some interesting results:</p>
<ol>
<li>
<p>Structured data is a lot more efficient for rendering (RAM consumption,
rendering time, contour extraction time)</p>
</li>
<li>
<p>High resolution structured data has very large file sizes (floats are a
good option, but files will still be quite large without additional
compression). Outputting at a lower resolution is certainly an attractive
option, as we are already sampling the data anyway.</p>
</li>
<li>
<p>All parts of the algorithms and the file sizes scale with the resolution,
making it cheaper to produce the output as well. This makes structured,
lower resolution output attractive as a data exchange format and to do
quick visualization.</p>
</li>
<li>
<p>Sampling to structured data and compressing the data could be done as a
postprocessing step in a separate code base after the fact if we store the
unstructured data.</p>
</li>
</ol>
<h1 id="future-work">Future work</h1>
<ul>
<li>
<p>Clean up the sampling process and merge into ASPECT.</p>
</li>
<li>
<p>Update the code to support spherical geometries (lat/long/depth).</p>
</li>
<li>
<p>Compare unstructured data against vtkHyperOctree or vtkHyperTreeGrid from
VTK. This requires investigation of the file formats.</p>
</li>
<li>
<p>Compare against OSPRay AMR rendering using <a href="https://www.willusher.io/publications/tamr">TAMR</a>.</p>
</li>
</ul>
<h1 id="references">References</h1>
<ul>
<li>
<p><a href="https://arxiv.org/pdf/1703.00212.pdf">Two New Contributions to the Visualization of AMR Grids:
I. Interactive Rendering of Extreme-Scale 2-Dimensional Grids
II. Novel Selection Filters in Arbitrary Dimension
</a></p>
</li>
<li>
<p><a href="https://dx.doi.org/10.1111/cgf.13958">Feng Wang, Nathan Marshak, Will Usher, Carsten Burstedde, Aaron Knoll, Timo Heister, Chris R. Johnson:
CPU Ray Tracing of Tree-Based Adaptive Mesh Refinement Data
Computer Graphics Forum, 2020.</a></p>
</li>
<li>
<p><a href="https://www.unidata.ucar.edu/blogs/developer/en/entry/netcdf_compression">About netCDF compression</a></p>
</li>
</ul>Welcome2020-03-13T15:18:18+00:002020-03-13T15:18:18+00:00https://integrated-earth.github.io//jekyll/update/2020/03/13/first-news<p>Hi there, this is the beginning of our project website for the NSF funded project “Collaborative Research: Development and Application of a Framework for Integrated Geodynamic Earth Models”. Stay tuned for more content.</p>