Saved EBS Costs by Cleaning Up 3 TiB of Duplicate Data in InfluxDB v1 Ryosuke Hara Wed, 28 May 2025 07:33:39 +0000 https://dev.to/axelspace/saved-ebs-costs-by-cleaning-up-3-tib-of-duplicate-data-in-influxdb-v1-23hp <p>Hi, I’m <a href="https://www.axelspace.com/" rel="noopener noreferrer">rhara</a>, a software engineer at Axelspace.</p> <p>In this post, I’ll share how we reduced the size of an <a href="https://docs.influxdata.com/influxdb/v1/" rel="noopener noreferrer">InfluxDB OSS v1</a> EBS volume from 5.9 TiB to 2.9 TiB by removing duplicate data. Since I couldn’t find much information on this process, I’m writing this as both a record and a reference.</p> <blockquote> <p>Note: This method requires creating a new InfluxDB database with a different name, meaning the original database name cannot be retained.<br><br> To restore the original name, additional steps such as using the <code>SELECT * INTO</code> clause again are required.</p> </blockquote> <h2> What is InfluxDB? </h2> <p><a href="https://docs.influxdata.com/influxdb/v1/" rel="noopener noreferrer">InfluxDB</a> is a time-series database optimized for storing and querying time-based data, especially in use cases where high write and read throughput is important—such as log aggregation. 
(<a href="https://docs.influxdata.com/influxdb/v1/concepts/crosswalk/#influxdb-is-not-crud" rel="noopener noreferrer">Docs</a>)</p> <h2> InfluxDB at Axelspace </h2> <p>At Axelspace, we use InfluxDB to store satellite telemetry data, such as power consumption data. While InfluxDB v2 is the mainstream version today, we've continued using the OSS version of v1 in Docker on an EC2 instance since our first satellite, GRUS-1A, was launched in 2018.</p> <p>Over time, we noticed that duplicate data had accumulated, leading to a bloated EBS volume. To reduce storage costs, we decided to clean up the duplicate data and migrate to a smaller EBS volume.</p> <h2> InfluxDB Cleanup </h2> <h3> How to Reduce EBS Costs? </h3> <p>Since EBS pricing depends on storage size, reducing the volume size can lower costs. However, AWS does not allow shrinking EBS volumes directly.<br><br> Following <a href="https://repost.aws/en/knowledge-center/ebs-increase-decrease-volume-size" rel="noopener noreferrer">this AWS article</a>, we opted to create a new, smaller volume and replace the existing one.</p> <h3> Cleaning Up Duplicate Data in InfluxDB </h3> <p>Our first idea was to use InfluxDB's <code>DELETE</code> command to remove duplicate data directly from the existing volume, then copy the cleaned data to the new volume.</p> <p>However, <a href="https://docs.influxdata.com/influxdb/v1/query_language/manage-database/#delete-series-with-delete" rel="noopener noreferrer">as noted in the official documentation</a>, <code>DELETE</code> only allows specifying data to delete by timestamp or tag value—making it unsuitable for fine-grained removal of duplicates.</p> <p>We considered several alternatives:</p> <ol> <li>Use the <a href="https://docs.influxdata.com/influxdb/v1/query_language/explore-data/#the-into-clause" rel="noopener noreferrer"><code>SELECT * INTO</code> clause</a> provided by InfluxQL </li> <li>Use <a 
href="https://docs.influxdata.com/flux/v0/" rel="noopener noreferrer">Flux</a> queries </li> <li>Export and deduplicate data with <a href="https://docs.influxdata.com/telegraf/v1/" rel="noopener noreferrer">Telegraf</a>, then re-ingest</li> </ol> <p>We ultimately chose Option 1: <code>SELECT * INTO</code>.</p> <p>This clause allows flexible data selection, including filtering or dropping fields, and writing the result into a new database—ideal for deduplication.<br><br> We ruled out Option 2 because some of the queries we needed were not implemented in Flux, and Option 3 because it required a full export and re-import of the data.</p> <h3> Challenges with <code>SELECT * INTO</code> </h3> <p>One downside is that <code>SELECT * INTO</code> creates a new copy of the data, temporarily increasing total data size. To avoid enlarging the existing EBS volume, we used the new volume as the destination for the copied data.</p> <p>Also, since new telemetry data was being written during the cleanup, we had to ensure that InfluxDB could remain online and that memory usage wouldn’t spike. We processed data in small time windows (e.g., one or three days) to keep memory usage manageable.</p> <h2> Step-by-Step Procedure </h2> <p>As mentioned, the core idea is to copy the data to a new, smaller EBS volume using <code>SELECT * INTO</code>—which deduplicates points as a side effect, because InfluxDB overwrites points that share the same measurement, tag set, and timestamp—and then replace the volumes.<br><br> This required making both volumes accessible to a single InfluxDB process, which took some workarounds.</p> <p>We followed these five steps:</p> <ol> <li>Set up a smaller EBS volume </li> <li>Symlink both old and new volumes into the InfluxDB directory structure </li> <li>Execute <code>SELECT * INTO</code> to deduplicate and copy data </li> <li>Copy any additional data from old to new volume </li> <li>Replace the old volume with the new one</li> </ol> <h3> 1. Prepare the New EBS Volume </h3> <p>We created a new EBS volume to hold the cleaned data. 
We refer to this as the <em>new</em> volume and the currently used one as the <em>old</em> volume.<br><br> Due to EBS limitations, we couldn't shrink the old volume, so our goal was to replace it.</p> <p>How to attach a volume is covered in the <a href="https://docs.aws.amazon.com/ebs/latest/userguide/ebs-attaching-volume.html" rel="noopener noreferrer">official AWS documentation</a>.</p> <h3> 2. Using Both Volumes in InfluxDB </h3> <p>InfluxDB OSS v1 doesn't natively support splitting storage across multiple volumes.<br><br> So we had to take some steps that aren't officially documented.</p> <h4> 2.1 Using Symbolic Links </h4> <p>By default, InfluxDB stores data in <code>/var/lib/influxdb/</code> with three subdirectories: <code>data</code>, <code>wal</code>, and <code>meta</code>. (<a href="https://docs.influxdata.com/influxdb/v1/administration/config/#dir--varlibinfluxdbdata" rel="noopener noreferrer">Docs</a>)</p> <p>Since each database has its own subdirectory, we could symlink only the new database's directories. In this explanation, we use <code>/mnt/new_ebs</code> as the mount point for the new volume:<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight shell"><code>ln -s /mnt/new_ebs/wal/new_db /var/lib/influxdb/wal/new_db
ln -s /mnt/new_ebs/data/new_db /var/lib/influxdb/data/new_db
</code></pre> </div> <p><a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0mmt9h1rv3j2wkiy856b.png" class="article-body-image-wrapper"><img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0mmt9h1rv3j2wkiy856b.png" alt="Image description" width="800" 
height="469"></a></p> <p>Note: <a href="https://community.influxdata.com/t/how-can-i-find-my-data-after-upgrade/21233/4" rel="noopener noreferrer">Symlinks don’t always work reliably in InfluxDB</a> (the link discusses v2), so data may appear missing temporarily.<br><br> To avoid this, you can configure InfluxDB to use the real path directly rather than going through symlinks. </p> <blockquote> <p>NOTE: Using bind mounts instead of symlinks may be more robust:<br><br> (<a href="https://community.influxdata.com/t/how-to-move-var-lib-influxdb-to-a-different-location/30163/2" rel="noopener noreferrer">https://community.influxdata.com/t/how-to-move-var-lib-influxdb-to-a-different-location/30163/2</a>)</p> </blockquote> <h4> 2.2 Create the New Database </h4> <p>After setting up the symlinks, run the following InfluxQL command to create the new database:<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight sql"><code><span class="k">CREATE</span> <span class="k">DATABASE</span> <span class="n">new_db</span> </code></pre> </div> <h3> 3. Copy and Deduplicate Data via <code>SELECT * INTO</code> </h3> <p>We ran commands like this, fully qualifying the destination and source with the database and retention policy (<code>autogen</code> is the default retention policy):<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight sql"><code>SELECT * INTO "new_db"."autogen".:MEASUREMENT FROM "old_db"."autogen"./.*/ WHERE ... GROUP BY *
</code></pre> </div> <p>The <code>GROUP BY *</code> clause preserves tags as tags in the destination; without it, the tags of the source data are written as fields. You can also specify tags or fields instead of using <code>*</code>.<br><br> This was the most time-consuming step—it took months to process about 5 TiB of data spanning several years.</p> <h3> 4. 
Copy Additional Data to New Volume </h3> <p>Some remaining steps:</p> <ul> <li>Copy <code>/var/lib/influxdb/meta/meta.db</code> to <code>/mnt/new_ebs/meta/meta.db</code> </li> <li>If there are other databases beyond <code>old_db</code>, copy them as well: </li> </ul> <div class="highlight js-code-highlight"> <pre class="highlight shell"><code>cp -avi /var/lib/influxdb/data/other_db /mnt/new_ebs/data
cp -avi /var/lib/influxdb/wal/other_db /mnt/new_ebs/wal
</code></pre> </div> <p>Using <code>rsync</code> is also a valid option.</p> <h3> 5. Replace the Volumes </h3> <p>Finally, point InfluxDB to the new volume by adjusting mount points or configuration.<br><br> This can be done via the config file or environment variables (<a href="https://docs.influxdata.com/influxdb/v1/administration/config/#dir--varlibinfluxdbdata" rel="noopener noreferrer">Docs</a>).</p> <h2> Final Notes </h2> <p>While our main target was duplicate data, the <code>SELECT * INTO</code> clause offers flexibility to remove or transform data during migration.</p> <p>Again, note that this approach does <strong>not</strong> preserve the original database name.<br><br> Since InfluxDB doesn't support renaming databases, if you must retain the name, you'll need to re-create the database with the same name and run <code>SELECT * INTO</code> back into it.</p> About Axelspace's security initiatives (towards achieving zero-trust) mizu Fri, 11 Apr 2025 06:13:21 +0000 https://dev.to/axelspace/about-axelspaces-security-initiatives-towards-achieving-zero-trust-31bo <h2> Introduction </h2> <p>Hello! I’m mizu, a corporate IT and security engineer at <a href="https://www.axelspace.com/en/" rel="noopener noreferrer">Axelspace</a>, a startup in the satellite industry. 
It’s been two and a half years since I joined the company, and I wanted to take the opportunity to reflect on and share what I’ve worked on so far through this technical blog.</p> <p>If you’re a security engineer at a company with <strong>100+ employees and plans to scale rapidly</strong>, I hope our experience can serve as a helpful reference.</p> <p>In this post, I’ll focus on our initiatives aimed at realizing Zero Trust security.</p> <h2> Challenges and Our Approach to Zero Trust </h2> <p>With remote work becoming the norm and SaaS adoption expanding, several security challenges came to light within our organization:</p> <ul> <li>Security logs were siloed across products, making it difficult to gain a holistic view </li> <li>SaaS usage across employees was unmonitored (Shadow IT) </li> <li>Passwords were managed individually, lacking centralized control</li> </ul> <p>To address these issues, we selected and implemented the following tools in a phased manner. The key themes of this post are <strong>“visibility”</strong> and <strong>“control.”</strong></p> <h2> EDR: Endpoint Monitoring with CrowdStrike Complete </h2> <p>To strengthen endpoint security (PCs and servers), we replaced our legacy signature-based antivirus software with <strong>CrowdStrike Complete</strong>, a leading EDR (Endpoint Detection and Response) solution.</p> <p>While many organizations already use CrowdStrike, we opted for the Complete plan, which includes MDR (Managed Detection and Response). 
This allows CrowdStrike’s security analysts to handle triage and initial response actions, enabling swift and efficient incident handling.</p> <p><strong>Changes and highlights after implementation:</strong></p> <ul> <li>Visibility into all endpoint activities, including non-incidents, greatly improved traceability </li> <li>Triage and initial containment are handled by CrowdStrike MDR, significantly reducing operational burden </li> <li>Even complex threats like Living-off-the-Land (LotL) attacks can be detected and responded to effectively </li> </ul> <blockquote> <p><strong>[Operational Note]</strong><br><br> Many companies face a common issue after implementing EDRs: “We bought it, but can’t operate it.” By including MDR with our EDR, we significantly reduced alert fatigue and operational strain.<br><br> In my experience, incidents occurring several times a month are now resolved in under an hour each. If you’re struggling with EDR operations, MDR might be a solution worth exploring.</p> </blockquote> <h2> CASB/SWG: Shadow IT Visibility and Web Access Control with Netskope </h2> <p>We implemented <strong>Netskope</strong> as our CASB/SWG solution. 
This gave us the ability to log and control user access to SaaS and web services over HTTPS via endpoint-based agents.</p> <p><a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fet5lb69e6cifhken0dvb.jpg" class="article-body-image-wrapper"><img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fet5lb69e6cifhken0dvb.jpg" alt="Visibility and control via Netskope" width="800" height="408"></a></p> <p>Unlike traditional UTM or IDS tools, which can’t inspect HTTPS traffic, Netskope’s endpoint agent decrypts SSL/TLS traffic and allows inspection of HTTP methods, file names on Google Drive, and more.</p> <p><strong>Changes and highlights after implementation:</strong></p> <ul> <li>Shadow IT usage by employees became visible (<strong>Visibility</strong>) </li> <li>We gained policy-based control over risky services like online storage, chat tools, and social media (<strong>Control</strong>) </li> <li>Access via personal Gmail or Outlook accounts can now be blocked (<strong>Control</strong>) </li> </ul> <blockquote> <p><strong>[Operational Note]</strong><br><br> Due to HTTPS decryption, Netskope sometimes inadvertently blocks traffic from development environments. We maintain an exception list for specific executables and destination URLs to avoid interfering with development.<br><br> For now, we only use the CASB/SWG functions and have chosen not to implement NPA (Netskope Private Access, a VPN alternative) for several reasons.</p> </blockquote> <h2> Password Manager: Centralized and Shared Management with Keeper </h2> <p>We introduced <strong>Keeper Security</strong> as our password manager to replace free software and browser-based storage. 
This enables secure password sharing*, auto-fill, and generation capabilities.</p> <p>We also subscribed to the <strong>BreachWatch</strong> add-on, which provides dark web monitoring, weak password detection, and scoring by user and organization.</p> <p>* While password sharing should generally be avoided, there are still unavoidable scenarios where shared accounts are required.</p> <p><a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdvh8qqlvgf9mcr95z2s9.png" class="article-body-image-wrapper"><img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdvh8qqlvgf9mcr95z2s9.png" alt="Password Manager Migration" width="800" height="169"></a><br> <em>Password Manager migration illustration</em></p> <p>Because info-stealer malware that extracts browser-stored credentials remains prevalent, a password manager like Keeper also helps defend against malware-related threats.</p> <p><strong>Changes and highlights after implementation:</strong></p> <ul> <li>Standardized password management, policies, and sharing methods (<strong>Control</strong>) </li> <li>Improved operational efficiency via auto-fill </li> <li>Admins can now monitor password breaches, weak usage, and scoring (<strong>Visibility</strong>) </li> </ul> <blockquote> <p><strong>[Operational Note]</strong><br><br> Keeper supports importing from various tools (KeePass, Google Password Manager, etc.), which made the migration smooth and user-friendly.<br><br> We didn’t mandate usage and started with a small license count to reduce initial costs. 
Expanding Keeper’s usage across the org is our next goal.</p> </blockquote> <h2> SIEM: Centralized Logging and Visualization with Elastic Cloud </h2> <p>To manage and visualize logs from all our security tools, we implemented <strong>Elastic Cloud</strong> as our SIEM platform. By integrating logs into Elastic Cloud, we enhanced visibility, centralized monitoring, and enabled long-term storage.</p> <p>Notably, our CrowdStrike plan does not retain event logs long-term, so we strongly felt the need to centralize them in a SIEM for traceability during incidents.</p> <p><a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm2p7bhjpwz8h8ktm7gta.png" class="article-body-image-wrapper"><img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm2p7bhjpwz8h8ktm7gta.png" alt="Elastic integration" width="698" height="473"></a><br> <em>Elastic integration illustration</em></p> <p><strong>Changes and highlights after implementation:</strong></p> <ul> <li>Logs from various tools can now be viewed directly on Elastic Cloud (<strong>Visibility</strong>) </li> <li>Activity across users, devices, and email addresses can be traced across tools (<strong>Visibility</strong>) </li> <li>Dashboards provide executives with visual summaries of security incidents and risk scoring (<strong>Visibility</strong>) </li> </ul> <blockquote> <p><strong>[Operational Note]</strong><br><br> Since Elastic Cloud is SaaS-based, we could skip infrastructure setup and begin operations quickly. 
However, proper configuration of ILM (Index Lifecycle Management) is still required to manage data retention.<br><br> Fortunately, the Elastic Support Assistant (AI chatbot) was very effective in resolving issues.<br><br> We plan to cover the selection process and comparison with other SIEM tools in a future article.</p> </blockquote> <h2> Closing Thoughts </h2> <p>While we’ve successfully deployed these solutions, I’d say we’re at about <strong>50%</strong> when it comes to fully utilizing their features and implementing the right policies.</p> <p>Going forward, we aim to deepen our understanding of these tools and expand improvements to previously postponed security areas.</p> <p>Some of our next steps include:</p> <ul> <li> <strong>ZTNA</strong>: Building a secure, VPN-less network </li> <li> <strong>ASM</strong>: Risk evaluation and vulnerability assessment of IT assets </li> <li> <strong>Security Education</strong>: Phishing simulations and security e-learning programs </li> </ul> <h2> We are hiring!! </h2> <p>Axelspace is actively hiring across multiple roles, and we're especially looking for security engineers!</p> <p>We’d love to hear from you if any of the following apply:</p> <ul> <li>You’re interested in the space industry </li> <li>You want to work at a fast-growing startup </li> <li>You want to have autonomy and impact in a rapidly scaling company</li> </ul> <p>If you're curious, let’s start with a casual chat!<br> <a href="https://hrmos.co/pages/axelspace/jobs/3000000006" rel="noopener noreferrer">https://hrmos.co/pages/axelspace/jobs/3000000006</a></p>