Stackscale | Experts in Private Cloud and infrastructure | https://www.stackscale.com

The Linux kernel prepares for a “day after” Linus Torvalds
https://www.stackscale.com/blog/linux-kernel-prepares-for-day-after-linus-torvalds/ | Wed, 28 Jan 2026

For more than three decades, the Linux kernel has grown with a comfortable paradox: a massive project maintained by hundreds of people worldwide, yet with a final, centralised step that has effectively passed through the hands of a near-single figure. Linus Torvalds didn’t just write the first lines of code; since 1991, he has also been the final integrator of the mainline tree—the canonical repository that decides what lands and what doesn’t.

That centrality—both efficient and fragile—now has a formal answer. The kernel community has documented a continuity procedure: a “plan to activate a plan” in case there is no orderly transition one day, whether due to retirement, incapacity, or any scenario that leaves the stewardship of the canonical repository in limbo.

It’s not a dramatic turn and it doesn’t crown an heir. It’s what a mature, industrial-grade project does: accept that even strong informal institutions need a protocol once time passes and the software stops being a hobby and becomes critical infrastructure.

Linux: from “personal project” to digital backbone

It’s easy to forget that Linux began as a small kernel in the era of newsgroups and mailing lists. What followed was organic growth: first in universities and technical environments, then as a foundation for servers, networking and supercomputing, and later as an indirect pillar of the mobile world (through Android) and the cloud.

The impact is obvious to anyone running production systems: Linux isn’t “just another operating system”. It’s a de facto standard in data centres, a structural ingredient in the software supply chain, and a layer on which essential services are built. When something like that depends—at the very last step—on a single central figure, a familiar risk concept comes into play: the bus factor.

The bus factor of mainline

Kernel development is distributed: subsystem maintainers, intermediate trees, public review, pull requests, and an engineering culture that has turned Linux into a remarkably effective social machine. Still, the final step—merging into mainline—has historically been the “good bottleneck”: a consistency filter that keeps the project coherent in both technical direction and style.

The uncomfortable question is not whether the community has the talent to keep going. The question is what happens if, when the time comes, there is no transition “with a script”—or if the person responsible for mainline cannot (or does not want to) facilitate it. That is the gap the new document is designed to close.

What the continuity procedure establishes

The text has been added to kernel documentation as “Linux kernel project continuity”, describing an emergency mechanism that is intended to be activated only if there is no smooth transition.

The logic is straightforward:

  • An Organizer is designated to kick off the process. The document defines this role as the most recent Linux Maintainers Summit organizer, with the chair of the Linux Foundation Technical Advisory Board (TAB) as an alternative.
  • That Organizer has 72 hours to initiate discussion with the invitees of the most recent Maintainers Summit.
  • If more than 15 months have passed since the last Maintainers Summit, the TAB determines the set of invitees.
  • The group may include additional maintainers if needed.
  • Within two weeks, a representative of the group communicates next steps to the wider community via ks*****@*********ux.dev.

In other words: the clock is defined, the convener is defined, the “electoral college” is bounded to the people already carrying day-to-day technical responsibility, and public communication is required on a short timeline. The goal isn’t to pre-appoint a successor—it’s to avoid a vacuum and reduce uncertainty in a project on which millions of deployments depend.

The approach also reveals a key idea: kernel continuity is not decided behind closed doors or through a mass vote, but within the core group that already bears the project’s operational weight.

Why this matters to companies, cloud, and operations

For an infrastructure provider—and especially for the European cloud ecosystem, where Linux is ubiquitous—the headline is not “Linus might retire tomorrow”. The headline is that the kernel treats as first-class something IT considers almost sacred: planning for critical failure modes.

Business continuity is a relentless lesson: technical redundancy is not enough; you also need organisational redundancy. The Linux kernel is a textbook case of software becoming global public infrastructure. Once a project reaches that status, its governance must also be “operable” under adverse conditions.

In systems terms:

  • If Linux underpins your workloads, you benefit from clear, documented procedures for unexpected scenarios.
  • If you run platforms at scale, you want the core of your stack to have continuity mechanisms—just like you demand within your own organisation.
  • If your business depends on stability and planning (kernel cycles, drivers, security, support), you don’t want the final integration point to be stuck in limbo due to the absence of a protocol.

There’s also a broader reading: in a world where the software supply chain is under increasing scrutiny, trust doesn’t come only from code quality. It also comes from how a project governs itself when the hard days arrive.

A quiet lesson: Linux scales in governance too

Linux has long proven it can scale in lines of code, number of maintainers, and technical complexity. The continuity document adds another layer: the ability to scale organisationally.

The community isn’t saying “this will happen”. It’s saying “if it happens, we won’t improvise”. That subtle difference is enormous—and it’s one of the signals that separates a brilliant project from a truly industrial one.

In an industry where critical systems live on procedures (change management, review, audits, recovery, incident response), formalising a continuity plan is not bureaucracy. It’s maturity.

Sysadmin quick take

  • What changes: a continuity procedure is documented to keep mainline moving if the canonical repository cannot be managed normally.
  • Trigger: lack of an orderly transition or an inability to facilitate one.
  • Timelines: 72 hours to start the process; 2 weeks to communicate next steps publicly.
  • Governance: driven by the Maintainers Summit invitee set and the Linux Foundation TAB, with Linux Foundation support.

References: Linus Torvalds on GitHub and System Administration

Let’s Encrypt Opens the Door to HTTPS by IP: IPv4 and IPv6 Certificates With a 160-Hour Lifetime
https://www.stackscale.com/blog/lets-encrypt-opens-the-door-to-https-by-ip/ | Sun, 18 Jan 2026

For years, web encryption has relied on a basic assumption: if a service wants HTTPS, it needs a domain name. That’s practical—people browse names, not numbers—but also historical: certificate validation and lifecycle management were designed around DNS.

That rule has started to change. As of January 15, 2026, Let’s Encrypt has announced general availability of TLS certificates for IP addresses, a development that makes it possible to secure HTTPS connections directly over IPv4 or IPv6 without relying on a domain. This is particularly relevant for homelabs, temporary infrastructure, test environments, “base” network services, and scenarios where a domain does not fit (or does not exist).


What an “IP Certificate” Means—and Why It Matters Now

A traditional TLS certificate includes domain names in the SAN (Subject Alternative Name) field so that a browser or client can verify that the server it is connecting to is who it claims to be. With this new option, the certificate identifier can be an IP address: the client validates the encrypted channel against that IP, not against an FQDN.
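The distinction between a dNSName and an iPAddress SAN entry is easy to illustrate. As a minimal sketch (the function name is ours, not part of any library), this is roughly how a client decides which SAN entry type a server identifier must be matched against:

```python
import ipaddress

def san_entry_type(identifier: str) -> str:
    """Return which SAN entry type a TLS client would match this
    server identifier against: iPAddress for IP literals, dNSName
    for host names."""
    try:
        ipaddress.ip_address(identifier)  # accepts IPv4 and IPv6 literals
        return "iPAddress"
    except ValueError:
        return "dNSName"

print(san_entry_type("203.0.113.10"))    # IPv4 literal -> iPAddress
print(san_entry_type("2001:db8::1"))     # IPv6 literal -> iPAddress
print(san_entry_type("www.example.com")) # host name    -> dNSName
```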

In practice, this solves a very common problem: accessing an HTTPS service by IP—say, an admin console, a temporary panel, a test front-end, or an internal service exposed briefly—without browser warnings, self-signed certificates, or permanent security exceptions.

Let’s Encrypt acknowledges that this will not be “for everyone.” Their own view is that most services will continue to work better with domain-based certificates, due to flexibility (hosting changes, load balancing, multi-site deployments) and usage habits. But for the cases where HTTPS by IP is genuinely needed, general availability removes a barrier that has slowed administrators and developers for years.


The Trade-Off: Ultra-Short 160-Hour Certificates

The key condition for IP certificates is their mandatory short lifetime: 160 hours, a little over six days. Let’s Encrypt chose this approach for a straightforward reason: IP addresses can change hands quickly, especially with residential connections or ephemeral deployments. Keeping a “long-lived” certificate tied to an IP that the requester no longer controls would create avoidable risk.

This aligns with the broader industry trend toward shortening exposure windows in the event of key theft or mis-issuance, and leaning on automation so rotation becomes routine. Let’s Encrypt explicitly positions the 160-hour certificates as part of its strategy to move the ecosystem toward shorter lifecycles.


How Issuance Works: ACME Profiles and Validation Challenges

To obtain IP certificates, Let’s Encrypt requires an ACME client that supports ACME Profiles, and the requester must explicitly select the shortlived profile. The design prioritizes automation and leaves less room for “manual” setups that tend to be forgotten over time.

There are also sensible technical constraints: you cannot use DNS-01 to prove control of an IP address (there is no DNS record that proves ownership of the address as the primary identifier), so validation is restricted to http-01 and tls-alpn-01. In practical terms, the server must demonstrate real control over the endpoint reachable at that IP to pass the challenge.
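For http-01, RFC 8555 defines both the URL the validator fetches and the response body the server must answer with (the “key authorization”). A minimal sketch of that construction (helper names are ours; a real ACME client derives the JWK from its account key):

```python
import base64
import hashlib
import json

def b64url(data: bytes) -> str:
    # ACME uses base64url encoding without padding
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def http01_challenge(token: str, account_jwk: dict):
    """Build the path the CA will fetch and the key authorization the
    server must answer with, per RFC 8555: token '.' thumbprint, where
    the thumbprint is base64url(SHA-256(canonical JWK)) per RFC 7638.
    `account_jwk` must contain only the required JWK members."""
    canonical = json.dumps(account_jwk, sort_keys=True,
                           separators=(",", ":")).encode()
    thumbprint = b64url(hashlib.sha256(canonical).digest())
    path = f"/.well-known/acme-challenge/{token}"
    return path, f"{token}.{thumbprint}"
```

The CA issues an HTTP GET to that path on port 80 of the IP being validated and expects the key authorization as the body.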


Real-World Use Cases: From Homelabs to Critical Infrastructure Services

Let’s Encrypt lists several scenarios where these certificates make sense, and the common pattern is clear: environments where a domain is a luxury, an extra cost, or simply unnecessary for the technical goal.

Typical examples include:

  • Secure access to services without a domain, accepting that this is less convenient and more fragile than DNS-based setups.
  • Default hosting provider pages—when someone pastes an IP into a browser and the service wants to respond over HTTPS without errors.
  • Infrastructure services such as DNS over HTTPS (DoH) or other endpoints where a public certificate helps client validation.
  • Remote access to home devices (NAS, IoT, lab equipment) when there is no associated domain.
  • Ephemeral connections inside cloud infrastructure, including administration or temporary services—provided there is a public IP available.

The underlying point is that the internet is not only “web pages”: more and more services use HTTPS as a secure transport, even if the end user never sees a friendly name in the browser bar.


Quick Comparison: What Changes With Short-Lived Certificates

  • “Classic” certificate: domain (DNS) identifier, 90-day validity, standard automated renewal; best fit: public web, stable services, hybrid infrastructure.
  • Planned short-lived (domain): domain (DNS) identifier, 45-day validity, opt-in and gradual transition; best fit: teams that want more frequent rotation.
  • Ultra-short: domain or IP identifier, 160-hour validity, shortlived profile (ACME Profiles); best fit: homelabs, ephemeral environments, direct IP access, testing.

A Note for Sysadmins: Without Automation, This Isn’t Viable

The downside of 160-hour certificates is obvious: renewing every few days is not realistic without a solid automation and monitoring chain. At a minimum, that means:

  • scheduled renewals with sufficient margin,
  • automated deployment of the new certificate,
  • alerts if renewal fails,
  • and recurring tests (for example, verifying externally if the endpoint is public).
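As a rough illustration of the margin involved (the two-thirds renewal point mirrors common ACME client defaults, such as renewing 90-day certificates 30 days before expiry; it is not a Let’s Encrypt mandate), the arithmetic for a 160-hour certificate looks like this:

```python
from datetime import datetime, timedelta

LIFETIME = timedelta(hours=160)  # mandatory lifetime of the shortlived profile

def renewal_due(not_before: datetime, fraction: float = 2 / 3) -> datetime:
    """Schedule renewal once `fraction` of the lifetime has elapsed,
    leaving the remainder as margin for retries and alerting."""
    return not_before + LIFETIME * fraction

issued = datetime(2026, 1, 15, 12, 0)
due = renewal_due(issued)           # ~106.7 hours after issuance
margin = (issued + LIFETIME) - due  # ~53.3 hours left to catch failures
```

In other words, the automation must succeed roughly twice a week, with a little over two days of slack to detect and fix a failed run.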

Let’s Encrypt does not hide that this is the direction it wants the market to move toward. Their roadmap points to progressively reducing the default validity period—from 90 to 45 days—starting with opt-in adoption and supported by ecosystem improvements to make renewal more predictable.


A Small Change on the Surface, With Big Implications

A free, high-volume CA like Let’s Encrypt issuing certificates for IP addresses has impact beyond the novelty. In practice, it normalizes a pattern: encrypt by default even when there is no DNS, using a trust model based on rapid rotation.

For advanced users, the result is immediate: more options to build and expose services without insecure shortcuts. For the broader ecosystem, the message is deeper: the future of TLS looks less like “install a cert and forget it,” and more like “my infrastructure renews certificates the same way it rotates secrets.”


Frequently Asked Questions

What’s the point of a TLS certificate for an IP if I can use dynamic DNS?

It’s useful in environments where you don’t want—or can’t—depend on DNS: labs, testing, ephemeral endpoints, infrastructure services, or direct IP access. With a valid public certificate, clients can verify HTTPS without security exceptions.

Why does Let’s Encrypt cap these certificates at 160 hours?

Because IP “ownership” can change quickly (dynamic IPs, reassignment, temporary infrastructure). A short window reduces the risk of a certificate remaining valid after the requester no longer controls that IP.

What do I need to issue an IP certificate with Let’s Encrypt?

An ACME client compatible with ACME Profiles and an explicit request for the shortlived profile. Also, validation cannot be done with DNS-01; you must use methods such as http-01 or tls-alpn-01, depending on the deployment.

How should homelabs or dev environments operate such short-lived certificates?

With full automation: renewals with margin, automated deployment, alerts, and continuous verification. If the process depends on manual steps, expirations and service interruptions become likely.

Source: Let’s Encrypt Opens the Door to HTTPS by IP

Proxmox in 2025: the definitive leap from “alternative” to standard — and how Stackscale speeds up the migration
https://www.stackscale.com/blog/proxmox-in-2025/ | Tue, 16 Dec 2025

For years, Proxmox Virtual Environment (Proxmox VE) lived in a kind of “in-between land”: too powerful to be dismissed as a lab tool, yet still lacking the commercial gravity of the biggest names in virtualization. In 2025, that balance has broken. This open-source platform has solidified as a real — and in many cases preferred — option for companies looking to reduce dependence on complex licensing, unify virtualization and containers, and keep control of their infrastructure.

The conversation no longer revolves only around price. In the day-to-day work of infrastructure teams, the question is more practical: “Can we migrate with minimal friction, operate with real guarantees, and get serious support?” That’s where Proxmox has been closing the loop: technical maturity, a large community, a growing partner ecosystem, and a consistent roadmap. And with Stackscale, you also get access to specialized, close-to-the-customer support.


From a niche project to global adoption

In terms of deployment, Proxmox has been growing for a while — but the number that best captures its maturity is scale. Proxmox Server Solutions places the platform at more than 1,500,000 installed hosts and presence in 140+ countries. That data point alone dismantles the “minority solution” narrative and reflects sustained adoption in production environments.

In parallel, industry media and players have highlighted a symbolic milestone: 2025 marks 20 years of project development — a rare track record for infrastructure tools that not only survive, but improve their value proposition year after year. That anniversary also arrives at a particularly favorable market moment: many organizations have revisited their strategy after shifts in the VMware ecosystem and a rising sensitivity to technology sovereignty.


What changed: why Proxmox fits better in 2025

The short version is familiar: Proxmox combines KVM virtualization, LXC containers, cluster management, high availability, integrated storage (ZFS/Ceph), connectivity with network storage solutions, and an API ready for automation. But the real impact shows up when you overlay it with today’s operational pressure:

  • Standardization and simplification: a single console for VMs, containers, networking, storage, and roles.
  • Automation: natural integration with IaC tools and pipelines — critical when infrastructure teams can’t “click” their way through everything.
  • Hybrid architectures: Proxmox doesn’t force you into “all on-prem” or “all cloud”; it supports an in-between path.
  • A culture of control: the open-source model encourages auditing, understanding, and deciding — instead of accepting black boxes.

This shift is also reflected in market surveys: in 2025, PeerSpot places Proxmox VE at 16.1% mindshare in the server virtualization segment — a meaningful indicator of visibility and consideration among professionals comparing solutions.


Migrating from VMware: less drama, more engineering

The critical point isn’t “does Proxmox work?” — it’s “how do you get there?” VMware-to-Proxmox migrations often fail when they’re treated as a simple VM move. In reality, the dependencies that matter more than the VM itself are: networks, storage, backups, maintenance windows, compliance, observability, team procedures, and above all, the rollback plan.

In this context, Stackscale keeps pushing a realistic approach: turn migration into a controlled project, with clear phases, testing, and validation — instead of a weekend sprint. In its Proxmox VE migration guide, the company frames the move as an “efficient and secure” alternative for organizations that need to move forward without getting trapped by third-party decisions.

The key is to break the journey into three layers:

1) Platform layer

Define the destination: a Proxmox cluster with a network design; shared storage, synchronously replicated storage, or Ceph; high availability; segmentation; and roles. Here you decide whether the goal is a private cloud, a hybrid model, or a “transitional Proxmox” setup to exit VMware first and consolidate later.

2) Data and continuity layer

Before moving workloads, you protect what can’t be lost: backup policies, retention, encryption, restore testing, and — where relevant — a multi-datacenter strategy.

3) Workload layer

Batch-based migration: start with less critical systems, then core services, and finally the most delicate dependencies (databases, hardware-tied licenses, apps with specific drivers, or virtual appliances).
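The batching above can be sketched as a trivial wave planner (the inventory format and tier names are hypothetical, purely illustrative of the ordering):

```python
# Hypothetical criticality tiers, ordered from first wave to last
WAVE_ORDER = ["low", "core", "delicate"]

def plan_waves(inventory):
    """Group workloads into ordered migration waves: less critical
    systems first, core services next, delicate dependencies last."""
    waves = {tier: [] for tier in WAVE_ORDER}
    for vm in inventory:
        waves[vm["tier"]].append(vm["name"])
    return [waves[tier] for tier in WAVE_ORDER]

inventory = [
    {"name": "db01", "tier": "delicate"},  # database: last wave
    {"name": "web01", "tier": "low"},      # stateless front-end: first wave
    {"name": "auth01", "tier": "core"},    # core service: middle wave
]
```

The point is not the code but the discipline: every workload is assigned a wave before anything moves, and each wave has its own validation and rollback window.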


Proxmox + Stackscale: a proposal focused on what happens after the migration

A migration promise is worthless if day-two operations become painful. That’s why Stackscale’s approach is built around two ideas: production-ready infrastructure and continuity mechanisms.

A clear example is backups. In its content on Proxmox Backup Server (PBS), Stackscale presents PBS as a structural component in Proxmox environments running mission-critical workloads: incremental backups, deduplication, encryption, and fast restores — with the ability to integrate with “Archive”-type storage and multi-location designs (for example, between Madrid and Amsterdam) to reinforce strategies like 3-2-1.

That point — backups as part of the design, not an add-on — often makes the difference between a migration that merely “works” and a platform that’s sustainable. Week one, everything boots. The first real incident? Only the teams that tested recovery beforehand make it through cleanly.


On-premises to Proxmox: modernize without giving up control

Beyond VMware, 2025 is accelerating another shift: companies running heterogeneous on-prem environments (mixed hypervisors, NAS without governance, manual processes) want a unified management layer without losing control over data and latency.

Proxmox fits especially well here because it enables you to:

  • Consolidate virtualization and containers under a single control plane.
  • Design HA with clustering and move workloads with more flexibility.
  • Build mission-critical infrastructure more consistently.
  • Professionalize backups and replication.
  • Prepare for a hybrid jump (without rewriting everything) when the business demands it.

The takeaway is simple: rather than treating “move to the cloud” as a dogma, many organizations are choosing to get their house in order with a properly built private cloud — and then connect only what they need.


The 2025 effect: Proxmox isn’t explained anymore — it’s compared

The clearest sign of maturity is that Proxmox is no longer defended with ideological arguments (“open source is better”), but with operational comparisons: total cost, deployment speed, ease of automation, support, resilience, and reversibility. In a market where licensing changes have strained budgets and IT roadmaps, that comparison has become routine.

And that’s where Proxmox is gaining ground: it enters the conversation with two strong cards — solid technology and a control-first narrative — right when many organizations are tired of depending on other people’s decisions.


Frequently asked questions

What typical risks show up when migrating from VMware to Proxmox VE, and how can they be reduced?
The most common ones are isolated incompatibilities, poorly documented network/storage dependencies, and lack of restore testing. You reduce them with a phased pilot, a real workload inventory, and a rollback plan.

Can Proxmox VE be used as the foundation of an enterprise private cloud?
Yes — especially when it’s designed with clustering, high availability, network segmentation, role-based access control, and verified backup/retention policies.

What does Proxmox Backup Server add compared to “traditional backups” in Proxmox?
PBS is built for Proxmox: incremental backups, deduplication, encryption, and fast restores with tight ecosystem integration — simplifying operations and improving recovery times.

Which strategy is better: migrate to Proxmox first and redesign later, or redesign before migrating?
It depends on the timeline. In many cases, a controlled migration (an orderly exit) followed by step-by-step optimization works better, so the project doesn’t turn into an endless “big migration.”

Source: Cloud News Tech

Proxmox Backup Server: enterprise-grade backups for Proxmox environments on Stackscale
https://www.stackscale.com/blog/proxmox-backup-server-enterprise/ | Thu, 11 Dec 2025

In any modern infrastructure, backups are no longer a “nice to have”, but a structural component on the same level as storage or networking. When you work with Proxmox VE clusters, HA solutions, and mission-critical workloads, having a backup solution designed specifically for this ecosystem is what makes the difference between a fast recovery and a long, expensive outage.

That’s where Proxmox Backup Server (PBS) comes in: an enterprise backup solution, 100% open source, designed to protect virtual machines, containers, and physical hosts with incremental, deduplicated, encrypted backups, tightly integrated with Proxmox VE.

On Stackscale infrastructure, Proxmox Backup Server can be combined with Archive storage (via NFS or S3-compatible access), available across data centers in two regions (Madrid and Amsterdam), to build robust, multi–data center backup strategies aligned with the 3-2-1 model. It can also be complemented with remote storage from other providers to add extra backup copies and data resilience.

What is Proxmox Backup Server?

Proxmox Backup Server is a Linux distribution focused specifically on backup, based on Debian with native ZFS support, written primarily in Rust and licensed under GNU AGPLv3. Its goal is to offer an open alternative to proprietary backup solutions, without limiting the number of machines or the amount of protected data.

Key features include:

  • Incremental, deduplicated backups for VMs, containers, and physical hosts.
  • High-performance Zstandard compression.
  • Client-side authenticated encryption (AES-256-GCM), so data travels and is stored encrypted even on targets that are not fully trusted.
  • Integrity verification using checksums and schedulable verification jobs.
  • Native integration with Proxmox VE, including support for QEMU dirty bitmaps for fast incremental backups.
  • Support for tape (LTO) and S3-compatible storage as data backends.

[Screenshot: Proxmox Backup Server 4.1 dashboard]

Proxmox Backup Server 4.1, released in November 2025, is based on Debian 13.2 “Trixie”, uses the Linux 6.17.2-1 kernel as the new default, and ZFS 2.3.4. It introduces important improvements in traffic control, verification, and S3-based storage usage.

Architecture and how it works

Client–server model and namespaces

PBS follows a client–server model: one or more backup servers store the data, while clients (including Proxmox VE) send backups incrementally over TLS 1.3. Backups are organized into datastores and namespaces, which makes it possible to separate environments (for example, production, preproduction, or different clusters) while maintaining effective deduplication.

Thanks to deduplication and incremental backups, only the blocks that have changed since the last backup are transferred and stored, drastically reducing network and storage consumption.
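The mechanism can be illustrated with a toy content-addressed store (chunk size and structure are deliberately simplified for the demo; PBS’s real on-disk format differs):

```python
import hashlib

class ChunkStore:
    """Toy content-addressed store illustrating deduplicated,
    incremental backups: chunks are keyed by their SHA-256 digest,
    so a block already present is never stored (or sent) twice."""

    CHUNK_SIZE = 4  # tiny chunks for the demo; PBS uses far larger ones

    def __init__(self):
        self.chunks = {}   # digest -> chunk bytes
        self.uploaded = 0  # chunks actually transferred

    def backup(self, data: bytes):
        """Return the list of digests referencing this snapshot."""
        index = []
        for i in range(0, len(data), self.CHUNK_SIZE):
            chunk = data[i:i + self.CHUNK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            if digest not in self.chunks:  # store new content only
                self.chunks[digest] = chunk
                self.uploaded += 1
            index.append(digest)
        return index

store = ChunkStore()
store.backup(b"AAAABBBBCCCC")  # first backup: three new chunks stored
store.backup(b"AAAABBBBDDDD")  # second backup: only one new chunk stored
```

Each snapshot keeps its own full index of digests, so any backup remains restorable on its own while unchanged blocks are shared across snapshots.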

Integration with Proxmox VE

In Proxmox VE, you only need to add a PBS datastore as a new backup storage to start using it for VMs and containers. From the Proxmox interface itself you define backup jobs (full, incremental, scheduling, retention) targeting the PBS datastore.

PBS also allows you to restore:

  • Full VMs (with live restore support in many cases).
  • Full containers.
  • Individual files from VM or container backups, directly from the GUI or an interactive shell.

Security, integrity, and ransomware protection

Security is one of the core pillars of Proxmox Backup Server:

  • Client-side encryption with AES-256-GCM, so the backup server cannot read the data without the corresponding keys.
  • SHA-256 checksums to verify the integrity of all stored blocks.
  • Schedulable verification jobs to detect bit rot or corruption.
  • A granular role-based access control (RBAC) model with roles, API tokens, integration with LDAP, Active Directory, and OpenID Connect, plus 2FA support.

Starting with version 4.1, PBS also adds user-based traffic control and bandwidth limiting for S3 endpoints, which helps prevent network congestion when running heavy backup or restore operations to object storage.

All of this makes Proxmox Backup Server a very solid component in a broader strategy for defending against ransomware and large-scale infrastructure failures.

Proxmox Backup Server on Stackscale infrastructure

On Stackscale, Proxmox Backup Server can be integrated with different types of storage, but the recommended design is always that the primary backup copy lives on Archive storage, isolated and redundant, and then faster layers are added if you need to accelerate recovery.

The general pattern looks like this:

  • Reference copy (golden copy): always on Stackscale Archive storage, accessible via NFS or S3-compatible API, and available across three separate physical locations (2 data centers in Madrid and 1 in Amsterdam).
  • Optional performance layers: additional copies on faster network storage from Stackscale (All-Flash) to speed up frequent restores.
  • Local disk storage on the backup server: possible, but with clear limitations: access to the backups depends on that specific physical server, and this layer does not provide redundancy.

Archive storage as the primary backend for Proxmox Backup Server

Stackscale’s Archive storage should be the reference backend for Proxmox Backup Server, ensuring that no matter what happens to hosts or clusters, there is always a consistent, recoverable copy in a separate environment:

  1. Archive mounted via NFS
     • PBS sees Archive as just another datastore exported via NFS.
     • Ideal for large backup volumes over the network, while keeping PBS’s own deduplication and compression.
     • Recommended as the primary destination for medium- and long-term retention.
  2. Archive via S3-compatible access
     • Archive can be exposed as an S3 bucket and configured in PBS as a remote target.
     • This allows you to leverage Proxmox Backup Server’s object storage capabilities (including bandwidth limiting and multi–data center scenarios).
     • Especially suitable for long-term backups and disaster recovery plans.
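As a sketch, wiring an NFS-mounted Archive export into PBS takes two steps (the server name and export path below are hypothetical placeholders, and mount options should be tuned to your environment):

```shell
# 1) Mount the Archive NFS export on the PBS host, e.g. via /etc/fstab:
#    archive-mad.example.internal:/export/pbs  /mnt/archive-pbs  nfs4  rw,hard  0 0
mount /mnt/archive-pbs

# 2) Register the mount point as a PBS datastore:
proxmox-backup-manager datastore create archive-madrid /mnt/archive-pbs
```

From there the datastore appears in the PBS interface and can be added to Proxmox VE as backup storage.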

In both cases, Archive is available in two regions:

  • 2 data centers in Madrid.
  • 1 data center in Amsterdam.

This makes it easy to design multi–data center strategies: for example, keeping primary backups in Archive Madrid and periodically syncing them to Archive Amsterdam using PBS remote sync jobs.

Fast network storage as an accelerated recovery layer

On top of Archive, customers can add a faster network storage layer from Stackscale (All-Flash) to:

  • Store the most recent backups (for example, from the last few days).
  • Speed up frequent restores or recovery tests.
  • Reduce RTO for services that are extremely sensitive to downtime.

In this model, Archive remains the single source of truth, and the network storage acts as a hot cache for the most common restores.

Local disk: a situational option, but use with care

Local storage on the Proxmox Backup Server host itself can still be used in some scenarios (for example, as an initial backup buffer), but it comes with several drawbacks:

  • Access to the backups depends on the health of that specific physical server.
  • There is no intrinsic redundancy at this layer unless data is replicated to Archive or network storage.
  • In the event of a serious host or local storage failure, the most valuable backup layer could be lost.

That’s why, in professional architectures on Stackscale, the recommended approach is:

  1. Stackscale Archive as the mandatory destination for critical backups (NFS or S3).
  2. Fast network storage to accelerate restores.
  3. Local disk only as a temporary or complementary layer, never as the only backup repository.

Example architectures with PBS and Archive on Stackscale

1. Proxmox private cloud with Archive as primary repository and a fast recovery layer

  • Proxmox VE cluster in Madrid.
  • Proxmox Backup Server deployed as a VM or bare-metal server on Stackscale.
  • Primary PBS datastore hosted on Stackscale Archive storage (in Madrid), accessible via NFS or S3-compatible access.
  • Optionally, an additional datastore on fast network storage (SSD/NVMe over network) to speed up frequent restores.

In this design, backup jobs always write to Archive, which acts as the golden copy.
Optionally, you can define additional jobs or internal PBS syncs towards the fast datastore to maintain a “hot” layer for faster restores of specific machines.

To comply with a 3-2-1 and multi–data center strategy, a common pattern is:

  • Keep the primary datastore in Archive Madrid.
  • Periodically sync that datastore to Archive Amsterdam using PBS remote sync jobs.

This way, even if a full data center fails, the organization still has recoverable copies in another region (Madrid and Amsterdam).
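The 3-2-1 rule itself can be written down as a check. A rough sketch, with hypothetical copy locations modelled on the Madrid/Amsterdam setup above:

```python
from dataclasses import dataclass

@dataclass
class BackupCopy:
    location: str   # e.g. "archive-madrid" (illustrative names)
    medium: str     # e.g. "local-disk", "archive-nfs"
    offsite: bool   # stored in a different data center/region

def satisfies_3_2_1(copies: list[BackupCopy]) -> bool:
    """At least 3 copies, on at least 2 different media,
    with at least 1 copy off-site."""
    return (
        len(copies) >= 3
        and len({c.medium for c in copies}) >= 2
        and any(c.offsite for c in copies)
    )

plan = [
    BackupCopy("proxmox-cluster", "local-disk", offsite=False),    # live data
    BackupCopy("archive-madrid", "archive-nfs", offsite=False),    # golden copy
    BackupCopy("archive-amsterdam", "archive-nfs", offsite=True),  # remote sync
]
```

Dropping the Amsterdam copy makes the check fail, which is exactly the gap a single-region design leaves open.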

2. Multi-cluster platforms with namespaces and Archive as the single source of truth

  • Several dedicated Proxmox clusters (production, preproduction, QA, customer environments, etc.) across different Stackscale data centers.
  • One or more central Proxmox Backup Server instances, with datastores backed by Archive and namespaces separated by environment or internal customer.
  • Optionally, additional datastores on faster network storage for environments that require very low RTO (for example, production).

In this model:

  • All environments write their backups to Archive, organized into namespaces (for example: prod/, preprod/, qa/, customer-X/).
  • Namespaces provide logical isolation and allow different retention policies and permissions while still benefiting from global datastore deduplication.
  • For certain namespaces (like production), you can add a “fast” datastore that acts as an accelerated recovery layer, but always backed by Archive as the long-term, multi–data center repository.

In this way, the organization combines:

  • Governance and segregation (namespaces + RBAC).
  • Robust, isolated copies on Archive.
  • And, where needed, very fast restores from higher-performance network storage.

3. Backups of physical hosts and specific services with Archive as the primary target

Proxmox Backup Server is not limited to VMs and containers: the backup client can be installed on physical Linux servers to back up files or volumes.

On Stackscale, this makes it possible to:

  • Protect specific appliances (for example, physical database servers, legacy application nodes, or specialized servers).
  • Send those backups directly to Stackscale Archive storage, using NFS or S3-compatible access as the main PBS backend.

By placing these backups on Archive:

  • You decouple the backup from the physical hardware itself (which is often a single point of failure).
  • You gain multi–data center redundancy by replicating PBS datastores between the Archive locations in Madrid and Amsterdam.
  • Long-term retention becomes more cost-efficient on a per-GB basis, while keeping the option to restore in another region if needed.

If you need to speed up certain restores (for example, for regular DR drills or frequent incidents on a critical physical server), you can add an additional datastore on fast network storage—but always under the assumption that the “definitive” copy lives on Archive.

Best practices when designing the solution

When combining Proxmox Backup Server with Archive on Stackscale, it’s recommended to:

  • Define a clear retention policy (daily, weekly, monthly) using PBS pruning options and the prune simulator to visualize its long-term effect.
  • Separate operational backups (fast restores) from archival backups (long-term retention) into different datastores or namespaces.
  • Take advantage of per-user traffic control and S3 bandwidth limiting so backup windows don’t saturate the network or impact production services.
  • Integrate PBS with the corporate identity system (LDAP/AD/OIDC) and enable 2FA for administrative logins.
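The retention side of the first point can be illustrated with a simplified model of PBS-style keep-daily/keep-weekly selection (assuming one snapshot per day; real PBS pruning has more keep-* options, which is exactly what its prune simulator visualizes):

```python
from datetime import date, timedelta

def prune(snaps: list[date], keep_daily: int, keep_weekly: int) -> list[date]:
    """Simplified keep-daily/keep-weekly selection, assuming one snapshot
    per day. Newest snapshots win; weeks already covered by a daily keep
    do not get an extra weekly keep."""
    ordered = sorted(set(snaps), reverse=True)      # newest first
    daily = ordered[:keep_daily]                    # most recent N days
    covered = {d.isocalendar()[:2] for d in daily}  # (year, ISO week)
    weekly: list[date] = []
    for d in ordered[keep_daily:]:
        week = d.isocalendar()[:2]
        if week not in covered:
            covered.add(week)
            weekly.append(d)                        # newest snapshot of that week
        if len(weekly) == keep_weekly:
            break
    return sorted(daily + weekly)

# One snapshot per day for the last 30 days (illustrative):
snapshots = [date(2026, 1, 28) - timedelta(days=i) for i in range(30)]
kept = prune(snapshots, keep_daily=7, keep_weekly=2)
```

With 30 daily snapshots, `keep_daily=7` and `keep_weekly=2` retain 9 snapshots; everything else becomes reclaimable space after garbage collection.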

Quick summary for administrators

  • What it is
    Proxmox Backup Server is an enterprise, open source backup solution based on Debian and written in Rust, with native ZFS support, deduplication, Zstandard compression, and client-side authenticated encryption.
  • What it’s for
    Protecting VMs, containers, and physical hosts with fast, efficient incremental backups, natively integrated with Proxmox VE and with support for tape and S3-compatible storage.
  • Key new features in 4.1
    Based on Debian 13.2 “Trixie”, Linux kernel 6.17, ZFS 2.3.4, per-user traffic control, configurable parallelism for verification jobs, and bandwidth limiting for S3 endpoints.
  • How it fits into Stackscale
    It can be deployed as a VM or bare-metal node inside a Proxmox private cloud, using Stackscale’s Archive storage as an NFS or S3-compatible backend, available in Madrid and Amsterdam, to build multi–data center backup architectures with fast local restores and long-term off-site copies.

If you’re already considering Proxmox as a virtualization alternative, or you want to strengthen your backup strategy on Proxmox infrastructures running on Stackscale, Proxmox Backup Server + Archive is a very solid combination to gain resilience without giving up the open source model or sovereignty over your data.


The post Proxmox Backup Server: enterprise-grade backups for Proxmox environments on Stackscale appeared first on Stackscale.

Proxmox Datacenter Manager 1.0: the new “command center” for Proxmox environments at Stackscale https://www.stackscale.com/blog/proxmox-datacenter-manager-new/ Tue, 09 Dec 2025 09:13:09 +0000

The Proxmox ecosystem has just taken an important step forward with the release of Proxmox Datacenter Manager 1.0, the first stable version of its centralized management platform. The tool is designed to solve a very specific problem: how to operate, monitor, and scale dozens of Proxmox VE nodes and clusters and Proxmox Backup Server instances spread across multiple data centers and locations, without losing visibility or increasing operational complexity.

For companies already using Proxmox-based infrastructure —such as Stackscale customers with dedicated platforms and clusters in high-availability data centers— this release opens the door to a management model much closer to what tools like VMware vCenter have historically offered, while preserving the open-source philosophy and flexibility of the Proxmox ecosystem.


What is Proxmox Datacenter Manager?

Proxmox Datacenter Manager (PDM) is a “single pane of glass” platform designed to monitor and manage multiple, independent Proxmox environments from a single control point. From that central console you can see nodes, clusters, virtual machines, containers, storage, and backup datastores, even if they are spread across different data centers or remote locations.

Instead of logging manually into each Proxmox VE cluster or each Proxmox Backup Server instance, PDM aggregates metrics, states, and alerts, and lets you perform basic operations on resources from a unified view. It does not aim to completely replace the classic Proxmox VE interface, but it does become the top governance and multi-cluster orchestration layer.

Technically, Proxmox Datacenter Manager 1.0 is built on Debian 13.2 “Trixie,” a Linux 6.17 kernel, and ZFS 2.3, and it is developed in Rust end to end (backend, CLI, and the new Yew-based web UI framework). The result is a modern platform focused on performance, security, and long-term maintainability.


Key technical capabilities of PDM 1.0

While the most visible change is the global view across all clusters, version 1.0 ships with a set of features that are highly relevant for enterprise environments and infrastructure providers like Stackscale and their customers’ platforms:

1. Centralized view and metrics aggregation

PDM lets you register multiple “remotes” (Proxmox VE nodes and clusters and backup servers) and display their global health status on a single dashboard: CPU usage, RAM, storage I/O, capacity, alerts, and critical KPIs. It also caches data locally, so you keep the latest snapshot of the environment even if a remote becomes temporarily unavailable.

[Screenshot: Proxmox Datacenter Manager dashboard on Stackscale]

For a customer with several Proxmox clusters spread across different racks or Stackscale data centers, this makes it easy to answer at a glance questions like: “Where do I have free capacity?”, “Which cluster is under the most pressure?”, or “How are the metrics of a critical service evolving across all sites?”.
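The local caching behaviour described above (serving the last known snapshot when a remote is unreachable) boils down to a pattern like this; the remote name and the fetch callable are hypothetical, not the PDM API:

```python
import time
from typing import Callable

class RemoteStatusCache:
    """Keep the last successful status snapshot per remote, so a dashboard
    can still render (stale) data when a remote is temporarily down."""
    def __init__(self) -> None:
        self._cache: dict[str, tuple[dict, float]] = {}

    def poll(self, remote: str, fetch: Callable[[], dict]) -> dict:
        try:
            status = fetch()                      # e.g. HTTP call to the remote
            self._cache[remote] = (status, time.monotonic())
            return {**status, "stale": False}
        except ConnectionError:
            if remote in self._cache:             # fall back to cached snapshot
                status, ts = self._cache[remote]
                return {**status, "stale": True, "age_s": time.monotonic() - ts}
            raise                                 # never seen this remote succeed
```

Marking the data as stale, rather than hiding the remote entirely, is what keeps the dashboard useful during a partial outage.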

2. Multi-cluster management and live migration across clusters

One of PDM’s strongest points is its multi-cluster management and the ability to perform live migrations of virtual machines between independent clusters, without service disruption. This allows you to:

  • Rebalance load across clusters when one approaches its limit.
  • Carry out planned maintenance without downtime windows.
  • Implement high-availability scenarios across data centers with much greater flexibility.

On Proxmox platforms hosted at Stackscale, this fits perfectly with designs where a customer has multiple dedicated clusters (for example, production and pre-production in different racks or even different data centers) and wants to move workloads between them without manual export/import processes.

3. Basic VM and container lifecycle from a single panel

From Proxmox Datacenter Manager you can start, stop, reboot, and perform basic operations on VMs, containers, and storage resources on different remotes, without jumping between interfaces. The tool also centralizes task and log history, which simplifies auditing and compliance in regulated environments.

For systems teams, this means less time spent on scattered “clickops” and more capacity to standardize operational procedures, scripts, and automations around a single API.

4. Advanced search and custom views

The integrated search uses a syntax inspired by languages like Elasticsearch or GitHub’s query language, allowing you to filter resources by type (VM, container, remote, etc.), state (running, stopped, etc.), or tags. In environments with thousands of virtual guests, quickly locating a problematic VM or a specific datastore is no longer a challenge.
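At its core, a `type:vm status:running tag:prod` style search is just parsing key:value terms and matching them against resource attributes. An illustrative sketch (not PDM's actual parser):

```python
def parse_query(query: str) -> dict[str, str]:
    """Turn 'type:vm status:running' into {'type': 'vm', 'status': 'running'}."""
    return dict(term.split(":", 1) for term in query.split())

def matches(resource: dict, query: str) -> bool:
    for key, value in parse_query(query).items():
        if key == "tag":
            if value not in resource.get("tags", []):
                return False
        elif str(resource.get(key)) != value:
            return False
    return True

# Hypothetical inventory entries:
resources = [
    {"type": "vm", "status": "running", "tags": ["prod"], "name": "db-01"},
    {"type": "ct", "status": "stopped", "tags": ["qa"], "name": "ci-07"},
]
hits = [r["name"] for r in resources
        if matches(r, "type:vm status:running tag:prod")]
```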

On top of that, PDM lets you create custom “views” —dashboards focused by remote, resource type, tags, internal customer, and more— and bind them to specific roles. A development team, for example, can have its own view of resources without seeing or touching the rest of the infrastructure.

5. Integration with Proxmox Backup Server and SDN (EVPN)

Version 1.0 includes full integration with Proxmox Backup Server: you can view datastores, namespaces, and backup snapshots in dedicated panels, along with usage and performance metrics. This makes it easier to monitor the state of backups globally, which is especially important in multi-site disaster recovery strategies.

On the networking side, PDM introduces initial Software-Defined Networking (SDN) capabilities with EVPN, allowing you to configure zones and VNets across multiple remotes from a single interface. For architectures with L2/L3 overlays distributed across data centers, this is key to maintaining configuration consistency.

6. Security, authentication, and access governance

Proxmox Datacenter Manager supports authentication via LDAP, Active Directory, and OpenID Connect, as well as tokens and 2FA. It combines identity support with a very granular role-based access control (RBAC) system: you can grant access to specific views without giving direct access to the underlying nodes or virtual machines.

In multi-tenant environments or organizations with highly segmented teams (development, security, operations, business), this helps balance operational visibility with the principle of least privilege.

7. Centralized update management and remote shell access

The platform adds a centralized update panel showing repository status and pending packages across all connected Proxmox VE and Proxmox Backup Server instances. From there, you can trigger updates using PDM’s new remote shell capabilities, without opening separate sessions to each node.

For providers like Stackscale or for their customers who manage their own Proxmox platform, this global view of patch status significantly reduces the risk of running desynchronized clusters or leaving vulnerabilities unpatched somewhere in the environment.


What does Proxmox Datacenter Manager bring to Stackscale customers?

Stackscale customers already running platforms based on Proxmox VE and Proxmox Backup Server —whether in self-managed mode or as a managed service— gain several clear benefits when they add Proxmox Datacenter Manager to their architecture:

Unified multi–data center, multi-cluster view

Instead of treating each Proxmox cluster as a “silo,” PDM gives an end-to-end view of the distributed infrastructure: dedicated clusters in different racks, redundant data centers, nodes specifically for AI workloads or databases, and even Proxmox environments running on-premises at the customer’s facilities and connected to Stackscale through dedicated links or VPN.

This global view helps with:

  • Capacity planning and scale-out decisions.
  • Early detection of CPU, RAM, or I/O bottlenecks.
  • Prioritizing hardware investments where they have the most impact.

Workload mobility across clusters and data centers

Live migration across clusters, combined with Stackscale’s network and inter–data center connectivity options, unlocks interesting scenarios:

  • Moving production workloads to a more powerful cluster without downtime.
  • Draining clusters to perform hardware migrations or generational refreshes.
  • Preparing moves between data centers as part of a resilience strategy or an orderly exit from a given facility.
[Screenshot: cross-cluster migration in Proxmox Datacenter Manager on Stackscale]

All of this with a unified operational experience, instead of treating each move as a separate manual project.

Better governance, auditing, and multi-tenant operation

In organizations where different departments share the same Stackscale-based Proxmox platform, Proxmox Datacenter Manager helps to:

  • Define specific views for each business unit, project, or internal customer.
  • Limit the scope of operations by role (for example, allowing a development team to start/stop VMs in its own scope, but not touch shared network or storage).
  • Centralize logs and tasks, improving traceability of who did what, where, and when.

This fits especially well in mission-critical environments where regulatory compliance, separation of duties, and auditing are non-negotiable.

More efficient day-to-day operations

For teams managing dozens or hundreds of Proxmox nodes at Stackscale, the combination of:

  • Advanced search,
  • A unified task panel,
  • Centralized updates,
  • Shell access from a single console,

reduces friction in day-to-day work. Routine tasks take less time, and there are fewer errors caused by juggling multiple interfaces.


Typical use cases in Stackscale environments

Some examples of how Proxmox Datacenter Manager can fit into real-world architectures on top of Stackscale:

Multi-site private cloud with DR

A company deploys dedicated Proxmox clusters in two Stackscale data centers and maintains a third site as a disaster recovery location. With PDM, all three environments are managed from the same panel, with the ability to migrate workloads between clusters and monitor the state of remote backups and datastores.

Consolidation after a VMware migration

After changes to VMware’s licensing and strategy, many organizations are looking for open alternatives with more predictable costs. Proxmox, combined with a specialized provider like Stackscale, has become one of the go-to options. Proxmox Datacenter Manager acts as the missing piece to comfortably operate multiple Proxmox clusters that replace previous VMware environments.

Hybrid on-premises + Stackscale environments

A customer maintains small Proxmox clusters in branches or plants, and a backbone of more powerful clusters hosted at Stackscale for core workloads. PDM lets them manage both ends —edge and core— from the same platform, with aggregated metrics and coordinated operations.


Deployment and support considerations

Proxmox Datacenter Manager 1.0 is available as an ISO image that can be installed on bare metal or on top of an existing Debian installation, and it is distributed as free software under the GNU AGPLv3 license. For production environments, Proxmox offers subscriptions with access to enterprise repositories and certified technical support.

For Stackscale customers, the most common options are:

  • Deploying PDM as a dedicated VM (or bare-metal node) within their own Proxmox platform at Stackscale.
  • Integrating it with corporate directory services (LDAP/AD/OIDC) and existing security policies.
  • Defining, together with the Stackscale team, a strategy for views, roles, and permissions aligned with how systems, development, and business teams are organized.

As a first stable release, it makes sense to start with a well-scoped pilot —for example, connecting a few non-critical clusters— and then evolve toward full management of the entire Proxmox footprint as the organization gains confidence in the tool. At Stackscale we can already help you with the deployment and maintenance of your own Proxmox Datacenter Manager instance, so feel free to reach out.


Quick summary for administrators

  • What it is: an open-source, centralized management platform for Proxmox VE and Proxmox Backup Server, built on Debian 13.2, Linux kernel 6.17, ZFS 2.3, and developed in Rust.
  • Problem it solves: lack of global visibility and unified operations in environments with multiple clusters and distributed nodes.
  • Key technical features: aggregated metrics, RBAC-driven custom views, advanced search, live migration between clusters, centralized update panel, SDN with EVPN, and integration with Proxmox Backup Server.
  • Value for Stackscale customers: makes it easier to operate multi-cluster, multi–data center Proxmox platforms, improve workload mobility, strengthen governance and auditing, and reduce the time spent on routine operational tasks.

For organizations that already trust Proxmox running on Stackscale infrastructure —or that are planning a migration from other virtualization platforms— Proxmox Datacenter Manager 1.0 becomes a strategic building block to professionalize environment management without giving up open-source principles or infrastructure sovereignty.

The post Proxmox Datacenter Manager 1.0: the new “command center” for Proxmox environments at Stackscale appeared first on Stackscale.

How to prepare your infrastructure for traffic peaks without your site going down https://www.stackscale.com/blog/how-to-prepare-your-infrastructure-for-traffic-peaks-without-your-site-going-down/ Thu, 20 Nov 2025 12:04:57 +0000

On a normal day, everything looks fine: the website is fast, APIs are calm, and monitoring dashboards stay in comfortable ranges.

Then a big newsletter goes out, a paid campaign performs better than expected, or Black Week (Black Friday and Cyber Monday) kicks off. Suddenly traffic spikes and you quickly see whether your infrastructure is well designed — or whether every peak turns into a late-night incident.

For many projects (eCommerce, SaaS, online banking, media, gaming, etc.), traffic peaks are not an anomaly; they are part of the business model. And when we talk about mission-critical environments, with geo-redundant high availability and network storage with synchronous replication, the margin for error is very small.

At Stackscale, where we work every day with private cloud based on Proxmox or VMware, bare-metal servers and network storage solutions for exactly these scenarios, the conclusion is quite clear:
it’s not just about “adding more machines”, it’s about designing the architecture so that these peaks are part of the plan.


1. First things first: really understand your traffic patterns

Without data, everything else is just intuition. The first step is having a clear picture of how your platform behaves.

Questions you should be able to answer without breaking a sweat:

  • What does your “normal” traffic look like per hour and per day of the week?
  • What have your real peaks been over the last 6–12 months, and what triggered them?
  • Which parts of the platform suffer first under stress?
    • Checkout, login, APIs, internal searches, private area, etc.
  • At what point do errors, high response times or conversion drops start to appear?

To get there, the minimum reasonable approach is to combine:

  • Application metrics: response times per endpoint, error ratios, job queues.
  • Infrastructure metrics: CPU, RAM, disk I/O, network latency, storage usage.
  • Business data: abandoned carts, forms that never complete, support tickets during campaigns.

With that base you can start talking about capacity in a meaningful way:
“we want to handle 3× last November’s peak” is very different from
“we need something more powerful because the site feels slow”.
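Getting from "the site feels slow" to "we want 3x last November's peak" starts with summarizing real metrics. A sketch with illustrative hourly request counts:

```python
from statistics import median, quantiles

hourly_requests = [1200, 1100, 900, 4800, 5200, 21000, 9800, 1500]  # illustrative

typical = median(hourly_requests)
peak = max(hourly_requests)
p95 = quantiles(hourly_requests, n=20)[-1]   # 95th-percentile cut point

print(f"typical hour (median): {typical:.0f} req/h")
print(f"observed peak: {peak} req/h, i.e. {peak / typical:.1f}x the typical hour")
```

The peak-to-typical ratio is the number to carry into capacity discussions: it tells you how "spiky" the platform really is, independent of absolute volume.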


2. Optimize before you scale: common bottlenecks

The most common reaction when a site goes down during a traffic spike is to blame the hosting. And sometimes that’s fair… but very often it’s not the whole story.

In practice, we usually see a combination of:

Heavy frontend

  • Huge uncompressed images or no use of modern formats.
  • Too many third-party scripts: tags, pixels, chats, A/B testing tools, etc.
  • Themes or frontends overloaded with JavaScript that block rendering.

The more “expensive” each visit is in terms of resources, the faster your infrastructure saturates — no matter how good your bare-metal servers or private cloud are underneath.

Backend under pressure

  • Database queries without proper indexes, or logic that is simply too complex.
  • Overuse of plugins, modules or extensions that add logic to every request.
  • Lack of caching (page, fragment or object cache) that forces recalculation on each request.
  • Tasks that should go to a queue (bulk email sending, report generation, external integrations) running in real time.
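The last point, moving slow work out of the request path, is often the single biggest backend win. A minimal sketch of the producer/worker pattern using Python's standard queue module (the job names are illustrative):

```python
import queue
import threading

jobs: queue.Queue = queue.Queue()
done: list[str] = []

def worker() -> None:
    while True:
        job = jobs.get()
        if job is None:                   # sentinel: stop the worker
            break
        done.append(f"processed {job}")   # e.g. send email, build a report
        jobs.task_done()

threading.Thread(target=worker, daemon=True).start()

# The web request handler only enqueues and returns immediately:
jobs.put("send-campaign-email")
jobs.put("generate-report")

jobs.join()      # only for this demo; a real handler would not block here
jobs.put(None)
```

In production this role is usually played by Redis, RabbitMQ, or a similar broker, but the principle is the same: the user-facing request finishes in milliseconds while the heavy work drains at its own pace.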

Infrastructure at the limit

Even with good code, there are physical limits:

  • CPU and RAM permanently high during peak hours.
  • Storage systems or arrays saturated in terms of IOPS.
  • Virtual machines oversized for secondary services and undersized for the critical path.

Practical rule of thumb:
before increasing resources at Stackscale (or anywhere else), it’s highly recommended to:

  • Clean up frontend and media.
  • Review database and cache usage.
  • Remove unnecessary complexity from the backend.

Every improvement in this layer means the same hardware can handle more traffic, and every euro invested in infrastructure delivers better returns.


3. From a single server to a peak-ready architecture

Many projects start on a single server (physical or virtual) acting as a “Swiss army knife”: web, database, cache, cron jobs, queues… all in one place.

It works — until it doesn’t.

Once the business depends on that platform or traffic grows, priorities shift to:

  • Reducing single points of failure.
  • Separating responsibilities.
  • Being able to add capacity without breaking everything.

Typical architectures we see in critical environments

A very common pattern in private cloud or bare-metal deployments:

  • Several application nodes behind one or more load balancers.
  • A separate database, with the option of read replicas when volume requires it.
  • A cache/intermediate storage layer (for example Redis, Varnish, or other caching proxies).
  • Centralised, redundant network storage, simplifying high availability setups.

At Stackscale, many customers choose this type of architecture from the start or grow into it in stages, especially when:

  • They cannot afford prolonged downtime windows.
  • Traffic growth is structural, not just occasional.
  • There are demanding requirements in terms of compliance, audits or internal SLAs.
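Whatever technology backs the cache layer (Redis, Varnish, an in-process store), the core pattern is the same: look up first, compute and store on a miss, expire after a TTL. A self-contained sketch:

```python
import time
from typing import Any, Callable

class TTLCache:
    """Get-or-compute cache with per-entry expiry."""
    def __init__(self, ttl_s: float) -> None:
        self.ttl_s = ttl_s
        self._store: dict[str, tuple[Any, float]] = {}

    def get_or_compute(self, key: str, compute: Callable[[], Any]) -> Any:
        entry = self._store.get(key)
        if entry is not None:
            value, expires = entry
            if time.monotonic() < expires:
                return value            # hit: no backend work at all
        value = compute()               # miss: e.g. render page, run SQL query
        self._store[key] = (value, time.monotonic() + self.ttl_s)
        return value
```

During a peak, a cache hit ratio of 90%+ on the hottest pages means the database only sees a fraction of the incoming traffic.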

4. Geo-redundant high availability and synchronous replication: taking “plan B” seriously

When we talk about mission-critical workloads, it’s not enough that one node can replace another inside the same data center. You also have to consider:

  • Full data center failures.
  • Network issues at carrier level.
  • Serious incidents at region/city level.

This is where two concepts come into play:

  • Geo-redundancy: having resources ready in a secondary data center that can take over if the primary fails.
  • Synchronous storage replication between data centers: writes are only confirmed once they have been stored on both sides, minimizing the risk of data loss in case of a failure.

This kind of architecture makes sense when:

  • The acceptable RPO is practically 0 (you cannot lose transactions).
  • The expected RTO is very low (recovery in minutes).
  • The impact of a prolonged outage clearly outweighs the additional cost of geo-redundancy.

In this context, a provider like Stackscale brings:

  • Redundantly interconnected data centers.
  • Network storage solutions designed for synchronous replication.
  • Technical teams used to sizing and operating this kind of environment.
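The guarantee behind synchronous replication (a write is acknowledged only once both sites have stored it) can be sketched as follows; the replica objects are illustrative, and the rollback is deliberately simplified:

```python
class Replica:
    """Toy stand-in for one data center's storage."""
    def __init__(self, name: str) -> None:
        self.name = name
        self.data: dict[str, bytes] = {}
        self.available = True

    def write(self, key: str, value: bytes) -> bool:
        if not self.available:
            return False
        self.data[key] = value
        return True

def synchronous_write(key: str, value: bytes,
                      primary: Replica, secondary: Replica) -> bool:
    """Acknowledge only if BOTH data centers stored the write (RPO ~ 0).
    On failure, undo the side that succeeded; a real system would restore
    the previous value rather than simply dropping the key."""
    if primary.write(key, value) and secondary.write(key, value):
        return True
    primary.data.pop(key, None)
    secondary.data.pop(key, None)
    return False
```

The trade-off is visible in the sketch: refusing to acknowledge a half-completed write is what keeps RPO at zero, at the cost of rejecting writes while one site is down (real systems handle that failover with quorum or witness mechanisms).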

5. Capacity planning: enough headroom without paying for over-engineering all year long

The practical question is: how much headroom do I really need?

You don’t want an infrastructure that is permanently oversized, nor do you want to run “in the red” all the time:

  • Start from real peaks, not gut feelings.
    • Put the actual graphs of your strongest days on the table.
  • Define a reasonable target:
    • For example, handle 3× your current peak with acceptable response times.
  • Translate that into infrastructure:
    • How many application nodes?
    • What size and type of bare-metal servers or private cloud resources?
    • What minimum performance do you need from network storage?
  • Design growth steps with your provider:
    • Increase CPU/RAM for VMs without stopping the service.
    • Add nodes to the application cluster.
    • Increase capacity and performance of network storage and the inter-DC network.

Here the advantage of a hosted private cloud versus on-premise is obvious: you can grow step by step, with more predictable costs and without over-provisioning your own hardware “just in case”.
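Translating a target like "3x the observed peak" into node counts is simple arithmetic worth writing down explicitly. A sketch with illustrative numbers:

```python
import math

peak_rps = 1800            # strongest hour observed last November (illustrative)
target_factor = 3          # "handle 3x last year's peak"
node_capacity_rps = 900    # measured per-node throughput at acceptable latency
n_plus = 1                 # survive one node failure (N+1)

required = peak_rps * target_factor
nodes = math.ceil(required / node_capacity_rps) + n_plus
print(f"target: {required} req/s -> {nodes} application nodes (incl. N+1)")
```

The per-node capacity figure should come from your own load tests, not from spec sheets; it is usually the number people are most wrong about.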


6. Dress rehearsal: test the limit before your next campaign

You don’t want to discover your limits on the day of your biggest campaign of the year. The sensible approach is to run a dress rehearsal in advance.

Minimum test elements:

  • Load tests on critical paths (homepage, search, listing pages, product pages, login, checkout, APIs).
  • Fine-grained monitoring during tests:
    • Response times.
    • CPU/RAM usage.
    • I/O latency and saturation.
    • Application errors.
  • Failure simulations:
    • Remove one application node from the load balancer and see how the system reacts.
    • Test how the platform behaves with partial storage or network failures (where the environment allows, always in coordination with the infrastructure team).

The outcome should be a concrete list of:

  • Configuration parameters to adjust.
  • Resources to increase (or rebalance).
  • Internal procedures for campaign day (what to watch, how often, who decides what).
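Raw load-test numbers only become decisions once you summarize them per endpoint against an SLO. A sketch (latencies in milliseconds are illustrative):

```python
import math

def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile; good enough for load-test summaries."""
    ordered = sorted(samples)
    idx = max(0, math.ceil(p / 100 * len(ordered)) - 1)
    return ordered[idx]

results = {  # endpoint -> response times in ms (illustrative)
    "/checkout": [120, 180, 150, 900, 210, 170, 160, 140, 190, 2300],
    "/search":   [40, 55, 60, 48, 52, 45, 70, 65, 50, 58],
}
slo_p95_ms = 500

for endpoint, latencies in results.items():
    p95 = percentile(latencies, 95)
    verdict = "OK" if p95 <= slo_p95_ms else "NEEDS WORK"
    print(f"{endpoint}: p95={p95:.0f} ms -> {verdict}")
```

Percentiles matter more than averages here: in the `/checkout` sample above the mean looks tolerable, but the p95 exposes the outliers your unluckiest customers actually experience.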

7. “Good” peaks vs “bad” peaks: campaigns vs DDoS

Not every traffic spike is good news. On top of successful campaigns, you also have:

  • DDoS attacks at network or layer-7 (HTTP) level.
  • Bots crawling or attacking forms.
  • Automated traffic trying to exploit vulnerabilities.

To avoid mixing everything together, you need:

  • Visibility: the ability to detect anomalous behaviour by IP pattern, user-agent, country, path, etc.
  • Mitigation measures:
    • WAF with specific rules.
    • Rate limiting on sensitive endpoints.
    • Upstream DDoS protection and/or use of a CDN where it makes sense.
  • Response procedures:
    • What to do if traffic suddenly explodes and it’s not linked to a planned campaign.
    • How to coordinate with your infrastructure provider to filter at network level before the attack reaches your nodes.

A solid architecture helps, but the security layer and procedures are just as important to ensure good peaks add value and bad peaks don’t take anything down.
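Rate limiting on sensitive endpoints is usually some variant of a token bucket: each client gets a refill rate plus a burst allowance. A minimal per-client sketch:

```python
import time

class TokenBucket:
    """Allow `rate` requests/second with bursts up to `burst`."""
    def __init__(self, rate: float, burst: float) -> None:
        self.rate, self.burst = rate, burst
        self.tokens = burst
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False   # over the limit: respond 429 / drop the request
```

In practice this lives in the WAF, the load balancer, or a reverse proxy, keyed per IP or per API token, so abusive clients are throttled before they reach the application nodes.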


Conclusion

Traffic peaks are not going away; in fact, they are becoming more frequent and more abrupt.

The difference lies in whether your infrastructure:

  • Suffers them as an unpredictable problem, or
  • Absorbs them as something already accounted for in the design.

With private cloud, bare-metal servers, network storage and synchronous replication between data centers, and an architecture built for geo-redundant high availability, you can move from “surviving as best you can” to planning growth intelligently.

If you’re reviewing your mission-critical platform or preparing your next major campaign, this is a good time to sit down with the Stackscale team, look at real data and define together what you need so that the next traffic peak is good news — not the start of an incident.


FAQ on traffic peaks and infrastructure architecture

1. What’s the difference between vertical and horizontal scaling to handle traffic peaks?
Vertical scaling means giving more resources to a single machine (more CPU, RAM, etc.). It’s simple and works well up to a point, but it has a physical limit and it remains a single point of failure.
Horizontal scaling means distributing the load across multiple application nodes behind one or more load balancers. It requires more architectural work, but offers better resilience and more room to grow, especially in mission-critical, high-availability environments.
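As a hypothetical sketch of the horizontal model, here is the core of what a load balancer does: hand requests to application nodes in round-robin order, skipping nodes marked unhealthy. Real deployments use HAProxy, NGINX, or a hardware balancer rather than custom code, and the node names are placeholders:

```python
from itertools import cycle

class RoundRobinBalancer:
    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.healthy = set(self.nodes)
        self._ring = cycle(self.nodes)

    def mark_down(self, node):
        self.healthy.discard(node)

    def mark_up(self, node):
        self.healthy.add(node)

    def next_node(self):
        # Skip unhealthy nodes; give up after one full pass over the ring.
        for _ in range(len(self.nodes)):
            node = next(self._ring)
            if node in self.healthy:
                return node
        raise RuntimeError("no healthy nodes available")

lb = RoundRobinBalancer(["app-01", "app-02", "app-03"])
```

The point of the sketch: losing one node only shrinks capacity instead of taking the service down, which is exactly the resilience argument for scaling horizontally.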


2. What role does a private cloud provider like Stackscale play compared to an on-premise solution?
On-premise, you handle everything: hardware purchase, installation, maintenance, spare parts, growth planning, etc.
With hosted private cloud and bare-metal servers at Stackscale, you delegate the physical and data center layers (power, networking, hardware, replacements, etc.) and focus on the system and application layers. On top of that, you get geo-redundancy options between data centers and network storage prepared for synchronous replication, which is much harder to implement efficiently in a single on-premise data center.


3. How do I know if I really need geo-redundancy and synchronous replication, or if a single data center is enough?
It depends on your RPO/RTO and the real cost of downtime. If your business can afford to lose a few minutes of data and remain offline while the primary data center recovers, a solid high-availability setup within one DC might be enough.
If, on the other hand, every minute of downtime has a serious impact (economic, regulatory or reputational) and you cannot afford to lose transactions, the usual approach is to plan for two data centers and some form of replication (synchronous or asynchronous, depending on the case). The final recommendation comes from balancing risk vs cost.


4. How often should I run load tests on my platform?
A sensible minimum is to run them:

  • Before major predictable campaigns (Singles’ Day, Black Friday, Cyber Monday, Black Week, big product launches, etc.).
  • After significant architectural changes (new application version, database change, new provider, etc.).

In very dynamic projects, some companies integrate them into their CI/CD cycle on a regular basis (for example monthly or quarterly). The key idea is to avoid letting changes in application and infrastructure pile up for years without checking again how the system behaves under stress.
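When you do run those tests, judge the results on tail latencies rather than averages: one slow outlier can hide behind a healthy mean. A minimal sketch (the nearest-rank method and the 300 ms budget are illustrative assumptions, not a standard):

```python
def percentile(samples, p):
    """Nearest-rank percentile of a list of latency samples (in ms)."""
    data = sorted(samples)
    k = max(0, min(len(data) - 1, round(p / 100 * len(data)) - 1))
    return data[k]

def verdict(samples, p95_budget_ms=300):
    """PASS/FAIL a load-test run against a p95 latency budget."""
    p95 = percentile(samples, 95)
    return ("PASS" if p95 <= p95_budget_ms else "FAIL", p95)

# A run whose mean looks fine (~197 ms) but whose tail reveals a problem.
latencies = [120, 130, 125, 140, 900, 135, 128, 131, 127, 133]
```

Here the average would pass almost any budget, while the p95 exposes the 900 ms outlier that your peak-day users would actually feel.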

The post How to prepare your infrastructure for traffic peaks without your site going down appeared first on Stackscale.

]]>
From High Availability to a Real Plan B: Lessons from the Latest AWS Outage and a Practical Guide https://www.stackscale.com/blog/from-high-availability-to-a-real-plan-b/ Tue, 21 Oct 2025 08:03:41 +0000 https://www.stackscale.com/?p=93442 AWS’s October 20 outage reminded us of a simple truth: resilience can’t be improvised. High availability (HA) within a single region helps, but it doesn’t replace a plan B that keeps services running when the common element too many components depend on fails (region, control plane, DNS, queues, identity, etc.). This guide lays out a […]

The post From High Availability to a Real Plan B: Lessons from the Latest AWS Outage and a Practical Guide appeared first on Stackscale.

]]>
AWS’s October 20 outage reminded us of a simple truth: resilience can’t be improvised. High availability (HA) within a single region helps, but it doesn’t replace a plan B that keeps services running when the common element too many components depend on fails (region, control plane, DNS, queues, identity, etc.).

This guide lays out a practical path from “HA” to business continuity, with patterns we see working in production—and that we at Stackscale implement with two active-active data centers. When required, these architectures combine smoothly with other providers. It all starts with the basics: define RTO/RPO and rehearse failover.

David Carrero (Stackscale co-founder): “HA is necessary, but if everything hinges on a single common point, HA fails. The difference between a scare and a crisis comes down to a rehearsed plan B.”


1) Align business and tech: RTO/RPO and a dependency map

  • RTO (Recovery Time Objective): how long a service can be down.
  • RPO (Recovery Point Objective): how much data loss (in time) you can accept when restoring.
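A quick worked example of how these two targets interact (all numbers are illustrative): with backups every 4 hours, the worst-case data loss on restore is one full backup interval, and the downtime is detection plus restore plus DNS convergence.

```python
from datetime import timedelta

def worst_case_rpo(backup_interval: timedelta) -> timedelta:
    # If the site dies just before the next backup runs, you lose one full interval.
    return backup_interval

def expected_rto(detection: timedelta, restore: timedelta, dns_ttl: timedelta) -> timedelta:
    # Downtime = noticing the failure + restoring + waiting for DNS to converge.
    return detection + restore + dns_ttl

rpo = worst_case_rpo(timedelta(hours=4))
rto = expected_rto(detection=timedelta(minutes=5),
                   restore=timedelta(minutes=30),
                   dns_ttl=timedelta(minutes=10))
```

If the business signs off on "RPO 1 hour", this setup fails the target before any incident happens: the backup schedule, not the recovery tooling, is the constraint.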

With those targets signed off by the business, draw the dependency map: what breaks what? Where do identity, DNS, queues, catalogs, or global tables really anchor? The output is your list of services that can’t go down—and what they depend on.


2) Continuity patterns that actually work

Active–active across two data centers (RTO=0 / RPO=0)

For payments, identity, or core transactions:

  • Synchronous storage replication between both DCs → RTO=0 / RPO=0.
  • Distributed / quorum databases, with conflict handling (CRDTs / sagas).
  • DNS/GTM with real service health checks (not just pings).
  • Degrade modes pre-defined (read-only, feature flags).

At Stackscale we offer active–active across two European data centers with synchronous replication as a mission-critical foundation. You can complement that core with third parties (hyperscalers or other DCs) if you need a third continuity path or sovereignty/localization guarantees.

Multi-site warm standby (hot passive)

For most important workloads:

  • Asynchronous data replication to a second site.
  • Pre-provisioned infrastructure (templates / IaC) to promote site B in minutes.
  • Automatic failover via DNS/GTM and rehearsed runbooks.

Minimum footprint at an alternate provider (the classic “pilot light” pattern)

When you want to lower concentration risk at a reasonable cost:

  • Keep the bare minimum running outside your primary provider (e.g., DNS, observability, immutable backups, break-glass identity, status page).
  • Promote to active-passive or active-active only what’s truly critical, based on RTO/RPO.

Carrero: “It’s not about abandoning hyperscalers—it’s about balancing. The European ecosystem—Spain included—is mature and a strong complement for resilience and sovereignty.”


3) Three layers that move the needle (and how to handle them)

Data

  • Transactional: quorum-based distribution + conflict control.
  • Objects: versioning + inter-site replication, immutability (WORM lock) and, where needed, air-gap.
  • Catalogs/queues: avoid “global” services anchored in a single external region if your RTO/RPO can’t tolerate it.

Network / DNS / CDN

  • Two DNS providers and GTM with business-level probes (a real “health transaction,” not a ping).
  • Multi-CDN and alternate origins (A/A or A/P) with origin shielding.
  • Redundant private connectivity (overlay SD-WAN) between cloud and DC.
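A hedged sketch of what a "business-level probe" means in practice: the endpoint the GTM polls only reports healthy when a real transaction path (DB write, queue round-trip, IdP token) succeeds, not merely when the host answers a ping. The dependency names and status codes below are illustrative:

```python
def health_status(checks: dict[str, bool]) -> tuple[int, dict]:
    """Aggregate end-to-end dependency checks into an HTTP-style probe status.

    `checks` maps dependency names to the result of a real transaction test,
    e.g. {"db_write": True, "queue_roundtrip": True, "idp_token": False}.
    """
    failed = [name for name, ok in checks.items() if not ok]
    status = 200 if not failed else 503  # the GTM drains this site on non-200
    return status, {"healthy": not failed, "failed_dependencies": failed}
```

With this shape, a host that is up (a ping would pass) but whose identity provider is unreachable correctly reports itself as unfit for traffic, which is the whole point of probing the transaction instead of the host.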

Identity & access

  • IdP with key (JWKS) caching and contextual re-auth.
  • Break-glass accounts outside the failure domain, protected by strong MFA (hardware keys).
  • App governance to stop consent phishing and OAuth abuse.

4) Observability, backups, and drills: without these, there’s no plan B

  • Observability outside the same failure domain: at least one mirror of metrics/logs and a status page that don’t depend on the primary provider.
  • Immutable backups and timed restore drills (with recent success).
  • Quarterly gamedays: region/IdP/DNS/queue/DB failures. Measure real RTO, dwell time, and MTTR.
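The "timed restore drill" above can be captured in a tiny harness: time the restore, compare against the agreed RTO, and check the age of the restored data against the RPO. The function names are hypothetical and the real restore would of course call your backup tooling:

```python
import time
from datetime import datetime, timedelta, timezone

def run_restore_drill(restore_fn, rto: timedelta, rpo: timedelta) -> dict:
    """Execute a restore and report whether the RTO/RPO targets were met."""
    started = time.monotonic()
    newest_backup_ts = restore_fn()  # returns the timestamp of the restored data
    elapsed = timedelta(seconds=time.monotonic() - started)
    data_age = datetime.now(timezone.utc) - newest_backup_ts
    return {
        "restore_time": elapsed,
        "rto_met": elapsed <= rto,
        "rpo_met": data_age <= rpo,
    }

# A stand-in restore that "recovers" a backup taken 10 minutes ago.
def fake_restore():
    return datetime.now(timezone.utc) - timedelta(minutes=10)
```

Logging this report after every drill gives you the "restore OK (last 30 days)" metric mentioned later with real numbers behind it instead of assumptions.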

5) Two reference architectures (and how they fit with Stackscale)

All-in with Stackscale continuity

  • DC A + DC B (Stackscale) in active–active with synchronous replication (RTO=0/RPO=0).
  • Immutable backups in a third domain (another DC or isolated object storage).
  • Multi-provider DNS/DNSSEC + GTM with business health checks.
  • Observability mirrored out (dedicated provider or a second Stackscale site).
  • Runbooks and regular gamedays with local, 24/7 support.

Hybrid continuity (Stackscale + another location/provider)

  • DC A + DC B (Stackscale) active–active for the core.
  • Minimum footprint at another provider for DNS, status, logs/SIEM, and immutable object storage (or vice versa).
  • Non-critical workloads at the other location, with DR back to Stackscale if sovereignty or cost requires it.
  • Private connectivity and portable policies (identity, logging, backup).

What Stackscale brings (without the sales pitch):

  • Two European data centers with low latency, redundant power & networks, and nearby 24/7 support.
  • Synchronous replication for mission-critical apps (RTO=0/RPO=0).
  • High-performance storage (block/object) with versioning and WORM lock (Object Lock) for immutable backups.
  • Bare-metal and private cloud to consolidate workloads, plus dedicated connectivity with carriers and public clouds via partners.
  • Easy integration with external providers (DNS, CDN, observability, hyperscalers) for hybrid or selective multicloud strategies.

6) A realistic 30-60-90 day roadmap

Days 1–30

  • Get RTO/RPO approved per service.
  • Build the dependency & global-anchor map.
  • Immutable backups and your first timed restore.

Days 31–60

  • Multi-provider DNS/GTM, multi-CDN, and observability outside the same failure domain.
  • Minimum footprint (emergency identity, status, SIEM).
  • First gameday (DNS + DB/queues).

Days 61–90

  • Active–active or warm standby between Stackscale’s two DCs.
  • Integrations with third parties (if needed).
  • Full-region failure gameday and metrics review.

7) Executive metrics that actually say something

  • % of services with signed RTO/RPO and met in drills.
  • Restore OK (last 30 days) and average restore time.
  • Failover time (gamedays) and dwell time by scenario.
  • Observability coverage “outside” the failure domain (yes/no by domain).
  • “Global” dependencies with alternatives (yes/no).

Final word: resilience with a clear head (and without handcuffs)

The lesson isn’t to “flee the cloud,” but to design for failure and de-concentrate single points of breakage. HA is necessary, but not sufficient: you need a complete alternate route to the same outcome. With two active–active DCs (RTO=0/RPO=0) as the base and continuity layers—DNS, immutable copies, observability, and break-glass identity—outside the same failure domain, your platform will stay up when a provider or region stumbles.

At Stackscale, we support that transition every day. And when continuity goals or regulations call for it, we combine our two-data-center infrastructure with other providers. That way, plan B isn’t a PDF in a drawer—it’s a path you’ve already walked.


]]>
VMware vs Proxmox VE: how to choose (for real) for your virtualization platform https://www.stackscale.com/blog/vmware-vs-proxmox-how-to-choose/ Mon, 13 Oct 2025 08:24:40 +0000 https://www.stackscale.com/?p=92905 At Stackscale we work with VMware vSphere and Proxmox VE every day. Our approach isn’t to “force” a tool, but to fit the platform to the project’s requirements (technical, operational, and business). Below is a practical analysis—aligned with Stackscale’s editorial style—of when each technology makes sense and how that translates into operations, cost, and risk. […]

The post VMware vs Proxmox VE: how to choose (for real) for your virtualization platform appeared first on Stackscale.

]]>
At Stackscale we work with VMware vSphere and Proxmox VE every day. Our approach isn’t to “force” a tool, but to fit the platform to the project’s requirements (technical, operational, and business). Below is a practical analysis—aligned with Stackscale’s editorial style—of when each technology makes sense and how that translates into operations, cost, and risk.


Why one-to-one comparisons are often unfair

  • VMware vSphere is an end-to-end product with decades of evolution and a broad ecosystem (vCenter, vSAN, etc.), aimed at enterprise environments that require certifications, compatibility matrices, and formal support.
  • Proxmox VE is an open distribution that integrates very mature technologies—KVM/QEMU (hypervisor), LXC (containers), ZFS and Ceph (storage)—behind a solid web console and API/CLI (Application Programming Interface / Command-Line Interface).

The value isn’t about “who checks more boxes,” but what you actually use today and across the project’s lifecycle (3–5 years).


Where VMware fits… and where Proxmox shines

When VMware makes sense (and brings everything you need)

  • Certifications and compliance: regulated sectors and demanding audits.
  • VDI (Virtual Desktop Infrastructure) and graphics workloads: wide vGPU and MIG (Multi-Instance GPU) support with ISV certifications.
  • Continuous availability: FT (Fault Tolerance) with zero downtime and zero loss of in-memory data if a host fails.
  • Large-scale operations: DRS (Distributed Resource Scheduler) for resource balancing and vDS (vSphere Distributed Switch) when you want the hypervisor’s own distributed switching (although Stackscale does not depend on it; see the network section).

When Proxmox VE shines (and optimizes TCO)

  • Large-scale Linux workloads: hundreds or thousands of VMs with HA (High Availability) and live migration, automation (Ansible/Terraform), and accessible API/CLI.
  • SDS (Software-Defined Storage) without vendor lock-in: Ceph integrated or NFS/iSCSI/ZFS depending on the case.
  • Native backups: Proxmox Backup Server with scheduling, retention, verification, and immutability.
  • Clear failure domains: sensible, segmented clusters (not mega-clusters).
  • Cost efficiency (TCO, Total Cost of Ownership): open model with optional support subscription.
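As an illustration of that API-driven automation, here is a minimal Python sketch that prepares an authenticated request against the Proxmox VE REST API. The `PVEAPIToken` header follows Proxmox's documented API token scheme; the host, user, and token values are placeholders:

```python
import urllib.request

def proxmox_request(host: str, path: str, token_id: str, secret: str) -> urllib.request.Request:
    """Build (but do not send) an API-token-authenticated Proxmox VE request."""
    url = f"https://{host}:8006/api2/json/{path.lstrip('/')}"
    req = urllib.request.Request(url)
    # API tokens avoid the ticket/cookie login flow entirely, which suits CI jobs.
    req.add_header("Authorization", f"PVEAPIToken={token_id}={secret}")
    return req

# Hypothetical values; urllib.request.urlopen(req) would then list cluster nodes as JSON.
req = proxmox_request("pve1.example.com", "/nodes", "automation@pam!ci", "<secret-uuid>")
```

The same pattern drives VM lifecycle endpoints, which is why Ansible and Terraform providers for Proxmox can stay thin wrappers over this API.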

Operational comparison (what matters on Monday morning)

Area | VMware vSphere | Proxmox VE
Hypervisor | ESXi | KVM/QEMU + LXC
Management | vCenter (GUI + API), Aria | Web console + API/CLI
Network | vDS (if used); Stackscale network with real VLANs, hypervisor-agnostic | Linux bridge/OVS; Stackscale network with real VLANs, hypervisor-agnostic
Local / software-defined storage | vSAN + ecosystem | Ceph integrated; ZFS/NFS/iSCSI
Network / synchronous storage (Stackscale) | NetApp arrays with synchrony and performance tiers, available to both | NetApp arrays with synchrony and performance tiers, available to both
HA / balancing | Mature HA/DRS | HA + live migration
FT (Fault Tolerance) | Yes (zero downtime / zero in-RAM data loss) | No direct equivalent
vGPU / MIG | Broad support and certifications | Possible, less standardized
Backups | Very large third-party ecosystem | Native backup (PBS) + third parties
Support | Global enterprise (SLA) | Optional enterprise subscription
Cost / TCO | Higher | Very competitive

FT (Fault Tolerance): keeps two synchronized instances of the VM on different hosts; if one fails, the secondary takes over instantly with no memory loss. It’s one of vSphere’s enterprise differentiators.


Stackscale networking: real VLANs and a hypervisor-agnostic architecture

At Stackscale we don’t depend on the hypervisor’s software-defined networking. Our network is independent of VMware or Proxmox and delivers real VLANs directly over our private and bare-metal infrastructure.
This enables:

  • Extending the same VLAN across VMware clusters, Proxmox clusters, bare-metal servers, and even housing connected to Stackscale.
  • Simplifying design, avoiding SDN lock-in at the hypervisor layer, and enabling hybrid VMware ↔ Proxmox ↔ bare-metal scenarios with low latency and high performance.
  • Keeping segmentation and control in the Stackscale network itself, instead of tying the topology to hypervisor-specific networking features.

When a project requires especially fine-grained policies or specific security use cases, we implement them on the Stackscale network, keeping the data plane hypervisor-agnostic.


Storage at Stackscale: compute–storage separation, synchronous RTO=0 & RPO=0, and performance tiers

Beyond each stack’s local or SDS options (vSAN/Ceph), Stackscale offers network storage and synchronous storage based on NetApp arrays, designed to decouple compute from storage. This boosts resilience, enables independent scaling, and minimizes recovery times.

Continuity objectives

  • RTO=0 (Recovery Time Objective): service continuity with no noticeable downtime in the event of an incident.
  • RPO=0 (Recovery Point Objective): zero data loss thanks to synchronous replication.

All-flash performance tiers

  • Flash Premium: ultra-low latency and very high IOPS for critical databases, real-time analytics, or high-throughput queues.
  • Flash Plus: balance between latency and throughput for app servers, middleware, and mixed workloads.
  • Flash Standard: consistent performance and optimized cost for general VMs and steady workloads.
  • Archive (backup & retention): capacity-optimized for copies, replicas, and long-term retention.

Protection & continuity

  • Frequent, block-efficient snapshot policies.
  • Cross-datacenter replication included, providing consistent copies ready for failover/failback.
  • Integration with native or third-party backup tools and granular recovery (file/VM).

Combinations

  • You can combine Stackscale network storage with vSAN (VMware) or Ceph (Proxmox) depending on design: e.g., SDS for scratch/hot data and NetApp for persistent data with multi-DC replication.
  • The outcome: resilience by design, with decoupled planes (compute ↔ network ↔ storage) and RPO/RTO aligned with the service’s criticality.

Total cost and operational risk

  • VMware: higher TCO, low operational risk when its catalog and certifications are explicit project requirements.
  • Proxmox VE: very competitive TCO, controlled risk when you define failure domains, runbooks, and observability from day one (and work with an experienced partner).

The right decision shows up as stability, predictable costs, and operator velocity.


Design patterns by use case

1) Linux/DevOps workloads (microservices, middleware, queues, non-relational DBs)

  • Proxmox VE + Ceph for compute, plus Stackscale network storage (Flash Plus/Standard) for persistent data and replicas.
  • Recommendation: many small/medium clusters vs. a mega-cluster; well-defined failure domains.

2) Datacenters with segmentation and complex topologies

  • Stackscale network with real VLANs and centralized control, regardless of hypervisor.
  • Recommendation: leverage agnostic networking to move or coexist workloads across VMware and Proxmox without redesigning the network plane.

3) VDI and graphics profiles

  • VMware + vGPU/MIG and Flash Premium when storage latency affects UX.
  • Recommendation: validate profiles and seasonal peaks.

4) Modernization with SDS

  • Proxmox VE + Ceph for scratch/hot data, NetApp (Flash Plus/Standard) for persistent data + multi-DC replication.
  • Recommendation: separate networks (front/back), NVMe/SSD for journals, monitor placement groups.

Implementation best practices

With Proxmox VE

  • Ceph: dedicated networks, CRUSH rules, health alerts.
  • Backup: dedicated PBS repositories, restore tests, immutable retention.
  • Automation: templates, cloud-init, tagging, quotas, hooks, API/CLI.
  • Security: VLAN isolation, cluster firewall, users/roles, host hardening.

With VMware

  • Licensing fit: align to actual feature usage.
  • Disaster recovery (DR): runbooks, replicas, and periodic drills.
  • Capacity: plan DRS and CPU/RAM/IO headroom; enable FT where it adds real value.

Migration with rollback (both directions)

  • VMware → Proxmox: inventory and criticality (RPO/RTO), compatibility (virtio, guest tools, cloud-init), storage design (Ceph / NFS / iSCSI or NetApp), controlled pilot, and cutover automation (API/scripts, re-IP if needed).
  • Proxmox → VMware: classify by dependency on FT, vGPU, or ISV certs; network mapping (VLANs/segments, port groups); IOPS/latency validation; HA/live migration tests.
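For the disk-image step of a VMware → Proxmox move, the usual building block is `qemu-img convert`. A hedged sketch that assembles the command (paths and the target file name are placeholders; in practice `qm importdisk` or the Proxmox import tooling wraps this step):

```python
import shlex

def vmdk_to_qcow2_cmd(src_vmdk: str, dst_qcow2: str) -> list[str]:
    """Command line to convert a VMware disk image to qcow2 for Proxmox VE.

    -p shows progress, -f/-O declare the source and destination formats.
    """
    return ["qemu-img", "convert", "-p", "-f", "vmdk", "-O", "qcow2",
            src_vmdk, dst_qcow2]

cmd = vmdk_to_qcow2_cmd("web01-flat.vmdk", "vm-101-disk-0.qcow2")
print(shlex.join(cmd))
```

Generating the command per VM from your inventory is what turns the "cutover automation" bullet above into a repeatable, rollback-friendly script instead of a manual checklist.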

How we approach it at Stackscale

  • We start from the use case, not the tool.
  • We design clusters with clear failure domains, observability, and runbooks from day one.
  • We operate and support environments on VMware vSphere and Proxmox VE across private and bare-metal infrastructure.
  • Hypervisor-independent networking: we deliver real VLANs and can extend segments across VMware, Proxmox, and bare-metal/housing connected to Stackscale.
  • Stackscale storage: NetApp arrays with Flash Premium/Plus/Standard and Archive, snapshots and multi-DC replication included; and a synchronous option with RTO=0 and RPO=0 for workloads that cannot tolerate downtime or data loss.
  • We guide migrations: to VMware or Proxmox, and also VMware → Proxmox and Proxmox → VMware, with pilots, defined rollback, and planned maintenance windows. Ask us!
  • We iterate: recurring measurements, recovery tests, and adjustments as the project evolves.

Conclusions

  • Choose VMware vSphere when advanced features (FT, vGPU), certifications, and enterprise support are explicit requirements.
  • Choose Proxmox VE when you prioritize control, cost efficiency, and flexibility, especially for large Linux estates (and SDS with Ceph).
  • At Stackscale, both technologies coexist and integrate on a hypervisor-agnostic network and storage, so the architecture serves the project—not the other way around.

FAQ

Does Proxmox VE support Windows?
Yes. Proxmox runs Windows and Linux; its sweet spot is Linux efficiency and the fit with Ceph/ZFS and automation.

Is VMware always more expensive?
It depends on the license profile and actual use of features like FT (Fault Tolerance), vGPU/MIG, or ISV certifications. When these are requirements, their value justifies the cost.

Can I run both?
Yes. It’s common to segment by criticality or workload type: e.g., VDI and certified workloads on VMware, and microservices/DevOps on Proxmox VE with Ceph. Our real-VLAN network and network/synchronous storage make that coexistence straightforward.

What does Stackscale provide in each case?
We design and operate VMware vSphere and Proxmox VE platforms on private and bare-metal infrastructure, with SDS (Ceph), a hypervisor-independent network, and NetApp arrays (Flash Premium/Plus/Standard, Archive) with snapshots and multi-DC replication included—and synchronous options (RTO=0, RPO=0) for critical workloads. We tailor the platform to each project’s needs.


]]>
Proxmox VE 9: The Evolution of Open Source Private Cloud https://www.stackscale.com/blog/proxmox-ve-9-the-evolution-of-open-source-private-cloud/ Mon, 11 Aug 2025 08:04:38 +0000 https://www.stackscale.com/?p=92519 Proxmox Virtual Environment 9.0 is now available, marking a major step forward in key features such as snapshots, high availability, and software-defined networking. At Stackscale, as a provider of private cloud infrastructure with Proxmox VE support, we review its innovations and compare them with the previous version to help you make informed decisions for your […]

The post Proxmox VE 9: The Evolution of Open Source Private Cloud appeared first on Stackscale.

]]>
Proxmox Virtual Environment 9.0 is now available, marking a major step forward in key features such as snapshots, high availability, and software-defined networking. At Stackscale, as a provider of private cloud infrastructure with Proxmox VE support, we review its innovations and compare them with the previous version to help you make informed decisions for your deployments.


A Version Designed for Enterprise Environments

Officially released in early August 2025, Proxmox VE 9.0 positions itself as a mature, powerful alternative to commercial platforms like VMware vSphere, Hyper-V, or Nutanix. This open-source virtualization system has built a growing, active community thanks to its reliability, customization capabilities, and complete independence from proprietary licenses. Key highlights include:

  • Storage vendor-agnostic snapshots.
  • New affinity rules in high-availability (HA) environments.
  • Support for “Fabrics” in software-defined networking (SDN).
  • Completely redesigned mobile web interface using Rust and the Yew framework.
  • Updated software stack with the latest stable versions of Debian, kernel, hypervisors, and distributed file systems.

Beyond these technical improvements, Proxmox VE 9 strengthens its role as a foundation for hybrid multicloud, edge computing, and DevOps platforms.


Comparison: Proxmox VE 8 vs Proxmox VE 9

Feature | Proxmox VE 8.x | Proxmox VE 9.0
Base OS | Debian 12 “Bookworm” | Debian 13 “Trixie”
Kernel | Linux 6.2 / 6.5 | Linux 6.14.8-2
QEMU | 7.x / 8.x | 10.0.2
LXC | 5.x | 6.0.4
ZFS | 2.1.x | 2.3.3 (supports adding disks to RAIDZ)
Ceph | Reef | Squid 19.2.3
Snapshots | Backend-dependent | Vendor-agnostic (volume chains)
High availability (HA) | No affinity rules | Node/VM affinity rules
SDN networking | Bridges and Open vSwitch | Introduction of “Fabrics”
Mobile web UI | Legacy | Redesigned with Rust/Yew
Upgrade path | Via APT, with precautions | Officially supported
Version support | Until August 2026 | LTS (full cycle from 2025)

Key Features in Depth

Vendor-Agnostic Snapshots

The new volume chains snapshot system is a paradigm shift in Proxmox VE. Unlike previous versions, snapshots no longer depend on the storage backend, enabling consistent block-level backups whether using local storage, NFS, Ceph, iSCSI, or external SAN arrays. This enhances disaster recovery strategies, automates testing environments, and speeds up cloning.
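Because snapshots are now storage-agnostic, they can be scripted uniformly across a cluster. A small sketch that generates the `qm snapshot` invocations for a pre-maintenance snapshot run (the VM IDs and the naming convention are illustrative assumptions):

```python
from datetime import date

def snapshot_commands(vmids, label: str = "pre-maint") -> list[list[str]]:
    """One `qm snapshot <vmid> <name>` invocation per VM, tagged with today's date."""
    snapname = f"{label}-{date.today():%Y%m%d}"
    return [["qm", "snapshot", str(vmid), snapname] for vmid in vmids]

# Hypothetical VM IDs; run the resulting commands via SSH/Ansible on the node.
cmds = snapshot_commands([101, 102, 250])
```

Using one consistent snapshot name across every VM makes the subsequent rollback (or cleanup) a single loop, regardless of whether each disk lives on NFS, Ceph, or a SAN.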

HA Affinity Rules

Proxmox VE 9 allows defining positive or negative affinity rules between VMs and nodes. This is essential for mission-critical environments, preventing two critical services from running on the same host or ensuring workloads run on preferred hosts. This integrates natively with failover systems for better availability.
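The effect of such rules can be modelled with a small checker: given a placement (VM → node) and a set of anti-affinity groups, verify that no group has two members on the same host. This is only an illustration of the concept with invented names, not Proxmox's implementation:

```python
def violates_anti_affinity(placement: dict[str, str],
                           groups: list[set[str]]) -> list[tuple[str, str]]:
    """Return pairs of VMs from the same anti-affinity group placed on one node."""
    conflicts = []
    for group in groups:
        vms = sorted(v for v in group if v in placement)
        for i, a in enumerate(vms):
            for b in vms[i + 1:]:
                if placement[a] == placement[b]:
                    conflicts.append((a, b))
    return conflicts

# Hypothetical cluster state: both load balancers ended up on the same node.
placement = {"db-primary": "node1", "db-replica": "node2",
             "lb-a": "node3", "lb-b": "node3"}
rules = [{"db-primary", "db-replica"}, {"lb-a", "lb-b"}]
```

An HA manager enforcing these rules would refuse (or correct) the `lb-a`/`lb-b` placement, which is exactly the guarantee you want for redundant pairs.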

Smarter Networking with “Fabrics”

The SDN system advances with the introduction of Fabrics: structures for building complex routed networks between nodes and containers. Benefits include:

  • Multi-tenant segmentation.
  • Hybrid cloud integration.
  • Secure isolation between development and production.

It remains compatible with Linux bridges and Open vSwitch, increasing flexibility.

New Mobile Interface in Rust

The mobile web interface has been rebuilt in Rust using the Yew framework, improving usability, load times, and security. It delivers smoother management from smartphones or tablets without sacrificing the classic panel experience.


Ready for the Future

Although Debian 13 “Trixie” was not officially released until August 2025, its key components had been frozen since May after rigorous beta testing. This allowed Proxmox to launch early with confidence, without compromising stability.

The migration path from Proxmox VE 8.4 is supported and documented, including for environments running Ceph Reef. In such cases, a progressive upgrade to Ceph Squid is recommended before moving to VE 9.0.


Stackscale’s Perspective

Currently, at Stackscale we closely monitor the evolution of Proxmox VE 9 and its improvements. However, we recommend our customers keep Proxmox VE 8.4 in production for now, as it is still early to adopt the new version in critical environments. While Proxmox VE 9 introduces promising features, we advise giving it more time to mature, validate its stability with real workloads, and wait for best practices and community documentation to consolidate.

We provide bare-metal and private cloud infrastructure optimized for Proxmox VE environments. Our expertise allows us to support both production enterprises and integrators or developers adopting the Proxmox ecosystem.

With version 9’s arrival, our technical teams are ready to assist with transition planning, compatibility checks, and performance optimization.

“Proxmox 9 is a qualitative leap for enterprise virtualization without proprietary licenses. Universal snapshots, SDN improvements, and better HA allow us to deliver more resilient, predictable, and flexible environments than ever,” says David Carrero, co-founder of Stackscale and head of sales and marketing.


Conclusion

Proxmox VE 9.0 marks a before-and-after moment in the open-source virtualization landscape. Not just for its updated stack, but for its mature, production-ready features: advanced storage management, smart networking, modular HA, and a mobile-friendly interface.

With long-term support, modern hardware compatibility, and a growing community, Proxmox VE 9 strengthens its place as a robust, secure platform without proprietary dependencies.

At Stackscale, we’re ready to guide you through this transition—whether planning a migration, deploying new services, or scaling your infrastructure—to get the most out of this new Proxmox VE release.


]]>
The Silent Revolution of Proxmox VE Helper-Scripts: How a Community is Transforming Systems Administration https://www.stackscale.com/blog/revolution-proxmox-ve-helper-scripts/ Mon, 30 Jun 2025 09:55:13 +0000 https://www.stackscale.com/?p=92477 At Stackscale, as specialists in private cloud infrastructure based on Proxmox VE, we’ve witnessed firsthand how Proxmox VE Helper-Scripts are changing the way organizations deploy and manage their systems. From our data centers in Spain and the Netherlands, we’ve seen this grassroots tool evolve into a vital asset for thousands of sysadmins. This is the […]

The post The Silent Revolution of Proxmox VE Helper-Scripts: How a Community is Transforming Systems Administration appeared first on Stackscale.

]]>
At Stackscale, as specialists in private cloud infrastructure based on Proxmox VE, we’ve witnessed firsthand how Proxmox VE Helper-Scripts are changing the way organizations deploy and manage their systems. From our data centers in Spain and the Netherlands, we’ve seen this grassroots tool evolve into a vital asset for thousands of sysadmins. This is the story of a quiet but transformative revolution.


A Real-Time Paradigm Shift

It was a late October night when Marc, a systems administrator at a tech startup in Barcelona, faced a familiar challenge: set up a full development environment — GitLab, PostgreSQL, Grafana, and Nextcloud — before the team arrived the next morning.

A few years ago, this would have meant hours of manual configuration and troubleshooting. But Marc had a secret weapon: Proxmox VE Helper-Scripts, a powerful set of automation tools that would do the heavy lifting in minutes. This isn’t fiction — it’s happening daily in data centers and home labs around the world.

A Powerful Duo: Stackscale and Proxmox VE Helper-Scripts

At Stackscale, we offer high-performance infrastructure based on Proxmox VE, with a focus on private cloud deployments. Over time, we’ve seen how these Helper-Scripts have become a natural extension of the platform, especially in enterprise environments where speed, consistency, and repeatability are critical.

“What used to take us an entire afternoon can now be done in 10 minutes,” says a DevOps engineer at a consulting firm that runs on Stackscale’s infrastructure.

These scripts are not just time-savers — they represent a new philosophy in systems automation, born from the community and tailored to real-world challenges.

From Personal Project to Global Standard

The origin of the project traces back to @tteck (Tom), a German developer who began writing scripts to automate repetitive tasks in his personal Proxmox environment. His creations were fast, elegant, and surprisingly user-friendly.

As word spread, so did his impact. GitHub stars skyrocketed. Forums called his work the “correct” way to deploy on Proxmox. But in late 2024, Tom passed away unexpectedly, and the community was left with a vital question: what now?

The answer came quickly. Community-scripts, a collaborative initiative, emerged to preserve and evolve his legacy. Today, dozens of contributors from around the world continue to expand the project.

“Tom taught us that true innovation comes from understanding real problems,” says one of the project’s lead maintainers. “We’re just carrying that vision forward.”

More Than Just Scripts — It’s a Workflow Philosophy

With over 17,000 GitHub stars, Helper-Scripts is more than a repo — it’s a movement. Each script is designed with care, offering interactive dialogs, secure defaults, and intelligent configurations.

Even beginner users can deploy sophisticated stacks with a single command like:

bash -c "$(wget -qLO - https://github.com/community-scripts/ProxmoxVE/raw/main/ct/homeassistant.sh)"
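One-liners like this pipe remote code straight into a shell, which is convenient but worth pausing over. A common cautious variant, sketched below with a hypothetical helper (`review_then_run` is our illustration, not part of the project), downloads the script to a file first so it can be read before it runs:

```shell
#!/bin/bash
# Cautious alternative to "pipe straight into bash": save the script,
# display it for review, and execute only after explicit confirmation.
# review_then_run is an illustrative helper, not part of the project.

review_then_run() {
    local file="$1"
    ${PAGER:-cat} "$file"                      # show the script before running it
    read -r -p "Execute ${file}? [y/N] " answer
    [ "$answer" = "y" ] && bash "$file"
}

# Typical use (fetches the same Home Assistant script as above):
#   wget -qO homeassistant.sh \
#     "https://github.com/community-scripts/ProxmoxVE/raw/main/ct/homeassistant.sh"
#   review_then_run homeassistant.sh
```

The trade-off is one extra confirmation step in exchange for knowing exactly what will execute on the host.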

A Vast Catalog for Every Use Case

The project now includes more than 350 scripts, neatly categorized and regularly updated.

AI & Machine Learning

Spin up Ollama for local LLMs or ComfyUI for image generation in just a few clicks. Researchers can now test ideas in minutes — not days.

Business Tools

From OnlyOffice to InvoiceNinja and Akaunting, small businesses can deploy robust software without SaaS licensing headaches.

Media & Entertainment

Install Jellyfin, Plex, or the ARR ecosystem (Sonarr, Radarr, Lidarr) in no time. A favorite among home labbers and media enthusiasts.

Development & DevOps

Scripts for Gitea, Jenkins, and Code-Server make it easy to build full CI/CD pipelines quickly — even for entire development teams.

Real-World Stories: Speed Meets Scale

TechnoSolutions, a data analytics startup in Valencia, used Stackscale and Helper-Scripts to build its full stack — PostgreSQL, Redis, GitLab, Prometheus, Nextcloud — in two days, without a dedicated DevOps team.

A hospital in southern Spain used Paperless-ngx to digitize years of medical records in just three weeks, cutting their expected timeline in half.

A consulting firm in Barcelona runs 40+ internal tools — from Mattermost to Uptime Kuma — all deployed using Helper-Scripts. Efficiency and autonomy at scale.

What Happens Behind the Scenes

Each script goes through multiple intelligent phases:

  • Environment detection (Proxmox version, resource checks)
  • Interactive configuration (RAM/CPU, USB/Zigbee access, storage mounts)
  • Secure setup (systemd services, firewalls, backups)
  • Post-install guidance (access URLs, logs, management commands)

All of this, typically in under 10 minutes.
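The first of those phases, environment detection, can be sketched in a few lines of shell. The functions below are our illustration of the idea (a Proxmox host check and a minimum-memory floor), not the project's actual code:

```shell
#!/bin/bash
# Sketch of an "environment detection" phase: confirm we are on a
# Proxmox VE host and that enough memory is available before deploying.
# Illustrative only; the real scripts perform richer checks.

require_proxmox() {
    # pveversion is the Proxmox VE version CLI; it is absent elsewhere
    if ! command -v pveversion >/dev/null 2>&1; then
        echo "not a Proxmox VE host" >&2
        return 1
    fi
}

check_memory() {
    # $1 = minimum available memory in MiB required by the deployment
    local need_mib="$1" free_mib
    free_mib=$(awk '/MemAvailable/ {print int($2/1024)}' /proc/meminfo)
    if [ "$free_mib" -lt "$need_mib" ]; then
        echo "need ${need_mib} MiB, only ${free_mib} MiB available" >&2
        return 1
    fi
}

# Typical use at the top of a deployment script:
#   require_proxmox && check_memory 512 || exit 1
```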

Security First: Open and Auditable

Automation doesn’t mean giving up control. The project emphasizes:

  • Open-source code for full auditability
  • Secure defaults: non-root users, limited permissions, hardened services
  • Continuous testing on multiple Proxmox versions
  • Rollback support and clear change documentation
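The "secure defaults" point can be made concrete with a hardened systemd unit. The fragment below is a hypothetical sketch of the kind of directives such a service might ship with (service name, user, and paths are our assumptions, not the project's actual output):

```ini
# Hypothetical unit fragment for an app installed by a helper script.
[Service]
User=appuser                 # run as a dedicated non-root user
Group=appuser
NoNewPrivileges=yes          # block privilege escalation via setuid binaries
ProtectSystem=strict         # mount /usr, /boot, /etc read-only for the service
ProtectHome=yes              # hide /home from the service
PrivateTmp=yes               # give the service a private /tmp
ReadWritePaths=/var/lib/app  # only this path remains writable
```

Because the scripts are open source, defaults like these can be audited and adjusted before deployment.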

Measurable Impact: Real ROI

From our vantage point at Stackscale, we’ve seen the numbers:

  • Setup time cut by 80–90% — saving €150–400 per deployment
  • Fewer human errors, fewer support tickets
  • Environment consistency: faster testing, more reliable releases
  • Operational scalability: more services managed with fewer staff

Engineering Excellence Under the Hood

Helper-Scripts integrates advanced techniques:

  • Smart resource detection to auto-tune deployments
  • Dependency management across diverse environments
  • Reusable templates for complex configurations
  • Cross-integration with monitoring tools like Grafana

What’s Next? A Roadmap of Possibilities

The project’s roadmap includes:

  • Cross-platform support (Docker, KVM, LXD)
  • AI-powered recommendations for optimal settings
  • Meta-scripts for deploying entire high-availability stacks
  • Hybrid cloud integration for edge + core infrastructure setups

Stackscale’s Take: Empowering through Specialization

As a Proxmox-focused infrastructure provider, Stackscale is proud to support this community-driven movement. We see daily how Helper-Scripts empower customers — from startups to institutions — to deploy faster and with greater confidence.

“Helper-Scripts have leveled the playing field,” says Stackscale’s Head of Technical Operations. “They allow small teams to compete with larger players using enterprise-grade infrastructure.”

Lessons Beyond Code

This project is proof that:

  • Community is key: shared knowledge multiplies impact
  • Simplicity wins: true complexity lies in making things simple
  • Documentation matters: clarity invites contribution
  • Open source is sustainable: when a passionate community leads the way

Conclusion: A Quiet Revolution, A Shared Legacy

Somewhere in the world, a sysadmin is running their first Helper-Script right now. Maybe it’s a student setting up their first homelab. Or a CTO deploying production infrastructure on a tight schedule.

In both cases, they’re part of a growing movement — a quiet revolution built on code, collaboration, and trust.

At Stackscale, we’re proud to power that movement. With private cloud infrastructure optimized for Proxmox VE, we provide the solid foundation from which innovation grows.

Helper-Scripts reminds us that great technology often comes from people — not corporations. That collaboration beats complexity. And that every script executed is a tribute to the vision of @tteck and the strength of the community that carries it forward.


Ready to Experience the Power of Helper-Scripts?

Stackscale offers private cloud environments optimized for Proxmox VE and fully compatible with Helper-Scripts. Our data centers in Spain and the Netherlands provide the ideal foundation for your deployments.


The post The Silent Revolution of Proxmox VE Helper-Scripts: How a Community is Transforming Systems Administration appeared first on Stackscale.

]]>