As a small tech company with 20–30 people, we’ve gone through the natural evolution of infrastructure. From the days when one server and a few LXC containers were enough, to Docker and Docker Swarm, and finally to Kubernetes, which we now use not only in production but also for development and testing.
In this article, I’d like to share why we migrated, the challenges we faced, and how we successfully moved from Docker Swarm to Kubernetes.
Our beginnings: from LXC to Docker Swarm
At the start, our infrastructure was simple: one server, a few VMs, and containers. As the number of projects grew, we moved to Docker and then to Docker Swarm.
We had three physical servers running VMs with Swarm on top. Storage was handled with NFS so that we could run services flexibly on different nodes. Load balancing was done by a shared setup distributing traffic across the three Swarm nodes.
Our core services included:
- self-hosted GitLab for code and CI/CD,
- Nexus as an artifact repository,
- several small Java services sharing a common database,
- internal systems (attendance, document management, etc.).
Our Swarm setup actually served us well for quite some time — it was simple to manage and had very low complexity compared to other orchestration tools. That was one of the main reasons we stuck with it for years.
However, as our projects and team grew, several limitations became clear:
- single point of failure on NFS,
- limited security and multi-team separation — it was harder to isolate deployments per team,
- lack of integration with tools we used elsewhere (like GitOps deployments with ArgoCD),
- shrinking community and ecosystem support.
Swarm’s simplicity and low operational overhead were strong advantages, but in the long run they were outweighed by the need for more features, stability, and integration.
(Figure: simple schema of our Swarm setup)
Why Kubernetes
As both our projects and team size grew, we started using Kubernetes in production for customer projects. It made sense to unify environments, so that development and testing would also run on Kubernetes.
Our goals were:
- higher stability and availability, even if a physical server goes down,
- eliminate SPOFs in storage,
- improve control and security (secrets, certificates),
- adopt GitOps for deployments,
- keep infrastructure management simple with tools we already knew.
The new architecture
We chose MicroK8s (because we already use Ubuntu extensively). Kubernetes runs directly on physical machines, while KVM is only used separately outside of the cluster.
Key components:
- Longhorn — distributed storage replacing NFS,
- Vault (HashiCorp) — for secrets and ACME endpoint for internal certificates,
- ArgoCD — GitOps orchestration integrated with GitLab,
- Traefik — separate instances for internal and external traffic,
- MetalLB — provides LoadBalancer IPs for services.
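MetalLB has to be told which address ranges it may hand out to LoadBalancer services. A minimal sketch of that configuration (pool and advertisement names, and the exact range, are assumptions; only the 192.168.100.x network matches our actual setup):

```yaml
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: internal-pool          # hypothetical pool name
  namespace: metallb-system
spec:
  addresses:
    - 192.168.100.20-192.168.100.40  # assumed range on the internal network
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: internal-l2
  namespace: metallb-system
spec:
  ipAddressPools:
    - internal-pool
```

With L2 advertisement, MetalLB answers ARP for these addresses from one of the nodes, so no BGP-capable router is required.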
(Figure: simple schema of the new setup)
🛠️ How we migrated
Experiments and first tests
We first took one server out of the Swarm cluster, reinstalled it, and ran MicroK8s for testing. The first components deployed were Longhorn, GitLab, ArgoCD, LDAP, and Vault. Nexus was also included in the base services, although we later decided to keep it on a local disk (see below).
Data migration
One of the most challenging parts of the migration was transferring existing data from Swarm into Kubernetes.
At first, we experimented with NFS, but large transfers turned out to be unreliable. In addition, we couldn’t find a straightforward way to copy data directly into Longhorn volumes. Because of that, we decided to use temporary sync pods that would upload the data once into pre-provisioned volumes.
In the end, rsync over SSH proved to be the most consistent approach. To support this, we prepared a lightweight custom image:
```dockerfile
FROM ubuntu:latest

# rsync for the data transfer, openssh-client for the SSH transport
RUN apt-get update && apt-get install -y \
    rsync \
    openssh-client \
    vim \
    && rm -rf /var/lib/apt/lists/*

# keep the container alive so one-off syncs can run inside it
ENTRYPOINT ["sleep", "infinity"]
```
Then we used sync pods to pull data from legacy VMs into new PVCs in Longhorn:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: sync-mongodb-data
  namespace: core-databases
spec:
  containers:
    - name: sync-mongodb-data
      image: registry.example.com/project/sync-image:latest
      command: ["/bin/sh", "-c"]
      args:
        [
          "rsync -azh --delete --stats -e 'ssh -i /key -o StrictHostKeyChecking=no' user@virt2:/media/data/mongodb/ /data/db/",
        ]
      volumeMounts:
        - mountPath: /data/db
          name: mongodb-data
        - mountPath: /key
          subPath: key
          readOnly: false
          name: sync-key
  volumes:
    - name: mongodb-data
      persistentVolumeClaim:
        claimName: core-mongodb-pvc
    - name: sync-key
      secret:
        secretName: sync-key
        defaultMode: 0600
  restartPolicy: Never
```
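The sync pod pulls data into a pre-provisioned Longhorn volume, so the PVC has to exist before the pod starts. A minimal claim might look like this (the 20Gi size is an assumption; the claim name and namespace match the sync pod):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: core-mongodb-pvc
  namespace: core-databases
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: longhorn   # Longhorn's default StorageClass name
  resources:
    requests:
      storage: 20Gi            # assumed size; match the source data set
```

Once the rsync finishes, the pod's logs show the transfer statistics and the pod can simply be deleted; the data stays in the volume.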
Nexus — a special case
We found that Nexus wasn’t stable when running on Longhorn. Instead, we placed its data on a local disk of one server. Yes, this introduces a new SPOF, but we mitigate it with regular backups, and the performance is stable.
There are certainly ways to tune Nexus to run reliably on Longhorn, but for now we decided not to invest time into this optimization. The local storage solution is sufficient for our current needs.
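For the curious, pinning data to one node's disk can be done with a local PersistentVolume and node affinity. This is only a sketch; the path, capacity, and node name are all hypothetical:

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nexus-local-pv          # hypothetical name
spec:
  capacity:
    storage: 200Gi              # assumed size
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: local-storage
  local:
    path: /media/nexus-data     # hypothetical path on the chosen server
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - node1         # the one server holding the data
```

The node affinity ensures the Nexus pod is always scheduled on the server that actually holds the disk.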
Separating internal and external traffic
We wanted a clear split:
- internal services (only inside the company network, SSL certificates from Vault),
- external services (public-facing, certificates from Let’s Encrypt).
We run two separate Traefik instances, each with its own IP provided by MetalLB.
Configuration — internal Traefik
```yaml
kind: Deployment
apiVersion: apps/v1
metadata:
  namespace: kube-system
  name: traefik-internal-deployment
  labels:
    app: traefik-internal
spec:
  replicas: 1
  selector:
    matchLabels:
      app: traefik-internal
  template:
    metadata:
      labels:
        app: traefik-internal
    spec:
      serviceAccountName: traefik-account
      containers:
        - name: traefik-internal
          image: traefik:v3.3
          imagePullPolicy: Always
          args:
            - --entryPoints.http.address=:80
            - --entryPoints.https.address=:443
            - --entryPoints.metrics.address=:8082
            - --metrics.prometheus=true
            - --providers.kubernetesingress=true
            - --providers.kubernetesingress.ingressclass=traefik-internal
            - --providers.kubernetescrd
            - --providers.kubernetescrd.allowEmptyServices=true
            - --certificatesresolvers.internal.acme.email=admin@example.com
            - --certificatesresolvers.internal.acme.storage=/internal/acme.json
            - --certificatesresolvers.internal.acme.caServer=https://vault.example.lan/v1/pki_internal/acme/directory
            - --certificatesresolvers.internal.acme.httpChallenge.entryPoint=http
          ports:
            - name: web
              containerPort: 80
            - name: https
              containerPort: 443
            - name: dashboard
              containerPort: 8080
            - name: metrics
              containerPort: 8082
          volumeMounts:
            - mountPath: /internal
              name: traefik-internal-data
            - mountPath: /etc/ssl/certs/root_ca.crt
              subPath: root-ca-crt
              readOnly: true
              name: traefik-internal-root-ca
      volumes:
        - name: traefik-internal-data
          persistentVolumeClaim:
            claimName: traefik-internal-data
        - name: traefik-internal-root-ca
          secret:
            secretName: root-ca
---
apiVersion: v1
kind: Service
metadata:
  name: traefik-internal-web
  annotations:
    metallb.universe.tf/allow-shared-ip: "internal-ip"
spec:
  type: LoadBalancer
  ports:
    - port: 80
      targetPort: web
    - port: 443
      targetPort: https
  selector:
    app: traefik-internal
  loadBalancerIP: 192.168.100.20
```
Explanation:
- The Deployment runs Traefik internal, serving only company services.
- Certificates are obtained via Vault ACME endpoint.
- The Service is of type LoadBalancer, with the IP (192.168.100.20) provided by MetalLB.
- The annotation allow-shared-ip allows multiple services to share the same IP if needed (e.g., developer databases).
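An internal service then selects this instance through its ingress class. A sketch of what that looks like (the GitLab host, service name, and port are illustrative; the certresolver name `internal` matches the Traefik args above):

```yaml
apiVersion: networking.k8s.io/v1
kind: IngressClass
metadata:
  name: traefik-internal
spec:
  controller: traefik.io/ingress-controller
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: gitlab-internal          # hypothetical example
  annotations:
    # ask the internal Traefik to obtain a cert via the Vault ACME resolver
    traefik.ingress.kubernetes.io/router.tls.certresolver: internal
spec:
  ingressClassName: traefik-internal
  rules:
    - host: gitlab.example.lan   # illustrative internal hostname
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: gitlab     # illustrative service name
                port:
                  number: 80
```

Ingresses without this class are ignored by the internal instance, which is what keeps the two Traefiks cleanly separated.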
External Traefik
The external instance is very similar but uses Let’s Encrypt with DNS challenge.
Originally, we tried the standard HTTP challenge, but this caused issues with our geoIP filtering. To obtain certificates, we had to temporarily disable geoIP protection, which was both inconvenient and insecure.
Switching to DNS challenge solved this problem completely. Certificates are now issued directly through the DNS provider’s API, without exposing a .well-known/acme-challenge endpoint to the internet.
```yaml
- --certificatesresolvers.letsencrypt.acme.dnschallenge=true
- --certificatesresolvers.letsencrypt.acme.dnschallenge.provider=websupport
- --certificatesresolvers.letsencrypt.acme.storage=/letsencrypt/acme.json
- --certificatesresolvers.letsencrypt.acme.dnschallenge.delaybeforecheck=0
```
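The DNS provider's API credentials are passed to Traefik as environment variables, which we keep in a Secret rather than in the manifest. A sketch of the container-level addition (the Secret name is our choice; the exact variable names the websupport provider expects are listed in Traefik's dnsChallenge provider documentation):

```yaml
          envFrom:
            - secretRef:
                # holds the DNS provider API credentials as env variables;
                # see Traefik's dnsChallenge docs for the required names
                name: websupport-dns-credentials
```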
Results and benefits
After migration, our infrastructure is:
- more stable — services survive even if a whole physical server fails,
- more secure — Vault handles secrets and certificates,
- more flexible — ArgoCD + GitLab give us GitOps workflows,
- unified — Kubernetes is used consistently from dev to production,
- experimental-friendly — with Longhorn we can use snapshots for individual Persistent Volumes, which makes experimenting with services much easier.
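The snapshot workflow mentioned in the last point can be as small as one manifest, assuming a CSI VolumeSnapshotClass for Longhorn has been set up (the snapshot and class names here are illustrative):

```yaml
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: mongodb-before-upgrade   # hypothetical name
  namespace: core-databases
spec:
  volumeSnapshotClassName: longhorn  # assumed VolumeSnapshotClass name
  source:
    persistentVolumeClaimName: core-mongodb-pvc
```

Restoring is just creating a new PVC with this snapshot as its dataSource, which makes "try it, and roll back if it breaks" experiments cheap.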
Lessons learned
A few takeaways that might help other small teams:
- DNS challenge instead of HTTP challenge. At first, we used the HTTP challenge for Let’s Encrypt, but ran into issues with geoIP filtering. We had to temporarily disable it, which wasn’t safe or convenient. Switching to the DNS challenge was a game-changer — certificates worked reliably without compromises.
- If we could, we’d adopt Kubernetes earlier. We postponed moving to Kubernetes, thinking it was overkill for a small team. In reality, MicroK8s was the perfect fit — easy to get started with, yet fully Kubernetes. If we could go back, we’d use it from day one. Operations are much simpler now.
- Storage is critical. Our original NFS was a weakness and a single point of failure. Longhorn was a huge improvement, but we also learned that some services (like Nexus) have special requirements. In those cases, local disk plus proper backups made more sense than distributed storage.
Conclusion
Migrating from Swarm to Kubernetes wasn’t trivial — we had to solve many details, from storage to data migration. But the outcome was worth it. Even a small company can benefit from Kubernetes if priorities are clear and the right tools are chosen.


