
Swarm removes service DNS when all tasks are unhealthy, allowing unintended external or network-level resolution #52049

@mpcref

Description

When all tasks of a Docker Swarm service become unhealthy, the service name is removed from Swarm’s internal DNS. From that moment on, resolving the service hostname no longer returns a VIP or task IP. Instead, the hostname behaves as if it does not exist within the stack.

This causes the resolver to fall back to the host’s normal DNS resolution chain. As a result, the same hostname may resolve to:

  • a public DNS record
  • an internal corporate DNS record
  • another system on the local network
  • or any other resolver configured on the host

This fallback can lead to traffic being silently routed to unintended destinations.
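The fallback chain can be sketched in a few lines of Python (all names and addresses are hypothetical, for illustration only — this is not moby's resolver code):

```python
# Minimal sketch of how a stub resolver's search-domain fallback lets an
# internal service name "escape" the cluster namespace.

# Swarm's embedded DNS view: the "www" entry has been removed because
# every task is unhealthy.
swarm_dns = {}  # was {"www": "10.0.1.5"} while tasks were healthy

# Host-level resolution: /etc/resolv.conf search domain plus public DNS.
search_domains = ["mydomain.com"]
external_dns = {"www.mydomain.com": "203.0.113.10"}  # public A record

def resolve(name):
    """Internal namespace first; on a miss, fall back like a host stub
    resolver by appending each configured search domain."""
    if name in swarm_dns:
        return swarm_dns[name]        # normal case: service VIP
    for domain in search_domains:     # the dangerous fallback
        fqdn = f"{name}.{domain}"
        if fqdn in external_dns:
            return external_dns[fqdn]
    return None                       # true NXDOMAIN

print(resolve("www"))  # 203.0.113.10 -- traffic leaves the cluster
```

While any task is healthy the first branch answers with the service VIP; once the entry is gone, the same query silently succeeds against public DNS.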

The underlying problem is that an internal Swarm service name can “escape” the internal DNS namespace entirely once all replicas are unhealthy.

Core problem

Swarm currently treats a service with zero healthy tasks as if the service name does not exist in the internal DNS namespace. This allows normal system resolution to proceed. In many environments, that resolution will succeed and point to something else.

An internal service hostname should never resolve outside the cluster simply because all tasks are unhealthy.

Why this is dangerous

This behavior can lead to:

  • Silent cross-service or cross-tenant routing
  • Accidental exposure of internal traffic to external networks
  • Data leaks
  • Security boundary violations
  • Hard-to-debug outages
  • Unexpected connections to production systems outside the cluster

In many real-world environments, short hostnames exist both internally and externally. Removing the internal DNS entry makes it very likely that resolution will succeed elsewhere.

Impact

High.
This can cause unintended routing of internal traffic to external or unrelated systems without any configuration change, creating both reliability and security risks.

Suggested improvements

  • Keep service names resolvable within Swarm even when all tasks are unhealthy
  • Return an explicit internal failure instead of removing DNS entries
  • Prevent fallback to external resolution for internal service names
  • Optionally provide a configuration flag to control this behavior
  • Document the current behavior clearly if it is considered intentional

Environment

  • Docker Swarm mode
  • Overlay networks
  • Linux hosts
  • Services using internal service names for connectivity

This behavior is surprising and can result in dangerous routing scenarios in production environments. It would be safer if internal service names never left the internal DNS namespace, even when no healthy tasks are available.

Reproduce

Example setup:

A stack contains two services:

  • www
  • reverse-proxy

The stack hosts a customer website:

  • www.myclientsdomain.com

The operator also hosts their own website:

  • www.mydomain.com

The operator's domain, mydomain.com, is configured as the DNS search domain (resolv.conf search) on the Swarm nodes.
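The two-service layout can be sketched as a stack file (the images and the always-failing healthcheck are illustrative choices to force the unhealthy state, not taken from the report):

```yaml
# docker-stack.yml -- illustrative reproduction stack
version: "3.8"
services:
  www:
    image: nginx:alpine           # stand-in for the customer site
    healthcheck:
      test: ["CMD", "false"]      # always fails -> every task unhealthy
      interval: 5s
      retries: 1
    deploy:
      replicas: 2
  reverse-proxy:
    image: nginx:alpine           # proxies requests to http://www
    deploy:
      replicas: 1
```

Deploying this with `docker stack deploy -c docker-stack.yml repro` and waiting for the healthchecks to fail should reproduce the fallback when `www` is resolved from inside the reverse-proxy task.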

Observed behavior

When all replicas of the www service become unhealthy:

  1. Swarm removes the internal DNS entry for www.
  2. The reverse proxy attempts to resolve www.
  3. The host resolver applies the search domain → www.mydomain.com.
  4. That domain resolves publicly.
  5. The reverse proxy connects to www.mydomain.com.

Actual behavior

  • Service name disappears from Swarm DNS when all tasks are unhealthy
  • Resolver falls back to host/system DNS
  • Hostname may resolve to public or network resources
  • Traffic is routed outside the intended cluster

Result:
www.myclientsdomain.com starts serving the content of www.mydomain.com.

This is only an illustrative example. The same issue can occur without any search domain configured. If a hostname like postgres, api, redis, or internal-service exists elsewhere on the network or in DNS, traffic may be routed there instead.

Expected behavior

When all tasks of a service are unhealthy:

  • The service name should remain in the internal DNS namespace.
  • Resolution should return an internal failure or empty result.
  • It should not fall back to external or host-level resolution.
  • Internal service names should never implicitly resolve outside the cluster.

In other words:
An internal service should be “unavailable,” not “nonexistent.”

docker version

Client: Docker Engine - Community
 Version:           29.2.0
 API version:       1.53
 Go version:        go1.25.6
 Git commit:        0b9d198
 Built:             Mon Jan 26 19:29:31 2026
 OS/Arch:           linux/amd64
 Context:           default

Server: Docker Engine - Community
 Engine:
  Version:          29.2.0
  API version:      1.53 (minimum version 1.44)
  Go version:       go1.25.6
  Git commit:       9c62384
  Built:            Mon Jan 26 19:25:41 2026
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          v2.2.1
  GitCommit:        dea7da592f5d1d2b7755e3a161be07f43fad8f75
 runc:
  Version:          1.3.4
  GitCommit:        v1.3.4-0-gd6d73eb8
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

docker info

Client: Docker Engine - Community
 Version:    29.2.0
 Context:    default
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc.)
    Version:  v0.30.1
    Path:     /usr/libexec/docker/cli-plugins/docker-buildx
  compose: Docker Compose (Docker Inc.)
    Version:  v5.0.2
    Path:     /usr/libexec/docker/cli-plugins/docker-compose

Server:
 Containers: 30
  Running: 16
  Paused: 0
  Stopped: 14
 Images: 51
 Server Version: 29.2.0
 Storage Driver: overlay2
  Backing Filesystem: xfs
  Supports d_type: true
  Using metacopy: false
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: systemd
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local splunk syslog
 CDI spec directories:
  /etc/cdi
  /var/run/cdi
 Swarm: active
  NodeID: t3ksy9agq5jo32ph35bgg2gyv
  Is Manager: true
  ClusterID: jx8hs69rtd9iyf6vb0co5obju
  Managers: 2
  Nodes: 2
  Default Address Pool: 10.0.0.0/8
  SubnetSize: 24
  Data Path Port: 4789
  Orchestration:
   Task History Retention Limit: 5
  Raft:
   Snapshot Interval: 10000
   Number of Old Snapshots to Retain: 0
   Heartbeat Tick: 1
   Election Tick: 10
  Dispatcher:
   Heartbeat Period: 5 seconds
  CA Configuration:
   Expiry Duration: 3 months
   Force Rotate: 0
  Autolock Managers: false
  Root Rotation In Progress: false
  Node Address: 10.10.10.60
  Manager Addresses:
   10.10.10.60:2377
   10.10.10.61:2377
 Runtimes: io.containerd.runc.v2 runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: dea7da592f5d1d2b7755e3a161be07f43fad8f75
 runc version: v1.3.4-0-gd6d73eb8
 init version: de40ad0
 Security Options:
  seccomp
   Profile: builtin
  cgroupns
 Kernel Version: 5.14.0-570.58.1.el9_6.x86_64
 Operating System: Rocky Linux 9.7 (Blue Onyx)
 OSType: linux
 Architecture: x86_64
 CPUs: 8
 Total Memory: 15.93GiB
 Name: sn60.knaken.eu
 ID: 7536ad89-c51c-4acd-a1ff-54ec7f1c61dc
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Experimental: false
 Insecure Registries:
  ::1/128
  127.0.0.0/8
 Live Restore Enabled: false
 Firewall Backend: iptables

