Non-deterministic NIC order in multihomed instance with both static and dynamic networks

**Describe the bug**

We're observing that some BOSH deployments with an instance group attached to two networks end up with a non-deterministic NIC order (between eth0 and eth1) at initial deployment, on instance index 1. This order then stays the same after any subsequent `bosh recreate` of the instance group.

This impacts the following use cases, which need a deterministic interface name (e.g. eth0) for a given network:
- Per-interface configuration, such as:
   - `sysctl -w net.ipv4.conf.eth0.rp_filter=2`
   - TCP checksum offload
- ARP configuration assigned per interface. See https://github.com/cloudfoundry/haproxy-boshrelease/blob/c4d3f5d055ee11650f8cd1109cdf71e6ff1b86ac/jobs/keepalived/spec#L28-L30
  > keepalived.interface:
  > - description: interface keepalived will use to mount the VIP. If set to 'auto', uses the default interface on the VM
  > - default: auto
   - (We're going through the history and retesting to understand why we disabled `auto`.)
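
While the NIC order is not deterministic, one possible workaround for the per-interface settings above is to look up the interface from the IP address BOSH assigned instead of hardcoding `eth0`. The Python sketch below is only an illustration under stated assumptions: it assumes the iproute2 `ip` command is available on the stemcell and relies on its one-line (`-o`) output format. `parse_ip_addr_output`, `interface_for_ip` and `rp_filter_key` are hypothetical helper names, not part of BOSH or of any release.

```python
from __future__ import annotations

import subprocess


def parse_ip_addr_output(output: str) -> dict[str, str]:
    """Map each IPv4 address to its interface name from `ip -o -4 addr show` output."""
    mapping: dict[str, str] = {}
    for line in output.splitlines():
        # Assumed one-line format: "<index>: <ifname> inet <addr>/<prefix> ..."
        fields = line.split()
        if len(fields) >= 4 and fields[2] == "inet":
            address = fields[3].split("/", 1)[0]
            mapping[address] = fields[1]
    return mapping


def interface_for_ip(ip: str) -> str | None:
    """Return the name of the interface holding `ip` on this host, or None."""
    # Assumption: iproute2's `ip` binary is installed and on PATH.
    result = subprocess.run(
        ["ip", "-o", "-4", "addr", "show"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_ip_addr_output(result.stdout).get(ip)


def rp_filter_key(interface: str) -> str:
    """Build the per-interface sysctl key used in the rp_filter example."""
    return f"net.ipv4.conf.{interface}.rp_filter"
```

On an instance, calling `interface_for_ip` with the static IP from the manifest network and passing the result to `rp_filter_key` would give the key for whichever NIC actually carries that network; a `None` result means no local interface holds that address.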

We compared the logs of two BOSH deployments that share the same manifest network specs (including the network ordering in the cloud-config), such as the following:

```yaml
  name: proxy
  instances: 2
  networks:
    - name: tf-net-osb-data-plane-shared-pub2
      static_ips:
        - 10.xx.yy.189
        - 10.xx.yy.190
    - default:
        - dns
        - gateway
      name: tf-net-osb-data-plane-shared-priv
  stemcell: default 
```

During a `bosh recreate`, the differences in the logs are limited to:
* `DEBUG -- DirectorJobRunner: Fetching existing instance for: #<Bosh::Director::Models::Instance @values=`, which shows that the current instance networks are fetched from the agent settings and returned in a different order
   * the agent_settings.json files do indeed list the networks in a different order on the two instances of the instance group
* `Creating instance network reservations from database for instance` (see [sources](https://github.com/cloudfoundry/bosh/blob/62397480f7cd79404712380805ff48b14a0a3560/src/bosh-director/lib/bosh/director/deployment_plan/instance_network_reservations.rb#L9)), which lists the ip_addresses in a different order
* the CPI `create_vm` call and its response, which list the networks in a different order


In the BOSH database `instances` table, the `spec_json` column also shows a diverging network order for the two instances.
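
To compare the network order of two instances outside of the logs, a small script can read two JSON documents (for example the agent settings of each instance, or the `spec_json` values exported from the `instances` table) and print the order of their network keys. This is a hedged Python sketch: it assumes each document has a top-level `networks` object keyed by network name, and the function names are hypothetical. Python's `json` module preserves the key order found in the input, which is what makes an order comparison meaningful.

```python
from __future__ import annotations

import json


def network_order(document: str) -> list[str]:
    """Return network names in the order they appear in the JSON `networks` object."""
    # Assumption: `networks` is a JSON object keyed by network name.
    # json.loads builds dicts that keep the key order found in the text.
    return list(json.loads(document).get("networks", {}).keys())


def compare_network_order(doc_a: str, doc_b: str) -> tuple[list[str], list[str], bool]:
    """Return the network order of both documents and whether they match."""
    order_a = network_order(doc_a)
    order_b = network_order(doc_b)
    return order_a, order_b, order_a == order_b


def compare_files(path_a: str, path_b: str) -> bool:
    """Print the network order of two JSON files and return True if it matches."""
    with open(path_a, encoding="utf-8") as fa, open(path_b, encoding="utf-8") as fb:
        order_a, order_b, same = compare_network_order(fa.read(), fb.read())
    print(f"{path_a}: {order_a}")
    print(f"{path_b}: {order_b}")
    return same
```

With two files copied locally from instance 0 and instance 1, `compare_files` prints both orders and returns `False` when they diverge.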

Is there a way to make the network interface assignment (eth0/eth1) deterministic for a new deployment?

Thanks in advance for your help!

**To Reproduce**

See the manifest fragment above, which triggered the problem.

Steps to reproduce the behavior (example):
1. Deploy a BOSH director with the vSphere CPI
2. Deploy `<manifest>`
3. Check the eth0/eth1 ordering on each instance

**Expected behavior**

A systematically deterministic ordering of eth0/eth1.

**Versions (please complete the following information):**
 - Infrastructure: vsphere 97.0.15
 - BOSH version: 280.1.5
 - Stemcell version '1.631'


/CC @ogrand

non deterministic NIC order in multihomed instance with both static and dynamic network #2596
