
Preflight Analysis Tutorial

In this article:

  • Learn how to analyze planned changes using Invariant before pushing them to any network device.
  • Use Networks to organize snapshots.

8 minute read

Organizing Snapshots For Preflight Analysis

Before you get started, know that the network snapshots in your account can (and probably should) be organized into groups called “Networks”. Each Network contains a time series of snapshots, ideally representing the same network or a proposed change to it.

This is important because Invariant produces a network health chart per Network, and alerting rules, security requirements, and connectivity requirements are also configured per Network. Invariant can also serve as a searchable archive of prior network snapshots. All of these functions work best when a Network contains snapshots of a single site. You don’t want to mix planned or experimental changes into the Network that is synced with your live environment, potentially triggering production alerts.

You can name your Networks whatever you like. One workable naming scheme looks like this:

  • Always include the name of the site and clarify the role of the network, e.g. IAD-prod might represent your production network in the IAD area.
  • For Networks that are regularly synced with your live network environment, append -synced to the network name.
  • For Networks that track the 'golden' or intended configuration of the network, append -golden to the network name.
  • For Networks that contain proposed changes, start with the name of the baseline network, then add -proposed- and a unique name for the proposal, e.g. IAD-prod-proposed-06-25-maintenance or IAD-prod-proposed-JIRA-803.
  • For Networks containing changes that aren’t necessarily intended to go live, use -experimental, e.g. IAD-prod-experimental-JIRA-894.

Selecting A Baseline Snapshot

To get started with a differential change analysis, you will need to do the following:

  • Choose a name for the network you will use for this analysis.
  • Identify the network snapshot to use as your baseline.
  • Collect the network configuration files for that baseline version.
  • Retrieve the snapshot UUID from the Invariant analysis of that snapshot. If the baseline version hasn’t been analyzed by Invariant yet, analyze it now and note the UUID.

Choose a name for the network you will use for this analysis.

As suggested above, because we are building on baseline network IAD-prod, we can name our planned change network IAD-prod-proposed-JIRA-803, where JIRA-803 might be a ticket tracking the work we hope to address with our change.

Identify the network snapshot to use as your baseline.

For our example we are going to start by fetching a new snapshot of the IAD-prod network to use as our baseline. We will fetch these configs, store them locally, and upload them for initial analysis, making note of the new snapshot UUID. Then we will make our edits to the local network config files.

Collect the network config files for that baseline version.

We can use the CLI in “fetch” mode to collect the latest network config files using the same fetcher config used to sync with production. Run invariant fetch.
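A sketch of what this step can look like; the configs/ destination and the file listing are illustrative, since where the files land depends on your fetcher configuration:

$ invariant fetch
$ ls configs/
border-1.cfg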

Get the snapshot UUID of the Invariant analysis of that snapshot.

We can use invariant run --network IAD-prod-proposed-JIRA-803 to analyze the snapshot we just fetched from the live network and place it in our IAD-prod-proposed-JIRA-803 network. We will note the UUID of this new snapshot: b1dc9602-d62d-4cd1-aa55-21293c0e826d.
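That initial baseline run looks something like this (output abbreviated to the line that matters; the snapshot UUID is the value we record):

$ invariant run --network IAD-prod-proposed-JIRA-803
snapshot: b1dc9602-d62d-4cd1-aa55-21293c0e826d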

Edit Network Configuration Files Directly

Disaster Averted

Suppose our task is to decommission an IPSec tunnel that we believe is not in use. Before deploying, we will responsibly verify the change with Invariant. We will discover that we are mistaken: the tunnels we think are unused are actually the ones connecting our on-premises enterprise network to our AWS infrastructure. Deploying this change would certainly cause an outage, but our analysis will reveal the failure and provide detailed information about the extent of the potential outage.

We will directly modify the network config file border-1.cfg to show that we intend to shut down interfaces Tunnel1 and Tunnel2 on device border-1.

--- a/p/demo/private/enterprise_quick/configs/border-1.cfg
+++ b/p/demo/private/enterprise_quick/configs/border-1.cfg
@@ -139,6 +139,7 @@ crypto ipsec profile ipsec-vpn-0dc7abdb974ff8a69-1
 !
 
 interface Tunnel1
+ shutdown
  ip address 169.254.25.162 255.255.255.252
  ip tcp adjust-mss 1379
  tunnel source 197.10.10.2
@@ -148,6 +149,7 @@ interface Tunnel1
  ip virtual-reassembly
 !
 interface Tunnel2
+ shutdown
  ip address 169.254.172.2 255.255.255.252
  ip tcp adjust-mss 1379
  tunnel source 197.10.10.2

To see the impact of our change, we start by re-running the analysis with invariant run. This will generate route tables and other network model data describing the state of the digital-twin network with the change applied. It will also evaluate all Invariant rules in the snapshot, and if any rules are violated by the change, it will generate virtual traceroutes demonstrating the issue. We will set the --condensed flag, which causes the terminal output to simply report whether there were rule failures while still generating all network data files.

$ invariant run --network IAD-prod-proposed-JIRA-803 --condensed
snapshot: d457e2da-9416-42d1-b606-6d3c36209878
outcome: Rule violations found

At this point we know rule violations were found. Invariant has created virtual traceroutes which we could investigate. However, given that we just shut down an IPSec tunnel and Invariant is reporting connectivity loss, we can safely assume that our change affected the route tables or protocol establishment, so a good place to start is diffing the network model details between our change and the baseline snapshot.

To compare the network models, create three directories: “baseline/”, “current/”, and “output_diffs/”.

mkdir -p current baseline output_diffs
Note: This logic may be migrated into the Invariant application in a future release.

Then edit the BASELINE_SNAPSHOT value below and run the script. Optionally, install the diff tool “dyff”; if it is present, the script will use it to produce a more readable summary.

# Put the baseline snapshot UUID here:
export BASELINE_SNAPSHOT="b1dc9602-d62d-4cd1-aa55-21293c0e826d"

export INVARIANT_REPORTS="nodes interfaces routes edges ipsec_session_status ipsec_edges bgp_process_config bgp_peer_config bgp_session_compatibility bgp_session_status bgp_edges bgp_ribs ospf_process_config ospf_interface_config ospf_area_config ospf_session_compatibility isis_edges eigrp_edges loops multipath file_parse_status ignored_lines parse_warnings errors probes"

echo "Downloading files for current snapshot into 'current' directory..."
for report_type in $INVARIANT_REPORTS; do
  echo "Fetching current ${report_type}.json..."
  invariant show "$report_type" --json > "current/${report_type}.json"
done

echo "Downloading files for BASELINE snapshot ($BASELINE_SNAPSHOT) into 'baseline' directory..."
for report_type in $INVARIANT_REPORTS; do
  echo "Fetching baseline ${report_type}.json..."
  invariant show "$report_type" --snapshot "$BASELINE_SNAPSHOT" --json > "baseline/${report_type}.json"
done

echo "Performing diff between 'baseline/' and 'current/' directories..."
diff -urN "baseline/" "current/" > "output_diffs/0_invariant_data.diff" || true

if [ -s "output_diffs/0_invariant_data.diff" ]; then
  if command -v dyff &> /dev/null; then
    echo "dyff command found. Generating dyff summary..."
    dyff between --color off -b "baseline/" "current/" > "output_diffs/1_invariant_diff_summary.txt" || true
    if [ -s "output_diffs/1_invariant_diff_summary.txt" ]; then
      echo "dyff summary saved to output_diffs/1_invariant_diff_summary.txt"
      echo "--- dyff Summary ---"
      cat "output_diffs/1_invariant_diff_summary.txt"
      echo "--- End dyff Summary ---"
    else
      echo "dyff ran but produced no output."
    fi
  else
    echo "Differences found. Full diff saved to output_diffs/0_invariant_data.diff"
    head -n 40 "output_diffs/0_invariant_data.diff"
    echo ""
  fi
else
  echo "No differences found between 'baseline' and 'current' directories."
fi
echo ""

echo "Baseline files are in: baseline/"
echo "Current files are in: current/"
if [ -s "output_diffs/0_invariant_data.diff" ]; then
  echo "Diff outputs (if any) are in: output_diffs/"
fi
echo "Done."
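To run it, save the script to a file (compare_snapshots.sh is a hypothetical name) and execute it with bash. If you want the friendlier dyff summary, install dyff first; one way to do that, assuming a working Go toolchain, is shown here:

# Optional: install dyff for nicer summaries (assumes a Go toolchain)
$ go install github.com/homeport/dyff/cmd/dyff@latest

$ bash compare_snapshots.sh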

Understanding The Issue

With the information from the network model diffs we can start to piece together the mechanics of the averted outage step by step. A great place to start is by looking at the difference between RIBs. We can see that several key routes would be lost with this change:

The border-1 device loses key /16 routes to the AWS network, which would have carried traffic over the transit gateway IPSec tunnels. It also loses the /30 routes established by the tunnel interfaces themselves. Normally, BGP sessions are established over these links, which is how AWS advertises those /16 networks.

Note that the next hops on the /16 routes (in the baseline) were 169.254.25.161 and 169.254.172.1, the AWS side of the IPSec tunnels.

(root level)  (routes.json)
- eight list entries removed:
  - Admin_Distance: 1
    Metric: 0
    Network: 10.20.0.0/16
    Next_Hop:
      type: ip
      interface: null
      ip: 169.254.25.161
    Next_Hop_IP: 169.254.25.161
    Next_Hop_Interface: dynamic
    Next_Hop_str: "ip 169.254.25.161"
    Node: border-1
    Protocol: static
    Tag: null
    VRF: default
  - Admin_Distance: 1
    Metric: 0
    Network: 10.20.0.0/16
    Next_Hop:
      type: ip
      interface: null
      ip: 169.254.172.1
    Next_Hop_IP: 169.254.172.1
    Next_Hop_Interface: dynamic
    Next_Hop_str: "ip 169.254.172.1"
    Node: border-1
    Protocol: static
    Tag: null
    VRF: default
  - Admin_Distance: 1
    Metric: 0
    Network: 10.30.0.0/16
    Next_Hop:
      type: ip
      interface: null
      ip: 169.254.25.161
    Next_Hop_IP: 169.254.25.161
    Next_Hop_Interface: dynamic
    Next_Hop_str: "ip 169.254.25.161"
    Node: border-1
    Protocol: static
    Tag: null
    VRF: default
  - Admin_Distance: 1
    Metric: 0
    Network: 10.30.0.0/16
    Next_Hop:
      type: ip
      interface: null
      ip: 169.254.172.1
    Next_Hop_IP: 169.254.172.1
    Next_Hop_Interface: dynamic
    Next_Hop_str: "ip 169.254.172.1"
    Node: border-1
    Protocol: static
    Tag: null
    VRF: default
  - Admin_Distance: 0
    Metric: 0
    Network: 169.254.25.160/30
    Next_Hop:
      type: interface
      interface: Tunnel1
      ip: null
    Next_Hop_IP: AUTO/NONE(-1l)
    Next_Hop_Interface: Tunnel1
    Next_Hop_str: "interface Tunnel1"
    Node: border-1
    Protocol: connected
    Tag: null
    VRF: default
  - Admin_Distance: 0
    Metric: 0
    Network: 169.254.25.162/32
    Next_Hop:
      type: interface
      interface: Tunnel1
      ip: null
    Next_Hop_IP: AUTO/NONE(-1l)
    Next_Hop_Interface: Tunnel1
    Next_Hop_str: "interface Tunnel1"
    Node: border-1
    Protocol: local
    Tag: null
    VRF: default
  - Admin_Distance: 0
    Metric: 0
    Network: 169.254.172.0/30
    Next_Hop:
      type: interface
      interface: Tunnel2
      ip: null
    Next_Hop_IP: AUTO/NONE(-1l)
    Next_Hop_Interface: Tunnel2
    Next_Hop_str: "interface Tunnel2"
    Node: border-1
    Protocol: connected
    Tag: null
    VRF: default
  - Admin_Distance: 0
    Metric: 0
    Network: 169.254.172.2/32
    Next_Hop:
      type: interface
      interface: Tunnel2
      ip: null
    Next_Hop_IP: AUTO/NONE(-1l)
    Next_Hop_Interface: Tunnel2
    Next_Hop_str: "interface Tunnel2"
    Node: border-1
    Protocol: local
    Tag: null
    VRF: default
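If you want to confirm these removals directly against the raw report data, a quick filter over the baseline routes works. A minimal sketch, assuming jq is installed and that routes.json is a top-level JSON list with the fields shown above:

# Baseline routes on border-1 that depend on the tunnels, either as the
# outgoing interface or as the next-hop IP (values taken from the diff above)
$ jq '.[] | select(.Node == "border-1")
         | select((.Next_Hop_Interface // "" | startswith("Tunnel"))
                  or .Next_Hop_IP == "169.254.25.161"
                  or .Next_Hop_IP == "169.254.172.1")' baseline/routes.json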

Key BGP sessions between AWS and border-1 can no longer be established: their configured status changes from HALF_OPEN to UNKNOWN_REMOTE.

(root level)  (bgp_session_compatibility.json)
- two list entries removed:
  - Address_Families: []
    Configured_Status: HALF_OPEN
    Local_AS: 64512
    Local_IP: 169.254.25.161
    Local_Interface: null
    Local_Interface_str: None
    Node: tgw-06b348adabd13452d
    Remote_AS: 64500
    Remote_IP: 169.254.25.162
    Remote_Interface: null
    Remote_Interface_str: None
    Remote_Node: null
    Session_Type: EBGP_SINGLEHOP
    VRF: vrf-tgw-rtb-00e37bc5142347b03
  - Address_Families: []
    Configured_Status: HALF_OPEN
    Local_AS: 64512
    Local_IP: 169.254.172.1
    Local_Interface: null
    Local_Interface_str: None
    Node: tgw-06b348adabd13452d
    Remote_AS: 64500
    Remote_IP: 169.254.172.2
    Remote_Interface: null
    Remote_Interface_str: None
    Remote_Node: null
    Session_Type: EBGP_SINGLEHOP
    VRF: vrf-tgw-rtb-00e37bc5142347b03

+ two list entries added:
  - Address_Families: []
    Configured_Status: UNKNOWN_REMOTE
    Local_AS: 64512
    Local_IP: 169.254.25.161
    Local_Interface: null
    Local_Interface_str: None
    Node: tgw-06b348adabd13452d
    Remote_AS: 64500
    Remote_IP: 169.254.25.162
    Remote_Interface: null
    Remote_Interface_str: None
    Remote_Node: null
    Session_Type: EBGP_SINGLEHOP
    VRF: vrf-tgw-rtb-00e37bc5142347b03
  - Address_Families: []
    Configured_Status: UNKNOWN_REMOTE
    Local_AS: 64512
    Local_IP: 169.254.172.1
    Local_Interface: null
    Local_Interface_str: None
    Node: tgw-06b348adabd13452d
    Remote_AS: 64500
    Remote_IP: 169.254.172.2
    Remote_Interface: null
    Remote_Interface_str: None
    Remote_Node: null
    Session_Type: EBGP_SINGLEHOP
    VRF: vrf-tgw-rtb-00e37bc5142347b03
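The same kind of spot check works for the BGP compatibility report. A minimal sketch, again assuming jq and a top-level JSON list:

# Sessions in the changed snapshot whose remote peer can no longer be found
$ jq '.[] | select(.Configured_Status == "UNKNOWN_REMOTE")
         | {Node, Remote_IP, VRF}' current/bgp_session_compatibility.json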

Using this deep comparison of network behavior, we can confirm the mechanics of the averted incident: the Tunnel1 and Tunnel2 interfaces were shut down, so all connectivity to the AWS Transit Gateway was lost. The key /16 networks that AWS would have advertised through BGP over the IPSec tunnels were no longer advertised.

Without having examined the virtual traceroutes, we can already conclude that this change is not safe and these Tunnel interfaces are not ready to be decommissioned.