Why your users refer to the network as weather
This topic holds a special place in my heart, yet it is also one of the most frustrating to encounter, because the solution often looks overly simple and, in fact, is. The crux of the matter is that the services that need to be as close to the end user as possible are, in reality, a considerable distance away. The two services I am referring to are the Domain Name System (DNS) and the Dynamic Host Configuration Protocol (DHCP).
Every organization is driven by cost cutting and a desire for a smaller attack surface, which leads many to skip deploying a Domain Controller (DC) at every branch office. In my opinion this is a grave mistake, and one I consistently advise IT executives against making.
DNS and DHCP both run over UDP, a “fire and forget” transport. A network without packet loss would be ideal, but that is an unrealistic expectation with contemporary technologies. Mechanisms exist to mitigate packet loss, and they are in place; nevertheless, when UDP traffic is forced across a Wide Area Network (WAN) connection, such as a VPN tunnel, packet loss becomes inevitable. UDP itself provides no retransmission, so recovery is left to the application, which typically just waits out a timeout and asks again.
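To make the fire-and-forget problem concrete, here is a minimal Python sketch that builds a DNS A-record query by hand and shows that retransmission on a lossy link is entirely the application's job. The transaction ID, timeout, and retry count are illustrative assumptions, not prescriptions:

```python
import socket
import struct

def build_dns_query(name: str, txid: int = 0x1234) -> bytes:
    """Build a minimal DNS A-record query in RFC 1035 wire format."""
    # Header: id, flags (RD=1), 1 question, 0 answer/authority/additional
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    # QNAME: length-prefixed labels, terminated by a zero byte
    qname = b"".join(
        bytes([len(label)]) + label.encode() for label in name.split(".")
    ) + b"\x00"
    # QTYPE=A (1), QCLASS=IN (1)
    return header + qname + struct.pack("!HH", 1, 1)

def query_with_retry(server: str, name: str,
                     attempts: int = 3, timeout: float = 1.0):
    """UDP is fire-and-forget: if the datagram is lost on the WAN, nothing
    retransmits it for us -- we burn a full timeout, then try again."""
    payload = build_dns_query(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        for _ in range(attempts):
            sock.sendto(payload, (server, 53))
            try:
                response, _ = sock.recvfrom(512)
                return response
            except socket.timeout:
                continue  # lost query or lost reply: the cost is `timeout` seconds
    return None
```

Each lost datagram costs a full timeout before the next attempt, which is exactly the multi-second stall users experience when their resolver sits on the far side of a VPN tunnel.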
In essence, each site should have its own dedicated DC locally, with the Domain Controllers in the Data Center serving as the redundant backup.
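As a sketch, the resolver ordering on a branch endpoint might look like the following (the addresses are hypothetical): the local DC answers first, and the data-center DC is consulted only as a fallback:

```
# /etc/resolv.conf (hypothetical addresses)
nameserver 10.20.0.10   # branch-office DC, on the local LAN
nameserver 10.0.0.10    # data-center DC, reached across the WAN as backup
```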
This is why end users perceive the network as weather: a shifting entity that changes from day to day. From the endpoint's perspective, every webpage, advertisement, and background service requires name resolution before any action can be taken. The Wireshark Certification Guide states that any delay exceeding one second is noticeable to the end user. In packet captures, I have seen the initial DNS request plus the TCP handshake alone take five to ten seconds, far beyond the one-second threshold engineers should strive for.
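A quick way to check a site against that one-second threshold is to time resolution directly. This sketch uses only the Python standard library; the hostname passed in is whatever name you want to measure:

```python
import socket
import time

NOTICEABLE_S = 1.0  # delay an end user will notice, per the threshold above

def timed_resolve(hostname: str):
    """Resolve a name and report how long it took against the 1 s threshold."""
    start = time.perf_counter()
    try:
        addr = socket.gethostbyname(hostname)
    except socket.gaierror:
        addr = None  # resolution failed outright
    elapsed = time.perf_counter() - start
    return addr, elapsed, elapsed > NOTICEABLE_S
```

Running this against a name served by a local DC versus one resolved across the WAN makes the difference between the two designs visible in numbers rather than complaints.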
Poor implementation of DNS
Rarely have I seen an organization with a well-implemented DNS deployment. Why? In most organizations I have worked with, the people who own DNS are the same people who own the Domain Controllers, and Systems Engineers are notoriously protective of Domain Controller access. Because DNS is tightly integrated with Active Directory Domain Services (AD DS), those same individuals become the gatekeepers to DNS records.
DNS is a network service that should be managed collaboratively by the Systems and Networking teams. The standard for DNS is straightforward: if an IP address exists, it must have both an A record and a PTR record, with no exceptions. Systems Engineers tend to think about DNS records from an endpoint perspective and assume that is all that is required; in reality, every single IP address needs both records. When I say “every,” I mean every, including redundant gateways, which translates to six records per redundant gateway pair: three A records, one for each physical router IP and one for the floating IP, plus three PTR records to match.
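In BIND-style zone-file terms, the six records for one redundant gateway pair might look like this. The zone, names, and addresses are hypothetical, assuming an HSRP/VRRP-style floating IP:

```
; Forward zone (example.corp) -- hypothetical names and addresses
gw-floor1           IN A    10.1.0.1    ; floating VIP (HSRP/VRRP)
gw-floor1-rtr1      IN A    10.1.0.2    ; physical router 1
gw-floor1-rtr2      IN A    10.1.0.3    ; physical router 2

; Reverse zone (0.1.10.in-addr.arpa) -- one PTR to match each A record
1                   IN PTR  gw-floor1.example.corp.
2                   IN PTR  gw-floor1-rtr1.example.corp.
3                   IN PTR  gw-floor1-rtr2.example.corp.
```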
The rationale is that computers perform numerous behind-the-scenes tasks to give end users a seamless experience. One such task is a reverse lookup on the gateway address handed out with the DHCP lease. If the PTR record is missing, that lookup fails, and if DNS is not deployed locally, the failure comes only after a slow round trip across the WAN, on top of the time already spent obtaining the lease. A records are for humans; PTR records are for machines, and the machines we use deserve the same care.
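The reverse lookup queries a name derived mechanically from the IP address, and Python's standard `ipaddress` module can show exactly which name must carry the PTR record. A small sketch:

```python
import ipaddress

def ptr_query_name(ip: str) -> str:
    """Return the reverse-DNS name a resolver queries for `ip`.
    If no PTR record exists at this name, the lookup fails."""
    return ipaddress.ip_address(ip).reverse_pointer
```

For a gateway of 10.1.0.1, the resolver asks for `1.0.1.10.in-addr.arpa`; that is the name the reverse zone has to answer for.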
The most common argument against maintaining a healthy DNS system is that “PTR records are a security vulnerability.” While there is a kernel of truth to that, the operational cost of missing records is significant, and the security gained by omitting them is minimal.
Letting security trump operations
Security should never be compromised, disregarded, or downplayed; neither should operations or performance. The equilibrium between the two is an ongoing, dynamic negotiation. Organizations that have experienced significant security breaches often show a noticeable hit to productivity afterward, because it becomes exceedingly difficult to accomplish anything. When security measures hinder the productivity of any individual or department, that is a flawed security strategy and a detriment to the business's overall performance.
This does not mean security should be neglected. Rather, the assumption is that if security is given the final say in every architectural and design decision, a secure solution will follow; but that is looking at the problem through a single lens. Having worked on both sides of the spectrum, I have observed that organizations with this mentality often have limited experience in other areas. Consider an organization that outsources all IT functions while retaining only a few “Security Architects” in house: the result is a third party trying to prioritize customer satisfaction and service delivery while constantly weighing those goals against the in-house security mandates.
The balance between security and operations is a delicate equilibrium that tips if either side is weighted too heavily. There is another variable in this discussion as well: what is the actual skill level on either side? Relying on an established policy or decision without verifying it does not make it sound, and that principle extends even to training and certifications. Many general security training materials and books briefly present ICMP (Internet Control Message Protocol) as a risk to be blocked. Operationally, however, ICMP is a requirement, not a luxury: without it, basic connectivity verification becomes difficult, and network devices lose the ability to perform Path MTU Discovery, leading to dropped traffic. From a security standpoint, it is also worth noting that attackers' tools routinely work around ICMP filtering. Blocking ICMP wholesale is a common policy in many organizations, but it drops legitimate traffic and hinders verification of basic IP connectivity without providing any real security benefit.
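Rather than dropping all ICMP, a more defensible policy permits the specific message types that connectivity checks and Path MTU Discovery depend on. As a sketch only (iptables syntax is shown; chain names, rule placement, and rate limiting will vary by environment):

```
# Permit the ICMP messages that operations actually depend on
iptables -A INPUT -p icmp --icmp-type echo-request -j ACCEPT          # ping in
iptables -A INPUT -p icmp --icmp-type echo-reply -j ACCEPT            # ping replies
iptables -A INPUT -p icmp --icmp-type fragmentation-needed -j ACCEPT  # PMTUD
iptables -A INPUT -p icmp --icmp-type time-exceeded -j ACCEPT         # traceroute
```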
Not hiring a dedicated network professional
Why does an organization need a networking professional? They are expensive, the network looks like an enigmatic black box, and few people understand what they actually do. These are not compelling reasons against hiring a network engineer; they are precisely what makes one valuable. A skilled network engineer is not expensive because the knowledge is quick to acquire or easy to hire for; the cost reflects a comprehensive understanding of network design, implementation, and maintenance that keeps the network secure and performing at its best. The very fact that so few people understand the role underscores how critical it is to safeguarding critical systems. Without that expertise in house, recovery from an outage is significantly prolonged by the need to engage external support and escalation paths. Network devices are designed to provide connectivity out of the box, but that does not make the network reliable, healthy, or secure. Running everything in the cloud, or adopting a “we can figure it out” mindset, leads to poor performance and extended recovery times.
Taking a top-down approach to architecture
Technology has become an indispensable requirement for businesses to stay competitive. When a company decides to implement a specific technology, however, it typically does so from an end-user perspective, such as purchasing an application; its primary concern is that the application is up and functioning correctly. Companies whose business is not IT do not actively set out to build a robust network infrastructure underneath it. Yet in the OSI Model the network precedes the application: the foundational network that will deliver and host the application must be designed before the system and its dependencies are. Networking should be the first team consulted when implementing a new solution. Without networking expertise, the physical hosting location of the solution remains uncertain, the IP scheme risks being inefficient, and other teams cannot perform their tasks without the outputs the networking team produces.
