Achieve Networking at Scale with a Self-Service Network Solution
- September 01, 2020
When starting with AWS, many organizations assume that they will only need one or two accounts. They quickly realize they need many more and end up with 10, 50, or more accounts. This is a common situation when customers onboard new business units, set up application workloads, or provision new accounts in AWS Control Tower. Yet unplanned account proliferation can create networking challenges. In today’s article, I’ll share how to tackle this challenge and achieve networking at scale within AWS with a self-service pattern that gives you a unified, secure, scalable, and extendable cloud foundation.
There are four important steps to developing a self-service pattern that I’ll share today. They are:
- VPC deployment approaches
- IP address space management
- AWS Transit Gateway, the backbone of the network in AWS
- Network connectivity management
Network at Scale for Future Growth
To deploy any business-supporting application in the AWS cloud, enabling a baseline of support services is important. For example, you’ll want to create an account structure with distribution based on duty, compliance, and specific workloads; you’ll also want to create accounts for billing, production, non-production, and more. Last, you’ll want accounts for network connectivity, monitoring, and security. Moreover, there are several AWS services that establish the foundation for workloads, before an application is even deployed.
The baseline I reference above is achieved by enabling a multitude of AWS services such as AWS Control Tower, AWS Organizations, AWS IAM, AWS SSO, AWS CloudTrail, AWS Security Hub, AWS Config, and Amazon GuardDuty. These services may not be enabled automatically, yet work together to form a security baseline. Couple this list of services with account proliferation and it’s easy to see why networking at scale quickly becomes imperative. Indeed, many organizations choose AWS to give them greater agility and flexibility. A network that can gracefully manage current and future demands is a necessary component to achieving this level of agility.
Build Cloud Foundations
Hundreds of decisions must be made to create a secure, scalable, and extendable cloud foundation. To help you avoid missteps that can lead to security risk, unscalable systems, and inefficiencies that slow cloud adoption, our Build Cloud Foundations offering helps you make the right decisions faster, speeding the architectural design and build processes. Our secure AWS landing zones help you achieve automation that reduces human errors and ensures consistent and secure account creation and operation while giving you security best practices built into a foundational architecture that speeds the deployment of high-value applications.
Cloud Networking
Networking is an important part of securely connecting to the cloud and then isolating, controlling, and distributing applications across multiple environments. Yet it is challenging to build a network in the cloud when you dynamically add new accounts and application workloads. In such a dynamic environment, networking configuration can quickly become a bottleneck and require cooperation between multiple teams such as networking, security, and development.
Here at NTT DATA, we are proponents of a self-service solution in which end-users have access to a self-service portfolio where they can choose from different approved network products. This approach automates the process, avoids time-consuming cooperation between different departments (as the departments agree in advance which network products will be offered), avoids human error, and provides a standardized, repeatable and secure network solution.
- VPC Deployment Approaches: Unified or Case-Specific
When organizations start planning the development of automation to deploy networking, they face a dilemma. They can either develop a few unified VPC patterns and apply them as part of account creation, or develop a VPC solution portfolio with multiple products and share the portfolio in AWS Organizations so teams and users can create new VPCs on demand for different cases. Both approaches have pros and cons.
Unified VPC deployment approach
VPC deployment can be fully automated and can even be part of account creation automation (like Build Cloud Foundations). Doing so requires a unified VPC pattern for the organizational unit or even for the whole organization. Such an approach decreases operational overhead because there is no need to configure the network on new accounts. You don’t need to decide with whom to share VPC products or who gets permission to deploy VPCs in each account. All of this is deployed as part of account automation, and end-users receive networking that is already configured. However, such an approach doesn’t provide enough flexibility when teams and users want different VPCs for different needs. The next approach solves that problem.
Case-Specific VPC deployment approach
Case-specific VPC deployment approaches are more flexible because you can apply them to different use cases: a different number of tiers (for example, three tiers with public, private, and protected subnet types, or two tiers with public and private subnet types), different CIDR block sizes, or a different number of Availability Zones. That can preserve IP address space because you can select different VPC sizes for different accounts and application workloads. It also allows you to deploy different VPCs in the same account and add new VPCs when needed. In most cases, a case-specific VPC deployment approach is an organization’s first choice.
- IP address space management
IP address management (IPAM) is a core part of planning and managing the assignment and use of a network’s IP address space. The main challenges for organizations in IP address management are:
- Plan IP address space so that different segments of the network get dedicated IP address space in order to simplify routing and security configuration between segments of the network.
- Preserve IP address space.
To address these challenges, I recommend implementing an IPAM solution for the organization.
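To illustrate the two goals above (dedicated space per network segment and no wasted space), here is a minimal Python sketch of a first-fit CIDR allocator. It is not how our IPAM works internally; the class name `CidrPool`, the segment names, and the address ranges are assumptions chosen for illustration only.

```python
import ipaddress


class CidrPool:
    """Hands out non-overlapping CIDR blocks from one supernet (first fit).

    Illustrative only: a real IPAM persists this state in a database.
    """

    def __init__(self, supernet: str):
        self.supernet = ipaddress.ip_network(supernet)
        self.allocated = {}  # owner name -> allocated network

    def request(self, owner: str, prefix_len: int):
        # Idempotent: asking twice for the same owner returns the same block.
        if owner in self.allocated:
            return self.allocated[owner]
        for candidate in self.supernet.subnets(new_prefix=prefix_len):
            if not any(candidate.overlaps(used) for used in self.allocated.values()):
                self.allocated[owner] = candidate
                return candidate
        raise RuntimeError(f"no free /{prefix_len} left in {self.supernet}")

    def release(self, owner: str) -> None:
        # Releasing an unknown owner is a no-op.
        self.allocated.pop(owner, None)


# One pool per network segment keeps routing and security rules simple
# (hypothetical segment names and ranges).
pools = {
    "production": CidrPool("10.0.0.0/12"),
    "non-production": CidrPool("10.16.0.0/12"),
}

prod_vpc = pools["production"].request("payments-vpc", 20)
dev_vpc = pools["non-production"].request("sandbox-vpc", 24)
```

Because each segment draws from its own supernet, a single summary route or security rule can cover the whole segment, and small workloads can request small blocks instead of a one-size-fits-all range.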
When we started planning a solution for one of our customers, we agreed to use native AWS services to limit the overhead of implementing and maintaining custom solutions. We chose AWS Service Catalog to provide a convenient user interface to the VPC solution portfolio. That allows us to share our solution with the whole AWS Organization and configure role-based access. We used AWS CloudFormation to deploy VPCs as infrastructure as code (IaC).
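To make the idea of a VPC product more concrete, here is a minimal Python sketch of how a product’s parameters (tiers, number of Availability Zones, subnet size) could translate into a subnet layout. The function name `plan_subnets` and its parameters are illustrative assumptions; they are not taken from our CloudFormation templates.

```python
import ipaddress


def plan_subnets(vpc_cidr, tiers, az_count, subnet_prefix):
    """Assign one subnet per (tier, AZ index) pair, carved in order from the VPC CIDR.

    Raises ValueError if the VPC is too small for the requested layout.
    """
    vpc = ipaddress.ip_network(vpc_cidr)
    blocks = vpc.subnets(new_prefix=subnet_prefix)  # generator of equal-sized subnets
    plan = {}
    for tier in tiers:
        for az_index in range(az_count):
            try:
                plan[(tier, az_index)] = next(blocks)
            except StopIteration:
                raise ValueError(
                    f"{vpc_cidr} cannot hold {len(tiers) * az_count} /{subnet_prefix} subnets"
                ) from None
    return plan


# A small two-tier product across two Availability Zones (hypothetical values).
two_tier = plan_subnets("10.0.0.0/22", ["public", "private"], 2, 24)
for (tier, az_index), subnet in two_tier.items():
    print(tier, az_index, subnet)
```

A three-tier product would simply pass a third tier name and a larger VPC CIDR, which is where the choice of VPC size per use case starts to matter for address preservation.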
In order to request available CIDR blocks from IPAM for VPCs we used AWS CloudFormation Custom Resources. Custom resources enable you to write custom provisioning logic in templates that AWS CloudFormation runs anytime you create, update (if you changed the custom resource), or delete stacks.
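For readers new to custom resources: CloudFormation sends the provider a request that contains, among other fields, a `RequestType`, a `ResponseURL`, and identifiers for the stack and resource, and it waits until a JSON response is uploaded to that URL. The Python sketch below shows what building and sending such a response could look like. The helper names `build_response` and `send_response` and the `Cidr` data key are my own assumptions, not our production code.

```python
import json
import urllib.request


def build_response(request, status, data=None, reason=""):
    """Build the JSON body CloudFormation expects back from a custom resource."""
    return {
        "Status": status,  # "SUCCESS" or "FAILED"
        "Reason": reason or "See the provider logs for details",
        # Reuse the existing ID on Update/Delete; fall back to the request ID on Create.
        "PhysicalResourceId": request.get("PhysicalResourceId") or request["RequestId"],
        "StackId": request["StackId"],
        "RequestId": request["RequestId"],
        "LogicalResourceId": request["LogicalResourceId"],
        "Data": data or {},  # values the template can read with Fn::GetAtt
    }


def send_response(request, response):
    """Upload the response to the pre-signed URL with an HTTP PUT."""
    body = json.dumps(response).encode("utf-8")
    http_request = urllib.request.Request(
        request["ResponseURL"],
        data=body,
        method="PUT",
        # An empty Content-Type, as sent by AWS's cfn-response helper,
        # stops urllib from adding its form-encoding default.
        headers={"Content-Type": ""},
    )
    with urllib.request.urlopen(http_request) as http_response:
        return http_response.status


# Example request with placeholder identifiers.
example_request = {
    "RequestType": "Create",
    "ResponseURL": "https://example.invalid/presigned",
    "StackId": "stack-id-placeholder",
    "RequestId": "request-1",
    "LogicalResourceId": "VpcCidr",
}
example_response = build_response(example_request, "SUCCESS", {"Cidr": "10.0.0.0/20"})
```

If the provider never sends a response, the stack operation waits until it times out, so a real provider should report `FAILED` with a reason whenever allocation fails.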
One of the challenges was to establish secure connectivity between the AWS CloudFormation custom resource and the IPAM, which is deployed in a single location as a shared service. That connection should be cross-regional and cross-account, and it should be available for new accounts before a VPC is deployed. The account should also have a private connection to the shared services. Exposing IPAM endpoints to the internet creates potential security risks, so we found another way.
To provide a secure connection between the custom resource and the IPAM, we used the messaging service Amazon SNS and the AWS API. We associated the custom resource with Amazon SNS topics. When you associate an SNS topic with a custom resource, SNS notifications trigger the custom provisioning logic. In our case, we created a custom resource that sends messages to the SNS topics to request or release CIDR blocks. This design required an SNS topic in every region.
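As a rough illustration of the receiving side, the Python sketch below shows how a Lambda function subscribed to such a topic could unwrap the SNS envelope and decide whether to request or release a CIDR block. The `Records[].Sns.Message` structure is the standard shape of an SNS event delivered to Lambda; the `PrefixLength` property, the `Cidr` key, and the `allocate`/`release` callables are hypothetical stand-ins for calls to the IPAM.

```python
import json


def extract_requests(event):
    """Return the CloudFormation requests carried in an SNS-triggered Lambda event."""
    return [json.loads(record["Sns"]["Message"]) for record in event.get("Records", [])]


def handle(request, allocate, release):
    """Dispatch one custom resource request to the (hypothetical) IPAM callables."""
    owner = f'{request["StackId"]}/{request["LogicalResourceId"]}'
    if request["RequestType"] == "Create":
        # CloudFormation passes property values as strings, hence int().
        prefix_len = int(request["ResourceProperties"]["PrefixLength"])
        return {"Cidr": str(allocate(owner, prefix_len))}
    if request["RequestType"] == "Delete":
        release(owner)
    return {}  # Update is treated as a no-op in this sketch


# Simulated event with placeholder identifiers.
sample_event = {
    "Records": [
        {
            "Sns": {
                "Message": json.dumps(
                    {
                        "RequestType": "Create",
                        "StackId": "stack-a",
                        "LogicalResourceId": "VpcCidr",
                        "ResourceProperties": {"PrefixLength": "20"},
                    }
                )
            }
        }
    ]
}

released = []
results = [
    handle(req, allocate=lambda owner, n: f"10.0.0.0/{n}", release=released.append)
    for req in extract_requests(sample_event)
]
```

Keeping the IPAM calls behind plain callables makes the dispatch logic easy to test without any AWS access.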
In the diagram below, you can see the SNS topics that were deployed in every region in the Global Network Services account. We then added a cross-region subscription to every SNS topic to trigger a single Lambda function that is deployed in the same VPC as the Netbox IPAM solution. The Netbox IPAM application was deployed as AWS Fargate containers, which are serverless and don’t require maintaining an operating system. Netbox uses a PostgreSQL database to store persistent data such as available and occupied CIDR blocks; PostgreSQL was deployed in Amazon RDS. All Netbox IPAM resources (Fargate containers, RDS instance, ALB) were deployed in private subnets and are not accessible from the internet.
The diagram below illustrates how you can execute a solid IPAM solution as described above.
AWS IP Address Management Solution
- AWS Transit Gateway – the backbone of the AWS network
Previously we discussed how to organize and create hundreds of VPCs in different accounts and regions. The next challenge is to connect them together and configure network segmentation and network security. AWS Transit Gateway is a service that enables organizations to connect VPCs and on-premises networks to a single gateway. AWS Transit Gateway acts as a hub that controls how traffic is routed among all the connected networks, which act like spokes. This hub-and-spoke model simplifies management and reduces operational costs because each network only has to connect to AWS Transit Gateway and not to every other network. Any new VPC or subnet that is connected to AWS Transit Gateway is then automatically available to every other network connected via AWS Transit Gateway.
Transit Gateway is a Regional resource and can connect thousands of VPCs within the same AWS Region. You can create multiple Transit Gateways per Region and peer them. The diagram below illustrates how to efficiently enable cross-account and cross-region connections.
- Network Connectivity Management
So far, we have automated multi-account landing zones through Build Cloud Foundations and VPC deployment through AWS Service Catalog and AWS CloudFormation. Let’s now focus on network connectivity management. As discussed, AWS Transit Gateway acts as the hub of a hub-and-spoke model: each network only has to connect to AWS Transit Gateway and not to every other network, which simplifies management and reduces operational costs.
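The saving from the hub-and-spoke model is easy to put in numbers. As a back-of-the-envelope Python sketch, compare the connections needed when every network is linked directly to every other one (a full mesh, as with pairwise VPC peering) against one attachment per network to a single hub. The function names are my own.

```python
def full_mesh_links(networks: int) -> int:
    """Connections needed when every network links directly to every other one."""
    return networks * (networks - 1) // 2


def hub_and_spoke_links(networks: int) -> int:
    """Connections needed when every network attaches once to a central hub."""
    return networks


for count in (10, 50, 100):
    print(count, full_mesh_links(count), hub_and_spoke_links(count))
```

At 50 networks the full mesh already needs 1,225 connections compared with 50 attachments, which is why the hub becomes more attractive as accounts multiply.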
To manage this connectivity, we decided to use the Serverless Transit Network Orchestrator (STNO), an AWS-provided solution that adds automation to AWS Transit Gateway. This solution provides the tools necessary to automate the process of setting up and managing transit networks in distributed AWS environments. A web interface is created to help control, audit, and approve (transit) network changes. STNO supports both AWS Organizations and standalone AWS account types.
As part of the STNO solution’s default deployment, the Transit Gateway is set up with the following Transit Gateway route tables: Flat, Isolated, On-premises, and Infrastructure. That allows us to configure different styles of communication between VPCs and network segmentation. For example, VPCs that are associated with Flat can communicate with each other, but VPCs that are associated with Isolated cannot. In addition, it is possible to configure propagation between Transit Gateway route tables to allow communication between VPCs associated with different Transit Gateway route tables. For example, VPCs that are associated with Flat can communicate with VPCs that are associated with Infrastructure.
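The interplay of association and propagation can be hard to picture, so here is a simplified Python model of it. It assumes the usual Transit Gateway behavior: an attachment looks up routes in the one route table it is associated with, and it is reachable only from tables it propagates its routes to. The attachment names and the specific propagation choices below are illustrative assumptions, not STNO’s actual defaults.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attachment:
    name: str
    associate_with: str      # route table used for this attachment's outbound lookups
    propagate_to: frozenset  # route tables that learn this attachment's routes


def can_send(source: Attachment, destination: Attachment) -> bool:
    """Source reaches destination only if destination's routes are in source's table."""
    return source.associate_with in destination.propagate_to


def can_communicate(a: Attachment, b: Attachment) -> bool:
    """Two-way traffic needs a route in both directions."""
    return can_send(a, b) and can_send(b, a)


# Hypothetical attachments using the route table names from the article.
app_a = Attachment("app-a", "Flat", frozenset({"Flat", "Infrastructure"}))
app_b = Attachment("app-b", "Flat", frozenset({"Flat", "Infrastructure"}))
iso_a = Attachment("iso-a", "Isolated", frozenset({"Infrastructure"}))
iso_b = Attachment("iso-b", "Isolated", frozenset({"Infrastructure"}))
shared = Attachment("shared-services", "Infrastructure", frozenset({"Flat", "Isolated"}))

print(can_communicate(app_a, app_b))   # Flat to Flat
print(can_communicate(iso_a, iso_b))   # Isolated to Isolated
print(can_communicate(app_a, shared))  # Flat to Infrastructure
```

In this model the Isolated VPCs never propagate into the Isolated table, so they cannot see each other, yet they can still reach shared services through the Infrastructure table.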
With these critical network components built into a self-service solution, teams are empowered to move faster and innovate more. More importantly, you’ll have a network that scales to deliver on the cloud’s promise of greater agility, giving the business the flexibility to pivot quickly as market demands and customer needs change.
Can we help you enshrine network at scale best practices in your cloud environment? Reach out to our experienced AWS Consulting team today.