Inadequacy of Previous Generation NF Placement
The industry will need to develop new strategies and technologies to solve the NF placement challenges we just described. In the first generation of NF placement, vendors focused on ensuring that the underlying servers could support the I/O performance needed. The telco industry developed techniques like CPU pinning, huge pages, and non-uniform memory access (NUMA) alignment to improve VNF performance. Vendors utilized DPDK, VPP, and other software approaches to maximize the performance of the underlying hardware.
Capabilities like enhanced platform awareness (EPA) were developed and supported by the underlying NFV infrastructure and virtual infrastructure managers (VIMs) like OpenStack. EPA functionality was later expanded to accommodate containers managed by Kubernetes. However, the focus was simply to match VNFs to the underlying software or hardware features they required.
Nonetheless, the industry continued to make progress, and ETSI NFV1 provided affinity and anti-affinity capabilities that let the VNF manager inform the NFV orchestrator (NFV-O) of placement constraints. These affinity constraints could be used to facilitate high availability by placing backup VNFs in different zones. But again, these zones usually spanned neighboring servers, at most spread a few racks apart.
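The anti-affinity idea can be sketched in a few lines. This is an illustrative toy, not any ETSI-defined interface: the zone names and the function are invented to show the constraint that a backup instance may never land in its primary's zone.

```python
# Toy sketch of an anti-affinity placement constraint (first-generation
# scope: zones are racks or neighboring servers in one data center).
# Zone names and the function are illustrative, not from any standard.

def backup_candidate_zones(primary_zone, candidate_zones):
    """Zones eligible to host the backup: everywhere except the primary's zone."""
    return [zone for zone in candidate_zones if zone != primary_zone]

zones = ["rack-1", "rack-2", "rack-3"]
eligible = backup_candidate_zones("rack-1", zones)
print(eligible)  # the backup may go to rack-2 or rack-3, never rack-1
```

Real VIMs express the same rule declaratively (e.g., server anti-affinity groups), but the filtering effect is the same.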
On the orchestration side, the open-source orchestration project from the Linux Foundation, the Open Network Automation Platform (ONAP), was extended to provide hardware platform awareness (HPA)2. HPA policies allow ONAP to use information from the underlying NFVI to inform NF placement decisions. However, these placements are within a cluster, often stretching across servers in a rack and sometimes across racks, but almost always within the same data center. The level of scope and sophistication in this first generation does not meet what is needed by the first-responder scenario described above.
Enter the Second Generation of NF Placement
As should be clear by now, first-generation NF placement approaches apply only within a single edge or cloud data center. The second generation of NF placement, in contrast, requires dynamic orchestration that spans multiple data centers, across both private infrastructure and public clouds. This means the orchestration solution needs to be substantially more sophisticated.
Goals and Constraints
From a business standpoint, CSPs will be selling network slices that promise a certain bound on latency, guaranteed bandwidth, limited jitter, or specific reliability. For latency, for instance, we might see 20-40 millisecond bounds for video applications, sub-20 milliseconds for gaming, sub-5 milliseconds for robotic control systems, and eventually sub-1 millisecond for use cases involving sensitive control systems like edge-assisted autonomous driving or telesurgery. The reality is that many ultra-low-latency use cases are in regulated industries and will likely take a while to mature.
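One way to picture how such latency classes feed a placement decision is as a simple filter over candidate sites. The following is a minimal sketch under invented assumptions: the slice-type names, site names, and round-trip-time figures are all illustrative, and a real orchestrator would of course weigh many more factors.

```python
# Illustrative latency classes from the text, mapped to per-slice bounds
# in milliseconds. Slice names, site names, and RTTs are invented.
LATENCY_BOUNDS_MS = {
    "video": 40,
    "gaming": 20,
    "robotic-control": 5,
    "telesurgery": 1,
}

def eligible_sites(slice_type, site_rtts_ms):
    """Keep only sites whose measured round-trip time fits the slice's bound."""
    bound = LATENCY_BOUNDS_MS[slice_type]
    return {site: rtt for site, rtt in site_rtts_ms.items() if rtt <= bound}

sites = {"metro-edge": 3.5, "regional-dc": 12.0, "public-cloud": 35.0}
print(eligible_sites("gaming", sites))           # metro-edge and regional-dc qualify
print(eligible_sites("robotic-control", sites))  # only metro-edge is close enough
```

Note how the tighter the bound, the fewer sites survive the filter, which is exactly why sub-5 and sub-1 millisecond slices push NFs toward the edge.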
Nevertheless, there is keen enterprise interest in low-latency use cases. These latency goals will likely be paired with bandwidth and reliability promises. The orchestration system will also have other implicit goals, such as cost minimization. And so, the collective goals across multiple network slices will serve as input into the orchestration system, which will have a control span much more extensive than prior NFV orchestration systems. Overall availability and reliability needs may require the orchestration system to provision backup paths and additional resources in cold, warm, or even hot standby modes to ensure that resources are available when needed to meet strict availability and latency SLAs.

This is one area where telco networks differ from cloud applications and why NFV is different from enterprise or cloud virtualization. Telco networks are much less tolerant of failures, and recovery windows are much tighter than they are for SaaS-type applications. Few SaaS applications are asked to provide five-nines reliability coupled with a sub-5 or sub-1 millisecond response time. In our first-responder example above, we would likely ask for sub-20 millisecond latencies for our video feed, but just as importantly, we would want high reliability to make sure critical communications don't drop.
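A back-of-the-envelope calculation shows why provisioning standby resources matters for five-nines targets. The sketch below assumes independent replica failures (which is precisely what anti-affinity across zones tries to approximate); the per-instance availability figure is illustrative.

```python
# Availability of a service when any one of n independent replicas can
# serve traffic: 1 - (probability that all replicas are down).
# The 0.999 per-instance figure is an illustrative assumption.

def combined_availability(per_instance, replicas):
    """Probability that at least one of `replicas` independent instances is up."""
    return 1 - (1 - per_instance) ** replicas

FIVE_NINES = 0.99999

single = combined_availability(0.999, 1)  # a lone three-nines instance
pair = combined_availability(0.999, 2)    # same instance plus one hot standby

print(single >= FIVE_NINES)  # False: one instance misses the SLA
print(pair >= FIVE_NINES)    # True: an independent standby clears it
```

The caveat is the independence assumption: replicas sharing a rack, power feed, or fiber path fail together, which is why the standby must be placed in a different zone in the first place.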
Furthermore, CSPs may face constraints they have to abide by, including any isolation requirements their clients have related to corporate security or other regulatory compliance mandates. There could also be data sovereignty requirements, which stipulate where data can reside and transit and which must be honored as part of overall system optimization. This means that NFs processing sensitive data cannot be provisioned in data centers outside the zones of data sovereignty, such as the healthcare data in our earlier example. For instance, even if the closest and most cost-effective data center is just across a national border, data jurisdiction rules might prevent placing the NF there.
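Sovereignty acts as a hard constraint layered on top of the latency and cost objectives: an ineligible jurisdiction rules a site out no matter how attractive it is otherwise. A minimal sketch, with invented data center names and jurisdiction codes:

```python
# Hypothetical sovereignty filter: a data center is eligible only if its
# jurisdiction appears on the slice's allowed list, regardless of cost
# or proximity. All names and codes below are illustrative.

def sovereignty_eligible(datacenters, allowed_jurisdictions):
    """Filter a {datacenter: jurisdiction} map down to permitted sites."""
    return [dc for dc, jurisdiction in datacenters.items()
            if jurisdiction in allowed_jurisdictions]

dcs = {"dc-berlin": "DE", "dc-paris": "FR", "dc-zurich": "CH"}

# A healthcare slice restricted (in this example) to DE and FR:
# dc-zurich is excluded even if it were the closest and cheapest site.
print(sovereignty_eligible(dcs, {"DE", "FR"}))
```

In a real orchestrator this filter would run before any latency or cost optimization, so that forbidden placements are never even considered.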
1NFV-IFA 011/NFV-SOL 001 design time and NFV-IFA 007/NFV-SOL 003 run time
2HPA was made available in the ONAP Casablanca release at the end of 2018 and provides Node Feature Discovery, health monitoring, and hardware platform configuration.