Sanlam Group Position Paper
IaaS Cloud Provider
Group Technology and Information
Author(s): Tania Paulse
Date: August 2018
Sign-off
Name | Position | Date
Johan Marnewick
Francois Venter

Document History
Author | Version | Date | Status | List of Major Changes
Tania Paulse | 0.1 | 10 August 2018 | Draft | Creation
Tania Paulse | 0.2 | 15 August 2018 | Draft | Incorporation of comments from Johan Marnewick, other inputs received and corrections to content
The master for this document is held electronically and only signed copies are valid. An unsigned, printed document is not copy controlled and is to be utilised for INFORMATION PURPOSES ONLY, as it will not be automatically updated. It is therefore the responsibility of the reader to ascertain that it is a currently valid copy.
Table of Contents
1. Executive Summary
1.1. Cloud Providers IaaS Comparison Overview
1.2. The Sanlam Situation
1.3. Recommendation
2. Introduction
2.1. Purpose
2.2. Background and Context
2.3. Assumptions and Dependencies
3. Cloud Providers IaaS Comparison Overview
3.1. Infrastructure as a Service Features Comparison
3.2. Enterprise Adoption Statistics
3.3. Gartner Peer Reviews
4. Sanlam Situation
4.1. Implications of adopting Immature Cloud Computing Offerings
4.2. Sanlam Geographic Presence and its implications on Cloud Provider Selection
4.3. Current Landscape and Cloud Computing implications
5. Recommendation
6. Next Steps
7. References
7.1. Sanlam Documents
7.2. Gartner
7.3. White Papers
8. Appendix A: Gartner IaaS Provider Evaluation Criteria Explained
8.1. Feature Category Description
8.2. Required Feature Set
8.3. Preferred Feature Set
8.4. Optional Feature Set
Executive Summary

There are numerous choices for cloud infrastructure as a service (IaaS) providers today, and choosing the right service for an organization’s technical and business needs is critical. According to a recent Cloud Security Alliance (CSA) report, Amazon Web Services is the most popular public cloud infrastructure platform, comprising 41.5% of application workloads in the public cloud. While Amazon has long been viewed as the dominant provider of public cloud infrastructure, Microsoft Azure is gaining ground quickly in application workload. Azure currently holds 29.4% of the installed base, measured by application workloads. Google Cloud Platform trails with 3.0% of application workloads, followed by Rackspace with 2.8%, IBM SoftLayer with 2.6%, and a long tail of providers that comprise another 20.7% of the market.
Cloud Providers IaaS Comparison Overview
Gartner has defined a set of criteria that the organization and its partners and customers deem to be required of cloud providers for the delivery of effective and secure cloud services. The set of criteria has been divided into three categories, namely: required, preferred and optional. For each provider, this evaluation presents the key findings against the required, preferred and optional feature sets, along with strengths, weaknesses, and guidance on when to deploy and when not to deploy, all of which should be noted when selecting an IaaS cloud provider.
Furthermore, RightScale’s State of the Cloud Report findings show that AWS maintains a lead among enterprises with the highest percentage adoption and largest VM footprint of the top public cloud providers. However, Azure is showing strength by growing much more quickly on already solid adoption numbers. IBM and Google are growing strongly as well but on a smaller base of users. AWS still leads in public cloud adoption but Azure continues to grow more quickly and gains ground, especially with enterprise customers. Among enterprise cloud beginners, Azure is slightly ahead of AWS. Google maintains the third position.
The Sanlam Situation

It is imperative to highlight that the capabilities of the cloud provider solution must be equal to or better than the capabilities Sanlam Group has on premise, to ensure that Sanlam's risk level is at minimum maintained or improved (reduced risk) and thus aligned to the Sanlam Group Risk Appetite and risk management practices. It is also imperative that the cloud providers enable Sanlam Group to achieve its own strategic business objectives, as opposed to only providing technology solutions to a subset of the business. For details refer to Section 4 of this document.
Recommendation

Based on all the information within this document, including our existing technology landscape and licensing agreements, the risk-averse culture within Sanlam, and the global presence of the cloud market leaders, the recommendation from Group Enterprise Architecture is to adopt AWS as the preferred Sanlam cloud service provider for IaaS services, with Microsoft Azure considered the alternate option when AWS is not viable based on pertinent solution criteria or when pursuing a contingency provider deployment. This recommendation is also largely driven by the fact that these cloud providers have committed to establishing cloud presence within Africa, and specifically within South African borders, in the next 3 to 12 months, indicating a potential for bandwidth cost savings.
Should any cloud provider other than AWS or Azure be pursued, extra governance (related to the capability gaps and risks introduced by each) and capability requirements to mitigate the added risk will need to be considered and provisioned for as part of the initial deployment. It is not advised to pursue Oracle or equivalently immature cloud providers as discussed earlier, as the perceived benefits will be outweighed by the weaknesses introduced. Should such a provider nevertheless be insisted upon for whatever reason, it is advisable that extra funding be requested to mitigate the gaps introduced, and this aspect should be considered when the decision is taken.
Introduction

Purpose

The purpose of this document is to provide an overview of the Infrastructure as a Service (IaaS) cloud service provider market share and key differences, as well as a recommendation for Sanlam's IaaS provider selection going forward, taking into account various aspects of the Gartner Evaluation Criteria for Cloud IaaS.
Further note that this document is intended not only to guide IaaS decisions but also to be leveraged for PaaS and SaaS infrastructure-level decisions, as PaaS and SaaS solutions are generally built upon an Infrastructure as a Service offering from one of the cloud providers and are thus susceptible to the weaknesses of that provider's infrastructure offering.
Background and Context

Industry analyst firm Gartner predicts that the infrastructure-as-a-service (IaaS) market will grow by 35.9% in 2018 (only a slight decrease from 38.6% in 2017) to reach $40.8 billion by the end of the year. IaaS shows no sign of slowing down soon, and is expected to reach $83.5 billion by 2021.
The SaaS industry is expected to grow by 22.2% to reach $73.6 billion by the end of 2018. According to the same Gartner report, SaaS is forecasted to grow to a whopping $117.1 billion by 2021.
Although much smaller than SaaS and IaaS, platform-as-a-service (PaaS) is expected to grow at a formidable 26% to reach $15 billion by the end of 2018. By 2021, Gartner expects PaaS to have a total market size of $27.3 billion.
There are numerous choices for cloud infrastructure as a service (IaaS) providers today, and choosing the right service for an organization’s technical and business needs is critical. However, some industry experts argue that cloud IaaS is a commodity and that price is the only significant consideration. Gartner strongly opposes this idea and finds that service features and configurations differ greatly, even among the leading providers in the market. For example, different providers offer many kinds of security and management configurations, and often gravitate toward compatibility with certain on-premises software platforms for hybrid cloud enablement, which may not align with certain cloud consumers' needs and will thus drive those customers toward different providers. Service parity should never be assumed. Therefore, it will remain crucial for end-user organizations to identify their critical requirements and map those to the capabilities of prospective IaaS providers.
According to a recent Cloud Security Alliance (CSA) report, Amazon Web Services is the most popular public cloud infrastructure platform, comprising 41.5% of application workloads in the public cloud. While Amazon has long been viewed as the dominant provider of public cloud infrastructure, Microsoft Azure is gaining ground quickly in application workload.
Azure currently holds 29.4% of the installed base, measured by application workloads. Google Cloud Platform trails with 3.0% of application workloads followed by Rackspace with 2.8%, IBM SoftLayer with 2.6%, and a long tail of providers that comprise another 20.7% of the market. The scope of the 20.7% long tail providers’ usage is surprising, and may indicate the market is still at an early stage of maturity.
Cloud Market Share (2018)
That being said, this paper focuses on the top three market share holders for Sanlam cloud adoption, with the aim of identifying a preferred provider as well as an alternate for certain circumstances. This will allow Sanlam to focus its spend, gain better traction in the cloud adoption journey through this focused effort, and negotiate costs from a stronger position by leveraging volumes. It will also concentrate staff knowledge and skills growth on specific provider services and capabilities, empowering staff to build their skills and remain relevant within the industry.
Assumptions and Dependencies

The following assumptions are documented as part of our Cloud Computing strategy to ensure the organization is aligned on the scope and expectations of the Cloud Computing adoption journey Sanlam is embarking on.
Assumptions

ID | Assumption | Rationale
A01 | Sanlam will embark on the cloud computing adoption journey. | The organizational strategy reflects the intent to leverage cloud computing. Staff have been allocated tasks in alignment with this assumption. There are already more than 60 cloud computing solutions deployed across the Group. The Group risk reports discussed at executive level and above reflect the need to shift to cloud computing.
A02 |

Dependencies

ID | Dependency | Organization | Internal/External
D01 | Approval of Sanlam Cloud Computing Strategy | Group level | Internal
D02 |

Cloud Providers IaaS Comparison Overview

Gartner has defined a set of criteria that the organization and its partners and customers deem to be required of cloud providers for the delivery of effective and secure cloud services. The set of criteria has been divided into three categories, namely: required, preferred and optional. The top three market share holders have been evaluated against this set of criteria, with some interesting findings.
IaaS CSP Comparison based on Gartner Evaluation Criteria
Amazon Web Services meets 94% of the required criteria in Gartner’s “Evaluation Criteria for Cloud Infrastructure as a Service.” Consequently, Gartner recommends AWS for most cloud infrastructure as a service (IaaS) production deployment scenarios. AWS is the market share and feature leader in public cloud IaaS; therefore, customers should note that this score is currently the highest among players in this market. AWS meets 100% of the required criteria in the network, service offerings, and price and billing categories. However, AWS has some deficiencies in the compute, support and service levels, and management and DevOps categories.
Microsoft Azure meets 93% of the required criteria in Gartner’s “Evaluation Criteria for Cloud Infrastructure as a Service.” Thus, Gartner advises enterprises that it is safe to consider Azure for many projects, deployments, and applications. Azure is the clear No. 2 market share leader in public cloud infrastructure as a service (IaaS), on the heels of Amazon Web Services (AWS). Although Azure trails AWS in total score and in some technical offerings and configurations, many enterprise customers choose Azure for the integration with the overall Microsoft ecosystem and on-premises software. Furthermore, many customers choose Azure as a strategic, secondary provider for very specific workloads or as an alternative to AWS or other providers.
Google Cloud Platform (GCP) meets 83% of the required criteria in Gartner’s “Evaluation Criteria for Cloud Infrastructure as a Service.” Google continues to invest in GCP’s capabilities, and year over year, GCP is steadily gaining ground on Gartner’s evaluation criteria. However, organizations should proceed with caution. They must understand the shortcomings of the platform and architect for them accordingly. Although Google continues to primarily productize existing capabilities (rather than engineer those capabilities from scratch), it is also making efforts to meet more traditional enterprise use cases. Therefore, Gartner expects that Google will continue to make significant improvements to GCP in the coming years. Customers interested in GCP should follow its development closely.
Oracle Cloud Infrastructure (OCI) meets 69% of the required criteria in Gartner’s “Evaluation Criteria for Cloud Infrastructure as a Service.” OCI scores well in the network category, but falls short in all other categories. Consequently, Gartner does not advise that enterprise clients host production workloads in OCI IaaS without additional managed services. Enterprises may consider OCI for development or testing projects, or as part of a larger Oracle outsourcing arrangement that may include data center outsourcing or other managed services.
Infrastructure as a Service Features Comparison

[Figure 3: CSP radar charts depicting criteria alignment, one chart each for Amazon Web Services, Microsoft Azure, Google and Oracle]
Required Feature Set

These are the essential or must-have features (the blue surface depicted in Figure 3) needed to develop, deploy and manage a broad range of use cases, including production applications, at cloud IaaS providers (refer to Appendix A: Required Features for feature details). Thus, when Sanlam identifies a workload to be migrated and that workload has defined requirements, does any of the gaps below raise a "red flag" for that workload?
Required Feature CSP Gaps

Amazon Web Services:
- Single-instance/single-data-center availability SLA: The Amazon EC2 SLA requires that multiple AZs be unavailable for the SLA to apply. There is no single-instance or single-AZ SLA.
- Storage service availability SLA — 30 minutes: The Amazon S3 SLA offers an available uptime of at least 99.9% during any monthly billing cycle. This means that, in a month, the SLA allows for a loss of 43.2 minutes.
- Published SLAs for all generally available services: AWS does not publish SLAs for all its generally available and fee-based services (e.g., Amazon DynamoDB, AWS Direct Connect and AWS Elastic Beanstalk).
- VM-preserving data center maintenance: AWS can hot-patch critical updates to hypervisors without incurring customer downtime. However, if a pending hardware issue requires a reboot, AWS does not currently support live migration for all EC2 instances. Thus, AWS cannot perform certain maintenance tasks without disruptively impacting running VMs. However, before setting a mandated reboot deadline, AWS proactively communicates with customers so that they can reboot running instances at their convenience.

Microsoft Azure:
- Data center proximity: Historically, Azure data centres have been separated by at least 96 kilometres and, therefore, could not support synchronous replication. Although Microsoft has announced an "availability zone" architecture to address this, the service is still in preview and is currently available in only a few regions.
- Object-versioning support: Azure does not support file or object versioning. However, customers can utilize snapshot features on the entire BLOB to access older versions of files or to roll back to previous versions.
- Metrics-driven load balancing: Although Azure offers load-balancing services, the behaviour cannot be altered based on CPU or memory load readings within the compute instance itself, without developers building custom interfaces to retrieve these readings from within the instance operating system.

Google:
- Data center proximity: GCP provides at least two zones for each region it operates. However, the locations within regions are documented to have round-trip network latencies of under 5 milliseconds (ms) on the 95th percentile. These latencies are not sufficient for most synchronous replication use cases.
- Bring your own image/VM import: GCP does not provide any native mechanisms to facilitate importing VMs or images into the platform. The suggested method involves a third-party partner, and it requires the installation of third-party agents on top of the source machines.
- Network forensics: GCP does not enable customers to log metadata about network traffic that is permitted or denied by firewall services. It also does not allow customers to log traffic that is automatically blocked by the platform.
- Relational DBaaS: Google Cloud SQL is a relational DBaaS offering, but it supports only MySQL and PostgreSQL databases. It does not support any enterprise database options, such as Microsoft SQL Server or Oracle.
- Forced tagging: At provisioning time, GCP cannot automatically apply labels to resources based on conditions set by customer policies. This is an important requirement for organizations that want to ensure that their cost allocation strategy is properly implemented across all deployed resources.

Oracle:
- Compute: OCI does not currently have any data center locations online in Asia/Pacific. OCI lacks dynamic horizontal auto-scaling, and its console lacks the ability to start multiple compute instances simultaneously. OCI can hot-patch critical updates to hypervisors without incurring user downtime. However, if a pending hardware issue requires a reboot, OCI does not currently support live migration for compute instances. Thus, Oracle cannot perform certain maintenance tasks on OCI without disruptively impacting running VMs.
- Security and access: OCI lacks network forensics capabilities and a managed directory service that is compatible with Active Directory.
- Network: OCI does not offer a global load-balancing service.
- Storage: OCI supports fewer than half of the required storage features. Missing required features include bulk data import/export with encryption; cross-geography replication; expandable block storage volumes; tiered storage; snapshot copy/replication; automatic object durability; bulk object delete; logging of administrative object service requests; provider-enabled encryption services; object versioning; multipart object PUT and GET from the console; file storage service criteria; and scalable, instance-independent file storage.
- Operations management: OCI lacks nearly half of the required operations management features, including support for mobile browsers in its console; forced tagging; self-service templating; real-time performance monitoring service; real-time performance health checks, thresholds and alerts; API access to monitoring data; configuration management based on Puppet, Chef or Ansible functions as standard services; task scheduling services; historical-performance monitoring; and custom monitoring metrics.
- Software infrastructure services: OCI has only one choice for a relational database as a service (DBaaS), and it does not have a NoSQL DBaaS.
- Vendor management and support: Oracle does not make a formal commitment to provide notice of changes to OCI SLAs. OCI does not meet the required criteria for single-instance/single-data-center availability SLA; storage service availability SLA of 30 minutes; unlimited service credits/refunds; and a notification window of at least two billing cycles for customers to submit an SLA miss.
- Compliance and documentation: Oracle does not provide documents with guidance for users to ensure that individual applications or deployments in OCI adhere to specific compliance certifications.
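The availability percentages quoted in the SLA gaps above translate directly into permitted downtime per billing cycle. A minimal sketch (a 30-day billing month is assumed) that reproduces the 43.2-minute figure cited for the 99.9% Amazon S3 SLA:

```python
def allowed_downtime_minutes(sla_percent: float, days: int = 30) -> float:
    """Minutes of downtime an SLA permits over a billing cycle of `days` days."""
    total_minutes = days * 24 * 60  # 43,200 minutes in a 30-day month
    return total_minutes * (1 - sla_percent / 100)

print(allowed_downtime_minutes(99.9))   # ~43.2 minutes, the S3 figure cited above
print(allowed_downtime_minutes(99.99))  # ~4.3 minutes at "four nines"
```

The same arithmetic shows why the absence of a single-instance SLA matters: each additional "nine" of availability cuts the permitted downtime by a factor of ten.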
Preferred Feature Set

These are the supplementary features (the amber surface depicted in Figure 3 above) that are not necessary to satisfy the minimum requirements of the typical large enterprise, but are frequently desired to address specifically identified needs such as larger scale, better management and improved availability (refer to Appendix A: Preferred Features for feature details). Thus, when Sanlam identifies a workload to be migrated and that workload has defined requirements, does any of the gaps below raise a "red flag" for that workload?
Preferred Feature CSP Gaps

Amazon Web Services:
- VM console access — basic access: AWS does not support console-level access to VMs.
- Compute performance baseline: AWS does not publish or maintain a compute performance baseline.
- Backup service: AWS does not currently have a backup-as-a-service offering. AWS does offer automated backups for certain services like ElastiCache, RDS and Redshift. However, AWS does not offer an EC2 backup service that allows customers to automatically back up and restore different volumes. Furthermore, AWS does not offer a backup service for any of the storage tiers. It does offer snapshot capabilities, versioning, and replication to different regions.
- Tiered firewall functionality: AWS does not support firewall policy hierarchies. However, customers can assign up to five security groups per instance.

Microsoft Azure:
- VM console access — basic access: Azure does not support console-level access to VMs. Customers must use Remote Desktop Protocol (RDP) for Windows and Secure Shell (SSH) for Linux. However, Azure does offer boot diagnostics with one-way output of the OS boot procedures for debugging.
- Single-tenant compute VMs: Azure does not offer an explicit single-tenant compute VM service for customers that may be concerned about hypervisor-level vulnerabilities or attacks.
- Object life cycle management policy: Azure does not provide an automated object storage life cycle system whereby actions can be automatically performed on objects based on various time conditions.
- Granular assignment of support tiers: Azure support is tiered, but a support tier must be chosen for the entire account and cannot be selected for portions of deployments or for specific services.
- Regions and zones architectural transparency: Microsoft publishes neither the locations of its data centres nor the expected availability of its power and cooling infrastructure or external network connectivity.

Google:
- Single-tenant compute VMs: GCE does not offer a model that allows customers to consume physical compute nodes on a single-tenant basis; customers with compliance or security concerns sometimes prefer this capability.
- Provider-offered Linux distribution: Google does not offer a distribution of Linux that is specifically optimized (and continually updated) for use on GCE. However, Google claims that it continues to work with standard Linux distributions, such as Ubuntu, to ensure optimal support for GCE.
- Backup service: Google does not offer traditional file-based backup services, which most enterprises are accustomed to using. It offers only block-volume snapshots to protect data.
- Cloud storage gateway (CSG): Google does not provide and support an on-premises CSG.
- Web application firewall (WAF): Google does not provide a Layer 7 WAF capability as a service.
- Data reliability SLA: GCP does not provide a data reliability guarantee in the Cloud Storage SLA. Although Google states that it has designed Cloud Storage to achieve 99.999999999% durability, it does not contractually commit to this percentage.
- Spending/allocation quotas: Although GCP provides resource quotas for nearly every resource, these quotas define the maximum number of resources that can be created within a certain project. Customers need mechanisms to define and enforce spending limits against a predefined budget. Currently, GCP does not provide such mechanisms.

Oracle:
- Compute: OCI lacks support for explicit host affinity, hot-swappable virtual hardware, dynamic vertical auto-scaling, the ability to set restart priority, automatic host anti-affinity, basic access to a VM console, single-tenant compute VMs, sub-minute provisioning times, backup service, the ability to run stand-alone container instances, and Container Linux.
- Storage: OCI lacks multiple-instance mounts of block storage, automatic snapshot management, the ability to define an object life cycle management policy, and snapshots and cross-geography replication for the OCI file storage service.
- Network: OCI does not offer network address translation (NAT) gateway functions as a standard service, content-routing load balancing, the ability to purchase an explicit network performance tier, LAN traffic encryption, real-time network performance visibility or multi-region virtual networks.
- Security and access: OCI does not offer tiered firewall functionality; API support for federated authentication; or services for web application firewalls (WAFs), security information and event management (SIEM) integration, patch management, or compute instance vulnerability scanning.
- Software infrastructure services: OCI supports none of the preferred software infrastructure services. This includes content delivery network (CDN); in-memory caching; more than one relational DBaaS with redundancy; database transfer via import/export; asynchronous messaging service; event stream processing service; integrated development environment (IDE) integration; and function as a service (FaaS).
- Operations management: OCI does not provide alert notification via URL, multi-data-center templating, a community image catalogue, a professional developer program, a pricing API, spending/allocation quotas, a cost optimization engine or forecasting, predictive billing alerts, scheduled compute instance suspension, threat monitoring services, automated OS upgrades, an API gateway service, or a personalized health dashboard.
- Vendor management and support: OCI lacks live support in native languages for each of its hosting locations, granular assignment of support tiers, a storage service availability SLA of five minutes, and a data durability SLA of at least 99.99%.
- Compliance and documentation: Oracle does not provide sufficient architectural details on transparency of regions and zones for OCI, and it has published Cloud Security Alliance (CSA) STAR documentation for only a subset of the OCI platform. OCI also lacks a cloud security guideline matrix.
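A recurring gap above is object life cycle management: the ability to apply actions to stored objects automatically based on time conditions. The following hypothetical sketch (rule thresholds and action names are illustrative, not any provider's actual API) shows the kind of policy evaluation such a service automates:

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class LifecycleRule:
    after_days: int  # object age, in days, at which this rule triggers
    action: str      # illustrative action, e.g. move to a cooler tier, or delete

def lifecycle_action(age_days: int, rules: List[LifecycleRule]) -> Optional[str]:
    """Return the action of the latest-triggering rule for an object's age, if any."""
    triggered = [r for r in rules if age_days >= r.after_days]
    return max(triggered, key=lambda r: r.after_days).action if triggered else None

# Hypothetical policy: cool down after 30 days, delete after a year.
rules = [LifecycleRule(30, "transition-to-cool-tier"),
         LifecycleRule(365, "delete")]
print(lifecycle_action(10, rules))   # None: too young for any rule
print(lifecycle_action(90, rules))   # transition-to-cool-tier
print(lifecycle_action(400, rules))  # delete
```

Where a provider lacks this capability, customers must run equivalent scheduled logic themselves, which is the operational overhead the preferred criterion is meant to avoid.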
Optional Feature Set

These are the requirements-driven features (the green surface depicted in Figure 3) necessary for specifically identified deployment scenarios, but not needed in all deployments (refer to Appendix A: Optional Features for feature details). Thus, when Sanlam identifies a workload to be migrated and that workload has defined requirements, does any of the gaps below raise a "red flag" for that workload?
Optional Feature CSP Gaps

Amazon Web Services:
- Approval workflow: AWS does not have approval workflow services built into its management console or API to control the deployment of AWS assets like EC2 instances or EBS volumes.
- Bare-metal provisioning: AWS offers this service in preview only.
- Export VM image: AWS offers VM export capabilities via the Amazon EC2 VM Import/Export service. However, the service exports only instances that were previously imported by a customer via the VM import task. VM Import/Export cannot export AMIs that were originally deployed from the AWS catalogue.
- Single-tenant storage service: AWS does not offer a single-tenant storage service.
- Compute instance leases: AWS does not offer lease periods on its services.

Microsoft Azure:
- Approval workflow: Azure does not offer an approval workflow service to control deployments of assets such as VMs or networks.
- Published data center energy: Azure does not break down energy sources and metrics per individual data centre. However, it uses 100% renewable electricity for its data centres, offices, labs, and manufacturing plants. Microsoft achieved carbon neutrality in 2014 and achieved a goal that all new data centres have an average power usage effectiveness (PUE) of 1.125.
- Variable/auction-priced tier offering: Azure has pay-as-you-go plans, but currently doesn't offer auction-type purchasing.

Google:
- Compute instance leases: GCP does not offer lease periods on its services.
- Bare-metal provisioning: GCE does not offer bare-metal compute nodes for customers who may want isolation or higher performance. ("Bare metal" refers to compute nodes with an operating system installed directly onto the hardware.)
- Export VM image: Google does not enable customers to export an instance into a standardized image format — such as Virtual Machine Disk (VMDK), Open Virtualization Format (OVF) or virtual hard disk (VHD) — for use outside of GCE.
- Dedicated HSM: GCP does not offer dedicated HSM support.
- Variable/auction-priced tier offering: GCP offers pre-emptible VMs, which can be up to 80% less expensive than standard instances. However, the pre-emptible-VM price is set by Google; it is not based on a variable or auction-type mechanism determined by market demand.

Oracle:
- Compute: OCI does not currently allow users to control oversubscription, and it does not offer an ML-optimized compute function.
- Storage: OCI does not offer a single-tenant storage service, static web hosting support via its object storage, or internet-accessible file storage shares.
- Network: OCI lacks Internet Protocol version 6 (IPv6) support, private customer connectivity as an integrated service, and WAN optimization for traffic between OCI data centres.
- Security: OCI does not offer functions for managing approval workflow, single-tenant hardware security modules (HSMs), or support for adaptive authorization based on time and location.
- Software infrastructure services: OCI does not offer Apache Hadoop as a service, cross-region failover for more than relational DBaaS, continuous deployment (CD) as a service, a distributed logging service, log data masking, or a source control service.
- Operations management: OCI lacks a mobile application for its management console and support for a mobile SDK, a GUI for network design/inventory mapping or network architecture, support for multi-cloud libraries, a serverless compute function and support for serverless auto-scaling, the ability to clone an environment, compute instance leases, and automated application stack upgrades.
- Vendor management and support: OCI does not offer a compute service or storage service availability SLA of three minutes, the ability to access SLAs in programmatically readable formats, or a variable/auction-priced tier offering.
- Compliance and documentation: OCI does not meet the optional criteria for published data center energy efficiency metrics, nor does it offer compliance with FedRAMP Moderate and High, U.S. DoD Provisional Authorization (PA), or Criminal Justice Information Services Division (CJIS). Oracle also does not publish annual reports regarding law enforcement metrics for OCI.
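The "up to 80% less expensive" figure quoted above for pre-emptible VMs can be made concrete with simple cost arithmetic. The hourly rate below is a hypothetical placeholder, not an actual GCP price:

```python
def monthly_cost(hourly_rate: float, hours: float = 730.0) -> float:
    """Cost of one instance running for a month (approximately 730 hours)."""
    return hourly_rate * hours

on_demand = 0.10               # hypothetical standard rate, $/hour
preemptible = on_demand * 0.2  # "up to 80% less expensive", per the note above

print(round(monthly_cost(on_demand), 2))    # 73.0
print(round(monthly_cost(preemptible), 2))  # 14.6
```

The trade-off, of course, is that pre-emptible instances can be reclaimed by the provider, so the saving applies only to workloads that tolerate interruption.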
During the evaluation, a few key features of each CSP were flagged as extraordinary strengths. These have been highlighted below for consideration in the CSP selection process, keeping in mind the Cloud Computing Strategy and, more specifically, the workload to be migrated.
Clear strengths of each CSP
Amazon Web Services Microsoft Azure Google Oracle
Network offerings and configurations: Through its Amazon Virtual Private Cloud (Amazon VPC) and AWS Direct Connect products, AWS offers customers control and flexibility over how they define network topology and private network connections. It also includes robust load-balancing options for both external- and internal-facing applications. AWS has expanded the feature set of its AWS Elastic Load Balancing (ELB) service to include session affinity and metrics-driven load balancing.
On-premises offering: Microsoft recently announced the general availability of “Azure Stack,” an integrated hardware and software offering launched in conjunction with hardware partners, which can provide a subset of public Azure services for customers within their own data centres.
Network offerings and configurations: GCP offers customers flexibility over how they define network topology. It also enables customers to create an inter-region private network that connects all their instances across different Google cloud regions via Google’s internal high-performance global network. GCP offers robust load-balancing options for both external-facing applications and internal application back ends, with metrics-driven load balancing built into the service.
Network: OCI scored well on networking criteria in the required category, benefiting from a high-performance architecture based on hardware-based network virtualization, which results in outstanding overall network and input/output (I/O) performance with minimal oversubscription and I/O jitter. Almost all OCI virtual machine (VM) shapes offer dedicated network bandwidth per instance. Oracle also maintains a high-performance backbone interconnecting its regions, which yields strong WAN performance.
Security and access: AWS provides solid security foundations, including:
Customer-controlled firewalls/access control lists (ACLs; security groups)
Comprehensive compliance certifications and reports
Encrypted data stores
Encryption key management (AWS Key Management Service, or AWS KMS)
Network traffic logging (VPC Flow Logs)
Secure Sockets Layer (SSL)-secured endpoints
Broad role-based authorization controls for all services, using AWS Identity and Access Management (AWS IAM)
Identity and access management: Many organizations are deeply invested in Active Directory (AD) for user and account management, as well as for governance and policy definition. Microsoft Azure has several industry-leading advantages for integration, synchronization and conveyance of AD domains or forests into Azure for more seamless deployments. In addition, many organizations have already adopted Azure AD for Office 365 deployments. This will make it easier to start bringing Azure capacity online, given that the complex task of identity integration has often already been completed.
Security and access management: GCP has good implementations of security documentation, a variety of compliance certifications, customer-controlled firewalls, identity and access management services and integrations, and Secure Sockets Layer (SSL)-secured endpoints.
Compute: OCI’s bare-metal implementation provides users with flexible options for virtualization and deploying systems such as Exadata in IaaS. OCI allows users to install their own hypervisors, import a custom VM image in the QEMU Copy on Write 2 (QCOW2) or Virtual Machine Disk (VMDK) formats, and then use the custom image to launch VM instances in Emulation Mode. VM images can also be exported to run on-premises, if necessary. Oracle maintains its own Linux distribution called Oracle Linux, which can be deployed for free on Oracle Cloud Infrastructure and on-premises.
Global geographic footprint: AWS provides cloud services in 18 regions worldwide, with a total of 53 AZs. AWS currently has regions in the following locations:
Seven regions in Asia/Pacific (Australia, China, India, Japan, Singapore, and South Korea)
Four regions in Europe (England, France, Germany, and Ireland)
Six regions in North America (Canada and the U.S.) including one AWS GovCloud in the U.S.
One region in South America (Brazil)
In addition, AWS has announced four upcoming regions with 12 new AZs in Bahrain, Hong Kong, Sweden (Stockholm), and a second AWS GovCloud (U.S.).
Global geographic footprint: Azure has one of the broadest geographic footprints among the major cloud IaaS providers. It provides services in:
North America: United States and Canada
South America: Brazil
Europe: London, Ireland, Netherlands, Frankfurt, Magdeburg, and Cardiff
Asia/Pacific: Singapore, Hong Kong, Shanghai, Beijing, Pune, Mumbai, Chennai, Tokyo, Saitama, Osaka, Seoul, and Busan
In addition, Azure has declared new upcoming regions:
France: Paris and Marseille
South Africa: Cape Town and Johannesburg
Australia: Canberra
Global geographic footprint: Google provides cloud services from:
Five regions in Asia/Pacific (Mumbai, Singapore, Sydney, Taiwan, and Tokyo)
One region in Canada (Montreal)
Four regions in Europe (Belgium, Frankfurt, London, and the Netherlands)
One region in South America (Sao Paulo)
Four regions in the U.S. (Iowa, Northern Virginia, Oregon, and South Carolina)
Each of these regions contains at least two availability zones. Google has also announced upcoming regions in Finland, Hong Kong, Los Angeles, and Osaka (Japan).
“Up the stack” features: AWS has built several value-added services outside the core of IaaS, including:
Relational database (Amazon Relational Database Service, or Amazon RDS)
NoSQL database (Amazon DynamoDB)
Real-time data streaming and processing (Amazon Kinesis)
Content delivery network (CDN; Amazon CloudFront)
DNS (Amazon Route 53)
In-memory caching (Amazon ElastiCache)
Data warehouse (Amazon Redshift)
Desktop as a service (Amazon WorkSpaces)
“Up the stack” features: Azure has built several value-added services outside the core of IaaS. These include services such as:
Proactive cloud-based analytics (Azure Machine Learning)
Composition and orchestration of data services at scale (Azure Data Factory)
Advanced key-value cache and store (Azure Redis Cache)
Building, deploying and managing web and mobile apps (Azure App Service)
Apache Hadoop as IaaS on Azure (HDInsight)
Windows apps as a service (Azure RemoteApp)
Block storage: GCP offers some differentiating block storage configurations when compared with competitors like Amazon Web Services (AWS) and Microsoft Azure. For example, GCP’s maximum block storage volume size is 64 terabytes (TB). Google also allows the maximum input/output operations per second (IOPS) to be reached with a single persistent disk, removing the need for customers to define and manage striping of multiple disks. In addition, Google enables customers to share volumes across different compute instances, with one instance in read/write mode and all others in read-only mode.
Availability options: AWS has multiple Availability Zones (AZs) within its regions. These AZs are effectively multiple data centres near one another. AWS’s architecture is designed to make it easier to run applications across multiple AZs. Customers are responsible for architecting their applications for high availability.
Premier support: Many organizations are Microsoft Premier Support customers. Microsoft extends Premier Support into Azure services. For customers needing assistance with cloud services all the way down into Microsoft software and Windows operating systems, Premier Support can offer comprehensive assistance.
Live instance migration: Google Compute Engine (GCE) automatically live-migrates customer workloads away from maintenance events so that customers’ applications continue to operate during any scheduled maintenance. Although instances may still experience a short period of decreased performance, live migration is ideal for workloads that require constant uptime.
Management controls and DevOps enablement: In addition to providing customizable monitoring and alerting through Amazon CloudWatch, AWS provides the following:
Comprehensive offerings at the management console and API layers
Custom metadata tagging of resources
Detailed API logging through AWS CloudTrail
A full continuous-delivery toolchain (AWS CloudFormation, AWS CodeBuild, AWS CodeCommit, AWS CodeDeploy, AWS CodePipeline and AWS CodeStar)
Azure is also well-integrated into Microsoft System Center and Microsoft Operations Management Suite (OMS), enabling management of on-premises and cloud resources using a common tool. In addition, because Azure runs on top of Hyper-V using a standard virtual hard disk (VHD) format, existing Hyper-V customers can expect easier processes for importing and exporting Hyper-V VMs.
Large-scale capacity and scalability offerings: AWS focuses on building and delivering all services at large scale. Its AWS Auto Scaling and ELB services enable customers to automatically deploy and scale AWS building blocks. Furthermore, AWS operates the largest public cloud IaaS service and allows customers to request very large amounts of capacity. Finally, its high-performance computing (HPC) offerings are industry-leading.
Hybrid cloud capabilities: Azure offers comprehensive hybrid cloud networking options through the ExpressRoute service. It offers hybrid management capabilities through the Microsoft System Center suite of tools. It offers services like Azure Site Recovery for disaster recovery as a service. And it offers the aforementioned Azure Stack private cloud.
Flexible VM instance size: GCE offers Custom Machine Types. Instead of forcing customers to choose from a list of instance types with only predefined amounts of CPU and RAM, Custom Machine Types allows customers to choose their preferred combination of CPU and RAM. This enables customers to avoid overbuying CPU or RAM that they may not need, making Google’s compute offering particularly effective at matching appropriate instance sizes to actual workload demand.
Financial management, analysis, and billing flexibility: AWS offers the industry’s most robust set of provider-offered financial management options, including:
AWS Simple Monthly Calculator
AWS Total Cost of Ownership (TCO) Calculator
AWS Cost Explorer
AWS Trusted Advisor
During the evaluation, a few key features of each CSP were flagged as extraordinary weaknesses that could adversely impact an organization’s cloud adoption journey. These have been highlighted below for consideration in the CSP selection process, keeping in mind the Cloud Computing Strategy and, more specifically, the workload to be migrated.
Clear weaknesses of each CSP
Amazon Web Services Microsoft Azure Google Oracle
Service levels: AWS meets 82% of Gartner’s required criteria for support and service levels. Although AWS offers considerable flexibility through its support plans, it exhibits the following deficiencies regarding SLAs:
AWS is missing a single-instance availability SLA.
For its Amazon Simple Storage Service (S3) storage service, AWS does not offer an availability SLA with a maximum allowed outage time of less than or equal to 30 minutes in a month.
AWS does not publish SLAs for all generally available services.
Limited support for non-Microsoft technology: Although Microsoft has made investments to support non-Microsoft technologies, it still has work to do to close these gaps. Azure’s relational database as a service (DBaaS) only supports Microsoft SQL Server in general availability at the time of this publication, although MySQL and PostgreSQL engines are currently in preview.
Support-and-service-level gaps: GCP’s policies on SLA management do not meet Gartner’s criteria in multiple cases:
The notification window for customers to report a missed SLA is less than two billing cycles.
GCP is missing a single-instance SLA.
For all GCP services, Google caps service credits for SLA misses at 50% of the monthly bill.
Automation and notification: Overall, OCI has limited support for automation and notification. OCI does not have any native mechanisms to schedule automated activities, and it does not provide general monitoring functions that can generate alerts.
Dynamic vertical auto scaling: AWS does not support dynamically resizing an instance’s resources (such as CPU and memory) when that instance comes under heavy load. AWS Auto Scaling can dynamically provision new instances into the auto scaling group when the load increases and can dynamically deprovision instances when the load decreases. However, auto scaling cannot increase or decrease (or add or remove) the resources of an existing instance. To accomplish this, customers must first power off the instance, migrate to a new instance size, and then power the instance back on.
Metrics-driven load balancing: Azure cannot load-balance traffic using metrics other than whether a TCP ACK or Hypertext Transfer Protocol (HTTP) 200-level response code is received. This requires application developers to build a pollable interface to retrieve metrics such as CPU or memory utilization, instead of Azure retrieving them directly from the hypervisor.
Reduced architectural transparency: GCP does not publish the architecture of the data centres, the data center locations, the distance between zones, or the latency guarantees that would allow for synchronous replication. Although Google is notably strong in network performance, this reduced transparency limits customers’ ability to properly design resilient applications.
Auto-scaling: OCI does not currently support horizontal or vertical auto-scaling. Thus, compute resources must be manually reconfigured in response to changing workloads.
VM-preserving data center maintenance: Although AWS can hot-patch critical updates to hypervisors without incurring customer downtime, it does not currently support live migration for all EC2 instances when a pending hardware issue requires a reboot. Thus, AWS cannot perform certain maintenance tasks without disruptively impacting running virtual machines (VMs). Before setting a mandated reboot deadline, AWS proactively communicates with customers to give them the opportunity to reboot running instances at their convenience. However, this criterion requires the provider to perform all maintenance without disruptively impacting running VMs.
No object life cycle management policy: Azure does not provide an automated object storage life cycle system whereby actions can be automatically performed on objects based on various time conditions.
Lack of relational database as a service (DBaaS) support for enterprise databases: Although GCP is actively innovating in this space with the launch of Cloud Spanner, its Cloud SQL managed database service does not support any enterprise database engines, such as Microsoft SQL Server or Oracle. This requirement is particularly important to support more traditional enterprise use cases.
VM-preserving maintenance: OCI does not currently support live migration of VMs or automatic host failure recovery. Thus, users may experience downtime due to planned or unplanned maintenance of VM hosts.
Backup service: AWS does not currently have a backup-as-a-service offering. AWS does offer automated backups for certain services like ElastiCache, RDS and Redshift. However, AWS does not offer an EC2 backup service that allows customers to automatically back up and restore different volumes. Furthermore, AWS does not offer a backup service for any of the storage tiers, although it does offer snapshot capabilities, versioning, and replication to different regions.
No object versioning support: Azure does not support file or object versioning. However, customers can utilize snapshot features on the entire binary large object (BLOB) to access older versions of files or to roll back to previous versions.
Reduced object storage service transparency: Although Google states that it designed all supported storage classes for a durability target of 99.999999999%, Google does not publish the replication strategy of the Cloud Storage service. Google employs erasure coding for storing data redundantly, but customers cannot know how many object replicas the service maintains to ensure data durability. The only exception is the Multi-Regional Storage class, where Google states that data is stored “in at least two regions separated by at least 100 miles.” Furthermore, the scope for a multiregional bucket can be “Asia/Pacific,” “EU” or “U.S.” This scope does not give customers sufficiently granular control over object location, and it cannot ensure that multiregional objects do not cross country borders (apart from “U.S.”).
Software infrastructure services: OCI supports none of the preferred capabilities, and only two of the optional capabilities, in this category. Many of the software infrastructure functions that Gartner tracks can be obtained through separate cloud services offered by Oracle. However, these are not yet part of OCI and must be purchased and accessed separately.
No single-tenant compute VMs or bare-metal options: Although most workloads will technically operate just fine in a multitenant environment, in some cases — especially for compliance-sensitive workloads — customers would prefer a single-tenant compute option to choose when needed, to guarantee that compute nodes are not shared by any other customers (and thus would not be subject to any theoretical hypervisor cross-guest vulnerability).
Lack of file storage support: GCP does not currently offer a file-based storage service.
Operations management: OCI has very limited support for preferred operations management capabilities, and no support for optional capabilities in this category. This reflects the relative immaturity of the platform.
Geographic coverage: OCI currently has data center locations online in only four regions: two in the United States, and two in Europe.
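Several of the SLA gaps listed above come down to simple arithmetic: a maximum allowed outage per month translates into an availability percentage, and a service-credit cap bounds what a missed SLA can return to the customer. The sketch below is illustrative only; the function names and the example bill are assumptions, not taken from any provider's documentation.

```python
def availability_pct(allowed_outage_minutes: float, days_in_month: int = 30) -> float:
    """Convert a maximum allowed monthly outage into an availability percentage."""
    total_minutes = days_in_month * 24 * 60
    return 100.0 * (1 - allowed_outage_minutes / total_minutes)

def capped_credit(monthly_bill: float, credit_pct: float, cap_pct: float = 50.0) -> float:
    """Apply a provider's service-credit percentage, capped at cap_pct of the bill."""
    return monthly_bill * min(credit_pct, cap_pct) / 100.0

# A 30-minute monthly outage allowance (the S3 criterion above) is roughly 99.93% availability:
print(round(availability_pct(30), 2))   # ~99.93
# A 60% credit on a hypothetical $10,000 bill, capped at 50% (as with GCP's policy above):
print(capped_credit(10_000, 60))        # 5000.0
```

This illustrates why a credit cap matters: beyond the cap, additional downtime carries no further financial remedy from the provider.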
When to Deploy
With so many features to weigh, it can be difficult to make a clear decision that will benefit Sanlam. Keeping this in mind, Gartner has also made recommendations on when, and under what conditions, it is appropriate to consider a particular CSP as a cloud adoption partner.
When to pursue and deploy services from these CSPs
Amazon Web Services Microsoft Azure Google Oracle
The organization desires to design and deploy cloud-native applications with the highest degree of control, orchestration, automation, security, and scalability in the market.
The organization has developers who use Microsoft developer tools or middleware and who will benefit from the combination of IaaS and PaaS capabilities in a single platform.
The organization prioritizes running on an infrastructure managed by Google’s internal technology capability over the currently missing criteria. The organization is willing to learn, and commit to, Google cloud architecture.
The organization primarily requires IaaS for hosting workloads that require bare-metal servers to be provisioned within minutes. These workloads could include Oracle databases, scale-up Oracle applications (such as the Oracle e-Business Suite), Exadata and other software that performs best on bare-metal servers with large amounts of RAM.
The organization is interested in a global footprint with hosting locations in:
Asia/Pacific (Australia, China, India, Japan, Singapore, and South Korea)
Europe (England, France, Germany, and Ireland)
North America (Canada and the U.S.)
South America (Brazil)
AWS has announced four upcoming regions with 12 new AZs in Bahrain, Hong Kong, Sweden (Stockholm), and a second AWS GovCloud (U.S.).
The organization is interested in a global footprint with cloud locations in any of the following locations:
North America: United States and Canada
Europe: London, Ireland, Netherlands, Frankfurt, Magdeburg, and Cardiff
Asia/Pacific: Singapore, Hong Kong, Shanghai, Beijing, Pune, Mumbai, Chennai, Tokyo, Saitama, Osaka, Seoul, and Busan
South America: Brazil
Future locations in announced upcoming regions:
United States: Government-cloud regions in Arizona and Texas
France: Paris and Marseille
South Africa: Cape Town and Johannesburg
The organization prioritizes access to cutting-edge technology and hardware, such as Intel Skylake or tensor processing units (TPUs), over support for any traditional enterprise workload that requires a missing GCP capability.
The organization is specifically concerned with high-performance networking, and will seek out that capability at the expense of many other features.
The organization is interested in the broadest ecosystem of partners and marketplace vendor offerings.
The organization desires hybrid cloud capabilities delivered through the Microsoft ecosystem and through products such as Microsoft System Center, OMS, Hyper-V, .NET, Azure ExpressRoute, and Azure Stack.
The organization is looking for a provider that offers all the cloud service models — that is, software as a service (SaaS), platform as a service (PaaS) and IaaS. G Suite, Google App Engine, and Google Compute Engine are examples of Google offerings in each cloud layer. Similarly, organizations that have already migrated to G Suite may find enough integration benefits, such as the shared identity management service, to prioritize consideration of GCP.
The organization is interested in infrastructure that is on-demand, scalable and elastic, but also wants managed services coupled with that infrastructure from a single vendor.
The organization is looking to employ the market share leader.
The organization is looking to employ a competitor to or contingency plan for AWS and is willing to work around any feature disparity that may exist in comparison to AWS.
The organization is looking to employ a competitor to AWS or Microsoft Azure that covers features missing from the other provider’s cloud model.
The organization has existing relationships with Oracle and would select OCI as complementary to other services or products from Oracle.
The organization prioritizes maturity and proven stability of services. AWS has demonstrated, over many years, that it can operate its infrastructure at scale while adhering to published SLAs.
The organization values and prioritizes the relationship with Microsoft and the upcoming roadmap over any currently missing criteria.
The organization wants sustained-use discounts in addition to other cost model discounts like commitment or prepayment. To avoid short-term costs and capacity-planning costs, some organizations are seeking more flexible volume-discounting models like those offered by GCP.
The organization needs flexible provisioning options on bare-metal servers to meet requirements for performance, regulatory compliance, or software licensing.
The organization has a use case for VMware Cloud on AWS. Organizations that want to maintain their VMware infrastructure but gain proximity to AWS will find that the VMware Cloud on AWS managed service offers an attractive solution for “lift and shift” workloads.
The organization is deeply tied to Microsoft software and the Microsoft ecosystem for infrastructure hosting options or management. Furthermore, the organization has deep expertise in Windows Server, PowerShell, AD, SQL Server, System Center, and SharePoint and is looking for a public IaaS environment in which to run these technologies.
The organization is in the retail vertical and is hesitant to use Amazon’s platform (AWS) for computing needs, due to competitive concerns.
The organization received free Azure hours as part of a Microsoft Enterprise Agreement (EA) or incentive program and is interested in trying the service out.
When NOT to Deploy
With so many features to weigh, it can be difficult to make a clear decision that will benefit Sanlam. Keeping this in mind, Gartner has also made recommendations on when, and under what conditions, it is appropriate NOT to consider a particular CSP as a cloud adoption partner.
When NOT to pursue and deploy services from these CSPs
Amazon Web Services Microsoft Azure Google Oracle
The organization is looking for a provider that can also offer on-premises private cloud services.
The organization has existing in-house expertise with another cloud provider that meets its requirements and does not feel a need currently to spread workloads out to mitigate vendor risk.
The organization is looking for an IaaS provider with highly customized contractual protections that are more typical of a data center outsourcing provider.
The organization is looking for a mature cloud IaaS provider with broad geographic coverage and a rich set of functionality.
The organization requires hosting options outside of the locations covered by AWS.
The organization is interested in cloud-hosting locations outside the Azure hosting locations mentioned in this research or depicted on the Azure website.
The organization is interested in cloud-hosting locations outside the Google hosting locations (for example, Africa or China).
The organization needs now, or is likely to need in the future, capabilities that are more commonly available from other, more established hyperscale providers, but which OCI does not yet provide.
The organization demands robust SLAs or highly customized contractual protections that are more typical of data center outsourcing contracts.
The organization is strategically looking at public cloud services to control/limit large vendor relationships such as the one with Microsoft that already exists within the organization.
The organization needs key features that GCP lacks or has not yet released into general availability.
The organization requires strong integration between basic IaaS functions and higher-level software infrastructure and operations management services, including the ability for the IaaS platform to automate operations and issue alerts.
The organization is not interested in committing to the AWS architecture.
The organization cannot tolerate key missing features that Azure does not yet deliver.
The organization is committed to a specific hypervisor or hardware set (for example, Microsoft or VMware) that allows management compatibility between infrastructures to ensure virtual machine (VM) image compatibility and policy enforcement.
The organization does not want to use the AWS API, or any third-party tools/libraries that use the AWS API, even if the tools or libraries can abstractly support multiple cloud providers.
The organization prioritizes a more robust third-party ecosystem that can be found with alternative providers.
The organization prefers a shared-resource-pool model for resource allocation and billing, instead of a pay-per-use financial model.
Enterprise Adoption Statistics
According to Forrester’s Q2 2018 Wave reports on both hybrid cloud management platforms and cloud cost management and optimization, RightScale’s tools have been flagged as the leader among the current offerings on the market.
Hybrid Cloud Management Software Leaders versus Cloud Cost Management and Optimization Software Leaders from Forrester, Q2 2018
Thus, RightScale’s State of the Cloud Report findings form the basis of the cloud adoption statistics portion of this paper. Figure 5 below depicts cloud adoption by enterprises for the previous and current year.
In 2018, AWS continues to lead in public cloud adoption, but other public clouds are growing more quickly; Azure in particular is catching up to AWS, especially in larger companies. In 2018, 64 percent of respondents run applications in AWS, up from 57 percent in 2017, a 12 percent growth rate. Overall Azure adoption grew more quickly, from 34 to 45 percent (a 32 percent growth rate), closing the gap with AWS. Azure adoption has now reached 70 percent of AWS adoption, up from 60 percent last year. Google maintained its third-place position, growing from 15 to 18 percent adoption (a 20 percent growth rate). VMware Cloud on AWS was used by 8 percent of respondents, a strong showing in its first year of availability. Interest and potential future adoption can also be gauged by measuring respondents who are experimenting with or planning to use a specific cloud provider. This year, a higher percentage of respondents were experimenting with or planning to use Google (38 percent), followed by Azure (31 percent) and VMware Cloud on AWS (28 percent). This indicates a potential for Google to accelerate adoption in future years as the respondents’ experiments and plans come to fruition.
Enterprise Public Cloud Adoption Statistics
Among enterprises, Azure increased adoption significantly from 43 percent to 58 percent (a 35 percent growth rate), while AWS adoption in this group increased from 59 percent to 68 percent (a 15 percent growth rate). Among the other cloud providers included in last year’s survey, all saw increased adoption this year, with Oracle growing fastest, from 5 to 10 percent (a 100 percent growth rate), followed by IBM Cloud, from 10 to 15 percent (a 50 percent growth rate), and Google, from 15 to 19 percent (a 27 percent growth rate). Enterprise respondents with future projects (the combination of experimenting and planning to use) show the most interest in Google (41 percent).
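The growth rates quoted in these adoption figures are simple relative changes in adoption share from one year to the next. A small sketch reproducing the quoted numbers (the function name is illustrative, not from RightScale's report):

```python
def growth_rate(previous_pct: float, current_pct: float) -> float:
    """Relative year-over-year growth in adoption share, as a percentage."""
    return 100.0 * (current_pct - previous_pct) / previous_pct

# Figures quoted in the adoption statistics above:
print(round(growth_rate(57, 64)))  # AWS: ~12
print(round(growth_rate(34, 45)))  # Azure: ~32
print(round(growth_rate(5, 10)))   # Oracle: 100
```

Note that a small provider can post a very high growth rate (Oracle's 100 percent) while gaining far fewer percentage points of adoption than a larger rival.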
Enterprise Public Cloud Adoption for Running Applications
Enterprises starting on their cloud journey use Azure slightly more than AWS. The cloud maturity of an organization typically correlates with the length of time it has been using cloud services, because of the time it takes to build cloud expertise and create processes and best practices across the organization. Because AWS was the first large-scale cloud provider, it is used more frequently by advanced (i.e., longer-term) cloud users. Across all respondents, 81 percent of advanced cloud users leverage AWS versus 49 percent using Azure.
More interesting is which clouds are chosen by users who are just starting their cloud journeys. Here, AWS and Azure are very close, with 40 percent of cloud beginners choosing AWS versus 36 percent choosing Azure.
Cloud Provider Adoption by Consumer Maturity
Among cloud-beginner enterprises (more than 1,000 employees), Azure shows a slight lead (within the margin of error), with 49 percent adoption versus 47 percent for AWS.
Public cloud scorecard for enterprises: AWS leads, Azure closes in.
AWS has been moving quickly to address the needs of enterprises, and Microsoft has been working to bring its enterprise relationships to Azure. Google and IBM are also focusing on growing their infrastructure-as-a-service lines of business and continue to increase adoption. The scorecard above provides a quick snapshot showing that AWS still maintains a lead among enterprises, with the highest percentage adoption and largest VM footprint of the top public cloud providers. However, Azure is showing strength by growing much more quickly on already solid adoption numbers. IBM and Google are also growing strongly, but on a smaller base of users. AWS still leads in public cloud adoption, but Azure continues to grow more quickly and gain ground, especially with enterprise customers. Among enterprise cloud beginners, Azure is slightly ahead of AWS. Google maintains the third position.
Gartner Peer Reviews
In section 3.1 of this document, we unpacked the results of the Gartner evaluation of the top three cloud providers (based on market share) and the key aspects highlighted by this evaluation that may be pertinent to Sanlam in its selection process. We then looked at cloud adoption statistics from RightScale, showing which cloud providers are being purchased, experimented with, or watched by enterprises, and the trends over the past two years. This section provides the view from the IaaS cloud consumer perspective, using the Gartner Peer Insights reviews, which constitute the subjective opinions of individual end users based on their own experiences and do not represent the views of Gartner or its affiliates. As this is a very subjective representation of consumer experience with cloud provider services, it is important to show the number of reviewers involved: the greater the number of independent reviewers, the more reliance can be placed on the feedback provided.
Number of Reviewers per Industry
The graph below (Figure 10) indicates the overall rating assigned by these peer reviews to the top five (5) market share leaders (Figure 1), with Oracle included as a wildcard.
Overall Gartner Peer Rating
Figure 10 provides a different view to the structured feedback from the formal Gartner evaluations, but it is pertinent to understand that the peer reviews are subjective and could be influenced by elements other than available capabilities and offerings, such as vendor relationships. Figure 11 below depicts a view aligned to the overall peer ratings, but from the perspective of the peers' willingness to recommend the cloud provider to others; here AWS and Google are the leaders, rather than AWS and Microsoft. The high scores reflected for Oracle in Figures 10 and 11 are largely based on migrations of existing enterprise-wide Oracle applications, such as Oracle e-Business Suite, from on premise to Oracle cloud infrastructure deployments, not on vendor-agnostic workloads.
Willingness of Peers to Recommend CSP
Figures 12 and 13 show that the consumers' view reveals very little differentiation between the cloud providers, whether from a product capability or a customer experience perspective.
Gartner Peer subjective view of Product Capabilities
The lower consumer ratings for Microsoft and IBM may be attributed to the fact that these cloud providers are more focused on cloud solutions compatible with their own product sets, or to consumers attempting to avoid the large vendor relationships already existing within their enterprises, as opposed to the more flexible and open architectures and compatibility options offered by the AWS and Google counterparts. This, however, is supposition: the ratings are not clarified sufficiently to determine this and are simply answers to a set of questions, with individual reviews listed as clarification.
Gartner Peer subjective view of Customer Experience when deploying cloud
An aspect to note from the customer experience ratings is the cost differential. As per the cloud consumers (Gartner Peers), Google's offering has the most price flexibility, which aligns to the Gartner evaluation indicating that one of Google's strengths is its discounted pricing. AWS takes second place in this aspect, and the rest are roughly equal.
Sanlam Situation
Sanlam is relatively new to Cloud Computing. The understanding thus far is that Sanlam has adopted Cloud Computing on an ad-hoc basis, with limited control and limited enforced governance, within a federated operating model where each business unit within the various clusters is largely contracting separately for cloud services; as a result, economies of scale cannot be leveraged for the benefit of the Group. When it comes to the bigger cloud deployments, the ownership is largely centralised within the GTI space, but this cannot currently be conclusively stated.
This paper is not proposing that the entire Sanlam Group operating model be changed, as the basis for the federated model is justified and practical from a purely “run the business” perspective. What the paper is suggesting is that there are fundamental capabilities within the adoption of Cloud Computing that should be managed and governed centrally to ensure that Sanlam Group benefits can be optimised across Service Providers. It does also call for improved collaboration and sharing between Group entities.
Based on various interviews held thus far, a three-provider model has been proposed, namely AWS, Azure and IBM (the latter solely due to the size of the IBM product deployment within Sanlam). Other input suggests using AWS and Azure together, with components of a solution deployed in each, both to provide assurance of the capability to exit (plan to leave, as needed) and to leverage the "best of breed" from each provider. This also gives Sanlam bargaining power with these providers, but it has the potential to introduce integration complexity and security weaknesses that Sanlam does not want and cannot afford.
Therefore, it is imperative to highlight that the capabilities of the cloud provider solution must be equal to or better than the capabilities Sanlam has on premise, to ensure that Sanlam's risk level is, at minimum, maintained or improved (reduced risk) and thus aligned to the Sanlam Risk Appetite and risk management practices.
Implications of adopting Immature Cloud Computing Offerings
The implication for Sanlam and its clusters of deploying solutions on weakened or immature cloud provider services (infrastructure, platform or software related) is that additional managed services (at minimum) will need to be procured to substitute for cloud provider capability gaps (where possible). This will mean additional resources, skills and, ultimately, cost, which will need to be considered within the solution business cases, possibly indicating that the cloud option is not viable or cost effective to pursue.
Sanlam Geographic Presence and its implications on Cloud Provider Selection
It is well known that Sanlam Group operates in a federated business model, meaning that each business or legal entity has definitive decision-making capability within the borders of its operation. This means that Group does not control how the businesses run or what they do.
One of Sanlam Group's main strategic business drivers is global expansion. Based on this, and the fact that Sanlam currently (at the time of writing this document) has a presence in 44 countries across the world, it is imperative that we look at adopting cloud providers that can be leveraged across the world as well. This will enable Sanlam to leverage the global presence of the cloud providers to improve and build collaboration capabilities for the Sanlam Group, making it possible to share capabilities and potentially leverage cloud providers' networks to make sharing of resources more feasible across borders, where appropriate. This introduces the potential for further cost optimisation through shared cloud instances.
To build the foundation for making this possible, Sanlam must ensure that the global presence of the selected cloud providers matches the Sanlam Group presence across the world.
Current Landscape and Cloud Computing implications
The Microsoft Enterprise Agreement is structured in five pieces, namely: the Microsoft Business and Services Agreement, the Enterprise Agreement, the Enterprise Enrollment, the Server and Cloud Enrollment, and finally, the Subscription Enrollment. The Enterprise Enrollment products and platforms licensing are depicted in Figure 14 below:
Enterprise Enrollment Breakdown
The Server and Cloud Enrollment (SCE) covers the product sets depicted within figure 15 below:
Microsoft Enterprise Agreement License Categories
Beyond these two enrollment types, there is also a Subscription Enrollment, which includes online services such as Office 365 and Enterprise Mobility and Security, as well as additional online services such as Dynamics 365.
Sanlam has an agreement in place with Microsoft covering all of this, with licensing for various Microsoft products distributed across the Group and clusters at the highest discount level (level D) allowed by Microsoft. Furthermore, Microsoft has committed to, and published the fact that, it will be deploying a cloud presence within Africa, and more specifically within South Africa, within the year. This, together with the fact that there are resources within Sanlam who are familiar with the Microsoft development landscape and make use of it regularly for Sanlam's benefit, makes Microsoft a logical choice as a Cloud Provider.
Microsoft is, however, a large vendor and tends to make promises beyond its capability to deliver immediately. In a drive to limit large-vendor lock-in and ensure that Sanlam always has negotiation leverage, it is best to position Microsoft as the alternate rather than the primary cloud provider.
AWS is the market share leader, the most mature of the cloud providers and the base platform for most PaaS and SaaS cloud provider offerings, and it has likewise committed to deploying a local South African presence within the next year or so. AWS is therefore the logical choice as the primary Sanlam cloud provider within the IaaS world.
Recommendation
Based on all the information above, including our existing technology landscape and licensing agreements, the risk-averse culture within Sanlam and the global presence of the cloud market leaders, the recommendation from Group Enterprise Architecture is to adopt AWS as the preferred Sanlam Cloud Service Provider for IaaS services, with Microsoft Azure as the alternate option when AWS is not viable based on pertinent solution criteria or when pursuing a contingency provider deployment. This recommendation is also largely driven by the fact that these cloud providers have committed to having a cloud presence within Africa, and specifically within South African borders, in the next 3 to 12 months, indicating a potential for bandwidth cost savings.
Should any cloud provider other than AWS or Azure be pursued, extra governance (related to the capability gaps and risks introduced by each) and capability requirements to mitigate the added risk will need to be considered and provisioned for as part of the initial deployment. It is not advised to pursue Oracle or equivalently immature cloud providers, as discussed earlier, since the perceived benefits will be outweighed by the weaknesses introduced. Should this nevertheless be insisted upon for whatever reason, it is advisable that extra funding be requested to mitigate the gaps introduced, and that this aspect be considered when the decision is taken.
The cloud provider market and “up and comers” will be monitored for growth and maturity and this approach will be reassessed as the market develops to ensure the best approach for Sanlam. Should the market be severely disrupted in the future or significant maturity improvements become visible, the potential for Sanlam to benefit will be investigated and an approach defined at that time.
Next Steps
The next steps are to:
Finalise and gain approval on the Sanlam Cloud Computing Strategy
Produce a position paper about PaaS services and providers and where Sanlam will benefit the most.
Finalise the governance definition, including the policies and standards to support it, for approval.
Produce a position paper to unpack the cloud provider cost models and look at building a Sanlam TCO calculator to appropriately manage cloud costs for the Group.
Finalise the list of requirements and funding for the RFx process to be followed to procure a hybrid cloud management tool for cloud usage optimisation within Sanlam.
Evaluate skills and establish training plans for key resources to become change agents for Cloud adoption across Sanlam.
References
Sanlam Documents
Sanlam Group Cloud Computing Strategy-Architecture v0.07 – July 2018, Tania Paulse: http://gtiportal.sanlam.co.za/mygti/Group-Enterprise-Architecture/SitePages/Cloud%20Computing.aspx
Gartner
Gartner Peer Insights: https://www.gartner.com/reviews/market/public-cloud-iaas
Evaluation Criteria for Cloud Infrastructure as a Service: https://www.gartner.com/document/3729517
In-Depth Assessment of Google Cloud Platform IaaS, March 2018: https://www.gartner.com/document/code/349149?ref=ddrec
In-Depth Assessment of Amazon Web Services IaaS, March 2018: https://www.gartner.com/document/3867872?ref=ddrec&refval=3867873
In-Depth Assessment of Microsoft Azure IaaS, March 2018: https://www.gartner.com/document/3867976?ref=ddrec&refval=3867873
In-Depth Assessment of Oracle Infrastructure IaaS, July 2018: https://www.gartner.com/document/3884666?ref=ddrec&refval=3867976
White Papers
RightScale 2018: State of the Cloud Report, Data to Navigate Your Multi-Cloud Strategy.
Appendix A: Gartner IaaS Provider Evaluation Criteria Explained
Feature Category Description
Compute
The compute portion of an IaaS offering is often the most requested and deployed service. Providers most commonly use server virtualization to provide virtual machines (VMs) as a service. In this document, the compute node will sometimes be referred to as an "instance". The term "instance" is derived from "operating system (OS) instance" and refers to a compute unit, whether virtual or physical. VM always refers to an actual virtual machine, whereas instance could refer to any type of compute instance.
Storage
Providers offer a wide variety of IaaS storage, and these services are often complementary to a compute offering. Block and object storage services are the most common offerings found in the industry. Block storage services typically are not exposed via the internet, whereas object services typically are. For the purposes of this document, the following definitions apply.
Block storage service: Instance-independent block storage. The customer can obtain block storage volumes that are network-attached and independent of any specific compute instance. The customer can then mount this storage volume on a compute instance.
Object storage service: A scalable, elastic storage offering where objects (files) are stored and retrieved via a web services API.
File storage service: A scalable, elastic, and fully managed file-level storage offering. The customer can upload, download, and interact with files over standard IP-based network file-sharing protocols like Network File System (NFS) and Server Message Block (SMB).
Networking
IaaS cloud offerings must operate at extreme scales to satisfy the complex requirements of multiple customers simultaneously. Therefore, the networking of all IaaS components is crucial to the overall viability of each service. Too little throughput, not enough isolation between tenants or too much latency among tiers of service will likely cause customers to take their business elsewhere. Therefore, evaluating the network capabilities of a cloud IaaS provider is important in an overall cloud service assessment.
Security and Access
Security and access control often top the list of biggest customer concerns about using cloud IaaS solutions. Cloud IaaS providers are quickly improving security and access configurations, as well as sharing more information about the ways in which their environments are secured.
Compute, storage, and network are the three main pillars in IaaS offerings. However, many criteria exist that cannot be isolated into any of the three specific areas. Therefore, in this section, Gartner has provided criteria not specific to any of the three pillars. This category generally focuses on service implementations of the bundled IaaS stack, including data center architecture, availability, scalability, and value-added services.
Support and Service Levels
Self-service and on-demand features have helped define IaaS offerings. However, IT organizations need support and service levels from providers that will accommodate enterprise-grade hosting requirements. Enterprise customers care about support and service levels because they demonstrate provider commitment and integrity, and they help build trust in the relationship between providers and customers. SLAs do not prevent cloud outages. Rather, they define the terms of an outage, incident, or degradation of service; they explain the response expectation to issues; and they set customer expectations for service performance.
Management and DevOps
The strength or weakness of an IT organization often has nothing to do with technology implementation or selection, but instead with the organization’s processes and people. When IT organizations desire to move critical workloads to cloud IaaS environments, providers must offer them mature enterprise management capabilities and DevOps solutions to further automate and control the various customer assets.
Price and Billing
Price is often a key concern when evaluating cloud IaaS providers. Although price is important, the following criteria highlight the financial aspects of using IaaS, rather than actual price. Price fluctuates based on various provider differences (e.g., the business model).
Required Feature Set
Compute
Rapid, self-service provisioning: The cloud service must offer self-service provisioning of instances, either through a management console or through a programmatic interface (e.g., an API or a command-line interface (CLI)). Provisioning must be simultaneous, not sequential — the provider’s self-service interfaces and control plane must be able to provision many instances for multiple customers at the same time, without dependencies between those provisioning jobs. Finally, the provisioning capabilities must be rapid. That is, the interval between the provisioning action and the point where the customer can log in to the server must be within a certain threshold. These provisioning thresholds are as follows:
Less than five minutes for a single Linux instance (1 CPU, ~4GB RAM)
Less than 10 minutes for a single Windows server instance (1 CPU, ~4GB RAM)
Less than 15 minutes in total for 20 Linux instances (each with 1 CPU, ~4GB RAM)
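As an illustration of how such parallel, programmatic provisioning is typically requested in practice, the infrastructure-as-code sketch below (Terraform syntax; the resource name, image ID and instance size are placeholder assumptions, not Sanlam choices) asks the provider to create 20 small Linux instances simultaneously rather than one after another:

```hcl
# Hypothetical sketch: request 20 small Linux instances in parallel.
# The AMI ID and instance type are placeholders for illustration only.
resource "aws_instance" "linux_pool" {
  count         = 20
  ami           = "ami-xxxxxxxx"   # placeholder Linux image ID
  instance_type = "t3.medium"      # roughly the 1 CPU / ~4GB RAM class
}
```

The control plane, not the customer, is responsible for fanning these requests out concurrently, which is what makes the 15-minute threshold for 20 instances achievable.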
Image customization: As a self-service capability, a customer can:
Provision an image onto a compute instance
Customize that image (altering OS files, installing additional software and so on)
Save it as a new, privately available image that can then be used to provision other instances in the future
This new image must either be saved into the image catalogue or otherwise be persisted beyond the lifetime of the instance itself.
Bring your own image/VM import: As a self-service capability, a customer with an existing image in a supported format can import that image and save it as a new, privately available image that can then be used to provision VMs in the future. This ability to custom import an image should require no intervention on the part of the cloud provider. However, the provider must be able to support Open Virtualization Format (OVF)-based, Virtual Machine Disk (VMDK)-based or Virtual Hard Disk (VHD)-based images.
Two-generation OS support: In its catalogue of OS images, the provider must offer current and N-1 long-term support (LTS) versions of the following:
At least one enterprise Linux distribution (such as Red Hat or SUSE)
One commonly used free Linux distribution (such as CentOS, Debian or Ubuntu)
The two latest major Windows Server releases (i.e., Windows Server 2012 and Windows Server 2016)
Large-instance support: Providers must offer instances with many processor cores and a large amount of memory for processor- or memory-intensive use cases. The provider must be able to provide instances that support at least eight CPUs and 64GB of RAM.
No compute starvation or resource prioritization across tenants: Providers must not employ any amount of resource starvation or prioritization across tenants, unless the customer clearly subscribes to such a variable-performing tier of service. In the standard service, it is not acceptable to impose resource starvation of customer instances to rebalance or improve the performance of another tenant.
VM-preserving host maintenance: The provider may perform maintenance on a compute host without disruptively impacting running VMs on that host. Such maintenance may include performing a kernel upgrade or a hypervisor upgrade. Avoiding VM disruption may be accomplished by a variety of means, including live migration to another host or memory-preserving maintenance (suspending VMs for no more than 60 seconds, with the VMs restored to the same state after recovering from suspension). Customers do not need to have any control over this capability.
VM-preserving data center maintenance: The provider may perform maintenance that impacts the physical compute host without disruptively impacting running VMs on that host. Such maintenance may include replacing or upgrading hardware, or performing power maintenance that eliminates power to the host. Live migration would be the standard way to avoid VM disruption. Customers do not need to have any control over this capability.
VM host failure recovery: The cloud service must be architected to automatically restart VMs on a healthy host if the original physical host fails. This capability is often referred to as “VM restart.” The service may automatically attempt to reboot the original physical host and restart VMs on it. However, if that fails, the service must support automatically restarting the VMs on a healthy host. The failure detection must occur within one minute, and if the host cannot be immediately recovered, the automatic VM restart process on another host must begin within five minutes.
Instance maintenance/failure notifications: The cloud service must be able to notify customers of compute resilience events, such as live migrations, instance restarts or memory-preserving maintenance. At a minimum, an option for email notification must exist. The customer must be able to opt in or out of this communication via self-service means.
VM restart flexibility: If VM-impacting maintenance must occur, providers must offer customers flexibility in selecting restart windows per VM. For example, suppose a security vulnerability patch is required at the infrastructure level and a customer VM reboot is required to apply the patch. Providers must offer customers at least three different choices of downtime windows or communicate a procedure by which customers have flexibility to take their own action. The flexibility window must be at least 30 days long, apart from critical vulnerabilities.
Explicit host anti-affinity: Customers must be able to explicitly set, through self-service interfaces, anti-affinity rules for individual VMs that must be placed on different physical hosts from one another. Anti-affinity rules help customers achieve greater application availability by ensuring that a single physical failure event does not impact multiple VMs. Such rules make sure that VMs are dispersed across different physical hosts, clusters, or locales.
Dynamic horizontal auto scaling: The cloud service must provide functionality to automatically scale VM pools horizontally based on triggers (i.e., functionality to automatically provision/deprovision compute VMs and reconfigure load balancing accordingly). This must be a service and cannot require direct application instrumentation. It also cannot require the client to preprovision compute pools. For more details, see “Technology Overview for Auto scaling.”
Storage
Bulk data import/export with encryption: It is challenging to migrate large amounts of data to an external hosting location via the network. The challenge is due in large part to bandwidth restrictions, network latency, overall reliability, and internet backbone charges. Therefore, providers must offer a bulk data import and export service with encryption for moving large amounts of data both into and out of the cloud service. Bulk data movement includes the ability to physically ship datasets via traditional express-mailing techniques (for example, FedEx, United Parcel Service (UPS) and national postal services). The provider must support at least one of the following device interfaces: Fibre Channel (FC)-based and/or internet Small Computer System Interface (iSCSI)-based disk arrays, external SATA (eSATA), SATA or USB.
Cross-geography replication: The provider must offer a replication service for both object and block storage services that traverses regional boundaries. This offering must be a service customers can opt into for geographic/global data protection, and the customer must have control over country placement. Due to the distance between regional boundaries, this replication is assumed to be asynchronous.
Block Storage Service Criteria
Scalable instance-independent block storage service: This type of offering allows the customer to obtain block storage volumes that are network-attached and independent of any specific compute instance. The customer can then mount this storage on any compute instance (regardless of the type of compute instance). Thus, this storage can persist beyond the life of an individual instance and can be moved between instances. To meet large-scale requirements, the block storage service offering must support at least 10 volumes of at least 1TB in size per customer, and it must allow the overall capacity of the storage service to be unlimited per customer.
Block storage snapshots: The cloud service must support a point-in-time copy of a storage volume, also known as a “snapshot.” The customer must be able to create a snapshot of any storage volume through self-service means. Furthermore, snapshots must be able to be used as an image for self-service provisioning of new compute instances.
Expandable block storage volumes: The customer must be able to increase the size of an existing block storage volume, without having to provision a new volume and copy the data.
SSD-based block storage: The provider must offer block storage that is solely solid-state drive (SSD)-based. It must provide higher input/output (I/O) than the non-SSD service offering, if one exists, and the performance target or percentage improvement over the standard offering must be documented.
Snapshot copy/replication: The customer must be able to replicate snapshots on demand to a different data center, including data centres in a different geographic region. The portal and/or APIs must directly support this feature, rather than requiring the customer to manually copy files around.
Block storage data eradication: The cloud service must support one of the two following forms of data eradication:
Immediate eradication: When a storage volume is de-provisioned or otherwise released by the customer, the platform must offer the capability to immediately overwrite the data associated with the storage volume. This allows the customer to be assured that the data has been eradicated from the physical disks. The platform may do this automatically for all storage volumes, or customers may be offered the choice to force an overwrite. This must be a platform feature (providers cannot tell customers to simply overwrite their blocks with zeros before releasing the storage volume).
Eventual overwrite: This guarantees that data deleted by a customer cannot be accessed, and that deleted data is eventually overwritten. For block storage, this means that nobody has access to read that block again until the block is overwritten. Furthermore, if a customer de-provisions or otherwise releases a storage volume, those blocks must be overwritten before that storage is made available again (whether to the same customer or to another customer).
Object Storage Service Criteria
Scalable object storage service: The provider must offer a distributed, multi-data-center object storage service where objects (individual files) can be stored and retrieved via a web services API. The scalability requirement applies to both individual object sizes and overall storage capacity. For individual object sizes, the storage service must be able to support at least 1TB file sizes, and the overall amount of cloud storage capacity per customer must be unlimited.
Object storage replication: The provider must automatically replicate objects across multiple data centres. This replication must not cross country boundaries (or the customer must have the ability to prevent it from doing so). Alternatively, the replication must be controlled in terms of the country placement.
Automatic object durability: To provide ample data protection, all customer-written data objects must be automatically replicated to three or more locations, and the provider must implement erasure coding that tolerates multiple concurrent failures. The service must not inform the customer that a data PUT operation has been successful until at least two copies have been successfully written. Erasure coding is a method of data protection by which data is fragmented, expanded, and encoded with redundant data pieces, and then stored across a set of different locations.
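As a simplified illustration of the erasure-coding idea described above, the sketch below uses single-parity XOR, which fragments data and can rebuild any one lost fragment; this is an assumption-laden teaching example, not a provider's actual scheme — production services use codes such as Reed-Solomon that tolerate multiple concurrent failures.

```python
def encode(fragments):
    """Append an XOR parity fragment to a list of equal-length byte fragments."""
    parity = bytes(len(fragments[0]))
    for frag in fragments:
        parity = bytes(a ^ b for a, b in zip(parity, frag))
    return fragments + [parity]

def reconstruct(stored, missing_index):
    """Rebuild the single missing fragment by XOR-ing all surviving fragments."""
    length = len(next(f for f in stored if f is not None))
    rebuilt = bytes(length)
    for i, frag in enumerate(stored):
        if i != missing_index:
            rebuilt = bytes(a ^ b for a, b in zip(rebuilt, frag))
    return rebuilt
```

Spreading the data fragments plus parity across different locations is what lets the service survive the loss of a location without losing the object.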
Bulk object delete: The object storage service must enable customers to bulk-delete all objects in a container or bulk-delete objects based on their metadata assignments. For example, if objects are tagged with a metadata tag of “Project X,” customers must have the self-service ability to bulk-delete all objects with that tag.
Logging of administrative object service requests: The provider must enable customers to log and audit every administrative object storage service request. This capability must include logging of create, read, write, copy, and delete events.
Provider-enabled encryption services: The object storage service must allow customers to opt into provider-enabled server-side encryption (SSE) for objects or object hierarchies within the storage service. Customers should note that they can always manage their own encryption keys and encrypt their own data prior to uploading and storing within a public cloud storage service.
Object versioning: The object storage service must offer the customer the option of versioning an object. This capability automatically keeps previous versions of an object, thus protecting against accidental data loss due to object overwrite or object deletion. However, too much versioning leads to storage sprawl. Customers must therefore have self-service configuration control to turn versioning on or off per object or container.
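The versioning behaviour described above can be modelled with a toy in-memory store (a hypothetical class for illustration, not any provider's API): each overwrite appends a new version, and older versions remain retrievable until versioning is cleaned up.

```python
class VersionedObjectStore:
    """Toy model of object versioning: every PUT keeps prior versions."""

    def __init__(self):
        self._versions = {}  # key -> list of byte strings, oldest first

    def put(self, key, data):
        # An overwrite does not destroy data; it appends a new version.
        self._versions.setdefault(key, []).append(bytes(data))

    def get(self, key, version=-1):
        # By default return the latest version; older ones stay accessible.
        return self._versions[key][version]
```

This is also why the criterion requires per-object or per-container control: left on everywhere, every overwrite consumes additional storage.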
File Storage Service Criteria
Scalable instance-independent file storage: The provider must offer a scalable, distributed, multi-data-center network file storage service. This service allows customers to obtain a network file share that can be mounted on any of their compute instances. The service should be able to automatically grow and shrink file share capacity as files are added or removed. In addition, the file share storage capacity per customer must be unlimited. Furthermore, the service must support both:
IP-based network file-sharing protocols: The provider must support either NFS 4.0 or SMB 3.0 or higher. Instances from any region must be able to connect to the file storage service.
File storage service authentication: The provider must offer built-in authentication for its file storage service. For NFS 4.0 or higher, the provider must support at least access control lists (ACLs) and security groups. For SMB 3.0 or higher, the provider must support at least ACLs or directory service integration.
File storage data eradication: The cloud service must support one of the following two forms of data eradication:
Immediate eradication: The platform must offer the capability to immediately overwrite the data associated with a file share when the file share is de-provisioned. This allows the customer to be assured that the data has been eradicated from the physical disks. The platform may do this automatically for all file shares, or customers may be offered the choice to force an overwrite. This must be a platform feature.
Eventual overwrite: This guarantees that data deleted by a customer cannot be accessed, and that deleted data is eventually overwritten. Furthermore, if a customer de-provisions a file share, that storage must be overwritten before it is made available again (whether to the same customer or to another customer).
Network
Customer-defined hierarchical LAN topology: Customers require the ability to design hierarchical network infrastructure at the provider and to choose their Request for Comments (RFC) 1918 IP addressing scheme without dependency on having instances in place at the provider. Prior to deploying any compute instances, customers must be able to design the following network components and layouts:
Firewalls and ACLs
Subnets or VLANs
Network address translation (NAT)
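Designing such an addressing scheme up front can be sketched with Python's standard ipaddress module; the block and tier names below are hypothetical examples of a plan defined before any compute instances exist:

```python
import ipaddress

# Hypothetical RFC 1918 address plan for a virtual network, laid out
# before any compute instances are provisioned.
vnet = ipaddress.ip_network("10.20.0.0/16")

# Carve the /16 into /24 subnets and assign the first few to tiers.
subnets = list(vnet.subnets(new_prefix=24))
plan = {
    "dmz":  subnets[0],  # 10.20.0.0/24, fronted by firewalls/ACLs
    "app":  subnets[1],  # 10.20.1.0/24
    "data": subnets[2],  # 10.20.2.0/24, private tier reached via NAT
}
```

The point of the criterion is exactly this ordering: the subnet and firewall layout is a design artefact the customer controls first, with instances placed into it later.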
Multiple vNICs per VM: All VMs must be able to have multiple IP addresses that are independently routed. Furthermore, all instances must be able to have multiple virtual network interface cards (vNICs), each with its own Media Access Control (MAC) address. If the provider supports private IP addresses, customers must be able to mix public IP addresses and private IP addresses on the same compute instance. The provider must support these capabilities, unless the instance is using a guest OS without such capabilities.
Multi-segment networks and multiple subnets per virtual network: The platform must directly support a single customer having multiple virtual network segments (approximately equivalent to VLANs), without needing to use third-party software to build an overlay. Furthermore, the platform must allow the customer to create multiple subnets per virtual network.
Isolated virtual networks and private-IP-address-only compute instances: Providers must offer virtual networks that are fully isolated and not routable externally. Furthermore, instance configurations must exist that can reside only on these isolated virtual networks and without having any public IP address or internet routing. To qualify for this criterion, an instance must be deployable with only a private-facing network address, or the provider must allow customers to remove a public-facing network address. Firewall support is not sufficient for qualifying for this criterion.
Static IP addresses: To qualify as having static IP addresses, the provider must support several capabilities. First, if a compute instance is dynamically assigned an IP address, that address must remain the same across the lifetime of the instance (unless the customer wants to change it). Second, the customer must be able to obtain an IP address, including an internet-facing public IP address, that can be assigned to a compute instance or load-balancing pool. That IP address must be able to move between compute instances or load-balancing pools and persist as long as the customer wants.
Private IP addresses: Customers must have the ability to choose or define a customized RFC 1918 IP address space within the cloud service environment. This is critical for many hybrid cloud networking designs.
Customer VPN connectivity: Providers must allow customers to access the cloud service via an IPsec VPN tunnel or a Secure Sockets Layer (SSL) VPN tunnel over the public internet. This must be a self-service capability from the provider side, although customers must make configurations on their end.
Private customer connectivity: The provider must enable customers to make private WAN connections into the cloud service from their data centres. In this scenario, customers must be able to obtain private WAN connectivity into the cloud provider via a carrier or colocation facility of their choice. (WAN connectivity can take several forms, such as Ethernet, Multiprotocol Label Switching (MPLS), virtual private LAN service (VPLS) or a direct cross-connect.) Providers may limit this service to specific carriers or colocation facilities.
Multi-data-center virtual networks: The provider must allow the customer to define virtual networks that span two or more physical data center locations. (Note that two physical data centres on the same campus do not count for this purpose. This criterion applies to long-distance, stretched networks.)
Multiple private customer connections: Within the cloud service, the provider must support two or more private WAN connections per customer and per geographic region. At minimum, the service capabilities (such as routing features) must support two scenarios:
Customers with multiple offices that do not want to backhaul all traffic via a single private WAN connection
Customers that want private WAN connectivity from multiple carriers into a single location in order to achieve redundancy
Virtual network routing: The provider must allow customers to define a gateway and custom routing for each virtual network. At minimum, the provider must enable the customer to route all traffic to compute instances on the virtual network through a virtual appliance such as a firewall.
Virtual network traffic exchange: The customer can route traffic between two virtual networks that belong to the same customer account. If the virtual networks use private IP addresses, the traffic must be routable between the two virtual networks without leaving private IP space (i.e., the traffic must not cross the public internet).
Inter-customer private traffic exchange: Customers can route traffic between two virtual networks that belong to different cooperating accounts (including accounts held by completely separate entities). If the virtual networks use private IP addresses, the traffic must be routable between the two virtual networks without leaving private IP space (i.e., the traffic must not cross the public internet).
Front-end load balancing: The platform must provide a front-end, proprietary load-balancing-as-a-service capability. The provider must allow customers, through a self-service capability, to configure IP-based load balancing between compute instances that have public IP addresses or that otherwise take traffic external to the platform. At minimum, the provider must support load balancing of HTTP and HTTPS traffic. The load-balancing capability must be self-service and support up to 25 nodes (instances) per load-balancing group. The service must also support health checks to avoid sending requests to nonresponsive compute instances, and it should leverage round-robin, weighted or metrics-driven algorithms.
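A minimal sketch of the round-robin-with-health-checks behaviour this criterion requires is shown below. The class and method names are invented for illustration; a real load-balancing service would run health probes itself rather than be told about failures.

```python
import itertools

class RoundRobinBalancer:
    """Illustrative round-robin pool with health checks (not a real service)."""
    def __init__(self, nodes, max_nodes=25):   # 25-node minimum from the criterion
        assert len(nodes) <= max_nodes
        self.health = {n: True for n in nodes}
        self._cycle = itertools.cycle(nodes)

    def mark(self, node, healthy):
        self.health[node] = healthy

    def next_node(self):
        # Skip unhealthy nodes so requests never reach a dead instance.
        for _ in range(len(self.health)):
            node = next(self._cycle)
            if self.health[node]:
                return node
        raise RuntimeError("no healthy nodes")

lb = RoundRobinBalancer(["web-1", "web-2", "web-3"])
lb.mark("web-2", False)                     # failed a health check
print([lb.next_node() for _ in range(4)])   # ['web-1', 'web-3', 'web-1', 'web-3']
```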
Session affinity load balancing: In addition to meeting the load-balancing requirement previously detailed, providers must support session affinity features within the load-balancing service. Session affinity is typically accomplished by setting a session cookie and continuing to route requests associated with that cookie to the same compute instance.
Back-end load balancing: The platform must provide a back-end, proprietary load-balancing-as-a-service capability. The provider must allow customers, through a self-service capability, to configure IP-based load balancing between compute instances that have private IP addresses or that pass traffic among each other (e.g., to distribute traffic from front-end web servers to the middle-tier app servers, or from the middle-tier app servers to back-end databases). At minimum, the provider must support load balancing of HTTP and HTTPS traffic. The load-balancing capability must be self-service and support up to 25 nodes (instances) per load-balancing group. The service must support health checks to avoid sending requests to nonresponsive compute instances, and it should leverage round-robin, weighted or metrics-driven algorithms.
DNS-based global load balancing: Via a self-service capability, the customer can configure global load balancing, where requests may be directed to endpoints located in different data centres (and those endpoints may be different front-end load-balancer pools). Global load balancing is usually a DNS-based service. Furthermore, the service must support endpoint health check routing (i.e., the service will automatically avoid sending requests to endpoints that are unresponsive). It must also support at least one of the two following algorithms:
Latency-based request routing: The request is directed to the endpoint with the lowest latency between the requestor and the location.
Geographic request routing: The request is directed to the endpoint based on the location of the request origination. Typically, this algorithm will be used to route a request to endpoints based on the country of origin.
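The two routing algorithms above can be sketched together as a single resolution step that prefers a geographic match and falls back to lowest latency, skipping unhealthy endpoints. The endpoint table, country sets and probe latencies are invented for illustration.

```python
# Hypothetical endpoint table; latencies would come from real probes.
endpoints = {
    "eu-west":  {"healthy": True,  "latency_ms": 12, "countries": {"GB", "DE", "FR"}},
    "us-east":  {"healthy": True,  "latency_ms": 85, "countries": {"US", "CA"}},
    "ap-south": {"healthy": False, "latency_ms": 40, "countries": {"IN", "SG"}},
}

def resolve(country=None):
    """DNS-style resolution: prefer a geographic match, else lowest latency."""
    healthy = {n: e for n, e in endpoints.items() if e["healthy"]}
    if country:
        for name, e in healthy.items():
            if country in e["countries"]:
                return name
    return min(healthy, key=lambda n: healthy[n]["latency_ms"])

print(resolve("US"))   # us-east  (geographic request routing)
print(resolve())       # eu-west  (latency-based request routing)
print(resolve("IN"))   # eu-west  (ap-south failed its health check)
```

Note how the health check overrides both algorithms: ap-south is never returned while unhealthy, which is the endpoint-health-check routing the criterion mandates.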
Metrics-driven load balancing: In addition to meeting the load-balancing requirement previously detailed, providers must support metrics-driven load-balancing features. Metrics-driven load balancing is when the load-balancing service has insight into various metrics at the compute instances (such as utilization, latency, and responsiveness) and routes requests based on those metrics. Furthermore, customers must have the self-service ability to choose which load-balancing algorithm they want to use for a given configuration (for example, round-robin, weighted or metrics-driven).
Security and Access
Documentation of user control considerations: There is a division of responsibility between the provider and the customer that the provider must make explicit by documenting a set of user control expectations. The provider must document the operational and security risk controls that the customer is responsible for. This documentation may take any of the following forms:
A responsible, accountable, consulted and informed (RACI) matrix
Some other clearly defined shared-responsibility model
Data sanitization: The provider must have documented evidence that it adheres to the Department of Defense (DoD) 5220.22-M or National Institute of Standards and Technology Special Publication (NIST SP) 800-88 processes for data sanitization and disposal. These processes apply when the provider retires or otherwise disposes of physical storage devices (such as disks and arrays). The provider’s policy must apply not just to all storage services, but also to all services that store customer data (such as database as a service, or DBaaS).
Stateful network firewall: The provider must offer customers the ability to define rules that are associated with a virtual network or a subnet that serves as a stateful firewall for inbound and outbound traffic. This happens regardless of whether the traffic originates within or external to the platform. Furthermore, the customer must be able to define rules that are associated with a group of instances. Platforms may vary in how they define groups, but the customer must be able to control group membership via self-service. In addition, updating a rule must immediately apply the new policy across all instances in the group.
Stateless network ACLs: The provider must offer customers the ability to define an access control list that is associated with a virtual network or a subnet that serves as a stateless source, destination, or port packet filter for inbound and outbound traffic. This happens regardless of whether the traffic originates within or external to the platform. Furthermore, the customer must be able to define an access control list that is associated with a group of instances. Platforms may vary in how they define groups, but the customer must be able to control group membership via self-service. In addition, updating the ACL must immediately apply the new policy across all instances in the group.
Instance-independent ACLs and firewalls: When an ACL or firewall rule changes, there is no need to reprovision, reboot or shut down any impacted compute instances for the new policy to take effect.
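To make the stateless-ACL semantics concrete, the sketch below evaluates each packet against an ordered rule list with no connection tracking (in contrast to the stateful firewall criterion). Editing the `acl` list takes effect on the very next packet, mirroring the instance-independence criterion above. The rule fields and addresses are hypothetical.

```python
import ipaddress

# Illustrative stateless ACL: every packet is checked against an ordered
# rule list with no connection state. Rule fields are hypothetical.
acl = [
    {"action": "allow", "src": "10.20.1.0/24", "dst_port": 443},
    {"action": "deny",  "src": "0.0.0.0/0",    "dst_port": None},  # default deny
]

def filter_packet(src_ip, dst_port):
    for rule in acl:
        in_src = ipaddress.ip_address(src_ip) in ipaddress.ip_network(rule["src"])
        port_ok = rule["dst_port"] in (None, dst_port)
        if in_src and port_ok:
            return rule["action"]
    return "deny"

print(filter_packet("10.20.1.7", 443))     # allow
print(filter_packet("198.51.100.9", 443))  # deny
```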
Annual SOC 1 and SOC 2 reports: Providers must have an annual, completed Service Organization Control (SOC) 1 and SOC 2 audit report that customers and prospects can see on demand. Gartner allows providers to require a signed confidentiality agreement to meet this requirement. Customers should realize that SOC 1 focuses more on financial reporting controls, whereas SOC 2 focuses on a “business’s nonfinancial reporting controls as they relate to security, availability, processing integrity, confidentiality and privacy of a system.”
Published compliance assistance: Organizations have a wide variety of regulatory, privacy and compliance requirements that cross industry verticals and individual countries. The onus of adhering to individual compliance requirements always falls upon the shoulders of the application owner (i.e., the organization). Therefore, for any regulatory, privacy or compliance certification that it claims to support, the provider must offer published guidance that explains how the customer can ensure that individual applications or deployments adhere to the named certification. The provider must publish a consolidated listing of all such certifications, along with the supporting documentation.
Customer control over data locale residency: The provider must enable customers to control data placement on a regional basis. The platform design should clearly indicate where each service is physically hosted (specifically, the country and metropolitan area), so that customers can choose which locale to use. In each service, customers should be able to physically move data between locales, either through service interfaces or through management consoles. The provider must never arbitrarily move data across international boundaries. If a service explicitly distributes data across international boundaries for redundancy, customers must either opt in to such redundancy or be able to control which specific countries their data resides in.
ISO/IEC 27001 certification: The International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC) published an information security management standard referred to as ISO/IEC 27001:2013, often shortened to ISO 27001. ISO 27001 imposes specific requirements on organizations for the management of information security. Thus, IaaS providers must be ISO 27001-certified, and the certification must cover all worldwide data centres. For more information, refer to “ISO/IEC 27001:2013 Shifts Focus from the Effectiveness of Controls to Risk Treatment Plans.”
Customer data ownership: The provider owns the rights to the overall service and physical infrastructure. However, the provider’s general terms of service and enterprise agreements must specify that the customer owns the rights to all data, inputs and outputs associated with consumption of the service. The customer must retain ownership even through a provider acquisition or bankruptcy event. In the event of bankruptcy, closing of business or retirement of service, the provider must give customers 90 days to get their data out of the system and migrate applications.
Provider personnel protections: The provider must document the measures it takes to protect customer data from its employees and personnel. These measures must include background checks, logging of administrative access to the cloud service, and network separation between the cloud service and the general-purpose LAN of the provider company. It is acceptable for this documentation to be included in an audit report, such as Statement on Standards for Attestation Engagements (SSAE) 16 or ISO 27001, or to be published in a stand-alone security paper.
Initial administrative access credentials: The default administrative access for every deployed instance in the cloud service must be automatically generated or chosen at the point of provisioning by the customer. If automatically generated, the access credentials must be communicated to the cloud administrator through either an API service call or a visual alert in the management console. Passwords must never be emailed.
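A minimal sketch of provider-side credential generation, using a cryptographically secure random source, is shown below. The function name and length are assumptions for illustration; the essential points from the criterion are that the secret is generated at provisioning time and returned via an API response or console alert, never email.

```python
import secrets
import string

def generate_initial_credential(length=20):
    """Generate a random initial admin password at provisioning time.

    In a real platform this value would be returned through an
    authenticated API call or shown once in the console -- never emailed.
    """
    alphabet = string.ascii_letters + string.digits
    return "".join(secrets.choice(alphabet) for _ in range(length))

pwd = generate_initial_credential()
print(len(pwd))  # 20
```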
SSL-secured API endpoints: The platform’s customer-facing API endpoints must be secured with SSL. The SSL certificate must be signed by a commonly trusted root certificate authority (CA); it must not be a self-signed certificate, unless the provider is also a commonly trusted CA.
Multiple API keys per account: Providers must allow customers to generate and own multiple unique API keys.
Local identity management and granular role-based authorization — compute: Providers must include a local identity management system (i.e., local accounts) for compute services. The system must reside in both the service interfaces and the management console, and it must enable granular role-based authorization. At a minimum, the role-based authorization must be able to:
Assign authorization based on individual users and groups of users
Delineate the following granular actions, as applicable: create, delete, restart/reboot, shutdown, start-up, and backup/snapshot
Local identity management and granular role-based authorization — storage: Providers must include a local identity management system (i.e., local accounts) for storage services. The system must reside in both the service interfaces and the management console, and it must enable granular role-based authorization. At a minimum, the role-based authorization must be able to:
Assign authorization based on individual users and groups of users
Delineate the following granular actions, as applicable: create, delete, attach, detach and backup/snapshot/copy.
Local identity management and granular role-based authorization — network: Providers must include a local identity management system (i.e., local accounts) for network services. The system must reside in both the service interfaces and the management console, and it must enable granular role-based authorization. At a minimum, the role-based authorization must be able to:
Assign authorization based on individual users and groups of users and
Delineate the following granular actions, as applicable: create, delete, and configure.
Private image catalogue: The provider must allow customers to maintain a private image catalogue of instances that is not accessible to other cloud customers. Furthermore, the provider must offer basic role-based authorization into the catalogue, so that granular rights (e.g., create, deploy, and delete) can be handled on an image-by-image basis and a user-by-user basis.
Identity integration with AD: Providers must offer integration with on-premises Active Directory (AD) for cloud service account management. Providers must support importing and synchronizing users with the customer’s AD, and/or allowing users to authenticate with the customer’s directory credentials. Such functionality must be a service offering and not a customized customer solution.
Authentication via SSO using SAML: Providers must offer single sign-on (SSO) login access to the GUI management console through Security Assertion Markup Language (SAML) assertions. The platform must allow integration with one or more third-party SSO services.
MFA administrative access control: Providers must enable customers to control user access to the cloud service through multifactor authentication (MFA). MFA does not need to be the default configuration, but providers must present an offering for customers to secure user accounts. Examples of satisfactory MFA include hardware- or software-based secure tokens using either of the following algorithms:
Hashed message authentication code (HMAC)-based one-time password (HOTP; RFC 4226)
Time-based one-time password (TOTP; RFC 6238)
DDoS mitigation: Providers must offer a distributed denial of service (DDoS) mitigation service. DDoS attacks that originate from the internet must automatically be detected and mitigated for all customers across the platform.
Network forensics: The provider must enable customers to log metadata about network traffic that is permitted and denied by ACL and firewall services. The provider must also give customers access to metadata about traffic that is automatically blocked by the platform (because cloud providers may automatically stop clearly malicious traffic for all customers). Customers must have self-service access to the data for analytics and historical trending as follows:
The data must be available in real time at an interval of 10 minutes or less.
The data must be consolidated and accessible from a dashboard in the management console.
The customer should be able to export the data for external analysis.
Network forensics monitoring service integration: The provider’s network forensics service must integrate with the provider’s monitoring solution to allow alarms to be created.
Data center dispersion: Providers must have at least three data centres that are a minimum of 200 kilometres (125 miles) apart from one another. These data centres must be on different power grids. Offering multiple data centres with this level of dispersion allows enterprise customers to develop availability options that can sustain local-outage-causing issues like storms.
Data center proximity: Providers must also have at least two data centres (per region) that are within 100 kilometres (60 miles) of each other to support synchronous replication requirements. This is the maximum distance for which the latencies are low enough to accomplish synchronous replication. Furthermore, customer networking must be able to span across these local data centres so that customers may build highly available architectures.
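The 100-kilometre bound can be sanity-checked with a back-of-envelope calculation. Assuming light in fibre travels at roughly 200,000 km/s (about 5 microseconds per kilometre one way, an approximation, and ignoring switching and serialization delays):

```python
def fibre_rtt_ms(distance_km: float) -> float:
    """Approximate round-trip time over fibre, assuming ~200,000 km/s."""
    one_way_ms = distance_km / 200_000 * 1000
    return 2 * one_way_ms

print(fibre_rtt_ms(100))   # 1.0  -> ~1 ms RTT: workable for synchronous replication
print(fibre_rtt_ms(2000))  # 20.0 -> far too slow for synchronous writes
```

At 100 km the propagation delay alone adds about a millisecond per replicated write, which is why the criterion caps synchronous-replication pairs at this distance.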
Multiple continents and data centres: Providers must include service offerings to satisfy high availability (HA) and disaster recovery (DR) on multiple continents and for multinational customers. At a bare minimum, the provider must have at least:
Two distinct geographic offerings in the United States (e.g., East Coast and West Coast)
One offering in Europe
One offering in Asia/Pacific (not including China)
Furthermore, for each geographic service offering, the implementation must include at least two separate, distinct hosting locations (e.g., data centres or zones). Within a geographic locale, IT organizations must have the ability to distribute applications across two or more physical data centres.
Flexibility to negotiate custom terms in a service/cloud-hosting agreement: IT organizations often begin using an IaaS cloud solution based on a publicly accessible terms-of-service agreement. However, most of the same organizations eventually desire a more robust enterprise agreement for cloud hosting. Providers must offer such customers the ability to negotiate a custom agreement. To qualify for this criterion, the provider must offer a request process by which a customer can initiate a custom-negotiated agreement, and the provider must be willing to materially alter the terms of the standard agreement.
Published architecture transparency: Customers need to assess the risk of using a cloud IaaS environment, as well as the impacts of migration, compliance, licensing, configuration and performance. To help customers with this assessment, providers must publish certain infrastructure details to customers. Such details must include:
The region and country of each data center location (e.g., Northern California or Western Germany)
Data center availability configurations for power, cooling and network
The underlying hypervisor product/technology
A description of what constitutes a CPU at the provider
Any variability, oversubscription or throttling of CPUs
Data resiliency, protection, replication or durability strategies — essentially, how often data is replicated and protected by the provider automatically on behalf of the customer
Enterprise customer case studies: Case studies and customer references are an important aspect of evaluating providers. The provider must have a repository of customer case studies published on its website. The list must include at least three enterprises with annual revenue of $1 billion or more. The repository must also include at least one case study from each region of the world in which the provider has a cloud presence.
Published reference architecture: The provider must maintain a published list of reference architecture diagrams and descriptions on its website. This material should be a set of recommended patterns that customers can emulate when deploying common scenarios. The provider must have at least five reference architecture examples. Furthermore, the examples must be inclusive and illustrative of combining multiple services (e.g., compute, storage and networking) to solve a common customer problem with the cloud service.
Relational DBaaS: The provider must offer a relational database as a service, provided as a fully automated, self-service turnkey offering. In this service, the customer should not have access to the underlying instance, and the database maintenance must be done entirely by the provider. At a minimum, the service must support one open-source database (either MySQL or PostgreSQL) and one enterprise database (either Microsoft SQL Server or Oracle).
In-memory caching: In-memory caching enables customers to optimize performance for read operations, as well as maintain important availability information such as session affinity and state. The provider must offer an in-memory caching service that is delivered via a self-service, turnkey offering. The customer should not have access to any underlying infrastructure running the caching service, and maintenance must be done entirely by the provider. The service should support standard specifications (such as Memcached and Redis).
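The behaviour the criterion describes can be sketched as a Memcached-style key/value store with per-entry expiry. The class below is a simplified illustration, not the API of any provider's caching service; a real service adds eviction policies, distribution and replication.

```python
import time

class TTLCache:
    """Minimal in-memory cache with per-entry expiry (illustrative only)."""
    def __init__(self):
        self._store = {}

    def set(self, key, value, ttl_seconds=60):
        self._store[key] = (value, time.monotonic() + ttl_seconds)

    def get(self, key, default=None):
        entry = self._store.get(key)
        if entry is None:
            return default
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._store[key]        # lazily evict expired entries
            return default
        return value

cache = TTLCache()
cache.set("session:42", {"user": "alice"}, ttl_seconds=30)
print(cache.get("session:42"))  # {'user': 'alice'}
```

The session-affinity use case mentioned in the criterion maps directly onto this pattern: session state keyed by session ID, with a TTL matching the session lifetime.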
Support and Service Levels
24/7 support with 15-minute response: Providers must offer at least one support service tier that includes:
24/7 incident coverage
Incident response times of 15 minutes or less
Many IT organizations require this tier of support to align with internal SLAs and operational-level agreements (OLAs) for production and critical workloads.
TSANet membership: Providing IaaS is a complex initiative that includes integrating many different pieces of IT hardware and software. Furthermore, IT organizations marry internal hardware and software components with various cloud services. When problems arise, it is challenging for providers or IT organizations to gather the various companies together to solve complex issues. Technical Support Alliance Network (TSANet) offers organized collaboration across vendors in order to solve complex problems. The provider must be a member of TSANet.
Live support offering (English): Providers must offer live support via phone and instant messaging. Furthermore, the live support must be offered in English. Native languages for each hosting location are covered in the Preferred category.
Free online self-service support: Providers must offer free, online self-service support that includes FAQs, a knowledge base and discussion forums. The discussion forums must have evidence of regular participation and moderation from cloud service support staff.
Online error/bug reporting: Providers must offer all customers an online-based mechanism for reporting errors or bugs with the service. Providers must offer the mechanism through an online process posted on their public cloud service websites or through their service management platform.
One-year parallel support for API changes: In the event of an API retirement, upgrade or substantial change, providers must guarantee in writing at least one year of parallel support for both the old and the new APIs.
Cloud service partner registry: Providers must have an online listing of established and official partners that offer value-added functionality, such as increased security, management, integration, audit, control or configurations. The partner list must be organized and broken down by the service provided.
Dedicated account manager offering: Providers must offer at least one support service tier that includes a dedicated account manager. Large and complex IT organizations prefer a concierge-level support experience for production and critical workloads, as well as a direct relationship with an account manager who understands their unique requirements and business challenges.
Cloud offboarding support: Providers must enable customers to completely manage all existing assets, deployments and spend with the provider. The service must allow customers to terminate all cost-accruing assets at once, rather than force them to terminate each asset individually. Due to the high-impact nature of this action, providers may inject a confirmation step into the workflow (such as a phone call or an email to verify that the action is desirable).
Published SLAs for all generally available services: Providers must publish written SLAs, on their website, for all products and services that are generally available and have published fees.
90-day SLA change notice: In the event of an SLA change in language, exclusions, terms or measurements, providers must provide at least 90 days of advanced notice before implementing the change. However, an advanced notice is not required if the SLA measurement change is advantageous to the customer (for example, availability increases from 99.9% to 99.99%). To qualify, the provider must define the SLA change notice in the terms of service.
SLA versioning and revision history: Providers must make the SLAs for standard services accessible for review at any time. Providers must include versioning control as well as revision history. Enterprise customers need the ability to see the revision history of all SLA changes for proper auditing and continuous assessment.
60-day service health and SLA history: Providers must offer a dashboard or snapshot of service health and standard SLA status for customers to view at any time. The dashboard must contain at least 60 days of trailing health history so that customers have more than one billing period to review current health and SLA status against the prior month’s billing reports.
Immediate downtime calculation: Customers require that downtime calculations begin immediately when the downtime starts. However, it is acceptable for up to five minutes to pass before reporting engines take notice of the downtime. Delays longer than five minutes are not acceptable for this criterion.
Compute service availability SLA — 30 minutes: The cloud compute service must offer at least one tier of service that has a maximum allowed outage time of less than or equal to 30 minutes per month. Measured monthly, an uptime availability SLA of 99.95% or higher achieves the threshold of 30 minutes per month.
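The arithmetic behind the 30-minute threshold is straightforward: over a 30-day month (43,200 minutes), a 99.95% uptime SLA permits 21.6 minutes of downtime, which clears the bar, whereas 99.9% permits 43.2 minutes, which does not.

```python
def allowed_downtime_minutes(sla_percent: float, days: int = 30) -> float:
    """Maximum monthly downtime permitted by a given availability SLA."""
    total_minutes = days * 24 * 60          # 43,200 minutes in a 30-day month
    return total_minutes * (1 - sla_percent / 100)

print(round(allowed_downtime_minutes(99.95), 1))  # 21.6 -> within the 30-min bar
print(round(allowed_downtime_minutes(99.9), 1))   # 43.2 -> fails the 30-min bar
```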
Single-instance/single-data-center availability SLA: Providers must offer and document a cloud compute single-instance or single-data-center SLA. This SLA must be applicable for a single instance or a single data center and cannot contain any requirements mandating the use of multiple instances or multiple data centres. Customers should note that the availability percentage for a single instance could be considerably less than the availability SLA for multiple/pooled instance sets.
Storage service availability SLA — 30 minutes: The cloud storage service must offer at least one tier of service that has a maximum allowed outage time of less than or equal to 30 minutes per month. Measured monthly, an uptime availability SLA of 99.95% or higher achieves the threshold of 30 minutes per month.
Unlimited service credits/refunds: During the time that a service is unavailable, customers must not be charged for the use of that service. SLA penalties must be in the form of service credits or refunds, and must not cap at less than 100% of the previous month’s bill.
Notification window of at least two billing cycles for customers to submit an SLA miss: Providers must grant customers at least two billing cycles worth of time to submit a claim after the occurrence of an SLA miss. Enterprise IT customers have complicated organizational structures, including accounts receivable/payable, procurement, operations and architecture. Such customers need at least two billing cycles to submit SLA miss claims to a cloud service provider.
No maintenance downtime exceptions in the SLA: Providers must count all non-customer-initiated downtime events as outages, no matter how the downtime occurs. This means that any scheduled, announced, planned, unplanned or malicious events all count against documented SLAs.
Technical certification program: Providers design environments differently and utilize different underlying architectures and configurations. Therefore, providers must offer a technical certification for consumers or administrators of the service. The technical certification must involve an exam or test to obtain certification, and it may or may not involve instructor-led or individual-led study.
MSP and consulting ecosystem: Providers must have an online listing of established and official managed service providers (MSPs) and consultants that offer customized services, such as architecture, design, implementation, integration or support. The partner list must be organized and broken down by the services provided.
Management and DevOps
GUI management console support: Providers must offer a rich, configurable, web-based GUI management console for interacting with the cloud service. Providers must demonstrate that most critical cloud service functions are represented in the GUI management console and that — within 180 days of release — all new services are represented in the GUI management console. Finally, the web-based GUI console must support current versions of Chrome, Firefox, Internet Explorer and Safari, as well as mobile browsers in leading Android and iOS tablets (e.g., mobile Safari and WebKit).
Self-service incident logging system: Providers must offer an incident management system for identifying, submitting and tracking cloud service incidents. The system must be available online and be accessible via API to paying customers. It must also include the capability to submit incidents and track incident statuses and updates.
Metadata tagging of resources: Providers must offer customers the self-service ability to set metadata tags on all of their assets. The provider must support the ability to attach at least seven tags per asset, with no limit on the total number of tags per customer. All tags must be searchable and filterable within all service interfaces.
Forced tagging: The platform must enable customers to mandate, within the platform, that all taggable resources be provisioned with a tag. The system must force provisioning to fail with an error if a resource does not have a tag specified by the provisioning request (regardless of whether the request is made via the UI, CLI or API).
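The enforcement behaviour described above can be sketched as follows. This is a minimal, hypothetical model (the tag key, function names and error message are illustrative, not any provider's actual API); the point is that provisioning fails with an error whenever a mandated tag is absent, whatever interface the request arrived through.

```python
# Hypothetical forced-tagging enforcement model (illustrative only).
REQUIRED_TAG_KEYS = {"cost-centre"}  # tag keys the customer has mandated


def provision(resource_name, tags):
    """Fail provisioning with an error if a mandated tag key is missing,
    regardless of whether the request came via the UI, CLI or API."""
    missing = REQUIRED_TAG_KEYS - set(tags)
    if missing:
        raise ValueError(
            f"provisioning failed: missing required tags {sorted(missing)}")
    return {"name": resource_name, "tags": dict(tags)}
```
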
Self-service templating: Providers must enable customers to create infrastructure templates. A template in this scenario is a blueprint that allows a collection of different resources to be provisioned together. These resources may include compute instances, storage volumes, network elements, monitoring configurations and security configurations. The customer must be able to create, store and provision from templates built in the cloud service. A template is not just the combination of compute instance size with a particular image. It is a build manifest that provisions multiple elements, and it must do more than just specify how to provision one or more compute instances.
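To make the "build manifest" distinction concrete, the following sketch shows a hypothetical template that provisions several element types in one operation (all field names and sizes are illustrative assumptions, not a real provider's template format):

```python
# Hypothetical infrastructure template: a build manifest covering compute,
# storage, network and monitoring elements, not just one compute instance.
TEMPLATE = {
    "name": "three-tier-web",
    "compute": [{"size": "m.large", "image": "ubuntu-18.04", "count": 2}],
    "storage": [{"type": "block", "size_gb": 100}],
    "network": [{"type": "load_balancer", "port": 443}],
    "monitoring": [{"metric": "cpu", "alert_threshold": 80}],
}


def provision_from_template(template):
    """Expand a template into the individual resources to be created."""
    resources = []
    for kind in ("compute", "storage", "network", "monitoring"):
        for spec in template.get(kind, []):
            resources.append({"kind": kind, **spec})
    return resources
```
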
Real-time performance-monitoring service: Providers must offer a real-time performance-monitoring service. It must be accessible from a service interface or the management console, and it must not require customers to build their own performance-monitoring hooks or integration points. It must support monitoring of the following:
Compute metrics (such as CPU utilization, memory utilization, disk I/O and network I/O)
Storage metrics (such as utilization, IOPS and queue length)
Real-time performance health checks, thresholds and alerts: Providers must enable customers to monitor the general health of their compute, storage and network infrastructure for failures, and to receive an alert if a failure occurs. Providers must enable customers to receive alerts based on performance-monitoring thresholds, and the alerts must be generated within one minute of a triggered threshold. Furthermore, customers must be able to define at least three different performance thresholds for which to be alerted, and customers must be able to assign these thresholds on a component-by-component basis (e.g., instances, storage or networks). At a minimum, the provider must support email and SMS alerts and is encouraged to support URL triggers.
API access to monitoring data: Providers must expose their monitoring-service data to customers via a self-service API. This functionality allows customers to programmatically integrate cloud monitoring data with in-house or third-party monitoring systems, or to export monitoring data for analysis. The API must support real-time querying of the following:
Current health status (up, down, degraded and so on)
Current metrics (the most recent values of monitored metrics)
Historical data for both individual elements and all monitored elements (thus supporting bulk data export)
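The three query types above can be modelled with a toy in-memory store (the element names, metric shapes and function names are illustrative assumptions; a real monitoring API would differ):

```python
# Toy model of the monitoring queries this criterion requires.
monitoring_data = {
    "vm-1": {"status": "up", "metrics": {"cpu": [41.0, 55.5, 62.3]}},
    "vm-2": {"status": "degraded", "metrics": {"cpu": [88.1, 91.4, 97.0]}},
}


def current_health(element):
    """Current health status (up, down, degraded and so on)."""
    return monitoring_data[element]["status"]


def current_metric(element, metric):
    """Most recent value of a monitored metric."""
    return monitoring_data[element]["metrics"][metric][-1]


def bulk_export():
    """Historical data for all monitored elements (bulk data export)."""
    return {e: d["metrics"] for e, d in monitoring_data.items()}
```
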
Account management logging: Providers must offer logging of account management activities, including user create/delete events, user grouping/tagging events and miscellaneous account events (such as assigning users to roles, assigning quotas to users and changing user passwords). Customers must have access to these logs through self-service interfaces. Logs should be provided by default for at least three months, and customers should be able to export these logs for longer retention.
Provisioning and catalog action logging: Providers must offer logging of provisioning and catalog actions. In this case, “provisioning” refers to both the provisioning (creation) and deprovisioning (termination) of infrastructure elements. The log service must support the following:
Logging provisioning events for compute instances
Logging provisioning events for storage volumes
Logging changes made to a customer’s image catalog (such as adding, deleting or updating an image)
Logging changes made to a customer’s template catalog (such as adding, deleting or updating a template)
Customers must have access to these logs through self-service interfaces. Logs should be provided by default for at least three months, and customers should have the ability to export these logs for longer retention.
Security configuration logging: Providers must offer logging of security configurations, including changes in network ACLs and changes in firewall configurations. Customers must have access to these logs through self-service interfaces. Logs should be provided by default for at least three months, and customers should have the ability to export these logs for longer retention.
Configuration management: Configuration management tools, such as Puppet, Chef, Ansible and SaltStack, have become increasingly popular. The platform must embed one (or more) of these tools as a fully standardized service. For instance, if the provider offers a Puppet service, the customer does not need to operate or maintain a Puppet server and is able to self-service usage. Note that this is not a managed service (where Puppet is installed and managed for each customer with the service); providers may wrap or extend the configuration management tool with additional capabilities, if they so choose.
Task scheduler: The platform must provide a scheduling service that allows customers to run tasks at scheduled times and recurring intervals. At minimum, the service must:
Be programmatically configurable
Be able to directly invoke an arbitrary customer-chosen URL for task execution
Allow the customer to configure automatic retries of failed tasks
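The retry requirement above can be sketched as follows (a hypothetical model: `max_retries` and the flaky task are illustrative; a real scheduler would invoke the customer-chosen URL over HTTP rather than a local callable):

```python
# Sketch of a task scheduler's automatic-retry behaviour.
def run_with_retries(task, max_retries=3):
    """Invoke `task` (e.g. an HTTP call to a customer-chosen URL) and
    retry on failure, up to max_retries additional attempts."""
    attempts = 0
    while True:
        attempts += 1
        try:
            return task(), attempts
        except Exception:
            if attempts > max_retries:
                raise


calls = {"n": 0}


def flaky_task():
    """Simulated task that fails twice, then succeeds."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient failure")
    return "ok"
```
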
Historical-performance monitoring: Providers must offer customers historical-performance data of services. At a minimum, the provider must be able to support up to three months of historical-performance data with at most five-minute collection intervals. The provider can offer this as an add-on service or require that the customer use cloud-based storage to store the historical data. The service should also support standard charts to show historical performance graphically, and should support all standard metrics in the real-time performance monitoring service. Finally, providers must enable customers to export the performance data for longer-term storage.
Custom monitoring metrics: The platform must offer a method by which customers can send custom metrics to the monitoring service, which will create an alert, just like it does for any other metric. Providers may require that a monitoring agent be installed onto the assets being monitored.
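A minimal sketch of this behaviour, under the assumption that a customer-supplied metric flows through exactly the same threshold-and-alert path as a built-in one (metric name and threshold are illustrative):

```python
# Hypothetical model: custom metrics alert just like standard metrics.
thresholds = {}
alerts = []


def define_threshold(metric, limit):
    thresholds[metric] = limit


def put_metric(metric, value):
    """Accept a customer-sent metric and alert when its threshold is crossed,
    exactly as the monitoring service does for built-in metrics."""
    if metric in thresholds and value > thresholds[metric]:
        alerts.append((metric, value))


define_threshold("queue_depth", 100)  # custom, application-level metric
put_metric("queue_depth", 42)         # below threshold: no alert
put_metric("queue_depth", 150)        # above threshold: alert generated
```
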
Price and Billing
Flexible payment options: Providers must offer credit card billing and one other form of enterprise billing, such as a purchase order or invoice. Providers must offer at least 30 days for the proper turnaround of enterprise invoice payment. Large organizations with complex accounts payable departments require ample time to process transactions through their systems.
Detailed billing: Providers must allow customers to download or receive electronic detailed bill reports that list costs on a line-by-line basis. This bill must be in a format that can be programmatically imported for analysis, such as CSV format. The detailed bill must be broken down by each individual billable item so that organizations can perform analytics on which cloud assets are contributing to cloud costs.
Consolidated billing: Providers must allow customers to consolidate multiple bills from the provider into a single bill. This is common when customers have multiple accounts with the provider.
Granular billing based on group/tag: Providers must offer customers bills/invoices that are broken down based on groups or metadata tags determined by the customer. This assistance greatly improves readability and internal cost allocation for large organizations.
Billing alerts per user and per account: The platform must enable the customer to configure a billing alert that automatically generates a notification to the customer when a certain threshold has been exceeded. At minimum, a platform with billing alerts should generate such notifications for account-level consumption and for user-level consumption.
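The two notification scopes can be sketched in a few lines (a hypothetical model; the limits and the notification strings are illustrative assumptions):

```python
# Sketch of account-level and user-level billing alerts.
def billing_alerts(spend_by_user, account_limit, user_limit):
    """Return notifications for each user over user_limit, plus one for
    the account as a whole if total spend exceeds account_limit."""
    notes = [f"user {u} exceeded {user_limit}"
             for u, s in spend_by_user.items() if s > user_limit]
    if sum(spend_by_user.values()) > account_limit:
        notes.append(f"account exceeded {account_limit}")
    return notes
```
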
Point-in-time visibility into cost accrual: Providers must enable customers to see financial consumption per service with at least 24-hour accuracy. Gartner does not yet expect providers to offer up-to-the-minute visibility. However, customers must have at least a day’s visibility into how much spend they’ve accrued so that they can scale up or down as necessary to meet financial targets during the billing period. Finally, providers must be able to generate email or SMS alerts based on customer-defined financial thresholds.
Publicly available cost calculator/simulator: Providers must have a generally available cost calculator or simulator so that customers and prospects can forecast the amount of financial spend a specific use case or project might generate. The calculator or simulator must account for all cloud costs that a customer might consume. Providers must offer this tool as a web-based calculator or as a downloadable Microsoft Excel spreadsheet with built-in instructions, formulas and logic.
Marketplace offering: Providers must offer and maintain a marketplace offering, whereby software and hardware OEMs can create prebuilt templates and images and offer them as self-service offerings for customers to deploy into their cloud environments. The marketplace must be centrally billed by the cloud provider and be broken down into categories based on the functionality provided by the hardware/software.
Discounts: Providers must offer a discount program for cloud instances. The level of this discount may vary, but the provider should document any fees or commitments to qualify, as well as the discount applied to the ongoing rate through the lifetime of the cloud instance. Examples of these discounts include, but are not limited to, prepayment, continued use and large-volume use.
Preferred Feature Set
Compute
Explicit host affinity: In contrast to the “host anti-affinity” requirement, some use cases require that two or more VMs explicitly reside on the same host. This assurance is useful to organizations that need to improve performance, minimize latency, or maximize security (e.g., between an application server and a database server). Therefore, the cloud service must support a self-service interface where clients can specify which instances must share a physical host.
Extra-large instance support: Providers must offer instances with a very large number of processor cores and a very large amount of memory for processor- or memory-intensive use cases. The provider must be able to provide instances that support at least 32 CPUs and 128GB of RAM.
Hot-swappable virtual hardware: Providers must enable customers to “hot swap” (i.e., resize, add, or remove) core virtual resources on the virtual machine. These resources include processors, memory, disk, and network configurations. Providers must not force customers to reprovision a new VM with new characteristics. Providers can offer this functionality in a granular manner, with controls for each core resource, or through prebundled “sizes” of VMs in the service catalogue. Customers should note that some guest OS families will require a guest reboot.
Dynamic vertical auto scaling: The cloud service must provide functionality to automatically resize CPU and RAM on existing VMs based on triggers.
Restart priority: For providers that support VM restart, customers must be able to choose the order in which instances are restarted, or to assign each instance a priority level that determines the order in which the platform restarts instances.
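The ordering behaviour can be illustrated in a couple of lines (hypothetical instance names and priority scheme; here a lower number restarts first):

```python
# Sketch: restart instances in customer-assigned priority order.
def restart_order(instances):
    """Given {instance_name: priority}, return names in restart order
    (lowest priority number first)."""
    return [name for name, prio in sorted(instances.items(), key=lambda kv: kv[1])]


instances = {"app-1": 2, "db-1": 1, "cache-1": 3}  # database restarts first
```
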
Automatic host anti-affinity: When a customer provisions a new VM, the default placement algorithm automatically attempts to place the VM on a physical host that isn’t used by any of the customer’s other VMs. Thus, a customer’s VMs are distributed among as many different physical hosts as possible (but are not guaranteed to be on different physical hosts).
VM console access — basic access: The customer can, via the portal, access the local console of the instance. Although the specific implementation and protocol used for the console may vary, it must emulate the local video and accept mouse and keyboard inputs. It must include access to basic input/output system (BIOS) and OS load/boot procedures. Finally, the console must work from popular web browsers like Firefox, Google Chrome, Microsoft Internet Explorer and/or Safari.
Single-tenant compute VMs: Single-tenant compute VMs ensure that instances reside on a physical host that is not shared with any other customer. For this specific offering, storage and network may be shared or isolated. If this capability is not directly integrated into the fully multitenant service, then customers must have the option to stretch a virtual LAN (VLAN) or subnet across both environments and share Internet Protocol (IP) address spaces with the multitenant service. Providers can provide and price the dedicated or isolated offering separately from the multitenant offering.
Compute performance baseline: Performance benchmarks across the IaaS industry are not well-defined or standardized. However, enterprise customers must know what is “expected” or “normal” in terms of a compute performance baseline at their provider. Providers must annually publish a documented performance test scenario/script against three or more of the most popular instance sizes and across both a standard Linux configuration and a standard Windows configuration. The results of the baseline provider test must also be published, along with a statement indicating an acceptable standard deviation from the baseline. Customers must have access to the test script and suite, thereby making it easy to repeat or run the test scenario on their own.
Subminute provisioning times: At least one version of a Linux image must be able to provision, boot and go live in less than one minute. This image must have the following minimum specifications: 1 CPU, 2GB of RAM and 40GB of persistent storage.
Provider-offered Linux distribution: The provider must offer and maintain a specific Linux distribution/image that is optimized, tuned and pre-integrated (drivers and software) for use within the cloud platform. The provider must update the Linux image on a quarterly basis with security updates and configurations, so that it stays current with other cloud platform features. Furthermore, the provider must support this Linux distribution for customers that opt into the support offerings of the cloud service.
Backup service: The provider must offer a backup service that facilitates backup and restore of the following:
VMs and storage volumes attached to current VMs
Detached storage volumes hosted on the public cloud IaaS provider’s storage platform
The service must support application-consistent backups, store backups on the provider’s platform and replicate backups between regions. It must be able to restore data to both the original location and the replication region. The backup service must support file-level restores and snapshot restores. Customers should not be required to install their own backup software or agents, nor to leverage third-party products to accomplish this.
Managed container service: Providers must offer a managed container service that is compatible with Docker and that includes a scalable, highly available and monitored container management infrastructure. The service must allow customers to:
Launch containers that are automatically scheduled and placed into a pool of infrastructure based on resource needs (such as CPU, memory, and availability)
Subsequently manage the containers
Container Linux: In its catalogue of OS images, the provider must offer the current or N-1 LTS version of one commonly used container-oriented Linux distribution, such as CoreOS or Red Hat Atomic Host.
Block Storage Service Criteria
Block storage interconnect transparency: The interconnect between storage services and the rest of the cloud service can be implemented in a variety of ways and with different characteristics. This architecture can have a dramatic effect on the success or failure of performance requirements. Therefore, the provider must document the storage interconnect architecture to help customers understand performance implications. This documentation should include the following key points:
The explicit, average latency between compute instances and block storage
The use of software, such as rate-limiting or traffic-shaping software, to avoid storage contention and starvation, and the relevant effect such software has
Multiple instance mount: Customers must be able to simultaneously mount a block storage volume on multiple compute instances. It is acceptable for these volumes to be in read-only mode.
Performance target/tier block storage: Customers must have the option to purchase an explicit performance target or performance tier on block storage volumes, such as a certain number of I/O operations per second (IOPS) or a certain amount of throughput (measured in Mbps). Because of limitations in present-day technology, the target need not be an absolute guarantee, but the provider should document how much variance is typical in the target.
Encryption at rest and in motion: A customer must be able to encrypt data at rest on the block storage volume. If the customer chooses encryption at rest, the data must also be automatically encrypted in motion. Encryption must be a simple, self-service option that customers can select when provisioning an instance.
Automatic snapshot management: The customer must be able to configure a snapshot management policy that automatically takes a snapshot of a block storage volume based on a schedule or a time interval. The policy should also enable customers to configure continuous automatic replication of the backed-up snapshots to other regions.
Object Storage Service Criteria
Object life cycle management policy: The customer must be able to set time-based policies on objects, allowing actions to be taken automatically as objects age. At minimum, this feature must support automatically deleting objects that are over a certain age.
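The minimum required behaviour, age-based automatic deletion, can be sketched as a policy function (a hypothetical model; the object keys, the 30-day limit and the function name are illustrative assumptions):

```python
# Sketch of an age-based object life cycle policy.
from datetime import datetime, timedelta


def apply_lifecycle(objects, max_age_days, now=None):
    """Given {key: created_timestamp}, return the objects that survive the
    policy; objects older than max_age_days are deleted automatically."""
    now = now or datetime.utcnow()
    cutoff = now - timedelta(days=max_age_days)
    return {key: created for key, created in objects.items() if created >= cutoff}
```
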
CSG: The provider must offer a cloud storage gateway (CSG) that can be placed at the customer’s on-premises data center as a physical and/or virtual appliance. The provider may either provide its own CSG or incorporate a partner’s CSG as part of the cloud service (i.e., the provider sells, ships and supports the partner’s CSG). Beyond simple file movement, the CSG must support some level of compression, acceleration, and/or encryption capabilities.
Tiered storage services: The provider must offer lower-tier (e.g., less redundant) storage services for use cases like long-term archival. The provider must facilitate data transfer to the lower tier from both standard tiers and outside sources. Compared with the traditional tiers of storage services, Gartner expects the lower-tier services to have different SLAs, pricing and performance (specifically, access times).
File Storage Service Criteria
File storage service snapshots: The provider must offer a snapshot feature. Customers must be able to configure and schedule various snapshot intervals and retentions. Customers must be able to make these configurations from the management console and via API.
File storage service cross-geography replication: For its file storage service, the provider must offer a replication service offering that traverses regional boundaries. This offering must be a service customers can opt into for geographic/global data protection, and the customer must have control over country placement. Due to the distance between regional boundaries, this replication is assumed to be asynchronous.
Networking
Five virtual networks: Providers must support five or more virtual networks or routing segments simultaneously per customer and per geographic region. Simple application architectures require only one or two virtual networks, but more complex architectures with several applications often require five or more virtual networks. These networks must be standard offerings and must be accessible via self-service interfaces.
Inter-region private WAN: The provider must enable customers to communicate between all of its data centres via a private network. This service offering must route data-center-to-data-center traffic, including traffic across geographic regions, over a private network. This private network can be logically private and does not have to include private fibre. However, it cannot use the internet, even if the traffic is tunnelled.
Local load balancing — independent IP address: The inbound IP address of a load-balancing group can be static and independent. Therefore, the customer can choose to explicitly move this IP address to a different load-balancing group (which can be used to facilitate seamless application upgrades and similar scenarios).
LAN performance target: Customers must have the option to purchase an explicit network performance target or performance tier within the cloud service, such as a certain guarantee of latency, jitter, packet loss or bandwidth throughput. Because of limitations in present-day technology, the target need not be an absolute guarantee, but the provider should document how much variance is typical in the target.
Multiple private-network connections per virtual network: The cloud provider must support two or more private network connections, such as VPN, MPLS and direct connect, per virtual network. For example, a cloud customer must be able to connect two VPN connections into a single virtual network simultaneously. If the provider supports only one private network connection per virtual network, the provider must support internal network security ACLs/firewalls between virtual networks and the ability to route across the virtual networks.
WAN traffic encryption: The provider must encrypt all WAN traffic between cloud data centres, regardless of what protocol is used for inter-data-center connectivity.
LAN traffic encryption: Recent revelations of government surveillance have increased customer concerns about data privacy in the cloud, raising the feature roadmap priority of end-to-end encryption for many service providers. To meet this criterion, the provider must encrypt all LAN traffic between compute instances within the data center.
Real-time network performance visibility: The provider offers a customer-visible dashboard that shows real-time network performance conditions (throughput and latency) between services within the platform, as well as historical metrics for at least the past 30 days. At minimum, the dashboard must show the average latency between items located within the same data center, including:
At least one type of compute-associated data store (such as the block storage service)
WAN performance target: Providers must optimize the network traffic performance across regions to achieve the following sustained metrics between two regions:
Ping round-trip time (RTT) less than or equal to 75 milliseconds (ms)
Transfer of a 1GB file over Secure Copy (SCP) at a speed greater than or equal to 2 MB/sec
To validate these metrics, tests should be run from one United States region to one European region and from one United States region to one Asia/Pacific region.
DNS service: The provider must offer DNS as a service. The customer should have the option of designating this DNS service as an authoritative (i.e., primary, or secondary) DNS for a zone. Furthermore, if the customer uses the provider’s IP addresses, the customer should be able to, as a self-service capability, create reverse DNS records for those IP addresses.
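The reverse DNS capability mentioned above amounts to generating a PTR record name from an IP address. Python's standard library does this directly; only the example address below is illustrative:

```python
# Constructing the reverse DNS (PTR) record name for an IP address.
import ipaddress


def ptr_name(ip):
    """Return the in-addr.arpa / ip6.arpa name a PTR record would use."""
    return ipaddress.ip_address(ip).reverse_pointer
```
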
Security and Access
Tiered firewall functionality: Providers must support the following tiered firewall features:
Firewall policy hierarchy: Customers must be able to implement levels of firewall policies. This is helpful for implementing overarching firewall policy restrictions that other administrators cannot override.
Multiple firewall policy assignment: Customers must be able to assign three or more firewall policies simultaneously to an instance or group of instances.
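Both tiered features can be modelled together (a hypothetical evaluation scheme, not any provider's implementation): policies are evaluated highest level first, so an overarching deny cannot be overridden by a lower-level allow, and an instance carries three policies at once.

```python
# Sketch of firewall policy hierarchy with multiple policy assignment.
def evaluate(port, policies):
    """Policies are ordered highest level first; the first explicit decision
    for a port wins, so overarching restrictions cannot be overridden."""
    for policy in policies:
        if port in policy:
            return policy[port]
    return "deny"  # default-deny when no policy mentions the port


org_policy = {23: "deny"}                   # overarching: telnet always blocked
team_policy = {443: "allow", 23: "allow"}   # lower level tries to re-open 23
app_policy = {8080: "allow"}

instance_policies = [org_policy, team_policy, app_policy]  # three at once
```
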
Directory services: The provider must offer an AD-managed service or a managed directory service that is compatible with AD. This is not directory federation. Providers must enable customers to host their directory services on a fully managed and scalable platform that supports, at a minimum, the following:
Domain-joining of Windows and Linux instances
User accounts and group memberships
API support for federated authentication: Providers must enable identity federation for service interfaces (i.e., APIs), and provide documentation for customers to implement the federation. Organizations that build new applications within an IaaS environment prefer to leverage identity federation through service interfaces into the application. To build this functionality, customers need more than federated SSO to the management console; they need federated authentication into the service interfaces underlying the cloud infrastructure. Due to the complexity of integrating cloud service interfaces with federation, Gartner does not require that this federation be implemented with leading open standards.
WAF: The platform must provide a web application firewall (WAF) as a service. A WAF is intended to protect applications that are accessed via HTTP and HTTPS against attack. WAFs focus primarily on web server protection at Layer 7 (the application layer). They mitigate against classes of “self-inflicted” vulnerabilities in configured commercial applications or in custom-developed code that make web applications subject to attacks. However, WAFs may also include safeguards against some attacks at other layers.
Cloud security guideline matrix: Many different security standards and guidelines exist in the industry. IT organizations deem it important for public IaaS providers to respond to one or more of the leading standards, rather than to every cloud security document in existence. Providers must create a document detailing how their cloud service meets or does not meet every detail from one of the following cloud computing security guidelines:
Cloud Security Alliance (CSA) — “Security Guidance”
Federal Risk and Authorization Management Program (FedRAMP)
NIST Special Publication 800-53 — “Security and Privacy Controls for Federal Information Systems and Organizations”
European Union Agency for Network and Information Security (ENISA) — “Cloud Computing Risk Assessment”
Customer penetration testing request process: IT organizations that implement critical applications often employ a security organization or firm to conduct a penetration test against an application. Penetration tests seek to expose security vulnerabilities, scalability limitations or general configuration errors. A penetration test may be intrusive and impactful to infrastructure at a cloud provider, and some provider intrusion detection systems (IDSs) may even signal alarms indicating that the penetration test is an actual attack. With that in mind, providers must be able to support customer requests to conduct a penetration test. Even if they require customers to follow a request process for initiating a penetration test, providers can still qualify for this criterion.
SIEM integration or service: Security information and event management (SIEM) provides real-time analysis of security alerts generated by applications or network equipment. Providers must either offer out-of-the-box integration with leading SIEM products or provide a self-service, turnkey offering by which customers can configure real-time analysis and alerting of security events. At a minimum, the integration or service must support alerting, log retention, and some form of forensic analysis that can search across logs and periods of time for patterns.
Patch management service: The cloud service must offer an automated patch management service for Windows instances, free Linux instances and enterprise Linux instances running on the cloud platform. This should be an optional service that customers can opt into as they see fit. The service must include approval processes for patches and target or deadline dates for installation enforcement.
Compute instance vulnerability scanning: The provider must offer a service that scans customer compute instances for guest vulnerabilities. The scanning service may employ any of the following methods:
An in-OS agent supplied by the provider
An inspector of virtual hard drive data
An external network-based scan
This service must provide a report/dashboard so that customers can see any existing vulnerabilities. The service must be accessible via self-service interfaces.
Role-based authorization based on dynamic group or tag: Providers must enable role-based authorization permissions to apply only to specific aspects of the service, as opposed to the entire account and its associated resources. Permissions may apply per group, per tag or per element. Role-based authorization in such fashion is described as follows:
Element: This type of permission applies to an individual item like a single instance, single load balancer or single firewall.
Group: A group consists of multiple elements that have been grouped together and collectively given an associated keyword, like “mobile.” An element can belong to only one group. Groups are typically used to associate elements with a particular business project (such as grouping all infrastructure associated with a particular application).
Tag: A tag is essentially an arbitrary keyword, like “db” or “experiment.” Taggable items can be associated with many of these keywords.
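The three authorization scopes above can be modelled as follows (a hypothetical permission check; the element names, group names and tag keywords are illustrative):

```python
# Sketch of role-based authorization scoped per element, per group or per tag.
elements = {
    "vm-1": {"group": "mobile", "tags": {"db", "experiment"}},
    "vm-2": {"group": "web", "tags": {"db"}},
}


def authorized(grant, element_id):
    """A grant is ('element', id), ('group', name) or ('tag', keyword)."""
    scope, value = grant
    e = elements[element_id]
    if scope == "element":
        return element_id == value
    if scope == "group":
        return e["group"] == value  # an element belongs to exactly one group
    if scope == "tag":
        return value in e["tags"]   # an element may carry many tags
    return False
```
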
HSM support: Providers must offer support for hardware security modules (HSMs). HSMs help address the most stringent regulatory requirements by offering cryptographic key management. To meet the regulatory requirements, the HSM service or support should comply with NIST Federal Information Processing Standards (FIPS) 140-2.
Published CSA STAR documentation: The Cloud Security Alliance’s (CSA’s) Security, Trust & Assurance Registry (STAR) is a generally available and free repository of cloud provider controls aimed at assisting end-user organizations in assessing cloud providers. CSA has garnered attention and legitimacy among IT organizations. The provider’s STAR registration must specifically cover its cloud offering (as opposed to any colocation, managed hosting or other services it may also provide).
Annual SOC 3 published publicly: Compared with a SOC 2 report, a SOC 3 report is at a higher level (i.e., it is more general-purpose). Due to the nature of this type of report, auditing agencies allow providers to make the report freely available on a public website. Therefore, to meet this requirement, the provider must have an annual SOC 3 report and publicly publish that report on its website.
PCI DSS Level 1 compliance: The provider must be Payment Card Industry Data Security Standard (PCI DSS) Level 1-compliant for the cloud service and have supporting documentation for customer assistance.
CDN: Many enterprise customers desire a provider that also offers a service for global content delivery networking. Content delivery networks (CDNs) are global caching or acceleration networks that attempt to improve end-user access performance around the world. Popular existing CDN vendors include Akamai, Level 3 Communications, Limelight Networks and Verizon Digital Media Services. Providers may look to implement their own CDN, or they may collaborate with a CDN vendor and resell such functionality as a service to their customers. The CDN service must be offered in self-service fashion with all maintenance offered by the provider. To qualify for this criterion, the CDN offering must provide 15 or more edge locations, including presences in North America, South America, Europe and Asia/Pacific (including at least one Australia location).
NoSQL DBaaS: NoSQL databases are increasingly popular for massive scale-out application architectures. The provider must offer a fully automated, self-service NoSQL DBaaS offering that is accessible from the rest of the cloud IaaS offering. The provider may support a common NoSQL platform, such as Apache Cassandra, Apache CouchDB or MongoDB, or build its own.
Relational DBaaS with redundancy: In addition to offering a relational DBaaS, providers must deliver this service in a locally and geographically redundant fashion. In other words, the service must automatically replicate the customer database across multiple data centres within a single region and within multiple regions. Opting into this service must be simple and available through self-service means. It should not require any additional replication configurations on the customer’s part.
Database transfer — import/export: Providers must offer mechanisms that allow customers to optimize data migration from a database outside of the cloud platform (and outside of the provider’s control) to a database within the cloud platform. Furthermore, the platform must also support mechanisms that allow customers to optimize data migration from a database within the cloud platform to a database outside of the cloud platform (and outside of the provider’s control).
Regions and zones architectural transparency: The provider must document the following:
The locations of its data centres (to within a specific metropolitan area, even if no specific addresses are provided)
The expected availability of the data center infrastructure (power and cooling)
The expected availability of external network connectivity
Any dependencies between data centres located in the same metropolitan area (for instance, sharing power sources or a fiber ring)
Additionally, provider architectures that implement zones must identify the minimum and maximum distance (in miles and kilometers) and latency between zones in the same region.
Support and Service Levels
Live support offering (native language): Providers must offer live support via phone and instant messaging. Furthermore, live support must be offered in the native language of each hosting location; as public IaaS providers expand globally, IT organizations in different countries require support in their own languages.
Granular assignment of support tiers: Providers must allow customers to self-assign different tiers of support to assets based on granular classification and not by maintaining separate master cloud accounts. Examples of classification might include the user, a component (e.g., an instance) or a metadata tag that can be assigned to any asset. Customers must furthermore be able to change service support tiers on demand through a service interface or management console.
365-day service health and SLA history: Providers must offer a dashboard or snapshot of service health and standard SLA status for customers to view at any time. The dashboard must contain at least 365 days of trailing health history so that customers have ample time to review health and SLA status against the prior year’s billing reports.
Compute service availability SLA — five minutes: The cloud compute service must offer at least one tier of service that has a maximum allowed outage time of less than or equal to five minutes per month. Measured monthly, an uptime availability SLA of 99.99% or higher achieves the threshold of five minutes per month.
Storage service availability SLA — five minutes: The cloud storage service must offer at least one tier of service that has a maximum allowed outage time of less than or equal to five minutes per month. Measured monthly, an uptime availability SLA of 99.99% or higher achieves the threshold of five minutes per month.
Data reliability SLA at least 99.99%: The cloud object storage service shall have a minimum documented data reliability SLA of 99.99%. Gartner differentiates data reliability from overall uptime availability because a storage service can be online while still failing in terms of the reliability of specific customer data. For example, an SLA of 99.99% data reliability equates to tolerance of one out of 10,000 files/objects being unavailable/corrupt at any point during the billing period. This must be an actual SLA and not simply a definition of how often providers replicate objects.
Customer view of SLA dashboard: Providers must offer a dashboard or snapshot of SLA status on a per-customer basis. A customer-specific dashboard must take into account all of the cloud assets a customer has deployed, as well as the various locations where those assets reside. The dashboard should therefore be able to pinpoint whether or not a specific customer asset was affected by a specific service issue based on location or region. The dashboard must contain at least 365 days of trailing health history so that customers have ample time to review health and SLA status against the prior year’s billing reports. This requirement is in addition to the servicewide health dashboard.
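The five-minute thresholds above follow from straightforward arithmetic. A minimal sketch (assuming a 30-day month for illustration) converting an uptime SLA percentage into the allowed monthly downtime it implies:

```python
def max_monthly_outage_minutes(sla_percent, days_in_month=30):
    """Allowed downtime per month implied by an uptime SLA percentage."""
    total_minutes = days_in_month * 24 * 60  # 43,200 for a 30-day month
    return round(total_minutes * (1 - sla_percent / 100), 2)

# 99.99% uptime allows roughly 4.32 minutes/month, meeting the 5-minute bar;
# 99.9% would allow 43.2 minutes and would not qualify.
```

The same calculation shows why the "three minutes per month" criterion later in this document requires 99.995% or higher (about 2.16 minutes on a 30-day month).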
Management and DevOps
SDK library: Providers must offer a rich set of software development kits (SDKs) for three or more of the leading programming languages: Java, .NET, Node.js, Perl, PHP, PowerShell, Python and Ruby. These SDKs should, at a minimum, provide support for the core services: compute, storage and networking. The SDKs must also include documentation and code samples. It is acceptable for the provider to support an open-source library instead of creating one independently.
Notification via URL: The provider must offer a monitoring service that can invoke a user-specified URL when an alert is generated (for instance, when a standard or custom performance threshold is passed or an event occurs). This allows the user to automate alert responses, as well as to integrate with third-party services like PagerDuty.
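As a sketch of how such URL-invoked notifications typically work (the field names and payload schema below are illustrative assumptions, not any provider's actual alert format), the monitoring service evaluates a threshold and, on breach, POSTs a machine-readable body to the user-specified URL:

```python
import json

def should_alert(metric_value, threshold):
    """Fire when a monitored metric crosses its configured threshold."""
    return metric_value > threshold

def build_webhook_payload(metric, value, threshold):
    """JSON body a monitoring service might POST to the user-specified URL."""
    return json.dumps({
        "metric": metric,  # e.g. "cpu_utilization_percent"
        "value": value,
        "threshold": threshold,
        "state": "ALARM" if should_alert(value, threshold) else "OK",
    })

payload = build_webhook_payload("cpu_utilization_percent", 92.5, 80.0)
```

A receiving endpoint (or an integration such as PagerDuty) would parse this body and trigger the automated response.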
Complex, multi-data-center templating: Providers must expand their basic templating service to support multi-data-center deployments within or across geographic regions. As previously mentioned, a template is a blueprint that allows a collection of different resources to be provisioned together. Such resources include compute instances, storage volumes, network elements, monitoring configurations and security configurations. The customer must be able to create, store and provision from templates built in the cloud service. A template is not just the combination of compute instance size with a particular image. It is a build manifest that provisions multiple elements, and it must do more than just specify how to provision one or more compute instances.
Community image catalog: Providers must maintain an image catalog of community-supplied images that customers have created and made publicly available. Community images must be directly self-service accessible, without the use of a third-party service or tool.
Post-provisioning hooks: Providers must offer customers the self-service ability to automatically perform a user-specified action when the boot process is finished on a newly provisioned compute instance. Customers may need to provide a way for the platform to interact with their compute instances (for instance, providing a helper script or agent).
Professional developers program: Providers must offer a professional developers program that provides certification for both developers and applications. A professional developers program validates a comprehensive set of skills that are necessary to develop applications successfully when using the programmatic interfaces of the provider. Supporting and bolstering such a development program will create an ecosystem of trusted third parties and individuals that will increase the adoption of the cloud service.
Price and Billing
Cost optimization engine: Providers must offer customers a service that recommends configurations for optimizing financial spend. The service must provide customer-specific recommendations based on current or historical patterns at the provider. It must not be customer-generic. Recommendations must be actionable, tied to specific assets and documented as having a certain amount of financial savings. This service must be offered directly by the provider and not require the customer to seek third-party partners.
Billing alerts for customer-chosen thresholds: The customer must be able to configure billing-alert thresholds. At a minimum, the customer must be able to configure the following:
Spend exceeded: The platform must alert the customer when a predefined threshold for spend is exceeded (i.e., the customer has, to this point, accumulated that amount in charges).
Projected excess: The platform must alert the customer that, based on a predefined desired monthly spend, the customer is projected to exceed spend at the current consumption rate.
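A minimal sketch of evaluating the two alert conditions above (the function name and the naive linear run-rate projection are illustrative assumptions, not any provider's billing logic):

```python
def billing_alerts(spend_to_date, day_of_month, days_in_month, monthly_budget):
    """Evaluate the two billing-alert thresholds from month-to-date spend."""
    # Naive run-rate: assume spend continues at the month-to-date daily rate.
    projected = spend_to_date / day_of_month * days_in_month
    return {
        "spend_exceeded": spend_to_date > monthly_budget,
        "projected_excess": projected > monthly_budget,
        "projected_spend": round(projected, 2),
    }
```

For example, $600 spent by day 10 of a 30-day month projects to $1,800, so a $1,500 budget would raise the "projected excess" alert but not yet the "spend exceeded" alert.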
Billing in multiple currencies: Providers must offer billing in currencies local to every location for which the provider has a cloud service offering.
Pricing API: Providers must offer an API for pricing data that customers or third parties can query/access to get real-time price points for any asset the cloud provider offers.
Spending/allocation quotas: Some customers do not want to outspend certain budgets, especially for development or testing exercises. Thus, providers must enable customers to impose financial or technical quotas on the amount of compute, storage or network resources they consume during a defined time (typically a billing period). However, customers should exercise caution when implementing such a feature.
Optional Feature Set
Compute
HPC offering: Many scientific and financial services organizations are increasingly looking to take advantage of IaaS offerings for high-performance computing (HPC) projects. Therefore, leading providers must include a unique offering or service tier within their platforms. Offerings must include the following features:
Low-latency, high-bandwidth network connections among the cluster peers of an architecture with at least 10 Gigabit Ethernet or 10 Gbps throughput.
Identical and published processor architectures among the cluster peers.
Support for 250 or more nodes.
Graphics processing unit (GPU) support. (The provider must document which GPU technology is used so that customers can program specifically against the GPU model.)
Export VM image: The cloud service must support the ability to export an existing running VM, or a copy of a VM, to a VMDK, OVF or VHD image format.
Bare-metal provisioning: The cloud service must offer bare-metal compute servers — as a service — that do not run on top of server virtualization platforms. The bare-metal offering must be fully automated.
Customer-controlled overprovisioning: The provider must offer at least one tier of service by which a customer can decide to overprovision compute resources. This must be an opt-in service or a separate service, and it must be priced at a lower rate than the standard service.
Single-tenant storage service: Providers must enable customers to create a single-tenant block storage volume. This single-tenant volume can be used in conjunction with any compute instances that would otherwise use the multitenant block storage service. It is located on a physical storage device that is restricted to a single customer for the lifetime of the storage volume. (Note that the isolation applies to the whole storage device, not just a disk drive. For instance, if the device is a server with just a bunch of disks (JBOD), the whole server must be single-tenant for the customer.)
Static web hosting support: Customers must be able to implement a fully functioning website via the object storage service and associated object containers. With this feature, customers can easily upload root and static web content and publish websites without configuring web servers on top of compute instances. Providers may front-end this capability with a website service outside of the object storage API or interface.
Internet-accessible file storage shares: The provider must enable customers to expose file shares hosted on its file storage service to the internet. This could be achieved by opening a TCP port, via a RESTful API, or via web interface access to the file shares.
Integration with EFSS: The provider must enable customers to integrate file shares hosted on its file storage service with enterprise file synchronization and sharing (EFSS) services. For example, customers should be able to accomplish any combination of the following:
Point Amazon Cloud Drive to Amazon Elastic File System (EFS)
Point Microsoft OneDrive for Business to Azure Files
Point Google Drive to Amazon EFS
Point Dropbox Business to Azure Files
Instance support for five or more network interfaces and IP addresses: To satisfy some customer application use cases, providers must:
Offer the ability to connect instances, as allowed by the guest OS, to at least five different vNICs and routing networks simultaneously
Support five or more different IP addresses per instance
IPv6 support: Providers must support Internet Protocol version 6 (IPv6) at either the gateway (e.g., load balancer) or instance level, and expose this functionality to customers.
Private customer connectivity — integrated service: A customer can obtain private WAN connectivity directly from the provider, and it is integrated directly into the cloud platform. This service can be manually provisioned, but once provisioned, must be self-service configurable via an API and, preferably, a portal.
WAN optimization — automatic: All traffic between the platform’s data centres is automatically accelerated using WAN optimization technology. The technology used can be either commercial or proprietary to the provider. WAN optimization includes techniques such as protocol optimization, data deduplication, compression, and traffic shaping. Furthermore, customers must be able to opt into a WAN optimization service for traffic between two of the platform’s data centres.
Security and Access
Approval workflow: Providers must offer customers a self-service approval workflow. Customers sometimes want the ability to add an approval step to cloud service functions, especially those that accrue cost. Approval workflow means that, when a self-service user makes a provisioning request, the account owner (or some other delegated manager) must approve or deny the request, and the request is automatically executed if approved. Approval workflows must be designed with a self-service interface and must allow for flexibility of approval assignment based on the master account holder or delegate.
Dedicated HSM: Providers must be able to dedicate an HSM to a single customer. The dedicated HSM will help that customer address the most stringent regulatory requirements by offering cryptographic key management. The HSM should comply with NIST FIPS 140-2.
Adaptive authorization based on time and location: Providers must offer customers adaptive authorization to cloud services, based on time and location. The adaptive authorization rules must be configurable through a self-service interface. Adaptive authorization in such fashion is described as follows:
Time: This type of permission restricts access to the functionality to a certain time frame. For example, a time permission could allow the restarting of compute instances only between 3 a.m. and 5 a.m.
Location: This type of permission restricts access to functionality based on the IP address of the source of the request. For instance, a location permission could restrict the restarting of compute instances to only requests coming from a particular corporate network IP address block.
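A minimal sketch of evaluating such time and location rules (the window, network block and function signature are illustrative assumptions; a real platform would enforce this server-side in its authorization layer):

```python
from datetime import time
import ipaddress

def authorize(request_time, source_ip,
              window=(time(3, 0), time(5, 0)),  # e.g. restarts allowed 3-5 a.m.
              allowed_net=ipaddress.ip_network("10.20.0.0/16")):
    """Permit the action only inside the time window AND from the allowed IP block."""
    in_window = window[0] <= request_time <= window[1]
    in_network = ipaddress.ip_address(source_ip) in allowed_net
    return in_window and in_network
```

With these example rules, a restart request at 4 a.m. from 10.20.1.5 would be permitted, while the same request at 6 a.m., or from outside the corporate address block, would be denied.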
EU-U.S. Privacy Shield compliance: Providers must comply with EU data protection policies. Providers must be EU-U.S. Privacy Shield-certified, and/or offer a data processing agreement that complies with the EU Model Clauses.
HIPAA with BAA: The provider must support Health Insurance Portability and Accountability Act (HIPAA) compliance and sign a business associate agreement (BAA) with customers. The provider must have supporting documentation for customer assistance.
CJIS compliance: The provider’s services can be used in accordance with Criminal Justice Information Services (CJIS) requirements, and the provider must offer supporting documentation for customer assistance. The provider must sign the CJIS Security Addendum.
U.S. Government Certifications
U.S. government agencies have a variety of certification needs. Each is listed as an individual optional criterion:
FedRAMP, including Joint Authorization Board (JAB) Provisional Authorization to Operate (P-ATO)
International Traffic in Arms Regulations (ITAR) compliance
Federal Information Security Management Act (FISMA) Low
Hadoop as a service: Providers must deliver a Hadoop environment as a fully automated, self-service, turnkey offering. This must be a full service, not simply a “one click install” of Hadoop or the like.
Published data center energy efficiency metrics: For each data center, the provider must document the following:
The source of energy (or metrics, such as carbon usage effectiveness (CUE) and green energy coefficient (GEC), that would allow assessment of the environmental impact of the energy sources)
A power usage effectiveness (PUE) metric (measured at least annually)
If a data center has Leadership in Energy and Environmental Design (LEED) certification, this should also be documented.
Relational DBaaS with cross-region failover: Providers must offer a relational DBaaS with cross-region high availability, enabling customer databases to be synchronously or asynchronously replicated into at least one data center in a different region. Furthermore, the service must allow failover to a replica in another region. Customers must be able to opt into this service through self-service means, without performing any additional replication configuration.
Support and Service Levels
Compute service availability SLA — three minutes: The cloud compute service must offer at least one tier of service that has a maximum allowed outage time of less than or equal to three minutes per month. Measured monthly, an uptime availability SLA of 99.995% or higher achieves the threshold of three minutes per month.
Storage service availability SLA — three minutes: The cloud storage service must offer at least one tier of service that has a maximum allowed outage time of less than or equal to three minutes per month. Measured monthly, an uptime availability SLA of 99.995% or higher achieves the threshold of three minutes per month.
SLAs in programmatically readable formats: Emerging requirements from organizations will increasingly compel providers to offer their SLAs to customers in programmatically readable formats such as XML. IT organizations leverage many different cloud providers and brokers, and manually keeping track of each individual SLA is challenging. IT organizations will be looking to automate SLA tracking. As such, they will require providers to offer SLAs in a programmatically readable format or through a service interface that integrates into an SLA tracking system or dashboard.
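As a sketch of what consuming such a machine-readable SLA could look like (the XML schema below is invented for illustration; no standard SLA format is implied), an SLA tracking system might parse a provider-published document like this:

```python
import xml.etree.ElementTree as ET

# Hypothetical SLA document a provider might publish.
SLA_XML = """
<sla service="compute">
  <availability unit="percent">99.99</availability>
  <max-outage unit="minutes-per-month">5</max-outage>
</sla>
"""

root = ET.fromstring(SLA_XML)
availability = float(root.findtext("availability"))
max_outage = int(root.findtext("max-outage"))
```

A tracking dashboard could then compare measured uptime against `availability` automatically, rather than transcribing SLA terms by hand for each provider.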
Management and DevOps
GUI-based network design/inventory mapping: Providers must enable customers to create infrastructure architecture via a GUI-based interface. In this scenario, customers can drag and drop servers, storage and networks to build a logical design that automatically deploys from the layout of the design. Providers can accomplish this through a native graphical interface or through the import of a Microsoft Visio diagram.
GUI-based network architecture export: Providers must enable customers to export a graphical representation of their infrastructure architecture, which includes all servers, storage and networks. Providers can export this into an image file (such as a JPG or a BMP), a PDF, a Microsoft Visio format or a Microsoft PowerPoint format.
Multicloud library support: The platform’s native API must be directly supported by at least two major multicloud libraries, such as Apache Libcloud, Dasein or Deltacloud. This must be active, nonbeta, production-quality support, including an active maintainer that responds to requests and bug reports and that regularly updates the library for this platform.
Mobile application for management console: Providers must offer customers a mobile application that works on one of the top three mobile platforms (Android, iOS and Windows). This mobile application must be able to monitor, report and alert on all generally available services.
Price and Billing
Compute instance leases: The platform must allow the customer to define a lease time for a compute instance. A lease puts a time limit on the life of an infrastructure element. When that time limit expires, the element is automatically deprovisioned (or otherwise placed in a state that stops billing). For instance, a VM with a two-week lease is automatically terminated after two weeks, unless the lease is renewed or otherwise extended.
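The lease check itself is straightforward; a minimal sketch (the function and field names are illustrative, not any platform's API):

```python
from datetime import datetime, timedelta

def lease_expired(provisioned_at, lease_duration, now):
    """True once the lease window has elapsed and the instance should be
    deprovisioned (or placed in a state that stops billing)."""
    return now >= provisioned_at + lease_duration

# A VM provisioned 1 August with a two-week lease is terminated on 15 August
# unless the lease is renewed before then.
two_week_lease = timedelta(weeks=2)
```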
Variable/auction-priced tier offering: Providers must offer a variably priced tier of service through an auction- or bidding-type format to enable the consumption of extra capacity in the cloud environment.
Cost forecasting: The platform must provide a cost-forecasting service that allows the customer to use its own historical cost data to project future costs. This service must allow the customer to project higher or lower utilization than before, as well as to determine the effects of changing resources and services (for instance, moving from many small compute instances to fewer large compute instances).
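A minimal sketch of such a projection (a simple average run-rate with a utilization multiplier; the names are illustrative, and a real forecasting engine would model usage far more granularly):

```python
def forecast_monthly_costs(monthly_history, months_ahead=3, utilization_factor=1.0):
    """Project future monthly cost from average historical spend.

    utilization_factor lets the customer model higher (>1.0) or lower (<1.0)
    utilization than the historical baseline, per the criterion above.
    """
    baseline = sum(monthly_history) / len(monthly_history)
    return [round(baseline * utilization_factor, 2) for _ in range(months_ahead)]
```

For example, a customer averaging $110/month who expects 10% more utilization would see a forecast of $121 for each projected month; modeling a consolidation onto fewer, larger instances would be expressed as a factor below 1.0.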