How Cloud Resiliency Is Organized in Microsoft Azure


Disaster avoidance and disaster recovery are two concepts used in many industries. The terms look alike, but they mean different things depending on the situation. Disaster avoidance is sometimes also called resiliency. In information technology, both terms are used mainly in systems engineering and system design. So, what do resiliency and recoverability mean? Resiliency is the ability to keep services operating even during a disruptive system event. Recovery is the set of actions needed to make a system work again after a disruption has caused it to fail. For example, resiliency is like building a structure so that it stays stable during an earthquake, whereas recoverability is about how we repair the damage if a disaster does hit the structure.

Now that we know what the two terms mean, let's move on to our main topic: how cloud resiliency is organized in Microsoft Azure. A cloud consists of a large number of data centers. These are similar to on-prem data centers but much larger in scale. Like on-prem data centers, the cloud is built mainly on virtualization technologies. Therefore, the resilience options the cloud offers are similar to those we use on-prem. However, different cloud providers market their products under different names. In this article, let's discuss a few key resilience concepts offered by Azure.

First, let's get familiar with a few key terms, starting with regions. A region is a logical name given to a group of data centers that are located relatively close to each other. Regions are organized under geographies; a geography is a boundary defined by political, regulatory, and location factors. Within a region there can be multiple zones. Each zone has at least one data center and has its own dedicated power supply and network service. Zones contain multiple fault domains and update domains. A fault domain is a group of hardware that shares common cooling, power, and network service; we can think of a fault domain as a server rack. An update domain is a group of hardware that receives fabric-level updates at the same time. Azure ensures that two or more update domains are never updated at the same time. The same policies apply at the zone level. With these commonly used terms in place, let's look at a few resiliency options in the following paragraphs.

Let's start with availability sets. An availability set is a logical group that spans several fault domains and update domains. A typical configuration includes 2-3 fault domains and 5-20 update domains per availability set. Once the availability set is created, we can place our services in it. Users cannot choose which fault domain or update domain a service is placed in; Azure handles that part. For this reason, Azure highly recommends creating separate availability sets for different tasks. For virtual machines, the Azure SLA for two or more instances deployed in the same availability set is 99.95% availability.
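
To make the idea of spreading services across fault and update domains concrete, here is a minimal sketch. It assumes a simple round-robin placement, which is only an illustration: Azure decides the real placement, and the names `place_vms` and `survivors` are hypothetical helpers, not part of any Azure SDK.

```python
def place_vms(vm_names, fault_domains=3, update_domains=5):
    """Assign each VM a (fault domain, update domain) pair.

    Round-robin assignment is an assumption made for illustration;
    the real placement is chosen by Azure, not by the user.
    """
    placement = {}
    for index, name in enumerate(vm_names):
        placement[name] = (index % fault_domains, index % update_domains)
    return placement


def survivors(placement, failed_fd=None, rebooting_ud=None):
    """Return the VMs still running when one fault domain fails
    and/or one update domain is being updated."""
    return sorted(
        name
        for name, (fd, ud) in placement.items()
        if fd != failed_fd and ud != rebooting_ud
    )


vms = [f"web-{i}" for i in range(6)]
placement = place_vms(vms)

# Losing a whole fault domain (for example, one rack) leaves the others running.
print(survivors(placement, failed_fd=0))

# Updating one update domain at a time also leaves most VMs running.
print(survivors(placement, rebooting_ud=0))
```

With six VMs spread this way, a single rack failure or a single update-domain update never takes down every instance, which is the point of grouping related services in one availability set.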

Next, let's talk about availability zones. This concept is all about placing your services and resources in multiple zones, so that your service can keep operating normally even during a zone-level disruption. For virtual machines, the Azure SLA for two or more instances deployed across two or more availability zones in the same region is 99.99% availability. Currently, a zonal resource is deployed to one of three availability zones, and it is the customer who selects the zone. Therefore, when architecting your applications, you have to be careful about where you place your services. Regions that support availability zones contain at least three zones.
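
Availability percentages are easier to compare when converted into allowed downtime. The short calculation below does that for a few common levels; the list of percentages and the function name `downtime_hours_per_year` are chosen here for illustration and assume a 365-day year.

```python
HOURS_PER_YEAR = 365 * 24  # assumes a non-leap year


def downtime_hours_per_year(availability_percent):
    """Maximum downtime per year, in hours, allowed by an availability level."""
    return (1 - availability_percent / 100) * HOURS_PER_YEAR


for pct in (99.5, 99.9, 99.95, 99.99):
    hours = downtime_hours_per_year(pct)
    print(f"{pct}% -> {hours:.2f} hours/year ({hours * 60:.0f} minutes)")
```

Each extra "nine" cuts the allowed downtime by roughly a factor of ten, which is why the difference between two availability figures that look close on paper matters in practice.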

Next, we have paired regions. If you want region-level resiliency, this option is for you. Most Azure regions are paired with another region, usually within the same geography, so that workloads can be replicated to the pair. An entire Azure region could fail, mainly because of natural disasters or war, but such situations are rare, and normally there is enough time to react before an event of that scale happens.
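
As a rough sketch of how an application might use a paired region, the snippet below picks the region to serve from based on which regions are currently healthy. The pair table lists only a few example pairs (check the Azure documentation for the current list), and `active_region` is a hypothetical helper, not an Azure API.

```python
# A few example region pairs; this is not the complete list.
REGION_PAIRS = {
    "East US": "West US",
    "West US": "East US",
    "North Europe": "West Europe",
    "West Europe": "North Europe",
    "Southeast Asia": "East Asia",
    "East Asia": "Southeast Asia",
}


def active_region(primary, healthy_regions):
    """Return the primary region if it is healthy, otherwise its pair.

    `healthy_regions` is assumed to come from the application's own
    health checks; how it is produced is outside this sketch.
    """
    if primary in healthy_regions:
        return primary
    secondary = REGION_PAIRS.get(primary)
    if secondary is not None and secondary in healthy_regions:
        return secondary
    raise RuntimeError(f"Neither {primary} nor its paired region is healthy")


print(active_region("North Europe", {"North Europe", "West Europe"}))
print(active_region("North Europe", {"West Europe"}))
```

The first call stays in the primary region; the second falls back to the pair because the primary is missing from the healthy set.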

These are the commonly used resiliency options. However, there are many more resiliency options available in Azure. I hope to discuss them in future posts.

Comments

  1. Nice write-up. As I understand it, applications in cloud computing involve many types of resources and depend on internal and external services. Issues can arise from failures in those services or from defective software. So, resiliency should also cover how these failures can be detected and recovered from. Has Azure come up with a mechanism to address this too?

    1. Yes Dulanga, Azure has various offerings to address the different situations you mentioned. For example, we can use Azure Application Insights to observe the behavior of applications running in the cloud. Azure Monitor is another service we can use to monitor infrastructure and applications running on Azure, and Security Center monitors security-related activity. Azure also offers best practices and recommendations through Azure Advisor.

  2. As a leading cloud service provider, it is good that Azure has addressed availability-related issues to ensure high availability of its services.

    1. Yes, it is a must. Because the cloud business is so competitive, every cloud provider needs to come up with its own availability strategy to stay in business.

  3. It's very informative. Keep it up!
