Azure Architecture Best Practices

July 26, 2018

To optimize Azure performance and costs, we have compiled a list of best practices for Azure architecture. The list is based on our experience transitioning hundreds of on-premises applications to Azure and building dozens of new Azure applications. The applications required a variety of assets such as databases, data warehouses, websites, data streaming services, and machine learning models. Depending on functionality, scalability, and security needs, we relied on Azure Virtual Machines (VMs) or on a serverless Azure services architecture. We look forward to sharing this guide and improving it based on feedback from fellow Azure architects.

  1. To reduce Azure costs, turn off VMs at set times

    Switching off resources during periods of non-use reduces overall subscription costs. There are several ways to schedule auto-shutdowns for Azure VMs. When provisioning a new VM, Azure includes a setting to schedule a shutdown time in a chosen time zone.

    When administering multiple VMs, runbooks are ideal for scheduling automatic shutdown. Runbooks in Service Management Automation and Microsoft Azure Automation are Windows PowerShell workflows or PowerShell scripts.

    In our DevOps model, developers typically maintain four parallel environments: Development, Testing, UAT, and Production. We use the ARO Toolbox and runbooks to automatically shut down all non-production environments at the end of business hours and start them again before business hours resume. When a VM is stopped and deallocated, Azure does not charge compute or network fees for it; only a small storage fee applies for its disks. Depending on usage patterns, turning VMs off outside business hours can reduce Azure subscription costs by 20% to 30%.
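
    As a minimal sketch, the following runbook stops every VM tagged as non-production. It assumes the Az PowerShell modules are available in the Automation account, that the account runs under a managed identity with permission to stop VMs, and that the "environment" tag and its values follow your own conventions:

        # Azure Automation runbook sketch: stop all non-production VMs.
        # Authenticate with the Automation account's managed identity.
        Connect-AzAccount -Identity

        # Find VMs tagged as belonging to a non-production environment.
        # The "environment" tag and its values are illustrative placeholders.
        $vms = Get-AzVM | Where-Object { $_.Tags["environment"] -in @("dev", "test", "uat") }

        foreach ($vm in $vms) {
            # Stop-AzVM deallocates the VM, which is what stops compute charges.
            Stop-AzVM -ResourceGroupName $vm.ResourceGroupName -Name $vm.Name -Force
        }

    Link the runbook to a schedule at the end of business hours, and pair it with a matching runbook that calls Start-AzVM before business hours resume.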

    AWS Instance Scheduler offers a similar service and claims to reduce costs by up to 70% versus full-time operation.

  2. To efficiently resolve failure scenarios, implement checkpoints in the Azure Data Factory (ADF) v2 pipeline

    Handling failure scenarios in multiple pipelines can be challenging in ADF. Because of the number of pipelines, identifying the precise failure point is difficult. When you cannot determine the failure point, all pipelines must be retriggered.

    Control the execution flow of multiple pipelines through a Master Pipeline and log tables: each child pipeline records its status in a log table, so after a failure the Master Pipeline can be retriggered from the failed stage rather than triggering all child pipelines from the start.
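
    As an illustration, the sketch below assumes a hypothetical control table (dbo.PipelineLog) that each child pipeline updates and a Master Pipeline that accepts a startStage parameter to skip completed stages; neither of these is an ADF built-in:

        # Read the last stage that completed successfully from the log table.
        # Server, database, table, and parameter names are all placeholders.
        $lastStage = (Invoke-Sqlcmd -ServerInstance "myserver.database.windows.net" `
            -Database "ControlDb" `
            -Query "SELECT MAX(StageId) AS StageId FROM dbo.PipelineLog WHERE Status = 'Succeeded';").StageId

        # Retrigger the Master Pipeline from the stage after the last successful one.
        Invoke-AzDataFactoryV2Pipeline -ResourceGroupName "my-rg" `
            -DataFactoryName "my-adf" `
            -PipelineName "MasterPipeline" `
            -Parameter @{ startStage = $lastStage + 1 }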

  3. Use Query Replicas for Azure Analysis Services (AAS) synchronization

    To get a load-balanced experience with AAS, use query replicas and synchronize the AAS model between the processing node and read-only query replicas. This serves multiple concurrent connections in parallel, significantly improving the responsiveness of the models, and it preserves availability even while a model is being processed.

    Because creating a query replica adds cost equivalent to an AAS node, a cost-benefit analysis should be performed before implementing this practice.
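
    As a sketch with the Az.AnalysisServices cmdlets (region, server, and model names are placeholders), adding a replica and synchronizing it after processing looks like this:

        # Add one read-only query replica to the server.
        Set-AzAnalysisServicesServer -ResourceGroupName "my-rg" -Name "myaasserver" `
            -ReadonlyReplicaCount 1

        # After processing the model, push the refreshed data out to the replicas.
        Sync-AzAnalysisServicesInstance -Instance "asazure://westus.asazure.windows.net/myaasserver" `
            -Database "SalesModel"

    Client connections to the server's default name are load balanced across the replicas; processing and management operations target the primary node by appending the :rw suffix to the server name.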

  4. To optimize query execution time in Azure SQL Data Warehouse (ADW), use appropriate resource classes

    Resource classes help manage workloads by setting limits on the number of queries that run concurrently and on the compute resources assigned to each query. Smaller resource classes reduce the maximum memory per query but increase concurrency. Larger resource classes increase the maximum memory per query but reduce concurrency.

    There are two types of resource classes:

    • Static resource classes, which are well suited for increased concurrency on fixed data sets. Choose a static resource class when resource expectations vary throughout the day. For example, a static resource class works well when the data warehouse is queried by many people. When scaling the data warehouse, the amount of memory allocated to the user does not change. Consequently, more queries can be executed in parallel on the system.
    • Dynamic resource classes, which are well suited for data sets that are growing in size and that need increased performance as the service level is scaled up. Choose a dynamic resource class when queries are complex but do not need high concurrency. For example, generating daily or weekly reports is an occasional, resource-intensive need. If the reports are processing large amounts of data, scaling the data warehouse provides more memory to the user's existing resource class.

    In sum, use static resource classes for fixed datasets and dynamic resource classes for growing datasets. Be aware, however, that large dynamic resource classes consume many concurrency slots, reducing the resources available for additional queries.
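
    Because resource classes are implemented as database roles, assigning one is a role-membership change. A sketch via T-SQL from PowerShell, with placeholder server and user names (authentication parameters omitted):

        # A reporting user serving many concurrent dashboard queries: small static class.
        Invoke-Sqlcmd -ServerInstance "mydw.database.windows.net" -Database "MyDW" `
            -Query "EXEC sp_addrolemember 'staticrc20', 'report_user';"

        # A user running complex, memory-hungry transformations: large dynamic class.
        Invoke-Sqlcmd -ServerInstance "mydw.database.windows.net" -Database "MyDW" `
            -Query "EXEC sp_addrolemember 'largerc', 'etl_user';"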

  5. To process large volumes of data, dynamically scale Azure Analysis Services (AAS)

    To improve processing performance, schedule automatic scale-up of AAS through a runbook immediately before processing large volumes of data. To optimize costs, schedule scale-down after processing.
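
    A minimal runbook sketch, assuming the Az.AnalysisServices module; the SKU names are examples, so pick tiers that fit your workload:

        Connect-AzAccount -Identity

        # Scale up to a larger tier immediately before heavy processing.
        Set-AzAnalysisServicesServer -ResourceGroupName "my-rg" -Name "myaasserver" -Sku "S2"

        # ... trigger model processing here (e.g., via Invoke-ProcessASDatabase) ...

        # Scale back down once processing finishes.
        Set-AzAnalysisServicesServer -ResourceGroupName "my-rg" -Name "myaasserver" -Sku "S1"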

  6. To check that expected roles exist in AAS, use Azure Functions

    To secure data in AAS, we create roles. Often, however, roles are deleted or missing after subsequent deployments. Use an Azure Function to periodically compare the roles available in the model against the expected roles, and send administrative alerts if discrepancies are identified.
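
    A sketch of such a check, runnable from a timer-triggered PowerShell Azure Function. It assumes the SqlServer module's Invoke-ASCmd cmdlet; the expected role list and alert mechanism are placeholders, and the exact shape of the returned XML may vary:

        $expectedRoles = @("Admin", "Reader", "RegionalSales")

        # Query the model's role metadata through the TMSCHEMA_ROLES DMV (returns XML).
        [xml]$result = Invoke-ASCmd -Server "asazure://westus.asazure.windows.net/myaasserver" `
            -Database "SalesModel" `
            -Query 'SELECT [Name] FROM $SYSTEM.TMSCHEMA_ROLES'

        $actualRoles = $result.return.root.row.Name
        $missing = $expectedRoles | Where-Object { $_ -notin $actualRoles }

        if ($missing) {
            # Replace with your alerting mechanism of choice (e.g., a Logic App or SendGrid).
            Write-Error "Missing AAS roles after deployment: $($missing -join ', ')"
        }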

  7. To reduce latency, use partition-specific processing for AAS

    Use Azure Functions to process specific partitions in the AAS model, reducing processing time and the latency of surfacing the latest data.
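
    A sketch using Invoke-ProcessPartition from the SqlServer module; server, model, table, and partition names are placeholders:

        # Process only the partition that holds the newest data rather than the whole table.
        Invoke-ProcessPartition -Server "asazure://westus.asazure.windows.net/myaasserver" `
            -DatabaseName "SalesModel" `
            -TableName "Sales" `
            -PartitionName "Sales_CurrentMonth" `
            -RefreshType "Full"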

  8. To increase security, use Service Principal Identity and Azure Key Vault

    Store credentials for data stores and compute services in Azure Key Vault. Azure Data Factory, authenticating with a Service Principal Identity rather than personal credentials, retrieves the credentials from the vault when executing an activity that uses the data store or compute service.
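
    A minimal sketch with the Az.KeyVault cmdlets; vault, secret, and password values are placeholders (newer module versions return plain text via Get-AzKeyVaultSecret -AsPlainText instead of the SecretValueText property):

        # Store the data store password once (e.g., during provisioning).
        $secret = ConvertTo-SecureString "placeholder-password" -AsPlainText -Force
        Set-AzKeyVaultSecret -VaultName "my-keyvault" -Name "SqlDwPassword" -SecretValue $secret

        # Retrieve it at runtime instead of hard-coding it in scripts or pipeline definitions.
        $password = (Get-AzKeyVaultSecret -VaultName "my-keyvault" -Name "SqlDwPassword").SecretValueText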

  9. To avoid performance issues, restrict activities in single Azure Data Factory (ADF) pipelines

    Limit single ADF pipelines to 40 or fewer activities to avoid performance issues and resource contention.
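
    A quick audit sketch that flags oversized pipelines, assuming the Az.DataFactory module; resource group and factory names are placeholders:

        # List pipelines whose activity count exceeds the 40-activity guideline.
        Get-AzDataFactoryV2Pipeline -ResourceGroupName "my-rg" -DataFactoryName "my-adf" |
            Where-Object { $_.Activities.Count -gt 40 } |
            ForEach-Object { Write-Warning "$($_.Name) has $($_.Activities.Count) activities" }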

References

Microsoft offers additional documents that provide a high-level framework for best practices. We strongly encourage you to review the following documents: