kimtth/azure-bicep-data-platform-laC

Infrastructure as Code: Bicep - Azure Data Platforms

In this chapter, we deploy Azure data platforms with Bicep. Infrastructure automation reduces repetitive manual steps. With this code, all of the data platforms can be deployed in approximately 3-4 minutes.

  • What is Bicep?

Bicep is a domain-specific language (DSL) that uses declarative syntax to deploy Azure resources.

Under the hood, a Bicep file is transpiled into a JSON template (an ARM template), which is then passed to Azure Resource Manager.

  • Bicep (DSL) -> JSON -> Azure Resource Manager -> Azure
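To make the declarative syntax concrete, here is a minimal sketch of a Bicep file. It is not part of this repository: the storage account resource, the parameter names, and the API version are assumptions chosen only for illustration. Running `az bicep build --file <file>.bicep` on such a file produces the JSON template that Azure Resource Manager consumes.

```bicep
// Illustrative only: this resource is not one of the repository's templates.
// Parameters let the same file be reused across environments.
param location string = resourceGroup().location
param storageAccountName string = 'st${uniqueString(resourceGroup().id)}'

// Declare the desired state; Azure Resource Manager works out how to reach it.
resource storage 'Microsoft.Storage/storageAccounts@2022-09-01' = {
  name: storageAccountName
  location: location
  sku: {
    name: 'Standard_LRS'
  }
  kind: 'StorageV2'
}

// Outputs expose values to the caller (for example, a parent Bicep file).
output storageId string = storage.id
```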

Azure Data services overview

Microsoft Azure offers services for a wide variety of data-related needs.

  • Data Factory: Data pipelines. Integration of data sources, ETL, and data flows
  • Synapse Analytics: Data lake and data warehouse platform. T-SQL, PolyBase, and Spark
  • Databricks: Apache Spark-based cloud service, commonly used with PySpark
  • Purview: Data governance solution for tracking and monitoring data, with an overview of data lineage
  • Azure HDInsight: An open-source analytics service that runs Hadoop, Spark, Kafka, and more (in most cases, HDInsight can be replaced by other Azure services)

Databricks vs Synapse Analytics

Synapse has an open-source Spark version with built-in support for .NET, whereas Databricks has an optimized version of Spark that offers increased performance.

https://adatis.co.uk/databricks-vs-synapse-spark-pools-what-when-and-where/

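| Azure             | AWS               | GCP               |
|-------------------|-------------------|-------------------|
| Data Factory      | Glue              | Cloud Dataprep    |
| Synapse Analytics | Redshift          | BigQuery          |
| Databricks        | Databricks on AWS | Databricks on GCP |
| Purview           | -                 | -                 |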

Azure DL/DW model architecture

[Figure: mda]

https://docs.microsoft.com/en-in/azure/architecture/solution-ideas/articles/enterprise-data-warehouse

Bicep structure

project
│   README.md
│   init.ps1
│   main.bicep
├───img
│   └───...
└───template
    │   databricks.bicep
    │   datafactory.bicep
    │   purview.bicep
    └── synapse.bicep

PS> az group create --name $resourceGroup --location $location
PS> az deployment group create --resource-group $resourceGroup --template-file .\main.bicep --parameters .\deploy.parameter.json

How to execute the script

  • Execution flow
.\init.ps1 -resourceGroup bicepRG -location japaneast

init.ps1 --> main.bicep --> the Bicep files in the template directory, executed in sequence

  • init.ps1: Passes the parameters and Bicep files to the Azure CLI
  • main.bicep: Consolidates the Bicep files and controls the dependencies between modules
  • template/<service_name>.bicep: Defines each service's specification
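The roles described above can be sketched as follows. This is a hypothetical outline of how a `main.bicep` might consolidate the files in `template/`, not the repository's actual file: the `prefix` parameter, the module inputs (`name`, `location`), and the deployment names are all assumptions, and each module file would need to declare matching parameters.

```bicep
// main.bicep (sketch): all parameter names and module inputs are assumptions.
param location string = resourceGroup().location
param prefix string = 'demo'

// Each module points at one service definition in the template directory.
module dataFactory 'template/datafactory.bicep' = {
  name: 'dataFactoryDeployment'
  params: {
    name: '${prefix}-adf'
    location: location
  }
}

module synapse 'template/synapse.bicep' = {
  name: 'synapseDeployment'
  params: {
    name: '${prefix}-syn'
    location: location
  }
  // Without a dependency, modules deploy in parallel.
  // An explicit dependsOn forces this module to wait for dataFactory.
  dependsOn: [
    dataFactory
  ]
}
```

If one module consumed an output of another (for example, a resource ID), Bicep would infer the ordering from that reference and the explicit `dependsOn` would not be needed.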

References

Bicep Anti pattern - dependsOn syntax

resource dnsZone 'Microsoft.Network/dnszones@2018-05-01' = {
  name: 'demoeZone1'
  location: 'global'
}

resource otherZone 'Microsoft.Network/dnszones@2018-05-01' = {
  name: 'demoZone2'
  location: 'global'
  dependsOn: [
    dnsZone
  ]
}

Azure Resource Manager evaluates the dependencies between resources, and deploys them in their dependent order. When resources aren't dependent on each other, Resource Manager deploys them in parallel.

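Use dependsOn only when you need Resource Manager to create the resource that declares dependsOn after the other resource has been created, rather than in parallel. In most cases it is unnecessary: when one resource references another by its symbolic name, Bicep creates the dependency implicitly.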
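For contrast with the explicit dependsOn form, here is a minimal sketch of an implicit dependency. The zone name, record name, and IP address are placeholder values chosen for illustration. Because the record refers to `dnsZone` through the `parent` property, Resource Manager deploys the zone first without any dependsOn.

```bicep
resource dnsZone 'Microsoft.Network/dnsZones@2018-05-01' = {
  name: 'contoso.com' // placeholder zone name
  location: 'global'
}

// Referencing dnsZone by its symbolic name creates the dependency implicitly.
resource aRecord 'Microsoft.Network/dnsZones/A@2018-05-01' = {
  parent: dnsZone
  name: 'www'
  properties: {
    TTL: 3600
    ARecords: [
      {
        ipv4Address: '10.0.0.1' // placeholder address
      }
    ]
  }
}
```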

https://stackoverflow.com/questions/71320257/why-is-dependson-not-recommended-in-bicep
