Designing Modern Data and Analytics Solutions in Azure
After explaining the pros and cons of Azure and the decision drivers for moving to an Azure architecture, some interesting messages were delivered, such as the decoupling of storage and compute in Azure, even if some services still combine both. Another message that we all know, but that is essential to repeat regularly, is that cost control matters: developers and architects can hugely influence the cost of a cloud-based platform.
Because Azure evolves rapidly, with new services being offered and existing features changing, you have to stay constantly informed about new or deprecated capabilities. You are sometimes faced with migration work to adapt to these evolutions. Proofs of concept must become a habit, a routine, in this constantly moving environment.
Organizing and configuring your resources in Azure is also something to consider seriously, so that you do not get lost among all the resources deployed within your organization. Some notions become important for organizing your Azure platform:
- Resource groups
- Locations of your services
- Naming convention
Using tags in Azure can also help a lot for reporting purposes.
A good naming convention (e.g. purpose + type of service + environment) makes it easier to deploy your different environments; one trick is to avoid special characters, because they cannot be used consistently across Azure services. Using Azure Policy helps you ensure that services are created the way your governance requires.
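As an illustration, a naming convention like the one above can be enforced with a small helper. This is only a sketch: the exact segments, the separator and the allowed character set are my assumptions, not rules from the session.

```python
import re

def resource_name(purpose: str, service_type: str, environment: str) -> str:
    """Build a resource name as purpose + type of service + environment,
    stripped of special characters, since those are not accepted
    consistently across Azure services."""
    name = f"{purpose}-{service_type}-{environment}".lower()
    # Keep only lowercase letters, digits and hyphens.
    return re.sub(r"[^a-z0-9-]", "", name)

print(resource_name("sales", "sqldb", "prod"))  # sales-sqldb-prod
```

Note that some services (like storage accounts) are even stricter and reject hyphens too, which is exactly why a single, conservative convention pays off.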
A data analytics architecture was then presented, including the following services to manage structured data:
- Azure Data Factory
- Azure Blob Storage
- Azure Databricks
- Azure SQL Data Warehouse
- Azure Analysis Services
- Power BI
The following services were suggested in addition, to bring in real-time data analytics:
- Azure HDInsight Kafka
- Azure Databricks Spark
- Azure Cosmos DB
They then presented the different layers of the architecture and the possible Azure services for each: data acquisition, data storage, data computing and data visualization.
Having worked on on-premises data warehouse projects for 15 years, I always said I preferred ELT over ETL, so I was glad to hear that ELT is now the predominant approach, and Data Factory was presented that way. The new version can now also run SSIS packages. The different data storage options, like Blob Storage, Data Lake, SQL DWH, SQL DB and Cosmos DB, were reviewed. Choosing the right data storage is always about finding the right balance between schema-on-read and schema-on-write. In a modern Azure BI platform you will find a so-called polyglot data storage solution, combining several types of storage services.
A component often emerging in this constellation of tools is Databricks, running Apache Spark for data engineering and data science. Its advantage is that it is closer to the open source world and supports several languages (SQL, Python, R, Scala), and it covers use cases like batch data processing, interactive analytics, machine learning and stream event processing.
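To make the stream event processing use case concrete, here is a minimal plain-Python sketch of a tumbling-window count, the kind of aggregation Spark Structured Streaming runs at scale on Databricks. The event format and window size are my assumptions; real Databricks code would use the PySpark API instead.

```python
from collections import defaultdict

def tumbling_window_counts(events, window_seconds=60):
    """Count events per key in fixed (tumbling) time windows.

    `events` is an iterable of (timestamp_seconds, key) pairs; the result
    maps (window_start, key) -> count. Spark Structured Streaming performs
    the same grouping incrementally over an unbounded stream.
    """
    counts = defaultdict(int)
    for ts, key in events:
        # Align each event to the start of its window.
        window_start = (ts // window_seconds) * window_seconds
        counts[(window_start, key)] += 1
    return dict(counts)

events = [(0, "click"), (10, "click"), (65, "click"), (70, "view")]
print(tumbling_window_counts(events))
# {(0, 'click'): 2, (60, 'click'): 1, (60, 'view'): 1}
```

The same logic in PySpark would be a `groupBy` on a `window()` column, with Spark handling late data and state management for you.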
In such an environment, data virtualization is also an interesting capability of the architecture. It is possible using, for instance, PolyBase, which allows you to query disparately stored data without replicating or duplicating it.
The pre-conference day finished with a presentation of Power BI and Azure Analysis Services, and some automation techniques with ARM templates and PowerShell.
Again an interesting data analytics day… stay tuned…