Tuesday, May 15, 2018

Azure @ Enterprise - Finding how many nodes are really created for one HDInsight cluster

When we create an Azure HDInsight cluster (HDICluster), it internally creates virtual machines. The cluster creation blade in the Azure portal asks for details about the head and worker nodes. We cannot set the number of head nodes, only the number of worker nodes. All good so far.

But @ enterprise, if the HDInsight cluster needs to be in a vNet, we can run out of IP addresses in the subnet. It gets worse if the clusters are created dynamically by a multi-tenant application. It is very difficult to calculate the IP address requirements of an HDICluster if we don't know how many VMs get created internally for one HDInsight cluster, on top of the worker nodes we asked for.

Is that not available publicly? Yes, it is, and the links are below.
https://blogs.msdn.microsoft.com/azuredatalake/2017/03/10/nodes-in-hdinsight/
https://docs.microsoft.com/en-us/azure/hdinsight/hdinsight-hadoop-port-settings-for-services

The links above say that for Spark it creates head nodes, ZooKeeper nodes and gateway nodes. But how do we validate how many machines are actually created, or verify the facts ourselves? The portal never tells us how many machines were created when we navigate to the resource blade of an existing HDICluster. The PowerShell object of the HDICluster instance doesn't have direct info about the internal machines either. So what is the alternative?

PowerShell to retrieve nodes

Again, PowerShell and some string comparisons to the rescue. Below is the script.

$hdiClusterName = "<name of cluster without domain>"

"Assumption 1 - The vNet and subnet of all nodes are the same."
"Assumption 2 - The vNet, public IP addresses and NICs are in the same resource group."
"Assumption 3 - There will be gateway nodes for the HDICluster and the public IP address for the gateway is named in the format publicIpgateway-<internal id>"
"Assumption 4 - A unique internal id is used to name the nodes, NICs, public addresses etc. This script heavily depends on that internal id based naming convention."

"--------------------------------------------------------"

$resource = Get-AzureRmResource -ResourceId (Get-AzureRmHDInsightCluster -ClusterName $hdiClusterName).Id

$hdiClustersVNetResourceGroupName = (Get-AzureRmResource -ResourceId $resource.Properties.computeProfile.roles[0].virtualNetworkProfile.id).ResourceGroupName

"ResourceGroup of vNet associated with HDI cluster - $hdiClustersVNetResourceGroupName"

$publicAddress = (Get-AzureRmPublicIpAddress -ResourceGroupName $hdiClustersVNetResourceGroupName) | Where-Object {$_.DnsSettings.DomainNameLabel -eq $hdiClusterName}

$publicIpgatewayName = $publicAddress.Name

$hdiClusterInternalId = $publicIpgatewayName.Split('-')[1]

"Internal Id of HDI used to create nodes - $hdiClusterInternalId"

"Below are the NICs used by $hdiClusterName HDI Cluster. Each NIC corresponds to one node."

$nics = Get-AzureRmNetworkInterface -ResourceGroupName $hdiClustersVNetResourceGroupName
$nics = $nics | Where-Object {$_.Name -like "*$hdiClusterInternalId"}
$nics | Select-Object -Property Name

As we can see, the script relies on the naming convention of the NICs. If Microsoft changes it, the script will fail.
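To go from a flat list of NIC names to a count per node type, the names can be grouped by the role part of the name. Below is a minimal sketch of that idea. The NIC names in it are made up for illustration, and the pattern nic-<role>-<index>-<internal id> is an assumption; check it against the names the script above actually prints. Get-NodeRoleCount is a hypothetical helper, not part of any Azure module.

```powershell
# Hypothetical NIC names, for illustration only. In practice pass ($nics | ForEach-Object Name).
# Assumed naming pattern: nic-<role>-<index>-<internal id>
$sampleNicNames = @(
    'nic-headnode-0-abc123',
    'nic-headnode-1-abc123',
    'nic-zookeepernode-0-abc123',
    'nic-zookeepernode-1-abc123',
    'nic-zookeepernode-2-abc123',
    'nic-gateway-0-abc123',
    'nic-gateway-1-abc123',
    'nic-workernode-0-abc123',
    'nic-headnode-0-zzz999'      # belongs to a different cluster, must be ignored
)

function Get-NodeRoleCount {
    param(
        [string[]] $NicName,     # NIC names to inspect
        [string]   $InternalId   # internal id of the cluster
    )

    $counts = @{}
    foreach ($name in $NicName) {
        # Same suffix match as the script above
        if ($name -notlike "*$InternalId") { continue }

        # Second dash-separated token is the role under the assumed pattern
        $role = ($name -split '-')[1]
        $counts[$role] = 1 + [int]$counts[$role]
    }
    return $counts
}

$roleCounts = Get-NodeRoleCount -NicName $sampleNicNames -InternalId 'abc123'
$roleCounts.GetEnumerator() | Sort-Object Name | Format-Table Name, Value
```

If the real names follow a different pattern, only the line that extracts $role needs to change.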

From the list we can see it creates 2 head nodes, 3 ZooKeeper nodes and 2 gateway nodes, along with a minimum of 1 worker node. So at least 8 IP addresses will be consumed by one HDInsight cluster. At the time of writing this post, the ZooKeeper and gateway nodes seem to be free. The charge is only for the head and worker node(s).
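With those numbers, the IP address requirement for a subnet can be estimated with simple arithmetic. The sketch below assumes the node counts observed above (2 head, 3 ZooKeeper, 2 gateway) stay fixed, that each node takes exactly one IP address, and that Azure reserves 5 addresses in every subnet. The function names are hypothetical helpers. Treat the result as a rough lower bound, since other resources in the subnet also consume addresses.

```powershell
function Get-HdiIpRequirement {
    param([int] $WorkerNodeCount = 1)

    # Fixed nodes observed in this post: 2 head + 3 ZooKeeper + 2 gateway
    $fixedNodes = 2 + 3 + 2
    return $fixedNodes + $WorkerNodeCount
}

function Get-SubnetUsableIpCount {
    param([int] $PrefixLength)   # e.g. 24 for a /24 subnet

    # Total addresses in the subnet minus the 5 that Azure reserves
    return [long][math]::Pow(2, 32 - $PrefixLength) - 5
}

function Get-HdiClusterCapacity {
    param(
        [int] $PrefixLength,
        [int] $WorkerNodeCount = 1
    )

    $usable     = Get-SubnetUsableIpCount -PrefixLength $PrefixLength
    $perCluster = Get-HdiIpRequirement -WorkerNodeCount $WorkerNodeCount

    # Whole clusters only, so round down
    return [long][math]::Floor($usable / $perCluster)
}

# Example: how many 4-worker clusters fit into an otherwise empty /24 subnet
Get-HdiClusterCapacity -PrefixLength 24 -WorkerNodeCount 4
```

For a multi-tenant application, this kind of check can run before each cluster creation so the request fails early instead of midway through provisioning.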

Ambari Portal

Another way is via the Ambari portal. If we navigate to the URL below, we can see the head, worker and ZooKeeper nodes, but not the gateway nodes.

https://<cluster name>.azurehdinsight.net/#/main/hosts
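The same host list can also be pulled from the Ambari REST API instead of the portal page. Below is a rough sketch, assuming the Ambari cluster name in the API path is the same as the HDInsight cluster name (it may be case sensitive) and that you have the cluster login (admin) credentials. Get-AmbariHostsUri and Get-AmbariHostName are hypothetical helper names. As with the portal, expect the gateway nodes to be missing from the result.

```powershell
function Get-AmbariHostsUri {
    param([Parameter(Mandatory)][string] $ClusterName)

    # Ambari REST endpoint that lists the hosts registered with the cluster
    return "https://${ClusterName}.azurehdinsight.net/api/v1/clusters/${ClusterName}/hosts"
}

function Get-AmbariHostName {
    param(
        [Parameter(Mandatory)][string]       $ClusterName,
        [Parameter(Mandatory)][pscredential] $Credential   # cluster login (admin) credentials
    )

    $uri      = Get-AmbariHostsUri -ClusterName $ClusterName
    $response = Invoke-RestMethod -Uri $uri -Credential $Credential

    # Each item carries the host details under the Hosts property
    $response.items | ForEach-Object { $_.Hosts.host_name }
}

# Usage (needs a live cluster, so it is left commented out):
# Get-AmbariHostName -ClusterName $hdiClusterName -Credential (Get-Credential)
```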

Happy scripting...
