Context: Azure Premium File Share (100 GiB provisioned, 15.37 GiB used, 179 snapshots) mounted as a Persistent Volume in AKS via SMB. IOPS prereq checks are consistently failing:
- Read IOPS: 378 (threshold: 1,800) ❌
- Write IOPS: 123 (threshold: 600) ❌

| # | Check Area | Priority |
|---|---|---|
| 1 | Current PV/StorageClass mount options | 🔴 Critical |
| 2 | SMB protocol version on the node | 🔴 Critical |
| 3 | Network path — AKS VNet ↔ Storage Account | 🔴 Critical |
| 4 | Storage account region vs AKS cluster region | 🔴 Critical |
| 5 | AKS node VM size (network bandwidth cap) | 🟡 Medium |
| 6 | Azure Storage metrics — throttling errors | 🟡 Medium |
| 7 | CSI driver version | 🟡 Medium |
| 8 | Snapshot count and overhead | 🟢 Low |
| 9 | Provisioned share size vs required IOPS | 🟢 Low |
```bash
# Get the StorageClass name used by the PVC
kubectl get pvc -n <namespace> -o wide
# Inspect the StorageClass
kubectl get storageclass <storageclass-name> -o yaml
# Inspect the PV directly
kubectl get pv <pv-name> -o yaml | grep -A 15 mountOptions
```
What to look for:
```yaml
# ❌ BAD: missing or minimal mount options
mountOptions: []
# ✅ GOOD: optimized mount options
mountOptions:
  - dir_mode=0777
  - file_mode=0777
  - uid=0
  - gid=0
  - mfsymlinks
  - cache=strict
  - nosharesock
  - nobrl
  - actimeo=30
```
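To compare a live mount against the GOOD list, you can check the option string inside the parentheses of a `mount | grep cifs` line with a script. The sketch below is a hypothetical helper. `missing_cifs_opts` is an illustrative name, not part of any tool. It covers only the performance-related options from the example and assumes the kernel reports each option in the same `name` or `name=value` form used in `mountOptions`.

```bash
# Hypothetical helper: report recommended CIFS options missing from a mount's
# option string, e.g. "rw,relatime,vers=3.1.1,cache=strict,...".
# RECOMMENDED mirrors the performance-related options in the GOOD example.
RECOMMENDED="cache=strict nosharesock nobrl actimeo=30 mfsymlinks"

missing_cifs_opts() {
  opts=",$1,"            # wrap in commas so each option is matched exactly
  missing=""
  for want in $RECOMMENDED; do
    case "$opts" in
      *",$want,"*) ;;    # option present
      *) missing="$missing $want" ;;
    esac
  done
  # Trim the leading space; prints an empty line when nothing is missing
  echo "${missing# }"
}

# Example: options copied from one line of `mount | grep cifs`
missing_cifs_opts "rw,relatime,vers=3.1.1,cache=strict,uid=0,gid=0,actimeo=30"
```

For the example string it prints `nosharesock nobrl mfsymlinks`. Those are the options you would add to the StorageClass.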
SSH into the AKS node (or use a debug pod):
```bash
# Option A: Debug pod on the node
kubectl debug node/<node-name> -it --image=mcr.microsoft.com/cbl-mariner/busybox:2.0
# Then, inside the debug pod's shell:
chroot /host
# Check what's actually mounted
mount | grep cifs
# Look for vers= in the output
# ❌ BAD: vers=2.1 or vers=3.0
# ✅ GOOD: vers=3.1.1
```
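Reading `vers=` by eye works for one node. When you check several nodes, a small parser helps. This sketch assumes the option list appears in parentheses after `type cifs`, as in typical `mount` output. `smb_version` and `check_smb_version` are hypothetical names.

```bash
# Hypothetical helper: print the negotiated SMB dialect (the vers= value)
# from one line of `mount | grep cifs` output.
smb_version() {
  # Split the line on "(", ")" and "," and keep the value after "vers="
  printf '%s\n' "$1" | tr '(),' '\n\n\n' | sed -n 's/^vers=//p'
}

# Flag anything other than SMB 3.1.1, matching the BAD/GOOD guidance
check_smb_version() {
  v=$(smb_version "$1")
  if [ "$v" = "3.1.1" ]; then
    echo "OK: vers=$v"
  else
    echo "WARN: vers=${v:-unknown} (expected 3.1.1)"
  fi
}

line='//acct.file.core.windows.net/share on /mnt/data type cifs (rw,relatime,vers=3.0,cache=strict)'
check_smb_version "$line"
```

For the sample line it prints `WARN: vers=3.0 (expected 3.1.1)`.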
```bash
# From inside a pod that uses the PVC
kubectl exec -it <pod-name> -n <namespace> -- sh
# Then, inside the pod: check DNS resolution
nslookup <storageaccount>.file.core.windows.net
# Check latency (Azure Storage endpoints may not answer ICMP, so a ping
# timeout on its own does not indicate a network problem)
ping -c 10 <storageaccount>.file.core.windows.net
# ❌ BAD: Resolves to public IP, latency > 5ms
# ✅ GOOD: Resolves to a private IP from the VNet range (e.g. 10.x.x.x), latency < 2ms
# Check if a private endpoint exists (Azure CLI, run from your workstation)
az storage account show \
  --name <storageaccount> \
  --resource-group <rg> \
  --query "privateEndpointConnections" -o table
# Check service endpoints on the AKS subnet
az network vnet subnet show \
  --resource-group <rg> \
  --vnet-name <vnet> \
  --name <aks-subnet> \
  --query "serviceEndpoints" -o table
```
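To classify the address that `nslookup` returns, you can test it against the RFC 1918 private ranges (10/8, 172.16/12, 192.168/16). This is a hypothetical helper (`ip_scope` is an illustrative name). Keep one assumption in mind: a private endpoint takes its address from the VNet's range, which is usually, but not necessarily, RFC 1918 space.

```bash
# Hypothetical helper: classify an IPv4 address as private (RFC 1918) or public.
ip_scope() {
  # Split the address into octets on "."
  old_ifs=$IFS; IFS=.
  set -- $1
  IFS=$old_ifs
  a=$1; b=$2
  if [ "$a" -eq 10 ]; then
    echo private
  elif [ "$a" -eq 172 ] && [ "$b" -ge 16 ] && [ "$b" -le 31 ]; then
    echo private
  elif [ "$a" -eq 192 ] && [ "$b" -eq 168 ]; then
    echo private
  else
    echo public
  fi
}

ip_scope 10.224.0.7
ip_scope 20.60.1.2
```

This prints `private` and then `public`.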
```bash
# AKS cluster region
az aks show --name <cluster> --resource-group <rg> --query "location" -o tsv
# Storage account region
az storage account show --name <storageaccount> --resource-group <rg> --query "primaryLocation" -o tsv
# ❌ BAD: Different regions (e.g., eastus vs westus2)
# ✅ GOOD: Same region
```
```bash
# Check node pool VM size
az aks nodepool list --cluster-name <cluster> --resource-group <rg> \
  --query "[].{Name:name, VMSize:vmSize, Count:count}" -o table
# Inspect the SKU's capabilities. UncachedDiskIOPS applies to managed disks,
# not to Azure Files traffic, and expected network bandwidth is published in
# the Azure VM size documentation rather than reliably in this output.
az vm list-skus --location <region> --size <vm-size> --query "[0].capabilities" -o table
```
Note: Small VM SKUs like `Standard_B2s` or `Standard_D2s_v3` have limited network bandwidth (e.g., 1 Gbps), which can bottleneck SMB throughput.
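Bandwidth caps IOPS only once I/O sizes are large enough. A rough ceiling is bandwidth divided by I/O size. The sketch below assumes bandwidth in megabits per second (1 Mbps = 10^6 bit/s) and ignores SMB/TCP overhead, so read its results as upper bounds, not predictions. `iops_ceiling` is a hypothetical name.

```bash
# Rough upper bound on the IOPS a network link can carry.
# $1 = link bandwidth in Mbps, $2 = I/O size in KiB.
# Ignores protocol overhead, so real numbers will be lower.
iops_ceiling() {
  bytes_per_sec=$(( $1 * 1000000 / 8 ))
  io_bytes=$(( $2 * 1024 ))
  echo $(( bytes_per_sec / io_bytes ))
}

iops_ceiling 1000 4      # 1 Gbps link, 4 KiB I/O
iops_ceiling 1000 1024   # 1 Gbps link, 1 MiB I/O
```

This prints `30517` and `119`. If the prereq check uses small I/O sizes, a 1 Gbps link is unlikely to be what holds reads at 378 IOPS. With 1 MiB I/O, the same link tops out near 119 IOPS.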
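```bash
# Check for throttling events in the last hour. The filter matches any
# response type containing "Throttling" (e.g. ClientThrottlingError and the
# share-level SuccessWithShareIopsThrottling / ClientShareIopsThrottlingError)
# plus ServerBusyError.
az monitor metrics list \
  --resource "/subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.Storage/storageAccounts/<account>/fileServices/default" \
  --metric "Transactions" \
  --dimension "ResponseType" \
  --interval PT1H \
  --query "value[0].timeseries[?contains(metadatavalues[0].value, 'Throttling') || metadatavalues[0].value=='ServerBusyError']" \
  -o table
```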
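Or in the Azure Portal: open the storage account's Metrics, select the `Transactions` metric, apply splitting by `Response type`, and look for `ClientThrottlingError`, `ServerBusyError`, or share-level throttling response types.
```bash
# Check the Azure Files CSI driver version (image of the "azurefile" container;
# containers[0] in this pod is not necessarily the driver itself)
kubectl get pods -n kube-system -l app=csi-azurefile-node \
  -o jsonpath='{.items[0].spec.containers[?(@.name=="azurefile")].image}'
# ❌ BAD: v1.28 or older
# ✅ GOOD: v1.30+ (recent releases at the time of writing)
# On AKS the driver is managed with the cluster, so upgrading the cluster's
# Kubernetes version is the usual way to get a newer driver.
```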
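When checking the image string from a script, you can extract the major and minor version from the tag and compare them with the v1.30 guideline. This sketch assumes the image reference ends in a tag like `:v1.30.3`, not a `@sha256:` digest. `csi_version_ok` is a hypothetical name, and the image paths below are examples only.

```bash
# Hypothetical helper: compare a CSI image tag against the v1.30 guideline.
csi_version_ok() {
  tag=${1##*:}           # text after the last ":"  -> "v1.30.3"
  tag=${tag#v}           # drop the leading "v"     -> "1.30.3"
  major=${tag%%.*}       # "1"
  rest=${tag#*.}         # "30.3"
  minor=${rest%%.*}      # "30"
  if [ "$major" -gt 1 ] || { [ "$major" -eq 1 ] && [ "$minor" -ge 30 ]; }; then
    echo "OK: v$major.$minor"
  else
    echo "OLD: v$major.$minor (guideline: v1.30+)"
  fi
}

csi_version_ok "mcr.microsoft.com/oss/kubernetes-csi/azurefile-csi:v1.30.3"
csi_version_ok "mcr.microsoft.com/oss/kubernetes-csi/azurefile-csi:v1.28.7"
```

This prints `OK: v1.30` and then `OLD: v1.28 (guideline: v1.30+)`.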
```bash
# List shares and their quota (shareUsageBytes may be empty in list output;
# `az storage share-rm show --expand stats` includes it)
az storage share-rm list \
  --storage-account <storageaccount> \
  --resource-group <rg> \
  --query "[].{Name:name, Quota:shareQuota, UsedBytes:shareUsageBytes}" -o table
# Count snapshots of the share
az storage share list \
  --account-name <storageaccount> \
  --include-snapshots \
  --query "[?name=='<sharename>' && snapshot] | length(@)" -o tsv
```
For Premium Azure Files (provisioned v1 model), baseline and burst IOPS are calculated as:
- Baseline IOPS = MIN(3,000 + 1 × Provisioned GiB, 100,000)
- Burst IOPS = MIN(MAX(10,000, 3 × Baseline IOPS), 100,000)

| Provisioned GiB | Baseline IOPS | Burst IOPS |
|---|---|---|
| 100 | 3,100 | 10,000 |
| 256 | 3,256 | 10,000 |
| 1,024 | 4,024 | 12,072 |
Your 3,100 baseline IOPS comfortably covers the combined 2,400 IOPS (1,800 read + 600 write) that the checks require. Provisioned IOPS is therefore unlikely to be the limit; the bottleneck is more likely in how I/O is delivered to the pod.
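The table values can be reproduced with a few lines of shell arithmetic. This sketch encodes the published provisioned v1 formulas: baseline is 3,000 plus 1 IOPS per GiB, and burst is the larger of 10,000 and 3 × baseline, with both capped at 100,000. `premium_iops` is an illustrative name.

```bash
# Premium Azure Files (provisioned v1) IOPS from the provisioned size in GiB:
#   baseline = min(3000 + GiB, 100000)
#   burst    = min(max(10000, 3 * baseline), 100000)
premium_iops() {
  gib=$1
  baseline=$(( 3000 + gib ))
  [ "$baseline" -gt 100000 ] && baseline=100000
  burst=$(( 3 * baseline ))
  [ "$burst" -lt 10000 ] && burst=10000
  [ "$burst" -gt 100000 ] && burst=100000
  echo "$gib GiB: baseline=$baseline burst=$burst"
}

for size in 100 256 1024; do
  premium_iops "$size"
done
```

The three lines it prints match the table: 3,100/10,000, 3,256/10,000 and 4,024/12,072.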
```text
┌──────────┐    ┌──────────────┐    ┌─────────────────┐    ┌──────────────────┐
│   Pod    │───>│   AKS Node   │───>│  Network Path   │───>│  Azure Storage   │
│          │    │ (SMB Client) │    │  (VNet/PE/SE)   │    │ (Premium Files)  │
└──────────┘    └──────────────┘    └─────────────────┘    └──────────────────┘
                       ▲                     ▲                      ▲
                       │                     │                      │
                 Bottleneck 1          Bottleneck 2            Bottleneck 3
                 Mount options,        Latency, no private     Throttling,
                 SMB version,          endpoint, cross-region  snapshot overhead
                 VM size limits        routing
```
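Little's law makes the idea of a delivery bottleneck concrete: sustained IOPS ≈ outstanding I/Os ÷ per-operation latency. The sketch below assumes you know the prereq check's queue depth, which is not stated here. It works in microseconds to stay in integer shell arithmetic. `implied_latency_us` and `projected_iops` are hypothetical names.

```bash
# Little's law for storage: IOPS ~= queue_depth / latency.
# Latencies are in microseconds to keep to integer arithmetic.

# Per-operation latency implied by a measured IOPS figure at a given queue depth
implied_latency_us() {   # $1 = measured IOPS, $2 = queue depth
  echo $(( $2 * 1000000 / $1 ))
}

# IOPS achievable at a given queue depth if latency stayed the same
projected_iops() {       # $1 = queue depth, $2 = latency in microseconds
  echo $(( $1 * 1000000 / $2 ))
}

lat=$(implied_latency_us 378 1)   # read IOPS from the failing check, assuming QD 1
echo "implied latency: ${lat} us"
echo "QD 8 at same latency: $(projected_iops 8 "$lat") IOPS"
```

At a queue depth of 1, 378 IOPS implies about 2,645 µs per operation. If latency held steady, a queue depth of 8 would allow about 3,024 IOPS. So the check's concurrency is worth confirming alongside the three bottlenecks.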

| Root Cause | Likelihood | Estimated Impact on IOPS |
|---|---|---|
| Suboptimal SMB mount options (missing nosharesock, nobrl, cache=strict, small buffer sizes) | 🔴 Very High | Can reduce IOPS by 3-5x |
| No private endpoint: traffic goes to the storage account's public endpoint, possibly through a NAT gateway | 🔴 High | Adds 5-20 ms latency per operation |
| Old SMB version (vers=2.1 or 3.0 instead of 3.1.1) | 🟡 Medium | 20-40% IOPS reduction |
| Small AKS node VM: limited network bandwidth | 🟡 Medium | Caps throughput regardless of storage |
| SMB protocol overhead on Linux (encryption, signing, non-native protocol) | 🟡 Medium | Roughly 30-50% overhead vs NFS |
| High snapshot count (179): metadata overhead | 🟢 Low | Minimal direct IOPS impact |
Create a new optimized StorageClass:
```yaml
# file: storageclass-smb-optimized.yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: azurefile-premium-optimized
provisioner: file.csi.azure.com
parameters:
  skuName: Premium_LRS
  protocol: smb
mountOptions:
  - dir_mode=0777
  - file_mode=0777
  - uid=0
  - gid=0
  - mfsymlinks
  - cache=strict   # Strict client caching (the kernel default; listed explicitly)
  - nosharesock    # Dedicated TCP connection per mount
  - nobrl          # Disable byte range locks (if app doesn't need them)
  - actimeo=30     # Cache file attributes for 30 seconds
reclaimPolicy: Retain
volumeBindingMode: Immediate
allowVolumeExpansion: true
```
```bash
kubectl apply -f storageclass-smb-optimized.yaml
```
⚠️ Note: Existing PVCs cannot change StorageClass. You must create a new PVC and migrate data.
Expected impact: 🟡 2-3x IOPS improvement (378 → 800-1,100 read IOPS)
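```bash
# Create a private endpoint if one doesn't exist
az network private-endpoint create \
  --name pe-storageaccount \
  --resource-group <rg> \
  --vnet-name <aks-vnet> \
  --subnet <aks-subnet-or-pe-subnet> \
  --private-connection-resource-id "/subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.Storage/storageAccounts/<account>" \
  --group-id file \
  --connection-name pe-connection
# The endpoint only helps if DNS resolves the account to its private IP:
# link a privatelink.file.core.windows.net zone to the VNet and the endpoint
az network private-dns zone create --resource-group <rg> --name privatelink.file.core.windows.net
az network private-dns link vnet create --resource-group <rg> --zone-name privatelink.file.core.windows.net \
  --name <link-name> --virtual-network <aks-vnet> --registration-enabled false
az network private-endpoint dns-zone-group create --resource-group <rg> --endpoint-name pe-storageaccount \
  --name default --private-dns-zone privatelink.file.core.windows.net --zone-name file
```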
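Expected impact: 🟡 1.5-2x IOPS improvement from lower per-operation network latency (from ~10-20 ms toward ~1-2 ms when public routing is the cause). The actual gain depends on how much of today's latency comes from the network path rather than from the SMB client or the share.

Switching to NFS v4.1 is the single most impactful change you can make.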
```yaml
# file: storageclass-nfs.yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: azurefile-premium-nfs
provisioner: file.csi.azure.com
parameters:
  protocol: nfs          # Switch from SMB to NFS v4.1
  skuName: Premium_LRS   # NFS shares require a Premium (FileStorage) account
mountOptions:
  - nconnect=8           # Parallel TCP connections per mount; check node OS/kernel support
  - actimeo=30
reclaimPolicy: Retain
volumeBindingMode: Immediate
allowVolumeExpansion: true
```
```bash
kubectl apply -f storageclass-nfs.yaml
```
```yaml
# file: pvc-nfs-profiles.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: profiles-nfs
  namespace: <namespace>
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: azurefile-premium-nfs
  resources:
    requests:
      storage: 100Gi
```
```bash
kubectl apply -f pvc-nfs-profiles.yaml
```
```yaml
# file: migration-job.yaml
# Scale the application down first so the source share is not changing mid-copy.
apiVersion: batch/v1
kind: Job
metadata:
  name: pv-data-migration
  namespace: <namespace>
spec:
  template:
    spec:
      containers:
        - name: migrate
          image: mcr.microsoft.com/cbl-mariner/busybox:2.0
          command:
            - sh
            - -c
            - |
              set -e
              echo "Starting migration..."
              # "/src/." also copies hidden files; "/src/*" would skip them
              cp -a /src/. /dst/
              echo "Migration complete!"
              echo "Source file count: $(find /src -type f | wc -l)"
              echo "Dest file count: $(find /dst -type f | wc -l)"
          volumeMounts:
            - name: smb-source
              mountPath: /src
              readOnly: true
            - name: nfs-dest
              mountPath: /dst
      volumes:
        - name: smb-source
          persistentVolumeClaim:
            claimName: <existing-smb-pvc-name>  # Current SMB PVC
        - name: nfs-dest
          persistentVolumeClaim:
            claimName: profiles-nfs  # New NFS PVC
      restartPolicy: Never
  backoffLimit: 2
```
```bash
kubectl apply -f migration-job.yaml
kubectl logs -f job/pv-data-migration -n <namespace>
```
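Matching file counts in the job log do not prove the content arrived intact. A stricter check compares per-file checksums between the two trees, for example from a second pod that mounts both PVCs. This is an illustrative sketch: `tree_manifest` and `verify_copy` are hypothetical names. It relies on `find -exec ... {} +` and `cksum`, which GNU coreutils and typical BusyBox builds provide. It compares file content only, not permissions or ownership.

```bash
# Build a sorted "checksum size path" manifest of every file under a root.
# Paths are relative (./...), so two roots with identical content match.
tree_manifest() {
  (cd "$1" && find . -type f -exec cksum {} + | sort)
}

# Return 0 if both trees contain the same files with the same content.
verify_copy() {
  [ -d "$1" ] && [ -d "$2" ] || { echo "missing directory" >&2; return 2; }
  if [ "$(tree_manifest "$1")" = "$(tree_manifest "$2")" ]; then
    echo "OK: $1 and $2 match"
    return 0
  fi
  echo "MISMATCH: $1 and $2 differ" >&2
  return 1
}

# Usage inside a pod that mounts both volumes:
#   verify_copy /src /dst
```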
```yaml
# In your deployment spec, change:
volumes:
  - name: profiles
    persistentVolumeClaim:
      claimName: profiles-nfs  # <-- Updated from old SMB PVC name
```
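```bash
# Only needed if you manage the NFS storage account yourself (e.g. static
# provisioning); with the StorageClass above, the CSI driver typically finds
# or creates a suitable account.
# Verify/update the storage account for NFS support: NFS requires secure
# transfer (HTTPS-only) to be disabled. On an existing account this also
# stops enforcing encryption for SMB clients.
az storage account update \
  --name <storageaccount> \
  --resource-group <rg> \
  --https-only false
# Or create a new dedicated FileStorage account for NFS
# (--default-action Deny blocks access until a VNet rule or private
# endpoint for the AKS subnet is added)
az storage account create \
  --name <newstorageaccount> \
  --resource-group <rg> \
  --location <same-region-as-aks> \
  --sku Premium_LRS \
  --kind FileStorage \
  --https-only false \
  --default-action Deny
```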
Expected impact: 🟢 5-8x IOPS improvement (378 → 2,000-3,100 read IOPS)

| Action | Command | Impact |
|---|---|---|
| Increase provisioned size to 256 GiB | `az storage share-rm update --resource-group <rg> --storage-account <acct> --name <share> --quota 256` | Baseline IOPS: 3,100 → 3,256 |
| Clean up old snapshots | `az storage share delete --account-name <acct> --name <share> --snapshot <snapshot-timestamp>` | Reduces metadata overhead, saves cost |
| Upgrade AKS node VM size | Add a pool with a larger size, then cordon and drain the old pool: `az aks nodepool add --cluster-name <cluster> --resource-group <rg> --name <newpool> --node-vm-size Standard_D4s_v3` | More network bandwidth |
| Solution | Effort | Downtime | IOPS Improvement | Passes Threshold? | Long-Term? |
|---|---|---|---|---|---|
| Optimize SMB mount options | 🟢 Low | ~5 min (PVC recreate) | 2-3x (~800-1,100) | ❌ Unlikely | No |
| Add private endpoint | 🟢 Low | None | 1.5-2x | ❌ Unlikely | Partial |
| SMB optimized + private endpoint | 🟡 Medium | ~5 min | 3-4x (~1,200-1,500) | ⚠️ Close | Partial |
| Switch to NFS ⭐ | 🟡 Medium | ~15 min (migration) | 5-8x (~2,000-3,100) | ✅ Yes | ✅ Yes |
| NFS + increased share size | 🟡 Medium | ~15 min | 8-10x | ✅ Yes | ✅ Yes |
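To see how the "Passes Threshold?" column follows from the estimated multipliers, apply them to the measured IOPS. This sketch reuses the table's own estimates, which are projections, not measurements. Multipliers are passed in tenths (15 = 1.5x) to stay in integer arithmetic. `threshold_outlook` is a hypothetical name.

```bash
# Hypothetical helper: does a measured IOPS figure, scaled by an estimated
# improvement range, reach a threshold?
# $1 = measured IOPS, $2 = low multiplier x10, $3 = high multiplier x10, $4 = threshold
threshold_outlook() {
  low=$(( $1 * $2 / 10 ))
  high=$(( $1 * $3 / 10 ))
  if [ "$low" -ge "$4" ]; then
    verdict="passes"
  elif [ "$high" -ge "$4" ]; then
    verdict="maybe"
  else
    verdict="fails"
  fi
  echo "$low-$high vs $4: $verdict"
}

# Read IOPS (378) against the 1,800 threshold, using the table's estimates
threshold_outlook 378 20 30 1800   # SMB mount options, 2-3x
threshold_outlook 378 30 40 1800   # SMB optimized + private endpoint, 3-4x
threshold_outlook 378 50 80 1800   # NFS, 5-8x
```

This prints `756-1134 vs 1800: fails`, `1134-1512 vs 1800: fails` and `1890-3024 vs 1800: passes`. Only the NFS estimate clears the read threshold. The same 5-8x range applied to writes (123 IOPS) gives 615-984, which also clears 600.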
- Immediate → Add private endpoint (if missing) + optimize SMB mount options
- Short-term → Switch to NFS (StorageClass change + data migration)

The NFS switch is the recommended long-term fix. It removes the SMB protocol overhead that is most likely the main reason your 3,100 baseline IOPS reaches the pod as only 378.