user-guides

Azure Files SMB — AKS Persistent Volume IOPS Troubleshooting Guide

Context: Azure Premium File Share (100 GiB provisioned, 15.37 GiB used, 179 snapshots) mounted as a Persistent Volume in AKS via SMB. IOPS prereq checks are consistently failing:


1. What to Check

# Check Area Priority
1 Current PV/StorageClass mount options 🔴 Critical
2 SMB protocol version on the node 🔴 Critical
3 Network path — AKS VNet ↔ Storage Account 🔴 Critical
4 Storage account region vs AKS cluster region 🔴 Critical
5 AKS node VM size (network bandwidth cap) 🟡 Medium
6 Azure Storage metrics — throttling errors 🟡 Medium
7 CSI driver version 🟡 Medium
8 Snapshot count and overhead 🟢 Low
9 Provisioned share size vs required IOPS 🟢 Low

2. How to Check

2.1 Current PV / StorageClass Mount Options

# Get the StorageClass name used by the PVC
kubectl get pvc -n <namespace> -o wide

# Inspect the StorageClass
kubectl get storageclass <storageclass-name> -o yaml

# Inspect the PV directly
kubectl get pv <pv-name> -o yaml | grep -A 15 mountOptions

What to look for:

# ❌ BAD — Missing or minimal mount options
mountOptions: []

# ✅ GOOD — Optimized mount options
mountOptions:
  - dir_mode=0777
  - file_mode=0777
  - uid=0
  - gid=0
  - mfsymlinks
  - cache=strict
  - nosharesock
  - nobrl
  - actimeo=30

2.2 SMB Protocol Version on the Node

SSH into the AKS node (or use a debug pod):

# Option A: Debug pod on the node
kubectl debug node/<node-name> -it --image=mcr.microsoft.com/cbl-mariner/busybox:2.0
chroot /host

# Check what's actually mounted
mount | grep cifs

# Look for vers= in the output
# ❌ BAD: vers=2.1 or vers=3.0
# ✅ GOOD: vers=3.1.1

2.3 Network Path — VNet ↔ Storage Account

# From inside a pod that uses the PVC
kubectl exec -it <pod-name> -n <namespace> -- sh

# Check DNS resolution
nslookup <storageaccount>.file.core.windows.net

# Check latency
ping -c 10 <storageaccount>.file.core.windows.net

# ❌ BAD: Resolves to public IP, latency > 5ms
# ✅ GOOD: Resolves to private IP (10.x.x.x), latency < 2ms
# Check if private endpoint exists (Azure CLI)
az storage account show \
  --name <storageaccount> \
  --resource-group <rg> \
  --query "privateEndpointConnections" -o table

# Check service endpoints on the AKS subnet
az network vnet subnet show \
  --resource-group <rg> \
  --vnet-name <vnet> \
  --name <aks-subnet> \
  --query "serviceEndpoints" -o table

2.4 Storage Account Region vs AKS Region

# AKS cluster region
az aks show --name <cluster> --resource-group <rg> --query "location" -o tsv

# Storage account region
az storage account show --name <storageaccount> --resource-group <rg> --query "primaryLocation" -o tsv

# ❌ BAD: Different regions (e.g., eastus vs westus2)
# ✅ GOOD: Same region

2.5 AKS Node VM Size

# Check node pool VM size
az aks nodepool list --cluster-name <cluster> --resource-group <rg> \
  --query "[].{Name:name, VMSize:vmSize, Count:count}" -o table

# Check expected network bandwidth for the VM SKU
az vm list-skus --location <region> --size <vm-size> --query "[0].capabilities[?name=='UncachedDiskIOPS' || name=='ExpectedNetworkBandwidth']" -o table

Note: Small VM SKUs like Standard_B2s or Standard_D2s_v3 have limited network bandwidth (e.g., 1 Gbps) which can bottleneck SMB throughput.

2.6 Azure Storage Metrics — Throttling

# Check for throttling events in the last hour
az monitor metrics list \
  --resource "/subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.Storage/storageAccounts/<account>/fileServices/default" \
  --metric "Transactions" \
  --dimension "ResponseType" \
  --interval PT1H \
  --query "value[0].timeseries[?metadatavalues[0].value=='ClientThrottlingError' || metadatavalues[0].value=='ServerBusyError']" \
  -o table

Or in the Azure Portal:

  1. Storage Account → MonitoringMetrics
  2. Metric: Transactions
  3. Split by: Response type
  4. Filter for: ClientThrottlingError, ServerBusyError

2.7 CSI Driver Version

# Check Azure Files CSI driver version
kubectl get pods -n kube-system -l app=csi-azurefile-node -o jsonpath='{.items[0].spec.containers[0].image}'

# ❌ BAD: v1.28 or older
# ✅ GOOD: v1.30+ (latest stable)

2.8 Snapshot Count

# List snapshots and total size
az storage share-rm list \
  --storage-account <storageaccount> \
  --resource-group <rg> \
  --query "[].{Name:name, Quota:shareQuota, UsedBytes:shareUsageBytes}" -o table

# Check snapshot count
az storage share snapshot list \
  --account-name <storageaccount> \
  --name <sharename> \
  --query "length(@)" -o tsv

2.9 Provisioned Size vs Required IOPS

For Premium Azure Files, baseline IOPS is calculated as:

Baseline IOPS = MAX(3,000 + (1 × Provisioned GiB), 100)
Burst IOPS    = MAX(10,000, 3 × Baseline IOPS)
Provisioned GiB Baseline IOPS Burst IOPS
100 3,100 10,000
256 3,256 10,000
1,024 4,024 12,072

Your 3,100 baseline IOPS should be more than enough for the 1,800 read + 600 write threshold. The problem is not provisioned IOPS — it’s delivery to the pod.


3. Reasoning

Why are you getting 378 Read / 123 Write IOPS instead of 3,100?

┌──────────┐    ┌──────────────┐    ┌─────────────────┐    ┌──────────────────┐
│  Pod     │───>│  AKS Node    │───>│  Network Path   │───>│  Azure Storage   │
│          │    │ (SMB Client) │    │  (VNet/PE/SE)   │    │  (Premium Files) │
└──────────┘    └──────────────┘    └─────────────────┘    └──────────────────┘
                 ▲                    ▲                      ▲
                 │                    │                      │
            Bottleneck 1         Bottleneck 2           Bottleneck 3
         Mount options,       Latency, no private     Throttling,
         SMB version,         endpoint, cross-region  snapshot overhead
         VM size limits       routing
Root Cause Likelihood Impact on IOPS
Suboptimal SMB mount options (missing nosharesock, nobrl, cache=strict, small buffer sizes) 🔴 Very High Can reduce IOPS by 3-5x
No private endpoint — traffic routing through public internet or NAT gateway 🔴 High Adds 5-20ms latency per operation
Old SMB version (vers=2.1 or 3.0 instead of 3.1.1) 🟡 Medium 20-40% IOPS reduction
Small AKS node VM — limited network bandwidth 🟡 Medium Caps throughput regardless of storage
SMB protocol overhead on Linux — encryption, signing, non-native protocol 🟡 Medium Inherent ~30-50% overhead vs NFS
High snapshot count (179) — metadata overhead 🟢 Low Minimal direct IOPS impact

4. Solutions — Minimal Change to Long-Term

Phase 1: Quick Wins (Minimal Change) ⏱️ ~30 minutes

1A. Optimize Mount Options on StorageClass

Create a new optimized StorageClass:

# file: storageclass-smb-optimized.yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: azurefile-premium-optimized
provisioner: file.csi.azure.com
parameters:
  skuName: Premium_LRS
  protocol: smb
mountOptions:
  - dir_mode=0777
  - file_mode=0777
  - uid=0
  - gid=0
  - mfsymlinks
  - cache=strict        # Enable client-side caching
  - nosharesock          # Dedicated TCP connection per mount
  - nobrl                # Disable byte range locks (if app doesn't need them)
  - actimeo=30           # Cache file attributes for 30 seconds
reclaimPolicy: Retain
volumeBindingMode: Immediate
allowVolumeExpansion: true
kubectl apply -f storageclass-smb-optimized.yaml

⚠️ Note: Existing PVCs cannot change StorageClass. You must create a new PVC and migrate data.

Expected impact: 🟡 2-3x IOPS improvement (378 → 800-1,100 read IOPS)

1B. Ensure Private Endpoint / Service Endpoint Exists

# Create a private endpoint if one doesn't exist
az network private-endpoint create \
  --name pe-storageaccount \
  --resource-group <rg> \
  --vnet-name <aks-vnet> \
  --subnet <aks-subnet-or-pe-subnet> \
  --private-connection-resource-id "/subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.Storage/storageAccounts/<account>" \
  --group-id file \
  --connection-name pe-connection

Expected impact: 🟡 1.5-2x IOPS improvement (reduces latency from ~10-20ms to ~1-2ms)


This is the single most impactful change you can make.

2A. Create NFS StorageClass

# file: storageclass-nfs.yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: azurefile-premium-nfs
provisioner: file.csi.azure.com
parameters:
  protocol: nfs           # Switch from SMB to NFS v4.1
  skuName: Premium_LRS
mountOptions:
  - nconnect=8             # 8 parallel TCP connections (massive perf boost)
  - actimeo=30
reclaimPolicy: Retain
volumeBindingMode: Immediate
allowVolumeExpansion: true
kubectl apply -f storageclass-nfs.yaml

2B. Create New PVC

# file: pvc-nfs-profiles.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: profiles-nfs
  namespace: <namespace>
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: azurefile-premium-nfs
  resources:
    requests:
      storage: 100Gi
kubectl apply -f pvc-nfs-profiles.yaml

2C. Migrate Data (Old SMB PV → New NFS PV)

# file: migration-job.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: pv-data-migration
  namespace: <namespace>
spec:
  template:
    spec:
      containers:
      - name: migrate
        image: mcr.microsoft.com/cbl-mariner/busybox:2.0
        command:
          - sh
          - -c
          - |
            echo "Starting migration..."
            cp -av /src/* /dst/
            echo "Migration complete!"
            echo "Source file count: $(find /src -type f | wc -l)"
            echo "Dest file count: $(find /dst -type f | wc -l)"
        volumeMounts:
        - name: smb-source
          mountPath: /src
          readOnly: true
        - name: nfs-dest
          mountPath: /dst
      volumes:
      - name: smb-source
        persistentVolumeClaim:
          claimName: <existing-smb-pvc-name>   # Current SMB PVC
      - name: nfs-dest
        persistentVolumeClaim:
          claimName: profiles-nfs               # New NFS PVC
      restartPolicy: Never
  backoffLimit: 2
kubectl apply -f migration-job.yaml
kubectl logs -f job/pv-data-migration -n <namespace>

2D. Update Deployment to Use New NFS PVC

# In your deployment spec, change:
volumes:
  - name: profiles
    persistentVolumeClaim:
      claimName: profiles-nfs    # <-- Updated from old SMB PVC name

Prerequisites for NFS

# Verify/update storage account for NFS support
az storage account update \
  --name <storageaccount> \
  --resource-group <rg> \
  --https-only false    # NFS requires this if using existing account

# Or create a new dedicated FileStorage account for NFS
az storage account create \
  --name <newstorageaccount> \
  --resource-group <rg> \
  --location <same-region-as-aks> \
  --sku Premium_LRS \
  --kind FileStorage \
  --https-only false \
  --default-action Deny

Expected impact: 🟢 5-8x IOPS improvement (378 → 2,000-3,100 read IOPS)


Phase 3: Additional Optimizations (If Needed)

Action Command Impact
Increase provisioned size to 256 GiB az storage share-rm update --storage-account <acct> --name <share> --quota 256 Baseline IOPS: 3,100 → 3,256
Clean up old snapshots az storage share snapshot delete --account-name <acct> --name <share> --snapshot <id> Reduces metadata overhead, saves cost
Upgrade AKS node VM size az aks nodepool update --cluster-name <cluster> --name <pool> --resource-group <rg> --node-vm-size Standard_D4s_v3 More network bandwidth

Summary — Solution Comparison

Solution Effort Downtime IOPS Improvement Passes Threshold? Long-Term?
Optimize SMB mount options 🟢 Low ~5 min (PVC recreate) 2-3x (~800-1,100) ⚠️ Maybe No
Add private endpoint 🟢 Low None 1.5-2x ⚠️ Maybe Partial
SMB optimized + private endpoint 🟡 Medium ~5 min 3-4x (~1,200-1,500) ⚠️ Close Partial
Switch to NFS 🟡 Medium ~15 min (migration) 5-8x (~2,000-3,100) Yes ✅ Yes
NFS + increased share size 🟡 Medium ~15 min 8-10x Yes ✅ Yes
Immediate  →  Add private endpoint (if missing) + optimize SMB mount options
Short-term →  Switch to NFS (StorageClass change + data migration)

The NFS switch is the definitive fix — it eliminates the SMB protocol overhead that is the primary reason your 3,100 baseline IOPS is being delivered as only 378 to the pod.