Day 64: Kubernetes Python App Troubleshooting - Fixing Deployment Issues 🐍


Welcome back! 👋 Day 64 of the 100 Days Cloud DevOps Challenge, and today we're troubleshooting a broken Python Flask application on Kubernetes! This is production debugging: identifying configuration errors and restoring service quickly. Let's fix it! 🎯

🎯 The Mission - Fix Python Flask Deployment

📋 TASK TICKET #DEV-8064 - URGENT: Python App Down
Priority: CRITICAL | Type: Incident Response
Server: Jump Host | Cluster: Kubernetes

INCIDENT:
- Python Flask app deployment failed
- Application not accessible
- Misconfiguration in deployment/service
- Need immediate resolution

INVESTIGATION:
- Deployment: python-deployment-nautilus
- Image: poroko/flask-demo-app (mentioned)
- NodePort should be: 32345
- TargetPort should be: Flask default (5000)
- Current status: Not working

SUCCESS CRITERIA:
- Identify configuration issues
- Fix deployment and service
- Application accessible on NodePort 32345
- Flask app responding correctly

This is production troubleshooting! 🚨

🛠️ Complete Troubleshooting & Fix

Step 1: Initial Assessment

# SSH to jump host
ssh user@jump_host

# Check deployment status
kubectl get deployment python-deployment-nautilus

Expected output (broken state):

NAME                          READY   UP-TO-DATE   AVAILABLE   AGE
python-deployment-nautilus    0/1     1            0           10m

Problem indicators:

  • โŒ READY: 0/1 (no pods ready)

  • โŒ AVAILABLE: 0 (service unavailable)

Step 2: Check Pod Status

# Check pods
kubectl get pods -l app=python-deployment-nautilus

Expected output:

NAME                                          READY   STATUS             RESTARTS   AGE
python-deployment-nautilus-7d8f9c6b5-abc12    0/1     ImagePullBackOff   0          10m

Issue identified: ImagePullBackOff - Image cannot be pulled!
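
The same check can be scripted. Below is a rough Python sketch that scans a pod list for containers stuck in ErrImagePull or ImagePullBackOff and reports the image each one is trying to pull. The function name and the sample data are illustrative assumptions; the field paths (`status.containerStatuses[].state.waiting.reason`) follow the shape of `kubectl get pods -o json` output.

```python
# Waiting reasons that mean the image could not be pulled.
PULL_FAILURE_REASONS = {"ErrImagePull", "ImagePullBackOff"}


def find_image_pull_failures(pod_list):
    """Return (pod, container, image, reason) for every container stuck on a pull."""
    failures = []
    for pod in pod_list.get("items", []):
        pod_name = pod["metadata"]["name"]
        for status in pod.get("status", {}).get("containerStatuses", []):
            waiting = status.get("state", {}).get("waiting") or {}
            if waiting.get("reason") in PULL_FAILURE_REASONS:
                failures.append(
                    (pod_name, status["name"], status["image"], waiting["reason"])
                )
    return failures


# Hypothetical sample, trimmed to the fields the function reads.
SAMPLE_POD_LIST = {
    "items": [
        {
            "metadata": {"name": "python-deployment-nautilus-7d8f9c6b5-abc12"},
            "status": {
                "containerStatuses": [
                    {
                        "name": "python-container",
                        "image": "poroko/flask-demo-appimage",
                        "state": {"waiting": {"reason": "ImagePullBackOff"}},
                    }
                ]
            },
        }
    ]
}

for failure in find_image_pull_failures(SAMPLE_POD_LIST):
    print(failure)
```

Printing the image next to the reason puts the suspicious name right in front of you, which is usually enough to spot a typo.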

Step 3: Investigate Deployment Configuration

# Check deployment details
kubectl describe deployment python-deployment-nautilus

Look for image specification:

Pod Template:
  Containers:
   python-container:
    Image:      poroko/flask-demo-appimage

Root cause #1 found: Image name has typo! poroko/flask-demo-appimage should be poroko/flask-demo-app

Step 4: Check Service Configuration

# Check service
kubectl get service python-service-nautilus

Expected output:

NAME                      TYPE       CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE
python-service-nautilus   NodePort   10.96.123.45    <none>        8080:32345/TCP   10m

# Check service details
kubectl describe service python-service-nautilus

Look for ports:

Port:              <unset>  8080/TCP
TargetPort:        8080/TCP
NodePort:          <unset>  32345/TCP

Root cause #2 found: TargetPort is 8080, but Flask default port is 5000!

Step 5: Fix Deployment - Correct Image Name

# Fix image name in deployment
kubectl set image deployment/python-deployment-nautilus python-container=poroko/flask-demo-app

# OR edit deployment directly
kubectl edit deployment python-deployment-nautilus
# Change: image: poroko/flask-demo-appimage
# To:     image: poroko/flask-demo-app

Expected output:

deployment.apps/python-deployment-nautilus image updated

Image fixed! ✅

Step 6: Fix Service - Correct TargetPort

# Edit service to fix targetPort
kubectl edit service python-service-nautilus

# Find the ports section and change:
# FROM:
#   ports:
#   - port: 8080
#     targetPort: 8080
#     nodePort: 32345

# TO:
#   ports:
#   - port: 8080
#     targetPort: 5000
#     nodePort: 32345

# Save and exit

Expected output:

service/python-service-nautilus edited

Service fixed! ✅

Alternative - Patch command:

# Patch service with correct targetPort
kubectl patch service python-service-nautilus -p '{"spec":{"ports":[{"port":8080,"targetPort":5000,"nodePort":32345}]}}'
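
Hand-writing JSON inside shell quotes is easy to get wrong, so here is a small Python sketch that builds the same patch body and prints a safely quoted command. The `build_target_port_patch` helper is a name invented for this example; it only assembles the string and does not run kubectl.

```python
import json
import shlex


def build_target_port_patch(port, target_port, node_port):
    """Build the JSON body for patching one Service port entry."""
    patch = {
        "spec": {
            "ports": [
                {"port": port, "targetPort": target_port, "nodePort": node_port}
            ]
        }
    }
    # Compact separators keep the body on one line with no extra spaces.
    return json.dumps(patch, separators=(",", ":"))


patch_body = build_target_port_patch(8080, 5000, 32345)
command = ["kubectl", "patch", "service", "python-service-nautilus", "-p", patch_body]

# shlex.join quotes the JSON so it can be pasted into a shell as-is.
print(shlex.join(command))
```

Passing the list in `command` to `subprocess.run` would skip the shell entirely, which avoids the quoting question altogether.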

Step 7: Monitor Rollout

# Watch deployment rollout
kubectl rollout status deployment/python-deployment-nautilus

Expected output:

Waiting for deployment "python-deployment-nautilus" rollout to finish: 0 of 1 updated replicas are available...
deployment "python-deployment-nautilus" successfully rolled out

Check pod status:

kubectl get pods -l app=python-deployment-nautilus

Expected output (fixed):

NAME                                          READY   STATUS    RESTARTS   AGE
python-deployment-nautilus-5c7d8e9f6-xyz89    1/1     Running   0          45s

Pod running successfully! ✅

Step 8: Verify Service Configuration

# Check updated service
kubectl get service python-service-nautilus

Expected output:

NAME                      TYPE       CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE
python-service-nautilus   NodePort   10.96.123.45    <none>        8080:32345/TCP   15m

# Verify targetPort
kubectl get service python-service-nautilus -o jsonpath='{.spec.ports[0].targetPort}'
echo

Expected output:

5000

TargetPort corrected to 5000! ✅

Step 9: Test Application Access

# Get node IP
NODE_IP=$(kubectl get nodes -o jsonpath='{.items[0].status.addresses[?(@.type=="InternalIP")].address}')
echo "Access URL: http://$NODE_IP:32345"

# Test Flask application
curl http://$NODE_IP:32345

Expected output:

<!DOCTYPE html>
<html>
<head>
    <title>Flask Demo App</title>
</head>
<body>
    <h1>Welcome to Flask Demo Application!</h1>
    <p>This is a simple Python Flask app running on Kubernetes.</p>
</body>
</html>

Application accessible and working! 🎉
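
For a repeatable check, the curl test can be written as a small Python function. This is a minimal sketch using only the standard library; `http_status` is a hypothetical helper, it handles plain http:// URLs only, and it returns None when nothing answers (connection refused, timeout, and so on).

```python
import http.client
from urllib.parse import urlsplit


def http_status(url, timeout=5.0):
    """Return the HTTP status code for a plain http:// URL, or None if nothing answers."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    conn = http.client.HTTPConnection(parts.hostname, parts.port or 80, timeout=timeout)
    try:
        conn.request("GET", target)
        return conn.getresponse().status
    except (OSError, http.client.HTTPException):
        # Connection refused, timeout, or a malformed reply.
        return None
    finally:
        conn.close()
```

Calling `http_status("http://<node-ip>:32345")` with your node's address should return 200 once both fixes are in place. A None result while the pod is Running points back at the Service ports.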

Step 10: Verify Flask Application

# Check Flask app logs
kubectl logs -l app=python-deployment-nautilus --tail=20

Expected output:

 * Serving Flask app 'app'
 * Debug mode: off
WARNING: This is a development server. Do not use it in a production deployment.
 * Running on all addresses (0.0.0.0)
 * Running on http://127.0.0.1:5000
 * Running on http://10.244.1.5:5000
Press CTRL+C to quit

Flask running on port 5000! ✅

Step 11: Complete Verification

# Comprehensive verification script
cat > verify-python-app.sh << 'EOF'
#!/bin/bash
echo "=== Python Flask Application Fix Verification ==="
echo ""
echo "1. Deployment Status:"
kubectl get deployment python-deployment-nautilus
echo ""
echo "2. Pod Status:"
kubectl get pods -l app=python-deployment-nautilus
echo ""
echo "3. Deployment Image:"
echo "   Current: $(kubectl get deployment python-deployment-nautilus -o jsonpath='{.spec.template.spec.containers[0].image}')"
echo "   Expected: poroko/flask-demo-app"
echo ""
echo "4. Service Configuration:"
kubectl get service python-service-nautilus
echo ""
echo "5. Service Ports:"
echo "   Port: $(kubectl get service python-service-nautilus -o jsonpath='{.spec.ports[0].port}')"
echo "   TargetPort: $(kubectl get service python-service-nautilus -o jsonpath='{.spec.ports[0].targetPort}')"
echo "   NodePort: $(kubectl get service python-service-nautilus -o jsonpath='{.spec.ports[0].nodePort}')"
echo "   Expected TargetPort: 5000 (Flask default)"
echo ""
echo "6. Service Endpoints:"
kubectl get endpoints python-service-nautilus
echo ""
NODE_IP=$(kubectl get nodes -o jsonpath='{.items[0].status.addresses[?(@.type=="InternalIP")].address}')
echo "7. Access URL: http://$NODE_IP:32345"
echo ""
echo "8. Application Health Check:"
HTTP_CODE=$(curl -s -o /dev/null -w "%{http_code}" http://$NODE_IP:32345)
if [ "$HTTP_CODE" = "200" ]; then
    echo "   ✓ Application accessible (HTTP $HTTP_CODE)"
    echo "   ✓ Flask app responding correctly"
else
    echo "   ✗ Application not responding (HTTP $HTTP_CODE)"
fi
echo ""
echo "9. Flask Application Logs (Last 5 lines):"
kubectl logs -l app=python-deployment-nautilus --tail=5
echo ""
echo "10. Issues Fixed:"
echo "    ✓ Image name corrected: poroko/flask-demo-app"
echo "    ✓ TargetPort corrected: 5000 (Flask default)"
echo "    ✓ NodePort maintained: 32345"
echo ""
echo "✓ VERIFICATION COMPLETE - Application Restored"
EOF

chmod +x verify-python-app.sh
./verify-python-app.sh

Expected output:

=== Python Flask Application Fix Verification ===

1. Deployment Status:
NAME                          READY   UP-TO-DATE   AVAILABLE   AGE
python-deployment-nautilus    1/1     1            1           20m

2. Pod Status:
NAME                                          READY   STATUS    RESTARTS   AGE
python-deployment-nautilus-5c7d8e9f6-xyz89    1/1     Running   0          5m

3. Deployment Image:
   Current: poroko/flask-demo-app
   Expected: poroko/flask-demo-app

4. Service Configuration:
NAME                      TYPE       CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE
python-service-nautilus   NodePort   10.96.123.45    <none>        8080:32345/TCP   20m

5. Service Ports:
   Port: 8080
   TargetPort: 5000
   NodePort: 32345
   Expected TargetPort: 5000 (Flask default)

6. Service Endpoints:
NAME                      ENDPOINTS          AGE
python-service-nautilus   10.244.1.5:5000    20m

7. Access URL: http://172.17.0.2:32345

8. Application Health Check:
   ✓ Application accessible (HTTP 200)
   ✓ Flask app responding correctly

9. Flask Application Logs (Last 5 lines):
 * Running on all addresses (0.0.0.0)
 * Running on http://127.0.0.1:5000
 * Running on http://10.244.1.5:5000
Press CTRL+C to quit
172.17.0.1 - - [08/Jan/2026 10:15:30] "GET / HTTP/1.1" 200 -

10. Issues Fixed:
    ✓ Image name corrected: poroko/flask-demo-app
    ✓ TargetPort corrected: 5000 (Flask default)
    ✓ NodePort maintained: 32345

✓ VERIFICATION COMPLETE - Application Restored

All issues resolved! 🎊

๐Ÿ” Understanding the Issues

Issue #1: Image Name Typo

The Problem:

# Broken configuration:
image: poroko/flask-demo-appimage
#                           ↑↑↑↑↑
#                           Extra "image" suffix

What happened:

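1. The kubelet asks the container runtime to pull the image
2. The runtime looks up poroko/flask-demo-appimage (on Docker Hub, typically the default registry for names without a registry prefix)
3. Image not found (doesn't exist)
4. Pod status: ErrImagePull, then ImagePullBackOff
5. Application down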

The Fix:

# Correct configuration:
image: poroko/flask-demo-app

How to identify:

# Check pod events
kubectl describe pod POD_NAME

# Look for:
Events:
  Type     Reason     Message
  ----     ------     -------
  Warning  Failed     Failed to pull image "poroko/flask-demo-appimage"
  Warning  Failed     Error: ErrImagePull
  Normal   BackOff    Back-off pulling image "poroko/flask-demo-appimage"

Issue #2: Wrong TargetPort

The Problem:

# Broken service configuration:
spec:
  ports:
  - port: 8080
    targetPort: 8080  # Wrong! Flask uses 5000
    nodePort: 32345

Understanding the ports:

Port Flow:
External Request → NodePort (32345)
  ↓
Service Port (8080)
  ↓
TargetPort (should be 5000) ← Flask listens here
  ↓
Container Port (5000)

Problem:
Service sends traffic to port 8080
But Flask listens on port 5000
Result: Connection refused

Flask Default Port:

# Flask applications default to port 5000
if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)

The Fix:

# Correct service configuration:
spec:
  ports:
  - port: 8080        # Service port (can be anything)
    targetPort: 5000  # Must match Flask port!
    nodePort: 32345   # External access port

Port Mapping Explained

Complete flow:

User Browser:
http://NodeIP:32345
  ↓
Node (kube-proxy):
Receives on port 32345
  ↓
Service:
Routes to port 8080 (service port)
  ↓
TargetPort:
Forwards to pod port 5000
  ↓
Pod/Container:
Flask app listening on 5000
  ↓
Response back through same chain

Port terminology:

nodePort (32345):
- External access point
- Port on every node
- Default range: 30000-32767
- User-facing

port (8080):
- Service port
- Internal cluster port
- ClusterIP:port
- Can be any valid port

targetPort (5000):
- Container port
- Where app actually listens
- Must match container
- Flask default is 5000
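
To make the three fields concrete, here is a small Python sketch that sanity-checks one Service port entry. It assumes the default NodePort range of 30000-32767 (a cluster can be configured differently), and `check_port_entry` is a hypothetical helper, not part of any Kubernetes client.

```python
# Default NodePort range, 30000-32767 inclusive (an assumption: clusters can change it).
DEFAULT_NODE_PORT_RANGE = range(30000, 32768)


def check_port_entry(entry, node_port_range=DEFAULT_NODE_PORT_RANGE):
    """Return a list of range problems found in one Service port entry."""
    problems = []
    for field in ("port", "targetPort"):
        value = entry.get(field)
        # targetPort may also be a port name (a string); only numbers are range-checked.
        if isinstance(value, int) and not 1 <= value <= 65535:
            problems.append(f"{field} {value} is outside 1-65535")
    node_port = entry.get("nodePort")
    if node_port is not None and node_port not in node_port_range:
        low, high = node_port_range.start, node_port_range.stop - 1
        problems.append(f"nodePort {node_port} is outside {low}-{high}")
    return problems


print(check_port_entry({"port": 8080, "targetPort": 5000, "nodePort": 32345}))
print(check_port_entry({"port": 8080, "targetPort": 8080, "nodePort": 32345}))
print(check_port_entry({"port": 8080, "targetPort": 5000, "nodePort": 8080}))
```

Note that the second entry, the broken one from this incident, passes cleanly: 8080 is a perfectly valid port number. A range check like this can't tell you which port Flask listens on.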

Common Port Configuration Mistakes

Mistake #1: Wrong targetPort

# Wrong
targetPort: 80  # Nginx default, not Flask!

# Correct
targetPort: 5000  # Flask default

Mistake #2: Mismatched container port

# Container definition:
ports:
- containerPort: 5000

# Service must target the same:
targetPort: 5000  # Must match!
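
A quick way to catch this kind of mismatch is to compare the two manifests directly. The Python sketch below is illustrative only: `unmatched_target_ports` is a made-up helper, and the sample Deployment assumes the container declares containerPort 5000. Keep in mind that containerPort is a declaration, so a match here is a useful hint, not proof that the app is listening.

```python
def unmatched_target_ports(service, deployment):
    """Return Service targetPorts that no container in the pod template declares."""
    declared_numbers = set()
    declared_names = set()
    for container in deployment["spec"]["template"]["spec"]["containers"]:
        for port in container.get("ports", []):
            declared_numbers.add(port["containerPort"])
            if "name" in port:
                declared_names.add(port["name"])

    unmatched = []
    for entry in service["spec"]["ports"]:
        # When targetPort is omitted it defaults to the Service port.
        target = entry.get("targetPort", entry["port"])
        known = declared_names if isinstance(target, str) else declared_numbers
        if target not in known:
            unmatched.append(target)
    return unmatched


# Hypothetical manifests, trimmed to the fields the function reads.
DEPLOYMENT = {
    "spec": {
        "template": {
            "spec": {
                "containers": [
                    {"name": "python-container", "ports": [{"containerPort": 5000}]}
                ]
            }
        }
    }
}
BROKEN_SERVICE = {"spec": {"ports": [{"port": 8080, "targetPort": 8080, "nodePort": 32345}]}}
FIXED_SERVICE = {"spec": {"ports": [{"port": 8080, "targetPort": 5000, "nodePort": 32345}]}}

print(unmatched_target_ports(BROKEN_SERVICE, DEPLOYMENT))
print(unmatched_target_ports(FIXED_SERVICE, DEPLOYMENT))
```

The broken Service reports 8080 as unmatched, and the fixed one reports nothing.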

Mistake #3: Confusing port types

# All three are different!
nodePort: 32345      # External
port: 8080           # Service
targetPort: 5000     # Container

💡 Key Takeaways

✨ Image names must be exact - typos cause ImagePullBackOff
✨ TargetPort must match application listening port
✨ Flask default port is 5000
✨ kubectl describe reveals configuration issues
✨ Service ports map external to internal traffic
✨ Pod events show image pull failures
✨ Port mismatches cause connection refused
✨ Systematic debugging finds root causes fast

🎓 Quick Interview Questions

Q: What causes ImagePullBackOff error? A: Container runtime cannot pull image. Causes: (1) Image doesn't exist, (2) Wrong image name/tag, (3) Authentication required but not provided, (4) Registry unreachable, (5) Network issues. Check with kubectl describe pod.

Q: What's the difference between port, targetPort, and nodePort? A: port: Service's internal cluster port. targetPort: Container's listening port (where app actually runs). nodePort: External access port on nodes (30000-32767 range by default). Flow: nodePort → port → targetPort → container.

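Q: How do you determine an application's default port? A: Check: (1) Application documentation, (2) Dockerfile EXPOSE directive, (3) Application logs, (4) kubectl exec and run netstat -tulpn (if the image includes it). Common defaults: Flask dev server 5000, Django dev server 8000, Nginx 80. Node.js has no built-in default, though Express examples commonly use 3000.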

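Q: Can targetPort be different from containerPort in pod spec? A: Kubernetes allows it, but the result usually won't work. containerPort (in deployment) is mostly informational: it documents the port the container listens on and doesn't open or block anything. targetPort (in service) must be the port the application actually listens on. If the two disagree, at least one of them is wrong, and a wrong targetPort means connection refused.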

Q: What happens if you fix image but not targetPort? A: Pod runs successfully (image pulls), but service doesn't work. Connection refused errors because service sends traffic to wrong port. Pod healthy, but unreachable through service.

Q: How do you verify Flask is running on correct port? A: Three ways: (1) Check logs kubectl logs POD, look for "Running on...port", (2) kubectl exec POD -- netstat -tulpn, (3) kubectl exec POD -- curl localhost:5000.

Q: Why use kubectl set image vs kubectl edit? A: kubectl set image: Fast, scriptable, specific change only, good for automation. kubectl edit: Interactive, see full config, edit multiple fields, better for complex changes. Both trigger a rolling update when the pod template (for example the image) actually changes.

Q: What's the purpose of service port if we have targetPort? A: Service port is abstraction layer. Benefits: (1) Change container port without changing clients, (2) Multiple services can target same backend with different service ports, (3) Port translation, (4) Logical grouping.

Q: How do you troubleshoot "connection refused" errors? A: Check: (1) Service targetPort matches container port, (2) Container actually listening (logs/netstat), (3) Service selector matches pod labels, (4) Pod is running and ready, (5) Firewall rules, (6) Network policies.
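
Check (3) in that list is easy to express in code. Here is a minimal Python sketch of equality-based selector matching, the kind a Service selector uses. The function name and the sample labels are assumptions for illustration (the pod-template-hash value is made up).

```python
def selector_matches(selector, pod_labels):
    """True if every key/value pair in the Service selector appears in the pod's labels."""
    # A Service with an empty selector does not pick up any pods on its own.
    if not selector:
        return False
    return all(pod_labels.get(key) == value for key, value in selector.items())


# Hypothetical labels on the running pod.
POD_LABELS = {"app": "python-deployment-nautilus", "pod-template-hash": "5c7d8e9f6"}

print(selector_matches({"app": "python-deployment-nautilus"}, POD_LABELS))
print(selector_matches({"app": "python-deployment-nautilu"}, POD_LABELS))
```

Extra labels on the pod are fine. What matters is that nothing in the selector is missing or different, so a one-character typo in the selector leaves the Service with no endpoints.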

Q: Can you have multiple ports in a service? A: Yes! Service can expose multiple ports. Example: HTTP (80) and HTTPS (443). Each needs its own port, targetPort, and optional name. Useful for applications listening on multiple ports.

Advanced Questions:

Q: What's the difference between ErrImagePull and ImagePullBackOff? A: ErrImagePull: Initial failure to pull image. ImagePullBackOff: Kubernetes is backing off (waiting) before retrying pull after repeated failures. Backoff uses exponential delay to avoid overwhelming registry.

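Q: How does Kubernetes retry image pulls? A: With a growing backoff: the kubelet increases the delay between pull attempts (for example 10s, 20s, 40s) until it reaches the built-in limit of 300 seconds (5 minutes). The pod shows ImagePullBackOff while it waits. The delay resets after a successful pull, and fixing the image in the Deployment creates new pods that start pulling right away.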
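
To get a feel for how quickly the waits add up, here is a tiny Python sketch of a capped, doubling backoff. The 10-second starting delay and the doubling factor are assumptions taken from the answer above; the 300-second cap is the 5-minute limit. This only models the schedule, it is not the kubelet's code.

```python
def backoff_delays(attempts, base=10, cap=300):
    """Return the delay in seconds before each retry, doubling until the cap."""
    delays = []
    delay = base
    for _ in range(attempts):
        delays.append(min(delay, cap))
        delay *= 2
    return delays


print(backoff_delays(7))  # [10, 20, 40, 80, 160, 300, 300]
```

After the sixth failure every retry waits the full five minutes, which is why a pod can sit in ImagePullBackOff for a while even after you think the registry problem is fixed.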

Q: What's the impact of wrong targetPort on different service types? A: ClusterIP: Internal requests fail. NodePort: External requests fail (but NodePort still allocated). LoadBalancer: Health checks fail, load balancer removes backend. All result in connection refused or timeout.

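Q: How do you fix these issues without downtime? A: In this incident you can't: the only pod is stuck in ImagePullBackOff, so the application is already down. In general, a bad image rolled out to a healthy Deployment doesn't cause an outage, because the rolling update keeps the old pods running until the new ones are ready. A wrong targetPort is different: traffic fails until it is corrected, but the Service fix takes effect as soon as it is applied, with no pod restart.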

Q: Why didn't Kubernetes catch the wrong targetPort? A: Kubernetes validates syntax, not semantics. targetPort can be any valid port number (1-65535) or a named port. Kubernetes doesn't know what port your application listens on - that's application-specific knowledge.

🎉 Final Thoughts

You've successfully debugged and fixed a broken Python Flask application on Kubernetes! This is real production troubleshooting.

What you accomplished:

✅ Identified ImagePullBackOff issue
✅ Corrected image name typo
✅ Fixed service targetPort mismatch
✅ Understood Flask port defaults
✅ Restored application service
✅ Verified complete functionality

Root causes fixed:

  1. Image typo: poroko/flask-demo-appimage → poroko/flask-demo-app

  2. Wrong port: targetPort: 8080 → targetPort: 5000

Real-world impact:

  • Fast resolution (systematic debugging)

  • Service restored (application accessible)

  • Knowledge gained (Flask port defaults)

  • Skills sharpened (port mapping understanding)

  • Documentation created (prevent recurrence)

Debugging workflow:

  1. Check deployment status (READY 0/1)

  2. Check pod status (ImagePullBackOff)

  3. Describe pod (image pull failure)

  4. Fix image name

  5. Test service (connection refused)

  6. Check service ports (wrong targetPort)

  7. Fix targetPort

  8. Verify application (working!)

This is production SRE excellence! 💪


Day: 64/100
Challenge: KodeKloud Cloud DevOps
Date: January 08, 2026
Topic: Kubernetes Python Flask Troubleshooting

What's your debugging process for Kubernetes issues? Share your troubleshooting stories! 🐍
