Day 64: Kubernetes Python App Troubleshooting - Fixing Deployment Issues

Welcome back! Day 64 of the 100 Days Cloud DevOps Challenge, and today we're troubleshooting a broken Python Flask application on Kubernetes! This is production debugging - identifying configuration errors and restoring service quickly. Let's fix it! 🎯
🎯 The Mission - Fix Python Flask Deployment
TASK TICKET #DEV-8064 - URGENT: Python App Down
Priority: CRITICAL | Type: Incident Response
Server: Jump Host | Cluster: Kubernetes
INCIDENT:
- Python Flask app deployment failed
- Application not accessible
- Misconfiguration in deployment/service
- Need immediate resolution
INVESTIGATION:
- Deployment: python-deployment-nautilus
- Image: poroko/flask-demo-app (mentioned)
- NodePort should be: 32345
- TargetPort should be: Flask default (5000)
- Current status: Not working
SUCCESS CRITERIA:
- Identify configuration issues
- Fix deployment and service
- Application accessible on NodePort 32345
- Flask app responding correctly
This is production troubleshooting! 🚨
🛠️ Complete Troubleshooting & Fix
Step 1: Initial Assessment
# SSH to jump host
ssh user@jump_host
# Check deployment status
kubectl get deployment python-deployment-nautilus
Expected output (broken state):
NAME                         READY   UP-TO-DATE   AVAILABLE   AGE
python-deployment-nautilus   0/1     1            0           10m
Problem indicators:
❌ READY: 0/1 (no pods ready)
❌ AVAILABLE: 0 (service unavailable)
Step 2: Check Pod Status
# Check pods
kubectl get pods -l app=python-deployment-nautilus
Expected output:
NAME                                         READY   STATUS             RESTARTS   AGE
python-deployment-nautilus-7d8f9c6b5-abc12   0/1     ImagePullBackOff   0          10m
Issue identified: ImagePullBackOff - Image cannot be pulled!
Step 3: Investigate Deployment Configuration
# Check deployment details
kubectl describe deployment python-deployment-nautilus
Look for image specification:
Pod Template:
  Containers:
   python-container:
    Image:  poroko/flask-demo-appimage
Root cause #1 found: the image name has a typo! poroko/flask-demo-appimage should be poroko/flask-demo-app
Step 4: Check Service Configuration
# Check service
kubectl get service python-service-nautilus
Expected output:
NAME                      TYPE       CLUSTER-IP     EXTERNAL-IP   PORT(S)          AGE
python-service-nautilus   NodePort   10.96.123.45   <none>        8080:32345/TCP   10m
# Check service details
kubectl describe service python-service-nautilus
Look for ports:
Port:        <unset>  8080/TCP
TargetPort:  8080/TCP
NodePort:    <unset>  32345/TCP
Root cause #2 found: TargetPort is 8080, but Flask's default port is 5000!
Step 5: Fix Deployment - Correct Image Name
# Fix image name in deployment
kubectl set image deployment/python-deployment-nautilus python-container=poroko/flask-demo-app
# OR edit deployment directly
kubectl edit deployment python-deployment-nautilus
# Change: image: poroko/flask-demo-appimage
# To: image: poroko/flask-demo-app
Expected output:
deployment.apps/python-deployment-nautilus image updated
Image fixed! ✅
Step 6: Fix Service - Correct TargetPort
# Edit service to fix targetPort
kubectl edit service python-service-nautilus
# Find the ports section and change:
# FROM:
#   ports:
#   - port: 8080
#     targetPort: 8080
#     nodePort: 32345
# TO:
#   ports:
#   - port: 8080
#     targetPort: 5000
#     nodePort: 32345
# Save and exit
Expected output:
service/python-service-nautilus edited
Service fixed! ✅
Alternative - Patch command:
# Patch service with correct targetPort
kubectl patch service python-service-nautilus -p '{"spec":{"ports":[{"port":8080,"targetPort":5000,"nodePort":32345}]}}'
Step 7: Monitor Rollout
# Watch deployment rollout
kubectl rollout status deployment/python-deployment-nautilus
Expected output:
Waiting for deployment "python-deployment-nautilus" rollout to finish: 0 of 1 updated replicas are available...
deployment "python-deployment-nautilus" successfully rolled out
Check pod status:
kubectl get pods -l app=python-deployment-nautilus
Expected output (fixed):
NAME                                         READY   STATUS    RESTARTS   AGE
python-deployment-nautilus-5c7d8e9f6-xyz89   1/1     Running   0          45s
Pod running successfully! ✅
Step 8: Verify Service Configuration
# Check updated service
kubectl get service python-service-nautilus
Expected output:
NAME                      TYPE       CLUSTER-IP     EXTERNAL-IP   PORT(S)          AGE
python-service-nautilus   NodePort   10.96.123.45   <none>        8080:32345/TCP   15m
# Verify targetPort
kubectl get service python-service-nautilus -o jsonpath='{.spec.ports[0].targetPort}'
echo
Expected output:
5000
TargetPort corrected to 5000! ✅
Step 9: Test Application Access
# Get node IP
NODE_IP=$(kubectl get nodes -o jsonpath='{.items[0].status.addresses[?(@.type=="InternalIP")].address}')
echo "Access URL: http://$NODE_IP:32345"
# Test Flask application
curl http://$NODE_IP:32345
Expected output:
<!DOCTYPE html>
<html>
<head>
<title>Flask Demo App</title>
</head>
<body>
<h1>Welcome to Flask Demo Application!</h1>
<p>This is a simple Python Flask app running on Kubernetes.</p>
</body>
</html>
Application accessible and working!
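The same reachability check can be scripted outside of curl. The following is a minimal sketch using only the Python standard library; `http_status` is a hypothetical helper name, and the URL in the usage comment is a placeholder for your own node IP. It bypasses any configured HTTP proxy on the assumption that a NodePort should be reached directly.

```python
import urllib.error
import urllib.request


def http_status(url, timeout=3.0):
    """Return the HTTP status code for url, or None if no connection could be made."""
    # An empty ProxyHandler disables proxy auto-detection (direct connection).
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # The server answered, but with a 4xx/5xx status.
        return err.code
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure and similar.
        return None


# Usage (placeholder address):
#   http_status("http://<node-ip>:32345")
```

A return value of `None` corresponds to the "connection refused" situation discussed later (for example a wrong targetPort), while any integer means something answered on that port.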
Step 10: Verify Flask Application
# Check Flask app logs
kubectl logs -l app=python-deployment-nautilus --tail=20
Expected output:
* Serving Flask app 'app'
* Debug mode: off
WARNING: This is a development server. Do not use it in a production deployment.
* Running on all addresses (0.0.0.0)
* Running on http://127.0.0.1:5000
* Running on http://10.244.1.5:5000
Press CTRL+C to quit
Flask running on port 5000! ✅
Step 11: Complete Verification
# Comprehensive verification script
cat > verify-python-app.sh << 'EOF'
#!/bin/bash
echo "=== Python Flask Application Fix Verification ==="
echo ""
echo "1. Deployment Status:"
kubectl get deployment python-deployment-nautilus
echo ""
echo "2. Pod Status:"
kubectl get pods -l app=python-deployment-nautilus
echo ""
echo "3. Deployment Image:"
echo " Current: $(kubectl get deployment python-deployment-nautilus -o jsonpath='{.spec.template.spec.containers[0].image}')"
echo " Expected: poroko/flask-demo-app"
echo ""
echo "4. Service Configuration:"
kubectl get service python-service-nautilus
echo ""
echo "5. Service Ports:"
echo " Port: $(kubectl get service python-service-nautilus -o jsonpath='{.spec.ports[0].port}')"
echo " TargetPort: $(kubectl get service python-service-nautilus -o jsonpath='{.spec.ports[0].targetPort}')"
echo " NodePort: $(kubectl get service python-service-nautilus -o jsonpath='{.spec.ports[0].nodePort}')"
echo " Expected TargetPort: 5000 (Flask default)"
echo ""
echo "6. Service Endpoints:"
kubectl get endpoints python-service-nautilus
echo ""
NODE_IP=$(kubectl get nodes -o jsonpath='{.items[0].status.addresses[?(@.type=="InternalIP")].address}')
echo "7. Access URL: http://$NODE_IP:32345"
echo ""
echo "8. Application Health Check:"
HTTP_CODE=$(curl -s -o /dev/null -w "%{http_code}" "http://$NODE_IP:32345")
if [ "$HTTP_CODE" = "200" ]; then
  echo " ✅ Application accessible (HTTP $HTTP_CODE)"
  echo " ✅ Flask app responding correctly"
else
  echo " ❌ Application not responding (HTTP $HTTP_CODE)"
fi
echo ""
echo "9. Flask Application Logs (Last 5 lines):"
kubectl logs -l app=python-deployment-nautilus --tail=5
echo ""
echo "10. Issues Fixed:"
echo " ✅ Image name corrected: poroko/flask-demo-app"
echo " ✅ TargetPort corrected: 5000 (Flask default)"
echo " ✅ NodePort maintained: 32345"
echo ""
echo "✅ VERIFICATION COMPLETE - Application Restored"
EOF
chmod +x verify-python-app.sh
./verify-python-app.sh
Expected output:
=== Python Flask Application Fix Verification ===
1. Deployment Status:
NAME                         READY   UP-TO-DATE   AVAILABLE   AGE
python-deployment-nautilus   1/1     1            1           20m
2. Pod Status:
NAME                                         READY   STATUS    RESTARTS   AGE
python-deployment-nautilus-5c7d8e9f6-xyz89   1/1     Running   0          5m
3. Deployment Image:
Current: poroko/flask-demo-app
Expected: poroko/flask-demo-app
4. Service Configuration:
NAME                      TYPE       CLUSTER-IP     EXTERNAL-IP   PORT(S)          AGE
python-service-nautilus   NodePort   10.96.123.45   <none>        8080:32345/TCP   20m
5. Service Ports:
Port: 8080
TargetPort: 5000
NodePort: 32345
Expected TargetPort: 5000 (Flask default)
6. Service Endpoints:
NAME                      ENDPOINTS         AGE
python-service-nautilus   10.244.1.5:5000   20m
7. Access URL: http://172.17.0.2:32345
8. Application Health Check:
 ✅ Application accessible (HTTP 200)
 ✅ Flask app responding correctly
9. Flask Application Logs (Last 5 lines):
* Running on all addresses (0.0.0.0)
* Running on http://127.0.0.1:5000
* Running on http://10.244.1.5:5000
Press CTRL+C to quit
172.17.0.1 - - [08/Jan/2026 10:15:30] "GET / HTTP/1.1" 200 -
10. Issues Fixed:
 ✅ Image name corrected: poroko/flask-demo-app
 ✅ TargetPort corrected: 5000 (Flask default)
 ✅ NodePort maintained: 32345
✅ VERIFICATION COMPLETE - Application Restored
All issues resolved!
Understanding the Issues
Issue #1: Image Name Typo
The Problem:
# Broken configuration:
image: poroko/flask-demo-appimage
#                           ^^^^^
#                           Extra "image" suffix
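What happened:
1. The kubelet asks the container runtime to pull the image
2. The name has no registry prefix, so the pull goes to Docker Hub: poroko/flask-demo-appimage
3. Image not found (that repository doesn't exist)
4. Pod status: ErrImagePull, then ImagePullBackOff
5. Application down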
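To make step 2 concrete, here is a rough sketch of how an image reference is commonly expanded before a pull: no registry means Docker Hub, no tag means `latest`. The function name `parse_image_ref` is hypothetical and the logic is a simplification of the real normalization rules, not the runtime's actual code.

```python
def parse_image_ref(ref):
    """Split an image reference into registry, repository, tag and digest (simplified)."""
    name, _, digest = ref.partition("@")

    # A ':' only marks a tag if it comes after the last '/'.
    last_slash = name.rfind("/")
    colon = name.rfind(":")
    tag = ""
    if colon > last_slash:
        name, tag = name[:colon], name[colon + 1:]

    # The first path component is a registry only if it looks like a host.
    first, sep, rest = name.partition("/")
    if sep and ("." in first or ":" in first or first == "localhost"):
        registry, repository = first, rest
    else:
        registry = "docker.io"
        # Single-name images such as "nginx" live under "library/".
        repository = name if sep else "library/" + name

    if not tag and not digest:
        tag = "latest"
    return {"registry": registry, "repository": repository, "tag": tag, "digest": digest}


if __name__ == "__main__":
    print(parse_image_ref("poroko/flask-demo-appimage"))
    print(parse_image_ref("poroko/flask-demo-app"))
```

Seen this way, the typo does not produce a malformed reference. It produces a perfectly valid reference to a repository that does not exist, which is why the failure only shows up at pull time.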
The Fix:
# Correct configuration:
image: poroko/flask-demo-app
How to identify:
# Check pod events
kubectl describe pod POD_NAME
# Look for:
Events:
  Type     Reason   Message
  ----     ------   -------
  Warning  Failed   Failed to pull image "poroko/flask-demo-appimage"
  Warning  Failed   Error: ErrImagePull
  Normal   BackOff  Back-off pulling image "poroko/flask-demo-appimage"
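Besides reading events by eye, the same condition can be picked out of the JSON that `kubectl get pods -o json` returns. This is a small sketch under the assumption that the output has already been loaded into a dict (for example with `json.load`); `find_image_pull_failures` is a hypothetical helper name.

```python
IMAGE_PULL_REASONS = {"ErrImagePull", "ImagePullBackOff"}


def find_image_pull_failures(pod_list):
    """Return (pod, container, image, reason) for containers stuck on an image pull."""
    failures = []
    for pod in pod_list.get("items", []):
        pod_name = pod["metadata"]["name"]
        for status in pod.get("status", {}).get("containerStatuses", []):
            # A container that cannot start reports its reason under state.waiting.
            waiting = status.get("state", {}).get("waiting") or {}
            if waiting.get("reason") in IMAGE_PULL_REASONS:
                failures.append(
                    (pod_name, status["name"], status["image"], waiting["reason"])
                )
    return failures
```

Each returned tuple names the exact image string the kubelet tried to pull, which is the quickest way to spot a typo like the one in this incident.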
Issue #2: Wrong TargetPort
The Problem:
# Broken service configuration:
spec:
  ports:
  - port: 8080
    targetPort: 8080   # Wrong! Flask uses 5000
    nodePort: 32345
Understanding the ports:
Port Flow:
External Request → NodePort (32345)
        ↓
Service Port (8080)
        ↓
TargetPort (should be 5000) ← Flask listens here
        ↓
Container Port (5000)
Problem:
Service sends traffic to port 8080
But Flask listens on port 5000
Result: Connection refused
Flask Default Port:
# Flask's built-in server defaults to port 5000
from flask import Flask
app = Flask(__name__)
if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)
The Fix:
# Correct service configuration:
spec:
  ports:
  - port: 8080         # Service port (can be anything)
    targetPort: 5000   # Must match Flask port!
    nodePort: 32345    # External access port
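A mismatch like this can be caught before it reaches the cluster by comparing the two manifests. The sketch below assumes both objects are already loaded as dicts (for example from `kubectl get ... -o json`), and `find_port_mismatches` is a hypothetical helper. It can only compare against the ports the Deployment declares, which is an assumption in itself: a declared `containerPort` is not proof that the process listens there.

```python
def find_port_mismatches(service, deployment):
    """List Service ports whose targetPort is not declared by any container."""
    declared_numbers = set()
    declared_names = {}
    for container in deployment["spec"]["template"]["spec"]["containers"]:
        for port in container.get("ports", []):
            declared_numbers.add(port["containerPort"])
            if "name" in port:
                declared_names[port["name"]] = port["containerPort"]

    problems = []
    for svc_port in service["spec"]["ports"]:
        # targetPort defaults to the Service port when omitted.
        target = svc_port.get("targetPort", svc_port["port"])
        # A string targetPort refers to a named container port.
        resolved = declared_names.get(target) if isinstance(target, str) else target
        if resolved not in declared_numbers:
            problems.append(
                "service port %s targets %r, declared container ports: %s"
                % (svc_port["port"], target, sorted(declared_numbers))
            )
    return problems
```

Run against the broken Service from this incident and a Deployment declaring `containerPort: 5000`, it reports one problem; after the fix it returns an empty list.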
Port Mapping Explained
Complete flow:
User Browser:
  http://NodeIP:32345
    ↓
Node (kube-proxy):
  Receives on port 32345
    ↓
Service:
  Routes to port 8080 (service port)
    ↓
TargetPort:
  Forwards to pod port 5000
    ↓
Pod/Container:
  Flask app listening on 5000
    ↓
Response back through same chain
Port terminology:
nodePort (32345):
- External access point
- Port on every node
- Range: 30000-32767 (default)
- User-facing
port (8080):
- Service port
- Internal cluster port
- ClusterIP:port
- Can be any valid port
targetPort (5000):
- Container port
- Where app actually listens
- Must match container
- Flask default is 5000
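The three port types can be modelled in a few lines. This is a conceptual sketch only: it mirrors the hop-by-hop picture above rather than how kube-proxy actually rewrites packets, and `trace_route` and `is_default_node_port` are hypothetical helper names.

```python
def is_default_node_port(port):
    """True if port falls in the default NodePort range (30000-32767)."""
    return 30000 <= port <= 32767


def trace_route(service, node_port, pod_ip):
    """Return the conceptual hops for a request arriving on node_port."""
    cluster_ip = service["spec"]["clusterIP"]
    for svc_port in service["spec"]["ports"]:
        if svc_port.get("nodePort") == node_port:
            # targetPort defaults to the Service port when omitted.
            target = svc_port.get("targetPort", svc_port["port"])
            return [
                "<node-ip>:%d" % node_port,               # nodePort on every node
                "%s:%d" % (cluster_ip, svc_port["port"]),  # Service port
                "%s:%s" % (pod_ip, target),                # container port
            ]
    return []  # no Service port is exposed on that nodePort
```

With the fixed Service, a request to nodePort 32345 ends at `<pod-ip>:5000`; with the broken one it ends at `<pod-ip>:8080`, where nothing is listening.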
Common Port Configuration Mistakes
Mistake #1: Wrong targetPort
# Wrong
targetPort: 80 # Nginx default, not Flask!
# Correct
targetPort: 5000 # Flask default
Mistake #2: Mismatched container port
# Container definition:
ports:
- containerPort: 5000
# Service must target the same:
targetPort: 5000 # Must match!
Mistake #3: Confusing port types
# All three are different!
nodePort: 32345 # External
port: 8080 # Service
targetPort: 5000 # Container
💡 Key Takeaways
✨ Image names must be exact - typos cause ImagePullBackOff
✨ TargetPort must match application listening port
✨ Flask default port is 5000
✨ kubectl describe reveals configuration issues
✨ Service ports map external to internal traffic
✨ Pod events show image pull failures
✨ Port mismatches cause connection refused
✨ Systematic debugging finds root causes fast
Quick Interview Questions
Q: What causes ImagePullBackOff error? A: Container runtime cannot pull image. Causes: (1) Image doesn't exist, (2) Wrong image name/tag, (3) Authentication required but not provided, (4) Registry unreachable, (5) Network issues. Check with kubectl describe pod.
Q: What's the difference between port, targetPort, and nodePort? A: port: Service's internal cluster port. targetPort: Container's listening port (where app actually runs). nodePort: External access port on nodes (30000-32767 range by default). Flow: nodePort → port → targetPort → container.
Q: How do you determine an application's default port? A: Check: (1) Application documentation, (2) Dockerfile EXPOSE directive, (3) Application logs, (4) kubectl exec and run netstat -tulpn. Common defaults: Flask dev server: 5000, Django dev server: 8000, Nginx: 80, Node.js (Express convention): 3000.
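Q: Can targetPort be different from containerPort in pod spec? A: It can, but it rarely should be. containerPort (in deployment) is mostly informational: it documents the port and does not open or block anything. What matters is that targetPort (in service) matches the port the process actually listens on. Keeping the two identical, or naming the containerPort and using that name as targetPort, avoids confusion. A targetPort that nothing listens on = connection refused.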
Q: What happens if you fix image but not targetPort? A: Pod runs successfully (image pulls), but service doesn't work. Connection refused errors because service sends traffic to wrong port. Pod healthy, but unreachable through service.
Q: How do you verify Flask is running on correct port? A: Three ways: (1) Check logs kubectl logs POD, look for "Running on...port", (2) kubectl exec POD -- netstat -tulpn, (3) kubectl exec POD -- curl localhost:5000.
Q: Why use kubectl set image vs kubectl edit? A: kubectl set image: Fast, scriptable, specific change only, good for automation. kubectl edit: Interactive, see full config, edit multiple fields, better for complex changes. Both trigger rolling update.
Q: What's the purpose of service port if we have targetPort? A: Service port is abstraction layer. Benefits: (1) Change container port without changing clients, (2) Multiple services can target same backend with different service ports, (3) Port translation, (4) Logical grouping.
Q: How do you troubleshoot "connection refused" errors? A: Check: (1) Service targetPort matches container port, (2) Container actually listening (logs/netstat), (3) Service selector matches pod labels, (4) Pod is running and ready, (5) Firewall rules, (6) Network policies.
Q: Can you have multiple ports in a service? A: Yes! Service can expose multiple ports. Example: HTTP (80) and HTTPS (443). Each needs its own port, targetPort, and optional name. Useful for applications listening on multiple ports.
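One item from the connection-refused checklist above, the Service selector matching the pod labels, is easy to get wrong and easy to check. The sketch below covers plain equality-based selectors as used by Services; `selector_matches` is a hypothetical helper name.

```python
def selector_matches(selector, labels):
    """True if every key/value pair in a non-empty selector appears in labels."""
    if not selector:
        # A Service without a selector gets no automatically managed endpoints.
        return False
    return all(labels.get(key) == value for key, value in selector.items())
```

If this returns False for every pod, `kubectl get endpoints` shows no addresses for the Service, and requests fail even though the pods themselves are healthy.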
Advanced Questions:
Q: What's the difference between ErrImagePull and ImagePullBackOff? A: ErrImagePull: the pull attempt itself failed. ImagePullBackOff: Kubernetes is backing off (waiting) before retrying the pull after repeated failures. The growing delay avoids overwhelming the registry.
Q: How does Kubernetes retry image pulls? A: With exponential backoff: the delay between attempts grows (roughly 10s, 20s, 40s, ...) up to a cap of 5 minutes. The pod shows ImagePullBackOff while it waits. Correcting the image in the Deployment creates a new pod, which starts pulling right away.
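The backoff schedule described in that answer can be sketched as a doubling delay with a ceiling. The 10-second base and 300-second cap are taken from the answer above and treated here as assumptions; `pull_backoff_delays` is a hypothetical helper and not kubelet code.

```python
def pull_backoff_delays(attempts, base=10.0, cap=300.0):
    """Return the wait (in seconds) before each of the next `attempts` retries."""
    delays = []
    delay = base
    for _ in range(attempts):
        delays.append(delay)
        # Double the wait each time, but never exceed the cap.
        delay = min(delay * 2, cap)
    return delays


if __name__ == "__main__":
    print(pull_backoff_delays(7))  # [10.0, 20.0, 40.0, 80.0, 160.0, 300.0, 300.0]
```

Under these assumptions the cap is reached on the sixth retry, which is why a pod with a bad image name can sit for minutes between attempts.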
Q: What's the impact of wrong targetPort on different service types? A: ClusterIP: Internal requests fail. NodePort: External requests fail (but NodePort still allocated). LoadBalancer: Health checks may fail, and the load balancer may mark the backends unhealthy. All result in connection refused or timeout.
Q: How do you fix these issues without downtime? A: With ImagePullBackOff there is no uptime to preserve, because no pod is running. The targetPort fix is a Service change: it takes effect as soon as the Service is updated and does not restart any pods. For later image changes, a rolling update with multiple replicas and readiness probes keeps old pods serving while new ones start.
Q: Why didn't Kubernetes catch the wrong targetPort? A: Kubernetes validates syntax not semantics. targetPort can be any valid port number (1-65535) or a named port. Kubernetes doesn't know what port your application listens on - that's application-specific knowledge.
Final Thoughts
You've successfully debugged and fixed a broken Python Flask application on Kubernetes! This is real production troubleshooting.
What you accomplished:
✅ Identified ImagePullBackOff issue
✅ Corrected image name typo
✅ Fixed service targetPort mismatch
✅ Understood Flask port defaults
✅ Restored application service
✅ Verified complete functionality
Root causes fixed:
Image typo: poroko/flask-demo-appimage → poroko/flask-demo-app
Wrong port: targetPort: 8080 → targetPort: 5000
Real-world impact:
Fast resolution (systematic debugging)
Service restored (application accessible)
Knowledge gained (Flask port defaults)
Skills sharpened (port mapping understanding)
Documentation created (prevent recurrence)
Debugging workflow:
Check deployment status (READY 0/1)
Check pod status (ImagePullBackOff)
Describe pod (image pull failure)
Fix image name
Test service (connection refused)
Check service ports (wrong targetPort)
Fix targetPort
Verify application (working!)
This is production SRE excellence! 💪
Day: 64/100
Challenge: KodeKloud Cloud DevOps
Date: January 08, 2026
Topic: Kubernetes Python Flask Troubleshooting
What's your debugging process for Kubernetes issues? Share your troubleshooting stories!




