Day 9: The Case of the Missing Database Directory - A Production Mystery 🔍


Welcome back! 👋 Day 9 of the 100 Days Cloud DevOps Challenge, and today we're dealing with a critical production incident that turned into a detective story. The database is down, but the reason why surprised me! Let's dive into this real troubleshooting journey! 🚨

🤔 What's the Situation?

It's Thursday morning, and your Slack explodes 🔥

🚨 CRITICAL ALERT - P0 🚨
Application: Nautilus
Status: COMPLETE OUTAGE
Error: Cannot connect to database
Affected: ALL USERS
Time: 06:47 UTC
Action: IMMEDIATE INVESTIGATION REQUIRED

The production support team has been investigating and found: MariaDB service is down on stdb01 (database server)!

But here's where it gets interesting - this isn't a simple "restart the service" situation. There's something more sinister going on... 🕵️

🎯 Why This Incident Matters

Database outages are the worst kind of incidents. Here's why:

💰 The Business Impact is MASSIVE

When your database goes down:

  • ❌ Zero transactions processing

  • ❌ Users can't login or access data

  • ❌ Revenue completely stops

  • ❌ Customer trust evaporates

  • ❌ Support tickets flood in

  • ❌ Social media lights up with complaints

Real numbers: For a mid-sized e-commerce site, database downtime can cost:

  • $5,000 - $10,000 per minute

  • $300,000 - $600,000 per hour

  • Plus immeasurable reputation damage

🔥 The Cascading Failure

Database Down
    ↓
Application Errors (500s)
    ↓
API Failures
    ↓
Frontend Breaks
    ↓
Users Can't Complete Actions
    ↓
Abandoned Carts/Transactions
    ↓
Revenue Loss
    ↓
Support Overwhelmed
    ↓
Executive Escalation
    ↓
"ALL HANDS ON DECK!"

Your Role: You're the DevOps engineer on call. Your ability to quickly diagnose and fix this determines how fast the business recovers. No pressure! 😅

💼 The Infrastructure: Nautilus in Stratos DC

Let's understand our environment:

Database Server:

  • Hostname: stdb01.stratos.xfusioncorp.com

  • IP: 172.16.239.10

  • User: peter

  • Password: Sp!dy

  • Purpose: Nautilus DB Server (MariaDB)

Application Servers (affected):

  • stapp01 (tony) - 172.16.238.10

  • stapp02 (steve) - 172.16.238.11

  • stapp03 (banner) - 172.16.238.12

All three app servers can't connect to the database. Complete outage! 💀

🧩 Understanding What We're Dealing With

Before we troubleshoot, let's understand MariaDB:

What is MariaDB?

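MariaDB is an open-source relational database - a fork of MySQL started in 2009 by MySQL's original developers, amid concerns over Oracle's then-pending acquisition of Sun Microsystems (which owned MySQL at the time).

Why MariaDB exists:

MySQL (1995) → Sun Acquires MySQL AB (2008) → Oracle Announces Sun Deal (2009)
                                              ↓
                                     Community Concerns
                                              ↓
                                    MariaDB Fork (2009)
                                    (True Open Source)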

Key advantages:

  • ✅ 100% open source (GPL licensed)

  • ✅ Drop-in replacement for MySQL

  • ✅ Better performance in many scenarios

  • ✅ More storage engines

  • ✅ Active community development

  • ✅ No Oracle licensing concerns

MariaDB Architecture

┌─────────────────────────────────────────┐
│    Application Layer (Nautilus App)    │
│         stapp01, stapp02, stapp03       │
└────────────────┬────────────────────────┘
                 │ TCP/IP
                 │ Port 3306
                 ↓
┌─────────────────────────────────────────┐
│         MariaDB Server (stdb01)         │
│  ┌───────────────────────────────────┐  │
│  │    Connection Handler              │  │
│  └─────────────┬─────────────────────┘  │
│                ↓                         │
│  ┌───────────────────────────────────┐  │
│  │    SQL Parser & Optimizer         │  │
│  └─────────────┬─────────────────────┘  │
│                ↓                         │
│  ┌───────────────────────────────────┐  │
│  │    Storage Engine (InnoDB)        │  │
│  └─────────────┬─────────────────────┘  │
│                ↓                         │
│  ┌───────────────────────────────────┐  │
│  │    Data Directory                 │  │
│  │    /var/lib/mysql/  ← CRITICAL!   │  │
│  └───────────────────────────────────┘  │
└─────────────────────────────────────────┘

That data directory is about to become very important! 📁

🔍 The Investigation Begins

Let's follow the systematic troubleshooting process:

Step 1: Access the Database Server

ssh peter@stdb01
# Password: Sp!dy

We're in! Time to investigate! 🕵️

Step 2: Gain Root Access

sudo su -

Database troubleshooting needs root privileges!

Step 3: Check the Service Status

This is always the first check:

systemctl status mariadb

Output:

● mariadb.service - MariaDB database server
   Loaded: loaded (/usr/lib/systemd/system/mariadb.service; disabled)
   Active: inactive (dead)

Okay, service is stopped. But disabled? That's interesting - it won't start on boot! 🤔

Let's check if the process exists:

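ps aux | grep '[m]ysql'

Nothing. No mysqld process running. ❌ (The [m] bracket keeps grep from matching its own command line, so empty output really means no process.)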

Is it listening on port 3306?

ss -tlnp | grep 3306

No output. Nothing listening on the database port. ❌

Step 4: Attempt to Start the Service

Let's try starting it:

systemctl start mariadb

Result: The command hangs for a moment, then...

Job for mariadb.service failed because the control process exited with error code.
See "systemctl status mariadb.service" and "journalctl -xe" for details.

Uh oh! It's not just stopped - it's failing to start! 🚨

This is different from a simple stopped service. Something is wrong!

Step 5: Check the Logs (The Plot Thickens!)

Let's see what's happening:

journalctl -u mariadb -n 50

The logs show (from the actual output):

Nov 14 06:47:05 stdb01.stratos.xfusioncorp.com systemd[1]: systemd-hostnamed.service: Main process exited
Nov 14 06:47:05 stdb01.stratos.xfusioncorp.com systemd[1]: systemd-hostnamed.service: Deactivated successfully

Hmm, these are systemd logs but nothing specific about MariaDB failing... 🤔

Let's check MariaDB error log:

tail -50 /var/log/mariadb/mariadb.log

Output: The file might not even exist, or shows old entries!

This is unusual. Usually, logs tell us everything. We need to dig deeper!

Step 6: Check Disk Space

A full disk is one of the most common reasons a database refuses to start, so it's worth ruling out early:

df -h

Actual output:

Filesystem      Size  Used Avail Use% Mounted on
overlay         1.5T  286G  1.1T  21% /
tmpfs            32G     0   32G   0% /proc/acpi
devtmpfs         32G     0   32G   0% /dev/tty
shm              64M     0   64M   0% /dev/shm
/dev/md127      1.5T  286G  1.1T  21% /etc/hosts
tmpfs           4.0M     0  4.0M   0% /sys/fs/cgroup
tmpfs            13G  8.1M   13G   1% /run
tmpfs           6.3G     0  6.3G   0% /run/user/1001

21% used - plenty of space! So disk space isn't the issue. ✅
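If you'd rather script this check than eyeball the df table, here's a minimal sketch. The function names fs_used_pct and fs_nearly_full, the /var default path, and the 90% default threshold are my own assumptions for illustration, not anything MariaDB ships:

```shell
# Print the usage percentage (digits only) of the filesystem holding a path.
# 'df -P' forces the one-line-per-filesystem POSIX format, so column 5 is
# always the capacity column.
fs_used_pct() {
    df -P "${1:-/var}" 2>/dev/null | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Succeed when usage is at or above a threshold.
# Returns 2 if the percentage could not be determined (e.g. path missing).
fs_nearly_full() {
    pct=$(fs_used_pct "$1")
    [ -n "$pct" ] || return 2
    [ "$pct" -ge "${2:-90}" ]
}
```

For example, fs_nearly_full /var 90 && echo "disk almost full" would stay silent on this server, since the filesystem is only about 21% used.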

What else could it be?

Step 7: Check the Configuration

cat /etc/my.cnf

Output:

#
# This group is read both both by the client and the server
# use it for options that affect everything
#
[client-server]
#
# include all files from the config directory
#
!includedir /etc/my.cnf.d

Configuration looks fine. Nothing obviously wrong. ✅

Let's check the included directory:

cat /etc/my.cnf.d/server.cnf

Output:

cat: /etc/my.cnf.d/server.cnf: No such file or directory

Interesting, but not necessarily a problem - default configs should work.

Step 8: The EUREKA Moment! 💡

Let's check the data directory itself:

ls -la /var/lib/mysql

Output:

ls: cannot access '/var/lib/mysql': No such file or directory

THERE IT IS! 🎯

The entire MariaDB data directory is MISSING!

This is why the service won't start! It's like trying to start a car with no engine! 🚗💨

Understanding What This Means

The /var/lib/mysql/ directory should contain:

  • System databases (mysql, performance_schema)

  • User databases

  • InnoDB tablespace files

  • Transaction logs

  • The Unix socket file

  • Privilege tables

Without this directory:

  • MariaDB has nowhere to store data

  • System tables don't exist

  • Can't authenticate users

  • Can't create databases

  • Service simply can't function
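The difference between "missing", "empty", and "initialized" can be checked in a few lines. This is only a sketch: the function name datadir_state and its four labels are made up for this post, and it assumes the system database lives in a mysql/ subdirectory as described above:

```shell
# Classify a MariaDB data directory by what is (or isn't) inside it.
datadir_state() {
    dir=${1:-/var/lib/mysql}
    if [ ! -d "$dir" ]; then
        echo "missing"            # the situation in this incident
    elif [ -z "$(ls -A "$dir" 2>/dev/null)" ]; then
        echo "empty"              # directory exists but was never initialized
    elif [ ! -d "$dir/mysql" ]; then
        echo "no-system-tables"   # files present, but no 'mysql' system database
    else
        echo "initialized"
    fi
}
```

On stdb01 before the fix, datadir_state /var/lib/mysql would print missing.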

🛠️ The Solution: Database Initialization

Now that we know the problem, let's fix it!

Step 1: Verify MariaDB is Installed

rpm -qa | grep mariadb

If installed, you'll see packages like:

mariadb-server-10.3.28-1.el7.x86_64
mariadb-10.3.28-1.el7.x86_64

If not installed:

yum install -y mariadb-server

Step 2: Create the Data Directory

mkdir -p /var/lib/mysql

The -p flag creates parent directories if needed.

Step 3: Set Proper Ownership

This is CRITICAL! MariaDB runs as the mysql user:

chown -R mysql:mysql /var/lib/mysql
chmod 755 /var/lib/mysql

Why these permissions?

  • mysql:mysql - the MariaDB server process (mysqld) runs as this user and needs to own these files

  • 755 - Owner can read/write/execute, others can read/execute

  • Without correct ownership, the server process can't write!
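To confirm the ownership actually took effect, a small helper can compare the directory's owner against what you expect. A sketch, assuming GNU stat (the -c flag is the Linux/coreutils form); owned_by is a hypothetical name:

```shell
# Succeed only if DIR is owned by the expected user:group.
# Returns 2 when the directory cannot be inspected at all.
owned_by() {
    dir=$1
    want=${2:-mysql:mysql}
    have=$(stat -c '%U:%G' "$dir" 2>/dev/null) || return 2
    [ "$have" = "$want" ]
}
```

Something like owned_by /var/lib/mysql || echo "fix ownership first" fits nicely right before the initialization step.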

Step 4: Initialize the Database (THE KEY STEP!)

This is where the magic happens:

mysql_install_db --user=mysql --datadir=/var/lib/mysql

What's happening:

mysql_install_db is:
├── Creating system database (mysql)
├── Creating performance_schema database
├── Creating InnoDB system tablespace
├── Creating default user accounts
├── Generating privilege tables
├── Creating stored procedures
├── Setting up time zone tables
└── Initializing grant system

Output you'll see:

Installing MariaDB/MySQL system tables in '/var/lib/mysql' ...
OK
Filling help tables...
OK

To start mysqld at boot time you have to copy
support-files/mysql.server to the right place for your system

PLEASE REMEMBER TO SET A PASSWORD FOR THE MariaDB root USER !
To do so, start the server, then issue the following commands:

'/usr/bin/mysqladmin' -u root password 'new-password'
'/usr/bin/mysqladmin' -u root -h stdb01.stratos.xfusioncorp.com password 'new-password'

...

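Note for newer MariaDB versions: from MariaDB 10.4 onward the same script is also installed under the name mariadb-install-db:

mariadb-install-db --user=mysql --datadir=/var/lib/mysql

Don't confuse this with MySQL's initializer. The commands below exist only in MySQL 5.7+; MariaDB's mysqld does not support them:

# MySQL only - with random root password
mysqld --initialize --user=mysql --datadir=/var/lib/mysql

# MySQL only - without root password (like mysql_install_db)
mysqld --initialize-insecure --user=mysql --datadir=/var/lib/mysql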

Step 5: Verify Data Directory Structure

ls -la /var/lib/mysql/

Now you should see:

total 110596
drwxr-xr-x  5 mysql mysql     4096 Nov 14 07:00 .
drwxr-xr-x 26 root  root      4096 Nov 14 06:50 ..
-rw-rw----  1 mysql mysql    16384 Nov 14 07:00 aria_log.00000001
-rw-rw----  1 mysql mysql       52 Nov 14 07:00 aria_log_control
-rw-rw----  1 mysql mysql 79691776 Nov 14 07:00 ibdata1
-rw-rw----  1 mysql mysql 12582912 Nov 14 07:00 ib_logfile0
-rw-rw----  1 mysql mysql 12582912 Nov 14 07:00 ib_logfile1
drwx------  2 mysql mysql     4096 Nov 14 07:00 mysql
drwx------  2 mysql mysql     4096 Nov 14 07:00 performance_schema

Perfect! The database is initialized! ✅

What each file/directory is:

  • mysql/ - System database (users, privileges, etc.)

  • performance_schema/ - Performance monitoring

  • ibdata1 - InnoDB shared tablespace

  • ib_logfile* - InnoDB transaction logs

  • aria_log* - Aria storage engine logs

Step 6: Start the Service

Now that the data directory exists:

systemctl start mariadb

This time it should succeed! 🎉

Check the status:

systemctl status mariadb

Output:

● mariadb.service - MariaDB database server
   Loaded: loaded (/usr/lib/systemd/system/mariadb.service; disabled)
   Active: active (running) since Thu 2025-11-14 07:05:23 UTC; 5s ago
  Process: 12345 ExecStartPost=/bin/sh -c systemctl unset-environment _WSREP_START_POSITION
  Process: 12346 ExecStart=/usr/sbin/mysqld $MYSQLD_OPTS $_WSREP_NEW_CLUSTER $_WSREP_START_POSITION (code=exited, status=0/SUCCESS)
 Main PID: 12350 (mysqld)
   Status: "Taking your SQL requests now..."
   CGroup: /system.slice/mariadb.service
           └─12350 /usr/sbin/mysqld

Key indicators:

  • Active: active (running)

  • Main PID present

  • Status: "Taking your SQL requests now..."

SUCCESS! The database is running! 🎊

Step 7: Enable Auto-Start

Don't forget this critical step:

systemctl enable mariadb

Output:

Created symlink from /etc/systemd/system/multi-user.target.wants/mariadb.service to /usr/lib/systemd/system/mariadb.service.

Verify:

systemctl is-enabled mariadb

Output: enabled

Now MariaDB will start automatically after reboots!

Step 8: Verify Database Functionality

Check the process (the [m] keeps grep from listing itself):

ps aux | grep '[m]ysql'

Output:

mysql     12350  2.1  5.2 1184564 173420 ?      Ssl  07:05   0:02 /usr/sbin/mysqld

Perfect! mysqld is running as the mysql user. ✅

Check port 3306:

ss -tlnp | grep 3306

Output:

LISTEN  0  80  *:3306  *:*  users:(("mysqld",pid=12350,fd=21))

MariaDB is listening on port 3306! ✅
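When you script these checks, the server may need a second or two after systemctl start before it accepts connections. A retry loop handles that; this is a minimal sketch, and wait_for is a hypothetical helper name rather than a standard tool:

```shell
# Retry a command until it succeeds or the attempts run out.
# Usage: wait_for ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
wait_for() {
    attempts=$1
    delay=$2
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        # Success on any attempt ends the loop immediately.
        "$@" >/dev/null 2>&1 && return 0
        # Sleep between attempts, but not after the last one.
        [ "$i" -lt "$attempts" ] && sleep "$delay"
        i=$((i + 1))
    done
    return 1
}
```

For instance, wait_for 10 1 mysqladmin ping polls the server once a second for up to ten attempts before giving up.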

Test local connection:

mysql -u root

You're in!

MariaDB [(none)]> SHOW DATABASES;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| mysql              |
| performance_schema |
+--------------------+
3 rows in set (0.001 sec)

MariaDB [(none)]> SELECT VERSION();
+----------------------------+
| VERSION()                  |
+----------------------------+
| 10.3.28-MariaDB            |
+----------------------------+
1 row in set (0.000 sec)

MariaDB [(none)]> EXIT;
Bye

The database is fully functional! 🎉

Step 9: Test from Application Servers

Now the critical test - can the applications connect?

From stapp01 (or any app server):

# Exit from database server
exit

# SSH to app server
ssh tony@stapp01

# Test port connectivity
telnet stdb01 3306

Output:

Trying 172.16.239.10...
Connected to stdb01.stratos.xfusioncorp.com.
Escape character is '^]'.
J
5.5.5-10.3.28-MariaDB...

That garbled text is the MariaDB handshake - connection successful!

Press Ctrl+] then type quit to exit.

Better test with netcat:

nc -zv stdb01 3306

Output:

Connection to stdb01 3306 port [tcp/mysql] succeeded!
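If neither telnet nor nc is installed on the app server, bash can open the TCP connection by itself. A sketch assuming bash and the coreutils timeout command are available; port_open is a name I made up, and the 3-second default timeout is arbitrary:

```shell
# TCP reachability check without telnet or nc.
# bash treats /dev/tcp/HOST/PORT as "open a TCP connection" in redirections.
# Usage: port_open HOST [PORT] [TIMEOUT_SECONDS]
port_open() {
    host=$1
    port=${2:-3306}
    # Inside 'bash -c', $0 is the host and $1 is the port passed below.
    timeout "${3:-3}" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}
```

From stapp01, port_open stdb01 3306 && echo "database port reachable" gives the same yes/no answer as the nc test.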

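Best test - actual database connection (this needs the mysql client installed on the app server):

mysql -h stdb01 -u root

Heads-up: on a freshly initialized server, root is normally only allowed to log in from the database host itself, so this remote login may be rejected with ERROR 1130 ("Host ... is not allowed to connect to this MariaDB server"). That error still proves the app server reached MariaDB over the network. Once an application user exists, repeat the test with that account.

If you can connect and query - MISSION ACCOMPLISHED! 🏆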

The Nautilus application can now connect to the database!

🎨 Why Did This Happen?

Let's understand the root causes:

Common Scenarios for Missing Data Directory

1. Fresh Installation Without Initialization ⚠️

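# Someone did this:
yum install mariadb-server
systemctl start mariadb  # FAILS if nothing has initialized the data directory!

# Should have done:
yum install mariadb-server
mysql_install_db --user=mysql --datadir=/var/lib/mysql
systemctl start mariadb  # SUCCESS!

# (Some packages run the initialization for you at install time or on
# first start - verify it happened instead of assuming it did.)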

2. Container Without Persistent Volume 🐳

# Docker Compose - WRONG:
services:
  db:
    image: mariadb:10.3
    # Data lost on restart!

# Docker Compose - RIGHT:
services:
  db:
    image: mariadb:10.3
    volumes:
      - db_data:/var/lib/mysql

volumes:
  db_data:

3. Accidental Deletion 😱

# Someone panicked and did:
rm -rf /var/lib/mysql/*
# Or even worse:
rm -rf /var/lib/mysql/

# Now database won't start!

4. Filesystem Issues 💾

# Disk corruption, failed mount, etc.
# /var/lib/mysql becomes inaccessible

5. Improper Migration 🚚

# Migrating to new server without data:
rsync -a /etc/ new_server:/etc/          # Config copied
rsync -a /usr/bin/ new_server:/usr/bin/  # Binaries copied
# But forgot: rsync -a /var/lib/mysql/ new_server:/var/lib/mysql/
# Result: No data directory!

6. Configuration Management Gap 🤖

# Ansible playbook - INCOMPLETE:
- name: Install MariaDB
  yum:
    name: mariadb-server
    state: present

- name: Start MariaDB
  service:
    name: mariadb
    state: started

# Missing: Database initialization step!

💡 Key Takeaways

✨ Missing data directory prevents MariaDB from starting

✨ Always check if /var/lib/mysql exists when troubleshooting

✨ mysql_install_db initializes the database structure

✨ Initialization creates system tables and databases

✨ Must set proper ownership (mysql:mysql) on data directory

✨ Enable service to ensure auto-start after reboot

✨ Test both local and remote connectivity

✨ Common in fresh installs and containerized environments

✨ Logs may not clearly indicate missing data directory

✨ Systematic troubleshooting reveals root cause

✨ Document the issue for future reference

✨ Prevention: Include initialization in deployment automation

🎓 Interview Questions to Master

Q1: What's the difference between a stopped MariaDB service and a failed initialization?

Answer: A stopped service means MariaDB was running previously and has been stopped - the data directory exists with all databases intact. You can simply systemctl start mariadb to bring it back. A failed initialization or missing data directory means MariaDB was never properly set up, or its data has been lost - there's no /var/lib/mysql directory with system tables. Starting the service will fail, typically with errors like "Can't find file './mysql/user.frm'" or "Fatal error: Can't open privilege tables." The solution requires running mysql_install_db to create the database structure before starting the service. You can differentiate by: 1) Checking if /var/lib/mysql exists, 2) Looking for mysql, performance_schema directories inside, 3) Reviewing logs, which often (but, as this incident showed, not always) mention the missing tables/files. In troubleshooting, always check data directory existence before attempting to start the service.

Q2: Walk me through what mysql_install_db actually does and why it's necessary.

Answer: mysql_install_db initializes the MariaDB data directory and creates the essential system infrastructure needed for the database to function. Specifically, it: 1) Creates system databases - The 'mysql' database which stores user accounts, privileges, and configuration; and 'performance_schema' for monitoring. 2) Generates tables - User table (authentication), db table (database privileges), tables_priv, columns_priv (granular permissions), host table, etc. 3) Creates InnoDB infrastructure - ibdata1 (shared tablespace), ib_logfile0/1 (transaction logs), InnoDB system tables. 4) Sets up initial accounts - Creates root@localhost and root@hostname with no password. 5) Installs system procedures - Stored procedures and functions needed for database operations. 6) Initializes engines - Prepares all storage engines (InnoDB, MyISAM, Aria) for use. Without this initialization, MariaDB has no structure to: authenticate users, store databases, manage permissions, or track transactions. It's like installing an operating system - you need the initial file system structure before you can use it. This is a one-time operation done after installation. Running it on an existing database would be destructive!

Q3: How would you differentiate between a missing data directory issue versus disk space or permission problems?

Answer: Each has distinct symptoms: Missing data directory (ls -la /var/lib/mysql shows "No such file or directory"): Service fails immediately on start, logs (when they say anything) show "Can't find file './mysql/user.frm'" or "Fatal error: Can't open privilege tables", system tables completely absent, mysql_install_db is the solution. Disk space full (df -h shows 100%): Service might start but crash quickly, logs show "Disk is full" or "No space left on device", existing data is intact but can't write new data, free up space and restart. Permission problems (ls -ld /var/lib/mysql shows wrong ownership): Logs show "Can't create/write to file" or "Permission denied" on specific files, data directory exists but owned by wrong user (like root:root instead of mysql:mysql), fix with chown -R mysql:mysql /var/lib/mysql and restart. Troubleshooting order: 1) Check if directory exists (ls -la /var/lib/mysql), 2) Check disk space (df -h), 3) Check ownership/permissions (ls -ld /var/lib/mysql), 4) Review logs (journalctl -u mariadb). Missing directory is the easiest to identify - it simply doesn't exist!
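Those log signatures can be turned into a rough first-pass classifier. A sketch only: the patterns are the message fragments quoted in the answer above (exact wording varies between MariaDB versions, so the privilege-table pattern is kept loose), and the function name and cause labels are invented for this example:

```shell
# Map one error-log line to a likely cause.
classify_log_line() {
    case $1 in
        *"Can't open"*"privilege tables"*|*"Can't find file './mysql/"*)
            echo "datadir-uninitialized" ;;
        *"Permission denied"*|*"Can't create/write to file"*)
            echo "permissions" ;;
        *"Disk is full"*|*"No space left on device"*)
            echo "disk-full" ;;
        *)
            echo "unknown" ;;
    esac
}
```

Feeding it the last error line from the log gives a hint about which of the three fixes to try first. An unknown result means it's time to read the log by hand.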

Q4: In a production environment, how would you prevent this type of issue from occurring?

Answer: Prevention requires multiple layers: 1) Infrastructure as Code - Use Ansible/Terraform/CloudFormation that includes database initialization in provisioning scripts. Example Ansible task: - name: Initialize MariaDB / command: mysql_install_db / creates: /var/lib/mysql/mysql. 2) Container orchestration - Use persistent volumes in Kubernetes/Docker: volumeMounts: - name: db-data / mountPath: /var/lib/mysql with PersistentVolumeClaim. 3) Configuration management - Puppet/Chef modules that ensure data directory exists and is initialized. 4) Monitoring - Alert on: service status (immediately), data directory existence checks, file count in /var/lib/mysql. 5) Automated testing - Post-deployment smoke tests that verify database connectivity and can query system tables. 6) Documentation - Clear runbooks for new server setup including initialization steps. 7) Backup strategy - Regular backups mean you can restore data directory if lost. 8) Change management - Any database infrastructure changes reviewed and tested in staging first. 9) Health checks - Load balancers should check not just port 3306 but actual query execution. 10) Immutable infrastructure - Pre-baked images/AMIs with database already initialized reduce deployment variability. The key is treating database initialization as a required step in infrastructure provisioning, not an afterthought.

Q5: If you had to choose between mysql_install_db and mysqld --initialize, which would you use and why?

Answer: The choice depends on which server you're running, because the two commands belong to different products. mysql_install_db (traditional method): the initialization script for every MariaDB release (from 10.4 on it is also installed as mariadb-install-db) and for MySQL up to 5.6. On MariaDB 10.3 and earlier it creates root with no password; from MariaDB 10.4, root@localhost authenticates through the unix_socket plugin by default. I'd use this for: any MariaDB server, automated deployments where I script the root password setting afterward, development environments where no password is convenient. mysqld --initialize (MySQL 5.7+ only): creates root with a random temporary password written to the error log, and replaces mysql_install_db on MySQL, where the script was deprecated in 5.7 and removed in 8.0. mysqld --initialize-insecure behaves like the old script (no root password). MariaDB does not implement either option. I'd use this for: MySQL 5.7+ servers, retrieving the temporary password from logs with grep 'temporary password' /var/log/mysqld.log. In practice, for MariaDB (as on stdb01), mysql_install_db is the tool to use. Both achieve the same goal - an initialized data directory - the difference is the product, the security defaults, and version compatibility.

Q6: After initializing the database, what security steps should you take before putting it in production?

Answer: Critical security hardening steps: 1) Run mysql_secure_installation - Interactive script that: removes anonymous users, disables remote root login, removes test database, reloads privilege tables. 2) Set strong root password: ALTER USER 'root'@'localhost' IDENTIFIED BY 'StrongP@ssw0rd123!'; 3) Create application-specific users - Never use root for applications: CREATE USER 'nautilus'@'stapp01' IDENTIFIED BY 'AppP@ss'; GRANT SELECT, INSERT, UPDATE, DELETE ON nautilus_db.* TO 'nautilus'@'stapp01'; 4) Restrict root access - Remove all root accounts except localhost: DELETE FROM mysql.user WHERE User='root' AND Host NOT IN ('localhost', '127.0.0.1', '::1'); 5) Configure bind-address - In /etc/my.cnf, set bind-address=172.16.239.10 to only listen on specific interface (not 0.0.0.0 if not needed). 6) Enable SSL/TLS - Force encrypted connections for remote access. 7) Firewall rules - Only allow port 3306 from application servers: firewall-cmd --add-rich-rule='rule family="ipv4" source address="172.16.238.0/24" port port="3306" protocol="tcp" accept' 8) Audit logging - Enable query logging to track access: SET GLOBAL general_log = 'ON'; 9) Regular updates - Schedule security patches: yum update mariadb-server 10) Backup immediately - Before going live, take initial backup as baseline. Never leave a database with default no-password root account in production!

🚨 Common Mistakes to Avoid

❌ Starting service before initialization

# Wrong order:
systemctl start mariadb  # FAILS!
mysql_install_db

# Right order:
mysql_install_db
systemctl start mariadb  # SUCCESS!

❌ Wrong permissions

# Wrong - owned by root:
mkdir /var/lib/mysql
mysql_install_db  # Might work but service fails

# Right - owned by mysql:
mkdir /var/lib/mysql
chown mysql:mysql /var/lib/mysql
mysql_install_db --user=mysql

❌ Running mysql_install_db on existing database

# DANGER - This can corrupt existing data!
ls /var/lib/mysql/  # Has databases
mysql_install_db    # DO NOT RUN!

# Only run on fresh/missing directory
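One way to make that rule hard to break is to wrap the initialization in a guard. A minimal sketch: guarded_init is a hypothetical wrapper, and INIT_CMD is an override hook I added so the guard can be exercised without a real server (by default it runs mysql_install_db with the same flags used earlier):

```shell
# Refuse to initialize a data directory that already contains anything.
guarded_init() {
    datadir=${1:-/var/lib/mysql}
    if [ -d "$datadir" ] && [ -n "$(ls -A "$datadir" 2>/dev/null)" ]; then
        echo "refusing: $datadir is not empty" >&2
        return 1
    fi
    ${INIT_CMD:-mysql_install_db} --user=mysql --datadir="$datadir"
}
```

Running INIT_CMD=echo guarded_init /var/lib/mysql is a harmless dry run: it prints the flags that would be passed instead of initializing anything.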

❌ Forgetting to enable auto-start

# Service works now, but after reboot...
systemctl start mariadb  # Started
reboot
# Database down again!

# Always enable:
systemctl start mariadb
systemctl enable mariadb

❌ Not testing application connectivity

# Wrong - assume it works:
systemctl start mariadb
# Done!

# Right - verify end-to-end:
systemctl start mariadb
mysql -u root -e "SELECT 1"
ssh tony@stapp01
nc -zv stdb01 3306

🚀 Post-Resolution Actions

After fixing the database:

1. Monitor Closely

# Watch service for 15-30 minutes
watch -n 60 'systemctl status mariadb'

# Monitor connections
watch -n 10 'mysql -u root -e "SHOW PROCESSLIST"'

2. Document the Incident

Incident Report:

Title: MariaDB Service Down - Missing Data Directory

Timeline:
- 06:47 UTC: Alert triggered - Nautilus app cannot connect to DB
- 06:50 UTC: Investigation started
- 07:00 UTC: Root cause identified - /var/lib/mysql missing
- 07:05 UTC: Database initialized and service started
- 07:10 UTC: Full functionality restored

Root Cause:
- Fresh MariaDB installation without proper initialization
- Data directory (/var/lib/mysql) was never created
- Service unable to start without system tables

Impact:
- Complete application outage
- Duration: 23 minutes
- All users affected

Resolution:
1. Created data directory: mkdir -p /var/lib/mysql
2. Set proper ownership: chown -R mysql:mysql /var/lib/mysql
3. Initialized database: mysql_install_db --user=mysql --datadir=/var/lib/mysql
4. Started service: systemctl start mariadb
5. Enabled auto-start: systemctl enable mariadb
6. Verified functionality from app servers

Preventive Measures:
- Added database initialization to deployment automation
- Implemented monitoring for data directory existence
- Updated runbooks with initialization procedures
- Scheduled training on database provisioning

Lessons Learned:
- Always check data directory existence in troubleshooting
- Database initialization is mandatory after fresh install
- Automated deployments must include initialization step
- Testing should verify data directory structure

Status: RESOLVED ✅
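The 23-minute duration in the report is just the gap between the first and last timeline entries. Here's a tiny sketch of that arithmetic, assuming same-day HH:MM timestamps (to_minutes and minutes_between are hypothetical helper names):

```shell
# Convert HH:MM to minutes since midnight.
to_minutes() {
    h=${1%%:*}
    m=${1##*:}
    # Strip one leading zero so values like "08" are not read as bad octal.
    echo $(( ${h#0} * 60 + ${m#0} ))
}

# Minutes elapsed from START to END (both HH:MM, same day).
minutes_between() {
    echo $(( $(to_minutes "$2") - $(to_minutes "$1") ))
}
```

minutes_between 06:47 07:10 prints 23, matching the duration stated in the incident report.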

3. Communication Template

Subject: [RESOLVED] Database Service Restored - Nautilus Application

Team,

The MariaDB database service on stdb01 has been fully restored.

ISSUE: MariaDB service down due to missing data directory
DURATION: 23 minutes (06:47 - 07:10 UTC)
IMPACT: Complete Nautilus application outage

ROOT CAUSE:
The MariaDB data directory (/var/lib/mysql) was missing, preventing
the service from starting. This occurred due to incomplete initialization
during the initial database setup.

RESOLUTION:
- Initialized MariaDB database structure
- Started and enabled the service
- Verified connectivity from all application servers
- Confirmed full functionality

CURRENT STATUS: ✅ Operational
- Service running and stable
- Application connectivity verified
- Auto-start enabled for future reboots

PREVENTIVE ACTIONS:
- Added initialization checks to deployment automation
- Updated monitoring to alert on missing data directory
- Scheduled team training on database setup procedures

All systems are now operating normally. Please contact me if you
experience any further database connectivity issues.

Best regards,
DevOps Team

4. Update Monitoring

Add these checks:

# Monitoring configuration
checks:
  - name: mariadb_service_status
    command: systemctl is-active mariadb
    interval: 60s
    alert_on: inactive

  - name: mariadb_data_directory
    command: test -d /var/lib/mysql && echo ok
    interval: 300s
    alert_on: failure

  - name: mariadb_port_listening
    command: ss -tlnp | grep :3306
    interval: 60s
    alert_on: failure

  - name: mariadb_connectivity
    command: mysql -u monitoring -p${MYSQL_PASS} -e "SELECT 1"
    interval: 60s
    alert_on: failure

5. Update Automation Scripts

Ansible Playbook - Complete Database Setup:

---
- name: Setup MariaDB Database
  hosts: database_servers
  become: yes

  tasks:
    - name: Install MariaDB server
      yum:
        name: mariadb-server
        state: present

    - name: Check if data directory exists
      stat:
        path: /var/lib/mysql/mysql
      register: mysql_data_dir

    - name: Create data directory
      file:
        path: /var/lib/mysql
        state: directory
        owner: mysql
        group: mysql
        mode: '0755'
      when: not mysql_data_dir.stat.exists

    - name: Initialize MariaDB database
      command: mysql_install_db --user=mysql --datadir=/var/lib/mysql
      args:
        creates: /var/lib/mysql/mysql
      when: not mysql_data_dir.stat.exists

    - name: Start MariaDB service
      systemd:
        name: mariadb
        state: started
        enabled: yes

    - name: Verify service is running
      systemd:
        name: mariadb
        state: started
      register: mariadb_status
      failed_when: mariadb_status.status.ActiveState != "active"

    - name: Test database connectivity
      mysql_db:
        name: test_connection
        state: present
        login_user: root
        login_password: "{{ mysql_root_password }}"
      register: db_test
      failed_when: db_test.failed

🎯 What We Learned Today

This incident taught us several valuable lessons:

Technical Lessons

  1. Data directory is the heart of MariaDB - without it, nothing works

  2. Initialization is mandatory - not optional, not automatic

  3. Logs don't always tell the full story - sometimes you need to check the filesystem

  4. Systematic troubleshooting wins - checking each component methodically

  5. mysql_install_db is not just a command - it's the foundation of your database

Operational Lessons

  1. Automation must be complete - missing one step can cause outages

  2. Testing should verify structure - not just service status

  3. Documentation prevents incidents - clear runbooks save time

  4. Monitoring needs depth - check data directory, not just service

  5. Post-mortems improve reliability - learn from every incident

Career Lessons

  1. Stay calm under pressure - panic leads to mistakes

  2. Trust the process - systematic beats random troubleshooting

  3. Document everything - your future self will thank you

  4. Communication is key - keep stakeholders informed

  5. Prevention > reaction - invest time in automation and monitoring

🔧 Troubleshooting Checklist for Future

When MariaDB won't start, check in this order:

☐ 1. Is the service stopped or failed?
   systemctl status mariadb

☐ 2. Does the data directory exist?
   ls -la /var/lib/mysql

☐ 3. Does it have the right structure?
   ls -la /var/lib/mysql/mysql

☐ 4. Are permissions correct?
   stat /var/lib/mysql (should be mysql:mysql)

☐ 5. Is there disk space?
   df -h (check /var partition)

☐ 6. What do logs say?
   journalctl -u mariadb -n 50

☐ 7. Is the config valid?
   cat /etc/my.cnf

☐ 8. Is another process using port 3306?
   ss -tlnp | grep 3306

☐ 9. Any recent changes?
   Review recent deployments/updates

☐ 10. Hardware/system issues?
   dmesg | grep -i error
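The first few items of the checklist lend themselves to a small runner that prints PASS/FAIL per check and counts failures. A sketch under my own naming: run_check and the FAILED counter are illustrative, not an existing tool:

```shell
# Run one named check; print PASS/FAIL and tally failures in FAILED.
FAILED=0
run_check() {
    name=$1
    shift
    if "$@" >/dev/null 2>&1; then
        echo "PASS  $name"
    else
        echo "FAIL  $name"
        FAILED=$((FAILED + 1))
    fi
}

# Example wiring (left commented out so pasting this block changes nothing):
# run_check "data directory exists" test -d /var/lib/mysql
# run_check "system tables present" test -d /var/lib/mysql/mysql
# run_check "service is active"     systemctl is-active --quiet mariadb
# echo "$FAILED check(s) failed"
```

Keeping the checks in the same order as the checklist means the first FAIL line usually points at the root cause.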

🎓 Pro Tips for Database Management

Daily Operations

# Quick health check
mysql -u root -p -e "SHOW GLOBAL STATUS LIKE 'Uptime'; SHOW PROCESSLIST;"

# Check database sizes
mysql -u root -p -e "SELECT table_schema AS 'Database', 
ROUND(SUM(data_length + index_length) / 1024 / 1024, 2) AS 'Size (MB)' 
FROM information_schema.TABLES 
GROUP BY table_schema;"

# Monitor connections
mysql -u root -p -e "SHOW GLOBAL STATUS LIKE 'Threads_connected';"

# Check for locks
mysql -u root -p -e "SHOW OPEN TABLES WHERE In_use > 0;"

Backup Best Practices

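# Full backup (--master-data=2 records the binlog position as a comment;
# it needs binary logging enabled, so drop the flag otherwise)
mysqldump --all-databases --single-transaction --master-data=2 > backup_$(date +%Y%m%d).sql

# Backup with compression
mysqldump --all-databases --single-transaction | gzip > backup_$(date +%Y%m%d).sql.gz

# Verify backup: check the archive is intact, then test-restore on a SCRATCH
# server only. An --all-databases dump recreates every database it contains,
# so never pipe it into the production instance.
gzip -t backup_20251114.sql.gz
gunzip -c backup_20251114.sql.gz | mysql -h scratch-db -u root -p   # scratch-db = placeholder host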
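A cheap sanity check that doesn't need any server at all: confirm the gzip stream is intact and that the dump ends with the "-- Dump completed" line mysqldump writes when it finishes (unless comments were turned off, e.g. with --skip-comments). A sketch; verify_dump is a hypothetical helper name:

```shell
# Succeed only if FILE is a valid gzip archive whose last line is
# mysqldump's completion marker. Catches truncated or interrupted dumps.
verify_dump() {
    gzip -t "$1" 2>/dev/null || return 1
    gunzip -c "$1" | tail -n 1 | grep -q '^-- Dump completed'
}
```

This only proves the dump finished writing. A real test restore on a throwaway server is still the only proof that it restores.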

Performance Tuning

# Check slow queries
mysql -u root -p -e "SHOW GLOBAL STATUS LIKE 'Slow_queries';"

# Optimize all tables
mysqlcheck --all-databases --optimize -u root -p

# Analyze tables
mysqlcheck --all-databases --analyze -u root -p

🌟 Final Thoughts

Today we solved a critical production incident by:

  • ✅ Staying calm and systematic

  • ✅ Following a logical troubleshooting path

  • ✅ Understanding the root cause deeply

  • ✅ Implementing a complete solution

  • ✅ Preventing future occurrences

  • ✅ Documenting for the team

This is real DevOps work - not just running commands, but understanding systems, solving problems under pressure, and building resilience into infrastructure.

🎯 What's Next?

Day 9 complete! 🎉 We successfully troubleshot and resolved a critical database outage by identifying a missing data directory and properly initializing MariaDB. This is the kind of real-world incident response that separates good DevOps engineers from great ones!

Remember:

  • The best troubleshooting is systematic, not random

  • Understanding the "why" prevents future incidents

  • Documentation saves lives (and careers!)

  • Every incident is a learning opportunity

Tomorrow, Day 10 awaits with new challenges! Keep learning, keep building! 💪


Day: 9/100
Challenge: KodeKloud Cloud DevOps
Date: November 14, 2025
Topic: Database Troubleshooting & Initialization

What's your most memorable database incident? The one that taught you the most? Share your war stories in the comments - we all learn from each other! 🚨


📚 Additional Resources

Related Topics to Explore:

  • MariaDB replication setup

  • Automated backup strategies

  • Performance optimization

  • High availability configurations

  • Security hardening

  • Monitoring and alerting

Commands Reference:

# Installation
yum install mariadb-server

# Initialization
mysql_install_db --user=mysql --datadir=/var/lib/mysql

# Service Management
systemctl start mariadb
systemctl stop mariadb
systemctl restart mariadb
systemctl status mariadb
systemctl enable mariadb

# Database Access
mysql -u root -p
mysql -h hostname -u username -p database

# Backup
mysqldump --all-databases > backup.sql
mysql < backup.sql

# Monitoring
mysql -u root -p -e "SHOW PROCESSLIST"
mysql -u root -p -e "SHOW GLOBAL STATUS"

Keep this as your reference guide! 📖
