Stop Command

The manta_node stop command gracefully shuts down running node instances, ensuring clean task termination and resource cleanup.

Overview

The stop command:

Sends termination signals to running nodes
Waits for graceful shutdown of active tasks
Cleans up resources and temporary files
Removes instance tracking files
Supports both individual and bulk stops

Synopsis

manta_node stop [instance] [options]

Arguments

instance

Instance ID or alias of the node to stop

Optional: If omitted, stops all nodes
Can be a partial match (e.g., “prod” matches “production-a3f2c891”)
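
The partial-match behavior described above can be illustrated with a minimal sketch. The matching rule here is an assumption, not the documented algorithm: an exact match wins, otherwise the argument must be a prefix of exactly one instance ID. The function name `resolve_instance` and the ID `development-7be1d204` are hypothetical.

```python
def resolve_instance(query, instance_ids):
    """Return the single instance ID matched by `query`.

    Assumed rule (not confirmed by the CLI): an exact match wins,
    otherwise `query` must be a prefix of exactly one instance ID.
    """
    if query in instance_ids:
        return query
    matches = [iid for iid in instance_ids if iid.startswith(query)]
    if not matches:
        raise LookupError(f"no instance matches {query!r}")
    if len(matches) > 1:
        raise LookupError(f"{query!r} is ambiguous: {sorted(matches)}")
    return matches[0]


# "development-7be1d204" is a made-up ID used only for illustration.
running = ["production-a3f2c891", "development-7be1d204"]
resolved = resolve_instance("prod", running)
print(resolved)
```

Rejecting ambiguous prefixes, rather than picking the first hit, avoids stopping the wrong node when several IDs share a prefix.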

Options

--all

Stop all running node instances

Same as omitting the instance argument
Confirms before stopping multiple nodes

--force, -f

Force immediate termination (SIGKILL)

Skips graceful shutdown
May cause data loss
Use only when normal stop fails

--timeout, -t <seconds>

Time to wait for graceful shutdown

Default: 10 seconds
After the timeout expires, the node is force-terminated with SIGKILL
Set higher for nodes with long-running tasks

Usage Examples

Stop Specific Node

Stop a node by alias or ID:

$ manta_node stop production
Sent termination signal to node 'production' (PID: 12345)
Waiting up to 10 seconds for graceful shutdown...
✓ Node 'production' stopped gracefully

Stop with Partial Match

Use partial instance ID:

$ manta_node stop production-a3f2
Found matching instance: production-a3f2c891
Sent termination signal to node 'production' (PID: 12345)
✓ Node stopped successfully

Stop All Nodes

Stop all running instances:

$ manta_node stop --all
Found 3 running instance(s):
  - production (PID: 12345)
  - development (PID: 12346)
  - test-node (PID: 12347)

Stopping all instances...
✓ All 3 instances stopped successfully

Force Stop

Force immediate termination:

$ manta_node stop production --force
Force killed node 'production' (PID: 12345)
Warning: Forced termination may cause data loss

Custom Timeout

Allow more time for shutdown:

$ manta_node stop long-task-node --timeout 60
Sent termination signal to node 'long-task-node'
Waiting up to 60 seconds for graceful shutdown...
✓ Node stopped gracefully after 45 seconds

Shutdown Process

Graceful Shutdown

Normal stop sequence:

Send SIGTERM: Node receives termination signal
Stop accepting tasks: Refuses new task assignments
Complete active tasks: Allows running tasks to finish
Disconnect services: Closes gRPC and MQTT connections
Cleanup resources: Removes temporary files
Update status: Marks instance as stopped
Exit cleanly: Process terminates with code 0
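
The core of the sequence above (stop accepting tasks, let the running task finish, exit with code 0) can be sketched with a small signal handler. This is a toy illustration, not the node's actual implementation: the `GracefulWorker` class and the three tasks are hypothetical, and the stop request is simulated by the process signalling itself.

```python
import signal


class GracefulWorker:
    """Toy worker that finishes its current task after SIGTERM (illustrative only)."""

    def __init__(self):
        self.accepting_tasks = True
        self.completed = []

    def request_shutdown(self, signum, frame):
        # Refuse new task assignments once a termination signal arrives.
        self.accepting_tasks = False

    def run(self, tasks):
        for task in tasks:
            if not self.accepting_tasks:
                break  # remaining tasks are never started
            self.completed.append(task())  # a started task always finishes
        return 0  # exit code for a clean shutdown


def task_a():
    return "a"


def task_b():
    # Simulate a stop request arriving while this task is running.
    signal.raise_signal(signal.SIGTERM)
    return "b"


def task_c():
    return "c"


worker = GracefulWorker()
signal.signal(signal.SIGTERM, worker.request_shutdown)
exit_code = worker.run([task_a, task_b, task_c])
print(worker.completed, exit_code)
```

The handler only flips a flag; the actual wind-down happens in the main loop, which keeps the handler short and avoids doing cleanup work inside signal context.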

Forced Shutdown

When using --force or after timeout:

Send SIGKILL: Immediate process termination
No cleanup: Tasks and connections terminated abruptly
Remove tracking: Instance file deleted
Potential data loss: Incomplete operations lost

Instance Management

Instance files in ~/.manta/nodes/instances/ are:

Checked: Verify process is actually running
Updated: Mark as stopping during shutdown
Removed: Deleted after successful stop
Cleaned: Stale files removed automatically
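
The "Checked" and "Cleaned" steps can be sketched as follows. This is a minimal POSIX-only sketch that assumes each tracking file is JSON with an integer `pid` field; the helper names `pid_is_running` and `find_stale_instance_files` are hypothetical and not part of manta_node.

```python
import json
import os
from pathlib import Path


def pid_is_running(pid):
    """Return True if a process with this PID exists (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0 checks for existence without sending a signal
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def find_stale_instance_files(instances_dir):
    """List tracking files whose recorded PID no longer refers to a live process."""
    stale = []
    for path in sorted(Path(instances_dir).glob("*.json")):
        pid = json.loads(path.read_text()).get("pid")
        # Files without a usable PID are skipped rather than treated as stale.
        if isinstance(pid, int) and pid > 0 and not pid_is_running(pid):
            stale.append(path)
    return stale
```

For example, `find_stale_instance_files(Path.home() / ".manta/nodes/instances")` would return the files that are candidates for removal. Note that a PID can be reused by an unrelated process, so a "running" result is not proof that the node itself is alive.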

Signal Handling

Signal Types

SIGTERM (15)

Default termination signal
Allows graceful shutdown
Caught by node for cleanup

SIGKILL (9)

Force termination signal
Cannot be caught or ignored
Immediate process death

SIGINT (2)

Interactive interrupt (Ctrl+C)
Same as SIGTERM for nodes
Graceful shutdown

Timeout Behavior

During graceful shutdown:

Time | Action
-----|--------------------------------------------------
0s   | SIGTERM sent, shutdown initiated
5s   | Check if process still running
10s  | Default timeout reached
     | If still running: Send SIGKILL
     | If stopped: Cleanup complete
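
The timeline above follows a common "SIGTERM, wait, then SIGKILL" pattern, sketched below. This is an illustration under stated assumptions, not the stop command's code: it operates on a child process handle (`subprocess.Popen`), whereas the stop command works from the PID recorded for an instance, and the function name `stop_with_timeout` is hypothetical.

```python
import subprocess
import sys


def stop_with_timeout(proc, timeout=10.0):
    """Send SIGTERM, wait up to `timeout` seconds, then fall back to SIGKILL.

    `proc` is a subprocess.Popen handle; returns "graceful" or "forced".
    """
    proc.terminate()  # SIGTERM on POSIX
    try:
        proc.wait(timeout=timeout)
        return "graceful"
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL on POSIX; cannot be caught or ignored
        proc.wait()  # reap the process so it does not linger as a zombie
        return "forced"


# Usage: a child that only sleeps exits as soon as SIGTERM arrives.
child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
result = stop_with_timeout(child, timeout=5)
print(result)
```

A process that ignores SIGTERM would instead take the `TimeoutExpired` branch and be killed, which is the case where incomplete work can be lost.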

Error Handling

No Running Instances

$ manta_node stop
No running node instances found.

Instance Not Found

$ manta_node stop unknown-node
Error: Node instance 'unknown-node' not found
Running instances:
  - production (PID: 12345)
  - development (PID: 12346)

Process Already Stopped

$ manta_node stop production
Warning: Node 'production' was already stopped
Cleaning up stale instance file

Permission Denied

$ manta_node stop system-node
Error: Permission denied (PID: 1234)
Try running with sudo or as the user who started the node

Bulk Operations

Stop Multiple Specific Nodes

Stop nodes sequentially:

# Stop specific nodes
for node in prod-1 prod-2 prod-3; do
    manta_node stop "$node"
done

Stop Cluster Nodes

Stop all cluster nodes:

# Stop cluster (nodes with '-cluster-' in name)
manta_node cluster stop

Conditional Stops

Stop based on criteria:

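# Stop all GPU nodes
# (assumes the node name is the first column of the plain status output)
manta_node status --plain | awk '/gpu/ {print $1}' | while read -r node; do
    # Redirect stdin so the stop command cannot consume the remaining node list
    manta_node stop "$node" < /dev/null
done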

Cleanup Operations

Automatic Cleanup

On successful stop:

Instance tracking files removed
Temporary directories cleaned
Docker containers stopped
Network connections closed
Log files finalized

Manual Cleanup

If automatic cleanup fails:

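# Remove stale instance files
# Caution: this deletes every tracking file, including those of live nodes,
# so run it only after confirming that no nodes are running
rm ~/.manta/nodes/instances/*.json

# Check for orphaned processes (the bracket keeps grep from matching itself)
ps aux | grep '[m]anta_node'

# Clean Docker containers
docker ps -a | grep manta
docker rm -f <container_id>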

Best Practices

Production Environments

Always graceful: Avoid force stops in production
Increase timeout: Allow time for task completion
Monitor stops: Check logs after stopping
Scheduled stops: Plan maintenance windows
Verify cleanup: Ensure resources are freed

Development Environments

Quick iteration: Use force stop for faster development
Bulk stops: Stop all nodes when done testing
Auto-cleanup: Let system clean stale instances
Check status: Verify all nodes stopped

Emergency Procedures

When nodes won’t stop normally:

Try graceful first: manta_node stop <node>
Increase timeout: manta_node stop <node> -t 30
Force if needed: manta_node stop <node> --force
Manual kill: kill -9 <pid> (last resort)
Clean up: Remove instance files manually
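
The first three steps of this escalation can be scripted. The sketch below assumes that `manta_node stop` exits with a non-zero status when a stop attempt fails, and the function name `escalating_stop` is hypothetical. The `run` parameter exists only so the logic can be exercised without a real node.

```python
import subprocess


def escalating_stop(node, run=subprocess.run):
    """Try a graceful stop, then a longer timeout, then --force.

    Returns the command that succeeded, or None if every attempt failed.
    """
    attempts = [
        ["manta_node", "stop", node],
        ["manta_node", "stop", node, "-t", "30"],
        ["manta_node", "stop", node, "--force"],
    ]
    for cmd in attempts:
        # Assumption: a zero exit status means the node was stopped.
        if run(cmd).returncode == 0:
            return cmd
    return None
```

A `None` result means the remaining manual steps (killing the process by PID and removing its instance file) are still needed.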

Integration with Scripts

Bash Script Example

#!/bin/bash
# safe_stop.sh - Safely stop all nodes

echo "Stopping all manta nodes..."

# Get list of running nodes
nodes=$(manta_node status --plain | grep "running" | cut -d' ' -f1)

# Stop each node with timeout
for node in $nodes; do
    echo "Stopping $node..."
    if manta_node stop "$node" --timeout 30; then
        echo "✓ $node stopped"
    else
        echo "✗ Failed to stop $node"
        exit 1
    fi
done

echo "All nodes stopped successfully"

Python Script Example

import subprocess

def stop_node(instance_id, timeout=10, force=False):
    """Stop a manta node instance; return True if the command exits with code 0."""
    cmd = ['manta_node', 'stop', instance_id]

    if force:
        # --force skips graceful shutdown, so no timeout is passed
        cmd.append('--force')
    else:
        cmd.extend(['--timeout', str(timeout)])

    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.returncode == 0

# Stop all nodes gracefully ('--all' is passed in place of an instance ID)
if not stop_node('--all', timeout=30):
    print("Graceful stop failed, forcing...")
    stop_node('--all', force=True)

Troubleshooting

Node Won’t Stop

If a node refuses to stop:

Check process: ps aux | grep manta_node
View logs: manta_node logs <instance>
Check tasks: Look for hanging tasks
Network issues: Verify manager connectivity
Force stop: Use --force flag

Zombie Processes

Clean up zombie processes:

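# Find zombie processes
ps aux | grep defunct | grep manta

# A zombie cannot be killed directly; find its parent and terminate that,
# so the zombie is reaped
ps -o ppid= -p <zombie_pid>
kill <parent_pid>       # escalate to kill -9 only if the parent ignores SIGTERM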

Stale Instance Files

Remove orphaned tracking files:

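# List instance files
ls ~/.manta/nodes/instances/

# Verify processes
for file in ~/.manta/nodes/instances/*.json; do
    [ -e "$file" ] || continue            # the glob matched no files
    pid=$(jq -r '.pid // empty' "$file")
    [ -n "$pid" ] || continue             # skip files without a recorded PID
    if ! ps -p "$pid" > /dev/null; then
        echo "Removing stale: $file"
        rm "$file"
    fi
done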