Stop Command

The manta_node stop command gracefully shuts down running node instances, ensuring clean task termination and resource cleanup.

Overview

The stop command:

  • Sends termination signals to running nodes

  • Waits for graceful shutdown of active tasks

  • Cleans up resources and temporary files

  • Removes instance tracking files

  • Supports both individual and bulk stops

Synopsis

manta_node stop [instance] [options]

Arguments

instance

Instance ID or alias of the node to stop

  • Optional: If omitted, stops all nodes

  • Can be partial match (e.g., “prod” matches “production-a3f2c891”)
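The exact matching rule is not spelled out above. As a rough sketch, assuming a case-sensitive prefix match on either the instance ID or the alias, and assuming an ambiguous query is treated as an error, resolution could look like this. `resolve_instance` and the sample data are hypothetical and not part of manta_node.

```python
from typing import Dict, List


def resolve_instance(query: str, instances: Dict[str, str]) -> str:
    """Return the single instance ID matching `query`.

    `instances` maps instance ID -> alias. Matching is a plain prefix
    test on the ID or the alias (an assumption made for illustration).
    """
    matches: List[str] = [
        instance_id
        for instance_id, alias in instances.items()
        if instance_id.startswith(query) or alias.startswith(query)
    ]
    if not matches:
        raise LookupError(f"no instance matches {query!r}")
    if len(matches) > 1:
        raise LookupError(f"{query!r} is ambiguous: {sorted(matches)}")
    return matches[0]


# Hypothetical set of running instances (ID -> alias).
running = {
    "production-a3f2c891": "production",
    "development-77b0e1d4": "development",
}
print(resolve_instance("prod", running))
```

Rejecting ambiguous queries is a deliberate choice in this sketch: stopping the wrong node is costlier than asking for a longer prefix.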

Options

--all

Stop all running node instances

  • Same as omitting instance argument

  • Confirms before stopping multiple nodes

--force, -f

Force immediate termination (SIGKILL)

  • Skips graceful shutdown

  • May cause data loss

  • Use only when normal stop fails

--timeout, -t <seconds>

Time to wait for graceful shutdown

  • Default: 10 seconds

  • After timeout, forces termination

  • Set higher for nodes with long-running tasks

Usage Examples

Stop Specific Node

Stop a node by alias or ID:

$ manta_node stop production
Sent termination signal to node 'production' (PID: 12345)
Waiting up to 10 seconds for graceful shutdown...
✓ Node 'production' stopped gracefully

Stop with Partial Match

Use partial instance ID:

$ manta_node stop production-a3f2
Found matching instance: production-a3f2c891
Sent termination signal to node 'production' (PID: 12345)
Node stopped successfully

Stop All Nodes

Stop all running instances:

$ manta_node stop --all
Found 3 running instance(s):
  - production (PID: 12345)
  - development (PID: 12346)
  - test-node (PID: 12347)

Stopping all instances...
✓ All 3 instances stopped successfully

Force Stop

Force immediate termination:

$ manta_node stop production --force
Force killed node 'production' (PID: 12345)
Warning: Forced termination may cause data loss

Custom Timeout

Allow more time for shutdown:

$ manta_node stop long-task-node --timeout 60
Sent termination signal to node 'long-task-node'
Waiting up to 60 seconds for graceful shutdown...
✓ Node stopped gracefully after 45 seconds

Shutdown Process

Graceful Shutdown

Normal stop sequence:

  1. Send SIGTERM: Node receives termination signal

  2. Stop accepting tasks: Refuses new task assignments

  3. Complete active tasks: Allows running tasks to finish

  4. Disconnect services: Closes gRPC and MQTT connections

  5. Cleanup resources: Removes temporary files

  6. Update status: Marks instance as stopped

  7. Exit cleanly: Process terminates with code 0
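The sequence above can be sketched from the node's side. This is a minimal illustration of the general pattern (the signal handler only sets a flag, the main loop stops taking new work, the task in progress finishes, cleanup runs), not manta_node's actual implementation. The names `request_shutdown` and `drain` are hypothetical.

```python
import signal
import threading

shutdown_requested = threading.Event()


def request_shutdown(signum, frame):
    """Signal handler: only set a flag; the main loop does the real work."""
    shutdown_requested.set()


def drain(tasks, cleanup):
    """Run queued tasks until none remain or shutdown is requested."""
    completed = []
    while tasks and not shutdown_requested.is_set():
        task = tasks.pop(0)        # accept the next task
        completed.append(task())   # a task that has started always finishes
    cleanup()                      # close connections, remove temporary files
    return completed


if __name__ == "__main__":
    # Must be called from the main thread.
    signal.signal(signal.SIGTERM, request_shutdown)
    results = drain([lambda: "task-1", lambda: "task-2"], cleanup=lambda: None)
    print(results)
```

Keeping the handler trivial avoids doing I/O or taking locks inside signal context; returning normally from the main program afterwards yields exit code 0.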

Forced Shutdown

When using --force or after timeout:

  1. Send SIGKILL: Immediate process termination

  2. No cleanup: Tasks and connections terminated abruptly

  3. Remove tracking: Instance file deleted

  4. Potential data loss: Incomplete operations lost

Instance Management

Instance files in ~/.manta/nodes/instances/ are:

  • Checked: Verify process is actually running

  • Updated: Mark as stopping during shutdown

  • Removed: Deleted after successful stop

  • Cleaned: Stale files removed automatically
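The "checked" and "cleaned" steps can be approximated as follows. This sketch assumes each instance file is JSON with a numeric `pid` field (the same field the troubleshooting script further down reads with `jq`) and that it runs on a POSIX system. `pid_state` and `remove_stale` are hypothetical helper names.

```python
import json
import os
from pathlib import Path


def pid_state(pid: int) -> str:
    """Classify a PID as 'running', 'stopped', or 'no-permission'."""
    try:
        os.kill(pid, 0)  # signal 0 runs the existence/permission checks only
    except ProcessLookupError:
        return "stopped"
    except PermissionError:
        return "no-permission"  # process exists but belongs to another user
    return "running"


def remove_stale(instances_dir: Path) -> list:
    """Delete instance files whose recorded PID is no longer alive."""
    removed = []
    for path in sorted(instances_dir.glob("*.json")):
        pid = json.loads(path.read_text()).get("pid")
        if isinstance(pid, int) and pid > 0 and pid_state(pid) == "stopped":
            path.unlink()
            removed.append(path.name)
    return removed


if __name__ == "__main__":
    directory = Path.home() / ".manta" / "nodes" / "instances"
    if directory.is_dir():
        for path in sorted(directory.glob("*.json")):
            pid = json.loads(path.read_text()).get("pid")
            print(path.name, pid, pid_state(pid) if isinstance(pid, int) else "?")
```

The "no-permission" state corresponds to the situation in which a stop attempt would fail with a permission error: the process is alive but owned by a different user.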

Signal Handling

Signal Types

SIGTERM (15)
  • Default termination signal

  • Allows graceful shutdown

  • Caught by node for cleanup

SIGKILL (9)
  • Force termination signal

  • Cannot be caught or ignored

  • Immediate process death

SIGINT (2)
  • Interactive interrupt (Ctrl+C)

  • Same as SIGTERM for nodes

  • Graceful shutdown
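To illustrate the difference between these signals, the sketch below routes SIGTERM and SIGINT to one shared handler and shows that the operating system refuses a handler for SIGKILL. It illustrates standard POSIX behavior using Python's `signal` module and is not taken from manta_node's source; all function names are hypothetical.

```python
import signal

received = []


def on_terminate(signum, frame):
    """Shared handler: record which signal asked for shutdown."""
    received.append(signal.Signals(signum).name)


def install_handlers():
    """Route SIGTERM and SIGINT to the same graceful-shutdown handler."""
    for sig in (signal.SIGTERM, signal.SIGINT):
        signal.signal(sig, on_terminate)


def can_catch(sig) -> bool:
    """Return True if a handler can be installed for `sig`."""
    try:
        previous = signal.signal(sig, on_terminate)
    except OSError:
        return False  # the kernel rejects handlers for SIGKILL
    if previous is not None:
        signal.signal(sig, previous)  # restore the earlier disposition
    return True


if __name__ == "__main__":
    for sig in (signal.SIGTERM, signal.SIGINT, signal.SIGKILL):
        print(sig.name, "catchable" if can_catch(sig) else "not catchable")
```

Because SIGKILL never reaches user code, no cleanup logic can run in response to it, which is why a forced stop may lose data.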

Timeout Behavior

During graceful shutdown:

Time | Action
-----|--------------------------------------------------
0s   | SIGTERM sent, shutdown initiated
5s   | Check if process still running
10s  | Default timeout reached
     | If still running: Send SIGKILL
     | If stopped: Cleanup complete
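The timeline above amounts to "terminate, wait, then kill". A minimal sketch of that escalation for a child process, using only the standard library, is shown below. It operates on a `subprocess.Popen` handle for simplicity, whereas the stop command works from a recorded PID. `stop_process` is a hypothetical name, and the 10-second default mirrors the documented default timeout.

```python
import subprocess


def stop_process(proc: subprocess.Popen, timeout: float = 10.0) -> str:
    """Send SIGTERM, wait up to `timeout` seconds, then fall back to SIGKILL."""
    proc.terminate()                  # SIGTERM on POSIX
    try:
        proc.wait(timeout=timeout)
        return "graceful"
    except subprocess.TimeoutExpired:
        proc.kill()                   # SIGKILL on POSIX
        proc.wait()                   # reap the child so no zombie is left
        return "forced"
```

A process that exits within the window is reported as "graceful"; one that ignores SIGTERM is killed once the timeout elapses and reported as "forced".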

Error Handling

No Running Instances

$ manta_node stop
No running node instances found.

Instance Not Found

$ manta_node stop unknown-node
Error: Node instance 'unknown-node' not found
Running instances:
  - production (PID: 12345)
  - development (PID: 12346)

Process Already Stopped

$ manta_node stop production
Warning: Node 'production' was already stopped
Cleaning up stale instance file

Permission Denied

$ manta_node stop system-node
Error: Permission denied (PID: 1234)
Try running with sudo or as the user who started the node

Bulk Operations

Stop Multiple Specific Nodes

Stop nodes sequentially:

# Stop specific nodes
for node in prod-1 prod-2 prod-3; do
    manta_node stop "$node"
done

Stop Cluster Nodes

Stop all cluster nodes:

# Stop cluster (nodes with '-cluster-' in name)
manta_node cluster stop

Conditional Stops

Stop based on criteria:

# Stop all GPU nodes (assumes the node name is the first field of each status line)
manta_node status | grep gpu | while read -r node _; do
    manta_node stop "$node"
done

Cleanup Operations

Automatic Cleanup

On successful stop:

  • Instance tracking files removed

  • Temporary directories cleaned

  • Docker containers stopped

  • Network connections closed

  • Log files finalized

Manual Cleanup

If automatic cleanup fails:

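# Remove stale instance files
# (this deletes every tracking file, so run it only when no nodes are running)
rm ~/.manta/nodes/instances/*.json

# Check for orphaned processes (the bracket keeps grep from matching itself)
ps aux | grep '[m]anta_node'

# Clean Docker containers
docker ps -a | grep manta
docker rm -f <container_id>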

Best Practices

Production Environments

  1. Always graceful: Avoid force stops in production

  2. Increase timeout: Allow time for task completion

  3. Monitor stops: Check logs after stopping

  4. Scheduled stops: Plan maintenance windows

  5. Verify cleanup: Ensure resources are freed

Development Environments

  1. Quick iteration: Use force stop for faster development

  2. Bulk stops: Stop all nodes when done testing

  3. Auto-cleanup: Let system clean stale instances

  4. Check status: Verify all nodes stopped

Emergency Procedures

When nodes won’t stop normally:

  1. Try graceful first: manta_node stop <node>

  2. Increase timeout: manta_node stop <node> -t 30

  3. Force if needed: manta_node stop <node> --force

  4. Manual kill: kill -9 <pid> (last resort)

  5. Clean up: Remove instance files manually
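Steps 1 to 3 can be automated with a small wrapper. The sketch below uses only the flags documented on this page (`-t` and `--force`) and assumes, as the script examples on this page do, that `manta_node stop` returns a non-zero exit code on failure. `escalating_stop` is a hypothetical helper, and the `run` parameter exists so the logic can be exercised without a real node.

```python
import subprocess


def escalating_stop(node, run=None):
    """Try progressively harsher stop commands; return the one that worked."""
    if run is None:
        # Default runner: execute the command and report its exit code.
        run = lambda cmd: subprocess.run(cmd).returncode
    attempts = [
        ["manta_node", "stop", node],               # 1. graceful, default timeout
        ["manta_node", "stop", node, "-t", "30"],   # 2. graceful, longer timeout
        ["manta_node", "stop", node, "--force"],    # 3. forced termination
    ]
    for cmd in attempts:
        if run(cmd) == 0:
            return cmd
    raise RuntimeError(f"could not stop {node!r}; manual intervention needed")
```

If all three attempts fail, the wrapper raises rather than issuing `kill -9` itself, leaving the last-resort steps to an operator.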

Integration with Scripts

Bash Script Example

#!/bin/bash
# safe_stop.sh - Safely stop all nodes

echo "Stopping all manta nodes..."

# Get list of running nodes
nodes=$(manta_node status --plain | grep "running" | cut -d' ' -f1)

# Stop each node with timeout
for node in $nodes; do
    echo "Stopping $node..."
    if manta_node stop "$node" --timeout 30; then
        echo "✓ $node stopped"
    else
        echo "✗ Failed to stop $node"
        exit 1
    fi
done

echo "All nodes stopped successfully"

Python Script Example

import subprocess

def stop_node(instance_id, timeout=10, force=False):
    """Stop a manta node instance."""
    cmd = ['manta_node', 'stop', instance_id]

    if force:
        cmd.append('--force')
    else:
        cmd.extend(['--timeout', str(timeout)])

    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.returncode == 0

# Stop all nodes gracefully
if not stop_node('--all', timeout=30):
    print("Graceful stop failed, forcing...")
    stop_node('--all', force=True)

Troubleshooting

Node Won’t Stop

If a node refuses to stop:

  1. Check process: ps aux | grep manta_node

  2. View logs: manta_node logs <instance>

  3. Check tasks: Look for hanging tasks

  4. Network issues: Verify manager connectivity

  5. Force stop: Use --force flag

Zombie Processes

Clean up zombie processes:

# Find zombie processes
ps aux | grep defunct | grep manta

# Kill parent process
kill -9 <parent_pid>

Stale Instance Files

Remove orphaned tracking files:

# List instance files
ls ~/.manta/nodes/instances/

# Verify processes
for file in ~/.manta/nodes/instances/*.json; do
    [ -e "$file" ] || continue    # no instance files: the glob did not expand
    pid=$(jq -r .pid "$file")
    if ! ps -p "$pid" > /dev/null 2>&1; then
        echo "Removing stale: $file"
        rm "$file"
    fi
done

See Also