Core Concepts

Understand the Manta Platform Architecture

Master the fundamental concepts that power Manta’s distributed computing platform. This guide explains the key components and how they work together.

🎯 Key Concepts to Master

Clusters - Computing infrastructure
Nodes - Individual compute units
Swarms - Distributed algorithms
Modules - Task implementations
Tasks - Execution units
Configuration - Environment management

Platform Architecture

Manta follows a hierarchical architecture:

Platform Hierarchy:

                  ┌─────────┐
                  │Platform │
                  └────┬────┘
                       │
                  ┌────▼────┐
                  │Clusters │
                  └────┬────┘
                       │
                  ┌────▼────┐
                  │  Nodes  │
                  └────┬────┘
                       │
                  ┌────▼────┐
                  │Container│
                  └────┬────┘
                       │
                  ┌────▼────┐
                  │ Tasks   │
                  └────┬────┘
                       │
                  ┌────▼────┐
                  │Modules  │
                  └─────────┘

User Interaction Flow:

┌─────┐    ┌─────────┐    ┌─────────┐
│Users├────┤ SDK/CLI ├────┤Platform │
└─────┘    └─────────┘    └─────────┘

┌──────┐            ┌─────┐
│Swarms├────────────┤Tasks│
└──────┘            └─────┘

┌────┐              ┌─────┐
│Data├──────────────┤Nodes│
└────┘              └─────┘

Clusters and Nodes

Clusters are logical groups of computing resources:

  • Created and managed through the dashboard or API

  • Have unique IDs and authentication credentials

  • Can contain multiple nodes

  • Provide isolated execution environments

Nodes are individual compute units:

  • Connect to clusters using credentials

  • Execute tasks in secure containers

  • Manage local data and resources

  • Communicate with platform services

# Cluster management via SDK
import asyncio

from manta.apis.user_api import AsyncUserAPI


async def main():
    api = AsyncUserAPI(token="your_token")

    # List clusters
    clusters = await api.list_clusters()

    # Get cluster details
    cluster = await api.get_cluster("cluster-id")
    print(f"Cluster {cluster.id} has {cluster.node_count} nodes")

    # List nodes in cluster
    nodes = await api.list_nodes("cluster-id")
    for node in nodes:
        print(f"Node {node.id}: {node.status}")


# `await` is only valid inside a coroutine, so drive the calls from main()
asyncio.run(main())

Swarms and Modules

Swarms define distributed algorithms:

  • Specify task topology and execution flow

  • Configure resource requirements

  • Define data distribution patterns

  • Orchestrate multi-stage workflows

Modules contain task implementations:

  • Python packages with Task classes

  • Execute inside secure containers

  • Access local and global data

  • Communicate with other tasks

from manta import Swarm, Task

class MySwarm(Swarm):
    def __init__(self):
        super().__init__()

        # Define tasks
        self.worker = Task(
            module="worker_module",
            class_name="WorkerTask",
            replicas=4
        )

        self.aggregator = Task(
            module="aggregator_module",
            class_name="AggregatorTask",
            replicas=1
        )

        # Define execution schedule
        self.schedule = [
            ("worker", "parallel"),
            ("aggregator", "sequential")
        ] * 10  # 10 rounds
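
The schedule above relies on Python list repetition. The sketch below uses plain tuples, with no Manta imports, to show what that expression produces and how the flat list could be split back into rounds. `group_rounds` is a hypothetical helper written for this illustration, not part of the Manta SDK.

```python
# Plain-Python illustration; nothing here touches the Manta SDK.
round_template = [
    ("worker", "parallel"),
    ("aggregator", "sequential"),
]

# List repetition concatenates 10 copies of the template, in order.
schedule = round_template * 10


def group_rounds(schedule, steps_per_round):
    """Hypothetical helper: split a flat schedule into per-round chunks."""
    return [
        schedule[i:i + steps_per_round]
        for i in range(0, len(schedule), steps_per_round)
    ]


rounds = group_rounds(schedule, len(round_template))
print(len(schedule))  # 20 steps in total
print(len(rounds))    # 10 rounds
print(rounds[0])      # [('worker', 'parallel'), ('aggregator', 'sequential')]
```

Each round therefore contributes one worker step followed by one aggregator step.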

Task Execution Model

Tasks follow a specific lifecycle:

  1. Initialization: Container starts with runtime

  2. Setup: Task.setup() called once

  3. Execution: Task.execute() called per round

  4. Communication: Access local/world interfaces

  5. Completion: Results collected and container stops

from manta.light import Task

class MyTask(Task):
    def setup(self):
        """One-time initialization."""
        self.model = self.initialize_model()

    def execute(self):
        """Called for each execution round."""
        # Access local data
        data = self.local.get_dataset("training_data")

        # Get global state
        params = self.world.get_global("parameters")

        # Perform computation
        result = self.compute(data, params)

        # Share results
        self.world.set_result("output", result)

        return {"status": "success"}
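
To make the lifecycle order concrete, here is a minimal local sketch of a driver that calls `setup()` once and `execute()` once per round. `LifecycleTask` and `run_lifecycle` are stand-ins invented for this example; they do not inherit from or call the Manta runtime, which may schedule these steps differently.

```python
class LifecycleTask:
    """Stand-in for a task class; it does not inherit from Manta's Task."""

    def __init__(self):
        self.setup_calls = 0
        self.rounds_run = 0

    def setup(self):
        # One-time initialization (step 2 of the lifecycle).
        self.setup_calls += 1

    def execute(self):
        # Per-round work (step 3 of the lifecycle).
        self.rounds_run += 1
        return {"status": "success", "round": self.rounds_run}


def run_lifecycle(task, rounds):
    """Hypothetical driver: setup once, execute per round, collect results."""
    task.setup()
    results = []
    for _ in range(rounds):
        results.append(task.execute())
    return results


task = LifecycleTask()
results = run_lifecycle(task, rounds=3)
print(task.setup_calls)  # 1
print(len(results))      # 3
```

A harness like this can be handy for unit-testing task logic before deploying it to a cluster.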

Configuration System

The ~/.manta configuration system provides:

  • Profiles: Environment-specific settings

  • Credentials: Secure token storage

  • Node Configs: Hardware and resource settings

  • Hierarchical Overrides: Flexible configuration
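
Hierarchical overrides are commonly implemented as a layered merge in which later layers win. The sketch below shows that idea with plain dictionaries. The layer names, keys, and precedence order are assumptions made for illustration; the actual layout of ~/.manta is not described here.

```python
import copy


def deep_merge(base, override):
    """Return a new dict where values in `override` win over `base`.

    Nested dicts are merged key by key; any other value is replaced.
    """
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# All keys and values below are invented for illustration.
defaults = {"api": {"host": "localhost", "port": 443}, "node": {"gpus": 0}}
profile = {"api": {"host": "staging.example.com"}}
overrides = {"node": {"gpus": 2}}

config = deep_merge(deep_merge(defaults, profile), overrides)
print(config)
# {'api': {'host': 'staging.example.com', 'port': 443}, 'node': {'gpus': 2}}
```

Because each layer only states what it changes, a profile can override a single nested key without repeating the rest of the defaults.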

Data Management

Manta handles data at multiple levels:

Local Data (Node-specific):

  • Datasets stored on individual nodes

  • Accessed via self.local in tasks

  • Supports various formats (numpy, torch, etc.)

Global Data (Swarm-wide):

  • Shared state across all tasks

  • Accessed via the self.world interface

  • Synchronized via platform services

Results (Output collection):

  • Task outputs collected centrally

  • Accessible via SDK or dashboard

  • Supports streaming and batch access

# Data access in tasks
from manta.light import Task

class DataTask(Task):
    def execute(self):
        # Local data access
        local_data = self.local.get_dataset("mnist")

        # Global state access
        global_model = self.world.get_global("model")

        # Process data
        output = self.process(local_data, global_model)

        # Store results
        self.world.set_result("processed", output)
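
For testing task logic without a cluster, the two interfaces can be replaced by in-memory fakes. The sketch below is one way to do that. `FakeLocal`, `FakeWorld`, and `ScaleTask` are hypothetical classes that only mirror the three method names used above (`get_dataset`, `get_global`, `set_result`); they are not Manta classes and make no claim about how the real interfaces behave.

```python
class FakeLocal:
    """In-memory stand-in for the node-local data interface."""

    def __init__(self, datasets):
        self._datasets = datasets

    def get_dataset(self, name):
        return self._datasets[name]


class FakeWorld:
    """In-memory stand-in for the swarm-wide interface."""

    def __init__(self, global_state=None):
        self._globals = dict(global_state or {})
        self.results = {}

    def get_global(self, name):
        return self._globals[name]

    def set_result(self, name, value):
        self.results[name] = value


class ScaleTask:
    """Toy task with the same shape as DataTask; no Manta base class."""

    def __init__(self, local, world):
        self.local = local
        self.world = world

    def execute(self):
        data = self.local.get_dataset("numbers")
        factor = self.world.get_global("factor")
        self.world.set_result("processed", [x * factor for x in data])


world = FakeWorld({"factor": 3})
ScaleTask(FakeLocal({"numbers": [1, 2, 3]}), world).execute()
print(world.results["processed"])  # [3, 6, 9]
```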

Communication Patterns

Manta supports various communication patterns:

All-Reduce: Aggregate values from all nodes and make the result available to every node

# In worker tasks
self.world.set_result("gradients", local_gradients)

# In aggregator task
all_gradients = self.world.get_results("gradients")
averaged = sum(all_gradients) / len(all_gradients)
self.world.set_global("model", averaged)
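
The expression `sum(all_gradients) / len(all_gradients)` works when each gradient supports `+` and `/` (NumPy arrays, for example), but it would fail on plain Python lists. As a minimal sketch of the same averaging step for lists of floats, assuming every worker reports a vector of the same length (`average_gradients` is a name chosen for this example):

```python
def average_gradients(all_gradients):
    """Element-wise mean of equally sized gradient vectors (lists of floats)."""
    if not all_gradients:
        raise ValueError("no gradients to average")
    length = len(all_gradients[0])
    if any(len(g) != length for g in all_gradients):
        raise ValueError("gradient vectors differ in length")
    count = len(all_gradients)
    # zip(*...) groups the i-th component of every worker's vector.
    return [sum(column) / count for column in zip(*all_gradients)]


worker_gradients = [
    [1.0, 2.0, 3.0],
    [3.0, 4.0, 5.0],
    [5.0, 6.0, 7.0],
]
print(average_gradients(worker_gradients))  # [3.0, 4.0, 5.0]
```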

Broadcast: Share data with all nodes

# In coordinator task
self.world.broadcast("config", configuration)

# In worker tasks
config = self.world.get_broadcast("config")

Peer-to-Peer: Direct node communication

# Send to specific node
self.world.send_to_node("node-2", "message", data)

# Receive from specific node
data = self.world.receive_from_node("node-1", "message")
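
The send/receive pairing can be pictured as a set of queues, one per (sender, receiver, tag) triple. The sketch below models that with the standard library in a single process. `Mailbox` is a toy invented here; how Manta actually routes peer-to-peer messages is not covered by this example.

```python
import queue
from collections import defaultdict


class Mailbox:
    """Toy in-process message router keyed by (sender, receiver, tag)."""

    def __init__(self):
        self._queues = defaultdict(queue.Queue)

    def send(self, sender, receiver, tag, payload):
        self._queues[(sender, receiver, tag)].put(payload)

    def receive(self, sender, receiver, tag, timeout=1.0):
        # Raises queue.Empty if nothing arrives within `timeout` seconds.
        return self._queues[(sender, receiver, tag)].get(timeout=timeout)


mailbox = Mailbox()
mailbox.send("node-1", "node-2", "message", {"step": 1})
mailbox.send("node-1", "node-2", "message", {"step": 2})

print(mailbox.receive("node-1", "node-2", "message"))  # {'step': 1}
print(mailbox.receive("node-1", "node-2", "message"))  # {'step': 2}
```

Messages with the same key are delivered in the order they were sent, and a receive with no matching send times out instead of blocking forever.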

Security Model

Manta implements multiple security layers:

Authentication:

  • JWT tokens for API access

  • Cluster-specific credentials

  • User and role management

Encryption:

  • TLS for API communication

  • Optional mTLS for production

  • Encrypted credential storage

Isolation:

  • Container-based task isolation

  • Secure execution environments

  • Resource quotas and limits
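
Since API access uses JWT tokens, it can help to check a token's expiry on the client before making a call. A JWT consists of three base64url segments (header, payload, signature), and `exp` is the standard expiry claim. The sketch below builds a demo token with the standard library and reads its claims; which claims Manta's tokens actually carry is an assumption, and all helper names are invented for this example.

```python
import base64
import hashlib
import hmac
import json
import time


def b64url_encode(raw):
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def b64url_decode(text):
    padding = "=" * (-len(text) % 4)
    return base64.urlsafe_b64decode(text + padding)


def make_demo_token(claims, secret):
    """Build an HS256-signed token for demonstration only."""
    header = b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = b64url_encode(json.dumps(claims).encode())
    signing_input = f"{header}.{payload}".encode("ascii")
    signature = hmac.new(secret, signing_input, hashlib.sha256).digest()
    return f"{header}.{payload}.{b64url_encode(signature)}"


def read_claims(token):
    """Decode the payload segment WITHOUT verifying the signature."""
    _header, payload, _signature = token.split(".")
    return json.loads(b64url_decode(payload))


def is_expired(token, now=None):
    now = time.time() if now is None else now
    return read_claims(token).get("exp", float("inf")) <= now


token = make_demo_token({"sub": "demo-user", "exp": 2_000_000_000}, b"not-a-real-secret")
print(read_claims(token)["sub"])             # demo-user
print(is_expired(token, now=1_900_000_000))  # False
print(is_expired(token, now=2_100_000_000))  # True
```

Reading claims this way is only a convenience for the client. The server must still verify the signature, so unverified claims should never be used for authorization decisions.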

Performance Considerations

Optimize your algorithms for Manta:

Data Transfer:

  • Minimize data movement between nodes

  • Use compression for large transfers

  • Cache frequently accessed data locally

Task Scheduling:

  • Balance work across available nodes

  • Consider node capabilities (CPU/GPU)

  • Implement adaptive scheduling

Resource Usage:

  • Monitor memory consumption

  • Optimize batch sizes

  • Implement checkpointing for long tasks
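
The checkpointing advice can be sketched with the standard library alone. The example below writes task state to a temporary file and then swaps it into place with `os.replace`, so an interrupted write never leaves a half-written checkpoint. The file name and the shape of the state dict are assumptions for illustration; Manta may offer its own persistence mechanism.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Write `state` as JSON, replacing any previous checkpoint atomically."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as handle:
            json.dump(state, handle)
            handle.flush()
            os.fsync(handle.fileno())
        # os.replace swaps the file in one step, so readers never see a partial write.
        os.replace(tmp_path, path)
    except BaseException:
        os.unlink(tmp_path)
        raise


def load_checkpoint(path, default):
    """Return the saved state, or `default` when no checkpoint exists yet."""
    try:
        with open(path) as handle:
            return json.load(handle)
    except FileNotFoundError:
        return default


with tempfile.TemporaryDirectory() as workdir:
    checkpoint = os.path.join(workdir, "task_state.json")
    state = load_checkpoint(checkpoint, {"round": 0})
    state["round"] += 1
    save_checkpoint(checkpoint, state)
    print(load_checkpoint(checkpoint, {"round": 0}))  # {'round': 1}
```

A task could call `load_checkpoint` during setup and `save_checkpoint` at the end of each round to resume after a restart.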

from manta.light import Task

class OptimizedTask(Task):
    def execute(self):
        # Check available resources
        resources = self.local.get_resources()

        # Adapt batch size based on memory
        batch_size = self.calculate_optimal_batch_size(
            resources.available_memory
        )

        # Process with optimized settings
        self.process_data(batch_size=batch_size)
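
The example above leaves `calculate_optimal_batch_size` undefined. One possible implementation, offered as a sketch, picks the largest power-of-two batch that fits in a fraction of the available memory. The `bytes_per_sample`, `safety_fraction`, and `max_batch_size` parameters are assumptions introduced here, and the function is written standalone rather than as a method.

```python
def calculate_optimal_batch_size(available_memory, bytes_per_sample,
                                 safety_fraction=0.5, max_batch_size=1024):
    """Largest power-of-two batch that fits in a fraction of free memory.

    All parameters besides `available_memory` are assumptions of this sketch.
    """
    if bytes_per_sample <= 0:
        raise ValueError("bytes_per_sample must be positive")
    budget = int(available_memory * safety_fraction)
    fit = min(budget // bytes_per_sample, max_batch_size)
    if fit < 1:
        return 1  # Always process at least one sample per step.
    # Round down to a power of two: bit_length() - 1 is floor(log2(fit)).
    return 1 << (fit.bit_length() - 1)


# 8 GiB free, about 12 MiB per sample: 341 samples fit in half of it, so 256.
print(calculate_optimal_batch_size(8 * 1024**3, 12 * 1024**2))  # 256
```

Reserving only part of the free memory leaves headroom for the model, temporary buffers, and other processes on the node.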

Deployment Modes

Manta supports multiple deployment scenarios:

Development Mode:

  • Single machine testing

  • Simulated multi-node environment

  • Rapid iteration and testing

Production Mode:

  • Multi-machine clusters

  • High availability configuration

  • Monitoring and logging integration

Cloud Deployment:

  • Container orchestration

  • Auto-scaling capabilities

  • Cloud-native integrations

Concepts in Practice

Here’s how the concepts come together:

# Assumes `api` is an AsyncUserAPI client (see "Clusters and Nodes") and
# FederatedLearningSwarm is a Swarm subclass (see "Swarms and Modules").
async def run_experiment(api):
    # 1. Create a cluster (via dashboard or API)
    cluster_id = await api.create_cluster(name="ml-cluster")

    # 2. Connect nodes to cluster
    # Nodes automatically register when started with credentials

    # 3. Define your swarm
    swarm = FederatedLearningSwarm()
    swarm.set_cluster(cluster_id)

    # 4. Deploy the swarm
    swarm_id = await api.deploy_swarm(swarm)

    # 5. Tasks execute on nodes
    # - Workers train on local data
    # - Aggregator combines results
    # - Process repeats for multiple rounds

    # 6. Collect results
    return await api.get_swarm_results(swarm_id)
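
To see the worker/aggregator rounds from step 5 without a cluster, the following sketch simulates them on one machine with a thread pool. It uses no Manta APIs: each "node" is a list of numbers, each worker nudges a shared value toward its local mean, and the aggregator averages the workers' proposals. All function names and the update rule are invented for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def worker_step(local_data, global_value, learning_rate=0.5):
    """Move the shared value toward the mean of this worker's local data."""
    local_mean = sum(local_data) / len(local_data)
    return global_value + learning_rate * (local_mean - global_value)


def aggregate(updates):
    """Average the workers' proposed values into the next global value."""
    return sum(updates) / len(updates)


def run_rounds(datasets, rounds, initial=0.0):
    global_value = initial
    with ThreadPoolExecutor(max_workers=len(datasets)) as pool:
        for _ in range(rounds):
            # Workers run in parallel; the aggregator runs after all finish.
            updates = list(pool.map(lambda d: worker_step(d, global_value), datasets))
            global_value = aggregate(updates)
    return global_value


node_datasets = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
final = run_rounds(node_datasets, rounds=20)
print(round(final, 3))
```

With these equally sized datasets the shared value converges toward 5.0, the mean of all the data, even though no worker ever sees another node's samples.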

🎓 Concepts Understood! You now have a solid foundation of how Manta works.