Core Concepts¶
Understand the Manta Platform Architecture
Master the fundamental concepts that power Manta’s distributed computing platform. This guide explains the key components and how they work together.
🎯 Key Concepts to Master
Platform Architecture¶
Manta follows a hierarchical architecture:
Platform Hierarchy:
┌─────────┐
│Platform │
└────┬────┘
     │
┌────▼────┐
│Clusters │
└────┬────┘
     │
┌────▼────┐
│  Nodes  │
└────┬────┘
     │
┌────▼────┐
│Container│
└────┬────┘
     │
┌────▼────┐
│  Tasks  │
└────┬────┘
     │
┌────▼────┐
│ Modules │
└─────────┘
User Interaction Flow:
┌─────┐    ┌─────────┐    ┌─────────┐
│Users├────┤ SDK/CLI ├────┤Platform │
└─────┘    └─────────┘    └─────────┘
┌──────┐            ┌─────┐
│Swarms├────────────┤Tasks│
└──────┘            └─────┘
┌────┐              ┌─────┐
│Data├──────────────┤Nodes│
└────┘              └─────┘
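The containment relationships in the hierarchy can be sketched as plain data classes. This is only an illustrative model of the diagram: every class and field name here is hypothetical and none of them are Manta SDK types.

```python
from dataclasses import dataclass, field
from typing import List

# Illustrative model of the hierarchy diagram; these are NOT Manta SDK classes.


@dataclass
class TaskRef:
    name: str
    module: str  # name of the module that implements the task


@dataclass
class Container:
    tasks: List[TaskRef] = field(default_factory=list)


@dataclass
class Node:
    node_id: str
    containers: List[Container] = field(default_factory=list)


@dataclass
class Cluster:
    cluster_id: str
    nodes: List[Node] = field(default_factory=list)


@dataclass
class Platform:
    clusters: List[Cluster] = field(default_factory=list)

    def all_tasks(self) -> List[TaskRef]:
        """Walk the hierarchy top-down and collect every task."""
        return [
            task
            for cluster in self.clusters
            for node in cluster.nodes
            for container in node.containers
            for task in container.tasks
        ]


platform = Platform(clusters=[
    Cluster("cluster-a", nodes=[
        Node("node-1", containers=[Container([TaskRef("train", "worker_module")])]),
        Node("node-2", containers=[Container([TaskRef("train", "worker_module")])]),
    ]),
])

print([task.name for task in platform.all_tasks()])
```

Walking the structure from the top mirrors the diagram: a task is always reached through a container, a node, and a cluster.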
Clusters and Nodes¶
Clusters are logical groups of computing resources:
Created and managed through the dashboard or API
Have unique IDs and authentication credentials
Can contain multiple nodes
Provide isolated execution environments
Nodes are individual compute units:
Connect to clusters using credentials
Execute tasks in secure containers
Manage local data and resources
Communicate with platform services
# Cluster management via SDK
from manta.apis.user_api import AsyncUserAPI
api = AsyncUserAPI(token="your_token")
# List clusters
clusters = await api.list_clusters()
# Get cluster details
cluster = await api.get_cluster("cluster-id")
print(f"Cluster {cluster.id} has {cluster.node_count} nodes")
# List nodes in cluster
nodes = await api.list_nodes("cluster-id")
for node in nodes:
    print(f"Node {node.id}: {node.status}")
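The calls above use `await`, which is only valid inside a coroutine. The sketch below shows one way to drive such calls from a script with `asyncio.run` and to query several clusters concurrently with `asyncio.gather`. `FakeUserAPI`, `NodeInfo`, and the `"online"`/`"offline"` status values are hypothetical stand-ins so the example runs without a Manta installation; they are not the real `AsyncUserAPI`.

```python
import asyncio
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class NodeInfo:
    id: str
    status: str  # "online"/"offline" are assumed values for this sketch


class FakeUserAPI:
    """Hypothetical in-memory stand-in; not the real AsyncUserAPI."""

    def __init__(self, nodes_by_cluster: Dict[str, List[NodeInfo]]):
        self._nodes = nodes_by_cluster

    async def list_nodes(self, cluster_id: str) -> List[NodeInfo]:
        await asyncio.sleep(0)  # yield to the event loop, as a network call would
        return self._nodes.get(cluster_id, [])


async def count_online(api: FakeUserAPI, cluster_ids: List[str]) -> Dict[str, int]:
    # Query every cluster concurrently instead of one after another.
    node_lists = await asyncio.gather(*(api.list_nodes(c) for c in cluster_ids))
    return {
        cluster_id: sum(1 for node in nodes if node.status == "online")
        for cluster_id, nodes in zip(cluster_ids, node_lists)
    }


api = FakeUserAPI({
    "cluster-a": [NodeInfo("node-1", "online"), NodeInfo("node-2", "offline")],
    "cluster-b": [NodeInfo("node-3", "online")],
})

# asyncio.run() creates an event loop and runs the coroutine to completion.
online = asyncio.run(count_online(api, ["cluster-a", "cluster-b"]))
print(online)
```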
Swarms and Modules¶
Swarms define distributed algorithms:
Specify task topology and execution flow
Configure resource requirements
Define data distribution patterns
Orchestrate multi-stage workflows
Modules contain task implementations:
Python packages with Task classes
Execute inside secure containers
Access local and global data
Communicate with other tasks
from manta import Swarm, Task

class MySwarm(Swarm):
    def __init__(self):
        super().__init__()
        # Define tasks
        self.worker = Task(
            module="worker_module",
            class_name="WorkerTask",
            replicas=4
        )
        self.aggregator = Task(
            module="aggregator_module",
            class_name="AggregatorTask",
            replicas=1
        )
        # Define execution schedule
        self.schedule = [
            ("worker", "parallel"),
            ("aggregator", "sequential")
        ] * 10  # 10 rounds
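The schedule expression relies on Python list repetition, so it helps to see what it expands to. The snippet below is plain Python with no Manta imports; how the platform interprets each entry is not shown here.

```python
# Same list expression as in the swarm definition above.
schedule = [
    ("worker", "parallel"),
    ("aggregator", "sequential"),
] * 10  # 10 rounds

steps_per_round = 2
rounds = len(schedule) // steps_per_round

# Show the first two rounds of the expanded schedule.
for index, (task_name, mode) in enumerate(schedule[:4]):
    round_number = index // steps_per_round + 1
    print(f"round {round_number}: {task_name} ({mode})")
```

The result is a flat list of 20 entries that alternates worker and aggregator steps. Because the entries are immutable tuples, the repeated references that `*` creates are safe to share.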
Task Execution Model¶
Tasks follow a specific lifecycle:
Initialization: Container starts with runtime
Setup: Task.setup() called once
Execution: Task.execute() called per round
Communication: Access local/world interfaces
Completion: Results collected and container stops
from manta.light import Task

class MyTask(Task):
    def setup(self):
        """One-time initialization."""
        self.model = self.initialize_model()

    def execute(self):
        """Called for each execution round."""
        # Access local data
        data = self.local.get_dataset("training_data")
        # Get global state
        params = self.world.get_global("parameters")
        # Perform computation
        result = self.compute(data, params)
        # Share results
        self.world.set_result("output", result)
        return {"status": "success"}
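The lifecycle order (setup once, execute once per round, then result collection) can be sketched with a small driver loop. `SketchTask` and `run_lifecycle` are hypothetical names used only for illustration; the real runtime starts containers and does considerably more than this.

```python
from typing import Any, Dict, List


class SketchTask:
    """Hypothetical stand-in for a task base class (not manta.light.Task)."""

    def setup(self) -> None:
        pass

    def execute(self) -> Dict[str, Any]:
        raise NotImplementedError


class CountingTask(SketchTask):
    def setup(self) -> None:
        # State created here survives across rounds.
        self.rounds_seen = 0

    def execute(self) -> Dict[str, Any]:
        self.rounds_seen += 1
        return {"status": "success", "round": self.rounds_seen}


def run_lifecycle(task: SketchTask, rounds: int) -> List[Dict[str, Any]]:
    """Drive one task through setup, repeated execution, and result collection."""
    task.setup()  # Setup: called once
    results = []
    for _ in range(rounds):  # Execution: called once per round
        results.append(task.execute())
    return results  # Completion: results are collected


results = run_lifecycle(CountingTask(), rounds=3)
print(results[-1])
```

Keeping expensive initialization in `setup()` means it is paid once rather than on every round.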
Configuration System¶
The ~/.manta configuration system provides:
Profiles: Environment-specific settings
Credentials: Secure token storage
Node Configs: Hardware and resource settings
Hierarchical Overrides: Flexible configuration
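Hierarchical overrides are commonly implemented as a layered merge in which later layers win. The sketch below shows that general technique with plain dictionaries. The layer names, keys, and values are hypothetical and are not taken from the actual `~/.manta` file layout.

```python
from typing import Any, Dict


def merge(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new dict where `override` wins, merging nested dicts recursively."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge(merged[key], value)
        else:
            merged[key] = value
    return merged


# Hypothetical layers, lowest priority first.
defaults = {"api": {"host": "localhost", "port": 8000}, "node": {"max_memory_gb": 4}}
profile = {"api": {"host": "manta.example.com"}}
node_config = {"node": {"max_memory_gb": 16}}

effective = merge(merge(defaults, profile), node_config)
print(effective)
```

Merging recursively lets a profile change a single key such as `host` without having to restate `port`.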
Data Management¶
Manta handles data at multiple levels:
Local Data (Node-specific):
- Datasets stored on individual nodes
- Accessed via self.local in tasks
- Supports various formats (numpy, torch, etc.)
Global Data (Swarm-wide):
- Shared state across all tasks
- Accessed via self.world interface
- Synchronized via platform services
Results (Output collection):
- Task outputs collected centrally
- Accessible via SDK or dashboard
- Supports streaming and batch access
# Data access in tasks
from manta.light import Task

class DataTask(Task):
    def execute(self):
        # Local data access
        local_data = self.local.get_dataset("mnist")
        # Global state access
        global_model = self.world.get_global("model")
        # Process data
        output = self.process(local_data, global_model)
        # Store results
        self.world.set_result("processed", output)
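To make the difference between node-local and swarm-wide data concrete, here is a minimal in-memory sketch. `LocalStore` and `WorldStore` are hypothetical classes that only borrow the method names used in this guide. They do not reproduce how Manta stores or synchronizes data.

```python
from typing import Any, Dict, List


class LocalStore:
    """Hypothetical node-private dataset store."""

    def __init__(self, datasets: Dict[str, Any]):
        self._datasets = datasets

    def get_dataset(self, name: str) -> Any:
        return self._datasets[name]


class WorldStore:
    """Hypothetical swarm-wide store shared by every task."""

    def __init__(self) -> None:
        self._globals: Dict[str, Any] = {}
        self._results: Dict[str, List[Any]] = {}

    def get_global(self, key: str) -> Any:
        return self._globals[key]

    def set_global(self, key: str, value: Any) -> None:
        self._globals[key] = value

    def set_result(self, key: str, value: Any) -> None:
        # Each task appends its own output under the same key.
        self._results.setdefault(key, []).append(value)

    def get_results(self, key: str) -> List[Any]:
        return list(self._results.get(key, []))


world = WorldStore()  # one shared instance for the whole swarm
world.set_global("scale", 10)

# One private store per node; nodes never see each other's datasets.
node_stores = [
    LocalStore({"numbers": [1, 2, 3]}),
    LocalStore({"numbers": [4, 5]}),
]

for local in node_stores:
    total = sum(local.get_dataset("numbers")) * world.get_global("scale")
    world.set_result("scaled_sum", total)

print(world.get_results("scaled_sum"))
```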
Communication Patterns¶
Manta supports various communication patterns:
All-Reduce: Aggregate values from all nodes
# In worker tasks
self.world.set_result("gradients", local_gradients)
# In aggregator task
all_gradients = self.world.get_results("gradients")
averaged = sum(all_gradients) / len(all_gradients)
self.world.set_global("model", averaged)
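The expression `sum(all_gradients) / len(all_gradients)` works when each result is a scalar or a NumPy array. If the results are plain Python lists, `sum` raises a `TypeError`, so the average has to be taken element by element. A minimal sketch of that averaging step follows; `average_gradients` is a helper written for this example, not part of the Manta API.

```python
from typing import List, Sequence


def average_gradients(all_gradients: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise mean of equally sized gradient vectors."""
    if not all_gradients:
        raise ValueError("no gradients to average")
    length = len(all_gradients[0])
    if any(len(gradient) != length for gradient in all_gradients):
        raise ValueError("gradient vectors must have the same length")
    count = len(all_gradients)
    # zip(*...) groups the i-th component of every worker's vector together.
    return [sum(components) / count for components in zip(*all_gradients)]


worker_gradients = [
    [1.0, 2.0, 3.0],
    [3.0, 4.0, 5.0],
    [5.0, 6.0, 7.0],
]
averaged = average_gradients(worker_gradients)
print(averaged)
```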
Broadcast: Share data with all nodes
# In coordinator task
self.world.broadcast("config", configuration)
# In worker tasks
config = self.world.get_broadcast("config")
Peer-to-Peer: Direct node communication
# Send to specific node
self.world.send_to_node("node-2", "message", data)
# Receive from specific node
data = self.world.receive_from_node("node-1", "message")
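Broadcast and peer-to-peer delivery differ mainly in addressing: a broadcast value is readable by everyone, while a direct message is queued for one receiver. The in-memory router below illustrates that distinction. `Mailboxes` and its methods are hypothetical and say nothing about how Manta transports messages between nodes.

```python
from collections import defaultdict, deque
from typing import Any, Deque, Dict, Tuple


class Mailboxes:
    """Hypothetical in-memory router keyed by (receiver, sender, tag)."""

    def __init__(self) -> None:
        self._queues: Dict[Tuple[str, str, str], Deque[Any]] = defaultdict(deque)
        self._broadcasts: Dict[str, Any] = {}

    def send(self, sender: str, receiver: str, tag: str, payload: Any) -> None:
        self._queues[(receiver, sender, tag)].append(payload)

    def receive(self, receiver: str, sender: str, tag: str) -> Any:
        queue = self._queues[(receiver, sender, tag)]
        if not queue:
            raise LookupError(f"no '{tag}' message from {sender} for {receiver}")
        return queue.popleft()  # first in, first out

    def broadcast(self, key: str, value: Any) -> None:
        self._broadcasts[key] = value

    def get_broadcast(self, key: str) -> Any:
        return self._broadcasts[key]


router = Mailboxes()

# Broadcast: one write, readable by every node.
router.broadcast("config", {"lr": 0.1})

# Peer-to-peer: node-1 sends two messages to node-2 only.
router.send("node-1", "node-2", "message", {"step": 1})
router.send("node-1", "node-2", "message", {"step": 2})

first = router.receive("node-2", "node-1", "message")
second = router.receive("node-2", "node-1", "message")
print(first, second, router.get_broadcast("config"))
```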
Security Model¶
Manta implements multiple security layers:
Authentication:
- JWT tokens for API access
- Cluster-specific credentials
- User and role management
Encryption:
- TLS for API communication
- Optional mTLS for production
- Encrypted credential storage
Isolation:
- Container-based task isolation
- Secure execution environments
- Resource quotas and limits
Performance Considerations¶
Optimize your algorithms for Manta:
Data Transfer:
- Minimize data movement between nodes
- Use compression for large transfers
- Cache frequently accessed data locally
Task Scheduling:
- Balance work across available nodes
- Consider node capabilities (CPU/GPU)
- Implement adaptive scheduling
Resource Usage:
- Monitor memory consumption
- Optimize batch sizes
- Implement checkpointing for long tasks
from manta.light import Task

class OptimizedTask(Task):
    def execute(self):
        # Check available resources
        resources = self.local.get_resources()
        # Adapt batch size based on memory
        batch_size = self.calculate_optimal_batch_size(
            resources.available_memory
        )
        # Process with optimized settings
        self.process_data(batch_size=batch_size)
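The example above leaves `calculate_optimal_batch_size` undefined. One possible heuristic is sketched below: take the largest power-of-two batch that fits in a fixed fraction of free memory. The extra parameters (`bytes_per_sample`, `memory_fraction`, `max_batch_size`) and their defaults are assumptions made for this sketch, and it assumes memory is reported in bytes.

```python
def calculate_optimal_batch_size(
    available_memory: int,
    bytes_per_sample: int,
    memory_fraction: float = 0.5,
    max_batch_size: int = 1024,
) -> int:
    """Largest power-of-two batch that fits in a fraction of free memory."""
    if bytes_per_sample <= 0:
        raise ValueError("bytes_per_sample must be positive")
    # Leave headroom for the model, activations, and the runtime itself.
    budget = int(available_memory * memory_fraction)
    fits = budget // bytes_per_sample
    if fits < 1:
        return 1  # always process at least one sample
    batch_size = 1
    while batch_size * 2 <= min(fits, max_batch_size):
        batch_size *= 2
    return batch_size


GIB = 1024 ** 3
MIB = 1024 ** 2

typical = calculate_optimal_batch_size(1 * GIB, bytes_per_sample=4 * MIB)
capped = calculate_optimal_batch_size(64 * GIB, bytes_per_sample=1024)
tiny = calculate_optimal_batch_size(1_000_000, bytes_per_sample=4 * MIB)
print(typical, capped, tiny)
```

Capping the result keeps a node with a lot of free memory from choosing a batch size that hurts convergence, and the floor of 1 keeps low-memory nodes making progress.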
Deployment Modes¶
Manta supports multiple deployment scenarios:
Development Mode:
- Single machine testing
- Simulated multi-node environment
- Rapid iteration and testing
Production Mode:
- Multi-machine clusters
- High availability configuration
- Monitoring and logging integration
Cloud Deployment:
- Container orchestration
- Auto-scaling capabilities
- Cloud-native integrations
Concepts in Practice¶
Here’s how the concepts come together:
# 1. Create a cluster (via dashboard or API)
cluster_id = await api.create_cluster(name="ml-cluster")
# 2. Connect nodes to cluster
# Nodes automatically register when started with credentials
# 3. Define your swarm
swarm = FederatedLearningSwarm()
swarm.set_cluster(cluster_id)
# 4. Deploy the swarm
swarm_id = await api.deploy_swarm(swarm)
# 5. Tasks execute on nodes
# - Workers train on local data
# - Aggregator combines results
# - Process repeats for multiple rounds
# 6. Collect results
results = await api.get_swarm_results(swarm_id)
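The worker and aggregator loop in step 5 can be simulated on a single machine without any Manta components. In the toy sketch below, the "model" is a single number that each worker nudges toward the mean of its local data and the aggregator averages the updates. The functions, data, and learning rate are invented for illustration and are not the federated learning algorithm that `FederatedLearningSwarm` implements.

```python
from statistics import mean
from typing import Dict, List


def worker_step(model: float, local_data: List[float], lr: float = 0.5) -> float:
    """Move the shared model toward this node's local mean."""
    return model + lr * (mean(local_data) - model)


def aggregate(updates: List[float]) -> float:
    """Combine worker updates by simple averaging."""
    return mean(updates)


# Each node only ever sees its own data.
node_data: Dict[str, List[float]] = {
    "node-1": [1.0, 2.0, 3.0],  # local mean 2.0
    "node-2": [5.0, 6.0, 7.0],  # local mean 6.0
}

model = 0.0
for round_number in range(10):
    # Workers train on local data, starting from the current global model.
    updates = [worker_step(model, data) for data in node_data.values()]
    # The aggregator combines the results into the next global model.
    model = aggregate(updates)

print(round(model, 3))
```

After ten rounds the shared value is close to 4.0, the average of the two local means, even though neither node's raw data was combined with the other's.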
Next Steps¶
Deep dive into specific topics:
Configuration System - Master the configuration system
🎓 Concepts Understood! You now have a solid foundation of how Manta works.