Build Reactive Systems with Clustron Watch

· 5 min read

Most distributed systems don’t start out complicated.

They begin with something simple — a loop.

You write a small worker that checks for work:

while (true)
{
    var job = await client.GetAsync<Job>("jobs:next");

    if (job.IsSuccess)
    {
        await ProcessAsync(job.Value);
    }

    await Task.Delay(1000);
}

At this stage, everything feels reasonable. The logic is clear, the system behaves predictably, and there is very little cognitive overhead.

But as the system grows, this pattern quietly becomes one of the most expensive parts of your architecture.


The Hidden Cost of Polling in Distributed Systems

Polling introduces a fundamental inefficiency: you are repeatedly asking the system for changes that usually haven’t happened.

As you scale:

  • Multiple instances begin polling the same data
  • Each instance performs repeated reads, most of which return no useful information
  • You introduce artificial latency — changes are only detected on the next polling cycle
  • The system starts consuming resources simply to confirm that nothing has changed

This leads to a system that is technically correct, but operationally inefficient.

More importantly, polling forces you to make tradeoffs you shouldn’t have to make:

  • Poll frequently → higher cost, lower latency
  • Poll less frequently → lower cost, higher latency

There is no correct answer — only compromise.


A Different Model: Reacting Instead of Asking

The alternative is not just an optimization — it is a different way of thinking about systems.

Instead of asking:

“Has something changed?”

You design your system to respond to:

“Something just changed.”

This is the shift from a pull-based model to a push-based model.

In Clustron, this is implemented through Watch.


[Diagram: polling model vs. watching changes]


Introducing Watch: A Stream of State Changes

Clustron Watch allows you to subscribe to changes in the system and receive events as they happen.

At its core, Watch turns your data into a continuous stream of updates.

Watching a Single Key

var (subscription, initialRecord) = await client.Watch.WatchKeyAsync(
    "order:1001",
    new WatchOptions { IncludeInitialSnapshot = true },
    ev =>
    {
        Console.WriteLine($"{ev.EventType} {ev.Key} = {ev.Value}");
    });

This does two important things:

  1. It gives you the current state (initialRecord)
  2. It subscribes you to all future changes

This combination is critical. Without it, you would have to manually fetch the current value and then subscribe — which introduces race conditions.
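For contrast, here is a sketch of the race-prone manual sequence that the combined call avoids (`Apply` is a hypothetical handler, not part of the Clustron API):

```csharp
// 1. Read the current value.
var current = await client.GetAsync<string>("order:1001");

// <-- If a writer updates "order:1001" right here, that change is lost:
//     it is not in `current`, and the subscription below starts too late.

// 2. Subscribe separately.
var sub = await client.Watch.WatchKeyAsync(
    "order:1001",
    new WatchOptions { IncludeInitialSnapshot = false },
    ev => Apply(ev));
```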


Snapshot + Streaming: A Correctness Guarantee

One of the subtle but essential aspects of Watch is the ability to include an initial snapshot.

When IncludeInitialSnapshot is enabled:

  • You receive the current value at the time of subscription
  • You then receive all subsequent updates in order

This ensures that:

  • You do not miss existing state
  • You do not miss future changes
  • You do not need additional synchronization logic

In other words, Watch provides a complete and consistent view of state evolution.


Watching Collections: Moving Beyond Single Keys

While watching a single key is useful, most real systems operate on groups of related data.

Clustron supports this through prefix-based watching:

var subscription = await client.Watch.WatchPrefixAsync(
    "jobs:",
    new WatchOptions { IncludeInitialSnapshot = false },
    ev =>
    {
        Console.WriteLine($"Job update: {ev.Key}");
    });

This allows you to observe all changes within a logical group.

From an architectural perspective, this is where Watch becomes significantly more powerful:

  • You are no longer reacting to individual values
  • You are reacting to system-wide state transitions

Understanding the Event Model

Each change in the system produces a WatchEvent.

A typical event contains:

  • The key that changed
  • The type of change (Put or Delete)
  • The new value (if applicable)
  • A revision representing the version of the change

This revision is particularly important. It provides ordering and allows you to reason about the sequence of events across a distributed system.

Effectively, Watch gives you a log of state transitions, delivered in real time.
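Based on the fields listed above, the event payload can be pictured as a simple record. This is only an illustrative sketch; the exact type names and shape in the client library may differ:

```csharp
// Hypothetical sketch of the event shape described above.
public enum WatchEventType { Put, Delete }

public sealed record WatchEvent(
    string Key,                // the key that changed
    WatchEventType EventType,  // Put or Delete
    string? Value,             // the new value; null for deletes
    long Revision);            // monotonically increasing version of the change
```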


What This Enables in Practice

Once you adopt this model, several common problems become simpler.

Job Processing

Instead of polling for new work, workers react immediately when jobs are created or updated.

This removes idle cycles and reduces latency.
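Using the prefix watch shown earlier, the polling loop from the introduction can be replaced with an event handler. This is a sketch: `ProcessAsync` is the same hypothetical worker method from the opening example, and the event-type check and `Deserialize<Job>` step are assumptions about how job payloads are encoded:

```csharp
var subscription = await client.Watch.WatchPrefixAsync(
    "jobs:",
    new WatchOptions { IncludeInitialSnapshot = true },
    async ev =>
    {
        // React only to new or updated jobs; ignore deletions.
        if (ev.EventType == WatchEventType.Put)
        {
            await ProcessAsync(Deserialize<Job>(ev.Value));
        }
    });
```

The `Task.Delay(1000)` is gone, and with it the one-second worst-case detection latency.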


Cache Invalidation

Rather than relying on TTL or periodic refresh, caches can update themselves in response to actual changes.

This improves both freshness and efficiency.
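A minimal sketch of a self-invalidating local cache, assuming the watch delivers Put/Delete events as described above (here the "cache" is just a concurrent dictionary, and the `products:` prefix is illustrative):

```csharp
using System.Collections.Concurrent;

var localCache = new ConcurrentDictionary<string, string>();

var subscription = await client.Watch.WatchPrefixAsync(
    "products:",
    new WatchOptions { IncludeInitialSnapshot = true },
    ev =>
    {
        if (ev.EventType == WatchEventType.Put)
            localCache[ev.Key] = ev.Value;        // refresh on change
        else
            localCache.TryRemove(ev.Key, out _);  // evict on delete
    });
```

No TTL is involved: entries change exactly when the underlying data changes.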


Distributed Coordination

Systems that rely on shared state — such as leader election or lease ownership — benefit significantly from real-time updates.

Changes propagate instantly, allowing faster and more reliable coordination.


Presence and Monitoring

Tracking active components becomes trivial:

  • When a node appears → you receive an event
  • When a node disappears → you receive an event

No polling, no guessing.


Operational Considerations

Watch is designed for long-running, event-driven workloads.

A few important considerations:

Keep Handlers Lightweight

Your event handler should not perform heavy work directly.

Instead:

  • Process quickly
  • Delegate heavier tasks to background workers
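One common way to keep the handler cheap is to push events into a bounded channel and do the heavy work on a background consumer. This sketch uses `System.Threading.Channels` and a plain string payload in place of a real watch event:

```csharp
using System.Threading.Channels;

var channel = Channel.CreateBounded<string>(capacity: 1024);

// Background consumer: does the heavy work off the event path.
var consumer = Task.Run(async () =>
{
    await foreach (var key in channel.Reader.ReadAllAsync())
    {
        await Task.Delay(10); // simulate expensive processing
        Console.WriteLine($"Processed {key}");
    }
});

// The watch handler itself only enqueues and returns immediately.
void OnEvent(string key) => channel.Writer.TryWrite(key);

OnEvent("jobs:1");
OnEvent("jobs:2");

channel.Writer.Complete();
await consumer;
```

A bounded channel also gives you backpressure: if the consumer falls behind, `TryWrite` starts failing and you can decide how to degrade.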

Manage Subscription Lifecycle

Always stop subscriptions when they are no longer needed:

await subscription.StopAsync();

This ensures resources are released cleanly.


Understand Scope

  • Watching a key → only that key produces events
  • Watching a prefix → only matching keys produce events

Events are scoped and isolated, which prevents unintended cross-talk between components.


From Polling Loops to Reactive Systems

The shift from polling to watching is not just about performance.

It changes the structure of your application.

Polling systems are built around:

Loop → Check → Wait → Repeat

Reactive systems are built around:

Event → Handler → Action

This results in systems that are:

  • More responsive
  • More efficient
  • Easier to reason about

Final Thoughts

Polling is often the simplest way to get started, and in small systems, it works well enough.

But as systems grow, the cost of polling becomes increasingly difficult to justify.

Clustron Watch offers a different approach — one where your system reacts to change rather than searching for it.

Stop asking your system if something changed.

Let it tell you when it does.

Getting Started

Try a simple watch:

await client.Watch.WatchPrefixAsync("orders:", null, ev =>
{
    Console.WriteLine($"{ev.Key} updated");
});

From there, start removing polling loops — one at a time.

You’ll quickly find that your system becomes not just faster, but fundamentally simpler.

Distributed Caching in .NET

· 4 min read

Caching is one of the most effective ways to improve the performance of an application.
Most .NET applications start with a simple in‑memory cache using MemoryCache or IMemoryCache. This works perfectly when the application runs on a single server.

However, modern applications rarely run on a single machine.

When applications scale horizontally across multiple instances, in‑memory caching stops working reliably. Each instance maintains its own cache, which means the application state becomes inconsistent across the cluster.

This is where distributed caching becomes essential.


[Diagram: distributed cache]


The Problem with In‑Memory Caches

Consider a typical ASP.NET application deployed behind a load balancer.

            Load Balancer
                 |
     +-----------+-----------+
     |           |           |
 Server A    Server B    Server C

Each server has its own memory cache.

If a request updates cached data on Server A, the caches on Server B and Server C remain outdated.

This leads to several problems:

  • inconsistent application state
  • stale cache entries
  • duplicate cache warmups
  • unpredictable behavior

To solve this problem, the cache must be shared across all nodes.


What is a Distributed Cache?

A distributed cache is a centralized or clustered cache that can be accessed by multiple application instances.

Instead of storing cached data inside the application process, the data is stored in a distributed key‑value store.

         Application Nodes

   App A     App B     App C
     |         |         |
     +---------+---------+
               |
       Distributed Cache

All application nodes read and write cache entries from the same distributed store.

This ensures:

  • consistent cached data
  • shared application state
  • better scalability

Basic Distributed Cache Example in .NET

A distributed cache is typically used through an abstraction such as IDistributedCache.

Example:

public class ProductService
{
    private readonly IDistributedCache _cache;

    public ProductService(IDistributedCache cache)
    {
        _cache = cache;
    }

    public async Task<string?> GetProductAsync(string id)
    {
        var cached = await _cache.GetStringAsync(id);

        if (cached != null)
            return cached;

        var product = await LoadProductFromDatabase(id);

        await _cache.SetStringAsync(id, product);

        return product;
    }

    private Task<string> LoadProductFromDatabase(string id)
    {
        return Task.FromResult($"Product {id}");
    }
}

The application first checks the cache. If the value is missing, the data is loaded from the database and then stored in the cache.

This pattern is called cache‑aside.
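In practice the cache write usually carries an expiration policy so that entries do not live forever. With `IDistributedCache` that looks like the following (the durations are illustrative):

```csharp
using Microsoft.Extensions.Caching.Distributed;

await _cache.SetStringAsync(
    id,
    product,
    new DistributedCacheEntryOptions
    {
        // Entry is dropped 5 minutes after it was written...
        AbsoluteExpirationRelativeToNow = TimeSpan.FromMinutes(5),

        // ...or after 2 minutes without being read, whichever comes first.
        SlidingExpiration = TimeSpan.FromMinutes(2)
    });
```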


Common Distributed Cache Solutions

Several technologies provide distributed caching capabilities.

Redis

Redis is one of the most widely used distributed caches.

It provides:

  • key‑value storage
  • high performance
  • in‑memory operations

However, Redis often requires additional components when used for distributed coordination.


Database‑Backed Caches

Some applications use relational databases as caches.

While simple, this approach introduces:

  • database contention
  • reduced performance
  • limited scalability

Distributed Key‑Value Platforms

Modern distributed systems often use distributed key‑value platforms that provide both caching and coordination primitives.

These systems support:

  • distributed caching
  • distributed locks
  • leader election
  • change notifications
  • cluster coordination

Using Clustron as a Distributed Cache

Clustron provides a distributed key‑value platform designed for .NET applications.

Because it runs as a distributed cluster, all application instances interact with the same shared data store.

Example:

using Clustron.DKV.Client;

var client = await DKVClient.InitializeRemote(
    "teststore",
    new[]
    {
        new DkvServerInfo("localhost", 7861)
    });

await client.PutAsync("product:1", "Laptop");

var value = await client.GetAsync<string>("product:1");

Console.WriteLine(value.Value);

In this example:

  • application nodes connect to the cluster
  • data is stored in the distributed store
  • all nodes see the same data

This allows the cache to remain consistent across the entire cluster.
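The cache‑aside pattern from earlier maps directly onto this API. A sketch, where `LoadProductFromDatabase` is the same hypothetical loader used in the `IDistributedCache` example:

```csharp
async Task<string> GetProductAsync(string id)
{
    var cached = await client.GetAsync<string>($"product:{id}");

    if (cached.IsSuccess)
        return cached.Value;                          // cache hit

    var product = await LoadProductFromDatabase(id);  // cache miss: load...
    await client.PutAsync($"product:{id}", product);  // ...and populate

    return product;
}
```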


When to Use Distributed Caching

Distributed caching is especially useful in systems that:

  • run multiple application instances
  • process high request volumes
  • rely heavily on database reads
  • require shared application state

Typical use cases include:

  • ASP.NET web applications
  • microservices architectures
  • background processing systems
  • API platforms

Summary

In‑memory caching works well for single‑server applications, but modern distributed systems require a shared caching layer.

Distributed caching allows multiple application instances to share the same cached data, improving scalability and consistency.

Platforms such as Clustron provide a distributed key‑value store that makes it easy to implement distributed caching and coordination in .NET applications.

Leader Election in Distributed Systems

· 4 min read

When applications scale across multiple machines, coordination becomes one of the hardest problems to solve.
Distributed systems often run several identical nodes that can all perform the same work. Without coordination, those nodes may attempt the same task simultaneously, leading to duplicate work, race conditions, or inconsistent state.

One of the most common ways to solve this problem is leader election.

Leader election ensures that one node temporarily becomes the leader while the other nodes remain followers. The leader performs coordination tasks while followers wait for leadership to change.

This pattern is widely used in distributed databases, job schedulers, container orchestration platforms, and configuration systems.


[Diagram: leader election in distributed systems]


A Simple Example

Imagine a distributed worker system with three nodes:

Worker A Worker B Worker C

Each worker is capable of running background jobs.

If all workers execute the same scheduled job, the job will run three times.
Instead, we want exactly one worker to run the job.

This is where leader election becomes useful. One worker becomes the leader, and only the leader runs the job.


The Simplest Leader Election Pattern

A common approach uses a shared coordination store.

The idea is simple:

  1. Each node attempts to acquire leadership.
  2. The node that succeeds becomes the leader.
  3. The leader periodically renews its leadership.
  4. If the leader fails, another node takes over.

This is typically implemented using a lease.

A lease acts like a lock with an expiration time. If the node holding the lease disappears, another node can safely acquire leadership.


A Minimal .NET Example

The following simplified example demonstrates the idea behind lease-based leadership.

public class LeaderElection
{
    private readonly ILeaseStore _store;
    private readonly string _nodeId;

    public LeaderElection(ILeaseStore store, string nodeId)
    {
        _store = store;
        _nodeId = nodeId;
    }

    public async Task RunAsync()
    {
        while (true)
        {
            var acquired = await _store.TryAcquireLeaseAsync(
                "cluster-leader", _nodeId, TimeSpan.FromSeconds(10));

            if (acquired)
            {
                Console.WriteLine($"{_nodeId} is leader");

                await PerformLeaderWork();

                await _store.RenewLeaseAsync("cluster-leader", _nodeId);
            }

            await Task.Delay(2000);
        }
    }

    private Task PerformLeaderWork()
    {
        Console.WriteLine("Running scheduled tasks...");
        return Task.CompletedTask;
    }
}

In this example:

  • Every node attempts to acquire the lease "cluster-leader".
  • The node that succeeds becomes the leader.
  • The leader periodically renews the lease.
  • If the leader crashes, the lease expires and another node takes over.

This pattern provides automatic failover.


Problems with DIY Leader Election

Although the example above looks simple, real-world distributed systems introduce complications:

Network partitions can cause nodes to believe they are still leaders.

Clock drift may cause lease expiration problems.

Slow failure detection may delay leadership transfer.

Building a robust leader election mechanism requires careful handling of distributed systems edge cases.

This is why many systems rely on specialized coordination platforms such as:

  • etcd
  • ZooKeeper
  • Consul

These systems implement reliable coordination primitives.


Leader Election with Clustron

Clustron provides a clustering platform for distributed systems in .NET.

Instead of building coordination infrastructure yourself, applications can use the platform's built-in primitives.

Example using Clustron's lease concept:

using Clustron.DKV.Client;

var client = await DKVClient.InitializeRemote(
    "teststore",
    new[] { new DkvServerInfo("localhost", 7861) });

var lease = await client.GrantLeaseAsync(TimeSpan.FromSeconds(10));

if (lease.Acquired)
{
    Console.WriteLine("This node is the leader.");

    while (true)
    {
        await client.RefreshLeaseAsync(lease.Id);
        await Task.Delay(3000);
    }
}

Here the node requests a lease from the cluster.

If the lease is granted, the node becomes the leader.
As long as the node keeps refreshing the lease, leadership remains valid.

If the node crashes or stops renewing the lease, another node can acquire leadership automatically.


When Leader Election Is Useful

Leader election appears in many real-world distributed systems:

  • Distributed job schedulers
  • Background task processors
  • Cluster managers
  • Microservice coordination
  • Configuration controllers

Any time a system requires exactly one node to coordinate activity, leader election becomes essential.


Summary

Leader election is a fundamental building block for distributed systems.

It allows clusters of nodes to coordinate their behavior while maintaining high availability.

A lease-based approach provides a simple and effective implementation strategy, and distributed platforms such as Clustron make these primitives available directly to .NET applications.

Instead of building coordination logic from scratch, developers can focus on application behavior while the clustering platform handles leadership, failover, and coordination.

Distributed Locks Explained

· 4 min read
Clustron Team
Distributed Systems Engineering

A Practical Guide for .NET Developers

Introduction

As applications scale across multiple machines, coordination becomes one of the most challenging aspects of system design.

In a single-machine application, mutual exclusion is straightforward. Developers can rely on language primitives such as lock in C#, Mutex, or Semaphore to ensure that only one thread accesses a shared resource at a time.

However, these primitives only work within a single process or machine.

Once applications run across multiple services or nodes, traditional locking mechanisms no longer work. Two services running on different machines cannot rely on a local lock statement to coordinate access to shared state.

This is where distributed locks become essential.

Distributed locks allow services across a cluster to coordinate access to shared resources while ensuring that only one participant holds the lock at a given time.


Local Locks vs Distributed Locks

A typical local lock in C# looks like this:

private static readonly object _lock = new object();

public void ProcessOrder()
{
    lock (_lock)
    {
        Console.WriteLine("Processing order safely.");
    }
}

This works perfectly inside a single process.

However, if your application runs on multiple service instances, each instance has its own _lock object. The lock no longer coordinates work across the system.

Worker A → lock acquired
Worker B → lock acquired
Worker C → lock acquired

All workers proceed simultaneously, defeating the purpose of locking.

Distributed systems require a shared coordination mechanism.


Distributed Locking


The Idea Behind Distributed Locks

A distributed lock uses a shared system to represent lock ownership.

Instead of locking memory, a node records lock ownership in a distributed coordination system such as:

  • a distributed key‑value store
  • a coordination service
  • a distributed cache

Example concept:

Lock Key: order:123
Owner:    worker-1

If another worker attempts to acquire the same lock, the operation fails until the lock is released or expires.


Real World Scenario: Distributed Invoice Processing

Imagine a cluster of workers processing invoices.

Worker A Worker B Worker C

All workers scan for pending invoices.

Without locking:

Worker A → processes invoice #101
Worker B → processes invoice #101
Worker C → processes invoice #101

The same invoice could be processed multiple times.

A distributed lock ensures that only one worker processes the invoice.


Distributed Locks with Clustron DKV

Clustron DKV provides built‑in primitives for distributed coordination.

Locks are managed through the Locks API, and operations on locked keys must include the correct lock handle.


Acquiring a Lock

A client first acquires a lock for a specific key.

var locks = ((IDkv)client).Locks;

var key = "order:123";

var lockHandle = await locks.AcquireAsync(key, TimeSpan.FromSeconds(10));

if (lockHandle == null)
{
    Console.WriteLine("Failed to acquire lock.");
    return;
}

Console.WriteLine("Lock acquired.");

The returned lock handle represents ownership of the lock.


Performing Operations Under the Lock

Once a lock is acquired, operations must include the lock handle.

var result = await client.PutAsync(
    key,
    "processed",
    Put.WithLock(lockHandle.Handle));

if (result.IsSuccess)
{
    Console.WriteLine("Value updated safely.");
}

Only the lock owner can modify the value.


Reading with the Lock

Reads can also respect the lock ownership.

var value = await client.GetAsync<string>(
    key,
    Get.WithLock(lockHandle.Handle));

if (value.IsSuccess)
{
    Console.WriteLine($"Value: {value.Value}");
}

Competing Operations Without the Lock

If another worker attempts to modify the key without the lock, the operation fails.

var attempt = await client.PutAsync(key, "new-value");

if (!attempt.IsSuccess)
{
    Console.WriteLine($"Operation rejected: {attempt.Status}");
}

Result:

KvStatus.Locked

This prevents concurrent modification of the same resource.


Releasing the Lock

Once the work is finished, the lock can be released.

await lockHandle.ReleaseAsync();

Console.WriteLine("Lock released.");

After the lock is released, another worker may acquire it.


Example Cluster Workflow

Worker A → Acquire Lock → Success
Worker B → Acquire Lock → Failed
Worker C → Acquire Lock → Failed

Worker A → Update Key
Worker A → Release Lock

Only one worker performs the operation, ensuring correctness.


Handling Failures

Locks in Clustron are time‑bound.

When acquiring a lock, a duration must be specified:

await locks.AcquireAsync(key, TimeSpan.FromSeconds(10));

If the client holding the lock crashes or disconnects, the lock eventually expires, allowing another worker to acquire it.

This prevents locks from remaining permanently held.


Best Practices

When implementing distributed locks:

  • Keep critical sections short
  • Avoid global locks when possible
  • Partition locks by resource
  • Always design for failure scenarios
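Putting the pieces above together, a worker's critical section typically acquires, works, and releases in a try/finally so the lock is freed even when the work throws (`ProcessInvoiceAsync` is a hypothetical placeholder; `locks` is the Locks API shown earlier):

```csharp
var lockHandle = await locks.AcquireAsync("invoice:101", TimeSpan.FromSeconds(10));

if (lockHandle == null)
    return; // another worker owns this invoice; move on to other work

try
{
    // Keep this section short: only the work that needs exclusivity.
    await ProcessInvoiceAsync("invoice:101");
}
finally
{
    await lockHandle.ReleaseAsync(); // always release, even on failure
}
```

If the process dies inside the critical section, the lock's expiration (here 10 seconds) acts as the backstop.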

Final Thoughts

Distributed locks are one of the most fundamental coordination primitives in distributed systems.

They allow services running across multiple machines to safely coordinate access to shared resources while preserving system correctness.

Platforms like Clustron DKV simplify this by providing built‑in locking primitives, atomic operations, and strong consistency guarantees.

In the next article, we will explore another essential distributed systems primitive:

Leader Election.

Building Distributed Systems

· 4 min read
Clustron Team
Distributed Systems Engineering

A Practical Guide for .NET Developers

Introduction

Most backend systems begin their life on a single machine.

An ASP.NET Core application processes requests, a database stores application data, and background workers handle asynchronous tasks such as sending emails or generating reports. In this environment, concurrency is manageable and coordination between components is relatively straightforward.

As the system grows, however, things start to change.

Traffic increases, more application instances are deployed behind a load balancer, and services begin running across multiple machines. What was once a single application evolves into a distributed system.

At this point, the nature of the problem changes dramatically.

The challenges are no longer limited to writing correct business logic. Developers must now deal with coordination across machines, network failures, partial outages, and maintaining consistency across distributed state.

This is where distributed systems engineering begins.


[Diagram: distributed systems architecture]


What Is a Distributed System?

A distributed system is a collection of independent computers that appear to the user as a single coherent system.

Instead of running all logic on one machine, responsibilities are distributed across multiple nodes that communicate over the network.

Examples include:

  • Microservice architectures
  • Distributed databases
  • Clustered caches
  • Message queues
  • Cloud infrastructure platforms

These systems provide scalability and resilience but introduce new complexities.


Why Distributed Systems Are Hard

Unlike a single-machine application, distributed systems must deal with realities such as:

Network Failures

In distributed systems, communication happens over a network. Networks can be slow, unreliable, or temporarily unavailable.

A request may fail because the remote node crashed, because the network dropped packets, or because a timeout occurred.

The system must handle these scenarios gracefully.


Partial Failures

In a distributed system, components can fail independently.

One node in a cluster might crash while others remain healthy. A service instance may restart while other services continue running.

This means the system must operate correctly even when only part of it is functioning.


Consistency Challenges

When data is replicated across multiple nodes, keeping it consistent becomes difficult.

For example:

  • Two nodes may attempt to update the same data simultaneously.
  • Network delays may cause nodes to temporarily disagree about the current state.
  • A node might crash after performing only part of an operation.

Designing systems that maintain consistent state despite these scenarios requires careful coordination.
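A common defense against the first scenario is optimistic concurrency: every update carries the revision it was based on, and the store rejects writes whose revision is stale. A minimal single-process sketch of the idea (a real distributed store would enforce the same check server-side):

```csharp
// Toy versioned store: a write succeeds only if the caller read the latest revision.
var store = (Value: "a", Revision: 1L);
var gate = new object();

bool TryUpdate(string newValue, long expectedRevision)
{
    lock (gate)
    {
        if (store.Revision != expectedRevision)
            return false;                        // someone else won the race
        store = (newValue, store.Revision + 1);  // apply and bump the revision
        return true;
    }
}

var first  = TryUpdate("b", expectedRevision: 1); // based on the latest read: accepted
var second = TryUpdate("c", expectedRevision: 1); // based on a stale read: rejected

Console.WriteLine($"{first} {second} {store.Value}");
```

The loser of the race must re-read the current value and retry, rather than silently overwriting the winner's update.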


Key Building Blocks of Distributed Systems

Most distributed systems rely on a few core primitives.

Understanding these primitives helps developers design reliable architectures.


Distributed Storage

Data must often be stored across multiple machines.

This enables:

  • horizontal scaling
  • fault tolerance
  • high availability

Distributed key-value stores are commonly used for this purpose.


Coordination

Nodes must coordinate actions such as:

  • leader election
  • distributed locking
  • cluster membership
  • configuration updates

Without coordination mechanisms, multiple nodes might attempt conflicting operations.


Consensus

Some systems require strong guarantees about shared state.

Consensus algorithms such as Raft or Paxos allow nodes to agree on decisions even in the presence of failures.

These algorithms are fundamental to distributed databases and coordination services.


Partitioning and Replication

Data is typically:

  • partitioned across nodes to distribute load
  • replicated to protect against failures

Balancing performance, availability, and consistency is one of the core design challenges.


Common Distributed System Patterns

Several patterns appear repeatedly in real-world systems.

Examples include:

  • Leader election for coordinating cluster activities
  • Distributed locks for ensuring mutual exclusion
  • Job queues for asynchronous processing
  • Rate limiters for controlling request volume
  • Transaction protocols for atomic multi-node updates

These patterns form the foundation of many large-scale platforms.


The Role of Coordination Systems

To simplify distributed development, many organizations rely on coordination systems such as:

  • etcd
  • ZooKeeper
  • Consul

These systems provide primitives such as:

  • leader election
  • distributed locks
  • cluster membership
  • configuration storage

Applications can then build higher-level behavior on top of these primitives.


Distributed Systems in the .NET Ecosystem

While distributed infrastructure tools are often associated with other ecosystems, .NET applications increasingly operate in distributed environments.

Modern .NET systems frequently use:

  • Kubernetes
  • distributed caches
  • microservices
  • event-driven architectures

As a result, coordination and distributed state management are becoming increasingly important for .NET developers as well.


Final Thoughts

Building distributed systems is fundamentally different from building traditional single-machine applications.

The challenges of network communication, partial failures, and data consistency introduce complexities that require specialized patterns and infrastructure.

Fortunately, by understanding the key building blocks — storage, coordination, consensus, and replication — developers can design systems that scale reliably and remain resilient in the face of failures.

In future articles, we will explore some of these primitives in more detail, including leader election, distributed coordination patterns, and practical implementations in .NET.