• Version 1.19.0

High Availability

High Availability support in the OnixS .NET FIX Engine addresses scenarios where the engine must be restarted on a different node due to failure. The objective is to preserve and reuse session recovery data after such relocation.

To enable this, FIX Engine execution is separated from the storage that maintains session state and message history.

The following sections briefly describe two storage approaches that can be used to organize High Availability deployments:

  • Database-based storage
  • Shared file system-based storage

Important: For any High Availability deployment, only one FIX Engine instance may access a given session recovery storage at any time. Active-active deployments against the same storage are not supported.

Database-Based Storage Scenario

This approach is illustrated by the Database Session Storage sample.

The sample demonstrates externalizing FIX session recovery data to persistent storage: session state and message history are kept in network-accessible storage located outside the engine node.

The example illustrates the architectural principle and does not implement built-in clustering, coordination, or locking mechanisms.

The external storage contains:

  • Sent and received FIX messages (used for resend and recovery)
  • Communication session state

These elements correspond to the traditional file-based persistence files:

  • STORAGE_ID.summary files (message history for resend/recovery)
  • STORAGE_ID.state files (session state)

In the sample, these mechanisms are functionally reproduced with an SQLite database, but any suitable database can be used. The database may be deployed on remote or fault-tolerant infrastructure, allowing the FIX Engine session recovery data to be restored when the engine starts on another node.
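As a rough illustration of this idea, the sketch below mirrors the two file-based artifacts in database tables: one for the message history (the `.summary` role, replayed on resend requests) and one for the session state (the `.state` role). All table, column, and function names here are hypothetical, not the OnixS sample's actual schema, and real deployments would add durability settings appropriate to their database.

```python
import sqlite3

# Hypothetical schema sketch: 'message_history' plays the role of the
# STORAGE_ID.summary files, 'session_state' the role of STORAGE_ID.state.
SCHEMA = """
CREATE TABLE IF NOT EXISTS message_history (
    session_id  TEXT    NOT NULL,   -- e.g. 'SENDER-TARGET'
    direction   TEXT    NOT NULL,   -- 'out' messages are replayed on resend
    seq_number  INTEGER NOT NULL,
    raw_message BLOB    NOT NULL,
    PRIMARY KEY (session_id, direction, seq_number)
);
CREATE TABLE IF NOT EXISTS session_state (
    session_id     TEXT PRIMARY KEY,
    out_seq_number INTEGER NOT NULL,
    in_seq_number  INTEGER NOT NULL
);
"""

def open_storage(path):
    """Open (or create) the recovery database, e.g. on network storage."""
    con = sqlite3.connect(path)
    con.executescript(SCHEMA)
    return con

def store_outgoing(con, session_id, seq, raw):
    """Persist an outgoing message and advance the stored sequence number."""
    with con:  # one transaction, so history and state stay consistent
        con.execute(
            "INSERT OR REPLACE INTO message_history VALUES (?, 'out', ?, ?)",
            (session_id, seq, raw))
        con.execute(
            "INSERT INTO session_state VALUES (?, ?, 0) "
            "ON CONFLICT(session_id) DO UPDATE "
            "SET out_seq_number = excluded.out_seq_number",
            (session_id, seq))

def recover(con, session_id, begin, end):
    """Fetch outgoing messages covering a resend request range [begin, end]."""
    rows = con.execute(
        "SELECT seq_number, raw_message FROM message_history "
        "WHERE session_id = ? AND direction = 'out' "
        "AND seq_number BETWEEN ? AND ? ORDER BY seq_number",
        (session_id, begin, end))
    return list(rows)
```

Because both history and state live in the database rather than on the local disk, an engine started on a different node can reconnect to the same database and resume from the persisted sequence numbers.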

Diagnostic and application log files are not part of this mechanism. They are always stored locally in the file system and remain independent of session recovery storage.

This scenario assumes that storage access is controlled externally to ensure that only one FIX Engine instance is active against the database at any given time.

Shared File System-Based Failover Pattern

A commonly used deployment pattern for active-passive failover relies on a shared file system, such as NFS, SMB/CIFS, Amazon EFS, Azure Files, Google Filestore, CephFS, GlusterFS, or other compatible network file storage solutions.

In this pattern, both nodes use the same file-based storage path, but only one FIX Engine instance is active at a time.

Important: Shared storage does not provide FIX Engine instance coordination on its own. You must enforce active-passive behaviour externally (for example, by orchestration fencing/locking) so that only one engine instance can access the same STORAGE_ID.summary and STORAGE_ID.state files at any time. Without external fencing, concurrent writers may corrupt the recovery state and break the resend behaviour.

If the active node fails, the passive instance may be started and reuse the existing file-based session recovery data, including:

  • STORAGE_ID.summary files
  • STORAGE_ID.state files

This section outlines a general failover approach based on shared file storage. Specific mechanisms for controlling engine startup and ensuring that only one instance is active at a time are typically implemented as part of the surrounding infrastructure or operational setup.

See Also

  • Pluggable Session Storage
  • The PluggableSessionStorage sample
  • The Database Session Storage sample
Copyright © Onix Solutions.