Conduit supports multiple deployment architectures. It is designed to scale from a single box up to multiple servers deployed across different data centers for high availability.


Conduit can also be interfaced with different scalable processing engines, e.g. Spark, BlazingSQL with GPUs, or Databricks clusters.


Throughout this article, HA stands for High Availability, i.e. Conduit running in High Availability mode.




Conduit can be run in High Availability mode in the following environments:

  • on-premise
  • Google Cloud
  • Azure

Please contact us for support regarding installing and running Conduit in High Availability mode.


Deployment types

1. Single Box Deployment

The Data Store is configured to use the local file system on the Conduit VM.


2. SingleBox + Spark/HDFS cluster attached


3. HA + Cloud Storage

- 3+ VMs for Conduit services

- one node hosts the Spark driver, selected by leader election

- all nodes are Spark workers

- Spark master, with master failover, on at least 2 nodes

- storage in the cloud
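
The constraints listed above can be checked mechanically before a rollout. The following is a minimal sketch only: the `Node` class and `check_ha_topology` function are hypothetical names invented for illustration and are not part of Conduit or Spark.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical description of one Conduit VM and the Spark roles it hosts."""
    name: str
    spark_worker: bool = True
    spark_master: bool = False
    spark_driver: bool = False


def check_ha_topology(nodes):
    """Return the list of violated constraints (an empty list means the layout is valid)."""
    problems = []
    if len(nodes) < 3:
        problems.append("need at least 3 VMs for Conduit services")
    if sum(n.spark_driver for n in nodes) != 1:
        problems.append("exactly one node must host the Spark driver")
    if not all(n.spark_worker for n in nodes):
        problems.append("every node must be a Spark worker")
    if sum(n.spark_master for n in nodes) < 2:
        problems.append("Spark master needs at least 2 nodes for failover")
    return problems


# Example layout (assumed VM names): three VMs, two master-capable, one driver.
layout = [
    Node("vm1", spark_master=True, spark_driver=True),
    Node("vm2", spark_master=True),
    Node("vm3"),
]
print(check_ha_topology(layout))  # prints []
```

Dropping one VM from the example layout would make the check report the "at least 3 VMs" violation.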


4. HA + HDFS

- 3+ VMs for Conduit services

- one node hosts the Spark driver, selected by leader election

- all nodes are Spark workers

- Spark master, with master failover, on at least 2 nodes

- storage on HDFS deployed on all 3 nodes (the only option on-premise)

    • every node runs an HDFS DN (HDFS Data Node)

    • one of the VMs runs the HDFS NN (HDFS Name Node)

    • one of the VMs runs the HDFS Stby (standby Name Node); this must be a different VM from the one running the HDFS NN

    • each VM has a bounded area containing its HDFS components
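
The HDFS role placement above can be sketched as a small helper. This is an illustrative assumption, not Conduit tooling: the function name `assign_hdfs_roles`, the role labels, and the choice of the first two VMs for the Name Node and standby are all hypothetical.

```python
def assign_hdfs_roles(vms):
    """Map each VM name to its HDFS roles, following the placement rules above."""
    if len(vms) < 3:
        raise ValueError("HA + HDFS needs at least 3 VMs")
    if len(set(vms)) != len(vms):
        raise ValueError("VM names must be unique")
    roles = {vm: ["DataNode"] for vm in vms}  # a Data Node on every VM
    roles[vms[0]].append("NameNode")          # the Name Node on one VM
    roles[vms[1]].append("StandbyNameNode")   # the standby on a different VM
    return roles


# Assumed VM names, for illustration only.
for vm, vm_roles in assign_hdfs_roles(["vm1", "vm2", "vm3"]).items():
    print(vm, vm_roles)
```

Placing the standby on a different VM from the Name Node is what allows the file system to survive the loss of either of those two machines.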


5. HA + Spark/HDFS cluster attached

- only one Spark driver runs in the Conduit VM farm, selected by leader election
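
This article does not say which leader-election mechanism Conduit uses. Purely to illustrate the idea of a single driver that moves when its host fails, here is a toy sketch under the assumption of a deterministic rule (the lowest-named healthy VM wins); `elect_driver` and the VM names are hypothetical.

```python
def elect_driver(healthy_nodes):
    """Pick the single Spark driver host: the lowest-named healthy node (toy rule)."""
    if not healthy_nodes:
        raise RuntimeError("no healthy node available to host the Spark driver")
    return min(healthy_nodes)


nodes = {"vm1", "vm2", "vm3"}
leader = elect_driver(nodes)      # "vm1" hosts the driver
nodes.discard(leader)             # simulate the leader VM failing
new_leader = elect_driver(nodes)  # "vm2" takes over
print(leader, new_leader)         # prints: vm1 vm2
```

A production setup would normally delegate this decision to a coordination service rather than a local rule, so that all nodes agree on the same leader.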


6. SingleBox + Cloud Storage (S3, AZBS, GCS)


7. HA + Spark cluster + Cloud Storage


8. Conduit HA + BlazingSQL cluster + Cloud Storage / HDFS


9. Conduit HA + Databricks cluster + Cloud Storage


10. Conduit HA + Cloudera CDH


Use cases

On-premise to cloud migration platform

