Conduit supports the following file systems for storing data source caches (the Parquet store):

  • Azure Blob Storage (abfs) 

  • S3 (s3)

  • Google Cloud Storage (gcs)

  • HDFS (hdfs) 

  • Local file system (file)

The file system configuration can be modified through the env variables in bde-server.env (/etc/bpcs/docker/bde-server.env).

The following describes the steps required to configure the supported file systems. If a configuration already exists and needs to be modified, all env variables in bde-server.env that start with FS_ (found at the bottom of the file) must be deleted first. The previous Hive metastore volume must also be removed with the following command (the container must be stopped first):

docker volume rm docker_hive_metastore_volume
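
A minimal sketch of the full reset sequence follows. It assumes the containers are managed with docker compose (or docker-compose, depending on the Docker version) from /etc/bpcs/docker and that the volume name matches the one above; adjust both to your deployment:

cd /etc/bpcs/docker                              # directory containing bde-server.env
docker compose stop                              # the container must be stopped first
docker volume rm docker_hive_metastore_volume    # remove the previous hive metastore volume
# edit bde-server.env: delete all FS_ variables, then add the new configuration
docker compose up -d                             # restart with the new file system configuration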


A. Azure Blob Storage file system configuration

The configuration of this file system can be done in 2 ways, depending on the type of authentication used:

A.1 Access key authentication

  • More information on generating access keys can be found here.

  • The storage account must have hierarchical namespace enabled.

  • The following configuration must be added to bde-server.env:

FS_TYPE=abfs

FS_ABFS_STORAGE_ACCOUNT={ Azure Blob storage account }

FS_ABFS_CONTAINER={ Azure Blob container }

FS_ABFS_ACCESS_KEY={ Azure Blob access key }

FS_DEFAULTFS=abfs://{ Azure Blob container }@{ Azure Blob storage account }.dfs.core.windows.net
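
To sanity-check these values before restarting, the Azure CLI can be used; a hedged example (az is assumed to be installed, and my_storage_account / my_container / my_access_key stand in for your values):

az storage account show --name my_storage_account --query isHnsEnabled    # should print true (hierarchical namespace)
az storage container show --account-name my_storage_account --name my_container --account-key my_access_key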

A.2 Azure Managed Identity authentication

  • More information about this type of authentication can be found here.

  • It is important that the resource using the managed identity has the following role assigned: Storage Blob Data Contributor (a role-assignment sketch follows the configuration block below).

  • The storage account must have hierarchical namespace enabled.

  • The following configuration must be added to bde-server.env:

FS_TYPE=abfs

FS_ABFS_STORAGE_ACCOUNT={ Azure Blob storage account }

FS_ABFS_CONTAINER={ Azure Blob container }

FS_DEFAULTFS=abfs://{ Azure Blob container }@{ Azure Blob storage account }.dfs.core.windows.net
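
A hedged sketch of granting that role with the Azure CLI; the values in braces are placeholders that must be substituted:

az role assignment create \
    --assignee { managed identity principal ID } \
    --role "Storage Blob Data Contributor" \
    --scope { resource ID of the storage account }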


B. S3 file system configuration

The configuration of this file system can be done in 2 ways, depending on the type of authentication used:

B.1 Access Key authentication

  • More information on access key generation can be found here.

  • It is important for the service account used to have the following policy attached: AmazonS3FullAccess

  • The following configuration must be added to bde-server.env:

FS_TYPE=s3

FS_AWS_BUCKET={ S3 bucket name }

FS_AWS_ACCESS_KEY={ S3 bucket access key }

FS_AWS_SECRET_KEY={ S3 bucket secret key }

FS_DEFAULTFS=s3a://{ S3 bucket name }
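
To verify the keys before restarting, a minimal sketch using the AWS CLI (aws is assumed to be installed; the values are placeholders):

export AWS_ACCESS_KEY_ID=my_access_key
export AWS_SECRET_ACCESS_KEY=my_secret_key
aws s3 ls s3://my_bucket    # should list the bucket contents without an access error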


B.2 IAM metadata authentication

  • More information about this type of authentication can be found here.

  • It is important for the service account used to have the following policy attached: AmazonS3FullAccess

  • The following configuration must be added to bde-server.env:

FS_TYPE=s3

FS_AWS_BUCKET={ S3 bucket name }

FS_DEFAULTFS=s3a://{ S3 bucket name }
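
With IAM metadata authentication no keys are stored in bde-server.env; the credentials come from the role attached to the EC2 instance. A hedged check, run from the instance itself (169.254.169.254 is the standard instance metadata endpoint; if IMDSv2 is enforced, a session token must be requested first):

curl http://169.254.169.254/latest/meta-data/iam/security-credentials/    # prints the name of the attached IAM role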


C. GCS file system configuration

The configuration of this file system can be done in 2 ways, depending on the type of authentication used:

C.1 File credential authentication (using a service account JSON keyfile)

  • More information about this type of authentication can be found here.

  • It is important for the service account used to have the following role: Storage Admin

  • The following configuration must be added to bde-server.env:

FS_TYPE=gcs

FS_GCS_BUCKET={ GCS bucket name }

FS_GCS_PROJECT_ID={ GCS project id }

FS_GCS_SERVICE_ACCOUNT_KEYFILE={ path to the GCS JSON keyfile }

FS_DEFAULTFS=gs://{ GCS bucket name }

  • If the configuration is new or the keyfile needs to be changed, the new file must be placed in the following directory: /etc/bpcs/docker/conduit/gcs/keyfile/
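
To confirm the keyfile grants access before restarting, a hedged example using the Google Cloud CLI (gcloud and gsutil are assumed to be installed; the file name and bucket are placeholders):

gcloud auth activate-service-account --key-file=/etc/bpcs/docker/conduit/gcs/keyfile/my_keyfile.json
gsutil ls gs://my_bucket    # should list the bucket contents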

C.2 IAM metadata authentication

  • More information about this type of authentication can be found here.

  • It is important for the service account used to have the following role: Storage Admin

  • The following configuration must be added to bde-server.env:

FS_TYPE=gcs

FS_GCS_BUCKET={ GCS bucket name }

FS_GCS_PROJECT_ID={ GCS project id }

FS_DEFAULTFS=gs://{ GCS bucket name }
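
With IAM metadata authentication no keyfile is configured; the service account attached to the GCE instance is used. A hedged check, run from the instance itself:

gcloud auth list            # shows the instance's active service account
gsutil ls gs://my_bucket    # should succeed using the instance credentials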


D. HDFS file system configuration

The following configuration must be added to bde-server.env:

FS_TYPE=hdfs

FS_DEFAULTFS=hdfs://{ spark_host }:9000
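
To verify that the NameNode is reachable at that address, a hedged check from a host with the Hadoop client installed (replace spark_host with the actual host):

hdfs dfs -ls hdfs://spark_host:9000/    # should list the HDFS root directory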


E. Local file system configuration

The following configuration must be added to bde-server.env:

FS_TYPE=file

FS_DEFAULTFS=file:///


Example of Parquet store file system configurations


Azure Blob Storage

FS_TYPE=abfs
FS_ABFS_STORAGE_ACCOUNT=my_storage_account
FS_ABFS_CONTAINER=my_container
FS_ABFS_ACCESS_KEY=my_access_key
FS_DEFAULTFS=abfs://my_container@my_storage_account.dfs.core.windows.net

S3

FS_TYPE=s3
FS_AWS_BUCKET=my_bucket
FS_AWS_ACCESS_KEY=my_access_key
FS_AWS_SECRET_KEY=my_secret_key
FS_DEFAULTFS=s3a://my_bucket

Google Cloud Storage

FS_TYPE=gcs
FS_GCS_BUCKET=my_bucket
FS_GCS_PROJECT_ID=my_project_id
FS_GCS_SERVICE_ACCOUNT_KEYFILE=/etc/bpcs/docker/conduit/gcs/keyfile/my_keyfile.json
FS_DEFAULTFS=gs://my_bucket

HDFS

FS_TYPE=hdfs
FS_DEFAULTFS=hdfs://10.1.8.4:9000/

Local file system

FS_TYPE=file
FS_DEFAULTFS=file:///