About Azure Blob Storage

Azure Storage is Microsoft's cloud storage solution for modern data storage scenarios. Azure Storage offers a massively scalable object store for data objects, a file system service for the cloud, a messaging store for reliable messaging, and a NoSQL store.

Features

  • Conduit makes it easy to connect your data to your favorite BI and data science tools, including Power BI. Your data becomes approachable and interactive in a matter of minutes, no matter where it's stored.
  • Data aggregation and JOINs with a familiar SQL query syntax at your fingertips.
  • Range of supported file types.
  • Dynamic folder and Hive-Compatible folder modes.
  • Access your data in real time. Conduit lets you connect in DirectQuery mode instead of Power BI's standard import mode, which limits the number of data refreshes per day.
  • Advanced Parquet Store cache for fast performance, with configurable expiration and re-caching.
  • Built-in data governance and security controls. Flexible yet robust.

Prerequisites

If you haven’t already done so, be sure to sign up for a Conduit account.  Try the power and flexibility of Conduit firsthand with a free trial.

For your Azure Blob Storage connector, have the following handy:

  • Storage account name
  • Container name
  • Access key




Create Connector

Connectors can be created from the main dashboard. To create a new connector, click the "Add New Connector" button, then select the desired connector type to load the wizard for configuring the new connector.

There are a few basic steps to getting any connector up and running:

  1. Define your datasource
  2. Configure access
  3. Select the data you want to make available via the connector
  4. Configure virtualization and caching options


Datasource

Define your connector name and connection URL.

  • Connector Name
    • Required
    • Will be used to identify published tables
    • Only lowercase letters, numbers and underscore symbols are allowed
    • Can be changed only before the connector is saved
  • Description
    • Optional field for notes about the connector; visible in Conduit only
    • Can be changed at any point
  • Storage Account Name
    • Required
    • Can be changed only before the connector is saved
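
Because the connector name cannot be changed after the connector is saved, it can help to check a candidate name against the naming rule before entering it. The following is a minimal sketch of that rule only; the function name and pattern are illustrative assumptions, not part of Conduit, and Conduit may apply additional checks (such as a length limit) that are not described here.

```python
import re

# Assumed reading of the rule above: one or more lowercase ASCII letters,
# digits, or underscores, and nothing else.
CONNECTOR_NAME_PATTERN = re.compile(r"[a-z0-9_]+")


def is_valid_connector_name(name: str) -> bool:
    """Return True if the name uses only lowercase letters, digits, and underscores."""
    return CONNECTOR_NAME_PATTERN.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ["sales_2018", "Sales", "sales-2018", ""]:
        print(repr(candidate), is_valid_connector_name(candidate))
```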

Conduit generates the URL used to connect to Azure Blob Storage. The URL is read-only and changes if you modify the storage account name or container name.
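
This guide does not show the exact URL that Conduit generates. As a rough illustration of how such a URL depends on the two names, the sketch below builds the standard public Azure Blob Storage container endpoint. The function name is hypothetical, and Conduit's generated URL may use a different scheme or layout.

```python
def blob_container_url(storage_account: str, container: str) -> str:
    """Build the standard Azure Blob Storage endpoint URL for a container.

    Assumption: the public Azure cloud endpoint suffix (blob.core.windows.net).
    Conduit's own generated URL is not documented here and may differ.
    """
    return f"https://{storage_account}.blob.core.windows.net/{container}"


if __name__ == "__main__":
    # Changing either name changes the resulting URL.
    print(blob_container_url("myaccount", "reports"))
```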

Click the Next button (blue right arrow) to go to the Authentication tab and continue configuring your connector.

To cancel connector creation, click the Close button.




Authentication

Define how Conduit authorizes external BI users to access specific data and how Conduit connects to the datasource.

  1. Select Authentication Method for external users connecting to Conduit:
    1.  Anonymous with Impersonation
      1. Anyone with the connector link has read access to all tables/data published through the connector
      2. BI users are not required to provide any form of credentials
      3. Default option
    2.  Conduit Authentication with Impersonation
      1. Allows Conduit Admins to configure data access only to users from specific Conduit Group(s)
      2. BI users are required to provide credentials that are looked up by Conduit in its user database
    3.  Active Directory with Impersonation
      1. Allows Conduit Admins to configure data access only to users from specific Active Directory Group(s) for a selected User Subscription. Access to the datasource itself is performed with the credentials configured in Conduit.
  2.  Enter the service account credentials to be used by Conduit to execute all runtime queries against the datasource

    1. Access Key

Click the Next button to go to the next tab and continue configuring your connector.

To cancel connector creation, click the Close button.



Publish

Select what data will be available to the BI users. Choose to publish one or more files and/or folders.

On the Publish tab, individual files and/or folders can be selected for publishing.

  • To explore the folder structure, click the black arrow(s) to expand the datasource node(s).
  • Use Search to find specific items you would like to select. Note that search only finds items on expanded nodes.
  • Selecting several files in the same folder with the same schema and file type results in a single table with all the files appended.
    • The name of the closest parent folder is used to identify the table.
  • Selecting an entire folder (or subfolder) indicates that the selection should be treated in "folder mode", so the source folder can be configured as a Dynamic or Hive-Compatible folder.
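
The append rule for multiple files can be pictured as grouping the selected paths by folder and file type. The sketch below is only an illustration of that idea under stated assumptions: the function names are hypothetical, the file extension stands in for the file type, and the schema comparison that Conduit also requires is not modeled.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def group_selected_files(paths):
    """Group selected blob paths by (parent folder, file extension).

    Each group corresponds to one candidate table built by appending its files.
    Schema equality is NOT checked in this sketch.
    """
    groups = defaultdict(list)
    for raw_path in paths:
        path = PurePosixPath(raw_path)
        groups[(str(path.parent), path.suffix.lower())].append(raw_path)
    return dict(groups)


def table_name_for(folder: str) -> str:
    """Use the closest parent folder name to identify the appended table."""
    return PurePosixPath(folder).name


if __name__ == "__main__":
    selected = ["sales/2018/jan.csv", "sales/2018/feb.csv", "sales/2018/notes.json"]
    for (folder, file_type), files in group_selected_files(selected).items():
        print(table_name_for(folder), file_type, files)
```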

Once all the desired files and/or folders are selected, the user has 2 options:

  1. Save the connector using the default settings:
    1. Caching not enabled
    2. Authorization not enabled, hence all Conduit users will have access to the published data
    3. Default Advanced tab settings
  2. Continue configuring the connector.

To save the connector, click the Submit button.

To continue configuring connector properties, click the Next button.

To cancel connector creation, click the Close button.





Virtualization

On the Virtualization tab you can enable caching:

  • Caching stores frequently used data so that later requests can be served faster.
  • Disabled by default.
  • Recommended for large datasets and/or when expensive queries are expected.
  • When enabled, Conduit will do the following:
    • Data Cache on Disk: a temporary Parquet store is created for the connector's dataset, and all queries are run against that Parquet store.
    • Query Cache: stores the results of all queries run against the Data Cache. The Query Cache expires together with the Data Cache.
  • All tables for a given connector will have caching enabled.
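
The relationship between the two caches can be sketched as a query-result cache whose lifetime is tied to a data cache with a fixed expiration. This is a simplified model, not Conduit's implementation: the class name, the `execute` callback, and the injectable clock are assumptions made for illustration, and the 30-minute default mirrors the cache expiration default described on the Advanced tab.

```python
import time


class ConnectorCache:
    """Toy model: query results live only as long as the data cache they came from."""

    def __init__(self, ttl_seconds=30 * 60, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self._clock = clock            # injectable so expiry can be simulated
        self._created_at = None        # None means the data cache does not exist yet
        self._query_results = {}

    def _data_cache_expired(self):
        if self._created_at is None:
            return True
        return self._clock() - self._created_at >= self.ttl_seconds

    def run(self, sql, execute):
        """Return a cached result for sql, rebuilding the caches after expiry."""
        if self._data_cache_expired():
            # Re-create the data cache on the next query; query results expire with it.
            self._query_results.clear()
            self._created_at = self._clock()
        if sql not in self._query_results:
            self._query_results[sql] = execute(sql)
        return self._query_results[sql]
```

Repeating the same query within the expiration window returns the stored result without calling `execute` again; the first query after expiration rebuilds both caches.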

The Conduit SQL query engine is enabled by default for Azure Blob Storage because it is needed to parse the SQL queries generated by BI tools.



Authorization

Configure access for the selected Authentication type.

If you selected the "Conduit Authentication with Impersonation" or "Active Directory with Impersonation" authentication type on the Authentication tab, you can configure here which Conduit Group(s) or Active Directory Group(s) are granted access to the published table(s).

  • By default, Authorization is not enabled, meaning all your Conduit users have access to all published tables for a given connector.
  • To enforce Authorization, click Enable Authorization.
  • From the group list, select which group(s) should be granted access to the connector.
    • Access is granted at the table level.
      • If some group(s) need access to certain fields from table A, and other group(s) need access to a different set of fields from the same table A, create two connectors to pruned versions of table A, one for each permissions case.
    • If Authorization is enabled but no groups are selected, the connector's tables will be accessible to no one.
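
The rules above reduce to a small decision function. The sketch below is an illustrative restatement only; the function and parameter names are assumptions and do not correspond to a Conduit API.

```python
def can_access_table(authorization_enabled, allowed_groups, user_groups):
    """Decide table access following the rules described above.

    - Authorization disabled: every Conduit user has access.
    - Authorization enabled with no groups selected: nobody has access.
    - Otherwise: the user needs membership in at least one selected group.
    """
    if not authorization_enabled:
        return True
    return bool(set(allowed_groups) & set(user_groups))


if __name__ == "__main__":
    print(can_access_table(False, [], ["analysts"]))                 # authorization off
    print(can_access_table(True, [], ["analysts"]))                  # no groups selected
    print(can_access_table(True, ["finance"], ["analysts"]))         # no shared group
    print(can_access_table(True, ["finance", "analysts"], ["analysts"]))
```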

Only Admins are allowed to view and modify the Authorization tab.

The Authentication type and Authorization configuration can be changed at any time. If permissions are revoked, the data will no longer be accessible to the affected external user(s), and the restricted table will no longer appear in the connector list in BI tools.



Advanced

Fine-tune how your selections should be published.

For each table the following can be configured:

  • Alias
    • A user-friendly table name to be used to identify published tables by external users.
    • Optional. If not specified, the real file name or the immediate parent folder name is used for identification.
  • File Type
    • File type of the file (or files, if these are expected to be appended into one table)
      • If the file type is CSV, TSV, PDV, CDV, or SCDV, a First Row Header option is added and is checked by default.
  • Folder options - available if an entire folder has been selected 
    • Static
      • Conduit builds a static list of files at connector setup time; files added later are ignored at query time.
    • Dynamic Folder
      • Conduit recursively traverses the folder structure and builds a list of all files in the folder tree at query time, so new files arriving in the folder structure are always included in the query.
    • Hive-Compatible Folder Layout
      • Conduit reads flat files from cloud storage. If the files are grouped in folders named in the "fieldName=fieldValue" format (for example date=2018-12-09, date=2018-12-10, etc.), Spark can read this "Hive-compatible" folder layout with predicate pushdown for queries involving these fields, which can significantly speed up filtering and aggregation queries.
  • Cache options - available if cache has been enabled for the connector on Virtualization tab
    • Caching Expiration
      • Default cache expiration time is 30 minutes.
      • After expiration, the cache is re-created the next time a query is run.
    • Cache Now
      • Initiates caching of the data source when the connector is saved, to avoid waiting for the cache on the first query.
    • Memory and Disk
      • The connector's tables are stored in the Spark cache on disk and also in memory for better performance.
    • Disk Only
      • The connector's tables are stored in the Parquet Store on disk.
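
To make the Hive-Compatible Folder Layout option more concrete, the sketch below shows how "fieldName=fieldValue" folder names can be read as partition values and used to skip files that cannot match a filter. It is a plain-Python illustration of the idea, not how Conduit or Spark implement it. The function names and example paths are hypothetical, and the dates use hyphens because "/" acts as a path separator in blob names.

```python
def parse_partitions(blob_path: str) -> dict:
    """Extract fieldName=fieldValue pairs from the folder segments of a blob path."""
    partitions = {}
    for segment in blob_path.split("/")[:-1]:  # skip the file name itself
        name, separator, value = segment.partition("=")
        if separator and name:
            partitions[name] = value
    return partitions


def prune_by_partition(paths, **filters):
    """Keep only files whose partition values match every given filter."""
    return [
        path
        for path in paths
        if all(parse_partitions(path).get(field) == value for field, value in filters.items())
    ]


if __name__ == "__main__":
    files = [
        "events/date=2018-12-09/part-0.csv",
        "events/date=2018-12-10/part-0.csv",
    ]
    # A filter on the partition field only needs to read the matching folder.
    print(prune_by_partition(files, date="2018-12-10"))
```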

Endpoints

This page contains the endpoints for the newly created connector that you can use to access the data from different applications:

  • JDBC/ODBC/Thrift Endpoint - to connect to dataset(s) defined on the connector from various BI and data science tools.
  • Power BI Spark Connector - to connect to dataset(s) defined on the connector from Power BI.
  • Tableau Spark Connector - to connect to dataset(s) defined on the connector from Tableau.