About Elasticsearch

Elasticsearch is an open-source, RESTful, distributed search and analytics engine built on Apache Lucene. Since its release in 2010, Elasticsearch has become one of the most widely used search engines. Common use cases include log analytics, full-text search, security intelligence, business analytics, and operational intelligence.

Features

  • Conduit makes it easy to connect your data to your favorite BI and data science tools, including Power BI. Your Elasticsearch data is approachable and interactive – in a matter of minutes, no matter where it's stored.
  • Data aggregation and JOINs with a familiar SQL query syntax at your fingertips. Native JOIN with other Elasticsearch datasets or hybrid JOIN with other supported connector types.
  • Automatic flattening and schema generation. Cherry-pick flattened data to use only specific columns needed for reporting to speed things up even more.
  • Advanced feature support, including arrays, multi-nested fields with several depth layers and multiple nested fields defined on the same level.
  • Access your data in real time. Conduit allows you to connect in DirectQuery mode rather than Power BI’s standard import mode, which limits the number of data refreshes per day.
  • Advanced Parquet Store cache for fast performance, with configurable expiration and re-caching.
  • Built-in data governance and security controls. Flexible yet robust.

Prerequisites

If you haven’t already done so, be sure to sign up for a Conduit account. Try the power and flexibility of Conduit firsthand with a free trial.

For your Elasticsearch datasource, have the following handy:

  • Datasource URL
  • Service account Username and Password (if applicable)




Create Elasticsearch Connector

Connectors can be created from the main dashboard. To create a new Elasticsearch connector, click the "Add New Connector" button, then select the Elasticsearch connector type to load the wizard for configuring the new connector.

There are a few basic steps to getting an Elasticsearch connector up and running:

  1. Define your datasource
  2. Configure access
  3. Select the data you want to make available via the connector
  4. Configure virtualization and caching options


Datasource

Define your connector name and enter the datasource URL.

  • Connector Name
    • Required
    • Will be used to identify published tables
    • Only lowercase letters, numbers and underscore symbols are allowed
    • Can be changed only before the connector is saved
  • Description
    • Optional field for notes about the connector; visible in Conduit only
    • Can be changed at any point
  • Connection URL
    • Required
    • Can be changed only before the connector is saved
  • Amazon Elasticsearch Service
    • If the provided URL is an Amazon Elasticsearch Service link, Conduit will detect this automatically, along with the AWS Region
    • If your datasource requires it, the AWS Region can be changed at this point
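The naming rule and the Amazon Elasticsearch Service detection described above can be illustrated with a small sketch. This is not Conduit's implementation: the function names are hypothetical, and the endpoint pattern assumes the common `<domain>.<region>.es.amazonaws.com` host shape.

```python
import re
from typing import Optional
from urllib.parse import urlparse

# Rule from the list above: lowercase letters, numbers and underscores only.
CONNECTOR_NAME_RE = re.compile(r"[a-z0-9_]+")

# Assumption: Amazon Elasticsearch Service hosts look like
# <domain>.<region>.es.amazonaws.com
AWS_ES_HOST_RE = re.compile(r"^.+\.([a-z0-9-]+)\.es\.amazonaws\.com$")


def is_valid_connector_name(name: str) -> bool:
    """Hypothetical check mirroring the Connector Name rule."""
    return CONNECTOR_NAME_RE.fullmatch(name) is not None


def detect_aws_region(url: str) -> Optional[str]:
    """Return the AWS Region embedded in the URL, or None if the URL
    does not match the assumed Amazon Elasticsearch Service pattern."""
    host = urlparse(url).hostname or ""
    match = AWS_ES_HOST_RE.match(host)
    return match.group(1) if match else None
```

For example, `is_valid_connector_name("sales_logs")` passes while `"Sales-Logs"` does not, and a self-hosted URL such as `http://localhost:9200` yields no region.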

Click the Next button (blue right arrow) to go to the Authentication tab and continue configuring your connector.

To cancel connector creation, click the Close button.


New connector - Select connector type


Datasource Tab - define connector name and data source

Authentication

Define how external BI users are authorized by Conduit to access specific data and how Conduit connects to the datasource.

  • Select Authentication Method for external users connecting to Conduit:
    •  Anonymous with Impersonation
      • Anyone with the connector link has read access to all tables/data published through the connector
      • BI users are not required to provide any form of credentials
      • Default option
    •  Conduit Authentication with Impersonation
      • Allows Conduit Admins to configure data access only to users from specific Conduit Group(s)
      • BI users are required to provide credentials that are looked up by Conduit in its user database
    •  Active Directory with Impersonation
      • Allows Conduit Admins to configure data access only to users from specific Active Directory Group(s) for a selected User Subscription. Access to the datasource is performed with the credentials configured in Conduit.
    • User Credentials Pass Through 
      • External users are required to provide their own credentials that are used by Conduit directly against the data source


  •  Enter the service account credentials. Conduit uses them to explore and publish the datasource entities while a connector is created or edited and, if an authentication method with impersonation was selected, to execute all runtime queries against the datasource.

    • Username

    • Password

Click the Next button to go to the next tab and continue configuring your connector.

To cancel connector creation, click the Close button.


Authentication Tab

Publish

Select what data will be available to the BI users. Choose to publish one or more tables, specific columns only or entire table(s).

When you navigate to the Publish tab, Conduit flattens JSON object hierarchies into a simple list of field names.
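As a rough illustration of what flattening a JSON object hierarchy means, the sketch below walks a nested document and collects one name per leaf field. The dotted naming scheme and the function name are assumptions made for this example; the field names Conduit actually generates may differ.

```python
from typing import Any, Dict, List


def flatten_field_names(mapping: Dict[str, Any], prefix: str = "") -> List[str]:
    """Return a flat list of field names for every leaf in a nested dict.

    Nested keys are joined with dots (an assumed convention).
    """
    names: List[str] = []
    for key, value in mapping.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict) and value:
            # Descend into nested objects.
            names.extend(flatten_field_names(value, path))
        else:
            names.append(path)
    return names


doc = {"user": {"name": "ann", "geo": {"city": "Oslo", "zip": "0150"}}, "status": 200}
print(flatten_field_names(doc))
# → ['user.name', 'user.geo.city', 'user.geo.zip', 'status']
```

Each entry in the resulting list corresponds to one selectable column, which is what makes pruning on a per-field basis possible.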

The Publish tab provides an interface for pruning tables so that they include only the fields required for analytics, which reduces resource load and improves query times.

Use Search to find specific fields you would like to select. 

Once all the desired fields/tables are selected, you have two options:

  • Save the connector using the default settings:
    • Caching not enabled.
    • Conduit SQL engine for non-native and join queries enabled.
    • Authorization not enabled, hence all Conduit users will have access to the published data.
    • Default fetch, partition and array discovery sample sizes and default query timeout.
  • Continue configuring the connector.

To save the connector, click the Submit button.

To continue configuring connector properties, click the Next button.

To cancel connector creation, click the Close button.


Publish Tab  - pruning 1


Publish tab - Search

Virtualization

On the Virtualization tab you can configure the following:

  • Enable Caching
    • Caching is a component of hardware or software that allows for storage of relevant data for quick future access.
    • Disabled by default.
    • Recommended for large datasets and/or when expensive queries are expected.
    • When enabled, Conduit will do the following:
      • Data Cache on Disk: a temporary Parquet store is created for the connector's dataset. For an Elasticsearch connector, non-native and join queries are run against the Parquet store; all other queries are run against the source.
      • Query Cache: the results of all queries are cached, whether they run natively against the data source or against the Data Cache. The Query Cache expires together with the Data Cache.
    • All tables for a given connector will have cache enabled.
  • Enable Conduit SQL engine for non-native and join queries 
    • Enabled by default.
    • Recommended to keep enabled for Elasticsearch connectors.
    • If unchecked, the reporting tool will throw a message to the analyst and won't run non-native or join queries.
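The way these two options interact can be summarized as a small decision function. This is only a sketch of the behavior described above, with hypothetical names. The case where the SQL engine is enabled but caching is disabled is not spelled out in this guide, so the `SQL_ENGINE` result for it is an assumption.

```python
from enum import Enum


class Target(Enum):
    SOURCE = "source"                # the Elasticsearch datasource itself
    PARQUET_STORE = "parquet_store"  # Data Cache on Disk
    SQL_ENGINE = "sql_engine"        # assumed handling when caching is off
    REJECTED = "rejected"            # query is not run; the BI tool shows a message


def route_query(is_native: bool, caching_enabled: bool, sql_engine_enabled: bool) -> Target:
    """Decide where a query runs, following the rules listed above."""
    if is_native:
        # Queries Elasticsearch can answer natively go to the source.
        return Target.SOURCE
    if not sql_engine_enabled:
        # Non-native and join queries are refused without the SQL engine.
        return Target.REJECTED
    if caching_enabled:
        # Non-native and join queries run against the Parquet store.
        return Target.PARQUET_STORE
    # Assumption: without a cache, the SQL engine handles the query itself.
    return Target.SQL_ENGINE
```

With the defaults (caching off, SQL engine on), a join query would be handled by the SQL engine, while enabling caching moves the same query to the Parquet store.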

Virtualization Tab

Authorization

Configure access for the selected Authentication type.

If you selected the "Conduit Authentication with Impersonation" or "Active Directory with Impersonation" authentication type on the Authentication tab, you can configure here which Conduit Group(s) or Active Directory Group(s) are granted access to the published table(s).

  • By default, Authorization is not enabled, meaning all your Conduit users will have access to all published tables for a given connector.
  • To enforce Authorization, click Enable Authorization.
  • From the group list, select which group(s) are granted access to the connector
    • Access is granted on a table level.
      • If some group(s) need access to certain fields from table A and other group(s) need access to a different set of fields from the same table, create two connectors to pruned versions of table A, one for each permissions case.
    • If Authorization is enabled but no groups are selected, the connector's tables will be accessible to no one.
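The access rules above reduce to a simple check, sketched here with hypothetical names. It models only the behavior stated in this section and is not Conduit's actual authorization code.

```python
from typing import Iterable, Set


def can_access(authorization_enabled: bool,
               allowed_groups: Set[str],
               user_groups: Iterable[str]) -> bool:
    """Return True if a user may read the connector's published tables."""
    if not authorization_enabled:
        # Default: every Conduit user can read all published tables.
        return True
    # With Authorization enabled, the user must belong to at least one
    # selected group; an empty selection therefore locks everyone out.
    return bool(allowed_groups.intersection(user_groups))
```

For instance, `can_access(True, set(), ["analysts"])` is `False`, matching the case where Authorization is enabled but no groups are selected.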

Only Admins are allowed to view and modify the Authorization tab.

The Authentication type and Authorization configuration can be changed at any time. If permissions are revoked, the data will no longer be accessible to the affected external user(s), and the connector to a restricted table will no longer appear in the connector list in BI tools.