Apache Parquet is a file format designed to support fast data processing for complex data, with several notable characteristics:

  • Columnar storage - values are stored column by column, which makes reads and compression more efficient than in row-based files such as CSV

  • Open-source 

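  • Self-describing - each file carries its own schema and metadata, so a reader does not need a separate schema definition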
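
To make the columnar point more concrete, here is a toy sketch in plain Python. It is not how Parquet encodes data (real files add row groups, encodings and compression); it only illustrates why summing one field is cheaper when values are grouped by column. The sample records and the helper name to_columns are made up for this illustration.

```python
# Toy illustration only: real Parquet adds row groups, encodings and compression.
# The sample data and helper name below are hypothetical.
rows = [
    {"id": 1, "city": "Oslo", "amount": 10.0},
    {"id": 2, "city": "Lima", "amount": 25.5},
    {"id": 3, "city": "Oslo", "amount": 4.5},
]


def to_columns(records):
    """Pivot a list of row dicts into a dict of column lists.

    Assumes every record has the same keys.
    """
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columns(rows)

# Row layout: every full record is visited to total a single field.
total_from_rows = sum(record["amount"] for record in rows)

# Column layout: only the "amount" values are touched.
total_from_columns = sum(columns["amount"])
```

Both totals are equal; the difference is that the column layout never has to look at id or city to compute it.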

Connectors that support Parquet

All connectors backed by the Spark engine are compatible with Parquet, such as:

Read Parquet files

To read Parquet files, make sure your connector is configured correctly. Two steps are required:

  1. When you create your connector, in the Publish step, select a Parquet folder/file type. See example below:

  2. In the Advanced step, select the correct Parquet file type.
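
Before choosing a Parquet file type in the connector, it can help to confirm that the files really are Parquet and not, for example, CSV files with a misleading extension. The sketch below is a minimal sanity check, assuming plain (unencrypted) Parquet files, which begin and end with the 4-byte marker PAR1. It uses only the Python standard library and is not part of the connector. The function name looks_like_parquet is hypothetical, and passing this check does not prove that a file is valid or readable.

```python
import os

# Marker found at the start and end of a plain Parquet file.
PARQUET_MAGIC = b"PAR1"


def looks_like_parquet(path):
    """Return True if the file starts and ends with the Parquet marker.

    This is a cheap pre-check (hypothetical helper), not full validation.
    """
    # Smallest plausible layout: 4-byte header marker,
    # 4-byte footer length, 4-byte trailing marker.
    if os.path.getsize(path) < 12:
        return False
    with open(path, "rb") as handle:
        head = handle.read(4)
        handle.seek(-4, os.SEEK_END)
        tail = handle.read(4)
    return head == PARQUET_MAGIC and tail == PARQUET_MAGIC
```

Calling looks_like_parquet on a file path returns False for CSV or truncated files, which is a quick hint that the selected file type would not match the data.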