Managing Profilers
The Cloudera Data Catalog profiler engine runs data profiling operations on data located in multiple data lakes. These profilers create metadata annotations that summarize the content and shape characteristics of the data assets.
Profiler Name | Description |
---|---|
Cluster Sensitivity Profiler | A sensitive data profiler that covers categories such as PII, PCI, and HIPAA. |
Ranger Audit Profiler | A Ranger audit log summarizer. |
Hive Column Profiler | Provides summary statistics such as maximum, minimum, mean, unique, and null values at the Hive column level. |
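
To make the Hive Column Profiler statistics concrete, the following is a minimal Python sketch of the kind of per-column summary the table describes. It is not the Data Catalog implementation: the function name `summarize_column` is hypothetical, and treating "unique" as the count of distinct non-null values is an assumption.

```python
from statistics import mean


def summarize_column(values):
    """Illustrative only: summarize one column held as a Python list.

    None stands in for a NULL cell. "unique" is assumed to mean the
    number of distinct non-null values.
    """
    non_null = [v for v in values if v is not None]
    return {
        "max": max(non_null) if non_null else None,
        "min": min(non_null) if non_null else None,
        "mean": mean(non_null) if non_null else None,
        "unique": len(set(non_null)),
        "nulls": len(values) - len(non_null),
    }


# A small numeric column with one NULL and one repeated value.
stats = summarize_column([3, 7, None, 7, 10])
print(stats)
```

Null cells are excluded before the maximum, minimum, and mean are computed, so a column that contains only nulls reports `None` for those three statistics instead of raising an error.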
Limitations
- In VM-based environments, profilers do not support Iceberg tables. However, Iceberg tables are discoverable. In Compute Cluster enabled environments, Iceberg tables can be profiled.
- In Compute Cluster enabled environments, profilers only support tables that are stored on AWS S3 storage.
- Supported file formats:
  - VM-based environments:
    - CSV
  - Compute Cluster enabled environments:
    - Hive Column Profilers and Cluster Sensitivity Profilers
      - CSV
      - Parquet
      - Iceberg tables
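
The limitations above can be read as a small support matrix. The sketch below encodes them in Python for illustration only; the names `SUPPORTED_FORMATS` and `can_profile` and the string keys (`"vm"`, `"compute_cluster"`, `"s3"`) are hypothetical and are not part of any Cloudera API.

```python
# Hypothetical encoding of the limitations listed above.
# Keys and values are illustrative labels, not product identifiers.
SUPPORTED_FORMATS = {
    "vm": {"csv"},
    "compute_cluster": {"csv", "parquet", "iceberg"},
}


def can_profile(environment, table_format, storage="s3"):
    """Return True if the listed limitations allow profiling the table."""
    formats = SUPPORTED_FORMATS.get(environment, set())
    if table_format.lower() not in formats:
        return False
    # Compute Cluster enabled environments only support tables on AWS S3.
    if environment == "compute_cluster" and storage.lower() != "s3":
        return False
    return True


print(can_profile("vm", "iceberg"))               # Iceberg is discoverable only
print(can_profile("compute_cluster", "iceberg"))  # Iceberg can be profiled
```

The storage check applies only to the Compute Cluster case because the limitations state an S3 requirement for that environment type and give no storage restriction for VM-based environments.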