Managing Profilers

The Cloudera Data Catalog profiler engine runs data profiling operations on data located in multiple data lakes. These profilers create metadata annotations that summarize the content and shape characteristics of the data assets.

Table 1. List of built-in profilers

Profiler Name | Description
Cluster Sensitivity Profiler | A sensitive data profiler that identifies data such as PII, PCI, and HIPAA data.
Ranger Audit Profiler | A Ranger audit log summarizer.
Hive Column Profiler | Provides summary statistics, such as maximum, minimum, mean, unique, and null values, at the Hive column level.
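To illustrate the kind of per-column summary the Hive Column Profiler reports, the following is a minimal Python sketch. It is not the profiler's implementation. The names `ColumnSummary` and `summarize_column` are hypothetical. It assumes that "Unique" means the count of distinct non-null values, and that maximum, minimum, and mean are computed over non-null numeric values only.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class ColumnSummary:
    """Hypothetical container for one column's summary statistics."""
    maximum: Optional[float]
    minimum: Optional[float]
    mean: Optional[float]
    unique: int  # assumption: number of distinct non-null values
    nulls: int   # number of null (None) entries


def summarize_column(values: Sequence[Optional[float]]) -> ColumnSummary:
    """Compute max, min, mean, distinct count, and null count for one column."""
    non_null = [v for v in values if v is not None]
    nulls = len(values) - len(non_null)
    if not non_null:
        # An all-null column has no max, min, or mean.
        return ColumnSummary(None, None, None, 0, nulls)
    return ColumnSummary(
        maximum=max(non_null),
        minimum=min(non_null),
        mean=sum(non_null) / len(non_null),
        unique=len(set(non_null)),
        nulls=nulls,
    )
```

For example, `summarize_column([3, 1, None, 3, 5])` yields a maximum of 5, a minimum of 1, a mean of 3.0, 3 unique values, and 1 null.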

Limitations

  • In VM-based environments, profilers do not support Iceberg tables, although Iceberg tables are still discoverable. In Compute Cluster enabled environments, Iceberg tables can be profiled.
  • In Compute Cluster enabled environments, profilers support only tables stored on AWS S3.
  • Supported file and table formats:
    • VM-based environments:
      • CSV
    • Compute Cluster enabled environments:
      • Hive Column Profilers and Cluster Sensitivity Profilers
        • CSV
        • Parquet
        • Iceberg tables
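The limitations above combine environment type, storage location, and format. The following minimal Python sketch encodes them as a lookup so the combinations are easier to check. It is illustrative only and is not a Cloudera API. The names `Environment`, `SUPPORTED_FORMATS`, and `can_profile` are hypothetical. The sketch is scoped to table-profiling cases (the Compute Cluster entries cover only the Hive Column and Cluster Sensitivity profilers). It also assumes that no storage restriction applies in VM-based environments, because none is listed.

```python
from enum import Enum


class Environment(Enum):
    """Hypothetical labels for the two environment types described above."""
    VM_BASED = "vm-based"
    COMPUTE_CLUSTER = "compute-cluster"


# Formats taken from the Limitations list. Compute Cluster entries apply to the
# Hive Column and Cluster Sensitivity profilers only.
SUPPORTED_FORMATS = {
    Environment.VM_BASED: {"csv"},
    Environment.COMPUTE_CLUSTER: {"csv", "parquet", "iceberg"},
}


def can_profile(env: Environment, fmt: str, on_s3: bool) -> bool:
    """Return True if a table in this format and location can be profiled."""
    # Compute Cluster enabled environments support only tables stored on AWS S3.
    if env is Environment.COMPUTE_CLUSTER and not on_s3:
        return False
    # Iceberg is absent from the VM-based set, so it is rejected there.
    return fmt.lower() in SUPPORTED_FORMATS[env]
```

For example, a Parquet table on S3 can be profiled in a Compute Cluster enabled environment but not in a VM-based one.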