What's new

This section lists major features and updates for the Cloudera Data Catalog service.

October 17, 2025

This release (3.1.3) of the Cloudera Data Catalog service introduces the following new changes:

The Data Compliance Profiler and the Statistics Collector Profiler support incremental profiling, which reduces the time and compute resources required by repeated profiling jobs in Compute Cluster enabled environments.
For more information, see:
- Configuring the Statistics Collector Profiler
- Configuring the Data Compliance Profiler
The new Asset Filtering Rules tab in Job Summary shows the relevant Allow and Deny list rules for each Data Compliance and Statistics Collector Profiler job.
Bug fixes and improvements.

Apache Parquet CVE-2025-30065

Cloudera released Cloudera Data Catalog 3.1.3 for Cloudera Data Services on cloud to address a critical vulnerability in the parquet-avro module of Apache Parquet.

Background:

On April 1, 2025, a critical vulnerability in the parquet-avro module of Apache Parquet (CVE-2025-30065, CVSS score 10.0) was announced.

Cloudera has determined the list of affected products, and is issuing this TSB to provide details of remediation for affected versions.

Upgraded versions are being released for all currently affected supported releases of Cloudera products. Customers using older versions are advised to upgrade to a supported release that has the remediation, once it becomes available.

Vulnerability Details:

Exploiting this vulnerability is only possible by modifying the accepted schema used for translating Parquet files and subsequently submitting a specifically crafted malicious file.

CVE-2025-30065 | Schema parsing in the parquet-avro module of Apache Parquet 1.15.0 and previous versions allows bad actors to execute arbitrary code.

CVE: NVD - CVE-2025-30065

Severity (Critical):

CVSS:4.0/AV:N/AC:L/AT:N/PR:N/UI:N/VC:H/VI:H/VA:H/SC:H/SI:H/SA:H

Impact:

Schema parsing in the parquet-avro module of Apache Parquet 1.15.0 and previous versions allows bad actors to execute arbitrary code. Attackers may be able to modify unexpected objects or data that was assumed to be safe from modification. Deserialized data or code could be modified without using the provided accessor functions, or unexpected functions could be invoked.

Deserialization vulnerabilities most commonly lead to undefined behavior, such as memory modification or remote code execution.
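As a rough illustration of the affected range stated above (Apache Parquet 1.15.0 and previous versions), the following sketch checks an upstream version string against that boundary. The helper names are hypothetical and are not part of any Cloudera or Apache Parquet tooling. The check assumes plain upstream version strings; vendor builds can carry backported fixes under a different version scheme, so a version number alone is not conclusive for those.

```python
# Hypothetical helper: flags upstream Apache Parquet versions in the
# affected range named in the advisory (1.15.0 and previous versions).
LAST_AFFECTED = (1, 15, 0)


def parse_version(version: str) -> tuple[int, ...]:
    """Turn '1.15.0' into (1, 15, 0); drops any suffix after '-' such as '-SNAPSHOT'."""
    core = version.split("-", 1)[0]
    return tuple(int(part) for part in core.split("."))


def is_affected(version: str) -> bool:
    """Return True if the version is 1.15.0 or earlier."""
    return parse_version(version) <= LAST_AFFECTED


if __name__ == "__main__":
    for candidate in ("1.13.1", "1.15.0", "1.15.1"):
        print(candidate, "affected" if is_affected(candidate) else "not in affected range")
```

Regardless of what such a check reports, upgrading to the release listed below remains the remediation described in this advisory.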

Addressed in release:

Upgrade to the following release containing the fix:

Cloudera Data Services on cloud (formerly Public Cloud Data Services)
- Release 3.1.3
  Note: Cloudera Data Catalog with Compute Cluster enabled environments supports all versions of Cloudera on cloud. V1 environments support only CDP Public Cloud 7.2.18 and earlier versions.

For the latest update on this issue, see the corresponding Knowledge article:

Cloudera Customer Advisory 2025-847: Cloudera's remediation actions for Apache Parquet CVE-2025-30065

July 23, 2025

This release (3.1.2) of the Cloudera Data Catalog service introduces the following new change:

You can now approve tags recommended by the Data Compliance Profiler before applying them to your assets and syncing them to Apache Atlas. This means that you can review tag suggestions and correct mistaken tags, which would otherwise lead to unexpected changes in tag-based Apache Ranger policies.

June 17, 2025

This release (3.1.1) of the Cloudera Data Catalog service introduces the following new changes:

Profilers in Compute Cluster enabled environments now support profiling text-based files. Custom delimiters are supported for Hive tables where the Hive Metastore contains them. Cloudera Data Catalog now supports LazySimpleSerDe for Hive tables. For more information, see the Apache Hive Developer Guide.
This release contains fixes and updates for the changes in the 3.1.0 release. For more information, see the list of fixed and known issues.

April 8, 2025

This release (3.1.0) of the Cloudera Data Catalog service introduces the following new changes:

Improved services for profilers

Thanks to the improved Cluster Setup API, the configuration of profilers is simplified:
- Executor-related settings only specify the maximum number of workers. An internal service manages the autoscaling within this range.

Redesigned profiler setup

Settings for instance sizing and autoscaling are introduced.

Improved profiler UI

The improved profilers present a more user-friendly UI and several extended capabilities for Compute Cluster enabled environments.

New names for profilers in Compute Cluster enabled environments:
- The Cluster Sensitivity Profiler is now called Data Compliance Profiler.
- The Hive Column Profiler is now called Statistics Collector Profiler.
- The Ranger Audit Profiler is now called Activity Profiler.
Redesigned Profilers menu for easier access to jobs, configurations and their history, asset filtering and tag rules:
- The individual profilers show new metrics:
  - Number of profiled assets of the last job
  - Job duration of the last job
  - The profilers menu also shows the next job's start time and the number of completions
- The CRON expression-based scheduler is supplemented with a natural language-based scheduler
- Asset Filtering Rules is expanded with the list of assets affected by your rule set
- You can now access the Configuration History of a profiler, where you can check your changes in a sequential order
- The Job Summary page introduces new metrics:
  - Worker details:
    - Worker memory limit
    - Threads per worker
    - Number of workers
  - Last run check details
- The Job Summary page provides the list of profiled assets.

Redesigned and expanded Tag Rules for Compute Cluster enabled environments

Profiling of table names is introduced in addition to column values and column names.
Atlas classifications (Cloudera Data Catalog tags) can be used in a more granular way thanks to the distinction between parent and child tags.
Tag rules are data lake specific in Compute Cluster enabled environments compared to being valid for all data lakes in VM-based environments.
The new Tag Rules tab offers filters to allow for faster searching and displays:
- List of applied parent and child tags
- Tag rule status (Can be used to filter for tag rules not yet validated by Dry Run)
- Rule types
- You can filter for tag rules that apply child tags
The initial loading time of rules has been decreased.
You can upload regex patterns in CSV files for easier handling.
You can now specify the weightage for column value based matching (which was previously fixed at 85%). The column value weightage and the column name weightage add up to 100%.
When profiling column values, you can upload a sample set of column values instead of defining a regex pattern.
You can review your configuration before finalizing your tag rule.
Dry Run: Before deploying your tag rules, you have to test them with actual table data.
New API calls are available.
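To illustrate the weightage setting described above, the following sketch shows how a column value weightage and its complementary column name weightage could be combined into one score. The release notes only state that the two weightages add up to 100%. The weighted average, the function name, and the 0.0 to 1.0 match scores are assumptions made for illustration and may not reflect how Cloudera Data Catalog actually scores matches.

```python
# Illustrative only: the scoring formula is an assumption, not the
# documented behavior of Cloudera Data Catalog tag rules.
def combined_score(value_match: float, name_match: float, value_weight_pct: float) -> float:
    """Weighted average of two match scores (each between 0.0 and 1.0).

    value_weight_pct is the column value weightage in percent; the
    column name weightage is the remainder, so both add up to 100%.
    """
    if not 0 <= value_weight_pct <= 100:
        raise ValueError("value_weight_pct must be between 0 and 100")
    name_weight_pct = 100 - value_weight_pct
    return (value_weight_pct * value_match + name_weight_pct * name_match) / 100


if __name__ == "__main__":
    # Previous fixed split: 85% column value, 15% column name.
    print(combined_score(value_match=0.9, name_match=1.0, value_weight_pct=85))
    # A custom split that gives the column name more influence.
    print(combined_score(value_match=0.9, name_match=1.0, value_weight_pct=60))
```

Under this assumed formula, lowering the column value weightage lets a strong column name match contribute more to the result.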

New file formats for Compute Cluster based profilers

Compute Cluster based profilers also support the ORC and Avro file formats.

January 16, 2025

This release (2025-M1) of the Cloudera Data Catalog service introduces the following new changes:

Containerized architecture for profilers

Cloudera Data Catalog introduces a new containerized architecture for Profilers for Compute Cluster enabled environments, providing a scalable environment:

Only the required number of Kubernetes pods is launched, based on the size of the database to be profiled. You pay for cloud resources only while the profilers use them.
Also, the deployment of the containerized profiler architecture is more streamlined and quicker than that of the previous VM-based architecture.
Moreover, the containerized architecture makes later upgrades easier to carry out, without the multiple dependencies of the VM-based architecture, which used multiple services.
Profilers now also support the following file formats:
- VM-based environments: CSV, ORC
- Compute Cluster enabled environments: CSV and Parquet
  - Hive Column Profilers and Cluster Sensitivity Profilers also support profiling Iceberg Tables, including with On-Demand Profilers.
Important:
- Currently, Kubernetes based profilers are only supported in AWS environments.
- In Compute Cluster enabled environments, profilers only support tables which are stored on AWS S3 storage.

For more information, see Profiler architecture in Compute Cluster enabled environment.

Redesigned Dashboard menu

A new Dashboard is introduced to give an overview of your data lakes and profilers, including:
- Data lake type and status
- Profiler status
- Last 10 assets bookmarked by you
- Last run of profiler
- Number of assets profiled

Redesigned Search menu

The Search menu is reorganized so that information is easier to access. You can expand each entity result to see its qualified name, database, classification, and assigned terms. You can use these details to check whether your query returns the expected results.

Improved display of comments in Asset Details

Following this release, you can hover over the Comment field for individual schema entries in Asset Details to preview longer comments without opening them.

Common time format

Asset Details and other menus now use the same time format for a more readable overview: MM/DD/YYYY hh:mm A.
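For readers who need to reproduce this format in their own tooling, the sketch below renders a timestamp in the same layout. It assumes the pattern uses the common token convention in which hh is a zero-padded 12-hour value and A is the AM/PM marker. The function and constant names are illustrative and are not part of Cloudera Data Catalog.

```python
from datetime import datetime

# Assumed strftime equivalent of the pattern MM/DD/YYYY hh:mm A:
# %m = month, %d = day, %Y = four-digit year,
# %I = zero-padded 12-hour clock, %M = minutes, %p = AM/PM marker.
UI_TIME_FORMAT = "%m/%d/%Y %I:%M %p"


def format_ui_time(moment: datetime) -> str:
    """Render a datetime in the MM/DD/YYYY hh:mm A layout."""
    return moment.strftime(UI_TIME_FORMAT)


if __name__ == "__main__":
    # With the default (C/English) locale this prints 01/16/2025 02:05 PM
    print(format_ui_time(datetime(2025, 1, 16, 14, 5)))
```

The AM/PM marker produced by %p depends on the active locale, so output can differ on systems configured for a non-English locale.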

Removed features

The following features have been removed:

Tag Rules > Luhn check algorithm
Tag Rules > File-based Allow and Deny list
Tag Rules > Lookup files
Tagging multiple assets in the Search menu