What's new

This section lists major features and updates for the Cloudera Data Catalog service.

January 16, 2025

This release (2025-M1) of the Cloudera Data Catalog service introduces the following changes:

Containerized architecture for profilers

Cloudera Data Catalog introduces a containerized architecture for profilers in Compute Cluster enabled environments, providing a scalable environment:
  • Only as many Kubernetes pods are launched as the size of the database to be profiled requires. You pay only for the cloud resources that the profilers use, and only while they use them.
  • Deploying the containerized profiler architecture is more streamlined and quicker than deploying the previous VM-based architecture.
  • Later upgrades are easier to carry out, because the containerized architecture does not have the multiple dependencies of the VM-based architecture, which relied on multiple services.
  • Profilers now also support the following file formats:
    • VM-based environments: CSV and ORC
    • Compute Cluster enabled environments: CSV and Parquet
      • Hive Column Profilers and Cluster Sensitivity Profilers also support profiling Iceberg tables, including with On-Demand Profilers.

For more information, see Profiler architecture in Compute Cluster enabled environment.

Containerized Data Lake

In Compute Cluster enabled environments, profilers run on the default compute cluster that is created automatically when you register the environment. For more information, see Using Compute Clusters (AWS) and Using Compute Clusters (Azure).

Cloudera will soon release the Containerized Data Lake for Compute Cluster enabled environments. It provides the capabilities of an Enterprise Data Lake with better scaling in a new data lake shape.

Redesigned Dashboard menu

A new Dashboard gives an overview of your data lakes and profilers, including:
  • Data lake type and status
  • Profiler status
  • Last 10 assets bookmarked by you
  • Last run of profiler
  • Number of assets profiled

Redesigned Search menu

The Search menu is reorganized so that information is easier to access. You can expand each entity result to see its qualified name, database, classification, and assigned terms. Use these details to check whether your query returns the expected results.

Improved display of comments in Asset Details

Following this release, you can hover over the Comment field for individual schema entries in Asset Details to preview longer comments without opening them.

Common time format

Asset Details and other menus now use the same time format for a more readable overview: MM/DD/YYYY hh:mm A.
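As a rough illustration of what the MM/DD/YYYY hh:mm A pattern produces, the sketch below renders a timestamp with Python's standard `strftime`. The helper name `format_timestamp` is hypothetical and is not part of Cloudera Data Catalog. The mapping assumes that hh means a zero-padded 12-hour clock and that A means an uppercase AM/PM marker.

```python
from datetime import datetime


def format_timestamp(ts: datetime) -> str:
    """Render a timestamp as MM/DD/YYYY hh:mm A (hypothetical helper)."""
    # %m/%d/%Y -> MM/DD/YYYY
    # %I:%M    -> zero-padded 12-hour clock and minutes (hh:mm)
    # %p       -> AM/PM marker (A); the exact text depends on the locale
    return ts.strftime("%m/%d/%Y %I:%M %p")


# 14:05 on January 16, 2025 is shown as an afternoon time on a 12-hour clock.
print(format_timestamp(datetime(2025, 1, 16, 14, 5)))
```

In the default C locale, this prints `01/16/2025 02:05 PM`.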

Removed features

The following features have been removed:

  • Tag Rules > Luhn check algorithm
  • Tag Rules > File-based Allow and Deny list
  • Tag Rules > Lookup files
  • Tagging multiple assets in the Search menu

December 18, 2024

This release (2.0.28) of the Cloudera Data Catalog service introduces the following changes:

Column name based tagging in Cluster Sensitivity Profiler

You can override sampling so that a column is tagged when its name matches a preset regular expression pattern, instead of requiring a certain percentage of the column's values to match. This is useful for assets with skewed value distributions, where relying on sampling would not result in correct tagging.
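The difference between the two tagging modes can be sketched as follows. This is only an illustration of the idea, not the profiler's actual implementation. The function name `should_tag`, its parameters, and the 0.7 threshold are all assumptions made for the example.

```python
import re
from typing import Optional, Sequence


def should_tag(
    column_name: str,
    sampled_values: Sequence[str],
    value_pattern: str,
    name_pattern: Optional[str] = None,
    min_match_ratio: float = 0.7,  # hypothetical threshold, not a product default
) -> bool:
    """Decide whether a column gets a tag (illustrative sketch only)."""
    # Column name based tagging: a match on the name overrides sampling.
    if name_pattern is not None and re.search(name_pattern, column_name, re.IGNORECASE):
        return True

    # Sampling based tagging: enough sampled values must match the pattern.
    if not sampled_values:
        return False
    matches = sum(1 for value in sampled_values if re.fullmatch(value_pattern, value))
    return matches / len(sampled_values) >= min_match_ratio


# A sparsely populated column: only one of five sampled values looks like an SSN.
values = ["123-45-6789", "n/a", "unknown", "", "n/a"]
ssn_values = r"\d{3}-\d{2}-\d{4}"

print(should_tag("ssn", values, ssn_values))                              # sampling only
print(should_tag("ssn", values, ssn_values, name_pattern=r"ssn|social"))  # name override
```

With sampling alone, the skewed column stays below the threshold and is not tagged. With the name pattern, the column is tagged regardless of how many sampled values match.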

For more information, see Cluster Sensitivity Profiler configuration and Setting up column name based tagging.

June 3, 2024

Cloudera Data Catalog is a service within Cloudera that enables you to understand, manage, secure, and govern data assets across enterprise data lakes. Cloudera Data Catalog helps you understand data across multiple clusters and across multiple environments.

This release of the Cloudera Data Catalog service contains only fixes and updates that prepare Cloudera Data Catalog for the changes in the upcoming 3.0.0 release.

May 16, 2024

This release of the Cloudera Data Catalog service introduces the following changes:

Iceberg tables are now supported by the Cloudera Data Catalog service:

  • You can filter for them on the Search page.
  • Iceberg tables can be viewed on the Asset Details page.
  • Iceberg tables can be added to a dataset.

All subcomponents of Cloudera Data Catalog support JDK 17.

April 3, 2024

This release of the Cloudera Data Catalog service includes a notable behavior change that requires action on your part.

When you upgrade your cluster from Cloudera Runtime 7.2.17 to 7.2.18, the cluster goes into a failed state during the OS upgrade step. The following message is displayed:

__NODE_FAILURE:

New node(s) could not be added to the cluster. Reason Please find more details on Cloudera Manager UI. Failed command(s): Start(id=1546339088): Failed to start role profc6cf3856-PROFILER_SCHEDULER_AGENT-484032cb8f17cacf9e684efe50 of service profiler_scheduler in cluster cdp-dc-profilers-258395ef._

Impact on Cloudera Data Catalog profilers:

If the Cloudera Data Hub is not created, the Cloudera Data Catalog profilers are not created on Cloudera Runtime 7.2.18.

To work around this issue, use the following process to bring up the Cloudera Data Catalog profilers on Cloudera Runtime 7.2.18.

First, delete your existing 7.2.17 profiler clusters. For more information, see Deleting profiler cluster.

Next, after you upgrade the Data Lake to 7.2.18, launch the Cloudera Data Catalog profilers. For more information, see Launch profiler cluster.