Profiler data testing

Note the following important information about profiler services.

The following dataset has been validated and works as expected for VM-based environments:

  • DataHub Master: m5.4xlarge
  • Hive tables: 3,000 Hive assets
  • Total number of assets (including Hive columns, tables, and databases): 1,000,000
  • Total data size: 1 GB
  • Partitions on Hive tables: Around 5,000 partitions spread across five tables

The following dataset has been validated and works as expected for Compute Cluster enabled environments:

  • Total data size: 300 GB
  • Sampling profiler size: 50% (150 GB)
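
The sampled volume above follows directly from the total data size and the sampling percentage. As a minimal sketch of that arithmetic, the snippet below uses a hypothetical helper, `sampled_size_gb`, which is not part of any profiler API and is only meant for estimating how much data a given sampling percentage would cover:

```python
def sampled_size_gb(total_size_gb: float, sample_percent: float) -> float:
    """Return the data volume (in GB) covered by a sampling percentage.

    Hypothetical helper for illustration only; not a profiler API.
    """
    if not 0 <= sample_percent <= 100:
        raise ValueError("sample_percent must be between 0 and 100")
    return total_size_gb * sample_percent / 100


# Validated Compute Cluster figures: 300 GB total, 50% sampling.
print(sampled_size_gb(300, 50))  # 150.0
```

The same calculation can be used to compare a planned dataset against the validated 150 GB sampled volume before enabling the profiler.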