Profiler data testing
Note the following information about the data volumes against which the profiler services have been tested.
The following configuration and dataset have been validated and work as expected in VM-based environments:
- DataHub Master: m5.4xlarge
- Hive tables: 3,000 Hive assets
- Total number of assets (including Hive columns, tables, and databases): 1,000,000
- Total data size: 1 GB
- Partitions on Hive tables: approximately 5,000 partitions spread across five tables
The following dataset has been validated and works as expected in Compute Cluster-enabled environments:
- Total data size: 300 GB
- Sampling profiler size: 50% (150 GB)
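
The sampled volume follows directly from the total data size and the sampling percentage. The following is a minimal Python sketch of that arithmetic only. The function name `sampled_size_gb` is hypothetical and is not part of any profiler API or configuration.

```python
def sampled_size_gb(total_size_gb: float, sampling_percent: float) -> float:
    """Return the amount of data (in GB) covered by a percentage sample."""
    if not 0 <= sampling_percent <= 100:
        raise ValueError("sampling_percent must be between 0 and 100")
    return total_size_gb * sampling_percent / 100


# Validated Compute Cluster figures: 300 GB total, 50% sampling.
print(sampled_size_gb(300, 50))  # 150.0
```

The same calculation can be used to estimate how much data a different sampling percentage would cover, but only the 300 GB / 50% combination is stated as validated here.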