Tables in Hive 1 and 2 vs. Hive 3
Ownership, specifying the location of Hive tables, and the default location of Hive tables have changed in Hive 3. You gain an understanding of some restrictions and limitations of Hive 3 in CDP under default configurations. You need to understand how this change impacts your workflow.
In Hive 1 and 2, the owner of a managed table data is the table itself. If you drop the table
you drop the data. In Hive 3, the system user hive
typically owns the managed table
data. Exceptions include Hive 3 Streaming in which the streaming user owns the data. Hive performs compaction of the files. Deltas and the data location is controlled by Hive.
In the default Hive 3 in CDP, you typically cannot specify a location in a CREATE TABLE statement. Hive finds the location of the table location by first looking in the metastore default warehouse directory, the managed warehouse directory. The default location of Hive tables has changed in Hive 3. The managed warehouse is the default location of ACID files. There is also a location in the warehouse for external files. You can change a Hive property to change the warehouse directory, and if your HiveServer (HS2) is connected to the Hive metastore (HMS), the new locations take effect. You can change the location of managed tables and the location of external tables for a specific database. New tables created in that database use the new location.
In Hive 3, all managed tables are transactional (ACID). There is a tight relationship between the metadata and data. Hive keeps track of file system changes as well as metastore changes to satisify features, such as column masking for fine-grained access restrictions.