Spark actions that produce Atlas entities
Spark jobs create Spark application and process entities in Atlas. Operations that create, update, or delete data assets affect the corresponding Atlas entities; operations that only affect data do not show up in Atlas.
The following table lists the Spark actions that produce or update metadata in Atlas.
| This Action in Spark... | ...Produces metadata for these Atlas entities |
|---|---|
| CREATE TABLE USING, CREATE TABLE AS SELECT, CREATE TABLE USING ... AS SELECT | spark_application, spark_column_lineage, spark_process, hive_table, hive_column, hive_storagedesc |
| CREATE VIEW AS SELECT | spark_application, spark_process, hive_table, hive_column, hive_storagedesc |
|  | spark_application, spark_process |
The following notable Spark actions do NOT produce process entities in Atlas, so no lineage is produced for these operations:
- LOAD DATA INPATH (when not coming from a local file source)
- CREATE TABLE (hive_table metadata produced by HMS)
- ALTER VIEW (hive_table metadata produced by HMS)
- SELECT or other queries that don’t change table metadata
- SAVE (hdfs_path) does not generate lineage in Atlas
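
As a quick reference, the table and list in this section can be encoded as a small lookup. The following is a minimal Python sketch, not part of Spark or Atlas: the names `ENTITIES_BY_ACTION`, `NO_PROCESS_ENTITY`, and `has_lineage` are hypothetical, and the keys are simply the action labels as written above. Only actions that this section names explicitly are included.

```python
from typing import Optional

# Entity types copied from the table in this section.
# All names defined here are illustrative; none are Atlas or Spark APIs.
_TABLE_ENTITIES = frozenset({
    "spark_application", "spark_column_lineage", "spark_process",
    "hive_table", "hive_column", "hive_storagedesc",
})
_VIEW_ENTITIES = frozenset({
    "spark_application", "spark_process",
    "hive_table", "hive_column", "hive_storagedesc",
})

# Actions from the table, mapped to the entity types they produce metadata for.
ENTITIES_BY_ACTION = {
    "CREATE TABLE USING": _TABLE_ENTITIES,
    "CREATE TABLE AS SELECT": _TABLE_ENTITIES,
    "CREATE TABLE USING ... AS SELECT": _TABLE_ENTITIES,
    "CREATE VIEW AS SELECT": _VIEW_ENTITIES,
}

# Actions listed as producing no process entity, and therefore no lineage.
NO_PROCESS_ENTITY = frozenset({
    "LOAD DATA INPATH",  # when not coming from a local file source
    "CREATE TABLE",      # hive_table metadata is produced by HMS
    "ALTER VIEW",        # hive_table metadata is produced by HMS
    "SELECT",            # or other queries that don't change table metadata
    "SAVE",              # hdfs_path
})


def has_lineage(action: str) -> Optional[bool]:
    """Return True or False for actions covered here, None for anything else."""
    if action in ENTITIES_BY_ACTION:
        return "spark_process" in ENTITIES_BY_ACTION[action]
    if action in NO_PROCESS_ENTITY:
        return False
    return None  # the action is not described in this section
```

Returning `None` for unlisted actions is a deliberate choice in this sketch: it avoids implying anything about Spark operations that this section does not describe.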
