Creating tag rules in compute cluster environments
With tag rules, you can apply Apache Atlas classifications to your assets based on
regular expressions or on similarity to a set of values in a
table.
To start applying tags, go to Profilers and select your
data lake.
Go to Profilers > Data Compliance > Tag Rules.
Click + Create Tag Rule.
Name your tag rule and add a description to it in General
Information.
Select the tags to be applied from the list of available tags, which is
synchronized from the Atlas classifications.
If you select a child tag, its parent tag is also automatically selected. By
default, if the child tag is applied to a column, the table receives the parent
tag.
Select your Data Pattern Type:
Option    Description
Regular Expression
You can upload a text file containing your regular expression or
type it directly in the Configure Tag Rule
page. The required format of the CSV file can be seen by clicking
Download Sample Tag Rule.
Upload a CSV file with values to be matched against the actual
values in your tables. After uploading your file, continue with step
11.
Creating a regular expression-based tag rule:
Optional: Define your regular expression for the table name.
When using Column Level regular expressions, you can define
multiple expressions for both of the following:
Column Name
Column Values
Define the Column Value Weightage as a percentage with the
slider.
The remaining percentage is the column name weightage. The results
of the individual regex matches are weighted according to this setting before
determining the final result confidence for applying the tag.
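The weighting described above can be illustrated with a minimal sketch. The exact scoring formula used by the profiler is not given here, so this example assumes a simple linear blend: the column value score is the fraction of sampled values matched by any value expression, the column name score is 1 if any name expression matches and 0 otherwise, and the two are combined using the slider percentage. The function names, patterns, and sample data are hypothetical.

```python
import re


def match_fraction(patterns, values):
    """Return the fraction of values fully matched by at least one pattern."""
    if not values:
        return 0.0
    compiled = [re.compile(p) for p in patterns]
    hits = sum(1 for v in values if any(c.fullmatch(v) for c in compiled))
    return hits / len(values)


def tag_confidence(column_name, column_values,
                   name_patterns, value_patterns, value_weight_pct):
    """Blend name and value match scores (assumed linear weighting)."""
    name_score = match_fraction(name_patterns, [column_name])
    value_score = match_fraction(value_patterns, column_values)
    w = value_weight_pct / 100.0  # Column Value Weightage from the slider
    # The remainder (1 - w) is the column name weightage.
    return w * value_score + (1.0 - w) * name_score


# Hypothetical column: 3 of 4 values look like email addresses.
confidence = tag_confidence(
    column_name="customer_email",
    column_values=["a@example.com", "b@example.org", "n/a", "c@example.net"],
    name_patterns=[r".*email.*"],
    value_patterns=[r"[^@\s]+@[^@\s]+\.[a-z]+"],
    value_weight_pct=60,
)
print(f"{confidence:.2f}")  # 0.6 * 0.75 + 0.4 * 1.0 = 0.85
```

Under this assumed formula, moving the slider toward 100% makes the result depend almost entirely on the sampled values, while moving it toward 0% makes the column name decisive.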
Tag rule testing:
Optional: You can run a sanity check of your tag rule in Test Tag
Rule by uploading a sample dataset in CSV format.
Review all your input before clicking Create Tag
Rule.
Click Confirm to finalize your tag rule.
Your tag rule is created with the Status Disabled, and
its Test Status is Test
Pending.
Click >
Dry Run.
The Dry Run Test pane opens.
Click Run to start an on-demand dry run profiling job on
up to 10 tables from your data.
Your tag rule becomes VALIDATED after a
successful dry run.
After the dry run test has passed, click >
Enable to start using your tag rule on your live
data.
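Before uploading a sample dataset to Test Tag Rule, it can help to check a column value expression locally. The following is a rough sketch, not the product's test logic: it assumes the sample CSV has a header row with column names, and it reports, per column, the fraction of non-empty cells that fully match the expression. The function name, the sample data, and the pattern are hypothetical.

```python
import csv
import io
import re


def column_match_rates(csv_text, value_pattern):
    """Return {column: fraction of non-empty cells fully matching the pattern}."""
    regex = re.compile(value_pattern)
    reader = csv.DictReader(io.StringIO(csv_text))
    totals, hits = {}, {}
    for row in reader:
        for col, cell in row.items():
            # Skip missing or empty cells and any overflow fields.
            if not isinstance(cell, str) or not cell.strip():
                continue
            totals[col] = totals.get(col, 0) + 1
            if regex.fullmatch(cell.strip()):
                hits[col] = hits.get(col, 0) + 1
    return {col: hits.get(col, 0) / n for col, n in totals.items()}


# Hypothetical sample dataset with a header row.
sample = """id,phone,note
1,555-0100,call back
2,555-0199,
3,unknown,555-0123
"""

rates = column_match_rates(sample, r"\d{3}-\d{4}")
for column, rate in rates.items():
    print(f"{column}: {rate:.2f}")
```

A column with a high match rate that should not receive the tag (here, the free-text `note` column) is a hint that the expression is too permissive or that the column name expression should carry more weight.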