The Azure Guide: Synapse Analytics & GDPR
Due to GDPR constraints, handling of personally identifiable information (PII) has become an increasingly important task.
Data is initially written to a volatile data store, from which Azure Synapse Pipelines pick it up. The pipelines check a configuration table to determine whether the data contains columns with PII. Tables without obfuscation requirements are copied to the data lake as is. For tables with PII, the names of the PII columns are read from the configuration, and rule-based data flow activities calculate a hash value for each PII value together with a unique ID. The resulting key-value pairs are sent to an Azure Data Explorer (ADX) database, while the hash values and IDs are written to the data lake.
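The configuration-driven flow described above can be sketched in plain Python. This is only an illustration of the logic, not the Synapse data flow itself. The table names, the `PII_CONFIG` dictionary, the `SECRET` key, the `row_id` field, and the choice of HMAC-SHA256 are all assumptions made for the example. The sketch also assumes that each key-value pair maps a hash to its original value and that the unique ID is assigned per row.

```python
import hashlib
import hmac
import uuid

# Hypothetical configuration table: table name -> columns that hold PII.
PII_CONFIG = {
    "customers": ["email", "phone"],
    "products": [],  # no obfuscation required, copied as is
}

# Assumed secret for keyed hashing; a real setup would load it from a secret store.
SECRET = b"example-secret"


def hash_value(value: str) -> str:
    """Return the keyed SHA-256 hash (HMAC) of a PII value as a hex string."""
    return hmac.new(SECRET, value.encode("utf-8"), hashlib.sha256).hexdigest()


def obfuscate(table: str, rows: list[dict]) -> tuple[list[dict], dict[str, str]]:
    """Return (rows for the data lake, hash -> original pairs for the key store)."""
    pii_columns = PII_CONFIG.get(table, [])
    lake_rows: list[dict] = []
    key_values: dict[str, str] = {}
    for row in rows:
        out = dict(row)  # copy so the input row is left untouched
        if pii_columns:
            out["row_id"] = str(uuid.uuid4())  # assumed: one unique ID per row
        for column in pii_columns:
            if out.get(column) is None:
                continue  # nothing to obfuscate for missing or null values
            original = str(out[column])
            hashed = hash_value(original)
            out[column] = hashed            # only the hash reaches the data lake
            key_values[hashed] = original   # the pair goes to the key store
        lake_rows.append(out)
    return lake_rows, key_values
```

Calling `obfuscate("customers", rows)` returns the obfuscated rows plus the pairs destined for the key store, while `obfuscate("products", rows)` returns the rows unchanged and an empty dictionary. Adding a new object in this sketch only requires a new entry in `PII_CONFIG`, which mirrors the configuration-only extension described in the text.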
The obfuscation keys stored in Azure Data Explorer can be used to de-obfuscate values dynamically in Power BI reports, provided the required permissions have been granted.
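The permission-gated lookup can be illustrated with a small Python stand-in. In the actual setup the lookup would happen in a Power BI report querying Azure Data Explorer. Here the key store is an ordinary dictionary from hash to original value, and `deobfuscate`, `key_store`, and `has_permission` are hypothetical names chosen for this sketch.

```python
def deobfuscate(
    rows: list[dict],
    pii_columns: list[str],
    key_store: dict[str, str],
    has_permission: bool,
) -> list[dict]:
    """Replace hashes with original values, but only for permitted readers."""
    if not has_permission:
        # Readers without permission keep seeing the hashed values.
        return [dict(row) for row in rows]
    resolved = []
    for row in rows:
        out = dict(row)
        for column in pii_columns:
            hashed = out.get(column)
            # Hashes that are not in the key store are left as they are.
            if hashed in key_store:
                out[column] = key_store[hashed]
        resolved.append(out)
    return resolved
```

Because the data lake only ever holds hashes, removing a reader's permission (or deleting a key from the store) is enough to make the original value unavailable in this model.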
This process is currently used in a data platform project, where about 200 columns across 80 data objects are obfuscated each day. Because the approach is generic, it can easily be extended to new objects by updating the configuration.