Concepts
Microsoft Purview is a data governance solution that helps organizations discover, understand, and manage their data assets across many sources. One of its features is data lineage, which shows where data originates, how it is transformed, and where it ends up within an organization's data ecosystem. In this article, we will explore how to push new or updated data lineage to Microsoft Purview using its built-in integrations and APIs.
1. Setting up the Data Map:
To get started, you need a Microsoft Purview account. Its Data Map, which stores the metadata describing your data ecosystem, is provisioned together with the account. You populate it by registering and scanning your data sources and by connecting tools, such as Azure Data Factory, that report lineage. You can do this in the Microsoft Purview governance portal (formerly Purview Studio) or through the REST APIs.
2. Pushing lineage from metadata sources:
Pipeline services such as Azure Data Factory and Azure Synapse Analytics can report the data assets they read and write, together with the lineage between them, once they are connected to a Purview account. Apache Atlas plays a different role: the Purview Data Map is built on Apache Atlas and exposes Atlas-compatible APIs, so tools that already emit Atlas metadata can send it to Purview. The exact steps vary by source, and the Purview documentation covers each integration separately.
3. Capturing lineage from data sources:
In addition to pipeline tools, Purview scans data sources such as Azure SQL Database, Azure Data Lake Storage, and Azure Blob Storage through its built-in connectors. Scans mainly capture the assets themselves (tables, files, folders, and their schemas), which lineage then links together; some connectors can also extract lineage, for example from stored procedures in Azure SQL Database. Again, the specific steps depend on the data source, and the documentation provides detailed guidance for each integration.
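When you later push custom lineage, it has to point at the same assets that scans created, and Purview identifies those assets by type name plus qualifiedName. The helpers below are a minimal sketch of that idea. The type names (azure_blob_path, azure_datalake_gen2_path, azure_sql_table), the qualifiedName patterns, and the helper names themselves are assumptions based on common Purview naming, so compare them with the assets in your own Data Map before relying on them.

```python
# Hypothetical helpers: type names and qualifiedName patterns are assumptions
# modeled on how Purview scans commonly name these assets; verify in your account.


def blob_path_qn(account: str, container: str, path: str) -> str:
    """qualifiedName for a file or folder in Azure Blob Storage (type azure_blob_path)."""
    return f"https://{account}.blob.core.windows.net/{container}/{path.lstrip('/')}"


def adls_gen2_path_qn(account: str, filesystem: str, path: str) -> str:
    """qualifiedName for a path in ADLS Gen2 (type azure_datalake_gen2_path)."""
    return f"https://{account}.dfs.core.windows.net/{filesystem}/{path.lstrip('/')}"


def azure_sql_table_qn(server: str, database: str, schema: str, table: str) -> str:
    """qualifiedName for an Azure SQL Database table (type azure_sql_table)."""
    return f"mssql://{server}.database.windows.net/{database}/{schema}/{table}"


def asset_ref(type_name: str, qualified_name: str) -> dict:
    """Reference an existing asset by type and qualifiedName instead of by GUID."""
    return {"typeName": type_name, "uniqueAttributes": {"qualifiedName": qualified_name}}


if __name__ == "__main__":
    qn = azure_sql_table_qn("contoso-sql", "salesdb", "dbo", "Orders")
    print(asset_ref("azure_sql_table", qn))
```

Referencing assets by qualifiedName rather than GUID means a lineage job does not need to look up GUIDs first, as long as the names match what the scan produced.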
4. Pushing lineage from data transformations:
Data transformations are central to lineage, because they connect the assets a job reads to the assets it writes. Azure Data Factory lets you define and orchestrate complex data transformation workflows. Once a data factory is connected to a Purview account, lineage from supported activities (Copy, Data Flow, and Execute SSIS Package) is pushed to the Data Map when the pipelines run, building an end-to-end lineage view of your data assets.
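Conceptually, each transformation is a process node linking input assets to output assets, and the lineage view is the graph those nodes form. The sketch below is not a Purview API; it uses a hypothetical Process dataclass and upstream function to show how walking that graph answers the question "where did this asset come from?".

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Process:
    """Hypothetical model of one transformation: reads inputs, writes outputs."""
    name: str
    inputs: tuple[str, ...]
    outputs: tuple[str, ...]


def upstream(asset: str, processes: list[Process]) -> set[str]:
    """Return every asset that feeds, directly or indirectly, into `asset`."""
    # Index processes by the assets they produce.
    producers: dict[str, list[Process]] = {}
    for process in processes:
        for out in process.outputs:
            producers.setdefault(out, []).append(process)

    # Breadth-first walk from the asset back towards its sources.
    seen: set[str] = set()
    queue = deque([asset])
    while queue:
        current = queue.popleft()
        for process in producers.get(current, []):
            for inp in process.inputs:
                if inp not in seen:  # also guards against cycles
                    seen.add(inp)
                    queue.append(inp)
    return seen


if __name__ == "__main__":
    pipeline = [
        Process("CopyRaw", ("raw/sales.csv",), ("staging.sales",)),
        Process("Enrich", ("staging.sales", "ref.regions"), ("mart.sales",)),
    ]
    print(sorted(upstream("mart.sales", pipeline)))
```

Running the usage example prints the three assets that feed mart.sales; Purview performs this kind of traversal for you when it renders the lineage view.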
5. Utilizing the Purview REST API:
Beyond the built-in integrations, you can push new or updated data lineage through the Purview REST API. Because the Data Map is Atlas-compatible, the API provides endpoints to create or update entities, relationships, classifications, and more; lineage is expressed as a process entity whose inputs and outputs reference dataset entities. You can use the API to push lineage from custom data sources, import lineage extracted from external tools, or automate the ingestion process. The API documentation describes the available endpoints and their usage in detail.
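A minimal sketch of such a push, assuming the Atlas v2 bulk endpoint at https://{account}.purview.azure.com/catalog/api/atlas/v2/entity/bulk (newer API versions expose a /datamap/api/atlas/v2/... path instead, so confirm against the current REST reference). It describes one process entity of the generic Atlas type Process whose inputs and outputs point at existing assets by qualifiedName; you may prefer a custom process subtype. The function names are hypothetical, and the bearer token is assumed to come from Azure AD for the Purview resource (for example via azure-identity with the scope https://purview.azure.net/.default).

```python
import json
import urllib.request

# Assumption: classic Atlas-compatible path; check the current Purview REST docs.
ATLAS_BULK_PATH = "/catalog/api/atlas/v2/entity/bulk"


def asset_ref(type_name: str, qualified_name: str) -> dict:
    """Point at an existing asset by type and qualifiedName."""
    return {"typeName": type_name, "uniqueAttributes": {"qualifiedName": qualified_name}}


def build_lineage_payload(process_qn, process_name, inputs, outputs, process_type="Process"):
    """Build an entity/bulk body; inputs and outputs are (type_name, qualified_name) pairs."""
    return {
        "entities": [
            {
                "typeName": process_type,
                "attributes": {
                    "qualifiedName": process_qn,
                    "name": process_name,
                    "inputs": [asset_ref(t, qn) for t, qn in inputs],
                    "outputs": [asset_ref(t, qn) for t, qn in outputs],
                },
            }
        ]
    }


def push_lineage(account: str, token: str, payload: dict) -> dict:
    """POST the payload to the (assumed) bulk endpoint and return the JSON response."""
    request = urllib.request.Request(
        f"https://{account}.purview.azure.com{ATLAS_BULK_PATH}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

For example, build_lineage_payload("custom://jobs/nightly_sales", "nightly_sales", [("azure_blob_path", "https://acct.blob.core.windows.net/raw/sales.csv")], [("azure_sql_table", "mssql://srv.database.windows.net/db/dbo/Sales")]) produces a body that push_lineage can send; the custom:// qualifiedName for the process is an arbitrary illustrative choice.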
Pushing new or updated data lineage to Purview is an ongoing process. As your data ecosystem evolves and data assets are added or modified, the Data Map has to be kept up to date. By combining Purview's integrations and APIs, you can keep your data lineage accurate and aligned with the current state of your data assets.
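Apache Atlas documents the bulk entity endpoint as create-or-update, matching existing entities by GUID or unique attributes such as qualifiedName, so re-sending a process with the same qualifiedName updates it rather than duplicating it. A recurring sync job therefore only needs to push processes whose inputs or outputs changed since its last run. The sketch below, with hypothetical lineage_fingerprint and changed_processes helpers, detects such changes by hashing a canonical form of each process; where the last-pushed fingerprints are stored is left open.

```python
import hashlib
import json


def lineage_fingerprint(process_qn: str, inputs: list[str], outputs: list[str]) -> str:
    """Stable hash of a process's lineage; ordering of inputs/outputs does not matter."""
    canonical = json.dumps(
        {"process": process_qn, "inputs": sorted(inputs), "outputs": sorted(outputs)},
        sort_keys=True,
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def changed_processes(
    current: dict[str, tuple[list[str], list[str]]],
    last_pushed: dict[str, str],
) -> list[str]:
    """Return qualifiedNames of processes that are new or whose lineage changed."""
    changed = []
    for process_qn, (inputs, outputs) in current.items():
        if last_pushed.get(process_qn) != lineage_fingerprint(process_qn, inputs, outputs):
            changed.append(process_qn)
    return sorted(changed)


if __name__ == "__main__":
    current = {"job_a": (["raw.a"], ["stage.a"]), "job_b": (["stage.a"], ["mart.b"])}
    state = {"job_a": lineage_fingerprint("job_a", ["raw.a"], ["stage.a"])}
    print(changed_processes(current, state))
```

Hashing a sorted, key-sorted JSON form keeps the fingerprint stable across runs, so only real lineage changes trigger a new push.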
As you work with Purview, make sure to refer to the official Microsoft documentation for detailed instructions and examples. The documentation provides step-by-step guidance for configuring the integrations, using the REST API, and managing the Data Map effectively. Stay up to date with the latest Purview features and releases, as Microsoft regularly introduces enhancements to improve data lineage capabilities.
In conclusion, Microsoft Purview is a data governance tool that can capture and store new or updated data lineage from many sources. By combining integrations with pipeline services, data source scans, and data transformation tools with the Purview REST API, you can ensure that your data lineage accurately reflects your data ecosystem.
Answer the Questions in the Comment Section
When pushing new or updated data lineage to Microsoft Purview, which Azure service can be used for capturing and storing data lineage information?
a) Azure Data Catalog
b) Azure Data Factory
c) Azure Purview Data Map
d) Azure Cosmos DB
Correct answer: c) Azure Purview Data Map
Which statement is true regarding pushing new or updated data lineage to Microsoft Purview?
a) Data lineage updates can only be performed manually.
b) Data lineage from supported integrations is captured and stored automatically once they are connected to the Purview account.
c) Data lineage updates can only be pushed from Azure SQL Database.
d) Data lineage updates can only be pushed from on-premises data sources.
Correct answer: b) Data lineage from supported integrations is captured and stored automatically once they are connected to the Purview account.
When pushing new or updated data lineage to Microsoft Purview, which metadata store is used to store and manage the data lineage information?
a) Azure Blob Storage
b) Azure Data Lake Storage
c) Azure Purview Account
d) Azure Data Catalog
Correct answer: c) Azure Purview Account
Which Azure service can be used to catalog data sources and create a metadata schema for data lineage in Microsoft Purview?
a) Azure Synapse Analytics
b) Azure Databricks
c) Azure Purview Data Catalog
d) Azure Data Lake Storage
Correct answer: c) Azure Purview Data Catalog
What are the possible ways to push new or updated data lineage to Microsoft Purview? (Select all that apply)
a) Using Azure Data Factory
b) Using Apache Spark
c) Using Azure Logic Apps
d) Using REST API calls
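Correct answer: a) Using Azure Data Factory, b) Using Apache Spark, d) Using REST API calls (Spark lineage can be sent through an Atlas-compatible connector; Azure Logic Apps has no built-in lineage integration and could only push lineage by calling the REST API)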
In order to push new or updated data lineage to Microsoft Purview, which permissions are required? (Select all that apply)
a) Owner or Contributor role on the Azure subscription
b) Purview Data Curator or Data Source Administrator role in Purview account
c) Read-only access to the data sources being tracked
d) Azure AD Global Administrator role
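Correct answer: b) Purview Data Curator or Data Source Administrator role in Purview account (writing lineage entities specifically requires the Data Curator role on the relevant collection; an Owner or Contributor role on the subscription is not needed)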
Which of the following data sources are supported for pushing new or updated data lineage to Microsoft Purview? (Select all that apply)
a) Azure Blob Storage
b) Azure Synapse Analytics
c) Azure Cosmos DB
d) On-premises SQL Server database
Correct answer: a) Azure Blob Storage, b) Azure Synapse Analytics, c) Azure Cosmos DB, d) On-premises SQL Server database
True or False: Data lineage updates pushed to Microsoft Purview can also include custom metadata annotations.
Correct answer: True
True or False: Microsoft Purview supports lineage tracking for streaming data sources such as Azure Event Hubs and Azure IoT Hub.
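Correct answer: False (Purview has no built-in lineage extraction for Azure Event Hubs or Azure IoT Hub; lineage for such sources would have to be pushed as custom entities through the REST API)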
True or False: Pushing new or updated data lineage to Microsoft Purview requires additional configuration or setup in Azure, beyond enabling data cataloging for supported data sources.
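Correct answer: True (for example, a data factory must be connected to the Purview account and granted the Data Curator role before its lineage is pushed)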
Great post on pushing new or updated data lineage to Microsoft Purview! Helped me with my DP-203 preparations.
Does anyone have experience automating the process with Azure Data Factory?
Thanks for the detailed explanation! This will be a big help for my data engineering project.
What are the common challenges faced while integrating Purview with existing systems?
Very informative. Appreciate it!
How do you handle versioning of data lineage in Purview?
Thank you for sharing this knowledge. It’s very helpful.
Amazing post! Helped clarify many doubts.