DATA ENGINEER - DATABRICKS

iTRecruiter

Job description:

DATA ENGINEER - DATABRICKS (HYBRID)

Portuguese company hiring for remote/hybrid work

CANDIDATES MUST LIVE IN LISBON/PORTUGAL

FLUENT PORTUGUESE AND ENGLISH (C1)

SEND CV IN ENGLISH

Description

Position Summary: We are looking for a Databricks Specialist Consultant with a deep focus on Unity Catalog and Microsoft Purview, responsible for migrating files in Parquet format stored in Azure Storage Account containers to Delta Table format, with the implementation of a layered architecture (Bronze, Silver, Gold) in Databricks. The consultant will also be responsible for integrating these Delta Tables into Unity Catalog with centralised governance via Microsoft Purview, ensuring that the data is accessible and well governed while being used for dynamic reporting in Power BI.

Responsibilities

  • Expertise in Unity Catalog:
      • Migrate Parquet files stored in Azure Storage Account to Delta Tables registered in Unity Catalog in Databricks.
      • Apply granular access control at table, column and schema level using RBAC in Unity Catalog, ensuring compliance and security.
      • Configure and optimise Unity Catalog to provide centralised governance over all data in Databricks, ensuring that permissions and data lineage are clearly defined.
  • Advanced integration with Microsoft Purview:
      • Integrate Unity Catalog with Microsoft Purview for automatic cataloguing, data lineage tracking, and auditing.
      • Ensure that all data changes, permissions and metadata are visible and auditable via Purview, guaranteeing compliance with regulations such as GDPR and HIPAA.
  • Implementation of Layered Data Architecture:
      • Implement Bronze, Silver, Gold architecture in Databricks, with different layers of data for ingestion, transformation and final exposure for reporting.
      • Create Databricks clusters suitable for each layer, optimising performance and guaranteeing scalability and security in data processing.
  • Notebook Conversion and Delta Table Optimisation:
      • Review and migrate existing notebooks that handle Parquet files to use Delta Tables registered in Unity Catalog.
      • Implement performance optimisations in Delta Tables, using commands such as OPTIMIZE and VACUUM to improve query efficiency and free up space.
  • Reports and Visualizations with Power BI:
      • Ensure that data transformed and governed via Delta Tables in Databricks is accessible for real-time reporting in Power BI, using DirectQuery to ensure data is always up-to-date.

Technical Skills Required

  • Expert in Unity Catalog:
      • Advanced experience in configuring, managing and optimizing Unity Catalog in Databricks, including access control, security policies and governance.
  • Microsoft Purview:
      • Proficiency in Microsoft Purview, with experience in integrating and maintaining data governance with Unity Catalog, ensuring traceability, auditing and compliance.
  • Databricks and Delta Lake:
      • Solid experience using Databricks for large-scale data manipulation, especially utilizing Delta Lake and Delta Tables.
      • Ability to implement and optimize data pipelines in Bronze, Silver and Gold tiers in Databricks.
  • Azure Storage:
      • In-depth knowledge of Azure Storage Accounts, Azure Blob Storage, and Azure Data Lake Storage (ADLS), including the manipulation of Parquet files for storage and performance optimization.
  • Power BI:
      • Ability to integrate Databricks data into Power BI to create dynamic dashboards and reports, ensuring security permissions are respected.

Desirable Skills

  • Data Governance and Security: Deep understanding of data governance, compliance policies, and security practices, especially in the context of sensitive data.
  • Pipeline Automation: Experience in automating and orchestrating data pipelines in an Azure environment, with a focus on efficiency and resource optimization.

Mandatory Requirements

  • Expertise in Unity Catalog in Databricks.
  • Advanced experience with Microsoft Purview for data governance and auditing.
  • Proven ability to migrate and optimize data in Delta Tables and register them in Unity Catalog.
  • In-depth knowledge of data manipulation in Azure Storage Accounts and Databricks.
  • Experience with data integration in Power BI for dynamic reports and visualizations.

Nice To Have

  • Microsoft Azure certifications, such as Azure Data Engineer or Azure Solutions Architect.
  • Experience with PySpark for task automation and optimization in Databricks.

Work Location: Lisbon

60% remote and 40% on-site

#00271029