Digihelic Solutions Private Limited
DigiHelic Solutions - Azure Databricks Engineer - Big Data/Hadoop
Job Location
Mumbai, India
Job Description
Responsibilities :
- Design, develop, and deploy robust and scalable data pipelines using Azure Databricks, Apache Spark, and PySpark.
- Build and optimize ETL/ELT processes to ingest, transform, and load data from various sources into the Azure data lake and data warehouse.
- Utilize Delta Lake within Azure Databricks to ensure data reliability, consistency, and performance.
- Integrate Azure Databricks with other Azure data services, including Azure Data Lake Storage (ADLS Gen2), Azure Synapse Analytics, Azure SQL Database, Azure Data Factory, and Azure Event Hubs/Event Grid.
- Write and optimize complex Spark SQL queries for data analysis and transformation.
- Implement data quality checks, data validation, and monitoring mechanisms within the data pipelines.
- Collaborate closely with data scientists, data analysts, and business stakeholders to understand their data requirements and provide efficient and scalable data solutions.
- Monitor and troubleshoot data pipeline performance, identify bottlenecks, and implement optimizations to ensure efficient processing and resource utilization within Azure Databricks.
- Implement data security and governance policies within the Azure Databricks environment, ensuring compliance with data regulations and company standards.
- Create and maintain comprehensive technical documentation for data pipelines, workflows, and configurations within Azure Databricks.
- Stay up to date with the latest features, updates, and best practices related to Azure Databricks and the broader Azure data ecosystem.
- Participate in code reviews and contribute to the team's knowledge sharing and best practices.
- Contribute to the design and architecture of our data platform on Azure.
Skills :
- Azure Databricks : Extensive hands-on experience (5 years) in designing, developing, and managing data solutions on the Azure Databricks platform, including :
- Utilizing Spark SQL and DataFrames for data manipulation and analysis.
- Implementing Structured Streaming for real-time data processing.
- Working with Delta Lake for building reliable data lakes.
- Managing Databricks clusters and optimizing cluster configurations.
- Utilizing Databricks notebooks and workflows for data engineering tasks.
- Apache Spark : Deep understanding of Apache Spark architecture, core concepts (RDDs, DataFrames, Datasets), and programming models.
- PySpark : Strong proficiency in PySpark for developing data processing applications and interacting with Spark within the Azure Databricks environment.
- SQL : Excellent SQL skills with the ability to write complex queries, perform data analysis, and optimize query performance, including experience with Spark SQL.
- Azure Data Services : Proven experience in integrating Azure Databricks with other Azure data services, including :
- Azure Data Lake Storage (ADLS Gen2) : Experience in storing and accessing large datasets.
- Azure Synapse Analytics : Familiarity with data warehousing and analytical capabilities.
- Azure SQL Database : Experience in connecting and interacting with relational databases.
- Azure Data Factory : Understanding of data orchestration and ETL/ELT processes.
- Azure Event Hubs/Event Grid : Experience with real-time data ingestion and event-driven architectures.
- Python : Strong proficiency in Python programming, including experience with data manipulation libraries (e.g., Pandas) and building data pipelines.
- Data Warehousing Concepts : Solid understanding of data warehousing principles, dimensional modeling, and ETL/ELT processes.
- Performance Tuning : Proven ability to analyze and optimize Spark and Databricks jobs for performance, scalability, and cost-efficiency.
- Version Control : Proficiency with Git and experience with collaborative version control workflows.
Good to Have Skills :
- Experience with Scala programming.
- Knowledge of data governance and data quality frameworks within the Azure ecosystem.
- Experience with implementing CI/CD pipelines for Azure Databricks using Azure DevOps or other tools.
- Familiarity with data visualization tools such as Power BI or Tableau.
- Experience with other big data technologies (e.g., Hadoop, Kafka).
- Azure certifications (e.g., Azure Data Engineer Associate).
- Experience with infrastructure-as-code tools (e.g., Terraform, ARM templates).
Qualifications :
- Bachelor's degree in Computer Science, Engineering, or a related field.
- 7 years of professional experience as a Data Engineer with a strong focus on Azure Databricks.
- Proven experience in designing, building, and deploying data pipelines and analytics solutions on the Azure platform, with a significant emphasis on Azure Databricks.
- Strong analytical and problem-solving skills with the ability to troubleshoot complex data engineering challenges.
- Excellent verbal and written communication skills for effective collaboration with team members and stakeholders.
- Ability to work independently and as part of a collaborative team in a fast-paced environment.
(ref:hirist.tech)
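To give candidates a concrete sense of the data-quality and validation work described above, here is a minimal, hypothetical sketch in plain Python. In a real pipeline these checks would be expressed as PySpark DataFrame filters running inside Azure Databricks before loading into Delta Lake; the field names, rules, and helper functions here (`validate_row`, `split_valid_invalid`, `order_id`, etc.) are illustrative assumptions, not part of any actual codebase for this role.

```python
# Sketch of row-level data-quality checks of the kind a Databricks
# pipeline might apply before loading records into Delta Lake.
# All field names and validation rules are illustrative assumptions.

def validate_row(row: dict) -> list[str]:
    """Return a list of validation errors for one record (empty list = valid)."""
    errors = []
    if not row.get("order_id"):
        errors.append("missing order_id")
    amount = row.get("amount")
    if not isinstance(amount, (int, float)) or amount < 0:
        errors.append("amount must be a non-negative number")
    if row.get("currency") not in {"INR", "USD", "EUR"}:
        errors.append("unknown currency")
    return errors


def split_valid_invalid(rows):
    """Partition records into (valid, quarantined) lists, mimicking the
    good-path / bad-path split commonly used in ETL quality gates."""
    valid, quarantined = [], []
    for row in rows:
        errs = validate_row(row)
        if errs:
            quarantined.append((row, errs))
        else:
            valid.append(row)
    return valid, quarantined


rows = [
    {"order_id": "A1", "amount": 250.0, "currency": "INR"},
    {"order_id": "", "amount": -5, "currency": "XYZ"},
]
good, bad = split_valid_invalid(rows)
print(len(good), len(bad))  # → 1 1
```

In a PySpark version, the same split would typically be two `DataFrame.filter` calls over a shared validity expression, with the quarantined rows written to a separate "bad records" table for inspection.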
Location: Mumbai, IN
Posted Date: 4/14/2025
Contact Information
Contact: Human Resources, Digihelic Solutions Private Limited