Digihelic Solutions Private Limited

DigiHelic Solutions - Azure Databricks Engineer - Big Data/Hadoop

Job Location

Mumbai, India

Job Description

Responsibilities:

- Design, develop, and deploy robust and scalable data pipelines using Azure Databricks, Apache Spark, and PySpark.
- Build and optimize ETL/ELT processes to ingest, transform, and load data from various sources into the Azure data lake and data warehouse.
- Utilize Delta Lake within Azure Databricks to ensure data reliability, consistency, and performance.
- Integrate Azure Databricks with other Azure data services, including Azure Data Lake Storage (ADLS Gen2), Azure Synapse Analytics, Azure SQL Database, Azure Data Factory, and Azure Event Hubs/Event Grid.
- Write and optimize complex Spark SQL queries for data analysis and transformation.
- Implement data quality checks, data validation, and monitoring mechanisms within the data pipelines.
- Collaborate closely with data scientists, data analysts, and business stakeholders to understand their data requirements and provide efficient, scalable data solutions.
- Monitor and troubleshoot data pipeline performance, identify bottlenecks, and implement optimizations to ensure efficient processing and resource utilization within Azure Databricks.
- Implement data security and governance policies within the Azure Databricks environment, ensuring compliance with data regulations and company standards.
- Create and maintain comprehensive technical documentation for data pipelines, workflows, and configurations within Azure Databricks.
- Stay up to date with the latest features, updates, and best practices related to Azure Databricks and the broader Azure data ecosystem.
- Participate in code reviews and contribute to the team's knowledge sharing and best practices.
- Contribute to the design and architecture of our data platform on Azure.

Skills:

- Azure Databricks: Extensive hands-on experience (5 years) in designing, developing, and managing data solutions on the Azure Databricks platform, including:
  - Utilizing Spark SQL and DataFrames for data manipulation and analysis.
  - Implementing structured streaming for real-time data processing.
  - Working with Delta Lake for building reliable data lakes.
  - Managing Databricks clusters and optimizing cluster configurations.
  - Utilizing Databricks notebooks and workflows for data engineering tasks.
- Apache Spark: Deep understanding of Apache Spark architecture, core concepts (RDDs, DataFrames, Datasets), and programming models.
- PySpark: Strong proficiency in PySpark for developing data processing applications and interacting with Spark within the Azure Databricks environment.
- SQL: Excellent SQL skills, with the ability to write complex queries, perform data analysis, and optimize query performance, including experience with Spark SQL.
- Azure Data Services: Proven experience in integrating Azure Databricks with other Azure data services, including:
  - Azure Data Lake Storage (ADLS Gen2): Experience in storing and accessing large datasets.
  - Azure Synapse Analytics: Familiarity with data warehousing and analytical capabilities.
  - Azure SQL Database: Experience in connecting to and interacting with relational databases.
  - Azure Data Factory: Understanding of data orchestration and ETL/ELT processes.
  - Azure Event Hubs/Event Grid: Experience with real-time data ingestion and event-driven architectures.
- Python: Strong proficiency in Python programming, including experience with data manipulation libraries (e.g., Pandas) and building data pipelines.
- Data Warehousing Concepts: Solid understanding of data warehousing principles, dimensional modeling, and ETL/ELT processes.
- Performance Tuning: Proven ability to analyze and optimize Spark and Databricks jobs for performance, scalability, and cost-efficiency.
- Version Control: Proficiency with Git and experience working with collaborative version control workflows.

Good to Have Skills:

- Experience with Scala programming.
- Knowledge of data governance and data quality frameworks within the Azure ecosystem.
- Experience implementing CI/CD pipelines for Azure Databricks using Azure DevOps or other tools.
- Familiarity with data visualization tools such as Power BI or Tableau.
- Experience with other big data technologies (e.g., Hadoop, Kafka).
- Azure certifications (e.g., Azure Data Engineer Associate).
- Experience with infrastructure-as-code tools (e.g., Terraform, ARM templates).

Qualifications:

- Bachelor's degree in Computer Science, Engineering, or a related field.
- 7 years of professional experience as a Data Engineer with a strong focus on Azure Databricks.
- Proven experience in designing, building, and deploying data pipelines and analytics solutions on the Azure platform, with a significant emphasis on Azure Databricks.
- Strong analytical and problem-solving skills, with the ability to troubleshoot complex data engineering challenges.
- Excellent verbal and written communication skills to collaborate effectively with team members and stakeholders.
- Ability to work independently and as part of a collaborative team in a fast-paced environment.

(ref:hirist.tech)
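To give candidates a concrete sense of the "data quality checks and data validation" responsibility above, here is a minimal, hypothetical sketch in plain Python (no Spark dependency). The schema, field names, and rules are illustrative assumptions only; in an actual Azure Databricks pipeline, equivalent checks would typically be expressed as PySpark DataFrame filters or Delta Lake table constraints.

```python
# Illustrative sketch only: dependency-free version of the kind of row-level
# data-quality checks a pipeline might apply before loading. The schema and
# business rule below are hypothetical, not from any real codebase.

EXPECTED_SCHEMA = {"order_id": int, "amount": float, "region": str}

def validate_row(row: dict) -> list[str]:
    """Return a list of human-readable problems found in one record."""
    problems = []
    for field, expected_type in EXPECTED_SCHEMA.items():
        if field not in row or row[field] is None:
            problems.append(f"missing field: {field}")
        elif not isinstance(row[field], expected_type):
            problems.append(f"bad type for {field}: {type(row[field]).__name__}")
    # Example business rule (assumed): amounts must be non-negative.
    if isinstance(row.get("amount"), float) and row["amount"] < 0:
        problems.append("amount must be non-negative")
    return problems

def split_valid_invalid(rows):
    """Partition records into (valid, quarantined) before downstream loading."""
    valid, quarantined = [], []
    for row in rows:
        (valid if not validate_row(row) else quarantined).append(row)
    return valid, quarantined

if __name__ == "__main__":
    sample = [
        {"order_id": 1, "amount": 10.5, "region": "IN"},
        {"order_id": 2, "amount": -3.0, "region": "IN"},  # fails business rule
        {"order_id": 3, "region": "US"},                  # missing amount
    ]
    good, bad = split_valid_invalid(sample)
    print(len(good), len(bad))  # 1 valid record, 2 quarantined
```

The quarantine pattern shown here (routing failing records aside rather than dropping them) mirrors how production pipelines commonly preserve bad data for later inspection.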


Posted Date: 4/14/2025

Contact Information

Contact Human Resources
Digihelic Solutions Private Limited

UID: 5142818227
