Steerlean Consulting Services Pvt Ltd.
Senior Data Engineer - Azure Databricks/PySpark
Job Location
in, India
Job Description
We are looking for a highly skilled Databricks PySpark Developer to join our data platform implementation team. In this role, you will be instrumental in designing, developing, and maintaining ETL processes that efficiently extract, transform, and load data from various sources into the data lake and data warehouse. You will work closely with data engineers, data scientists, and business intelligence teams to build and optimize data workflows that support the project's analytics and reporting needs.

Must-Have Skills:
- AWS Glue (Crawler, Data Catalog).
- Python/PySpark.
- Cloud Skills:
  - Experience with Snowflake and its architecture (internal/external tables, stages, masking policies).
  - Knowledge of AWS services such as SNS, S3, Lambda, Secrets Manager, and Athena.
  - Familiarity with Jira, GitHub, and Agile methodology.

Key Responsibilities:
1. ETL Development:
- Design and develop ETL processes using Databricks PySpark to extract, transform, and load data from heterogeneous sources into our data lake and data warehouse.
- Optimize ETL workflows for performance and scalability, using Databricks PySpark and Spark SQL to process large data volumes efficiently.
- Implement robust error handling and monitoring to detect and resolve issues in ETL processes proactively.
- Design and implement data solutions following Medallion Architecture principles, organizing data into Bronze, Silver, and Gold layers.
- Ensure data is appropriately cleansed, enriched, and optimized at each layer to support analytics and reporting.
2. Data Pipeline Management:
- Hands-on experience creating advanced data pipelines using Databricks Workflows.
- Develop and maintain data pipelines using Databricks PySpark, ensuring data quality, integrity, and reliability throughout the ETL lifecycle.
- Collaborate with data engineering, data science, and business intelligence teams to translate data requirements into efficient ETL workflows and pipelines.
3. Data Analysis and Query Optimization:
- Write and optimize complex SQL queries for data manipulation, aggregation, and analysis within Databricks PySpark applications.
4. Project Coordination and Continuous Improvement:
- Participate in project planning and coordination to ensure timely delivery of ETL solutions.
- Stay current with developments in Databricks PySpark, Spark SQL, and related technologies, and recommend and implement best practices and optimizations.
- Document ETL processes, data lineage, and metadata to facilitate knowledge sharing and ensure compliance with data governance standards.
5. Cloud Platform Expertise:
- Use cloud platforms (e.g., AWS, Azure, or GCP) to design and deploy scalable, reliable SaaS solutions.
- Optimize infrastructure for performance, security, and cost efficiency.
Location: in, IN
Posted Date: 11/20/2024
Contact Information
Contact: Human Resources, Steerlean Consulting Services Pvt Ltd.