ATech
Spark/PySpark Developer - Python Programming
Job Location
India
Job Description
Job Profile: Spark (PySpark) Developer
Industry Type: IT Services
Job description:
- The developer must have sound knowledge of Apache Spark and Python programming.
- Deep experience in developing data processing tasks using PySpark, such as reading data from external sources, merging data, performing data enrichment, and loading into target data destinations.
- Experience in deploying and operationalizing code is an added advantage.
- Knowledge and skills in DevOps, version control, and containerization.
- Deployment knowledge is preferable.
- Create Spark jobs for data transformation and aggregation.
- Produce unit tests for Spark transformations and helper methods.
- Write Scaladoc-style documentation with all code.
- Design data processing pipelines to perform batch and real-time/stream analytics on structured and unstructured data.
- Spark query tuning and performance optimization.
- Good understanding of different file formats (ORC, Parquet, Avro) and compression techniques to optimize queries and processing.
- SQL database integration (Microsoft SQL Server, Oracle, Postgres, and/or MySQL).
- Experience working with HDFS, S3, Cassandra, and/or DynamoDB.
- Deep understanding of distributed systems (e.g. CAP theorem, partitioning, replication, consistency, and consensus).
- Experience in building scalable, high-performance cloud data lake solutions.
- Hands-on expertise in cloud services such as AWS and/or Microsoft Azure.
- As a Spark developer, you will manage the development of the scalable distributed architecture defined by the architect or tech lead on our team.
- Analyse and assemble large data sets designed to meet functional and non-functional requirements.
- Develop ETL scripts for big data sources.
- Identify, design, and optimise data processing automation for reports and dashboards.
- Be responsible for workflow, data, and ETL optimizations as per the requirements elucidated by the team.
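The read, merge, enrich, and load steps described above would in practice be written with PySpark DataFrame operations (spark.read, join, withColumn, write). To keep the illustration self-contained without a Spark cluster, here is a minimal pure-Python sketch of the same pattern; the sample records, field names, and the `enrich` rule are all hypothetical:

```python
# Sketch of a read -> merge -> enrich -> load pipeline.
# In a real PySpark job these steps map to spark.read, DataFrame.join,
# withColumn, and DataFrame.write; plain dicts keep the example runnable here.

# "Read" from two external sources (hypothetical sample data).
orders = [
    {"order_id": 1, "customer_id": 10, "amount": 250.0},
    {"order_id": 2, "customer_id": 11, "amount": 75.5},
]
customers = [
    {"customer_id": 10, "region": "EU"},
    {"customer_id": 11, "region": "US"},
]

def merge(orders, customers):
    """Inner-join orders with customers on customer_id."""
    by_id = {c["customer_id"]: c for c in customers}
    return [
        {**o, **by_id[o["customer_id"]]}
        for o in orders
        if o["customer_id"] in by_id
    ]

def enrich(rows):
    """Data enrichment: add a derived high-value flag column."""
    return [{**r, "high_value": r["amount"] >= 100.0} for r in rows]

def load(rows, target):
    """'Load' into the target destination (here, just a list)."""
    target.extend(rows)

target = []
load(enrich(merge(orders, customers)), target)
```

In PySpark the same shape would be a `df_orders.join(df_customers, "customer_id")` followed by a `withColumn` and a `write` to the target destination.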
- Work with stakeholders such as Product Managers, Technical Leads, and Service Layer engineers to ensure end-to-end requirements are addressed.
- Strong team player who adheres to the Software Development Life Cycle (SDLC) and produces the documentation needed to represent every stage of the SDLC.
- Hands-on working experience with any of the data engineering/analytics platforms (Hortonworks, Cloudera, MapR, AWS); AWS preferred.
- Hands-on experience with data ingestion tools: Apache NiFi, Apache Airflow, Sqoop, and Oozie.
- Hands-on working experience of data processing at scale with event-driven systems and message queues (Kafka, Flink, Spark Streaming).
- Hands-on working experience with AWS services such as EMR, Kinesis, S3, CloudFormation, Glue, API Gateway, and Lake Formation.
- Hands-on working experience with AWS Athena.
- Data warehouse exposure to Apache NiFi, Apache Airflow, and Kylo.
- Operationalization of ML models on AWS (e.g. deployment, scheduling, model monitoring, etc.).
- Feature engineering and data processing to be used for model development.
- Experience gathering and processing raw data at scale (including writing scripts, web scraping, calling APIs, writing SQL queries, etc.).
- Experience building data pipelines for structured/unstructured data, real-time/batch processing, and synchronous/asynchronous events using MQ, Kafka, and stream processing.
- Hands-on working experience in analysing source system data and data flows, working with structured and unstructured data.
- Must be very strong in writing SQL queries. (ref:hirist.tech)
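Since strong SQL is called out explicitly, a small sketch of the kind of aggregation query involved, using Python's built-in sqlite3 so it runs anywhere; the `events` table, its columns, and the sample rows are invented for the example (a real role would target SQL Server, Oracle, Postgres, or MySQL):

```python
import sqlite3

# In-memory database standing in for a real SQL backend.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (user_id INTEGER, event_type TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [(1, "purchase", 30.0), (1, "purchase", 20.0), (2, "refund", -5.0)],
)

# Aggregate spend per user, keeping only purchase events.
rows = conn.execute(
    """
    SELECT user_id, COUNT(*) AS n, SUM(amount) AS total
    FROM events
    WHERE event_type = 'purchase'
    GROUP BY user_id
    ORDER BY total DESC
    """
).fetchall()
```

The same SELECT could be run unchanged through Spark SQL (`spark.sql(...)`) against a registered DataFrame view.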
Location: in, IN
Posted Date: 10/9/2024
Contact Information
Contact: Human Resources, ATech