ATech

Spark/PySpark Developer - Python Programming


Job Location

India

Job Description

Job Profile: Spark (PySpark) Developer
Industry Type: IT Services

Job description:

- The developer must have sound knowledge of Apache Spark and Python programming.
- Deep experience developing data processing tasks in PySpark, such as reading data from external sources, merging data, performing data enrichment, and loading into target data destinations.
- Experience in deploying and operationalizing code is an added advantage.
- Knowledge and skills in DevOps, version control, and containerization.
- Deployment knowledge preferable.
- Create Spark jobs for data transformation and aggregation.
- Produce unit tests for Spark transformations and helper methods.
- Write Scaladoc-style documentation for all code.
- Design data processing pipelines that perform batch and real-time/stream analytics on structured and unstructured data.
- Spark query tuning and performance optimization.
- Good understanding of different file formats (ORC, Parquet, Avro) and compression techniques for optimizing queries and processing.
- SQL database integration (Microsoft SQL Server, Oracle, Postgres, and/or MySQL).
- Experience working with HDFS, S3, Cassandra, and/or DynamoDB.
- Deep understanding of distributed systems (e.g. CAP theorem, partitioning, replication, consistency, and consensus).
- Experience building scalable, high-performance cloud data lake solutions.
- Hands-on expertise in cloud services such as AWS and/or Microsoft Azure.
- As a Spark developer, you will manage the development of the scalable distributed architecture defined by the architect or tech lead on our team.
- Analyse and assemble large data sets to meet functional and non-functional requirements.
- You will develop ETL scripts for big data sources.
- Identify, design, and optimize automated data processing for reports and dashboards.
- You will be responsible for workflow optimization, data optimization, and ETL optimization as per the requirements elucidated by the team.
- Work with stakeholders such as product managers, technical leads, and service-layer engineers to ensure end-to-end requirements are addressed.
- Strong team player who adheres to the Software Development Life Cycle (SDLC) and produces the documentation needed to represent every stage of the SDLC.
- Hands-on working experience on any of the data engineering/analytics platforms (Hortonworks, Cloudera, MapR, AWS); AWS preferred.
- Hands-on experience with data ingestion tools: Apache NiFi, Apache Airflow, Sqoop, and Oozie.
- Hands-on working experience with data processing at scale using event-driven systems and message queues (Kafka, Flink, Spark Streaming).
- Hands-on working experience with AWS services such as EMR, Kinesis, S3, CloudFormation, Glue, API Gateway, and Lake Formation.
- Hands-on working experience with AWS Athena.
- Data warehouse exposure to Apache NiFi, Apache Airflow, and Kylo.
- Operationalization of ML models on AWS (e.g. deployment, scheduling, model monitoring, etc.).
- Feature engineering and data processing to be used for model development.
- Experience gathering and processing raw data at scale (including writing scripts, web scraping, calling APIs, and writing SQL queries).
- Experience building data pipelines for structured/unstructured, real-time/batch, and synchronous/asynchronous events using MQ, Kafka, and stream processing.
- Hands-on working experience in analysing source system data and data flows, working with structured and unstructured data.
- Must be very strong in writing SQL queries. (ref:hirist.tech)


Posted Date: 10/9/2024

Contact Information

Contact Human Resources
ATech

UID: 4880834221
