TalentXo
Lead Big Data Engineer - PySpark/Python
Job Location
Bangalore, India
Job Description
What You'll Be Doing:
- Build highly scalable, available, fault-tolerant distributed data processing systems (batch and streaming) that process hundreds of terabytes of data ingested every day, along with a petabyte-sized data warehouse and Elasticsearch cluster.
- Build quality data solutions and refine existing diverse datasets into simplified models that encourage self-service.
- Build data pipelines that optimize for data quality and are resilient to poor-quality data sources.
- Own the data mapping, business logic, transformations, and data quality.
- Perform low-level systems debugging, performance measurement, and optimization on large production clusters.
- Participate in architecture discussions, influence the product roadmap, and take ownership of and responsibility for new projects.
- Maintain and support existing platforms and evolve them to newer technology stacks and architectures.

Ideal Candidate:
- Proficiency in Python and PySpark.
- Deep understanding of Apache Spark, Spark tuning, creating RDDs, and building DataFrames.
- Experience in big data technologies like HDFS, YARN, MapReduce, Hive, Kafka, Spark, Airflow, Presto, etc.
- Experience in building distributed environments using any of Kafka, Spark, Hive, Hadoop, etc.
- Good understanding of the architecture and functioning of distributed database systems.
- Experience working with various file formats like Parquet, Avro, etc., for large volumes of data.
- Experience with one or more NoSQL databases.
- Experience with AWS, GCP.
- 5 years of professional experience as a data or software engineer.

(ref:hirist.tech)
Location: Bangalore, IN
Posted Date: 2/19/2025
Contact Information
Contact: Human Resources, TalentXo