PySpark

Chennai | Quality Assurance | Full-Time

We are seeking a highly skilled PySpark Data Engineer with at least 8 years of relevant experience. In this role, you will design and implement data processing solutions using the PySpark framework.

Required Skills:

  • Strong PySpark skills
  • Hands-on experience with the MWAA (Airflow) / AWS EMR (Hadoop, Hive) frameworks
  • Hands-on working knowledge of Python
  • Knowledge of AWS services such as EMR, S3, Lambda, Step Functions, and Aurora (RDS)
  • Good knowledge of RDBMS and SQL
  • Ability to work as an individual contributor
  • Experience converting large data sets from RDBMS to NoSQL
  • Experience building data lakes and configuring Delta tables
  • Good experience with compute and cost optimization
  • Ability to understand the environment and use case and to build holistic frameworks
  • Good communication skills for interacting with IT stakeholders and the business

Key Responsibilities:

  • Designing and implementing data processing pipelines using the PySpark framework.
  • Developing efficient and scalable Spark applications to process large volumes of data.
  • Collaborating with data engineers and data scientists to understand business requirements and translate them into technical solutions.
  • Optimizing Spark jobs for performance and resource utilization.
  • Troubleshooting and debugging issues in Spark applications.
  • Writing clean and maintainable code following best practices and coding standards.
  • Working with big data technologies such as Hadoop, Hive, and Kafka.
  • Implementing data transformation, aggregation, and analysis tasks using Spark SQL and DataFrame APIs.
  • Documenting technical specifications, design documents, and code changes.
  • Keeping abreast of the latest developments in big data technologies and incorporating them into the development process.
