Experienced with Spark framework and related tools (PySpark, SparkR, Spark SQL, Spark UI)
Solid understanding of performance-tuning concepts for Apache Spark jobs and of handling large data volumes from various sources and formats
Ability to optimize Spark code, identify bottlenecks, and understand scalability requirements
Experienced with data wrangling and creating workable datasets; comfortable working with file formats such as Parquet, ORC, and SequenceFiles, and serialization formats such as Avro
Solid experience with AWS EMR clusters and cluster configuration, including auto-scaling
Comfortable deploying and managing Spark applications on YARN
Comfortable using Spark cluster monitoring tools and integrations
Comfortable with shell scripting, cron automation, and regular expressions
Comfortable with cloud systems and AWS big data services
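As a concrete illustration of the shell-scripting/cron/regex comfort level described above, here is a minimal sketch (the function name and pattern are illustrative assumptions, not part of this posting) that syntactically validates a standard five-field cron schedule using Python's standard-library `re` module:

```python
import re

# One alternative per legal token form in a five-field cron entry:
# "*", "*/step", a number, a range "a-b", or a comma list of those.
FIELD = r"(\*(/\d+)?|\d+(-\d+)?(,\d+(-\d+)?)*)"
CRON_RE = re.compile(r"^\s*" + r"\s+".join([FIELD] * 5) + r"\s*$")

def looks_like_cron(schedule: str) -> bool:
    """Cheap syntactic check for a five-field cron schedule.

    Purely lexical: it checks token shapes only and does not
    range-check values (e.g. that a minute falls in 0-59).
    """
    return CRON_RE.match(schedule) is not None
```

For example, `looks_like_cron("0 */2 * * 1-5")` accepts a "every two hours on weekdays" schedule, while a four-field string or free-form text is rejected.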
Preferred Qualifications
Experienced with spark-jobserver and with fine-tuning its configuration
Familiar with Spark-on-Kubernetes (for example, on Amazon EKS) and EMR on EKS approaches
Quick learner
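For orientation, a Spark-on-Kubernetes deployment of the kind mentioned above is typically driven through `spark-submit` against the cluster's API server. The sketch below is a config fragment only; the endpoint, namespace, container image, and jar path are placeholders, not values from this posting:

```shell
# Sketch: submit a Spark job to a Kubernetes (e.g. EKS) cluster.
# <k8s-apiserver>, the ECR image, and the jar path are placeholders.
spark-submit \
  --master k8s://https://<k8s-apiserver>:443 \
  --deploy-mode cluster \
  --name example-spark-job \
  --conf spark.kubernetes.namespace=spark-jobs \
  --conf spark.kubernetes.container.image=<account>.dkr.ecr.<region>.amazonaws.com/spark:latest \
  --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
  --conf spark.executor.instances=4 \
  local:///opt/spark/examples/jars/spark-examples.jar
```

With EMR on EKS, the same idea is expressed as a job run submitted to a virtual cluster rather than a raw `spark-submit`, but the Spark configuration properties carry over.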
Responsibilities
Develop Spark applications using Scala and Python
Participate in research and development of Big Data related topics
Participate in enhancements of Big Data infrastructure (scaling, performance improvements)
Solve complex problems using the most appropriate polyglot architecture
Follow best engineering practices and the quality criteria set for the project
Perform cost/benefit analysis of different tools and technologies
Document processes, features, architecture, and implementation details
Monitor and maintain production systems
Write clean and correct code
Collaborate with the customer support team to diagnose and resolve defects