Data engineering interview questions
Explain the benefits of using columnar storage formats like Parquet or ORC.
Explain the concept of RDD, DataFrame, and Dataset in PySpark.
Explain the concept of consumer groups in Kafka. How do they affect message processing?
Explain the concept of preemptible VMs in Dataproc and their cost implications.
Explain how to configure a Spark cluster for optimal performance.
Explain the difference between TriggerDagRunOperator and ExternalTaskSensor in Airflow.
Explain the difference between coalescing and repartitioning in Spark.
Explain the differences between Spark's shuffle and broadcast join. When would you use each?