Spark/Big Data

Data engineering interview questions

Questions by difficulty: Easy 80+, Medium 80+, Hard 280+

Explain the benefits of using columnar storage formats like Parquet or ORC.

Spark/Big Data (hard)

Explain the concept of RDD, DataFrame, and Dataset in PySpark.

Spark/Big Data (hard)

Explain the concept of consumer groups in Kafka. How do they affect message processing?

Spark/Big Data (hard)

Explain the concept of preemptible VMs in Dataproc and their cost implications.

Spark/Big Data (hard)

Explain how to configure a Spark cluster for optimal performance.

Spark/Big Data (hard)

Explain the difference between TriggerDagRunOperator and ExternalTaskSensor in Airflow.

Spark/Big Data (hard)

Explain the difference between coalescing and repartitioning in Spark.

Spark/Big Data (hard)

Explain the differences between shuffle joins and broadcast joins in Spark. When would you use each?

Spark/Big Data (hard)


