Citi Data Engineer Interview Questions & Answers (2026)

Practice the 39 most asked data engineering questions at Citi. Covers Spark/Big Data, SQL, General/Other and more.

Why Citi Tests These Questions

Citi is known for rigorous data engineering interviews that focus on practical, production-level knowledge. Of the 39 questions in our vault, the largest category is Spark/Big Data (16 questions). By difficulty, the set breaks down into 14 easy, 11 medium, and 14 hard questions. Expect system design and optimization questions at senior levels.

Top 5 Most Asked Questions at Citi

- **Q1**: What is the difference between repartition and coalesce in Apache Spark?
- **Q2**: What is the difference between SparkSession and SparkContext in Spark?
- **Q3**: What is the difference between partitioning and bucketing in Spark, and when would you use bucketing?
- **Q4**: What strategies can you use to handle skewed data in Spark?
- **Q5**: What is the difference between Managed and External tables in Hive/Spark?

Category Breakdown for Citi Interviews

- **Spark/Big Data**: 16 questions
- **General/Other**: 9 questions
- **SQL**: 8 questions
- **Python/Coding**: 3 questions
- **System Design/Architecture**: 3 questions

How to Prepare

Focus on Spark/Big Data questions first, as they dominate Citi's interview pattern. Practice the most frequently asked questions, then move to adjacent categories such as SQL and General/Other. For senior roles, expect one or two system design rounds.
