How Zepto Scales to Millions of Daily Orders with DynamoDB

The Challenge with Monolithic Architecture

Zepto, the Indian quick-commerce pioneer launched in 2021, faced significant scaling challenges as it grew from a single micro-warehouse to processing millions of orders daily across 1,000+ stores. The company’s initial monolithic architecture, built on Amazon Aurora PostgreSQL-Compatible Edition, struggled to handle the data volumes and concurrent operations required for its lightning-fast delivery service.

The primary bottlenecks emerged in their Order Management System (OMS), where the Customer_Orders table accumulated billions of records over multiple years. This created several critical issues:

Expensive read operations: Monitoring jobs scanning for problematic transactions experienced severe performance degradation
Write latency issues: Concurrent status updates competing for table locks increased write latency
Autovacuum conflicts: Database maintenance operations interfered with peak-hour performance
Operational complexity: Extensive parameter tuning and maintenance consumed valuable engineering resources

Strategic Migration to DynamoDB

To address these challenges, Zepto adopted a hybrid architecture. Rather than completely replacing their Aurora PostgreSQL setup, they migrated specific use cases to Amazon DynamoDB, a fully managed, serverless NoSQL database that delivers single-digit millisecond performance at any scale.

The solution involved creating a new Draft-Order service that handles orders before payment confirmation, effectively separating payment-related operations from order fulfillment processes. This lightweight service accepts orders regardless of payment status and provides logical separation between payment updates and fulfillment workflows.

Key factors driving the DynamoDB adoption included:

Consistent single-digit millisecond performance regardless of scale
Serverless operational excellence with no maintenance windows or patching
Cost-effectiveness based on Zepto’s throughput requirements

Schema Design and Access Patterns

Designing a DynamoDB schema for performance requires knowing the query patterns up front. Zepto identified four critical access patterns:

Get order using ORDER_ID
Update order attributes using ORDER_ID
Retrieve orders with unsuccessful payments beyond specific timeframes
Monitor payment completion status

The Draft_Orders table uses ORDER_PK as the partition key (supporting both ORDER_ID and ORDER_CODE identifiers) and includes essential attributes like STATE, EXPIRE_AT, and compressed DATA snapshots. A Global Secondary Index (GSI) with POS_PK and POS_SK keys enables efficient querying of payment-pending orders while preventing hot partition issues through uniform distribution.
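The table and index described above can be sketched as plain request parameters in the shape that boto3's low-level DynamoDB client accepts. This is a minimal illustration, not Zepto's actual definition. The attribute names come from the description above. The index name, the string key types, on-demand billing, the key prefixes, the shard count, the one-hour expiry, and the use of zlib for the compressed snapshot are all assumptions made for the example. Hashing the order ID into a fixed number of shards is one common way to spread index writes evenly, and it may differ from how Zepto achieves uniform distribution.

```python
import hashlib
import json
import zlib

TABLE_NAME = "Draft_Orders"
GSI_NAME = "POS_INDEX"  # hypothetical index name
NUM_SHARDS = 16         # hypothetical shard count for the GSI partition key


def table_definition() -> dict:
    """Parameters shaped like a DynamoDB CreateTable request (assumed types)."""
    return {
        "TableName": TABLE_NAME,
        "AttributeDefinitions": [
            {"AttributeName": "ORDER_PK", "AttributeType": "S"},
            {"AttributeName": "POS_PK", "AttributeType": "S"},
            {"AttributeName": "POS_SK", "AttributeType": "S"},
        ],
        "KeySchema": [{"AttributeName": "ORDER_PK", "KeyType": "HASH"}],
        "GlobalSecondaryIndexes": [{
            "IndexName": GSI_NAME,
            "KeySchema": [
                {"AttributeName": "POS_PK", "KeyType": "HASH"},
                {"AttributeName": "POS_SK", "KeyType": "RANGE"},
            ],
            "Projection": {"ProjectionType": "KEYS_ONLY"},
        }],
        "BillingMode": "PAY_PER_REQUEST",  # assumption: on-demand capacity
    }


def shard_for(order_id: str) -> int:
    """Map an order ID to a stable shard number in [0, NUM_SHARDS)."""
    digest = hashlib.sha256(order_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % NUM_SHARDS


def draft_order_item(order_id: str, payload: dict, created_at: int,
                     ttl_seconds: int = 3600) -> dict:
    """Build a payment-pending item in DynamoDB's typed attribute format."""
    return {
        "ORDER_PK": {"S": f"ORDER_ID#{order_id}"},  # hypothetical prefix
        "STATE": {"S": "PAYMENT_PENDING"},          # hypothetical state name
        "EXPIRE_AT": {"N": str(created_at + ttl_seconds)},
        "DATA": {"B": zlib.compress(json.dumps(payload).encode("utf-8"))},
        "POS_PK": {"S": f"PENDING#{shard_for(order_id)}"},
        # Zero-padded epoch seconds keep string order equal to time order.
        "POS_SK": {"S": f"{created_at:010d}#{order_id}"},
    }


def stale_pending_queries(cutoff_epoch: int) -> list[dict]:
    """One Query request per shard for pending orders created before cutoff."""
    return [{
        "TableName": TABLE_NAME,
        "IndexName": GSI_NAME,
        "KeyConditionExpression": "POS_PK = :pk AND POS_SK < :sk",
        "ExpressionAttributeValues": {
            ":pk": {"S": f"PENDING#{shard}"},
            ":sk": {"S": f"{cutoff_epoch:010d}#"},
        },
    } for shard in range(NUM_SHARDS)]
```

With this layout, the first two access patterns are single-key reads and updates on ORDER_PK, while the payment-pending patterns fan out as one index query per shard. The cost of sharding is that a reader must issue NUM_SHARDS queries and merge the results.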

Production Rollout and Performance Optimization

During initial testing, Zepto discovered that DynamoDB transactions were causing 18-millisecond average latencies. By eliminating unnecessary synchronous operations and using separate PutItem API calls instead of transactions, they achieved sub-10-millisecond response times while reducing Write Capacity Unit consumption by 50%.
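The 50% figure is consistent with DynamoDB's documented capacity model, in which a standard write consumes one write capacity unit per 1 KB of item size (rounded up) and a transactional write consumes two. The arithmetic can be sketched as follows. The item size used here is a made-up example, and the source does not state that this pricing rule alone accounts for the whole reduction.

```python
import math


def write_capacity_units(item_size_bytes: int, transactional: bool) -> int:
    """WCUs consumed by one write under DynamoDB's documented pricing rule."""
    # Item size is rounded up to the next 1 KB; every write costs at least 1.
    units = max(math.ceil(item_size_bytes / 1024), 1)
    # Transactional writes are billed at twice the standard rate.
    return units * 2 if transactional else units


item_size = 1500  # hypothetical item size in bytes
in_transaction = write_capacity_units(item_size, transactional=True)
standalone = write_capacity_units(item_size, transactional=False)
saving = 1 - standalone / in_transaction
print(f"transactional={in_transaction} WCU, standard={standalone} WCU, "
      f"saving={saving:.0%}")
```

The trade-off is that separate PutItem calls are no longer atomic as a group, so this approach fits only where the application can tolerate or repair a partially applied set of writes.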

The gradual rollout started with 10% of traffic, expanded to 30%, and finally reached full production deployment. This careful approach ensured system stability while validating performance improvements.
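A staged rollout like this is often implemented by hashing a stable key into a percentage bucket, so that a given order or customer always takes the same path and raising the percentage only adds traffic to the new path. The sketch below illustrates that general technique. The function names and the choice of key are hypothetical, and the source does not describe how Zepto actually split its traffic.

```python
import hashlib

ROLLOUT_STAGES = (10, 30, 100)  # percentages taken from the rollout above


def bucket_for(key: str) -> int:
    """Map a stable key (for example, an order ID) to a bucket in [0, 100)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % 100


def use_dynamodb_path(key: str, rollout_percent: int) -> bool:
    """Return True when this key should be served by the new path."""
    return bucket_for(key) < rollout_percent


if __name__ == "__main__":
    sample_keys = [f"order-{i}" for i in range(10_000)]
    for percent in ROLLOUT_STAGES:
        routed = sum(use_dynamodb_path(k, percent) for k in sample_keys)
        print(f"{percent}% stage routes {routed} of {len(sample_keys)} keys")
```

Because the bucket depends only on the key, every key routed at the 10% stage stays routed at 30% and 100%, which keeps behavior consistent for any single order across stages.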

Measurable Business Impact

The DynamoDB migration delivered substantial improvements:

60% faster Create Order API performance on average, with 40% improvement at p99
Consistent single-digit millisecond performance across all operations
Reduced operational overhead by eliminating database maintenance tasks
Enhanced scalability supporting traffic variations from quiet periods to high-demand festivals

These improvements enable Zepto to maintain their promise of ultra-fast delivery while handling exponential growth in order volumes.

Visit the original AWS blog post for more detailed technical information.