Scaling Apache Iceberg Tables with AWS Lake Formation Hybrid Access Mode

Understanding Apache Iceberg and Hybrid Access Mode

Apache Iceberg has become a preferred table format for enterprises because it supports ACID transactions, schema evolution, and change data capture (CDC) workloads. These features suit large datasets that receive new records at a high rate.

AWS Lake Formation provides centralized data access management with fine-grained permissions. While many organizations want to leverage Lake Formation for read access control, they often need to maintain IAM policy-based permissions for write operations, particularly for schema updates and data upserts.

What is Hybrid Access Mode?

Lake Formation’s hybrid access mode creates a bridge between these two permission models. It allows you to register Amazon S3 data locations containing Iceberg tables with Lake Formation while enabling a dual-permission approach:

  • Lake Formation permissions for read access and fine-grained control
  • IAM policy-based permissions for write operations

This approach prevents disruptions to existing IAM policy-based workflows while extending scalable read access to new users through Lake Formation.
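The dual-permission idea can be pictured with a toy model. The sketch below is not an AWS API; the class name, role names, and table name are all hypothetical, and it deliberately simplifies the real evaluation logic. It only shows the core rule: a principal that has been opted in for a resource is governed by Lake Formation permissions, while everyone else stays on IAM policies.

```python
from dataclasses import dataclass, field


@dataclass
class HybridCatalog:
    """Toy model of hybrid access mode; illustrative only, not an AWS API."""

    # (principal, resource) pairs that were opted in to Lake Formation
    opt_ins: set = field(default_factory=set)

    def opt_in(self, principal: str, resource: str) -> None:
        """Record that Lake Formation permissions apply to this pair."""
        self.opt_ins.add((principal, resource))

    def governing_model(self, principal: str, resource: str) -> str:
        """Return which permission model decides access for this pair."""
        if (principal, resource) in self.opt_ins:
            return "lake_formation"
        return "iam_policy"


catalog = HybridCatalog()
catalog.opt_in("analyst-role", "sales_db.orders")  # hypothetical names

print(catalog.governing_model("analyst-role", "sales_db.orders"))  # lake_formation
print(catalog.governing_model("etl-role", "sales_db.orders"))      # iam_policy
```

The ETL role is never opted in, so its existing IAM-based write path is untouched when analysts are added.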

Key Benefits of Hybrid Access Mode

  • Eliminating Data Replication: Provides various access levels for different user personas without creating multiple data copies
  • Minimizing Disruption: Adds Lake Formation users with minimal impact to existing IAM policy-based users
  • Supporting Transactional Writes: Accommodates operations like insert, update, and delete that may not be supported by all analytics engines for Lake Formation managed tables

Implementation Architecture

The typical implementation involves maintaining ETL applications with IAM role-based access for write operations while providing data analysts with Lake Formation permissions for read access.

The high-level setup involves:

  • Ensuring IAMAllowedPrincipals has Super access to the database and tables
  • Registering data locations with Lake Formation in hybrid access mode
  • Granting the data location permission (DATA_LOCATION_ACCESS) to the IAM roles that manage the tables
  • Adding appropriate permissions (like SELECT) to analyst roles in Lake Formation
  • Opting in specific users to make Lake Formation permissions effective
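The steps above can be sketched as a sequence of calls on the boto3 Lake Formation client. This is a minimal sketch rather than a definitive setup: the helper names, ARNs, database, and table are hypothetical, and the grant to IAMAllowedPrincipals is shown for the table only (a similar grant would cover the database). The requests are built as plain dictionaries so they can be reviewed before anything is sent.

```python
def hybrid_setup_calls(location_arn, register_role_arn, etl_role_arn,
                       analyst_role_arn, database, table):
    """Build (method_name, kwargs) pairs for a boto3 'lakeformation' client."""
    table_resource = {"Table": {"DatabaseName": database, "Name": table}}
    analyst = {"DataLakePrincipalIdentifier": analyst_role_arn}
    return [
        # 1. Keep IAM-based access working: Super (ALL) for IAMAllowedPrincipals.
        ("grant_permissions", {
            "Principal": {"DataLakePrincipalIdentifier": "IAM_ALLOWED_PRINCIPALS"},
            "Resource": table_resource,
            "Permissions": ["ALL"],
        }),
        # 2. Register the S3 location in hybrid access mode.
        ("register_resource", {
            "ResourceArn": location_arn,
            "RoleArn": register_role_arn,
            "HybridAccessEnabled": True,
        }),
        # 3. Data location permission for the role that manages the table.
        ("grant_permissions", {
            "Principal": {"DataLakePrincipalIdentifier": etl_role_arn},
            "Resource": {"DataLocation": {"ResourceArn": location_arn}},
            "Permissions": ["DATA_LOCATION_ACCESS"],
        }),
        # 4. Read permission for the analyst role.
        ("grant_permissions", {
            "Principal": analyst,
            "Resource": table_resource,
            "Permissions": ["SELECT"],
        }),
        # 5. Opt the analyst in so the Lake Formation grant takes effect.
        ("create_lake_formation_opt_in", {
            "Principal": analyst,
            "Resource": table_resource,
        }),
    ]


def apply_calls(client, calls):
    """Send each prepared request through the given client, in order."""
    for method, kwargs in calls:
        getattr(client, method)(**kwargs)


# Hypothetical account ID, roles, and location, for illustration only.
calls = hybrid_setup_calls(
    "arn:aws:s3:::example-bucket/iceberg",
    "arn:aws:iam::111122223333:role/example-register-role",
    "arn:aws:iam::111122223333:role/example-etl-role",
    "arn:aws:iam::111122223333:role/example-analyst-role",
    "example_db",
    "example_table",
)
for method, _ in calls:
    print(method)
```

With boto3 installed and credentials configured, the requests could be sent with apply_calls(boto3.client("lakeformation"), calls). Building them separately from sending them makes it easy to inspect exactly which principals are being opted in.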

Technical Implementation Considerations

When implementing hybrid access mode, you’ll need to create appropriate IAM roles with specific permissions for both the ETL processes and data analysts. The ETL role requires permissions to interact with Amazon S3, AWS Glue, and possibly AWS KMS, while analyst roles need Glue catalog access and Lake Formation permissions.
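As a rough illustration of the two roles, the sketch below builds IAM policy documents as Python dictionaries. The bucket name, prefix, key ARN, and function names are hypothetical, and the action lists are an assumption about a reasonable minimum rather than the exact policies any particular walkthrough uses. The Glue statements use a wildcard resource for brevity and should be narrowed in practice.

```python
import json


def etl_policy(bucket, prefix, kms_key_arn=None):
    """IAM policy sketch for the ETL role: S3 data, Glue catalog, optional KMS."""
    statements = [
        {"Sid": "IcebergObjects", "Effect": "Allow",
         "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
         "Resource": f"arn:aws:s3:::{bucket}/{prefix}/*"},
        {"Sid": "ListBucket", "Effect": "Allow",
         "Action": ["s3:ListBucket"],
         "Resource": f"arn:aws:s3:::{bucket}"},
        {"Sid": "GlueCatalogWrite", "Effect": "Allow",
         "Action": ["glue:GetDatabase", "glue:GetTable",
                    "glue:CreateTable", "glue:UpdateTable"],
         "Resource": "*"},
    ]
    if kms_key_arn:  # only needed when the data is encrypted with a KMS key
        statements.append(
            {"Sid": "KmsForData", "Effect": "Allow",
             "Action": ["kms:Decrypt", "kms:GenerateDataKey"],
             "Resource": kms_key_arn})
    return {"Version": "2012-10-17", "Statement": statements}


def analyst_policy():
    """IAM policy sketch for analysts: catalog reads plus Lake Formation data access."""
    return {"Version": "2012-10-17", "Statement": [
        {"Sid": "GlueCatalogRead", "Effect": "Allow",
         "Action": ["glue:GetDatabase", "glue:GetDatabases",
                    "glue:GetTable", "glue:GetTables", "glue:GetPartitions"],
         "Resource": "*"},
        {"Sid": "LakeFormationDataAccess", "Effect": "Allow",
         "Action": ["lakeformation:GetDataAccess"],
         "Resource": "*"},
    ]}


print(json.dumps(etl_policy("example-bucket", "iceberg"), indent=2))
print(json.dumps(analyst_policy(), indent=2))
```

Note that the analyst policy grants no direct S3 access: in this sketch, analysts reach the data only through the access that Lake Formation provides.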

Table creation workflows remain unchanged, but the registration step needs care: the data location must be registered in hybrid access mode, and each principal that should be governed by Lake Formation permissions must be opted in explicitly.

Testing the Solution

After implementation, it’s crucial to test both read and write operations:

  • Query the Iceberg table as a data analyst to verify read permissions
  • Perform upsert operations (insert, update, delete) using the ETL role
  • Verify that analysts can see the updated data through their read-only access

This verification ensures that the permission model is functioning correctly, providing appropriate access levels to different user groups while maintaining a single source of truth for the data.
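The checks above might be scripted by generating the SQL each role runs. The sketch below only builds the statement text; the function names, the table identifier, the staging view, and the column names (id, value, op) are hypothetical, and the MERGE syntax assumes an engine with Iceberg support such as Spark SQL with the Iceberg extensions enabled.

```python
def analyst_read_query(table, limit=10):
    """SELECT the analyst role runs to confirm read access."""
    return f"SELECT * FROM {table} LIMIT {limit}"


def etl_upsert_statement(table, source_view):
    """MERGE the ETL role runs to exercise insert, update, and delete.

    Assumes the source has columns id, value, and op, where op = 'D'
    marks a row to delete (hypothetical CDC-style layout).
    """
    return "\n".join([
        f"MERGE INTO {table} AS t",
        f"USING {source_view} AS s",
        "ON t.id = s.id",
        "WHEN MATCHED AND s.op = 'D' THEN DELETE",
        "WHEN MATCHED THEN UPDATE SET t.value = s.value",
        "WHEN NOT MATCHED AND s.op <> 'D' THEN INSERT (id, value) VALUES (s.id, s.value)",
    ])


# Hypothetical identifiers, for illustration only.
table = "example_db.example_table"
print(analyst_read_query(table))
print(etl_upsert_statement(table, "staged_changes"))
```

Running the read query as the analyst before and after the merge is a simple way to confirm that updated rows are visible through the read-only path.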

Conclusion

AWS Lake Formation’s hybrid access mode provides a flexible approach to scaling Apache Iceberg table access. Organizations can leverage fine-grained permissions for read access while maintaining IAM-based controls for write operations. This methodology can be extended to other open table formats, making it a versatile solution for data governance and access management.

For organizations gradually adopting Lake Formation, this hybrid approach delivers the best of both worlds – enhanced security and governance from Lake Formation with the flexibility of IAM-based permissions where needed.
