Cross-Account Data Collaboration with Amazon DataZone and AWS Analytics Tools

Breaking Down Data Silos with Cross-Account Collaboration

Data sharing drives innovation, growth, and collaboration across industries. According to Gartner, organizations that promote data sharing outperform their peers on most business value metrics. However, managing cross-account permissions and discovering the right data across accounts remain significant challenges.

Amazon DataZone addresses both challenges. It is a fully managed data management service that helps catalog, discover, share, and govern data stored across AWS accounts.

Solution Overview

This cross-account data collaboration solution uses Amazon DataZone domain association to maintain security and governance while enabling seamless data sharing. The solution involves:

A producer account that contains and shares data assets
A consumer account that accesses the shared data
An Amazon DataZone domain created in the producer account and associated with the consumer account

Domain association relies on AWS Resource Access Manager (AWS RAM) to share resources between the accounts. When both accounts belong to the same organization in AWS Organizations, the association happens automatically. When they belong to different organizations, AWS RAM sends the consumer account an invitation, and the resource grant must be accepted or rejected there.
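
The association rule above can be written as a small decision function. This is a minimal sketch of the rule only, not Amazon DataZone or AWS RAM API code: the `Account` type, the `association_mode` function, and the example IDs are hypothetical names introduced for illustration. It also assumes that an account outside any organization is handled through the invitation path, which the text does not state explicitly.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Account:
    """Hypothetical stand-in for an AWS account; not an AWS SDK type."""
    account_id: str
    organization_id: Optional[str]  # None if the account is in no organization


def association_mode(producer: Account, consumer: Account) -> str:
    """Return how the domain association is expected to complete.

    Same organization   -> "automatic"
    Anything else       -> "invitation" (AWS RAM invitation to accept or reject)
    """
    same_org = (
        producer.organization_id is not None
        and producer.organization_id == consumer.organization_id
    )
    return "automatic" if same_org else "invitation"
```

In this sketch, a data administrator in the consumer account only has an invitation to act on when the function returns "invitation".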

Key User Personas

Data Administrators: Account owners responsible for creating domains, configuring associations, and accepting domain associations
Data Publishers: Users in producer accounts who create publish projects and environments, produce data assets, and accept subscription requests
Data Subscribers: Users in consumer accounts who create subscribe projects, search for and subscribe to data assets, and query data

Implementation Walkthrough

The solution follows these high-level steps:

1. Create an Amazon DataZone domain in the producer account
2. Request domain association from the producer account to the consumer account
3. Accept domain association in the consumer account
4. Add data users to the domain
5. Create publish projects for AWS Glue and Amazon Redshift
6. Set up environments to publish data assets
7. Create and run data sources to publish assets into the business catalog
8. Create subscribe projects
9. Configure environment profiles and environments
10. Subscribe to and consume the shared data

Technical Considerations

Amazon DataZone uses Amazon Redshift datashares for cross-account data sharing, and datashares impose specific requirements:

Both producer and consumer clusters must be encrypted
Data sharing is supported only for provisioned RA3 cluster types and Amazon Redshift Serverless
Proper IAM roles and permissions must be configured
Database credentials must be stored in AWS Secrets Manager, and the secret must carry specific tags for access control
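
The first two requirements lend themselves to a pre-flight check before any sharing is attempted. The following is a minimal sketch under stated assumptions: `RedshiftTarget` and `datashare_blockers` are hypothetical names, the check runs on values you supply rather than calling any AWS API, and the IAM and AWS Secrets Manager requirements are not modeled.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class RedshiftTarget:
    """Hypothetical description of a provisioned cluster or serverless workgroup."""
    name: str
    encrypted: bool
    serverless: bool = False
    node_type: str = ""  # e.g. "ra3.4xlarge"; ignored when serverless is True


def datashare_blockers(target: RedshiftTarget) -> List[str]:
    """List the reasons this target would not meet the datashare requirements."""
    blockers = []
    if not target.encrypted:
        blockers.append("not encrypted")
    # Provisioned clusters must use an RA3 node type; serverless is accepted as-is.
    if not target.serverless and not target.node_type.lower().startswith("ra3."):
        blockers.append("provisioned node type is not ra3")
    return blockers
```

An empty list means neither of the two modeled requirements is violated; the check should be run for both the producer and the consumer side.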

Data Publishing Process

The data publishing workflow involves:

1. Creating data sources that connect to AWS Glue and Amazon Redshift
2. Running these data sources to ingest metadata into Amazon DataZone
3. Reviewing and publishing the assets to the business data catalog
4. Making the assets discoverable and accessible to authorized users
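
The four publishing steps can be modeled as a toy in-memory catalog to show how ingestion and publication differ. This is an illustrative sketch only: `Asset`, `Catalog`, and their methods are hypothetical and do not correspond to Amazon DataZone API operations, and the table names are made up. It assumes the review-then-publish path described above, in which an ingested asset stays hidden from search until it is published.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Asset:
    name: str
    source: str               # "glue" or "redshift" in this sketch
    published: bool = False   # ingested assets start out unpublished


@dataclass
class Catalog:
    assets: Dict[str, Asset] = field(default_factory=dict)

    def run_data_source(self, source: str, table_names: List[str]) -> None:
        """Steps 1-2: ingest metadata for each table; nothing is published yet."""
        for name in table_names:
            self.assets.setdefault(name, Asset(name=name, source=source))

    def publish(self, name: str) -> None:
        """Step 3: publish an asset after review."""
        self.assets[name].published = True

    def search(self, term: str) -> List[str]:
        """Step 4: only published assets are discoverable."""
        term = term.lower()
        return sorted(
            a.name for a in self.assets.values()
            if a.published and term in a.name.lower()
        )
```

For example, after `run_data_source("glue", ["sales_orders", "sales_returns"])`, a search for "sales" returns nothing until `publish("sales_orders")` is called.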

Data Consumption Process

The data consumption workflow includes:

1. Searching for published assets in the catalog
2. Requesting subscription with proper justification
3. Getting approval from data publishers
4. Accessing and querying the data using analytics tools such as Amazon Athena and the Amazon Redshift query editor
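
The request-and-approval part of this workflow behaves like a small state machine. The sketch below illustrates that idea with hypothetical names (`SubscriptionRequest`, `Status`, `decide`, `can_query`); it is not the Amazon DataZone API, and the status values are chosen for readability rather than taken from the service.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class SubscriptionRequest:
    asset: str
    subscriber_project: str
    justification: str
    status: Status = Status.PENDING

    def __post_init__(self) -> None:
        # Step 2: a request must carry a justification.
        if not self.justification.strip():
            raise ValueError("a subscription request needs a justification")

    def decide(self, approve: bool) -> None:
        """Step 3: a data publisher approves or rejects a pending request."""
        if self.status is not Status.PENDING:
            raise RuntimeError(f"request already {self.status.value}")
        self.status = Status.APPROVED if approve else Status.REJECTED

    def can_query(self) -> bool:
        """Step 4: data is accessible only after approval."""
        return self.status is Status.APPROVED
```

The design choice worth noting is that access is derived from the request's status rather than granted separately, so a subscriber cannot query data that no publisher has approved.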

Security and Governance

Throughout the process, Amazon DataZone maintains security and governance by:

Using AWS RAM for secure resource sharing
Implementing proper IAM roles and policies
Requiring explicit approval for subscription requests
Supporting access monitoring through AWS Lake Formation and auditing through AWS CloudTrail

Together, these mechanisms let organizations share data across accounts while keeping it secure, governed, and discoverable.

Visit the AWS Blog for more information on cross-account data collaboration with Amazon DataZone and AWS analytics tools.