Data Domain Blogger

Capture Data Lineage from dbt, Apache Airflow, and Apache Spark with Amazon SageMaker

Jun 25, 2025

—

by

Data Domain Blogger

in Amazon Web Services, Data Analytics, Data Engineering

Amazon SageMaker now offers enhanced, OpenLineage-compatible data lineage capabilities that let users track data flow from tools such as dbt, Apache Airflow, and Apache Spark. The integration improves transparency, builds trust, and centralizes governance of data assets.

Amazon PackScan: Revolutionizing Real-Time Sort Center Analytics with AWS Services

Jun 15, 2025

—

by

Data Domain Blogger

in Amazon Web Services, Case Studies, Data Analytics, Data Engineering

Discover how Amazon transformed its logistics operations with PackScan, an AWS-powered platform that reduced data latency from 1 hour to under 1 minute. This real-time analytics solution processes 500,000 scan events per second across 80 sort centers, resulting in a 25% increase in throughput and a 12% reduction in labor hours.

Using Amazon Neptune for Real-time Anomaly Detection in Gaming Transactions

Jun 13, 2025

—

by

Data Domain Blogger

in Amazon Web Services, Case Studies, Data Analytics, Data Engineering

Discover how Zupee used Amazon Neptune’s graph database to detect anomalies in gaming wallet transactions in real time. Learn how they overcame relational database limitations to build an integrity system that processes over 1 million daily transactions, identifies suspicious patterns, and ensures incentives reach legitimate users.

How Flutter UKI Optimized Data Pipelines with Amazon MWAA

Jun 11, 2025

—

by

Data Domain Blogger

in Amazon Web Services, Case Studies, Data Analytics, Data Engineering

Discover how Flutter UKI transformed their data pipelines by migrating from EC2-based Airflow to Amazon MWAA, managing 5,500 DAGs and 60,000 daily runs with improved stability and reduced operational overhead.

How Meta Manages Data Understanding at Scale: A Privacy-First Approach

Jun 10, 2025

—

by

Data Domain Blogger

in Case Studies, Data Analytics, Data Engineering

Meta has developed an approach to data understanding at scale through its Privacy Aware Infrastructure, using a five-step process of schematization, metadata prediction, annotation, asset inventory, and continuous maintenance. This system enables Meta to manage millions of data assets while protecting user privacy and driving product innovation.

Scaling Apache Iceberg Tables with AWS Lake Formation Hybrid Access Mode

Jun 8, 2025

—

by

Data Domain Blogger

in Amazon Web Services, Data Analytics, Data Engineering

Apache Iceberg tables combined with AWS Lake Formation’s hybrid access mode provide a powerful solution for enterprises managing large datasets. This approach allows organizations to use Lake Formation for read access while maintaining IAM policy-based permissions for write operations, offering fine-grained access control without disrupting existing workflows.

MIT’s SASA Method: Training LLMs to Self-Detoxify Their Language Output

Jun 6, 2025

—

by

Data Domain Blogger

in Artificial Intelligence, Case Studies, Machine Learning

MIT researchers have developed SASA, a method that allows large language models (LLMs) to detoxify their own outputs without retraining. The method creates internal boundaries between toxic and non-toxic subspaces, helping LLMs generate appropriate content while maintaining natural language fluency, much as humans develop internal filters for appropriate speech.

Streamlining Cross-Account Orchestration with Amazon MWAA

Jun 4, 2025

—

by

Data Domain Blogger

in Amazon Web Services, Data Analytics, Data Engineering

Learn how to orchestrate data workflows across multiple AWS accounts and regions using Amazon Managed Workflows for Apache Airflow (MWAA). This article covers implementing secure cross-account access, creating custom Airflow operators, and following best practices for distributed data processing and machine learning pipelines.

Instagram’s Journey to Managing 1000+ ML Models

May 29, 2025

—

by

Data Domain Blogger

in Case Studies, Machine Learning

Instagram has scaled its recommendation system to manage over 1,000 ML models. This article explores how they built a robust infrastructure through a model registry, a streamlined launch process, and new stability metrics to maintain high-quality personalized experiences for billions of users.

Author: Data Domain Blogger