PayPal’s Historic 300PB Data Migration to BigQuery Enables AI Innovation

Untapped Potential of Data

As one of the original digital payment pioneers, PayPal processes billions of transactions and houses decades of valuable customer insights. After 25 years of growth and acquisitions including Venmo and Braintree, PayPal faced a significant challenge: 400 petabytes of data spread across a dozen siloed systems, creating complexity that threatened its next evolution.

This fragmentation limited PayPal’s ability to offer personalized experiences and gain deeper insights. With the gen AI era dawning, it was becoming more than a technical inconvenience: it was severely limiting the company’s ability to create the intelligent experiences customers expect.

Legacy Systems, Modern Ambitions

The scope was massive. PayPal needed to consolidate multiple data platforms, including what’s believed to be the world’s largest Teradata deployment, along with:

  • Hadoop clusters
  • Redshift systems
  • Snowflake deployments
  • Various other systems processing petabytes of transaction data

After evaluating various solutions, PayPal chose BigQuery because it is a fully managed, cloud-native platform whose compute and storage are disaggregated and scale independently. Most importantly, BigQuery’s native AI integrations let teams analyze data and build on it in the same platform, without first moving it to another system.

The Journey to Unified Data

Working with Google Cloud Consulting, PayPal migrated more than 300 petabytes of data while maintaining zero downtime. Key success factors included:

  • Alignment: Making it an enterprise-wide priority with stakeholder buy-in
  • Discovery and Analysis: Detailed inventories of data, workloads, and data streams
  • Strategy: Establishing fundamental principles for security, governance, and consumption tracking
  • Execution: Automating tasks and developing live monitoring dashboards
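The discovery and execution steps above can be illustrated with a minimal sketch: an inventory of datasets tagged by source system, and a function that reports how much of each source's data volume has been migrated, the kind of figure a live monitoring dashboard might display. This is not PayPal's tooling. The `Dataset` fields, the `progress_by_source` function, and the sample entries are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    """One inventory entry; every field here is a hypothetical example."""
    name: str
    source_system: str  # e.g. "teradata" or "hadoop"
    size_tb: float
    migrated: bool = False


def progress_by_source(inventory: list[Dataset]) -> dict[str, float]:
    """Return the fraction of data volume migrated for each source system."""
    total: dict[str, float] = {}
    done: dict[str, float] = {}
    for ds in inventory:
        total[ds.source_system] = total.get(ds.source_system, 0.0) + ds.size_tb
        if ds.migrated:
            done[ds.source_system] = done.get(ds.source_system, 0.0) + ds.size_tb
    # Skip sources with zero recorded volume to avoid dividing by zero.
    return {src: done.get(src, 0.0) / vol for src, vol in total.items() if vol > 0}


# Made-up inventory; the names and sizes are not PayPal's.
inventory = [
    Dataset("payments_2023", "teradata", 300.0, migrated=True),
    Dataset("payments_2024", "teradata", 100.0),
    Dataset("clickstream", "hadoop", 50.0, migrated=True),
]

for source, fraction in sorted(progress_by_source(inventory).items()):
    print(f"{source}: {fraction:.0%} of volume migrated")
```

Tracking progress by volume rather than by dataset count is a design choice in this sketch; a real migration would likely track both, along with validation status for each dataset.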

Transformative Benefits Achieved

The migration delivered significant improvements across multiple dimensions:

  • Faster Insights: Queries are 2.5x to 10x faster, enabling real-time personalization
  • AI Foundations: Data for model training is 16x fresher, accelerating AI development
  • Operational Efficiency: Reduced infrastructure vendors from four to one, eliminating data duplication

AI-Powered Innovation Unleashed

This unified data platform enables PayPal to explore new AI-powered experiences:

  • Predictive fraud prevention that spots issues before they affect customers
  • Personalized financial insights for merchant optimization
  • Seamless payment experiences adapted to customer preferences
  • Intelligent risk assessment to expand financial access
  • Agentic commerce and future possibilities

Lessons for the AI Era

PayPal’s transformation offers valuable insights for organizations considering their own data modernization:

  • Don’t underestimate how under-utilized and unorganized your data may be
  • Centralized, accurate, and consistent data paves the way for AI experimentation
  • Accessible data with proper controls unlocks organizational potential
  • Data orchestration coupled with generative AI can break down silos and speed decision-making

The financial world continues evolving with new technologies and changing customer expectations. PayPal’s data transformation demonstrates how established companies can reinvent themselves to lead the next wave of innovation in digital commerce.

Visit Google Cloud’s blog for more detailed information about PayPal’s historic data migration journey.