Meetup: How to Collaborate with Data Lake Management Communities

Start date
Venue
Online

Talk 1: Modernizing Apache Hive Metastore for the Next Decade by Feng Lu
Talk 2: How Delta Lake Addresses Data Lake Challenges by Ajay Singh

Moderator: Chris Crosbie 

----------
Abstracts

----------

Modernizing Apache Hive Metastore for the Next Decade by Feng Lu

The Apache Hive project was introduced in 2010 and has enjoyed widespread popularity in the community for a decade. As the de facto technical metadata standard, Apache Hive Metastore is a critical backbone for building data lake applications. However, the recent need to provide ACID properties, time travel, end-to-end security, and streaming support on tabular datasets has rendered the Hive Metastore data model inadequate. We present a number of open source initiatives at Google that aim to uplift and rejuvenate Hive Metastore for the next decade, and we explain the design rationale behind each. These initiatives are first-class versioned metadata support for new table formats (e.g., Iceberg, Delta), a native gRPC interface for Hive Metastore API access, and the ability to run Hive Metastore on cloud-native databases such as Cloud Spanner. We will wrap up the talk with an update on development progress and a call for community contributions.

How Delta Lake Addresses Data Lake Challenges by Ajay Singh

Delta Lake is an open source storage layer that brings reliability to data lakes. Delta Lake provides ACID transactions and scalable metadata handling, and it unifies streaming and batch data processing. Delta Lake runs on top of your existing data lake and is fully compatible with Apache Spark APIs. Join the session to learn about the specific challenges of data lakes and how Delta Lake addresses them to bring the benefits of data warehousing to data lakes.