Level Up Your Data Platform With Active Metadata

Data Engineering Podcast

Summary: Metadata is the lifeblood of your data platform, providing information about what is happening in your systems. To increase its value, a new trend of active metadata is emerging, enabling use cases like keeping BI reports up to date, auto-scaling your warehouses, and automated data governance.

Metadata 130

Interesting startup idea: benchmarking cloud platform pricing

The Pragmatic Engineer

Tech stack: Python, Angular, SSR, SQLite, DuckDB, CockroachDB, and many others. Results are stored in git and their database, together with benchmarking metadata. Benchmarking results for each instance type are stored in the sc-inspector-data repo, together with the benchmarking task hash and other metadata.

Cloud 326

Trending Sources

Python Ray - The Fast Lane to Distributed Computing

ProjectPro

Get ready to supercharge your data processing capabilities with Python Ray! Our tutorial teaches you how to unlock the power of parallelism and tune your Python code for performance. This is where Python Ray comes in.

Python 45

Directory Tables, Python UDF and Streams for PDF Processing

Cloudyard

Snowflake provides powerful tools such as directory tables, streams, and Python UDFs to seamlessly process these files, making it easy to extract actionable insights. Pipeline overview: the pipeline consists of the following components. Stage: stores PDF files and tracks their metadata using directory tables. [...] newly added files).

Python 52

Eliminate Friction In Your Data Platform Through Unified Metadata Using OpenMetadata

Data Engineering Podcast

Summary: A significant source of friction and wasted effort in building and integrating data management systems is the fragmentation of metadata across various tools. After experiencing the impacts of fragmented metadata and previous attempts at building a solution, Suresh Srinivas and Sriharsha Chintalapani created the OpenMetadata project.

Metadata 100

AWS Glue - Unleashing the Power of Serverless ETL Effortlessly

ProjectPro

Then, Glue writes the job's metadata into the embedded AWS Glue Data Catalog. AWS Glue then creates data profiles in the catalog, a repository for all data assets' metadata, including table definitions, locations, and other features. For analyzing huge datasets, they want to employ familiar Python primitive types.

AWS 66

How to Build an ETL Pipeline in Python? (Hands-On Example)

ProjectPro

In this blog, you’ll build a complete ETL pipeline in Python to perform data extraction from the Spotify API, followed by data manipulation and transformation for analysis. You’ll learn how to build an ETL pipeline in Python, the language most loved by data engineers worldwide. Python fits that role perfectly.

Python 40