Summary Databases are the core of most applications, whether transactional or analytical. In recent years the selection of database products has exploded, making the critical decision of which engine(s) to use even more difficult. What are the aspects of the database market that keep you interested as a VP of product?
The vast amounts of data generated daily require advanced tools for efficient management and analysis. Enter agentic AI, a type of artificial intelligence set to transform enterprise data management. Many enterprises are overwhelmed by data sources ranging from structured databases to unstructured social media feeds.
This week, we delve into the vital world of Databases, SQL, Data Management, and Statistical Concepts in Data Science. Welcome back to Week 2 of KDnuggets’ "Back to Basics" series.
Summary Databases come in a variety of formats for different use cases. The default association with the term "database" is relational engines, but non-relational engines are also used quite widely. Datafold has recently launched data replication testing, providing ongoing validation for source-to-target replication.
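This summary does not describe how Datafold implements replication testing. The Python below is a hypothetical sketch of the general idea behind source-to-target validation. It hashes every row on each side, keyed by primary key, and reports rows that are missing from the target, extra in the target, or changed. The helper names (`row_hashes`, `diff_tables`) and the in-memory SQLite databases are illustrative assumptions. A tool meant for large tables would more plausibly compare checksums over key ranges than fetch every row.

```python
import hashlib
import sqlite3


def row_hashes(conn, table, key):
    """Return {primary_key: sha256 of the full row} for every row in `table`.

    Table and key names are interpolated into SQL, so only pass trusted identifiers.
    """
    cur = conn.execute(f"SELECT * FROM {table} ORDER BY {key}")
    cols = [d[0] for d in cur.description]
    key_idx = cols.index(key)
    return {
        row[key_idx]: hashlib.sha256(repr(row).encode("utf-8")).hexdigest()
        for row in cur
    }


def diff_tables(source, target, table, key):
    """Compare one table across two connections and classify the differences."""
    src = row_hashes(source, table, key)
    tgt = row_hashes(target, table, key)
    return {
        "missing": sorted(src.keys() - tgt.keys()),  # present in source only
        "extra": sorted(tgt.keys() - src.keys()),    # present in target only
        "changed": sorted(k for k in src.keys() & tgt.keys() if src[k] != tgt[k]),
    }


if __name__ == "__main__":
    source = sqlite3.connect(":memory:")
    target = sqlite3.connect(":memory:")
    for conn in (source, target):
        conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
    source.executemany("INSERT INTO users VALUES (?, ?)",
                       [(1, "a@x.io"), (2, "b@x.io"), (3, "c@x.io")])
    target.executemany("INSERT INTO users VALUES (?, ?)",
                       [(1, "a@x.io"), (2, "B@x.io"), (4, "d@x.io")])
    print(diff_tables(source, target, "users", "id"))
    # {'missing': [3], 'extra': [4], 'changed': [2]}
```

Hashing whole rows keeps the comparison independent of column count. It also means a change in any column flags the row without saying which column differed.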
Summary A significant portion of data workflows involve storing and processing information in database engines. In this episode Gleb Mezhanskiy, founder and CEO of Datafold, discusses the different error conditions and solutions that you need to know about to ensure the accuracy of your data.
Summary Building a database engine requires a substantial amount of engineering effort and time investment. In this episode he explains how he used the combination of Apache Arrow, Flight, Datafusion, and Parquet to lay the foundation of the newest version of his time-series database. Closing Announcements: Thank you for listening!
Summary Data persistence is one of the most challenging aspects of computer systems. In the era of the cloud most developers rely on hosted services to manage their databases, but what if you are a cloud service? It’s the only true SQL streaming database built from the ground up to meet the needs of modern data products.
In recent years, Meta’s data management systems have evolved into a composable architecture that creates interoperability, promotes reusability, and improves engineering efficiency. Data is at the core of every product and service at Meta.
Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode.
When most people think of master data management, they first think of customers and products. But master data encompasses so much more than data about customers and products. Challenges of Master Data Management: A decade ago, master data management (MDM) was a much simpler proposition than it is today.
Summary One of the most critical aspects of any software project is managing its data. Managing the operational concerns for your database can be complex and expensive, especially if you need to scale to large volumes of data, high traffic, or geographically distributed usage.
Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. Introducing RudderStack Profiles. RudderStack Profiles takes the SaaS guesswork and SQL grunt work out of building complete customer profiles so you can quickly ship actionable, enriched data to every downstream team.
Summary The most complicated part of data engineering is the effort involved in making the raw data fit into the narrative of the business. Master Data Management (MDM) is the process of building consensus around what the information actually means in the context of the business and then shaping the data to match those semantics.
For machine learning applications, relational models require additional processing to be directly useful, which is why the use of vector databases has grown. Go to dataengineeringpodcast.com/linode today and get a $100 credit to launch a database, create a Kubernetes cluster, or take advantage of all of their other services.
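The core operation behind vector search can be sketched in a few lines. The example below runs an exact, brute-force cosine-similarity search over a hypothetical dictionary of three-dimensional embeddings. The functions `cosine` and `top_k` and the sample vectors are assumptions for illustration. Production vector databases generally rely on approximate nearest-neighbour indexes rather than scanning every vector.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query, vectors, k=2):
    """Exact nearest neighbours: score every stored vector and keep the best k ids."""
    scored = [(cosine(query, vec), key) for key, vec in vectors.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [key for _, key in scored[:k]]


# Hypothetical document embeddings.
docs = {
    "invoice": [0.9, 0.1, 0.0],
    "receipt": [0.8, 0.2, 0.1],
    "holiday": [0.0, 0.1, 0.9],
}
print(top_k([1.0, 0.0, 0.0], docs, k=2))  # ['invoice', 'receipt']
```

The scan is linear in the number of stored vectors. That cost is what specialised vector indexes trade a little accuracy to avoid.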
Summary When you think about selecting a database engine for your project you typically consider options focused on serving multiple concurrent users. Sometimes what you really need is an embedded database that is blazing fast for single user workloads. Can you describe what DuckDB is and the story behind it?
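The sketch below illustrates what "embedded" means in practice: the database runs entirely inside the application process, with no server to deploy or connect to. To stay free of third-party dependencies, it deliberately uses Python's standard-library `sqlite3` module instead of DuckDB. SQLite is a row-oriented embedded engine, whereas DuckDB is column-oriented and aimed at analytical queries, so treat this only as an illustration of the in-process deployment model. The table and data are hypothetical.

```python
import sqlite3

# ":memory:" keeps the whole database inside this process: no server, no network hop.
# Passing a file path instead would persist it to a single local file.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE events (user_id INTEGER, amount REAL)")
con.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [(1, 9.5), (1, 0.5), (2, 3.0)],
)

# An aggregate query executes as an ordinary function call in the same process.
rows = con.execute(
    "SELECT user_id, SUM(amount) FROM events GROUP BY user_id ORDER BY user_id"
).fetchall()
print(rows)  # [(1, 10.0), (2, 3.0)]
con.close()
```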
Summary The Cassandra database is one of the first open source options for globally scalable storage systems. The community recently released a new major version that marks a milestone in its maturity and stability as a project and database. Since its introduction in 2008, it has been powering systems at every scale.
Summary Many of the events, ideas, and objects that we try to represent through data have a high degree of connectivity in the real world. TigerGraph is a leading database that offers a highly scalable and performant native graph engine for powering graph analytics and machine learning. Visit [link] to learn more.
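Graph engines are built around following connections between entities. The minimal, engine-agnostic sketch below is not TigerGraph's API or query language. It answers a typical multi-hop question with a breadth-first search over an adjacency list: how many relationships separate two entities? The graph, the names, and the `hops` function are hypothetical.

```python
from collections import deque


def hops(graph, start, goal):
    """Breadth-first search: fewest edges from start to goal, or None if unreachable."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for nxt in graph.get(node, ()):
            if nxt not in seen:  # visit each node at most once
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


# Hypothetical directed "transacted with" relationships.
graph = {
    "alice": ["bob"],
    "bob": ["carol", "dave"],
    "carol": ["erin"],
    "dave": [],
    "erin": [],
}
print(hops(graph, "alice", "erin"))  # 3
```

Native graph engines are typically designed so that following an edge is cheap. In a relational model, each hop in a traversal like this one usually corresponds to another join.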
SingleStore aims to cut down on the number of database engines that you need to run so that you can reduce the amount of copying that is required. By supporting fast, in-memory row-based queries and columnar on-disk representation, it lets your transactional and analytical workloads run in the same database.
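The difference between row-based and columnar representations can be shown with plain Python data structures. This toy illustration of the two layouts is not SingleStore's storage engine. The row list suits point reads of a whole record, while the column dictionary lets an aggregate read only the field it needs. The table contents and the helper names `lookup_row` and `total_amount` are assumptions.

```python
# Row-oriented layout: each record's fields are stored together.
rows = [
    (1, "alice", 30.0),
    (2, "bob", 12.5),
    (3, "alice", 7.5),
]

# Column-oriented layout: each attribute's values are stored contiguously.
columns = {
    "id": [r[0] for r in rows],
    "customer": [r[1] for r in rows],
    "amount": [r[2] for r in rows],
}


def lookup_row(order_id):
    """Transactional-style point read: one record, all of its fields.

    A real row store would use an index here instead of a linear scan.
    """
    return next((r for r in rows if r[0] == order_id), None)


def total_amount():
    """Analytical-style aggregate: one column across all records."""
    return sum(columns["amount"])


print(lookup_row(2))   # (2, 'bob', 12.5)
print(total_amount())  # 50.0
```

Keeping both layouts is what lets one system serve both access patterns. The cost is maintaining two representations of the same data.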
Summary The database market has seen unprecedented activity in recent years, with new options addressing a variety of needs being introduced on a nearly constant basis. Despite that, there are a handful of databases that continue to be adopted due to their proven reliability and robust features.
Pathway is a streaming database engine that embeds artificial intelligence into the storage, with functionality designed to support the spatiotemporal data that is crucial for shipping and logistics. Atlan is the metadata hub for your data ecosystem. Interview: Introduction. How did you get involved in the area of data management?
One of the biggest stumbling blocks for a business is expanding its database. Problems that can arise while expanding a database include storage limits, complicated management, and difficulty locating, sharing, and checking isolated data.
In this episode Nick Schrock, creator of Dagster, shares his perspective on the state of data orchestration technology and how to apply it in your own environment. Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. Introducing RudderStack Profiles.
The goal of this post is to understand how data integrity best practices have been embraced time and time again, no matter the underlying technology. In the beginning, there was a data warehouse. The data warehouse (DW) was an approach to data architecture and structured data management that really hit its stride in the early 1990s.
What if your data lake could do more than just store information—what if it could think like a database? As data lakehouses evolve, they transform how enterprises manage, store, and analyze their data. Hudi, with its robust community and technical innovation, is well-positioned to lead this charge.
In this episode Brian Platz explains how JSON-LD can be used as a shared representation of linked data for building semantic data products. Get all of the details and try the new product today at dataengineeringpodcast.com/rudderstack. You shouldn't have to throw away the database to build with fast-changing data.
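The sketch below builds a small JSON-LD document with Python's `json` module, for readers unfamiliar with the format. `@context`, `@vocab`, `@id`, and `@type` are standard JSON-LD keywords. The schema.org vocabulary is one common choice, and the `example.com` identifiers and product data are hypothetical. This shows the format only, not how the product discussed in the episode uses it.

```python
import json

# "@vocab" sets a default vocabulary, so a plain key like "name" expands to
# https://schema.org/name when a JSON-LD processor interprets the document.
# The example.com IRIs are hypothetical global identifiers.
product = {
    "@context": {"@vocab": "https://schema.org/"},
    "@id": "https://example.com/products/42",
    "@type": "Product",
    "name": "Widget",
    "manufacturer": {
        "@id": "https://example.com/orgs/acme",
        "@type": "Organization",
        "name": "Acme Corp",
    },
}

doc = json.dumps(product, indent=2)
print(doc)
```

Because the manufacturer is named by a global `@id` rather than a local foreign key, any system that reads the document can link it to other data about the same entity.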
We are pleased to announce that Cloudera has been named a Leader in the 2022 Gartner® Magic Quadrant for Cloud Database Management Systems. We’re proud to be recognized for the data management and data analytics innovations we have delivered in the new Cloudera Data Platform (CDP).
Understand how BigQuery inserts, deletes and updates: once again, Vu took the time to deep-dive into BigQuery internals, this time to explain how data management is done. Pandera, a data validation library for dataframes, now supports Polars. Arroyo, a stream-processing platform, rebuilt their engine using DataFusion.
RisingWave is a database engine that was created specifically for stream processing, with S3 as the storage layer. In this episode Yingjun Wu explains how it is architected to power analytical workflows on continuous data flows, and the challenges of making it responsive and scalable. Closing Announcements: Thank you for listening!
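Streaming databases keep query results current as new data arrives instead of recomputing them from scratch. The class below is a simplified, hypothetical sketch of that incremental idea and does not describe RisingWave's architecture. Each change carries a `diff` of +1 (insert) or -1 (retraction), and a grouped sum is updated in place. `IncrementalSum` and the sample changes are assumptions.

```python
from collections import defaultdict


class IncrementalSum:
    """Maintains SUM(amount) GROUP BY key as changes arrive, without rescanning history.

    Each change is (key, amount, diff): diff is +1 for an insert and -1 for a
    retraction, loosely modeled on insert/retract change streams.
    """

    def __init__(self):
        self.totals = defaultdict(float)
        self.counts = defaultdict(int)

    def apply(self, key, amount, diff):
        self.totals[key] += amount * diff
        self.counts[key] += diff
        if self.counts[key] == 0:  # every row in the group was retracted
            del self.totals[key]
            del self.counts[key]

    def view(self):
        """Current contents of the maintained result."""
        return dict(self.totals)


view = IncrementalSum()
for change in [("eu", 10.0, +1), ("us", 4.0, +1), ("eu", 5.0, +1), ("us", 4.0, -1)]:
    view.apply(*change)
print(view.view())  # {'eu': 15.0}
```

Each change costs constant work regardless of how much history has accumulated. That is the property that lets such systems stay responsive on continuous data flows.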
Unify transactional and analytical workloads in Snowflake for greater simplicity. Many businesses must maintain two separate databases: one to handle transactional workloads and another for analytical workloads.
In this episode she shares the practical steps for implementing a data governance practice in your organization, and the pitfalls to avoid. Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. Data lakes are notoriously complex. Are there any trends that concern you?
Summary Databases are the core of most applications, but they are often treated as inscrutable black boxes. When an application is slow, there is a good probability that the database needs some attention. It’s the only true SQL streaming database built from the ground up to meet the needs of modern data products.
In this episode Adrian Brudaru explains how it works, the benefits that it provides over other data integration solutions, and how you can start building pipelines today. Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. Introducing RudderStack Profiles.
In this episode Ranjith Raghunath shares his thoughts on how to build a strategy for the development, delivery, and evolution of data products. Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. Introducing RudderStack Profiles. With Materialize, you can!
In this episode he shares his journey of data collection and analysis and the challenges of automating an intentionally manual industry. Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. Introducing RudderStack Profiles. With Materialize, you can!
Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. This episode is supported by Code Comments, an original podcast from Red Hat. Data observability has been gaining adoption for a number of years now, with a large focus on data warehouses.
We’ll also introduce OpenHouse’s control plane, specifics of the deployed system at LinkedIn including our managed Iceberg lakehouse, and the impact and roadmap for future development of OpenHouse, including a path to open source. House database service: This is an internal service to store table service and data service metadata.
In this episode Tobias Macey shares his thoughts on the challenges that he is facing as he prepares to build the next set of architectural layers for his data platform to enable a larger audience to start accessing the data being managed by his team. With Materialize, you can! Closing Announcements: Thank you for listening!
In this episode Louis Brandy discusses the applications for vector search capabilities both in and outside of AI, as well as the challenges of maintaining real-time indexes of vector data. Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. Introducing RudderStack Profiles.
In this episode the host Tobias Macey shares his reflections on recent experiences where the abstractions leaked and some observations on how to deal with that situation in a data platform architecture. What do you have planned for the future of your data platform?
In this episode Artyom Keydunov, creator of Cube, discusses the evolution and applications of the semantic layer as a component of your data platform, and how Cube provides speed and cost optimization for your data consumers. Validate consistency between source and target at any scale, and receive alerts about any discrepancies.
KAWA combines analytics, automation and AI agents to help enterprises build data apps and AI workflows quickly and achieve their digital transformation goals. It connects structured and unstructured databases across sources and uses a no-code UI or Python for advanced and predictive analytics.