This site uses cookies to improve your experience. To help us insure we adhere to various privacy regulations, please select your country/region of residence. If you do not select a country, we will assume you are from the United States. Select your Cookie Settings or view our Privacy Policy and Terms of Use.
Cookie Settings
Cookies and similar technologies are used on this website for proper function of the website, for tracking performance analytics and for marketing purposes. We and some of our third-party providers may use cookie data for various purposes. Please review the cookie settings below and choose your preference.
Used for the proper function of the website
Used for monitoring website traffic and interactions
Cookie Settings
Cookies and similar technologies are used on this website for proper function of the website, for tracking performance analytics and for marketing purposes. We and some of our third-party providers may use cookie data for various purposes. Please review the cookie settings below and choose your preference.
Strictly Necessary: Used for the proper function of the website
Performance/Analytics: Used for monitoring website traffic and interactions
Summary The market for datawarehouse platforms is large and varied, with options for every use case. It was interesting to learn about some of the custom data types and performance optimizations that are included. What are some of the advanced capabilities, such as SQL extensions, supported data types, etc.
In this episode Crux CTO Mark Etherington discusses the different costs involved in managing external data, how to think about the total return on investment for your data, and how the Crux platform is architected to reduce the toil involved in managing third party data. Tired of deploying bad data?
He describes how the platform is architected, the challenges related to selling cloud technologies into enterprise organizations, and how you can adopt Matillion for your own workflows to reduce the maintenance burden of data integration workflows. Visit dataengineeringpodcast.com/datafold today to book a demo with Datafold.
Summary With the proliferation of data sources to give a more comprehensive view of the information critical to your business it is even more important to have a canonical view of the entities that you care about. Request a demo at dataengineeringpodcast.com/metis-machine to learn more about how Metis Machine is operationalizing data science.
In this episode he shares his thoughts on the strategic and tactical elements of moving your work as a data professional from being task-oriented to being product-oriented and the long term improvements in your productivity that it provides. Visit dataengineeringpodcast.com/datafold today to book a demo with Datafold.
Announcements Hello and welcome to the Data Engineering Podcast, the show about modern datamanagement When you're ready to build your next pipeline, or want to test out the projects you hear about on the show, you'll need somewhere to deploy it, so check out our friends at Linode. or any other destination you choose.
Data volume and velocity, governance, structure, and regulatory requirements have all evolved and continue to. Despite these limitations, datawarehouses, introduced in the late 1980s based on ideas developed even earlier, remain in widespread use today for certain business intelligence and data analysis applications.
This includes modeling the lifecycle of your information as a pipeline from the raw, messy, loosely structured records in your data lake, through a series of transformations and ultimately to your datawarehouse. How do you define data curation?
In this episode she shares the story behind the project, the details of how it is implemented, and how you can use it for your own data projects. Datafold integrates with all major datawarehouses as well as frameworks such as Airflow & dbt and seamlessly plugs into CI workflows. Who is the target audience for Zingg?
Announcements Hello and welcome to the Data Engineering Podcast, the show about modern datamanagement When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode.
In this episode CEO and founder Salma Bakouk shares her views on the causes and impacts of "data entropy" and how you can tame it before it leads to failures. Datafold integrates with all major datawarehouses as well as frameworks such as Airflow & dbt and seamlessly plugs into CI workflows.
In this episode he shares his experiences working with organizations to adopt analytics engineering patterns and the ways that Optimus and dbt were combined to let data analysts deliver insights without the roadblocks of complex pipeline management. Visit dataengineeringpodcast.com/datafold today to book a demo with Datafold.
Summary One of the most complex aspects of managingdata for analytical workloads is moving it from a transactional database into the datawarehouse. This was a deep dive on how to build a successful company around a powerful platform, and how that platform simplifies operations for enterprise grade datamanagement.
In our previous post, The Pros and Cons of Leading DataManagement and Storage Solutions , we untangled the differences among data lakes, datawarehouses, data lakehouses, data hubs, and data operating systems. What factors are most important when building a datamanagement ecosystem?
Key Takeaways Data Fabric is a modern data architecture that facilitates seamless data access, sharing, and management across an organization. Datamanagement recommendations and data products emerge dynamically from the fabric through automation, activation, and AI/ML analysis of metadata.
In our previous post, The Pros and Cons of Leading DataManagement and Storage Solutions , we untangled the differences among data lakes, datawarehouses, data lakehouses, data hubs, and data operating systems. What factors are most important when building a datamanagement ecosystem?
In our previous post, The Pros and Cons of Leading DataManagement and Storage Solutions , we untangled the differences among data lakes, datawarehouses, data lakehouses, data hubs, and data operating systems. What factors are most important when building a datamanagement ecosystem?
Announcements Hello and welcome to the Data Engineering Podcast, the show about modern datamanagement When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode.
If you are starting down the path of implementing a data governance strategy then this episode will provide a great overview of what is involved. If you hand a book to a new data engineer, what wisdom would you add to it? What is data governance? If you hand a book to a new data engineer, what wisdom would you add to it?
In this episode Isaac Brodsky explains how the Unfolded platform is architected, their experience joining the team at Foursquare, and how you can start using it for analyzing your spatial data today. Datafold integrates with all major datawarehouses as well as frameworks such as Airflow & dbt and seamlessly plugs into CI workflows.
Most of what is written though has to do with the enabling technology platforms (cloud or edge or point solutions like datawarehouses) or use cases that are driving these benefits (predictive analytics applied to preventive maintenance, financial institution’s fraud detection, or predictive health monitoring as examples) not the underlying data.
So, you’re planning a cloud datawarehouse migration. But be warned, a warehouse migration isn’t for the faint of heart. As you probably already know if you’re reading this, a datawarehouse migration is the process of moving data from one warehouse to another. A worthy quest to be sure.
Summary The flexibility of software oriented data workflows is useful for fulfilling complex requirements, but for simple and repetitious use cases it adds significant complexity. Coalesce is a platform designed to reduce repetitive work for common workflows by adopting a visual pipeline builder to support your datawarehouse transformations.
Announcements Hello and welcome to the Data Engineering Podcast, the show about modern datamanagement When you're ready to build your next pipeline, or want to test out the projects you hear about on the show, you'll need somewhere to deploy it, so check out our friends at Linode. or any other destination you choose.
Data lakes, datawarehouses, data hubs, data lakehouses, and data operating systems are datamanagement and storage solutions designed to meet different needs in data analytics, integration, and processing. However, datawarehouses can experience limitations and scalability challenges.
Data lakes, datawarehouses, data hubs, data lakehouses, and data operating systems are datamanagement and storage solutions designed to meet different needs in data analytics, integration, and processing. However, datawarehouses can experience limitations and scalability challenges.
Announcements Hello and welcome to the Data Engineering Podcast, the show about modern datamanagement When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. How is Pinecone implemented?
Data lakes, datawarehouses, data hubs, data lakehouses, and data operating systems are datamanagement and storage solutions designed to meet different needs in data analytics, integration, and processing. However, datawarehouses can experience limitations and scalability challenges.
In this episode Arjun Narayan explains how the technical barriers to adopting real-time data in your analytics and applications have become surmountable by organizations of all sizes. Datafold integrates with all major datawarehouses as well as frameworks such as Airflow & dbt and seamlessly plugs into CI workflows.
He also discusses the improvements that have been incorporated into FeatureBase to simplify integration with the rest of your data stack, and the SQL interface that was added to make working with the product easier. Visit dataengineeringpodcast.com/datafold today to book a demo with Datafold. or any other destination you choose.
Announcements Hello and welcome to the Data Engineering Podcast, the show about modern datamanagement When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode.
Summary One of the most critical aspects of software projects is managing its data. Managing the operational concerns for your database can be complex and expensive, especially if you need to scale to large volumes of data, high traffic, or geographically distributed usage. or any other destination you choose.
This is an interesting conversation about how to make data more accessible and more useful by improving the user experience of the tools that we create. RudderStack’s smart customer data pipeline is warehouse-first. Mention that you’re a Data Engineering Podcast listener, and they’ll send you a free t-shirt.
Announcements Hello and welcome to the Data Engineering Podcast, the show about modern datamanagement When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. Missing data?
If you need to deal with massive data, at high velocities, in milliseconds, then Aerospike is definitely worth learning about. Datafold integrates with all major datawarehouses as well as frameworks such as Airflow & dbt and seamlessly plugs into CI workflows. Can you describe what Aerospike is and the story behind it?
In this episode Prineha Narang, co-founder and CTO of Aliro, explains how these systems work, the capabilities that they can offer, and how you can start preparing for a post-quantum future for your data systems. Visit dataengineeringpodcast.com/datafold today to book a demo with Datafold.
Announcements Hello and welcome to the Data Engineering Podcast, the show about modern datamanagement When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode.
Even if you already have a metadata repository this is worth a listen to learn more about the value that visibility of your data can bring to your organization. You work hard to make sure that your data is clean, reliable, and reproducible throughout the ingestion pipeline, but what happens when it gets to the datawarehouse?
She also discusses her views on the role of the data lakehouse as a building block for these architectures and the ongoing influence that it will have as the technology matures. Datafold integrates with all major datawarehouses as well as frameworks such as Airflow & dbt and seamlessly plugs into CI workflows.
Cloudera and Accenture demonstrate strength in their relationship with an accelerator called the Smart Data Transition Toolkit for migration of legacy datawarehouses into Cloudera Data Platform. Accenture’s Smart Data Transition Toolkit . Are you looking for your datawarehouse to support the hybrid multi-cloud?
Announcements Hello and welcome to the Data Engineering Podcast, the show about modern datamanagement When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode.
Why data consumers do not trust your reporting — It is a good illustration of the data journey manifesto. Stakeholders often notice data issues before the data team does. Datawarehouses are mutable, this is one of the many root causes proposed by Lucas. Data Documentation 101: Why?
He explains how they redesigned the core algorithms and storage management features to deliver ten times faster throughput, how the lower latencies work to reduce the burden on platform engineers, and how they are working toward an open source offering so that you can try it yourself with no friction. What is your target market and customer?
Summary As more organizations are gaining experience with datamanagement and incorporating analytics into their decision making, their next move is to adopt machine learning. Monte Carlo’s end-to-end Data Observability Platform monitors and alerts for data issues across your datawarehouses, data lakes, ETL, and business intelligence.
We organize all of the trending information in your field so you don't have to. Join 37,000+ users and stay up to date on the latest articles your peers are reading.
You know about us, now we want to get to know you!
Let's personalize your content
Let's get even more personalized
We recognize your account from another site in our network, please click 'Send Email' below to continue with verifying your account and setting a password.
Let's personalize your content