This site uses cookies to improve your experience. To help us insure we adhere to various privacy regulations, please select your country/region of residence. If you do not select a country, we will assume you are from the United States. Select your Cookie Settings or view our Privacy Policy and Terms of Use.
Cookie Settings
Cookies and similar technologies are used on this website for proper function of the website, for tracking performance analytics and for marketing purposes. We and some of our third-party providers may use cookie data for various purposes. Please review the cookie settings below and choose your preference.
Used for the proper function of the website
Used for monitoring website traffic and interactions
Cookie Settings
Cookies and similar technologies are used on this website for proper function of the website, for tracking performance analytics and for marketing purposes. We and some of our third-party providers may use cookie data for various purposes. Please review the cookie settings below and choose your preference.
Strictly Necessary: Used for the proper function of the website
Performance/Analytics: Used for monitoring website traffic and interactions
Managing and utilizing data effectively is crucial for organizational success in today's fast-paced technological landscape. The vast amounts of data generated daily require advanced tools for efficient management and analysis. Data privacy and security Data privacy and security are important.
Announcements Hello and welcome to the Data Engineering Podcast, the show about modern datamanagement Dagster offers a new approach to building and running data platforms and data pipelines. Can you describe the operational/architectural aspects of building a full data engine on top of the FDAP stack?
Does the LLM capture all the relevant data and context required for it to deliver useful insights? Does the LLM capture all the relevant data and context required for it to deliver useful insights? Not to mention the crazy stories about Gen AI making up answers without the data to back it up!) Can AIs responses be trusted?
Organizations increasingly rely on streaming data sources not only to bring data into the enterprise but also to perform streaming analytics that accelerate the process of being able to get value from the data early in its lifecycle.
Using the lens of a superhero narrative, he’ll uncover how AI can be the ultimate sidekick, aiding in datamanagement and reporting, enhancing productivity, and boosting innovation. The Future of Product Management 🔮 How to continuously integrate AI into your work to stay ahead of emerging trends and technologies.
Early enterprise adopters of generative AI have made it clear that a robust data strategy is the cornerstone of any successful AI initiative. To truly unlock AI's potential as a value multiplier and catalyst for reimagined customer experiences, an easy-to-use and trusted data platform is indispensable.
Stemma helps you establish and maintain that trust by giving visibility into who is using what data, annotating the reports with useful context, and understanding who is responsible for keeping it up to date. RudderStack’s smart customer data pipeline is warehouse-first. Then what do you do? Send a CSV via email?
When most people think of master datamanagement, they first think of customers and products. But master data encompasses so much more than data about customers and products. It requires reference data such as geographical subdivisions and market segments. Four main challenges make MDM complex.
Because of their complete ownership of your data they constrain the possibilities of what data you can store and how it can be used. It's supposed to make building smarter, faster, and more flexible data infrastructures a breeze. Can you describe what Iceberg is and its position in the data lake/lakehouse ecosystem?
Summary There are extensive and valuable data sets that are available outside the bounds of your organization. Whether that data is public, paid, or scraped it requires investment and upkeep to acquire and integrate it with your systems. Atlan is the metadata hub for your data ecosystem.
The best part to jump on the bandwagon of information technology or IT is, there is an enormous possibility for an individual if he or she starts studying for a diploma or a degree, does either a master's degree or a research course. He or she can get a full-fledged engineering degree. You can learn CCNA, CCNP and more from CISCO academy.
Meroxa was created to enable teams of all sizes to deliver real-time data applications. In this episode DeVaris Brown discusses the types of applications that are possible when teams don't have to manage the complex infrastructure necessary to support continuous data flows. Who are the target customers for Meroxa?
Summary A lot of the work that goes into data engineering is trying to make sense of the "data exhaust" from other applications and services. Atlan is the metadata hub for your data ecosystem. Can you share your definition of "behavioral data" and how it is differentiated from other sources/types of data?
Only a fraction of data created is actually stored and managed, with analysts estimating it to be between 4 – 6 ZB in 2020. Clearly, hybrid data presents a massive opportunity and a tough challenge. Capitalizing on the potential requires the ability to harness the value of all of that data, no matter where it is.
Big data in information technology is used to improve operations, provide better customer service, develop customized marketing campaigns, and take other actions to increase revenue and profits. In the world of technology, things are always changing. It is especially true in the world of big data.
Integrate data governance and data quality practices to create a seamless user experience and build trust in your data. When planning your data governance approach, start small, iterate purposefully, and foster data literacy to drive meaningful business outcomes. And thats really refreshing, she says.
In this episode Dain Sundstrom, CTO of Starburst, explains how the combination of the Trino query engine and the Iceberg table format offer the ease of use and execution speed of data warehouses with the infinite storage and scalability of data lakes. Closing Announcements Thank you for listening!
Summary Kafka has become a ubiquitous technology, offering a simple method for coordinating events and data across different systems. Announcements Hello and welcome to the Data Engineering Podcast, the show about modern datamanagement Introducing RudderStack Profiles. Can you describe your experiences with Kafka?
Data quality and data governance are the top data integrity challenges, and priorities. A long-term approach to your data strategy is key to success as business environments and technologies continue to evolve. However, they require a strong data foundation to be effective. Take a proactive approach.
Key Takeaways: Data mesh is a decentralized approach to datamanagement, designed to shift creation and ownership of data products to domain-specific teams. Data fabric is a unified approach to datamanagement, creating a consistent way to manage, access, and share data across distributed environments.
AI News 🤖 Mira Murati answers the Wall Street Journal about OpenAI Sora — OpenAI CTO has been asked a few questions about the underlying technology in Sora. AI News 🤖 Mira Murati answers the Wall Street Journal about OpenAI Sora — OpenAI CTO has been asked a few questions about the underlying technology in Sora.
Because of its importance and outsized impact on what is possible for downstream data consumers it is critical that everyone is able to collaborate seamlessly. Contact Info tobymao on GitHub @captaintobs on Twitter Website Parting Question From your perspective, what is the biggest gap in the tooling or technology for datamanagement today?
Summary Data systems are inherently complex and often require integration of multiple technologies. This offers a single location for managing visibility and error handling so that data platform engineers can manage complexity. container orchestration, generalized workflow orchestration, etc.)
What if your data lake could do more than just store information—what if it could think like a database? As data lakehouses evolve, they transform how enterprises manage, store, and analyze their data. represented a significant leap forward in data lakehouse technology. Exploring Apache Hudi 1.0:
Summary Generative AI has rapidly transformed everything in the technology sector. Announcements Hello and welcome to the Data Engineering Podcast, the show about modern datamanagement Dagster offers a new approach to building and running data platforms and data pipelines.
Nicola Askham found her way into data governance by accident, and stayed because of the benefit that she was able to provide by serving as a bridge between the technology and business. In this episode she shares the practical steps to implementing a data governance practice in your organization, and the pitfalls to avoid.
Data is more than simply numbers as we approach 2025; it serves as the foundation for business decision-making in all sectors. However, data alone is insufficient. To remain competitive in the current digital environment, businesses must effectively gather, handle, and manage it. Data engineering can help with it.
Data and AI architecture matter “Before focusing on AI/ML use cases such as hyper personalization and fraud prevention, it is important that the data and data architecture are organized and structured in a way which meets the requirements and standards of the local regulators around the world.
Announcements Hello and welcome to the Data Engineering Podcast, the show about modern datamanagementData lakes are notoriously complex. Data lakes in various forms have been gaining significant popularity as a unified interface to an organization's analytics. Closing Announcements Thank you for listening!
In an effort to better understand where data governance is heading, we spoke with top executives from IT, healthcare, and finance to hear their thoughts on the biggest trends, key challenges, and what insights they would recommend. Get the Trendbook What is the Impact of Data Governance on GenAI?
Our leadership combines decades of experience in product safety and quality management with cutting-edge expertise in AI, data science and regulatory insights. We are inspired by the transformative potential of technology to solve persistent challenges in product quality and compliance that we experienced firsthand.
In this episode Kevin Liu shares some of the interesting features that they have built by combining those technologies, as well as the challenges that they face in supporting the myriad workloads that are thrown at this layer of their data platform. Can you describe what role Trino and Iceberg play in Stripe's data architecture?
With its rise in popularity generative AI has emerged as a top CEO priority, and the importance of performant, seamless, and secure datamanagement and analytics solutions to power those AI applications is essential. This means you can expect simpler datamanagement and drastically improved productivity for your business users.
Announcements Hello and welcome to the Data Engineering Podcast, the show about modern datamanagement This episode is supported by Code Comments, an original podcast from Red Hat. Data observability has been gaining adoption for a number of years now, with a large focus on data warehouses.
In these conversations, there are a number of questions that I hear time and time again: Will my data platform be scalable and reliable enough? How will my data stay secure and governed? A critical part of this decision is determining which foundational technology to build infrastructure on. What will costs look like?
In this episode David Yaffe and Johnny Graettinger share the story behind the business and technology and how you can start using it today to build a real-time data lake without all of the headache. Stream processing technologies have been around for around a decade. Can you describe what Estuary is and the story behind it?
Summary A significant amount of time in data engineering is dedicated to building connections and semantic meaning around pieces of information. Linked datatechnologies provide a means of tightly coupling metadata with raw information. What is the overlap between knowledge graphs and "linked data products"?
Announcements Hello and welcome to the Data Engineering Podcast, the show about modern datamanagement Introducing RudderStack Profiles. RudderStack Profiles takes the SaaS guesswork and SQL grunt work out of building complete customer profiles so you can quickly ship actionable, enriched data to every downstream team.
In this episode Yingjun Wu explains how it is architected to power analytical workflows on continuous data flows, and the challenges of making it responsive and scalable. Announcements Hello and welcome to the Data Engineering Podcast, the show about modern datamanagementData lakes are notoriously complex.
Managing and understanding large-scale data ecosystems is a significant challenge for many organizations, requiring innovative solutions to efficiently safeguard user data. To address these challenges, we made substantial investments in advanced data understanding technologies, as part of our Privacy Aware Infrastructure (PAI).
In this episode Pete Hunt, CEO of Dagster labs, outlines these new capabilities, how they reduce the burden on data teams, and the increased collaboration that they enable across teams and business units. Can you describe what the focus of Dagster+ is and the story behind it? What problems are you trying to solve with Dagster+?
In this episode Tanya Bragin shares her experiences as a product manager for two major vendors and the lessons that she has learned about how teams should approach the process of tool selection. Announcements Hello and welcome to the Data Engineering Podcast, the show about modern datamanagement Introducing RudderStack Profiles.
He highlights the role of data teams in modern organizations and how Synq is empowering them to achieve this. Announcements Hello and welcome to the Data Engineering Podcast, the show about modern datamanagementData lakes are notoriously complex. Can you describe what Synq is and the story behind it?
Lastly, companies have historically collaborated using inefficient and legacy technologies requiring file retrieval from FTP servers, API scraping and complex data pipelines. These processes were costly and time-consuming and also introduced governance and security risks, as once data is moved, customers lose all control.
We organize all of the trending information in your field so you don't have to. Join 37,000+ users and stay up to date on the latest articles your peers are reading.
You know about us, now we want to get to know you!
Let's personalize your content
Let's get even more personalized
We recognize your account from another site in our network, please click 'Send Email' below to continue with verifying your account and setting a password.
Let's personalize your content