February, 2021

article thumbnail

Build your data pipelines like the Toyota Way

François Nguyen

If there is one only book to read about lean manufacturing, this is the one. This is the kind of book you can read again and again and still learn something about your current context. It is also a book you can read whatever your industry, you will always find situations covered by this book. Today, we are going to apply these principles to the data pipelines. “The right process will deliver the right results” – Totoya way (section II) In the 14 Toyota way principles, you have

article thumbnail

How to set up a dbt data-ops workflow, using dbt cloud and Snowflake

Start Data Engineering

Introduction Pre-requisites Setting up the data-ops pipeline Snowflake Local development environment dbt cloud Connect to Snowflake Link to github repository Setup deployment(release/prod) environment Setup CI PR -> CI -> merge cycle Schedule jobs Host data documentation Conclusion and next steps Further reading References Introduction With companies realizing the importance of having correct data, there has been a lot of attention on the data-ops side of things.

Cloud 130
Insiders

Sign Up for our Newsletter

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

article thumbnail

Why Data Capabilities Follow Up a Digital Transformation

Team Data Science

Companies can now make data useful to elevate decision making and to optimise products and processes. But what organizational capabilities are necessary and how to get started? It's currently easy to acquire data strategically. First, consider that smartphones function like questionnaires that customers are frequently filling out in a passive or active manner [ , 1 ].

article thumbnail

Node.js ❤️ Apache Kafka – Getting Started with KafkaJS

Confluent

One of the great things about using an Apache Kafka® based architecture is that it naturally decouples systems and allows you to use the best tool for the job. While […].

Kafka 145
article thumbnail

15 Modern Use Cases for Enterprise Business Intelligence

Large enterprises face unique challenges in optimizing their Business Intelligence (BI) output due to the sheer scale and complexity of their operations. Unlike smaller organizations, where basic BI features and simple dashboards might suffice, enterprises must manage vast amounts of data from diverse sources. What are the top modern BI use cases for enterprise businesses to help you get a leg up on the competition?

article thumbnail

Self Service Open Source Data Integration With AirByte

Data Engineering Podcast

Summary Data integration is a critical piece of every data pipeline, yet it is still far from being a solved problem. There are a number of managed platforms available, but the list of options for an open source system that supports a large variety of sources and destinations is still embarrasingly short. The team at Airbyte is adding a new entry to that list with the goal of making robust and easy to use data integration more accessible to teams who want or need to maintain full control of thei

article thumbnail

Next Stop – Building a Data Pipeline from Edge to Insight

Cloudera

This is part 2 in this blog series. You can read part 1, here: Digital Transformation is a Data Journey From Edge to Insight. This blog series follows the manufacturing, operations and sales data for a connected vehicle manufacturer as the data goes through stages and transformations typically experienced in a large manufacturing company on the leading edge of current technology.

More Trending

article thumbnail

Apache Superset Tutorial

Start Data Engineering

Why data exploration Apache Superset architecture Setup Prerequisites Seed data Using Apache Superset 1. Connecting to a data warehouse 2. Querying data in SQL Lab 3. Creating a chart 4. Creating a dashboard Pros and Cons Pros Cons Conclusion Why data exploration In most companies the end users of a data warehouse are analysts, data scientists and business people.

article thumbnail

Is Your Data Holding You Back Instead of Driving You Forward?

Teradata

Everyone knows that data is vital for success in retail. But without a clear data strategy, retailers often eat up resources fighting small-scale battles, whilst gradually losing the war.

Retail 112
article thumbnail

Lessons Learned from Running Apache Kafka at Scale at Pinterest

Confluent

Apache Kafka® is at the heart of the data transportation layer at Pinterest. The amount of data that runs through Kafka has constantly grown over the years. This growth sometimes […].

Kafka 145
article thumbnail

Building The Foundations For Data Driven Businesses at 5xData

Data Engineering Podcast

Summary Every business aims to be data driven, but not all of them succeed in that effort. In order to be able to truly derive insights from the data that an organization collects, there are certain foundational capabilities that they need to have capacity for. In order to help more businesses build those foundations, Tarush Aggarwal created 5xData, offering collaborative workshops to assist in setting up the technical and organizational systems that are necessary to succeed.

Building 100
article thumbnail

Prepare Now: 2025s Must-Know Trends For Product And Data Leaders

Speaker: Jay Allardyce, Deepak Vittal, and Terrence Sheflin

As we look ahead to 2025, business intelligence and data analytics are set to play pivotal roles in shaping success. Organizations are already starting to face a host of transformative trends as the year comes to a close, including the integration of AI in data analytics, an increased emphasis on real-time data insights, and the growing importance of user experience in BI solutions.

article thumbnail

#ClouderaLife Spotlight: Kevin Smith, Staff Customer Operations Engineer

Cloudera

Meet Kevin Smith, a Staff Customer Operations Engineer within the US Public Sector support team. He sums up his day-to-day by saying he works directly with clients on technical cases and provides support and guidance as they troubleshoot unexpected behavior. He also serves as a member of several project teams focusing on upgrade experiences, internal tools, product testing, training, and documentation.

article thumbnail

Is Devops the future of Agile ?

François Nguyen

Let’s start with maybe the best definition you can find on Devops (credit to AWS ) : “DevOps is the combination of cultural philosophies , practices , and tools that increases an organization’s ability to deliver applications and services at high velocity : evolving and improving products at a faster pace than organizations using traditional software development and infrastructure management processes.

AWS 130
article thumbnail

How to Join a fact and a type 2 dimension (SCD2) table

Start Data Engineering

Introduction What is an SCD2 table and why use it? Application table Dimension table Setup Joining fact and SCD2 tables high_spenders user_items Educating end users Conclusion Further reading Introduction If you are using a data warehouse, you would have heard of fact and dimension tables. Simply put, fact tables are used to record a business event and dimension tables are used to record the attributes of business items(eg user, item tables in an e-commerce app).

article thumbnail

Open Sourcing the Netflix Domain Graph Service Framework: GraphQL for Spring Boot

Netflix Tech

By Paul Bakker and Kavitha Srinivasan , Images by David Simmer , Edited by Greg Burrell Netflix has developed a Domain Graph Service (DGS) framework and it is now open source. The DGS framework simplifies the implementation of GraphQL, both for standalone and federated GraphQL services. Our framework is battle-hardened by our use at scale. By open-sourcing the project, we hope to contribute to the Java and GraphQL communities and learn from and collaborate with everyone who will be using the fra

Java 98
article thumbnail

How to Drive Cost Savings, Efficiency Gains, and Sustainability Wins with MES

Speaker: Nikhil Joshi, Founder & President of Snic Solutions

Is your manufacturing operation reaching its efficiency potential? A Manufacturing Execution System (MES) could be the game-changer, helping you reduce waste, cut costs, and lower your carbon footprint. Join Nikhil Joshi, Founder & President of Snic Solutions, in this value-packed webinar as he breaks down how MES can drive operational excellence and sustainability.

article thumbnail

42 Things You Can Stop Doing Once ZooKeeper Is Gone from Apache Kafka

Confluent

Soon, Apache Kafka® will no longer need ZooKeeper! With KIP-500, Kafka will include its own built-in consensus layer, removing the ZooKeeper dependency altogether. The next big milestone in this effort […].

Kafka 145
article thumbnail

How Shopify Is Building Their Production Data Warehouse Using DBT

Data Engineering Podcast

Summary With all of the tools and services available for building a data platform it can be difficult to separate the signal from the noise. One of the best ways to get a true understanding of how a technology works in practice is to hear from people who are running it in production. In this episode Zeeshan Qureshi and Michelle Ark share their experiences using DBT to manage the data warehouse for Shopify.

article thumbnail

Data – the Octane Accelerating Intelligent Connected Vehicles

Cloudera

The digital revolution is making a deep impact on the automotive industry, offering practically unlimited possibilities for more efficient, convenient, and safe driving and travel experiences in connected vehicles. This revolution is just beginning to accelerate – in fact, according to a recent Applied Market Research study, the global connected car market was valued at $63.03 billion in 2019, and is projected to reach $225.16 billion by 2027, registering a CAGR of 17.1% from 2020 to 2027.

article thumbnail

Pitching a DataOps Project That Matters

DataKitchen

Every DataOps initiative starts with a pilot project. How do you choose a project that matters to people? DataOps addresses a broad set of use cases because it applies workflow process automation to the end-to-end data-analytics lifecycle. DataOps reduces errors, shortens cycle time, eliminates unplanned work, increases innovation, improves teamwork, and more.

Project 98
article thumbnail

Improving the Accuracy of Generative AI Systems: A Structured Approach

Speaker: Anindo Banerjea, CTO at Civio & Tony Karrer, CTO at Aggregage

When developing a Gen AI application, one of the most significant challenges is improving accuracy. This can be especially difficult when working with a large data corpus, and as the complexity of the task increases. The number of use cases/corner cases that the system is expected to handle essentially explodes. 💥 Anindo Banerjea is here to showcase his significant experience building AI/ML SaaS applications as he walks us through the current problems his company, Civio, is solving.

article thumbnail

From Product Cycle to Digital Thread

Teradata

In order to survive, the auto industry needs to leverage 'digital threads’ that connect data from customers to dealers to products, & link R&D to production line & the aftermarket.

Data 69
article thumbnail

Hawkins: Diving into the Reasoning Behind our Design System

Netflix Tech

Stranger Things imagery showcasing the inspiration for the Hawkins Design System by Hawkins team member Joshua Godi ; with art contributions by Wiki Chaves Hawkins may be the name of a fictional town in Indiana, most widely known as the backdrop for one of Netflix’s most popular TV series “Stranger Things,” but the name is so much more. Hawkins is the namesake that established the basis for a design system used across the Netflix Studio ecosystem.

article thumbnail

Introducing Confluent Platform 6.1

Confluent

We are pleased to announce the release of Confluent Platform 6.1. With this release, we are further simplifying management tasks for Apache Kafka® operators and providing even higher availability for […].

Kafka 142
article thumbnail

System Observability For The Cloud Native Era With Chronosphere

Data Engineering Podcast

Summary Collecting and processing metrics for monitoring use cases is an interesting data problem. It is eminently possible to generate millions or billions of data points per second, the information needs to be propagated to a central location, processed, and analyzed in timeframes on the order of milliseconds or single-digit seconds, and the consumers of the data need to be able to query the information quickly and flexibly.

Systems 100
article thumbnail

The Ultimate Guide To Data-Driven Construction: Optimize Projects, Reduce Risks, & Boost Innovation

Speaker: Donna Laquidara-Carr, PhD, LEED AP, Industry Insights Research Director at Dodge Construction Network

In today’s construction market, owners, construction managers, and contractors must navigate increasing challenges, from cost management to project delays. Fortunately, digital tools now offer valuable insights to help mitigate these risks. However, the sheer volume of tools and the complexity of leveraging their data effectively can be daunting. That’s where data-driven construction comes in.

article thumbnail

Data, The Unsung Hero of the Covid-19 Solution

Cloudera

COVID-19 vaccines from various manufacturers are being approved by more countries, but that doesn’t mean that they will be available at your local pharmacy or mass vaccination centers anytime soon. Creating, scaling-up and manufacturing the vaccine is just the first step, now the world needs to coordinate an incredible and complex supply chain system to deliver more vaccines to more places than ever before.

article thumbnail

How DataOps Kitchens Enable Version Control

DataKitchen

This blog builds on earlier posts that defined Kitchens and showed how they map to technical environments. We’ve also discussed how toolchains are segmented to support multiple kitchens. DataOps automates the source code integration, release, and deployment workflows related to analytics development. To use software dev terminology, DataOps supports continuous integration, continuous delivery, and continuous deployment.

Coding 59
article thumbnail

How I Built an Algorithm to Help Doctors Fight COVID-19

Teradata

Read how a principal data scientist at Teradata leveraged his cross-industry expertise to build an algorithm to help doctors better understand & fight COVID-19.

article thumbnail

Edge Authentication and Token-Agnostic Identity Propagation

Netflix Tech

by AIM Team Members Karen Casella , Travis Nelson , Sunny Singh ; with prior art and contributions by Justin Ryan , Satyajit Thadeshwar As most developers can attest, dealing with security protocols and identity tokens, as well as user and device authentication, can be challenging. Imagine having multiple protocols, multiple tokens, 200M+ users, and thousands of device types, and the problem can explode in scope.

article thumbnail

Driving Responsible Innovation: How to Navigate AI Governance & Data Privacy

Speaker: Aindra Misra, Senior Manager, Product Management (Data, ML, and Cloud Infrastructure) at BILL

Join us for an insightful webinar that explores the critical intersection of data privacy and AI governance. In today’s rapidly evolving tech landscape, building robust governance frameworks is essential to fostering innovation while staying compliant with regulations. Our expert speaker, Aindra Misra, will guide you through best practices for ensuring data protection while leveraging AI capabilities.

article thumbnail

Oracle CDC Source Premium Connector is Now Generally Available

Confluent

One of the most common relational database systems that connects to Apache Kafka® is Oracle, which often holds highly critical enterprise transaction workloads. While Oracle Database (DB) excels at many […].

article thumbnail

Data-driven performance improvements: Football and retail execution

Retail Insight

When I left school to start a professional football career, I understood very little about data – I did keep a note of the goals I scored, the assists I made and, most likely, the keepie-ups I could perform, but that was about it.

Retail 52
article thumbnail

Cloudera Operational Database application development concepts

Cloudera

Cloudera Operational Database is now available in three different form-factors in Cloudera Data Platform (CDP). . If you are new to Cloudera Operational Database, see this blog post. And, check out the documentation here. . In this blog post, we’ll look at both Apache HBase and Apache Phoenix concepts relevant to developing applications for Cloudera Operational Database.

Database 104
article thumbnail

Declarative Data Sync

Grouparoo

Developers have been using the Grouparoo UI to set up automated data movement from their databases to Mailchimp, Marketo, Salesforce, and more. While having these integrations already written for them saved plenty of time, there was something they missed: their normal developer workflow. Grouparoo now supports declarative data models and integrations to continuously sync your data to all of your cloud-based tools.

Data 52
article thumbnail

What Is Entity Resolution? How It Works & Why It Matters

Entity Resolution Sometimes referred to as data matching or fuzzy matching, entity resolution, is critical for data quality, analytics, graph visualization and AI. Learn what entity resolution is, why it matters, how it works and its benefits. Advanced entity resolution using AI is crucial because it efficiently and easily solves many of today’s data quality and analytics problems.