September, 2021

article thumbnail

dbt(Data Build Tool) Tutorial

Start Data Engineering

1. Introduction 2. Dbt, the T in ELT 3. Project 3.1. Prerequisites 3.2. Configurations and connections 3.2.1. profiles.yml 3.2.2. dbt_project.yml 3.3 Data flow 3.3.1. Source 3.3.2. Snapshots 3.3.3. Staging 3.3.4. Marts 3.3.4.1. Core 3.3.4.2. Marketing 3.4. dbt run 3.5. dbt test 3.6. dbt docs 3.7. Scheduling 4. Conclusion 5. Further reading 6. References 1.

Building 130
article thumbnail

How to Take Notes in 2021?

Simon Späti

Taking notes helps you not to forget things, teaches you to express yourself, brainstorms your thoughts, research a topic, and so many more things. I used to take notes all my life. Maybe it’s because I’m Swiss, they say we are well organised. I used to write in OneNote for 10+ years. I have notebooks for my bachelor studies and every workplace I worked.

IT 130
Insiders

Sign Up for our Newsletter

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

article thumbnail

Airflow Trigger Rules: All you need to know!

Marc Lamberti

By default, your tasks get executed once all the parent tasks succeed. this behaviour is what you expect in general. But what if you want something more complex? What if you would like to execute a task as soon as one of its parents succeeds? Or maybe you would like to execute a different set of tasks if a task fails? Or act differently according to if a task succeeds, fails or event gets skipped?

article thumbnail

What’s New in Apache Kafka 3.0.0

Confluent

I’m pleased to announce the release of Apache Kafka 3.0 on behalf of the Apache Kafka® community. Apache Kafka 3.0 is a major release in more ways than one. Apache […].

Kafka 145
article thumbnail

15 Modern Use Cases for Enterprise Business Intelligence

Large enterprises face unique challenges in optimizing their Business Intelligence (BI) output due to the sheer scale and complexity of their operations. Unlike smaller organizations, where basic BI features and simple dashboards might suffice, enterprises must manage vast amounts of data from diverse sources. What are the top modern BI use cases for enterprise businesses to help you get a leg up on the competition?

article thumbnail

Streaming Real-Time Analytics with Redis, AWS Fargate, and Dash Framework

Uber Engineering

Introduction. Uber’s GSS (Global Scaled Solutions) team runs scaled programs for diverse products and businesses, including but not limited to Eats, Rides, and Freight. The team transforms Uber’s ideas into agile, global solutions by designing and implementing scalable solutions. One … The post Streaming Real-Time Analytics with Redis, AWS Fargate, and Dash Framework appeared first on Uber Engineering Blog.

AWS 141
article thumbnail

Delivering Your Personal Data Cloud With Prifina

Data Engineering Podcast

Summary The promise of online services is that they will make your life easier in exchange for collecting data about you. The reality is that they use more information than you realize for purposes that are not what you intended. There have been many attempts to harness all of the data that you generate for gaining useful insights about yourself, but they are generally difficult to set up and manage or require software development experience.

Cloud 100

More Trending

article thumbnail

Living on the Edge: How to Accelerate Your Business with Real-time Analytics

Cloudera

Leveraging the Internet of Things (IoT) allows you to improve processes and take your business in new directions. But it requires you to live on the edge. That’s where you find the ability to empower IoT devices to respond to events in real time by capturing and analyzing the relevant data. Edge computing relies on squeezing the power and functionality of a data center into a micro site as close to data sources as possible to enable real-time tasks.

Medical 119
article thumbnail

Groupon

Teradata

Groupon is modernizing with Vantage on AWS to better match its data & analytics with demands of its global business. The Cloud allows Groupon to better leverage infrastructure dollars, support more technology projects and capture opportunity.

AWS 98
article thumbnail

Kafka Summit Americas 2021 Recap

Confluent

The full inventory of three online Kafka Summits in 2021 is now complete. Kafka Summit Americas wrapped just yesterday. Being a part of the event team and the Program Committee, […].

Kafka 145
article thumbnail

Building Uber’s Fulfillment Platform for Planet-Scale using Google Cloud Spanner

Uber Engineering

Introduction. The Fulfillment Platform is a foundational Uber domain that enables the rapid scaling of new verticals. The platform handles billions of database transactions each day, ranging from user actions (e.g., a driver starting a trip) and system actions … The post Building Uber’s Fulfillment Platform for Planet-Scale using Google Cloud Spanner appeared first on Uber Engineering Blog.

article thumbnail

Prepare Now: 2025s Must-Know Trends For Product And Data Leaders

Speaker: Jay Allardyce, Deepak Vittal, and Terrence Sheflin

As we look ahead to 2025, business intelligence and data analytics are set to play pivotal roles in shaping success. Organizations are already starting to face a host of transformative trends as the year comes to a close, including the integration of AI in data analytics, an increased emphasis on real-time data insights, and the growing importance of user experience in BI solutions.

article thumbnail

Digging Into Data Reliability Engineering

Data Engineering Podcast

Summary The accuracy and availability of data has become critically important to the day-to-day operation of businesses. Similar to the practice of site reliability engineering as a means of ensuring consistent uptime of web services, there has been a new trend of building data reliability engineering practices in companies that rely heavily on their data.

article thumbnail

Start DataOps Today with ‘Lean DataOps’

DataKitchen

Data organizations don’t always have the budget or schedule required for DataOps when conceived as a top-to-bottom, enterprise-wide transformational change. An essential part of the DataOps methodology is Agile Development , which breaks development into incremental steps. DataOps can and should be implemented in small steps that complement and build upon existing workflows and data pipelines.

article thumbnail

Apache Kafka Deployments and Systems Reliability – Part 1

Cloudera

There are many ways that Apache Kafka has been deployed in the field. In our Kafka Summit 2021 presentation, we took a brief overview of many different configurations that have been observed to date. In this blog series, we will discuss each of these deployments and the deployment choices made along with how they impact reliability. In Part 1, the discussion is related to: Serial and Parallel Systems Reliability as a concept, Kafka Clusters with and without Co-Located Apache Zookeeper, and Kafka

Kafka 115
article thumbnail

What is an A/B Test?

Netflix Tech

Martin Tingley with Wenjing Zheng , Simon Ejdemyr , Stephanie Lane , and Colin McFarland This is the second post in a multi-part series on how Netflix uses A/B tests to inform decisions and continuously innovate on our products. See here for Part 1: Decision Making at Netflix. Subsequent posts will go into more details on the statistics of A/B tests, experimentation across Netflix, how Netflix has invested in infrastructure to support and scale experimentation, and the importance of the culture

article thumbnail

How to Drive Cost Savings, Efficiency Gains, and Sustainability Wins with MES

Speaker: Nikhil Joshi, Founder & President of Snic Solutions

Is your manufacturing operation reaching its efficiency potential? A Manufacturing Execution System (MES) could be the game-changer, helping you reduce waste, cut costs, and lower your carbon footprint. Join Nikhil Joshi, Founder & President of Snic Solutions, in this value-packed webinar as he breaks down how MES can drive operational excellence and sustainability.

article thumbnail

Announcing ksqlDB 0.21.0

Confluent

We’re pleased to announce ksqlDB 0.21.0! This release includes a major upgrade to ksqlDB’s foreign-key joins, the new data type BYTES, and a new ARRAY_CONCAT function. All of these features […].

Bytes 140
article thumbnail

Real-Time Exactly-Once Ad Event Processing with Apache Flink, Kafka, and Pinot

Uber Engineering

Uber recently launched a new capability: Ads on UberEats. With this new ability came new challenges that needed to be solved at Uber, such as systems for ad auctions, bidding, attribution, reporting, and more. This article focuses on how we … The post Real-Time Exactly-Once Ad Event Processing with Apache Flink, Kafka, and Pinot appeared first on Uber Engineering Blog.

Kafka 135
article thumbnail

Massively Parallel Data Processing In Python Without The Effort Using Bodo

Data Engineering Podcast

Summary Python has beome the de facto language for working with data. That has brought with it a number of challenges having to do with the speed and scalability of working with large volumes of information.There have been many projects and strategies for overcoming these challenges, each with their own set of tradeoffs. In this episode Ehsan Totoni explains how he built the Bodo project to bring the speed and processing power of HPC techniques to the Python data ecosystem without requiring any

article thumbnail

DataOps is the Factory that Supports Your Data Mesh

DataKitchen

Below is our final post (5 of 5) on combining data mesh with DataOps to foster innovation while addressing the challenges of a data mesh decentralized architecture. We see a DataOps process hub like the DataKitchen Platform playing a central supporting role in successfully implementing a data mesh. DataOps excels at the type of workflow automation that can coordinate interdependent domains, manage order-of-operations issues and handle inter-domain communication.

article thumbnail

Improving the Accuracy of Generative AI Systems: A Structured Approach

Speaker: Anindo Banerjea, CTO at Civio & Tony Karrer, CTO at Aggregage

When developing a Gen AI application, one of the most significant challenges is improving accuracy. This can be especially difficult when working with a large data corpus, and as the complexity of the task increases. The number of use cases/corner cases that the system is expected to handle essentially explodes. 💥 Anindo Banerjea is here to showcase his significant experience building AI/ML SaaS applications as he walks us through the current problems his company, Civio, is solving.

article thumbnail

When Data Redefines Companies

Cloudera

The more an enterprise wants to know about itself and its business prospects, the more data it needs to collect and analyze. Additionally, the more data it collects and stores, the better its ability to know customers, to find new ones, and to provide more of what they want to buy. Sounds simple, but a surprising majority of U.S. companies (about two-thirds, according to CIO.com ) are only now getting tuned in to become fully functioning data-driven enterprises by starting new initiatives, scali

Hadoop 100
article thumbnail

Practical API Design at Netflix, Part 1: Using Protobuf FieldMask

Netflix Tech

By Alex Borysov , Ricky Gardiner Background At Netflix, we heavily use gRPC for the purpose of backend to backend communication. When we process a request it is often beneficial to know which fields the caller is interested in and which ones they ignore. Some response fields can be expensive to compute, some fields can require remote calls to other services.

article thumbnail

How to Securely Connect Confluent Cloud with Services on AWS, Azure, and GCP

Confluent

The rise of fully managed cloud services fundamentally changed the technology landscape and introduced benefits like increased flexibility, accelerated deployment, and reduced downtime. Confluent offers a portfolio of fully managed […].

Cloud 122
article thumbnail

Jellyfish: Cost-Effective Data Tiering for Uber’s Largest Storage System

Uber Engineering

Problem. Uber deploys a few storage technologies to store business data based on their application model. One such technology is called Schemaless , which enables the modeling of related entries in one single row of multiple columns, as well as … The post Jellyfish: Cost-Effective Data Tiering for Uber’s Largest Storage System appeared first on Uber Engineering Blog.

Systems 118
article thumbnail

The Ultimate Guide To Data-Driven Construction: Optimize Projects, Reduce Risks, & Boost Innovation

Speaker: Donna Laquidara-Carr, PhD, LEED AP, Industry Insights Research Director at Dodge Construction Network

In today’s construction market, owners, construction managers, and contractors must navigate increasing challenges, from cost management to project delays. Fortunately, digital tools now offer valuable insights to help mitigate these risks. However, the sheer volume of tools and the complexity of leveraging their data effectively can be daunting. That’s where data-driven construction comes in.

article thumbnail

An Exploration Of The Data Engineering Requirements For Bioinformatics

Data Engineering Podcast

Summary Biology has been gaining a lot of attention in recent years, even before the pandemic. As an outgrowth of that popularity, a new field has grown up that pairs statistics and compuational analysis with scientific research, namely bioinformatics. This brings with it a unique set of challenges for data collection, data management, and analytical capabilities.

article thumbnail

Unilever

Teradata

Teradata Vantage on Azure supports 27 business services across supply chain, sales, finance, HR, and more.

Finance 98
article thumbnail

Supercharge your Airflow Pipelines with the Cloudera Provider Package

Cloudera

Many customers looking at modernizing their pipeline orchestration have turned to Apache Airflow, a flexible and scalable workflow manager for data engineers. With 100s of open source operators, Airflow makes it easy to deploy pipelines in the cloud and interact with a multitude of services on premise, in the cloud, and across cloud providers for a true hybrid architecture. .

Python 99
article thumbnail

How We Build Micro Frontends With Lattice

Netflix Tech

Written by Michael Possumato , Nick Tomlin , Jordan Andree , Andrew Shim , and Rahul Pilani. As we continue to grow here at Netflix, the needs of Revenue and Growth Engineering are rapidly evolving; and our tools must also evolve just as rapidly. The Revenue and Growth Tools (RGT) team decided to set off on a journey to build tools in an abstract manner to have solutions readily available within our organization.

article thumbnail

Driving Responsible Innovation: How to Navigate AI Governance & Data Privacy

Speaker: Aindra Misra, Senior Manager, Product Management (Data, ML, and Cloud Infrastructure) at BILL

Join us for an insightful webinar that explores the critical intersection of data privacy and AI governance. In today’s rapidly evolving tech landscape, building robust governance frameworks is essential to fostering innovation while staying compliant with regulations. Our expert speaker, Aindra Misra, will guide you through best practices for ensuring data protection while leveraging AI capabilities.

article thumbnail

Getting Started with GraphQL and Apache Kafka

Confluent

GraphQL and Apache Kafka® are sometimes troubled with misconceptions. One of the reasons for this is that people are often familiar with one but not the other. GraphQL is mostly […].

Kafka 122
article thumbnail

What Should Enterprises Do to Offset Future Technology Disruption?

DataKitchen

The post What Should Enterprises Do to Offset Future Technology Disruption? first appeared on DataKitchen.

article thumbnail

Declarative Machine Learning Without The Operational Overhead Using Continual

Data Engineering Podcast

Summary Building, scaling, and maintaining the operational components of a machine learning workflow are all hard problems. Add the work of creating the model itself, and it’s not surprising that a majority of companies that could greatly benefit from machine learning have yet to either put it into production or see the value. Tristan Zajonc recognized the complexity that acts as a barrier to adoption and created the Continual platform in response.

article thumbnail

Building a Remote-First Culture

Datakin

Blog Building a Remote-First Culture Written by Amanda Bulger on Sep 15, 2021 This morning I was planning an offsite for our team – our first one since Datakin was founded during the pandemic – and I had a realization: I haven’t met most of these people in person yet! We’ve been working together for months and months, solving interesting problems and planning social events, but we have been restricted to knowing each other through a tiny box on a screen.

article thumbnail

What Is Entity Resolution? How It Works & Why It Matters

Entity Resolution Sometimes referred to as data matching or fuzzy matching, entity resolution, is critical for data quality, analytics, graph visualization and AI. Learn what entity resolution is, why it matters, how it works and its benefits. Advanced entity resolution using AI is crucial because it efficiently and easily solves many of today’s data quality and analytics problems.