Sat.Aug 14, 2021 - Fri.Aug 20, 2021

article thumbnail

4 Key Patterns to Load Data Into A Data Warehouse

Start Data Engineering

Introduction Patterns 1. Batch Data Pipelines 1.1 Process => Data Warehouse 1.2 Process => Cloud Storage => Data Warehouse 2. Near Real-Time Data pipelines 2.1 Data Stream => Consumer => Data Warehouse 2.2 Cloud Storage => process => Data Warehouse Conclusion Further Reading Introduction Loading data into a data warehouse is a key component of most data pipelines.

article thumbnail

A ‘Fresh Squeeze on Data’ to Help Children Learn about Data, AI and Machine Learning

Cloudera

Dear Parents and Educators and Friends of Cloudera, If you are reading this blog, you know us at Cloudera as a group of self-described data geeks and data analysts. We believe data drives better decisions and moves businesses forward and for us, that’s exciting. We are innovating and helping Fortune 500 transform and grow because they can make better data-driven decisions at the accelerated pace we live and work in today.

Insiders

Sign Up for our Newsletter

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

article thumbnail

Announcing the Confluent Q3 ’21 Release

Confluent

The Confluent Q3 ‘21 release is here and packed full of new features that enable the world’s most innovative businesses to continue building what keeps them on top: real-time, mission-critical […].

Building 105
article thumbnail

Let Your Analysts Build A Data Lakehouse With Cuelake

Data Engineering Podcast

Summary Data lakes have been gaining popularity alongside an increase in their sophistication and usability. Despite improvements in performance and data architecture they still require significant knowledge and experience to deploy and manage. In this episode Vikrant Dubey discusses his work on the Cuelake project which allows data analysts to build a lakehouse with SQL queries.

Building 100
article thumbnail

A Guide to Debugging Apache Airflow® DAGs

In Airflow, DAGs (your data pipelines) support nearly every use case. As these workflows grow in complexity and scale, efficiently identifying and resolving issues becomes a critical skill for every data engineer. This is a comprehensive guide with best practices and examples to debugging Airflow DAGs. You’ll learn how to: Create a standardized process for debugging to quickly diagnose errors in your DAGs Identify common issues with DAGs, tasks, and connections Distinguish between Airflow-relate

article thumbnail

4 Ways Conversational AI Is Improving the Customer Experience

DataKitchen

The post 4 Ways Conversational AI Is Improving the Customer Experience first appeared on DataKitchen.

98
article thumbnail

Cloudera DataFlow for the Public Cloud: A technical deep dive

Cloudera

We just announced Cloudera DataFlow for the Public Cloud (CDF-PC), the first cloud-native runtime for Apache NiFi data flows. CDF-PC enables Apache NiFi users to run their existing data flows on a managed, auto-scaling platform with a streamlined way to deploy NiFi data flows and a central monitoring dashboard making it easier than ever before to operate NiFi data flows at scale in the public cloud.

Cloud 122

More Trending

article thumbnail

Migrate And Modify Your Data Platform Confidently With Compilerworks

Data Engineering Podcast

Summary A major concern that comes up when selecting a vendor or technology for storing and managing your data is vendor lock-in. What happens if the vendor fails? What if the technology can’t do what I need it to? Compilerworks set out to reduce the pain and complexity of migrating between platforms, and in the process added an advanced lineage tracking capability.

SQL 100
article thumbnail

Implementing a Pharma Data Mesh using DataOps

DataKitchen

Below is our fourth post (4 of 5) on combining data mesh with DataOps to foster innovation while addressing the challenges of a decentralized architecture. We’ve covered the basic ideas behind data mesh and some of the difficulties that must be managed. Below is a discussion of a data mesh implementation in the pharmaceutical space. For those embarking on the data mesh journey, it may be helpful to discuss a real-world example and the lessons learned from an actual data mesh implementation.

article thumbnail

Automating Data Pipelines in CDP with CDE Managed Airflow Service

Cloudera

When we announced the GA of Cloudera Data Engineering back in September of last year, a key vision we had was to simplify the automation of data transformation pipelines at scale. By leveraging Spark on Kubernetes as the foundation along with a first class job management API many of our customers have been able to quickly deploy, monitor and manage the life cycle of their spark jobs with ease.

article thumbnail

Mitsui Sumitomo Insurance Co., Ltd.

Teradata

Vantage on AWS supports Next Best Action efforts - adding new supplemental coverage on policy renewals at a rate of 250%.

article thumbnail

Mastering Apache Airflow® 3.0: What’s New (and What’s Next) for Data Orchestration

Speaker: Tamara Fingerlin, Developer Advocate

Apache Airflow® 3.0, the most anticipated Airflow release yet, officially launched this April. As the de facto standard for data orchestration, Airflow is trusted by over 77,000 organizations to power everything from advanced analytics to production AI and MLOps. With the 3.0 release, the top-requested features from the community were delivered, including a revamped UI for easier navigation, stronger security, and greater flexibility to run tasks anywhere at any time.

article thumbnail

Prepare Your Unstructured Data For Machine Learning And Computer Vision Without The Toil Using Activeloop

Data Engineering Podcast

Summary The vast majority of data tools and platforms that you hear about are designed for working with structured, text-based data. What do you do when you need to manage unstructured information, or build a computer vision model? Activeloop was created for exactly that purpose. In this episode Davit Buniatyan, founder and CEO of Activeloop, explains why he is spending his time and energy on building a platform to simplify the work of getting your unstructured data ready for machine learning.

article thumbnail

AIOps Benefits All Aspects of the Enterprise

DataKitchen

The post AIOps Benefits All Aspects of the Enterprise first appeared on DataKitchen.

96
article thumbnail

Announcing the GA of Cloudera DataFlow for the Public Cloud

Cloudera

Are you ready to turbo-charge your data flows on the cloud for maximum speed and efficiency? We are excited to announce the general availability of Cloudera DataFlow for the Public Cloud (CDF-PC) – a brand new experience on the Cloudera Data Platform (CDP) to address some of the key operational and monitoring challenges of standard Apache NiFi clusters that are overloaded with high-performant flows.

Cloud 112
article thumbnail

Flight Price Predictor: Training Models to Pinpoint the Best Time for Booking

AltexSoft

Pricing in the airline industry is often compared to a brain game between carriers and passengers where each party pursues the best rates. Carriers aim at selling tickets as expensive as possible — while still not losing consumers to competitors. Passengers want to buy flights at the lowest cost — while not missing the chance to get on board. All this makes flight prices fluctuant and hard to predict.

article thumbnail

Agent Tooling: Connecting AI to Your Tools, Systems & Data

Speaker: Alex Salazar, CEO & Co-Founder @ Arcade | Nate Barbettini, Founding Engineer @ Arcade | Tony Karrer, Founder & CTO @ Aggregage

There’s a lot of noise surrounding the ability of AI agents to connect to your tools, systems and data. But building an AI application into a reliable, secure workflow agent isn’t as simple as plugging in an API. As an engineering leader, it can be challenging to make sense of this evolving landscape, but agent tooling provides such high value that it’s critical we figure out how to move forward.

article thumbnail

How Ripple's C++ Team Cut rippled's Memory Footprint Down To Size

Ripple Engineering

One of the best ways to make software more accessible is to reduce the hardware resources needed to run it. Blockchain software is no exception. The XRP Ledger is already one of the greenest blockchains due to its pioneering consensus protocol, but its ecosystem can still benefit from more efficient resource usage. Reduced inefficiencies benefit businesses, developers, and enthusiasts alike.

Bytes 52
article thumbnail

DataOps engineers run toward error and automate it away

DataKitchen

The post DataOps engineers run toward error and automate it away first appeared on DataKitchen.

IT 79
article thumbnail

Data Product Strategies: How Cloudera Helps Realize and Accelerate Successful Data Product Strategies

Cloudera

Introduction. In the first part of this series , I outlined the prerequisites for a modern Enterprise Data Platform to enable complex data product strategies that address the needs of multiple target segments and deliver strong profit margins as the data product portfolio expands in scope and complexity: With this article, I will dive into the specific capabilities of the Cloudera Data Platform (CDP) that has helped organizations to meet the aforementioned prerequisite capabilities and fulfill a

article thumbnail

How Telcos are Driving the Connected Economy

Teradata

The rich treasure trove of Teclo-derived data, specifically digital payments data, can be utilized to influence and predict business outcomes. Find out more.

article thumbnail

How to Modernize Manufacturing Without Losing Control

Speaker: Andrew Skoog, Founder of MachinistX & President of Hexis Representatives

Manufacturing is evolving, and the right technology can empower—not replace—your workforce. Smart automation and AI-driven software are revolutionizing decision-making, optimizing processes, and improving efficiency. But how do you implement these tools with confidence and ensure they complement human expertise rather than override it? Join industry expert Andrew Skoog as he explores how manufacturers can leverage automation to enhance operations, streamline workflows, and make smarter, data-dri

article thumbnail

ripple-keypairs: XRP Ledger Key Generation and Signing

Ripple Engineering

Public key cryptography is one of the fundamental technologies that enables the XRP Ledger and other blockchain systems to operate. It uses a pair of keys: a public key and a private key. Anyone can create a new account and have authority to sign transactions from that account. In order to generate these keys, you can use a software library like ripple-keypairs.

Java 52
article thumbnail

ZIO Kafka: A Practical Streaming Tutorial

Rock the JVM

Discover how to leverage ZIO to seamlessly interact with Apache Kafka: the proven, scalable solution for reliable communication between distributed application components

Kafka 52
article thumbnail

Keys to Ensure that Data isn’t Slowing Down your Innovation Efforts

Cloudera

Data Lifecycle Management: The Key to AI-Driven Innovation. In digital transformation projects, it’s easy to imagine the benefits of cloud, hybrid, artificial intelligence (AI), and machine learning (ML) models. The hard part is to turn aspiration into reality by creating an organization that is truly data-driven. ML models powering AI use cases are becoming more and more ubiquitous in a variety of environments, especially at industrial organizations adopting Industry 4.0 technologies.

Medical 88
article thumbnail

A Day in the Life of a DataOps Engineer

DataKitchen

DataKitchen's DataOps Engineers Priyanjna Sharma & Chip Bloche discuss what DataOps Engineering entails, key skills required & when to add one to your data team. The post A Day in the Life of a DataOps Engineer first appeared on DataKitchen.

article thumbnail

The Ultimate Guide to Apache Airflow DAGS

With Airflow being the open-source standard for workflow orchestration, knowing how to write Airflow DAGs has become an essential skill for every data engineer. This eBook provides a comprehensive overview of DAG writing features with plenty of example code. You’ll learn how to: Understand the building blocks DAGs, combine them in complex pipelines, and schedule your DAG to run exactly when you want it to Write DAGs that adapt to your data at runtime and set up alerts and notifications Scale you

article thumbnail

Xpring SDK: A 10,000 Foot View

Ripple Engineering

Hello, XRP In early October, Xpring launched Xpring SDK , a set of language specific libraries which made it easy to interact with XRP. As the creator of Xpring SDK, I wanted to take an opportunity to provide some insight into what Xpring has released, our future plans, and the technical architecture of our SDKs. First, a bit of background. The XRP Ledger is a sophisticated, yet complex, piece of software that runs in the context of a distributed system.

article thumbnail

ZIO Kafka: A Practical Streaming Tutorial

Rock the JVM

Discover how to leverage ZIO to seamlessly interact with Apache Kafka: the proven, scalable solution for reliable communication between distributed application components

Kafka 52
article thumbnail

Announcing Preset Cloud GA

Preset

Preset Cloud is now generally available! Preset Cloud is a modern data exploration and visualization platform powered by Apache Superset.

Cloud 52
article thumbnail

Data is the Key to Improving Sustainability in Retail & CPG

Teradata

Consumers continue to place emphasis on the sustainability credentials of those they choose to shop with, & what products they buy. Find out how retailers & CPGs should respond.

Retail 52
article thumbnail

Apache Airflow® Best Practices: DAG Writing

Speaker: Tamara Fingerlin, Developer Advocate

In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!

article thumbnail

Running a Node app on both IPv4 and IPv6

Grouparoo

We want to make Grouparoo as easy as possible to run, which means considering many different server environments. We recently had a customer who wanted to run Grouparoo in a Docker cluster that only had IPv6 addresses enabled. There are lots of reasons why IPv6 might be better (including the fact that we are running out of public IPv4 Addresses ), but it’s rare to find a deployment environment that only has IPv6 addresses by default.

IT 52
article thumbnail

75 Tableau Interview Questions and Answers for 2023

ProjectPro

Making a career transition into data analysis and visualization? Ace your next data analyst interview with these Tableau interview questions and answers that cover all the important topics and concepts in Tableau. Tableau is one of the most significant data visualization and business intelligence tools used by organizations across industries. Almost all fortune 500 companies use this tool to get better insights and work according to the market demands.

BI 40
article thumbnail

Migrating from Segment Part 2: Personas & SQL Traits in RudderStack

RudderStack

We recently helped a customer migrate from Segment to RudderStack, and the project included transitioning Personas functionality to RudderStack Reverse ETL.

SQL 40
article thumbnail

Tableau + Teradata Vantage: Always a Great Match!

Teradata

Tableau Server is now integrated out-of-the-box with Vantage Trial as part of the free 30-day experience. Find out more!

52
article thumbnail

How to Achieve High-Accuracy Results When Using LLMs

Speaker: Ben Epstein, Stealth Founder & CTO | Tony Karrer, Founder & CTO, Aggregage

When tasked with building a fundamentally new product line with deeper insights than previously achievable for a high-value client, Ben Epstein and his team faced a significant challenge: how to harness LLMs to produce consistent, high-accuracy outputs at scale. In this new session, Ben will share how he and his team engineered a system (based on proven software engineering approaches) that employs reproducible test variations (via temperature 0 and fixed seeds), and enables non-LLM evaluation m