Sat. Aug 14, 2021 - Fri. Aug 20, 2021

4 Key Patterns to Load Data Into A Data Warehouse

Start Data Engineering

Loading data into a data warehouse is a key component of most data pipelines. The article covers four patterns: two batch pipelines (Process => Data Warehouse; Process => Cloud Storage => Data Warehouse) and two near real-time pipelines (Data Stream => Consumer => Data Warehouse; Cloud Storage => Process => Data Warehouse).
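As a rough illustration of the batch pattern "Process => Cloud Storage => Data Warehouse" named in this teaser, here is a minimal sketch. It is not taken from the linked article: a temporary local directory stands in for cloud storage, an in-memory SQLite database stands in for the warehouse, and every name (`process`, `stage_to_storage`, `load_into_warehouse`, `run_batch`, the `orders` table) is hypothetical.

```python
import csv
import sqlite3
import tempfile
from pathlib import Path


def process(rows):
    """Transform step: drop rows without an amount, convert amounts to cents."""
    return [
        {"order_id": r["order_id"], "amount_cents": round(float(r["amount"]) * 100)}
        for r in rows
        if r["amount"] not in ("", None)
    ]


def stage_to_storage(rows, storage_dir):
    """Write the processed batch as a CSV file (stand-in for an object-store upload)."""
    path = Path(storage_dir) / "orders_batch.csv"
    with path.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["order_id", "amount_cents"])
        writer.writeheader()
        writer.writerows(rows)
    return path


def load_into_warehouse(conn, staged_file):
    """Bulk-load the staged file (stand-in for a warehouse COPY-style command)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id TEXT PRIMARY KEY, amount_cents INTEGER)"
    )
    with staged_file.open(newline="") as f:
        rows = [(r["order_id"], int(r["amount_cents"])) for r in csv.DictReader(f)]
    # INSERT OR REPLACE keeps the load idempotent if the same batch is re-run.
    with conn:
        conn.executemany("INSERT OR REPLACE INTO orders VALUES (?, ?)", rows)
    return len(rows)


def run_batch(raw_rows):
    """Run process -> stage -> load and return (rows loaded, total cents)."""
    conn = sqlite3.connect(":memory:")
    with tempfile.TemporaryDirectory() as storage_dir:
        staged = stage_to_storage(process(raw_rows), storage_dir)
        loaded = load_into_warehouse(conn, staged)
    total = conn.execute("SELECT SUM(amount_cents) FROM orders").fetchone()[0]
    conn.close()
    return loaded, total
```

Staging the batch as a file before loading keeps the transform and the load decoupled, so a failed load can be retried from the staged file without re-running the processing step.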

A ‘Fresh Squeeze on Data’ to Help Children Learn about Data, AI and Machine Learning

Cloudera

Dear Parents, Educators, and Friends of Cloudera, If you are reading this blog, you know us at Cloudera as a group of self-described data geeks and data analysts. We believe data drives better decisions and moves businesses forward, and for us, that’s exciting. We are innovating and helping Fortune 500 companies transform and grow because they can make better data-driven decisions at the accelerated pace we live and work in today.

Announcing the Confluent Q3 ’21 Release

Confluent

The Confluent Q3 ’21 release is here and packed full of new features that enable the world’s most innovative businesses to continue building what keeps them on top: real-time, mission-critical […].

Let Your Analysts Build A Data Lakehouse With Cuelake

Data Engineering Podcast

Summary: Data lakes have been gaining popularity alongside an increase in their sophistication and usability. Despite improvements in performance and data architecture, they still require significant knowledge and experience to deploy and manage. In this episode, Vikrant Dubey discusses his work on the Cuelake project, which allows data analysts to build a lakehouse with SQL queries.

Apache Airflow® Best Practices for ETL and ELT Pipelines

Whether you’re creating complex dashboards or fine-tuning large language models, your data must be extracted, transformed, and loaded. ETL and ELT pipelines form the foundation of any data product, and Airflow is the open-source data orchestrator specifically designed for moving and transforming data in ETL and ELT pipelines. This eBook covers: An overview of ETL vs.

4 Ways Conversational AI Is Improving the Customer Experience

DataKitchen

The post 4 Ways Conversational AI Is Improving the Customer Experience first appeared on DataKitchen.

Cloudera DataFlow for the Public Cloud: A technical deep dive

Cloudera

We just announced Cloudera DataFlow for the Public Cloud (CDF-PC), the first cloud-native runtime for Apache NiFi data flows. CDF-PC enables Apache NiFi users to run their existing data flows on a managed, auto-scaling platform with a streamlined way to deploy NiFi data flows and a central monitoring dashboard making it easier than ever before to operate NiFi data flows at scale in the public cloud.

More Trending

Migrate And Modify Your Data Platform Confidently With Compilerworks

Data Engineering Podcast

Summary: A major concern that comes up when selecting a vendor or technology for storing and managing your data is vendor lock-in. What happens if the vendor fails? What if the technology can’t do what I need it to? Compilerworks set out to reduce the pain and complexity of migrating between platforms, and in the process added an advanced lineage tracking capability.

Implementing a Pharma Data Mesh using DataOps

DataKitchen

Below is our fourth post (4 of 5) on combining data mesh with DataOps to foster innovation while addressing the challenges of a decentralized architecture. We’ve covered the basic ideas behind data mesh and some of the difficulties that must be managed. Below is a discussion of a data mesh implementation in the pharmaceutical space. For those embarking on the data mesh journey, it may be helpful to discuss a real-world example and the lessons learned from an actual data mesh implementation.

Automating Data Pipelines in CDP with CDE Managed Airflow Service

Cloudera

When we announced the GA of Cloudera Data Engineering back in September of last year, a key vision we had was to simplify the automation of data transformation pipelines at scale. By leveraging Spark on Kubernetes as the foundation, along with a first-class job management API, many of our customers have been able to quickly deploy, monitor, and manage the life cycle of their Spark jobs with ease.

Mitsui Sumitomo Insurance Co., Ltd.

Teradata

Vantage on AWS supports Next Best Action efforts - adding new supplemental coverage on policy renewals at a rate of 250%.

Apache Airflow®: The Ultimate Guide to DAG Writing

Speaker: Tamara Fingerlin, Developer Advocate

In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!

Prepare Your Unstructured Data For Machine Learning And Computer Vision Without The Toil Using Activeloop

Data Engineering Podcast

Summary: The vast majority of data tools and platforms that you hear about are designed for working with structured, text-based data. What do you do when you need to manage unstructured information, or build a computer vision model? Activeloop was created for exactly that purpose. In this episode, Davit Buniatyan, founder and CEO of Activeloop, explains why he is spending his time and energy on building a platform to simplify the work of getting your unstructured data ready for machine learning.

AIOps Benefits All Aspects of the Enterprise

DataKitchen

The post AIOps Benefits All Aspects of the Enterprise first appeared on DataKitchen.

Announcing the GA of Cloudera DataFlow for the Public Cloud

Cloudera

Are you ready to turbo-charge your data flows on the cloud for maximum speed and efficiency? We are excited to announce the general availability of Cloudera DataFlow for the Public Cloud (CDF-PC), a brand new experience on the Cloudera Data Platform (CDP) that addresses some of the key operational and monitoring challenges of standard Apache NiFi clusters that are overloaded with high-performance flows.

Flight Price Predictor: Training Models to Pinpoint the Best Time for Booking

AltexSoft

Pricing in the airline industry is often compared to a brain game between carriers and passengers in which each party pursues the best rates. Carriers aim to sell tickets at the highest possible price without losing customers to competitors. Passengers want to buy flights at the lowest cost without missing the chance to get on board. All this makes flight prices volatile and hard to predict.

Optimizing The Modern Developer Experience with Coder

Many software teams have migrated their testing and production workloads to the cloud, yet development environments often remain tied to outdated local setups, limiting efficiency and growth. This is where Coder comes in. In our 101 Coder webinar, you’ll explore how cloud-based development environments can unlock new levels of productivity. Discover how to transition from local setups to a secure, cloud-powered ecosystem with ease.

How Ripple's C++ Team Cut rippled's Memory Footprint Down To Size

Ripple Engineering

One of the best ways to make software more accessible is to reduce the hardware resources needed to run it. Blockchain software is no exception. The XRP Ledger is already one of the greenest blockchains due to its pioneering consensus protocol, but its ecosystem can still benefit from more efficient resource usage. Reduced inefficiencies benefit businesses, developers, and enthusiasts alike.

DataOps engineers run toward error and automate it away

DataKitchen

The post DataOps engineers run toward error and automate it away first appeared on DataKitchen.

Data Product Strategies: How Cloudera Helps Realize and Accelerate Successful Data Product Strategies

Cloudera

Introduction. In the first part of this series, I outlined the prerequisites for a modern Enterprise Data Platform to enable complex data product strategies that address the needs of multiple target segments and deliver strong profit margins as the data product portfolio expands in scope and complexity. With this article, I will dive into the specific capabilities of the Cloudera Data Platform (CDP) that have helped organizations to meet the aforementioned prerequisite capabilities and fulfill a

How Telcos are Driving the Connected Economy

Teradata

The rich treasure trove of Telco-derived data, specifically digital payments data, can be utilized to influence and predict business outcomes. Find out more.

15 Modern Use Cases for Enterprise Business Intelligence

Large enterprises face unique challenges in optimizing their Business Intelligence (BI) output due to the sheer scale and complexity of their operations. Unlike smaller organizations, where basic BI features and simple dashboards might suffice, enterprises must manage vast amounts of data from diverse sources. What are the top modern BI use cases for enterprise businesses to help you get a leg up on the competition?

ripple-keypairs: XRP Ledger Key Generation and Signing

Ripple Engineering

Public key cryptography is one of the fundamental technologies that enables the XRP Ledger and other blockchain systems to operate. It uses a pair of keys: a public key and a private key. Anyone can create a new account and have authority to sign transactions from that account. In order to generate these keys, you can use a software library like ripple-keypairs.

ZIO Kafka: A Practical Streaming Tutorial

Rock the JVM

Discover how to leverage ZIO to seamlessly interact with Apache Kafka: the proven, scalable solution for reliable communication between distributed application components

Keys to Ensure that Data isn’t Slowing Down your Innovation Efforts

Cloudera

Data Lifecycle Management: The Key to AI-Driven Innovation. In digital transformation projects, it’s easy to imagine the benefits of cloud, hybrid, artificial intelligence (AI), and machine learning (ML) models. The hard part is to turn aspiration into reality by creating an organization that is truly data-driven. ML models powering AI use cases are becoming more and more ubiquitous in a variety of environments, especially at industrial organizations adopting Industry 4.0 technologies.

A Day in the Life of a DataOps Engineer

DataKitchen

DataKitchen's DataOps Engineers Priyanjna Sharma & Chip Bloche discuss what DataOps Engineering entails, key skills required & when to add one to your data team. The post A Day in the Life of a DataOps Engineer first appeared on DataKitchen.

Prepare Now: 2025's Must-Know Trends For Product And Data Leaders

Speaker: Jay Allardyce, Deepak Vittal, Terrence Sheflin, and Mahyar Ghasemali

As we look ahead to 2025, business intelligence and data analytics are set to play pivotal roles in shaping success. Organizations are already starting to face a host of transformative trends as the year comes to a close, including the integration of AI in data analytics, an increased emphasis on real-time data insights, and the growing importance of user experience in BI solutions.

Xpring SDK: A 10,000 Foot View

Ripple Engineering

Hello, XRP. In early October, Xpring launched Xpring SDK, a set of language-specific libraries that made it easy to interact with XRP. As the creator of Xpring SDK, I wanted to take an opportunity to provide some insight into what Xpring has released, our future plans, and the technical architecture of our SDKs. First, a bit of background. The XRP Ledger is a sophisticated, yet complex, piece of software that runs in the context of a distributed system.

Announcing Preset Cloud GA

Preset

Preset Cloud is now generally available! Preset Cloud is a modern data exploration and visualization platform powered by Apache Superset.

Data is the Key to Improving Sustainability in Retail & CPG

Teradata

Consumers continue to place emphasis on the sustainability credentials of those they choose to shop with, & what products they buy. Find out how retailers & CPGs should respond.

How to Drive Cost Savings, Efficiency Gains, and Sustainability Wins with MES

Speaker: Nikhil Joshi, Founder & President of Snic Solutions

Is your manufacturing operation reaching its efficiency potential? A Manufacturing Execution System (MES) could be the game-changer, helping you reduce waste, cut costs, and lower your carbon footprint. Join Nikhil Joshi, Founder & President of Snic Solutions, in this value-packed webinar as he breaks down how MES can drive operational excellence and sustainability.

Running a Node app on both IPv4 and IPv6

Grouparoo

We want to make Grouparoo as easy as possible to run, which means considering many different server environments. We recently had a customer who wanted to run Grouparoo in a Docker cluster that only had IPv6 addresses enabled. There are lots of reasons why IPv6 might be better (including the fact that we are running out of public IPv4 addresses), but it’s rare to find a deployment environment that only has IPv6 addresses by default.

75 Tableau Interview Questions and Answers for 2023

ProjectPro

Making a career transition into data analysis and visualization? Ace your next data analyst interview with these Tableau interview questions and answers that cover all the important topics and concepts in Tableau. Tableau is one of the most significant data visualization and business intelligence tools used by organizations across industries. Almost all Fortune 500 companies use this tool to get better insights and work according to market demands.

Migrating from Segment Part 2: Personas & SQL Traits in RudderStack

RudderStack

We recently helped a customer migrate from Segment to RudderStack, and the project included transitioning Personas functionality to RudderStack Reverse ETL.

Tableau + Teradata Vantage: Always a Great Match!

Teradata

Tableau Server is now integrated out-of-the-box with Vantage Trial as part of the free 30-day experience. Find out more!

The Cloud Development Environment Adoption Report

Cloud Development Environments (CDEs) are changing how software teams work by moving development to the cloud. Our Cloud Development Environment Adoption Report gathers insights from 223 developers and business leaders, uncovering key trends in CDE adoption. With 66% of large organizations already using CDEs, these platforms are quickly becoming essential to modern development practices.