Sat. Apr 03, 2021 - Fri. Apr 09, 2021

How to gather requirements to re-engineer a legacy data pipeline

Start Data Engineering

Outline: Introduction; Gathering requirements; 0. Understand the current state of the data pipeline; 1. Think like the end user; 2. Know the why; 3. End user interviews; 4. Reduce the scope; 5. End user walkthrough for proposed solution; 6. Timelines & deliverables; Deliver iteratively; Conclusion; Further reading; References. Introduction: As a data engineer, you will have to re-engineer legacy data pipelines.

Confluent and Elastic Partner to Deliver Optimized Search and Real-Time Analytics

Confluent

Today, I am delighted to announce an expanded partnership with Elastic. Together, we’re enabling our joint customers to set data in motion, and through that, deliver optimized search, real-time analytics, […].

Put Your Whole Data Team On The Same Page With Atlan

Data Engineering Podcast

Summary: One of the biggest obstacles to success in delivering data products is cross-team collaboration. Part of the problem is the difference in the information that each role requires to do their job and where they expect to find it. This introduces a barrier to communication that is difficult to overcome, particularly in teams that have not reached a significant level of maturity in their data journey.

Next Stop – Predicting on Data with Cloudera Machine Learning

Cloudera

This is part 4 in this blog series. You can read part 1 here and part 2 here, and watch part 3 here. This blog series follows the manufacturing and operations data lifecycle stages of an electric car manufacturer – typically experienced in large, data-driven manufacturing companies. The first blog introduced a mock vehicle manufacturing company, The Electric Car Company (ECC) and focused on Data Collection.

Apache Airflow® Best Practices for ETL and ELT Pipelines

Whether you’re creating complex dashboards or fine-tuning large language models, your data must be extracted, transformed, and loaded. ETL and ELT pipelines form the foundation of any data product, and Airflow is the open-source data orchestrator specifically designed for moving and transforming data in ETL and ELT pipelines. This eBook covers: An overview of ETL vs.

DevOps and agile still hindered by enterprise silos, inertia

DataKitchen

The post DevOps and agile still hindered by enterprise silos, inertia first appeared on DataKitchen.

Meet the New Analytics Superhero - The CFO

Teradata

The CFO’s broad remit & natural ownership of core financial data can provide the foundation for an enhanced role that leverages data analytics to enable new value opportunities.

More Trending

Seven Common Challenges Fueling Data Warehouse Modernisation

Cloudera

Enterprise data warehouse platform owners face a number of common challenges. In this article, we look at seven challenges, explore the impacts to platform and business owners and highlight how a modern data warehouse can address them. Multiplatform. A recent Harvard Business Review study confirmed that data is increasingly being spread across data centres, private clouds and public clouds.

Managing Data Analytics Is More Like Running A Restaurant Than You Think

DataKitchen

The post Managing Data Analytics Is More Like Running A Restaurant Than You Think first appeared on DataKitchen.

Optimizing Git’s Merge Machinery, #3

Palantir

Editor’s note: This is the third post in a series by a Palantir Software Engineer on optimizing git’s merge and rename detection machinery. Click to read the first and second posts. This is the third in a series of blog posts on scaling git’s merge and rename detection machinery. In particular, the first also included some background information on how the merge machinery works, how we use git at Palantir, and why I have worked on optimizing and rewriting it.

Employee Spotlight: Getting to Know Brendan Freehart, Data Engineer at Silectis

Silectis

Ever wondered what it’s like to work at Silectis? We’re spotlighting our employees to give you a peek into our lives in and outside of work. For our first spotlight, we hear from Brendan Freehart, a true Silectis veteran who’s been with the company for almost 3 years. Brendan is a Data Engineer at Silectis, meaning he partners with our clients to help them get productive with Magpie, our data engineering platform, faster.

Apache Airflow®: The Ultimate Guide to DAG Writing

Speaker: Tamara Fingerlin, Developer Advocate

In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!

The Journey to Understanding your Insurance Customers

Cloudera

Insurance carriers have a unique opportunity: They have access to powerful technologies and a wealth of information that can help them to better understand their customers and provide an enhanced customer experience. Insurance companies recognize that customer service, communication, and personalization — key tenets of any customer experience — are major components of profitability and growth.

How to utilise DataOps to improve the performance of Data Teams

DataKitchen

Hub & Spoken podcast host Jason Foster interviews DataKitchen CEO Chris Bergh on how DataOps can help improve technical data teams' performance with shorter delivery time & continuous feedback. The post How to utilise DataOps to improve the performance of Data Teams first appeared on DataKitchen.

How Smart is Your Smart Factory?

Teradata

As a core component of Industry 4.0, the Smart Factory promises significant productivity increases. But connecting a factory to the cloud & collecting data does not necessarily make it "smart."

Setting Up Secure Networking in Confluent with Azure Private Link

Confluent

We’re happy to announce that Confluent Cloud, our cloud-native service for Apache Kafka®, now supports Azure Private Link for secure network connectivity, in addition to the existing Azure Virtual Network […].

Optimizing The Modern Developer Experience with Coder

Many software teams have migrated their testing and production workloads to the cloud, yet development environments often remain tied to outdated local setups, limiting efficiency and growth. This is where Coder comes in. In our 101 Coder webinar, you’ll explore how cloud-based development environments can unlock new levels of productivity. Discover how to transition from local setups to a secure, cloud-powered ecosystem with ease.

Cloudera Honored With 5-Star Rating in the 2021 CRN® Partner Program Guide

Cloudera

Cloudera is being acknowledged by CRN®, a brand of The Channel Company, in its 2021 Partner Program Guide. This annual guide provides a conclusive list of the most distinguished partner programs from leading technology companies that provide products and services through the IT Channel. The 5-Star rating is awarded to an exclusive group of companies that offer solution providers the best of the best, going above and beyond in their partner programs.

DataOps Transformation Trailblazers: The Journey to DataOps Success

DataKitchen

The post DataOps Transformation Trailblazers: The Journey to DataOps Success first appeared on DataKitchen.

Scala 3: Extension Methods Quickly Explained

Rock the JVM

Deconstructing extension methods: one of the most exciting features of the upcoming Scala 3

Extracting MongoDB fields - even nested ones

Grouparoo

If you’re a data analyst, data scientist, developer, or DB administrator, you may have used, at some point, a non-relational database with flexible schemas. I could list several advantages of a NoSQL solution over SQL-based databases and vice versa. However, the main focus of this post is to discuss a particular downside of MongoDB and a possible way to work around it.

15 Modern Use Cases for Enterprise Business Intelligence

Large enterprises face unique challenges in optimizing their Business Intelligence (BI) output due to the sheer scale and complexity of their operations. Unlike smaller organizations, where basic BI features and simple dashboards might suffice, enterprises must manage vast amounts of data from diverse sources. What are the top modern BI use cases for enterprise businesses to help you get a leg up on the competition?

The Data Engineer & Scientist’s Guide To Root Cause Analysis for Data Quality Issues

Monte Carlo

Data pipelines can break for a million different reasons, and there isn’t a one-size-fits-all approach to understanding how or why. Here are five critical steps data engineers must take to conduct engineering root cause analysis for data quality issues. While I can’t know for sure, I’m confident many of us have been there. I’m talking about the frantic late afternoon Slack message that looks like: This exact scenario happened to me many times during my tenure at Segment.

Cooking with DataOps

DataKitchen

The Data Stack Show podcast hosts Eric Dodds & Kostas Pardalis interview DataKitchen CEO Chris Bergh on why most data analytics projects fail, three things DataOps focuses on, comparing & contrasting DevOps & DataOps, & fixing problems at the source rather than downstream improvements. The post Cooking with DataOps first appeared on DataKitchen.

A Monad Is a Monoid in the Category of Endofunctors: Scala Explanation

Rock the JVM

What's the problem?

Deep Learning for Image Classification in Python with CNN

ProjectPro

As you begin to read this article on Image Classification, I want you to look around and observe the things that you can see. Based on where you are sitting, the things that you see will be different. Almost 99% of the time, you can name these things; even if you don’t know the exact name, you know what they look like. Walking on the road, you see a whole new species of cat you have never seen before, but you still know it’s a cat, right?

Prepare Now: 2025's Must-Know Trends For Product And Data Leaders

Speaker: Jay Allardyce, Deepak Vittal, Terrence Sheflin, and Mahyar Ghasemali

As we look ahead to 2025, business intelligence and data analytics are set to play pivotal roles in shaping success. Organizations are already starting to face a host of transformative trends as the year comes to a close, including the integration of AI in data analytics, an increased emphasis on real-time data insights, and the growing importance of user experience in BI solutions.

Case Study: Sequoia Capital — Why We Moved from Elasticsearch to Rockset

Rockset

Sequoia Capital is a venture capital firm that invests in a broad range of consumer and enterprise start-ups. To keep up with all the data around potential investment opportunities, they created a suite of internal data applications several years ago to better support their investment teams. More recently, they transitioned their internal apps from Elasticsearch to Rockset.

Adoption = Your Business's Success

FreshBI

The objective of this blog: Many businesses fail to recognize a vital concept: adoption. No, we’re not talking about adopting a new family pet; we’re referring to software and product adoption, specifically of PowerBI. Here’s a definition I like: “Adoption is the process by which users become aware of a product, understand its value, and begin to use it.
