Sat. Apr 03, 2021 - Fri. Apr 09, 2021

How to gather requirements to re-engineer a legacy data pipeline

Start Data Engineering

Contents: Introduction; Gathering requirements (0. Understand the current state of the data pipeline, 1. Think like the end user, 2. Know the why, 3. End user interviews, 4. Reduce the scope, 5. End user walkthrough for proposed solution, 6. Timelines & deliverables); Deliver iteratively; Conclusion; Further reading; References. Introduction: As data engineers, you will have to re-engineer legacy data pipelines.

Confluent and Elastic Partner to Deliver Optimized Search and Real-Time Analytics

Confluent

Today, I am delighted to announce an expanded partnership with Elastic. Together, we’re enabling our joint customers to set data in motion, and through that, deliver optimized search, real-time analytics, […].

Put Your Whole Data Team On The Same Page With Atlan

Data Engineering Podcast

Summary One of the biggest obstacles to success in delivering data products is cross-team collaboration. Part of the problem is the difference in the information that each role requires to do their job and where they expect to find it. This introduces a barrier to communication that is difficult to overcome, particularly in teams that have not reached a significant level of maturity in their data journey.

Next Stop – Predicting on Data with Cloudera Machine Learning

Cloudera

This is part 4 in this blog series. You can read part 1 here and part 2 here, and watch part 3 here. This blog series follows the manufacturing and operations data lifecycle stages of an electric car manufacturer – typically experienced in large, data-driven manufacturing companies. The first blog introduced a mock vehicle manufacturing company, The Electric Car Company (ECC), and focused on Data Collection.

Apache Airflow® 101 Essential Tips for Beginners

Apache Airflow® is the open-source standard for managing workflows as code. It is a versatile tool used worldwide, from agile startups to tech giants to flagship enterprises across all industries. Due to its widespread adoption, Airflow knowledge is paramount to success in the field of data engineering.

DevOps and agile still hindered by enterprise silos, inertia

DataKitchen

The post DevOps and agile still hindered by enterprise silos, inertia first appeared on DataKitchen.

Open Source Highlight: Apache Superset

Data Council

Apache Superset is a very popular open-source project that provides users with an exploration and visualization platform for their (big or not-so-big) data. For instance, it can be used to create not only line charts but also advanced geospatial charts and dashboards, with queries supported via SQL Lab.

The Journey to Understanding your Insurance Customers

Cloudera

Insurance carriers have a unique opportunity: They have access to powerful technologies and a wealth of information that can help them to better understand their customers and provide an enhanced customer experience. Insurance companies recognize that customer service, communication, and personalization — key tenets of any customer experience — are major components of profitability and growth.

Managing Data Analytics Is More Like Running A Restaurant Than You Think

DataKitchen

The post Managing Data Analytics Is More Like Running A Restaurant Than You Think first appeared on DataKitchen.

Optimizing Git’s Merge Machinery, #3

Palantir

Editor’s note: This is the third post in a series by a Palantir Software Engineer on optimizing git’s merge and rename detection machinery; click to read the first and second posts. The first post also included background on how the merge machinery works, how we use git at Palantir, and why I have worked on optimizing and rewriting it.

Apache Airflow® Best Practices: DAG Writing

Speaker: Tamara Fingerlin, Developer Advocate

In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!

Employee Spotlight: Getting to Know Brendan Freehart Data Engineer at Silectis

Silectis

Ever wondered what it’s like to work at Silectis? We’re spotlighting our employees to give you a peek into our lives in and outside of work. For our first spotlight, we hear from Brendan Freehart, a true Silectis veteran who’s been with the company for almost 3 years. Brendan is a Data Engineer at Silectis, meaning he partners with our clients to help them get productive with Magpie, our data engineering platform, faster.

Seven Common Challenges Fueling Data Warehouse Modernisation

Cloudera

Enterprise data warehouse platform owners face a number of common challenges. In this article, we look at seven challenges, explore the impacts on platform and business owners, and highlight how a modern data warehouse can address them. Multiplatform: a recent Harvard Business Review study confirmed that data is increasingly being spread across data centres, private clouds and public clouds.

How to utilise DataOps to improve the performance of Data Teams

DataKitchen

Hub & Spoken podcast host Jason Foster interviews DataKitchen CEO Chris Bergh on how DataOps can help improve technical data teams' performance with shorter delivery time & continuous feedback. The post How to utilise DataOps to improve the performance of Data Teams first appeared on DataKitchen.

How Smart is Your Smart Factory?

Teradata

As a core component of Industry 4.0, the Smart Factory promises significant productivity increases. But connecting a factory to the cloud and collecting data does not necessarily make it "smart."

Optimizing The Modern Developer Experience with Coder

Many software teams have migrated their testing and production workloads to the cloud, yet development environments often remain tied to outdated local setups, limiting efficiency and growth. This is where Coder comes in. In our 101 Coder webinar, you’ll explore how cloud-based development environments can unlock new levels of productivity. Discover how to transition from local setups to a secure, cloud-powered ecosystem with ease.

Setting Up Secure Networking in Confluent with Azure Private Link

Confluent

We’re happy to announce that Confluent Cloud, our cloud-native service for Apache Kafka®, now supports Azure Private Link for secure network connectivity, in addition to the existing Azure Virtual Network […].

Cloudera Honored With 5-Star Rating in the 2021 CRN® Partner Program Guide

Cloudera

Cloudera is being acknowledged by CRN®, a brand of The Channel Company, in its 2021 Partner Program Guide. This annual guide provides a definitive list of the most distinguished partner programs from leading technology companies that provide products and services through the IT Channel. The 5-Star rating is awarded to an exclusive group of companies that offer solution providers the best of the best, going above and beyond in their partner programs.

DataOps Transformation Trailblazers: The Journey to DataOps Success

DataKitchen

The post DataOps Transformation Trailblazers: The Journey to DataOps Success first appeared on DataKitchen.

Scala 3: Extension Methods Quickly Explained

Rock the JVM

Deconstructing extension methods: one of the most exciting features of the upcoming Scala 3

15 Modern Use Cases for Enterprise Business Intelligence

Large enterprises face unique challenges in optimizing their Business Intelligence (BI) output due to the sheer scale and complexity of their operations. Unlike smaller organizations, where basic BI features and simple dashboards might suffice, enterprises must manage vast amounts of data from diverse sources. What are the top modern BI use cases for enterprise businesses to help you get a leg up on the competition?

Extracting MongoDB fields - even nested ones

Grouparoo

If you’re a data analyst, data scientist, developer, or DB administrator, you may have used, at some point, a non-relational database with flexible schemas. Well, I could list several advantages of a NoSQL solution over SQL-based databases, and vice versa. However, the main focus of this post is to discuss a particular downside of MongoDB and a possible way to work around it.
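
As a rough illustration of the idea (this is a minimal sketch, not Grouparoo's actual implementation), a nested document can be flattened into dot-notation keys, the same notation MongoDB itself uses to address embedded fields:

```python
# Minimal sketch: flatten a nested MongoDB-style document into
# dot-notation keys so nested values become top-level fields.
def flatten(doc, prefix=""):
    """Recursively flatten nested dicts into {'a.b.c': value} pairs."""
    fields = {}
    for key, value in doc.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            fields.update(flatten(value, path))  # descend into sub-document
        else:
            fields[path] = value  # leaf value: record full dotted path
    return fields

record = {
    "name": "Ada",
    "address": {"city": "London", "geo": {"lat": 51.5, "lng": -0.1}},
}
print(flatten(record))
# {'name': 'Ada', 'address.city': 'London', 'address.geo.lat': 51.5, 'address.geo.lng': -0.1}
```

A flattened shape like this is one common way to map document fields onto the column-oriented destinations that sync tools typically feed.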

The Data Engineer & Scientist’s Guide To Root Cause Analysis for Data Quality Issues

Monte Carlo

Data pipelines can break for a million different reasons, and there isn’t a one-size-fits-all approach to understanding how or why. Here are five critical steps data engineers must take to conduct engineering root cause analysis for data quality issues. While I can’t know for sure, I’m confident many of us have been there. I’m talking about the frantic late afternoon Slack message. This exact scenario happened to me many times during my tenure at Segment.

Cooking with DataOps

DataKitchen

The Data Stack Show podcast hosts Eric Dodds and Kostas Pardalis interview DataKitchen CEO Chris Bergh on why most data analytics projects fail, the three things DataOps focuses on, comparing and contrasting DevOps and DataOps, and fixing problems at the source rather than relying on downstream improvements. The post Cooking with DataOps first appeared on DataKitchen.

A Monad Is a Monoid in the Category of Endofunctors: Scala Explanation

Rock the JVM

What's the problem?

Apache Airflow® Crash Course: From 0 to Running your Pipeline in the Cloud

With over 30 million monthly downloads, Apache Airflow is the tool of choice for programmatically authoring, scheduling, and monitoring data pipelines. Airflow enables you to define workflows as Python code, allowing for dynamic and scalable pipelines suitable to any use case from ETL/ELT to running ML/AI operations in production. This introductory tutorial provides a crash course for writing and deploying your first Airflow pipeline.

Deep Learning for Image Classification in Python with CNN

ProjectPro

As you begin to read this article on Image Classification, I want you to look around and observe the things that you can see. Based on where you are sitting, the things that you see will be different. Almost 99% of the time you can name these things; even if you don’t know the exact name, you know what they look like. Walking down the road, you might see a breed of cat you have never seen before, but you still know it’s a cat, right?

Case Study: Sequoia Capital — Why We Moved from Elasticsearch to Rockset

Rockset

Sequoia Capital is a venture capital firm that invests in a broad range of consumer and enterprise start-ups. To keep up with all the data around potential investment opportunities, they created a suite of internal data applications several years ago to better support their investment teams. More recently, they transitioned their internal apps from Elasticsearch to Rockset.

Adoption = Your Businesses Success

FreshBI

The objective of this blog: Many businesses fail to recognize a vital concept: Adoption. No, we’re not talking about adopting a new family pet, we’re referring to software and product adoption, specifically of PowerBI. Here’s a definition I like: “Adoption is the process by which users become aware of a product, understand its value, and begin to use it.”
