Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. Data lakes are notoriously complex. Dagster offers a new approach to building and running data platforms and data pipelines. Your first 30 days are free!
With widespread enterprise adoption, learning Hadoop is gaining traction as it can lead to lucrative career opportunities. There are several hurdles and pitfalls students and professionals come across while learning Hadoop. How much Java is required to learn Hadoop?
DataKitchen’s DataOps Platform automates and coordinates all the people, tools, and environments in your entire data analytics organization – everything from orchestration, testing, and monitoring to development and deployment. How does the current set of tools contribute to the fragmentation of data workflows?
Airflow — An open-source platform to programmatically author, schedule, and monitor data pipelines. Apache Oozie — An open-source workflow scheduler system to manage Apache Hadoop jobs. They make it easy to deploy and manage your own Apache Airflow webserver, so you can get straight to writing workflows.
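For readers new to Airflow, here is a minimal sketch of what "programmatically authoring" a pipeline looks like: a DAG with two dependent tasks. It assumes Airflow 2.4+; the DAG id, task names, and callables are illustrative only.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract():
    # Placeholder: pull raw records from a source system.
    return [{"order_id": 1, "amount": 42.0}]


def load():
    # Placeholder: write the transformed records to a warehouse table.
    pass


with DAG(
    dag_id="example_daily_pipeline",  # hypothetical name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",                # run once per day
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    load_task = PythonOperator(task_id="load", python_callable=load)

    # load only runs after extract succeeds
    extract_task >> load_task
```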
Data Engineering is typically a software engineering role that focuses deeply on data – namely, data workflows, data pipelines, and the ETL (Extract, Transform, Load) process. What is the role of a Data Engineer? Since its emergence, Data Science has helped tackle many real-world challenges.
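To make the ETL idea concrete, here is a minimal sketch using only the Python standard library; the file name, table name, and column names are hypothetical.

```python
import csv
import sqlite3


def extract(path):
    # Extract: read raw rows from a CSV export.
    with open(path, newline="") as f:
        return list(csv.DictReader(f))


def transform(rows):
    # Transform: normalise names and keep only the columns the target needs.
    return [(row["id"], row["name"].strip().title()) for row in rows]


def load(records, db_path="warehouse.db"):
    # Load: write the cleaned records into a local SQLite "warehouse".
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS customers (id TEXT, name TEXT)")
    conn.executemany("INSERT INTO customers VALUES (?, ?)", records)
    conn.commit()
    conn.close()


if __name__ == "__main__":
    load(transform(extract("customers.csv")))
```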
In the same way that application performance monitoring ensures reliable software and keeps application downtime at bay, Monte Carlo solves the costly problem of broken data pipelines. Over the past couple of months, we’ve seen the resurgence of “benchmark wars” between major data warehousing platforms.
In this post, we will help you quickly level up your overall knowledge of data pipeline architecture by reviewing: What is data pipeline architecture? Why is data pipeline architecture important?
LTIMindtree’s PolarSled Accelerator helps migrate existing legacy systems, such as SAP, Teradata and Hadoop, to Snowflake. This smooths out workflows and helps teams swiftly mitigate potential issues.
Each type of tool plays a specific role in the DataOps process, helping organizations manage and optimize their data pipelines more effectively. Data Integration: Data integration is the process of collecting, transforming, and consolidating data from various sources.
An Azure Data Engineer is a professional who is in charge of designing, implementing, and maintaining data processing systems and solutions on the Microsoft Azure cloud platform. A Data Engineer is responsible for designing the entire architecture of the data flow while taking the needs of the business into account.
The “legacy” table formats: The data landscape has evolved so quickly that table formats pioneered within the last 25 years are already achieving “legacy” status. It was designed to support high-volume data exchange and compatibility across different system versions, which is essential for streaming architectures such as Apache Kafka.
Data orchestration involves managing the scheduling and execution of data workflows. For this part of the stack, Apache Airflow is a popular open-source platform used for data orchestration across the entire data pipeline. Data versioning component in a modern data stack.
Data quality engineers also need to have experience operating in cloud environments and using many of the modern data stack tools that are utilized in building and maintaining data pipelines. 78% of job postings indicated that at least part of the environment was a modern data warehouse, lake, or lakehouse.
Users can also leverage it for generating interactive visualizations over data. It also comes with lots of automation techniques that enable users to eliminate manual data workflows. It can analyze data in real-time and can perform cluster management. It is much faster than other analytic workload tools like Hadoop.
Job Role 1: Azure Data Engineer. Azure Data Engineers develop, deploy, and manage data solutions with Microsoft Azure data services. They use many data storage, computation, and analytics technologies to develop scalable and robust data pipelines.
This is the world that data orchestration tools aim to create. Data orchestration tools minimize manual intervention by automating the movement of data within data pipelines. Luigi is an open source, Python-based package designed to facilitate the construction of intricate pipelines for batch jobs.
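As an illustration of how Luigi expresses a batch pipeline, here is a minimal sketch with two dependent tasks; the task names, file names, and placeholder logic are hypothetical.

```python
import datetime

import luigi


class ExtractOrders(luigi.Task):
    """Dump one day of orders to a local CSV file (placeholder logic)."""
    date = luigi.DateParameter()

    def output(self):
        return luigi.LocalTarget(f"orders_{self.date}.csv")

    def run(self):
        with self.output().open("w") as f:
            f.write("order_id,amount\n1,42.0\n")


class AggregateOrders(luigi.Task):
    """Sum the day's order amounts; runs only after ExtractOrders completes."""
    date = luigi.DateParameter()

    def requires(self):
        return ExtractOrders(date=self.date)

    def output(self):
        return luigi.LocalTarget(f"total_{self.date}.txt")

    def run(self):
        with self.input().open() as src:
            rows = src.readlines()[1:]  # skip the CSV header
        total = sum(float(line.split(",")[1]) for line in rows)
        with self.output().open("w") as dst:
            dst.write(str(total))


if __name__ == "__main__":
    # Run the full dependency graph with the in-process scheduler.
    luigi.build([AggregateOrders(date=datetime.date(2024, 1, 1))], local_scheduler=True)
```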
How do I know where this data came from or how it’s being used? How do I maintain all my data pipelines? How do I recreate the environment and data sets from scratch? How do I build confidence and trust in the data products I create? How do I ensure customers aren’t impacted by changes or new functionality?
The era of Big Data was characterised by Hadoop, HDFS, and distributed computing (Spark), all built on the JVM. That's why big data technologies got swooshed by the modern data stack when it arrived on the market, with the exception of Spark. We need to store, process and visualise data; everything else is just marketing.
The platform went live in 2015 at Airbnb, the biggest home-sharing and vacation rental site, as an orchestrator for increasingly complex data pipelines. It remains a leading workflow management tool, adopted by thousands of companies from tech giants to startups. How data engineering works. What is Apache Airflow?