Sat. Jan 30, 2021 - Fri. Feb 05, 2021


How to update millions of records in MySQL?

Start Data Engineering

Contents: Introduction, Setup, Problems with a single large update, Updating in batches, Conclusion, Further reading. Introduction: When updating a large number of records in an OLTP database, such as MySQL, you have to be mindful of locking the records. While those records are locked, they cannot be updated or deleted by other transactions on your database.
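
The batched approach the post argues for can be sketched roughly as follows. This is a minimal illustration using Python with the mysql-connector-python client; the table, column, and connection details are made up for the example and are not taken from the post.

    import mysql.connector  # pip install mysql-connector-python

    conn = mysql.connector.connect(host="localhost", user="app",
                                   password="secret", database="shop")
    conn.autocommit = False
    cur = conn.cursor()

    BATCH_SIZE = 10_000
    while True:
        # Update a bounded slice of rows so each transaction holds locks briefly.
        cur.execute(
            "UPDATE orders SET status = 'archived' "
            "WHERE status = 'stale' LIMIT %s",
            (BATCH_SIZE,),
        )
        conn.commit()  # release the row locks before starting the next batch
        if cur.rowcount == 0:  # nothing left to update
            break

    cur.close()
    conn.close()

Committing after every batch keeps each transaction short, so other sessions can still update or delete rows in the same table between batches.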

MySQL 130

Data, The Unsung Hero of the Covid-19 Solution

Cloudera

COVID-19 vaccines from various manufacturers are being approved by more countries, but that doesn’t mean that they will be available at your local pharmacy or mass vaccination centers anytime soon. Creating, scaling up, and manufacturing the vaccine is just the first step; now the world needs to coordinate an incredibly complex supply chain system to deliver more vaccines to more places than ever before.


System Observability For The Cloud Native Era With Chronosphere

Data Engineering Podcast

Summary: Collecting and processing metrics for monitoring use cases is an interesting data problem. It is eminently possible to generate millions or billions of data points per second; the information needs to be propagated to a central location, processed, and analyzed within milliseconds or single-digit seconds; and the consumers of the data need to be able to query it quickly and flexibly.

Systems 100

Open Sourcing the Netflix Domain Graph Service Framework: GraphQL for Spring Boot

Netflix Tech

By Paul Bakker and Kavitha Srinivasan, Images by David Simmer, Edited by Greg Burrell. Netflix has developed a Domain Graph Service (DGS) framework, and it is now open source. The DGS framework simplifies the implementation of GraphQL, both for standalone and federated GraphQL services. Our framework is battle-hardened by our use at scale. By open-sourcing the project, we hope to contribute to the Java and GraphQL communities and learn from and collaborate with everyone who will be using the framework.

Java 98

Apache Airflow® Best Practices for ETL and ELT Pipelines

Whether you’re creating complex dashboards or fine-tuning large language models, your data must be extracted, transformed, and loaded. ETL and ELT pipelines form the foundation of any data product, and Airflow is the open-source data orchestrator specifically designed for moving and transforming data in ETL and ELT pipelines. This eBook covers: an overview of ETL vs. ELT…


Pitching a DataOps Project That Matters

DataKitchen

Every DataOps initiative starts with a pilot project. How do you choose a project that matters to people? DataOps addresses a broad set of use cases because it applies workflow process automation to the end-to-end data-analytics lifecycle. DataOps reduces errors, shortens cycle time, eliminates unplanned work, increases innovation, improves teamwork, and more.

Project 98

8 Years of Event Streaming with Apache Kafka

Confluent

In the eight years since I first started using Apache Kafka®, I have gone from being a student who had just heard about event streaming to contributing to the transformational, company-wide event […].

Kafka 97

More Trending


How I Built an Algorithm to Help Doctors Fight COVID-19

Teradata

Read how a principal data scientist at Teradata leveraged his cross-industry expertise to build an algorithm to help doctors better understand and fight COVID-19.


How DataOps Kitchens Enable Version Control

DataKitchen

This blog builds on earlier posts that defined Kitchens and showed how they map to technical environments. We’ve also discussed how toolchains are segmented to support multiple kitchens. DataOps automates the source code integration, release, and deployment workflows related to analytics development. To use software dev terminology, DataOps supports continuous integration, continuous delivery, and continuous deployment.

Coding 59

Consuming Avro Data from Apache Kafka Topics and Schema Registry with Databricks and Confluent Cloud on Azure

Confluent

How do you process IoT data, change data capture (CDC) data, or streaming data from sensors, applications, and sources in real time? Apache Kafka® and Azure Databricks are widely adopted […].
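
In broad strokes, consuming Avro-encoded Kafka records in a Spark Structured Streaming job on Databricks looks something like the sketch below. The broker address, topic, and schema are placeholders, and the snippet side-steps full Schema Registry integration by assuming Confluent-serialized values, whose 5-byte header is stripped before decoding; the article's own approach may well differ.

    from pyspark.sql import SparkSession, functions as F
    from pyspark.sql.avro.functions import from_avro

    spark = SparkSession.builder.appName("avro-from-kafka").getOrCreate()

    # Avro schema of the message value (hypothetical sensor payload).
    value_schema = """
    {"type": "record", "name": "Reading",
     "fields": [{"name": "device_id", "type": "string"},
                {"name": "temperature", "type": "double"}]}
    """

    raw = (spark.readStream
           .format("kafka")
           .option("kafka.bootstrap.servers", "broker:9092")  # placeholder
           .option("subscribe", "sensor-readings")            # placeholder topic
           .load())

    decoded = (raw
               # Confluent-serialized values carry a 5-byte magic-byte/schema-id
               # header; drop it before handing the bytes to from_avro.
               .withColumn("payload", F.expr("substring(value, 6, length(value) - 5)"))
               .select(from_avro(F.col("payload"), value_schema).alias("reading"))
               .select("reading.*"))

    query = decoded.writeStream.format("console").start()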

Kafka 86

How to configure clients to connect to Apache Kafka Clusters securely – Part 4: TLS Client Authentication

Cloudera

In the previous posts in this series, we discussed Kerberos, LDAP, and PAM authentication for Kafka. In this post we will look at how to configure a Kafka cluster and client to use TLS client authentication. The examples shown here highlight the authentication-related properties in bold font to differentiate them from the other required security properties, as in the example below.
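
The example the post refers to isn't included in this excerpt; as a rough parallel, the equivalent client-side mutual-TLS settings with the Python confluent-kafka client look like the sketch below. Every path, password, and broker address is a placeholder, not a value from the article.

    from confluent_kafka import Producer  # pip install confluent-kafka

    conf = {
        "bootstrap.servers": "broker1.example.com:9093",
        "security.protocol": "SSL",
        # Truststore equivalent: CA certificate used to verify the brokers.
        "ssl.ca.location": "/etc/kafka/certs/ca.pem",
        # Keystore equivalent: this client's certificate and private key,
        # presented to the broker for TLS client authentication.
        "ssl.certificate.location": "/etc/kafka/certs/client.pem",
        "ssl.key.location": "/etc/kafka/certs/client-key.pem",
        "ssl.key.password": "changeit",
    }

    producer = Producer(conf)
    producer.produce("test-topic", value=b"hello")
    producer.flush()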

Kafka 80

Apache Airflow®: The Ultimate Guide to DAG Writing

Speaker: Tamara Fingerlin, Developer Advocate

In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!
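
As a rough illustration of the two features named above, a DAG combining dynamic task mapping with a dataset-driven schedule might look like the sketch below; the dataset URI, task logic, and all names are invented for the example.

    import pendulum
    from airflow.datasets import Dataset
    from airflow.decorators import dag, task

    # Hypothetical upstream dataset; the DAG runs whenever it is updated.
    ORDERS = Dataset("s3://example-bucket/orders/")

    @dag(schedule=[ORDERS], start_date=pendulum.datetime(2024, 1, 1), catchup=False)
    def process_orders():

        @task
        def list_partitions() -> list[str]:
            # Normally discovered at runtime; hard-coded here for brevity.
            return ["2024-01-01", "2024-01-02", "2024-01-03"]

        @task
        def load_partition(partition: str) -> None:
            print(f"loading partition {partition}")

        # Dynamic task mapping: one mapped task instance per partition.
        load_partition.expand(partition=list_partitions())

    process_orders()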


Digital Payments Analytics Rapidly Respond to Changing Preferences and Emerging Value Propositions

Teradata

Data and analytics now allow rapid responses to changing preferences and emerging value propositions, seeding future growth in the digital payments space. Read more.

Data 59

Why You Need to Set SLAs for Your Data Pipelines

Monte Carlo

For today’s data engineering teams, the demand for real-time, accurate data has never been higher, yet data downtime is an all-too-common reality. So, how can we break this vicious cycle and achieve reliable data? Just like our software engineering counterparts 20 years ago, data teams in the early 2020s are facing a significant challenge: reliability.


Announcing the Confluent Community Forum

Confluent

Today, we’re delighted to launch the Confluent Community Forum. Built on Discourse, a platform many developers will already be familiar with, this new forum is a place for the community […].

80

Embracing the conversation – The reawakening of civil rights movements in the workplace

Cloudera

The 2020 murders of Ahmaud Arbery, Breonna Taylor, and George Floyd within a three-month span of one another brought discussions about racial and social justice to dining rooms and boardrooms alike; and, just as with the African-American catalysts before them, their tragedies reopened the door to larger discussions around economic, social, and civil rights.

Media 75

Optimizing The Modern Developer Experience with Coder

Many software teams have migrated their testing and production workloads to the cloud, yet development environments often remain tied to outdated local setups, limiting efficiency and growth. This is where Coder comes in. In our 101 Coder webinar, you’ll explore how cloud-based development environments can unlock new levels of productivity. Discover how to transition from local setups to a secure, cloud-powered ecosystem with ease.


Idiomatic Error Handling in Scala

Rock the JVM

Error handling can be one of the most frustrating aspects of programming: let's explore how Scala offers better and worse ways to manage it.

Scala 52

Data Lineage Now Available with Silectis Magpie Data Engineering Platform

Silectis

We’re excited to share that Silectis has released a new suite of automated data lineage features within Magpie, the end-to-end data engineering platform. These features equip users with knowledge of where data originates, when it was published, and who published it. This additional context gives users the transparency and accountability necessary to trust their data and react to inevitable data quality issues.


Open Source Highlight: OpenLineage

Data Council

OpenLineage is an API for collecting data lineage and metadata at runtime. While initiated by Datakin, the company behind Marquez, it was developed with the aim of creating an open standard. As Datakin’s CTO Julien Le Dem explained in a blog post announcing the launch, OpenLineage is meant to answer the industry-wide need for data lineage while making sure efforts in that direction aren’t fragmented or duplicated.
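
To get a feel for what the standard covers, a minimal OpenLineage run event has roughly the shape below; the field names follow my reading of the OpenLineage spec, and the endpoint URL is a placeholder for whatever backend (Marquez, for example) collects the events.

    import uuid
    from datetime import datetime, timezone

    import requests

    # A minimal "run started" event: which job ran, which run it was, and when.
    event = {
        "eventType": "START",
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "run": {"runId": str(uuid.uuid4())},
        "job": {"namespace": "example-namespace", "name": "daily_orders_load"},
        "inputs": [{"namespace": "example-warehouse", "name": "raw.orders"}],
        "outputs": [{"namespace": "example-warehouse", "name": "analytics.orders"}],
        "producer": "https://example.com/my-pipeline",
    }

    # Placeholder endpoint; Marquez, for instance, accepts lineage events over HTTP.
    requests.post("http://localhost:5000/api/v1/lineage", json=event)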


Building CI/CD with Airflow, GitLab and Terraform in GCP

Ripple Engineering

The Ripple Data Engineering team is expanding, which means higher-frequency changes to our data pipeline source code. This means we need to build better, more configurable, and more collaborative tooling that prevents code collisions and enforces software engineering best practices. To ensure the quality of incoming features, the team sought to create a pipeline that automatically validated those features, built them to verify their interoperability with existing features and GitLab, and […]


15 Modern Use Cases for Enterprise Business Intelligence

Large enterprises face unique challenges in optimizing their Business Intelligence (BI) output due to the sheer scale and complexity of their operations. Unlike smaller organizations, where basic BI features and simple dashboards might suffice, enterprises must manage vast amounts of data from diverse sources. What are the top modern BI use cases for enterprise businesses to help you get a leg up on the competition?


Scala 3: Match Types Quickly Explained

Rock the JVM

Scala 3 comes with lots of new features: in this episode, we dive into match types, a powerful tool for pattern matching on types and more accurate type checking.

Scala 52

MLOps = more money

DareData

If you are a business person wondering why you should invest in DevOps / MLOps, this is your guide, in terms of real, live money. We'll try to keep this as non-technical as possible; the technical terms we do mention are there because they are ABSOLUTELY essential and should have budget allocated to them. Be sure to read to the end for instructions on how to run an analysis of the expected gains from the infrastructure changes required for MLOps in your organisation.

Project 52

Data Governance for Self-Service Analytics Best Practices

DataKitchen



Exploring MNIST Dataset using PyTorch to Train an MLP

ProjectPro

From visual search for improved product discoverability to face recognition on social networks, image classification is fueling a visual revolution online and has taken the world by storm. Image classification, a subfield of computer vision, helps in processing and classifying objects based on trained algorithms. Image classification had its Eureka moment back in 2012 when AlexNet won the ImageNet challenge, and since then there has been exponential growth in the field.
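
A bare-bones version of the kind of MLP training loop the article walks through might look like this; the layer sizes, hyperparameters, and single epoch are arbitrary choices for illustration rather than the article's exact code.

    import torch
    from torch import nn
    from torch.utils.data import DataLoader
    from torchvision import datasets, transforms

    # MNIST digits as 28x28 tensors, flattened inside the model.
    train_data = datasets.MNIST(root="data", train=True, download=True,
                                transform=transforms.ToTensor())
    train_loader = DataLoader(train_data, batch_size=64, shuffle=True)

    # A small multi-layer perceptron: 784 -> 128 -> 10.
    model = nn.Sequential(
        nn.Flatten(),
        nn.Linear(28 * 28, 128),
        nn.ReLU(),
        nn.Linear(128, 10),
    )

    loss_fn = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

    model.train()
    for images, labels in train_loader:  # one epoch, for brevity
        optimizer.zero_grad()
        loss = loss_fn(model(images), labels)
        loss.backward()
        optimizer.step()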


Prepare Now: 2025's Must-Know Trends For Product And Data Leaders

Speakers: Jay Allardyce, Deepak Vittal, Terrence Sheflin, and Mahyar Ghasemali

As we look ahead to 2025, business intelligence and data analytics are set to play pivotal roles in shaping success. Organizations are already starting to face a host of transformative trends as the year comes to a close, including the integration of AI in data analytics, an increased emphasis on real-time data insights, and the growing importance of user experience in BI solutions.


It's Never Too Late For a Career Change

Zalando Engineering

Is it ever too late to follow your dream and start a new career? Well, I was 30 and had been working for Zalando for more than 4 years when I decided to change my career path for the second time. I made the decision a year ago, joined my new team in April 2020, and I haven't regretted it for a single day. Since that transition, a lot of people have approached me with questions and asked me for advice.

IT 40

Announcing Preset Cloud Beta!

Preset

Preset Cloud offers an extremely easy way to get up and running with Superset on a highly scalable, highly performant, and secure cloud service.

Cloud 40

Vantage Trial Delights Cloud Data Analytic Users

Teradata

Vantage Trial provides free, 30-day access to Teradata Vantage in the cloud for analysts, developers, and operations personnel. Find out more.

Cloud 40

Stop using constants. Feed randomized input to test cases.

Zalando Engineering

Introduction: Testing is a widely accepted practice in the software industry. I am an iOS engineer and, like most of us, have been writing tests. The way I approach testing changed radically a few years back, and I have used and shared this new technique within Zalando and outside for a few years since. In this post, I will explain what is wrong with most test cases and how to apply randomized input to improve them.
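
The core idea (replace hard-coded constants with randomized, reproducible inputs) translates to roughly the sketch below. The article comes from an iOS engineer, so its own examples are presumably in Swift; this Python version is only an illustration of the pattern, with a made-up function under test.

    import random
    import unittest


    def running_max(values):
        # Hypothetical function under test: element-wise running maximum.
        result, current = [], float("-inf")
        for v in values:
            current = max(current, v)
            result.append(current)
        return result


    class RunningMaxTests(unittest.TestCase):
        def setUp(self):
            # Log the seed so any failing run can be reproduced exactly.
            self.seed = random.randrange(1_000_000)
            random.seed(self.seed)

        def test_random_inputs_instead_of_constants(self):
            for _ in range(100):
                values = [random.randint(-1000, 1000)
                          for _ in range(random.randint(0, 50))]
                result = running_max(values)
                self.assertEqual(len(result), len(values), msg=f"seed={self.seed}")
                for i, r in enumerate(result):
                    self.assertEqual(r, max(values[: i + 1]), msg=f"seed={self.seed}")


    if __name__ == "__main__":
        unittest.main()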

Coding 40

How to Drive Cost Savings, Efficiency Gains, and Sustainability Wins with MES

Speaker: Nikhil Joshi, Founder & President of Snic Solutions

Is your manufacturing operation reaching its efficiency potential? A Manufacturing Execution System (MES) could be the game-changer, helping you reduce waste, cut costs, and lower your carbon footprint. Join Nikhil Joshi, Founder & President of Snic Solutions, in this value-packed webinar as he breaks down how MES can drive operational excellence and sustainability.