Sat. Jan 30, 2021 - Fri. Feb 05, 2021


How to update millions of records in MySQL?

Start Data Engineering

Contents: Introduction, Setup, Problems with a single large update, Updating in batches, Conclusion, Further reading. Introduction: When updating a large number of records in an OLTP database, such as MySQL, you have to be mindful of locking the records. While those records are locked, they cannot be updated or deleted by other transactions on your database.
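
The batched approach the post argues for can be sketched roughly as follows. This is a minimal illustration using Python with the mysql-connector-python client; the table, column, and connection details are made up for the example and are not taken from the post.

    import mysql.connector  # pip install mysql-connector-python

    conn = mysql.connector.connect(host="localhost", user="app",
                                   password="secret", database="shop")
    conn.autocommit = False
    cur = conn.cursor()

    BATCH_SIZE = 10_000
    while True:
        # Update a bounded slice of rows so each transaction holds locks briefly.
        cur.execute(
            "UPDATE orders SET status = 'archived' "
            "WHERE status = 'stale' LIMIT %s",
            (BATCH_SIZE,),
        )
        conn.commit()  # release the row locks before starting the next batch
        if cur.rowcount == 0:  # nothing left to update
            break

    cur.close()
    conn.close()

Committing after every batch keeps each transaction short, so other sessions can still update or delete rows in the same table between batches.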

MySQL 130

Data, The Unsung Hero of the Covid-19 Solution

Cloudera

COVID-19 vaccines from various manufacturers are being approved by more countries, but that doesn’t mean that they will be available at your local pharmacy or mass vaccination centers anytime soon. Creating, scaling up, and manufacturing the vaccine is just the first step; now the world needs to coordinate an incredibly complex supply chain system to deliver more vaccines to more places than ever before.


System Observability For The Cloud Native Era With Chronosphere

Data Engineering Podcast

Summary: Collecting and processing metrics for monitoring use cases is an interesting data problem. It is eminently possible to generate millions or billions of data points per second; the information needs to be propagated to a central location, processed, and analyzed within milliseconds or single-digit seconds; and the consumers of the data need to be able to query it quickly and flexibly.

Systems 100

Open Sourcing the Netflix Domain Graph Service Framework: GraphQL for Spring Boot

Netflix Tech

By Paul Bakker and Kavitha Srinivasan, Images by David Simmer, Edited by Greg Burrell. Netflix has developed a Domain Graph Service (DGS) framework, and it is now open source. The DGS framework simplifies the implementation of GraphQL, both for standalone and federated GraphQL services. Our framework is battle-hardened by our use at scale. By open-sourcing the project, we hope to contribute to the Java and GraphQL communities and learn from and collaborate with everyone who will be using the framework.

Java 98

Apache Airflow® Best Practices for ETL and ELT Pipelines

Whether you’re creating complex dashboards or fine-tuning large language models, your data must be extracted, transformed, and loaded. ETL and ELT pipelines form the foundation of any data product, and Airflow is the open-source data orchestrator specifically designed for moving and transforming data in ETL and ELT pipelines. This eBook covers: an overview of ETL vs. ELT…


Pitching a DataOps Project That Matters

DataKitchen

Every DataOps initiative starts with a pilot project. How do you choose a project that matters to people? DataOps addresses a broad set of use cases because it applies workflow process automation to the end-to-end data-analytics lifecycle. DataOps reduces errors, shortens cycle time, eliminates unplanned work, increases innovation, improves teamwork, and more.

Project 98

8 Years of Event Streaming with Apache Kafka

Confluent

In the eight years since I first started using Apache Kafka®, I have gone from being a student who had just heard about event streaming to contributing to the transformational, company-wide event […].

Kafka 97

More Trending


How I Built an Algorithm to Help Doctors Fight COVID-19

Teradata

Read how a principal data scientist at Teradata leveraged his cross-industry expertise to build an algorithm to help doctors better understand and fight COVID-19.


How DataOps Kitchens Enable Version Control

DataKitchen

This blog builds on earlier posts that defined Kitchens and showed how they map to technical environments. We’ve also discussed how toolchains are segmented to support multiple kitchens. DataOps automates the source code integration, release, and deployment workflows related to analytics development. To use software dev terminology, DataOps supports continuous integration, continuous delivery, and continuous deployment.

Coding 59

Consuming Avro Data from Apache Kafka Topics and Schema Registry with Databricks and Confluent Cloud on Azure

Confluent

How do you process IoT data, change data capture (CDC) data, or streaming data from sensors, applications, and sources in real time? Apache Kafka® and Azure Databricks are widely adopted […].
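
In broad strokes, consuming Avro-encoded Kafka records in a Spark Structured Streaming job on Databricks looks something like the sketch below. The broker address, topic, and schema are placeholders, and the snippet side-steps full Schema Registry integration by assuming Confluent-serialized values, whose 5-byte header is stripped before decoding; the article's own approach may well differ.

    from pyspark.sql import SparkSession, functions as F
    from pyspark.sql.avro.functions import from_avro

    spark = SparkSession.builder.appName("avro-from-kafka").getOrCreate()

    # Avro schema of the message value (hypothetical sensor payload).
    value_schema = """
    {"type": "record", "name": "Reading",
     "fields": [{"name": "device_id", "type": "string"},
                {"name": "temperature", "type": "double"}]}
    """

    raw = (spark.readStream
           .format("kafka")
           .option("kafka.bootstrap.servers", "broker:9092")  # placeholder
           .option("subscribe", "sensor-readings")            # placeholder topic
           .load())

    decoded = (raw
               # Confluent-serialized values carry a 5-byte magic-byte/schema-id
               # header; drop it before handing the bytes to from_avro.
               .withColumn("payload", F.expr("substring(value, 6, length(value) - 5)"))
               .select(from_avro(F.col("payload"), value_schema).alias("reading"))
               .select("reading.*"))

    query = decoded.writeStream.format("console").start()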

Kafka 86

How to configure clients to connect to Apache Kafka Clusters securely – Part 4: TLS Client Authentication

Cloudera

In the previous posts in this series, we discussed Kerberos, LDAP, and PAM authentication for Kafka. In this post we will look at how to configure a Kafka cluster and client to use TLS client authentication. The examples shown here highlight the authentication-related properties in bold font to differentiate them from the other required security properties, as in the example below.
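
The example the post refers to isn't included in this excerpt; as a rough parallel, the equivalent client-side mutual-TLS settings with the Python confluent-kafka client look like the sketch below. Every path, password, and broker address is a placeholder, not a value from the article.

    from confluent_kafka import Producer  # pip install confluent-kafka

    conf = {
        "bootstrap.servers": "broker1.example.com:9093",
        "security.protocol": "SSL",
        # Truststore equivalent: CA certificate used to verify the brokers.
        "ssl.ca.location": "/etc/kafka/certs/ca.pem",
        # Keystore equivalent: this client's certificate and private key,
        # presented to the broker for TLS client authentication.
        "ssl.certificate.location": "/etc/kafka/certs/client.pem",
        "ssl.key.location": "/etc/kafka/certs/client-key.pem",
        "ssl.key.password": "changeit",
    }

    producer = Producer(conf)
    producer.produce("test-topic", value=b"hello")
    producer.flush()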

Kafka 80

Apache Airflow®: The Ultimate Guide to DAG Writing

Speaker: Tamara Fingerlin, Developer Advocate

In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!
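
As a rough illustration of the two features named above, a DAG combining dynamic task mapping with a dataset-driven schedule might look like the sketch below; the dataset URI, task logic, and all names are invented for the example.

    import pendulum
    from airflow.datasets import Dataset
    from airflow.decorators import dag, task

    # Hypothetical upstream dataset; the DAG runs whenever it is updated.
    ORDERS = Dataset("s3://example-bucket/orders/")

    @dag(schedule=[ORDERS], start_date=pendulum.datetime(2024, 1, 1), catchup=False)
    def process_orders():

        @task
        def list_partitions() -> list[str]:
            # Normally discovered at runtime; hard-coded here for brevity.
            return ["2024-01-01", "2024-01-02", "2024-01-03"]

        @task
        def load_partition(partition: str) -> None:
            print(f"loading partition {partition}")

        # Dynamic task mapping: one mapped task instance per partition.
        load_partition.expand(partition=list_partitions())

    process_orders()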


Digital Payments Analytics Rapidly Respond to Changing Preferences and Emerging Value Propositions

Teradata

Data and analytics now allow rapid responses to changing preferences and emerging value propositions, seeding future growth in the digital payments space. Read more.

Data 59

Why You Need to Set SLAs for Your Data Pipelines

Monte Carlo

For today’s data engineering teams, the demand for real-time, accurate data has never been higher, yet data downtime is an all-too-common reality. So, how can we break this vicious cycle and achieve reliable data? Just like our software engineering counterparts 20 years ago, data teams in the early 2020s are facing a significant challenge: reliability.


Announcing the Confluent Community Forum

Confluent

Today, we’re delighted to launch the Confluent Community Forum. Built on Discourse, a platform many developers will already be familiar with, this new forum is a place for the community […].

80

Embracing the conversation – The reawakening of civil rights movements in the workplace

Cloudera

The 2020 murders of Ahmaud Arbery, Breonna Taylor, and George Floyd within a three-month span of one another brought discussions about racial and social justice to dining rooms and boardrooms alike; and, just as with the African-American catalysts before them, their tragedies reopened the door to larger discussions around economic, social, and civil rights.

Media 75

Optimizing The Modern Developer Experience with Coder

Many software teams have migrated their testing and production workloads to the cloud, yet development environments often remain tied to outdated local setups, limiting efficiency and growth. This is where Coder comes in. In our 101 Coder webinar, you’ll explore how cloud-based development environments can unlock new levels of productivity. Discover how to transition from local setups to a secure, cloud-powered ecosystem with ease.


Idiomatic Error Handling in Scala

Rock the JVM

Error handling can be one of the most frustrating aspects of programming: let's explore how Scala offers better and worse ways to manage it.

Scala 52

Data Lineage Now Available with Silectis Magpie Data Engineering Platform

Silectis

We’re excited to share that Silectis has released a new suite of automated data lineage features within Magpie, the end-to-end data engineering platform. These features equip users with knowledge of where data originates, when it was published, and who published it. This additional context gives users the transparency and accountability necessary to trust their data and react to inevitable data quality issues.


Open Source Highlight: OpenLineage

Data Council

OpenLineage is an API for collecting data lineage and metadata at runtime. While initiated by Datakin, the company behind Marquez, it was developed with the aim of creating an open standard. As Datakin’s CTO Julien Le Dem explained in a blog post announcing the launch, OpenLineage is meant to answer the industry-wide need for data lineage while making sure efforts in that direction aren’t fragmented or duplicated.
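
To get a feel for what the standard covers, a minimal OpenLineage run event has roughly the shape below; the field names follow my reading of the OpenLineage spec, and the endpoint URL is a placeholder for whatever backend (Marquez, for example) collects the events.

    import uuid
    from datetime import datetime, timezone

    import requests

    # A minimal "run started" event: which job ran, which run it was, and when.
    event = {
        "eventType": "START",
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "run": {"runId": str(uuid.uuid4())},
        "job": {"namespace": "example-namespace", "name": "daily_orders_load"},
        "inputs": [{"namespace": "example-warehouse", "name": "raw.orders"}],
        "outputs": [{"namespace": "example-warehouse", "name": "analytics.orders"}],
        "producer": "https://example.com/my-pipeline",
    }

    # Placeholder endpoint; Marquez, for instance, accepts lineage events over HTTP.
    requests.post("http://localhost:5000/api/v1/lineage", json=event)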


Building CI/CD with Airflow, GitLab and Terraform in GCP

Ripple Engineering

The Ripple Data Engineering team is expanding, which means higher-frequency changes to our data pipeline source code. This means we need to build better, more configurable, and more collaborative tooling that prevents code collisions and enforces software engineering best practices. To ensure the quality of incoming features, the team sought to create a pipeline that automatically validated those features, built them to verify their interoperability with existing features and GitLab, and […]


15 Modern Use Cases for Enterprise Business Intelligence

Large enterprises face unique challenges in optimizing their Business Intelligence (BI) output due to the sheer scale and complexity of their operations. Unlike smaller organizations, where basic BI features and simple dashboards might suffice, enterprises must manage vast amounts of data from diverse sources. What are the top modern BI use cases for enterprise businesses to help you get a leg up on the competition?


Scala 3: Match Types Quickly Explained

Rock the JVM

Scala 3 comes with lots of new features: in this episode, we dive into match types, a powerful tool for pattern matching on types and more accurate type checking.

Scala 52

MLOps = more money

DareData

If you are a business person wondering why you should invest in DevOps / MLOps, this is your guide, in terms of real, live money. We'll try to keep this as non-technical as possible; the technical terms we do mention are there because they are ABSOLUTELY essential and should have budget allocated to them. Be sure to read to the end for instructions on how to run an analysis of the expected gains from the infrastructure changes required for MLOps in your organisation.

Project 52

Data Governance for Self-Service Analytics Best Practices

DataKitchen



Exploring MNIST Dataset using PyTorch to Train an MLP

ProjectPro

From visual search for improved product discoverability to face recognition on social networks, image classification is fueling a visual revolution online and has taken the world by storm. Image classification, a subfield of computer vision, helps in processing and classifying objects based on trained algorithms. Image classification had its Eureka moment back in 2012 when AlexNet won the ImageNet challenge, and since then there has been exponential growth in the field.
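
A bare-bones version of the kind of MLP training loop the article walks through might look like this; the layer sizes, hyperparameters, and single epoch are arbitrary choices for illustration rather than the article's exact code.

    import torch
    from torch import nn
    from torch.utils.data import DataLoader
    from torchvision import datasets, transforms

    # MNIST digits as 28x28 tensors, flattened inside the model.
    train_data = datasets.MNIST(root="data", train=True, download=True,
                                transform=transforms.ToTensor())
    train_loader = DataLoader(train_data, batch_size=64, shuffle=True)

    # A small multi-layer perceptron: 784 -> 128 -> 10.
    model = nn.Sequential(
        nn.Flatten(),
        nn.Linear(28 * 28, 128),
        nn.ReLU(),
        nn.Linear(128, 10),
    )

    loss_fn = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

    model.train()
    for images, labels in train_loader:  # one epoch, for brevity
        optimizer.zero_grad()
        loss = loss_fn(model(images), labels)
        loss.backward()
        optimizer.step()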


Prepare Now: 2025's Must-Know Trends For Product And Data Leaders

Speakers: Jay Allardyce, Deepak Vittal, Terrence Sheflin, and Mahyar Ghasemali

As we look ahead to 2025, business intelligence and data analytics are set to play pivotal roles in shaping success. Organizations are already starting to face a host of transformative trends as the year comes to a close, including the integration of AI in data analytics, an increased emphasis on real-time data insights, and the growing importance of user experience in BI solutions.


It's Never Too Late For a Career Change

Zalando Engineering

Is it ever too late to follow your dream and start a new career? Well, I was 30 and had been working for Zalando for more than 4 years when I decided to change my career path for the second time. I made the decision a year ago, joined my new team in April 2020, and I haven't regretted it for a single day. Since that transition, a lot of people have approached me with questions and asked me for advice.

IT 40

Announcing Preset Cloud Beta!

Preset

Preset Cloud offers an extremely easy way to get up and running with Superset on a highly scalable, highly performant, and secure cloud service.

Cloud 40

Vantage Trial Delights Cloud Data Analytic Users

Teradata

Vantage Trial provides free, 30-day access to Teradata Vantage in the cloud for analysts, developers, and operations personnel. Find out more.

Cloud 40

Stop using constants. Feed randomized input to test cases.

Zalando Engineering

Introduction: Testing is a widely accepted practice in the software industry. I am an iOS engineer and, like most of us, have been writing tests. The way I approach testing changed radically a few years back, and I have used and shared this new technique within Zalando and outside for a few years since. In this post, I will explain what is wrong with most test cases and how to apply randomized input to improve them.
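
The core idea (replace hard-coded constants with randomized, reproducible inputs) translates to roughly the sketch below. The article comes from an iOS engineer, so its own examples are presumably in Swift; this Python version is only an illustration of the pattern, with a made-up function under test.

    import random
    import unittest


    def running_max(values):
        # Hypothetical function under test: element-wise running maximum.
        result, current = [], float("-inf")
        for v in values:
            current = max(current, v)
            result.append(current)
        return result


    class RunningMaxTests(unittest.TestCase):
        def setUp(self):
            # Log the seed so any failing run can be reproduced exactly.
            self.seed = random.randrange(1_000_000)
            random.seed(self.seed)

        def test_random_inputs_instead_of_constants(self):
            for _ in range(100):
                values = [random.randint(-1000, 1000)
                          for _ in range(random.randint(0, 50))]
                result = running_max(values)
                self.assertEqual(len(result), len(values), msg=f"seed={self.seed}")
                for i, r in enumerate(result):
                    self.assertEqual(r, max(values[: i + 1]), msg=f"seed={self.seed}")


    if __name__ == "__main__":
        unittest.main()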

Coding 40

How to Drive Cost Savings, Efficiency Gains, and Sustainability Wins with MES

Speaker: Nikhil Joshi, Founder & President of Snic Solutions

Is your manufacturing operation reaching its efficiency potential? A Manufacturing Execution System (MES) could be the game-changer, helping you reduce waste, cut costs, and lower your carbon footprint. Join Nikhil Joshi, Founder & President of Snic Solutions, in this value-packed webinar as he breaks down how MES can drive operational excellence and sustainability.