Sat. Feb 13, 2021 - Fri. Feb 19, 2021

42 Things You Can Stop Doing Once ZooKeeper Is Gone from Apache Kafka

Confluent

Soon, Apache Kafka® will no longer need ZooKeeper! With KIP-500, Kafka will include its own built-in consensus layer, removing the ZooKeeper dependency altogether. The next big milestone in this effort […].

Kafka 145

Is DevOps the future of Agile?

François Nguyen

Let’s start with maybe the best definition you can find of DevOps (credit to AWS): “DevOps is the combination of cultural philosophies, practices, and tools that increases an organization’s ability to deliver applications and services at high velocity: evolving and improving products at a faster pace than organizations using traditional software development and infrastructure management processes.”

AWS 130

Building The Foundations For Data Driven Businesses at 5xData

Data Engineering Podcast

Summary: Every business aims to be data driven, but not all of them succeed in that effort. To truly derive insights from the data it collects, an organization needs certain foundational capabilities. To help more businesses build those foundations, Tarush Aggarwal created 5xData, offering collaborative workshops to assist in setting up the technical and organizational systems that are necessary to succeed.

Building 100

Apache Superset Tutorial

Start Data Engineering

Contents: Why data exploration; Apache Superset architecture; Setup (Prerequisites, Seed data); Using Apache Superset (1. Connecting to a data warehouse, 2. Querying data in SQL Lab, 3. Creating a chart, 4. Creating a dashboard); Pros and Cons; Conclusion. Why data exploration: In most companies, the end users of a data warehouse are analysts, data scientists and business people.

Apache Airflow® Best Practices for ETL and ELT Pipelines

Whether you’re creating complex dashboards or fine-tuning large language models, your data must be extracted, transformed, and loaded. ETL and ELT pipelines form the foundation of any data product, and Airflow is the open-source data orchestrator specifically designed for moving and transforming data in ETL and ELT pipelines. This eBook covers: An overview of ETL vs.

Oracle CDC Source Premium Connector is Now Generally Available

Confluent

One of the most common relational database systems that connects to Apache Kafka® is Oracle, which often holds highly critical enterprise transaction workloads. While Oracle Database (DB) excels at many […].

Using other CDP services with Cloudera Operational Database

Cloudera

In the previous blog post, we looked at some of the application development concepts for the Cloudera Operational Database (COD). In this blog post, we’ll see how you can use other CDP services with COD. COD is an operational database-as-a-service that brings ease of use and flexibility to Apache HBase. Cloudera Operational Database enables developers to quickly build future-proof applications that are architected to handle data evolution.

More Trending

Value Classes in Scala Explained

Rock the JVM

Discover a powerful technique for eliminating hard-to-trace bugs with ad-hoc type definitions: learn how Scala 2's newtypes and Scala 3's opaque types can enhance your code's safety and maintainability

Scala 52

Keys in ksqlDB, Unlocked

Confluent

One of the most highly requested enhancements to ksqlDB is here! Apache Kafka® messages may contain data in message keys as well as message values. Until now, ksqlDB could only […].

Kafka 115

Cloudera DataFlow’s key milestones and wins in 2020

Cloudera

Needless to say, 2020 was an unforgettable year in a lot of ways and we were all happy to say goodbye to it. The pandemic has ushered in new ways of conducting business: remote work cultures, telehealth, grocery and food deliveries, and more. While certain industries were hard hit by this change, most businesses were able to adapt, pivot, and take this adversity in their stride.

Kafka 63

Rockset Is Up to 9.4x Faster than Apache Druid on the Star Schema Benchmark

Rockset

Rockset released new numbers for the Star Schema Benchmark in April 2022. Learn how Rockset is 1.67 times faster than ClickHouse and 1.12 times faster than Druid in the latest performance blog post. Real-time analytics is all about deriving insights and taking actions as soon as data is produced. When broken down into its core requirements, real-time analytics means two things: access to fresh data and fast responses to queries.

Apache Airflow®: The Ultimate Guide to DAG Writing

Speaker: Tamara Fingerlin, Developer Advocate

In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!

Announcing ksqlDB 0.15

Confluent

We’re pleased to announce ksqlDB 0.15, our first release of 2021! This version adds rich support for message key columns and a long-awaited improvement to interactive development with the command line […].

Process 63

dbt at Shopify, Active Learning, and More: Top 10 Links From Across the Web

Data Council

Here's our February 2021 roundup of links from across the web that we picked for you: 1. dbt at Shopify (Data Engineering Podcast) The Data Engineering Podcast recently featured a very interesting discussion about dbt at Shopify. Engineering manager Zeeshan Qureshi and senior data engineer Michelle Ark explained how dbt answered Shopify’s need for an SQL-based solution that its data scientists could use autonomously.

Testing storage with Selenium (Node)

Grouparoo

We have a feature on this site that uses sessionStorage to send analytics data we want to capture. Since it's an important feature, we should write test(s) to cover the use case(s), right? Okay, fine. Let's do it! This website is a Next.js application that uses Jest as our test runner and Selenium WebDriver for integration test help.

IT 52

Optimizing The Modern Developer Experience with Coder

Many software teams have migrated their testing and production workloads to the cloud, yet development environments often remain tied to outdated local setups, limiting efficiency and growth. This is where Coder comes in. In our 101 Coder webinar, you’ll explore how cloud-based development environments can unlock new levels of productivity. Discover how to transition from local setups to a secure, cloud-powered ecosystem with ease.

Reframing “Data Engineering vs Data Science”

Silectis

In this blog post, we’ll walk you through how data science and data engineering are complementary disciplines. We’ll also delineate a third category: data analysis. We’ll explore how both data engineering and data science should be marshaled to make better decisions. Organizations often struggle to strike the right balance between engineering, analysis, and data science skills within data teams.

2021: The Year Real Time Gets Real

DataKitchen

The post 2021: The Year Real Time Gets Real first appeared on DataKitchen.

52

Starting my Career as a Woman in Engineering

Afterpay Tech

By: Maggie Luo. I think we can all agree that 2020 was a year of many firsts. Maybe for you, it was your first time spending most of your time at home with family in years. Or maybe it was your first time voting in the election, downloading TikTok, or making Dalgona coffee (we all remember that phase of quarantine, don’t we?). For me, last year was filled with many milestones: graduating from UC Berkeley as a first-generation college student, moving into an apartment in San Francisco with m

Get user's Previous Path with NextJS Router

Grouparoo

We have a form on our meet page (which, BTW, we'd love you to fill out because we like meeting new people). In addition to the data input from the user, we also wanted to capture how that user got to the page. That helps us determine which of our content is most effective in getting website visitors to take action. The document.referrer Attempt: My gut was to start with document.referrer.

IT 52

15 Modern Use Cases for Enterprise Business Intelligence

Large enterprises face unique challenges in optimizing their Business Intelligence (BI) output due to the sheer scale and complexity of their operations. Unlike smaller organizations, where basic BI features and simple dashboards might suffice, enterprises must manage vast amounts of data from diverse sources. What are the top modern BI use cases for enterprise businesses to help you get a leg up on the competition?

Machine Learning Adapts to Rapidly Evolving Risk in Real-Time

Teradata

Addressing the rapid evolution of fraud and risk is an imperative for payments players. Machine learning and advanced analytics can help. Find out more.

Types of Regression Analysis in Machine Learning

ProjectPro

Regression analysis is a favorite of data science and machine learning practitioners because it offers great flexibility and reliability, making it well suited to questions such as: Do educational degrees and IQ affect salary? Are caffeine consumption and smoking related to mortality risk? Do regular workouts and a dietary plan affect weight?

Tame DataOps System Complexity with a DataOps Platform

DataKitchen

The post Tame DataOps System Complexity with a DataOps Platform first appeared on DataKitchen.

Systems 40

Mutability in Scala Quickly Explained

Rock the JVM

Although frowned upon by FP purists, creating and managing mutable data structures is important in any language: Explore Scala's first-class mutability features

Scala 40

Prepare Now: 2025's Must-Know Trends For Product And Data Leaders

Speaker: Jay Allardyce, Deepak Vittal, Terrence Sheflin, and Mahyar Ghasemali

As we look ahead to 2025, business intelligence and data analytics are set to play pivotal roles in shaping success. Organizations are already starting to face a host of transformative trends as the year comes to a close, including the integration of AI in data analytics, an increased emphasis on real-time data insights, and the growing importance of user experience in BI solutions.

A Machine Learning Pipeline with Real-Time Inference

Zalando Engineering

Customers love the freedom to try the clothes first and pay later. We’d love to offer everyone the convenience of deferred payment. However, fraudsters exploit this to acquire goods they never pay for. The better we know the probability of an order defaulting, the better we can steer the risk and offer the convenience of deferred payment to more customers.

Express Cloudera POV on 2021 data trends in insurance

Cloudera

Almost a year into the pandemic, the accelerated digital transformation has begun to feel less abrupt and more sustained. 2021 looks likely to be defined by a new phase: thriving on digital transformation, rather than just surviving through it. We’ve written about the changes forced on the traditionally risk-averse insurance industry by COVID-19. In 2021, with the crisis hopefully fading, insurance will have time to evaluate the changes made in 2020, assessing what worked and what didn’t

Insurance 108

Data Governance in the Cloud Era – Accelerating, Not Hindering, Data Democratization

Teradata

Cloud tech can be empowering for end users, but without effective data governance, one risks sliding into a morass of inconsistent data, excessive rework, and slow projects.