Tue, Aug 13, 2024

Data Engineering Interview Series #1: Data Structures and Algorithms

Start Data Engineering

1. Introduction
2. Data structures and algorithms to know
   2.1. List
   2.2. Dictionary
   2.3. Queue
   2.4. Stack
   2.5. Set
   2.6. Counter (from collections module)
   2.7. Heap
   2.8. Graph search
        2.8.1. Depth First Search (DFS)
        2.8.2. Breadth First Search (BFS)
   2.9. Binary Search
3. Common DSA questions asked during DE interviews
   3.1. Intervals
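
As a quick illustration of the graph-search topics listed above, here is a minimal Python sketch of breadth-first search over an adjacency-list graph; the graph, node names, and helper function are illustrative assumptions, not code from the article.

```python
from collections import deque

def bfs(graph: dict, start: str) -> list:
    """Breadth-first traversal of an adjacency-list graph, returning visit order."""
    visited = {start}
    order = []
    queue = deque([start])          # the queue (section 2.3) drives the traversal
    while queue:
        node = queue.popleft()
        order.append(node)
        for neighbor in graph.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(neighbor)
    return order

# Illustrative graph: A -> B, C; B -> D; C -> D
print(bfs({"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}, "A"))  # ['A', 'B', 'C', 'D']
```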

Databricks SQL Serverless is now available on Google Cloud Platform

databricks

Databricks SQL Serverless is now Generally Available on Google Cloud Platform (GCP)! SQL Serverless is available in 7 GCP regions and 40+ regions across AWS, Azure and GCP.

How to Deal with Missing Data Using Interpolation Techniques in Pandas

KDnuggets

Stop data from dropping out - learn how to handle missing data like a pro using interpolation techniques in Pandas.
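
As a hedged sketch of the technique the article covers, the example below fills gaps in a small time-indexed Series with pandas' built-in interpolate; the values and dates are made up for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative series with missing readings
s = pd.Series(
    [1.0, np.nan, np.nan, 4.0, 5.0],
    index=pd.date_range("2024-08-01", periods=5, freq="D"),
)

linear = s.interpolate(method="linear")                  # evenly spaced fill: 2.0, 3.0
time_aware = s.interpolate(method="time")                # weights gaps by the datetime index
forward_only = s.interpolate(limit_direction="forward")  # only fill after a known value

print(pd.DataFrame({"raw": s, "linear": linear, "time": time_aware}))
```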

Announcing the Generative AI World Cup 2024: A Global Hackathon by Databricks

databricks

Welcome to the Generative AI World Cup 2024, a global hackathon inviting participants to develop innovative Generative AI applications that solve real-world problems.

Apache Airflow® 101 Essential Tips for Beginners

Apache Airflow® is the open-source standard to manage workflows as code. It is a versatile tool used in companies across the world from agile startups to tech giants to flagship enterprises across all industries. Due to its widespread adoption, Airflow knowledge is paramount to success in the field of data engineering.

Using Cluster Analysis to Segment Your Data

KDnuggets

Uncover the potential of cluster analysis for segmenting, analyzing, and gaining insights from groups of similar data.
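
A minimal sketch of the idea using scikit-learn's KMeans; the article itself may use a different library or algorithm, and the synthetic data and choice of three clusters are assumptions for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(42)
# Three synthetic customer segments: (monthly_spend, visits_per_month)
data = np.vstack([
    rng.normal([20, 2], 2, size=(50, 2)),
    rng.normal([60, 8], 4, size=(50, 2)),
    rng.normal([120, 15], 6, size=(50, 2)),
])

kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = kmeans.fit_predict(data)     # segment id for every row
print(kmeans.cluster_centers_)        # centroid of each discovered segment
```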

DoorDash Empowers Engineers with Kafka Self-Serve

DoorDash Engineering

DoorDash is supporting an increasingly diverse array of infrastructure use cases as the company matures. To maintain our development velocity and meet growing demands, we are transitioning toward making our stateful storage offerings more self-serve. This journey began with Kafka, one of our most critical and widely used infrastructure components. Kafka is a distributed event streaming platform that DoorDash uses to handle billions of real-time events.
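
For readers new to Kafka, here is a hedged, minimal producer sketch using the confluent-kafka Python client; the broker address, topic name, and payload are placeholders and have nothing to do with DoorDash's internal setup.

```python
import json
from confluent_kafka import Producer

producer = Producer({"bootstrap.servers": "localhost:9092"})  # placeholder broker

def delivery_report(err, msg):
    """Called once per message to confirm delivery or surface an error."""
    if err is not None:
        print(f"Delivery failed: {err}")
    else:
        print(f"Delivered to {msg.topic()} [partition {msg.partition()}]")

event = {"order_id": 123, "status": "created"}  # illustrative event payload
producer.produce("orders", key="123", value=json.dumps(event), callback=delivery_report)
producer.flush()  # block until outstanding messages are delivered
```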

Current 2024: What’s on Tap in Data Streaming

Confluent

Current 2024 brings 100+ sessions, keynotes, lightning talks, and more from industry leaders. Check out the agenda, highlights, networking events, and more event info.

Avoid Building a Data Platform in 2024

Towards Data Science

Why articles about ‘Building a Data Platform’ are mostly misleading.

Using NumPy to Perform Date and Time Calculations

KDnuggets

NumPy allows you to easily create arrays of dates, perform arithmetic on dates and times, and convert between different time units with just a few lines of code.
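
A small sketch of the operations described, using NumPy's datetime64 and timedelta64 types; the dates are arbitrary examples.

```python
import numpy as np

# A week of daily dates, created in one call
days = np.arange("2024-08-13", "2024-08-20", dtype="datetime64[D]")

# Date arithmetic: shift every date by 30 days
shifted = days + np.timedelta64(30, "D")

# Difference between two timestamps, converted between units
start = np.datetime64("2024-08-13T09:00")
end = np.datetime64("2024-08-13T17:30")
elapsed = end - start                       # timedelta64 in minutes
print(elapsed / np.timedelta64(1, "h"))     # 8.5 hours

# Convert day-precision dates to month precision
print(days.astype("datetime64[M]"))
```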

Apache Airflow® Best Practices: DAG Writing

Speaker: Tamara Fingerlin, Developer Advocate

In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!
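
As a hedged sketch (not taken from the webinar), the DAG below combines the two features mentioned: it is scheduled on a Dataset rather than a cron string, and it fans out with dynamic task mapping via .expand(). The dataset URI, partition values, and task logic are placeholders.

```python
import pendulum
from airflow.datasets import Dataset
from airflow.decorators import dag, task

raw_orders = Dataset("s3://example-bucket/raw/orders/")  # placeholder dataset URI

@dag(
    schedule=[raw_orders],  # data-driven scheduling: run when the dataset is updated
    start_date=pendulum.datetime(2024, 8, 1, tz="UTC"),
    catchup=False,
)
def orders_pipeline():
    @task
    def list_partitions() -> list[str]:
        # In a real DAG this might list new files or table partitions
        return ["2024-08-11", "2024-08-12", "2024-08-13"]

    @task
    def load_partition(partition: str) -> str:
        print(f"loading partition {partition}")
        return partition

    # Dynamic task mapping: one mapped task instance per partition
    load_partition.expand(partition=list_partitions())

orders_pipeline()
```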

dbt Generic Tests in Sessions Validation at Yelp

Yelp Engineering

Sessions, where everything started: For the past few years, Yelp has been using dbt as one of the tools to develop data products that power data marts, which are one-stop shops for high-visibility dashboards pertaining to top-level business metrics. One of the key data products owned by my team, Clickstream Analytics, is the Sessions Data Mart.

Interpreting the Gartner Data Observability Market Guide

Monte Carlo

From the first mention of “data observability” back in 2019 to today, data observability has evolved dramatically, from a data-stack nice-to-have to a must-have solution for enterprise companies. This year, data observability skyrocketed to the top of Gartner’s Hype Cycles. According to Gartner, 50% of enterprise companies implementing distributed data architectures will have adopted data observability tools by 2026, up from just ~20% in 2024.

Fivetran vs RudderStack Comparison

Hevo

Integrating and transforming data efficiently is crucial for businesses seeking actionable insights. ETL tools have become essential for companies, making data integration and transformation smooth and efficient. With so many ETL tools available, choosing the right one for your needs can be challenging. This post lets you contrast the popular platforms, Fivetran vs RudderStack.

Top 9 Leadership Skills to know in 2024

Edureka

A good leader is never someone who wants to become a larger-than-life hero. They never aspire to be put on a pedestal or become unreachable icons. They are seemingly ordinary people, quietly delivering extraordinary results. They produce such results not miraculously but with some distinct leadership skills. Skills that can ignite passion and creativity in their teams and foster an environment that motivates and encourages each member to bring their best to the table.

Apache Airflow® Crash Course: From 0 to Running your Pipeline in the Cloud

With over 30 million monthly downloads, Apache Airflow is the tool of choice for programmatically authoring, scheduling, and monitoring data pipelines. Airflow enables you to define workflows as Python code, allowing for dynamic and scalable pipelines suitable to any use case from ETL/ELT to running ML/AI operations in production. This introductory tutorial provides a crash course for writing and deploying your first Airflow pipeline.
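
For orientation, here is a hedged "first pipeline" sketch in Airflow's TaskFlow style, with placeholder task logic; it only illustrates the workflows-as-Python-code idea, not the tutorial's exact example.

```python
import pendulum
from airflow.decorators import dag, task

@dag(schedule="@daily", start_date=pendulum.datetime(2024, 8, 1, tz="UTC"), catchup=False)
def my_first_pipeline():
    @task
    def extract() -> list[int]:
        return [1, 2, 3]                  # placeholder for pulling data from a source

    @task
    def transform(rows: list[int]) -> int:
        return sum(rows)                  # placeholder transformation

    @task
    def load(total: int) -> None:
        print(f"loaded total={total}")    # placeholder for writing to a target

    load(transform(extract()))            # dependencies are inferred from the data flow

my_first_pipeline()
```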

AWS DMS CDC: Complete Guide to Real-Time Data Migration

Hevo

So, you’ve heard about AWS Database Migration Service (DMS) and Change Data Capture (CDC) and are curious how it can help you with your data migration needs. Well, you’ve come to the right place! Let’s dive into this fascinating world of real-time data migration and explore how AWS DMS CDC can make your life easier.

What is Workplace Gamification in Leadership Training Programs

Edureka

The term “workplace gamification” describes the use of game-design concepts and elements to motivate staff in real business settings. By appealing to basic needs like play, competition, and accomplishment, gamification for leadership training transforms tedious, routine tasks into enjoyable, fulfilling, and unforgettable experiences. However, according to Research and Markets, a more modest but still significant growth rate will be experienced by the gamification market, with a CAGR of

Top 10 Data Quality Tools for Ensuring High Data Standards

Hevo

If you’re reading this, you already know how important data quality can be in today’s fast-moving world for making critical business decisions. Now, be honest; you want to get high-quality data all the time—right? Use data quality tools.

How Peer Learning Can Accelerate Executive Development

Edureka

Peer learning for executive development: executive development, or simply development, refers to learning opportunities open to managers working at various levels. It is any attempt to improve managerial performance by imparting knowledge, changing attitudes, or increasing skills. The aim of development is not just to improve managers’ current job performance but to prepare them for future, more challenging roles.

Optimizing The Modern Developer Experience with Coder

Many software teams have migrated their testing and production workloads to the cloud, yet development environments often remain tied to outdated local setups, limiting efficiency and growth. This is where Coder comes in. In our 101 Coder webinar, you’ll explore how cloud-based development environments can unlock new levels of productivity. Discover how to transition from local setups to a secure, cloud-powered ecosystem with ease.

Data Products 101: Understanding the Fundamentals and Best Practices

The Modern Data Company

Introduction to Data Products In today’s data-driven landscape, data products have become essential for maximizing the value of data. As organizations seek to leverage data more effectively, the focus has shifted from temporary datasets to well-defined, reusable data assets. Data products transform raw data into actionable insights, integrating metadata and business logic to meet specific needs and drive strategic decision-making.

What is Vite, and what are the reasons for replacing the Create React App?

Edureka

What is Vite? Vite is a modern, fast build tool for web applications. Developers appreciate it for its speed and simplicity: it makes developing web projects easier, and it works well with frameworks like React. Table of Contents: What is Vite? How does Vite work? Reasons to use Vite over CRA. FAQs. In short, Vite with React works differently from older tools such as Create React App (CRA).

Just Launched: AI Monitor Recommendations for Proactive Data Quality Management

Monte Carlo

You don’t know what you don’t know, said a data analyst, definitely… and this is certainly true when it comes to the millions of ways your data can break. And even if you could predict all of these issues, you wouldn’t want to manually define and write data quality rules to cover all of them. Wouldn’t it be great if your data quality and observability solution provided recommendations for these rules, and then – with the push of a button – created them?

How to Drag and Drop in Selenium?

Edureka

Drag-and-drop is one of the core skills that an automation tester must have in Selenium. This technique helps the developer imitate user interactions with web elements. Such testing is applied to complex interfaces and dynamic web applications. For this reason, it is very crucial to understand how this functionality could be implemented for practical testing.
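
A minimal, hedged Python example of the technique using Selenium's ActionChains; the URL and element locators are placeholders.

```python
from selenium import webdriver
from selenium.webdriver import ActionChains
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("https://example.com/drag-demo")        # placeholder page with drag-and-drop widgets

source = driver.find_element(By.ID, "draggable")   # placeholder locators
target = driver.find_element(By.ID, "droppable")

# High-level helper: clicks, holds, moves to the target, and releases
ActionChains(driver).drag_and_drop(source, target).perform()

# Equivalent lower-level chain, useful when drag_and_drop alone is flaky
ActionChains(driver).click_and_hold(source).move_to_element(target).release().perform()

driver.quit()
```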

Apache Airflow® Best Practices for ETL and ELT Pipelines

Whether you’re creating complex dashboards or fine-tuning large language models, your data must be extracted, transformed, and loaded. ETL and ELT pipelines form the foundation of any data product, and Airflow is the open-source data orchestrator specifically designed for moving and transforming data in ETL and ELT pipelines. This eBook covers: An overview of ETL vs.