Mon, Jan 20, 2025

Data Engineering Interview Series #2: System Design

Start Data Engineering

1. Introduction
2. Guide the interviewer through the process
2.1. [Requirements gathering] Make sure you clearly understand the requirements & business use case
2.2. [Understand source data] Know what you have to work with
2.3. [Model your data] Define data models for historical analytics
2.4. [Pipeline design] Design data pipelines to populate your data models
2.5.
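
Steps 2.3 and 2.4 of the outline are the ones most often whiteboarded, so here is a minimal sketch, assuming a pandas-based pipeline with invented table and column names, of deriving a customer dimension and an orders fact table from a raw extract and "loading" them:

```python
# Hypothetical illustration of steps 2.3-2.4: derive a dimension table and a
# fact table from a raw orders extract. Names and values are invented.
import pandas as pd

raw_orders = pd.DataFrame(
    {
        "order_id": [1, 2, 3],
        "customer_id": [10, 10, 20],
        "customer_name": ["Ada", "Ada", "Grace"],
        "amount": [99.0, 15.5, 42.0],
        "ordered_at": pd.to_datetime(["2025-01-18", "2025-01-19", "2025-01-19"]),
    }
)

# Step 2.3 (model your data): a deduplicated customer dimension and an
# orders fact keyed on customer_id.
dim_customer = (
    raw_orders[["customer_id", "customer_name"]].drop_duplicates().reset_index(drop=True)
)
fact_orders = raw_orders[["order_id", "customer_id", "amount", "ordered_at"]]

# Step 2.4 (pipeline design): the "load" step, with local CSV files standing
# in for warehouse tables.
dim_customer.to_csv("dim_customer.csv", index=False)
fact_orders.to_csv("fact_orders.csv", index=False)
```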

Learn Python for Data Science in 6 Weeks on DataCamp

KDnuggets

Whether you're starting from scratch or building on existing skills, this hands-on program teaches you how to import, clean, and visualize data from day one using libraries like pandas, Seaborn, and Matplotlib. Plus, earn an industry-recognized certification to showcase your expertise and stand out in the job market.
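
As a taste of that workflow, here is a minimal sketch of the import-clean-visualize loop using pandas, Seaborn, and Matplotlib; the CSV path and column names are placeholders, not course material:

```python
# Import, clean, and visualize a small dataset. "sales.csv", "price", and
# "region" are placeholders.
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

df = pd.read_csv("sales.csv")                               # import
df["price"] = pd.to_numeric(df["price"], errors="coerce")   # coerce bad values to NaN
df = df.dropna(subset=["price", "region"])                  # clean

sns.histplot(data=df, x="price", hue="region")              # visualize
plt.title("Price distribution by region")
plt.show()
```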

Modern Data And Application Engineering Breaks the Loss of Business Context

Towards Data Science

Here’s how your data retains its business relevance as it travels through your enterprise.

10 Data Science Myths Debunked [Infographic]

KDnuggets

Our latest infographic breaks down 10 of the most common and enduring myths about data science, offering clarity on the misconceptions that often surround this rapidly evolving field.

A Guide to Debugging Apache Airflow® DAGs

In Airflow, DAGs (your data pipelines) support nearly every use case. As these workflows grow in complexity and scale, efficiently identifying and resolving issues becomes a critical skill for every data engineer. This comprehensive guide offers best practices and examples for debugging Airflow DAGs. You'll learn how to:
- Create a standardized process for debugging to quickly diagnose errors in your DAGs
- Identify common issues with DAGs, tasks, and connections
- Distinguish between Airflow-related
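
As a companion to that checklist, here is a minimal sketch of a debuggable DAG, assuming a recent Airflow 2.x release with the TaskFlow API; the DAG and task names are invented for illustration:

```python
# A small TaskFlow DAG used as a debugging target. Logging early and failing
# loudly makes errors easier to localize.
import logging
from datetime import datetime

from airflow.decorators import dag, task

log = logging.getLogger(__name__)


@dag(schedule=None, start_date=datetime(2025, 1, 1), catchup=False, tags=["debug-demo"])
def debug_demo():
    @task
    def extract() -> list[int]:
        rows = [1, 2, 3]
        log.info("extracted %d rows", len(rows))
        return rows

    @task
    def load(rows: list[int]) -> None:
        if not rows:
            raise ValueError("no rows to load")
        log.info("loaded %d rows", len(rows))

    load(extract())


debug_demo()
```

With a DAG like this, running "airflow dags test debug_demo" (or "airflow tasks test debug_demo extract 2025-01-01" for a single task) executes it locally outside the scheduler, which is a reasonable first step in a standardized debugging process.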

The Concepts Data Professionals Should Know in 2025: Part 2

Towards Data Science

From AI Agent to Human-In-The-Loop — Master 12 critical data concepts and turn them into simple projects to stay ahead in IT.

10 Essential PySpark Commands for Big Data Processing

KDnuggets

Check out these 10 ways to leverage efficient distributed dataset processing by combining the strengths of Spark and Python libraries for data science.
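
For flavor, here is a minimal sketch, with invented column names and values, of the kind of pattern such a list typically covers: building a DataFrame, filtering and aggregating it on the cluster, then handing a small result back to Python:

```python
# Filter and aggregate with Spark, then collect a small summary to pandas.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("pyspark-demo").getOrCreate()

df = spark.createDataFrame(
    [("web", 120.0), ("web", 80.0), ("store", 200.0)],
    ["channel", "revenue"],
)

summary = (
    df.filter(F.col("revenue") > 50)
      .groupBy("channel")
      .agg(F.sum("revenue").alias("total_revenue"))
)

summary.show()            # runs as a distributed job
pdf = summary.toPandas()  # small result handed to Python-side libraries
spark.stop()
```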

The Ultimate Roadmap to Becoming an LLM Engineer

KDnuggets

Unsure what to learn, where to start, and which order to follow to master LLM engineering concepts and skills? This comprehensive roadmap with clear milestones and stages is here to help!

JSON Web Keys (JWK): Rotating Cryptographic Keys at Zalando

Zalando Engineering

Enhancing the Security of Our Customer Identity Platform Through Automated Key Rotation. Static secrets are evil. Whether they are secret keys hard-coded in source code, tokens without expiry, or plaintext API keys referenced in configuration files, static secrets are ticking time bombs. The same is true for cryptographic key material in the context of JSON Web Tokens (JWTs) and OpenID Connect (OIDC).
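
The counterpart to rotating keys on the issuer side is resolving them dynamically on the consumer side. Here is a minimal sketch, assuming the PyJWT 2.x library and a placeholder JWKS URL and audience; Zalando's actual setup is not shown in the excerpt:

```python
# Verify a JWT against a rotating key set published at a JWKS endpoint.
import jwt
from jwt import PyJWKClient

JWKS_URL = "https://issuer.example.com/.well-known/jwks.json"  # placeholder


def verify(token: str) -> dict:
    jwks_client = PyJWKClient(JWKS_URL)
    # The token's 'kid' header selects the current key, so consumers keep
    # working while the issuer rotates keys behind the endpoint.
    signing_key = jwks_client.get_signing_key_from_jwt(token)
    return jwt.decode(
        token,
        signing_key.key,
        algorithms=["RS256"],
        audience="my-api",  # placeholder audience
    )
```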

How Partners Are Poised to Bring Gen AI to Life Across Industries

Snowflake

Imagine a world where AI recommends an optimal product, speeds up medical breakthroughs and predicts financial trends. This isn't the future; it's happening now across retail, healthcare and financial services. Having worked through major market transitions before, such as the emergence of cloud computing and the rise of AI/ML, I understand the importance of taking a thoughtful approach to technology and prioritizing real-world impact.

Mastering Apache Airflow® 3.0: What’s New (and What’s Next) for Data Orchestration

Speaker: Tamara Fingerlin, Developer Advocate

Apache Airflow® 3.0, the most anticipated Airflow release yet, officially launched this April. As the de facto standard for data orchestration, Airflow is trusted by over 77,000 organizations to power everything from advanced analytics to production AI and MLOps. With the 3.0 release, the top-requested features from the community were delivered, including a revamped UI for easier navigation, stronger security, and greater flexibility to run tasks anywhere at any time.

Data Engineering Weekly #204

Data Engineering Weekly

Try Fully Managed Apache Airflow for FREE: Astro is the fully-managed DataOps platform powered by Apache Airflow. With Astro, you can build, run, and observe your data pipelines in one place, ensuring your mission-critical data is delivered on time. Try Astro Free →

Julia Wiesinger, Patrick Marlow, and Vladimir Vuskovic: Agents. The combination of reasoning, logic, and access to external information, all connected to a Generative AI model, invokes the concept of an agent.
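
As a toy illustration of that definition, here is a sketch of an agent loop in which a model "reasons" about which tool to call and external information flows back as observations; call_model and the single tool are hypothetical stubs, not any specific framework's API:

```python
# A toy agent loop: the model proposes an action, the loop executes tools and
# feeds observations back, until the model returns a final answer.
from typing import Callable

TOOLS: dict[str, Callable[[str], str]] = {
    "search": lambda q: f"(pretend search results for {q!r})",  # stub tool
}


def call_model(history: list[str]) -> str:
    # Stub: a real implementation would call an LLM with the history and
    # return either 'TOOL:<name>:<argument>' or 'FINAL:<answer>'.
    if any(line.startswith("OBSERVATION:") for line in history):
        return "FINAL:done"
    return "TOOL:search:data contracts"


def run_agent(question: str, max_steps: int = 5) -> str:
    history = [f"USER:{question}"]
    for _ in range(max_steps):
        action = call_model(history)
        if action.startswith("FINAL:"):
            return action.removeprefix("FINAL:")
        _, tool_name, argument = action.split(":", 2)
        observation = TOOLS[tool_name](argument)  # access to external information
        history.append(f"OBSERVATION:{observation}")
    return "gave up"


print(run_agent("What are data contracts?"))
```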

Shared Destiny with Snowflake Horizon Catalog Built-in Security

Snowflake

Security has been an integral capability of Snowflake since the company was founded. Through the customer-configurable security capabilities of the Snowflake Horizon Catalog, we empower security admins and chief information security officers (CISOs) to better protect their environments and centralize threat monitoring and role-based access controls across clouds.

Introducing Lightstep Receiver for OpenTelemetry Collector

Zalando Engineering

OpenTelemetry is a vendor-neutral, flexible standard that supports traces, metrics, and logs all in one place. Organizations that adopted older tracing solutions like OpenTracing or custom legacy tracer libraries to instrument their applications are faced with a migration task. Today, we're excited to announce the Lightstep Receiver for OpenTelemetry Collector, a component capable of receiving tracing traffic from legacy Lightstep tracers, converting it to OpenTelemetry, and propagating it via OpenTelemetry
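
For context, the receiver itself lives inside the Collector; on the application side, the post-migration state looks like instrumenting with the OpenTelemetry SDK and exporting over OTLP. A minimal sketch, assuming the opentelemetry-sdk and OTLP exporter packages, a placeholder service name, and a local Collector endpoint:

```python
# Emit a span from Python and export it over OTLP to a local Collector.
from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

provider = TracerProvider(resource=Resource.create({"service.name": "checkout-demo"}))
provider.add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint="http://localhost:4317", insecure=True))
)
trace.set_tracer_provider(provider)

tracer = trace.get_tracer(__name__)
with tracer.start_as_current_span("place-order"):
    pass  # application work would be traced here
```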

Why I wish I had a control plane for my renovation

dbt Developer Hub

When my wife and I renovated our home, we chose to take on the role of owner-builder. It was a bold (and mostly naive) decision, but we wanted control over every aspect of the project. What we didn't realize was just how complex and exhausting managing so many moving parts would be. We had to coordinate multiple elements: the architects, who designed the layout, interior, and exterior.

Agent Tooling: Connecting AI to Your Tools, Systems & Data

Speaker: Alex Salazar, CEO & Co-Founder @ Arcade | Nate Barbettini, Founding Engineer @ Arcade | Tony Karrer, Founder & CTO @ Aggregage

There’s a lot of noise surrounding the ability of AI agents to connect to your tools, systems and data. But building an AI application into a reliable, secure workflow agent isn’t as simple as plugging in an API. As an engineering leader, it can be challenging to make sense of this evolving landscape, but agent tooling provides such high value that it’s critical we figure out how to move forward.

How The Motley Fool Uses Snowflake And Striim To Empower Smarter Investing Decisions

Striim

Manaen Schlabach, Data Administrator at The Motley Fool, shares how Snowflake and Striim enable reliable, scalable, and cost-effective data delivery to support smarter investing tools like Fool IQ. By integrating Snowflake and Striim, The Motley Fool achieved a 10x improvement in the reliability and timeliness of their replication processes. The unified solution, deployed in less than 20 days, tracks membership and campaign activity, allowing timely adjustments to increase value for members.
