January, 2024

The Future of Data Engineering as a Data Engineer

Monte Carlo

In the world of data engineering, Maxime Beauchemin is someone who needs no introduction. One of the first data engineers at Facebook and Airbnb, he wrote and open sourced the wildly popular orchestrator Apache Airflow, followed shortly thereafter by Apache Superset, a data exploration tool that’s taking the data viz landscape by storm. Currently, Maxime is CEO and co-founder of Preset, a fast-growing startup that’s paving the way forward for AI-enabled data visualization for modern companie

The Only Free Course You Need To Become a Professional Data Engineer

KDnuggets

Data Engineering ZoomCamp offers free access to reading materials, video tutorials, assignments, homework, projects, and workshops.

Data Science vs Software Engineering - Significant Differences

Knowledge Hut

With an array of career options available, what matters is choosing the right career path. The right path depends on your skill set, interests, job availability in the field, and, most importantly, your passion for it. Speaking of job vacancies, the two careers in high demand today and in the coming years are Data Scientist and Software Engineer.

Build A Data Lake For Your Security Logs With Scanner

Data Engineering Podcast

Summary: Monitoring and auditing IT systems for security events requires the ability to quickly analyze massive volumes of unstructured log data. The majority of products that are available either require too much effort to structure the logs, or aren't fast enough for interactive use cases. Cliff Crosland co-founded Scanner to provide fast querying of high-scale log data for security auditing.

A Guide to Debugging Apache Airflow® DAGs

In Airflow, DAGs (your data pipelines) support nearly every use case. As these workflows grow in complexity and scale, efficiently identifying and resolving issues becomes a critical skill for every data engineer. This is a comprehensive guide with best practices and examples for debugging Airflow DAGs. You’ll learn how to: create a standardized process for debugging to quickly diagnose errors in your DAGs; identify common issues with DAGs, tasks, and connections; distinguish between Airflow-relate

LLM Training and Inference with Intel(R) Gaudi(R) 2 AI Accelerators

databricks

At Databricks, we want to help our customers build and deploy generative AI applications on their own data without sacrificing data privacy or.

Totally Eclipsed

ArcGIS

Exploring the value of critique as part of the process of creating a new map of the Total Eclipse that will cross the United States on April 8th

More Trending

AI Prompt Engineers are Making $300k/y

KDnuggets

Prompt engineering and generative AI are becoming hotter by the day. Be part of the heat!

Accelerate Your Machine Learning Workflows in Snowflake with Snowpark ML 

Snowflake

Many developers and enterprises looking to use machine learning (ML) to generate insights from data get bogged down by operational complexity. We have been making it easier and faster to build and manage ML models with Snowpark ML, the Python library and underlying infrastructure for end-to-end ML workflows in Snowflake. With Snowpark ML, data scientists and ML engineers can use familiar Python frameworks for preprocessing and feature engineering as well as training models that can be managed a

Modern Customer Data Platform Principles

Data Engineering Podcast

Summary: Databases and analytics architectures have gone through several generational shifts. A substantial amount of the data that is being managed in these systems is related to customers and their interactions with an organization. In this episode, Tasso Argyros, CEO of ActionIQ, gives a summary of the major epochs in database technologies and how he is applying the capabilities of cloud data warehouses to the challenge of building more comprehensive experiences for end-users through a modern c

Welcome to the Data Intelligence Platform: Databricks + Einblick

databricks

At Databricks, we believe that AI will change the way that enterprises interact with their data. That’s why today, we're excited to welcome t.

Mastering Apache Airflow® 3.0: What’s New (and What’s Next) for Data Orchestration

Speaker: Tamara Fingerlin, Developer Advocate

Apache Airflow® 3.0, the most anticipated Airflow release yet, officially launched this April. As the de facto standard for data orchestration, Airflow is trusted by over 77,000 organizations to power everything from advanced analytics to production AI and MLOps. With the 3.0 release, the top-requested features from the community were delivered, including a revamped UI for easier navigation, stronger security, and greater flexibility to run tasks anywhere at any time.

Geoprocessing enhancements in ArcGIS Pro 3.2 for ArcMap users

ArcGIS

Equivalency enhancements to geoprocessing in ArcGIS Pro 3.2 to remove more barriers for those transitioning from ArcMap.

A look under GHC's hood: desugaring linear types

Tweag

I recently merged linear let- and where-bindings in GHC. Which means that we’ll have these in GHC 9.10, which is cause for celebration for me. Though they are much overdue, so maybe I should instead apologise to you. Anyway, I thought I’d take the opportunity to discuss some of GHC’s inner workings and how they explain some of the features of linear types in Haskell.

Top 16 Technical Data Sources for Advanced Data Science Projects

KDnuggets

Here are data repositories that will up your data science game and improve your data projects.

Apache Flink and cluster components deep dive

Waitingforcode

Previously, you could read about the transformation of a user job definition into an executable stream graph. Since that explanation was relatively high-level, I decided to take a deep dive into the final step: executing the code.

Agent Tooling: Connecting AI to Your Tools, Systems & Data

Speaker: Alex Salazar, CEO & Co-Founder @ Arcade | Nate Barbettini, Founding Engineer @ Arcade | Tony Karrer, Founder & CTO @ Aggregage

There’s a lot of noise surrounding the ability of AI agents to connect to your tools, systems and data. But building an AI application into a reliable, secure workflow agent isn’t as simple as plugging in an API. As an engineering leader, it can be challenging to make sense of this evolving landscape, but agent tooling provides such high value that it’s critical we figure out how to move forward.

Pushing The Limits Of Scalability And User Experience For Data Processing With Jignesh Patel

Data Engineering Podcast

Summary: Data processing technologies have dramatically improved in their sophistication and raw throughput. Unfortunately, the volumes of data that are being generated continue to double, requiring further advancements in the platform capabilities to keep up. As the sophistication increases, so does the complexity, leading to challenges for user experience.

Databricks Announces the Industry’s First Generative AI Engineer Learning Pathway and Certification

databricks

Today, we are announcing the industry's first Generative AI Engineer learning pathway and certification to help ensure that data and AI practitioners have.

Introducing Neighborhood Explorer in ArcGIS Pro

ArcGIS

ArcGIS Pro now includes Neighborhood Explorer: an experience that will help you understand and refine spatial relationships in your analysis.

The State of Data Engineering at Data Day Texas 2024

Jesse Anderson

The premiere of my latest talk covering The State of Data Engineering. I go through the history of the industry to see where we’re heading. This starts with data warehousing and goes into data science. I finish off by showing how data engineering can avoid the same fate as data warehousing and data science. Sorry, we didn’t have a microphone for the questions and I forgot to repeat some of the questions.

How to Modernize Manufacturing Without Losing Control

Speaker: Andrew Skoog, Founder of MachinistX & President of Hexis Representatives

Manufacturing is evolving, and the right technology can empower—not replace—your workforce. Smart automation and AI-driven software are revolutionizing decision-making, optimizing processes, and improving efficiency. But how do you implement these tools with confidence and ensure they complement human expertise rather than override it? Join industry expert Andrew Skoog as he explores how manufacturers can leverage automation to enhance operations, streamline workflows, and make smarter, data-dri

7 Steps to Landing Your First Data Science Job

KDnuggets

Want to make a successful career switch to data science? From learning data science concepts to cracking interviews, read this guide to move one step closer to your first data science job.

Data News — Week 24.04

Christophe Blefari

Hey, new week, new email. It's already the end of January, but I took time to travel and see people I had not seen in a long time, so I'm super happy with how this new year is starting. Next week, I'll be wrapping up my DataOps lecture by incorporating how to deploy machine learning models. This is a fun part where students learn how to serve a simple classifier in production.

Static enrichment dataset with Delta Lake

Waitingforcode

Data enrichment is a common data engineering task. It's relatively easy to implement with static datasets because of the data availability. However, this apparently easy task can become a nightmare if used with inappropriate technologies.

Databricks SQL Year in Review (Part I): AI-optimized Performance and Serverless Compute

databricks

This is part 1 of a blog series where we look back at the major areas of progress for Databricks SQL in 2023.

Optimizing The Modern Developer Experience with Coder

Many software teams have migrated their testing and production workloads to the cloud, yet development environments often remain tied to outdated local setups, limiting efficiency and growth. This is where Coder comes in. In our 101 Coder webinar, you’ll explore how cloud-based development environments can unlock new levels of productivity. Discover how to transition from local setups to a secure, cloud-powered ecosystem with ease.

Cutting Your Data Stack Costs: How To Approach It And Common Issues

Seattle Data Guy

I once had an engineer tell me that they essentially didn’t want to consider cost as they were building a solution. I was baffled. Don’t get me wrong, yes, when you’re building, you iterate and aim to improve your solution’s cost. But from my perspective, I don’t think completely ignoring costs from day one is…

Cartographic conventions

ArcGIS

What are cartographic conventions and do you need to follow them?

4 Steps to Become a Generative AI Developer

KDnuggets

In this post, we will cover what a generative AI developer does, what tools you need to master, and how to get started.

How to learn data engineering

Christophe Blefari

Learn data engineering, all the references. This is a special edition of the Data News. Right now I'm on holiday, finishing a hiking week in Corsica 🥾, so I wrote this special edition about how to learn data engineering in 2024. The aim of this post is to create a repository of important links and concepts we should care about when we do data engineering.

15 Modern Use Cases for Enterprise Business Intelligence

Large enterprises face unique challenges in optimizing their Business Intelligence (BI) output due to the sheer scale and complexity of their operations. Unlike smaller organizations, where basic BI features and simple dashboards might suffice, enterprises must manage vast amounts of data from diverse sources. What are the top modern BI use cases for enterprise businesses to help you get a leg up on the competition?

Table file formats - streaming reader: Delta Lake

Waitingforcode

Even though I'm into streaming these days, I haven't really covered streaming in Delta Lake yet. I only briefly blogged about Change Data Feed but completely missed the fundamentals. Hopefully, this and the next blog posts will change this!

Serving Quantized LLMs on NVIDIA H100 Tensor Core GPUs

databricks

Quantization is a technique for making machine learning models smaller and faster. We quantize Llama2-70B-Chat, producing an equivalent-quality model that generates 2.2x more.

7 Great Embedded Analytics Solutions – Which Embedded Analytics Solutions Should You Use?

Seattle Data Guy

Big data is big business these days. Organizations that hope to get ahead in crowded markets must utilize data from a variety of often highly disparate sources to understand how they’re performing and what customers are saying about them. However, data without the right analysis and reporting tools is just a waste of digital storage…

Polars vs Spark

Confessions of a Data Guy

The Ultimate Guide to Apache Airflow DAGs

With Airflow being the open-source standard for workflow orchestration, knowing how to write Airflow DAGs has become an essential skill for every data engineer. This eBook provides a comprehensive overview of DAG writing features with plenty of example code. You’ll learn how to: understand the building blocks of DAGs, combine them in complex pipelines, and schedule your DAG to run exactly when you want it to; write DAGs that adapt to your data at runtime and set up alerts and notifications; scale you