AWS or Azure? Cloudera or Databricks? With so many data engineering certifications available, choosing the right one can be a daunting task. Whether you are just starting your career as a Data Engineer or looking to take the next step, this blog will walk you through the most valuable data engineering certifications and help you make an informed decision about which one to pursue.
Business Intelligence Trends: Business intelligence (BI) is becoming an ever more critical element in the success of a business. This article will discuss what BI is, its use, and its current associated trends. We’ll also look into ways that businesses can successfully incorporate BI into their practices to gain competitive advantages. Read on to learn more about the powerful world of business intelligence!
Save money, save money!! Hear Hear! Someone on LinkedIn recently brought up the point that companies could save gobs of money by swapping out AWS Python lambdas for Rust ones. While it raised the ire of many a Python Data Engineer, I thought it sounded like a great idea. At least it’s an excuse to […] The post AWS Lambdas – Python vs Rust.
In Airflow, DAGs (your data pipelines) support nearly every use case. As these workflows grow in complexity and scale, efficiently identifying and resolving issues becomes a critical skill for every data engineer. This is a comprehensive guide with best practices and examples for debugging Airflow DAGs. You’ll learn how to: create a standardized process for debugging to quickly diagnose errors in your DAGs; identify common issues with DAGs, tasks, and connections; distinguish between Airflow-relate
Introduction Azure Databricks is a fast, easy, and collaborative Apache Spark-based analytics platform that is built on top of the Microsoft Azure cloud. A collaborative and interactive workspace allows users to perform big data processing and machine learning tasks easily. In this blog post, we will take a closer look at Azure Databricks, its key features, […] The post Azure Databricks: A Comprehensive Guide appeared first on Analytics Vidhya.
As I sit down to write this article, I’m filled with a sense of vulnerability and excitement. You see, this is a story that only I can tell. It’s a tale of finding my Pathless Path and discovering who I am in the process. I have learned that some of my best decision-making comes from following my gut, heart, and intuition, a place of inner knowing.
Originally published on 23 Feb 2023 👋 Hi, this is Gergely with a bonus, free issue of the Pragmatic Engineer Newsletter. We cover one out of five topics in today’s subscriber-only The Scoop issue. If you're not yet a full subscriber, you missed the in-depth analysis this week: Are tech companies aggressively cutting back on vendor spend?
Summary There has been a lot of discussion about the practical application of data mesh and how to implement it in an organization. Jean-Georges Perrin was tasked with designing a new data platform implementation at PayPal and wound up building a data mesh. In this episode he shares that journey and the combination of technical and organizational challenges that he encountered in the process.
Hmm … data types. We all know they are important, but we don’t take them very seriously. I mean, we know the difference between booleans, strings, and integers; those are easy to get right. But we all get sloppy, and sometimes we go the string and varchar route because we don’t spend enough time on the […] The post Data Types in Delta Lake + Spark.
Introduction Data science has taken over all economic sectors in recent times. To achieve maximum efficiency, every company strives to use various data at every stage of its operations. Each aspect of data science, like data preparation, the importance of big data, and the process of automation, contributes to how data science is the future […] The post 30 Best Data Science Books to Read in 2023 appeared first on Analytics Vidhya.
Apache Airflow® 3.0, the most anticipated Airflow release yet, officially launched this April. As the de facto standard for data orchestration, Airflow is trusted by over 77,000 organizations to power everything from advanced analytics to production AI and MLOps. With the 3.0 release, the top-requested features from the community were delivered, including a revamped UI for easier navigation, stronger security, and greater flexibility to run tasks anywhere at any time.
Apache Airflow is a very popular tool that data engineers rely on. But why? Why do data engineers like Airflow? Also, what does Apache Airflow even do? In this article we will answer questions like: What is Airflow? What is a DAG? Why do people use Apache Airflow? Why do we like Airflow? What are… Read more The post What Is Apache Airflow – Data Engineering Consulting appeared first on Seattle Data Guy.
Originally published 2 February 2023. 👋 Hi, this is Gergely with a bonus, free issue of the Pragmatic Engineer Newsletter. We cover one out of seven topics in today’s subscriber-only The Scoop issue. To get this newsletter every week, subscribe here. Apple was the first Big Tech giant to mandate a proper return to the office and back in September 2022, this initiative was in full swing, being rolled out in the US and with 3 days per week in the office mandated in the UK.
Summary Cloud data warehouses have unlocked a massive amount of innovation and investment in data applications, but they are still inherently limiting. Because of their complete ownership of your data they constrain the possibilities of what data you can store and how it can be used. Projects like Apache Iceberg provide a viable alternative in the form of data lakehouses that provide the scalability and flexibility of data lakes, combined with the ease of use and performance of data warehouses.
Speaker: Alex Salazar, CEO & Co-Founder @ Arcade | Nate Barbettini, Founding Engineer @ Arcade | Tony Karrer, Founder & CTO @ Aggregage
There’s a lot of noise surrounding the ability of AI agents to connect to your tools, systems and data. But building an AI application into a reliable, secure workflow agent isn’t as simple as plugging in an API. As an engineering leader, it can be challenging to make sense of this evolving landscape, but agent tooling provides such high value that it’s critical we figure out how to move forward.
As I started to use Rust on and off, more out of curiosity than anything, I discovered some specks of gold buried down in the depths. Some of the things I’m going to talk about, well … all of it, is probably fairly obvious to most Rust folk, but it’s enjoyable to learn what new […] The post Ownership and Borrowing in Rust – Data Engineering Gold Mine. appeared first on Confessions of a Data Guy.
Introduction Data replication, also known as database replication, is the copying of data to ensure that all information remains consistent across all data resources in real time. Data replication is like a safety net that keeps your information from disappearing or falling through the cracks. In most cases, data does not stand still. It is constantly changing.
It's time to start the 4th part of the Table file formats series. This time the topic will be Change Data Capture, so how to stream all changes made on the table. As for the 3rd part, I'm going to start with Delta Lake.
Another tour de force by Riccardo Cardin. Riccardo is a proud alumnus of Rock the JVM, now a senior engineer working on critical systems written in Java, Scala and Kotlin. Version 19 of Java came at the end of 2022, bringing us a lot of exciting stuff. One of the coolest is the preview of some hot topics concerning Project Loom: virtual threads (JEP 425) and structured concurrency (JEP 428).
Speaker: Andrew Skoog, Founder of MachinistX & President of Hexis Representatives
Manufacturing is evolving, and the right technology can empower—not replace—your workforce. Smart automation and AI-driven software are revolutionizing decision-making, optimizing processes, and improving efficiency. But how do you implement these tools with confidence and ensure they complement human expertise rather than override it? Join industry expert Andrew Skoog as he explores how manufacturers can leverage automation to enhance operations, streamline workflows, and make smarter, data-dri
Summary This podcast started almost exactly six years ago, and the technology landscape was much different than it is now. In that time there have been a number of generational shifts in how data engineering is done. In this episode I reflect on some of the major themes and take a brief look forward at some of the upcoming changes. Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management Your host is Tobias Macey and today I'm reflecting on the m
opam is a source-based package manager for OCaml. It is the de-facto standard for package management in the OCaml ecosystem. opam’s main package repository contains over 4000 individual packages, on average spanning 7 versions each. Like many other language-specific package managers (e.g. cargo, cabal, etc.), opam performs four main tasks: Download the sources.
Introduction “Let’s containerize your code to ship worldwide!” If you read the above quote, you must think, what does this all mean? Well, my friend, this is what Docker is. Let me explain it with an example. Say Harish and Lisa are two people working on the same project but on two different systems (say Windows and […] The post Getting Started with The Basics of Docker appeared first on Analytics Vidhya.
Many software teams have migrated their testing and production workloads to the cloud, yet development environments often remain tied to outdated local setups, limiting efficiency and growth. This is where Coder comes in. In our 101 Coder webinar, you’ll explore how cloud-based development environments can unlock new levels of productivity. Discover how to transition from local setups to a secure, cloud-powered ecosystem with ease.
Facebook for iOS (FBiOS) is the oldest mobile codebase at Meta. Since the app was rewritten in 2012, it has been worked on by thousands of engineers and shipped to billions of users, and it can support hundreds of engineers iterating on it at a time. After years of iteration, the Facebook codebase does not resemble a typical iOS codebase: It’s full of C++, Objective-C(++), and Swift.
Liang Ma | Software Engineer, Core Eng; Scott Beardsley | Engineering Manager, Traffic; Haowei Yuan | Software Engineer, Traffic Figure 1 — HTTP/3 at Pinterest Now Pinterest operates on HTTP/3. We have enabled HTTP/3 for major Pinterest production domains on our multi-CDN edge network, and we’ve upgraded client apps’ network stack to support the new protocol.
Large enterprises face unique challenges in optimizing their Business Intelligence (BI) output due to the sheer scale and complexity of their operations. Unlike smaller organizations, where basic BI features and simple dashboards might suffice, enterprises must manage vast amounts of data from diverse sources. What are the top modern BI use cases for enterprise businesses to help you get a leg up on the competition?
Introduction Data is the new oil in this century. The database is the major element of a data science project. To generate actionable insights, the database must be centralized and organized efficiently. If a corrupted, unorganized, or redundant database is used, the results of the analysis may become inconsistent and highly misleading. So, we are […] The post How to Normalize Relational Databases With SQL Code?
A lot has changed since the initial launch of our basemap in late 2020. We’re Meta now, but our mission remains the same: Giving people the power to build community and bring the world closer together. Across Meta, our family of applications (Facebook, Instagram, WhatsApp, among others) are using our basemap to connect people through functions like status updates, location sharing, and location-based searching.
We are introduced to new discoveries and technologies every day, and among the most popular inventions today are artificial intelligence (AI) and its tools. One of them is ChatGPT, a conversational AI model and powerful chatbot that answers follow-up questions and writes code for its users. The day it was launched, everybody was going gaga over the new technology and the remarkable uses of this AI-powered chatbot.
With Airflow being the open-source standard for workflow orchestration, knowing how to write Airflow DAGs has become an essential skill for every data engineer. This eBook provides a comprehensive overview of DAG writing features with plenty of example code. You’ll learn how to: understand the building blocks of DAGs, combine them in complex pipelines, and schedule your DAG to run exactly when you want it to; write DAGs that adapt to your data at runtime and set up alerts and notifications; scale you