A Guide to Data Engineering Infrastructure
Towards Data Science
JANUARY 20, 2024
Automate resource provisioning with modern tools.
KDnuggets
JANUARY 23, 2024
Prompt engineering and generative AI are becoming hotter by the day. Be part of the heat!
Data Engineering Podcast
JANUARY 21, 2024
Summary: Databases and analytics architectures have gone through several generational shifts. A substantial amount of the data that is being managed in these systems is related to customers and their interactions with an organization. In this episode Tasso Argyros, CEO of ActionIQ, gives a summary of the major epochs in database technologies and how he is applying the capabilities of cloud data warehouses to the challenge of building more comprehensive experiences for end-users through a modern c
ArcGIS
JANUARY 22, 2024
Equivalency enhancements to geoprocessing in ArcGIS Pro 3.2 to remove more barriers for those transitioning from ArcMap.
Advertisement
In Airflow, DAGs (your data pipelines) support nearly every use case. As these workflows grow in complexity and scale, efficiently identifying and resolving issues becomes a critical skill for every data engineer. This is a comprehensive guide, with best practices and examples, to debugging Airflow DAGs. You’ll learn how to: create a standardized process for debugging to quickly diagnose errors in your DAGs; identify common issues with DAGs, tasks, and connections; distinguish between Airflow-relate
databricks
JANUARY 24, 2024
Today, we are announcing the industry's first Generative AI Engineer learning pathway and certification to help ensure that data and AI practitioners have.
KDnuggets
JANUARY 23, 2024
Here are data repositories that will up your data science game and improve your data projects.
Data Engineering Digest brings together the best content for data engineering professionals from the widest variety of industry thought leaders.
Christophe Blefari
JANUARY 26, 2024
Hey, new week new email. It is already the end of January, but I took time to travel and see people I had not seen for a long time, so I'm super happy with how this new year is starting. Next week, I'll be wrapping up my DataOps lecture by incorporating how to deploy machine learning models. This is a fun part where students learn how to serve a simple classifier in production.
Waitingforcode
JANUARY 23, 2024
Data enrichment is one of the most common data engineering tasks. It's relatively easy to implement with static datasets because the data is readily available. However, this apparently easy task can become a nightmare if used with inappropriate technologies.
KDnuggets
JANUARY 26, 2024
Data Engineering ZoomCamp offers free access to reading materials, video tutorials, assignments, homework, projects, and workshops.
databricks
JANUARY 22, 2024
Generative AI has opened new worlds of possibilities for businesses and is being emphatically embraced across organizations. According to a recent MIT Tech.
Speaker: Tamara Fingerlin, Developer Advocate
Apache Airflow® 3.0, the most anticipated Airflow release yet, officially launched this April. As the de facto standard for data orchestration, Airflow is trusted by over 77,000 organizations to power everything from advanced analytics to production AI and MLOps. With the 3.0 release, the top-requested features from the community were delivered, including a revamped UI for easier navigation, stronger security, and greater flexibility to run tasks anywhere at any time.
Christophe Blefari
JANUARY 20, 2024
Learn data engineering, all the references. This is a special edition of the Data News. Right now I'm on holiday, finishing a hiking week in Corsica 🥾, so I wrote this special edition about how to learn data engineering in 2024. The aim of this post is to create a repository of important links and concepts we should care about when we do data engineering.
Confessions of a Data Guy
JANUARY 26, 2024
Well, I hate to break the news to you. I was the same when I first started, writing code that is. I was a zealot. I was zealous for every new thing I learned, every new language, every new approach, I would find the preacher who was preaching the message I wanted to hear … […] The post The Difficulties of Senior Engineer … are not Engineering appeared first on Confessions of a Data Guy.
KDnuggets
JANUARY 23, 2024
Want to make a successful career switch to data science? From learning data science concepts to cracking interviews, read this guide to move one step closer to your first data science job.
databricks
JANUARY 24, 2024
Reliable, accurate and trusted data is the most critical requirement for any data application in an enterprise. As Databricks customers increasingly rely on.
Speaker: Alex Salazar, CEO & Co-Founder @ Arcade | Nate Barbettini, Founding Engineer @ Arcade | Tony Karrer, Founder & CTO @ Aggregage
There’s a lot of noise surrounding the ability of AI agents to connect to your tools, systems and data. But building an AI application into a reliable, secure workflow agent isn’t as simple as plugging in an API. As an engineering leader, it can be challenging to make sense of this evolving landscape, but agent tooling provides such high value that it’s critical we figure out how to move forward.
Christophe Blefari
JANUARY 20, 2024
Hey, I hope this new edition finds you well. We are deep in the winter; it's time for comfy Data News to read near the fire 🔥 This week, on Monday, I started my annual university lecture. It's been 9 years since I started teaching and this year something was different. The students were incredibly calm. Obviously my course is a bit difficult at the beginning because it touches on concepts that they are not used to: cloud,
ThoughtSpot
JANUARY 23, 2024
ThoughtSpot is taking Snowpark use cases to the next level with generative AI, connecting the dots between ML-powered insights and business action. If you’re new to Snowpark, this is Snowflake’s set of libraries and runtimes that securely deploy and process non-SQL code including Python, Java, and Scala. Combining the power of Snowflake Snowpark and ThoughtSpot, developers and data professionals can create models, uncover insights, and build data apps using their preferred programming language.
KDnuggets
JANUARY 25, 2024
Improve your version control skills to resolve issues and maintain a clean Git repository.
Snowflake
JANUARY 25, 2024
This year may be the most innovative on record. Recent advances in AI are beginning to transform how we live and work. And the potential impacts of artificial intelligence (AI) on the healthcare and life sciences industries are expected to be far-reaching. It’s essential for organizations to leverage vast amounts of structured and unstructured data for effective generative AI (gen AI) solutions that deliver a clear return on investment.
Speaker: Andrew Skoog, Founder of MachinistX & President of Hexis Representatives
Manufacturing is evolving, and the right technology can empower—not replace—your workforce. Smart automation and AI-driven software are revolutionizing decision-making, optimizing processes, and improving efficiency. But how do you implement these tools with confidence and ensure they complement human expertise rather than override it? Join industry expert Andrew Skoog as he explores how manufacturers can leverage automation to enhance operations, streamline workflows, and make smarter, data-dri
databricks
JANUARY 23, 2024
This post is part of a series. Check out Part 1: The Data + AI Trifecta: People, Process, and Platform In the current.
Knowledge Hut
JANUARY 25, 2024
With over a decade of experience in project management, I might have crashed about 80% of my projects. Project crashing is not the negative or bad thing it sounds like; instead, it serves as a strategy in project management, aimed at expediting project timelines without compromising the project's scope. It's very different from fast-tracking, which involves resequencing activities, and from scope changes, which alter project objectives; project crashing focuses on deploying additional resour
KDnuggets
JANUARY 23, 2024
Learn what Predictive GenAI does and how it can make predictive analytics far more accessible, efficient, and meaningful for your business.
Snowflake
JANUARY 26, 2024
A key benefit of the Snowflake Data Cloud is the elimination of data silos. Fundamental to this outcome is the ability of customers to operate and collaborate globally. To support this, the Data Cloud was designed to provide customers with the same product experience, including security and governance capabilities, across multiple cloud regions with the three major cloud providers: AWS, Azure, and Google Cloud.
Advertisement
With Airflow being the open-source standard for workflow orchestration, knowing how to write Airflow DAGs has become an essential skill for every data engineer. This eBook provides a comprehensive overview of DAG writing features with plenty of example code. You’ll learn how to: understand the building blocks of DAGs, combine them in complex pipelines, and schedule your DAG to run exactly when you want it to; write DAGs that adapt to your data at runtime and set up alerts and notifications; Scale you
Monte Carlo
JANUARY 23, 2024
In the world of data engineering, Maxime Beauchemin is someone who needs no introduction. One of the first data engineers at Facebook and Airbnb, he wrote and open sourced the wildly popular orchestrator, Apache Airflow, followed shortly thereafter by Apache Superset, a data exploration tool that’s taking the data viz landscape by storm. Currently, Maxime is CEO and co-founder of Preset, a fast-growing startup that’s paving the way forward for AI-enabled data visualization for modern companie
Knowledge Hut
JANUARY 24, 2024
Step into the realm of data visualization with a comprehensive exploration of Power BI and Tableau. In a world where data is important, choosing between Power BI and Tableau can change how you approach analysis. Exploring the history of Power BI and Tableau and how they work will help us understand what makes these tools special. Whether you are an expert in working with data or a beginner excited to use visualization, this blog will help you understand the differences between Power BI and t
KDnuggets
JANUARY 24, 2024
Transform your AI aspirations into reality—join Uplimit's AI revolution!
Towards Data Science
JANUARY 24, 2024
A practical guide to optimizing non-equi joins in Spark. Enriching network events with IP geolocation information is a crucial task, especially for organizations like the Canadian Centre for Cyber Security, the national CSIRT of Canada. In this article, we will demonstrate how to optimize Spark SQL joins, specifically focusing on scenarios involving non-equality conditions, a common challenge when working with IP geolocation data.
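To make the non-equality condition concrete, here is a minimal plain-Python sketch of the underlying range-matching problem: finding the geolocation range whose start and end addresses bracket a given IP. It is not the Spark approach from the linked article. It only illustrates why the match is a range condition rather than an equality, using a sorted index and binary search. The ranges, country codes, and the helper names `build_index` and `lookup` are hypothetical, and the sketch assumes the ranges do not overlap.

```python
import bisect
import ipaddress

# Hypothetical, non-overlapping geolocation ranges: (start_ip, end_ip, country).
GEO_RANGES = [
    ("10.0.0.0", "10.0.0.255", "AA"),
    ("10.0.1.0", "10.0.1.255", "BB"),
    ("192.168.0.0", "192.168.255.255", "CC"),
]


def build_index(ranges):
    """Convert ranges to integers and sort them by start address."""
    rows = sorted(
        (int(ipaddress.ip_address(start)), int(ipaddress.ip_address(end)), country)
        for start, end, country in ranges
    )
    starts = [row[0] for row in rows]
    return starts, rows


def lookup(ip, starts, rows):
    """Return the country whose range satisfies start <= ip <= end, else None."""
    value = int(ipaddress.ip_address(ip))
    # Rightmost range whose start is <= value.
    pos = bisect.bisect_right(starts, value) - 1
    if pos < 0:
        return None
    start, end, country = rows[pos]
    return country if start <= value <= end else None


starts, rows = build_index(GEO_RANGES)
print(lookup("10.0.1.7", starts, rows))
```

A naive join would compare every event against every range. Sorting the ranges once lets each lookup run as a binary search, which is the general idea that range-join optimizations build on.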
Speaker: Tamara Fingerlin, Developer Advocate
In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!
The Modern Data Company
JANUARY 22, 2024
The Modern Data Company has been given an honorable mention in Gartner’s 2023 Magic Quadrant for Data Integration. In honor of this achievement, we’d like to re-introduce ourselves for 2024 and let everyone know why DataOS has been and still is one of the most innovative and reliable ways for organizations not only to interact with their data but to put that data to work once and for all.
Knowledge Hut
JANUARY 22, 2024
In today's era of digital transformation and rapidly evolving technological trends, it is imperative for IT professionals to keep up with the latest know-how about the subject matter, tools, and skills. Other than pursuing career-oriented courses and certifications, there is no better way for professionals to achieve this objective. Certifications are like stepping stones for professionals guiding their career journey and learning paths to progress ahead and stay in vogue with job demands as wel
KDnuggets
JANUARY 22, 2024
Context managers in Python help you handle resources efficiently. Let's learn how you can use them to manage database connections, subprocesses, and more.
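As a minimal sketch of the idea, the following wraps a database connection in a context manager so that it is committed on success, rolled back on error, and always closed. It uses only the standard library. The helper names `open_db` and `count_rows`, the in-memory SQLite database, and the `events` table are assumptions for illustration and are not taken from the linked article.

```python
import sqlite3
from contextlib import contextmanager


@contextmanager
def open_db(path=":memory:"):
    """Yield a SQLite connection; commit on success, roll back on error, always close."""
    conn = sqlite3.connect(path)
    try:
        yield conn
        conn.commit()
    except Exception:
        conn.rollback()
        raise
    finally:
        conn.close()


def count_rows():
    """Create a hypothetical table, insert two rows, and return the row count."""
    with open_db() as conn:
        conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, name TEXT)")
        conn.executemany(
            "INSERT INTO events (name) VALUES (?)",
            [("load",), ("transform",)],
        )
        return conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]


print(count_rows())
```

The caller never has to remember to close the connection: leaving the `with` block, whether normally or through an exception, runs the cleanup in `finally`.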
Towards Data Science
JANUARY 26, 2024
The way you retrieve variables from Airflow can impact the performance of your DAGs.
Speaker: Ben Epstein, Stealth Founder & CTO | Tony Karrer, Founder & CTO, Aggregage
When tasked with building a fundamentally new product line with deeper insights than previously achievable for a high-value client, Ben Epstein and his team faced a significant challenge: how to harness LLMs to produce consistent, high-accuracy outputs at scale. In this new session, Ben will share how he and his team engineered a system (based on proven software engineering approaches) that employs reproducible test variations (via temperature 0 and fixed seeds), and enables non-LLM evaluation m