Roadmap for AI Engineers
KDnuggets
OCTOBER 16, 2024
Learn about 10 easy steps to becoming an AI engineer in 2024.
The Pragmatic Engineer
OCTOBER 18, 2024
The below was originally published in The Pragmatic Engineer. To get timely analysis on the tech industry like this, on a weekly basis: sign up to The Pragmatic Engineer Newsletter. If you are into podcasts, check out The Pragmatic Engineer Podcast. Imagine Apple decided Spotify was a big enough business threat that it had to take unfair measures to limit Spotify’s growth on the App Store.
databricks
OCTOBER 14, 2024
Over the last few years, we've seen tremendous growth and adoption of Databricks SQL, our intelligent data warehouse purpose-built on the Data.
Start Data Engineering
OCTOBER 17, 2024
Introduction Setup SQL tips 1. Handy functions for common data processing scenarios 1.1. Need to filter on WINDOW function without CTE/Subquery use QUALIFY 1.2. Need the first/last row in a partition, use DISTINCT ON 1.3. STRUCT data types are sorted based on their keys from left to right 1.4. Get the first/last element with ROW_NUMBER() + QUALIFY 1.5.
Advertisement
In Airflow, DAGs (your data pipelines) support nearly every use case. As these workflows grow in complexity and scale, efficiently identifying and resolving issues becomes a critical skill for every data engineer. This is a comprehensive guide with best practices and examples for debugging Airflow DAGs. You’ll learn how to: create a standardized process for debugging to quickly diagnose errors in your DAGs; identify common issues with DAGs, tasks, and connections; distinguish between Airflow-relate
KDnuggets
OCTOBER 16, 2024
Where can you find projects dealing with advanced ML topics? GitHub is a perfect source with its many repositories. I’ve selected ten to talk about in this article.
The Pragmatic Engineer
OCTOBER 17, 2024
Hi, this is Gergely with a bonus, free issue of the Pragmatic Engineer Newsletter. In every issue, I cover topics related to Big Tech and startups through the lens of engineering managers and senior engineers. In this article, we cover one section from last week’s The Pulse issue. To get full issues twice a week, subscribe here.
Data Engineering Digest brings together the best content for data engineering professionals from the widest variety of industry thought leaders.
Start Data Engineering
OCTOBER 14, 2024
1. Introduction 2. Code & Data 3. Using nested data types effectively 3.1. Use STRUCT for one-to-one & hierarchical relationships 3.2. Use ARRAY[STRUCT] for one-to-many relationships 3.3. Using nested data types in data processing 3.3.1. STRUCT enables more straightforward data schema and data access 3.3.2. Nested data types can be sorted 3.3.3.
KDnuggets
OCTOBER 15, 2024
This guide emphasizes the growing significance of GenAI but also highlights the crucial role that data scientists play in harnessing this technology to solve real-world problems.
databricks
OCTOBER 17, 2024
We are excited to announce the Public Preview of Databricks serverless budget policies. Administrators can use budget policies to ensure that the correct.
Simon Späti
OCTOBER 16, 2024
DuckDB has a significant share and is frequently featured in the latest data engineering news. However, it’s still in its early adopter phase and has yet to be adopted by larger enterprises. Sure, all data creators and startups have used and potentially grown to love DuckDB, but is it also suitable for enterprises? What about scaling out and sharing it with others in the organization?
Speaker: Tamara Fingerlin, Developer Advocate
Apache Airflow® 3.0, the most anticipated Airflow release yet, officially launched this April. As the de facto standard for data orchestration, Airflow is trusted by over 77,000 organizations to power everything from advanced analytics to production AI and MLOps. With the 3.0 release, the top-requested features from the community were delivered, including a revamped UI for easier navigation, stronger security, and greater flexibility to run tasks anywhere at any time.
Engineering at Meta
OCTOBER 15, 2024
At the Open Compute Project (OCP) Global Summit 2024, we’re showcasing our latest open AI hardware designs with the OCP community. These innovations include a new AI platform, cutting-edge open rack designs, and advanced network fabrics and components. By sharing our designs, we hope to inspire collaboration and foster innovation. If you’re passionate about building the future of AI, we invite you to engage with us and OCP to help shape the next generation of open hardware for AI.
KDnuggets
OCTOBER 18, 2024
Check out these 10 use cases for AI to shine.
databricks
OCTOBER 16, 2024
AI/BI Genie is a conversational experience for business teams to self-serve insights from their data through natural language. Genie leverages generative AI tailored.
Confessions of a Data Guy
OCTOBER 14, 2024
I figured a few of us might need the WordPress drama explained like we are 5. So, here you go. WordPress is the GOAT of internet website builders. WordPress was founded by Matt Mullenweg. With much of the internet running on WordPress … hosting WordPress is of course … lucrative and a big business. The […] The post What is the WordPress drama about?
Speaker: Alex Salazar, CEO & Co-Founder @ Arcade | Nate Barbettini, Founding Engineer @ Arcade | Tony Karrer, Founder & CTO @ Aggregage
There’s a lot of noise surrounding the ability of AI agents to connect to your tools, systems and data. But building an AI application into a reliable, secure workflow agent isn’t as simple as plugging in an API. As an engineering leader, it can be challenging to make sense of this evolving landscape, but agent tooling provides such high value that it’s critical we figure out how to move forward.
Snowflake
OCTOBER 17, 2024
Predictive machine learning continues to be a cornerstone of data-driven decision-making. However, as organizations accumulate more data in a wide variety of forms, and as modeling techniques continue to advance, the tasks of a data scientist and ML engineer are becoming increasingly complex. Oftentimes, more effort is spent on managing infrastructure, jumping through package management hurdles, and dealing with scalability issues than on actual model development.
KDnuggets
OCTOBER 16, 2024
This is a comprehensive resource for developers at all levels, whether they are just starting in AI or are looking to refine their expertise further.
databricks
OCTOBER 16, 2024
At Databricks, we are constantly innovating and optimizing our platform to ensure that our customers can maximize the value of their data and.
Engineering at Meta
OCTOBER 15, 2024
At Open Compute Project Summit (OCP) 2024, we’re sharing details about our next-generation network fabric for our AI training clusters. We’ve expanded our network hardware portfolio and are contributing two new disaggregated network fabrics and a new NIC to OCP. We look forward to continued collaboration with OCP to open designs for racks, servers, storage boxes, and motherboards to benefit companies of all sizes across the industry.
Speaker: Andrew Skoog, Founder of MachinistX & President of Hexis Representatives
Manufacturing is evolving, and the right technology can empower—not replace—your workforce. Smart automation and AI-driven software are revolutionizing decision-making, optimizing processes, and improving efficiency. But how do you implement these tools with confidence and ensure they complement human expertise rather than override it? Join industry expert Andrew Skoog as he explores how manufacturers can leverage automation to enhance operations, streamline workflows, and make smarter, data-dri
Snowflake
OCTOBER 18, 2024
To enhance security and ease operational burden, many organizations with data lakes or lakehouses want flexibility to securely integrate their tools of choice on a single copy of data. An open standard for storage format and catalog API has helped, but there’s still a need for open standards for the catalog, including a consistent way to apply security access controls to data.
KDnuggets
OCTOBER 15, 2024
Various statistical methods that you may not have encountered before but that are useful for your workflow.
Confessions of a Data Guy
OCTOBER 12, 2024
Is there anything worse than the PR (Pull Request) process at most companies? Probably not. It’s the dreaded 600-pound gorilla in the room that no one wants to talk about. Everyone hates it, everyone has to do it. But it doesn’t have to be like that. There are a few tried and true ways to […] The post How to make the PERFECT Pull Request (PR) appeared first on Confessions of a Data Guy.
ArcGIS
OCTOBER 16, 2024
Geographic Information Systems (GIS) are revolutionizing airport safety and security, and driving the transformation into smart airports.
Advertisement
With Airflow being the open-source standard for workflow orchestration, knowing how to write Airflow DAGs has become an essential skill for every data engineer. This eBook provides a comprehensive overview of DAG writing features with plenty of example code. You’ll learn how to: understand the building blocks of DAGs, combine them in complex pipelines, and schedule your DAG to run exactly when you want it to; write DAGs that adapt to your data at runtime and set up alerts and notifications; Scale you
Snowflake
OCTOBER 16, 2024
Many of our customers — from Marriott to AT&T — start their journey with the Snowflake AI Data Cloud by migrating their data warehousing workloads to the platform. For organizations who are considering moving from a legacy data warehouse to Snowflake, are looking to learn more about how the AI Data Cloud can support legacy Hadoop use cases, or are struggling with a cloud data warehouse that just isn’t scaling anymore, it often helps to see how others have done it.
KDnuggets
OCTOBER 18, 2024
The title says everything. It is a guide for lazy people who want to learn Python and earn dollars.
Robinhood
OCTOBER 16, 2024
Robinhood is rolling out a suite of new advanced trading tools built from the ground up for active traders. Today, to kick off HOOD Summit, our first-ever customer-focused conference geared towards active traders, we announced Robinhood Legend, a powerful, sleek desktop trading platform built specifically for active traders. We’re also launching futures trading and index options on mobile.
Confluent
OCTOBER 15, 2024
Third installment of the Producer/Consumer Internals series that covers preparing the consumer fetch: how consumers interact with brokers, coordinate partitions, and send requests.
Speaker: Tamara Fingerlin, Developer Advocate
In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!
Snowflake
OCTOBER 15, 2024
As organizations increasingly seek to enhance decision-making and drive operational efficiencies by making knowledge in documents accessible via conversational applications, a RAG-based application framework has quickly become the most efficient and scalable approach. As RAG-based application development continues to grow, the solutions to process and manage the documents that power these applications need to evolve with scalability and efficiency in mind.
KDnuggets
OCTOBER 15, 2024
Explore free platforms for learning, building portfolios, accessing code editors, engaging with communities, and hosting projects.
Towards Data Science
OCTOBER 15, 2024
Dataflow Architecture: Derived Data Views and Eventual Consistency. A (not-so) brief history of a health & fitness data pipeline: part ii. Welcome to part ii of our coming-of-age trilogy on a public health and fitness data pipeline. In this chapter, we reimagine the backend system as a distributed state machine and explore the art of achieving consistency, with a functional flavour.
Uber Engineering
OCTOBER 18, 2024
Discover how QueryGPT revolutionizes SQL query generation at Uber! Learn about the cutting-edge AI that turns natural language prompts into efficient SQL queries, boosting productivity at Uber. Dive into our journey of innovation and transformation.
Speaker: Ben Epstein, Stealth Founder & CTO | Tony Karrer, Founder & CTO, Aggregage
When tasked with building a fundamentally new product line with deeper insights than previously achievable for a high-value client, Ben Epstein and his team faced a significant challenge: how to harness LLMs to produce consistent, high-accuracy outputs at scale. In this new session, Ben will share how he and his team engineered a system (based on proven software engineering approaches) that employs reproducible test variations (via temperature 0 and fixed seeds), and enables non-LLM evaluation m