This site uses cookies to improve your experience. To help us insure we adhere to various privacy regulations, please select your country/region of residence. If you do not select a country, we will assume you are from the United States. Select your Cookie Settings or view our Privacy Policy and Terms of Use.
Cookie Settings
Cookies and similar technologies are used on this website for proper function of the website, for tracking performance analytics and for marketing purposes. We and some of our third-party providers may use cookie data for various purposes. Please review the cookie settings below and choose your preference.
Used for the proper function of the website
Used for monitoring website traffic and interactions
Cookie Settings
Cookies and similar technologies are used on this website for proper function of the website, for tracking performance analytics and for marketing purposes. We and some of our third-party providers may use cookie data for various purposes. Please review the cookie settings below and choose your preference.
Strictly Necessary: Used for the proper function of the website
Performance/Analytics: Used for monitoring website traffic and interactions
👋 Hi, this is Gergely with a bonus, free issue of the Pragmatic Engineer Newsletter. We cover one out of four topics in today’s subscriber-only The Scoop issue. If you’re not yet a full subscriber, you missed this week’s deep-dive on Agoda’s private cloud setup. To get the full issues, twice a week, subscribe here.
Summary Building a data team is hard in any circumstance, but at a startup it can be even more challenging. The requirements are fluid, you probably don't have a lot of existing data talent to manage the hiring and onboarding, and there is a need to move fast. Ghalib Suleiman has been on both sides of this equation and joins the show to share his hard-won wisdom about how to start and grow a data team in the early days of company growth.
By Jose Fernandez , Ed Barker , Hank Jacobs Introduction In November 2022, we introduced a brand new tier — Basic with ads. This tier extended existing infrastructure by adding new backend components and a new remote call to our ads partner on the playback path. As we were gearing up for launch, we wanted to ensure it would go as smoothly as possible.
In Airflow, DAGs (your data pipelines) support nearly every use case. As these workflows grow in complexity and scale, efficiently identifying and resolving issues becomes a critical skill for every data engineer. This is a comprehensive guide with best practices and examples to debugging Airflow DAGs. You’ll learn how to: Create a standardized process for debugging to quickly diagnose errors in your DAGs Identify common issues with DAGs, tasks, and connections Distinguish between Airflow-relate
We are excited to announce that bit.io is joining Databricks. At Databricks, we’ve always been focused on empowering organizations to solve their toughest p.
The asynchronous progress tracking and correctness issue fixes presented in the previous blog posts are not the single new feature in Apache Spark Structured Streaming 3.4.0. There are many others but to keep the blog post readable, I'll focus here only on 3 of them.
AutoML frameworks are powerful tool for data analysts and machine learning specialists that can automate data preprocessing, model selection, hyperparameter tuning, and even perform complex tasks like feature engineering.
AutoML frameworks are powerful tool for data analysts and machine learning specialists that can automate data preprocessing, model selection, hyperparameter tuning, and even perform complex tasks like feature engineering.
Me ( credits ) Hey, I've been sick in the last 3 days and it was impossible to write something. As I still want to send something, here a raw edition with no comments. See you on Friday. Gen Ai 🤖 QLoRA: Efficient Finetuning of Quantized LLMs — 65B parameter model on a single 48GB GPU reaching 99.3% of the performance level of ChatGPT on Vicuna.
Welcome to Snowflake’s Startup Spotlight, where we highlight the people and companies building businesses on Snowflake. In this Q&A series, Jacques Nadeau, Co-Founder and CEO of Sundeck and co-creator of Apache Arrow, talks about what inspires him to make powerful data tools available to all, how Sundeck’s query engineering platform can help Snowflake users, and why they “eat, sleep, and drink” Snowflake every day at Sundeck.
In November 2022, Tweag engineers merged a WebAssembly back end into the Glasgow Haskell Compiler (GHC). The back end includes a new translation for control flow , which enables GHC to avoid depending on external tools like Binaryen. Because the translation is new, we wanted to test it before submitting a merge request. And classic unit testing was not a good fit—we would have needed to know what the WebAssembly code was expected to be generated from any given fragment of Haskell, and that’s a j
Apache Airflow® 3.0, the most anticipated Airflow release yet, officially launched this April. As the de facto standard for data orchestration, Airflow is trusted by over 77,000 organizations to power everything from advanced analytics to production AI and MLOps. With the 3.0 release, the top-requested features from the community were delivered, including a revamped UI for easier navigation, stronger security, and greater flexibility to run tasks anywhere at any time.
Faced with clinician shortages, an aging population, and stagnant health outcomes, the healthcare industry has the potential to greatly benefit from disruptive technologies.
Are You a Data Ticket Taker or Decision Maker? The characteristics and value of reactive vs. proactive data teams Image courtesy of the author. Fundamentally, there are two different types of data teams in this world. There are those who are reactive to the wants of the organization, and then there are those who proactively lead the organization towards its needs.
Speaker: Alex Salazar, CEO & Co-Founder @ Arcade | Nate Barbettini, Founding Engineer @ Arcade | Tony Karrer, Founder & CTO @ Aggregage
There’s a lot of noise surrounding the ability of AI agents to connect to your tools, systems and data. But building an AI application into a reliable, secure workflow agent isn’t as simple as plugging in an API. As an engineering leader, it can be challenging to make sense of this evolving landscape, but agent tooling provides such high value that it’s critical we figure out how to move forward.
With the recent announcement of ThoughtSpot Sage , we launched a number of enhancements to our search capabilities including AI-generated answers, AI-powered search suggestions, and AI-assisted data modeling. In this article we will walk you through the steps we take to secure your data during the LLM interaction. Looking more broadly, we’ll also describe the security process we follow during any application iteration or enhancement, so you can see the great lengths we take to keep your data se
In Databricks Runtime, Adaptive Query Execution (AQE) is a performance feature that continuously re-optimizes batch queries using runtime statistics during query execution. Starting.
Speaker: Andrew Skoog, Founder of MachinistX & President of Hexis Representatives
Manufacturing is evolving, and the right technology can empower—not replace—your workforce. Smart automation and AI-driven software are revolutionizing decision-making, optimizing processes, and improving efficiency. But how do you implement these tools with confidence and ensure they complement human expertise rather than override it? Join industry expert Andrew Skoog as he explores how manufacturers can leverage automation to enhance operations, streamline workflows, and make smarter, data-dri
Project management is a critical function for every organization to achieve its goals in a successful and effective manner. According to one report, project management employment in the United States is predicted to expand by 33% between 2017 and 2027. According to the Bureau of Labour Statistics and PMI, companies will require roughly 88 million people in project management-related activities by 2027.
This post and accompanying notebook and tutorial video demonstrate how to use Cleanlab Studio to improve the performance of Large Language Models (LLMs.
This article will show you how to use OpenAI's Whisper API to transcribe audio into text. It will also show you how to use it in your own projects and how to integrate it into your data science projects.
With Airflow being the open-source standard for workflow orchestration, knowing how to write Airflow DAGs has become an essential skill for every data engineer. This eBook provides a comprehensive overview of DAG writing features with plenty of example code. You’ll learn how to: Understand the building blocks DAGs, combine them in complex pipelines, and schedule your DAG to run exactly when you want it to Write DAGs that adapt to your data at runtime and set up alerts and notifications Scale you
Every year at this time, I like to share my thoughts on the continuing relevance of Pride Month; you can read my posts here from 2021 and 2022. This Pride Month, we’re going to share insights from the Scott Logic team on what Pride and allyship mean to them, and why they value working in an inclusive environment. I’ll get the ball rolling. What does Pride mean to you?
In a world where user experience and IT support can mean the difference between hitting or missing your ARR marks, businesses have to find smarter ways to build workflows and support their IT departments. That’s where companies like ServiceNow come into play. A few years back, we created our ServiceNow SpotApp , a pre-built analytics template to help companies analyze and understand their data—so they can increase efficiencies across their complex IT environments.
In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!
Authors: Ameen Maali , Rohit Pitke , Surbhi Jain , and Mira Thambireddy Security of our members’ data is a key priority at LinkedIn. To tap into the collective insights of the entire security community, we decided to expand our private bug bounty program to everyone on the HackerOne platform last year. In this blog post, we reflect on our journey through the program’s inception, the successes, the learnings, and discuss why our bug bounty program has been so valuable in keeping LinkedIn a secure
MMM (Marketing or Media Mix Modeling), is a data-driven methodology that enables companies to identify and measure the impact of their marketing campaigns.
Speaker: Ben Epstein, Stealth Founder & CTO | Tony Karrer, Founder & CTO, Aggregage
When tasked with building a fundamentally new product line with deeper insights than previously achievable for a high-value client, Ben Epstein and his team faced a significant challenge: how to harness LLMs to produce consistent, high-accuracy outputs at scale. In this new session, Ben will share how he and his team engineered a system (based on proven software engineering approaches) that employs reproducible test variations (via temperature 0 and fixed seeds), and enables non-LLM evaluation m
We organize all of the trending information in your field so you don't have to. Join 37,000+ users and stay up to date on the latest articles your peers are reading.
You know about us, now we want to get to know you!
Let's personalize your content
Let's get even more personalized
We recognize your account from another site in our network, please click 'Send Email' below to continue with verifying your account and setting a password.
Let's personalize your content