This site uses cookies to improve your experience. To help us insure we adhere to various privacy regulations, please select your country/region of residence. If you do not select a country, we will assume you are from the United States. Select your Cookie Settings or view our Privacy Policy and Terms of Use.
Cookie Settings
Cookies and similar technologies are used on this website for proper function of the website, for tracking performance analytics and for marketing purposes. We and some of our third-party providers may use cookie data for various purposes. Please review the cookie settings below and choose your preference.
Used for the proper function of the website
Used for monitoring website traffic and interactions
Cookie Settings
Cookies and similar technologies are used on this website for proper function of the website, for tracking performance analytics and for marketing purposes. We and some of our third-party providers may use cookie data for various purposes. Please review the cookie settings below and choose your preference.
Strictly Necessary: Used for the proper function of the website
Performance/Analytics: Used for monitoring website traffic and interactions
Advice from a startup founder in the data space on how to find a startup that works for you Photo by Leeloo Thefirst from Pexels.com So you want to join a startup huh? I’m not talking about a fancy Series E startup that’s about to go IPO funded by a16z. I’m talking about a real startup, from seed to series B — where every day can feel like you’re either about to soar or crash and burn — and there’s little in between.
👋 Hi, this is Gergely with a bonus, free issue of the Pragmatic Engineer Newsletter. In every issue, I cover topics related to Big Tech and startups through the lens of engineering managers and senior engineers. To get full issues twice a week, subscribe here. The past month, we’ve done deepdives in the newsletter on what a senior software engineer is at Big Tech , and at scaleups.
Learning about how to data models from basic star schemas on the internet is like learning data science using the IRIS data set. It works great as a toy example. But it doesn’t match real life at all. Data modeling in real life requires you fully understand the data sources and your business use cases.… Read more The post Why Is Data Modeling So Challenging – How To Data Model For Analytics appeared first on Seattle Data Guy.
Introduction In the bustling arena of database management systems, two heavyweight contenders emerge, each carrying its arsenal of features and capabilities. In one corner, we have the suave and sophisticated Microsoft SQL Server (MSSQL), donned in the elegance of enterprise-level prowess. And in the other corner the scrappy and open-source MySQL, armed with its community-driven […] The post MSSQL vs MySQL: Comparing Powerhouses of Databases appeared first on Analytics Vidhya.
Apache Airflow® 3.0, the most anticipated Airflow release yet, officially launched this April. As the de facto standard for data orchestration, Airflow is trusted by over 77,000 organizations to power everything from advanced analytics to production AI and MLOps. With the 3.0 release, the top-requested features from the community were delivered, including a revamped UI for easier navigation, stronger security, and greater flexibility to run tasks anywhere at any time.
7 Projects Built with Generative AI • Beyond Numpy and Pandas: Unlocking the Potential of Lesser-Known Python Libraries • 5 Ways You Can Use ChatGPT’s Code Interpreter For Data Science • GPT-4: 8 Models in One; The Secret is Out
In the past few days, the scope and trajectory of Instacart’s use of Snowflake has been misrepresented by some on social media. Snowflake has partnered closely with Instacart to scale up to meet the company’s massive demand growth, and then to optimize for efficiency. Optimizations are undertaken on a workload-by-workload basis, and have been extremely successful.
It’s no secret that modern organizations are doubling down on their investments in data - investments that uncover deep customer insights that provide a.
It’s no secret that modern organizations are doubling down on their investments in data - investments that uncover deep customer insights that provide a.
👋 Hi, this is Gergely with a bonus, free issue of the Pragmatic Engineer Newsletter. In every issue, I cover topics related to Big Tech and startups through the lens of engineering managers and senior engineers. In this article, we cover one out of for topics from the past newsletter issue Game Development Basics. To get the full issues, twice a week, subscribe here.
If Delta Lake implemented the commits only, I could stop exploring this transactional part after the previous article. But as for RDBMS, Delta Lake implements other ACID-related concepts. One of these are isolation levels.
Introduction In today’s data-driven world, seamless data integration plays a crucial role in driving business decisions and innovation. Two prominent methodologies have emerged to facilitate this process: Extract, Transform, Load (ETL) and Extract, Load, Transform (ELT). In this article, we will discuss ELT vs ETL, comparing their characteristics, benefits, and suitability for various use cases. […] The post ELT vs ETL: Unveiling the Differences and Similarities appeared first on Ana
Speaker: Alex Salazar, CEO & Co-Founder @ Arcade | Nate Barbettini, Founding Engineer @ Arcade | Tony Karrer, Founder & CTO @ Aggregage
There’s a lot of noise surrounding the ability of AI agents to connect to your tools, systems and data. But building an AI application into a reliable, secure workflow agent isn’t as simple as plugging in an API. As an engineering leader, it can be challenging to make sense of this evolving landscape, but agent tooling provides such high value that it’s critical we figure out how to move forward.
Summary Data persistence is one of the most challenging aspects of computer systems. In the era of the cloud most developers rely on hosted services to manage their databases, but what if you are a cloud service? In this episode Vignesh Ravichandran explains how his team at Cloudflare provides PostgreSQL as a service to their developers for low latency and high uptime services at global scale.
I still remember that day. A day that shall live on in infamy in my mind. Well over a decade ago, in the days when SQL Server roamed the land devouring souls on the Altar of Stored Procedures. There was only one tool available at the time. SQL. That’s it. There was one problem that […] The post The Case of the Mysterious Recursive CTE appeared first on Confessions of a Data Guy.
👋 Hi, this is Gergely with a bonus, free issue of the Pragmatic Engineer Newsletter. In every issue, I cover topics related to Big Tech and startups through the lens of engineering managers and senior engineers. In this article, we cover one out of five topics from today’s subscriber-only The Pulse issue. To get full issues twice a week, subscribe here.
One of the great features of modern table file formats is the ability to handle write conflicts. It wouldn't be possible without commits that are the topic of this new blog post.
Speaker: Andrew Skoog, Founder of MachinistX & President of Hexis Representatives
Manufacturing is evolving, and the right technology can empower—not replace—your workforce. Smart automation and AI-driven software are revolutionizing decision-making, optimizing processes, and improving efficiency. But how do you implement these tools with confidence and ensure they complement human expertise rather than override it? Join industry expert Andrew Skoog as he explores how manufacturers can leverage automation to enhance operations, streamline workflows, and make smarter, data-dri
A few months ago, I uploaded a video where I discussed data warehouses, data lakes, and transactional databases. However, the world of data management is evolving rapidly, especially with the resurgence of AI and machine learning. There are numerous other methods that technical teams are utilizing to handle their data effectively. In this presentation, I… Read more The post Data Warehouses Vs Operational Data Stores Vs Data Lakes – How To Store Your Data For Analytics appeared first
Summary Generative AI has unlocked a massive opportunity for content creation. There is also an unfulfilled need for experts to be able to share their knowledge and build communities. Illumidesk was built to take advantage of this intersection. In this episode Greg Werner explains how they are using generative AI as an assistive tool for creating educational material, as well as building a data driven experience for learners.
Do you think I’m just trying to get you to click? Maybe. Maybe not. After working in and around Data Teams for well over a decade, with both the smartest people to touch the keyboard, and the others, it’s become quite clear to me what the number one skill that identifies a Senior level Engineering […] The post Senior Engineer – The Number One Skill appeared first on Confessions of a Data Guy.
With Airflow being the open-source standard for workflow orchestration, knowing how to write Airflow DAGs has become an essential skill for every data engineer. This eBook provides a comprehensive overview of DAG writing features with plenty of example code. You’ll learn how to: Understand the building blocks DAGs, combine them in complex pipelines, and schedule your DAG to run exactly when you want it to Write DAGs that adapt to your data at runtime and set up alerts and notifications Scale you
👋 Hi, this is Gergely with a bonus, free issue of the Pragmatic Engineer Newsletter. In every issue, I cover topics related to Big Tech and startups through the lens of engineering managers and senior engineers. In this article, we cover one out of five topics from today’s subscriber-only The Pulse issue. To get full issues twice a week, subscribe here.
Data Quality Chronicles Missing data, missing mechanisms, and missing data profiling Missing Data prevents data scientists to see the entire story the data has to tell. Sometimes, even the smallest pieces of information can provide a completely unique view of the world. Photo by Ronan Furuta on Unsplash. Earlier this year, I started a piece on several data quality issues (or characteristics) that heavily compromise our machine learning models.
The Burtch Works 2023 Data Science & AI Professionals salary report is here, and includes insightful data such as hiring and marketplace trends, compensation changes over time, and salary data. Get your copy here.
Many software teams have migrated their testing and production workloads to the cloud, yet development environments often remain tied to outdated local setups, limiting efficiency and growth. This is where Coder comes in. In our 101 Coder webinar, you’ll explore how cloud-based development environments can unlock new levels of productivity. Discover how to transition from local setups to a secure, cloud-powered ecosystem with ease.
Summary Data pipelines are the core of every data product, ML model, and business intelligence dashboard. If you're not careful you will end up spending all of your time on maintenance and fire-fighting. The folks at Rivery distilled the seven principles of modern data pipelines that will help you stay out of trouble and be productive with your data.
👋 Hi, this is Gergely with a free issue of the Pragmatic Engineer Newsletter. We cover one out of five topics in today’s subscriber-only The Pulse issue. If you’re not yet a full subscriber, you missed this week’s deepdive: The 2023 tech market, as seen by hiring managers. To get full newsletters twice a week, subscribe here.
There are probably not that many people working today on the flat files with Structured Streaming than 5 years ago thanks to the table file formats. However, if you are in this group and are still generating CSVs or JSONs with the streaming sink, brace yourself, the memory problems are coming if you don't take action!
Large enterprises face unique challenges in optimizing their Business Intelligence (BI) output due to the sheer scale and complexity of their operations. Unlike smaller organizations, where basic BI features and simple dashboards might suffice, enterprises must manage vast amounts of data from diverse sources. What are the top modern BI use cases for enterprise businesses to help you get a leg up on the competition?
Bitcoin and Dogecoin support is now available to all Robinhood Wallet users, and in-app Ethereum Swaps started rolling out today Since launching to the general public nearly six months ago, Robinhood Wallet has seen significant adoption globally, with hundreds of thousands of users in more than 140 countries worldwide. We are always gathering feedback, and have heard loud and clear that people want access to more coins on more chains.
This post discusses the importance of media mix modeling and how it can be used to maximize the business impact of advertising. It also discusses the impact of seasonality on media advertising and how media mix modeling can be used to minimize the impact of seasonality on business outcomes.
Summary As businesses increasingly invest in technology and talent focused on data engineering and analytics, they want to know whether they are benefiting. So how do you calculate the return on investment for data? In this episode Barr Moses and Anna Filippova explore that question and provide useful exercises to start answering that in your company.
In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!
We organize all of the trending information in your field so you don't have to. Join 37,000+ users and stay up to date on the latest articles your peers are reading.
You know about us, now we want to get to know you!
Let's personalize your content
Let's get even more personalized
We recognize your account from another site in our network, please click 'Send Email' below to continue with verifying your account and setting a password.
Let's personalize your content