November 2024

Which IDEs do software engineers love, and why?

The Pragmatic Engineer

It’s been nearly 6 months since our research into which AI tools software engineers use, in the mini-series AI tooling for software engineers: reality check. At the time, the most popular tools were ChatGPT for LLMs and GitHub Copilot for IDE-integrated tooling. Then, this summer, I noticed the Cursor IDE gaining popularity around the time Anthropic released its Claude 3.5 Sonnet model, which generates better code than ChatGPT.

What do Snowflake, Databricks, Redshift, BigQuery actually do?

Start Data Engineering

1. Introduction
2. Analytical databases aggregate large amounts of data
3. Most platforms enable you to do the same thing but have different strengths
3.1. Understand how the platforms process data
3.1.1. A compute engine is a system that transforms data
3.1.2. Metadata catalog stores information about datasets
3.1.3. Data platform support for SQL, Dataframe, and Dataset APIs
3.1.4.

Challenges You Will Face When Parsing PDFs With Python – How To Parse PDFs With Python

Seattle Data Guy

Scraping data from PDFs is a rite of passage if you work in data. Someone, somewhere, always needs help parsing invoices, reading through contracts, or handling dozens of other use cases. Most of us will turn to Python and our trusty list of Python libraries and start plugging away. Of course, there are many challenges…

15+ Companies Using DuckDB in Production: A Comprehensive Guide

Simon Späti

From Fortune 500 companies processing trillions of security records to innovative startups building interactive data tools, DuckDB is revolutionizing how organizations handle analytical workloads. Building on our exploration of DuckDB’s core capabilities in Part 1, this guide showcases production implementations and promising experimental applications across five key categories.

Apache Airflow® Best Practices for ETL and ELT Pipelines

Whether you’re creating complex dashboards or fine-tuning large language models, your data must be extracted, transformed, and loaded. ETL and ELT pipelines form the foundation of any data product, and Airflow is the open-source data orchestrator specifically designed for moving and transforming data in ETL and ELT pipelines. This eBook covers: An overview of ETL vs.

Data News — Week 24.45

Christophe Blefari

Métro-boulot-dodo (commute, work, sleep). It's Data News time. Time really flies, and apart from the bad news from across the Atlantic, all is well on my side. To be honest, I miss you folks. Writing here has been my little thing for the last 3 years, and because I haven't been able to get back to my previous frequency since July, I feel empty every Friday.

GHC's wasm backend now supports Template Haskell and ghci

Tweag

Two years ago, I wrote a blog post to announce that the GHC wasm backend had been merged upstream. I’ve been too lazy to write another blog post about the project since then, but rest assured, the project hasn’t stagnated. A lot of improvements have landed since the initial merge, including but not limited to: many, many bugfixes in the code generator and runtime, as evidenced by the full GHC testsuite now running for the wasm backend in upstream GHC CI pipelines.

More Trending

Robinhood Crypto Expands Offering with Solana (SOL), Pepe (PEPE), Cardano (ADA) & XRP (XRP) for U.S. Customers

Robinhood

Robinhood Crypto’s commitment to expanding access and maintaining a safe, easy-to-use platform deepens with the addition of 4 digital assets. Today, Robinhood Crypto announced the addition of Solana (SOL), Pepe (PEPE), Cardano (ADA) & XRP (XRP) to its U.S. platform, bringing the total number of cryptocurrencies available for trading to 19. You can see a full list of crypto assets currently available in the U.S. here.

From IC to Data Leader: Key Strategies for Managing and Growing Data Teams

Seattle Data Guy

There are plenty of statistics about the speed at which we are creating data in today’s modern world. On the flip side of all that data creation is the need to manage it, and that’s where data teams come in. But leading these data teams is challenging, and yet many new data…

BI-as-Code and the New Era of GenBI

Simon Späti

Imagine creating business dashboards by simply describing what you want to see. No more clicking through complex interfaces or writing SQL queries; just have a conversation with AI about your data needs. This is the promise of Generative Business Intelligence (GenBI). At its core, GenBI delivers an unreasonably effective human interface, where we iterate quickly, based on BI-as-Code.

DuckDB … reading from s3 … with AWS Credentials and more.

Confessions of a Data Guy

In my never-ending quest to plumb the most boring depths of every single data tool on the market, I found myself annoyed when recently using DuckDB for a benchmark that was reading Parquet files from S3. What was not clear, or easy, was trying to figure out how DuckDB would LIKE to read default AWS […]

Apache Airflow®: The Ultimate Guide to DAG Writing

Speaker: Tamara Fingerlin, Developer Advocate

In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!

Best No-Code LLM App Builders

KDnuggets

Build an LLM application by dragging and dropping components, such as a vector store, web search, memory, and a custom prompt, and connecting them.

How to present and share your Notebook insights in AI/BI Dashboards

databricks

We’re excited to announce a new integration between Databricks Notebooks and AI/BI Dashboards, enabling you to effortlessly transform insights from your notebooks into.

Deep Dive into Handling Consumer Fetch Requests: Kafka Producer and Consumer Internals, Part 4

Confluent

In the final article of this four-part series on Kafka producer and consumer internals, observe the inner workings of brokers as they attempt to serve data up to consumers.
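To make the broker-side behaviour concrete, here is a minimal, hypothetical simulation of how a broker might answer a single fetch request. It assumes a simplified model: sizes are tracked per record rather than per record batch, only one partition is fetched, and replication and the high watermark are ignored. The parameters `min_bytes`, `max_wait_ms`, and `max_bytes` are illustrative stand-ins for the real consumer settings `fetch.min.bytes`, `fetch.max.wait.ms`, and `fetch.max.bytes`; `Record` and `serve_fetch` are invented names, not Kafka APIs.

```python
from dataclasses import dataclass


@dataclass
class Record:
    offset: int
    size: int  # bytes; modelled per record rather than per batch (simplification)


def serve_fetch(log, fetch_offset, min_bytes, max_bytes, max_wait_ms, arrivals=()):
    """Hypothetical simulation of one broker response to a consumer fetch.

    log: records already on the partition when the request arrives.
    arrivals: (ms_after_request, Record) pairs appended while the request waits.
    Returns (records_returned, response_time_ms).
    """
    pending = [r for r in log if r.offset >= fetch_offset]
    respond_at = 0
    if sum(r.size for r in pending) < min_bytes:
        # Not enough data yet: hold the request until min_bytes accumulate
        # or max_wait_ms expires, whichever comes first
        respond_at = max_wait_ms
        for t, rec in sorted(arrivals, key=lambda a: a[0]):
            if t > max_wait_ms:
                break
            if rec.offset >= fetch_offset:
                pending.append(rec)
            if sum(r.size for r in pending) >= min_bytes:
                respond_at = t
                break
    # Cap the response at max_bytes, but always return the first record
    # so the consumer is never stuck behind one oversized record
    response, used = [], 0
    for rec in pending:
        if response and used + rec.size > max_bytes:
            break
        response.append(rec)
        used += rec.size
    return response, respond_at


log = [Record(0, 100), Record(1, 100)]
records, t = serve_fetch(log, fetch_offset=1, min_bytes=150, max_bytes=1024,
                         max_wait_ms=500, arrivals=[(120, Record(2, 80))])
print([r.offset for r in records], t)  # [1, 2] 120
```

The design choice the sketch highlights is the trade-off between latency and efficiency: a higher `min_bytes` lets the broker batch more data per response, at the cost of holding the request for up to `max_wait_ms`. The "always return the first record" rule mirrors, in simplified form, Kafka's documented guarantee that a first batch larger than `fetch.max.bytes` is still returned so consumers can make progress.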

What is Unstructured Data? A Guide to Storage, Processing, and Analysis

Seattle Data Guy

Much of the data we have used for analysis in traditional enterprises has been structured data. It’s easy for humans to break down, understand, and, in turn, find insights from. Yet much of the data that is being created, and will be created, comes in some unstructured format. However, the digital era…

Optimizing The Modern Developer Experience with Coder

Many software teams have migrated their testing and production workloads to the cloud, yet development environments often remain tied to outdated local setups, limiting efficiency and growth. This is where Coder comes in. In our 101 Coder webinar, you’ll explore how cloud-based development environments can unlock new levels of productivity. Discover how to transition from local setups to a secure, cloud-powered ecosystem with ease.

PSPO Study Guide: The Best Plan to Crack PSPO Exam 2025

Knowledge Hut

Scrum is a quality-driven process for producing excellent business outcomes. Organizations are looking for professional product owners who grasp this notion and can apply it in the real world. Employers use many credentialing services to certify comprehension and application by level, and these levels are referred to as belts. Scrum training sessions, along with resources like a PSPO study guide, help you learn PSPO I principles and study efficiently and effectively to pass your exam, adva

Turkey Day Is Here – Black Friday Sale – 50% Off

Confessions of a Data Guy

Well, another turkey day has come upon us all. I trust you are getting at least a day or two off from writing code and taking names for your overlords. While the rest of you will be slicing up that turkey with your friends and family, clinking your glasses and giving toasts to each other, […]

Math Myths Busted: What Beginners Actually Need for Data Science

KDnuggets

Terrified of calculus but dream of being a data scientist? Breathe easy! Discover the surprising truth about math in data science and how you can succeed without being a math genius.

Top 10 Marketplace Questions, Answered

databricks

Databricks Marketplace is an open marketplace for data, analytics, and AI, powered by the open-source Delta Sharing standard. Since the release of Databricks.

15 Modern Use Cases for Enterprise Business Intelligence

Large enterprises face unique challenges in optimizing their Business Intelligence (BI) output due to the sheer scale and complexity of their operations. Unlike smaller organizations, where basic BI features and simple dashboards might suffice, enterprises must manage vast amounts of data from diverse sources. What are the top modern BI use cases for enterprise businesses to help you get a leg up on the competition?

Connect with Confluent Q4 Update: New Program Entrants and SAP Datasphere Hydration

Confluent

Confluent’s Connect with Confluent (CwC) partner program introduces bidirectional data streaming for SAP Datasphere, powered by Apache Kafka and Apache Flink, and announces its Q4 2024 new entrants.

How Data Teams Drive Business Success by Understanding Core Metrics

Seattle Data Guy

A key responsibility for any data team is to understand the core metrics driving their business. Starting from the top, these metrics often include figures like gross revenue and expenses. However, these high-level metrics can feel too far removed and abstract from the actual business. Many companies, therefore, break down these top-line metrics into more…

They Handle 500B Events Daily. Here’s Their Data Engineering Architecture.

Monte Carlo

A data engineering architecture is the structural framework that determines how data flows through an organization – from collection and storage to processing and analysis. It’s the big blueprint we data engineers follow in order to transform raw data into valuable insights. Before building your own data architecture from scratch though, why not steal – er, learn from – what industry leaders have already figured out?

Share and Monetize AI Models Securely on the AI Data Cloud

Snowflake

The rise of generative AI models is spurring organizations to incorporate AI and large language models (LLMs) into their business strategy. After all, these models open up new opportunities to extract greater value from a company’s data and IP and make it accessible to a wider audience across the organization. One key to successfully leveraging gen AI models is the ability to share data.

Prepare Now: 2025’s Must-Know Trends For Product And Data Leaders

Speaker: Jay Allardyce, Deepak Vittal, Terrence Sheflin, and Mahyar Ghasemali

As we look ahead to 2025, business intelligence and data analytics are set to play pivotal roles in shaping success. Organizations are already starting to face a host of transformative trends as the year comes to a close, including the integration of AI in data analytics, an increased emphasis on real-time data insights, and the growing importance of user experience in BI solutions.

Impact of GenAI on the Software Testing Market

KDnuggets

Could AI replace traditional software testers? Learn how generative AI transforms their roles and supercharges testing efficiency without missing critical tests.

Celebrating Innovation: Announcing the Finalists of the Databricks Generative AI Startup Challenge

databricks

We are thrilled to unveil the finalists for the Databricks Generative AI Startup Challenge, a competition designed to spotlight innovative early-stage startups.

Netflix’s Distributed Counter Abstraction

Netflix Tech

By Rajiv Shringi, Oleksii Tkachuk, and Kartik Sathyanarayanan. In our previous blog post, we introduced Netflix’s TimeSeries Abstraction, a distributed service designed to store and query large volumes of temporal event data with low millisecond latencies. Today, we’re excited to present the Distributed Counter Abstraction. This counting service, built on top of the TimeSeries Abstraction, enables distributed counting at scale while maintaining similar low-latency performance.
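The teaser does not spell out the counter's internals, so the following is only a minimal, hypothetical sketch of one common way to build a counter on top of an event store: every increment is recorded as an event keyed by an idempotency token (so client retries are not double-counted), and a periodic rollup folds new events into a cached total that reads return cheaply. `EventLogCounter`, `add`, `rollup`, and `get` are invented names; this is not Netflix's API or a description of its implementation, and it runs in a single process.

```python
from collections import defaultdict


class EventLogCounter:
    """Hypothetical counter: each increment is an event keyed by an
    idempotency token, so retried writes are not double-counted."""

    def __init__(self):
        # counter name -> {idempotency token: delta}
        self._events = defaultdict(dict)
        # counter name -> last rolled-up total
        self._totals = defaultdict(int)
        # counter name -> tokens already folded into the total
        self._folded = defaultdict(set)

    def add(self, name, delta, token):
        # setdefault keeps the first delta seen for a token, so a client
        # retry with the same token is a no-op
        self._events[name].setdefault(token, delta)

    def rollup(self, name):
        # Fold in events not yet included in the cached total
        for token, delta in self._events[name].items():
            if token not in self._folded[name]:
                self._totals[name] += delta
                self._folded[name].add(token)
        return self._totals[name]

    def get(self, name):
        # Cheap read of the last rolled-up value; may lag recent writes
        return self._totals[name]


counter = EventLogCounter()
counter.add("play_count", 1, "req-1")
counter.add("play_count", 1, "req-1")  # retry with the same token: ignored
counter.add("play_count", 2, "req-2")
print(counter.get("play_count"))     # 0, no rollup has run yet
print(counter.rollup("play_count"))  # 3
```

Separating writes (append an event) from reads (return the rolled-up total) trades read freshness for cheap, contention-free increments; in a real distributed service the rollup would run asynchronously and the event log would live in durable storage rather than in memory.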

What Is AWS DMS And Why You Shouldn’t Use It As An ELT

Seattle Data Guy

Recently, I’ve encountered a few projects that used AWS DMS, which is almost like an ELT solution, to move data from a local database instance to S3 or some other data storage layer. It was interesting to see AWS DMS used in this manner, but it’s not what DMS was built for. As…

How to Drive Cost Savings, Efficiency Gains, and Sustainability Wins with MES

Speaker: Nikhil Joshi, Founder & President of Snic Solutions

Is your manufacturing operation reaching its efficiency potential? A Manufacturing Execution System (MES) could be the game-changer, helping you reduce waste, cut costs, and lower your carbon footprint. Join Nikhil Joshi, Founder & President of Snic Solutions, in this value-packed webinar as he breaks down how MES can drive operational excellence and sustainability.

Octopai Acquisition Enhances Metadata Management to Trust Data Across Entire Data Estate

Cloudera

We are excited to announce the acquisition of Octopai, a leading data lineage and catalog platform that provides data discovery and governance for enterprises to enhance their data-driven decision making. Cloudera’s mission since its inception has been to empower organizations to transform all their data to deliver trusted, valuable, and predictive insights.

Cloud Data Warehouse Migrations: Success Stories from WHOOP and Nexon

Snowflake

Many of our customers, from Marriott to AT&T, start their journey with the Snowflake AI Data Cloud by migrating their data warehousing workloads to the platform. If you are considering moving from a legacy data warehouse to Snowflake, looking to learn more about how the AI Data Cloud can support legacy Hadoop use cases, or assessing new options because your current cloud data warehouse just isn’t scaling anymore, it helps to see how others have done it.

AnythingLLM: The LLM Application You’ve Been Waiting For

KDnuggets

Turn any document into a conversation-ready AI tool with AnythingLLM — a versatile, open-source platform for building a secure, private assistant.

Announcing the General Availability of Materialized Views and Streaming Tables for Databricks SQL

databricks

We’re excited to announce that materialized views (MVs) and streaming tables (STs) are now Generally Available in Databricks SQL on AWS and Azure.
