Sat. Oct 14, 2023 - Fri. Oct 20, 2023

dbt multi-project collaboration

Christophe Blefari

Over the last few years, dbt has become a de facto standard enabling companies to collaborate easily on data transformations. With dbt, you can apply software engineering practices to SQL development. Managing your SQL patrimony has never been easier. So, yes, dbt is cool, but there is a common pattern with it: you accumulate SQL queries.

Project 264

Reducing The Barrier To Entry For Building Stream Processing Applications With Decodable

Data Engineering Podcast

Summary: Building streaming applications has gotten substantially easier over the past several years. Despite this, it is still operationally challenging to deploy and maintain your own stream processing infrastructure. Decodable was built with a mission of eliminating all of the painful aspects of developing and deploying stream processing systems for engineering teams.

Process 182

How to use Airflow templates and macros

Marc Lamberti

Templates and Macros in Apache Airflow allow passing data to your DAGs at runtime. Imagine that you want to execute an SQL request with the execution date of your DAG. How can you do that? How can you use the DAG ID when you send notifications to know which DAG to look at? Or what if you need to know when the next DAG run will be? Well, macros and templates answer these questions.
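
As a rough illustration of the idea, the sketch below mimics how a placeholder such as `{{ ds }}` gets filled in at runtime. Airflow itself relies on Jinja for this; the tiny `render` function here is a hypothetical stand-in written with the standard library only, and the context keys (`ds` for the execution date, `dag_id` as a simplified flat key) are assumptions made for the example. In Airflow the DAG ID is normally reached through `{{ dag.dag_id }}`.

```python
import re
from datetime import date


def render(template: str, context: dict) -> str:
    """Replace each {{ name }} placeholder with the matching context value.

    Hypothetical stand-in for a template engine; not Airflow's implementation.
    """
    def substitute(match: re.Match) -> str:
        return str(context[match.group(1)])

    return re.sub(r"\{\{\s*(\w+)\s*\}\}", substitute, template)


# Assumed runtime context: the execution date and the DAG ID.
context = {"ds": date(2023, 10, 14).isoformat(), "dag_id": "daily_sales"}

# The same context fills both an SQL request and a notification message.
sql = render("SELECT * FROM sales WHERE sale_date = '{{ ds }}'", context)
message = render("DAG {{ dag_id }} failed for {{ ds }}", context)

print(sql)
print(message)
```

The point of the sketch is that the SQL string and the notification text are written once, and the values are only known when the DAG run starts.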

SQL 130

Watermark and input data filtering in Apache Spark Structured Streaming

Waitingforcode

I've already written about watermarks in a few places in the blog but despite that, I still find things to refresh. One of them is the watermark used to filter out the late data, which will be the topic of this blog post.
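
To make the late-data filtering concrete, here is a simplified model in plain Python, not Spark's API. It assumes the watermark is the highest event time seen so far minus an allowed delay, that it is updated at the end of each micro-batch, and that events older than the current watermark are dropped from the next batch. The class name `WatermarkFilter` and the integer timestamps are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class WatermarkFilter:
    """Simplified watermark model (an assumption, not Spark's implementation)."""
    delay: int               # allowed lateness, in seconds
    watermark: int = 0       # events older than this are considered too late
    max_event_time: int = 0  # highest event time observed so far

    def process_batch(self, event_times: list[int]) -> list[int]:
        # Filter with the watermark computed from the previous batches.
        kept = [t for t in event_times if t >= self.watermark]
        # Advance the watermark for the next batch; it never moves backwards.
        if event_times:
            self.max_event_time = max(self.max_event_time, max(event_times))
        self.watermark = max(self.watermark, self.max_event_time - self.delay)
        return kept


f = WatermarkFilter(delay=10)
first = f.process_batch([100, 105])      # nothing is late yet
second = f.process_batch([90, 96, 120])  # 90 is older than the watermark (95)
third = f.process_batch([109, 110])      # watermark is now 110
```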

Data 130

15 Modern Use Cases for Enterprise Business Intelligence

Large enterprises face unique challenges in optimizing their Business Intelligence (BI) output due to the sheer scale and complexity of their operations. Unlike smaller organizations, where basic BI features and simple dashboards might suffice, enterprises must manage vast amounts of data from diverse sources. What are the top modern BI use cases for enterprise businesses to help you get a leg up on the competition?

Data News — Week 23.42

Christophe Blefari

Hey, this week Coalesce, the dbt Labs annual conference, took place. Over 3 days, people shared how they used dbt around the world. I'll, as usual, write a takeaway post after binge-watching all the keynotes, but this is for next week. Still, dbt Labs announcements were mainly geared toward dbt Cloud, with great features to drive adoption of the paid product.

The State of WebAssembly 2023 by Colin Eberhardt

Scott Logic

The State of WebAssembly 2023 survey has closed, the results are in … and they are fascinating! If you want the TL;DR, here are the highlights: Rust and JavaScript usage is continuing to increase, but some more notable changes are happening a little further down, with both Swift and Zig seeing a significant increase in adoption. When it comes to which languages developers ‘desire’, with Zig, Kotlin and C# we see that desirability exceeds current usage. WebAssembly is still most often used for we

More Trending

How DoorDash Standardized and Improved Microservices Caching

DoorDash Engineering

As DoorDash’s microservices architecture has grown, so too has the volume of interservice traffic. Each team manages their own data and exposes access through gRPC services, an open-source remote procedure call framework used to build scalable APIs. Most business logic is I/O-bound because of calls to downstream services. Caching has long been a go-to strategy to improve performance and reduce costs.

Database 121

Data News — Airflow Summit 2023 takeaways

Christophe Blefari

Hello, dear Data News reader, I hope you'll enjoy this new edition. It's amazing how quickly time flies, and this summer I passed the 3-year mark since I started my freelance adventure. I'm so happy with what it's brought me. But I've got this internal alarm that goes off every 3 years asking me for new things. It's time for me to search for my future paths.

Python 130

Revolutionizing Real-Time Streaming Processing: 4 Trillion Events Daily at LinkedIn

LinkedIn Engineering

Authors: Bingfeng Xia and Xinyu Liu. Background: At LinkedIn, Apache Beam plays a pivotal role in stream processing infrastructures that process over 4 trillion events daily through more than 3,000 pipelines across multiple production data centers. This robust framework empowers near real-time data processing for critical services and platforms, ranging from machine learning and notifications to anti-abuse AI modeling.

Process 119

5 Free Books to Master Data Science

KDnuggets

Want to break into data science? Check this list of free books for learning Python, statistics, linear algebra, machine learning and deep learning.

Prepare Now: 2025's Must-Know Trends For Product And Data Leaders

Speaker: Jay Allardyce, Deepak Vittal, and Terrence Sheflin

As we look ahead to 2025, business intelligence and data analytics are set to play pivotal roles in shaping success. Organizations are already starting to face a host of transformative trends as the year comes to a close, including the integration of AI in data analytics, an increased emphasis on real-time data insights, and the growing importance of user experience in BI solutions.

Prepare your data for the National Spatial Reference System modernization of 2022 in the U.S.

ArcGIS

The new U.S. datums of 2022 will soon be released. This article covers what is coming and how you should prepare your data.

Systems 141

Automating product deprecation

Engineering at Meta

Systematic Code and Asset Removal Framework (SCARF) is Meta’s unused code and data deletion framework. SCARF guides engineers through deprecating a product safely and efficiently via an internal tool. SCARF combines this tooling with automation to reduce load on engineers. At Meta, we are constantly innovating and experimenting by building and shipping many different products, and those products comprise thousands of individual features.

Coding 115

Simplifying Production MLOps with Lakehouse AI

databricks

Machine learning (ML) is more than just developing models; it's about bringing them to life in real-world, production systems. But transitioning from prototype.

7 Steps to Mastering Large Language Models (LLMs)

KDnuggets

Large Language Models (LLMs) have unlocked a new era in natural language processing. So why not learn more about them? Go from learning what large language models are to building and deploying LLM apps in 7 easy steps with this guide.

Building 144

How to Drive Cost Savings, Efficiency Gains, and Sustainability Wins with MES

Speaker: Nikhil Joshi, Founder & President of Snic Solutions

Is your manufacturing operation reaching its efficiency potential? A Manufacturing Execution System (MES) could be the game-changer, helping you reduce waste, cut costs, and lower your carbon footprint. Join Nikhil Joshi, Founder & President of Snic Solutions, in this value-packed webinar as he breaks down how MES can drive operational excellence and sustainability.

JSON Schemas to Nickel contracts

Tweag

At Tweag we have been cooking up a JSON Schema to Nickel contract converter that we’re excited to announce! Background: Nickel is a configuration language being developed at Tweag. You can get some deep dives into its design from previous blog posts. I’ll summarize it here as JSON, plus functions, plus types and contracts. One of its main use cases is generating JSON configurations for other programs (Terraform, GitHub Actions, etc.).

Coding 106

How Meta is creating custom silicon for AI

Engineering at Meta

With the recent launches of MTIA v1, Meta’s first-generation AI inference accelerator, and Llama 2, the next generation of Meta’s publicly available large language model, it’s clear that Meta is focused on advancing AI for a more connected world. Fueling the success of these products are world-class infrastructure teams, including Meta’s custom AI silicon team, led by Olivia Wu, a leader in the silicon industry for 30 years.

Designing 113

The benefits of modern data architecture

InData Labs

Big data is central to the efficient running of all modern organizations, but to be of use, raw data must be suitably organized. The way that businesses organize data assets is commonly known as data architecture, with the benefits of modern data architecture enabling teams to respond to changing demands with improved agility when compared. The post The benefits of modern data architecture first appeared on InData Labs.

Semantic Layer: The Backbone of AI-powered Data Experiences

KDnuggets

Looking to understand the semantic layer and how it can improve the AI-powered data experience? Read more to learn why a semantic layer can be the backbone of LLMs and reduce hallucinations.

Data 136

Improving the Accuracy of Generative AI Systems: A Structured Approach

Speaker: Anindo Banerjea, CTO at Civio & Tony Karrer, CTO at Aggregage

When developing a Gen AI application, one of the most significant challenges is improving accuracy. This can be especially difficult when working with a large data corpus, and as the complexity of the task increases. The number of use cases/corner cases that the system is expected to handle essentially explodes. 💥 Anindo Banerjea is here to showcase his significant experience building AI/ML SaaS applications as he walks us through the current problems his company, Civio, is solving.

Bring Your Own Algorithm to Anomaly Detection

Pinterest Engineering

Charles Wu | Software Engineer; Isabel Tallam | Software Engineer; Kapil Bajaj | Engineering Manager. Overview: In this blog, we present a pragmatic way of integrating analytics, written in Python, with our distributed anomaly detection platform, written in Java. The approach here could be generalized to integrate processing done in one language/paradigm into a platform in another language/paradigm.

Algorithm 103

Sounds Like a Better Plan: USA Transportation Noise, Revised and Updated

ArcGIS

The Living Atlas of the World just updated the tiled, hosted image service featuring transportation noise, from the USDOT.

Analysis of the XLS-30 AMM Amendment

Ripple Engineering

RippleX has enabled its validator to vote in support of the XLS-30 amendment, introducing innovative AMM capabilities to the XRPL. We, at RippleX, place great emphasis on the strength that collaborative effort and shared responsibility bring to the enhancement and security of the XRPL. Today, we earnestly request the community's consideration of the XLS-30 amendment, a proposal poised to offer numerous advantages by bolstering liquidity, offering yield opportunities for liquidity pro

ChatGPT vs. BARD

KDnuggets

Large language models (LLMs) are transforming the way we process and produce information. But, before considering either one of these models as a one-stop-solution, one must consider their key differences.

Process 135

The Ultimate Guide To Data-Driven Construction: Optimize Projects, Reduce Risks, & Boost Innovation

Speaker: Donna Laquidara-Carr, PhD, LEED AP, Industry Insights Research Director at Dodge Construction Network

In today’s construction market, owners, construction managers, and contractors must navigate increasing challenges, from cost management to project delays. Fortunately, digital tools now offer valuable insights to help mitigate these risks. However, the sheer volume of tools and the complexity of leveraging their data effectively can be daunting. That’s where data-driven construction comes in.

LLM Training on Unity Catalog data with MosaicML Streaming Dataset

databricks

Introduction: Large Language Models (LLMs) have given us a way to generate text, extract information, and identify patterns in industries from healthcare to.

Datasets 103

Automating Reality Mapping: Accelerate Your Drone Workflows with ArcGIS Reality for ArcGIS Pro

ArcGIS

Streamline GIS workflows with ArcGIS Reality for ArcGIS Pro. Automate reality mapping, generate accurate geospatial products.

Startup Spotlight: Pave Seeks to Remove Barriers to Accessible Lending

Snowflake

Welcome to Snowflake’s Startup Spotlight, where we learn about amazing companies building their businesses on Snowflake. In this edition, Pave.dev President and Co-Founder Ema Rouf talks about breaking down barriers to accessible credit and financial lending, how running a startup is like climbing a mountain, and how building on Snowflake gives Pave the data sharing capabilities it needs to show financial institutions a better way to identify more creditworthy borrowers.

Gradient Descent: The Mountain Trekker’s Guide to Optimization with Mathematics

KDnuggets

Gradient descent is an optimization technique used to minimise errors in machine learning models. By iteratively adjusting parameters in the steepest direction of decrease, it seeks the lowest error value.
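
The update rule described above can be sketched in a few lines. This is a minimal one-dimensional example, assuming a fixed learning rate and a hand-written gradient; the function name `gradient_descent` and the test function f(x) = (x - 3)^2 are chosen for illustration only.

```python
from typing import Callable


def gradient_descent(
    grad: Callable[[float], float],
    start: float,
    learning_rate: float = 0.1,
    steps: int = 100,
) -> float:
    """Repeatedly step against the gradient, the direction of steepest decrease."""
    x = start
    for _ in range(steps):
        x -= learning_rate * grad(x)
    return x


# Illustrative error function f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
# Its lowest value is reached at x = 3.
minimum = gradient_descent(lambda x: 2 * (x - 3), start=0.0)
print(round(minimum, 6))
```

With a learning rate that is too large the iterates can overshoot and diverge, so the value 0.1 used here is a choice that happens to suit this particular function.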

Business Intelligence 101: How To Make The Best Solution Decision For Your Organization

Speaker: Evelyn Chou

Choosing the right business intelligence (BI) platform can feel like navigating a maze of features, promises, and technical jargon. With so many options available, how can you ensure you’re making the right decision for your organization’s unique needs? 🤔 This webinar brings together expert insights to break down the complexities of BI solution vetting.

How the Lakehouse can optimize provider networks and improve member care

databricks

Check out our Nearest Neighborhood Search Solution Accelerator to get started quickly. The Member Experience: An insured member typically experiences their healthcare in.

Tools for measuring Cloud Carbon Emissions by Darren Smith

Scott Logic

Introduction: In my previous blog post, I discussed how migrating to the Cloud could help your organisation reach its Net Zero goals. I discussed how shifting your workloads away from on-premises data centres can reduce emissions by allowing you to leverage the expertise of cloud providers and their greater efficiency of scale. It should be noted this isn’t always clear cut: do consider how energy efficient your current hosting is and the embodied carbon of any hardware you’d be decommissioning.

Cloud 87

Product-Led Growth: 6 Secrets for Success

Snowflake

Product-led growth (PLG) is a business model that emerged in the last decade with the enormous success of vendors like Slack and Datadog. Unlike traditional sales-led models, PLG models cut out the middlemen (sales reps, for example) and let customers just download and use the product without third-party onboarding. The relative novelty of the pricing model and its demonstrably successful application in growing these companies attracted a lot of attention.

How To Fine-Tune ChatGPT 3.5 Turbo

KDnuggets

This article has outlined how you can fine tune your GPT 3.5 Turbo models. You can do this by preparing your data, uploading your files, and then setting up a custom OpenAI session to handle the fine tuning.

Data 126

Driving Responsible Innovation: How to Navigate AI Governance & Data Privacy

Speaker: Aindra Misra, Senior Manager, Product Management (Data, ML, and Cloud Infrastructure) at BILL

Join us for an insightful webinar that explores the critical intersection of data privacy and AI governance. In today’s rapidly evolving tech landscape, building robust governance frameworks is essential to fostering innovation while staying compliant with regulations. Our expert speaker, Aindra Misra, will guide you through best practices for ensuring data protection while leveraging AI capabilities.