October, 2023

article thumbnail

Drag, Drop, Analyze: The Rise of No-Code Data Science

KDnuggets

No-code or low-code functionalities in data science have gained significant traction in recent years. These solutions are well-proven and matured, and they make data science more accessible to a wider range of people.

article thumbnail

Building a Streaming Data Pipeline with Redshift Serverless and Kinesis

Towards Data Science

An End-To-End Tutorial for Beginners Continue reading on Towards Data Science »

Insiders

Sign Up for our Newsletter

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

article thumbnail

Handling a Regional Outage: Comparing the Response From AWS, Azure and GCP

The Pragmatic Engineer

👋 Hi, this is Gergely with a bonus, free issue of the Pragmatic Engineer Newsletter. In every issue, I cover topics related to Big Tech and startups through the lens of engineering managers and senior engineers. In this article, we cover three out of seven topics from today’s subscriber-only issue Three Cloud Providers, Three Outages: Three Different Responses.

AWS 233
article thumbnail

dbt multi-project collaboration

Christophe Blefari

cross-project dependencies ( credits ) Over the last few years, dbt has become a de facto standard enabling companies to collaborate easily on data transformations. With dbt, you can apply software engineering practices to SQL development. Managing your SQL patrimony has never been easier. So, yes, dbt is cool but there is a common pattern with it: you accumulate SQL queries.

Project 264
article thumbnail

Apache Airflow® Best Practices for ETL and ELT Pipelines

Whether you’re creating complex dashboards or fine-tuning large language models, your data must be extracted, transformed, and loaded. ETL and ELT pipelines form the foundation of any data product, and Airflow is the open-source data orchestrator specifically designed for moving and transforming data in ETL and ELT pipelines. This eBook covers: An overview of ETL vs.

article thumbnail

Introduction of Microsoft Fabric

Analytics Vidhya

In today’s rapidly evolving digital landscape, seamless data, applications, and device integration are more pressing than ever. Enter Microsoft Fabric, a cutting-edge solution designed to revolutionize how we interact with technology. This article will explore the key features and benefits, identify the ideal users for this solution, and guide you on when and how to […] The post Introduction of Microsoft Fabric appeared first on Analytics Vidhya.

Designing 262
article thumbnail

How to use the BranchPythonOperator

Marc Lamberti

Are you looking for a way to choose one task or another? Do you want to execute a task based on a condition? Do you have multiple tasks, but only one should be executed if a criterion is valid? You’ve come to the right place! The BranchPythonOperator does precisely what you are looking for. It’s common to have DAGs with different execution flows, and you want to follow only one, depending on a value or a condition.

Python 246

More Trending

article thumbnail

Surveying The Market Of Database Products

Data Engineering Podcast

Summary Databases are the core of most applications, whether transactional or analytical. In recent years the selection of database products has exploded, making the critical decision of which engine(s) to use even more difficult. In this episode Tanya Bragin shares her experiences as a product manager for two major vendors and the lessons that she has learned about how teams should approach the process of tool selection.

Database 189
article thumbnail

Going from Developer to CEO: Chronosphere

The Pragmatic Engineer

👋 Hi, this is Gergely with a bonus, free issue of the Pragmatic Engineer Newsletter. In every issue, I cover topics related to Big Tech and startups through the lens of engineering managers and senior engineers. In this article, we cover three out of eight topics from today’s deepdive into tech scaleup Chronosphere. To get full issues twice a week, subscribe here.

article thumbnail

5 Free Books to Master Machine Learning

KDnuggets

Machine Learning is one of the most exciting fields in computer science today. In this article, we will take a look at the five best yet free books to learn machine learning in 2023.

article thumbnail

LLM Inference Performance Engineering: Best Practices

databricks

In this blog post, the MosaicML engineering team shares best practices for how to capitalize on popular open source large language models (LLMs).

article thumbnail

Apache Airflow®: The Ultimate Guide to DAG Writing

Speaker: Tamara Fingerlin, Developer Advocate

In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!

article thumbnail

Airflow Sensors: What you need to know

Marc Lamberti

Airflow Sensors are one of the most common tasks in data pipelines. Why? Because a Sensor waits for a condition to be true to complete. Do you need to wait for a file? Check if an SQL entry exists? Delay the execution of a DAG? That’s the few possibilities of the Airflow Sensors. If you want to make complex and robust data pipelines, you have to understand how Sensors work genuinely.

article thumbnail

The State of WebAssembly 2023 by Colin Eberhardt

Scott Logic

The State of WebAssembly 2023 survey has closed, the results are in … and they are fascinating! If you want the TL;DR; here are the highlights: Rust and JavaScript usage is continuing to increase, but some more notable changes are happening a little further down - with both Swift and Zig seeing a significant increase in adoption. When it comes to which languages developers ‘desire’, with Zig, Kotlin and C# we see that desirability exceeds current usage WebAssembly is still most often used for we

article thumbnail

AMM Performance Testing Report

Ripple Engineering

Overview In the rippled 1.12.0 release, the AMM amendment stands out as a significant feature in both size and scope. Since September 2022, the RippleX performance team has collaborated closely with the engineering team responsible for the AMM feature implementation. This report presents a thorough overview of our testing approach, findings, and key takeaways.

AWS 144
article thumbnail

Code Review on Printed Paper: an Excerpt from the Twitoons Comic Book

The Pragmatic Engineer

👋 Hi, this is Gergely with a bonus, free issue of the Pragmatic Engineer Newsletter. In every issue, I cover topics related to Big Tech and startups through the lens of engineering managers and senior engineers. In this article, we cover two out of seven topics from today’s full issue on The Man Behind the Big Tech Comics. To get full issues twice a week, subscribe here.

Coding 187
article thumbnail

Optimizing The Modern Developer Experience with Coder

Many software teams have migrated their testing and production workloads to the cloud, yet development environments often remain tied to outdated local setups, limiting efficiency and growth. This is where Coder comes in. In our 101 Coder webinar, you’ll explore how cloud-based development environments can unlock new levels of productivity. Discover how to transition from local setups to a secure, cloud-powered ecosystem with ease.

article thumbnail

7 Steps to Mastering Large Language Models (LLMs)

KDnuggets

Large Language Models (LLMs) have unlocked a new era in natural language processing. So why not learn more about them? Go from learning what large language models are to building and deploying LLM apps in 7 easy steps with this guide.

Building 146
article thumbnail

Snowflake To Acquire Ponder, Boosting Python Capabilities In the Data Cloud

Snowflake

Python’s popularity has more than doubled in the past decade¹ and it is quickly becoming the preferred language for development across machine learning, application development, pipelines, and more. One of our goals at Snowflake is to ensure we continue to deliver a best-in-class platform for Python developers. Snowflake customers are already harnessing the power of Python through Snowpark , a set of runtimes and libraries that securely deploy and process non-SQL code directly in Snowflake.

Python 141
article thumbnail

Automating dead code cleanup

Engineering at Meta

Meta’s Systematic Code and Asset Removal Framework (SCARF) has a subsystem for identifying and removing dead code. SCARF combines static and dynamic analysis of programs to detect dead code from both a business and programming language perspective. SCARF automatically creates change requests that delete the dead code identified from the program analysis, minimizing developer costs.

Coding 134
article thumbnail

How LinkedIn Is Using Embeddings to Up Its Match Game for Job Seekers

LinkedIn Engineering

Think of how many times a day you use some type of search functionality across your devices and applications to discover information, find a contact, or a new job opportunity. The truth is we all depend on the ability to search for things online, and finding the right match to the information, organization, or to a job that maps to your skills and interests makes all the difference in our experiences and the knowledge we can gain.

IT 133
article thumbnail

15 Modern Use Cases for Enterprise Business Intelligence

Large enterprises face unique challenges in optimizing their Business Intelligence (BI) output due to the sheer scale and complexity of their operations. Unlike smaller organizations, where basic BI features and simple dashboards might suffice, enterprises must manage vast amounts of data from diverse sources. What are the top modern BI use cases for enterprise businesses to help you get a leg up on the competition?

article thumbnail

Announcing Apache Flink 1.18

Confluent

Read updates and improvements in Apache Flink 1.18, including dynamic fine-grained rescaling via REST API, Java 17 support, and faster rescaling & batch performance improvements.

Java 125
article thumbnail

Announcing MLflow 2.8 LLM-as-a-judge metrics and Best Practices for LLM Evaluation of RAG Applications, Part 2

databricks

Today we're excited to announce MLflow 2.8 supports our LLM-as-a-judge metrics which can help save time and costs while providing an approximation of.

article thumbnail

7 Best Cloud Database Platforms

KDnuggets

Cloud databases have made it easier and cheaper to develop enterprise-level applications, offering flexibility, convenience, and standard database functionality. See what KDnuggets recommends.

Database 144
article thumbnail

How DoorDash Standardized and Improved Microservices Caching

DoorDash Engineering

As DoorDash’s microservices architecture has grown, so too has the volume of interservice traffic. Each team manages their own data and exposes access through gRPC services, an open-source remote procedure call framework used to build scalable APIs. Most business logic is I/O-bound because of calls to downstream services. Caching has long been a go-to strategy to improve performance and reduce costs.

Database 121
article thumbnail

Prepare Now: 2025s Must-Know Trends For Product And Data Leaders

Speaker: Jay Allardyce, Deepak Vittal, and Terrence Sheflin

As we look ahead to 2025, business intelligence and data analytics are set to play pivotal roles in shaping success. Organizations are already starting to face a host of transformative trends as the year comes to a close, including the integration of AI in data analytics, an increased emphasis on real-time data insights, and the growing importance of user experience in BI solutions.

article thumbnail

High resolution data updates to Living Atlas World Elevation Layers and Tools (October 2023)

ArcGIS

In October 2023, elevation layers have been updated with high-res datasets of France, New Zealand, USA, Italy along with global bathymetry.

Datasets 134
article thumbnail

Revolutionizing Real-Time Streaming Processing: 4 Trillion Events Daily at LinkedIn

LinkedIn Engineering

Authors: Bingfeng Xia and Xinyu Liu Background At LinkedIn, Apache Beam plays a pivotal role in stream processing infrastructures that process over 4 trillion events daily through more than 3,000 pipelines across multiple production data centers. This robust framework empowers near real-time data processing for critical services and platforms, ranging from machine learning and notifications to anti-abuse AI modeling.

Process 119
article thumbnail

nixtract 0.1.0

Tweag

Tweag is excited to announce the first release of nixtract 0.1.0 ! This is our first step towards a broader effort to make Nix the best tool to tackle tomorrow’s challenges of the Software Supply Chain. In order to understand why we need nixtract , let me tell you about the “secret” value of Nixpkgs. Is it a bird? A plane? It’s a graph! The Nix language allows you to define the “recipe” to build anything into a package, like the sources and the steps to make the package, but also the dependencie

Metadata 114
article thumbnail

Training LLMs at Scale with AMD MI250 GPUs

databricks

Introduction Four months ago, we shared how AMD had emerged as a capable platform for generative AI and demonstrated how to easily and.

article thumbnail

How to Drive Cost Savings, Efficiency Gains, and Sustainability Wins with MES

Speaker: Nikhil Joshi, Founder & President of Snic Solutions

Is your manufacturing operation reaching its efficiency potential? A Manufacturing Execution System (MES) could be the game-changer, helping you reduce waste, cut costs, and lower your carbon footprint. Join Nikhil Joshi, Founder & President of Snic Solutions, in this value-packed webinar as he breaks down how MES can drive operational excellence and sustainability.

article thumbnail

5 Free Books to Master Data Science

KDnuggets

Want to break into data science? Check this list of free books for learning Python, statistics, linear algebra, machine learning and deep learning.

article thumbnail

Automating product deprecation

Engineering at Meta

Systematic Code and Asset Removal Framework (SCARF) is Meta’s unused code and data deletion framework. SCARF guides engineers through deprecating a product safely and efficiently via an internal tool. SCARF combines this tooling with automation to reduce load on engineers. At Meta, we are constantly innovating and experimenting by building and shipping many different products, and those products comprise thousands of individual features.

Coding 116
article thumbnail

Prepare your data for the National Spatial Reference System modernization of 2022 in the U.S.

ArcGIS

The new U.S. datums of 2022 will soon be released. This article covers what is coming and how you should prepare your data.

Systems 141
article thumbnail

More Effectively Control and Limit Your Spend With Budgets

Snowflake

At Snowflake, we’re committed to helping customers effectively manage and optimize spend. To this effect, we’re excited to launch the public preview of Budgets on AWS today, which enables customers to set spending limits and receive notifications for Snowflake credit usage for either their entire Snowflake account or for a custom group of resources within an account.

Retail 113
article thumbnail

Improving the Accuracy of Generative AI Systems: A Structured Approach

Speaker: Anindo Banerjea, CTO at Civio & Tony Karrer, CTO at Aggregage

When developing a Gen AI application, one of the most significant challenges is improving accuracy. This can be especially difficult when working with a large data corpus, and as the complexity of the task increases. The number of use cases/corner cases that the system is expected to handle essentially explodes. 💥 Anindo Banerjea is here to showcase his significant experience building AI/ML SaaS applications as he walks us through the current problems his company, Civio, is solving.