Sat. Aug 12, 2023 - Fri. Aug 18, 2023

Don't sleep when you code. About sleep issue in KPL

Waitingforcode

Lessons learned on why it's always worth checking the code implementation to avoid surprises later, even for vendor-supported solutions.

Coding 130

Unpacking The Seven Principles Of Modern Data Pipelines

Data Engineering Podcast

Summary: Data pipelines are the core of every data product, ML model, and business intelligence dashboard. If you're not careful, you will end up spending all of your time on maintenance and fire-fighting. The folks at Rivery distilled the seven principles of modern data pipelines that will help you stay out of trouble and be productive with your data.

Best Practices to Use OpenAI GPT Model

KDnuggets

The improvement you need to get the best result from GPT.

Python 108

Internship experience with the Spatial Analyst Team at Esri in Summer 2023

ArcGIS

Summer internship experience with the Raster Analysis team at Esri: experience the world of GIS with Rakibul Ahasan.

98

A Guide to Debugging Apache Airflow® DAGs

In Airflow, DAGs (your data pipelines) support nearly every use case. As these workflows grow in complexity and scale, efficiently identifying and resolving issues becomes a critical skill for every data engineer. This is a comprehensive guide with best practices and examples for debugging Airflow DAGs. You’ll learn how to: Create a standardized process for debugging to quickly diagnose errors in your DAGs Identify common issues with DAGs, tasks, and connections Distinguish between Airflow-relate

Your Data’s (Finally) In The Cloud. Now, Stop Acting So On-Prem

Towards Data Science

The modern data stack allows you to do things differently, not just at a larger scale. Take advantage of it. Imagine you’ve been building houses with a hammer and nails for most of your career, and I gave you a nail gun. But instead of pressing it to the wood and pulling the trigger, you turn it sideways and hit the nail with the gun as if it were a hammer.

Cloud 98

Delta UniForm: a universal format for lakehouse interoperability

databricks

One of the key challenges that organizations face when adopting the open data lakehouse is selecting the optimal format for their data. Among.

Data 98

More Trending

Introducing Immortal Objects for Python

Engineering at Meta

Instagram has introduced Immortal Objects – PEP-683 – to Python. Now, objects can bypass reference count checks and live throughout the entire execution of the runtime, unlocking exciting avenues for true parallelism. At Meta, we use Python (Django) for our frontend server within Instagram. To handle parallelism, we rely on a multi-process architecture along with asyncio for per-process concurrency.

Python 94
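
The reference-count behavior described in the excerpt above can be observed with a small probe. This is a minimal sketch, not code from the Meta article: the helper name `refcount_delta` is hypothetical, and it assumes CPython, where `sys.getrefcount` is available. On CPython 3.12 or later, where PEP 683 was implemented, the delta for an immortal object such as `None` is expected to be 0; on earlier versions it grows like any other object.

```python
import sys


def refcount_delta(obj, n=1000):
    """Return how much obj's reference count grows after n extra references.

    Hypothetical helper for illustration only.
    """
    before = sys.getrefcount(obj)
    holders = [obj] * n  # the list stores n new references to obj
    after = sys.getrefcount(obj)
    del holders  # drop the extra references again
    return after - before


# An ordinary object is counted: each new reference bumps the count by one.
ordinary = object()
print("ordinary object delta:", refcount_delta(ordinary))

# None is immortal under PEP 683, so its count should not move on 3.12+.
# The result is version dependent, so it is printed rather than asserted.
print("None delta:", refcount_delta(None))
```

Because an immortal object's count field is never written, memory pages holding such objects stay untouched, which is the property the excerpt ties to sharing data across processes.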

A Simple (Yet Effective) Approach to Implementing Unit Tests for dbt Models

Towards Data Science

Unit testing dbt models has always been one of the most critical missing pieces of the dbt ecosystem.

Delta Live Tables Now Generally Available on Google Cloud

databricks

Today we are announcing the general availability of Delta Live Tables (DLT) on Google Cloud. DLT pipelines empower data engineers to build reliable.

LangChain + Streamlit + Llama: Bringing Conversational AI to Your Local Machine

KDnuggets

Integrating Open Source LLMs and LangChain for Free Generative Question Answering (No API Key required).

108

Mastering Apache Airflow® 3.0: What’s New (and What’s Next) for Data Orchestration

Speaker: Tamara Fingerlin, Developer Advocate

Apache Airflow® 3.0, the most anticipated Airflow release yet, officially launched this April. As the de facto standard for data orchestration, Airflow is trusted by over 77,000 organizations to power everything from advanced analytics to production AI and MLOps. With the 3.0 release, the top-requested features from the community were delivered, including a revamped UI for easier navigation, stronger security, and greater flexibility to run tasks anywhere at any time.

15 Best AXELOS Certifications That Pay Well in 2023

Knowledge Hut

Staying current with rapidly advancing technology holds significant importance. Obtaining the latest certifications can enhance your professional standing by providing you with sought-after skills, thereby increasing your attractiveness to potential employers. It's noteworthy that AXELOS is a renowned authority in awarding certifications across a diverse spectrum of IT domains.

On-Premise vs Cloud: Where Does the Future of Data Storage Lie?

Monte Carlo

Imagine you’ve been building houses with a hammer and nails for most of your career, and I gave you a nail gun. But instead of pressing it to the wood and pulling the trigger, you turn it sideways and hit the nail with the gun as if it were a hammer. You would probably think it’s expensive and not overly effective, while the site’s inspector is going to rightly view it as a safety hazard.

Modular Orchestration with Databricks Workflows

databricks

Thousands of Databricks customers use Databricks Workflows every day to orchestrate business critical workloads on the Databricks Lakehouse Platform. As is often the.

98

How to Build a Real-Time Recommendation Engine Using Graph Databases

KDnuggets

"You may also like" is a simple phrase that implies a new era in the way businesses interact and connect with their customers, and graph databases can easily help to build recommendation engines.

Database 108
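
The "You may also like" idea in the excerpt above can be sketched as a walk over a small user-item graph. This is a minimal in-memory illustration, not the KDnuggets article's method and not a graph database query: the function name `recommend`, the `purchases` mapping, and the overlap-based scoring are all assumptions made for the example.

```python
from collections import Counter


def recommend(purchases, user, k=3):
    """Suggest up to k items bought by users who share purchases with `user`.

    purchases: dict mapping a user name to the set of items they bought.
    Each candidate item is scored by how many items its buyer has in
    common with `user` (a hypothetical, deliberately simple weighting).
    """
    mine = purchases.get(user, set())
    scores = Counter()
    for other, items in purchases.items():
        if other == user:
            continue
        overlap = len(mine & items)  # shared neighbours in the graph
        if overlap == 0:
            continue  # no path user -> item -> other
        for item in items - mine:  # only items the user does not own yet
            scores[item] += overlap
    # Highest score first; ties broken alphabetically for a stable order.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:k]]


purchases = {
    "ann": {"book", "lamp"},
    "bob": {"book", "lamp", "desk"},
    "cy": {"book", "pen"},
    "dee": {"mug"},
}
print(recommend(purchases, "ann"))  # → ['desk', 'pen']
```

A graph database would express the same two-hop traversal as a query over stored nodes and edges instead of a Python loop.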

Agent Tooling: Connecting AI to Your Tools, Systems & Data

Speaker: Alex Salazar, CEO & Co-Founder @ Arcade | Nate Barbettini, Founding Engineer @ Arcade | Tony Karrer, Founder & CTO @ Aggregage

There’s a lot of noise surrounding the ability of AI agents to connect to your tools, systems and data. But building an AI application into a reliable, secure workflow agent isn’t as simple as plugging in an API. As an engineering leader, it can be challenging to make sense of this evolving landscape, but agent tooling provides such high value that it’s critical we figure out how to move forward.

Where’s My Data - A Unique Encounter with Flink Streaming’s Kinesis Connector

Lyft Engineering

For years now, Lyft has not only been a proponent of but also a contributor to Apache Flink. Lyft’s pipelines have evolved drastically over the years, yet, time and time again, we run into unique cases that stretch Flink to its breaking points. This is one of those times.

Data 85

BPFAgent: eBPF for Monitoring at DoorDash

DoorDash Engineering

As DoorDash experienced rapid growth over the last few years, we began to see the limits of our traditional methods of monitoring. Metrics, logs, and traces provide vital information about our service ecosystem. But these signals almost entirely rely on application-level instrumentation, which can leave gaps or conflicting semantics across different systems.

Bytes 84

How ActionIQ Integrates with the Databricks Lakehouse Part One: Enable Personalization Without Data Replication

databricks

The Personalization Paradigm: Balancing Business Self-Service and Data Governance Personalization transforms businesses, shaping and reshaping the way brands connect with their audiences. Its.

How to Use ChatGPT to Convert Text into a PowerPoint Presentation

KDnuggets

A speedy way to convert a long text to a short PowerPoint Presentation using only ChatGPT.

108

How to Modernize Manufacturing Without Losing Control

Speaker: Andrew Skoog, Founder of MachinistX & President of Hexis Representatives

Manufacturing is evolving, and the right technology can empower—not replace—your workforce. Smart automation and AI-driven software are revolutionizing decision-making, optimizing processes, and improving efficiency. But how do you implement these tools with confidence and ensure they complement human expertise rather than override it? Join industry expert Andrew Skoog as he explores how manufacturers can leverage automation to enhance operations, streamline workflows, and make smarter, data-dri

How to Ensure Supply Chain Security for AI Applications

Cloudera

Machine Learning (ML) is at the heart of the boom in AI Applications, revolutionizing various domains. From powering intelligent Large Language Model (LLM) based chatbots like ChatGPT and Bard, to enabling text-to-AI image generators like Stable Diffusion, ML continues to drive innovation. Its transformative impact advances multiple fields from genetics to medicine to finance.

Curbing Connection Churn in Zuul

Netflix Tech

By Arthur Gonigberg and Argha C. Plaintext Past: When Zuul was designed and developed, there was an inherent assumption that connections were effectively free, given we weren’t using mutual TLS (mTLS). It’s built on top of Netty, using event loops for non-blocking execution of requests, one loop per core. To reduce contention among event loops, we created connection pools for each, keeping them completely independent.

Your Data’s (Finally) In The Cloud. Now, Stop Acting So On-Prem

Monte Carlo

Imagine you’ve been building houses with a hammer and nails for most of your career, and I gave you a nail gun. But instead of pressing it to the wood and pulling the trigger, you turn it sideways and hit the nail with the gun as if it were a hammer. You would probably think it’s expensive and not overly effective, while the site’s inspector is going to rightly view it as a safety hazard.

Cloud 69

Learn Data Cleaning and Preprocessing for Data Science with This Free eBook

KDnuggets

In this free ebook, readers will learn how to employ data cleaning and preprocessing for data science using the Python ecosystem.

The Ultimate Guide to Apache Airflow DAGs

With Airflow being the open-source standard for workflow orchestration, knowing how to write Airflow DAGs has become an essential skill for every data engineer. This eBook provides a comprehensive overview of DAG writing features with plenty of example code. You’ll learn how to: Understand the building blocks of DAGs, combine them in complex pipelines, and schedule your DAG to run exactly when you want it to Write DAGs that adapt to your data at runtime and set up alerts and notifications Scale you

New Accreditations for Cloudera Partners

Cloudera

Remember when we announced our redesigned partner program Cloudera Partner Network (CPN) last year? Our goal was to create a more competency-based approach and more comprehensive tools and support to help partners guide their customers adopting modern data strategies based on the Cloudera hybrid data platform. In addition, CPN helps our partners go to market faster, and provides industry-leading incentives and promotions aligned with partner business and sales models.

AVA Discovery View: Surfacing Authentic Moments

Netflix Tech

By: Hamid Shahid, Laura Johnson, Tiffany Low. Synopsis: At Netflix, we have created millions of artworks to represent our titles. Each artwork tells a story about the title it represents. From our testing on promotional assets, we know which of these assets have performed well and which ones haven’t. Through this, our teams have developed an intuition of what visual and thematic artwork characteristics work well for what genres of titles.

Media 71

How to Simplify Data Pipelines with DBT and Airflow?

Workfall

In today’s data-driven world, efficient data pipelines have become the backbone of successful organizations. These pipelines ensure that data flows smoothly from various sources to its intended destinations, enabling businesses to make informed decisions and gain valuable insights. Two powerful tools that have emerged to simplify the management of data pipelines are DBT (Data Build Tool) and Airflow.

KDnuggets News, August 16: Use ChatGPT to Convert Text into a PowerPoint Presentation • Best Python Tools for Building Generative AI Applications Cheat Sheet

KDnuggets

How to Use ChatGPT to Convert Text into a PowerPoint Presentation • Best Python Tools for Building Generative AI Applications Cheat Sheet • Data Scientists Need to Specialize to Survive the Tech Winter • Python Vector Databases and Vector Indexes: Architecting LLM Apps • How To Speed Up SQL Queries Using Indexes [Python Edition]

Python 108

Apache Airflow® Best Practices: DAG Writing

Speaker: Tamara Fingerlin, Developer Advocate

In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!

Why You Shouldn’t Use Notebooks for Production Data Pipelines

Ascend.io

Jupyter Notebooks have fundamentally revolutionized how data scientists approach their tasks. They offer an unparalleled environment for experimentation and visualization. Yet, there’s an interest in putting notebooks directly into production environments. While it’s great to take the ideas from notebooks and use them in real-world settings, trying to put the entire notebook directly into production as a code artifact can cause problems.

Best AWS Certifications For Cloud Professionals in 2023

Knowledge Hut

In my early career, I knew that getting certified in AWS would be essential for success. Now that I have obtained multiple AWS certifications, I can vouch for their value to professionals & companies alike. With cloud computing becoming the new norm in today's marketplace, AWS certifications are nothing short of essential. From AWS Certified Solutions Architect to AWS Certified DevOps Engineer, there are many different paths to choose from as per your career goals & skill set.

AWS 52

How Alex Bank built a real-time banking experience with Confluent

Confluent

Learn how Australia’s Alex Bank leveraged real-time streaming to create a data-driven, customer-focused banking experience.

Banking 52

Python Vector Databases and Vector Indexes: Architecting LLM Apps

KDnuggets

Vector databases enable fast similarity search and scale across data points. For LLM apps, vector indexes can simplify architecture over full vector databases by attaching vectors to existing storage. Choosing indexes vs databases depends on specialized needs, existing infrastructure, and broader enterprise requirements.

Database 108
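
The similarity search mentioned in the excerpt above can be illustrated with a brute-force version. This is a minimal sketch under stated assumptions: the names `cosine_similarity` and `top_k` and the tiny in-memory `index` are hypothetical, and real vector databases and vector indexes typically use approximate nearest-neighbor structures instead of scanning every vector.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query, index, k=2):
    """Return the keys of the k stored vectors most similar to `query`.

    index: dict mapping a document key to its embedding vector.
    This scans every entry, so the cost grows linearly with the index size.
    """
    scored = [(cosine_similarity(query, vec), key) for key, vec in index.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [key for _, key in scored[:k]]


index = {
    "a": [1.0, 0.0],
    "b": [0.0, 1.0],
    "c": [0.9, 0.1],
}
print(top_k([1.0, 0.0], index, k=2))  # → ['a', 'c']
```

Attaching vectors to rows in existing storage, as the excerpt describes for vector indexes, keeps this lookup next to the data it ranks, while a dedicated vector database takes over storage as well.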

How to Achieve High-Accuracy Results When Using LLMs

Speaker: Ben Epstein, Stealth Founder & CTO | Tony Karrer, Founder & CTO, Aggregage

When tasked with building a fundamentally new product line with deeper insights than previously achievable for a high-value client, Ben Epstein and his team faced a significant challenge: how to harness LLMs to produce consistent, high-accuracy outputs at scale. In this new session, Ben will share how he and his team engineered a system (based on proven software engineering approaches) that employs reproducible test variations (via temperature 0 and fixed seeds), and enables non-LLM evaluation m