Sat. Mar 11, 2023 - Fri. Mar 17, 2023

How to Build an On-Call Culture in a Data Engineering Team

Towards Data Science

Systematically resolve data issues in production.

Amazon doubling down on return to office

The Pragmatic Engineer

Top 5 SQL Interview Questions With Implementation

Analytics Vidhya

Introduction: Technology has advanced tremendously, and many people now use the internet, which generates a large amount of data every day. This data is stored and maintained in databases. SQL is a structured query language used to read and write these databases. In simple words, SQL is used […] The post Top 5 SQL Interview Questions With Implementation appeared first on Analytics Vidhya.
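
To make "read and write" concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The `visits` table, its columns, and the sample rows are hypothetical and chosen only for illustration; they do not come from the linked article.

```python
import sqlite3

# In-memory database, so the sketch leaves nothing on disk.
conn = sqlite3.connect(":memory:")

# Hypothetical table: one row per site visit.
conn.execute("CREATE TABLE visits (visitor TEXT, pages INTEGER)")

# Write: insert a few rows with a parameterized statement.
conn.executemany(
    "INSERT INTO visits (visitor, pages) VALUES (?, ?)",
    [("ana", 3), ("ben", 5), ("ana", 2)],
)

# Read: total pages per visitor, in a stable order.
rows = conn.execute(
    "SELECT visitor, SUM(pages) FROM visits GROUP BY visitor ORDER BY visitor"
).fetchall()
print(rows)

conn.close()
```

The `INSERT` statement is the write path and the `SELECT ... GROUP BY` is the read path; the same statements would work against most SQL databases with only minor dialect changes.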

SQL 204

Top Machine Learning Papers to Read in 2023

KDnuggets

These curated papers will step up your machine-learning knowledge.

A Guide to Debugging Apache Airflow® DAGs

In Airflow, DAGs (your data pipelines) support nearly every use case. As these workflows grow in complexity and scale, efficiently identifying and resolving issues becomes a critical skill for every data engineer. This is a comprehensive guide with best practices and examples for debugging Airflow DAGs. You’ll learn how to: Create a standardized process for debugging to quickly diagnose errors in your DAGs; Identify common issues with DAGs, tasks, and connections; Distinguish between Airflow-relate

Announcing FawltyDeps - a dependency checker for your Python code

Tweag

It is a truth universally acknowledged that the Python packaging ecosystem is in need of a good dependency checker. In the least, it’s our hope to convince you that Tweag’s new dependency checker, FawltyDeps, can help you maintain an environment that is minimal and reproducible for your Python project, by ensuring that required dependencies are explicitly declared and detecting unused dependencies.
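
The two checks described above (imports that are not declared, and declared dependencies that are never imported) can be illustrated with a short sketch. This is not FawltyDeps' implementation or API; the function names are hypothetical, and the sketch assumes that an import name equals its package name and ignores standard-library modules, which a real tool cannot do.

```python
import ast


def imported_modules(source: str) -> set[str]:
    """Collect top-level module names imported by a piece of Python source."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # level == 0 skips relative imports such as "from . import x"
            names.add(node.module.split(".")[0])
    return names


def check_dependencies(source: str, declared: set[str]) -> tuple[set[str], set[str]]:
    """Return (undeclared, unused) relative to the declared dependency set."""
    used = imported_modules(source)
    return used - declared, declared - used


# Hypothetical project: imports numpy and pandas, declares numpy and requests.
example_source = "import numpy as np\nfrom pandas import DataFrame\n"
undeclared, unused = check_dependencies(example_source, {"numpy", "requests"})
print(sorted(undeclared), sorted(unused))
```

Here `pandas` is reported as undeclared and `requests` as unused, which are the two failure modes the teaser mentions.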

Python 145

The Collapse of Silicon Valley Bank

The Pragmatic Engineer

It’s been a wild weekend, starting Friday. In case you somehow missed it: we went through the fastest bank run in history, in an event that impacted about half of all VC-funded startups in the US and UK. On Friday night, Silicon Valley Bank (SVB) was shut down by regulators, triggering a weekend of fear and uncertainty for many people and businesses with questions like: “can we make payroll next week?

Banking 243

More Trending

GPT-4: Everything You Need To Know

KDnuggets

A new model by OpenAI with improved natural language generation and understanding capabilities.

Process 160

Snowflake Connector for ServiceNow Available in Public Preview

Snowflake

ServiceNow, Inc. offers a well-known SaaS application, with companies in multiple industries using it to help manage digital workloads for a variety of departments and operations. What if it was as easy as just a few clicks to get ServiceNow data directly into your Snowflake account so you could combine it with other data sources, including ERPs, HRs, and CRMs?

Introduction to Apache Spark History

Waitingforcode

If you need to go back in time and analyze your past Apache Spark applications, you can use the native Apache Spark History server. However, it can also be an infrastructure problem because of the continuously increasing historical logs for streaming applications. In this blog post we'll try to understand this component and to see different configuration options.

IT 130

Data News — Week 23.11

Christophe Blefari

Took a few days with the ☀️ (credits). Hey you, I hope you had a great week. On my side I'm slowly starting to get on top of the things I had in the queue. But, sadly, I work in LIFO, so I feel that I'm never done. For people who are not used to it, it means last in, first out. This means that I get easily distracted by a notification, or even a thought, and do something that I did not plan to do at first.
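
For readers who want to see LIFO in code, here is a minimal sketch using a plain Python list as a stack. The task names are made up for illustration.

```python
# A plain list used as a stack: append pushes, pop removes the newest item.
todo = []
todo.append("write newsletter")    # planned first
todo.append("answer email")
todo.append("check notification")  # arrived last

# Last in, first out: the most recent item is handled first.
done_first = todo.pop()
print(done_first)
print(todo)
```

The notification that arrived last is handled first, while the newsletter that was planned first stays at the bottom of the stack.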

Data 130

Mastering Apache Airflow® 3.0: What’s New (and What’s Next) for Data Orchestration

Speaker: Tamara Fingerlin, Developer Advocate

Apache Airflow® 3.0, the most anticipated Airflow release yet, officially launched this April. As the de facto standard for data orchestration, Airflow is trusted by over 77,000 organizations to power everything from advanced analytics to production AI and MLOps. With the 3.0 release, the top-requested features from the community were delivered, including a revamped UI for easier navigation, stronger security, and greater flexibility to run tasks anywhere at any time.

5 More Command Line Tools for Data Science

KDnuggets

Use these tools to access APIs, manipulate CSV files, download datasets, and more from your terminal.

How To Scale Your Data Team’s Impact Without Scaling Costs 

Seattle Data Guy

As you increase your analytical processes and abilities, you’ll unavoidably increase costs. But there are definite ways to avoid having your costs grow at an unsustainable rate. This is the topic of a panel at the Modern Data Stack Conference featuring Maura Church, ex-director of data science and data engineering from Patreon.…

5 git Commands your Grandma uses.

Confessions of a Data Guy

The post 5 git Commands your Grandma uses. appeared first on Confessions of a Data Guy.

Data 130

Data News — Week 23.10

Christophe Blefari

Sorting all the eggs of the landscape (credits). Dear readers, this week Data News lands on Saturday and will be a little bit different than usual because I found fewer relevant articles and, as promised last week, I wanted to speak about the MAD Landscape. I hope you will enjoy this topic-focused edition where I speak about economics even if I'm a newbie in the field.

Banking 130

Agent Tooling: Connecting AI to Your Tools, Systems & Data

Speaker: Alex Salazar, CEO & Co-Founder @ Arcade | Nate Barbettini, Founding Engineer @ Arcade | Tony Karrer, Founder & CTO @ Aggregage

There’s a lot of noise surrounding the ability of AI agents to connect to your tools, systems and data. But building an AI application into a reliable, secure workflow agent isn’t as simple as plugging in an API. As an engineering leader, it can be challenging to make sense of this evolving landscape, but agent tooling provides such high value that it’s critical we figure out how to move forward.

Multi-label NLP: An Analysis of Class Imbalance and Loss Function Approaches

KDnuggets

In this comprehensive article, we have demonstrated that a seemingly simple task of multi-label text classification can be challenging when traditional methods are applied. We have proposed the use of distribution-balancing loss functions to tackle the issue of class imbalance.

Process 136

Building a Media Understanding Platform for ML Innovations

Netflix Tech

By Guru Tahasildar, Amir Ziai, Jonathan Solórzano-Hamilton, Kelli Griggs, Vi Iyengar. Introduction: Netflix leverages machine learning to create the best media for our members. Earlier we shared the details of one of these algorithms, introduced how our platform team is evolving the media-specific machine learning ecosystem, and discussed how data from these algorithms gets stored in our annotation service.

Media 120

Get started with new role-based onboarding trainings for Databricks Lakehouse Platform

databricks

The demand for data, analytics, and AI talent continues to grow as organizations in every industry adopt new technologies to become more efficient.

Explore the New Calculate Composite Index Tool in ArcGIS Pro 3.1

ArcGIS

Create indices measuring risk, equity, vulnerability, and more, using the new Calculate Composite Index tool.

109

How to Modernize Manufacturing Without Losing Control

Speaker: Andrew Skoog, Founder of MachinistX & President of Hexis Representatives

Manufacturing is evolving, and the right technology can empower—not replace—your workforce. Smart automation and AI-driven software are revolutionizing decision-making, optimizing processes, and improving efficiency. But how do you implement these tools with confidence and ensure they complement human expertise rather than override it? Join industry expert Andrew Skoog as he explores how manufacturers can leverage automation to enhance operations, streamline workflows, and make smarter, data-dri

What Are The Downsides of AI Advancement?

KDnuggets

While AI certainly has several positive uses to offer the world, it is also causing harm in areas such as academics, cybersecurity, the environment, jobs, and privacy.

IT 132

Setting Uber’s Transactional Data Lake in Motion with Incremental ETL Using Apache Hudi

Uber Engineering

Uber’s Global Data Warehouse team leveraged Apache Hudi to drastically improve performance of traditional batch ETL pipelines by going incremental, improving business-critical data’s freshness, quality, and completeness.

Production-Ready and Resilient Disaster Recovery for DLT Pipelines

databricks

Disaster recovery is a standard requirement for many production systems, especially in the regulated industries. As many companies rely on data to make.

Systems 98

MLOps-Tips and Tricks-75 Code Snippets

Towards Data Science

MLOps and Data Engineering.

Coding 98

The Ultimate Guide to Apache Airflow DAGS

With Airflow being the open-source standard for workflow orchestration, knowing how to write Airflow DAGs has become an essential skill for every data engineer. This eBook provides a comprehensive overview of DAG writing features with plenty of example code. You’ll learn how to: Understand the building blocks of DAGs, combine them in complex pipelines, and schedule your DAG to run exactly when you want it to; Write DAGs that adapt to your data at runtime and set up alerts and notifications; Scale you

NoSQL Databases and Their Use Cases

KDnuggets

Learn about NoSQL Databases and their types like key-value, document, graph and column family with their use cases.

NoSQL 126

Introducing ArcGIS Reality!

ArcGIS

Learn more about the just released ArcGIS Reality Studio application and ArcGIS Reality for ArcGIS Pro extension.

98

Building the Lakehouse for Healthcare and Life Sciences - Processing DICOM images at scale with ease

databricks

One of the biggest challenges in understanding patient health status and disease progression is unlocking insights from the vast amounts of semi-structured and.

Concurrently Train Multiple Time Series Models Over Spark with XGBoost

Towards Data Science

Take advantage of the distributive power of Apache Spark and concurrently train thousands of auto-regressive time-series models on big data. Suppose you have a large dataset consisting of your customers’ hourly transactions, and you were tasked with helping your company forecast and identify anomalies in their transaction patterns.

Apache Airflow® Best Practices: DAG Writing

Speaker: Tamara Fingerlin, Developer Advocate

In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!

OpenChatKit: Open-Source ChatGPT Alternative

KDnuggets

OpenChatKit enables developers to fine-tune the model, maintain context in dialog, moderate responses, and effortlessly build their own custom chatbot applications.

Building 116

What’s new with the ArcGIS Utility Network at ArcGIS Pro 3.1

ArcGIS

Learn more about exciting new functionality and improvements made to ArcGIS Utility Network with ArcGIS Pro 3.

Real-Time Insights: The Top Three Reasons Why Customers Love Data Streaming with Databricks

databricks

The world operates in real time. The ability to make real-time decisions in today's fast-paced world is more critical than ever before. Today's.

Data 98

How Will Artificial Intelligence Help Good Managers Become Great?

U-Next

Introduction – Adaptation and Evolution of AI in Management. Several businesses use Machine Learning and Artificial Intelligence in management. The most significant AI tools are based on a vast amount of data, recognizing patterns, learning from them, and making definitive predictions. AI is becoming popular in project management because of its exceptional capacity to track particular trends and predict project situations and results.

How to Achieve High-Accuracy Results When Using LLMs

Speaker: Ben Epstein, Stealth Founder & CTO | Tony Karrer, Founder & CTO, Aggregage

When tasked with building a fundamentally new product line with deeper insights than previously achievable for a high-value client, Ben Epstein and his team faced a significant challenge: how to harness LLMs to produce consistent, high-accuracy outputs at scale. In this new session, Ben will share how he and his team engineered a system (based on proven software engineering approaches) that employs reproducible test variations (via temperature 0 and fixed seeds), and enables non-LLM evaluation m