Sat. Jan 21, 2023 - Fri. Jan 27, 2023

Apple: The only big tech giant going against the job cuts tide

The Pragmatic Engineer

Comments

357

The ChatGPT Cheat Sheet

KDnuggets

Impress your friends and loved ones by perfecting your ChatGPT prompt engineering game with this incredibly useful resource.

Building a Life Sciences Knowledge Graph with a Data Lake

databricks

This is a collaborative post from Databricks and wisecube.ai. We thank Vishnu Vettrivel, Founder, and Alex Thomas, Principal Data Scientist, for their contributions.

Data Lake 137

Watch Meta’s engineers discuss optimizing large-scale networks

Engineering at Meta

Managing network solutions amidst a growing scale inherently brings challenges around performance, deployment, and operational complexities. At Meta, we’ve found that these challenges broadly fall into three themes: 1.) Data center networking: Over the past decade, on the physical front, we have seen a rise in vendor-specific hardware that comes with heterogeneous feature and architecture sets (e.g., non-blocking architecture).

A Guide to Debugging Apache Airflow® DAGs

In Airflow, DAGs (your data pipelines) support nearly every use case. As these workflows grow in complexity and scale, efficiently identifying and resolving issues becomes a critical skill for every data engineer. This is a comprehensive guide to debugging Airflow DAGs, with best practices and examples. You’ll learn how to: create a standardized process for debugging to quickly diagnose errors in your DAGs; identify common issues with DAGs, tasks, and connections; distinguish between Airflow-relate

Data News — Week 23.04

Christophe Blefari

Dear Data News readers, it's a joy to write this newsletter every week, and we are slowly approaching its second birthday. To celebrate this together, I'd love to receive your stories about data; they can be short or long, anonymous or not. This is an open box: just write to me with what you have on your mind and I'll bundle an edition with it.

Data 130

5 Ways to Deal with the Lack of Data in Machine Learning

KDnuggets

Effective solutions exist when you don't have enough data for your models. While there is no perfect approach, five proven ways will get your model to production.

More Trending

Safely Test Your Applications And Analytics With Production Quality Data Using Tonic AI

Data Engineering Podcast

The most interesting and challenging bugs always happen in production, but recreating them is a constant challenge due to differences in the data that you are working with. Building your own scripts to replicate data from production is time-consuming and error-prone. Tonic is a platform designed to solve the problem of having reliable, production-like data available for developing and testing your software, analytics, and machine learning projects.

Scalable Annotation Service - Marken

Netflix Tech

By Varun Sekhri and Meenakshi Jindal. At Netflix, we have hundreds of microservices, each with its own data models or entities. For example, we have a service that stores a movie entity’s metadata or a service that stores metadata about images. All of these services at a later point want to annotate their objects or entities.

Algorithm 117

5 Free Data Science Books You Must Read in 2023

KDnuggets

Get your hands on these gems to learn Python, data analytics, machine learning, and deep learning.

Tulip: Modernizing Meta’s data platform

Engineering at Meta

The technical journey discusses the motivations, challenges, and technical solutions employed for warehouse schematization, especially a change to the wire serialization format employed in Meta’s data platform for data interchange related to Warehouse Analytics Logging. Here, we discuss the engineering, scaling, and nontechnical challenges of modernizing Meta’s exabyte-scale data platform by migrating to the new Tulip format.

Bytes 112

Mastering Apache Airflow® 3.0: What’s New (and What’s Next) for Data Orchestration

Speaker: Tamara Fingerlin, Developer Advocate

Apache Airflow® 3.0, the most anticipated Airflow release yet, officially launched this April. As the de facto standard for data orchestration, Airflow is trusted by over 77,000 organizations to power everything from advanced analytics to production AI and MLOps. With the 3.0 release, the top-requested features from the community were delivered, including a revamped UI for easier navigation, stronger security, and greater flexibility to run tasks anywhere at any time.

Customer Engagement Trends for 2023

Precisely

In today’s hypercompetitive business environment, companies must deliver a standout experience for their target audience. Companies that excel at customer experience (CX) are better at building brand loyalty, increasing total customer lifetime value, and turning occasional customers into brand evangelists. This compelling drive for outstanding CX coincides with an intensive shift toward digitization, personalization, and omnichannel alignment.

Containerizing the Beast – Hadoop NameNodes in Uber’s Infrastructure

Uber Engineering

We recently containerized Hadoop NameNodes and upgraded hardware, improving NameNode RPC queue time from ~200 ms to ~20 ms, a 10x improvement! With this radical change, Uber’s Hadoop customers are happier and admins rest more at night.

Hadoop 104

From Data Collection to Model Deployment: 6 Stages of a Data Science Project

KDnuggets

Here are the 6 stages of a novel data science project, from data collection to model in production, backed by research and examples.

Why Column-Aware Metadata Is Key to Automating Data Transformations

Snowflake

Data, data, data. It does seem we are not only surrounded by talk about data, but by the actual data itself. We are collecting data from every nook and cranny of the universe (literally!). IoT devices in every industry; geolocation information on our phones, watches, cars, and every other mobile device; every website or app we access—all are collecting data.

Metadata 103

Agent Tooling: Connecting AI to Your Tools, Systems & Data

Speaker: Alex Salazar, CEO & Co-Founder @ Arcade | Nate Barbettini, Founding Engineer @ Arcade | Tony Karrer, Founder & CTO @ Aggregage

There’s a lot of noise surrounding the ability of AI agents to connect to your tools, systems and data. But building an AI application into a reliable, secure workflow agent isn’t as simple as plugging in an API. As an engineering leader, it can be challenging to make sense of this evolving landscape, but agent tooling provides such high value that it’s critical we figure out how to move forward.

Introduction to Synthetic Aperture Radar

ArcGIS

This blog will answer questions such as “What is SAR?”, “What can SAR be used for?”, and “How is SAR beneficial?”.

A Gousto use case: how Databricks helps create personalized recipe recommendations for customers at scale

databricks

“This blog is authored by Hai Nguyen, Senior Data Scientist at Gousto” Gousto is the UK's best value recipe box, serving up more rec.

Data 98

7 Best Libraries for Machine Learning Explained

KDnuggets

Learn about machine learning libraries for building and deploying machine learning models.

4 Useful BigQuery SQL Functions You May Not Know

Towards Data Science

And how to use them.

SQL 98

How to Modernize Manufacturing Without Losing Control

Speaker: Andrew Skoog, Founder of MachinistX & President of Hexis Representatives

Manufacturing is evolving, and the right technology can empower—not replace—your workforce. Smart automation and AI-driven software are revolutionizing decision-making, optimizing processes, and improving efficiency. But how do you implement these tools with confidence and ensure they complement human expertise rather than override it? Join industry expert Andrew Skoog as he explores how manufacturers can leverage automation to enhance operations, streamline workflows, and make smarter, data-dri

Understanding and managing ArcGIS Online credits

ArcGIS

ArcGIS Online users and administrators - learn best practices for managing ArcGIS Online credits and get answers to frequently asked questions.

Work With Large Monorepos With Sparse Checkout Support in Databricks Repos

databricks

For your data-centered workloads, Databricks offers the best-in-class development experience and gives you the tools you need to adhere to code development best.

Coding 98

7 SMOTE Variations for Oversampling

KDnuggets

The best oversampling techniques for imbalanced data.

Enforcing Device AuthN & Compliance at Pinterest

Pinterest Engineering

Armen Tashjian | Security Engineer, Corporate Security. Pinterest has enforced the use of managed and compliant devices in our Okta authentication flow, using a passwordless implementation, so that access to our tools always requires a healthy Pinterest device. Following the phishing-based attacks against our peers in the tech industry, Pinterest decided to take a two-pronged approach to defend against similar attacks.

The Ultimate Guide to Apache Airflow DAGs

With Airflow being the open-source standard for workflow orchestration, knowing how to write Airflow DAGs has become an essential skill for every data engineer. This eBook provides a comprehensive overview of DAG writing features with plenty of example code. You’ll learn how to: understand the building blocks of DAGs, combine them in complex pipelines, and schedule your DAG to run exactly when you want it to; write DAGs that adapt to your data at runtime and set up alerts and notifications; Scale you

One Minute Map Hacks: 71-75

ArcGIS

Another five hacks in an endless stream of one-minute how-to videos.

98

Best Practices and Guidance for Cloud Engineers to Deploy Databricks on AWS: Part 2

databricks

This is part two of a three-part series in Best Practices and Guidance for Cloud Engineers to deploy Databricks on AWS. You can.

AWS 98

Multi-modal deep learning in less than 15 lines of code

KDnuggets

Learn how to easily build, iterate and deploy a state-of-the-art deep learning model to predict customer ratings with a declarative approach to machine learning.

How to Compare Two Tables For Equality in BigQuery

Towards Data Science

Compare tables and extract their differences with standard SQL.

Apache Airflow® Best Practices: DAG Writing

Speaker: Tamara Fingerlin, Developer Advocate

In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!

Improving the customer’s experience via ML-driven payment routing

LinkedIn Engineering

Co-Authors: Xianyun Mao, Stan Xu, Rachit Kumar, Vikas R, Xia Hong, and Divyakumar Menghani. As a LinkedIn member, you can subscribe to LinkedIn Premium on a monthly or annual basis. For our customers, we offer the same option for our Talent Solutions and/or Sales Navigator products. For each, LinkedIn offers subscription renewal payments. These subscription renewal payments used to go through a rule-based routing engine to selected payment gateways, which often resulted in a less-than-o

Banking 97

Bringing Models and Data Closer Together

databricks

We are excited to announce a new AutoML capability to quickly and easily use Feature Store data to improve model outcomes. AutoML users.

Data 98

Top 8 Data Science Slack Communities to Join in 2023

KDnuggets

Take your Data Science journey to the next level by joining these Slack communities in 2023.

Streaming Big Data Files from Cloud Storage

Towards Data Science

Methods for efficient consumption of large files. Working with very large files can pose challenges to application developers related to efficient resource management and runtime performance. Text file editors, for example, can be divided into those that can handle large files, and those that make your CPU choke, make your PC freeze, and make you want to scream.

How to Achieve High-Accuracy Results When Using LLMs

Speaker: Ben Epstein, Stealth Founder & CTO | Tony Karrer, Founder & CTO, Aggregage

When tasked with building a fundamentally new product line with deeper insights than previously achievable for a high-value client, Ben Epstein and his team faced a significant challenge: how to harness LLMs to produce consistent, high-accuracy outputs at scale. In this new session, Ben will share how he and his team engineered a system (based on proven software engineering approaches) that employs reproducible test variations (via temperature 0 and fixed seeds), and enables non-LLM evaluation m