Sat. May 27, 2023 - Fri. Jun 02, 2023

An educational side project

The Pragmatic Engineer

👋 Hi, this is Gergely with a bonus, free issue of the Pragmatic Engineer Newsletter. We cover one out of four topics in today’s subscriber-only The Scoop issue. If you’re not yet a full subscriber, you missed this week’s deep-dive on Agoda’s private cloud setup. To get the full issues, twice a week, subscribe here.

Education 364

A Roadmap To Bootstrapping The Data Team At Your Startup

Data Engineering Podcast

Summary: Building a data team is hard in any circumstance, but at a startup it can be even more challenging. The requirements are fluid, you probably don't have a lot of existing data talent to manage the hiring and onboarding, and there is a need to move fast. Ghalib Suleiman has been on both sides of this equation and joins the show to share his hard-won wisdom about how to start and grow a data team in the early days of company growth.

Data Lake 162

Programming Languages for Specific Data Roles

KDnuggets

What programming language do you need for a specific data role?

Ensuring the Successful Launch of Ads on Netflix

Netflix Tech

By Jose Fernandez, Ed Barker, Hank Jacobs. Introduction: In November 2022, we introduced a brand new tier: Basic with ads. This tier extended existing infrastructure by adding new backend components and a new remote call to our ads partner on the playback path. As we were gearing up for launch, we wanted to ensure it would go as smoothly as possible.

Algorithm 140

A Guide to Debugging Apache Airflow® DAGs

In Airflow, DAGs (your data pipelines) support nearly every use case. As these workflows grow in complexity and scale, efficiently identifying and resolving issues becomes a critical skill for every data engineer. This is a comprehensive guide with best practices and examples for debugging Airflow DAGs. You’ll learn how to: create a standardized process for debugging to quickly diagnose errors in your DAGs; identify common issues with DAGs, tasks, and connections; distinguish between Airflow-relate

Welcoming bit.io to Databricks: Investing in the Developer Experience

databricks

We are excited to announce that bit.io is joining Databricks. At Databricks, we’ve always been focused on empowering organizations to solve their toughest p.

138

What's new in Apache Spark 3.4.0 - Structured Streaming

Waitingforcode

The asynchronous progress tracking and correctness issue fixes presented in the previous blog posts are not the only new features in Apache Spark Structured Streaming 3.4.0. There are many others, but to keep the blog post readable, I'll focus here on only 3 of them.

130

More Trending

Data News — Week 23.21

Christophe Blefari

Hey, I've been sick for the last 3 days and it was impossible to write something. As I still want to send something, here is a raw edition with no comments. See you on Friday. Gen AI 🤖 QLoRA: Efficient Finetuning of Quantized LLMs (finetuning a 65B parameter model on a single 48GB GPU, reaching 99.3% of the performance level of ChatGPT on the Vicuna benchmark).
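
To see why quantization makes the single-GPU claim in the QLoRA item plausible, here is a rough back-of-the-envelope sketch. It assumes 4-bit quantized weights (the teaser only says "quantized"; 4 bits is the setting the QLoRA paper is known for) and counts weight storage only. Quantization constants, activations, LoRA adapter weights, and optimizer state are all ignored, so the real footprint is higher. The helper name `weight_memory_gb` is made up for this illustration.

```python
def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Approximate memory for model weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


PARAMS_65B = 65e9   # parameter count quoted in the teaser
GPU_MEMORY_GB = 48  # GPU size quoted in the teaser

# Weight storage at 16-bit precision versus an assumed 4-bit quantization.
fp16_gb = weight_memory_gb(PARAMS_65B, 16)
int4_gb = weight_memory_gb(PARAMS_65B, 4)

print(f"16-bit weights: {fp16_gb:.1f} GB")
print(f"4-bit weights:  {int4_gb:.1f} GB")
print(f"4-bit weights fit in {GPU_MEMORY_GB} GB: {int4_gb < GPU_MEMORY_GB}")
```

Under these assumptions the 16-bit weights alone exceed the GPU's memory several times over, while the 4-bit weights leave headroom for the remaining training state.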

BI 130

Startup Spotlight: Making Snowflake Queries Smarter and Cheaper with Sundeck 

Snowflake

Welcome to Snowflake’s Startup Spotlight, where we highlight the people and companies building businesses on Snowflake. In this Q&A series, Jacques Nadeau, Co-Founder and CEO of Sundeck and co-creator of Apache Arrow, talks about what inspires him to make powerful data tools available to all, how Sundeck’s query engineering platform can help Snowflake users, and why they “eat, sleep, and drink” Snowflake every day at Sundeck.

SQL 120

Testing Control-Flow Translations in GHC

Tweag

In November 2022, Tweag engineers merged a WebAssembly back end into the Glasgow Haskell Compiler (GHC). The back end includes a new translation for control flow, which enables GHC to avoid depending on external tools like Binaryen. Because the translation is new, we wanted to test it before submitting a merge request. And classic unit testing was not a good fit: we would have needed to know what WebAssembly code was expected to be generated from any given fragment of Haskell, and that’s a j

Algorithm 120

Bard for Data Science Cheat Sheet

KDnuggets

Check out our latest cheat sheet to get you up to speed and provide a handy reference for using Google's LLM chat tool Bard for data science.

Mastering Apache Airflow® 3.0: What’s New (and What’s Next) for Data Orchestration

Speaker: Tamara Fingerlin, Developer Advocate

Apache Airflow® 3.0, the most anticipated Airflow release yet, officially launched this April. As the de facto standard for data orchestration, Airflow is trusted by over 77,000 organizations to power everything from advanced analytics to production AI and MLOps. With the 3.0 release, the top-requested features from the community were delivered, including a revamped UI for easier navigation, stronger security, and greater flexibility to run tasks anywhere at any time.

Warmest ocean ever

ArcGIS

Our ocean is a key regulator of our climate and weather patterns. As ocean temperatures rise, so will land temperatures and storm frequencies.

109

Driving Data Usability for Health Plans through Simplified Data Quality Enforcement with Databricks

databricks

Faced with clinician shortages, an aging population, and stagnant health outcomes, the healthcare industry has the potential to greatly benefit from disruptive technologies.

Data Ticket Takers vs. Decision Makers

Towards Data Science

Are You a Data Ticket Taker or Decision Maker? The characteristics and value of reactive vs. proactive data teams. Fundamentally, there are two different types of data teams in this world. There are those who are reactive to the wants of the organization, and then there are those who proactively lead the organization towards its needs.

Data 98

Top 10 Tools for Detecting ChatGPT, GPT-4, Bard, and Claude

KDnuggets

Top free tools for detecting theses, research papers, assignments, documentation, and blogs generated by AI models.

152

Agent Tooling: Connecting AI to Your Tools, Systems & Data

Speaker: Alex Salazar, CEO & Co-Founder @ Arcade | Nate Barbettini, Founding Engineer @ Arcade | Tony Karrer, Founder & CTO @ Aggregage

There’s a lot of noise surrounding the ability of AI agents to connect to your tools, systems and data. But building an AI application into a reliable, secure workflow agent isn’t as simple as plugging in an API. As an engineering leader, it can be challenging to make sense of this evolving landscape, but agent tooling provides such high value that it’s critical we figure out how to move forward.

ThoughtSpot Sage: data security with large language models

ThoughtSpot

With the recent announcement of ThoughtSpot Sage, we launched a number of enhancements to our search capabilities, including AI-generated answers, AI-powered search suggestions, and AI-assisted data modeling. In this article we will walk you through the steps we take to secure your data during the LLM interaction. Looking more broadly, we’ll also describe the security process we follow during any application iteration or enhancement, so you can see the great lengths we take to keep your data se

Adaptive Query Execution in Structured Streaming

databricks

In Databricks Runtime, Adaptive Query Execution (AQE) is a performance feature that continuously re-optimizes batch queries using runtime statistics during query execution. Starting.

BigQuery Best Practices: Unleash the Full Potential of Your Data Warehouse

Towards Data Science

Supercharge your BigQuery experience with these 6 best practices.

Deep Learning with R

KDnuggets

In this tutorial, learn how to perform a deep learning task in R.

How to Modernize Manufacturing Without Losing Control

Speaker: Andrew Skoog, Founder of MachinistX & President of Hexis Representatives

Manufacturing is evolving, and the right technology can empower—not replace—your workforce. Smart automation and AI-driven software are revolutionizing decision-making, optimizing processes, and improving efficiency. But how do you implement these tools with confidence and ensure they complement human expertise rather than override it? Join industry expert Andrew Skoog as he explores how manufacturers can leverage automation to enhance operations, streamline workflows, and make smarter, data-dri

10 Interesting Project Management Project Ideas to Follow in 2023

Knowledge Hut

Project management is a critical function for every organization to achieve its goals successfully and effectively. According to one report, project management employment in the United States is predicted to expand by 33% between 2017 and 2027. According to the Bureau of Labor Statistics and PMI, companies will require roughly 88 million people in project management-related roles by 2027.

Project 98

Better LLMs with Better Data using Cleanlab Studio

databricks

This post and accompanying notebook and tutorial video demonstrate how to use Cleanlab Studio to improve the performance of Large Language Models (LLMs.

Data 98

Explore and Prepare Your Data with ArcGIS Pro Data Engineering

ArcGIS

Get a taste of how to use Data Engineering to explore, visualize, clean and prepare data in ArcGIS Pro.

OpenAI’s Whisper API for Transcription and Translation

KDnuggets

This article will show you how to use OpenAI's Whisper API to transcribe audio into text. It will also show you how to use it in your own projects and how to integrate it into your data science projects.

The Ultimate Guide to Apache Airflow DAGs

With Airflow being the open-source standard for workflow orchestration, knowing how to write Airflow DAGs has become an essential skill for every data engineer. This eBook provides a comprehensive overview of DAG writing features with plenty of example code. You’ll learn how to: understand the building blocks of DAGs, combine them in complex pipelines, and schedule your DAG to run exactly when you want it to; write DAGs that adapt to your data at runtime and set up alerts and notifications; scale you

What Pride and allyship mean to me by Steve Foreshew-Cain

Scott Logic

Every year at this time, I like to share my thoughts on the continuing relevance of Pride Month; you can read my posts here from 2021 and 2022. This Pride Month, we’re going to share insights from the Scott Logic team on what Pride and allyship mean to them, and why they value working in an inclusive environment. I’ll get the ball rolling. What does Pride mean to you?

Easy Ingestion to Lakehouse with File Upload and Add Data UI

databricks

Data ingestion into the Lakehouse can be a bottleneck for many organizations, but with Databricks, you can quickly and easily ingest data of.

Introducing the Snowflake Connector for ServiceNow analytics

ThoughtSpot

In a world where user experience and IT support can mean the difference between hitting or missing your ARR marks, businesses have to find smarter ways to build workflows and support their IT departments. That’s where companies like ServiceNow come into play. A few years back, we created our ServiceNow SpotApp, a pre-built analytics template to help companies analyze and understand their data so they can increase efficiencies across their complex IT environments.

LLM Apocalypse Now: Revenge of the Open Source Clones

KDnuggets

This is a story about how open-source projects are taking on the LLM industry.

Project 129

Apache Airflow® Best Practices: DAG Writing

Speaker: Tamara Fingerlin, Developer Advocate

In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!

Share Pop-up Charts from the Spatial Statistics and Space Time Pattern Mining Toolboxes to ArcGIS Online

ArcGIS

Use the Convert Spatial Statistics Popup Charts for Web Display tool to view the pop-up charts from your analysis in ArcGIS Online.

98

LinkedIn Bug Bounty Program - One Year Anniversary of Public Launch

LinkedIn Engineering

Authors: Ameen Maali, Rohit Pitke, Surbhi Jain, and Mira Thambireddy. Security of our members’ data is a key priority at LinkedIn. To tap into the collective insights of the entire security community, we decided to expand our private bug bounty program to everyone on the HackerOne platform last year. In this blog post, we reflect on our journey through the program’s inception, the successes, the learnings, and discuss why our bug bounty program has been so valuable in keeping LinkedIn a secure

What is a MMM and why does it matter for marketers?

databricks

MMM (Marketing or Media Mix Modeling) is a data-driven methodology that enables companies to identify and measure the impact of their marketing campaigns.
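
As a rough illustration of what "measuring the impact" can look like, one common simplified formulation of MMM is a linear regression of sales on per-channel spend. The sketch below fits that regression with ordinary least squares in plain Python. It is not the approach from the linked article: the channel names, spend figures, and helper functions are all hypothetical, the data is synthetic and noise-free, and production models usually add effects such as carry-over (adstock) and diminishing returns.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Build an augmented matrix so the inputs are not modified.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in reversed(range(n)):
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x


def fit_media_mix(spend_rows, sales):
    """Least squares fit of: sales ~ baseline + sum(effect_k * spend_k)."""
    design = [[1.0] + [float(v) for v in row] for row in spend_rows]
    k = len(design[0])
    # Normal equations: (X^T X) beta = X^T y.
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(design, sales)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical weekly spend per channel: (tv, search).
spend = [(10, 1), (20, 3), (15, 2), (30, 1), (25, 4), (5, 5)]
# Synthetic sales generated from known effects, so the fit can be checked.
sales = [100 + 2 * tv + 5 * search for tv, search in spend]

baseline, tv_effect, search_effect = fit_media_mix(spend, sales)
print(f"baseline={baseline:.2f} tv={tv_effect:.2f} search={search_effect:.2f}")
```

Each fitted coefficient is read as the estimated change in sales per extra unit of spend on that channel, with the intercept standing in for baseline sales that occur without any marketing.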

Media 98

Essential MLOps: A Free eBook

KDnuggets

Check out this free ebook on the essentials of machine learning operations.

How to Achieve High-Accuracy Results When Using LLMs

Speaker: Ben Epstein, Stealth Founder & CTO | Tony Karrer, Founder & CTO, Aggregage

When tasked with building a fundamentally new product line with deeper insights than previously achievable for a high-value client, Ben Epstein and his team faced a significant challenge: how to harness LLMs to produce consistent, high-accuracy outputs at scale. In this new session, Ben will share how he and his team engineered a system (based on proven software engineering approaches) that employs reproducible test variations (via temperature 0 and fixed seeds), and enables non-LLM evaluation m