Sat. May 27, 2023 - Fri. Jun 02, 2023

An educational side project

The Pragmatic Engineer

👋 Hi, this is Gergely with a bonus, free issue of the Pragmatic Engineer Newsletter. We cover one out of four topics in today’s subscriber-only The Scoop issue. If you’re not yet a full subscriber, you missed this week’s deep-dive on Agoda’s private cloud setup. To get the full issues, twice a week, subscribe here.

Education 363

A Roadmap To Bootstrapping The Data Team At Your Startup

Data Engineering Podcast

Summary Building a data team is hard in any circumstance, but at a startup it can be even more challenging. The requirements are fluid, you probably don't have a lot of existing data talent to manage the hiring and onboarding, and there is a need to move fast. Ghalib Suleiman has been on both sides of this equation and joins the show to share his hard-won wisdom about how to start and grow a data team in the early days of company growth.

Data Lake 162

Programming Languages for Specific Data Roles

KDnuggets

What programming language do you need for a specific data role?

Ensuring the Successful Launch of Ads on Netflix

Netflix Tech

By Jose Fernandez, Ed Barker, Hank Jacobs. Introduction: In November 2022, we introduced a brand new tier: Basic with ads. This tier extended existing infrastructure by adding new backend components and a new remote call to our ads partner on the playback path. As we were gearing up for launch, we wanted to ensure it would go as smoothly as possible.

Algorithm 140

Apache Airflow® Best Practices for ETL and ELT Pipelines

Whether you’re creating complex dashboards or fine-tuning large language models, your data must be extracted, transformed, and loaded. ETL and ELT pipelines form the foundation of any data product, and Airflow is the open-source data orchestrator specifically designed for moving and transforming data in ETL and ELT pipelines. This eBook covers: An overview of ETL vs.

Welcoming bit.io to Databricks: Investing in the Developer Experience

databricks

We are excited to announce that bit.io is joining Databricks. At Databricks, we’ve always been focused on empowering organizations to solve their toughest p.

138

What's new in Apache Spark 3.4.0 - Structured Streaming

Waitingforcode

The asynchronous progress tracking and correctness issue fixes presented in the previous blog posts are not the only new features in Apache Spark Structured Streaming 3.4.0. There are many others, but to keep the blog post readable, I'll focus here on only 3 of them.

130

More Trending

Data News — Week 23.21

Christophe Blefari

Hey, I've been sick for the last 3 days and it was impossible to write something. As I still want to send something, here is a raw edition with no comments. See you on Friday. Gen AI 🤖 QLoRA: Efficient Finetuning of Quantized LLMs (65B parameter model on a single 48GB GPU reaching 99.3% of the performance level of ChatGPT on Vicuna).

BI 130

Testing Control-Flow Translations in GHC

Tweag

In November 2022, Tweag engineers merged a WebAssembly back end into the Glasgow Haskell Compiler (GHC). The back end includes a new translation for control flow, which enables GHC to avoid depending on external tools like Binaryen. Because the translation is new, we wanted to test it before submitting a merge request. And classic unit testing was not a good fit: we would have needed to know what WebAssembly code was expected to be generated from any given fragment of Haskell, and that’s a j

Algorithm 116

Warmest ocean ever

ArcGIS

Our ocean is a key regulator of our climate and weather patterns. As ocean temperatures rise, so will land temperatures and storm frequencies.

111

Bard for Data Science Cheat Sheet

KDnuggets

Check out our latest cheat sheet to get you up to speed and provide a handy reference for using Google's LLM chat tool Bard for data science.

Apache Airflow®: The Ultimate Guide to DAG Writing

Speaker: Tamara Fingerlin, Developer Advocate

In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!

Driving Data Usability for Health Plans through Simplified Data Quality Enforcement with Databricks

databricks

Faced with clinician shortages, an aging population, and stagnant health outcomes, the healthcare industry has the potential to greatly benefit from disruptive technologies.

ThoughtSpot Sage: data security with large language models

ThoughtSpot

With the recent announcement of ThoughtSpot Sage, we launched a number of enhancements to our search capabilities including AI-generated answers, AI-powered search suggestions, and AI-assisted data modeling. In this article we will walk you through the steps we take to secure your data during the LLM interaction. Looking more broadly, we’ll also describe the security process we follow during any application iteration or enhancement, so you can see the great lengths we go to to keep your data se

10 Interesting Project Management Project Ideas to Follow in 2023

Knowledge Hut

Project management is a critical function for every organization to achieve its goals successfully and effectively. According to one report, project management employment in the United States is predicted to expand by 33% between 2017 and 2027. According to the Bureau of Labor Statistics and PMI, companies will require roughly 88 million people in project management-related activities by 2027.

Project 98

Top 10 Tools for Detecting ChatGPT, GPT-4, Bard, and Claude

KDnuggets

Top free tools for detecting theses, research papers, assignments, documentation, and blogs generated by AI models.

154

Optimizing The Modern Developer Experience with Coder

Many software teams have migrated their testing and production workloads to the cloud, yet development environments often remain tied to outdated local setups, limiting efficiency and growth. This is where Coder comes in. In our 101 Coder webinar, you’ll explore how cloud-based development environments can unlock new levels of productivity. Discover how to transition from local setups to a secure, cloud-powered ecosystem with ease.

Adaptive Query Execution in Structured Streaming

databricks

In Databricks Runtime, Adaptive Query Execution (AQE) is a performance feature that continuously re-optimizes batch queries using runtime statistics during query execution. Starting.

Explore and Prepare Your Data with ArcGIS Pro Data Engineering

ArcGIS

Get a taste of how to use Data Engineering to explore, visualize, clean and prepare data in ArcGIS Pro.

What Pride and allyship mean to me by Steve Foreshew-Cain

Scott Logic

Every year at this time, I like to share my thoughts on the continuing relevance of Pride Month; you can read my posts here from 2021 and 2022. This Pride Month, we’re going to share insights from the Scott Logic team on what Pride and allyship mean to them, and why they value working in an inclusive environment. I’ll get the ball rolling. What does Pride mean to you?

Deep Learning with R

KDnuggets

In this tutorial, learn how to perform a deep learning task in R.

15 Modern Use Cases for Enterprise Business Intelligence

Large enterprises face unique challenges in optimizing their Business Intelligence (BI) output due to the sheer scale and complexity of their operations. Unlike smaller organizations, where basic BI features and simple dashboards might suffice, enterprises must manage vast amounts of data from diverse sources. What are the top modern BI use cases for enterprise businesses to help you get a leg up on the competition?

Better LLMs with Better Data using Cleanlab Studio

databricks

This post and accompanying notebook and tutorial video demonstrate how to use Cleanlab Studio to improve the performance of Large Language Models (LLMs.

Data 98

Introducing the Snowflake Connector for ServiceNow analytics

ThoughtSpot

In a world where user experience and IT support can mean the difference between hitting or missing your ARR marks, businesses have to find smarter ways to build workflows and support their IT departments. That’s where companies like ServiceNow come into play. A few years back, we created our ServiceNow SpotApp, a pre-built analytics template to help companies analyze and understand their data, so they can increase efficiencies across their complex IT environments.

Share Pop-up Charts from the Spatial Statistics and Space Time Pattern Mining Toolboxes to ArcGIS Online

ArcGIS

Use the Convert Spatial Statistics Popup Charts for Web Display tool to view the pop-up charts from your analysis in ArcGIS Online.

98

OpenAI’s Whisper API for Transcription and Translation

KDnuggets

This article will show you how to use OpenAI's Whisper API to transcribe audio into text. It will also show you how to use it in your own projects and how to integrate it into your data science projects.

Prepare Now: 2025's Must-Know Trends For Product And Data Leaders

Speaker: Jay Allardyce, Deepak Vittal, Terrence Sheflin, and Mahyar Ghasemali

As we look ahead to 2025, business intelligence and data analytics are set to play pivotal roles in shaping success. Organizations are already starting to face a host of transformative trends as the year comes to a close, including the integration of AI in data analytics, an increased emphasis on real-time data insights, and the growing importance of user experience in BI solutions.

Easy Ingestion to Lakehouse with File Upload and Add Data UI

databricks

Data ingestion into the Lakehouse can be a bottleneck for many organizations, but with Databricks, you can quickly and easily ingest data of.

LinkedIn Bug Bounty Program - One Year Anniversary of Public Launch

LinkedIn Engineering

Authors: Ameen Maali, Rohit Pitke, Surbhi Jain, and Mira Thambireddy. Security of our members’ data is a key priority at LinkedIn. To tap into the collective insights of the entire security community, we decided to expand our private bug bounty program to everyone on the HackerOne platform last year. In this blog post, we reflect on our journey through the program’s inception, the successes, the learnings, and discuss why our bug bounty program has been so valuable in keeping LinkedIn a secure

Top 20 Artificial Intelligence Project Ideas in 2023

Knowledge Hut

AI is used in a wide range of applications such as marketing, automation, transport, supply chain, and communication, to name a few. From cutting-edge research to real-world applications, here we will investigate the most executed artificial intelligence projects. This article will help you discover plenty of fascinating ideas and insights to inspire you, whether you are a tech fanatic or want to know about the future of AI.

Project 96

LLM Apocalypse Now: Revenge of the Open Source Clones

KDnuggets

This is a story about how open-source projects are taking on the LLM industry.

Project 137

How to Drive Cost Savings, Efficiency Gains, and Sustainability Wins with MES

Speaker: Nikhil Joshi, Founder & President of Snic Solutions

Is your manufacturing operation reaching its efficiency potential? A Manufacturing Execution System (MES) could be the game-changer, helping you reduce waste, cut costs, and lower your carbon footprint. Join Nikhil Joshi, Founder & President of Snic Solutions, in this value-packed webinar as he breaks down how MES can drive operational excellence and sustainability.

What is a MMM and why does it matter for marketers?

databricks

MMM (Marketing or Media Mix Modeling) is a data-driven methodology that enables companies to identify and measure the impact of their marketing campaigns.

Media 98

How DoorDash uses XcodeGen to eliminate project merge conflicts

DoorDash Engineering

At DoorDash, we work to implement efficient processes that can mitigate common conflicts within a large iOS development team. Part of those efforts involves using XcodeGen, a command line interface (CLI), to reduce merge conflicts within our various iOS teams. Here we will discuss its implementation to manage the intricate business scenarios and demanding requirements of the Dasher app, which lets our drivers receive, pick up, and securely deliver orders to customers.

Project 96

Data Ticket Takers vs. Decision Makers

Towards Data Science

Are You a Data Ticket Taker or Decision Maker? The characteristics and value of reactive vs. proactive data teams. Fundamentally, there are two different types of data teams in this world. There are those who are reactive to the wants of the organization, and then there are those who proactively lead the organization towards its needs.

Data 92

Essential MLOps: A Free eBook

KDnuggets

Check out this free ebook on the essentials of machine learning operations.

The Cloud Development Environment Adoption Report

Cloud Development Environments (CDEs) are changing how software teams work by moving development to the cloud. Our Cloud Development Environment Adoption Report gathers insights from 223 developers and business leaders, uncovering key trends in CDE adoption. With 66% of large organizations already using CDEs, these platforms are quickly becoming essential to modern development practices.