Sat. Nov 26, 2022 - Fri. Dec 02, 2022

Scikit-learn for Machine Learning Cheatsheet

KDnuggets

The latest KDnuggets exclusive cheatsheet covers the essentials of machine learning with Scikit-learn.

How to Build and Deploy Scalable Machine Learning in Production with Apache Kafka

Confluent

Apache Kafka’s Streams API embeds Machine Learning into any app or microservice (Java, Docker, Kubernetes, etc.) to add business value.

A Tale of Betrayal and Heartbreak – Databricks Workflows and Jobs.

Confessions of a Data Guy

Nothing captures the imagination and heart like a tale of betrayal and heartbreak, and that is a tale I want to bring to you today. It’s a tale of Databricks Workflows and Jobs, version changes, new features, APIs, and insidious little hidden gems that will make you pull your hair out when you find them. […]

Analyze Massive Data At Interactive Speeds With The Power Of Bitmaps Using FeatureBase

Data Engineering Podcast

Summary: The most expensive part of working with massive data sets is the work of retrieving and processing the files that contain the raw information. FeatureBase (formerly Pilosa) avoids that overhead by converting the data into bitmaps. In this episode Matt Jaffee explains how to model your data as bitmaps and the benefits that this representation provides for fast aggregate computation.
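
As a rough illustration of the bitmap idea, the sketch below keeps one bitmap per distinct field value, with bit i set when row i has that value, so that a filtered count becomes a bitwise AND followed by a bit count. It uses plain Python integers as bitmaps; the helper names (`build_bitmaps`, `count`) and the sample rows are hypothetical, and this is not FeatureBase's API or storage format.

```python
from collections import defaultdict

def build_bitmaps(rows, field):
    """Return {value: bitmap}, where bit i of the bitmap is set if rows[i][field] == value."""
    bitmaps = defaultdict(int)
    for row_id, row in enumerate(rows):
        bitmaps[row[field]] |= 1 << row_id
    return dict(bitmaps)

def count(bitmap):
    """Number of rows represented by a bitmap (its count of set bits)."""
    return bin(bitmap).count("1")

# Hypothetical sample data, for illustration only.
rows = [
    {"country": "US", "plan": "pro"},
    {"country": "DE", "plan": "free"},
    {"country": "US", "plan": "free"},
    {"country": "US", "plan": "pro"},
]

country = build_bitmaps(rows, "country")
plan = build_bitmaps(rows, "plan")

# "How many US customers are on the pro plan?" is one AND plus a bit count.
us_pro = country["US"] & plan["pro"]
print(count(us_pro))
```

The aggregate never revisits the original rows: once the bitmaps exist, each query touches only a few integers, which is the property the episode summary attributes to the bitmap representation.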

Apache Airflow® Best Practices for ETL and ELT Pipelines

Whether you’re creating complex dashboards or fine-tuning large language models, your data must be extracted, transformed, and loaded. ETL and ELT pipelines form the foundation of any data product, and Airflow is the open-source data orchestrator specifically designed for moving and transforming data in ETL and ELT pipelines. This eBook covers: An overview of ETL vs.

How I Got 4 Data Science Offers and Doubled My Income 2 Months After Being Laid Off

KDnuggets

In this post, I share how I got 4 data science job offers, including from Airbnb, Lyft, and Twitter, after being laid off. Any data scientist who was laid off due to the pandemic, or who is actively looking for a data science position, can find something here to relate to.

Building a Telegram Bot Powered by Apache Kafka and ksqlDB

Confluent

ksqlDB use case: see how apps can use ksqlDB to ingest, filter, enrich, aggregate, and query data directly with Kafka—no complex architectures or data stores needed.

More Trending

Supporting And Expanding The Arrow Ecosystem For Fast And Efficient Data Processing At Voltron Data

Data Engineering Podcast

Summary: The data ecosystem has been growing rapidly, with new communities joining and bringing their preferred programming languages to the mix. This has led to inefficiencies in how data is stored, accessed, and shared across process and system boundaries. The Arrow project is designed to eliminate wasted effort in translating between languages, and Voltron Data was created to help grow and support its technology and community.

What Google Recommends You do Before Taking Their Machine Learning or Data Science Course

KDnuggets

The first step in learning data science and machine learning is getting the foundations right.

Broadcom Modernizes Machine Learning and Anomaly Detection with ksqlDB

Confluent

Broadcom's Mainframe Operational Intelligence Product (MOI) collects and analyzes data at mass scale, using ksqlDB to improve anomaly detection and custom alarm filtering.

How DoorDash Secures Data Transfer Between Cloud and On-Premise Data Centers

DoorDash Engineering

As DoorDash’s business grows, engineers strive for a better network infrastructure so that more third-party services can be integrated into our system while data is still transmitted securely. Due to security and compliance concerns, some vendors handling such sensitive data cannot expose services to the public Internet and therefore host their own on-premise data centers.

Apache Airflow®: The Ultimate Guide to DAG Writing

Speaker: Tamara Fingerlin, Developer Advocate

In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!

You Can’t Hit What You Can’t See

Cloudera

Full-stack observability is a critical requirement for effective modern data platforms to deliver the agile, flexible, and cost-effective environment organizations are looking for. For analytic applications to properly leverage a hybrid, multi-cloud ecosystem to support modern data architectures, data observability has become even more important. I spoke to Mark Ramsey of Ramsey International (RI) to dive deeper into that last subject.

Top 10 Data Science Myths Busted

KDnuggets

The data science field is full of job opportunities, yet there is still a lot of confusion about what data scientists actually do. This confusion is largely due to the many myths that exist about the role of a data scientist. In this article, we will bust the top 10 myths about data science. By the end of this article, you will have a better understanding of the role of a data scientist and what it takes to be one.

From Eager to Smarter in Apache Kafka Consumer Rebalances

Confluent

Major improvements to the Kafka consumer, Streams, and ksqlDB for incremental cooperative rebalancing while maintaining at-least-once and exactly-once guarantees.

Enabling static analysis of SQL queries at Meta

Engineering at Meta

UPM is our internal standalone library to perform static analysis of SQL code and enhance SQL authoring. UPM takes SQL code as input and represents it as a data structure called a semantic tree. Infrastructure teams at Meta leverage UPM to build SQL linters, catch user mistakes in SQL code, and perform data lineage analysis at scale. Executing SQL queries against our data warehouse is important to the workflows of many engineers and data scientists at Meta for analytics and monitoring use cases

Optimizing The Modern Developer Experience with Coder

Many software teams have migrated their testing and production workloads to the cloud, yet development environments often remain tied to outdated local setups, limiting efficiency and growth. This is where Coder comes in. In our 101 Coder webinar, you’ll explore how cloud-based development environments can unlock new levels of productivity. Discover how to transition from local setups to a secure, cloud-powered ecosystem with ease.

Transaction Support in Cloudera Operational Database (COD)

Cloudera

What is CDP Operational Database (COD)? CDP Operational Database enables developers to quickly build future-proof applications that are architected to handle data evolution. It helps developers automate and simplify database management with capabilities like auto-scale, and is fully integrated with Cloudera Data Platform (CDP). For more information and to get started with COD, refer to Getting Started with Cloudera Data Platform Operational Database (COD).

The Complete Data Engineering Study Roadmap

KDnuggets

Everything you need to know to start your career in Data Engineering.

Measuring Code Coverage of Golang Binaries with Bincover

Confluent

Here's a deep dive on how we implemented Bincover, a simple, open source tool for measuring code coverage of Golang binaries.

Large Scale Ad Data Systems at Booking.com using the Public Cloud

Booking.com Engineering

Booking.com’s mission is to make it easier for everyone to experience the world. To help people discover destinations, we are a leading travel advertiser on Google Pay Per Click (PPC). Booking Holdings, as a whole, spent $4.7 billion in marketing across all brands in the first nine months of 2022[1]. How do we run PPC at our scale, and efficiently? In this article, we want to illustrate our extensive use of the public cloud, specifically Google Cloud Platform (GCP).

15 Modern Use Cases for Enterprise Business Intelligence

Large enterprises face unique challenges in optimizing their Business Intelligence (BI) output due to the sheer scale and complexity of their operations. Unlike smaller organizations, where basic BI features and simple dashboards might suffice, enterprises must manage vast amounts of data from diverse sources. What are the top modern BI use cases for enterprise businesses to help you get a leg up on the competition?

How to Deploy Transaction Support on Cloudera Operational Database (COD)

Cloudera

What is Cloudera Operational Database (COD)? Cloudera Operational Database enables developers to quickly build future-proof applications that are architected to handle data evolution. It helps developers automate and simplify database management with capabilities like auto-scale, and is fully integrated with Cloudera Data Platform (CDP). For more information and to get started with COD, refer to our article Getting Started with Cloudera Data Platform Operational Database (COD).

Black Friday Deal – Master Machine Learning for Less with DataCamp

KDnuggets

Secure major savings on DataCamp’s Black Friday deal and Cyber Monday deal!

Stream Processing, CEP, Event Sourcing, and Data Streaming Explained

Confluent

What is stream processing, or complex event processing (CEP), and how does it work? Learn about real-time data and event stream analytics in this tutorial.

Improving the Player on Android

Pinterest Engineering

Grey Skold | (former Android Video Engineer); Lin Wang | Android Performance Engineer; Sheng Liu | Android Performance Engineer

Pinterest Android App offers a rare experience with a mix of images and videos on a two-column grid. In order to maintain a performant video experience on Android devices, we focused on: warming up, configurations, and pooling players.

Warming Up: In order to reduce the startup latency, we establish a video network connection by sending a dummy HTTP HEAD request during the ear

Prepare Now: 2025’s Must-Know Trends For Product And Data Leaders

Speaker: Jay Allardyce, Deepak Vittal, Terrence Sheflin, and Mahyar Ghasemali

As we look ahead to 2025, business intelligence and data analytics are set to play pivotal roles in shaping success. Organizations are already starting to face a host of transformative trends as the year comes to a close, including the integration of AI in data analytics, an increased emphasis on real-time data insights, and the growing importance of user experience in BI solutions.

How Do You Summarize Data in Excel?

U-Next

Introduction to Summarizing Data in Excel: Summarizing data in Excel means condensing large amounts of data into a few numbers. For example, if you have 3,000 sales at $50 each, you could summarize this by saying that total sales were $150,000. Data can be summarized in Excel in many ways. Approximately 54% of businesses use Excel, a figure that doesn’t include any other spreadsheet programs.

How Can Python Be Used for Data Visualization?

KDnuggets

This article discusses the different Python libraries used for data visualization, with examples.

ksqlDB Execution Plans: Move Fast But Don’t Break Things

Confluent

Build fast, break nothing. Learn about the unique challenges Confluent's engineering team has faced building ksqlDB and continuously shipping the latest, greatest features.

Data Migration: Types, Process, and Successful Strategies

Ascend.io

Data migration is one of the most common undertakings for data teams. Yet, many businesses underestimate the process—resulting in extra time and money spent. A data migration process usually takes longer than it should and requires several teams. In addition, it is highly visible to both users and executives. How can you keep from making the same mistake?

How to Drive Cost Savings, Efficiency Gains, and Sustainability Wins with MES

Speaker: Nikhil Joshi, Founder & President of Snic Solutions

Is your manufacturing operation reaching its efficiency potential? A Manufacturing Execution System (MES) could be the game-changer, helping you reduce waste, cut costs, and lower your carbon footprint. Join Nikhil Joshi, Founder & President of Snic Solutions, in this value-packed webinar as he breaks down how MES can drive operational excellence and sustainability.

DataOps Observability and Automation to the Rescue!

DataKitchen

Data Team members, have you ever felt overwhelmed? The never-ending flow of new information can be stressful, and it’s hard to know where to start. Well, don’t worry because DataOps is here to help! In this post, we’ll discuss how DataOps Observability and Automation can relieve team stress and show you how to get started. So don’t wait any longer.

An Introduction to SMOTE

KDnuggets

Improve model performance by balancing the dataset with the Synthetic Minority Oversampling Technique (SMOTE).
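
To make the technique concrete, here is a minimal sketch of the core SMOTE step: each synthetic point is placed at a random position on the line segment between a minority-class sample and one of its nearest minority-class neighbours. This is a simplified standard-library version, not the article's code; the function name `smote`, its parameters, and the sample points are hypothetical, and it picks base samples at random rather than oversampling every minority sample a fixed number of times as the original algorithm does.

```python
import math
import random

def smote(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic points by interpolating between minority samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest minority-class neighbours of the base sample (excluding itself).
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic

# Hypothetical minority-class samples with two features each.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote(minority, n_new=3)
print(new_points)
```

Because every synthetic point lies between two existing minority samples, the new data stays inside the region the minority class already occupies instead of duplicating rows exactly. For real projects, a maintained implementation such as the `SMOTE` class in the imbalanced-learn package is likely a better choice than this sketch.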

Walmart’s Real-Time Inventory System Powered by Apache Kafka

Confluent

Learn how Walmart, with over 4,700 stores, used Kafka to build an event-driven architecture for real-time inventory management, providing a seamless omnichannel experience.

An introduction to Markdown by Charlie Olive

Scott Logic

An introduction to Markdown: Markdown is a brilliant tool for quickly writing up universally accessible documents. Created by John Gruber and Aaron Swartz in 2004, it stands as one of the most popular and widely used markup languages around. It uses simple and intuitive formatting that can be easily read and understood. “A Markdown-formatted document should be publishable as-is, as plain text, without looking like it’s been marked up with tags or formatting instructions” John Gruber, creator of

The Cloud Development Environment Adoption Report

Cloud Development Environments (CDEs) are changing how software teams work by moving development to the cloud. Our Cloud Development Environment Adoption Report gathers insights from 223 developers and business leaders, uncovering key trends in CDE adoption. With 66% of large organizations already using CDEs, these platforms are quickly becoming essential to modern development practices.