Scikit-learn for Machine Learning Cheatsheet
KDnuggets
DECEMBER 1, 2022
The latest KDnuggets exclusive cheatsheet covers the essentials of machine learning with Scikit-learn.
KDnuggets
DECEMBER 1, 2022
The latest KDnuggets exclusive cheatsheet covers the essentials of machine learning with Scikit-learn.
Confluent
NOVEMBER 29, 2022
Apache Kafka’s Streams API embeds Machine Learning into any app or microservice (Java, Docker, Kubernetes, etc.) to add business value.
Confessions of a Data Guy
DECEMBER 1, 2022
Nothing captures the imagination and heart like a tale of betrayal and heartbreak, and that is a tale I want to bring to you today. It’s a tale of Databricks Workflows and Jobs, version changes, new features, API’s, and insidious little hidden gems that will make you pull your hair out when you find them. […] The post A Tale of Betrayal and Heartbreak – Databricks Workflows and Jobs. appeared first on Confessions of a Data Guy.
Data Engineering Podcast
NOVEMBER 27, 2022
Summary The most expensive part of working with massive data sets is the work of retrieving and processing the files that contain the raw information. FeatureBase (formerly Pilosa) avoids that overhead by converting the data into bitmaps. In this episode Matt Jaffee explains how to model your data as bitmaps and the benefits that this representation provides for fast aggregate computation.
Advertisement
Whether you’re creating complex dashboards or fine-tuning large language models, your data must be extracted, transformed, and loaded. ETL and ELT pipelines form the foundation of any data product, and Airflow is the open-source data orchestrator specifically designed for moving and transforming data in ETL and ELT pipelines. This eBook covers: An overview of ETL vs.
KDnuggets
DECEMBER 1, 2022
In this blog, I shared my story on getting 4 data science job offers including Airbnb, Lyft and Twitter after being laid off. Any data scientist who was laid off due to the pandemic or who is actively looking for a data science position can find something here to which they can relate.
Confluent
DECEMBER 2, 2022
ksqlDB use case: see how apps can use ksqlDB to ingest, filter, enrich, aggregate, and query data directly with Kafka—no complex architectures or data stores needed.
Data Engineering Digest brings together the best content for data engineering professionals from the widest variety of industry thought leaders.
Data Engineering Podcast
NOVEMBER 27, 2022
Summary The data ecosystem has been growing rapidly, with new communities joining and bringing their preferred programming languages to the mix. This has led to inefficiencies in how data is stored, accessed, and shared across process and system boundaries. The Arrow project is designed to eliminate wasted effort in translating between languages, and Voltron Data was created to help grow and support its technology and community.
KDnuggets
NOVEMBER 29, 2022
First steps to learning data science & machine learning are the foundations.
Confluent
DECEMBER 2, 2022
Broadcom's Mainframe Operational Intelligence Product (MOI) collects and analyzes data at mass scale, using ksqlDB to improve anomaly detection and custom alarm filtering.
DoorDash Engineering
NOVEMBER 29, 2022
As DoorDash’s business grows, engineers strive for a better network infrastructure to ensure more third-party services could be integrated into our system while keeping data securely transmitted. Due to security and compliance concerns, some vendors handling such sensitive data cannot expose services to the public Internet and therefore host their own on-premise data centers.
Speaker: Tamara Fingerlin, Developer Advocate
In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!
Cloudera
DECEMBER 1, 2022
Full-stack observability is a critical requirement for effective modern data platforms to deliver the agile, flexible, and cost-effective environment organizations are looking for. For analytic applications to properly leverage a hybrid, multi-cloud ecosystem to support modern data architectures, data observability has become even more important. I spoke to Mark Ramsey of Ramsey International (RI) to dive deeper into that last subject.
KDnuggets
DECEMBER 2, 2022
The data science field is full of job opportunities, yet there is still a lot of confusion about what data scientists actually do. This confusion is largely due to the many myths that exist about the role of a data scientist. In this article, we will bust the top 10 myths about data science. By the end of this article, you will have a better understanding of the role of a data scientist and what it takes to be one.
Confluent
DECEMBER 2, 2022
Major improvements to the Kafka consumer, Streams, and ksqlDB for incremental cooperative rebalancing while maintaining at-least-once and exactly-once guarantees.
Engineering at Meta
NOVEMBER 30, 2022
UPM is our internal standalone library to perform static analysis of SQL code and enhance SQL authoring. UPM takes SQL code as input and represents it as a data structure called a semantic tree. Infrastructure teams at Meta leverage UPM to build SQL linters, catch user mistakes in SQL code, and perform data lineage analysis at scale. Executing SQL queries against our data warehouse is important to the workflows of many engineers and data scientists at Meta for analytics and monitoring use cases
Advertisement
Many software teams have migrated their testing and production workloads to the cloud, yet development environments often remain tied to outdated local setups, limiting efficiency and growth. This is where Coder comes in. In our 101 Coder webinar, you’ll explore how cloud-based development environments can unlock new levels of productivity. Discover how to transition from local setups to a secure, cloud-powered ecosystem with ease.
Cloudera
NOVEMBER 30, 2022
What is CDP Operational Database (COD). CDP Operational Database enables developers to quickly build future-proof applications that are architected to handle data evolution. It helps developers automate and simplify database management with capabilities like auto-scale, and is fully integrated with Cloudera Data Platform (CDP). For more information and to get started with COD, refer to Getting Started with Cloudera Data Platform Operational Database (COD).
KDnuggets
NOVEMBER 28, 2022
Everything you need to know to start your career in Data Engineering.
Confluent
DECEMBER 2, 2022
Here's a deep dive on how we implemented Bincover, a simple, open source tool for measuring code coverage of Golang binaries.
Booking.com Engineering
DECEMBER 2, 2022
Booking.com’s mission is to make it easier for everyone to experience the world. To help people discover destinations, we are a leading travel advertiser on Google Pay Per Click (PPC). Booking Holdings, as a whole, spent $4.7 billion in marketing across all brands in the first nine months of 2022[1]. How do we run PPC at our scale, and efficiently? In this article, we want to illustrate our extensive use of the public cloud, specifically Google Cloud Platform (GCP).
Advertisement
Large enterprises face unique challenges in optimizing their Business Intelligence (BI) output due to the sheer scale and complexity of their operations. Unlike smaller organizations, where basic BI features and simple dashboards might suffice, enterprises must manage vast amounts of data from diverse sources. What are the top modern BI use cases for enterprise businesses to help you get a leg up on the competition?
Cloudera
NOVEMBER 30, 2022
What is Cloudera Operational Database (COD). Cloudera Operational Database enables developers to quickly build future-proof applications that are architected to handle data evolution. It helps developers automate and simplify database management with capabilities like auto-scale, and is fully integrated with Cloudera Data Platform (CDP). For more information and to get started with COD, refer to our article Getting Started with Cloudera Data Platform Operational Database (COD).
KDnuggets
NOVEMBER 29, 2022
Secure major savings on DataCamp’s Black Friday deal and Cyber Monday deal!
Confluent
NOVEMBER 30, 2022
What is stream processing, or complex event processing (CEP), and how does it work? Learn about real-time data and event stream analytics in this tutorial.
Pinterest Engineering
DECEMBER 2, 2022
Grey Skold | (former Android Video Engineer) ; Lin Wang | Android Performance Engineer; Sheng Liu | Android Performance Engineer Pinterest Android App offers a rare experience with a mix of images and videos on a two-column grid. In order to maintain a performant video experience on Android devices, we focused on: Warming up Configurations Pooling players Warming Up In order to reduce the startup latency, we establish a video network connection by sending a dummy HTTP HEAD request during the ear
Speaker: Jay Allardyce, Deepak Vittal, Terrence Sheflin, and Mahyar Ghasemali
As we look ahead to 2025, business intelligence and data analytics are set to play pivotal roles in shaping success. Organizations are already starting to face a host of transformative trends as the year comes to a close, including the integration of AI in data analytics, an increased emphasis on real-time data insights, and the growing importance of user experience in BI solutions.
U-Next
DECEMBER 2, 2022
Introduction To Summarizing Data In Excel . Data for Excel is a way of summarizing large amounts of data into a few numbers. For example, if you have 3,000 sales at $50 each, you could summarize this by saying that total sales were $150,000. The summarization of data in Excel is doable in many ways. . Approximately 54% of businesses use Excel , which doesn’t include any other spreadsheet programs.
KDnuggets
DECEMBER 2, 2022
This article discusses the different python libraries used for data visualization with examples.
Confluent
DECEMBER 2, 2022
Build fast, break nothing. Learn about the unique challenges Confluent's engineering team has faced building ksqlDB and continuously shipping the latest, greatest features.
Ascend.io
DECEMBER 1, 2022
Data migration is one of the most common undertakings for data teams. Yet, many businesses underestimate the process—resulting in extra time and money spent. A data migration process usually takes longer than it should and requires several teams. In addition, it is highly visible to both users and executives. How can you keep from making the same mistake?
Speaker: Nikhil Joshi, Founder & President of Snic Solutions
Is your manufacturing operation reaching its efficiency potential? A Manufacturing Execution System (MES) could be the game-changer, helping you reduce waste, cut costs, and lower your carbon footprint. Join Nikhil Joshi, Founder & President of Snic Solutions, in this value-packed webinar as he breaks down how MES can drive operational excellence and sustainability.
DataKitchen
DECEMBER 1, 2022
Data Team members, have you ever felt overwhelmed? The never-ending flow of new information can be stressful, and it’s hard to know where to start. Well, don’t worry because DataOps is here to help! In this post, we’ll discuss how DataOps Observability and Automation can relieve team stress and show you how to get started. So don’t wait any longer.
KDnuggets
NOVEMBER 29, 2022
Improve the model performance by balancing the dataset using the synthetic minority oversampling technique.
Confluent
DECEMBER 2, 2022
With over 4,700 stores, learn how Walmart used Kafka to build an event-driven architecture for real-time inventory management, providing a seamless omnichannel experience.
Scott Logic
NOVEMBER 30, 2022
An introduction to Markdown Markdown is a brilliant tool for quickly writing up universally accessible documents. Created by John Gruber and Aaron Schwartz in 2004, it stands as one of the most popular and widely used markup languages around. It uses simple and intuitive formatting that can be easily read and understood. “A Markdown-formatted document should be publishable as-is, as plain text, without looking like it’s been marked up with tags or formatting instructions” John Gruber, creator of
Advertisement
Cloud Development Environments (CDEs) are changing how software teams work by moving development to the cloud. Our Cloud Development Environment Adoption Report gathers insights from 223 developers and business leaders, uncovering key trends in CDE adoption. With 66% of large organizations already using CDEs, these platforms are quickly becoming essential to modern development practices.
Let's personalize your content