Sat. Nov 19, 2022 - Fri. Nov 25, 2022

How Much Math Do You Need in Data Science?

KDnuggets

Data scientists have many great computational tools available to do their work. However, mathematical skills are still essential in data science and machine learning: without a theoretical foundation, these tools remain black boxes about which you cannot ask core analytical questions.

Data News — Week 22.47

Christophe Blefari

Hello you, I hope this data news finds you well. Time flies, to be honest. I've launched an Advent of Data in a rush. The goal is simple: in December, 24 data people will produce 24 data gems. Every day a new piece of content will be released on a dedicated website. If you wanna join the initiative, please reply; we still have a few slots to fill.

Twitter’s ongoing cruel treatment of software engineers

The Pragmatic Engineer

Originally published on 24 November 2022. 👋 Hi, this is Gergely with a bonus, free issue of the Pragmatic Engineer Newsletter. We cover one out of five topics in today’s subscriber-only The Scoop issue. To get this newsletter every week, subscribe here. I was really hoping to not report anything more about Twitter, and that software engineers at the company would get space to heal after the traumatic events, and to focus on building the product.

DuckDB: Getting started for Beginners

Marc Lamberti

DuckDB is an in-process OLAP DBMS written in C++ blah blah blah, too complicated. Let’s start simple, shall we? DuckDB is the SQLite for Analytics. It has no dependencies, is extremely easy to set up, and is optimized to perform queries on data. In this hands-on tutorial, you will learn what DuckDB is, how to use it, and why it is essential for you.

Apache Airflow® Best Practices for ETL and ELT Pipelines

Whether you’re creating complex dashboards or fine-tuning large language models, your data must be extracted, transformed, and loaded. ETL and ELT pipelines form the foundation of any data product, and Airflow is the open-source data orchestrator specifically designed for moving and transforming data in ETL and ELT pipelines. This eBook covers: An overview of ETL vs.

10 Amazing Machine Learning Visualizations You Should Know in 2023

KDnuggets

Yellowbrick for creating machine learning plots with less code.

Tame The Entropy In Your Data Stack And Prevent Failures With Sifflet

Data Engineering Podcast

Summary: The problems that are easiest to fix are the ones that you prevent from happening in the first place. Sifflet is a platform that brings your entire data stack into focus to improve the reliability of your data assets and empower collaboration across your teams. In this episode CEO and founder Salma Bakouk shares her views on the causes and impacts of "data entropy" and how you can tame it before it leads to failures.

More Trending

A (Stream Processing Use Case) Recipe for Thankfulness

Confluent

Our Stream Processing tutorials help you tackle real-life use cases with Apache Kafka and ksqlDB. Check out our newest Thanksgiving-themed use case: survey response analysis!

The Inescapable Conclusion: Machine Learning Is Not Like Your Brain

KDnuggets

The final article in this nine-part series summarizes the many reasons why Machine Learning is not like your brain - along with a few similarities.

A Look At The Data Systems Behind The Gameplay For League Of Legends

Data Engineering Podcast

Summary: The majority of blog posts and presentations about data engineering and analytics assume that the consumers of those efforts are internal business users accessing an environment controlled by the business. In this episode Ian Schweer shares his experiences at Riot Games supporting player-focused features such as machine learning models and recommender systems that are deployed as part of the game binary.

Impact of Digitization on HR Services and Processes

U-Next

Introduction to Digitization in Human Resources. Digitization in HR services is of utmost importance to an organization. It is a critical and strategic function that aims to optimize the workforce to meet business goals. The HR functions and processes have been evolving with advances in technology, changing consumer behavior patterns, and increasing globalization of markets.

Apache Airflow®: The Ultimate Guide to DAG Writing

Speaker: Tamara Fingerlin, Developer Advocate

In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!

Leveraging CockroachDB’s Change Feed for Real-Time Inventory Data Processing

DoorDash Engineering

Managing inventory levels is one of the biggest challenges for any convenience and grocery retailer on DoorDash. Maintaining accurate inventory levels in a timely manner becomes especially challenging when there are many constantly moving variables that may be changing on-hand inventory count. Situations that may affect inventory levels include, but are not limited to: items expiring; items that may have to be removed due to damage; items sent by vendors that are different than what was ordered. After

What is Chebychev’s Theorem and How Does it Apply to Data Science?

KDnuggets

Chebyshev’s Theorem applies to every data set and is heavily used by Statisticians, Data Scientists, and Machine Learning Engineers.
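
As a rough illustration of the claim above: Chebyshev's Theorem says that, for any k greater than 1, at least 1 - 1/k^2 of the values in a data set lie within k standard deviations of the mean, whatever the shape of the distribution. The sketch below checks that bound on a small made-up sample. The function name `chebyshev_check` and the sample values are illustrative assumptions and do not come from the linked article.

```python
from statistics import fmean, pstdev


def chebyshev_check(values, k):
    """Return (observed, bound) for a data set and a multiplier k.

    observed: share of values within k population standard deviations of the mean.
    bound:    Chebyshev's lower bound, 1 - 1/k^2.
    """
    if k <= 1:
        raise ValueError("k must be greater than 1 for the bound to be informative")
    mu = fmean(values)
    sigma = pstdev(values)  # population standard deviation of the data set
    if sigma == 0:
        raise ValueError("all values are identical; the standard deviation is zero")
    within = sum(1 for v in values if abs(v - mu) <= k * sigma)
    return within / len(values), 1 - 1 / k**2


# Hypothetical, deliberately skewed sample with one large outlier.
sample = [1, 2, 2, 3, 3, 3, 4, 4, 50]
observed, bound = chebyshev_check(sample, 2)
print(f"observed share within 2 standard deviations: {observed:.3f}")
print(f"Chebyshev lower bound: {bound:.3f}")
```

The bound is distribution-free, so it is usually loose: for well-behaved data the observed share is often far above 1 - 1/k^2.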

How Precision Time Protocol is being deployed at Meta

Engineering at Meta

Implementing Precision Time Protocol (PTP) at Meta allows us to synchronize the systems that drive our products and services down to nanosecond precision. PTP’s predecessor, Network Time Protocol (NTP), provided us with millisecond precision, but as we scale to more advanced systems on our way to building the next computing platform, the metaverse and AI, we need to ensure that our servers are keeping time as accurately and precisely as possible.

Why Should Software Engineers Be Good Writers?

Trio

When people think of software engineers, they generally don’t think about the painstaking authors meticulously putting together words on a page.

Optimizing The Modern Developer Experience with Coder

Many software teams have migrated their testing and production workloads to the cloud, yet development environments often remain tied to outdated local setups, limiting efficiency and growth. This is where Coder comes in. In our 101 Coder webinar, you’ll explore how cloud-based development environments can unlock new levels of productivity. Discover how to transition from local setups to a secure, cloud-powered ecosystem with ease.

What’s the Relationship Between Big Data and Machine Learning?

U-Next

Introduction to Machine Learning and Big Data. Big Data and Machine Learning are among the most crucial and irreplaceable technologies today. Machine Learning allows computers to learn from data automatically without being explicitly programmed. This is done by providing the computer with training data, which it can use to improve its performance on future tasks.

SHAP: Explain Any Machine Learning Model in Python

KDnuggets

A Comprehensive Guide to SHAP and Shapley Values.

Retrofitting null-safety onto Java at Meta

Engineering at Meta

We developed a new static analysis tool called Nullsafe that is used at Meta to detect NullPointerException (NPE) errors in Java code. Interoperability with legacy code and a gradual deployment model were key to Nullsafe’s wide adoption and allowed us to recover some null-safety properties in the context of an otherwise null-unsafe language in a multimillion-line codebase.

Snowflake: Provisioning in AAD to synch Users

Cloudyard

In the last post we discussed how to configure Snowflake SSO login with Azure Active Directory. We created the user ‘Darsh’ in Azure Active Directory and assigned the required permission. To enable SSO login on the Snowflake side, we also created the user manually as follows: CREATE USER "DMITTAL" PASSWORD = 'xxx' LOGIN_NAME = 'darsh@sachinmittal2904outlook.onmicrosoft.com' But assume a scenario where we have a number of users available in Azure Active Direc

15 Modern Use Cases for Enterprise Business Intelligence

Large enterprises face unique challenges in optimizing their Business Intelligence (BI) output due to the sheer scale and complexity of their operations. Unlike smaller organizations, where basic BI features and simple dashboards might suffice, enterprises must manage vast amounts of data from diverse sources. What are the top modern BI use cases for enterprise businesses to help you get a leg up on the competition?

Organizational Health: What It Is and How HR Manages It?

U-Next

Introduction To Organizational Health. Organizational health is a measure of the effectiveness of the organization. It’s a holistic approach to organizational success that focuses on creating an environment where employees can fulfill their potential and achieve the company’s goals. What organizational health measures is the degree to which employees are engaged and performing at their best.

How to Use Graph Theory to Scout Soccer

KDnuggets

Take Soccer Analytics to the Next Level with Graph Theory: Here’s What to Know and How to Do It.

PTP: Timing accuracy and precision for the future of computing

Engineering at Meta

Meta is deploying a timing protocol, Precision Time Protocol (PTP), that will offer new levels of accuracy and precision to our networks and data centers. We believe PTP will become the global standard for keeping time in computer networks. PTP will benefit today’s products and services and will be a foundational technology behind the development of the metaverse.

How Data and Finance Teams Can Be Friends (And Stop Being Frenemies)

Monte Carlo

Recently I wrote an article about data silos that form across the organization, often due to lack of alignment with partners. This alignment can be difficult to come by, but is crucial to a data leader’s success. With the range of internal customers to support, it can be tempting for data teams to adopt the principles of an assembly line or even a fry cook at McDonald’s.

Prepare Now: 2025's Must-Know Trends For Product And Data Leaders

Speaker: Jay Allardyce, Deepak Vittal, Terrence Sheflin, and Mahyar Ghasemali

As we look ahead to 2025, business intelligence and data analytics are set to play pivotal roles in shaping success. Organizations are already starting to face a host of transformative trends as the year comes to a close, including the integration of AI in data analytics, an increased emphasis on real-time data insights, and the growing importance of user experience in BI solutions.

A Brief Overview of the Unix File System

U-Next

Introduction. The Unix File System is a framework for organizing and storing large amounts of data in a manageable manner. It includes components like files, a group of connected data that can be conceptualized as a stream of bytes (or characters). In the Unix File System, a file is also the smallest storage unit. In other words, the Unix File System is a method for organizing and logically analyzing massive amounts of data so that it is simple to manage.

Linux for Data Science Cheatsheet

KDnuggets

KDnuggets is back with another exclusive cheatsheet, this time sharing a Linux quick reference for data science.

How to move data from spreadsheets into your data warehouse

dbt Developer Hub

Once your data warehouse is built out, the vast majority of your data will have come from other SaaS tools, internal databases, or customer data platforms (CDPs). But there’s another unsung hero of the analytics engineering toolkit: the humble spreadsheet. Spreadsheets are the Swiss army knife of data processing. They can add extra context to otherwise inscrutable application identifiers, be the only source of truth for bespoke processes from other divisions of the business, or act as the transl

How SeatGeek Reduced Data Incidents to Zero with Data Observability

Monte Carlo

Data downtime, unknown unknowns, and the specter of schema changes loom large for data teams of all stripes, and the team at SeatGeek was no exception. As the only mobile ticketing marketplace built for fan experience, SeatGeek made its name on efficient customer experiences. So, when SeatGeek’s data leaders realized they were losing too much time root-causing data issues in their BI reports, they began looking for tools to help them discover their data problems faster.

How to Drive Cost Savings, Efficiency Gains, and Sustainability Wins with MES

Speaker: Nikhil Joshi, Founder & President of Snic Solutions

Is your manufacturing operation reaching its efficiency potential? A Manufacturing Execution System (MES) could be the game-changer, helping you reduce waste, cut costs, and lower your carbon footprint. Join Nikhil Joshi, Founder & President of Snic Solutions, in this value-packed webinar as he breaks down how MES can drive operational excellence and sustainability.

7 Crucial Steps Involved in A Sales Process

U-Next

Introduction. Whether you work in B2B sales or direct-to-consumer sales, the sales process aims to develop an interest in your product and trust in you as a salesperson. You must be meticulous in your efforts to get a potential customer closer to making a purchase since you won’t close a sale without both. A sales process that we see now includes the following:

Picking Examples to Understand Machine Learning Model

KDnuggets

Understanding ML by combining explainability and sample picking.

AWS re:Invent 2022: Rockset Will Be There…Will You?

Rockset

Rockset is heading to Vegas for AWS re:Invent. Will you be there? We have several opportunities for you and your team to learn more about real-time analytics and how companies like Klarna, Meta and Seesaw have made the move from batch to real time. Come by the Rockset Booth (#130) in the expo hall, November 28-December 1. See a demo and try your hand at winning a PlayStation 5 in our re:Invent prize giveaway.

What’s Next for Data Engineering in 2023? 10 Predictions 

Monte Carlo

What’s next for the future of data engineering? Each year, we chat with one of our industry’s pioneering leaders about their predictions for the modern data stack – and share a few of our own. A few weeks ago, I had the opportunity to chat with famed venture capitalist, prolific blogger, and friend Tomasz Tunguz about his top 9 data engineering predictions for 2023.

The Cloud Development Environment Adoption Report

Cloud Development Environments (CDEs) are changing how software teams work by moving development to the cloud. Our Cloud Development Environment Adoption Report gathers insights from 223 developers and business leaders, uncovering key trends in CDE adoption. With 66% of large organizations already using CDEs, these platforms are quickly becoming essential to modern development practices.