Sat.Aug 05, 2023 - Fri.Aug 11, 2023

Why Is Data Modeling So Challenging – How To Data Model For Analytics

Seattle Data Guy

Learning how to build data models from basic star schemas on the internet is like learning data science with the Iris data set. It works great as a toy example, but it doesn’t match real life at all. Data modeling in real life requires that you fully understand the data sources and your business use cases.
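
For readers who have not seen one, here is a minimal sketch of the kind of basic star schema the post refers to: one fact table holding measures, keyed to two dimension tables, queried with joins and an aggregate. The table and column names (`fact_sales`, `dim_product`, `dim_date`) are hypothetical, and Python's built-in `sqlite3` is used only so the example runs anywhere.

```python
import sqlite3


def build_star_schema() -> sqlite3.Connection:
    """Create a toy star schema in memory: one fact table, two dimensions."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
        CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, month TEXT);
        -- The fact table holds measures plus foreign keys to each dimension.
        CREATE TABLE fact_sales (
            product_id INTEGER REFERENCES dim_product(product_id),
            date_id INTEGER REFERENCES dim_date(date_id),
            amount REAL
        );
        INSERT INTO dim_product VALUES (1, 'books'), (2, 'games');
        INSERT INTO dim_date VALUES (1, '2023-07'), (2, '2023-08');
        INSERT INTO fact_sales VALUES (1, 1, 10.0), (1, 2, 5.0), (2, 2, 20.0);
        """
    )
    return conn


def revenue_by_category_and_month(conn: sqlite3.Connection) -> list:
    """Typical star-schema query: join facts to dimensions, then aggregate."""
    return conn.execute(
        """
        SELECT p.category, d.month, SUM(f.amount)
        FROM fact_sales f
        JOIN dim_product p ON f.product_id = p.product_id
        JOIN dim_date d ON f.date_id = d.date_id
        GROUP BY p.category, d.month
        ORDER BY p.category, d.month
        """
    ).fetchall()


conn = build_star_schema()
print(revenue_by_category_and_month(conn))
```

The post's argument is that real source systems rarely map onto a layout this tidy.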

A senior engineer/EM job search story

The Pragmatic Engineer

👋 Hi, this is Gergely with a bonus, free issue of the Pragmatic Engineer Newsletter. In every issue, I cover topics related to Big Tech and startups through the lens of engineering managers and senior engineers. In this article, we cover one out of five topics from today’s subscriber-only The Pulse issue. To get full issues twice a week, subscribe here.

Senior Engineer – The Number One Skill

Confessions of a Data Guy

Do you think I’m just trying to get you to click? Maybe. Maybe not. After working in and around Data Teams for well over a decade, with both the smartest people to touch a keyboard and the others, it’s become quite clear to me what the number one skill that identifies a Senior-level Engineering […]

_spark_metadata in Apache Spark Structured Streaming issue is no more!

Waitingforcode

Thanks to table file formats, there are probably fewer people working with flat files in Structured Streaming today than 5 years ago. However, if you are in this group and are still generating CSV or JSON files with the streaming file sink, brace yourself: the memory problems are coming if you don't take action!

Apache Airflow® Best Practices for ETL and ELT Pipelines

Whether you’re creating complex dashboards or fine-tuning large language models, your data must be extracted, transformed, and loaded. ETL and ELT pipelines form the foundation of any data product, and Airflow is the open-source data orchestrator specifically designed for moving and transforming data in ETL and ELT pipelines. This eBook covers: An overview of ETL vs.

Quantifying The Return On Investment For Your Data Team

Data Engineering Podcast

Summary: As businesses increasingly invest in technology and talent focused on data engineering and analytics, they want to know whether they are benefiting. So how do you calculate the return on investment for data? In this episode, Barr Moses and Anna Filippova explore that question and provide useful exercises to start answering it in your company.

Are reports of StackOverflow’s fall greatly exaggerated?

The Pragmatic Engineer

👋 Hi, this is Gergely with a bonus, free issue of the Pragmatic Engineer Newsletter. In every issue, I cover topics related to Big Tech and startups through the lens of engineering managers and senior engineers. In this article, we cover one out of five topics from today’s subscriber-only The Pulse issue. To get full issues twice a week, subscribe here.

Confluent Champion: Niki Kapsi’s Journey From SDR to Commercial Account Executive

Confluent

Meet Commercial AE Niki Kapsi and learn about the “entrepreneurial” side of her role at Confluent.

What is Data Observability? 5 Key Pillars To Know

Monte Carlo

Editor’s Note: So much has happened since we first published this post and created the data observability category and Monte Carlo in 2019. We have updated this post to reflect this rapidly maturing space. You can read the original article linked at the bottom of this page. What is data observability? The five pillars. My data observability definition has not changed since I first coined it in 2019: Data observability refers to an organization’s comprehensive understanding of the health an…
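
In the original 2019 version of this post, the five pillars were given as freshness, distribution, volume, schema, and lineage. The sketch below turns three of them (freshness, volume, schema) into simple checks over a table's metadata. The `TableSnapshot` structure, the thresholds, and the function names are hypothetical illustrations, not Monte Carlo's product or API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Dict, List


@dataclass
class TableSnapshot:
    """Hypothetical metadata collected for one table at check time."""
    last_loaded_at: datetime
    row_count: int
    columns: Dict[str, str]  # column name -> type name


def check_freshness(snap: TableSnapshot, now: datetime, max_lag: timedelta) -> bool:
    """Freshness: was the table updated recently enough?"""
    return now - snap.last_loaded_at <= max_lag


def check_volume(snap: TableSnapshot, expected_rows: int, tolerance: float = 0.5) -> bool:
    """Volume: is the row count within +/- tolerance of the expected count?"""
    low = expected_rows * (1 - tolerance)
    high = expected_rows * (1 + tolerance)
    return low <= snap.row_count <= high


def check_schema(snap: TableSnapshot, expected: Dict[str, str]) -> Dict[str, List[str]]:
    """Schema: report columns that were added, removed, or changed type."""
    added = sorted(set(snap.columns) - set(expected))
    removed = sorted(set(expected) - set(snap.columns))
    changed = sorted(
        c for c in set(snap.columns) & set(expected) if snap.columns[c] != expected[c]
    )
    return {"added": added, "removed": removed, "changed": changed}


now = datetime(2023, 8, 11, 12, 0, tzinfo=timezone.utc)
snap = TableSnapshot(
    last_loaded_at=now - timedelta(hours=30),
    row_count=480,
    columns={"id": "int", "amount": "str", "region": "str"},
)
print(check_freshness(snap, now, max_lag=timedelta(hours=24)))  # False: 30h old
print(check_volume(snap, expected_rows=1000))  # False: 480 is outside 500..1500
print(check_schema(snap, {"id": "int", "amount": "float", "created_at": "timestamp"}))
```

Distribution and lineage need more context (value-level statistics and a dependency graph between tables), so they are left out of this sketch.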

What’s new with Databricks SQL?

databricks

At this year's Data+AI Summit, Databricks SQL continued to push the boundaries of what a data warehouse can be, leveraging AI across the…

Data Scientists Need to Specialize to Survive the Tech Winter

KDnuggets

In this article, I explore the benefits of specialization for data scientists. Drawing on my own experience as a data scientist, I argue that specializing in a specific area can help you stand out in a crowded job market and provide you with more fulfilling career opportunities.

Apache Airflow®: The Ultimate Guide to DAG Writing

Speaker: Tamara Fingerlin, Developer Advocate

In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!

Scaling the Instagram Explore recommendations system

Engineering at Meta

Explore is one of the largest recommendation systems on Instagram. We leverage machine learning to make sure people are always seeing content that is the most interesting and relevant to them. Using more advanced machine learning models, like Two Towers neural networks, we’ve been able to make the Explore recommendation system even more scalable and flexible.
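
The core idea of a two-tower model is that users and items are encoded by separate networks ("towers") into the same embedding space, so ranking candidates reduces to a similarity score, typically a dot product. The sketch below uses tiny hand-written one-layer towers in place of trained neural networks; the weights, features, and function names are hypothetical and are not Meta's model.

```python
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]


def linear(weights: Matrix, x: Vector) -> Vector:
    """One dense layer without bias: y = W x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def relu(x: Vector) -> Vector:
    return [max(0.0, v) for v in x]


def tower(weights: Matrix, features: Vector) -> Vector:
    """A toy one-layer tower mapping raw features to an embedding."""
    return relu(linear(weights, features))


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


# Hypothetical weights: each tower maps 3 input features to a 2-d embedding.
USER_W = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.5]]
ITEM_W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]


def rank_items(user_features: Vector, items: Dict[str, Vector], k: int = 2) -> List[str]:
    """Embed the user once, score each item embedding by dot product, keep top-k."""
    u = tower(USER_W, user_features)
    # In a real system the item embeddings would be precomputed and indexed.
    scores = {name: dot(u, tower(ITEM_W, feats)) for name, feats in items.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


items = {
    "cooking_video": [0.9, 0.1, 0.0],
    "football_clip": [0.1, 0.9, 0.0],
    "travel_photo": [0.5, 0.5, 0.0],
}
print(rank_items([1.0, 0.0, 0.0], items))
```

Because the item tower does not depend on the user, item embeddings can be computed offline and served from a nearest-neighbor index, which is a large part of why this architecture scales.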

Supercharging your Rust static executables with mimalloc

Tweag

Why link statically against musl? Have you ever faced compatibility issues when dealing with Linux binary executables? The culprit is often the libc implementation, glibc. Acting as the backbone of nearly all Linux distros, glibc is the library responsible for providing standard C functions. Yet, its version compatibility often poses a challenge. Binaries compiled with a newer version of glibc may not function on systems running an older one, creating a compatibility headache.

How to execute your operating model for Data and AI

databricks

In Part 1 of this blog series, we discussed how Databricks enables organizations to develop, manage and operate processes that extract value from…

Best Python Tools for Building Generative AI Applications Cheat Sheet

KDnuggets

KDnuggets' new cheat sheet summarizes the top Python libraries for building generative AI apps, from OpenAI and Transformers to tools like Gradio, Diffusers, LangChain, and more. Ideal for both beginners and experts looking for a quick reference.

Optimizing The Modern Developer Experience with Coder

Many software teams have migrated their testing and production workloads to the cloud, yet development environments often remain tied to outdated local setups, limiting efficiency and growth. This is where Coder comes in. In our 101 Coder webinar, you’ll explore how cloud-based development environments can unlock new levels of productivity. Discover how to transition from local setups to a secure, cloud-powered ecosystem with ease.

Fixit 2: Meta’s next-generation auto-fixing linter

Engineering at Meta

Fixit is dead! Long live Fixit 2 – the latest version of our open-source auto-fixing linter. Fixit 2 allows developers to efficiently build custom lint rules and perform auto-fixes for their codebases. Fixit 2 is available today on PyPI. Python is one of the most popular languages in use at Meta. Meta’s production engineers (PEs) are specialized software engineers (SWEs) who focus on reliability, efficiency, and scalability.
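
Fixit is built on LibCST, which preserves formatting when it rewrites code. As a dependency-free illustration of what an auto-fixing lint rule does, the sketch below uses Python's standard `ast` module instead: it reports comparisons such as `x == None` and rewrites them to `x is None`. The rule and function names are hypothetical and this is not Fixit's API; note also that `ast.unparse` (Python 3.9+) discards comments and original formatting, which is exactly what a LibCST-based tool avoids.

```python
import ast
from typing import List, Tuple


class CompareToNoneFixer(ast.NodeTransformer):
    """Hypothetical rule: flag `== None` / `!= None`, rewrite to `is` / `is not`."""

    def __init__(self) -> None:
        self.reports: List[Tuple[int, str]] = []

    def visit_Compare(self, node: ast.Compare) -> ast.AST:
        self.generic_visit(node)  # fix nested comparisons first
        new_ops = []
        for op, right in zip(node.ops, node.comparators):
            is_none = isinstance(right, ast.Constant) and right.value is None
            if is_none and isinstance(op, ast.Eq):
                self.reports.append((node.lineno, "use 'is None' instead of '== None'"))
                new_ops.append(ast.Is())
            elif is_none and isinstance(op, ast.NotEq):
                self.reports.append((node.lineno, "use 'is not None' instead of '!= None'"))
                new_ops.append(ast.IsNot())
            else:
                new_ops.append(op)
        node.ops = new_ops
        return node


def lint_and_fix(source: str) -> Tuple[List[Tuple[int, str]], str]:
    """Return (reports, fixed_source) for a snippet of Python code."""
    tree = ast.parse(source)
    fixer = CompareToNoneFixer()
    fixed_tree = fixer.visit(tree)
    return fixer.reports, ast.unparse(fixed_tree)


reports, fixed = lint_and_fix("if x == None:\n    y = z != None\n")
print(reports)
print(fixed)
```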

What is an Apache Kafka Cluster? (And Why You Should Care)

Confluent

Learn what an Apache Kafka cluster is, and what makes a cluster special.
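
A Kafka cluster is a set of brokers that jointly host a topic's partitions, with each partition replicated to several brokers. The sketch below illustrates two ideas behind that: records with the same key go to the same partition (which is what preserves per-key ordering), and replicas of each partition live on distinct brokers. It is a simplification under stated assumptions: keys are hashed with `zlib.crc32` rather than the murmur2 hash used by Kafka's Java client by default, and replicas are placed round-robin rather than with Kafka's own assignment algorithm, so the numbers will not match a real cluster.

```python
import zlib
from typing import Dict, List


def partition_for_key(key: str, num_partitions: int) -> int:
    """Map a record key to a partition; equal keys always land on the same one.

    Assumption: crc32 stands in for Kafka's murmur2, so results differ from Kafka.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def assign_replicas(num_partitions: int, brokers: List[int],
                    replication_factor: int) -> Dict[int, List[int]]:
    """Simplified round-robin placement of each partition's replicas.

    The first broker in each list plays the role of the partition leader.
    """
    if replication_factor > len(brokers):
        raise ValueError("replication factor cannot exceed the number of brokers")
    return {
        p: [brokers[(p + r) % len(brokers)] for r in range(replication_factor)]
        for p in range(num_partitions)
    }


assignment = assign_replicas(num_partitions=3, brokers=[101, 102, 103], replication_factor=2)
print(assignment)  # {0: [101, 102], 1: [102, 103], 2: [103, 101]}
p = partition_for_key("customer-42", num_partitions=3)
print(p, assignment[p])
```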

A New Partnership with Redox and How We Unlock Healthcare Data to Drive Advanced Analytics

databricks

Healthcare is sitting on mountains of data. Pop quiz: Which industry accounts for about 30% of newly created data around the world and…

Overcoming Barriers in Multi-lingual Voice Technology: Top 5 Challenges and Innovative Solutions

KDnuggets

Voice assistants like Siri, Alexa and Google Assistant are household names, but they still don't do well in multilingual settings. This article first provides an overview of how voice assistants work, and then dives into the top 5 challenges for voice assistants when it comes to providing a superior multilingual user experience. It also provides strategies for mitigation of these challenges.

15 Modern Use Cases for Enterprise Business Intelligence

Large enterprises face unique challenges in optimizing their Business Intelligence (BI) output due to the sheer scale and complexity of their operations. Unlike smaller organizations, where basic BI features and simple dashboards might suffice, enterprises must manage vast amounts of data from diverse sources. What are the top modern BI use cases for enterprise businesses to help you get a leg up on the competition?

Using short-lived certificates to protect TLS secrets

Engineering at Meta

Short-lived certificates (SLCs) are part of our latest efforts to further secure our Transport Layer Security (TLS) private keys on our edge networks. SLCs have a much shorter exposure window than traditional certificates, which lowers the chance that a compromised private key is abused. Implementing SLCs has required us to address tradeoffs between operability and reliability, while satisfying the strict security requirements of our edge environment.
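
The security argument is about the exposure window: a stolen private key stays useful only while a certificate for it is still valid. The sketch below compares that window for a long-lived and a short-lived certificate, and applies a simple "renew after two-thirds of the lifetime" rule. The lifetimes and the renewal fraction are hypothetical values chosen for illustration; the excerpt does not state the ones Meta uses.

```python
from datetime import datetime, timedelta, timezone


def remaining_exposure(not_after: datetime, compromised_at: datetime) -> timedelta:
    """How long a key stolen at `compromised_at` stays usable, ignoring revocation."""
    return max(not_after - compromised_at, timedelta(0))


def needs_renewal(not_before: datetime, not_after: datetime, now: datetime,
                  renew_fraction: float = 2 / 3) -> bool:
    """Renew once `renew_fraction` of the validity period has elapsed."""
    lifetime = not_after - not_before
    return now >= not_before + lifetime * renew_fraction


issued = datetime(2023, 8, 1, tzinfo=timezone.utc)
leak = issued + timedelta(days=1)
long_lived = issued + timedelta(days=90)  # hypothetical traditional lifetime
short_lived = issued + timedelta(days=3)  # hypothetical short lifetime

print(remaining_exposure(long_lived, leak))   # 89 days, 0:00:00
print(remaining_exposure(short_lived, leak))  # 2 days, 0:00:00
print(needs_renewal(issued, short_lived, now=issued + timedelta(days=2, hours=12)))  # True
```

The tradeoff the post mentions follows directly: a shorter lifetime shrinks the exposure window, but it means renewals happen far more often and the issuing infrastructure must stay highly available.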

Reimagining a classic Cheysson thematic map

ArcGIS

Here's a re-think of a classic. I'll rationalize some data-viz choices and layout choices and end up with something completely different.

Multiple Stateful Operators in Structured Streaming

databricks

In the world of data engineering, there are operations that have been used since the birth of ETL. You filter. You join. You…

Fundamentals Of Statistics For Data Scientists and Analysts

KDnuggets

Key statistical concepts for your data science or data analysis journey.

Prepare Now: 2025’s Must-Know Trends For Product And Data Leaders

Speaker: Jay Allardyce, Deepak Vittal, Terrence Sheflin, and Mahyar Ghasemali

As we look ahead to 2025, business intelligence and data analytics are set to play pivotal roles in shaping success. Organizations are already starting to face a host of transformative trends as the year comes to a close, including the integration of AI in data analytics, an increased emphasis on real-time data insights, and the growing importance of user experience in BI solutions.

How Meta is improving password security and preserving privacy

Engineering at Meta

Meta is developing new privacy-enhancing technologies (PETs) to innovate and solve problems with less data. These technologies enable teams to build and launch privacy-enhanced products in a way that’s verifiable and safeguards user data. Using state-of-the-art cryptographic techniques, we have developed Private Data Lookup (PDL) that allows users to privately query a server-side data set.

Startup Spotlight: Tesorio Helps Finance Teams Tackle Cash Flow Challenges

Snowflake

Welcome to Snowflake’s Startup Spotlight, where we learn about awesome companies building businesses on Snowflake. Can accounts receivable be an agent of change? Tesorio Co-Founder and CTO Fabio Fleitas thinks so, and his startup’s AI/ML-driven platform aims to give finance teams better control over their cash flow so they can have greater impact on their organizations’ success.

HDFS Snapshot Best Practices

Cloudera

The snapshots feature of the Apache Hadoop Distributed File System (HDFS) enables you to capture point-in-time copies of the file system and protect your important data against corruption and user or application errors. This feature is available in all versions of Cloudera Data Platform (CDP), Cloudera Distribution for Hadoop (CDH) and Hortonworks Data Platform (HDP).

A Comprehensive Guide to MLOps

KDnuggets

Machine Learning Operations (MLOps) is a relatively new discipline that provides the structure and support necessary for machine learning (ML) models to thrive in production environments.

How to Drive Cost Savings, Efficiency Gains, and Sustainability Wins with MES

Speaker: Nikhil Joshi, Founder & President of Snic Solutions

Is your manufacturing operation reaching its efficiency potential? A Manufacturing Execution System (MES) could be the game-changer, helping you reduce waste, cut costs, and lower your carbon footprint. Join Nikhil Joshi, Founder & President of Snic Solutions, in this value-packed webinar as he breaks down how MES can drive operational excellence and sustainability.

How to Build a Fully Automated Data Drift Detection Pipeline

Towards Data Science

An Automated Guide to Detecting and Handling Data Drift
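
One common way to quantify drift in such a pipeline is the population stability index (PSI), which compares the binned distribution of a feature in a reference window against a current window. This is a swapped-in illustration, not necessarily the method the linked guide uses; the bin edges, the smoothing constant, and the 0.2 alert threshold below are conventional rules of thumb rather than universal values.

```python
import math
from bisect import bisect_right
from typing import List, Sequence


def bin_fractions(values: Sequence[float], edges: List[float], eps: float = 1e-6) -> List[float]:
    """Fraction of values in each bin; `edges` are the interior cut points.

    A small epsilon keeps empty bins from causing log(0) or division by zero.
    """
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[bisect_right(edges, v)] += 1
    total = len(values)
    return [max(c / total, eps) for c in counts]


def psi(reference: Sequence[float], current: Sequence[float], edges: List[float]) -> float:
    """Population stability index: sum over bins of (cur - ref) * ln(cur / ref)."""
    ref = bin_fractions(reference, edges)
    cur = bin_fractions(current, edges)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))


def drift_detected(reference: Sequence[float], current: Sequence[float],
                   edges: List[float], threshold: float = 0.2) -> bool:
    """Flag drift when PSI exceeds a rule-of-thumb threshold."""
    return psi(reference, current, edges) > threshold


edges = [10.0, 20.0, 30.0]
reference = [5, 12, 15, 22, 25, 28, 35, 8, 18, 27]
same_shape = [6, 13, 16, 21, 24, 29, 36, 9, 17, 26]
shifted = [31, 33, 35, 38, 40, 29, 32, 36, 39, 41]
print(psi(reference, same_shape, edges))          # 0.0: identical bin fractions
print(drift_detected(reference, shifted, edges))  # True
```

In an automated pipeline, a check like this would typically run on a schedule per feature, with the result feeding an alert or a retraining step.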

The LLM Factory: Driven by Snowflake and NVIDIA 

Snowflake

Snowflake recently announced a collaboration with NVIDIA to make it easy to run NVIDIA accelerated computing workloads directly within Snowflake accounts. One interesting use case is to train, customize, and deploy large language models (LLMs) safely and securely within Snowflake. Our new Snowpark Container Services, currently in private preview, together with NVIDIA AI, makes this possible.

How to Make Your Own Search Engine: Semantic Search With LLM Embeddings by William Booth-Clibborn

Scott Logic

Google’s largest revenue source is its adverts, which make up 80% of its revenue. This relies on Google’s dominance of the search engine market, with Google Search enjoying a 92% market share. This is because Google Search prioritises web pages that use Google Ads, and the self-proclaimed second-largest search engine on the internet is YouTube, which exclusively uses Google Ads.
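
The retrieval step of semantic search can be sketched as follows: embed the documents and the query with the same model, then rank documents by cosine similarity to the query. The three-dimensional vectors below are hand-written stand-ins for real LLM embeddings (which typically have hundreds or thousands of dimensions), and `search` is a hypothetical helper, not code from the post.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity: dot product divided by the product of the norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query_vec: Vector, index: Dict[str, Vector], k: int = 2) -> List[Tuple[str, float]]:
    """Brute-force nearest neighbours; large systems use an approximate index."""
    scored = [(doc, cosine(query_vec, vec)) for doc, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical embeddings; in practice these come from an embedding model.
index = {
    "How to bake sourdough bread": [0.9, 0.1, 0.0],
    "Beginner's guide to running": [0.0, 0.2, 0.9],
    "Best flour for homemade pizza": [0.7, 0.4, 0.2],
}
query = [0.85, 0.2, 0.05]  # stand-in for an embedding of "baking with yeast"
for doc, score in search(query, index):
    print(f"{score:.3f}  {doc}")
```

Unlike keyword matching, neither top result needs to share a word with the query; closeness in embedding space is what ranks them.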

Unveiling StableCode: A New Horizon in AI-Assisted Coding

KDnuggets

This article explores StableCode, an innovative AI product by Stability AI, designed to enhance coding efficiency and accessibility. It delves into its unique features, underlying technology, and potential impact on the developer community.

The Cloud Development Environment Adoption Report

Cloud Development Environments (CDEs) are changing how software teams work by moving development to the cloud. Our Cloud Development Environment Adoption Report gathers insights from 223 developers and business leaders, uncovering key trends in CDE adoption. With 66% of large organizations already using CDEs, these platforms are quickly becoming essential to modern development practices.