Sat. Aug 05, 2023 - Fri. Aug 11, 2023

Why Is Data Modeling So Challenging – How To Data Model For Analytics

Seattle Data Guy

Learning how to data model from basic star schemas on the internet is like learning data science using the Iris data set. It works great as a toy example. But it doesn’t match real life at all. Data modeling in real life requires that you fully understand the data sources and your business use cases.…

A senior engineer/EM job search story

The Pragmatic Engineer

👋 Hi, this is Gergely with a bonus, free issue of the Pragmatic Engineer Newsletter. In every issue, I cover topics related to Big Tech and startups through the lens of engineering managers and senior engineers. In this article, we cover one out of five topics from today’s subscriber-only The Pulse issue. To get full issues twice a week, subscribe here.

Senior Engineer – The Number One Skill

Confessions of a Data Guy

Do you think I’m just trying to get you to click? Maybe. Maybe not. After working in and around data teams for well over a decade, with both the smartest people to touch the keyboard and the others, it’s become quite clear to me what the number one skill is that identifies a Senior-level Engineering […]

_spark_metadata in Apache Spark Structured Streaming issue is no more!

Waitingforcode

There are probably fewer people working on flat files with Structured Streaming today than 5 years ago, thanks to table file formats. However, if you are in this group and are still generating CSV or JSON files with the streaming sink, brace yourself: memory problems are coming if you don't take action!

15 Modern Use Cases for Enterprise Business Intelligence

Large enterprises face unique challenges in optimizing their Business Intelligence (BI) output due to the sheer scale and complexity of their operations. Unlike smaller organizations, where basic BI features and simple dashboards might suffice, enterprises must manage vast amounts of data from diverse sources. What are the top modern BI use cases for enterprise businesses to help you get a leg up on the competition?

Quantifying The Return On Investment For Your Data Team

Data Engineering Podcast

Summary: As businesses increasingly invest in technology and talent focused on data engineering and analytics, they want to know whether they are benefiting. So how do you calculate the return on investment for data? In this episode Barr Moses and Anna Filippova explore that question and provide useful exercises to start answering it in your company.

Are reports of StackOverflow’s fall greatly exaggerated?

The Pragmatic Engineer

👋 Hi, this is Gergely with a bonus, free issue of the Pragmatic Engineer Newsletter. In every issue, I cover topics related to Big Tech and startups through the lens of engineering managers and senior engineers. In this article, we cover one out of five topics from today’s subscriber-only The Pulse issue. To get full issues twice a week, subscribe here.

More Trending

Supercharging your Rust static executables with mimalloc

Tweag

Why link statically against musl? Have you ever faced compatibility issues when dealing with Linux binary executables? The culprit is often the libc implementation, glibc. Acting as the backbone of nearly all Linux distros, glibc is the library responsible for providing standard C functions. Yet, its version compatibility often poses a challenge. Binaries compiled with a newer version of glibc may not function on systems running an older one, creating a compatibility headache.

Scaling the Instagram Explore recommendations system

Engineering at Meta

Explore is one of the largest recommendation systems on Instagram. We leverage machine learning to make sure people are always seeing content that is the most interesting and relevant to them. Using more advanced machine learning models, like Two Towers neural networks, we’ve been able to make the Explore recommendation system even more scalable and flexible.

Startup Spotlight: Tesorio Helps Finance Teams Tackle Cash Flow Challenges

Snowflake

Welcome to Snowflake’s Startup Spotlight, where we learn about awesome companies building businesses on Snowflake. Can accounts receivable be an agent of change? Tesorio Co-Founder and CTO Fabio Fleitas thinks so, and his startup’s AI/ML-driven platform aims to give finance teams better control over their cash flow so they can have greater impact on their organizations’ success.

Best Python Tools for Building Generative AI Applications Cheat Sheet

KDnuggets

KDnuggets' new cheat sheet summarizes the top Python libraries for building generative AI apps, from OpenAI and Transformers to tools like Gradio, Diffusers, LangChain, and more. Ideal for both beginners and experts looking for a quick reference.

Prepare Now: 2025's Must-Know Trends For Product And Data Leaders

Speaker: Jay Allardyce, Deepak Vittal, and Terrence Sheflin

As we look ahead to 2025, business intelligence and data analytics are set to play pivotal roles in shaping success. Organizations are already starting to face a host of transformative trends as the year comes to a close, including the integration of AI in data analytics, an increased emphasis on real-time data insights, and the growing importance of user experience in BI solutions.

How to execute your operating model for Data and AI

databricks

In Part 1 of this blog series, we discussed how Databricks enables organizations to develop, manage and operate processes that extract value from.

Fixit 2: Meta’s next-generation auto-fixing linter

Engineering at Meta

Fixit is dead! Long live Fixit 2 – the latest version of our open-source auto-fixing linter. Fixit 2 allows developers to efficiently build custom lint rules and perform auto-fixes for their codebases. Fixit 2 is available today on PyPI. Python is one of the most popular languages in use at Meta. Meta’s production engineers (PEs) are specialized software engineers (SWEs) who focus on reliability, efficiency, and scalability.

HDFS Snapshot Best Practices

Cloudera

Introduction: The snapshots feature of the Apache Hadoop Distributed Filesystem (HDFS) enables you to capture point-in-time copies of the file system and protect your important data against corruption and user or application errors. This feature is available in all versions of Cloudera Data Platform (CDP), Cloudera Distribution for Hadoop (CDH) and Hortonworks Data Platform (HDP).

Data Scientists Need to Specialize to Survive the Tech Winter

KDnuggets

In this article, I explore the benefits of specialization for data scientists. Drawing on my own experience as a data scientist, I argue that specializing in a specific area can help you stand out in a crowded job market and provide you with more fulfilling career opportunities.

How to Drive Cost Savings, Efficiency Gains, and Sustainability Wins with MES

Speaker: Nikhil Joshi, Founder & President of Snic Solutions

Is your manufacturing operation reaching its efficiency potential? A Manufacturing Execution System (MES) could be the game-changer, helping you reduce waste, cut costs, and lower your carbon footprint. Join Nikhil Joshi, Founder & President of Snic Solutions, in this value-packed webinar as he breaks down how MES can drive operational excellence and sustainability.

What’s new with Databricks SQL?

databricks

At this year's Data+AI Summit, Databricks SQL continued to push the boundaries of what a data warehouse can be, leveraging AI across the.

Using short-lived certificates to protect TLS secrets

Engineering at Meta

Short-lived certificates (SLCs) are part of our latest efforts to further secure our Transport Layer Security (TLS) private keys on our edge networks. SLCs have a very short exposure compared to traditional certificates and lower the chances of a compromised private key being abused. Implementing SLCs has required us to address tradeoffs between operability and reliability, while satisfying the strict security requirements of our edge environment.

The LLM Factory: Driven by Snowflake and NVIDIA 

Snowflake

Snowflake recently announced a collaboration with NVIDIA to make it easy to run NVIDIA accelerated computing workloads directly within Snowflake accounts. One interesting use case is to train, customize, and deploy large language models (LLMs) safely and securely within Snowflake. Our new Snowpark Container Services, currently in private preview, together with NVIDIA AI, makes this possible.

Overcoming Barriers in Multi-lingual Voice Technology: Top 5 Challenges and Innovative Solutions

KDnuggets

Voice assistants like Siri, Alexa and Google Assistant are household names, but they still don't do well in multilingual settings. This article first provides an overview of how voice assistants work, and then dives into the top 5 challenges for voice assistants when it comes to providing a superior multilingual user experience. It also provides strategies for mitigation of these challenges.

Improving the Accuracy of Generative AI Systems: A Structured Approach

Speaker: Anindo Banerjea, CTO at Civio & Tony Karrer, CTO at Aggregage

When developing a Gen AI application, one of the most significant challenges is improving accuracy. This can be especially difficult when working with a large data corpus, and as the complexity of the task increases. The number of use cases/corner cases that the system is expected to handle essentially explodes. 💥 Anindo Banerjea is here to showcase his significant experience building AI/ML SaaS applications as he walks us through the current problems his company, Civio, is solving.

Multiple Stateful Operators in Structured Streaming

databricks

In the world of data engineering, there are operations that have been used since the birth of ETL. You filter. You join. You.

How Meta is improving password security and preserving privacy

Engineering at Meta

Meta is developing new privacy-enhancing technologies (PETs) to innovate and solve problems with less data. These technologies enable teams to build and launch privacy-enhanced products in a way that’s verifiable and safeguards user data. Using state-of-the-art cryptographic techniques, we have developed Private Data Lookup (PDL) that allows users to privately query a server-side data set.

Reimagining a classic Cheysson thematic map

ArcGIS

Here's a re-think of a classic. I'll rationalize some data-viz and layout choices and end up with something completely different.

A Comprehensive Guide to MLOps

KDnuggets

Machine Learning Operations (MLOps) is a relatively new discipline that provides the structure and support necessary for machine learning (ML) models to thrive in production environments.

The Ultimate Guide To Data-Driven Construction: Optimize Projects, Reduce Risks, & Boost Innovation

Speaker: Donna Laquidara-Carr, PhD, LEED AP, Industry Insights Research Director at Dodge Construction Network

In today’s construction market, owners, construction managers, and contractors must navigate increasing challenges, from cost management to project delays. Fortunately, digital tools now offer valuable insights to help mitigate these risks. However, the sheer volume of tools and the complexity of leveraging their data effectively can be daunting. That’s where data-driven construction comes in.

A New Partnership with Redox and How We Unlock Healthcare Data to Drive Advanced Analytics

databricks

Healthcare is sitting on mountains of data. Pop quiz: Which industry accounts for about 30% of newly created data around the world and.

Pioneering Data Observability: Data, Code, Infrastructure, & AI

Towards Data Science

Outlining the past, present, and future of architecting reliable data systems. When we launched the data observability category in 2019, the term was something I could barely pronounce.

Confluent Champion: Niki Kapsi’s Journey From SDR to Commercial Account Executive

Confluent

Meet Commercial AE Niki Kapsi and learn about the “entrepreneurial” side of her role at Confluent.

Unveiling StableCode: A New Horizon in AI-Assisted Coding

KDnuggets

This article explores StableCode, an innovative AI product by Stability AI, designed to enhance coding efficiency and accessibility. It delves into its unique features, underlying technology, and potential impact on the developer community.

Business Intelligence 101: How To Make The Best Solution Decision For Your Organization

Speaker: Evelyn Chou

Choosing the right business intelligence (BI) platform can feel like navigating a maze of features, promises, and technical jargon. With so many options available, how can you ensure you’re making the right decision for your organization’s unique needs? 🤔 This webinar brings together expert insights to break down the complexities of BI solution vetting.

How to Make Your Own Search Engine: Semantic Search With LLM Embeddings by William Booth-Clibborn

Scott Logic

Google’s largest revenue source is its adverts, which comprise 80% of its revenue. This relies on Google’s domination of the search engine market, with Google Search enjoying a 92% market share. This is because Google Search prioritises web pages that use Google Ads, and the self-proclaimed second-largest search engine on the internet is YouTube, which exclusively uses Google Ads.

How to Build a Fully Automated Data Drift Detection Pipeline

Towards Data Science

An Automated Guide to Detect and Handle Data Drift

What is an Apache Kafka Cluster? (And Why You Should Care)

Confluent

Learn what an Apache Kafka cluster is, and what makes a cluster special.

5 Python Packages For Geospatial Data Analysis

KDnuggets

This article discusses the importance of geospatial analysis and introduces five essential Python packages for effectively handling and visualizing valuable insights from geospatial data.

Driving Responsible Innovation: How to Navigate AI Governance & Data Privacy

Speaker: Aindra Misra, Senior Manager, Product Management (Data, ML, and Cloud Infrastructure) at BILL

Join us for an insightful webinar that explores the critical intersection of data privacy and AI governance. In today’s rapidly evolving tech landscape, building robust governance frameworks is essential to fostering innovation while staying compliant with regulations. Our expert speaker, Aindra Misra, will guide you through best practices for ensuring data protection while leveraging AI capabilities.