Sat.Jan 07, 2023 - Fri.Jan 13, 2023

Inside Pollen's Software Engineering Salaries

The Pragmatic Engineer

👋 Hi, this is Gergely with a bonus, free issue of the Pragmatic Engineer Newsletter. We cover one and a half out of eight topics in today’s subscriber-only issue, Inside Pollen's Transparent Compensation Data. If you’re not yet a subscriber, you also missed this week’s deep-dive on Becoming a Fractional CTO. To get this newsletter every week, subscribe here.

Simplify Delta Lake Complexity with mack.

Confessions of a Data Guy

Anyone who’s been roaming around the forest of Data Engineering has probably run into many of the newish tools that have been growing rapidly around the concepts of Data Warehouses, Data Lakes, and Lake Houses … the merging of the old relational database functionality with TB and PB level cloud-based file storage systems. Tools like […]

Data Lake 162

Data Pipeline Design Patterns - #2. Coding patterns in Python

Start Data Engineering

Contents: Introduction; Sample project; Code design patterns (1. Functional design, 2. Factory pattern, 3. Strategy pattern, 4. Singleton & Object pool patterns); Python helpers (1. Typing, 2. Dataclass, 3. Context Managers, 4. Testing with pytest, 5. Decorators); Misc; Conclusion; Further reading; References. Introduction: Using the appropriate code design pattern can make your code easier to read, extend, modify, and debug, and can help developers onboard more quickly.

Designing 148

Analysis of Confluent Buying Immerok

Jesse Anderson

If you haven’t heard, Confluent announced they’re buying Immerok. This purchase represents a significant shift in strategy for Confluent. I started a Twitter thread with some of my initial thoughts, but I want to write a post giving more analysis and opinions. In short, I still echo the sentiment from my original tweet “This was always the way it should have been.

Kafka 147

15 Modern Use Cases for Enterprise Business Intelligence

Large enterprises face unique challenges in optimizing their Business Intelligence (BI) output due to the sheer scale and complexity of their operations. Unlike smaller organizations, where basic BI features and simple dashboards might suffice, enterprises must manage vast amounts of data from diverse sources. What are the top modern BI use cases for enterprise businesses to help you get a leg up on the competition?

Data Warehouse Consultants – What Do They Do And Why You Need One

Seattle Data Guy

A data warehouse consultant plays an important role in companies looking to become data-driven. They help companies design and deploy centralized data sets that are easy to use and reliable. But in order to understand why you need a data warehouse consultant, we should take a step back. In this article we will not only…

Using Rust to write a Data Pipeline. Thoughts. Musings.

Confessions of a Data Guy

Rust has been on my mind a lot lately, probably because of Data Engineering boredom, watching Spark clusters chug along like some medieval farm worker endlessly trudging through the muck and mire of life. Maybe Rust has breathed some life back into my stagnant soul, reminding me there is a big world out there, […]

More Trending

Automate Your Pipeline Creation For Streaming Data Transformations With SQLake

Data Engineering Podcast

Summary Managing end-to-end data flows becomes complex and unwieldy as the scale of data and its variety of applications in an organization grows. Part of this complexity is due to the transformation and orchestration of data living in disparate systems. The team at Upsolver is taking aim at this problem with the latest iteration of their platform in the form of SQLake.

Improving Your Data Analytics Infrastructure In 2023 – Part 1

Seattle Data Guy

Data has been consistently demonstrated to be a valuable asset for businesses of all sizes. Consulting firms, like McKinsey, have found that companies using AI and analytics attribute 20% of their earnings to it. As a consultant, I have personally witnessed how data can uncover new sources of revenue and cost reduction opportunities for clients…

Data News — Week 23.01

Christophe Blefari

You and me celebrating 2023 (credits). Happy new year 🎆 For those who were already subscribed at the start of last year, I tried to set resolutions and objectives for the year that I did not manage to follow. The year was so different from what I expected. Maybe this is an excuse. Anyway, I did not reach my goals. What if we don't care this year?

Modern Data Stack: The Struggle of Enterprise Adoption

Simon Späti

In part I, The Open Data Stack Distilled into Four Core Tools, we discussed how to quickly set up a data stack, tackling end-to-end data analytics challenges. As a manager or developer working with data at a mid- to large-sized enterprise, you might ask why aren’t we using any of these tools. In this article, we dive into what mid-to-large-sized companies are using instead, the struggle of setting up a Modern Data Stack (MDS) for an enterprise size, and the opportunities of a free-of-charge and

Prepare Now: 2025's Must-Know Trends For Product And Data Leaders

Speaker: Jay Allardyce, Deepak Vittal, and Terrence Sheflin

As we look ahead to 2025, business intelligence and data analytics are set to play pivotal roles in shaping success. Organizations are already starting to face a host of transformative trends as the year comes to a close, including the integration of AI in data analytics, an increased emphasis on real-time data insights, and the growing importance of user experience in BI solutions.

Where Collaboration Fails Around Data (And 4 Tips for Fixing It)

KDnuggets

Data-driven organizations require complex collaboration between data teams and business stakeholders. Here are 4 proactive tips for reducing information asymmetries and achieving better collaboration.

IT 160

Succeeding with Change Data Capture

Confluent

CDC is a software design pattern that identifies and captures changes made to data in a database. Learn how CDC works, the best solutions, and how to get started with various implementations.
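To make the idea of "identifying and capturing changes" concrete, here is a minimal Python sketch. It is an illustration only and not taken from the Confluent article: it compares two in-memory snapshots of a table, whereas production CDC tools more commonly read the database's transaction log. The names `Snapshot` and `capture_changes` are hypothetical.

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]
Snapshot = Dict[int, Row]  # assumed shape: primary key -> row contents
ChangeEvent = Tuple[str, int, Row]  # (operation, primary key, row)


def capture_changes(before: Snapshot, after: Snapshot) -> List[ChangeEvent]:
    """Return insert/update/delete events that turn `before` into `after`."""
    events: List[ChangeEvent] = []
    for key, row in after.items():
        if key not in before:
            events.append(("insert", key, row))
        elif before[key] != row:
            events.append(("update", key, row))
    for key, row in before.items():
        if key not in after:
            # For deletes, emit the last known version of the row.
            events.append(("delete", key, row))
    return events


before = {1: {"name": "alice"}, 2: {"name": "bob"}}
after = {1: {"name": "alicia"}, 3: {"name": "carol"}}
for event in capture_changes(before, after):
    print(event)
```

Snapshot comparison is simple but has to scan the whole table and misses intermediate changes between snapshots, which is one reason log-based approaches are often preferred.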

Data 124

Product Discovery – Building the Right Things

Teradata

Product discovery is a process that cross functional product teams follow to reduce the uncertainty about a problem worth solving and a solution worth developing. Learn more.

Databricks Power BI Connector Now Supports Native Query

databricks

This is a collaborative post from Databricks and Microsoft. We thank Mahesh Prakriya (Director in Intelligence Platform, Microsoft) and Bob Zhang (Sr. Technical.

BI 97

How to Drive Cost Savings, Efficiency Gains, and Sustainability Wins with MES

Speaker: Nikhil Joshi, Founder & President of Snic Solutions

Is your manufacturing operation reaching its efficiency potential? A Manufacturing Execution System (MES) could be the game-changer, helping you reduce waste, cut costs, and lower your carbon footprint. Join Nikhil Joshi, Founder & President of Snic Solutions, in this value-packed webinar as he breaks down how MES can drive operational excellence and sustainability.

7 Best Platforms to Practice SQL

KDnuggets

Looking to level up your SQL skills? Here's a list of the best platforms to practice SQL, ace your SQL interviews, and land your dream data role.

SQL 149

In the spotlight with Nick Cooper: ThoughtSpot’s Selfless Excellence champion

ThoughtSpot

This is part of our ongoing spotlight series which highlights ThoughtSpot’s quarterly Selfless Excellence champion. Culture and shared values are at the heart of every decision, innovation, and team member at ThoughtSpot. By creating a family-first mentality among a truly diverse and inclusive team, we’ve been able to build more authentic relationships with one another.

Top Data Integrity Trends Fueling Confident Business Decisions in 2023

Precisely

With global data creation projected to grow to more than 180 zettabytes by 2025, it’s not surprising that more organizations than ever are looking to harness their ever-growing datasets to drive more confident business decisions. In fact, a recent study from 451 Research shows that nearly 79% of businesses report data will be more important to their organization’s strategic decision-making over the next 12 months.

Supercharging H3 for Geospatial Analytics

databricks

On the heels of the initial release of H3 support in Databricks Runtime (DBR), we are happy to share ground-breaking performance improvements with.

86

Improving the Accuracy of Generative AI Systems: A Structured Approach

Speaker: Anindo Banerjea, CTO at Civio & Tony Karrer, CTO at Aggregage

When developing a Gen AI application, one of the most significant challenges is improving accuracy. This can be especially difficult when working with a large data corpus, and as the complexity of the task increases. The number of use cases/corner cases that the system is expected to handle essentially explodes. 💥 Anindo Banerjea is here to showcase his significant experience building AI/ML SaaS applications as he walks us through the current problems his company, Civio, is solving.

Top Posts January 2-8: Python Matplotlib Cheat Sheets

KDnuggets

Python Matplotlib Cheat Sheets • Free Data Management with Data Science Learning with CS639 • How to Select Rows and Columns in Pandas Using [ ], .loc, .iloc, .at and .iat • Creating a Web Application to Extract Topics from Audio with Python • More Data Science Cheatsheets.

Python 120

Saving Lives, Saving Costs: Predicting Heart Failure with Teradata

Teradata

A team of Teradata data scientists & industry experts worked alongside a U.S. insurance company to develop a solution that would predict the onset of heart failure 6 months in advance. Find out more.

5 Challenges of Ethical Data Stewardship

Precisely

The pressure is mounting. Data privacy regulations are constantly evolving, and customer preferences and expectations are high and on the move. That means businesses want to provide hyper-personalized experiences, but they also need to ensure they’re using, sharing, and protecting customer data with the utmost integrity. And with the rising focus on environmental, social, and governance (ESG), businesses can no longer rely on quality products alone to win and maintain the support of customers, e

How DoorDash Upgraded a Heuristic with ML to Save Thousands of Canceled Orders

DoorDash Engineering

One challenge in running our platform is being able to accurately track Merchants’ operational status and ability to receive and fulfill orders. For example, when a Merchant’s location is physically closed but marked as open on our platform, we might create a bad experience for all of our users; a Dasher cannot complete their accepted delivery, the Consumer cannot receive their ordered food, and the Merchant could see lower future revenues.

Food 62

The Ultimate Guide To Data-Driven Construction: Optimize Projects, Reduce Risks, & Boost Innovation

Speaker: Donna Laquidara-Carr, PhD, LEED AP, Industry Insights Research Director at Dodge Construction Network

In today’s construction market, owners, construction managers, and contractors must navigate increasing challenges, from cost management to project delays. Fortunately, digital tools now offer valuable insights to help mitigate these risks. However, the sheer volume of tools and the complexity of leveraging their data effectively can be daunting. That’s where data-driven construction comes in.

Overcome Your Data Quality Issues with Great Expectations

KDnuggets

Bad data costs organizations money, reputation, and time. Hence it is very important to monitor and validate data quality continuously.

Data 132

Better Data for Better Decisions in the Public Sector Through Entity Resolution - Part 1

databricks

One of the domains where better decisions mean a better society is the Public Sector. Each and every one of us has a.

Data 82

What Are Node.js Frameworks?: How To Choose the Best Node.js Framework for 2023

Trio

Node.js powers many of the modern real-time web applications you’re likely familiar with. It’s a scalable JavaScript runtime environment widely used to build online games, messengers, video platforms, and more. Technology companies like Netflix, Uber, Trello, and others use Node to create both rich user interfaces (UIs) and server-side environments.

How to build a Snowflake API | Propel Data Analytics Blog

Propel Data

Create and query an API on top of your Snowflake data warehouse using Propel’s blazing-fast Serverless Analytics API Platform

Business Intelligence 101: How To Make The Best Solution Decision For Your Organization

Speaker: Evelyn Chou

Choosing the right business intelligence (BI) platform can feel like navigating a maze of features, promises, and technical jargon. With so many options available, how can you ensure you’re making the right decision for your organization’s unique needs? 🤔 This webinar brings together expert insights to break down the complexities of BI solution vetting.

How to Perform Unit Testing in Python?

KDnuggets

Unit testing is an important part of the software development life cycle as it helps to ensure that code is correct and working as intended. This article aims to introduce the concept of unit testing in Python and provide a basic tutorial on how to write and run unit tests using the unittest module.
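As a rough idea of what such a test looks like, here is a minimal sketch using Python's standard `unittest` module. It is not taken from the KDnuggets article; the function under test, `normalize_email`, and the helper `run_tests` are hypothetical names chosen for illustration.

```python
import unittest


def normalize_email(raw: str) -> str:
    """Hypothetical function under test: trim whitespace and lowercase."""
    return raw.strip().lower()


class NormalizeEmailTest(unittest.TestCase):
    def test_strips_and_lowercases(self):
        self.assertEqual(normalize_email("  Ada@Example.COM "), "ada@example.com")

    def test_rejects_non_string(self):
        # None has no .strip() method, so an AttributeError is expected.
        with self.assertRaises(AttributeError):
            normalize_email(None)


def run_tests() -> unittest.TestResult:
    """Load the test case above and run it, returning the result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(NormalizeEmailTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Calling `run_tests()` runs both tests in-process; when the code lives in a file, running `python -m unittest` from the project directory is the more usual way to discover and execute test cases.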

Python 108

Streaming in Production: Collected Best Practices, Part 2

databricks

In our two-part blog series titled "Streaming in Production: Collected Best Practices," this is the second article. Here we discuss the "After Deployment".

65

Build an end to end JSON logging system for clients apps

Pinterest Engineering

Liang Ma | Software Engineer, Core Eng; Wei Zhu | Software Engineer, Observability. In early 2020, during a critical iOS out-of-memory incident (we have a blogpost for that), we realized that we didn’t have much visibility into how the app was running, or a good system to consult for monitoring and troubleshooting. State of logging: At that time, on the client side, there were a few ways of logging in daily work: Context logging: built for logging and reporting impressions or anything related

Systems 57

The Power of Collaboration in Product Development

Eventbrite Engineering

Product development at Eventbrite is a practice centered around understanding what our customers need, so we can enhance current features or build new products. In order to achieve this, our product team collaborates across multiple disciplines throughout the company to ensure we’re thinking about customer needs from all angles. Who is involved in product development?

Driving Responsible Innovation: How to Navigate AI Governance & Data Privacy

Speaker: Aindra Misra, Senior Manager, Product Management (Data, ML, and Cloud Infrastructure) at BILL

Join us for an insightful webinar that explores the critical intersection of data privacy and AI governance. In today’s rapidly evolving tech landscape, building robust governance frameworks is essential to fostering innovation while staying compliant with regulations. Our expert speaker, Aindra Misra, will guide you through best practices for ensuring data protection while leveraging AI capabilities.