December, 2021

article thumbnail

Revisiting The Technical And Social Benefits Of The Data Mesh

Data Engineering Podcast

Summary The data mesh is a thesis that was presented to address the technical and organizational challenges that businesses face in managing their analytical workflows at scale. Zhamak Dehghani introduced the concepts behind this architectural patterns in 2019, and since then it has been gaining popularity with many companies adopting some version of it in their systems.

BI 130
article thumbnail

Azure Data Factory Linked Service: Advanced Authoring

Azure Data Engineering

We have discussed Linked Service parameterization through the UI, in a previous post. But not all Linked Service Types support parametrization using the UI. In this post, we will discuss the Linked Services that can’t be parameterized using the UI. (i.e., they don’t have any option to add parameter). If you are familiar with Azure Services, you might know that the Linked Services or any other Azure artefact has corresponding underlying JSON code.

Coding 130
Insiders

Sign Up for our Newsletter

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

article thumbnail

How to choose the right tools for your data pipeline

Start Data Engineering

1. Introduction 2. Requirements 3. Components 4. Choosing tools 4.1 Requirement x Component framework 4.2 Filters 5. Conclusion 6. Further reading 1. Introduction If you are building data pipelines from the ground up, the number of available data engineering tools to choose from can be overwhelming. If you are thinking Most of the tools seem to be doing the same/similar thing, which one should I choose?

article thumbnail

6 Predictive Models Every Beginner Data Scientist Should Master

KDnuggets

Data Science models come with different flavors and techniques — luckily, most advanced models are based on a couple of fundamentals. Which models should you learn when you want to begin a career as Data Scientist? This post brings you 6 models that are widely used in the industry, either in standalone form or as a building block for other advanced techniques.

article thumbnail

15 Modern Use Cases for Enterprise Business Intelligence

Large enterprises face unique challenges in optimizing their Business Intelligence (BI) output due to the sheer scale and complexity of their operations. Unlike smaller organizations, where basic BI features and simple dashboards might suffice, enterprises must manage vast amounts of data from diverse sources. What are the top modern BI use cases for enterprise businesses to help you get a leg up on the competition?

article thumbnail

A Guide to Stream Processing and ksqlDB Fundamentals

Confluent

Event streaming applications are a powerful way to react to events as they happen and to take advantage of data while it is fresh. However, they can be a challenge […].

Process 141
article thumbnail

Cloudera Response to CVE-2021-44228

Cloudera

Summary. On December 10th 2021, the Apache Software Foundation released version 2.15.0 of the Log4j Java logging library, fixing CVE-2021-44228 , a remote code execution vulnerability affecting Log4j 2.0-2.14. An attacker can use this vulnerability to instruct affected systems to download and execute a malicious payload through submitting a custom-crafted request.

Java 127

More Trending

article thumbnail

Cadence Multi-Tenant Task Processing

Uber Engineering

Introduction. Cadence is a multi-tenant orchestration framework that helps developers at Uber to write fault-tolerant, long-running applications, also known as workflows. It scales horizontally to handle millions of concurrent executions from various customers. It is currently used by hundreds of … The post Cadence Multi-Tenant Task Processing appeared first on Uber Engineering Blog.

Process 104
article thumbnail

2021 Gift Giving Guide for Data Nerds

DataKitchen

Back by popular demand, we’ve updated our data nerd Gift Giving Guide to cap off 2021. We’ve kept some classics and added some new titles that are sure to put a smile on your data nerd’s face. Here are eight highly recommendable books to help you find that special gift. ?? ?? ???. Fail Fast, Learn Faster: Lessons in Data-Driven Leadership in an Age of Disruption, Big Data, and AI, by Randy Bean.

article thumbnail

Building a solid data team

KDnuggets

How do you put together a solid data science team when it comes to developing data-driven products? A variety of roles are available to consider, so which ones do you need and which are most crucial?

Building 160
article thumbnail

Best Tutorials for Getting Started with Apache Kafka

Confluent

Each one of the more than 50 tutorials for Apache Kafka® on Confluent Developer answers a question that you might ask a knowledgeable friend or colleague about Kafka and its […].

Kafka 135
article thumbnail

Prepare Now: 2025s Must-Know Trends For Product And Data Leaders

Speaker: Jay Allardyce, Deepak Vittal, and Terrence Sheflin

As we look ahead to 2025, business intelligence and data analytics are set to play pivotal roles in shaping success. Organizations are already starting to face a host of transformative trends as the year comes to a close, including the integration of AI in data analytics, an increased emphasis on real-time data insights, and the growing importance of user experience in BI solutions.

article thumbnail

Create your Private Data Warehousing Environment Using Azure Kubernetes Service

Cloudera

For Cloudera ensuring data security is critical because we have large customers in highly regulated industries like financial services and healthcare, where security is paramount. Also, for other industries like retail, telecom or public sector that deal with large amounts of customer data and operate multi-tenant environments, sometimes with end users who are outside of their company, securing all the data may be a very time intensive process.

article thumbnail

Building Auditable Spark Pipelines At Capital One

Data Engineering Podcast

Summary Spark is a powerful and battle tested framework for building highly scalable data pipelines. Because of its proven ability to handle large volumes of data Capital One has invested in it for their business needs. In this episode Gokul Prabagaren shares his use for it in calculating your rewards points, including the auditing requirements and how he designed his pipeline to maintain all of the necessary information through a pattern of data enrichment.

Building 130
article thumbnail

Reference Data: Smoothing Out the Bumps in M&A

Teradata

M&A is an important part of an organization's growth strategy. Getting reference data right can be foundational to overcoming many challenges that come with it.

Data 98
article thumbnail

DataKitchen’s Best of 2021 DataOps Resources

DataKitchen

Before we shut the door on 2021, we would like to share our most popular DataOps content in hopes that it can help you as you learn about and implement DataOps. We hope you and your family have happy holidays and we look forward to continuing your DataOps journey with you in the new year. Without further ado, here are DataKitchen’s top ten blog posts, top five white papers, and top five webinars from 2021.

article thumbnail

How to Drive Cost Savings, Efficiency Gains, and Sustainability Wins with MES

Speaker: Nikhil Joshi, Founder & President of Snic Solutions

Is your manufacturing operation reaching its efficiency potential? A Manufacturing Execution System (MES) could be the game-changer, helping you reduce waste, cut costs, and lower your carbon footprint. Join Nikhil Joshi, Founder & President of Snic Solutions, in this value-packed webinar as he breaks down how MES can drive operational excellence and sustainability.

article thumbnail

5 Practical Data Science Projects That Will Help You Solve Real Business Problems for 2022

KDnuggets

This curated list of data science projects offers real-life problems that will help you master skills to demonstration that you are technically sound and know how to conduct data science projects that add business value.

article thumbnail

I Interviewed Nearly 200 Apache Kafka Experts and I Learned These 10 Things

Confluent

Many leading lights of the Apache Kafka® community have appeared as guests on Streaming Audio at one time or another in the past three years. But some of its episodes […].

Kafka 129
article thumbnail

Cloudera Data Engineering 2021 Year End Review

Cloudera

Since the release of Cloudera Data Engineering (CDE) more than a year ago , our number one goal was operationalizing Spark pipelines at scale with first class tooling designed to streamline automation and observability. In working with thousands of customers deploying Spark applications, we saw significant challenges with managing Spark as well as automating, delivering, and optimizing secure data pipelines.

article thumbnail

Building A System Of Record For Your Organization's Data Ecosystem At Metaphor

Data Engineering Podcast

Summary Building a well managed data ecosystem for your organization requires a holistic view of all of the producers, consumers, and processors of information. The team at Metaphor are building a fully connected metadata layer to provide both technical and social intelligence about your data. In this episode Pardhu Gunnam and Mars Lan explain how they have designed the architecture and user experience to allow everyone to collaborate on the data lifecycle and provide opportunities for automatio

Systems 100
article thumbnail

Improving the Accuracy of Generative AI Systems: A Structured Approach

Speaker: Anindo Banerjea, CTO at Civio & Tony Karrer, CTO at Aggregage

When developing a Gen AI application, one of the most significant challenges is improving accuracy. This can be especially difficult when working with a large data corpus, and as the complexity of the task increases. The number of use cases/corner cases that the system is expected to handle essentially explodes. 💥 Anindo Banerjea is here to showcase his significant experience building AI/ML SaaS applications as he walks us through the current problems his company, Civio, is solving.

article thumbnail

Delivering Actionable Financial Insights to Automotive Business Leaders

Teradata

Automotive businesses need to build new frameworks for CFO Analytics that leverage existing systems to provide the granular, timely data they need to succeed. Read more.

Systems 89
article thumbnail

8 analytics startups to watch over the next year

DataKitchen

The post 8 analytics startups to watch over the next year first appeared on DataKitchen.

123
123
article thumbnail

Data Labeling for Machine Learning: Market Overview, Approaches, and Tools

KDnuggets

So much of data science and machine learning is founded on having clean and well-understood data sources that it is unsurprising that the data labeling market is growing faster than ever. Here, we highlight many of the top players in this industry and the techniques they use to help you consider which might make a good partner for your needs.

article thumbnail

The Definitive Guide to Building a Data Mesh with Event Streams

Confluent

Data mesh. This oft-talked-about architecture has no shortage of blog posts, conference talks, podcasts, and discussions. One thing that you may have found lacking is a concrete guide on precisely […].

Building 129
article thumbnail

The Ultimate Guide To Data-Driven Construction: Optimize Projects, Reduce Risks, & Boost Innovation

Speaker: Donna Laquidara-Carr, PhD, LEED AP, Industry Insights Research Director at Dodge Construction Network

In today’s construction market, owners, construction managers, and contractors must navigate increasing challenges, from cost management to project delays. Fortunately, digital tools now offer valuable insights to help mitigate these risks. However, the sheer volume of tools and the complexity of leveraging their data effectively can be daunting. That’s where data-driven construction comes in.

article thumbnail

AI and ML: No Longer the Stuff of Science Fiction

Cloudera

Artificial Intelligence (AI) has revolutionized how various industries operate in recent years. But with growing demands, there’s a more nuanced need for enterprise-scale machine learning solutions and better data management systems. The 2021 Data Impact Awards aim to honor organizations who have shown exemplary work in this area. . The category “Data for Enterprise AI” awards companies from around the world that have built and deployed use cases for enterprise-scale machine learning and have in

article thumbnail

Deliver Personal Experiences In Your Applications With The Unomi Open Source Customer Data Platform

Data Engineering Podcast

Summary The core to providing your users with excellent service is to understand them and provide a personalized experience. Unfortunately many sites and applications take that to the extreme and collect too much information. In order to make it easier for developers to build customer profiles in a way that respects their privacy Serge Huber helped to create the Apache Unomi framework as an open source customer data platform.

article thumbnail

Snaring the Bad Folks

Netflix Tech

Project by Netflix’s Cloud Infrastructure Security team ( Alex Bainbridge , Mike Grima , Nick Siow) Cloud security is a hard problem, but an even harder one is cloud security at scale. In recent years we’ve seen several cloud focused data breaches and evidence shows that threat actors are becoming more advanced with their techniques, goals, and tooling.

AWS 81
article thumbnail

Data-Driven in 2022: Data Management Opportunities in the Year Ahead

DataKitchen

The post Data-Driven in 2022: Data Management Opportunities in the Year Ahead first appeared on DataKitchen.

article thumbnail

Driving Responsible Innovation: How to Navigate AI Governance & Data Privacy

Speaker: Aindra Misra, Senior Manager, Product Management (Data, ML, and Cloud Infrastructure) at BILL

Join us for an insightful webinar that explores the critical intersection of data privacy and AI governance. In today’s rapidly evolving tech landscape, building robust governance frameworks is essential to fostering innovation while staying compliant with regulations. Our expert speaker, Aindra Misra, will guide you through best practices for ensuring data protection while leveraging AI capabilities.

article thumbnail

Data Science & Analytics Industry Main Developments in 2021 and Key Trends for 2022

KDnuggets

We have solicited insights from experts at industry-leading companies, asking: "What were the main AI, Data Science, Machine Learning Developments in 2021 and what key trends do you expect in 2022?" Read their opinions here.

article thumbnail

Serverless Stream Processing with Apache Kafka, AWS Lambda, and ksqlDB

Confluent

It seems like now more than ever developers are surrounded by a sea of terminology—but what does it really all mean? Here, we will take some often heard terms—some considered […].

AWS 126
article thumbnail

In AI we trust? Why we Need to Talk About Ethics and Governance (part 2 of 2)

Cloudera

In part 1 of this blog post, we discussed the need to be mindful of data bias and the resulting consequences when certain parameters are skewed. Surely there are ways to comb through the data to minimise the risks from spiralling out of control. We need to get to the root of the problem. In 2019, the Gradient institute published a white paper outlining the practical challenges for Ethical AI.

article thumbnail

Experimentation and A/B Testing For Modern Data Teams With Eppo

Data Engineering Podcast

Summary A/B testing and experimentation are the most reliable way to determine whether a change to your product will have the desired effect on your business. Unfortunately, being able to design, deploy, and validate experiments is a complex process that requires a mix of technical capacity and organizational involvement which is hard to come by. Chetan Sharma founded Eppo to provide a system that organizations of every scale can use to reduce the burden of managing experiments so that you can f

BI 100
article thumbnail

What Is Entity Resolution? How It Works & Why It Matters

Entity Resolution Sometimes referred to as data matching or fuzzy matching, entity resolution, is critical for data quality, analytics, graph visualization and AI. Learn what entity resolution is, why it matters, how it works and its benefits. Advanced entity resolution using AI is crucial because it efficiently and easily solves many of today’s data quality and analytics problems.