March 2022

End-to-end data engineering project - batch edition

Start Data Engineering

It can be difficult to know where to begin when starting a data engineering side project. If you have wondered: what data to use for your data project?

Building A Data Governance Bridge Between Cloud And Datacenters For The Enterprise At Privacera

Data Engineering Podcast

Summary Data governance is a practice that requires a high degree of flexibility and collaboration at the organizational and technical levels. The growing prominence of cloud and hybrid environments in data management adds further stress to an already complex endeavor. Privacera is an enterprise-grade solution for cloud and hybrid data governance built on top of the robust and battle-tested Apache Ranger project.

WTF is a Tensor?!?

KDnuggets

A tensor is a container which can house data in N dimensions, along with its linear operations, though there is nuance in what tensors technically are and what we refer to as tensors in practice.
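To make the "container in N dimensions, plus linear operations" description concrete, here is a minimal pure-Python sketch. It is a toy under stated assumptions, not how PyTorch, TensorFlow, or any other library implements tensors: the hypothetical `Tensor` class stores values in a flat list together with a shape, maps N-dimensional indices to list positions in row-major order, and supports two linear operations (element-wise addition and scalar multiplication).

```python
from math import prod


class Tensor:
    """Hypothetical toy tensor: a flat list of values plus a shape tuple."""

    def __init__(self, data, shape):
        if len(data) != prod(shape):
            raise ValueError("data length does not match shape")
        self.data = list(data)
        self.shape = tuple(shape)

    @property
    def ndim(self):
        # Number of dimensions (axes).
        return len(self.shape)

    def _offset(self, index):
        # Convert an N-dimensional index into a flat position.
        # Row-major (C-order): the last axis varies fastest.
        if not isinstance(index, tuple):
            index = (index,)
        if len(index) != self.ndim:
            raise IndexError("wrong number of indices")
        offset = 0
        for i, dim in zip(index, self.shape):
            if not 0 <= i < dim:
                raise IndexError("index out of range")
            offset = offset * dim + i
        return offset

    def __getitem__(self, index):
        return self.data[self._offset(index)]

    def __add__(self, other):
        # Element-wise addition: a linear operation defined for equal shapes.
        if self.shape != other.shape:
            raise ValueError("shape mismatch")
        return Tensor([a + b for a, b in zip(self.data, other.data)], self.shape)

    def scale(self, k):
        # Scalar multiplication: the other basic linear operation.
        return Tensor([k * a for a in self.data], self.shape)
```

For example, `Tensor(range(24), (2, 3, 4))` is a 3-dimensional tensor whose element at index `(1, 2, 3)` is 23. Keeping one contiguous buffer plus shape metadata is also roughly how NumPy arrays are laid out, though real libraries add strides, dtypes, and broadcasting on top.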

How to make Apache Kafka clients go fast(er) on Confluent Cloud

Confluent

Imagine your team wants to design a data streaming architecture and you’re in charge of creating the prototype. Within a few minutes, you provision a fully managed Apache Kafka® cluster […].

15 Modern Use Cases for Enterprise Business Intelligence

Large enterprises face unique challenges in optimizing their Business Intelligence (BI) output due to the sheer scale and complexity of their operations. Unlike smaller organizations, where basic BI features and simple dashboards might suffice, enterprises must manage vast amounts of data from diverse sources. What are the top modern BI use cases for enterprise businesses to help you get a leg up on the competition?

The Telecommunications Service Provider Journey – From Telco to Techco

Cloudera

Earlier this month, the multinational carrier MTN announced a rebranding and, along with its logo refresh, said that it was moving to focus on being a technology provider. The new look “aligns with our evolution from a telecommunications company to a technology company,” said Nompilo Morafo, Chief Corporate Affairs Officer at the company. Across APAC, too, telcos are looking at the shift to becoming technology companies, and last week’s TMForum Leadership Summit “The Tech Driven Telco” […]

The Soldiers, Rogues, and Mages of Data Teams

Jesse Anderson

Data teams are like role-playing games (RPGs). If you’re not familiar with RPGs, they feature a person or a group of characters all working together toward a common goal. Crucial attributes of each character are their levels, skills, and stats. In many games, higher levels are required to unlock specific skills. Likewise, stats show how well a character can use their skills.

Eliminate The Bottlenecks In Your Key/Value Storage With SpeeDB

Data Engineering Podcast

Summary At the foundational layer, many databases and data processing engines rely on key/value storage for managing the layout of information on disk. RocksDB is one of the most popular choices for this component and has been incorporated into popular systems such as ksqlDB. As these systems are scaled to larger volumes of data and higher throughputs, the RocksDB engine can become a performance bottleneck.

A Guide On How To Become A Data Scientist (Step By Step Approach)

KDnuggets

Becoming a data scientist is an exciting path, but you cannot learn data science in one year or six months; instead, it’s a lifelong process that you have to follow with dedication and hard work. To guide your journey, the skills outlined here are the first ones you must acquire to become a data scientist.

Announcing ksqlDB 0.24.0

Confluent

We are excited to announce ksqlDB 0.24! It comes with a slew of improvements and new features. Access to Apache Kafka® record headers will enable a whole host of new […].

Do Data Companies Need Chief Ethics Officers?

Cloudera

Sometimes it takes a billion-dollar mistake to bring the murkier side of data ethics into sharp focus. Equifax found this out to their own cost in 2017 when they failed to protect the data of almost 150 million users globally. The catastrophic breach was bad enough on its own — but Equifax waited three months to go public with the news. As the public furore rose to a crescendo, the credit organization dragged its feet on disclosing exactly what kind of information had been leaked.

Prepare Now: 2025’s Must-Know Trends For Product And Data Leaders

Speaker: Jay Allardyce, Deepak Vittal, and Terrence Sheflin

As we look ahead to 2025, business intelligence and data analytics are set to play pivotal roles in shaping success. Organizations are already starting to face a host of transformative trends as the year comes to a close, including the integration of AI in data analytics, an increased emphasis on real-time data insights, and the growing importance of user experience in BI solutions.

Case Study: How Rockset Made Me a Day Three Hero at Sounding Board

Rockset

I’ve been working as a data and software engineer for more than 20 years. Not long after I joined my current employer, Sounding Board, I had to normalize nested JSON arrays in a complex document schema so that I could join the child records to other collections and then denormalize the data into a single result set, and I had to do it fast. On top of that, I had to make that data available to our custom-built application via a secure RESTful endpoint with a response time of less than one second.

Crystal Ball, Black Box or Advanced Forecasting and Demand Planning in Retail and CPG

Teradata

Neither crystal balls nor black boxes will provide the agility needed for accurate demand forecasting in today’s retail & CPG environment. Learn more about new approaches to forecasting and demand planning (FDP).

Accelerate Your Embedded Analytics With Apache Pinot

Data Engineering Podcast

Summary Data and analytics are permeating every system, including customer-facing applications. Introducing embedded analytics into an end-user product significantly shifts the requirements for your data layer. The Pinot OLAP datastore was created for this purpose and is optimized for low-latency, highly concurrent queries on rapidly updating datasets.

3 Reasons Why You Should Use Linear Regression Models Instead of Neural Networks

KDnuggets

While there may always seem to be something new, cool, and shiny in the field of AI/ML, classic statistical methods that leverage machine learning techniques remain powerful and practical for solving many real-world business problems.
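One practical argument for linear models is that they are cheap to fit and easy to read. As a minimal sketch, assuming a single numeric feature and no regularization, ordinary least squares has a closed-form solution: the slope is cov(x, y) / var(x) and the intercept is mean(y) minus slope times mean(x). The function name `fit_linear` and the sample data are hypothetical, chosen only for illustration.

```python
def fit_linear(xs, ys):
    """Hypothetical helper: ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x (proportional to var(x)).
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    # Sum of cross-deviations (proportional to cov(x, y)).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(intercept, slope, x):
    # A fitted linear model is just one multiply and one add per prediction.
    return intercept + slope * x
```

With the made-up data `fit_linear([1, 2, 3, 4], [3, 5, 7, 9])`, the result is `(1.0, 2.0)`, i.e. y = 1 + 2x. Both coefficients can be read off and explained directly, which is much harder to do with the weights of a neural network.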

How to Drive Cost Savings, Efficiency Gains, and Sustainability Wins with MES

Speaker: Nikhil Joshi, Founder & President of Snic Solutions

Is your manufacturing operation reaching its efficiency potential? A Manufacturing Execution System (MES) could be the game-changer, helping you reduce waste, cut costs, and lower your carbon footprint. Join Nikhil Joshi, Founder & President of Snic Solutions, in this value-packed webinar as he breaks down how MES can drive operational excellence and sustainability.

Securing Your Logs in Confluent Cloud with HashiCorp Vault

Confluent

Logging is an important component of managing service availability, security, and customer experience. It allows Site Reliability Engineers (SREs), developers, security teams, and infrastructure teams to gain insights into how […].

Women Leaders in Data Discuss Breaking Bias on International Women’s Day

Cloudera

As an official sponsor of International Women’s Day, Cloudera is excited to celebrate Women’s History Month and International Women’s Day, and to take up the mantle of this year’s theme, #BreakTheBias. Even in industries where women are underrepresented, like tech, women have made a lot of progress. Progress over many decades has slowly transformed the workplace into an environment where women’s strengths are recognized and valued.

You Have More Data Quality Issues Than You Think 

Monte Carlo

Say it with me: your data will never be perfect. Any team striving for completely accurate data will be sorely disappointed. Data testing, anomaly detection, and cataloging are important steps, but technology alone will not solve your data quality problem. Like any entropic system, data breaks. And as we’ve learned building solutions to curb the causes and downstream impact of data issues, it happens more often than you think.

Women of Teradata: Molly Treese

Teradata

In honor of Women's History Month, we are spotlighting Molly Treese, Teradata's Chief Legal Officer, as she looks back at her career in law & recounts the importance of inclusion in the workplace.

Improving the Accuracy of Generative AI Systems: A Structured Approach

Speaker: Anindo Banerjea, CTO at Civio & Tony Karrer, CTO at Aggregage

When developing a Gen AI application, one of the most significant challenges is improving accuracy. This can be especially difficult when working with a large data corpus, and as the complexity of the task increases. The number of use cases/corner cases that the system is expected to handle essentially explodes. 💥 Anindo Banerjea is here to showcase his significant experience building AI/ML SaaS applications as he walks us through the current problems his company, Civio, is solving.

Exploring Incident Management Strategies For Data Teams

Data Engineering Podcast

Summary Data assets and the pipelines that create them have become critical production infrastructure for companies. This adds a requirement for reliability and uptime management similar to that of application infrastructure. In this episode, Francisco Alberini and Mei Tao share their insights on what incident management looks like for data platforms and the teams that support them.

How to Stay on Top of What’s Going on in the AI World

KDnuggets

How do you keep up with all the news and trends, and navigate the endless stream of AI information? Check out this author’s list of favorite sources for AI papers, which help you float effortlessly in the info ocean.

Introducing Stream Processing Use Case Recipes Powered by ksqlDB

Confluent

From fraud detection and predictive analytics, to real-time customer experiences and cyber security, stream processing has countless benefits for use cases big and small. By unlocking the power of continuous […].

#BreakTheBias: It’s a Journey

Cloudera

Bias is everywhere. We’re surrounded by it. And it’s natural. We are alive today as a species because of biases. But bias has a tangible impact on our personal and professional lives. Biases shape us and our experience. As primary caregivers, women have felt the impact of biases and expectations more keenly during the pandemic. Last year, women in my network felt they were expected to do everything at home and at work.

The Ultimate Guide To Data-Driven Construction: Optimize Projects, Reduce Risks, & Boost Innovation

Speaker: Donna Laquidara-Carr, PhD, LEED AP, Industry Insights Research Director at Dodge Construction Network

In today’s construction market, owners, construction managers, and contractors must navigate increasing challenges, from cost management to project delays. Fortunately, digital tools now offer valuable insights to help mitigate these risks. However, the sheer volume of tools and the complexity of leveraging their data effectively can be daunting. That’s where data-driven construction comes in.

This 6-Month Product Management Program Is The Ultimate Choice For Next-Gen Product Experts!

U-Next

Let’s face it! Product management CAN BE TOUGH, but only if you haven’t had the best training experience for product enthusiasts: the PG Certificate Program in Product Management by IIM Indore & Jigsaw. Several of today’s product experts started their journeys with this exclusive 6-month program and found many doors of opportunity wide open to welcome them.

Closing the Gap Left by Third Party Cookie Deprecation

Teradata

Consumers expect personalized experiences when they interact with a brand. But organizations are losing the ability to listen to their customers via digital channels. Fixing this is critical.

Taking A Multidimensional Approach To Data Observability At Acceldata

Data Engineering Podcast

Summary Data observability is a term that has been co-opted by numerous vendors with varying ideas of what it should mean. At Acceldata, they view it as a holistic approach to understanding the computational and logical elements that power your analytical capabilities. In this episode Tristan Spaulding, head of product at Acceldata, explains the multi-dimensional nature of gaining visibility into your running data platform and how they have architected their platform to assist in that endeavor.

Top Posts Feb 28 – Mar 6: The Complete Collection of Data Science Cheat Sheets – Part 2

KDnuggets

Also: Calculus: The hidden building block of machine learning; Decision Tree Algorithm, Explained; Telling a Great Data Story: A Visualization Decision Tree; The Complete Collection of Data Science Cheat Sheets – Part 1.

Driving Responsible Innovation: How to Navigate AI Governance & Data Privacy

Speaker: Aindra Misra, Senior Manager, Product Management (Data, ML, and Cloud Infrastructure) at BILL

Join us for an insightful webinar that explores the critical intersection of data privacy and AI governance. In today’s rapidly evolving tech landscape, building robust governance frameworks is essential to fostering innovation while staying compliant with regulations. Our expert speaker, Aindra Misra, will guide you through best practices for ensuring data protection while leveraging AI capabilities.

An Introduction to Data Mesh

Confluent

Decentralized architectures continue to flourish as engineering teams look to unlock the potential of their people and systems. From Git, to microservices, to cryptocurrencies, these designs look to decentralization as […].

New Features in Cloudera Streams Messaging for CDP Public Cloud 7.2.14

Cloudera

With the launch of CDP Public Cloud 7.2.14, Cloudera Streams Messaging for Data Hub deployments has gained some powerful new features! In this release, the Streams Messaging templates in Data Hub come with Apache Kafka 2.8 and Cruise Control 2.5, providing new core features and fixes. KConnect has been added and gains additional capabilities with new connectors and Stateless Apache NiFi capabilities, which can run NiFi flows as connectors.

These Sales Enthusiasts Mastered Strategic Sales In Just 4 Months With The Executive Program in Strategic Sales Management

U-Next

With the onset of the 5th industrial revolution, the world is moving closer to embracing newer technologies in almost every walk of life. In the business ecosystem, those who upskill and transform into the best professional versions of themselves are bound to be at the forefront of this revolution. The sales domain, too, cannot rely on traditional sales methods for much longer.

Women of Teradata: Claire Bramley

Teradata

In honor of Women's History Month, we are spotlighting Claire Bramley, Teradata's Chief Financial Officer, as she looks back at her career in finance and tech.

What Is Entity Resolution? How It Works & Why It Matters

Sometimes referred to as data matching or fuzzy matching, entity resolution is critical for data quality, analytics, graph visualization and AI. Learn what entity resolution is, why it matters, how it works and its benefits. Advanced entity resolution using AI is crucial because it efficiently and easily solves many of today’s data quality and analytics problems.
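A very small illustration of the "fuzzy matching" idea: normalize each name, score string similarity, and accept the best candidate only if it clears a threshold. This sketch uses the standard library's `difflib.SequenceMatcher` as a stand-in similarity measure; it is not the AI-based approach described above, and the function names, sample records, and 0.85 threshold are hypothetical choices.

```python
from difflib import SequenceMatcher


def normalize(name):
    # Hypothetical normalization: lowercase, turn punctuation into spaces,
    # and collapse runs of whitespace.
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in name.lower())
    return " ".join(cleaned.split())


def similarity(a, b):
    # Ratio in [0, 1]; 1.0 means the normalized strings are identical.
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def match_records(left, right, threshold=0.85):
    """Pair each record in `left` with its best fuzzy match in `right`, if any."""
    matches = {}
    for record in left:
        best = max(right, key=lambda candidate: similarity(record, candidate), default=None)
        if best is not None and similarity(record, best) >= threshold:
            matches[record] = best
    return matches
```

With `left = ["Acme Corp.", "Globex Inc"]` and `right = ["ACME Corp", "Initech LLC"]`, only `"Acme Corp."` is matched (to `"ACME Corp"`), because both normalize to `"acme corp"`. Real entity resolution systems typically add blocking to avoid comparing every pair and combine several fields, such as names, addresses, and dates.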