Sat. Mar 20, 2021 - Fri. Mar 26, 2021

How to Host a Virtual Global Data Science Hackathon

Teradata

Learn how best to host a virtual hackathon, or any virtual event, with these tips and tricks from our Teradata team. Read more.

Toward a Data Mesh (part 2) : Architecture & Technologies

François Nguyen

Just an illustration – not the truth, and you certainly can do it with other technologies. TL;DR: After setting up and organizing the teams, we describe 4 topics to make data mesh a reality: the self-serve platform based on a serverless philosophy (life is too short to do provisioning); the building of data products (as code), where we build data workflows, not data pipelines; the promotion of data domains, where the metadata on the data life cycle is as important as your data; The old dat

Congratulations to our 2021 Partner Award Winners

Cloudera

At our Partner Sales Kickoff, we announced the winners of the 2021 Cloudera Partner Awards. These six awards recognize Cloudera partners who are dedicated to enabling customers to do more with their data by leveraging the power of an enterprise data cloud. Thank you to this year’s winners for their partnership in helping our joint customers drive value from their data in the hybrid cloud.

Real World Change Data Capture At Datacoral

Data Engineering Podcast

Summary: The world of business is becoming increasingly dependent on information that is accurate up to the minute. For analytical systems, the only way to provide this reliably is by implementing change data capture (CDC). Unfortunately, this is a non-trivial undertaking, particularly for teams that don’t have extensive experience working with streaming data and complex distributed systems.

Apache Airflow® Best Practices for ETL and ELT Pipelines

Whether you’re creating complex dashboards or fine-tuning large language models, your data must be extracted, transformed, and loaded. ETL and ELT pipelines form the foundation of any data product, and Airflow is the open-source data orchestrator specifically designed for moving and transforming data in ETL and ELT pipelines. This eBook covers: An overview of ETL vs.

How to Run Confluent on Windows in Minutes

Confluent

I previously showed how to install and set up Apache Kafka® on Windows in minutes by using the Windows Subsystem for Linux 2 (WSL 2). From there, it’s only a […].

Scaling Revenue & Growth Tooling

Netflix Tech

Written by Nick Tomlin, Michael Possumato, and Rahul Pilani. This post shares how the Revenue & Growth Tools (RGT) team approaches creating full-stack tools for the teams that are the financial backbone of Netflix. Our primary partners are the teams of Revenue and Growth Engineering (RGE): Growth, Membership, Billing, Payments, and Partner Subscription.

More Trending

Don’t Just Collect Vehicle Data – Monetize It!

Teradata

As the auto sector transforms, vehicle data is becoming one of the most important sources of insight. But if it is left in fragmented silos, it quickly becomes a cost & delivers little value.

How the DataKitchen Platform Delivers End-to-End Data Observability

DataKitchen

The post How the DataKitchen Platform Delivers End-to-End Data Observability first appeared on DataKitchen.

Community, Metadata Management, and More: Top 10 Links From Across the Web

Data Council

Here's our March 2021 roundup of links from across the web that we selected for you: 1. How to Build a Community (Fishtown Analytics) Claire Carroll's first personal blog post on community-building is a must-read. As Fishtown Analytics' community manager for the last 2.5 years, she's arguably behind the success of the dbt community and its best-in-class practices, so we expected good advice… but she really hit the ball out of the park with this one!

Filter more pay less with the latest Cloudera Data Warehouse runtime!

Cloudera

Introduction. One of the most effective ways to improve performance and minimize cost in database systems today is by avoiding unnecessary work, such as data reads from the storage layer (e.g., disks, remote storage), transfers over the network, or even data materialization during query execution. Since its early days, Apache Hive has improved distributed query execution by pushing down column filter predicates to storage handlers like HBase or columnar data format readers such as Apache ORC.

Apache Airflow®: The Ultimate Guide to DAG Writing

Speaker: Tamara Fingerlin, Developer Advocate

In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!

Building the Future of Payments With RippleNet’s VP of Engineering

Ripple Engineering

Amidst the work-from-home environment, Vidya Mani joined Ripple in early 2020 as the Vice President of Engineering for RippleNet. A year into her role, she focuses on improving Ripple’s infrastructure and strengthening her team to further the company’s vision for a more inclusive financial system. RippleNet is an enterprise solution which helps banks and other financial institutions streamline global payments and reach new customers.

Integrating Azure and Confluent: Real-Time Search Powered by Azure Cache for Redis, Spring Cloud

Confluent

Self-managing a distributed system like Apache Kafka®, along with building and operating Kafka connectors, is complex and resource intensive. It requires significant Kafka skills and expertise in the development and […].

Promisifying Your Node Callback Functions

Grouparoo

The Grouparoo application is written in JavaScript (Node). It uses the modern promise-based pattern (async/await) for reading and writing data asynchronously. And we do this a lot: we are a data sync tool! Every once in a while we'll come across a JavaScript library that is written around the old callback-based pattern, where the error object is the first parameter in the callback function, followed by the result.
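
The error-first callback pattern described in this excerpt can be sketched roughly as follows in TypeScript. This is a minimal illustration, not Grouparoo's code: `lookupLegacy` and `promisifyOne` are hypothetical names invented for the example, and the wrapper only handles functions that take a single argument plus a callback. Node also ships a general-purpose `util.promisify` for the same job.

```typescript
// Error-first callback type: the error comes first, followed by the result.
type NodeCallback<T> = (err: Error | null, result?: T) => void;

// Hypothetical legacy function written around the callback pattern.
// It invokes the callback synchronously to keep the sketch short.
function lookupLegacy(key: string, callback: NodeCallback<string>): void {
  if (key === "") {
    callback(new Error("key must not be empty"));
    return;
  }
  callback(null, `value-for-${key}`);
}

// Wrap a single-argument callback-style function so it returns a promise.
// (Hypothetical helper; an assumption made for illustration only.)
function promisifyOne<A, T>(
  fn: (arg: A, callback: NodeCallback<T>) => void
): (arg: A) => Promise<T> {
  return (arg: A) =>
    new Promise<T>((resolve, reject) => {
      fn(arg, (err, result) => {
        if (err) {
          reject(err); // surface the error-first argument as a rejection
        } else {
          resolve(result as T); // pass the result through on success
        }
      });
    });
}

// The wrapped function can now be used with async/await.
const lookup = promisifyOne(lookupLegacy);

async function main(): Promise<string> {
  const value = await lookup("user-1");
  return value;
}
```

With the wrapper in place, callers can use `await lookup(...)` inside `try`/`catch` instead of checking an error argument in every callback.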

#ClouderaLife Spotlight: Christine Sherry, Director, Critical Incident Manager

Cloudera

For over 8 years, Christine Sherry has brought her strong work ethic and skills to Cloudera. In her time here, she’s climbed up the ladder and currently sits as our Director, Global Critical Incident Management. In her role, she leads Cloudera’s Global Support Critical Incident team, which is responsible for managing our customers’ most critical technical issues to resolution.

Optimizing The Modern Developer Experience with Coder

Many software teams have migrated their testing and production workloads to the cloud, yet development environments often remain tied to outdated local setups, limiting efficiency and growth. This is where Coder comes in. In our 101 Coder webinar, you’ll explore how cloud-based development environments can unlock new levels of productivity. Discover how to transition from local setups to a secure, cloud-powered ecosystem with ease.

DataOps and the Cloud: A Match Made in Heaven

DataKitchen

The post DataOps and the Cloud: A Match Made in Heaven first appeared on DataKitchen.

Contributing To The Superset Project - GitHub + Superset

Preset

How to contribute to the Apache Superset™ Project: help others, advocate for Superset, and contribute code.

Ending Supply Chain Whack-a-Mole Management

Teradata

Working to optimize Retail & CPG Supply Chains often feels like a life-sized game of Whack-a-Mole -- making a change here creates an issue there. Find out how integrated, real-time data can help.

Will Data Privacy drive an Enterprise Data Strategy?

Cloudera

Data privacy is an increasingly complex and contentious topic. The appropriate use of data and transparency about its potential uses are at the center of debate amongst the largest Big Tech companies. The protection and controls around data become increasingly complex when used in the context of banking and insurance activities. Personal and confidential information carries heightened sensitivity in the light of financial, health and insurance activities.

15 Modern Use Cases for Enterprise Business Intelligence

Large enterprises face unique challenges in optimizing their Business Intelligence (BI) output due to the sheer scale and complexity of their operations. Unlike smaller organizations, where basic BI features and simple dashboards might suffice, enterprises must manage vast amounts of data from diverse sources. What are the top modern BI use cases for enterprise businesses to help you get a leg up on the competition?

DataKitchen Wins DataOps Company of the Year, 2021, from the Data Breakthrough Awards

DataKitchen

The post DataKitchen Wins DataOps Company of the Year, 2021, from the Data Breakthrough Awards first appeared on DataKitchen.

5 Things Every Data Engineer Needs to Know About Data Observability

Monte Carlo

As a new or aspiring data engineer, there are some essential technologies and frameworks you should know. How to build a data pipeline? Check. How to clean, transform, and model your data? Check. How to prevent broken data workflows before you get that frantic call from your CEO about her missing data? Maybe not. By leveraging best practices from our friends in software engineering and developer operations (DevOps), we can think more strategically about tackling the “good pipelines, bad data” pr

On the Pursuit of Happiness (aka Squashing 502/504 Errors)

Rockset

Introduction: 502 and 504 errors can be a nuisance for Rockset and our users. For many users running customer-facing applications on Rockset, availability and uptime are very important, so even a single 5xx error is cause for concern. As a cloud service, Rockset deploys code to our production clusters multiple times a week, which means that any component of our distributed system has to stop and restart with new code in an error-free way.

Cracking the code on gender parity in the workplace

Cloudera

I used to work for a female CEO who said that in her organization, “the powder room is the power room”. It’s been a while since I heard that statement, yet such an environment is still far from the truth for many companies. Women are still underrepresented in the science, technology, engineering, and mathematics (STEM) field and more so in leadership positions.

Prepare Now: 2025's Must-Know Trends For Product And Data Leaders

Speaker: Jay Allardyce, Deepak Vittal, Terrence Sheflin, and Mahyar Ghasemali

As we look ahead to 2025, business intelligence and data analytics are set to play pivotal roles in shaping success. Organizations are already starting to face a host of transformative trends as the year comes to a close, including the integration of AI in data analytics, an increased emphasis on real-time data insights, and the growing importance of user experience in BI solutions.

How Prophet Enables Time-Series Forecasting in Superset

Preset

Prophet is a popular time-series forecasting library created by Facebook. Learn how to use Prophet and Apache Druid for forecasting.
