Sat. May 08, 2021 - Fri. May 14, 2021

How to make data pipelines idempotent

Start Data Engineering

What is an idempotent function? “Idempotence is the property of certain operations in mathematics and computer science whereby they can be applied multiple times without changing the result beyond the initial application” (Wikipedia). It is defined as f(f(x)) = f(x). In the data engineering context, this can come to mean that: running a data pipeline
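
A minimal Python sketch of the f(f(x)) = f(x) idea may help. The function names `append_load` and `overwrite_load`, the dict-based "table", and the sample rows are all hypothetical and not taken from the article. They only illustrate why a load that overwrites its own partition gives the same result when re-run, while a load that appends does not.

```python
from typing import Dict, List

Table = Dict[str, List[dict]]  # hypothetical "table": partition key -> rows


def append_load(table: Table, run_date: str, rows: List[dict]) -> Table:
    """Non-idempotent: every re-run adds the same rows again."""
    table.setdefault(run_date, []).extend(rows)
    return table


def overwrite_load(table: Table, run_date: str, rows: List[dict]) -> Table:
    """Idempotent: a re-run replaces the partition for run_date."""
    table[run_date] = list(rows)
    return table


rows = [{"id": 1, "amount": 10}, {"id": 2, "amount": 20}]

appended: Table = {}
append_load(appended, "2021-05-08", rows)
append_load(appended, "2021-05-08", rows)  # simulated re-run duplicates the rows

overwritten: Table = {}
overwrite_load(overwritten, "2021-05-08", rows)
overwrite_load(overwritten, "2021-05-08", rows)  # re-run leaves the result unchanged

print(len(appended["2021-05-08"]))     # 4
print(len(overwritten["2021-05-08"]))  # 2
```

Overwriting by partition is only one way to get this property; it is shown here because it needs no extra bookkeeping.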

Introducing Confluent for Kubernetes

Confluent

We are excited to announce that Confluent for Kubernetes is generally available! Today, we are enabling our customers to realize many of the benefits of our cloud service with the […].

Cloud 137

Automating CDP Private Cloud Installations with Ansible

Cloudera

The introduction of CDP Public Cloud has dramatically reduced the time in which you can be up and running with Cloudera’s latest technologies, be it with containerised Data Warehouse, Machine Learning, Operational Database or Data Engineering experiences or the multi-purpose VM-based Data Hub style of deployment. In CDP Private Cloud, the introduction of Cloudera Data Warehouse and Cloudera Machine Learning Experiences on RedHat OpenShift Kubernetes clusters means that we can deploy new

Cloud 106

Building Your Data Warehouse On Top Of PostgreSQL

Data Engineering Podcast

Summary: There is a lot of attention on the database market and cloud data warehouses. While they provide a measure of convenience, they also require you to sacrifice a certain amount of control over your data. If you want to build a warehouse that gives you both control and flexibility, then you might consider building on top of the venerable PostgreSQL project.

Apache Airflow® Best Practices for ETL and ELT Pipelines

Whether you’re creating complex dashboards or fine-tuning large language models, your data must be extracted, transformed, and loaded. ETL and ELT pipelines form the foundation of any data product, and Airflow is the open-source data orchestrator specifically designed for moving and transforming data in ETL and ELT pipelines. This eBook covers: An overview of ETL vs.

Data Observability and Monitoring with DataOps

DataKitchen

Data errors impact decision-making. When analytics and dashboards are inaccurate, business leaders may not be able to solve problems and pursue opportunities. Data errors infringe on work-life balance. They cause people to work long hours at the expense of personal and family time. Data errors also affect careers. If you have been in the data profession for any length of time, you probably know what it means to face a mob of stakeholders who are angry about inaccurate or late analytics.

Using kafka-merge-purge to Deal with Failure in an Event-Driven System at FLYERALARM

Confluent

Failures are inevitable in any system, and there are various options for mitigating them automatically. This is made possible by event-driven applications leveraging Apache Kafka® and built with fault tolerance […].

Kafka 95

More Trending

Making Analytical APIs Fast With Tinybird

Data Engineering Podcast

Summary: Building an API for real-time data is a challenging project. Making it robust, scalable, and fast is a full-time job. The team at Tinybird wants to make it easy to turn a continuous stream of data into a production-ready API or data product. In this episode CEO Jorge Sancha explains how they have architected their system to handle high data throughput and fast response times, and why they have invested heavily in ClickHouse as the core of their platform.

Big Data Analytics: How It Works, Tools, and Real-Life Applications

AltexSoft

Big Data enjoys the hype around it and for a reason. But the understanding of the essence of Big Data and ways to analyze it is still blurred. The truth is, there’s more to this term than just the size of information generated. Not only does Big Data apply to the huge volumes of continuously growing data that come in different formats, but it also refers to the range of processes, tools, and approaches used to gain insights from that data.

Achieving observability in async workflows

Netflix Tech

Written by Colby Callahan, Megha Manohara, and Mike Azar. Managing and operating asynchronous workflows can be difficult without the proper tools and architecture that put observability, debugging, and tracing at the forefront. Imagine getting paged outside normal work hours: users are having trouble with the application you’re responsible for, and you start diving into logs.

Java 67

cdpcurl: Low-Level CDP API Access

Cloudera

Cloudera Data Platform (CDP) provides an API that enables you to access CDP functionality from a script, or to integrate CDP features with an application. In practice you can use the CDP API to script repetitive tasks, manage CDP resources, or even create custom applications. You can learn more about the API in its official documentation. There are multiple ways to access the API, including through a dedicated CLI, through a Java SDK, and through a low-level tool called cdpcurl. cdpcurl is des

Apache Airflow®: The Ultimate Guide to DAG Writing

Speaker: Tamara Fingerlin, Developer Advocate

In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!

Beyond Resilience-The Next Generation of Supply Chain

Teradata

After the shock of COVID exposed the brittle nature of many global supply chains, focus has shifted to resilience, a necessary consideration but not the only one.

64

Computer Vision in Healthcare: Creating an AI Diagnostic Tool for Medical Image Analysis

AltexSoft

Our lungs are the only body organs that constantly interact with the external environment, through the air we breathe. This exposure makes the respiratory system extremely susceptible to a wide range of diseases, from long-familiar asthma to novel COVID-19. Subtle at early stages, the signs of lung conditions are easy to overlook. And delays in diagnosis often lead to harsh consequences.

Medical 72

Why are database columns 191 characters?

Grouparoo

Sometimes, when you are looking at a database’s schema, you see that there are text fields defined like this: email_address varchar(191) NOT NULL. This means that the column supports strings with a maximum length of 191 characters, and can’t be null. 191 is such an odd number - where did it come from? In this post, we’ll look at the historical reasons for the 191 character limit as a default in most relational databases.
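
One commonly cited explanation, offered here as background rather than as the post's own wording, is a byte budget: older MySQL InnoDB row formats cap an index key prefix at 767 bytes, and the utf8mb4 character set can use up to 4 bytes per character. The short Python sketch below only performs that division; the constant and function names are illustrative.

```python
# Assumed background, not from the teaser: InnoDB index key prefix limit
# for the older COMPACT/REDUNDANT row formats.
INDEX_PREFIX_LIMIT_BYTES = 767


def max_indexable_chars(bytes_per_char: int) -> int:
    """Largest character count whose worst-case size fits in the prefix limit."""
    return INDEX_PREFIX_LIMIT_BYTES // bytes_per_char


print(max_indexable_chars(4))  # 191 (utf8mb4: up to 4 bytes per character)
print(max_indexable_chars(3))  # 255 (3-byte utf8: the familiar varchar(255))
```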

Accelerate Moving to CDP with Workload Manager

Cloudera

Since my last blog, What you need to know to begin your journey to CDP, we received many requests for a tool from Cloudera to analyze the workloads and help upgrade or migrate to Cloudera Data Platform (CDP). The good news is Cloudera has a tried and tested tool, Workload Manager (WM), that meets your needs. WM saves time and reduces risks during upgrades or migrations.

Optimizing The Modern Developer Experience with Coder

Many software teams have migrated their testing and production workloads to the cloud, yet development environments often remain tied to outdated local setups, limiting efficiency and growth. This is where Coder comes in. In our 101 Coder webinar, you’ll explore how cloud-based development environments can unlock new levels of productivity. Discover how to transition from local setups to a secure, cloud-powered ecosystem with ease.

Open Banking is Transforming Financial Services and Chipping Away the Relevance of Traditional Banks

Teradata

The sharing of client data in an Open Banking marketplace challenges banks to adopt a customer-centric approach & collaborate with new players to re-define their relevance.

Banking 59

DataKitchen’s Chris Bergh Reveals the Steps for Enterprise DataOps Success at Data Summit Connect 2021

DataKitchen

The post DataKitchen’s Chris Bergh Reveals the Steps for Enterprise DataOps Success at Data Summit Connect 2021 first appeared on DataKitchen.

Data 52

SaaS Industry Trends in Real-Time Analytics

Rockset

We're seeing a lot of growth in real-time analytics, ranging from companies that are delivering snappy, interactive experiences within their application to those doing semi-autonomous or autonomous machine learning processes. Companies are giving their users real-time data and insight with the goal of taking immediate action. This is the real-time analytics trend that we're seeing across the SaaS industry.

Announcing the 2021 Data Impact Awards

Cloudera

2020 saw us hosting our first ever fully digital Data Impact Awards ceremony, and it certainly was one of the highlights of our year. We saw a record number of entries and incredible examples of how customers were using Cloudera’s platform and services to unlock the power of data. Each year, taking a moment to celebrate successes provides us with a wonderful opportunity to reflect on the incredible work we do together.

Food 72

15 Modern Use Cases for Enterprise Business Intelligence

Large enterprises face unique challenges in optimizing their Business Intelligence (BI) output due to the sheer scale and complexity of their operations. Unlike smaller organizations, where basic BI features and simple dashboards might suffice, enterprises must manage vast amounts of data from diverse sources. What are the top modern BI use cases for enterprise businesses to help you get a leg up on the competition?

Data.What? Why You Should Keep Doing Data Integration

Teradata

Data integration plays a key part in data management. But many enterprises have lost faith in the value it can provide. Find out why data integration still matters.

Responsive Mega Menu Using React Bootstrap

Grouparoo

Having clear and accessible navigation is huge for website conversions. Sites with poor navigation are frustrating to use. Nested navigation menus are a common way to help keep top-level navigation to a minimum, but they can have major usability issues. A better way to handle a large number of links in a dropdown is to create a mega menu. Recently, we gave our site navigation a face lift using mega menus.

Media 52

Building Data Applications Powered by Real-Time Analytics

Rockset

For long-term success with real-time analytics, it is important to use the right tool for the job. Data applications are an emerging breed of applications that demand sub-second analytics on fresh data. Examples include logistics tracking, gaming leaderboards, investment decision systems, connected devices and embedded dashboards in SaaS apps. Real-time analytics is all about using data as soon as it is produced to answer questions, make predictions, understand relationships, and automate proces

Forrester – Chart Your Course To Insights-Driven Business Maturity

DataKitchen

As organizations strive to become more data-driven, Forrester recommends 5 actions to take to move from one stage of insights-driven business maturity to another. After establishing a solid strategy, the second phase involves planning key processes and practices to support the strategy, including “the emerging and increasingly important DataOps and ModelOps processes and methodologies.”

Prepare Now: 2025's Must-Know Trends For Product And Data Leaders

Speaker: Jay Allardyce, Deepak Vittal, Terrence Sheflin, and Mahyar Ghasemali

As we look ahead to 2025, business intelligence and data analytics are set to play pivotal roles in shaping success. Organizations are already starting to face a host of transformative trends as the year comes to a close, including the integration of AI in data analytics, an increased emphasis on real-time data insights, and the growing importance of user experience in BI solutions.

Cloud Migration Series (Step 3 of 5): Assess Readiness

Cloud Academy

This is part 3 of a 5-part series on best practices for enterprise cloud migration. Released weekly from the end of April to the end of May 2021, each article will cover a new phase of a business’s transition to the cloud, what to be on the lookout for, and how to ensure the journey is a success. Be sure to subscribe to our blog to be notified when new content goes live!

Cloud 40

Data Pipelining Mailchimp and Google Sheets

Grouparoo

We've improved the Getting Started Experience! Check out our UI Configuration method. The steps utilizing grouparoo generate will not be replicable as the command will be fully deprecated in v0.8.1. Web Developer Dylan: Hey there Mama's Travel, are you enjoying your new website? Client: Absolutely! There's just one more thing: I need a way to subscribe new people to my mailing list manually.

Find and Replace Text with SQL Regular Expressions in Rockset

Rockset

In our first blog, we used a regular expression to replace the quotes in genres. Afterward, we were able to UNNEST() the JSON object. We’ll be working with the same data set in this blog. In our data, there is a JSON string that’s called spoken_languages, and it’s formatted similarly to genres: [ { "spoken_languages": "[{'iso_639_1': 'fr', 'name': 'Français'}]" }] Assuming everything is consistent, we can just write the SQL statement similar to what we wrote for genres -
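
As a rough illustration of the idea in Python rather than Rockset SQL (the teaser's own statement is cut off, so this is not the post's query): swap the single quotes for double quotes with a regular expression, then parse the result as JSON. It assumes, as the teaser does, that the data is consistent, meaning no value contains an apostrophe of its own.

```python
import json
import re

# Sample record copied from the teaser; the variable names are illustrative.
record = {"spoken_languages": "[{'iso_639_1': 'fr', 'name': 'Français'}]"}

# Replace every single quote with a double quote so the string becomes valid JSON.
as_json = re.sub(r"'", '"', record["spoken_languages"])

# Parse the repaired string into a list of dicts (the rough analogue of UNNEST).
languages = json.loads(as_json)

print([lang["name"] for lang in languages])  # ['Français']
```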

SQL 40

How to Extract Snowflake Data Observability Metrics Using SQL in 5 Steps

Monte Carlo

Your team just migrated to Snowflake. Your CTO is all in on this “modern data stack,” or as she calls it: “The Enterprise Data Discovery.” But as any data engineer will tell you, not even the best tools will save you from broken pipelines. In fact, you’ve probably been on the receiving end of schema changes gone bad, duplicate tables, and one-too-many null values on more occasions than you wish to remember.

SQL 40

How to Drive Cost Savings, Efficiency Gains, and Sustainability Wins with MES

Speaker: Nikhil Joshi, Founder & President of Snic Solutions

Is your manufacturing operation reaching its efficiency potential? A Manufacturing Execution System (MES) could be the game-changer, helping you reduce waste, cut costs, and lower your carbon footprint. Join Nikhil Joshi, Founder & President of Snic Solutions, in this value-packed webinar as he breaks down how MES can drive operational excellence and sustainability.

The Future of Data Pipeline Tools Must Include Better Transformations Than ETL Ever Had

RudderStack

Data pipeline tools have always made transformations difficult to use. RudderStack Transformations are easy to build, debug, and manage. Read more.

Change the Primary Key Type with Sequelize

Grouparoo

We recently adjusted how we handle primary keys. Previously they were UUIDs with a max length of 40 characters. With our Declarative Sync feature, we allow developers to set primary key values from their configuration files. Thus, we needed to lengthen the maximum number of characters allowed on primary keys in our database. Seems simple, right? I thought so, too.

The Data Stack Journey: Lessons from Architecting Stacks at Heroku and Mattermost

RudderStack

Learn how to build a data stack that will scale with you as your business grows and your data function matures.

Data 40

Top 10 Tools for Data Engineers

RudderStack

The top 10 tools that data engineers use for building effective, efficient data infrastructure: Python, Spark, Snowflake, and more.

The Cloud Development Environment Adoption Report

Cloud Development Environments (CDEs) are changing how software teams work by moving development to the cloud. Our Cloud Development Environment Adoption Report gathers insights from 223 developers and business leaders, uncovering key trends in CDE adoption. With 66% of large organizations already using CDEs, these platforms are quickly becoming essential to modern development practices.