Sat. May 08, 2021 - Fri. May 14, 2021


How to make data pipelines idempotent

Start Data Engineering

What is an idempotent function? “Idempotence is the property of certain operations in mathematics and computer science whereby they can be applied multiple times without changing the result beyond the initial application” (Wikipedia). It is defined as f(f(x)) = f(x). In the data engineering context, this can come to mean that: running a data pipeline
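As a rough illustration of the f(f(x)) = f(x) property, the Python sketch below contrasts a load step that overwrites its output partition with one that appends to it. The in-memory warehouse dict and the function names overwrite_partition and append_rows are assumptions made for this example and are not taken from the article.

```python
from typing import Dict, List

# Hypothetical in-memory stand-in for a warehouse: run date -> rows.
Warehouse = Dict[str, List[dict]]


def overwrite_partition(warehouse: Warehouse, run_date: str, rows: List[dict]) -> Warehouse:
    """Replace the whole partition for run_date, so reruns leave the same state."""
    updated = {key: list(value) for key, value in warehouse.items()}
    updated[run_date] = list(rows)
    return updated


def append_rows(warehouse: Warehouse, run_date: str, rows: List[dict]) -> Warehouse:
    """Append to the partition; every rerun adds the same rows again."""
    updated = {key: list(value) for key, value in warehouse.items()}
    updated.setdefault(run_date, []).extend(rows)
    return updated


rows = [{"order_id": 1, "amount": 10.0}]

# Idempotent: running the load twice gives the same state as running it once.
once = overwrite_partition({}, "2021-05-08", rows)
twice = overwrite_partition(once, "2021-05-08", rows)
print(once == twice)  # True

# Not idempotent: the rerun duplicates the row.
appended_once = append_rows({}, "2021-05-08", rows)
appended_twice = append_rows(appended_once, "2021-05-08", rows)
print(len(appended_twice["2021-05-08"]))  # 2
```

Overwriting a well-defined slice of the output is only one way to get this behaviour; the point of the sketch is that a rerun for the same run date should leave the stored data unchanged.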


Introducing Confluent for Kubernetes

Confluent

We are excited to announce that Confluent for Kubernetes is generally available! Today, we are enabling our customers to realize many of the benefits of our cloud service with the […].


Trending Sources


Automating CDP Private Cloud Installations with Ansible

Cloudera

The introduction of CDP Public Cloud has dramatically reduced the time in which you can be up and running with Cloudera’s latest technologies, be it with containerised Data Warehouse, Machine Learning, Operational Database or Data Engineering experiences or the multi-purpose VM-based Data Hub style of deployment. In CDP Private Cloud, the introduction of Cloudera Data Warehouse and Cloudera Machine Learning Experiences on RedHat OpenShift Kubernetes clusters means that we can deploy new


Building Your Data Warehouse On Top Of PostgreSQL

Data Engineering Podcast

Summary: There is a lot of attention on the database market and cloud data warehouses. While they provide a measure of convenience, they also require you to sacrifice a certain amount of control over your data. If you want to build a warehouse that gives you both control and flexibility, then you might consider building on top of the venerable PostgreSQL project.



Data Observability and Monitoring with DataOps

DataKitchen

Data errors impact decision-making. When analytics and dashboards are inaccurate, business leaders may not be able to solve problems and pursue opportunities. Data errors infringe on work-life balance. They cause people to work long hours at the expense of personal and family time. Data errors also affect careers. If you have been in the data profession for any length of time, you probably know what it means to face a mob of stakeholders who are angry about inaccurate or late analytics.


Big Data Analytics: How It Works, Tools, and Real-Life Applications

AltexSoft

Big Data enjoys the hype around it and for a reason. But the understanding of the essence of Big Data and ways to analyze it is still blurred. The truth is, there’s more to this term than just the size of information generated. Not only does Big Data apply to the huge volumes of continuously growing data that come in different formats, but it also refers to the range of processes, tools, and approaches used to gain insights from that data.

More Trending


Making Analytical APIs Fast With Tinybird

Data Engineering Podcast

Summary: Building an API for real-time data is a challenging project. Making it robust, scalable, and fast is a full-time job. The team at Tinybird wants to make it easy to turn a continuous stream of data into a production-ready API or data product. In this episode, CEO Jorge Sancha explains how they have architected their system to handle high data throughput and fast response times, and why they have invested heavily in ClickHouse as the core of their platform.


Achieving observability in async workflows

Netflix Tech

Written by Colby Callahan, Megha Manohara, and Mike Azar. Managing and operating asynchronous workflows can be difficult without the proper tools and architecture that put observability, debugging, and tracing at the forefront. Imagine getting paged outside normal work hours: users are having trouble with the application you’re responsible for, and you start diving into logs.


Beyond Resilience-The Next Generation of Supply Chain

Teradata

After the shock of COVID exposed the brittle nature of many global supply chains, focus has shifted to resilience, a necessary consideration but not the only one.


cdpcurl: Low-Level CDP API Access

Cloudera

Cloudera Data Platform (CDP) provides an API that enables you to access CDP functionality from a script, or to integrate CDP features with an application. In practice you can use the CDP API to script repetitive tasks, manage CDP resources, or even create custom applications. You can learn more about the API in its official documentation. There are multiple ways to access the API, including through a dedicated CLI, through a Java SDK, and through a low-level tool called cdpcurl. cdpcurl is des



SaaS Industry Trends in Real-Time Analytics

Rockset

We're seeing a lot of growth in real-time analytics, ranging from companies that are delivering snappy, interactive experiences within their application to those doing semi-autonomous or autonomous machine learning processes. Companies are giving their users real-time data and insight with the goal of taking immediate action. This is the real-time analytics trend that we're seeing across the SaaS industry.


Why are database columns 191 characters?

Grouparoo

Sometimes, when you are looking at a database’s schema, you see that there are text fields defined like this: email_address varchar(191) NOT NULL. This means that the column supports strings with a maximum length of 191 characters, and can’t be null. 191 is such an odd number - where did it come from? In this post, we’ll look at the historical reasons for the 191 character limit as a default in most relational databases.
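The excerpt stops before giving the reason. A commonly cited explanation, offered here as background and not as the article's own wording, is specific to MySQL: older InnoDB row formats cap an index key prefix at 767 bytes, and the utf8mb4 character set can need up to 4 bytes per character. The arithmetic can be checked in a few lines of Python; the constant and function names are illustrative.

```python
# Assumed background figures (MySQL-specific, not taken from the excerpt).
INNODB_MAX_INDEX_PREFIX_BYTES = 767  # limit in older InnoDB row formats
UTF8MB4_MAX_BYTES_PER_CHAR = 4       # full Unicode, including emoji
UTF8MB3_MAX_BYTES_PER_CHAR = 3       # MySQL's legacy 3-byte "utf8"


def max_indexable_chars(limit_bytes: int, bytes_per_char: int) -> int:
    """Longest string length whose worst-case encoding still fits the limit."""
    return limit_bytes // bytes_per_char


print(max_indexable_chars(INNODB_MAX_INDEX_PREFIX_BYTES, UTF8MB4_MAX_BYTES_PER_CHAR))  # 191
print(max_indexable_chars(INNODB_MAX_INDEX_PREFIX_BYTES, UTF8MB3_MAX_BYTES_PER_CHAR))  # 255
```

Under these assumptions, 191 is the largest fully indexable length for a utf8mb4 column, just as 255 was for the older 3-byte encoding.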


Open Banking is Transforming Financial Services and Chipping Away the Relevance of Traditional Banks

Teradata

The sharing of client data in an Open Banking marketplace challenges banks to adopt a customer-centric approach & collaborate with new players to re-define their relevance.


Accelerate Moving to CDP with Workload Manager

Cloudera

Since my last blog, What you need to know to begin your journey to CDP, we received many requests for a tool from Cloudera to analyze the workloads and help upgrade or migrate to Cloudera Data Platform (CDP). The good news is Cloudera has a tried and tested tool, Workload Manager (WM), that meets your needs. WM saves time and reduces risks during upgrades or migrations.



DataKitchen’s Chris Bergh Reveals the Steps for Enterprise DataOps Success at Data Summit Connect 2021

DataKitchen

The post DataKitchen’s Chris Bergh Reveals the Steps for Enterprise DataOps Success at Data Summit Connect 2021 first appeared on DataKitchen.


Responsive Mega Menu Using React Bootstrap

Grouparoo

Having clear and accessible navigation is huge for website conversions. Sites with poor navigation are frustrating to use. Nested navigation menus are a common way to help keep top-level navigation to a minimum, but they can have major usability issues. A better way to handle a large number of links in a dropdown is to create a mega menu. Recently, we gave our site navigation a face lift using mega menus.


Data.What? Why You Should Keep Doing Data Integration

Teradata

Data integration plays a key part in data management. But many enterprises have lost faith in the value it can provide. Find out why data integration still matters.


Announcing the 2021 Data Impact Awards

Cloudera

2020 saw us hosting our first ever fully digital Data Impact Awards ceremony, and it certainly was one of the highlights of our year. We saw a record number of entries and incredible examples of how customers were using Cloudera’s platform and services to unlock the power of data. Each year, taking a moment to celebrate successes provides us with a wonderful opportunity to reflect on the incredible work we do together.



Forrester – Chart Your Course To Insights-Driven Business Maturity

DataKitchen

As organizations strive to become more data-driven, Forrester recommends 5 actions to take to move from one stage of insights-driven business maturity to another. After establishing a solid strategy, the second phase involves planning key processes and practices to support the strategy, including “the emerging and increasingly important DataOps and ModelOps processes and methodologies.”


Data Pipelining Mailchimp and Google Sheets

Grouparoo

We've improved the Getting Started Experience! Check out our UI Configuration method. The steps utilizing grouparoo generate will not be replicable as the command will be fully deprecated in v0.8.1. Web Developer Dylan: Hey there Mama's Travel, are you enjoying your new website? Client: Absolutely! There's just one more thing: I need a way to subscribe new people to my mailing list manually.


Find and Replace Text with SQL Regular Expressions in Rockset

Rockset

In our first blog, we used a regular expression to replace the quotes in genres. Afterward, we were able to UNNEST() the JSON object. We’ll be working with the same data set in this blog. In our data, there is a JSON string that’s called spoken_languages, and it’s formatted similarly to genres: [ { "spoken_languages": "[{'iso_639_1': 'fr', 'name': 'Français'}]" }] Assuming everything is consistent, we can just write the SQL statement similar to what we wrote for genres -
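The quote-replacement idea can also be sketched outside Rockset. The Python snippet below is an analogue, not the Rockset SQL from the post: it uses the standard re and json modules to turn the single-quoted string from the excerpt into valid JSON. The helper name parse_single_quoted_json is made up for this example. Like the excerpt, it assumes the data is consistent; a value that itself contains an apostrophe would break this naive substitution.

```python
import json
import re

# Sample record copied from the excerpt: the inner value is a string that
# looks like JSON but uses single quotes, so json.loads rejects it as-is.
record = {"spoken_languages": "[{'iso_639_1': 'fr', 'name': 'Français'}]"}


def parse_single_quoted_json(text: str) -> list:
    """Swap single quotes for double quotes, then parse the result as JSON."""
    as_json = re.sub(r"'", '"', text)
    return json.loads(as_json)


languages = parse_single_quoted_json(record["spoken_languages"])
print(languages[0]["name"])  # Français
```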


RudderStack is Now SOC 2 Certified

RudderStack

RudderStack is now more secure than ever before. Here's how we received our SOC 2 Type 1 certification. Click for more.



Cloud Migration Series (Step 3 of 5): Assess Readiness

Cloud Academy

This is part 3 of a 5-part series on best practices for enterprise cloud migration. Released weekly from the end of April to the end of May 2021, each article will cover a new phase of a business’s transition to the cloud, what to be on the lookout for, and how to ensure the journey is a success. Be sure to subscribe to our blog to be notified when new content goes live!


Change the Primary Key Type with Sequelize

Grouparoo

We recently adjusted how we handle primary keys. Previously they were UUIDs with a max length of 40 characters. With our Declarative Sync feature, we allow developers to set primary key values from their configuration files. Thus, we needed to lengthen the maximum number of characters allowed on primary keys in our database. Seems simple, right? I thought so, too.


How to Extract Snowflake Data Observability Metrics Using SQL in 5 Steps

Monte Carlo

Your team just migrated to Snowflake. Your CTO is all in on this “modern data stack,” or as she calls it: “The Enterprise Data Discovery.” But as any data engineer will tell you, not even the best tools will save you from broken pipelines. In fact, you’ve probably been on the receiving end of schema changes gone bad, duplicate tables, and one too many null values on more occasions than you wish to remember.


Using Grafana to Monitor the Health and Status of Your Customer Data Pipelines

RudderStack

Learn the specific metrics and logs produced by RudderStack and how you can use Grafana to create system health reports for your customer data pipelines.



Using kafka-merge-purge to Deal with Failure in an Event-Driven System at FLYERALARM

Confluent

Failures are inevitable in any system, and there are various options for mitigating them automatically. This is made possible by event-driven applications leveraging Apache Kafka® and built with fault tolerance […].


Computer Vision in Healthcare: Creating an AI Diagnostic Tool for Medical Image Analysis

AltexSoft

Our lungs are the only body organs that constantly interact with the external environment, through the air we breathe. This exposure makes the respiratory system extremely susceptible to a wide range of diseases, from long-familiar asthma to novel COVID-19. Subtle at early stages, the signs of lung conditions are easy to overlook. And delays in diagnosis often lead to harsh consequences.


Monte Carlo Named a Best Workplace for 2021 By Inc. Magazine

Monte Carlo

On behalf of the entire team, I’m excited to share that Monte Carlo has been named to Inc. magazine’s annual list of Best Workplaces for 2021, as well as to the publication’s On the Rise category of companies under 4 years old. Hitting newsstands May 18 in the May/June 2021 issue, and as part of a prominent Inc.com feature, the list is the result of a wide-ranging and comprehensive measurement of American companies that have created exceptional workplaces and company culture whether teams


RudderStack ETL Makes Cloud-to-Warehouse Pipelines Easy

RudderStack

ETL makes it easy to build ELT pipelines from your cloud applications to your warehouse by providing integrations for Salesforce, ZenDesk, and many more.

