Sat.May 08, 2021 - Fri.May 14, 2021

article thumbnail

How to make data pipelines idempotent

Start Data Engineering

What is an idempotent function Pre-requisites Why idempotency matters Making your data pipeline idempotent Conclusion Further reading References What is an idempotent function “Idempotence is the property of certain operations in mathematics and computer science whereby they can be applied multiple times without changing the result beyond the initial application” - wikipedia Defined as f(f(x)) = f(x) In the data engineering context, this can come to mean that: running a data pipeline

article thumbnail

Introducing Confluent for Kubernetes

Confluent

We are excited to announce that Confluent for Kubernetes is generally available! Today, we are enabling our customers to realize many of the benefits of our cloud service with the […].

Cloud 137
Insiders

Sign Up for our Newsletter

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

article thumbnail

Building Your Data Warehouse On Top Of PostgreSQL

Data Engineering Podcast

Summary There is a lot of attention on the database market and cloud data warehouses. While they provide a measure of convenience, they also require you to sacrifice a certain amount of control over your data. If you want to build a warehouse that gives you both control and flexibility then you might consider building on top of the venerable PostgreSQL project.

article thumbnail

Automating CDP Private Cloud Installations with Ansible

Cloudera

The introduction of CDP Public Cloud has dramatically reduced the time in which you can be up and running with Cloudera’s latest technologies, be it with containerised Data Warehouse , Machine Learning , Operational Database or Data Engineering experiences or the multi-purpose VM-based Data Hub style of deployment. In CDP Private Cloud, the introduction of Cloudera Data Warehouse and Cloudera Machine Learning Experiences on RedHat OpenShift Kubernetes clusters means that we can deploy new

Cloud 104
article thumbnail

15 Modern Use Cases for Enterprise Business Intelligence

Large enterprises face unique challenges in optimizing their Business Intelligence (BI) output due to the sheer scale and complexity of their operations. Unlike smaller organizations, where basic BI features and simple dashboards might suffice, enterprises must manage vast amounts of data from diverse sources. What are the top modern BI use cases for enterprise businesses to help you get a leg up on the competition?

article thumbnail

Data Observability and Monitoring with DataOps

DataKitchen

Data errors impact decision-making. When analytics and dashboards are inaccurate, business leaders may not be able to solve problems and pursue opportunities. Data errors infringe on work-life balance. They cause people to work long hours at the expense of personal and family time. Data errors also affect careers. If you have been in the data profession for any length of time, you probably know what it means to face a mob of stakeholders who are angry about inaccurate or late analytics.

article thumbnail

Using kafka-merge-purge to Deal with Failure in an Event-Driven System at FLYERALARM

Confluent

Failures are inevitable in any system, and there are various options for mitigating them automatically. This is made possible by event-driven applications leveraging Apache Kafka® and built with fault tolerance […].

Kafka 95

More Trending

article thumbnail

5 Factors to Consider When Choosing a Stream Processing Engine

Cloudera

Are you using the right stream processing engine for the job at hand? You might think you are—and you very well might be!—but have you really examined the stream processing engines out there in a side-by-side comparison to make sure? Our Choose the Right Stream Processing Engine for Your Data Needs whitepaper makes those comparisons for you, so you can quickly and confidently determine which engine best meets your key business requirements.

Process 101
article thumbnail

Big Data Analytics: How It Works, Tools, and Real-Life Applications

AltexSoft

Big Data enjoys the hype around it and for a reason. But the understanding of the essence of Big Data and ways to analyze it is still blurred. The truth is, there’s more to this term than just the size of information generated. Not only does Big Data apply to the huge volumes of continuously growing data that come in different formats, but it also refers to the range of processes, tools, and approaches used to gain insights from that data.

article thumbnail

Achieving observability in async workflows

Netflix Tech

Written by Colby Callahan , Megha Manohara , and Mike Azar. Managing and operating asynchronous workflows can be difficult without the proper tools and architecture that puts observability, debugging, and tracing at the forefront. Imagine getting paged outside normal work hours?—?users are having trouble with the application you’re responsible for, and you start diving into logs.

Java 65
article thumbnail

Beyond Resilience-The Next Generation of Supply Chain

Teradata

After the shock of COVID exposed the brittle nature of many global supply chains, focus has shifted to resilience, a necessary consideration but not the only one.

64
article thumbnail

Prepare Now: 2025s Must-Know Trends For Product And Data Leaders

Speaker: Jay Allardyce, Deepak Vittal, and Terrence Sheflin

As we look ahead to 2025, business intelligence and data analytics are set to play pivotal roles in shaping success. Organizations are already starting to face a host of transformative trends as the year comes to a close, including the integration of AI in data analytics, an increased emphasis on real-time data insights, and the growing importance of user experience in BI solutions.

article thumbnail

cdpcurl: Low-Level CDP API Access

Cloudera

Cloudera Data Platform (CDP) provides an API that enables you to access CDP functionality from a script, or to integrate CDP features with an application. In practice you can use the CDP API to script repetitive tasks, manage CDP resources, or even create custom applications. You can learn more about the API in its official documentation. There are multiple ways to access the API, including through a dedicated CLI , through a Java SDK , and through a low-level tool called cdpcurl. cdpcurl is des

article thumbnail

Computer Vision in Healthcare: Creating an AI Diagnostic Tool for Medical Image Analysis

AltexSoft

Our lungs are the only body organs that constantly interact with the external environment, through the air we breathe. This exposure makes the respiratory system extremely susceptible to a wide range of diseases, from long-familiar asthma to novel COVID-19. Subtle at early stages, the signs of lung conditions are easy to overlook. And delays in diagnosis often lead to harsh consequences.

Medical 72
article thumbnail

Why are database columns 191 characters?

Grouparoo

Sometimes, when you are looking at a database’s schema, you see that there are text fields defined like this: email_address varchar ( 191 ) NOT NULL This means that the column supports strings with a maximum length of 191 characters, and can’t be null. 191 is such an odd number - where did it come from? In this post, we’ll look at the historical reasons for the 191 character limit as a default in most relational databases.

article thumbnail

Open Banking is Transforming Financial Services and Chipping Away the Relevance of Traditional Banks

Teradata

The sharing of client data in an Open Banking marketplace challenges banks to adopt a customer-centric approach & collaborate with new players to re-define their relevance.

Banking 59
article thumbnail

How to Drive Cost Savings, Efficiency Gains, and Sustainability Wins with MES

Speaker: Nikhil Joshi, Founder & President of Snic Solutions

Is your manufacturing operation reaching its efficiency potential? A Manufacturing Execution System (MES) could be the game-changer, helping you reduce waste, cut costs, and lower your carbon footprint. Join Nikhil Joshi, Founder & President of Snic Solutions, in this value-packed webinar as he breaks down how MES can drive operational excellence and sustainability.

article thumbnail

Accelerate Moving to CDP with Workload Manager

Cloudera

Since my last blog, What you need to know to begin your journey to CDP , we received many requests for a tool from Cloudera to analyze the workloads and help upgrade or migrate to Cloudera Data Platform (CDP). The good news is Cloudera has a tried and tested tool, Workload Manager (WM) that meets your needs. WM saves time and reduces risks during upgrades or migrations.

article thumbnail

SaaS Industry Trends in Real-Time Analytics

Rockset

We're seeing a lot of growth in real time analytics, ranging from companies that are delivering snappy, interactive experiences within their application to those doing semi-autonomous or autonomous machine learning processes. Companies are giving their users real-time data and insight with the goal of taking immediate action. This is the real time analytics trend that we're seeing across the SaaS industry.

article thumbnail

Responsive Mega Menu Using React Bootstrap

Grouparoo

Having clear and accessible navigation is huge for website conversions. Sites with poor navigation are frustrating to use. Nested navigation menus are a common way to help keep top-level navigation to a minimum, but they can have major usability issues. A better way to handle a large number of links in a dropdown is to create a mega menu. Recently, we gave our site navigation a face lift using mega menus.

Media 52
article thumbnail

Forrester – Chart Your Course To Insights-Driven Business Maturity

DataKitchen

As organizations strive to become more data-driven, Forrester recommends 5 actions to take to move from one stage of insights-driven business maturity to another. . After establishing a solid strategy, the second phase involves planning key processes and practices to support the strategy, including “the emerging and increasingly important DataOps and ModelOps processes and methodologies.”.

article thumbnail

Improving the Accuracy of Generative AI Systems: A Structured Approach

Speaker: Anindo Banerjea, CTO at Civio & Tony Karrer, CTO at Aggregage

When developing a Gen AI application, one of the most significant challenges is improving accuracy. This can be especially difficult when working with a large data corpus, and as the complexity of the task increases. The number of use cases/corner cases that the system is expected to handle essentially explodes. 💥 Anindo Banerjea is here to showcase his significant experience building AI/ML SaaS applications as he walks us through the current problems his company, Civio, is solving.

article thumbnail

Announcing the 2021 Data Impact Awards

Cloudera

2020 saw us hosting our first ever fully digital Data Impact Awards ceremony, and it certainly was one of the highlights of our year. We saw a record number of entries and incredible examples of how customers were using Cloudera’s platform and services to unlock the power of data. Each year, taking a moment to celebrate successes provides us with a wonderful opportunity to reflect on the incredible work we do together.

Food 69
article thumbnail

Building Data Applications Powered by Real-Time Analytics

Rockset

For long-term success with real-time analytics it is important to use the right tool for the job. Data applications are an emerging breed of applications that demand sub-second analytics on fresh data. Examples include logistics tracking, gaming leaderboards, investment decisions systems, connected devices and embedded dashboards in SaaS apps. Real-time analytics is all about using data as soon as it is produced to answer questions, make predictions, understand relationships, and automate proces

article thumbnail

Data Pipelining Mailchimp and Google Sheets

Grouparoo

We've improved the Getting Started Experience! Check out our UI Configuration method. The steps utilizing grouparoo generate will not be replicable as the command will be fully deprecated in v0.8.1 Web Developer Dylan : Hey there Mama's Travel, are you enjoying your new website? Client : Absolutely! There's just one more thing: I need a way to subscribe new people to my mailing list manually.

article thumbnail

Data.What? Why You Should Keep Doing Data Integration

Teradata

Data integration plays a key part of data management. But many enterprises have lost the faith in the value it can provide. Find out why data integration still matters.

article thumbnail

The Ultimate Guide To Data-Driven Construction: Optimize Projects, Reduce Risks, & Boost Innovation

Speaker: Donna Laquidara-Carr, PhD, LEED AP, Industry Insights Research Director at Dodge Construction Network

In today’s construction market, owners, construction managers, and contractors must navigate increasing challenges, from cost management to project delays. Fortunately, digital tools now offer valuable insights to help mitigate these risks. However, the sheer volume of tools and the complexity of leveraging their data effectively can be daunting. That’s where data-driven construction comes in.

article thumbnail

DataKitchen’s Chris Bergh Reveals the Steps for Enterprise DataOps Success at Data Summit Connect 2021

DataKitchen

The post DataKitchen’s Chris Bergh Reveals the Steps for Enterprise DataOps Success at Data Summit Connect 2021 first appeared on DataKitchen.

Data 52
article thumbnail

Find and Replace Text with SQL Regular Expressions in Rockset

Rockset

In our first blog , we used a regular expression to replace the quotes in genres. Afterward, we were able to UNNEST() the JSON object. We’ll be working with the same data set in this blog In our data: Embedded content: [link] there is a JSON string that’s called spoken_languages, and it’s formatted similarly to genres: [ { "spoken_languages": "[{'iso_639_1': 'fr', 'name': 'Français'}]" }] Assuming everything is consistent, we can just write the SQL statement similar to what we wrote for genres -

SQL 40
article thumbnail

Change the Primary Key Type with Sequelize

Grouparoo

We recently adjusted how we handle primary keys. Previously they were UUIDs with a max length of 40 characters. With our Declarative Sync feature, we allow developers to set primary key values from their configuration files. Thus, we needed to lengthen the maximum number of characters allowed on primary keys in our database. Seems simple, right? I thought so, too.

article thumbnail

RudderStack + Blendo: Better Together

RudderStack

RudderStack acquires Blendo: Kostas Pardalis, the founder of Blendo, talks about why he decided to merge Blendo with RudderStack, building the team, and working on the product together.

article thumbnail

Business Intelligence 101: How To Make The Best Solution Decision For Your Organization

Speaker: Evelyn Chou

Choosing the right business intelligence (BI) platform can feel like navigating a maze of features, promises, and technical jargon. With so many options available, how can you ensure you’re making the right decision for your organization’s unique needs? 🤔 This webinar brings together expert insights to break down the complexities of BI solution vetting.

article thumbnail

Cloud Migration Series (Step 3 of 5): Assess Readiness

Cloud Academy

This is part 3 of a 5-part series on best practices for enterprise cloud migration. Released weekly from the end of April to the end of May 2021, each article will cover a new phase of a business’s transition to the cloud, what to be on the lookout for, and how to ensure the journey is a success. Be sure to subscribe to our blog to be notified when new content goes live!

Cloud 40
article thumbnail

How to Extract Snowflake Data Observability Metrics Using SQL in 5 Steps

Monte Carlo

Your team just migrated to Snowflake. Your CTO is all in on this “modern data stack,” or as she calls it: “ The Enterprise Data Discovery.” But as any data engineer will tell you, not even the best tools will save you from broken pipelines. In fact, you’ve probably been on the receiving end of schema changes gone bad, duplicate tables, and one-too-many null values on more occasions than you wish to remember.

SQL 40
article thumbnail

Why Engineering and IT Need to own the CDP

RudderStack

Why Engineering and IT organizations should own CDP implementation & management. Here, we'll cover all of the benefits to Engineering and IT-led CDP implementation and management.

IT 40
article thumbnail

Reverse ETL is Just Another Data Pipeline

RudderStack

Reverse ETL is nothing but just another data pipeline that carries data from one end to another, & Single Pipe Simplifies Your Stack, Security & Data governance

article thumbnail

Driving Responsible Innovation: How to Navigate AI Governance & Data Privacy

Speaker: Aindra Misra, Senior Manager, Product Management (Data, ML, and Cloud Infrastructure) at BILL

Join us for an insightful webinar that explores the critical intersection of data privacy and AI governance. In today’s rapidly evolving tech landscape, building robust governance frameworks is essential to fostering innovation while staying compliant with regulations. Our expert speaker, Aindra Misra, will guide you through best practices for ensuring data protection while leveraging AI capabilities.