Sat.Jul 10, 2021 - Fri.Jul 16, 2021

Tyrannical Data and Its Antidotes in the Microservices World

Confluent

Data is the lifeblood of so much of what we build as software professionals, so it’s unsurprising that operations involving its transfer occupy the vast majority of developer time across […].

Delivering Modern Enterprise Data Engineering with Cloudera Data Engineering on Azure

Cloudera

After the launch of CDP Data Engineering (CDE) on AWS a few months ago, we are thrilled to announce that CDE, the only cloud-native service purpose-built for enterprise data engineers, is now available on Microsoft Azure. CDP Data Engineering offers an all-inclusive toolset that enables data pipeline orchestration, automation, advanced monitoring, visual profiling, and a comprehensive management toolset for streamlining ETL processes and making complex data actionable across your analytic team

Trending Sources

Customer Support Automation Platform at Uber

Uber Engineering

If you’ve used any online/digital service, chances are that you are familiar with what a typical customer service experience entails: you send a message (usually email aliased) to the company’s support staff, fill …

Low Code And High Quality Data Engineering For The Whole Organization With Prophecy

Data Engineering Podcast

Summary: There is a wealth of tools and systems available for processing data, but the user experience of integrating them and building workflows is still lacking. This is particularly important in large and complex organizations where domain knowledge and context are paramount and there may not be access to engineers for codifying that expertise. Raj Bains founded Prophecy to address this need by creating a UI-first platform for building and executing data engineering workflows that orchestrates

A Guide to Debugging Apache Airflow® DAGs

In Airflow, DAGs (your data pipelines) support nearly every use case. As these workflows grow in complexity and scale, efficiently identifying and resolving issues becomes a critical skill for every data engineer. This is a comprehensive guide with best practices and examples for debugging Airflow DAGs. You’ll learn how to: Create a standardized process for debugging to quickly diagnose errors in your DAGs Identify common issues with DAGs, tasks, and connections Distinguish between Airflow-relate

Create a Data Analysis Pipeline with Apache Kafka and RStudio

Confluent

In Data Science projects, we distinguish between descriptive analytics and statistical models running in production. Overall, these can be seen as one process. You start with analyzing historical data to […].

Accelerate Offloading to Cloudera Data Warehouse (CDW) with Procedural SQL Support

Cloudera

Did you know Cloudera customers, such as SMG and Geisinger, offloaded their legacy DW environment to Cloudera Data Warehouse (CDW) to take advantage of CDW’s modern architecture and best-in-class performance? In addition to substantial cost savings upon moving to CDW, Geisinger is also able to search through hundreds of millions of patient note records in seconds, providing better treatment to their patients.

More Trending

Exploring The Design And Benefits Of The Modern Data Stack

Data Engineering Podcast

Summary: We have been building platforms and workflows to store, process, and analyze data since the earliest days of computing. Over that time there have been countless architectures, patterns, and "best practices" to make that task manageable. With the growing popularity of cloud services, a new pattern has emerged and been dubbed the "Modern Data Stack". In this episode members of the GoDataDriven team, Guillermo Sanchez, Bram Ochsendorf, and Juan Perafan, explain the combination

Data Engineers of Netflix — Interview with Kevin Wylie

Netflix Tech

This post is part of our “Data Engineers of Netflix” series, where our very own data engineers talk about their journeys to Data Engineering @ Netflix. Kevin Wylie is a Data Engineer on the Content Data Science and Engineering team. In this post, Kevin talks about his extensive experience in content analytics at Netflix since joining more than 10 years ago.

DIA Entries 2021: Judges’ Insight

Cloudera

The 2021 Data Impact Award (DIA) submissions are starting to stream in, and we know many of you are contemplating your entries – which we are excited to see. To help guide your award strategy, we thought it would be an excellent opportunity to ask our judges — a panel comprised of leading analysts and journalists well-versed in the application of data and the wider benefits it can bring across industries – what it takes for a winning project.

How to build a successful cloud data architecture

DataKitchen

The post How to build a successful cloud data architecture first appeared on DataKitchen.

Mastering Apache Airflow® 3.0: What’s New (and What’s Next) for Data Orchestration

Speaker: Tamara Fingerlin, Developer Advocate

Apache Airflow® 3.0, the most anticipated Airflow release yet, officially launched this April. As the de facto standard for data orchestration, Airflow is trusted by over 77,000 organizations to power everything from advanced analytics to production AI and MLOps. With the 3.0 release, the top-requested features from the community were delivered, including a revamped UI for easier navigation, stronger security, and greater flexibility to run tasks anywhere at any time.

Real-Time Analytics with dbt + Rockset

Rockset

Rockset was founded to make it easy for developers and data teams to go from real-time data to actionable insights. We designed Rockset to remove many of the barriers teams face while building with real-time data, including data preparation, performance tuning and infrastructure management. We also built it from the ground up to support full SQL (including joins and aggregations), the most common query language for analytics.

The Post-Pandemic Supply Chain: Time to Go Back to Basics?

Teradata

Learn how complexities baked into the data analytics ecosystems of supply chains can be simplified to eliminate redundancy, increase time to value, and reduce cost.

Paving the way for women in Tech: Fostering young girls’ enthusiasm for STEM

Cloudera

In the late 90s, when I was pursuing my studies in engineering, only a few girls enrolled in any STEM-related courses. While it was our love for math & science and the prospect of future opportunities that brought us here, we sadly found many of them gave up halfway through the course, and those who graduated either quit or never entered the profession.

Keys to DataOps Transformation

DataKitchen

The post Keys to DataOps Transformation first appeared on DataKitchen.

Agent Tooling: Connecting AI to Your Tools, Systems & Data

Speaker: Alex Salazar, CEO & Co-Founder @ Arcade | Nate Barbettini, Founding Engineer @ Arcade | Tony Karrer, Founder & CTO @ Aggregage

There’s a lot of noise surrounding the ability of AI agents to connect to your tools, systems and data. But building an AI application into a reliable, secure workflow agent isn’t as simple as plugging in an API. As an engineering leader, it can be challenging to make sense of this evolving landscape, but agent tooling provides such high value that it’s critical we figure out how to move forward.

Identifying Document Types at Scribd

Scribd Technology

User-uploaded documents have been a core component of Scribd’s business from the very beginning, and understanding what is actually in the document corpus unlocks exciting new opportunities for discovery and recommendation. With Scribd, anybody can upload and share documents, analogous to YouTube and videos. Over the years, our document corpus has become larger and more diverse, which has made understanding it an ever-increasing challenge.

How to Get More ROI—Faster—From Machine Learning

Teradata

Find out how to harness machine learning and AI to contain costs, increase revenue, and grow your organization’s customer base. Read more.

A Reference Architecture for the Cloudera Private Cloud Base Data Platform

Cloudera

The release of Cloudera Data Platform (CDP) Private Cloud Base edition provides customers with a next generation hybrid cloud architecture. This blog post provides an overview of best practices for the design and deployment of clusters incorporating hardware and operating system configuration, along with guidance for networking and security as well as integration with existing enterprise infrastructure.

The Weekly ETL: Will Data Engineering Ever Be Sexy like Data Science?

Monte Carlo

In Monte Carlo’s Weekly ETL (Explanations Through Lior) series, Lior Gavish, Monte Carlo’s co-founder and CTO, answers a trending question on Reddit about some of data engineering’s hottest topics. Reddit user /SWE-Aaron asks if data engineering will ever get the same attention as data science and whether that would actually be a good thing.

How to Modernize Manufacturing Without Losing Control

Speaker: Andrew Skoog, Founder of MachinistX & President of Hexis Representatives

Manufacturing is evolving, and the right technology can empower—not replace—your workforce. Smart automation and AI-driven software are revolutionizing decision-making, optimizing processes, and improving efficiency. But how do you implement these tools with confidence and ensure they complement human expertise rather than override it? Join industry expert Andrew Skoog as he explores how manufacturers can leverage automation to enhance operations, streamline workflows, and make smarter, data-dri

Top 15 Cloud Computing Projects Ideas for Beginners in 2023

ProjectPro

The number of people searching for cloud computing jobs per million grew by approximately 50%. According to an Indeed Jobs report, the share of cloud computing jobs has increased by 42% per million from 2018 to 2021. The global cloud computing market is poised to grow by $287.03 billion during 2021-2025. Also, global spending on public cloud services will double by 2023.

Why It’s Hard for Engineering to Support Marketing

RudderStack

Marketing teams get a bad rap from engineering, oftentimes for understandable reasons.

Optimizing Risk and Exposure Management – Roundtable Highlights

Cloudera

We recently hosted a roundtable focused on optimizing risk and exposure management with data insights. For financial institutions and insurers, risk and exposure management has always been a fundamental tenet of the business. Now, risk management has become exponentially complicated in multiple dimensions. In this session we explored what firms are doing to approach the uncertainty with more predictability.

Data Quality Solutions: Build or Buy? 4 Things To Know

Monte Carlo

As data pipelines become increasingly complex, investing in a data quality solution is becoming an increasingly important priority for modern data teams. But should you build it—or buy it? There are 4 key challenges, opportunities, and trade-offs when considering building or buying a data observability or data quality solution. In this post we will cover: The importance of data quality Understanding the expected time-to-value for your data quality solution Factoring in the opportunity cost of bu

The Ultimate Guide to Apache Airflow DAGS

With Airflow being the open-source standard for workflow orchestration, knowing how to write Airflow DAGs has become an essential skill for every data engineer. This eBook provides a comprehensive overview of DAG writing features with plenty of example code. You’ll learn how to: Understand the building blocks DAGs, combine them in complex pipelines, and schedule your DAG to run exactly when you want it to Write DAGs that adapt to your data at runtime and set up alerts and notifications Scale you

20 Linear Regression Interview Questions and Answers 2023

ProjectPro

Linear Regression is probably one of the most well-known machine learning algorithms. It essentially involves modeling the relation between the given or derived parameters and the target to be learned. Therefore, any machine learning job interview would be incomplete without a peppering of Linear Regression questions. These linear regression interview questions and answers will help you prepare for your machine learning interview.

Apache Superset 1.2: Release Notes

Preset

We're excited to announce the release of Apache Superset 1.2! In this release post, we will focus on the biggest and most interesting tangible, end-user features.

Courage and Curiosity: Valuable Attributes for Women in Big Data

Cloudera

Last week we held our third Women In Data Webinar, and what a session it was! We were honored to welcome Justyna Lebedyk, Senior Product Owner Big Data, Commerzbank AG, who posed the question “Does diversity win?” I had the pleasure of chatting with Justyna about the key themes from her talk and what advice she would give to others looking to pursue a career in data.

Announcing Monte Carlo’s Incident IQ, a Root Cause Analysis Workflow for Data Teams

Monte Carlo

Incident IQ gives data engineers and analysts a centralized, all-in-one solution for conducting incident management and root cause analysis on your data pipelines. Today, we are excited to announce the release of Monte Carlo’s data incident management feature, Incident IQ, a new solution that allows data teams to collaboratively identify, alert on, and remediate the root cause of critical data issues before they impact downstream systems and end users.

Apache Airflow® Best Practices: DAG Writing

Speaker: Tamara Fingerlin, Developer Advocate

In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!

How to Become an Artificial Intelligence Engineer in 2023

ProjectPro

The demand for data-related roles has increased massively in the past few years. Companies are actively seeking talent in these areas, and there is a huge market for individuals who can manipulate data, work with large databases and build machine learning algorithms. While data science is the most hyped-up career path in the data industry, it certainly isn't the only one.

Inclusive Leadership Minimises Negative Impact of Workplace Politics

Cloudera

Can an organization eradicate workplace politics completely? Defined by the Harvard Business Review as “a variety of activities associated with the use of influence tactics to improve personal or organizational interests”, politics at the workplace is inevitable. Undeniably, wielding influence to achieve positive outcomes is encouraged. However, the question leaders should be asking is: are fragmented individual agendas taking precedence over an organization’s mission?

Monte Carlo Launches Data Incident Management Feature, Incident IQ, to Help Organizations Achieve Data Trust

Monte Carlo

Monte Carlo, the data reliability company, today released its data incident management feature, Incident IQ, a new suite of capabilities that help data engineers better pinpoint, address, and resolve data downtime at scale through the Monte Carlo Data Observability Platform. Incident IQ automatically generates rich insights about critical data issues through root cause analysis, giving teams unprecedented visibility into the end-to-end health and trust of their data beyond the scope of traditional

15 Time Series Projects Ideas for Beginners to Practice 2023

ProjectPro

Time series analysis and forecasting is a dark horse in the domain of Data Science. Time series is among the most applied Data Science techniques in various industrial and business operations, such as financial analysis, production planning, supply chain management, and many more. Machine learning for time series is often a neglected topic. More recent techniques, such as natural language processing, pattern recognition, and others, usually get more attention.

How to Achieve High-Accuracy Results When Using LLMs

Speaker: Ben Epstein, Stealth Founder & CTO | Tony Karrer, Founder & CTO, Aggregage

When tasked with building a fundamentally new product line with deeper insights than previously achievable for a high-value client, Ben Epstein and his team faced a significant challenge: how to harness LLMs to produce consistent, high-accuracy outputs at scale. In this new session, Ben will share how he and his team engineered a system (based on proven software engineering approaches) that employs reproducible test variations (via temperature 0 and fixed seeds), and enables non-LLM evaluation m