Sat. Aug 28, 2021 - Fri. Sep 03, 2021

Streaming Real-Time Analytics with Redis, AWS Fargate, and Dash Framework

Uber Engineering

Introduction. Uber’s GSS (Global Scaled Solutions) team runs scaled programs for diverse products and businesses, including but not limited to Eats, Rides, and Freight. The team transforms Uber’s ideas into agile, global solutions by designing and implementing scalable solutions. One …

Understand & Deliver on Your Data Engineering Task

Start Data Engineering

Contents: 1. Introduction; 2. Understanding your data engineering task (2.1. Data infrastructure overview, 2.2. What exactly, 2.3. Why exactly, 2.4. Current state, 2.5. Downstream impact); 3. Delivering your data engineering task (3.1. How, 3.2. Breakdown into sub-tasks, 3.3. Delivering the finished task); 4. Conclusion; 5. Further reading.

1. Introduction. Congratulations! You are given a quick overview of the business and data architecture and are assigned your very first data engineering task.

Announcing Elastic Data Streams Support for Confluent’s Elasticsearch Sink Connector

Confluent

Today, as part of our expanded partnership with Elastic, we are announcing an update to the fully managed Elasticsearch Sink Connector in Confluent Cloud. This update allows you to take […].

When Data Redefines Companies

Cloudera

The more an enterprise wants to know about itself and its business prospects, the more data it needs to collect and analyze. Additionally, the more data it collects and stores, the better its ability to know customers, to find new ones, and to provide more of what they want to buy. Sounds simple, but a surprising majority of U.S. companies (about two-thirds, according to CIO.com) are only now getting tuned in to become fully functioning data-driven enterprises by starting new initiatives, scali

Apache Airflow® Best Practices for ETL and ELT Pipelines

Whether you’re creating complex dashboards or fine-tuning large language models, your data must be extracted, transformed, and loaded. ETL and ELT pipelines form the foundation of any data product, and Airflow is the open-source data orchestrator specifically designed for moving and transforming data in ETL and ELT pipelines. This eBook covers: An overview of ETL vs.

Designing And Building Data Platforms As A Product

Data Engineering Podcast

Summary: The term "data platform" gets thrown around a lot, but have you stopped to think about what it actually means for you and your organization? In this episode Lior Gavish, Lior Solomon, and Atul Gupte share their view of what it means to have a data platform, discuss their experiences building them at various companies, and provide advice on how to treat them like a software product.

Practical API Design at Netflix, Part 1: Using Protobuf FieldMask

Netflix Tech

By Alex Borysov, Ricky Gardiner. Background: At Netflix, we heavily use gRPC for backend-to-backend communication. When we process a request, it is often beneficial to know which fields the caller is interested in and which ones they ignore. Some response fields can be expensive to compute, and some can require remote calls to other services.
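
The idea in this teaser can be illustrated with a small sketch. It is not Netflix's implementation and does not use the protobuf `FieldMask` type; it is a plain-Python stand-in in which the caller passes the field names it wants and the server skips the work for everything else. The names `PRODUCERS`, `build_response`, and the `title` and `schedule` fields are hypothetical.

```python
from typing import Callable, Dict, Iterable, Optional

# Records which field producers actually ran, so skipped work is visible.
calls = []


def _title() -> str:
    calls.append("title")
    return "Example Production"


def _schedule() -> str:
    # Stand-in for an expensive field, e.g. one needing a remote call.
    calls.append("schedule")
    return "2021-09-01"


# Hypothetical mapping from response field name to the function that computes it.
PRODUCERS: Dict[str, Callable[[], str]] = {"title": _title, "schedule": _schedule}


def build_response(field_mask: Optional[Iterable[str]] = None) -> Dict[str, str]:
    """Compute only the requested fields; a missing mask means all fields."""
    if field_mask is None:
        wanted = set(PRODUCERS)
    else:
        # Unknown field names are ignored in this sketch.
        wanted = set(field_mask) & set(PRODUCERS)
    return {name: producer() for name, producer in PRODUCERS.items() if name in wanted}


if __name__ == "__main__":
    print(build_response(["title"]))  # only the cheap field is computed
    print(calls)
```

With a real gRPC service the mask would travel in the request message; here it is just a list of strings so the example stays self-contained.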

More Trending

Optimizing Cloudera Data Engineering Autoscaling Performance

Cloudera

The shift to cloud has been accelerating, and with it, a push to modernize the data pipelines that fuel key applications. That is why cloud-native solutions that take advantage of capabilities such as disaggregated storage and compute, elasticity, and containerization are more important than ever. At Cloudera, we introduced Cloudera Data Engineering (CDE) as part of our Enterprise Data Cloud product, Cloudera Data Platform (CDP), to meet these challenges.

Presto Powered Cloud Data Lakes At Speed Made Easy With Ahana

Data Engineering Podcast

Summary: The Presto project has become the de facto option for building scalable open source analytics in SQL for the data lake. In recent months the community has focused its efforts on making it the fastest possible option for running your analytics in the cloud. In this episode Dipti Borkar discusses the work that she and her team are doing at Ahana to simplify the work of running your own PrestoDB environment in the cloud.

Towards a Reliable Device Management Platform

Netflix Tech

By Benson Ma, Alok Ahuja. Introduction: At Netflix, hundreds of different device types, from streaming sticks to smart TVs, are tested every day through automation to ensure that new software releases continue to deliver the quality of the Netflix experience that our customers enjoy. In addition, Netflix continuously works with its partners (such as Roku, Samsung, LG, Amazon) to port the Netflix SDK to their new and upcoming devices (TVs, smart boxes, etc.), to ensure the quality bar is reached be

Learner Spotlight: Gino Parages

Dataquest

Meet Gino Parages, a former sales and IT business analyst with no coding skills who decided it was time to learn coding to give his career a boost. He chose Dataquest to help him achieve his learning goals and land the job he wanted. Here’s his story… Q: First, what are your preferred pronouns? A: He/him Q: All right, Gino! What’s your current job title?

Apache Airflow®: The Ultimate Guide to DAG Writing

Speaker: Tamara Fingerlin, Developer Advocate

In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!

Accenture’s Smart Data Transition Toolkit Now Available for Cloudera Data Platform

Cloudera

Cloudera and Accenture demonstrate the strength of their relationship with an accelerator called the Smart Data Transition Toolkit for migrating legacy data warehouses into Cloudera Data Platform. Accenture’s Smart Data Transition Toolkit: Data warehousing is the backbone of every data-driven organization, providing mission-critical analytics. Today, modern data warehousing has evolved to meet the intensive demands of the newest analytics required for a business to be data-driven.

Faster Results and a Better Experience with New Pagination in Rockset

Rockset

Summary: Pagination is a technique used to divide a result set into smaller, more manageable chunks. Historically, Rockset used the Limit-Offset method to implement pagination, but query results can be slow and inconsistent when dealing with very large data sets in real time. Rockset has now implemented a cursor-based approach for pagination, making queries faster, more consistent, and potentially cheaper for large data sets. This is available today for all customers. Pagination is a familiar techni
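
The difference between the two pagination styles named in this summary can be sketched in a few lines. This is a generic in-memory illustration, not Rockset's implementation or API; the function names and the `id` key are assumptions. It shows one way limit-offset paging becomes inconsistent when the underlying data changes between requests, while a cursor keyed on the last seen `id` does not.

```python
from typing import Dict, List, Optional, Tuple

Row = Dict[str, int]


def page_by_offset(rows: List[Row], limit: int, offset: int) -> List[Row]:
    """Limit-offset paging: position is counted from the start on every request."""
    return rows[offset:offset + limit]


def page_by_cursor(
    rows: List[Row], limit: int, after_id: Optional[int] = None
) -> Tuple[List[Row], Optional[int]]:
    """Cursor paging: resume strictly after the last id the client saw.

    Assumes rows are sorted by a unique, increasing "id".
    """
    page = [r for r in rows if after_id is None or r["id"] > after_id][:limit]
    next_cursor = page[-1]["id"] if page else None
    return page, next_cursor


if __name__ == "__main__":
    rows = [{"id": i} for i in range(1, 7)]  # ids 1..6
    first_offset_page = page_by_offset(rows, limit=2, offset=0)
    first_cursor_page, cursor = page_by_cursor(rows, limit=2)

    # Simulate a concurrent change: the row with id 1 is deleted between requests.
    rows_after_delete = rows[1:]

    # Offset paging now skips id 3, because every row shifted by one position.
    print(page_by_offset(rows_after_delete, limit=2, offset=2))
    # Cursor paging resumes after id 2 regardless of the shift.
    print(page_by_cursor(rows_after_delete, limit=2, after_id=cursor)[0])
```

The linear scan in `page_by_cursor` is only for readability; a real store would seek to the cursor position through an index rather than filtering every row.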

Grouparoo v0.6 release

Grouparoo

The newest release of Grouparoo has a few updates that make working with data easier. Staying in sync with your data warehouse: if rows are deleted in your data warehouse, then Grouparoo profiles get deleted. Combine or use logic to make profile properties: use code to re-mix your data and get the perfect formats. New destinations: Mixpanel, Mailjet. Profile deletion: Data systems are often quite good at ingesting new data, but things get complicated when it gets deleted.

Data Quality + Data Lineage = ???

Datakin

Written by Peter Hicks on Sep 2, 2021. In a prior life, I dwelled in the day-to-day cycles of an e-commerce platform. I worked with a quite generalized system with orders, products, variants, SKUs, and customers that pined for every discount they could come by. The system built around the core business schema was the kind of chaos that data engineers are all too familiar with: large volumes of clickstream data, etl_warehouses, read replicas, and machine learning

Optimizing The Modern Developer Experience with Coder

Many software teams have migrated their testing and production workloads to the cloud, yet development environments often remain tied to outdated local setups, limiting efficiency and growth. This is where Coder comes in. In our 101 Coder webinar, you’ll explore how cloud-based development environments can unlock new levels of productivity. Discover how to transition from local setups to a secure, cloud-powered ecosystem with ease.

Terraform Databricks Labs

Advancing Analytics: Data Engineering

In late 2020, Databricks introduced Databricks Labs, a collection of Terraform providers that gives you the ability to deploy nearly all Databricks resources onto the Azure and Amazon Web Services (AWS) cloud platforms. This means you can deploy Databricks workspaces, clusters, secrets, libraries, notebooks, and automated jobs (and many more) at the time of provisioning the infrastructure, making it easy to manage and configure Databricks.

What is a Data Incident Commander?

Monte Carlo

Incident management isn’t just for software engineers. With the rise of data platforms and the data-as-a-product mentality, building more reliable processes and workflows to handle data quality has emerged as a top concern for data engineers. In a previous post, we discussed how to set up automatic detection and alerting for bad data; now, guest author Glen Willis shares how the best data teams handle triaging and severity assessment for your broken data pipelines with the help of an emerging r

Build Your CFO Analytics Foundation

Teradata

A core finance foundation, supported by the right data management tools, creates a trusted, auditable, and traceable source of all things financial. Read more.

How Rockset Enables SQL-Based Rollups for Streaming Data

Rockset

Until Now: The Slow Crawl from Batch to Real-Time Analytics. The world is moving from batch to real-time analytics, but it's been at a crawl. Apache Kafka has made acquiring real-time data more mainstream, but only a small sliver are turning batch analytics, run nightly, into real-time analytical dashboards with alerts and automatic anomaly detection.

15 Modern Use Cases for Enterprise Business Intelligence

Large enterprises face unique challenges in optimizing their Business Intelligence (BI) output due to the sheer scale and complexity of their operations. Unlike smaller organizations, where basic BI features and simple dashboards might suffice, enterprises must manage vast amounts of data from diverse sources. What are the top modern BI use cases for enterprise businesses to help you get a leg up on the competition?

A Fresh Perspective on Monads: Generalizing Chained Computations

Rock the JVM

Explore a fresh perspective on monads: Discover new angles on this familiar concept with Rock the JVM

The Data Janitor Letters - July 2021

Pipeline Data Engineering

Data engineering salon. News and interesting reads about the world of data. "Building a data team at a mid-stage startup: a short story" (Erik Bernhardsson, Working on something, "Bernco"): The data culture is driven both from above (the CEO pushing for it) as well as from below (people in the trenches). It's OK to fail if at least you learned something from it.

Acquiring is Dead. Long Live Acquiring.

Teradata

Data-driven services can help merchant acquirers add value to their core capabilities. However, to succeed, they need to be armed with the necessary data governance capabilities & know-how.

50 ML Projects To Strengthen Your Portfolio and Get You Hired

ProjectPro

The most trusted way to learn and master the art of machine learning is to practice with hands-on projects. Projects help you build a strong foundation in various machine learning algorithms and strengthen your resume. As the saying goes, a voyage of a thousand miles starts with a single footstep, so we present a guide to 50 first steps on your machine learning journey.

Prepare Now: 2025's Must-Know Trends For Product And Data Leaders

Speaker: Jay Allardyce, Deepak Vittal, Terrence Sheflin, and Mahyar Ghasemali

As we look ahead to 2025, business intelligence and data analytics are set to play pivotal roles in shaping success. Organizations are already starting to face a host of transformative trends as the year comes to a close, including the integration of AI in data analytics, an increased emphasis on real-time data insights, and the growing importance of user experience in BI solutions.

Using RudderStack To Power Your Machine Learning Models

RudderStack

This post explores three interesting ways you can use RudderStack to unlock the power of machine learning.

A day in the life of a Technical Fellow

Eventbrite Engineering

In my two most recent blog posts, I talked about how to write a Long-Term Technical Vision and a Golden Path. These are future-looking and high-level artifacts, so the question I keep hearing is: do I need to give up coding to grow in my career and become a Technical Fellow? In this post I will …

Using Internal Mobility For Growth

Zalando Engineering

Long time readers of this blog will remember that back in 2019, we published a feature on the benefits of rotating engineers between teams. For those of you who have not seen it, the article described an initiative that aimed to establish cross-functional knowledge sharing, encourage cross team collaboration, and bring greater product awareness, by providing engineers with an opportunity to work on different teams within our Developer Productivity department.

Why Your Data Warehouse Should Be the Foundation of Your CDP

RudderStack

This post explores how RudderStack’s warehouse-first approach separates it from the traditional marketing CDP.

How to Drive Cost Savings, Efficiency Gains, and Sustainability Wins with MES

Speaker: Nikhil Joshi, Founder & President of Snic Solutions

Is your manufacturing operation reaching its efficiency potential? A Manufacturing Execution System (MES) could be the game-changer, helping you reduce waste, cut costs, and lower your carbon footprint. Join Nikhil Joshi, Founder & President of Snic Solutions, in this value-packed webinar as he breaks down how MES can drive operational excellence and sustainability.

Send Form Data From Marketo to Multiple Destinations Using RudderStack

RudderStack

See how you can leverage RudderStack to easily track Marketo form submissions without disrupting Marketo or your marketing team.

Replacing Segment Computed & SQL Traits With dbt & RudderStack Reverse ETL

RudderStack

Learn to use dbt & RudderStack Reverse ETL to leverage the power of your data warehouse to sync enriched users, audiences, and other data to downstream tools.
