November, 2020

article thumbnail

Keeping Small Queries Fast – Short query optimizations in Apache Impala

Cloudera

This is part of our series of blog posts on recent enhancements to Impala. The entire collection is available here. Apache Impala is synonymous with high-performance processing of extremely large datasets, but what if our data isn’t huge? What if our queries are very selective? The reality is that data warehousing contains a large variety of queries both small and large; there are many circumstances where Impala queries small amounts of data; when end users are iterating on a use case, filterin

Metadata 144
article thumbnail

How Netflix Scales its API with GraphQL Federation (Part 1)

Netflix Tech

Netflix is known for its loosely coupled and highly scalable microservice architecture. Independent services allow for evolving at different paces and scaling independently. Yet they add complexity for use cases that span multiple services. Rather than exposing 100s of microservices to UI developers, Netflix offers a unified API aggregation layer at the edge.

IT 144
article thumbnail

Analysing historical and live data with ksqlDB and Elastic Cloud

Confluent

Building data pipelines isn’t always straightforward. The gap between the shiny “hello world” examples of demos and the gritty reality of messy data and imperfect formats is sometimes all too […].

Cloud 139
article thumbnail

A Data Scientist in Engineering Wonderland

Team Data Science

As a data scientist, I always felt a missing link between my developed models and putting them in the production process. Yes, I can create a pipeline, write a model, get results, and interpret the results, but if I cannot scale it, these all will sit on my Jupiter notebooks. This thought led me to my data engineering adventure. I am confident that learning data engineering will make me a better data scientist.

article thumbnail

Apache Airflow® Best Practices for ETL and ELT Pipelines

Whether you’re creating complex dashboards or fine-tuning large language models, your data must be extracted, transformed, and loaded. ETL and ELT pipelines form the foundation of any data product, and Airflow is the open-source data orchestrator specifically designed for moving and transforming data in ETL and ELT pipelines. This eBook covers: An overview of ETL vs.

article thumbnail

How to Pull Data from an API, Using AWS Lambda

Start Data Engineering

Introduction If you are looking for a simple, cheap data pipeline to pull small amounts of data from a stable API and store it in a cloud storage, then serverless functions are a good choice. This post aims to answer questions like the ones shown below My company does not have the budget to purchase a tool like fivetran, What should I use to pull data from an API ?

AWS 130
article thumbnail

Streaming Data Integration Without The Code at Equalum

Data Engineering Podcast

Summary The first stage of every good pipeline is to perform data integration. With the increasing pace of change and the need for up to date analytics the need to integrate that data in near real time is growing. With the improvements and increased variety of options for streaming data engines and improved tools for change data capture it is possible for data teams to make that goal a reality.

More Trending

article thumbnail

Keeping Netflix Reliable Using Prioritized Load Shedding

Netflix Tech

How viewers are able to watch their favorite show on Netflix while the infrastructure self-recovers from a system failure By Manuel Correa , Arthur Gonigberg , and Daniel West Getting stuck in traffic is one of the most frustrating experiences for drivers around the world. Everyone slows to a crawl, sometimes for a minor issue or sometimes for no reason at all.

article thumbnail

Digital Transformation in Style: How Boden Innovates Retail Using Apache Kafka

Confluent

As a clothing retailer with more than 1.5 million customers worldwide, Boden is always looking to capitalise on business moments to drive sales. For example, when the Duchess of Cambridge […].

Retail 135
article thumbnail

Road to AI

Team Data Science

Currently, the big buzz about big data is probably apt with the number of technologies and tools available to build products and services. Uber, Google, Microsoft, and now Apple are implementing AI to their core business operations to provide real-time AI services in their ecosystem. I personally believe once due to this success of big data companies, the hype behind AI has blown out of proportions.

Big Data 130
article thumbnail

How to Make the Most of Big Data Analytics in Your Business

Teradata

Big data's growth and its impact on business is undeniable. But how do you make the most of your data analytics to create real business value? Find out more.

article thumbnail

Apache Airflow®: The Ultimate Guide to DAG Writing

Speaker: Tamara Fingerlin, Developer Advocate

In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!

article thumbnail

Keeping A Bigeye On The Data Quality Market

Data Engineering Podcast

Summary One of the oldest aphorisms about data is "garbage in, garbage out", which is why the current boom in data quality solutions is no surprise. With the growth in projects, platforms, and services that aim to help you establish and maintain control of the health and reliability of your data pipelines it can be overwhelming to stay up to date with how they all compare.

Hadoop 100
article thumbnail

Veterans Day: What Service Means to Clouderan Vets

Cloudera

Around the world, a number of countries celebrate November 11 as a day to give thanks and recognition for their veterans. Originally designated to honor the end of World War I ( Armistice Day and Remembrance Day ), in some countries it is now used to pay respect to all veterans ( Veterans Day ). . Year after year, we use this time to express our support and appreciation to those who have served in the military.

article thumbnail

Simple streaming telemetry

Netflix Tech

Introducing gnmi-gateway: a modular, distributed, and highly available service for modern network telemetry via OpenConfig and gNMI By: Colin McIntosh, Michael Costello Netflix runs its own content delivery network, Open Connect , which delivers all streaming traffic to our members. A backbone network underlies a large portion of the CDN, and we also run the high capacity networks that support our studios and corporate offices.

article thumbnail

IBM and Confluent Announce Strategic Partnership

Confluent

I’m excited to announce a new strategic partnership with IBM. As part of this partnership, IBM will be reselling Confluent Platform, enabling its customers to leverage their existing IBM relationships […].

IT 128
article thumbnail

Optimizing The Modern Developer Experience with Coder

Many software teams have migrated their testing and production workloads to the cloud, yet development environments often remain tied to outdated local setups, limiting efficiency and growth. This is where Coder comes in. In our 101 Coder webinar, you’ll explore how cloud-based development environments can unlock new levels of productivity. Discover how to transition from local setups to a secure, cloud-powered ecosystem with ease.

article thumbnail

The Journey Begins

Team Data Science

Week 1: 10/9/20 - 10/16/20 In my quest to further improve my overall data science skills, I pulled the trigger on October 9th, 2020, and enrolled in a Data Engineering boot camp lead by Andreas Kretz. First a little bit about myself. I have a background in Aerospace Engineering and have been in the industry for close to 15 years now. A little more than a year ago, I decided to pivot to Machine Learning and Data Science.

article thumbnail

Risk-Based Wealth Management: What the Insurance Industry Gets Wrong

Teradata

Product-centric processes degrade customer experience. Insurers must insulate consumers from internal & regulatory-driven controls by placing them in the center of the customer experience.

article thumbnail

Self Service Data Management From Ingest To Insights With Isima

Data Engineering Podcast

Summary The core mission of data engineers is to provide the business with a way to ask and answer questions of their data. This often takes the form of business intelligence dashboards, machine learning models, or APIs on top of a cleaned and curated data set. Despite the rapid progression of impressive tools and products built to fulfill this mission, it is still an uphill battle to tie everything together into a cohesive and reliable platform.

article thumbnail

How a modern data platform supports government fraud detection

Cloudera

November 15-21 marks International Fraud Awareness Week – but for many in government, that’s every week. From bogus benefits claims to fraudulent network activity, fraud in all its forms represents a significant threat to government at all levels. Some experts estimate the U.S. government loses nearly 150 billion dollars due to potential fraud each year, McKinsey & Company reports.

article thumbnail

15 Modern Use Cases for Enterprise Business Intelligence

Large enterprises face unique challenges in optimizing their Business Intelligence (BI) output due to the sheer scale and complexity of their operations. Unlike smaller organizations, where basic BI features and simple dashboards might suffice, enterprises must manage vast amounts of data from diverse sources. What are the top modern BI use cases for enterprise businesses to help you get a leg up on the competition?

article thumbnail

5 things you should know about Real-Time Analytics

A Cloud Guru: Data Engineering

Running analytics on real-time data is a challenge many data engineers are facing today. But not all analytics can be done in real time! Many are dependent on the volume of the data and the processing requirements. Even logic conditions are becoming a bottleneck. For example, think about join operations on huge tables with more […] The post 5 things you should know about Real-Time Analytics appeared first on A Cloud Guru.

article thumbnail

Use Cases and Architectures for HTTP and REST APIs with Apache Kafka

Confluent

This blog post presents the use cases and architectures of REST APIs and Confluent REST Proxy, and explores a new management API and improved integrations into Confluent Server and Confluent […].

article thumbnail

Branding Yourself

Team Data Science

Week 2: 10/16/20 - 10/23/20 Week 2 of the course consists of Modules 3 & 4. If you have not read my first blog go here. Module 3 focuses on creating a professional LinkedIn profile. Your LinkedIn profile is the world's access to you and how you want to be seen professionally. Below is a screenshot. So here, I have a professionally taken photograph, what I am interested in below, and the 'About' section that summarizes Me.in a professional sense.

article thumbnail

Boost Your Customer Experience with Better Payment Conversions

Teradata

With digital payments on the rise, payment processing has become more complex. Fortunately, advanced data technologies can create better customer experience via streamlined payment processes.

article thumbnail

Prepare Now: 2025s Must-Know Trends For Product And Data Leaders

Speaker: Jay Allardyce, Deepak Vittal, Terrence Sheflin, and Mahyar Ghasemali

As we look ahead to 2025, business intelligence and data analytics are set to play pivotal roles in shaping success. Organizations are already starting to face a host of transformative trends as the year comes to a close, including the integration of AI in data analytics, an increased emphasis on real-time data insights, and the growing importance of user experience in BI solutions.

article thumbnail

Building A Cost Effective Data Catalog With Tree Schema

Data Engineering Podcast

Summary A data catalog is a critical piece of infrastructure for any organization who wants to build analytics products, whether internal or external. While there are a number of platforms available for building that catalog, many of them are either difficult to deploy and integrate, or expensive to use at scale. In this episode Grant Seward explains how he built Tree Schema to be an easy to use and cost effective option for organizations to build their data catalogs.

Building 100
article thumbnail

How insurers can better deliver at “The Moment of Truth”

Cloudera

It’s all about the Customer. Customers today expect services to be highly personalized. In a digital world tuned to understand your likes, dislikes, interests and preferences we expect a similar level of customization in all aspects of our lives. Insurance is no different. Insurance is not something the average consumer thinks about every day but when a life changing event happens, insurance becomes extremely important.

Insurance 120
article thumbnail

Scala 3: Mastering Path-Dependent Types and Type Projections

Rock the JVM

Unlock advanced programming techniques with Scala 3: explore dependent types, methods, and functions in this concise tutorial

Scala 52
article thumbnail

How Real-Time Stream Processing Safely Scales with ksqlDB, Animated

Confluent

Software engineering memes are in vogue, and nothing is more fashionable than joking about how complicated distributed systems can be. Despite the ribbing, many people adopt them. Why? Distributed systems […].

Process 122
article thumbnail

How to Drive Cost Savings, Efficiency Gains, and Sustainability Wins with MES

Speaker: Nikhil Joshi, Founder & President of Snic Solutions

Is your manufacturing operation reaching its efficiency potential? A Manufacturing Execution System (MES) could be the game-changer, helping you reduce waste, cut costs, and lower your carbon footprint. Join Nikhil Joshi, Founder & President of Snic Solutions, in this value-packed webinar as he breaks down how MES can drive operational excellence and sustainability.

article thumbnail

Data Quality at Airbnb

Airbnb Tech

Part 2 — A New Gold Standard Authors: Vaughn Quoss, Jonathan Parks, Paul Ellwood Introduction At Airbnb, we’ve always had a data-driven culture. We’ve assembled top-notch data science and engineering teams, built industry-leading data infrastructure, and launched numerous successful open source projects, including Apache Airflow and Apache Superset.

article thumbnail

Connect Teradata Vantage to Salesforce Data With Azure Data Factory

Teradata

This "how-to" guide will help you to connect Teradata Vantage using the Native Object Store feature to query Salesforce data sourced by Microsoft Azure Data Factory.

Data 59
article thumbnail

Add Version Control To Your Data Lake With LakeFS

Data Engineering Podcast

Summary Data lakes are gaining popularity due to their flexibility and reduced cost of storage. Along with the benefits there are some additional complexities to consider, including how to safely integrate new data sources or test out changes to existing pipelines. In order to address these challenges the team at Treeverse created LakeFS to introduce version control capabilities to your storage layer.

Data Lake 100
article thumbnail

Fraud Detection using Deep Learning

Cloudera

One of the many areas where machine learning has made a large difference for enterprise business is in the ability to make accurate predictions in the realm of fraud detection. Knowing that a transaction is fraudulent is a critical requirement for financial services companies, but knowing that a transaction that was flagged by a rules-based system as fraudulent is a valid transaction, can be equally important.

article thumbnail

Improving the Accuracy of Generative AI Systems: A Structured Approach

Speaker: Anindo Banerjea, CTO at Civio & Tony Karrer, CTO at Aggregage

When developing a Gen AI application, one of the most significant challenges is improving accuracy. This can be especially difficult when working with a large data corpus, and as the complexity of the task increases. The number of use cases/corner cases that the system is expected to handle essentially explodes. 💥 Anindo Banerjea is here to showcase his significant experience building AI/ML SaaS applications as he walks us through the current problems his company, Civio, is solving.