Sat.Nov 07, 2020 - Fri.Nov 13, 2020

article thumbnail

Keeping Small Queries Fast – Short query optimizations in Apache Impala

Cloudera

This is part of our series of blog posts on recent enhancements to Impala. The entire collection is available here. Apache Impala is synonymous with high-performance processing of extremely large datasets, but what if our data isn’t huge? What if our queries are very selective? The reality is that data warehousing contains a large variety of queries both small and large; there are many circumstances where Impala queries small amounts of data; when end users are iterating on a use case, filterin

Metadata 144
article thumbnail

How Netflix Scales its API with GraphQL Federation (Part 1)

Netflix Tech

Netflix is known for its loosely coupled and highly scalable microservice architecture. Independent services allow for evolving at different paces and scaling independently. Yet they add complexity for use cases that span multiple services. Rather than exposing 100s of microservices to UI developers, Netflix offers a unified API aggregation layer at the edge.

IT 144
article thumbnail

Road to AI

Team Data Science

Currently, the big buzz about big data is probably apt with the number of technologies and tools available to build products and services. Uber, Google, Microsoft, and now Apple are implementing AI to their core business operations to provide real-time AI services in their ecosystem. I personally believe once due to this success of big data companies, the hype behind AI has blown out of proportions.

Big Data 130
article thumbnail

How to Pull Data from an API, Using AWS Lambda

Start Data Engineering

Introduction If you are looking for a simple, cheap data pipeline to pull small amounts of data from a stable API and store it in a cloud storage, then serverless functions are a good choice. This post aims to answer questions like the ones shown below My company does not have the budget to purchase a tool like fivetran, What should I use to pull data from an API ?

AWS 130
article thumbnail

Apache Airflow® Best Practices for ETL and ELT Pipelines

Whether you’re creating complex dashboards or fine-tuning large language models, your data must be extracted, transformed, and loaded. ETL and ELT pipelines form the foundation of any data product, and Airflow is the open-source data orchestrator specifically designed for moving and transforming data in ETL and ELT pipelines. This eBook covers: An overview of ETL vs.

article thumbnail

Veterans Day: What Service Means to Clouderan Vets

Cloudera

Around the world, a number of countries celebrate November 11 as a day to give thanks and recognition for their veterans. Originally designated to honor the end of World War I ( Armistice Day and Remembrance Day ), in some countries it is now used to pay respect to all veterans ( Veterans Day ). . Year after year, we use this time to express our support and appreciation to those who have served in the military.

article thumbnail

Self-Describing Events and How They Reduce Code in Your Processors

Confluent

Have you ever had to write a program that needed to handle any data payload that could be thrown at you? If so, did you always have to update the […].

Coding 105

More Trending

article thumbnail

How to Make the Most of Big Data Analytics in Your Business

Teradata

Big data's growth and its impact on business is undeniable. But how do you make the most of your data analytics to create real business value? Find out more.

article thumbnail

Expediting SQL Workers means Expediting your Business

Cloudera

Two of the more painful things in your everyday life as an analyst or SQL worker are not getting easy access to data when you need it, or not having easy to use, useful tools available to you that don’t get in your way! As one of my dear customers, a data worker in Pharma, said to me: “I really don’t care about bells and whistles, I just want to get my task done.

SQL 116
article thumbnail

How to Choose Between Strict and Dynamic Schemas

Confluent

Event modeling has always been a pain point in organizations. From figuring out the standard format of your schemas, processing said data models effectively, and finally testing before you deploy […].

Process 105
article thumbnail

Developing Grouparoo on macOS Big Sur

Grouparoo

The newest release of macOS is out! Like any new OS release, there are plenty of new features. and new bugs to squash. The Grouparoo team uses develops on macOS, and we've taken notes about what we needed to do to continue being productive though the upgrade. Update Homebrew and Databases Like most macOS developers, we install our dependencies and database with Homebrew , a great package manager for macOS.

article thumbnail

Apache Airflow®: The Ultimate Guide to DAG Writing

Speaker: Tamara Fingerlin, Developer Advocate

In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!

article thumbnail

Boost Your Customer Experience with Better Payment Conversions

Teradata

With digital payments on the rise, payment processing has become more complex. Fortunately, advanced data technologies can create better customer experience via streamlined payment processes.

article thumbnail

Extreme data center pressure? Burst to the cloud with CDP!

Cloudera

A tale of two organizations. Here at Cloudera, we’ve seen many large organizations struggle to meet ever-changing and ever-growing business demands. We see it everywhere. Traditional on-premise architectures, which create a fixed, finite set of resources, forces every business request for new insight to be a crazy resource balancing act, coupled with long wait times, or a straight-up no, it cannot be done.

Cloud 105
article thumbnail

Advanced Testing Techniques for Spring Kafka

Confluent

Asynchronous boundaries. Frameworks. Configuring frameworks. Apache Kafka®. All of these share one thing in common: complexity in testing. Now imagine them combined—it gets much harder. This is the final blog […].

Kafka 98
article thumbnail

Liquidity Monitoring: Depth

Ripple Engineering

In our last liquidity monitoring post , we introduced the concept of dislocation as a way to measure the price competitiveness of an XRP-fiat pair. In this post, we introduce the companion depth metric and combine both metrics into a data visualization for assessing liquidity performance. Depth Dislocation tells us how competitive an exchange’s XRP prices are, but it ignores the important quantity component of liquidity.

article thumbnail

Optimizing The Modern Developer Experience with Coder

Many software teams have migrated their testing and production workloads to the cloud, yet development environments often remain tied to outdated local setups, limiting efficiency and growth. This is where Coder comes in. In our 101 Coder webinar, you’ll explore how cloud-based development environments can unlock new levels of productivity. Discover how to transition from local setups to a secure, cloud-powered ecosystem with ease.

article thumbnail

Getting Started with Native Object Store and Microsoft Azure Object Storage in 5 Easy Steps

Teradata

Learn the prerequisites and configuration required for Vantage with Native Object Store to easily access Azure Blob storage and Azure Data Lake Gen 2.

article thumbnail

True workplace diversity goes beyond gender parity

Cloudera

Diversity takes on many forms around us. Think of a garden, an orchestra, and the example that’s easiest to relate to: food. While every ingredient has its unique taste, combining them in the right amount will result in a delicious dish. If we understand the value of diversity, why is workplace diversity still a big challenge for many companies? D&I’s progress limited a narrow view of diversity.

Food 104
article thumbnail

Databricks SQL Analytics Workspace - The Evolution of the Lakehouse

Advancing Analytics: Data Engineering

We have discussed in the past this idea of the lakehouse , the aspirational target of many analytics platforms these days of combining the huge power and potential of data lakes with the rigour, reliability and concurrency of a data warehouse. It’s an interesting concept but has, in the past, been firmly an aspiration. In the world without lakehouses, we often see the “Modern Data Warehouse”, this two-phased approach to providing a holistic platform – we load our early data into a lake where we

SQL 52
article thumbnail

Demystifying Variance Positions in Scala

Rock the JVM

Explore the infamous 'covariant type occurs in contravariant position' problem in Scala: discover effective solutions and best practices

Scala 52
article thumbnail

15 Modern Use Cases for Enterprise Business Intelligence

Large enterprises face unique challenges in optimizing their Business Intelligence (BI) output due to the sheer scale and complexity of their operations. Unlike smaller organizations, where basic BI features and simple dashboards might suffice, enterprises must manage vast amounts of data from diverse sources. What are the top modern BI use cases for enterprise businesses to help you get a leg up on the competition?

article thumbnail

How Tesla is Redefining the Auto Industry

Teradata

New players like Tesla are changing the automotive industry into a software-driven paradigm which has made data management & analysis at scale a critical capability for OEMs.

article thumbnail

Project Metamorphosis Month 7: Reliable Event Streaming with Confluent Cloud and Proactive Support

Confluent

The rise of the cloud introduced a focus on rapid iteration and agility that is founded on specialization. If you are an application developer, you know your applications better than […].

Cloud 52
article thumbnail

Using Elasticsearch to Offload Real-Time Analytics from MongoDB

Rockset

Offloading analytics from MongoDB establishes clear isolation between write-intensive and read-intensive operations. Elasticsearch is one tool to which reads can be offloaded, and, because both MongoDB and Elasticsearch are NoSQL in nature and offer similar document structure and data types, Elasticsearch can be a popular choice for this purpose. In most scenarios, MongoDB can be used as the primary data storage for write-only operations and as support for quick data ingestion.

MongoDB 40