Top Data Engineering Digest Aggregated Data Computer Science Content for Week of Nov 07

Sat.Nov 07, 2020 - Fri.Nov 13, 2020

Keeping Small Queries Fast – Short query optimizations in Apache Impala

Cloudera

NOVEMBER 13, 2020

This is part of our series of blog posts on recent enhancements to Impala. The entire collection is available here. Apache Impala is synonymous with high-performance processing of extremely large datasets, but what if our data isn’t huge? What if our queries are very selective? The reality is that data warehousing contains a large variety of queries both small and large; there are many circumstances where Impala queries small amounts of data; when end users are iterating on a use case, filterin

Metadata

Metadata Coding SQL Database

Road to AI

Team Data Science

NOVEMBER 10, 2020

Currently, the big buzz about big data is probably apt, given the number of technologies and tools available to build products and services. Uber, Google, Microsoft, and now Apple are implementing AI in their core business operations to provide real-time AI services in their ecosystems. I personally believe that, due to this success of big data companies, the hype behind AI has been blown out of proportion.

Big Data

Big Data Data Science Datasets Data Pipeline

Webinars

Agent Tooling: Connecting AI to Your Tools, Systems & Data

How to Modernize Manufacturing Without Losing Control

Mastering Apache Airflow® 3.0: What’s New (and What’s Next) for Data Orchestration

MORE WEBINARS

Trending Sources

How to Pull Data from an API, Using AWS Lambda

Start Data Engineering

NOVEMBER 8, 2020

Introduction: If you are looking for a simple, cheap data pipeline to pull small amounts of data from a stable API and store it in cloud storage, then serverless functions are a good choice. This post aims to answer questions like the following: My company does not have the budget to purchase a tool like Fivetran; what should I use to pull data from an API?

AWS

AWS Cloud Storage Data Pipeline Data
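The excerpt above does not show the post's own code, so the following is only a minimal sketch of the general idea: a serverless function that pulls JSON from an API and writes it to object storage. The URL, the bucket name, and the helper names (`fetch_json`, `build_key`, `run`) are hypothetical and not taken from the original post; the S3 write uses the boto3 client's `put_object` call and is injected so the logic can be exercised without AWS.

```python
import json
import urllib.request
from datetime import datetime, timezone

# Hypothetical values for illustration only; not from the original post.
API_URL = "https://api.example.com/v1/items"
BUCKET = "my-raw-data-bucket"


def fetch_json(url, timeout=10):
    """Download and decode a JSON document from the API."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def build_key(now, prefix="api_pull"):
    """Build a date-partitioned object key so each run writes a new object."""
    return f"{prefix}/{now:%Y/%m/%d}/{now:%H%M%S}.json"


def run(fetch, put_object, now=None):
    """Pull records with `fetch` and store them with `put_object`."""
    now = now or datetime.now(timezone.utc)
    records = fetch(API_URL)  # assumed to return a list of records
    key = build_key(now)
    put_object(Bucket=BUCKET, Key=key, Body=json.dumps(records).encode("utf-8"))
    return {"key": key, "records": len(records)}


def lambda_handler(event, context):
    """AWS Lambda entry point: wires the real HTTP fetch and S3 client together."""
    import boto3  # imported lazily; bundled with the AWS Lambda Python runtime

    s3 = boto3.client("s3")
    return run(fetch_json, s3.put_object)
```

Passing `fetch` and `put_object` in as arguments is a design choice made here for testability; a scheduled trigger would typically invoke `lambda_handler` on whatever interval the API allows.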

Self-Describing Events and How They Reduce Code in Your Processors

Confluent

NOVEMBER 12, 2020

Have you ever had to write a program that needed to handle any data payload that could be thrown at you? If so, did you always have to update the […].

Coding

Coding Programming Data
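The excerpt above is truncated and does not describe Confluent's actual approach, so this is only a rough sketch of what a self-describing event could look like: each message carries a small schema next to its payload, and a single generic processor decodes any event type without per-type code. The envelope layout, the type names in `CASTS`, and the `decode_event` helper are all assumptions made for illustration.

```python
import json

# Maps a declared type name to a Python converter (hypothetical type names).
CASTS = {"int": int, "float": float, "string": str}


def decode_event(raw):
    """Decode a JSON envelope that describes its own payload.

    Assumed envelope shape:
      {"schema": {"name": ..., "fields": [{"name": ..., "type": ...}]},
       "payload": {...}}
    """
    envelope = json.loads(raw)
    schema = envelope["schema"]
    payload = envelope["payload"]
    record = {}
    for field in schema["fields"]:
        name, kind = field["name"], field["type"]
        if name not in payload:
            raise KeyError(f"missing field {name!r} declared by {schema['name']}")
        # Convert the value using the type the event itself declares.
        record[name] = CASTS[kind](payload[name])
    # Fields not declared in the schema are dropped.
    return schema["name"], record
```

With this shape, adding a new event type only requires producers to send a new schema; the processor code stays unchanged.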

A Guide to Debugging Apache Airflow® DAGs

In Airflow, DAGs (your data pipelines) support nearly every use case. As these workflows grow in complexity and scale, efficiently identifying and resolving issues becomes a critical skill for every data engineer. This is a comprehensive guide with best practices and examples for debugging Airflow DAGs. You’ll learn how to: Create a standardized process for debugging to quickly diagnose errors in your DAGs; Identify common issues with DAGs, tasks, and connections; Distinguish between Airflow-relate

Data Pipeline

Expediting SQL Workers means Expediting your Business

Cloudera

NOVEMBER 10, 2020

Two of the more painful things in your everyday life as an analyst or SQL worker are not getting easy access to data when you need it, or not having easy to use, useful tools available to you that don’t get in your way! As one of my dear customers, a data worker in Pharma, said to me: “I really don’t care about bells and whistles, I just want to get my task done.

SQL

SQL Unstructured Data Hadoop Data Lake

Building A Cost Effective Data Catalog With Tree Schema

Data Engineering Podcast

NOVEMBER 9, 2020

Summary: A data catalog is a critical piece of infrastructure for any organization that wants to build analytics products, whether internal or external. While there are a number of platforms available for building that catalog, many of them are either difficult to deploy and integrate, or expensive to use at scale. In this episode Grant Seward explains how he built Tree Schema to be an easy-to-use and cost-effective option for organizations to build their data catalogs.

Building

Building PostgreSQL BI Metadata

How to Make the Most of Big Data Analytics in Your Business

Teradata

NOVEMBER 9, 2020

Big data's growth and its impact on business is undeniable. But how do you make the most of your data analytics to create real business value? Find out more.

Big Data

Big Data Data Analytics Data IT

More Trending

How to Choose Between Strict and Dynamic Schemas

Confluent

NOVEMBER 9, 2020

Event modeling has always been a pain point in organizations. From figuring out the standard format of your schemas, processing said data models effectively, and finally testing before you deploy […].

Process

Process Data

Extreme data center pressure? Burst to the cloud with CDP!

Cloudera

NOVEMBER 12, 2020

A tale of two organizations. Here at Cloudera, we’ve seen many large organizations struggle to meet ever-changing and ever-growing business demands. We see it everywhere. Traditional on-premise architectures, which create a fixed, finite set of resources, force every business request for new insight to be a crazy resource balancing act, coupled with long wait times, or a straight-up no, it cannot be done.

Cloud

Cloud Data Warehouse Banking Data

Developing Grouparoo on macOS Big Sur

Grouparoo

NOVEMBER 12, 2020

The newest release of macOS is out! Like any new OS release, there are plenty of new features, and new bugs to squash. The Grouparoo team develops on macOS, and we've taken notes about what we needed to do to continue being productive through the upgrade. Update Homebrew and Databases: Like most macOS developers, we install our dependencies and database with Homebrew, a great package manager for macOS.

PostgreSQL

PostgreSQL Database Management Systems

Boost Your Customer Experience with Better Payment Conversions

Teradata

NOVEMBER 8, 2020

With digital payments on the rise, payment processing has become more complex. Fortunately, advanced data technologies can create better customer experience via streamlined payment processes.

Technology

Technology Process Data

Mastering Apache Airflow® 3.0: What’s New (and What’s Next) for Data Orchestration

Speaker: Tamara Fingerlin, Developer Advocate

Apache Airflow® 3.0, the most anticipated Airflow release yet, officially launched this April. As the de facto standard for data orchestration, Airflow is trusted by over 77,000 organizations to power everything from advanced analytics to production AI and MLOps. With the 3.0 release, the top-requested features from the community were delivered, including a revamped UI for easier navigation, stronger security, and greater flexibility to run tasks anywhere at any time.

Data

Advanced Testing Techniques for Spring Kafka

Confluent

NOVEMBER 13, 2020

Asynchronous boundaries. Frameworks. Configuring frameworks. Apache Kafka®. All of these share one thing in common: complexity in testing. Now imagine them combined—it gets much harder. This is the final blog […].

Kafka

Kafka IT

True workplace diversity goes beyond gender parity

Cloudera

NOVEMBER 12, 2020

Diversity takes on many forms around us. Think of a garden, an orchestra, and the example that’s easiest to relate to: food. While every ingredient has its unique taste, combining them in the right amount will result in a delicious dish. If we understand the value of diversity, why is workplace diversity still a big challenge for many companies? D&I’s progress is limited by a narrow view of diversity.

Food

Food Education Programming Building

Liquidity Monitoring: Depth

Ripple Engineering

NOVEMBER 12, 2020

In our last liquidity monitoring post, we introduced the concept of dislocation as a way to measure the price competitiveness of an XRP-fiat pair. In this post, we introduce the companion depth metric and combine both metrics into a data visualization for assessing liquidity performance. Depth: Dislocation tells us how competitive an exchange’s XRP prices are, but it ignores the important quantity component of liquidity.

Technology

Technology IT Data

Getting Started with Native Object Store and Microsoft Azure Object Storage in 5 Easy Steps

Teradata

NOVEMBER 11, 2020

Learn the prerequisites and configuration required for Vantage with Native Object Store to easily access Azure Blob storage and Azure Data Lake Gen 2.

Data Lake

Data Lake Accessibility Accessible Data

Agent Tooling: Connecting AI to Your Tools, Systems & Data

Speaker: Alex Salazar, CEO & Co-Founder @ Arcade | Nate Barbettini, Founding Engineer @ Arcade | Tony Karrer, Founder & CTO @ Aggregage

There’s a lot of noise surrounding the ability of AI agents to connect to your tools, systems and data. But building an AI application into a reliable, secure workflow agent isn’t as simple as plugging in an API. As an engineering leader, it can be challenging to make sense of this evolving landscape, but agent tooling provides such high value that it’s critical we figure out how to move forward.

Systems

Demystifying Variance Positions in Scala

Rock the JVM

NOVEMBER 9, 2020

Explore the infamous 'covariant type occurs in contravariant position' problem in Scala: discover effective solutions and best practices

Scala

Project Metamorphosis Month 7: Reliable Event Streaming with Confluent Cloud and Proactive Support

Confluent

NOVEMBER 9, 2020

The rise of the cloud introduced a focus on rapid iteration and agility that is founded on specialization. If you are an application developer, you know your applications better than […].

Cloud

Cloud Project

Using Elasticsearch to Offload Real-Time Analytics from MongoDB

Rockset

NOVEMBER 12, 2020

Offloading analytics from MongoDB establishes clear isolation between write-intensive and read-intensive operations. Elasticsearch is one tool to which reads can be offloaded, and, because both MongoDB and Elasticsearch are NoSQL in nature and offer similar document structure and data types, Elasticsearch can be a popular choice for this purpose. In most scenarios, MongoDB can be used as the primary data storage for write-only operations and as support for quick data ingestion.

MongoDB

MongoDB NoSQL Data Pipeline Data Storage

How Tesla is Redefining the Auto Industry

Teradata

NOVEMBER 10, 2020

New players like Tesla are changing the automotive industry into a software-driven paradigm which has made data management & analysis at scale a critical capability for OEMs.

Data Management

Data Management Management Data

How to Modernize Manufacturing Without Losing Control

Speaker: Andrew Skoog, Founder of MachinistX & President of Hexis Representatives

Manufacturing is evolving, and the right technology can empower—not replace—your workforce. Smart automation and AI-driven software are revolutionizing decision-making, optimizing processes, and improving efficiency. But how do you implement these tools with confidence and ensure they complement human expertise rather than override it? Join industry expert Andrew Skoog as he explores how manufacturers can leverage automation to enhance operations, streamline workflows, and make smarter, data-dri

Manufacturing

How Netflix Scales its API with GraphQL Federation (Part 1)

Netflix Tech

NOVEMBER 9, 2020

Netflix is known for its loosely coupled and highly scalable microservice architecture. Independent services allow for evolving at different paces and scaling independently. Yet they add complexity for use cases that span multiple services. Rather than exposing 100s of microservices to UI developers, Netflix offers a unified API aggregation layer at the edge.

IT Architecture Engineering Building

Veterans Day: What Service Means to Clouderan Vets

Cloudera

NOVEMBER 10, 2020

Around the world, a number of countries celebrate November 11 as a day to give thanks and recognition for their veterans. Originally designated to honor the end of World War I (Armistice Day and Remembrance Day), in some countries it is now used to pay respect to all veterans (Veterans Day). Year after year, we use this time to express our support and appreciation to those who have served in the military.

Recruitment

Recruitment Education Programming Designing

Databricks SQL Analytics Workspace - The Evolution of the Lakehouse

Advancing Analytics: Data Engineering

NOVEMBER 10, 2020

We have discussed in the past this idea of the lakehouse, the aspirational target of many analytics platforms these days of combining the huge power and potential of data lakes with the rigour, reliability and concurrency of a data warehouse. It’s an interesting concept but has, in the past, been firmly an aspiration. In the world without lakehouses, we often see the “Modern Data Warehouse”, this two-phased approach to providing a holistic platform – we load our early data into a lake where we

SQL

SQL BI Data Warehouse Data Lake
