Top Data Engineering Digest Data Schemas Raw Data Content for Week of Jan 16

Sat. Jan 16, 2021 - Fri. Jan 22, 2021

Helpful Tools for Apache Kafka Developers

Confluent

JANUARY 20, 2021

Apache Kafka® is at the core of a large ecosystem that includes powerful components, such as Kafka Connect and Kafka Streams. This ecosystem also includes many tools and utilities that […].

Kafka

Kafka Utilities

The last (but not least) “ops” you need for your data: DataGovOps

François Nguyen

JANUARY 18, 2021

To finish the trilogy (DataOps, MLOps), let’s talk about DataGovOps, or how you can support your Data Governance initiative. The origin of the term: DataKitchen. We must give credit to Chris Bergh and his team at DataKitchen. You should visit their website; you will find incredibly good stuff there. Their article was published in October 2020 with the title “Data Governance as Code”. The idea behind it is that you should “actively promote the safe use of data with automation […]

Data Governance

Data Governance Metadata Government Data Pipeline

Webinars

Agent Tooling: Connecting AI to Your Tools, Systems & Data

How to Modernize Manufacturing Without Losing Control

Mastering Apache Airflow® 3.0: What’s New (and What’s Next) for Data Orchestration

MORE WEBINARS

Trending Sources

How to unit test sql transforms in dbt

Start Data Engineering

JANUARY 16, 2021

Contents: Introduction; Setup; Code; Conditional logic to read from mock input; Custom macro to test for equality; Setup environment specific test; Run ELT using dbt; Conclusion; Further reading. Introduction: With the recent advancements in data warehouses and tools like dbt, most transformations (the T of ELT) are being done directly in the data warehouse. While this provides a lot of functionality out of the box, it gets tricky when you want to test your SQL code locally before deploying to production.
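The two ideas named in the contents above (reading from a mock input and testing for equality) can be sketched outside dbt. The following is a minimal illustration only, not the article's code: it assumes SQLite as a stand-in for the warehouse, and the table names `mock_orders` and `expected_totals`, the transform, and the helper `rows_differ` are all hypothetical.

```python
import sqlite3

# Hypothetical transform under test: total order amount per customer.
# The {source} placeholder lets a test point the query at a mock table.
TRANSFORM_SQL = """
SELECT customer_id, SUM(amount) AS total_amount
FROM {source}
GROUP BY customer_id
"""


def rows_differ(conn, actual_sql, expected_table):
    """Return rows that appear in only one of the two result sets.

    EXCEPT is set-based, so duplicate rows are not counted separately.
    """
    diff_sql = f"""
    SELECT * FROM (
        SELECT * FROM ({actual_sql}) EXCEPT SELECT * FROM {expected_table}
    )
    UNION ALL
    SELECT * FROM (
        SELECT * FROM {expected_table} EXCEPT SELECT * FROM ({actual_sql})
    )
    """
    return conn.execute(diff_sql).fetchall()


def run_test(expected_rows):
    """Run the transform against mock input and compare with expected rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE mock_orders (customer_id INTEGER, amount INTEGER);
        INSERT INTO mock_orders VALUES (1, 10), (1, 5), (2, 7);
        CREATE TABLE expected_totals (customer_id INTEGER, total_amount INTEGER);
    """)
    conn.executemany("INSERT INTO expected_totals VALUES (?, ?)", expected_rows)
    actual_sql = TRANSFORM_SQL.format(source="mock_orders")
    return rows_differ(conn, actual_sql, "expected_totals")


if __name__ == "__main__":
    # An empty list means the transform output matches the expectation.
    print(run_test([(1, 15), (2, 7)]))
```

An empty result means the two sets of rows match; any returned row is a mismatch to inspect.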

SQL

SQL Data Warehouse Coding IT

Building a Machine Learning Application With Cloudera Data Science Workbench And Operational Database, Part 3: Productionization of ML models

Cloudera

JANUARY 20, 2021

In this last installment, we’ll discuss a demo application that uses PySpark.ML to build a classification model from training data stored in both Cloudera’s Operational Database (powered by Apache HBase) and Apache HDFS. This model is then scored and served through a simple Web Application. For more context, this demo is based on concepts discussed in the blog post “How to deploy ML models to production”.

Machine Learning

Machine Learning Database Data Science Building

Mastering Apache Airflow® 3.0: What’s New (and What’s Next) for Data Orchestration

Speaker: Tamara Fingerlin, Developer Advocate

Apache Airflow® 3.0, the most anticipated Airflow release yet, officially launched this April. As the de facto standard for data orchestration, Airflow is trusted by over 77,000 organizations to power everything from advanced analytics to production AI and MLOps. With the 3.0 release, the top-requested features from the community were delivered, including a revamped UI for easier navigation, stronger security, and greater flexibility to run tasks anywhere at any time.

Data

Optimizing the Aural Experience on Android Devices with xHE-AAC

Netflix Tech

JANUARY 22, 2021

By Phill Williams and Vijay Gondi. Introduction: At Netflix, we are passionate about delivering great audio to our members. We began streaming 5.1 channel surround sound in 2010, Dolby Atmos in 2017, and adaptive bitrate audio in 2019. Continuing in this tradition, we are proud to announce that Netflix now streams Extended HE-AAC with MPEG-D DRC (xHE-AAC) to compatible Android Mobile devices (Android 9 and newer).

Metadata

Metadata Programming Algorithm Media

Using Your Data Warehouse As The Source Of Truth For Customer Data With Hightouch

Data Engineering Podcast

JANUARY 18, 2021

Summary: The data warehouse has become the central component of the modern data stack. Building on this pattern, the team at Hightouch have created a platform that synchronizes information about your customers out to third party systems for use by marketing and sales teams. In this episode Tejas Manohar explains the benefits of sourcing customer data from one location for all of your organization to use, the technical challenges of synchronizing the data to external systems with varying APIs, and

Data Warehouse

Data Warehouse BI Data Data Engineering

Powering Microservices at SEI Investments with Event Streaming

Confluent

JANUARY 22, 2021

We launched a transformation initiative three years ago that transitioned SEI Investments from a monolithic database-oriented architecture to a containerized services platform with an event-driven architecture based on Confluent Platform. […].

Architecture

Architecture Database Data Schemas Data Governance

More Trending

Digital Transformation is a Data Journey From Edge to Insight

Cloudera

JANUARY 20, 2021

Digital transformation is a hot topic for all markets and industries as it’s delivering value with explosive growth rates. Consider that Manufacturing’s Industrial Internet of Things (IIoT) was valued at $161b with an impressive 25% growth rate, that the Connected Car market will be valued at $225b by 2027 with a 17% growth rate, or that in the first three months of 2020 retailers realized ten years of digital sales penetration.

Manufacturing

Manufacturing Data Warehouse Kafka Retail

Do You Need a DataOps Dojo?

DataKitchen

JANUARY 20, 2021

As DataOps activity takes root within an enterprise, managers face the question of whether to build centralized or decentralized DataOps capabilities. Centralizing analytics brings it under control, but granting analysts free rein is necessary to foster innovation and stay competitive. The beauty of DataOps is that you don’t have to choose between centralization and freedom.

Education

Education Coding Project Engineering

What is the Business Case for Delivering a Good Customer Experience at Your Bank?

Teradata

JANUARY 21, 2021

Most banks talk about developing great customer experiences but don't understand the value that investment would deliver. Learn about the 6 key capabilities banks require to address this problem.

Banking

Event Streaming Across Networks and Corporate Firewalls Using PubNub and Confluent Platform

Confluent

JANUARY 21, 2021

This year’s pandemic has forced businesses all around the world to adopt a “remote-first” approach to their operations, with an emphasis on better enabling collaboration, remote work, and productivity. This […].

Programming

Agent Tooling: Connecting AI to Your Tools, Systems & Data

Speaker: Alex Salazar, CEO & Co-Founder @ Arcade | Nate Barbettini, Founding Engineer @ Arcade | Tony Karrer, Founder & CTO @ Aggregage

There’s a lot of noise surrounding the ability of AI agents to connect to your tools, systems and data. But building an AI application into a reliable, secure workflow agent isn’t as simple as plugging in an API. As an engineering leader, it can be challenging to make sense of this evolving landscape, but agent tooling provides such high value that it’s critical we figure out how to move forward.

Systems

Fostering community to help drive cultural change

Cloudera

JANUARY 18, 2021

2020 put on full display how humanity shows up in times of hardship. We saw everything from street celebrations to usher weary medical personnel home after long days fighting to save lives to places like food banks receiving more donations and volunteers than ever before. Some communities were harder hit than others, and we’ve seen the same in the global workplace.

Food

Food Medical Banking Programming

Demo: Supercharging Data Engineering with Magpie for Snowflake®

Silectis

JANUARY 22, 2021

For those using a robust analytics database, such as the Snowflake® Data Cloud, adding the power of a data engineering platform can help maximize the value you’re getting out of that database. In this demo, we’ll show you how native tools in the Magpie data engineering platform play well with Snowflake, ultimately allowing your team to do more in a centralized data engineering environment.

Data Engineering

Data Engineering Data Engineer Engineering Data Warehouse

Digital Payments Data Drives Increased Usage and Customer Retention

Teradata

JANUARY 18, 2021

Payment data drives opportunities to increase usage & prevent attrition through hyper-segmentation, personalized interactions & optimized rewards programs. Read more.

Programming

Programming Data

Storing Cold Metadata, Snowflake Data Cloud, and More: Top 10 Links From Across the Web

Data Council

JANUARY 21, 2021

Here's our January 2021 roundup of links from across the web that could be relevant to you: 1. Storing Cold Metadata with Alki (Dropbox): Dropbox shared insights into Alki, the petabyte-scale metadata store it designed for infrequently accessed metadata (“cold data”). The post details how the one-size-fits-all database Edgestore was reaching capacity limits, and why audit logs were a good candidate to be moved off costly SSDs.

Metadata

Metadata Cloud AWS Database

How to Modernize Manufacturing Without Losing Control

Speaker: Andrew Skoog, Founder of MachinistX & President of Hexis Representatives

Manufacturing is evolving, and the right technology can empower—not replace—your workforce. Smart automation and AI-driven software are revolutionizing decision-making, optimizing processes, and improving efficiency. But how do you implement these tools with confidence and ensure they complement human expertise rather than override it? Join industry expert Andrew Skoog as he explores how manufacturers can leverage automation to enhance operations, streamline workflows, and make smarter, data-dri

Manufacturing

Get Your Analytics Insights Instantly – Without Abandoning Central IT

Cloudera

JANUARY 21, 2021

Do you need faster time to value? Does your organization’s success depend on immediate delivery of new reports, applications, or projects? When you go to Central IT for support, are you blocked by insanely long wait times for the resources needed to meet your business goals? If so – you are likely one of the growing group of Line of Business (LoB) professionals forced into creating your own solution – creating your own Shadow IT.

IT Data Lake Data Warehouse Cloud Storage

Hepta Analytics Microsoft Silver Partner

Hepta Analytics

JANUARY 21, 2021

Hepta Analytics is proud to announce that we have attained Silver Status within the Microsoft Partner Network! This achievement means that we have demonstrated our expertise in delivering quality solutions in one or more specialized areas of business (namely Cloud Platform and, in future, Data Analytics and Security). Microsoft competencies are designed to prepare companies to meet their customers’ needs, and to help attract new customers who are looking for Microsoft-certified sol

Data Analytics

Data Analytics Certification Cloud Programming

Defer Transaction Side-Effects in Node.js

Grouparoo

JANUARY 20, 2021

At Grouparoo, we use Actionhero as our Node.js API server and Sequelize as our Object Relational Mapping (ORM) tool, which makes it easy to work with complex records from our database. Within our Actions and Tasks, we often want to treat the whole execution as a single database transaction: either all the modifications to the database succeed, or they all fail as a unit.
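The all-or-nothing behavior described above, combined with the deferred side-effects named in the title, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Grouparoo's implementation: it uses Python's built-in `sqlite3` module rather than Node.js and Sequelize, and the `transaction` helper, the `profiles` table, and the "welcome email" side effect are hypothetical.

```python
import sqlite3
from contextlib import contextmanager


@contextmanager
def transaction(conn):
    """Run a block as one transaction; run deferred callbacks only after commit."""
    deferred = []
    try:
        # The caller registers side effects through the yielded function.
        yield deferred.append
        conn.commit()
    except Exception:
        # Any failure undoes every write made inside the block.
        conn.rollback()
        raise
    # Reached only when the commit succeeded.
    for callback in deferred:
        callback()


def demo(fail):
    """Insert one row and queue a side effect; optionally fail afterwards."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE profiles (name TEXT)")
    conn.commit()
    sent = []  # stands in for an external side effect such as sending email
    try:
        with transaction(conn) as defer:
            conn.execute("INSERT INTO profiles VALUES ('ada')")
            defer(lambda: sent.append("welcome email: ada"))
            if fail:
                raise RuntimeError("a later step failed")
    except RuntimeError:
        pass
    count = conn.execute("SELECT COUNT(*) FROM profiles").fetchone()[0]
    return count, sent


if __name__ == "__main__":
    print(demo(fail=False))
    print(demo(fail=True))
```

When the block fails, the insert is rolled back and the queued side effect never runs, so nothing outside the database observes work that was later undone.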

Database

Database SQL Coding Utilities

Head Pose Estimation with Computer Vision

InData Labs

JANUARY 19, 2021

Recently, head pose estimation has become a popular area of research. Data scientists have spent over 20 years researching the most effective approaches to it, yet haven’t settled on one. The technology is needed for facial recognition, eye gaze estimation and emotion recognition. For instance, it can be used for safety monitoring on the road, […]. The post Head Pose Estimation with Computer Vision first appeared on InData Labs.

Technology

Technology IT Data Data Engineer

The Ultimate Guide to Apache Airflow DAGS

With Airflow being the open-source standard for workflow orchestration, knowing how to write Airflow DAGs has become an essential skill for every data engineer. This eBook provides a comprehensive overview of DAG writing features with plenty of example code. You’ll learn how to: understand the building blocks of DAGs, combine them in complex pipelines, and schedule your DAG to run exactly when you want it to; write DAGs that adapt to your data at runtime and set up alerts and notifications; scale you

Data Engineering

How to configure clients to connect to Apache Kafka Clusters securely – Part 3: PAM authentication

Cloudera

JANUARY 20, 2021

In the previous posts in this series, we have discussed Kerberos and LDAP authentication for Kafka. In this post, we will look into how to configure a Kafka cluster to use a PAM backend instead of an LDAP one. The examples shown here will highlight the authentication-related properties in bold font to differentiate them from other required security properties, as in the example below.

Kafka

Kafka Systems Management Accessible

How Does UX Design Help in Visualizing Big Data?

Teradata

JANUARY 19, 2021

Learn about the UX principles that help in designing effective Big Data visualizations so users can better understand data and make more informed decisions.

Big Data

Big Data Designing Data

Better to Be Wrong Than Vague: Apache Kafka and Data Architecture Predictions for 2021

Confluent

JANUARY 19, 2021

On a recent episode of Streaming Audio, Gwen Shapira, Michael Noll, and Ben Stopford joined me to hold forth about the near future of Apache Kafka® and software architecture in […].

Kafka

Kafka Architecture Data Architecture Data

Creating a uniform landscape for macOS Software

Zalando Engineering

JANUARY 20, 2021

At the time of this writing, we have a universe of Mac applications, identified and version-inventoried, within the fleet of a little over 3,000 Mac devices at Zalando, from which a subset, selected by importance, frequency of updates, or size of the install base, is part of a so-called software lifecycle. However, in July 2019, when a vulnerability was discovered in Zoom (long before becoming the mainstream video conference app during the COVID-19 pandemic), Information S

Database

Database Python Management Coding

Apache Airflow® Best Practices: DAG Writing

Speaker: Tamara Fingerlin, Developer Advocate

In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!

Data

Cloudera Flow Management Continuous Delivery while Minimizing Downtime

Cloudera

JANUARY 19, 2021

Cloudera Flow Management, based on Apache NiFi and part of the Cloudera DataFlow platform, is used by some of the largest organizations in the world to facilitate an easy-to-use, powerful, and reliable way to distribute and process data at high velocity in the modern big data ecosystem. Increasingly, customers are adopting CFM to accelerate their enterprise streaming data processing from concept to implementation.

Management

Management Big Data Ecosystem Kafka AWS

Elasticsearch or Rockset for Real-Time Analytics: Managing Clusters vs Going Serverless

Rockset

JANUARY 19, 2021

Having the right analytics backend for your real-time application makes all the difference when it comes to how much time your team spends managing and maintaining the underlying infrastructure. Today, distributed systems that used to require a lot of manual intervention can often be replaced by more operationally efficient solutions. One example of this evolution is the move from Elasticsearch, which has been a great open-source, full-text search and analytics engine, to a low-ops alternative in

Management

Management Datasets Architecture Database

How to Build a Successful Cloud DataOps Program

DataKitchen

JANUARY 18, 2021

The post How to Build a Successful Cloud DataOps Program first appeared on DataKitchen.

Programming

Programming Cloud Building

2020 Visual Recap of the Apache Superset Project

Preset

JANUARY 17, 2021

The Apache Superset project experienced a critical growth period in 2020 in all aspects. In this post, I'll document how the key facets of the project changed last year.

Project

How to Achieve High-Accuracy Results When Using LLMs

Speaker: Ben Epstein, Stealth Founder & CTO | Tony Karrer, Founder & CTO, Aggregage

When tasked with building a fundamentally new product line with deeper insights than previously achievable for a high-value client, Ben Epstein and his team faced a significant challenge: how to harness LLMs to produce consistent, high-accuracy outputs at scale. In this new session, Ben will share how he and his team engineered a system (based on proven software engineering approaches) that employs reproducible test variations (via temperature 0 and fixed seeds), and enables non-LLM evaluation m

Software Engineer

Finding digital transformation in high places – how a ski resort improved operational agility and customer experiences

Cloudera

JANUARY 17, 2021

Most blogs in my history are very focused on Industry 4.0’s digital transformation of the manufacturing industry, which in itself is pretty remarkable. By 2025, Industry 4.0 is expected to generate greater than $11 trillion in economic value as connected manufacturing processes, operations and their supply chains become more streamlined, efficient, agile and realize improved productivity, improved uptime and product quality.

Database-centric

Database-centric Manufacturing Retail Food

Cloudera Cares Speaker Series guiding value: Diversity

Cloudera

JANUARY 21, 2021

With intention and creativity, we opened eyes and minds. What now seems like a lifetime ago, our worlds were upended. As the stay-at-home orders were extended again and again and we continued to work from home, many of us were faced with reimagining our work. For me, an unexpected challenge as head of Cloudera Cares has been redesigning the employee volunteer experience to continue engaging Clouderans even when in-person activities were no longer possible.

Programming

Programming IT

Apache Superset 1.0 is out!

Preset

JANUARY 17, 2021

The best Superset release to date is finally out
