Near-real-time insights have become a de facto requirement for Azure use cases involving scalable log analytics, time series analytics, and IoT/telemetry analytics. Azure Data Explorer (also called Kusto) is the […].
Everybody needs more data and more analytics, and the needs involved are varied and often conflicting. Data engineers need batch resources, while data scientists need to quickly onboard ephemeral users. Data architects deal with constantly evolving workloads, and business analysts must balance the urgency and importance of a concurrent user population that continues to grow.
Summary: In-memory computing provides significant performance benefits, but brings along challenges for managing failures and scaling up. Hazelcast is a platform for managing stateful in-memory storage and computation across a distributed cluster of commodity hardware. On top of this foundation, the Hazelcast team has also built a streaming platform for reliable, high-throughput data transmission.
Analytics at Netflix: Who We Are and What We Do. An introduction to Analytics and Visualization Engineering at Netflix, by Molly Jackman & Meghana Reddy. Across nearly every industry, there is recognition that data analytics is key to driving informed business decision-making. But there is far less agreement on what the term "data analytics" actually means.
In Airflow, DAGs (your data pipelines) support nearly every use case. As these workflows grow in complexity and scale, efficiently identifying and resolving issues becomes a critical skill for every data engineer. This is a comprehensive guide, with best practices and examples, to debugging Airflow DAGs. You’ll learn how to: create a standardized process for debugging to quickly diagnose errors in your DAGs; identify common issues with DAGs, tasks, and connections; and distinguish between Airflow-related…
Since the MongoDB Atlas source and sink became available in Confluent Cloud, we’ve received many questions around how to set up these connectors in a secure environment. By default, MongoDB […].
For enterprise organizations, managing and operationalizing increasingly complex data across the business has presented a significant challenge for staying competitive in analytics- and data-science-driven markets. With growing disparate data — from edge devices to individual lines of business — needing to be consolidated, curated, and delivered for downstream consumption, it’s no wonder that data engineering has become the most in-demand role across businesses, growing at an estimated…
The game had changed for the retail sector long ago – but it has taken the COVID-19 crisis for people to notice. A new appreciation for the role of data in retail has emerged.
Part of our series on who works in Analytics at Netflix, and what the role entails, by Julie Beckley & Chris Pham. This Q&A provides insights into the diverse set of skills, projects, and culture within Data Science and Engineering (DSE) at Netflix through the eyes of two team members: Chris Pham and Julie Beckley. Photo from a team curling offsite.
Are you looking for a way to run AWS services on premises in your own datacenter? I am excited to share today that we have completed validation of support for […].
Announcing the finalists of the Data Impact Awards is always a highlight in our annual Cloudera calendar, and this year is no different. The 2020 entrants have shown incredible data-driven innovation and problem-solving ability, and have proven real-world impact. Our independent judges certainly had their work cut out for them, as they were faced with an overwhelming number of outstanding entries.
The Data Science and Engineering Process in PLM. Huge opportunities for digital products are accompanied by huge risks. Digitalization is about to profoundly change the way we live and work. The increasing availability of data, combined with growing storage capacity and computing power, makes it possible to create data-based products, services, and customer-specific solutions that deliver insight with value for the business.
Hispanic Heritage Month not only promotes the rich culture & heritage that so many Americans share; it also sheds a distinct light on colleagues & friends. Read more from our colleague, Crystal Diaz.
Supply chain redesign has become an area of critical focus as businesses try to ensure supply chains aren’t overly reliant on any one constituent part. In Part One of our round-table discussion, Michael Ger, Managing Director of Manufacturing and Automotive, and Brent Biddulph, Global Managing Director, Retail and Consumer Goods at Cloudera, talked about the direct impact of COVID-19 on the retail and manufacturing sectors with Vijay Raja, Director of Industry & Solutions Marketing.
Many businesses have built great analytics products to help track the actions users take in your product (Mixpanel, Pendo, and Amplitude, to name a few). These products use an events-based data model: they track user behavior, usually client-side, so you can understand and visualize actions like page views and button clicks.
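As a minimal sketch of that events-based model, each user action becomes one immutable event record, and behavior questions become aggregations over the stream. The field names below are hypothetical, not any vendor's actual schema:

```python
from collections import Counter

# Hypothetical client-side events; real products capture many more fields
# (timestamps, session ids, device info, etc.).
events = [
    {"user_id": "u1", "event": "page_view", "page": "/pricing"},
    {"user_id": "u1", "event": "button_click", "target": "signup"},
    {"user_id": "u2", "event": "page_view", "page": "/pricing"},
]

# "How many page views and button clicks happened?" is a one-line aggregation
# over the event stream -- no per-user state tables needed.
counts = Counter(e["event"] for e in events)
```

Because every row is an append-only fact, the same stream can answer many different behavior questions after the fact, which is the main appeal of the model.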
Speaker: Alex Salazar, CEO & Co-Founder @ Arcade | Nate Barbettini, Founding Engineer @ Arcade | Tony Karrer, Founder & CTO @ Aggregage
There’s a lot of noise surrounding the ability of AI agents to connect to your tools, systems and data. But building an AI application into a reliable, secure workflow agent isn’t as simple as plugging in an API. As an engineering leader, it can be challenging to make sense of this evolving landscape, but agent tooling provides such high value that it’s critical we figure out how to move forward.
To keep pace with data’s lightning innovation speed, data engineers need to invest not only in the latest data modeling and analytics tools, but also in technologies that can increase data accuracy and prevent broken ETL pipelines. The solution? Data observability tools, the next frontier of data engineering and a pillar of the emerging data reliability category.
If you’re responsible for data strategy in a large organization, there’s a good chance you’ve got a data mess on your hands. So what do you do with it? Read more.
Cloudera Data Platform 7.2.1 introduces fine-grained authorization for access to Azure Data Lake Storage using Apache Ranger policies. Cloudera and Microsoft have been working together closely on this integration, which greatly simplifies the security administration of access to ADLS-Gen2 cloud storage. Apache Ranger provides a centralized console to manage authorization and view audits of access to resources in a large number of services, including Apache Hadoop’s HDFS, Apache Hive, Apache HBase…
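To give a feel for what such a centralized policy looks like, here is an illustrative Ranger-style policy document. The overall shape (resources plus policyItems with per-user accesses) follows Ranger's generic REST policy model, but the service name, resource key, user/group names, and access types below are assumptions modeled on Ranger's HDFS path policies, not the exact ADLS-Gen2 schema:

```json
{
  "name": "analysts-read-sales-data",
  "isEnabled": true,
  "service": "cm_adls",
  "resources": {
    "path": { "values": ["/sales/2020/"], "isRecursive": true }
  },
  "policyItems": [
    {
      "users": ["analyst1"],
      "groups": ["sales-analysts"],
      "accesses": [
        { "type": "read", "isAllowed": true }
      ]
    }
  ]
}
```

The point of the sketch is the administration model: one policy document, managed and audited centrally, grants a group recursive read access to a storage path, rather than ACLs scattered across the storage layer.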
Speaker: Andrew Skoog, Founder of MachinistX & President of Hexis Representatives
Manufacturing is evolving, and the right technology can empower — not replace — your workforce. Smart automation and AI-driven software are revolutionizing decision-making, optimizing processes, and improving efficiency. But how do you implement these tools with confidence and ensure they complement human expertise rather than override it? Join industry expert Andrew Skoog as he explores how manufacturers can leverage automation to enhance operations, streamline workflows, and make smarter, data-driven…
You can complete the project requirements by yourself or in collaboration with others. Feel free to ask questions in the discussion forum or on social media using the #CloudGuruChallenge hashtag! The post #CloudGuruChallenge – Event-Driven Python on AWS appeared first on A Cloud Guru.
Learn how Teradata Vantage was used by a leading Turkish bank to predict credit scores for customers in real time and to make near-immediate decisions on their loan applications.
For some, this may look like a new category at this year’s Data Impact Awards. However, the Enterprise Data Cloud category marks the evolution of what was once the Data Anywhere category. The main reason for this change is that the new title better represents the move our customers are making: beyond simply acknowledging the ability to have data ‘anywhere’.
In Brian’s post, Building a Sync Engine, he talks about the value of using a High Water Mark to keep track of the latest bit of data you’ve imported. This approach is often a better pattern than using Limit and Offset, especially when the underlying data might be changing. In this post, I’m going to dive even deeper into this topic and suggest that you should be storing your High Water Marks as strings whenever possible.
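To make the idea concrete, here is a minimal sketch of an incremental import that keeps its High Water Mark as a string. The rows and field names are hypothetical; the point is that ISO-8601 timestamps sort correctly under plain string comparison, so the mark can stay an opaque string end to end:

```python
# Hypothetical source rows with an ISO-8601 "updated_at" column.
records = [
    {"id": 1, "updated_at": "2020-09-01T10:00:00Z"},
    {"id": 2, "updated_at": "2020-09-02T08:30:00Z"},
    {"id": 3, "updated_at": "2020-09-03T12:15:00Z"},
]

def fetch_since(high_water_mark: str):
    """Return rows newer than the mark, plus the advanced mark.

    ISO-8601 strings compare lexicographically in chronological order,
    so no timestamp parsing is needed -- the mark is just a string.
    """
    new_rows = [r for r in records if r["updated_at"] > high_water_mark]
    new_mark = max((r["updated_at"] for r in new_rows),
                   default=high_water_mark)
    return new_rows, new_mark

rows, mark = fetch_since("2020-09-01T23:59:59Z")
```

Unlike Limit and Offset, re-running `fetch_since` with the saved mark is stable even if rows were inserted between runs: nothing already imported is fetched twice, and nothing new is silently skipped by a shifted offset.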
With Airflow being the open-source standard for workflow orchestration, knowing how to write Airflow DAGs has become an essential skill for every data engineer. This eBook provides a comprehensive overview of DAG-writing features, with plenty of example code. You’ll learn how to: understand the building blocks of DAGs, combine them in complex pipelines, and schedule your DAG to run exactly when you want it to; write DAGs that adapt to your data at runtime and set up alerts and notifications; and scale your…
YADB (Yet Another Database Benchmark). The world has a plethora of database benchmarks, starting with the Wisconsin Benchmark, which is my favorite. Firstly, that benchmark came from Dr. David Dewitt, who taught me Database Internals when I was a graduate student at the University of Wisconsin. Secondly, it is probably the earliest conference paper (circa 1983) that I ever read.
We’re excited to share that Monte Carlo has raised $16M in funding to pioneer the Data Reliability category. Our Series A was led by Accel, with participation from GGV Capital, and enables us to pursue our mission of accelerating the world’s adoption of data by reducing Data Downtime. Our angel investors include DJ Patil, the former Chief Data Scientist of the United States, as well as top executives from Cloudera, eBay, Google, and VMWare.
Today’s data tool challenges: A North American telecom company struggled for years to react quickly enough to new categories and new levels of spam texts and calls. They also had no good way to know when and why they would need additional capacity on their own network, or on any other telecom company’s network. By enabling their event analysts to monitor and analyze events in real time, directly in their data visualization tool, and to rate and give feedback to the system…