Top Data Engineering Digest Kafka MongoDB Content for June, 2020

June, 2020

AWS Account

Start Data Engineering

JUNE 26, 2020

1. AWS account: Sign up for an AWS account at AWS Sign Up. First-time sign-ups are eligible for some free services (ref: AWS Free Tier). Get your access key by clicking on your name -> My Security Credentials on the top pane and then clicking Create New Access Key.
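Once an access key exists, it is commonly stored in the AWS shared credentials file (`~/.aws/credentials`). The sketch below only renders the text of such a file for one profile. The function name `render_aws_credentials` is hypothetical, and the key values shown are the placeholder examples used in AWS documentation, not real credentials.

```python
import configparser
import io


def render_aws_credentials(access_key_id: str, secret_access_key: str,
                           profile: str = "default") -> str:
    """Return the text of an AWS shared credentials file for one profile.

    Hypothetical helper for illustration; it does not write to disk.
    """
    # interpolation=None so a '%' inside a secret key is kept literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser[profile] = {
        "aws_access_key_id": access_key_id,
        "aws_secret_access_key": secret_access_key,
    }
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


if __name__ == "__main__":
    # Placeholder values; never commit real keys to source control.
    print(render_aws_credentials("AKIAIOSFODNN7EXAMPLE",
                                 "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"))
```

Saving the printed text to `~/.aws/credentials` is one way to make the key available to the AWS CLI and SDKs; running `aws configure` achieves the same result interactively.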

AWS

AWS Accessible Accessibility IT

Business Intelligence meets Data Engineering with Emerging Technologies

Simon Späti

JUNE 14, 2020

Today we have more requirements, with ever-growing tools and frameworks, complex cloud architectures, and a data stack that is changing rapidly. I hear claims: “Business Intelligence (BI) takes too long to integrate new data”, or “understanding how the numbers match up is very hard and needs lots of analysis”. The goal of this article is to make business intelligence easier, faster and more accessible with techniques from the sphere of data engineering.

Business Intelligence

Business Intelligence Data Engineering Data Engineer Technology

Trending Sources

EC2 & Session Manager (Toronto Project)

Team Data Science

JUNE 6, 2020

Welcome back to this Toronto-specific data engineering project. We left off last time concluding that finance has the largest demand for data engineers with AWS skills, and sketched out what our data ingestion pipeline will look like. I began building out the data ingestion pipeline by launching an EC2 instance. I should note that if you have created an AWS account, but have not yet created an Identity and Access Management (IAM) admin role, and are therefore still using root credentials, I am s

Project

Project Management Data Ingestion AWS

Stream Processing with IoT Data: Challenges, Best Practices, and Techniques

Confluent

JUNE 4, 2020

The rise of IoT devices means that we have to collect, process, and analyze orders of magnitude more data than ever before. As sensors and devices become ever more ubiquitous, […].

Process

Process Data Big Data Architecture

Mastering Apache Airflow® 3.0: What’s New (and What’s Next) for Data Orchestration

Speaker: Tamara Fingerlin, Developer Advocate

Apache Airflow® 3.0, the most anticipated Airflow release yet, officially launched this April. As the de facto standard for data orchestration, Airflow is trusted by over 77,000 organizations to power everything from advanced analytics to production AI and MLOps. With the 3.0 release, the top-requested features from the community were delivered, including a revamped UI for easier navigation, stronger security, and greater flexibility to run tasks anywhere at any time.

Data

There Are No Perfect Words…

Teradata

JUNE 17, 2020

Juneteenth has been declared a U.S. holiday at Teradata, as we stand with the black community and reflect on what we can do to fight racism and injustice, and embrace diversity.

Data Collection And Management To Power Sound Recognition At Audio Analytic

Data Engineering Podcast

JUNE 29, 2020

Summary We have machines that can listen to and process human speech in a variety of languages, but dealing with unstructured sounds in our environment is a much greater challenge. The team at Audio Analytic are working to impart a sense of hearing to our myriad devices with their sound recognition technology. In this episode Dr. Chris Mitchell and Dr.

Data Collection

Data Collection Management High Quality Data Metadata

AWS EMR

Start Data Engineering

JUNE 26, 2020

EMR: AWS EMR is a managed service provided by AWS to run Spark, HDFS, Hive, and other select software.

AWS

AWS Project Management

More Trending

Netflix Studio Engineering Overview

Netflix Tech

JUNE 30, 2020

By Steve Urban, Sridhar Seetharaman, Shilpa Motukuri, Tom Mack, Erik Strauss, Hema Kannan, CJ Barker. Netflix is revolutionizing the way a modern studio operates. Our mission in Studio Engineering is to build a unified, global, and digital studio that powers the effective production of amazing content. Netflix produces some of the world’s most beloved and award-winning films and series, including The Irishman, The Crown, La Casa de Papel, Ozark, and Tiger King.

Engineering

Engineering Entertainment Finance Machine Learning

Understanding Azure Synapse Analytics

Advancing Analytics: Data Engineering

JUNE 16, 2020

You might have seen that I’ve been pretty busy recently, digging into the new Azure Synapse Analytics preview, announced back at Microsoft Build 2020. I’ve explored the Spark engine, SQL serverless/On-Demand and various other bits… but I’m still getting the same question of “Cool!… but what actually is it?”. One of the problems here is that Azure SQL Data Warehouse was rebranded as “Azure Synapse Analytics”… but it’s not the same as the full workspace.

SQL

SQL Data Warehouse Engineering Data Engineering

Spring for Apache Kafka – Beyond the Basics: Can Your Kafka Consumers Handle a Poison Pill?

Confluent

JUNE 30, 2020

You know the fundamentals of Apache Kafka®. You are a Spring Boot developer working with Apache Kafka. You have chosen Spring Kafka for your integration. You have implemented your first […].

Kafka

Kafka Data

Modernization Means Simplicity and Sophistication

Teradata

JUNE 22, 2020

When it comes to being a modern data warehouse, your age really is just a number. It’s the underlying capabilities that actually count. Read more.

Data Warehouse

Data Warehouse IT Data

Agent Tooling: Connecting AI to Your Tools, Systems & Data

Speaker: Alex Salazar, CEO & Co-Founder @ Arcade | Nate Barbettini, Founding Engineer @ Arcade | Tony Karrer, Founder & CTO @ Aggregage

There’s a lot of noise surrounding the ability of AI agents to connect to your tools, systems and data. But building an AI application into a reliable, secure workflow agent isn’t as simple as plugging in an API. As an engineering leader, it can be challenging to make sense of this evolving landscape, but agent tooling provides such high value that it’s critical we figure out how to move forward.

Systems

Bringing Business Analytics To End Users With GoodData

Data Engineering Podcast

JUNE 22, 2020

Summary The majority of analytics platforms are focused on use internal to an organization by business stakeholders. As the availability of data increases and overall literacy in how to interpret it and take action improves there is a growing need to bring business intelligence use cases to a broader audience. GoodData is a platform focused on simplifying the work of bringing data to employees and end users.

Business Intelligence

Business Intelligence Data Engineering Data Engineer SQL

3 Key techniques, to optimize your Apache Spark code

Start Data Engineering

JUNE 19, 2020

Intro: A lot of tutorials show how to write Spark code with just the API and code samples, but they do not explain how to write “efficient Apache Spark” code.

Coding

Coding IT Data

12 Data Quality Metrics That ACTUALLY Matter

Monte Carlo

JUNE 11, 2020

One of our customers recently posed this question related to data quality metrics: I would like to set up an OKR for ourselves [the data team] around data availability. I’d like to establish a single data quality KPI that would summarize availability, freshness, quality. What’s the best way to do this? I can’t tell you how much joy this request brought me.

Data

Data Data Pipeline BI Data Engineering

How will Cloud HR Software change the human resources function?

U-Next

JUNE 4, 2020

Today, the way the Human Resources function operates has changed from how it worked in the 90s and early 2000s. The introduction of Human Resource Management (HRM) software in the late 90s had already revolutionized the way HR departments across various industries functioned. The HR department is no longer confined to the back office, where dealing with paperwork and recruiting were the only processes it was involved in.

Cloud

Cloud Recruitment Cloud Computing Big Data

How to Modernize Manufacturing Without Losing Control

Speaker: Andrew Skoog, Founder of MachinistX & President of Hexis Representatives

Manufacturing is evolving, and the right technology can empower—not replace—your workforce. Smart automation and AI-driven software are revolutionizing decision-making, optimizing processes, and improving efficiency. But how do you implement these tools with confidence and ensure they complement human expertise rather than override it? Join industry expert Andrew Skoog as he explores how manufacturers can leverage automation to enhance operations, streamline workflows, and make smarter, data-dri

Manufacturing

My Python/Java/Spring/Go/Whatever Client Won’t Connect to My Apache Kafka Cluster in Docker/AWS/My Brother’s Laptop. Please Help!

Confluent

JUNE 9, 2020

tl;dr When a client wants to send or receive a message from Apache Kafka®, there are two types of connection that must succeed: The initial connection to a broker (the […].
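The excerpt above is cut off before it names the second connection. As a rough illustration of the general idea, the sketch below assumes the commonly described Kafka behavior: the client first reaches a bootstrap address, then connects to whatever host and port each broker advertises in its metadata. Everything here (`Broker`, `diagnose`, the host names) is hypothetical and simulates reachability with plain Python sets; no Kafka client library is involved.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple

Address = Tuple[str, int]


@dataclass(frozen=True)
class Broker:
    """Metadata a broker hands back to clients (simplified, hypothetical)."""
    broker_id: int
    advertised_host: str
    advertised_port: int


def diagnose(bootstrap: Address, brokers: List[Broker],
             reachable: Set[Address]) -> str:
    """Report which of the two connection steps would fail, if any."""
    # Step 1: the initial connection to the bootstrap address.
    if bootstrap not in reachable:
        return "bootstrap connection failed"
    # Step 2 (assumed): the client connects to each advertised address.
    for broker in brokers:
        address = (broker.advertised_host, broker.advertised_port)
        if address not in reachable:
            return (f"broker {broker.broker_id} advertises unreachable "
                    f"address {address[0]}:{address[1]}")
    return "ok"


if __name__ == "__main__":
    # The broker is reachable at kafka.example.internal but advertises
    # localhost, which the simulated client cannot reach.
    brokers = [Broker(0, "localhost", 9092)]
    reachable = {("kafka.example.internal", 9092)}
    print(diagnose(("kafka.example.internal", 9092), brokers, reachable))
```

In this simulation the first step succeeds and the second fails, which mirrors the kind of "it connects but then nothing works" symptom the title describes.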

Kafka

Kafka Java Python AWS

How to Leverage Advanced Analytics in the Healthcare Domain

Teradata

JUNE 24, 2020

Learn how Teradata Vantage's advanced analytics capabilities can analyze and predict useful diagnoses and insights in biomedicine and healthcare.

Healthcare

Accelerate Your Machine Learning With The StreamSQL Feature Store

Data Engineering Podcast

JUNE 15, 2020

Summary Machine learning is a process driven by iteration and experimentation, which requires fast and easy access to relevant features of the data being processed. In order to reduce friction in the process of developing and delivering models, there has been a recent trend toward building a dedicated feature store. In this episode Simba Khadder discusses his work at StreamSQL building a feature store to make creation, discovery, and monitoring of features fast and easy to manage.

Machine Learning

Machine Learning Google Cloud Kafka Data Engineer

What, why, when to use Apache Kafka, with an example

Start Data Engineering

JUNE 11, 2020

I have seen, heard, and been asked questions and comments like “What is Kafka and when should I use it?” and “I don’t understand why we have to use Kafka.” The objective of this post is to get you up to speed with what Apache Kafka is, when to use it, and the foundational concepts of Apache Kafka with a simple example. What is Apache Kafka? First let’s understand what Apache Kafka is.

Kafka

Kafka IT

The Ultimate Guide to Apache Airflow DAGS

With Airflow being the open-source standard for workflow orchestration, knowing how to write Airflow DAGs has become an essential skill for every data engineer. This eBook provides a comprehensive overview of DAG writing features with plenty of example code. You’ll learn how to: understand the building blocks of DAGs, combine them in complex pipelines, and schedule your DAG to run exactly when you want it to; write DAGs that adapt to your data at runtime and set up alerts and notifications; scale you

Data Engineer

How to Create Near Real-time Models With Just dbt + SQL

dbt Developer Hub

JUNE 30, 2020

Before I dive into how to create this, I have to say this. You probably don’t need this. I, along with my other Fishtown colleagues, have spent countless hours working with clients that ask for near-real-time streaming data. However, when we start digging into the project, it is often realized that the use case is not there. There are a variety of reasons why near real-time streaming is not a good fit.

SQL

SQL Lambda Architecture Raw Data Architecture

PgBouncer on Kubernetes and how to achieve minimal latency

Zalando Engineering

JUNE 23, 2020

Introduction In the new Postgres Operator release 1.5 we have implemented a couple of interesting new features, including connection pooling support. Master Wq says there is "No greatest tool"; to run something successfully in production one needs to understand the pros and cons. Let's try to dig into the topic, and take a look at the performance aspect of connection pooler support, mostly from a scaling perspective.

PostgreSQL

PostgreSQL Bytes Database SQL

The Cost of Apache Kafka: An Engineer’s Guide to Pricing Out DIY Operations

Confluent

JUNE 19, 2020

When I have a small software project that I want to share with the world, I don’t write my own version control system with a web UI. I don’t even […].

Kafka

Kafka Engineering Project Systems

Rising from the Ashes

Teradata

JUNE 9, 2020

Teradata's own Sir Freek Cox on dedicating one's life to charity and good works. Read more.

Optimizing The Modern Developer Experience with Coder

Many software teams have migrated their testing and production workloads to the cloud, yet development environments often remain tied to outdated local setups, limiting efficiency and growth. This is where Coder comes in. In our 101 Coder webinar, you’ll explore how cloud-based development environments can unlock new levels of productivity. Discover how to transition from local setups to a secure, cloud-powered ecosystem with ease.

Cloud

Data Management Trends From An Investor Perspective

Data Engineering Podcast

JUNE 8, 2020

Summary The landscape of data management and processing is rapidly changing and evolving. There are certain foundational elements that have remained steady, but as the industry matures new trends emerge and gain prominence. In this episode Astasia Myers of Redpoint Ventures shares her perspective as an investor on which categories she is paying particular attention to for the near to medium term.

Data Management

Data Management Management Machine Learning Portfolio

A proven approach to land a Data Engineering job

Start Data Engineering

JUNE 2, 2020

I have seen and been asked the following questions by students, backend engineers and analysts who want to get into the data engineering industry. What approach should I take to land a Data Engineering job? I really want to get into DE. What can I do to learn more about it? In this article, I will try to provide a general approach that you as a beginner, student, backend engineer or analyst can use to land your first data engineering job.

Data Engineering

Data Engineering Data Engineer Engineering Data

Comparing Akka Streams, Kafka Streams and Spark Streaming

Rock the JVM

JUNE 16, 2020

Explore how Akka Streams, Kafka Streams, and Spark Streaming stack up and find out which one is best for your use case

Kafka

Learnings from Distributed XGBoost on Amazon SageMaker

Zalando Engineering

JUNE 21, 2020

Overview XGBoost is a popular Python library for gradient boosted decision trees. The implementation allows practitioners to distribute training across multiple compute instances (or workers), which is especially useful for large training sets. One tool used at Zalando for deploying production machine learning models is the managed service from Amazon called SageMaker.

Algorithm

Algorithm Machine Learning Python Project

15 Modern Use Cases for Enterprise Business Intelligence

Large enterprises face unique challenges in optimizing their Business Intelligence (BI) output due to the sheer scale and complexity of their operations. Unlike smaller organizations, where basic BI features and simple dashboards might suffice, enterprises must manage vast amounts of data from diverse sources. What are the top modern BI use cases for enterprise businesses to help you get a leg up on the competition?

Business Intelligence

How Merging Companies Will Give Rise to Unified Data Streams

Confluent

JUNE 23, 2020

Company mergers are becoming more common as businesses strive to improve performance and grow market share by saving costs and eliminating competition through acquisitions. But how do business mergers relate […].

Data

Data AWS Kafka Cloud

Announcing Vantage Trial

Teradata

JUNE 29, 2020

Vantage Trial provides free, 30-day access to Teradata Vantage in the cloud along with easy-to-use, web-based tools and applications for performing advanced analytics. Learn more.

Cloud

Cloud Accessible Accessibility

Building A Data Lake For The Database Administrator At Upsolver

Data Engineering Podcast

JUNE 1, 2020

Summary Data lakes offer a great deal of flexibility and the potential for reduced cost for your analytics, but they also introduce a great deal of complexity. What used to be entirely managed by the database engine is now a composition of multiple systems that need to be properly configured to work in concert. In order to bring the DBA into the new era of data management the team at Upsolver added a SQL interface to their data lake platform.

Data Lake

Data Lake Database Building Lambda Architecture

JOINs and Aggregations Using Real-Time Indexing on MongoDB Atlas

Rockset

JUNE 16, 2020

MongoDB.live took place last week, and Rockset had the opportunity to participate alongside members of the MongoDB community and share about our work to make MongoDB data accessible via real-time external indexing. In our session, we discussed the need for modern data-driven applications to perform real-time aggregations and joins, and how Rockset uses MongoDB change streams and Converged Indexing to deliver fast queries on data from MongoDB.

MongoDB

MongoDB Data Lake PostgreSQL Kafka

Apache Airflow® Best Practices: DAG Writing

Speaker: Tamara Fingerlin, Developer Advocate

In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!

Data
