Top Data Engineering Digest PostgreSQL MySQL Content for July, 2019

July, 2019

Top 10 Best Podcasts on AI, Analytics, Data Science, Machine Learning

KDnuggets

JULY 29, 2019

Check out our latest Top 10 Most Popular Data Science and Machine Learning podcasts available on iTunes. Stay up to date in the field with these recent episodes and join in with the current data conversations.

Machine Learning

Machine Learning Data Science Data

Fault Tolerance in Distributed Systems: Tracing with Apache Kafka and Jaeger

Confluent

JULY 24, 2019

Using Jaeger tracing, I’ve been able to answer an important question that nearly every Apache Kafka® project that I’ve worked on posed: how is data flowing through my distributed system? Quick disclaimer: if you’re simply looking for an answer to that question, this post won’t provide that answer directly. Instead, in this post I will point you to an earlier blog post where I already answered that question and then I will focus on what should be your next question: now that I’m relying on Jaege

Kafka

Kafka Systems Bytes Project

Trending Sources

Our Commitment to Open Source Software

Cloudera

JULY 10, 2019

Open source has been core to the missions of both Hortonworks and Cloudera and central to our values and culture. With more than 700 engineers in the new Cloudera, our company writes a prodigious amount of open source code each year that’s contributed to more than 30 different open source projects. We’re also a very innovative open source company, having collectively launched more than a dozen new open source projects since the founding of the two companies.

Consulting

Consulting Kafka Project Data Science

The Power of Integrated Data and Analytics

Teradata

JULY 9, 2019

Integrated data and analytics has a proven track record of helping organize operations, enhance customer experience and improve revenue and market growth.

Data

Mastering Apache Airflow® 3.0: What’s New (and What’s Next) for Data Orchestration

Speaker: Tamara Fingerlin, Developer Advocate

Apache Airflow® 3.0, the most anticipated Airflow release yet, officially launched this April. As the de facto standard for data orchestration, Airflow is trusted by over 77,000 organizations to power everything from advanced analytics to production AI and MLOps. With the 3.0 release, the top-requested features from the community were delivered, including a revamped UI for easier navigation, stronger security, and greater flexibility to run tasks anywhere at any time.

Data

Simplifying Data Integration Through Eventual Connectivity

Data Engineering Podcast

JULY 28, 2019

Summary: The ETL pattern that has become commonplace for integrating data from multiple sources has proven useful, but complex to maintain. For a small number of sources it is a tractable problem, but as the overall complexity of the data ecosystem continues to expand it may be time to identify new ways to tame the deluge of information. In this episode Tim Ward, CEO of CluedIn, explains the idea of eventual connectivity as a new paradigm for data integration.

Data Integration

Data Integration Metadata Architecture Media

Evolution of Netflix Conductor:

Netflix Tech

JULY 30, 2019

v2.0 and beyond. By Anoop Panicker and Kishore Banala. Conductor is a workflow orchestration engine developed and open-sourced by Netflix. If you’re new to Conductor, this earlier blogpost (Netflix Conductor: A microservices orchestrator) and the documentation should help you get started and acclimatized to Conductor. In the last two years since inception, Conductor has seen wide adoption and is instrumental in running numerous core workflows at Netflix.

Metadata

Metadata Media AWS Transportation

Top 13 Skills To Become a Rockstar Data Scientist

KDnuggets

JULY 26, 2019

Education, coding, SQL, big data platforms, storytelling and more. These are the 13 skills you need to master to become a rockstar data scientist.

Education

Education Big Data SQL Data

More Trending

Getting started with the MongoDB Connector for Apache Kafka and MongoDB

Confluent

JULY 17, 2019

Together, MongoDB and Apache Kafka® make up the heart of many modern data architectures today. Integrating Kafka with external systems like MongoDB is best done through the use of Kafka Connect. This API enables users to leverage ready-to-use components that can stream data from external systems into Kafka topics, as well as stream data from Kafka topics into external systems.

MongoDB

MongoDB Kafka Database Medical

Solving the Pain Points of Big Data Management

Cloudera

JULY 9, 2019

Every business aims to deliver products and services quickly and efficiently based upon customer wants and needs. Today, much of that speed and efficiency relies on insights driven by big data. Yet big data management often serves as a stumbling block, because many businesses continue to struggle with how to best capture and analyze their data. Unorganized data presents another roadblock.

Big Data

Big Data Data Management Management Cloud

How Analytics Answer the Most Challenging Business Questions

Teradata

JULY 14, 2019

Analytics can help enterprises answer the toughest business questions by leveraging all of the data across an organization.

Data

Straining Your Data Lake Through A Data Mesh

Data Engineering Podcast

JULY 22, 2019

Summary: The current trend in data management is to centralize the responsibilities of storing and curating the organization’s information to a data engineering team. This organizational pattern is reinforced by the architectural pattern of data lakes as a solution for managing storage and access. In this episode Zhamak Dehghani shares an alternative approach in the form of a data mesh.

Data Lake

Data Lake Hadoop Data Architecture

Agent Tooling: Connecting AI to Your Tools, Systems & Data

Speaker: Alex Salazar, CEO & Co-Founder @ Arcade | Nate Barbettini, Founding Engineer @ Arcade | Tony Karrer, Founder & CTO @ Aggregage

There’s a lot of noise surrounding the ability of AI agents to connect to your tools, systems and data. But building an AI application into a reliable, secure workflow agent isn’t as simple as plugging in an API. As an engineering leader, it can be challenging to make sense of this evolving landscape, but agent tooling provides such high value that it’s critical we figure out how to move forward.

Systems

Re-Architecting the Video Gatekeeper

Netflix Tech

JULY 12, 2019

By Drew Koszewnik. This is the story about how the Content Setup Engineering team used Hollow, a Netflix OSS technology, to re-architect and simplify an essential component in our content pipeline, delivering a large amount of business value in the process. The Context: Each movie and show on the Netflix service is carefully curated to ensure an optimal viewing experience.

Datasets

Datasets Kafka Architecture Computer Science

Top Certificates and Certifications in Analytics, Data Science, Machine Learning and AI

KDnuggets

JULY 25, 2019

Here are the top certificates and certifications in Analytics, AI, Data Science, Machine Learning and related areas.

Certification

Certification Machine Learning Data Science Data

KSQL in Football: FIFA Women’s World Cup Data Analysis

Confluent

JULY 3, 2019

One of the football (as per European terminology) highlights of the summer is the FIFA Women’s World Cup. France, Brazil, and the USA are the favourites, and this year Italy is present at the event for the first time in 20 years. From a data perspective, the World Cup represents an interesting source of information. There’s a lot of dedicated press coverage, as well as the standard social media excitement following any kind of big event.

Data Analysis

Data Analysis Kafka Datasets Java

Educating Data Analysts at Scale: Cloudera Launches Modern Big Data Analysis with SQL on Coursera

Cloudera

JULY 15, 2019

At a time when machine learning, deep learning, and artificial intelligence capture an outsize share of media attention, jobs requiring SQL skills continue to vastly outnumber jobs requiring those more advanced skills. Influential data scientists often point to SQL as the most important yet underrated skill for anyone who works with data. SQL is today—and will remain for the foreseeable future—a vital foundational skill for a wide range of data professionals working in different roles across dif

Education

Education Big Data Data Analysis SQL

How to Modernize Manufacturing Without Losing Control

Speaker: Andrew Skoog, Founder of MachinistX & President of Hexis Representatives

Manufacturing is evolving, and the right technology can empower—not replace—your workforce. Smart automation and AI-driven software are revolutionizing decision-making, optimizing processes, and improving efficiency. But how do you implement these tools with confidence and ensure they complement human expertise rather than override it? Join industry expert Andrew Skoog as he explores how manufacturers can leverage automation to enhance operations, streamline workflows, and make smarter, data-dri

Manufacturing

How to Enjoy Hybrid Partitioning with Teradata Columnar

Teradata

JULY 16, 2019

Teradata Vantage's NewSQL Engine's performance-enhancing options include column-row hybrid partitioning. Find out how to take advantage of this great feature.

Engineering

Data Labeling That You Can Feel Good About With CloudFactory

Data Engineering Podcast

JULY 14, 2019

Summary: Successful machine learning and artificial intelligence projects require large volumes of data that is properly labelled. The challenge is that most data is not clean and well annotated, requiring a scalable data labeling process. Ideally this process can be done using the tools and systems that already power your analytics, rather than sending data into a black box.

Machine Learning

Machine Learning Media Cloud Data

Bringing Rich Experiences to Memory-constrained TV Devices

Netflix Tech

JULY 1, 2019

By Jason Munning, Archana Kumar, Kris Range. Netflix has over 148M paid members streaming on more than half a billion devices spanning over 1,900 different types. In the TV space alone, there are hundreds of device types that run the Netflix app. We need to support the same rich Netflix experience on not only high-end devices like the PS4 but also memory and processor-constrained consumer electronic devices that run a similar chipset as w

Designing

Designing Bytes Electronics Project

Convolutional Neural Networks: A Python Tutorial Using TensorFlow and Keras

KDnuggets

JULY 26, 2019

Different neural network architectures excel in different tasks. This particular article focuses on crafting convolutional neural networks in Python using TensorFlow and Keras.

Python

Python Architecture

The Ultimate Guide to Apache Airflow DAGS

With Airflow being the open-source standard for workflow orchestration, knowing how to write Airflow DAGs has become an essential skill for every data engineer. This eBook provides a comprehensive overview of DAG writing features with plenty of example code. You’ll learn how to: understand the building blocks of DAGs, combine them in complex pipelines, and schedule your DAG to run exactly when you want it to; write DAGs that adapt to your data at runtime and set up alerts and notifications; scale you

Data Engineer

Bust the Burglars – Machine Learning with TensorFlow and Apache Kafka

Confluent

JULY 16, 2019

Have you ever realized that, according to the latest FBI report, more than 80% of all crimes are property crimes, such as burglaries? And that the FBI clearance figures indicate that only 13% of all burglaries in 2017 were cleared due to lack of witnesses and/or physical evidence? How cool would it be to build your own burglar alarm system that can alert you before the actual event takes place simply by using a few network-connected cameras and analyzing the camera images with Apache Kafka®,

Machine Learning

Machine Learning Kafka Java Datasets

Crafting the Perfect Internship Playlist

Pandora Engineering

JULY 29, 2019

Credit: Kanok Sulaiman. Disclaimer: These are my experiences from being a Pandora software developer intern in the summer of 2019. All opinions expressed are my own, and represent no one except myself. I recently spent the last summer of my undergraduate program as an intern for Pandora Media in Oakland, CA. I gained a lot from my experience, and I’m writing this post to detail the application process, the lessons that I learned, and the company culture.

Java

Java Recruitment Algorithm Computer Science

Data Science for All: How to Bridge the Data Scientist Gap

Teradata

JULY 21, 2019

Democratizing data science through access to tools like Teradata Vantage is helping businesses bridge the data scientist gap to get the outcomes they need.

Data Science

Data Science Data Accessible Accessibility

Scale Your Analytics On The Clickhouse Data Warehouse

Data Engineering Podcast

JULY 8, 2019

Summary: The market for data warehouse platforms is large and varied, with options for every use case. ClickHouse is an open source, column-oriented database engine built for interactive analytics with linear scalability. In this episode Robert Hodges and Alexander Zaitsev explain how it is architected to provide these features, the various unique capabilities that it provides, and how to run it in production.

Data Warehouse

Data Warehouse MySQL Hadoop Data Lake

Optimizing The Modern Developer Experience with Coder

Many software teams have migrated their testing and production workloads to the cloud, yet development environments often remain tied to outdated local setups, limiting efficiency and growth. This is where Coder comes in. In our 101 Coder webinar, you’ll explore how cloud-based development environments can unlock new levels of productivity. Discover how to transition from local setups to a secure, cloud-powered ecosystem with ease.

Cloud

Introduction to Streaming Data

Cloud Academy

JULY 16, 2019

Designing a streaming data pipeline presents many challenges, particularly around specific technology requirements. When designing a cloud-based solution, an architect is no longer faced with the question, “How do I get this job done with the technology we have?” but rather, “What is the right technology to support my use case?” In this blog post, we will walk through some initial scoping steps and walk through an example.

Manufacturing

Manufacturing MySQL Data Cloud

Fantastic Four of Data Science Project Preparation

KDnuggets

JULY 26, 2019

This article takes a closer look at the four fantastic things we should keep in mind when approaching every new data science project.

Data Science

Data Science Project Data Data Preparation

Kafka Listeners – Explained

Confluent

JULY 1, 2019

This question comes up on Stack Overflow and such places a lot, so here’s something to try and help. tl;dr: You need to set advertised.listeners (or KAFKA_ADVERTISED_LISTENERS if you’re using Docker images) to the external address (host/IP) so that clients can correctly connect to it. Otherwise, they’ll try to connect to the internal host address—and if that’s not reachable, then problems ensue.
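
The reason advertised.listeners matters can be sketched as a small simulation. A Kafka client first contacts a bootstrap address, receives metadata containing the address the broker advertises, and then connects to that advertised address. The sketch below is not Kafka client code: the `Broker` class, the `connect` function, and the host names are all hypothetical, chosen only to illustrate the two-step behaviour under those assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Broker:
    """Hypothetical stand-in for a broker's listener settings."""
    bootstrap_address: str    # address the client is initially given
    advertised_listener: str  # address the broker returns in its metadata


def connect(broker: Broker, reachable: set[str]) -> str:
    """Simulate the two-step connection a Kafka client performs.

    Step 1: contact the bootstrap address to fetch cluster metadata.
    Step 2: reconnect to the address advertised in that metadata.
    """
    if broker.bootstrap_address not in reachable:
        return "bootstrap failed"
    # The metadata carries the advertised listener, which may differ
    # from the address the client originally dialled.
    if broker.advertised_listener not in reachable:
        return "metadata ok, but advertised address unreachable"
    return "connected via " + broker.advertised_listener


# Addresses this (hypothetical) external client can reach.
client_can_reach = {"broker.example.com:9092"}

# Broker advertising an internal-only host name (assumed misconfiguration).
misconfigured = Broker("broker.example.com:9092", "kafka-internal:9092")
# Broker advertising the externally reachable address.
fixed = Broker("broker.example.com:9092", "broker.example.com:9092")

print(connect(misconfigured, client_can_reach))
print(connect(fixed, client_can_reach))
```

In the misconfigured case the initial contact succeeds but the follow-up connection fails, which matches the symptom described above: the client is handed an internal address it cannot reach.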

Kafka

Kafka Metadata AWS Bytes

Open Source: June Updates - New releases, continue to foster diversity and inclusion in tech

Zalando Engineering

JULY 14, 2019

Project Highlights: Kopf (Kubernetes Operator Pythonic Framework) now supports built-in resources and can be used to write controllers of any kind (pods, namespaces, mixed), not only of custom resources. Check out the latest release for more details. Skipper publishes new releases weekly. Some important features were implemented, such as support for proxying the Kubernetes API server and support for Kubernetes externalName services from ingress.

AWS

AWS SQL Python Project

15 Modern Use Cases for Enterprise Business Intelligence

Large enterprises face unique challenges in optimizing their Business Intelligence (BI) output due to the sheer scale and complexity of their operations. Unlike smaller organizations, where basic BI features and simple dashboards might suffice, enterprises must manage vast amounts of data from diverse sources. What are the top modern BI use cases for enterprise businesses to help you get a leg up on the competition?

Business Intelligence

Enterprise Data Strategy: The Upside of Scarce Funding

Teradata

JULY 28, 2019

In a cost-cutting culture, directly linking data projects to top business initiatives is a good way to keep them from getting clipped. Learn more.

Data

Data Project

Stress Testing Kafka And Cassandra For Real-Time Anomaly Detection

Data Engineering Podcast

JULY 1, 2019

Summary: Anomaly detection is a capability that is useful in a variety of problem domains, including finance, internet of things, and systems monitoring. Scaling the volume of events that can be processed in real-time can be challenging, so Paul Brebner from Instaclustr set out to see how far he could push Kafka and Cassandra for this use case. In this interview he explains the system design that he tested, his findings for how these tools were able to work together, and how they behaved at diffe

Kafka

Kafka Finance Media Architecture

What is Data Extraction and How It Can Serve Your Business

InData Labs

JULY 11, 2019

In the highly competitive business world of today, data reign supreme. Customer personal data, comprehensive operating statistics, sales figures, or inter-company information may play a core role in strategic decision making. It’s vital to keep an eye on the quantity and quality of data that can be captured and extracted from different web sources.

IT Data Machine Learning Data Engineering

This New Google Technique Help Us Understand How Neural Networks are Thinking

KDnuggets

JULY 24, 2019

Recently, researchers from the Google Brain team published a paper proposing a new method called Concept Activation Vectors (CAVs) that takes a new angle to the interpretability of deep learning models.

Deep Learning

Apache Airflow® Best Practices: DAG Writing

Speaker: Tamara Fingerlin, Developer Advocate

In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!

Data
