Top Data Engineering Digest MySQL PostgreSQL Content for Week of Dec 14

Sat. Dec 14, 2019 - Fri. Dec 20, 2019

Interpretability part 3: LIME and SHAP

KDnuggets

DECEMBER 19, 2019

The third part in a series on leveraging techniques to take a look inside the black box of AI, this guide considers methods that try to explain each prediction instead of establishing a global explanation.

Uber Infrastructure in 2019: Improving Reliability, Driving Customer Satisfaction

Uber Engineering

DECEMBER 19, 2019

Every day around the world, millions of trips take place across the Uber network, giving users more reliable transportation through ridesharing, bikes, and scooters, drivers and truckers additional opportunities to earn, employees and employers more convenient business travel, and hungry … The post Uber Infrastructure in 2019: Improving Reliability, Driving Customer Satisfaction appeared first on Uber Engineering Blog.

Transportation

Transportation Engineering Architecture

Trending Sources

Apache Kafka Producer Improvements with the Sticky Partitioner

Confluent

DECEMBER 18, 2019

The amount of time it takes for a message to move through a system plays a big role in the performance of distributed systems like Apache Kafka®. In Kafka, the […].

Kafka

Kafka Systems IT

Solving Data Lineage Tracking And Data Discovery At WeWork

Data Engineering Podcast

DECEMBER 16, 2019

Summary: Building clean datasets with reliable and reproducible ingestion pipelines is completely useless if it’s not possible to find them and understand their provenance. The solution to discoverability and tracking of data lineage is to incorporate a metadata repository into your data platform. The metadata repository serves as a data catalog and a means of reporting on the health and status of your datasets when it is properly integrated into the rest of your tools.

Metadata

Metadata PostgreSQL Datasets Data Warehouse

Mastering Apache Airflow® 3.0: What’s New (and What’s Next) for Data Orchestration

Speaker: Tamara Fingerlin, Developer Advocate

Apache Airflow® 3.0, the most anticipated Airflow release yet, officially launched this April. As the de facto standard for data orchestration, Airflow is trusted by over 77,000 organizations to power everything from advanced analytics to production AI and MLOps. With the 3.0 release, the top-requested features from the community were delivered, including a revamped UI for easier navigation, stronger security, and greater flexibility to run tasks anywhere at any time.

Data

The 4 fastest ways not to get hired as a data scientist

KDnuggets

DECEMBER 18, 2019

Ready to try to get hired as a data scientist for the first time? Avoiding these common mistakes won’t guarantee an offer, but failing to avoid them is a surefire way to get your application tossed into the trash bin.

Data

Uber’s Data Platform in 2019: Transforming Information to Intelligence

Uber Engineering

DECEMBER 17, 2019

Uber’s busy 2019 included our billionth delivery of an Uber Eats order, 24 million miles covered by bike and scooter riders on our platform, and trips to top destinations such as the Empire State Building, the Eiffel Tower, and the … The post Uber’s Data Platform in 2019: Transforming Information to Intelligence appeared first on Uber Engineering Blog.

Data

Data Engineering Building Big Data

The Easiest Way to Install Apache Kafka and Confluent Platform – Using Ansible

Confluent

DECEMBER 19, 2019

With Confluent Platform 5.3, we are actively embracing the rising DevOps movement by introducing CP-Ansible, our very own open source Ansible playbooks for deployment of Apache Kafka® and the Confluent […].

Kafka

More Trending

DBLog: A Generic Change-Data-Capture Framework

Netflix Tech

DECEMBER 17, 2019

By Andreas Andreakis, Ioannis Papapanagiotou. Overview: Change-Data-Capture (CDC) allows capturing committed changes from a database in real-time and propagating those changes to downstream consumers [1][2]. CDC is becoming increasingly popular for use cases that require keeping multiple heterogeneous datastores in sync (like MySQL and ElasticSearch) and addresses challenges that exist with traditional techniques like dual-writes and distributed transactions [3][4].

MySQL

MySQL PostgreSQL Database Transportation
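
The CDC idea summarized above (committed changes captured in order and replayed against a downstream store) can be sketched in a few lines of Python. This is an illustration only and not DBLog's design or API: the `ChangeEvent` type, the `apply_events` function, and the in-memory `search_index` dictionary are hypothetical names standing in for a real change log and a real downstream datastore.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class ChangeEvent:
    """One committed row change (hypothetical event shape for illustration)."""
    op: str                               # "insert", "update", or "delete"
    key: int                              # primary key of the affected row
    row: Optional[Dict[str, Any]] = None  # new row image; None for deletes


def apply_events(events: List[ChangeEvent],
                 replica: Dict[int, Dict[str, Any]]) -> None:
    """Replay events in commit order so the replica converges on the source."""
    for event in events:
        if event.op in ("insert", "update"):
            # Upsert the latest row image under its primary key.
            replica[event.key] = dict(event.row or {})
        elif event.op == "delete":
            # Deleting a missing key is a no-op, so replays stay idempotent.
            replica.pop(event.key, None)
        else:
            raise ValueError(f"unknown operation: {event.op}")


# A made-up change log, in the order the source database committed it.
log = [
    ChangeEvent("insert", 1, {"title": "first"}),
    ChangeEvent("insert", 2, {"title": "second"}),
    ChangeEvent("update", 1, {"title": "first, revised"}),
    ChangeEvent("delete", 2),
]

search_index: Dict[int, Dict[str, Any]] = {}  # stands in for the downstream store
apply_events(log, search_index)
print(search_index)
```

Because the downstream store only ever sees changes that were already committed at the source, it does not need a second write path of its own, which is the contrast with dual-writes that the summary draws.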

Automatic Text Summarization in a Nutshell

KDnuggets

DECEMBER 18, 2019

Marketing scientist Kevin Gray asks Dr. Anna Farzindar of the University of Southern California about Automatic Text Summarization and the various ways it is used.

How Dataquest Made the Difference for Stacey’s Data Job

Dataquest

DECEMBER 18, 2019

Today, Stacey Ustian is a data engineer. But the path that led her here wasn’t always easy, and there were a few bumps and twists along the way. Her journey to data science started in a rather unusual place: the law library. After earning her Master’s degree in Library and Information Science, Stacey had taken a job working in the library of a law firm.

SQL

SQL Python Data Engineering Data Engineer

What’s New in Apache Kafka 2.4

Confluent

DECEMBER 16, 2019

On behalf of the Apache Kafka® community, it is my pleasure to announce the release of Apache Kafka 2.4.0. This release includes a number of key new features and improvements […].

Kafka

Kafka Scala IT

Agent Tooling: Connecting AI to Your Tools, Systems & Data

Speaker: Alex Salazar, CEO & Co-Founder @ Arcade | Nate Barbettini, Founding Engineer @ Arcade | Tony Karrer, Founder & CTO @ Aggregage

There’s a lot of noise surrounding the ability of AI agents to connect to your tools, systems and data. But building an AI application into a reliable, secure workflow agent isn’t as simple as plugging in an API. As an engineering leader, it can be challenging to make sense of this evolving landscape, but agent tooling provides such high value that it’s critical we figure out how to move forward.

Systems

Alternative Cloud Hosted Data Science Environments

KDnuggets

DECEMBER 19, 2019

Over the years, new alternative providers have emerged that offer a single cloud-hosted data science environment where data scientists can analyze, host, and share their work.

Data Science

Data Science Cloud Data Cloud Computing

Keeping a Lid on Concurrency within the Vantage Platform

Teradata

DECEMBER 18, 2019

Carrie Ballinger discusses the techniques for managing concurrency inside the Advanced SQL Engine and the benefits provided. Read more.

SQL

SQL Engineering Management

Testing Kafka Streams Using TestInputTopic and TestOutputTopic

Confluent

DECEMBER 17, 2019

As a test class that allows you to test Kafka Streams logic, TopologyTestDriver is a lot faster than utilizing EmbeddedSingleNodeKafkaCluster and makes it possible to simulate different timing scenarios. Not […].

Kafka

Kafka Utilities IT Process

How to Modernize Manufacturing Without Losing Control

Speaker: Andrew Skoog, Founder of MachinistX & President of Hexis Representatives

Manufacturing is evolving, and the right technology can empower—not replace—your workforce. Smart automation and AI-driven software are revolutionizing decision-making, optimizing processes, and improving efficiency. But how do you implement these tools with confidence and ensure they complement human expertise rather than override it? Join industry expert Andrew Skoog as he explores how manufacturers can leverage automation to enhance operations, streamline workflows, and make smarter, data-dri

Manufacturing

Industry AI, Analytics, Machine Learning, Data Science Predictions for 2020

KDnuggets

DECEMBER 16, 2019

Predictions for 2020 from a dozen innovative companies in AI, Analytics, Machine Learning, Data Science, and Data industry.

Machine Learning

Machine Learning Data Science Data

6 Practices to Realize a Long-Term Data Vision Through Near-Term Work

Teradata

DECEMBER 16, 2019

Enterprises often have either no data strategy at all or an over-complicated one that underdelivers. Find out how to create an effective data strategy by striking a balance.

Data

Superset Announces Elasticsearch Support!

Preset

DECEMBER 15, 2019

Announcing Elasticsearch in Superset, powered by a new open-source Python library from Preset

Python

The Ultimate Guide to Apache Airflow DAGS

With Airflow being the open-source standard for workflow orchestration, knowing how to write Airflow DAGs has become an essential skill for every data engineer. This eBook provides a comprehensive overview of DAG writing features with plenty of example code. You’ll learn how to: Understand the building blocks DAGs, combine them in complex pipelines, and schedule your DAG to run exactly when you want it to Write DAGs that adapt to your data at runtime and set up alerts and notifications Scale you

Data Engineer

Google’s New Explainable AI Service

KDnuggets

DECEMBER 20, 2019

Google has started offering a new service for “explainable AI” or XAI, as it is fashionably called. Presently offered tools are modest, but the intent is in the right direction.

Let’s Build an Intelligent Chatbot

KDnuggets

DECEMBER 17, 2019

Check out this step by step approach to building an intelligent chatbot in Python.

Building

Building Python

The Most In Demand Tech Skills for Data Scientists

KDnuggets

DECEMBER 20, 2019

By the end of this article you’ll know which technologies are becoming more popular with employers and which are becoming less popular.

Technology

Technology Data Data Science

The Ultimate Guide to Model Retraining

KDnuggets

DECEMBER 16, 2019

Once you have deployed your machine learning model into production, differences in real-world data will result in model drift. So, retraining and redeploying will likely be required. In other words, deployment should be treated as a continuous process. This guide defines model drift and how to identify it, and includes approaches to enable model training.

Machine Learning

Machine Learning Process IT Data

Apache Airflow® Best Practices: DAG Writing

Speaker: Tamara Fingerlin, Developer Advocate

In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!

Data

How to Convert an RGB Image to Grayscale

KDnuggets

DECEMBER 18, 2019

This post is about working with a mixture of color and grayscale images and needing to transform them into a uniform format - all grayscale. We'll be working in Python using the Pillow, Numpy, and Matplotlib packages.

Python

Python Process
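
As a rough illustration of the RGB-to-grayscale conversion described above, the sketch below uses only the Python standard library rather than Pillow, NumPy, or Matplotlib, so it is not the linked post's code. It assumes the ITU-R BT.601 luma weights (0.299, 0.587, 0.114), which are one common choice; the function names `to_gray` and `image_to_gray` are hypothetical.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (red, green, blue), each in 0..255


def to_gray(pixel: Pixel) -> int:
    """Weighted sum of the channels using the BT.601 luma weights (assumed)."""
    r, g, b = pixel
    return round(0.299 * r + 0.587 * g + 0.114 * b)


def image_to_gray(image: List[List[Pixel]]) -> List[List[int]]:
    """Convert a row-major grid of RGB pixels to a grid of gray levels."""
    return [[to_gray(pixel) for pixel in row] for row in image]


# A 2x2 test image: red, green / blue, white.
image = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (255, 255, 255)],
]
gray = image_to_gray(image)
print(gray)  # [[76, 150], [29, 255]]
```

The weights sum to 1, so white stays at 255, and green contributes the most because the eye is most sensitive to it. A plain average of the three channels is a simpler alternative, but it tends to make greens look too dark.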

5 Ways to Apply Ethics to AI

KDnuggets

DECEMBER 19, 2019

Here are six more lessons based on real life examples that I think we should all remember as people working in machine learning, whether you’re a researcher, engineer, or a decision-maker.

Machine Learning

Machine Learning Engineering Algorithm

Pedestrian Detection Using Non Maximum Suppression Algorithm

KDnuggets

DECEMBER 17, 2019

Read this overview of a complete pipeline for detecting pedestrians on the road.

Algorithm

Algorithm Python
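
For readers unfamiliar with non-maximum suppression, here is a minimal sketch of the usual greedy variant in plain Python. It is not taken from the linked article: the `(x1, y1, x2, y2)` box format, the 0.5 overlap threshold, and the function names `iou` and `non_max_suppression` are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), with x1 < x2, y1 < y2


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes: Sequence[Box],
                        scores: Sequence[float],
                        iou_threshold: float = 0.5) -> List[int]:
    """Return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap a better box too much.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Two near-duplicate detections of one pedestrian plus a separate detection.
boxes = [(10, 10, 50, 90), (12, 12, 52, 92), (100, 20, 140, 100)]
scores = [0.9, 0.8, 0.7]
kept = non_max_suppression(boxes, scores)
print(kept)  # [0, 2]
```

The second box overlaps the first with an IoU of about 0.86, so it is dropped in favor of the higher-scoring one, while the third box does not overlap either and is kept. Lowering the threshold suppresses more aggressively, which can merge detections of people standing close together.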

Microsoft Introduces Icebreaker to Address the Famous Ice-Start Challenge in Machine Learning

KDnuggets

DECEMBER 16, 2019

The new technique allows the deployment of machine learning models that operate with minimum training data.

Machine Learning

Machine Learning Data Preparation Data

How to Achieve High-Accuracy Results When Using LLMs

Speaker: Ben Epstein, Stealth Founder & CTO | Tony Karrer, Founder & CTO, Aggregage

When tasked with building a fundamentally new product line with deeper insights than previously achievable for a high-value client, Ben Epstein and his team faced a significant challenge: how to harness LLMs to produce consistent, high-accuracy outputs at scale. In this new session, Ben will share how he and his team engineered a system (based on proven software engineering approaches) that employs reproducible test variations (via temperature 0 and fixed seeds), and enables non-LLM evaluation m

Software Engineering

How To “Ultralearn” Data Science: optimization learning, Part 3

KDnuggets

DECEMBER 20, 2019

This third part in a series about how to "ultralearn" data science will guide you through how to optimize your learning through five valuable techniques.

Data Science

Data Science Data

How To “Ultralearn” Data Science: removing distractions and finding focus, Part 2

KDnuggets

DECEMBER 17, 2019

This second part in a series about how to "ultralearn" data science will guide you through several techniques to remove those distractions -- because your focus needs more focus.

Data Science

Data Science Data Education

Ontotext Platform 3.0 for Enterprise Knowledge Graphs Released

KDnuggets

DECEMBER 18, 2019

Ontotext Platform 3.0 features significant technology improvements to enable simpler and faster graph navigation, including GraphQL interfaces to make it easier for application developers to access knowledge graphs without tedious development of back-end APIs or complex SPARQL.

Accessible

Accessible Accessibility Technology IT

Optimizing The Modern Developer Experience with Coder

Many software teams have migrated their testing and production workloads to the cloud, yet development environments often remain tied to outdated local setups, limiting efficiency and growth. This is where Coder comes in. In our 101 Coder webinar, you’ll explore how cloud-based development environments can unlock new levels of productivity. Discover how to transition from local setups to a secure, cloud-powered ecosystem with ease.

Cloud

Top 2019 Stories: Top 10 Technology Trends of 2019; How to select rows and columns in Pandas
