Top Data Engineering Digest Relational Database ETL Tools Content for Week of Aug 17

Sat.Aug 17, 2019 - Fri.Aug 23, 2019

Nothing but NumPy: Understanding & Creating Neural Networks with Computational Graphs from Scratch

KDnuggets

AUGUST 23, 2019

Entirely implemented with NumPy, this extensive tutorial provides a detailed review of neural networks followed by guided code for creating one from scratch with computational graphs.

Coding

Coding Python

Building the New Uber Freight App as Lists of Modular, Reusable Components

Uber Engineering

AUGUST 22, 2019

As Uber Freight marked its second anniversary, we went back to the drawing board to redesign its app. The original carrier app was successful for owner-operators with one or two drivers, but it wasn’t optimized for larger fleets, feedback we …

Building

Building Engineering IT Architecture

Trending Sources

Building Transactional Systems Using Apache Kafka

Confluent

AUGUST 20, 2019

Traditional relational database systems are ubiquitous in software systems. They are surrounded by a strong ecosystem of tools, such as object-relational mappers and schema migration helpers. Relational databases also provide strong guarantees in the form of ACID transactions, which are loved by developers for their all-or-nothing semantics. Today’s businesses, however, want to process ever-increasing amounts of data.

Kafka

Kafka Systems Building Relational Database

A High Performance Platform For The Full Big Data Lifecycle

Data Engineering Podcast

AUGUST 19, 2019

Summary: Managing big data projects at scale is a perennial problem, with a wide variety of solutions that have evolved over the past 20 years. One of the early entrants that predates Hadoop and has since been open sourced is the HPCC (High Performance Computing Cluster) system. Designed as a fully integrated platform to meet the needs of enterprise-grade analytics, it provides a solution for the full lifecycle of data at massive scale.

Big Data

Big Data Hadoop Data Lake Media

A Guide to Debugging Apache Airflow® DAGs

In Airflow, DAGs (your data pipelines) support nearly every use case. As these workflows grow in complexity and scale, efficiently identifying and resolving issues becomes a critical skill for every data engineer. This is a comprehensive guide with best practices and examples for debugging Airflow DAGs. You’ll learn how to: create a standardized process for debugging to quickly diagnose errors in your DAGs; identify common issues with DAGs, tasks, and connections; distinguish between Airflow-relate

Data Pipeline

Top Handy SQL Features for Data Scientists

KDnuggets

AUGUST 23, 2019

Whenever we hear "data," the first thing that comes to mind is SQL! SQL comes with easy and quick to learn features to organize and retrieve data, as well as perform actions on it in order to gain useful insights.

SQL

SQL Data IT Data Science

Applying Netflix DevOps Patterns to Windows

Netflix Tech

AUGUST 22, 2019

Baking Windows with Packer, by Justin Phelps and Manuel Correa. Customizing Windows images at Netflix was a manual, error-prone, and time-consuming process. In this blog post, we describe how we improved the methodology, which technologies we leveraged, and how this has improved service deployment and consistency. Artisan Crafted Images: In the Netflix full cycle DevOps culture the team responsible for building a service is also responsible for deploying, testing, infrastructure, and operation of t

AWS

AWS Java Coding Engineering

Teradata Earns Spot (Again x2!) on Constellation ShortList for Hybrid Cloud

Teradata

AUGUST 20, 2019

Teradata is named yet again to the Constellation ShortList™ for “Hybrid and Multi-Cloud Relational Database Management Systems.” Read more!

Cloud

Cloud Relational Database Database Systems

More Trending

A Guide to the Confluent Verified Integrations Program

Confluent

AUGUST 19, 2019

When it comes to writing a connector, there are two things you need to know how to do: how to write the code itself, and how to help the world know about your new connector. This post specifically outlines the process by which we verify partner integrations, and is a means of letting the world know about our partners’ contributions to our connector ecosystem.

Programming

Programming Kafka Database-centric MongoDB

Is Kaggle Learn a “Faster Data Science Education?”

KDnuggets

AUGUST 20, 2019

Kaggle Learn is "Faster Data Science Education," featuring micro-courses covering an array of data skills for immediate application. Courses may be made with newcomers in mind, but the platform and its content are proving useful as a review for more seasoned practitioners as well.

Education

Education Data Science Data IT

How We Reduced DynamoDB Costs by Using DynamoDB Streams and Scans More Efficiently

Rockset

AUGUST 23, 2019

Many of our users implement operational reporting and analytics on DynamoDB using Rockset as a SQL intelligence layer to serve live dashboards and applications. As an engineering team, we are constantly searching for opportunities to improve their SQL-on-DynamoDB experience. For the past few weeks, we have been hard at work tuning the performance of our DynamoDB ingestion process.

Bytes

Bytes NoSQL SQL AWS

Announcing Bottom Navigator

Pandora Engineering

AUGUST 19, 2019

An Android Multiple Backstack Bottom Navigation Library. Pandora’s latest mobile redesign brings the bottom navigation pattern to our apps. Bottom navigation has become a popular design choice for many apps due to its many advantages, including easy one-handed use and enhanced discoverability of top app destinations. When Pandora embarked on this project, our designers had a clear vision of how navigation should work, a vision that in many ways is familiar to users of other popular apps like Instag

Designing

Designing Algorithm Data Science Programming

Mastering Apache Airflow® 3.0: What’s New (and What’s Next) for Data Orchestration

Speaker: Tamara Fingerlin, Developer Advocate

Apache Airflow® 3.0, the most anticipated Airflow release yet, officially launched this April. As the de facto standard for data orchestration, Airflow is trusted by over 77,000 organizations to power everything from advanced analytics to production AI and MLOps. With the 3.0 release, the top-requested features from the community were delivered, including a revamped UI for easier navigation, stronger security, and greater flexibility to run tasks anywhere at any time.

Data

Data is Not the New Oil. Data is Water!

Teradata

AUGUST 18, 2019

If you work in data analytics or a related field, you’ve probably heard the mantra that data is the new oil. But data is not oil, it's water. Find out why.

Data

Data Data Analytics IT

Order Matters: Alibaba’s Transformer-based Recommender System

KDnuggets

AUGUST 23, 2019

Alibaba, the largest e-commerce platform in China, is a powerhouse not only when it comes to e-commerce, but also when it comes to recommender systems research. Their latest paper, Behaviour Sequence Transformer for E-commerce Recommendation in Alibaba, is yet another publication that pushes the state of the art in recommender systems.

Systems

Systems IT Engineering

The Kafka Connect Plugin for Rockset and How It Works

Rockset

AUGUST 21, 2019

Rockset continuously ingests data streams from Kafka, without the need for a fixed schema, and serves fast SQL queries on that data. We created the Kafka Connect Plugin for Rockset to export data from Kafka and send it to a collection of documents in Rockset. Users can then build real-time dashboards or data APIs on top of the data in Rockset. This blog covers how we implemented the plugin.

Kafka

Kafka IT Data Storage Relational Database

Agent Tooling: Connecting AI to Your Tools, Systems & Data

Speaker: Alex Salazar, CEO & Co-Founder @ Arcade | Nate Barbettini, Founding Engineer @ Arcade | Tony Karrer, Founder & CTO @ Aggregage

There’s a lot of noise surrounding the ability of AI agents to connect to your tools, systems and data. But building an AI application into a reliable, secure workflow agent isn’t as simple as plugging in an API. As an engineering leader, it can be challenging to make sense of this evolving landscape, but agent tooling provides such high value that it’s critical we figure out how to move forward.

Systems

Deep Learning for NLP: Creating a Chatbot with Keras!

KDnuggets

AUGUST 19, 2019

Learn how to use Keras to build a Recurrent Neural Network and create a Chatbot! Who doesn’t like a friendly-robotic personal assistant?

Deep Learning

Deep Learning Building Python

Detecting stationarity in time series data

KDnuggets

AUGUST 20, 2019

Explore how to determine if your time series data is generated by a stationary process and how to handle the necessary assumptions and potential interpretations of your result.

Data

Data Process

Understanding Decision Trees for Classification in Python

KDnuggets

AUGUST 21, 2019

This tutorial covers decision trees for classification, also known as classification trees, including the anatomy of classification trees, how classification trees make predictions, using scikit-learn to make classification trees, and hyperparameter tuning.

Python

An Overview of Python’s Datatable package

KDnuggets

AUGUST 20, 2019

Modern machine learning applications need to process a humongous amount of data and generate multiple features. Python’s datatable module was created to address this issue. It is a toolkit for performing big data (up to 100GB) operations on a single-node machine, at the maximum possible speed.

Big Data

Big Data Machine Learning Process Data

How to Modernize Manufacturing Without Losing Control

Speaker: Andrew Skoog, Founder of MachinistX & President of Hexis Representatives

Manufacturing is evolving, and the right technology can empower—not replace—your workforce. Smart automation and AI-driven software are revolutionizing decision-making, optimizing processes, and improving efficiency. But how do you implement these tools with confidence and ensure they complement human expertise rather than override it? Join industry expert Andrew Skoog as he explores how manufacturers can leverage automation to enhance operations, streamline workflows, and make smarter, data-dri

Manufacturing

Math for Programmers

KDnuggets

AUGUST 19, 2019

Math for Programmers teaches you the math you need to know for a career in programming, concentrating on what you need to know as a developer.

Programming

Gender Diversity in AI Research

KDnuggets

AUGUST 21, 2019

Through an analysis of 1.5M papers from arXiv, this study reviews the evolution of gender diversity across disciplines, countries, and institutions as well as the semantic differences between AI papers with and without female co-authors.

Proptech and the proper use of technology for house sales prediction

KDnuggets

AUGUST 22, 2019

Using the ATTOM dataset, we extracted data on sales transactions in the USA, loans, and estimated values of property. We developed an optimal prediction model from correlations in the time and status of ownership as well as the time of the year of sales fluctuations.

Technology

Technology Datasets Data

Automate Stacking In Python: How to Boost Your Performance While Saving Time

KDnuggets

AUGUST 21, 2019

Utilizing stacking (stacked generalizations) is a very hot topic when it comes to pushing your machine learning algorithm to new heights. For instance, most if not all winning Kaggle submissions nowadays make use of some form of stacking or a variation of it.

Python

Python Algorithm Machine Learning Utilities

The Ultimate Guide to Apache Airflow DAGS

With Airflow being the open-source standard for workflow orchestration, knowing how to write Airflow DAGs has become an essential skill for every data engineer. This eBook provides a comprehensive overview of DAG writing features with plenty of example code. You’ll learn how to: understand the building blocks of DAGs, combine them in complex pipelines, and schedule your DAG to run exactly when you want it to; write DAGs that adapt to your data at runtime and set up alerts and notifications; Scale you

Data Engineering

Crafting an Elevator Pitch for your Data Science Startup

KDnuggets

AUGUST 19, 2019

If you are launching a data science startup, these tips will give you a head start as you seek capital for seed funding or your next level of growth.

Data Science

Data Science Data

Manual Coding or Automated Data Integration – What’s the Best Way to Integrate Your Enterprise Data?

KDnuggets

AUGUST 19, 2019

What’s the best way to execute your data integration tasks: writing manual code or using an ETL tool? Find out the approach that best fits your organization’s needs and the factors that influence it.

Data Integration

Data Integration Coding ETL Tools Data

Which skills / knowledge areas do you currently have, and which do you want to add or improve?

KDnuggets

AUGUST 22, 2019

New KDnuggets survey looks to find out what skills our readers currently use, and which they are looking to add or improve. Take a few minutes to participate.

Machine Learning

Machine Learning Data Science Data

Comparing Decision Tree Algorithms: Random Forest vs. XGBoost

KDnuggets

AUGUST 21, 2019

Check out this tutorial walking you through a comparison of XGBoost and Random Forest. You'll learn how to create a decision tree, how to do tree bagging, and how to do tree boosting.

Algorithm

Algorithm Python

Apache Airflow® Best Practices: DAG Writing

Speaker: Tamara Fingerlin, Developer Advocate

In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!

Data

Artificial Intelligence Is Not Intelligence – Interview With Andy Cotgreave (Keynote Speaker at Crunch Conf)

KDnuggets

AUGUST 20, 2019

Crunch is coming to Budapest, Hungary on 16-18 Oct. Use code KDNuggets to save on Data Science, Data Engineering, or BI tracks. But first, read this interview with keynote speaker Andy Cotgreave.

BI Data Science Data Engineering Data Engineer

eBook: How to Enhance Privacy in Data Science

KDnuggets

AUGUST 22, 2019

Check out this eBook, How to Enhance Privacy in Data Science, to equip yourself with the tools to enhance privacy in data science, including transforming data in a manner that protects privacy, an overview of the challenges and opportunities of privacy-aware analytics, and more.

Data Science

Data Science Data

How LinkedIn, Uber, Lyft, Airbnb and Netflix are Solving Data Management and Discovery for Machine Learning Solutions

KDnuggets

AUGUST 22, 2019

As machine learning evolves, the need for tools and platforms that automate the lifecycle management of training and testing datasets is becoming increasingly important. Fast growing technology companies like Uber or LinkedIn have been forced to build their own in-house data lifecycle management solutions to power different groups of machine learning models.

Machine Learning

Machine Learning Management Data Management Datasets

Lincoln Clean Energy: Director, Asset Performance [Austin, TX]

KDnuggets

AUGUST 19, 2019

Seeking an Asset Performance Director, a role that requires an individual who possesses a strong technical skill set and the ability to communicate findings effectively throughout the organization.

How to Achieve High-Accuracy Results When Using LLMs

Speaker: Ben Epstein, Stealth Founder & CTO | Tony Karrer, Founder & CTO, Aggregage

When tasked with building a fundamentally new product line with deeper insights than previously achievable for a high-value client, Ben Epstein and his team faced a significant challenge: how to harness LLMs to produce consistent, high-accuracy outputs at scale. In this new session, Ben will share how he and his team engineered a system (based on proven software engineering approaches) that employs reproducible test variations (via temperature 0 and fixed seeds), and enables non-LLM evaluation m

Software Engineer
