Top Data Engineering Digest Aggregated Data Analytics Application Content for November, 2018

November, 2018

Open-Source Data Warehousing – Druid, Apache Airflow & Superset

Simon Späti

NOVEMBER 28, 2018

These days, everyone talks about open-source. However, it is still not common in the Data Warehouse (DWH) field. Why is this? In a recent blog post I researched OLAP technologies; for this post I chose some open-source technologies and used them together to build a full data architecture for a Data Warehouse system. I went with Apache Druid for data storage, Apache Superset for querying and Apache Airflow as a task orchestrator.

Data Warehouse

Data Warehouse Data Storage Data Architecture Architecture

Set Up Your Own Data-as-a-Service Platform On Dremio with Tomer Shiran - Episode 58

Data Engineering Podcast

NOVEMBER 25, 2018

Summary: When your data lives in multiple locations, belonging to at least as many applications, it is exceedingly difficult to ask complex questions of it. The default way to manage this situation is by crafting pipelines that will extract the data from source systems and load it into a data lake or data warehouse. In order to make this situation more manageable and allow everyone in the business to gain value from the data, the folks at Dremio built a self-service data platform.

Data Lake

Data Lake Data Warehouse Hadoop BI

Trending Sources

Observability at Scale: Building Uber’s Alerting Ecosystem

Uber Engineering

NOVEMBER 20, 2018

Uber’s software architecture consists of thousands of microservices that empower teams to iterate quickly and support our company’s global growth. These microservices support a variety of solutions, such as mobile applications, internal and infrastructure services, and products along with complex …

Building

Building Architecture Engineering

Netflix Information Security: Preventing Credential Compromise in AWS

Netflix Tech

NOVEMBER 28, 2018

By Will Bengtson. Previously we wrote about a method for detecting credential compromise in your AWS environment. The methodology focused on a continuous learning model and first use principle. This solution is still reactive in nature: we only detect credential compromise after it has already happened. Even with detection capabilities, there is a risk that exposed credentials can provide access to sensitive data and/or the ability to cause damage in our environment.

AWS

AWS Metadata Amazon Web Services Cloud

15 Modern Use Cases for Enterprise Business Intelligence

Large enterprises face unique challenges in optimizing their Business Intelligence (BI) output due to the sheer scale and complexity of their operations. Unlike smaller organizations, where basic BI features and simple dashboards might suffice, enterprises must manage vast amounts of data from diverse sources. What are the top modern BI use cases for enterprise businesses to help you get a leg up on the competition?

Business Intelligence

Collaboration Between Data Science and Data Engineering: True or False?

Domino Data Lab: Data Engineering

NOVEMBER 18, 2018

This blog post includes candid insights about addressing tension points that arise when people collaborate on developing and deploying models. Domino’s Head of Content sat down with Don Miner and Marshall Presser to discuss the state of collaboration between data science and data engineering. The blog post provides distilled insights, audio clips, excerpted quotes as well as the full audio and written transcript.

Data Science

Data Science Data Engineer Data Engineering Engineering

Five strategies for skills-based volunteering: Lessons learned from Cloudera Cares first-ever Global Day of Service

Cloudera

NOVEMBER 5, 2018

Corporate volunteering is on the rise. However, only half of companies encourage their employees to participate in skills-based volunteering – defined as employees applying their abilities and specialized talents to challenges facing their communities. As the Program Manager for Cloudera Cares, Cloudera’s employee giving and volunteering program at the Cloudera Foundation, I believe that we can have more impact if we offer employees opportunities for skills-based volunteering.

Food

Food Banking Finance Programming

OLAP, what’s coming next?

Simon Späti

NOVEMBER 23, 2018

Are you on the lookout for a replacement for the Microsoft Analysis Cubes? Are you looking for a big data OLAP system that scales ad libitum? Do you want to have your analytics updated even in real time? In this blog post, I want to show you possible solutions that are ready for the future and fit into an existing data architecture. What is OLAP? OLAP is an acronym for Online Analytical Processing.

Big Data

Big Data Data Architecture Architecture Systems

More Trending

Stateful, Distributed Stream Processing on Flink with Fabian Hueske - Episode 57

Data Engineering Podcast

NOVEMBER 18, 2018

Summary: Modern applications and data platforms aspire to process events and data in real time at scale and with low latency. Apache Flink is a true stream processing engine with an impressive set of capabilities for stateful computation at scale. In this episode Fabian Hueske, one of the original authors, explains how Flink is architected, how it is being used to power some of the world’s largest businesses, where it sits in the landscape of stream processing tools, and how you can start us

Process

Process Google Cloud Scala Kafka

Tag-based Navigation of a Fashion Catalog

Zalando Engineering

NOVEMBER 28, 2018

Exploring the Zalando Assortment by Browsing a Product Similarity Graph. Introduction: As Europe's leading online fashion and lifestyle platform, Zalando is continually developing new features to enable our customers to find the products they want. While the standard tools of Search, Categorization & Attribute Filtering are par-for-the-course for purchasing items online, with an ever-expanding fashion assortment and an increase in the data available to describe a product, this browsing experie

Algorithm

Algorithm Computer Science Python Big Data

Delivering Meaning with Previews on Web

Netflix Tech

NOVEMBER 12, 2018

By Corey Grunewald and Tony Casparro. As the Netflix catalog of films and series continues to grow, it becomes more challenging to present members with enough information to decide what to watch. How can a member tell if a movie is both a horror and a comedy? The synopsis and artwork help provide some context, but how can we leverage video previews (trailers) to help members find something great to watch?

Utilities

Utilities Coding Management Systems

Rockset's RocksDB-Cloud Library - Enabling the Next Generation of Cloud Native Databases

Rockset

NOVEMBER 7, 2018

Rockset and I began collaborating in 2016 due to my interest in their RocksDB-Cloud open-source key-value store. This post is primarily about the RocksDB-Cloud software, which Rockset open-sourced in 2016, rather than Rockset's newly launched cloud service. In it, I will explore how RocksDB-Cloud can be used to build an open-source cloud-friendly storage system.

Database

Database Cloud Cloud Storage MySQL

Prepare Now: 2025's Must-Know Trends For Product And Data Leaders

Speaker: Jay Allardyce, Deepak Vittal, and Terrence Sheflin

As we look ahead to 2025, business intelligence and data analytics are set to play pivotal roles in shaping success. Organizations are already starting to face a host of transformative trends as the year comes to a close, including the integration of AI in data analytics, an increased emphasis on real-time data insights, and the growing importance of user experience in BI solutions.

Data

Cloudera Named a Fastest Growing Company by Deloitte for Fourth Year

Cloudera

NOVEMBER 20, 2018

For the fourth time in the past five years, Cloudera has been named to Deloitte’s Technology Fast 500 as one of the fastest growing companies in North America. This annual ranking showcases the growth of companies in the technology, media, telecommunications, life sciences, and energy tech sectors. This year’s list demonstrated the power of combining breakthrough research and development, entrepreneurship and rapid growth, with software companies like Cloudera making up nearly two-thirds of the

Telecommunication

Telecommunication Media Cloud Technology

How Upsolver Is Building A Data Lake Platform In The Cloud with Yoni Iny - Episode 56

Data Engineering Podcast

NOVEMBER 11, 2018

Summary: A data lake can be a highly valuable resource, as long as it is well built and well managed. Unfortunately, that can be a complex and time-consuming effort, requiring specialized knowledge and diverting resources from your primary business. In this episode Yoni Iny, CTO of Upsolver, discusses the various components that are necessary for a successful data lake project, how the Upsolver platform is architected, and how modern data lakes can benefit your organization.

Data Lake

Data Lake Building Kafka Cloud

Self Service Business Intelligence And Data Sharing Using Looker with Daniel Mintz - Episode 55

Data Engineering Podcast

NOVEMBER 4, 2018

Summary: Business intelligence is a necessity for any organization that wants to be able to make informed decisions based on the data that they collect. Unfortunately, it is common for different portions of the business to build their reports with different assumptions, leading to conflicting views and poor choices. Looker is a modern tool for building and sharing reports that makes it easy to get everyone on the same page.

Business Intelligence

Business Intelligence Hadoop BI Data Warehouse

Netflix at AWS re:Invent 2018

Netflix Tech

NOVEMBER 26, 2018

By Shaun Blackburn. AWS re:Invent is back in Las Vegas this week! Many Netflix engineers and leaders will be among the 40,000 attending the conference to connect with fellow cloud and OSS enthusiasts. You can find us at our booth on the expo floor, speaking on a variety of subjects, and at meetups and events around the re:Invent campus. We have listed all our talks below to make it easy to hear what we have been up to.

AWS

AWS Software Engineering Software Engineer Cloud

How to Drive Cost Savings, Efficiency Gains, and Sustainability Wins with MES

Speaker: Nikhil Joshi, Founder & President of Snic Solutions

Is your manufacturing operation reaching its efficiency potential? A Manufacturing Execution System (MES) could be the game-changer, helping you reduce waste, cut costs, and lower your carbon footprint. Join Nikhil Joshi, Founder & President of Snic Solutions, in this value-packed webinar as he breaks down how MES can drive operational excellence and sustainability.

Manufacturing

An introduction to Federated Learning

Cloudera

NOVEMBER 14, 2018

We’re excited to release Federated Learning, the latest report and prototype from Cloudera Fast Forward Labs. Federated learning makes it possible to build machine learning systems without direct access to training data. The data remains in its original location, which helps to ensure privacy and reduces communication costs. This article is about the business case for federated learning.
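
The idea of training without moving the data can be illustrated with a toy federated averaging loop. This is a minimal sketch and is not taken from the Cloudera report: the function names (`local_update`, `federated_average`, `train`), the one-parameter linear model and the sample data are all assumptions made for illustration. Each client fits the model on its own private data, and only the resulting weight is sent back and averaged.

```python
from typing import List, Sequence, Tuple

Example = Tuple[float, float]  # one (x, y) pair held privately by a client


def local_update(weight: float, data: Sequence[Example],
                 lr: float = 0.1, epochs: int = 50) -> float:
    """Fit y = w * x on one client's data with plain gradient descent."""
    w = weight
    for _ in range(epochs):
        # Gradient of the mean squared error with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def federated_average(updates: List[Tuple[float, int]]) -> float:
    """Average client weights, weighting each by its number of examples."""
    total = sum(n for _, n in updates)
    return sum(w * n for w, n in updates) / total


def train(clients: List[Sequence[Example]], rounds: int = 5) -> float:
    """Run a few rounds; raw examples never leave local_update."""
    w = 0.0
    for _ in range(rounds):
        # Only (weight, example count) pairs are shared with the server.
        updates = [(local_update(w, data), len(data)) for data in clients]
        w = federated_average(updates)
    return w


if __name__ == "__main__":
    # Hypothetical clients whose data follows y = 3 * x.
    clients = [
        [(1.0, 3.0), (2.0, 6.0)],
        [(0.5, 1.5), (1.5, 4.5), (1.0, 3.0)],
    ]
    print(train(clients))
```

Real systems add concerns this sketch ignores, such as secure aggregation, client sampling and unreliable connections.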

Manufacturing

Manufacturing Healthcare Machine Learning Medical

Zalando Postgres Operator: One Year Later

Zalando Engineering

NOVEMBER 25, 2018

The Postgres operator provides a managed Postgres service for Kubernetes. It extends the Kubernetes API with a custom “postgresql” resource that describes desired characteristics of a Postgres cluster, monitors updates of this resource and adjusts Postgres clusters accordingly. Zalando successfully uses the operator to manage more than 450 Postgres clusters across a large number of Kubernetes installations.

PostgreSQL

PostgreSQL Cloud Storage Cloud Computing Database

Zalando Research Releases “Flair”

Zalando Engineering

NOVEMBER 21, 2018

Open sourcing machine learning research for natural language processing (NLP). Two years ago, Zalando Research launched with a clear purpose to ensure that Zalando Tech is at the forefront of research in the areas of data science, machine learning, natural language processing and artificial intelligence. Our researchers’ work previously focused mainly within Zalando.

Deep Learning

Deep Learning Machine Learning Datasets Data Science

Digital Transformation Focused on Sustainability

Cloudera

NOVEMBER 19, 2018

My inspiration for writing this blog was a recent trip to a warehouse and distribution center of a well-known U.S. fast-food enterprise with a reputation for superior quality. During my visit, I had the opportunity to chat with the center’s Manager for Food Safety whose credentials (Ph.D. in Food Science), knowledge, and experience reflect the company’s commitment to product safety and quality.

Food

Food Big Data Machine Learning Database

Improving the Accuracy of Generative AI Systems: A Structured Approach

Speaker: Anindo Banerjea, CTO at Civio & Tony Karrer, CTO at Aggregage

When developing a Gen AI application, one of the most significant challenges is improving accuracy. This can be especially difficult when working with a large data corpus, and as the complexity of the task increases. The number of use cases/corner cases that the system is expected to handle essentially explodes. 💥 Anindo Banerjea is here to showcase his significant experience building AI/ML SaaS applications as he walks us through the current problems his company, Civio, is solving.

Systems

Train Deep Learning Models on AWS

Zalando Engineering

NOVEMBER 7, 2018

A real-life example of how to train a Deep Learning model on an AWS Spot Instance using Spotty. Spotty is a tool that simplifies training of Deep Learning models on AWS. Why will you ❤️ this tool? It makes training on AWS GPU instances as simple as training on your local computer; it automatically manages all necessary AWS resources including AMIs, volumes and snapshots; it makes your model trainable on AWS by everyone with a couple of commands; it detaches remote processes from SSH sessions; it sav

Deep Learning

Deep Learning AWS Python Project

Open Source: October Review - Hacktoberfest, new releases and more.

Zalando Engineering

NOVEMBER 5, 2018

Project Highlights: Connexion version 2.0 with OpenAPI 3 support is ready; check out what is new in our latest release! Connexion is the Swagger/OpenAPI first framework for Python on top of Flask with automatic endpoint validation & OAuth2 support. With 87 active contributors and more than 1,000 repositories worldwide that depend on it, Connexion is one of the most successful open source releases of Zalando.

PostgreSQL

PostgreSQL Professional Services Media Software Engineering

Connexion 2.0 Release

Zalando Engineering

NOVEMBER 4, 2018

Today, we released Connexion 2.0 with OpenAPI 3 support. Connexion is a Python framework that automagically handles HTTP requests based on the OpenAPI Specification (formerly known as Swagger Spec) of your API, described in YAML format. Connexion allows you to write a Swagger specification, then maps the endpoints to your Python functions. Besides routing, Connexion also validates requests and responses automatically based on OpenAPI specifications, handles common authentication schemes, supports API

Python

Python IT

Why SQL on Raw Data?

Rockset

NOVEMBER 1, 2018

Over a decade after the inception of the Hadoop project, the amount of unstructured data available to modern applications continues to increase. Moreover, despite forecasts to the contrary, SQL remains the lingua franca of data processing; today's NoSQL and Big Data infrastructure platform usage often involves some form of SQL-based querying. This longevity is a testament to the community of analysts and data practitioners who are familiar with SQL as well as the mature ecosystem of tools around

Raw Data

Raw Data SQL Unstructured Data NoSQL

The Ultimate Guide To Data-Driven Construction: Optimize Projects, Reduce Risks, & Boost Innovation

Speaker: Donna Laquidara-Carr, PhD, LEED AP, Industry Insights Research Director at Dodge Construction Network

In today’s construction market, owners, construction managers, and contractors must navigate increasing challenges, from cost management to project delays. Fortunately, digital tools now offer valuable insights to help mitigate these risks. However, the sheer volume of tools and the complexity of leveraging their data effectively can be daunting. That’s where data-driven construction comes in.

Project

Dynamic Typing in SQL

Rockset

NOVEMBER 1, 2018

As Peter Bailis put it in his post, querying unstructured data using SQL is a painful process. Moreover, developers frequently prefer dynamic programming languages, so interacting with the strict type system of SQL is a barrier. We at Rockset have built the first schemaless SQL data platform. In this post and a few others that follow, we'd like to introduce you to our approach.

SQL

SQL NoSQL Programming Language Bytes

Making smart cities safer with data

Cloudera

NOVEMBER 9, 2018

By Mark Micallef, Vice President of Asia Pacific and Japan, Cloudera. What comes to your mind when you think of the term “smart city”? For me, it conjures an image of a city where everything is interconnected, enabling it to run efficiently and offer convenient, secure, and personalized services to its residents at the touch of their fingertips. While such a city might sound like a utopian dream, it could potentially turn into a dystopian nightmare if we overlook the risks brought about by the

Machine Learning

Machine Learning Banking Government Media
