November, 2018

Open-Source Data Warehousing – Druid, Apache Airflow & Superset

Simon Späti

These days, everyone talks about open source. However, it is still uncommon in the Data Warehouse (DWH) field. Why is this? In my recent blog post, I researched OLAP technologies; for this post, I chose some open-source technologies and used them together to build a full data architecture for a Data Warehouse system. I went with Apache Druid for data storage, Apache Superset for querying and Apache Airflow as a task orchestrator.

Stateful, Distributed Stream Processing on Flink with Fabian Hueske - Episode 57

Data Engineering Podcast

Summary Modern applications and data platforms aspire to process events and data in real time at scale and with low latency. Apache Flink is a true stream processing engine with an impressive set of capabilities for stateful computation at scale. In this episode Fabian Hueske, one of the original authors, explains how Flink is architected, how it is being used to power some of the world’s largest businesses, where it sits in the landscape of stream processing tools, and how you can start us

Observability at Scale: Building Uber’s Alerting Ecosystem

Uber Engineering

Uber’s software architecture consists of thousands of microservices that empower teams to iterate quickly and support our company’s global growth. These microservices support a variety of solutions, such as mobile applications, internal and infrastructure services, and products along with complex …

Netflix Information Security: Preventing Credential Compromise in AWS

Netflix Tech

By Will Bengtson. Previously, we wrote about a method for detecting credential compromise in your AWS environment. The methodology focused on a continuous learning model and first use principle. This solution is still reactive in nature: we only detect credential compromise after it has already happened. Even with detection capabilities, there is a risk that exposed credentials can provide access to sensitive data and/or the ability to cause damage in our environment.

Driving Responsible Innovation: How to Navigate AI Governance & Data Privacy

Speaker: Aindra Misra, Senior Manager, Product Management (Data, ML, and Cloud Infrastructure) at BILL

Join us for an insightful webinar that explores the critical intersection of data privacy and AI governance. In today’s rapidly evolving tech landscape, building robust governance frameworks is essential to fostering innovation while staying compliant with regulations. Our expert speaker, Aindra Misra, will guide you through best practices for ensuring data protection while leveraging AI capabilities.

Collaboration Between Data Science and Data Engineering: True or False?

Domino Data Lab: Data Engineering

This blog post includes candid insights about addressing tension points that arise when people collaborate on developing and deploying models. Domino’s Head of Content sat down with Don Miner and Marshall Presser to discuss the state of collaboration between data science and data engineering. The blog post provides distilled insights, audio clips, excerpted quotes as well as the full audio and written transcript.

Five strategies for skills-based volunteering: Lessons learned from Cloudera Cares first-ever Global Day of Service

Cloudera

Corporate volunteering is on the rise. However, only half of companies encourage their employees to participate in skills-based volunteering – defined as employees applying their abilities and specialized talents to challenges facing their communities. As the Program Manager for Cloudera Cares, Cloudera’s employee giving and volunteering program at the Cloudera Foundation, I believe that we can have more impact if we offer employees opportunities for skills-based volunteering.

More Trending

How Upsolver Is Building A Data Lake Platform In The Cloud with Yoni Iny - Episode 56

Data Engineering Podcast

Summary A data lake can be a highly valuable resource, as long as it is well built and well managed. Unfortunately, that can be a complex and time-consuming effort, requiring specialized knowledge and diverting resources from your primary business. In this episode Yoni Iny, CTO of Upsolver, discusses the various components that are necessary for a successful data lake project, how the Upsolver platform is architected, and how modern data lakes can benefit your organization.

Zalando Research Releases “Flair”

Zalando Engineering

Open sourcing machine learning research for natural language processing (NLP). Two years ago, Zalando Research launched with a clear purpose: to ensure that Zalando Tech is at the forefront of research in the areas of data science, machine learning, natural language processing and artificial intelligence. Our researchers’ work previously focused mainly within Zalando.

Rockset's RocksDB-Cloud Library - Enabling the Next Generation of Cloud Native Databases

Rockset

Rockset and I began collaborating in 2016 due to my interest in their RocksDB-Cloud open-source key-value store. This post is primarily about the RocksDB-Cloud software, which Rockset open-sourced in 2016, rather than Rockset's newly launched cloud service. In it, I will explore how RocksDB-Cloud can be used to build an open-source cloud-friendly storage system.

Delivering Meaning with Previews on Web

Netflix Tech

By Corey Grunewald and Tony Casparro. As the Netflix catalog of films and series continues to grow, it becomes more challenging to present members with enough information to decide what to watch. How can a member tell if a movie is both a horror and a comedy? The synopsis and artwork help provide some context, but how can we leverage video previews (trailers) to help members find something great to watch?

Launching LLM-Based Products: From Concept to Cash in 90 Days

Speaker: Christophe Louvion, Chief Product & Technology Officer of NRC Health and Tony Karrer, CTO at Aggregage

Christophe Louvion, Chief Product & Technology Officer of NRC Health, is here to take us through how he guided his company's recent experience of getting from concept to launch and sales of products within 90 days. In this exclusive webinar, Christophe will cover key aspects of his journey, including: LLM Development & Quick Wins 🤖 Understand how LLMs differ from traditional software, identifying opportunities for rapid development and deployment.

Cloudera Named a Fastest Growing Company by Deloitte for Fourth Year

Cloudera

For the fourth time in the past five years, Cloudera has been named to Deloitte’s Technology Fast 500 as one of the fastest growing companies in North America. This annual ranking showcases the growth of companies in the technology, media, telecommunications, life sciences, and energy tech sectors. This year’s list demonstrated the power of combining breakthrough research and development, entrepreneurship and rapid growth, with software companies like Cloudera making up nearly two-thirds of the

Self Service Business Intelligence And Data Sharing Using Looker with Daniel Mintz - Episode 55

Data Engineering Podcast

Summary Business intelligence is a necessity for any organization that wants to be able to make informed decisions based on the data that they collect. Unfortunately, it is common for different portions of the business to build their reports with different assumptions, leading to conflicting views and poor choices. Looker is a modern tool for building and sharing reports that makes it easy to get everyone on the same page.

Digital Transformation Focused on Sustainability

Cloudera

My inspiration for writing this blog was a recent trip to a warehouse and distribution center of a well-known U.S. fast-food enterprise with a reputation for superior quality. During my visit, I had the opportunity to chat with the center’s Manager for Food Safety whose credentials (Ph.D. in Food Science), knowledge, and experience reflect the company’s commitment to product safety and quality.

Train Deep Learning Models on AWS

Zalando Engineering

A real-life example of how to train a Deep Learning model on an AWS Spot Instance using Spotty. Spotty is a tool that simplifies training of Deep Learning models on AWS. Why will you ❤️ this tool? It makes training on AWS GPU instances as simple as training on your local computer; it automatically manages all necessary AWS resources, including AMIs, volumes and snapshots; it makes your model trainable on AWS by everyone with a couple of commands; it detaches remote processes from SSH sessions; it sav

How To Speak The Language Of Financial Success In Product Management

Speaker: Jamie Bernard

Success in product management goes beyond delivering great features - it’s about achieving measurable financial outcomes that resonate across the organization. By connecting your product’s journey with the company’s financial success, you’ll ensure that every feature, release, and innovation contributes to the bottom line, driving both customer satisfaction and business growth.

Connexion 2.0 Release

Zalando Engineering

Today, we released Connexion 2.0 with OpenAPI 3 support. Connexion is a Python framework that automagically handles HTTP requests based on OpenAPI Specification (formerly known as Swagger Spec) of your API described in YAML format. Connexion allows you to write a Swagger specification, then maps the endpoints to your Python functions. Besides routing, Connexion also validates requests and responses automatically based on OpenAPI specifications, handles common authentication schemes, supports API

Set Up Your Own Data-as-a-Service Platform On Dremio with Tomer Shiran - Episode 58

Data Engineering Podcast

Summary When your data lives in multiple locations, belonging to at least as many applications, it is exceedingly difficult to ask complex questions of it. The default way to manage this situation is by crafting pipelines that will extract the data from source systems and load it into a data lake or data warehouse. In order to make this situation more manageable and allow everyone in the business to gain value from the data, the folks at Dremio built a self-service data platform.

Netflix at AWS re:Invent 2018

Netflix Tech

By Shaun Blackburn. AWS re:Invent is back in Las Vegas this week! Many Netflix engineers and leaders will be among the 40,000 attending the conference to connect with fellow cloud and OSS enthusiasts. You can find us at our booth on the expo floor, speaking on a variety of subjects, and at meetups and events around the re:Invent campus. We have listed all our talks below to make it easy to hear what we have been up to.

Dynamic Typing in SQL

Rockset

As Peter Bailis put it in his post, querying unstructured data using SQL is a painful process. Moreover, developers frequently prefer dynamic programming languages, so interacting with the strict type system of SQL is a barrier. We at Rockset have built the first schemaless SQL data platform. In this post and a few others that follow, we'd like to introduce you to our approach.

What Is Entity Resolution? How It Works & Why It Matters

Entity resolution, sometimes referred to as data matching or fuzzy matching, is critical for data quality, analytics, graph visualization and AI. Learn what entity resolution is, why it matters, how it works and its benefits. Advanced entity resolution using AI is crucial because it efficiently and easily solves many of today’s data quality and analytics problems.

Open Source: October Review - Hacktoberfest, new releases and more.

Zalando Engineering

Project Highlights: Connexion version 2.0 with OpenAPI 3 support is ready; check out what is new in our latest release! Connexion is the Swagger/OpenAPI-first framework for Python on top of Flask with automatic endpoint validation & OAuth2 support. With 87 active contributors and more than 1,000 repositories that depend on it worldwide, Connexion is one of the most successful open source releases of Zalando.

An introduction to Federated Learning

Cloudera

We’re excited to release Federated Learning, the latest report and prototype from Cloudera Fast Forward Labs. Federated learning makes it possible to build machine learning systems without direct access to training data. The data remains in its original location, which helps to ensure privacy and reduces communication costs. This article is about the business case for federated learning.

Tag-based Navigation of a Fashion Catalog

Zalando Engineering

Exploring the Zalando Assortment by Browsing a Product Similarity Graph. Introduction: As Europe's leading online fashion and lifestyle platform, Zalando is continually developing new features to enable our customers to find the products they want. While the standard tools of Search, Categorization & Attribute Filtering are par-for-the-course for purchasing items online, with an ever-expanding fashion assortment and an increase in the data available to describe a product, this browsing experie

Zalando Postgres Operator: One Year Later

Zalando Engineering

The Postgres operator provides a managed Postgres service for Kubernetes. It extends the Kubernetes API with a custom “postgresql” resource that describes desired characteristics of a Postgres cluster, monitors updates of this resource and adjusts Postgres clusters accordingly. Zalando successfully uses the operator to manage more than 450 Postgres clusters across a large number of Kubernetes installations.

Provide Real Value in Your Applications with Data and Analytics

The complexity of financial data, the need for real-time insight, and the demand for user-friendly visualizations can seem daunting when it comes to analytics - but there is an easier way. With Logi Symphony, we aim to turn these challenges into opportunities. Our platform empowers you to seamlessly integrate advanced data analytics, generative AI, data visualization, and pixel-perfect reporting into your applications, transforming raw data into actionable insights.

Why SQL on Raw Data?

Rockset

Over a decade after the inception of the Hadoop project, the amount of unstructured data available to modern applications continues to increase. Moreover, despite forecasts to the contrary, SQL remains the lingua franca of data processing; today's NoSQL and Big Data infrastructure platform usage often involves some form of SQL-based querying. This longevity is a testament to the community of analysts and data practitioners who are familiar with SQL as well as the mature ecosystem of tools around

Making smart cities safer with data

Cloudera

By Mark Micallef, Vice President of Asia Pacific and Japan, Cloudera. What comes to your mind when you think of the term “smart city”? For me, it conjures an image of a city where everything is interconnected, enabling it to run efficiently and offer convenient, secure, and personalized services to its residents at the touch of their fingertips. While such a city might sound like a utopian dream, it could potentially turn into a dystopian nightmare if we overlook the risks brought about by the