Sat, Feb 16, 2019 - Fri, Feb 22, 2019

Speed Up Your Analytics With The Alluxio Distributed Storage System

Data Engineering Podcast

Distributed storage systems are the foundational layer of any big data stack. There are a variety of implementations which support different specialized use cases and come with associated tradeoffs. Alluxio is a distributed virtual filesystem that integrates with multiple persistent storage systems to provide a scalable, in-memory storage layer for scaling computational workloads independent of the size of your data.

Extending Vector with eBPF to inspect host and container performance

Netflix Tech

By Jason Koch, with Martin Spier, Brendan Gregg, and Ed Hunter.

Improving the tools available to our engineers to help them diagnose, triage, and work through software performance challenges in the cloud is a key goal for the cloud performance engineering team at Netflix. Today we are excited to announce latency heatmaps and improved container support for our on-host monitoring solution…

Sysmon Security Event Processing in Real Time with KSQL and HELK

Confluent

During a recent talk titled "Hunters ATT&CKing with the Right Data," which I presented with my brother Jose Luis Rodriguez at ATT&CKcon, we talked about the importance of documenting and modeling security event logs before developing any data analytics while preparing for a threat hunting engagement. Defining relationships among Windows security event logs such as Sysmon, for example, helped us to appreciate the extra context that two or more events together can provide for a hunt.
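To make "extra context from two or more events" concrete, here is a minimal Python sketch. It is not the KSQL/HELK pipeline the article describes. It assumes Sysmon events have already been parsed into dicts keyed by Sysmon field names, and it pairs process-creation events (Event ID 1) with network-connection events (Event ID 3) through their shared `ProcessGuid` field. The `correlate` function and the sample values are hypothetical.

```python
from collections import defaultdict


def correlate(events):
    """Join Sysmon process-creation events (Event ID 1) with
    network-connection events (Event ID 3) on their shared ProcessGuid."""
    processes = {}                   # ProcessGuid -> Event ID 1 record
    connections = defaultdict(list)  # ProcessGuid -> list of Event ID 3 records
    for event in events:
        if event["EventID"] == 1:
            processes[event["ProcessGuid"]] = event
        elif event["EventID"] == 3:
            connections[event["ProcessGuid"]].append(event)

    joined = []
    for guid, conns in connections.items():
        proc = processes.get(guid)
        if proc is None:
            continue  # no matching process-creation event was collected
        for conn in conns:
            # One enriched record per connection: the network event alone
            # lacks the command line and parent process that Event ID 1 carries.
            joined.append({
                "ProcessGuid": guid,
                "ParentImage": proc["ParentImage"],
                "CommandLine": proc["CommandLine"],
                "DestinationIp": conn["DestinationIp"],
                "DestinationPort": conn["DestinationPort"],
            })
    return joined


# Hypothetical sample events, already parsed into dicts.
sample = [
    {"EventID": 1, "ProcessGuid": "{A}", "ParentImage": r"C:\Windows\explorer.exe",
     "CommandLine": "powershell.exe -nop"},
    {"EventID": 3, "ProcessGuid": "{A}", "DestinationIp": "10.0.0.5", "DestinationPort": 443},
    {"EventID": 3, "ProcessGuid": "{B}", "DestinationIp": "10.0.0.9", "DestinationPort": 80},
]
for record in correlate(sample):
    print(record)
```

The connection from `{B}` is dropped because no process-creation event explains it; in a streaming setting (as with KSQL) the same idea becomes a stream-to-stream or stream-to-table join keyed on `ProcessGuid`.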

How to Run SQL on PDF Files

Rockset

PDFs are the de facto standard for distributing and sharing fixed-layout documents today. A quick survey of my laptop folders reveals account statements, receipts, technical papers, book chapters, and presentation slides, all of them PDFs. Lots of valuable information finds its way into all manner of PDF files. That is a great reason for Rockset to support SQL queries on PDF files, as part of our mission to make data more usable to everyone.

Apache Airflow® Best Practices for ETL and ELT Pipelines

Whether you’re creating complex dashboards or fine-tuning large language models, your data must be extracted, transformed, and loaded. ETL and ELT pipelines form the foundation of any data product, and Airflow is the open-source data orchestrator specifically designed for moving and transforming data in ETL and ELT pipelines. This eBook covers: An overview of ETL vs.

A Journey On End To End Testing A Microservices Architecture

Zalando Engineering

End-to-end testing is a technique used to test the flow of an application through a business transaction. In a microservices architecture, several components work together to enable a business capability, so testing all of them can get tricky. In this article, you can read about our team's journey, including what our system looks like and what you get from e2e testing.

It's the Relationship - Not Just the Data - That is Critical to Success

Teradata

Rob Armstrong explains that while data is important, the real key to insight and successful business outcomes is preserving the relationships across data models.

More Trending

Using Smart Schema to Accelerate Insights from Nested JSON

Rockset

Developers often need to work with datasets without a fixed schema, like heavily nested JSON data with several deeply nested arrays and objects, mixed data types, null values, and missing fields. In addition, the shape of the data is prone to change when continuously syncing new data. Understanding the shape of a dataset is crucial to constructing complex queries for building applications or performing data science investigations.
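As a rough illustration of what "understanding the shape of a dataset" involves, the Python sketch below walks a collection of parsed JSON documents and records, for every field path, which types were observed and in how many documents the path appears. It is a hypothetical helper (`infer_shape`, `json_type`), not Rockset's Smart Schema implementation; the path notation (`$` for the root, `.key` for fields, `[*]` for array elements) is an assumption chosen for readability.

```python
import json
from collections import defaultdict


def json_type(value):
    """Name the JSON type of a value produced by json.loads."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool before int: bool subclasses int
        return "bool"
    if isinstance(value, int):
        return "int"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    return "object"


def infer_shape(documents):
    """Summarize field paths across documents.

    Returns {path: {"types": {type_name: occurrences}, "present_in": n_docs}}.
    A field missing from some documents shows up as present_in < len(documents);
    mixed types and nulls show up as several entries under "types".
    """
    types = defaultdict(lambda: defaultdict(int))
    present = defaultdict(int)

    def walk(value, path, seen):
        types[path][json_type(value)] += 1
        seen.add(path)
        if isinstance(value, dict):
            for key, child in value.items():
                walk(child, f"{path}.{key}", seen)
        elif isinstance(value, list):
            for child in value:
                walk(child, f"{path}[*]", seen)

    for doc in documents:
        seen = set()  # paths seen in this document, counted once per document
        walk(doc, "$", seen)
        for path in seen:
            present[path] += 1

    return {p: {"types": dict(types[p]), "present_in": present[p]} for p in types}


docs = [json.loads(line) for line in [
    '{"id": 1, "user": {"name": "a", "tags": ["x", "y"]}}',
    '{"id": "2", "user": {"name": null}}',
]]
for path, info in sorted(infer_shape(docs).items()):
    print(path, info)
```

Counting per-path types rather than fixing a schema up front means newly synced documents with a different shape simply add entries to the summary instead of failing validation.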

Kafka Summit 2019: 3 Big Things!

Confluent

How many Kafka Summits should there be in a year? Experts disagree. Some say there should be one giant event where everybody gathers at once. Some say there should be one once a month in different regions of the world. Others say you should live every day like it’s Kafka Summit. As you may know, we have adopted a happy medium: three Summits in 2019.

The Utah Jazz Uses Pervasive Data Intelligence for Next Generation Sports Analytics

Teradata

Larry H. Miller is drawing on a multitude of data sources and customer touchpoints, using data and analytics to increase customer satisfaction.

Apache Airflow®: The Ultimate Guide to DAG Writing

Speaker: Tamara Fingerlin, Developer Advocate

In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!