September 2019

Know Your Data: Part 1

KDnuggets

This article introduces the different types of data sets, data objects, and attributes.

Every Company is Becoming a Software Company

Confluent

In 2011, Marc Andreessen wrote an article called “Why Software Is Eating the World.” The central idea is that any process that can be moved into software will be. This has become a kind of shorthand for the investment thesis behind Silicon Valley’s current wave of unicorn startups. It’s also a unifying idea behind the larger set of technology trends we see today, such as machine learning, IoT, ubiquitous mobile connectivity, SaaS, and cloud computing.

Ship Faster With An Opinionated Data Pipeline Framework

Data Engineering Podcast

Building an end-to-end data pipeline for your machine learning projects is a complex task, made more difficult by the variety of ways you can structure it. Kedro is a framework that provides an opinionated workflow, letting you focus on the parts that matter instead of wasting time gluing the steps together. In this episode, Tom Goldenberg explains how it works, how it is being used at QuantumBlack for customer projects, and how it can help you structure your own pipelines.

How Artificial Intelligence & Deep Learning Change the Game

Teradata

AI & Deep Learning allow organizations to maximize player performance while minimizing player risk through better insights from performance and wellness data.

Evolving Regional Evacuation

Netflix Tech

Niosha Behnam | Demand Engineering @ Netflix. At Netflix, we prioritize innovation and velocity in pursuit of the best experience for our 150+ million global customers. This means that our microservices constantly evolve and change, but what doesn’t change is our responsibility to provide a highly available service that delivers 100+ million hours of daily streaming to our subscribers.

Scaling a Mature Data Pipeline — Managing Overhead

Airbnb Tech

There is often a hidden performance cost tied to the complexity of data pipelines: the overhead. In this post, we introduce the concept and examine the techniques we use to avoid it in our data pipelines. Author: Zachary Ennenga. Background: There is often a natural evolution in the tooling, organization, and technical underpinning of data pipelines.

More Trending

Introducing Derivative Event Sourcing

Confluent

First, what is event sourcing? Here’s an example. Consider your bank account: viewing it online, the first thing you notice is often the current balance. How many of us drill down to see how we got there? We probably all ask similar questions such as: What payments have cleared? Did my direct deposit hit yet? Why am I spending so much money at Sephora?
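
To make the bank-account example concrete, here is a minimal Python sketch of event sourcing. It is not taken from the Confluent article and uses no Kafka API; the `Event` type, the helper functions, and the amounts are hypothetical. The balance is never stored: it is derived by replaying an append-only log of events, and the same log can answer historical questions that a stored balance cannot.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Event:
    """One immutable fact in the account's history (hypothetical schema)."""
    kind: str      # "deposit" or "payment"
    amount: int    # amount in cents, always positive
    memo: str = ""


def apply(balance: int, event: Event) -> int:
    """Return the balance after applying a single event."""
    if event.kind == "deposit":
        return balance + event.amount
    if event.kind == "payment":
        return balance - event.amount
    raise ValueError(f"unknown event kind: {event.kind!r}")


def current_balance(events: Iterable[Event]) -> int:
    """Derive current state by replaying the whole event log from zero."""
    balance = 0
    for event in events:
        balance = apply(balance, event)
    return balance


def spent_at(events: Iterable[Event], memo: str) -> int:
    """Answer a question about history that the balance alone cannot."""
    return sum(e.amount for e in events if e.kind == "payment" and e.memo == memo)


# Hypothetical sample log, oldest event first.
log = [
    Event("deposit", 250_000, "direct deposit"),
    Event("payment", 4_500, "Sephora"),
    Event("payment", 120_000, "rent"),
    Event("payment", 3_200, "Sephora"),
]
print(current_balance(log))      # 122300
print(spent_at(log, "Sephora"))  # 7700
```

In a real system the log would live in durable storage such as a Kafka topic, and the derived balance would typically be cached or snapshotted rather than recomputed from scratch on every read.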

Open Source Object Storage For All Of Your Data

Data Engineering Podcast

Object storage is quickly becoming the unifying layer for data-intensive applications and analytics. Modern, cloud-oriented data warehouses and data lakes both rely on the durability and ease of use that it provides. Amazon’s S3 has quickly become the de facto API for interacting with object storage, so the team at MinIO has built a production-grade, easy-to-manage storage engine that replicates that interface.

Teradata Certification Program Embraces Vantage

Teradata

The Teradata Certification program is celebrating its 20th anniversary! Find out how it can advance your career by making you a certified expert on Vantage.

Grafana Time-Series Dashboards with the Rockset-Grafana Plugin

Rockset

What Is Grafana? Grafana is an open-source software platform for time series analytics and monitoring. You can connect Grafana to a large number of data sources, from PostgreSQL to Prometheus. Once your data source is connected, you can use a built-in query control or editor to fetch data, and build dashboards from your data source. Grafana is frequently deployed for a wide variety of use cases, including DevOps and AdTech.

Story about AWS RDS upgrade to AWS Aurora and InnoDB adaptive hash index parameter

nodeSWAT

Story about an unexpected slowdown during an AWS RDS upgrade to AWS Aurora, and the InnoDB adaptive hash index parameter (TL;DR at the end). The parameter, as described in the MySQL 5.7 documentation on the InnoDB adaptive hash index: turning it ON enables the database engine to analyze index searches and automatically adapt to the queries you are running. It does so by building custom indexes for these specific cases, which in turn makes your queries run faster because they can now use the automatically gener

5 Famous Deep Learning Courses/Schools of 2019

KDnuggets

Deep Learning has become the hottest skill in Data Science at the moment. There is a plethora of articles, courses, technologies, influencers, and resources that we can leverage to gain Deep Learning skills.

Apache Kafka Rebalance Protocol for the Cloud: Static Membership

Confluent

Static Membership is an enhancement to the current rebalance protocol that aims to reduce the downtime caused by excessive and unnecessary rebalances for general Apache Kafka ® client implementations. This applies to Kafka consumers, Kafka Connect, and Kafka Streams. To get a better grasp on the rebalance protocol, we’ll examine this concept in depth and explain what it means.
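
The static membership idea can be illustrated with a toy model. The sketch below is a hypothetical, heavily simplified Python simulation of a group coordinator, not Kafka's implementation or client API; in real Kafka the stable identity comes from the consumer's `group.instance.id` setting and the timeout from `session.timeout.ms`. It shows the key behavior: a known static member that restarts and rejoins within its session timeout keeps its assignment without triggering a rebalance, while a member whose session expires does trigger one.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ToyCoordinator:
    """Hypothetical, heavily simplified model of a group coordinator.

    Only static members (consumers that send a stable instance id) are tracked.
    This illustrates the idea; it is not Kafka's implementation.
    """

    session_timeout_s: float
    last_seen: Dict[str, float] = field(default_factory=dict)  # instance id -> last contact
    generation: int = 0  # incremented on every rebalance

    def _rebalance(self) -> None:
        self.generation += 1

    def join(self, now: float, instance_id: str) -> None:
        """A static member joins or rejoins (e.g. after a restart)."""
        if instance_id not in self.last_seen:
            # Genuinely new member: partitions must be reassigned.
            self._rebalance()
        # A known member coming back keeps its previous assignment: no rebalance.
        self.last_seen[instance_id] = now

    def heartbeat(self, now: float, instance_id: str) -> None:
        self.last_seen[instance_id] = now

    def expire(self, now: float) -> None:
        """Drop members silent for longer than the session timeout, then rebalance."""
        dead = [i for i, t in self.last_seen.items() if now - t > self.session_timeout_s]
        for instance_id in dead:
            del self.last_seen[instance_id]
        if dead:
            self._rebalance()


coord = ToyCoordinator(session_timeout_s=30.0)
coord.join(0.0, "consumer-a")
coord.join(0.0, "consumer-b")
print(coord.generation)  # 2

# consumer-a restarts during a rolling deploy and is back after 10 s:
coord.expire(10.0)
coord.join(10.0, "consumer-a")
print(coord.generation)  # still 2: no rebalance

# consumer-a keeps heartbeating; consumer-b crashes and never returns:
coord.heartbeat(40.0, "consumer-a")
coord.expire(45.0)
print(coord.generation)  # 3: consumer-b's session expired
```

The model counts every first-time join as its own rebalance; a real coordinator usually folds members that join together into a single rebalance.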

Navigating Boundless Data Streams With The Swim Kernel

Data Engineering Podcast

The conventional approach to analytics involves collecting large amounts of data that can be cleaned, followed by a separate step for analysis and interpretation. Unfortunately, this strategy is not viable for real-time, real-world use cases such as traffic management or supply chain logistics. In this episode, Simon Crosby, CTO of Swim Inc., explains how the SwimOS kernel and the enterprise data fabric built on top of it enable brand-new use cases for instant insights.

Vantage: A Cloud-First Integrated Data & Analytics Platform

Teradata

There are a lot of misperceptions about Teradata. Learn more about what Teradata Vantage really is: a cloud-first integrated data and analytics platform.

Choosing a Reactive Programming Framework for Modern Android Development

Pandora Engineering

When embarking on the journey of developing a new application, a team must establish the foundational technologies upon which their…

Outside Lands, Airbnb Prices, and Rockset’s Geospatial Queries

Rockset

Airbnb Prices Around Major Events. Operational analytics on real-time data streams requires being able to slice and dice them along all the axes that matter to people, including time and space. We can see how important it is to analyze data spatially by looking at an app that’s all about location: Airbnb. Major events in San Francisco cause huge influxes of people, and Airbnb prices increase accordingly.

What is Hierarchical Clustering?

KDnuggets

The article gives a brief introduction to various concepts related to the hierarchical clustering algorithm.

The Rise of Managed Services for Apache Kafka

Confluent

As a distributed system for collecting, storing, and processing data at scale, Apache Kafka ® comes with its own deployment complexities. Luckily, for on-premises scenarios a myriad of deployment options are available, such as Confluent Platform, which can be deployed on bare metal, virtual machines, containers, etc. But deployment is just the tip of the iceberg.

Building A Reliable And Performant Router For Observability Data

Data Engineering Podcast

The first stage in every data project is collecting information and routing it to a storage system for later analysis. For operational data, this typically means collecting log messages and system metrics. Often a different tool is used for each class of data, increasing the overall complexity and number of moving parts. The engineers at Timber.io decided to build a new tool in the form of Vector that allows for processing both of these data types in a single framework that is reliable an

Time Series Analysis: Looking Back to See the Future

Teradata

Time series data is found everywhere from stock prices to public health. Vantage's Machine Learning Engine helps turn that data into answers. Find out how.

AsyncTask, Rx, and Coroutines… Oh My!

Pandora Engineering

An Android Apprentice’s journey to understand Pandora’s migration from AsyncTask to newer APIs. During my second month as an Android Engineer Apprentice, I was tasked with migrating AsyncTask to newer APIs. Early on, I was asked, “Do you know why we are migrating from AsyncTask?” I wracked my brain and answered shyly, “It has something to do with memory leaks?

Reimagining Experimentation Analysis at Netflix

Netflix Tech

Toby Mao, Sri Sri Perangur, Colin McFarland. Another day, another custom script to analyze an A/B test. Maybe you’ve done this before and have an old script lying around. If it’s new, it’s probably going to take some time to set up, right? Not at Netflix. Suppose you’re running a new video encoding test and theorize that the two new encodes should reduce play delay, a metric describing how long it takes for a video to play after you press the s

12 Deep Learning Researchers and Leaders

KDnuggets

The deep learning researchers and industry leaders on our list are the people you should follow to stay current with this wildly expanding field in AI. From early practitioners and established academics to entrepreneurs and today’s top corporate influencers, this diverse group of individuals is leading the way into tomorrow’s deep learning landscape.

Real-Time Analytics and Monitoring Dashboards with Apache Kafka and Rockset

Confluent

In the early days, many companies simply used Apache Kafka ® for data ingestion into Hadoop or another data lake. However, Apache Kafka is more than just messaging. The significant difference today is that companies use Apache Kafka as an event streaming platform for building mission-critical infrastructures and core operations platforms. Examples include microservice architectures, mainframe integration, instant payment, fraud detection, sensor analytics, real-time monitoring, and many more—dri

Building A Community For Data Professionals at Data Council

Data Engineering Podcast

Data professionals are working in a domain that is rapidly evolving. In order to stay current, we need access to deeply technical presentations that aren’t burdened by extraneous marketing. To fulfill that need, Pete Soderling and his team have been running the Data Council series of conferences and meetups around the world. In this episode Pete discusses his motivation for starting these events, how they serve to bring the data community together, and the observations that he has ma

Taking Analytics to the 4th Dimension

Teradata

4D analytics combines geospatial, temporal and time series data to do advanced analysis of time and space. Learn how to uncover new insights today.

Real-Time Analytics in the World of Virtual Reality and Live Streaming

Rockset

"A fast-moving technology field where new tools, technologies and platforms are introduced very frequently and where it is very hard to keep up with new trends." I could be describing either the VR space or Data Engineering, but in fact this post is about the intersection of both. Virtual Reality – The Next Frontier in Media. I work as a Data Engineer at a leading company in the VR space, with a mission to capture and transmit reality in perfect fidelity.

A Single Function to Streamline Image Classification with Keras

KDnuggets

We show, step-by-step, how to construct a single, generalized, utility function to pull images automatically from a directory and train a convolutional neural net model.

A Gentle Introduction to PyTorch 1.2

KDnuggets

This comprehensive tutorial aims to introduce the fundamentals of PyTorch building blocks for training neural networks.
