August, 2019

Types of Bias in Machine Learning

KDnuggets

The sample data used for training has to represent the real scenario as closely as possible. Many factors can bias a sample from the outset, and those factors differ by domain (e.g., business, security, medical, education).

Using Graph Processing for Kafka Stream Visualizations

Confluent

We know that Apache Kafka® is great when you’re dealing with streams, allowing you to conveniently look at streams as tables. Stream processing engines like KSQL furthermore give you the ability to manipulate all of this fluently. But what about when the relationships between items dominate your application? For example, in a social network, understanding the network means we need to look at the friend relationships between people.

Trending Sources

Building the New Uber Freight App as Lists of Modular, Reusable Components

Uber Engineering

As Uber Freight marked its second anniversary, we went back to the drawing board to redesign its app. The original carrier app was successful for owner-operators with one or two drivers, but it wasn’t optimized for larger fleets—feedback we … The post Building the New Uber Freight App as Lists of Modular, Reusable Components appeared first on Uber Engineering Blog.

Building Tools And Platforms For Data Analytics

Data Engineering Podcast

Summary Data engineers are responsible for building tools and platforms to power the workflows of other members of the business. Each group of users has their own set of requirements for the way that they access and interact with those platforms depending on the insights they are trying to gather. Benn Stancil is the chief analyst at Mode Analytics and in this episode he explains the set of considerations and requirements that data analysts need in their tools and.

Mastering Apache Airflow® 3.0: What’s New (and What’s Next) for Data Orchestration

Speaker: Tamara Fingerlin, Developer Advocate

Apache Airflow® 3.0, the most anticipated Airflow release yet, officially launched this April. As the de facto standard for data orchestration, Airflow is trusted by over 77,000 organizations to power everything from advanced analytics to production AI and MLOps. With the 3.0 release, the top-requested features from the community were delivered, including a revamped UI for easier navigation, stronger security, and greater flexibility to run tasks anywhere at any time.

Applying Netflix DevOps Patterns to Windows

Netflix Tech

Baking Windows with Packer By Justin Phelps and Manuel Correa Customizing Windows images at Netflix was a manual, error-prone, and time consuming process. In this blog post, we describe how we improved the methodology, which technologies we leveraged, and how this has improved service deployment and consistency. Artisan Crafted Images In the Netflix full cycle DevOps culture the team responsible for building a service is also responsible for deploying, testing, infrastructure, and operation of t

Is Self-Service Analytics Sustainable?

Teradata

Self-service analytics are increasingly being implemented by organizations that want to promote a data-driven culture. But how sustainable is it? Read more.

More Trending

Announcing Tutorials for Apache Kafka

Confluent

We’re excited to announce Tutorials for Apache Kafka®, a new area of our website for learning event streaming. Kafka Tutorials is a collection of common event streaming use cases, with each tutorial featuring an example scenario and several complete code solutions. It’s the fastest way to learn how to use Kafka with confidence. We’re building this because we know that event streaming is a radically different way of thinking.

Less is More: Engineering Data Warehouse Efficiency with Minimalist Design

Uber Engineering

Maintaining Uber’s large-scale data warehouse comes with an operational cost in terms of ETL functions and storage. In our experience, optimizing for operational efficiency requires answering one key question: for which tables does the maintenance cost supersede utility? Once identified, … The post Less is More: Engineering Data Warehouse Efficiency with Minimalist Design appeared first on Uber Engineering Blog.

A High Performance Platform For The Full Big Data Lifecycle

Data Engineering Podcast

Summary Managing big data projects at scale is a perennial problem, with a wide variety of solutions that have evolved over the past 20 years. One of the early entrants that predates Hadoop and has since been open sourced is the HPCC (High Performance Computing Cluster) system. Designed as a fully integrated platform to meet the needs of enterprise grade analytics it provides a solution for the full lifecycle of data at massive scale.

How We Reduced DynamoDB Costs by Using DynamoDB Streams and Scans More Efficiently

Rockset

Many of our users implement operational reporting and analytics on DynamoDB using Rockset as a SQL intelligence layer to serve live dashboards and applications. As an engineering team, we are constantly searching for opportunities to improve their SQL-on-DynamoDB experience. For the past few weeks, we have been hard at work tuning the performance of our DynamoDB ingestion process.

Agent Tooling: Connecting AI to Your Tools, Systems & Data

Speaker: Alex Salazar, CEO & Co-Founder @ Arcade | Nate Barbettini, Founding Engineer @ Arcade | Tony Karrer, Founder & CTO @ Aggregage

There’s a lot of noise surrounding the ability of AI agents to connect to your tools, systems and data. But building an AI application into a reliable, secure workflow agent isn’t as simple as plugging in an API. As an engineering leader, it can be challenging to make sense of this evolving landscape, but agent tooling provides such high value that it’s critical we figure out how to move forward.

Teradata Earns Spot (Again x2!) on Constellation ShortList for Hybrid Cloud

Teradata

Teradata is named yet again to the Constellation ShortList™ for “Hybrid and Multi-Cloud Relational Database Management Systems." Read more!

Why Data Visualization Is The Most Important Skill in a Data Analyst Arsenal

KDnuggets

Visually displayed data is much more accessible, and it’s critical to promptly identify the weaknesses of an organization, accurately forecast trading volumes and sale prices, or make the right business choices.

Building Transactional Systems Using Apache Kafka

Confluent

Traditional relational database systems are ubiquitous in software systems. They are surrounded by a strong ecosystem of tools, such as object-relational mappers and schema migration helpers. Relational databases also provide strong guarantees in the form of ACID transactions, which are loved by developers for their all-or-nothing semantics. Today’s businesses, however, want to process ever-increasing amounts of data.

Migrating Functionality Between Large-scale Production Systems Seamlessly

Uber Engineering

A common axiom among Uber engineers states that building new features is like fixing a car’s engine while driving it. As we scaled up to our present level of support for 14 million trips per day, the car in that … The post Migrating Functionality Between Large-scale Production Systems Seamlessly appeared first on Uber Engineering Blog.

How to Modernize Manufacturing Without Losing Control

Speaker: Andrew Skoog, Founder of MachinistX & President of Hexis Representatives

Manufacturing is evolving, and the right technology can empower—not replace—your workforce. Smart automation and AI-driven software are revolutionizing decision-making, optimizing processes, and improving efficiency. But how do you implement these tools with confidence and ensure they complement human expertise rather than override it? Join industry expert Andrew Skoog as he explores how manufacturers can leverage automation to enhance operations, streamline workflows, and make smarter, data-dri

Digging Into Data Replication At Fivetran

Data Engineering Podcast

Summary The extract and load pattern of data replication is the most commonly needed process in data engineering workflows. Because of the myriad sources and destinations that are available, it is also among the most difficult tasks that we encounter. Fivetran is a platform that does the hard work for you and replicates information from your source systems into whichever data warehouse you use.

Announcing Bottom Navigator

Pandora Engineering

An Android Multiple Backstack Bottom Navigation Library Pandora’s latest mobile redesign brings the bottom navigation pattern to our apps. Bottom navigation has become a popular design choice for many apps due to its many advantages including easy one-handed use and enhanced discoverability of top app destinations. When Pandora embarked on this project our designers had a clear vision of how navigation should work, a vision that in many ways is familiar to users of other popular apps like Instag

Four Steps to Drive Digital Transformation in Your Bank

Teradata

Digital transformation and regulatory requirements have long challenged banks. Teradata has deep experience in ushering them through the transformation process.

Nothing but NumPy: Understanding & Creating Neural Networks with Computational Graphs from Scratch

KDnuggets

Entirely implemented with NumPy, this extensive tutorial provides a detailed review of neural networks followed by guided code for creating one from scratch with computational graphs.

The Ultimate Guide to Apache Airflow DAGS

With Airflow being the open-source standard for workflow orchestration, knowing how to write Airflow DAGs has become an essential skill for every data engineer. This eBook provides a comprehensive overview of DAG writing features with plenty of example code. You’ll learn how to: Understand the building blocks of DAGs, combine them in complex pipelines, and schedule your DAG to run exactly when you want it to Write DAGs that adapt to your data at runtime and set up alerts and notifications Scale you

Building Shared State Microservices for Distributed Systems Using Kafka Streams

Confluent

The Kafka Streams API boasts a number of capabilities that make it well suited for maintaining the global state of a distributed system. At Imperva, we took advantage of Kafka Streams to build shared state microservices that serve as fault-tolerant, highly available single sources of truth about the state of objects in our system. Why we chose Kafka Streams.

Simple node.JS and Slack WebHook integration

nodeSWAT

This post walks you through turning this awesome chat tool into a handy monitoring and alerting tool for your application, all without any third-party modules and with minimal code to keep the footprint small. Note: This post uses a now-outmoded integration method. Slack has introduced new ways to manage and send messages via Apps.

Solving Data Discovery At Lyft

Data Engineering Podcast

Summary Data is only valuable if you use it for something, and the first step is knowing that it is available. As organizations grow and data sources proliferate, it becomes difficult to keep track of everything, particularly for analysts and data scientists who are not involved with the collection and management of that information. Lyft has built the Amundsen platform to address the problem of data discovery, and in this episode Tao Feng and Mark Grover explain how it works, why they built it, a

Using Tableau with DynamoDB: How to Build a Real-Time SQL Dashboard on NoSQL Data

Rockset

In this blog, we examine DynamoDB reporting and analytics, which can be challenging given the lack of SQL and the difficulty running analytical queries in DynamoDB. We will demonstrate how you can build an interactive dashboard with Tableau, using SQL on data from DynamoDB, in a series of easy steps, with no ETL involved. DynamoDB is a widely popular transactional primary data store.

Optimizing The Modern Developer Experience with Coder

Many software teams have migrated their testing and production workloads to the cloud, yet development environments often remain tied to outdated local setups, limiting efficiency and growth. This is where Coder comes in. In our 101 Coder webinar, you’ll explore how cloud-based development environments can unlock new levels of productivity. Discover how to transition from local setups to a secure, cloud-powered ecosystem with ease.

Cloud Analytic Migrations with Microsoft, Informatica & Teradata?

Teradata

Teradata partners Microsoft & Informatica announced that they are taking on cloud analytic migrations. Find out what this means for our on-premises customers.

Top Handy SQL Features for Data Scientists

KDnuggets

Whenever we hear "data," the first thing that comes to mind is SQL! SQL comes with easy and quick to learn features to organize and retrieve data, as well as perform actions on it in order to gain useful insights.

Top 10 Reasons to Attend Kafka Summit

Confluent

Yes, the other definition of event sourcing. 1. Keynotes from leading technologists. At Kafka Summit SF, you’ll get to hear incredible keynotes from leading technologists, including Jay Kreps and Neha Narkhede, original co-creators of Apache Kafka®. In the past, we’ve featured Chris D’Agostino, James Watters, Martin Kleppmann, and Martin Fowler. This time around, we’re delighted to have Devendra Tagare, Engineering Manager of Streaming Platforms from Lyft and Chris Kasten, VP of Walmart Clou

15 Modern Use Cases for Enterprise Business Intelligence

Large enterprises face unique challenges in optimizing their Business Intelligence (BI) output due to the sheer scale and complexity of their operations. Unlike smaller organizations, where basic BI features and simple dashboards might suffice, enterprises must manage vast amounts of data from diverse sources. What are the top modern BI use cases for enterprise businesses to help you get a leg up on the competition?

R Users’ Salaries from the 2019 Stackoverflow Survey

KDnuggets

Let’s take a look at what R users are saying about their salaries. Note that the following results could be biased because of unrepresentative and, in some cases, small samples.

Is Kaggle Learn a “Faster Data Science Education?”

KDnuggets

Kaggle Learn is "Faster Data Science Education," featuring micro-courses covering an array of data skills for immediate application. Courses may be made with newcomers in mind, but the platform and its content are proving useful as a review for more seasoned practitioners as well.

How to Become More Marketable as a Data Scientist

KDnuggets

As a data scientist, you are in high demand. So, how can you increase your marketability even more? Check out these current trends in skills most desired by employers in 2019.

Statistical Modelling vs Machine Learning

KDnuggets

At times it may seem that machine learning can be done these days without a sound statistical background, but those who think so do not really understand the nuances involved. Code written to make it easier does not negate the need for an in-depth understanding of the problem.

Apache Airflow® Best Practices: DAG Writing

Speaker: Tamara Fingerlin, Developer Advocate

In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!