March, 2020

article thumbnail

Coronavirus Data and Poll Analysis – yes, there is hope, if we act now

KDnuggets

We examine the growth of coronavirus daily cases in most affected countries, and show evidence that social distancing works in reducing the rate of spread. We also analyze KDnuggets Poll results - the scale of change to online and how Data Science work is likely to increase or drop in different regions. Stay Healthy and practice social distancing!

article thumbnail

Why We Leverage Multi-tenancy in Uber’s Microservice Architecture

Uber Engineering

The performance of Uber’s services relies on our ability to quickly and stably launch new features on our platform , regardless of where the corresponding service lives in our tech stack. Foundational to our platform’s power is its microservice-based architecture … The post Why We Leverage Multi-tenancy in Uber’s Microservice Architecture appeared first on Uber Engineering Blog.

article thumbnail

Advanced Analytics for Coronavirus – Trends, Patterns, Predictions

Teradata

Advanced analytics and AI can significantly accelerate data processing required to get the insights, answers and recommendations to handle and address the COVID-19 pandemic.

article thumbnail

Ready for changes with Hexagonal Architecture

Netflix Tech

by Damir Svrtan and Sergii Makagon As the production of Netflix Originals grows each year, so does our need to build apps that enable efficiency throughout the entire creative process. Our wider Studio Engineering Organization has built more than 30 apps that help content progress from pitch (aka screenplay) to playback: ranging from script content acquisition, deal negotiations and vendor management to scheduling, streamlining production workflows, and so on.

article thumbnail

Apache Airflow® Best Practices for ETL and ELT Pipelines

Whether you’re creating complex dashboards or fine-tuning large language models, your data must be extracted, transformed, and loaded. ETL and ELT pipelines form the foundation of any data product, and Airflow is the open-source data orchestrator specifically designed for moving and transforming data in ETL and ELT pipelines. This eBook covers: An overview of ETL vs.

article thumbnail

How to process simple data stream and consume with Lambda

Team Data Science

I built a serverless architecture for my simulated credit card complaints stream using, AWS S3 AWS Lambda AWS Kinesis the above picture gives a high-level view of the data flow. I assume uploading the CSV file as a data producer, so once you upload a file, it generates object created event and the Lambda function is invoked asynchronously. The file data content will be written to the Kinesis stream as a record (record = data + partition key), which triggers another Lambda function and persist th

Process 130
article thumbnail

Scheduling a SQL script, using Apache Airflow, with an example

Start Data Engineering

One of the most common use cases for Apache Airflow is to run scheduled SQL scripts. Developers who start with Airflow often ask the following questions “How to use airflow to orchestrate sql?

SQL 130

More Trending

article thumbnail

15 Things Every Apache Kafka Engineer Should Know About Confluent Replicator

Confluent

Single-cluster deployments of Apache Kafka® are rare. Most medium to large deployments employ more than one Kafka cluster, and even the smallest use cases include development, testing, and production clusters. […].

Kafka 122
article thumbnail

Improving Prediction of the Unconfirmed COVID-19 Cases

Teradata

With the lack of available tests & uncertainty around the true number of COVID-19 cases, Teradata Epidemiologist Daniel Ulatowski & Data Scientist Jack McCush hypothesize how symptomatic data & the Vantage ML Engine can be utilized to predict cases.

Utilities 128
article thumbnail

The Life Of A Non-Profit Data Professional

Data Engineering Podcast

Summary Building and maintaining a system that integrates and analyzes all of the data for your organization is a complex endeavor. Operating on a shoe-string budget makes it even more challenging. In this episode Tyler Colby shares his experiences working as a data professional in the non-profit sector. From managing Salesforce data models to wrangling a multitude of data sources and compliance challenges, he describes the biggest challenges that he is facing.

AWS 100
article thumbnail

Simplistic Ways to Find Interesting Data Sets

Team Data Science

I am taking you through my recent experience to find a dataset for my project. Industry Search To work with data, I need to narrow down the industry like health care, finance, insurance or other. I defined a few sources in my earlier blog post, which will give a sneak peek of techniques to extract industries. For Instance, most of the job listings introduce their job description as, One of the top insurance client looking for Data Engineer which exposes the industry.

Insurance 130
article thumbnail

Apache Airflow®: The Ultimate Guide to DAG Writing

Speaker: Tamara Fingerlin, Developer Advocate

In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!

article thumbnail

10 Key skills, to help you become a data engineer

Start Data Engineering

This article gives you an overview of the 10 key skills you need to become a better data engineer. If you are struggling to get started on what to learn, start with the first topic and proceed through the list.

article thumbnail

Time Series Classification Synthetic vs Real Financial Time Series

KDnuggets

This article discusses distinguishing between real financial time series and synthetic time series using XGBoost.

Finance 160
article thumbnail

Kafka Connect Elasticsearch Connector in Action

Confluent

The Elasticsearch sink connector helps you integrate Apache Kafka® and Elasticsearch with minimum effort. You can take data you’ve stored in Kafka and stream it into Elasticsearch to then be […].

Kafka 121
article thumbnail

How Netflix uses Druid for Real-time Insights to Ensure a High-Quality Experience

Netflix Tech

By Ben Sykes Continue reading on Netflix TechBlog ».

Kafka 98
article thumbnail

Optimizing The Modern Developer Experience with Coder

Many software teams have migrated their testing and production workloads to the cloud, yet development environments often remain tied to outdated local setups, limiting efficiency and growth. This is where Coder comes in. In our 101 Coder webinar, you’ll explore how cloud-based development environments can unlock new levels of productivity. Discover how to transition from local setups to a secure, cloud-powered ecosystem with ease.

article thumbnail

Behind The Scenes Of The Linode Object Storage Service

Data Engineering Podcast

Summary There are a number of platforms available for object storage, including self-managed open source projects. But what goes on behind the scenes of the companies that run these systems at scale so you don’t have to? In this episode Will Smith shares the journey that he and his team at Linode recently completed to bring a fast and reliable S3 compatible object storage to production for your benefit.

Media 100
article thumbnail

People, We Need to Talk About Mass Electronic Surveillance

Teradata

With the COVID-19 epidemic in full swing, the countries that are faring the best are employing large-scale testing and electronic surveillance. But what does this mean for our civil liberties?

article thumbnail

Sending HTTP Requests with Scala and Akka in 5 Minutes

Rock the JVM

Learn to use Akka HTTP with Scala and send HTTP requests in just a few minutes with the Akka HTTP server DSL

Scala 52
article thumbnail

Covid-19, your community, and you — a data science perspective

KDnuggets

Let's talk about covid-19; the reality, the numbers, and the data science.

article thumbnail

15 Modern Use Cases for Enterprise Business Intelligence

Large enterprises face unique challenges in optimizing their Business Intelligence (BI) output due to the sheer scale and complexity of their operations. Unlike smaller organizations, where basic BI features and simple dashboards might suffice, enterprises must manage vast amounts of data from diverse sources. What are the top modern BI use cases for enterprise businesses to help you get a leg up on the competition?

article thumbnail

Building a Cloud ETL Pipeline on Confluent Cloud

Confluent

As enterprises move more and more of their applications to the cloud, they are also moving their on-prem ETL (extract, transform, load) pipelines to the cloud, as well as building […].

Cloud 119
article thumbnail

Introducing Dispatch

Netflix Tech

By Kevin Glisson, Marc Vilanova, Forest Monsen Netflix is pleased to announce the open-source release of our crisis management orchestration framework: Dispatch! Okay, but what is Dispatch? Put simply, Dispatch is: All of the ad-hoc things you’re doing to manage incidents today, done for you, and a bunch of other things you should’ve been doing, but have not had the time!

article thumbnail

Building A New Foundation For CouchDB

Data Engineering Podcast

Summary CouchDB is a distributed document database built for scale and ease of operation. With a built-in synchronization protocol and a HTTP interface it has become popular as a backend for web and mobile applications. Created 15 years ago, it has accrued some technical debt which is being addressed with a refactored architecture based on FoundationDB.

Building 100
article thumbnail

Five Books Every CX Leader Should Read in this Time of Social Distancing

Teradata

Check out this curated reading list of books on customer experience. From updated classics to new research and insights into how large enterprises can drive business outcomes from a CX initiative.

59
article thumbnail

Prepare Now: 2025s Must-Know Trends For Product And Data Leaders

Speaker: Jay Allardyce, Deepak Vittal, Terrence Sheflin, and Mahyar Ghasemali

As we look ahead to 2025, business intelligence and data analytics are set to play pivotal roles in shaping success. Organizations are already starting to face a host of transformative trends as the year comes to a close, including the integration of AI in data analytics, an increased emphasis on real-time data insights, and the growing importance of user experience in BI solutions.

article thumbnail

Why Is Contravariance So Hard in Scala?

Rock the JVM

Unravel the complexities of Scala's powerful type system with our deep dive into contravariance: we simplify and demystify its challenging aspects

Scala 52
article thumbnail

20+ Machine Learning Datasets & Project Ideas

KDnuggets

Upgrading your machine learning, AI, and Data Science skills requires practice. To practice, you need to develop models with a large amount of data. Finding good datasets to work with can be challenging, so this article discusses more than 20 great datasets along with machine learning project ideas for you to tackle today.

article thumbnail

Sharpening your Stream Processing Skills with Kafka Tutorials

Confluent

In the Apache Kafka® ecosystem, ksqlDB and Kafka Streams are two popular tools for building event streaming applications that are tightly integrated with Apache Kafka. While ksqlDB and Kafka Streams […].

Kafka 115
article thumbnail

Open-Sourcing riskquant, a library for quantifying risk

Netflix Tech

Netflix has a program in our Information Security department for quantifying the risk of deliberate (attacker-driven) and accidental… Continue reading on Netflix TechBlog ».

article thumbnail

How to Drive Cost Savings, Efficiency Gains, and Sustainability Wins with MES

Speaker: Nikhil Joshi, Founder & President of Snic Solutions

Is your manufacturing operation reaching its efficiency potential? A Manufacturing Execution System (MES) could be the game-changer, helping you reduce waste, cut costs, and lower your carbon footprint. Join Nikhil Joshi, Founder & President of Snic Solutions, in this value-packed webinar as he breaks down how MES can drive operational excellence and sustainability.

article thumbnail

Scaling Data Governance For Global Businesses With A Data Hub Architecture

Data Engineering Podcast

Summary Data governance is a complex endeavor, but scaling it to meet the needs of a complex or globally distributed organization requires a well considered and coherent strategy. In this episode Tim Ward describes an architecture that he has used successfully with multiple organizations to scale compliance. By treating it as a graph problem, where each hub in the network has localized control with inheritance of higher level controls it reduces overhead and provides greater flexibility.

article thumbnail

Teradata's Response to COVID-19

Teradata

How Teradata is responding to the COVID-19 crisis for the health and well-being of its employees, customers and partners.

IT 59
article thumbnail

Why Is Contravariance So Hard in Scala?

Rock the JVM

Unravel the complexities of Scala's powerful type system with our deep dive into contravariance: we simplify and demystify its challenging aspects

Scala 52
article thumbnail

50 Must-Read Free Books For Every Data Scientist in 2020

KDnuggets

In this article, we are listing down some excellent data science books which cover the wide variety of topics under Data Science.

article thumbnail

Improving the Accuracy of Generative AI Systems: A Structured Approach

Speaker: Anindo Banerjea, CTO at Civio & Tony Karrer, CTO at Aggregage

When developing a Gen AI application, one of the most significant challenges is improving accuracy. This can be especially difficult when working with a large data corpus, and as the complexity of the task increases. The number of use cases/corner cases that the system is expected to handle essentially explodes. 💥 Anindo Banerjea is here to showcase his significant experience building AI/ML SaaS applications as he walks us through the current problems his company, Civio, is solving.