Sat.Oct 30, 2021 - Fri.Nov 05, 2021

A First Principles Theory of Generalization

KDnuggets

New research from the University of California, Berkeley sheds light on how to quantify a neural network's knowledge.

The Future of SQL: Databases Meet Stream Processing

Confluent

SQL has proven to be an invaluable asset for most software engineers building software applications. Yet, the world as we know it has changed dramatically since SQL was created in […].

Airflow Timetable: Schedule your DAGs like never before

Marc Lamberti

Airflow Timetable, a new concept introduced in Airflow 2.2, is going to change the way you schedule your data pipelines. In fact, you're finally going to have all the freedom and flexibility you ever dreamt of for scheduling your DAGs. What if you want to run your DAG for specific schedule intervals with “holes” in between?
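
To make the idea of schedule intervals with “holes” concrete, here is a minimal plain-Python sketch. It does not use Airflow's actual Timetable interface; the function name `intervals_with_holes` and the choice to skip weekends are hypothetical, picked only to show how a schedule can emit consecutive daily data intervals while deliberately leaving gaps.

```python
from datetime import datetime, timedelta
from typing import Iterator, Tuple


def intervals_with_holes(
    start: datetime,
    end: datetime,
    skip_weekdays: Tuple[int, ...] = (5, 6),  # Monday is 0, so 5 and 6 are Saturday and Sunday
) -> Iterator[Tuple[datetime, datetime]]:
    """Hypothetical helper: yield daily (interval_start, interval_end) pairs,
    leaving a hole on every weekday listed in skip_weekdays."""
    current = start
    while current < end:
        following = current + timedelta(days=1)
        # Only emit an interval when its start day is not one of the "holes".
        if current.weekday() not in skip_weekdays:
            yield current, following
        current = following
```

Run from Friday, October 29, 2021 up to Tuesday, November 2, 2021, the sketch yields an interval for Friday and one for Monday, skipping the weekend in between. In Airflow 2.2 itself, this kind of logic would live in a custom subclass of Airflow's `Timetable` class rather than in a standalone function.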

Switching from CPUs to GPUs for NYC Taxi Fare Predictions with NVIDIA RAPIDS

Cloudera

Have you ever asked a data scientist if they wanted their code to run faster? You would probably get a more varied response asking if the earth is flat. As with anything else in tech, faster is almost always better. One of the best ways to substantially improve processing time is to switch from CPUs to GPUs, if you haven't already.

Apache Airflow® Best Practices for ETL and ELT Pipelines

Whether you’re creating complex dashboards or fine-tuning large language models, your data must be extracted, transformed, and loaded. ETL and ELT pipelines form the foundation of any data product, and Airflow is the open-source data orchestrator specifically designed for moving and transforming data in ETL and ELT pipelines. This eBook covers: An overview of ETL vs.

Design Patterns for Machine Learning Pipelines

KDnuggets

ML pipeline design has undergone several evolutions in the past decade with advances in memory and processor performance, storage systems, and the increasing scale of data sets. We describe how these design patterns changed, what processes they went through, and their future direction.

How Do You Change a Never-Ending Query?

Confluent

There’s a philosophical puzzle, the Ship of Theseus, in which the planks of a ship are replaced one by one over a long voyage as they begin to rot. At the end, there […].

Accelerate Insight with Proactive Data Governance Practices

Cloudera

Becoming a data-driven organization is not exactly getting any easier. Businesses are flooded with ever more data. Although it is true that more data enables more insight, the effort needed to separate the wheat from the chaff grows exponentially. Doing so and truly understanding the data is more important than ever, especially when data privacy regulations are tightening.

Data Scientist Career Path from Novice to First Job

KDnuggets

If you are beginning your data science journey, then you must be prepared to plan it out as a step-by-step process that will guide you from being a total newbie to getting your first job as a data scientist. These tips and educational resources should be useful for you and add confidence as you take that first big step.

Readings in Streaming Database Systems

Confluent

What will the next important category of databases look like? For decades, relational databases were the undisputed home of data. They powered everything: from websites to analytics, from customer data […].

The vast majority of data engineers are burnt out. Those working in healthcare are no exception

DataKitchen

The post The vast majority of data engineers are burnt out. Those working in healthcare are no exception first appeared on DataKitchen.

Apache Airflow®: The Ultimate Guide to DAG Writing

Speaker: Tamara Fingerlin, Developer Advocate

In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!

Cloudera Ireland Center of Excellence Certified as a Great Place to Work

Cloudera

Today is an exciting day for Cloudera, as our Ireland Centre of Excellence (COE) in Cork has been certified as a Great Place To Work. It is an outstanding achievement and a testament to the culture of Cloudera, and we’re delighted that we smashed many of the set benchmarks. To achieve certification, we needed a composite score of more than 64.5% on the Employee Engagement Survey and Culture Audit Submission.

Neural Networks from a Bayesian Perspective

KDnuggets

This article looks at neural networks from a Bayesian perspective.

Data Scientist vs Data Engineer: Differences and Why You Need Both

AltexSoft

Was Nikola Tesla a scientist or an engineer? How about Edison? Or Da Vinci? It’s hard to give a solid answer, right? These men didn’t stop at scientific research; they went on to conceptualize or engineer their inventions. One discipline goes hand in hand with the other. In the modern world, the distinction is even blurrier. Engineers are no longer only the people wearing hard hats and working on construction sites.

Centralize Your Data Processes With a DataOps Process Hub

DataKitchen

Data organizations often have a mix of centralized and decentralized activity. DataOps concerns itself with the complex flow of data across teams, data centers and organizational boundaries. It expands beyond tools and data architecture and views the data organization from the perspective of its processes and workflows. The DataKitchen Platform is a “process hub” that masters and optimizes those processes.

Optimizing The Modern Developer Experience with Coder

Many software teams have migrated their testing and production workloads to the cloud, yet development environments often remain tied to outdated local setups, limiting efficiency and growth. This is where Coder comes in. In our 101 Coder webinar, you’ll explore how cloud-based development environments can unlock new levels of productivity. Discover how to transition from local setups to a secure, cloud-powered ecosystem with ease.

#ClouderaLife Spotlight: William Dailey, Senior Technical Instructor

Cloudera

On November 11th we celebrate Veterans Day and Armistice Day, honoring those who have served in the military. To commemorate this special occasion, this month we will spotlight two Clouderans who have served in the military in the United States and the United Kingdom. For this week’s spotlight, I sat down with Clouderan William Dailey, who served in the United States Navy.

ORDAINED: The Python Project Template

KDnuggets

Recently I decided to take the time to better understand the Python packaging ecosystem and create a project boilerplate template as an improvement over copying a directory tree and doing find and replace.

Netflix Video Quality at Scale with Cosmos Microservices

Netflix Tech

by Christos G. Bampis, Chao Chen, Anush K. Moorthy and Zhi Li

Introduction

Measuring video quality at scale is an essential component of the Netflix streaming pipeline. Perceptual quality measurements are used to drive video encoding optimizations, perform video codec comparisons, carry out A/B testing and optimize streaming QoE decisions, to name a few.

Battle for Data Pros Heats Up as Burnout Builds

DataKitchen

The post Battle for Data Pros Heats Up as Burnout Builds first appeared on DataKitchen.

15 Modern Use Cases for Enterprise Business Intelligence

Large enterprises face unique challenges in optimizing their Business Intelligence (BI) output due to the sheer scale and complexity of their operations. Unlike smaller organizations, where basic BI features and simple dashboards might suffice, enterprises must manage vast amounts of data from diverse sources. What are the top modern BI use cases for enterprise businesses to help you get a leg up on the competition?

A Fresh Squeeze on Data

Cloudera

Guest Author Roozbeh Aliabadi is CEO at ReadyAI. Our children have the right to be AI-educated so they can thrive intellectually, emotionally, and morally alongside AI. In the next decade or so, for most children, AI will be their co-workers, drivers, insurance agents, customer service reps, bank tellers, receptionists, radiologists, in short, a natural part of their lives.

AI Infinite Training & Maintaining Loop

KDnuggets

Productizing AI is an infrastructure orchestration problem. In planning your solution design, you should use continuous monitoring, retraining, and feedback to ensure stability and sustainability.

4 Key Design Principles and Guarantees of Streaming Databases

Confluent

Classic relational database management systems (RDBMS) distribute and organize data in a relatively static storage layer. When queries are requested, they compute on the stored data and then return results […].
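
The contrast in this teaser, computing results only when a query is requested versus keeping a query's result current as data arrives, can be illustrated with a toy in-memory example. This is a minimal sketch under our own assumptions: `query_on_demand` and `IncrementalCount` are hypothetical names, and neither reflects how any particular database (relational or streaming) is implemented internally.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def query_on_demand(stored_rows: Iterable[str]) -> Dict[str, int]:
    """RDBMS-style toy: scan the stored data every time the query is requested."""
    counts: Dict[str, int] = defaultdict(int)
    for key in stored_rows:
        counts[key] += 1
    return dict(counts)


class IncrementalCount:
    """Streaming-style toy: the query result is updated as each event arrives."""

    def __init__(self) -> None:
        self._counts: Dict[str, int] = defaultdict(int)

    def on_event(self, key: str) -> Tuple[str, int]:
        self._counts[key] += 1
        # Emit the changed row so downstream consumers can react immediately.
        return key, self._counts[key]

    def snapshot(self) -> Dict[str, int]:
        # Reading the current result needs no rescan of past events.
        return dict(self._counts)
```

Both approaches end up with the same counts for the same events; the difference is when the work happens. The incremental version pays a small cost per event and can emit each change as it occurs, while the on-demand version pays the full scan cost on every query.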

The Data Janitor Letters - October 2021

Pipeline Data Engineering

Data engineering salon: news and interesting reads about the world of data.

Eating the Cloud from Outside In (Shawn Wang, Developer Experience, Temporal.io): AWS is playing Chess. Cloudflare is playing Go.

Why Lightspeed invested in ClickHouse: a database built for speed (Gaurav Gupta, VC, Lightspeed Venture Partners): $250M Series B financing of ClickHouse.

Prepare Now: 2025's Must-Know Trends For Product And Data Leaders

Speaker: Jay Allardyce, Deepak Vittal, Terrence Sheflin, and Mahyar Ghasemali

As we look ahead to 2025, business intelligence and data analytics are set to play pivotal roles in shaping success. Organizations are already starting to face a host of transformative trends as the year comes to a close, including the integration of AI in data analytics, an increased emphasis on real-time data insights, and the growing importance of user experience in BI solutions.

Case Study: Powering Customer-Facing Dashboards at Scale Using Rockset with PostgreSQL at DataBrain

Rockset

Summary: DataBrain, a SaaS company, was using PostgreSQL through Amazon RDS to land and query incoming customer data. However, PostgreSQL couldn’t scale, quickly ingest schemaless data, or efficiently run analytics as DataBrain’s data grew. Plus, incoming customer data had a dynamic schema, making it painful and expensive for DataBrain to clean the data for PostgreSQL and run queries.

Machine Learning Safety: Unsolved Problems

KDnuggets

There remain critical challenges in machine learning that, if left unresolved, could lead to unintended consequences and unsafe use of AI in the future. Because this is an important and active area of research, roadmaps are being developed to help guide continued ML research and use toward meaningful and robust applications.

7 Step Guide to Become a Freelance Data Scientist

ProjectPro

If you are tired of googling how to become a freelance data scientist, you can relax, because your search is finally over. In this blog, we present a step-by-step guide to becoming a freelance data scientist and a quick and easy way of getting hired as one. So sit back and simply continue reading. With COVID-19 restrictions forcing companies to lay off their employees, millions of individuals who lost their jobs decided to navigate a freelance

Parallel Run Pattern - A Migration Technique in Microservices Architecture

Zalando Engineering

The business landscape at Zalando is growing every day. This continuous growth means that we need to be able to cope with an ever-changing environment. Everyone with experience in software development knows that dealing with change is a challenging problem, especially when the software is already running in production. Changing software in production is like changing the tires on a car while it is still moving.
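
The core idea of a parallel run, sending the same request to both the old and the new implementation, serving only the old result, and recording any disagreement, can be sketched in a few lines. This is an illustrative sketch, not Zalando's implementation: the `parallel_run` function and its `mismatches` list are hypothetical, and the candidate is called sequentially here for simplicity, whereas a production setup would typically run it asynchronously so it cannot add latency.

```python
import logging
from typing import Any, Callable, List, Tuple

logger = logging.getLogger("parallel_run")


def parallel_run(
    legacy: Callable[[Any], Any],
    candidate: Callable[[Any], Any],
    request: Any,
    mismatches: List[Tuple[Any, Any, Any]],
) -> Any:
    """Hypothetical helper: serve the legacy result while shadow-testing the candidate."""
    legacy_result = legacy(request)
    try:
        candidate_result = candidate(request)
    except Exception as exc:  # a failing candidate must never break the live response
        logger.warning("candidate failed for %r: %s", request, exc)
        mismatches.append((request, legacy_result, exc))
        return legacy_result
    if candidate_result != legacy_result:
        logger.info(
            "mismatch for %r: legacy=%r candidate=%r",
            request, legacy_result, candidate_result,
        )
        mismatches.append((request, legacy_result, candidate_result))
    # Callers only ever see the trusted legacy answer during the migration.
    return legacy_result
```

Once the recorded mismatches stay empty for long enough on real traffic, the new implementation can be promoted and the legacy one retired.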

How to Drive Cost Savings, Efficiency Gains, and Sustainability Wins with MES

Speaker: Nikhil Joshi, Founder & President of Snic Solutions

Is your manufacturing operation reaching its efficiency potential? A Manufacturing Execution System (MES) could be the game-changer, helping you reduce waste, cut costs, and lower your carbon footprint. Join Nikhil Joshi, Founder & President of Snic Solutions, in this value-packed webinar as he breaks down how MES can drive operational excellence and sustainability.

Getting Started with Apache Spark, S3 and Rockset for Real-Time Analytics

Rockset

Apache Spark is an open-source project that was started at UC Berkeley's AMPLab. It provides an in-memory computing framework that can process data workloads both in batch and in real time. Even though Spark is written in Scala, you can interact with it from multiple languages, such as Scala, Python, and Java. Here are some examples of what you can do in your apps with Apache Spark: build continuous ETL pipelines for stream processing, run SQL BI and analytics, do machine learning, and much more!

Visual Scoring Techniques for Classification Models

KDnuggets

Read this article on assessing a model's performance in a broader context.

If Facebook Can Go Down, What About Your Cloud Provider?

Teradata

Banks’ reliance on a handful of global cloud providers presents regulators with a new headache. Find out more.

10 Tips to Overcome Data Engineer Burnout

DataKitchen

data.world's Bryon Jacob & DataKitchen's Chris Bergh discuss why Data Engineers are burnt out & how data teams can fix & prevent burnout with DataOps. The post 10 Tips to Overcome Data Engineer Burnout first appeared on DataKitchen.

The Cloud Development Environment Adoption Report

Cloud Development Environments (CDEs) are changing how software teams work by moving development to the cloud. Our Cloud Development Environment Adoption Report gathers insights from 223 developers and business leaders, uncovering key trends in CDE adoption. With 66% of large organizations already using CDEs, these platforms are quickly becoming essential to modern development practices.