A First Principles Theory of Generalization
KDnuggets
NOVEMBER 4, 2021
New research from the University of California, Berkeley sheds new light on how to quantify neural networks’ knowledge.
Confluent
NOVEMBER 3, 2021
SQL has proven to be an invaluable asset for most software engineers building software applications. Yet, the world as we know it has changed dramatically since SQL was created in […].
Marc Lamberti
NOVEMBER 2, 2021
Airflow Timetable. This new concept introduced in Airflow 2.2 is going to change your way of scheduling your data pipelines. Or I would say, you’re finally going to have all the freedom and flexibility you ever dreamt of for scheduling your DAGs. What if you want to run your DAG for specific schedule intervals with “holes” in between?
Cloudera
NOVEMBER 3, 2021
Have you ever asked a data scientist if they wanted their code to run faster? You would probably get a more varied response asking if the earth is flat. It really isn’t any different from anything else in tech: faster is almost always better. One of the best ways to make a substantial improvement in processing time is to switch from CPUs to GPUs, if you haven’t already.
KDnuggets
NOVEMBER 2, 2021
ML pipeline design has undergone several evolutions in the past decade with advances in memory and processor performance, storage systems, and the increasing scale of data sets. We describe how these design patterns changed, what processes they went through, and their future direction.
Confluent
NOVEMBER 5, 2021
There’s a philosophical puzzle of the Ship of Theseus where throughout a long voyage planks in a ship are individually replaced as they begin to rot. At the end, there […].
Data Engineering Digest brings together the best content for data engineering professionals from the widest variety of industry thought leaders.
DataKitchen
NOVEMBER 4, 2021
The post “The vast majority of data engineers are burnt out. Those working in healthcare are no exception” first appeared on DataKitchen.
KDnuggets
NOVEMBER 2, 2021
Recently I decided to take the time to better understand the Python packaging ecosystem and create a project boilerplate template as an improvement over copying a directory tree and doing find and replace.
Confluent
NOVEMBER 2, 2021
What will the next important category of databases look like? For decades, relational databases were the undisputed home of data. They powered everything: from websites to analytics, from customer data […].
AltexSoft
OCTOBER 30, 2021
Was Nikola Tesla a scientist or engineer? How about Edison? Or Da Vinci? It’s hard to give a solid answer, right? These men didn’t stop at scientific research and ended up conceptualizing or engineering their inventions. One discipline goes hand in hand with another. In the modern world, this distinction is even more vague. Engineers are not only the ones wearing helmets and operating on construction sites.
Speaker: Tamara Fingerlin, Developer Advocate
Apache Airflow® 3.0, the most anticipated Airflow release yet, officially launched this April. As the de facto standard for data orchestration, Airflow is trusted by over 77,000 organizations to power everything from advanced analytics to production AI and MLOps. With the 3.0 release, the top-requested features from the community were delivered, including a revamped UI for easier navigation, stronger security, and greater flexibility to run tasks anywhere at any time.
DataKitchen
NOVEMBER 4, 2021
Data organizations often have a mix of centralized and decentralized activity. DataOps concerns itself with the complex flow of data across teams, data centers and organizational boundaries. It expands beyond tools and data architecture and views the data organization from the perspective of its processes and workflows. The DataKitchen Platform is a “process hub” that masters and optimizes those processes.
KDnuggets
NOVEMBER 5, 2021
There remain critical challenges in machine learning that, if left unresolved, could lead to unintended consequences and unsafe use of AI in the future. As an important and active area of research, roadmaps are being developed to help guide continued ML research and use toward meaningful and robust applications.
Cloudera
NOVEMBER 2, 2021
Becoming a data-driven organization is not exactly getting any easier. Businesses are flooded with ever more data. Although it is true that more data enables more insight, the effort needed to separate the wheat from the chaff grows exponentially. Doing so and truly understanding the data is more important than ever, especially when data privacy regulations are tightening.
Netflix Tech
NOVEMBER 2, 2021
by Christos G. Bampis, Chao Chen, Anush K. Moorthy and Zhi Li. Introduction: Measuring video quality at scale is an essential component of the Netflix streaming pipeline. Perceptual quality measurements are used to drive video encoding optimizations, perform video codec comparisons, carry out A/B testing and optimize streaming QoE decisions, to mention a few.
Speaker: Alex Salazar, CEO & Co-Founder @ Arcade | Nate Barbettini, Founding Engineer @ Arcade | Tony Karrer, Founder & CTO @ Aggregage
There’s a lot of noise surrounding the ability of AI agents to connect to your tools, systems and data. But building an AI application into a reliable, secure workflow agent isn’t as simple as plugging in an API. As an engineering leader, it can be challenging to make sense of this evolving landscape, but agent tooling provides such high value that it’s critical we figure out how to move forward.
DataKitchen
NOVEMBER 2, 2021
The post “Battle for Data Pros Heats Up as Burnout Builds” first appeared on DataKitchen.
KDnuggets
NOVEMBER 3, 2021
If you are beginning your data science journey, then you must be prepared to plan it out as a step-by-step process that will guide you from being a total newbie to getting your first job as a data scientist. These tips and educational resources should be useful for you and add confidence as you take that first big step.
Cloudera
NOVEMBER 1, 2021
Today is an exciting day for Cloudera as our Ireland Centre of Excellence (COE) in Cork has been certified as a Great Place To Work. It is an outstanding achievement that is testament to the culture of Cloudera and we’re delighted that we smashed many of the set benchmarks. To achieve certification we needed a composite score of >64.5% on the Employee Engagement Survey and Culture Audit Submission.
Confluent
NOVEMBER 4, 2021
Classic relational database management systems (RDBMS) distribute and organize data in a relatively static storage layer. When queries are requested, they compute on the stored data and then return results […].
Speaker: Andrew Skoog, Founder of MachinistX & President of Hexis Representatives
Manufacturing is evolving, and the right technology can empower—not replace—your workforce. Smart automation and AI-driven software are revolutionizing decision-making, optimizing processes, and improving efficiency. But how do you implement these tools with confidence and ensure they complement human expertise rather than override it? Join industry expert Andrew Skoog as he explores how manufacturers can leverage automation to enhance operations, streamline workflows, and make smarter, data-dri
Rockset
NOVEMBER 5, 2021
Summary: DataBrain, a SaaS company, was using PostgreSQL through Amazon RDS to land and query incoming customer data. However, PostgreSQL couldn’t scale, quickly ingest schemaless data, or efficiently run analytics as DataBrain’s data grew. Plus, incoming customer data had a dynamic schema, making it painful and expensive for DataBrain to clean the data for PostgreSQL and run queries.
KDnuggets
NOVEMBER 4, 2021
Productizing AI is an infrastructure orchestration problem. In planning your solution design, you should use continuous monitoring, retraining, and feedback to ensure stability and sustainability.
Cloudera
NOVEMBER 1, 2021
On November 11th we celebrate Veterans and Armistice Day honoring those who have served in the military. To commemorate this special occasion, this month, we will spotlight two Clouderans who have served in the military both in the United States and the United Kingdom. For this week’s spotlight, I sat down with Clouderan William Dailey who served in the United States Navy.
Pipeline Data Engineering
NOVEMBER 4, 2021
Data engineering salon. News and interesting reads about the world of data.
“Eating the Cloud from Outside In” (Shawn Wang, Developer Experience, Temporal.io): AWS is playing Chess. Cloudflare is playing Go.
“Why Lightspeed invested in ClickHouse: a database built for speed” (Gaurav Gupta, VC, Lightspeed Venture Partners): $250M Series B financing of ClickHouse.
Rockset
NOVEMBER 4, 2021
Apache Spark is an open-source project that was started at UC Berkeley AMPLab. It has an in-memory computing framework that allows it to process data workloads in batch and in real time. Even though Spark is written in Scala, you can interact with it from multiple languages, such as Scala, Python, and Java. Here are some examples of the things you can do in your apps with Apache Spark: build continuous ETL pipelines for stream processing, run SQL BI and analytics, do machine learning, and much more!
KDnuggets
NOVEMBER 3, 2021
This article looks at neural networks from a Bayesian perspective.
Cloudera
NOVEMBER 5, 2021
Guest Author Roozbeh Aliabadi is CEO at ReadyAI. Our children have the right to be AI-educated so they can thrive intellectually, emotionally, and morally alongside AI. In the next decade or so, for most children, AI will be their co-workers, drivers, insurance agents, customer service reps, bank tellers, receptionists, radiologists, in short, a natural part of their lives.
ProjectPro
NOVEMBER 3, 2021
If you are tired of googling how to become a freelance data scientist, you can relax because your search is finally over. In this blog, we present a step-by-step guide to becoming a freelance data scientist and a quick and easy way of getting hired as one. So, take a backseat and simply continue reading our blog. With COVID-19 restrictions forcing companies to lay off their employees, millions of individuals who lost their jobs decided to navigate a freelance
Speaker: Tamara Fingerlin, Developer Advocate
In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!
Zalando Engineering
NOVEMBER 3, 2021
The business landscape at Zalando is growing every day. This continuous growth implies that we need to be able to cope with an ever-changing environment. Everyone with experience in software development knows that dealing with changes is a challenging problem, especially when the software is already working in production. Changing the software in production is like changing the tires on a car while it is still moving.
KDnuggets
NOVEMBER 2, 2021
Machine Learning vs NLP vs Data Engineer vs Data Scientist, and what it means to be in each role.
Teradata
NOVEMBER 3, 2021
Banks’ reliance on a handful of global cloud providers presents regulators with a new headache. Find out more.
Rockset
NOVEMBER 2, 2021
Rockset’s new partner integrations with leading reverse ETL platforms Census, Hightouch, and Omnata will enable everyday business tools to consume real-time customer insights seamlessly from Rockset.
The World is Moving to Real-Time Data: The scope of data that is generated and collected throughout an organization is growing exponentially. This makes it difficult for leaders who are tasked with organizing and managing all this information.
Speaker: Ben Epstein, Stealth Founder & CTO | Tony Karrer, Founder & CTO, Aggregage
When tasked with building a fundamentally new product line with deeper insights than previously achievable for a high-value client, Ben Epstein and his team faced a significant challenge: how to harness LLMs to produce consistent, high-accuracy outputs at scale. In this new session, Ben will share how he and his team engineered a system (based on proven software engineering approaches) that employs reproducible test variations (via temperature 0 and fixed seeds), and enables non-LLM evaluation m