A First Principles Theory of Generalization
KDnuggets
NOVEMBER 4, 2021
New research from the University of California, Berkeley sheds new light on how to quantify a neural network's knowledge.
Confluent
NOVEMBER 3, 2021
SQL has proven to be an invaluable asset for most software engineers building software applications. Yet, the world as we know it has changed dramatically since SQL was created in […].
Marc Lamberti
NOVEMBER 2, 2021
Airflow Timetable. This new concept introduced in Airflow 2.2 is going to change your way of scheduling your data pipelines. Or I would say, you’re finally going to have all the freedom and flexibility you ever dreamt of for scheduling your DAGs. What if you want to run your DAG for specific schedule intervals with “holes” in between?
Cloudera
NOVEMBER 3, 2021
Have you ever asked a data scientist if they wanted their code to run faster? You would probably get a more varied response asking if the earth is flat. It really isn’t any different from anything else in tech: faster is almost always better. One of the best ways to make a substantial improvement in processing time is to switch from CPUs to GPUs, if you haven’t already.
KDnuggets
NOVEMBER 2, 2021
ML pipeline design has undergone several evolutions in the past decade with advances in memory and processor performance, storage systems, and the increasing scale of data sets. We describe how these design patterns changed, what processes they went through, and their future direction.
Confluent
NOVEMBER 5, 2021
There’s a philosophical puzzle of the Ship of Theseus where throughout a long voyage planks in a ship are individually replaced as they begin to rot. At the end, there […].
Data Engineering Digest brings together the best content for data engineering professionals from the widest variety of industry thought leaders.
Cloudera
NOVEMBER 2, 2021
Becoming a data-driven organization is not exactly getting any easier. Businesses are flooded with ever more data. Although it is true that more data enables more insight, the effort needed to separate the wheat from the chaff grows exponentially. Doing so and truly understanding the data is more important than ever, especially when data privacy regulations are tightening.
KDnuggets
NOVEMBER 3, 2021
If you are beginning your data science journey, then you must be prepared to plan it out as a step-by-step process that will guide you from being a total newbie to getting your first job as a data scientist. These tips and educational resources should be useful for you and add confidence as you take that first big step.
Confluent
NOVEMBER 2, 2021
What will the next important category of databases look like? For decades, relational databases were the undisputed home of data. They powered everything: from websites to analytics, from customer data […].
DataKitchen
NOVEMBER 4, 2021
The post “The vast majority of data engineers are burnt out. Those working in healthcare are no exception” first appeared on DataKitchen.
Speaker: Tamara Fingerlin, Developer Advocate
In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!
Cloudera
NOVEMBER 1, 2021
Today is an exciting day for Cloudera as our Ireland Centre of Excellence (COE) in Cork has been certified as a Great Place To Work. It is an outstanding achievement that is testament to the culture of Cloudera and we’re delighted that we smashed many of the set benchmarks. To achieve certification we needed a composite score of >64.5% on the Employee Engagement Survey and Culture Audit Submission.
KDnuggets
NOVEMBER 3, 2021
This article looks at neural networks from a Bayesian perspective.
AltexSoft
OCTOBER 30, 2021
Was Nikola Tesla a scientist or engineer? How about Edison? Or Da Vinci? It’s hard to give a solid answer, right? These men didn’t stop at scientific research and ended up conceptualizing or engineering their inventions. One discipline goes hand in hand with another. In the modern world, this distinction is even more vague. Engineers are not only the ones bearing helmets and operating on construction sites.
DataKitchen
NOVEMBER 4, 2021
Data organizations often have a mix of centralized and decentralized activity. DataOps concerns itself with the complex flow of data across teams, data centers and organizational boundaries. It expands beyond tools and data architecture and views the data organization from the perspective of its processes and workflows. The DataKitchen Platform is a “process hub” that masters and optimizes those processes.
Cloudera
NOVEMBER 1, 2021
On November 11th we celebrate Veterans and Armistice Day, honoring those who have served in the military. To commemorate this special occasion, this month we will spotlight two Clouderans who have served in the military in the United States and the United Kingdom. For this week’s spotlight, I sat down with Clouderan William Dailey, who served in the United States Navy.
KDnuggets
NOVEMBER 2, 2021
Recently I decided to take the time to better understand the Python packaging ecosystem and create a project boilerplate template as an improvement over copying a directory tree and doing find and replace.
Netflix Tech
NOVEMBER 2, 2021
by Christos G. Bampis, Chao Chen, Anush K. Moorthy and Zhi Li. Introduction: Measuring video quality at scale is an essential component of the Netflix streaming pipeline. Perceptual quality measurements are used to drive video encoding optimizations, perform video codec comparisons, carry out A/B testing and optimize streaming QoE decisions, to mention a few.
DataKitchen
NOVEMBER 2, 2021
The post Battle for Data Pros Heats Up as Burnout Builds first appeared on DataKitchen.
Cloudera
NOVEMBER 5, 2021
Guest Author Roozbeh Aliabadi is CEO at ReadyAI. Our children have the right to be AI-educated so they can thrive intellectually, emotionally, and morally alongside AI. In the next decade or so, for most children, AI will be their co-workers, drivers, insurance agents, customer service reps, bank tellers, receptionists, radiologists, in short, a natural part of their lives.
KDnuggets
NOVEMBER 4, 2021
Productizing AI is an infrastructure orchestration problem. In planning your solution design, you should use continuous monitoring, retraining, and feedback to ensure stability and sustainability.
Confluent
NOVEMBER 4, 2021
Classic relational database management systems (RDBMS) distribute and organize data in a relatively static storage layer. When queries are requested, they compute on the stored data and then return results […].
Pipeline Data Engineering
NOVEMBER 4, 2021
Data engineering salon. News and interesting reads about the world of data. “Eating the Cloud from Outside In” (Shawn Wang, Developer Experience, Temporal.io): AWS is playing Chess. Cloudflare is playing Go. “Why Lightspeed invested in ClickHouse: a database built for speed” (Gaurav Gupta, VC, Lightspeed Venture Partners): $250M Series B financing of ClickHouse.
Speaker: Jay Allardyce, Deepak Vittal, Terrence Sheflin, and Mahyar Ghasemali
As we look ahead to 2025, business intelligence and data analytics are set to play pivotal roles in shaping success. Organizations are already starting to face a host of transformative trends as the year comes to a close, including the integration of AI in data analytics, an increased emphasis on real-time data insights, and the growing importance of user experience in BI solutions.
Rockset
NOVEMBER 5, 2021
Summary: DataBrain, a SaaS company, was using PostgreSQL through Amazon RDS to land and query incoming customer data. However, PostgreSQL couldn’t scale, quickly ingest schemaless data, or efficiently run analytics as DataBrain’s data grew. Plus, incoming customer data had a dynamic schema, making it painful and expensive for DataBrain to clean the data for PostgreSQL and run queries.
KDnuggets
NOVEMBER 5, 2021
There remain critical challenges in machine learning that, if left unresolved, could lead to unintended consequences and unsafe use of AI in the future. As this is an important and active area of research, roadmaps are being developed to help guide continued ML research and use toward meaningful and robust applications.
ProjectPro
NOVEMBER 3, 2021
If you are tired of googling how to become a freelance data scientist, you need to relax because your search is finally over. In this blog, we have presented a step-by-step guide for becoming a freelance data scientist and a quick and easy way of getting hired as a freelance data scientist. So, take a backseat and simply continue reading our blog. With COVID-19 restrictions forcing companies to lay off their employees, millions of individuals who lost their jobs decided to navigate a freelance
Zalando Engineering
NOVEMBER 3, 2021
The business landscape in Zalando is growing every day. This continuous growth implies that we need to be able to cope with an ever-changing environment. Everyone with experience in software development knows that dealing with changes is a challenging problem. Especially, when the software is already working in production. Changing the software in production is like changing the tires on a car while it is still moving.
Speaker: Nikhil Joshi, Founder & President of Snic Solutions
Is your manufacturing operation reaching its efficiency potential? A Manufacturing Execution System (MES) could be the game-changer, helping you reduce waste, cut costs, and lower your carbon footprint. Join Nikhil Joshi, Founder & President of Snic Solutions, in this value-packed webinar as he breaks down how MES can drive operational excellence and sustainability.
Rockset
NOVEMBER 4, 2021
Apache Spark is an open-source project that was started at UC Berkeley AMPLab. It has an in-memory computing framework that allows it to process data workloads in batch and in real-time. Even though Spark is written in Scala, you can interact with Spark from multiple languages such as Scala, Python, and Java. Here are some examples of the things you can do in your apps with Apache Spark: build continuous ETL pipelines for stream processing, run SQL BI and analytics, do machine learning, and much more!
KDnuggets
NOVEMBER 3, 2021
Read this article on assessing a model's performance in a broader context.
Teradata
NOVEMBER 3, 2021
Banks’ reliance on a handful of global cloud providers presents regulators with a new headache. Find out more.
DataKitchen
NOVEMBER 2, 2021
data.world's Bryon Jacob & DataKitchen's Chris Bergh discuss why Data Engineers are burnt out & how data teams can fix & prevent burnout with DataOps. The post 10 Tips to Overcome Data Engineer Burnout first appeared on DataKitchen.