The 6 Python Machine Learning Tools Every Data Scientist Should Know About
KDnuggets
MAY 20, 2022
Let's look at six must-have tools every data scientist should use.
Start Data Engineering
MAY 18, 2022
Contents: 1. Introduction; 2. Set up; 3. Reproducibility (3.1. Docker; 3.2. Docker Compose); 4. Developer ergonomics (4.1. Formatting and testing; 4.2. Makefile); 5. Conclusion; 6. Further reading; 7. References. 1. Introduction: Data systems usually involve multiple systems, which makes local development challenging.
Confluent
MAY 17, 2022
I’m proud to announce the release of Apache Kafka 3.2.0 on behalf of the Apache Kafka® community. The 3.2.0 release contains many new features and improvements. This blog will highlight […].
Azure Data Engineering
MAY 15, 2022
When it comes to transforming structured data stored in a database (e.g., applying business logic or standardization), SQL is the most convenient and fit-for-purpose option. Stored procedures provide a way to store the transformation logic as a set of SQL statements that can be re-executed as pre-compiled code. The Stored Procedure Activity in Data Factory provides a simple and convenient way to execute stored procedures.
KDnuggets
MAY 19, 2022
Read the best books on Programming, Statistics, Data Engineering, Web Scraping, Data Analytics, Business Intelligence, Data Applications, Data Management, Big Data, and Cloud Architecture.
Data Engineering Podcast
MAY 15, 2022
Summary: Industrial applications are one of the primary adopters of Internet of Things (IoT) technologies, with business-critical operations being informed by data collected across a fleet of sensors. Vopak is a business that manages storage and distribution of a variety of liquids that are critical to the modern world, and they have recently launched a new platform to gain more utility from their industrial sensors.
Data Engineering Digest brings together the best content for data engineering professionals from the widest variety of industry thought leaders.
Teradata
MAY 19, 2022
Faced with persistent supply chain disruption, automotive companies need a new approach to planning. Find out more.
KDnuggets
MAY 16, 2022
This guide will help aspiring data scientists and machine learning engineers gain better knowledge and experience. I will list different types of machine learning algorithms, which can be used with both Python and R.
Data Engineering Podcast
MAY 15, 2022
Summary: Designing a data platform is a complex and iterative undertaking that requires accounting for many conflicting needs. Designing a platform that relies on a data lake as its central architectural tenet adds further layers of difficulty. Srivatsan Sridharan has had the opportunity to design, build, and run data lake platforms for both Yelp and Robinhood, with many valuable lessons learned from each experience.
Cloudera
MAY 19, 2022
From fashion to data flow, in this #ClouderaLife Spotlight Margot talks about her career transition from fashion design to cloud computing and her co-founding of Cloudera’s Asian American and Pacific Islander community Employee Resource Group amid the racial tensions of 2021. It started with feeling stuck and ended with a brand-new career (BTW, lots of hard work in the middle).
Speaker: Tamara Fingerlin, Developer Advocate
Apache Airflow® 3.0, the most anticipated Airflow release yet, officially launched this April. As the de facto standard for data orchestration, Airflow is trusted by over 77,000 organizations to power everything from advanced analytics to production AI and MLOps. With the 3.0 release, the top-requested features from the community were delivered, including a revamped UI for easier navigation, stronger security, and greater flexibility to run tasks anywhere at any time.
Confluent
MAY 18, 2022
There’s an increasing need for businesses to act intelligently and in real time to win in today’s digital-first world. To achieve this, forward-thinking companies are modernizing their data infrastructure with […].
KDnuggets
MAY 16, 2022
This post provides a concise overview of 18 natural language processing terms, intended as an entry point for the beginner looking for some orientation on the topic.
dbt Developer Hub
MAY 18, 2022
If you're reading this article, it looks like you're wondering how you can better optimize your Redshift queries - and you're probably wondering how you can do that in conjunction with dbt. In order to properly optimize, we need to understand why we might be seeing issues with our performance and how we can fix these with dbt sort and dist configurations.
Emeritus
MAY 19, 2022
With businesses increasingly relying on data for their day-to-day operations, the role of a data engineer has emerged as one of the most sought-after professions in the industry. But what does a data engineer do exactly? And why is it in demand? According to McKinsey, by 2025, smart workflows and seamless interactions between humans and… The post What Does a Data Engineer do?
Speaker: Alex Salazar, CEO & Co-Founder @ Arcade | Nate Barbettini, Founding Engineer @ Arcade | Tony Karrer, Founder & CTO @ Aggregage
There’s a lot of noise surrounding the ability of AI agents to connect to your tools, systems and data. But building an AI application into a reliable, secure workflow agent isn’t as simple as plugging in an API. As an engineering leader, it can be challenging to make sense of this evolving landscape, but agent tooling provides such high value that it’s critical we figure out how to move forward.
Confluent
MAY 18, 2022
With more data being produced in real time by many systems and devices than ever before, it is critical to be able to process it in real time and get […].
KDnuggets
MAY 19, 2022
What should you be looking for in an IDE? Find out here.
Big Data Tools
MAY 19, 2022
Long time no see! Sorry about the silence, but luckily we’re back. Hi, I’m Pasha Finkelshteyn, and I’ll be your guide through this month’s news. I’ll offer my impressions of recent developments in the data engineering space and highlight new ideas from the wider community. If you think I missed something worthwhile, catch me on Twitter and suggest a topic, link, or anything else you want to see.
Emeritus
MAY 19, 2022
Data is so ubiquitous and valuable that it is touted as the new currency. From data analytics to data engineering, everything is data-centric. As Carly Fiorina, the former Chief Executive Officer of Hewlett Packard, said, “The goal is to turn data into information, and information into insight.” Data allows leaders to make informed decisions that… The post What Does a Data Engineer Do and How Can You Become One?
Speaker: Andrew Skoog, Founder of MachinistX & President of Hexis Representatives
Manufacturing is evolving, and the right technology can empower—not replace—your workforce. Smart automation and AI-driven software are revolutionizing decision-making, optimizing processes, and improving efficiency. But how do you implement these tools with confidence and ensure they complement human expertise rather than override it? Join industry expert Andrew Skoog as he explores how manufacturers can leverage automation to enhance operations, streamline workflows, and make smarter, data-dri
Rockset
MAY 17, 2022
This is the fourth post in a series by Rockset's CTO Dhruba Borthakur on Designing the Next Generation of Data Systems for Real-Time Analytics. We'll be publishing more posts in the series in the near future, so subscribe to our blog so you don't miss them! Posts published so far in the series: Why Mutability Is Essential for Real-Time Data Analytics; Handling Out-of-Order Data in Real-Time Analytics Applications; Handling Bursty Traffic in Real-Time Analytics Applications; SQL and Complex Queries
KDnuggets
MAY 18, 2022
Complete guide and blog post series on IT Operations Management with AIOps. Using AI and Machine Learning to manage IT complexity to deliver world class IT service while keeping the lights on.
Rock the JVM
MAY 16, 2022
Scala's general type projections are considered unsound and were removed in Scala 3: discover what this means and how it affects your code
Monte Carlo
MAY 16, 2022
Data lineage isn’t new, but automation has finally made it accessible and scalable—to a certain extent. In the old days (way back in the mid-2010s), lineage happened through a lot of manual work. This involved identifying data assets, tracking them to their ingestion sources, documenting those sources, mapping the path of data as it moved through various pipelines and stages of transformation, and pinpointing where the data was served up in dashboards and reports.
KDnuggets
MAY 18, 2022
Here’s how you can use your data skills to generate side income from home.
dbt Developer Hub
MAY 16, 2022
Analytics engineers (AEs) are constantly navigating through the names of the models in their project, so naming is important for maintainability in your project in the way you access it and work within it. By default, dbt will use your model file name as the view or table name in the database. But this means the name has a life outside of dbt and supports the many end users who will potentially never know about dbt and where this data came from, but still access the database objects in the datab
KDnuggets
MAY 20, 2022
Most companies haven’t seen ROI from machine learning since the benefit is only realized when the models are in production. Here’s how to make sure your ML project works.
Speaker: Tamara Fingerlin, Developer Advocate
In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!
KDnuggets
MAY 16, 2022
A simple guide to reinforcement learning for a complete beginner. The blog includes definitions with examples, real-life applications, key concepts, and various types of learning resources.
KDnuggets
MAY 20, 2022
We give a taxonomy of trustworthy GNNs covering privacy, robustness, fairness, and explainability. For each aspect, we group existing works into categories, give a general framework for each category, and more.
KDnuggets
MAY 17, 2022
Is the data warehouse broken? Is the "immutable data warehouse" the right path for your data team? Learn more here.
KDnuggets
MAY 17, 2022
As a data scientist, you might have a great portfolio of technical skills, but if you can’t communicate effectively, you won’t be able to convey your ideas clearly during virtual meetings.
Speaker: Ben Epstein, Stealth Founder & CTO | Tony Karrer, Founder & CTO, Aggregage
When tasked with building a fundamentally new product line with deeper insights than previously achievable for a high-value client, Ben Epstein and his team faced a significant challenge: how to harness LLMs to produce consistent, high-accuracy outputs at scale. In this new session, Ben will share how he and his team engineered a system (based on proven software engineering approaches) that employs reproducible test variations (via temperature 0 and fixed seeds), and enables non-LLM evaluation m