21 Cheat Sheets for Data Science Interviews
KDnuggets
JUNE 1, 2022
This article rounds up the best data science cheat sheets from around the internet, so you don’t have to search for them yourself.
Azure Data Engineering
MAY 28, 2022
ARM (Azure Resource Manager) templates make it easy to manage deployments for Data Factory. When we connect Data Factory to a source control repository (e.g. GitHub or Azure DevOps Git), the data factory, along with all its artefacts (pipelines, datasets, linked services, etc.), is saved in the repository in the form of ARM templates. We can then create DevOps pipelines that manage deployments by overriding the template parameters for the production environment.
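The parameter-override step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the article's own method: the helper `override_parameters`, the factory names, and the linked-service parameter name are all hypothetical, and a real release pipeline would normally pass the overrides to its deployment task rather than rewrite the file by hand. The only thing taken as given is the standard shape of an ARM parameter file, where each entry under `parameters` wraps its setting in a `value` key.

```python
import copy
import json


def override_parameters(parameter_file: dict, overrides: dict) -> dict:
    """Return a copy of an ARM parameter file with selected values replaced.

    Hypothetical helper for illustration; the input dict is left untouched.
    """
    result = copy.deepcopy(parameter_file)
    params = result.setdefault("parameters", {})
    for name, value in overrides.items():
        # ARM parameter files wrap every parameter in a {"value": ...} object.
        params[name] = {"value": value}
    return result


# Hypothetical parameter file as it might be exported for a development factory.
dev_parameters = {
    "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentParameters.json#",
    "contentVersion": "1.0.0.0",
    "parameters": {
        "factoryName": {"value": "adf-example-dev"},
        "storageLinkedService_connectionString": {"value": "<dev-connection-string>"},
    },
}

# Override only what differs in production; everything else carries over.
prod_parameters = override_parameters(
    dev_parameters,
    {
        "factoryName": "adf-example-prod",
        "storageLinkedService_connectionString": "<prod-connection-string>",
    },
)

print(json.dumps(prod_parameters["parameters"], indent=2))
```

Keeping the development file unchanged and producing a separate production copy mirrors the idea in the text: one template, with environment-specific values supplied at deployment time.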
Confluent
MAY 30, 2022
Kafka is horizontally scalable, but it's not enough. So we made Confluent Cloud 10x more elastic - 10x faster to scale up to GB/s or down to zero, easier to use, and cost-effective.
Cloudera
JUNE 2, 2022
Since 2015, the Cloudera DataFlow team has been helping the largest enterprise organizations in the world adopt Apache NiFi as their enterprise standard data movement tool. Over the last few years, we have had a front-row seat in our customers’ hybrid cloud journey as they expand their data estate across the edge, on-premise, and multiple cloud providers.
KDnuggets
MAY 30, 2022
Also: Decision Tree Algorithm, Explained; Data Science Projects That Will Land You The Job in 2022; The 6 Python Machine Learning Tools Every Data Scientist Should Know About; Naïve Bayes Algorithm: Everything You Need to Know.
Data Engineering Podcast
MAY 29, 2022
Summary: A large fraction of data engineering work involves moving data from one storage location to another in order to support different access and query patterns. SingleStore aims to cut down on the number of database engines that you need to run so that you can reduce the amount of copying that is required. By supporting fast, in-memory row-based queries and columnar on-disk representation, it lets your transactional and analytical workloads run in the same database.
Data Engineering Digest brings together the best content for data engineering professionals from the widest variety of industry thought leaders.
Confluent
MAY 30, 2022
What we’ve done to evolve from cloud Kafka to Confluent Cloud, a data streaming platform that’s 10X better than Kafka in elasticity, storage, resiliency, and more.
KDnuggets
MAY 30, 2022
A machine learning engineer is a programmer proficient in building and designing software to automate predictive models. They have a deeper focus on computer science, compared to data scientists.
Data Engineering Podcast
MAY 29, 2022
Summary: The latest generation of data warehouse platforms has brought unprecedented operational simplicity and effectively infinite scale. Along with those benefits, these platforms have also introduced a new consumption model that can lead to incredibly expensive bills at the end of the month. To ensure that you can explore and analyze your data without spending money on inefficient queries, Mingsheng Hong and Zheng Shao created Bluesky Data.
Cloudera
JUNE 3, 2022
Data scientists and machine learning engineers in enterprise organizations need to fully understand their data in order to properly analyze it, build models, and power machine learning use cases across their business. Due to the lack of tooling specifically designed for data discovery, exploration, and preliminary analysis, this presents a significant challenge for these teams.
Speaker: Tamara Fingerlin, Developer Advocate
Apache Airflow® 3.0, the most anticipated Airflow release yet, officially launched this April. As the de facto standard for data orchestration, Airflow is trusted by over 77,000 organizations to power everything from advanced analytics to production AI and MLOps. With the 3.0 release, the top-requested features from the community were delivered, including a revamped UI for easier navigation, stronger security, and greater flexibility to run tasks anywhere at any time.
Rockset
JUNE 3, 2022
Zembula is a Portland, Oregon-based venture-backed startup that is breaking new ground in real-time customer personalization. Expanding Smart Banners to all kinds of promotional emails caused our traffic to explode 10x. We needed a lower-ops, cost-effective and scalable database to pave the way for our next 100x of growth. — Robert Haydock, CEO, Zembula We have developed technology enabling companies to deliver emails that are dynamic and hyper relevant to every recipient.
KDnuggets
MAY 30, 2022
Get into the in-demand world of data engineering for free and earn a six-figure salary.
Rock the JVM
JUNE 1, 2022
Scala Options are among the first concepts we encounter: discover what they do, why they're useful, and why they matter in programming.
Cloudera
JUNE 1, 2022
Imagine you’re the superintendent of a school district and you discover that your district has a problem with bullying. How do you go about enacting an informed policy that will help stem that problem? Where would you find the data to support your decision? Even if you could collect all the data around bullying incidents in the district over the past several years, do you have the time and knowledge to analyze that data?
Speaker: Alex Salazar, CEO & Co-Founder @ Arcade | Nate Barbettini, Founding Engineer @ Arcade | Tony Karrer, Founder & CTO @ Aggregage
There’s a lot of noise surrounding the ability of AI agents to connect to your tools, systems and data. But building an AI application into a reliable, secure workflow agent isn’t as simple as plugging in an API. As an engineering leader, it can be challenging to make sense of this evolving landscape, but agent tooling provides such high value that it’s critical we figure out how to move forward.
KDnuggets
JUNE 3, 2022
Check out this article for a better understanding of activation functions.
Monte Carlo
MAY 31, 2022
This article is based on an interview between Lior Solomon, (now the former) VP of Engineering, Data, at Vimeo, and the co-founders of Firebolt on their Data Engineering Show podcast, which took place on August 18, 2021. Watch the full episode. Vimeo is a leading video hosting, sharing, and services platform provider. The 1,000+ company helps small, medium and enterprise businesses scale with the impact of video.
AltexSoft
MAY 30, 2022
In the modern world, there’s hardly a business that doesn’t need a communication channel with its customers. Here’s the catch though. According to Meta (formerly Facebook), 64 percent of people would prefer to message rather than speak to a human call center agent on the phone. Besides that, customers want timely responses to whatever questions they have.
Speaker: Andrew Skoog, Founder of MachinistX & President of Hexis Representatives
Manufacturing is evolving, and the right technology can empower—not replace—your workforce. Smart automation and AI-driven software are revolutionizing decision-making, optimizing processes, and improving efficiency. But how do you implement these tools with confidence and ensure they complement human expertise rather than override it? Join industry expert Andrew Skoog as he explores how manufacturers can leverage automation to enhance operations, streamline workflows, and make smarter, data-dri
DataKitchen
JUNE 1, 2022
DataOps Mission Control. Data Teams can’t answer very basic questions about the many, many pipelines they have in production and in development. For example: Data. Is there a troublesome pipeline (lots of errors, intermittent errors)? Did my source files/data arrive on time? Is the data in the report I am looking at “fresh”? Is my output data the right quality?
KDnuggets
JUNE 1, 2022
Join the best data science professional groups on LinkedIn to share insights and experiences, ask for guidance, and build valuable connections.
Monte Carlo
MAY 31, 2022
When a data pipeline breaks, data engineers need to immediately understand where the rupture occurred and what has been impacted. Data downtime is costly. Without data lineage, a map of how assets are connected and how data moves across its lifecycle, data engineers might as well conduct their incident triage and root cause analysis blindfolded. Field-level data lineage (not necessarily Spark lineage) with hundreds of connections between objects in upstream and downstream tables.
KDnuggets
MAY 30, 2022
Interested in a survey of important database concepts and terminology? This post concisely defines 16 essential database key terms.
KDnuggets
JUNE 3, 2022
In this article, we will go beyond the theoretical realm of what a data science manager does and focus more on how to become an “effective” data science manager.
KDnuggets
JUNE 3, 2022
In this article, learn the basics of Q-learning, a model-free reinforcement learning algorithm.
KDnuggets
JUNE 2, 2022
This article presents the top industries and companies that are currently actively hiring data scientists.
KDnuggets
JUNE 1, 2022
Also: Python Libraries Data Scientists Should Know in 2022; The Complete Collection Of Data Repositories - Part 1; Top YouTube Channels for Learning Data Science; 7 Steps to Mastering SQL for Data Science; A Brief Introduction to Papers With Code.
Speaker: Tamara Fingerlin, Developer Advocate
In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!
KDnuggets
MAY 31, 2022
This article will explore a few areas that we feel are essential when assessing data management solutions for computer vision.
KDnuggets
MAY 31, 2022
Add Layer to your existing ML code and quickly get a rich model and data registry with experiment tracking!
KDnuggets
JUNE 1, 2022
The Complete Collection of Data Science Books - Part 2; Data Science Projects That Will Land You The Job in 2022; How to Become a Machine Learning Engineer; Dynamic Time Warping Algorithm in Time Series, Explained; Free Data Engineering Courses.