Sat. Feb 05, 2022 - Fri. Feb 11, 2022

The Complete Collection of Data Science Cheat Sheets – Part 1

KDnuggets

A collection of cheat sheets that will help you prepare for a technical interview, assessment tests, class presentation, and help you revise core data science concepts.

Scale Your Spatial Analysis By Building It In SQL With Syntax Extensions

Data Engineering Podcast

Summary: Along with the globalization of our societies comes the need to analyze the geospatial and geotemporal data required to manage the growth in commerce, communications, and other activities. To make geospatial analytics more maintainable and scalable, a growing number of database engines provide extensions to their SQL syntax that support manipulation of spatial data.

SQL 100

#ClouderaLife Spotlight: Marque Blackman, Director of Global Workplace

Cloudera

As we celebrate Black History Month, for this Employee Spotlight I sat down with Marque Blackman, co-lead of the Cloudera Black Employee Network (CBEN). We discussed his experience at Cloudera, his career transitions, and what he learned along the way. We also discussed his work with CBEN and his perspective on Black History Month. Meet Marque Blackman, Director of Global Workplace.

New Data Horizons: Data Prep, Data Visualization, and Data Catalogs Are Ready for Prime Time

DataKitchen

The post New Data Horizons: Data Prep, Data Visualization, and Data Catalogs Are Ready for Prime Time first appeared on DataKitchen.

Data 98

Apache Airflow® Best Practices for ETL and ELT Pipelines

Whether you’re creating complex dashboards or fine-tuning large language models, your data must be extracted, transformed, and loaded. ETL and ELT pipelines form the foundation of any data product, and Airflow is the open-source data orchestrator specifically designed for moving and transforming data in ETL and ELT pipelines. This eBook covers: An overview of ETL vs.

Managing Your Reusable Python Code as a Data Scientist

KDnuggets

Here are a few approaches that I have settled on for managing my own reusable Python code as a data scientist, presented from most to least general code use, and aimed at beginners.

Python 156

Scalable Strategies For Protecting Data Privacy In Your Shared Data Sets

Data Engineering Podcast

Summary: There are many dimensions to the work of protecting the privacy of users in our data. When you need to share a data set with other teams, departments, or businesses, it is of utmost importance that you eliminate or obfuscate personal information. In this episode Will Thompson explores the many ways that sensitive data can be leaked, re-identified, or otherwise put at risk, as well as the different strategies that can be employed to mitigate those attack vectors.

More Trending

Data pipeline asset management with Dataflow

Netflix Tech

By Sam Setegne, Jai Balani, Olek Gorajek. Glossary. asset - any business logic code in a raw (e.g. SQL) or compiled (e.g. JAR) form to be executed as part of the user-defined data pipeline. data pipeline - a set of tasks (or jobs) to be executed in a predefined order (a.k.a. DAG) for the purpose of transforming data using some business logic. Dataflow ...

How to Learn Math for Machine Learning

KDnuggets

So how much math do you need to know in order to work in the data science industry? The answer: Not as much as you think.

How To Join Data in MongoDB

Rockset

MongoDB is one of the most popular databases for modern applications. It enables a more flexible approach to data modeling than traditional SQL databases. Developers can build applications more quickly because of this flexibility and also have multiple deployment options, from the cloud MongoDB Atlas offering through to the open-source Community Edition.

MongoDB 52

Getting Started with Machine Learning

Cloudera

In recent years, Ethical AI has become an area of increased importance to organisations. Advances in the development and application of Machine Learning (ML) and Deep Learning (DL) algorithms require greater care to ensure that the ethics embedded in previous rule-based systems are not lost. This has led to Ethical AI being an increasingly popular search term and the subject of many industry analyst reports and papers.

Apache Airflow®: The Ultimate Guide to DAG Writing

Speaker: Tamara Fingerlin, Developer Advocate

In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!

ETL Testing Process

Grouparoo

Today, organizations are adopting modern ETL tools and approaches to gain as many insights as possible from their data. However, to ensure the accuracy and reliability of such insights, effective ETL testing needs to be performed. So what is an ETL tester’s responsibility? In this ETL testing tutorial, we’ll look at what ETL testing involves, the different types of ETL tests, and some challenges of ETL testing.

Process 52

Junior Data Scientist: The Next Level

KDnuggets

Junior, Mid-Level, and Senior Data Scientists differ in their level of experience. This article goes through the expectations for each role and what is required to move up the ladder.

Data 129

Palantir Developers: Learn to build in Palantir Foundry

Palantir

Introducing new resources for developers to elevate their impact in Foundry. Everyone in an organization should be able to use the right data to make the best decisions. That’s why Palantir is committed to making Foundry as intuitive and accessible as possible — not only for data scientists and engineers, but also for sales, product development, recruiting, and more.

Gartner® Recognizes Cloudera in Critical Capabilities for Cloud Database Management Systems for Operational Use Cases

Cloudera

Cloudera has been recognized as a Visionary in 2021 Gartner® Magic Quadrant for Cloud Database Management Systems (DBMS) and for the first time, evaluated CDP Operational Database (COD) against the 12 critical capabilities for Operational Databases. Overall, Gartner recognized 20 vendors for the Magic Quadrant of which 16 were evaluated in the 2021 Gartner Critical Capabilities for Cloud Database Management Systems for Operational Use Cases and 18 vendors for the 2021 Gartner Critical Capabil

Optimizing The Modern Developer Experience with Coder

Many software teams have migrated their testing and production workloads to the cloud, yet development environments often remain tied to outdated local setups, limiting efficiency and growth. This is where Coder comes in. In our 101 Coder webinar, you’ll explore how cloud-based development environments can unlock new levels of productivity. Discover how to transition from local setups to a secure, cloud-powered ecosystem with ease.

Data Engineering Annotated Monthly – January 2022

Big Data Tools

Due to the public holidays in Russia and my own vacation time, I didn’t get a chance to write an Annotated for December. Waiting a little longer might not be such a bad thing in this case, because now we have even more interesting releases to talk about! Hi, I’m Pasha Finkelshteyn, and I’ll be your guide through this month’s news. I’ll offer my impressions of recent developments in the data engineering sector and highlight new ideas from the wider community.

The Not-so-Sexy SQL Concepts to Make You Stand Out

KDnuggets

Databases are the houses of our data and data scientists HAVE TO HAVE A KEY! In this article, I discuss some lesser-known SQL concepts that data scientists tend not to familiarize themselves with.

SQL 126

Monte Carlo Data Observability Insights Now Available in the Snowflake Data Marketplace

Monte Carlo

Is your data quality improving? What is your most used data? Where in the pipeline are your most frequent data issues occurring? With Snowflake Secure Data Sharing, building custom workflows and dashboards to answer these questions has never been easier. I am excited to announce that Monte Carlo Data Observability Insights, end-to-end operational analytics of an organization’s data platform, is now available in the Snowflake Data Marketplace.

Principal Engineering at Zalando

Zalando Engineering

In many companies, Senior Engineers who do not pursue Engineering Management end up in a dead end in terms of their career progression. At Zalando, we have had a career path for individual contributors since 2016. Senior Software Engineers can choose one of three possible career paths: Engineering Management, Principal Engineering, or Technical Program Management. In this post, we detail how we leverage our senior individual contributors (Principal Engineers) throughout the company.

15 Modern Use Cases for Enterprise Business Intelligence

Large enterprises face unique challenges in optimizing their Business Intelligence (BI) output due to the sheer scale and complexity of their operations. Unlike smaller organizations, where basic BI features and simple dashboards might suffice, enterprises must manage vast amounts of data from diverse sources. What are the top modern BI use cases for enterprise businesses to help you get a leg up on the competition?

5 Ways to Apply AI to Small Data Sets

KDnuggets

When applied correctly, AI algorithms can be used on small data sets to obtain results free of human errors and false results. Here are some methods to apply AI to small data sets.

Algorithm 120

The JaffleGaggle Story: Data Modeling for a Customer 360 View

dbt Developer Hub

Editor's note: In this tutorial, Donny walks through the fictional story of a SaaS company called JaffleGaggle, which needs to group its freemium individual users into company accounts (aka a customer 360 view) in order to drive its product-led growth efforts. You can follow along with Donny's data modeling technique for identity resolution in this dbt project repo.

Time Series Forecasting: What, Why, and, How?

ProjectPro

This blog introduces the concept of time series forecasting models in the most detailed form. First, there will be a simple introduction to highlight the significance of such models. Next, you will find a section that presents the definition of a time series forecasting article. After that, you will explore popular time-series-forecasting models. The blog's last two parts cover various use cases of these models and projects related to time series analysis and forecasting problems.

Prepare Now: 2025's Must-Know Trends For Product And Data Leaders

Speaker: Jay Allardyce, Deepak Vittal, and Terrence Sheflin

As we look ahead to 2025, business intelligence and data analytics are set to play pivotal roles in shaping success. Organizations are already starting to face a host of transformative trends as the year comes to a close, including the integration of AI in data analytics, an increased emphasis on real-time data insights, and the growing importance of user experience in BI solutions.

Releasing Connexion to the Community

Zalando Engineering

Connexion is a Python framework that automagically handles HTTP requests based on the OpenAPI specification (formerly known as Swagger Spec) of your API, described in YAML format. Connexion allows you to write an OpenAPI specification, then maps the endpoints to your Python functions; this makes it unique, as many tools generate the specification based on your Python code.

Scala 52

Build a Web Scraper with Python in 5 Minutes

KDnuggets

In this article, I will show you how to create a web scraper from scratch in Python.

Python 150

Make a Snake Game with Scala in 10 Minutes

Rock the JVM

The ultimate 10-minute guide to building a Snake game in Scala: learn fast and code smarter

Scala 52

Building the Business Case for DataOps

DataKitchen

The post Building the Business Case for DataOps first appeared on DataKitchen.

KDnuggets™ News 22:n06, Feb 9: Data Science Programming Languages and When To Use Them; Complete Collection of Data Science Cheat Sheets

KDnuggets

Data Science Programming Languages and When To Use Them; The Complete Collection of Data Science Cheat Sheets – Part 1; Build a Web Scraper with Python in 5 Minutes; 8 Best Data Science Courses to Enroll in 2022 For Steep Career Advancement; Classifying Long Text Documents Using BERT.

Building a Visual Search Engine – Part 1: Data Exploration

KDnuggets

Ever wonder how Google or Bing finds similar images to your image? The algorithms for generating the text-based 10 blue links are very different from those for finding visually similar or related images. In this article, we will explain one such method to build a visual search engine. We will use the Caltech 101 dataset, which contains images of common objects used in daily life.

The motivation behind using graph convolutions

KDnuggets

This article is an excerpt from Machine Learning with PyTorch and Scikit-Learn, the new book from the widely acclaimed and bestselling Python Machine Learning series, fully updated and expanded to cover PyTorch, transformers, graph neural networks, and best practices.

Data Mesh & Its Distributed Data Architecture

KDnuggets

Going forward, data professionals have found a new way to address the scalability of sources through data mesh.

The Cloud Development Environment Adoption Report

Cloud Development Environments (CDEs) are changing how software teams work by moving development to the cloud. Our Cloud Development Environment Adoption Report gathers insights from 223 developers and business leaders, uncovering key trends in CDE adoption. With 66% of large organizations already using CDEs, these platforms are quickly becoming essential to modern development practices.