Telling a Great Data Story: A Visualization Decision Tree
KDnuggets
FEBRUARY 25, 2022
Pick your visualizations strategically. They need to tell a story.
Start Data Engineering
FEBRUARY 22, 2022
Contents: 1. Introduction; 2. CI; 3. Sample project: Data testing with GitHub Actions (3.1. Prerequisites; 3.2. Project overview; 3.3. Automating data tests with GitHub Actions); 4. Conclusion; 5. Further reading. Introduction: Automated testing is crucial for keeping your code free of bugs and avoiding regressions. If you are wondering, "How can data tests be integrated into a CI (Continuous Integration) pipeline?"
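As a rough illustration of what a data test run by a CI job might look like, here is a minimal sketch in plain Python. The column names, the rules, and the function `find_data_errors` are assumptions made for this example; they are not taken from the linked article, which may use a dedicated testing tool.

```python
from __future__ import annotations

from typing import Any

# Hypothetical schema for the example: every row needs these columns.
REQUIRED_COLUMNS = ("id", "amount")


def find_data_errors(rows: list[dict[str, Any]]) -> list[str]:
    """Return a human-readable message for every rule a row violates."""
    errors = []
    seen_ids = set()
    for index, row in enumerate(rows):
        # Rule 1: required columns must be present and non-null.
        missing = [c for c in REQUIRED_COLUMNS if row.get(c) is None]
        if missing:
            errors.append(f"row {index}: missing {', '.join(missing)}")
            continue
        # Rule 2: ids must be unique.
        if row["id"] in seen_ids:
            errors.append(f"row {index}: duplicate id {row['id']}")
        seen_ids.add(row["id"])
        # Rule 3: amounts must not be negative.
        if row["amount"] < 0:
            errors.append(f"row {index}: negative amount {row['amount']}")
    return errors


sample_rows = [
    {"id": 1, "amount": 10.0},
    {"id": 1, "amount": 5.0},
    {"id": 2, "amount": -3.0},
    {"id": 3, "amount": None},
]
errors = find_data_errors(sample_rows)
```

In a CI pipeline, a test runner would typically assert that the returned list is empty, so that any violation fails the build.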
Marc Lamberti
FEBRUARY 21, 2022
Airflow dynamic DAGs can save you a ton of time. As you know, Apache Airflow is written in Python, and DAGs are created via Python scripts. That makes it very flexible and powerful (and sometimes complex). By leveraging Python, you can create DAGs dynamically based on variables, connections, a typical pattern, etc. This convenient way of generating DAGs comes at the price of higher complexity and some subtle pitfalls that you must know.
Confluent
FEBRUARY 24, 2022
A few years ago I helped build an event-driven system for gym bookings. The pitch was that we were building a better experience for both the gym members booking different […].
KDnuggets
FEBRUARY 25, 2022
This blog post describes the vanishing gradient problem and explains how the use of the sigmoid function gives rise to it.
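The effect can be sketched numerically. The derivative of the sigmoid never exceeds 0.25, so a gradient that is multiplied by one such derivative per layer shrinks quickly as depth grows. The snippet below is a simplified illustration that ignores the weights and assumes every pre-activation is 0 (the best case for the sigmoid); the function names are chosen for this example only.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic sigmoid: 1 / (1 + e^-x)."""
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x: float) -> float:
    """Derivative of the sigmoid: s(x) * (1 - s(x)), at most 0.25."""
    s = sigmoid(x)
    return s * (1.0 - s)


def backprop_factor(num_layers: int, pre_activation: float = 0.0) -> float:
    """Product of sigmoid derivatives across num_layers layers.

    Simplifying assumption: weights are ignored and every layer sees the
    same pre-activation value.
    """
    return sigmoid_grad(pre_activation) ** num_layers


max_grad = sigmoid_grad(0.0)
factors = {n: backprop_factor(n) for n in (1, 5, 10, 20)}

for depth, factor in factors.items():
    print(f"{depth:>2} layers: gradient scaled by {factor:.3e}")
```

Even in this best case the scaling factor falls below one in a million by ten layers, which is why early layers in deep sigmoid networks learn very slowly.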
Cloudera
FEBRUARY 22, 2022
Over the past decade, the successful deployment of large scale data platforms at our customers has acted as a big data flywheel driving demand to bring in even more data, apply more sophisticated analytics, and on-board many new data practitioners from business analysts to data scientists. This unprecedented level of big data workloads hasn’t come without its fair share of challenges.
Data Engineering Digest brings together the best content for data engineering professionals from the widest variety of industry thought leaders.
ProjectPro
FEBRUARY 25, 2022
What is a Machine Learning Pipeline? A machine learning pipeline helps automate machine learning workflows by processing and integrating data sets into a model, which can then be evaluated and delivered. A well-built pipeline helps in the flexibility of the model implementation. A pipeline in machine learning is a technical infrastructure that allows an organization to organize and automate machine learning operations.
KDnuggets
FEBRUARY 22, 2022
Machine learning, as a technology, ensures that our gadgets and their software get smarter by the day. Here are the algorithms you ought to know to understand machine learning’s varied and extensive functionality and effectiveness.
Cloudera
FEBRUARY 23, 2022
The telecommunications industry has been doing well since the pandemic started (not that many would notice). Revenues have remained relatively stable, while consumption has gone up, as virtual engagement has become the primary mode of operations for many businesses (and families!). In the meantime, digital transformation has been accelerating both as a means to respond to the pandemic and as a mechanism to drive costs down further, allowing for margin growth.
Data Engineering Podcast
FEBRUARY 20, 2022
Summary Python has grown to be one of the top languages used for all aspects of data, from collection and cleaning, to analysis and machine learning. Along with that growth has come an explosion of tools and engines that help power these workflows, which introduces a great deal of complexity when scaling from single machines and exploratory development to massively parallel distributed computation.
ProjectPro
FEBRUARY 25, 2022
When the world was under lockdown and movement was restricted to absolute emergencies, millions were introduced to online shopping. That convenience helped e-commerce platforms record historic sales. Unsurprisingly, the rate of online financial fraud also rose sharply: online fraud cases using credit and debit cards saw a historic upsurge of 225 percent during the COVID-19 pandemic in 2020 as compared to 2019.
KDnuggets
FEBRUARY 23, 2022
This article outlines some of the most common design patterns encountered when creating successful Machine Learning solutions.
Cloudera
FEBRUARY 24, 2022
In the latest installment of the EMEA Influential Women in Data webinar series, we welcomed Shirley Collie, Chief Health Analytics Actuary at Discovery Health, to discuss everything from how the pandemic has impacted working, to the opportunities within data, and the importance of intentionality. A data-driven organization: Shirley knows better than most about the impact that COVID-19 has had on the world.
Hepta Analytics
FEBRUARY 24, 2022
Week 3 was about data warehousing, working on the data that was ingested in week 2. We will take the already ingested data, create an external table from it, and optimize query performance through partitioning and clustering. Then we will automate the whole process using Airflow. There are two types of systems when dealing with data: Online Transaction Processing (OLTP) and Online Analytical Processing (OLAP).
ProjectPro
FEBRUARY 25, 2022
Read this article to find the right resources for learning MLOps. The blog starts with an introduction to MLOps and the skills required to become an MLOps engineer, and then lays out an MLOps learning path for beginners. MLOps is a term that combines Machine Learning (ML) and Operations. It is a technique for implementing data science projects that allows businesses to increase their projects’ efficiency and minimize the risk of introducing machine learning, artificia
KDnuggets
FEBRUARY 21, 2022
A collection of cheat sheets that will help you prepare for a technical interview on Data Structures & Algorithms, Machine learning, Deep Learning, Natural Language Processing, Data Engineering, Web Frameworks.
Rockset
FEBRUARY 24, 2022
Event-based architectures have been gaining popularity for some time. With increased adoption has come a flood of options for aggregating and analyzing events. Which databases are optimized for ingesting streaming events and analyzing them in real time? The answer is complex, nuanced and heavily dependent on the precise problem being solved. This post is intended to help anyone seeking to make a selection from a difficult to understand landscape.
Preset
FEBRUARY 23, 2022
This article walks you through a potential approach to monitor your Superset usage directly within Superset leveraging the internal metadata database.
ProjectPro
FEBRUARY 24, 2022
Working with audio data has been a comparatively under-explored problem in machine learning. In most cases, benchmarks for the latest seminal work in deep learning measure performance on text and image data. Moreover, the most significant advances in deep learning are found in models that work with text and images. Amidst this, speech and audio, an equally important type of data, often get overlooked.
KDnuggets
FEBRUARY 24, 2022
Object-relational mapping, or ORM, is a technique that allows you to interact with databases using the object-oriented paradigm of the programming language of your choosing. How is that different from structured query language, though, and when do you use them?
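To make the contrast concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The `users` table, the `User` class, and the hand-rolled `UserRepository` are hypothetical and exist only for illustration; a real ORM library does far more (schema management, relationships, query building), so this shows the idea rather than any particular library's API.

```python
from __future__ import annotations

import sqlite3
from dataclasses import dataclass


@dataclass
class User:
    id: int
    name: str


class UserRepository:
    """Hypothetical, hand-rolled mapper between `users` rows and User objects."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn

    def add(self, name: str) -> User:
        # The SQL is hidden behind a method that returns an object.
        cur = self.conn.execute("INSERT INTO users (name) VALUES (?)", (name,))
        return User(id=cur.lastrowid, name=name)

    def by_name(self, name: str) -> list[User]:
        rows = self.conn.execute(
            "SELECT id, name FROM users WHERE name = ?", (name,)
        ).fetchall()
        return [User(*row) for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")

# Object-oriented style: callers work with User objects.
repo = UserRepository(conn)
ada = repo.add("Ada")
mapped = repo.by_name("Ada")

# Raw SQL style: callers write the query and get plain tuples back.
raw_rows = conn.execute(
    "SELECT id, name FROM users WHERE name = ?", ("Ada",)
).fetchall()
```

Both paths read the same row; the difference is whether the calling code deals in SQL strings and tuples or in typed objects.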
Confluent
FEBRUARY 23, 2022
In many ways, Storyblocks’ technical journey has mirrored that of most other startups and disruptors: start small and as simple as possible (i.e., with a PHP monolith); watch the company […].
Pipeline Data Engineering
FEBRUARY 22, 2022
Data engineering salon. News and interesting reads about the world of data.
We’ve only scratched the surface of the full potential for the data warehouse. Mikkel Dengsøe, Head of Data Science, Operations & Financial Crime, Monzo Bank: Why I think the data warehouse will become the control centre for modern companies.
Git, SQL, CLI. Vicki Boykis, Machine Learning Engineer, Automattic: I’ve narrowed it down to three basic tools.
ProjectPro
FEBRUARY 22, 2022
This article presents SQL project ideas for developing data analysis skills. You will explore challenging problems that you can quickly solve with this simple query language. It doesn’t matter if you are a beginner or a professional at using SQL; our list of SQL database projects has one for you. Data, data, everywhere! Where’s the way to manage it?
KDnuggets
FEBRUARY 22, 2022
Learn how to optimize your data science workflow in a few lines of code.
Preset
FEBRUARY 21, 2022
CrateDB is a distributed SQL database that excels at IoT and Time Series data workflows. In this post, we'll showcase how CrateDB and Superset can be used together.
AltexSoft
FEBRUARY 21, 2022
There has been a lot of buzz around data science, machine learning (ML), and artificial intelligence (AI) lately. As you may already know, to train a machine learning model, you need data. Lots of data, to be more precise. Lots of quality data, to be even more precise. To save you time, watch our 14-minute video on how data is prepared for machine learning.
ProjectPro
FEBRUARY 21, 2022
Learn PySpark joins in a single go! From the various types of PySpark joins to their syntax and PySpark join examples, this blog has it all for you. Without further ado, let’s dive right into the concepts. Table of Contents: Why are PySpark Joins Important for Data Analytics?; PySpark Joins: Types of Joins with Examples; General Syntax for PySpark Join; PySpark Inner Join; PySpark Left Join / PySpark Left Outer Join; PySpark Right Join / PySpark Right Outer Join; PySpark Full Outer Join; PySpark Left S
KDnuggets
FEBRUARY 25, 2022
Learn data analytics by taking the best YouTube courses. These courses will cover data analysis with Python, R, SQL, PowerBI, Tableau, Excel, and SPSS.
U-Next
FEBRUARY 21, 2022
Artificial Intelligence (AI) is not just making our lives convenient. It is empowering us with information and insights that have the potential to change the world for the better. With its application across diverse industries, market segments and real-world concerns, the role of AI is growing by the day, to the extent that we see AI as a savior for some of humankind’s most pressing concerns.
KDnuggets
FEBRUARY 25, 2022
The data scientist salary - the past, the present, and a little bit of the future.