Telling a Great Data Story: A Visualization Decision Tree
KDnuggets
FEBRUARY 25, 2022
Pick your visualizations strategically. They need to tell a story.
Netflix Tech
FEBRUARY 18, 2022
By: Ankush Gulati, David Gevorkyan. Additional credits: Michael Clark, Gokhan Ozer. Intro: Netflix has more than 220 million active members who perform a variety of actions throughout each session, ranging from renaming a profile to watching a title. Reacting to these actions in near real-time to keep the experience consistent across devices is critical for ensuring an optimal member experience.
Start Data Engineering
FEBRUARY 22, 2022
Contents: 1. Introduction; 2. CI; 3. Sample project: Data testing with GitHub Actions (3.1. Prerequisites; 3.2. Project overview; 3.3. Automating data tests with GitHub Actions); 4. Conclusion; 5. Further reading. 1. Introduction: Automated testing is crucial for ensuring that your code is bug-free and avoiding regressions. If you are wondering how data tests can be integrated into a CI (Continuous Integration) pipeline […].
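To make the idea of a data test concrete, here is a minimal sketch in plain Python. It is not taken from the linked article: the column names (`user_id`, `email`), the helper functions, and the sample rows are all hypothetical, and a real project would more likely use a dedicated testing tool. The point is only that a data test is an ordinary check that a CI job can run and fail on.

```python
from collections import Counter
from typing import Any

Row = dict[str, Any]  # hypothetical row shape: column name -> value


def null_rows(rows: list[Row], column: str) -> list[int]:
    """Return the indices of rows where `column` is missing or None."""
    return [i for i, row in enumerate(rows) if row.get(column) is None]


def duplicate_values(rows: list[Row], column: str) -> list[Any]:
    """Return values of `column` that occur more than once, in first-seen order."""
    counts = Counter(row.get(column) for row in rows)
    return [value for value, n in counts.items() if n > 1]


def run_data_tests(rows: list[Row]) -> list[str]:
    """Run all checks and return human-readable failure messages."""
    failures = []
    nulls = null_rows(rows, "email")
    if nulls:
        failures.append(f"email is null in rows {nulls}")
    dups = duplicate_values(rows, "user_id")
    if dups:
        failures.append(f"user_id has duplicate values {dups}")
    return failures


# Hypothetical sample data with one null email and one repeated user_id.
sample_rows = [
    {"user_id": 1, "email": "a@example.com"},
    {"user_id": 2, "email": None},
    {"user_id": 2, "email": "c@example.com"},
]

for message in run_data_tests(sample_rows):
    print("FAILED:", message)
```

In a CI job, a wrapper script would typically exit with a non-zero status when `run_data_tests` returns any failures, which is what marks the pipeline run as failed.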
Marc Lamberti
FEBRUARY 21, 2022
Airflow dynamic DAGs can save you a ton of time. As you know, Apache Airflow is written in Python, and DAGs are created via Python scripts. That makes it very flexible and powerful (even complex sometimes). By leveraging Python, you can create DAGs dynamically based on variables, connections, a typical pattern, etc. This very nice way of generating DAGs comes at the price of higher complexity and subtle tricky things that you must know.
Confluent
FEBRUARY 24, 2022
A few years ago I helped build an event-driven system for gym bookings. The pitch was that we were building a better experience for both the gym members booking different […].
Cloudera
FEBRUARY 9, 2022
As we celebrate Black History Month, for this Employee Spotlight I sat down with Marque Blackman, co-lead of the Cloudera Black Employee Network (CBEN). We discussed his experience at Cloudera, his career transitions, and what he learned along the way. We also discussed his work with CBEN and his perspective on Black History Month. Meet Marque Blackman, Director of Global Workplace .
Data Engineering Digest brings together the best content for data engineering professionals from the widest variety of industry thought leaders.
Data Engineering Podcast
FEBRUARY 27, 2022
Summary There are a wealth of options for managing structured and textual data, but unstructured binary data assets are not as well supported across the ecosystem. As organizations start to adopt cloud technologies they need a way to manage the distribution, discovery, and collaboration of data across their operating environments. To help solve this complicated challenge Krishna Subramanian and her co-founders at Komprise built a system that allows you to treat use and secure your data wherever
DataKitchen
FEBRUARY 8, 2022
The post New Data Horizons: Data Prep, Data Visualization, and Data Catalogs Are Ready for Prime Time first appeared on DataKitchen.
Netflix Tech
FEBRUARY 9, 2022
By Sam Setegne, Jai Balani, Olek Gorajek. Glossary: asset: any business logic code in a raw (e.g. SQL) or compiled (e.g. JAR) form to be executed as part of the user-defined data pipeline. data pipeline: a set of tasks (or jobs) to be executed in a predefined order (a.k.a. DAG) for the purpose of transforming data using some business logic. Dataflow […].
Confluent
FEBRUARY 18, 2022
As data flows in and out of your Confluent Cloud clusters, it’s imperative to monitor their behavior. Bring Your Own Monitoring (BYOM) means you can configure an application performance monitoring […].
Speaker: Tamara Fingerlin, Developer Advocate
In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!
Cloudera
FEBRUARY 10, 2022
After the launch of Cloudera DataFlow for the Public Cloud (CDF-PC) on AWS a few months ago, we are thrilled to announce that CDF-PC is now generally available on Microsoft Azure, allowing NiFi users on Azure to run their data flows in a cloud-native runtime. With CDF-PC, NiFi users can import their existing data flows into a central catalog, from where they can be deployed to a Kubernetes-based runtime through a simple flow deployment wizard or with a single CLI command.
KDnuggets
FEBRUARY 25, 2022
This blog post aims to describe the vanishing gradient problem and explain how use of the sigmoid function resulted in it.
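The effect can be illustrated with a small numeric sketch. This is not the linked post's code; it assumes a simplified chain of sigmoid layers in which every weight is 1 and every pre-activation has the same value, so the backpropagated gradient is just the sigmoid derivative multiplied once per layer. The function names are hypothetical.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic sigmoid."""
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x: float) -> float:
    """Derivative of the sigmoid: s(x) * (1 - s(x)), which never exceeds 0.25."""
    s = sigmoid(x)
    return s * (1.0 - s)


def gradient_scale(num_layers: int, pre_activation: float = 0.0) -> float:
    """Factor by which a gradient shrinks after passing back through
    `num_layers` sigmoid layers (simplifying assumption: all weights are 1
    and every layer sees the same pre-activation)."""
    return sigmoid_grad(pre_activation) ** num_layers


# Even in the best case (pre-activation 0, derivative 0.25), the factor
# decays geometrically with depth.
for depth in (1, 5, 10, 20):
    print(depth, gradient_scale(depth))
```

Because the derivative is at most 0.25, the factor is bounded by 0.25 raised to the number of layers under these assumptions, and it shrinks even faster when pre-activations sit far from zero.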
Data Engineering Podcast
FEBRUARY 27, 2022
Summary Building a data platform is a complex journey that requires a significant amount of planning to do well. It requires knowledge of the available technologies, the requirements of the operating environment, and the expectations of the stakeholders. In this episode Tobias Macey, the host of the show, reflects on his plans for building a data platform and what he has learned from running the podcast that is influencing his choices.
ProjectPro
FEBRUARY 28, 2022
Facial Expression Recognition (FER) based technologies are an integral part of the emotion recognition market, which is anticipated to reach $56 billion by 2024. Detecting emotions? Using AI? Can we really do that? The answer is YES! One can easily build a facial emotion recognition project in Python. Continue reading to find out how.
Netflix Tech
FEBRUARY 1, 2022
By Karen Casella, Director of Engineering, Access & Identity Management Have you ever experienced one of the following scenarios while looking for your next role? You study and practice coding interview problems for hours/days/weeks/months, only to be asked to merge two sorted lists. You apply for multiple roles at the same company and proceed through the interview process with each hiring team separately, despite the fact that there is tremendous overlap in the roles.
Confluent
FEBRUARY 3, 2022
A common challenge organizations face is how to extract, transform, and load (ETL) Salesforce data into a data warehouse, so that the business can use the data. Salesforce (SFDC) is […].
Cloudera
FEBRUARY 1, 2022
Okay, I admit, the title is a little click-baity, but it does hold some truth! I spent the holidays up in the mountains, and if you live in the northern hemisphere like me, you know that means that I spent the holidays either celebrating or cursing the snow. When I was a kid, during this time of year we would always do an art project making snowflakes.
KDnuggets
FEBRUARY 18, 2022
AI and machine learning can provide us with these tools. This guide will explore how we can use machine learning to label data.
Data Engineering Podcast
FEBRUARY 20, 2022
Summary The life sciences as an industry has seen incredible growth in scale and sophistication, along with the advances in data technology that make it possible to analyze massive amounts of genomic information. In this episode Guy Yachdav, director of software engineering for ImmunAI, shares the complexities that are inherent to managing data workflows for bioinformatics.
ProjectPro
FEBRUARY 25, 2022
What is a Machine Learning Pipeline? A machine learning pipeline helps automate machine learning workflows by processing and integrating data sets into a model, which can then be evaluated and delivered. A well-built pipeline helps in the flexibility of the model implementation. A pipeline in machine learning is a technical infrastructure that allows an organization to organize and automate machine learning operations.
Hepta Analytics
FEBRUARY 24, 2022
Week 3 was about data warehousing, working on the data that was ingested in week 2. We will take the already ingested data, create an external table from it, and optimize query performance through partitioning and clustering. Then we will automate the whole process using Airflow. There are two types of systems when dealing with data: Online Transaction Processing (OLTP) and Online Analytical Processing (OLAP).
Rockset
FEBRUARY 24, 2022
Event-based architectures have been gaining popularity for some time. With increased adoption has come a flood of options for aggregating and analyzing events. Which databases are optimized for ingesting streaming events and analyzing them in real time? The answer is complex, nuanced and heavily dependent on the precise problem being solved. This post is intended to help anyone seeking to make a selection from a difficult-to-understand landscape.
Speaker: Jay Allardyce, Deepak Vittal, Terrence Sheflin, and Mahyar Ghasemali
As we look ahead to 2025, business intelligence and data analytics are set to play pivotal roles in shaping success. Organizations are already starting to face a host of transformative trends as the year comes to a close, including the integration of AI in data analytics, an increased emphasis on real-time data insights, and the growing importance of user experience in BI solutions.
Cloudera
FEBRUARY 22, 2022
Over the past decade, the successful deployment of large scale data platforms at our customers has acted as a big data flywheel driving demand to bring in even more data, apply more sophisticated analytics, and on-board many new data practitioners from business analysts to data scientists. This unprecedented level of big data workloads hasn’t come without its fair share of challenges.
KDnuggets
FEBRUARY 17, 2022
There's no free lunch in machine learning. So, determining which algorithm to use depends on many factors from the type of problem at hand to the type of output you are looking for. This guide offers several considerations to review when exploring the right ML approach for your dataset.
Data Engineering Podcast
FEBRUARY 20, 2022
Summary Python has grown to be one of the top languages used for all aspects of data, from collection and cleaning, to analysis and machine learning. Along with that growth has come an explosion of tools and engines that help power these workflows, which introduces a great deal of complexity when scaling from single machines and exploratory development to massively parallel distributed computation.
ProjectPro
FEBRUARY 25, 2022
When the world was under lockdown and movement was restricted to absolute emergencies, millions were introduced to the world of online shopping. The convenience of online shopping helped e-commerce platforms record historic sales. At the same time, unsurprisingly, the rate of online financial fraud also rose sharply. Online fraud cases using credit and debit cards saw a historic upsurge of 225 percent during the COVID-19 pandemic in 2020 as compared to 2019.
Speaker: Nikhil Joshi, Founder & President of Snic Solutions
Is your manufacturing operation reaching its efficiency potential? A Manufacturing Execution System (MES) could be the game-changer, helping you reduce waste, cut costs, and lower your carbon footprint. Join Nikhil Joshi, Founder & President of Snic Solutions, in this value-packed webinar as he breaks down how MES can drive operational excellence and sustainability.
Preset
FEBRUARY 23, 2022
This article walks you through a potential approach to monitor your Superset usage directly within Superset leveraging the internal metadata database.
Confluent
FEBRUARY 23, 2022
In many ways, Storyblocks’ technical journey has mirrored that of most other startups and disruptors: Start small and as simple as possible (i.e., with a PHP monolith) Watch the company […].
Cloudera
FEBRUARY 17, 2022
CDP Private Cloud Base is an on-premises version of Cloudera Data Platform (CDP). This new product combines the best of Cloudera Enterprise Data Hub and Hortonworks Data Platform Enterprise along with new features and enhancements across the stack. This unified distribution is a scalable and customizable platform where you can securely run many types of workloads.
KDnuggets
FEBRUARY 14, 2022
Calculus is the key to fully understanding how neural networks function. Go beyond a surface understanding of this mathematics discipline with these free course materials from MIT.
Speaker: Anindo Banerjea, CTO at Civio & Tony Karrer, CTO at Aggregage
When developing a Gen AI application, one of the most significant challenges is improving accuracy. This can be especially difficult when working with a large data corpus, and as the complexity of the task increases. The number of use cases/corner cases that the system is expected to handle essentially explodes. 💥 Anindo Banerjea is here to showcase his significant experience building AI/ML SaaS applications as he walks us through the current problems his company, Civio, is solving.