Sat.Dec 11, 2021 - Fri.Dec 17, 2021

article thumbnail

10 Key AI & Data Analytics Trends for 2022 and Beyond

KDnuggets

What AI and data analytics trends are taking the industry by storm this year? This comprehensive review highlights upcoming directions in AI to carefully watch and consider implementing in your personal work or organization.

article thumbnail

Cloudera Response to CVE-2021-44228

Cloudera

Summary. On December 10th 2021, the Apache Software Foundation released version 2.15.0 of the Log4j Java logging library, fixing CVE-2021-44228 , a remote code execution vulnerability affecting Log4j 2.0-2.14. An attacker can use this vulnerability to instruct affected systems to download and execute a malicious payload through submitting a custom-crafted request.

Java 132
article thumbnail

Building Auditable Spark Pipelines At Capital One

Data Engineering Podcast

Summary Spark is a powerful and battle tested framework for building highly scalable data pipelines. Because of its proven ability to handle large volumes of data Capital One has invested in it for their business needs. In this episode Gokul Prabagaren shares his use for it in calculating your rewards points, including the auditing requirements and how he designed his pipeline to maintain all of the necessary information through a pattern of data enrichment.

Building 130
article thumbnail

How to choose the right tools for your data pipeline

Start Data Engineering

1. Introduction 2. Requirements 3. Components 4. Choosing tools 4.1 Requirement x Component framework 4.2 Filters 5. Conclusion 6. Further reading 1. Introduction If you are building data pipelines from the ground up, the number of available data engineering tools to choose from can be overwhelming. If you are thinking Most of the tools seem to be doing the same/similar thing, which one should I choose?

article thumbnail

Apache Airflow® Best Practices for ETL and ELT Pipelines

Whether you’re creating complex dashboards or fine-tuning large language models, your data must be extracted, transformed, and loaded. ETL and ELT pipelines form the foundation of any data product, and Airflow is the open-source data orchestrator specifically designed for moving and transforming data in ETL and ELT pipelines. This eBook covers: An overview of ETL vs.

article thumbnail

The 5 Characteristics of a Successful Data Scientist

KDnuggets

I've put some thought into this, and come up with the 5 characteristics of a what I believe define a successful data scientist. Do you agree?

Data 160
article thumbnail

Azure Data Factory Linked Service: Advanced Authoring

Azure Data Engineering

We have discussed Linked Service parameterization through the UI, in a previous post. But not all Linked Service Types support parametrization using the UI. In this post, we will discuss the Linked Services that can’t be parameterized using the UI. (i.e., they don’t have any option to add parameter). If you are familiar with Azure Services, you might know that the Linked Services or any other Azure artefact has corresponding underlying JSON code.

Coding 130

More Trending

article thumbnail

8 analytics startups to watch over the next year

DataKitchen

The post 8 analytics startups to watch over the next year first appeared on DataKitchen.

126
126
article thumbnail

Write Clean Python Code Using Pipes

KDnuggets

A short and clean approach to processing iterables.

Coding 160
article thumbnail

AI and ML: No Longer the Stuff of Science Fiction

Cloudera

Artificial Intelligence (AI) has revolutionized how various industries operate in recent years. But with growing demands, there’s a more nuanced need for enterprise-scale machine learning solutions and better data management systems. The 2021 Data Impact Awards aim to honor organizations who have shown exemplary work in this area. . The category “Data for Enterprise AI” awards companies from around the world that have built and deployed use cases for enterprise-scale machine learning and have in

article thumbnail

Data Sharing Patterns with Confluent Schema Registry

Confluent

Sharing metadata on the data you store in your Confluent cluster is paramount to allow for effective sharing of that data across the enterprise. As the usage of real-time data […].

Metadata 105
article thumbnail

Apache Airflow®: The Ultimate Guide to DAG Writing

Speaker: Tamara Fingerlin, Developer Advocate

In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!

article thumbnail

Cadence Multi-Tenant Task Processing

Uber Engineering

Introduction. Cadence is a multi-tenant orchestration framework that helps developers at Uber to write fault-tolerant, long-running applications, also known as workflows. It scales horizontally to handle millions of concurrent executions from various customers. It is currently used by hundreds of … The post Cadence Multi-Tenant Task Processing appeared first on Uber Engineering Blog.

Process 104
article thumbnail

Feature Selection: Where Science Meets Art

KDnuggets

From heuristic to algorithmic feature selection techniques for data science projects.

Algorithm 160
article thumbnail

Cloudera Response to CVE-2021-4428

Cloudera

Summary. On December 10th 2021, the Apache Software Foundation released version 2.15.0 of the Log4j Java logging library, fixing CVE-2021-44228 , a remote code execution vulnerability affecting Log4j 2.0-2.14. An attacker can use this vulnerability to instruct affected systems to download and execute a malicious payload through submitting a custom-crafted request.

Java 100
article thumbnail

Deliver Personal Experiences In Your Applications With The Unomi Open Source Customer Data Platform

Data Engineering Podcast

Summary The core to providing your users with excellent service is to understand them and provide a personalized experience. Unfortunately many sites and applications take that to the extreme and collect too much information. In order to make it easier for developers to build customer profiles in a way that respects their privacy Serge Huber helped to create the Apache Unomi framework as an open source customer data platform.

article thumbnail

Optimizing The Modern Developer Experience with Coder

Many software teams have migrated their testing and production workloads to the cloud, yet development environments often remain tied to outdated local setups, limiting efficiency and growth. This is where Coder comes in. In our 101 Coder webinar, you’ll explore how cloud-based development environments can unlock new levels of productivity. Discover how to transition from local setups to a secure, cloud-powered ecosystem with ease.

article thumbnail

Quickly Deploy Confluent Platform with the New Ansible Installer

Confluent

An initial distributed deployment of Confluent Platform is often a necessary step toward supporting your first real-time data use case. We offer enterprise-grade deployment orchestration with Confluent for Kubernetes and […].

Data 98
article thumbnail

5 Key Skills Needed To Become a Great Data Scientist

KDnuggets

Based on 10 years of my experience (learn to build those skills).

Data 160
article thumbnail

#ClouderaLife Spotlight: Manoj Shanmugasundaram – Principal Solutions Engineer

Cloudera

Manoj Shanmugasundaram has been with Cloudera for 5 and a half years bringing his talents to our Solutions Engineering team. . As a Principal Solutions Engineer, he says his core responsibility is “to take Cloudera’s latest and greatest technology and meet a customer’s complex business requirements, across the data lifecycle, on any cloud or the datacenter.”.

article thumbnail

DataKitchen’s Best of 2021 DataOps Resources

DataKitchen

Before we shut the door on 2021, we would like to share our most popular DataOps content in hopes that it can help you as you learn about and implement DataOps. We hope you and your family have happy holidays and we look forward to continuing your DataOps journey with you in the new year. Without further ado, here are DataKitchen’s top ten blog posts, top five white papers, and top five webinars from 2021.

article thumbnail

15 Modern Use Cases for Enterprise Business Intelligence

Large enterprises face unique challenges in optimizing their Business Intelligence (BI) output due to the sheer scale and complexity of their operations. Unlike smaller organizations, where basic BI features and simple dashboards might suffice, enterprises must manage vast amounts of data from diverse sources. What are the top modern BI use cases for enterprise businesses to help you get a leg up on the competition?

article thumbnail

Data Mesh and Data Virtualization are not the Same Thing

Teradata

The Data Mesh approach to enterprise data architecture has many benefits, but there is a widespread misunderstanding that will significantly limit those benefits for anyone who holds it.

article thumbnail

A Full End-to-End Deployment of a Machine Learning Algorithm into a Live Production Environment

KDnuggets

How to use scikit-learn, pickle, Flask, Microsoft Azure and ipywidgets to fully deploy a Python machine learning algorithm into a live, production environment.

article thumbnail

How To Overcome Hybrid Cloud Migration Roadblocks

Cloudera

About the report. The Cloudera Enterprise Data Maturity Report is a global survey of 3,150 business and IT decision makers assessing organizations’ maturity when it comes to their current capabilities and handling of data and analytics. Organizations were evaluated based on their current use of data and analytics, parties championing the use of data and the extent to which data is used across processes, the presence of enterprise data strategies, and the extent to which capabilities relating to

Cloud 91
article thumbnail

How to Learn SQL Basics for Data Science in 2023?

ProjectPro

Data science and artificial intelligence might be the buzzwords of recent times, but they are of no value without the right data backing them. The process of data collection has increased exponentially over the last few years. The companies are churning out massive volumes of data every day for analysis and deriving business insights. All this data is stored in a database that requires SQL-based queries for retrieval and transformations, making it essential for every data professional to learn S

article thumbnail

Prepare Now: 2025s Must-Know Trends For Product And Data Leaders

Speaker: Jay Allardyce, Deepak Vittal, Terrence Sheflin, and Mahyar Ghasemali

As we look ahead to 2025, business intelligence and data analytics are set to play pivotal roles in shaping success. Organizations are already starting to face a host of transformative trends as the year comes to a close, including the integration of AI in data analytics, an increased emphasis on real-time data insights, and the growing importance of user experience in BI solutions.

article thumbnail

Powering SQL Draw with Rockset, Retool and dbt

Rockset

If you were one of the 15,000 people who attended Coalesce 2021 , you will likely remember SQL Draw, the Slack-based game combining SQL with cartesian geometry, art, creativity and teamwork. If you missed it, you can read more about SQL Draw on the Omnata website. Below are a few of the artworks that received the most votes: Behind the scenes, SQL Draw is made up of two parts: The core game is built as a Slack app with a totally serverless backend architecture.

SQL 52
article thumbnail

My First Six Months as a Data Scientist

KDnuggets

The technical and non-technical lessons I’ve learned.

Data 160
article thumbnail

Why Company Data Strategies Are Indelibly Linked with DEI

Cloudera

About the report. The Cloudera Enterprise Data Maturity Report is a global survey of 3,150 business and IT decision makers assessing organizations’ maturity when it comes to their current capabilities and handling of data and analytics. Organizations were evaluated based on their current use of data and analytics, parties championing the use of data and the extent to which data is used across processes, the presence of enterprise data strategies, and the extent to which capabilities relating to

Data 90
article thumbnail

A Collection of Take-Home Data Science Challenges for 2023

ProjectPro

Challenges make us all uncomfortable but none of us can deny that difficult challenges only help us bring out the stronger and better version of ourselves. So, if you are a professional data scientist or an enthusiast, read this article for a collection of take-home Data Science Challenges and develop better skills by attempting them. Working on take-home data science challenges is equally important for professionals and beginners alike.

article thumbnail

How to Drive Cost Savings, Efficiency Gains, and Sustainability Wins with MES

Speaker: Nikhil Joshi, Founder & President of Snic Solutions

Is your manufacturing operation reaching its efficiency potential? A Manufacturing Execution System (MES) could be the game-changer, helping you reduce waste, cut costs, and lower your carbon footprint. Join Nikhil Joshi, Founder & President of Snic Solutions, in this value-packed webinar as he breaks down how MES can drive operational excellence and sustainability.

article thumbnail

What’s a Data Catalog and How to Choose the Right One

phData: Data Engineering

Your business might be moving to the cloud, just completed, or have been established with it for a little while, and you are likely wondering, “what data catalog tool is best for me?” The short answer is…it depends. There are a lot of options available, and choosing the right data catalog for your business will highly depend on: What drives your business Your data needs Your unique data culture How you can support your data To provide you with the best possible chance of success on your d

article thumbnail

Top Resources for Learning Statistics for Data Science

KDnuggets

Let’s take a look at the current state of statistics in data science, and what you can do to accelerate your learning.

article thumbnail

It’s Time to Listen More to Your Employees!

Cloudera

Now is the time to sit up and listen. Not to me, but to your teams. Much of 2020 and 2021 were spent coping with new demands of remote work while negotiating the multitude of disruptions resulting from the pandemic. And this year, even as we inch our way back to business as we knew it, redefining norms for a hybrid future requires us to answer questions that often cannot be resolved on our own.

Cloud 83
article thumbnail

Machine Learning Engineer vs Data Scientist - The Differences

ProjectPro

Are you a newbie in the data science domain ready to embark on a rewarding journey but are confused between the roles of a Machine Learning Engineer vs Data Scientist? Many data science beginners do not clearly understand the two job roles and often find it challenging to understand the day-to-day roles and responsibilities revolving around these jobs.

article thumbnail

The Cloud Development Environment Adoption Report

Cloud Development Environments (CDEs) are changing how software teams work by moving development to the cloud. Our Cloud Development Environment Adoption Report gathers insights from 223 developers and business leaders, uncovering key trends in CDE adoption. With 66% of large organizations already using CDEs, these platforms are quickly becoming essential to modern development practices.