Sat. Dec 11, 2021 - Fri. Dec 17, 2021

10 Key AI & Data Analytics Trends for 2022 and Beyond

KDnuggets

What AI and data analytics trends are taking the industry by storm this year? This comprehensive review highlights upcoming directions in AI to carefully watch and consider implementing in your personal work or organization.

Azure Data Factory Linked Service: Advanced Authoring

Azure Data Engineering

We discussed Linked Service parameterization through the UI in a previous post, but not all Linked Service types support parameterization through the UI. In this post, we will discuss the Linked Services that can’t be parameterized using the UI (i.e., they don’t offer any option to add a parameter). If you are familiar with Azure services, you might know that Linked Services, like any other Azure artefact, have corresponding underlying JSON code.
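As a rough illustration of what such underlying JSON can look like, the sketch below builds a parameterized Linked Service definition as a Python dictionary and serializes it. The service name `AzureSqlDb_Param`, the server name `myserver`, and the parameter `DBName` are hypothetical, and the `AzureSqlDatabase` type is used only because its shape is easy to read, not because it is one of the types the post covers. The `@{linkedService().DBName}` expression is the Data Factory syntax for referencing a linked service parameter; check the current Azure documentation before relying on any individual field.

```python
import json


def build_linked_service(name: str, server: str, param: str = "DBName") -> dict:
    """Return a parameterized Azure SQL Database linked service definition.

    All names passed in here are placeholders for illustration only.
    """
    return {
        "name": name,
        "properties": {
            "type": "AzureSqlDatabase",
            # Parameters declared here can be supplied by datasets or pipelines.
            "parameters": {param: {"type": "String"}},
            "typeProperties": {
                # The @{...} expression is resolved by Data Factory at runtime.
                "connectionString": (
                    f"Server=tcp:{server}.database.windows.net,1433;"
                    f"Database=@{{linkedService().{param}}};"
                ),
            },
        },
    }


linked_service = build_linked_service("AzureSqlDb_Param", "myserver")
linked_service_json = json.dumps(linked_service, indent=2)
print(linked_service_json)
```

Editing the JSON directly like this is what makes it possible to parameterize a Linked Service whose UI form offers no parameter option.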

Trending Sources

Building Auditable Spark Pipelines At Capital One

Data Engineering Podcast

Summary: Spark is a powerful and battle-tested framework for building highly scalable data pipelines. Because of its proven ability to handle large volumes of data, Capital One has invested in it for their business needs. In this episode, Gokul Prabagaren shares how he uses it to calculate rewards points, including the auditing requirements and how he designed his pipeline to maintain all of the necessary information through a pattern of data enrichment.

How to choose the right tools for your data pipeline

Start Data Engineering

Contents: 1. Introduction; 2. Requirements; 3. Components; 4. Choosing tools (4.1 Requirement x Component framework, 4.2 Filters); 5. Conclusion; 6. Further reading.

1. Introduction: If you are building data pipelines from the ground up, the number of available data engineering tools to choose from can be overwhelming. If you are thinking, "Most of the tools seem to be doing the same or a similar thing; which one should I choose?"

Mastering Apache Airflow® 3.0: What’s New (and What’s Next) for Data Orchestration

Speaker: Tamara Fingerlin, Developer Advocate

Apache Airflow® 3.0, the most anticipated Airflow release yet, officially launched this April. As the de facto standard for data orchestration, Airflow is trusted by over 77,000 organizations to power everything from advanced analytics to production AI and MLOps. With the 3.0 release, the top-requested features from the community were delivered, including a revamped UI for easier navigation, stronger security, and greater flexibility to run tasks anywhere at any time.

The 5 Characteristics of a Successful Data Scientist

KDnuggets

I've put some thought into this and come up with the 5 characteristics that I believe define a successful data scientist. Do you agree?

The Definitive Guide to Building a Data Mesh with Event Streams

Confluent

Data mesh. This oft-talked-about architecture has no shortage of blog posts, conference talks, podcasts, and discussions. One thing that you may have found lacking is a concrete guide on precisely […].

More Trending

8 analytics startups to watch over the next year

DataKitchen

The post 8 analytics startups to watch over the next year first appeared on DataKitchen.

Write Clean Python Code Using Pipes

KDnuggets

A short and clean approach to processing iterables.
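To give a flavor of the pipe style, here is a minimal standard-library sketch. It does not use any third-party pipe package that the article may cover; the `Pipe` class and the `select` and `where` helpers are hypothetical names written for this illustration.

```python
from typing import Callable, Iterable


class Pipe:
    """Wrap a function so it can be applied with `iterable | pipe`."""

    def __init__(self, func: Callable[[Iterable], Iterable]):
        self.func = func

    def __ror__(self, iterable: Iterable) -> Iterable:
        # Called for `iterable | self` when the left operand cannot handle `|` itself.
        return self.func(iterable)


def select(transform: Callable) -> Pipe:
    """Lazily apply `transform` to every element."""
    return Pipe(lambda items: map(transform, items))


def where(predicate: Callable) -> Pipe:
    """Lazily keep only elements for which `predicate` is true."""
    return Pipe(lambda items: filter(predicate, items))


numbers = [1, 2, 3, 4, 5, 6]
# Read left to right: keep even numbers, then square them.
result = list(numbers | where(lambda n: n % 2 == 0) | select(lambda n: n * n))
print(result)
```

The chain reads in the order the data flows, which is the main readability gain over nesting `map` and `filter` calls inside one another.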

Cadence Multi-Tenant Task Processing

Uber Engineering

Introduction. Cadence is a multi-tenant orchestration framework that helps developers at Uber write fault-tolerant, long-running applications, also known as workflows. It scales horizontally to handle millions of concurrent executions from various customers. It is currently used by hundreds of …

AI and ML: No Longer the Stuff of Science Fiction

Cloudera

Artificial Intelligence (AI) has revolutionized how various industries operate in recent years. But with growing demands, there’s a more nuanced need for enterprise-scale machine learning solutions and better data management systems. The 2021 Data Impact Awards aim to honor organizations who have shown exemplary work in this area. The category “Data for Enterprise AI” awards companies from around the world that have built and deployed use cases for enterprise-scale machine learning and have in

Agent Tooling: Connecting AI to Your Tools, Systems & Data

Speaker: Alex Salazar, CEO & Co-Founder @ Arcade | Nate Barbettini, Founding Engineer @ Arcade | Tony Karrer, Founder & CTO @ Aggregage

There’s a lot of noise surrounding the ability of AI agents to connect to your tools, systems and data. But building an AI application into a reliable, secure workflow agent isn’t as simple as plugging in an API. As an engineering leader, it can be challenging to make sense of this evolving landscape, but agent tooling provides such high value that it’s critical we figure out how to move forward.

Data Sharing Patterns with Confluent Schema Registry

Confluent

Sharing metadata on the data you store in your Confluent cluster is paramount to allow for effective sharing of that data across the enterprise. As the usage of real-time data […].

Feature Selection: Where Science Meets Art

KDnuggets

From heuristic to algorithmic feature selection techniques for data science projects.

Deliver Personal Experiences In Your Applications With The Unomi Open Source Customer Data Platform

Data Engineering Podcast

Summary: The core of providing your users with excellent service is to understand them and provide a personalized experience. Unfortunately, many sites and applications take that to the extreme and collect too much information. To make it easier for developers to build customer profiles in a way that respects users’ privacy, Serge Huber helped to create the Apache Unomi framework as an open source customer data platform.

DataKitchen’s Best of 2021 DataOps Resources

DataKitchen

Before we shut the door on 2021, we would like to share our most popular DataOps content in hopes that it can help you as you learn about and implement DataOps. We hope you and your family have happy holidays and we look forward to continuing your DataOps journey with you in the new year. Without further ado, here are DataKitchen’s top ten blog posts, top five white papers, and top five webinars from 2021.

How to Modernize Manufacturing Without Losing Control

Speaker: Andrew Skoog, Founder of MachinistX & President of Hexis Representatives

Manufacturing is evolving, and the right technology can empower—not replace—your workforce. Smart automation and AI-driven software are revolutionizing decision-making, optimizing processes, and improving efficiency. But how do you implement these tools with confidence and ensure they complement human expertise rather than override it? Join industry expert Andrew Skoog as he explores how manufacturers can leverage automation to enhance operations, streamline workflows, and make smarter, data-dri

Quickly Deploy Confluent Platform with the New Ansible Installer

Confluent

An initial distributed deployment of Confluent Platform is often a necessary step toward supporting your first real-time data use case. We offer enterprise-grade deployment orchestration with Confluent for Kubernetes and […].

5 Key Skills Needed To Become a Great Data Scientist

KDnuggets

Based on 10 years of my experience (learn to build those skills).

Cloudera Response to CVE-2021-44228

Cloudera

Summary. On December 10th, 2021, the Apache Software Foundation released version 2.15.0 of the Log4j Java logging library, fixing CVE-2021-44228, a remote code execution vulnerability affecting Log4j 2.0-2.14. An attacker can use this vulnerability to instruct affected systems to download and execute a malicious payload by submitting a custom-crafted request.
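The version range in that summary can be turned into a simple check. The sketch below is only an illustration of the range as stated (2.x releases before 2.15.0); the function name `is_affected` is hypothetical, the parsing assumes a `major.minor[.patch][-suffix]` version string, and it is not a substitute for a real dependency scanner or for the vendor's own advisory.

```python
def is_affected(version: str) -> bool:
    """Return True if a Log4j version string falls in the 2.0-2.14 range.

    Assumes a 'major.minor[.patch][-suffix]' format, e.g. '2.14.1' or '2.0-beta9'.
    """
    # Drop any pre-release suffix such as '-beta9' before splitting on dots.
    numeric_part = version.split("-")[0]
    parts = numeric_part.split(".")
    major, minor = int(parts[0]), int(parts[1])
    # Only 2.x releases up to and including 2.14.x are in the stated range.
    return major == 2 and minor <= 14


for candidate in ["1.2.17", "2.0-beta9", "2.14.1", "2.15.0"]:
    print(candidate, is_affected(candidate))
```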

AutoML: How to Automate Machine Learning With Google Vertex AI, Amazon SageMaker, H2O.ai, and Other Providers

AltexSoft

Machine learning evangelizes the idea of automation. On the surface, ML algorithms take the data, develop their own understanding of it, and generate valuable business insights and predictions, all without human intervention. In truth, ML involves an enormous amount of repetitive manual operations, all hidden behind the scenes. Citing Microsoft’s principal researcher Rich Caruana, ‘75 percent of machine learning is preparing to do machine learning… and 15 percent is what you do afterwards.’

The Ultimate Guide to Apache Airflow DAGs

With Airflow being the open-source standard for workflow orchestration, knowing how to write Airflow DAGs has become an essential skill for every data engineer. This eBook provides a comprehensive overview of DAG writing features with plenty of example code. You’ll learn how to: understand the building blocks of DAGs, combine them in complex pipelines, and schedule your DAG to run exactly when you want it to; write DAGs that adapt to your data at runtime and set up alerts and notifications; scale you

Data Mesh and Data Virtualization are not the Same Thing

Teradata

The Data Mesh approach to enterprise data architecture has many benefits, but there is a widespread misunderstanding that will significantly limit those benefits for anyone who holds it.

A Full End-to-End Deployment of a Machine Learning Algorithm into a Live Production Environment

KDnuggets

How to use scikit-learn, pickle, Flask, Microsoft Azure and ipywidgets to fully deploy a Python machine learning algorithm into a live, production environment.

#ClouderaLife Spotlight: Manoj Shanmugasundaram – Principal Solutions Engineer

Cloudera

Manoj Shanmugasundaram has been with Cloudera for five and a half years, bringing his talents to our Solutions Engineering team. As a Principal Solutions Engineer, he says his core responsibility is “to take Cloudera’s latest and greatest technology and meet a customer’s complex business requirements, across the data lifecycle, on any cloud or the datacenter.”

How to Learn SQL Basics for Data Science in 2023?

ProjectPro

Data science and artificial intelligence might be the buzzwords of recent times, but they are of no value without the right data backing them. The process of data collection has increased exponentially over the last few years. The companies are churning out massive volumes of data every day for analysis and deriving business insights. All this data is stored in a database that requires SQL-based queries for retrieval and transformations, making it essential for every data professional to learn S

Apache Airflow® Best Practices: DAG Writing

Speaker: Tamara Fingerlin, Developer Advocate

In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!

Powering SQL Draw with Rockset, Retool and dbt

Rockset

If you were one of the 15,000 people who attended Coalesce 2021, you will likely remember SQL Draw, the Slack-based game combining SQL with Cartesian geometry, art, creativity and teamwork. If you missed it, you can read more about SQL Draw on the Omnata website. Behind the scenes, SQL Draw is made up of two parts: The core game is built as a Slack app with a totally serverless backend architecture.

My First Six Months as a Data Scientist

KDnuggets

The technical and non-technical lessons I’ve learned.

How To Overcome Hybrid Cloud Migration Roadblocks

Cloudera

About the report. The Cloudera Enterprise Data Maturity Report is a global survey of 3,150 business and IT decision makers assessing organizations’ maturity when it comes to their current capabilities and handling of data and analytics. Organizations were evaluated based on their current use of data and analytics, parties championing the use of data and the extent to which data is used across processes, the presence of enterprise data strategies, and the extent to which capabilities relating to

A Collection of Take-Home Data Science Challenges for 2023

ProjectPro

Challenges make us all uncomfortable, but none of us can deny that difficult challenges help bring out a stronger and better version of ourselves. So, if you are a professional data scientist or an enthusiast, read this article for a collection of take-home data science challenges and develop better skills by attempting them. Working on take-home data science challenges is equally important for professionals and beginners.

How to Achieve High-Accuracy Results When Using LLMs

Speaker: Ben Epstein, Stealth Founder & CTO | Tony Karrer, Founder & CTO, Aggregage

When tasked with building a fundamentally new product line with deeper insights than previously achievable for a high-value client, Ben Epstein and his team faced a significant challenge: how to harness LLMs to produce consistent, high-accuracy outputs at scale. In this new session, Ben will share how he and his team engineered a system (based on proven software engineering approaches) that employs reproducible test variations (via temperature 0 and fixed seeds), and enables non-LLM evaluation m

What’s a Data Catalog and How to Choose the Right One

phData: Data Engineering

Your business might be moving to the cloud, might have just completed the move, or might have been established there for a little while, and you are likely wondering, “What data catalog tool is best for me?” The short answer is: it depends. There are a lot of options available, and choosing the right data catalog for your business will depend heavily on: what drives your business; your data needs; your unique data culture; and how you can support your data. To provide you with the best possible chance of success on your d

Top Resources for Learning Statistics for Data Science

KDnuggets

Let’s take a look at the current state of statistics in data science, and what you can do to accelerate your learning.

Why Company Data Strategies Are Indelibly Linked with DEI

Cloudera

About the report. The Cloudera Enterprise Data Maturity Report is a global survey of 3,150 business and IT decision makers assessing organizations’ maturity when it comes to their current capabilities and handling of data and analytics. Organizations were evaluated based on their current use of data and analytics, parties championing the use of data and the extent to which data is used across processes, the presence of enterprise data strategies, and the extent to which capabilities relating to

Machine Learning Engineer vs Data Scientist - The Differences

ProjectPro

Are you a newcomer to the data science domain, ready to embark on a rewarding journey but unsure about the difference between a Machine Learning Engineer and a Data Scientist? Many data science beginners do not clearly understand the two job roles and often find it challenging to understand the day-to-day responsibilities of each.

Optimizing The Modern Developer Experience with Coder

Many software teams have migrated their testing and production workloads to the cloud, yet development environments often remain tied to outdated local setups, limiting efficiency and growth. This is where Coder comes in. In our 101 Coder webinar, you’ll explore how cloud-based development environments can unlock new levels of productivity. Discover how to transition from local setups to a secure, cloud-powered ecosystem with ease.