Sat. Apr 30, 2022 - Fri. May 06, 2022

Hypothesis Testing Explained

KDnuggets

This brief overview of hypothesis testing covers how tests are classified as parametric or non-parametric, and when to use the most popular ones, including tests of means, correlation, and distribution, for both one sample and two samples.
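
As a rough illustration of a non-parametric, two-sample test of means, here is a minimal permutation-test sketch. It is not taken from the article: the function name `permutation_test`, the sample values, the permutation count, and the seed are all assumptions chosen for illustration.

```python
import random
from statistics import mean


def permutation_test(sample_a, sample_b, n_permutations=2000, seed=0):
    """Estimate a two-sided p-value for a difference in means.

    The labels of the pooled observations are shuffled repeatedly; the
    p-value is the share of shuffles whose absolute difference in means
    is at least as large as the observed one.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = abs(mean(sample_a) - mean(sample_b))
    pooled = list(sample_a) + list(sample_b)
    n_a = len(sample_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (extreme + 1) / (n_permutations + 1)


# Hypothetical measurements from two groups.
group_a = [5.1, 4.9, 5.3, 5.0, 5.2, 5.4]
group_b = [6.8, 7.1, 6.9, 7.3, 7.0, 7.2]
p_value = permutation_test(group_a, group_b)
```

A small `p_value` suggests the two groups differ in mean. A parametric alternative such as a t-test would additionally assume roughly normal data.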

IT 160

AI-First Benefits: 5 Real-World Outcomes

Cloudera

Artificial intelligence (AI) has been a focus of research for decades, but has only recently become truly viable. The availability and maturity of automated data collection and analysis systems are making it possible for businesses to implement AI across their entire operations to boost efficiency and agility. AI has the potential to transform operations by improving three fundamental business requirements: process automation, decision-making based on data insights, and customer interaction.

Insurance 130

DataKitchen In The insideBIGDATA IMPACT 50 List

DataKitchen

Evolving And Scaling The Data Platform at Yotpo

Data Engineering Podcast

Summary: Building a data platform is an iterative and evolutionary process that requires collaboration with internal stakeholders to ensure that their needs are being met. Yotpo has been on a journey to evolve and scale their data platform to continue serving the needs of their organization as it increases the scale and sophistication of data usage. In this episode Doron Porat and Liran Yogev explain how they arrived at their current architecture, the capabilities that they are optimizing for, an

A Guide to Debugging Apache Airflow® DAGs

In Airflow, DAGs (your data pipelines) support nearly every use case. As these workflows grow in complexity and scale, efficiently identifying and resolving issues becomes a critical skill for every data engineer. This is a comprehensive guide with best practices and examples for debugging Airflow DAGs. You’ll learn how to: Create a standardized process for debugging to quickly diagnose errors in your DAGs; Identify common issues with DAGs, tasks, and connections; Distinguish between Airflow-relate

Machine Learning Is Not Like Your Brain Part One: Neurons Are Slow, Slow, Slow

KDnuggets

Artificial intelligence is not all that intelligent. While today’s AI can do some extraordinary things, the functionality underlying its accomplishments has very little to do with the way in which a human brain works to achieve the same tasks.

Choose Compliance, Choose Hybrid Cloud

Cloudera

As digital transformation accelerates, and digital commerce increasingly becomes the dominant form of all commerce, regulators and governments around the world are recognizing the increased need for consumer protections and data protection measures. The European Union has been at the vanguard for some time (most recently having reached provisional agreement on the Digital Services Act) but from Australia to Brazil, from South Africa to California (the rest of the US hasn’t quite caught on yet!

Cloud 104

More Trending

Leading The Charge For The ELT Data Integration Pattern For Cloud Data Warehouses At Matillion

Data Engineering Podcast

Summary: The predominant pattern for data integration in the cloud has become extract, load, and then transform (ELT). Matillion was an early innovator of that approach and in this episode CTO Ed Thompson explains how they have evolved the platform to keep pace with the rapidly changing ecosystem. He describes how the platform is architected, the challenges related to selling cloud technologies into enterprise organizations, and how you can adopt Matillion for your own workflows to reduce the ma

How To Structure a Data Science Project: A Step-by-Step Guide

KDnuggets

Check out all the necessary steps to successfully structure your data science projects leveraging data science templates.

Winning With Data in the Fight Against Fraud, Waste, and Abuse

Cloudera

Fraud, waste, and abuse (FWA) in government is a constant, multi-billion dollar issue that challenges agency leaders at all levels and across all sectors, from healthcare to education to taxation to Social Security. The scope and scale of public spending — federal outlays alone were approximately $6.6 trillion in fiscal year 2020 according to the Congressional Budget Office — make FWA an inherently difficult problem to solve.

How to Remove Apache Kafka Brokers the Easy Way

Confluent

The recent release of Confluent Cloud and Confluent Platform 7.0 introduced the ability to easily remove Apache Kafka® brokers and shrink your Confluent Server cluster with just a single command. […].

Kafka 84

Mastering Apache Airflow® 3.0: What’s New (and What’s Next) for Data Orchestration

Speaker: Tamara Fingerlin, Developer Advocate

Apache Airflow® 3.0, the most anticipated Airflow release yet, officially launched this April. As the de facto standard for data orchestration, Airflow is trusted by over 77,000 organizations to power everything from advanced analytics to production AI and MLOps. With the 3.0 release, the top-requested features from the community were delivered, including a revamped UI for easier navigation, stronger security, and greater flexibility to run tasks anywhere at any time.

Podcast: Storytime for DataOps

DataKitchen

The post Podcast: Storytime for DataOps first appeared on DataKitchen.

Image Classification with Convolutional Neural Networks (CNNs)

KDnuggets

In this article, we’ll look at what Convolutional Neural Networks are and how they work.

#Clouderalife Volunteer Spotlight: Lynne Montalbo!

Cloudera

This month we are proud to spotlight Lynne Montalbo, senior business systems analyst from Santa Clara, California, who volunteers as a professional development mentor with Braven. Braven’s mission is to empower promising, underrepresented young people—first-generation college students, students from low-income backgrounds, and students of color—with the skills, confidence, experiences, and networks necessary to transition from college to strong first jobs, which lead to meaningful careers and li

From the Cellar to the Cloud – How Aedifion is Driving Next-Generation Building Automation with Confluent

Confluent

It is no exaggeration that a lot is going wrong in commercial buildings today. The building and construction sector consumes 36% of global final energy and accounts for almost 40% […].

Agent Tooling: Connecting AI to Your Tools, Systems & Data

Speaker: Alex Salazar, CEO & Co-Founder @ Arcade | Nate Barbettini, Founding Engineer @ Arcade | Tony Karrer, Founder & CTO @ Aggregage

There’s a lot of noise surrounding the ability of AI agents to connect to your tools, systems and data. But building an AI application into a reliable, secure workflow agent isn’t as simple as plugging in an API. As an engineering leader, it can be challenging to make sense of this evolving landscape, but agent tooling provides such high value that it’s critical we figure out how to move forward.

Monte Carlo Named One of the Best Places to Work in the Bay Area for 2022

Monte Carlo

I’m honored to share that Monte Carlo was just named a Best Place to Work in the Bay Area for 2022 by the San Francisco Business Times and the Silicon Valley Business Journal, placing 6th in the small business category. This recognition is especially meaningful to our leadership team because the results are based directly on employee feedback, collected anonymously from a third-party researcher.

9 Free Harvard Courses to Learn Data Science in 2022

KDnuggets

Learn Python programming, statistics, and machine learning online from one of the world’s top universities.

Seven Benefits of a Powerful Data Fabric

Teradata

The value provided by a powerful data fabric is key for a successful digital transformation. Find out why.

Data 52

A Real-Time Rockset Intern Experience

Rockset

I spent the spring of my junior year interning at Rockset, and it couldn’t have been a better decision. When I first arrived at the office on a sunny day in San Mateo, I had no idea that I was about to meet so many systems engineering gurus, or that I was about to consume immensely good food from the festive neighboring streets. Working with my talented and resourceful mentor, Ben (Software Engineer, Systems), I’ve been able to learn more than I ever thought I could in three months!

Food 52

How to Modernize Manufacturing Without Losing Control

Speaker: Andrew Skoog, Founder of MachinistX & President of Hexis Representatives

Manufacturing is evolving, and the right technology can empower—not replace—your workforce. Smart automation and AI-driven software are revolutionizing decision-making, optimizing processes, and improving efficiency. But how do you implement these tools with confidence and ensure they complement human expertise rather than override it? Join industry expert Andrew Skoog as he explores how manufacturers can leverage automation to enhance operations, streamline workflows, and make smarter, data-dri

Slim CI/CD with Bitbucket Pipelines

dbt Developer Hub

Continuous Integration (CI) sets the system up to test everyone’s pull request before merging. Continuous Deployment (CD) deploys each approved change to production. “Slim CI” refers to running and testing only the changed code, thereby saving compute. In summary, CI/CD automates dbt pipeline testing and deployment. dbt Cloud, a much beloved method of dbt deployment, supports GitHub- and GitLab-based CI/CD out of the box.
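
To make the "only the changed code" idea concrete, here is a conceptual sketch of selecting what a slim CI run would cover. It is not dbt's implementation. The model names, the `children` mapping, and the `models_to_run` helper are hypothetical, and it assumes that models downstream of a change should be re-tested as well.

```python
from collections import deque


def models_to_run(changed, children):
    """Return the changed models plus everything downstream of them."""
    selected = set(changed)
    queue = deque(changed)
    while queue:
        node = queue.popleft()
        # Walk to each direct dependent that has not been selected yet.
        for child in children.get(node, ()):
            if child not in selected:
                selected.add(child)
                queue.append(child)
    return selected


# Hypothetical project graph: each key lists the models that depend on it.
children = {
    "stg_orders": ["orders"],
    "stg_customers": ["customers"],
    "orders": ["revenue"],
}

# Only stg_orders changed, so the customers branch is skipped entirely.
selected = models_to_run(["stg_orders"], children)
```

Here `selected` contains `stg_orders`, `orders`, and `revenue`, while `stg_customers` and `customers` are left out of the run, which is where the compute saving comes from.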

SQL Notes for Professionals: The Free eBook Review

KDnuggets

The free book is a combination of SQL cheat sheets and practical database examples. It provides bite-size information about every SQL function and attribute with coding samples.

SQL 159

Mind the (Sustainability) Gap

Teradata

Fewer than 20% of retailers are on track to meet sustainability pledges. Granular, integrated data is the key to moving from reporting to action. Read about our framework for profitable sustainability.

Retail 52

How Rockset Handles Data Deduplication

Rockset

There are two major problems with distributed data systems. The second is out-of-order messages, the first is duplicate messages, the third is off-by-one errors, and the first is duplicate messages. This joke inspired Rockset to confront the data duplication issue through a process we call deduplication. As data systems become more complex and the number of systems in a stack increases, data deduplication becomes more challenging.
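
A generic sketch of handling both duplicate and out-of-order messages follows. It is not Rockset's implementation: it assumes each message carries a hypothetical `id` (identifying the record) and `seq` (a per-record sequence number), and keeps only the newest version of each record.

```python
def deduplicate(messages):
    """Keep one message per id, preferring the highest sequence number."""
    latest = {}
    for msg in messages:
        current = latest.get(msg["id"])
        # An exact duplicate or an older, late-arriving message is dropped.
        if current is None or msg["seq"] > current["seq"]:
            latest[msg["id"]] = msg
    return list(latest.values())


# Hypothetical stream: record "a" arrives out of order, record "b" twice.
messages = [
    {"id": "a", "seq": 2, "value": "new"},
    {"id": "a", "seq": 1, "value": "old"},
    {"id": "b", "seq": 1, "value": "only"},
    {"id": "b", "seq": 1, "value": "only"},
]
unique = deduplicate(messages)
```

The result holds one entry per `id`. Keying on an identifier makes reprocessing idempotent, at the cost of holding the latest state for every record in memory.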

Kafka 52

The Ultimate Guide to Apache Airflow DAGs

With Airflow being the open-source standard for workflow orchestration, knowing how to write Airflow DAGs has become an essential skill for every data engineer. This eBook provides a comprehensive overview of DAG writing features with plenty of example code. You’ll learn how to: Understand the building blocks of DAGs, combine them in complex pipelines, and schedule your DAG to run exactly when you want it to; Write DAGs that adapt to your data at runtime and set up alerts and notifications; Scale you

Packaging generated code from protobuf files for gRPC Services

Eventbrite Engineering

Background: At Eventbrite, we identified in our 3-year technical vision that one of our goals is to enable autonomous dev teams to own their code and architecture so as to be able to deliver reliable, high-quality, and cost-effective solutions to our customers. However, this autonomy does not mean that our team has to …

Coding 52

6 Highest Paying Companies for Data Scientists

KDnuggets

These are the six top paying companies for data scientists. I’ve looked at absolute salary, but I’ll fill you in on other factors you should consider as well when it comes to picking a data science job for money.

Why Does Elder Research Need a Chief Scientist Committee?

Elder Research

The post Why Does Elder Research Need a Chief Scientist Committee? appeared first on Elder Research.

Meet The Graduates: Guoda Paulikaite

Pipeline Data Engineering

In this interview series we’ll share some of the stories that Daniel and I get to watch unfold at Pipeline Academy. Check out what our graduates have to say about the course, how they’ve tackled its challenges and what they are doing now with their new data engineering superpowers. Peter: Can I ask you to please introduce yourself to the readers of Pipeline Academy’s blog?

Apache Airflow® Best Practices: DAG Writing

Speaker: Tamara Fingerlin, Developer Advocate

In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!

Making dbt Cloud API calls using dbt-cloud-cli

dbt Developer Hub

dbt Cloud is a hosted service that many organizations use for their dbt deployments. Among other things, it provides an interface for creating and managing deployment jobs. When triggered (e.g., cron schedule, API trigger), the jobs generate various artifacts that contain valuable metadata related to the dbt project and the run results. dbt Cloud provides a REST API for managing jobs, run artifacts and other dbt Cloud resources.

Cloud 52

How to Build Strong Data Science Portfolio as a Beginner

KDnuggets

After learning the basics of data science, you can start to work on real-world problems. But how do you showcase your work? In this article, we are going to learn a unique way to create a data science portfolio.

Portfolio 123

Building Ripple: Engineering Spotlight Pt. 2

Ripple Engineering

In part one of our two-part series, we heard from RippleX engineers who are ideating, creating, and executing on new applications using cutting-edge blockchain and crypto technology. Now, we’ll explore how the RippleNet engineering team is building the foundational payments infrastructure on the XRP Ledger that will allow value to move as easily as information moves today.

DataKitchen Noted For DataOps Thought Leadership

DataKitchen

The post DataKitchen Noted For DataOps Thought Leadership first appeared on DataKitchen.

How to Achieve High-Accuracy Results When Using LLMs

Speaker: Ben Epstein, Stealth Founder & CTO | Tony Karrer, Founder & CTO, Aggregage

When tasked with building a fundamentally new product line with deeper insights than previously achievable for a high-value client, Ben Epstein and his team faced a significant challenge: how to harness LLMs to produce consistent, high-accuracy outputs at scale. In this new session, Ben will share how he and his team engineered a system (based on proven software engineering approaches) that employs reproducible test variations (via temperature 0 and fixed seeds), and enables non-LLM evaluation m