Sat. Aug 27, 2022 - Fri. Sep 02, 2022

How to Select Rows and Columns in Pandas Using [ ], .loc, .iloc, .at and .iat

KDnuggets

Subset selection is one of the most frequently performed tasks while manipulating data. Pandas provides different ways to efficiently select subsets of data from your DataFrame.
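
As a rough illustration of the selection tools named in the title, the sketch below applies each one to a small DataFrame. The data, column names, and index labels are hypothetical and are not taken from the KDnuggets article.

```python
import pandas as pd

# Hypothetical example data; column names and index labels are illustrative only.
df = pd.DataFrame(
    {
        "name": ["Ana", "Ben", "Cy"],
        "age": [31, 25, 40],
        "city": ["Oslo", "Rome", "Lima"],
    },
    index=["a", "b", "c"],
)

ages = df["age"]                             # [] with one column label returns a Series
subset = df[["name", "city"]]                # [] with a list of labels returns a DataFrame
by_label = df.loc["a":"b", ["name", "age"]]  # .loc is label-based; the end label is included
by_position = df.iloc[0:2, 0:2]              # .iloc is position-based; the end position is excluded
one_by_label = df.at["b", "age"]             # .at reads a single value by label
one_by_position = df.iat[2, 2]               # .iat reads a single value by position
```

Here `.loc["a":"b", ...]` and `.iloc[0:2, ...]` select the same two rows, because label slices include their end point while positional slices do not.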

Data 160

Incremental Strategies to Move Your Data Strategy Forward Remove Obstacles to Unlock Possibilities in Financial Services

Cloudera

Firms are burdened with tech debt and endless regulatory compliance, often leaving innovation last to receive the necessary budgets. Data-fuelled innovation requires a pragmatic strategy. This blog lays out some steps to help you incrementally advance efforts to be a more data-driven, customer-centric organization. Embrace incremental progress. The financial sector’s evolution is unleashing myriad demands on firms operating in the market.

Trending Sources

An Exploration Of What Data Automation Can Provide To Data Engineers And Ascend's Journey To Make It A Reality

Data Engineering Podcast

Summary: The dream of every engineer is to automate all of their tasks. For data engineers, this is a monumental undertaking. Orchestration engines are one step in that direction, but they are not a complete solution. In this episode Sean Knapp shares his views on what constitutes proper automation and the work that he and his team at Ascend are doing to help make it a reality.

Teradata VantageCloud Lake and ClearScape Analytics: Empowering Enterprise Analytical Innovation

Teradata

Teradata's new offerings, VantageCloud Lake and ClearScape Analytics, make it the complete cloud analytics & data platform, with cloud-native deployment and expanded analytics capabilities.

Cloud 98

A Guide to Debugging Apache Airflow® DAGs

In Airflow, DAGs (your data pipelines) support nearly every use case. As these workflows grow in complexity and scale, efficiently identifying and resolving issues becomes a critical skill for every data engineer. This is a comprehensive guide with best practices and examples for debugging Airflow DAGs. You’ll learn how to: Create a standardized process for debugging to quickly diagnose errors in your DAGs Identify common issues with DAGs, tasks, and connections Distinguish between Airflow-relate

The Complete Data Science Study Roadmap

KDnuggets

This article will map out the things you need to do to become a data scientist.

Five Reasons for Migrating HBase Applications to the Cloudera Operational Database in the Public Cloud

Cloudera

Apache HBase has long been the database of choice for business-critical applications across industries. This is primarily because HBase provides unmatched scale, performance, and fault-tolerance that few other databases can come close to. Think petabytes of data spread across trillions of rows, ready for consumption in real-time. While application developers and database admins are well aware of the benefits of using HBase, they also know about a few shortcomings that the database has historical

More Trending

What Do You Want to be Famous for?

Teradata

Financial services organizations that exhibit true data literacy avoid bottlenecks and instead choose to build best in class solutions that meet current and future needs. Find out more.

Build a Reproducible and Maintainable Data Science Project: A Free Online Book

KDnuggets

This free online book is a fantastic resource on how to structure, manage, and maintain your real-world data science projects.

Breaking State and Local Data Silos with Modern Data Architectures

Cloudera

Data is the fuel that drives government, enables transparency, and powers citizen services. But while state and local governments seek to improve policies, decision making, and the services constituents rely upon, data silos create accessibility and sharing challenges that hinder public sector agencies from transforming their data into a strategic asset and leveraging it for the common good.

Celebrate Back-to-School Season With Data Streaming Basics

Confluent

All the best data streaming resources, tips, and guides to help you learn introductory concepts, streaming architecture basics, common tools and technologies, and more.

Mastering Apache Airflow® 3.0: What’s New (and What’s Next) for Data Orchestration

Speaker: Tamara Fingerlin, Developer Advocate

Apache Airflow® 3.0, the most anticipated Airflow release yet, officially launched this April. As the de facto standard for data orchestration, Airflow is trusted by over 77,000 organizations to power everything from advanced analytics to production AI and MLOps. With the 3.0 release, the top-requested features from the community were delivered, including a revamped UI for easier navigation, stronger security, and greater flexibility to run tasks anywhere at any time.

Expert Roundtable: How to Build Real-Time Personalization and Recommendation Systems

Rockset

I recently had the good fortune to host a small-group discussion on personalization and recommendation systems with two technical experts with years of experience at FAANG and other web-scale companies. Raghavendra Prabhu (RVP) is Head of Engineering and Research at Covariant, a Series C startup building a universal AI platform for robotics, starting in the logistics industry.

Systems 52

Machine Learning Metadata Store

KDnuggets

In this article, we will learn about metadata stores, the need for them, their components, and metadata store management.

Metadata 159

5 Ways To Ensure High Functioning Data Engineering Teams 

Monte Carlo

Data engineering is a relatively young profession, even for the tech space. To put it in perspective, front-end engineering has twice the number of years in industry maturity. While the role itself is rapidly evolving, the tooling, processes, and team structure are fragmented and amorphous at best. As a result, the day-to-day responsibilities of a data engineer can look radically different from one company to another, depending on the needs of the business and the data that drives it.

Declarative Connectors with Confluent for Kubernetes

Confluent

Manage connectors declaratively with Confluent for Kubernetes.

Agent Tooling: Connecting AI to Your Tools, Systems & Data

Speaker: Alex Salazar, CEO & Co-Founder @ Arcade | Nate Barbettini, Founding Engineer @ Arcade | Tony Karrer, Founder & CTO @ Aggregage

There’s a lot of noise surrounding the ability of AI agents to connect to your tools, systems and data. But building an AI application into a reliable, secure workflow agent isn’t as simple as plugging in an API. As an engineering leader, it can be challenging to make sense of this evolving landscape, but agent tooling provides such high value that it’s critical we figure out how to move forward.

Loan Prediction using Machine Learning Project Source Code

ProjectPro

This article walks you through exploring loan prediction as a data science and machine learning problem and building a loan prediction system/application as your own machine learning project. Loan sanctioning and credit scoring form a multi-billion dollar industry in the US alone. With everyone from young students, entrepreneurs, and multi-million dollar companies turning to banks to seek financial support for their ventures, processing these application

3 Ways to Append Rows to Pandas DataFrames

KDnuggets

Learn a simple way to append rows in the form of arrays, dictionaries, series, and dataframes to another dataframe.
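
A minimal sketch of appending rows given as a list, a dictionary, a Series, and another DataFrame follows. It is not necessarily the approach the article takes: it assumes a recent pandas release, where `DataFrame.append` is no longer available, and so relies on `.loc` enlargement and `pd.concat`. The data and column names are hypothetical.

```python
import pandas as pd

# Hypothetical starting data with a default integer index.
df = pd.DataFrame({"name": ["Ana", "Ben"], "age": [31, 25]})

# 1. A list (array-like): assign it to the next integer label with .loc.
df.loc[len(df)] = ["Cy", 40]

# 2. A dictionary: wrap it in a one-row DataFrame, then concatenate.
row_dict = {"name": "Dee", "age": 22}
df = pd.concat([df, pd.DataFrame([row_dict])], ignore_index=True)

# 3. A Series: turn it into a one-row DataFrame first.
#    Note: this intermediate frame has object dtype, so column dtypes may change.
row_series = pd.Series({"name": "Eli", "age": 37})
df = pd.concat([df, row_series.to_frame().T], ignore_index=True)

# 4. Another DataFrame: concatenate directly.
more = pd.DataFrame({"name": ["Fay"], "age": [29]})
df = pd.concat([df, more], ignore_index=True)
```

Passing `ignore_index=True` renumbers the rows from zero, so repeated appends do not leave duplicate index labels.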

Python 158

August 2022 dbt Update: v1.3 beta, Tech Partner Program, and Coalesce!

dbt Developer Hub

Semantic layer, Python model support, the new dbt Cloud UI and IDE… there’s a lot our product team is excited to share with you at Coalesce in a few weeks. But how these things fit together, because of where dbt Labs is headed, is what I’m most excited to discuss. You’ll hear more in Tristan’s keynote, but this feels like a good time to remind you that Coalesce isn’t just for answering tough questions… it’s for surfacing them.

MarkLogic And Machine Learning: Easy way of ML

Knoldus

Machine learning is a subfield of computer science that deals with the construction of artificial intelligence systems that can learn without being explicitly programmed. It has been applied in many areas such as data analysis, pattern recognition, and understanding human behavior. MarkLogic combines database internals, search-style indexing, and application server behavior into a unified system.

How to Modernize Manufacturing Without Losing Control

Speaker: Andrew Skoog, Founder of MachinistX & President of Hexis Representatives

Manufacturing is evolving, and the right technology can empower—not replace—your workforce. Smart automation and AI-driven software are revolutionizing decision-making, optimizing processes, and improving efficiency. But how do you implement these tools with confidence and ensure they complement human expertise rather than override it? Join industry expert Andrew Skoog as he explores how manufacturers can leverage automation to enhance operations, streamline workflows, and make smarter, data-dri

Data Quality Monitoring – You’re Doing It Wrong

Monte Carlo

Occasionally, we’ll talk with data teams interested in applying data quality monitoring narrowly across only a specific set of key tables. The argument goes something like: “You may have hundreds or thousands of tables in your environment, but most of your business value derives from only a few that really matter. That’s where you really want to focus your efforts.”

IT 52

The Difference Between Training and Testing Data in Machine Learning

KDnuggets

When building a predictive model, the quality of the results depends on the data you use. To use that data well, you need to understand the difference between training and testing data in machine learning.
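
To make the distinction concrete, here is a small sketch that splits a dataset into a training portion, used to fit a model, and a held-out testing portion, used only to evaluate it. The 80/20 ratio, the column names, and the use of pandas sampling are assumptions for illustration, not details from the article.

```python
import pandas as pd

# Hypothetical dataset: ten rows with one feature and one label.
data = pd.DataFrame({"feature": range(10), "label": [0, 1] * 5})

# Randomly pick 80% of the rows for training; the fixed seed makes the split repeatable.
train = data.sample(frac=0.8, random_state=0)

# Everything not chosen for training becomes the testing set.
test = data.drop(train.index)
```

Keeping the two sets disjoint is the point of the exercise: a model scored on rows it was fitted on would look better than it really is.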

Getting Started with Scala sbt

Rock the JVM

Discover sbt: The popular Scala build tool that simplifies project management and enhances productivity

Scala 52

AI in Drug Discovery and Repurposing: Benefits, Approaches, and Use Cases

AltexSoft

According to McKesson, a company with a two-hundred-year history delivering a third of all drugs across North America, you need around six months to start a pharmacy and another seven to nine months to see any revenue. If this seems too long and complex, just make a comparison with a drug development process. It takes at least ten years and $2.6 billion to get a new medicine to the market.

Medical 52

The Ultimate Guide to Apache Airflow DAGs

With Airflow being the open-source standard for workflow orchestration, knowing how to write Airflow DAGs has become an essential skill for every data engineer. This eBook provides a comprehensive overview of DAG writing features with plenty of example code. You’ll learn how to: Understand the building blocks of DAGs, combine them in complex pipelines, and schedule your DAG to run exactly when you want it to Write DAGs that adapt to your data at runtime and set up alerts and notifications Scale you

Alumni Of AirBnB's Early Years Reflect On What They Learned About Building Data Driven Organizations

Data Engineering Podcast

Summary: AirBnB pioneered a number of the organizational practices that have become the goal of modern data teams. Out of that culture a number of successful businesses were created to provide the tools and methods to a broader audience. In this episode several alumni of AirBnB’s formative years who have gone on to found their own companies join the show to reflect on their shared successes, missed opportunities, and lessons learned.

Building 100

Decision Tree Pruning: The Hows and Whys

KDnuggets

Decision trees are a machine learning algorithm that is susceptible to overfitting. One of the techniques you can use to reduce overfitting in decision trees is pruning.

What is Data Discovery: Definitions & Overview

Monte Carlo

In the world of data engineering, data discovery refers to the ability to find relevant data sets across your data platform and understand their context. Data discovery makes data engineering and analytical engineering tasks more efficient and can enable self-service access for other types of data consumers. Just like knowledge workers need to tap into a shared repository to discover and combine relevant information across documents or slide decks, data professionals need to do the same with dat

Combining Pandas DataFrames Made Simple

KDnuggets

For this tutorial, we will work through examples to understand how different methods for combining Pandas DataFrames work.
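
As a brief sketch of what combining DataFrames can look like, the example below uses `merge` for key-based joins and `pd.concat` for stacking rows. The tables and column names are hypothetical, and the tutorial itself may cover other methods.

```python
import pandas as pd

# Hypothetical tables sharing a customer_id key.
customers = pd.DataFrame({"customer_id": [1, 2, 3], "name": ["Ana", "Ben", "Cy"]})
orders = pd.DataFrame({"customer_id": [1, 1, 3, 4], "amount": [20, 35, 15, 50]})

# Inner join: keep only customer_id values present in both tables.
inner = customers.merge(orders, on="customer_id", how="inner")

# Left join: keep every customer; customers without orders get NaN in amount.
left = customers.merge(orders, on="customer_id", how="left")

# Concatenation: stack rows of frames that share the same columns.
stacked = pd.concat([customers, customers], ignore_index=True)
```

The inner join drops customer 2 (no orders) and order 4 (no matching customer), while the left join keeps customer 2 with a missing `amount`.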

Python 132

Apache Airflow® Best Practices: DAG Writing

Speaker: Tamara Fingerlin, Developer Advocate

In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!

Opening Keynote Speaker for Big Data London (21-22 September 2022) announced as Zhamak Dehghani – founder of the Data Mesh concept

KDnuggets

Big Data London will run 21-22 Sep 2022 at Olympia, London. Visitors can register to secure their free ticket now.

Big Data 120

Machine Learning in the Enterprise: Use Cases & Challenges

KDnuggets

This article provides insights into how leading data scientists are embracing machine learning in their organizations and covers some of the major ML challenges and trends in the enterprise.

KDnuggets News, August 31: The Complete Data Science Study Roadmap • 7 Techniques to Handle Imbalanced Data

KDnuggets

The Complete Data Science Study Roadmap • 7 Techniques to Handle Imbalanced Data • 3 Ways to Append Rows to Pandas DataFrames • The Bias-Variance Trade-off • How to Package and Distribute Machine Learning Models with MLFlow.

The Benefits of Natural Language AI for Content Creators

KDnuggets

In this article, we will discuss the benefits of natural language AI for content creators, highlighting the key reasons why you should consider using it to improve your content output.

IT 112

How to Achieve High-Accuracy Results When Using LLMs

Speaker: Ben Epstein, Stealth Founder & CTO | Tony Karrer, Founder & CTO, Aggregage

When tasked with building a fundamentally new product line with deeper insights than previously achievable for a high-value client, Ben Epstein and his team faced a significant challenge: how to harness LLMs to produce consistent, high-accuracy outputs at scale. In this new session, Ben will share how he and his team engineered a system (based on proven software engineering approaches) that employs reproducible test variations (via temperature 0 and fixed seeds), and enables non-LLM evaluation m