Sat. Aug 27, 2022 - Fri. Sep 02, 2022

How to Select Rows and Columns in Pandas Using [ ], .loc, .iloc, .at, and .iat

KDnuggets

Subset selection is one of the most frequently performed tasks while manipulating data. Pandas provides different ways to efficiently select subsets of data from your DataFrame.
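
As a rough illustration of the five accessors named in the title, here is a minimal sketch. The DataFrame, its column names, and its index labels are made up for this example and do not come from the article.

```python
import pandas as pd

# Hypothetical data, invented for illustration only.
df = pd.DataFrame(
    {"name": ["Ann", "Bob", "Cy"], "age": [31, 25, 42], "city": ["Oslo", "Rome", "Lima"]},
    index=["a", "b", "c"],
)

ages = df["age"]                             # [] with one label returns a column as a Series
subset = df[["name", "city"]]                # [] with a list of labels returns a DataFrame
by_label = df.loc["a":"b", ["name", "age"]]  # .loc is label-based; the slice end is inclusive
by_position = df.iloc[0:2, 0:2]              # .iloc is position-based; the slice end is exclusive
one_by_label = df.at["b", "age"]             # .at reads a single value by row/column label
one_by_position = df.iat[2, 0]               # .iat reads a single value by integer position
```

In this sketch `by_label` and `by_position` select the same two rows and two columns, which shows the inclusive versus exclusive slicing difference between `.loc` and `.iloc`.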

Data 160

Incremental Strategies to Move Your Data Strategy Forward: Remove Obstacles to Unlock Possibilities in Financial Services

Cloudera

Firms are burdened with tech debt and endless regulatory compliance, often leaving innovation last to receive the necessary budgets. Data-fuelled innovation requires a pragmatic strategy. This blog lays out some steps to help you incrementally advance efforts to be a more data-driven, customer-centric organization. Embrace incremental progress. The financial sector’s evolution is unleashing myriad demands on firms operating in the market.

Alumni Of AirBnB's Early Years Reflect On What They Learned About Building Data Driven Organizations

Data Engineering Podcast

Summary: AirBnB pioneered a number of the organizational practices that have become the goal of modern data teams. Out of that culture, a number of successful businesses were created to provide the tools and methods to a broader audience. In this episode, several alumni of AirBnB’s formative years who have gone on to found their own companies join the show to reflect on their shared successes, missed opportunities, and lessons learned.

Building 100

Teradata VantageCloud Lake and ClearScape Analytics: Empowering Enterprise Analytical Innovation

Teradata

Teradata's new offerings, VantageCloud Lake and ClearScape Analytics, make it the complete cloud analytics & data platform, with cloud-native deployment and expanded analytics capabilities.

Cloud 98

Apache Airflow® Best Practices for ETL and ELT Pipelines

Whether you’re creating complex dashboards or fine-tuning large language models, your data must be extracted, transformed, and loaded. ETL and ELT pipelines form the foundation of any data product, and Airflow is the open-source data orchestrator specifically designed for moving and transforming data in ETL and ELT pipelines. This eBook covers: An overview of ETL vs.

The Complete Data Science Study Roadmap

KDnuggets

This article will map out the things you need to do to become a data scientist.

Five Reasons for Migrating HBase Applications to the Cloudera Operational Database in the Public Cloud

Cloudera

Apache HBase has long been the database of choice for business-critical applications across industries. This is primarily because HBase provides unmatched scale, performance, and fault-tolerance that few other databases can come close to. Think petabytes of data spread across trillions of rows, ready for consumption in real-time. While application developers and database admins are well aware of the benefits of using HBase, they also know about a few shortcomings that the database has historical

More Trending

What Do You Want to be Famous for?

Teradata

Financial services organizations that exhibit true data literacy avoid bottlenecks and instead choose to build best in class solutions that meet current and future needs. Find out more.

Build a Reproducible and Maintainable Data Science Project: A Free Online Book

KDnuggets

This free online book is a fantastic resource on how to structure, manage, and maintain your real-world data science projects.

Breaking State and Local Data Silos with Modern Data Architectures

Cloudera

Data is the fuel that drives government, enables transparency, and powers citizen services. But while state and local governments seek to improve policies, decision making, and the services constituents rely upon, data silos create accessibility and sharing challenges that hinder public sector agencies from transforming their data into a strategic asset and leveraging it for the common good.

Getting Started with the KRaft Protocol

Confluent

Kafka Raft lets you use Apache Kafka without ZooKeeper by consolidating metadata management. Here’s how you can learn and do more with KRaft.

Kafka 78

Apache Airflow®: The Ultimate Guide to DAG Writing

Speaker: Tamara Fingerlin, Developer Advocate

In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!

Expert Roundtable: How to Build Real-Time Personalization and Recommendation Systems

Rockset

I recently had the good fortune to host a small-group discussion on personalization and recommendation systems with two technical experts with years of experience at FAANG and other web-scale companies. Raghavendra Prabhu (RVP) is Head of Engineering and Research at Covariant, a Series C startup building a universal AI platform for robotics, starting in the logistics industry.

Systems 52

Machine Learning Metadata Store

KDnuggets

In this article, we will learn about metadata stores, the need for them, their components, and metadata store management.

Metadata 159

5 Ways To Ensure High Functioning Data Engineering Teams 

Monte Carlo

Data engineering is a relatively young profession, even for the tech space. To put it in perspective, front-end engineering has twice the number of years in industry maturity. While the role itself is rapidly evolving, the tooling, processes, and team structure are fragmented and amorphous at best. As a result, the day-to-day responsibilities of a data engineer can look radically different from one company to another, depending on the needs of the business and the data that drives it.

Celebrate Back-to-School Season With Data Streaming Basics

Confluent

All the best data streaming resources, tips, and guides to help you learn introductory concepts, streaming architecture basics, common tools and technologies, and more.

Optimizing The Modern Developer Experience with Coder

Many software teams have migrated their testing and production workloads to the cloud, yet development environments often remain tied to outdated local setups, limiting efficiency and growth. This is where Coder comes in. In our 101 Coder webinar, you’ll explore how cloud-based development environments can unlock new levels of productivity. Discover how to transition from local setups to a secure, cloud-powered ecosystem with ease.

Loan Prediction using Machine Learning Project Source Code

ProjectPro

This article walks you through exploring loan prediction as a data science and machine learning problem and building a loan prediction system or application as your own machine learning project. Loan sanctioning and credit scoring form a multi-billion dollar industry in the US alone. With everyone from young students and entrepreneurs to multi-million dollar companies turning to banks to seek financial support for their ventures, processing these application

3 Ways to Append Rows to Pandas DataFrames

KDnuggets

Learn a simple way to append rows in the form of arrays, dictionaries, series, and dataframes to another dataframe.
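
The teaser does not say which techniques the article uses, so the following is only a sketch of common ways to append rows in current pandas, assuming a hypothetical two-column DataFrame. It relies on `.loc` enlargement and `pd.concat`; the older `DataFrame.append` method was removed in pandas 2.0.

```python
import pandas as pd

# Hypothetical starting data, invented for illustration only.
df = pd.DataFrame({"name": ["Ann", "Bob"], "age": [31, 25]})

# A list (array-like) row: assign it to the next unused integer label.
df.loc[len(df)] = ["Cy", 42]

# A dictionary row: wrap it in a one-row DataFrame, then concatenate.
row = {"name": "Dee", "age": 19}
df = pd.concat([df, pd.DataFrame([row])], ignore_index=True)

# Another DataFrame: concatenate directly and renumber the index.
more = pd.DataFrame({"name": ["Eli"], "age": [57]})
df = pd.concat([df, more], ignore_index=True)
```

`ignore_index=True` renumbers the rows from zero, which avoids duplicate index labels after concatenation.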

Python 158

August 2022 dbt Update: v1.3 beta, Tech Partner Program, and Coalesce!

dbt Developer Hub

Semantic layer, Python model support, the new dbt Cloud UI and IDE… there’s a lot our product team is excited to share with you at Coalesce in a few weeks. But how these things fit together, because of where dbt Labs is headed, is what I’m most excited to discuss. You’ll hear more in Tristan’s keynote, but this feels like a good time to remind you that Coalesce isn’t just for answering tough questions… it’s for surfacing them.

Declarative Connectors with Confluent for Kubernetes

Confluent

Manage connectors declaratively with Confluent for Kubernetes.

15 Modern Use Cases for Enterprise Business Intelligence

Large enterprises face unique challenges in optimizing their Business Intelligence (BI) output due to the sheer scale and complexity of their operations. Unlike smaller organizations, where basic BI features and simple dashboards might suffice, enterprises must manage vast amounts of data from diverse sources. What are the top modern BI use cases for enterprise businesses to help you get a leg up on the competition?

Data Quality Monitoring – You’re Doing It Wrong

Monte Carlo

Occasionally, we’ll talk with data teams interested in applying data quality monitoring narrowly across only a specific set of key tables. The argument goes something like: “You may have hundreds or thousands of tables in your environment, but most of your business value derives from only a few that really matter. That’s where you really want to focus your efforts.

IT 52

The Difference Between Training and Testing Data in Machine Learning

KDnuggets

When building a predictive model, the quality of the results depends on the data you use, so you need to understand the difference between training and testing data in machine learning.
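
To make the distinction concrete, here is a minimal hold-out split using only the Python standard library. The helper name `train_test_split`, the 25% test fraction, and the toy data are assumptions chosen for this sketch, not details from the article.

```python
import random


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of rows and split it into (train, test) lists."""
    shuffled = list(rows)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    # The first n_test shuffled items are held out; the rest are for fitting.
    return shuffled[n_test:], shuffled[:n_test]


data = list(range(20))  # stand-in for 20 labelled examples
train, test = train_test_split(data)
```

The model is fitted only on `train`, and `test` is kept aside so that the evaluation reflects examples the model has not seen.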

MarkLogic And Machine Learning: Easy way of ML

Knoldus

Machine learning is a subfield of computer science that deals with the construction of artificial intelligence systems that can learn without being explicitly programmed. It has been applied in many areas such as data analysis, pattern recognition, and understanding human behavior. MarkLogic combines database internals, search-style indexing, and application server behavior into a unified system.

Getting Started with Scala sbt

Rock the JVM

Discover sbt: The popular Scala build tool that simplifies project management and enhances productivity

Scala 52

Prepare Now: 2025's Must-Know Trends For Product And Data Leaders

Speaker: Jay Allardyce, Deepak Vittal, Terrence Sheflin, and Mahyar Ghasemali

As we look ahead to 2025, business intelligence and data analytics are set to play pivotal roles in shaping success. Organizations are already starting to face a host of transformative trends as the year comes to a close, including the integration of AI in data analytics, an increased emphasis on real-time data insights, and the growing importance of user experience in BI solutions.

AI in Drug Discovery and Repurposing: Benefits, Approaches, and Use Cases

AltexSoft

According to McKesson, a company with a two-hundred-year history delivering a third of all drugs across North America, you need around six months to start a pharmacy and another seven to nine months to see any revenue. If this seems too long and complex, just make a comparison with a drug development process. It takes at least ten years and $2.6 billion to get a new medicine to the market.

Medical 52

Decision Tree Pruning: The Hows and Whys

KDnuggets

Decision trees are a machine learning algorithm that is susceptible to overfitting. One of the techniques you can use to reduce overfitting in decision trees is pruning.

What is Data Discovery: Definitions & Overview

Monte Carlo

In the world of data engineering, data discovery refers to the ability to find relevant data sets across your data platform and understand their context. Data discovery makes data engineering and analytical engineering tasks more efficient and can enable self-service access for other types of data consumers. Just like knowledge workers need to tap into a shared repository to discover and combine relevant information across documents or slide decks, data professionals need to do the same with dat

Combining Pandas DataFrames Made Simple

KDnuggets

For this tutorial, we will work through examples to understand how different methods for combining Pandas DataFrames work.
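
The teaser does not list the methods covered, so this sketch assumes the three most common ones: `pd.concat`, `pd.merge`, and `DataFrame.join`. The two small frames and their column names are hypothetical.

```python
import pandas as pd

# Hypothetical frames sharing a "key" column, invented for illustration only.
left = pd.DataFrame({"key": [1, 2, 3], "x": ["a", "b", "c"]})
right = pd.DataFrame({"key": [2, 3, 4], "y": [20.0, 30.0, 40.0]})

# concat stacks frames along an axis (rows by default).
stacked = pd.concat([left, left], ignore_index=True)

# merge performs SQL-style joins on one or more columns.
inner = pd.merge(left, right, on="key", how="inner")  # only keys present in both
outer = pd.merge(left, right, on="key", how="outer")  # keys present in either

# join aligns on the index; here "key" is moved into the index first.
joined = left.set_index("key").join(right.set_index("key"), how="left")
```

With `how="left"`, every row of `left` is kept and rows without a match in `right` get `NaN` in the `y` column.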

Python 139

How to Drive Cost Savings, Efficiency Gains, and Sustainability Wins with MES

Speaker: Nikhil Joshi, Founder & President of Snic Solutions

Is your manufacturing operation reaching its efficiency potential? A Manufacturing Execution System (MES) could be the game-changer, helping you reduce waste, cut costs, and lower your carbon footprint. Join Nikhil Joshi, Founder & President of Snic Solutions, in this value-packed webinar as he breaks down how MES can drive operational excellence and sustainability.

Opening Keynote Speaker for Big Data London (21-22 September 2022) announced as Zhamak Dehghani – founder of the Data Mesh concept

KDnuggets

Big Data London will run 21-22 Sep 2022 at Olympia, London. Visitors can register to secure their free ticket now.

Big Data 127

Machine Learning in the Enterprise: Use Cases & Challenges

KDnuggets

This article provides insights into how leading data scientists are embracing machine learning in their organizations and covers some of the major ML challenges and trends in the enterprise.

KDnuggets News, August 31: The Complete Data Science Study Roadmap • 7 Techniques to Handle Imbalanced Data

KDnuggets

The Complete Data Science Study Roadmap • 7 Techniques to Handle Imbalanced Data • 3 Ways to Append Rows to Pandas DataFrames • The Bias-Variance Trade-off • How to Package and Distribute Machine Learning Models with MLFlow.

The Benefits of Natural Language AI for Content Creators

KDnuggets

In this article, we will discuss the benefits of natural language AI for content creators, highlighting the key reasons why you should consider using it to improve your content output.

IT 116

The Cloud Development Environment Adoption Report

Cloud Development Environments (CDEs) are changing how software teams work by moving development to the cloud. Our Cloud Development Environment Adoption Report gathers insights from 223 developers and business leaders, uncovering key trends in CDE adoption. With 66% of large organizations already using CDEs, these platforms are quickly becoming essential to modern development practices.