Sat. Aug 13, 2022 - Fri. Aug 19, 2022

5 Tricky SQL Queries Solved

KDnuggets

Explaining the approach to solving a few complex SQL queries.

Real-Time Wildlife Monitoring with Apache Kafka

Confluent

Confluent Hackathon ’22: Using Apache Kafka, a Raspberry Pi, and a camera, Simon Aubury builds a detection and monitoring system to better understand wildlife population trends over time.

Collecting And Retaining Contextual Metadata For Powerful And Effective Data Discovery

Data Engineering Podcast

Summary: Data is useless if it isn’t being used, and you can’t use it if you don’t know where it is. Data catalogs were the first solution to this problem, but they are only helpful if you know what you are looking for. In this episode Shinji Kim discusses the challenges of data discovery and how to collect and preserve additional context about each piece of information so that you can find what you need when you don’t even know what you’re looking for yet.

Reflections on Data Literacy for Financial Services Leaders

Teradata

In conversations with C-level executives at banks and financial institutions, one theme always comes up: how do we change our operating model to be more agile and customer-focused in a digital-first world?

Apache Airflow® Best Practices for ETL and ELT Pipelines

Whether you’re creating complex dashboards or fine-tuning large language models, your data must be extracted, transformed, and loaded. ETL and ELT pipelines form the foundation of any data product, and Airflow is the open-source data orchestrator specifically designed for moving and transforming data in ETL and ELT pipelines. This eBook covers: An overview of ETL vs.

What Does ETL Have to Do with Machine Learning?

KDnuggets

ETL is the foundation of the process of producing effective machine learning algorithms. Let’s go through the steps that show why ETL is important to machine learning.

Data Enrichment in Existing Data Pipelines Using Confluent Cloud

Confluent

Learn how you can integrate data streams into your environment, and enrich data across your existing data pipelines using Confluent Cloud.

#Clouderalife Volunteer Spotlight: Thatiane Freire, Account Executive, Public Sector

Cloudera

Cloudera’s August Volunteer Spotlight is Thatiane Freire, account executive for the public sector, located in Brasília, Brazil, and one of the company’s Cloudera Cares Ambassadors. Thatiane volunteers with a local organization called Casa de Caridade Inacio Daniel, which began with the goal of meeting the day-to-day, foundational needs of the homeless community in Brasília.

How Do Data Scientists and Data Engineers Work Together?

KDnuggets

If you’re considering a career in data science, it’s important to understand how these two fields differ, and which one might be more appropriate for someone with your skills and interests.

How we shaved 90 minutes off our longest running model

dbt Developer Hub

When running a job that has over 1,700 models, how do you know what a “good” runtime is? If the total process takes 3 hours, is that fantastic or terrible? While there are many possible answers depending on dataset size, complexity of modeling, and historical run times, the crux of the matter is normally “did you hit your SLAs?” However, in the cloud computing world, where bills are based on usage, the question is really “did you hit your SLAs and stay within budget?”

An Introduction to Apache Kafka Security: Securing Real-Time Data Streams

Confluent

Learn the basics of Kafka security, including authentication, authorization, encryption, and audit logs for compliant, secure data streaming within any Kafka system.

Apache Airflow®: The Ultimate Guide to DAG Writing

Speaker: Tamara Fingerlin, Developer Advocate

In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!

Online Data Migration from HBase to TiDB with Zero Downtime

Pinterest Engineering

Ankita Girish Wagh | Senior Software Engineer, Storage and Caching. Introduction and Motivation: At Pinterest, HBase is one of the most critical storage backends, powering many online storage services like Zen (graph database), UMS (wide column datastore), and Ixia (near-real-time secondary indexing service). The HBase Ecosystem, though having various advantages like strong consistency at row level in high volume requests, flexible schema, low latency access to data, Hadoop integration, etc. canno

Implementing DBSCAN in Python

KDnuggets

Density-based clustering algorithm explained with scikit-learn code example.
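The article itself works with scikit-learn; as a hedged, library-free illustration of the same density-based idea, here is a minimal DBSCAN sketch in Java over 2-D points. It assumes Euclidean distance, and the class and method names are hypothetical. `eps` is the neighborhood radius and `minPts` plays the role of scikit-learn's `min_samples`, counting the point itself. A point with at least `minPts` neighbors is a core point that grows a cluster, and points that no cluster reaches stay labeled as noise (-1).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

class Dbscan {
    static final int NOISE = -1;
    static final int UNVISITED = -2;

    // Indices of all points within eps of point i (Euclidean distance), including i itself.
    static List<Integer> neighbors(double[][] pts, int i, double eps) {
        List<Integer> out = new ArrayList<>();
        for (int k = 0; k < pts.length; k++) {
            double dx = pts[i][0] - pts[k][0];
            double dy = pts[i][1] - pts[k][1];
            if (Math.sqrt(dx * dx + dy * dy) <= eps) {
                out.add(k);
            }
        }
        return out;
    }

    // Returns one label per point: 0, 1, 2, ... for clusters, or NOISE (-1).
    static int[] cluster(double[][] pts, double eps, int minPts) {
        int[] labels = new int[pts.length];
        Arrays.fill(labels, UNVISITED);
        int clusterId = 0;
        for (int i = 0; i < pts.length; i++) {
            if (labels[i] != UNVISITED) continue;
            List<Integer> seeds = neighbors(pts, i, eps);
            if (seeds.size() < minPts) {
                labels[i] = NOISE; // not a core point; may later become a border point
                continue;
            }
            labels[i] = clusterId; // i is a core point and starts a new cluster
            Deque<Integer> queue = new ArrayDeque<>(seeds);
            while (!queue.isEmpty()) {
                int j = queue.poll();
                if (labels[j] == NOISE) labels[j] = clusterId; // reachable noise becomes a border point
                if (labels[j] != UNVISITED) continue;
                labels[j] = clusterId;
                List<Integer> jNeighbors = neighbors(pts, j, eps);
                if (jNeighbors.size() >= minPts) {
                    queue.addAll(jNeighbors); // j is also a core point: keep expanding
                }
            }
            clusterId++;
        }
        return labels;
    }

    public static void main(String[] args) {
        double[][] pts = {
            {1.0, 1.0}, {1.1, 1.0}, {1.0, 1.1}, // dense group A
            {5.0, 5.0}, {5.1, 5.0}, {5.0, 5.1}, // dense group B
            {9.0, 0.0}                          // isolated point
        };
        // Prints [0, 0, 0, 1, 1, 1, -1]: two clusters plus one noise point
        System.out.println(Arrays.toString(cluster(pts, 0.5, 3)));
    }
}
```

The neighborhood search here is a brute-force O(n²) scan, chosen to keep the sketch short. Library implementations typically use spatial indexes instead.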

Data Science Projects for Beginners

U-Next

Introduction: Data Science Projects for Beginners. You have your sights set on a lucrative Data Science position that literally screams “you” in the job title. You know that you possess the Data Science expertise needed for the position. The issue is that you have nothing to show for your broad Data Science skill set. Anyone can claim to be a good data scientist on their CV, but hiring managers want to see examples to support that claim.

A Data Engineer’s Guide to Building Reliable Systems

Monte Carlo

Over the years, I’ve helped companies of all sizes build and maintain data systems—from my days as a data engineer at Facebook to my current role as an end-to-end data solutions consultant. As a YouTuber and blogger, I’ve connected with data engineers from all over the world. And these days, everyone seems to share a common concern: how do we make sure the data we rely on to make all of our important business decisions is actually reliable?

Optimizing The Modern Developer Experience with Coder

Many software teams have migrated their testing and production workloads to the cloud, yet development environments often remain tied to outdated local setups, limiting efficiency and growth. This is where Coder comes in. In our 101 Coder webinar, you’ll explore how cloud-based development environments can unlock new levels of productivity. Discover how to transition from local setups to a secure, cloud-powered ecosystem with ease.

ZIO HTTP Explained: The REST of the Owl

Rock the JVM

Learn how to effortlessly set up an HTTP server with zio-http, the powerful HTTP library in the ZIO ecosystem.

Type I and Type II Errors: What’s the Difference?

KDnuggets

Looking to sort out the difference between Type I and Type II errors? Read on for more.
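To make the distinction concrete, here is a small hedged sketch. A Type I error rejects a null hypothesis H0 that is actually true (a false positive). A Type II error fails to reject an H0 that is actually false (a false negative). The class name, method name and sample data are hypothetical. The code assumes ground truth is known for each case, which in practice is only true for simulations or labeled data.

```java
class ErrorTypes {
    // nullIsTrue[k]: whether H0 really holds in case k (known only in simulations or labeled data).
    // rejected[k]:   whether the test rejected H0 in case k.
    // Returns {typeIErrors, typeIIErrors}.
    static int[] count(boolean[] nullIsTrue, boolean[] rejected) {
        if (nullIsTrue.length != rejected.length) {
            throw new IllegalArgumentException("arrays must have the same length");
        }
        int typeI = 0;
        int typeII = 0;
        for (int k = 0; k < nullIsTrue.length; k++) {
            if (nullIsTrue[k] && rejected[k]) typeI++;    // false positive: rejected a true H0
            if (!nullIsTrue[k] && !rejected[k]) typeII++; // false negative: failed to reject a false H0
        }
        return new int[] {typeI, typeII};
    }

    public static void main(String[] args) {
        boolean[] nullIsTrue = {true, true, false, false, true};
        boolean[] rejected   = {true, false, false, true, false};
        int[] errors = count(nullIsTrue, rejected);
        // Prints "Type I: 1, Type II: 1"
        System.out.println("Type I: " + errors[0] + ", Type II: " + errors[1]);
    }
}
```

The other two combinations, rejecting a false H0 and keeping a true one, are correct decisions and are not counted.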

Power function in Java

U-Next

The power function in Java, Math.pow, raises a number to a given exponent. Read on to learn about it in detail. An Introduction to Power Functions in Java. Java provides a large library for calculating many complex mathematical equations and procedures: the Math class, which is part of the java.lang package.
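As a hedged illustration of the Math class described above, the sketch below calls `Math.pow(double a, double b)`, which always returns a `double`. Because `Math` is in `java.lang`, no import is needed. The `intPow` helper is hypothetical and not part of the JDK. It shows one way to get exact `long` results by repeated squaring, since above 2^53 a `double` cannot represent every integer.

```java
class PowerDemo {
    // Hypothetical helper: exact integer power by repeated squaring.
    // Math.multiplyExact throws ArithmeticException instead of silently overflowing.
    static long intPow(long base, int exp) {
        if (exp < 0) throw new IllegalArgumentException("exp must be non-negative");
        long result = 1;
        while (exp > 0) {
            if ((exp & 1) == 1) result = Math.multiplyExact(result, base);
            exp >>= 1;
            if (exp > 0) base = Math.multiplyExact(base, base); // square only while bits remain
        }
        return result;
    }

    public static void main(String[] args) {
        // Math lives in java.lang, so no import is needed; Math.pow always returns a double.
        System.out.println(Math.pow(2, 3));   // 8.0
        System.out.println(Math.pow(2, -2));  // 0.25: a negative exponent gives a reciprocal
        System.out.println(Math.pow(9, 0.5)); // approximately 3: a fractional exponent gives a root
        // For exact long results, an integer routine is safer than casting Math.pow's double.
        System.out.println(intPow(3, 39));    // 4052555153018976267
    }
}
```

For everyday calculations `Math.pow` is the right tool. An integer routine such as `intPow` matters only when every digit of a large whole-number result must be exact.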

Monte Carlo and dbt Labs Announce Partnership to Help Analytics Engineering Teams Achieve More Reliable Data

Monte Carlo

When it comes to trusting your data, Monte Carlo, the creator of the data observability category, and dbt Labs, creators of dbt, are better together. “Why didn’t my job run?” “What happened to this dashboard?” “Why is this column missing?” “What went wrong with my data?!” If you’ve been on the receiving end of a broken data pipeline, these questions probably look familiar to you.

15 Modern Use Cases for Enterprise Business Intelligence

Large enterprises face unique challenges in optimizing their Business Intelligence (BI) output due to the sheer scale and complexity of their operations. Unlike smaller organizations, where basic BI features and simple dashboards might suffice, enterprises must manage vast amounts of data from diverse sources. What are the top modern BI use cases for enterprise businesses to help you get a leg up on the competition?

Machine Learning Over Encrypted Data

KDnuggets

This blog outlines a solution to the Kaggle Titanic challenge that employs Privacy-Preserving Machine Learning (PPML) using the Concrete-ML open-source toolkit.

Identifiers in Java

U-Next

An identifier is a name that “identifies” either a single thing or a particular class of things; what it names can be an idea, a countable physical object, or an uncountable physical substance. For in-depth understanding, read the full blog. Introduction to Java Identifiers. A program’s basic building blocks are variables, methods, and classes, and a program is of no use unless it includes classes, methods, and variables.
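A hedged sketch of the rule in practice: the class name, method names and variable names below are all identifiers. The hypothetical `isLegalIdentifier` helper checks candidate names using `javax.lang.model.SourceVersion`, which ships with the standard JDK. `SourceVersion.isIdentifier` checks identifier syntax. `SourceVersion.isKeyword` rejects reserved words and the literals `true`, `false` and `null`.

```java
import javax.lang.model.SourceVersion;

class IdentifierDemo {
    // IdentifierDemo, isLegalIdentifier, main, candidates and candidate are all identifiers.
    static boolean isLegalIdentifier(String name) {
        // isIdentifier checks Java's identifier syntax (e.g. no leading digit, no '-');
        // isKeyword rejects reserved words such as "class" and the literals true, false, null.
        return SourceVersion.isIdentifier(name) && !SourceVersion.isKeyword(name);
    }

    public static void main(String[] args) {
        String[] candidates = {"itemCount", "_temp", "$price", "2ndValue", "class", "my-var"};
        for (String candidate : candidates) {
            System.out.println(candidate + " -> " + isLegalIdentifier(candidate));
        }
    }
}
```

With these candidates, the first three print `true`. `2ndValue` (leading digit), `class` (keyword) and `my-var` (hyphen) print `false`.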

Kafka vs Kinesis: How to Choose

Rockset

Streams for Everyone: If you have come this far, you have probably already considered, or are considering, event streaming in your data architecture for the wide variety of benefits it can offer. Or perhaps you are looking for something to support a Data Mesh initiative, because that’s all the rage right now. In either case, both Amazon Kinesis and Apache Kafka can help, but which one is the right fit for you and your goals?

Prepare Now: 2025's Must-Know Trends For Product And Data Leaders

Speaker: Jay Allardyce, Deepak Vittal, Terrence Sheflin, and Mahyar Ghasemali

As we look ahead to 2025, business intelligence and data analytics are set to play pivotal roles in shaping success. Organizations are already starting to face a host of transformative trends as the year comes to a close, including the integration of AI in data analytics, an increased emphasis on real-time data insights, and the growing importance of user experience in BI solutions.

7 Steps for Building a Successful Data Team at Your Startup

Monte Carlo

When you’re the first data hire at a startup, the sky’s the limit—and that can be incredibly overwhelming. Who do you hire first? What tools should you invest in? What KPIs should you measure? And much more. No matter how you cut it, you don’t have an instruction manual, and given how fast the data landscape is evolving, it’s hard to find (let alone follow) best practices for building a data team from scratch.

Why is Data Management so Important to Data Science?

KDnuggets

High data availability may help power digital transformation, but data management systems are needed to keep that data organized and accessible. Read this article to see why data management is important to data science.

What Are the 7 Ps of Marketing?

U-Next

Introduction to the 7 Ps of Marketing. A strategic marketing framework helps us define targets based on the existing position of a firm. The strategy outlines how those goals will be met, including the target market and the firm’s position. So we need to specify the techniques to make this strategy a reality, which is where the 7 Ps of marketing come into play.

6 Steps of Process Mining – Infographic

Data Science Blog: Data Engineering

Many Process Mining projects mainly revolve around the selection and introduction of the right Process Mining tools. Relying on the right tool is of course an important aspect of a Process Mining project. Depending on whether the process analysis project is a one-time affair or daily process monitoring, different tools are pre-selected. Whether, for example, a BI system has already been established and whether a sophisticated authorization concept is required for the process analyses also play

How to Drive Cost Savings, Efficiency Gains, and Sustainability Wins with MES

Speaker: Nikhil Joshi, Founder & President of Snic Solutions

Is your manufacturing operation reaching its efficiency potential? A Manufacturing Execution System (MES) could be the game-changer, helping you reduce waste, cut costs, and lower your carbon footprint. Join Nikhil Joshi, Founder & President of Snic Solutions, in this value-packed webinar as he breaks down how MES can drive operational excellence and sustainability.

Best React Charting Libraries for Data Visualization and Analytics

Propel Data

We've picked Recharts, Echarts, React ChartJS 2, and VISX as the best charting libraries for data visualization and data analytics in React.

The Complete Collection of Data Science Projects – Part 2

KDnuggets

The second part covers lists of Machine Learning, Deep Learning, Computer Vision, Natural Language Processing, Data Engineering, and MLOps projects.

Accelerate Analytics for All

Cloudera

What if you could access all your data and execute all your analytics in one workflow, quickly, with only a small IT team? CDP One is a new service from Cloudera that is the first data lakehouse SaaS offering with cloud compute, cloud storage, machine learning (ML), streaming analytics, and enterprise-grade security built in. Data practitioners can now produce end-to-end analytic pipelines through one service.

The Cloud Development Environment Adoption Report

Cloud Development Environments (CDEs) are changing how software teams work by moving development to the cloud. Our Cloud Development Environment Adoption Report gathers insights from 223 developers and business leaders, uncovering key trends in CDE adoption. With 66% of large organizations already using CDEs, these platforms are quickly becoming essential to modern development practices.