Sat.Nov 25, 2023 - Fri.Dec 01, 2023

article thumbnail

5 Free Courses to Master Data Engineering

KDnuggets

Data engineers must prepare and manage the infrastructure and tools necessary for the whole data workflow in a data-driven company.

article thumbnail

Unlocking the Power of Analytics with Dr. Swati Jain

Analytics Vidhya

In this Leading with Data episode, explore the analytics landscape with Dr. Swati Jain, a seasoned leader boasting over two decades of experience. From her unforeseen foray into analytics to steering EXL Analytics’ India business, Dr. Jain imparts invaluable insights into the ever-evolving world of data science. Read on to know more about her career, […] The post Unlocking the Power of Analytics with Dr.

Insiders

Sign Up for our Newsletter

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

article thumbnail

The Difference Between Learning and Doing

Jesse Anderson

Lately, I’ve been learning how to trade options. Although there’s data and programming involved in options trading, it isn’t as technical as data engineering or software engineering. However, it reflects the current state of learning, whether that’s data engineering or options trading. It gave me a look into learning a skill using videos. Each lesson I learned will directly apply to your learning or skill improvement.

article thumbnail

Addressing The Challenges Of Component Integration In Data Platform Architectures

Data Engineering Podcast

Summary Building a data platform that is enjoyable and accessible for all of its end users is a substantial challenge. One of the core complexities that needs to be addressed is the fractal set of integrations that need to be managed across the individual components. In this episode Tobias Macey shares his thoughts on the challenges that he is facing as he prepares to build the next set of architectural layers for his data platform to enable a larger audience to start accessing the data being ma

article thumbnail

A Guide to Debugging Apache Airflow® DAGs

In Airflow, DAGs (your data pipelines) support nearly every use case. As these workflows grow in complexity and scale, efficiently identifying and resolving issues becomes a critical skill for every data engineer. This is a comprehensive guide with best practices and examples to debugging Airflow DAGs. You’ll learn how to: Create a standardized process for debugging to quickly diagnose errors in your DAGs Identify common issues with DAGs, tasks, and connections Distinguish between Airflow-relate

article thumbnail

10 GitHub Repositories to Master Machine Learning

KDnuggets

The blog covers machine learning courses, bootcamps, books, tools, interview questions, cheat sheets, MLOps platforms, and more to master ML and secure your dream job.

article thumbnail

How to be Better Than Everyone Else

Confessions of a Data Guy

Ok. Get off your high horse. You are human just like the rest of us. Just like your ancient ancestors who were throwing rocks and sticks at each other a thousand years ago … you are looking for a leg up on the competition. Isn’t that the world we live in? At the end of […] The post How to be Better Than Everyone Else appeared first on Confessions of a Data Guy.

Data 147

More Trending

article thumbnail

Top 7 Free Apache Kafka Tutorials and Courses for Beginners in 2023

Confluent

The top 7 free online courses, tutorials, get started guides, and examples for the easiest way to learn Apache Kafka.

Kafka 132
article thumbnail

Building Predictive Models: Logistic Regression in Python

KDnuggets

Image by Author When you are getting started with machine learning, logistic regression is one of the first algorithms you’ll add to your toolbox.

Python 160
article thumbnail

Accumulators and reliability

Waitingforcode

In March I wrote a blog showing how to use accumulators to know the application of each filter statement. Turns out, the solution may not be perfect as mentioned by Aravind in one of the comments. I bet you already have an idea but if not, keep reading. Everything will be clear in the end!

130
130
article thumbnail

Finding The Right ETL/ELT Solution – What Is Estuary And Should You Use It?

Seattle Data Guy

Data warehousing would be easy if all data were structured and formatted in the data source. Maybe we wouldn’t even need to build a data warehouse. But as anyone who has worked with data from more than one source knows, that’s rarely the case. Businesses today need to pull data from a plethora of sources,… Read more The post Finding The Right ETL/ELT Solution – What Is Estuary And Should You Use It?

IT 130
article thumbnail

Mastering Apache Airflow® 3.0: What’s New (and What’s Next) for Data Orchestration

Speaker: Tamara Fingerlin, Developer Advocate

Apache Airflow® 3.0, the most anticipated Airflow release yet, officially launched this April. As the de facto standard for data orchestration, Airflow is trusted by over 77,000 organizations to power everything from advanced analytics to production AI and MLOps. With the 3.0 release, the top-requested features from the community were delivered, including a revamped UI for easier navigation, stronger security, and greater flexibility to run tasks anywhere at any time.

article thumbnail

A Deep Dive Into Sending With librdkafka

Confluent

Learn how to write code that produces messages via librdkafka, how it will behave during error situations, and how your application should detect and respond to them.

Coding 131
article thumbnail

Free MIT Course: TinyML and Efficient Deep Learning Computing

KDnuggets

Curious about optimizing AI for everyday devices? Dive into the complete overview of MIT's TinyML and Efficient Deep Learning Computing course. Explore strategies to make AI smarter on small devices. Read the full article for an in-depth look!

article thumbnail

Find the right projection with filters in ArcGIS Pro

ArcGIS

Learn how to filter coordinate systems based on a spatial extent, GCS, or projection property.

Project 129
article thumbnail

Common Pitfalls in Deploying Airflow for Data Teams

Seattle Data Guy

If you’re a data engineer, then you’ve likely at least heard of Airflow. Apache Airflow is one of the most popular open-source workflow orchestration solutions that gets used for data pipelines. This is what spurred me to write the article “Should You Use Airflow” because there are plenty of people who don’t enjoy Airflow or… Read more The post Common Pitfalls in Deploying Airflow for Data Teams appeared first on Seattle Data Guy.

article thumbnail

Agent Tooling: Connecting AI to Your Tools, Systems & Data

Speaker: Alex Salazar, CEO & Co-Founder @ Arcade | Nate Barbettini, Founding Engineer @ Arcade | Tony Karrer, Founder & CTO @ Aggregage

There’s a lot of noise surrounding the ability of AI agents to connect to your tools, systems and data. But building an AI application into a reliable, secure workflow agent isn’t as simple as plugging in an API. As an engineering leader, it can be challenging to make sense of this evolving landscape, but agent tooling provides such high value that it’s critical we figure out how to move forward.

article thumbnail

Source filtering with file sets

Tweag

Sponsored by Antithesis (distributed systems reliability testing experts), I’ve developed a new library to filter local files in Nix which I’d like to introduce! This post requires some familiarity with Nix and its language. So if you don’t know what Nix is yet, take a look first, it’s pretty neat. In this post we’re going to look at what source filtering is, why it’s useful, why a new library was needed for it, and the basics of the new library.

Building 125
article thumbnail

Beyond Human Boundaries: The Rise of SuperIntelligence

KDnuggets

From ANI to AGI and Beyond: Deciphering AI's Evolutionary Path.

154
154
article thumbnail

Announcing Causal Inference in ArcGIS Pro 3.2

ArcGIS

New in ArcGIS Pro 3.

article thumbnail

Startup Spotlight: Hum Applies AI and LLMs to Help Publishers ‘Own’ Their Audiences

Snowflake

Welcome to Snowflake’s Startup Spotlight, where we learn about awesome companies building businesses on Snowflake. In this edition, find out how Hum is applying the power of AI and large language models (LLMs) to help publishers build stronger customer relationships, and how the mantra of “build what people want” helped their leadership team make the decision to target publishers as their target audience.

Raw Data 122
article thumbnail

How to Modernize Manufacturing Without Losing Control

Speaker: Andrew Skoog, Founder of MachinistX & President of Hexis Representatives

Manufacturing is evolving, and the right technology can empower—not replace—your workforce. Smart automation and AI-driven software are revolutionizing decision-making, optimizing processes, and improving efficiency. But how do you implement these tools with confidence and ensure they complement human expertise rather than override it? Join industry expert Andrew Skoog as he explores how manufacturers can leverage automation to enhance operations, streamline workflows, and make smarter, data-dri

article thumbnail

Presenting New Partner Integrations in Partner Connect

databricks

We are excited to introduce five new integrations in Databricks Partner Connect—a one-stop portal enabling you to use partner solutions with your Databricks D.

119
119
article thumbnail

11 Python Magic Methods Every Programmer Should Know

KDnuggets

Want to support the behavior of built-in functions and method calls in your Python classes? Magic methods in Python let you do just that! So let’s uncover the method behind the magic.

Python 153
article thumbnail

Solutions to help you weather the storm(water)

ArcGIS

A suite of ArcGIS Solutions to support common workflows in the stormwater industry.

Utilities 120
article thumbnail

Best Practices for Migrating Historical Data to Snowflake

Snowflake

At TCS , we help companies shift their enterprise data warehouse (EDW) platforms to the cloud as well as offering IT services. We’re extremely familiar with just how tricky a cloud migration can be, especially when it involves moving historical business data. Choosing a migration approach involves balancing cloud strategy, architecture needs and business priorities.

article thumbnail

The Ultimate Guide to Apache Airflow DAGS

With Airflow being the open-source standard for workflow orchestration, knowing how to write Airflow DAGs has become an essential skill for every data engineer. This eBook provides a comprehensive overview of DAG writing features with plenty of example code. You’ll learn how to: Understand the building blocks DAGs, combine them in complex pipelines, and schedule your DAG to run exactly when you want it to Write DAGs that adapt to your data at runtime and set up alerts and notifications Scale you

article thumbnail

Automating Governance of PHI Data in Healthcare

databricks

Background: Modernizing Data Delivery Today's enterprise data estates are vastly different from 10 years ago. Industries have transitioned their analytics from monolithic data.

article thumbnail

Learn Probability in Computer Science with Stanford University for FREE

KDnuggets

Probability is one of the foundational elements of computer science. Some bootcamps will skim over the topic, however, it is integral to your computer science knowledge.

article thumbnail

What does error 00374 mean and what should I do?

ArcGIS

Learn how to resolve error 00374: Unique numeric IDs are not assigned.

116
116
article thumbnail

Reinventing ERP Insights With Maxa and Snowflake Native Apps

Snowflake

ERP systems run the world’s businesses. These stalwart systems are great at managing records and processes for finance, operations, supply chain management and more. But their insights need an upgrade. That’s the case put forward by Maxa , an enterprise-grade startup that has made it their mission to reinvent the way companies access and use ERP data for transformational insights.

Finance 111
article thumbnail

Apache Airflow® Best Practices: DAG Writing

Speaker: Tamara Fingerlin, Developer Advocate

In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!

article thumbnail

Databricks Wins AWS ISV Partner of the Year Award in NAMER

databricks

We’re thrilled to share that Databricks has won the AWS ISV Partner of the Year award for North America. This award recognizes top I.

AWS 111
article thumbnail

The Top 5 Alternatives to GitHub for Data Science Projects

KDnuggets

The blog discusses five platforms designed for data scientists with specialized capabilities in managing large datasets, models, workflows, and collaboration beyond what GitHub offers.

article thumbnail

Building Trust in Public Sector AI Starts with Trusting Your Data

Cloudera

Recent Government Initiatives on Public Sector AI Solutions In recent years, governments across the globe have recognized the transformative potential of artificial intelligence (AI) and have embarked on initiatives to harness this technology to drive innovation and serve their citizens more effectively. These government-led efforts have had a profound impact on the development and adoption of AI solutions in the public sector, paving the way for a future where data-driven decision-making and au

Building 109
article thumbnail

Druid Deprecation and ClickHouse Adoption at Lyft

Lyft Engineering

Written by Ritesh Varyani and Jeana Choi at Lyft. Introduction At Lyft, we have used systems like Apache ClickHouse and Apache Druid for near real-time and sub-second analytics. Sub-second query systems allow for near real-time data explorations and low latency, high throughput queries, which are particularly well-suited for handling time-series data.

Kafka 108
article thumbnail

How to Achieve High-Accuracy Results When Using LLMs

Speaker: Ben Epstein, Stealth Founder & CTO | Tony Karrer, Founder & CTO, Aggregage

When tasked with building a fundamentally new product line with deeper insights than previously achievable for a high-value client, Ben Epstein and his team faced a significant challenge: how to harness LLMs to produce consistent, high-accuracy outputs at scale. In this new session, Ben will share how he and his team engineered a system (based on proven software engineering approaches) that employs reproducible test variations (via temperature 0 and fixed seeds), and enables non-LLM evaluation m