Sat.Dec 16, 2023 - Fri.Dec 22, 2023

article thumbnail

Most Common Use Cases of Data Engineering in Manufacturing

phData: Data Engineering

Data engineering refers to the design of systems that are capable of collecting, analyzing, and storing data at a large scale. In manufacturing, data engineering aids in optimizing operations and enhancing productivity while ensuring curated data that is both compliant and high in integrity. The increased efficiency in data “wrangling” means that more accurate modeling and planning may be done, enabling manufacturers to make stronger data-driven decisions.

article thumbnail

Mentoring software engineers or engineering leaders

The Pragmatic Engineer

I get asked every now and then if I offer 1:1 mentoring for either software engineers or engineering managers or leaders. While I used to do this in the past, I don't offer this any more. I collected much of the advice I have to offer for software engineers in The Software Engineer's Guidebook. I also write The Pragmatic Engineer Newsletter where I do cover topics like what it means to be a senior engineer at various companies , how to deal with a low-quality engineering culture , and

Insiders

Sign Up for our Newsletter

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

article thumbnail

Unlocking the Power of Containers: Exploring the Top 20 Docker Containers for Every Development Need

Analytics Vidhya

Introduction Docker containers have emerged as indispensable tools in the fast-evolving landscape of software development and deployment, providing a lightweight and efficient way to package, distribute, and run applications. This article delves into the top 20 Docker containers across various categories, showcasing their features, use cases, and contributions to streamlining development workflows.

183
183
article thumbnail

How to Access and Use Gemini API for Free

KDnuggets

Learn how to integrate advanced AI multimodal models into your project using a simple Python API.

article thumbnail

A Guide to Debugging Apache Airflow® DAGs

In Airflow, DAGs (your data pipelines) support nearly every use case. As these workflows grow in complexity and scale, efficiently identifying and resolving issues becomes a critical skill for every data engineer. This is a comprehensive guide with best practices and examples to debugging Airflow DAGs. You’ll learn how to: Create a standardized process for debugging to quickly diagnose errors in your DAGs Identify common issues with DAGs, tasks, and connections Distinguish between Airflow-relate

article thumbnail

Databricks Named a Leader in 2023 Gartner® Magic Quadrant™ for Cloud Database Management Systems

databricks

We are excited to announce that Gartner has recognized Databricks as a Leader for a third consecutive year in the 2023 Gartner® Magic.

Database 145
article thumbnail

The Pragmatic Engineer Newsletter in 2023

The Pragmatic Engineer

2023 was the second full year of The Pragmatic Engineer Newsletter , and this newsletter is now almost two and a half years old; the first issue came out on 26 August 2021. Thank you for being a reader, I greatly value your support. This year, 102 newsletter issues were published, and this is number 103. You received a deepdive issue on Tuesdays, and every Thursday it was  “The Pulse”  – formerly The Scoop.

More Trending

article thumbnail

Free Harvard Course: Introduction to AI with Python

KDnuggets

Looking for a great course to learn Artificial Intelligence with Python? Check out this free course from Harvard University.

Python 151
article thumbnail

Introducing Mixtral 8x7B with Databricks Model Serving

databricks

Today, Databricks is excited to announce support for Mixtral 8x7B in Model Serving. Mixtral 8x7B is a sparse Mixture of Experts (MoE) open.

article thumbnail

Unlock the New Wave of Gen AI With Snowpark Container Services GPU-Powered Compute

Snowflake

The rise of generative AI (gen AI) is inspiring organizations to envision a future in which AI is integrated into all aspects of their operations for a more human, personalized and efficient customer experience. However, getting the required compute infrastructure into place, particularly GPUs for large language models (LLMs), is a real challenge. Accessing the necessary resources from cloud providers demands careful planning and up to month-long wait times due to the high demand for GPUs.

Scala 139
article thumbnail

Make this AI-inspired topo landscape please

ArcGIS

Here's how to fake an isometric 3D topo terrain in 2D! And stuff.

138
138
article thumbnail

Mastering Apache Airflow® 3.0: What’s New (and What’s Next) for Data Orchestration

Speaker: Tamara Fingerlin, Developer Advocate

Apache Airflow® 3.0, the most anticipated Airflow release yet, officially launched this April. As the de facto standard for data orchestration, Airflow is trusted by over 77,000 organizations to power everything from advanced analytics to production AI and MLOps. With the 3.0 release, the top-requested features from the community were delivered, including a revamped UI for easier navigation, stronger security, and greater flexibility to run tasks anywhere at any time.

article thumbnail

Top KDnuggets Posts of 2023: Free Learning Resources and More

KDnuggets

Here are our top posts of 2023, including: 5 Free Books to Master Data Science • 5 Free Courses to Master Machine Learning • 3 Ways to Access GPT-4 for Free • and much more!

article thumbnail

Integrating NVIDIA TensorRT-LLM with the Databricks Inference Stack

databricks

Over the past six months, we've been working with NVIDIA to get the most out of their new TensorRT-LLM library. TensorRT-LLM provides an easy-to-use Python interface to integrate with a web server for fast, efficient inference performance with LLMs. In this post, we're highlighting some key areas where our collaboration with NVIDIA has been particularly important.

Python 132
article thumbnail

Snowflake Announces Agreement to Acquire Samooha to Simplify Building Interoperable Data Clean Rooms in the Data Cloud

Snowflake

When businesses share sensitive first-party data with outside partners or customers, they must do so in a way that meets strict governance requirements around security and privacy. Data clean rooms have emerged as the technology to meet this need, enabling interoperability where multiple parties can collaborate on and analyze sensitive data in a governed way without exposing direct access to the underlying data and business logic.

Cloud 137
article thumbnail

Order is king for the performance

Waitingforcode

Even though nowadays data processing frameworks and data stores have smart query planners, they don't take our responsibility to correctly design the job logic.

Designing 130
article thumbnail

Agent Tooling: Connecting AI to Your Tools, Systems & Data

Speaker: Alex Salazar, CEO & Co-Founder @ Arcade | Nate Barbettini, Founding Engineer @ Arcade | Tony Karrer, Founder & CTO @ Aggregage

There’s a lot of noise surrounding the ability of AI agents to connect to your tools, systems and data. But building an AI application into a reliable, secure workflow agent isn’t as simple as plugging in an API. As an engineering leader, it can be challenging to make sense of this evolving landscape, but agent tooling provides such high value that it’s critical we figure out how to move forward.

article thumbnail

Easily Integrate LLMs into Your Scikit-learn Workflow with Scikit-LLM

KDnuggets

LLM is a powerful model that could improve our text analysis. With Scikit-LLM, we could integrate the LLM easily into our ML pipeline.

146
146
article thumbnail

Adding An Easy Mode For The Modern Data Stack With 5X

Data Engineering Podcast

Summary The "modern data stack" promised a scalable, composable data platform that gave everyone the flexibility to use the best tools for every job. The reality was that it left data teams in the position of spending all of their engineering effort on integrating systems that weren't designed with compatible user experiences. The team at 5X understand the pain involved and the barriers to productivity and set out to solve it by pre-integrating the best tools from each layer of the s

Data Lake 130
article thumbnail

Practical Magic: Improving Productivity and Happiness for Software Development Teams

LinkedIn Engineering

Co-authors: Max Kanat-Alexander and Grant Jenks Today we are open-sourcing the LinkedIn Developer Productivity & Happiness Framework (DPH Framework) - a collection of documents that describe the systems, processes, metrics, and feedback systems we use to understand our developers and their needs internally at LinkedIn. Now more than ever, developers are navigating so much change and new opportunity in this new era of Generative AI, so ensuring teams have the systems, processes, metrics and f

article thumbnail

AI debugging at Meta with HawkEye

Engineering at Meta

HawkEye is the powerful toolkit used internally at Meta for monitoring, observability, and debuggability of the end-to-end machine learning (ML) workflow that powers ML-based products. HawkEye supports recommendation and ranking models across several products at Meta. Over the past two years, it has facilitated order of magnitude improvements in the time spent debugging production issues.

article thumbnail

How to Modernize Manufacturing Without Losing Control

Speaker: Andrew Skoog, Founder of MachinistX & President of Hexis Representatives

Manufacturing is evolving, and the right technology can empower—not replace—your workforce. Smart automation and AI-driven software are revolutionizing decision-making, optimizing processes, and improving efficiency. But how do you implement these tools with confidence and ensure they complement human expertise rather than override it? Join industry expert Andrew Skoog as he explores how manufacturers can leverage automation to enhance operations, streamline workflows, and make smarter, data-dri

article thumbnail

5 Cheap Books to Master Data Science

KDnuggets

There are many data-learning materials locked up behind expensive books. These cheap books would bolster your skills without blowing up your savings.

article thumbnail

Geoprocessing UIUX Enhancements that will boost your productivity in ArcGIS Pro 3.2

ArcGIS

Check out this blog to see how we enhanced several geoprocessing controls with additional features and new designs in ArcGIS Pro 3.2.

Designing 117
article thumbnail

Databricks Data Intelligence Platform for Retail comes to NRF 2024

databricks

Request a meeting with Databricks executives/thought leaders at NRF! Each January, thousands of leaders from retailers around the globe gather at Javits Center.

Retail 115
article thumbnail

Sherwood Media Portfolio Grows with Chartr Limited Acquisition

Robinhood

Sherwood Media, LLC has added U.K.-based Chartr Limited, a data-driven media company and newsletter publisher, to its portfolio through an acquisition by Robinhood Markets, Inc. Chartr’s visual storytelling turns complex data into easy-to-understand narratives, and will now give the tens of millions of readers of Sherwood Media the ability to better understand the finer details of important trends and the news of the day.

Media 115
article thumbnail

The Ultimate Guide to Apache Airflow DAGS

With Airflow being the open-source standard for workflow orchestration, knowing how to write Airflow DAGs has become an essential skill for every data engineer. This eBook provides a comprehensive overview of DAG writing features with plenty of example code. You’ll learn how to: Understand the building blocks DAGs, combine them in complex pipelines, and schedule your DAG to run exactly when you want it to Write DAGs that adapt to your data at runtime and set up alerts and notifications Scale you

article thumbnail

A Simple Guide to Running LlaMA 2 Locally

KDnuggets

We will learn a simple way to install and use Llama 2 without setting up Python or any program. Just download the files and run a command in PowerShell.

Python 145
article thumbnail

Datafusion SQL CLI – Look Ma, I made a new ETL tool.

Confessions of a Data Guy

Sometimes I just need something new and interesting to work on, to keep me engaged. A few days ago I was lying by the river next to a fire, with the cold air blowing on my face and the eagles soaring above. Thinking about and contemplating life and data engineering … something flitted across my […] The post Datafusion SQL CLI – Look Ma, I made a new ETL tool. appeared first on Confessions of a Data Guy.

ETL Tools 113
article thumbnail

Top 6 Episodes of The Data Chief Podcast: 2023

ThoughtSpot

2023 has been a year of breakthrough innovation for many, and a deer-in-headlights moment for others. I keep flashing back to the 90s when the Internet created new businesses and destroyed others—LLMs are doing the same, only with more velocity. From CDAOs to VCs alike, the rate of creative destruction is faster, but there is also an intense focus on value.

article thumbnail

ML-Based Forecasting and Anomaly Detection in Snowflake Cortex, Now in GA

Snowflake

Historically, only a few AI experts within an organization could develop insights using machine learning (ML) and predictive analytics. Yet in this new wave of AI, democratizing ML to more data teams is crucial—and for Snowflake SQL users, it’s now a reality. With the general availability of ML-based forecasting and anomaly detection functions in Snowflake Cortex, data analysts and other SQL users can now build more accurate forecasts and identify outliers for their time-series data in Snowflake

SQL 110
article thumbnail

Apache Airflow® Best Practices: DAG Writing

Speaker: Tamara Fingerlin, Developer Advocate

In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!

article thumbnail

The Best Data Science Resources, Bootcamp, and Courses to Learn Data Science in the New Year

KDnuggets

We've partnered with Springboard, the leading data science bootcamp offering personalized 1:1 mentorship, dedicated career support, proven outcomes, and an unbeatable money-back job guarantee, to present a handpicked collection of resources to supercharge your data science journey in the coming year.

article thumbnail

Top Trends in Agile You Can’t Miss in 2024

Knowledge Hut

Technology is evolving at breakneck speed, and the information we consume every day continues to grow exponentially with every passing day. Analyzing this complex mountain of data to make the right decisions informed by this data has become ever more challenging. Traditional models of project management , like the waterfall method and hierarchical team structures are too rigid to respond to the fast-paced change organizations are facing today.

article thumbnail

Harnessing Machine Learning for Anomaly Detection in the Building Products Industry with Databricks

databricks

Introduction Anomaly detection is widely applied across various industries, playing a significant role in the enterprise sector. This blog focuses on its application.

article thumbnail

Startup Spotlight: Patch Helps Devs Unblock Pipelines With Data Packages 

Snowflake

Welcome to Snowflake’s Startup Spotlight, where we feature awesome companies building businesses on Snowflake. In this edition, Patch.tech Co-Founder and CPO Whelan Boyd talks about how frustration with clogged data pipelines sparked the idea for Patch’s code packages, which allow engineers to distribute data sets with all the built-in elements that analysts and developers need to create apps.

article thumbnail

How to Achieve High-Accuracy Results When Using LLMs

Speaker: Ben Epstein, Stealth Founder & CTO | Tony Karrer, Founder & CTO, Aggregage

When tasked with building a fundamentally new product line with deeper insights than previously achievable for a high-value client, Ben Epstein and his team faced a significant challenge: how to harness LLMs to produce consistent, high-accuracy outputs at scale. In this new session, Ben will share how he and his team engineered a system (based on proven software engineering approaches) that employs reproducible test variations (via temperature 0 and fixed seeds), and enables non-LLM evaluation m