Sat.Oct 07, 2023 - Fri.Oct 13, 2023

The Power of a Semantic Layer: A Data Engineer’s Guide

KDnuggets

Looking to understand the semantic layer and how it can improve your data stack? This GigaOm Sonar report on Semantic Layers can help you delve deeper.

Going from Developer to CEO: Chronosphere

The Pragmatic Engineer

👋 Hi, this is Gergely with a bonus, free issue of the Pragmatic Engineer Newsletter. In every issue, I cover topics related to Big Tech and startups through the lens of engineering managers and senior engineers. In this article, we cover three out of eight topics from today’s deep dive into tech scaleup Chronosphere. To get full issues twice a week, subscribe here.

Using Data To Illuminate The Intentionally Opaque Insurance Industry

Data Engineering Podcast

Summary: The insurance industry is notoriously opaque and hard to navigate. Max Cho found that fact frustrating enough that he decided to build a business of making policy selection more navigable. In this episode he shares his journey of data collection and analysis and the challenges of automating an intentionally manual industry. Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. Introducing RudderStack Profiles.

LLM Inference Performance Engineering: Best Practices

databricks

In this blog post, the MosaicML engineering team shares best practices for how to capitalize on popular open source large language models (LLMs).

A Guide to Debugging Apache Airflow® DAGs

In Airflow, DAGs (your data pipelines) support nearly every use case. As these workflows grow in complexity and scale, efficiently identifying and resolving issues becomes a critical skill for every data engineer. This is a comprehensive guide with best practices and examples for debugging Airflow DAGs. You’ll learn how to: create a standardized process for debugging to quickly diagnose errors in your DAGs; identify common issues with DAGs, tasks, and connections; distinguish between Airflow-relate

Why SQL is THE Language to Learn for Data Science

KDnuggets

SQL is the essential data science language due to its universal database accessibility, efficient data cleaning capabilities, seamless integration with other languages, and its status as a requirement for most data science jobs.

How to use the DockerOperator

Marc Lamberti

Do you wonder how to use the DockerOperator in Airflow to kick off a Docker image? Or how to run a task without creating dependency conflicts? In this tutorial, you will discover everything you need to know about the DockerOperator with practical examples. If you’re new to Airflow, I’ve created a course you can check out here. Ready? Let’s go!

More Trending

Llama 2 Foundation Models Available in Databricks Lakehouse AI

databricks

We’re excited to announce that Meta AI’s Llama 2 foundation chat models are available in the Databricks Marketplace for you to fine-tune and dep.

Unlocking GPT-4 Summarization with Chain of Density Prompting

KDnuggets

Unlock the power of GPT-4 summarization with Chain of Density (CoD), a technique that attempts to balance information density for high-quality summaries.

Build an Actionable Customer 360 in the Data Cloud with Hightouch Events

Snowflake

Easily collect and store digital events directly to create a complete composable customer data platform (CDP). Marketers are increasingly leveraging the Snowflake Data Cloud as the foundation for all of their customer data analytics and activation. Marketing teams are creating composable customer data platforms (CDPs) on the Data Cloud to build a 360-degree view of each customer.

Increase data literacy and trust with Alation data catalog integration

ThoughtSpot

When using data to make impactful business decisions, certain doubts may start to arise, like “What does this column exactly mean?” or “Can I trust this data source I want to use?” Questions like these speak to a larger need for increased data literacy and trust in data. ThoughtSpot continually invests in this area, giving users the confidence to build the correct Answers needed for their analysis—and ensuring they can trust the data they are shown.

Mastering Apache Airflow® 3.0: What’s New (and What’s Next) for Data Orchestration

Speaker: Tamara Fingerlin, Developer Advocate

Apache Airflow® 3.0, the most anticipated Airflow release yet, officially launched this April. As the de facto standard for data orchestration, Airflow is trusted by over 77,000 organizations to power everything from advanced analytics to production AI and MLOps. With the 3.0 release, the top-requested features from the community were delivered, including a revamped UI for easier navigation, stronger security, and greater flexibility to run tasks anywhere at any time.

Announcing public preview of Databricks Asset Bundles: Apply software development best practices with ease

databricks

We are delighted to announce that Databricks Asset Bundles are now in public preview. Bundles, for short, facilitate the adoption of software engineering.

AI and Open Source Software: Separated at Birth?

KDnuggets

In this article, Luis shares with readers his thoughts on the intersection of open source software and machine learning and what the future might bring. Many articles cover how open source software is used by the machine learning community but this post focuses on the similarities between the two areas of practice and what machine learning can and can’t learn from open source software.

5 Trends Changing the Modern Startup Ecosystem

Snowflake

While the startup world listened for better news in the aftermath of a volatile 2022, a new salvo of bad news emerged: global venture capital funding declined about 49% within the first six months of 2023 alone. Worsening inflation and rising interest rates are putting pressure on startups across all stages of venture funding to reframe their tech stacks or business models along the tech-scape’s collapsing edges.

Unapologetically Technical Episode 5 – Neil Avery

Jesse Anderson

Unapologetically Technical is finally back with a new episode! In this episode of Unapologetically Technical, I had the pleasure of interviewing Neil Avery from Liquidlabs. We discussed his experiences creating grid computing systems at major banks like Royal Bank of Scotland and Deutsche Bank, as well as his journey to founding a startup called Logscape and working as a consultant at Excellian.

Agent Tooling: Connecting AI to Your Tools, Systems & Data

Speaker: Alex Salazar, CEO & Co-Founder @ Arcade | Nate Barbettini, Founding Engineer @ Arcade | Tony Karrer, Founder & CTO @ Aggregage

There’s a lot of noise surrounding the ability of AI agents to connect to your tools, systems and data. But building an AI application into a reliable, secure workflow agent isn’t as simple as plugging in an API. As an engineering leader, it can be challenging to make sense of this evolving landscape, but agent tooling provides such high value that it’s critical we figure out how to move forward.

Databricks Obtains ISO 27701 Certification

databricks

We’re excited to announce that Databricks has obtained the International Organization for Standardization (ISO) 27701 certification as a data processor. This certification reflects our c.

Best Practices for Building ETLs for ML

KDnuggets

This article talks about several best practices for writing ETLs for building training datasets. It delves into several software engineering techniques and patterns applied to ML.

Snowflake and Partners Develop Award-Winning Solution to Give Telecoms and Consumers the Power to Reduce Carbon Emissions with Generative AI

Snowflake

In the age of climate consciousness, industries worldwide are grappling with the urgent need to reduce their carbon footprints. One industry that has come under increased scrutiny is telecommunications, where Scope 3 emissions, or the indirect emissions that occur in a company’s value chain that the company has no direct control over, alone account for a staggering 85% of a typical telecom company’s carbon footprint.

Designing Loggi’s Event-Driven Architecture for Flexibility and Engineering Productivity

Confluent

With Confluent Cloud, Loggi migrated to an event-driven architecture, powering real-time analytics, boosting productivity, and cutting costs.

How to Modernize Manufacturing Without Losing Control

Speaker: Andrew Skoog, Founder of MachinistX & President of Hexis Representatives

Manufacturing is evolving, and the right technology can empower—not replace—your workforce. Smart automation and AI-driven software are revolutionizing decision-making, optimizing processes, and improving efficiency. But how do you implement these tools with confidence and ensure they complement human expertise rather than override it? Join industry expert Andrew Skoog as he explores how manufacturers can leverage automation to enhance operations, streamline workflows, and make smarter, data-dri

Scalable, In-House Quality Measurement with a NCQA-Certified Engine on the Lakehouse

databricks

This blog was written in collaboration with David Roberts (Analytics Engineering Manager), Kevin P. Buchan Jr (Assistant Vice President, Analytics), and Yubin Park.

Rust Burn Library for Deep Learning

KDnuggets

A new deep learning framework built entirely in Rust that aims to balance flexibility, performance, and ease of use for researchers, ML engineers, and developers.

How to Become a Data Engineer

Towards Data Science

A shortcut for beginners in 2024.

How LinkedIn Elevated Its Risk and Compliance Platform To Improve Stakeholder Experience And Enable Next Generation Integrated Risk Management

LinkedIn Engineering

Co-Authors: Chaitali Parmar, Eric Stoll, and Natasha Michel. At LinkedIn, one of the Information Security team's core commitments is to enable an environment of trusted and secure products, platforms, and infrastructure for our employees, members, and customers. The Infosec Governance, Risk and Compliance (GRC) and Third Party Security (TPS) teams are responsible for documenting security policy and monitoring in-house and third party risk and control environments to assure compliance and a heal

The Ultimate Guide to Apache Airflow DAGs

With Airflow being the open-source standard for workflow orchestration, knowing how to write Airflow DAGs has become an essential skill for every data engineer. This eBook provides a comprehensive overview of DAG writing features with plenty of example code. You’ll learn how to: understand the building blocks of DAGs, combine them in complex pipelines, and schedule your DAG to run exactly when you want it to; write DAGs that adapt to your data at runtime and set up alerts and notifications; Scale you

Databricks and Shell collaborate to simplify industrial time series data analytics on the Lakehouse

databricks

Written in partnership with Shell. The energy industry is all about physical assets – from terminals, ships and pipelines to refineries and wind f.

7 High Paying Side Hustles for Data Scientists

KDnuggets

This article serves as a guide for the data professional who wants to earn more in these trying times.

5 Generative AI Use Cases Companies Can Implement Today

Towards Data Science

Getting started with LLMs? Here are 5 popular applications data teams at OpenAI, Vimeo, and other companies are putting into practice today. The hype around generative AI is real, and data and ML teams are feeling the heat. Across industries, executives are pushing their data leaders to build AI-powered products that will save time, drive revenue, or give them a competitive advantage.

How to Become a Project Director? In 5 Simple Steps

Knowledge Hut

Project management involves multifaceted skills and competencies. Many skilled people are involved in project management, from project coordinators to project consultants. One key role in project management is the project director. These individuals sit at the top of the project management hierarchy and are responsible for making crucial project decisions.

Apache Airflow® Best Practices: DAG Writing

Speaker: Tamara Fingerlin, Developer Advocate

In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!

Announcing the General Availability of the Databricks SQL Statement Execution API

databricks

Today, we are excited to announce the general availability of the Databricks SQL Statement Execution API on AWS and Azure, with support for.

Revamping Data Visualization: Mastering Time-Based Resampling in Pandas

KDnuggets

Unlock the power of time-based data visualization with Pandas as we delve into the art of resampling, turning your data into insightful temporal masterpieces.

Mastering data integration from SAP Systems with prompt engineering

Towards Data Science

In our previous publication, From Data Engineering to Prompt Engineering, we demonstrated how to use ChatGPT to solve data preparation tasks. Apart from the good feedback we have received, one critical point has been raised: Prompt engineering may help with simple tasks, but is it really useful in a more challenging environment?

Hispanic Heritage Month Tribute: Latinos Powering Progress in America

Robinhood

Robinhood was founded on a simple idea: that our financial markets should be accessible to all. With customers at the heart of our decisions, Robinhood is lowering barriers and providing greater access to financial information and investing. Together, we are building products and services that help create a financial system everyone can participate in. … We are excited to join Latinhood, our Employee Resource Group (ERG) dedicated to the Hispanic/Latino community, in celebrating Hispanic Heritag

How to Achieve High-Accuracy Results When Using LLMs

Speaker: Ben Epstein, Stealth Founder & CTO | Tony Karrer, Founder & CTO, Aggregage

When tasked with building a fundamentally new product line with deeper insights than previously achievable for a high-value client, Ben Epstein and his team faced a significant challenge: how to harness LLMs to produce consistent, high-accuracy outputs at scale. In this new session, Ben will share how he and his team engineered a system (based on proven software engineering approaches) that employs reproducible test variations (via temperature 0 and fixed seeds), and enables non-LLM evaluation m