Sat.Oct 07, 2023 - Fri.Oct 13, 2023

article thumbnail

The Power of a Semantic Layer: A Data Engineer’s Guide

KDnuggets

Looking to understand the semantic layer and how it can improve your data stack? This GigaOm Sonor report on Semantic Layers can help you delve deeper.

Data 112
article thumbnail

Going from Developer to CEO: Chronosphere

The Pragmatic Engineer

👋 Hi, this is Gergely with a bonus, free issue of the Pragmatic Engineer Newsletter. In every issue, I cover topics related to Big Tech and startups through the lens of engineering managers and senior engineers. In this article, we cover three out of eight topics from today’s deepdive into tech scaleup Chronosphere. To get full issues twice a week, subscribe here.

article thumbnail

Using Data To Illuminate The Intentionally Opaque Insurance Industry

Data Engineering Podcast

Summary The insurance industry is notoriously opaque and hard to navigate. Max Cho found that fact frustrating enough that he decided to build a business of making policy selection more navigable. In this episode he shares his journey of data collection and analysis and the challenges of automating an intentionally manual industry. Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management Introducing RudderStack Profiles.

Insurance 162
article thumbnail

LLM Inference Performance Engineering: Best Practices

databricks

In this blog post, the MosaicML engineering team shares best practices for how to capitalize on popular open source large language models (LLMs).

article thumbnail

Apache Airflow® Best Practices for ETL and ELT Pipelines

Whether you’re creating complex dashboards or fine-tuning large language models, your data must be extracted, transformed, and loaded. ETL and ELT pipelines form the foundation of any data product, and Airflow is the open-source data orchestrator specifically designed for moving and transforming data in ETL and ELT pipelines. This eBook covers: An overview of ETL vs.

article thumbnail

Why SQL is THE Language to Learn for Data Science

KDnuggets

SQL is the essential data science language due to its universal database accessibility, efficient data cleaning capabilities, seamless integration with other languages, and requirement for most data science jobs.

article thumbnail

How to use the DockerOperator

Marc Lamberti

Do you wonder how to use the DockerOperator in Airflow to kick off a docker image? Or how to run a task without creating dependency conflicts? In this tutorial, you will discover everything you need about the DockerOperator with practical examples. If you’re new to Airflow, I’ve created a course you can check out here. Ready? Let’s go!

AWS 130

More Trending

article thumbnail

Llama 2 Foundation Models Available in Databricks Lakehouse AI

databricks

We’re excited to announce that Meta AI’s Llama 2 foundation chat models are available in the Databricks Marketplace for you to fine-tune and dep.

article thumbnail

Unlocking GPT-4 Summarization with Chain of Density Prompting

KDnuggets

Unlock the power of GPT-4 summarization with Chain of Density (CoD), a technique that attempts to balance information density for high-quality summaries.

153
153
article thumbnail

Data News — Week 23.40

Christophe Blefari

( credits ) Hey, I'm a bit late once again. I hope this newsletter edition finds you well. This is almost a raw edition, I had quite a big amount of links, I hope you will like this selection. Gen AI 🤖 OpenAI’s plan to build the "iPhone of artificial intelligence" — Obviously this is one of the main struggle for OpenAI.

Python 130
article thumbnail

Increase data literacy and trust with Alation data catalog integration

ThoughtSpot

When using data to make impactful business decisions, certain doubts may start to arise, like “What does this column exactly mean?” or “Can I trust this data source I want to use?” Questions like these speak to a larger need for increased data literacy and trust in data. ThoughtSpot continually invests in this area, giving users the confidence to build the correct Answers needed for their analysis—and ensuring they can trust the data they are shown.

Metadata 105
article thumbnail

Apache Airflow®: The Ultimate Guide to DAG Writing

Speaker: Tamara Fingerlin, Developer Advocate

In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!

article thumbnail

Announcing public preview of Databricks Assets Bundles: Apply software development best practices with ease

databricks

We are delighted to announce that Databricks Asset Bundles are now in public preview. Bundles, for short, facilitate the adoption of software engineering.

article thumbnail

Best Practices for Building ETLs for ML

KDnuggets

This article talks about several best practices for writing ETLs for building training datasets. It delves into several software engineering techniques and patterns applied to ML.

Building 150
article thumbnail

Build an Actionable Customer 360 in the Data Cloud with Hightouch Events

Snowflake

Easily collect and store digital events directly to create a complete composable customer data platform (CDP) Marketers are increasingly leveraging the Snowflake Data Cloud as the foundation for all of their customer data analytics and activation. Marketing teams are creating composable customer data platforms (CDPs) on the Data Cloud to build a 360-degree view of each customer.

Cloud 102
article thumbnail

Unapologetically Technical Episode 5 – Neil Avery

Jesse Anderson

Unapologetically Technical is finally back with a new episode! In this episode of Unapologetically Technical, I had the pleasure of interviewing Neil Avery from Liquidlabs. We discussed his experiences creating grid computing systems at major banks like Royal Bank of Scotland and Deutchebank, as well as his journey to founding a startup called Logscape and working as a consultant at Excellian.

Banking 100
article thumbnail

Optimizing The Modern Developer Experience with Coder

Many software teams have migrated their testing and production workloads to the cloud, yet development environments often remain tied to outdated local setups, limiting efficiency and growth. This is where Coder comes in. In our 101 Coder webinar, you’ll explore how cloud-based development environments can unlock new levels of productivity. Discover how to transition from local setups to a secure, cloud-powered ecosystem with ease.

article thumbnail

Databricks Obtains ISO 27701 Certification

databricks

We’re excited to announce that Databricks has obtained the International Standards Organization (ISO) 27701 certification as a data processor. This certification reflects our c.

article thumbnail

Rust Burn Library for Deep Learning

KDnuggets

A new deep learning framework built entirely in Rust that aims to balance flexibility, performance, and ease of use for researchers, ML engineers, and developers.

article thumbnail

Projetando a arquitetura orientada a eventos da Loggi para flexibilidade e produtividade em engenharia

Confluent

With Confluent Cloud, Loggi migrated to an event-driven architecture, powering real-time analytics, boosting productivity, and cutting costs.

article thumbnail

How LinkedIn Elevated Its Risk and Compliance Platform To Improve Stakeholder Experience And Enable Next Generation Integrated Risk Management

LinkedIn Engineering

Co-Authors: Chaitali Parmar , Eric Stoll , and Natasha Michel At Linkedin, one of the Information Security team's core commitments is to enable an environment of trusted and secure products, platforms, and infrastructure for our employees, members, and customers. The Infosec Governance, Risk and Compliance (GRC) and Third Party Security (TPS) teams are responsible for documenting security policy and monitoring in-house and third party risk and control environments to assure compliance and a heal

article thumbnail

15 Modern Use Cases for Enterprise Business Intelligence

Large enterprises face unique challenges in optimizing their Business Intelligence (BI) output due to the sheer scale and complexity of their operations. Unlike smaller organizations, where basic BI features and simple dashboards might suffice, enterprises must manage vast amounts of data from diverse sources. What are the top modern BI use cases for enterprise businesses to help you get a leg up on the competition?

article thumbnail

Scalable, In-House Quality Measurement with a NCQA-Certified Engine on the Lakehouse

databricks

This blog was written in collaboration with David Roberts (Analytics Engineering Manager), Kevin P. Buchan Jr (Assistant Vice President, Analytics), and Yubin Park.

article thumbnail

AI and Open Source Software: Separated at Birth?

KDnuggets

In this article, Luis shares with readers his thoughts on the intersection of open source software and machine learning and what the future might bring. Many articles cover how open source software is used by the machine learning community but this post focuses on the similarities between the two areas of practice and what machine learning can and can’t learn from open source software.

article thumbnail

Introducing Apache Kafka 3.6

Confluent

Apache Kafka 3.6 brings Tiered Storage Early Access, migrating clusters from ZooKeeper to KRaft with no downtime, a grace period for stream-table joins, and more!

Kafka 90
article thumbnail

How to Become a Data Engineer

Towards Data Science

A shortcut for beginners in 2024 Continue reading on Towards Data Science »

article thumbnail

Prepare Now: 2025s Must-Know Trends For Product And Data Leaders

Speaker: Jay Allardyce, Deepak Vittal, Terrence Sheflin, and Mahyar Ghasemali

As we look ahead to 2025, business intelligence and data analytics are set to play pivotal roles in shaping success. Organizations are already starting to face a host of transformative trends as the year comes to a close, including the integration of AI in data analytics, an increased emphasis on real-time data insights, and the growing importance of user experience in BI solutions.

article thumbnail

Databricks and Shell collaborate to simplify industrial time series data analytics on the Lakehouse

databricks

Written in partnership with Shell. The energy industry is all about physical assets – from terminals, ships and pipelines to refineries and wind f.

article thumbnail

7 High Paying Side Hustles for Data Scientists

KDnuggets

This article serves as a guide for the data professional who wants to earn more in these trying times.

Data 150
article thumbnail

Snowflake and Partners Develop Award-Winning Solution to Give Telecoms and Consumers the Power to Reduce Carbon Emissions with Generative AI

Snowflake

In the age of climate consciousness, industries worldwide are grappling with the urgent need to reduce their carbon footprints. One industry that has come under increased scrutiny is telecommunications, where Scope 3 emissions , or the indirect emissions that occur in a company’s value chain that the company has no direct control over, alone account for a staggering 85% of a typical telecom company’s carbon footprint.

article thumbnail

Data Streaming and Artificial Intelligence: The Future of Real-Time Social Media Monitoring

Confluent

Learn how data streaming and artificial intelligence enables you to project your brand’s reputation with real-time social media monitoring.

Media 80
article thumbnail

How to Drive Cost Savings, Efficiency Gains, and Sustainability Wins with MES

Speaker: Nikhil Joshi, Founder & President of Snic Solutions

Is your manufacturing operation reaching its efficiency potential? A Manufacturing Execution System (MES) could be the game-changer, helping you reduce waste, cut costs, and lower your carbon footprint. Join Nikhil Joshi, Founder & President of Snic Solutions, in this value-packed webinar as he breaks down how MES can drive operational excellence and sustainability.

article thumbnail

Announcing the General Availability of the Databricks SQL Statement Execution API

databricks

Today, we are excited to announce the general availability of the Databricks SQL Statement Execution API on AWS and Azure, with support for.

SQL 105
article thumbnail

Comparing Natural Language Processing Techniques: RNNs, Transformers, BERT

KDnuggets

RNN, Transformers, and BERT are popular NLP techniques with tradeoffs in sequence modeling, parallelization, and pre-training for downstream tasks.

Process 149
article thumbnail

Propel Telecom Growth with Location-Based Context

Precisely

Telecom providers invest heavily in infrastructure, so it’s vital that they optimize those investments by using an intelligent planning process. That means making data-driven decisions based on rich, contextual, location-based data. Is your company making the right investments in infrastructure? That depends on the answers to three questions: Are you building in the right place?

article thumbnail

Building the Yellow Brick Road to Data-Centric Security

Confluent

Discover data-centric security strategies with Confluent. Join Mike Peacock on Oct 12 for key insights. Register now!

article thumbnail

The Cloud Development Environment Adoption Report

Cloud Development Environments (CDEs) are changing how software teams work by moving development to the cloud. Our Cloud Development Environment Adoption Report gathers insights from 223 developers and business leaders, uncovering key trends in CDE adoption. With 66% of large organizations already using CDEs, these platforms are quickly becoming essential to modern development practices.