The Power of a Semantic Layer: A Data Engineer’s Guide
KDnuggets
OCTOBER 10, 2023
Looking to understand the semantic layer and how it can improve your data stack? This GigaOm Sonor report on Semantic Layers can help you delve deeper.
KDnuggets
OCTOBER 10, 2023
Looking to understand the semantic layer and how it can improve your data stack? This GigaOm Sonor report on Semantic Layers can help you delve deeper.
The Pragmatic Engineer
OCTOBER 10, 2023
👋 Hi, this is Gergely with a bonus, free issue of the Pragmatic Engineer Newsletter. In every issue, I cover topics related to Big Tech and startups through the lens of engineering managers and senior engineers. In this article, we cover three out of eight topics from today’s deepdive into tech scaleup Chronosphere. To get full issues twice a week, subscribe here.
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Data Engineering Podcast
OCTOBER 8, 2023
Summary The insurance industry is notoriously opaque and hard to navigate. Max Cho found that fact frustrating enough that he decided to build a business of making policy selection more navigable. In this episode he shares his journey of data collection and analysis and the challenges of automating an intentionally manual industry. Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management Introducing RudderStack Profiles.
databricks
OCTOBER 12, 2023
In this blog post, the MosaicML engineering team shares best practices for how to capitalize on popular open source large language models (LLMs).
Advertisement
In Airflow, DAGs (your data pipelines) support nearly every use case. As these workflows grow in complexity and scale, efficiently identifying and resolving issues becomes a critical skill for every data engineer. This is a comprehensive guide with best practices and examples to debugging Airflow DAGs. You’ll learn how to: Create a standardized process for debugging to quickly diagnose errors in your DAGs Identify common issues with DAGs, tasks, and connections Distinguish between Airflow-relate
KDnuggets
OCTOBER 12, 2023
SQL is the essential data science language due to its universal database accessibility, efficient data cleaning capabilities, seamless integration with other languages, and requirement for most data science jobs.
Marc Lamberti
OCTOBER 11, 2023
Do you wonder how to use the DockerOperator in Airflow to kick off a docker image? Or how to run a task without creating dependency conflicts? In this tutorial, you will discover everything you need about the DockerOperator with practical examples. If you’re new to Airflow, I’ve created a course you can check out here. Ready? Let’s go!
Data Engineering Digest brings together the best content for data engineering professionals from the widest variety of industry thought leaders.
databricks
OCTOBER 10, 2023
We’re excited to announce that Meta AI’s Llama 2 foundation chat models are available in the Databricks Marketplace for you to fine-tune and dep.
KDnuggets
OCTOBER 9, 2023
Unlock the power of GPT-4 summarization with Chain of Density (CoD), a technique that attempts to balance information density for high-quality summaries.
Snowflake
OCTOBER 9, 2023
Easily collect and store digital events directly to create a complete composable customer data platform (CDP) Marketers are increasingly leveraging the Snowflake Data Cloud as the foundation for all of their customer data analytics and activation. Marketing teams are creating composable customer data platforms (CDPs) on the Data Cloud to build a 360-degree view of each customer.
ThoughtSpot
OCTOBER 9, 2023
When using data to make impactful business decisions, certain doubts may start to arise, like “What does this column exactly mean?” or “Can I trust this data source I want to use?” Questions like these speak to a larger need for increased data literacy and trust in data. ThoughtSpot continually invests in this area, giving users the confidence to build the correct Answers needed for their analysis—and ensuring they can trust the data they are shown.
Speaker: Tamara Fingerlin, Developer Advocate
Apache Airflow® 3.0, the most anticipated Airflow release yet, officially launched this April. As the de facto standard for data orchestration, Airflow is trusted by over 77,000 organizations to power everything from advanced analytics to production AI and MLOps. With the 3.0 release, the top-requested features from the community were delivered, including a revamped UI for easier navigation, stronger security, and greater flexibility to run tasks anywhere at any time.
databricks
OCTOBER 11, 2023
We are delighted to announce that Databricks Asset Bundles are now in public preview. Bundles, for short, facilitate the adoption of software engineering.
KDnuggets
OCTOBER 9, 2023
In this article, Luis shares with readers his thoughts on the intersection of open source software and machine learning and what the future might bring. Many articles cover how open source software is used by the machine learning community but this post focuses on the similarities between the two areas of practice and what machine learning can and can’t learn from open source software.
Snowflake
OCTOBER 11, 2023
While the startup world listened for better news in the aftermath of a volatile 2022, a new salvo of bad news emerged: global venture capital funding declined about 49% within the first six months of 2023 alone. Worsening inflation and rising interest rates are putting pressure on startups across all stages of venture funding to reframe their tech stacks or business models along the tech-scape’s collapsing edges.
Jesse Anderson
OCTOBER 12, 2023
Unapologetically Technical is finally back with a new episode! In this episode of Unapologetically Technical, I had the pleasure of interviewing Neil Avery from Liquidlabs. We discussed his experiences creating grid computing systems at major banks like Royal Bank of Scotland and Deutchebank, as well as his journey to founding a startup called Logscape and working as a consultant at Excellian.
Speaker: Alex Salazar, CEO & Co-Founder @ Arcade | Nate Barbettini, Founding Engineer @ Arcade | Tony Karrer, Founder & CTO @ Aggregage
There’s a lot of noise surrounding the ability of AI agents to connect to your tools, systems and data. But building an AI application into a reliable, secure workflow agent isn’t as simple as plugging in an API. As an engineering leader, it can be challenging to make sense of this evolving landscape, but agent tooling provides such high value that it’s critical we figure out how to move forward.
databricks
OCTOBER 9, 2023
We’re excited to announce that Databricks has obtained the International Standards Organization (ISO) 27701 certification as a data processor. This certification reflects our c.
KDnuggets
OCTOBER 12, 2023
This article talks about several best practices for writing ETLs for building training datasets. It delves into several software engineering techniques and patterns applied to ML.
Snowflake
OCTOBER 12, 2023
In the age of climate consciousness, industries worldwide are grappling with the urgent need to reduce their carbon footprints. One industry that has come under increased scrutiny is telecommunications, where Scope 3 emissions , or the indirect emissions that occur in a company’s value chain that the company has no direct control over, alone account for a staggering 85% of a typical telecom company’s carbon footprint.
Confluent
OCTOBER 12, 2023
With Confluent Cloud, Loggi migrated to an event-driven architecture, powering real-time analytics, boosting productivity, and cutting costs.
Speaker: Andrew Skoog, Founder of MachinistX & President of Hexis Representatives
Manufacturing is evolving, and the right technology can empower—not replace—your workforce. Smart automation and AI-driven software are revolutionizing decision-making, optimizing processes, and improving efficiency. But how do you implement these tools with confidence and ensure they complement human expertise rather than override it? Join industry expert Andrew Skoog as he explores how manufacturers can leverage automation to enhance operations, streamline workflows, and make smarter, data-dri
databricks
OCTOBER 13, 2023
This blog was written in collaboration with David Roberts (Analytics Engineering Manager), Kevin P. Buchan Jr (Assistant Vice President, Analytics), and Yubin Park.
KDnuggets
OCTOBER 13, 2023
A new deep learning framework built entirely in Rust that aims to balance flexibility, performance, and ease of use for researchers, ML engineers, and developers.
Towards Data Science
OCTOBER 7, 2023
A shortcut for beginners in 2024 Continue reading on Towards Data Science »
LinkedIn Engineering
OCTOBER 13, 2023
Co-Authors: Chaitali Parmar , Eric Stoll , and Natasha Michel At Linkedin, one of the Information Security team's core commitments is to enable an environment of trusted and secure products, platforms, and infrastructure for our employees, members, and customers. The Infosec Governance, Risk and Compliance (GRC) and Third Party Security (TPS) teams are responsible for documenting security policy and monitoring in-house and third party risk and control environments to assure compliance and a heal
Advertisement
With Airflow being the open-source standard for workflow orchestration, knowing how to write Airflow DAGs has become an essential skill for every data engineer. This eBook provides a comprehensive overview of DAG writing features with plenty of example code. You’ll learn how to: Understand the building blocks DAGs, combine them in complex pipelines, and schedule your DAG to run exactly when you want it to Write DAGs that adapt to your data at runtime and set up alerts and notifications Scale you
databricks
OCTOBER 9, 2023
Written in partnership with Shell. The energy industry is all about physical assets – from terminals, ships and pipelines to refineries and wind f.
KDnuggets
OCTOBER 9, 2023
This article serves as a guide for the data professional who wants to earn more in these trying times.
Towards Data Science
OCTOBER 7, 2023
Getting started with LLMs? Here are 5 popular applications data teams at OpenAI, Vimeo, and other companies are putting into practice today. Image courtesy of author. The hype around generative AI is real, and data and ML teams are feeling the heat. Across industries, executives are pushing their data leaders to build AI-powered products that will save time, drive revenue, or give them a competitive advantage.
Knowledge Hut
OCTOBER 13, 2023
Project management involves muti faceted skills and competencies. There are various skilled people involved in project management, from project coordinators to project consultants, the list is endless. One key role in project management is the project director. These individuals are in the top line of project management, they are responsible for making crucial decisions involved in the projects.
Speaker: Tamara Fingerlin, Developer Advocate
In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!
databricks
OCTOBER 13, 2023
Today, we are excited to announce the general availability of the Databricks SQL Statement Execution API on AWS and Azure, with support for.
KDnuggets
OCTOBER 9, 2023
Unlock the power of time-based data visualization with Pandas as we delve into the art of resampling, turning your data into insightful temporal masterpieces.
Towards Data Science
OCTOBER 12, 2023
Construction engineer investigating his work — Stable diffusion Introduction In our previous publication, From Data Engineering to Prompt Engineering , we demonstrated how to utilize ChatGPT to solve data preparation tasks. Apart from the good feedback we have received, one critical point has been raised: Prompt engineering may help with simple tasks, but is it really useful in a more challenging environment?
Robinhood
OCTOBER 12, 2023
Robinhood was founded on a simple idea: that our financial markets should be accessible to all. With customers at the heart of our decisions, Robinhood is lowering barriers and providing greater access to financial information and investing. Together, we are building products and services that help create a financial system everyone can participate in. … We are excited to join Latinhood, our Employee Resource Group (ERG) dedicated to the Hispanic/Latino community, in celebrating Hispanic Heritag
Speaker: Ben Epstein, Stealth Founder & CTO | Tony Karrer, Founder & CTO, Aggregage
When tasked with building a fundamentally new product line with deeper insights than previously achievable for a high-value client, Ben Epstein and his team faced a significant challenge: how to harness LLMs to produce consistent, high-accuracy outputs at scale. In this new session, Ben will share how he and his team engineered a system (based on proven software engineering approaches) that employs reproducible test variations (via temperature 0 and fixed seeds), and enables non-LLM evaluation m
Let's personalize your content