The Power of a Semantic Layer: A Data Engineer’s Guide
KDnuggets
OCTOBER 10, 2023
Looking to understand the semantic layer and how it can improve your data stack? This GigaOm Sonar report on Semantic Layers can help you delve deeper.
The Pragmatic Engineer
OCTOBER 10, 2023
👋 Hi, this is Gergely with a bonus, free issue of the Pragmatic Engineer Newsletter. In every issue, I cover topics related to Big Tech and startups through the lens of engineering managers and senior engineers. In this article, we cover three out of eight topics from today’s deepdive into tech scaleup Chronosphere. To get full issues twice a week, subscribe here.
Data Engineering Podcast
OCTOBER 8, 2023
Summary: The insurance industry is notoriously opaque and hard to navigate. Max Cho found that fact frustrating enough that he decided to build a business of making policy selection more navigable. In this episode he shares his journey of data collection and analysis and the challenges of automating an intentionally manual industry. Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. Introducing RudderStack Profiles.
databricks
OCTOBER 12, 2023
In this blog post, the MosaicML engineering team shares best practices for how to capitalize on popular open source large language models (LLMs).
KDnuggets
OCTOBER 12, 2023
SQL is the essential data science language due to its universal database accessibility, efficient data cleaning capabilities, seamless integration with other languages, and requirement for most data science jobs.
Marc Lamberti
OCTOBER 11, 2023
Do you wonder how to use the DockerOperator in Airflow to kick off a docker image? Or how to run a task without creating dependency conflicts? In this tutorial, you will discover everything you need about the DockerOperator with practical examples. If you’re new to Airflow, I’ve created a course you can check out here. Ready? Let’s go!
Data Engineering Digest brings together the best content for data engineering professionals from the widest variety of industry thought leaders.
databricks
OCTOBER 10, 2023
We’re excited to announce that Meta AI’s Llama 2 foundation chat models are available in the Databricks Marketplace for you to fine-tune and dep.
KDnuggets
OCTOBER 9, 2023
Unlock the power of GPT-4 summarization with Chain of Density (CoD), a technique that attempts to balance information density for high-quality summaries.
Christophe Blefari
OCTOBER 9, 2023
Hey, I'm a bit late once again. I hope this newsletter edition finds you well. This is almost a raw edition. I had quite a large number of links, and I hope you will like this selection. Gen AI 🤖 OpenAI’s plan to build the "iPhone of artificial intelligence": obviously this is one of the main struggles for OpenAI.
ThoughtSpot
OCTOBER 9, 2023
When using data to make impactful business decisions, certain doubts may start to arise, like “What does this column exactly mean?” or “Can I trust this data source I want to use?” Questions like these speak to a larger need for increased data literacy and trust in data. ThoughtSpot continually invests in this area, giving users the confidence to build the correct Answers needed for their analysis—and ensuring they can trust the data they are shown.
databricks
OCTOBER 11, 2023
We are delighted to announce that Databricks Asset Bundles are now in public preview. Bundles, for short, facilitate the adoption of software engineering.
KDnuggets
OCTOBER 12, 2023
This article talks about several best practices for writing ETLs for building training datasets. It delves into several software engineering techniques and patterns applied to ML.
Snowflake
OCTOBER 9, 2023
Easily collect and store digital events directly to create a complete composable customer data platform (CDP). Marketers are increasingly leveraging the Snowflake Data Cloud as the foundation for all of their customer data analytics and activation. Marketing teams are creating composable customer data platforms (CDPs) on the Data Cloud to build a 360-degree view of each customer.
Jesse Anderson
OCTOBER 12, 2023
Unapologetically Technical is finally back with a new episode! In this episode of Unapologetically Technical, I had the pleasure of interviewing Neil Avery from Liquidlabs. We discussed his experiences creating grid computing systems at major banks like Royal Bank of Scotland and Deutsche Bank, as well as his journey to founding a startup called Logscape and working as a consultant at Excellian.
databricks
OCTOBER 9, 2023
We’re excited to announce that Databricks has obtained the International Standards Organization (ISO) 27701 certification as a data processor. This certification reflects our c.
KDnuggets
OCTOBER 13, 2023
A new deep learning framework built entirely in Rust that aims to balance flexibility, performance, and ease of use for researchers, ML engineers, and developers.
Confluent
OCTOBER 12, 2023
With Confluent Cloud, Loggi migrated to an event-driven architecture, powering real-time analytics, boosting productivity, and cutting costs.
LinkedIn Engineering
OCTOBER 13, 2023
Co-Authors: Chaitali Parmar, Eric Stoll, and Natasha Michel. At LinkedIn, one of the Information Security team's core commitments is to enable an environment of trusted and secure products, platforms, and infrastructure for our employees, members, and customers. The Infosec Governance, Risk and Compliance (GRC) and Third Party Security (TPS) teams are responsible for documenting security policy and monitoring in-house and third party risk and control environments to assure compliance and a heal
databricks
OCTOBER 13, 2023
This blog was written in collaboration with David Roberts (Analytics Engineering Manager), Kevin P. Buchan Jr (Assistant Vice President, Analytics), and Yubin Park.
KDnuggets
OCTOBER 9, 2023
In this article, Luis shares with readers his thoughts on the intersection of open source software and machine learning and what the future might bring. Many articles cover how open source software is used by the machine learning community but this post focuses on the similarities between the two areas of practice and what machine learning can and can’t learn from open source software.
Confluent
OCTOBER 11, 2023
Apache Kafka 3.6 brings Tiered Storage Early Access, migrating clusters from ZooKeeper to KRaft with no downtime, a grace period for stream-table joins, and more!
Towards Data Science
OCTOBER 7, 2023
A shortcut for beginners in 2024
databricks
OCTOBER 9, 2023
Written in partnership with Shell. The energy industry is all about physical assets – from terminals, ships and pipelines to refineries and wind f.
KDnuggets
OCTOBER 9, 2023
This article serves as a guide for the data professional who wants to earn more in these trying times.
Snowflake
OCTOBER 12, 2023
In the age of climate consciousness, industries worldwide are grappling with the urgent need to reduce their carbon footprints. One industry that has come under increased scrutiny is telecommunications, where Scope 3 emissions, or the indirect emissions that occur in a company’s value chain that the company has no direct control over, alone account for a staggering 85% of a typical telecom company’s carbon footprint.
Confluent
OCTOBER 9, 2023
Learn how data streaming and artificial intelligence enable you to protect your brand’s reputation with real-time social media monitoring.
databricks
OCTOBER 13, 2023
Today, we are excited to announce the general availability of the Databricks SQL Statement Execution API on AWS and Azure, with support for.
KDnuggets
OCTOBER 11, 2023
RNN, Transformers, and BERT are popular NLP techniques with tradeoffs in sequence modeling, parallelization, and pre-training for downstream tasks.
Precisely
OCTOBER 9, 2023
Telecom providers invest heavily in infrastructure, so it’s vital that they optimize those investments by using an intelligent planning process. That means making data-driven decisions based on rich, contextual, location-based data. Is your company making the right investments in infrastructure? That depends on the answers to three questions: Are you building in the right place?
Confluent
OCTOBER 11, 2023
Discover data-centric security strategies with Confluent. Join Mike Peacock on Oct 12 for key insights. Register now!