My Obsidian Note-Taking Workflow
Simon Späti
JULY 28, 2024
A Vim-Inspired Approach to Efficient Note Management with Obsidian and Markdown
Start Data Engineering
JULY 16, 2024
1. Introduction
2. Data Quality (DQ) checks are run as part of your pipeline
  2.1. Ensure your consumers don’t get incorrect data with output DQ checks
  2.2. Catch upstream issues quickly with input DQ checks
  2.3. Waiting a long time to run output DQ checks? Save time & money with mid-pipeline DQ checks.
  2.4. Track incoming and outgoing row counts with Audit logs
3.
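To make the outline above concrete, here is a minimal sketch of input, mid-pipeline, and output DQ checks with an audit log of row counts. It is not taken from the linked article: the function names (`check_not_null`, `check_non_negative`, `run_pipeline`) and the fields (`order_id`, `customer`, `amount`) are hypothetical, and rows are assumed to be plain dictionaries.

```python
def check_not_null(rows, column):
    """DQ check: fail if any row has no value for `column`."""
    bad = sum(1 for r in rows if r.get(column) is None)
    if bad:
        raise ValueError(f"{bad} row(s) have a null {column!r}")


def check_non_negative(rows, column):
    """DQ check: fail if any row has a negative value for `column`."""
    bad = sum(1 for r in rows if r[column] < 0)
    if bad:
        raise ValueError(f"{bad} row(s) have a negative {column!r}")


def run_pipeline(raw_rows):
    """Hypothetical pipeline: clean orders, then total amounts per customer."""
    audit_log = []  # (stage, row count) pairs, one per pipeline stage

    # Input DQ check: catch upstream issues before doing any work.
    check_not_null(raw_rows, "order_id")
    audit_log.append(("input", len(raw_rows)))

    # Step 1: drop rows that carry no amount.
    cleaned = [r for r in raw_rows if r.get("amount") is not None]

    # Mid-pipeline DQ check: fail early, before the aggregation step.
    check_non_negative(cleaned, "amount")
    audit_log.append(("cleaned", len(cleaned)))

    # Step 2: aggregate totals per customer.
    totals = {}
    for r in cleaned:
        totals[r["customer"]] = totals.get(r["customer"], 0.0) + r["amount"]
    output = [{"customer": c, "total": t} for c, t in totals.items()]

    # Output DQ check: protect consumers from incorrect data.
    check_not_null(output, "total")
    audit_log.append(("output", len(output)))
    return output, audit_log


sample_rows = [
    {"order_id": 1, "customer": "a", "amount": 10.0},
    {"order_id": 2, "customer": "b", "amount": None},
    {"order_id": 3, "customer": "a", "amount": 5.0},
]
output, audit_log = run_pipeline(sample_rows)
print(audit_log)
```

Comparing consecutive entries in `audit_log` shows how many rows each stage dropped, which is the kind of tracking the audit-log item in the outline refers to.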
Confessions of a Data Guy
JULY 24, 2024
I’ve had something rattling around in the old noggin for a while; it’s just another strange idea that I can’t quite shake out. We all keep hearing about Arrow this and Arrow that … it seems every new tool built today for Data Engineering is at least partly based on Arrow’s in-memory format. So, […] The post PyArrow vs Polars (vs DuckDB) for Data Pipelines. appeared first on Confessions of a Data Guy.
KDnuggets
JULY 24, 2024
From the soft tools to the hard tools, these are what make a data scientist successful.
databricks
JULY 23, 2024
We are excited to partner with Meta to release the Llama 3.1 series of models on Databricks, further advancing the standard of powerful.
The Pragmatic Engineer
JULY 15, 2024
The past 18 months have seen major change reshape the tech industry. What does it all mean for businesses and dev teams – and what will pragmatic software engineering approaches look like in the future? I tackled these burning questions in my conference talk, “What’s Old is New Again,” which was the keynote of the Craft Conference in May 2024.
Data Engineering Digest brings together the best content for data engineering professionals from the widest variety of industry thought leaders.
Start Data Engineering
JULY 26, 2024
1. Introduction
2. Project overview
3. Check your data before making it available to end-users; Write-Audit-Publish (WAP) pattern
4. TL;DR: How the greatexpectations library works
  4.1. greatexpectations quick setup
5. From an implementation perspective, there are four types of tests
  5.1. Running checks on one dataset
  5.2. Checks involving the current dataset and its historical data
  5.3.
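The Write-Audit-Publish (WAP) pattern named in the outline above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: it does not use the greatexpectations API, the "tables" are an in-memory dictionary, and `write_audit_publish` and the audit names are hypothetical.

```python
def write_audit_publish(new_rows, tables, audits):
    """Write to staging, audit it, and publish only if every audit passes.

    Returns the names of the failed audits (an empty list means published).
    """
    # Write: land the new data in a staging area, never directly in production.
    tables["staging"] = list(new_rows)

    # Audit: run every check against the staged data.
    failures = [name for name, check in audits.items()
                if not check(tables["staging"])]
    if failures:
        # Production is left untouched, so end-users never see bad data.
        return failures

    # Publish: promote the audited data to production.
    tables["production"] = tables["staging"]
    return []


# Hypothetical audits: each takes the staged rows and returns True on success.
audits = {
    "not_empty": lambda rows: len(rows) > 0,
    "ids_unique": lambda rows: len({r["id"] for r in rows}) == len(rows),
}

tables = {"production": [{"id": 1}]}
failed = write_audit_publish([{"id": 2}, {"id": 2}], tables, audits)
print(failed, tables["production"])
```

In a real warehouse the staging and production "tables" would typically be separate tables or snapshots, and the publish step a swap or merge; the dictionary here only stands in for that.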
Christophe Blefari
JULY 26, 2024
Dear members, it's Summer Data News, the only news you can consume by the pool, at the beach, or at the office, if you're not lucky. This week, I'm writing from the Baltics, nomading a bit in Eastern and Northern Europe. I'm pleased to announce that we have successfully closed the CfP for Forward Data Conf; we received nearly 100 submissions, and the program committee is currently reviewing them all.
KDnuggets
JULY 16, 2024
Empowering Developers and Transforming Programming Practices
databricks
JULY 22, 2024
Evaluating long-form LLM outputs quickly and accurately is critical for rapid AI development. As a result, many developers wish to deploy LLM-as-judge methods.
Speaker: Tamara Fingerlin, Developer Advocate
In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!
Waitingforcode
JULY 16, 2024
With this blog post, I'm starting a follow-up series for my Data+AI Summit 2024 talk. I've missed this family of blog posts a lot, as my previous DAIS as a speaker was 4 years ago! As before, I'll be writing several blog posts that should help you remember the talk and also cover some of the topics left out because of time constraints.
Seattle Data Guy
JULY 2, 2024
Running a successful data team is hard. Data teams are expected to juggle a combination of ad-hoc requests, big bet projects, migrations, etc., all while keeping up with the latest changes in technology. In the past few years, I have gotten to work with dozens of teams and see how various directors and managers deal… The post 9 Habits Of Effective Data Managers – Running A Data Team appeared first on Seattle Data Guy.
ArcGIS
JULY 31, 2024
Catch eyes and imaginations with this fun technique that draws attention to your area of interest with a bit of style!
Christophe Blefari
JULY 13, 2024
Dear members, it's been a few weeks since I last caught up with you in a proper Data News with a collection of links. Here we are. This week, I attended EuroPython in Prague. Since I spent most of my time at the dltHub booth in the sponsors hall, I didn't attend many talks. However, I did give a few presentations on my SQL orchestration library, yato, which pairs well with dlt.
KDnuggets
JULY 29, 2024
Learn to build end-to-end data science pipelines, from data ingestion to data visualization, using the Pandas pipe method.
databricks
JULY 2, 2024
Databricks announced the public preview of Mosaic AI Agent Framework & Agent Evaluation alongside our Generative AI Cookbook at the Data + AI.
Waitingforcode
JULY 10, 2024
Welcome to the first Data+AI Summit 2024 retrospective blog post. I'm opening the series with a topic close to my heart at the moment: stream processing!
Start Data Engineering
JULY 1, 2024
1. Introduction
2. Code is an interface to the execution engine
3. How to choose the execution engine and the coding interface
  3.1. Choose execution engine based on your workload
    3.1.1. Types of execution engine
    3.1.2. Criteria to choose your execution engine
  3.2. Choose coding interface for people who will maintain the pipeline
    3.2.1. Types of coding interfaces
    3.2.2.
Snowflake
JULY 9, 2024
Snowflake is committed to helping customers protect their accounts and data. That’s why we have been working on product capabilities that allow Snowflake admins to make multifactor authentication (MFA) mandatory and monitor compliance with this new policy. As part of that effort, today we’re announcing several key features: 1. A new authentication policy that requires MFA for all users in a Snowflake account 2.
ArcGIS
JULY 12, 2024
Creation of a Digital Twin in Seven Days with ArcGIS in Zurich
KDnuggets
JULY 15, 2024
Is it possible to learn data engineering for free? I claim it is and present the evidence for that in the form of 10 free data engineering courses.
databricks
JULY 22, 2024
Today, we're thrilled to announce that Mosaic AI Model Training's support for fine-tuning GenAI models is now available in Public Preview. At Databricks.
Speaker: Jay Allardyce, Deepak Vittal, Terrence Sheflin, and Mahyar Ghasemali
As we look ahead to 2025, business intelligence and data analytics are set to play pivotal roles in shaping success. Organizations are already starting to face a host of transformative trends as the year comes to a close, including the integration of AI in data analytics, an increased emphasis on real-time data insights, and the growing importance of user experience in BI solutions.
Confluent
JULY 30, 2024
Confluent Platform 7.
Data Engineering Weekly
JULY 21, 2024
Editor’s Note: A New Series on Data Engineering Tools Evaluation. There are plenty of data tools and vendors in the industry. But how can we choose a tool for a specific need? The traditional approach of running a PoC on every shortlisted vendor tool is time-consuming and practically unviable for growth-driven companies. Data Engineering Weekly is launching a new series on software evaluation focused on data engineering to better guide data engineering leaders in evaluating data tools.
Engineering at Meta
JULY 16, 2024
The key to developer velocity across AI lies in minimizing time to first batch (TTFB) for machine learning (ML) engineers. AI Lab is a pre-production framework used internally at Meta. It allows us to continuously A/B test common ML workflows – enabling proactive improvements and automatically preventing regressions on TTFB. AI Lab prevents TTFB regressions whilst enabling experimentation to develop improvements.
ArcGIS
JULY 12, 2024
Esri is working with partners (Maxar, TomTom) to enhance our 3D basemaps with high-quality commercial data for elevation and buildings layers.
Speaker: Nikhil Joshi, Founder & President of Snic Solutions
Is your manufacturing operation reaching its efficiency potential? A Manufacturing Execution System (MES) could be the game-changer, helping you reduce waste, cut costs, and lower your carbon footprint. Join Nikhil Joshi, Founder & President of Snic Solutions, in this value-packed webinar as he breaks down how MES can drive operational excellence and sustainability.
KDnuggets
JULY 12, 2024
Discover the essential tools every data scientist should know to elevate their data science game, from Python and R to SQL and advanced visualization tools.
databricks
JULY 23, 2024
In this blog, we are excited to share Databricks's journey in migrating to Unity Catalog for enhanced data governance. We'll discuss our high-level strategy and the tools we developed to facilitate the migration. Our goal is to highlight the benefits of Unity Catalog and make you feel confident about transitioning to it.
Snowflake
JULY 25, 2024
Snowflake Cortex Search, a fully managed search service for documents and other unstructured data, is now in public preview. With Cortex Search, organizations can effortlessly deploy retrieval-augmented generation (RAG) applications with Snowflake, powering use cases like customer service, financial research and sales chatbots. Cortex Search offers state-of-the-art semantic and lexical search over your text data in Snowflake behind an intuitive user interface, and it comes with the robust securi
Confessions of a Data Guy
JULY 9, 2024
When I was young and full of myself, writing Perl and PHP, while your ma was still reading you a bedtime story and giving you a stuffy to fall asleep with, I had to program uphill, both ways, in the rain and snow. Not like you milquetoast Data Engineers clickty clicking around Databricks and […] The post The Abstractions Are Making You Dumb (rise of the Shallow Expert) appeared first on Confessions of a Data Guy.
Speaker: Anindo Banerjea, CTO at Civio & Tony Karrer, CTO at Aggregage
When developing a Gen AI application, one of the most significant challenges is improving accuracy. This can be especially difficult when working with a large data corpus, and as the complexity of the task increases, the number of use cases/corner cases that the system is expected to handle essentially explodes. 💥 Anindo Banerjea is here to showcase his significant experience building AI/ML SaaS applications as he walks us through the current problems his company, Civio, is solving.