Top Data Engineering Digest Amazon Web Services Cloud Computing Content for Week of Nov 11

Sat.Nov 11, 2023 - Fri.Nov 17, 2023

Customer Spotlight: MetaMap

Preset

NOVEMBER 13, 2023

Product Preset Cloud Fully-managed, cloud-hosted service for Apache Superset Managed Private Cloud Preset with additional security in your private cloud Preset Certified Superset Deploy QA-approved Superset on any infrastructure Preset Embedded Dashboards Interactive analytics in your custom applications Preset API Managing your Preset workspaces as code Use Cases Business Intelligence (BI) Analytics and visualizations powered by Apache Superset for modern data stacks Internal Tooling Embedded a

BI Business Intelligence Data Warehouse SQL

What is an Open Table Format? & Why to use one?

Start Data Engineering

NOVEMBER 14, 2023

1. Introduction 2. What is an Open Table Format (OTF) 3. Why use an Open Table Format (OTF) 3.0. Setup 3.1. Evolve data and partition schema without reprocessing 3.2. See previous point-in-time table state, aka time travel 3.3. Git like branches & tags for your tables 3.4. Handle multiple reads & writes concurrently 4. Conclusion 5. Further reading 6.

Data

Join 37,000+

Insiders

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

Webinars

Precision in Motion: Why Process Optimization Is the Future of Manufacturing

Airflow Best Practices for ETL/ELT Pipelines

MORE WEBINARS

Trending Sources

5 Free Courses to Master Data Science

KDnuggets

NOVEMBER 13, 2023

Want to break into data science? Start upskilling today with these free courses to learn programming, data analysis, and machine learning.

Data Science

Data Science Machine Learning Data Analysis Data

Webinars

Precision in Motion: Why Process Optimization Is the Future of Manufacturing

Airflow Best Practices for ETL/ELT Pipelines

MORE WEBINARS

The Data Discovery Team

Jesse Anderson

NOVEMBER 14, 2023

A Guest Post by Ole Olesen-Bagneux In this blog post I would like to describe a new data team, that I call ‘the data discovery team’. It’s a team that connects naturally into the constellation of the three data teams Operations team Data engineering team Data Science team as described in Jesse Anderson’s book Data Teams (2020) Before I explain what the data discovery team should do, it is necessary to add a bit of context on the concept of data discovery itself.

Metadata

Metadata Data Science Big Data Data

Precision in Motion: Why Process Optimization Is the Future of Manufacturing

Speaker: Jason Chester, Director, Product Management

In today’s manufacturing landscape, staying competitive means moving beyond reactive quality checks and toward real-time, data-driven process control. But what does true manufacturing process optimization look like—and why is it more urgent now than ever? Join Jason Chester in this new, thought-provoking session on how modern manufacturers are rethinking quality operations from the ground up.

Manufacturing

Enhancing The Abilities Of Software Engineers With Generative AI At Tabnine

Data Engineering Podcast

NOVEMBER 12, 2023

Summary Software development involves an interesting balance of creativity and repetition of patterns. Generative AI has accelerated the ability of developer tools to provide useful suggestions that speed up the work of engineers. Tabnine is one of the main platforms offering an AI powered assistant for software engineers. In this episode Eran Yahav shares the journey that he has taken in building this product and the ways that it enhances the ability of humans to get their work done, and when t

Software Engineering

Software Engineering Software Engineer Engineering Data Lake

Data Intelligence Platforms

databricks

NOVEMBER 15, 2023

The observation that "software is eating the world" has shaped the modern tech industry. Today, software is ubiquitous in our lives, from the.

Data

The 5 Best Vector Databases You Must Try in 2024

KDnuggets

NOVEMBER 17, 2023

The top vector databases are known for their versatility, performance, scalability, consistency, and efficient algorithms in storing, indexing, and querying vector embeddings for AI applications.

Database

Database Algorithm

More Trending

The 5 Best Vector Databases You Must Try in 2024

KDnuggets

NOVEMBER 17, 2023

The top vector databases are known for their versatility, performance, scalability, consistency, and efficient algorithms in storing, indexing, and querying vector embeddings for AI applications.

Database

Database Algorithm

Apache Druid: Who’s Using It and Why?

Seattle Data Guy

NOVEMBER 17, 2023

Image Source: Druid The past few decades have increased the need for faster data. Some of the catalysts were the push for better data and decisions to be made around advertising. In fact, Adtech has driven much of the real-time data technologies that we have today. For example, Reddit uses a real-time database to provide… Read more The post Apache Druid: Who’s Using It and Why?

IT Database Technology Data

Introducing the Geodatabase Resources Hub

ArcGIS

NOVEMBER 17, 2023

This blog introduces the Geodatabase Resources Hub, a one-stop shop for all content offered by Esri's Geodatabase Team.

Data Management

Data Management Management Data

Apache Flink - anatomy of a job

Waitingforcode

NOVEMBER 14, 2023

Have you written your first successful Apache Flink job and are still wondering the high-level API translates into the executable details? I did and decided to answer the question in the new blog post.

7 Steps to Running a Small Language Model on a Local CPU

KDnuggets

NOVEMBER 14, 2023

Discover how to run a small language model on your local CPU in just seven easy steps.

Airflow Best Practices for ETL/ELT Pipelines

Speaker: Kenten Danas, Senior Manager, Developer Relations

ETL and ELT are some of the most common data engineering use cases, but can come with challenges like scaling, connectivity to other systems, and dynamically adapting to changing data sources. Airflow is specifically designed for moving and transforming data in ETL/ELT pipelines, and new features in Airflow 3.0 like assets, backfills, and event-driven scheduling make orchestrating ETL/ELT pipelines easier than ever!

Data Engineering

5 Reasons to Attend BUILD 2023: The Dev Conference for AI & Apps

Snowflake

NOVEMBER 15, 2023

BUILD 2023 is where AI gets real. Join our two-day virtual global conference and learn how to build with the app dev innovations you heard about at Snowflake Summit and Snowday. We have more demos and hands-on virtual labs than ever before—and you won’t find a bunch of slideware here. The focus is on tools and capabilities that are generally available or in public and private preview, so you can leave BUILD and put your new skills into action immediately.

Building

Building Accessibility Accessible IT

What’s new from the geodatabase team in ArcGIS Pro 3.2

ArcGIS

NOVEMBER 14, 2023

Here's everything new in ArcGIS Pro 3.2 from the Geodatabase Team. Schema Reports, 64-bit OIDs, Big Integer fields, new date fields, etc.

Data Management

Data Management Management Data

Announcing the General Availability of Azure Databricks support for Azure confidential computing (ACC)

databricks

NOVEMBER 16, 2023

Today we are excited to announce the general availability of Azure Databricks support for Azure confidential computing (ACC)! With support for Azure confidential.

Optimizing Data Analytics: Integrating GitHub Copilot in Databricks

KDnuggets

NOVEMBER 16, 2023

Integrating AI-powered pair programming tools for data analytics in Databricks optimizes and streamlines the development process, freeing up developer time for innovation.

Data Analytics

Data Analytics Data Programming Process

Whats New in Apache Airflow 3.0 –– And How Will It Reshape Your Data Workflows?

Apache Airflow® 3.0, the most anticipated Airflow release yet, officially launched this April. As the de facto standard for data orchestration, Airflow is trusted by over 77,000 organizations to power everything from advanced analytics to production AI and MLOps. With the 3.0 release, the top-requested features from the community were delivered, including a revamped UI for easier navigation, stronger security, and greater flexibility to run tasks anywhere at any time.

Data Workflow

Organist: stay sane managing your development environments

Tweag

NOVEMBER 15, 2023

tl;dr: We’re pleased to announce the beta release of Organist , a tool designed to ease the definition of reliable and low-friction development environments and workflows, building on the combined strengths of Nix and Nickel. A mess of cables and knobs I used to play piano as a kid. As a teenager, I became frustrated by the limitations of the instrument and started getting into synthesizers.

Management

Management Python Programming Language Programming

Geodatabase Schema Reports

ArcGIS

NOVEMBER 16, 2023

Geodatabase schema reports add Xray functionality to ArcGIS Pro 3.2.

IT Data Data Management Management

Generative AI Is The Key To Transforming The Telecom Industry

Snowflake

NOVEMBER 16, 2023

The telecom industry is undergoing a monumental transformation. The rise of new technologies such as 5G, cloud computing, and the Internet of Things (IoT) is putting pressure on telecom operators to find new ways to improve the performance of their networks, reduce costs and provide better customer service. Cost pressures especially are incentivizing telecoms to find new ways to implement automation and more efficient processes to help optimize operations and employee productivity.

Cloud Computing

Cloud Computing Cloud Systems Technology

Everything you need to become a SAS Certified Machine Learning Engineer

KDnuggets

NOVEMBER 14, 2023

Read on to find out everything you need to become a SAS Certified Machine Learning Engineer.

Machine Learning

Machine Learning Engineering

Agent Tooling: Connecting AI to Your Tools, Systems & Data

Speaker: Alex Salazar, CEO & Co-Founder @ Arcade | Nate Barbettini, Founding Engineer @ Arcade | Tony Karrer, Founder & CTO @ Aggregage

There’s a lot of noise surrounding the ability of AI agents to connect to your tools, systems and data. But building an AI application into a reliable, secure workflow agent isn’t as simple as plugging in an API. As an engineering leader, it can be challenging to make sense of this evolving landscape, but agent tooling provides such high value that it’s critical we figure out how to move forward.

Systems

Why Spatial Data Governance is Critical to Your Business Strategy

Precisely

NOVEMBER 14, 2023

When speaking to organizations about data integrity , and the key role that both data governance and location intelligence play in making more confident business decisions, I keep hearing the following statements: “For any organization, data governance is not just a nice-to-have! “ “Everyone knows that 80% of data contains location information. Why are you still telling us this, Monica?

Data Governance

Data Governance Government Metadata Retail

Deep Learning for Image Analyst – What’s New in ArcGIS Pro 3.2

ArcGIS

NOVEMBER 14, 2023

This blog details the new features and enhancements that were add for deep learning using the Image Analyst extension - for Pro 3.2.

Deep Learning

Fleetclusters for Databricks + AWS to reduce Costs.

Confessions of a Data Guy

NOVEMBER 14, 2023

Show me the money. That’s what it’s all about. I have a question for you, to tickle your ears and mind. Get you out of that humdrum funk you are in. Here is my question, riddle me this all you hobbits. “Of what use is, and what good does the best and most advanced architecture […] The post Fleetclusters for Databricks + AWS to reduce Costs. appeared first on Confessions of a Data Guy.

AWS

AWS Architecture IT Data

Back to Basics Week 2: Database, SQL, Data Management and Statistical Concepts

KDnuggets

NOVEMBER 13, 2023

Welcome back to Week 2 of KDnuggets’ "Back to Basics" series. This week, we delve into the vital world of Databases, SQL, Data Management, and Statistical Concepts in Data Science.

Database

Database SQL Data Management Management

How to Modernize Manufacturing Without Losing Control

Speaker: Andrew Skoog, Founder of MachinistX & President of Hexis Representatives

Manufacturing is evolving, and the right technology can empower—not replace—your workforce. Smart automation and AI-driven software are revolutionizing decision-making, optimizing processes, and improving efficiency. But how do you implement these tools with confidence and ensure they complement human expertise rather than override it? Join industry expert Andrew Skoog as he explores how manufacturers can leverage automation to enhance operations, streamline workflows, and make smarter, data-dri

Manufacturing

1. Streamlining Membership Data Engineering at Netflix with Psyberg

Netflix Tech

NOVEMBER 14, 2023

By Abhinaya Shetty , Bharath Mummadisetty At Netflix, our Membership and Finance Data Engineering team harnesses diverse data related to plans, pricing, membership life cycle, and revenue to fuel analytics, power various dashboards, and make data-informed decisions. Many metrics in Netflix’s financial reports are powered and reconciled with efforts from our team!

Data Engineering

Data Engineering Data Engineer Engineering Metadata

Python Dependency Management in Spark Connect

databricks

NOVEMBER 13, 2023

Managing the environment of an application in a distributed computing environment can be challenging. Ensuring that all nodes have the necessary environment to.

Management

Management Python Data Engineering Data Engineer

Deep Learning with ArcGIS Pro Tips & Tricks: Part 1

ArcGIS

NOVEMBER 15, 2023

Prepare your environment to run out-of-the-box deep learning geoprocessing tools in ArcGIS Pro. Machine learning is more accessible than ever with pre-trained models enabling you to extract data from your imagery.

Deep Learning

Deep Learning Machine Learning Accessibility Accessible

Make Your Own GPTs with ChatGPT’s GPTs!

KDnuggets

NOVEMBER 15, 2023

Want to out-GPT ChatGPT with your own GPT? Then let's GPT the GPTs!

What’s New in Apache Airflow® 3.0—And How Will It Reshape Your Data Workflows?

Speaker: Tamara Fingerlin, Developer Advocate

Data Workflow

How Financial Platform Tide Automated GDPR Compliance With Atlan and Snowflake

Snowflake

NOVEMBER 13, 2023

Tide , a mobile-first financial platform based in the U.K., offers fast, intuitive service to small business customers. Data is crucial to Tide, having supported its incredible growth to nearly 500,000 customers in just eight years. As a regulated financial platform, the company sought to improve its compliance with GDPR’s right to erasure provision, commonly known as the “right to be forgotten.

Consulting

Consulting Finance Data Warehouse Data Governance

Cybersecurity Lakehouses Best Practices Part 4: Data Normalization Strategies

databricks

NOVEMBER 17, 2023

In this four-part blog series "Lessons learned from building Cybersecurity Lakehouses," we are discussing a number of challenges organizations face with data engineering.

Data Engineering

Data Engineering Data Engineer Data Engineering

Demystifying SAR Satellite Data in ArcGIS Pro: ICEYE

ArcGIS

NOVEMBER 14, 2023

This article is specific to ICEYE SAR satellite data and is part of a blog series on sensor support in ArcGIS Pro.

Data

Data Education

The Rise and Fall of Prompt Engineering: Fad or Future?

KDnuggets

NOVEMBER 15, 2023

This article provides an overview of prompt engineering, from its inception to current status.

Engineering

Engineering IT

A Guide to Debugging Apache Airflow® DAGs

In Airflow, DAGs (your data pipelines) support nearly every use case. As these workflows grow in complexity and scale, efficiently identifying and resolving issues becomes a critical skill for every data engineer. This is a comprehensive guide with best practices and examples to debugging Airflow DAGs. You’ll learn how to: Create a standardized process for debugging to quickly diagnose errors in your DAGs Identify common issues with DAGs, tasks, and connections Distinguish between Airflow-relate

Data Pipeline

Sat.Nov 11, 2023 - Fri.Nov 17, 2023

Customer Spotlight: MetaMap

What is an Open Table Format? & Why to use one?

Webinars

Trending Sources

5 Free Courses to Master Data Science

Webinars

The Data Discovery Team

Precision in Motion: Why Process Optimization Is the Future of Manufacturing

Enhancing The Abilities Of Software Engineers With Generative AI At Tabnine

Data Intelligence Platforms

The 5 Best Vector Databases You Must Try in 2024

Sign up to get articles personalized to your interests!

More Trending

The 5 Best Vector Databases You Must Try in 2024

Apache Druid: Who’s Using It and Why?

Introducing the Geodatabase Resources Hub

Apache Flink - anatomy of a job

7 Steps to Running a Small Language Model on a Local CPU

Airflow Best Practices for ETL/ELT Pipelines

5 Reasons to Attend BUILD 2023: The Dev Conference for AI & Apps

What’s new from the geodatabase team in ArcGIS Pro 3.2

Announcing the General Availability of Azure Databricks support for Azure confidential computing (ACC)

Optimizing Data Analytics: Integrating GitHub Copilot in Databricks

Whats New in Apache Airflow 3.0 –– And How Will It Reshape Your Data Workflows?

Organist: stay sane managing your development environments

Geodatabase Schema Reports

Generative AI Is The Key To Transforming The Telecom Industry

Everything you need to become a SAS Certified Machine Learning Engineer

Agent Tooling: Connecting AI to Your Tools, Systems & Data

Why Spatial Data Governance is Critical to Your Business Strategy

Deep Learning for Image Analyst – What’s New in ArcGIS Pro 3.2

Fleetclusters for Databricks + AWS to reduce Costs.

Back to Basics Week 2: Database, SQL, Data Management and Statistical Concepts

How to Modernize Manufacturing Without Losing Control

1. Streamlining Membership Data Engineering at Netflix with Psyberg

Python Dependency Management in Spark Connect

Deep Learning with ArcGIS Pro Tips & Tricks: Part 1

Make Your Own GPTs with ChatGPT’s GPTs!

What’s New in Apache Airflow® 3.0—And How Will It Reshape Your Data Workflows?

How Financial Platform Tide Automated GDPR Compliance With Atlan and Snowflake

Cybersecurity Lakehouses Best Practices Part 4: Data Normalization Strategies

Demystifying SAR Satellite Data in ArcGIS Pro: ICEYE

The Rise and Fall of Prompt Engineering: Fad or Future?

A Guide to Debugging Apache Airflow® DAGs

Stay Connected