Summary Streaming data processing enables new categories of data products and analytics. Unfortunately, reasoning about stream processing engines is complex and lacks sufficient tooling.
Once it is running, the next challenge is figuring out how to address release management for all of the different component parts.
Here we explore initial system designs we considered, an overview of the current architecture, and some important principles Meta takes into account in making data accessible and easy to understand. Users have a variety of tools they can use to manage and access their information on Meta platforms.
Summary Data processing technologies have dramatically improved in their sophistication and raw throughput. What are the open questions today in technical scalability of data engines?
Speaker: Donna Laquidara-Carr, PhD, LEED AP, Industry Insights Research Director at Dodge Construction Network
In today’s construction market, owners, construction managers, and contractors must navigate increasing challenges, from cost management to project delays. Fortunately, digital tools now offer valuable insights to help mitigate these risks. That’s where data-driven construction comes in.
In every issue, I cover topics related to Big Tech and startups through the lens of engineering managers and senior engineers. After Zynga, he rejoined Amazon and was the General Manager (GM) for Compute services at AWS, and later chief of staff and advisor to AWS executives like Charlie Bell and Andy Jassy (Amazon’s current CEO).
I am pleased to announce that Cloudera has achieved the FedRAMP “In Process” designation, a significant milestone that underscores our commitment to providing the public sector with secure and reliable data management solutions across on-prem, hybrid and multi-cloud environments.
When most people think of master data management, they first think of customers and products. The business must also manage locations, including warehouses, offices, and subsidiaries, not to mention the various addresses associated with virtually every data element the business manages. Four main challenges make MDM complex.
Other shipped things include DALL·E 3 (image generation), GPT-4 (an advanced model), and the OpenAI API, which developers and companies use to integrate AI into their processes. I managed our entire Applied Engineering org from its earliest days through the launch and scaling of ChatGPT.
Snowflake joined forces with Merit to provide an identity verification platform and a set of program delivery services that help run large-scale government programs in areas such as licensing regulations, workforce development, emergency management, and educational grants and scholarships.
Bringing machine learning (ML) models into production is often hindered by fragmented MLOps processes that are difficult to scale with the underlying data. The friction of having to set up and manage separate environments for features and models creates operational complexity that can be costly to maintain and difficult to use.
A few years ago we embarked on a journey to rethink how our data management systems were designed. In recent years, Meta’s data management systems have evolved into a composable architecture that creates interoperability, promotes reusability, and improves engineering efficiency.
Cloudera, together with Octopai, will make it easier for organizations to better understand, access, and leverage all their data in their entire data estate – including data outside of Cloudera – to power the most robust data, analytics and AI applications.
In every issue, I cover topics related to Big Tech and high-growth startups through the lens of engineering managers and senior engineers. I managed to talk to someone in this company’s HR department, who confirmed that the leadership set a goal to improve the business’s Glassdoor rating.
Meta’s Data Infrastructure teams have been rethinking how data management systems are designed. This new convergence helps Meta and the larger community build data management systems that are unified, more efficient, and composable. An introduction to Velox: Velox is the first project in our composable data management system program.
. "Serverless computing" has enabled customers to use cloud capabilities without provisioning, deploying and managing either hardware or software resources. Snowflake has embraced serverless since our founding in 2012, with customers providing their code to load, manage and query data and us taking care of the rest.
Organist is a tool meant to do just that: provide you with an ergonomic way of managing the complexity of your development environment. Asking contributors to apt-get something just to contribute to your project is both a hindrance and a source of failure and friction in the setup process. As much as possible, it should remain local.
In 2004, I was hired by ISO-NE, a non-profit that manages the electric grid in New England. My goal was to fix the debt of hardcoded strings, but I learned a lot about the codebase and our process as I did it. Big rewrites need heavyweight support: without the backing of management, a large-scale rewrite is likely to fail.
Addressing a lack of in-house AI expertise and simplifying AI processes can make adoption easier. That’s where Snowflake comes in: Snowflake Cortex AI is a fully managed service designed to unlock the potential of the technology for everyone within an organization, regardless of their technical expertise.
In every issue, I cover topics related to Big Tech and startups through the lens of engineering managers and senior engineers. “Diagnosis: Customers may be unable to access Cloud resources in europe-west9-a. Workaround: Customers can fail over to other zones.” We apologize to all who are affected by the disruption.
In every issue, I cover topics related to Big Tech and startups through the lens of engineering managers and senior engineers. Today, full subscribers got access to comprehensive research on senior-and-above tech compensation. So far, all we have are video demos and accounts from those with access to this tool (source: Cognition).
What motivated you to write a book about how to manage Kafka in production? Can you describe your experiences with Kafka? There are many options now for persistent data queues.
The traditional ways of operations management are over; modernization and holistic approaches are now essential. For IT operations (ITOps) teams, 2025 means reassessing technology stacks, processes, and people. Success in tackling the modernization of IT operations management starts with assessing where your team is. What’s next?
In this episode Andrew Jefferson explains the complexities of building a robust system for data sharing, the techno-social considerations, and how the Bobsled platform that he is building aims to simplify the process. What are the requirements around governance and auditability of data access that need to be addressed when sharing data?
Summary Stream processing systems have long been built with a code-first design, adding SQL as a layer on top of the existing framework. RisingWave is a database engine that was created specifically for stream processing, with S3 as the storage layer. Can you describe what RisingWave is and the story behind it?
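To make the SQL-first idea concrete, here is a minimal sketch of querying a stream-native database. It assumes a locally running RisingWave instance with its default Postgres-compatible endpoint (port 4566, user root, database dev); the click_events source and clicks_per_minute view are hypothetical.

```python
# Minimal sketch: RisingWave speaks the PostgreSQL wire protocol,
# so a standard Postgres driver can define and query a continuously
# maintained materialized view.
import psycopg2

conn = psycopg2.connect(host="localhost", port=4566, user="root", dbname="dev")
conn.autocommit = True

with conn.cursor() as cur:
    # The view is updated incrementally as events arrive in the
    # (hypothetical) click_events source; no batch recomputation.
    cur.execute("""
        CREATE MATERIALIZED VIEW IF NOT EXISTS clicks_per_minute AS
        SELECT window_start, COUNT(*) AS clicks
        FROM TUMBLE(click_events, event_time, INTERVAL '1 MINUTE')
        GROUP BY window_start
    """)
    cur.execute(
        "SELECT * FROM clicks_per_minute ORDER BY window_start DESC LIMIT 5"
    )
    for row in cur.fetchall():
        print(row)
```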
Summary Building a data platform that is enjoyable and accessible for all of its end users is a substantial challenge. One of the core complexities that needs to be addressed is the fractal set of integrations that need to be managed across the individual components.
As the capabilities of these systems have improved and become more accessible, the target of what self-serve means changes.
With Snowpark’s existing DataFrame API, users have access to a robust framework for lazily evaluated, relational operations on data, closely resembling Spark’s conventions. pandas is the go-to data processing library for millions worldwide, including countless Snowflake users. Why introduce a distributed pandas API?
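As a rough illustration of the idea, the sketch below uses the Snowpark pandas API (the Modin-based plugin), so pandas-style operations are pushed down to Snowflake instead of running on locally pulled data. The connection parameters and the SALES table are placeholders.

```python
# Sketch of pandas-style code executing inside Snowflake via the
# Snowpark pandas API. Credentials and the SALES table are placeholders.
import modin.pandas as pd
import snowflake.snowpark.modin.plugin  # noqa: F401  (registers the Snowflake backend)
from snowflake.snowpark import Session

session = Session.builder.configs({
    "account": "<account>", "user": "<user>", "password": "<password>",
    "warehouse": "<warehouse>", "database": "<db>", "schema": "<schema>",
}).create()

# Familiar pandas operations; evaluation is pushed down to Snowflake.
df = pd.read_snowflake("SALES")
monthly = df.groupby("REGION")["AMOUNT"].sum()
print(monthly.head())
```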
Synq's platform helps data teams manage incidents, understand data dependencies, and ensure data quality by providing insights and automation capabilities. Petr emphasizes the need for a holistic approach to data reliability, integrating data systems into broader business processes. What do you have planned for the future of Synq?
Summary Any software system that survives long enough will require some form of migration or evolution. When that system is responsible for the data layer, the process becomes more challenging.
Embrace declarative and dynamic data pipelines with automated orchestration: Snowflake is also excited to announce Snowflake Tasks, Dynamic Tables and Database Change Management (DCM), powerful new features designed to streamline your development workflow with declarative best practices and automated orchestration.
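For a sense of what declarative means here, the sketch below issues Dynamic Table DDL from a Snowpark session: you declare the desired result and a freshness target, and Snowflake schedules the incremental refreshes. The warehouse and table names (MY_WH, RAW_ORDERS, ORDER_SUMMARY) are hypothetical.

```python
# Sketch: declaring a Dynamic Table from Snowpark. Snowflake keeps
# ORDER_SUMMARY within TARGET_LAG of RAW_ORDERS automatically.
from snowflake.snowpark import Session

session = Session.builder.configs({
    "account": "<account>", "user": "<user>", "password": "<password>",
    "warehouse": "MY_WH", "database": "<db>", "schema": "<schema>",
}).create()

session.sql("""
    CREATE OR REPLACE DYNAMIC TABLE ORDER_SUMMARY
    TARGET_LAG = '5 minutes'
    WAREHOUSE = MY_WH
    AS
    SELECT customer_id, COUNT(*) AS order_count, SUM(amount) AS total_amount
    FROM RAW_ORDERS
    GROUP BY customer_id
""").collect()
```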
In this episode he explains the data collection and preparation process, the collection of model types and sizes that work together to power the experience, and how to incorporate it into your workflow to act as a second brain. What are the features and focus of Pieces that might encourage someone to use it over the alternatives?
At the same time, organizations must ensure the right people have access to the right content, while also protecting sensitive and/or Personally Identifiable Information (PII) and fulfilling a growing list of regulatory requirements. Additional built-in UIs and privacy enhancements make it even easier to understand and manage sensitive data.
We are excited to announce the public preview of External Access, which enables customers to reach external endpoints from Snowpark seamlessly and securely. With this announcement, External Access is in public preview on Amazon Web Services (AWS) regions. This eliminates any additional cost or dependency on external orchestrators.
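A hedged sketch of the pattern: a Snowpark Python UDF that reaches an external endpoint through an External Access integration. It assumes an administrator has already run CREATE EXTERNAL ACCESS INTEGRATION; the integration name my_api_integration and the endpoint are hypothetical.

```python
# Sketch: a Snowpark UDF calling out through an External Access
# integration (here, the hypothetical "my_api_integration").
from snowflake.snowpark import Session
from snowflake.snowpark.functions import udf
from snowflake.snowpark.types import StringType

session = Session.builder.configs({
    "account": "<account>", "user": "<user>", "password": "<password>",
    "warehouse": "<warehouse>", "database": "<db>", "schema": "<schema>",
}).create()

@udf(
    name="fetch_status",
    replace=True,
    packages=["requests"],
    external_access_integrations=["my_api_integration"],
    return_type=StringType(),
    input_types=[StringType()],
    session=session,
)
def fetch_status(url: str) -> str:
    import requests  # resolved inside Snowflake from the packages list
    return str(requests.get(url, timeout=5).status_code)

# Example call: the request runs inside Snowflake, not on the client.
session.sql("SELECT fetch_status('https://example.com')").show()
```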
Who needs to be involved in the process of defining and developing that program?
“We’re not allowed to see the process beyond the Azure boundary, and in some cases it involves transferring source files by hand from the GitHub private repos into an internal GitLab repo which we’re not allowed to see. But we’ve had to evolve a homegrown process that fits both teams.”
You can now use Snowflake Notebooks to simplify the process of connecting to your data and to amplify your data engineering, analytics and machine learning workflows. Leverage your existing role-based access controls (RBAC) to manage access to notebooks and the underlying data assets to enable consistent and robust data governance.
BigGeo accelerates geospatial data processing by optimizing performance and eliminating challenges typically associated with big data. Implentio is a centralized tool that helps ecommerce ops and finance teams efficiently and cost-effectively manage fulfillment and logistics spending.
The Challenge of Compute Contention: at the heart of every real-time application is the pattern that data never stops coming in and requires continuous processing, and the queries never stop, whether they come from anomaly detectors that run 24x7 or end-user-facing analytics. Batch-oriented systems are therefore not suitable for real-time analytics.
This cohesive approach integrates Git version control, Python APIs, declarative object management and seamless CI/CD automation, and it offers powerful ways to maintain a single source of truth: with all files residing within Git, your data assets, code and configurations are centrally managed and version-controlled.
For AI, we’ve built a system to efficiently use GPT-4 for this purpose, including auto-crafting prompts and performing pre- and post-processing. Our system uses purely serverless infrastructure to process the data.
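As an illustration only (not Comprehensive.io’s actual code), the sketch below shows the general shape of such a pipeline: pre-process the input to keep the prompt small, call GPT-4, then parse and sanity-check the output. The JSON schema and function name are invented.

```python
# Illustrative pre/post-processing around a GPT-4 call; the schema
# and helper name are hypothetical, not Comprehensive.io's code.
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def extract_salary_range(posting_text: str) -> dict:
    # Pre-processing: truncate boilerplate to keep the prompt cheap.
    snippet = posting_text[:4000]
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system",
             "content": ("Extract the salary range from the job posting as JSON: "
                         '{"min": int, "max": int, "currency": str}. '
                         "Use null for missing fields. Reply with JSON only.")},
            {"role": "user", "content": snippet},
        ],
    )
    # Post-processing: parse and sanity-check the model's output.
    data = json.loads(response.choices[0].message.content)
    if data.get("min") and data.get("max") and data["min"] > data["max"]:
        data["min"], data["max"] = data["max"], data["min"]
    return data
```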
Accessing the necessary resources from cloud providers demands careful planning and wait times of up to a month due to the high demand for GPUs. And because Snowpark Container Services is designed with data-intensive processing in mind, developers can effortlessly load and process millions of rows of data.
Snowflake has invested heavily in extending the Data Cloud to AI/ML workloads, starting in 2021 with the introduction of Snowpark, the set of libraries and runtimes in Snowflake that securely deploy and process Python and other popular programming languages.
Summary A significant portion of data workflows involves storing and processing information in database engines. Validating that the information is stored and processed correctly can be complex and time-consuming, especially when the source and destination speak different dialects of SQL.