Sat.Sep 23, 2023 - Fri.Sep 29, 2023

Why are Cloud Development Environments Spiking in Popularity, Now?

The Pragmatic Engineer

👋 Hi, this is Gergely with a bonus, free issue of the Pragmatic Engineer Newsletter. In every issue, I cover topics related to Big Tech and startups through the lens of engineering managers and senior engineers. In this article, we cover a fresh industry trend: Cloud Development Environments. Full subscribers received this analysis three weeks ago.

Cloud 253

DuckDB + Delta Lake (the new lake house?)

Confessions of a Data Guy

I always leave it to my dear readers and followers to give me pokes in the right direction. Nothing like the teeming masses to set you straight. Recently I was working on my Substack Newsletter, on the topic of Polars + Delta Lake, reading remove files from s3 … I left a question open on […] The post DuckDB + Delta Lake (the new lake house?

Data 147

Upgrade your Modern Data Stack

Christophe Blefari

Make your data stack take-off ( credits ) Hello, another edition of Data News. This week, we're going to take a step back and look at the current state of data platforms. What are the current trends, and why are people fighting over the concept of the modern data stack? Early September is usually conference season. All over the world, people gather in huge venues to attend conferences.

Big Data 147

Arbitrary stateful processing in PySpark with applyInPandasWithState

Waitingforcode

It's always a huge pleasure to see the PySpark API covering more and more Scala API features. Starting from Apache Spark 3.4.0 you can even write arbitrary stateful processing jobs! But since the API is a little bit different than the one available on the Scala side, I wanted to take a deeper look.

Process 147

15 Modern Use Cases for Enterprise Business Intelligence

Large enterprises face unique challenges in optimizing their Business Intelligence (BI) output due to the sheer scale and complexity of their operations. Unlike smaller organizations, where basic BI features and simple dashboards might suffice, enterprises must manage vast amounts of data from diverse sources. What are the top modern BI use cases for enterprise businesses to help you get a leg up on the competition?

Working at a Startup vs in Big Tech

The Pragmatic Engineer

👋 Hi, this is Gergely with a bonus, free issue of the Pragmatic Engineer Newsletter. We cover one out of four topics in today’s subscriber-only The Pulse issue. To get full newsletters twice a week, subscribe here. Willem Spruijt is a software engineer with whom I worked on the same team at Uber in Amsterdam, building payments systems.

Powering Vector Search With Real Time And Incremental Vector Indexes

Data Engineering Podcast

Summary: The rapid growth of machine learning, especially large language models, has led to a commensurate growth in the need to store and compare vectors. In this episode Louis Brandy discusses the applications for vector search capabilities both in and outside of AI, as well as the challenges of maintaining real-time indexes of vector data. Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. Introducing RudderStack Profiles.

SQL 147

More Trending

Data News — Week 23.38 (late)

Christophe Blefari

Early like my run ( credits ) Hey. This is a super late Data News, I wanted to send it earlier but I was travelling then enjoying time with friends and family. I'm still struggling a bit to write as fast as I would like, but 🤷‍♂️ So, sorry for the late edition and enjoy. Gen AI 🤖 Announcing Microsoft Copilot — Having everything under a common brand is great and Copilot is a great name.

Data 130

Getting Started with PyTorch in 5 Steps

KDnuggets

This tutorial provides an in-depth introduction to machine learning using PyTorch and its high-level wrapper, PyTorch Lightning. The article covers essential steps from installation to advanced topics, offering a hands-on approach to building and training neural networks, and emphasizing the benefits of using Lightning.

Deploy Private LLMs using Databricks Model Serving

databricks

We are excited to announce public preview of GPU and LLM optimization support for Databricks Model Serving! With this launch, you can deploy.

Airflow TaskGroup: All you need to know!

Marc Lamberti

An Airflow TaskGroup helps make a complex DAG easier to organize and read. Airflow TaskGroups are meant to replace SubDAGs, the historical way of grouping your tasks. Indeed, SubDAGs are too complicated for merely grouping tasks. They bring a lot of complexity as you must create a DAG in a DAG, import the SubDagOperator (which is a sensor), define the parameters correctly, and so on.

Coding 130

Prepare Now: 2025's Must-Know Trends For Product And Data Leaders

Speaker: Jay Allardyce, Deepak Vittal, and Terrence Sheflin

As we look ahead to 2025, business intelligence and data analytics are set to play pivotal roles in shaping success. Organizations are already starting to face a host of transformative trends as the year comes to a close, including the integration of AI in data analytics, an increased emphasis on real-time data insights, and the growing importance of user experience in BI solutions.

Top 7 Free Cloud Notebooks for Data Science

KDnuggets

Cloud notebooks are game-changers for data science, providing free access to computing, pre-built environments, collaboration features, and third-party integrations - everything you need to enhance your workflow.

Lessons from debugging a tricky direct memory leak

Pinterest Engineering

Sanchay Javeria | Software Engineer, Ads Data Infrastructure To support metrics reporting for ads from external advertisers and real-time ad budget calculations at Pinterest, we run streaming pipelines using Apache Flink. These jobs have guaranteed an overall 99th percentile availability to our users; however, every once in a while some tasks get hit with nasty direct out-of-memory (OOM) errors on multiple operators that look something like this: As is the case with most failures in a distribute

Utilities 109

Old School: Adapting Esri Basemaps for Printed Products

ArcGIS

Esri basemaps are designed to be used at multiple scales, but a static map needs everything in one view. How do we get around that?

Designing 121

How to Drive Cost Savings, Efficiency Gains, and Sustainability Wins with MES

Speaker: Nikhil Joshi, Founder & President of Snic Solutions

Is your manufacturing operation reaching its efficiency potential? A Manufacturing Execution System (MES) could be the game-changer, helping you reduce waste, cut costs, and lower your carbon footprint. Join Nikhil Joshi, Founder & President of Snic Solutions, in this value-packed webinar as he breaks down how MES can drive operational excellence and sustainability.

Fueling Data-Driven Decision-Making with Data Validation and Enrichment Processes

Precisely

77% of data and analytics professionals say data-driven decision-making is the top goal for their data programs. Data-driven decision-making and initiatives are certainly in demand, but their success hinges on … well, the data that supports them. More specifically, the quality and integrity of that data. It seems obvious enough, but checking that your data is up to the task and taking any necessary steps to improve and maintain its quality can be easier said than done.

A Comparative Overview of the Top 10 Open Source Data Science Tools in 2023

KDnuggets

Are you looking for the open source tools to help you in your data science journey? Look no further. Discover these game-changers that will elevate your data-driven decisions.

Training Foundation Improvements for Closeup Recommendation Ranker

Pinterest Engineering

Fan Jiang | Software Engineer, Closeup Candidate Retrieval; Liyao Lu | Software Engineer, Closeup Ranking & Blending; Laksh Bhasin | Software Engineer, Core ML Foundations; Chen Yang | Software Engineer, Core ML Foundations; Shivin Thukral | Software Engineer, Closeup Ranking & Blending; Travis Ebesu | Software Engineer, Closeup Ranking & Blending; Kent Jiang | Software Engineer, Core Serving Infra; Yan Sun | Engineering Manager, Closeup Ranking & Blending; Huizhong Duan | Engine

Announcing the Public Preview of Lakeview Dashboards!

databricks

We are excited to announce the public preview of the next generation of Databricks SQL dashboards, dubbed Lakeview dashboards. Available today, this new.

SQL 111

Improving the Accuracy of Generative AI Systems: A Structured Approach

Speaker: Anindo Banerjea, CTO at Civio & Tony Karrer, CTO at Aggregage

When developing a Gen AI application, one of the most significant challenges is improving accuracy. This can be especially difficult when working with a large data corpus, and as the complexity of the task increases, the number of use cases/corner cases that the system is expected to handle essentially explodes. 💥 Anindo Banerjea is here to showcase his significant experience building AI/ML SaaS applications as he walks us through the current problems his company, Civio, is solving.

IBM Technology Chooses Cloudera as its Preferred Partner for Addressing Real Time Data Movement Using Kafka

Cloudera

Organizations increasingly rely on streaming data sources not only to bring data into the enterprise but also to perform streaming analytics that accelerate the process of being able to get value from the data early in its lifecycle. As lakehouse architectures (including offerings from Cloudera and IBM) become the norm for data processing and building AI applications, a robust streaming service becomes a critical building block for modern data architectures.

Kafka 94

5 Free Books to Help You Master Python

KDnuggets

From the basics of Python to clean architecture and more, here are five free books to level up your Python skills.

Python 148

How Snowflake Native Apps Deliver Security for App Builders and Consumers

Snowflake

The Snowflake Native App Framework, which leverages Snowflake’s advanced architecture, allows for a new level of security for applications. This security spans not just the application consumer, but also the application providers. Controlling all software and infrastructure in the Snowflake Data Cloud, Snowflake can secure the application code to protect the intellectual property (IP) of builders.

Python 95

easyJet bets on Databricks Lakehouse and Generative AI to be an Innovation Leader in Aviation

databricks

This blog is authored by Ben Dias, Director of Data Science and Analytics and Ioannis Mesionis, Lead Data Scientist at easyJet Introduction to.

The Ultimate Guide To Data-Driven Construction: Optimize Projects, Reduce Risks, & Boost Innovation

Speaker: Donna Laquidara-Carr, PhD, LEED AP, Industry Insights Research Director at Dodge Construction Network

In today’s construction market, owners, construction managers, and contractors must navigate increasing challenges, from cost management to project delays. Fortunately, digital tools now offer valuable insights to help mitigate these risks. However, the sheer volume of tools and the complexity of leveraging their data effectively can be daunting. That’s where data-driven construction comes in.

Data Access API over Data Lake Tables Without the Complexity

Towards Data Science

Build a robust GraphQL API service on top of your S3 data lake files with DuckDB and Go. Data lake tables are mostly utilized by data engineering teams using big data compute engines, such as Spark or Flink, as well as by data analysts and scientists creating models and reports with heavy SQL query engines, such as Trino or Redshift.

The Data Maturity Pyramid: From Reporting to a Proactive Intelligent Data Platform

KDnuggets

This article describes the data maturity pyramid and its various levels, from simple reporting to AI-ready data platforms. It emphasizes the importance of data for business and illustrates how data platforms serve as the driving force behind AI.

Data 113

Molex Improves Data Sharing, Visibility, and Performance with the Snowflake Manufacturing Data Cloud

Snowflake

The unprecedented amount of volatility in supply and demand has caused economic uncertainty throughout the world. To navigate today’s challenging economy, manufacturers must digitize their supply chain and manufacturing processes. Digital advancements such as smart manufacturing and automation through AI, machine learning (ML), robotics, and IoT require a connected value chain ecosystem with a secure, scalable, and flexible data platform.

Working with Esri Vector Basemaps in ArcGIS Pro

ArcGIS

Esri Vector Basemaps are available for use in ArcGIS Pro, and that opens up some new possibilities for you.

Designing 116

Business Intelligence 101: How To Make The Best Solution Decision For Your Organization

Speaker: Evelyn Chou

Choosing the right business intelligence (BI) platform can feel like navigating a maze of features, promises, and technical jargon. With so many options available, how can you ensure you’re making the right decision for your organization’s unique needs? 🤔 This webinar brings together expert insights to break down the complexities of BI solution vetting.

Strengthening Your Data Ecosystem with Unrivaled Security

Cloudera

As data ecosystems evolve, security becomes a paramount concern, especially within the realm of private cloud environments. Cloudera on Private Cloud with the Private Cloud Base (CDP PvC Base) stands as a beacon of innovation in the realm of data security, offering a holistic suite of features that work in concert to safeguard sensitive information. With the latest 7.1.9 release, the journey towards a more secure data ecosystem continues, one where businesses can unlock the full potential of th

Building a Convolutional Neural Network with PyTorch

KDnuggets

This blog post provides a tutorial on constructing a convolutional neural network for image classification in PyTorch, leveraging convolutional and pooling layers for feature extraction as well as fully-connected layers for prediction.

Building 112

Marketing Success in the Age of AI: Celebrating EMEA’s Modern Marketing Data Stack Pioneers

Snowflake

Data is an invaluable asset in today’s marketing ecosystem. With its unique blend of cultures, economies, and regulatory environments, the EMEA market offers a nuanced picture of how marketers harness data technologies to understand their audiences, calibrate campaigns in real time, and adhere to complex government and industry regulations. In our second annual Snowflake Modern Marketing Data Stack 2023 report, we delve into actual usage and adoption of marketing technologies within the Snowfla

How to Stream JSON Data Using Server-Sent Events and FastAPI in Python over HTTP?

Workfall

In this blog, we will cover what Server-Sent Events are, why to stream data using Server-Sent Events (SSE), what FastAPI is, and a hands-on example. Server-Sent Events (SSE) is a simple and efficient technology for sending real-time updates from the server to the web browser over a single HTTP connection.

Python 85
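
For the Server-Sent Events item above, the wire format can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the linked article's code: the helper names `format_sse` and `sse_stream` are hypothetical, and the sketch assumes each JSON payload is sent as a single `data:` line terminated by a blank line, as the SSE format requires.

```python
import json
from typing import Any, Iterable, Iterator, Optional


def format_sse(data: Any, event: Optional[str] = None) -> str:
    """Serialize one payload as a single Server-Sent Events message.

    Hypothetical helper for illustration only.
    """
    lines = []
    if event is not None:
        # Optional event name that browser clients can listen for.
        lines.append(f"event: {event}")
    # json.dumps escapes newlines, so the payload fits on one "data:" line.
    lines.append(f"data: {json.dumps(data)}")
    # A blank line marks the end of the message.
    return "\n".join(lines) + "\n\n"


def sse_stream(items: Iterable[Any]) -> Iterator[str]:
    """Yield one SSE-formatted message per item (hypothetical helper)."""
    for item in items:
        yield format_sse(item)


if __name__ == "__main__":
    for message in sse_stream([{"n": 1}, {"n": 2}]):
        print(message, end="")
```

In a FastAPI application, a generator like this could plausibly be returned through `StreamingResponse` with `media_type="text/event-stream"`. Whether the linked article takes that approach is not stated in the excerpt.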

Driving Responsible Innovation: How to Navigate AI Governance & Data Privacy

Speaker: Aindra Misra, Senior Manager, Product Management (Data, ML, and Cloud Infrastructure) at BILL

Join us for an insightful webinar that explores the critical intersection of data privacy and AI governance. In today’s rapidly evolving tech landscape, building robust governance frameworks is essential to fostering innovation while staying compliant with regulations. Our expert speaker, Aindra Misra, will guide you through best practices for ensuring data protection while leveraging AI capabilities.