March, 2024

Is the “AI developer” a threat to jobs – or a marketing stunt?

The Pragmatic Engineer

👋 Hi, this is Gergely with a bonus, free issue of the Pragmatic Engineer Newsletter. In every issue, I cover topics related to Big Tech and startups through the lens of engineering managers and senior engineers. In this article, we cover one out of three topics from last week’s subscriber-only The Pulse issue. Today, full subscribers got access to comprehensive research on senior-and-above tech compensation.

Data News — Week 24.11

Christophe Blefari

I hope this e-mail finds you well, wherever you are. I'd like to thank you for the excellent comments you sent me last week after the publication of the first version of the Recommendations. This is just the beginning! This week I've added a subscribe button on the Recommendations page so that you can opt in to the weekly recommendation email, sent every Tuesday.

Ship Smarter Not Harder With Declarative And Collaborative Data Orchestration On Dagster+

Data Engineering Podcast

Summary A core differentiator of Dagster in the ecosystem of data orchestration is their focus on software defined assets as a means of building declarative workflows. With their launch of Dagster+ as the redesigned commercial companion to the open source project they are investing in that capability with a suite of new features. In this episode Pete Hunt, CEO of Dagster Labs, outlines these new capabilities, how they reduce the burden on data teams, and the increased collaboration that they ena

A Collection Of Free Data Science Courses From Harvard, Stanford, MIT, Cornell, and Berkeley

KDnuggets

Learn everything about data science by exploring our curated collection of free courses from top universities, covering essential topics from math and programming to machine learning, and mastering the nine steps to become a job-ready data scientist.

Apache Airflow® Best Practices for ETL and ELT Pipelines

Whether you’re creating complex dashboards or fine-tuning large language models, your data must be extracted, transformed, and loaded. ETL and ELT pipelines form the foundation of any data product, and Airflow is the open-source data orchestrator specifically designed for moving and transforming data in ETL and ELT pipelines. This eBook covers: An overview of ETL vs.

The Best Piece of Software Engineering Advice

Confessions of a Data Guy

You probably think this is another internet clickbait title uh? Just trying to get you to clickty clickty and sell you some Google Ads. Two problems. I don’t have Google Ads, and I know a small percentage of people will actually listen to this advice. Whatever. There is a reason some developers struggle to move […]

Announcing DBRX: A new standard for efficient open source LLMs

databricks

Databricks’ mission is to deliver data intelligence to every enterprise by allowing organizations to understand and use their unique data to build their.

More Trending

Data News — Week 24.09

Christophe Blefari

Hello all, this is the Data News. This week's edition might be smaller than usual in terms of comments, as I'm working on a Data News related project that takes a bit of my time and will probably lead to a series of articles. Before I forget, I've appeared on The Joe Reis Show, where Joe and I chatted about teaching data engineering, why it is hard, and how generative AI will change education forever.

When And How To Conduct An AI Program

Data Engineering Podcast

Summary Artificial intelligence technologies promise to revolutionize business and produce new sources of value. In order to make those promises a reality there is a substantial amount of strategy and investment required. Colleen Tartow has worked across all stages of the data lifecycle, and in this episode she shares her hard-earned wisdom about how to conduct an AI program for your organization.

Top 6 YouTube Series for Data Science Beginners

KDnuggets

Want to start your data science journey from home, for free, and work at your own pace? Dive into this data science roadmap built around YouTube series.

Building Meta’s GenAI Infrastructure

Engineering at Meta

Marking a major investment in Meta’s AI future, we are announcing two 24k GPU clusters. We are sharing details on the hardware, network, storage, design, performance, and software that help us extract high throughput and reliability for various AI workloads. We use this cluster design for Llama 3 training. We are strongly committed to open compute and open source.

Apache Airflow®: The Ultimate Guide to DAG Writing

Speaker: Tamara Fingerlin, Developer Advocate

In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!

Introducing DBRX: A New State-of-the-Art Open LLM by Databricks

databricks

Announcing {arcgis}, an R package for ArcGIS Location Services

ArcGIS

A new R package created by the R-ArcGIS Bridge team enables integration with ArcGIS location services, enhancing their combined powers.

Introducing Tableflow

Confluent

Seamlessly integrate Apache Kafka data into your lakehouse as Apache Iceberg tables, bridging the operational and analytical divide, with Tableflow. Read more in our blog post.

Reconciling The Data In Your Databases With Datafold

Data Engineering Podcast

Summary A significant portion of data workflows involve storing and processing information in database engines. Validating that the information is stored and processed correctly can be complex and time-consuming, especially when the source and destination speak different dialects of SQL. In this episode Gleb Mezhanskiy, founder and CEO of Datafold, discusses the different error conditions and solutions that you need to know about to ensure the accuracy of your data.

Optimizing The Modern Developer Experience with Coder

Many software teams have migrated their testing and production workloads to the cloud, yet development environments often remain tied to outdated local setups, limiting efficiency and growth. This is where Coder comes in. In our 101 Coder webinar, you’ll explore how cloud-based development environments can unlock new levels of productivity. Discover how to transition from local setups to a secure, cloud-powered ecosystem with ease.

Top 8 AI Search Engine That You Should Replace With Google

KDnuggets

GenAI has enabled new search engine platforms with unique features and advantages, challenging Google's dominance.

Threads has entered the fediverse

Engineering at Meta

Threads has entered the fediverse! As part of our beta experience, now available in a few countries, Threads users aged 18+ with public profiles can now choose to share their Threads posts to other ActivityPub-compliant servers. People on those servers can now follow federated Threads profiles and see, like, reply to, and repost posts from the fediverse.

Lilac Joins Databricks to Simplify Unstructured Data Evaluation for Generative AI

databricks

Today, we are thrilled to announce that Lilac is joining Databricks. Lilac is a scalable, user-friendly tool for data scientists to search, cluster.

Schema tracking in Delta Lake

Waitingforcode

Streaming Delta tables is slightly different from streaming native streaming sources, such as Apache Kafka topics. One of the significant differences is schema enforcement: it causes the job to fail when the schema of the streamed table changes.

15 Modern Use Cases for Enterprise Business Intelligence

Large enterprises face unique challenges in optimizing their Business Intelligence (BI) output due to the sheer scale and complexity of their operations. Unlike smaller organizations, where basic BI features and simple dashboards might suffice, enterprises must manage vast amounts of data from diverse sources. What are the top modern BI use cases for enterprise businesses to help you get a leg up on the competition?

Data News — Week 24.12

Christophe Blefari

It's Friday and it's Data News. I don't go into too much detail about the magic of Data News, but every Friday is the same. At first, I'm like "oh s**t, here we go again", and 10 minutes later I'm lost in reading the content and picking too many articles to fit into a thousand-word edition. Usually the whole process takes me an entire Friday.

Version Your Data Lakehouse Like Your Software With Nessie

Data Engineering Podcast

Summary Data lakehouse architectures are gaining popularity due to the flexibility and cost effectiveness that they offer. The link that bridges the gap between data lake and warehouse capabilities is the catalog. The primary purpose of the catalog is to inform the query engine of what data exists and where, but the Nessie project aims to go beyond that simple utility.

10 GitHub Repositories to Master MLOps

KDnuggets

Begin your MLOps journey with these comprehensive free resources available on GitHub.

Making messaging interoperability with third parties safe for users in Europe

Engineering at Meta

To comply with a new EU law, the Digital Markets Act (DMA), which comes into force on March 7th, we’ve made major changes to WhatsApp and Messenger to enable interoperability with third-party messaging services. We’re sharing how we enabled third-party interoperability (interop) while maintaining end-to-end encryption (E2EE) and other privacy guarantees in our services as far as possible.

Prepare Now: 2025's Must-Know Trends For Product And Data Leaders

Speaker: Jay Allardyce, Deepak Vittal, Terrence Sheflin, and Mahyar Ghasemali

As we look ahead to 2025, business intelligence and data analytics are set to play pivotal roles in shaping success. Organizations are already starting to face a host of transformative trends as the year comes to a close, including the integration of AI in data analytics, an increased emphasis on real-time data insights, and the growing importance of user experience in BI solutions.

Delivering the Next Generation of Consumer Experiences: Databricks and Adobe Announce Strategic Partnership

databricks

By Steve Sobel, Global Industry Leader, Communications, Media & Entertainment. Today Databricks and Adobe are excited to announce a strategic partnership focused.

StreamingQueryListener, from states to questions

Waitingforcode

Apache Spark uses the observer design pattern for framework-to-code communication. One of the consumer implementations is StreamingQueryListener.

Never Put Databricks Notebooks in Production

Confessions of a Data Guy

Recently an Architect at Databricks recommended people use Notebooks for Production workloads. Very bad and horrible idea. Very expensive compute for most people (All Purpose Clusters) and it leads to horrible development practices. It set off a firestorm on LinkedIn when I commented people SHOULD NOT follow this advice.

Adding Anomaly Detection And Observability To Your dbt Projects Is Elementary

Data Engineering Podcast

Summary Working with data is a complicated process, with numerous chances for something to go wrong. Identifying and accounting for those errors is a critical piece of building trust in the organization that your data is accurate and up to date. While there are numerous products available to provide that visibility, they all have different technologies and workflows that they focus on.

How to Drive Cost Savings, Efficiency Gains, and Sustainability Wins with MES

Speaker: Nikhil Joshi, Founder & President of Snic Solutions

Is your manufacturing operation reaching its efficiency potential? A Manufacturing Execution System (MES) could be the game-changer, helping you reduce waste, cut costs, and lower your carbon footprint. Join Nikhil Joshi, Founder & President of Snic Solutions, in this value-packed webinar as he breaks down how MES can drive operational excellence and sustainability.

2024 Reading List: 5 Essential Reads on Artificial Intelligence

KDnuggets

Transform your understanding of current and future tech with these top 5 AI reads to explore the minds shaping our future.

Data News — Recommendations

Christophe Blefari

When I started writing this newsletter nearly three years ago, I never imagined that the words I write on my keyboard would take such an important place in my life. All the interactions I have with you, whether online or offline, are always amazing and give me wings. Today I want to introduce a new feature in the Data News galaxy.

Databricks invests in Mistral AI and integrates Mistral AI’s models into the Databricks Data Intelligence Platform

databricks

Sharing a belief that open source solutions will foster innovation and transparency in generative AI development, Databricks has announced a partnership and participation.

Processing time trigger, to be or not to be?

Waitingforcode

That's the question. Omitting the processing time trigger makes micro-batch triggering more reactive, but it cannot be considered the one true best practice. Let's see why.

Improving the Accuracy of Generative AI Systems: A Structured Approach

Speaker: Anindo Banerjea, CTO at Civio & Tony Karrer, CTO at Aggregage

When developing a Gen AI application, one of the most significant challenges is improving accuracy. This can be especially difficult when working with a large data corpus, and as the complexity of the task increases. The number of use cases/corner cases that the system is expected to handle essentially explodes. 💥 Anindo Banerjea is here to showcase his significant experience building AI/ML SaaS applications as he walks us through the current problems his company, Civio, is solving.