March, 2024

article thumbnail

Data News — Week 24.11

Christophe Blefari

Mountains I hope this e-mail finds you well, wherever you are. I'd like to thank you for the excellent comments you sent me last week after the publication of the first version of the Recommendations. This is just the beginning! This week I've added a subscribe button in the Recommendations page in order for you to opt-in for the weekly recommendation email—every Tuesday.

Metadata 272
article thumbnail

Is the “AI developer”a threat to jobs – or a marketing stunt?

The Pragmatic Engineer

👋 Hi, this is Gergely with a bonus, free issue of the Pragmatic Engineer Newsletter. In every issue, I cover topics related to Big Tech and startups through the lens of engineering managers and senior engineers. In this article, we cover one out of three topics from last week’s subscriber-only The Pulse issue. Today, full subscribers got access to a comprehensive Senior-and-above tech compensation research.

Insiders

Sign Up for our Newsletter

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

article thumbnail

Ship Smarter Not Harder With Declarative And Collaborative Data Orchestration On Dagster+

Data Engineering Podcast

Summary A core differentiator of Dagster in the ecosystem of data orchestration is their focus on software defined assets as a means of building declarative workflows. With their launch of Dagster+ as the redesigned commercial companion to the open source project they are investing in that capability with a suite of new features. In this episode Pete Hunt, CEO of Dagster labs, outlines these new capabilities, how they reduce the burden on data teams, and the increased collaboration that they ena

Data Lake 162
article thumbnail

The Best Piece of Software Engineering Advice

Confessions of a Data Guy

You probably think this is another internet clickbait title uh? Just trying to get you to clickty clickty and sell you some Google Ads. Two problems. I don’t have Google Ads, and I know a small percentage of people will actually listen to this advice. Whatever. There is a reason some developers struggle to move […] The post The Best Piece of Software Engineering Advice appeared first on Confessions of a Data Guy.

article thumbnail

15 Modern Use Cases for Enterprise Business Intelligence

Large enterprises face unique challenges in optimizing their Business Intelligence (BI) output due to the sheer scale and complexity of their operations. Unlike smaller organizations, where basic BI features and simple dashboards might suffice, enterprises must manage vast amounts of data from diverse sources. What are the top modern BI use cases for enterprise businesses to help you get a leg up on the competition?

article thumbnail

A Collection Of Free Data Science Courses From Harvard, Stanford, MIT, Cornell, and Berkeley

KDnuggets

Learn everything about data science by exploring our curated collection of free courses from top universities, covering essential topics from math and programming to machine learning, and mastering the nine steps to become a job-ready data scientist.

article thumbnail

Building Meta’s GenAI Infrastructure

Engineering at Meta

Marking a major investment in Meta’s AI future, we are announcing two 24k GPU clusters. We are sharing details on the hardware, network, storage, design, performance, and software that help us extract high throughput and reliability for various AI workloads. We use this cluster design for Llama 3 training. We are strongly committed to open compute and open source.

Building 145

More Trending

article thumbnail

The “10x engineer:" 50 years ago and now

The Pragmatic Engineer

👋 Hi, this is Gergely with a bonus, free issue of the Pragmatic Engineer Newsletter. In every issue, I cover topics related to Big Tech and startups through the lens of engineering managers and senior engineers. In this article, we cover one out of five topics from today’s subscriber-only article What Changed in 50 Years of Computing.

article thumbnail

When And How To Conduct An AI Program

Data Engineering Podcast

Summary Artificial intelligence technologies promise to revolutionize business and produce new sources of value. In order to make those promises a reality there is a substantial amount of strategy and investment required. Colleen Tartow has worked across all stages of the data lifecycle, and in this episode she shares her hard-earned wisdom about how to conduct an AI program for your organization.

article thumbnail

Announcing DBRX: A new standard for efficient open source LLMs

databricks

Databricks’ mission is to deliver data intelligence to every enterprise by allowing organizations to understand and use their unique data to build their.

Building 145
article thumbnail

Introducing Tableflow

Confluent

Seamlessly integrate Apache Kafka data into your lakehouse as Apache Iceberg tables, bridging the operational and analytical divide, with Tableflow. Read more in our blog post.

Kafka 133
article thumbnail

Prepare Now: 2025s Must-Know Trends For Product And Data Leaders

Speaker: Jay Allardyce, Deepak Vittal, and Terrence Sheflin

As we look ahead to 2025, business intelligence and data analytics are set to play pivotal roles in shaping success. Organizations are already starting to face a host of transformative trends as the year comes to a close, including the integration of AI in data analytics, an increased emphasis on real-time data insights, and the growing importance of user experience in BI solutions.

article thumbnail

Making messaging interoperability with third parties safe for users in Europe

Engineering at Meta

To comply with a new EU law, the Digital Markets Act (DMA), which comes into force on March 7th, we’ve made major changes to WhatsApp and Messenger to enable interoperability with third-party messaging services. We’re sharing how we enabled third-party interoperability (interop) while maintaining end-to-end encryption (E2EE) and other privacy guarantees in our services as far as possible.

Media 131
article thumbnail

Extending destination-passing style programming to arbitrary data types in Linear Haskell

Tweag

Three years ago, a blog post introduced destination-passing style (DPS) programming in Haskell, focusing on array processing, for which the API was made safe thanks to Linear Haskell. Today, I’ll present a slightly different API to manipulate arbitrary data types in a DPS fashion, and show why it can be useful for some parts of your programs. The present blog post is mostly based on my recent paper Destination-passing style programming: a Haskell implementation , published at JFLA 2024.

article thumbnail

Statistics for Machine Learning: What you need to know to become a certified expert

KDnuggets

Ready to become a SAS Certified Specialist in Statistics for Machine Learning? Here’s everything you need to know about the recently released certification from SAS.

article thumbnail

Reconciling The Data In Your Databases With Datafold

Data Engineering Podcast

Summary A significant portion of data workflows involve storing and processing information in database engines. Validating that the information is stored and processed correctly can be complex and time-consuming, especially when the source and destination speak different dialects of SQL. In this episode Gleb Mezhanskiy, founder and CEO of Datafold, discusses the different error conditions and solutions that you need to know about to ensure the accuracy of your data.

Database 147
article thumbnail

How to Drive Cost Savings, Efficiency Gains, and Sustainability Wins with MES

Speaker: Nikhil Joshi, Founder & President of Snic Solutions

Is your manufacturing operation reaching its efficiency potential? A Manufacturing Execution System (MES) could be the game-changer, helping you reduce waste, cut costs, and lower your carbon footprint. Join Nikhil Joshi, Founder & President of Snic Solutions, in this value-packed webinar as he breaks down how MES can drive operational excellence and sustainability.

article thumbnail

Delivering the Next Generation of Consumer Experiences: Databricks and Adobe Announce Strategic Partnership

databricks

By Steve Sobel - Global Industry Leader; Communications, Media & Entertainment Today Databricks and Adobe are excited to announce a strategic partnership focused.

article thumbnail

Snowflake Ventures Invests in Landing AI, Boosting Visual AI in the Data Cloud

Snowflake

As Large Language Models are revolutionizing natural language prompts, Large Vision Models (LVMs) represent another new, exciting frontier for AI. An estimated 90% of the world’s data is unstructured, much of it in the form of visual content such as images and videos. Insights from analyzing this visual data can open up powerful new use cases that significantly boost productivity and efficiency, but enterprises need sophisticated computer vision technologies to achieve this.

Cloud 126
article thumbnail

Threads has entered the fediverse

Engineering at Meta

Threads has entered the fediverse! As part of our beta experience, now available in a few countries, Threads users aged 18+ with public profiles can now choose to share their Threads posts to other ActivityPub-compliant servers. People on those servers can now follow federated Threads profiles and see, like, reply to, and repost posts from the fediverse.

Media 128
article thumbnail

Announcing {arcgis}, an R package for ArcGIS Location Services

ArcGIS

A new R package created by the R-ArcGIS Bridge team enables integration with ArcGIS location services, enhancing their combined powers.

144
144
article thumbnail

Improving the Accuracy of Generative AI Systems: A Structured Approach

Speaker: Anindo Banerjea, CTO at Civio & Tony Karrer, CTO at Aggregage

When developing a Gen AI application, one of the most significant challenges is improving accuracy. This can be especially difficult when working with a large data corpus, and as the complexity of the task increases. The number of use cases/corner cases that the system is expected to handle essentially explodes. 💥 Anindo Banerjea is here to showcase his significant experience building AI/ML SaaS applications as he walks us through the current problems his company, Civio, is solving.

article thumbnail

Top 6 YouTube Series for Data Science Beginners

KDnuggets

Want to start your data science journey from home, for free, and work at your own pace? Have a dive into this data science roadmap using the YouTube series.

article thumbnail

Version Your Data Lakehouse Like Your Software With Nessie

Data Engineering Podcast

Summary Data lakehouse architectures are gaining popularity due to the flexibility and cost effectiveness that they offer. The link that bridges the gap between data lake and warehouse capabilities is the catalog. The primary purpose of the catalog is to inform the query engine of what data exists and where, but the Nessie project aims to go beyond that simple utility.

Data Lake 147
article thumbnail

Databricks invests in Mistral AI and integrates Mistral AI’s models into the Databricks Data Intelligence Platform

databricks

Sharing a belief that open source solutions will foster innovation and transparency in generative AI development, Databricks has announced a partnership and participation.

Data 137
article thumbnail

Data Trends 2024: Strategies for an AI-Ready Data Foundation

Snowflake

A company’s data strategy is always in motion. Since the explosion of interest in generative AI and large language models (LLMs), that is more true than ever, with business leaders discussing how quickly they should adopt these technologies to stay competitive. Some emerging approaches may be seen in our newly released Snowflake Data Trends 2024 , looking at how users in the Data Cloud are working with their data.

article thumbnail

The Ultimate Guide To Data-Driven Construction: Optimize Projects, Reduce Risks, & Boost Innovation

Speaker: Donna Laquidara-Carr, PhD, LEED AP, Industry Insights Research Director at Dodge Construction Network

In today’s construction market, owners, construction managers, and contractors must navigate increasing challenges, from cost management to project delays. Fortunately, digital tools now offer valuable insights to help mitigate these risks. However, the sheer volume of tools and the complexity of leveraging their data effectively can be daunting. That’s where data-driven construction comes in.

article thumbnail

Robinhood 24 Hour Market Reaches $10B+ in Total Volume Traded Overnight

Robinhood

On our busiest days, as much as 25% of the total daily trading volume has come from outside of traditional market hours Last year, Robinhood became the first US retail brokerage to offer 24/5 trading of single name stocks when we launched the Robinhood 24 Hour Market. The news cycle, world events, and market moving events like earnings often happen outside of US East Coast business hours.

Retail 119
article thumbnail

Cloudera’s RHEL-volution: Powering the Cloud with Red Hat

Cloudera

As enterprise AI technologies rapidly reshape our digital environment, the foundation of your cloud infrastructure is more critical than ever. That’s why Cloudera and Red Hat , renowned for their open-source solutions, have teamed up to bring Red Hat Enterprise Linux ( RHEL ) to Cloudera on public cloud as the operating system for all of our public cloud platform images.

Cloud 114
article thumbnail

Best Free Resources to Learn Data Analysis and Data Science

KDnuggets

This article introduces six top-notch, free data science resources ideal for aspiring data analysts, data scientists, or anyone aiming to enhance their analytical skills.

article thumbnail

Adding Anomaly Detection And Observability To Your dbt Projects Is Elementary

Data Engineering Podcast

Summary Working with data is a complicated process, with numerous chances for something to go wrong. Identifying and accounting for those errors is a critical piece of building trust in the organization that your data is accurate and up to date. While there are numerous products available to provide that visibility, they all have different technologies and workflows that they focus on.

Project 130
article thumbnail

Driving Responsible Innovation: How to Navigate AI Governance & Data Privacy

Speaker: Aindra Misra, Senior Manager, Product Management (Data, ML, and Cloud Infrastructure) at BILL

Join us for an insightful webinar that explores the critical intersection of data privacy and AI governance. In today’s rapidly evolving tech landscape, building robust governance frameworks is essential to fostering innovation while staying compliant with regulations. Our expert speaker, Aindra Misra, will guide you through best practices for ensuring data protection while leveraging AI capabilities.

article thumbnail

Lilac Joins Databricks to Simplify Unstructured Data Evaluation for Generative AI

databricks

Today, we are thrilled to announce that Lilac is joining Databricks. Lilac is a scalable, user-friendly tool for data scientists to search, cluster.

article thumbnail

Snowflake Invests in Observe to Expand Observability in the Data Cloud

Snowflake

As organizations seek to drive more value from their data, observability plays a vital role in ensuring the performance, security and reliability of applications and pipelines while helping to reduce costs. At Snowflake, we aim to provide developers and engineers with the best possible observability experience to monitor and manage their Snowflake environment.

Cloud 119
article thumbnail

Moderating Inappropriate Video Content at Yelp

Yelp Engineering

One of Yelp’s top priorities is the trust and safety of our users. Yelp’s platform is most well-known for its reviews, and its moderation practices have been recognised in academic research for mitigating misinformation and building consumer trust. In addition to reviews, Yelp’s Trust and Safety team takes significant measures when it comes to protecting its users from inappropriate material posted through other content types.

Building 115
article thumbnail

Best Practices for Confluent Schema Registry

Confluent

Learn the best practices for using Confluent Schema Registry, including using schema IDs, understanding subjects and versions, using data contracts, pre-registering schemas, and more.

Data 111
article thumbnail

What Is Entity Resolution? How It Works & Why It Matters

Entity Resolution Sometimes referred to as data matching or fuzzy matching, entity resolution, is critical for data quality, analytics, graph visualization and AI. Learn what entity resolution is, why it matters, how it works and its benefits. Advanced entity resolution using AI is crucial because it efficiently and easily solves many of today’s data quality and analytics problems.