December, 2023

How Much Data Do We Need? Balancing Machine Learning with Security Considerations

Towards Data Science

For a data scientist, there’s no such thing as too much data. But when we take a broader look at the organizational context, we have to balance our goals with other considerations. Data Science vs Security/IT: A Battle for the Ages. Acquiring and keeping data is the focus of a huge amount of our mental energy as data scientists.

25 Free Courses to Master Data Science, Data Engineering, Machine Learning, MLOps, and Generative AI

KDnuggets

Discover a collection of top courses to launch your dream career or master a new skill, all for free!

Streaming in Data Engineering

Towards Data Science

Streaming data pipelines and real-time analytics.

A Tech Conference Listed Fake Speakers for Years: I Accidentally Noticed

The Pragmatic Engineer

For 3 years straight, the DevTernity conference listed non-existent Coinbase employees as featured speakers. When were they added and what could the motivation have been? Three featured speakers listed at DevTernity 2021, 2022 and 2023, and JDKon 2024. These people do not exist. A year ago, I spent months doing an investigative report on how UK events tech company Pollen had its staff work for free, as it had run out of money but still kept operating.

Apache Airflow® Best Practices for ETL and ELT Pipelines

Whether you’re creating complex dashboards or fine-tuning large language models, your data must be extracted, transformed, and loaded. ETL and ELT pipelines form the foundation of any data product, and Airflow is the open-source data orchestrator specifically designed for moving and transforming data in ETL and ELT pipelines. This eBook covers: An overview of ETL vs.

Troubleshooting Kafka In Production

Data Engineering Podcast

Summary: Kafka has become a ubiquitous technology, offering a simple method for coordinating events and data across different systems. Operating it at scale, however, is notoriously challenging. Elad Eldor has experienced these challenges first-hand, leading to his work writing the book "Kafka: Troubleshooting in Production". In this episode he highlights the sources of complexity that contribute to Kafka's operational difficulties, and some of the main ways to identify and mitigate


Unlocking the Power of Containers: Exploring the Top 20 Docker Containers for Every Development Need

Analytics Vidhya

Introduction: Docker containers have emerged as indispensable tools in the fast-evolving landscape of software development and deployment, providing a lightweight and efficient way to package, distribute, and run applications. This article delves into the top 20 Docker containers across various categories, showcasing their features, use cases, and contributions to streamlining development workflows.


More Trending

Free MIT Course: TinyML and Efficient Deep Learning Computing

KDnuggets

Curious about optimizing AI for everyday devices? Dive into the complete overview of MIT's TinyML and Efficient Deep Learning Computing course. Explore strategies to make AI smarter on small devices. Read the full article for an in-depth look!

Creating High Quality RAG Applications with Databricks

databricks

Retrieval-Augmented-Generation (RAG) has quickly emerged as a powerful way to incorporate proprietary, real-time data into Large Language Model (LLM) applications. Today we are.


Mentoring software engineers or engineering leaders

The Pragmatic Engineer

I get asked every now and then if I offer 1:1 mentoring for either software engineers or engineering managers or leaders. While I used to do this in the past, I don't offer this anymore. I collected much of the advice I have to offer for software engineers in The Software Engineer's Guidebook. I also write The Pragmatic Engineer Newsletter where I do cover topics like what it means to be a senior engineer at various companies, how to deal with a low-quality engineering culture, and

Making Flink Serverless, With Queries for Less Than a Penny

Confluent

Dive into the serverless architecture of Confluent Cloud for Apache Flink and explore its benefits like reduced infrastructure costs, increased reliability, & seamless adoption.


Apache Airflow®: The Ultimate Guide to DAG Writing

Speaker: Tamara Fingerlin, Developer Advocate

In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!

Practical Magic: Improving Productivity and Happiness for Software Development Teams

LinkedIn Engineering

Co-authors: Max Kanat-Alexander and Grant Jenks. Today we are open-sourcing the LinkedIn Developer Productivity & Happiness Framework (DPH Framework), a collection of documents that describe the systems, processes, metrics, and feedback systems we use to understand our developers and their needs internally at LinkedIn. Now more than ever, developers are navigating so much change and new opportunity in this new era of Generative AI, so ensuring teams have the systems, processes, metrics and f

How Meta built the infrastructure for Threads

Engineering at Meta

On July 5, 2023, Meta launched Threads, the newest product in our family of apps, to an unprecedented success that saw it garner over 100 million sign ups in its first five days. A small, nimble team of engineers built Threads over the course of only five months of technical work. While the app’s production launch had been under consideration for some time, the business finally made the decision and informed the infrastructure teams to prepare for its launch with only two days’ advance notice.

10 GitHub Repositories to Master Machine Learning

KDnuggets

The blog covers machine learning courses, bootcamps, books, tools, interview questions, cheat sheets, MLOps platforms, and more to master ML and secure your dream job.

Introducing Databricks Vector Search Public Preview

databricks

Following the announcement we made yesterday around Retrieval Augmented Generation (RAG), today, we’re excited to announce the public preview of Databricks Vector Search. W.

Optimizing The Modern Developer Experience with Coder

Many software teams have migrated their testing and production workloads to the cloud, yet development environments often remain tied to outdated local setups, limiting efficiency and growth. This is where Coder comes in. In our 101 Coder webinar, you’ll explore how cloud-based development environments can unlock new levels of productivity. Discover how to transition from local setups to a secure, cloud-powered ecosystem with ease.

The Pragmatic Engineer Newsletter in 2023

The Pragmatic Engineer

2023 was the second full year of The Pragmatic Engineer Newsletter, and this newsletter is now almost two and a half years old; the first issue came out on 26 August 2021. Thank you for being a reader; I greatly value your support. This year, 102 newsletter issues were published, and this is number 103. You received a deepdive issue on Tuesdays, and every Thursday it was “The Pulse” (formerly The Scoop).

Make this 3D printed globe please

ArcGIS

It's that time of year to warm ourselves beside the electric hum of a plastic filament printer and fall into the joy of making.


Unlock the New Wave of Gen AI With Snowpark Container Services GPU-Powered Compute

Snowflake

The rise of generative AI (gen AI) is inspiring organizations to envision a future in which AI is integrated into all aspects of their operations for a more human, personalized and efficient customer experience. However, getting the required compute infrastructure into place, particularly GPUs for large language models (LLMs), is a real challenge. Accessing the necessary resources from cloud providers demands careful planning and up to month-long wait times due to the high demand for GPUs.


Top 6 Episodes of The Data Chief Podcast: 2023

ThoughtSpot

2023 has been a year of breakthrough innovation for many, and a deer-in-headlights moment for others. I keep flashing back to the 90s when the Internet created new businesses and destroyed others—LLMs are doing the same, only with more velocity. From CDAOs to VCs alike, the rate of creative destruction is faster, but there is also an intense focus on value.

15 Modern Use Cases for Enterprise Business Intelligence

Large enterprises face unique challenges in optimizing their Business Intelligence (BI) output due to the sheer scale and complexity of their operations. Unlike smaller organizations, where basic BI features and simple dashboards might suffice, enterprises must manage vast amounts of data from diverse sources. What are the top modern BI use cases for enterprise businesses to help you get a leg up on the competition?

The Best Data Science Resources, Bootcamp, and Courses to Learn Data Science in the New Year

KDnuggets

We've partnered with Springboard, the leading data science bootcamp offering personalized 1:1 mentorship, dedicated career support, proven outcomes, and an unbeatable money-back job guarantee, to present a handpicked collection of resources to supercharge your data science journey in the coming year.

Improve your RAG application response quality with real-time structured data

databricks

Retrieval Augmented Generation (RAG) is an efficient mechanism to provide relevant data as context in Gen AI applications. Most RAG applications typically use.

Just Arrived: New Symbols on the Robinhood 24 Hour Market

Robinhood

Robinhood is the only US retail brokerage to offer 24/5 trading of single name stocks. At Robinhood, we know the world never stops – and believe investing shouldn’t be any different. Since launching in May, we’ve seen customers utilize the unprecedented flexibility and access to the markets with the Robinhood 24 Hour Market. And we’re just getting started – we’re proud to announce that we’ve expanded the total number of symbols available from 95 to 226.


Join Enhancements in ArcGIS Pro 3.2

ArcGIS

ArcGIS Pro 3.2 includes a number of enhancements to the Spatial Join, Add Spatial Join, Add Join, and Join Field tools.


Prepare Now: 2025's Must-Know Trends For Product And Data Leaders

Speaker: Jay Allardyce, Deepak Vittal, and Terrence Sheflin

As we look ahead to 2025, business intelligence and data analytics are set to play pivotal roles in shaping success. Organizations are already starting to face a host of transformative trends as the year comes to a close, including the integration of AI in data analytics, an increased emphasis on real-time data insights, and the growing importance of user experience in BI solutions.

Snowflake Announces Agreement to Acquire Samooha to Simplify Building Interoperable Data Clean Rooms in the Data Cloud

Snowflake

When businesses share sensitive first-party data with outside partners or customers, they must do so in a way that meets strict governance requirements around security and privacy. Data clean rooms have emerged as the technology to meet this need, enabling interoperability where multiple parties can collaborate on and analyze sensitive data in a governed way without exposing direct access to the underlying data and business logic.


Real-Time Field Service Optimization

Confluent

Telcos use Confluent with event-driven microservices to enable real-time communications with 3rd-party field service providers, fulfilling customer service requests more efficiently.


The KDnuggets 2023 Cheat Sheet Collection

KDnuggets

KDnuggets has brought together all of its in-house cheat sheets from 2023 in this single, convenient location. Have a look to make sure you didn't miss out on anything over the year.


Integrating NVIDIA TensorRT-LLM with the Databricks Inference Stack

databricks

Over the past six months, we've been working with NVIDIA to get the most out of their new TensorRT-LLM library. TensorRT-LLM provides an easy-to-use Python interface to integrate with a web server for fast, efficient inference performance with LLMs. In this post, we're highlighting some key areas where our collaboration with NVIDIA has been particularly important.


How to Drive Cost Savings, Efficiency Gains, and Sustainability Wins with MES

Speaker: Nikhil Joshi, Founder & President of Snic Solutions

Is your manufacturing operation reaching its efficiency potential? A Manufacturing Execution System (MES) could be the game-changer, helping you reduce waste, cut costs, and lower your carbon footprint. Join Nikhil Joshi, Founder & President of Snic Solutions, in this value-packed webinar as he breaks down how MES can drive operational excellence and sustainability.

AI debugging at Meta with HawkEye

Engineering at Meta

HawkEye is the powerful toolkit used internally at Meta for monitoring, observability, and debuggability of the end-to-end machine learning (ML) workflow that powers ML-based products. HawkEye supports recommendation and ranking models across several products at Meta. Over the past two years, it has facilitated order of magnitude improvements in the time spent debugging production issues.

Building Trust in Public Sector AI Starts with Trusting Your Data

Cloudera

Recent Government Initiatives on Public Sector AI Solutions: In recent years, governments across the globe have recognized the transformative potential of artificial intelligence (AI) and have embarked on initiatives to harness this technology to drive innovation and serve their citizens more effectively. These government-led efforts have had a profound impact on the development and adoption of AI solutions in the public sector, paving the way for a future where data-driven decision-making and au


Top Trends in Agile You Can’t Miss in 2024

Knowledge Hut

Technology is evolving at breakneck speed, and the information we consume every day continues to grow exponentially. Analyzing this mountain of data to make well-informed decisions has become ever more challenging. Traditional models of project management, like the waterfall method and hierarchical team structures, are too rigid to respond to the fast-paced change organizations are facing today.

Our First Netflix Data Engineering Summit

Netflix Tech

Holden Karau, Elizabeth Stone, Pedro Duarte, Chris Stephens, Pallavi Phadnis, Lee Woodridge, Mark Cho, Guil Pires, Sujay Jain, Tristan Reid, Senthilnathan Athinarayanan, Bharath Mummadisetty, Abhinaya Shetty, Judit Lantos, Amanuel Kahsay, Dao Mi, Mick Dreeling, Chris Colburn, and Agata Gryzbek. Introduction: Earlier this summer, Netflix held our first-ever Data Engineering Forum.

Improving the Accuracy of Generative AI Systems: A Structured Approach

Speaker: Anindo Banerjea, CTO at Civio & Tony Karrer, CTO at Aggregage

When developing a Gen AI application, one of the most significant challenges is improving accuracy. This can be especially difficult when working with a large data corpus, and as the complexity of the task increases. The number of use cases/corner cases that the system is expected to handle essentially explodes. 💥 Anindo Banerjea is here to showcase his significant experience building AI/ML SaaS applications as he walks us through the current problems his company, Civio, is solving.