December, 2023

How Much Data Do We Need? Balancing Machine Learning with Security Considerations

Towards Data Science

For a data scientist, there’s no such thing as too much data. But when we take a broader look at the organizational context, we have to balance our goals with other considerations. Data Science vs Security/IT: A Battle for the Ages. Acquiring and keeping data is the focus of a huge amount of our mental energy as data scientists.

25 Free Courses to Master Data Science, Data Engineering, Machine Learning, MLOps, and Generative AI

KDnuggets

Discover a collection of top courses to launch your dream career or master a new skill, all for free!

Streaming in Data Engineering

Towards Data Science

Streaming data pipelines and real-time analytics.

A Tech Conference Listed Fake Speakers for Years: I Accidentally Noticed

The Pragmatic Engineer

For 3 years straight, the DevTernity conference listed non-existent Coinbase employees as featured speakers. When were they added and what could the motivation have been? Three featured speakers listed at DevTernity 2021, 2022 and 2023, and JDKon 2024. These people do not exist. A year ago, I spent months doing an investigative report on how UK events tech company Pollen had its staff work for free, as it had run out of money but still kept operating.

Apache Airflow® Best Practices for ETL and ELT Pipelines

Whether you’re creating complex dashboards or fine-tuning large language models, your data must be extracted, transformed, and loaded. ETL and ELT pipelines form the foundation of any data product, and Airflow is the open-source data orchestrator specifically designed for moving and transforming data in ETL and ELT pipelines. This eBook covers: An overview of ETL vs.

Troubleshooting Kafka In Production

Data Engineering Podcast

Summary: Kafka has become a ubiquitous technology, offering a simple method for coordinating events and data across different systems. Operating it at scale, however, is notoriously challenging. Elad Eldor has experienced these challenges first-hand, leading to his work writing the book "Kafka: Troubleshooting in Production". In this episode he highlights the sources of complexity that contribute to Kafka's operational difficulties, and some of the main ways to identify and mitigate

Unlocking the Power of Containers: Exploring the Top 20 Docker Containers for Every Development Need

Analytics Vidhya

Introduction: Docker containers have emerged as indispensable tools in the fast-evolving landscape of software development and deployment, providing a lightweight and efficient way to package, distribute, and run applications. This article delves into the top 20 Docker containers across various categories, showcasing their features, use cases, and contributions to streamlining development workflows.

More Trending

25 Free Books to Master SQL, Python, Data Science, Machine Learning, and Natural Language Processing

KDnuggets

Discover a collection of best books to start your data career or master a new skill, all for free!

Databricks Named a Leader in 2023 Gartner® Magic Quadrant™ for Cloud Database Management Systems

databricks

We are excited to announce that Gartner has recognized Databricks as a Leader for a third consecutive year in the 2023 Gartner® Magic.

Mentoring software engineers or engineering leaders

The Pragmatic Engineer

I get asked every now and then if I offer 1:1 mentoring for either software engineers or engineering managers or leaders. While I used to do this in the past, I don't offer this any more. I collected much of the advice I have to offer for software engineers in The Software Engineer's Guidebook. I also write The Pragmatic Engineer Newsletter where I do cover topics like what it means to be a senior engineer at various companies, how to deal with a low-quality engineering culture, and

Building end-to-end security for Messenger

Engineering at Meta

We are beginning to upgrade people’s personal conversations on Messenger to use end-to-end encryption (E2EE) by default. Meta is publishing two technical white papers on end-to-end encryption: Our Messenger end-to-end encryption whitepaper describes the core cryptographic protocol for transmitting messages between clients. The Labyrinth encrypted storage protocol whitepaper explains our protocol for end-to-end encrypting stored messaging history between devices on a user’s account.

Apache Airflow®: The Ultimate Guide to DAG Writing

Speaker: Tamara Fingerlin, Developer Advocate

In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!

Make this 3D printed globe please

ArcGIS

It's that time of year to warm ourselves beside the electric hum of a plastic filament printer and fall into the joy of making.

Designing Data Platforms For Fintech Companies

Data Engineering Podcast

Summary: Working with financial data requires a high degree of rigor due to the numerous regulations and the risks involved in security breaches. In this episode Andrey Korchack, CTO of fintech startup Monite, discusses the complexities of designing and implementing a data platform in that sector. Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. Data lakes are notoriously complex.

10 GitHub Repositories to Master Machine Learning

KDnuggets

The blog covers machine learning courses, bootcamps, books, tools, interview questions, cheat sheets, MLOps platforms, and more to master ML and secure your dream job.

Creating High Quality RAG Applications with Databricks

databricks

Retrieval-Augmented-Generation (RAG) has quickly emerged as a powerful way to incorporate proprietary, real-time data into Large Language Model (LLM) applications. Today we are.

Optimizing The Modern Developer Experience with Coder

Many software teams have migrated their testing and production workloads to the cloud, yet development environments often remain tied to outdated local setups, limiting efficiency and growth. This is where Coder comes in. In our 101 Coder webinar, you’ll explore how cloud-based development environments can unlock new levels of productivity. Discover how to transition from local setups to a secure, cloud-powered ecosystem with ease.

The Pragmatic Engineer Newsletter in 2023

The Pragmatic Engineer

2023 was the second full year of The Pragmatic Engineer Newsletter, and this newsletter is now almost two and a half years old; the first issue came out on 26 August 2021. Thank you for being a reader, I greatly value your support. This year, 102 newsletter issues were published, and this is number 103. You received a deepdive issue on Tuesdays, and every Thursday it was “The Pulse” – formerly The Scoop.

How Meta built the infrastructure for Threads

Engineering at Meta

On July 5, 2023, Meta launched Threads, the newest product in our family of apps, to an unprecedented success that saw it garner over 100 million sign ups in its first five days. A small, nimble team of engineers built Threads over the course of only five months of technical work. While the app’s production launch had been under consideration for some time, the business finally made the decision and informed the infrastructure teams to prepare for its launch with only two days’ advance notice.

Join Enhancements in ArcGIS Pro 3.2

ArcGIS

ArcGIS Pro 3.2 includes a number of enhancements to the Spatial Join, Add Spatial Join, Add Join, and Join Field tools.

Streamhouse, the next house to move into?

Waitingforcode

I must admit it, if you want to catch my attention, you can use some keywords. One of them is "stream". Knowing that, the topic of my new blog post shouldn't surprise you.

15 Modern Use Cases for Enterprise Business Intelligence

Large enterprises face unique challenges in optimizing their Business Intelligence (BI) output due to the sheer scale and complexity of their operations. Unlike smaller organizations, where basic BI features and simple dashboards might suffice, enterprises must manage vast amounts of data from diverse sources. What are the top modern BI use cases for enterprise businesses to help you get a leg up on the competition?

Building Predictive Models: Logistic Regression in Python

KDnuggets

When you are getting started with machine learning, logistic regression is one of the first algorithms you’ll add to your toolbox.

Improve your RAG application response quality with real-time structured data

databricks

Retrieval Augmented Generation (RAG) is an efficient mechanism to provide relevant data as context in Gen AI applications. Most RAG applications typically use.

Adding An Easy Mode For The Modern Data Stack With 5X

Data Engineering Podcast

Summary: The "modern data stack" promised a scalable, composable data platform that gave everyone the flexibility to use the best tools for every job. The reality was that it left data teams in the position of spending all of their engineering effort on integrating systems that weren't designed with compatible user experiences. The team at 5X understand the pain involved and the barriers to productivity and set out to solve it by pre-integrating the best tools from each layer of the s

Uplevel your dbt workflow with these tools and techniques

Start Data Engineering

1. Introduction
2. Setup
3. Ways to uplevel your dbt workflow
3.1. Reproducible environment
3.1.1. A virtual environment with Poetry
3.1.2. Use Docker to run your warehouse locally
3.2. Reduce feedback loop time when developing locally
3.2.1. Run only required dbt objects with selectors
3.2.2. Use prod datasets to build dev models with defer
3.2.3. Parallelize model building by increasing thread count
3.

Prepare Now: 2025's Must-Know Trends For Product And Data Leaders

Speaker: Jay Allardyce, Deepak Vittal, Terrence Sheflin, and Mahyar Ghasemali

As we look ahead to 2025, business intelligence and data analytics are set to play pivotal roles in shaping success. Organizations are already starting to face a host of transformative trends as the year comes to a close, including the integration of AI in data analytics, an increased emphasis on real-time data insights, and the growing importance of user experience in BI solutions.

Make this AI-inspired topo landscape please

ArcGIS

Here's how to fake an isometric 3D topo terrain in 2D! And stuff.

Order is king for the performance

Waitingforcode

Even though data processing frameworks and data stores nowadays have smart query planners, they don't take away our responsibility to design the job logic correctly.

The KDnuggets 2023 Cheat Sheet Collection

KDnuggets

KDnuggets has brought together all of its in-house cheat sheets from 2023 in this single, convenient location. Have a look to make sure you didn't miss out on anything over the year.

Even Santa Claus has AI fever

databricks

As CEO of the North Pole, Santa Claus oversees one of the world’s most complicated supply chain, manufacturing and logistics operations. Every year, S.

How to Drive Cost Savings, Efficiency Gains, and Sustainability Wins with MES

Speaker: Nikhil Joshi, Founder & President of Snic Solutions

Is your manufacturing operation reaching its efficiency potential? A Manufacturing Execution System (MES) could be the game-changer, helping you reduce waste, cut costs, and lower your carbon footprint. Join Nikhil Joshi, Founder & President of Snic Solutions, in this value-packed webinar as he breaks down how MES can drive operational excellence and sustainability.

My Vim-Verse: The Backbone of My Workflow

Simon Späti

In my journey, detailed in why Vim is more than an editor, I’ve discovered the profound impact of integrating Vim and its motions into my entire computer workflow. This evolution, from using familiar tools like Notepad++ and SQL Server Management Studio to embracing Vim, represents a significant shift in how I approach tasks in data engineering and writing.

Run Your Own Anomaly Detection For Your Critical Business Metrics With Anomstack

Data Engineering Podcast

Summary: If your business metrics looked weird tomorrow, would you know about it first? Anomaly detection is focused on identifying those outliers for you, so that you are the first to know when a business-critical dashboard isn't right. Unfortunately, it can often be complex or expensive to incorporate anomaly detection into your data platform. Andrew Maguire got tired of solving that problem for each of the different roles he has ended up in, so he created the open source Anomstack project.

Big improvements for field management in Geoprocessing in ArcGIS Pro 3.2

ArcGIS

In ArcGIS Pro 3.2, the field map parameter has been redesigned for improved usability and new capabilities.

Data+AI Summit 2023, retrospective part 2

Waitingforcode

One week later than initially announced, but here it is, the second part of the Data+AI Summit 2023 retrospective. I don't know how, but I managed to include some streaming-related talks here too!

Improving the Accuracy of Generative AI Systems: A Structured Approach

Speaker: Anindo Banerjea, CTO at Civio & Tony Karrer, CTO at Aggregage

When developing a Gen AI application, one of the most significant challenges is improving accuracy. This can be especially difficult when working with a large data corpus, and as the complexity of the task increases. The number of use cases/corner cases that the system is expected to handle essentially explodes. 💥 Anindo Banerjea is here to showcase his significant experience building AI/ML SaaS applications as he walks us through the current problems his company, Civio, is solving.