March, 2023

article thumbnail

5 Machine Learning Skills Every Machine Learning Engineer Should Know in 2023

KDnuggets

Most essential skills are programming, data preparation, statistical analysis, deep learning, and natural language processing.

article thumbnail

5 Advance Projects for Data Science Portfolio

KDnuggets

Work on data analytics, time series, natural language processing, machine learning, and ChatGPT projects to improve your chance of getting hired.

Portfolio 176
article thumbnail

AWS Lambdas. Useful for Data Engineering?

Confessions of a Data Guy

Are lambdas one of those tools that everyone uses and no one talks about? I guess I’ve taken them for granted over the years, even though they are incredibly useful. For a lot of my Data Engineering career I didn’t really think about or use AWS lambdas, I just saw them as little annoying flies […] The post AWS Lambdas. Useful for Data Engineering?

AWS 130
article thumbnail

Amazon doubling down on return to office

The Pragmatic Engineer

Comments

291
291
article thumbnail

Apache Airflow® Best Practices for ETL and ELT Pipelines

Whether you’re creating complex dashboards or fine-tuning large language models, your data must be extracted, transformed, and loaded. ETL and ELT pipelines form the foundation of any data product, and Airflow is the open-source data orchestrator specifically designed for moving and transforming data in ETL and ELT pipelines. This eBook covers: An overview of ETL vs.

article thumbnail

How to get started with dbt

Christophe Blefari

This article is meant to be a resource hub in order to understand dbt basics and to help get started your dbt journey. When I write dbt, I often mean dbt Core. dbt Core is an open-source framework that helps you organise data warehouse SQL transformation. dbt Core has been developed by dbt Labs, which was previously named Fishtown Analytics. The company has been founded in May 2016. dbt Labs also develop dbt Cloud which is a cloud product that hosts and runs dbt Core projects.

article thumbnail

Advanced NumPy: Broadcasting and Strides

Analytics Vidhya

Introduction NumPy is an open-source library in python and a must-learn if you want to enter the data science ecosystem. It is the library underpinning other important libraries such as Pandas, matplotlib, Scipy, scikit-learn, etc. One of the reasons this library is so foundational is because of its array of programming capabilities. Array programming, or […] The post Advanced NumPy: Broadcasting and Strides appeared first on Analytics Vidhya.

Python 269

More Trending

article thumbnail

A Complete Collection of Data Science Free Courses – Part 2

KDnuggets

The second part covers the list of Machine Learning, Deep Learning, Computer Vision, Natural Language Processing, Data Engineering, and MLOps.

article thumbnail

Exploring The Nuances Of Building An Intential Data Culture

Data Engineering Podcast

Summary The ecosystem for data professionals has matured to the point that there are a large and growing number of distinct roles. With the scope and importance of data steadily increasing it is important for organizations to ensure that everyone is aligned and operating in a positive environment. To help facilitate the nascent conversation about what constitutes an effective and productive data culture, the team at Data Council have dedicated an entire conference track to the subject.

Building 147
article thumbnail

Big Tech job-switching stats

The Pragmatic Engineer

👋 Hi, this is Gergely with a bonus, free issue of the Pragmatic Engineer Newsletter. We cover one out of five topics from The Scoop #39 , published two weeks ago, 23 February. To get full newsletters twice a week, subscribe here. I have collaborated with a tech recruiter - they’ve asked to be anonymous - who’s been running some very interesting queries on LinkedIn for software engineers.

article thumbnail

Announcing FawltyDeps - a dependency checker for your Python code

Tweag

It is a truth universally acknowledged that the Python packaging ecosystem is in need of a good dependency checker. In the least, it’s our hope to convince you that Tweag’s new dependency checker, FawltyDeps, can help you maintain an environment that is minimal and reproducible for your Python project, by ensuring that required dependencies are explicitly declared and detecting unused dependencies.

Python 144
article thumbnail

Apache Airflow®: The Ultimate Guide to DAG Writing

Speaker: Tamara Fingerlin, Developer Advocate

In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!

article thumbnail

Top 6 Amazon S3 Interview Questions

Analytics Vidhya

Introduction S3 is Amazon Web Services cloud-based object storage service (AWS). It stores and retrieves large amounts of data, including photos, movies, documents, and other files, in a durable, accessible, and scalable manner. S3 provides a simple web interface for uploading and downloading data and a powerful set of APIs for developers to integrate S3.

article thumbnail

Introducing Compute-Compute Separation for Real-Time Analytics

Rockset

Every database built for real-time analytics has a fundamental limitation. When you deconstruct the core database architecture, deep in the heart of it you will find a single component that is performing two distinct competing functions: real-time data ingestion and query serving. These two parts running on the same compute unit is what makes the database real-time: queries can reflect the effect of the new data that was just ingested.

article thumbnail

A Complete Collection of Data Science Free Courses – Part 1

KDnuggets

The first part covers the list of Programming, Web scraping, Statistics & Probability, Data Analytics, SQL, and Business Intelligence free courses.

article thumbnail

Failure Mitigation for Microservices: An Intro to Aperture

DoorDash Engineering

When dealing with failures in a microservice system, localized mitigation mechanisms like load shedding and circuit breakers have always been used, but they may not be as effective as a more globalized approach. These localized mechanisms ( as demonstrated in a systematic study on the subject published at SoCC 2022 ) are useful in preventing individual services from being overloaded, but they are not very effective in dealing with complex failures that involve interactions between services, whic

Java 135
article thumbnail

Optimizing The Modern Developer Experience with Coder

Many software teams have migrated their testing and production workloads to the cloud, yet development environments often remain tied to outdated local setups, limiting efficiency and growth. This is where Coder comes in. In our 101 Coder webinar, you’ll explore how cloud-based development environments can unlock new levels of productivity. Discover how to transition from local setups to a secure, cloud-powered ecosystem with ease.

article thumbnail

The Collapse of Silicon Valley Bank

The Pragmatic Engineer

It’s been a wild weekend, starting Friday. In case you somehow missed it: we went through the fastest bank run in history, in an event that impacted about half of all VC-funded startups in the US and UK. On Friday night, Silicon Valley Bank (SVB) was shut down by regulators, triggering a weekend of fear and uncertainty for many people and businesses with questions like: “can we make payroll next week?

Banking 192
article thumbnail

Announcing Topiary

Tweag

Topiary aims to be a universal formatter engine within the Tree-sitter ecosystem. Named after the art of clipping or trimming trees into fantastic shapes, it is designed for formatter authors and formatter users: Authors can create a formatter for a language without having to write their own formatting engine, or even their own parser. Users benefit from uniform, comparable code style, across multiple languages, with the convenience of a single formatter tool.

Coding 139
article thumbnail

Top 6 Cassandra Interview Questions

Analytics Vidhya

Introduction Apache Cassandra is a NoSQL database management system that is open-source and distributed. It is meant to handle massive volumes of data across many commodity servers while maintaining high availability with no single point of failure. Facebook created Cassandra, which ultimately became an Apache Software Foundation project. It is well-known for its rapid write […] The post Top 6 Cassandra Interview Questions appeared first on Analytics Vidhya.

NoSQL 252
article thumbnail

SimulatedRides: How Lyft uses load testing to ensure reliable service during peak events

Lyft Engineering

Authors: Remco van Bree , Ben Radler Contributors : Alex Ilyenko , Ben Radler , Francisco Souza , Garrett Heel , Nathan Hsieh , Remco van Bree , Shu Zheng , Alex Hartwell , Brian Witt “Load testing in production is great.” We know what you’re thinking — testing in production is one of the cardinal sins of software development. However, at Lyft we have come to realize that load testing in production is a powerful tool to prepare systems for unexpected bursty traffic and peak events.

Coding 133
article thumbnail

15 Modern Use Cases for Enterprise Business Intelligence

Large enterprises face unique challenges in optimizing their Business Intelligence (BI) output due to the sheer scale and complexity of their operations. Unlike smaller organizations, where basic BI features and simple dashboards might suffice, enterprises must manage vast amounts of data from diverse sources. What are the top modern BI use cases for enterprise businesses to help you get a leg up on the competition?

article thumbnail

GPT-4: Everything You Need To Know

KDnuggets

A new model by OpenAI with improved natural language generation and understanding capabilities.

Process 160
article thumbnail

Uniting the Machine Learning and Data Streaming Ecosystems - Part 1

Confluent

The future of data is real time and enriched by machine learning. How can we overcome socio-technical blockers and unite the ML and data streaming markets?

article thumbnail

Why did Google close its coding competitions after 20 years?

The Pragmatic Engineer

👋 Hi, this is Gergely with a bonus, free issue of the Pragmatic Engineer Newsletter. We cover one out of five topics in yesterday's subscriber-only The Scoop issue. To get full newsletters twice a week, subscribe here. On 22 February 2023, Google announced its coding competitions are coming to an end: The visual that accompanied the announcement of the end of Google’s coding competitions.

Coding 176
article thumbnail

Using CockroachDB to Reduce Feature Store Costs by 75%

DoorDash Engineering

While building a feature store to handle the massive growth of our machine-learning (“ML”) platform, we learned that using a mix of different databases can yield significant gains in efficiency and operational simplicity. We saw that using Redis for our online machine-learning storage was not efficient from a maintenance and cost perspective.

article thumbnail

Prepare Now: 2025s Must-Know Trends For Product And Data Leaders

Speaker: Jay Allardyce, Deepak Vittal, Terrence Sheflin, and Mahyar Ghasemali

As we look ahead to 2025, business intelligence and data analytics are set to play pivotal roles in shaping success. Organizations are already starting to face a host of transformative trends as the year comes to a close, including the integration of AI in data analytics, an increased emphasis on real-time data insights, and the growing importance of user experience in BI solutions.

article thumbnail

Top 6 Microsoft HDFS Interview Questions

Analytics Vidhya

Introduction Microsoft Azure HDInsight(or Microsoft HDFS) is a cloud-based Hadoop Distributed File System version. A distributed file system runs on commodity hardware and manages massive data collections. It is a fully managed cloud-based environment for analyzing and processing enormous volumes of data. HDInsight works seamlessly with the Hadoop ecosystem, which includes technologies like MapReduce, Hive, […] The post Top 6 Microsoft HDFS Interview Questions appeared first on Analytics V

Hadoop 246
article thumbnail

Data News — Week 23.13

Christophe Blefari

This newsletter is about money ( credits ) Dear readers, already 3 months done in 2023. We are slowly approaching the 2-years anniversary of the blog and the newsletter. We are almost 3000 and once again I want to thank you for the trust. To be honest time flies and I’d have preferred to do more for the blog in the start of the year but my freelancing activities and my laziness took me so much.

Bytes 130
article thumbnail

Top Machine Learning Papers to Read in 2023

KDnuggets

These curated papers would step up your machine-learning knowledge.

article thumbnail

Table file formats - Z-Order compaction: Delta Lake

Waitingforcode

In my recent exploration of the compaction, aka OPTIMIZE command, in Delta Lake, I found this famous Z-Ordering mode. It was one of the most outstanding features when I first heard about Delta Lake. You can't even imagine how impatient I was to see what it is doing under-the-hood. Fortunately, this time has come!

IT 130
article thumbnail

How to Drive Cost Savings, Efficiency Gains, and Sustainability Wins with MES

Speaker: Nikhil Joshi, Founder & President of Snic Solutions

Is your manufacturing operation reaching its efficiency potential? A Manufacturing Execution System (MES) could be the game-changer, helping you reduce waste, cut costs, and lower your carbon footprint. Join Nikhil Joshi, Founder & President of Snic Solutions, in this value-packed webinar as he breaks down how MES can drive operational excellence and sustainability.

article thumbnail

Polars vs Spark. Real Talk.

Confessions of a Data Guy

Real talk. Polars is all the rage. People love Spark. People use Spark for small data, but data is too big for Pandas. Spark runs on a local machine. Polars runs on a local machine. What do I choose, Spark or Polars? Does it matter? I’ve written about Polars at different points, here, and here […] The post Polars vs Spark. Real Talk. appeared first on Confessions of a Data Guy.

IT 130
article thumbnail

Unlocking The Potential Of Streaming Data Applications Without The Operational Headache At Grainite

Data Engineering Podcast

Summary The promise of streaming data is that it allows you to react to new information as it happens, rather than introducing latency by batching records together. The peril is that building a robust and scalable streaming architecture is always more complicated and error-prone than you think it's going to be. After experiencing this unfortunate reality for themselves, Abhishek Chauhan and Ashish Kumar founded Grainite so that you don't have to suffer the same pain.

MySQL 130
article thumbnail

Complete Guide to Pub/Sub in Redis

Analytics Vidhya

Introduction Publish and Subscribe is a messaging mechanism having one or a set of senders sending messages and one or a group of receivers receiving these messages. These senders are called Publishers, responsible for publishing these messages, and the receivers are called Subscribers who subscribe to these Publishers to receive their notifications.

article thumbnail

Data News — Week 23.12

Christophe Blefari

The Earth can also generate great images ( credits ) Dear readers, I hope this new edition finds you well. It seems that you really liked the recent editions, which is perfect because it was fun to write. I feel that this week all the articles I found relevant for the newsletter are either AI related or technical. I really don't know how to deal with news overflow about the Gen AI landscape.

article thumbnail

Improving the Accuracy of Generative AI Systems: A Structured Approach

Speaker: Anindo Banerjea, CTO at Civio & Tony Karrer, CTO at Aggregage

When developing a Gen AI application, one of the most significant challenges is improving accuracy. This can be especially difficult when working with a large data corpus, and as the complexity of the task increases. The number of use cases/corner cases that the system is expected to handle essentially explodes. 💥 Anindo Banerjea is here to showcase his significant experience building AI/ML SaaS applications as he walks us through the current problems his company, Civio, is solving.