Sat. Feb 04, 2023 - Fri. Feb 10, 2023

Most Essential 2023 Interview Questions on Data Engineering

Analytics Vidhya

Introduction Data engineering is the field of study that deals with the design, construction, deployment, and maintenance of data processing systems. The goal of this domain is to collect, store, and process data efficiently so that it can be used to support business decisions and power data-driven applications. This includes designing and implementing […]

Data Types in Delta Lake + Spark. Join and Storage Performance.

Confessions of a Data Guy

Hmm … data types. We all know they are important, but we don’t take them very seriously. I mean, we know the difference between boolean, string, and integer; those are easy to get right. But we all get sloppy: sometimes we go the string and varchar route because we don’t spend enough time on the […]

Data 238

Learn Data Engineering From These GitHub Repositories

KDnuggets

Kickstart your Data Engineering career with these curated GitHub repositories.

Reflecting On The Past 6 Years Of Data Engineering

Data Engineering Podcast

Summary This podcast started almost exactly six years ago, and the technology landscape was much different than it is now. In that time there have been a number of generational shifts in how data engineering is done. In this episode I reflect on some of the major themes and take a brief look forward at some of the upcoming changes. Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management Your host is Tobias Macey and today I'm reflecting on the m

Apache Airflow® Best Practices for ETL and ELT Pipelines

Whether you’re creating complex dashboards or fine-tuning large language models, your data must be extracted, transformed, and loaded. ETL and ELT pipelines form the foundation of any data product, and Airflow is the open-source data orchestrator specifically designed for moving and transforming data in ETL and ELT pipelines. This eBook covers: An overview of ETL vs.

What are Data Access Object and Data Transfer Object in Python?

Analytics Vidhya

Introduction A design pattern is simply a repeatable solution to a recurring problem. The pattern is not actual code but a template that can be used to solve problems in different situations. Especially when working with databases, it is often considered good practice to follow a design pattern. This ensures easy […]

Ownership and Borrowing in Rust – Data Engineering Gold Mine.

Confessions of a Data Guy

As I started to use Rust on and off, more out of curiosity than anything, I discovered some specks of gold buried down in the depths. Some of the things I’m going to talk about, well … all of it, is probably fairly obvious to most Rust folk, but it’s enjoyable to learn what new […]

More Trending

The evolution of Facebook’s iOS app architecture

Engineering at Meta

Facebook for iOS (FBiOS) is the oldest mobile codebase at Meta. Since the app was rewritten in 2012, it has been worked on by thousands of engineers and shipped to billions of users, and it can support hundreds of engineers iterating on it at a time. After years of iteration, the Facebook codebase does not resemble a typical iOS codebase: It’s full of C++, Objective-C(++), and Swift.

Top 6 Amazon Redshift Interview Questions

Analytics Vidhya

Introduction Amazon Redshift is a fully managed, petabyte-scale data warehousing service from Amazon Web Services (AWS). It allows users to easily set up, operate, and scale a data warehouse in the cloud. Redshift uses columnar storage techniques to store data efficiently and supports data warehousing workloads such as business intelligence, reporting, and analytics. It allows users to perform complex queries […]

Apache Kafka Beyond the Basics: Windowing

Confluent

Learn what windowing is, the difference between the four types of windows (hopping and tumbling, or session and sliding), and how to create them.

Kafka 140

ChatGPT for Coding: Unleash the Power of ChatGPT

Edureka

We are introduced to new discoveries and technologies every day, and one of the most popular inventions today is artificial intelligence (AI) and its tools. One of them is ChatGPT, a conversational AI model and powerful chatbot that answers follow-up questions and writes code for its users. The day it was launched, everybody was going gaga over the new technology and the remarkable uses of this AI-powered chatbot.

Coding 130

Apache Airflow®: The Ultimate Guide to DAG Writing

Speaker: Tamara Fingerlin, Developer Advocate

In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!

Improving Meta’s global maps

Engineering at Meta

A lot has changed since the initial launch of our basemap in late 2020. We’re Meta now, but our mission remains the same: Giving people the power to build community and bring the world closer together. Across Meta, our family of applications (Facebook, Instagram, WhatsApp, among others) are using our basemap to connect people through functions like status updates, location sharing, and location-based searching.

Isolated Python Environments using Docker

Analytics Vidhya

Introduction When you work on multiple projects, package versions in Python can conflict; for example, one project needs a new version of a package while another requires a different version. Sometimes the Python version itself changes from project to project. Managing these different Python versions and different versions of packages is […]

Python 218

Data News — Week 23.06

Christophe Blefari

This is what the metrics store inspires me ( credits ) Dear Data News friend, every week there is a bit of randomness when this email will truly land in your mailbox—which, btw, breaks all the rules of newsletter writing. Yeah, you know, you have to get your readers used to a fixed schedule, which they can trust and bla, bla, bla. The good news is that at least with me you can trust that I have no schedule except that you should have the newsletter on Friday or Saturday.

Kafka 130

Table file formats - compaction: Apache Iceberg

Waitingforcode

Compaction is also a feature present in Apache Iceberg. However, it works a little bit differently than for Delta Lake presented last time. Why? Let's see in this new blog post!

IT 130

Optimizing The Modern Developer Experience with Coder

Many software teams have migrated their testing and production workloads to the cloud, yet development environments often remain tied to outdated local setups, limiting efficiency and growth. This is where Coder comes in. In our 101 Coder webinar, you’ll explore how cloud-based development environments can unlock new levels of productivity. Discover how to transition from local setups to a secure, cloud-powered ecosystem with ease.

SQL and Python Interview Questions for Data Analysts

KDnuggets

Walking you through the most important SQL and Python technical concepts and four interview questions to practice for the Data Analyst position.

SQL 127

How to Implement a Data Pipeline Using Amazon Web Services?

Analytics Vidhya

Introduction The demand for data to feed machine learning models, data science research, and time-sensitive insights is higher than ever, and processing that data has become correspondingly complex. To make these processes efficient, data pipelines are necessary. Data engineers specialize in building and maintaining these data pipelines that underpin the analytics ecosystem.

How We Scaled New Verticals Fulfillment Backend with CockroachDB

DoorDash Engineering

It would be almost impossible to build a scalable backend without a scalable datastore. DoorDash’s expansion from food delivery into new verticals like convenience and grocery introduced a number of new business challenges that would need to be supported by our technical stack. This business expansion not only increased the number of integrated merchants dramatically but also exponentially increased the number of menu items, as stores have much larger and more complicated inventories than typica

Regulation: Hurdle or Driver for Data Analytics in Financial Services

Teradata

In the aftermath of the 2008 financial crash, service providers have been subject to increasing rules & requirements. To what extent has this climate held back advances in data analytics?

15 Modern Use Cases for Enterprise Business Intelligence

Large enterprises face unique challenges in optimizing their Business Intelligence (BI) output due to the sheer scale and complexity of their operations. Unlike smaller organizations, where basic BI features and simple dashboards might suffice, enterprises must manage vast amounts of data from diverse sources. What are the top modern BI use cases for enterprise businesses to help you get a leg up on the competition?

KDnuggets Survey: Benchmark with your peers on industry spend and trends

KDnuggets

KDnuggets and its partners have just released a Spend & Trends survey to provide you the opportunity to benchmark with your peers on how folks are spending and the mindsets around current trends.

IT 116

Data Warehouse Interview Questions

Analytics Vidhya

Introduction Before jumping to the data warehouse interview questions, let’s first get an overview of a data warehouse. A data warehouse is a system used for collecting and managing large amounts of data from various sources, such as transactional systems, log files, and external data sources. The data is then organized and structured […]

Storybook cartography

ArcGIS

How to make your maps look like storybook illustrations. Because storybook illustrations!

Education 105

Getting started with NLP using Hugging Face transformers pipelines

databricks

Advances in Natural Language Processing (NLP) have unlocked unprecedented opportunities for businesses to get value out of their text data.

Process 105

Prepare Now: 2025's Must-Know Trends For Product And Data Leaders

Speaker: Jay Allardyce, Deepak Vittal, Terrence Sheflin, and Mahyar Ghasemali

As we look ahead to 2025, business intelligence and data analytics are set to play pivotal roles in shaping success. Organizations are already starting to face a host of transformative trends as the year comes to a close, including the integration of AI in data analytics, an increased emphasis on real-time data insights, and the growing importance of user experience in BI solutions.

Making Intelligent Document Processing Smarter: Part 1

KDnuggets

This article attempts to measure the effect of various noises present in scanned documents on the performance of various APIs in the OCR segment.

Process 108

A Beginner’s Guide to the Basics of Big Data and Hadoop

Analytics Vidhya

Introduction In this technical era, Big Data has proven revolutionary, and it is growing faster than expected. According to survey reports, around 90% of the data that exists today was generated in the past two years alone. Big data refers to vast volumes of data, measured in terabytes, petabytes, or even more. Big data […]

Hadoop 205

Deploying Data Pipelines using the Saga pattern

Picnic Engineering

Delivering the right events at low latency and with a high volume is critical to Picnic’s system architecture. In our previous blog, Dima Kalashnikov explained how we configure our Internal services pipeline in the Analytics Platform. In this post, we will explain how our team automates the creation of new data pipeline deployments. The step towards automation was an important improvement for us, as the previous setup was manual, slow, and error-prone.

Databricks Expands Brickbuilder Solutions for Migrations in EMEA

databricks

Today, we're excited to announce that Databricks has expanded Brickbuilder Solutions by collaborating with key partners in Europe, the Middle East, and Africa.

How to Drive Cost Savings, Efficiency Gains, and Sustainability Wins with MES

Speaker: Nikhil Joshi, Founder & President of Snic Solutions

Is your manufacturing operation reaching its efficiency potential? A Manufacturing Execution System (MES) could be the game-changer, helping you reduce waste, cut costs, and lower your carbon footprint. Join Nikhil Joshi, Founder & President of Snic Solutions, in this value-packed webinar as he breaks down how MES can drive operational excellence and sustainability.

5 Pandas Plotting Functions You Might Not Know

KDnuggets

Utilize these plotting functions to improve your visualization game.

Utilities 108

February DataHour: Enhance Your Skills with Expert Sessions

Analytics Vidhya

Introduction The February installment of the webinar series is now open! Your quest for the ideal data science learning platform can end here, as Analytics Vidhya has arrived. Explore your ultimate data science destination where the emphasis is on supporting the community and fostering professional development. Attend expert-led DataHour sessions to boost […]

ThoughtSpot and Databricks make governed, self-service analytics a reality with new Unity Catalog integration

ThoughtSpot

Two years ago, we announced our Databricks partnership, including the launch of ThoughtSpot for Databricks, which gives joint customers the ability to run ThoughtSpot search queries directly on the Databricks Lakehouse without the need to move any data. Since then, we’ve empowered teams at companies like Johnson & Johnson, NASDAQ, and Flyr to safely self-serve business-critical insights on governed and reliable data.

What’s New in Apache Kafka 3.4

Confluent

Migrate Kafka clusters from ZooKeeper to KRaft with no downtime (early access), get improvements for Kafka Streams and Kafka Connect, and more.

Kafka 105

The Cloud Development Environment Adoption Report

Cloud Development Environments (CDEs) are changing how software teams work by moving development to the cloud. Our Cloud Development Environment Adoption Report gathers insights from 223 developers and business leaders, uncovering key trends in CDE adoption. With 66% of large organizations already using CDEs, these platforms are quickly becoming essential to modern development practices.