Lucas’ story is shared by lots of beginner Scala developers, which is why I wanted to post it here on the blog. I’ve watched thousands of developers learn Scala from scratch, and, like Lucas, they love it! If you want to learn Scala well and fast, take a look at my Scala Essentials course at Rock the JVM.
Snowflake’s Snowpark is a game-changing feature that enables data engineers and analysts to write scalable data transformation workflows directly within Snowflake using Python, Java, or Scala. The post Building ETL Pipeline with Snowpark appeared first on Cloudyard.
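As a quick illustration (a minimal sketch, not code from the post; connection properties and table names are placeholders), a Snowpark transformation in Scala might look like this:

import com.snowflake.snowpark.{Session, SaveMode}
import com.snowflake.snowpark.functions._

// Placeholder connection properties; fill in with real account details.
val session = Session.builder
  .configs(Map(
    "URL"       -> "https://<account>.snowflakecomputing.com",
    "USER"      -> "<user>",
    "PASSWORD"  -> "<password>",
    "WAREHOUSE" -> "COMPUTE_WH",
    "DB"        -> "DEMO_DB",
    "SCHEMA"    -> "PUBLIC"
  ))
  .create

// Filter and aggregate entirely inside Snowflake; nothing leaves the warehouse.
val transformed = session.table("RAW_ORDERS")
  .filter(col("AMOUNT") > lit(100))
  .groupBy(col("REGION"))
  .agg(sum(col("AMOUNT")).as("TOTAL_AMOUNT"))

transformed.write.mode(SaveMode.Overwrite).saveAsTable("ORDERS_BY_REGION")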
Context and Motivation: dbt (Data Build Tool) is a popular open-source framework that organizes SQL transformations in a modular, version-controlled, and testable way. Databricks is a platform that unifies data engineering and data science pipelines, typically with Spark (PySpark, Scala) or Spark SQL.
Since this tutorial builds on the previous article, to follow along we’ll need to clone that GitHub repo, where we’ll make the necessary updates to build this new version. Next, we’ll create a user.scala file under src/main/scala/rockthejvm/websockets/domain.
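A hypothetical sketch of what that file might contain (the tutorial’s actual fields and validation rules may differ; the Either.cond-style smart constructor is an assumption):

package rockthejvm.websockets.domain

object user {
  // Private constructor forces creation through the validating factory.
  final case class User private (name: String)

  object User {
    def fromString(value: String): Either[String, User] =
      Either.cond(
        value.trim.nonEmpty,          // assumed validation rule
        User(value.trim),
        s"Invalid user name: '$value'"
      )
  }
}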
If you want to master the Typelevel Scala libraries (including Http4s) with real-life practice, check out the Typelevel Rite of Passage course, a full-stack project-based course. HOTP Scala implementation: HOTP generation is quite tedious, therefore for simplicity we will use a Java library, otp-java by Bastiaan Jansen.
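To see why the raw algorithm is tedious, here is a sketch of HOTP per RFC 4226 in plain Scala (the RFC’s HMAC-SHA1 with dynamic truncation; the article itself delegates to otp-java, configured with SHA256):

import javax.crypto.Mac
import javax.crypto.spec.SecretKeySpec
import java.nio.ByteBuffer

// RFC 4226: HMAC over the 8-byte counter, then dynamic truncation.
def hotp(secret: Array[Byte], counter: Long, digits: Int = 6): String = {
  val mac = Mac.getInstance("HmacSHA1")
  mac.init(new SecretKeySpec(secret, "HmacSHA1"))
  val hash = mac.doFinal(ByteBuffer.allocate(8).putLong(counter).array())
  // Low nibble of the last byte picks the 4-byte window to extract.
  val offset = hash(hash.length - 1) & 0x0f
  val binCode =
    ((hash(offset) & 0x7f) << 24) |
      ((hash(offset + 1) & 0xff) << 16) |
      ((hash(offset + 2) & 0xff) << 8) |
      (hash(offset + 3) & 0xff)
  val otp = binCode % math.pow(10, digits).toInt
  s"%0${digits}d".format(otp) // zero-pad to the requested length
}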
Building reliable data pipelines is a complex and costly undertaking with many layered requirements. In order to reduce the amount of time and effort required to build pipelines that power critical insights, Manish Jethani co-founded Hevo Data. RudderStack helps you build a customer data platform on your warehouse or data lake.
The term Scala originated from “scalable language,” and it means that Scala grows with you. In recent times, Scala has attracted developers because it has enabled them to deliver things faster with less code. Developers are now much more interested in Scala training to excel in the big data field.
Setting Up: Let’s create a new Scala 3 project and add the following to the build.sbt file: val scala3Version = "3.3.1". Building on our if statement, we import java.net.StandardSocketOptions and java.net.InetSocketAddress and check the datagramChannel. Let’s also see how we can search for network interfaces in Scala.
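A minimal sketch (not the article’s exact code) of listing network interfaces with the side effect suspended in cats-effect IO:

import cats.effect.{IO, IOApp}
import java.net.NetworkInterface
import scala.jdk.CollectionConverters._

object ListInterfaces extends IOApp.Simple {
  // getNetworkInterfaces is a side effect, so we wrap it in IO.
  val run: IO[Unit] =
    IO(NetworkInterface.getNetworkInterfaces.asScala.toList)
      .flatMap(ifaces => IO.println(ifaces.map(_.getName).mkString(", ")))
}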
Python is used extensively among data engineers and data scientists to solve all sorts of problems, from ETL/ELT pipelines to building machine learning models. Building this user-defined JSON format is the preferred method since it can be used with other operations as well.
However, this ability to remotely run client applications written in any supported language (Scala, Python) appeared only in Spark 3.4. Therefore, we need to know at build time whether each client application will run via Spark Connect or not.
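For reference, a sketch of opening a Spark Connect session, assuming the Spark Connect Scala client (spark-connect-client-jvm, Spark 3.4+); the endpoint is a placeholder:

import org.apache.spark.sql.SparkSession

// "sc://" tells the client to talk to a remote Spark Connect server.
val spark = SparkSession.builder()
  .remote("sc://spark-connect-host:15002")
  .getOrCreate()

spark.range(5).show() // executed remotely on the Spark Connect server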
In this episode he shares his approach to testing complex systems, the common challenges that are faced by engineers who build them, and why it is important to understand their limitations. What are the pros and cons of using Clojure for building Jepsen? How is Jepsen implemented?
Scala is not officially supported at the moment; however, the ScalaPB library provides a good wrapper around the official gRPC Java implementation. It offers an easy API that translates Protocol Buffers to Scala case classes, with support for Scala 3, Scala.js, and Java interoperability. Setting Up.
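As a sketch of the usual ScalaPB wiring in sbt (versions are illustrative, not taken from the article):

// project/plugins.sbt
addSbtPlugin("com.thesamet" % "sbt-protoc" % "1.0.7")
libraryDependencies += "com.thesamet.scalapb" %% "compilerplugin" % "0.11.15"

// build.sbt: compile .proto files under src/main/protobuf into Scala sources
Compile / PB.targets := Seq(
  scalapb.gen() -> (Compile / sourceManaged).value / "scalapb"
)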
Unlock the secrets to crafting a full-stack Scala 3 application from scratch: dive into Cats Effect, doobie, http4s, and Tyrian and build robust, modern software with ease
In the age of AI, enterprises are increasingly looking to extract value from their data at scale but often find it difficult to establish a scalable data engineering foundation that can process the large amounts of data required to build or improve models. The tool serves two primary functions: assessment and conversion.
This article is all about choosing the right Scala course for your journey: How should I get started with Scala? Which course should I take? Do you have any tips to learn Scala quickly? How to Learn Scala as a Beginner: Scala is not necessarily aimed at first-time programmers.
This is one way to build trust with our internal user base. Obviously not all tools are made with the same use case in mind, so we are planning to add more code samples for data processing purposes other than classical batch ETL, e.g. machine learning model building and scoring. Then we’ll segue into the Scala and R use cases.
Antonio is an alumnus of Rock the JVM, now a senior Scala developer with his own contributions to Scala libraries and junior devs under his mentorship. Which brings us to this article: Antonio originally started from my Sudoku backtracking article and built a Scala CLI tutorial for the juniors he’s mentoring.
Scala CLI is a powerful tool for prototyping and building Scala applications: learn how to use Scala CLI, Scala Native, and decline to create a brute-force Sudoku solver
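For flavor, a tiny illustrative example (not from the article) of Scala CLI’s using directives with decline:

//> using scala 3.3.1
//> using dep com.monovore::decline:2.4.1

// hello.scala — run with: scala-cli run hello.scala -- --name World
import com.monovore.decline.{CommandApp, Opts}

object Hello extends CommandApp(
  name = "hello",
  header = "Says hello",
  main = Opts.option[String]("name", help = "Who to greet")
    .map(n => println(s"Hello, $n!"))
)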
Databricks has a community edition hosted in AWS that is free and allows users to access one micro-cluster and write Spark code in Python or Scala. Obviously, it runs on Apache Spark, which makes it the right choice when dealing with a big data context because of Spark’s properties of large-scale distributed computing.
Introduction: The Typelevel stack is one of the most powerful sets of libraries in the Scala ecosystem. These libraries allow you to write powerful applications with pure functional programming; as of this writing, the Typelevel ecosystem is one of the biggest selling points of Scala. The Typelevel stack is based on Cats and Cats Effect.
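The smallest possible Cats Effect program, as a taste of the style (a generic sketch, not from the article):

import cats.effect.{IO, IOApp}

// The effect is described as an IO value and only executed when the
// runtime runs it, which is the core idea of pure FP with Cats Effect.
object Main extends IOApp.Simple {
  val run: IO[Unit] = IO.println("Hello, Typelevel!")
}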
In this section, we’ll build an application that connects to GitHub using OAuth and requests user information using the GitHub API. To build this application we will need to add the following to our build.sbt file: val scala3Version = "3.3.0".
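As a rough sketch of the token-exchange step against GitHub’s documented OAuth endpoint (the function and environment-variable names here are illustrative, not the article’s code):

import cats.effect.IO
import org.http4s.{Method, Request, UrlForm}
import org.http4s.client.Client
import org.http4s.implicits._

// Exchange the callback code for an access token via an http4s client.
def fetchAccessToken(client: Client[IO], code: String): IO[String] = {
  val request = Request[IO](Method.POST, uri"https://github.com/login/oauth/access_token")
    .withEntity(UrlForm(
      "client_id"     -> sys.env("GITHUB_CLIENT_ID"),     // placeholder config
      "client_secret" -> sys.env("GITHUB_CLIENT_SECRET"),
      "code"          -> code
    ))
  client.expect[String](request) // raw response body; parse out access_token
}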
If you search for the top and most effective programming languages for Big Data on Google, you will find the following top 4: Java, Scala, Python, and R. Java is one of the oldest of the four languages listed here. Scala is a highly scalable language and the native language of Spark.
Http4s is one of the most powerful libraries in the Scala ecosystem, and it’s part of the Typelevel stack. If you want to master the Typelevel Scala libraries with real-life practice, check out the Typelevel Rite of Passage course, a full-stack project-based course. By Herbert Kateu. Hey, it’s Daniel here.
When it was first created, Apache Kafka® had a client API for just Scala and Java. This freedom of choice ultimately allows you to build an event streaming platform with the language best suited to your business needs. These clients are made more robust so that you can confidently deploy them in production.
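For example, a minimal producer from Scala via the Java client (a generic sketch; broker address and topic are placeholders):

import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

// Configure connection and how keys/values are serialized on the wire.
val props = new Properties()
props.put("bootstrap.servers", "localhost:9092")
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")

val producer = new KafkaProducer[String, String](props)
producer.send(new ProducerRecord[String, String]("events", "key", "value"))
producer.close() // flushes pending records before shutting down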
They still take on the responsibilities of a traditional data engineer, like building and managing pipelines and maintaining data quality, but they are tasked with delivering AI data products rather than traditional data products, which requires the ability and skills to build scalable, automated data pipelines.
Eventador, based in Austin, TX, was founded by Erik Beebe and Kenny Gorman in 2016 to address a fundamental business problem: make it simpler to build streaming applications on real-time data. This typically involved a lot of coding with Java, Scala, or similar technologies.
It could be a JAR compiled from Scala, a Python script or module, or a simple SQL file. For example, you may want to build your Scala code and deploy it to an alternative location in S3 while pushing a sandbox version of your workflow that points to this alternative location (e.g. a scala-workflow directory containing a dataflow.yaml).
You could write the same pipeline in Java, in Scala, in Python, in SQL, etc. Here is what Databricks brought this year with Spark 4.0: (1) PySpark erases the differences with the Scala version, creating a first-class experience for Python users; (2) Databricks sells a toolbox, you don't buy any UX.
Introduction: One of the simplest and best-documented ways to build a web API is to follow the REST paradigm. Play Framework “makes it easy to build web applications with Java & Scala”, as stated on their site, and it’s true. In this article we will try to develop a basic skeleton for a REST API using Play and Scala.
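A minimal sketch of what a Play REST endpoint looks like (a generic example, not the article’s skeleton; the route line in conf/routes is assumed):

import javax.inject.Inject
import play.api.mvc.{AbstractController, ControllerComponents}

// conf/routes would contain: GET /ping controllers.PingController.ping
class PingController @Inject()(cc: ControllerComponents) extends AbstractController(cc) {
  def ping = Action {
    Ok("""{"status":"ok"}""").as("application/json")
  }
}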
Within the scope of gen AI, this new Snowpark runtime empowers developers to efficiently and securely deploy containers to do things like the following and more: LLM fine-tuning, open-source vector database deployment, distributed embedding processing, and voice-to-text transcription. Why did Snowflake build a container service?
If you’re new to Snowpark, this is Snowflake’s set of libraries and runtimes that securely deploy and process non-SQL code, including Python, Java, and Scala. Build interactive, AI-powered data apps: Product leaders can use ThoughtSpot Everywhere and Snowpark to drive richer, search-driven embedded analytics experiences for all users.
In October 2020, Cloudera made a strategic acquisition of a company called Eventador. Eventador was adept at simplifying the process of building streaming applications: teams no longer have to depend on skilled Java or Scala developers to write special programs to gain access to such data streams, a model that does not scale.
Some teams use tools like dependabot or scala-steward, which create pull requests in repositories when new library versions are available. Other teams update dependencies regularly in bulk, supported by build system plugins. Addressing this finding helped reduce build times and significantly lower the resulting Docker image size.
Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode.
Spark supports several different programming interfaces that can create jobs, such as Scala, Python, or R. Following are examples from Databricks notebooks in Python, Scala, and R that all do the same thing: load a CSV file into a Spark DataFrame (the truncated snippets are reconstructed here).

Python:
%python
data = spark.read.format("csv").option("header", "true").load("/data/input.csv")

Scala:
%scala
val data = spark.read.format("csv").option("header", "true").load("/data/input.csv")
Also, there is no interactive mode available in MapReduce. Spark has APIs in Scala, Java, Python, and R for all basic transformations and actions. Pig has SQL-like syntax, so it is easier for SQL developers to get on board.
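To illustrate transformations versus actions (a generic sketch, not from the comparison article; assumes a SparkSession named spark, as in the notebook examples above):

// Transformations are lazy: filter and map only build a plan.
val nums = spark.sparkContext.parallelize(1 to 100)
val sumOfEvenSquares = nums
  .filter(_ % 2 == 0)     // transformation
  .map(n => n.toLong * n) // transformation
  .reduce(_ + _)          // action: triggers the job, yields 171700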