Let’s create validateutility.scala at src/main/scala/rockthejvm/websockets/domain and add the following code:

package rockthejvm.websockets.domain

import cats.data.Validated

object validateutility {
  def validateItem[F](value: String, userORRoom: F, name: String): Validated[String, F] = {
    Validated.
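The excerpt is cut off right after Validated. — as a rough sketch, the method could be completed with Validated.cond, though the validation rule and error message below are assumptions rather than the article's exact code:

package rockthejvm.websockets.domain

import cats.data.Validated

object validateutility {
  // Hypothetical completion: accept 2-10 alphanumeric characters,
  // otherwise return an error message describing the failed field.
  def validateItem[F](value: String, userORRoom: F, name: String): Validated[String, F] =
    Validated.cond(
      value.matches("^[a-zA-Z0-9]{2,10}$"),
      userORRoom,
      s"$name must be between 2 and 10 alphanumeric characters"
    )
}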
The team at Skyflow decided that the second best way is to build a storage system dedicated to securely managing your sensitive information and making it easy to integrate with your applications and data systems. And don’t forget to thank them for their continued support of this show! Atlan is the metadata hub for your data ecosystem.
If you want to master the Typelevel Scala libraries (including Http4s) with real-life practice, check out the Typelevel Rite of Passage course, a full-stack project-based course. HOTP Scala implementation: HOTP generation is quite tedious, so for simplicity we will use a Java library, otp-java by Bastiaan Jansen.
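As an illustration of what that looks like from Scala, here is a minimal sketch based on otp-java's builder API as I recall it from its documentation; the password length, algorithm, and counter value are assumptions, not the article's choices:

import com.bastiaanjansen.otp.{HMACAlgorithm, HOTPGenerator, SecretGenerator}

object HotpDemo {
  def main(args: Array[String]): Unit = {
    // Generate a shared secret and build an HOTP generator (6-digit codes, HMAC-SHA1 assumed).
    val secret = SecretGenerator.generate()
    val hotp = new HOTPGenerator.Builder(secret)
      .withPasswordLength(6)
      .withAlgorithm(HMACAlgorithm.SHA1)
      .build()

    // The same counter value on client and server yields the same one-time password.
    val counter = 5L
    println(hotp.generate(counter))
  }
}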
It works by bundling up data in a UDP packet, adding header information, and sending these packets to the target destination. Setting Up: let’s create a new Scala 3 project and add the following to your build.sbt file: val scala3Version = "3.3.1". Let’s see how we can search for network interfaces in Scala: import cats.effect.
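The excerpt stops at the import; a minimal sketch of enumerating network interfaces with cats-effect, using the plain JDK NetworkInterface API (the article itself may take a different approach):

import cats.effect.{IO, IOApp}
import cats.syntax.all._
import java.net.NetworkInterface
import scala.jdk.CollectionConverters._

object ListInterfaces extends IOApp.Simple {
  // Wrap the side-effecting JDK lookup in IO, then print the name of every interface found.
  val run: IO[Unit] =
    IO(NetworkInterface.getNetworkInterfaces.asScala.toList)
      .flatMap(_.traverse_(nic => IO.println(nic.getName)))
}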
Snowflake’s Snowpark is a game-changing feature that enables data engineers and analysts to write scalable data transformation workflows directly within Snowflake using Python, Java, or Scala. The data resides in three tables: RAW_CUSTOMERS stores customer information; RAW_ORDERS captures order details.
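To make that concrete, here is a hedged sketch of a Snowpark Scala session reading those raw tables; the connection properties, column names, and join key are placeholders, not details taken from the article:

import com.snowflake.snowpark.Session
import com.snowflake.snowpark.functions._

object SnowparkSketch {
  def main(args: Array[String]): Unit = {
    // Connection properties are placeholders; supply your own account details.
    val session = Session.builder.configs(Map(
      "URL"       -> "https://<account>.snowflakecomputing.com",
      "USER"      -> "<user>",
      "PASSWORD"  -> "<password>",
      "WAREHOUSE" -> "<warehouse>",
      "DATABASE"  -> "<database>",
      "SCHEMA"    -> "<schema>"
    )).create

    // Join raw orders to raw customers and count orders per customer;
    // the transformation is pushed down and executed inside Snowflake.
    val orders    = session.table("RAW_ORDERS")
    val customers = session.table("RAW_CUSTOMERS")
    orders
      .join(customers, orders("CUSTOMER_ID") === customers("ID"))
      .groupBy(customers("NAME"))
      .agg(count(orders("ORDER_ID")).as("ORDER_COUNT"))
      .show()
  }
}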
However, this ability to remotely run client applications written in any supported language (Scala, Python) appeared only in Spark 3.4. In any case, all client applications use the same Scala code to initialize the SparkSession, which behaves differently depending on the run mode: getOrCreate() // If the client application uses your Scala code (e.g.,
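As a sketch of what that initialization might look like with the Spark Connect Scala client (this assumes the spark-connect-client-jvm dependency and a placeholder endpoint, not the article's exact setup):

import org.apache.spark.sql.SparkSession

object ClientApp {
  def main(args: Array[String]): Unit = {
    // Classic mode would build an in-process session, e.g.
    // SparkSession.builder().master("local[*]").appName("client-app").getOrCreate()

    // Spark Connect (3.4+): point the same builder at a remote connect endpoint.
    // The host and port below are placeholders for your own Spark Connect server.
    val spark = SparkSession.builder()
      .remote("sc://localhost:15002")
      .getOrCreate()

    spark.range(10).show()
  }
}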
Antonio is an alumnus of Rock the JVM, now a senior Scala developer with his own contributions to Scala libraries and junior devs under his mentorship. Which brings us to this article: Antonio originally started from my Sudoku backtracking article and built a Scala CLI tutorial for the juniors he’s mentoring.
The assessment is built by scanning any codebase written in Python or Scala and outputting a readiness score for conversion to Snowpark. As a result, the tool can take in both code files and notebooks with multiple languages (such as Scala, Python and SQL) at the same time. Try it today to see how smooth the on-ramp to Snowpark can be.
For more information on this and other examples please visit the Dataflow documentation page. This logic consists of the following parts: DDL code, table metadata information, data transformation, and a few audit steps. A large number of our data users employ SparkSQL, PySpark, and Scala. scala-workflow
The authorization server on app2 will respond with a token id and an access token. app1 can now request the user’s information from app2’s API using the access token. In this section, we’ll build an application that connects to GitHub using OAuth and request user information using the GitHub API. Accessing the GitHub API through OAuth.
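A minimal sketch of that last step with an http4s Ember client; the token value is a placeholder and this is a generic example, not the article's exact code:

import cats.effect.{IO, IOApp}
import org.http4s._
import org.http4s.implicits._
import org.http4s.headers.Authorization
import org.http4s.ember.client.EmberClientBuilder

object GithubUser extends IOApp.Simple {
  // Placeholder: the token obtained from the OAuth flow described above.
  val accessToken = "<access-token>"

  // GET the authenticated user's profile, sending the token as a Bearer credential.
  val request = Request[IO](Method.GET, uri"https://api.github.com/user")
    .withHeaders(Authorization(Credentials.Token(AuthScheme.Bearer, accessToken)))

  val run: IO[Unit] =
    EmberClientBuilder.default[IO].build.use { client =>
      client.expect[String](request).flatMap(IO.println)
    }
}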
It’s no secret that Zalando Tech has had its hands full lately with its participation in several Scala conferences and meetups. As a company that practices Radical Agility, we have seen our use of Scala skyrocket, and it’s now one of our most adopted programming languages among developers. So, where have we been in the Scala world?
Http4s is one of the most powerful libraries in the Scala ecosystem, and it’s part of the Typelevel stack. If you want to master the Typelevel Scala libraries with real-life practice, check out the Typelevel Rite of Passage course, a full-stack project-based course. By Herbert Kateu. Hey, it’s Daniel here. val scala3Version = "3.2.2"
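For flavour, a minimal http4s route looks something like the following (a generic example, not the article's code):

import cats.effect.IO
import org.http4s.HttpRoutes
import org.http4s.dsl.io._

object HelloRoutes {
  // A tiny service: GET /hello/<name> responds with a greeting.
  val routes: HttpRoutes[IO] = HttpRoutes.of[IO] {
    case GET -> Root / "hello" / name => Ok(s"Hello, $name")
  }
}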
Previous posts have looked at Algebraic Data Types with Java; Variance, Phantom and Existential Types in Java and Scala; and Intersection and Union Types with Java and Scala. In this post we will combine some ideas from functional programming with strong typing to produce robust, expressive code that is more reusable.
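To ground the terminology, here is a small Scala 3 illustration of union and intersection types (my own generic example, not taken from the post):

object StrongTypes {
  // Union type: the result is either a parsed port number or an error message.
  def parsePort(s: String): String | Int =
    s.toIntOption match {
      case Some(n) if n > 0 && n < 65536 => n
      case _                             => s"'$s' is not a valid port"
    }

  trait HasId    { def id: Long }
  trait HasOwner { def owner: String }

  // Intersection type: the argument must provide both capabilities at once.
  def describe(r: HasId & HasOwner): String =
    s"resource ${r.id} owned by ${r.owner}"
}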
You could write the same pipeline in Java, in Scala, in Python, in SQL, etc. Here is what Databricks brought this year with Spark 4.0: (1) PySpark erases the differences with the Scala version, creating a first-class experience for Python users. (2) Databricks sells a toolbox; you don’t buy any UX. (3) Spark 4.0
Another category of unstructured data that every business deals with is PDFs, Word documents, workstation backups, and countless other types of information. Ascend users love its declarative pipelines, powerful SDK, elegant UI, and extensible plug-in architecture, as well as its support for Python, SQL, Scala, and Java.
Play Framework “makes it easy to build web applications with Java & Scala”, as stated on their site, and it’s true. In this article we will try to develop a basic skeleton for a REST API using Play and Scala. The PlayScala plugin defines default settings for Scala-based applications. import Keys._ get(id).
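A hedged sketch of what such a skeleton controller might look like; the route, data, and field names are illustrative assumptions, not the article's code:

import javax.inject._
import play.api.mvc._
import play.api.libs.json.Json

// Hypothetical conf/routes entry:  GET /items/:id  controllers.ItemController.get(id: Long)
@Singleton
class ItemController @Inject()(val controllerComponents: ControllerComponents) extends BaseController {

  // In-memory stand-in for a repository lookup.
  private val items = Map(1L -> "keyboard", 2L -> "mouse")

  def get(id: Long): Action[AnyContent] = Action {
    items.get(id) match {
      case Some(name) => Ok(Json.obj("id" -> id, "name" -> name))
      case None       => NotFound(Json.obj("error" -> s"item $id not found"))
    }
  }
}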
For more information about CDSW visit the Cloudera Data Science Workbench product page. Value: /opt/cloudera/parcels/CDH/lib/hbase_connectors/lib/hbase-spark.jar:/opt/cloudera/parcels/CDH/lib/hbase_connectors/lib/hbase-spark-protocol-shaded.jar:/opt/cloudera/parcels/CDH/jars/scala-library-2.11.12.jar. Restart Region Servers.
Summary Metadata is the lifeblood of your data platform, providing information about what is happening in your systems. A variety of platforms have been developed to capture and analyze that information to great effect, but they are inherently limited in their utility due to their nature as storage systems.
Some teams use tools like Dependabot or scala-steward that create pull requests in repositories when new library versions are available. The Software Bill of Materials contains information about the packages and libraries used by an application. Other teams update dependencies regularly in bulk, supported by build system plugins (e.g.
In this episode he shares his journey from building a consumer product to launching a data pipeline service and how his frustrations as a product owner have informed his work at Hevo Data. In addition, data discovery is made easy through Sifflet’s information-rich data catalog with a powerful search engine and real-time health statuses.
Summary Elasticsearch is a powerful tool for storing and analyzing data, but when using it for logs and other time oriented information it can become problematic to keep all of your history. What are some of the most interesting or unexpected uses of Chaos Search and access to large amounts of historical log information that you have seen?
Metabase is a tool built with the goal of making the act of discovering information and asking questions of an organizations data easy and self-service for non-technical users.
In addition, AI data engineers should be familiar with programming languages such as Python , Java, Scala, and more for data pipeline, data lineage, and AI model development. This can come with tedious checks on secure information like PII, extra layers of security, and more meetings with the legal team.
Ascend users love its declarative pipelines, powerful SDK, elegant UI, and extensible plug-in architecture, as well as its support for Python, SQL, Scala, and Java. How has that informed your efforts in the development and release of the project?
How do you ensure the integrity and accuracy of that information? What is your approach for merging and enriching event data with the information that you retrieve from your supported integrations? What challenges does that pose in your processing architecture?
This data engineering skillset typically consists of Java or Scala programming skills mated with deep DevOps acumen. They no longer need to ask a small subset of the organization to provide them with information, rather, they have tooling, systems, and capabilities to get the data they need. A rare breed.
And even if you were to gather all the information from the docs, it's still not enough. It provides a powerful information retrieval language and engine that integrates several microservice components built by the Search Department. From the description on its (internal) repository page: Origami is the Zalando Core Search API.
Spark offers over 80 high-level operators that make it easy to build parallel apps and one can use it interactively from the Scala, Python, R, and SQL shells. The core is the distributed execution engine and the Java, Scala, and Python APIs offer a platform for distributed ETL application development.
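As a generic illustration of those high-level operators in the Scala API (not tied to any particular deployment), a word count fits in a few lines:

import org.apache.spark.sql.SparkSession

object WordCount {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("word-count").getOrCreate()
    import spark.implicits._

    // A handful of high-level operators (flatMap, filter, groupBy, count) express a parallel word count.
    val counts = spark.read.textFile("README.md")   // the path is a placeholder
      .flatMap(_.split("\\s+"))
      .filter(_.nonEmpty)
      .groupBy("value")
      .count()

    counts.orderBy($"count".desc).show(10)
    spark.stop()
  }
}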
Links Alooma Convert Media Data Integration ESB (Enterprise Service Bus) Tibco Mulesoft ETL (Extract, Transform, Load) Informatica Microsoft SSIS OLAP Cube S3 Azure Cloud Storage Snowflake DB Redshift BigQuery Salesforce Hubspot Zendesk Spark The Log: What every software engineer should know about real-time data’s unifying abstraction by Jay Kreps
Enter the new Event Tables feature, which helps developers and data engineers easily instrument their code to capture and analyze logs and traces for all languages: Java, Scala, JavaScript, Python and Snowflake Scripting. For further information about how Event Tables work, visit Snowflake product documentation.
For example, we recently examined data on Ethereum smart contract interactions and clearly identified patterns of usage that could inform future development in what is essentially a machine dominated ecosystem. Complementary data types such as transaction receipts, event logs, and state diffs are also extracted.
Git: Git, sometimes expanded as “Global Information Tracker”, is a version control system popular among DevOps users. Scala: Scala is a programming language that combines object-oriented and functional programming paradigms. It runs on the JVM and offers seamless Java interoperability, making it easy for Java developers to transition to Scala.
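That interoperability amounts to calling Java classes directly from Scala code; a small generic example:

import java.time.LocalDate
import java.util.{ArrayList => JArrayList}
import scala.jdk.CollectionConverters._

object JavaInterop {
  def main(args: Array[String]): Unit = {
    // Java standard-library classes are used directly from Scala.
    val releases = new JArrayList[String]()
    releases.add("2.13.12")
    releases.add("3.3.1")

    // Convert the Java collection to a Scala one for idiomatic processing.
    val latest = releases.asScala.toList.max
    println(s"Latest release noted on ${LocalDate.now()}: $latest")
  }
}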
Data scientists are thought leaders who apply their expertise in statistics and machine learning to extract useful information from data. SQL is a declarative language for interacting with databases and allows you to create queries to extract information from your data sets.
In this episode Shinji Kim discusses the challenges of data discovery and how to collect and preserve additional context about each piece of information so that you can find what you need when you don’t even know what you’re looking for yet. Go to dataengineeringpodcast.com/ascend and sign up for a free trial.
For example, the data storage systems and processing pipelines that capture information from genomic sequencing instruments are very different from those that capture the clinical characteristics of a patient from a site. Snowflake is the pioneer of the Data Cloud , a global, federated network for secure, governed information exchange.
Sust Global was created to provide curated data sets for organizations to be able to analyze climate information in the context of their business needs. In addition, data discovery is made easy through Sifflet’s information-rich data catalog with a powerful search engine and real-time health statuses.
In this episode co-founder Martin Sahlen explains the impact that easy access to lineage information can have on the work of data engineers and analysts, and how he and his team have designed their platform to offer that information to engineers and stakeholders in the places that they interact with data.
In this blog we will explore how we can use Apache Flink to get insights from data at a lightning-fast speed, and we will use Cloudera SQL Stream Builder GUI to easily create streaming jobs using only SQL language (no Java/Scala coding required). It provides flexible and expressive APIs for Java and Scala. Use case recap.
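For comparison with the SQL-only approach, the Flink Scala DataStream API expresses a small streaming job like this; a generic sketch assuming the org.apache.flink.streaming.api.scala package, not the blog's use case:

import org.apache.flink.streaming.api.scala._

object FlinkSketch {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment

    // A toy pipeline: take a few sensor readings, keep the hot ones,
    // key them by sensor id, and track the maximum reading per sensor.
    env.fromElements(("sensor-1", 21.5), ("sensor-2", 38.2), ("sensor-1", 40.1))
      .filter(_._2 > 30.0)
      .keyBy(_._1)
      .maxBy(1)
      .print()

    env.execute("flink-scala-sketch")
  }
}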
It takes Python/Java/Scala/R/SQL and converts that code into a highly optimized set of transformations. OK, hopefully not all of that was new information. 5 — Use SQL Syntax: Whether you’re using Scala, Java, Python, SQL, or R, Spark will always leverage the same transformations under the hood. Let’s dive in!
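To see that equivalence, here is a small self-contained example in which the SQL form and the DataFrame form compile to the same optimized plan (the sample data is made up):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object SameUnderTheHood {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("sql-vs-dsl").getOrCreate()
    import spark.implicits._

    Seq(("books", 12.0), ("games", 59.0), ("books", 7.5))
      .toDF("category", "price")
      .createOrReplaceTempView("sales")

    // SQL syntax and the DataFrame DSL both go through Catalyst and yield the same plan.
    val viaSql = spark.sql("SELECT category, SUM(price) AS total FROM sales GROUP BY category")
    val viaDsl = spark.table("sales").groupBy($"category").agg(sum($"price").as("total"))

    viaSql.show()
    viaDsl.show()
    spark.stop()
  }
}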
CI/CD) Once a model is in production, what are the types and sources of information that you collect to monitor their performance? What are the factors that contribute to model drift?
In addition, data discovery is made easy through Sifflet’s information-rich data catalog with a powerful search engine and real-time health statuses. Ascend users love its declarative pipelines, powerful SDK, elegant UI, and extensible plug-in architecture, as well as its support for Python, SQL, Scala, and Java.
Who are your target customers and how does that focus inform your product and feature priorities? This has led to an explosion of database engines and related tools to address these different needs.