Sat. Jun 13, 2020 - Fri. Jun 19, 2020

3 Key techniques, to optimize your Apache Spark code

Start Data Engineering

Intro: A lot of tutorials show how to write Spark code with just the API and code samples, but they do not explain how to write “efficient Apache Spark” code.

Coding 130

Business Intelligence meets Data Engineering with Emerging Technologies

Simon Späti

Today we have more requirements, with ever-growing tools and frameworks, complex cloud architectures, and a data stack that is changing rapidly. I hear claims such as “Business Intelligence (BI) takes too long to integrate new data” or “understanding how the numbers match up is very hard and needs lots of analysis”. The goal of this article is to make business intelligence easier, faster, and more accessible with techniques from the sphere of data engineering.

The Cost of Apache Kafka: An Engineer’s Guide to Pricing Out DIY Operations

Confluent

When I have a small software project that I want to share with the world, I don’t write my own version control system with a web UI. I don’t even […].

Kafka 123

There Are No Perfect Words…

Teradata

Juneteenth has been declared a U.S. holiday at Teradata, as we stand with the black community and reflect on what we can do to fight racism and injustice, and embrace diversity.

119

Apache Airflow® Best Practices for ETL and ELT Pipelines

Whether you’re creating complex dashboards or fine-tuning large language models, your data must be extracted, transformed, and loaded. ETL and ELT pipelines form the foundation of any data product, and Airflow is the open-source data orchestrator specifically designed for moving and transforming data in ETL and ELT pipelines. This eBook covers: An overview of ETL vs.

Accelerate Your Machine Learning With The StreamSQL Feature Store

Data Engineering Podcast

Summary: Machine learning is a process driven by iteration and experimentation, which requires fast and easy access to relevant features of the data being processed. To reduce friction in developing and delivering models, there has been a recent trend toward building a dedicated feature store. In this episode, Simba Khadder discusses his work at StreamSQL building a feature store to make creation, discovery, and monitoring of features fast and easy to manage.

Understanding Azure Synapse Analytics

Advancing Analytics: Data Engineering

You might have seen that I’ve been pretty busy recently, digging into the new Azure Synapse Analytics preview, announced back at Microsoft Build 2020. I’ve explored the Spark engine, SQL serverless/On-Demand, and various other bits, but I’m still getting the same question: “Cool! But what actually is it?” One of the problems here is that Azure SQL Data Warehouse was rebranded as “Azure Synapse Analytics”, but it’s not the same as the full workspace.

SQL 59

More Trending

AWS First-Party Service Integration with Teradata Vantage

Teradata

Integration with AWS first-party services gives our enterprise customers as much cloud-native functionality as they want for their Vantage environments. Learn more.

AWS 71

Comparing Akka Streams, Kafka Streams and Spark Streaming

Rock the JVM

Explore how Akka Streams, Kafka Streams, and Spark Streaming stack up, and find out which one is best for your use case.

Kafka 52

JOINs and Aggregations Using Real-Time Indexing on MongoDB Atlas

Rockset

MongoDB.live took place last week, and Rockset had the opportunity to participate alongside members of the MongoDB community and share about our work to make MongoDB data accessible via real-time external indexing. In our session, we discussed the need for modern data-driven applications to perform real-time aggregations and joins, and how Rockset uses MongoDB change streams and Converged Indexing to deliver fast queries on data from MongoDB.

MongoDB 52

Build Real-Time Observability Pipelines with Confluent Cloud and AppDynamics

Confluent

Many organisations rely on commercial or open source monitoring tools to measure the performance and stability of business-critical applications. AppDynamics, Datadog, and Prometheus are widely used commercial and open source […].

Apache Airflow®: The Ultimate Guide to DAG Writing

Speaker: Tamara Fingerlin, Developer Advocate

In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!

Microsoft Azure First-Party Service Integration with Teradata Vantage

Teradata

Integration with Azure first-party services enables Vantage users to tap into new sources of innovation across all aspects of the analytic process from start to finish.

Process 59

Your Essential dbt Project Checklist

dbt Developer Hub

If you’ve been using dbt for over a year, your project is out-of-date. This is natural. New functionalities have been released. Warehouses change. Best practices are updated. Over the last year, I and others on the Fishtown Analytics (now dbt Labs!) team have conducted seven audits for clients who have been using dbt for a minimum of 2 months. In every single audit, we found opportunities to: improve performance; improve maintainability; make it easier for new people to get up-to-speed on the proj

Project 40

Lloyds Banking Group

Teradata

Lloyds Banking Group executes analytic projects that benefit the customer journey for multiple brands within Lloyds Banking Group.

Banking 52

Intelligent Analytics for Telcos Using Teradata Vantage

Teradata

Learn how leveraging Machine Learning for advanced analytics enables Telcos to tackle problems from identifying network anomalies to customer churn. Read more.

Optimizing The Modern Developer Experience with Coder

Many software teams have migrated their testing and production workloads to the cloud, yet development environments often remain tied to outdated local setups, limiting efficiency and growth. This is where Coder comes in. In our 101 Coder webinar, you’ll explore how cloud-based development environments can unlock new levels of productivity. Discover how to transition from local setups to a secure, cloud-powered ecosystem with ease.