Sat. Jun 13, 2020 - Fri. Jun 19, 2020

3 Key techniques, to optimize your Apache Spark code

Start Data Engineering

A lot of tutorials show how to write Spark code with just the API and code samples, but they do not explain how to write efficient Apache Spark code.

Business Intelligence meets Data Engineering with Emerging Technologies

Simon Späti

Today we have more requirements, an ever-growing set of tools and frameworks, complex cloud architectures, and a data stack that is changing rapidly. I hear claims such as “Business Intelligence (BI) takes too long to integrate new data” or “understanding how the numbers match up is very hard and needs lots of analysis”. The goal of this article is to make business intelligence easier, faster, and more accessible with techniques from the sphere of data engineering.

Trending Sources

The Cost of Apache Kafka: An Engineer’s Guide to Pricing Out DIY Operations

Confluent

When I have a small software project that I want to share with the world, I don’t write my own version control system with a web UI. I don’t even […].

There Are No Perfect Words…

Teradata

Juneteenth has been declared a U.S. holiday at Teradata, as we stand with the black community and reflect on what we can do to fight racism and injustice, and embrace diversity.

Precision in Motion: Why Process Optimization Is the Future of Manufacturing

Speaker: Jason Chester, Director, Product Management

In today’s manufacturing landscape, staying competitive means moving beyond reactive quality checks and toward real-time, data-driven process control. But what does true manufacturing process optimization look like—and why is it more urgent now than ever? Join Jason Chester in this new, thought-provoking session on how modern manufacturers are rethinking quality operations from the ground up.

Accelerate Your Machine Learning With The StreamSQL Feature Store

Data Engineering Podcast

Machine learning is a process driven by iteration and experimentation, which requires fast and easy access to relevant features of the data being processed. To reduce friction in developing and delivering models, there has been a recent trend toward building a dedicated feature store. In this episode Simba Khadder discusses his work at StreamSQL building a feature store that makes the creation, discovery, and monitoring of features fast and easy to manage.

Understanding Azure Synapse Analytics

Advancing Analytics: Data Engineering

You might have seen that I’ve been pretty busy recently, digging into the new Azure Synapse Analytics preview, announced back at Microsoft Build 2020. I’ve explored the Spark engine, SQL serverless/On-Demand and various other bits, but I’m still getting the same question: “Cool! But what actually is it?” One of the problems here is that Azure SQL Data Warehouse was rebranded as “Azure Synapse Analytics”, but it’s not the same as the full workspace.

SQL

More Trending

AWS First-Party Service Integration with Teradata Vantage

Teradata

Integration with AWS first-party services gives our enterprise customers as much cloud-native functionality as they want for their Vantage environments. Learn more.

AWS

Comparing Akka Streams, Kafka Streams and Spark Streaming

Rock the JVM

Explore how Akka Streams, Kafka Streams, and Spark Streaming stack up, and find out which one is best for your use case.

JOINs and Aggregations Using Real-Time Indexing on MongoDB Atlas

Rockset

MongoDB.live took place last week, and Rockset had the opportunity to participate alongside members of the MongoDB community and share our work on making MongoDB data accessible via real-time external indexing. In our session, we discussed the need for modern data-driven applications to perform real-time aggregations and joins, and how Rockset uses MongoDB change streams and Converged Indexing to deliver fast queries on data from MongoDB.

Build Real-Time Observability Pipelines with Confluent Cloud and AppDynamics

Confluent

Many organisations rely on commercial or open source monitoring tools to measure the performance and stability of business-critical applications. AppDynamics, Datadog, and Prometheus are widely used commercial and open source […].

Airflow Best Practices for ETL/ELT Pipelines

Speaker: Kenten Danas, Senior Manager, Developer Relations

ETL and ELT are some of the most common data engineering use cases, but can come with challenges like scaling, connectivity to other systems, and dynamically adapting to changing data sources. Airflow is specifically designed for moving and transforming data in ETL/ELT pipelines, and new features in Airflow 3.0 like assets, backfills, and event-driven scheduling make orchestrating ETL/ELT pipelines easier than ever!

Microsoft Azure First-Party Service Integration with Teradata Vantage

Teradata

Integration with Azure first-party services enables Vantage users to tap into new sources of innovation across all aspects of the analytic process from start to finish.

Your Essential dbt Project Checklist

dbt Developer Hub

If you’ve been using dbt for over a year, your project is out-of-date. This is natural. New functionalities have been released. Warehouses change. Best practices are updated. Over the last year, I and others on the Fishtown Analytics (now dbt Labs!) team have conducted seven audits for clients who have been using dbt for a minimum of 2 months. In every single audit, we found opportunities to: improve performance; improve maintainability; make it easier for new people to get up-to-speed on the proj

Lloyds Banking Group

Teradata

Lloyds Banking Group executes analytic projects that benefit the customer journey for multiple brands within Lloyds Banking Group.

Intelligent Analytics for Telcos Using Teradata Vantage

Teradata

Learn how leveraging Machine Learning for advanced analytics enables Telcos to tackle problems from identifying network anomalies to customer churn. Read more.

What’s New in Apache Airflow 3.0, and How Will It Reshape Your Data Workflows?

Apache Airflow® 3.0, the most anticipated Airflow release yet, officially launched this April. As the de facto standard for data orchestration, Airflow is trusted by over 77,000 organizations to power everything from advanced analytics to production AI and MLOps. With the 3.0 release, the top-requested features from the community were delivered, including a revamped UI for easier navigation, stronger security, and greater flexibility to run tasks anywhere at any time.