Sat. Jun 15, 2019 - Fri. Jun 21, 2019

Maintaining Your Data Lake At Scale With Spark

Data Engineering Podcast

Summary: Building and maintaining a data lake is a choose-your-own-adventure of tools, services, and evolving best practices. The flexibility and freedom that data lakes provide allow for generating significant value, but they can also lead to anti-patterns and inconsistent quality in your analytics. Delta Lake is an open source, opinionated framework built on top of Spark for interacting with and maintaining data lake platforms that incorporates the lessons learned at Databricks from countless cu

Data Lake 100

Building a Scalable Search Architecture

Confluent

Software projects of all sizes and complexities have a common challenge: building a scalable solution for search. Who has never seen an application use RDBMS SQL statements to run searches? You might be wondering, is this a good solution? As the databases professor at my university used to say, it depends. Using SQL to run your search might be enough for your use case, but as your project requirements grow and more advanced features are needed—for example, enabling synonyms, multilingual search,

Insiders

Trending Sources

Netflix Studio Hack Day — May 2019

Netflix Tech

By Tom Richards, Carenina Garcia Motion, and Marlee Tart. Hack Days are a big deal at Netflix. They’re a chance to bring together employees from all our different disciplines to explore new ideas and experiment with emerging technologies. For the most recent hack day, we channeled our creative energy towards our studio efforts.

Java 15

AI for Industrials: Why is it different?

Teradata

Cheryl Wiebe examines the challenges of using AI in industrial situations.

IT 75

Mastering Apache Airflow® 3.0: What’s New (and What’s Next) for Data Orchestration

Speaker: Tamara Fingerlin, Developer Advocate

Apache Airflow® 3.0, the most anticipated Airflow release yet, officially launched this April. As the de facto standard for data orchestration, Airflow is trusted by over 77,000 organizations to power everything from advanced analytics to production AI and MLOps. With the 3.0 release, the top-requested features from the community were delivered, including a revamped UI for easier navigation, stronger security, and greater flexibility to run tasks anywhere at any time.

Should you have an ETL window in your Modern Data Warehouse?

Advancing Analytics: Data Engineering

Ah, the ETL (Extract-Transform-Load) window: the schedule by which the Business Intelligence developer sets their clock, the nail-biting nightly period during which the on-call support hopes their phone won’t ring. It’s a cornerstone of the data warehousing approach… and we shouldn’t have one. There, I said it. Hear me out – back in the on-premises days we had data loading processes that connected directly to our source system databases and performed huge data extract queries as the start of one long

Four Reasons Why Upgrading to Vantage is Worth It

Teradata

Running older Teradata analytics software versions may not support the latest innovations of Vantage and could cost you more than upgrading. Learn more.

IT 75