September, 2018

Building A Knowledge Graph From Public Data At Enigma With Chris Groskopf - Episode 50

Data Engineering Podcast

Summary: There are countless sources of data that are publicly available for use. Unfortunately, combining those sources and making them useful in aggregate is a time-consuming and challenging process. The team at Enigma builds a knowledge graph for use in your own data projects. In this episode Chris Groskopf explains the platform they have built to consume large varieties and volumes of public data and construct a graph to serve to their customers.

A new era of SQL-development, fueled by a modern data warehouse

Cloudera

SQL development is not a new concept. However, as the data warehousing world shifts into a fast-paced, digital, and agile era, the demands to quickly generate reports and help guide data-driven decisions are constantly increasing. This puts new pressures on the people working behind the scenes to prepare and serve data in a consumable way to a growing audience with various levels of access credentials and technical expertise.

Themes and Conferences per Pacoid, Episode 1

Domino Data Lab: Data Engineering

Introduction: New Monthly Series! Welcome to a new monthly series! I’ll summarize highlights from recent industry conferences, new open source projects, interesting research, great examples, amazing people, etc. – all pointed at how to level up your organization’s data science practices.

The Journey to Connecting Retail

Zalando Engineering

Digitizing brick & mortar fashion stores through Connected Retail. Everything started back in 2015 when Zalando was already successful as an online fashion retailer in Europe. However, a B2B problem was identified that needed to be tackled: brick-and-mortar fashion stores need a way to increase their sales. Seeing the need to connect offline with online to help merchants solve this problem, I joined Zalando as a Product Manager in early 2016 at the newly established Helsinki

Recap of Hadoop News for August 2018

ProjectPro

News on Hadoop - August 2018. Apache Hadoop: A Tech Skill That Can Still Prove Lucrative. Dice.com, August 2, 2018. In 2017, Gartner announced that organizations were spending close to $800 million on Hadoop distributions, even though only 14% of companies reported that they were relying on Hadoop technology. However, several studies have revealed that adoption of and spending on Hadoop technology continued to rise through last year. Dice analysis demonstrates that jobs that intersect with Ha

A Primer On Enterprise Data Curation with Todd Walter - Episode 49

Data Engineering Podcast

Summary: As your data needs scale across an organization, the need for a carefully considered approach to collection, storage, organization, and access becomes increasingly critical. In this episode Todd Walter shares his considerable experience in data curation to clarify the many aspects that are necessary for a successful platform for your business.

More Trending

Keep Your Data And Query It Too Using Chaos Search with Thomas Hazel and Pete Cheslock - Episode 47

Data Engineering Podcast

Summary: Elasticsearch is a powerful tool for storing and analyzing data, but when you use it for logs and other time-oriented information, keeping all of your history can become problematic. Chaos Search was started to make it easy for you to keep all of your data and make it usable in S3, so that you can have the best of both worlds. In this episode the CTO, Thomas Hazel, and VP of Product, Pete Cheslock, describe how they have built a platform to let you keep all of your history, save money, a

An Agile Approach To Master Data Management with Mark Marinelli - Episode 46

Data Engineering Podcast

Summary: With the proliferation of data sources to give a more comprehensive view of the information critical to your business, it is even more important to have a canonical view of the entities that you care about. Is customer number 342 in your ERP the same as Bob Smith on Twitter? Using master data management to build a data catalog helps you answer these questions reliably and simplify the process of building your business intelligence reports.

Taking out the threat from the inside

Cloudera

The worst thing about an inside job is that once it’s detected, it’s usually too late. Early detection is critical to prevent considerable damage arising out of insider threats to the business. But that’s easier said than done! Whether it’s a rogue trader in a bank or brokerage or someone illegally sharing company intellectual property or intelligence, illegal insider actions put enterprises at risk of losing millions.

An End-to-End Open & Modular Architecture for IoT

Cloudera

While the Internet of Things (IoT) represents a significant opportunity, IoT architectures are often rigid, complex to implement, costly, and create a multitude of challenges for organizations. First of all, in order to effectively pull together an end-to-end architecture for IoT, organizations must manage multiple vendor solutions, validate that they work together, integrate them to ensure the right functionality, and provide for future enhancement compatibility.

Heralding a new era in GDPR compliance with Accenture and Cloudera

Cloudera

The General Data Protection Regulation (GDPR), in force since May 25, strengthens and unifies data protection laws for individuals within the European Union (EU), making personal data privacy a fundamental right for all. While companies have traditionally relied on time-consuming manual processes to achieve compliance, Accenture and Cloudera are harnessing advances in technology to simplify compliance.

And the winners are…. Congratulations to the Sixth Annual Data Impact Awards winners

Cloudera

It’s a big week for us, as many Clouderans descend on New York for the Strata Data Conference. The week is typically filled with exciting announcements from Cloudera and many partners and others in the data management, machine learning and analytics industry. Last night we kicked it off with the sixth annual Data Impact Awards Celebration. These awards recognize organizations that transform complex data into actionable insights and illustrate impact to technology, science, health, lifestyle, and

Cloudera spotlights partner success at Strata Data with Partner Impact Awards

Cloudera

At Strata Data, it appeared that artificial intelligence, machine learning, and the promise of game-changing insights from big data were at the forefront of everyone’s mind. Cloudera aimed to demystify the “how” in the AI and big data equation at Strata Data through helpful sessions, anticipated keynotes, and new product announcements to alleviate the mystery associated with leveraging this revolutionary technology.

Cloudera Data Warehouse – A Partner Perspective

Cloudera

Among the many reasons that a majority of large enterprises have adopted Cloudera Data Warehouse as their modern analytic platform of choice is the incredible ecosystem of partners that have emerged over recent years. In this new blog series, we will take a closer look at some of the most innovative partners, and how the Cloudera platform is helping them deliver groundbreaking solutions to our customers.

Building an Open Data Processing Pipeline for IoT

Cloudera

Authors: David Bericat, Global Technical Lead, Internet of Things, Red Hat and Jonathan Cooper-Ellis, Solutions Architect, Cloudera. Last week Cloudera introduced an open end-to-end architecture for IoT and the different components needed to help satisfy today’s enterprise needs regarding operational technology (OT), information technology (IT), data analytics and machine learning (ML), along with modern and traditional application development, deployment, and integration.

AML: Past, Present and Future – Part III

Cloudera

This is the third installment in a three-part series. The first installment provides a short background on anti-money laundering. The second installment examines common AML problems faced by financial institutions today. In this installment, we introduce an approach that carries AML well into the future. Part III: The future is now. Given what we know about current anti-money laundering systems, if we wanted to build one from scratch today, we might come up with the following requirements.

Take Customer Experience Back to the Future with Data

Cloudera

Delivering a positive and memorable customer experience is the cornerstone of nearly every organization. Failure to do so negatively impacts a company’s bottom line and reputation. Each year, companies invest millions of dollars in programs and solutions that aim to improve the customer experience and provide valuable customer insights, but what if, for the answer, they only had to look back to the future?

Boosting enterprise machine learning with automated feature engineering

Cloudera

Machine learning. The very name suggests there’s little involvement required from actual people. It’s a bit surprising to note, then, that perhaps the most limiting factor in data science and machine learning today is people. People add complexity. People add the risk of error. And people add a lot of time. However, we’ll always need people to come up with the overarching prediction problems to solve and to make the ultimate choices to solve them, but there is a lot of data science work now that

Delivering a Shared Multidisciplinary Analytics Experience Anywhere With SDX and Altus

Cloudera

Woodworking is one of my passions and I often use wooden pallets as my source material. Regardless of what I build—whether a shelf, chair, or bookcase—I always use the same things: Wood, tools, and a plan that shows dimensions and steps to put all the bits together. The other day, it struck me how similar this is to how organizations digitally transform and become data-driven.

Altus Data Warehouse

Cloudera

We are proud to announce the general availability of Cloudera Altus Data Warehouse, the only cloud data warehousing service that brings the warehouse to the data. Cloudera’s modern data warehouse runs wherever it makes the most sense for your business – on-premises, public cloud, hybrid cloud, or even multi-cloud. Modern data warehousing for the cloud.

AML: Past, Present and Future – Part II

Cloudera

This is the second installment in a three-part series. The first installment provides a short background on anti-money laundering. In this installment, we examine common AML problems faced by financial institutions today. The third installment introduces an approach that carries AML into the future. Part II: Current Challenges in AML. There are several key areas in the field of anti-money laundering (AML) that rely heavily on technology.

Shop the Look with Deep Learning

Zalando Engineering

Retrieving fashion products based on a query image. Have you ever seen a picture on Instagram and thought, “Oh, wow! I want these shoes”? Or been inspired by your favourite fashion blogger and looked for similar products (for example, on Zalando)? Visual search for fashion, the task of identifying fashion articles in an image and finding them in an online store, has been the subject of an ever-growing body of scientific literature over the last few years (see for example [1-11]).

Visual Creation and Exploration at Zalando Research

Zalando Engineering

Adversarial texture distribution learning as a tool of artistic expression. Deep learning is progressing fast these days. Despite advances that were expected to happen sooner or later (e.g. accurate face and speech recognition), there are some new developments that would have seemed like a pipe dream years ago: neural networks can now generate realistic images just by looking at a few examples of their properties.

Zalando Strengthens its InnerSource Strategy

Zalando Engineering

Zalando is known for its commitment to the open source world. Many of our engineers are active contributors to open source projects like PostgreSQL or Kubernetes. The Zalando tech department currently consists of more than 2,000 employees that manage over 200 delivery teams and virtual teams. Zalando engineers are from 77 nations and based out of various locations across Europe, which makes us super international but also quite distributed.

AML: Past, Present and Future – Part I

Cloudera

This is the first installment in a three-part series. It provides a short background on anti-money laundering for the layperson. AML professionals may wish to skip this installment and go directly to the second and third parts. The second installment examines common AML problems faced by financial institutions today. The third installment introduces an approach that carries AML into the future.
