June, 2018

Package Management And Distribution For Your Data Using Quilt with Kevin Moore - Episode 37

Data Engineering Podcast

Summary Collaboration, distribution, and installation of software projects is largely a solved problem, but the same cannot be said of data. Every data team has a bespoke means of sharing data sets, versioning them, tracking related metadata and changes, and publishing them for use in the software systems that rely on them. The CEO and founder of Quilt Data, Kevin Moore, was sufficiently frustrated by this problem to create a platform that attempts to be the means by which data can be as collabo

JVM Profiler: An Open Source Tool for Tracing Distributed JVM Applications at Scale

Uber Engineering

Computing frameworks like Apache Spark have been widely adopted to build large-scale data applications. For Uber, data is at the heart of strategic decision-making and product development. To help us better leverage this data, we manage massive deployments of Spark … The post JVM Profiler: An Open Source Tool for Tracing Distributed JVM Applications at Scale appeared first on Uber Engineering Blog.

Top AWS Certifications-Which one should I choose?

ProjectPro

AWS certifications are the most in-demand cloud computing certifications in the IT industry today, with an overwhelming growth in cloud computing. So, for those looking for a career in Amazon Web Services, this blog lists the best AWS certifications available today, including the cost, duration, and topics covered in each certification exam. With everyone from Netflix to American Airlines signing up to the cloud to keep things from crumbling into pieces, organizations are running into a signific

Programming Best Practices For Data Science

Dataquest

The data science life cycle is generally comprised of the following components: data retrieval, data cleaning, data exploration and visualization, and statistical or predictive modeling. While these components are helpful for understanding the different phases, they don’t help us think about our programming workflow. Often, the entire data science life cycle ends up as an arbitrary mess of notebook cells in either a Jupyter Notebook or a single messy script.

Apache Airflow® Best Practices for ETL and ELT Pipelines

Whether you’re creating complex dashboards or fine-tuning large language models, your data must be extracted, transformed, and loaded. ETL and ELT pipelines form the foundation of any data product, and Airflow is the open-source data orchestrator specifically designed for moving and transforming data in ETL and ELT pipelines. This eBook covers: An overview of ETL vs.

Turning petabytes of pharmaceutical data into actionable insights

Cloudera

Authors: Mai N. Nguyen, Accenture & Mitch Gomulinski, Cloudera. Imagine storing the DNA of the entire population of the US – and then cloning them, twice. That’s the equivalent of 1 petabyte (ComputerWeekly) – the amount of unstructured data available within our large pharmaceutical client’s business. Then imagine the insights that are locked in that massive amount of data.

The State of Open Source

Zalando Engineering

The evolution and future of open source at Zalando Open source software has been the core of Zalando’s tech stack since the company’s humble beginnings, selling flip-flops from a basement 10 years ago; it’s part of our DNA as a tech company. For engineering teams at Zalando, open source is a natural part of how we solve problems: we consult and share the TechRadar for guidance on appropriate technologies to use, we contribute to projects such as Kubernetes, and work in the open on a very large

More Trending

CockroachDB In Depth with Peter Mattis - Episode 35

Data Engineering Podcast

Summary With the increased ease of gaining access to servers in data centers across the world has come the need for supporting globally distributed data storage. With the first wave of cloud-era databases, the ability to replicate information geographically came at the expense of transactions and familiar query languages. To address these shortcomings, the engineers at Cockroach Labs have built a globally distributed SQL database with full ACID semantics in CockroachDB.

ArangoDB: Fast, Scalable, and Multi-Model Data Storage with Jan Steeman and Jan Stücke - Episode 34

Data Engineering Podcast

Summary Using a multi-model database in your applications can greatly reduce the amount of infrastructure and complexity required. ArangoDB is a storage engine that supports documents, key/value, and graph data formats, as well as being fast and scalable. In this episode, Jan Steeman and Jan Stücke explain where Arango fits in the crowded database market, how it works under the hood, and how you can start working with it today.

Recap of Hadoop News for May 2018

ProjectPro

News on Hadoop - May 2018 Data-Driven HR: How Big Data And Analytics Are Transforming Recruitment. Forbes.com, May 4, 2018. With platforms like LinkedIn and Glassdoor giving every employer access to valuable big data, the world of recruitment is transforming to intelligent recruitment. HR teams that make use of big data in future are likely to be successful in recruiting the right talent in the coming years.

The cost of not embarking on a customer 360 strategy

Cloudera

Gartner’s recently released report, “Master Data Management Forms the Basis of a Trusted 360-Degree View of the Customer,” shares the results of an executive survey highlighting several key points, including that customer initiatives are among CEOs’ top five priorities in 2018. The report includes numerous strategic recommendations and outlines the impact of a Master Data Management (MDM) strategy.

Apache Airflow®: The Ultimate Guide to DAG Writing

Speaker: Tamara Fingerlin, Developer Advocate

In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!

Introducing Blended Learning From Cloudera University

Cloudera

Over the past decade, Cloudera University has taught more than 50,000 developers, administrators, analysts, and data scientists how to apply big data technologies. Developers are learning the APIs, so they can create new applications that were never before possible. Administrators learn to plan, install, monitor, and troubleshoot clusters. And analysts discover the power of SQL over large, diverse datasets.

The Intrapreneurship Journey at Zalando

Zalando Engineering

Sharing our innovation stories: success, failures, and learnings Franzi, Humberto, Neil, Lenia, Vivek. These are just some names of the people who are willing to put in the extra effort and run the additional mile to impact the organization in a way they haven’t done before. The stories of these Zalando intrapreneurs are the ones I summarized at the Innov8rs conference in Madrid.

All Aboard

Zalando Engineering

What new tech employees can expect from Zalando onboarding So, you’ve applied for a technical role at Zalando and you’ve just accepted the offer! If you’re wondering what to expect, look no further. We are excited to share a peek behind the scenes, so you can see what awaits you in the first few weeks of this journey, regardless of whether you’re joining in Berlin, Dortmund, Dublin, Hamburg, Helsinki or Lisbon, to make sure you’re well-equipped to dive into life at Zalando.

Loading Time Matters

Zalando Engineering

How Zalando's overall site speed improved by more than 25% in five months We all know that providing a fast user experience is key. Still, it was something of a wake-up call for us last fall when we saw our aggregated loading time increasing; not because we had increased latency in our systems but simply because the share of mobile visits kept increasing.

Optimizing The Modern Developer Experience with Coder

Many software teams have migrated their testing and production workloads to the cloud, yet development environments often remain tied to outdated local setups, limiting efficiency and growth. This is where Coder comes in. In our 101 Coder webinar, you’ll explore how cloud-based development environments can unlock new levels of productivity. Discover how to transition from local setups to a secure, cloud-powered ecosystem with ease.