Data-driven 2021: Predictions for a new year in data, analytics and AI
DataKitchen
JANUARY 4, 2021
The post Data-driven 2021: Predictions for a new year in data, analytics and AI first appeared on DataKitchen.
Cloudera
JANUARY 6, 2021
Introduction. Python is used extensively among data engineers and data scientists to solve all sorts of problems, from ETL/ELT pipelines to building machine learning models. Apache HBase is an effective data storage system for many workflows, but accessing this data specifically through Python can be a struggle. For data professionals who want to make use of data stored in HBase, the recent upstream project “hbase-connectors” can be used with PySpark for basic operations.
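The excerpt above can be illustrated with a minimal sketch of writing a Spark DataFrame to HBase through the hbase-connectors Spark datasource. The table name, column family, and column names below are illustrative, and a running HBase cluster with the connector jar on the Spark classpath is assumed; only the mapping-string helper actually executes here.

```python
# Hedged sketch: PySpark + hbase-connectors. The table/column names
# are hypothetical examples, not from the article.

def hbase_columns_mapping(schema):
    """Build the 'hbase.columns.mapping' option string from a
    {dataframe_column: (spark_type, hbase_target)} dict, where the
    target is ':key' for the row key or 'cf:qualifier' otherwise."""
    return ", ".join(f"{col} {typ} {target}"
                     for col, (typ, target) in schema.items())

def write_to_hbase(df, table, mapping):
    # Requires pyspark and the hbase-connectors jar; not invoked here.
    (df.write
       .format("org.apache.hadoop.hbase.spark")
       .option("hbase.columns.mapping", mapping)
       .option("hbase.table", table)
       .save())

mapping = hbase_columns_mapping({
    "key":  ("STRING", ":key"),      # row key
    "name": ("STRING", "per:name"),  # column family 'per'
})
print(mapping)  # key STRING :key, name STRING per:name
```

Reading back works the same way with `spark.read.format("org.apache.hadoop.hbase.spark")` and the same mapping option.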
Teradata
JANUARY 5, 2021
By leveraging data to create a 360-degree view of their citizens, government agencies can create better experiences & improve outcomes such as closing the tax gap or improving quality of care.
Team Data Science
JANUARY 8, 2021
Big Data has become the dominant innovation in high-performing companies. Notable businesses today base their decision-making on knowledge gained from the study of big data. Big Data is a collection of large data sets, particularly from new sources, providing an array of possibilities for those who want to work with data and are enthusiastic about unraveling trends in rows of new, unstructured data.
Start Data Engineering
JANUARY 6, 2021
What is backfilling? Backfilling refers to any process that involves modifying or adding new data to existing records in a dataset. This is a common use case in data engineering: for example, a change in business logic may need to be applied to an already processed dataset.
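The idea in the excerpt above can be sketched in plain Python: re-run the (updated) processing logic for every execution date in a range, which is what Airflow's `airflow dags backfill -s START -e END <dag_id>` command does for a DAG. The `process_partition` function below is a hypothetical placeholder for the real transformation.

```python
from datetime import date, timedelta

def process_partition(ds: date) -> dict:
    # Placeholder for the real per-partition transformation;
    # here it just tags each date with the new logic version.
    return {"ds": ds.isoformat(), "logic_version": 2}

def backfill(start: date, end: date) -> list:
    """Re-apply processing to every daily partition in [start, end],
    mirroring an Airflow backfill over that date range."""
    out = []
    d = start
    while d <= end:
        out.append(process_partition(d))
        d += timedelta(days=1)
    return out

rows = backfill(date(2021, 1, 1), date(2021, 1, 3))
print(len(rows))  # 3
```

In a real Airflow DAG the scheduler handles this iteration; the sketch only shows the shape of the work a backfill repeats.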
Confluent
JANUARY 7, 2021
At Zendesk, Apache Kafka® is one of our foundational services for distributing events among different internal systems. We have pods, which can be thought of as isolated cloud environments where […].
Data Engineering Digest brings together the best content for data engineering professionals from the widest variety of industry thought leaders.
Data Engineering Podcast
JANUARY 4, 2021
Summary: As more organizations gain experience with data management and incorporate analytics into their decision making, their next move is to adopt machine learning. In order to make those efforts sustainable, the core capability they need is for data scientists and analysts to be able to build and deploy features in a self-service manner.
DataKitchen
JANUARY 6, 2021
Savvy executives maximize the value of every budgeted dollar. Decisions to invest in new tools and methods must be backed up with a strong business case. As data professionals, we know the value and impact of DataOps: streamlining analytics workflows, reducing errors, and improving data operations transparency. Being able to quantify the value and impact helps leadership understand the return on past investments and supports alignment with future enterprise DataOps transformation initiatives.
Teradata
JANUARY 3, 2021
Digital payments generate 90% of financial institutions’ useful customer data. How can they exploit its value? Find out more.
Cloudera
JANUARY 5, 2021
In my last two blogs (Get to Know Your Retail Customer: Accelerating Customer Insight and Relevance, and Improving your Customer-Centric Merchandising with Location-based In-Store Merchandising) we looked at the benefits to retailers of building personalized interactions by accessing both structured and unstructured data from website clicks, email and SMS opens, in-store point-of-sale systems, and past purchase behavior.
Data Council
JANUARY 7, 2021
Orchest is an open-source tool for creating data science pipelines. Its core value proposition is to make it easy to combine notebooks and scripts with a visual pipeline editor (“build”); to make your notebooks executable (“run”); and to facilitate experiments (“discover”).
DataKitchen
JANUARY 5, 2021
Remote working has revealed the inconsistency and fragility of workflow processes in many data organizations. The data teams share a common objective: to create analytics for the (internal or external) customer. Execution of this mission requires the contribution of several groups: data center/IT, data engineering, data science, data visualization, and data governance.
Teradata
JANUARY 7, 2021
The problem for regulators & banks alike is agreeing on what good data looks like & how to share it to create a modern, flexible, shared data model. Read more.
Rock the JVM
JANUARY 4, 2021
Explore one of the most essential concepts in pure functional programming: the Functor, a crucial but abstract idea that will challenge your understanding
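The Functor concept teased above can be illustrated outside Scala as well. Assuming only the usual definition (a type constructor with a `map` operation obeying the identity and composition laws), here is a minimal Python sketch; the article itself works in Scala, so this is an analogy, not its code.

```python
# Hedged sketch of the Functor idea: any "container" F with a map
# (fmap) satisfying fmap(id) == id and fmap(g∘f) == fmap(g)∘fmap(f).

def fmap_list(f, xs):
    # Functor instance for lists
    return [f(x) for x in xs]

def fmap_option(f, x):
    # Functor instance for an Optional-like value (None = empty)
    return None if x is None else f(x)

inc = lambda n: n + 1
dbl = lambda n: n * 2

print(fmap_list(inc, [1, 2, 3]))   # [2, 3, 4]
print(fmap_option(inc, None))      # None

# Composition law: mapping inc then dbl equals mapping their composite.
assert fmap_list(dbl, fmap_list(inc, [1, 2])) == \
       fmap_list(lambda n: dbl(inc(n)), [1, 2])
```

The point of the abstraction is that code written against `fmap` works for every such container without caring which one it is.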
Silectis
JANUARY 3, 2021
If you’re new to data engineering or are a practitioner of a related field, such as data science, or business intelligence, we thought it might be helpful to have a handy list of commonly used terms available for you to get up to speed. This data engineering glossary is by no means exhaustive, but should provide some foundational context and information.
DataKitchen
JANUARY 2, 2021
In Gartner’s recent report, Operational AI Requires Data Engineering, DataOps, and Data-AI Role Alignment, Robert Thanaraj and Erick Brethenoux recognize that “organizations are not familiar with the processes needed to scale and promote artificial intelligence models from the prototype to the production stages; resulting in uncoordinated production deployment attempts.”