Upgrade Data Science Python Packages from Greenplum 6 to Greenplum 7

Article ID: 439502

Updated On:

Products

  • VMware Tanzu Greenplum
  • VMware Tanzu Greenplum / Gemfire
  • VMware Tanzu Data Suite

Issue/Introduction

When you migrate from VMware Tanzu Greenplum 6 (GPDB6) to Greenplum 7 (GPDB7), you must also upgrade the Data Science Python packages. GPDB6 uses Python 3.9, while GPDB7 uses Python 3.11, so existing User Defined Functions (UDFs) can fail at runtime because of syntax changes or library differences.

Environment

  • VMware Tanzu Greenplum 6.x
  • VMware Tanzu Greenplum 7.x

Resolution

To transition your data science environment during a Greenplum upgrade, follow these steps:

  1. Understand the Dependency Scope. The DataSciencePython package in GPDB6 is used by the plpython3u language. Removing the package does not break the metadata definitions of your database objects, but your Python UDFs will fail at runtime if their dependencies or syntax are incompatible with the newer version.

  2. Identify All Python UDFs. Audit your current database to identify every Python-based UDF that requires migration.

  3. Review and Update UDFs for Python 3.11 Compatibility. This manual step is mandatory. Review your code to address the following, then test all UDFs in a controlled, non-production environment:

    • Syntax changes between Python 3.9 and Python 3.11.
    • Libraries that are deprecated or removed in the newer version.
    • Differences in package versions.

  4. Compare Library Availability. Compare the installed libraries in DataSciencePython3.9 against those in DataSciencePython3.11. Ensure that all required packages are either pre-bundled in GPDB7 or available for manual installation.

  5. Upgrade the Cluster. Upgrade the Greenplum cluster to version 7. Once the upgrade completes, the old Python 3.9 environment is no longer used for database operations.

  6. Uninstall the Old Package. Uninstall DataSciencePython3.9 from the system if it persists after the upgrade or remains on shared systems.

  7. Install the New Package. Install DataSciencePython3.11, which is built for GPDB7, on all relevant nodes.

  8. Validate the Upgrade. Re-test all Python UDFs to confirm they execute correctly within the GPDB7 environment.
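
For step 2, a catalog query is one way to build the UDF inventory. The following is a minimal sketch, not an official Greenplum tool. It assumes you can open a DB-API 2.0 connection (for example with psycopg2) to the GPDB6 database. The names UDF_AUDIT_SQL and list_python_udfs are hypothetical helpers introduced here for illustration. The query relies only on the standard PostgreSQL catalogs pg_proc, pg_namespace, and pg_language.

```python
# Catalog query: one row per function whose language is plpython3u.
# prosrc holds the function body, which is useful for the review in step 3.
UDF_AUDIT_SQL = """
SELECT n.nspname AS schema_name,
       p.proname AS function_name,
       pg_get_function_identity_arguments(p.oid) AS arguments,
       p.prosrc AS source
FROM pg_proc p
JOIN pg_namespace n ON n.oid = p.pronamespace
JOIN pg_language l ON l.oid = p.prolang
WHERE l.lanname = 'plpython3u'
ORDER BY 1, 2, 3
"""


def list_python_udfs(cursor):
    """Return one dict per plpython3u function found in the current database.

    `cursor` is any DB-API 2.0 cursor connected to the database being audited
    (hypothetical usage: a psycopg2 cursor opened against GPDB6).
    """
    cursor.execute(UDF_AUDIT_SQL)
    columns = ("schema", "name", "arguments", "source")
    return [dict(zip(columns, row)) for row in cursor.fetchall()]
```

System catalogs are per database, so run the audit once in every database of the cluster that may contain Python UDFs.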
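
For step 3, a static pre-check can flag some problems before any UDF is executed. The sketch below is an assumption-laden example, not a complete compatibility checker. It parses each function body with the interpreter that runs the script, so run it with Python 3.11. It then looks for imports of a small, illustrative subset of standard-library modules that were removed in Python 3.10 and 3.11. The names REMOVED_MODULES and check_udf_body are hypothetical, and the check does not cover third-party package changes.

```python
import ast
import textwrap

# Illustrative subset only: standard-library modules removed in
# Python 3.10 (parser, formatter) and Python 3.11 (binhex).
# Extend this set from the Python release notes for your own review.
REMOVED_MODULES = {"parser", "formatter", "binhex"}


def check_udf_body(body):
    """Return a list of findings for one PL/Python function body."""
    if not body.strip():
        return []
    # PL/Python bodies use top-level `return`, so wrap the body in a function
    # before parsing. Reported line numbers are shifted back by one to
    # compensate for the added `def` line.
    wrapped = "def __udf__():\n" + textwrap.indent(body, "    ")
    try:
        tree = ast.parse(wrapped)
    except SyntaxError as exc:
        return [f"syntax error at line {(exc.lineno or 1) - 1}: {exc.msg}"]

    findings = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            modules = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            modules = [node.module or ""]
        else:
            continue
        for module in modules:
            top_level = module.split(".")[0]
            if top_level in REMOVED_MODULES:
                findings.append(
                    f"line {node.lineno - 1}: imports removed module '{top_level}'"
                )
    return findings
```

A clean result only means that no listed problem was found. The manual review and the non-production testing described in step 3 are still required.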
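
For step 4, the comparison can be scripted once you have a package list from each environment. The sketch below assumes both lists are in `name==version` format, as produced by `pip freeze`. How you capture them depends on your installation. The function names and the package versions in the usage example are hypothetical.

```python
def parse_freeze(text):
    """Map package name to version from `name==version` lines."""
    packages = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "==" not in line:
            continue  # skip blanks, comments, and non-pinned entries
        name, version = line.split("==", 1)
        # Loose normalisation so `scikit_learn` and `scikit-learn` match.
        packages[name.strip().lower().replace("_", "-")] = version.strip()
    return packages


def compare_environments(old_text, new_text):
    """Return (missing, changed) for packages in the old list.

    missing: sorted names present in the old list but absent from the new one.
    changed: {name: (old_version, new_version)} for differing versions.
    """
    old, new = parse_freeze(old_text), parse_freeze(new_text)
    missing = sorted(set(old) - set(new))
    changed = {
        name: (old[name], new[name])
        for name in sorted(set(old) & set(new))
        if old[name] != new[name]
    }
    return missing, changed


# Hypothetical package lists, for illustration only.
old_list = "numpy==1.21.0\nscikit_learn==0.24.2\nxgboost==1.4.2\n"
new_list = "numpy==1.26.4\nscikit-learn==0.24.2\n"
missing, changed = compare_environments(old_list, new_list)
print("Missing in new environment:", missing)
print("Version changes:", changed)
```

Packages reported as missing are candidates for manual installation on GPDB7. Packages with version changes are the ones whose UDF call sites deserve the closest review.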
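
For step 8, the re-test can be driven from a list of representative calls. The following is a minimal sketch, assuming a DB-API 2.0 connection to the GPDB7 database and a list of SELECT statements that you write yourself, one per UDF. The helper name smoke_test_udfs and the function names in the usage comment are hypothetical.

```python
def smoke_test_udfs(connection, calls):
    """Run each SQL call and return {sql: error message} for the failures.

    `connection` is any DB-API 2.0 connection. Each entry in `calls` is
    expected to be a SELECT that invokes one UDF with sample arguments.
    """
    failures = {}
    for sql in calls:
        cursor = connection.cursor()
        try:
            cursor.execute(sql)
            cursor.fetchall()  # force the result to be produced
        except Exception as exc:  # record any database or runtime error
            failures[sql] = str(exc)
        finally:
            cursor.close()
            # End the transaction so one failure does not block later calls
            # and test calls leave no changes behind.
            connection.rollback()
    return failures


# Hypothetical usage:
# failures = smoke_test_udfs(conn, ["SELECT public.score(1.0)"])
# for sql, error in failures.items():
#     print(sql, "->", error)
```

An empty result means every listed call returned without error. It does not confirm that the returned values are correct, so compare the outputs against known results from GPDB6 where possible.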
