ERROR: spiexceptions.CannotCoerce: cannot cast type record to double precision[]
search cancel

ERROR: spiexceptions.CannotCoerce: cannot cast type record to double precision[]

book

Article ID: 427328

calendar_today

Updated On:

Products

VMware Tanzu Data Suite VMware Tanzu Greenplum VMware Tanzu Greenplum / Gemfire

Issue/Introduction

gpadmin=# select madlib.version();
                                                                                                   version
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 MADlib version: 2.1.0, git revision: rel/v2.1.0, cmake configuration time: Fri Sep  8 20:02:29 UTC 2023, build type: Release, build system: Linux-5.4.0-144-generic, C compiler: gcc 8, C++ compiler: g++ 8
(1 row)

gpadmin=# CREATE TABLE IF NOT EXISTS fraud.madlib_training_data (
gpadmin(#     transaction_id BIGINT,
gpadmin(#     fraud_label INTEGER,
gpadmin(#     amount DOUBLE PRECISION,
gpadmin(#     v1 DOUBLE PRECISION, v2 DOUBLE PRECISION, v3 DOUBLE PRECISION,
gpadmin(#     v4 DOUBLE PRECISION, v5 DOUBLE PRECISION,
gpadmin(#     v14 DOUBLE PRECISION, v17 DOUBLE PRECISION, v20 DOUBLE PRECISION
gpadmin(# ) DISTRIBUTED BY (transaction_id);
ERROR:  schema "fraud" does not exist
LINE 1: CREATE TABLE IF NOT EXISTS fraud.madlib_training_data (
                                   ^
gpadmin=# create scehma fraud;
ERROR:  syntax error at or near "scehma"
LINE 1: create scehma fraud;
               ^
gpadmin=# create schema fraud;
CREATE SCHEMA
gpadmin=# CREATE TABLE IF NOT EXISTS fraud.madlib_training_data (
    transaction_id BIGINT,
    fraud_label INTEGER,
    amount DOUBLE PRECISION,
    v1 DOUBLE PRECISION, v2 DOUBLE PRECISION, v3 DOUBLE PRECISION,
    v4 DOUBLE PRECISION, v5 DOUBLE PRECISION,
    v14 DOUBLE PRECISION, v17 DOUBLE PRECISION, v20 DOUBLE PRECISION
) DISTRIBUTED BY (transaction_id);
CREATE TABLE
gpadmin=# INSERT INTO fraud.madlib_training_data (transaction_id, fraud_label, amount, v1, v2, v3, v4, v5, v14, v17, v20)
gpadmin-# SELECT n, (n % 5 = 0)::int, (n * 1.5)::double precision,
gpadmin-#        (n * 0.1)::double precision, (n * 0.2)::double precision, (n * 0.3)::double precision,
gpadmin-#        (n * 0.4)::double precision, (n * 0.5)::double precision,
gpadmin-#        (n * 0.6)::double precision, (n * 0.7)::double precision, (n * 0.8)::double precision
gpadmin-# FROM generate_series(1, 100) n;
INSERT 0 100
gpadmin=# DROP TABLE IF EXISTS fraud.madlib_logreg_model CASCADE;
NOTICE:  table "madlib_logreg_model" does not exist, skipping
DROP TABLE
gpadmin=# SELECT madlib.logregr_train(
gpadmin(#     'fraud.madlib_training_data'::text,
gpadmin(#     'fraud.madlib_logreg_model'::text,
gpadmin(#     'fraud_label'::text,
gpadmin(#     'amount, v1, v2, v3, v4, v5, v14, v17'::text,
gpadmin(#     NULL::text,
gpadmin(#     20::integer,
gpadmin(#     'irls'::text,
gpadmin(#     0.0001::double precision,
gpadmin(#     false::boolean
gpadmin(# );
ERROR:  spiexceptions.CannotCoerce: cannot cast type record to double precision[]
LINE 9: ...            (amount, v1, v2, v3, v4, v5, v14, v17)::double p...
                                                             ^
QUERY:
            insert into pg_temp.__madlib_temp_65233657_1769558775_11488247__
                (select
                    Null,
                    1,
                    (
                madlib.__logregr_irls_step(
                    (fraud_label)::boolean,
                    (amount, v1, v2, v3, v4, v5, v14, v17)::double precision[],
                    rel_state.__madlib_temp_18553644_1769558775_37450018__)
                )
                from
                    (fraud.madlib_training_data AS _src
                    join
                    pg_temp.__madlib_temp_65233657_1769558775_11488247__ AS rel_state
                    on True)
                where
                    rel_state.__madlib_temp_96048304_1769558775_8661491__ = 1 - 1 and
                    (case when 1 = 1 then
                        True
                    else
                        rel_state.__madlib_temp_18553644_1769558775_37450018__[array_upper(rel_state.__madlib_temp_18553644_1769558775_37450018__, 1)] = 0
                    end)
                )

CONTEXT:  Traceback (most recent call last):
  PL/Python function "logregr_train", line 20, in <module>
    return logistic.logregr_train(**globals())
  PL/Python function "logregr_train", line 130, in logregr_train
  PL/Python function "logregr_train", line 253, in __logregr_train_compute
  PL/Python function "logregr_train", line 78, in __compute_logregr
  PL/Python function "logregr_train", line 249, in update
  PL/Python function "logregr_train", line 130, in runSQL
PL/Python function "logregr_train"

Cause

This error is happening because of how you are passing the feature columns (independent variables).

In MADlib, the independent_var argument must be a string that evaluates to a valid array or an actual array column. By providing 'amount, v1, v2, v3, v4, v5, v14, v17', MADlib attempts to wrap those columns in parentheses, which SQL interprets as a record (a tuple), and you cannot cast a record directly to a double precision[] array.

Resolution

You need to explicitly format the feature list as a PostgreSQL array within the string. Use the ARRAY[...] syntax:

 

gpadmin=# SELECT madlib.logregr_train(
gpadmin(#     'fraud.madlib_training_data'::text,
gpadmin(#     'fraud.madlib_logreg_model'::text,
gpadmin(#     'fraud_label'::text,
gpadmin(#     'ARRAY[1, amount, v1, v2, v3, v4, v5, v14, v17]'::text, -- Fixed line
gpadmin(#     NULL::text,
gpadmin(#     20::integer,
gpadmin(#     'irls'::text,
gpadmin(#     0.0001::double precision,
gpadmin(#     false::boolean
gpadmin(# );
 logregr_train
---------------

(1 row)

gpadmin=#

 

 

Why you should add the 1

I added a 1 inside the array (ARRAY[1, amount, ...]). In Logistic Regression, you almost always need an intercept (the "constant" term). MADlib does not add one automatically. If you don't include the 1, your model will be forced to pass through the origin (0,0), which usually leads to a poor fit.

Additional Information

Key Requirements for MADlib Feature Strings

  1. Format: It must be a valid SQL expression that results in a numeric array.

  2. The Intercept: Use 1 as the first element if you want an intercept.

  3. No Quotes inside Quotes: Ensure the internal array does not use extra single quotes that break the string literal.