Sum of colum DOUBLE PRECISION not working properly

EdgarCap · December 23, 2022, 4:13pm

Hi,
with Open Virtuoso Open Source Version (Version: 07.20.3233
Build: Apr 29 2022 (000000) I was doing via isql_main.vspx a SUM with GROUP BY of a COLUM definined with DOUBLE PRECISION :

  select "DESC", SUM ("NUM_C") AS NUM, SUM("IMPORTO") AS  TOTDOUBLE FROM  
  "CSV"."DBA"."XXXXX"   GROUP BY  "DESC"

The result of column TOTDOUBLE was wrong, ex row with DESC X8:

        DESC(VARCHAR) NUM(INTEGER)	 TOTDOUBLE(DOUBLE PRECISION)
         X1	 14566	 77768079
         X2	 8942	 53010726
         X3	 20016	 135716820
         X4	 7094	 39881287
         X5	 45126	 277069563
         X6	 26057	 183554148
         X7	 32203	 177582935
         X8	 60497	 390060857

         ...................

I Fixed using DECIMAL for column :

         DESC(VARCHAR) NUM(INTEGER)	TOTDECIMAL(DECIMAL)
         X1	 14566	 77448445
         X2	 8942	 48544209
         X3	 20016	 120022455
         X4	 7094	 41648754
         X5	 45126	 274503264
         X6	 26057	 173246078
         X7	 32203	 204019170
         X8	 60497	 417778465

         ...................

The strange thing is if a use in the select both of the columns (DECIMAL) and (DOUBLE) results are the same…

  select "DESC", SUM ("NUM_C") AS NUM, SUM("IMPORTO") AS TOTDOUBLE,  
   SUM("IMPORTONUM") AS TOTDECIMAL FROM  
  "CSV"."DBA"."XXXXX"   GROUP BY  "DESC":

     DESC(VARCHAR) NUM(INTEGER)	TOTDOUBLE(DOUBLE PRECISION)  TOTDECIMAL  (DECIMAL)
             X1	 14566	 77448445    77448445
             X2	 8942	 48544209    48544209
             X3	 20016	 120022455   120022455
             X4	 7094	 41648754    41648754
             X5	 45126	 274503264   274503264
             X6	 26057	 173246078   173246078
             X7	 32203	 204019170   204019170
             X8	 60497	 417778465   417778465

hwilliams · December 23, 2022, 6:04pm

Do you have a simple test case for recreating this problem, including table schema, sample data and queries run to demonstrate the problem ?

EdgarCap · January 17, 2023, 9:36am

Hi @hwilliams,

this is the table definition:

  drop table "CSV"."DBA"."example_csv";
  create table "CSV"."DBA"."example_csv"
  (
    "ID" INTEGER IDENTITY,
    "DESC" VARCHAR,
    "NUM_C" INTEGER,
    "IMPORTO" DOUBLE PRECISION,
    PRIMARY KEY ("ID")
  );

I will send you the data.

Regards.

hwilliams · January 17, 2023, 2:51pm

I received the CSV data file and loaded it into a latest Virtuoso open source build (Version 7.2.8-rc1.3235-pthreads as of Jan 9 2023 (ab28163ff)).

The first observation is that the data loaded seems different to what is reported in the original post:

SQL> select "DESC", SUM ("NUM_C") AS NUM, SUM("IMPORTO") AS  TOTDOUBLE FROM     "CSV"."DBA"."example_csv"   GROUP BY  "DESC";
"DESC","NUM","TOTDOUBLE"
"F","14491","78425750"
"I","56221","389082278"
"M","7997","11974551"
"R","44540","183303844"
"S","6152","123409553"
"Q","33297","142242056"
"D","23347","113975236"
"G","14347","176494618"
"H","13140","54077768"
"L","17844","94711636"
"A","17040","59388634"
"C","10736","61675378"
"E","35349","247437238"
"nd","2065",""
"T","6305","39016271"
"U","4726","16396253"
"V","30203","316919429"
"B","3346","21147552"
"N","35611","276723257"
"O","17169","79083261"
"P","16956","77000586"

21 Rows. -- 1 msec.
SQL>

I also note that by default the Virtuoso Conductor, I loaded the CSV data with, the datatype of the IMPORTO column is INTEGER:

create table "CSV"."DBA"."example_csv"
(
  "ID" INTEGER IDENTITY,
  "DESC" VARCHAR,
  "NUM_C" INTEGER,
  "IMPORTO" INTEGER,
  PRIMARY KEY ("ID")
);

If I import another copy of the table with another name and make the IMPORTO column datatype to be DOUBLE PRECISION , which you indicated was problematic ie

create table "CSV"."DBA"."example_csv_double"
(
  "ID" INTEGER IDENTITY,
  "DESC" VARCHAR,
  "NUM_C" INTEGER,
  "IMPORTO" DOUBLE PRECISION,
  PRIMARY KEY ("ID")
);

It returns the same results:

SQL> select "DESC", SUM ("NUM_C") AS NUM, SUM("IMPORTO") AS  TOTDOUBLE FROM     "CSV"."DBA"."example_csv_double"   GROUP BY  "DESC";
"DESC","NUM","TOTDOUBLE"
"F","14491","78425750"
"I","56221","389082278"
"M","7997","11974551"
"R","44540","183303844"
"S","6152","123409553"
"Q","33297","142242056"
"D","23347","113975236"
"G","14347","176494618"
"H","13140","54077768"
"L","17844","94711636"
"A","17040","59388634"
"C","10736","61675378"
"E","35349","247437238"
"nd","2065",""
"T","6305","39016271"
"U","4726","16396253"
"V","30203","316919429"
"B","3346","21147552"
"N","35611","276723257"
"O","17169","79083261"
"P","16956","77000586"

21 Rows. -- 4 msec.
SQL>

Although I note in both cases the return values of "nd","2065","", with the SUM("IMPORTO") value being empty, whereas querying the table to see what values the nd values of the DESC column return I see:

SQL> select * FROM     "CSV"."DBA"."example_csv" where "DESC" = 'nd';
"ID","DESC","NUM_C","IMPORTO"
"468","nd","201",""
"576","nd","57",""
"684","nd","731",""
"792","nd","1076","14850"

4 Rows. -- 2 msec.
SQL>

Thus there is one value ie 14850 , which should have been returned as the SUM("IMPORTO") value.

In fact if I import the CSV file with IMPORTO column as NUMERIC, which is then stored as a DECIMAL datatype in the table:

create table "CSV"."DBA"."example_csv_numeric"
(
  "ID" INTEGER IDENTITY,
  "DESC" VARCHAR,
  "NUM_C" INTEGER,
  "IMPORTO" DECIMAL,
  PRIMARY KEY ("ID")
);

Then the expected "nd","2065","14850" result is returned:

SQL> select "DESC", SUM ("NUM_C") AS NUM, SUM("IMPORTO") AS  TOTDOUBLE FROM     "CSV"."DBA"."example_csv_numeric"   GROUP BY  "DESC";
"DESC","NUM","TOTDOUBLE"
"M","7997","8173147"
"E","35349","243103828"
"H","13140","48544209"
"U","4726","15267714"
"O","17169","83305921"
"A","17040","50375607"
"P","16956","76516561"
"nd","2065","14850"
"V","30203","332485977"
"L","17844","75996412"
"F","14491","77448445"
"C","10736","55100881"
"R","44540","173246078"
"I","56221","417778465"
"Q","33297","120022455"
"B","3346","13757558"
"G","14347","204019170"
"D","23347","121298964"
"N","35611","274503264"
"T","6305","41648754"
"S","6152","131427847"

21 Rows. -- 1 msec.
SQL>

So there does appear to be an issue with the DOUBLE PRECISION and INTEGER types when aggregating values, which needs to be looked into.

Please confirm this is the issue with the CSV dataset file provided ?

EdgarCap · January 17, 2023, 3:54pm

Yes, I confirm the issue, yes is another dataset, if you need more info I would provide you.

Regards.

hwilliams · January 17, 2023, 4:32pm

Thanks for confirming, no additional information required at this point as the issue can be seen from the test case and has been reported to development to fix …

EdgarCap · January 18, 2023, 1:23pm

Thanks @hwilliams,

also for developers note the difference of field IMPORTO in the various cases, for example:

----- IMPORTO(DECIMAL) > IMPORTO(DOUBLE)

  "DESC","NUM_C","IMPORTO"
  "G","14347","176494618"  (DOUBLE )
  "G","14347","204019170"  (DECIMAL)

----- IMPORTO(DECIMAL) < IMPORTO(DOUBLE)

  "DESC","NUM_C","IMPORTO"    
  "A","17040","59388634"  (DOUBLE)
  "A","17040","50375607"  (DECIMAL)

The correct value of column IMPORTO is in the TABLE with DECIMAL type.

hwilliams · January 18, 2023, 2:12pm

@EdgarCap Yes, we have observed those differences also, which have been reported to development in the case report …