I received the CSV data file and loaded it into a latest Virtuoso open source build (Version 7.2.8-rc1.3235-pthreads as of Jan 9 2023 (ab28163ff)
).
The first observation is that the data loaded seems different to what is reported in the original post:
SQL> select "DESC", SUM ("NUM_C") AS NUM, SUM("IMPORTO") AS TOTDOUBLE FROM "CSV"."DBA"."example_csv" GROUP BY "DESC";
"DESC","NUM","TOTDOUBLE"
"F","14491","78425750"
"I","56221","389082278"
"M","7997","11974551"
"R","44540","183303844"
"S","6152","123409553"
"Q","33297","142242056"
"D","23347","113975236"
"G","14347","176494618"
"H","13140","54077768"
"L","17844","94711636"
"A","17040","59388634"
"C","10736","61675378"
"E","35349","247437238"
"nd","2065",""
"T","6305","39016271"
"U","4726","16396253"
"V","30203","316919429"
"B","3346","21147552"
"N","35611","276723257"
"O","17169","79083261"
"P","16956","77000586"
21 Rows. -- 1 msec.
SQL>
I also note that by default the Virtuoso Conductor, I loaded the CSV data with, the datatype of the IMPORTO
column is INTEGER
:
create table "CSV"."DBA"."example_csv"
(
"ID" INTEGER IDENTITY,
"DESC" VARCHAR,
"NUM_C" INTEGER,
"IMPORTO" INTEGER,
PRIMARY KEY ("ID")
);
If I import another copy of the table with another name and make the IMPORTO
column datatype to be DOUBLE PRECISION
, which you indicated was problematic ie
create table "CSV"."DBA"."example_csv_double"
(
"ID" INTEGER IDENTITY,
"DESC" VARCHAR,
"NUM_C" INTEGER,
"IMPORTO" DOUBLE PRECISION,
PRIMARY KEY ("ID")
);
It returns the same results:
SQL> select "DESC", SUM ("NUM_C") AS NUM, SUM("IMPORTO") AS TOTDOUBLE FROM "CSV"."DBA"."example_csv_double" GROUP BY "DESC";
"DESC","NUM","TOTDOUBLE"
"F","14491","78425750"
"I","56221","389082278"
"M","7997","11974551"
"R","44540","183303844"
"S","6152","123409553"
"Q","33297","142242056"
"D","23347","113975236"
"G","14347","176494618"
"H","13140","54077768"
"L","17844","94711636"
"A","17040","59388634"
"C","10736","61675378"
"E","35349","247437238"
"nd","2065",""
"T","6305","39016271"
"U","4726","16396253"
"V","30203","316919429"
"B","3346","21147552"
"N","35611","276723257"
"O","17169","79083261"
"P","16956","77000586"
21 Rows. -- 4 msec.
SQL>
Although I note in both cases the return values of "nd","2065",""
, with the SUM("IMPORTO")
value being empty, whereas querying the table to see what values the nd
values of the DESC
column return I see:
SQL> select * FROM "CSV"."DBA"."example_csv" where "DESC" = 'nd';
"ID","DESC","NUM_C","IMPORTO"
"468","nd","201",""
"576","nd","57",""
"684","nd","731",""
"792","nd","1076","14850"
4 Rows. -- 2 msec.
SQL>
Thus there is one value ie 14850
, which should have been returned as the SUM("IMPORTO")
value.
In fact if I import the CSV file with IMPORTO
column as NUMERIC
, which is then stored as a DECIMAL
datatype in the table:
create table "CSV"."DBA"."example_csv_numeric"
(
"ID" INTEGER IDENTITY,
"DESC" VARCHAR,
"NUM_C" INTEGER,
"IMPORTO" DECIMAL,
PRIMARY KEY ("ID")
);
Then the expected "nd","2065","14850"
result is returned:
SQL> select "DESC", SUM ("NUM_C") AS NUM, SUM("IMPORTO") AS TOTDOUBLE FROM "CSV"."DBA"."example_csv_numeric" GROUP BY "DESC";
"DESC","NUM","TOTDOUBLE"
"M","7997","8173147"
"E","35349","243103828"
"H","13140","48544209"
"U","4726","15267714"
"O","17169","83305921"
"A","17040","50375607"
"P","16956","76516561"
"nd","2065","14850"
"V","30203","332485977"
"L","17844","75996412"
"F","14491","77448445"
"C","10736","55100881"
"R","44540","173246078"
"I","56221","417778465"
"Q","33297","120022455"
"B","3346","13757558"
"G","14347","204019170"
"D","23347","121298964"
"N","35611","274503264"
"T","6305","41648754"
"S","6152","131427847"
21 Rows. -- 1 msec.
SQL>
So there does appear to be an issue with the DOUBLE PRECISION
and INTEGER
types when aggregating values, which needs to be looked into.
Please confirm this is the issue with the CSV dataset file provided ?