Inserting into a PostgreSQL database from R using the COPY command is technically possible, but the API is kind of ugly, and it doesn’t have critical functionality like reporting the number of rows affected, or even reporting errors!

How to code it

RPostgreSQL has two functions, postgresqlCopyIn and postgresqlCopyInDataframe, and the (undocumented) intent seems to be that you do a dbSendQuery with your COPY statement, and then call one of the postgresqlCopy* functions. For example, suppose that we want to create a temporary table and we have a data frame to put into it:

> library(RPostgreSQL)
Loading required package: DBI
> con <- dbConnect("PostgreSQL", 'crmp')
> q <- 'CREATE TEMP TABLE copy_test (x double precision, y double precision, z double precision)'
> res <- dbSendQuery(con, q)
> df <- data.frame(x=rnorm(100), y=rnorm(100), z=rnorm(100))

At this point, the correct way to copy data in is this:

> q <- 'COPY copy_test (x, y, z) FROM STDIN'
> res <- dbSendQuery(con, q)
> res
<PostgreSQLResult:(27700,0,8)> 
> postgresqlCopyInDataframe(con, df)
NULL
> dbGetRowsAffected(res)
[1] 0
> ?postgresqlCopyInDataframe
> q <- 'COPY copy_test (x, y, z) FROM STDIN'
> res <- dbSendQuery(con, q)
> postgresqlCopyInDataframe(con, df)
NULL
> dbGetException(con)
$errorNum
[1] 0

$errorMsg
[1] "OK"

>

It would seem sensible for postgresqlCopyInDataframe to just handle sending the COPY statement itself:

> postgresqlCopyInDataframe(con, df)
Error in postgresqlCopyInDataframe(con, df) : 
  RS-DBI driver: (PQputCopyData failed: no COPY in progress
)

but clearly that was not the intent of the API writers.
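Since the two-step dance is unavoidable, you can at least hide it behind a small wrapper. This is just a sketch, not part of RPostgreSQL: the helper name copy_dataframe_in is my own, and it assumes RPostgreSQL is loaded, con is an open connection, and the target table already exists with columns matching the data frame.

```r
# Hypothetical convenience wrapper around the two-step COPY dance.
# Assumes RPostgreSQL is loaded, `con` is an open connection, and
# `table` already exists with columns matching names(df).
copy_dataframe_in <- function(con, table, df) {
  cols <- paste(names(df), collapse = ", ")
  q <- sprintf("COPY %s (%s) FROM STDIN", table, cols)
  dbSendQuery(con, q)                 # put the connection into COPY mode
  postgresqlCopyInDataframe(con, df)  # stream the rows in
  invisible(NULL)                     # nothing useful comes back, as shown above
}
```

After that, copy_dataframe_in(con, 'copy_test', df) replaces the two separate calls, though it inherits every reporting problem described below.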

Rows affected

Unfortunately, RPostgreSQL doesn’t play nicely with DBI, and we can’t get any information about what actually happened with our COPY procedure.

> q <- 'SELECT count(*) FROM copy_test' # Start by counting the rows in the table
> fetch(dbSendQuery(con, q), -1)
  count
1   400
> q <- 'COPY copy_test (x, y, z) FROM STDIN'
> res <- dbSendQuery(con, q)
> postgresqlCopyInDataframe(con, df) # Insert some rows
NULL
> dbGetRowsAffected(res) # Zero rows affected... did it not work?
[1] 0
> q <- 'SELECT count(*) FROM copy_test'
> fetch(dbSendQuery(con, q), -1) # Hmmmm, nope, clearly the new rows are there.
  count
1   500

This isn’t such a big deal when you have a small table; just count the rows before and after. However, if you have a table with hundreds of millions of rows (e.g. obs_raw), it is extremely expensive to count the number of rows (sequential scan, anyone?).
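For the small-table case, the before-and-after check can be wrapped up like this. It is a sketch built from the session above — con, copy_test, and df are as before, and it assumes nothing else is writing to the table concurrently:

```r
# Verify a COPY by counting rows before and after. Workable only on
# small tables: count(*) forces a sequential scan in PostgreSQL.
before <- fetch(dbSendQuery(con, 'SELECT count(*) FROM copy_test'), -1)$count
res <- dbSendQuery(con, 'COPY copy_test (x, y, z) FROM STDIN')
postgresqlCopyInDataframe(con, df)
after <- fetch(dbSendQuery(con, 'SELECT count(*) FROM copy_test'), -1)$count
stopifnot(after - before == nrow(df))  # crude stand-in for dbGetRowsAffected
```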

What about errors?

Yeah… what about those errors?

I have a section of code for bringing in meteorological observations which looked something like this:

  df <- data.frame('history_id'=hist.id, 'vars_id'=vars.id, 'obs_time'=format.time, 'mod_time'=Sys.time(), datum=a[,1,drop=T])
  q <- "COPY obs_raw (history_id, vars_id, obs_time, mod_time, datum) FROM STDIN"
  rv <- query(con, q)
  copy.rv <- postgresqlCopyInDataframe(con, df)

Stepping through the code, after calling postgresqlCopyInDataframe, dbGetException would show everything as being fine.

> dbGetException(con)
$errorNum
[1] 0

$errorMsg
[1] "OK"

However, the next database operation that I would try to do would error out saying:

[1] "Error in postgresqlExecStatement(conn, statement, ...) : \n  RS-DBI driver: (could not Retrieve the result : ERROR:  current transaction is aborted, commands ignored \
until end of transaction block\n)\n"

That would have been nice to know back when I checked the error code, dude. This was extremely puzzling until I checked the Postgres logs, which show this:

2011-10-24 13:33:47 PDT ERROR:  invalid input syntax for type timestamp: "1319488404.79127"
2011-10-24 13:33:47 PDT CONTEXT:  COPY obs_raw, line 1, column mod_time: "1319488404.79127"
2011-10-24 13:33:47 PDT STATEMENT:  COPY obs_raw (history_id, vars_id, obs_time, mod_time, datum) FROM STDIN

Would have been nice if my application could get that error back, but it can’t. All in all, I’m happy that RPostgreSQL exists, but it would be nice if it could handle basic things like passing error messages back to the application.
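Until then, the least-bad workaround I can suggest is to probe the connection right after the COPY: issue a throwaway statement inside tryCatch, and if a bad COPY aborted the transaction, the probe errors out and you can roll back instead of being surprised later. This is a sketch against the same session; copy_ok is my own name, and it assumes the COPY ran inside an explicit transaction block:

```r
# The COPY itself reports no error, but a failed COPY leaves the
# transaction aborted, so any follow-up statement will fail. Probe
# with a no-op query and roll back if the probe errors out.
copy_ok <- tryCatch({
  dbSendQuery(con, 'SELECT 1')  # fails with "current transaction is aborted" if COPY broke
  TRUE
}, error = function(e) {
  dbSendQuery(con, 'ROLLBACK')  # ROLLBACK is accepted even in an aborted transaction
  FALSE
})
if (!copy_ok) warning("COPY appears to have failed; transaction rolled back")
```

You still don't get the actual Postgres error message this way — just a yes/no — but at least the failure surfaces at the COPY site rather than at some unrelated statement later.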




Published

24 October 2011

Category

work
