How should PostgreSQL PgAdmin III deal with Updateable Views in the Edit Grid?
I was motivated to put this document together after participating in a discussion[1] on the PostgreSQL IRC Channel (irc://irc.freenode.net#postgresql) in which some PostgreSQL users were questioning the limitation in PgAdmin III that prevents one from using the Edit Grid to edit the contents of an Updateable View.
Arguments for why permitting updating of views is a bad idea can be summarized as follows:
- In PostgreSQL Views are implemented using the Rules (Query Re-Write Rules) System to re-write a select on a dummy table to a different select on one or more other tables.
- Updateable Views extend normal views by the Schema Designer/Admin writing Insert, Update, and/or Delete Re-Write Rules on the same dummy table (effectively, the View).
- Because of 1 & 2, Views and Updateable Views have no concept of a Primary Key for the View itself.
- There is no way that is not unmanageably complex (proof of this is left to the reader - if it exists) for a client application to tease out a meaningful Primary Key or Keys from the definition (Re-Write Rules) of the View/Updateable View.
- Without a Primary Key, permitting update of a record (tuple) returned from a view could have undesirable and/or unexpected consequences like an update that the user believes applies to a single row/tuple actually affecting many rows/tuples in the view.
In answering these arguments I would like to point out the following:
- Normal views created with "create view" ARE NOT by default updateable. If one tries to perform an insert, update, or delete on such a view, it is silently ignored.
- For a view to be updateable, the Schema Designer or Admin needs to take affirmative steps to specifically create insert, update, and/or delete Query Re-Write Rules on the view to perform those actions.
- By taking such an action, the Schema Designer or Admin is creating a contract with the consumer/client of the datatbase/schema guaranteeing that updates, inserts, and/or deletes, if permitted by the rules, will perform actions that are sensible and consistent with the overall definition and purpose of the view within the schema.
So, in PostgreSQL, if a view permits one to issue an update statement against it, then there is an explicit guarantee made by the schema designer or admin that the update will be handled in a fashion that is consistent with the purpose of the view. Therefore, it is not the responsibility of any client application to protect the user from "corrupting their data" by prohibiting them from performing updates against an updateable view. Protecting the data in the database from unknowledgeable or inexperienced users is the job of permissions, rules, constraints, and triggers in the schema created by the schema designer or admin and roles granted by the same to the users. That being said, the fact that an update (or insert or delete) to a view can (and likely will) have consequences beyond a single row/tuple supports the argument that a client application, like PgAdmin III, may want to only permit such updates after warning the user and/or only within an "Expert" mode.
Let us not forget though, that even an update to a perfectly normal table could have similar consequences if the schema designer or admin were to create an unusual sort of update, insert, or delete trigger that did something very unexpected. For example, if a schema designer were to create an update trigger, on an otherwise perfectly normal table, that deleted ALL rows in the table for which one field were the same as the updating row before completing the update, that would be just as disconcerting to the end-user as anything that might occur in an updateable view. Such a situation could not be foreseen by any client application (including PgAdmin III) and so would never be the responsibility of said application.
So, given that we have shown that a client application, like PgAdmin III, should not prohibit the end-user from performing updates against an otherwise updateable view, how should such a client application behave when permitting such updates? It is clear, that unlike the above example where a strange and highly abnormal and unlikely sort of trigger is created on an otherwise normal table, that updates on an updateable view can be expected to have unexpected[2] and surprising results in most cases. It is with this in mind that a case can be made that a client application like PgAdmin III should provide the ability to update a view only after warning the user and/or the user enabling an expert mode after reviewing such a warning. Also, the user-interface should make it clear to the user that the data they are viewing originates from an updateable view and not a normal table and the look-and-feel of updateable view data vs. normal table data should be well documented and explained. Before discussing further the exact nature of the desired user-interface and behavior, it is probably worth discussing how other popular RDBMS systems and tools handle updateable views - in particular, one of the more popular commercial RDBMS systems, Microsoft(R) SQL Server 20xx(TM).
MS-SQL Server(TM) supports update of views by default with only a few minor restrictions. For a view update to succeed in MS-SQL Server, the following conditions must be met (excerpted from MS-SQL Server 2005 Documentation):
- Any modifications, including UPDATE, INSERT, and DELETE statements, must reference columns from only one base table.
- The columns being modified in the view must directly reference the underlying data in the table columns. The columns cannot be derived in any other way, such as through the following:
- An aggregate function: AVG, COUNT, SUM, MIN, MAX, GROUPING, STDEV, STDEVP, VAR, and VARP.
- A computation. The column cannot be computed from an expression that uses other columns. Columns that are formed by using the set operators UNION, UNION ALL, CROSSJOIN, EXCEPT, and INTERSECT amount to a computation and are also not updateable.
- The columns being modified are not affected by GROUP BY, HAVING, or DISTINCT clauses.
The previous restrictions apply to any sub-queries in the FROM clause of the view, just as they apply to the view itself. Generally, the Database Engine must be able to unambiguously trace modifications from the view definition to one base table.
Basically, the above can be summarized as:
By default, only those fields in a view that have a direct, unambiguous mapping to a single field, in a single underlying table are updateable. Furthermore, only fields from a single underlying table can be updated within a single update statement[3].
This, however, is not the end of the story with respect to updateable views in MS-SQL Server. In addition, if the author of the view wishes to provide update capability for a view beyond that normally permitted as described above, said author may create insert, update, and/or delete "INSTEAD OF" triggers on the view that may perform any desired functions to implement consistent update for the view (NOTE: The actions of these triggers entirely replace the actions of the normal DML statement). If the view author creates such triggers, there is no mechanism for any client application, or even the database server/instance itself, to know if the triggers implement consistent or predictable behavior. The triggers can do anything the trigger author desires up to and including performing actions which would cause many rows within the view to change as the result of updating a single row. This behaviour and expectation is entirely consistent, and has no qualitative difference, with the experience of updatable views in PostgreSQL. However, the default case that MS-SQL Server permits updates that apply to a sinlge underlying table's set of fields unambiguously is quite different from the default PostgreSQL behavior of not permitting such updates by default. In fact, the default behaviour of MS-SQL Server can result in extremely undesirable sorts of updates that are not at all congruent with an end-user's expectations[4] and for which, no explicit contract has been made by the view author to provide safe update of the view (though, one can argue, that their is an implied contract because the view author has "chosen" not to restrict update of the view)[5].
So, in summary, the default ability of views to be updated that MS-SQL Server provides is a bad idea because: 1) It requires the end-user to understand that it is, in fact, a view, and the details of the view before performing any update safely, 2) It does not require an explicit contract between the view author and the end-user guaranteeing sensible behaviour of the updates, 3) It is guaranteed to leak details of the abstraction to the end-user because of the undesirable and unexpected behaviour. On the other hand, the explicitly permitted update of views provided by rules in PostgreSQL and by "INSTEAD OF Triggers" in MS-SQL Server, are good because the end-user (or any client application) need not concern itself with the details of the view (or even that it is a view) before performing an update because it may rely on the explicit contract made by the view author to ensure that any updates produce sensible and consistent results. If the view author fails to create rules/triggers that result in consistent and sensible results, then that is a failure of the view author to fulfill the explicit contract and is not a failure of the end-user nor of the client application. This does not mean, however, that the client application has no responsibility for assisting the user with understanding that updates performed against a view will likely result in updates of more than one row, even though, the user's intention is to only update a single row.
The responsibility of a client application like PgAdmin III should be to assist the user in understanding that "editing" a row in a result set generated from a view may, and likely will, result in changes to multiple rows in the viewed data set (NOTE: Let us not forget that any table could have this problem with appropriate triggers, but, that views are much more likely to have this issue). It should not be the responsibility of PgAdmin III to prohibit the end-user from performing such an update as there exists an explicit contract in PostgreSQL between the view author and the end-user guaranteeing that updates will not do something meaningless, dangerous, or inconsistent with the purpose of the view. Therefore, I propose the following behaviour for PgAdmin III (and similar tools) with respect to updateable views:
1. Provide several levels of updatablility for result sets in the edit grid with the default being the "Safest" option as follows:
a. Level 1 (Default) - Only Permit Editing/Update of Normal Tables that have a Primary Key (or other unique/non-null index) that also have no update, insert, or delete triggers (as these can potentially cause "unexpected" results)
b. Level 2 - Only Permit Editing/Update of Normal Tables that have a Primary Key, but, may have update, insert, or delete triggers
c. Level 3 - Only Permit Editing/Update of Normal Tables, even those without Primary Key, and/or with update, insert, or delete triggers and use CTID plus All Fields during updates as a Pseudo-Primary Key for those Tables lacking a Primary Key
d. Level 4 - Permit update of all normal tables, as in Level 3, but also permit update of updateable views using ALL fields in the result-set as the Primary Key (only permit if all fields in the view are in the result set)
2. Provide a configuration option in for the above settings. Whenever the end-user changes the configuration to anything but the default setting, present an appropriate warning explaining the ramifications of permitting updates under such criteria. Provide clear documentation in any associated help/hint files explaining the different levels.
3. When viewing a result set in the edit grid, provide clear visual queues as to which type of data-set the user is viewing. Something like different background color which is clearly documented as well as a text and icon indicator at the top of the result set windows that is also clearly documented.
4. If the user tries to edit a result-set that is a higher edit level (as defined above) for which the user has not already chosen the appropriate setting in the configuration, permit them to enter the edit mode only after displaying the same warning that would be displayed when the user would change the configuration setting to the same level.
In additon to the configuration option, the visual queues, and the warnings, the application should behave slightly differently when the user edits results sets in levels above 1. The application should behave as follows:
1. For Level 1, it should permit the update to the given row, perform the update against the database with the PK, and display the updated data in-place if the update succeeds. If it fails, it should inform the user and show the original data.
2. For Level 2, it should permit the udpate, but, after the update succeeds it should refresh the entire result set and seek (by primary key) to the updateded row and show it in the view in the same display row within the UI that it was originally. If the row no longer exists, it should display an empty row in its place and pop-up a notification. If the update fails, it should display the original data as in 1.
3. Same behavior as Level 2 for tables with insert/update/delete triggers. Same as Level 1 for those without.
4. Same behavior as Level 3 for tables. For views, same behavior as 2.
Given all of the above, update of updateable views in PgAdminIII, and similar applications, should not cause the user to become confused and should permit legitimate updates via those views. Details of the exact presentation and behavior at each level can be fleshed out during development as necessary.
* Footnotes
[1] Discussion from IRC
wide_awake: apparently I'm the only person interested in using the PGAdmin Edit Grid to edit updateable view data. :-/
gbutler: wide_awake: this came up in discussion a few days ago. xzilla was of the opinion that there was no good way to have the edit grid handle views correctly. I felt it was a simple matter of treating all fields as part of the primary key collectively (or a user selected sub-set) with warning.
xzilla: gbutler: and i pointed out that method can easily lead to corrupted data
wide_awake: gbutler: ya, I brought it up a couple days ago. I proposed taking the naive approach, with big fat warnings
* wide_awake heard crickets chirping...
xzilla: gbutler: and also that it doesnt fully handle views across multiple tables
xzilla: gbutler: i'd note we have a similar set of rules for ppa which handle updates on non-pk tables. but the list of gotchas is bigger on views (which is why i never bothered to code up the feature)
wide_awake: gbutler, xzilla there was also a post on planetpostgresql about missing features for SQL conformance
wide_awake: one of which was "E153 Updatable queries with subqueries"
wide_awake: maybe it would make more sense to wait until that's done (or an approach is agreed upon) before barking any further up the tree
gbutler: xzilla: I'm still unable to understand your point regarding corrupted data and multiple table views. If I have defined an updateable view (meaning I've created re-write rules for Insert, Update, Delete), then why is data-corruption or multiple-table issues a concernd of PgAdmin? That is up to *me* as the creator of the re-write rules to make it do something sensible, No? (Or am I missing something big?)
wide_awake: gbutler: +1. In other words, PGAdmin shouldn't prevent you from making use of your re-write rules, if they exist
gbutler: Yep, that is my point! But, if there is something I'm missing, feel free to slap me down.
wide_awake: my suggestion was to add a "Edit (and potentially destroy) Data" option
wide_awake: for views
gbutler: wide_awake: sounds reasonable to me...
wide_awake: just taking the TIAS approach
xzilla: gbutler: the corrupted data problem comes down to how the information is presented vs what is actually happening. ie. you see a row, you decide to edit it, now you find out you edited 2 rows.
gbutler: xzilla: my point is that is a concern for the re-write rules, not anything for PgAdmin to worry about. If editing one row results in changes to 100 rows, that's absolutely fine, if the re-write rules are so defined.
endpoint_david: mastermind: what is the impact of this on a warm-standby or other log-shipping/archiving based approach?
xzilla: gbutler: but how can you tell when that is a failing of the UI or intentional configuration based on rewrite rules ?
gbutler: xzilla: That's why the warning.
xzilla: "warning, i may have just borked your data"
wide_awake: xzilla: it seems like a "don't show me this again" type message that says that editing a view is different than editing table data should suffice
gbutler: xzilla: No. PgAdmin would never have Borked the data. PgAdmin request an update (insert, update, or delete). The re-write rules do with this what the AUTHOR OF THE RULES decided would be appropriate. PgAdmin then displays an updated result set that accurately reflects the intention of the RULES AUTHOR. The warning would be something along the lines of, "When you click to EDIT mode, PgAdmin would display a warning saying something like, "
gbutler: This table is an updateable view. Editing the data may produce unexpected results based upon the update rules defined by the view author. Proceed with caution."
wide_awake: gbutler: it could even let you review the rules before proceeding
gbutler: wide_awake: Sure, although I'm not sure how useful that would be except to an expert.
wide_awake: not terribly, but in my case i am the rules author
wide_awake: hence why I'm so certain I really do want to edit the data
gbutler: Regardless, it is the responsibility of the VIEW AUTHOR to get the rules right, not PgAdmin's concern.
wide_awake: also, wasn't there talk of making VIEWs updatably by default?
xzilla: gbutler: typing in all caps doesnt seem to be helping you understand the problem
xzilla: wide_awake: yes, which is what most of the tools authors are waiting for wrt supporting updatable views
wide_awake: xzilla: that won't change the problem, though, will it?
RhodiumToad: query_: careful use of partial and/or composite indexes will speed that query up a lot
xzilla: wide_awake: presumly it lets people punt on all the corner cases
xzilla: wide_awake: you dont have to deal with the RULE implementation, it makes things simpler
xzilla: wide_awake: updatable views have a very narrow defined feature set
wide_awake: you'll still be able to define your own rules, though, right?
davidfetter: gorthx, when's john naylor speaking at pdxpug?
gbutler: xzilla: I still think you fail to realize that you don't have to deal with the RULE implementation. That is a concern for the rule author. The tool used to do the update, whether PgAdmin or just psql, the result is exactly the same.
xzilla: wide_awake: one would hope, but that would be outside the scope of the updatable option on views
RhodiumToad: query_: for example, have you tried an index on (origin) WHERE active=1 ?
andres: wide_awake: I guess that wont happen that quickly - I think bernd which was doing that work was a bit discouraged for now
xzilla: gbutler: you seem to still not be getting the point about the failing of the UI
query_: no haven't tried that yet. another query uses that index
andres: wide_awake: especially as the current approaches didnt work that well
xzilla: gbutler: when i click on "edit row", i have no way to garauntee that is the row being edited
xzilla: gbutler: even when that is the intention of things
wide_awake: xzilla: I think the point is that it doesn't matter for VIEWs
andres: wide_awake: triggers on views...
gbutler: xzilla: Why do I care? Simply issue: "update foo set fld1 = newval1, fld2 = newal2 ... where fld1 = oldva1 and fld2 = oldval2 and ... fldn = oldvaln"
gbutler: where the "where clause" includes ALL fields and ALL old values.
wide_awake: xzilla: if it's clearly marked as a footgun, why not make it *possible* to edit these views, at least?
xzilla: the old pg philosophy of not actually letting people have footguns
wide_awake: heh
wide_awake: the footgun is already there... allowing plain-sql updates on views
xzilla: some would say putting a gui on it is like loading the footgun and showing you which way to aim
gbutler: wide_awake/xzilla: I would not even consider it a foot-gun. To me an updateable view is an updateable view. Think of it this way, if I have a table (normal table) that has no PK or unique index, I can confidently say, that ALL the fields collectively make the record unique with respect to updates. If there are multiple rows in the DB for that table that are 100% identical, then any update I do on one (including delete) would be done on a
gbutler: ll of them if I treat ALL fields collectively as the PK for update purposes. To me, this is 100% the expected and correct behaviour on a table with no pk or otherwise unique index. Now, if you consider an updateable view as just a special case of a table with no PK/Unique index, what's the problem. I can see none.
xzilla: aside: if you guys want to implement this in ppa, i'm happy to review patches. I think the issues can be worked out, just that it would take some effort.
wide_awake: PGAdmin doesn't let you edit tables without a PK or OID either
xzilla: gbutler: but that is exactly the case where the UI tends to break down.
xzilla: in ppa, if you dont have a pk, we check for unique index, then oids, and count updates on more than 2 rows and return error
xzilla: arguable that could be done for something else, but the ui is broken for this scenario
wide_awake: xzilla: even if view-editing did that, I'd be happy.
wide_awake: I just want it to work in the straight forward case.
wide_awake: but it seems that letting people try it in any case isn't that dangerous
wide_awake: if there's an update rule, it's specifically there to allow the view to be updated.
wide_awake: and if there's not, no harm done.
gbutler: xzilla: but it doesn't have to. If I edit a table that has no PK or OID, then the expected behaviour should be: Refresh the record set (not just the edited record) after the update. For inserts/updates re-position to the new row (using all fields as PK), for deletes, well, that depends, reposition to first row of result set would be my preference.
wide_awake: I'm out of here for the day. If you guys come to a conclusion, please reply to my message on the pgadmin-hackers list
gbutler: That same logic applies just as directly to updateable views.
xzilla: gbutler: you are confusing your desired behavior with the expected behavior
xzilla: if I click on a button that says "edit row", i expect to edit *that* row, not a bunch of rows
gbutler: xzilla: I'd say it would be the expected behavior for tables/updateable views without OID/PK.
xzilla: gbutler: i have user complaints that indicate it is not the generally desired behavior
xzilla: gbutler: and most people dont expect it, because 1) the gui indicates a different behavior, and 2) most people dont know the underlying strcuture of the table when editing rows in it
gbutler: xzilla: I can understand that. I do understand where you are coming from, even if it may not seem so. My only point would be that offering this behaviour, with some sort of warning, would fulfil the requirements/wishes of someone like wide_awake while not breaking things for everyone else. As I said, personally I don't think it even needs a warning (other than clearly showing it is a view/no-PK table as opposed to normal table) but, I th
gbutler: ink enough people would be confused so that a warning is warranted.
xzilla: at one point i played with the idea of selecting ctid in non-pk tables, and then using a combination of ctid/all columns, for update... but this wouldnt work for views
gbutler: xzilla: Yeah, I can understand that. The basic problem is that when I edit a row in the UI, if it is a view, that could potentially update many, many rows of underlying data and hence rows in the view result set. That could be confusing to someone who doesn't understand fully the concept of updateable views, but, it is in fact the expected behaviour for such a view. A appropriate warning should suffice to clear up confusion (I would thin
gbutler: k).
andres: xzilla: and it doesnt work with concurrent updates.
xzilla: andres: right, you potentially get no updates rows from that update, but i figured i could live with that
* endpoint_david wishes SHOW worked with wildcards
andres: xzilla: you even can update the wrong row
xzilla: endpoint_david: agreed, though you can ~* pg_settings.name
xzilla: andres: i dont think so
andres: xzilla: as quite likely youre not doing the whole looking at the table inside a transaction
andres: s/table/view/
xzilla: andres: for purposes of that update, a combinations of ctid/all columns would enfore one row with the data you expected it to have
xzilla: gbutler: another thing to keep in mind is that our sense of updateable views is quite different from say an oracle persons, where they have updateable views which more closely match the spec
andres: xzilla: unlikely to yield problems in practice, yes, but I can sure create some not that likely problem scenarios.
gbutler: xzilla: it seems to me, though, that the concept of updateable views in Postgres is not all that different from MS-SQL (MS-SQL requires you to create insert, update, delete triggers if the column mappings from view to underlying tables is ambiguous --- which it often is). To me, having re-write rules for insert, update, delete is not meaningfully different then insert, update, delete triggers. in either case, when I update a row in the v
gbutler: iew, I have no guarantees as to the number or kind of rows that will be updated.
gbutler: This will result in potentially many rows in the view updating. This to me is expected and desired.
xzilla: andres: well, i dont think i will ever implement that idea, but it seemed pretty good at the time. even in the unlikely event someone updates your old row away, and a new row in place, which just magically happens to match your ctid/columns, since that is exactly what you though you were editing, having your edit go through seems correct
gbutler: I write and use updateable views all the time, and this is the exact behaviour I expect from them.
xzilla: gbutler: hmm, guess i need to study how mssql tools handle these corner cases
xzilla: or better, someone else needs to study it and forward on the information
gbutler: Well, since I've bothered to open my big mouth, I guess I should man up and do that for you???
xzilla: gbutler: i wont ask, i'd hate to end up obligated myself to coding something
gbutler: xzilla: you said earlier, "if I click on a button that says "edit row", i expect to edit *that* row, not a bunch of rows". My answer would be, if I click on a row to edit in the UI where there is no PK, highlight all rows with exactly same fields (that are shown), provide the warning, permit the update, refresh the set, and reposition to the edited row (or first edited row) by positioning based upon all fields/new values.
gbutler: It would seem that would make things be very consistent from a UI perspective.
xzilla: gbutler: in theory we cant do that for updateable views though. i guess in the case of updateable views, we just always punt
xzilla: or maybe, if you click update, we show you all the update rules for that view
gbutler: xzilla: Yep, that too!
gbutler: xzilla: no obligation. When I have some time I'll put together a write-up of how MS-SQL tools handle this (HINT: They don't get it right with respect to what I or wide_awake would want)
[2] Hmmm....something is expected to do something unexpected. Sounds like some sort of '60's recreational pharmaceutical!
[3] Here, as elsewhere in this document, I'm using "update statement" to mean any statement that performs an update to the data in the table or view, including (but not limited to): update, insert, and delete.
[4] For example: Customer and Sale table in a simple "left outer join" view. I issue an update against "customer.name" where "sales.value = $100.00" to a new name. I have now, likely, just changed the name of many customers to a new name. This would almost NEVER be correct. This shows that the default behavior provided by MS-SQL for views is dangerous, unexpected, and completely wrong. What I, as a user, had I not known this were a view and/or did not know the structure of the underlying tables would've expected, would be only those names on rows with $ 100.00 sales would change. Instead, any row, no matter the sale value, would have the name changed, provided the same customer was associated with another row that includes as $100.00 sale. Yuck! Not expected. Updates to implicitly updateable views are extremely dangerous and REQUIRE the user/client/updater to KNOW that they are updating and view and to also KNOW the details fo the view. Leaky abstraction at it's worst!
[5] In this author's humble opinion, an implied contract such as this is not nearly as strong or reliable as the explicit contract created by the view author that takes affirmative action to creates insert/update/delete triggers/rules on a view.