auf.suno
Connector, geek, tech evangelist, business enabler, business angel, globetrotter, sportsman, agnostic, cosmopolitan, funny finch ...

This is my (Markus Gattol aka Suno Ano) website. It is composed and driven exclusively by Open Source Software. This website
integrates seamlessly into my daily working environment (GNU Emacs + Debian GNU/Linux), which makes it
a fully fledged and automated publishing and communication platform. It will be under construction until 2012.

Open Source / Free Software, because freedom is in everyone's language ...
Frihed Svoboda Libertà Vrijheid เสรีภาพ Liberté Freiheit Cê̤ṳ-iù Ελευθερία Свобода פריי Bebas Libertada 自由
CouchDB
Status: Just notes so far.
Pagecode: T->2 A->SAml H->trsa[t,a,si]d[t,a,si] C->SA[ccceji]
Last changed: Monday 2010-03-01 [12:21 UTC]
Abstract:

writeme
Table of Contents
Quickstart
Introduction
Installation and Configuration
Files
CouchApp
Scalability / Fault Tolerance / Load Balancing
Security
Miscellaneous
Benchmarks
Financial Applications
REST
Python Frameworks
JavaScript Libraries / Client

  • http://wiki.apache.org/couchdb/FrontPage
  • http://johnpwood.net/2009/06/15/couchdb-a-case-study/
  • http://wiki.apache.org/couchdb/Related_Projects
  • http://pylab.blogspot.com/2009/01/ten-reasons-why-couchdb-is-better-than.html

Quickstart

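  • Upload an attachment to an existing document (the document's current revision goes in rev; the URL is quoted so the shell does not treat ? as a glob, and --data-binary sends the file unmodified): curl -vX PUT 'http://127.0.0.1:5984/albums/6e1295ed6c29495e54cc05947f18c8af/artwork.jpg?rev=2-3026105705' --data-binary @duck.jpg -H "Content-Type: image/jpeg"
  • Top-level field names beginning with an underscore are reserved for internal use by CouchDB, for things like _id, _rev, _attachments, _conflicts, etc. (An older bug meant that illegal leading underscores were not rejected in all cases.)
  • CouchDB does not have JOINs; related documents are brought together through view collation instead.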
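To make the collation idea concrete, here is a small sketch: a map function emits a post under the key [post id, 0] and each of its comments under [post id, 1], so sorting the view rows by key puts every post directly in front of its comments, which stands in for a JOIN. The document shapes (type, post_id, title, text) are assumptions, and runView is a hypothetical local stand-in for CouchDB's view engine so the example runs without a server.

```javascript
// Map function: a post is emitted under [post id, 0], each of its comments
// under [post id, 1], so sorting by key puts comments right after their post.
// Assumed document shapes: {type: "post", title} and {type: "comment", post_id, text}.
function postsWithComments(doc) {
  if (doc.type === "post") {
    emit([doc._id, 0], doc.title);
  } else if (doc.type === "comment" && doc.post_id) {
    emit([doc.post_id, 1], doc.text);
  }
}

// Hypothetical local stand-in for a CouchDB view: collects emitted rows and
// sorts them by key. Handles only [string, number] keys and compares strings
// by code unit, whereas CouchDB uses Unicode collation.
function runView(mapFn, docs) {
  const rows = [];
  try {
    for (const doc of docs) {
      globalThis.emit = (key, value) => rows.push({ id: doc._id, key, value });
      mapFn(doc);
    }
  } finally {
    delete globalThis.emit;
  }
  return rows.sort((a, b) => {
    if (a.key[0] !== b.key[0]) return a.key[0] < b.key[0] ? -1 : 1;
    if (a.key[1] !== b.key[1]) return a.key[1] - b.key[1];
    return a.id < b.id ? -1 : a.id > b.id ? 1 : 0; // equal keys: ordered by doc id
  });
}

const docs = [
  { _id: "c2", type: "comment", post_id: "p1", text: "Sounds interesting" },
  { _id: "p1", type: "post", title: "CouchDB intro" },
  { _id: "c1", type: "comment", post_id: "p1", text: "Thanks" },
  { _id: "p2", type: "post", title: "Views" },
];
console.log(runView(postsWithComments, docs).map((r) => r.value).join(" | "));
// CouchDB intro | Thanks | Sounds interesting | Views
```

A single query over a key range such as startkey=["p1"] and endkey=["p1", 2] would then return one post together with all of its comments.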

show contents of couchdb

There are two possibilities:

  • A show function directly renders a document using JavaScript
  • Transforming views with list functions: a list function can render a view result into HTML, which lets you use more than one document as the input of your function. See http://wiki.apache.org/couchdb/Formatting_with_Show_and_List
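A sketch of both kinds of function follows. The show function receives a document and a request and returns a body plus headers; the list function pulls view rows with getRow() and streams output with send(), both of which CouchDB's JavaScript view server provides. The field names (title, post) are assumed from the blog example further down, and renderList with its stubs is a hypothetical local harness, not part of CouchDB.

```javascript
// Show function: renders a single document as HTML. The helper lives inside
// the function because CouchDB stores each show function as standalone source.
// Assumes documents with "title" and "post" fields.
function showPost(doc, req) {
  function esc(s) {
    return String(s).replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
  }
  return {
    headers: { "Content-Type": "text/html" },
    body: "<h1>" + esc(doc.title) + "</h1><p>" + esc(doc.post) + "</p>",
  };
}

// List function: renders all rows of a view result as one HTML list.
// start(), send() and getRow() are supplied by CouchDB's JavaScript view server.
function listKeys(head, req) {
  function esc(s) {
    return String(s).replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
  }
  start({ headers: { "Content-Type": "text/html" } });
  send("<ul>");
  var row;
  while ((row = getRow())) {
    send("<li>" + esc(row.key) + "</li>");
  }
  return "</ul>"; // the returned string is sent as the final chunk
}

// Hypothetical local stubs so the list function can run without CouchDB.
function renderList(listFn, rows) {
  const chunks = [];
  let i = 0;
  globalThis.start = () => {};
  globalThis.send = (chunk) => chunks.push(chunk);
  globalThis.getRow = () => rows[i++]; // undefined once the rows run out
  try {
    chunks.push(listFn({ total_rows: rows.length, offset: 0 }, {}));
  } finally {
    delete globalThis.start;
    delete globalThis.send;
    delete globalThis.getRow;
  }
  return chunks.join("");
}

console.log(showPost({ title: "CouchDB", post: "JSON & views" }, {}).body);
// <h1>CouchDB</h1><p>JSON &amp; views</p>
console.log(renderList(listKeys, [{ key: "couchdb" }, { key: "json" }]));
// <ul><li>couchdb</li><li>json</li></ul>
```

Escaping inside each function is a deliberate choice: document data ends up in HTML, and the functions cannot share helpers unless they are inlined at deploy time.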

query couchdb

  • A view is used to query data stored inside CouchDB. The combination of a map and a reduce function is called a view in CouchDB terminology.
    • Views are defined in design documents and can be replicated across instances. These design documents contain JavaScript functions that run queries using the concept of MapReduce.
    • CouchDB views can be permanent views stored inside design documents, or temporary views executed on demand. Temporary views are resource-intensive and become slower as the amount of data stored in the database increases. For this reason, CouchDB views should, for the most part, be created in design documents.
  • CouchDB cannot create views across multiple databases.
  • it is completely possible to establish relationships between documents by having one document store the unique id of another document. However, these links are not directly supported by CouchDB, and can be easily broken.
{
   "_id": "CouchDB: Databases and Documents",
   "_rev": "1-704787893",
   "author": "John Wood",
   "email": "john_p_wood",
   "post": "CouchDB is a document oriented database.  A document...",
   "tags": ["couchdb", "couchdb case study", "json"],
   "comments": [
      {
         "email": "joe@somewhere.com",
         "comment": "Thanks for the information"
      },
      {
         "email": "kevin@xyz.com",
         "comment": "CouchDB sounds pretty interesting"
      }
   ]
}
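To query documents like the one above by tag, a map function can emit one row per entry in the tags array. The sketch checks that tags exists and is an array before touching it, since a map function that keeps throwing can make CouchDB stop indexing. byTag is a hypothetical view name, and collectRows is a hypothetical local helper that stands in for CouchDB's emit.

```javascript
// Map function for a "posts by tag" view. The ES3-style array check avoids
// relying on newer JavaScript features in the view server.
function byTag(doc) {
  if (Object.prototype.toString.call(doc.tags) !== "[object Array]") return;
  for (var i = 0; i < doc.tags.length; i++) {
    // Each row already carries the document id; query with include_docs=true
    // when the full document is needed.
    emit(doc.tags[i], null);
  }
}

// Hypothetical local helper: runs a map function on one document and
// collects the emitted [key, value] pairs.
function collectRows(mapFn, doc) {
  const rows = [];
  globalThis.emit = (key, value) => rows.push([key, value]);
  try {
    mapFn(doc);
  } finally {
    delete globalThis.emit;
  }
  return rows;
}

const post = {
  _id: "CouchDB: Databases and Documents",
  tags: ["couchdb", "couchdb case study", "json"],
};
console.log(collectRows(byTag, post).length); // 3
console.log(collectRows(byTag, { _id: "untagged" }).length); // 0
```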
  • If there is only one piece of information from the relationship worth storing in the document, then an array works great (see the “tags” property above). For relationships with more complex data structures, an array of dictionaries fits the bill quite nicely (see the “comments” property above).
  • Rather than letting CouchDB generate a UUID for each document, it is recommended that you use the data’s “natural key” as the id of your document. The natural key is some field, or combination of fields, in your document that uniquely identifies that document. In the example above, the title of the blog post is a good fit for a natural key. It is not very likely that I will be writing posts with the same title. If you happen to enjoy writing about the same stuff over and over, perhaps the title of the post combined with the date and time it was created would be a better fit. Either way, the id should be composed from data within the document.
  • CouchDB builds views using a map/reduce algorithm. When building a view, CouchDB feeds all documents that are new or have changed since the view was last built through a map function. The map function selects the documents of interest for that particular view. Then, optionally, a reduce function is run to calculate aggregate statistics on the selected documents (counts, sums, etc.).
  • Views are re-built when they are accessed, not when new documents are added to the database or existing documents are changed. However, you do have control over when views are built. If you specify stale=ok when accessing your view, CouchDB will not check whether the view needs to be re-built. It will simply return results from the last time the view was built.
  • In our case, data is only added to the database once a day, by a background job. When the job has finished inserting data into the CouchDB database, it triggers the views to re-build themselves by accessing all of the views in the database (a few at a time) without the stale=ok flag. Since this background job takes on the responsibility of updating the views after it inserts new data, the rest of our application can always specify stale=ok when accessing the views. This keeps the queries executed by the application fast, even while views are being re-built.
  • Views live in documents called “design documents”. Views within the same design document share a B-tree data structure. This means that when one view in the design document is built, they are all built. So careful planning is required to make sure unrelated views do not live in the same design document. You would not want the re-building of one view to delay access to another, totally unrelated view.
  • CouchDB does support “temporary views”, which are ad-hoc views that you can build and execute on the fly. However, temporary views are not recommended for production use, because they need to be built before they can return the data you need. This could take hours or days, depending on the size of your database and the processing power of your database server. Temporary views are meant for testing new views in development, which will eventually be saved into a design document, not for running ad-hoc queries.
  • One possibility for reducing disk space is being careful about what you emit in your map functions. If you have a map function like function(doc) { emit(doc.key, doc); }, then the second parameter stores a copy of the entire document. If you know the view will only need one or two fields from the document, you could write it as function(doc) { emit(doc.key, {"field1": doc.field1, "field2": doc.field2}); }. Or, if you do need the whole document, you could write the map as function(doc) { emit(doc.key, null); } and then query it with the include_docs=true parameter. This costs some performance, though, as CouchDB has to do a separate disk seek to fetch the document for each row of the view. When I discussed some of these aspects on the mailing list, the basic response was: if you’re that worried about disk space, then CouchDB might not be suitable for you.
  • I’m a bit behind with these posts, and we’ve since figured out how to reduce the amount of disk space used by the views. What was burning us was not so much what we were emitting, but how many times we were emitting it. The database in question has approximately 30 million documents. Each document consists of one or more messages, and each map function emitted data from each message. To reduce the size of the views, we simply needed to cut back on what we were emitting (as you suggested). Since we didn’t really need to view the data at this fine level of detail, we created “summary” documents that contain aggregate data for the messages at the level of detail we needed. For example, we had several views that reported certain stats for a given period of time. Before making the change, we could get these stats for every second, which we didn’t really need. Instead, the summary documents contain aggregate stats per minute. The views now aggregate data in these summary documents to get us stats per minute, hour, day, etc. This cut back dramatically on the amount of data we were emitting and on the size of the views on disk.
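Per-minute summary documents combine well with array keys and the group_level query parameter: if the map function emits [year, month, day, hour, minute] and the reduce sums the values, group_level=4 yields per-hour totals and group_level=3 per-day totals, without any per-second rows. Against a server the query might look like /db/_design/stats/_view/per_minute?group_level=4&stale=ok, where the design document and view names are hypothetical, as is the summary document shape (ts, count). groupLevel below is a hypothetical local imitation of what CouchDB computes.

```javascript
// Map over hypothetical summary documents such as
// {type: "summary", ts: "2010-03-01T12:21", count: 42}.
function perMinute(doc) {
  if (doc.type !== "summary" || typeof doc.ts !== "string") return;
  var p = /^(\d{4})-(\d{2})-(\d{2})T(\d{2}):(\d{2})/.exec(doc.ts);
  if (!p) return;
  emit([+p[1], +p[2], +p[3], +p[4], +p[5]], doc.count);
}

// Reduce: a plain sum, which is what CouchDB's built-in "_sum" computes.
function sumReduce(keys, values, rereduce) {
  var total = 0;
  for (var i = 0; i < values.length; i++) total += values[i];
  return total;
}

// Hypothetical local imitation of ?group_level=N: truncate each key to its
// first N elements and reduce every group. Keys are compared numerically.
function groupLevel(mapFn, reduceFn, docs, level) {
  const groups = new Map();
  globalThis.emit = (key, value) => {
    const k = JSON.stringify(key.slice(0, level));
    if (!groups.has(k)) groups.set(k, []);
    groups.get(k).push(value);
  };
  try {
    for (const doc of docs) mapFn(doc);
  } finally {
    delete globalThis.emit;
  }
  const rows = [...groups].map(([k, values]) => ({ key: JSON.parse(k), value: reduceFn(null, values, false) }));
  return rows.sort((a, b) => {
    for (let i = 0; i < a.key.length; i++) {
      if (a.key[i] !== b.key[i]) return a.key[i] - b.key[i];
    }
    return 0;
  });
}

const summaries = [
  { type: "summary", ts: "2010-03-01T12:20", count: 5 },
  { type: "summary", ts: "2010-03-01T12:21", count: 7 },
  { type: "summary", ts: "2010-03-01T13:05", count: 1 },
];
console.log(JSON.stringify(groupLevel(perMinute, sumReduce, summaries, 4)));
// [{"key":[2010,3,1,12],"value":12},{"key":[2010,3,1,13],"value":1}]
```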

view server

  • http://wiki.apache.org/couchdb/View_server; view servers are registered in the [query_servers] section of /etc/couchdb/default.ini
  • man 1 couchpy
  • http://code.google.com/p/couchdb-python/

design document

  • A design document is a CouchDB document with an id that begins with _design/
  • For instance, the example blog application, Sofa, is stored in a design document with the id _design/sofa.
  • The design doc fields shows and lists contain functions used to transform raw JSON into HTML, XML or other content types.
  • Design documents are a special case: they are the only documents whose URLs can contain a literal slash (the id _design/sofa appears in the URL as-is).
  • The lib field is used to hold additional JavaScript code and JSON data to be inserted at deploy time into view, show, and validation functions.
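A sketch of such a design document, written as a JavaScript object whose functions are stored as source strings (the form they take in the JSON document), together with a helper that builds document URLs and keeps the literal slash of _design/ ids. The database name blog, the design document name and the docUrl helper are hypothetical.

```javascript
// Hypothetical design document; view and show functions are stored as strings.
const designDoc = {
  _id: "_design/blog",
  views: {
    by_tag: {
      map: "function(doc) { if (doc.tags) { for (var i = 0; i < doc.tags.length; i++) emit(doc.tags[i], null); } }",
      reduce: "_count",
    },
  },
  shows: {
    post: "function(doc, req) { return '<h1>' + doc.title + '</h1>'; }",
  },
};

// Build the path for a document id. Ordinary ids are fully percent-encoded
// (so "/" becomes %2F), while design document ids keep their literal slash.
function docUrl(db, id) {
  const prefix = "_design/";
  if (id.startsWith(prefix)) {
    return "/" + encodeURIComponent(db) + "/" + prefix + encodeURIComponent(id.slice(prefix.length));
  }
  return "/" + encodeURIComponent(db) + "/" + encodeURIComponent(id);
}

console.log(docUrl("blog", designDoc._id)); // /blog/_design/blog
console.log(docUrl("blog", "CouchDB: Databases and Documents"));
// /blog/CouchDB%3A%20Databases%20and%20Documents
```

Saving the design document would then be a PUT of JSON.stringify(designDoc) to the first path.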

Introduction

  • collation
    • http://en.wikipedia.org/wiki/Collation
    • http://www.cmlenz.net/archives/2007/10/couchdb-joins
  • json
    • http://www.json.org/

Installation and Configuration

  • The combination of a map- and a reduce function is called a view in CouchDB terminology.
  • CouchDB recovers gracefully from a few isolated map function failures, but when a map function fails regularly (due to a missing required field or other JavaScript exception), CouchDB shuts off its indexing to prevent any further resource usage. For this reason, it’s important to check for the existence of any fields before you use them.
  • You send an HTTP request and you receive a JSON string in the HTTP response as a result.
  • CouchDB stores each database in a single file.
  • CouchDB uses the JSON format to store documents
  • CouchDB does not guarantee that older versions are kept around. If you want to store older versions of your data and want them to be around the next time you look for them, check out chapter xy Keeping track of History.
  • An ETag in HTTP-speak identifies a specific version of a resource.
  • Using a local source and a remote target database is called *push replication*. We’re pushing changes to a remote server.
    • curl -vX POST http://127.0.0.1:5984/_replicate -H "Content-Type: application/json" -d '{"source":"albums","target":"http://127.0.0.1:5984/albums-replica"}'
  • You can also use a remote source and a local target to do a pull replication. This is great for getting the latest changes from a server that is used by others.
    • curl -vX POST http://127.0.0.1:5984/_replicate -H "Content-Type: application/json" -d '{"source":"http://127.0.0.1:5984/albums-replica","target":"albums"}'
  • Finally, you can run remote replication, which is mostly useful for management operations:
    • curl -vX POST http://127.0.0.1:5984/_replicate -H "Content-Type: application/json" -d '{"source":"http://127.0.0.1:5984/albums","target":"http://127.0.0.1:5984/albums-replica"}'
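The three replication modes differ only in which side of the request body is a URL. A sketch that classifies a source/target pair the way the notes above do and builds the JSON body for POST /_replicate; replicationKind, replicationBody and the "local" label for two plain database names are hypothetical, and the commented-out request assumes a running CouchDB and a global fetch (Node 18 or later).

```javascript
const COUCH = "http://127.0.0.1:5984";

// Hypothetical helper: a plain name is a database on the server receiving the
// request; an http(s) URL is treated as remote.
function replicationKind(source, target) {
  const isRemote = (db) => /^https?:\/\//.test(db);
  if (!isRemote(source) && isRemote(target)) return "push";
  if (isRemote(source) && !isRemote(target)) return "pull";
  return isRemote(source) ? "remote" : "local";
}

// Request body for POST /_replicate.
function replicationBody(source, target) {
  return JSON.stringify({ source: source, target: target });
}

const push = ["albums", COUCH + "/albums-replica"];
const pull = [COUCH + "/albums-replica", "albums"];
const remote = [COUCH + "/albums", COUCH + "/albums-replica"];
for (const [source, target] of [push, pull, remote]) {
  console.log(replicationKind(source, target), replicationBody(source, target));
}

// Sending one of them:
// fetch(COUCH + "/_replicate", {
//   method: "POST",
//   headers: { "Content-Type": "application/json" },
//   body: replicationBody(...push),
// }).then((res) => res.json()).then(console.log);
```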

Design Documents:

  • Design documents are a special type of CouchDB document that contains application code. Because this code runs inside the database, the application API is highly structured.
  • One point deserves attention: when you query a view, CouchDB takes the source code of its map function and runs it on every document in the database. If you have a lot of documents, that takes quite a bit of time, and you might wonder whether this is horribly inefficient. It would be, but CouchDB is designed to avoid the extra cost: it runs through all documents only once, when you first query the view. If a document changes, the map function is run again only for that single document, to recompute its keys and values. The view result is stored in a B-tree, just like the structure that holds your documents. View B-trees are stored in their own file, so for high-performance CouchDB usage you can keep views on their own disk. The B-tree provides very fast lookups of rows by key, as well as efficient streaming of rows in a key range.
  • When implemented properly, the use of ETags can cut down significantly on server load.
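The incremental behaviour described above can be sketched with a tiny in-memory database that assigns every write an update sequence number, and a view that remembers the last sequence it indexed: on each query only documents written since then go through the map function, and their old rows are replaced. TinyDb and IncrementalView are hypothetical and ignore reduce, deletions and B-tree storage; the sequence-based bookkeeping mirrors the idea of CouchDB's update sequence rather than its actual code.

```javascript
// Hypothetical in-memory database with an update sequence.
class TinyDb {
  constructor() {
    this.seq = 0;
    this.latest = new Map(); // doc id -> { seq, doc } of its latest write
  }
  put(doc) {
    this.seq += 1;
    this.latest.set(doc._id, { seq: this.seq, doc });
  }
  changesSince(since) {
    return [...this.latest.values()].filter((c) => c.seq > since);
  }
}

// Hypothetical incremental view: only documents changed since the last
// indexed sequence are run through the map function.
class IncrementalView {
  constructor(db, mapFn) {
    this.db = db;
    this.mapFn = mapFn;
    this.lastSeq = 0;
    this.rowsById = new Map();
    this.mapCalls = 0;
  }
  query() {
    try {
      for (const { seq, doc } of this.db.changesSince(this.lastSeq)) {
        const rows = [];
        globalThis.emit = (key, value) => rows.push({ id: doc._id, key, value });
        this.mapFn(doc);
        this.mapCalls += 1;
        this.rowsById.set(doc._id, rows); // replaces this document's old rows
        this.lastSeq = Math.max(this.lastSeq, seq);
      }
    } finally {
      delete globalThis.emit;
    }
    return [...this.rowsById.values()].flat().sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
  }
}

const db = new TinyDb();
db.put({ _id: "a", n: 2 });
db.put({ _id: "b", n: 1 });
const view = new IncrementalView(db, (doc) => emit(doc.n, null));
view.query(); // first query maps both documents
db.put({ _id: "a", n: 3 }); // update one document
console.log(view.query().map((r) => r.key).join(",")); // 1,3
console.log(view.mapCalls); // 3
```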

CouchApp:

  • We call applications that can be hosted from a standard CouchDB, CouchApps.
  • Applications are stored as design documents. You can replicate design documents just like everything else in CouchDB. Because design documents can be replicated, whole CouchApps are replicated. CouchApps can be updated via replication, but they are also easily "forked" by the users, who can alter the source code at will.
  • Applications that live in CouchDB: you just attach a bunch of HTML and JavaScript files to a design document and you are good to go. Add view-powered queries and show functions that render any media type from your JSON documents, and you have all it takes to write self-contained CouchDB applications.
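Files can be attached to a design document inline: the document's _attachments object maps each file name to a content_type and base64-encoded data. The sketch below builds such a design document in Node; the id _design/app, the page content and the withAttachment helper are hypothetical, and tools like couchapp take care of this step when pushing an application.

```javascript
// Return a copy of a design document with one inline attachment added.
function withAttachment(ddoc, name, contentType, text) {
  return {
    ...ddoc,
    _attachments: {
      ...(ddoc._attachments || {}),
      [name]: { content_type: contentType, data: Buffer.from(text, "utf8").toString("base64") },
    },
  };
}

const app = withAttachment(
  { _id: "_design/app" },
  "index.html",
  "text/html",
  "<!DOCTYPE html><title>Hello</title><h1>Hello from CouchDB</h1>"
);

// PUT JSON.stringify(app) to http://127.0.0.1:5984/<db>/_design/app (updating an
// existing design document also needs its current _rev); the page is then
// served from http://127.0.0.1:5984/<db>/_design/app/index.html.
console.log(JSON.stringify(app, null, 2));
```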

Files

  • web server root /usr/share/couchdb/www

CouchApp

  • http://github.com/couchapp/couchapp
  • http://japhr.blogspot.com/2010/02/simple-couchapp.html
  • http://japhr.blogspot.com/2010/02/couchapp-templates-for-showing.html
  • http://japhr.blogspot.com/2010/02/textile-and-partial-templates-in.html
  • http://japhr.blogspot.com/2010/02/couchapp-updates-with-slight-conflict.html

Scalability / Fault Tolerance / Load Balancing

  • http://code.google.com/p/couchdb-lounge/
  • For each design document CouchDB hashes view indexes by map functions. If you have byte-by-byte identical map functions in two views that have different reduce functions, the views share a single index.
  • Is there any difference between push and pull replication? Pull replication is a little more efficient than push, due to HTTP pipelining etc. Update: Adam corrected me on that the other day; push replication can now use bulk document updates and is more efficient again.
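The index-sharing rule can be illustrated by grouping a design document's views by the exact source text of their map functions: views whose map sources are byte-for-byte identical land in the same group, whatever their reduce functions. indexGroups and the stats design document are hypothetical and only mirror the rule as stated; this is not CouchDB's actual hashing code.

```javascript
// Hypothetical: group a design document's views by identical map source.
function indexGroups(ddoc) {
  const groups = new Map(); // map source -> names of views sharing one index
  for (const [name, view] of Object.entries(ddoc.views || {})) {
    if (!groups.has(view.map)) groups.set(view.map, []);
    groups.get(view.map).push(name);
  }
  return [...groups.values()];
}

const SALES_MAP = "function(doc) { if (doc.type === 'sale') emit(doc.date, doc.amount); }";
const stats = {
  _id: "_design/stats",
  views: {
    sales_total: { map: SALES_MAP, reduce: "_sum" },
    sales_count: { map: SALES_MAP, reduce: "_count" },
    by_customer: { map: "function(doc) { if (doc.customer) emit(doc.customer, null); }" },
  },
};

// Two groups: sales_total and sales_count share one index, by_customer has its own.
console.log(JSON.stringify(indexGroups(stats)));
```

Under this rule, adding a second reduce over an existing map function should not require building another index.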

Security

  • http://wiki.apache.org/couchdb/Setting_up_an_Admin_account
  • http://en.wikipedia.org/wiki/Same_origin_policy
  • http://github.com/mcaprari/eopenid

Miscellaneous

  • http://en.wikipedia.org/wiki/Fallacies_of_Distributed_Computing

Benchmarks

  • http://jan.prima.de/~jan/plok/archives/175-Benchmarks-You-are-Doing-it-Wrong.html
  • http://jan.prima.de/~jan/plok/archives/176-Caveats-of-Evaluating-Databases.html

Financial Applications

  • Also, if you are worried about loss of precision with large integers (with more than 15 digits) you should use a JavaScript port of BigInteger (or BigDecimal) for calculations and store the numbers as strings. I'm successfully using BigDecimal for financial calculations at the moment. It's probably overkill for most people but it works. See also http://issues.apache.org/jira/browse/COUCHDB-227 for some links about this issue.

REST

  • http://www.infoq.com/articles/rest-introduction

Python Frameworks

Werkzeug

  • http://werkzeug.pocoo.org/
  • http://dev.pocoo.org/projects/werkzeug/
  • using couchdb with werkzeug is a matter of four lines of code http://bitbucket.org/benoitc/couchit/src/tip/couchit/utils/__init__.py#cl-44

JavaScript Libraries / Client

Creative Commons License
The content of this site is licensed under Creative Commons Attribution-Share Alike 3.0 License.