CouchDB
Status: Just notes so far.
Pagecode: T->2 A->SAml H->trsa[t,a,si]d[t,a,si] C->SA[ccceji]
Last changed: Monday 2010-03-01 [12:21 UTC]
Abstract:
writeme
Quickstart
show contents of couchdb
There are two possibilities:
- A show function directly renders a document using JavaScript
- Transforming Views with list functions; you might use a list
function to render a view result into HTML, which gives you the
opportunity to use more than one document as the input of your
function. See
http://wiki.apache.org/couchdb/Formatting_with_Show_and_List
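A rough sketch of the two options above: a show function gets a single document, a list function walks over view rows. Inside CouchDB the helpers getRow() and send() are supplied by the query server; here they are stubbed so the sketch runs standalone in Node. The field names (title, body) and the sample rows are made up for illustration.

```javascript
// Show function: renders one document (doc, req is the show function signature).
function showPost(doc, req) {
  return "<h1>" + doc.title + "</h1><p>" + doc.body + "</p>";
}

// Stand-ins for the getRow() and send() helpers that CouchDB provides to list functions.
const sampleRows = [
  { key: "2010-02-27", value: "First post" },
  { key: "2010-03-01", value: "Second post" }
];
let rowIndex = 0;
const chunks = [];
function getRow() {
  return rowIndex < sampleRows.length ? sampleRows[rowIndex++] : null;
}
function send(chunk) {
  chunks.push(chunk);
}

// List function: renders many view rows into one HTML list.
function listPosts(head, req) {
  send("<ul>");
  let row;
  while ((row = getRow())) {
    send("<li>" + row.key + ": " + row.value + "</li>");
  }
  send("</ul>");
}

const showHtml = showPost({ title: "Hello", body: "World" }, {});
listPosts({}, {});
const listHtml = chunks.join("");
console.log(showHtml);
console.log(listHtml);
```

The show function only ever sees one document, while the list function can combine as many rows as the view returns.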
query couchdb
- A view is used to query data stored inside CouchDB. The combination
of a map and a reduce function is called a view in CouchDB
terminology.
- Views are defined in design documents and can be replicated across
instances. These design documents contain JavaScript functions
that run queries using the concept of MapReduce.
- CouchDB views can be permanent views stored inside a design
document, or temporary views executed on demand. Temporary views
are resource-intensive and become slower as the amount of data
stored in the database increases. For this reason, CouchDB views
should, for the most part, be created in design documents.
- CouchDB cannot create views across multiple databases.
- It is entirely possible to establish relationships between
documents by having one document store the unique id of another
document. However, these links are not directly supported by
CouchDB, and can be easily broken.
{
  "_id": "CouchDB: Databases and Documents",
  "_rev": "1-704787893",
  "author": "John Wood",
  "email": "john_p_wood",
  "post": "CouchDB is a documented oriented database. A document...",
  "tags": ["couchdb", "couchdb case study", "json"],
  "comments": [
    {
      "email": "joe@somewhere.com",
      "comment": "Thanks for the information"
    },
    {
      "email": "kevin@xyz.com",
      "comment": "CouchDB sounds pretty interesting"
    }
  ]
}
- If there is only one piece of information from the relationship
worth storing in the document, then an array works great (see the
“tags” property above). For relationships with more complex data
structures, an array of dictionaries fits the bill quite nicely
(see the “comments” property above).
- Rather than relying on a generated id, it is recommended that you use the data’s “natural key” as
the id of your document. The natural key is some field, or
combination of fields, in your document that uniquely identifies
that document. In the example above, the title of the blog post is
a good fit for a natural key. It is not very likely that I will be
writing posts with the same title. If you happen to enjoy writing
about the same stuff over and over, perhaps the title of the post
combined with the date and time it was created would be a better
fit. Either way, the id should be composed from data within the
document.
- CouchDB builds views using a map/reduce algorithm. When building a
view, CouchDB will feed all documents that are new or have changed
since the last time the view was built through a map function. The
map function selects the documents of interest for that particular
view. Then, optionally, a reduce function is run to calculate some
aggregate statistics on the documents that have been selected
(counts, sums, etc).
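To make the map/reduce step concrete, here is a toy sketch that runs outside CouchDB. The buildView helper is only a stand-in for the view engine (it ignores incremental updates and rereduce), emit is passed in as a parameter instead of being a global, and the sample documents are invented.

```javascript
// Sample documents; the "tags" field mirrors the blog post example above.
const docs = [
  { _id: "a", tags: ["couchdb", "json"] },
  { _id: "b", tags: ["couchdb"] },
  { _id: "c" } // no tags: the map function emits nothing for it
];

// Map function: selects the data of interest. Inside CouchDB, emit is a global.
function map(doc, emit) {
  if (doc.tags) {
    doc.tags.forEach(function (tag) {
      emit(tag, 1);
    });
  }
}

// Reduce function: sums the emitted values for one key.
function reduce(keys, values) {
  return values.reduce(function (a, b) {
    return a + b;
  }, 0);
}

// Toy stand-in for the view engine: map every doc, group rows by key, reduce each group.
function buildView(allDocs) {
  const grouped = {};
  allDocs.forEach(function (doc) {
    map(doc, function (key, value) {
      (grouped[key] = grouped[key] || []).push(value);
    });
  });
  const result = {};
  Object.keys(grouped).forEach(function (key) {
    result[key] = reduce(null, grouped[key]);
  });
  return result;
}

const tagCounts = buildView(docs);
console.log(tagCounts);
```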
- Views are re-built when they are accessed, and not when new
documents are added to the database or existing documents are
changed. However, you do have control over when views are built.
If you specify stale=ok when accessing your view, CouchDB will
not check to see if the view needs to be re-built. It will simply
return results from the last time the view was built.
- In our case, data is only added to the database once a day, and it
is added by a background job. When the job is finished inserting
data into the CouchDB database, it triggers the views to re-build
themselves by accessing all of the views in the database (a few at
a time), without specifying the stale=ok flag. Since this
background job takes on the responsibility of updating the views
after it inserts new data, the rest of our application can always
specify stale=ok when accessing the views. This keeps the
queries executed by the application fast, even when views are in
the process of being re-built.
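A small sketch of how the two kinds of request could be built. The viewUrl helper, the database name (stats), the design document (reports) and the view (by_day) are all hypothetical; only the /_design/.../_view/... path layout and the stale=ok parameter come from CouchDB itself.

```javascript
// Build the URL for querying a view: /<db>/_design/<ddoc>/_view/<view>?<options>
function viewUrl(base, db, designDoc, view, options) {
  const query = Object.keys(options || {})
    .map(function (k) {
      return encodeURIComponent(k) + "=" + encodeURIComponent(options[k]);
    })
    .join("&");
  const path = base + "/" + db + "/_design/" + designDoc + "/_view/" + view;
  return query ? path + "?" + query : path;
}

// Application reads: accept whatever the index looked like after the last build.
const fastUrl = viewUrl("http://localhost:5984", "stats", "reports", "by_day", { stale: "ok" });

// Background job after the daily import: no stale flag, so the view gets rebuilt.
const rebuildUrl = viewUrl("http://localhost:5984", "stats", "reports", "by_day", {});

console.log(fastUrl);
console.log(rebuildUrl);
```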
- Views live in documents called “design documents”. Views within
the same design document share a B-Tree data structure. This means
that when one view in the design document is built, they all are
built. So, careful planning is required to make sure unrelated
views do not live in the same design document. You would not want
the re-building of one view to delay the accessibility of another,
totally unrelated view.
- CouchDB does support “temporary views”, which are ad-hoc views
that you can build and execute on the fly. However, temporary
views are not recommended for production use, because they need to
be built before they can get you the data you need. This could
take hours, or days depending on the size of your database and the
processing power of your database server. Temporary views are
meant as a way to test new views in development which will
eventually be saved into a design document, and not for running
ad-hoc queries.
- One possibility for reducing disk space used is being careful what
you emit in your mapping functions. If you have a map function
something like:
  function(doc) { emit(doc.key, doc); }
then the second parameter stores a copy of the entire document. If you
know that the particular view will only need one or two fields
from the document, you could construct it like:
  function(doc) {
    emit(doc.key, { "field1": doc.field1, "field2": doc.field2 });
  }
Or, if you do need the whole document, you could write the map as:
  function(doc) { emit(doc.key, null); }
and then query it with the include_docs=true parameter. This takes a
performance hit though, as it has to do a separate disk seek to pull
each document matching each row of the view. When I discussed some of
these aspects on the mailing list the basic response was: if you’re
that worried about disk space then CouchDB might not be suitable for
you.
- I’m a bit behind with these posts, and we’ve since figured out how
to reduce the amount of disk space being used by the views. What
was burning us was not so much what we were emitting, but how many
times we were emitting it. The database in question has
approximately 30 million documents. Each document consists of one
or more messages. And, each map function emitted data from each
message. To reduce the size of the views, we simply needed to cut
back on what we were emitting (as you suggested). Since we didn’t
really need to view the data at this fine level of detail, we
created “summary” documents that contained aggregate data for the
messages at the level of detail we needed. For example, we had
several views that reported certain stats for a given period of
time. Before making the change, we were able to get these stats
for every second, which we didn’t really need. Instead, the
summary documents contain aggregate stats per minute. The views
now aggregate data in these summary documents to get us stats per
minute, hour, day, etc. This cut back dramatically on the amount
of data we were emitting, and the size of the views on disk.
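The summary-document idea could look roughly like this. The message shape (timestamp, bytes) and the summary id scheme are assumptions made up for the sketch, not the layout used in the post quoted above.

```javascript
// Hypothetical per-second message records; field names are invented.
const messages = [
  { timestamp: "2010-03-01T12:21:05Z", bytes: 100 },
  { timestamp: "2010-03-01T12:21:40Z", bytes: 250 },
  { timestamp: "2010-03-01T12:22:10Z", bytes: 75 }
];

// Roll messages up into one summary document per minute.
function summarizeByMinute(msgs) {
  const byMinute = {};
  msgs.forEach(function (m) {
    const minute = m.timestamp.slice(0, 16); // "YYYY-MM-DDTHH:MM"
    if (!byMinute[minute]) {
      byMinute[minute] = { _id: "summary-" + minute, minute: minute, count: 0, bytes: 0 };
    }
    byMinute[minute].count += 1;
    byMinute[minute].bytes += m.bytes;
  });
  return Object.keys(byMinute).sort().map(function (k) {
    return byMinute[k];
  });
}

const summaries = summarizeByMinute(messages);
console.log(summaries);
```

Views then only need to emit one row per summary document instead of one row per message.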
view server
design document
- A design document is a CouchDB document with an id that begins with
_design/
- For instance, the example blog application, Sofa, is stored in a
design document with the id _design/sofa.
- The design doc fields shows and lists contain functions used to
transform raw JSON into HTML, XML or other Content-Types.
- Design documents are a special case: they are the only documents
whose URLs can be used with a literal slash (rather than an encoded
%2F).
- The lib field is used to hold additional JavaScript code and JSON
data to be inserted at deploy time into view, show, and validation
functions.
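A minimal sketch of what such a design document might look like before it is uploaded. The id _design/example and the function names (by_tag, post, index) are hypothetical; the functions are stored as source strings, and as far as I know the show and list functions live under the keys shows and lists.

```javascript
// Functions are stored as source strings inside the design document.
const designDoc = {
  _id: "_design/example", // hypothetical name; only the _design/ prefix matters
  views: {
    by_tag: {
      map: "function(doc) { if (doc.tags) { doc.tags.forEach(function(t) { emit(t, null); }); } }"
    }
  },
  shows: {
    post: "function(doc, req) { return '<h1>' + doc._id + '</h1>'; }"
  },
  lists: {
    index: "function(head, req) { var row; while (row = getRow()) { send(row.key + '\\n'); } }"
  }
};

// This JSON text is what would be sent as the body of the PUT request.
const body = JSON.stringify(designDoc, null, 2);
console.log(body);
```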
Introduction
Installation and Configuration
- CouchDB recovers gracefully from a few isolated map function
failures, but when a map function fails regularly (due to a missing
required field or other JavaScript exception), CouchDB shuts off
its indexing to prevent any further resource usage. For this
reason, it’s important to check for the existence of any fields
before you use them.
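A sketch of the difference a field check makes. The emit function is stubbed here (inside CouchDB the view server provides it), and the author.name field is just an example.

```javascript
// Collect emitted rows; inside CouchDB, emit is provided by the view server.
const rows = [];
function emit(key, value) {
  rows.push({ key: key, value: value });
}

// Unguarded: throws a TypeError when doc.author is missing.
function unsafeMap(doc) {
  emit(doc.author.name, doc._id);
}

// Guarded: skips documents that lack the fields it needs.
function safeMap(doc) {
  if (doc.author && doc.author.name) {
    emit(doc.author.name, doc._id);
  }
}

const docs = [
  { _id: "1", author: { name: "John Wood" } },
  { _id: "2" } // missing author
];

// Count how many documents make the unguarded version fail.
let failures = 0;
docs.forEach(function (doc) {
  try {
    unsafeMap(doc);
  } catch (e) {
    failures += 1;
  }
});

rows.length = 0; // discard rows from the unsafe run
docs.forEach(function (doc) {
  safeMap(doc);
});
console.log(failures, rows);
```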
- You send an HTTP request and you receive a JSON string in the HTTP
response as a result.
- CouchDB stores each database in a single file.
- CouchDB uses the JSON format to store documents
- CouchDB does not guarantee that older versions are kept around. If
you want to store older versions of your data and want them to be
around the next time you look for them, check out chapter xy
Keeping track of History.
- An ETag in HTTP-speak identifies a specific version of a resource.
- Using a local source and a remote target database is called push
replication. We’re pushing changes to a remote server.
- You can also use a remote source and a local target to do a pull
replication. This is great for getting the latest changes from a
server that is used by others.
- Finally, you can run remote replication, where both source and
target are remote, which is mostly useful for management operations.
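The three replication modes differ only in which side is a URL. A sketch of the JSON bodies one would POST to /_replicate; the database name and the example.org hosts are placeholders, and replicationRequest is a helper invented for this note.

```javascript
// Build the JSON body for POST /_replicate.
function replicationRequest(source, target) {
  return { source: source, target: target };
}

const local = "notes"; // hypothetical local database
const remote = "http://example.org:5984/notes"; // hypothetical remote database
const other = "http://backup.example.org:5984/notes"; // second hypothetical remote

const push = replicationRequest(local, remote); // local source, remote target
const pull = replicationRequest(remote, local); // remote source, local target
const remoteOnly = replicationRequest(remote, other); // both ends remote

console.log(JSON.stringify(push));
console.log(JSON.stringify(pull));
console.log(JSON.stringify(remoteOnly));

// Sending one of them needs a running CouchDB (and Node 18+ for fetch), e.g.:
// fetch("http://localhost:5984/_replicate", {
//   method: "POST",
//   headers: { "Content-Type": "application/json" },
//   body: JSON.stringify(push)
// });
```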
Design Documents:
- Design documents are a special type of CouchDB document which
contains application code. Because it runs inside a database, the
application API is highly structured.
- If you read carefully over the last few paragraphs, one clause
stands out: “when you query your view, CouchDB takes the source
code and runs it for you on every document in the database”. If you
have a lot of documents, that takes quite a bit of time and you
might wonder if it is not horribly inefficient to do this. Yes it
would be, but CouchDB is designed to avoid any extra costs: it only
runs through all documents once, when you first query your view. If
a document is changed, the map function is only run once, to
recompute the keys and values for that single document. The view
result is stored in a B-tree, just like the structure which is
responsible for holding your documents. View B-trees are stored in
their own file, so that for high-performance CouchDB usage, you can
keep views on their own disk. The B-tree provides very fast lookups
of rows by key, as well as efficient streaming of rows in a key
range.
- When implemented properly, the use of ETags can cut down
significantly on server load.
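How an ETag saves work, as a simulated exchange. The respond function below is a toy stand-in for the server, not CouchDB code; it assumes the ETag of a document is its revision in double quotes, and the second revision id is made up.

```javascript
// Toy server: answer 304 without a body when the client already has this revision.
function respond(doc, requestHeaders) {
  const etag = '"' + doc._rev + '"';
  if (requestHeaders["If-None-Match"] === etag) {
    return { status: 304, headers: { ETag: etag }, body: null };
  }
  return { status: 200, headers: { ETag: etag }, body: JSON.stringify(doc) };
}

const doc = { _id: "post", _rev: "1-704787893", title: "Hello" };

// First request: no cached copy yet, full response.
const first = respond(doc, {});

// Second request: the client sends the ETag it saw, the server skips the body.
const second = respond(doc, { "If-None-Match": first.headers.ETag });

// After an update the old ETag no longer matches, so the full document is sent again.
doc._rev = "2-123"; // hypothetical new revision
const third = respond(doc, { "If-None-Match": first.headers.ETag });

console.log(first.status, second.status, third.status);
```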
CouchApp:
- We call applications that can be hosted from a standard CouchDB,
CouchApps.
- Applications are stored as design documents. You can replicate
design documents just like everything else in CouchDB. Because
design documents can be replicated, whole CouchApps are replicated.
CouchApps can be updated via replication, but they are also easily
"forked" by the users, who can alter the source code at will.
- Applications that live in CouchDB — nice. You just attach a bunch
of HTML and JavaScript files to a design document and you are good
to go. Spice that up with view-powered queries, and show functions
that render any media type from your JSON documents and you have
all it takes to write self-contained CouchDB applications.
Files
- web server root /usr/share/couchdb/www
CouchApp
Scalability / Fault Tolerance / Load Balancing
- http://code.google.com/p/couchdb-lounge/
- For each design document CouchDB hashes view indexes by map
functions. If you have byte-by-byte identical map functions in two
views that have different reduce functions, the views share a
single index.
- Is there any difference between push and pull mode replication?
Pull replication is a little more efficient than push, due to HTTP
pipelining etc. Update: Adam corrected me on that the other day;
push replication can now use bulk doc updates and is more efficient
again.
Security
Miscellaneous
Benchmarks
- http://jan.prima.de/~jan/plok/archives/175-Benchmarks-You-are-Doing-it-Wrong.html
- http://jan.prima.de/~jan/plok/archives/176-Caveats-of-Evaluating-Databases.html
Financial Applications
- Also, if you are worried about loss of precision with large
integers (with more than 15 digits) you should use a JavaScript
port of BigInteger (or BigDecimal) for calculations and store the
numbers as strings. I'm successfully using BigDecimal for financial
calculations at the moment. It's probably overkill for most people
but it works. See also
http://issues.apache.org/jira/browse/COUCHDB-227 for some links
about this issue.
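To illustrate the precision problem and the store-as-string approach: the sketch below uses the native BigInt type from current JavaScript engines in place of the BigInteger port mentioned above. That is a substitution on my part, and it is meant for client-side code in a recent Node or browser; the JavaScript engine inside CouchDB's view server may not support it. The ledger document is invented.

```javascript
// Above 2^53 - 1, JavaScript numbers can no longer represent every integer.
const asNumber = 9007199254740993; // silently rounded to the nearest representable value
const lossy = asNumber === 9007199254740992;

// Store the amount as a string in the document...
const doc = { _id: "ledger-1", amount: "9007199254740993" };

// ...and convert to an arbitrary-precision integer for arithmetic.
const total = BigInt(doc.amount) + 1n;

// Convert back to a string before saving; JSON.stringify cannot serialize BigInt.
const updated = { _id: doc._id, amount: total.toString() };

console.log(lossy, updated.amount);
```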
REST
Python Frameworks
Werkzeug
JavaScript Libraries / Client