20:09:44 <MadameZou> #startmeeting
20:09:44 <MeetBot> Meeting started Thu Dec 16 20:09:44 2010 UTC.  The chair is MadameZou. Information about MeetBot at http://wiki.debian.org/MeetBot.
20:09:44 <MeetBot> Useful Commands: #action #agreed #help #info #idea #link #topic.
20:09:52 <enrico> Hello
20:10:01 <MadameZou> #topic Debian package information
20:10:18 <enrico> I'm Enrico Zini and I'll take you on a tour of Debian package information
20:10:28 <enrico> If you have any questions, please direct them to #dw-question, and prefix them with "QUESTION: "
20:10:34 <enrico> Feel free to ask any kind of question, even if it might seem silly -- the answer could help other people
20:10:39 <enrico> aghisla will take care of posting them here at an appropriate moment
20:10:56 <enrico> I am going to talk about information about Debian packages
20:10:58 <enrico> there is a lot of it
20:11:17 <enrico> some we see every day, some we wouldn't even begin to suspect could exist, but it is there
20:11:42 <enrico> The thing you see every day in Debian is packages
20:11:59 <enrico> there are loads of them, we usually say on the order of 25000 or so
20:12:20 <enrico> we install and remove packages from our systems, upload new versions of them and so on
20:12:35 <enrico> we're probably all used to seeing package information with apt-cache
20:12:48 <enrico> for example, "apt-cache show debtags" shows information about the package "debtags"
20:15:54 <enrico> Every package has a name, the format of the name is defined by the Debian policy: for example, it cannot contain underscores, but it can contain dashes
20:15:54 <enrico> Then there is a version, with a more interesting format. The policy defines it as well as how to compare two versions, which is a remarkably interesting problem
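To give a feel for why comparing versions is interesting, here is a minimal Python sketch of the comparison rules as Debian Policy describes them (epoch, upstream version, Debian revision; "~" sorts before everything, letters before other characters, digit runs numerically). The function names are made up for this illustration and this is not the code dpkg or apt use; on a real system "dpkg --compare-versions" is the tool to trust.

```python
DIGITS = "0123456789"


def _order(ch):
    """Sort weight of one non-digit character: '~' first, then letters, then the rest."""
    if ch == "~":
        return -1
    if ch.isalpha():
        return ord(ch)
    return ord(ch) + 256


def _compare_part(a, b):
    """Compare two upstream-version or revision strings; return -1, 0 or 1."""
    i = j = 0
    while i < len(a) or j < len(b):
        # Step 1: compare the leading non-digit runs character by character.
        # A digit or the end of the string weighs 0, so '~' (-1) sorts before it.
        while (i < len(a) and a[i] not in DIGITS) or (j < len(b) and b[j] not in DIGITS):
            wa = _order(a[i]) if i < len(a) and a[i] not in DIGITS else 0
            wb = _order(b[j]) if j < len(b) and b[j] not in DIGITS else 0
            if wa != wb:
                return -1 if wa < wb else 1
            i += 1
            j += 1
        # Step 2: compare the following digit runs numerically (an empty run counts as 0).
        start_i, start_j = i, j
        while i < len(a) and a[i] in DIGITS:
            i += 1
        while j < len(b) and b[j] in DIGITS:
            j += 1
        na = int(a[start_i:i] or 0)
        nb = int(b[start_j:j] or 0)
        if na != nb:
            return -1 if na < nb else 1
    return 0


def _split(version):
    """Split 'epoch:upstream-revision' into (int epoch, upstream, revision)."""
    epoch, sep, rest = version.partition(":")
    if not sep:
        epoch, rest = "0", version
    upstream, sep, revision = rest.rpartition("-")
    if not sep:
        upstream, revision = rest, ""
    return int(epoch), upstream, revision


def compare_versions(a, b):
    """Return -1, 0 or 1 when version a is older than, equal to or newer than b."""
    epoch_a, upstream_a, revision_a = _split(a)
    epoch_b, upstream_b, revision_b = _split(b)
    if epoch_a != epoch_b:
        return -1 if epoch_a < epoch_b else 1
    return _compare_part(upstream_a, upstream_b) or _compare_part(revision_a, revision_b)


print(compare_versions("1.0~rc1", "1.0"))  # -1: the tilde sorts before the end of the string
print(compare_versions("1:0.9", "2.0"))    # 1: a higher epoch always wins
```

The tilde rule is what lets a release candidate such as "1.0~rc1" sort before the final "1.0".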
20:15:54 <enrico> Then there's all the rest that we're used to seeing, like dependencies, descriptions, maintainers and so on
20:15:54 <enrico> Information about packages is used for many different tasks, some are performed by machines and some by humans
20:15:55 <enrico> so you have dependencies, that package managers such as apt or software-center use to decide what is needed for a package to work
20:15:55 <enrico> and you have descriptions, which are used by people to decide whether they'd like to install a package or not
20:16:44 <enrico> these tasks can be nontrivial: dependency resolution is a complex task (so complex there are research centers devoted to studying the problem, which is great because they hire Debian people :)
20:16:55 <enrico> and another complex task is finding the packages you need
20:17:14 <enrico> very often we really really need a package that is in Debian but we don't know how to find it
20:17:23 <enrico> so a good description is important
20:17:45 <enrico> not only to find a package, but to evaluate it, and to compare it with its alternatives before installing it, and so on
20:17:50 <enrico> you're probably familiar with it
20:18:05 <enrico> There are other interesting things in the output of apt-cache
20:18:46 <enrico> like "how big it is". Maybe nowadays we don't care anymore how big the software we install on an average desktop is, but it does make sense on smaller systems
20:19:16 <enrico> it'd be nice for a package manager to be able to compute the space that would be used by a package and all its dependencies; at the moment we don't have that
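As a rough illustration of that idea, here is a small Python sketch that adds up Installed-Size over a package and everything it depends on. The package data is invented for the example, and the sketch makes loud simplifying assumptions: it takes only the first alternative of each "a | b" dependency, ignores version constraints and virtual packages, and does not check what is already installed. It is not how any existing package manager works.

```python
def parse_depends(field):
    """Return the package names in a Depends field, first alternative only."""
    names = []
    for clause in field.split(","):
        first = clause.split("|")[0]          # keep only the first alternative
        name = first.split("(")[0].strip()    # drop any "(>= version)" constraint
        if name:
            names.append(name)
    return names


def total_installed_size(name, packages):
    """Sum Installed-Size (in KiB) over `name` and its transitive dependencies."""
    seen = set()
    stack = [name]
    total = 0
    while stack:
        current = stack.pop()
        if current in seen or current not in packages:
            continue  # already counted, or unknown to this (hypothetical) index
        seen.add(current)
        stanza = packages[current]
        total += int(stanza.get("Installed-Size", 0))
        stack.extend(parse_depends(stanza.get("Depends", "")))
    return total, seen


# Hypothetical data, shaped like fields from a Packages file.
PACKAGES = {
    "editor": {"Installed-Size": "500",
               "Depends": "libgui (>= 1.0), libcore | libcore-compat"},
    "libgui": {"Installed-Size": "2000", "Depends": "libcore"},
    "libcore": {"Installed-Size": "800"},
}

size, closure = total_installed_size("editor", PACKAGES)
print(size, sorted(closure))  # 3300 ['editor', 'libcore', 'libgui']
```

The `seen` set makes sure a shared dependency such as libcore is counted only once.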
20:19:34 <enrico> (talking about package information is a good way to get cool ideas for package managers :)
20:20:01 <enrico> Two fields were recently added to the output of apt-cache: Homepage: and Tag:
20:20:31 <enrico> Homepage is nice: we can learn more about a package by just visiting its website. It's a simple addition that makes package managers much more useful
20:21:04 <magellanino> enrico: but wasn't this field present before?
20:21:07 <enrico> (it'd be nice to have a system that automatically checks the Homepage: fields for broken links: I'm not aware of it existing yet)
20:21:25 <enrico> magellanino: please ask questions in #dw-question, and prefix them with "QUESTION: ": aghisla will take care of posting them here
20:21:37 <magellanino> enrico: ah ok sorry
20:22:53 <enrico> anyway, Homepage and Tag are recent additions. Recent as in 2 or 3 years, IIRC
20:22:54 <enrico> "Tag:" holds categories for packages. There are lots of them available for use; we'll come back to them later when I cover debtags
20:23:33 <enrico> A useful thing for Tag: seen together with the package descriptions is that it gives you lots of extra information like "what programming language is this written in?" "what UI toolkit does it use?" that could be interesting but should really not be in the package descriptions
20:24:10 <enrico> The information you see in "apt-cache show debtags" comes from "Packages files"
20:24:33 <enrico> they are found in Debian mirrors and CDs and acquired by Apt when you do "apt-get update"
20:24:55 <enrico> if you do "ls /var/lib/apt/lists/" you can see your local copies of Packages files acquired by apt
20:25:49 <enrico> Here is where you find Packages files on mirrors: http://ftp.debian.org/debian/dists/squeeze/main/binary-armel
20:25:55 <enrico> (that is for the people who run armel)
20:26:27 <enrico> Every combination of distribution, suite and architecture has a different Packages file
20:26:53 <enrico> so on any computer, apt needs to download at least 2 of them: the one for your architecture and the one for the "all" architecture
20:27:33 <enrico> then it does some merging and indexing and builds the .bin files in /var/cache/apt that it uses to access the package information efficiently
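Packages files are plain text: blank-line-separated stanzas of "Field: value" lines, where a line starting with a space continues the previous field. A minimal Python sketch of reading that layout follows; the sample stanzas are invented, the function name is made up, and this is not apt's own parser (python-debian and python-apt provide real ones).

```python
def parse_stanzas(text):
    """Parse Packages-style text into a list of {field: value} dicts."""
    stanzas = []
    current = {}
    last_field = None
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current stanza.
            if current:
                stanzas.append(current)
            current, last_field = {}, None
        elif line[0] in " \t" and last_field:
            # A leading space or tab continues the previous field.
            current[last_field] += "\n" + line.strip()
        else:
            # Split on the first colon only, so values may contain colons.
            field, _, value = line.partition(":")
            last_field = field.strip()
            current[last_field] = value.strip()
    if current:
        stanzas.append(current)
    return stanzas


# Hypothetical sample, shaped like an excerpt from a Packages file.
SAMPLE = """\
Package: hello-demo
Version: 1.0-1
Architecture: all
Description: short summary
 Longer text on a
 continuation line.

Package: other-demo
Version: 2:0.5-3
Homepage: http://example.org/other
"""

for stanza in parse_stanzas(SAMPLE):
    print(stanza["Package"], stanza["Version"])
```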
20:27:39 <enrico> Any questions so far?
20:27:51 <MadameZou> there's the one by magellanino
20:28:18 <enrico> MadameZou: I think I have answered it
20:28:22 <MadameZou> :)
20:28:44 <MadameZou> so, we are ok
20:28:50 <enrico> Ok. The information we have seen so far is about "binary" packages
20:30:09 <enrico> a binary package is the one you install on your machine. It's called binary because it's been made ready for use by the computer. It is not the source that you download from the package author: it's been compiled and somehow preinstalled so that it can be unpacked on your system
20:30:21 <enrico> in Debian we also have "Source" packages
20:30:49 <enrico> that is, you can download the source code of any package in Debian
20:31:10 <enrico> if you do "apt-cache showsrc debtags" you find information about the sources of Debtags
20:31:27 <enrico> it doesn't work on all systems: you need to have source entries in /etc/apt/sources.list
20:31:50 <enrico> Something like "deb-src http://ftp.uk.debian.org/debian/ sid main"
20:32:11 <enrico> if you have "deb-src" sources, apt will download Sources files from the mirrors, and make them available for you when you do "apt-cache showsrc"
20:32:30 <enrico> In http://ftp.debian.org/debian/dists/squeeze/main/source/ you can see the source files in the mirror
20:33:35 <enrico> You have a different source file per combination of (distribution, suite). But you have a single source package for all architectures. The source package will be compiled once per architecture to build the various binary packages
20:34:36 <enrico> Let's see an example of source package information. You can run "apt-cache showsrc debtags"; I've pasted the output to http://paste.debian.net/102567/ in case you don't have sources available in your /etc/apt/sources.list
20:35:22 <enrico> Some information, like the package name, version and maintainers, is similar. Some is different: for example we have "Build-Depends" instead of "Depends".
20:35:43 <enrico> Build-Depends are the binary packages you need to build this source package
20:36:11 <enrico> They are usually different from Depends: for example you need "gcc" to compile many packages, but not to run them.
20:36:54 <enrico> "Vcs-Browser:" and the other "Vcs-*" tags are another very welcome recent addition: they tell you where you can find the sources of the package in a version control system
20:37:29 <enrico> suppose you find a bug in a package: you can use "apt-cache showsrc" to see where its code is, check it out and start hacking on it
20:38:23 <dapal> enrico: QUESTION: is that (Vcs-*) an upstream source or a debian source?
20:38:51 <enrico> IIRC it's the *debian* source, but please correct me if I remember wrong
20:39:14 <dapal> (yup, it's the Debian source)
20:39:31 <enrico> there is a difference because often the Debian developers have a version control system where they do the packaging, which is not necessarily the same one used by the software author
20:39:59 <enrico> In the description of _binary_ packages you have an interesting header which doesn't always show, and it tells you the name of the source package
20:40:10 <enrico> it's not always the same: one source package can generate many binary packages
20:40:38 <enrico> If you do, for example, "apt-cache show libc6" you'll see "Source: eglibc"
20:41:03 <enrico> there is no "libc6" source package: "libc6" is generated by the "eglibc" sources
20:41:26 <enrico> so apt tells you that if you want to see the sources of libc6, you need to get the "eglibc" source package
20:41:37 <enrico> the "Source:" header is omitted when the names of the source and binary packages are the same
20:42:08 <enrico> "apt-cache showsrc libc6" is smart enough to see the "Source:" header and show you the right source package anyway
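The "Source:" rule described above is easy to sketch in Python: use the Source field when it is there, and fall back to the binary package name when it is omitted. The function name is made up for this illustration, and the stanzas are plain dicts standing in for parsed package information. The field can also carry a version in parentheses, which the sketch strips.

```python
def source_package(stanza):
    """Return the source package name for a binary package stanza."""
    source = stanza.get("Source")
    if not source:
        # Header omitted: source and binary package share the same name.
        return stanza["Package"]
    # "name (version)" is allowed; keep only the name.
    return source.split("(")[0].strip()


print(source_package({"Package": "libc6", "Source": "eglibc"}))  # eglibc
print(source_package({"Package": "debtags"}))                    # debtags
```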
20:42:59 <enrico> you have the opposite header in "apt-cache showsrc": for example, "apt-cache showsrc eglibc" has: "Binary: libc-bin, libc-dev-bin, glibc-doc, eglibc-source, locales, locales-all, [...]"
20:43:50 <enrico> eglibc is a source package that generates many binary packages :)
20:43:50 <enrico> So we've seen binary packages and source packages
20:43:50 <enrico> Any question?
20:44:37 <MadameZou> QUESTION: one source package can generate many binary packages... I don't understand this
20:44:45 <enrico> Good question
20:45:09 <enrico> Think of a source package as the real software you find on the internet
20:45:14 <enrico> for example, "Open Office"
20:45:23 <enrico> or "Firefox"
20:45:33 <enrico> we normally have one source package for them
20:45:52 <enrico> but after compiling it, their build system generates lots of different packages
20:46:13 <enrico> because we don't always want to install all of Open^WLibre Office, or all the translations of Firefox
20:46:49 <enrico> so if you run "apt-cache search openoffice.org" you'll find lots of binary packages, and likely they're all pieces of the single big source
20:47:12 <enrico> does it make sense?
20:47:36 <magellanino> yes yes i understand now thanks enrico
20:47:41 <enrico> :)
20:48:01 <MadameZou> no more questions, enrico
20:48:17 <enrico> So we've seen source packages and binary packages. Maybe it would make sense to count the size of Debian in terms of source packages, but that'd make Debian feel much smaller :)
20:48:34 <enrico> like only 15000 packages or so <grin>
20:48:43 <enrico> which is still a lot
20:49:20 <enrico> Let's see that "Tag:" header
20:49:31 <enrico> it's been introduced to help deal with a large number of packages
20:50:00 <enrico> in the past there was only the "Section:" header, which still exists: you also see it in "apt-cache show"
20:50:13 <enrico> Section is limited, in that one package can only be in one section
20:50:45 <enrico> would you put Evolution in the "mail" section or in the "gnome" section?
20:51:33 <magellanino> mail
20:51:36 <enrico> Both would be appropriate. So we started working on "Debtags" as a way to have a far better category system
20:52:17 <enrico> You can see that the Tag: header has several tags, not just one
20:52:26 <enrico> but Debtags is not just "multiple sections"
20:53:20 <enrico> Every tag is made of two parts, separated by "::"
20:53:33 <enrico> For example, debtags has "role::program"
20:54:05 <enrico> the first part identifies a group of similar tags, which due to habits in library science is called a "facet"
20:54:36 <enrico> so role::* is the group of all roles one package can have in the system
20:54:44 <enrico> program, library, documentation, plugin and so on
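The facet/tag structure can be shown with a few lines of Python that split a comma-separated "Tag:" value on the first "::" and group the tags by facet. The tag list below is a made-up example rather than the real tags of any package, and the function name is hypothetical.

```python
from collections import defaultdict


def group_by_facet(tag_field):
    """Group the tags of a 'Tag:' field value by their facet."""
    facets = defaultdict(list)
    for tag in tag_field.split(","):
        tag = tag.strip()
        # Everything before the first "::" is the facet, the rest is the tag name.
        facet, sep, name = tag.partition("::")
        if sep:
            facets[facet].append(name)
    return dict(facets)


print(group_by_facet("role::program, devel::lang:c, devel::library"))
# {'role': ['program'], 'devel': ['lang:c', 'library']}
```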
20:55:17 <enrico> there is a fantastic web page to browse all available tags, let me find the URL
20:56:01 <enrico> http://debtags.alioth.debian.org/vocabulary/
20:56:23 <enrico> You can see all the facets (groups of tags) and, clicking on a facet, you can see all the options for that group
20:56:29 <enrico> ...all the tags for that group
20:57:09 <enrico> there are 620 different tags available at the moment
20:57:10 <enrico> quite a lot
20:57:23 <enrico> if we didn't have the groups, it'd be really complicated to keep track of them
20:57:38 <enrico> each group is a different "point of view" from which we look at Debian
20:58:15 <enrico> this is called "Faceted Classification" and the Debtags simplification of it is described here: http://debtags.alioth.debian.org/paper-debtags.html#debtags-theoretical-foundations
20:58:46 <enrico> the theory behind it is fascinating, but I won't go into it now
20:59:18 <enrico> There are many things in the Debtags project that are worth looking into
20:59:37 <enrico> one of them is the idea of looking at Debian from different points of view
20:59:49 <enrico> we like to say that "Debian is the universal operating system"
21:00:34 <enrico> but saying that you can do everything with Debian is not really helpful if somebody has a specific need
21:00:51 <enrico> so by using a group of tags we can give examples of what is available for a given field
21:01:02 <enrico> See the "Accessibility Support" group of tags
21:01:34 <enrico> "Biology", "Software Development", "Games and Amusement", "Security", "World Wide Web"...
21:01:59 <enrico> they are all examples of how rich Debian is
21:02:26 <enrico> Debtags is designed so that there are at least 7 packages for each tag
21:02:47 <enrico> this makes tags very concrete, they really represent some bit of Debian
21:03:07 <enrico> (7 comes from http://en.wikipedia.org/wiki/The_Magical_Number_Seven,_Plus_or_Minus_Two)
21:03:28 <dapal> enrico, QUESTION: is there a command to search for tags? Say, if I'm looking for a media player I do: "apt-cache search audio"
21:03:42 <enrico> dapal: very good question
21:03:57 <enrico> having so many (620) different tags calls for a search system for tags
21:04:36 <enrico> over time we put together some interestingly scary smart algorithms to find tags
21:05:14 <enrico> one you can see in "axi-cache": if you have it installed, you can run, for example, "axi-cache search --tags image editor"
21:05:24 <enrico> and it will give you a list of tags that could be related to those keywords
21:05:43 <enrico> axi-cache comes with the package "apt-xapian-index": it is installed by default in many systems, but not all of them
21:05:47 <enrico> more of that later
21:06:20 <enrico> "goplay" is a wonderful little program that shows off the "many different points of view" idea
21:06:43 <enrico> thanks to Miriam Ruiz
21:06:48 <enrico> Here is a screenshot: http://www.miriamruiz.es/img/goplay-1.0_screenshot.png
21:06:58 <enrico> it is a program to find packages, but only *game* packages
21:07:15 <enrico> it can show screenshots, and lets you filter by game categories
21:07:40 <enrico> it's a program that does more with less: it *hides* some information to show only the information that really matters in a given field
21:08:37 <dapal> enrico: <dunetna> QUESTION: Do you think, in the future, Section: header could disappear (and only use debtags)?
21:08:40 <enrico> in the package goplay there are also goadmin, golearn, gosafe and goweb, which are similar to goplay but show a different point of view (for example, system administration)
21:08:51 <enrico> thanks, good question
21:08:54 <enrico> Probably not
21:09:08 <enrico> Sections will be around for quite a while
21:09:33 <enrico> showing an example in a sec
21:09:40 <dapal> enrico: (I also have another question queued, regarding debtags+UDD, do you want it now or later?)
21:10:19 <dapal> in the meanwhile:
21:10:20 <dapal> <valhalla_> QUESTION: the goplay screenshot shows Sexual Content and Violence Content facets, but I can't find them in the Debtags - Vocabulary Browser, why?
21:11:05 <enrico> For example "Section: oldlibs" is used to automatically track packages that need to be ported to newer libraries
21:11:53 <enrico> There is a big difference between "Section" and "Tag": Section is maintained by ftp-master and Tag is maintained by developers and users
21:13:03 <enrico> so Section is a field that can be used to take important decisions on a package, because its editing is much more controlled
21:13:40 <enrico> but Section is going to be something that's used to sort of track the state of a package in Debian, and Tag something used to find a package in Debian
21:13:51 <enrico> I see them evolving in different directions
21:14:08 <enrico> although I reckon this is a rather subtle distinction at this stage
21:14:17 <enrico> more on the "how Debtags is maintained" later
21:14:42 <enrico> about "Sexual Content and Violence Content", that was an experiment by Miriam
21:14:47 <enrico> a very big work, actually
21:15:18 <enrico> debtags allows external tag sources, listed in /etc/debtags/sources.list
21:15:31 <enrico> it will download them and merge them similarly to what apt does with package information
21:15:56 <enrico> this can be used to provide tags that Debian cannot maintain in a standard way
21:16:25 <enrico> for example, many people disagree on the methods to rate a game by violence or sexual content
21:17:15 <enrico> while I don't feel confident in picking one method and making it Universal by adding the information in vanilla Debtags, I'm very happy to allow the content to be merged to a system if the user wants
21:17:26 <enrico> http://www.miriamruiz.es/weblog/?p=69 is some information from Miriam about the project
21:18:23 <enrico> I personally haven't heard news about the game rating project for quite a while, and I lost the link to the debtags source to use for it
21:19:07 <enrico> we had the idea to ship the ratings in a Debian package that one can install and that provides the extra bit of configuration for Debtags
21:19:56 <enrico> (someone should chase Miriam up, and maybe offer her help: it was the main example of external tag data that can be optionally included in a Debian system and I'd hate to lose it)
21:20:43 <enrico> Another example use of external tag sources is to make Debian scale *down* to an organisation
21:21:45 <enrico> for example, a network of schools can maintain its own tag database with things like "school::teacher" "school::primary-education" "school::science-lab" and so on
21:22:32 <enrico> I think the Fuss project played with the idea some time ago, but I don't remember if they eventually deployed it
21:22:44 <enrico> (http://fuss.bz.it/)
21:23:13 <enrico> (The Fuss project is a Debian blend for the Italian-speaking minority schools in the German-speaking area of Italy)
21:23:39 <enrico> Let's move to how Debtags information is edited
21:23:56 <enrico> Obviously we cannot ask Debian developers to learn how to use 620 tags for their packages
21:24:11 <enrico> We could ask them to, but we can't expect them to actually do it well
21:24:36 <enrico> also, users can be better taggers than developers, because they can be field experts in a way the developer is not
21:24:57 <enrico> every IT person who worked with very specialised customers is well aware of this
21:25:46 <enrico> it's common to be asked to write, debug or package software that does things that one cannot understand
21:25:53 <enrico> (at least, it happens to me a lot)
21:26:05 <enrico> so tagging is done in a wiki-like way
21:26:24 <enrico> If you go to http://debtags.alioth.debian.org/todo.html you see a list of packages that need tagging
21:26:31 <enrico> click on a package and you'll have the tag editor
21:26:58 <enrico> the editor is a web application that allows anybody to edit the tags of a package
21:27:25 <enrico> it has interesting features: for example, it tries to suggest tags or ways to improve the classification of a package
21:28:24 <enrico> Debian Developers are of course encouraged to have a look at their packages: in http://qa.debian.org/developer.php?login=enrico for example you can find a "Debtags" link that takes you to a per-developer tagging TODO-list page
21:28:38 <enrico> http://debtags.alioth.debian.org/todo.html?maint=enrico%40debian.org is mine
21:28:56 <enrico> oh dear the interface is telling me off, I should fix some of them
21:29:19 <enrico> note it says things like "There is a 95.4% chance that the tag devel::library is missing"
21:29:25 <dapal> <dunetna> QUESTION: I see the debtag devel::lang:c. Is "lang" a kind of "subfacet"?
21:29:43 <enrico> it uses the same algorithms used by supermarkets to suggest products for you to buy :) but I digress
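A figure like "a 95.4% chance that a tag is missing" can be read as the confidence of an association rule: among packages that carry some set of tags, what fraction also carry another tag. The Python sketch below computes just that one number from invented data; it is a plain co-occurrence count and not the full algorithm the tag editor uses, and the tag names and package names in the sample are hypothetical.

```python
def rule_confidence(tagged_packages, if_tags, then_tag):
    """Fraction of packages carrying all of if_tags that also carry then_tag."""
    if_tags = set(if_tags)
    matching = [tags for tags in tagged_packages.values() if if_tags <= set(tags)]
    if not matching:
        return 0.0
    return sum(1 for tags in matching if then_tag in tags) / len(matching)


# Hypothetical tag database: package name -> set of tags.
TAGGED = {
    "liba": {"role::library", "devel::library"},
    "libb": {"role::library", "devel::library"},
    "libc": {"role::library", "devel::library"},
    "libd": {"role::library"},
    "tool": {"role::program"},
}

# Three of the four packages tagged role::library also have devel::library.
print(rule_confidence(TAGGED, {"role::library"}, "devel::library"))  # 0.75
```

With this data, "libd" is the package an editor would flag: it matches the left side of the rule but lacks the tag that usually comes with it.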
21:29:57 <enrico> dunetna: well spotted
21:31:03 <enrico> I really want to keep the structure of debtags as just 2 levels: facet and tag. We tried trees and gave up because they are extremely difficult to maintain
21:31:56 <enrico> but sometimes we end up having little groups inside a facet, like in devel::lang:c ; it's convenient in that case, but not something I'd like to encourage
21:32:18 <enrico> so I don't like to think of "subfacets" or "subtags"
21:32:48 <dapal> <hlf> QUESTION: if anybody can add tags, is there no SPAM, or false tags?
21:33:08 <dapal> (*cough*)
21:33:12 <enrico> hlf: thanks, good question
21:33:14 <enrico> dapal: :)
21:33:31 <enrico> indeed everybody can edit tags: go to http://debtags.alioth.debian.org/edit.html pick a package and play with it
21:34:00 <enrico> there is a "When done: [Submit]" button that does just that: it saves your edits in the Debtags database
21:34:19 <enrico> and no authentication of any kind: the idea is, you see an issue in the tagging of the package, go there and fix it
21:34:45 <enrico> SPAM is not an issue, because there is no way to send email or even to enter text contents like advertisement: the only thing you can do is add and remove tags
21:35:02 <enrico> there is an issue of quality of course, and possibly vandalism
21:36:04 <enrico> (although if somebody wanted to vandalise debtags, I'd be impressed: there are far more visible and more rewarding things worth messing with :)
21:36:47 <enrico> in case of vandalism, we have daily backups going back to the beginning of the Debtags project: the dataset is small, so backups are cheap :)
21:37:28 <dapal> <dunetna> QUESTION: Can you have two facets with more than one tag for a package? (I'm thinking of works-with-format::)
21:37:30 <enrico> the issue is indeed quality. Sometimes people play with the interface by clicking at random and accidentally submit
21:37:59 <enrico> dunetna: yes. I'll add details in a moment
21:38:04 <dapal> (oops, sorry, thought you had finished)
21:38:27 <enrico> for ensuring quality, what happens is that all submissions are manually reviewed before entering Debian proper
21:38:50 <enrico> they are somehow aggregated so that they are easier to review
21:39:00 <enrico> the review is done by me and dapal
21:39:11 <enrico> big applause to dapal for helping there
21:39:16 <enrico> \o/
21:39:16 <dapal> \o/
21:39:20 * dapal thanks everybody
21:39:50 <enrico> the plan is to design some interface to allow debian maintainers to review submissions for their own packages
21:40:05 <TetsuyO> \o/
21:40:06 <enrico> something like "people think the tagging of your packages should be changed this way:"
21:40:55 <enrico> that interface is technically feasible and we have a decently good idea of how to build it, but it still needs to be written
21:41:07 <enrico> I see it happening in a year or so, to give a rough timeframe
21:41:28 <enrico> with regard to two facets with more than one tag per package: you can, indeed
21:41:52 <enrico> another example is the "use::" facet, and the fact that a package can have many uses (think a web browser)
21:42:33 <enrico> in fact, any attempt to add restrictions to the way tags can be used has succeeded in showing a sizable number of unexpected corner cases where the rule would need to be broken
21:42:49 <enrico> therefore it just makes sense to have no restrictions except common sense
21:43:18 <dapal> enrico: two (three) questions in queue
21:43:25 <enrico> dapal: go ahead
21:43:48 <dapal> so, the in-topic one:
21:43:48 <enrico> let's do all the questions before moving on
21:43:49 <dapal> <MadameZou> QUESTION: are you looking for volunteer to review submissions?
21:43:52 <dapal> nice
21:44:09 <enrico> Always looking for volunteers there :)
21:44:20 <enrico> beware the current procedure is... special
21:45:24 <enrico> so I'm not too actively advertising the need for volunteers because I'm not sure I feel comfortable asking people to do it the way I do it, and I can't think of any better way that can be quickly put into place
21:45:53 <enrico> for that reason I'm very interested in building new "allow people to review" interfaces
21:46:19 <dapal> enrico: next question, <komozo> QUESTION: Could you give us one example (a name or a link) of an algorithm used to implement facets?
21:46:36 <enrico> MadameZou: but in the meantime, by all means if you'd like to get your hands dirty in it you'd make me very happy
21:47:09 <enrico> komozo: what do you mean with "algorithm used to implement facets"?
21:47:12 <MadameZou> enrico: thanks ;)
21:48:07 <enrico> The "supermarket suggestion" algorithm used to give some tagging suggestions is here: http://www.borgelt.net/apriori.html
21:48:53 <enrico> and http://www.enricozini.org/2007/debtags/axi-query-tags/ has the algorithm used for the smart way of searching tags used in "axi-cache search --tags"
21:49:48 <komozo> enrico: thanks
21:49:51 <enrico> komozo: could those be examples of what you're looking for? If not, please ask more :)
21:50:21 <dapal> enrico: next, <valhalla_> QUESTION: is partial tagging better than no tagging, or is it better not to add a few tags to a package if one is not sure it is missing some other tag?
21:50:22 <enrico> more questions?
21:50:29 <dapal> enrico: (that one, and one more)
21:50:51 <enrico> valhalla_: partial tagging is better than no tagging
21:51:04 <enrico> valhalla_: the wiki philosophy works: you do your bit, someone else will do their bit
21:51:28 <enrico> valhalla_: there are "special::not-yet-tagged" tags in the web interface, removing those means one considers the package acceptably tagged
21:51:41 <enrico> valhalla_: worst case you can add some tags but leave it as "not yet tagged"
21:52:06 <enrico> Another interesting bit of the not-yet-tagged tags is that they are used to keep robots away
21:52:31 <enrico> there are tagging "robots" that use heuristics on package information to decide that some tags could be added
21:52:44 <enrico> but they only work on packages that have "not-yet-tagged" tags attached
21:53:12 <enrico> only a human would remove the "not-yet-tagged" tags, so the tagging robots will respect the superior intelligence of humans and stop interfering :)
21:53:23 <TetsuyO> cool :)
21:53:35 <dapal> enrico: QUESTION: <hlf> can we use udd to search for tag like Implemented in C
21:53:58 <enrico> hlf1: I believe there is a debtags table in UDD, yes
21:54:29 <enrico> (http://wiki.debian.org/UltimateDebianDatabase is the page describing UDD)
21:54:38 <enrico> (for those who haven't heard it)
21:54:41 <dapal> (queue empty :))
21:54:49 <enrico> it's the Ultimate Debian Database, a big source of information about Debian
21:55:16 <enrico> I'll move on with the trip, possibly a bit quicker (that is, going into a bit less detail) because there is more
21:55:45 <enrico> An interesting newish piece of software is apt-xapian-index, which we quickly mentioned earlier because of axi-cache
21:56:17 <enrico> apt-xapian-index maintains another index of package information in your system, in /var/lib/apt-xapian-index/
21:56:45 <enrico> it does not replace apt's index, but it adds to it: it's designed to support higher-level queries
21:57:05 <enrico> however, it cannot be used for installing packages because it cannot do dependency resolution (apt does that well, so why reimplement it)
21:57:11 <enrico> axi-cache is a tool that uses apt-xapian-index
21:57:27 <enrico> for example, "axi-cache search image editor" will show you image editors
21:57:44 <enrico> http://paste.debian.net/102573/ is an example in my system
21:58:23 <enrico> It will also suggest terms to improve the search, show a little tag cloud of extra tags you could use (text only, so somehow simplified)
21:58:28 <enrico> and it will also do spell checking
21:58:45 <enrico> axi-cache search firefax -> Did you mean: firefox ?
21:59:22 <enrico> it has really nice tab completion (dapal being the bash-completion maintainer as well as an extremely helpful fellow)
21:59:36 <enrico> axi-cache search <TAB> will start suggesting tags to you
21:59:46 <enrico> axi-cache search image <TAB> will suggest image-related keywords, and so on
22:00:26 <enrico> a really interesting feature of apt-xapian-index is that it can index all sorts of package information, even things that are not found in the Packages file
22:01:04 <enrico> one can implement more indexing features via plugins
22:01:29 <dapal> (another applause?)
22:01:34 <enrico> it's also self-documenting: every indexing run generates an updated version of /var/lib/apt-xapian-index/README which documents what is in the index
22:02:49 <enrico> so debtags tags are indexed for fast lookup in the apt-xapian-index index
22:03:02 <enrico> that's why axi-cache can generate tag clouds and suggest tags so quickly
22:03:29 <enrico> (I want a tag cloud in every graphical package manager! We're almost in 2011!)
22:04:31 <enrico> I was looking for a blog post where I show the algorithm for computing tag clouds but I can't find it right away
22:04:48 <enrico> extra information you find in apt-xapian-index:
22:04:55 <enrico> - "newness of a package"
22:05:29 <enrico> - "GUI menu entries for applications provided by this package" (and their icons)
22:05:40 <enrico> - translated package descriptions
22:06:14 <enrico> for example, you can look for "all packages that provide an application in a menu entry"
22:06:46 <enrico> I used this feature to implement fuss-launcher (http://www.enricozini.org/2010/debian/fuss-launcher/)
22:07:12 <enrico> which was interesting, because it uses Debian package information to look, not for packages, but for programs to run
22:07:55 <enrico> ideally you could write an application launcher that shows, grayed, matching applications that are not installed; then you could ask for information about them, and ask it to install them
22:08:06 <enrico> all the data is there, indexed and queryable in a very fast way
22:08:22 <enrico> "newness" of a package is a very new feature
22:09:13 <enrico> in a nutshell, every time apt-xapian-index sees a package that wasn't there before, it takes note of the date
22:09:31 <enrico> so you could search or sort packages by "how recently they appeared in my system"
22:09:46 <enrico> like the "New packages" view of aptitude, but with history
22:10:00 <enrico> "what was that package that was new last week?"
22:10:24 <enrico> there are currently no UIs I know of that use this information, but the data is there
22:10:46 <enrico> ready to be used
22:11:17 <enrico> "newness" is not information about a package per se, but more like information about a package in a specific system
22:11:28 <enrico> other similar information is "is the package installed?"
22:11:43 <enrico> or "was the package installed automatically or was it explicitly requested by the user?"
22:12:00 <enrico> these you usually find in aptitude or apt
22:12:09 <enrico> there is more
22:12:29 <enrico> if you have popularity-contest installed, you get /var/log/popularity-contest with information about when you last used every package in your system
22:12:50 <enrico> it'd be trivial to write a script that shows you the packages you have installed but never used, using that information
22:13:31 <enrico> (I need a plugin to get that information into apt-xapian-index, so that one can sort packages by "when did I last use it" in the axi-cache results)
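A rough sketch of such a script follows. It assumes that each data line of /var/log/popularity-contest has the layout "<atime> <ctime> <package> <most-recently-used-file> [<tag>]" and that packages with nothing to track are tagged <NOFILES>; check that against your own log before relying on it. Since a file always has some access time, the sketch reports packages not used since a cutoff rather than strictly "never used". The sample log and the function name are invented for the example.

```python
def unused_since(log_text, cutoff):
    """Return packages whose last recorded access time is older than cutoff."""
    stale = []
    for line in log_text.splitlines():
        fields = line.split()
        # Skip the header/footer and anything not starting with two timestamps
        if len(fields) < 3 or not fields[0].isdigit() or not fields[1].isdigit():
            continue
        # Packages without trackable files carry no usable access time
        if "<NOFILES>" in fields:
            continue
        atime, package = int(fields[0]), fields[2]
        if atime < cutoff:
            stale.append(package)
    return stale

# Made-up sample in the assumed format (timestamps are seconds since the epoch)
sample_log = """\
POPULARITY-CONTEST-0 TIME:1292457600 ID:0123456789abcdef ARCH:i386
1292400000 1280000000 debtags /usr/bin/debtags
1250000000 1240000000 goplay /usr/games/goplay <OLD>
0 0 libfoo-doc <NOFILES>
END-POPULARITY-CONTEST-0 TIME:1292457600
"""
print(unused_since(sample_log, 1290000000))
```

On a real system you would pass the contents of /var/log/popularity-contest and a cutoff such as "now minus 90 days". Access times are unreliable on filesystems mounted with noatime.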
22:14:32 <enrico> I mentioned that apt-xapian-index knows what applications are provided by a package, even for packages that are not installed: it can do so thanks to the information provided in the "app-install-data" package, which contains a copy of the .desktop files shipped by any package in Debian
22:14:56 <enrico> it's used to implement "find more applications for this menu" kind of features
22:16:02 <enrico> There is obviously more information about packages: most of it you can find in UDD (http://wiki.debian.org/UltimateDebianDatabase) if you know SQL
22:16:10 <enrico> for example: bug reports
22:16:21 <enrico> or all sorts of information collected by the Debian-QA project
22:17:31 <enrico> ok, that's a general idea of information about Debian packages
22:17:59 <enrico> There is also quite a bit of information about packagers :)
22:18:11 <enrico> http://wiki.debian.org/DDPortfolio is a very good index
22:18:28 <enrico> you can use it to look up everything known about a Debian Developer
22:18:38 <enrico> (people in Front Desk use it quite a bit :)
22:19:29 <enrico> I notice now that I have another page in my notes
22:19:46 <enrico> I could:
22:19:50 <enrico> 1. keep talking for another hour
22:19:57 <enrico> 2. quick fire links about more information
22:20:05 <enrico> 3. keep the rest for another session
22:20:42 <dapal> enrico: in the meanwhile, question
22:20:49 <enrico> unfortunately I can't see from IRC whether you're all listening keenly or snoring loudly :/
22:20:56 <dapal> <nadir> QUESTION: the xapian in apt-xapian-index has got a meaning? I got problems to remember the name... knowing something about xapian might help.
22:21:29 <enrico> that is a very good point
22:21:50 <enrico> it's called Xapian because it's built on the Xapian indexing system http://xapian.org/
22:22:14 <enrico> unfortunately I don't know why they chose that name for their project
22:22:42 <enrico> in hindsight, apt-xapian-index should have had some more memorable name
22:23:10 <enrico> the idea was to not require users to install that package explicitly, but to have it as a dependency of high level package managers
22:23:17 <enrico> for example, goplay depends on apt-xapian-index
22:24:50 <enrico> Ok, so I got two votes for option 1 and none for 2 and 3
22:24:59 <enrico> so there are at least 2 people not snoring loudly :)
22:25:17 <enrico> More package information: popularity contest
22:25:26 <enrico> see http://popcon.debian.org/
22:25:37 <enrico> for example: http://qa.debian.org/popcon.php?package=debtags
22:25:58 <enrico> it shows some statistics of how many people have that package installed
22:26:24 <enrico> it has all sorts of biases, but it's a way to implement a "sort by popularity" feature in a package manager
22:27:07 <enrico> such a feature has not happened yet because there is still no proper way to acquire that information in a Debian system
22:28:07 <enrico> I'd like to have it done at "apt-get update" time, maybe with a file in the mirrors next to the Packages file; that would be the proper way to do it, but it would require coordination among several busy people in Debian
22:28:24 <enrico> still, it's in my wishlist of things to maybe tackle at some Debconf
22:28:50 <enrico> Another data source, really cute one, the EDOS Debian Weather: http://edos.debian.net/weather/
22:29:35 <enrico> It's a research project studying package dependencies
22:30:08 <enrico> they put together some really smart algorithms for checking dependencies, and as a demo they compute how "installable" Debian is on any given day
22:30:23 <enrico> if most packages can be installed fine, they show a sunny icon
22:30:40 <enrico> if there is a moderate number of packages that are uninstallable due to broken dependencies, they show rain
22:31:15 <enrico> if there are a lot of broken packages today, maybe because there is some transition mess going on in sid, they show a thunderstorm icon
22:31:25 <enrico> so you can check what the weather is like before running dist-upgrade
22:31:28 <enrico> genius!
22:32:13 <enrico> I wanted them to make an applet with the Debian Weather to add to my panel, but I'm not aware it has been made yet :-/
22:32:40 <enrico> Another information source: apt-file
22:32:51 <enrico> you can use it to search the contents of packages
22:33:34 <enrico> for example, you hear a friend say "ah, you can do that by running foo". You run "foo" and you get "Command not found": what package contains foo?
22:33:43 <enrico> "apt-file search foo" will tell you
22:33:52 <enrico> it uses the Contents files in the Debian mirrors
22:34:17 <enrico> if you look at http://ftp.debian.org/debian/dists/squeeze/ you'll see the Contents files
22:34:34 <enrico> they're very big, as they list the name of every file provided by every package
22:34:54 <enrico> in order to run apt-file, you need to run "apt-file update", which will download the right Contents files for your system
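To give an idea of what a search over a Contents file involves, here is a minimal sketch. It assumes each line holds a file path followed by whitespace and a comma-separated list of "section/package" entries; it is not how apt-file itself is implemented, and the sample lines are made up for illustration.

```python
def search_contents(contents_text, pattern):
    """Map each matching file path to the packages that ship it."""
    matches = {}
    for line in contents_text.splitlines():
        # The package list is the last whitespace-separated column;
        # splitting from the right keeps paths that contain spaces intact.
        parts = line.rsplit(None, 1)
        if len(parts) != 2:
            continue
        path, packages = parts
        if pattern in path:
            # Each entry looks like "section/package"; keep the package name
            matches[path] = [p.rsplit("/", 1)[-1] for p in packages.split(",")]
    return matches

# Illustrative sample, not copied from a real Contents file
sample = """\
usr/bin/R                     gnu-r/r-base-core
usr/bin/debtags               admin/debtags
usr/share/doc/foo/README      misc/foo,misc/foo-doc
"""
print(search_contents(sample, "bin/R"))
```

A real Contents file is compressed and very large, so in practice you would stream it line by line (for example through the gzip module) instead of loading it as one string.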
22:35:18 <enrico> if you're in a hurry, you can also use "rapt-file", which is also in the "apt-file" package. The "r" stands for remote
22:36:10 <enrico> so if you want to find out what is the package that provides GNU R, and "apt-cache search r" is not very helpful, you can use "rapt-file search bin/R"
22:36:36 <enrico> (alternatively, you can use "axi-cache search r" and wow yes it is that smart, it does the right thing)
22:37:18 <enrico> Questions so far?
22:37:47 <dapal> enrico: none in #dw-question
22:38:06 <enrico> If you're interested in tracking what happens in a package, there is also the Package Tracking System
22:38:10 <dapal> err
22:38:13 <dapal> there's a question
22:38:18 <enrico> at http://packages.qa.debian.org
22:38:36 <enrico> dapal: I'll take the question
22:38:47 <dapal> <nadir> Question: i ran across special signs, where apt-cache failed (often a + sign). Is axi-cache a way out?
22:39:24 <enrico> Good question.
22:39:59 <enrico> axi-cache delegates most of indexing and query parsing to Xapian, so it boils down to how Xapian treats special signs
22:40:20 <enrico> it looks like the + sign is handled properly: at least "axi-cache search a+" finds the A+ programming language
22:41:28 <enrico> I wouldn't know for sure about other characters, at least not without looking up the documentation of Xapian's TermGenerator and QueryParser
22:42:07 <enrico> talking about QueryParser documentation, http://xapian.org/docs/queryparser.html is a good piece of documentation for axi-cache
22:42:31 <enrico> you can for example do "axi-cache search mail AND NOT implemented::php"
22:43:15 <enrico> (...implemented-in::php)
22:43:56 <enrico> back to the package tracking system
22:44:40 <enrico> The Package Tracking System (packages.qa.debian.org) is a tool to track everything about a package
22:45:20 <enrico> If you look for example at http://packages.qa.debian.org/d/debtags.html you'll find a page with the package status and all sorts of links to every possible information available about it
22:46:05 <enrico> and in the bottom left of the page there is a little half hidden box where you can add your e-mail address to be kept "in the loop" about many things that happen to the package
22:46:50 <enrico> the little selection next to the email field has three options: sub/unsub/opts
22:47:10 <enrico> sub for subscribe, unsub for unsubscribe. Opts for subscription options
22:47:45 <enrico> http://paste.debian.net/102578/ is a list of all available subscription options
22:48:02 <enrico> it is a really nice tool
22:48:29 <enrico> you can get for example a copy of all mails reporting a new bug in a package
22:48:50 <enrico> or a mail with the changelog of every new version of the package uploaded in Debian
22:50:07 <enrico> Finally, still a bit of a work in progress, we have Debian Data Export: http://dde.debian.net/dde/ which is a web application to make it easy to download information about Debian packages
22:50:27 <enrico> it is currently used as the remote backend for rapt-file
22:51:41 <enrico> I'm looking for an example URL, a sec...
22:52:24 <enrico> For example, http://dde.debian.net/dde/q/bts/bynumber/123456 will give you all available information about Debian bug 123456
22:52:42 <enrico> by default, it shows a page with a bit of documentation
22:53:07 <enrico> but you can add ?t=FORMAT and it will give you the same information in a format of your choice: for example, http://dde.debian.net/dde/q/bts/bynumber/123456?t=json
22:53:45 <enrico> http://dde.debian.net/dde/ lists the formats that are available: currently JSON, YAML, CSV and Python Pickled objects
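A small sketch of how a script might build and query such URLs, using only the Python standard library. The base URL and the "t" parameter come from the examples above; the helper names are made up, the shape of the returned data is not assumed, and the service may not be reachable when you try it.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

DDE_BASE = "http://dde.debian.net/dde/q"

def dde_url(path, fmt=None):
    """Build a DDE query URL, optionally asking for a specific export format."""
    url = DDE_BASE + "/" + path.strip("/")
    if fmt:
        url += "?" + urlencode({"t": fmt})
    return url

def dde_fetch_json(path):
    """Download a DDE query result as JSON and decode it (needs network access)."""
    with urlopen(dde_url(path, "json")) as response:
        return json.loads(response.read().decode("utf-8"))

print(dde_url("bts/bynumber/123456", "json"))
```

Calling dde_fetch_json("bts/bynumber/123456") would then return whatever structure the server exports for that bug.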
22:54:11 <enrico> the JSON export is interesting: a DDE plugin can become the backend for a Javascript web application
22:54:50 <enrico> that is, incidentally, how I intend to implement the interface for maintainers to approve changes to the Debtags tags of their packages
22:55:46 <enrico> This more or less brings us to the end of my notes
22:56:23 <enrico> I'd take the final questions now
22:56:34 <enrico> (if any)
22:57:12 <enrico> Personal reflection of mine: we have way more information than we currently show
22:57:49 <dapal> seems like there are no questions?
22:57:59 <enrico> there is an incredible amount of neat applications that can be built on it
22:58:36 <enrico> I hope this trip can inspire more such applications to appear :)
22:59:01 <dapal> MadameZou: time to #endmeeting ? :)
22:59:13 <MadameZou> dapal: yes sir
22:59:29 <MadameZou> #endmeeting