Age | Commit message | Author | Files | Lines |
|
Signed-off-by: Dan McGee <dan@archlinux.org>
|
|
The real reason I originally added transactions to this code was to
prevent half-updates; e.g. a package gets in without the matching
depends values. We can safely commit between packages and resume
processing the database at a later time.
Take advantage of this fact and commit in batches every so often when we
have a lot of updates piling up. When updating the files DB, this avoids
holding open a long-running, statement-heavy transaction and makes the
information public sooner.
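
A minimal sketch of the batching idea, using the standard library's sqlite3 module rather than the Django transaction handling the script itself relies on. The table layout, `BATCH_SIZE`, and `import_packages` are hypothetical names chosen for illustration only.

```python
import sqlite3

BATCH_SIZE = 100  # hypothetical threshold; not taken from the actual script


def import_packages(conn, packages, batch_size=BATCH_SIZE):
    """Insert packages with their depends, committing only between packages."""
    commits = 0
    pending = 0
    for name, depends in packages:
        # A package and its depends rows always share one transaction, so a
        # commit can never expose a package without its matching depends.
        conn.execute("INSERT INTO package (name) VALUES (?)", (name,))
        conn.executemany(
            "INSERT INTO depend (pkgname, depname) VALUES (?, ?)",
            [(name, dep) for dep in depends],
        )
        pending += 1 + len(depends)
        if pending >= batch_size:
            conn.commit()
            commits += 1
            pending = 0
    conn.commit()  # flush whatever is left over
    return commits + 1


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE package (name TEXT PRIMARY KEY)")
conn.execute("CREATE TABLE depend (pkgname TEXT, depname TEXT)")
conn.commit()
packages = [("pkg%d" % i, ["glibc", "bash"]) for i in range(250)]
total_commits = import_packages(conn, packages)
```

The commit check sits at the end of the loop body, so a batch boundary always falls between two packages and never inside one.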
Signed-off-by: Dan McGee <dan@archlinux.org>
|
|
Now that we aren't seeing odd segfaults and hung tasks, we can remove
the traceback stuff from the scripts. Also use only the 'io' module; it
has been long enough.
Signed-off-by: Dan McGee <dan@archlinux.org>
|
|
Rather than the twisted mix of local times and UTC times we currently have.
Signed-off-by: Dan McGee <dan@archlinux.org>
|
|
Signed-off-by: Dan McGee <dan@archlinux.org>
|
|
Ensure all our multivalued attributes already exist on the object
beforehand, and add some special sauce to handle the difference between
a package without files and a database without files entries.
Signed-off-by: Dan McGee <dan@archlinux.org>
|
|
Signed-off-by: Dan McGee <dan@archlinux.org>
|
|
Signed-off-by: Dan McGee <dan@archlinux.org>
|
|
Signed-off-by: Dan McGee <dan@archlinux.org>
|
|
This will come in more handy with our new models, but we can adapt groups
and licenses to use it first.
Signed-off-by: Dan McGee <dan@archlinux.org>
|
|
Signed-off-by: Dan McGee <dan@archlinux.org>
|
|
Signed-off-by: Dan McGee <dan@archlinux.org>
|
|
This comes with pacman 3.5, replacing the old "force" PKGBUILD option.
We parse it and store it for now, but don't display it anywhere just
yet. Also update a few queries relying on version differences in any of
the multiple parts.
Signed-off-by: Dan McGee <dan@archlinux.org>
|
|
|
|
When importing over a million files, it makes sense to take the slightly
faster route and call the PackageFile() constructor directly rather than
going through the related manager's create method.
We can also get huge performance improvements, especially with files
databases, by using the 'io' module rather than 'codecs'. The former is
implemented in C as of Python 2.7 and cuts a no-work import (so
measuring only the DB read speed) of extra.files.tar.gz from ~30 seconds
to ~5 seconds.
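
For illustration, a small sketch of reading UTF-8 text through the 'io' module. The helper name `read_entry` and the sample file contents are assumptions made up for this example, not code from the script.

```python
import io
import os
import tempfile


def read_entry(path):
    """Read a UTF-8 text file and return its lines without newlines."""
    # io.open() decodes while reading and behaves the same on Python 2.7
    # and 3; on Python 3 it is the same function as the builtin open().
    with io.open(path, encoding="utf-8") as handle:
        return [line.rstrip("\n") for line in handle]


# Write a small sample file so the sketch can be run on its own.
fd, sample_path = tempfile.mkstemp(suffix=".txt")
with io.open(fd, "w", encoding="utf-8") as handle:
    handle.write(u"%FILES%\nusr/\nusr/bin/\nusr/bin/caf\u00e9\n")
lines = read_entry(sample_path)
os.remove(sample_path)
```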
Signed-off-by: Dan McGee <dan@archlinux.org>
|
|
Signed-off-by: Dan McGee <dan@archlinux.org>
|
|
Signed-off-by: Dan McGee <dan@archlinux.org>
|
|
This allows us to store multiple licenses per package in a more elegant
fashion, and will later allow us to search and filter on this information.
Signed-off-by: Dan McGee <dan@archlinux.org>
|
|
Main change is just to move groups from the default packagegroup_set
location to a related_name of groups. Also refer to the Package class
directly rather than by text string if we have it available.
Signed-off-by: Dan McGee <dan@archlinux.org>
|
|
Don't use 'fmtstr % (arg1, arg2)' type format; logger can be passed a format
string and the arguments to populate it. Saves a bit of work for strings
that never end up getting displayed anyway.
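
A short sketch of the difference, using only the standard logging module. The logger name and the message text are placeholders invented for the example.

```python
import io
import logging

logger = logging.getLogger("reporead-sketch")  # hypothetical logger name
stream = io.StringIO()
logger.addHandler(logging.StreamHandler(stream))
logger.setLevel(logging.INFO)
logger.propagate = False

pkgname, pkgver = "pacman", "3.5.0"

# Eager: the string is built even though DEBUG messages are discarded here.
logger.debug("updating %s to %s" % (pkgname, pkgver))

# Lazy: logging interpolates the arguments only if the record is emitted.
logger.debug("updating %s to %s", pkgname, pkgver)
logger.info("finished %s %s", pkgname, pkgver)

output = stream.getvalue()
```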
Signed-off-by: Dan McGee <dan@archlinux.org>
|
|
Signed-off-by: Dan McGee <dan@archlinux.org>
|
|
Signed-off-by: Dan McGee <dan@archlinux.org>
|
|
We didn't verify that the version in the files database was the same as in
the SQL side of things, so we could load old files for a new package and
lose track of this fact. When loading files, ensure the database version
matches the version in the package before continuing with the file load
operation.
There are also a few other small updates in here, like skipping the sanity
check for filesonly as we never delete packages, and removing some
unnecessary string concatenation operations.
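
The version check described above might look roughly like this. It is a sketch over plain dictionaries rather than the real models, and `load_files`, `db_packages`, and `files_entries` are hypothetical names.

```python
def load_files(db_packages, files_entries):
    """Return file lists only for entries whose version matches the DB.

    db_packages maps pkgname -> version stored on the SQL side.
    files_entries maps pkgname -> (version, list of files) from the files DB.
    """
    loaded = {}
    for name, (version, files) in files_entries.items():
        if name not in db_packages:
            continue  # files-only runs never add packages
        if db_packages[name] != version:
            continue  # stale files entry: skip rather than load old files
        loaded[name] = files
    return loaded


db_packages = {"pacman": "3.5.0-1", "glibc": "2.13-4"}
files_entries = {
    "pacman": ("3.5.0-1", ["usr/bin/pacman"]),
    "glibc": ("2.13-3", ["usr/lib/libc.so.6"]),  # older than the SQL side
    "bash": ("4.2-1", ["bin/bash"]),             # unknown to the SQL side
}
loaded = load_files(db_packages, files_entries)
```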
Signed-off-by: Dan McGee <dan@archlinux.org>
|
|
And make filename check more lenient.
Signed-off-by: Dan McGee <dan@archlinux.org>
|
|
Clean up some of the orphan cleanup code, especially so we are never lying
in the percentage we print, and remove a bunch of debug prints that aren't
all that useful.
Signed-off-by: Dan McGee <dan@archlinux.org>
|
|
We had this set up as a unique ForeignKey before, which adds some
indirection due to the RelatedManager object being there. By making it a
OneToOneField, we can get the profile object directly, enforce uniqueness,
and also use it in select_related() calls to make our profiles page a bit
more efficient.
Signed-off-by: Dan McGee <dan@archlinux.org>
|
|
|
|
This needed a little sprucing up as it has grown quite organically over the
life of this script. Make things a bit more pythonic through the use of
iterators rather than collection indexing, and try to generalize the special
cases of things a bit.
Also catch encoding problems early and fail gracefully rather than blow up
the entire package parser. A failed decode of a file should cause us to just
skip it rather than stop the entire parser. Worst case, this leaves that
package out of the web interface.
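
A minimal sketch of the skip-on-failure behavior, written as a generator so callers iterate rather than index. The function name and the sample entries are assumptions for illustration.

```python
def decode_entries(raw_entries):
    """Yield (name, text) pairs, skipping entries that are not valid UTF-8."""
    for name, data in raw_entries:
        try:
            text = data.decode("utf-8")
        except UnicodeDecodeError:
            # Skip just this entry instead of stopping the whole parse.
            continue
        yield name, text


raw = [
    ("a/desc", b"%NAME%\na\n"),
    ("b/desc", b"\xff\xfe broken"),  # not valid UTF-8
    ("c/desc", b"%NAME%\nc\n"),
]
decoded = list(decode_entries(raw))
```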
Signed-off-by: Dan McGee <dan@archlinux.org>
|
|
When I have caught reporead behaving badly on the production box, I haven't
been able to successfully get a traceback without killing the process.
Hopefully using a different signal will allow me to actually capture some
data.
Signed-off-by: Dan McGee <dan@archlinux.org>
|
|
Just use a plain Exception instead since we don't get any added value by
subclassing.
Signed-off-by: Dan McGee <dan@archlinux.org>
|
|
We had a bunch of extra imports, non-conventional variable names, spacing
issues, etc. that were relatively low-hanging fruit to clean up. Fix them
and make the code a bit cleaner in the process.
Signed-off-by: Dan McGee <dan@archlinux.org>
|
|
Every once in a while we see this command hanging on the main server but it
isn't making any system calls, so it is hard to tell where it is getting
stuck. Add a signal handler on SIGQUIT that will listen and print a
traceback when signaled.
This is the easiest thing to implement; future additions may need to be able
to hook up to a remote debugger (e.g. pdb) if this doesn't work.
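
A sketch of such a handler using the standard signal and traceback modules. The handler name and message text are made up here and the real script may differ.

```python
import signal
import sys
import traceback


def print_traceback(signum, frame):
    """Print the stack of the interrupted code to stderr, then carry on."""
    sys.stderr.write("Received signal %d, current stack:\n" % signum)
    traceback.print_stack(frame, file=sys.stderr)


# SIGQUIT is POSIX-only; without a handler it would terminate the process.
signal.signal(signal.SIGQUIT, print_traceback)
```

With this installed, `kill -QUIT <pid>` prints the current stack without stopping the process. Python runs signal handlers only once the interpreter regains control, so a process stuck inside C code would not print anything until that call returns.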
Signed-off-by: Dan McGee <dan@archlinux.org>
|
|
Signed-off-by: Dan McGee <dan@archlinux.org>
|
|
With suggestions from Jason Chu, make the code a bit less repetitive with
regard to exception handling and fallthrough to the next method of finding
the user.
Signed-off-by: Dan McGee <dan@archlinux.org>
|
|
Otherwise we get duplicate groups each time we update the package, and any
group removals would never actually happen.
Signed-off-by: Dan McGee <dan@archlinux.org>
|
|
This is a bit more work than just a simple field addition. We attempt to map
packager specs (e.g. "A. U. Thor <author@example.com>") to actual Django
users in a relatively robust way: first try matching on User.email, then
fall back to UserProfile.public_email, then finally try a name-based match.
For those packages where we can't generate a mapping, the raw string is
still stored so it can be displayed.
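
The lookup order can be sketched over plain dictionaries instead of the Django models. The regular expression, the `find_user` helper, and the sample users are all hypothetical, and treating a spec without an email as name-only is an assumption.

```python
import re

# Splits "Name <email>" into its two parts.
PACKAGER_RE = re.compile(r"^\s*(?P<name>.*?)\s*<(?P<email>[^>]+)>\s*$")


def find_user(packager, users):
    """Return the matching username, or None if no mapping can be made."""
    match = PACKAGER_RE.match(packager)
    if match:
        name, email = match.group("name"), match.group("email")
    else:
        name, email = packager.strip(), None
    if email:
        # First the account email, then the public profile email.
        for field in ("email", "public_email"):
            for user in users:
                if user.get(field) == email:
                    return user["username"]
    # Finally fall back to a name-based match.
    for user in users:
        if user.get("name") == name:
            return user["username"]
    return None


users = [
    {"username": "author", "name": "A. U. Thor",
     "email": "author@example.com", "public_email": "thor@example.org"},
    {"username": "other", "name": "O. Ther",
     "email": "other@example.com", "public_email": "other@example.org"},
]
by_email = find_user("A. U. Thor <author@example.com>", users)
by_public = find_user("Somebody <other@example.org>", users)
by_name = find_user("O. Ther <unknown@example.net>", users)
unmapped = find_user("Nobody <nobody@example.net>", users)
```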
Signed-off-by: Dan McGee <dan@archlinux.org>
|
|
They show up but aren't hotlinked to anything...just yet.
Signed-off-by: Dan McGee <dan@archlinux.org>
|
|
Signed-off-by: Dan McGee <dan@archlinux.org>
|
|
Signed-off-by: Dan McGee <dan@archlinux.org>
|
|
Rather than go to the database for every single package on something like a
files update, use the one we already have. Duh.
Signed-off-by: Dan McGee <dan@archlinux.org>
|
|
Apparently Django 1.1.1 let null fields pass right through but this now
causes reporead to blow up in 1.1.2. Fix the issue and get things working
again by allowing nulls where it probably makes sense and including a
migration to fix the issue, which for the real database will be a no-op.
Signed-off-by: Dan McGee <dan@archlinux.org>
|
|
We had a situation where the last 'any' architecture package was present in
the [testing] repo and never got removed because we never did the
db_update() call on that architecture. Instead of looping over all possible
architectures and only calling if len() > 0, always call db_update() for
both the primary architecture and the 'any' architecture.
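
A toy sketch of why the unconditional call matters. The `db_update` here is a hypothetical stand-in that only mirrors a package list into a dictionary; it is not the real function.

```python
def db_update(arch, repo_packages, stored):
    """Hypothetical stand-in: make stored[arch] match the repo's packages."""
    stored[arch] = set(repo_packages)


def update_repo(primary_arch, packages_by_arch, stored):
    # Always update both architectures, even with an empty package list, so
    # the last remaining package of an architecture can still be removed.
    for arch in (primary_arch, "any"):
        db_update(arch, packages_by_arch.get(arch, []), stored)


stored = {"x86_64": {"pacman"}, "any": {"old-fonts"}}
# The repo no longer ships any 'any' packages at all.
update_repo("x86_64", {"x86_64": ["pacman", "glibc"]}, stored)
```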
Signed-off-by: Dan McGee <dan@archlinux.org>
|
|
And also add a data migration to add the value retroactively for anything
already in our database. We simply fall back to pkgname if pkgbase isn't
available.
Signed-off-by: Dan McGee <dan@archlinux.org>
|
|
Signed-off-by: Dan McGee <dan@archlinux.org>
|
|
|
|
Signed-off-by: Dan McGee <dan@archlinux.org>
|
|
This will allow files to be imported for all existing packages in the
database while not worrying about the files database being a touch out of
date. It utilizes the new files_last_update column to perform the insertion
and updating of file lists intelligently.
Signed-off-by: Dan McGee <dan@archlinux.org>
|
|
This depends on some changes I made to our script that generates the file
list databases, but it allows us to treat the files databases in an almost
identical manner to a regular database. The only difference is the fact that
it contains 'files' entries.
One catch that will be addressed in a separate patch: if the files DB lags
behind the regular DB, running an update from it could cause packages in the
web interface to be downgraded. A 'no-add/remove' option could be helpful
for this case.
Signed-off-by: Dan McGee <dan@archlinux.org>
|
|
Signed-off-by: Dan McGee <dan@archlinux.org>
|
|
Otherwise a --force will clear out all our flagged packages. :/ Whoops.
Signed-off-by: Dan McGee <dan@archlinux.org>
|