path: root/devel/management/commands/reporead.py
Age | Commit message | Author | Files | Lines
2013-12-18 | reporead: implement delayed parsing of files data | Dan McGee | 1 | -23/+35
This gives us some large memory savings in Python due to the internal storage of Unicode strings vs. byte strings, as well as saving us processing time up front for filelist data we are never going to have to actually use. Signed-off-by: Dan McGee <dan@archlinux.org>
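A minimal sketch of the delayed-parsing idea, not the code from this commit: the raw bytes are kept as-is and only decoded the first time the file list is actually requested. The class name `LazyFiles` and its attributes are hypothetical; the `%FILES%` header is the section marker used in pacman files databases.

```python
class LazyFiles:
    """Hold the raw filelist bytes and decode them only on first access."""

    def __init__(self, raw):
        self._raw = raw        # undecoded bytes, exactly as read from the archive
        self._parsed = None    # populated lazily by the `files` property

    @property
    def has_files(self):
        # Cheap check that does not force any decoding.
        return bool(self._raw)

    @property
    def files(self):
        if self._parsed is None:
            text = self._raw.decode('utf-8')
            # Drop blank lines and the section header.
            self._parsed = [line for line in text.splitlines()
                            if line and line != '%FILES%']
        return self._parsed


pkg = LazyFiles(b"%FILES%\nusr/\nusr/bin/\nusr/bin/foo\n")
```

Packages whose file lists are never touched pay only for the byte string, never for the decoded list.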
2013-12-18 | reporead: bring back batched_bulk_create() | Dan McGee | 1 | -1/+19
For packages with filelists with > 80,000 items, we were starting to see some serious memory issues in reporead. This was both on the statement generation side in Python as well as on the database side. Break the updates into chunks of 10,000 when we encounter packages with tons of files to keep things under control a bit. Signed-off-by: Dan McGee <dan@archlinux.org>
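The chunking described here can be sketched with a small generator. This is an illustration under assumptions, not the actual `batched_bulk_create()` body: the helper name `batched` is hypothetical, and in the real command each chunk would presumably be handed to Django's `bulk_create()`.

```python
from itertools import islice


def batched(items, size=10000):
    """Yield successive lists of at most `size` items from any iterable."""
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


# Hypothetical usage inside a Django command:
#     for chunk in batched(file_objects):
#         PackageFile.objects.bulk_create(chunk)
chunk_sizes = [len(chunk) for chunk in batched(range(25000))]
```

Only one chunk is materialized at a time, which bounds both the size of the generated SQL statement and the memory held on the Python side.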
2013-11-07 | Move signature data from base64 string to bytes type | Dan McGee | 1 | -1/+2
Signed-off-by: Dan McGee <dan@archlinux.org>
2013-11-07 | Django 1.6 upgrade, deprecation cleanup | Dan McGee | 1 | -5/+5
PendingDeprecationWarning: commit_on_success is deprecated in favor of atomic. Signed-off-by: Dan McGee <dan@archlinux.org>
2013-11-07 | Fix parsing of depends with both epoch and description | Dan McGee | 1 | -2/+2
Not a common case, but one we can and should support; it simply hasn't been noticed up until this point. That pesky colon! Fixes FS#37477. Signed-off-by: Dan McGee <dan@archlinux.org>
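To show why the colon is awkward, here is a hedged sketch of one way to parse such an entry. It assumes the description is separated from the dependency by a colon followed by a space, while the epoch colon inside the version has no space after it. The function name `parse_depend`, the regular expression, and the return shape are all illustrative and not taken from reporead.

```python
import re

# name, then an optional comparison operator and version (which may hold an epoch)
SPEC_RE = re.compile(
    r'^(?P<name>[^<>=]+)(?:(?P<comparison>[<>]=?|=)(?P<version>.+))?$')


def parse_depend(value):
    """Split 'name>=epoch:version: description' into its four parts."""
    # Split off the description first, on ': ' rather than on a bare ':'.
    spec, _, description = value.partition(': ')
    match = SPEC_RE.match(spec.strip())
    if match is None:
        raise ValueError('unparseable dependency: %r' % value)
    return (match.group('name'), match.group('comparison'),
            match.group('version'), description.strip() or None)


parsed = parse_depend('foo>=1:2.0: optional GUI')
```

Splitting on a bare `:` would cut the entry at the epoch and lose part of the version, which is the kind of failure the commit subject describes.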
2013-03-29 | Remove old-style build date parsing | Dan McGee | 1 | -7/+3
This was added in 2010 in commit e95c4563e32 as a short-term fix. The short-term is up. Signed-off-by: Dan McGee <dan@archlinux.org>
2013-02-09 | reporead: remove batched_bulk_create | Dan McGee | 1 | -27/+5
Now that Django 1.5 is out and knows that SQLite3 only allows for 999 parameters per SQL call, we don't need to manually batch things up anymore and can let the underlying bulk_create code do it for us. This basically reverts commit 88ee61a39ac3. Signed-off-by: Dan McGee <dan@archlinux.org>
2013-01-16 | Handle connection and transaction more properly in reporead | Dan McGee | 1 | -0/+1
A few minor things are fixed here. One is that PostgreSQL, and more specifically pgbouncer, doesn't like it when the connection is closed after psycopg2 has started an implicit transaction, even for read-only queries. Ensure we call commit as our last database action in all cases. The other is related: Django in management commands doesn't ever call close on any database connection you may have been using, so PostgreSQL gets mad about this fact and logs a message saying such. Close the connection explicitly when we are done with it to play nice. Signed-off-by: Dan McGee <dan@archlinux.org>
2012-12-31 | Add 'created' field to packages model | Dan McGee | 1 | -2/+4
This will be used to eventually implement the UI side of FS#13441, but to do that, we first need the data. Signed-off-by: Dan McGee <dan@archlinux.org>
2012-11-16 | Use python set comprehension syntax supported in 2.7 | Dan McGee | 1 | -1/+1
Signed-off-by: Dan McGee <dan@archlinux.org>
2012-11-16 | Use Python 2.7 dictionary comprehension syntax | Dan McGee | 1 | -2/+2
Rather than the old idiom of dict((k, v) for <> in <>). Signed-off-by: Dan McGee <dan@archlinux.org>
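For reference, the two idioms side by side, together with the set comprehension form that became available in the same Python release. The sample data is made up for illustration.

```python
pairs = [('depends', 3), ('provides', 1), ('conflicts', 0)]

# Old idiom: build the dict from a generator of (key, value) tuples.
old_style = dict((k, v) for k, v in pairs)

# Python 2.7+ dictionary comprehension: same result, less punctuation.
new_style = {k: v for k, v in pairs}

# The matching set comprehension syntax.
names = {k for k, _ in pairs}
```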
2012-10-12 | reporead: don't print full backtrace if unnecessary | Dan McGee | 1 | -3/+6
In the architecture agnostic case, this error is much more likely to happen, so printing it like an error message is deceiving. Signed-off-by: Dan McGee <dan@archlinux.org>
2012-09-20 | Explicitly close the database connection in reporead | Dan McGee | 1 | -0/+1
This is the cause of these warnings showing up in the PostgreSQL log: LOG: unexpected EOF on client connection with an open transaction All management commands are guilty of this as they do not clean up and close the connection when they exit, unlike the standard web request cycle. Other commands should probably be updated as well, but for now, this is the biggest culprit. Signed-off-by: Dan McGee <dan@archlinux.org>
2012-09-18 | Sort package list before inserting it into the database | Dan McGee | 1 | -1/+4
FS#30323. This will take some time to propagate to all existing packages, but all new and updated packages will start getting filelists in the right order. Signed-off-by: Dan McGee <dan@archlinux.org>
2012-08-09 | Extract parse_version function from reporead logic | Dan McGee | 1 | -7/+2
Signed-off-by: Dan McGee <dan@archlinux.org>
2012-08-04 | reporead: import make and check depends | Dan McGee | 1 | -2/+5
We don't have these in the database yet, but future versions of repo-add will put this information in the sync databases. Signed-off-by: Dan McGee <dan@archlinux.org>
2012-08-04 | Make adjustments for optional -> deptype conversion | Dan McGee | 1 | -3/+3
Very little dealt directly with this field. Signed-off-by: Dan McGee <dan@archlinux.org>
2012-07-28 | reporead: don't use iexact lookup on arch name | Dan McGee | 1 | -3/+3
We don't do this anywhere else, so we shouldn't do this here either. Signed-off-by: Dan McGee <dan@archlinux.org>
2012-07-25 | Remove custom utc_now() function, use django.utils.timezone.now() | Dan McGee | 1 | -4/+5
This was around from the time when we handled timezones sanely and Django did not; now that we are on 1.4 we no longer need our own code to handle this. Signed-off-by: Dan McGee <dan@archlinux.org>
2012-07-09 | Work around bulk_create limitations in sqlite3 in reporead | Dan McGee | 1 | -6/+28
Given the 999 SQL statement variable limit, we can easily hit it when updating a package with thousands of files or a few hundred depends. Signed-off-by: Dan McGee <dan@archlinux.org>
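The arithmetic behind the limit can be sketched as follows. The helper names are hypothetical, and the example of three bound values per row is an assumption for illustration, not a figure from the actual models.

```python
import math

SQLITE_MAX_VARIABLES = 999  # the limit named in the commit message


def rows_per_statement(fields_per_row, limit=SQLITE_MAX_VARIABLES):
    """How many rows fit in one INSERT without exceeding the variable limit."""
    if fields_per_row < 1:
        raise ValueError('fields_per_row must be positive')
    return max(limit // fields_per_row, 1)


def statements_needed(total_rows, fields_per_row):
    """Number of INSERT statements required for `total_rows` rows."""
    return math.ceil(total_rows / rows_per_statement(fields_per_row))


# With 3 bound values per row, one statement can carry 333 rows,
# so a package with 5,000 files needs 16 statements.
batch = rows_per_statement(3)
count = statements_needed(5000, 3)
```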
2012-07-09 | reporead: disable FULL synchronous writes for sqlite3 | Dan McGee | 1 | -0/+6
At least on Linux, we hit a huge bottleneck waiting for the FULL commit to happen for each added package during reporead operations. It makes much more sense to back this off to NORMAL level instead, which trades some possible loss of durability for speedier operation. Additionally, no one would possibly be running their production version of this site on sqlite3, right? Signed-off-by: Dan McGee <dan@archlinux.org>
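A minimal sketch of relaxing the setting with the standard-library `sqlite3` module, assuming the relaxed level intended here is NORMAL, the next level below FULL. The actual commit presumably issues the pragma through Django's database cursor; the in-memory database is only for illustration.

```python
import sqlite3

conn = sqlite3.connect(':memory:')

# Must be issued outside a transaction; SQLite refuses to change the
# safety level while one is open.
conn.execute('PRAGMA synchronous = NORMAL')

# Reading the pragma back returns a number: 0 = OFF, 1 = NORMAL, 2 = FULL.
level = conn.execute('PRAGMA synchronous').fetchone()[0]
conn.close()
```

The pragma applies per connection, so it has to be reissued each time a new connection is opened.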
2012-07-06 | reporead: don't append slash to empty (root) directory | Dan McGee | 1 | -1/+2
Add the slash only if we have a directory name, and not otherwise. Signed-off-by: Dan McGee <dan@archlinux.org>
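The rule can be sketched as below, assuming each filelist entry is split into a directory part and a filename part. The function name `split_entry` and the returned tuple shape are hypothetical.

```python
import posixpath


def split_entry(path):
    """Split a filelist entry into (directory, filename).

    The directory keeps a trailing slash, except for the root, which is
    represented by the empty string. Pure directory entries have no filename.
    """
    dirname, filename = posixpath.split(path)
    directory = dirname + '/' if dirname else ''
    return directory, filename or None


root_file = split_entry('foo')
nested_file = split_entry('usr/bin/foo')
directory_only = split_entry('usr/bin/')
```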
2012-07-05 | reporead: handle files in root directory properly | Dan McGee | 1 | -1/+4
Signed-off-by: Dan McGee <dan@archlinux.org>
2012-07-05 | reporead: properly handle cases where last_update == files_last_update | Dan McGee | 1 | -2/+2
We should assume the filelists are up to date in this case, not out of date. Signed-off-by: Dan McGee <dan@archlinux.org>
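In code, the fix amounts to using a strict comparison. The sketch below is an assumption about the shape of the check, with a hypothetical function name; only the two field names come from the commit subject.

```python
from datetime import datetime, timezone


def files_need_update(last_update, files_last_update):
    """Return True only when the filelist is strictly older than the package."""
    if files_last_update is None:
        return True  # files were never imported
    # Strict '<': equal timestamps mean the filelist is already current.
    return files_last_update < last_update


stamp = datetime(2012, 7, 5, 12, 0, tzinfo=timezone.utc)
same_time = files_need_update(stamp, stamp)
```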
2012-07-02 | Log package updates during reporead invocation | Dan McGee | 1 | -1/+6
This adds a Manager and log_update method to help log all updates made to the packages table during reporead runs. Signed-off-by: Dan McGee <dan@archlinux.org>
2012-05-19 | reporead: fix copy/paste issue | Dan McGee | 1 | -1/+1
Signed-off-by: Dan McGee <dan@archlinux.org>
2012-05-19 | Switch to usage of new Depend object | Dan McGee | 1 | -8/+10
Signed-off-by: Dan McGee <dan@archlinux.org>
2012-03-26 | Rename 'packagedepend_set' attribute to 'depends' | Dan McGee | 1 | -1/+1
We do this for every other related package attribute, so do it here too. Signed-off-by: Dan McGee <dan@archlinux.org>
2012-03-24 | reporead: use bulk_create() for more properties | Dan McGee | 1 | -13/+17
Depends, conflicts, provides, etc. can all be done via bulk_create. Signed-off-by: Dan McGee <dan@archlinux.org>
2012-03-24 | Merge branch 'django14' | Dan McGee | 1 | -23/+13
Conflicts: templates/releng/result_section.html
2012-03-24 | Make all datetime objects fully timezone aware | Dan McGee | 1 | -4/+8
This is most of the transition to Django 1.4 `USE_TZ = True`. We need to ensure we don't mix aware and non-aware datetime objects when dealing with datetimes in the code. Add a utc_now() helper method that we can use most places, and ensure there is always a timezone attached when necessary. Signed-off-by: Dan McGee <dan@archlinux.org>
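A standard-library sketch of the idea. The name `utc_now()` comes from the commit message, but this body is an assumption, and `ensure_aware` is a hypothetical helper that treats naive values as UTC.

```python
from datetime import datetime, timezone


def utc_now():
    """Return the current time as a timezone-aware datetime in UTC."""
    return datetime.now(timezone.utc)


def ensure_aware(value):
    """Attach UTC to a naive datetime; leave aware datetimes untouched."""
    if value.tzinfo is None:
        return value.replace(tzinfo=timezone.utc)
    return value


naive = datetime(2012, 3, 24, 10, 30)
try:
    naive < utc_now()      # mixing naive and aware values is an error
    mixing_failed = False
except TypeError:
    mixing_failed = True

comparable = ensure_aware(naive) < utc_now()
```

Python refuses to order a naive datetime against an aware one, which is why every value has to carry a timezone once `USE_TZ = True` is enabled.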
2012-03-24 | reporead: use Django 1.4 bulk_create() for package files | Dan McGee | 1 | -4/+3
Signed-off-by: Dan McGee <dan@archlinux.org>
2012-03-24 | reporead: use Django 1.4 select_for_update() | Dan McGee | 1 | -15/+2
As per TODO comments in the existing code. Signed-off-by: Dan McGee <dan@archlinux.org>
2012-03-24 | reporead: blow up when package found with wrong architecture | Dan McGee | 1 | -2/+3
Signed-off-by: Dan McGee <dan@archlinux.org>
2012-03-16 | reporead: rename Pkg to RepoPackage | Dan McGee | 1 | -3/+4
The bytes saved on the shorter name aren't worth it. Also ensure 'desc' is always initialized to None in case packages do not provide one. Signed-off-by: Dan McGee <dan@archlinux.org>
2012-02-12 | reporead: only reset flag date if upstream version changes | Dan McGee | 1 | -1/+7
This preserves the flag date if only a simple pkgrel bump occurred, which makes sense more often than not for rebuilds. Signed-off-by: Dan McGee <dan@archlinux.org>
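One way to express this check, assuming full versions of the form `[epoch:]pkgver-pkgrel`. Both helper names and the tuple layout are illustrative; the real `parse_version` in the codebase may differ.

```python
import re

VERSION_RE = re.compile(r'^(?:(\d+):)?(.+)-([^-]+)$')


def parse_version(version):
    """Split '[epoch:]pkgver-pkgrel' into (epoch, pkgver, pkgrel)."""
    match = VERSION_RE.match(version)
    if match is None:
        raise ValueError('unparseable version: %r' % version)
    epoch, pkgver, pkgrel = match.groups()
    return int(epoch or 0), pkgver, pkgrel


def upstream_changed(old, new):
    """True when epoch or pkgver differ; a pkgrel-only bump returns False."""
    return parse_version(old)[:2] != parse_version(new)[:2]


rebuild_only = upstream_changed('1.0-1', '1.0-2')
new_release = upstream_changed('1.0-2', '1.1-1')
```

Under this sketch the flag date would be cleared only when `upstream_changed()` returns True.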
2012-01-19 | reporead: simplify and fix transaction management in update_common() | Dan McGee | 1 | -15/+16
We can use the easier transaction.commit_on_success() decorator as long as we explicitly mark the transaction dirty. This fixes the issue where a raised exception in this code called neither commit nor rollback. Signed-off-by: Dan McGee <dan@archlinux.org>
2011-12-12 | reporead: more efficient deletion of files | Dan McGee | 1 | -1/+9
Rather than delegating to Django and batch deletion by ID, force issuing of a single delete query to clear out all existing file objects when necessary. This should speed up the deletion and update of packages with a lot of files by a non-trivial amount. Signed-off-by: Dan McGee <dan@archlinux.org>
2011-12-12 | PyLint suggested cleanups | Dan McGee | 1 | -1/+1
Signed-off-by: Dan McGee <dan@archlinux.org>
2011-12-03 | reporead: don't update timestamp on --force | Dan McGee | 1 | -1/+1
Signed-off-by: Dan McGee <dan@archlinux.org>
2011-12-03 | reporead: fix --force flag | Dan McGee | 1 | -5/+4
Signed-off-by: Dan McGee <dan@archlinux.org>
2011-12-01 | reporead: fix not defined variable | Dan McGee | 1 | -0/+2
Way to fail at refactoring, Dan. Signed-off-by: Dan McGee <dan@archlinux.org>
2011-11-30 | reporead: split out filesonly update method | Dan McGee | 1 | -75/+95
This removes a bunch of the conditional logic at a slight cost of some code duplication. However, the methods and madness are now much easier to follow. Signed-off-by: Dan McGee <dan@archlinux.org>
2011-11-30 | reporead: fix filesonly needs update checks | Dan McGee | 1 | -3/+5
This was broken after the select for update changes. We really should split the whole filesonly update into another method instead of the current shotgun approach with conditionals everywhere. Signed-off-by: Dan McGee <dan@archlinux.org>
2011-11-17 | reporead: don't trim pkgdesc length | Dan McGee | 1 | -3/+3
Signed-off-by: Dan McGee <dan@archlinux.org>
2011-11-17 | Ensure reporead is protected against simultaneous runs | Dan McGee | 1 | -100/+106
This adds a bunch of transaction magic and SELECT FOR UPDATE stuff to reporead to cope with the now-concurrent runs of reporead we get when invoked from our inotify-based updater. The collision occurs with 'any' architecture packages as both repo databases contain the new version, and the updates occur at exactly the same time. Signed-off-by: Dan McGee <dan@archlinux.org>
2011-11-16 | reporead: a few small tweaks | Dan McGee | 1 | -3/+4
Signed-off-by: Dan McGee <dan@archlinux.org>
2011-11-16 | reporead: clean up some debug logging | Dan McGee | 1 | -3/+5
Signed-off-by: Dan McGee <dan@archlinux.org>
2011-11-16 | Improve primary arch validation | Dan McGee | 1 | -7/+14
Ensure we can accept either an Arch object or an architecture name when passed to read_repo() by moving the validation there and being a bit more careful about typechecking and object lookup. Signed-off-by: Dan McGee <dan@archlinux.org>
2011-10-26 | Ensure PGP signature values are not trimmed | Dan McGee | 1 | -1/+5
This makes them totally unusable for any real purpose down the road. Signed-off-by: Dan McGee <dan@archlinux.org>